Detecting cells - Cell segmentation - Cancer cell segmentation and data extraction

4. Cell segmentation

4.5 Detecting cells

After preprocessing, the goal is to detect the actual cell borders. Separating background from foreground is the important first step, but detecting individual cells is usually the real challenge. The most common methods, Otsu thresholding and watershed, are presented here. Their main appeal is that they are easy to understand and implement.

4.5.1 Otsu thresholding

Thresholding is the process of separating values into categories based on their value.

Assume we have a set of values A = [1,1,1,2,2,3,4,5,6,7,8,9,10,10,10,12,14]

and we know that each value belongs to one of two categories, low and high, separated by a threshold. Several values are plausible thresholds for this task. We could arbitrarily choose any value from 3 to 9 and be relatively happy with our choice; e.g., a threshold of 5.5 would mean that the low group is A₁ = [1,1,1,2,2,3,4,5] and the high group is A₂ = [6,7,8,9,10,10,10,12,14].
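A minimal Python sketch of such an arbitrary split, using the example set A and the threshold 5.5 from the text. The function name `split_by_threshold` is only an illustrative assumption.

```python
def split_by_threshold(values, threshold):
    """Split values into a low group (<= threshold) and a high group (> threshold)."""
    low = [v for v in values if v <= threshold]
    high = [v for v in values if v > threshold]
    return low, high


A = [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 12, 14]
low, high = split_by_threshold(A, 5.5)
print(low)   # [1, 1, 1, 2, 2, 3, 4, 5]
print(high)  # [6, 7, 8, 9, 10, 10, 10, 12, 14]
```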

Choosing the value arbitrarily is not necessary, however, as there is a standard method for choosing the threshold.

Otsu thresholding has been the standard method in the field for over 30 years (Meijering 2012). It is a method for discovering optimal thresholds from a histogram. In essence, the goal of Otsu thresholding is to minimize the variance within each of the two groups, which is equivalent to maximizing the variance between them. A computationally more efficient formulation is derived in Otsu (1979). The method separates the background from the foreground. If the cells are not clustered and there is no background noise, this alone is enough to produce a segmentation. The method is so standard that it can be considered part of preprocessing.

Here we present only the required formulas for it.

The Otsu thresholding method chooses thresholds so that the differences between the points within each group are minimal. The goal is to find the optimal value k^*, the one that maximizes the goodness of the threshold. For simplicity of explanation, the histogram is normalized and regarded as a probability distribution. Let the distinct values in our dataset be [1,2,3, ..., L]; then the probability of a value being a is

p_a=N_a/N (4.7)

where N_a is the number of values at level a and N is the total number of values.

Figure 4.7 Demonstration of Otsu thresholding. (a) Green fluorescence image of COS-7 cells. (b) Masked image, white being foreground and black being background. (c) Two-level mask; gray can be interpreted as background or as foreground depending on the goal. (d) Histogram of the green channel. (e) Histogram of the mask divided into two groups with minimized variance. (f) Histogram of the mask divided into three groups with minimized variance.

We also define that

\mu(k) = \sum_{i=1}^{k} i \, p_i    (4.8)

and

\omega(k) = \sum_{i=1}^{k} p_i    (4.9)

from which we can create our goodness criterion formula:

g = \frac{[\mu(L)\,\omega(k) - \mu(k)]^2}{\omega(k)\,[1 - \omega(k)]}    (4.10)

where g is our goodness criterion. We can then apply this to our example data set A. The histogram of values is [3,2,1,1,1,1,1,1,1,3,0,1,0,1] and N = 17. The possible values are [1,2,3, ...,13,14]. With these we can compute the goodness value for each k and sort the values of k in ascending order of goodness: [12,13,1,10,11,2,9,3,8,4,7,5,6]. We can see that k = 6 gives the largest goodness value, so the division should be A₁ = [1,1,1,2,2,3,4,5,6] and A₂ = [7,8,9,10,10,10,12,14]. This can be considered the optimal solution.
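The worked example can be checked with a short Python sketch that follows Equations 4.7 to 4.10 directly. It assumes integer levels 1..L; the function name `otsu_goodness` is an illustrative assumption, not part of any library.

```python
from collections import Counter
from itertools import accumulate


def otsu_goodness(values):
    """Return {k: g(k)} for k = 1..L-1, assuming integer levels 1..L."""
    n = len(values)
    top = max(values)
    levels = range(1, top + 1)
    counts = Counter(values)
    p = [counts[i] / n for i in levels]                         # Eq. 4.7
    mu = list(accumulate(i * pi for i, pi in zip(levels, p)))   # Eq. 4.8
    omega = list(accumulate(p))                                 # Eq. 4.9
    mu_total = mu[-1]                                           # mu(L)
    goodness = {}
    for k in range(1, top):  # k = L is skipped: 1 - omega(L) = 0
        w, m = omega[k - 1], mu[k - 1]
        if w <= 0.0 or w >= 1.0:
            continue  # one group would be empty, g is undefined
        goodness[k] = (mu_total * w - m) ** 2 / (w * (1 - w))   # Eq. 4.10
    return goodness


A = [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 12, 14]
g = otsu_goodness(A)
order = sorted(g, key=g.get)   # k sorted by ascending goodness
best_k = max(g, key=g.get)     # threshold with the largest goodness
print(order)   # [12, 13, 1, 10, 11, 2, 9, 3, 8, 4, 7, 5, 6]
print(best_k)  # 6
```

Values less than or equal to `best_k` form A₁ and the rest form A₂, which reproduces the division given in the text.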

Let us go through another example. Figure 4.7(a) is our image of interest. More precisely, its green channel contains the signal we are interested in. Figure 4.7(d) shows the distribution of the green channel on a logarithmic scale. Applying Equation 4.10, we get the histogram shown in Figure 4.7(e), where the background is marked in light gray and the foreground in white. The resulting image is shown in Figure 4.7(b), where high and low values are marked with white and black, respectively.

For this specific image, we can see that there might actually be two different foreground groups we want to capture: high and medium intensity. When we find the two thresholds that minimize the variance within the three groups, we get Figure 4.7(c) and the related histogram in Figure 4.7(f). This could be continued to even more threshold levels, but eventually the problem would revert to our original one, only now we would be choosing which groups to mark as foreground and which as background.
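The two-threshold case can be sketched as an exhaustive search over threshold pairs that minimizes the total squared deviation within the three groups. This is a brute-force illustration under the assumption of a small number of distinct values, not the efficient formulation of Otsu (1979); the names `two_thresholds`, `split_three`, and `within_ss` and the sample data are hypothetical.

```python
from itertools import combinations


def within_ss(group):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not group:
        return 0.0
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def split_three(values, k1, k2):
    """Split into v <= k1, k1 < v <= k2, and v > k2."""
    return ([v for v in values if v <= k1],
            [v for v in values if k1 < v <= k2],
            [v for v in values if v > k2])


def two_thresholds(values):
    """Try every pair k1 < k2 of observed levels; keep the lowest-cost pair."""
    levels = sorted(set(values))
    best, best_cost = None, float("inf")
    # The largest level is excluded so that the third group is never empty.
    for k1, k2 in combinations(levels[:-1], 2):
        cost = sum(within_ss(g) for g in split_three(values, k1, k2))
        if cost < best_cost:
            best, best_cost = (k1, k2), cost
    return best


# Hypothetical data with three visible clusters: low, medium, and high.
data = [1, 1, 2, 2, 10, 10, 11, 11, 20, 20, 21, 21]
k1, k2 = two_thresholds(data)
print(k1, k2)                     # 2 11
print(split_three(data, k1, k2))  # ([1, 1, 2, 2], [10, 10, 11, 11], [20, 20, 21, 21])
```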

With certain images, Otsu thresholding alone can produce sufficient results, provided each cell is separated from the others by background. Such images can be considered extremely easy to segment. In a typical cell segmentation problem, however, some cells are extremely close to each other and form clusters, in which case thresholding can be used only as preprocessing.

4.5.2 Watershed

Watershed has been a common tool in image segmentation for a long time (Meyer & Beucher 1990, Vincent & Soille 1991, Moga 1997). It has applications in several fields, e.g., industrial, biomedical, and computer vision (Moga 1997, p. 3). For example, watershed has recently been used in industrial solid waste segmentation (Bao & Yu 2015), identification of lung cancer cells from computed tomography images (Logesh Kumar et al. 2016), and a computer vision counting task (Duan et al. 2015). Watershed operates on the topographic properties of the image.

The algorithm emulates how rainwater would form lakes, and it places borders where the bodies of water would meet. The lowest points in the image are filled with water first, and these ponds are slowly filled. If two ponds would connect, a dam is built between them. These dams form the boundaries of objects, and each pond is considered an object.

For example, let us consider a one-dimensional curve, as presented in Figure 4.8(a). At the beginning of watershed, we start to increase the water level, and solitary water areas start to form, as seen in Figure 4.8(b). We want to detect every area as part of some lake.

Figure 4.8 One-dimensional watershed. (a) The starting point is a curve. (b) The water starts to rise and forms separate lakes. (c) When two separate bodies of water meet, a border is formed between them. (d) This continues until all areas are segmented and distinct lakes are defined, here A, B, and C.

When the water level rises so high that two bodies of water would become one, the spot where they meet is marked as a border (see Figure 4.8(c)). Marking the borders in this way delimits each unique body of water.
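The flooding idea can be sketched for a one-dimensional height profile as follows. This is a simplified illustration of the description above, not the published algorithms of Meyer & Beucher or Vincent & Soille; the function name `watershed_1d` and the sample profile are hypothetical.

```python
def watershed_1d(heights):
    """Flood a 1-D profile from the lowest point up; label 0 marks a dam."""
    n = len(heights)
    unseen, dam = -1, 0
    labels = [unseen] * n
    next_label = 1
    # Visiting points in ascending height emulates a rising water level.
    for i in sorted(range(n), key=lambda j: heights[j]):
        near = {labels[j] for j in (i - 1, i + 1)
                if 0 <= j < n and labels[j] > 0}
        if not near:
            labels[i] = next_label    # local minimum: a new lake starts here
            next_label += 1
        elif len(near) == 1:
            labels[i] = near.pop()    # the water of one lake reaches this point
        else:
            labels[i] = dam           # two lakes would meet: build a dam
    return labels


profile = [3, 1, 2, 0, 2, 5, 1, 4]
print(watershed_1d(profile))  # [2, 2, 0, 1, 1, 0, 3, 3]
```

Each positive label is one lake, and the zeros are the borders between them. Note that the shallow dip at the start of the profile becomes its own lake, which is the kind of small lake that lake A represents in Figure 4.8(d).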

Watershed can over-segment heavily (Wu et al. 2008, p. 140) when all local minima are chosen as ponds. This can be seen with lake A in Figure 4.8(d), which could be considered part of lake B. Lakes B and C being the larger ones, it might make sense to call only those lakes. However, it might also be that only a single lake should be detected in the whole segmented valley. Several methods can be used to mitigate over-segmentation. Denoising methods can remove small lakes such as lake A, and merging methods can merge larger lakes such as lakes B and C together. Thresholding could be used to remove the chance that we would at any point consider B and C the same entity (by capping the water level at what it is in Figure 2).
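One possible merging rule can be sketched in one dimension: when two lakes meet, the shallower one is absorbed into the deeper one if its depth is below a chosen limit, and a dam is built otherwise. The depth criterion, the parameter `min_depth`, and the function name `watershed_1d_merged` are illustrative assumptions, not the specific methods referred to in the text.

```python
def watershed_1d_merged(heights, min_depth):
    """1-D flooding that merges lakes shallower than min_depth; 0 marks a dam."""
    n = len(heights)
    labels = [-1] * n   # -1 = not yet flooded
    lake_min = {}       # label -> height of the lake's lowest point
    next_label = 1
    for i in sorted(range(n), key=lambda j: heights[j]):
        near = sorted({labels[j] for j in (i - 1, i + 1)
                       if 0 <= j < n and labels[j] > 0})
        if not near:
            labels[i] = next_label
            lake_min[next_label] = heights[i]
            next_label += 1
        elif len(near) == 1:
            labels[i] = near[0]
        else:
            a, b = near
            depth_a = heights[i] - lake_min[a]
            depth_b = heights[i] - lake_min[b]
            if min(depth_a, depth_b) < min_depth:
                # Absorb the shallower lake into the deeper one.
                keep, drop = (a, b) if depth_a >= depth_b else (b, a)
                labels = [keep if lab == drop else lab for lab in labels]
                lake_min[keep] = min(lake_min[keep], lake_min.pop(drop))
                labels[i] = keep
            else:
                labels[i] = 0   # both lakes are deep enough: build a dam
    return labels


profile = [3, 1, 2, 0, 2, 5, 1, 4]
print(watershed_1d_merged(profile, min_depth=0))  # [2, 2, 0, 1, 1, 0, 3, 3]
print(watershed_1d_merged(profile, min_depth=2))  # [1, 1, 1, 1, 1, 0, 3, 3]
```

With `min_depth=0` nothing is merged and every local minimum keeps its own lake; with `min_depth=2` the shallow lake on the left is absorbed into its deeper neighbour, while the dam at the high peak remains.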

It might also be the case, that we are searching for mountaintops, not valleys. This
