Automatic Pectoral Muscle Segmentation in Full-Field Digital Mammography Images Using Log-Gabor Filters


LUKAS SCHEER

AUTOMATIC PECTORAL MUSCLE SEGMENTATION IN FULL-FIELD DIGITAL MAMMOGRAPHY IMAGES USING LOG-GABOR FILTERS

Bachelor of Science thesis

Examiner: Dr. Said Pertuz


ABSTRACT

LUKAS SCHEER: Automatic Pectoral Muscle Segmentation in Full-Field Digital Mammography Images Using Log-Gabor Filters

Tampere University of Technology Bachelor of Science thesis, 21 pages May 2018

Master’s Degree Programme in Information Technology Major: Signal Processing

Examiner: Dr. Said Pertuz

Keywords: Mammogram, Pectoral Muscle, Segmentation, MLO, Log Gabor, Wavelet

Mammography image segmentation is one of the first steps taken by a computer-aided diagnosis system to find the region of interest for further processing and risk assessment. Improper segmentation can lead to false positives (identifying cancer when there is none) or false negatives (failing to identify cancer), both of which are undesirable outcomes.

Although many segmentation methods already exist, many of them are either slow to compute or unable to adapt to the variety of mammograms. In this thesis a new method based on log Gabor filters is proposed. The method uses two log Gabor filter banks to extract the muscle edge: a preliminary filter bank identifies the approximate orientation of the muscle edge, and the primary filter bank then identifies the pectoral muscle edge itself. The method is simple, quite fast and works well in a variety of mammograms.


PREFACE

The research work in this thesis was conducted for OpenBreast, an open framework currently in development between Tampere University of Technology and Universidad Industrial de Santander (Colombia). Additionally, the mammograms used for the evaluation were collected in collaboration with Tampere University Hospital for the purpose of clinical validation of OpenBreast. I thank all the involved organizations for making this project possible.

Mostly, I would like to express my sincere gratitude to Dr. Said Pertuz Arroyo for providing me with this interesting topic, and especially for his excellent guidance and unconditional help throughout the project.

Tampere, 1.5.2018 Lukas Scheer


LIST OF FIGURES

2.1 Terminology of mammograms used in this thesis
2.2 Different kinds of pectoral muscle edges
2.3 Showcasing typical difficulties
3.1 Illustrating the preprocessing steps
3.2 Creation of the pectoral muscle mask
3.3 Illustrating the edge extraction results
3.4 Visualization of the filter banks
3.5 Illustrating the post-processing stages
5.1 Illustrating partially incorrect results
5.2 Illustrating more extraction examples


LIST OF ABBREVIATIONS AND SYMBOLS

CADx   Computer-aided diagnosis
CC     Craniocaudal view; a mammogram taken from head-to-foot view
CLAHE  Contrast-limited adaptive histogram equalization; a method that locally evens the distribution of pixel intensities
DC     Component of the Fourier transform with no frequency
FFDM   Full-Field Digital Mammogram; a mammogram taken with a digital detector
MIAS   Mammographic Image Analysis Society database; a popular publicly available digitized film mammography image database
MIC    Maximum inscribed circle; the largest circle that fits a given space
MLO    Mediolateral oblique view; a mammogram taken at approximately 45° angle from the side
ROI    Region of interest; a sample that is selected for a particular purpose

Φ(x, y)    Edge map
ϕ(x, y)    Phase angle matrix
φ(x, y)    Edge mask (thresholded phase)
ρ(x, y)    Pectoral muscle mask
I(x, y)    Image matrix
I_f(x, y)  Filtered image matrix
Im(x, y)   Odd-symmetric part of the filtered image (imaginary)
J(x, y)    Shifted and weighted filtered image matrix
LG(F, θ)   Log Gabor filter (frequency domain)
lg(x, y)   Log Gabor filter (spatial domain)
m(x, y)    Magnitude matrix
Re(x, y)   Even-symmetric part of the filtered image (real)
σ          Modifying parameter
θ          Polar angle
ω          Activation vector
F          Radial frequency
f          Frequency


1. INTRODUCTION

Breast cancer is the second leading cause of cancer-related death in women [1]. However, early detection and treatment of breast cancer substantially increase the chance of survival [2]. It is therefore imperative to develop ways to aid early detection. Mammography is a commonly used screening method for the detection of breast cancer. In Finland, for example, screening mammography is typically performed every other year on women at risk. This has led to a massive growth in the number of mammograms. Nowadays, computer-aided diagnosis (CADx) is used by doctors to facilitate the analysis and interpretation of mammograms [3].

Mammography image segmentation is usually the first step taken by any CADx system, in order to identify the areas that are relevant to the task at hand. Here, the aim of a CADx system is to automatically find and analyze regions of interest (ROI) in the mammogram, where cancer is usually found. For this purpose, MLO (mediolateral oblique) viewed mammograms are especially useful as they capture more of the critical breast tissue, compared to CC (craniocaudal) viewed mammograms [4].

Issues in automatic diagnosis of MLO mammograms are especially caused by the pectoral muscle, which appears in the mammogram as a dense triangular region in the top corner, hindering the diagnosis and potentially leading to a false positive diagnosis [5]. Therefore, accurate segmentation of the pectoral muscle is an important step that will improve the performance and reliability of such systems. However, reliably finding the edge of the pectoral muscle is not always easy. Problems are caused by, among other things, inconsistencies in density, overlapping tissue, high density of the glandular tissue, and varying shapes of pectoral muscle edges.

In this thesis a new method for pectoral muscle segmentation is proposed. The method uses log Gabor filters to find real and illusory edges even in dense mammograms, where the edge is partially obscured by fibroglandular tissue. The method adapts to the various shapes and angles of the pectoral muscle. The proposed method solves the usual problem of intensity based methods, which have difficulties segmenting the obscured parts of the pectoral muscle edge.


2. PRIOR RELATED WORK

This chapter presents the current state of the art. The basic terminology related to mammograms that is used in this thesis is illustrated in Figure 2.1. To better understand the typical difficulties faced in pectoral muscle segmentation, Figure 2.2 illustrates the variety of muscle edge shapes, and Figure 2.3 illustrates some of the obstacles that a robust method needs to take into account.

2.1 Overview of Other Methods

Many solutions have been proposed for pectoral muscle segmentation during the past decades. Here, some of these methods are presented by dividing them into three categories, similarly to [6]: intensity, texture and soft computing methods.

Intensity based methods use the intensities of the mammogram directly to identify the pectoral muscle. These methods can be further divided into global, growing and gradient based methods. Global methods use different approaches to separate the pectoral muscle based on image intensities, such as histogram analysis [7] or the watershed transform [8]. Global methods are typically efficient, but may perform poorly in dense or fibrous mammograms. Growing methods measure the connection between neighboring regions to decide where the segment should grow in order to cover the pectoral muscle, for example [9]. Growing methods typically work better when the intensities change gradually, but they are sensitive to abrupt discontinuities that might end the process prematurely. Gradient based methods use the change in intensity over distance to determine the segmentation, for example [10]. Gradient based methods are typically robust, but lack adaptability, often supporting only a straight line approximation.

Texture based methods leverage different texture analysis techniques, such as wavelets [11], polynomial probabilities [12], and active contours [13]. Texture based methods vary the most among themselves, both in terms of performance and methodology. These methods are typically complex and relatively slow to compute.


Figure 2.1 Displaying the terminology related to mammograms that is used in this thesis. The image is a typical MLO viewed mammogram.

The last category consists of soft computing methods, including those based on machine learning [14] and fuzzy logic [15]. There is not much publicly available work in this category, but some recent studies have shown promise in terms of accuracy [14]. The difficulties are usually related to making the models work in a variety of situations.

2.2 Difficulty of Comparing with the State of the Art

Comparing with state of the art methods is difficult for several reasons. Notably, the technology used in mammography has developed quickly, and the switch from film to digital imaging is quite recent. For example, most of the work cited in the previous section has been evaluated on the MIAS database, which is a set of scanned film mammograms. The quality and nature of the mammograms depend on the screening type and vendor, so a method designed for a certain screening type or vendor may not work well with another. This makes it difficult to compare different methods, and also makes it difficult to advance the state of the art. Only a handful of the publicly available work has been done using modern full-field digital mammograms.

Figure 2.2 The distinct muscle edge types that are usually encountered. A) Straight edge, ideal angle. B) Convex edge. C) Concave edge. D) Complex edge. E) Shallow angle. F) Steep angle.

Figure 2.3 Typical obstacles faced in different mammograms. The pectoral muscle is seen in the top left corner of each example. A) Edge partially obscured by fibroglandular tissue. B) Edge intercepted by large vessels. C) Vertical fold in the muscle. D) Horizontal fold in the muscle. E) Faint edge. F) Most of the muscle edge obscured by fibroglandular tissue. G) Faint edge caused by particularly dense fibroglandular tissue.

Another problem is the lack of standardized scoring methods. Often the images are classified subjectively into those that are 'acceptable' and those that are 'unacceptable', possibly including intermediate categories. Such a classification is difficult to relate to, because what is acceptable depends on the rest of the pipeline. The quantifiable scoring methods that are typically used include FP (false positive) and FN (false negative) rates, Dice's coefficient, or the Jaccard index. The problem with those scoring methods is that the resulting score is normalized based on the area of the pectoral muscle (see Equations 4.1 and 4.2), which in many cases does not give a sound assessment of the quality of the segmentation. For example, if the pectoral muscle is oriented at a steep angle, and therefore has a large surface area, there is more tolerance to error than in a case where the pectoral muscle is at a shallow angle, and therefore has a smaller surface area, even if the length of the muscle edge is identical in both cases. Consider the edges shown in Figure 2.2A and E. If they were segmented with identical accuracy, the segment in E would get a lower score, simply because the surface area of the muscle is smaller.
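The area dependence described above can be illustrated with a small numerical sketch. It assumes the standard definition of the Jaccard index as intersection over union. The rectangular masks, the image size and the two-pixel boundary offset are hypothetical values chosen only for illustration and are not taken from the evaluation data.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def strip_mask(width, shape=(100, 100)):
    """Hypothetical muscle mask: a vertical strip `width` pixels wide."""
    mask = np.zeros(shape, dtype=bool)
    mask[:, :width] = True
    return mask

# Same edge length (100 rows) and same 2-pixel boundary error in both cases.
large = jaccard(strip_mask(50), strip_mask(52))   # large muscle area
small = jaccard(strip_mask(15), strip_mask(17))   # small muscle area
print(f"large area: {large:.3f}, small area: {small:.3f}")
```

With these toy masks the larger region scores about 0.96 and the smaller one about 0.88, although the boundary error is identical in both cases.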

2.3 Literature Highlight

The work of this thesis is conducted for OpenBreast, an open framework currently in development between Tampere University of Technology and Universidad Industrial de Santander (Colombia), for the computerized analysis of mammograms for breast cancer risk assessment, where mammography image segmentation is an important step of the pipeline. The current implementation for pectoral muscle detection works similarly to [16, 17]. This method uses a straight line approximation for the segmentation. The method first applies Canny edge detection to a morphologically dilated and Gaussian blurred image. Subsequently, each edge pixel is mapped into a parametric Hough accumulator space. Finally, the pectoral muscle edge is detected as a local maximum of the Hough space histogram, based on [18]. This method is used here as a baseline for measuring the performance of the proposed method.

In addition to the original algorithm used in OpenBreast, I selected the most promising method in the literature as a baseline. In particular, I selected [9], since it was very recently reported to outperform other state of the art methods. In that work, Taghanaki et al. propose a method based on geometric rules and region growing. The method is broken down into three steps: first find the breast contour, then use geometric rules to locate the initial area of the pectoral muscle, and lastly use a modified region growing algorithm to account for the different shapes of pectoral muscles.

In the first step they use CLAHE (Contrast Limited Adaptive Histogram Equalization) to improve the contrast of the image. A copy of the image is then binarized using a fixed threshold, and the breast contour is identified by using an edge detection algorithm. In the second step the maximum inscribed circle (MIC) is computed within the contour found in the first step. The starting point of the pectoral muscle is identified from the mean gray levels at the top of the mammogram. Then, a tangent is drawn from the MIC to the starting point at the top to form the region that should contain the pectoral muscle. Two support lines are offset from the tangent line to provide the ROI for the muscle edge. In the last step a modified region growing algorithm is used for the final segmentation to account for curved edges, where the muscle edge starting point is used as the seed for the algorithm.

In preliminary experiments, I identified some limitations of the method in [9]. To begin with, the geometric rules did not always provide a good initial ROI for the muscle edge. The size of the MIC depends on the shape, size and orientation of the breast. Especially in cases where the breast was larger or smaller than average, the MIC tended to provide poor support for the tangent line, causing the line to slope too steeply or too gently. Also, in a few examples where the breast was small enough, the MIC was placed too high to begin with. Additionally, there were difficulties in accurately finding the starting point of the pectoral muscle from the mean gray levels at the top. In many cases there was either another suitable starting point before the correct one, or the edge was too obscure to stand out in the mean, even with CLAHE applied. The region growing algorithm did not work well either, although it is possible that my implementation was not exact. It was often interrupted either by a highly contrasted vessel or glandular tissue, or by the pectoral muscle being cut by the support line, leaving the bottom inaccurately segmented. Other problems were caused by discontinuities in the muscle in some of the mammograms. In conclusion, there were too many issues for me to continue pursuing this method.

Another intensity based method was considered as well. In [19], Shrivastava et al. propose a sliding window method. In that method a window is slid across a morphologically modified mammogram until one of two conditions is met: either the intensity of the window becomes too low, or the corner-to-corner intensity difference of the window becomes too big. Then the window is lowered and the process starts again until the bottom is reached. The method is very fast and the results are quite accurate in ideal cases, where the breast consists primarily of fatty tissue. However, the method quickly fails when, for example, there are discontinuities in the pectoral muscle. The method is also unable to account for the varying intensities present among full-field digital mammograms.


3. PROPOSED METHOD

In this chapter the procedure of the proposed method is explained in detail. The steps are roughly divided into three sections, starting from the preprocessing of the mammogram, then moving to the extraction of the edge map of the pectoral muscle, and ending with the post-processing steps that identify the muscle edge from the edge map.

3.1 Preprocessing

While it is difficult to account for all types of mammograms, preprocessing can be utilized in order to facilitate subsequent tasks. In this work, preprocessing is performed in four steps: size standardization, histogram equalization, background extraction and mask initialization. Three of the steps are illustrated in Figure 3.1.

Mammograms come in a variety of sizes. Additionally, the contrast of the mammogram depends on the vendor and type of screening method. As the first preprocessing step, size standardization, the images are resized to a standard spatial resolution of 0.25 mm/pixel. The resizing is done using the bilinear method. Subsequently, histogram equalization is performed using CLAHE, in the same way as it is done in [9]: the distribution of the histogram is set to 'uniform', clipping is limited to 0.01, and the number of tiles used is 2×2.

The background of mammography images is a large uniform low intensity region around the breast. Sometimes there are also scanning labels or artifacts in the background, but otherwise it should be void. For background extraction, the mammogram is linearly decimated to 32 distinct values, in order to eliminate any unevenness in the background. The 25th percentile of the intensities is then used as a threshold to binarize the mammogram, and only the largest region is kept. The holes in the binary region are filled, including holes that are open against the top and left edge of the mammogram. The edge of the contour is then smoothed using morphological operations.
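The first two operations of the background extraction, quantization and percentile thresholding, can be sketched as follows. This is a minimal illustration under assumptions, not the implementation used in the thesis: the function name and the toy image are hypothetical, rounding is assumed for the linear quantization, and the later steps (keeping the largest region, hole filling and morphological smoothing) are left out.

```python
import numpy as np

def rough_breast_mask(image, levels=32, percentile=25):
    """Quantize to `levels` gray values and threshold at a percentile."""
    img = np.asarray(image, dtype=float)
    span = img.max() - img.min()
    if span == 0:
        return np.zeros(img.shape, dtype=bool)  # flat image: nothing to separate
    # Linear quantization to integer levels 0 .. levels-1 (rounding is an assumption).
    quantized = np.round((img - img.min()) / span * (levels - 1))
    threshold = np.percentile(quantized, percentile)
    return quantized > threshold

# Toy image: bright "breast" in the three left columns, slightly uneven
# dark background (values 0.01 and 0.0) everywhere else.
toy = np.full((8, 8), 0.01)
toy[:, 5:] = 0.0
toy[:, :3] = 0.8
mask = rough_breast_mask(toy)
```

In this toy case the quantization merges the two background values into a single level, so the percentile threshold separates the background cleanly from the bright region.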

Figure 3.1 Illustrating the primary preprocessing steps. A) Original mammogram standardized to 0.25 mm/pixel. B) The mammogram after adaptive histogram equalization. C) Initial pectoral muscle ROI.

As a final preprocessing step, mask initialization, a rough estimate of the ROI of the pectoral muscle is generated. It is used for the elimination of irrelevant information during the extraction of the muscle edge. The mask also prevents over-segmentation and reduces computational overhead. The construction of the pectoral muscle mask is illustrated in Figure 3.2. In that figure, the initial pectoral muscle mask is defined as the area enclosed by lines OA, AF, FD and DO. The accuracy of the process is improved by slightly rotating the contour points around point A, after which point B can be easily located as the minimum horizontal value of the contour points below the horizontal maximum. The pectoral muscle mask is denoted as ρ(x, y), for later use.
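The geometric construction of the mask polygon can be sketched in a few lines, using the point definitions given in the caption of Figure 3.2. The sketch assumes image coordinates (x, y) with the origin O in the top left corner, and it interprets "0.8C" and "1.5A" as scaling the coordinates of C and A from the origin. The function names and the example coordinates are hypothetical, and the rotation of the contour points around A is not included.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4, points given as (x, y)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)

def muscle_mask_polygon(a, b):
    """Vertices O, A, F, D of the initial mask from contour points A and B."""
    o = (0.0, 0.0)
    c = (0.0, b[1])                   # left image border, same height as B
    d = (0.8 * c[0], 0.8 * c[1])      # assumed meaning of "0.8C"
    e = (1.5 * a[0], 1.5 * a[1])      # assumed meaning of "1.5A"
    f = line_intersection(a, b, d, e)
    return [o, a, f, d]

# Hypothetical contour points: A on the top border, B further down the contour.
polygon = muscle_mask_polygon(a=(100.0, 0.0), b=(40.0, 200.0))
```

Rasterizing the polygon into the binary mask ρ(x, y) would require a polygon filling routine, which is omitted here.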

3.2 Extracting the Pectoral Muscle Edge

The extraction of the pectoral muscle edge is done by utilizing log Gabor filters. However, to make the method more robust to the various angles that MLO mammograms are screened at, a preliminary adaptive method finds the optimal orientations of the filters used for the muscle edge map extraction. The intermediate results of the steps are illustrated in Figure 3.3.

Figure 3.2 Illustrating the construction of the pectoral muscle mask. The mammogram has been shaded according to the mask for clarity. O) Origin. A) The first contour point at the top of the mammogram. B) The closest contour point to the breast in the local minimum of the contour points at the bottom of the mammogram. C) The point against the left side of the mammogram, with identical height as B. D) Equivalent to point 0.8C. E) Equivalent to point 1.5A. F) Intersection of lines AB and DE.

3.2.1 Log Gabor Filters

Gabor filters are selected for the purpose of extracting the pectoral muscle edge because of their ability to separate directional features: the pectoral muscle consists of fibers oriented in the direction of the muscle edge, while fibroglandular tissue is typically oriented at a tangent to the edge, towards the nipple. For this reason, Gabor filters can efficiently separate the two, and find the edge more reliably than other edge detection techniques. Specifically, log Gabor filters are selected here because they solve some of the limitations that make normal Gabor filters unfit for this application. Normal Gabor filters, in the frequency domain, are defined simply as an offset Gaussian. As a result, Gabor filters at the lower frequencies bleed significantly into other orientations, resulting in poor angular separation. By definition, log Gabor filters are restricted directionally by an angular component. Because the proposed method relies on low frequency filters, and because angular separation is especially important, log Gabor filters were selected, as their behavior is more easily controlled in this region.


Figure 3.3 Illustrating the pectoral muscle edge map extraction results. A) Correlations between each pair of filters in the preliminary step. As can be seen, the first pair shows the most correlation, hence the first four filters are used from the primary filter bank. B) Odd-symmetric (edge sensitive) part of the combined filtered images of the primary filter bank. C) Edge mask created from the thresholded phase. D) Edge map created from the combination of B and C.

In the spatial domain, Gabor filters are expressed as a complex sinusoid modulated by a Gaussian. Log Gabor filters differ primarily by a warping factor, which causes the sinusoid to warp in or out from the center. Consequently, log Gabor filters do not have an analytic expression in the spatial domain, and they are constructed in the frequency domain by combining a log Gabor band-pass filter with a Gaussian modulated radial mask. Additionally, a low-pass filter is applied to guarantee a uniform behavior of the filters at all angles and frequencies, according to [20].

The log Gabor filters are constructed in the frequency domain as:

\[
LG(F, \theta) = \exp\left(-\frac{\left(\log(F/f)\right)^2}{2\left(\log \sigma_f\right)^2}\right) \exp\left(-\frac{\theta^2}{2\sigma_\theta^2}\right) \tag{3.1}
\]

\[
lg(x, y) = \mathrm{IFT}\left(LG(F, \theta)\right) \tag{3.2}
\]

where F is the polar radius normalized to unit radius, θ is the rotated polar angle, f is the peak frequency of the band-pass component, which affects the size of the filter, σ_f is the width parameter of the band-pass component, affecting the bandwidth of the filter, and σ_θ is the radial parameter, affecting the angular precision of the filter. lg(x, y) is the log Gabor filter in the spatial domain.
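As an illustration of Equations 3.1 and 3.2, the following NumPy sketch builds one filter in the frequency domain and applies it by multiplication with the Fourier transform of the image. It is a sketch under assumptions rather than the thesis implementation: the function names are hypothetical, the frequency coordinates are taken in cycles per pixel from `np.fft.fftfreq` (the exact radius normalization used in the thesis may differ), the orientation sign convention is not verified against the figures, and the additional low-pass filter from [20] is omitted.

```python
import numpy as np

def log_gabor(shape, f=0.025, sigma_f=0.55, sigma_theta=0.175, angle_deg=0.0):
    """Frequency-domain log Gabor filter following Equation 3.1 (no low-pass term)."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequency, cycles/pixel (assumed units)
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequency, cycles/pixel (assumed units)
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                   # placeholder to avoid log(0) at DC
    radial = np.exp(-np.log(radius / f) ** 2 / (2 * np.log(sigma_f) ** 2))
    radial[0, 0] = 0.0                   # the filter has no DC component
    # Angle of each frequency sample relative to the filter orientation,
    # wrapped to the range [-pi, pi].
    delta = np.arctan2(fy, fx) - np.deg2rad(angle_deg)
    theta = np.arctan2(np.sin(delta), np.cos(delta))
    angular = np.exp(-theta ** 2 / (2 * sigma_theta ** 2))
    return radial * angular

def apply_filter(image, lg_freq):
    """Filter by multiplication in the frequency domain; the result is complex."""
    return np.fft.ifft2(np.fft.fft2(image) * lg_freq)

lg_freq = log_gabor((256, 256))
filtered = apply_filter(np.ones((256, 256)), lg_freq)  # constant image
```

Because the filter has no DC component, the constant test image produces a zero response. Since the filter covers only one side of the frequency plane, the filtered image is complex, with the real and imaginary parts corresponding to the even-symmetric and odd-symmetric responses.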

An image filtered by a log Gabor filter is expressed as:

\[
I_f(x, y) = I(x, y) * lg(x, y) = Re(x, y) + i \cdot Im(x, y) \tag{3.3}
\]

where I_f(x, y) is the complex filtered image, I(x, y) is the original image, ∗ is the convolution operator, Re(x, y) is the even-symmetric part of the filtering and Im(x, y) is the odd-symmetric part of the filtering.

One of the essential features of Gabor filters is the ability to extract the phase and magnitude of the image. In the method proposed in this thesis, the phase of the image is used to locate edges in the mammogram, while the magnitude and imaginary component are used as measures of strength of the edge. The filtered image can be written in exponential terms of phase and magnitude as:

\[
I_f(x, y) = m(x, y) \exp\left(i \cdot \phi(x, y)\right) \tag{3.4}
\]

where m(x, y) is the magnitude and ϕ(x, y) is the phase angle, both defined here as:

\[
m(x, y) = \sqrt{Re^2(x, y) + Im^2(x, y)} \tag{3.5}
\]

\[
\phi(x, y) = \arg\left(I_f(x, y)\right) \tag{3.6}
\]

where arg(z) ∈ (−π, π] is the complex argument, or the four-quadrant arctangent.

The filtered images used in the proposed method are additionally shifted and weighted by their own magnitude, to better emphasize strong edges, applied through complex multiplication:

\[
J(x, y) = I_f(x, y) \circ \left(m(x, y) + i \cdot m(x, y)\right) \tag{3.7}
\]

This is especially useful in the construction of the final filtered image, as it highlights where the filters correlate the most.
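Equations 3.5 to 3.7 translate directly into element-wise NumPy operations. The sketch below is only an illustration; the function name is hypothetical and the input is assumed to be a complex filtered image such as the one in Equation 3.3.

```python
import numpy as np

def weighted_response(i_f):
    """Return magnitude m, phase phi and the weighted image J of Equation 3.7."""
    m = np.abs(i_f)            # Equation 3.5
    phi = np.angle(i_f)        # Equation 3.6, four-quadrant arctangent
    j = i_f * (m + 1j * m)     # Equation 3.7, element-wise product
    return m, phi, j

m, phi, j = weighted_response(np.array([[3 + 4j]]))
```

Since m + i·m = √2·m·exp(iπ/4), the multiplication scales each magnitude by √2·m and shifts each phase by π/4. For the single sample 3 + 4i the magnitude is 5, and the weighted value is (3 + 4i)(5 + 5i) = −5 + 35i.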

3.2.2 Extracting the Edge Map of the Mammogram

The pectoral muscle edge is located in the mammogram by filtering it with a bank of log Gabor filters. In the proposed method, the pectoral muscle edge is identified from the phase and odd-symmetric part of the sum of the filtered images: the phase is thresholded to form the edge mask, which displays potential edges in the image, and the mask is then applied to the odd-symmetric part of the filtered image to form the edge map of potential edges and their strengths. However, the phase is sensitive to change, and it is easily distorted when the final result is computed across the wide range of filters used to account for the different orientations of the pectoral muscle. To tackle this problem, before the edge map is actually extracted, the pectoral muscle orientation is approximated at several orientations to decide the optimal filters to use for the actual edge map extraction. For this purpose, a preliminary filter bank is created for the sole purpose of approximating the pectoral muscle edge orientation. The result of the approximation is an activation vector ω that controls which filters of the primary filter bank are used for the edge map extraction. Both filter banks are visualized in Figure 3.4. Because the spatial resolution of the mammograms has been standardized, a single-scale filter bank is sufficient, and it is also computationally faster than a multi-scale filter bank. For example, multi-scale phase congruency [21] was considered, but it did not appear to improve the edge detection compared to the proposed method, and it is also slower to compute.

Figure 3.4 Visualizing the half-peak magnitudes of the filter banks used in this method. The small circle indicates DC. A) The preliminary filter bank used for approximating the orientation of the pectoral muscle edge. Dark area indicates primary overlap of the filters. B) Displaying only the 1st, 5th and 9th filters used for the actual edge extraction; the rest of the filters are spanned evenly in between.

The filters in each filter bank generously overlap their neighbors. The reason for this design choice is that prominent edges in the filtered image of one filter are more likely to persist in the next filtered image. The idea is then to use each subsequent overlapping filter as a validation for the edges in the previous filter. This way, the overlapping filters can collectively identify the most prominent edges in the mammogram. The filter bank used for the actual extraction is especially dense, to ensure that a smooth edge is extracted, while the filters of the preliminary filter bank only overlap enough for them to efficiently find the correlation of edges between a pair of subsequent filters.

The five filters of the preliminary filter bank have approximately 50% overlap with each neighboring filter. The filters span from 9° to 57° in increments of 12 degrees. The log Gabor parameters for these filters are: f = 0.025, σ_f = 0.74, σ_θ = 0.175.

The primary filter bank consists of 10 filters with approximately 75% overlap with each neighbor. These filters span from 6° to 60° in increments of 6 degrees. The log Gabor parameters for these filters are: f = 0.025, σ_f = 0.55, σ_θ = 0.175. The filters in the two filter banks differ only by the bandwidth parameter σ_f, which controls the sharpness of the filter. According to Kovesi [20], a width parameter of 0.74 corresponds to a filter of approximately one octave, and likewise, a width parameter of 0.55 corresponds to approximately two octaves. In the context of this work, the sharpness affects the exactness of the edge detection. A filter of one octave does not follow the edge exactly, but is also less affected by noise. A filter of two octaves is more affected by noise, but follows the edge more exactly. By using the preliminary filters to approximate the optimal orientations, the effect of noise on the primary filters is decreased. The value of σ_θ was selected analytically, considering angular precision and overlap. That leaves the size parameter f as the only variable. Here, f was selected by testing different values across a subset of mammograms, and choosing the optimal value based on overall stability and individual accuracy.

The approximate orientation of the pectoral muscle edge is computed from the correlation between each pair of subsequent filters of the preliminary filter bank. In total, four correlation scores are considered, which are then used to construct the activation vector ω, which controls the filters that are used for the actual extraction of the edge. The correlation between two filters is computed from the Jaccard index of the edge masks of the pair of filters. The Jaccard index is defined in Equation 4.1, and the edge mask is constructed from the thresholded phase as:

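\[
\varphi(x, y) =
\begin{cases}
1, & \phi(x, y) \geq \frac{\pi}{2} \\
0, & \text{otherwise}
\end{cases}
\tag{3.8}
\]

The scores are then calculated from:

\[
S(n) = \mathrm{Jaccard}\left(\varphi_n(x, y), \varphi_{n+1}(x, y)\right), \quad n = 1, 2, 3, 4 \tag{3.9}
\]

where n denotes the filter index in the preliminary filter bank.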
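A minimal sketch of Equations 3.8 and 3.9 is given below. It assumes the standard intersection-over-union definition of the Jaccard index and takes a list of phase images, one per preliminary filter, as input. The function names are hypothetical, and returning 0 for two empty masks is a convention chosen for this sketch.

```python
import numpy as np

def edge_mask(phase):
    """Equation 3.8: threshold the phase angle at pi/2."""
    return phase >= np.pi / 2

def jaccard(a, b):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def correlation_scores(phases):
    """Equation 3.9: Jaccard index of the edge masks of neighboring filters."""
    masks = [edge_mask(p) for p in phases]
    return [jaccard(masks[n], masks[n + 1]) for n in range(len(masks) - 1)]

# Three tiny hypothetical phase images give two scores.
phases = [np.array([[2.0, 2.0, 0.0, 0.0]]),
          np.array([[2.0, 0.0, 0.0, 0.0]]),
          np.array([[0.0, 0.0, 0.0, 0.0]])]
scores = correlation_scores(phases)
```

With the five filters of the preliminary filter bank, the same function yields the four scores S(1) to S(4).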

Subsequently, the threshold for the activation of the scores is computed from:

\[
T = \max(S) - \left(\max(S) - \mathrm{mean}(S)\right) \cdot 0.5 \tag{3.10}
\]

And finally, the activation vector is constructed from the scores as:

\[
\omega(u) =
\begin{cases}
1, & \left(1 + 2 \cdot (n_a - 1)\right) \leq u \leq \left(2 + 2 \cdot n_b\right) \\
0, & \text{otherwise}
\end{cases},
\quad u = 1, 2, \ldots, 10 \tag{3.11}
\]

where n_a is the first index n where S(n) ≥ T, and likewise n_b is the last index n where S(n) ≥ T. Parameter u represents the index of the filter in the primary filter bank. Intuitively, each score in S(n) controls a set of four indexes u in ω, overlapped by neighboring indexes, according to:

S(1) ≥ T:  ω₁, ω₂, ω₃, ω₄
S(2) ≥ T:  ω₃, ω₄, ω₅, ω₆
S(3) ≥ T:  ω₅, ω₆, ω₇, ω₈
S(4) ≥ T:  ω₇, ω₈, ω₉, ω₁₀

If S(1) ≥ T ∧ S(4) ≥ T, then all filters are active. ω is now the binary representation of the filters that are used for the extraction process.
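Equations 3.10 and 3.11 can be sketched in plain Python as follows. The function name and the example scores are hypothetical; the threshold and the index ranges follow the equations above.

```python
def activation_vector(scores, n_filters=10):
    """Build the binary activation vector from the correlation scores."""
    peak = max(scores)
    mean = sum(scores) / len(scores)
    threshold = peak - (peak - mean) * 0.5            # Equation 3.10
    # 1-based indexes n whose score reaches the threshold.
    active = [n for n, s in enumerate(scores, start=1) if s >= threshold]
    n_a, n_b = active[0], active[-1]
    lower, upper = 1 + 2 * (n_a - 1), 2 + 2 * n_b     # Equation 3.11
    return [1 if lower <= u <= upper else 0 for u in range(1, n_filters + 1)]

omega_first = activation_vector([0.8, 0.5, 0.3, 0.2])
omega_middle = activation_vector([0.2, 0.6, 0.7, 0.1])
```

In the first example only S(1) reaches the threshold (T = 0.625), so the first four filters are activated. In the second example S(2) and S(3) reach the threshold (T = 0.55), which activates filters 3 to 8. The maximum score always reaches the threshold, so at least four filters are active.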

The edge map is then extracted from the resulting image H(x, y), constructed by using the primary filter bank, controlled by the activation vector ω. H(x, y) is computed from the matrix addition of all the filtered images as:

\[
H(x, y) = \sum_{u=1}^{10} \left(\omega_u J_u(x, y)\right) \tag{3.12}
\]

The edge map, Φ(x, y), is now formed from the thresholded phase of the result, φ_H(x, y), where masked areas of the edge mask correspond to the areas where the filtering has identified potential edges. The mask is then applied to the odd-symmetric part of the result, Im_H(x, y), which is the part of the filter that is sensitive to edges, where the intensities correspond to edge strength. This is also where the pectoral muscle mask, ρ(x, y), is applied:

\[
\Phi(x, y) = \varphi_H(x, y) \circ Im_H(x, y) \circ \rho(x, y) \tag{3.13}
\]

The final edge is then extracted from the edge map in the post-processing stages.
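The combination of Equations 3.12 and 3.13 can be sketched with NumPy as below. The function name and the toy inputs are hypothetical; `weighted_images` stands for the weighted responses J_u of the primary filter bank, `omega` for the activation vector, and `muscle_mask` for ρ(x, y).

```python
import numpy as np

def edge_map(weighted_images, omega, muscle_mask):
    """Sum the active weighted responses and mask the odd-symmetric part."""
    # Equation 3.12: weighted sum over the filter bank.
    h = sum(w * j for w, j in zip(omega, weighted_images))
    # Thresholded phase of H, using the same rule as Equation 3.8.
    phase_mask = np.angle(h) >= np.pi / 2
    # Equation 3.13: element-wise product of mask, imaginary part and muscle mask.
    return phase_mask * np.imag(h) * muscle_mask

# Two tiny hypothetical responses with two pixels each.
j1 = np.array([[-1 + 1j, 1 + 0j]])
j2 = np.array([[0 + 1j, 0 + 1j]])
phi_map = edge_map([j1, j2], omega=[1, 1], muscle_mask=np.array([[1, 1]]))
```

In the toy example the first pixel of H has a phase above π/2 and keeps its imaginary part as edge strength, while the second pixel is suppressed by the phase threshold.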


Figure 3.5 Illustrating the post-processing stages. A) Percentile-clipped edge map. B) Binarized edge map with the found edge plotted in red. C) Final result after polynomial fitting. Red: second order polynomial constrained to converge towards the top right. Green: third order polynomial fitted along the edge. Blue: second order polynomial constrained to converge towards the bottom left. The top and bottom parts are each 25% of the edge length.

3.3 Post-Processing

Typically the pectoral muscle edge is visible in the edge map as the strongest line. However, especially in dense mammograms, or mammograms where there are other prominent features, such as folds in the direction of the muscle, there may be several edges to choose from. In the final steps of the proposed method, the edge map is first processed further to identify the muscle edge, and then the edge is smoothed and completed by fitting a robustly constrained polynomial model on the edge. The stages in the post-processing are illustrated in Figure 3.5.

The pectoral muscle edge should span the corner of the mammogram from the left edge to the top edge as much as possible. It is also assumed that the pectoral muscle edge is the longest edge when several edges span the corner. The edge should also have a uniformly high intensity. However, the edge map may sometimes contain abnormally high values, for example due to artifacts in the mammogram. To make the method more robust, values at both extremes are eliminated. Using all of the non-zero values of the edge map, the 5th percentile is computed, and all values below that threshold are clipped to it. Then, using all of the values except those that were clipped, the 75th percentile is computed and values above it are clipped to it. Doing so prevents noise and exceptionally strong features from dominating the process. The values of the map are then normalized to the range [0, 1].
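The two-sided percentile clipping described above could be sketched as follows. This is only an illustrative sketch in Python with NumPy: the function name `clip_and_normalize` is hypothetical, the edge map is assumed to hold non-negative values with zeros as background, and the final min-max normalization over the whole map is one possible reading of the normalization step.

```python
import numpy as np

def clip_and_normalize(edge_map, low_pct=5.0, high_pct=75.0):
    """Clip extreme edge-map values and rescale the map to [0, 1]."""
    out = np.asarray(edge_map, dtype=float).copy()
    nonzero = out != 0
    if not nonzero.any():
        return out  # nothing to clip in an empty edge map

    # Lower threshold: 5th percentile of all non-zero values
    low = np.percentile(out[nonzero], low_pct)

    # Upper threshold: 75th percentile of the values that were NOT clipped from below
    kept = nonzero & (out >= low)
    high = np.percentile(out[kept], high_pct)

    # Clip only the non-zero (edge) values; the background stays at zero
    out[nonzero] = np.clip(out[nonzero], low, high)

    # ASSUMPTION: plain min-max normalization over the whole map
    span = out.max() - out.min()
    if span > 0:
        out = (out - out.min()) / span
    return out
```

Because the upper percentile is computed only from the values that survived the lower clipping, a large number of weak responses does not pull the upper threshold down.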

Next, the edge map is adaptively binarized using Bradley’s thresholding method [22], where each pixel is binarized within a local window using a threshold based on T percent of the local mean. Here T = 0.1, and the size of the window is approximately 1/8 of the size of the edge map. In general, several edges can be found in the image after the binarization step. As a result, it is necessary to select the one most likely to correspond to the real pectoral muscle edge. For this purpose, the regional properties are analyzed to find the best edge. The properties used here are the area of the region, the bounding box, the mean and maximum intensity of the region, and the length of the major axis. The major axis is computed as the longest axis of an ellipse that has the same second moments as the region.
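For reference, the adaptive binarization step can be sketched with an integral image in the spirit of Bradley and Roth [22]. The sketch below is an assumption-laden illustration rather than the code used in this work: the function name `bradley_threshold` is hypothetical, and T is read here as a fraction, so that a pixel is kept when it exceeds (1 − T) times the mean of its local window. The implementation used for the experiments may interpret T differently.

```python
import numpy as np

def bradley_threshold(image, window, t=0.1):
    """Binarize `image`: True where a pixel exceeds (1 - t) * local window mean."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    half = window // 2

    # Integral image with a leading row and column of zeros
    integral = np.zeros((rows + 1, cols + 1))
    integral[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

    # Window bounds for every row and column, clipped at the image border
    r = np.arange(rows)
    c = np.arange(cols)
    r0, r1 = np.clip(r - half, 0, rows), np.clip(r + half + 1, 0, rows)
    c0, c1 = np.clip(c - half, 0, cols), np.clip(c + half + 1, 0, cols)

    # Sum over each window from four integral-image lookups
    window_sum = (integral[r1][:, c1] - integral[r0][:, c1]
                  - integral[r1][:, c0] + integral[r0][:, c0])
    count = (r1 - r0)[:, None] * (c1 - c0)[None, :]
    local_mean = window_sum / count

    return img > local_mean * (1.0 - t)
```

With a window of roughly 1/8 of the edge map size, the call would look like `bradley_threshold(edge_map, window=max(edge_map.shape) // 8)`, where the choice of the larger dimension is again only an assumption.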

To find the best edge, the regions are given a score based on their properties. The score should favor regions that are long, span the mammogram from edge to edge, and have a uniformly high intensity. As such, the scores in this work are computed by a heuristic defined as:

$$\mathrm{Score} = \frac{\max(40,\; D_{top} + D_{left})}{L^{\,(0.5 + 0.5 \cdot I_{mean} + I_{max})}}$$

where D_top and D_left are the distances from the top and left edges of the mammogram to the bounding box of the region, respectively, L is the major axis length of the region, and I_mean and I_max are the mean and maximum intensities of the region. The region with the lowest score is selected as the edge. The distance sum is given a minimum value of 40 (10 mm), so that cases where the correct edge does not quite span the entire length are better taken into account.
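A small Python sketch of the scoring heuristic is given below. It reads the heuristic as a ratio, with the clamped distance sum in the numerator and the major axis length raised to an intensity-dependent exponent in the denominator, which is consistent with the lowest score being selected. The function name `region_score` and its argument names are hypothetical.

```python
def region_score(d_top, d_left, major_axis_length,
                 mean_intensity, max_intensity, min_distance=40.0):
    """Score a candidate region; lower is better.

    Long regions with high intensity that reach the top and left
    borders of the mammogram receive the lowest scores.
    """
    # Distance sum, clamped from below so near-spanning edges are not penalized
    distance = max(min_distance, d_top + d_left)
    # Intensities are assumed to be normalized to [0, 1]
    exponent = 0.5 + 0.5 * mean_intensity + max_intensity
    return distance / major_axis_length ** exponent
```

Given a list of candidate regions, the edge would then be chosen with something like `min(candidates, key=lambda r: region_score(*r))`, where each candidate is a tuple of the five properties in the order above.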

Next, the edge is taken along the right side of the region, which roughly corresponds to the peak slope of the edge in the phase. The edge found this way is typically quite accurate; however, it may contain noise or have missing parts at its ends. For this reason, a polynomial model is fitted to the edge. Ideally, the model should be able to fit a complex pectoral muscle edge (see Figure 2.2D). This is accomplished by approximating the edge with three piecewise low-order polynomials, as illustrated in Figure 3.5C.
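The piecewise fitting can be illustrated with the simplified sketch below, which fits a second order polynomial to each end of the edge and a third order polynomial to the middle part using ordinary least squares. It deliberately omits the convergence constraints on the end pieces, so it is not the constrained model used in this work. The function name `fit_three_piece`, the parameterization of the edge as column = f(row), and the split by point count rather than by arc length are all assumptions of this example.

```python
import numpy as np

def fit_three_piece(rows, cols, end_fraction=0.25):
    """Fit quadratic / cubic / quadratic pieces to an edge given as col = f(row).

    Unconstrained least-squares sketch; returns the sorted rows and fitted columns.
    """
    rows = np.asarray(rows, dtype=float)
    cols = np.asarray(cols, dtype=float)
    order = np.argsort(rows)
    rows, cols = rows[order], cols[order]

    n = rows.size
    k = int(round(end_fraction * n))  # number of points in each end piece
    if k < 3 or n - 2 * k < 4:
        raise ValueError("too few edge points for the three pieces")

    # (slice of points, polynomial degree) for the top, middle and bottom parts
    pieces = [(slice(0, k), 2), (slice(k, n - k), 3), (slice(n - k, n), 2)]

    fitted = np.empty(n)
    for part, degree in pieces:
        coeffs = np.polyfit(rows[part], cols[part], degree)
        fitted[part] = np.polyval(coeffs, rows[part])
    return rows, fitted
```

Because the pieces are fitted independently here, the result is not guaranteed to be continuous at the joints; enforcing that, together with the convergence constraints, would require a constrained least-squares formulation.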


4. EXPERIMENTAL RESULTS

4.1 Dataset and Performance Measures

The dataset used for the evaluation was collected with the collaboration of Tampere University Hospital (TAYS) for the purpose of clinical validation of OpenBreast.

For this work, a subset of the original dataset was considered: images corresponding to 50 patients of screening mammography at TAYS were used. The pectoral muscle was manually segmented by a resident radiologist, and the segmentations were subsequently reviewed by a radiologist with more than 20 years of experience. By considering the MLO views from both breasts (left and right), a total of 109 images with manual segmentations were used in the experiments. Of the subset, 2 mammograms have no visible pectoral muscle.

To evaluate the performance of the segmentation, three scores are computed: the Jaccard index [23] and Dice’s coefficient [24], as well as the mean distance (or error) between the lines in millimeters. The mean distance is favored here as a performance measure, as it is only affected by the accuracy of the segmentation and is normalized by the spatial resolution of the mammogram. It should therefore be easier to compare against in subsequent evaluations of other methods.

Jaccard index is computed from:

$$\mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{4.1}$$

and Dice’s coefficient is computed from:

$$\mathrm{Dice}(A, B) = \frac{2|A \cap B|}{|A| + |B|} \tag{4.2}$$

where A and B are the areas enclosed by the two segmentation lines being compared (the result and the ground truth), each together with the top and left edges of the mammogram. The mean distance is computed by finding the shortest distance from each point of the segmentation line to the ground truth (the ground truth clips to zero at the edge), and then taking the mean of the distances in millimeters.
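The three measures can be sketched in plain Python as follows. The function names, the representation of each area as a set of (row, column) pixel coordinates, and the `pixel_spacing_mm` parameter are assumptions made for this illustration. Returning 1.0 when both areas are empty (for example, when no pectoral muscle is visible and none is found) is likewise only one possible convention, and the clipping of the ground truth at the image edge is not modeled.

```python
from math import hypot

def jaccard(a, b):
    """Jaccard index (Eq. 4.1) of two pixel sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dice(a, b):
    """Dice's coefficient (Eq. 4.2) of two pixel sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def mean_distance_mm(seg_line, truth_line, pixel_spacing_mm):
    """Mean shortest distance from each segmentation point to the ground truth."""
    nearest = [
        min(hypot(r - tr, c - tc) for tr, tc in truth_line)
        for r, c in seg_line
    ]
    # Convert from pixels to millimeters using the spatial resolution
    return pixel_spacing_mm * sum(nearest) / len(nearest)
```

Note that the mean distance as defined here is one-directional: it measures how far the segmentation strays from the ground truth, not the reverse.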


4.2 Results

In total, 109 mammograms were evaluated. The results for the proposed method and the segmentation algorithms used for comparison are listed in Table 4.1. The results show that the proposed method was by far the most accurate. It is also evidently quite reliable, as the scores are good even at the bottom 25th percentile. The sliding window algorithm and the Hough based method performed about evenly based on the scores. The sliding window algorithm is typically more accurate than the Hough based method; however, it is also less reliable, because the window is not constrained in any way. The segmentation of the sliding window algorithm is typically either precise or poor, while the segmentation of the Hough based method is more consistent, but rarely accurate.

Additionally, the Mann-Whitney U test indicates that the mean distance distributions of the two comparison methods differ significantly from that of the proposed method (at the 5% significance level).

Method            Hough Based [17, 16]    Sliding Window [19]     Proposed
Percentile        25          50          25          50          25       50
Jaccard (%)       53.52       65.16       53.64       74.93       86.20    90.02
Dice (%)          69.73       78.90       69.82       85.67       92.59    94.75
Mean Dist. (mm)   13.35       9.00        7.12        4.18        2.90     1.83
Mann-Whitney U    2.2523e-25              6.8661e-09              −

Table 4.1 Segmentation results.


5. DISCUSSION AND CONCLUSION

In this thesis, a new method for automatic pectoral muscle segmentation using log Gabor filters is proposed. The method works quite reliably: in most cases the segmentation closely follows the edge of the muscle and the ground truth. In cases where the muscle is segmented incompletely, the polynomial fitting is typically able to compensate by completing the edge. Some of the problems of the proposed method are related to steep complex edges, where the edge angle is negative. This sometimes causes a problem in the filtering process, where the diagonal edges overpower the vertical edges, effectively splitting the muscle edge in half (see, for example, Figure 5.1). Other issues can be caused by strong divergent features, such as folds, that may confuse the orientation approximation. This can be avoided by restricting the orientations that are considered, based on the approximate angle of the breast in the mammogram. Another limitation is breasts in which the fibroglandular tissue is extremely dense and covers most of the edge, as the method relies on the edge being at least partially visible. Some more segmentation examples are shown in Figure 5.2.


Figure 5.1 Illustrating some partially incorrect results of the proposed method. Green lines are the ground truth, pink lines are the segmentation. A-C): segmentation results of a mammogram; Hough based, sliding window and proposed method respectively. D-F): Another example.

Figure 5.2 Illustrating some more segmentation examples. Green lines are the ground truth, pink lines are the segmentation. A-C): segmentation results of a mammogram; Hough based, sliding window and proposed method respectively. D-F): Another example.


BIBLIOGRAPHY

[1] National Breast Cancer Foundation INC., “Breast Cancer Facts - National Breast Cancer Foundation.” [Online]. Available: http://www.nationalbreastcancer.org/breast-cancer-facts

[2] Office for National Statistics, “Cancer survival by stage at diagnosis for England (experimental statistics) - Office for National Statistics.” [Online]. Available: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/cancersurvivalbystageatdiagnosisforenglandexperimentalstatistics/adultsdiagnosed20122013and2014andfollowedupto2015

[3] M. L. Giger, N. Karssemeijer, and J. A. Schnabel, “Breast Image Analysis for Risk Assessment, Detection, Diagnosis, and Treatment of Cancer,” Annual Review of Biomedical Engineering, vol. 15, no. 1, pp. 327–357, 7 2013.

[4] E. S. De Paredes, Atlas of mammography. Wolters Kluwer Health/Lippincott Williams & Wilkins, 2007.

[5] K. Ganesan, U. R. Acharya, K. C. Chua, L. C. Min, and K. T. Abraham, “Pectoral muscle segmentation: A review,” Computer Methods and Programs in Biomedicine, vol. 110, no. 1, pp. 48–57, 4 2013.

[6] S. Sapate and S. Talbar, “An Overview of Pectoral Muscle Extraction Algorithms Applied to Digital Mammograms,” 2016.

[7] K. Thangavel and M. Karnan, “Computer Aided Diagnosis in Digital Mammograms: Detection of Microcalcifications by Meta Heuristic Algorithms,” 2005.

[8] K. S. Camilus, V. K. Govindan, and P. S. Sathidevi, “Computer-aided identification of the pectoral muscle in digitized mammograms.” Journal of digital imaging, vol. 23, no. 5, pp. 562–80, 10 2010.

[9] S. A. Taghanaki, Y. Liu, B. Miles, and G. Hamarneh, “Geometry-Based Pectoral Muscle Segmentation From MLO Mammogram Views,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 11, pp. 2662–2671, 11 2017.

[10] M. Molinara, C. Marrocco, and F. Tortorella, “Automatic segmentation of the pectoral muscle in mediolateral oblique mammograms,” in Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems. IEEE, 6 2013, pp. 506–509.

[11] R. Ferrari, R. Rangayyan, J. Desautels, R. Borges, and A. Frere, “Automatic Identification of the Pectoral Muscle in Mammograms,” IEEE Transactions on Medical Imaging, vol. 23, no. 2, pp. 232–245, 2 2004.

[12] M. Mustra and M. Grgic, “Robust automatic breast and pectoral muscle segmentation from scanned mammograms,” 2012.

[13] J. H. Kim, B.-Y. Park, F. Akram, B.-W. Hong, and K. N. Choi, “Multipass active contours for an adaptive contour map.” Sensors (Basel, Switzerland), vol. 13, no. 3, pp. 3724–38, 3 2013.

[14] A. Rodriguez-Ruiz, J. Teuwen, K. Chung, N. Karssemeijer, M. Chevalier, A. Gubern-Mérida, and I. Sechopoulos, “Pectoral muscle segmentation in breast tomosynthesis with deep learning,” in Medical Imaging 2018: Computer-Aided Diagnosis, K. Mori and N. Petrick, Eds., vol. 10575. SPIE, 2 2018, p. 90.

[15] I. L. Aroquiaraj and K. Thangavel, “Pectoral Muscles Suppression in Digital Mammograms using Hybridization of Soft Computing Methods,” 1 2014.

[16] B. M. Keller, J. Chen, D. Daye, E. F. Conant, and D. Kontos, “Preliminary evaluation of the publicly available Laboratory for Breast Radiodensity Assessment (LIBRA) software tool: comparison of fully automated area and volumetric density measures in a case-control study with digital mammography,” Breast Cancer Research, vol. 17, no. 1, p. 117, 12 2015.

[17] B. M. Keller, D. L. Nathan, Y. Wang, Y. Zheng, J. C. Gee, E. F. Conant, and D. Kontos, “Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation,” Medical Physics, vol. 39, no. 8, pp. 4903–4917, 7 2012.

[18] D. Ballard, “Generalizing the Hough transform to detect arbitrary shapes,” Pattern Recognition, vol. 13, no. 2, pp. 111–122, 1 1981.

[19] A. Shrivastava, A. Chaudhary, D. Kulshreshtha, V. Prakash Singh, and R. Srivastava, “Automated digital mammogram segmentation using Dispersed Region Growing and Sliding Window Algorithm,” in 2017 2nd International Conference on Image, Vision and Computing (ICIVC). IEEE, 6 2017, pp. 366–370.

[20] P. Kovesi, “Log-Gabor Filters.” [Online]. Available: http://www.peterkovesi.com/matlabfns/PhaseCongruency/Docs/convexpl.html

[21] P. Kovesi, “Phase Congruency Detects Corners and Edges,” 2003.

[22] D. Bradley and G. Roth, “Adaptive Thresholding Using the Integral Image,” 2007.

[23] P. Jaccard, “The Distribution of the Flora in the Alpine Zone,” no. 2, 1907.

[24] L. R. Dice, “Measures of the Amount of Ecologic Association Between Species,” Ecology, vol. 26, no. 3, pp. 297–302, 7 1945.
