Cancer cell segmentation and data extraction


OSSI SANDBERG

CANCER CELL SEGMENTATION AND DATA EXTRACTION

Master of Science thesis

Examiners: Prof. Heikki Huttunen, Meenakshisundaram Kandhavelu

Examiner and topic approved by the Faculty Council of the Faculty of Engineering Sciences on 11th November 2016


ABSTRACT

OSSI SANDBERG: Cancer cell segmentation and data extraction

Tampere University of Technology

Master of Science thesis, 49 pages

October 2018

Master’s Degree Programme in Automation Technology

Major: Computational Systems Biology

Examiners: Prof. Heikki Huttunen, Meenakshisundaram Kandhavelu

Keywords: Cancer cell, Segmentation, Fluorescence, Mammalian cell, Microbiology

Cancer is a health problem affecting approximately 14 million people each year. One cause of cancer is a disturbance of the normal cell life cycle so that the cell does not die as programmed. In addition, as cell research requires immortalized cells, cancer cells provide a good basis for research on mammalian cells.

Fluorescence is an important tool in biological research. Fluorescence images can be captured with a specialized microscope, but these images require further processing before countable data can be extracted. Instead of microscope images, flow cytometry can be used to measure the fluorescence of each cell. Flow cytometry loses information in comparison to microscopy, so it cannot replace microscopy completely. A microscopy image, however, can contain all the information that flow cytometry techniques provide.

This thesis is a literature review on how fluorescence data is extracted from cancer cells. The review was conducted in two parts. In the first part, recently developed methods were examined, and in the second part, methods currently in use were reviewed. Segmentation and microscopy methods were prevalent in method research, but fluorescence-activated cell sorting was the most used method in practice.

Fluorescence-activated cell sorting is a common flow cytometry technique that is used even though it loses information on location and morphology. This seems to be due to the unambiguous data it produces. Microscopy methods are used less; when they are used, segmentation is performed, but typically with custom algorithms. Segmentation is a difficult task that requires specific expertise, which might be the reason that it is not in wider use in recent publications. Work is needed to close the gap between biologists and image analysts.


TIIVISTELMÄ (ABSTRACT IN FINNISH)

OSSI SANDBERG: Cancer cell segmentation and data extraction (Syöpäsolun segmentointi ja datan keruu)

Tampere University of Technology (Tampereen teknillinen yliopisto)

Master of Science thesis, 49 pages

October 2018

Degree Programme in Automation Technology

Major: Computational Systems Biology

Examiners: Prof. Heikki Huttunen, Meenakshisundaram Kandhavelu

Keywords: Cancer cell, Segmentation, Fluorescence, Mammalian cell, Microbiology

Cancer is a health problem that affects approximately 14 million people each year. One of the causes of cancer is a disturbance of the normal cell life cycle such that the cell does not die as it should. In addition, because the cells needed in cell research must live indefinitely, cancer cells are important for research on mammalian cells.

Fluorescence is an important tool in biological research. Together with a microscope, it can be used to produce fluorescence images. Processing these images yields usable data. A simpler approach is to use flow cytometry to analyze each cell individually. However, flow cytometry loses information, so it cannot fully replace microscopy. Microscopy, by contrast, can provide the same information as flow cytometry, although this is more difficult.

This thesis is a literature review of how fluorescence data is collected from cancer cells. The review has two parts: first, the methods that have been developed were surveyed, after which the methods in use were examined. Segmentation and microscopy have been studied extensively in recent years, whereas flow cytometry was used more in practice.

Fluorescence-activated cell sorting is a common flow cytometry technique even though it loses information on the location and morphology of the cell. The advantage of this method is that the data it produces is unambiguous. Segmentation, by contrast, is a difficult task that requires special expertise, which may be the reason for its low popularity. More work is clearly needed to bring image processing and biology together.


PREFACE

This work was done during a two-year period. Work started in September 2016, but much of it was prepared during the two years I worked in the Molecular Signaling Lab before the start of this thesis. The images used here were acquired during my work with the team, and I thank Meenakshisundaram Kandhavelu for allowing me to use those images and for the opportunity to work with the other great people in the MS Lab. Much help was received from examiner Heikki Huttunen, and I would like to thank him for his efforts and quick responses to my queries.

In addition, I would like to use this space to thank the people who tolerated not only my long master's thesis process but also the whole lengthy process of my studies. Lastly, I thank Ymir for his peer support, opinions, and awesome presence.

14th October, Ossi Sandberg


LIST OF FIGURES

2.1 Central dogma and macromolecules related to it

3.1 Illustration of lenses

3.2 Microscopy images of HEK293 cells

3.3 Illustration of flow cytometry system

4.1 A board full of letters to demonstrate segmentation

4.2 10x magnification images from different cell lines

4.3 Split fluorescent image of cells

4.4 A typical approach in cell segmentation with example methods

4.5 Diversity between image denoising methods

4.6 Comparison between denoising methods

4.7 Demonstration of Otsu thresholding

4.8 Watershed demonstration


LIST OF TABLES

2.1 Genetic code mapping codons to amino acids

5.1 List of articles inspected

5.2 Reasons why articles used fluorescence in their studies

5.3 Fluorescence methods used in articles

5.4 Microscopes and computational methods used to access the data


LIST OF ABBREVIATIONS AND SYMBOLS

DNA Deoxyribonucleic acid, a molecule storing information in the cell

RNA Ribonucleic acid, a molecule commonly involved in cellular activities

tRNA transfer RNA

mRNA messenger RNA

ncRNA non-coding RNA

Ala Alanine

Arg Arginine

Asn Asparagine

Asp Aspartic acid

Cys Cysteine

Gln Glutamine

Glu Glutamic acid

Gly Glycine

His Histidine

Ile Isoleucine

Leu Leucine

Met Methionine

Phe Phenylalanine

Pro Proline

Thr Threonine

Trp Tryptophan

Tyr Tyrosine

Val Valine

f focal point

M Magnification

Θ Angle of target

Θ′ Angle perceived

σ² variance

DAPI 4’,6-diamidino-2-phenylindole, a commonly used fluorochrome

FP Fluorescent protein

GFP Green fluorescent protein

FRET Förster resonance energy transfer

FACS Fluorescence-activated cell sorting

FRAP Fluorescence recovery after photobleaching

FLIP Fluorescence loss in photobleaching

FLAP Fluorescence localization after photobleaching

FLIM Fluorescence lifetime imaging microscopy

PFL Publication Forum Level


1. INTRODUCTION

Biology means the study (Greek -logia) of life (Greek bios). It has roots in ancient Mesopotamia, Egypt, and Greece. Classical study is done by first observing and then recording findings, a common method in the natural sciences. This methodology of observing and recording has been used in significant works such as Historia Animālium (Aristoteles 384–322 BCE) and Systema naturae (Linnaeus C. 1735).

Observations have been limited by scale. As observations were traditionally made with the human eye, life forms invisible to the naked eye were not detected. This inability left many life forms, including two of the three taxonomic domains, uncategorized. Even in the organisms that could be observed, much was left unseen. This changed when systems of lenses were used to see microscopic life.

One of the first microscopes was developed in the 17th century by van Leeuwenhoek and was used to observe single-cell organisms. With this microscope, certain cells became observable. With careful observation, it became clear that these cells were replicating and feeding, and thus had some basic features of life. Likewise, some specific features could be detected, and as microscopes developed, more and more features became visible. For example, two distinct borders could be identified, an inner and an outer one. The outer border defines the cell, and the inner one encloses a cell feature, the nucleus.

Cells can be divided into two groups based on whether they have this nucleus feature. Smaller ones, without a nucleus, are known as prokaryotic cells. Larger ones, with this clear inner border, are known as eukaryotic cells.

Prokaryotic cells are divided into two groups: archaea and bacteria. Cells in these groups are single-cell organisms. Eukaryotic cells can be either single-cell organisms or part of organisms that consist of several cells working together. An example of a eukaryotic multicellular life form is the human (Homo sapiens). The study of cells from multicellular organisms tends to be difficult, as they commonly depend on a large structure of cells working together, whereas single-cell organisms can be grown more easily in certain environments (though several single-cell organisms depend on the existence of other cells through relations ranging from parasitic to symbiotic). Producing stable cell populations is important for reproducing results and for ensuring that cells do not depend on some temporary state that can expire.

The first immortalized human cell population was a cervical cancer cell population (the HeLa cell line). Immortalization means that as long as the cells are kept in the correct conditions, the population can exist indefinitely. Though it is possible to do research with non-immortalized cells, this is relatively rare, as this kind of research can involve several additional variables. Using non-immortalized cells usually means that the cells are extracted from a live host, such as a mouse.

Cancer cells are useful for a somewhat obvious research area: cancer research. Cancer is a major worldwide health problem. There were approximately 14 million new cancer cases during 2012, and this annual rate is expected to rise by 70% during the next two decades (World Health Organization 2018). Thus, it is important to perform cancer studies, and one key part is testing the viability of drugs on cancer cells.

When studying cells, the microscope alone is not enough to observe important events. For better observations, features of interest are colored. This coloring can be applied to organelles, proteins, DNA, or even reactions. It is common to color with fluorescence, though other methods exist, and the choice depends on what is being colored and on the purpose of the study. This means that much of our cell research is still done by observing and recording.

Fluorescence studies seem to be on the rise and new methods are being developed.

To understand the state of fluorescence studies, the following research questions were chosen:

• What are newly developed methods for extracting data?

• How is fluorescence data extracted in cancer cell studies?

• How many papers in high-impact journals use new methods?

• How useful are new methods in current research?

The goal of this thesis is to answer these questions.

First, in Chapter 2, we familiarize ourselves with the biological cell. In Chapters 3 and 4, state-of-the-art methods are introduced and explained in greater detail. Chapter 3 describes how the signal can be extracted from cells, and Chapter 4 outlines cell segmentation. In Chapter 5, we explore several articles from top journals and discover what methods those papers have used. The study is concluded in Chapter 6.


2. CELL AS A BIOLOGICAL UNIT

The goal of this chapter is to introduce the biological background of this thesis: the cell as a fundamental biological unit. Important biological concepts in this thesis are the central dogma, the eukaryotic cell, and the cancer cell.

2.1 Introduction to the cell as a biological unit

We look into a microscope, and under its lens tiny organisms become visible. These microorganisms, i.e., organisms that are not visible to the naked eye, are cells. During this microscope session, one of the microorganisms dies. Dying eventually might have been part of its life cycle, or it might have run out of nutrients. We continue observing, and after some time we can see that some of these microorganisms are also replicating, a process which also requires nutrients. These organisms must have a metabolism, i.e., they take nutrients from the environment, process them, and release the resulting waste. Microorganisms do not exist in isolation but instead interact (communicate) chemically with their environment and each other. If the examination were continued long enough, a cell would change its functionality and pass the new functionality to its offspring during replication. Some microorganisms might form new structures during their life cycle, while others would start to move. (Madigan et al. 2006)

The cell is the basic biological unit, and its characteristics are metabolism, reproduction, communication, and evolution. Differentiation and movement are also typical but not present in every cell. Cells are separated from each other by a membrane, and sometimes also by other structures, e.g., a cell wall. Cell sizes also differ hugely, from micrometers (prokaryotic cells) to centimeters (egg cells, e.g., an ostrich egg); not every cell is actually a microorganism. (Madigan et al. 2006)

2.2 Central dogma of molecular biology

In molecular biology, there is a central dogma which describes the information flow between informational macromolecules. These informational macromolecules are deoxyribonucleic acids (DNA), ribonucleic acids (RNA), and proteins, and they are responsible for all information stored in cells. Replication, transcription, and translation form the central dogma of molecular biology. This information flow in the cell is demonstrated in Figure 2.1(a). DNA can be replicated, and the replication results in two identical DNA molecules. During transcription, RNA is produced as an information replica of DNA. The functional unit of the cell, the protein, is produced from RNA during translation. (Madigan et al. 2006)

Figure 2.1 Central dogma and macromolecules related to it. (a) Information flow in the cell according to the central dogma of molecular biology. (b) Complementary DNA strands showing all four bases of DNA, adenine (A), thymine (T), cytosine (C), and guanine (G), and the complementary pair for each. (c) RNA sequence, similar to DNA but with uracil (U) instead of thymine. (d) Primary structure of a protein, i.e., the amino acid chain.


2.2.1 DNA

DNA is the central hub of information in cells. A DNA molecule is a macromolecule made of the pentose sugar deoxyribose and four bases: adenine, cytosine, guanine, and thymine. The sugars are connected as a chain by a link between the 3rd carbon of one sugar and the 5th carbon of the next, with a phosphate in the middle of this link. From here on, the 3rd carbon end is marked as 3’ and the 5th carbon end as 5’, and these are also used to mark direction. Each sugar has one of the four bases connected to it. Information in the cell is stored in two strands of DNA. These strands are complementary to each other based on the bonding of bases: in DNA, cytosine-guanine and thymine-adenine hydrogen bonds are possible. An example structure of two-strand DNA is shown in Figure 2.1(b), where the links between the strands represent hydrogen bonds. (Madigan et al. 2006)

According to the central dogma of molecular biology, during replication each DNA strand is complemented, resulting in two identical DNA molecules. Sometimes errors occur during replication, or the original DNA has been altered somehow. There are mechanisms in the cell that protect the original DNA from alterations, but it is possible that they fail, resulting in altered DNA. Assuming the organism survives with the alteration, its offspring will also inherit the altered DNA. (Madigan et al. 2006)
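The base pairing and replication described above can be illustrated with a minimal Python sketch. It is an idealization under stated assumptions: strands are plain strings, the complement is written position by position (so the second strand reads in the opposite chemical direction), and replication is error-free. The function names `complement` and `replicate` are illustrative only and do not come from the source.

```python
# Watson-Crick base pairing in DNA: A pairs with T, C pairs with G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in strand)

def replicate(strand_a: str, strand_b: str) -> tuple[tuple[str, str], tuple[str, str]]:
    """Idealized, error-free replication: each parent strand is complemented,
    giving two double-stranded copies."""
    return (strand_a, complement(strand_a)), (complement(strand_b), strand_b)

top = "CTAG"              # upper strand as drawn in Figure 2.1(b)
bottom = complement(top)  # lower strand, "GATC"
copy_1, copy_2 = replicate(top, bottom)
```

Both copies contain the same pair of strands as the parent molecule, which is the sense in which replication yields two identical DNA molecules.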

2.2.2 RNA

RNA has a structure similar to single-stranded DNA, as shown in Figure 2.1(c). It has ribose instead of deoxyribose as the backbone sugar, and uracil replaces thymine. Even though RNA has base-pairing affinities similar to DNA, it does not typically appear as a double strand. Usually, the inside of the cell contains several single RNA strands of differing sizes, all much shorter than DNA. During transcription, information from a part of a DNA strand is copied into an RNA strand. (Madigan et al. 2006)

RNA polymerase is a protein that constructs RNA from DNA during transcription. RNA polymerase recognizes a specific region in the DNA strand called a promoter sequence. After finding a promoter sequence, RNA polymerase starts moving along the strand on which it found the promoter, building the RNA in the 5’ to 3’ direction. The promoter sequence determines how easily RNA polymerase can bind to it; different promoters have different binding rates. RNA polymerase complements the strand it is moving along, creating a replica of the other DNA strand with uracil instead of thymine. It continues until it reaches a terminator sequence, which stops the transcription. (Madigan et al. 2006)
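As a rough illustration of the complementing step in transcription, the following Python sketch builds an RNA string from a DNA template. It deliberately ignores promoter and terminator recognition, and the example sequences and the function name `transcribe` are hypothetical.

```python
# Pairing used when RNA is built against a DNA template: A->U, T->A, C->G, G->C.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') built by complementing a DNA template strand
    that is written in the 3'->5' direction."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

coding = "ATGGCCTAA"    # hypothetical coding strand, written 5'->3'
template = "TACCGGATT"  # its complementary strand, written 3'->5'
rna = transcribe(template)
```

The resulting RNA equals the coding strand with uracil in place of thymine, matching the description of the RNA as a replica of the other DNA strand.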

RNA can have several different functions, and its function is usually specified when talking about RNA, e.g., messenger RNA (mRNA) and non-coding RNA (ncRNA). There are several different kinds of non-coding RNA, e.g., transfer RNA (tRNA) and ribosomal RNA (rRNA). These are all important to cell functions, but with regard to the central dogma, mRNA is of interest to us. While DNA works as the information storage of the cell, the role of mRNA is that of an information transmitter. (Madigan et al. 2006)

There are many parallels between translation and transcription. A structure called the ribosome is responsible for the translation process. The mRNA also has regions where the ribosome attaches, starts translation, and ends translation. However, information is not directly copied; instead, it is converted through a genetic code.

The genetic code consists of codons, which are three-base sequences (triplets). As there are four different bases in RNA (adenine, cytosine, guanine, and uracil), there are 4³ = 64 different codons. We can consider there to be three types of codons: start, stop, and amino acid. The start codon begins protein synthesis and the stop codon ends it. The other codons map only to an amino acid; a total of 20 different amino acids are coded. As 64 codons cover only 22 different functions, it becomes clear that there is redundancy. There are three different codons for the stop signal, 60 codons mapping to 19 amino acids, and one amino acid codon that also works as the start signal. Table 2.1 shows the whole codon-amino acid mapping in greater detail. This genetic code can be considered a universal code shared by all cells. (Madigan et al. 2006)
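The codon counts above can be checked by enumerating every triplet. This is a small sketch; the stop codons and the start codon are taken from Table 2.1, and the variable names are illustrative.

```python
from itertools import product

RNA_BASES = "ACGU"  # adenine, cytosine, guanine, uracil

# Every ordered triplet of bases is a codon.
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop signals listed in Table 2.1
START_CODON = "AUG"                  # also codes for methionine

amino_acid_only = [c for c in codons if c not in STOP_CODONS and c != START_CODON]

print(len(codons))           # 64
print(len(amino_acid_only))  # 60
```

The 64 codons therefore split into 3 stop codons, 1 start codon that also codes for methionine, and 60 codons that map only to one of the remaining 19 amino acids.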

2.2.3 Protein

While DNA and RNA are the information media of the cell, proteins carry out cell functions. A protein is constructed of several amino acids linked with peptide bonds and is thus a polypeptide. Proteins have four levels of structure, the primary structure being the amino acid sequence. The content of the polypeptide chain determines the functions of the protein and its structures, though which structures form also contributes to what the protein does. The primary structure of a protein is constructed during translation, in protein synthesis mediated by a ribosome. (Madigan et al. 2006)

The ribosome is the main actor in protein synthesis. Its structure contains both proteins and rRNA. The ribosome attaches itself to the mRNA at a region called the ribosome binding site and starts moving towards the 3’ end. When it finds the AUG sequence, protein synthesis starts. During protein synthesis, the ribosome requires that tRNA carries amino acids into its structure. The ribosome attaches these amino acids to a polypeptide chain, which is shown in Figure 2.1(d). Each tRNA carries a single amino acid and has affinity only for a certain codon. Thus, the sequence described in the mRNA is converted into a polypeptide chain. (Madigan et al. 2006)

Table 2.1 Genetic code mapping codons to amino acids: Alanine (Ala), Arginine (Arg), Asparagine (Asn), Aspartic acid (Aspartate) (Asp), Cysteine (Cys), Glutamine (Gln), Glutamic acid (Glutamate) (Glu), Glycine (Gly), Histidine (His), Isoleucine (Ile), Leucine (Leu), Lysine (Lys), Methionine (Met), Phenylalanine (Phe), Proline (Pro), Serine (Ser), Threonine (Thr), Tryptophan (Trp), Tyrosine (Tyr), Valine (Val) and stop codon (STOP). *AUG codes for methionine and also acts as the start codon.

Codon  Amino acid   Codon  Amino acid   Codon  Amino acid   Codon  Amino acid
AAA    Lys          CAA    Gln          GAA    Glu          UAA    STOP
AAC    Asn          CAC    His          GAC    Asp          UAC    Tyr
AAG    Lys          CAG    Gln          GAG    Glu          UAG    STOP
AAU    Asn          CAU    His          GAU    Asp          UAU    Tyr
ACA    Thr          CCA    Pro          GCA    Ala          UCA    Ser
ACC    Thr          CCC    Pro          GCC    Ala          UCC    Ser
ACG    Thr          CCG    Pro          GCG    Ala          UCG    Ser
ACU    Thr          CCU    Pro          GCU    Ala          UCU    Ser
AGA    Arg          CGA    Arg          GGA    Gly          UGA    STOP
AGC    Ser          CGC    Arg          GGC    Gly          UGC    Cys
AGG    Arg          CGG    Arg          GGG    Gly          UGG    Trp
AGU    Ser          CGU    Arg          GGU    Gly          UGU    Cys
AUA    Ile          CUA    Leu          GUA    Val          UUA    Leu
AUC    Ile          CUC    Leu          GUC    Val          UUC    Phe
AUG    *Met         CUG    Leu          GUG    Val          UUG    Leu
AUU    Ile          CUU    Leu          GUU    Val          UUU    Phe
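The translation process can be sketched in Python as a table lookup over codons. This is a simplified illustration, not a model of the ribosome: it assumes translation starts at the first AUG in the string, it ignores the ribosome binding site and tRNA, and it uses the conventional one-letter amino acid symbols (with `*` for STOP) instead of the three-letter abbreviations of Table 2.1. The example mRNA is hypothetical.

```python
from itertools import product

# Standard genetic code in one-letter amino acid symbols ("*" marks STOP),
# listed for codons in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Scan for the first AUG, then read codons until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is synthesized
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```

In the example, the codons AUG, GCC, and UUU are read as methionine (M), alanine (A), and phenylalanine (F) before the stop codon UAA ends the chain.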

The secondary structure is the specific twisting and folding of the protein determined by its primary structure. The final three-dimensional structure, determined by the previous structures, is the tertiary structure of the protein. Some proteins are composed of several subunits. Those proteins have a quaternary structure, which describes how the subunits bond together. (Madigan et al. 2006)

2.3 Eukaryotic cells

A eukaryotic cell (e.g., a human cell) differs from a prokaryotic cell (e.g., a bacterial cell) in several ways. As our study is related to cancer, which exists only in eukaryotic cells, we focus on the specifics of mammalian cells and do not consider the properties of non-mammalian cells. Some properties mentioned here might apply to other kinds of cells, but these similarities are outside the scope of this thesis. The central dogma of molecular biology is relevant to all life forms; we now discuss some more specific properties of eukaryotic cells. (Madigan et al. 2006)

2.3.1 Nucleus

The nucleus stores the information of the cell in two complementary strands of DNA. The nucleus encapsulates the DNA within two membranes. Each membrane has different functionality based on the solution outside it. The inner membrane has special interactions with the inside of the nucleus, and the outer membrane interacts with the outside of the nucleus. Pores run through the nuclear membranes and join the inner and outer membranes together. The pores also allow nuclear transport, a process in which a complex of proteins moves through the membranes. Nuclear transport requires energy to function. (Madigan et al. 2006)

2.3.2 Mitochondrion

The mitochondrion is a cell-like structure inside a cell. It can take different shapes, from circular to rod-shaped. The number of mitochondria varies depending on cell size, but a typical mammalian cell can contain more than 1000 mitochondria. A mitochondrion has an inner and an outer membrane. The inner membrane is folded several times. Most of the functions of the mitochondrion are coded in the DNA inside the nucleus, but it also contains its own DNA. The purpose of mitochondria is to produce energy for the cell. (Madigan et al. 2006)

2.4 Cancer

Cancer is a major and growing global health issue. The World Cancer Report 2014 (Stewart & Wild 2014) reports that cancer incidence increased from 12.7 million cases in 2008 to 14.1 million in 2012, and it projects incidence to approach 25 million cases per year during the next two decades. There were 8 million cancer-related deaths during 2012. The lung, breast, colorectum, prostate, stomach, and liver are the most common sites for cancer. However, cancer location varies depending on sex, with lung cancer affecting about twice as many men as women. (Stewart & Wild 2014; Madigan et al. 2006)

As described in the World Health Organization fact sheet on cancer (World Health Organization 2018), the generic terms cancer, malignant tumor, and neoplasm are used to describe this large group of diseases. These diseases can affect any part of the body through cells replicating rapidly beyond their normal function. Cancer cells can spread to other organs, thus disrupting the usual functions of those organs. This process is called metastasis, and it is the typical cause of death from cancer. Because of cell replication, a single cell with cancer properties is enough for cancer to form. (World Health Organization 2018)

The human body has several mechanisms that force cells to grow in a certain manner. Thus, a cell has to acquire several new capabilities in order for cancer to form. Hanahan & Weinberg (2000) suggest that most, or perhaps all, cancers have to acquire six functional capabilities before they develop; otherwise, mechanisms in the body would prevent the cancer. This suggests that no single mutation is enough for cancer formation. These capabilities are:

• Self-sufficiency in growth signals

• Insensitivity to anti-growth signals

• Tissue invasion and metastasis

• Limitless replicative potential

• Sustaining access to blood circulation

• Evading programmed cell death

There can be, however, multiple ways in which these capabilities are acquired. For example, while it is possible that a cell produces growth signals for itself, it is also possible that it instead induces its neighbors to produce the signals it requires. This explains why treating cancer is not straightforward and why advanced methods are required to study it. (Hanahan & Weinberg 2000)

2.5 Cell lines in research

A cell line is derived from a single cell through replication. A cell line has a distinct genetic makeup, and it can be used to form a cell culture. Cell culture means growing cells in a controlled manner under laboratory conditions. Several different cell lines are used in research. For example, Cellosaurus (Bairoch 2016) lists 76852 cell lines, 63707 of them from humans. Based on results from its search tool, it has 1438 lung, 590 breast, 442 colorectal, 186 prostate, 206 osteosarcoma, 14 stomach, and 186 liver cancer cell lines, and a total of 12906 human cancer cell lines. It also has a total of 2716 non-human cancer cell lines, including cells from dog, mouse, rat, cat, and several other species. Due to their immortalized nature, cancer cells play an important role in biological research.


3. SIGNAL AND DATA EXTRACTION

During any signal processing task, it is crucial to understand the signal of interest. This chapter gives an understanding of the methods used in fluorescence studies. Without this understanding, it can be hard to extract the signal effectively, ask the correct questions about the signal, or assess the validity of methods.

3.1 Microscopy

Microscopy uses microscopes to visualize objects that are otherwise undetectable due to their small size. The function of the microscope is to magnify the view of the object. Microscopes can be fitted with a recording device to permanently store an image of the visualized object. An important concept in microscopy is how the path of light can be bent with lenses. A lens is a simple device inside light microscopes that has two refracting surfaces.

Let us consider a thin lens. A thin lens is a lens with two spherical surfaces so close to each other that we can ignore the thickness of the lens. The lens has one important property that depends on its geometry: the focal point. Light rays arriving parallel to the optical axis pass through this point after the lens, and light emitted from the focal point leaves the lens parallel to the axis, i.e., perpendicular to the plane of the lens, as illustrated in Figure 3.1(a).

The location of the focal point, as a distance f from the lens, can be determined with the lensmaker’s equation

\[ \frac{1}{f} = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{3.1} \]

where n is the index of refraction, R₁ is the radius of curvature of the first surface, and R₂ is the radius of curvature of the second surface. A radius is positive if the center of curvature of its surface lies on the same side as the outgoing light. The lensmaker’s equation is illustrated in Figure 3.1(b). (Young et al. 2016, pp. 1154, 1155)
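As a numerical illustration of the lensmaker's equation, the following sketch computes the focal length of a thin lens. The lens parameters are hypothetical, and the function name `focal_length` is chosen here only for illustration.

```python
def focal_length(n: float, r1: float, r2: float) -> float:
    """Thin-lens focal length from the lensmaker's equation (3.1).

    n  : index of refraction of the lens material
    r1 : radius of curvature of the first surface
    r2 : radius of curvature of the second surface
    A radius is positive when its center of curvature lies on the side of the
    outgoing light; the result has the same length unit as r1 and r2.
    """
    inverse_f = (n - 1.0) * (1.0 / r1 - 1.0 / r2)
    return 1.0 / inverse_f

# Hypothetical symmetric biconvex lens: n = 1.5, R1 = +10 cm, R2 = -10 cm.
f = focal_length(1.5, 10.0, -10.0)
print(f)  # about 10 (cm)
```

With R₁ positive and R₂ negative, as in Figure 3.1(b), both surface terms add up and the lens has a positive focal length, i.e., it is converging.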

Lenses also occur naturally, the prime example being the eye. The eye can change its focal point by controlling the radii of curvature of its lens. By doing this, the eye can adjust to the distance of an object and focus light originating from a single point onto a single spot on the retina. It does this for every point, producing an image on the retina which is transferred to the brain via the optic nerve. (Young et al. 2016, p. 1164) However, the eye is not able to capture every possible image. This is due to physical limitations, such as the inability to form the required lens shape or the retina’s lack of sensitivity to certain signals.

Figure 3.1 Illustration of lenses. (a) Focal points F and F′ and their effect. (b) Lens illustrated; according to the lensmaker’s equation, R₁ would be positive and R₂ negative.

A camera is a common tool used to capture images. It has components similar to those of the eye: a lens, the ability to adjust the lens, and the ability to capture an image. Unlike with the eye, captured images are recorded and thus readily available for analysis. The ability to record is vital in modern microscopy. The camera alone also has the same limitation as the eye: the inability to shape the lens to capture small objects. For microscopy, the camera is an important recording device, as it allows digital analysis of images.

A lens can also be used to magnify objects. A lens used this way is called a magnifier. The limitation of the eye can be circumvented by placing an object at the focal point of the magnifier. The magnifier bends light so that the object can be viewed closer than the eye would normally allow. Magnification can be expressed with the angular magnification

\[ M = \frac{\Theta'}{\Theta} \tag{3.2} \]

where Θ′ is the angle under which the object is viewed through the magnifier and Θ is the angle under which the object is viewed without the magnifier. If the angles are small, we can say for a simple magnifier that

\[ M_{\text{magnifier}} = \frac{\Theta'}{\Theta} = \frac{y/f}{y/n} = \frac{n}{f} \tag{3.3} \]

where y is the object size, f is the focal length, and n is the near point. The near point is a property of the capturing device. For the human eye, the near point is typically 25 cm. It is important to note that f can be changed through the properties of the lens, as explained by Equation 3.1. (Young et al. 2016, pp. 1167, 1168)

As mentioned, the point of the microscope is to visualize objects too small to be seen without it. Even though the magnifier gives some magnification, it alone is not enough. The compound microscope utilizes the magnifier in its design. However, another lens, the objective, is added to create an image of the object just inside the focal point of the magnifier. The magnifier then magnifies this image. The angular magnification of the compound microscope is

\[ M_{\text{microscope}} = \frac{n s'}{f_{\text{magnifier}} f_{\text{objective}}} \tag{3.4} \]

where s′ is the distance of the image from the objective. We now have additional focal distances to work with. (Young et al. 2016, pp. 1169, 1170)
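The two magnification formulas can be compared numerically with a short sketch. The lens values are hypothetical, the function names are illustrative, and s′ is taken here as the distance of the image formed by the objective, which is the usual textbook reading of Equation 3.4.

```python
def magnifier_magnification(near_point: float, focal_length: float) -> float:
    """Angular magnification of a simple magnifier, Equation 3.3: M = n / f."""
    return near_point / focal_length

def microscope_magnification(near_point: float, image_distance: float,
                             f_magnifier: float, f_objective: float) -> float:
    """Angular magnification of a compound microscope, Equation 3.4."""
    return near_point * image_distance / (f_magnifier * f_objective)

NEAR_POINT_CM = 25.0  # typical near point of the human eye

# Hypothetical lenses, all lengths in centimeters.
print(magnifier_magnification(NEAR_POINT_CM, 2.5))              # 10.0
print(microscope_magnification(NEAR_POINT_CM, 16.0, 2.5, 0.5))  # 320.0
```

With the same magnifier, adding an objective with a short focal length raises the magnification from 10 to 320 in this example, which shows why the magnifier alone is not enough.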

Illumination in microscopy can be done in different ways. We talk here only about light microscopy but same principal works for all microscopy. The important thing to recognize, is that object in vision is focused to the certain distance by manipulating the path of light. In bright light microscopy, an object is illuminated from behind and shadow is of the object is recorded. In fluorescence microscopy, light is reflected only from the target.

Figure 3.2 shows examples of cell images captured with a microscope. Images such as these can be analyzed further, e.g., by quantifying color intensity or counting cells. Figure 3.2(c) is a bright light image, and Figures 3.2(a) and 3.2(b) are fluorescence images. For comparison, the sum of Figures 3.2(a), 3.2(b), and 3.2(c) is presented in Figure 3.2(d), which shows how the images align.

Staining refers to the process of marking some features to give better contrast. For example, 4′,6-diamidino-2-phenylindole (DAPI) is commonly used (Kepner & Pratt 1994) to stain the nucleus and make it more visible in fluorescence images. If red fluorescence in Figure 3.2(b) means, for example, that a cell is dying, we could also say that we have stained each dying cell. Fluorescence staining is a powerful tool that can be used to stain several interesting features, from cells to proteins to protein interactions.

Figure 3.2 Microscopy images of HEK293 cells in the same location. (a) Green fluorescence image; the original green channel is multiplied for visualization purposes. (b) Red fluorescence image. (c) Bright light image. (d) Combined image of a, b, and c.

3.2 Fluorescence and fluorochromes

Fluorescence is an important imaging tool in biology. It produces information at the molecular level and can be used to detect even protein interactions. Several studies use fluorescence methods, e.g., Roberts et al. (2016), Cai et al. (2016), Arulanandam et al. (2015), Dermawan et al. (2016), Jang et al. (2015), Russell et al. (2015), Zhao et al. (2016), Peiris et al. (2015), Perera et al. (2015), Lawson et al. (2015) and Melo et al. (2015). Figures 3.2(a) and 3.2(b) show examples of fluorescence microscope images. As explained by Ishikawa-Ankerhold et al. (2012, p. 4048), fluorescence is a process in which light is absorbed by a compound and released afterward. These compounds are called fluorochromes, but the terms fluorophore, fluorescent probe, and fluorescent dye are all used depending on the compound, conditions, and properties (Ishikawa-Ankerhold et al. 2012).

Fluorescence is an interaction between certain molecules and light. Light is absorbed by a fluorochrome molecule and released afterward. To be precise, electrons in the fluorochrome can reach an unstable higher energy state by absorbing a photon, i.e., a light particle. From this higher state, an electron returns to its original low-energy state, emitting fluorescent light whose wavelength is determined by the difference between these two energy states. This is represented with

\[ \lambda_{EM} = \frac{hc}{E_{EM}}, \tag{3.5} \]

where λ_EM is the emission wavelength, h is the Planck constant, c is the speed of light, and E_EM is the energy difference between the two states. It is important to note that the energy difference depends on the molecular structure; the absorbed photon therefore does not appear in the equation and does not determine the wavelength of the emitted photon. To push an electron into the higher state, the energy of the absorbed photon has to be greater than E_EM. (Wu et al. 2008, p. 248)
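Equation 3.5 and the excitation condition can be checked numerically. The sketch below is illustrative only: the function names and the 2.43 eV energy gap are assumptions, not values from the cited sources. The physical constants are the exact SI values.

```python
PLANCK_CONSTANT_J_S = 6.62607015e-34   # h, exact SI value
SPEED_OF_LIGHT_M_S = 299_792_458.0     # c, exact SI value
JOULES_PER_EV = 1.602176634e-19        # exact SI value


def emission_wavelength_nm(energy_gap_ev):
    """Equation 3.5: wavelength in nm emitted for an energy gap E_EM given in eV."""
    if energy_gap_ev <= 0:
        raise ValueError("energy gap must be positive")
    energy_joule = energy_gap_ev * JOULES_PER_EV
    return PLANCK_CONSTANT_J_S * SPEED_OF_LIGHT_M_S / energy_joule * 1e9


def photon_can_excite(excitation_wavelength_nm, energy_gap_ev):
    """An absorbed photon needs more energy than E_EM, i.e., a shorter wavelength."""
    return excitation_wavelength_nm < emission_wavelength_nm(energy_gap_ev)


# Hypothetical fluorochrome with a 2.43 eV gap (illustrative value only).
print(round(emission_wavelength_nm(2.43), 1))
print(photon_can_excite(488.0, 2.43), photon_can_excite(561.0, 2.43))
```

Because photon energy is inversely proportional to wavelength, the condition on energy becomes a condition that the excitation wavelength is shorter than the emission wavelength.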

Fluorochromes have two important properties: quantum yield and fluorescence lifetime. Quantum yield is the ratio of emitted photons to absorbed photons, and it determines the brightness of the emission. Fluorescence lifetime is the average time between absorption and emission. Some physical conditions can affect these properties; for example, even the orientation of the fluorochrome in relation to the incoming photon can affect absorption. The phenomenon in which interaction with some molecules lowers the lifetime or the quantum yield is known as fluorescence quenching. (Ishikawa-Ankerhold et al. 2012, pp. 4052-4055)

Förster resonance energy transfer (FRET) is an important concept in fluorescence. It can happen when a fluorochrome is in close proximity to an acceptor, a molecule that can receive energy through FRET. The fluorochrome can donate its received energy during its fluorescence lifetime to the acceptor, and the acceptor may then release the energy by emitting light. The E_EM of the emitted light now depends on the acceptor molecule instead of the original receiver. There are three approaches to constructing a FRET biosensor. In the first, two different molecules are each labeled with one part of a FRET pair, and FRET emission can be observed when these molecules interact, i.e., come into close proximity. In the second, the molecules start with the capability to give FRET emission, but the FRET signal is lost when the interaction of interest happens. In the third, a single molecule is labeled with the FRET pair, and FRET emission increases only after the molecule goes through a change.

Fluorescence produced by our fluorochrome is not the only source of fluorescent emission. Cells have naturally occurring fluorescence produced by cellular components that have fluorescent properties. This is called autofluorescence, and there are several strategies for avoiding its effects: optical bandpass filtering of the signal, using fluorochromes with a different E_EM than the autofluorescence or fluorochromes that emit in the infrared, using fluorochromes with a different lifetime than the autofluorescence emitters and filtering by this property, recording spectra in every pixel, or mathematically disentangling the emissions. (Ishikawa-Ankerhold et al. 2012, pp. 4054-4055)

Photobleaching permanently removes the fluorescence ability of a fluorochrome. It occurs when a fluorochrome reacts with another molecule and forms new covalent bonds with it. This can lead to a new molecule without fluorescence ability. The photobleaching rate can be reduced by reducing exposure time or intensity, but photobleaching cannot be prevented. (Wu et al. 2008, pp. 253-254) However, photobleaching can also be considered a technique, as it is utilized in some advanced fluorescence methods, as discussed later.

3.3 Fluorescence and microscopy

Microscopy and fluorescence can be used together, as seen in Figures 3.2(a) and 3.2(b). Let us consider the implications of fluorescence for microscopy. The typical arrangement in microscopy is a light source illuminating the target from behind and casting a shadow onto a lens, as shown in Figure 3.2(c). In fluorescence, the idea is somewhat inverted, as we are interested in the light emitted and not the shadow cast. Light from the light source of the microscope is not captured; instead, light emitted from the fluorochromes is captured by the capturing device.

The energy of photons from the light source needs to be high enough to allow emission. Possible solutions are high-power lamps such as a mercury lamp or a xenon lamp. The wavelength can also be filtered, as all wavelengths of interest are known. Even though a wide-field light source, i.e., a simple mercury or xenon lamp, can be used, lasers are also utilized in fluorescence studies.

Confocal microscopy focuses the light beam on a certain area and simultaneously selects a single location from the microscopy target, limiting out-of-focus light from interfering with the local signal (Ishikawa-Ankerhold et al. 2012, p. 4064). For example, this popular microscopy method is used in Doan et al. (2016), and several books (Marshall 2003, Price & Jerome 2011) consider fluorescence an important application of confocal microscopy.

Two-photon microscopy (Rubart 2004, Ishikawa-Ankerhold et al. 2012, p. 4067) limits sufficiently energetic absorption to the focal distance of the microscope. The aim is that two low-energy photons are absorbed simultaneously so that together they provide enough energy to allow emission. In addition, Rubart (2004) presents the two-photon method as advantageous over confocal microscopy; e.g., it protects fluorochromes from photobleaching because, with the greater precision, only the target fluorochromes emit light and are thus susceptible to photobleaching.

3.4 Flow cytometry

A flow cytometry apparatus was first described by Coulter (1953). The apparatus has been used in cell counting and cell sorting, as described by Fulwyler (1965). A system is capable of these tasks based on several different measurable characteristics, e.g., fluorescence, optical density, reflectivity, or volume (Fulwyler 1965). Multiple flow cytometry systems are available commercially (Picot et al. 2012), though it is noteworthy that they can be somewhat expensive (Roberts 2013, Davey & Kell 1996).

A flow cytometry system, shown in Figure 3.3, performs several steps. First, it produces droplets, which are measured if they contain the target (e.g., a cell). Droplets that do not contain the target are considered waste and ignored. Droplets are produced in such a way that they will not contain more than one instance of the target. A measurement is performed on the target. This measurement can be used for deeper analysis later, but it is also commonly used immediately while the droplet is falling: droplets are sorted into the correct bins by guiding them electronically.

A common application of flow cytometry is fluorescence-activated cell sorting (FACS). The measured property of a droplet is fluorescence, and depending on the result, the droplet can be sorted and counted. This is a very common and useful method in fluorescence studies. It can acquire fluorescence data from cells individually.

Flow cytometry has its disadvantages, though. Equipment can be expensive and not readily available to all. It also processes the cells, which removes them from the state they were previously in. It cannot be used to obtain time series or to link results directly to microscope data, i.e., even if some cells were being tracked with a microscope, the flow cytometry data cannot be linked to the microscopy data. Other data that can be derived with the microscope, e.g., morphology, localization, and connections to other cells, are unavailable in flow cytometry data.

Figure 3.3 Flow cytometry system (diagram labels: Measurement, Computer, Bin chooser, Waste, and + and - signs). First, the solution is manipulated into droplets. Then, these droplets are measured. The measurement is analyzed and a bin is chosen for each droplet. Each droplet is guided to the correct bin based on the measurement. Adapted from Davey & Kell (1996).

Flow cytometry is an important method, but it cannot replace microscopy studies completely, and it is important to have the ability to peek and see what is happening. Still, by isolating a cell, it can be analyzed with great precision, with little noise from outside the cell to blur the data.

3.5 Photobleaching techniques

We have learned what fluorescence is and how it can be used in conjunction with microscopy. The photobleaching techniques presented here are part of the advanced fluorescence methods presented in Ishikawa-Ankerhold et al. (2012), which contains further details of these techniques.

Photobleaching can be utilized in fluorescence studies in several ways. Three techniques are predominant: fluorescence recovery after photobleaching (FRAP), fluorescence loss in photobleaching (FLIP), and fluorescence localization after photobleaching (FLAP). As mentioned before, photobleaching permanently removes a fluorochrome's ability to fluoresce. (Ishikawa-Ankerhold et al. 2012)

The FRAP method was developed to study protein mobility in live cells. It is a very versatile method and is commonly used to study many aspects of cell biology. FRAP can be used when investigating molecule movement, connections between cell areas, the rate of exchange between cell areas, protein binding to large structures, or binding between proteins. (Ishikawa-Ankerhold et al. 2012)

The idea of FRAP is to expose a selected area of a cell to high-intensity light. This light bleaches the fluorochromes in the area and renders the area free of fluorescence. A control image is taken before bleaching, and then the area is briefly flooded with high-intensity light. After this, fluorescence recovery from the unaffected areas to the affected area can be observed as preserved fluorochromes diffuse inside the cell. If movement between areas needs to be one-directional, FLIP can be used, in which the area is bleached repeatedly. FLIP can be used in conjunction with FRAP to compare results and to confirm the results of the other. (Ishikawa-Ankerhold et al. 2012)

If more specificity is required, it is also possible to track bleached molecules. With FLAP, it is possible to track molecules from the bleached area. Molecules of interest carry two different fluorochromes instead of one. For bleaching, light that bleaches only one of the fluorochromes is used. The other can then be tracked. (Ishikawa-Ankerhold et al. 2012)

3.6 Fluorescence lifetime imaging microscopy

Fluorescence properties can be used in several ways. One of the modern methods is fluorescence lifetime imaging microscopy (FLIM). As discussed before (in section 3.2), FRET is an important feature in fluorescence. FRET is the key mechanism in FLIM. (Ishikawa-Ankerhold et al. 2012)

Resonance energy transfer can happen if molecules are in close proximity to each other. In it, energy is transferred from one molecule, the donor, to another, the acceptor. If the energy is released by the acceptor, the emission can be distinguished from emission released by the donor. This is called FRET, and it is a way to inspect interactions between molecules. (Ishikawa-Ankerhold et al. 2012)

As explained in Section 3.2, fluorochromes have a lifetime property. Instead of exploring emission intensity, which is the more obvious choice, we can detect FRET from changes in the lifetime. This has the advantage of not being affected by background light, as we are interested in changes in the lifetime, not in the light emission itself. The lifetime can be affected by other factors in its environment, so these need to be accounted for. An interesting property of FLIM is that it provides a higher resolution as it eliminates much of the noise, but the equipment is more challenging and expensive. (Ishikawa-Ankerhold et al. 2012)
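One commonly used way to turn a lifetime change into a number is the FRET efficiency E = 1 - τ_DA / τ_D, where τ_D is the donor lifetime alone and τ_DA is the donor lifetime in the presence of the acceptor. This relation is not stated in the text above and is added here as an assumption for illustration. The lifetimes in the example are hypothetical.

```python
def fret_efficiency(donor_lifetime_ns, donor_lifetime_with_acceptor_ns):
    """FRET efficiency from donor lifetimes: E = 1 - tau_DA / tau_D."""
    if donor_lifetime_ns <= 0:
        raise ValueError("donor lifetime must be positive")
    if not 0 <= donor_lifetime_with_acceptor_ns <= donor_lifetime_ns:
        raise ValueError("lifetime with acceptor must lie between 0 and the donor lifetime")
    return 1.0 - donor_lifetime_with_acceptor_ns / donor_lifetime_ns


# Hypothetical lifetimes: 2.5 ns without acceptor, 2.0 ns with acceptor nearby.
print(fret_efficiency(2.5, 2.0))
```

An unchanged lifetime gives an efficiency of zero, i.e., no detectable energy transfer, regardless of how bright the background is.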

It is important to note that our signal can represent different things. It can be light, but with FLIM it can be a time interval. These are methods to explore what is happening on a microscopic level. In the end, it is not important how the signal was acquired (though that can affect many properties of the signal); what matters is what it represents and how we can find from it the thing that interests us.

3.7 Scale of interest

Scale defines much of what we need to do to extract the data, and it may also change the results. The first question of scale is whether we are examining a single cell or a cell population, and this choice might give different results. If we consider the whole population, increases at the extremes can be missed. On the other hand, the likelihood of drawing wrong conclusions increases as we add more subsections to analyze.

Once this is decided, we consider where the effect occurs and whether sections of the cell should be excluded. It is possible that we are interested only in the cell nucleus, or perhaps there is background radiation that we need to remove. Sometimes the signal represents a specific event, and in those cases we need to identify the event. And of course, sometimes we are interested purely in total intensity. This choice of scale defines what techniques are required.
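The effect of scale can be illustrated with made-up numbers. In the sketch below, the intensity lists and the threshold are hypothetical: the two populations have the same mean intensity, so a population-level measurement misses the single strongly responding cell that a per-cell analysis finds.

```python
from statistics import mean


def population_mean(intensities):
    """Population-level signal: one number for all cells."""
    return mean(intensities)


def cells_above(intensities, threshold):
    """Single-cell view: indices of cells whose intensity exceeds the threshold."""
    return [index for index, value in enumerate(intensities) if value > threshold]


# Hypothetical per-cell intensities; one treated cell responds strongly.
control = [10, 11, 9, 10, 10, 10, 11, 9, 10, 10]
treated = [9, 10, 8, 9, 9, 9, 10, 8, 9, 19]

print(population_mean(control), population_mean(treated))
print(cells_above(control, 15), cells_above(treated, 15))
```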


4. CELL SEGMENTATION

Cell segmentation is a crucial part of signal extraction, as was discussed in the previous chapter. The purpose of cell segmentation is quite straightforward: to differentiate a cell from the background, other cells, and artifacts. Even though the purpose is straightforward, automated solutions are not, and there is a large number of variations between different approaches. This chapter explains segmentation and its special case, cell segmentation. In this chapter, segmentation typically refers to automated cell segmentation. Further, the chapter covers the most prominent methods for segmenting fluorescence images of mammalian cells.

4.1 Image segmentation

Segmentation is a process of dividing an image into regions. The division is based on the properties of target regions. When an image is segmented it is divided into components of interest. (Mitiche & Ayed 2009)

Let us consider the following task: we have an image full of letters, as demonstrated in Figure 4.1(a). Each letter has a color that is not white, while the background is white. We want to know how many letters there are and what the exact borders of each letter are. The result of this process, which we call segmentation, can be seen in Figure 4.1(b), where the letters are separated by a black background. A reason for this task could be, for example, that we eventually want to recognize each letter. However, the reason does not matter, as long as the task is clear.
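For this idealized letter board, counting and delineating the letters can be sketched as connected-component labeling of the non-white pixels. The sketch below assumes a grayscale image stored as nested lists, a pure white background with value 255, and letters that never touch each other. None of these assumptions hold for real cell images, which is exactly where the difficulty lies.

```python
from collections import deque

WHITE = 255  # assumed background value


def label_regions(image, background=WHITE):
    """Label 4-connected groups of non-background pixels; returns (count, labels)."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for row in range(height):
        for col in range(width):
            if image[row][col] == background or labels[row][col]:
                continue
            count += 1                      # a new, not yet labeled region starts here
            labels[row][col] = count
            queue = deque([(row, col)])
            while queue:                    # breadth-first flood fill of the region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] != background and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return count, labels


# Tiny hypothetical board with two "letters" of different colors.
board = [
    [255, 255, 255, 255, 255, 255],
    [255,  40,  40, 255,  90, 255],
    [255,  40, 255, 255,  90, 255],
    [255, 255, 255, 255,  90, 255],
]
letter_count, letter_labels = label_regions(board)
print(letter_count)
```

The label array gives the exact extent of each region, and the borders are the labeled pixels that have a background neighbor.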

One interesting property of this task is that it is not difficult to do manually. It is easy to assume that most people given this task will immediately understand the concept and be able to complete it. This is in contrast to many other problems that we might have. For example, finding each occurrence of the number series "1234" amongst the first hundred thousand digits of pi is relatively easy to accomplish with a computer but a horrible errand to do manually. Interestingly, the segmentation task is not as trivial for a computer.

The true difficulty of a task comes from how difficult it is to define formally. For a person to be able to understand the task immediately, several years of learning have actually been done beforehand. This includes understanding concepts such as letter, surface and reflection, color, and border. Defining all these concepts formally would be tedious at best. This is the reason why you cannot instruct a computer to find letter borders in an image as easily as you can instruct it to find 1234 in pi.

Figure 4.1 A board full of letters to demonstrate segmentation. (a) Normal image. (b) Letters detected and separated from each other with a black background.

4.2 Properties of eukaryotic cell images

Cell images vary in quality. Figure 4.2 shows some different fluorescence cell images. These images are at 10x magnification. The upper images have two channels, red and green. The lower images have only a green channel. However, there are other differences between the images.

Eukaryotic cell cultures can be of different sizes. Therefore, it is very typical that the regions in a fluorescence image of mammalian cells differ in size. In the HEK293 cells of Figure 4.2(c) we can also see clustering. It is also not atypical that the whole cell image is one cluster of cells.

Cells can have different shapes. Both circular and non-circular shapes can be observed in the COS-7 image in Figure 4.2(d). Most of the circular regions in that image are actually not cells but remnants of fluorescence inside the cell. However, nuclei (Figure 4.3, blue channel) typically have very similar shapes. As cells have only one nucleus per cell, the nucleus is important. It is a viable option to first concentrate on detecting the cell nucleus accurately and afterward propagate to the actual cell borders, a pipeline supported by Carpenter et al. (2006). Of course, this is viable only if the nucleus is marked.

Figure 4.2 Example 10x magnification images from different cell lines: (a) U2OS, (b) MCF-7, (c) HEK, (d) COS-7. Conditions of the cells and the purpose of these images may vary.

4.3 Typical approaches in cell segmentation

This section introduces different cell segmentation methods. The goal of the section is not to answer how cell segmentation should be done, but instead how it can be done. It tries to provide the best possible view of the currently most prominent and most used methods. Figure 4.4 shows a typical approach to segmentation.

As is common in image processing, segmentation typically starts with something that can be considered preprocessing. In cell segmentation, preprocessing usually prepares images for later steps. One goal is to obtain a good estimate of what is foreground and what is background. Methods that give under-segmented results are considered preprocessing in this thesis. Under-segmentation happens when cells are clustered. If cells are clearly separated from each other by background, segmentation is easy.

Figure 4.3 Split image of cells with three fluorescent channels: blue, green, and red. The unsplit image is in the bottom right.

Figure 4.4 A typical approach to cell segmentation with example methods. The steps, with an example method for each, are: preprocessing (BM3D), segmenting separated regions (Otsu), segmenting clustered regions (watershed), and propagation from previous steps (Voronoi). First, the image is preprocessed; then the simplest segmentation is performed, in which separate clusters are identified. Once clusters are identified, cells are detected within the clusters. This can cause over-segmentation, so the pieces are re-merged afterward. For example, BM3D is applied first, and then the Otsu threshold separates background from foreground. The foreground clusters could then be separated from each other with the watershed, and possible over-segmentations would be fixed with Voronoi.
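The thresholding step of this pipeline can be sketched with Otsu's method, which picks the gray level that maximizes the variance between the background and foreground classes. The following is a minimal pure-Python sketch for a flat list of 8-bit pixel values. The function name and the sample pixels are hypothetical, and a real pipeline would typically use an image-processing library instead.

```python
def otsu_threshold(pixels, levels=256):
    """Return the level t that maximizes between-class variance (background: value <= t)."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for threshold in range(levels):
        weight_bg += histogram[threshold]
        sum_bg += threshold * histogram[threshold]
        if weight_bg == 0:
            continue                        # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                           # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_variance:
            best_variance, best_threshold = between, threshold
    return best_threshold


# Hypothetical image: dark background around 10-12, bright cells around 200-210.
pixels = [10] * 50 + [12] * 50 + [200] * 30 + [210] * 30
threshold = otsu_threshold(pixels)
foreground_mask = [value > threshold for value in pixels]
print(threshold, sum(foreground_mask))
```

The resulting mask only separates foreground from background; clustered cells remain merged in it, which is why the later steps of the pipeline are needed.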

Figure 4.5 Tree showing the diversity of image denoising methods. Node labels in the tree: Image Denoising Methods; Spatial Domain; Transform Domain; Linear; Non-Linear; Non-Data Adaptive Transform; Data Adaptive Transform; Wavelet Domain; Spatial Frequency Domain; Linear Filtering; Non-Linear Threshold Filtering; Non-Orthogonal Wavelet Transform; Wavelet Coefficient Model; Non-Adaptive; Adaptive; Deterministic; Statistical; Marginal; Joint. Adapted from Motwani et al. (2004).

Segmenting clustered cells is typically a more difficult task. The methods used for it might be too aggressive in dividing the cluster, which can result in over-segmentation. When this happens with the chosen method, some other method can be applied to merge some of the pieces.

4.4 Denoising

Image denoising is an important preprocessing step in fluorescence image processing, as fluorescence images commonly have a low signal-to-noise ratio (Chowdhury et al. 2013). Denoising is a process that removes noise. Noise can easily cause incorrect areas of an image to be recognized as a cell. Image denoising is a large field, as demonstrated in Figure 4.5, and covering it fully is not within the scope of this thesis. The focus is on the fundamentals of the concept and the most recent developments.

In practical terms, noise is an unwanted property of an image. To understand what is meant by noise, we need to define it for our context: noise is an unwanted property of a signal, caused by something during or after signal capture, which has a negative effect on the task at hand. However, something that exists independently of signal capture should not be considered noise. Unwanted properties due to the range chosen for capture should not be considered noise either.

Our signal is an image. High-frequency fluctuations and low-frequency background changes are the most common types of noise that are processed out, as is done in Chowdhury et al. (2013) and Wählby et al. (2002), respectively. Some unwanted properties are not considered noise, e.g., cell properties (such as clustering and size), saturation, and cell-like objects that are not cells. If we consider images as two-dimensional arrays, noise can also be expressed as

\[ E_{N \times M} = G_{N \times M} - F_{N \times M} \tag{4.1} \]

where E is the noise, G is the obtained image, and F is the real representation of the image. N and M represent the height and width of the image.

4.4.1 High frequency removal

Several methods have been proposed over the years for removing high-frequency noise, i.e., for getting closer to F in Equation 4.1. A very simple method is the mean filter. In mean filtering, each pixel is treated as the center of an L-by-L array, where L is odd (for convenience, so that the array has a center point). The mean of that array is calculated and placed into the filtered image. This can be expressed as

\[ a_{i,j} = \frac{1}{L^2} \sum_{m=i-k}^{i+k} \sum_{n=j-k}^{j+k} g_{m,n} \quad \text{and} \quad k = \frac{L-1}{2} \tag{4.2} \]

where g_i,j is a pixel of the noisy N-by-M image G and a_i,j is a pixel of the N-by-M mean filtering result A. In addition, a decision has to be made on how to handle borders. Mean filtering removes information from the image, and running it several times will remove almost all the information from the image. High-frequency information is, however, removed first.
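Equation 4.2 can be sketched in pure Python as follows. The image is assumed to be a nested list of numbers, and pixels outside the border are treated as zero. That border choice is one of several possible decisions and is an assumption of this sketch.

```python
def mean_filter(image, window=3):
    """Mean filter of Equation 4.2; out-of-border pixels count as zero (assumption)."""
    if window % 2 == 0:
        raise ValueError("window size L must be odd")
    k = (window - 1) // 2
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            for m in range(i - k, i + k + 1):
                for n in range(j - k, j + k + 1):
                    if 0 <= m < height and 0 <= n < width:
                        total += image[m][n]
            result[i][j] = total / (window * window)
    return result


# A single bright pixel is spread evenly over its 3-by-3 neighborhood.
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(mean_filter(spike))
```

The example shows the loss of information mentioned above: the sharp spike disappears after a single pass, and repeated passes flatten the image further.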

The median filter is similar to the mean filter, but the median of the array is used instead of the mean:

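\[ \operatorname{Median} D = \begin{cases} d_n, & \text{if } D = \{d_1, d_2, d_3, \dots, d_{2n-1}\} \text{ and } d_i \le d_{i+1} \\ \dfrac{d_n + d_{n+1}}{2}, & \text{if } D = \{d_1, d_2, d_3, \dots, d_{2n}\} \text{ and } d_i \le d_{i+1} \end{cases} \tag{4.3} \]

\[ b_{i,j} = \operatorname{Median} \begin{pmatrix} g_{i-k,j-k} & \cdots & g_{i-k,j+k} \\ \vdots & \ddots & \vdots \\ g_{i+k,j-k} & \cdots & g_{i+k,j+k} \end{pmatrix}, \quad k = \frac{L-1}{2} \tag{4.4} \]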

where B (an N-by-M array with elements b_i,j) is the result of median filtering. As with the mean filter, a strategy has to be chosen for border areas. For example, the borders of G can be extended by length k in each direction and filled with zeros, with the closest value to the location, or with values from the opposite side of G.
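The median filter of Equation 4.4 and the three border strategies can be sketched as follows. The mode names "zero", "nearest", and "wrap" are hypothetical labels chosen for this sketch, and the image is assumed to be a nested list of numbers.

```python
from statistics import median


def padded_value(image, m, n, mode):
    """Pixel value at (m, n), using the chosen border strategy outside the image."""
    height, width = len(image), len(image[0])
    if 0 <= m < height and 0 <= n < width:
        return image[m][n]
    if mode == "zero":                      # fill with zeros
        return 0
    if mode == "nearest":                   # closest value to the location
        return image[min(max(m, 0), height - 1)][min(max(n, 0), width - 1)]
    if mode == "wrap":                      # values from the opposite side
        return image[m % height][n % width]
    raise ValueError("unknown border mode: " + mode)


def median_filter(image, window=3, mode="zero"):
    """Median filter of Equation 4.4 with a selectable border strategy."""
    if window % 2 == 0:
        raise ValueError("window size L must be odd")
    k = (window - 1) // 2
    height, width = len(image), len(image[0])
    return [
        [
            median(
                padded_value(image, m, n, mode)
                for m in range(i - k, i + k + 1)
                for n in range(j - k, j + k + 1)
            )
            for j in range(width)
        ]
        for i in range(height)
    ]


# One outlier pixel in an otherwise flat image.
noisy = [[5, 5, 5], [5, 100, 5], [5, 5, 5]]
print(median_filter(noisy, mode="nearest"))
print(median_filter(noisy, mode="zero"))
```

Comparing the two printed results shows why the border choice matters: zero filling darkens the corners of the image, while repeating the closest value does not.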

Both the median filter and the mean filter are simple in many respects, but in certain situations A ≈ F and B ≈ F. Let us consider a more mature approach, the two-dimensional adaptive Wiener filter as presented in Lim (1990, pp. 538, 539). It approximates F with

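\[ w_{i,j} = a_{i,j} + \frac{\sigma_{i,j}^2}{\sigma_{i,j}^2 + \sigma_E^2} \left( g_{i,j} - a_{i,j} \right) \tag{4.5} \]

\[ \sigma_{i,j}^2 + \sigma_E^2 = \frac{1}{L^2} \sum_{m=i-k}^{i+k} \sum_{n=j-k}^{j+k} \left( g_{m,n} - a_{i,j} \right)^2, \quad k = \frac{L-1}{2} \tag{4.6} \]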

where w_i,j is a pixel of the processed image W ≈ F, σ²_i,j is the local signal variance around pixel (i, j), and σ²_E is the variance of the noise E. The filter is adaptive in the sense that it adapts to local areas while also adjusting to the whole image.
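A minimal sketch of the adaptive Wiener filter follows. It makes two assumptions that go beyond the equations: border pixels are repeated to fill the window, and the estimated signal variance is clipped at zero when the local variance falls below the noise variance. The local variance computed in the code corresponds to the left-hand side of Equation 4.6.

```python
def adaptive_wiener(image, noise_variance, window=3):
    """Pixelwise adaptive Wiener filter following Equations 4.5 and 4.6."""
    if window % 2 == 0:
        raise ValueError("window size L must be odd")
    k = (window - 1) // 2
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Gather the L-by-L window, repeating edge pixels at the border (assumption).
            values = [
                image[min(max(m, 0), height - 1)][min(max(n, 0), width - 1)]
                for m in range(i - k, i + k + 1)
                for n in range(j - k, j + k + 1)
            ]
            local_mean = sum(values) / len(values)
            local_variance = sum((v - local_mean) ** 2 for v in values) / len(values)
            # Signal variance is clipped at zero so the gain stays within [0, 1].
            signal_variance = max(local_variance - noise_variance, 0.0)
            denominator = signal_variance + noise_variance
            gain = signal_variance / denominator if denominator > 0 else 0.0
            result[i][j] = local_mean + gain * (image[i][j] - local_mean)
    return result


# Hypothetical 3-by-3 image with one bright pixel and an assumed noise variance of 400.
spike = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(adaptive_wiener(spike, noise_variance=400.0)[1][1])
```

When the local variance is large compared with the noise variance, the gain approaches one and the pixel is kept; in flat regions the gain approaches zero and the output falls back to the local mean.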

Computing power has increased considerably since 1980. Using this new resource, more complex approaches can be developed. The block-matching and 3D filtering (BM3D) algorithm presented in Dabov et al. (2006) is a method, novel in cell segmentation, that utilizes the increased computational power. It searches for similar areas in the image and processes them together as stacks. BM3D uses a two-phase filtering process to denoise groups of blocks from the image based on their similarities. A cell segmentation tool developed by Chowdhury et al. (2013) uses BM3D as its denoising method.

BM3D consists of several parts that can be changed. There are defining features, though, that separate it from other methods and from the methods it uses. First, the image is divided into blocks, and for each block a stack of similar blocks is searched for. The stack is collected for each block in the image separately, but every block is of the same size (e.g., eight-by-eight). For each stack, a three-dimensional transform is performed. A three-dimensional transform consists of three one-dimensional transforms applied in sequence, one along each dimension. In BM3D, the same one-dimensional transform is applied along both image dimensions (i.e., a two-dimensional transform), and the third, which can be a different transform, is applied along the dimension of the stack. The transformed 3D array is then filtered and inversely transformed.
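The stacking step alone can be sketched as an exhaustive search for the blocks most similar to a reference block. This is a simplified illustration and not the BM3D implementation: the function names are hypothetical, similarity is taken to be the sum of squared differences, the whole image is searched, and the transform and filtering steps are omitted.

```python
def extract_block(image, top, left, size):
    """Return the size-by-size block whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def block_distance(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def match_blocks(image, top, left, size=2, max_matches=4):
    """Positions of the blocks most similar to the reference block, best first."""
    reference = extract_block(image, top, left, size)
    candidates = []
    for row in range(len(image) - size + 1):
        for col in range(len(image[0]) - size + 1):
            distance = block_distance(reference, extract_block(image, row, col, size))
            candidates.append((distance, row, col))
    candidates.sort()
    return [(row, col) for _, row, col in candidates[:max_matches]]


# Hypothetical image in which the top-left 2-by-2 pattern repeats at column 3.
image = [
    [1, 2, 9, 1, 2],
    [3, 4, 9, 3, 4],
    [9, 9, 9, 9, 9],
]
print(match_blocks(image, 0, 0, size=2, max_matches=2))
```

The returned positions define the stack. In the full algorithm, the blocks at these positions would be piled into a 3D array before the transform is applied.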

Figure 4.6 Results of different denoising methods. (a) and (e) are the original images, and to their right are the filtered images: (b) and (f) median filter, (c) and (g) Wiener filter, (d) and (h) BM3D. The original images have artificially added Gaussian noise in each pixel; the top row has σ = 2^7 and the bottom row σ = 2^8. Panel variances: (a) σ² ≈ 6250, (b) σ² ≈ 792, (c) σ² ≈ 294, (d) σ² ≈ 2.33·10⁻⁴, (e) σ² ≈ 11000, (f) σ² ≈ 1250, (g) σ² ≈ 228, (h) σ² ≈ 6.63·10⁻⁴.

Now we have produced a new image, which is not the final result but is instead used as the input for a new stacking. The stack locations found from this image are used, but with pixels from the original image. Then the three-dimensional transform is applied again, followed by filtering. The final result is obtained by applying the inverse transformation.

Several methods can be used within BM3D. The transform can be any orthonormal transform (Dabov et al. 2008), e.g., the discrete cosine transform (Ahmed et al. 1974) or the Haar transform (Haar 1910 cited in Heil & Walnut 2009). In Dabov et al. (2006), the discrete cosine transform was used as the two-dimensional transform and the Haar transform as the one-dimensional transform. For filtering, Dabov et al. (2006) first used thresholding with a constant value calculated from properties of the image. In the second phase, Wiener filtering was used, adapted to the result of the first phase.

The difference between the median filter, the Wiener filter, and BM3D is demonstrated in Figure 4.6. The results were generated using MATLAB ¹ and the BM3D MATLAB software ². The median filter uses a 9-by-9 array as the frame of neighbors and substitutes zeros for values outside the border. The Wiener filter uses a 9-by-9 array as the frame of neighbors and was given the standard deviation. BM3D was given the standard deviation that was used to generate the noise. The MATLAB code used is shown in Program 4.1. The base image is the green color channel of Figure 4.2(d).

¹ R2013b, The MathWorks Inc., Natick, MA, U.S.

² http://www.cs.tut.fi/~foi/GCF-BM3D/

    im = imread('cell_image');                  % Load image
    g = rgb2gray(im);                           % Change color image to grayscale image
    sigma = 2^7;                                % Define standard deviation (std)
    noise = randn(size(g)) * sigma;             % Gaussian noise with mean 0, std 2^7
    % Noise is added to the grayscale image
    grayImage = uint8(double(g) + noise);
    % Median filter, neighbors 9-by-9
    medianResult = medfilt2(grayImage, [9 9]);
    % Adaptive Wiener 2D filter
    wienerResult = wiener2(grayImage, [9 9], sigma);
    % BM3D filter, argument description: 1st informs that the correct image is
    % not provided, 4th describes the parameter set (fast or normal profile,
    % normal was chosen), 5th means nothing is plotted or printed by the
    % function. The denoised image is assumed to be the second output of BM3D.m.
    [~, BM3DResult] = BM3D(1, grayImage, sigma, 'np', 0);

Program 4.1 MATLAB code and parameters used to produce the results in the top row of Figure 4.6. By changing the standard deviation to 2^8, the bottom row of Figure 4.6 can be reproduced.

4.4.2 Low frequency noise removal

Noise can take different forms. The background of an image can have large artifacts. Removing these is important for segmentation, as they do not represent the actual cell. If the background is accurately detected, subtracting it from the image removes it. There are several ways to do background detection. One way is to use the methods presented in the previous section, but to use them to detect larger objects and regard the cells as noise.
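The idea of reusing a smoothing filter for background detection can be sketched with a median filter whose window is larger than the objects of interest, so that the cells are filtered out as if they were noise and only the slowly changing background remains. The function names, the window size, and the test image below are hypothetical, and border pixels are repeated as an assumption.

```python
from statistics import median


def estimate_background(image, window):
    """Median-filter the image with a window larger than the cells (border repeated)."""
    k = (window - 1) // 2
    height, width = len(image), len(image[0])
    return [
        [
            median(
                image[min(max(m, 0), height - 1)][min(max(n, 0), width - 1)]
                for m in range(i - k, i + k + 1)
                for n in range(j - k, j + k + 1)
            )
            for j in range(width)
        ]
        for i in range(height)
    ]


def subtract_background(image, window=3):
    """Subtract the estimated background and clip negative values to zero."""
    background = estimate_background(image, window)
    return [
        [max(pixel - level, 0) for pixel, level in zip(image_row, background_row)]
        for image_row, background_row in zip(image, background)
    ]


# Hypothetical image: a left-to-right intensity ramp plus one bright "cell" pixel.
ramp = [[10 * col for col in range(5)] for _ in range(5)]
ramp[2][2] += 100
cleaned = subtract_background(ramp, window=3)
print(cleaned[2][2], sum(sum(row) for row in cleaned))
```

In this constructed example the ramp is removed and only the bright pixel remains. With real images, the window has to be chosen relative to the cell size, which is a tuning decision this sketch does not address.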

These are not the only available approaches. One approach developed specifically for background subtraction was presented in Wählby et al. (2002). In this method, the background is assumed to be smooth and slowly changing. The background estimate is improved iteratively until it forms the best match for the background. The method was not tested with a large dataset, so it is unclear how well it would perform in different conditions. It is most likely effective only for removing this certain type of noise. (Wählby et al. 2002)
