
ALI ABDOLLAHZADEH

Automated 3D segmentation and morphometry of white matter ultrastructures

Dissertations in Health Sciences

PUBLICATIONS OF THE UNIVERSITY OF EASTERN FINLAND


AUTOMATED 3D SEGMENTATION AND MORPHOMETRY OF WHITE MATTER ULTRASTRUCTURES


Ali Abdollahzadeh

AUTOMATED 3D SEGMENTATION AND MORPHOMETRY OF WHITE MATTER ULTRASTRUCTURES

To be presented by permission of the Faculty of Health Sciences, University of Eastern Finland, for public examination in MS302 Auditorium, Kuopio, on December 3rd, 2021, at 16:00

Publications of the University of Eastern Finland Dissertations in Health Sciences

No 662

A.I. Virtanen Institute for Molecular Sciences University of Eastern Finland, Kuopio

2021


Series Editors

Professor Tomi Laitinen, M.D., Ph.D.

Institute of Clinical Medicine, Clinical Physiology and Nuclear Medicine Faculty of Health Sciences

Professor Tarja Kvist, Ph.D.

Department of Nursing Science Faculty of Health Sciences

Professor Ville Leinonen, M.D., Ph.D.

Institute of Clinical Medicine, Neurosurgery Faculty of Health Sciences

Professor Tarja Malm, Ph.D.

A.I. Virtanen Institute for Molecular Sciences Faculty of Health Sciences

Lecturer Veli-Pekka Ranta, Ph.D.

School of Pharmacy Faculty of Health Sciences

Distributor:

University of Eastern Finland Kuopio Campus Library

P.O.Box 1627 FI-70211 Kuopio, Finland

www.uef.fi/kirjasto

PunaMusta Oy Joensuu, 2021

ISBN: 978-952-61-4384-2 (print/nid.)
ISBN: 978-952-61-4385-9 (PDF)
ISSN-L: 1798-5706
ISSN: 1798-5706
ISSN: 1798-5714 (PDF)


Author’s address: A.I. Virtanen Institute for Molecular Sciences University of Eastern Finland

KUOPIO FINLAND

Doctoral programme: Doctoral Programme in Molecular Medicine

Supervisors: Docent Alejandra Sierra, Ph.D.

A.I. Virtanen Institute for Molecular Sciences University of Eastern Finland

KUOPIO FINLAND

Professor Jussi Tohka, Ph.D.

A.I. Virtanen Institute for Molecular Sciences University of Eastern Finland

KUOPIO FINLAND

Reviewers: Assistant Professor Juho Kannala, Ph.D.

Department of Computer Science Aalto University

Espoo Finland

Anna Kreshuk, Ph.D.

Interdisciplinary Center for Scientific Computing University of Heidelberg

Heidelberg Germany

Opponent: Professor Tolga Tasdizen, Ph.D.

Department of Electrical and Computer Engineering University of Utah

Salt Lake City, Utah United States


Abdollahzadeh, Ali

Automated 3D segmentation and morphometry of white matter ultrastructures Kuopio: University of Eastern Finland

Publications of the University of Eastern Finland Dissertations in Health Sciences 662. 2021, 82 p.

ISBN: 978-952-61-4384-2 (print) ISSNL: 1798-5706

ISSN: 1798-5706

ISBN: 978-952-61-4385-9 (PDF) ISSN: 1798-5714 (PDF)

ABSTRACT

Two-dimensional electron microscopy (EM) has traditionally been the method of choice when examining certain properties of the brain tissue, e.g., when

measuring cellular aspects of both healthy and diseased brains. Recently, advances in EM imaging have made it possible to obtain three-dimensional (3D) image datasets by employing serial-sectioning methods, allowing for the

acquisition of gigabyte- to petabyte-sized stacks of serial EM images at nanometer resolutions from hundreds of micrometers of brain tissue. Analyzing the

morphology of brain ultrastructures in the acquired large EM volumes requires segmenting the individual ultrastructural components, which is a tedious task when performed manually or semi-automatically and would clearly benefit from the development of automated segmentation techniques.

In this thesis, we developed automated segmentation techniques that trace the entirety of brain ultrastructures in EM volumes of white matter. We developed the AutomatiC 3D Segmentation of axONs (ACSON) pipeline based on a region

growing algorithm that was capable of segmenting myelin, myelinated axons, mitochondria, and cell nuclei in small field-of-view, high-resolution EM datasets. We also developed the DeepACSON pipeline, which performed convolutional neural network-based semantic segmentation and shape decomposition-based instance segmentation, annotating large field-of-view, low-resolution EM datasets of white matter into their ultrastructural components. DeepACSON proved able to segment hundreds of thousands of long-span myelinated axons, thousands of cell nuclei, and millions of mitochondria. The DeepACSON instance segmentation was based on a novel cylindrical shape decomposition technique that we developed


exploiting the tubularity of myelinated axons, which decomposed under-segmented myelinated axons into their constituent axons.

The ACSON and DeepACSON pipelines segmented EM volumes of rat white matter after a sham operation or a traumatic brain injury with excellent evaluation scores.

These techniques have made it possible to acquire a comprehensive 3D morphometry of white matter ultrastructures, capturing pathomorphological alterations in the white matter at the nanoscopic level. Our 3D morphology

analyses demonstrated that the diameter of myelinated axons varied substantially along their longitudinal axis, and furthermore that the cross-sections of

myelinated axons were more elliptic than circular. In addition, our findings indicated that even five months after the brain injury, there were persistent changes in the diameter and tortuosity of the axons, as well as in the density of myelinated axons.

Medical Subject Headings: Animals; Axons; Brain; Brain Injuries, Traumatic; Cell Nucleus; Deep Learning; Imaging, Three-Dimensional; Microscopy, Electron;

Mitochondria; Myelin Sheath; Neural Networks, Computer; White Matter

Yleinen suomalainen ontologia (General Finnish Ontology): brain; brain injuries; axons; electron microscopy; animal experiments; neurons; imaging; mitochondria; neural networks; deep learning; cell nucleus; white matter; 3D modelling


ACKNOWLEDGEMENTS

This thesis was conducted in the Multiscale Imaging Group at the A.I. Virtanen Institute for Molecular Sciences, the University of Eastern Finland, 2017-2021.

First and foremost, I want to thank my principal supervisor, Adjunct Professor Alejandra Sierra, Ph.D.; it was my privilege to work in her multidisciplinary team, where I was introduced to multi-scale brain imaging and volume electron

microscopy. For this and for her invaluable advice and guidance, endless patience, and for everything that she taught me, I will always be grateful. Equally, I want to thank my second supervisor Professor Jussi Tohka, Ph.D.; it was an honor to work with him. His remarkable knowledge of mathematics and medical image analysis was a source of inspiration for me, and it assisted me in making my own advances in these fields; for this, I will always be indebted to him.

I wish to thank my thesis pre-examiners, Assistant Professor Juho Kannala, Ph.D., and Anna Kreshuk, Ph.D., for reviewing my thesis and their encouraging evaluation statements. I thank Ewen MacDonald, Ph.D., for the language revision.

I want to express my gratitude to Professor Olli Gröhn, Ph.D., for his

exceptional MR imaging work and the scientific enthusiasm that has inspired all his students, including myself, during these years. I acknowledge the scientific contribution of my co-authors, Ilya Belevich, Ph.D., and Eija Jokitalo, Ph.D., and Maarit Pulkkinen’s help with animal handling for the publications in this thesis. I thank my colleagues in BIU for their excellent seminars and discussions, especially Raimo Salo, Ph.D., Elias Ylä-Herttuala, Ph.D., Ekaterina Paasonen, Ph.D., Viivi Hyppönen, M.Sc., Miguel Valverde, M.Sc., Vandad Imani, M.Sc., and Lenka Dvoráková, M.Sc., who made working in the Virtanen Institute an extremely memorable and pleasant experience.

I thank my friends, some of whom I have not seen in years but with whom I have twenty cherished years of friendship, and whose phone calls help me forget about work for a time; I also thank the friends whom I have come to know during my time in Finland for their company in these years. But most of all, I thank Lucia Gräschke for her constant support and friendship throughout these years and, of course, my cat, Bernie, and the time that I had the chance to serve His Majesty.

Finally, I owe my deepest thanks to my family, my brother and sisters and their families, and above all to my parents, Asef and Fatemeh - their unconditional love and unwavering support have encouraged me all through my life.


This work was funded by the Academy of Finland, Doctoral Programme in Molecular Medicine, and National Institutes of Health (NIH).

Kuopio, October 2021, Ali Abdollahzadeh


LIST OF ORIGINAL PUBLICATIONS

This dissertation is based on the following original publications:

I Abdollahzadeh, A., Belevich, I., Jokitalo, E., Tohka, J. & Sierra, A. Automated 3D Axonal Morphometry of White Matter. Sci. Rep. 9, 6084 (2019).

II Abdollahzadeh, A., Belevich, I., Jokitalo, E., Sierra, A. & Tohka, J. DeepACSON Automated Segmentation of White Matter in 3D Electron Microscopy.

Communications Biology 4 (1), 179 (2021).

III Abdollahzadeh, A., Sierra, A. & Tohka, J. Cylindrical Shape Decomposition for 3D Segmentation of Tubular Objects. IEEE Access 9, 23979–23995 (2021).

These articles were published under a Creative Commons Attribution License (CC BY).


CONTENTS

ABSTRACT ... 7

ACKNOWLEDGEMENTS ... 9

1 INTRODUCTION ... 17

2 REVIEW OF THE LITERATURE ... 19

2.1 White matter ultrastructures ... 19

2.1.1 Axon ... 19

2.1.2 Myelin ... 20

2.2 Electron microscopy ... 20

2.2.1 Scanning electron microscopy ... 22

2.2.2 Serial block-face scanning electron microscopy ... 23

2.3 Image segmentation ... 23

2.3.1 Preliminaries ... 24

2.3.2 Seeded region growing ... 24

2.3.3 Deformable models ... 25

2.3.3.1 Snakes: active contour models ... 25

2.3.3.2 Level set methods ... 26

2.3.4 Shape decomposition ... 27

2.3.4.1 Simple primitives ... 28

2.3.4.2 Semantic components ... 28

2.3.5 Neural networks ... 29

2.3.5.1 Multilayer perceptron ... 30

2.3.5.2 Convolutional neural networks ... 30

2.3.6 Image segmentation in electron microscopy ... 34

3 AIM OF THE STUDY ... 37

4 MATERIALS AND METHODS ... 39

4.1 Sample preparation and image acquisition ... 39

4.1.1 Animals ... 39

4.1.2 The traumatic brain injury model ... 39

4.1.3 Tissue preparation for SBEM ... 39

4.1.4 SBEM data ... 40

4.2 Segmentation evaluation metrics ... 42

4.3 Statistics and reproducibility ... 43

5 RESULTS ... 45


5.1 Pre-processing: data alignment and denoising ... 45

5.2 ACSON segmentation pipeline ... 45

5.2.1 Segmentation post-processing ... 46

5.2.2 Annotating subcellular structures and cells ... 47

5.3 DeepACSON segmentation pipeline ... 47

5.3.3 Instance segmentation of myelin, myelinated axons, and mitochondria ... 49

5.3.4 Instance segmentation of cell nuclei ... 49

5.4 Cylindrical shape decomposition ... 50

5.4.1 Skeleton partitioning ... 50

5.4.1.1 Curve skeleton ... 50

5.4.1.2 Skeleton graph decomposition ... 52

5.4.2 Cylindrical decomposition ... 52

5.4.2.1 Decomposition interval ... 52

5.4.2.2 Critical point ... 52

5.4.3 Object reconstruction ... 53

5.5 Experimental results ... 53

5.5.1 Segmentation of white matter ultrastructures ... 53

5.5.2 White matter morphology analysis ... 54

5.5.2.1 Substantial differences between 2D and 3D morphological analyses ... 56

5.5.2.2 Ultrastructural analysis of white matter in traumatic brain injury ... 57

5.5.3 Evaluations ... 60

5.5.4 Computation time ... 64

6 DISCUSSION ... 67

7 SUMMARY AND CONCLUSIONS ... 71

REFERENCES ... 73


ABBREVIATIONS

2D Two-dimensional

3D Three-dimensional

ACD Approximate Convex Decomposition

ACSON Automatic Segmentation of Axons

ARE Adapted Rand Error

ATUM Automated Tape-Collecting Ultramicrotomy

BE Boundary Error

CNN Convolutional Neural Network

CNS Central Nervous System

CSD Cylindrical Shape Decomposition

DNN Deep Neural Network

EM Electron Microscopy

FCN Fully Convolutional Network

FFN Flood Filling Network

FIB-SEM Focused Ion Beam-Scanning Electron Microscopy

GCD Generalized Cylinder Decomposition

IID Independent and Identically Distributed


LOGO Leave One Group Out

MIB Microscopy Image Browser

MLP Multilayer Perceptron

NN Neural Network

RE Rand Error

SBEM Serial Block-Face Scanning Electron Microscopy

SEM Scanning Electron Microscopy

SLIC Simple Linear Iterative Clustering

SRG Seeded Region Growing

SVM Support Vector Machine

TBI Traumatic Brain Injury

TEM Transmission Electron Microscopy

TEMCA Transmission Electron Microscope Camera Array

VOI Variance of Information


1 INTRODUCTION

Quantitative analyses of ultrastructures in health and disease are essential for understanding brain anatomy and function and brain disease mechanisms, and for finding potential biomarkers to monitor disease progression. Electron microscopy (EM) has traditionally been the method of choice to access brain tissue

ultrastructures and quantify their morphology, distribution, or interactions (Gray, 1959), but the EM approach only provides 2-dimensional (2D) imaging; hence it gives a 2D representation of 3-dimensional (3D) structures. Recently, advanced EM techniques have enabled 3D imaging by employing serial-sectioning methods (except for tilt-series-based tomography) such as surface removal, surface milling, or ultra-thin sectioning, which make it possible to acquire gigabytes or even

petabytes of serial EM images of the brain tissue at nanometer resolutions (Zheng et al., 2018; Phelps et al., 2021). Serial block-face scanning EM (SBEM) (Denk and Horstmann, 2004) is an example of a serial-sectioning technique that can generate stacks of high-resolution, high-contrast images of the brain tissue for up to several hundred micrometers, with an approximate resolution of ten nm in-plane and 25 to 30 nm in the imaging direction. SBEM combines automated block-face imaging with serial sectioning inside the vacuum chamber of a scanning EM (SEM), where each time an ultramicrotome removes the block face and exposes a new surface for imaging. SBEM exploits backscattering contrast to visualize heavy-metal-stained tissue.

Despite the improved image acquisition rate, acquiring SBEM volumes is still a compromise between the imaging resolution and the field of view. For example, EM imaging of the brain tissue at high resolutions, e.g., 4×4×40 nm³, and large fields of view, e.g., one cubic millimeter, requires microscopes continuously running for several months (Phelps et al., 2021), which is both time-consuming and costly. On the other hand, imaging the same tissue volume at low resolutions, e.g., 50×50×50 nm³, reduces the imaging time and image size by approximately 200-fold.

Therefore, SBEM imaging at high resolutions provides precision to the level of synaptic connectivity, but at the cost of generalization due to small fields of view; imaging at low resolutions, on the other hand, loses ultrastructural details but gains large fields of view, allowing for long neurite tracking and making it possible to analyze the spatial distribution of ultrastructures in the brain.

Analyzing the morphology of brain ultrastructures from the acquired large EM volumes requires annotating individual ultrastructural components, which is


tedious when performed manually or semi-automatically (Berning, Boergens and Helmstaedter, 2015). Therefore, automated algorithms for EM segmentation are in high demand, but automation has proved to be a challenging task because of 1) the large sizes of EM volumes, which restrict applying segmentation techniques with high computational complexity or strategies that require a substantial memory size; 2) discontinuities in resolving cellular membranes due to EM acquisition at low resolutions or staining imperfections; and finally 3) the complexity of the geometry of ultrastructures, millions of which can be tightly packed in a volume smaller than a cubic millimeter.

This thesis aims to develop novel pipelines for automatic 3D segmentation and morphometry of ultrastructures in small field-of-view, high-resolution or large field-of-view, low-resolution SBEM volumes of white matter. The automated pipelines eliminate the need for time-consuming manual segmentation of 3D datasets and enable comprehensive 3D analyses of white matter ultrastructures.


2 REVIEW OF THE LITERATURE

2.1 WHITE MATTER ULTRASTRUCTURES

White matter comprises half of the human brain. It is the area of the central nervous system (CNS) in deep parts of the brain and superficial parts of the spinal cord that is mainly composed of long-span bundles of myelinated axons. In addition, white matter contains oligodendrocytes (glial cells that produce myelin), blood vessels, perivascular cells, and neuronal cell bodies (Duchatel, Shannon Weickert and Tooney, 2019). White matter serves as the basis for nervous system connection and communication, affects learning and brain functions, regulates the distribution of action potentials, and coordinates communication across different brain areas (Douglas Fields, 2008).

2.1.1 Axon

An axon is a long, slender projection of a neuron (Figure 1a) that conducts electrical impulses away from the neuron’s cell body, or soma, toward other neurons. Anatomically, all axons have a beginning (the axon hillock), a middle (the axon proper), and an end called the axon terminal. The axon hillock, or initial segment, is the region where the plasma membrane generates nerve impulses. In humans, the length of an axon can range from less than one millimeter to over a meter, and its diameter may range from about 1 to 25 µm. The speed of the action potential varies depending on the axonal diameter; the impulse travels faster in thicker axons (Bear, Connors and Paradiso, 2007). At the axon terminal, an axon forms junctions, called synapses, with other neurons to pass information (Silverthorn et al., 2018).

Figure 1. The anatomy of a neuron. (a) A typical myelinated axon in the central nervous system (CNS). This figure is from (User:Dhp1080, CC BY-SA 3.0, via Wikimedia Commons). (b) Oligodendrocytes form the myelin sheath around the axons of CNS neurons. This figure was released into the public domain.

2.1.2 Myelin

In the white matter, long axons are generally insulated by the concentric wrapping of oligodendrocytes’ cell processes called the myelin sheath (Figure 1b).

Oligodendrocytes can produce myelin to cover several axons (Bear, Connors and Paradiso, 2007). Myelin comprises approximately 40% water, and the dry mass consists of 60% to 75% lipid and 15% to 25% proteins (Steinman, M.D, 1996).

Myelin reduces the capacitance of the axonal membrane. Thin layers of protein separate the concentric layers of myelin lipids, making myelin a high-resistance, low-capacitance electrical insulator. The myelin sheath is interrupted periodically, leaving short regions called nodes of Ranvier (Figure 1a, b), where the axonal membrane is exposed to the extra-axonal space. This discontinuity in the myelin sheath around axons results in saltatory conduction, in which the action potential propagates down the length of a myelinated axon, jumping from one node of Ranvier to the next and increasing the conduction velocity (Waxman, 2005).

Compared to myelinated axons, unmyelinated axons have a much slower conduction velocity in transferring the action potential.

2.2 ELECTRON MICROSCOPY

The nervous system is densely packed with ultrastructures that can be as thin as 40 to 50 nm in diameter (Meinertzhagen and O’Neil, 1991), and tracing these ultrastructures is required to reconstruct their local connectivity or to estimate their morphology. Light microscopic tracing is only possible via labeling a subset of neurons, and more importantly, the resolution available in light microscopy is insufficient to resolve thin ultrastructures when their diameters are substantially below the light wavelength. The application of standard heavy metals-based EM stains, on the other hand, results in relatively unbiased staining of all membranes and synapses (Peters, Palay and Webster, 1991). EM techniques also provide the spatial resolution to resolve densely packed neighboring neuronal processes for reconstructing neural circuits or estimating neuronal geometry (Denk and Horstmann, 2004).


There are two main types of EM techniques, transmission EM (TEM) and SEM.

TEM emits high-voltage electrons to a thin specimen, and only a fraction of electrons pass through the specimen and are focused onto an electron-sensitive detector to form a 2D cross-sectional image of the specimen. In SEM, low-voltage electrons are emitted to the surface of the specimen in a raster-scan manner, and secondary or back-scattered electrons are detected, creating a 3D appearance of the surface of the specimen.

To obtain true 3D information using TEM, tomography-based techniques (Hoppe, 1981; Baumeister, 2002) tilt an ultrathin tissue section, i.e., approximately 1 µm thick tissue, to obtain high-resolution images of macromolecules or

organelles. The TEM camera array (TEMCA) (Bock et al., 2011) or TEMCA2 (Zheng et al., 2018) are other TEM-based techniques that use serial sectioning, 40 to 90 nm thick tissue cut by an ultramicrotome, to image individual thin sections at high resolutions and acquisition rates. Acquiring 3D EM datasets using SEM includes serial block-face scanning electron microscopy (SBEM) (Denk and Horstmann, 2004), focused ion beam-scanning EM (FIB-SEM) (Knott et al., 2008), and automated tape-collecting ultramicrotomy with SEM (ATUM) (Hayworth et al., 2006). In SBEM, an ultramicrotome cuts the surface of the tissue inside an SEM chamber, and in FIB-SEM, a focused ion beam mills the surface. In ATUM, thin tissue sections are cut by an ultramicrotome, collected on a support tape automatically, and imaged with an SEM.

None of the currently available 3D TEM- and SEM-based techniques can be considered the ultimate approach for 3D EM imaging. Instead, each method has tradeoffs in size and resolution and deciding which method is the most

appropriate is relative to the questions under investigation. For example, FIB-SEM offers the highest z-resolution of 5 nm, but the tissue size is limited to 50–100 µm per side. This can be compared to ATUM and SBEM diamond-knife sectioning, which offers a z-resolution of 20-30 nm but allows for larger tissue sizes: 0.5 mm per side for SBEM and larger than 2 mm for ATUM (Kubota, Sohn and Kawaguchi, 2018).

In this thesis, we used SBEM imaging as the method of choice to acquire EM volumes of the brain tissue ultrastructures. In the following sections, we briefly describe SEM as the backbone of SBEM and then continue with the SBEM imaging technique.


2.2.1 Scanning electron microscopy

Electrons are emitted from an electron gun (thermionic or field emission) and are typically accelerated by low voltages of 0.3 to 4 kV in SEM. As shown in Figure 2a, two or three electromagnetic condenser lenses focus the electron beam into a fine probe that can raster the specimen surface in a selected area. Figure 2b shows that the incident electrons penetrate the specimen surface, where the penetration depth depends on the energy of the electron beam, the atomic masses of the elements in the specimen, and the angle at which the electron beam hits the specimen. The electron-specimen interaction produces secondary, backscattered, and Auger electrons, x-rays, and perhaps light, which various detectors can collect. Secondary electrons are formed just below the sample surface as electrons escape the orbitals of an atom; they reveal the morphology and topography of the specimen surface and provide the highest resolution. Backscattered electrons approach the atomic nucleus and are scattered by a large angle, re-emerging from the surface. There are not as many backscattered electrons as secondary electrons, and they form images with slightly lower resolution than secondary electron images, providing compositional or crystallographic information (Reimer, 1998).

Figure 2. (a) Schematic of a scanning electron microscope. This figure is from (User:Steff, modified by User:ARTEderivative work MarcoTolo, CC BY-SA 1.0, via Wikimedia Commons). (b) Electron-specimen interaction. This figure is from (Ponor, CC BY-SA 4.0, via Wikimedia Commons). (c) Schematic illustration of a serial block-face scanning electron microscope.

2.2.2 Serial block-face scanning electron microscopy

Denk and Horstmann (Denk and Horstmann, 2004) were the first to introduce SBEM, which is now available commercially as the 3View® system (Gatan Inc., Pleasanton, CA, USA). SBEM essentially uses an ultramicrotome inside the

chamber of an SEM. The ultramicrotome cuts through the sample, exposing a new face of the sample. Backscattering contrast is used to visualize the heavy-metal-stained tissue, and low-vacuum conditions prevent any charging of the uncoated block face. This procedure is repeated throughout the entire sample, resulting in a stack of high-resolution, high-contrast images, where each layer can be as thin as 25 to 30 nm. SBEM imaging throughput is fast, approximately 0.5 µs/pixel, and the method is relatively straightforward. Compared to the FIB-SEM technique, SBEM can image a larger area because the diamond knife can cut the entire block

surface. Compared to ATUM, the SBEM technique images the block surface located right under the SEM column so that the acquired images are minimally affected by movements (Kubota, Sohn and Kawaguchi, 2018). Compared to TEMCA, SBEM has a longer throughput time and acquires images in a smaller field of view, but TEMCA is an expensive approach and requires customized features and machines that are not yet commercially available. It is noteworthy that the SBEM imaging technique comes with several aspects that need consideration, e.g., debris can fall on the block surface, the sections can crumble at long dwell-time, or an electric charge can accumulate at the block surface (Kubota, Sohn and Kawaguchi, 2018).

2.3 IMAGE SEGMENTATION

Image segmentation is a well-studied topic in computer vision; it is the process of partitioning a digital image into multiple segments by assigning a label to every pixel/voxel in the image such that pixels/voxels with the same label share specific characteristics. There are two approaches to the image segmentation problem: model-free, or bottom-up, methods and knowledge-based, or top-down, methods. A model-free approach clusters pixels/voxels with a similarity measure, whereas a knowledge-based method constrains the space of admissible solutions based on existing a priori knowledge. We can also distinguish two image segmentation


types based on the output labels: semantic and instance segmentation. Semantic segmentation assigns each voxel to the category of the object it belongs to; instance segmentation, on the other hand, identifies individual objects within the categories and assigns each voxel to a particular instance. For example, in an image of a classroom, segmenting students as one object against the background is a semantic segmentation, and segmenting each student in the image as an individual represents instance segmentation.
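To make the distinction concrete, the following minimal Python sketch (an illustration written for this text, not part of the ACSON/DeepACSON code) derives an instance segmentation from a binary semantic mask by connected-component labeling with SciPy:

```python
import numpy as np
from scipy import ndimage

# A toy semantic segmentation: 1 = "axon" category, 0 = background.
semantic = np.array([[1, 1, 0, 0, 1],
                     [1, 1, 0, 0, 1],
                     [0, 0, 0, 0, 1],
                     [1, 0, 0, 0, 0]])

# Instance segmentation via connected components: each spatially
# separate object of the "axon" category receives its own label.
instances, n_objects = ndimage.label(semantic)
print(n_objects)   # 3 separate objects
print(instances)   # voxels carry instance ids 1, 2, 3
```

Connected-component labeling is only a stand-in for instance segmentation here; touching objects of the same category would be merged into one component, which is exactly the under-segmentation problem that the shape decomposition developed in this thesis addresses.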

Analyzing the morphology of ultrastructures or mapping their connectivity in the acquired EM datasets requires the segmentation of individual ultrastructural components. However, manual segmentation of ultrastructures in 3D EM datasets is a tedious task and consumes thousands of hours of experts’ time. For example, Berning et al. (Berning, Boergens and Helmstaedter, 2015) reported that manual annotation of 215 neurites in 5×10⁸ voxels required 1500 h, and in study I, we estimated that the manual segmentation of 15×15×15 μm³ of white matter ultrastructures would need 2400 h to be completed. Therefore, there is a high demand to develop automated (or semi-automated) segmentation techniques. In the remainder of this section, we review segmentation techniques that have been used for annotating EM datasets in our studies, including region growing methods, deformable models, and convolutional neural network (CNN)-based techniques.

2.3.1 Preliminaries

Image. We define an image as $I: X \to \mathbb{R}$, where $x \in X$ is a 2D (3D) coordinate belonging to the image domain $X \subset \mathbb{Z}^2$ ($X \subset \mathbb{Z}^3$).

Binary image. A binary image is an image where $I: X \to \{0, 1\}$.

Object. An object $\Omega \subset \mathbb{R}^3$ is a nonempty bounded open set, where we assume that its boundary $\partial\Omega$ is homeomorphic to a 2-sphere.

Curve skeleton. Given an object $\Omega$ and its surface $\partial\Omega$, we define the curve skeleton $\Upsilon$ of the object as the locus of centers of maximal inscribed balls. A ball $B(x, r)$ centered at $x \in \Omega$ with radius $r$ is maximally inscribed in $\Omega$ if $\forall B',\; B \subseteq B' \subseteq \Omega \Rightarrow B' = B$.

2.3.2 Seeded region growing

Seeded region growing (SRG) is a model-free segmentation technique. Similar to conventional region growing, SRG postulates the similarity of voxels within regions and is controlled by a small number of voxels known as seeds (Adams and Bischof, 1994). The seeds are grouped into $q$ sets, $S_1, S_2, \dots, S_q$; a set may consist of a single point. SRG appends a voxel to one of the seed sets at each step. We now consider the state of the sets $S_i$ after $m$ steps. Let $H$ be the set of unallocated voxels (voxels without a label) adjacent to at least one of the labeled regions, defined as:

$$H = \left\{ x \notin \bigcup_{i=1}^{q} S_i \;\middle|\; N(x) \cap \bigcup_{i=1}^{q} S_i \neq \emptyset \right\}, \tag{1}$$

where $N(x)$ is the set of immediate neighbors of the voxel $x$. If, for $x \in H$, $N(x)$ meets just one of the $S_i$, then we define $\varphi(x) \in \{1, 2, \dots, q\}$ to be that index such that $N(x) \cap S_{\varphi(x)} \neq \emptyset$. Now, we define $\delta(x)$ to be the measure of how different $x$ is from the region it adjoins. We calculate $\delta(x)$ as:

$$\delta(x) = \left| I(x) - \operatorname*{mean}_{y \in S_{\varphi(x)}} [I(y)] \right|, \tag{2}$$

where $I(x)$ is the intensity value of the image at the testing voxel $x$. If $x$ is a neighbor of two or more labeled regions, $x$ is appended to the region with the minimum $\delta(x)$ (Adams and Bischof, 1994; Fan et al., 2005). The SRG algorithm terminates when all voxels have been allocated to their corresponding regions. The SRG algorithm is rapid and easy to tune; the choice of the similarity metric, the similarity threshold, and the starting seeds determines the segmentation results.

2.3.3 Deformable models

Deformable models have been widely used in biomedical image segmentation since Kass et al. introduced snakes (Kass, Witkin and Terzopoulos, 1988).

Deformable models use an energy functional whose optimum defines a

segmentation. Generally, deformable models are curves/surfaces that deform under internal forces (enforcing the smoothness of the curve/surface) and external forces (driving the curve/surface toward desired features such as strong edges). We can distinguish between parametric deformable models (active contours) and non-parametric deformable models, also called level sets or geometric deformable models.

2.3.3.1 Snakes: active contour models

Kass et al. (Kass, Witkin and Terzopoulos, 1988) called a parametric deformable model that finds salient contours in an image a snake, or an active contour, because the model continuously minimizes its energy functional and exhibits dynamic behavior. Mathematically, we represent a parametric curve with a function $C(s): [0,1] \to \mathbb{R}^2$, where $s \in [0,1]$ is the arc-length parameter. We can write the energy functional of the parametric curve as:

$$E(C) = E_{int}(C) + E_{ext}(C), \tag{3}$$

where the internal energy $E_{int}(C)$ characterizes the tension or the smoothness of the contour, and $E_{ext}(C)$ moves the curve toward an object boundary or other desired features within the image. We can write the internal energy of the curve as:

$$E_{int}(C) = \frac{1}{2} \int_{0}^{1} w_1(s) \left| \frac{\partial C}{\partial s} \right|^2 + w_2(s) \left| \frac{\partial^2 C}{\partial s^2} \right|^2 ds, \tag{4}$$

where $w_1(s)$ controls the first-order term, curve tension, and $w_2(s)$ controls the second-order term, curve rigidity. In practice, the weights $w_1(s)$ and $w_2(s)$ are taken as constant values. The external energy $E_{ext}(C)$ is derived from the image and is used for driving the curve towards points with high gradient magnitude:

$$E_{ext}(C) = \int_{0}^{1} E_{image}(C(s)) \, ds, \tag{5}$$

$$E_{image}(x) = -\left| \nabla \big( G_\sigma(x) * I(x) \big) \right|^2, \tag{6}$$

where $\nabla$ is the gradient operator and $G_\sigma$ is a Gaussian of standard deviation $\sigma$ convolved with the image $I$. Adding the two energy terms means that the active contour is attracted to the edges and constrained by its smoothness.
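As a minimal sketch (our illustration, with an assumed smoothing scale sigma), the image energy of equation (6) can be computed with standard NumPy/SciPy operations:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def snake_image_energy(image, sigma=2.0):
    """External (image) energy of Eq. (6): E_image = -|grad(G_sigma * I)|^2.
    Minima of this energy lie on strong edges, which attract the contour."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)      # gradients along rows and columns
    return -(gx ** 2 + gy ** 2)
```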

Active contours have several limitations; since the curve is affected by local image features, snakes are sensitive to initial conditions and should be located close to the object boundary. Cohen et al. proposed balloons that use a pressure force to increase the attraction range, reducing the sensitivity to the initialization (Cohen, 1991). Furthermore, because of the explicit parameterization of the model, active contours do not easily allow topological changes; e.g., when two neighboring particles are to be assigned to two different regions in an evolution step, complex reparameterization algorithms are required (Montagnat, Delingette and Ayache, 2001). In addition, at high-curvature regions of the propagating front, particles come together, causing numerical instability unless a re-gridding technique is employed.

2.3.3.2 Level set methods

Geometric deformable models overcome the limitations of snakes and are based on curve evolution theory and level set methods. Osher and Sethian (Osher and Sethian, 1988) introduced the level set as a numerical solution for processing topological changes of contours. In the level set method, a curve is represented implicitly as the zero level set of a time-dependent function $\phi: \Omega \to \mathbb{R}$, such that $C = \{x(t) \in \Omega: \phi(x, t) = 0\}$. The curve evolution is independent of parameterization and can automatically handle topological changes. The level set function $\phi$ satisfies the level set equation if the curve moves along its normal direction as:

$$\frac{\partial \phi}{\partial t} - F |\nabla \phi| = 0, \tag{7}$$

where $\nabla \phi$ denotes the gradient of $\phi$, and $F$ is a function related to the characteristics of the evolving curve, e.g., curvature, and image characteristics, e.g., the image gradient. The design of $F$ depends on image information, and ideally, the curve evolution stops at object boundaries.
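A minimal sketch of one explicit update step of equation (7) is shown below (our illustration; practical level set solvers additionally use upwind differencing and periodic reinitialization of $\phi$, which are omitted here):

```python
import numpy as np

def level_set_step(phi, speed, dt=0.5):
    """One explicit Euler update of Eq. (7): phi <- phi + dt * F * |grad phi|.
    `speed` (F) is an array of the same shape as phi; a typical choice is an
    edge-stopping function, e.g., g = 1 / (1 + |grad I|^2), which approaches
    zero at object boundaries so that the evolving front halts there."""
    grads = np.gradient(phi)
    grad_norm = np.sqrt(sum(g ** 2 for g in grads))
    return phi + dt * speed * grad_norm
```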

The level set methods allow arbitrary topological changes and stable numerical schemes for the curve propagation; however, the curve can pass through a discontinuous object boundary. Caselles et al. (Caselles, Kimmel and Sapiro, 1997) proposed the geodesic active contour, a coupling between energy-minimization-based segmentation (snakes) and the level set framework, allowing for stable boundary detection, including at a discontinuous object boundary. Edge-based evolving contours can still be sensitive to variations in edge contrast and noise and fail for objects with weak edges. An alternative approach is to use image region characteristics: Chan and Vese (Chan and Vese, 2001) proposed active contours without edges, based on regional measures that do not depend on edges; basically, this method solves the minimal partition problem, finding a partitioning of the image that best separates the interior of the curve from its exterior.

2.3.4 Shape decomposition

Shape decomposition is a fundamental problem in geometry processing, where an arbitrary object is seen as an assembly of simple primitives or semantic

components. Shape decomposition is mainly studied in object recognition and retrieval, shape reconstruction, shape clustering, or modeling. In this thesis, we studied shape decomposition in the context of biomedical image segmentation.

We aimed to use shape decomposition as a top-down strategy to incorporate our a priori knowledge of a biological structure, in our case, the tubularity of

myelinated axons, in its segmentation process. More specifically, 3D EM imaging techniques generate large image volumes, and the current segmentation techniques generally favor bottom-up strategies, subject to greedy optimization, to

segment neuronal processes. Shape decomposition as a top-down strategy, on the other hand, can incorporate global objectives, for example, splitting an under- segmented myelinated axon into its constituent axons based on the tubularity of the myelinated axons. In the remainder of this section, we review shape


decomposition techniques in two categories: representing an object in terms of simple primitives or as semantic components.

2.3.4.1 Simple primitives

Primitives are geometrically homogeneous components with a compact representation and efficient computation (Zhou et al., 2015). For example, ellipsoids (Simari and Singh, 2005) and straight cylinders (Raab, Gotsman and Sheffer, 2004) are primitives with a simple parametric representation; their fitting allows us to simplify the description of a complex object. We can also generate higher-level primitives, such as tubular primitives and convex components, by trading the primitive’s simplicity for generality. In this regard, Plumber (Mortara et al., 2004) constructed tubular primitives by applying a seeded region growing

algorithm with a heuristic set of sphere positions and radii to extract ideal tubular components of an object. In (Asafi, Goren and Cohen-Or, 2013; Kaick et al., 2015), an approximate convex decomposition (ACD) was obtained using weakly convex components by analyzing the pairs of points that were visible to each other. Zhou et al. (Zhou et al., 2015) proposed the generalized cylinder decomposition (GCD) algorithm, using cylindricity by measuring the skeleton straightness and the variation among the profiles. Another set of high-level primitive-based

decomposition techniques are cross-sectional sweeping methods that generate homogeneous sweeping components and are computationally less demanding than the ACD and GCD methods. For example, Li et al. (Li, Toon and Huang, 2001) swept the object to detect critical points, where the object’s geometry/topology substantially changes, and decomposed the object at such points. The interactive method in (Goyal et al., 2012) generated local prominent cross-sections from a set of initial seed points to decompose an object.

2.3.4.2 Semantic components

We can use the object curve skeleton or Reeb graph to decompose an object into its semantic components. The curve skeleton is a 1D representation of a 3D object that encodes the topology and geometry of the object, and the Reeb graph tracks topology changes in the level sets of a scalar function (Cornea, Silver and Min, 2007). In (Reniers, van Wijk and Telea, 2008), a skeleton-to-surface mapping, and in (Au et al., 2008), an approximate measure of thickness adjacent to curve skeletons were used for semantic decomposition. The curve skeleton can also drive a surface partitioning, with skeleton branches serving as sites and structural constraints ensuring tubular decomposition components. The decomposition of a 3D mesh in (Berretti, Del Bimbo and Pala, 2009) was a two-step approach, where at first the Reeb graph captured the surface topology, and then the curvature and adjacency information on the graph critical points were used for a fine localization of the boundaries between object components.

2.3.5 Neural networks

Artificial neural networks or neural networks (NNs) are a family of machine learning models that mimic the human brain’s biology and the way it operates.

These models can learn to perform specific tasks based on the provided examples.

For instance, we can feed an NN with thousands of labeled images of cars, houses, and trees and expect that it would find visual patterns in the images to identify cars, houses, and trees in other images. Neural networks are essentially a set of connected artificial neurons, where the connection strengths between pairs of neurons, called model parameters, need to be learned by model training. We train a model through a procedure called backpropagation, adjusting the model parameters to minimize a cost function. Typically, neurons of one layer are connected to neurons from the previous and subsequent layers. An input signal enters the first layer, and after a sequence of non-linear transformations in the intermediate layers, called hidden layers, it is produced at the output layer. An NN can have a minimum of three layers: one input layer, one hidden layer, and one output layer; however, we can add more hidden layers. An NN with an architecture containing more than three layers is called a deep neural network (DNN). Theoretically, a shallow network with enough neurons in the hidden layer can represent any function, but in practice, DNNs perform better because each layer adds its own level of non-linearity, increasing the abstraction capacity. This section introduces the basics of the multilayer perceptron (MLP) and the convolutional neural network (CNN) models, specialized NNs used in computer vision tasks.


2.3.5.1 Multilayer perceptron

An MLP is a model that applies a sequence of non-linear transformations to the input data (Figure 3a). Formally, we describe an MLP with $L$ layers as follows (we use similar notations as in (Farabet, 2013)):

$$y = f(x, \theta) = h_L \tag{8}$$

$$h_l = \sigma_l(W_l h_{l-1} + b_l), \quad \forall l \in \{1, \dots, L\} \tag{9}$$

$$h_0 = x \tag{10}$$

where $\theta = \{W_l, b_l\}, \forall l \in \{1, \dots, L\}$, is the set of trainable parameters consisting of bias parameters $b_l$ and trainable weight parameters $W_l$ for each layer $l$, $x \in \mathbb{R}^{d_{in}}$ is the input vector, $y \in \mathbb{R}^{d_{out}}$ is the output vector, and $\sigma_l$ is a point-wise non-linearity at layer $l$. The most common non-linear activation functions for $\sigma_l, \forall l \in \{1, \dots, L-1\}$, are the hyperbolic tangent and the rectified linear unit. Without activation functions, the whole system would be a set of linear operations and could be written as a single matrix multiplication.
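The following NumPy sketch (our illustration, with arbitrarily chosen layer sizes) implements the forward pass of equations (8)-(10) with ReLU non-linearities on the hidden layers:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of Eqs. (8)-(10): h_0 = x, h_l = sigma_l(W_l h_{l-1} + b_l).
    ReLU is used for the hidden layers; the last layer is left linear."""
    h = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ h + b
        h = z if l == len(weights) - 1 else np.maximum(z, 0.0)  # ReLU
    return h

# Example: a three-layer MLP mapping R^4 -> R^2 with one hidden layer of 8 units.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]
biases = [np.zeros(8), np.zeros(2)]
y = mlp_forward(rng.normal(size=4), weights, biases)
```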

2.3.5.2 Convolutional neural networks

CNNs are a subset of DNNs that take images as the input; the basic linear layers are replaced with convolutional layers, and a spatial pooling function follows the non-linear activations (Figure 3b). CNNs were developed because solving image-related tasks with a DNN is impractical for even a modest-size image: a 200×200 image with three color channels contains 120,000 units of information, so a single fully connected neuron on the first layer of an NN would require 120,000 parameters. With multiple layers in the network, the number of parameters in such a model would quickly increase, resulting in overfitting and computational problems. The architecture of CNNs, on the other hand, possesses two fundamental properties useful for image applications: spatially shared weights and spatial pooling.

Figure 3. (a) A multilayer perceptron model with two hidden layers. (b) The architecture of a convolutional neural network. This figure was generated by adapting code from Gavin Weiguang Ding’s GitHub.

Another advantage of CNNs is that, by sliding kernels over images (a kernel is a matrix that performs a dot product with the input), we can deal with arbitrary-sized images, in contrast to the fixed-sized inputs of MLPs. For a general-form CNN, we can write:

$$Y = f(X, \theta) = H_L \tag{11}$$

$$H_l = pool_l(\sigma_l(W_l H_{l-1} + b_l)), \quad \forall l \in \{1, \dots, L\} \tag{12}$$

$$H_0 = I \tag{13}$$

where $\theta = \{W_l, b_l\}, \forall l \in \{1, \dots, L\}$, is the set of trainable parameters, $X$ is the input array of vectors (image $I$ is an array of pixels), $Y$ is an array of output vectors, $\sigma_l$ is a point-wise non-linearity at layer $l$, and $pool_l$ is an optional pooling function at layer $l$.

The matrices $W_l$ are Toeplitz matrices in CNNs, while these matrices can take any general form in MLPs. Each hidden unit array $H_l$ is a convolution between $W_l$ and its previous hidden unit array $H_{l-1}$, transformed through a point-wise non-linearity and a possible pooling layer, written as:

$$H_{lp} = pool_l\left(\sigma_l\left(b_{lp} + \sum_{q \in parents(p)} W_{lpq} * H_{l-1,q}\right)\right), \tag{14}$$

where $H_{lp}$ is the $p$th component, called a feature map, of the $l$th feature vector map $H_l$. The output of a CNN is connected to a final activation function, e.g., a softmax function, so that the optimization becomes a maximum a posteriori estimation. In the remainder of this section, we describe the three building blocks of a CNN: the convolutional layer, the activation function, and the pooling function. Then, we describe the learning procedure of NNs, including the loss function and optimization. We finalize this section by briefly describing two important CNN architectures: the fully convolutional network (FCN) and U-Net.

Convolutional layer. A convolutional layer cross-correlates the input and a kernel and adds a scalar bias to produce an output; a convolutional layer thus has two kinds of parameters, the kernels and the scalar biases. We typically initialize the kernels randomly when training models. The input of every layer is a 3D array with $n_1$ 2D feature maps of size $n_2 \times n_3$. We denote each component as $x_{ijk}$ and each feature map as $x_i$. The output $y$ is a 3D array with $m_1$ feature maps of size $m_2 \times m_3$. A trainable kernel $w_{ij}$ of size $k_1 \times k_2$ and a bias parameter $b_j$ connect an input feature map $x_i$ with an output feature map $y_j$. For a convolutional module, we have:

$$y_j = b_j + \sum_i (w_{ij} * x_i), \tag{15}$$

where $*$ is the 2D convolutional operator. Each kernel $w_{ij}$ learns a particular feature at every location on the input, enforcing spatial invariance.
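A naive NumPy/SciPy sketch of equation (15) follows (our illustration; deep learning frameworks implement the same operation far more efficiently). As stated above, the layer is implemented as cross-correlation:

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer(x, w, b):
    """Eq. (15): y_j = b_j + sum_i (w_ij * x_i), as cross-correlation.
    x: input feature maps, shape (n1, n2, n3).
    w: kernels, shape (m1, n1, k1, k2); b: biases, shape (m1,).
    Returns output feature maps of shape (m1, n2-k1+1, n3-k2+1)."""
    out = []
    for j in range(w.shape[0]):
        y_j = b[j] + sum(correlate2d(x[i], w[j, i], mode="valid")
                         for i in range(x.shape[0]))
        out.append(y_j)
    return np.stack(out)

# One 3x3-kernel layer mapping 2 input maps to 4 output maps.
rng = np.random.default_rng(0)
y = conv_layer(rng.normal(size=(2, 16, 16)),
               rng.normal(size=(4, 2, 3, 3)), np.zeros(4))
print(y.shape)  # (4, 14, 14)
```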

Activation layer. As in an MLP, a point-wise non-linearity such as the hyperbolic tangent or the ReLU is applied to each location $ijk$ of the feature maps.

Pooling layer. A pooling layer in its simplest form computes the average value over a neighborhood in each feature map. The average operation can also be replaced by finding the maximum value within the neighborhood. A pooling operation is applicable after each activation layer. It serves two purposes: 1) reducing the sensitivity of convolutional layers to location, specifically for low-level features such as edges, and 2) providing a spatially down-sampled representation of the feature maps; by gradually coarsening the feature maps and aggregating information, the network can ultimately learn a global representation of the image and is more sensitive to the entire input in deeper layers.
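A minimal sketch of 2×2 max pooling on a single feature map (our illustration, assuming even map dimensions):

```python
def max_pool2x2(x):
    """2x2 max pooling of a 2D NumPy feature map with even dimensions.
    Replace .max with .mean for average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```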

Loss function. Once the architecture of an NN is defined, the network can be seen as a function approximator. Assume a training set of $N$ training samples $D = \{X, T\} = \{x_n, t_n\}, n \in \{1, \dots, N\}$, with $x_n$ an input example and $t_n$ a target value associated with that example; $t_n \in \{1, \dots, C\}$, where $C$ is the number of possible target classes. In a classification problem, the network output can be seen as a conditional distribution of the target $t_n$, given the input $x_n$ and the parameters $\theta$. This probability can be modeled by transforming the output of the network $f_c(x_n, \theta)$ for each class $c \in \{1, \dots, C\}$ with a softmax function:

$$P(t_n | x_n, \theta) = \frac{e^{f_c(x_n, \theta)}}{\sum_j e^{f_j(x_n, \theta)}}. \tag{16}$$

In the special case where $C = 2$, the softmax function takes the form of the sigmoid function. We consider a multi-categorical problem in which each label $t_n$ belongs to one of $C$ categories. We find the model parameters by maximizing the likelihood over the training data $D$ with respect to the parameters $\theta$:

$$\theta^* = \operatorname*{argmax}_{\theta} P(T | X, \theta) = \operatorname*{argmax}_{\theta} P(t_1, t_2, \dots, t_N | x_1, x_2, \dots, x_N, \theta) = \operatorname*{argmax}_{\theta} \prod_{n=1}^{N} P(t_n | x_n, \theta), \tag{17}$$

assuming that the training set is sampled from an unknown independent and identically distributed (iid) distribution. Equation (17) can be written as minimizing the negative log-likelihood:

$$\theta^* = \operatorname*{argmin}_{\theta} - \sum_n \sum_c \mathbf{1}_c(t_n) \left[ f_c(x_n, \theta) - \log\left(\sum_j e^{f_j(x_n, \theta)}\right) \right], \tag{18}$$

where $\mathbf{1}(x)$ is the indicator function. The function on the right-hand side of (18) is the loss function to be minimized. Equation (18) does not have a closed-form solution, and we apply an iterative approach in order to minimize it. Note that there are other types of loss functions, but the negative log-likelihood provides a simple and consistent parameter estimation framework.
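The following NumPy sketch (our illustration) evaluates equations (16)-(18) for a batch of network outputs, using the standard max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax_nll(logits, targets):
    """Eqs. (16)-(18): softmax probabilities and the summed negative
    log-likelihood. logits: (N, C) network outputs f_c(x_n, theta);
    targets: (N,) integer class indices t_n in {0, ..., C-1}."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_z = np.log(np.exp(shifted).sum(axis=1))           # log sum_j e^{f_j}
    log_probs = shifted - log_z[:, None]                  # f_c - log Z
    return -log_probs[np.arange(len(targets)), targets].sum()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
print(softmax_nll(logits, np.array([0, 2])))  # about 0.35
```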

Optimization. Optimization algorithms allow us to update model parameters and minimize the value of the loss function on the training set. The performance of an optimization algorithm affects model training, and correctly tuning its hyperparameters can improve the performance of an NN (Czum, 2020).

The gradient descent method is the most common optimization algorithm used in machine learning; with a randomly initialized set of parameters $\theta_1$, it is written as:

$$\theta_{z+1} \leftarrow \theta_z - \eta \nabla_\theta L(\theta, X, T). \tag{19}$$

It is reasonable to assume that moving in the direction of the negative gradient with small steps decreases the loss function $L$. Therefore, in gradient descent, we first choose an initial value for $\theta$ and a constant learning rate $\eta > 0$, and continuously iterate over $\theta$ until the stop condition is reached. However, in large-scale machine learning problems, the gradient descent method can become inefficient as it requires a full pass over the data at each iteration. Stochastic gradient descent (Robbins and Monro, 1951) reduces the computational cost at each iteration by uniformly sampling an index $i \in \{1, \dots, N\}$ of the training set to estimate the gradient, and the parameters are updated as:

$$\theta_{z+1} \leftarrow \theta_z - \eta \nabla_\theta L(\theta, x_i, t_i). \tag{20}$$

Stochastic gradient descent is not particularly computationally efficient since CPUs and GPUs cannot exploit the full power of vectorization. Therefore, we use a middle ground between the stochastic and batch methods, called mini-batch gradient descent, to optimize the loss function. We use a subset of the training set of size $m < |D|$ to perform each parameter update:

$$\theta_{z+1} \leftarrow \theta_z - \eta \nabla_\theta L(\theta, x_{i,\dots,i+m}, t_{i,\dots,i+m}). \tag{21}$$

There are other variants of these learning rules that reduce the noise in the stochastic directions. For example, the momentum method (Rumelhart, Hinton and Williams, 2013) added a mechanism for averaging over multiple past gradients, called momentum, to accelerate the convergence. The Adagrad method (Duchi, Hazan and Singer, 2011) used an aggregate of the squares of previously observed gradients to adjust the learning rate; thus, the coordinates that persistently corresponded to large gradients were substantially penalized, and others with small gradients were gently penalized. A problem with Adagrad is that the learning rate effectively decreases on a predefined schedule, which is not ideal for nonconvex problems; therefore, RMSProp (Tieleman and Hinton, 2012) was proposed as a simple fix to decouple rate scheduling from coordinate-adaptive learning rates. Finally, the Adam optimizer (Kingma and Ba, 2014) combined all these techniques into one robust and effective optimization algorithm.
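A generic mini-batch gradient descent loop implementing equation (21) could look as follows (our sketch; the helper `grad_fn` is hypothetical and stands in for backpropagation, which computes the gradient of the loss with respect to theta):

```python
import numpy as np

def minibatch_sgd(theta, grad_fn, X, T, eta=0.01, m=32, epochs=10, seed=0):
    """Eq. (21): update theta on random mini-batches of size m.
    X, T are NumPy arrays of inputs and targets; grad_fn(theta, Xb, Tb)
    must return dL/dtheta for the given mini-batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)          # reshuffle the data each epoch
        for start in range(0, n, m):
            idx = order[start:start + m]
            theta = theta - eta * grad_fn(theta, X[idx], T[idx])
    return theta
```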

Architecture. The range of architectures and algorithms used in deep learning is wide and varied. Here, we describe two prevalent CNN

architectures in deep learning: FCN and U-Net. FCN is an architecture used mainly for semantic segmentation, employing locally connected layers, such as

convolution, pooling, and up-sampling. An FCN avoids fully connected layers, converting them into 1×1 convolution layers, enabling the network to operate on arbitrarily sized inputs. The network consists of a down-sampling feature

extraction path and an up-sampling path, which allows for localization. An FCN combines layers of the feature hierarchy and refines the spatial precision of the output by adding skip connections to recover the fine-grained spatial information lost in the down-sampling path.

A U-Net architecture (Figure 4) modifies and extends the FCN architecture. It has many feature channels in the up-sampling path, allowing the network to propagate context information to higher-resolution layers. The down-sampling and up-sampling paths of a U-Net are nearly symmetric, leading to a U-shaped architecture.
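To make the architecture concrete, the following PyTorch sketch (our illustration, not the thesis code; the framework choice is an assumption) builds a two-level U-Net-style network with a single skip connection:

```python
import torch
from torch import nn

class TinyUNet(nn.Module):
    """A minimal U-Net-style network for 2D semantic segmentation: one
    down-sampling step, one up-sampling step, and one skip connection.
    Real U-Nets stack several such levels with more convolutions per level."""
    def __init__(self, in_ch=1, n_classes=2, base=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(
            nn.Conv2d(base, 2 * base, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(2 * base, base, 2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv2d(2 * base, base, 3, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(base, n_classes, 1)   # 1x1 conv: class scores

    def forward(self, x):
        e = self.enc(x)                      # high-resolution features
        b = self.bottleneck(self.down(e))    # coarse, low-resolution features
        u = self.up(b)                       # back to the input resolution
        d = self.dec(torch.cat([u, e], 1))   # skip connection: concat channels
        return self.head(d)                  # per-pixel class logits

logits = TinyUNet()(torch.randn(1, 1, 64, 64))   # -> shape (1, 2, 64, 64)
```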

2.3.6 Image segmentation in electron microscopy

Image segmentation of ultrastructures in large EM volumes is a challenging task.

These challenges mainly concern discontinuities in cellular membranes, the large size of EM volumes, and the large number of neuronal processes that are tightly packed together, which restrict applying segmentation techniques with high computational complexity. Therefore, automated segmentation of EM volumes of the brain tissue requires specifically developed algorithms that address these challenges. In this section, we review several automated segmentation algorithms that apply the methods introduced earlier in the context of EM segmentation.


Lee et al. (Lee et al., 2019) and our automatic segmentation of axons (ACSON;

study I) (Abdollahzadeh et al., 2019) used the SRG algorithm to segment myelinated axons in high-resolution EM volumes of white matter. In (Lee et al., 2019), seeds were located manually for each myelinated axon, while in ACSON, we used the regional maxima of the Euclidean distance transform of myelin to define the seed locations.

The Chan and Vese active contours have been widely used for segmenting ultrastructures in EM volumes. In (Perez et al., 2014), the authors trained an organelle-specific pixel classifier and used the binarized probability maps of organelles to initialize the evolution of active contours. Jorstad and Fua (Jorstad and Fua, 2015) used active surfaces to refine mitochondria segmentation in EM volumes, given the initial boundary prediction from a machine learning-based segmentation algorithm. Tasel et al. (Tasel et al., 2016) used a parabolic arc model to extract membrane structures for anisotropic image stacks and then employed the curve energy based on an active contour to obtain the roughly outlined candidate mitochondrial regions. In the DeepACSON pipeline (study II;

(Abdollahzadeh et al., 2021)), we used deep learning to achieve a semantic segmentation of cell nuclei and applied active contours for their instance segmentation.

Figure 4. U-Net architecture with 32×32 pixels in the lowest resolution. Each blue box corresponds to a multi-channel feature map. The number of channels is written on top of the box and the size of the feature maps at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote different operations. This figure is taken from Jackson Huang’s GitHub under the MIT license.

State-of-the-art automated segmentation techniques currently use deep CNNs to trace ultrastructures in EM volumes (Zeng, Wu and Ji, 2017; Haberl et al., 2018;

Januszewski et al., 2018; Falk et al., 2019; Funke et al., 2019; Meirovitch et al., 2019;

Miocchi et al., 2021). CNNs are typically used for semantic segmentation, whereas other more traditional image processing techniques are used for instance

segmentation. Zeng et al. in DeepEM3D (Zeng, Wu and Ji, 2017) and Haberl et al. in the cloud version of DeepEM3D (Haberl et al., 2018) used a modified version of Inception-V4 (Szegedy et al., 2015), reducing the number of inception-residual blocks to counter overfitting during semantic segmentation. DeepEM3D applied a 3D watershed transform for the instance segmentation of neuronal processes. Funke et al. (Funke et al., 2019) trained a 3D U-Net to predict affinities using an extension of the MALIS loss function (Turaga et al., 2009), minimizing the topological errors of hypothetical thresholding and connected component analysis. The predicted affinities yielded an over-segmentation that was then merged into the final segmentation using a percentile-based agglomeration algorithm. In DeepACSON (study II), we applied a lightweight FCN for the semantic segmentation of ultrastructures and developed a cylindrical shape decomposition (CSD; study III) (Abdollahzadeh, Sierra and Tohka, 2021) algorithm for the instance segmentation of myelinated axons, incorporating the tubularity of myelinated axons as a global objective in the segmentation process. Januszewski et al. (Januszewski et al., 2018) suggested flood-filling networks (FFNs), a single-object tracking technique, and Meirovitch et al. (Meirovitch et al., 2019) introduced cross-classification clustering, a multi-object tracking technique, merging semantic and instance segmentation in recurrent neural networks. The recurrent networks in these techniques maintain the prediction of the shape of objects and learn to reconstruct neuronal processes with more plausible shapes. Roels et al. (Roels et al., 2019) used a reconstruction decoder for domain adaptation to exploit existing annotated EM datasets and reduce the dependency on a large training set for EM segmentation. Falk et al. (Falk et al., 2019) presented an ImageJ plugin of U-Net and proposed inserting an artificial one-pixel-wide background ridge between touching instances in the training set to address under-segmentation errors. This method applies a connected component analysis for the instance segmentation of neuronal processes in EM volumes.


3 AIM OF THE STUDY

In this thesis, we developed automated segmentation pipelines to trace

ultrastructures in large SBEM volumes of white matter in sham-operated rats and rats after a traumatic brain injury (TBI) to make it possible to analyze the changes in the morphology and spatial distribution of ultrastructures in 3D.

The specific aims of this study are as follows:

1. Segmentation and morphology analysis of ultrastructures in high-resolution SBEM volumes of white matter acquired in small fields of view. In study I, we developed the ACSON pipeline to annotate white matter into myelin, myelinated axons, unmyelinated axons, mitochondria, and cell bodies/processes. ACSON is an SRG-based algorithm whose seeds were determined automatically. Using ACSON, we measured the true morphology of myelinated axons in 3D and quantified pathomorphological changes of ultrastructures in TBI.

2. Segmentation and morphology analysis of ultrastructures in low-resolution SBEM volumes of white matter acquired in large fields of view. In studies II and III, we developed an automated segmentation and quantification pipeline called DeepACSON, which performed FCN-based semantic segmentation and shape decomposition-based instance segmentation. To train DeepACSON, we used the ACSON segmentations of the high-resolution EM datasets from study I. The novel instance segmentation algorithm of the DeepACSON pipeline, called cylindrical shape decomposition, was published separately in study III. DeepACSON segmented white matter SBEM volumes into myelin, myelinated axons, mitochondria, and cell nuclei. Using DeepACSON, we measured the 3D morphology and spatial distribution of ultrastructures and quantified morphological changes after TBI.


4 MATERIALS AND METHODS

4.1 SAMPLE PREPARATION AND IMAGE ACQUISITION

4.1.1 Animals

We used five adult male Sprague-Dawley rats (10 weeks old, weighing between 320 and 380 g, Harlan Netherlands B.V., Horst, Netherlands) in our studies. The animals were singly housed in a room (22±1°C, 50% - 60% humidity) with a 12 h light/dark cycle and free access to food and water. All animal procedures were approved by the Animal Care and Use Committee of the Provincial Government of Southern Finland and performed according to the European Community Council Directive 86/609/EEC guidelines.

4.1.2 The traumatic brain injury model

We induced TBI by lateral fluid percussion injury in three rats (TBI #2, #24, and #28) as described in (Kharatishvili et al., 2006). Rats were anesthetized with a single intraperitoneal injection. We performed a craniectomy, 5 mm in diameter, between bregma and lambda on the left convexity (anterior edge 2 mm posterior to bregma; lateral edge adjacent to the left lateral ridge). Lateral fluid percussion injury was induced in each rat by a transient fluid pulse impact (21 - 23 ms) against the exposed intact dura using a fluid-percussion device (Amscien Instruments, Richmond, VA, USA). We adjusted the impact pressure to 3.2 - 3.4 atm to induce a severe injury. The sham operation of the two other rats (sham #25 and #49) included all the surgical procedures except the impact.

4.1.3 Tissue preparation for SBEM

The rats were transcardially perfused five months after the TBI or sham operation using 0.9% NaCl, followed by 4% paraformaldehyde. The brains were removed from the skull and post-fixed in 4% paraformaldehyde and 1% glutaraldehyde overnight. We sectioned the brains into 1-mm thick coronal sections (Figure 5a) with a vibrating blade microtome (VT1000s, Leica Instruments, Germany). From each brain, we selected a section at approximately -3.80 mm from bregma and further dissected two samples from the ipsilateral and the contralateral corpus callosum. The samples were stained (Figure 5b) using an enhanced staining protocol described in (Deerinck et al., 2010) and studies I and II.

We selected an area of interest within the samples (Figure 5b) and further trimmed the blocks into a pyramidal shape (1×1 mm2 base and 600×600 µm2 top) to ensure that the block remained stable during sectioning in the microscope. The tissue was exposed on all four sides and at the top and bottom of the pyramid. The blocks were then mounted on aluminum specimen pins using conductive silver epoxy (CircuitWorks CW2400). Next, we electrically grounded the exposed block edges to the aluminum pins using silver paint (Ted Pella, Redding, CA, USA), except for the block face and the edges of the embedded tissue. The entire surface of the specimen was then sputtered with a thin layer of platinum coating to improve conductivity and reduce charging during the sectioning process.

4.1.4 SBEM data

We acquired all our SBEM datasets using an SEM microscope (Quanta 250 Field Emission Gun; FEI Co., Hillsboro, OR, USA) equipped with the 3View system (Gatan Inc., Pleasanton, CA, USA) and a back-scattered electron detector (Gatan Inc.). The x-y plane was the top (face) of the mounted blocks, and the cutting direction was the z-axis. We imaged all the samples with a beam voltage of 2.5 kV and a pressure of 0.15 Torr. The SBEM volumes were imaged consistently at the same location in the brain. As we acquired the SBEM images both ipsi- and contralaterally, we had ten samples from the five rats. Each sample was simultaneously imaged at two resolutions: high-resolution (voxel size: 15×15×50 nm3; in a small field-of-view: 15×15×15 µm3; from the corpus callosum) and low-resolution (voxel size: 50×50×50 nm3; in a large field-of-view: 200×100×65 µm3; two-thirds of the images corresponded to the corpus callosum and one-third to the cingulum). Figure 5d shows that myelin, myelinated axons, mitochondria, and cell nuclei were resolved in both resolutions; however, unmyelinated axons and axonal membranes were only resolved in the high-resolution setting. Table 1 shows the size and resolution of all datasets used in our studies.
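As a rough back-of-the-envelope calculation (not from the thesis itself), the snippet below converts the stated fields of view and voxel sizes into voxel counts, illustrating the difference in scale between the two acquisition settings:

```python
def n_voxels(fov_um, voxel_nm):
    """Convert a field of view (um) and voxel size (nm) into voxel counts."""
    dims = [round(f * 1000 / v) for f, v in zip(fov_um, voxel_nm)]
    return dims, dims[0] * dims[1] * dims[2]

hi = n_voxels((15, 15, 15), (15, 15, 50))    # [1000, 1000, 300], ~0.3e9 voxels
lo = n_voxels((200, 100, 65), (50, 50, 50))  # [4000, 2000, 1300], ~10.4e9 voxels
print(hi, lo)  # at 8 bits per voxel: roughly 0.3 GB vs 10 GB of raw data
```

This more than thirty-fold difference in raw data size is what motivates scalable pipelines, such as DeepACSON, for the low-resolution, large field-of-view datasets.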


Figure 5. (a) A photomicrograph of a 1-mm thick section of a rat brain. (b) A photomicrograph of a stained semithin section. (c) Selection for high- and low-resolution SBEM imaging. (d) Corpus callosum (cc) and cingulum (cg) were SBEM imaged simultaneously at two resolutions: high-resolution (voxel size: 15×15×50 nm3; field-of-view: 15×15×15 µm3) and low-resolution (voxel size: 50×50×50 nm3; field-of-view: 200×100×65 µm3). We visualized 2D images of the low- and high-resolution datasets from the same location, showing that ultrastructures such as myelin, myelinated axons, mitochondria, and cell nuclei were resolved in both resolutions. Axonal membranes at nodes of Ranvier (cyan panel, arrowheads) and unmyelinated axons (fuchsia panel, asterisks), on the other hand, were only resolved in the high-resolution setting.
