
Automated joint skull-stripping and segmentation with Multi-Task U-Net in large mouse brain MRI databases


UEF//eRepository

DSpace https://erepo.uef.fi

Self-archived publications, Faculty of Health Sciences

2021


De Feo, Riccardo

Elsevier BV

Scientific journal articles

© 2021 The Authors

CC BY http://creativecommons.org/licenses/by/4.0/

http://dx.doi.org/10.1016/j.neuroimage.2021.117734

https://erepo.uef.fi/handle/123456789/25864

Downloaded from University of Eastern Finland's eRepository


NeuroImage


Automated joint skull-stripping and segmentation with Multi-Task U-Net in large mouse brain MRI databases

Riccardo De Feo (a,b,c), Artem Shatillo (d), Alejandra Sierra (c), Juan Miguel Valverde (c), Olli Gröhn (c), Federico Giove (b,e), Jussi Tohka (c)

a Sapienza Università di Roma, Rome 00184, Italy

b Centro Fermi–Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi, Rome 00184, Italy

c A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70210, Finland

d Charles River Discovery Services, Kuopio, Finland

e Fondazione Santa Lucia IRCCS, Rome 00179, Italy

Article info

Keywords: MRI; Brain; Segmentation; Deep learning; U-Net; Mice

Abstract

Skull-stripping and region segmentation are fundamental steps in preclinical magnetic resonance imaging (MRI) studies, and these common procedures are usually performed manually. We present Multi-task U-Net (MU-Net), a convolutional neural network designed to accomplish both tasks simultaneously. MU-Net achieved higher segmentation accuracy than state-of-the-art multi-atlas segmentation methods with an inference time of 0.35 s and no pre-processing requirements.

We trained and validated MU-Net on 128 T2-weighted mouse MRI volumes as well as on the publicly available MRM NeAt dataset of 10 MRI volumes. We tested MU-Net with an unusually large dataset combining several independent studies consisting of 1782 mouse brain MRI volumes of both healthy and Huntington animals, and measured average Dice scores of 0.906 (striati), 0.937 (cortex), and 0.978 (brain mask). Further, we explored the effectiveness of our network in the presence of different architectural features, including skip connections and recently proposed framing connections, and the effects of the age range of the training set animals.

These high evaluation scores demonstrate that MU-Net is a powerful tool for segmentation and skull-stripping, decreasing inter- and intra-rater variability of manual segmentation. The MU-Net code and the trained model are publicly available at https://github.com/Hierakonpolis/MU-Net.

1. Introduction

Preclinical imaging studies serve a fundamental role in biological and medical research, relating research results at the molecular level to clinical application in diagnosis and therapy. Magnetic Resonance Imaging (MRI) represents approximately 23% of all small-animal imaging studies, providing the opportunity to monitor the development of pathological conditions and responses to treatment in a non-invasive way (Cunha et al., 2014). Its unique qualities also include the availability of different imaging contrasts, rendering MRI extremely useful in the context of preclinical neuroscience with applications from drug development (Matthews et al., 2013) to basic research (Febo and Foster, 2016).

Skull-stripping and region segmentation represent an integral part of processing pipelines in murine MR imaging (Anderson et al., 2019; Calabrese et al., 2015). Skull-stripping refers to the identification of the brain within the MRI volume, and region segmentation refers to the labeling of specific anatomical regions of interest (ROIs) within the brain. In preclinical MRI, these tasks are often performed manually. While manual segmentation represents the gold standard and is employed as the ground truth when evaluating automated segmentation algorithms, it is time-consuming and depends on the expertise of the annotators performing the segmentation. Furthermore, manual segmentation suffers from both intra- and inter-rater variability, both in small animal (Ali et al., 2005) and human MRI (Entis et al., 2012; Yushkevich et al., 2006).

Corresponding author at: Sapienza Università di Roma, 00184 Rome, Italy. E-mail address: riccardo.defeo@uniroma1.it (R. De Feo).

Received 17 August 2020; Received in revised form 9 December 2020; Accepted 7 January 2021; Available online 14 January 2021

1053-8119/© 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

In preclinical MRI, state-of-the-art automated region segmentation pipelines are based on atlas registration: individual MRI volumes are aligned with a labeled template (atlas) and the labels propagated to the individual volumes (De Feo and Giove, 2019; Lerch et al., 2011; Pagani et al., 2016; Schwarz et al., 2006; Sharief et al., 2008). The accuracy of registration-based segmentation depends on both the suitability of the template and the registration algorithm. The segmentation accuracy can be improved by multi-atlas strategies, where multiple atlases are registered to the same volume and the resulting segmentation maps are combined, for example, via majority voting. Regarding multi-atlas strategies in mouse MRI, Bai et al. (2012) compared different single and multi-atlas methods for atlas-based segmentation of the mouse brain and reported that the combination of a diffeomorphic registration algorithm and multi-atlas segmentation provided the most accurate results.
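As a minimal illustration of the majority-voting fusion mentioned above, the following sketch fuses flattened label maps voxel by voxel. The function name `majority_vote` and the toy label maps are hypothetical and are not taken from any of the cited pipelines; real pipelines operate on registered 3D volumes and often use weighted schemes instead.

```python
from collections import Counter


def majority_vote(label_maps):
    """Fuse candidate label maps (one per registered atlas) voxel by voxel.

    Each label map is a flat sequence of integer labels of equal length.
    Ties are resolved in favour of the label seen first among the atlases.
    """
    if not label_maps:
        raise ValueError("at least one label map is required")
    n_voxels = len(label_maps[0])
    if any(len(m) != n_voxels for m in label_maps):
        raise ValueError("all label maps must have the same length")
    fused = []
    for votes in zip(*label_maps):
        # most_common(1) returns [(label, count)] for the top-voted label
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Three hypothetical atlases propagated to the same five-voxel target
atlas_labels = [
    [0, 1, 1, 2, 0],
    [0, 1, 2, 2, 0],
    [0, 0, 1, 2, 3],
]
fused = majority_vote(atlas_labels)
print(fused)
```

Plain voting treats every atlas equally everywhere, which is the limitation that locally weighted fusion methods try to address.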

Ma et al. (2014) demonstrated that the multi-atlas methods are superior to single-atlas methods and that the STEPS procedure for combining segmentations (Cardoso et al., 2013) brings advantages over earlier combination methodologies. While multi-atlas segmentation accounts for individual variability more effectively than single-atlas segmentation, it also requires multiple labeled atlases and multiple registration steps, significantly increasing the segmentation time. Multi-atlas segmentation can be further combined with the construction of a Minimum Deformation Template (MDT) as an intermediate step in the processing pipeline (Avants et al., 2010; De Feo and Giove, 2019; Kovačević et al., 2004). An MDT minimizes the deformation required to adapt it to each individual volume, thus reducing errors when its labels are propagated to each target scan. Instead of directly employing one or more manually segmented atlases, deep neural networks (DNNs) (LeCun et al., 2015) can use these as training data to learn a mapping function from the images to the segmentation maps. In this way, the anatomical information is not explicitly represented in a set of maps but implicitly encoded in the trained network. DNNs, and in particular Convolutional Neural Networks (CNNs), have been successfully applied in a large number of computer vision tasks in medical imaging. For example, Wachinger et al. (2018) developed a region segmentation CNN significantly outperforming state-of-the-art, registration-based methods for the healthy human brain MRI, both in terms of inference time and accuracy. Roy et al. (2018a) further improved on both aspects with a network based on the U-Net architecture (Ronneberger et al., 2015), with a reported segmentation time of 20 s per brain scan. However, within small-animal MRI, the applications of CNNs have been limited to skull-stripping: Roy et al. (2018b) trained a CNN algorithm based on Google Inception (Szegedy et al., 2015) for skull-stripping in humans and mice after traumatic brain injury, achieving better performance than other state-of-the-art methods (3D Pulse Coupled Neural Networks (3D-PCNN) (Chou et al., 2011) and Rapid Automatic Tissue Segmentation (RATS) (Oguz et al., 2014)).

A specific type of CNN architecture, U-Net, has proved to be valuable in biomedical image segmentation. U-Net is based on the encoder/decoder structure, adding skip connections between the encoder and the decoder branches, allowing it to easily integrate multi-scale information and better propagate the gradient during training. This architecture has been shown to generalize even from a limited amount of annotated data (Xie et al., 2015), and as such is well suited for medical imaging, where datasets as large as the ones commonly used for CNNs are rare. Valverde et al. (2019) recently demonstrated the effectiveness of U-Net-like architectures in preclinical research, designing the first DNN for the segmentation of ischemic lesions in rodents and achieving segmentation accuracy comparable to or better than inter-rater agreement in manual segmentation.

In this work, we introduce multi-task U-Net (MU-Net) to simultaneously perform skull-stripping and region segmentation of the mouse brain, based on the U-Net architecture. We refer to our approach as multi-task as we consider skull-stripping and region segmentation as separate tasks, allowing for the complete delineation of the brain volume regardless of the choice of ROIs. While these tasks are often considered as separate in the context of murine brain segmentation, they are strongly related. Therefore, our approach is not multi-task learning in the stronger sense of providing two fundamentally different outputs, e.g., segmentation and classification (Yang et al., 2017).

Our main train and validation data consisted of 128 T2 MRI volumes from 32 mice at 4 different ages as well as five manually annotated regions (cortex, hippocampi, ventricles, striati and brain mask) from these images. This dataset represents MR images typically employed in drug development. We demonstrate that with this data MU-Net achieves a significantly higher accuracy than state-of-the-art multi-atlas segmentation methods (Cardoso et al., 2013; Ma et al., 2014) in a fraction of the segmentation time (approximately 0.35 s). We trained MU-Net on 128 MRI volumes and tested on an independent dataset of 1782 volumes acquired over the course of four years from both wild type (WT) and Huntington (HT) C57BL/6J mice, allowing us to evaluate MU-Net in a variety of experimental conditions. Additionally, we trained MU-Net for the segmentation of mouse brain MRI with isotropic voxels into 37 ROIs and demonstrate that the segmentation accuracy of MU-Net was equal to or better than that of a state-of-the-art multi-atlas segmentation method (Ma et al., 2014).

Table 1

Summary characteristics of the three datasets employed in this study. BM refers to brain mask. The test dataset included various genotypes of both sexes (see Supplementary Table S1 for details).

Dataset name         | # Animals | # MRIs | # ROIs  | Type
Train and validation | 32        | 128    | 4 + BM  | WT males
Test                 | 817       | 1,782  | 2 + BM  | various
MRM NeAt             | 10        | 10     | 37 + BM | WT males

2. Materials and methods

2.1. Materials

We utilized three different datasets in this work as summarized in Table 1 and detailed in the following subsections.

2.1.1. Animals: train, validation and test sets

A total of 849 mice (Charles River Laboratories, Germany) were used: 32 mice for the train and validation set and 817 mice for the test set. Train and validation set animals were scanned at four different ages (5 weeks, 12 weeks, 16 weeks, 32 weeks) resulting in 128 volumes. All train and validation set animals were WT males.

The test set animals were part of 10 studies scanned at a single or multiple ages from 4 up to 60 weeks, and included both WT and several HT genotypes: R6/2, Q175, Q175DN, Q111, Q50 and Q20 (Supplementary Table S1), for a total of 1782 MRI scans. The groups included both males and females. These volumes were acquired as part of ten studies of Huntington's disease, kindly provided by the CHDI 'Cure Huntington's Disease Initiative' foundation.

All mice were housed in groups of up to 4 per cage (single sex) in a temperature (22 ± 1 °C) and humidity (30–70%) controlled environment with a normal light-dark cycle (7:00–20:00).

2.1.2. MRI: train, validation and test sets

Mice were anesthetized using isoflurane (5% for induction, 1.5–2% maintenance) in a 70%/30% mix of N2/O2 carrying gas, fixed to a head holder and positioned in the magnet bore in a standard orientation relative to gradient coils. Respiration rate and temperature were monitored using PC-SAMS software and Model 1030 Monitoring & Gating System, Small Animal Instruments, Inc., Stony Brook, NY. The temperature was maintained at ∼37 °C using a Small Animal Instruments feedback water heating system.

All acquisitions were performed using a horizontal 11.7 T magnet with a bore size of 160 mm, equipped with a gradient set capable of maximum gradient strength of 750 mT/m and interfaced to a Bruker Avance III console (Bruker Biospin GmbH, Ettlingen, Germany). A volume coil (Bruker Biospin GmbH, Ettlingen, Germany) was used for transmission and a surface phased array coil for receiving (Rapid Biomedical GmbH, Rimpar, Germany). T2-weighted anatomical images were acquired using a TurboRARE sequence with effective TR/TE = 2500/36 ms, 8 echoes, 12 ms inter-echo distance, matrix size 256 × 256, FOV 20.0 × 20.0 mm², 31 coronal slices of 0.6 mm thickness, −0.15 mm interslice gap, and 8 averages. Concerning the test data, MRI experimental parameters only differed in acquiring 19 contiguous coronal slices of 0.7 mm thickness.

Fig. 1. General outline of the architectural features implemented and compared in the networks discussed, varying according to the presence or absence of the in-block dense connections (purple arrows in the convolutional block), presence or absence of the layer subtraction connections (black), and the use of 2D or 3D filters.

Volumes within each study were manually segmented by an experienced rater, who had received training and passed the qualification tests according to the SOP (Standard Operating Procedure) for volumetric analysis in mice. Different studies were analyzed by different raters. Each training volume was manually segmented by a single rater drawing the brain mask and delineating 4 regions of interest: cortex, hippocampi, striati and ventricles. The brain mask did not include the olfactory bulb or the cerebellum. For the test set, only 3 regions were manually labeled: brain mask, cortex and striati. As each image was only segmented once by a single rater, intra- and inter-rater overlap statistics are not available for our dataset. Manual segmentation required from 10 to 15 min per ROI per image.

2.1.3. MRM NeAt dataset

The MRM NeAt dataset includes atlases of 10 individual T2-weighted in vivo brain MR images of 12–14 weeks old C57BL/6J mice, each with 37 labelled anatomical structures (listed in Fig. 4) in addition to the brain mask (Ma et al., 2008). This dataset was downloaded from https://github.com/dancebean/mouse-brain-atlas, where an improved atlas is available (bias correction has been applied, left and right labels have been separated and a 4th ventricle label added). This dataset was used to evaluate the STEPS algorithm by Ma et al. (2014) and is used here for the purpose of comparing MU-Net and STEPS on a larger number of ROIs on isotropic resolution MRI. As detailed in Ma et al. (2008), T2-weighted MR data with a voxel size of 0.1 mm³ requiring about 2.8 h of scan time were acquired with a 3D large flip angle spin echo sequence using a super-conducting 9.4 T/210 mm horizontal bore magnet (Magnex) controlled by an ADVANCE console (Bruker) and equipped with an actively shielded 11.6 cm gradient set (Bruker, Billerica, MA).

2.2. MU-Nets

2.2.1. Architectures

MU-Net (Fig. 1) presents an encoder-decoder U-Net-like architecture, with each branch articulated in four convolutional blocks. Unlike U-Net, the final block of the decoder branch further bifurcates into two different output maps representing our two tasks, sharing the same feature representation. Each convolutional block on the encoding path is followed by a 2x2 max-pooling layer. The last feature map feeds into the bottleneck layer, a 64-channel 5x5 convolutional layer with batch normalization (Ioffe and Szegedy, 2015) connecting the deepest layer of the encoding path with the decoding path.

The decoding path is composed of 4 more blocks alternating one un-pooling layer (Noh et al., 2015) and one convolutional block. Un-pooling operations effectively replace up-convolution layers in U-Net without any learnable parameters, while preserving spatial information. These layers operate by simply placing the elements of the un-pooled feature maps in the position of the respective maximum activation from the corresponding pooling operation, and setting the rest to zero. Skip connections concatenate the output of each dense layer in the encoding path with the respective un-pooled feature map of the same size before feeding it as input to the decoding convolutional block.
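The un-pooling operation described above can be sketched on a single 2D feature map in plain Python. This is an illustrative toy, not the MU-Net code: the function names are hypothetical and the sketch assumes even map dimensions. In PyTorch, the corresponding building blocks are `torch.nn.MaxPool2d(..., return_indices=True)` and `torch.nn.MaxUnpool2d`.

```python
def max_pool_2x2(fmap):
    """2x2 max-pooling that also records the position of each maximum."""
    rows, cols = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, rows, 2):
        pooled_row, index_row = [], []
        for c in range(0, cols, 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, shape):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 1],
]
pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, (4, 4))
for row in restored:
    print(row)
```

No weights are involved in either direction, which is why un-pooling adds no learnable parameters while still restoring each activation to its original location.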

The output of the last decoding layer acts as the input of two different classification layers, which share the same feature representation up to this point: a 1x1 single-channel convolution with a sigmoid activation function, and a 1x1 layer with 5 channels followed by a softmax activation function, for the skull-stripping task and the region classification task, respectively.

Convolutional block

Each convolutional block includes 3 convolutional layers preceded by leaky ReLU activation (Maas et al., 2013) layers and batch normalization. All 3 convolutions are padded and result in 64 output channels, in analogy with Roy et al. (2018a). The first and second convolutions employ 5x5 filters, while the third uses a 1x1 filter. This becomes especially relevant in the presence of dense connections, acting as a bottleneck for the 64 × 3 channels of the concatenated inputs and compressing the size of the feature maps.

2.2.2. Architectural variants

We study several variations to the basic network architecture.

Dense connections

In the models including dense connections (Huang et al., 2017) we modify each convolutional block by concatenating to the input of each convolution the outputs of the previous convolutions within the same block (Fig. 1).

Dual framing connections

Dual framing connections refer to additional skip connections in the Dual Frame U-Net model. Han and Ye (2018) proposed this architecture for computed tomography reconstruction from sparse data based on signal processing arguments to reduce artifacts and improve recovery of high frequency edges. Dual framing connections consist in the subtraction of the input of each convolutional block on the encoding path from the output of the respective convolutional block of the same size on the decoding path, and as such the implementation of these connections does not increase the number of model parameters.

3D implementation

A 3D implementation could, in principle, provide better results by taking into account the features of the adjacent slices, whereas a 2D network evaluates each coronal slice independently. However, the larger number of parameters also increases the risk of overfitting, and the lower resolution in the anterior-posterior axis compared to the in-plane resolution might constitute a confounding factor in the presence of 3D pooling operations.

For these reasons, we compared 2D and 3D implementations of our network, using 5x5x5 filters and 2x2x2 max-pooling layers, replacing the filters and pooling layers described above. This results in 16,008,076 and 10,286,344 parameters for the 3D networks with and without in-block skip connections, respectively. Corresponding 2D networks contain 3,297,676 and 2,087,944 parameters, respectively. Thus, opting for a 3D architecture increases the number of parameters by factors of 4.85 and 4.93 as compared to the 2D architectures. The total number of parameters was measured by using the PyTorch instruction sum(p.numel() for p in model.parameters()). A complete breakdown of model parameters for each network is available in supplementary Table S2.
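The reported growth factors can be checked with a few lines of arithmetic. The helper `conv_parameters` below is a hypothetical illustration that assumes one bias term per output channel, which the text does not state; the four totals are the figures quoted above.

```python
def conv_parameters(kernel_size, in_channels, out_channels, bias=True):
    """Learnable parameters of one convolution: weights plus optional biases."""
    weights = in_channels * out_channels
    for k in kernel_size:
        weights *= k
    return weights + (out_channels if bias else 0)


# One 64-to-64 channel convolution with 5x5 versus 5x5x5 filters
params_2d = conv_parameters((5, 5), 64, 64)
params_3d = conv_parameters((5, 5, 5), 64, 64)
print(params_2d, params_3d)

# Whole-network totals quoted in the text (with / without in-block connections)
ratio_dense = 16_008_076 / 3_297_676
ratio_plain = 10_286_344 / 2_087_944
print(round(ratio_dense, 2), round(ratio_plain, 2))
```

A single 5x5x5 layer holds roughly five times the weights of its 5x5 counterpart. The whole-network factors are slightly below five, plausibly because layers such as the 1x1 convolutions and batch normalization do not grow when a third spatial dimension is added.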

2.2.3. Loss function

Recent literature suggests that Dice-based loss functions (Milletari et al., 2016; Roy et al., 2018a; Sudre et al., 2017) would constitute an improvement over cross-entropy losses for the segmentation of medical images (Karimi and Salcudean, 2019). We optimized a joint loss function $L$, that is the sum of two Dice loss functions corresponding to the skull-stripping ($L_{SS}$) and the region classification task ($L_{RS}$). Let $p(i)$ be the predicted probability of voxel $i$ belonging to the brain mask, and $g(i)$ the ground truth for voxel $i$ ($g(i) = 1$ if the voxel is in the brain mask). Further, let $p_l(i)$ and $g_l(i)$ be the same quantities for label $l$ ($l = 1, \ldots, K$), encoding the ground truth as a one-hot vector. Then, the loss function can be written as:

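$$L = L_{SS} + L_{RS}, \tag{1}$$

$$L_{SS} = -\frac{2\sum_i p(i)\,g(i)}{\sum_i p^2(i) + \sum_i g^2(i)}, \tag{2}$$

$$L_{RS} = -\sum_{l=1}^{K} \frac{2\sum_i p_l(i)\,g_l(i)}{\sum_i p_l^2(i) + \sum_i g_l^2(i)}, \tag{3}$$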

where $K$ is the number of labels (ROIs) plus the background class.
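To make the joint loss concrete, here is a plain-Python sketch that evaluates Eqs. (1)–(3) on flattened toy volumes. It is not the published implementation: the function names are hypothetical, and the small constant `eps`, added to avoid division by zero, is an assumption that does not appear in the equations.

```python
def dice_term(p, g, eps=1e-8):
    """2 * sum(p*g) / (sum(p^2) + sum(g^2)) for one probability/target pair."""
    numerator = 2.0 * sum(pi * gi for pi, gi in zip(p, g))
    denominator = sum(pi * pi for pi in p) + sum(gi * gi for gi in g)
    return numerator / (denominator + eps)  # eps is an added safeguard


def joint_dice_loss(p_mask, g_mask, p_labels, g_labels):
    """L = L_SS + L_RS, with one Dice term per label in L_RS."""
    loss_ss = -dice_term(p_mask, g_mask)
    loss_rs = -sum(dice_term(p_l, g_l) for p_l, g_l in zip(p_labels, g_labels))
    return loss_ss + loss_rs


# Four voxels, K = 2 classes (background + one ROI), one-hot ground truth
g_mask = [1, 1, 0, 0]
g_labels = [[0, 0, 1, 1], [1, 1, 0, 0]]

perfect = joint_dice_loss(g_mask, g_mask, g_labels, g_labels)
worst = joint_dice_loss([0, 0, 1, 1], g_mask, [[1, 1, 0, 0], [0, 0, 1, 1]], g_labels)
print(perfect, worst)
```

With this sign convention the loss ranges from $-(K + 1)$ for a perfect prediction up to 0 when no prediction overlaps the ground truth, so minimizing it maximizes the Dice overlap of every output map.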

2.2.4. Training

The networks were implemented using the PyTorch framework and trained with stochastic gradient descent using the Adam optimizer (Kingma and Ba, 2014) with the default parameters (an initial learning rate of 0.001, β1 = 0.9, β2 = 0.999 and no weight decay) on an NVIDIA GeForce GTX 1080 GPU for up to 12 h (train and validation) or on an NVIDIA Volta V100 GPU for up to 24 h (MRM NeAt). Each network was trained with a batch size of one. Qualitatively, the training pace of 2D and 3D networks was substantially the same, as evidenced in supplementary Fig. S1.

We augmented the data online each time an image was loaded by scaling the volumes by a factor α randomly drawn from the interval [0.95, 1.01] and rotating them around each axis by a random angle between −5° and 5°. Scaling factors smaller than one were preferred to decrease memory requirements. Each transformation was applied with 50% probability. To further decrease memory requirements, a bounding box was created for each volume using the annotated brain mask as a reference. Each volume was individually normalized to 0 mean and unit variance. Hyperparameters, optimizer and data augmentation scheme were fixed before training, ensuring that each architecture would fit into memory, and applied to each network with no additional fine-tuning.
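The sampling side of this augmentation scheme, together with the per-volume normalization, can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and treating scaling and rotation as two separately gated transformations is one reading of "each transformation was applied with 50% probability". The resampling of the volume itself is omitted.

```python
import random
import statistics


def sample_augmentation(rng):
    """Draw a scaling factor and three rotation angles (degrees)."""
    # Scaling, applied with 50% probability; 1.0 means "no scaling"
    scale = rng.uniform(0.95, 1.01) if rng.random() < 0.5 else 1.0
    # Rotation around each axis, applied with 50% probability
    if rng.random() < 0.5:
        angles = tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
    else:
        angles = (0.0, 0.0, 0.0)
    return scale, angles


def normalize(voxels):
    """Rescale a flattened volume to zero mean and unit variance."""
    mean = statistics.fmean(voxels)
    std = statistics.pstdev(voxels)
    if std == 0:
        raise ValueError("constant volume cannot be normalized")
    return [(v - mean) / std for v in voxels]


rng = random.Random(0)
scale, angles = sample_augmentation(rng)
normalized = normalize([10.0, 12.0, 14.0, 16.0])
print(scale, angles)
print(normalized)
```

Drawing the parameters anew at every load means the network rarely sees exactly the same volume twice, at no storage cost.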

2.2.5. Auxiliary bounding-box network

As MU-Net was trained after cropping the volumes to a bounding box, we trained a lighter 2D network to run a first estimate for the brain mask at inference time from the complete volume. This was then used to draw a bounding box around the brain with a one-voxel margin. This auxiliary network follows exactly the same architecture as MU-Net, omitting any framing or dense connections, and limiting the number of channels to 4, 8, 16 and 32, from the shallowest to the deepest layer. This results in a network with a total of 122,455 trained parameters.
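The cropping step can be illustrated with a small helper that derives a bounding box with a one-voxel margin from a binary mask. The function `bounding_box` and the nested-list mask are hypothetical stand-ins for the array operations a real pipeline would use.

```python
def bounding_box(mask, margin=1):
    """Per-axis (start, stop) slices enclosing the non-zero voxels of a 3D mask.

    The box is grown by `margin` voxels on every side and clipped to the volume.
    """
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("mask is empty")
    shape = (len(mask), len(mask[0]), len(mask[0][0]))
    box = []
    for axis in range(3):
        lowest = min(c[axis] for c in coords) - margin
        highest = max(c[axis] for c in coords) + margin
        # stop index is exclusive, as in Python slicing
        box.append((max(lowest, 0), min(highest, shape[axis] - 1) + 1))
    return box


# 3 x 5 x 5 toy volume with a 2 x 2 "brain" in the middle slice
mask = [[[0] * 5 for _ in range(5)] for _ in range(3)]
for y in (2, 3):
    for x in (1, 2):
        mask[1][y][x] = 1

box = bounding_box(mask)
print(box)
```

Because the box only has to contain the brain, a coarse first-pass mask from a small network is sufficient for this step.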

2.3. STEPS multi-atlas segmentation

STEPS is a state-of-the-art label fusion algorithm to combine multiple registered templates to label a target volume (Cardoso et al., 2013). It takes into account the local and global image matching, combining an expectation-maximization approach with Markov Random Fields to improve on the segmentation based on the quality of the registration itself.

The registrations were performed as follows: before registration, each volume underwent non-parametric N3 bias field correction (Sled et al., 1998) implemented within the ANTS toolset (Avants et al., 2009). Taking each volume as reference, all other volumes were then registered with an affine transformation using FSL FLIRT (Jenkinson and Smith, 2001) and then nonlinearly registered via FSL FNIRT (Andersson et al., 2007; Jenkinson et al., 2012) with the aid of the manually drawn brain mask. Label fusion was achieved with the STEPS algorithm distributed in the NiftySeg package (Cardoso et al., 2013; 2012).

We used correlation ratio (corratio) as the cost function in FLIRT and FNIRT. We used the default FLIRT and FNIRT parameters with the following exceptions. The search range of angles in FLIRT was [−70, 70] instead of the default [−90, 90], because the orientations of the volumes were similar. In FNIRT, we used spline interpolation instead of the default linear interpolation.

STEPS depends on the number of templates employed and the standard deviation of its Gaussian kernel. We performed a grid search to select the optimal parameters, randomly selecting 10 volumes and labeling them using STEPS. We sampled the standard deviation of the Gaussian kernels between 0.5 and 6 with a stride of 0.5, and the number of templates ranged between 1 and 20 randomly selected volumes. This same process was performed both using diffeomorphic registration and using affine registration only (supplementary Fig. S2), selecting 16 templates and a kernel standard deviation of 1.5 for the diffeomorphic case, and 18 templates with a kernel standard deviation of 2.5 for the affinely registered volumes. Exploring both grids required 287 h in total.

Each volume was then segmented using these parameters, randomly selecting as templates for the STEPS algorithm the number of mice determined by the parameter grid search outlined above. We repeated this procedure randomly selecting the same number of templates from mice of the same age only. The mice randomly selected as reference atlases were selected from the training set associated with each volume according to the same 5-fold cross validation scheme used to train the CNNs as outlined in Section 2.5.

When evaluating STEPS on the MRM NeAt dataset, we used scripts provided by Ma et al. (2014) at https://github.com/dancebean/multi-atlas-segmentation as this implementation is optimized using this dataset.


The computations described here for the training and validation dataset were executed on a workstation equipped with a 6-core, 12-thread Intel Core i7-8700K CPU running at 3.70 GHz. To accelerate the computations generating several intermediate file outputs, we used a RAM disk to reduce the number of disk operations. For the NeAt dataset, computations were performed on a 12-core, 24-thread AMD Ryzen 9 3900X processor.

2.4. Post-processing

The only post-processing steps applied on the segmentation maps were the filling of holes in the resulting 3D volume, the selection of the largest connected component as the brain mask for the skull-stripping task, and assigning all voxels predicted as non-brain to the background class.
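Two of these post-processing steps, keeping the largest connected component and sending non-brain voxels to the background, can be sketched on a 2D toy slice. The sketch is illustrative only: the function names are hypothetical, it uses 4-connectivity in 2D whereas the text describes a 3D volume, and hole filling is left out.

```python
from collections import deque


def largest_component(mask):
    """Keep only the largest 4-connected foreground component of a 2D mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


def apply_brain_mask(labels, brain_mask):
    """Assign every voxel outside the brain mask to the background class (0)."""
    return [[lab if keep else 0 for lab, keep in zip(label_row, mask_row)]
            for label_row, mask_row in zip(labels, brain_mask)]


raw_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
region_labels = [
    [2, 2, 0, 0, 3],
    [2, 1, 0, 0, 0],
    [0, 0, 0, 4, 0],
]
brain = largest_component(raw_mask)
cleaned_labels = apply_brain_mask(region_labels, brain)
print(brain)
print(cleaned_labels)
```

The two isolated voxels are discarded from the mask, and the region labels that fell on them are reset to background.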

2.5. Validation and metrics

To assess the overlap between the ground truth and the predicted segmentation masks, we used the Dice coefficient as the primary performance measure (Dice, 1945). The Dice coefficient is defined as two times the size of the intersection over the sum of the sizes of the two regions:

$$D = \frac{2\,|Y_t \cap Y|}{|Y_t| + |Y|},$$

where by $Y$ we indicate our prediction and by $Y_t$ the ground truth. This coefficient ranges from 0, meaning no overlap, to 1, indicating a complete overlap between the two regions.

We further evaluated our results using the 95th percentile of the symmetric Hausdorff distance (HD95) (Huttenlocher et al., 1993). HD95 indicates the magnitude of the largest segmentation error compared to the ground truth, expressed in millimeters. We additionally computed precision (defined as $|Y_t \cap Y| / |Y|$) and recall (defined as $|Y_t \cap Y| / |Y_t|$). These measures provide complementary information to the Dice overlap.
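The overlap measures defined in this section translate directly into set operations on voxel coordinates. The sketch below is a minimal illustration with a hypothetical function name; it assumes both regions are non-empty and does not cover HD95.

```python
def overlap_metrics(truth, prediction):
    """Dice, precision and recall for two sets of voxel coordinates."""
    truth, prediction = set(truth), set(prediction)
    overlap = len(truth & prediction)
    return {
        "dice": 2 * overlap / (len(truth) + len(prediction)),
        "precision": overlap / len(prediction),
        "recall": overlap / len(truth),
    }


# Toy example: four ground-truth voxels, three predicted, two in common
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(1, 0), (1, 1), (2, 1)}
metrics = overlap_metrics(ground_truth, predicted)
print(metrics)
```

Precision drops when the prediction spills outside the true region and recall drops when it misses part of it, so the pair shows which kind of error lies behind a given Dice score.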

Each experiment on the train and validation dataset as well as the NeAt dataset (see Table 1) was validated according to a 5-fold cross validation (CV) scheme. Volumes were distributed in each fold according to the individual identity of each animal, preventing the use of the volumes from the validation animals for training. The animals were randomly assigned to each fold once, and the same animals remained assigned to their respective folds through all experiments. For the train and validation dataset, this resulted in a training set of 25 or 26 mice and a validation set of 6 or 7 mice in each fold. For the MRM NeAt dataset, 5-fold CV resulted in 8 volumes used for training (or as registration atlases) and 2 for testing in each fold. The test dataset was used as an external test set to evaluate MU-Net trained on the train and validation dataset.
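An animal-level fold assignment of this kind can be sketched as follows. The identifiers, the seed and the function names are hypothetical; only the counts (32 animals, 4 ages, 5 folds) come from the text.

```python
import random


def assign_folds(animal_ids, n_folds=5, seed=0):
    """Randomly assign each animal (not each scan) to exactly one fold."""
    ids = sorted(animal_ids)
    random.Random(seed).shuffle(ids)
    return {animal: index % n_folds for index, animal in enumerate(ids)}


def split(scans, folds, held_out):
    """Split (animal_id, age) scans so that no animal appears on both sides."""
    train = [scan for scan in scans if folds[scan[0]] != held_out]
    validation = [scan for scan in scans if folds[scan[0]] == held_out]
    return train, validation


animals = [f"mouse_{i:02d}" for i in range(32)]
scans = [(animal, age) for animal in animals for age in (5, 12, 16, 32)]
folds = assign_folds(animals)

for k in range(5):
    train, validation = split(scans, folds, k)
    n_val_animals = len({animal for animal, _ in validation})
    print(k, len(train), len(validation), n_val_animals)
```

With 32 animals and 5 folds, each validation fold holds 6 or 7 mice, matching the split sizes reported above, and all four scans of a mouse always stay together.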

Unless otherwise specified, we used a paired permutation test to evaluate the significance of differences between the Dice scores obtained by different methods, pairing the Dice scores obtained on the same MRI volumes. The unpaired permutation test was used instead when comparing results obtained on different volumes, for example, when comparing the accuracy of a model on volumes from younger mice with that of the same model on older mice, and for all comparisons on the test set. We performed permutation tests using 100,000 iterations, and considered average differences to be significant when p was smaller than 0.05. The unpaired permutation tests of Dice coefficients between different animal groups were performed by permuting animals (not images) between the two groups. This ensures exchangeability when several images of the same animal existed due to longitudinal designs in the test set.
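A paired permutation test of the kind described here can be sketched with random sign flips of the per-volume differences. This is an illustrative version under stated assumptions, not the authors' script: it is two-sided, uses fewer iterations than the 100,000 reported, applies the common "+1" correction to the p-value, and the Dice scores are made up.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_iter=10_000, seed=0):
    """Two-sided paired permutation test on the mean difference."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_iter):
        # Under the null hypothesis the sign of each paired difference is arbitrary
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


# Hypothetical Dice scores of two methods on the same eight volumes
method_a = [0.95, 0.96, 0.94, 0.97, 0.95, 0.96, 0.93, 0.95]
method_b = [0.90, 0.93, 0.91, 0.92, 0.90, 0.94, 0.89, 0.91]
p_value = paired_permutation_test(method_a, method_b)
print(p_value)
```

For the unpaired variant, the group membership of whole animals would be shuffled instead of flipping signs, so that repeated scans of one animal always move together.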

3. Results

Using the train and validation dataset, we compared the performance of different network architectures. Furthermore, we compared MU-Net with multi-atlas segmentation on both our data and the MRM NeAt dataset, and evaluated the impact of mouse age on the accuracy of our segmentation maps. The experiments reported in Sections 3.1–3.3 are based on 5-fold CV on the train and validation set, and experiments in Section 3.4 on 5-fold CV on the MRM NeAt dataset. Finally, in Section 3.5, we tested MU-Net trained on the train and validation set on an independent test set that included 1782 MRI volumes from 817 mice.

3.1. Architecture comparison

We compared the performance of different networks trained with and without dense connections and dual framing connections, in both 2D and 3D implementations.

As shown in Table 2, all MU-Nets achieved Dice scores with the ground truth comparable to or higher than the typical inter-rater variability of manual segmentation in the mouse brain (Dice scores from 0.80 to 0.90 (Ali et al., 2005)). The skull-stripping task achieved an excellent Dice score of 0.984. The ventricles were characterized by the lowest segmentation performance (average Dice score 0.907), while the cortex displayed the highest overlap with the ground truth (average Dice score 0.966). Dice scores for each animal in all ROIs are provided as supplementary Table S3.

The network displaying the highest average Dice scores was, in fact, the simplest one, including neither in-block skip connections nor framing connections, and using 2D convolutions. The accuracy of this network was significantly higher than the accuracy of all other 2D networks (p < 0.00003). Because of its excellent performance and simplicity, this network is our choice for the MU-Net architecture, which is the architecture we used for all experiments detailed in Sections 3.2 and 3.3.

The choice between 2D and 3D architectures was the most important factor in increasing performance, resulting in a marked increase in mean Dice scores for both tasks (p < 0.00001) for all 2D networks compared to the 3D ones. We further compared MU-Net with one featuring fewer channels per filter (49, 49, 50, 50, from the shallowest to the deepest convolutional block) to match the number of parameters to the number of parameters of the simplest 2D network. We registered a slightly (but not significantly, p = 0.077) lower accuracy compared to MU-Net, indicated as 2D SLP in Table 2.

To test whether the increased performance of 2D architectures compared to the 3D implementation depended on the reduced number of parameters or on an excessive loss of information when pooling in the anterior-posterior direction, we trained a network using 3D filters while limiting pooling operations to the coronal plane. This network achieved a segmentation accuracy in between the 3D and 2D implementations (Table 2), suggesting that both above-mentioned aspects were relevant in increasing the algorithm's performance.

We studied the effect of bias field correction on the performance of MU-Net by training it on images without bias correction and, separately, on N3 bias-corrected MR images (Sled et al., 1998). The validation accuracy achieved with bias correction was indistinguishable from the accuracy of MU-Net trained without bias correction (see Table 2).

3.2. Age stratified training sets

We evaluated the performance of MU-Net when restricting the training set to mice of a specific age. Networks trained on data from mice of 12, 16 and 32 weeks achieved higher accuracy, both on their respective validation set and the overall ground truth, compared to the networks trained on 5 weeks mice (p < 0.00001). As shown in Fig. 5, while all networks trained on one specific age displayed a statistically significant (p < 0.05, unpaired) decrease in mean accuracy when validated on animals of a different age, this difference was highest between the 5 weeks data and the other datasets.

Limitingthetrainingdatatoonespecificageimpliesthatthesenet- worksweretrainedonlyonaquarterofthedatausedtotrainthenet- worksinSection3.1.Irrespectiveofthat,thesenetworksstillachieved averageDicescoreonthemixed-agevalidationdatasetcomparablewith

(7)

Table2

CNNandSTEPSaccuraciesmeasuredusingDicecoefficientacrossdifferentmethodologicalchoices.Cross-validation resultsonthetrainandvalidationdataset.

| Dim | SC FC | Brain mask | Cortex | Hippocampi | Ventricles | Striati | ROI mean |
|---|---|---|---|---|---|---|---|
| 2D | | 0.984 ± 0.005 | 0.966 ± 0.009 | 0.925 ± 0.017 | 0.907 ± 0.020 | 0.939 ± 0.010 | 0.935 ± 0.026 |
| 2D | x x | 0.984 ± 0.006 | 0.963 ± 0.010 | 0.924 ± 0.016 | 0.905 ± 0.022 | 0.937 ± 0.009 | 0.932 ± 0.026 |
| 2D | x | 0.984 ± 0.006 | 0.963 ± 0.011 | 0.924 ± 0.017 | 0.905 ± 0.022 | 0.938 ± 0.009 | 0.932 ± 0.026 |
| 2D | x | 0.984 ± 0.005 | 0.964 ± 0.011 | 0.923 ± 0.018 | 0.905 ± 0.024 | 0.937 ± 0.010 | 0.932 ± 0.027 |
| 3D | x x | 0.982 ± 0.007 | 0.956 ± 0.016 | 0.914 ± 0.033 | 0.900 ± 0.025 | 0.926 ± 0.045 | 0.924 ± 0.038 |
| 3D | x | 0.982 ± 0.007 | 0.958 ± 0.016 | 0.916 ± 0.032 | 0.900 ± 0.025 | 0.928 ± 0.029 | 0.925 ± 0.034 |
| 3D | x | 0.982 ± 0.006 | 0.957 ± 0.016 | 0.913 ± 0.041 | 0.899 ± 0.028 | 0.926 ± 0.042 | 0.924 ± 0.040 |
| 3D | | 0.982 ± 0.007 | 0.957 ± 0.013 | 0.916 ± 0.033 | 0.899 ± 0.026 | 0.926 ± 0.039 | 0.924 ± 0.036 |
| 3DConv 2DPool | | 0.983 ± 0.006 | 0.961 ± 0.010 | 0.919 ± 0.026 | 0.902 ± 0.026 | 0.934 ± 0.014 | 0.929 ± 0.030 |
| 2D SLP | | 0.984 ± 0.005 | 0.965 ± 0.009 | 0.924 ± 0.016 | 0.907 ± 0.021 | 0.939 ± 0.010 | 0.934 ± 0.026 |
| 2D + N3 | | 0.984 ± 0.005 | 0.965 ± 0.009 | 0.924 ± 0.020 | 0.907 ± 0.020 | 0.939 ± 0.009 | 0.934 ± 0.026 |
| STEPS (affine) | | \ | 0.920 ± 0.058 | 0.827 ± 0.079 | 0.761 ± 0.090 | 0.873 ± 0.062 | 0.845 ± 0.093 |
| STEPS (diffeo) | | \ | 0.948 ± 0.036 | 0.844 ± 0.048 | 0.812 ± 0.090 | 0.871 ± 0.045 | 0.869 ± 0.070 |
| STEPS (affine) | | \ | 0.936 ± 0.013 | 0.831 ± 0.029 | 0.781 ± 0.049 | 0.887 ± 0.019 | 0.859 ± 0.066 |
| STEPS (diffeo) | | \ | 0.954 ± 0.009 | 0.848 ± 0.025 | 0.826 ± 0.039 | 0.885 ± 0.016 | 0.879 ± 0.055 |
| Majority Voting | | \ | 0.889 ± 0.179 | 0.780 ± 0.232 | 0.677 ± 0.208 | 0.816 ± 0.245 | 0.791 ± 0.230 |

Listed values are the average validation Dice scores between automatic and manual segmentation ± standard deviations of these Dice scores in 5-fold CV. ROI mean column refers to the mean Dice coefficient of the cortex, the hippocampi, the ventricles and the striati. SC and FC indicate the presence of skip connections and framing connections. MU-Net results are displayed in the first row. STEPS in the first pair of STEPS rows refers to STEPS using randomly selected templates; STEPS in the second pair of STEPS rows refers to STEPS runs using randomly selected mice of the same age only; affine indicates that only affine registration was used, whereas diffeo indicates this was followed by a diffeomorphic registration step; Majority voting refers to the selection of the most frequently occurring label after diffeomorphic registration; 3DConv 2DPool: network featuring no in-block skip connections or framing connections, with 3D filtering and 2D pooling in the coronal plane; 2D SLP: 2D network with in-block skip connections and a limited number of parameters; 2D + N3: 2D network trained on data bias-corrected using the N3 algorithm. Boldface characters indicate the best performing network, achieving significantly higher Dice scores than all other networks for that ROI.

the accuracy of manual segmentation. The worst performing CNN was the network trained on 5-week-old mice. Training on the 12, 16 and 32 weeks data and validating on mice of the same age, we observed Dice scores comparable with the overall performance of MU-Net trained on the entire dataset (𝑝 > 0.15, unpaired). However, we measured a lower overall performance when including mice of all ages in the validation data (𝑝 < 0.00001), indicating slight overfitting to each specific age.

3.3. Comparison with multi-atlas segmentation

We compared MU-Net with multi-atlas segmentation, applying the state-of-the-art STEPS (Cardoso et al., 2013; 2012) label fusion method to combine the labels obtained from the registration of multiple labeled volumes. This was implemented using the Niftyseg package as described in Section 2.3. We repeated this procedure using both diffeomorphic and affine registration methods, with randomly-selected templates restricted to same-age mice. The brain mask segmentation was not evaluated as the manually drawn mask was used during the diffeomorphic registration procedure.

MU-Net achieved higher Dice coefficients than all STEPS implementations (𝑝 < 0.00001, Cohen’s 𝑑: 4.39, see Table 2). Also, there was a marked qualitative difference between STEPS segmentation and MU-Net (Fig. 2), the latter achieving results visually indistinguishable from manual segmentation. The HD95 distances we computed further confirmed this difference, with an average of 0.084 ± 0.019 mm for MU-Net against 0.251 ± 0.064 mm for STEPS (𝑝 < 0.00001). We measured a mean precision of 0.962 ± 0.008 (MU-Net) vs 0.820 ± 0.025 (STEPS) (𝑝 < 0.00001) and a mean recall of 0.951 ± 0.011 (MU-Net) vs 0.952 ± 0.013 (STEPS) (𝑝 = 0.65).
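The overlap metrics reported in this section can be illustrated with a minimal sketch. It assumes binary masks stored as NumPy arrays; the function name `overlap_metrics` and the toy masks are hypothetical and are not taken from the MU-Net code base. The toy prediction covers the whole reference region plus a few extra voxels, which gives the high-recall, lower-precision pattern reported here for STEPS.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Return (dice, precision, recall) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()   # voxels labeled in both masks
    fp = np.logical_and(pred, ~truth).sum()  # predicted but absent from the reference
    fn = np.logical_and(~pred, truth).sum()  # present in the reference but missed
    # Convention assumed here: empty-vs-empty comparisons score 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return float(dice), float(precision), float(recall)

# Toy example (hypothetical data): 4 reference voxels, all found, plus 2 false positives.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = truth.copy()
pred[0, 1:3] = True
dice, precision, recall = overlap_metrics(pred, truth)
```

In this toy case recall is perfect while precision and Dice are reduced by the two extra voxels.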

MU-Net had an inference time of about 0.35 s and a training time of 12 h. The STEPS segmentation procedure required a total inference time of 117 min for each labeled volume (on average 440 s for each pairwise diffeomorphic registration and 7.85 s for label fusion). Implementing STEPS segmentation using only templates of the same age led to a small but significant improvement in Dice coefficients over randomly choosing templates of any age (𝑝 < 0.0007, Cohen’s 𝑑: 0.296). The employment of diffeomorphic registration was the most important factor affecting the performance of STEPS, as displayed in Table 2. A simple majority voting strategy led to significantly lower performance in all ROIs compared to all other label fusion strategies (𝑝 < 0.003).
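As a point of reference for the simplest label fusion baseline mentioned above, the following is a minimal sketch of per-voxel majority voting. It assumes the template labels have already been propagated to the target space as integer label maps. The function name `majority_vote` and the toy arrays are hypothetical. STEPS itself weights templates by local similarity, which this sketch does not attempt to reproduce.

```python
import numpy as np

def majority_vote(label_maps, n_labels):
    """Fuse integer label maps of identical shape by per-voxel majority voting."""
    stack = np.stack(label_maps)  # shape: (n_templates, *volume_shape)
    # For every label, count how many templates voted for it at each voxel.
    votes = np.stack([(stack == k).sum(axis=0) for k in range(n_labels)])
    # argmax resolves ties in favor of the lowest label index (an arbitrary choice).
    return votes.argmax(axis=0)

# Toy example (hypothetical data): three propagated 2x2 label maps with labels 0..2.
propagated = [
    np.array([[0, 1], [2, 2]]),
    np.array([[0, 1], [1, 2]]),
    np.array([[0, 0], [1, 2]]),
]
fused = majority_vote(propagated, n_labels=3)
```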

Furthermore, we trained MU-Net on the outputs of the implemented STEPS procedures featuring diffeomorphic registration, and measured the Dice scores of each network’s output with the ground truth (Table 3). As evidenced in Tables 2 and 3, and Fig. 3, MU-Net trained on STEPS segmentations achieved higher Dice score with the ground truth than the same STEPS segmentations constituting the training sets of MU-Net (𝑝 < 0.00001). With the exception of the network trained on 5 weeks old mice, these hybrid networks were still under-performing compared to training on manually segmented data (𝑝 < 0.00001).

3.4. Evaluation on a large number of ROIs with the MRM NeAt dataset

We trained and evaluated MU-Net on the MRM NeAt dataset, which includes atlases of 10 individual T2∗-weighted in vivo brain MR images of 12–14 weeks old C57BL/6J mice, each with 37 manually labelled anatomical structures (Ma et al., 2008). This same database was selected by Ma et al. (2014) to evaluate the STEPS multi-atlas segmentation algorithm on mouse brain MRI. To compare MU-Net with STEPS, we followed the STEPS implementation by Ma et al. (2014) as released by the authors.

We used a 5-fold cross validation scheme for evaluation (8 templates for training and 2 templates for testing in each fold). The only adaptation required to train MU-Net on the MRM NeAt dataset was to expand the number of output channels to 37 (plus one for the brain mask) to equal the number of ROIs. As displayed in Fig. 4, the Dice coefficient of MU-Net was greater than or comparable to that of STEPS: while in a majority of regions MU-Net’s accuracy was higher than the accuracy of STEPS, this was statistically significant only for the brain mask, external capsule, hypothalamus and brainstem. In the left inferior colliculi, STEPS achieved a significantly higher Dice coefficient than MU-Net. Averaging the Dice coefficients across all ROIs, we measured an average Dice score of 0.820 ± 0.031 for MU-Net and 0.814 ± 0.023 for STEPS. While this average Dice coefficient for MU-Net was higher, the difference was not statistically significant (𝑝 = 0.170, Cohen’s 𝑑: 0.134). Similarly, we measured a higher (but not statistically significant, 𝑝 = 0.07) average HD95 distance for MU-Net (0.360 ± 0.252 mm vs 0.240 ± 0.038 mm). In contrast, we measured a significantly higher average precision with MU-Net (0.823 ± 0.033 vs 0.786 ± 0.024, 𝑝 = 0.0009) and a significantly lower recall (0.815 ± 0.032 vs 0.853 ± 0.023, 𝑝 = 0.001). A full breakdown of these metrics is available in supplementary Fig. S3. The computation time required by STEPS to segment a single volume was approximately 20 min, while MU-Net required less than one second per volume.

Fig. 2. Segmentation comparison in four slices from a single animal: (a) STEPS, (b) MU-Net, and (c) manual annotation. In (a)–(c), the regions highlighted are the cortex (blue), ventricles (green), striati (red), and hippocampi (yellow). Panel (d) shows the inferred brain mask by MU-Net.

Table 3

Mean and standard deviation of average Dice scores evaluating the accuracy of MU-Net trained on volumes segmented via STEPS.

| Training Set | Cortex | Hippocampus | Ventricles | Striatum | ROI mean |
|---|---|---|---|---|---|
| STEPS | 0.954 ± 0.011 | 0.867 ± 0.027 | 0.866 ± 0.035 | 0.898 ± 0.017 | 0.896 ± 0.043 |
| STEPS | 0.953 ± 0.009 | 0.872 ± 0.022 | 0.849 ± 0.041 | 0.885 ± 0.016 | 0.890 ± 0.046 |

Fig. 3. Average Dice score comparison between different segmentation methods, across all ROIs. MU-Net: MU-Net trained on the manually segmented data; MU-Net-STEPS: MU-Net trained on volumes segmented employing same-age diffeomorphic STEPS; STEPS: same-age diffeomorphic STEPS segmentation. The error bar represents standard deviation.

3.5. Evaluation with a large test dataset

We optimized the MU-Net model on the train and validation dataset and tested on a large test set of 1782 MRI volumes, acquired from 817 mice with ages ranging from 4 to 60 weeks, and including both WT and HT mice. As the 5-fold cross-validation experiment produced five different MU-Net models, the segmentation maps for the test set were obtained by averaging the five prediction maps produced by the five models. To outline the brain mask, we averaged sigmoid-activated predictions from five networks and thresholded them at 0.5. For region segmentation, we averaged the softmax-activated output maps, and for each voxel, we selected the class yielding the maximal averaged value as our predicted label.
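The ensembling step described above can be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce the released implementation. The per-model outputs are assumed to be already activated and stacked into NumPy arrays, and the function name `ensemble_predictions`, the array layout, the number of classes and the random inputs are all hypothetical. Whether the 0.5 threshold is strict is not stated in the text, so a strict comparison is assumed here.

```python
import numpy as np

def ensemble_predictions(mask_probs, region_probs):
    """Combine per-model outputs into a brain mask and a region label map.

    mask_probs:   (n_models, *volume_shape), sigmoid-activated mask outputs
    region_probs: (n_models, n_classes, *volume_shape), softmax-activated outputs
    """
    mean_mask = np.mean(mask_probs, axis=0)
    brain_mask = mean_mask > 0.5              # threshold the averaged mask at 0.5 (strictness assumed)
    mean_regions = np.mean(region_probs, axis=0)
    labels = np.argmax(mean_regions, axis=0)  # class with the maximal averaged value per voxel
    return brain_mask, labels

# Toy example with random stand-ins for the outputs of five models (hypothetical data).
rng = np.random.default_rng(0)
n_models, n_classes, shape = 5, 5, (2, 3, 3)
mask_probs = rng.random((n_models, *shape))
logits = rng.normal(size=(n_models, n_classes, *shape))
region_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
brain_mask, labels = ensemble_predictions(mask_probs, region_probs)
```

Averaging probabilities before taking the argmax lets a confident minority of models outweigh an uncertain majority, which a vote over per-model hard labels would not allow.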

Out of the entire test set, segmentation failed completely on two volumes, where no brain mask was detected. The remaining 1780 volumes were successfully segmented with an average Dice score of 0.978 ± 0.012 for the brain mask, 0.906 ± 0.041 for the striati, and 0.937 ± 0.035 for the cortex, distributed as illustrated in Fig. 7. There was no significant difference between the segmentation accuracy of male and female animals (𝑝 > 0.1, unpaired). However, there was a significant difference in accuracy between HT and WT mice (𝑝 < 0.00001, unpaired) for all ROIs. Dice scores of WT animals were 0.4% higher for the brain mask, 1.7% higher for the cortex, and 1.9% higher for the striati. Applying N3 bias correction on all volumes before segmentation did not result in a significant Dice score difference. A detailed list of Dice scores, HD95, precision and recall, for each animal and each ROI, is available in supplementary Table S4.

Fig. 4. Comparison between the average Dice coefficients of MU-Net and STEPS multi-atlas algorithm by Ma et al. Error bars correspond to standard deviation for the average accuracy. Permutation-test based p-values for each comparison are provided in parentheses after the ROI name, + indicates that the average Dice coefficient for MU-Net was higher and - indicates that the average Dice coefficient for STEPS was higher, indicates a statistically significant difference.

Fig. 5. Mean accuracy ± standard deviation for the average accuracy of MU-Net trained and evaluated on different datasets according to mouse age. Networks exclusively trained on older animals achieved lower accuracy when attempting to generalize to the youngest animals, and vice-versa.

Fig. 6. MU-Net segmentation compared to the manual segmentation in four slices of four volumes of the test set. Blue and red indicate, respectively, ground truth and inferred segmentation, purple their overlap (striati and cortex); yellow ROIs (ventricles and hippocampi) are inferred ROIs for which manual annotations were not available. Rows indicate (a) the highest performing volume (mean Dice 0.964, 8 weeks old R6/2 mouse); (b) the lowest performing volume (mean Dice 0.685, 12 weeks old R6/2 mouse); (c) the volume displaying performance closest to the mean performance on the entire test set (Dice 0.923, 12 weeks old Q175DN mouse); (d) one randomly selected volume (Dice 0.919, 8 weeks old Q175DN mouse).

A visual inspection of the segmentation maps (Fig. 6) revealed that ROIs were qualitatively similar to those obtained on the validation set and displayed in Fig. 2. We observed, however, a visible decrease in performance in the presence of strong ringing artifacts (Fig. 6b). This is further reflected in the higher average HD95 distances in the test dataset than in the validation dataset (Table 4).
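For readers unfamiliar with the HD95 metric, the following minimal sketch shows one common way to compute a 95th-percentile Hausdorff distance between two binary masks. Several variants exist (for example, restricted to surface voxels), and the exact definition used in this study is given in the methods rather than in this section, so this should be read as an assumption-laden illustration. The function name `hd95`, the `spacing` parameter and the toy masks are hypothetical. The brute-force pairwise distance matrix is only practical for small masks.

```python
import numpy as np

def hd95(mask_a, mask_b, spacing=(1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance between two non-empty binary masks."""
    # Convert foreground voxel indices to physical coordinates (e.g. mm).
    pts_a = np.argwhere(mask_a) * np.asarray(spacing)
    pts_b = np.argwhere(mask_b) * np.asarray(spacing)
    # Pairwise Euclidean distances between every foreground voxel of A and of B.
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    a_to_b = dists.min(axis=1)  # distance from each A voxel to its nearest B voxel
    b_to_a = dists.min(axis=0)  # distance from each B voxel to its nearest A voxel
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))

# Toy example (hypothetical data): a 3x3 square and the same square shifted by one voxel.
a = np.zeros((5, 5), dtype=bool)
a[1:4, 1:4] = True
b = np.zeros((5, 5), dtype=bool)
b[1:4, 2:5] = True
shifted_distance = hd95(a, b)
identical_distance = hd95(a, a)
```

Unlike the Dice coefficient, this distance is sensitive to a few far-away misclassified voxels, which is consistent with artifacts raising HD95 even when Dice remains high.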

4. Discussion

We have presented a multi-task deep neural network, MU-Net, for the simultaneous skull-stripping and segmentation of mouse brain MRI. We selected the best performing network among a number of architectures and found it to achieve better segmentation accuracy on the validation set compared to state-of-the-art multi-atlas segmentation procedures, with a markedly lower segmentation time (0.35 s vs 117 min). We then evaluated the performance of MU-Net on a large and heterogeneous test set of 1782 volumes from 10 different studies of Huntington disease, with varying ages and genetic backgrounds (WT as well as HT Q175 and R6/2 variants). In this test set, we measured average Dice scores of 0.978, 0.906 and 0.937 for the brain mask, striati and cortex, rivaling human-level performance. We additionally trained MU-Net for the segmentation of high resolution mouse MRIs of the MRM NeAt atlas into 37 ROIs, measuring an average Dice score of 0.820. Hence, we argue that the employment of deep neural networks for the segmentation of animal MRI is a promising strategy for the reduction of both rater bias and segmentation time.

Fig. 7. Test set Dice score distribution for the brain mask, cortex and striati ROIs. Males and Females include all mice of each gender, both WT and TG. Likewise, WT and TG include both males and females.

Table 4

Average test set metrics (see Supplementary Table S4 for details).

| Metric | Brain Mask | Cortex | Striati |
|---|---|---|---|
| Dice | 0.978 ± 0.012 | 0.937 ± 0.035 | 0.906 ± 0.041 |
| HD95 (mm) | 0.345 ± 0.303 | 0.223 ± 0.231 | 0.180 ± 0.167 |
| Precision | 0.989 ± 0.006 | 0.939 ± 0.050 | 0.929 ± 0.045 |
| Recall | 0.969 ± 0.022 | 0.939 ± 0.054 | 0.888 ± 0.062 |

To put the Dice scores we have reported in context, Dice scores between two human experts have ranged from 0.80 to 0.90, depending on ROI, for mouse brain MRI segmentation (Ali et al., 2005). For different segmentation tasks in brain MRI in general, including human data, inter- and intra-rater Dice scores have ranged between 0.75 and 0.96 (Ali et al., 2005; Entis et al., 2012; Yushkevich et al., 2006). The Dice scores of MU-Net exceeded the above mentioned scores between two human experts, suggesting human-level segmentation performance. In addition, the Dice score of MU-Net for skull-stripping was higher than the Dice score from the skull-stripping CNN implemented by Roy et al. (2018b) (0.949). Obviously, comparing previously reported Dice scores to our segmentation accuracy measures must be done with care as these vary across different studies, segmentation tasks, and datasets, and the confounding factors include image resolution, presence of artifacts and noise, rater expertise, and the choice of ROIs.

While Roy et al. (2018b) proposed a CNN for skull-stripping for mouse MRI, to our knowledge this work represents the first CNN performing both region segmentation and skull-stripping in mouse brain MRI. The advantages of CNNs with respect to atlas-based region segmentation (Bai et al., 2012; De Feo and Giove, 2019; Ma et al., 2014) are clear. First, compared to atlas-based segmentation, MU-Net is much faster and produces accurate results without pre-processing. Second, we found MU-Net to be significantly more accurate than the state-of-the-art STEPS multi-atlas segmentation (Ma et al., 2014) on anisotropic, relatively quick to acquire MR images favored in pre-clinical drug and biomarker discovery applications. Third, we found MU-Net to perform better than or equally well compared to STEPS on isotropic, high-resolution MR images with relatively long acquisition times, favored in basic research.

We observed that the segmentation accuracy of atlas-based methods can vary markedly with the specific use case, depending on the number of manually drawn ROIs, voxel size, and image quality. The best performance was achieved using advanced registration-based methods (Ma et al., 2014) on the high resolution data (Ma et al., 2008) with a densely labeled atlas of 37 ROIs, and the lowest using a majority voting rule on a sparsely outlined atlas with a low resolution along the fronto-caudal direction.

With a dense segmentation of high resolution images (NeAt dataset), we measured slightly higher average Dice coefficients with MU-Net than with STEPS, but the difference was not statistically significant. Therefore, it appears that for this case the main advantage of MU-Net over STEPS would be in terms of segmentation time. The performance of MU-Net on the NeAt dataset was likely hampered by the small number of training images available (8 images for training in each fold). This also provides an explanation for the higher standard deviation for HD95 distances for MU-Net compared to STEPS. Interestingly, MU-Net achieved Dice coefficients similar to STEPS with a larger average precision but a lower average recall. This would indicate that STEPS prediction contained more false positives, labeling background voxels as belonging to ROIs, and conversely MU-Net’s prediction favored false negatives. For sparsely segmented images, typical in drug development, where only specific structures are of interest, STEPS appears to be markedly less effective than MU-Net, and the time required for manual annotation is notably decreased. This also means that it might be feasible to annotate a small number of volumes as required by the specific study, and then use MU-Net to automate the segmentation of the remaining data.

Interestingly, MU-Nets trained on automatic STEPS multi-atlas segmentations achieved higher Dice score with the ground truth than STEPS, highlighting the generalization ability of MU-Net. This supports the use of atlas based segmentation methods to augment MRI segmentation datasets suggested in Roy et al. (2018a), leveraging unlabeled data. The results obtained by training on STEPS segmentations alone remain, however, of insufficient quality to eliminate the need for manual annotations in the training data, as the CNN attempts to replicate any form of systematic error present in the atlas-based labeling procedure.

In literature both 3D and 2D implementations of CNNs are available for different segmentation tasks (Çiçek et al., 2016; Milletari et al., 2016; Roy et al., 2018a), and other architectural variants have been proposed: Roy et al. (2018a) added dense connections (Huang et al., 2017) in the convolution blocks of U-Net while keeping the number of output channels constant; Han and Ye (2018) proposed two variants based on signal processing arguments for the reduction of artifacts in a sparse image reconstruction task. We, however, found that a more complex model did not improve and in fact lowered the accuracy of our results, perhaps given the simplicity of the task. Thus, in agreement with Isensee et al. (2018), we found that a 2D approach was preferable to a 3D approach in the presence of anisotropic voxels. We also found the Dice loss to be sufficient to effectively train our model without the addition of a cross-entropy loss. As we did not perform any fine tuning of hyper-parameters for any of our models, it is possible that after sufficient fine tuning the performance of one of these alternative approaches might be improved.
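To make the notion of a Dice loss concrete, the sketch below shows a commonly used soft Dice formulation evaluated with NumPy. The exact loss used to train MU-Net is defined in the methods and may differ in detail, so the averaging over classes, the smoothing constant `eps` and the function name `soft_dice_loss` are assumptions made for illustration only.

```python
import numpy as np

def soft_dice_loss(probs, one_hot, eps=1e-6):
    """Soft Dice loss averaged over classes.

    probs:   (n_classes, *volume_shape), predicted class probabilities
    one_hot: (n_classes, *volume_shape), one-hot encoded reference labels
    """
    axes = tuple(range(1, probs.ndim))  # sum over spatial axes, keep the class axis
    intersection = (probs * one_hot).sum(axis=axes)
    denominator = probs.sum(axis=axes) + one_hot.sum(axis=axes)
    dice_per_class = (2 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice_per_class.mean())

# Toy example (hypothetical data): two classes on a 2x2 slice.
labels = np.array([[0, 1], [1, 0]])
one_hot = np.stack([labels == 0, labels == 1]).astype(float)
perfect_loss = soft_dice_loss(one_hot, one_hot)                   # prediction equals the reference
uniform_loss = soft_dice_loss(np.full_like(one_hot, 0.5), one_hot)  # uninformative prediction
```

Because the loss is computed from soft overlaps rather than per-voxel errors, it is directly aligned with the Dice coefficient used for evaluation.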

Much like the human eye, MU-Net was not significantly affected by the presence of the bias field, and did not benefit from N3 bias correction. Correcting for the bias field might still be beneficial as it depends on the specific experimental setup, and thus N3 bias correction might avoid specializing the network to one particular acquisition procedure. For this reason, we release the trained parameters of the model for MU-Net trained on both the non-corrected and the N3-corrected data.

To ensure the network generalizes to a wide age range, our results indicate that the distinctive features present before adulthood need to be adequately represented in the training data. This is evidenced by the degraded performance observed when testing networks trained on 5-week old mice on the volumes acquired from older ones, and vice-versa. As mice are typically weaned at 3–4 weeks and attain sexual maturity at 8–12 weeks (Dutta and Sengupta, 2016), 5-week old mice are not adults. In contrast, training solely on male mice did not significantly influence MU-Net performance on female animals. We studied why the Dice coefficient distributions were bi-modal with the large test set (see Fig. 7). The bi-modal nature of the distributions appears not to be explained by differences between different studies, genders, or genotypes (see supplementary Figs. S4 and S5). We cannot offer a definitive explanation for the cause of these bi-modal distributions; however, we speculate that it is a sum of several factors, including intra-rater segmentation variability.

An obvious limitation of our approach is its specialization for the specific MRI contrast the algorithm is trained on. Making MU-Net more robust to marked changes in the image acquisition could be achieved by expanding the training data to be more variable and/or utilizing techniques such as domain adaptation, transfer learning or image translation to minimize the amount of new training data needed for the model to generalize to a new type of MRI acquisition (Armanious et al., 2020; Zhuang et al., 2020). This research line is one of the most important areas for future research in MRI segmentation with deep learning. However, MU-Net successfully generalized to a variety of transgenic mice in an age range wider than that of the training set, thus offering a valuable way to automate segmentation tasks. Another limitation of this study is the number of ROIs, as mouse brain atlases with extremely detailed segmentation featuring over 700 ROIs currently exist (Nie et al., 2019). However, atlases such as that of Nie et al. (2019) are constructed by specialized procedures and do not contain manual segmentations of all images used in the atlas construction. Therefore, these atlases are not directly applicable for training segmentation neural networks.

The employment of CNNs for the segmentation of mouse brain MRI provides a number of benefits for preclinical researchers. Beyond allowing for the employment of large datasets in a time-efficient manner, the ability to generalize and abstract from the training data results in more robust and reproducible predictions. We can thus expect these methods to reduce the confounding effect of intra- and inter-rater variability inherent in manual segmentation procedures while streamlining animal MRI experimental pipelines.

Declarations

Data availability statement

MU-Net code and trained models are freely available at https://github.com/Hierakonpolis/MU-Net. A tutorial of usage of MU-Net is available at https://github.com/Hierakonpolis/NN4Kubiac. The training and validation dataset is property of Charles River Discovery Services, and the test dataset is property of CHDI ‘Cure Huntington’s Disease Initiative’ foundation. The MRM NeAt dataset is freely available at https://github.com/dancebean/mouse-brain-atlas. All the Dice scores between MU-Net and manual segmentations are available as supplementary files to this manuscript.

Ethics statement

All animal experiments were carried out according to the United States National Institute of Health (NIH) guidelines for the care and use of laboratory animals, and approved by the National Animal Experiment Board.

CRediT authorship contribution statement

Riccardo De Feo: Methodology, Software, Formal analysis, Writing - original draft. Artem Shatillo: Data curation. Alejandra Sierra: Methodology, Formal analysis. Juan Miguel Valverde: Methodology. Olli Gröhn: Conceptualization. Federico Giove: Conceptualization. Jussi Tohka: Conceptualization, Software, Writing - original draft.

Acknowledgments

R.D.F.’s work has received funding from the European Union’s Horizon 2020 Framework Programme under the Marie Skłodowska Curie grant agreement No #691110 (MICROBRADAM) and J.M.V.’s work was funded from Marie Skłodowska Curie grant agreement No #740264 (GENOMMED). The content is solely the responsibility of the
