MRI white matter lesion segmentation using an ensemble of ...

HAL Id: hal-01808412
https://hal.archives-ouvertes.fr/hal-01808412

Submitted on 5 Jun 2018

MRI white matter lesion segmentation using an ensemble of neural networks and overcomplete patch-based voting

José Manjón, Pierrick Coupé, Parnesh Raniga, Ying Xia, Patricia Desmond, Jurgen Fripp, Olivier Salvado

To cite this version: José Manjón, Pierrick Coupé, Parnesh Raniga, Ying Xia, Patricia Desmond, et al. MRI white matter lesion segmentation using an ensemble of neural networks and overcomplete patch-based voting. Computerized Medical Imaging and Graphics, Elsevier, In press, ⟨10.1016/j.compmedimag.2018.05.001⟩. ⟨hal-01808412⟩

MRI white matter lesion segmentation using an ensemble of neural networks and overcomplete patch-based voting

Jose V. Manjón¹, Pierrick Coupé²,³, Parnesh Raniga⁴, Ying Xia⁴, Patricia Desmond⁵,⁶, Jurgen Fripp⁴, Olivier Salvado⁴

¹ Instituto de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, España

² Univ. Bordeaux, LaBRI, UMR 5800, PICTURA, F-33400 Talence, France.

³ CNRS, LaBRI, UMR 5800, PICTURA, F-33400 Talence, France.

⁴ Australian e-Health Research Centre, CSIRO, Brisbane QLD 4029, Australia.

⁵ Department of Radiology, University of Melbourne, Parkville VIC 3010, Australia

⁶ Department of Radiology, The Royal Melbourne Hospital, Parkville VIC 3050, Australia

Abstract.

Accurate quantification of white matter hyperintensities (WMH) from Magnetic Resonance Imaging (MRI) is a valuable tool for the analysis of normal brain ageing or neurodegeneration. Reliable automatic extraction of WMH lesions is challenging due to their heterogeneous spatial occurrence, their small size and their diffuse nature. In this paper, we present an automatic method to segment these lesions based on an ensemble of overcomplete patch-based neural networks. The proposed method provides accurate and regular segmentations thanks to its overcomplete nature, while minimizing the segmentation error by using a boosted ensemble of neural networks. The proposed method compared favourably to state-of-the-art techniques on two different neurodegenerative datasets.

Keywords: lesion segmentation, MRI, brain, patch-based, neural network, ensemble

1. Introduction

White matter hyperintensities (WMH) are regions of increased MR signal in T2-Weighted (T2W) and FLuid Attenuated Inversion Recovery (FLAIR) images that are distinct from cavitations (Wardlaw, 2012). The number, size and location of WMH can provide important information on the aetiology and progression of various diseases. This has been extensively studied in normal ageing, cerebrovascular disease and dementia (Kuo and Lipsitz, 2004; Debette and Markus, 2010), as has its influence on co-morbidities (Lee et al., 2015). The presence, topography and volume of WMH are used as biomarkers for stroke (Kuller et al., 2004; Wong et al., 2002), small vessel cerebrovascular disease (CVD) (Schmidt et al., 2004), dementia (Debette and Markus, 2010) and multiple sclerosis (MS) (Filippi and Rocca, 2011).

In clinical practice, qualitative visual rating scales have been frequently used (Scheltens et al., 2009). However, in order to use WMH volume and spatial location as a biomarker, lesions need to be accurately and precisely segmented. Some promising early automated methods have been used in longitudinal clinical studies (Mäntylä et al., 1997), with later studies focused on improving the sensitivity, specificity and robustness of automated WMH segmentation. Manual and semi-automated segmentation of WMH is a tedious process that requires trained observers and several hours per image for manual delineation by an expert, making it unsuitable for routine clinical and research usage (Udupa et al., 1997). Moreover, manual segmentation is prone to inter- and intra-rater variability.

With many large clinical studies investigating ageing, cerebrovascular disease, and dementia, there is a need for robust, repeatable, accurate, and automated techniques for the segmentation of WMH. In recent years, several methods have been proposed to automatically segment WMH in CVD and in MS. While the underlying pathology is different, the radiological signatures of MS and CVD are sufficiently similar that methods developed for one perform well on the other (Caligiuri et al., 2015). Demyelinating lesions of MS and cerebrovascular disease appear as hyperintense regions on T2W and FLAIR images. Initial approaches to segment WMH relied on the higher intensity of lesions compared to surrounding tissue to threshold the image after correction for inhomogeneities (Jack et al., 2001; Souplet et al., 2008). The hyperintensity assumption is challenged by the natural variation in intensity found in normal tissues across the brain, such as the septum pellucidum, and by CSF flow artefacts around the ventricles (Neema et al., 2010). Another problem is residual intensity inhomogeneity, even after correction.

To address these issues, more complex methods have been proposed. These methods can be classified into unsupervised and supervised. Unsupervised methods rely on the natural separation of image features using clustering-type approaches. For example, the lesion growth algorithm (LGA), publicly available as part of the lesion segmentation toolbox (LST), has been widely used (Schmidt et al., 2012). In this method, both T1W and FLAIR images are required to first compute a map of possible candidate lesions, whose centres are then used as seeds to segment the entire lesions using region growing. Also included in the LST toolbox, a more recent method, the lesion prediction algorithm (LPA), only requires FLAIR images as input. Within the same category, Weiss et al. (2013) proposed a dictionary learning-based approach that segments lesions as outliers from a projection of the dataset onto a normative dictionary. Similarly, Raniga et al. (2011) used a generative model to segment lesions by detecting outlier tissue. More classical unsupervised approaches have also been proposed (Admiraal-Behloul et al., 2005).

Supervised methods require training datasets where WMH lesions are manually annotated by experts. This type of method can work on single-channel (FLAIR or T2W) or multi-channel data (FLAIR or T2W and T1W and PDW). Supervised methods for WMH segmentation typically involve machine learning methods at the voxel level, with pre- and/or post-processing steps to improve the sensitivity and specificity of the results. Such methods have used support vector machines (Lao et al., 2008), k-nearest neighbours (Steenwijk et al., 2013), random forests (Ithapu et al., 2014; Geremia et al., 2010; Jesson and Arbel, 2015), artificial neural networks (Dyrby et al., 2008), deep learning (Brosch et al., 2015; Ghafoorian et al., 2016; Valverde et al., 2017) or multi-atlas patch-based label fusion methods (Guizard et al., 2015). All these methods were trained on either single- or multi-channel voxel intensities jointly with some other context-related features, typically within a standardized anatomical space. Independently of the features used, these methods perform the classification step at the voxel level and do not take into account label spatial correlations, which might affect their performance.

To overcome the lack of local consistency of methods performing voxel-wise classification (i.e. each voxel is labelled independently of its neighbours), we propose an automatic pipeline for hyperintense lesion segmentation based on a patch-wise neural network classifier. The classifier segments the lesions by taking into consideration the local context of patch labels in an overcomplete manner, which further reduces classification errors. This pipeline benefits from some pre-processing steps aimed at improving the image quality and placing the image in a standardized geometrical and intensity space. The proposed method, which extends a recently published method (Manjón et al., 2015), uses a boosting-based ensemble learning strategy to minimize the classification error. In the following sections, the proposed method is described and compared to manual assessment and two state-of-the-art methods. This comparison is performed on data from two datasets.

2. Material and Methods

2.1. Data description

AIBL dataset

In this work, we used a set of 128 subjects (including a wide range of white matter lesion severity, aged 38.6-92.1 years, male/female: 60/68) from the Australian Imaging Biomarkers and Lifestyle (AIBL) study (www.aibl.csiro.au) (Ellis et al., 2009). FLAIR scans were acquired for all the subjects on a 3T Siemens Magnetom TrioTim scanner using the following parameters: TR/TE: 6000/421 ms, flip angle: 120°, TI: 2100 ms, slice thickness: 0.90 mm, image matrix: 256×240, in-plane spacing: 0.98 mm. The ground truth for training and evaluating the proposed method was generated by manual delineation of the hyperintense lesions from all the FLAIR images by Dr. Parnesh Raniga using MRIcro. Lesion boundaries were delineated on axial slices after bias correction and anisotropic diffusion smoothing, and lesion volumes were filled in. Slices were segmented from inferior to superior, with neighbouring slices examined to confirm contiguous lesions. Care was taken to avoid segmenting normally hyperintense regions such as the septum pellucidum as lesions. One- to two-voxel boundaries around the ventricles and large penetrating areas were excluded if they appeared hyperintense, as these normally correspond to CSF flow artefacts.

MICCAI 2008 dataset

We also used a publicly available clinical dataset provided by the MS lesion segmentation challenge at MICCAI 2008 (Styner et al., 2008). As done by Weiss et al. (2013), we used the 20 available labelled training cases as well as the test dataset (results on the test dataset were submitted to the online web service for evaluation). The data come originally from the Children's Hospital Boston (CHB) and the University of North Carolina (UNC). Although T1W, T2W and FLAIR images are available in this dataset, our method only required the FLAIR images.

2.2. Preprocessing

Several pre-processing steps are applied to project the images into a standardized geometrical and intensity space:

1. Noise reduction: The Spatially Adaptive 3D Non-local Means Filter was applied to reduce the noise in the images. This filter was chosen because it automatically adapts to both stationary and spatially varying noise levels (Manjón et al., 2010).

2. Registration to MNI space: All the images were aligned into a common coordinate space, enabling the use of location as a feature to capture intensity variation across brain anatomy. To do this, the images were linearly registered (affine transform) to the Montreal Neurological Institute (MNI) space using the MNI152 template. This was performed using the Advanced Normalization Tools (ANTs) (Avants and Tustison, 2009).

3. Inhomogeneity correction and brain extraction: The SPM12 segmentation module was used to perform the inhomogeneity correction of the images and to provide an initial segmentation of the brain tissues: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) (Ashburner and Friston, 2005). A brain mask was created by thresholding the (GM+WM) probability maps. This binary mask was further refined by applying a morphological opening (using a 5x5x5 voxel kernel) to remove small external non-brain areas. The fact that SPM12 uses several Gaussian distributions to model each tissue type helped to perform the inhomogeneity correction robustly.

4. Intensity normalization: The estimated brain mask was used to select only brain voxels. The resulting volume was intensity standardized by dividing all brain voxels by the median intensity within the brain region. Finally, the resulting intensities were squared to enhance image contrast.
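
The intensity normalization of step 4 can be illustrated with a minimal sketch. It assumes the volume has been flattened into a list of intensities with a matching boolean brain mask; the function name and the choice to set non-brain voxels to zero are illustrative assumptions, not details given by the authors.

```python
from statistics import median


def normalize_intensities(voxels, brain_mask):
    """Divide brain voxels by the median brain intensity, then square them."""
    brain_values = [v for v, inside in zip(voxels, brain_mask) if inside]
    med = median(brain_values)
    if med == 0:
        raise ValueError("median brain intensity is zero")
    # Assumption of this sketch: voxels outside the mask are set to 0.
    return [(v / med) ** 2 if inside else 0.0
            for v, inside in zip(voxels, brain_mask)]


# Toy example: three brain voxels and one background voxel.
flair = [100.0, 200.0, 300.0, 50.0]
mask = [True, True, True, False]
normalized = normalize_intensities(flair, mask)
```

After this step a voxel at the median brain intensity has value 1, so thresholds such as those explored in Section 3.1 can be read relative to typical brain tissue.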

2.3. Proposed method

Lesion segmentation was performed in three steps:

1- Lesion candidate ROI selection

Within the brain mask, a region of interest (ROI) is created by using a conservative threshold, aiming to include all the lesions and some normal tissue (see the ROI selection section below). The goal of this step was to reduce the number of voxels to be classified by discarding true negatives, since lesions in FLAIR images typically show higher intensities than normal white matter.

2- Neural Network classifier

The ROI contains a mixture of normal tissue and lesion voxels. A neural network was trained to classify voxels into these two classes. We used neural networks instead of other powerful classifiers such as random forests or support vector machines because they make it possible to perform structured prediction (whole-patch classification), as described later. Several features were extracted for every voxel within the selected ROI, and the neural network was used to map these features to the corresponding class (lesion/non-lesion).

• Features: The features used to train the network were a 3D patch P1 around the voxel(s) to be classified; a second, larger 3D patch P2, used to model the spatial context at a larger scale; the x, y and z coordinates in MNI space of the centre voxel of patch P1; and a value representing the a priori lesion probability (also at the centre voxel of patch P1). This a priori lesion probability map (Figure 1) was obtained by averaging all training lesion maps in MNI space (convolved with a 5 mm³ Gaussian kernel). In our experiments, we used a P1 of size 3x3x3 voxels and a P2 of 5x5x5 voxels. However, since 3x3x3 of the 5x5x5 voxels of P2 are already included in patch P1, we subsampled P2 by taking only the odd voxels (1, 3, 5) in all three dimensions, which resulted in a total of 27 voxels. Thus, the feature vector had 58 elements: 27 P1 + 27 P2 voxel intensities, 3 spatial coordinates and 1 a priori lesion probability.

• Network topology: A feedforward multilayer perceptron with one hidden layer was implemented. Two different output layer settings were tested, voxel-wise and patch-wise. In the first case, the network had 58xNx1 neurons (N being the number of neurons in the hidden layer), so only the centre voxel of patch P1 was classified. In the second case, we used a 58xNx27 network (labelling the whole patch P1 rather than just the central voxel). In this second case, an overcomplete approach was used so that each labelled voxel received contributions from several adjacent patches, as done in denoising (Manjón et al., 2010). This improved segmentation accuracy (more votes per voxel) and enforced regularity in the final labelling. A sigmoid activation function was used in the hidden layer, while a linear function was used for the output layer.
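
The overcomplete patch-wise voting described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes the network outputs are stored in a dictionary keyed by the patch centre, and that overlapping contributions are averaged uniformly. The names `OFFSETS` and `aggregate_patch_votes` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Offsets of the 3x3x3 patch P1 relative to its centre voxel (27 in total).
OFFSETS = list(product((-1, 0, 1), repeat=3))


def aggregate_patch_votes(patch_outputs):
    """Average overlapping patch-wise outputs into one score per voxel.

    patch_outputs maps a centre voxel (x, y, z) to the 27 network outputs
    for its 3x3x3 patch, ordered like OFFSETS.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (x, y, z), outputs in patch_outputs.items():
        if len(outputs) != len(OFFSETS):
            raise ValueError("expected 27 outputs per patch")
        for (dx, dy, dz), score in zip(OFFSETS, outputs):
            voxel = (x + dx, y + dy, z + dz)
            sums[voxel] += score
            counts[voxel] += 1
    return {voxel: sums[voxel] / counts[voxel] for voxel in sums}


# Two adjacent patches: one votes "lesion" everywhere, the other "non-lesion".
votes = aggregate_patch_votes({
    (1, 1, 1): [1.0] * 27,
    (2, 1, 1): [0.0] * 27,
})
```

Voxels covered by both patches end up with an intermediate score, which is the regularizing effect attributed to the overcomplete scheme: a single patch cannot flip a voxel label on its own when its neighbours disagree.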

Figure 1. Example FLAIR image overlaid with the a priori lesion probability map. As can be noticed, the periventricular area shows a high lesion probability.

3- Ensemble-based classification

Neural networks are very powerful classifiers but, since their outputs depend on the random initialization of their weights or on sample ordering, their accuracy varies across training sessions. Traditionally, several training sessions are performed and the best one is chosen as the final classifier. However, this approach is not necessarily the best option, as it can lead to overfitting. To minimize this problem, one common solution has been the use of ensembles of classifiers (Opitz and Maclin, 1999), which ideally help to minimize the variance and bias of the classification error by combining the outputs of several classifiers. In this paper, we have explored two popular ensemble variants: bagging (Breiman, 1994) and boosting (Schapire, 1990).

Bagging (bootstrap aggregating) is a machine learning ensemble method designed to improve stability and accuracy by averaging the outputs of several classifiers trained on different randomly selected datasets. This approach reduces the variance of the classification error and helps to minimize overfitting. Boosting, on the other hand, is an ensemble-based algorithm that combines the outputs of several classifiers to minimize not only the variance of the classification error but also its bias. In boosting, the classifiers are not trained independently as in bagging; instead, the output of one classifier is used to improve the next one. This is done iteratively, either by giving more weight in the next classifier to the samples that were wrongly classified, or by performing a non-random selection on the training dataset that selects previously misclassified samples with higher probability. Finally, the different classifier outputs are combined according to their accuracy.
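
The non-random selection used in boosting can be sketched with weighted sampling. The text does not state how much more likely a misclassified sample is to be drawn, so the `boost` factor below is a purely hypothetical parameter, as are the function names.

```python
import random


def resampling_weights(was_misclassified, boost=2.0):
    """Give misclassified samples `boost` times the selection weight.

    The value of `boost` is an assumption of this sketch, not a figure
    reported in the paper.
    """
    return [boost if wrong else 1.0 for wrong in was_misclassified]


def draw_training_subset(n_samples, weights, size, seed=0):
    """Draw sample indices with replacement, proportionally to weights."""
    rng = random.Random(seed)
    return rng.choices(range(n_samples), weights=weights, k=size)


# Samples 1 and 3 were misclassified by the previous network.
weights = resampling_weights([False, True, False, True])
subset = draw_training_subset(4, weights, size=10)
```

For bagging, the same `draw_training_subset` call with uniform weights gives the independent random subsets on which each network is trained.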

In summary, after preprocessing, we apply the ensemble of trained neural networks within the selected ROI to create a lesion probability map. The obtained lesion probability map is then resampled into the native image space and thresholded to produce a binary lesion mask. The total processing time of the full pipeline is around 3 minutes. We called the proposed method HIST (for HyperIntense Segmentation Tool).

3. Experiments and results

All experiments were performed using MATLAB 2015a and its neural network toolbox on a standard PC (Intel i7-6700 and 16 GB RAM) running Windows 10.

3.1. Parameter setting

To evaluate our proposed method and to estimate all the parameter settings, we ran some experiments on the AIBL dataset (Ellis et al., 2009). Specifically, the AIBL dataset (N=128) was split into two sets, one for training/validation (N=68) and one for testing (N=60). Neural network parameter settings were tuned using the validation set and later applied to the test set. To measure the quality of the proposed segmentation method, we used the Dice coefficient. The training/validation dataset was augmented by including a transformed copy of each case (left-right flipped along the axial plane), which resulted in a total of 136 images (36 of these images were used for validation and the rest for training).
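
For reference, the Dice coefficient between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). A minimal sketch on flattened binary masks follows; returning 1.0 when both masks are empty is a convention chosen for this sketch, not something stated in the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One voxel in common, two predicted voxels, one reference voxel.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```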

Network topology

The neural network topology allows an optimal mapping to be found between the input features describing the data and the desired output. In this study, we used a multilayer perceptron with one hidden layer. As input we used the 58 previously described features and as output the 27 labels of the corresponding 3x3x3 patch P1. An experiment (using 10000 randomly selected training samples within the selected ROI) was performed to measure the Dice coefficient as a function of the number of neurons in the hidden layer. We found experimentally that 63 neurons in the hidden layer was the optimal value, balancing network simplicity (thus minimizing overfitting) and accuracy (in terms of Dice coefficient). We also tested the addition of a second hidden layer and the use of a bigger context patch P2, but the results were not significantly better. The final setup in all our experiments consisted of a network topology comprising 58x63x27 neurons (i.e. 5445 trainable weights). A scaled conjugate gradient backpropagation method was used to train the network (with its default parameters) as implemented in the MATLAB 2015a neural network toolbox.
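
The quoted sizes can be checked with simple arithmetic. The sketch below reproduces the 58-element feature vector and shows that the figure of 5445 is obtained when bias terms are counted together with the connection weights; that the authors counted biases this way is an assumption of this sketch.

```python
def feature_vector_length(p1_voxels=27, p2_voxels=27, coordinates=3, priors=1):
    """Number of inputs: P1 and P2 intensities, MNI coordinates, lesion prior."""
    return p1_voxels + p2_voxels + coordinates + priors


def count_mlp_parameters(layer_sizes):
    """Return (connection weights, biases) of a fully connected network."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases


n_inputs = feature_vector_length()
n_weights, n_biases = count_mlp_parameters([n_inputs, 63, 27])
n_trainable = n_weights + n_biases  # 58*63 + 63*27 weights, plus 63 + 27 biases
```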

ROI selection

To segment hyperintense lesions in the brain, we benefit from the fact that they generally have a high intensity value on FLAIR MRI, and thus a simple threshold can be used to define a sensitive ROI. This threshold was selected to be low enough not to miss any true lesion but high enough to minimize the number of non-lesion voxels. To estimate this threshold, the neural network described above was used with different thresholds (from 1.1 to 1.8 in steps of 0.1) while measuring the mean Dice on the validation set. We also measured the Dice of the candidate region itself to investigate how much the network improved on the initial result. As shown in Figure 2 (left), a simple global thresholding at t=1.6 provided a mean Dice of 0.59±0.16. Using a lower threshold produced a low Dice due to the high number of false positives, while a higher threshold reduced the Dice due to the increase in false negatives. On the other hand, applying the proposed network within the corresponding candidate ROI showed a very significant improvement in Dice for all thresholds used (Figure 2, right). In this case, we obtained the optimal Dice of 0.78±0.10 for t=1.5. This improvement was only due to the exclusion of false positives, since the network did not evaluate voxels not included in the candidate mask.
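
The candidate ROI selection amounts to a global threshold inside the brain mask. A minimal sketch is given below; it assumes the threshold is applied to the normalized intensities produced by the preprocessing, and that the comparison is strict, neither of which is stated explicitly in the text. The function name is hypothetical.

```python
def candidate_roi(intensities, brain_mask, threshold):
    """Flag brain voxels whose normalized intensity exceeds the threshold."""
    return [bool(inside and value > threshold)
            for value, inside in zip(intensities, brain_mask)]


# Toy example with the single-network optimum reported above (t = 1.5).
intensities = [0.9, 1.6, 2.0, 1.8]
brain_mask = [True, True, True, False]
roi = candidate_roi(intensities, brain_mask, threshold=1.5)
kept_fraction = sum(roi) / sum(brain_mask)
```

Only the flagged voxels are passed to the classifier, which is why a lower threshold trades a larger classification workload for fewer missed lesions.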

Figure 2. Left: Mean Dice using the candidate ROI obtained with different thresholds as the segmentation. Right: Mean Dice after applying the proposed neural network to the corresponding candidate ROIs obtained with different thresholds (validation data results).

Network output aggregation: Voxel-wise vs overcomplete patch-wise

In our proposed method, we classify patches instead of independent voxels, aiming to improve accuracy by regularizing the segmentation results. To test this hypothesis, we compared the Dice score of the two scenarios described in the methods section (voxel-wise and patch-wise). We trained a voxel-wise (58x63x1) neural network where only the central voxel of patch P1 was labelled and compared its results with the described patch-wise version. The average validation Dice coefficient of the voxel-wise version was 0.73±0.12, which was notably lower than that of the corresponding patch-wise version (0.78±0.10), demonstrating the effectiveness of our patch-wise classification strategy.

Ensemble of neural networks

To further improve the classification results of our proposed method, we explored two variants of ensemble methods: bagging and boosting.

For the bagging experiments, we trained 10 networks using 20000 samples randomly selected from the candidate regions of the training dataset. All resulting network outputs were uniformly averaged to produce the final output. For the boosting experiments, we also trained 10 networks using 20000 samples randomly selected from the training dataset. However, in this case, after the first network, wrongly classified samples were selected with higher probability than correctly classified samples. The outputs of all 10 resulting networks were averaged, weighted by the Dice coefficient of each individual network, to produce the final output.
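
The two ways of combining network outputs can be sketched with one function: uniform averaging for bagging and Dice-weighted averaging for boosting. Normalizing the weights by their sum is an assumption of this sketch, and `combine_outputs` is a hypothetical name.

```python
def combine_outputs(network_outputs, network_dice=None):
    """Weighted average of the per-voxel outputs of several networks.

    network_outputs: one list of per-voxel scores per network.
    network_dice: optional per-network Dice used as weights (boosting);
    if omitted, all networks are weighted equally (bagging).
    """
    n_networks = len(network_outputs)
    weights = [1.0] * n_networks if network_dice is None else list(network_dice)
    if len(weights) != n_networks:
        raise ValueError("one weight per network is required")
    total = sum(weights)
    n_voxels = len(network_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, network_outputs)) / total
        for i in range(n_voxels)
    ]


outputs = [[0.0, 1.0], [1.0, 1.0]]
bagged = combine_outputs(outputs)
boosted = combine_outputs(outputs, network_dice=[0.25, 0.75])
```

In the weighted case, the second network dominates the first voxel because its weight is three times larger, whereas the uniform average treats both networks alike.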

We evaluated the impact of the bagging and boosting approaches (specifically, the optimal number of neural networks combined). Figure 3 (left) shows the evolution of the Dice coefficient (during training) as a function of the number of averaged trained networks. Our experiments showed that both bagging and boosting improved the classification results but reached a plateau when 10 networks were used. However, boosting produced a more pronounced improvement than bagging thanks to its systematic error reduction capabilities (the first network had a training Dice of 0.917, while with 10 networks we reached 0.922).

Given the enhanced accuracy of the proposed method (thanks to its ability to reduce false positives), we re-evaluated the optimal ROI threshold, this time using a boosted ensemble of networks. Figure 3 (right) presents the mean Dice on the validation set for different thresholds. As can be noticed, the enhanced performance of the network ensemble allowed the use of a lower threshold, reducing the number of false negatives (and increasing true positives) and therefore improving the overall performance of the method. Thus, the final threshold of the method was set to t=1.2.

With these settings, we trained the final network ensemble (M=10) using randomly selected sets of 1000000 samples from the total population of around 4600000 sample patches (including all training and validation cases). Every network took approximately 5 hours to train, so the training time of the 10 networks on a single computer was around 2 days. The final mean Dice on the test set using the final ensemble was 0.802±0.103.

Figure 3. Left: Dice coefficient as a function of the number of networks used in the ensemble for bagging (blue) and boosting (red) (training data results). Right: Dice as a function of the ROI selection threshold with boosting (validation data results).

3.2. Comparison with other methods

We compared the performance of HIST with related publicly available methods included in the LST toolbox (http://www.applied-statistics.de/lst.html). The first was the LGA method, which uses both T1W and FLAIR images (Schmidt et al., 2012) and takes around 10 minutes to segment a new case. The second was the LPA, which only requires a FLAIR image to perform the lesion segmentation; it is faster than LGA and takes only 3.5 minutes to segment a new case. We measured the results in native space so that all compared methods share the same data conditions. To do so, we applied an inverse affine transform to map the resulting lesion probability map to native space. As a final step, the map was thresholded in native space to create a binary lesion mask. We used a binarization threshold of 0.45 to compensate for the interpolation blurring introduced by the inverse transformation used to map the results from MNI to native space. To measure the quality of the proposed method, we used the Dice coefficient, sensitivity, specificity, the normalized volume difference (absolute difference between the reference and estimated volumes divided by the reference volume) and the volume correlation coefficient relating automatically estimated and manually segmented lesion volumes in the dataset.
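
The voxel-based metrics listed above can be sketched from the confusion counts of two binary masks. Sensitivity and specificity follow their standard definitions, and the normalized volume difference follows the definition given in the text; the function names are illustrative.

```python
def confusion_counts(predicted, reference):
    """Return (TP, FP, TN, FN) for two binary masks of equal length."""
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of reference lesion voxels that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-lesion voxels correctly left unlabelled."""
    return tn / (tn + fp)


def normalized_volume_difference(estimated_volume, reference_volume):
    """Absolute volume difference divided by the reference volume."""
    return abs(reference_volume - estimated_volume) / reference_volume


tp, fp, tn, fn = confusion_counts([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

Because non-lesion voxels vastly outnumber lesion voxels in a brain volume, specificity is typically close to 1 for any reasonable method, which is one reason Dice and volume-based measures are reported alongside it.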

Tables 1 and 2 present the Dice coefficient and mean volume difference for these methods and for different lesion sizes. The proposed method significantly outperformed the compared methods for all lesion sizes. In Table 3, the volume correlation shows that the HIST method had the strongest overall volume correlation (0.9938). Figure 4 shows the boxplots of Dice, sensitivity, specificity and the dataset volume correlation, and Figure 5 shows a visual example of the segmentation results for one test case.

Table 1. Mean Dice coefficient. Best results in bold. HIST results were significantly better than the compared methods for all lesion sizes and overall (p<0.05).

Method     Lesion size*
           Small (N=19)     Medium (N=25)    Big (N=16)       All (N=60)
LST-LGA    0.4518±0.1531    0.6700±0.0694    0.7668±0.0406    0.6267±0.1597
LST-LPA    0.4973±0.1688    0.7101±0.0983    0.7886±0.0679    0.6636±0.1669
HIST       0.6945±0.1340    0.8141±0.0507    0.8743±0.0377    0.7923±0.1095

*Small (<4 ml), medium (4 ml to 18 ml), big (>18 ml)

Table 2. Mean volume difference. Best results in bold. HIST results were significantly better than the compared methods for all lesion sizes and overall (p<0.05).

Method     Lesion size*
           Small            Medium           Big              All
LST-LGA    0.4044±0.2249    0.2383±0.2131    0.2437±0.1130    0.2923±0.2076
LST-LPA    0.3878±0.2583    0.1817±0.1156    0.1304±0.0650    0.2333±0.1963
HIST       0.2776±0.1810    0.1289±0.1221    0.0634±0.0750    0.1585±0.1577


Table 3. Pearson correlation for the total WMH volume. Best results in bold.

Method     Lesion size*
           Small     Medium    Big       All
LST-LGA    0.7712    0.8859    0.9732    0.9835
LST-LPA    0.8178    0.7649    0.9649    0.9730
HIST       0.7875    0.9067    0.9912    0.9938


Figure 4. Evaluation results of WMH segmentation in the AIBL dataset. Dice, sensitivity, specificity and volume correlation results.

Figure 5. AIBL dataset visual example results. Note that the HIST method successfully segmented hyperintense lesions without including non-pathological mid-sagittal plane hyperintensities.

Segmentation performance on periventricular and deep WMH

In order to further investigate the segmentation performance of HIST with respect to the varying location and size of WMH, each individual lesion in the WMH segmentations was labelled as one of two types, periventricular or deep WMH, based on its distance to the lateral ventricles. An example case with substantial periventricular and deep WMH volumes is illustrated in Figure 6, where several small deep WMH were missed in the LPA segmentation results. In contrast, the HIST method delivered very robust lesion segmentation, particularly for small deep WMH.

Figure 7 summarizes the Dice coefficients achieved by LGA, LPA and HIST for segmentation of periventricular and deep WMH. For both periventricular and deep WMH, the HIST method demonstrated significantly higher performance (p<0.001) than the state-of-the-art methods LGA and LPA. Furthermore, this advantage of the HIST method is more pronounced for segmentation of deep WMH, with an average Dice coefficient of 0.6636 (±0.1594), which is much higher than the corresponding average Dice coefficients (<0.5) for the LGA and LPA methods.

Figure 6. Examples of periventricular (green) and deep (red) WMH segmentations using manual, LGA, LPA and HIST methods (yellow arrows indicate under-segmentation of deep WMH).

Figure 7. Boxplots of Dice coefficients for segmentation of (a) periventricular and (b) deep WMH using LGA, LPA and HIST methods.

MICCAI 2008 dataset results

To test our proposed method on an independent dataset, we used the MICCAI 2008 challenge dataset. This allowed us to compare the results of HIST with recent methods applied to the training and test datasets (Styner et al., 2008). For the training data (N=20), we used the MICCAI 2008 challenge metrics (i.e. True Positive Rate (TPR) and Positive Predictive Value (PPV)) and the Dice coefficient to be able to compare with related methods applied to this dataset. To further improve the accuracy of the method, we retrained the 10 neural networks using all available data (i.e. the full AIBL dataset (N=128)). We did not use the MICCAI training data because we observed that some manually labelled cases contained segmentation errors and because we wanted to find out whether results obtained using the AIBL dataset can be extrapolated to other datasets. We compared our results with the published results of some other methods applied to the same training dataset (Weiss et al., 2013; Souplet et al., 2008; Geremia et al., 2010; Brosch et al., 2015). Table 4 summarizes the results of this comparison. The HIST method obtained the best results for the 3 metrics (mean value over the 20 cases for each metric), showing that the features learned on the AIBL dataset were useful for segmenting lesions in other datasets.

Finally, the proposed method was also applied to the test dataset (N=23) and the results were submitted through the challenge website (http://www.ia.unc.edu/MSseg) for evaluation (note that the evaluation was performed by the challenge organizers, as we do not have access to the test dataset). The HIST method was ranked 9th out of a total of 62 submissions (6th if multiple submissions from the same author are discounted). In Table 5, the results for the different metrics are compared to those of the two top-performing methods (based on deep learning) (Jesson and Arbel, 2015; Valverde et al., 2017). Although the proposed approach was not the overall best performing method, it showed a low VD for both datasets and was the most stable one, with similar metrics across datasets (note, for example, how VD is quite different in Jesson's method across the two datasets, while our metrics are more similar). Very importantly, HIST was the only method using only FLAIR images for the segmentation (the other compared methods used both T1W and FLAIR images).

Table 4. Methods comparison on MICCAI training data. Best results in bold.

Method          TPR     PPV     DICE
Souplet 2008    0.21    0.30    --
Geremia 2010    0.40    0.40    --
Weiss 2013      0.33    0.37    0.29
Brosch 2015     0.40    0.41    0.36
HIST            0.45    0.47    0.43

Table 5. Methods comparison on MICCAI test data. AD is the average Hausdorff distance and VD stands for the percent volume difference. Best results in bold.

                     UNC                          CHB
Method               VD      AD     TPR    FPR    VD      AD     TPR    FPR
(1) Valverde 2017    62.5    5.8    55.5   46.8   40.8    5.2    68.7   46.0
(2) Jesson 2015      46.9    5.1    43.9   32.3   113.4   6.1    53.5   24.2
(9) HIST             33.1    5.7    63.8   69.7   59.3    6.4    68.0   68.6

4. Discussion

In this paper, we have presented a new method to segment hyperintense lesions on FLAIR images based on an ensemble of overcomplete patch-wise neural network classifiers. We have shown that the proposed overcomplete patch-wise approach significantly improved on the voxel-wise network by enforcing the regularity of the segmentations and by minimizing the variance of the classification error through the aggregation of many patch contributions. We used a boosting strategy to combine an ensemble of neural networks, improving the classification results by minimizing classification bias.

Each step of our approach seeks to improve the results by increasing specificity while keeping the sensitivity stable. Therefore, we started with a simple threshold procedure that is sensitive but not specific. The ensemble of patch-based neural networks was then able to remove false positives while keeping true positives. The initial ROI selection reduced the size and diversity of the data to be classified and thereby reduced some of the problem complexity. While it may be possible to train on all input data, we found this simple approach very effective.

By taking an overcomplete approach and averaging the results of all the patches that a voxel belongs to, we increase the local neighbourhood that is taken into account when making the decision, without the drastic increase in computation time and memory that would be required to train a network with more neurons to accommodate larger input and output patches.

The proposed method not only achieved the best classification results on the AIBL dataset but also provided the highest volume correlation (0.994) with manual labelling, an important result for using HIST in clinical studies.

In addition, the HIST method was applied to an independent MS dataset, giving very competitive results and demonstrating the generality of the proposed approach. It is interesting to note that the proposed method performed better than some state-of-the-art deep learning approaches that utilize multiple MR contrasts (Brosch et al., 2015), while our method only used FLAIR data. Although including T1 data could potentially improve the results, we decided not to include these data in order to keep the method as simple as possible and to show the strength of the proposed method on monomodal data.

The competitive results we have obtained can be explained mainly by the use of carefully selected features, such as the a priori probability map, and the use of a simple yet effective way to classify them (i.e. a patch-based boosted ensemble) given the small size of the training data.

One of the limitations of our proposed method is its relatively high FPR (Table 5). This is probably due to the thresholding process, and its effect is especially significant for small and medium-sized lesions (Table 3), which results in a small overestimation of the lesion volume. One possible solution to this problem could be the use of error correction methods (Wang et al., 2011) to correct the segmentations, given the systematic nature of the errors. Another possible solution to minimize the number of false positives could be the use of a cascade approach similar to the one proposed by Valverde et al. (2017). We plan to extend the proposed method in the near future using multimodal data (adding T1 images, for example).

5. Conclusion

We have proposed a simple yet effective method to segment white matter hyperintense lesions on FLAIR images. The proposed method benefited from its overcomplete patch-based nature and boosting approach to provide regular and accurate segmentations. The proposed method compared favourably with many state-of-the-art methods on two different MRI datasets and can be a good choice to perform large-scale brain analysis studies.

Acknowledgements

This research has been done thanks to the Australian distinguished visiting professor grant from the CSIRO (Commonwealth Scientific and Industrial Research Organisation) and the Spanish “Programa de apoyo a la investigación y desarrollo (PAID-00-15)” of the Universidad Politécnica de Valencia. This research was partially supported by the Spanish grant TIN2013-43457-R from the Ministerio de Economía y Competitividad. This study has also been carried out with support from the French State, managed by the French National Research Agency in the frame of the Investments for the future Program IdEx Bordeaux (ANR-10-IDEX-03-02, HL-MRI Project), Cluster of excellence CPU and TRAIL (HR-DTI ANR-10-LABX-57) and the CNRS multidisciplinary project "Défi imag'In". Some of the data used in this work was collected by the AIBL study group. Funding for the AIBL study is provided by the CSIRO Flagship Collaboration Fund and the Science and Industry Endowment Fund (SIEF) in partnership with Edith Cowan University (ECU), Mental Health Research Institute (MHRI), Alzheimer’s Australia (AA), National Ageing Research Institute (NARI), Austin Health, Macquarie University, CogState Ltd, Hollywood Private Hospital, and Sir Charles Gairdner Hospital.

References

Admiraal-Behloul F, van den Heuvel DM, Olofsen H, van Osch MJ, van der Grond J, van Buchem MA, Reiber JH. 2005. Fully automatic segmentation of white matter hyperintensities in MR images of the elderly. NeuroImage, 28, 607–617.

Avants, B., Tustison, N., Song, G. 2009. Advanced Normalization Tools: V1.0.

Ashburner, J., Friston, K.J. 2005. Unified segmentation. Neuroimage, 26, 839–851.

Breiman Leo. 1994. Bagging Predictors. Technical Report 421, Department of Statistics, University of California Berkeley, CA.

Brosch T, Yoo Y, Tang L, Li D, Traboulsee A, and Tam R. 2015. Deep Convolutional Encoder Networks for Multiple Sclerosis Lesion Segmentation. MICCAI 2015, Volume 9351 of the series Lecture Notes in Computer Science, 3-11.

Caligiuri, M.E., Perrotta, P., Augimeri, A., Rocca, F., Quattrone, A., Cherubini, A. 2015. Automatic Detection of White Matter Hyperintensities in Healthy Aging and Pathology Using Magnetic Resonance Imaging: A Review. Neuroinformatics, 13, 261–276.

Debette S. and Markus HS. 2010. The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. British Medical Journal, 341, c3666.

Dyrby, T.B., Rostrup E, Baaré WF, van Straaten EC, Barkhof F, Vrenken H, Ropele S, Schmidt R, Erkinjuntti T, Wahlund LO, Pantoni L, Inzitari D, Paulson OB, Hansen LK, Waldemar G; LADIS study group. 2008. Segmentation of age-related white matter changes in a clinical multicenter study. NeuroImage, 41, 335–345.

Ellis, K.A., Bush AI, Darby D, De Fazio D, Foster J, Hudson P, Lautenschlager NT, Lenzo N, Martins RN, Maruff P, Masters C, Milner A, Pike K, Rowe C, Savage G, Szoeke C, Taddei K, Villemagne V, Woodward M, Ames D; AIBL Research Group. 2009. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int Psychogeriatr 2009, 1–16.

Filippi, M., Rocca, M.A. 2011. MR imaging of multiple sclerosis. Radiology, 259, 659–681.

Ghafoorian M, Mehrtash A, Kapur T, Karssemeijer N et al. 2016. Transfer Learning for Domain Adaptation in MRI: Application in Brain Lesion Segmentation. Medical Physics, 43(12), 6246-6258.

Geremia, E., Menze, B.H., Clatz, O., Konukoglu, E., Criminisi, A., Ayache, N. 2010. Spatial decision forests for MS lesion segmentation in multi-channel MR images. In: Jian, T., Navab, N., Pluim, J., Viergever, M. (eds.) MICCAI 2010, Part I. LNCS, vol. 6362, Springer, Heidelberg, 111–118.

Guizard N, Coupé P, Fonov V, Manjón J.V., Douglas A, Collins D.L. 2015. Rotation-invariant multi-contrast non-local means for MS lesion segmentation. Neuroimage: Clinical, 8, 376-389.

Ithapu, V., Singh, V., Lindner, C., Austin, B.P., Hinrichs, C., Carlsson, C.M., Bendlin, B.B., Johnson, S.C. 2014. Extracting and summarizing white matter hyperintensities using supervised segmentation methods in Alzheimer’s disease risk and aging studies. Hum Brain Mapp. 35, 4219–4235.

Jack, C.R., O’Brien, P.C., Rettman, D.W., Shiung, M.M., Xu, Y., Muthupillai, R., Manduca, A., Avula, R., Erickson, B.J. 2001. FLAIR histogram segmentation for measurement of leukoaraiosis volume. J Magn Reson Imaging, 14, 668–676.

Jesson A and Arbel T. 2015. Hierarchical MRF and Random Forest Segmentation of MS Lesions and Healthy Tissues in Brain MRI. ISBI 2015, Longitudinal MS lesion segmentation challenge.

Kuller Lewis H., Longstreth W.T. Jr, Arnold Alice M., Bernick Charles, Bryan R. Nick, Beauchamp Norman J. Jr. 2004. White Matter Hyperintensity on Cranial Magnetic Resonance Imaging: A Predictor of Stroke. Stroke, 35, 1821-1825.

Kuo Hsu-Ko and Lipsitz Lewis A. 2004. Cerebral White Matter Changes and Geriatric Syndromes: Is There a Link? Journal of Gerontology: Medical Sciences, 59(8), 818-826.

Lao Z, Shen D, Liu D, Jawad AF, Melhem ER, Launer LJ, Bryan RN, Davatzikos C. 2008. Computer-assisted segmentation of white matter lesions in 3D MR images using support vector machine. Acad Radiol. 15(3), 300-13.

Lee JJ, Lee EY, Lee SB, Park JH, Kim TH, Jeong HG, Kim JH, Han JW, Kim KW. 2015. Impact of White Matter Lesions on Depression in the Patients with Alzheimer's Disease. Psychiatry Investig. 12(4), 516-22.

Manjón, J.V., Coupé, P., Martí-Bonmatí, L., Collins, D.L., Robles, M. 2010. Adaptive non-local means denoising of MR images with spatially varying noise levels. J Magn Reson Imaging. 31, 192–203.

Manjón J.V., Coupé P, Raniga P, Xia Y, Fripp J, and Salvado O. 2016. HIST: HyperIntensity Segmentation Tool. Patch-MI 2016: Patch-Based Techniques in Medical Imaging, 92-99.

Mäntylä R, Erkinjuntti T, Salonen O, Aronen HJ, Peltonen T, Pohjasvaara T, Standertskjöld-Nordenstam CG. 1997. Variable agreement between visual rating scales for white matter hyperintensities on MRI. Comparison of 13 rating scales in a poststroke cohort. Stroke, 28(8), 1614-1623.

Neema M, Guss Z.D., Stankiewicz J.M., Arora A, Healy B.C, and Bakshi R. 2010. Normal findings on brain FLAIR MRI scans at 3T. AJNR Am J Neuroradiol, 30(5), 911–916.

Opitz, D.; Maclin, R. 1999. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169–198.

Raniga P, Schmitt P, Bourgeat P, Fripp J, Villemagne V L, Rowe C C, Salvado O. 2011. Local intensity model: An outlier detection framework with applications to white matter hyperintensity segmentation. IEEE International Symposium on Biomedical Imaging: From Nano to Macro (2011).

Schapire R.E. 1990. The strength of weak learnability. Machine Learning, 5(2), 197–227.

Scheltens P., Erkinjuntti T., Leys D, Wahlund L.-O., Inzitari D., del Ser T., Pasquier F., Barkhof F., Mäntylä R., Bowler J., Wallin A., Ghika J., Fazekas F., Pantoni L. 1998. White Matter Changes on CT and MRI: An Overview of Visual Rating Scales. European Neurology, 39, 80–89.

Schmidt, P., Gaser, C., Arsic, M., Buck, D., Förschler, A., Berthele, A., Hoshi, M., Ilg, R., Schmid, V.J., Zimmer, C., Hemmer, B., Mühlau, M. 2012. An automated tool for detection of FLAIR-hyperintense white-matter lesions in Multiple Sclerosis. Neuroimage, 59, 3774–3783.

Souplet, J.C., Lebrun, C., Ayache, N., Malandain, G. 2008. An automatic segmentation of T2-FLAIR multiple sclerosis lesions. MIDAS Journal - MICCAI 2008 Workshop.

Steenwijk M, Pouwels P, Daams M, Dalen JW, Caan M et al. 2013. Accurate white matter lesion segmentation by k nearest neighbor classification with tissue type priors (kNN-TTPs). Neuroimage Clinical 3, 462–469.

Styner, M., Lee, J., Chin, B., Chin, M.S., Commowick, O., Tran, H.-H., Markovic-Plese, S., Jewells, V., Warfield, S. 2008. 3D Segmentation in the Clinic: A Grand Challenge II: MS lesion segmentation. MIDAS journal.

Schmidt R., Scheltens P., Erkinjuntti T., Pantoni L., Markus H.S., Wallin A., Barkhof F., Fazekas F. 2004. White matter lesion progression: A surrogate endpoint for trials in cerebral small-vessel disease. Neurology, 63(1), 139–144.

Udupa JK, Wei L, Samarasekera S, Miki Y, van Buchem MA, Grossman RI. 1997. Multiple sclerosis lesion quantification using fuzzy-connectedness principles. IEEE Trans Med Imaging, 16(5), 598-609.

Valverde S, Cabezas M, Roura E, González-Villà S, Pareto D, Vilanova J.C., Ramió-Torrentà L, Rovira A, Oliver A, Lladó X. 2017. Improving automated multiple sclerosis lesion segmentation with a cascaded 3D convolutional neural network approach. NeuroImage, 155, 159-168.

Wardlaw, J.M., Smith, E.E., Biessels, G.J., Cordonnier, et al. 2013. STandards for ReportIng Vascular changes on nEuroimaging (STRIVE v1): Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration. Lancet Neurol. 12, 822–838.

Wang H, Das SR, Suh JW, Altinay M, Pluta J, Craige C, Avants B, Yushkevich P. 2011. A learning-based wrapper method to correct systematic errors in automatic image segmentation: consistently improved performance in hippocampus, cortex and brain segmentation. NeuroImage 55(3), 968-985.

Weiss, N., Rueckert, D., Rao, A. 2013. Multiple sclerosis lesion segmentation using dictionary learning and sparse coding. Med Image Comput Comput Assist Interv, 16, 735–742.

Wong Tien Yin, Klein Ronald, Sharrett A. Richey, Couper David J., Klein Barbara E.K., Liao Duan-Ping, Hubbard Larry D., Mosley Thomas H. 2002. Cerebral White Matter Lesions, Retinopathy, and Incident Clinical Stroke. JAMA, 288(1), 67-74.
