NeuroImage 225 (2021) 117471
Contents lists available at ScienceDirect
NeuroImage
journal homepage: www.elsevier.com/locate/neuroimage
A contrast-adaptive method for simultaneous whole-brain and
lesion segmentation in multiple sclerosis
Stefano Cerri a,b,∗, Oula Puonti b, Dominik S. Meier c,
Jens Wuerfel c, Mark Mühlau d, Hartwig R. Siebner b,e,f,
Koen Van Leemput a,g
a Department of Health Technology, Technical University of Denmark, Denmark
b Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Denmark
c Medical Image Analysis Center (MIAC AG) and Department of Biomedical Engineering, University Basel, Switzerland
d Department of Neurology and TUM-Neuroimaging Center, School of Medicine, Technical University of Munich, Germany
e Department of Neurology, Copenhagen University Hospital Bispebjerg, Denmark
f Institute for Clinical Medicine, Faculty of Medical and Health Sciences, University of Copenhagen, Denmark
g Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, USA
Article info
Keywords: Lesion segmentation; Multiple sclerosis; Whole-brain segmentation; Generative model
Abstract
Here we present a method for the simultaneous segmentation of
white matter lesions and normal-appearing neuroanatomical
structures from multi-contrast brain MRI scans of multiple
sclerosis patients. The method integrates a novel model for white
matter lesions into a previously validated generative model for
whole-brain segmentation. By using separate models for the shape
of anatomical structures and their appearance in MRI, the algorithm
can adapt to data acquired with different scanners and imaging
protocols without retraining. We validate the method using four
disparate datasets, showing robust performance in white matter
lesion segmentation while simultaneously segmenting dozens of other
brain structures. We further demonstrate that the contrast-adaptive
method can also be safely applied to MRI scans of healthy controls,
and replicate previously documented atrophy patterns in deep gray
matter structures in MS. The algorithm is publicly available as
part of the open-source neuroimaging package FreeSurfer.
1. Introduction
Multiple sclerosis (MS) is the most frequent chronic inflammatory autoimmune disorder of the central nervous system, causing progressive damage and disability. The disease affects nearly half a million Americans and 2.5 million individuals world-wide (Goldenberg, 2012; Rosati, 2001), generating more than $10 billion in annual healthcare spending in the United States alone (Adelman et al., 2013).
The ability to diagnose MS and track its progression has been greatly enhanced by magnetic resonance imaging (MRI), which can detect characteristic brain lesions in white and gray matter (Bakshi et al., 2008; Blystad et al., 2015; García-Lorenzo et al., 2013; Lövblad et al., 2010). Lesions visualized by MRI are up to an order of magnitude more sensitive in detecting disease activity compared to clinical assessment (Filippi et al., 2006). The prevalence and dynamics of white matter lesions are thus used clinically to diagnose MS (Thompson et al., 2018), to define disease stages, and to determine the efficacy of a therapeutic regimen (Sormani, 2013). MRI is also an unparalleled tool for characterizing brain atrophy, which occurs at a faster rate in patients with MS compared to healthy controls (Azevedo et al., 2018; Barkhof et al., 2009) and, especially in deep gray matter structures and the cerebral cortex, has been shown to correlate with measures of disability (Geurts et al., 2012).

Although manual labeling remains the most accurate way¹ of delineating white matter lesions in MS (Commowick et al., 2018), this approach is very cumbersome and in itself prone to considerable intra- and inter-rater disagreement (Zijdenbos et al., 1998). Furthermore, manually labeling various normal-appearing brain structures to assess atrophy is simply too time consuming to be practically feasible. Therefore, there is a clear need for automated tools that can reliably and efficiently characterize the morphometry of white matter lesions, various neuroanatomical structures, and their changes over time directly from in vivo MRI. Such tools are of great potential value for diagnosing disease, tracking progression, and evaluating treatment. They can also help in obtaining a better understanding of underlying disease mechanisms, and facilitate more efficient testing in clinical trials. Ultimately, automated software tools may help clinicians to prospectively identify which patients are at highest risk of future disability accrual, leading to better counseling of patients and better overall clinical outcomes.

1 Although selectively fusing several automatic methods has recently been shown to approach human performance (Carass et al., 2020).

∗ Corresponding author. E-mail address: [email protected] (S. Cerri).
https://doi.org/10.1016/j.neuroimage.2020.117471
Received 11 May 2020; Received in revised form 12 October 2020; Accepted 16 October 2020; Available online 22 October 2020
1053-8119/© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

Fig. 1. Segmentation of white matter lesions and 41 different brain structures from the proposed method on T1w-FLAIR input. From left to right: sagittal, coronal, axial view. From top to bottom: T1w, FLAIR, automatic segmentation.
Despite decades of methodological development (cf. García-Lorenzo et al., 2013 or Danelakis et al., 2018), currently available computational tools for analyzing MRI scans of MS patients remain limited in a number of important ways:
• Poor generalizability: Existing tools are often developed and tested on very specific imaging protocols, and may not be able to work on data that is acquired differently. Especially with the strong surge of supervised learning in recent years, where the relationship between image appearance and segmentation labels in training scans is directly and statically encoded, the segmentation performance of many state-of-the-art algorithms will degrade substantially when applied to data from different scanners and acquisition protocols (García-Lorenzo et al., 2013; Valverde et al., 2019), severely limiting their usefulness in practice.
• Dearth of available software: Despite the very large number of proposed methods, most algorithms are only developed and tested in-house, and very few tools are made publicly available (Griffanti et al., 2016; Schmidt et al., 2012; Shiee et al., 2010; Valverde et al., 2017). To ensure that computational methods make a real practical impact, they must be accompanied by software implementations that work robustly across a wide array of image acquisitions; that are made publicly available; and that are open-sourced, rigorously tested and comprehensively documented.
• Limitations in assessing atrophy: There is a lack of dedicated tools for characterizing brain atrophy patterns in MS: many existing methods characterize only aggregate measures such as global brain or gray matter volume (Smeets et al., 2016; Smith et al., 2002) rather than individual brain structures, or require that lesions are pre-segmented so that their MRI intensities can be replaced with placeholder values to avoid biased atrophy measures (so-called lesion filling; Azevedo et al., 2018; Battaglini et al., 2012; Ceccarelli et al., 2012; Chard et al., 2010; Gelineau-Morel et al., 2012; Sdika and Pelletier, 2009).
In order to address these limitations, we describe a new open-source software tool for simultaneously segmenting white matter lesions and 41 neuroanatomical structures from MRI scans of MS patients. An example segmentation produced by this tool is shown in Fig. 1. By performing lesion segmentation in the full context of whole-brain modeling, the method obviates the need to segment lesions and assess atrophy in two separate processing phases, as currently required in lesion filling approaches. The method works robustly across a wide range of imaging hardware and protocols by completely decoupling computational models of anatomy from models of the imaging process, thereby side-stepping the intrinsic generalization difficulties of supervised methods such as convolutional neural networks. Our software implementation is freely available as part of the FreeSurfer neuroimaging analysis package (Fischl, 2012).
To the best of our knowledge, only two other methods have been developed for joint whole-brain and white matter lesion segmentation in MS. Shiee et al. (2010) model lesions as an extra tissue class in an unsupervised whole-brain segmentation method (Bazin and Pham, 2008), removing false positive detections of lesions using a combination of topological constraints and hand-crafted rules implementing various intensity- and distance-based heuristics. However, the method segments only a small set of neuroanatomical structures (10), and validation of this aspect was limited to a simulated MRI scan of a single subject. McKinley et al. (2019) use a cascade of two convolutional neural networks, with the first one skull-stripping individual image modalities and the second one generating the actual segmentation. However, the whole-brain segmentation performance of this method was only evaluated on a few structures (7). Furthermore, as a supervised method its applicability on data that differs substantially from its training data will necessarily be limited.
A preliminary version of this work was presented in Puonti and Van Leemput (2016). Compared to this earlier work, the current article employs more advanced models for the shape and appearance of white matter lesions, and includes a more thorough validation of the segmentation performance of the proposed method, including an evaluation of the whole-brain segmentation component and comparisons with human inter-rater variability.
2. Contrast-adaptive whole-brain segmentation

We build upon a method for whole-brain segmentation called Sequence Adaptive Multimodal SEGmentation (SAMSEG) that we previously developed (Puonti et al., 2016), and that we propose to extend with the capability to handle white matter lesions. SAMSEG robustly segments 41 structures from head MRI scans without any form of preprocessing or prior assumptions on the scanning platform or the number and type of pulse sequences used. Since we build heavily on this method for the remainder of the paper, we briefly outline its main characteristics here.
SAMSEG is based on a generative approach, in which a forward probabilistic model is inverted to obtain automated segmentations. Let $\mathbf{D} = (\mathbf{d}_1, \ldots, \mathbf{d}_I)$ denote a matrix collecting the intensities in a multi-contrast brain MR scan with $I$ voxels, where the vector $\mathbf{d}_i = (d_i^1, \ldots, d_i^N)^T$ contains the intensities in voxel $i$ for each of the available $N$ contrasts. Furthermore, let $\mathbf{l} = (l_1, \ldots, l_I)^T$ be the corresponding labels, where $l_i \in \{1, \ldots, K\}$ denotes one of the $K$ possible segmentation labels assigned to voxel $i$. SAMSEG estimates a segmentation $\mathbf{l}$ from MRI data $\mathbf{D}$ by using a generative model, illustrated in black in Fig. 2. According to this model, $\mathbf{l}$ is sampled from a segmentation prior $p(\mathbf{l} \mid \boldsymbol{\theta}_l)$, after which $\mathbf{D}$ is obtained by sampling from a likelihood function $p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta}_d)$, where $\boldsymbol{\theta}_l$ and $\boldsymbol{\theta}_d$ are model parameters with priors $p(\boldsymbol{\theta}_l)$ and $p(\boldsymbol{\theta}_d)$. Segmentation then consists of inferring the unknown $\mathbf{l}$ from the observed $\mathbf{D}$ under this model. In the following, we summarize the segmentation prior and the likelihood used in SAMSEG, as well as the way the resulting model is used to obtain automated segmentations.
Fig. 2. Graphical model of the proposed method. In black the existing contrast-adaptive whole-brain segmentation method SAMSEG (without lesion modeling), in blue the proposed additional components to also model white matter lesions. Shading indicates observed variables. The plate indicates $I$ repetitions of the included variables, where $I$ is the number of voxels.
2 http://freesurfer.net/
2.1. Segmentation prior

To model the spatial configuration of various neuroanatomical structures, we use a deformable probabilistic atlas as detailed in Puonti et al. (2016). In short, the atlas is based on a tetrahedral mesh, where the parameters $\boldsymbol{\theta}_l$ are the spatial positions of the mesh's vertices, and $p(\boldsymbol{\theta}_l)$ is a topology-preserving deformation prior that prevents the mesh from tearing or folding (Ashburner et al., 2000). The model assumes conditional independence of the labels between voxels for a given deformation:

$$p(\mathbf{l} \mid \boldsymbol{\theta}_l) = \prod_{i=1}^{I} p(l_i \mid \boldsymbol{\theta}_l),$$

and computes the probability of observing label $k$ at voxel $i$ as

$$p(l_i = k \mid \boldsymbol{\theta}_l) = \sum_{j=1}^{J} \alpha_j^k \, \psi_j^i(\boldsymbol{\theta}_l), \tag{1}$$

where $\alpha_j^k$ are label probabilities defined at the $J$ vertices of the mesh, and $\psi_j^i(\boldsymbol{\theta}_l)$ denotes a spatially compact, piecewise-linear interpolation basis function attached to the $j$th vertex and evaluated at the $i$th voxel (Van Leemput, 2009).

The topology of the mesh, the mode of the deformation prior $p(\boldsymbol{\theta}_l)$, and the label probabilities $\alpha_j^k$ can be learned automatically from a set of segmentations provided as training data (Van Leemput, 2009). This involves an iterative process that combines a mesh simplification operation with a group-wise nonrigid registration step to warp the atlas to each of the training subjects, and an Expectation Maximization (EM) algorithm (Dempster et al., 1977) to estimate the label probabilities $\alpha_j^k$ in the mesh vertices. The result is a sparse mesh that encodes high-dimensional atlas deformations through a compact set of vertex displacements. As described in Puonti et al. (2016), the atlas used in SAMSEG was derived from manual whole-brain segmentations of 20 subjects, representing a mix of healthy individuals and subjects with questionable or probable Alzheimer's disease.
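As a rough illustration of Eq. (1), the sketch below computes the label prior in each voxel as a weighted average of the label probabilities stored at the mesh vertices. It is a toy example and not the FreeSurfer implementation: the function name `label_prior`, the two-vertex one-dimensional "mesh" and all numerical values are assumptions made for illustration, and the interpolation weights are simply given rather than derived from a deformed tetrahedral mesh.

```python
def label_prior(psi, alpha):
    """Eq. (1): p(l_i = k) = sum_j alpha[j][k] * psi[i][j].

    psi[i][j]   : interpolation weight of vertex j at voxel i (rows sum to 1)
    alpha[j][k] : probability of label k at vertex j (rows sum to 1)
    """
    n_labels = len(alpha[0])
    return [
        [sum(w * alpha[j][k] for j, w in enumerate(row)) for k in range(n_labels)]
        for row in psi
    ]


# Hypothetical toy setup: two vertices, two labels, three voxels.
alpha = [[0.9, 0.1],   # vertex 0: mostly label 0
         [0.2, 0.8]]   # vertex 1: mostly label 1
psi = [[1.0, 0.0],     # voxel sitting on vertex 0
       [0.5, 0.5],     # voxel halfway between the two vertices
       [0.0, 1.0]]     # voxel sitting on vertex 1
prior = label_prior(psi, alpha)
```

Because both the interpolation weights and the vertex probabilities sum to one, the interpolated prior in every voxel is again a valid probability distribution over labels.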
2.2. Likelihood function

For the likelihood function we use a Gaussian model for each of the $K$ different structures. We assume that the bias field artifact can be modelled as a multiplicative and spatially smooth effect (Wells et al., 1996). For computational reasons, we use log-transformed image intensities in $\mathbf{D}$, and model the bias field as a linear combination of spatially smooth basis functions that is added to the local voxel intensities (Van Leemput et al., 1999). Letting $\boldsymbol{\theta}_d$ collect all bias field parameters and Gaussian means and variances, the likelihood is defined as

$$p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta}_d) = \prod_{i=1}^{I} p(\mathbf{d}_i \mid l_i, \boldsymbol{\theta}_d),$$

$$p(\mathbf{d}_i \mid l_i = k, \boldsymbol{\theta}_d) = \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_k + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_k),$$

$$\mathbf{C} = \begin{pmatrix} \mathbf{c}_1^T \\ \vdots \\ \mathbf{c}_N^T \end{pmatrix}, \quad \mathbf{c}_n = \begin{pmatrix} c_{n,1} \\ \vdots \\ c_{n,P} \end{pmatrix}, \quad \boldsymbol{\phi}_i = \begin{pmatrix} \phi_1^i \\ \vdots \\ \phi_P^i \end{pmatrix},$$

where $P$ denotes the number of bias field basis functions, $\phi_p^i$ is the basis function $p$ evaluated at voxel $i$, and $\mathbf{c}_n$ holds the bias field coefficients for MRI contrast $n$. We use a flat prior for the parameters of the likelihood: $p(\boldsymbol{\theta}_d) \propto 1$.
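To make the role of the additive bias field concrete, the following sketch evaluates the Gaussian log-likelihood of a single voxel for a single contrast (N = 1), so that the covariance reduces to a scalar variance. The function names and the numbers are hypothetical and chosen only for illustration; the real model handles several contrasts jointly.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def voxel_log_likelihood(d_i, phi_i, c, mu_k, var_k):
    """log N(d_i | mu_k + c . phi_i, var_k) for one contrast.

    d_i   : log-transformed intensity in voxel i
    phi_i : values of the P bias field basis functions at voxel i
    c     : the P bias field coefficients of this contrast
    """
    bias = sum(c_p * phi_p for c_p, phi_p in zip(c, phi_i))
    return gaussian_logpdf(d_i, mu_k + bias, var_k)


# Hypothetical numbers: the bias field explains the 0.2 offset from the mean.
ll = voxel_log_likelihood(4.7, [1.0, 0.5], [0.1, 0.2], mu_k=4.5, var_k=0.04)
ll_no_bias = voxel_log_likelihood(4.7, [1.0, 0.5], [0.0, 0.0], mu_k=4.5, var_k=0.04)
```

Working on log-transformed intensities is what turns the multiplicative artifact into the simple additive term `bias` used here.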
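2.3. Segmentation

For a given MRI scan $\mathbf{D}$, segmentation proceeds by computing a point estimate of the unknown model parameters $\boldsymbol{\theta} = \{\boldsymbol{\theta}_d, \boldsymbol{\theta}_l\}$:

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \, p(\boldsymbol{\theta} \mid \mathbf{D}),$$

which effectively fits the model to the data. Details of this procedure are given in Appendix A. Once $\hat{\boldsymbol{\theta}}$ is found, the corresponding maximum a posteriori (MAP) segmentation

$$\hat{\mathbf{l}} = \arg\max_{\mathbf{l}} \, p(\mathbf{l} \mid \mathbf{D}, \hat{\boldsymbol{\theta}})$$

is obtained by assigning each voxel to the label with the highest probability, i.e., $\hat{l}_i = \arg\max_k \hat{w}_{i,k}$, where $0 \le \hat{w}_{i,k} \le 1$ are probabilistic label assignments

$$w_{i,k} = \frac{\mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_k + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_k) \, p(l_i = k \mid \boldsymbol{\theta}_l)}{\sum_{k'=1}^{K} \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_{k'} + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_{k'}) \, p(l_i = k' \mid \boldsymbol{\theta}_l)} \tag{2}$$

evaluated at the estimated parameters $\hat{\boldsymbol{\theta}}$. It is worth emphasizing that, since the class means and variances $\{\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}$ are estimated from each target scan individually, the model automatically adapts to each scan's specific intensity characteristics, a property that we demonstrated experimentally on several data sets acquired with different imaging protocols, scanners and field strengths in Puonti et al. (2016).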
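The probabilistic label assignments of Eq. (2) amount to a per-voxel application of Bayes' rule. A minimal single-contrast sketch is given below; the helper names, the two-class setup and the numbers are assumptions for illustration, and the bias field is passed in as an already evaluated offset.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def label_posteriors(d_i, bias_i, means, variances, prior_i):
    """Eq. (2) for one voxel and one contrast: likelihood times prior, normalized."""
    unnormalized = [
        gaussian_pdf(d_i, mu + bias_i, var) * p
        for mu, var, p in zip(means, variances, prior_i)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def map_label(weights):
    """Index of the label with the highest posterior probability."""
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical two-class example with a flat atlas prior in this voxel.
weights = label_posteriors(2.8, 0.0, means=[1.0, 3.0],
                           variances=[0.25, 0.25], prior_i=[0.5, 0.5])
label = map_label(weights)
```

Since the means and variances are re-estimated for every scan, the same code path applies unchanged to images with very different contrast.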
Our implementation of this method, written in Python with the exception of C++ parts for the computationally demanding optimization of the atlas mesh deformation, is available as part of the open-source package FreeSurfer². It segments MRI brain scans without any form of preprocessing such as skull stripping or bias field correction, taking around 10 minutes to process one subject on a state-of-the-art computer (measured on a machine with an Intel 12-core i7-8700K processor). As explained in Puonti et al. (2016), in our implementation we make use of the fact that many neuroanatomical structures share the same intensity characteristics in MRI to reduce the number of free parameters in the model (e.g., all white matter structures share the same Gaussian mean $\boldsymbol{\mu}_k$ and variance $\boldsymbol{\Sigma}_k$, as do most gray matter structures). Furthermore, for some structures (e.g., non-brain tissue) we use Gaussian mixture models instead of a single Gaussian. In addition to using full covariance matrices $\boldsymbol{\Sigma}_k$, our implementation also supports diagonal covariances, which is currently selected as the default behavior.
3. Modeling lesions

In order to make SAMSEG capable of additionally segmenting white matter lesions, we augment its generative model by introducing a binary lesion map $\mathbf{z} = (z_1, \ldots, z_I)^T$, where $z_i \in \{0, 1\}$ indicates the presence of a lesion in voxel $i$. The augmented model is depicted in Fig. 2, where the blue parts indicate the additional components compared to the original SAMSEG method. The complete model consists of a joint (i.e., over both $\mathbf{l}$ and $\mathbf{z}$ simultaneously) segmentation prior $p(\mathbf{l}, \mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l)$, where $\mathbf{h}$ is a new latent variable that helps constrain the shape of lesions, as well as a joint likelihood $p(\mathbf{D} \mid \mathbf{l}, \mathbf{z}, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{les})$, where $\boldsymbol{\theta}_{les}$ are new parameters that govern their appearance. In the following, we summarize the segmentation prior and the likelihood used in the augmented model, as well as the way the resulting model is used to obtain automated segmentations.
3.1. Segmentation prior

We use a joint segmentation prior of the form

$$p(\mathbf{l}, \mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l) = p(\mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l) \, p(\mathbf{l} \mid \boldsymbol{\theta}_l),$$

where $p(\mathbf{l} \mid \boldsymbol{\theta}_l)$ is the deformable atlas model defined in Section 2.1, and

$$p(\mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l) = \prod_{i=1}^{I} p(z_i \mid \mathbf{h}, \boldsymbol{\theta}_l)$$

is a factorized model where the probability that a voxel is part of a lesion is given by:

$$p(z_i = 1 \mid \mathbf{h}, \boldsymbol{\theta}_l) = f_i(\mathbf{h}) \, \rho_i(\boldsymbol{\theta}_l).$$

Here $0 \le f_i(\mathbf{h}) \le 1$ aims to enforce shape constraints on lesions, whereas $0 \le \rho_i(\boldsymbol{\theta}_l) \le 1$ takes into account a voxel's spatial location within its neuroanatomical context. Below we provide more details on both these components of the model.
3.1.1. Modeling lesion shapes

In order to model lesion shapes, we use a variational auto-encoder (Kingma and Welling, 2013; Rezende et al., 2014) according to which lesion segmentation maps $\mathbf{z}$ are generated in a two-step process: An unobserved, low-dimensional code $\mathbf{h}$ is first sampled from a spherical Gaussian distribution $p(\mathbf{h}) = \mathcal{N}(\mathbf{h} \mid \mathbf{0}, \mathbf{I})$, and subsequently "decoded" into $\mathbf{z}$ by sampling from a factorized Bernoulli model:

$$p_{\vartheta}(\mathbf{z} \mid \mathbf{h}) = \prod_{i=1}^{I} f_i(\mathbf{h})^{z_i} \bigl(1 - f_i(\mathbf{h})\bigr)^{(1 - z_i)}.$$

Here $f_i(\mathbf{h})$ are the outputs of a "decoder" convolutional neural network (CNN) with filter weights $\vartheta$, which parameterize the model. Given a training data set in the form of $M$ binary segmentation maps $\mathcal{D} = \{\mathbf{z}^{(m)}\}_{m=1}^{M}$, suitable network parameters $\vartheta$ can in principle be estimated by maximizing the log-probability assigned to the data by the model:

$$\log p_{\vartheta}(\mathcal{D}) = \sum_{\mathbf{z} \in \mathcal{D}} \log p_{\vartheta}(\mathbf{z}), \quad \text{where} \quad p_{\vartheta}(\mathbf{z}) = \int_{\mathbf{h}} p_{\vartheta}(\mathbf{z} \mid \mathbf{h}) \, p(\mathbf{h}) \, \mathrm{d}\mathbf{h}.$$

However, because the integral over the latent codes makes this intractable, we use amortized variational inference in the form of stochastic gradient variational Bayes (Kingma and Welling, 2013; Rezende et al., 2014). In particular, we introduce an approximate posterior

$$q_{\upsilon}(\mathbf{h} \mid \mathbf{z}) = \mathcal{N}\bigl(\mathbf{h} \mid \boldsymbol{\mu}_{\upsilon}(\mathbf{z}), \mathrm{diag}(\boldsymbol{\sigma}_{\upsilon}^2(\mathbf{z}))\bigr),$$

where the functions $\boldsymbol{\mu}_{\upsilon}(\mathbf{z})$ and $\boldsymbol{\sigma}_{\upsilon}(\mathbf{z})$ are implemented as an "encoder" CNN parameterized by $\upsilon$. The variational parameters $\upsilon$ are then learned jointly with the model parameters $\vartheta$ by maximizing a variational lower bound $\sum_{\mathbf{z} \in \mathcal{D}} \mathcal{L}_{\vartheta,\upsilon}(\mathbf{z}) \le \log p_{\vartheta}(\mathcal{D})$ using stochastic gradient descent, where

$$\mathcal{L}_{\vartheta,\upsilon}(\mathbf{z}) = -D_{KL}\bigl(q_{\upsilon}(\mathbf{h} \mid \mathbf{z}) \,\|\, p(\mathbf{h})\bigr) + \mathbb{E}_{q_{\upsilon}(\mathbf{h} \mid \mathbf{z})}\bigl[\log p_{\vartheta}(\mathbf{z} \mid \mathbf{h})\bigr]. \tag{3}$$

The first term is the Kullback-Leibler divergence between the approximate posterior and the prior, which can be evaluated analytically. The expectation in the last term is approximated using Monte Carlo sampling, using a change of variables (known as the "reparameterization trick") to reduce the variance in the computation of the gradient with respect to $\upsilon$ (Kingma and Welling, 2013; Rezende et al., 2014).
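The two terms of the variational lower bound can be written down in a few lines. The sketch below is a toy, not the authors' TensorFlow model: it evaluates a single-sample estimate of the bound for a four-voxel binary map, with hypothetical one-dimensional stand-ins (`toy_encoder`, `toy_decoder`) in place of the encoder and decoder CNNs. It shows the analytic KL term for a diagonal Gaussian posterior against a standard normal prior, and the reparameterization of the latent code as mean plus standard deviation times standard-normal noise.

```python
import math
import random


def kl_to_standard_normal(mu, sigma):
    """Analytic KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))


def bernoulli_log_lik(z, f):
    """log p(z | h) for a factorized Bernoulli model with probabilities f."""
    return sum(math.log(p) if z_i == 1 else math.log(1.0 - p)
               for z_i, p in zip(z, f))


def elbo_single_sample(z, encoder, decoder, rng):
    """One-sample Monte Carlo estimate of the lower bound for one map z."""
    mu, sigma = encoder(z)
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    h = [m + s * e for m, s, e in zip(mu, sigma, eps)]  # reparameterization
    return -kl_to_standard_normal(mu, sigma) + bernoulli_log_lik(z, decoder(h))


# Hypothetical stand-ins for the encoder and decoder networks.
def toy_encoder(z):
    lesion_fraction = sum(z) / len(z)
    return [2.0 * lesion_fraction - 1.0], [0.5]   # (mean, standard deviation)


def toy_decoder(h):
    p = 1.0 / (1.0 + math.exp(-h[0]))             # sigmoid keeps 0 < p < 1
    return [p, p, p, p]


value = elbo_single_sample([1, 0, 1, 1], toy_encoder, toy_decoder, random.Random(0))
```

In an actual training loop this quantity would be averaged over a mini-batch and maximized with an automatic-differentiation framework; the noise `eps` is drawn outside the networks precisely so that gradients can flow through the mean and standard deviation.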
Our training data set was derived from manual lesion segmentations in 212 MS subjects, obtained from the University Hospital of Basel, Switzerland. The segmentations were all affinely registered and resampled to a 1 mm isotropic grid of size 197 × 233 × 189. In order to reduce the risk of overfitting to the training data, we augmented each segmentation in the training data set by applying a rotation of 10 degrees around each axis, obtaining a total of 1484 segmentations. The architecture for our encoder and decoder networks is detailed in Fig. 3. We trained the model for 1000 epochs with mini-batch size of 10 using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1e-4. We approximated the expectation in the variational lower bound of Eq. (3) by using a single Monte Carlo sample in each step.
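The reported total of 1484 segmentations is consistent with keeping each original map and adding six rotated copies per subject, for instance rotations of +10 and -10 degrees about each of the three axes. This reading is an assumption, since the text does not spell out the sign convention; under it the arithmetic is:

```latex
\[
212 \times \bigl(1 + \underbrace{2}_{\pm 10^{\circ}} \times \underbrace{3}_{\text{axes}}\bigr)
  = 212 \times 7 = 1484 .
\]
```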
3.1.2. Modeling the spatial location of lesions

In order to encode the spatially varying frequency of occurrence of lesions across the brain, we model the probability of finding a lesion in voxel $i$, based on its location alone, as

$$\rho_i(\boldsymbol{\theta}_l) = \sum_{j=1}^{J} \beta_j \, \psi_j^i(\boldsymbol{\theta}_l),$$

where lesion probabilities $0 \le \beta_j \le 1$ defined in the vertices of the SAMSEG atlas mesh are interpolated at the voxel location using the interpolation basis functions $\psi_j^i$ of Eq. (1). This effectively defines a lesion probability map that deforms in conjunction with the SAMSEG atlas to match the neuroanatomy in each image being segmented, allowing the model to impose contextual constraints on where lesions are expected to be found.

We estimated the parameters $\beta_j$ by running SAMSEG on MRI scans (T1-weighted (T1w) and FLAIR) of 54 MS subjects in whom lesions had been manually annotated (data from the University Hospital of Basel, Switzerland), and recording the estimated atlas deformations. The parameters $\beta_j$ were then computed from the manual lesion segmentations by applying the same technique we used to estimate the $\alpha_j^k$ parameters in the SAMSEG atlas training phase (cf. Section 2.1).
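Putting the two components of the lesion prior together, the probability of a lesion in a voxel is the product of a shape term (the decoder output in that voxel) and a location term (vertex lesion probabilities interpolated at the voxel). The sketch below illustrates this with a hypothetical two-vertex mesh and made-up numbers; `lesion_prior` is an illustrative name, not a function of the released software.

```python
def lesion_prior(f, psi, beta):
    """Per-voxel lesion prior: shape term f_i times location term rho_i.

    f    : decoder outputs in [0, 1], one per voxel
    psi  : psi[i][j] is the interpolation weight of vertex j at voxel i
    beta : lesion probability stored at each mesh vertex
    """
    rho = [sum(w * b for w, b in zip(row, beta)) for row in psi]
    return [f_i * rho_i for f_i, rho_i in zip(f, rho)]


# Hypothetical values: lesions never occur at vertex 0, sometimes at vertex 1.
beta = [0.0, 0.4]
psi = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
f = [0.9, 0.9, 0.5]
p_lesion = lesion_prior(f, psi, beta)
```

A voxel located where lesions were never observed in the training data receives a zero prior regardless of how strongly the shape model supports a lesion there.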
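3.2. Likelihood function

For the likelihood, which links joint segmentations $\{\mathbf{l}, \mathbf{z}\}$ to intensities $\mathbf{D}$, we use the same model as SAMSEG in voxels that do not contain lesion ($z_i = 0$), but draw intensities in lesions ($z_i = 1$) from a separate Gaussian with parameters $\boldsymbol{\theta}_{les} = \{\boldsymbol{\mu}_{les}, \boldsymbol{\Sigma}_{les}\}$:

$$p(\mathbf{D} \mid \mathbf{l}, \mathbf{z}, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{les}) = \prod_{i=1}^{I} p(\mathbf{d}_i \mid l_i, z_i, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{les}),$$

where

$$p(\mathbf{d}_i \mid l_i = k, z_i, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{les}) = \begin{cases} \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_{les} + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_{les}) & \text{if } z_i = 1, \\ \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_k + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_k) & \text{otherwise.} \end{cases}$$

In order to constrain the values that the lesion intensity parameters $\boldsymbol{\theta}_{les}$ can take, we make them conditional on the remaining intensity parameters using a normal-inverse-Wishart distribution:

$$p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d) = \mathcal{N}(\boldsymbol{\mu}_{les} \mid \boldsymbol{\mu}_{WM}, \nu^{-1} \boldsymbol{\Sigma}_{les}) \, \mathrm{IW}(\boldsymbol{\Sigma}_{les} \mid \kappa \nu \boldsymbol{\Sigma}_{WM}, \nu - N - 2). \tag{4}$$

Here the subscript "WM" denotes the white matter Gaussian and $\kappa > 1$ and $\nu \ge 0$ are hyperparameters in the model.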
This choice of model is motivated by the fact that the normal-inverse-Wishart distribution is a conjugate prior for the parameters of a Gaussian distribution: Eq. (4) can be interpreted as providing $\nu$ "pseudo-voxels" with empirical mean $\boldsymbol{\mu}_{WM}$ and variance $\kappa \boldsymbol{\Sigma}_{WM}$ in scenarios where the lesion intensity parameters $\boldsymbol{\mu}_{les}$ and $\boldsymbol{\Sigma}_{les}$ need to be estimated from data. In the absence of any such pseudo-voxels ($\nu = 0$), Eq. (4) reduces to a flat prior on $\boldsymbol{\theta}_{les}$ and lesions are modeled as a completely independent class. Although such models have been used in the literature (Guttmann et al., 1999; Kikinis et al., 1999; Shiee et al., 2010; Sudre et al., 2015), their robustness may suffer when applied to subjects with no or very few lesions, such as controls or patients with early disease, since there is essentially no data to estimate the lesion intensity parameters from. In the other extreme case, the number of pseudo-voxels can be set to such a high value ($\nu \to \infty$) that the intensity parameters of the lesions are fully determined by those of WM. This effectively replaces the Gaussian intensity model for WM in SAMSEG by a distribution with longer tails, in the form of a mixture of two Gaussians with identical means ($\boldsymbol{\mu}_{les} \equiv \boldsymbol{\mu}_{WM}$) but variances that differ by a constant factor ($\boldsymbol{\Sigma}_{les} \equiv \kappa \boldsymbol{\Sigma}_{WM}$ vs. $\boldsymbol{\Sigma}_{WM}$). In this scenario, MS lesions are detected as model outliers in a method using robust model parameter estimation (Huber, 1981), another technique that has also frequently been used in the literature (Aït-Ali et al., 2005; Bricq et al., 2008; García-Lorenzo et al., 2011; Liu et al., 2009; Prastawa and Gerig, 2008; Rousseau et al., 2008; Van Leemput et al., 2001).

Fig. 3. Lesion shape model architecture consisting of two symmetrical convolutional neural networks: (a) decoder network and (b) encoder network. The decoder network generates lesion segmentations from a low-dimensional code. Its architecture has ReLU activation functions ($f(x) = \max(0, x)$) and batch normalization (Ioffe and Szegedy, 2015) between each deconvolution layer, with the last layer having a sigmoid activation function, ensuring $0 \le f_i(\mathbf{h}) \le 1$. The encoder network encodes lesion segmentations into a latent code. The main differences compared to the decoder network are the use of convolutional layers instead of deconvolutional layers and, to encode the mean and variance parameters, the last layer has been split in two, with no activation function for the mean and a softplus activation function ($f(x) = \ln(1 + e^x)$) for the variance.
Based on pilot experiments on a variety of datasets (distinct from the ones used in the results section), we found that good results are obtained by using an intermediate value of $\nu = 500$ pseudo-voxels for 1 mm³ isotropic scans, together with a scaling factor $\kappa = 50$. In order to adapt to different image resolutions, $\nu$ is scaled inversely proportionally with the voxel size in our implementation. We will visually demonstrate the role of these hyperparameters in constraining the lesion intensity parameters in Section 5.1.
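The pseudo-voxel interpretation of the lesion intensity prior can be made concrete with the textbook conjugate update for a single contrast. The sketch below is derived from the form of the normal-inverse-Wishart prior under the usual parameterization of the inverse-Wishart density; it is not taken from Appendix B, which holds the authors' actual update. The function names are hypothetical, and reading "voxel size" as voxel volume in mm³ in `scaled_nu` is an assumption.

```python
def scaled_nu(voxel_volume_mm3, nu_at_1mm3=500.0):
    """Assumed scaling rule: pseudo-voxel count inversely proportional to voxel volume."""
    return nu_at_1mm3 / voxel_volume_mm3


def lesion_map_estimate(d, w, mu_wm, var_wm, nu=500.0, kappa=50.0):
    """MAP mean and variance of the lesion Gaussian for one contrast.

    d : log-transformed, bias-corrected intensities
    w : soft lesion assignments in [0, 1], one per voxel
    The data are pooled with nu pseudo-voxels of mean mu_wm and variance
    kappa * var_wm. Requires sum(w) + nu > 0.
    """
    n = sum(w)                                   # effective number of lesion voxels
    mu = (sum(wi * di for wi, di in zip(w, d)) + nu * mu_wm) / (n + nu)
    scatter = sum(wi * (di - mu) ** 2 for wi, di in zip(w, d))
    var = (scatter + nu * (mu - mu_wm) ** 2 + kappa * nu * var_wm) / (n + nu)
    return mu, var


# No lesion voxels at all: the estimate falls back on the WM-derived values.
mu_empty, var_empty = lesion_map_estimate([5.0, 7.0], [0.0, 0.0], 4.0, 0.01)
# nu = 0 (flat prior): plain weighted maximum-likelihood estimates.
mu_ml, var_ml = lesion_map_estimate([5.0, 7.0], [1.0, 1.0], 4.0, 0.01, nu=0.0)
```

The two calls reproduce the limiting behaviors discussed in the text: with no lesion evidence the parameters are pinned to the white matter mean and an inflated white matter variance, whereas with a flat prior they are determined by the data alone.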
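3.3. Segmentation

As in the original SAMSEG method, segmentation proceeds by first obtaining point estimates $\hat{\boldsymbol{\theta}}$ that fit the model to the data, and then inferring the corresponding segmentation posterior:

$$p(\mathbf{l}, \mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}),$$

which is now jointly over $\mathbf{l}$ and $\mathbf{z}$ simultaneously. Unlike in SAMSEG, however, both steps are made intractable by the presence of the new variables $\boldsymbol{\theta}_{les}$ and $\mathbf{h}$ in the model. In order to side-step this difficulty, we obtain $\hat{\boldsymbol{\theta}}$ through a joint optimization over both $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_{les}$:

$$\{\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\theta}}_{les}\} = \arg\max_{\{\boldsymbol{\theta}, \boldsymbol{\theta}_{les}\}} \, p(\boldsymbol{\theta}, \boldsymbol{\theta}_{les} \mid \mathbf{D})$$

in a simplified model in which the constraints on lesion shape have been removed, by clamping all decoder network outputs $f_i(\mathbf{h})$ to value 1. This simplification is defensible since the aim here is merely to find appropriate model parameters, rather than highly accurate lesion segmentations. By doing so, the latent code $\mathbf{h}$ is effectively removed from the model and the optimization simplifies into the one used in the original SAMSEG method, with only minor modifications due to the prior $p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d)$. Details are provided in Appendix B.

Once parameter estimates $\hat{\boldsymbol{\theta}}$ are available, we compute segmentations using the factorization

$$p(\mathbf{l}, \mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) = p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) \, p(\mathbf{l} \mid \mathbf{z}, \mathbf{D}, \hat{\boldsymbol{\theta}}),$$

first estimating $\mathbf{z}$ from $p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}})$ (Step 1 below), and then plugging this into $p(\mathbf{l} \mid \mathbf{z}, \mathbf{D}, \hat{\boldsymbol{\theta}})$ to estimate $\mathbf{l}$ (Step 2):

Step 1: Evaluating $p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}})$ involves marginalizing over both $\mathbf{h}$ and $\boldsymbol{\theta}_{les}$, which we approximate by drawing $S$ Monte Carlo samples $\{\mathbf{h}^{(s)}, \boldsymbol{\theta}_{les}^{(s)}\}_{s=1}^{S}$ from $p(\mathbf{h}, \boldsymbol{\theta}_{les} \mid \mathbf{D}, \hat{\boldsymbol{\theta}})$:

$$p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) = \int_{\mathbf{h}, \boldsymbol{\theta}_{les}} p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}, \boldsymbol{\theta}_{les}) \, p(\mathbf{h}, \boldsymbol{\theta}_{les} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) \, \mathrm{d}\mathbf{h} \, \mathrm{d}\boldsymbol{\theta}_{les} \approx \frac{1}{S} \sum_{s=1}^{S} p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s)}, \boldsymbol{\theta}_{les}^{(s)}).$$

This allows us to estimate the probability of lesion occurrence in each voxel, which we then compare with a user-specified threshold value $\gamma$:

$$p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}) \gtrless \gamma$$

to obtain the final lesion segmentation $\hat{z}_i$. Details on how we approximate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ using Monte Carlo sampling are provided in Appendix C.

Step 2: Voxels that are not assigned to lesion ($\hat{z}_i = 0$) in the previous step are finally assigned to the neuroanatomical structure with the highest probability $p(l_i = k \mid z_i = 0, \mathbf{d}_i, \hat{\boldsymbol{\theta}})$, which simply involves computing $\hat{l}_i = \arg\max_k \hat{w}_{i,k}$ with $\hat{w}_{i,k}$ defined in Eq. (2).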
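The two-step procedure above can be sketched as follows, assuming the per-sample lesion probabilities have already been computed by the sampler (which is not reproduced here). The function names, the default threshold of 0.5 and the string used as lesion label are illustrative assumptions; in the method the threshold is user-specified.

```python
def lesion_posterior(samples):
    """Average the per-voxel lesion probabilities over S Monte Carlo samples."""
    n_samples = len(samples)
    return [sum(column) / n_samples for column in zip(*samples)]


def segment(samples, label_weights, gamma=0.5, lesion_label="lesion"):
    """Step 1: threshold the averaged lesion probability at gamma.
    Step 2: give the remaining voxels their most probable structure."""
    labels = []
    for p, w in zip(lesion_posterior(samples), label_weights):
        if p >= gamma:
            labels.append(lesion_label)
        else:
            labels.append(max(range(len(w)), key=w.__getitem__))
    return labels


# Hypothetical input: S = 2 samples, 3 voxels, 2 neuroanatomical labels.
samples = [[0.9, 0.2, 0.4],
           [0.7, 0.1, 0.8]]
label_weights = [[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]]
labels = segment(samples, label_weights)
```

Raising or lowering `gamma` trades false positive against false negative lesion detections without rerunning the sampler.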
In agreement with other work (Aït-Ali et al., 2005; García-Lorenzo et al., 2011; Jain et al., 2015; Prastawa and Gerig, 2008; Shiee et al., 2010; Van Leemput et al., 2001), we have found that using known prior information regarding the expected intensity profile of MS lesions in various MRI contrasts can help reduce the number of false positive detections. Therefore, we prevent some voxels from being assigned to lesion (i.e., forcing $\hat{z}_i = 0$) based on their intensities in relation to the estimated intensity parameters $\{\hat{\boldsymbol{\mu}}_k, \hat{\boldsymbol{\Sigma}}_k\}_{k=1}^{K}$: In our current implementation only voxels with an intensity higher than the mean of the gray matter Gaussian in FLAIR and/or T2 (if these modalities are present) are considered candidate lesions.

Since estimating $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ involves repeatedly invoking the decoder and encoder networks of the lesion shape model, as detailed in Appendix C, we implemented the proposed method as an add-on to SAMSEG in Python using the Tensorflow library (Abadi et al., 2015). Estimating $\hat{\boldsymbol{\theta}}$ has the same computational complexity as running SAMSEG (i.e., taking approximately 10 minutes on a state-of-the-art machine with an Intel 12-core i7-8700K CPU), while the Monte Carlo sampling takes an additional 5 minutes on a GeForce GTX 1060 graphics card, bringing the total computation time to around 15 minutes per subject.

Table 1
Summary of the datasets used in our experiments.
4. Evaluation datasets and benchmark methods

In this section, we describe four datasets that we will use for the experiments in this paper, including two taken from public challenges. We also outline two relevant methods for MS lesion segmentation that the proposed method is compared to in detail, as well as the metrics and measures used in our experiments.
.1. Datasets
In order to test the proposed method and demonstrate its
contrast-adaptiveness, we conducted experiments on four datasets
acquired with different scanner platforms, field strengths,
acquisition protocols and image resolution:
• MSSeg: This dataset is the publicly available training set of
the MS lesion segmentation challenge that was held in conjunction
with the MICCAI 2016 conference (Commowick et al., 2018). It
consists of 15 MS cases from three different scanners, all acquired
using a harmonized imaging protocol (Cotton et al., 2015). For
each patient a 3D T1w sequence, a contrast-enhanced (T1c) sequence,
an axial dual PD-T2-weighted (T2w) sequence and a 3D fluid
attenuation inversion recovery (FLAIR) sequence were acquired.
Each subject's lesions were delineated by seven different raters on
the FLAIR scan and, if necessary, corrected using the T2w scan.
These delineated images were then fused to create a consensus lesion
segmentation for each subject. Both raw images and pre-processed
images (pre-processing steps: denoising, rigid registration, brain
extraction and bias field correction; see Commowick et al. (2018)
for details) were made available by the challenge organizers. In our
experiments we used the pre-processed data, which required only
minor modifications in our software to remove non-brain tissues from
the model. We note that the original challenge also included a
separate set of 38 test subjects, but at the time of writing this
data is no longer available.
• Trio: This dataset consists of 40 MS cases acquired on a
Siemens Trio 3T scanner at the Danish Research Center of Magnetic
Resonance (DRCMR). For each patient, a 3D T1w sequence, a T2w
sequence and a FLAIR sequence were acquired. Ground truth lesion
segmentations were automatically delineated on the FLAIR images
using Jim software³, and then checked and, if necessary, corrected
by an expert rater at DRCMR using the T2w and MPRAGE images.
• Achieva: This dataset consists of 50 MS cases and 25 healthy
controls acquired on a Philips Achieva 3T scanner at DRCMR. After a
visual inspection of the images, we decided to remove 2 healthy
controls from the dataset as they presented marked gray matter
atrophy and white matter hyperintensities. For each patient, a 3D
T1w sequence, a T2w sequence and a FLAIR sequence were acquired.
Ground truth lesion segmentations were delineated using the same
protocol as the one used for the Trio dataset.
• ISBI: This dataset is the publicly available test set of the
MS lesion segmentation challenge that was held at the 2015
International Symposium on Biomedical Imaging (Carass et al.,
2017). It consists of 14 longitudinal MS cases, with 4 to 6 time
points each, separated by approximately one year. Images were
acquired on a Philips 3T scanner. For each patient, a 3D T1w
sequence, a T2w sequence, a PDw sequence and a FLAIR sequence were
acquired. Images were first preprocessed (inhomogeneity correction,
skull stripping, dura stripping, again inhomogeneity correction;
see Carass et al. (2017) for details), and then registered to a 1 mm
MNI template. Each subject's lesions were delineated by two
different raters on the FLAIR scan, and, if necessary, corrected
using the other contrasts. As part of the challenge, a training
dataset of 5 additional longitudinal MS cases is also available,
with the same scanner, imaging protocols and delineation procedure
as the test dataset.
3 http://www.xinapse.com/
A summary of the datasets, with scanner type, image modalities
and voxel resolution details, can be found in Table 1. For each
subject all the contrasts were co-registered and resampled to the
FLAIR scan for MSSeg, and to the T1w scan for Trio, Achieva and
ISBI. This is the only preprocessing step required by the proposed
method.
4.2. Benchmark methods for lesion segmentation
In order to evaluate the lesion segmentation component of the
proposed method in detail, we compared it to two publicly available
and widely used algorithms for MS lesion segmentation:
• LST-lga⁴ (Schmidt et al., 2012): This lesion growth algorithm
starts by segmenting a T1w image into three main tissue classes
(CSF, GM and WM) using SPM12⁵, and combines the resulting
segmentation with co-registered FLAIR intensities to calculate a
lesion belief map. A pre-chosen initial threshold $\kappa$ is then
used to create an initial binary lesion map, which is subsequently
grown along voxels that appear hyperintense in the FLAIR image. We
set $\kappa$ to its recommended default value of 0.3, which was also
used in previous studies (Mühlau et al., 2013; Rissanen et al.,
2014).
• NicMSLesions⁶ (Valverde et al., 2017, 2019): This deep learning
method is based on a cascade of two 3D convolutional neural
networks, where the first one reveals possible candidate lesion
voxels, and the second one reduces the number of false positive
outcomes. Both networks were trained by the authors of the method on
T1w and FLAIR scans coming from a publicly available training
dataset of the MS lesion segmentation challenge held in conjunction
with the MICCAI 2008 conference (Styner et al., 2008) (20 cases)
and the MSSeg dataset (15 cases). This method was one of the top
performers on the test dataset of the MICCAI 2016 challenge
(Commowick et al., 2018), and one of the few methods for which an
implementation is publicly available.
We note that both these benchmark methods are specifically
targeting T1w-FLAIR input, whereas the proposed method is not tuned
to any particular combination of input modalities.
4 https://www.applied-statistics.de/lst.html
5 https://www.fil.ion.ucl.ac.uk/spm/software/spm12/
Although we only compared our method in detail to these two
benchmarks, many more good methods for MS lesion segmentation
exist. We refer the reader to the MSSeg paper (Commowick et al.,
2018), the ISBI challenge paper (Carass et al., 2017) and the ISBI
challenge website⁷ to compare the reported performance with that of
other methods.
4.3. Metrics and measures
In order to evaluate the influence of varying the input
modalities on the segmentation performance of the proposed method,
and to assess segmentation accuracy with respect to that of other
methods and human raters, we used a combination of segmentation
volume estimates, Pearson correlation coefficients between such
estimates and reference values, and Dice scores. Volumes were
computed by counting the number of voxels assigned to a specific
structure and converting into mm$^3$, whereas Dice coefficients were
computed as

$$\mathrm{Dice}_{X,Y} = \frac{2 \cdot |X \cap Y|}{|X| + |Y|},$$

where $X$ and $Y$ denote segmentation masks, and $|\cdot|$ counts
the number of voxels in a mask.
The proposed method and both benchmark algorithms produce a
probabilistic lesion map that needs to be thresholded to obtain a
final lesion segmentation. This requires an appropriate threshold
value to be set (variable $\gamma$ in the proposed method). In order
to ensure an objective comparison between the methods, we used a
leave-one-out cross-validation strategy in which the threshold for
each test image was set to the value that maximizes the average Dice
overlap with manual segmentations in all the other images of the
same dataset. For the reported performance of the methods on the
ISBI dataset, the thresholds were tuned on the 5 training subjects
that are part of the challenge instead.
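The volume, Dice and leave-one-out threshold computations described
in this section can be sketched as follows. This is a minimal
illustration under stated assumptions, not the evaluation code used
for the paper: the function names, the grid of candidate thresholds,
and the use of `>=` when binarizing a probability map are choices
made here.

```python
import numpy as np

def dice(x, y):
    """Dice overlap between two boolean masks (1.0 if both are empty)."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    denom = x.sum() + y.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(x, y).sum() / denom

def volume_mm3(mask, voxel_size_mm):
    """Structure volume: voxel count times the volume of one voxel."""
    return float(np.asarray(mask, dtype=bool).sum() * np.prod(voxel_size_mm))

def leave_one_out_thresholds(prob_maps, manual_masks, candidates):
    """For each image, pick the candidate threshold that maximizes the
    mean Dice over all *other* images of the same dataset."""
    # scores[t, n] = Dice of image n when thresholded at candidates[t]
    scores = np.array([[dice(np.asarray(p) >= t, m)
                        for p, m in zip(prob_maps, manual_masks)]
                       for t in candidates])
    chosen = []
    for i in range(scores.shape[1]):
        mean_others = np.delete(scores, i, axis=1).mean(axis=1)
        chosen.append(candidates[int(np.argmax(mean_others))])
    return chosen

# Toy data: three 4-voxel "images" sharing the same manual mask.
manual = [np.array([1, 1, 0, 0], dtype=bool)] * 3
probs = [np.array([0.9, 0.6, 0.2, 0.1]),
         np.array([0.8, 0.7, 0.4, 0.1]),
         np.array([0.9, 0.55, 0.3, 0.2])]
thresholds = leave_one_out_thresholds(probs, manual, [0.3, 0.5, 0.75])
```

Because the threshold for each image is chosen without looking at
that image, the resulting Dice scores are not biased by tuning on
the test case itself.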
5. Results
In this section, we first illustrate the effect of the various
components of our model. We then evaluate how the proposed model
adapts to different input modalities and acquisition platforms.
Subsequently we compare the lesion segmentation performance of our
model against that of the two benchmark methods, relate it to human
inter-rater variability, and analyze its performance on the ISBI
challenge data. Finally, we perform an indirect validation of the
whole-brain segmentation component of the method.
Throughout the section we use boxplots to show some of the
results. In these plots, the median is indicated by a horizontal
line, plotted inside boxes that extend from the first to the third
quartile values of the data. The range of the data is indicated by
whiskers extending from the boxes, with outliers represented by
circles.
6 https://github.com/sergivalverde/nicMsLesions
7 https://smart-stats-tools.org/lesion-challenge
5.1. Illustration of the method
In order to illustrate the effect of the various components of
the method, here we analyze its behaviour when segmenting T1w-FLAIR
scans of two MS subjects: one with a low and one with a high lesion
load. Fig. 4 shows, in addition to the input data and the final
lesion probability estimate
$p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$, also an
intermediate lesion probability obtained with the simplified model
used to estimate $\hat{\boldsymbol{\theta}}$, i.e., before the
FLAIR-based intensity constraints and the lesion shape constraints
are applied. From these images we can see that the lesion shape
model and the intensity constraints help remove false positive
detections and enforce more realistic shapes of lesions, especially
for the case with low lesion load.
Fig. 5 analyzes the effect of the prior
$p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d)$ on the
lesion intensity parameters $\boldsymbol{\theta}_{les}$ for the two
subjects shown in Fig. 4. When the lesion load is high, the prior
does not have a strong influence, leaving the lesion Gaussian "free"
to fit the data. However, when the lesion load is low, the lesion
Gaussian is constrained to retain a wide variance and a mean close
to the mean of WM, effectively turning the model into an outlier
detection method for WM lesions. This behavior is important in cases
when few lesions are present in the images, ensuring the method
works robustly even when only limited data is available to estimate
the lesion Gaussian parameters.
In order to analyze the effect of the lesion shape prior, we
compared the lesion segmentation performance of the proposed method
with that obtained when the shape prior was intentionally removed
from the model (i.e., all the decoder network outputs
$f_i(\mathbf{h})$ clamped to value 1). For a fair comparison, the
lesion threshold value $\gamma$ was re-tuned to maximize performance
for the method without shape prior, in the way described in
Section 4.3. Table 2 summarizes the results across the MSSeg, Trio
and Achieva datasets, for different ranges of lesion load. In
addition to Dice scores, the table also reports results for
precision and recall, defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN},$$

where TP, FP and FN count the true positive, false positive and
false negative voxels compared to the manual segmentation. The
results indicate that, although performance is unchanged for high
lesion loads, for which segmentation is generally easier (Commowick
et al., 2018), the lesion shape prior clearly improves segmentations
in subjects with small and medium lesion loads.
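As a small worked illustration of the precision and recall
definitions above, and of grouping subjects by lesion load as in
Table 2, consider the following sketch. The function names and the
toy masks are hypothetical; returning NaN when a denominator is zero
is an assumption made here, since the text does not specify how such
cases are handled.

```python
import numpy as np

def precision_recall(pred, manual):
    """Voxel-wise precision and recall of a binary lesion mask."""
    pred = np.asarray(pred, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    tp = int(np.sum(pred & manual))    # true positive voxels
    fp = int(np.sum(pred & ~manual))   # false positive voxels
    fn = int(np.sum(~pred & manual))   # false negative voxels
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return precision, recall

def lesion_load_bin(load_ml):
    """Bin index for the ranges (0, 2], (2, 10] and (10, inf) ml."""
    return np.digitize(load_ml, [2.0, 10.0], right=True)

# Toy example: 2 true positives, 2 false positives, 1 false negative.
p, r = precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0])
bins = lesion_load_bin([1.5, 2.0, 2.1, 10.0, 25.0])
```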
In order to demonstrate that the model also works robustly in
control subjects (with no lesions at all), and can therefore be
safely applied in studies comparing MS subjects with controls, we
further segmented T1w-FLAIR scans of the Achieva dataset, and
computed the total volume of the lesions in each subject. The
results are shown in Fig. 6; the volumes were 8.95 ± 9.18 ml for MS
subjects vs. 0.98 ± 0.77 ml for controls. Although the average
lesion volume for controls was not exactly zero, visual inspection
revealed that this was due to some controls having WM
hyperintensities that were segmented by the method as MS lesions,
which we find acceptable.
5.2. Scanner and contrast adaptive segmentations
In order to demonstrate the ability of our method to adapt to
different types and combinations of MRI sequences acquired with
different scanners, we show the method's segmentation results along
with the manual segmentations for a representative subset of
combinations for one subject in the MSSeg (consensus as manual
segmentation), the Trio and the Achieva datasets in Fig. 7. It is
not feasible to show all possible combinations. For instance, mixing
the 5 contrasts in the MSSeg dataset alone already yields
$2^5 - 1 = 31$ possible combinations of input contrasts.
Fig. 4. Illustration of how intensity constraints and the lesion
shape model help reduce false positive lesion detections in the
method. Top row: a subject with a low lesion load; Bottom row: a
subject with a high lesion load. From left to right: T1w and FLAIR
input; intermediate lesion probability obtained with the simplified
model used to estimate $\hat{\boldsymbol{\theta}}$; mask of
candidate voxels based on intensity alone (intensity higher than the
mean gray matter intensity in FLAIR); and final lesion probability
estimate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$
produced by the method.
Fig. 5. Illustration of the effect of the prior
$p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d)$ on the
lesion intensity parameters, both in the case of a lesion load that
is low (left, corresponding to the subject in the top row of Fig. 4)
and high (right, corresponding to the subject in the bottom row of
Fig. 4). The illustration is from the Monte Carlo sampling phase of
the method: In each case, the value of the parameters of the lesion
Gaussian is taken as the average over the Monte Carlo samples
$\{\boldsymbol{\theta}_{les}^{(s)}\}_{s=1}^{S}$, and the points
represent the resulting lesion posterior estimate
$p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ in each
voxel.
Table 2
Comparison in terms of lesion segmentation performance between
the proposed method and a method where the lesion shape model was
intentionally removed. Results are expressed in terms of mean ±
standard deviation of Dice overlap, precision and recall for
different ranges of lesion load. Lesion segmentations were computed
across three different datasets (MSSeg, Trio and Achieva) on
T1w-FLAIR input.
Lesion load     Dice                            Precision                       Recall
                Shape model     No shape model  Shape model     No shape model  Shape model     No shape model
(0, 2] ml       0.42 (±0.10)    0.38 (±0.10)    0.32 (±0.12)    0.24 (±0.07)    0.28 (±0.09)    0.24 (±0.07)
(2, 10] ml      0.50 (±0.13)    0.47 (±0.13)    0.37 (±0.13)    0.33 (±0.11)    0.34 (±0.12)    0.32 (±0.12)
(10, ∞) ml      0.70 (±0.11)    0.70 (±0.11)    0.62 (±0.20)    0.62 (±0.20)    0.55 (±0.12)    0.55 (±0.13)
(0, ∞) ml       0.57 (±0.16)    0.55 (±0.17)    0.46 (±0.20)    0.43 (±0.20)    0.42 (±0.16)    0.40 (±0.16)
Fig. 6. Difference between healthy controls (HC) and MS subjects
in lesion volume, as detected by the proposed method on the
Achieva dataset (23 HC subjects, 50 MS subjects, T1w-FLAIR
input). Lines indicate means across subjects.
8 https://smart-stats-tools.org/lesion-challenge
Nonetheless, it is clear that the model is indeed able to adapt
to the specific contrast properties of its input scans. A visual
inspection of its whole-brain segmentation component seems to
indicate that the method benefits from having access to the T1w
contrast for best performance. This is especially clear when only
the FLAIR contrast is provided, as this visually degrades the
segmentation of the white-gray boundaries in the cortical regions
due to the low contrast between white and gray matter in FLAIR.
When comparing the lesion probability maps produced by the method
visually with the corresponding manual lesion segmentations, it
seems that the method benefits from having access to the FLAIR
contrast for the best lesion segmentation performance. This is
confirmed by a quantitative analysis shown in Fig. 8, which plots
the Dice overlap scores for each of the seven input combinations
that all our three datasets have in common, namely T1w, T2w, FLAIR,
T1w-T2w, T1w-FLAIR, T2w-FLAIR, and T1w-T2w-FLAIR. Although the
inclusion of additional contrasts does not hurt lesion segmentation
performance, across all three datasets the best results are obtained
whenever the FLAIR contrast is included as input to the model. This
finding is perhaps not surprising, given that the manual
delineations were all primarily based on the FLAIR image.
Considering both the whole-brain and lesion segmentation
performance together, we conclude that the combination T1w-FLAIR is
well-suited for obtaining good results with the proposed method,
although it will also accept other and/or additional contrasts
beyond T1w and FLAIR.
5.3. Lesion segmentation
In order to compare the lesion segmentation performance of our
model against that of the two benchmark methods, and relate it to
human inter-rater variability, we here present a number of results
based on the T1w-FLAIR input combination (which is the combination
required by the benchmark methods). We also analyze the lesion
segmentation performance of our method on the public ISBI
challenge.
5.3.1. Comparison with benchmark lesion segmentation methods
Fig. 9 shows automatic segmentations of two randomly selected
subjects from the MSSeg, the Trio and the Achieva datasets, both
for our method and for the two benchmark methods LST-lga and
NicMSLesions, along with the corresponding manual segmentations
(consensus manual segmentations for MSSeg). Visually, all three
methods perform similarly on the Achieva MS data, but some of the
results for NicMSLesions appear to be inferior to those obtained
with the other two methods on MSSeg and Trio data. This qualitative
observation is confirmed by the quantitative analysis shown in
Fig. 10, where the three methods' Dice overlap scores are compared
on each dataset: similar performances are obtained for all methods
on the Achieva data, but NicMSLesions trails the other two methods
on MSSeg and Trio data. Especially for MSSeg data this is a
surprising result, since NicMSLesions was trained on this specific
dataset, i.e., the subjects used for testing were part of the
training data of this method, potentially biasing the results in
favor of NicMSLesions. Based on Dice scores, the proposed method
outperforms LST-lga on MSSeg data, although there are no
statistically significant differences between the two methods on the
other datasets.
5.3.2. Results on the ISBI data
We also evaluated the performance of the proposed method on the
ISBI challenge data, obtaining a mean Dice score of 0.58 when
T1w-FLAIR input is used. This score is comparable to the ones we
obtained on the other three datasets analyzed in this paper (cf.
Fig. 10): MSSeg: 0.65, Trio: 0.58 and Achieva: 0.54. A few example
segmentation results on the ISBI data are available in the
Supplementary Material, Fig. 4.
The ISBI challenge website⁸ ranks submissions according to an
overall lesion segmentation performance score that takes into
account Dice overlap, volume correlation, surface distance, and a
few other metrics (see Carass et al., 2017 for details). A score of
100 indicates perfect correspondence, while 90 is meant to
correspond to human inter-rater performance (Carass et al., 2017;
Styner et al., 2008). We obtained a score of 87.87, which places us
around half-way in the ranking of the original challenge (Carass et
al., 2017), although we note that the website currently lists
methods with a much higher score.
In order to relate the performance of our method to the one
obtained with the two benchmark methods, we also attempted to run
LST-lga and NicMSLesions on this dataset. However, the preprocessing
applied to the ISBI challenge data proved problematic for LST-lga,
and we were not able to get any results with this method. Results
for NicMSLesions in terms of Dice overlap are shown in Fig. 11,
together with those obtained with the proposed method. It is clear
that NicMSLesions suffers strongly from the domain shift between its
training data and the ISBI data, a fact that was already reported in
Valverde et al. (2019). For completeness, Fig. 11 also includes
results for NicMSLesions when its network was updated on the ISBI
training data as described in Valverde et al. (2019): different
subsets of network parameters were retrained on the baseline scan of
each of the five ISBI training subjects, and the combination that
performed best on all 21 training images was retained. From the
figure it can be seen that this partially retrained network has
comparable performance to the proposed model, although the latter
attains this performance without any retraining.
5.3.3. Inter-rater variability
To evaluate the proposed method's lesion segmentation performance
in the context of human inter-rater variability, we took advantage
of the availability of lesion segmentations by seven different
raters in the MSSeg dataset. Table 3 shows the lesion segmentation
performance in terms of average Dice overlap between each pair of
the seven raters, and between each rater and the proposed method. On
average, our method achieves a Dice overlap score of 0.57, which is
slightly below the mean human raters' range of [0.59, 0.69]. We note
that this result is in line with those obtained in the MSSeg
challenge (Commowick et al., 2018).
5.4. Whole-brain segmentation
Since no ground truth segmentations are available for a direct
evaluation of the whole-brain segmentation component of our method,
we performed an indirect validation, evaluating its potential for
replacing lesion filling approaches that rely on manually annotated
lesions, as well as its ability to replicate known atrophy patterns
in MS. The results concentrate on the following 25 main
neuroanatomical regions, segmented from T1w-FLAIR scans: left and
right cerebral white matter, cerebellum
Fig. 7. Contrast-adaptiveness of the proposed method to
different combinations of input modalities. Segmentations are shown
for one subject of the MSSeg (top row), the Trio (middle row) and
the Achieva MS (bottom row) dataset. For each subject the top row
shows slices of the data and the manual lesion annotation; the
middle row shows the lesion probability map and Dice score
computed by the proposed method for specific input combinations;
and the bottom row shows the corresponding complete segmentations
produced by the method. Enlarged figures for each subject are
available in the Supplementary Material Figs. 1-3.
Fig. 8. Lesion segmentation performance of the proposed method
in terms of Dice overlap with manual raters on three different
datasets when different input contrasts are used (T1w, T2w, FLAIR,
T1w-T2w, T1w-FLAIR, T2w-FLAIR, T1w-T2w-FLAIR). From left to right:
Dice scores on MSSeg, Trio and Achieva MS data.
Table 3
Comparison of lesion segmentation performance in terms of
average Dice score between each pair of the seven raters of the
MSSeg dataset, and between each rater and the proposed method
(T1w-FLAIR input).
white matter, cerebral cortex, cerebellum cortex, lateral
ventricle, hippocampus, thalamus, putamen, pallidum, caudate,
amygdala, nucleus accumbens and brain stem. To avoid cluttering, the
quantitative results for left and right structures are averaged. We
note that lesion segmentations are not merged into any of these
brain structures (i.e., leaving "holes" in white matter), so that
the results reflect performance only for the normal-appearing parts
of structures.
5.4.1. Comparison with lesion filling
It is well-known that white matter lesions can severely
interfere with the quantification of normal-appearing structures
when standard brain MRI segmentation techniques are used (Battaglini
et al., 2012; Ceccarelli et al., 2012; Chard et al., 2010;
Gelineau-Morel et al., 2012; Nakamura and Fisher, 2009; Vrenken et
al., 2013). A common strategy is therefore to use a lesion-filling
(Chard et al., 2010; Sdika and Pelletier, 2009) procedure, in which
lesions are first manually segmented, their original voxel
intensities are replaced with normal-appearing white matter
intensities, and standard tools are then used to segment the
resulting, preprocessed images. Using such a procedure with SAMSEG
would yield whole-brain segmentations that can serve as "silver
standard" benchmarks against which the results of the proposed
method (which works directly on the original scans) can be compared.
In practice, however, we have noticed that replacing lesion
intensities, which is typically done in T1w only, did not work well
in FLAIR in our experiments. Therefore, rather than explicitly
replacing intensities, we obtained silver standard segmentations by
simply masking out lesions during the SAMSEG processing, effectively
ignoring lesion voxels during the model fitting.
We wished to interpret segmentation vs. silver standard
discrepancies within the context of the human inter-rater
variability associated with manually segmenting lesions. Therefore,
we performed experiments on the MSSeg dataset, repeatedly
re-computing the silver standard using each of the seven raters'
manual lesion annotations in turn. The results are shown in Tables 4
and 5 for Pearson correlation coefficients between estimated volumes
and Dice segmentation overlap scores, respectively. Each line in
these tables corresponds to one structure, showing the average
consistency between the silver standard of each rater compared to
that of the six other raters, as well as the average consistency
between the proposed method's segmentation and the silver standards
of all raters. The results indicate that, in terms of Pearson
correlation coefficient, the performance of our method falls within
the range of inter-rater variability, albeit narrowly (average value
0.988 vs. inter-rater range [0.988, 0.992]). In terms of Dice
scores, however, the method slightly underperforms compared to the
inter-rater variability (average value 0.971 vs. inter-rater range
[0.978, 0.980]).
5.4.2. Detecting atrophy patterns in MS
In a final analysis, we assessed whether previously reported
volume reductions in specific brain structures in MS can
automatically be detected with the proposed method. Towards this
end, we segmented the 23 controls and the 50 MS subjects of the
Achieva dataset, and compared the volumes of various structures
between the two groups. Volumes were normalized for age, gender and
total intracranial volume by regressing them out with a general
linear model. The intracranial volume used for the normalization
was computed by summing the volumes of all the structures, as
segmented by the method, within the intracranial vault. The results
are shown in Fig. 12. Although not all volumes showed a significant
difference between groups, well established differences were
replicated. In particular, we demonstrated decreased volumes of
cerebral white matter, cerebral cortex, thalamus and caudate
(Azevedo et al., 2018; Chard et al., 2002; Houtchens et al., 2007)
as well as an increased volume of the lateral ventricles (Zivadinov
et al., 2016).
6. Discussion and conclusion
In this paper, we have proposed a method for the simultaneous
segmentation of white matter lesions and normal-appearing
neuroanatomical structures from multi-contrast brain MRI scans of MS
patients. The method integrates a novel model for white matter
lesions into a previously validated generative model for whole-brain
segmentation. By using separate models for the shape of anatomical
structures and their appearance in MRI, the algorithm is able to
adapt to data acquired with different scanners and imaging protocols
without needing to be retrained. We validated the method using four
disparate datasets, showing robust performance in white matter
lesion segmentation while simultaneously segmenting dozens of other
brain structures. We further demonstrated that it can also be safely
applied to MRI scans of healthy controls, and replicate previously
documented atrophy patterns in deep gray matter structures in MS.
The proposed algorithm is publicly available as part of the
open-source neuroimaging package FreeSurfer.
Fig. 9. Visual comparison of lesion probability maps on three
different datasets for the proposed method and two state-of-the-art
lesion segmentation methods (LST-lga and NicMSLesions) on T1w-FLAIR
input. (Top) Two subjects from the MSSeg dataset; (Middle) Two
subjects from the Trio dataset; (Bottom) Two subjects from the
Achieva dataset. For each subject the top row shows slices of the
data and the manual annotation while the bottom row shows the
lesion probability maps for our model, LST-lga and NicMSLesions.
By performing both whole-brain and white matter lesion
segmentation at the same time, the method we propose aims to
supplant the two-stage "lesion filling" procedure that is commonly
used in morphometric studies in MS, in which lesions segmented in a
first step are used to avoid biasing a subsequent analysis of
normal-appearing structures with software tools developed for
healthy brain scans. In order to evaluate whether our method is
successful in this regard, we compared its whole-brain segmentation
performance against the results obtained when lesions are segmented
a priori by seven different human raters instead of automatically by
the method itself. Our results show that the volumes of various
neuroanatomical structures obtained when lesions are segmented
automatically fall within the range of inter-rater variability,
indicating that the proposed method may be used instead of lesion
filling with manual lesion segmentations in large volumetric studies
of brain
Table 4
Average Pearson correlation coefficients of brain structure
volume estimates between the silver standard of each rater compared
to that of the six other raters in the MSSeg dataset, as well as
the average consistency between the proposed method's segmentation
and the silver standards of all raters (T1w-FLAIR input). Each line
shows an average across raters for a specific brain structure.
Table 5
Same as Table 4, but with Dice segmentation overlap scores.
Each line shows an average across raters (similar to the last row
of Table 3) for a specific brain structure.
trophy in MS. When detailed spatial overlap is analyzed,
however, weound that the automatic segmentation does not fully
reach the per-ormance obtained with human lesion annotation as
measured by Diceverlap.
Like many other methods for MS lesion segmentation, the
methodroposed here produces a spatial map indicating in each voxel
its prob-bility of belonging to a lesion, which can then be
thresholded to obtain
ig. 10. Lesion segmentation performance in terms of Dice overlap
with manual ratLesions) on T1w-FLAIR input. Statistically
significant differences between two methor p -value < 0.001, โโ
โ โ for p -value < 0.01 and โโ โ for p -value < 0.05). From
left to
final lesion segmentation. Although in our experience good
resultsan be obtained by using the same threshold value across
datasets (e.g.,= 0 . 5 ), changing this value allows one to adjust
the trade-off between
alse positive and false negative lesion detections. Since some
MRI se-uences and scanners will depict lesions with a higher
contrast thanthers, and because there is often considerable
disagreement betweenuman experts regarding the exact extent of
lesions ( Zijdenbos et al.,
ers for the proposed method and two benchmark methods (LST-lga
and NicM- ods, computed with a two-tailed paired t -test, are
indicated by asterisks ( โโ โ โ โright: results on the MSSeg, the
Trio and the Achieva dataset.
Fig. 11. Lesion segmentation performance in terms of Dice overlap with manual raters on the ISBI dataset for the proposed method, NicMsLesions, and NicMsLesions with partial retraining (see text for details). Statistically significant differences between two methods, computed with a two-tailed paired t-test, are indicated by asterisks (*** indicates p-value < 0.001).
1998), in our implementation we therefore expose this threshold value as an optional, tunable parameter to the end-user. Suitable threshold values can be found by visually inspecting the lesion segmentations of a few cases or, in large-scale studies, using cross-validation as we did in our experiments.
By providing the ability to robustly and efficiently segment multi-contrast scans of MS patients across a wide range of imaging equipment and protocols, the software tool presented here may help facilitate large cohort studies aiming to elucidate the morphological and temporal dynamics underlying disease progression and accumulation of disability in MS. Furthermore, in current clinical practice, high-resolution multi-contrast images, which can be used to increase the accuracy of lesion segmentation, represent a significantly increased burden for the neuroradiologist to read, and are hence frequently not acquired. The emergence of robust, multi-contrast segmentation tools such as ours may help break the link between the resolution and number of contrasts of the acquired data and the human time needed to evaluate it, thus potentially increasing the accuracy of the resulting measures.
The ability of the proposed method to automatically tailor its appearance models for specific datasets makes it very flexible, allowing it to seamlessly take advantage of novel, potentially more sensitive and specific MRI acquisitions as they are developed. Although not extensively tested, the proposed method should make it possible to, with minimal adjustments, segment data acquired with advanced research sequences such as MP2RAGE (Marques et al., 2010), DIR (Redpath and Smith, 1994), FLAIR² (Wiggermann et al., 2016) or T2* (Anderson et al., 2001), both at conventional and at ultra-high magnetic field strengths. We are currently pursuing several extensions of the proposed method, including the ability to go on and create cortical surfaces and parcellations in FreeSurfer, as well as a dedicated version for longitudinal data (Cerri et al., 2020).
Fig. 12. Differences between healthy controls (HC) and MS subjects in normalized volume estimates of various neuroanatomical structures, as detected by the proposed method on the Achieva dataset (23 HC subjects, 50 MS subjects, T1w-FLAIR input). Statistically significant differences between the two groups, computed with a Welch's t-test, are indicated by asterisks (** for p-value < 0.01 and * for p-value < 0.05).
Declaration of Competing Interest
Hartwig R. Siebner has received honoraria as speaker from Sanofi Genzyme, Denmark and Novartis, Denmark, as consultant from Sanofi Genzyme, Denmark and as senior editor (NeuroImage) from Elsevier Publishers, Amsterdam, The Netherlands. He has received royalties as book editor from Springer Publishers, Stuttgart, Germany.
CRediT authorship contribution statement
Stefano Cerri: Conceptualization, Methodology, Software, Formal analysis, Validation, Visualization, Writing - original draft. Oula Puonti: Supervision, Methodology, Software, Writing - review & editing. Dominik S. Meier: Resources, Writing - review & editing. Jens Wuerfel: Resources, Writing - review & editing. Mark Mühlau: Resources, Writing - review & editing, Funding acquisition. Hartwig R. Siebner: Supervision, Resources, Writing - review & editing, Funding acquisition. Koen Van Leemput: Supervision, Conceptualization, Methodology, Software, Writing - review & editing, Funding acquisition.
Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 765148, as well as from the National Institute Of Neurological Disorders and Stroke under project number
R01NS112161. Hartwig R. Siebner holds a 5-year professorship in precision medicine at the Faculty of Health Sciences and Medicine, University of Copenhagen which is sponsored by the Lundbeck Foundation (Grant Nr. R186-2015-2138). Mark Mühlau was supported by the German Research Foundation (Priority Program SPP2177, Radiomics: Next Generation of Biomedical Imaging), project number 428223038.
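Appendix A. Parameter optimization in SAMSEG
We here describe how we perform the optimization of $p(\boldsymbol{\theta} | \mathbf{D})$ with respect to $\boldsymbol{\theta}$ in the original SAMSEG model. We follow a coordinate ascent approach, in which a limited-memory BFGS optimization of $\boldsymbol{\theta}_x$ is interleaved with a generalized EM (GEM) optimization of the remaining parameters $\boldsymbol{\theta}_d$. The GEM algorithm was derived in (Van Leemput et al., 1999) based on (Wells et al., 1996), and is repeated here for the sake of completeness. It iteratively constructs a tight lower bound to the objective function by computing the soft label assignments $w_{i,k}$ based on the current estimate of $\boldsymbol{\theta}_d$ (Eq. (2)), and subsequently improves the lower bound (and therefore the objective function) using the following set of analytical update equations for these parameters:
$$\boldsymbol{\mu}_k \leftarrow \mathbf{m}_k \quad \text{and} \quad \boldsymbol{\Sigma}_k \leftarrow \mathbf{V}_k, \quad \forall k,$$
$$\begin{pmatrix} \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_N \end{pmatrix} \leftarrow \begin{pmatrix} \boldsymbol{\Phi}^T \mathbf{S}_{1,1} \boldsymbol{\Phi} & \cdots & \boldsymbol{\Phi}^T \mathbf{S}_{1,N} \boldsymbol{\Phi} \\ \vdots & \ddots & \vdots \\ \boldsymbol{\Phi}^T \mathbf{S}_{N,1} \boldsymbol{\Phi} & \cdots & \boldsymbol{\Phi}^T \mathbf{S}_{N,N} \boldsymbol{\Phi} \end{pmatrix}^{-1} \begin{pmatrix} \boldsymbol{\Phi}^T \left( \sum_{n=1}^{N} \mathbf{S}_{1,n} \mathbf{r}_{1,n} \right) \\ \vdots \\ \boldsymbol{\Phi}^T \left( \sum_{n=1}^{N} \mathbf{S}_{N,n} \mathbf{r}_{N,n} \right) \end{pmatrix},$$
where
$$\mathbf{m}_k = \frac{\sum_{i=1}^{I} w_{i,k} (\mathbf{d}_i - \mathbf{C} \boldsymbol{\phi}_i)}{N_k} \quad \text{with} \quad N_k = \sum_{i=1}^{I} w_{i,k},$$
$$\mathbf{V}_k = \frac{\sum_{i=1}^{I} w_{i,k} (\mathbf{d}_i - \mathbf{C} \boldsymbol{\phi}_i - \mathbf{m}_k)(\mathbf{d}_i - \mathbf{C} \boldsymbol{\phi}_i - \mathbf{m}_k)^T}{N_k},$$
$$\boldsymbol{\Phi} = \begin{pmatrix} \phi_1^1 & \cdots & \phi_P^1 \\ \vdots & \ddots & \vdots \\ \phi_1^I & \cdots & \phi_P^I \end{pmatrix}, \quad \mathbf{S}_{m,n} = \mathrm{diag}\left( s_i^{m,n} \right), \quad \mathbf{r}_{m,n} = \begin{pmatrix} r_1^{m,n} \\ \vdots \\ r_I^{m,n} \end{pmatrix}$$
and
$$s_i^{m,n} = \sum_{k=1}^{K} s_{i,k}^{m,n}, \quad s_{i,k}^{m,n} = w_{i,k} \left( \boldsymbol{\Sigma}_k^{-1} \right)_{m,n}, \quad r_i^{m,n} = d_i^n - \frac{\sum_{k=1}^{K} s_{i,k}^{m,n} (\boldsymbol{\mu}_k)_n}{\sum_{k=1}^{K} s_{i,k}^{m,n}}.$$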
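To make the structure of one GEM iteration concrete, the sketch below implements a heavily simplified version: a single contrast (scalar intensities), no bias field (so the $\mathbf{C} \boldsymbol{\phi}_i$ term and the update of the bias field coefficients are dropped), and voxel-wise label priors passed in directly rather than interpolated from a deformable atlas. All names and the toy data are assumptions made for this illustration; it is not the SAMSEG implementation.

```python
import math


def gaussian(d, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * (d - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, priors, means, variances):
    """One EM iteration for a 1-D Gaussian mixture with voxel-wise label priors."""
    num_classes = len(means)
    # E-step: soft label assignments w[i][k], normalized over the classes.
    w = []
    for d, prior in zip(data, priors):
        joint = [gaussian(d, means[k], variances[k]) * prior[k]
                 for k in range(num_classes)]
        norm = sum(joint)
        w.append([j / norm for j in joint])
    # M-step: mu_k <- m_k and sigma2_k <- V_k (weighted mean and variance).
    new_means, new_vars = [], []
    for k in range(num_classes):
        n_k = sum(w_i[k] for w_i in w)
        m_k = sum(w_i[k] * d for w_i, d in zip(w, data)) / n_k
        v_k = sum(w_i[k] * (d - m_k) ** 2 for w_i, d in zip(w, data)) / n_k
        new_means.append(m_k)
        new_vars.append(v_k)
    return w, new_means, new_vars


# Toy data (hypothetical): two well-separated intensity clusters, flat priors.
data = [0.0, 0.2, -0.1, 5.0, 5.2, 4.9]
priors = [[0.5, 0.5]] * len(data)
means, variances = [1.0, 4.0], [1.0, 1.0]
for _ in range(10):
    w, means, variances = em_step(data, priors, means, variances)
```

In the full model, the same soft assignments additionally drive the linear system for the bias field coefficients, and the whole GEM step alternates with the optimization of the atlas deformation.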
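Appendix B. Parameter optimization
Here we describe how we perform the optimization of $p(\boldsymbol{\theta}, \boldsymbol{\theta}_{les} | \mathbf{D})$ with respect to $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_{les}$ in the augmented model of Sec. 3 with the decoder outputs $f_i(\mathbf{h})$ all clamped to value 1. In that case, the model can be reformulated in the same form as the original SAMSEG model, so that the same optimization strategy can be used. In particular, lesions can be considered to form an extra class (with index $K+1$) in a SAMSEG model with $K+1$ labels, provided that the mesh vertex label probabilities
$$\tilde{\alpha}_j^k = \begin{cases} \beta_j & \text{if } k = K+1 \text{ (lesion)}, \\ \alpha_j^k (1 - \beta_j) & \text{otherwise} \end{cases}$$
are used instead of the original $\alpha_j^k$'s in the atlas interpolation model of Eq. (1). The optimization described in Appendix A does require one modification because of the prior $p(\boldsymbol{\theta}_{les} | \boldsymbol{\theta}_d)$ binding the means and variances of the WM and lesion classes together. The following altered update equations for these parameters guarantee that the EM lower bound, and therefore the objective function, is improved in each iteration of the GEM algorithm:
$$\boldsymbol{\mu}_{WM} \leftarrow \left( N_{WM} \mathbf{I} + \frac{\nu N_{les}}{\nu + N_{les}} \boldsymbol{\Sigma}_{WM} \boldsymbol{\Sigma}_{les}^{-1} \right)^{-1} \left( N_{WM} \mathbf{m}_{WM} + \frac{\nu N_{les}}{\nu + N_{les}} \boldsymbol{\Sigma}_{WM} \boldsymbol{\Sigma}_{les}^{-1} \mathbf{m}_{les} \right),$$
$$\boldsymbol{\Sigma}_{WM} \leftarrow \frac{N_{WM} \mathbf{V}_{WM} + \boldsymbol{\Sigma}_{WM} \boldsymbol{\Sigma}_{les}^{-1} \boldsymbol{\Psi}_{les}}{N_{WM} + N_{les} + M + 2},$$
$$\boldsymbol{\mu}_{les} \leftarrow \frac{N_{les} \mathbf{m}_{les} + \nu \boldsymbol{\mu}_{WM}}{N_{les} + \nu},$$
$$\boldsymbol{\Sigma}_{les} \leftarrow \frac{\boldsymbol{\Psi}_{les} + \kappa \nu \boldsymbol{\Sigma}_{WM}}{N_{les} + \nu},$$
where
$$\boldsymbol{\Psi}_{les} = \frac{N_{les} \nu}{N_{les} + \nu} (\mathbf{m}_{les} - \boldsymbol{\mu}_{WM})(\mathbf{m}_{les} - \boldsymbol{\mu}_{WM})^T + N_{les} \mathbf{V}_{les}.$$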
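The effect of the coupling can be seen in a scalar (single-contrast) version of these updates, sketched below. The covariances reduce to variances, so that $\boldsymbol{\Sigma}_{WM} \boldsymbol{\Sigma}_{les}^{-1}$ becomes a simple ratio. The function name, the argument layout, the order in which the quantities are updated, and the toy numbers are assumptions made for this illustration; the update of the WM variance is left out for brevity.

```python
def coupled_updates(n_wm, m_wm, n_les, m_les, v_les, var_wm, var_les, nu, kappa):
    """Scalar sketch of the lesion-coupled updates (WM variance update omitted).

    n_*, m_*, v_*: soft counts, weighted means and weighted variances of the
    voxels assigned to WM / lesion; var_wm, var_les: current variance estimates;
    nu, kappa: hyperparameters of the prior tying the lesion class to WM.
    """
    c = nu * n_les / (n_les + nu)        # weight of the coupling term
    ratio = var_wm / var_les             # scalar analogue of Sigma_WM Sigma_les^-1
    # WM mean: weighted average of the WM data mean and the lesion data mean.
    new_mu_wm = (n_wm * m_wm + c * ratio * m_les) / (n_wm + c * ratio)
    # Lesion mean: data mean shrunk towards the WM mean by nu pseudo-observations.
    new_mu_les = (n_les * m_les + nu * new_mu_wm) / (n_les + nu)
    # Scatter of the lesion class, including the offset from the WM mean.
    psi_les = c * (m_les - new_mu_wm) ** 2 + n_les * v_les
    # Lesion variance: data scatter plus a kappa-times-wider WM variance.
    new_var_les = (psi_les + kappa * nu * var_wm) / (n_les + nu)
    return new_mu_wm, new_mu_les, psi_les, new_var_les


# Toy numbers (hypothetical): 100 WM voxels around 1.0, 10 lesion voxels around 3.0.
mu_wm, mu_les, psi_les, var_les = coupled_updates(
    100, 1.0, 10, 3.0, 0.5, 0.2, 1.0, nu=10.0, kappa=50.0)
```

With `nu` set to zero the prior vanishes and the lesion updates fall back to the plain weighted mean and variance of Appendix A; larger values pull the lesion mean towards the WM mean and widen the lesion variance relative to WM.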
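Appendix C. Estimating lesion probabilities
We here describe how we approximate $p(z_i = 1 | \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ using Monte Carlo sampling. We use a Markov chain Monte Carlo (MCMC) approach to sample triplets $\{ \boldsymbol{\theta}_{les}^{(s)}, \mathbf{z}^{(s)}, \mathbf{h}^{(s)} \}$ from the distribution $p(\boldsymbol{\theta}_{les}, \mathbf{z}, \mathbf{h} | \mathbf{D}, \hat{\boldsymbol{\theta}})$: Starting from an initial lesion segmentation $\mathbf{z}^{(0)}$ obtained from the parameter estimation procedure described in Appendix B, we use a blocked Gibbs sampler in which each variable is updated conditioned on the other ones:
$$\boldsymbol{\Sigma}_{les}^{(s+1)} \sim p(\boldsymbol{\Sigma}_{les} | \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{z}^{(s)}) = \mathrm{IW}\left( \boldsymbol{\Sigma}_{les} \,\middle|\, \boldsymbol{\Psi}_{les}^{(s)} + \kappa \nu \hat{\boldsymbol{\Sigma}}_{WM}, \; N_{les}^{(s)} + \nu - M - 2 \right)$$
$$\boldsymbol{\mu}_{les}^{(s+1)} \sim p(\boldsymbol{\mu}_{les} | \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{z}^{(s)}, \boldsymbol{\Sigma}_{les}^{(s+1)}) = \mathcal{N}\left( \boldsymbol{\mu}_{les} \,\middle|\, \frac{N_{les}^{(s)} \mathbf{m}_{les}^{(s)} + \nu \hat{\boldsymbol{\mu}}_{WM}}{N_{les}^{(s)} + \nu}, \; \frac{\boldsymbol{\Sigma}_{les}^{(s+1)}}{N_{les}^{(s)} + \nu} \right)$$
$$\mathbf{h}^{(s+1)} \sim p(\mathbf{h} | \mathbf{z}^{(s)}) \simeq \mathcal{N}\left( \mathbf{h} \,\middle|\, \boldsymbol{\mu}_{\upsilon}(\mathbf{z}^{(s)}), \; \mathrm{diag}\left( \boldsymbol{\sigma}_{\upsilon}^2(\mathbf{z}^{(s)}) \right) \right)$$
$$\mathbf{z}^{(s+1)} \sim p(\mathbf{z} | \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s+1)}, \boldsymbol{\theta}_{les}^{(s+1)}) = \prod_{i=1}^{I} p(z_i | \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s+1)}, \boldsymbol{\theta}_{les}^{(s+1)}),$$
where we use the encoder variational approximation obtained during the training of the lesion shape model (see Sec. 3.1.2) to sample from $\mathbf{h}$ in the next-to-last step, and
$$p(z_i = 1 | \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}, \boldsymbol{\theta}_{les}) = \frac{\mathcal{N}(\mathbf{d}_i | \boldsymbol{\mu}_{les} + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_{les}) \, f_i(\mathbf{h}) \, \rho_i(\hat{\boldsymbol{\theta}}_x)}{\sum_{l_i=1}^{K} \sum_{z'_i=0}^{1} p(\mathbf{d}_i | l_i, z'_i, \hat{\boldsymbol{\theta}}_d, \boldsymbol{\theta}_{les}) \, p(z'_i | \hat{\boldsymbol{\theta}}_x, \mathbf{h}) \, p(l_i | \hat{\boldsymbol{\theta}}_x)}$$
in the last step. In these equations, the variables $N_{les}^{(s)}$, $\mathbf{m}_{les}^{(s)}$, $\mathbf{V}_{les}^{(s)}$ and $\boldsymbol{\Psi}_{les}^{(s)}$ are as defined before, but using voxel assignments $w_{i,les} = z_i^{(s)}$. Once $S$ samples are obtained, we approximate $p(z_i = 1 | \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ as
$$p(z_i = 1 | \mathbf{d}_i, \hat{\boldsymbol{\theta}}) \simeq \frac{1}{S} \sum_{s=1}^{S} p(z_i = 1 | \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s)}, \boldsymbol{\theta}_{les}^{(s)}).$$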
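The sampling scheme can be illustrated with a toy sampler, sketched below under strong simplifying assumptions: a single contrast, only two classes (WM and lesion), no bias field, and no shape model, so the step that samples $\mathbf{h}$ is skipped and the decoder output is treated as 1 everywhere. In one dimension the inverse-Wishart distribution reduces to an inverse-gamma distribution, which is sampled here through `random.gammavariate`. All function names, default values and toy data are assumptions made for this example, not the authors' implementation.

```python
import math
import random


def normal_pdf(d, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * (d - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def lesion_probabilities(data, prior, mu_wm, var_wm, z_init, nu=10.0, kappa=50.0,
                         n_samples=50, burn_in=50, seed=0):
    """Toy 1-D blocked Gibbs sampler; returns Monte Carlo lesion probabilities."""
    rng = random.Random(seed)
    z = list(z_init)
    prob_sum = [0.0] * len(data)
    for sweep in range(burn_in + n_samples):
        # Statistics of the voxels currently labeled as lesion (w_i = z_i).
        les = [d for d, z_i in zip(data, z) if z_i]
        n_les = len(les)
        m_les = sum(les) / n_les if n_les else mu_wm
        v_les = sum((d - m_les) ** 2 for d in les) / n_les if n_les else 0.0
        psi = n_les * nu / (n_les + nu) * (m_les - mu_wm) ** 2 + n_les * v_les
        # Lesion variance: scalar inverse-Wishart = inverse-gamma(dof/2, scale/2).
        dof = n_les + nu - 1 - 2             # M = 1 contrast; requires nu > 3
        scale = psi + kappa * nu * var_wm
        var_les = scale / (2.0 * rng.gammavariate(dof / 2.0, 1.0))
        # Lesion mean: normal around the data mean shrunk towards the WM mean.
        mu_les = rng.gauss((n_les * m_les + nu * mu_wm) / (n_les + nu),
                           math.sqrt(var_les / (n_les + nu)))
        # Voxel-wise lesion posteriors given the sampled parameters.
        probs = []
        for d, rho in zip(data, prior):
            num = normal_pdf(d, mu_les, var_les) * rho
            den = num + normal_pdf(d, mu_wm, var_wm) * (1.0 - rho)
            probs.append(num / den)
        # Resample the lesion labels independently per voxel.
        z = [1 if rng.random() < p else 0 for p in probs]
        if sweep >= burn_in:                 # keep only post-burn-in sweeps
            prob_sum = [acc + p for acc, p in zip(prob_sum, probs)]
    return [acc / n_samples for acc in prob_sum]


# Toy data (hypothetical): four WM-like voxels and three brighter voxels.
data = [1.0, 0.9, 1.1, 1.05, 2.5, 2.6, 2.4]
probs = lesion_probabilities(data, [0.1] * len(data), mu_wm=1.0, var_wm=0.04,
                             z_init=[0, 0, 0, 0, 1, 1, 1])
```

Averaging the conditional probabilities, rather than the sampled binary labels, gives a smoother estimate for the same number of sweeps.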
In our implementation, we use $S = 50$ samples, obtained after discarding the first 50 sweeps of the sampler (so-called "burn-in" phase). The algorithm repeatedly invokes the decoder and encoder networks of the lesion shape model described in Sec. 3.1.2. Since this shape model was trained in a specific isotropic space, the algorithm requires transitioning between this training space and subject space using an affine transformation. This is accomplished by resampling the input and output of the encoder and decoder, respectively, using linear interpolation.
Supplementary material
Supplementary material associated with this article can be found, in the online version, at 10.1016/j.neuroimage.2020.117471
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.
Adelman, G., Rane, S.G., Villa, K.F., 2013. The cost burden of multiple sclerosis in the United States: a systematic review of the literature. J. Med. Econ. 16 (5), 639–647.
Aït-Ali, L.S., Prima, S., Hellier, P., Carsin, B., Edan, G., Barillot, C., 2005. STREM: A robust multidimensional parametric method to segment MS lesions in MRI. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3749, pp. 409–416.
Anderson, L., Holden, S., Davis, B., Prescott, E., Charrier, C., Bunce, N., Firmin, D., Wonke, B., Porter, J., Walker, J., Pennell, D., 2001. Cardiovascular T2-star (T2*) magnetic resonance for the early diagnosis of myocardial iron overload. Euro. Heart J. 22 (23), 2171–2179.
Ashburner, J., Andersson, J.L., Friston, K.J., 2000. Image registration using a symmetric prior – in three dimensions. Hum. Brain Mapp. 9 (4), 212–225.
Azevedo, C.J., Cen, S.Y., Khadka, S., Liu, S., Kornak, J., Shi, Y., Zheng, L., Hauser, S.L., Pelletier, D., 2018. Thalamic atrophy in multiple sclerosis: a magnetic resonance imaging marker of neurodegeneration throughout disease. Ann. Neurol. 83 (2), 223–234.
Bakshi, R., Thompson, A.J., Rocca, M.A., Pelletier, D., Dousset, V., Barkhof, F., Inglese, M., Guttmann, C.R., Horsfield, M.A., Filippi, M., 2008. MRI in multiple sclerosis: current status and future prospects. Lancet Neurol. 7 (7), 615–625.
Barkhof, F., Calabresi, P.A., Miller, D.H., Reingold, S.C., 2009. Imaging outcomes for neuroprotection and repair in multiple sclerosis trials. Nat. Rev. Neurol. 5 (5), 256–266.
Battaglini, M., Jenkinson, M., De Stefano, N., 2012. Evaluating and reducing the impact of white matter lesions on brain volume measurements. Hum. Brain Mapp. 33 (9), 2062–2071.
Bazin, P.-L., Pham, D.L., 2008. Homeomorphic brain image segmentation with topological and statistical atlases. Med. Image Anal. 12 (5), 616–625.
Blystad, I., Håkansson, I., Tisell, A., Ernerudh, J., Smedby, Ö., Lundberg, P., Larsson, E.-M., 2015. Quantitative MRI for analysis of active multiple sclerosis lesions without gadolinium-based contrast agent. Am. J. Neuroradiol. 37 (1), 94–100.
Bricq, S., Collet, C., Armspach, J.P., 2008. Lesions detection on 3D brain MRI using trimmed likelihood estimator and probabilistic atlas. In: Proceedings of the 2008 Fifth IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Proceedings, ISBI, pp. 93–96.
Carass, A., Roy, S., Gherman, A., Reinhold, J.C., Jesson, A., Arbel, T., Maier, O., Handels, H., Ghafoorian, M., Platel, B., Birenbaum, A., Greenspan, H., Pham, D.L., Crainiceanu, C.M., Calabresi, P.A., Prince, J.L., Roncal, W.R., Shinohara, R.T., Oguz, I., 2020. Evaluating white matter lesion segmentations with refined Sørensen-Dice analysis. Sci. Rep. 10, 1–19.
Carass, A., Roy, S., Jog, A., Cuzzocreo, J.L., Magrath, E., Gherman, A., Button, J., Nguyen, J., Prados, F., Sudre, C.H., Cardoso, M.J., Cawley, N., Ciccarelli, O., Wheeler-Kingshott, C.A.M., Ourselin, S., Catanese, L., Deshpande, H., Maurel, P., Commowick, O., Barillot, C., Tomas-Fernandez, X., Warfield, S.K., Vaidya, S., Chunduru, A., Muthuganapathy, R., Krishnamurthi, G., Jesson, A., Arbel, T., Maier, O., Handels, H., Iheme, L.O., Unay, D., Jain, S., Sima, D.M., Smeets, D., Ghafoorian, M., Platel, B., Birenbaum, A., Greenspan, H., Bazin, P.-L., Calabresi, P.A., Crainiceanu, C.M., Ellingsen, L.M., Reich, D.S., Prince, J.L., Pham, D.L., 2017. Longitudinal multiple sclerosis lesion segmentation: resource & challenge. NeuroImage 148, 77–102.
Ceccarelli, A., Jackson, J., Tauhid, S., Arora, A., Gorky, J., Dell'Oglio, E., Bakshi, A., Chitnis, T., Khoury, S.J., Weiner, H.L., et al., 2012. The impact of lesion in-painting and registration methods on voxel-based morphometry in detecting regional cerebral gray matter atrophy in multiple sclerosis. Am. J. Neuroradiol. 33 (8), 1579–1585.
Cerri, S., Hoopes, A., Greve, D.N., Mühlau, M., Van Leemput, K., 2020. A longitudinal method for simultaneous whole-brain and lesion segmentation in multiple sclerosis. In: Proceedings of the Third International Workshop in Machine Learning in Clinical Neuroimaging (accepted).
Chard, D.T., Griffin, C.M., Parker, G.J.M., Kapoor, R., Thompson, A.J., Miller, D.H., 2002. Brain atrophy in clinically early relapsing-remitting multiple sclerosis. Brain 125 (2), 327–337.
Chard, D.T., Jackson, J.S., Miller, D.H., Wheeler-Kingshott, C.A., 2010. Reducing the impact of white matter lesions on automated measures of brain gray and white matter volumes. J. Magn. Reson. Imaging 32 (1), 223–228.
Commowick, O., Istace, A.