Deep convolutional neural networks for multi-modality isointense infant brain image segmentation

Wenlu Zhang a, Rongjian Li a, Houtao Deng b, Li Wang c, Weili Lin d, Shuiwang Ji a,*, Dinggang Shen c,e,*

a Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
b Instacart, San Francisco, CA 94107, USA
c IDEA Lab, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, NC 27599, USA
d MRI Lab, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, NC 27599, USA
e Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea

* Corresponding authors. E-mail addresses: [email protected] (S. Ji), [email protected] (D. Shen).

NeuroImage 108 (2015) 214–224. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/ynimg
http://dx.doi.org/10.1016/j.neuroimage.2014.12.061
1053-8119/© 2014 Elsevier Inc. All rights reserved.

Article history: Accepted 23 December 2014. Available online 3 January 2015.

Keywords: Image segmentation; Multi-modality data; Infant brain image; Convolutional neural networks; Deep learning

Abstract

The segmentation of infant brain tissue images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) plays an important role in studying early brain development in health and disease. In the isointense stage (approximately 6–8 months of age), WM and GM exhibit similar levels of intensity in both T1 and T2 MR images, making the tissue segmentation very challenging. Only a small number of existing methods have been designed for tissue segmentation in this isointense stage; however, they only used a single T1 or T2 image, or the combination of T1 and T2 images. In this paper, we propose to use deep convolutional neural networks (CNNs) for segmenting isointense stage brain tissues using multi-modality MR images. CNNs are a type of deep models in which trainable filters and local neighborhood pooling operations are applied alternatingly on the raw input images, resulting in a hierarchy of increasingly complex features. Specifically, we used multi-modality information from T1, T2, and fractional anisotropy (FA) images as inputs and then generated the segmentation maps as outputs. The multiple intermediate layers applied convolution, pooling, normalization, and other operations to capture the highly nonlinear mappings between inputs and outputs. We compared the performance of our approach with that of the commonly used segmentation methods on a set of manually segmented isointense stage brain images. Results showed that our proposed model significantly outperformed prior methods on infant brain tissue segmentation. In addition, our results indicated that the integration of multi-modality images led to significant performance improvement.

Introduction

During the first year of postnatal human brain development, the brain tissues grow quickly, and the cognitive and motor functions undergo a wide range of development (Zilles et al., 1988; Paus et al., 2001; Fan et al., 2011). The segmentation of infant brain tissues into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is of great importance for studying early brain development in health and disease (Li et al., 2013a, 2013b; Nie et al., 2012; Yap et al., 2011; Gilmore et al., 2012). It is widely accepted that the segmentation of infant brains is more difficult than that of adult brains. This is mainly due to the lower tissue contrast in early-stage brains (Weisenfeld and Warfield, 2009). There are three distinct WM/GM contrast patterns in chronological order, which are infantile (birth), isointense, and adult-like (10 months and onward) (Paus et al., 2001). In this work, we focused on the isointense stage, which corresponds to the infant age of approximately 6–8 months. In this stage, WM and GM exhibit almost the same level of intensity in both T1 and T2 MR images. This property makes the tissue segmentation problem very challenging (Shi et al., 2010b).

Currently, most of the prior methods for infant brain MR image segmentation have focused on the infantile or adult-like stages (Cardoso et al., 2013; Gui et al., 2012; Shi et al., 2010a; Song et al., 2007; Wang et al., 2011, 2013; Weisenfeld and Warfield, 2009; Xue et al., 2007). They assumed that each tissue class can be modeled by a single Gaussian distribution or a mixture of Gaussian distributions (Xue et al., 2007; Wang et al., 2011; Shi et al., 2010a; Cardoso et al., 2013). This assumption may not be valid for the isointense stage, since the distributions of WM and GM largely overlap due to early maturation and myelination. In addition, many previous methods segmented the tissues using a single T1 or T2 image or the combination of T1 and T2 images (Kim et al., 2013; Leroy et al., 2011; Nishida et al., 2006; Weisenfeld et al., 2006a, 2006b). It has been shown that the fractional anisotropy (FA) images from diffusion tensor imaging provide rich information on major fiber bundles (Liu et al., 2007), especially in the middle of the first year (around 6–8 months of age). The studies in Wang et al. (2014, 2012) demonstrated that the complementary information from multiple image modalities was beneficial for dealing with the insufficient tissue contrast.
To overcome the above-mentioned difficulties, we considered deep convolutional neural networks (CNNs) in this work. CNNs (LeCun et al., 1998a; Krizhevsky et al., 2012) are a type of multi-layer, fully trainable models that can capture highly nonlinear mappings between inputs and outputs. These models were originally motivated by computer vision problems and thus are intrinsically suitable for image-related applications. In this work, we proposed to employ CNNs for segmenting infant tissue images in the isointense stage. One appealing property of CNNs is that they can naturally integrate and combine multi-modality brain images in determining the segmentation. Our CNNs took complementary and multi-modality information from T1, T2, and FA images as inputs and then generated the segmentation maps as outputs. The multiple intermediate layers applied convolution, pooling, normalization, and other operations to transform the input to the output. The networks contain millions of trainable parameters that were adjusted on a set of manually segmented data. Specifically, the networks took patches centered at a pixel as inputs and produced the tissue class of the center pixel as the output. This enabled the segmentation result of a pixel to be determined by all pixels in its neighborhood. In addition, due to the convolution operations applied at intermediate layers, nearby pixels contribute more to the segmentation results than those that are far away. We compared the performance of our approach with that of the commonly used segmentation methods. Results showed that our proposed model significantly outperformed prior methods on infant brain tissue segmentation. In addition, our results indicated that the integration of multi-modality images led to significant performance improvement. Furthermore, we showed that our CNN-based approach outperformed other methods by an increasingly large margin when the size of the patch increased. This is consistent with the fact that CNNs weight pixels differently based on their distance to the center pixel.

Material and methods

Data acquisition and image preprocessing

The experiments were performed with the approval of the Institutional Review Board (IRB). Written consent forms were obtained from the parents of all infants. We acquired T1, T2, and diffusion-weighted MR images of 10 healthy infants using a Siemens 3T head-only MR scanner. The infants were asleep, unsedated, fitted with ear protection, and their heads were secured in a vacuum-fixation device during the scan. T1 images with 144 sagittal slices were acquired with TR/TE of 1900/4.38 ms, a flip angle of 7°, and a resolution of 1 × 1 × 1 mm³. T2 images with 64 axial slices were acquired with TR/TE of 7380/119 ms, a flip angle of 150°, and a resolution of 1.25 × 1.25 × 1.95 mm³. Diffusion-weighted images (DWI) with 60 axial slices were acquired with TR/TE of 7680/82 ms, a resolution of 2 × 2 × 2 mm³, and 42 non-collinear diffusion gradients with a diffusion weight of 1000 s/mm².

T2 images and fractional anisotropy (FA) images, derived from distortion-corrected DWI, were first rigidly aligned with the T1 image and further up-sampled to an isotropic grid with a resolution of 1 × 1 × 1 mm³. A rescan was performed when the data were accompanied by moderate or severe motion artifacts (Blumenthal et al., 2002). We then applied intensity inhomogeneity correction (Sled et al., 1998) to both the T1 and the aligned T2 images (but not to the FA image, since it is not needed). After that, we applied skull stripping (Shi et al., 2012) and removal of the cerebellum and brain stem on the T1 image using in-house tools. In this way, we obtained a brain mask without the skull, cerebellum, and brain stem. With this brain mask, we finally removed the skull, cerebellum, and brain stem also from the aligned T2 and FA images.

To generate manual segmentations, an initial segmentation was obtained with the publicly available infant brain segmentation software iBEAT (Dai et al., 2013). Then, manual editing was carefully performed by an experienced rater according to the T1, T2, and FA images to correct possible segmentation errors. ITK-SNAP (Yushkevich et al., 2006) (www.itksnap.org) was used for interactive manual editing. For each infant brain image, there are generally about 100 axial slices; we randomly selected slices from the middle region (40th–60th slices) for manual segmentation. This work only used these manually segmented slices. Since we were not able to obtain the FA images of 2 subjects, we only used the remaining 8 subjects in this work. Note that pixels are treated as samples in segmentation tasks. For each subject, we generated more than 10,000 patches centered at each pixel from the T1, T2, and FA images. These patches were used as the training and testing samples in our study.
Deep CNN for multi-modality brain image segmentation

Deep learning models are a class of machines that can learn a hierarchy of features by building high-level features from low-level ones. Convolutional neural networks (CNNs) (LeCun et al., 1998a; Krizhevsky et al., 2012) are a type of deep models in which trainable filters and local neighborhood pooling operations are applied alternatingly on the raw input images, resulting in a hierarchy of increasingly complex features. One property of CNNs is their capability to capture highly nonlinear mappings between inputs and outputs (LeCun et al., 1998a). When trained with appropriate regularization, CNNs can achieve superior performance on visual object recognition and image classification tasks (LeCun et al., 1998a; Krizhevsky et al., 2012). In addition, CNNs have also been used in a few other applications. In Jain et al. (2007), Jain and Seung (2009), Turaga et al. (2010), and Helmstaedter et al. (2013), CNNs were applied to restore and segment volumetric electron microscopy images. Ciresan et al. (2013, 2012) applied deep CNNs to detect mitosis in breast histology images by using pixel classifiers based on patches.

In this work, we proposed to use CNNs for segmenting the infant brain tissues by combining multi-modality T1, T2, and FA images. Although CNNs have been used for similar tasks in prior studies, none of them has focused on integrating and combining multi-modality image data. Our CNN contained multiple input feature maps corresponding to different data modalities, thus providing a natural formalism for combining multi-modality data. Since different modalities might contain complementary information, our experimental results showed that combining multi-modality data with CNNs led to improved segmentation performance. Fig. 1 shows a CNN architecture we developed for segmenting infant brain images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF).
Deep CNN architectures

In this study, we designed four CNN architectures to segment infant brain tissues based on multi-modality MR images. In the following, we provide details on one of the CNN architectures, with an input patch size of 13 × 13, to explain the techniques used in this work. The detailed architecture is shown in Fig. 1. This CNN architecture contained three input feature maps corresponding to T1, T2, and FA image patches of 13 × 13. It then applied three convolutional layers and one fully connected layer. This network also applied local response normalization and softmax layers.

The first convolutional layer contained 64 feature maps. Each of the feature maps was connected to all of the three input feature maps through filters of size 5 × 5. We used a stride size of one pixel. This generated feature maps of size 9 × 9 in this layer. The second convolutional layer took the output of the first convolutional layer as input and contained 256 feature maps. Each of the feature maps was connected to all of the feature maps in the previous layer through filters of size 5 × 5. We again used a stride size of one pixel. The third convolutional layer contained 768 feature maps of size 1 × 1. They were connected to all feature maps in the previous layer through 5 × 5 filters. We also used a stride size of one pixel in this layer. The rectified linear unit (ReLU) function (Nair and Hinton, 2010) was applied after the convolution operation in all of the convolutional layers. It has been shown (Krizhevsky et al., 2012) that the use of ReLU can expedite the training of CNNs.
In addition to the convolutional layers, a few other layer types were used in the CNN. Specifically, the local response normalization scheme was applied after the third convolutional layer to enforce competition between features at the same spatial location across different feature maps. The fully-connected layer following the normalization layer had 3 outputs that correspond to the three tissue classes. A 3-way softmax layer was used to generate a distribution over the 3 class labels from the output of the fully-connected layer. Our network minimized the cross-entropy loss between the predicted label and the ground-truth label. In addition, we used dropout (Hinton et al., 2012) to learn more robust features and reduce overfitting. This technique sets the output of each neuron to zero with probability 0.5. The dropout was applied before the fully-connected layer in the CNN architecture of Fig. 1. In total, the number of trainable parameters for this architecture is 5,332,995.

We also considered three other CNN architectures with input patch sizes of 9 × 9, 17 × 17, and 22 × 22. These CNN architectures consisted of different numbers of convolutional layers and feature maps. Both local response normalization and softmax layers were applied in these architectures. We also used a max-pooling layer for the architecture with input patch size of 22 × 22, after the first convolutional layer. The pooling size was set to 2 × 2 and a stride size of 2 × 2 was used. The complete details of these architectures are given in Table 1. The numbers of trainable parameters for these architectures are 6,577,155, 5,947,523, and 5,332,995, respectively.

Fig. 1. Detailed architecture of the convolutional neural network taking patches of size 13 × 13 as inputs.
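The 3-way softmax output and cross-entropy loss described above can be written out as follows. This is a minimal sketch with illustrative class scores; the real network computes these quantities from its learned activations.

```python
import math

def softmax(scores):
    """Turn 3 class scores into a distribution over the CSF/GM/WM labels."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_label):
    """Negative log-probability of the ground-truth tissue class."""
    return -math.log(probs[true_label])

probs = softmax([2.0, 1.0, 0.1])           # illustrative scores
loss = cross_entropy(probs, true_label=0)  # smaller when the true class wins
```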
Table 1
Details of the CNN architectures with different input patch sizes used in this work. Conv., Norm., and Full conn. denote convolutional, normalization, and fully-connected layers, respectively.

Patch size 9 × 9:
  Layer type:      Conv.   Conv.   Norm.   Full conn.  softmax
  # of feat. maps: 256     1024    1024    3           3
  Filter size:     5 × 5   5 × 5
  Conv. stride:    1 × 1   1 × 1
  Input size:      9 × 9   5 × 5   1 × 1   1 × 1       1 × 1

Patch size 13 × 13:
  Layer type:      Conv.    Conv.   Conv.   Norm.   Full conn.  softmax
  # of feat. maps: 64       256     768     768     3           3
  Filter size:     5 × 5    5 × 5   5 × 5
  Conv. stride:    1 × 1    1 × 1   1 × 1
  Input size:      13 × 13  9 × 9   5 × 5   1 × 1   1 × 1       1 × 1

Patch size 17 × 17:
  Layer type:      Conv.    Conv.    Conv.   Conv.   Norm.   Full conn.  softmax
  # of feat. maps: 64       128      256     768     768     3           3
  Filter size:     5 × 5    5 × 5    5 × 5   5 × 5
  Conv. stride:    1 × 1    1 × 1    1 × 1   1 × 1
  Input size:      17 × 17  13 × 13  9 × 9   5 × 5   1 × 1   1 × 1       1 × 1

Patch size 22 × 22:
  Layer type:      Conv.    Pooling  Conv.   Conv.   Norm.   Full conn.  softmax
  # of feat. maps: 64       64       256     768     768     3           3
  Filter size:     5 × 5             5 × 5   5 × 5
  Pooling size:             2 × 2
  Pooling stride:           2 × 2
  Input size:      22 × 22  18 × 18  9 × 9   5 × 5   1 × 1   1 × 1       1 × 1

Model training and calibration

We trained the networks using data consisting of patches extracted from the MR images and the corresponding manual segmentation ground-truth images. In this work, we did not consider the segmentation of the background, as this is clear from the T1 images. Instead, we focused on segmenting the three tissue types (GM, WM, and CSF) from the foreground. For each foreground pixel, we extracted three patches centered at this pixel from the T1, T2, and FA images, respectively. The three patches were used as input feature maps of the CNNs. The corresponding output was a binary vector of length 3 indicating the tissue class to which the pixel belonged. This procedure generated more than 10,000 instances, each corresponding to three patches, from each subject. We used a leave-one-subject-out cross-validation procedure to evaluate the segmentation performance. Specifically, we used seven out of the eight subjects to train the network and used the remaining subject to evaluate the performance. The average performance across folds was reported. All the patches from each training subject were stored
in a batch file separately, leading to seven batch files in total. We used the patches in these seven batches consecutively as the input of the CNN for training. Note that the patches in each batch file were presented to the training algorithm in random order, as is common practice.

The weights in the networks were initialized randomly from a Gaussian distribution N(0, 1 × 10⁻⁴) (LeCun et al., 1998b). During training, the weights were updated by the stochastic gradient descent algorithm with a momentum of 0.9 and a weight decay of 4 × 10⁻⁴. The biases in the convolutional layers and the fully-connected layer were initialized to 1. The number of epochs was tuned on a validation set consisting of patches from one randomly selected subject in the training set. The learning rate was set to 4 × 10⁻⁴ initially.

Following Krizhevsky et al. (2012), we first used the validation set to obtain a coarse approximation of the optimal epoch by minimizing the validation error. This epoch number was used to train a model on the training and validation sets consisting of seven subjects. Then the learning rate was reduced by a factor of 10 twice successively, and the model was trained for about 10 epochs each time. By following this procedure, the network with a patch size of 13 × 13 was trained for about 370 epochs. The training took less than one day on a Tesla K20c GPU with 2496 cores. The networks with other patch sizes were trained in a similar way. One advantage of using CNNs for image segmentation is that, at test time, the entire image can be used as an input to the network to produce the segmentation map, and patch-level prediction is not needed (Giusti et al., 2013; Ning et al., 2005). This leads to very efficient segmentation at test time. For example, our CNN models took about 50–100 s to segment an image of size 256 × 256.
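The update rule described above (SGD with momentum 0.9 and weight decay 4 × 10⁻⁴) can be sketched for a single scalar parameter as follows. This is our illustrative reading of the standard momentum update, not code from the paper.

```python
# One SGD-with-momentum step for a scalar weight (illustrative sketch;
# hyperparameter defaults follow the values stated in the text).
def sgd_step(w, grad, velocity, lr=4e-4, momentum=0.9, weight_decay=4e-4):
    """Return the updated (weight, velocity) pair."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity

w, v = sgd_step(w=1.0, grad=0.5, velocity=0.0)  # w decreases slightly
```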
Results and discussion

Experimental setup

In the experiments, we focused on evaluating our CNN architectures for segmenting the three types of infant brain tissues. We formulated the prediction of brain tissue classes as a three-class classification task. For comparison purposes, we also implemented two other commonly used classification methods, namely the support vector machine (SVM) and the random forest (RF) (Breiman, 2001) methods. The linear
Fig. 2. Box plots of the segmentation performance achieved by CNNs over 8 subjects for different patch sizes. Each plot in the first column uses the Dice ratio to measure the performance for each of the three tissue types, and four different architectures are trained by using different patch sizes of 9 × 9, 13 × 13, 17 × 17, and 22 × 22, respectively. The performance is evaluated using leave-one-subject-out cross validation, and 8 test results are collected for each patch size of each plot. The central mark represents the median, and the edges of the box denote the 25th and 75th percentiles. The whiskers extend to the minimum and maximum values not considered outliers, and outliers are plotted individually. The plots in the right column are the results measured by MHD using the same configuration.
SVM was used in our experiments, as other kernels yielded lower performance empirically. The performance of SVM was generated by tuning the regularization parameters using cross validation. An RF is a tree-based ensemble model in which a set of randomized trees is built and the final decision is made by majority voting over all trees. This method has been used in image-related applications (Amit and Geman, 1997), including medical image segmentation (Criminisi and Shotton, 2013; Criminisi et al., 2012). In this work, we used RFs containing 100 trees, and each tree was grown fully and unpruned. The number of features randomly selected at each node to compete for the best split was set to the square root of the total number of features. We used the randomForest R package (Liaw and Wiener, 2002) in the experiments. We reshaped the raw training patches into vectors whose elements were considered as the input features of SVM and RF. We also compared our methods with two common image segmentation methods, namely the coupled level set (CLS) (Wang et al., 2011) and the majority voting (MV) methods. Note that the method based on local dictionaries of patches proposed in Wang et al. (2014) requires the images of different subjects to be registered, since a local dictionary was constructed by using patches extracted from the corresponding locations in the training images. We thus did not compare our methods with the one in Wang et al. (2014).

Fig. 3. Visualization of the 64 filters in the first convolutional layer for the model with an input patch size of 13 × 13.
To evaluate the segmentation performance, we used the Dice ratio (DR) to quantitatively measure the segmentation accuracy. Specifically, let A and B denote the binary segmentation labels generated manually and computationally, respectively, for one tissue class of a certain subject. The Dice ratio is defined as

DR(A, B) = 2|A ∩ B| / (|A| + |B|),

where |A| denotes the number of positive elements in the binary segmentation A, and |A ∩ B| is the number of positive elements shared by A and B. The Dice ratio lies in [0, 1], and a larger value indicates a higher segmentation accuracy.

We also used another measure known as the modified Hausdorff distance (MHD). Supposing that C and D are two sets of positive pixels identified manually and computationally, respectively, for one tissue class of a certain subject, the MHD is defined as

MHD(C, D) = max(d(C, D), d(D, C)),

where d(C, D) = max_{c ∈ C} d(c, D), and the distance between a point c and a set of points D is defined as d(c, D) = min_{d ∈ D} ||c − d||. A smaller value indicates a higher proximity of the two point sets, thus implying a higher segmentation accuracy.
Table 2
Comparison of segmentation performance over different image modalities achieved by CNN with input patch size of 13 × 13 on each subject, in terms of Dice ratio. The best performance for each tissue segmentation task is highlighted.

         Sub. 1  Sub. 2  Sub. 3  Sub. 4  Sub. 7  Sub. 8  Sub. 9  Sub. 10
CSF T1   0.7797  0.7824  0.7928  0.8072  0.7931  0.8076  0.7610  0.8176
    T2   0.6906  0.7238  0.7459  0.7614  0.7792  0.7737  0.7328  0.7934
    FA   0.6021  0.5838  0.6068  0.5378  0.6076  0.6001  0.6374  0.5408
    All  0.8323  0.8314  0.8304  0.8373  0.8482  0.8492  0.8211  0.8339
GM  T1   0.8123  0.8039  0.8001  0.7529  0.7693  0.7499  0.7273  0.8146
    T2   0.6094  0.5884  0.6026  0.4973  0.5897  0.6142  0.6027  0.6266
    FA   0.7256  0.7459  0.7282  0.6065  0.7224  0.7126  0.6991  0.7827
    All  0.8531  0.8572  0.8848  0.8184  0.8119  0.8652  0.8628  0.8607
WM  T1   0.8241  0.7476  0.8269  0.7751  0.8006  0.8223  0.7527  0.7996
    T2   0.6942  0.7021  0.7181  0.6318  0.6917  0.7001  0.6892  0.6979
    FA   0.8082  0.6816  0.6627  0.7238  0.7824  0.7774  0.8131  0.8163
    All  0.8798  0.8116  0.8824  0.8489  0.8689  0.8677  0.8742  0.8760

Comparison of different CNN architectures

The nonlinear relationship between the inputs and outputs of a CNN is represented by its multi-layer architecture using convolution, pooling, and normalization. We first studied the impact of different CNN architectures on segmentation accuracy. We devised four different architectures, and the detailed configurations are described in Table 1. The classification performance of these architectures is reported in Fig. 2 using box plots. It can be observed from the results that the predictive performance is generally higher for the architectures with input patch sizes of 13 × 13 and 17 × 17. This result is consistent with the fact that networks with more convolutional layers and feature maps tend to have a deeper hierarchical structure and more trainable parameters. Thus, these networks are capable of capturing the complex relationship between input and output. We can also observe that the architecture with input patch size of 22 × 22 did not generate substantially higher predictive performance, suggesting that the pooling operation might not be suitable for the data we used. In the following, we focused on evaluating the performance of the CNN with input patch size of 13 × 13. To examine the patterns captured by the CNN models, we visualized the 64 filters in the first convolutional layer for the model with an input patch size of 13 × 13 in Fig. 3. Similar to the observation in
Zeiler and Fergus (2014), these filters capture primitive image features such as edges and corners.
Effectiveness of integrating multi-modality data

To demonstrate the effectiveness of integrating multi-modality data, we considered the performance achieved by each single image modality. Specifically, the T1, T2, and FA images of each subject were separately used as the input of the architecture with a patch size of 13 × 13 in Table 1. The segmentation performance achieved using different modalities is presented in Tables 2 and 3. It can be observed that the combination of different image modalities invariably yielded higher performance than any single image modality. We can also see that the T1 images produced the highest performance among the three modalities. This suggests that the T1 images are the most informative in discriminating the three tissue types. Another interesting observation is that the FA images are very informative in distinguishing GM and WM, but they achieved low performance on CSF. This might be because the anisotropic diffusion is hardly detectable using FA for liquids such as the cerebrospinal fluid in the brain. In contrast, T2 images
Table 3
Comparison of segmentation performance over different image modalities achieved by CNN with input patch size of 13 × 13 on each subject, in terms of modified Hausdorff distance (MHD). The best performance for each tissue segmentation task is highlighted.

         Sub. 1  Sub. 2  Sub. 3  Sub. 4  Sub. 7  Sub. 8  Sub. 9  Sub. 10
CSF T1   0.7245  0.6724  0.6428  0.6072  0.5537  0.5027  0.6021  0.4478
    T2   0.9048  0.8228  0.7932  0.6978  0.6004  0.5938  0.6989  0.5457
    FA   1.2446  1.3895  1.3348  1.4277  1.3271  1.4297  0.9312  1.3389
    All  0.6320  0.3293  0.3659  0.4395  0.4268  0.4482  0.4970  0.3442
GM  T1   0.5069  0.4237  0.4892  0.6528  0.6187  0.6691  0.6843  0.3971
    T2   1.2372  1.3728  1.2871  1.8421  1.5980  1.2963  1.3325  1.2241
    FA   0.6839  0.6781  0.6538  0.9479  0.6843  0.6945  0.7461  0.4322
    All  0.2067  0.2490  0.2010  0.2964  0.4398  0.2367  0.1839  0.1719
WM  T1   0.4796  0.6526  0.4232  0.6455  0.4726  0.4271  0.5023  0.4047
    T2   0.9171  0.7381  0.7974  1.0043  0.9423  0.7169  0.8274  0.8934
    FA   0.4162  0.8924  1.0258  0.7523  0.6228  0.5428  0.4238  0.5016
    All  0.2258  0.4362  0.2401  0.3275  0.2504  0.3050  0.3029  0.2271
Table 4
Segmentation performance in terms of Dice ratio achieved by the convolutional neural network (CNN), random forest (RF), support vector machine (SVM), coupled level sets (CLS), and majority voting (MV). The highest performance in each case is highlighted, and the statistical significance of the results is given in Table 6.

         Sub. 1  Sub. 2  Sub. 3  Sub. 4  Sub. 7  Sub. 8  Sub. 9  Sub. 10
CSF CNN  0.8323  0.8314  0.8304  0.8373  0.8482  0.8492  0.8211  0.8339
    RF   0.8192  0.8135  0.8323  0.8090  0.8306  0.8457  0.7904  0.7955
    SVM  0.7409  0.7677  0.7733  0.7429  0.7006  0.7837  0.7243  0.7333
    CLS  0.8064  0.8152  0.7320  0.8614  0.8397  0.8238  0.8087  0.8280
    MV   0.7072  0.6926  0.6826  0.6348  0.6313  0.6136  0.6904  0.6920
GM  CNN  0.8531  0.8572  0.8848  0.8184  0.8119  0.8652  0.8628  0.8607
    RF   0.8288  0.8482  0.8772  0.8078  0.7976  0.8498  0.8461  0.8353
    SVM  0.7933  0.7991  0.8294  0.7527  0.7416  0.7996  0.8017  0.8038
    CLS  0.8298  0.8389  0.8498  0.8343  0.8130  0.8719  0.8612  0.8421
    MV   0.8490  0.8442  0.8525  0.8027  0.7831  0.7970  0.8372  0.8299
WM  CNN  0.8798  0.8116  0.8824  0.8489  0.8689  0.8677  0.8742  0.8760
    RF   0.8612  0.7816  0.8687  0.8373  0.8479  0.8575  0.8393  0.8353
    SVM  0.8172  0.7404  0.7623  0.8030  0.7997  0.7919  0.7059  0.7585
    CLS  0.8383  0.8054  0.7998  0.8238  0.8437  0.8213  0.8297  0.8107
    MV   0.8631  0.8002  0.8504  0.8171  0.8389  0.8373  0.8412  0.8445

Table 5
Segmentation performance in terms of modified Hausdorff distance (MHD) achieved by the convolutional neural network (CNN), random forest (RF), support vector machine (SVM), coupled level sets (CLS), and majority voting (MV). The best performance in each case is highlighted, and the statistical significance of the results is given in Table 6.

         Sub. 1  Sub. 2  Sub. 3  Sub. 4  Sub. 7  Sub. 8  Sub. 9  Sub. 10
CSF CNN  0.6320  0.3293  0.3659  0.4395  0.4268  0.4482  0.4970  0.3442
    RF   1.0419  0.5914  0.6802  0.9042  0.3610  0.4935  0.9151  0.6949
    SVM  1.1426  0.8867  0.7571  0.9014  1.0020  0.4743  1.1789  0.8866
    CLS  0.6420  0.3487  0.8151  0.4875  0.4987  0.4939  0.4717  0.4986
    MV   1.5287  1.3788  1.3566  1.5178  1.4157  2.1068  1.1156  1.2889
GM  CNN  0.2067  0.2490  0.2010  0.2964  0.4398  0.2367  0.1839  0.1719
    RF   0.2771  0.2739  0.1524  0.3033  0.3429  0.2315  0.2517  0.2708
    SVM  0.5247  0.2916  0.3566  0.4015  0.6308  0.3809  0.4466  0.4362
    CLS  0.3615  0.2950  0.2683  0.3577  0.3872  0.2536  0.2530  0.3655
    MV   0.2834  0.2743  0.2483  0.3395  0.4316  0.3569  0.2687  0.3324
WM  CNN  0.2258  0.4362  0.2401  0.3275  0.2504  0.3050  0.3029  0.2271
    RF   0.3022  0.7981  0.2648  0.3201  0.5020  0.3321  0.3268  0.3909
    SVM  0.3218  0.8290  0.5276  0.5751  0.4784  0.4445  0.9407  0.6029
    CLS  0.6320  0.4923  0.7207  0.5425  0.6947  0.4485  0.5627  0.7216
    MV   0.3063  0.5314  0.2824  0.2907  0.2922  0.3323  0.3271  0.3751
are more powerful for capturing CSF than GM and WM. These results demonstrated that certain modalities are more informative in distinguishing certain tissue types, and that the combination of all modalities leads to improved segmentation performance.
Comparison with other methods

In order to provide a comprehensive and quantitative evaluation of the proposed method, we reported the segmentation performance on all 8 subjects using leave-one-subject-out cross validation. The performance of CNN, RF, SVM, CLS, and MV is reported in Tables 4 and 5 using the Dice ratio and MHD, respectively. We can observe from these two tables that CNN outperformed the other methods for segmenting
Table 6
Statistical test results comparing CNN with RF, SVM, CLS, and MV, respectively. We calculated the p-values by performing one-sided Wilcoxon signed rank tests using the performance reported in Tables 4 and 5. We performed the left-sided test for the Dice ratio and the right-sided test for the MHD.

                         CSF       GM        WM
Dice ratio  CNN vs. RF   3.30E-03  1.55E-04  4.02E-04
            CNN vs. SVM  2.55E-05  2.51E-09  1.87E-04
            CNN vs. CLS  6.59E-02  8.88E-02  8.37E-04
            CNN vs. MV   6.22E-06  2.50E-03  1.71E-05
MHD         CNN vs. RF   2.30E-03  2.72E-01  2.16E-02
            CNN vs. SVM  1.39E-04  3.67E-04  7.99E-04
            CNN vs. CLS  5.75E-02  1.85E-02  5.57E-04
            CNN vs. MV   1.09E-05  4.30E-03  1.52E-02
W. Zhang et al. / NeuroImage 108 (2015) 214–224

Fig. 4. Comparison of the segmentation results with the manually generated segmentation on Subject 1. The first row shows the original multi-modality data (T1, T2 and FA), and the second row shows the manual segmentations (CSF, GM, and WM). The third and fourth rows show the segmentation results by CNN and RF, respectively.
all three types of brain tissues in most cases. Specifically, CNN could achieve Dice ratios of 83.55% ± 0.94% (CSF), 85.18% ± 2.45% (GM), and 86.37% ± 2.34% (WM) on average over the 8 subjects, yielding an overall value of 85.03% ± 2.27%. In contrast, RF, SVM, CLS, and MV achieved overall Dice ratios of 83.15% ± 2.52%, 76.95% ± 3.55%, 82.62% ± 2.76%, and 77.64% ± 8.28%, respectively. Meanwhile, CNN also outperformed the other methods in terms of MHD. Specifically, CNN could achieve MHDs of 0.4354 ± 0.0979 (CSF), 0.2482 ± 0.0871 (GM), and 0.2894 ± 0.0710 (WM), yielding an overall value of 0.3243 ± 0.1161. In contrast, RF, SVM, CLS, and MV achieved overall MHDs of 0.4593 ± 0.2506, 0.6424 ± 0.2665, 0.4839 ± 0.1597, and 0.7076 ± 0.5721, respectively.
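For reference, the Dice ratio used above measures the overlap between a predicted label set and the ground-truth label set for one tissue class. A minimal sketch with toy 1D label arrays (not the paper's data):

```python
def dice_ratio(seg, truth, label):
    """Dice overlap, 2|A ∩ B| / (|A| + |B|), for one tissue label."""
    a = {i for i, v in enumerate(seg) if v == label}
    b = {i for i, v in enumerate(truth) if v == label}
    return 2 * len(a & b) / (len(a) + len(b))

# Toy maps with labels 1 = CSF, 2 = GM, 3 = WM (illustrative only).
seg   = [1, 1, 2, 2, 3, 3, 1, 2]
truth = [1, 1, 2, 3, 3, 3, 1, 1]

d1 = dice_ratio(seg, truth, 1)   # 2*3 / (3+4) = 6/7
d3 = dice_ratio(seg, truth, 3)   # 2*2 / (2+3) = 4/5
```

A Dice ratio of 1 means perfect agreement; the overall values reported above average this score over tissue classes and subjects.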
To assess the statistical significance of the performance differences, we performed one-sided Wilcoxon signed rank tests on both the Dice ratio and the MHD produced over the 8 subjects, and the p-values were reported in Table 6. When considering the Dice ratio, we chose the left-sided test with the alternative hypothesis that the averaged performance of CNN is higher than that of RF, SVM, CLS, or MV. The right-sided test was considered for the MHD. We can see that the proposed CNN method significantly outperformed SVM, RF, CLS, and MV in most cases. These results demonstrated that CNN is effective in segmenting the infant brain tissues as compared to other methods.
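The paired, one-sided test above can be sketched as follows. The per-subject scores here are hypothetical placeholders, not the values from Tables 4 and 5, and the exact-enumeration routine is a minimal illustration rather than the authors' implementation:

```python
from itertools import product

def wilcoxon_signed_rank_p(x, y, alternative="greater"):
    """Exact one-sided Wilcoxon signed-rank p-value for paired samples.

    alternative="greater": H1 is that x tends to exceed y.
    Exact enumeration is feasible for small n (here, 8 subjects);
    assumes no ties among the absolute differences.
    """
    if alternative == "less":
        return wilcoxon_signed_rank_p(y, x, "greater")
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    # Rank the absolute differences (1 = smallest).
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # Observed rank sum of the positive differences.
    w_pos = sum(r for r, di in zip(ranks, d) if di > 0)
    # Null distribution: every sign assignment of ranks 1..n is equally likely.
    count = sum(
        1 for signs in product([0, 1], repeat=n)
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) >= w_pos
    )
    return count / 2 ** n

# Hypothetical per-subject Dice scores for 8 subjects.
cnn_dice = [0.860, 0.840, 0.850, 0.830, 0.870, 0.850, 0.840, 0.860]
rf_dice  = [0.845, 0.822, 0.840, 0.805, 0.848, 0.818, 0.810, 0.839]

p = wilcoxon_signed_rank_p(cnn_dice, rf_dice, "greater")  # 1/256, as CNN wins on all 8
```

For the MHD, where lower is better, the same routine would be called with `alternative="less"`.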
In addition to quantitatively demonstrating the advantage of the proposed CNN method, we visually examined the segmentation results of different tissues for two subjects in Figs. 4 and 5. The original T1, T2, and FA images were shown in the first row, and the following three rows presented the segmentation results of human experts, CNN, and RF, respectively. It can be seen that the segmentation patterns of CNN
Fig. 5. Comparison of the segmentation results with the manually generated segmentation on Subject 2. The first row shows the original multi-modality data (T1, T2 and FA), and the second row shows the manual segmentations (CSF, GM, and WM). The third and fourth rows show the segmentation results by CNN and RF, respectively.
Fig. 6. Label difference maps of the results generated by CNN and RF on Subject 1. The first row shows the original images and manual segmentation (T1, T2, FA, and manual segmentation). The second and third rows show the results by CNN and RF (CSF, GM, WM, segmentation result). In each label difference map, the dark blue color indicates false positives and the dark green color indicates false negatives.
Fig. 7. Label difference maps of the results generated by CNN and RF on Subject 2. The first row shows the original images and manual segmentation (T1, T2, FA, and manual segmentation). The second and third rows show the results by CNN and RF (CSF, GM, WM, segmentation result). In each label difference map, the dark blue color indicates false positives and the dark green color indicates false negatives.
are quite similar to the ground truth data generated by human experts. In contrast, RF generated more defects and fuzzy boundaries for different tissues. These results further showed that the proposed CNN method was more effective than the other methods.
In order to further compare the results of different methods, the label difference maps that compare the ground-truth segmentation with the predicted segmentation were also presented. In Figs. 6 and 7, the original T1, T2, and FA images and the ground-truth segmentations for the two subjects were shown in the first rows. The false positives and false negatives of CNN and RF were given in the second and third rows, respectively. We also showed the segmentation results in these two figures. We can see that CNN outperformed RF in both the number of false pixels and the performance of tissue boundary detection. For example, RF generated more false positives around the surface of the brain, and also more false negatives around the hippocampus for white matter on Subject 2. We can also observe that most of the mis-classified pixels are located in areas having large tissue contrast, such as cortices consisting of gyri and sulci. This might be explained by the fact that our segmentation methods are patch-based, and patches centered at boundary pixels contain pixels of multiple tissue types.
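The boundary effect described above can be made concrete with a small sketch: in a toy label map, a patch centered well inside one tissue contains a single label, while a patch centered on a tissue boundary mixes labels. The function and label map below are illustrative only:

```python
def patch_labels(label_map, row, col, size):
    """Collect the tissue labels inside the size x size patch centered at (row, col)."""
    half = size // 2
    labels = set()
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(label_map) and 0 <= c < len(label_map[0]):
                labels.add(label_map[r][c])
    return labels

# Toy two-tissue map: left half WM (1), right half GM (2).
toy = [[1, 1, 1, 2, 2, 2] for _ in range(6)]

interior = patch_labels(toy, 3, 1, 3)   # patch fully inside WM -> {1}
boundary = patch_labels(toy, 3, 3, 3)   # patch straddling the WM/GM edge -> {1, 2}
```

A classifier assigning one label per patch center must therefore decide from mixed evidence exactly at the boundaries, which is where the mis-classified pixels concentrate.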
To compare the performance between CNNs and RF when the patch size varies, we reported the performance differences between CNNs and RF averaged over the 8 subjects for different input patch sizes in Fig. 8. We can observe that the performance gains of CNNs over RF are generally amplified for an increased input patch size. This difference is even more significant for the results of CSF and WM, which have more restricted distributions than GM.
This is because RF treated each pixel independently and, therefore, did not leverage the spatial relationships between pixels. In comparison, CNNs weighted pixels differently based on their spatial distance to the center pixel, enabling the retention of spatial information. The impact of this essential difference between CNNs and RF is expected to be more significant with a larger patch size, since more spatial information is ignored by RF. This difference probably also explains why CNNs could segment the boundary pixels with a higher accuracy, as shown in Figs. 4 and 5.

Fig. 8. Comparison of performance differences between CNNs and RF averaged over 8 subjects for patch sizes of 9 × 9, 13 × 13, 17 × 17, and 22 × 22, respectively. The performance differences were obtained by subtracting the performance of RF from that of CNNs. The left and right figures show the results of Dice ratio and MHD, respectively.
Conclusion and future work
In this study, we aimed at segmenting infant brain tissue images in the isointense stage. This was achieved by employing CNNs with multiple intermediate layers to integrate and combine multi-modality brain images. The CNNs used the complementary, multi-modality information from T1, T2, and FA images as input feature maps and generated the segmentation labels as output feature maps. We compared the performance of our approach with that of commonly used segmentation methods. Results showed that our proposed model significantly outperformed prior methods on infant brain tissue segmentation. Overall, our experiments demonstrated that CNNs could produce more accurate computational modeling and segmentation results on infant brain tissue images.
In this work, the tissue segmentation problem was formulated as a patch classification task, where the relationship among patches was ignored. Some prior work has incorporated geometric constraints into segmentation models (Wang et al., 2014). We will improve our CNN models to include similar constraints in the future. In the current experiments, we employed CNNs with a few hidden layers. Recent studies showed that CNNs with many hidden layers yielded very promising performance on visual recognition tasks when appropriate regularization was applied (Krizhevsky et al., 2012). We will explore CNNs with many hidden layers in the future as more data become available. In the current study, we used all the patches extracted from each subject for training the convolutional neural network. The number of patches from each tissue type is not balanced, and this imbalance might affect the prediction performance. To combat this imbalance problem, we might use sampling and ensemble learning, although this will further increase the training time. The current work used 2D CNNs for image segmentation, because only selected slices have been manually segmented in the current data set. In principle, CNNs could be used to segment 3D images when labeled data are available. In this case, it is more natural to apply 3D CNNs (Ji et al., 2013), as such models have been developed for processing 3D video data. The computational costs for training and testing 3D CNNs might be higher than those for 2D CNNs, as 3D convolutions are involved in these networks. We will explore these higher-order models in the future.
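The sampling idea mentioned above could be sketched as follows; this is a generic undersampling strategy under our own assumptions, not the paper's implementation, and the patch lists are placeholders:

```python
import random

def balanced_sample(patches_by_class, seed=0):
    """Undersample every tissue class to the size of the smallest class.

    patches_by_class: dict mapping tissue label -> list of training patches.
    Returns a shuffled list of (patch, label) pairs with equal class counts;
    a simple remedy for class imbalance, at the cost of discarding data.
    """
    rng = random.Random(seed)
    n = min(len(patches) for patches in patches_by_class.values())
    sample = []
    for label, patches in patches_by_class.items():
        for patch in rng.sample(patches, n):
            sample.append((patch, label))
    rng.shuffle(sample)
    return sample

# Toy patch counts mimicking the imbalance across tissue types.
data = {"CSF": list(range(30)), "GM": list(range(100)), "WM": list(range(70))}
train = balanced_sample(data)   # 3 classes * 30 patches each
```

An ensemble variant would repeat this with different seeds and aggregate the resulting classifiers, so that the discarded majority-class patches still contribute across ensemble members.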
Acknowledgments
This work was supported by the National Science Foundation grants DBI-1147134 and DBI-1350258, and the National Institutes of Health grants EB006733, EB008374, EB009634, AG041721, MH100217, and AG042599.
References
Amit, Y., Geman, D., 1997. Shape quantization and recognition with randomized trees. Neural Comput. 9 (7), 1545–1588.
Blumenthal, J.D., Zijdenbos, A., Molloy, E., Giedd, J.N., 2002. Motion artifact in magnetic resonance imaging: implications for automated analysis. Neuroimage 16 (1), 89–92.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Cardoso, M.J., Melbourne, A., Kendall, G.S., Modat, M., Robertson, N.J., Marlow, N., Ourselin, S., 2013. AdaPT: an adaptive preterm segmentation algorithm for neonatal brain MRI. Neuroimage 65, 97–108.
Ciresan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J., 2012. Deep neural networks segment neuronal membranes in electron microscopy images. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 25. Curran Associates, Inc., pp. 2843–2851.
Ciresan, D.C., Giusti, A., Gambardella, L.M., Schmidhuber, J., 2013. Mitosis detection in breast cancer histology images with deep neural networks. Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, vol. 2, pp. 411–418.
Criminisi, A., Shotton, J., Konukoglu, E., 2012. Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found. Trends Comput. Graph. Vis. 7 (2–3), 81–227.
Criminisi, A., Shotton, J., 2013. Decision Forests for Computer Vision and Medical Image Analysis. Springer.
Dai, Y., Shi, F., Wang, L., Wu, G., Shen, D., 2013. iBEAT: a toolbox for infant brain magnetic resonance image processing. Neuroinformatics 11 (2), 211–225.
Fan, Y., Shi, F., Smith, J.K., Lin, W., Gilmore, J.H., Shen, D., 2011. Brain anatomical networks in early human brain development. Neuroimage 54 (3), 1862–1871.
Gilmore, J.H., Shi, F., Woolson, S.L., Knickmeyer, R.C., Short, S.J., Lin, W., Zhu, H., Hamer, R.M., Styner, M., Shen, D., 2012. Longitudinal development of cortical and subcortical gray matter from birth to 2 years. Cereb. Cortex 22 (11), 2478–2485.
Giusti, A., Ciresan, D.C., Masci, J., Gambardella, L.M., Schmidhuber, J., 2013. Fast image scanning with deep max-pooling convolutional neural networks. 2013 IEEE International Conference on Image Processing, pp. 4034–4038.
Gui, L., Lisowski, R., Faundez, T., Hüppi, P.S., Lazeyras, F., Kocher, M., 2012. Morphology-driven automatic segmentation of MR images of the neonatal brain. Med. Image Anal. 16 (8), 1565–1579.
Helmstaedter, M., Briggman, K.L., Turaga, S.C., Jain, V., Seung, H.S., Denk, W., 2013. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500 (7461), 168–174.
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R., 2012. Improving Neural Networks by Preventing Co-adaptation of Feature Detectors (arXiv preprint arXiv:1207.0580).
Jain, V., Murray, J.F., Roth, F., Turaga, S., Zhigulin, V., Briggman, K.L., Helmstaedter, M.N., Denk, W., Seung, H.S., 2007. Supervised learning of image restoration with convolutional networks. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, pp. 1–8.
Jain, V., Seung, S., 2009. Natural image denoising with convolutional networks. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (Eds.), Advances in Neural Information Processing Systems 21, pp. 769–776.
Ji, S., Xu, W., Yang, M., Yu, K., 2013. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35 (1), 221–231.
Kim, S.H., Fonov, V.S., Dietrich, C., Vachet, C., Hazlett, H.C., Smith, R.G., Graves, M.M., Piven, J., Gilmore, J.H., Dager, S.R., et al., 2013. Adaptive prior probability and spatial temporal intensity change estimation for segmentation of the one-year-old human brain. J. Neurosci. Methods 212 (1), 43–55.
Krizhevsky, A., Sutskever, I., Hinton, G., 2012. ImageNet classification with deep convolutional neural networks. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 25, pp. 1106–1114.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998a. Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), 2278–2324.
LeCun, Y., Bottou, L., Orr, G.B., Müller, K.-R., 1998b. Efficient backprop. Neural Networks: Tricks of the Trade. Springer, Berlin Heidelberg, pp. 9–50.
Leroy, F., Mangin, J.-F., Rousseau, F., Glasel, H., Hertz-Pannier, L., Dubois, J., Dehaene-Lambertz, G., 2011. Atlas-free surface reconstruction of the cortical grey–white interface in infants. PLoS One 6 (11).
Li, G., Nie, J., Wang, L., Shi, F., Lin, W., Gilmore, J.H., Shen, D., 2013a. Mapping region-specific longitudinal cortical surface expansion from birth to 2 years of age. Cereb. Cortex 23 (11), 2724–2733.
Li, G., Wang, L., Shi, F., Lin, W., Shen, D., 2013b. Multi-atlas based simultaneous labeling of longitudinal dynamic cortical surfaces in infants. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, pp. 58–65.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News 2 (3), 18–22.
Liu, T., Li, H., Wong, K., Tarokh, A., Guo, L., Wong, S.T., 2007. Brain tissue segmentation based on DTI data. Neuroimage 38 (1), 114–123.
Nair, V., Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814.
Nie, J., Li, G., Wang, L., Gilmore, J.H., Lin, W., Shen, D., 2012. A computational growth model for measuring dynamic cortical development in the first year of life. Cereb. Cortex 22, 2272–2284 (Oxford Univ Press).
Ning, F., Delhomme, D., LeCun, Y., Piano, F., Bottou, L., Barbano, P.E., 2005. Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process. 14 (9), 1360–1371.
Nishida, M., Makris, N., Kennedy, D.N., Vangel, M., Fischl, B., Krishnamoorthy, K.S., Caviness, V.S., Grant, P.E., 2006. Detailed semiautomated MRI based morphometry of the neonatal brain: preliminary results. Neuroimage 32 (3), 1041–1049.
Paus, T., Collins, D., Evans, A., Leonard, G., Pike, B., Zijdenbos, A., 2001. Maturation of white matter in the human brain: a review of magnetic resonance studies. Brain Res. Bull. 54 (3), 255–266.
Shi, F., Fan, Y., Tang, S., Gilmore, J.H., Lin, W., Shen, D., 2010a. Neonatal brain image segmentation in longitudinal MRI studies. Neuroimage 49 (1), 391–400.
Shi, F., Yap, P.-T., Gilmore, J.H., Lin, W., Shen, D., 2010b. Spatial–temporal constraint for segmentation of serial infant brain MR images. Medical Imaging and Augmented Reality. Springer, pp. 42–50.
Shi, F., Wang, L., Dai, Y., Gilmore, J.H., Lin, W., Shen, D., 2012. LABEL: pediatric brain extraction using learning-based meta-algorithm. Neuroimage 62 (3), 1975–1986.
Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. Med. Imaging IEEE Trans. 17 (1), 87–97.
Song, Z., Awate, S.P., Licht, D.J., Gee, J.C., 2007. Clinical neonatal brain MRI segmentation using adaptive nonparametric data models and intensity-based Markov priors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007. Springer, pp. 883–890.
Turaga, S.C., Murray, J.F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., Denk, W., Seung, H.S., 2010. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput. 22 (2), 511–538.
Wang, L., Shi, F., Lin, W., Gilmore, J.H., Shen, D., 2011. Automatic segmentation of neonatal images using convex optimization and coupled level sets. Neuroimage 58 (3), 805–817.
Wang, L., Shi, F., Yap, P.-T., Gilmore, J.H., Lin, W., Shen, D., 2012. 4D multi-modality tissue segmentation of serial infant images. PLoS One 7 (9).
Wang, L., Shi, F., Yap, P.-T., Lin, W., Gilmore, J.H., Shen, D., 2013. Longitudinally guided level sets for consistent tissue segmentation of neonates. Hum. Brain Mapp. 34 (4), 956–972.
Wang, L., Shi, F., Gao, Y., Li, G., Gilmore, J.H., Lin, W., Shen, D., 2014. Integration of sparse multi-modality representation and anatomical constraint for isointense infant brain MR image segmentation. Neuroimage 89, 152–164.
Weisenfeld, N.I., Mewes, A., Warfield, S.K., 2006a. Segmentation of newborn brain MRI. Biomedical Imaging: Nano to Macro, 2006. 3rd IEEE International Symposium on. IEEE, pp. 766–769.
Weisenfeld, N.I., Mewes, A.U., Warfield, S.K., 2006b. Highly accurate segmentation of brain tissue and subcortical gray matter from newborn MRI. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006. Springer, pp. 199–206.
Weisenfeld, N.I., Warfield, S.K., 2009. Automatic segmentation of newborn brain MRI. Neuroimage 47 (2), 564–572.
Xue, H., Srinivasan, L., Jiang, S., Rutherford, M., Edwards, A.D., Rueckert, D., Hajnal, J.V., 2007. Automatic segmentation and reconstruction of the cortex from neonatal MRI. Neuroimage 38 (3), 461–477.
Yap, P.T., Fan, Y., Chen, Y., Gilmore, J.H., Lin, W., Shen, D., 2011. Development trends of white matter connectivity in the first years of life. PLoS One 6 (9), e24678.
Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31 (3), 1116–1128.
Zeiler, M.D., Fergus, R., 2014. Visualizing and understanding convolutional networks. Proceedings of the European Conference on Computer Vision. Springer, pp. 818–833.
Zilles, K., Armstrong, E., Schleicher, A., Kretschmann, H.-J., 1988. The human pattern of gyrification in the cerebral cortex. Anat. Embryol. 179 (2), 173–179.