Convolutional neural networks for automated edema segmentation in brain CT images of patients with intracerebral hemorrhage

Student: D.G. (Dyantha) van der Sluijs, 10164391, [email protected]
Location: Biomedical Engineering and Physics, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam-Zuidoost
Supervisor: Dr. H.A. Marquering, Biomedical Engineering and Physics, [email protected]
Tutor: Dr. F.P.J.M. Voorbraak, Medical Informatics, [email protected]
February 2017 - January 2018
Contents

1 Introduction
2 Preliminaries
  2.1 Artificial neural networks
  2.2 Gradient descent
  2.3 Convolutional Neural Networks (CNNs)
3 Methods and materials
  3.1 Dataset
  3.2 Inclusion and exclusion
  3.3 Pre-processing
    3.3.1 Manual segmentations
    3.3.2 Patches
    3.3.3 Thin- and thick-sliced images
  3.4 Hardware and software
  3.5 CNN training
    3.5.1 CNN architecture
    3.5.2 Training models
    3.5.3 Training on difficult patches
  3.6 Statistical analyses
  3.7 Post-processing
4 Results
  4.1 Inter-observer variability
  4.2 Optimal probability threshold
  4.3 CNN performance
    4.3.1 Performance of training models on ICH segmentations
    4.3.2 Performance of training models on edema segmentations
    4.3.3 Performance on test sets
  4.4 Post-processing
5 Discussion
6 Conclusion
Samenvatting

Introduction

Intracerebral hemorrhage is a common type of stroke with high morbidity and mortality rates. Edema often forms around the hemorrhage. Because edema increases the chance of a poor outcome, edema quantification is needed to optimize the treatment of the hemorrhage. It has already been shown that convolutional neural networks (CNNs) are a reliable method for medical image segmentation. In this study, we introduce a CNN for automatically quantifying edema.

Methods

We used 191 scans for training and 48 scans for testing our CNN. We used a CNN architecture that consisted of 2 convolutional layers, 2 pooling layers, a fully connected layer and a softmax function. We tested 8 models that varied in patch size, number of epochs and slice thickness. In addition, in 1 model we used difficult patches in the training set. In 1 model we used segmentations of a different observer, and in 1 model we used segmentations of 2 observers. The performance of our models was measured with the area under the curve (AUC), Dice scores and intraclass correlation coefficient (ICC) values. Inter-observer variabilities between 4 observers were measured to analyze the performance. The CNN segmentations were optimized by removing incorrectly classified voxels with probability thresholds and particle removal thresholds. Subsequently, we eliminated edema particles that were not adjacent to hemorrhages.

Results

The most accurate model was trained with patches of [19 × 19] voxels in 100 epochs. This model achieved an AUC value of 0.92, an average Dice score of 0.32 and an average ICC value of 0.73. After post-processing these segmentations with the methods mentioned above, this model achieved an average Dice score of 0.44 and an average ICC value of 0.68. This is lower than the average inter-observer Dice score of 0.52 and the average inter-observer ICC value of 0.83.

Conclusion

CNN is a promising method for edema segmentation, with an accuracy that lies close to the inter-observer variability.
Summary

Introduction

Intracerebral hemorrhage (ICH) is a common type of stroke with high morbidity and mortality rates. Edema often forms around the ICH. Because edema increases the chance of a poor outcome, edema quantification is needed for optimizing ICH treatment. Convolutional neural networks (CNNs) have been proven to be a reliable method in medical image segmentation. In this study, we introduce a CNN to automatically quantify edema.

Methods

We used 191 scans for training and 48 scans for testing our CNN. We used a CNN architecture that included 2 convolutional layers, 2 pooling layers, a fully connected layer and a softmax function. We tested 8 models varying in patch size, epoch number, slice thickness, use of difficult patches in the training set, and manual segmenter used for the training set segmentations. Performance was measured with the area under the curve (AUC), Dice scores and intraclass correlation coefficient (ICC) values. Inter-observer variabilities between 4 observers were determined for analyzing performance. Segmentations were optimized using probability thresholds and particle removal thresholds (PRTs) to remove incorrectly classified voxels. Furthermore, we deleted edema particles non-adjacent to ICH particles.

Results

The most accurate model was trained with patches of [19 × 19] voxels in 100 epochs, with an AUC value of 0.92, an average Dice score of 0.32 and an average ICC value of 0.73. After post-processing, this model achieved an average Dice score of 0.44 and an average ICC value of 0.68. This is lower than the average inter-observer Dice score of 0.52 and the average inter-observer ICC value of 0.83.

Conclusion

CNN is a promising method for edema segmentation, with an accuracy that approaches manual performance.

Keywords: Brain edema, convolutional neural networks, intracerebral hemorrhage
1 Introduction
Intracerebral hemorrhage (ICH) is a common type of stroke (10 - 15 % of all strokes) with a high morbidity and mortality rate1. It is caused by a rupture of a small, damaged artery in the brain2. Deterioration of blood vessels can be caused by hypertension or amyloid angiopathy. Multiple factors associated with ICH can provoke brain edema3–7. Edema is a swelling of the brain caused by excessive accumulation of fluid. It progresses most rapidly during the first 2-3 days after ICH and is visible on CT scans as a hypodense halo around the hyperdense ICH region8,9.

Edema is a major cause of poor outcome after ICH and can cause further tissue damage or even death8–12. This poor outcome is associated with the growth of the edema during the initial 48-72 hours after the ICH9,10. Therefore, edema quantification after ICH could improve decision-making for ICH treatment11,12. Although volumetric analysis of edema by magnetic resonance imaging (MRI) is more accurate than by CT13, an MRI scan takes more time and is more costly. Precise manual assessment of the edema volume on CT is difficult, and considerable inter-observer differences remain14,15. An automatic algorithm for quantifying edema on a CT scan would be beneficial to decrease inter-observer variance and improve quantification of edema. In this study, we use convolutional neural networks (CNNs) to create an automated edema quantification method.
Multiple automated ICH segmentation methods using thresholding have been developed14–17. With thresholding, a voxel is labeled based on its intensity. This value can either be the Hounsfield unit (HU)14,16,17 or the relative density increase15. CNN provides an automated method that also considers the intensity values of neighbouring voxels when classifying a voxel. This advantage is useful in edema segmentation, because the low contrast between edema and brain regions makes differentiating hypodense edema from brain tissue challenging. Furthermore, CT scans vary in brightness, making it hard to set one threshold for all scans.
CNN is a machine learning method that uses small learnable filters for voxel classification. Previous studies have already proven the effectiveness of CNNs in medical image segmentation18–21. A CNN can be trained via an "off-the-shelf" method or "from scratch". An "off-the-shelf" CNN is already trained on non-medical images, making it possible to re-use the pre-trained filters. Training from scratch means that the filters are randomly initialized. Van Ginneken et al. (2015) accurately detected pulmonary nodules using an off-the-shelf CNN22 and Havaei et al. (2017) successfully segmented brain tumors with a CNN trained from scratch23. We train our CNN from scratch. The aim of this study is to produce and test an automated edema segmentation method using CNN for detecting and quantifying edema following ICH on a non-contrast CT scan. The algorithm was trained and tested on CT scans obtained from the PATCH study24, which investigated the effect of platelet transfusion in ICH patients.

The remainder of this thesis is built up as follows: first, a general introduction to neural networks is given, followed by an introduction to CNNs. Then, the methods for developing and testing our segmentation method are provided, followed by the testing results, a discussion of these results and a conclusion.
2 Preliminaries
2.1 Artificial neural networks
Artificial neural networks (ANNs) are computer programs that can be taught to classify patterns by providing examples with corresponding classification labels. These networks are based on biological neural networks. Biological neural networks in the brain consist of billions of interconnected neurons in different layers that process information in parallel. The network learns to recognize a pattern by firing signals to specific neurons in the different layers; as a result, connections between firing neurons are adjusted. ANNs use artificial neurons connected over different hidden layers, starting with an input layer and ending with an output layer (figure 1)25. These connections, usually denoted as arrows, contain weights and represent information flow. Neuron values in hidden layer and output layer neurons are weighted accumulations of the values of the previous layer. Forward propagation is the application of the weights to the input data to calculate the output. For example, the value of node h1 in figure 1 is calculated with the following equation:

h1 = i1 ∗ w3 + i2 ∗ w5 + b1 ∗ w1    (1)

Often, bias neurons (grey nodes in figure 1) are added to every layer in the network. These neurons usually have a value of 0, -1 or 1 and have learnable connections to the neurons of the next layer. Because the value of a bias node is constant and cannot be changed by previous layers, it allows the network to shift its output function, such as equation 1, vertically, which improves its ability to fit the input data.
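The forward pass of equation 1 can be sketched in a few lines of Python; the input, weight and bias values below are made up for illustration and are not taken from figure 1:

```python
# Forward propagation for node h1 of the small network in figure 1:
# h1 = i1*w3 + i2*w5 + b1*w1 (equation 1). All numeric values here are
# illustrative assumptions, not values from the thesis.
def forward(i1, i2, b1, w1, w3, w5):
    # Weighted accumulation of the previous layer's values plus the bias term.
    return i1 * w3 + i2 * w5 + b1 * w1

h1 = forward(i1=0.5, i2=0.8, b1=1.0, w1=0.1, w3=0.4, w5=0.3)
print(h1)  # 0.5*0.4 + 0.8*0.3 + 1.0*0.1 = 0.54
```

The same weighted-sum-plus-bias pattern repeats for every hidden and output node.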
b1
i1
i2
b2
h1
h2
o1
o2
w4
w6
w10
w12
w1
w2
w3
w5
w9
w11
w7
w8
Hidden layer
Input layer
Output layer
Figure 1: A simple neural network consisting of an input layer
(green), a hidden layer (purple) and an output layer (red). The
grey circles are bias nodes (often 0, -1 or 1) and the grey arrows
are the weights.
To simulate the learning process of biological neural networks, where connections change upon learning, the weights of an ANN are adjusted to minimize the error between the values of the output neurons and the actual output values. A cost function is used to calculate this error. A large number of cost functions exist. In this study, we use cross entropy (CE), which is often used in image classification23,26 and proved to be faster and more accurate than other cost functions27. CE is calculated with the following equation:

H_y′(y) = − Σ_i y′_i log(y_i),    (2)

where y_i is the predicted probability of class i and y′_i is the true probability of that class.
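Equation 2 translates directly into code; a minimal sketch, where the one-hot ground truth and the predicted distribution are hypothetical example values:

```python
import math

def cross_entropy(true_dist, pred_dist):
    # H_{y'}(y) = - sum_i y'_i * log(y_i)  (equation 2).
    # Terms with y'_i = 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(true_dist, pred_dist) if t > 0)

# Hypothetical one-hot ground truth for a 3-class voxel and a softmax output.
y_true = [0.0, 1.0, 0.0]
y_pred = [0.1, 0.7, 0.2]
print(cross_entropy(y_true, y_pred))  # -log(0.7) ≈ 0.357
```

The loss shrinks towards 0 as the predicted probability of the true class approaches 1.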
2.2 Gradient descent
To optimize the neural network, the total error should be minimized. Figure 2 shows the cost function for 2 weights. To reach the minimum of the function, steps towards this minimum are taken by changing the weights. The direction of each step is determined by the partial derivatives, or gradients, of the cost function with respect to all weights; the step size is determined by the learning rate. All weights are updated by subtracting their gradient multiplied by the learning rate from the current weight value. This is repeated until a minimum of the total error is reached, where all gradients are 0. This process is called gradient descent. As shown in figure 2, a function can have multiple low points or minima. Using gradient descent, we can mistake a local minimum for the global minimum. Therefore, gradient descent should be performed multiple times with different starting points. We refer to a gradient descent iteration as an epoch in the remainder of this article.
Figure 2: Gradient descent visualized in a graph with the 2 weights on the horizontal axes and the cross entropy on the vertical axis. The black line shows the steps taken to find the lowest cross entropy value. Multiple local minima can be found by gradient descent28.
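The update rule described above can be sketched as follows; the two-weight quadratic cost is a made-up stand-in for the cross-entropy surface of figure 2, chosen so the minimum is known:

```python
# One hundred gradient descent steps on a toy cost with two weights.
# The quadratic cost below is illustrative only; its minimum is at (3, -1).
def cost(w1, w2):
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2

def gradients(w1, w2):
    # Partial derivatives of the cost with respect to each weight.
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)

w1, w2, lr = 0.0, 0.0, 0.1   # starting point and learning rate
for _ in range(100):
    g1, g2 = gradients(w1, w2)
    w1 -= lr * g1            # w := w - learning_rate * dC/dw
    w2 -= lr * g2
print(round(w1, 3), round(w2, 3))  # converges to the minimum at (3, -1)
```

Running the loop from several random starting points is how the local-minimum problem mentioned above is mitigated.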
2.3 Convolutional Neural Networks (CNNs)
Figure 3 displays an example architecture of a CNN. CNNs are neural networks with multiple types of layers containing learnable weights and biases. CNNs can be used for different purposes; in this study, we use CNNs for image recognition. The different types of layers enable the CNN to capture spatial information about objects in an image. There is one input layer, one output layer, and one or more hidden layers. In this study, the input is an image. This is comparable to a visual stimulus for the human brain, which can learn to recognize the stimulus by categorizing it. The image is provided as a matrix with an intensity value for every voxel; in CT scans, these values are presented in Hounsfield units (HU). Training a CNN requires an input image with corresponding labels.
Figure 3: A simple architecture of a CNN comprising a Conv layer, a
pool layer and a FC layer followed by a softmax function.
Multiple types of layers can be used in a CNN23,25,26,29:

1. Convolutional (conv) layers: Conv layers extract different features from the image. This layer uses forward propagation and gradient descent to adjust the weights and improve categorization. The weights, defined as filters, are provided as small matrices of an arbitrary size. Each filter detects a particular feature in the image. Every filter is multiplied with every part of the input as it is slid in steps of an arbitrary number of voxels from the top left corner to the bottom right corner of the image (figure 4). The size of these steps is defined as the stride. After each step, the element-by-element product of the filter and the current part of the matrix is calculated, accumulated and added to the bias, resulting in a feature map. This calculation is shown in equation 3, using the red rectangle in figure 4 as an example. Often, zeros are added around the border, which is called zero padding. This preserves the size of the input and the information at the borders.
Figure 4: Multiplication of a filter in one layer with stride 2. The filter is multiplied with the red part, the blue part and the green part, and slid along the image in this way. The bias node is added with each multiplication, resulting in the outcome matrix.
x_red ⊙ f    (element-by-element product of the sub-matrix and the filter)    (3a)
Σ (x_red ⊙ f) = 1    (3b)
o_red = b + Σ (x_red ⊙ f) = 1 + 1 = 2    (3c)

In equations 3, x_red is the small red sub-matrix of the input image in figure 4, f is the filter, and o_red is the outcome value for this sub-matrix. Equation 3a shows the element-by-element multiplication of the sub-matrix with the filter, equation 3b shows the accumulation of this product, and equation 3c adds the bias.
The number of filters in a convolutional layer determines the depth of the input for the next layer; networks with more filters perform better on complex images. The convolution is followed by a non-linear activation function. Multiple non-linear functions exist, such as the tanh sigmoid function (f(x) = tanh(x)) and rectified linear units (ReLU) (f(x) = max(0, x)). ReLUs proved to train faster than tanh units30. Weights are adjusted with gradient descent. Gradient descent is often a time-consuming process when a large number of patches is provided to the CNN; therefore, the network is trained on minibatches of images31. Each convolutional output of a minibatch is normalized with the following formula:

x̂_i = (x_i − μ_B) / √(σ²_B + ε),    (4)

where x̂_i is the normalized convolutional output, x_i is the convolutional output, μ_B is the batch mean, σ²_B is the batch variance and ε is a constant added for numerical stability. Filters are updated after each batch.
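The sliding-filter computation of a conv layer (stride, element-by-element product, accumulation, bias, then ReLU) can be sketched as follows. This assumes NumPy, and the image and filter values are made up rather than taken from figure 4:

```python
import numpy as np  # assumption: NumPy, not part of the thesis toolchain

def conv2d(image, kernel, bias=1.0, stride=2):
    # Slide the filter over the image in steps of `stride`, take the
    # element-by-element product with the current window, accumulate it and
    # add the bias (equation 3), producing a feature map.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            window = image[y*stride:y*stride+kh, x*stride:x*stride+kw]
            out[y, x] = np.sum(window * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU activation: f(x) = max(0, x)

img = np.arange(16, dtype=float).reshape(4, 4)       # toy 4x4 "image"
k = np.array([[-1.0, 0.0], [0.0, 1.0]])              # toy 2x2 filter
print(conv2d(img, k))
```

Real frameworks vectorize this double loop, but the arithmetic per output cell is exactly the one in equation 3.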
2. Pooling (pool) layers: These layers reduce the resolution of the image, and thereby the required computation, by taking the maximum or average of non-overlapping square patches of a pre-defined size, often [2 × 2] voxels. Pool layers also help reduce overfitting.
3. Fully connected (FC) layers: FC layers, or dense layers, are generally used at the end of a CNN to accumulate all products of the previous layers and filters. Here, the filter size should be equivalent to the size of the input volume.
4. Softmax layer: A softmax layer is added at the end of the CNN. This layer computes probabilities for all patches by squashing the output for each class into the range [0, 1] in such a way that the sum of all outputs is 1. The softmax function is given by:

σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k),    (5)

where z is the vector of K outputs indexed by j.
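Equation 5 can be sketched as follows; subtracting the maximum before exponentiating is a common numerical-stability trick, not something the thesis specifies:

```python
import math

def softmax(z):
    # Equation 5: sigma(z)_j = exp(z_j) / sum_k exp(z_k).
    # Shifting by max(z) leaves the result unchanged but avoids overflow.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical 3-class network outputs
print(probs)                       # three values in [0, 1]
print(sum(probs))                  # the probabilities sum to 1
```

The class with the largest raw output always receives the largest probability, so the argmax is preserved.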
3 Methods and materials
Manually marked edema and ICH were used as ground truth values in the algorithm. We trained our CNN on 80 percent of our scans and validated the algorithm by comparing the edema segmentations and volumes computed by the algorithm with the manually annotated ground truth segmentations in the remaining 20 percent of CT scans. Because the combination of edema and ICH is a unique structure, we chose to train our CNN model from scratch.
3.1 Dataset
For this study, the image dataset of the PATCH study was used24. This dataset comprised CT images of 190 patients with ICH, taken on admission and 24 hours after admission. Patients were older than 18 years and had a Glasgow Coma Scale score between 8 and 15. The amount of intraventricular blood was less than a sedimentation in the posterior horns of the lateral ventricles, and no haematoma was present that was suggestive of an epidural, subdural, aneurysmal or arterio-venous malformation haematoma.
3.2 Inclusion and exclusion
Because CNN training should include edema, we only used CT scans in which edema was visible. Furthermore, scans with large artefacts such as beam hardening were excluded. To prevent overfitting on the edema of one patient, we excluded the scan taken 24 hours after admission when the 2 scans of one patient were similar. Similarity was subjectively determined by the main investigator. Finally, 154 patients with 241 scans were included. This set comprised 135 scans taken on admission of the patient and 106 scans taken 24 hours after admission.
Figure 5: Example of a manual segmentation with the hemorrhage
segmentation in red and the edema segmentation in green.
3.3 Pre-processing
3.3.1 Manual segmentations
Each scan was manually segmented with ITK-SNAP by 3-5 trained observers (table 2). As ground truth, we used manual segmentations that were performed by a trained neurologist and checked by 2 radiologists. To assess inter-observer variabilities, 1 master student Medical Informatics and 2 PhD students of the department of Biomedical Engineering and Physics also segmented the scans. First, ICH was segmented separately. Edema surrounds the ICH, as shown in figure 5. If edema were also segmented separately, overlap of the edema and ICH masks would be likely, giving voxels multiple labels. Therefore, edema and ICH were segmented together, and the ICH segmentations were subsequently subtracted from the combined segmentations in MATLAB to obtain the edema segmentations.
Figure 6: 3 example patches of [19 × 19] pixels for the 3 categories: edema, brain and hemorrhage.
3.3.2 Patches
The edema and ICH segmentations were used as binary masks to determine the classification of each voxel. Brain masks were made by stripping the skull and subtracting the edema and ICH masks. Input images for the CNN were patches of [19 × 19] or [17 × 17] voxels (figure 6). Scans were randomly subdivided into 5 sets, where 4 sets contained 48 scans and 1 set contained 49 scans. Per training, one set was used as test set and the 4 other sets were used as training set. One set contained 7 scans with inconsistent slice thicknesses; this set was never used as a test set, to prevent incorrect volume quantification.

For every scan of the training set, an equal number of edema, ICH and brain patches was taken. Normal and horizontally mirrored patches were taken around voxels to increase the number of patches. To maintain an equal number of patches for all classes, we determined which class contained the smallest number of voxels, i.e. ICH or edema, and took both normal and mirrored patches around every voxel of this class.

An equal number of patches was taken around randomly selected voxels of the class with the second smallest number of voxels. When this class had too few voxels to obtain this number of patches (i.e. fewer than twice the number of voxels of the smallest class), we added mirrored patches around randomly selected voxels to reach this number. The 'brain' class always had the largest number of voxels, so normal brain patches were taken around randomly selected voxels until the number of patches equaled that of the other classes.
All patches were labeled corresponding to the tissue of the center voxel. A patch can therefore receive a certain label while also containing voxels of other labels. Patches of all scans were put together in one file and randomly rearranged.

In contrast to the training set, where patches were obtained for a selected part of all voxels of each scan, test set patches were obtained for every voxel of each scan, without class balancing or data augmentation.
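The class-balancing scheme above can be sketched as follows. This is a simplified sketch: `balance_patch_centers` and its inputs are hypothetical names, and patch extraction is reduced to (center, mirrored) pairs:

```python
import random

def balance_patch_centers(class_voxels, rng=random.Random(0)):
    """class_voxels maps a class name to the list of candidate patch-center
    voxels for that class. Returns, per class, an equal-length list of
    (center, mirrored) patch specifications (sketch of section 3.3.2)."""
    order = sorted(class_voxels, key=lambda c: len(class_voxels[c]))
    smallest = order[0]
    # Smallest class: a normal and a horizontally mirrored patch around
    # every voxel, doubling its patch count.
    target = 2 * len(class_voxels[smallest])
    out = {smallest: [(v, False) for v in class_voxels[smallest]]
                     + [(v, True) for v in class_voxels[smallest]]}
    for cls in order[1:]:
        voxels = class_voxels[cls]
        if len(voxels) >= target:
            # Enough voxels (e.g. the 'brain' class): sample normal patches only.
            out[cls] = [(v, False) for v in rng.sample(voxels, target)]
        else:
            # Too few voxels: a normal patch around every voxel, plus mirrored
            # patches around randomly selected voxels to reach the target.
            extra = [(v, True) for v in rng.sample(voxels, target - len(voxels))]
            out[cls] = [(v, False) for v in voxels] + extra
    return out

demo = {"edema": list(range(3)), "ich": list(range(5)), "brain": list(range(100))}
patches = balance_patch_centers(demo)
print({cls: len(p) for cls, p in patches.items()})  # every class yields 6 patches
```

With 3 edema voxels as the smallest class, each class contributes exactly 6 patches, mirroring the balancing described in the text.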
3.3.3 Thin- and thick-sliced images
The PATCH study dataset consisted of images with slice thicknesses ranging from 0.45 mm to 10 mm. Observer 1 segmented the images at their original thicknesses. Resampling CT scans from thin-sliced (< 4 mm) to thick-sliced (> 4 mm) was done in MATLAB by taking the average HU values of the thin slices. Thereafter, observers 2-4 segmented edema and ICH in the resampled images. Convert3D from SimpleITK32 was used for increasing the slice thickness of the segmentations of observer 1. This was done by nearest-neighbor interpolation.
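Averaging thin slices into thick slices can be sketched with NumPy (the thesis performed this step in MATLAB; the grouping factor and the dropped-remainder behaviour below are assumptions, since the thesis does not specify how partial groups were handled):

```python
import numpy as np  # assumption: volume stored as a (slices, rows, cols) array

def resample_thick(volume, factor):
    # Average groups of `factor` consecutive thin slices into one thick slice,
    # e.g. factor=10 turns 0.5 mm slices into 5 mm slices. Trailing slices that
    # do not fill a whole group are dropped in this sketch.
    n = (volume.shape[0] // factor) * factor
    return volume[:n].reshape(-1, factor, *volume.shape[1:]).mean(axis=1)

vol = np.arange(8, dtype=float).reshape(8, 1, 1)  # 8 thin slices of 1 voxel each
print(resample_thick(vol, 2).ravel())  # [0.5 2.5 4.5 6.5]
```

Each output slice is the mean HU of the thin slices it replaces, so total attenuation per thick slab is preserved.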
3.4 Hardware and Software
All experiments were performed using an Intel Xeon CPU E5-1620 with 32 GB RAM and an NVIDIA GeForce GTX 1080 GPU. The Microsoft Cognitive Toolkit (CNTK) version 1.7.233 was used for building the CNNs. SimpleITK32 was used for visualizing segmentations and for manually segmenting edema and ICH on the CT scans.
3.5 CNN training
3.5.1 CNN architecture
Earlier research has shown an optimal architecture for subarachnoid hemorrhage segmentation18. Because of the similar features of edema and the use of ICH segmentation in our study, we used that architecture. This architecture used patches of [19 × 19] voxels and consisted of 2 Conv layers with 128 and 256 filters, respectively, sized [5 × 5] voxels. Weights were randomly initialized using a Gaussian distribution with zero mean and a standard deviation of 0.2 ∗ √(number of features), where the number of features was calculated by multiplying the number of voxels of one filter by the number of filters. Zero padding and a bias node were used for each convolution; this bias node was initialized as 0. A ReLU function was used as activation function. Following each Conv layer, a [2 × 2] max pool layer was implemented. A FC layer with 256 nodes and a softmax layer were added at the end of the CNN. Training was done with a learning rate of 0.0006 for the first 50 minibatches, 0.0003 for the following 100 minibatches and 0.0001 for the last 50 minibatches.
3.5.2 Training models
The characteristics of the training images and the CNN architecture used in the training models are presented in table 1. The CNN was trained on segmentations of one observer or of 2 observers, and tested on segmentations of observer 1 (table 2). When trained on multiple observers, segmentations were taken from both observers; thus, scans were included twice.
11
Table 1: Training models with information about the observer that segmented the training images (table 2), slice thickness of the training images, patch size used for CNN training, inclusion of difficult patches, and the number of training epochs.

Training code | Observer segm. | Slice thickness (mm) | Patch size (voxels) | Difficult patches | Training epochs
A | 1 | 0.45 - 10 | 19 × 19 | No | 100
B | 1 | > 4 | 19 × 19 | No | 100
C | 1 | > 4 | 17 × 17 | No | 100
D | 1 | > 4 | 19 × 19 | Yes | 100
E | 2 | > 4 | 19 × 19 | No | 100
F | 1 & 2 | > 4 | 19 × 19 | No | 100
G | 1 | 0.45 - 10 | 19 × 19 | No | 200
H | 1 | 0.45 - 10 | 17 × 17 | No | 200
Table 2: Observer information

Observer | Occupation | Nr. of scans
1 | Neurologist | 239
2 | Master student | 239
3 | PhD student | 45
4 | PhD student | 48
3.5.3 Training on difficult patches
Multiple structures in the brain have HU values and patterns similar to edema, which can lead to misclassification of these brain voxels by the CNN. Training on difficult brain voxels can decrease such misclassifications. For this training model, we trained the CNN twice, where the first training was used to determine which brain patches were too difficult to classify correctly as brain. During this training, the training patches included all edema and ICH patches, and random brain patches. All brain patches of the training set were then classified with the first CNN, and the patches that were misclassified as edema with a probability of more than 90% were marked as 'difficult'. During the second training, the CNN was trained from scratch on all edema and ICH patches and all difficult brain patches, complemented with randomly selected brain patches.
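The difficult-patch selection can be sketched as follows; `select_difficult` and the stub classifier are hypothetical names, and the first CNN is reduced to a function returning class probabilities:

```python
def select_difficult(brain_patches, first_cnn, thr=0.9):
    # A brain patch is 'difficult' when the first CNN assigns it to the
    # edema class with a probability above `thr` despite its brain label.
    difficult = []
    for patch in brain_patches:
        probs = first_cnn(patch)  # maps class name -> probability
        if probs["edema"] > thr:
            difficult.append(patch)
    return difficult

# Hypothetical stand-in for the first CNN: here a 'patch' is just a number
# that doubles as its predicted edema probability.
stub_cnn = lambda patch: {"edema": patch, "brain": 1.0 - patch}
print(select_difficult([0.95, 0.5, 0.99], stub_cnn))  # [0.95, 0.99]
```

The second training then mixes these difficult patches with randomly selected brain patches, as described above.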
3.6 Statistical analyses
Areas under the curve (AUCs) of receiver operating characteristic (ROC) curves were calculated by comparing CNN classifications and manual segmentations. Furthermore, we used Dice scores to compare CNN and manual segmentations and to demonstrate inter-observer variability. Dice scores are calculated by:

Dice = 2|X ∩ Y| / (|X| + |Y|),    (6)

where X and Y are 2 segmented volumes. The intraclass correlation coefficient (ICC) was used for calculating the variability in edema volumes between CNN segmentations and manual segmentations, and again to demonstrate inter-observer variability.
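Equation 6 translates directly into code; a minimal sketch assuming the segmentations are stored as boolean voxel masks:

```python
import numpy as np  # assumption: masks as boolean NumPy arrays

def dice(x, y):
    # Equation 6: Dice = 2|X ∩ Y| / (|X| + |Y|).
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    return 2.0 * np.logical_and(x, y).sum() / (x.sum() + y.sum())

a = np.array([1, 1, 1, 0, 0], bool)  # hypothetical observer mask
b = np.array([0, 1, 1, 1, 1], bool)  # hypothetical CNN mask
print(dice(a, b))  # 2*2 / (3+4) ≈ 0.571
```

Identical masks score 1, disjoint masks score 0, so the measure rewards voxel-wise overlap rather than mere volume agreement.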
3.7 Post-processing
By examining edema segmentations of scans that obtained low Dice scores between the CNN edema segmentation and the observer 1 segmentation, we found that some scans were incorrectly segmented: multiple slices were not segmented. These scans were excluded. The first version of the edema and ICH segmentations was produced by using the probability value that produces the best average Dice score as threshold; voxels classified with a probability above this threshold were included in the segmentation. Since small particles of voxels connected by a 6-connected neighborhood were likely to be inaccurate, we used additional particle removal thresholds (PRTs) for edema and ICH particles. These thresholds were determined for multiple probability thresholds, using the smallest particle sizes of the accurate connected components in 3 test sets. Additionally, edema particles without neighbouring ICH particles were eliminated, considering that edema always surrounds ICH. Using 3 test sets, we tested which combination of PRTs and probability thresholds for edema and ICH obtains the best Dice scores. We tested this combination on the 4th test set using Dice scores and ICC values.
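The post-processing steps above (probability threshold, particle removal over a 6-connected neighborhood, and the ICH-adjacency rule) can be sketched with SciPy; the function name and the threshold values in the sketch are assumptions, not the tuned values from the results:

```python
import numpy as np
from scipy import ndimage  # assumption: SciPy; the thesis does not name its tooling here

def postprocess(edema_prob, ich_mask, prob_thr=0.95, prt=50):
    # 1. Probability threshold: keep voxels whose edema probability
    #    exceeds prob_thr.
    edema = edema_prob > prob_thr
    six_conn = ndimage.generate_binary_structure(3, 1)  # 6-connected neighborhood
    labels, n = ndimage.label(edema, structure=six_conn)
    sizes = ndimage.sum(edema, labels, range(1, n + 1))
    # 3. Adjacency rule: dilate the ICH mask once so adjacency can be tested
    #    as overlap, since edema always surrounds ICH.
    near_ich = ndimage.binary_dilation(ich_mask, structure=six_conn)
    keep = np.zeros_like(edema)
    for i in range(1, n + 1):
        comp = labels == i
        # 2. Particle removal threshold (PRT): drop components smaller than prt.
        if sizes[i - 1] >= prt and np.logical_and(comp, near_ich).any():
            keep |= comp
    return keep

prob = np.zeros((3, 3, 3)); prob[0] = 1.0            # one 9-voxel edema sheet
ich = np.zeros((3, 3, 3), bool); ich[1, 0, 0] = True  # ICH voxel touching it
print(postprocess(prob, ich, prob_thr=0.5, prt=4).sum())  # 9
```

Sweeping `prob_thr` and `prt` over a grid and scoring each combination with Dice would reproduce the tuning procedure described in the text.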
4 Results
In the 260 scans, the median edema volume was 6.8 ml according to expert 1 and 9.0 ml according to the CNN. We excluded 2 scans in set 1 and 2 scans in set 4 due to incomplete segmentation by expert 1. Training models that used thick-sliced images included around 1.4 billion input patches; training models that used thin-sliced images included around 2.9 billion input patches.
4.1 Inter-observer variability
Figure 7 shows scatter plots presenting inter-observer variability using the volumes of 2 experts. Dice scores and ICC values are presented in table 3. Dice scores varied from 0.43 to 0.66, with an average Dice score between 2 observers of 0.52. ICC values varied from 0.75 to 0.90, with an average ICC of 0.83.
Table 3: Inter-observer variabilities. The first column depicts the observers that are compared; the second column gives the number of scans compared. Dice scores are given as average Dice ± standard deviation [minimum Dice, maximum Dice].

Observers | Number of scans | Dice | ICC average [95% CI] | P-value
1 & 2 | 159 | 0.50 ± 0.15 [0.14, 0.82] | 0.90 [0.87-0.93] | < 0.001
1 & 3 | 45 | 0.54 ± 0.14 [0.17, 0.77] | 0.81 [0.68-0.89] | < 0.001
1 & 4 | 31 | 0.43 ± 0.18 [0.08, 0.74] | 0.75 [0.45-0.87] | < 0.001
2 & 3 | 30 | 0.53 ± 0.15 [0.11, 0.72] | 0.87 [0.74-0.93] | < 0.001
2 & 4 | 48 | 0.66 ± 0.09 [0.40, 0.79] | 0.82 [0.66-0.91] | < 0.001
3 & 4 | 30 | 0.48 ± 0.15 [0.06, 0.70] | 0.82 [0.66-0.91] | < 0.001
4.2 Optimal probability threshold
Figure 8 shows average Dice scores vs. probability thresholds for
edema segmentations and for ICH segmentations. This figure shows
the results of model G since this model achieved the best average
Dice score. Optimal probability thresholds are 0.95 for edema
segmentation and 0.93 for ICH segmentation. Further results use
these thresholds. Figure 9 shows that the Dice score is relatively
low for small volumes.
Figure 8: Average Dice scores vs. probabilities used as threshold
for binary voxel classification of edema (blue) and ICH
(red).
Figure 9: Dice scores over edema volumes.
4.3 CNN performance
4.3.1 Performance of training models on ICH segmentations
Table 4 gives the AUC value of each ROC curve in figure 10, and the average Dice scores and ICC values for the ICH segmentations achieved by the training models. Our CNN performed better on ICH segmentations than on edema segmentations, with AUC values ranging from 0.96 to 0.99, Dice scores between 0.70 and 0.74 and ICC values between 0.63 and 0.75.
Figure 10: ROC curves regarding the classification of voxels as
ICH
Table 4: Performances on ICH segmentations, presented by the area under the ROC curve (AUC), the average Dice coefficient with standard deviation, and the intraclass correlation coefficient (ICC) of the volumes. Dice scores are given as average Dice ± standard deviation [minimum Dice, maximum Dice].

Training | AUC | Dice (average ± stdev [min, max]) | ICC (average [95% CI]) | P-value
A | 0.99 | 0.74 ± 0.14 [0.24, 0.93] | 0.75 [0.59-0.85] | < 0.001
B | 0.99 | 0.74 ± 0.14 [0.28, 0.93] | 0.75 [0.60-0.85] | < 0.001
C | 0.99 | 0.71 ± 0.15 [0.16, 0.92] | 0.74 [0.58-0.85] | < 0.001
D | 0.97 | 0.74 ± 0.17 [0.31, 0.93] | 0.74 [0.57-0.84] | < 0.001
E | 0.97 | 0.74 ± 0.17 [0.23, 0.93] | 0.74 [0.58-0.84] | < 0.001
F | 0.96 | 0.70 ± 0.22 [0.16, 0.93] | 0.70 [0.53-0.82] | < 0.001
G | 0.99 | 0.73 ± 0.16 [0.24, 0.92] | 0.74 [0.58-0.85] | < 0.001
H | 0.99 | 0.71 ± 0.17 [0.15, 0.93] | 0.63 [0.43-0.78] | < 0.001
4.3.2 Performance of training models on edema segmentations
Table 5 presents average Dice scores and ICC values for edema segmentations, with the corresponding ROC curves in figure 11. AUC values range from 0.77 to 0.93, Dice scores range from 0.08 to 0.36, and ICC values range from 0.05 to 0.73. Model H obtained the best AUC value (0.93), model G the best average Dice score (0.36), and model A the best ICC value (0.73). Models A and G both have an AUC of 0.92. Figure 12 shows the correlation between edema volumes of model G and the manual segmentations in a scatter plot.
Figure 11: ROC curves regarding the classification of voxels as edema (one curve per training model A–H).
Table 5: Performance on edema segmentations, presented as the area under the ROC curve (AUC), the average Dice coefficient with standard deviation, and the intraclass correlation coefficient (ICC) of the volumes.

Training   AUC    Dice                        ICC [95% CI]         P-value
A          0.92   0.32 ± 0.14 [0.04, 0.61]    0.73 [0.55-0.84]     < 0.001
B          0.91   0.33 ± 0.13 [0.08, 0.54]    0.51 [0.26-0.70]     < 0.001
C          0.92   0.33 ± 0.11 [0.07, 0.53]    0.47 [0.21-0.67]     < 0.001
D          0.77   0.08 ± 0.07 [0.00, 0.36]    0.05 [-0.24-0.33]    0.37
E          0.82   0.25 ± 0.11 [0.01, 0.44]    0.31 [0.03-0.55]     0.02
F          0.82   0.24 ± 0.11 [0.01, 0.44]    0.33 [0.04-0.56]     0.01
G          0.92   0.36 ± 0.13 [0.07, 0.57]    0.47 [0.21-0.67]     < 0.001
H          0.93   0.34 ± 0.13 [0.03, 0.56]    0.49 [0.24-0.68]     < 0.001
4.3.3 Performance on test sets
Results for the 4 test sets were obtained using model G and the 4 training sets. ROC curves for voxel classification of the test sets are presented in figure 13. AUC values, Dice scores, and ICC values for the 4 sets are given in table 6. Average Dice scores range from 0.29 to 0.39 and ICC values range from 0.47 to 0.64.
Figure 12: Correlation of edema volumes between model G and
observer 1.
Figure 13: ROC curves on the 4 test sets.
Table 6: Performance of model G on the 4 test sets, presented as the area under the ROC curve (AUC), the average Dice coefficient with standard deviation, and the intraclass correlation coefficient (ICC) of the volumes. Dice scores are given as average Dice ± standard deviation [minimum Dice, maximum Dice].

Block      AUC    Dice                        ICC [95% CI]        P-value
1 (n=46)   0.92   0.36 ± 0.13 [0.07, 0.57]    0.47 [0.21-0.67]    < 0.001
2 (n=48)   0.93   0.31 ± 0.13 [0.06, 0.56]    0.64 [0.44-0.79]    < 0.001
3 (n=48)   0.83   0.29 ± 0.13 [0.04, 0.55]    0.47 [0.21-0.67]    < 0.001
4 (n=46)   0.94   0.39 ± 0.10 [0.17, 0.61]    0.52 [0.27-0.70]    < 0.001
4.4 Post-processing
The smallest edema and ICH segments (voxels joined by 6-connected neighborhoods) that included correctly classified voxels are presented in table 7. Because a 0.95 probability threshold for ICH classification did not increase ICH Dice scores, we did not use this threshold for post-processing. We also did not use a 0.70 probability threshold for edema classification, since lower probability thresholds resulted in decreasing Dice scores. Table 8 shows Dice scores for various probability thresholds and segment size thresholds for both edema and ICH segmentations. The best Dice scores were obtained with probability thresholds of 0.90 for both edema and ICH, with particle removal thresholds of 700 voxels for edema and 500 voxels for ICH. We used these thresholds for post-processing the segmentations of all models. Table 9 presents Dice scores and ICC values for the models after post-processing the CNN segmentations. Model A scored best, with an average Dice score of 0.44 and an average ICC value of 0.68. The correlation between the edema volumes of this model and the manual segmentations is shown in figure 14; smaller edema volumes correlate better than larger edema volumes. Example segmentations of this model are shown in figure 15.
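The particle-removal step described above can be sketched as a 6-connected component search that discards components below a size threshold. This is an illustrative pure-Python sketch (the thesis implementation is not shown here); the binary 3-D mask is assumed to be stored as nested lists indexed `[z][y][x]`.

```python
from collections import deque

def remove_small_particles(mask, min_size):
    """Keep only 6-connected components of at least `min_size` voxels."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    seen = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    # 6-connectivity: face neighbors only, no diagonals.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if mask[z][y][x] and not seen[z][y][x]:
                    # Breadth-first search collects one connected component.
                    comp, queue = [], deque([(z, y, x)])
                    seen[z][y][x] = True
                    while queue:
                        cz, cy, cx = queue.popleft()
                        comp.append((cz, cy, cx))
                        for dz, dy, dx in steps:
                            pz, py, px = cz + dz, cy + dy, cx + dx
                            if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                    and mask[pz][py][px] and not seen[pz][py][px]):
                                seen[pz][py][px] = True
                                queue.append((pz, py, px))
                    if len(comp) >= min_size:
                        for cz, cy, cx in comp:
                            out[cz][cy][cx] = 1
    return out
```

With the thresholds above, `min_size` would be 700 for edema masks and 500 for ICH masks; in practice a library routine such as `scipy.ndimage.label` would do the component search.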
Table 7: Smallest particle sizes (in number of voxels) that include correctly classified edema or ICH voxels.

Probability threshold   Edema   ICH
0.70                    -       744
0.80                    793     669
0.90                    708     545
0.95                    591     -
Table 8: Dice scores for multiple probability thresholds and particle removal thresholds (PRTs) for edema and ICH segments.

Edema                   ICH                     Dice score
Probability    PRT      Probability    PRT
0.80           700      0.70           700      0.36 ± 0.18 [0.05, 0.77]
0.90           700      0.70           700      0.37 ± 0.16 [0.04, 0.71]
0.90           700      0.80           600      0.38 ± 0.16 [0.02, 0.71]
0.90           700      0.90           500      0.40 ± 0.16 [0.00, 0.71]
0.95           500      0.80           600      0.33 ± 0.17 [0.00, 0.68]
Figure 14: Correlation of edema volumes between model A after
post-processing and observer 1.
Table 9: Performance after post-processing, presented as the average Dice coefficient with standard deviation and the intraclass correlation coefficient (ICC) of the volumes.

Training   Dice                        ICC [95% CI]        P-value
A          0.44 ± 0.17 [0.00, 0.73]    0.68 [0.49-0.80]    < 0.001
B          0.42 ± 0.17 [0.00, 0.66]    0.65 [0.44-0.78]    < 0.001
C          0.41 ± 0.16 [0.00, 0.72]    0.57 [0.33-0.73]    < 0.001
D          0.06 ± 0.08 [0.00, 0.37]    0.63 [0.42-0.77]    < 0.001
E          0.32 ± 0.15 [0.00, 0.58]    0.43 [0.17-0.64]    < 0.001
F          0.29 ± 0.15 [0.00, 0.57]    0.40 [0.14-0.61]    0.002
G          0.43 ± 0.17 [0.00, 0.72]    0.61 [0.40-0.76]    < 0.001
H          0.41 ± 0.18 [0.00, 0.66]    0.69 [0.50-0.81]    < 0.001
Figure 15: Three example scans with segmentations by observer 1 (left), segmentations computed by the CNN (middle), and segmentations after post-processing (right).
5 Discussion
In this study, we developed and evaluated multiple CNN architectures to automatically segment hemorrhage and edema in patients with an intracerebral hemorrhage. Although high AUCs were obtained, the spatial agreement, expressed as the Dice coefficient, and the volumetric agreement, expressed as the ICC, were somewhat lower than the agreement between multiple observers. The Dice scores achieved by the best models are only slightly lower than the average inter-observer variability, so we conclude that our CNN produces acceptable segmentations. ICC values, however, were noticeably lower than the inter-observer ICC values. Taken together, this suggests that CNNs are promising for edema segmentation.
To our knowledge, ours is the first study to use CNNs for edema segmentation. Bardera et al. (2009) previously proposed a promising semi-automated method for edema segmentation using a region-growing approach and tested it on 18 scans [34]. In contrast to their study, our study provides a fully automated method for edema segmentation. Moreover, in this study, edema segmentation was performed on CT scans, which are more common in clinical practice than MRI scans because of their high availability and imaging speed.
We showed that there was little difference in results between the different test sets. We used the same test set for testing every model, since the Dice score and ICC are strict measures and susceptible to size differences. Segmentations of edema with large volumes tend to have higher Dice scores than segmentations of edema with small volumes. Therefore, the composition of our test set with regard to scans with large and small edema volumes could have influenced the average outcomes.
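This size sensitivity is easy to see numerically. In the following hypothetical example, the same absolute error of 20 misclassified voxels (10 false positives plus 10 false negatives) costs far more Dice on a small edema region than on a large one:

```python
def dice_from_counts(tp, fp, fn):
    """Dice coefficient from true-positive, false-positive and
    false-negative voxel counts: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Identical error (10 FP + 10 FN) on a small vs. a large ground truth:
small = dice_from_counts(tp=30, fp=10, fn=10)    # 40-voxel truth
large = dice_from_counts(tp=480, fp=10, fn=10)   # 490-voxel truth
```

Here `small` is 0.75 while `large` is about 0.98, even though the boundary error is identical in absolute terms.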
The CNN architecture we used was based on a previous study that created this architecture for subarachnoid hemorrhage segmentation. Although subarachnoid hemorrhage is similar in external structure to edema, it differs in density and internal composition; edema is often crescent-shaped, surrounding the hemorrhage. Furthermore, there is a high contrast between ICH voxels and brain voxels, which makes ICH easier to segment than edema. Accordingly, we found that our CNN performed well for ICH segmentation.
We tested only two patch sizes, of which [19 × 19] voxels achieved the best results for hemorrhage segmentation. For edema, the outcomes for these patch sizes did not differ greatly, but our CNN performed better with a patch size of [19 × 19] than with [17 × 17]. Further research should test the performance of CNNs with larger patch sizes.
We used manual segmentations as the gold standard for our study. Due to the difficulty of edema segmentation, a manual segmentation is never perfect; using MRI scans as the gold standard would have increased the reliability of our test results. Furthermore, manual segmentations of edema were obtained by subtracting manual segmentations of ICH from manual segmentations of both edema and ICH, so the border between edema and ICH was determined by the border of the ICH segmentation. As a result, less hyperdense voxels at the edge of the ICH were often not classified as ICH. Because edema often surrounds ICH, and edema and ICH were segmented together, these border voxels were misclassified as edema. If hypodense edema had been taken into account when determining the ICH border, it would have been evident that these less hyperdense voxels were still part of the ICH.
Brain voxels with HU values similar to edema were often categorized as edema. We tried to solve this problem by using a CNN model trained on difficult brain patches. However, this model performed poorly in edema segmentation: training on brain patches similar to edema caused our CNN to misclassify more than half of the edema patches as brain, because it was no longer able to distinguish edema from brain.
We showed, using thick-sliced (1.4 billion input patches) and thin-sliced images (2.9 billion input patches), that training on double the number of input patches did not remarkably improve performance. We therefore assume that using more training patches will not improve our edema segmentation method.
Although we improved the edema segmentation accuracy with post-processing, this method was not perfect: the optimal threshold differs per scan. A post-processing method that adjusts the particle removal threshold to the size of the largest particle in the CNN segmentation could include more edema segments, as the largest particle generally includes the correct edema segment. An additional improvement was obtained by eliminating edema particles that are not adjacent to ICH. We believe there is potential for further improvement with additional post-processing.
Optimizing this automatic edema segmentation method could eventually save experts time. Our method could already be used as a semi-automated method in which experts manually improve the output segmentations; this is less time-consuming than manually segmenting edema from scratch.
Our study provides a basis for automatic edema segmentation using CNNs. Although our results are suboptimal, improvements with regard to the CNN architecture and post-processing could further enhance edema segmentation.
6 Conclusion
We found that optimal results were achieved with a CNN comprising 2 convolutional layers, 2 pooling layers, a fully connected layer, and a softmax function. Furthermore, training this CNN with patches of [19 × 19] voxels for 100 epochs achieved optimal results. We provided a promising automatic hemorrhage and edema segmentation method for patients with ICH using CNNs. The accuracy, as assessed with Dice scores and ICC values, approaches the manual inter-observer variation.
References
1. S. Sacco, C. Marini, D. Toni, L. Olivieri, and A. Carolei,
“Incidence and 10-year survival of intracerebral hemorrhage in a
population-based registry,” Stroke, vol. 40, 2009.
2. M. E. Fewel, B. G. Thompson Jr., and J. T. Hoff, “Spontaneous
intracerebral hemorrhage: a review,” Neurosurg Focus, vol. 15, no.
4, p. E1, 2003.
3. K. R. Lee, A. L. Betz, S. Kim, R. F. Keep, and J. T. Hoff, “The
role of the coagulation cascade in brain edema formation after
intracerebral hemorrhage,” Acta Neurochirurgica, vol. 138, no. 4,
pp. 396–401, 1996.
4. K. R. Lee, N. Kawai, S. Kim, O. Sagher, and J. T. Hoff,
“Mechanisms of edema formation after intracerebral hemorrhage:
effects of thrombin on cerebral blood flow, blood-brain barrier
permeability, and cell survival in a rat model," Journal of Neurosurgery, vol. 86, no. 2, pp. 272–278, 1997.
5. F.-P. Huang, G. Xi, R. F. Keep, Y. Hua, A. Nemoianu, and J. T.
Hoff, “Brain edema after experimental intracerebral hemorrhage:
role of hemoglobin degradation products,” Journal of Neurosurgery,
vol. 96, no. 2, pp. 287–293, 2002.
6. G. H. Xi, R. F. Keep, and J. T. Hoff, “Erythrocytes and delayed
brain edema formation following intracerebral hemorrhage in
rats,” Journal of Neurosurgery, vol. 89, no. 6, pp. 991–996,
1998.
7. M.-A. Babi and M. L. James, “Peri-Hemorrhagic Edema and
Secondary Hematoma Expansion after Intracerebral Hemorrhage: From
Benchwork to Practical Aspects,” Frontiers in Neurology, vol. 8,
no. January, pp. 8–11, 2017.
8. A. W. Unterberg, J. Stover, B. Kress, and K. L. Kiening, “Edema
and brain trauma," Neuroscience, vol. 129, no. 4, pp. 1021–1029,
2004.
9. C. Venkatasubramanian, M. Mlynash, A. Finley-Caulfield, I.
Eyngorn, R. Kalimuthu, R. W. Snider, et al., “Natural history of
perihematomal edema after intracerebral hemorrhage measured by
serial magnetic resonance imaging,” Stroke, vol. 42, no. 1, pp.
73–80, 2011.
10. S. B. Murthy, Y. Moradiya, J. Dawson, K. R. Lees, D. F. Hanley,
and W. C. Ziai, "Perihematomal Edema and Functional Outcomes in
Intracerebral Hemorrhage: Influence of Hematoma Volume and
Location,” Stroke, vol. 46, no. 11, pp. 3088–3092, 2015.
11. G. Appelboom, S. S. Bruce, Z. L. Hickman, B. E. Zacharia, A. M.
Carpenter, K. A. Vaughan, et al., “Volume-dependent effect of
perihaematomal oedema on outcome for spontaneous intracerebral
haemorrhages.,” Journal of neurology, neurosurgery, and psychiatry,
vol. 84, no. 5, pp. 488–93, 2013.
12. J. M. Gebel, E. C. Jauch, T. G. Brott, J. Khoury, L. Sauerbeck,
S. Salisbury, et al., “Relative edema volume is a predictor of
outcome in patients with hyperacute spontaneous intracerebral
hemorrhage,” Stroke, vol. 33, no. 11, pp. 2636–2641, 2002.
13. C. Kidwell, J. Chalela, J. Saver, S. Starkman, M. Hill, A.
Demchuk, et al., “Comparison of MRI and CT for detection of acute
intracerebral hemorrhage," JAMA: The Journal of the American
Medical Association, vol. 292, no. 15, pp. 1823–1830, 2004.
14. B. Volbers, D. Staykov, I. Wagner, A. Dorfler, M. Saake, S.
Schwab, et al., “Semi-automatic volumetric assessment of
perihemorrhagic edema with computed tomography,” European Journal
of Neurology, vol. 18, no. 11, pp. 1323–1328, 2011.
15. A. M. M. Boers, “Automatic Quantification after Subarachnoid
Hemorrhage on Non-Contrast Computed Tomography," pp. 1–47, 2014.
16. P. b. Dvorak, K. Bartusek, and W. Kropatsch, “Automated
segmentation of brain tumor edema in FLAIR MRI using symmetry and
thresholding," Progress in Electromagnetics Research Symposium,
pp. 936–939, 2013.
17. T. Y. Wu, O. Sobowale, R. Hurford, G. Sharma, S. Christensen,
N. Yassi, et al., “Software output from semi-automated planimetry
can underestimate intracerebral haemorrhage and peri-haematomal
oedema volumes by up to 41%,” Neuroradiology, vol. 58, no. 9, pp.
867–876, 2016.
18. R. S. Barros, W. E. van der Steen, M. Boers, I. Zijlstra, R.
van den Berg, W. E. Youssoufi, et al., “Segmentation of
Subarachnoid Hemorrhage in Computed Tomography Images Using Convolutional Neural Networks," 2017.
19. H. R. Roth, A. Farag, L. Lu, E. B. Turkbey, and R. M. Summers,
“Deep convolutional networks for pancreas segmentation in CT
imaging,” 2015.
20. W. Shen, M. Zhou, F. Yang, C. Yang, and J. Tian, “Multi-scale
convolutional neural networks for lung nodule classification,”
Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 9123, pp. 588–599, 2015.
21. K. H. Cha, L. Hadjiiski, R. K. Samala, H.-P. Chan, E. M.
Caoili, and R. H. Cohan, “Urinary bladder segmentation in CT
urography using deep-learning convolutional neural network and
level sets,” Medical Physics, vol. 43, no. 4, pp. 1882–1896,
2016.
22. B. Van Ginneken, A. A. A. Setio, C. Jacobs, and F. Ciompi,
“Off-the-shelf convolutional neural network features for pulmonary
nodule detection in computed tomography scans,” 2015 IEEE 12th
International Symposium on Biomedical Imaging (ISBI), pp. 286–289,
2015.
23. M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y.
Bengio, et al., “Brain tumor segmentation with Deep Neural
Networks,” Medical Image Analysis, vol. 35, pp. 18–31, 2017.
24. K. de Gans, R. J. de Haan, C. B. Majoie, M. M. Koopman, A.
Brand, M. G. Dijkgraaf, et al., “PATCH: platelet transfusion in
cerebral haemorrhage: study protocol for a multicentre, randomised,
controlled trial.,” BMC neurology, vol. 10, p. 19, 2010.
25. S. Hijazi, R. Kumar, C. Rowen, and C. I. Group, “Using
Convolutional Neural Networks for Image Recognition,” Cadence
Whitepaper, pp. 1–12, 2015.
26. K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, “What is the best multi-stage architecture for object recognition?,” 2009 IEEE 12th International Conference on Computer Vision, pp. 2146–2153, 2009.
27. P. Golik and P. Doetsch, “Cross-Entropy vs. Squared Error Training: A Theoretical and Experimental Comparison,” Interspeech, ISCA, vol. 2, no. 2, pp. 1756–1760, 2013.
28. A. Ng, “Machine Learning,” 2017.
29. Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional
networks and applications in vision,” ISCAS 2010 - 2010 IEEE
International Symposium on Circuits and Systems: Nano-Bio Circuit
Fabrics and Systems, pp. 253–256, 2010.
30. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet
Classification with Deep Convolutional Neural Networks,” Advances
In Neural Information Processing Systems, pp. 1–9, 2012.
31. S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating
Deep Network Training by Reducing Internal Covariate Shift,”
2015.
32. P. A. Yushkevich, J. Piven, H. C. Hazlett, R. G. Smith, S. Ho,
J. C. Gee, et al., “User-guided 3D active contour segmentation of
anatomical structures: Significantly improved efficiency and
reliability.,” Neuroimage, vol. 31, no. 3, pp. 1116–1128,
2006.
33. D. Yu, A. Eversole, M. Seltzer, K. Yao, Z. Huang, B. Guenter,
et al., “An Introduction to Computational Networks and the
Computational Network Toolkit,” Tech. Rep. MSR-TR-2014-112,
2015.
34. A. Bardera, I. Boada, M. Feixas, S. Remollo, G. Blasco, Y.
Silva, et al., “Semi-automated method for brain hematoma and edema
quantification using computed tomography,” Computerized Medical
Imaging and Graphics, vol. 33, no. 4, pp. 304–311, 2009.
Acronyms
AUC   area under the curve
CE    cross entropy
conv  convolutional
ICH   intracerebral hemorrhage