
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Uveal melanoma identification using artificial neural networks

BUSTER STYREN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Uveal melanoma identification using artificial neural networks

Buster Styren

Supervisor (KTH): Johan Gustavsson
Principal (S:t Erik): Lotta All Eriksson

November 12, 2018


Abstract

Uveal melanoma is a deadly form of cancer that can develop from a uveal naevus in the eye fundus. Using deep convolutional networks, this thesis aims to classify fundus images based on malignancy.

A baseline model was compared against two state-of-the-art networks, Inception-v3 and ResNet. The baseline model was trained using different gradient descent optimizers and image augmentations to find the best hyperparameters for the data. The state-of-the-art networks achieved comparable accuracy, with Inception-v3 achieving 0.912 AUC after training on 8360 samples.

With 96% sensitivity, the same value as ophthalmologists, the top network achieves a specificity of 59%, meaning that the network can greatly reduce the amount of manual naevi eye examinations by filtering out healthy subjects.


Sammanfattning (Summary)

Uveal melanoma is a deadly form of cancer that arises from pigmented lesions in the retina. The disease has a high risk of metastasizing to the liver, and once the metastases are clinically manifest, survival is generally limited to a few months.

By training a neural network, the goal of this work is to classify fundus images of choroidal naevi as benign or malignant. This was done by evaluating three different convolutional networks: Inception-v3 and ResNet were compared against a simple six-layer network. A range of hyperparameter configurations was evaluated to find an optimal model.

After training on 8360 data points, Inception-v3 reached an AUC value of 0.912. At 96% sensitivity, the same level as ophthalmologists, the network achieves 59% specificity. The network can thus filter out a large share of the healthy patients who are examined by doctors, which could mean a considerable efficiency gain in the care of patients with pigmented lesions in the retina.


Contents

1 Introduction
  1.1 Thesis objective and research question
  1.2 Conditions and limitations

2 Background
  2.1 Uveal naevi and melanoma
  2.2 Artificial neural network
  2.3 Convolutional neural network
    2.3.1 Convolutional layer
    2.3.2 Pooling layer
    2.3.3 Fully connected layer
    2.3.4 Dropout
    2.3.5 Image augmentation
    2.3.6 Output (softmax)
    2.3.7 Backpropagation
    2.3.8 Backprop optimizers
  2.4 Computer vision benchmarking
  2.5 Notable ImageNet networks
    2.5.1 AlexNet
    2.5.2 VGG Net
    2.5.3 GoogLeNet
    2.5.4 Microsoft ResNet
  2.6 Evaluation
  2.7 Related work
  2.8 Relevance for this work

3 Method
  3.1 Dataset
  3.2 Augmentations
  3.3 Training, testing & validation
  3.4 Models
    3.4.1 Baseline
    3.4.2 Optimizer
  3.5 Network setup
  3.6 Ethics

4 Results
  4.1 Baseline model
    4.1.1 Optimizer
    4.1.2 Learning rate
  4.2 Baseline and state-of-the-art comparison

5 Discussion
  5.1 Optimizer and learning rate
  5.2 Randomness in results
  5.3 Image cropping
  5.4 Batch size limitation
  5.5 Image augmentations
  5.6 Accuracy and AUC
  5.7 Models
  5.8 Economic, ecological and social sustainability
  5.9 Further work

6 Conclusion

Chapter 1

Introduction

Uveal naevi are birthmarks in the eye that can, in rare cases, develop into malignant melanoma. This cancer is often asymptomatic, which means that a patient with a naevus has to do regular checkups at an optician, who photographs the patient and sends the image to a qualified ophthalmologist for examination. Since eye hospitals usually do not have time to go through images from more than 10% of the general population, this examination is often not performed for every patient. This introduces a risk for the patient, as malignant melanoma has a mortality rate of approximately 50% [12]. Early discovery improves the survival rate, and it is therefore essential for a patient with risk factors to do regular checkups.

One way to solve the task of automatic examination of uveal naevi would be to program each risk factor that can cause malignant development. However, due to advancements in image classification using deep learning in recent years, such tasks can usually be solved with deep neural networks instead, the difference being that the network itself learns the features associated with development of uveal melanoma.

By creating an artificial neural network that can classify a subset of fundus images of uveal naevi, the stress on eye hospitals can be lowered. The effect is twofold: it increases the time doctors can spend treating other patients or engaging in other hospital work, and it increases the number of patients who can get treatment in time for their melanoma. By increasing the efficiency of uveal melanoma discovery, it is possible that early discovery using artificial intelligence can be integrated into general optician care, allowing melanomas to be discovered in passing.

1.1 Thesis objective and research question

The aim of this project is to construct a deep convolutional neural network to automatically examine and classify patients' fundus images. By filtering out, at an early stage, patients who do not need to see a doctor, the goal is to reduce the burden put on the doctors and allow more people the chance to get checked regularly.

However, although uveal naevi are present in a large part of the population, uveal melanoma is still a rare disease, so the amount of data is limited: only about 80 cases of malignant uveal melanoma are diagnosed in Sweden each year. The aim of the project is therefore to construct a network that can avoid overfitting and filter out a sizable share of patients.

Whether deep learning is feasible usually depends on the amount of training data, the complexity of the input data and the number of classes. In this thesis the amount of training data is limited due to the rarity of the disease. The goal is therefore to achieve a high enough specificity, at the sensitivity of an ophthalmologist, that doctors only need to examine the small subset of cases where the model cannot make an accurate prediction.

Research question: Can deep convolutional neural networks generalize the task of identifying uveal melanoma from fundus images of patients with uveal naevi?

This thesis evaluates whether convolutional neural networks are capable of generalizing the task of identifying melanoma in fundus images. This is done by comparing two state-of-the-art convolutional networks against a baseline model using different gradient descent optimizers and image augmentations.

The results are compared using AUC, the area under the ROC curve (receiver operating characteristic); see section 2.6.

1.2 Conditions and limitations

The study was done at S:t Erik Eye Hospital in Stockholm and supervised by Lotta All Eriksson, M.D., specialized in eye oncology. The study has received funds from SLL Innovations (Stockholms läns landsting).

Since the neural network processes personal data in the form of fundus images of hospital patients, the study was approved by the Central Ethics Review Board (ref. 2018/945-31/1). All data used in the thesis was anonymized and no patient pictures are included in the thesis.

The study uses around 10,000 fundus images from ∼2000 patients at S:t Erik Eye Hospital, taken between 2011 and 2017. This is a smaller amount than in many similar projects on medical images; see section 2.7.


Chapter 2

Background

This section goes over the fundamentals of uveal naevi and uveal melanoma and lists the risk factors that ophthalmologists look for when examining a uveal naevus.

It also includes a description of artificial neural networks (ANNs) and convolutional neural networks (CNNs). The layers and components that make up a CNN are described in detail, together with examples of current state-of-the-art networks and gradient descent optimizers.

The section ends with a brief history of image classification, related work, and how the information is tied together in the following chapters.

2.1 Uveal naevi and melanoma

Birthmarks in the eye (uveal naevi) are small benign tumors that in rare cases develop into malignant melanoma. It is estimated that ∼20% of the white population over 50 years of age has at least one naevus. Uveal melanoma is much more common in people with lightly pigmented skin [24]. Its incidence ranges from eight per million in northern Europe to two per million in southern Europe; in the USA the incidence is five per million [12]. Around 80 patients are diagnosed with malignant uveal melanoma each year in Sweden and are treated at S:t Erik Eye Hospital in Stockholm.

Uveal melanoma has a tendency to metastasize to the liver, and once liver metastases are clinically manifest, survival is usually limited to a few months. Approximately 50% of patients succumb to metastasis within 10 years of diagnosis [12].

The risk of developing uveal melanoma from a naevus is determined by an ophthalmologist. The assessment is done using a biomicroscope and takes approximately 10-15 minutes. At the first examination the fundus (the back of the eye) is usually photographed so that it can be used as a reference at subsequent examinations to assess possible progression.


Figure 2.1: (A) Healthy eye with naevus. (B) Eye with melanoma; notice the spots of orange pigment on the center of the naevus [20]. (C) Healthy eye of a 25-year-old without uveal naevi or any signs of disease or pathology; the optic disc is located to the left. Credit: Mikael Häggström, used with permission

Feature                        Hazard ratio¹   Probability to develop melanoma   Example
Thickness >2 mm                2               19%                               Fig. 2.2a
Fluid                          3               27%                               Fig. 2.2d
Symptoms²                      2               23%                               -
Orange pigment                 3               30%                               Fig. 2.2c
Margin <3 mm to optic disc     2               13%                               Fig. 2.2b
Ultrasonographic hollowness³   3               27%                               Fig. 2.2a
Halo absent⁴                   6               7%                                Fig. 2.2f
Drusen absent⁵                 N/A             N/A                               Fig. 2.2e

¹ (probability for melanoma with risk factor) / (probability for melanoma, general population)
² Distortion, flashes of light, loss of vision
³ Hollowness of the naevus in ultrasound
⁴ A circular band of depigmentation
⁵ Drusen are yellow deposits under the retina, made up of lipids, a fatty protein (https://www.aao.org/eye-health/diseases/what-are-drusen, accessed 2018-02-22)

Table 2.1: Risk factors doctors look for when evaluating a uveal naevus.

Table 2.1 lists the seven different risk factors ophthalmologists look for when doing a naevus examination [24]. Without any risk factors, re-examination is done after six months and then yearly. With at least one factor, re-examinations are done every three months for the first year, then every six months for up to two years, then yearly.

Figure 2.2 displays the risk factors. Thickness and hollowness are the only risk factors that cannot be identified using fundus imagery; they instead require ultrasound (figure 2.2a).

Looking at figures 2.2c and 2.1 (B), it is possible to see the orange pigment around the center of the naevus; these naevi also lack both halo and drusen. A healthy eye can be seen in figure 2.1 (C) for comparison.

Figure 2.2: Risk factors. (a) Ultrasound of uveal naevus. (b) Naevus with <2 mm margin to optic disc. (c) Naevus with orange pigment. (d) Naevus with fluid. (e) Naevus with drusen. (f) Naevus with a halo surrounding it, a sign that it will not develop into melanoma.


2.2 Artificial neural network

In the animal nervous system there are millions of interconnected cells that aid in the reception and transmission of nerve impulses. While nerve impulses react more slowly to stimuli than electronic logic gates, they are able to solve problems that no computer can efficiently deal with [17].

Artificial neural networks (ANNs) are computing systems that resemble the information processing capabilities of the nervous system. An ANN consists of layers of neurons that connect to each other. Each connection has an assigned weight, which determines its significance when calculating the output.

When training, the output value of the network is compared to the ground truth value to compute the loss (error), commonly using the mean square error. The weights are adjusted and re-calibrated throughout training in order to minimize the loss, usually using the backpropagation algorithm described in section 2.3.7.

A simple multilayer perceptron (MLP) with four inputs, one hidden layer of five nodes, and one output node is displayed in figure 2.3.

Figure 2.3: Multilayer perceptron with one hidden layer.

2.3 Convolutional neural network

Due to the importance of the spatial relationships between pixels, multilayer perceptrons are often unfeasible for image data: connecting every pixel in the input layer to all units in the following layer is most likely unnecessary and causes an excessive number of parameters. Using MLPs for image data can therefore cause overfitting and requires a lot of computational power.

A convolutional neural network (CNN) is a specialized feed-forward neural network that works well on image data. The CNN structure is inspired by the mammalian visual cortex [2]. Instead of connecting all pixels, the network looks for small patterns in the image using a set of filters. By reducing the receptive field from the whole image (as in a fully connected ANN) to sizes ranging from, for example, 1×1 to 5×5 pixels, the number of parameters and the computational complexity are lowered.

A CNN typically consists of three types of layers: convolutional layers, pooling layers and fully connected layers. An exemplary CNN is displayed in figure 2.4, combining convolutional layers with pooling layers (subsampling) and finally connecting all output layers using a fully connected layer. A rectifier is usually added after the convolutional steps to introduce non-linearity into the network.

Figure 2.4: Typical CNN layout. Credit: Denis Antyukhov / CC-BY-SA-4.0

2.3.1 Convolutional layer

The convolutional layer moves a set of filters across the image and compares each filter to the input data. Filters are usually small in spatial dimensionality. As a filter convolves across the spatial dimensions of the data, a 2D activation map is created, holding the scalar product of the filter and the input at each position. This indicates the presence of the filter in the input image [16].


Figure 2.5: Convolution with a 3×3 filter computes each value in the resulting 2D feature map as the dot product of the 3×3 input patch and the filter. Source: cambridgespark.com

Figure 2.5 displays a convolution step of input I with a 3×3 receptive field. The dot product of the input I and the filter K is a value that indicates whether the filter is present in the input volume at the specific location (marked in red). In this case four of the values in the filter K match the input I, which gives the value 4 in the 2D feature map (displayed in green).

The output of the convolutional layer is one feature map for each applied filter. The stride changes how the receptive field convolves: a stride of two moves the receptive field two pixels at each step. Increasing the stride reduces the dimensionality of the output; going from a stride of 1 to 2 reduces the spatial dimensionality to a quarter.
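As a minimal NumPy sketch of this operation (the input and filter are illustrative, not taken from the thesis), the following computes one feature map of a valid convolution at a configurable stride:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2D convolution: slide the kernel over the image and take the
    elementwise product-sum (dot product) at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # filter-presence score
    return out

image = np.random.rand(8, 8)
print(conv2d(image, np.ones((3, 3)), stride=1).shape)  # (6, 6)
print(conv2d(image, np.ones((3, 3)), stride=2).shape)  # (3, 3): 1/4 the outputs
```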

The outputs of the feature map will be both negative and positive; however, a ReLU layer is usually added after each convolutional layer to transform each output in the feature map. ReLU is an activation function that introduces non-linearity into the network [8]:

f(x) = max(0, x) (2.1)

By chaining convolutional layers together, the network can find higher-dimensional patterns in the image by applying filters to the activation maps. By applying multiple filters in each convolutional step, the number of activation maps increases exponentially.

2.3.2 Pooling layer

The aim of the pooling layer is to downsample the spatial dimensionality of the input. This is most commonly done using average- or max-pooling.

Max pooling is done by selecting a stride and patch size and traversing the input. At each position a window function u(x, y) is applied to the input patch, and the maximum value over the N×N neighborhood is computed:

a_j = max_{N×N} (a_i^{n×n} u(n, n))    (2.2)

With a 2×2 receptive field and a stride of 2, this discards 3/4 of the activations in the input, as seen in figure 2.6. Average pooling instead takes the average of the values in the neighborhood [18].

Figure 2.6: Max pooling with 2×2 receptive field and a stride of 2. Credit: Denis Antyukhov / CC-BY-SA-4.0

Subsampling is a generalization of the average pooling operation that includes a trainable bias b and a trainable scalar β:

a_j = tanh(β Σ_{N×N} a_i^{N×N} + b)    (2.3)

Subsampling has, however, been shown to be worse at capturing invariances in image data [18].
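A minimal NumPy sketch of the pooling step (window size and input are illustrative); swapping the reduction from max to mean gives average pooling:

```python
import numpy as np

def pool2d(x, size=2, stride=2, op=np.max):
    """Downsample by applying op (max or mean) over each size x size window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = op(window)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x))              # max pooling: keeps 1 of every 4 activations
print(pool2d(x, op=np.mean))  # average pooling
```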

2.3.3 Fully connected layer

After a series of convolutional and pooling layers, the fully connected layers are added. They connect to all the activations of the previous layer, which requires a huge number of parameters. This works much like a normal multilayer perceptron, with weights and biases.

In this step all the activation maps generated so far are connected, which enables the network to extract information that depends on relationships between the output volumes of the previous steps. However, if the amount of data is not sufficiently big, this can cause overfitting, since the network can make complex co-adaptations to the training data [10].


2.3.4 Dropout

Dropout is a regularization technique introduced to reduce overfitting caused by complex adaptations to the training data in the fully connected layer. Dropout omits a percentage of the units on each training case. By omitting units, each training epoch will learn features that are helpful for the classification while avoiding co-dependency between units [10].

2.3.5 Image augmentation

Due to the varying quality of cameras, differences in lighting and other environmental factors, it is possible to improve the learning by normalizing the images with respect to these factors.

Featurewise centering takes the mean value of each feature (pixel) over the dataset and subtracts it from all datapoints. Featurewise standard normalization additionally divides each feature by its standard deviation over the input space, reducing the variance of each feature as well. By combining these two corrections, the features become easier to compare, as they have uniform variance and mean.

Histogram normalization creates a histogram of the intensities of all colour channels. The histogram is then stretched so that a larger part of the colour spectrum is represented in the image.

By altering the data it is possible to create more distinct variants of the original dataset. The goal is to reduce overfitting caused by the limited amount of data. Alterations such as translations and rotations will also cause the network to adapt to differences in camera settings.

Augmentation has been used with great success for classification of diabetic retinopathy [4], which also uses fundus images. Data augmentation was also a core concept in AlexNet [13].
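A sketch of how featurewise normalization and flips can be configured with Keras' ImageDataGenerator (Keras is used for training in section 3.4; `x_train`, `y_train` and a compiled `model` are assumed to exist, and histogram equalization would need a custom `preprocessing_function`, since the generator does not provide it directly):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=True,             # subtract the per-pixel dataset mean
    featurewise_std_normalization=True,  # divide by the per-pixel std deviation
    horizontal_flip=True,                # random mirroring
    vertical_flip=True,
)
datagen.fit(x_train)                     # compute the dataset statistics first
model.fit_generator(datagen.flow(x_train, y_train, batch_size=64), epochs=100)
```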

2.3.6 Output (softmax)

The last step of a CNN is usually to apply a softmax function, softmax(z)_i = exp(z_i) / Σ_j exp(z_j), which limits the probability of each class to [0, 1] and makes all class probabilities sum to 1. The highest output of the softmax function corresponds to the class with the highest probability.

2.3.7 Backpropagation

Backpropagation is a method commonly used to update the weights of a deep neural network.

The four steps of the backpropagation algorithm:

• In the forward pass, a training data point is fed through the network and the output is calculated.


• The loss function calculates the difference between the network output and the expected output, usually using the mean-square error. The goal is to minimize the loss by updating the weights and biases.

• Working backwards, the error contribution is calculated for each neuron in the network. The partial derivative of the loss l_i with respect to the weight w_i indicates each weight's contribution to the output error:

∂l_i/∂w_i    (2.4)

• The weight update in the last step depends on the gradients calculated in the previous step:

w = w_i − η · ∂l_i/∂w_i    (2.5)

with w_i being the previous weight and η the learning rate.

The learning rate is determined by the user. A higher learning rate means faster convergence, but also a risk of overcorrecting and ending up past the optimum. A lower learning rate risks getting caught in local minima and reduces training speed.

Using a single datapoint when calculating gradient descent is called stochastic gradient descent (SGD).

In order to speed up the process, a method called mini-batch gradient descent is used: the gradients for a whole batch are calculated concurrently and the weights are updated with the mean of the gradients. This results in a smoother gradient; however, the noise of a smaller batch size can also help with overcoming local minima.

2.3.8 Backprop optimizers

Using a standard SGD optimizer can sometimes lead to the weights ending up in local minima; it also risks getting stuck in saddle points. To avoid these risks, a momentum term can be added that accelerates the gradients past local minima and saddle points. Momentum m adds a fraction γ of the update vector v from the previous update; the momentum vector is then added to the update vector for the current step. ∇_θJ(θ) denotes the gradient of the objective function, in this case the training loss, and η is the learning rate:

m_t = γ v_{t−1}    (2.6)

v_t = η ∇_θ J(θ) + m_t    (2.7)

Features that occur frequently will also be updated frequently during training, which can cause their values to fluctuate heavily. Adaptive optimizers aim to solve this problem by adjusting the learning rate based on how frequently a feature occurs during training. Adagrad, Adadelta, Nadam, Adam, Adamax and RMSProp are examples of such optimizers; they generally differ in how they store the past gradients.
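A NumPy sketch of one mini-batch update with momentum, directly following equations (2.6) and (2.7); the gradient values are illustrative:

```python
import numpy as np

def momentum_step(w, grad, v_prev, lr=0.01, gamma=0.9):
    """One SGD-with-momentum update following eqs. (2.6)-(2.7)."""
    m = gamma * v_prev   # m_t = gamma * v_{t-1}
    v = lr * grad + m    # v_t = eta * grad + m_t
    return w - v, v

w, v = np.zeros(4), np.zeros(4)
batch_grads = np.random.randn(16, 4)                  # per-example gradients
w, v = momentum_step(w, batch_grads.mean(axis=0), v)  # mini-batch: mean gradient
```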


2.4 Computer vision benchmarking

Figure 2.7: Example ImageNet training data.

The ImageNet challenge is a classification test that has been the de facto benchmark for image classification and computer vision [11]. The participants train their networks on over one million labeled training samples and are evaluated on a test set of unlabeled data. Performance is measured by the top-five error rate: the fraction of images where the ground-truth label is not among the top five predictions. The test covers 150,000 images and 1000 classes. Example images are shown in figure 2.7.

The first big breakthrough in image classification came when AlexNet reached a top-five error rate of 15.4% in 2012 [13]. Since then, results on the ImageNet challenge have improved drastically, with networks such as GoogLeNet (2014) reaching a top-five error rate of only 6.7% [22] and Microsoft's ResNet reaching 3.6% [9].

2.5 Notable ImageNet networks

2.5.1 AlexNet

AlexNet was the first CNN to win the prestigious ImageNet challenge, in 2012, and its paper, titled "ImageNet Classification with Deep Convolutional Neural Networks", is regarded as one of the most influential in the field, with over 21,000 citations as of 2018.

AlexNet uses five convolutional layers, max-pooling, and three fully connected layers with dropout and softmax. The layout can be seen in figure 2.8. AlexNet also augments the dataset by mirroring and cropping the data: by taking five 224×224 patches of the 256×256 input data as well as their horizontal reflections, overfitting is substantially reduced. The prediction is averaged over the ten input images in the softmax layer.

Since object identity is invariant to changes in the intensity and colour of the illumination, the balance of each pixel is adjusted.


Figure 2.8: AlexNet pipeline. Source: cv-tricks.com

2.5.2 VGG Net

In 2014 VGG Net was introduced. While it only got second place in the ImageNet challenge that year, the network provided valuable insights about filter sizes and network depth. Simonyan et al. [21] argued that by using smaller filters, the receptive field can be kept large by increasing the depth of the network instead: for example, two stacked convolutional layers with 3×3 filters still have a 5×5 receptive field. This greatly reduces the number of parameters while keeping an effective receptive field of 5×5, and can be compared to AlexNet, which used filters as big as 11×11.
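To make the saving concrete (ignoring biases, with C channels into and out of each layer): two stacked 3×3 layers use 2 · 3 · 3 · C · C = 18C² weights, while a single 5×5 layer uses 5 · 5 · C · C = 25C², so the stacked layers cover the same 5×5 receptive field with 28% fewer weights, and gain an extra non-linearity in between.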

2.5.3 GoogLeNet

GoogLeNet was the winner of the 2014 ImageNet challenge. GoogLeNet is based on the Inception architecture and introduces the inception module, a way to string different convolutional and max-pooling layers together while keeping the number of parameters and the memory usage low [22].

For example, GoogLeNet uses 12 times fewer parameters than AlexNet [22] while still having a significantly lower top-five error rate. GoogLeNet is therefore fit for real-world cases where there are limits on the computational power and time that can be spent on training.

The Inception-v3 pipeline is displayed in figure 2.9. Notice how the network consists of a set of bigger modules that perform multiple separate convolutions.


Figure 2.9: Inception v3 network pipeline. Source: aws.amazon.com

Figure 2.10: Inception module. (a) Naive implementation. (b) Implementation with dimensionality reduction.

At each step of a CNN, the network designer has to choose whether to do a convolution or a downsampling, and what receptive field size and other parameters to use. GoogLeNet sidesteps this choice by implementing an inception module, a small building block that does 1×1, 3×3 and 5×5 convolutions and 3×3 max-pooling at the same time; see figure 2.10.

However, looking at figure 2.10a, the output volume would become extremely deep. To solve this, 1×1 convolutions are added. These convolutions are a feature-pooling technique that causes a dimensionality reduction by merging the filter layers. For example, given a 100×100×60 input (100×100 in the spatial dimensions and 60 filters), a 1×1 convolution with 20 filters results in a 100×100×20 output.


1×1 convolutions can thus be thought of as feature pooling: instead of reducing the spatial dimensions, as in max-pooling, we reduce the depth by pooling the filters together.

By reducing the depth of the output volume, the memory requirement is lowered. This allows us to, for example, work with larger images or less hardware.

The last step of each inception module is the filter concatenation layer, which concatenates the outputs from the convolutional layers along the depth dimension. For the convolutional operations the image is padded so that patterns in the corners and edges of the map can be detected; this also preserves the spatial dimensions. For example, for a 3×3 convolution the input volume is padded with two extra pixels per dimension (one on each edge). Since all operations on each branch of the inception module use a stride of one, the spatial dimensions are preserved, which allows all of the layers to be concatenated.
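A minimal Keras sketch of the dimensionality-reduced module of figure 2.10b (the input shape and filter counts are illustrative, not GoogLeNet's actual configuration):

```python
from keras.layers import Conv2D, Input, MaxPooling2D, concatenate
from keras.models import Model

inputs = Input(shape=(32, 32, 60))
b1 = Conv2D(20, (1, 1), padding='same', activation='relu')(inputs)
b2 = Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)  # reduce depth
b2 = Conv2D(24, (3, 3), padding='same', activation='relu')(b2)
b3 = Conv2D(4, (1, 1), padding='same', activation='relu')(inputs)   # reduce depth
b3 = Conv2D(8, (5, 5), padding='same', activation='relu')(b3)
b4 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(inputs)
b4 = Conv2D(8, (1, 1), padding='same', activation='relu')(b4)
# Stride 1 and 'same' padding preserve the spatial dims, so depths concatenate.
module = Model(inputs, concatenate([b1, b2, b3, b4]))  # output: (32, 32, 60)
```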

2.5.4 Microsoft ResNet

Following GoogLeNet's success in 2014, a team at Microsoft was able to push their network even further. ResNet is a 152-layer network that achieved a record-breaking 3.6% top-five error rate in the 2015 rendition of the ImageNet challenge.

Before ResNet it was uncommon for CNNs to be this deep, GoogLeNet's 22 layers being an example. This is mainly due to the vanishing gradient problem, which can make the weight updates in backpropagation vanishingly small, since they are proportional to the gradient of the error. ResNet solves this by introducing the residual block (figure 2.11), also called an identity shortcut connection. The residual block passes the input along with the output of each block; this allows the gradient to flow through the network, since the addition operation distributes the gradient.

Figure 2.11: Residual block. Source: towardsdatascience.com
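A minimal Keras sketch of an identity shortcut connection (filter counts illustrative): the block's input is added to its convolutional output, so during backpropagation the gradient flows unchanged through the addition.

```python
from keras.layers import Activation, Conv2D, Input, add
from keras.models import Model

inputs = Input(shape=(32, 32, 64))
x = Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
x = Conv2D(64, (3, 3), padding='same')(x)
x = add([x, inputs])               # identity shortcut: output = F(x) + x
outputs = Activation('relu')(x)
block = Model(inputs, outputs)
```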

2.6 Evaluation

In order to evaluate a binary classification model, it is not sufficient to measure only the accuracy of correctly classified samples, as in the ImageNet challenge. The reason is that for each classification there are four outcomes, depending on whether the patient is healthy and whether or not the classification was correct.

Sensitivity, or true positive rate, is the probability of a positive test given that the patient has the disease; a high sensitivity therefore means that the network is good at predicting when disease is present. Conversely, specificity, or true negative rate, denotes the probability of a negative test given that the patient is healthy; a high specificity means that the network is good at predicting when the patient is healthy.

sensitivity = true positives / positives    (2.8)

specificity = true negatives / negatives    (2.9)

By setting a lower bound on the confidence of each prediction, it is possible to choose the sensitivity of the network. Adjusting this threshold also changes the specificity, in the opposite direction. This is often illustrated using a receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate, i.e. 1 − true negative rate.
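As a sketch of how these quantities can be computed (assuming scikit-learn is available; the labels and scores are illustrative, not the thesis data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # ground truth
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])  # predicted P(ill)

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
print("AUC:", roc_auc_score(y_true, y_score))

# Choose the first threshold whose sensitivity (TPR) reaches a target level,
# then read off the specificity (1 - FPR) at that operating point.
idx = np.argmax(tpr >= 0.96)
print("sensitivity:", tpr[idx], "specificity:", 1 - fpr[idx])
```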

2.7 Related work

Similar models have been constructed for other eye-related diagnoses. Gulshan et al. [23] implemented the Inception-v3 network for grading diabetic retinopathy severity in fundus photographs. The dataset consisted of 128,175 fundus images, each graded three to seven times by ophthalmologists. The two validation sets of 9963 and 1748 images displayed high sensitivity and specificity for grading of diabetic retinopathy and related eye diseases. As seen in figure 2.12, the performance converges at around ∼60,000 images, with more than 50% specificity and 97% sensitivity at only a few thousand training samples. This demonstrates the power of the Inception network also for smaller datasets.

Figure 2.12: Specificity at each dataset size, relative to the specificity at maximum dataset size, for diabetic retinopathy grading [6].


Esteva et al. [6] trained an Inception-v3 network using ∼130,000 images of skin lesions in an attempt to detect skin cancer. The network was able to achieve performance on par with dermatologists at classification. The network was trained on a total of 757 classes of benign and malignant tumors and naevi.

Deep convolutional networks have also been implemented for detection of other cancers; notable mentions are breast cancer diagnosis [3], lung cancer diagnosis [19] and brain tumor segmentation [14].

2.8 Relevance for this work

Using convolutional networks, the aim is to create a model that can identify the key risk factors described in section 2.1. A baseline model created using convolutional, pooling and fully connected layers will be compared against an Inception-v3 and a ResNet network.

The fully connected layers will conclude which classification is most appropriate based on the risk factors and patterns recognized by the convolutional layers. Since Gulshan et al. [23] achieved more than 50% specificity at 97% sensitivity when classifying diabetic retinopathy after training on fewer than 5000 samples, it is reasonable to believe that generalization of uveal melanoma classification is possible using more than 8000 fundus samples.

All networks will be trained and evaluated using different image augmentations (described in section 2.3.5), both to artificially enlarge the dataset with more training samples and to normalize the images to compensate for environmental factors. Since images are sent to S:t Erik Eye Hospital from all over the country, camera configurations vary greatly (see figure 2.2 for examples); image normalization can therefore compensate for these differences.


Chapter 3

Method

This section describes the tools used when training and evaluating the models. It also covers how data collection was done, what data was used, and how the models were evaluated.

3.1 Dataset

The medical journal of each patient was examined to get the proper diagnosis associated with the patient's fundus images. While the classification in the journal usually belonged to one of three categories (benign naevus, atypical naevus or malignant melanoma), many patients had disease that the doctors could not diagnose; these pictures were re-examined by a doctor at the hospital and categorized accordingly. A total of 2387 medical journals were processed and 27,996 pictures were manually downloaded from the hospital database. For patients with uveal melanoma there are pictures from both before and after treatment, but only pictures taken before treatment are used in this thesis.

Some of the patients with melanoma have had earlier pictures taken when doctors deemed the tumor to be benign. In some cases the doctors can see progression in the naevus but wait a few months until re-examination before officially declaring and treating it as a melanoma. Since the goal of the study is early discovery of uveal melanoma, the hypothesis is that classifying early photos that eventually lead up to melanoma as atypical naevi can have a positive effect on accuracy.

All post-treatment images were removed, as well as blurry, black and greyscale images. Of the total 27,996 images, 9289 were used in the study.

3.2 Augmentations

The following three augmentations were performed:

• Flips (90 degree rotations and mirroring)


• Mean correction (featurewise centering + featurewise standard normalization)

• Histogram equalization

Flips are done at random with an equal chance for each of the eight distinct permutations.

The average and standard deviation of each pixel were calculated for each colour channel over the whole dataset, pictured in figure 3.1. As seen in figure 3.1b, the variation is very high at the edges of the image. This is because the dataset contains images from many different cameras with varying angles of view and sizes.

Figure 3.1: Mean correction over the dataset. (a) Average. (b) Standard deviation.

Since the pictures are rectangular, they were center-cropped to remove as much black noise as possible. This was done by creating a square mask in the center of the image and removing the left- and right-hand sides of it.
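A minimal NumPy sketch of the center crop (assuming landscape images, i.e. height ≤ width; the image dimensions are illustrative):

```python
import numpy as np

def center_crop_square(image):
    """Crop a (height, width, channels) image to a centered height x height
    square by trimming the left- and right-hand sides."""
    h, w = image.shape[:2]
    left = (w - h) // 2
    return image[:, left:left + h]

print(center_crop_square(np.zeros((768, 1024, 3))).shape)  # (768, 768, 3)
```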

3.3 Training, testing & validation

The dataset was split into three subsets: training, validation and test (holdout), with 10% of the patients put into the test set. A histogram over pictures per patient was drawn for both holdout and non-holdout patients to make sure that the distribution was even; see figure 3.2. Furthermore, to make sure that the holdout patients had a distribution of diseases comparable to the non-holdout patients, the list of patients was randomly re-drawn until it had a reasonable spread; see table 3.1.


Figure 3.2: Pictures per patient histogram.

Set                Benign       Atypical     Malignant
Train/validation   3640 (39%)   1922 (21%)   3727 (40%)
Holdout            452 (43%)    125 (12%)    485 (46%)

Table 3.1: Quantity of images of each class in train/validation and holdout sets.

The training set is used solely for training the network. Afterwards, prediction is done on the validation set in order to test the network on data it has not previously seen.

Since these two datasets are used in conjunction multiple times, a hold-out set of previously unused data is used to validate the results once the accuracy on the validation data is deemed satisfactory.

10% of all patients were put in the hold-out set. Of the remaining patients, 10% were used for validation and the rest for training.

Because of the imbalance between the two classes of data (4092 benign, 6259 malignant/atypical), a penalty term was added to the loss function. This reduces the network's tendency to over-predict classes that are overrepresented. The penalty term is equal to the overrepresentation of the class: 1.52 for the malignant/atypical class and 1.00 for the benign class (since it is not overrepresented).
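In Keras this penalty can be expressed as per-class weights passed to `fit`, which scale each class's contribution to the loss; a sketch assuming label 0 is benign and label 1 malignant/atypical, with `model`, `x_train`, `y_train`, `x_val` and `y_val` defined elsewhere:

```python
class_weight = {0: 1.00, 1: 1.52}   # penalty equal to the overrepresentation
model.fit(x_train, y_train,
          batch_size=64, epochs=100,
          validation_data=(x_val, y_val),
          class_weight=class_weight)
```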

3.4 Models

Below, the design of the baseline model is outlined. This baseline model is compared to an Inception-v3 and a ResNet model, using the augmentations and gradient descent optimizer that performed best on the baseline model.

The two state-of-the-art networks were trained using the default parameters of Keras.


3.4.1 Baseline

Figure 3.3: Network layer topology of the baseline model

A baseline model was constructed to be able to compare against the more complex and deeper state-of-the-art models. By testing different augmentations and gradient descent optimizers on it, it is also possible to better conclude the optimal hyperparameters for the state-of-the-art models.

The network topology, the type of each layer and the input and output tensor shapes are displayed on the left. The first element of the input tensor is the batch size, followed by the data dimensions: the two-dimensional image data with three colour channels as fed to the network. The model has a total of five 3×3 convolutional layers with 2×2 max-pooling in between each of them. The first convolutional layer has 32 filters, and the following layers have 64 filters each. After the convolution and subsampling, all of the outputs are flattened to a one-dimensional array of 2304 nodes, which is fed to a fully connected dense layer. The last step is a classification layer with two outputs corresponding to the two classes.

The model has 10% dropout in the convolutional layers and 50% dropout in the fully connected layer.

The baseline model has a total of 425,474 parameters, compared to Inception-v3's 22 million and ResNet's 24 million. With more than fifty times fewer parameters, training is much faster, making it possible to approximate a good network configuration for the state-of-the-art models.
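A Keras sketch of a topology consistent with this description. The flattened output provides the 2304 nodes; the 128-unit hidden layer is an assumption, chosen because it reproduces the stated total of 425,474 parameters:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.1))
for _ in range(4):                          # four more 3x3 layers, 64 filters
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.1))
model.add(Flatten())                        # 6*6*64 = 2304 nodes
model.add(Dense(128, activation='relu'))    # assumed hidden width
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))   # benign vs malignant/atypical
model.compile(optimizer='adamax', loss='categorical_crossentropy',
              metrics=['accuracy'])
print(model.count_params())                 # 425474
```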

3.4.2 Optimizer

An empirical study was performed to settle the choice of gradient descent optimizer. The baseline model was trained (without augmentations) using the optimizers in list 3.1, with the default learning rate values from Keras.


The optimizer with the best training loss after 100 epochs was chosen for both the baseline and the complex models.

• SGD (stochastic gradient descent)
• RMSProp
• Adamax
• Adadelta
• Adam
• Nadam
• Adagrad

List 3.1: Keras optimizers

To determine whether the optimizer's learning rate was the best suited for this dataset, it was compared to a few other configurations: the default learning rate for the best optimizer was increased and decreased both two- and ten-fold.
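A sketch of the optimizer sweep (`build_baseline` is a hypothetical helper constructing the model above; `x_train` and `y_train` are assumed to exist):

```python
from keras.optimizers import (SGD, RMSprop, Adamax, Adadelta,
                              Adam, Nadam, Adagrad)

results = {}
for opt in [SGD(), RMSprop(), Adamax(), Adadelta(), Adam(), Nadam(), Adagrad()]:
    model = build_baseline()                     # fresh model per optimizer
    model.compile(optimizer=opt, loss='categorical_crossentropy')
    history = model.fit(x_train, y_train, batch_size=64, epochs=100, verbose=0)
    results[type(opt).__name__] = history.history['loss'][-1]
print(min(results, key=results.get))             # Adamax won in this study
```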

3.5 Network setup

All networks were trained on an Ubuntu 18.10 computer with a GeForce GTX 1060 3GB graphics card using the CUDA 9.0 driver.

The baseline network was trained with a batch size of 64; the state-of-the-art networks were trained with a batch size of 16. With more VRAM it would have been possible to use equal batch sizes.

All networks trained for 100 epochs.

3.6 Ethics

The network does not use any data that could connect a patient's age, ethnicity or gender to the images. All personal identification numbers are stored at S:t Erik Eye Hospital, which reduces the risk of an attacker getting hold of both images and personal identification numbers and thereby identifying patients with any of the covered diseases.


Chapter 4

Results

4.1 Baseline model

4.1.1 Optimizer

The model was trained using all gradient descent optimizers currently implemented in Keras. The best training loss was achieved using Adamax, which was therefore used throughout the rest of the study for both the baseline and the state-of-the-art models.

Figure 4.1: Training loss comparison of state-of-the-art gradient descent optimizers

4.1.2 Learning rate

The optimizers were configured using their default learning rate values, which are recommended either by the papers' authors or by the Keras team [15]. For Adamax the default learning rate is 2e−3.

Figure 4.2 displays the training loss during 100 epochs. The 2e−2 configuration was omitted, as it had a constant loss above 6. In the end, 2e−3 resulted in the lowest loss.

Figure 4.2: Training loss comparison of learning rate configurations for the Adamax optimizer

Flipping and mirroring the images increases the size of the dataset by creating new variants. This naturally causes the network to converge more slowly, because it has to adapt to more data. It also delays the overfitting on the validation data, as seen in figure 4.4. However, the network has difficulties generalizing on the flipped data: the loss stays approximately the same throughout training and does not show a clear arc.


Figure 4.3: ROC curves for different combinations of augmentations using the baseline model.

Since mean correction with flips produced the best AUC value (see table 4.1), that augmentation was used for the state-of-the-art networks and the final comparison.

Augmentations                          Val. loss   Val. acc.   AUC
No augmentations                       0.579       71.6%       0.789
Flips                                  0.524       73.6%       0.820
Mean correction                        0.522       72.8%       0.817
Mean correction & flips                0.519       75.3%       0.824
Histogram                              0.576       71.5%       0.770
Histogram & flips                      0.559       75.3%       0.808
Histogram & mean correction & flips    0.585       70.8%       0.770

Table 4.1: Validation loss, validation accuracy and AUC rating of the baselinemodels.


Figure 4.4: Training and validation loss for networks with and without flips.

4.2 Baseline and state-of-the-art comparison

Figure 4.5 displays the ROC curves for the baseline, Inception-v3 and ResNet models. As shown in table 4.2, the state-of-the-art networks achieve better AUC than the baseline, with Inception-v3 achieving a 0.912 AUC rating.

Figure 4.5: ROC curve comparison between baseline and state-of-the-art models.

Model          Val. loss   Val. acc.   AUC
Baseline       0.519       75.3%       0.824
Inception-v3   0.377       83.7%       0.912
ResNet         0.383       81.1%       0.911

Table 4.2: Validation loss, validation accuracy and AUC rating comparison for baseline and state-of-the-art models.

In figure 4.6 the sensitivity is set at 96%, the same sensitivity as ophthalmologists [7]. At this sensitivity the network achieves a specificity of 59%, meaning that it can filter out 59% of healthy patients, significantly reducing the amount of time doctors need to spend diagnosing melanomas.

Figure 4.6: ROC curve of the Inception-v3 network with mean correction and flips.


Chapter 5

Discussion

The goal of this study was to create a CNN that can identify uveal melanoma in fundus images of patients with uveal naevi. Three different neural networks were evaluated, combined with different configurations of image augmentations. The winning network, Inception-v3, achieved a 0.912 AUC rating with a validation loss of 0.377.

5.1 Optimizer and learning rate

It was assumed that the optimizer and the learning rate are more strongly tied to the input data than to the network configuration. This hypothesis was not explored during the thesis, so it is possible that another optimizer configuration would be optimal for the state-of-the-art networks rather than the default Adamax configuration used throughout.

Since adaptive optimizers throttle the learning rate of frequently updated features, they usually improve learning on sparse data, where large parts of the image are black. The data used in the thesis is not very sparse; however, since the lesion is often local to a small part of the fundus, there are many features that are only occasionally updated. This could explain why the adaptive optimizers performed better than SGD and why it is still justified to use Adamax for optimization of the state-of-the-art networks.

SGD is the only non-adaptive optimizer in the set, which is probably why it was outperformed. It is uncertain why RMSProp performed so badly, since it is very similar to Adadelta; it might have gotten stuck in a local minimum.

5.2 Randomness in results

Despite initializing the networks with the same weights, there are still many stochastic variables that affect the result. These variables can make it difficult to evaluate the model when the validation data is not large enough. For example, the dropout layer is stochastic, and so is the shuffling of the batches that are fed into the network.

Even if all stochastic elements are removed, it is still difficult to evaluate models when the differences in loss and AUC are so small; a network can perform well on a specific train/validation split simply due to good luck. Because of the limited resources available, it was not possible to cross-validate using multiple splits of the dataset. This would however be an improvement, since it would help in distinguishing the performance of the various networks and augmentation configurations.

5.3 Image cropping

The images were center-cropped, inspired by the augmentations performed by the top-scoring networks of the diabetic retinopathy challenge on Kaggle [5]. However, when diagnosing diabetic retinopathy it is not necessary to include the corners of the eye, so those networks are more concerned with reducing noise, whereas melanomas can be present anywhere in the fundus image. An improvement would be to calculate the vertical edges of the fundus and crop accordingly, after which the image can be resized. Another solution is an elliptical crop of the fundus. This however loses information, as the fundus is never fully elliptical; most cameras crop the picture at the top and bottom. On the other hand, an elliptical crop yields a perfect circle that can be rotated at an arbitrary angle to bootstrap the dataset.

5.4 Batch size limitation

Because of the limited hardware, the batch sizes for the state-of-the-art networks had to be at most 16. This caused the gradients to be very noisy and resulted in unnecessarily large weight updates. Adaptive gradient descent optimization helped diminish the effect of the noisy gradients to some degree, but it would be an improvement to train on varying batch sizes and compare the results. Better hardware would also make it possible to evaluate optimizers and augmentations on the state-of-the-art networks directly. If better hardware is not available, it is also possible to compute the gradients from multiple batches and average them, obtaining a smoother gradient with a smaller memory footprint.
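A sketch of that gradient-averaging idea using TensorFlow's GradientTape (assuming a TensorFlow/Keras setup; `model` and `small_batches` are placeholders): gradients for k batches of 16 are accumulated and applied as one step, emulating a batch of 16k within the same memory budget.

```python
import tensorflow as tf

k = 4                                                # emulate batch size 64
optimizer = tf.keras.optimizers.Adamax()
accum = [tf.zeros_like(v) for v in model.trainable_variables]
for x, y in small_batches[:k]:                       # k batches of size 16
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(
            tf.keras.losses.categorical_crossentropy(y, model(x)))
    grads = tape.gradient(loss, model.trainable_variables)
    accum = [a + g / k for a, g in zip(accum, grads)]  # running mean of grads
optimizer.apply_gradients(zip(accum, model.trainable_variables))
```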

The image size used in the thesis was 256×256, except for the ResNet network, which required 224×224 images. This, together with the simplicity of the baseline model, made it possible to train around 10 networks per day. However, the GoogLeNet implementation of the Inception-v3 network uses 299×299 images; testing that size could perhaps increase the accuracy of these networks. Furthermore, the winner of the diabetic retinopathy challenge on Kaggle used images of size 540×540. Experimenting with other image sizes could be a reasonable next step to decrease validation loss by increasing image resolution.


5.5 Image augmentations

Flips and mean correction both increased AUC compared to no augmentations, while histogram equalization generated worse results. It is unclear why histogram equalization performed worse, but one theory is that equalizing the colour intensities enhances differences in image setting. Since the images have different lighting and colour balancing, and different parts of the eye react differently to light, equalizing the colour histogram may lead to an even greater colour disparity between same-type lesions and naevi.
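
For reference, a minimal sketch of per-image histogram equalization of the kind discussed above (the exact preprocessing used in the experiments may differ); equalizing only the luminance channel is one way to limit hue shifts:

    import cv2
    import numpy as np

    def equalize_fundus(bgr: np.ndarray) -> np.ndarray:
        # Equalize only the luminance (Y) channel so the hue channels
        # stay untouched; equalizing R, G, B independently shifts colours.
        ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
        ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
        return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)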

5.6 Accuracy and AUC

The state-of-the-art models were unable to achieve accuracies over 90%. One reason could be that these types of networks are designed for ImageNet classification with 1000 classes. Only two classes are used in this thesis, so the fully connected layers could contain too many nodes, leading to overfitting very early. By reducing the number of nodes in the fully connected layer it is probably possible to reduce the overfitting.
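
A minimal sketch of such a reduced head on top of an Inception-v3 base (the 64-node width is an arbitrary illustrative choice, not a value from the experiments):

    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
    from tensorflow.keras.models import Model

    # Replace the 1000-class ImageNet head with a much smaller binary head.
    base = InceptionV3(include_top=False, weights=None, input_shape=(256, 256, 3))
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(64, activation="relu")(x)         # far fewer nodes than the original head
    output = Dense(1, activation="sigmoid")(x)  # benign vs. malignant
    model = Model(base.input, output)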

Another large contributing factor could be that there is not enough data to sufficiently train the weights in the convolutional layers. Burlina et al.[1] propose a solution that uses a model pretrained on ImageNet, but instead of a fully connected layer it trains a support vector machine for the final prediction, while not training the convolutional layers at all. This reduces the number of trainable parameters, under the assumption that the convolutional patterns remain useful when predicting on fundus images.
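
A minimal sketch of that scheme, assuming preprocessed image arrays `x_train`, `x_val` and labels `y_train` (names are placeholders): the frozen convolutional base acts as a fixed feature extractor, and a scikit-learn SVM handles the classification.

    from sklearn.svm import SVC
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.layers import GlobalAveragePooling2D
    from tensorflow.keras.models import Model

    # Frozen ImageNet features; no convolutional weights are trained.
    base = InceptionV3(include_top=False, weights="imagenet", input_shape=(256, 256, 3))
    extractor = Model(base.input, GlobalAveragePooling2D()(base.output))

    train_features = extractor.predict(x_train)  # x_train, y_train assumed given
    val_features = extractor.predict(x_val)

    svm = SVC(kernel="rbf", probability=True).fit(train_features, y_train)
    val_scores = svm.predict_proba(val_features)[:, 1]  # malignancy probability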

Naevi are more frequently detected in passing today, since fundus photography has become a more regular part of patient care. Even opticians offer fundus photography as part of an eye exam to detect eye diseases. The introduction of early filtering of patients using a neural network could therefore prove beneficial in relieving pressure on eye hospitals.

A specificity of almost 60% at 95% sensitivity means that this network could be used to filter out many patients who do not need a manual examination by a doctor, which today is a tedious task. The network has therefore been successful at generalizing the task of identifying uveal melanoma, since it can filter out more than half of the patients at an ophthalmologist's accuracy, proving that CNNs are capable of detecting the distinct differences between naevi and melanomas. Considering that there is no alternative today except manual screening, this could save hospitals both time and money.
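
A minimal sketch of how such an operating point can be read off a ROC curve with scikit-learn, given true labels `y_true` and predicted malignancy scores `y_score` (names assumed):

    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    # y_true (0 = benign, 1 = malignant) and y_score are assumed given.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = np.argmax(tpr >= 0.95)   # first threshold reaching 95% sensitivity
    specificity = 1.0 - fpr[idx]
    print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")
    print(f"Specificity at 95% sensitivity: {specificity:.1%}")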

5.7 Models

The state-of-the-art models performed better than the baseline model. More parameters and a deeper architecture mean that the models are capable of generalizing more complex tasks. However, they are much slower to train: Inception-v3 took approximately four hours to train, ResNet took six hours, and the baseline model took 45 minutes. Augmentations had very little to no effect on training time.

The small difference between the state-of-the-art networks is comparable to the small differences in their errors on ImageNet data. Inception-v3 overfits less, which could be because it has fewer parameters, or because of the inception module, which covers multiple convolutions of different sizes and could therefore adapt to more complex features in the input.
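
For illustration, a simplified sketch of an inception-style block (omitting the 1 × 1 bottleneck convolutions of the real module); the filter counts are placeholders:

    from tensorflow.keras.layers import Conv2D, MaxPooling2D, concatenate

    def inception_block(x, f1=64, f3=96, f5=32, fp=32):
        # Parallel convolutions with different receptive fields,
        # concatenated along the channel axis (simplified: no bottlenecks).
        b1 = Conv2D(f1, 1, padding="same", activation="relu")(x)
        b3 = Conv2D(f3, 3, padding="same", activation="relu")(x)
        b5 = Conv2D(f5, 5, padding="same", activation="relu")(x)
        bp = MaxPooling2D(3, strides=1, padding="same")(x)
        bp = Conv2D(fp, 1, padding="same", activation="relu")(bp)
        return concatenate([b1, b3, b5, bp])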

5.8 Economic, ecological and social sustainability

The results presented demonstrate the possibilities of artificial intelligence in healthcare. Neural networks are capable of reducing workload by acting as a primary filter before the patient is evaluated by a doctor. Since the accuracy of the models can be statistically verified given enough data, the patient does not have to worry about being wrongly diagnosed any more than they normally would. However, the integrity and security of these systems must be verified before they replace existing protocols, in order to ensure that the networks are risk-free.

For society the development of these networks has multiple positive effects. By decentralizing care, patients can get quick feedback independent of their location, which can be of great value for hospitals with countrywide responsibility for a specific disease. For example, S:t Erik Eye Hospital has the sole responsibility to diagnose and treat uveal melanomas in Sweden. This means that patients risk travelling to Stockholm only to be declared healthy, which could be avoided by a decentralized system. The reduction in travel can lead to a reduction in carbon footprint, reduced strain on roads and public transport, as well as money and time saved for both the patient and the healthcare provider.

Many powerful AI frameworks, such as Torch and Keras, have been developed in recent years, and it is now easier than ever to get access to supercomputing clusters. It is therefore possible for small actors to create and evaluate complex AI models given enough quality data. The results presented in this thesis are an indication of the power of these new tools, and of how even small datasets, with fewer than 10000 data points, can be used to create networks that, once implemented, can have a large societal impact.

5.9 Further work

In order to be more certain about the outcome of each test, all networks should be cross-validated using multiple folds of the same dataset, as sketched below. Introducing more data or increasing the size of the validation set from 10% would also improve the evaluation.
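
A minimal sketch of such a k-fold evaluation, assuming arrays `X`, `y` and a `build_model` factory that returns a freshly compiled network (all names are placeholders):

    import numpy as np
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    aucs = []
    for train_idx, val_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                              random_state=42).split(X, y):
        model = build_model()  # fresh weights for every fold
        model.fit(X[train_idx], y[train_idx], epochs=10, batch_size=16, verbose=0)
        aucs.append(roc_auc_score(y[val_idx], model.predict(X[val_idx]).ravel()))

    # Mean and spread across folds separate real gains from lucky splits.
    print(f"AUC: {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")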

Aggregating multiple learners into one could increase performance, either by ensembling multiple identical networks or by combining different networks that have their own independent strengths.
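
A minimal soft-voting sketch of that idea, assuming a list of trained Keras models:

    import numpy as np

    def ensemble_predict(models, images):
        # Average the predicted malignancy probabilities of the ensemble
        # members; members may be identical or heterogeneous networks.
        scores = np.stack([m.predict(images).ravel() for m in models])
        return scores.mean(axis=0)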

Using transfer learning could also prove beneficial, since the dataset may not be sufficiently large to train a network with more than 20 million parameters. One could, for example, use a model pre-trained on ImageNet data and only train the fully connected layer on the naevi data. Since the layers already have knowledge about image shapes and structures, it is plausible that the classifying layer would have sufficient information to reach even better accuracy, due to reduced overfitting from too many trainable parameters.
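
A minimal sketch of that setup, freezing the pretrained base so that only the new classification head is trained (layer sizes are illustrative):

    import tensorflow as tf
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
    from tensorflow.keras.models import Model

    base = InceptionV3(include_top=False, weights="imagenet", input_shape=(256, 256, 3))
    base.trainable = False  # keep the ImageNet convolutional features fixed

    x = GlobalAveragePooling2D()(base.output)
    output = Dense(1, activation="sigmoid")(x)  # only this head is trained
    model = Model(base.input, output)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])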

Chapter 6

Conclusion

In this thesis three convolutional networks were trained to classify fundus images containing uveal nevi and uveal melanomas. A baseline model was compared against Inception-v3 and ResNet, two state-of-the-art CNNs with high success rates at classifying ImageNet data.

All networks achieve an AUC above 0.8 after training on only 8360 samples, with Inception-v3 achieving the top AUC of 0.912. This shows that CNNs can successfully reduce the number of hours spent on diagnosing uveal melanoma by filtering out clear cases of uveal nevi early.

Bibliography

[1] P. Burlina et al. “Detection of age-related macular degeneration via deep learning”. In: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI). 2016, pp. 184–188.

[2] Dan C. Cireşan et al. “Flexible, High Performance Convolutional Neural Networks for Image Classification”. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two. IJCAI’11. Barcelona, Catalonia, Spain: AAAI Press, 2011, pp. 1237–1242.

[3] Angel Cruz-Roa et al. “Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks”. In: Medical Imaging 2014: Digital Pathology. Vol. 9041. International Society for Optics and Photonics, 2014, p. 904103.

[4] Jeffrey De Fauw. Detecting diabetic retinopathy in eye images. 2015. url: http://jeffreydf.github.io/diabetic-retinopathy-detection/#the-opening (visited on 04/02/2018).

[5] Diabetic Retinopathy Detection. url: https://www.kaggle.com/c/diabetic-retinopathy-detection (visited on 10/04/2018).

[6] Andre Esteva et al. “Dermatologist-level classification of skin cancer with deep neural networks”. In: Nature 542.7639 (2017), p. 115.

[7] Paul T. Finger. Choroidal Melanoma - New York Eye Cancer Center. url: https://eyecancer.com/eye-cancer/conditions/choroidal-tumors/choroidal-melanoma/ (visited on 07/09/2018).

[8] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. “Deep Sparse Rectifier Neural Networks”. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011, pp. 315–323.

[9] Kaiming He et al. “Deep Residual Learning for Image Recognition”. In: IEEE Conference on Computer Vision and Pattern Recognition 2016 (2015), pp. 770–778. arXiv: 1512.03385.

[10] Geoffrey E. Hinton et al. “Improving neural networks by preventing co-adaptation of feature detectors”. In: arXiv preprint (2012). arXiv: 1207.0580.

[11] ImageNet Large Scale Visual Recognition Competition (ILSVRC). url: http://www.image-net.org/challenges/LSVRC/ (visited on 03/02/2018).

[12] Swathi Kaliki, Carol L. Shields, and Jerry A. Shields. “Uveal melanoma: Estimating prognosis”. In: Indian Journal of Ophthalmology 63.2 (2015), pp. 93–102.

[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.

[14] Henrik Marklund. It’s a no-brainer! Deep learning for brain MR images. Medium. 2018. url: https://medium.com/stanford-ai-for-healthcare/its-a-no-brainer-deep-learning-for-brain-mr-images-f60116397472 (visited on 02/27/2018).

[15] Optimizers - Keras Documentation. url: https://keras.io/optimizers/ (visited on 06/18/2018).

[16] Keiron O’Shea and Ryan Nash. “An Introduction to Convolutional Neural Networks”. In: arXiv preprint (2015). arXiv: 1511.08458.

[17] Raul Rojas. Neural Networks: A Systematic Introduction. Springer Science & Business Media, 2013. 511 pp.

[18] Dominik Scherer, Andreas Müller, and Sven Behnke. “Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition”. In: Artificial Neural Networks – ICANN 2010. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, 2010, pp. 92–101.

[19] Wei Shen et al. “Multi-scale Convolutional Neural Networks for Lung Nodule Classification”. In: Information Processing in Medical Imaging. Lecture Notes in Computer Science. Springer, Cham, 2015, pp. 588–599.

[20] Carol L. Shields, Jane Grant Kels, and Jerry A. Shields. “Melanoma of the eye: revealing hidden secrets, one at a time”. In: Clinics in Dermatology 33.2 (2015), pp. 183–196.

[21] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”. In: arXiv preprint (2013). arXiv: 1312.6034.

[22] Christian Szegedy et al. “Going Deeper with Convolutions”. In: IEEE Conference on Computer Vision and Pattern Recognition 2015 (2014), pp. 1–9. arXiv: 1409.4842.

[23] Daniel Shu Wei Ting et al. “Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes”. In: JAMA 318.22 (2017), pp. 2211–2223.

[24] Myron Yanoff and Jay S. Duker. Ophthalmology. Elsevier Health Sciences, 2008. 1551 pp.
