Diagnosis of Alzheimer's Disease with Deep Learning, 2016. 7. 4, Seonho Park
Apr 15, 2017
1 / 42
Diagnosis of Alzheimer's Disease with Deep Learning
2016. 7. 4
Seonho Park
2 / 42
Outline
• Introduction to Machine Learning
• Convolutional Neural Network
• Diagnosing Alzheimer’s Disease
3 / 42
• Introduction to Machine Learning
• Convolutional Neural Network
• Diagnosing Alzheimer’s Disease
4 / 42
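Introduction to Machine Learning
Category of Machine Learning
• Supervised learning: classification, regression
• Unsupervised learning: clustering
[Figure: scatter plots for classification (x1 vs. x2), regression (x1 vs. y), and clustering (x1 vs. x2)]
• Supervised learning: data + labels (question + answer pairs) → machine learning training → machine learning model; new data (question + ???) → model → predicted answer (classification or regression)
• Unsupervised learning: unlabeled data → machine learning training → clustering (example items: Cat, Computer, Lion, Pencil, Pig)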
5 / 42
Introduction to Machine Learning
Scikit-Learn
• Machine learning library in Python
• http://scikit-learn.org/
• Classification: decision trees, SVM, NN
• Regression: GP, ordinary LS, ridge regression, SVR
• Clustering: k-means, spectral clustering
6 / 42
Introduction to Machine Learning
Why Deep Learning?
• Deep Learning = Deep Neural Network
• Data and Machine Learning
† http://cs229.stanford.edu/materials/CS229-DeepLearning.pdf
7 / 42
Introduction to Machine Learning
Artificial Neural Networks
• Training: find the weights (parameters)
• Inference: compute the output for a given input using the trained weights
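The two phases can be sketched for a single sigmoid neuron in pure Python. This is a minimal illustration, not the method used in the slides: the toy data (logical OR), the learning rate, the epoch count, and the function names are all assumptions.

```python
import math

def infer(weights, bias, x):
    """Inference: compute the output for input x using fixed weights."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, lr=0.5):
    """Training: find weights and bias by gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            # For a sigmoid output with log loss, dLoss/dz = prediction - label
            error = infer(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Toy data (assumed for illustration): logical OR
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
predictions = [round(infer(weights, bias, x)) for x, _ in samples]
```

After training, `infer` only reads the weights; it never changes them.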
8 / 42
Introduction to Machine Learning
Convolutional Neural Network (CNN)
• Image Processing (Computer Vision)
9 / 42
Introduction to Machine Learning
Recurrent Neural Network (RNN)
• Time series data
• Natural language processing
• Applications: machine translation, speech recognition, image caption generation
† Towards End-to-End Speech Recognition with Recurrent Neural Networks, Alex Graves et al (2014)
10 / 42
Introduction to Machine Learning
Why GPU?
• cuDNN: GPU-accelerated library of primitives for deep neural networks
• VRAM limitation, double/single/half precision
• Linear algebra: cuBLAS, MAGMA
11 / 42
Introduction to Machine Learning
Frameworks
• cuda-convnet
• Pylearn2
• Lasagne
12 / 42
Introduction to Machine Learning
Open Sources for Deep Learning
† Comparative Study of Deep Learning Software Frameworks, Soheil Bahrampour et al (2015)
13 / 42
Introduction to Machine Learning
Pioneers
• Yann LeCun
• Geoffrey Hinton
• Yoshua Bengio
• Andrew Ng
• Jürgen Schmidhuber
14 / 42
Introduction to Machine Learning
Applications
• Image recognition
• Speech recognition
• Auto caption
• Self-driving car
• Natural language processing
• Recommendation system
15 / 42
• Introduction to Machine Learning
• Convolutional Neural Network
• Diagnosing Alzheimer’s Disease
16 / 42
Convolutional Neural Network
Overview
• Classification
• Convolution operation + MLP
• Architecture
  • Convolutional layer (convolution operator, activation)
  • Subsampling (downsampling, pooling)
  • Fully connected layer
  • Classifier
17 / 42
Convolutional Neural Network
LeNet5†: Convolution Operation
• Digit recognition
• Weight matrix (filter): 4D tensor [# of features at layer m, # of features at layer m-1, height, width]
† Gradient-Based Learning Applied to Document Recognition, Yann LeCun et al (1998)
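The convolution operation for a single filter slice can be sketched in pure Python as follows. This is an illustrative version with no padding; the function name and the toy 4×4 input are assumptions. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh = (ih - kh) // stride + 1  # output height
    ow = (iw - kw) // stride + 1  # output width
    out = []
    for r in range(oh):
        row = []
        for c in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Toy input and 2x2 kernel (assumed for illustration)
image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d_valid(image, kernel)
# feature_map == [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

In the 4D weight tensor, one such height × width kernel exists for every pair of (feature at layer m, feature at layer m-1); the per-pair results are summed over the input features. With the same size rule, a 32×32 input and a 5×5 kernel give a 28×28 feature map.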
18 / 42
Convolutional Neural Network
Activation Function (Nonlinearity)
† Systematic evaluation of CNN advances on the ImageNet, Dmytro Mishkin, et al (2016)
19 / 42
Convolutional Neural Network
Pooling Layer
• Suppresses noise
• Reduces feature map size (saves memory)
† Systematic evaluation of CNN advances on the ImageNet, Dmytro Mishkin, et al (2016)
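How pooling reduces the feature map size can be sketched with max pooling in pure Python. The 2×2 window, the stride of 2, and the toy input are assumptions for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    oh = (len(feature_map) - size) // stride + 1
    ow = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(ow)
        ]
        for r in range(oh)
    ]

# Toy 4x4 feature map (assumed for illustration)
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]
pooled = max_pool2d(feature_map)
# pooled == [[4, 2], [2, 8]]
```

The 4×4 map shrinks to 2×2, so the next layer stores and processes a quarter of the values.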
20 / 42
Convolutional Neural Network
Training
• Error (loss) function: categorical cross entropy
• Design variables: weights (W), biases (b)
• Backpropagation, used in conjunction with an optimization method such as gradient descent
• Vanishing gradient
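The categorical cross entropy loss and the gradient that backpropagation starts from can be sketched as follows. The three logits and the target class are assumed toy values, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to class probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, target_index):
    """Loss for one example whose true class is target_index."""
    return -math.log(probs[target_index])

# Toy values (assumed for illustration)
logits = [2.0, 1.0, 0.1]
target = 0
probs = softmax(logits)
loss = categorical_cross_entropy(probs, target)

# Gradient of the loss with respect to the logits: probabilities minus one-hot target
grad = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
```

Backpropagation pushes `grad` backward through the layers with the chain rule to obtain the gradients for W and b.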
21 / 42
Convolutional Neural Network
Mini-Batch Method
• Computational efficiency
• Memory use
• Iteration and epoch
Vanilla Gradient Descent
Stochastic Gradient Descent
• Parameter update for each training example x(i) and label y(i)
• Step size (η) is typically set to 10^-3
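The relation between mini-batches, iterations, and epochs can be sketched with a one-parameter model. Everything here is an assumption for illustration: the toy data (y = 3x), the batch size of 5, and a step size of 0.1, which is larger than the typical 10^-3 so that the toy problem converges quickly.

```python
import random

def minibatches(data, batch_size, rng):
    """Shuffle once per epoch and yield consecutive mini-batches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

# Toy data (assumed): y = 3x, 20 examples
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(20)]
rng = random.Random(0)

w, eta = 0.0, 0.1
iterations = 0
for epoch in range(200):                      # one epoch = one pass over all data
    for batch in minibatches(data, 5, rng):   # one iteration = one mini-batch update
        # Gradient of the mean squared error over the mini-batch
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= eta * grad
        iterations += 1
```

With 20 examples and a batch size of 5, each epoch consists of 4 iterations. Vanilla gradient descent corresponds to a batch size of 20 here, and per-example stochastic gradient descent to a batch size of 1.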
22 / 42
Convolutional Neural Network
Training (Optimization)
• Update functions
• Second-order methods (L-BFGS) are not common in practice
• Nesterov accelerated gradient (NAG) is more standard
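Two common first-order update functions, classical momentum and NAG, can be sketched on a one-dimensional quadratic. The objective f(θ) = θ², the step size, and the momentum coefficient are assumptions for illustration.

```python
def grad(theta):
    """Gradient of the toy objective f(theta) = theta**2."""
    return 2.0 * theta

def momentum_step(theta, v, eta=0.1, gamma=0.9):
    """Classical momentum: the gradient is evaluated at the current point."""
    v = gamma * v + eta * grad(theta)
    return theta - v, v

def nag_step(theta, v, eta=0.1, gamma=0.9):
    """NAG: the gradient is evaluated at the look-ahead point theta - gamma * v."""
    v = gamma * v + eta * grad(theta - gamma * v)
    return theta - v, v

theta_m, v_m = 5.0, 0.0
theta_n, v_n = 5.0, 0.0
for _ in range(200):
    theta_m, v_m = momentum_step(theta_m, v_m)
    theta_n, v_n = nag_step(theta_n, v_n)
```

Both variants approach the minimum at 0. The only difference is where the gradient is evaluated, which lets NAG correct the momentum term before overshooting.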
23 / 42
Convolutional Neural Network
Overfitting and Regularization
• Dropout
• Relaxation: add a regularization term to the loss function
• Remove layers (reduce parameters), add features
† Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava et al (2014)
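Dropout and an added regularization term can be sketched as follows. Note the assumptions: this uses the "inverted dropout" variant, which rescales the surviving units during training (the cited paper instead scales the weights at test time), and the L2 form of the penalty, the drop probability, and all numeric values are illustrative.

```python
import random

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """Regularization term lam * sum(w^2) to be added to the loss."""
    return lam * sum(w * w for w in weights)

rng = random.Random(0)
activations = [1.0] * 1000
dropped = dropout(activations, 0.5, rng)

# Toy data loss of 0.25 plus the penalty on three toy weights (assumed values)
penalized_loss = 0.25 + l2_penalty([0.5, -1.0, 2.0], lam=0.01)
```

At inference time dropout is switched off and all units are used.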
24 / 42
Convolutional Neural Network
Local Optimum?
• Non-convex optimization problem
• A deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high-dimensional problems of practical interest
† Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Yann N. Dauphin et al (2014)
25 / 42
Convolutional Neural Network
Parallel Computation
• Architectural parallel: divide channels
• Data parallel: divide batch
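Data parallelism can be sketched as splitting one batch into shards, computing a gradient per device, and averaging. The device count, batch size, and gradient values below are assumptions for illustration; no real GPU API is used.

```python
def split_batch(batch, n_devices):
    """Divide a batch into n_devices nearly equal contiguous shards."""
    base, extra = divmod(len(batch), n_devices)
    shards, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards

def average_gradients(per_device_grads, shard_sizes):
    """Combine per-device gradients, weighted by the number of examples each saw."""
    total = sum(shard_sizes)
    n_params = len(per_device_grads[0])
    return [
        sum(g[k] * s for g, s in zip(per_device_grads, shard_sizes)) / total
        for k in range(n_params)
    ]

batch = list(range(80))                # indices of 80 examples (assumed batch size)
shards = split_batch(batch, 2)         # two devices (assumed)
# Hypothetical gradients reported by each device for a two-parameter model
avg = average_gradients([[1.0, 2.0], [3.0, 4.0]], [len(s) for s in shards])
```

Each device holds a full copy of the model and sees only its shard. Architectural parallelism instead splits the model itself, for example by channel, across devices.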
26 / 42
Convolutional Neural Network
ILSVRC
• ImageNet Large Scale Visual Recognition Challenge
• Evaluates algorithms for object detection and image classification at large scale
• Training: 1.3M images / Test: 100k images, 1000 categories
27 / 42
Convolutional Neural Network
AlexNet
• ILSVRC 2012, 1st place
• 15.3% top-5 error rate (2nd place achieved 26.2%)
• Architecture parallel (2 GPUs used)
† ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky et al. (2012)
28 / 42
Convolutional Neural Network
VGG Net
• Visual Geometry Group (University of Oxford), DeepMind
• ILSVRC 2014, 2nd place
• 6.8% error rate
† Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan et al. (2014)
29 / 42
Convolutional Neural Network
GoogLeNet
• Google
• Inception module
• ILSVRC 2014, 1st place
• 6.67% error rate
† Going Deeper with Convolutions, Christian Szegedy et al. (2014)
30 / 42
Convolutional Neural Network
MSRA
• Microsoft
• PReLU activation
• Weight initialization
• 4.94% error rate (surpasses human-level performance, 5.1%)
† Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, Kaiming He et al. (2015)
31 / 42
Convolutional Neural Network
Inception-v3
• Google
• Upgraded Inception module
• 50 GPUs
• 3.46% error rate
• Publicly available with TensorFlow
† Going Deeper with Convolutions, Christian Szegedy et al. (2015)
32 / 42
Convolutional Neural Network
Deep Neural Networks are Easily Fooled†
• It is possible to produce images that are totally unrecognizable to human eyes but that a DNN classifies with high confidence
• Reveals interesting differences between human vision and current DNNs
• Raises questions about the generality of DNN computer vision
† Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, A Nguyen et al (2014)
33 / 42
Convolutional Neural Network
Neural Style
• Style + content reconstruction
• Caffe framework
• https://github.com/jcjohnson/neural-style
† A Neural Algorithm of Artistic Style, Leon A. Gatys et al (2015)
34 / 42
• Introduction to Machine Learning
• Convolutional Neural Network
• Diagnosing Alzheimer’s Disease
35 / 42
Diagnosing Alzheimer’s Disease
Traditional Diagnosis of Alzheimer’s Disease
• Review of medical history
• Mini-Mental State Examination (MMSE)
• Physical exam
• Neurological exam
• Brain imaging: structural (MRI, CT), functional (fMRI)
• Diagnostic classes: NC (Normal Condition), MCI (Mild Cognitive Impairment), AD (Alzheimer’s Disease)
• AD: vascular / non-vascular
36 / 42
Diagnosing Alzheimer’s Disease
AD Patients’ MRI Features
• Temporal lobe: hippocampus
• Ventricle
37 / 42
Diagnosing Alzheimer’s Disease
Case Study: Machine Learning for Diagnosing AD
• PET, MRI images
• Patch extraction
• Restricted Boltzmann Machine
• Accuracy: 92.4% (MRI), 95.35% (MRI+PET)
† Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis, Heung-Il Suk et al (2014)
38 / 42
Diagnosing Alzheimer’s Disease
Case Study: Machine Learning for Diagnosing AD
• Feature: cortical thickness
• FreeSurfer
• Linear discriminant analysis (LDA)
• Performance: sensitivity 82%, specificity 93%
† Individual subject classification for Alzheimer’s disease based on incremental learning using a spatial frequency representation of cortical thickness data, Young-Sang Cho et al (2012)
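Sensitivity and specificity are computed from the confusion counts as sketched below. The counts are hypothetical, chosen only so that the rates match the reported 82% and 93%; the actual counts are not given in the slides.

```python
def sensitivity(tp, fn):
    """Fraction of actual AD cases that are detected (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual non-AD cases that are correctly cleared (true negative rate)."""
    return tn / (tn + fp)

# Hypothetical confusion counts for 100 AD and 100 non-AD subjects
tp, fn = 82, 18
tn, fp = 93, 7

sens = sensitivity(tp, fn)                 # 0.82
spec = specificity(tn, fp)                 # 0.93
acc = (tp + tn) / (tp + fn + tn + fp)      # overall accuracy for these counts
```

Overall accuracy depends on the class balance, which is why the two rates are reported separately.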
39 / 42
Diagnosing Alzheimer’s Disease
Preprocessing
• Data set: about 1,400 T1 MRI scans from SMC
• FreeSurfer skull stripping: reduces size [256,256,256] → [190,190,190] / 67 MB → 27 MB
• Pixel value normalization: [0,255] → [-1,1]
• Mirrored cropping
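The normalization and the size reduction can be checked with a short sketch. The linear mapping is the simplest one that sends [0,255] to [-1,1] and is an assumption about the exact formula; the 4 bytes per voxel is likewise an inference from the reported 67 MB and 27 MB, not something stated in the slides.

```python
def normalize_pixel(value):
    """Map an intensity in [0, 255] linearly to [-1, 1]."""
    return value / 127.5 - 1.0

low = normalize_pixel(0)      # -1.0
high = normalize_pixel(255)   #  1.0

# Volume sizes before and after skull stripping, assuming 4 bytes per voxel
BYTES_PER_VOXEL = 4  # assumption
size_before_mb = 256 ** 3 * BYTES_PER_VOXEL / 1e6   # about 67 MB
size_after_mb = 190 ** 3 * BYTES_PER_VOXEL / 1e6    # about 27 MB
voxel_ratio = 190 ** 3 / 256 ** 3                   # about 0.41
```

Cropping each axis from 256 to 190 keeps roughly 41% of the voxels, which matches the drop from 67 MB to 27 MB.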
40 / 42
Diagnosing Alzheimer’s Disease
Architecture
• CNN
• Lasagne (Theano) framework
• Inception module, batch normalization
• 3D convolution + cuDNN v3 (GitHub)
• 2 TITAN X GPUs: data parallel (PyCUDA)
• Batch size: 80
Data
• Training set: Healthy Condition (HC) 761, Alzheimer’s Disease (AD) 389
• Test set: Healthy Condition (HC) 105, Alzheimer’s Disease (AD) 84
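The set sizes and the number of iterations per epoch follow from the counts and batch size above. In this sketch, counting the final partial batch as an iteration is an assumption; the slides do not say how the remainder is handled.

```python
import math

train_counts = {"HC": 761, "AD": 389}
test_counts = {"HC": 105, "AD": 84}
batch_size = 80

n_train = sum(train_counts.values())   # 1150
n_test = sum(test_counts.values())     # 189

# Iterations per epoch, assuming the last partial batch is also used
iterations_per_epoch = math.ceil(n_train / batch_size)   # 15

# Class balance: share of AD cases in each set
ad_share_train = train_counts["AD"] / n_train
ad_share_test = test_counts["AD"] / n_test
```

The AD share is about 34% in the training set and about 44% in the test set, so the two sets are not balanced in the same way.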
41 / 42
Diagnosing Alzheimer’s Disease
Architecture
[Figure: layer diagrams of MidasNet1, MidasNet2, and MidasNet3]
MidasNet1
• input, 24*Conv 11/5, MaxPool 7/2, 288*Conv 3/2, FC 120, DropOut, SoftMax
MidasNet2
• input, 36*Conv 16/6, MaxPool 3/2, 120*Conv 4/1, Batch Norm, MaxPool 3/2
• Inception module: 60*Conv 1/1, 96*Conv 3/1, 12*Conv 1/1, 24*Conv 5/1, 24*Conv 1/1, MaxPool 3/1, 48*Conv 1/1, Concatenate
• MaxPool 3/2
• Inception module: 128*Conv 1/1, 192*Conv 3/1, 32*Conv 1/1, 96*Conv 5/1, 64*Conv 1/1, MaxPool 3/1, 128*Conv 1/1, Concatenate
• Inception module: 96*Conv 1/1, 208*Conv 3/1, 16*Conv 1/1, 48*Conv 5/1, 64*Conv 1/1, MaxPool 3/1, 192*Conv 1/1, Concatenate
• FC 150, SoftMax
MidasNet3
• input, 60*Conv 10/2, MaxPool 2/2, 144*Conv 3/1, Batch Norm, MaxPool 3/2
• Inception module: 48*Conv 1/1, 72*Conv 3/1, 18*Conv 1/1, 36*Conv 5/1, 48*Conv 1/1, MaxPool 3/1, 48*Conv 1/1, Concatenate
• MaxPool 3/2
• Inception module: 96*Conv 1/1, 208*Conv 3/1, 16*Conv 1/1, 48*Conv 5/1, 64*Conv 1/1, MaxPool 3/1, 192*Conv 1/1, Concatenate
• Inception module: 160*Conv 1/1, 320*Conv 3/1, 32*Conv 1/1, 128*Conv 5/1, 128*Conv 1/1, MaxPool 3/1, 256*Conv 1/1, Concatenate
• Inception module: 280*Conv 1/1, 340*Conv 3/1, 32*Conv 1/1, 128*Conv 5/1, 128*Conv 1/1, MaxPool 3/1, 228*Conv 1/1, Concatenate
• AvgPool 3/1, FC 500, SoftMax
42 / 42
Diagnosing Alzheimer’s Disease
Result
Convergence History
[Figure: cost (log scale, 0.01 to 10) versus epoch (0 to 196)]
Model      Accuracy
MidasNet1  167/189 (88.4%)
MidasNet2  169/189 (89.4%)
MidasNet3  169/189 (89.4%)
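The reported percentages can be reproduced from the correct-prediction counts with a few lines of arithmetic. Only the counts come from the slide; the variable names are illustrative.

```python
n_test = 189
correct = {"MidasNet1": 167, "MidasNet2": 169, "MidasNet3": 169}

# Accuracy in percent, rounded to one decimal place
accuracy_pct = {name: round(100.0 * c / n_test, 1) for name, c in correct.items()}

# Number of misclassified test cases per model
misclassified = {name: n_test - c for name, c in correct.items()}
```

This gives 88.4% for MidasNet1 and 89.4% for MidasNet2 and MidasNet3, with 22 and 20 misclassified cases respectively, so the gap between the models is two test cases.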
Thank You