H2O for Genomics
Hussam Al-Deen
GenomeDx Biosciences
Outline
• About GenomeDx
• Cancer and genomics
• Genomic information we use
‒ Genome-wide RNA expression for applications in cancer
• Our prostate cancer solution
• Why we use H2O?
• Applications tested:
‒ Tumor Gleason Grade Classifier tested for multiple endpoint prediction
• Conclusions and Future Directions
GenomeDx Biosciences
About Us
• A clinical genomics company founded to transform the practice of oncology
• Uses machine learning and statistical algorithms to generate clinical tests
• Decipher® metastasis signature
‒ More than 20 peer-reviewed publications supporting analytical validity, clinical validity, and clinical utility
‒ Over 5,000 patients tested in clinical trials and oncology practice
• Decipher GRID™ platform
‒ Data-sharing program for Decipher users
‒ Free access for academic research
• Clinical Lab: San Diego, CA
• Informatics Lab: Vancouver, BC
Cancer is a disease of the genome
Tissue-based genomics
• Cancer is a complex disease and has many, many subtypes
‒ Indolent, aggressive, hormone or chemo sensitive/resistant, etc.
[Figure: DNA, RNA, and protein. Image sources: vector.childrenshospital.org, people.duke.edu, fineartamerica.com]
• Measuring RNA expression (concentration) and the activity of genes is highly informative for a genomics-based understanding of cancer
Measure gene activity using genome-wide expression analysis of clinical biosamples
Tissue-based genomics
[Workflow diagram: cancer patient → biopsy/surgery → tumor sample → RNA extraction → microarray → expression data]
Decipher GRID: a novel data-sharing program to accelerate cancer genomics innovation
[Institution logos, including H. Lee Moffitt Cancer Center & Research Institute]
Prostate cancer is a significant burden on the US healthcare system
Prostate cancer: most prevalent cancer affecting men
Prostate cancer alone is projected to account for 26% of incident cancer cases in men in 2015
Siegel, Rebecca L., Kimberly D. Miller, and Ahmedin Jemal. "Cancer statistics, 2015." CA: A Cancer Journal for Clinicians 65.1 (2015): 5-29.
Clinical genomics aims to improve cancer patient care
Prostate cancer: balancing the harms and benefits
• Accurate forecasting of recurrence risk is key to determining the optimal treatment choice:
‒ Observation
‒ Radiation therapy
‒ Hormone therapy
‒ Chemotherapy
• Goal of risk-adapted therapy:
‒ Reduce side effects of treatment
‒ Reduce costs of treatment
Why we use H2O?
• Highly advanced algorithms such as Deep Learning
• Ready-to-use algorithms that work with existing languages and tools
• Easy to explore data and develop models
• Multiple algorithms within the same package
http://h2o.ai/
Deep Neural Network
• Genomics:
‒ High-dimensional dataset: ~46K features
‒ Feature selection to reduce the dimensionality of the data
• Deep Learning:
‒ Can exploit non-linear relationships between features (genes)
‒ Can improve performance
‒ Deep features may help us understand the biology
Deep Neural Network
• Other packages for training deep neural networks:
‒ Filtering to reduce the number of features to ~100
‒ No grid search
‒ Cross-validation AUC ~0.5
• H2O deep neural network:
‒ Filtering to reduce the number of features to ~100
‒ Good results (AUC)
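
As an illustration of the H2O workflow, the sketch below shows how a small grid over hidden-layer sizes might be cross-validated with H2O's Python API. The file path, the response column name `gleason_group`, the candidate layer sizes, and the epoch count are assumptions made for the example, not values from the talk. H2O is imported inside the function so that the search space can be inspected without a running cluster.

```python
def hidden_layer_grid():
    """Candidate hidden-layer layouts (illustrative values, not from the talk)."""
    return {"hidden": [[50], [100, 50], [200, 100, 50]]}


def run_grid_search(csv_path, response="gleason_group", nfolds=10):
    """Cross-validate one deep learning model per layout and rank them by AUC.

    csv_path and response are hypothetical: a table of filtered expression
    features with one column holding the G3 / G4+ label.
    """
    # Imported lazily so the search space above is usable without H2O installed.
    import h2o
    from h2o.estimators.deeplearning import H2ODeepLearningEstimator
    from h2o.grid.grid_search import H2OGridSearch

    h2o.init()
    frame = h2o.import_file(csv_path)
    frame[response] = frame[response].asfactor()  # treat the label as categorical
    predictors = [c for c in frame.columns if c != response]

    grid = H2OGridSearch(
        model=H2ODeepLearningEstimator(nfolds=nfolds, epochs=20, seed=1),
        hyper_params=hidden_layer_grid(),
    )
    grid.train(x=predictors, y=response, training_frame=frame)
    # Models sorted by cross-validated AUC, best first.
    return grid.get_grid(sort_by="auc", decreasing=True)
```

Calling `run_grid_search("train.csv")` would need a local H2O installation and a file with that (hypothetical) name.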
Application: Development of a Tumor Gleason Grade Classifier
Tumor Gleason grade is a strong prognostic factor and is used to guide treatment decisions
Digitizing the Gleason Grade
• Gleason grade is the current gold standard in prostate cancer:
‒ Assigns a score from 1 to 5 based on the microscopic appearance of the tissue
‒ A higher score is associated with more aggressive disease
• Men with higher-grade prostate cancer are more likely to receive chemical castration (hormone therapy)
Source: https://en.wikipedia.org/wiki/Gleason_grading_system
Why develop a genomic model for pathology tumor grading?
Digitizing the Gleason Grade
• Gleason grade is subjective:
‒ Depends on pathologist experience
‒ Borderline cases are interpreted differently
• Gleason grade on biopsy is often 'up-graded' on final pathology
• Genomics could provide a more robust prediction of outcomes
Source: https://en.wikipedia.org/wiki/Gleason_grading_system
Study Design
[Diagram: ~7,000 patients → 1,537 patients → training and testing sets]
• Training (n = 990): G3 (n = 366), G4+ (n = 624)
• Testing (n = 537): G3 (n = 113), G4+ (n = 424)
G3: Patients who had Gleason 3
G4+: Patients who had Gleason 4 or 5
Classifier Development Overview
• Input: array features on Affymetrix Human Exon 1.0 ST microarrays were summarized into ~46,000 features (genes); training set of G3 (n = 366) and G4+ (n = 624)
• Univariate filtering: two-sample Wilcoxon ('Mann-Whitney') tests
• H2O grid search (10-fold C.V.) to optimize hidden layer size
• Deep neural network (H2O)
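
The univariate filtering step can be sketched as follows. This is a minimal, standard-library illustration, not the pipeline used in the talk: it ranks each feature by a two-sided Mann-Whitney p-value computed with the normal approximation (no tie or continuity correction in the variance), then keeps the `k` smallest. The function names and the toy data are hypothetical; in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation to the U statistic."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def top_features(matrix, labels, k):
    """Indices of the k features with the smallest G3 vs G4+ p-values.

    matrix: one row per patient, one column per feature.
    labels: "G3" or "G4+" for each row.
    """
    scored = []
    for j in range(len(matrix[0])):
        g3 = [row[j] for row, lab in zip(matrix, labels) if lab == "G3"]
        g4 = [row[j] for row, lab in zip(matrix, labels) if lab == "G4+"]
        scored.append((mann_whitney_p(g3, g4), j))
    return [j for _, j in sorted(scored)[:k]]


# Toy data (made up): feature 0 separates the groups, feature 1 does not.
toy_matrix = [[1, 5], [2, 3], [3, 6], [10, 4], [11, 5.5], [12, 3.5]]
toy_labels = ["G3", "G3", "G3", "G4+", "G4+", "G4+"]
selected = top_features(toy_matrix, toy_labels, k=1)
```

With ~46,000 features, the same loop would simply run over every column of the training matrix, and `k` would be set to the number of features to keep.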
Classification table, with cut-point equal to 0.5
Misclassification rate = 0.31
                 Truth
Prediction     G3     G4+
G3            179      69
G4+            99     190
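
The reported misclassification rate can be checked directly from the four cell counts. The short sketch below does the arithmetic; the variable names are illustrative. The result does not depend on which axis of the table holds the truth, because only the off-diagonal cells are counted as errors.

```python
# Cell counts copied from the classification table (order: G3, G4+).
table = [
    [179, 69],
    [99, 190],
]

total = sum(sum(row) for row in table)
correct = table[0][0] + table[1][1]   # diagonal: prediction agrees with truth
misclassified = total - correct       # off-diagonal cells
rate = misclassified / total

print(total, misclassified, round(rate, 2))
```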
Gleason Grade ROC Curve
• Model score AUC = 0.77, 95% CI: (0.73-0.81)
• GC1 score AUC = 0.72, 95% CI: (0.68-0.76)
• GC2 score AUC = 0.74, 95% CI: (0.70-0.78)
• Biopsy Gleason AUC = 0.72, 95% CI: (0.68-0.76)
[Figure: ROC curve, sensitivity versus specificity; AUC: 0.77 [0.73 – 0.81]]
[Figure: Boxplot of model score distribution (0.00 to 1.00) for G3 and G4+]
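
For readers less familiar with the metric, the AUC is the probability that a randomly chosen G4+ patient receives a higher model score than a randomly chosen G3 patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The scores are synthetic and the function name is hypothetical; it does not reproduce the figures above or their confidence intervals.

```python
def rank_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Synthetic model scores, made up for illustration only.
g4_plus_scores = [0.9, 0.7, 0.6, 0.4]
g3_scores = [0.5, 0.3, 0.2]

auc = rank_auc(g4_plus_scores, g3_scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of patients, which is fine for a test set of a few hundred samples; a rank-based formula gives the same value for larger sets.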
Determining Patient Risk
Metastatic prostate cancer
• Prostate cancer can spread to other parts of the patient's body
• After surgery, up to 50% [1] of men will have clinical risk factors that increase the chance of metastasis
• Very few men will experience metastasis and die of their cancer [2]
• Gleason grade is a surrogate for metastatic disease
Source: http://www.drugdevelopment-technology.com/projects/drug_abiateronecance/drug_abiateronecance5.html
[1] Swanson, G.P., et al., Pathologic findings at radical prostatectomy: risk factors for failure and death. Urol Oncol, 2007. 25(2): p. 110-4.
[2] Pound, C.R., et al., Natural history of progression after PSA elevation following radical prostatectomy. JAMA, 1999. 281(17): p. 1591-7.
Genomic Gleason Classifier Predicts Metastatic Outcomes
[Figure: Boxplot of model score (0 to 1.0) for MET and No-MET patients; AUC: 73.4 [67.36 – 79.43]]
[Figure: Probability of metastasis-free survival versus time (surgery to metastasis), time axis 0 to 240; p-value < 0.001; values 0.75 and 0.90 annotated on the plot]
MET: Patients who developed metastatic disease
No-MET: Patients who did not develop metastatic disease
Random search to reduce training time and incorporate more features
Number of Features | Training Time | Number of Layers | Activation | Hidden Layers | Hidden Dropout | Input Dropout | Testing AUC (GG [1]) | Testing AUC (Metastatic Disease)
250 | ~1 hour | 2 | RectifierWithDropout | (48, 169) | (0.55, 0.09) | 0.34 | 77 | 70
500 | ~1 hour | 3 | Rectifier | (339, 204, 91) | (0.04, 0.03, 0.13) | 0.47 | 78 | 67
[1] GG: Gleason Grade
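
A random search over the hyperparameters in the table (activation, hidden layers, hidden dropout, input dropout) might be set up in H2O's Python API roughly as follows. This is a sketch under stated assumptions: the candidate values, `max_models`, the epoch count, the file path, and the response column name `gleason_group` are illustrative and not taken from the talk, apart from the (48, 169) layout that appears in the table. All layouts use two hidden layers so that each dropout pair has a matching length.

```python
def random_search_spec(max_models=20, seed=1):
    """Search space and criteria for a random hyperparameter search.

    Every value here is illustrative; only the [48, 169] layout is from the table.
    """
    hyper_params = {
        "activation": ["RectifierWithDropout"],
        "hidden": [[48, 169], [100, 100], [200, 100]],
        "hidden_dropout_ratios": [[0.5, 0.1], [0.3, 0.3], [0.1, 0.1]],
        "input_dropout_ratio": [0.0, 0.2, 0.4],
    }
    # RandomDiscrete samples combinations at random instead of trying all of them.
    search_criteria = {
        "strategy": "RandomDiscrete",
        "max_models": max_models,
        "seed": seed,
    }
    return hyper_params, search_criteria


def run_random_search(csv_path, response="gleason_group"):
    """Train the sampled models and rank them by cross-validated AUC."""
    # Imported lazily so the specification above is usable without H2O installed.
    import h2o
    from h2o.estimators.deeplearning import H2ODeepLearningEstimator
    from h2o.grid.grid_search import H2OGridSearch

    hyper_params, search_criteria = random_search_spec()
    h2o.init()
    frame = h2o.import_file(csv_path)  # hypothetical file of filtered features
    frame[response] = frame[response].asfactor()
    predictors = [c for c in frame.columns if c != response]

    grid = H2OGridSearch(
        model=H2ODeepLearningEstimator(nfolds=10, epochs=20, seed=1),
        hyper_params=hyper_params,
        search_criteria=search_criteria,
    )
    grid.train(x=predictors, y=response, training_frame=frame)
    return grid.get_grid(sort_by="auc", decreasing=True)
```

Capping the number of sampled models keeps training time bounded as the number of input features grows, which is the trade-off the slide title points to.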
Conclusions and Future Directions
• Applied an advanced machine learning algorithm to genomic data
• The H2O Deep Learning model outperforms other Gleason-predicting models
• Incorporate more genomic features (46K) into the analysis to improve model development and performance
• Exploit nonlinear relationships between features (genes)
• Can deep learning help us understand the biology?
GenomeDx: A multidisciplinary adventure!