Oncology Oracle Gregory Koytiger
Aug 17, 2015
Cell lines are a rich source of drug sensitivity data
5280 Unique Drug - Cell line Sensitivity Values
Predict
675 Human cancer cell lines with RNAseq Gene Expression
Ridge regression keeps only a few significant genes in model
Drug Sensitivity
Linear Model
Ridge Regression
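The step from a linear model to ridge regression can be sketched in a few lines. This is a minimal illustration, not the pipeline behind the slides: the function name `ridge_fit`, the toy matrix, and the penalty values are all assumptions, and the fit uses plain cyclic coordinate descent with no intercept. It shows that the ridge penalty shrinks the weights toward zero as the penalty grows.

```python
def ridge_fit(X, y, lam, n_iter=200):
    """Minimise ||y - Xw||^2 + lam * ||w||^2 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Add feature j's contribution back to get the partial residual.
            partial = [residual[i] + col[i] * w[j] for i in range(n)]
            num = sum(c * r for c, r in zip(col, partial))
            den = sum(c * c for c in col) + lam
            w[j] = num / den
            residual = [partial[i] - col[i] * w[j] for i in range(n)]
    return w

# Hypothetical toy data: 4 cell lines, 2 genes, true weights (2, -1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]

w_unpenalised = ridge_fit(X, y, lam=0.0)   # ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)        # weights shrunk toward zero
```

With `lam=0.0` the fit recovers the generating weights; a larger `lam` gives a weight vector with a smaller norm.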
Automatic Relevance Determination Regression tunes model sparsity from data
Drug Sensitivity
Linear Model
Ridge Regression
Automatic Relevance Determination Regression
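One common way to write the difference between the two models is through the prior placed on the weights. The notation below (weights $w_j$, precisions $\alpha$ and $\alpha_j$, design matrix $X$, noise variance $\sigma^2$) is assumed here for illustration and does not come from the slides.

```latex
% Ridge regression: one shared precision for all p gene weights
w_j \sim \mathcal{N}\!\left(0,\ \alpha^{-1}\right), \qquad j = 1, \dots, p

% Automatic Relevance Determination: one precision per gene weight
w_j \sim \mathcal{N}\!\left(0,\ \alpha_j^{-1}\right), \qquad j = 1, \dots, p

% The precisions are chosen by maximising the marginal likelihood of the data
\hat{\alpha} = \arg\max_{\alpha_1, \dots, \alpha_p}\ p\!\left(y \mid X, \alpha_1, \dots, \alpha_p, \sigma^2\right)
```

When the data drive a particular $\alpha_j$ to a very large value, the corresponding weight is pinned near zero, which is how the degree of sparsity is set by the data rather than by hand.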
Start with only the most highly varying genes
The 8,000 most highly varying genes in the Genentech data were chosen as features, both to reduce noise and to keep computation tractable
Drug Sensitivity
Linear Model
Ridge Regression
Automatic Relevance Determination Regression
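The variance filter described on this slide can be sketched as follows. The gene names, expression values, and the helper `top_variance_genes` are hypothetical and only illustrate the idea of ranking genes by variance across cell lines and keeping the top k (k = 8000 in the slides, k = 2 in this toy example).

```python
from statistics import pvariance

def top_variance_genes(expression, k):
    """Return the k gene names with the largest variance across cell lines."""
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:k]

# Hypothetical expression values: one list per gene, one entry per cell line.
expression = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],   # nearly constant, carries little signal
    "GENE_B": [0.0, 5.0, 10.0, 2.0],  # highly variable
    "GENE_C": [3.0, 3.5, 2.5, 4.0],   # moderately variable
}

selected = top_variance_genes(expression, k=2)
```

Only the selected genes would then be passed to the regression model as features.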
Docetaxel model predicts clinical response in breast cancer patients
[ROC curve: True Positive Rate vs. False Positive Rate]
0.75 ROC AUC
7/8 patients will be sensitive
[ROC curve: True Positive Rate vs. False Positive Rate]
0.75 ROC AUC
91% of sensitive patients will receive treatment
64% of resistant patients will avoid unnecessary treatment
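The quantities on these two slides can be computed from model scores and observed responses as sketched below. The scores and labels are made up for illustration. Reading "7/8" as a positive predictive value, "91%" as sensitivity, and "64%" as specificity at chosen thresholds is an interpretation of the slides, not something they state outright.

```python
def roc_auc(scores, labels):
    """Probability that a random responder outscores a random non-responder."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def operating_point(scores, labels, threshold):
    """Sensitivity, specificity and positive predictive value at one threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

# Hypothetical predicted sensitivity scores and observed responses (1 = sensitive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
sensitivity, specificity, ppv = operating_point(scores, labels, threshold=0.5)
```

Moving the threshold trades sensitivity against specificity along the same ROC curve, which is why one curve can support both a high-precision and a high-recall reading.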
Model predicts Gemcitabine response in TCGA lung cancer data
[Plot: Gemcitabine log IC50 (y-axis) for patients With Tumor vs. Tumor Free]
p < 5% by Wilcoxon rank-sum test
0.80 ROC AUC
Top 4 predictions are all sensitive
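A two-group comparison of this kind can be sketched with a Wilcoxon rank-sum (Mann-Whitney U) test. The version below uses the normal approximation without a tie correction. The function name `rank_sum_test` and both groups of values are hypothetical and are not the TCGA data.

```python
import math

def rank_sum_test(a, b):
    """Two-sided rank-sum test; returns (U statistic for group a, p-value)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values so they share an average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, p_value

# Hypothetical predicted log IC50 values for two patient groups.
group_a = [-2.1, -1.8, -1.5, -1.9, -2.4, -1.7]
group_b = [-0.9, -1.2, -0.5, -1.6, -1.0, -0.7]

u_stat, p_value = rank_sum_test(group_a, group_b)
```

For small samples an exact test is preferable to the normal approximation used here.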
About Me
© 2014 Nature America, Inc. All rights reserved.
TECHNICAL REPORTS
Functional interpretation of genomic variation is critical to understanding human disease, but it remains difficult to predict the effects of specific mutations on protein interaction networks and the phenotypes they regulate. We describe an analytical framework based on multiscale statistical mechanics that integrates genomic and biophysical data to model the human SH2-phosphoprotein network in normal and cancer cells. We apply our approach to data in The Cancer Genome Atlas (TCGA) and test model predictions experimentally. We find that mutations mapping to phosphoproteins often create new interactions but that mutations altering SH2 domains result almost exclusively in loss of interactions. Some of these mutations eliminate all interactions, but many cause more selective loss, thereby rewiring specific edges in highly connected subnetworks. Moreover, idiosyncratic mutations appear to be as functionally consequential as recurrent mutations. By synthesizing genomic, structural and biochemical data, our framework represents a new approach to the interpretation of genetic variation.
TCGA and similar projects have generated extensive data on the mutational landscape of tumors1. To understand the functional consequences of these mutations, it is necessary to ascertain how they alter the protein-protein interaction (PPI) networks involved in regulating cellular
Low-throughput methods such as fluorescence polarization spectroscopy provide precise interaction data on a few dozen PIDs and ligands but cannot easily be scaled to the full proteome2, whereas high-throughput array-based methods provide greater scale but suffer from systematic artifacts and high false positive and false negative rates, resulting in data sets that only partly agree2. An additional challenge is modeling the effects of mutations for proteins with multiple binding domains and/or multiple sites of phosphorylation9, a reality for most signaling proteins (for example, the CRK oncoprotein). Existing methods are either limited to individual domains10–13 or are insufficiently precise to discern the effects of single-residue changes14,15.
The MSM framework we have developed combines genomic, binding and structural data and reconciles inconsistencies within and among data sets to generate PID networks for normal and cancer cells. We develop a bottom-up first-principles approach, involving a single mathematical equation based on statistical mechanical ensembles, that models domains, proteins and networks, and we then apply this approach to the analysis of SH2 networks and mutations found in TCGA16. We validate newly predicted interactions experimentally and demonstrate the sensitivity of MSM to single-residue mutations that cause subtle changes in binding affinity. Our analysis provides mechanistic insights into an important PID cancer network and validates a computational approach to PID networks that can be applied
A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks
Mohammed AlQuraishi1–3, Grigoriy Koytiger1,3, Anne Jenney1, Gavin MacBeath2 & Peter K Sorger1
[Figure: two network diagram panels with nodes labeled by protein name, for example PTPN11 (SHP2), GRB2, STAT1, SRC, ABL1, PIK3R1, SYK, ZAP70, JAK2, CBL, SHC1 and IRS1]
Ph.D., Harvard University, Department of Chemistry and Chemical Biology
Postdoc, Harvard Medical School, Department of Systems Biology
Management Consultant, Dean & Co.