Presentation, July 31, 2015

Oncology Oracle

Gregory Koytiger

Use cell line data to better match cancer patients and therapies

Cell lines are a rich source of drug sensitivity data

5,280 unique drug-cell line sensitivity values

Predict

675 human cancer cell lines with RNA-seq gene expression

Linear models provide interpretability and transparency

[Diagram: Drug Sensitivity, Linear Model]
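A linear model predicts sensitivity as an intercept plus a weighted sum of gene-expression values, so each gene's contribution to a prediction can be read off directly. The minimal sketch below assumes expression values and weights are keyed by gene name. The functions `predict_sensitivity` and `gene_contributions`, and all gene names and numbers, are hypothetical.

```python
from typing import Dict


def predict_sensitivity(expression: Dict[str, float], weights: Dict[str, float], intercept: float) -> float:
    """Linear model: intercept + sum over genes of weight * expression."""
    return intercept + sum(w * expression[gene] for gene, w in weights.items())


def gene_contributions(expression: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Per-gene terms weight * expression; with the intercept they add up to the prediction."""
    return {gene: w * expression[gene] for gene, w in weights.items()}


# Hypothetical genes, weights and expression values, for illustration only.
weights = {"GENE_A": -0.8, "GENE_B": 0.3}
sample = {"GENE_A": 2.0, "GENE_B": 1.0}
print(predict_sensitivity(sample, weights, intercept=0.5))
print(gene_contributions(sample, weights))
```

Here GENE_A lowers the prediction by about 1.6 and GENE_B raises it by about 0.3. This per-gene breakdown is what makes a linear model easy to inspect.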

Ridge regression penalizes large gene weights, shrinking uninformative genes toward zero

[Diagram: Drug Sensitivity, Linear Model, Ridge Regression]
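Ridge regression chooses weights w that minimize ||y - Xw - b||^2 + alpha * ||w||^2, and this objective has a closed-form solution. The minimal NumPy sketch below assumes X is a samples-by-genes expression matrix and y holds the measured sensitivities. The function name `ridge_fit` and the synthetic data are hypothetical. Ridge shrinks weights toward zero but, in general, does not set any of them exactly to zero. Exact sparsity requires a method such as the lasso or ARD.

```python
from typing import Tuple

import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float) -> Tuple[np.ndarray, float]:
    """Closed-form ridge regression; returns (weights, intercept).

    X and y are centered first so that the intercept is not penalized.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc instead of forming an explicit inverse.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    return w, float(y_mean - x_mean @ w)


# Synthetic data: 50 samples x 20 genes; only genes 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=50)

for alpha in (0.1, 10.0, 100.0):
    w, b = ridge_fit(X, y, alpha)
    # Larger alpha gives a smaller weight norm, yet the weights stay nonzero.
    print(alpha, round(float(np.linalg.norm(w)), 3), int(np.sum(w == 0.0)))
```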

Automatic Relevance Determination regression tunes model sparsity from the data

[Diagram: Drug Sensitivity, Linear Model, Ridge Regression, Automatic Relevance Determination Regression]
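Automatic Relevance Determination (ARD) gives each gene's weight its own Gaussian prior precision alpha_j and re-estimates those precisions from the data. When a precision grows very large, the corresponding gene is effectively switched off, so the data determine how sparse the model is. The minimal NumPy sketch below uses MacKay-style fixed-point updates on centered features. The function `ard_fit`, the pruning threshold, and the synthetic data are illustrative assumptions, not the presentation's implementation. scikit-learn provides a library version as `sklearn.linear_model.ARDRegression`.

```python
from typing import Tuple

import numpy as np


def ard_fit(X: np.ndarray, y: np.ndarray, n_iter: int = 300,
            prune_alpha: float = 1e6) -> Tuple[np.ndarray, float]:
    """Sketch of ARD regression with one prior precision alpha_j per feature.

    Alternates the Gaussian posterior over weights with fixed-point updates of
    alpha (per feature) and beta (noise precision). Features whose alpha
    exceeds `prune_alpha` are pruned (weight fixed at 0). Returns (weights, intercept).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, d = Xc.shape
    alpha = np.ones(d)                        # per-feature prior precisions
    beta = 1.0 / max(float(yc.var()), 1e-12)  # noise precision
    weights = np.zeros(d)
    active = np.ones(d, dtype=bool)
    for _ in range(n_iter):
        if not active.any():
            break
        Xa = Xc[:, active]
        # Posterior over the active weights: N(m, sigma).
        sigma = np.linalg.inv(beta * Xa.T @ Xa + np.diag(alpha[active]))
        m = beta * sigma @ (Xa.T @ yc)
        # gamma_j in (0, 1]: how strongly the data, rather than the prior, fix w_j.
        gamma = np.clip(1.0 - alpha[active] * np.diag(sigma), 1e-12, 1.0)
        alpha[active] = gamma / np.maximum(m ** 2, 1e-12)
        resid = yc - Xa @ m
        beta = max(n - float(gamma.sum()), 1e-12) / max(float(resid @ resid), 1e-12)
        weights[:] = 0.0
        weights[active] = m
        active = alpha < prune_alpha          # prune features with huge precision
    weights[~active] = 0.0
    return weights, float(y_mean - x_mean @ weights)


# Synthetic data: 100 samples x 10 features; only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)
w, b = ard_fit(X, y)
print(np.round(w, 2))
```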

Start with only the most variable genes

The 8,000 most variable genes in the Genentech data were chosen as features, to reduce noise and keep computation tractable
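Variance-based filtering ranks genes by their variance across samples and keeps the top k. The minimal NumPy sketch below assumes a samples-by-genes matrix that is already on a log scale; how the Genentech data were normalized is not stated here. The function name and toy data are hypothetical.

```python
from typing import List

import numpy as np


def top_variable_genes(expr: np.ndarray, gene_names: List[str], k: int) -> List[str]:
    """Names of the k columns (genes) of a samples x genes matrix with the largest variance."""
    variances = expr.var(axis=0)
    # Sort by descending variance; a stable sort keeps ties in column order.
    top_idx = np.argsort(-variances, kind="stable")[:k]
    return [gene_names[i] for i in top_idx]


# Toy matrix: 4 samples x 3 genes (constant, low-variance, high-variance).
expr = np.array([[1.0, 0.0, 10.0],
                 [1.0, 1.0, -10.0],
                 [1.0, 0.0, 10.0],
                 [1.0, 1.0, -10.0]])
print(top_variable_genes(expr, ["G1", "G2", "G3"], k=2))  # ['G3', 'G2']
```

On the full expression matrix, the same call would use k=8000.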


Docetaxel model predicts clinical response in breast cancer patients

[ROC curve: True Positive Rate vs. False Positive Rate; 0.75 ROC AUC]

7/8 patients will be sensitive

91% of sensitive patients will receive treatment; 64% of resistant patients will avoid unnecessary treatment
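The percentages on this slide correspond to values at one chosen score threshold. The 91% is the true positive rate: the share of sensitive patients who are predicted sensitive. The 64% is one minus the false positive rate: the share of resistant patients predicted resistant, which implies FPR = 0.36. The ROC AUC, by contrast, summarizes performance over all thresholds. The minimal pure-Python sketch below uses made-up scores and labels (1 = sensitive, 0 = resistant); the function names are hypothetical.

```python
from typing import Sequence, Tuple


def rates_at_threshold(scores: Sequence[float], labels: Sequence[int],
                       threshold: float) -> Tuple[float, float]:
    """(TPR, FPR) when every patient with score >= threshold is called sensitive.

    labels: 1 = sensitive (responder), 0 = resistant.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random sensitive patient outscores random resistant one); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up model scores (higher = predicted more sensitive) and true outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
tpr, fpr = rates_at_threshold(scores, labels, threshold=0.5)
print(f"TPR={tpr:.2f}, specificity={1 - fpr:.2f}, AUC={roc_auc(scores, labels):.2f}")
# prints TPR=1.00, specificity=0.67, AUC=0.89
```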


Model predicts Gemcitabine response in TCGA lung cancer data

[Plot: Gemcitabine log IC50 for patients With Tumor vs. Tumor Free]

p < 0.05 by Wilcoxon rank-sum test; 0.80 ROC AUC

Top 4 predictions are all sensitive
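The Wilcoxon rank-sum test and the ROC AUC are closely related. The Mann-Whitney U statistic for one group, divided by the product of the two group sizes, equals the AUC for separating the groups. The minimal pure-Python sketch below uses the large-sample normal approximation without tie or continuity correction. The function names and data are hypothetical; `scipy.stats.mannwhitneyu` is the usual library route. The `top_k_all_sensitive` helper assumes that a lower predicted log IC50 means a more sensitive patient.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for group a, two-sided normal-approximation p-value, AUC = U / (n_a * n_b))."""
    n_a, n_b = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided, u_a / (n_a * n_b)


def top_k_all_sensitive(pred_log_ic50: Sequence[float], is_sensitive: Sequence[bool], k: int) -> bool:
    """True if the k patients with the lowest predicted log IC50 are all sensitive."""
    ranked = sorted(zip(pred_log_ic50, is_sensitive), key=lambda t: t[0])
    return all(sensitive for _, sensitive in ranked[:k])


# Made-up predicted log IC50 values for two patient groups.
group_a = [0.2, 0.5, 0.9, 1.1]
group_b = [-0.4, -0.1, 0.3]
u, p, auc = rank_sum_test(group_a, group_b)
print(f"U={u}, p={p:.3f}, AUC={auc:.2f}")
```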

About Me

©2014 Nature America, Inc. All rights reserved.

TECHNICAL REPORTS

Functional interpretation of genomic variation is critical to understanding human disease, but it remains difficult to predict the effects of specific mutations on protein interaction networks and the phenotypes they regulate. We describe an analytical framework based on multiscale statistical mechanics that integrates genomic and biophysical data to model the human SH2-phosphoprotein network in normal and cancer cells. We apply our approach to data in The Cancer Genome Atlas (TCGA) and test model predictions experimentally. We find that mutations mapping to phosphoproteins often create new interactions but that mutations altering SH2 domains result almost exclusively in loss of interactions. Some of these mutations eliminate all interactions, but many cause more selective loss, thereby rewiring specific edges in highly connected subnetworks. Moreover, idiosyncratic mutations appear to be as functionally consequential as recurrent mutations. By synthesizing genomic, structural and biochemical data, our framework represents a new approach to the interpretation of genetic variation.

TCGA and similar projects have generated extensive data on the mutational landscape of tumors1. To understand the functional consequences of these mutations, it is necessary to ascertain how they alter the protein-protein interaction (PPI) networks involved in regulating cellular

Low-throughput methods such as fluorescence polarization spectroscopy provide precise interaction data on a few dozen PIDs and ligands but cannot easily be scaled to the full proteome2, whereas high-throughput array-based methods provide greater scale but suffer from systematic artifacts and high false positive and false negative rates, resulting in data sets that only partly agree 2. An additional challenge is modeling the effects of mutations for proteins with multiple binding domains and/or multiple sites of phosphorylation9, a reality for most signaling proteins (for example, the CRK oncoprotein). Existing methods are either limited to individual domains10–13 or are insufficiently precise to discern the effects of single-residue changes14,15.

The MSM framework we have developed combines genomic, binding and structural data and reconciles inconsistencies within and among data sets to generate PID networks for normal and cancer cells. We develop a bottom-up first-principles approach, involving a single mathematical equation based on statistical mechanical ensembles, that models domains, proteins and networks, and we then apply this approach to the analysis of SH2 networks and mutations found in TCGA16. We validate newly predicted interactions experimentally and demonstrate the sensitivity of MSM to single-residue mutations that cause subtle changes in binding affinity. Our analysis provides mechanistic insights into an important PID cancer network and validates a computational approach to PID networks that can be applied

A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks

Mohammed AlQuraishi1–3, Grigoriy Koytiger1,3, Anne Jenney1, Gavin MacBeath2 & Peter K Sorger1

[Figure: SH2 domain interaction network panels for PTPN11 (SHP2) and GRB2]

Ph.D., Harvard University, Department of Chemistry and Chemical Biology

Postdoc, Harvard Medical School, Department of Systems Biology

Management Consultant, Dean & Co.
