Pharmacology Powered by Computational Analysis: Prediction of Drug-induced Toxicity Jaehee Shim
Jan 22, 2017
Big Data in the Field of Biology: In the Beginning…
Notable Events that Led to the Big Data Era:
Sanger Sequencing (1977)
Roger Tsien et al. Patented “Base-by-Base” Technology (1990)
Pyrosequencing Introduced by Nyren & Tsien (1996)
Human Genome Project (1990-2003)
Big Data Sources:
Genome
Transcriptome (expressed genome)
Proteome
Electronic Medical Records
Big Data in the Field of Biology: In the Beginning…
Human Body:
13 organ systems in human body with 4 basic tissue types
15-70 trillion cells
Genome:
Complete set of genetic information
Same in every cell
Transcript (messenger RNA):
Selectively expressed genes
Specific to the tissue/organ cell type
Protein:
Proteins are made from transcripts
Multiple versions of a protein can arise from one transcript (post-translational modification)
[Figure: Drawing of woman's torso from Anatomical Notebooks of Leonardo da Vinci (1452-1519)]
Sequencing Data: How big are they?
Stephens et al. (2015) PLoS Biol 13(7): e1002195
Projected annual storage & computing needs in 2025
…so in 2025, we can expect to see the annual production of
$1 \times 10^{21}$ bases/year $\times$ 1 byte / 4 bases $= 2.5 \times 10^{20}$ bytes
OR 250 exabytes!
Just from sequencing alone!
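The storage estimate can be checked with a few lines of integer arithmetic. This is only a sketch of the calculation on the slide; the constant names are made up for illustration, and the 4-bases-per-byte packing assumes 2 bits per base with no quality scores or metadata.

```python
# Projected sequencing output for 2025, as quoted on the slide.
BASES_PER_YEAR = 10**21
# Assumption: 2 bits per base (A, C, G, T), so 4 bases fit in one byte.
BASES_PER_BYTE = 4
BYTES_PER_EXABYTE = 10**18

bytes_per_year = BASES_PER_YEAR // BASES_PER_BYTE
exabytes_per_year = bytes_per_year // BYTES_PER_EXABYTE

print(f"{bytes_per_year:.1e} bytes/year = {exabytes_per_year} exabytes/year")
```

Storing quality scores or read identifiers alongside the bases would raise the figure, so this is better read as a lower bound.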
[Figure: bar chart of Projected Annual Storage Need (bytes) for Twitter, YouTube, and Genomics]
Now that we have covered the basics…
How are we using this BIG DATA approach to predict drug-induced
cardiotoxicity?
Imperfections of Modern Drug Design
Drug Toxicity: Off-target drug interactions perturb cellular dynamics and induce adverse events in patients
How Common are Drug Toxicity Events?
770,000 injuries or deaths in the US per year, per the Agency for Healthcare Research and Quality
[Illustration by Stephen Jeffrey, The Economist]
Cancer Drug / Cardiac Health
Prediction of toxicity requires more investigation.
Underlying mechanisms are not clear.
Albini et al. (2009) J. Natl. Cancer Inst. 102:14–25.
Principal Investigators: Marc Birtwistle, Ravi Iyengar, Eric Sobie
Cellular Signatures for Cardiotoxicity of Targeted Cancer Drugs (Protein Kinase Inhibitors)
Can we obtain precise and personalized signatures?
Drug Toxicity Signature Generation Center (DToxS)
Protein kinase inhibition
altered gene expression
cardiomyopathy
Cardiotoxicity
Why Do We Want to Personalize Medicine?
Before, we had to prescribe the same drugs to EVERYONE…
Now, we can SELECTIVELY prescribe to the ONES who are
expected to respond!
Advantages?
Precise, effective delivery of the treatment for the individual patient
Lower risk of unnecessary side effects
Lower medical costs by avoiding treatments that may not work
Drug-Induced Toxicity Prediction Strategy
1. Electrophysiological abnormality: Arrhythmia
2. Structural abnormality: Dilated Cardiomyopathy (thinning of the walls)
Prediction can be made with mathematical modeling
Workflow: Transcriptome Data, Gene Perturbation Measurements (Upregulated / Downregulated), Mathematical Modeling, Network Analysis
Prediction of abnormalities is assessed through integrating transcriptome data with dynamical models
Experimental & Computational Strategy for Years 1-2
(1) Focus on cardiotoxicity caused by cancer therapeutics, e.g. tyrosine kinase inhibitors (TKIs)
(2) Treat cells with clinically relevant doses of FDA-approved TKIs, with mitigating non-cancer drugs as controls.
Mitigators identified from clinical data in the FDA Adverse Event Reporting System (FAERS)
(3) Measure changes in gene expression and protein levels at 48 hours using mRNA-seq and proteomics
(4) Analyze results to obtain signatures, build biologically-relevant networks, and integrate network analysis data with predictive dynamical models to obtain dynamically ranked signatures
Candidates of Cancer Drugs & Control Drugs
Kinase Inhibitors with Cardiac Risk:
SORAFENIB, DASATINIB, SUNITINIB, PAZOPANIB, TOFACITINIB, RUXOLITINIB, CRIZOTINIB, AFATINIB, ERLOTINIB, REGORAFENIB, GEFITINIB, PONATINIB, IMATINIB, DABRAFENIB, BOSUTINIB, VEMURAFENIB, VANDETANIB, CABOZANTINIB, LAPATINIB, TRAMETINIB, NILOTINIB, CERITINIB, AXITINIB
Control Drugs:
URSODEOXYCHOLIC ACID, PREDNISOLONE, LOPERAMIDE, DOMPERIDONE, ALENDRONATE, APREPITANT, PAROXETINE, DIETHYLPROPION, ESTRADIOL, ENTECAVIR, MONTELUKAST, OLMESARTAN, CYCLOSPORINE, DICLOFENAC, CEFUROXIME, CYTARABINE, METHOTREXATE, GRANISETRON, LOXAPINE
Experimental design
Compare cardiotoxic cancer drugs with non-toxic non-cancer drugs and combinations
Assays: mRNA-seq, Proteomics
Treatment duration: 48 HOURS
Conditions: Vehicle CTRL, Cardiotoxic Drug, non-Cardiotoxic Drug (CTRL Drug), Combination
Computational analysis to produce precise, personal signatures
Generation of Gene Signatures: Computational Pipeline
Mapping/Counting of the Raw Gene Sequences
RAW Sequence in text format(FASTQ file):
Reference Seq.
Schematic representation of how ‘fragments of sequences’ are “aligned” to a reference sequence.
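The mapping/counting step can be illustrated with a deliberately simplified sketch. It parses FASTQ-formatted text (four lines per read) and assigns each read to a gene by exact substring matching. The read data, gene names, and reference sequences below are hypothetical, and exact matching only stands in for a real aligner, which uses an indexed reference and tolerates mismatches.

```python
from collections import Counter

# Hypothetical FASTQ text: each record is 4 lines (id, sequence, '+', qualities).
FASTQ_TEXT = """@read1
ACGTT
+
IIIII
@read2
GGATA
+
IIIII
@read3
CGTAC
+
IIIII
@read4
NNNNN
+
!!!!!
"""

# Hypothetical reference sequences, one per gene.
REFERENCE = {
    "geneA": "ACGTACGTTAGC",
    "geneB": "TTGACCGGATAA",
}


def parse_fastq(text):
    """Yield (read_id, sequence) pairs from FASTQ-formatted text."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i][1:], lines[i + 1]


def count_reads(text, reference):
    """Count reads per gene using exact substring matching only."""
    counts = Counter()
    for _read_id, seq in parse_fastq(text):
        hits = [gene for gene, ref in reference.items() if seq in ref]
        # Keep only reads that map to exactly one gene.
        counts[hits[0] if len(hits) == 1 else "unmapped"] += 1
    return counts


counts = count_reads(FASTQ_TEXT, REFERENCE)
print(dict(counts))
```

The per-gene counts produced this way are the raw input for the QC and normalization steps that follow in the pipeline.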
Generation of Gene Signatures: Computational Pipeline
QC: How to Weed Out the Outliers from Replicate Samples
To identify outliers, correlate each pair of samples in the same experimental group
We exclude Control Sample 4 as an outlier
Pearson correlation > 0.98 seems to indicate good reproducibility for this assay; future results will solidify this QC standard
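A minimal sketch of this QC step is shown below. It correlates every pair of samples within one experimental group and flags a sample when its median correlation with the other samples falls below 0.98. The expression values and sample names are invented, and the median-based flagging rule is an assumption for illustration; the slide only states that pairs are correlated and that 0.98 appears to be a useful cutoff.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def find_outliers(group, threshold=0.98):
    """Flag samples whose median correlation with the rest is below threshold."""
    names = list(group)
    r = {}
    for a, b in combinations(names, 2):
        r[a, b] = r[b, a] = pearson(group[a], group[b])
    return [
        n for n in names
        if median(r[n, m] for m in names if m != n) < threshold
    ]


# Hypothetical expression values for four replicate control samples.
control_group = {
    "ctrl1": [10, 20, 30, 40, 50],
    "ctrl2": [11, 19, 31, 41, 49],
    "ctrl3": [10, 21, 29, 40, 51],
    "ctrl4": [50, 10, 40, 20, 30],
}

print(find_outliers(control_group))
```

Using the median rather than the mean keeps one bad replicate from dragging down the scores of the good replicates it is paired with.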
Summary of Signatures and Center Structure
Questions We Can Address With Gene Signatures
What patterns are common amongst potentially cardiotoxic protein kinase inhibitors?
PRECISION IN SIGNATURES
What differences are observed between drugs, and can these be connected to differences in drug/target structure, dosing, and clinical data?
PERSONALIZED SIGNATURES
Can differences in signature patterns between human subjects (cell lines) help us to understand inter-individual variability in drug toxicity?
Drug repurposing for cancer chemotherapy?
Can drug combination signatures help us to understand clinically-observed toxicity mitigation?
Cardiotoxic Cancer Drugs Show a More Consistent Pattern of Differential Expression
[Figure: panels for Cancer Drugs and Non-Cancer Drugs (CTRL). Axes: Mean Log2 Fold Change, Average -log10(p-value) Across Drug Group, Number of Genes]
Cancer Drug Cardiotoxicity Processes are Enriched in the Initial Transcriptomic Signature
[Figure: bar charts of enriched GO biological processes. x-axis: Minus log10(p-value), 0 to 50]
Cancer Drugs (cardiomyopathy-related GO biological processes):
collagen fibril organization
cellular localization
regulation of cellular component organization
regulation of apoptotic process
response to organic substance
response to wounding
cellular response to chemical stimulus
regulation of cell death
regulation of programmed cell death
regulation of cell migration
regulation of locomotion
regulation of cellular component movement
regulation of cell motility
cellular component organization
cellular component organization or biogenesis
negative regulation of cellular process
response to stress
negative regulation of biological process
extracellular structure organization
extracellular matrix organization
Themes: Extracellular matrix, Collagen, Response to wounding; Apoptosis, Cell death; Cell migration
Non-Cancer Drugs (CTRL) (general GO biological processes):
protein complex disassembly
establishment of protein localization to membrane
macromolecular complex disassembly
mRNA catabolic process
cellular protein complex disassembly
translational elongation
nuclear-transcribed mRNA catabolic process
translational initiation
translational termination
viral life cycle
protein targeting to membrane
multi-organism metabolic process
protein localization to endoplasmic reticulum
nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
viral gene expression
viral transcription
establishment of protein localization to endoplasmic reticulum
protein targeting to ER
cotranslational protein targeting to membrane
SRP-dependent cotranslational protein targeting to membrane
Themes: Co-translational protein targeting, Translation, Ribosomal proteins; (viral) transcription and mRNA catabolism; Protein translation and Protein complex assembly/disassembly
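Enrichment scores of this kind are commonly derived from a one-sided hypergeometric test, so a small sketch may help in reading the "Minus log10(p-value)" axis. The slides do not say which test was used, so treat this as an assumption; the function name and the toy numbers are illustrative only.

```python
from math import comb, log10


def enrichment_p_value(n_background, n_annotated, n_signature, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: genes measured in total
    n_annotated:  background genes annotated to the GO term
    n_signature:  genes in the drug signature
    n_overlap:    signature genes annotated to the GO term
    """
    total = comb(n_background, n_signature)
    upper = min(n_annotated, n_signature)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_signature - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20 genes measured, 5 annotated to the term,
# 5 genes in the signature, 3 of which carry the annotation.
p = enrichment_p_value(20, 5, 5, 3)
score = -log10(p)
print(f"p = {p:.4f}, -log10(p) = {score:.2f}")
```

With a background of roughly 20,000 genes, as in a whole transcriptome, the same function applies unchanged because `math.comb` works on exact integers.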
Differences Between Cancer Drugs: Relationship Between Gene Expression Similarity and Structural Similarity
[Figure: scatter plot of drug pairs. x-axis: Tanimoto Coefficient for Structural Similarity (0.25 to 0.75). y-axis: Whole Transcriptome Correlation Coefficient (0.7 to 1). Points are labeled with all 36 pairs of AFA, BOS, DAS, ERL, PAZ, RUX, SOR, SUN, and VAN.]
High correlation because small changes in expression
Correlated structural and gene expression similarity between drugs
Preliminary efforts to define signature precision
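For readers unfamiliar with the structural-similarity axis, the Tanimoto coefficient of two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. The sketch below uses made-up bit positions for two hypothetical drugs; how the fingerprints were actually generated is not stated in the slides.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: indices of structural features present in each drug.
drug_x = {1, 4, 7, 9, 12, 15}
drug_y = {1, 4, 8, 9, 13, 15}

print(tanimoto(drug_x, drug_y))
```

A value of 1 means identical fingerprints and 0 means no shared features, which is the scale used on the x-axis of the scatter plot.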
Next Step:
Prediction of Phenotypic Changes Based on Gene Expression Data Using Dynamical Modeling with
Differential Equations
Structural Abnormality Prediction : Hypertrophy
Extracellular Stimuli
Interacting Species
Phenotypic Outputs
Ryall et al. (2012) JBC 287: 42259–42268.
Beta-adrenergic Receptor
MAP Kinase Pathway: cascade of phosphorylation reactions that propagates the signal from the stimulus
Kraeutler et al. (2010) BMC Sys Biol. 4:157.
Methods:
Model implemented using “Normalized Hill” Ordinary Differential Equations
Simulations of dynamics with minimal parameterization.
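$$\frac{d[D]}{dt} = \frac{1}{\tau_D}\left(w_{BD}\, f_{act}(B)\, D_{MAX} - [D]\right)$$
$$f_{act}(B) = \frac{\beta B^{n}}{K^{n} + B^{n}}, \quad \beta = \frac{EC_{50}^{n} - 1}{2\,EC_{50}^{n} - 1}, \quad K = (\beta - 1)^{1/n}$$
Here species B activates species D, $w_{BD}$ is the reaction weight, $\tau_D$ the time constant, $D_{MAX}$ the maximal activation, $n$ the Hill coefficient, and $EC_{50}$ the half-maximal activation.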
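To make the normalized-Hill formulation concrete, here is a minimal sketch of a single activation reaction, B activating D, integrated with forward Euler. The activation function is the normalized Hill form from the cited Kraeutler et al. paper, constrained so that f(0) = 0, f(EC50) = 0.5, and f(1) = 1. The default values (EC50 = 0.5, n = 1.4, w = 1, tau = 1), the function names, and the Euler integrator are assumptions for illustration, not the actual hypertrophy model.

```python
def hill_constants(ec50=0.5, n=1.4):
    """Constants that make the Hill curve pass through (EC50, 0.5) and (1, 1)."""
    beta = (ec50**n - 1.0) / (2.0 * ec50**n - 1.0)
    k = (beta - 1.0) ** (1.0 / n)
    return beta, k


def f_act(b, ec50=0.5, n=1.4):
    """Normalized Hill activation for an input b clipped to [0, 1]."""
    beta, k = hill_constants(ec50, n)
    b = min(max(b, 0.0), 1.0)
    return beta * b**n / (k**n + b**n)


def simulate(stimulus, w=1.0, tau=1.0, d_max=1.0, dt=0.01, t_end=20.0):
    """Integrate d[D]/dt = (w * f_act(B) * D_max - [D]) / tau with forward Euler."""
    d = 0.0
    trace = []
    for _ in range(int(round(t_end / dt))):
        d += dt * (w * f_act(stimulus) * d_max - d) / tau
        trace.append(d)
    return trace


# Activity of D approaches w * f_act(B) * D_max for a constant input B.
for level in (0.0, 0.5, 1.0):
    print(level, round(simulate(level)[-1], 4))
```

In a full network each node has one such equation, with inputs combined through logical AND/OR rules, which is why only a handful of default parameters are needed.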
Structural Abnormality Prediction : Hypertrophy
Each arrow represents a generic activation or inhibition reaction.
Structural Abnormality Prediction : Hypertrophy
Modeling Strategy:
Quantitative Analysis of Gene Perturbation in the Network
Transcriptome (~20,000 genes)
Genes in Hypertrophy Network (~106 genes)
Mathematical Simulation: simulate the time course of different pathway activation that leads to hypertrophy
Drugs: Trastuzumab, Sorafenib, Sunitinib
Different Cancer Drugs Induce Different Responses in Gene Species for a Given Stimulus
Hypertrophy Signaling Model Simulation
[Figure: four panels (NFAT, BNP, GSK3B, CREB) showing normalized activity (0 to 2.5) versus time (50 to 400 minutes) for Control, Sorafenib, Sunitinib, and Trastuzumab]
Stimulus given: Phenylephrine (PE), Stretch, Isoproterenol (ISO), Fibroblast Growth Factor (FGF), or No Stimulus
Next Step: How Does Each Gene Node Contribute to Overall Phenotypic
(Structural) Changes?
Raw Gene Expression Pattern in Hypertrophy Network
Sorafenib Sunitinib Trastuzumab
Log FC in gene expression data
Noticeable genetic perturbation in Sorafenib
Mild induction of gene change in Sunitinib and Trastuzumab
Q. Does this noticeable gene perturbation necessarily mean activation of hypertrophy?
Next Step: Using the Hypertrophy Network Model, simulate the projected changes in hypertrophic phenotypes by integrating the raw gene expression pattern!
Predicted Pro-hypertrophic Changes Per Drug Condition
[Figure: bar chart of Normalized Hypertrophic Response (-0.4 to 0.4; positive is Pro-Hypertrophic, negative is Anti-Hypertrophic) for each phenotypic output (SERCA, aMHC, Cell Area, bMHC, BNP, ANP, sACT) under Sorafenib, Sunitinib, and Trastuzumab]
Sunitinib is the most hypertrophic drug!
Instead of looking at overall gene change, we need to look at how each
gene is affected!
Sensitivity Analysis of Hypertrophy Network Model
Hypertrophy Network has:
106 interacting nodes
17 stimuli
7 phenotypic outputs: SERCA, aMHC, CellArea, bMHC, BNP, ANP, sACT
Strategy for simulating the impact of each of the 106 interacting species (Sensitivity Analysis):
Given no stimulus
Vary each node’s default parameter by ±10%
Measure the impact of the variation in relation to each of the 7 phenotypic outputs
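The ±10% strategy can be sketched as a one-at-a-time perturbation loop. The toy model below is a hypothetical stand-in with three nodes and two outputs, not the 106-node hypertrophy network, and reporting the raw difference between the +10% and -10% runs is an assumed choice of sensitivity measure.

```python
def toy_model(params):
    """Hypothetical stand-in for the network: maps node parameters to outputs."""
    return {
        "output1": 2.0 * params["nodeA"] + params["nodeB"],
        "output2": params["nodeC"],
    }


def sensitivity(model, params, fraction=0.10):
    """Perturb each node by +/- fraction and record the change in every output."""
    result = {}
    for node, value in params.items():
        up = model({**params, node: value * (1 + fraction)})
        down = model({**params, node: value * (1 - fraction)})
        result[node] = {out: up[out] - down[out] for out in up}
    return result


defaults = {"nodeA": 1.0, "nodeB": 1.0, "nodeC": 1.0}
table = sensitivity(toy_model, defaults)

# Nodes whose perturbation moves output1 by more than an arbitrary cutoff of 0.3.
sensitive = sorted(n for n, row in table.items() if abs(row["output1"]) > 0.3)
print(sensitive)
```

In the real analysis the model call would be a full simulation to steady state, and the table would have 106 rows and 7 columns.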
Sensitivity Analysis of 106 Nodes
No Significant Changes
Only 5 Nodes are Responsible for Structural Changes!
Sensitive nodes: GSK3B HDAC SERCA aMHC foxo
Sunitinib-induced gene expression changes in the sensitive nodes show a completely opposite pattern from the other two drugs
Cancer Drug Induced Changes in the Sensitive Nodes
Does drug treatment change the sensitivity of the node in overall network? (i.e. Given the drug treatment, will the sensitivity pattern
change?)
'aMHC' 'foxo' 'HDAC' 'SERCA'
'aMHC' 'ANP' 'bMHC' 'CellArea' 'CREB' 'foxo' 'GATA4' 'GSK3B' 'HDAC' 'NFAT' 'sACT' 'SERCA'
'aMHC' 'foxo' 'HDAC' 'SERCA'
Drug specific sensitivity of 106 nodes per phenotypic outputs
Noticeable Increase in the Number of Sensitive Nodes in Sunitinib Treated Cells
Currently in the process of:
1. Expanding sensitivity analysis to all drug conditions
2. Integrating sensitivity metrics with hypertrophy index
Conclusions and Future Directions
Summary:
Gene expression data were integrated with existing network-based models to investigate pathophysiological mechanisms of drug-induced cardiotoxicity.
Simulations were used to show:
Time-dependent changes in intracellular signaling
Stimulus-dependent phenotypic changes
Changes in sensitive nodes in the network
Current Challenges:
Integrating additional network-based dynamical models (EGF-induced signaling, Apoptosis)
Comparing drug classes in depth using simulation results
New predictions for which processes/outputs are most relevant?
Acknowledgements
Dr. Eric Sobie Lab: Megan Cummins, Ryan Devenyi, Elisa Nuñez-Acosta, Jingqi Gong
Marc Birtwistle, Ravi Iyengar, Eric Sobie
Evren Azeloglu, Yi-bang Chen, Sunita D'Souza, James Gallo, Milind Mahajan, Christoph Schaniel, Avner Schlessinger
Pedro Martinez, Tina Hu, Priyanka Dhanan, Rick Koch, Gomathi Jayaraman, Jens Hansen, Yuguang Xiong
The Mount Sinai LINCS DSGC team
Sequencing Data: Who is interested in them?
Sequencing Data: Current Computational Approach to Make Sense of Them
Statistical Computation of Differentially Expressed Genes (DEGs)
[Figure: Log2 Fold Change (color scale -4 to 4) for Trastuzumab, Ursodeoxycholic acid, and Combination]
Differentially Expressed (up/down):
Trastuzumab: 73/28
Ursodeoxycholic acid: 22/28
Combination: 98/43
FASTQ file (raw data from sequencer)
Sequence alignment with BWA
QC: eliminate outlier samples
Consolidate and normalize BWA output with edgeR
edgeR (trimmed mean of M-values, TMM): normalizes based on a weighted trimmed mean instead of a median.
edgeR computes statistical significance from the TMM-normalized data and generates DEGs with p-values
Using the DEGs, a list of statistically important cellular pathways is generated
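As a rough illustration of what the "up/down" counts mean, the sketch below scales raw counts to counts per million (CPM) and computes a log2 fold change per gene. This is a much simpler stand-in than edgeR: it uses library-size scaling rather than TMM and applies a fold-change cutoff instead of a statistical test. The gene names, counts, pseudocount, and cutoff of 1 are all assumptions.

```python
from math import log2


def counts_per_million(counts):
    """Scale raw counts by library size so samples are comparable."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of treated to control CPM, with a pseudocount to avoid log(0)."""
    t, c = counts_per_million(treated), counts_per_million(control)
    return {g: log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in c}


# Hypothetical read counts for three genes in one control and one treated sample.
control = {"gene1": 100, "gene2": 100, "gene3": 800}
treated = {"gene1": 400, "gene2": 100, "gene3": 500}

lfc = log2_fold_change(treated, control)
up = [g for g, v in lfc.items() if v >= 1.0]
down = [g for g, v in lfc.items() if v <= -1.0]
print(f"{len(up)}/{len(down)} (up/down)")
```

A real analysis would use replicates and a count-based statistical model so that each gene also receives a p-value.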