Inferring Nonstationary Gene Networks from Temporal Gene Expression Data
Harvard Medical School Massachusetts Institute of Technology
Hsun-Hsien Chang (1), Jonathan J. Smith (2), Marco F. Ramoni (1)
(1) Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
(2) Department of Mathematics, Massachusetts Institute of Technology
IEEE Workshop on Signal Processing Systems, October 7, 2010
Background
• Genetic information flows from DNA to RNA through transcription.
• Gene expression is a measure of RNA abundance in cells and reveals gene activity.
• Modern microarray technologies can assess the expression of 50K genes in parallel.
Clinical Applications
• Thanks to falling costs, more samples can be collected in a single study.
• A new clinical application:
– Monitor time-series gene expression in response to drugs, treatments, vaccines, virus infection, etc.
[Figure: gene expression measured at times T0 through T5 for multiple patients in distinct biological conditions.]
Time-Series Gene Expression Analysis
• Since genes interact with each other in cells, an intriguing analysis is to infer gene networks:
– Detailed models (e.g., differential equations).
– Abstract models (e.g., Boolean networks).
– Probabilistic graphical models (e.g., dynamic Bayesian networks).
• Do not require densely sampled data.
• Model expression levels by random variables to handle noisy expression measurements and biological variability.
• Utilize the inferred networks to make predictions.
[Figure legend: gene on; gene off.]
Data Representation by Bayesian Networks
• Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables (i.e., expressions of genes, clinical variables).
– Directed arcs encode conditional probabilities of the target (child) nodes on the source (parent) nodes.
[Figure: example directed acyclic graph over nodes A, B, C, D, E.]
• Dynamic Bayesian networks with arcs indicating temporal dependency.
– Example: variables X and Y at time T modulate variable Z at time T+1.
– The network model can serve as a prediction tool.
[Figure: X_T and Y_T point to Z_{T+1}; given X_T and Y_T, the network predicts Z_{T+1}.]
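To make the X, Y, Z example concrete, here is a minimal sketch of how a conditional probability table for Z at time T+1 given X and Y at time T could be estimated by counting and then used for prediction. The discretization into 0/1 states, the function names, and the sample values are all illustrative assumptions; the slides do not specify how the conditional probabilities are parameterized.

```python
from collections import Counter, defaultdict

def fit_cpt(samples):
    """Estimate P(Z_{T+1} | X_T, Y_T) from (x, y, z) triples by counting."""
    counts = defaultdict(Counter)
    for x, y, z in samples:
        counts[(x, y)][z] += 1
    cpt = {}
    for parents, zs in counts.items():
        total = sum(zs.values())
        cpt[parents] = {z: n / total for z, n in zs.items()}
    return cpt

def predict(cpt, x, y):
    """Return the most probable state of Z_{T+1} given X_T = x and Y_T = y."""
    dist = cpt[(x, y)]
    return max(dist, key=dist.get)

# Hypothetical discretized observations (x_T, y_T, z_{T+1}); 1 = high, 0 = low.
samples = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0),
           (0, 1, 0), (0, 0, 0), (1, 0, 0)]
cpt = fit_cpt(samples)
print(predict(cpt, 1, 1))  # most probable Z_{T+1} when both parents are high
```

With these toy counts, Z is predicted high only when both X and Y were high at the previous time.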
Network Inference Engine
• First-order Markov process: data at time T+1 depend only on the preceding time T.
• For a variable at time T+1, search for the set of variables at time T that has the highest likelihood of modulating its value at T+1.
• Step-wise search algorithm.
[Figure: nodes A_T, B_T, C_T, ..., N_T, V_T at time T and A_{T+1}, B_{T+1}, C_{T+1}, ..., N_{T+1}, V_{T+1} at time T+1. Legend: clinical variable; genes.]
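The step-wise search can be sketched as a greedy forward selection: start with no parents and repeatedly add the time-T variable that most improves a score, stopping when nothing helps. The slides only say "highest likelihood", so the BIC score below is a stand-in chosen for brevity, and the function names, the `max_parents` cap, and the toy data are assumptions rather than the authors' implementation.

```python
import math
from collections import Counter

def bic_score(child, parents, prev, nxt):
    """BIC of the child's states at T+1 given the chosen parents at T.

    prev, nxt: lists of dicts (one per sample) mapping variable -> discrete state.
    """
    n = len(nxt)
    joint, parent_counts = Counter(), Counter()
    for p_row, n_row in zip(prev, nxt):
        key = tuple(p_row[p] for p in parents)
        joint[(key, n_row[child])] += 1
        parent_counts[key] += 1
    loglik = sum(c * math.log(c / parent_counts[key])
                 for (key, _), c in joint.items())
    n_states = len({row[child] for row in nxt})
    n_params = len(parent_counts) * (n_states - 1)
    return loglik - 0.5 * n_params * math.log(n)

def stepwise_parents(child, candidates, prev, nxt, max_parents=3):
    """Greedily add the parent that most improves the score; stop when none does."""
    parents = []
    best = bic_score(child, parents, prev, nxt)
    while len(parents) < max_parents:
        scored = [(bic_score(child, parents + [c], prev, nxt), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        top_score, top = max(scored)
        if top_score <= best:
            break
        parents.append(top)
        best = top_score
    return parents, best

# Hypothetical data: V at T+1 simply copies gene A at T.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
v = [0, 0, 1, 1, 0, 0, 1, 1]
prev = [{"A": x, "B": y, "V": z} for x, y, z in zip(a, b, v)]
nxt = [{"V": x} for x in a]
parents, score = stepwise_parents("V", ["A", "B", "V"], prev, nxt)
print(parents)
```

On this toy data the search selects A alone, because adding a second parent increases the penalty without improving the fit.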
Inference of Whole Dynamic Gene Network
• Infer a transition network between every pair of consecutive times.
[Figure: nodes A, B, C, ..., N, V at times T, T+1, and T+2, with one transition network from T to T+1 and another from T+1 to T+2.]
Parallelize Learning Individual Transition Nets
[Figure: the whole network over times T, T+1, and T+2 is split into individual transition networks (T to T+1, and T+1 to T+2), which are learned in parallel.]
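Because each transition network depends only on the data at two consecutive times, the networks can be learned independently. The sketch below illustrates that idea with Python's standard `concurrent.futures`. The parent search inside it is a deliberately trivial stand-in (pick the single time-T variable that most often equals the child at T+1), and the data layout, function names, and toy values are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def best_single_parent(child, prev, nxt):
    """Toy stand-in for a parent search: the variable at T that most often
    equals the child's state at T+1."""
    def agreement(parent):
        return sum(p[parent] == n[child] for p, n in zip(prev, nxt))
    return max(sorted(prev[0]), key=agreement)

def learn_transition(pair):
    """Learn one transition network from the data at two consecutive times."""
    prev, nxt = pair
    return {child: best_single_parent(child, prev, nxt)
            for child in sorted(nxt[0])}

def learn_all_transitions(series, workers=4):
    """series[t] is a list of per-patient dicts at time t."""
    pairs = list(zip(series[:-1], series[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(learn_transition, pairs))

# Hypothetical data: four patients, variables A and V, three time points.
t0 = [{"A": 0, "V": 0}, {"A": 1, "V": 0}, {"A": 0, "V": 1}, {"A": 1, "V": 1}]
t1 = [{"A": r["V"], "V": r["A"]} for r in t0]   # A and V swap roles
t2 = [{"A": r["A"], "V": r["V"]} for r in t1]   # each variable persists
networks = learn_all_transitions([t0, t1, t2])
print(networks)
```

The two learned networks differ, which is the nonstationary case. Threads are used here only to keep the sketch simple; a CPU-bound search would more likely use a process pool or a cluster, and the same pattern could be applied one level down to search the parents of each child variable in parallel.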
Parallelize Parent Searching of Individual Variables
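[Figure: one transition network from nodes A, B, C, ..., N, V at time T to the same nodes at time T+1; the parents of each variable at time T+1 are searched in parallel.]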
Step-by-Step Prediction
[Figure: step-by-step prediction over times T, T+1, and T+2. Given data at time T, values at T+1 are predicted; given data at T+1, values at T+2 are predicted.]
Forecasting by Initial Data
[Figure: forecasting over times T, T+1, and T+2. Data are given only at the initial time T; values at T+1 are predicted, and values at T+2 are predicted from those predictions.]
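The difference between step-by-step prediction and forecasting from initial data can be sketched in a few lines. The representation here is an assumption made for illustration: each transition network maps a child to its parents plus a lookup table giving the child's most probable state, which reduces the probabilistic model to a deterministic one.

```python
def step(network, state):
    """Apply one transition network to the state at time T.

    network maps child -> (parents, table); table maps a tuple of parent
    states to the child's most probable state at T+1.
    """
    return {child: table[tuple(state[p] for p in parents)]
            for child, (parents, table) in network.items()}

def predict_stepwise(networks, observed):
    """Step-by-step prediction: every step starts from observed data."""
    return [step(net, obs) for net, obs in zip(networks, observed)]

def forecast(networks, initial):
    """Forecasting: only the initial data are given; predictions are fed forward."""
    out, state = [], initial
    for net in networks:
        state = step(net, state)
        out.append(state)
    return out

# Hypothetical networks over a gene A and a clinical variable V.
net0 = {"A": (("V",), {(0,): 0, (1,): 1}),
        "V": (("A",), {(0,): 1, (1,): 0})}
net1 = {"A": (("A",), {(0,): 0, (1,): 1}),
        "V": (("A", "V"), {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1})}
observed = [{"A": 1, "V": 0}, {"A": 1, "V": 1}]

stepwise = predict_stepwise([net0, net1], observed)
forecasted = forecast([net0, net1], observed[0])
print(stepwise)
print(forecasted)
```

Both modes agree on the first step. They can diverge afterwards, because forecasting propagates its own predictions while step-by-step prediction is re-anchored to the measured data at each time.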
Clinical Study: HIV Viral Load Tracking
• The global AIDS epidemic is one of the greatest threats to human health, causing 2 million deaths every year.
• Viral load (i.e., virus density in blood) is:
– associated with clinical outcomes.
– an indicator of which treatment physicians should provide.
• With a tool to predict/forecast the viral load trajectory, physicians could foresee how patients progress to AIDS and could allocate the best treatments upfront.
• Data: Fourteen (12 Africans, 2 Americans) untreated adult patients during acute infection.
[Figure: viral load and gene expression sampled at enrollment and at times 1, 2, 4, 12, and 24.]
Dynamic Gene Network of HIV Viral Load
Accuracy of HIV Viral Load Tracking
• Prediction accuracy:
                              Fitted Validation (Accuracy)   Cross Validation (Robustness)
  Dynamic Gene Network        97.8%                          95.8%
  Viral Load Auto-Regression  90.1%                          89.5%
• Forecasting accuracy:
                              Fitted Validation (Accuracy)   Cross Validation (Robustness)
  Dynamic Gene Network        92.9%                          91.8%
  Viral Load Auto-Regression  88.7%                          87.0%
30 Genes Dynamically Interact with Viral Load
AMY1A: amylase, alpha 1a; salivary
OTOF: otoferlin
TNFAIP6: tumor necrosis factor, alpha-induced protein 6
KIR2DL3: killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 3
NBPF14: neuroblastoma breakpoint family, member 14
OSBP2: oxysterol binding protein 2
IRF7: interferon regulatory factor 7
CFD: complement factor d (adipsin)
HLA-DQA1: major histocompatibility complex, class ii, dq alpha 1
HLA-DRB1: major histocompatibility complex, class ii, dr beta 1
RPS23: ribosomal protein s23
GPR56: g protein-coupled receptor 56
IFI44L: interferon-induced protein 44-like
CCL23: chemokine (c-c motif) ligand 23
KLRC2: killer cell lectin-like receptor subfamily c, member 2
IFIT3: interferon-induced protein with tetratricopeptide repeats 3
SOS1: son of sevenless homolog 1 (drosophila)
G1P2: interferon, alpha-inducible protein (clone ifi-15k)
LOC652775: similar to ig kappa chain v-v region l7 precursor
CCL3L1: chemokine (c-c motif) ligand 3-like 1
MBP: myelin basic protein
S100P: s100 calcium binding protein p
IFITM3: interferon induced transmembrane protein 3 (1-8u)
MX1: myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
HERC5: hect domain and rld 5
NME4: non-metastatic cells 4, protein expressed in
HLA-DQB1: major histocompatibility complex, class ii, dq beta 1
LOC653157: similar to iduronate 2-sulfatase precursor (alpha-l-iduronate sulfate sulfatase) (idursulfase)
LOC643313: similar to hypothetical protein loc284701
RSAD2: radical s-adenosyl methionine domain containing 2
Conclusions
• A Bayesian network framework to infer dynamic gene networks from time-series gene expression microarrays:
– Does not require densely sampled microarray data.
– Able to handle noise and biological variability.
– Temporal dependency is captured by a first-order Markov process.
– The optimal network model is achieved by a parallelized search algorithm.
• Application to HIV viral load tracking shows how our method can be used in clinical studies:
– Our network model tracks viral load trajectories with higher accuracy than the viral load auto-regressive model.
– Our model provides candidate gene targets for drug/vaccine development.
Acknowledgements
Supported by Center for HIV/AIDS Vaccine Immunology (CHAVI) # U19 AI067854-06:
• National Institute of Allergy and Infectious Diseases (NIAID)
• National Institutes of Health (NIH)
• Division of AIDS (DAIDS)
• U.S. Department of Health and Human Services (HHS)
Stationary Network Inference
• All networks between pairs of times are identical.
[Figure: nodes A, B, C, ..., N, VL at times T, T+1, T+2, and T+3, with the same transition network between each pair of consecutive times.]
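One practical consequence of the stationary assumption is that a single transition network can be learned from all consecutive-time pairs pooled together, whereas the nonstationary approach learns one network per pair. The sketch below shows only that pooling step; the data layout (a list of time points, each a list of per-patient dicts) and the values are hypothetical, and the slides do not describe how the stationary baseline was actually fitted.

```python
def pooled_transitions(series):
    """Pool (state at t, state at t+1) pairs across all consecutive times.

    series[t] is a list of per-patient dicts at time t, with patients in the
    same order at every time point.
    """
    return [(prev, nxt)
            for block_prev, block_next in zip(series[:-1], series[1:])
            for prev, nxt in zip(block_prev, block_next)]

# Hypothetical data: two patients, three time points, variables A and VL.
series = [
    [{"A": 0, "VL": 1}, {"A": 1, "VL": 1}],
    [{"A": 1, "VL": 1}, {"A": 1, "VL": 0}],
    [{"A": 1, "VL": 0}, {"A": 0, "VL": 0}],
]
pairs = pooled_transitions(series)
print(len(pairs))  # one training pair per patient per transition
```

Pooling gives the single stationary network more samples per parameter, at the cost of assuming the dependencies do not change over time.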
Pathway: Immune Response (16/30 genes, p < 10^-6)
[Figure: the 30-gene list is shown again for this pathway; which 16 genes belong to the pathway is not recoverable from the transcript.]
Pathway: Antiviral Defense (8/30 genes, p < 10^-3)
[Figure: the names of the 30 genes are shown again for this pathway; which 8 genes belong to the pathway is not recoverable from the transcript.]
Pathway: Inflammatory Response (5/30 genes, p < 0.05)
[Figure: the names of the 30 genes are shown again for this pathway; which 5 genes belong to the pathway is not recoverable from the transcript.]
Interferon Family Dominates
[Figure: the names of the 30 genes are shown again, with a legend marking genes that appear in 3 pathways, 2 pathways, or 1 pathway; the per-gene marking is not recoverable from the transcript.]