Computational systems biology: from the generation of testable hypothesis to uncovering organizing principles in living systems
NCBI, NLM, National Institutes of Health
M. Madan Babu, PhD
Jan 17, 2016

Transcript
Page 1:

Computational systems biology: from the generation of testable hypothesis to uncovering organizing principles in living systems

NCBI, NLM, National Institutes of Health

M. Madan Babu, PhD

Page 2:

Explosion of information about living systems

Major Challenge – Integration of information

Generate experimentally testable hypotheses

Uncover organizing principles of specific processes at the systems level

Expression: 5,000 different conditions, 20 organisms (ArrayExpress, SMD, GEO)

Interaction: 100,000 interactions, 5 organisms (BIND, DIP, publications)

Structure: 33,000 structures from 300 organisms (PDB, MSD)

Sequence: 45,000,000 sequences from 160,000 organisms (EBI, NCBI)

Page 3:

Integration of data to uncover general organizing principles

Integration of gene expression data reveals dynamics in transcriptional networks

Integration of data to generate testable hypotheses

Sequence, Structure, Expression and Interaction data provides convincing support

Regulation in Biological Systems

Introduction to transcriptional regulatory networks

Discovery of sequence specific transcription factors in the malarial parasite

Page 4:

[Figure: Plasmodium life cycle across mosquito and human hosts (liver and RBC stages). Source: www.cdc.gov]

Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription factors in Plasmodium

5300 genes with over 700 metabolic enzymes

Extensive complement of chromosomal regulatory proteins

Extensive complement of signaling proteins (GTPases, kinases)

Large number of genes

Complex life cycle

Genes need to be regulated

Page 5:

Possible explanations for the paradoxical observation

Alternative regulatory mechanisms
- Chromatin-level regulation
- Post-translational modification
- RNA based regulation

Undetected transcription factors
- Distantly related or unrelated to known DNA binding domains

Proteome of Plasmodium + profiles & HMMs of known DBDs (bZIP, Homeo, MADS, AT-hook, Forkhead, ARID)

PF14_0633 + ?

Domain architecture: AT-hook, SEG, uncharacterized globular domain (~60 aa)

Page 6:

Characterization of the globular domain – sequence analysis I

Non-redundant database search:

Lineage specific expansion in Apicomplexa
- Plasmodium falciparum
- Plasmodium vivax
- Cryptosporidium parvum
- Theileria annulata
- Cryptosporidium hominis

Non-redundant database search with profiles + HMM of this region:
- Floral homeotic protein Q (Triticum)
- 49L, an endonuclease (X. oryzae phage Xp10)

Globular region maps to AP2 DNA-binding domain

Page 7:

Non-redundant database search with the AP2 DNA-binding domain from D. psychrophila DP2593:
- MAL6P1.287 (Plasmodium falciparum)
- Cgd6_1140/Chro.60146 (Cryptosporidium)

AP2 DNA-binding domain maps to the Globular region

Characterization of the globular domain – sequence analysis II

Multiple sequence alignment of all globular domains

JPRED/PHD

Sequence of secondary structure is similar to the AP2 DNA-binding domain

Homologs of the conserved globular domain constitute a novel family of the AP2 DNA-binding domain

S1 S2 S3 H1

Page 8:

Characterization of the globular domain – structural analysis I

A. thaliana ethylene response factor (ATERF1, 1gcc, NMR structure)

Binds GC-rich sequences

Predicted SS of ApiAP2: S1 S2 S3 H1

SS of ATERF1: S1 S2 S3 H1

12 residues show a strong pattern of conservation, and these are involved in key stabilizing hydrophobic interactions that determine the path of the backbone in the three strands and helix of the AP2 domain

Core fold of the ApiAP2 domain will be similar to the plant AP2 DNA-binding domain

Page 9:

Characterization of the globular domain – structural analysis II

[Figure: protein-DNA contacts. Labeled protein residues: R147, R150, R152, W154, K156, E160, W162, R170, W172, T175, Y186. Labeled bases: G5, C6, C7, G17, G18, G20, G21]

R152 --- G5 (oxo group); D/N --- A (amino group)

R150 --- G20 (oxo group); S/T --- A (amino group)

Changes in base-contacting residues suggest binding to AT-rich sequence

S2 S3

Charged residues in the insert may contact multiple phosphate groups to provide affinity

ApiAP2 domain binds DNA in a sequence specific manner

Page 10:

RBC infection & merozoite burst

Characterization of the globular domain – expression analysis I

[Figure: complex life cycle of Plasmodium across mosquito and human hosts (liver and RBC stages). Source: www.cdc.gov]

Intra-erythrocyte developmental cycle (DeRisi Lab)

mRNA expression profiling using microarray (sorbitol synchronization)

Page 11:

Characterization of the globular domain – expression analysis II

[Figure: expression over time points 0 to 46, spanning the ring, trophozoite, early schizont, and schizont stages. Panel labels: co-expressed genes; average expression profile of all genes; 22 transcription factors]

Striking expression pattern in specific developmental stages suggests that they could mediate transcriptional regulation of stage specific genes

Page 12:

Characterization of the globular domain – interaction analysis I

Protein interaction network of P. falciparum

Protein (1267)

Physical interaction (2846)

LaCount et al. Nature (2005)

Modified Y2H: Gal4 DBD + Protein + auxotrophic gene

RNA isolated from mixed stages of intra-erythrocyte developmental cycle

Guilt by association

Function of interacting neighbors provides clues about function of the protein
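The guilt-by-association idea on this slide can be illustrated with a minimal Python sketch: tally the known functions of a protein's direct interaction partners. The function name `neighbor_annotations`, the protein names, and the annotations are hypothetical toy values, not data from the talk.

```python
from collections import Counter


def neighbor_annotations(interactions, annotations, protein):
    """Count functional annotations among the direct partners of `protein`.

    interactions: iterable of (protein_a, protein_b) undirected pairs.
    annotations:  dict mapping protein -> functional label.
    Partners without an annotation are counted as "hypothetical".
    """
    neighbors = set()
    for a, b in interactions:
        if a == protein:
            neighbors.add(b)
        elif b == protein:
            neighbors.add(a)
    neighbors.discard(protein)  # ignore self-interactions
    return Counter(annotations.get(n, "hypothetical") for n in neighbors)


# Toy data (hypothetical names and labels)
interactions = [("X", "P1"), ("P2", "X"), ("X", "P3"), ("P3", "P4")]
annotations = {"P1": "chromatin", "P2": "chromatin", "P4": "glycolysis"}

counts = neighbor_annotations(interactions, annotations, "X")
print(counts.most_common())
```

Only direct partners are counted here; P4 is two steps away from X, so its annotation does not contribute.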

Page 13:

Characterization of the globular domain – interaction analysis II

Protein interaction network of P. falciparum

Guilt by association supports a role for ApiAP2 proteins in the regulation of gene expression

Network of ApiAP2 proteins (97 interactions, 93 proteins)

Legend: ApiAP2 proteins (13); chromatin proteins (8)

Labeled proteins: MAL8P1.153 (ES), PFD0985w (S), PF10_0075 (T), PF07_0126 (R), Gcn5

Interacting proteins:
- 50% hypothetical
- Nucleosome assembly
- HMG protein
- Glycolytic enzymes
- Antigenic proteins
- Have a PPint domain

Page 14:

Sequence Structure Expression Interaction

Conclusion - I

Integration of different types of experimental data allowed us to discover potential transcription factors in the Plasmodium genome

Integration of data can generate experimentally testable hypotheses

Balaji S, Madan Babu M, Lakshminarayan Iyer, Aravind L, Nucleic Acids Research (2005)

Page 15:

Integration of data to uncover general organizing principles

Integration of gene expression data reveals dynamics in transcriptional networks

Integration of data to generate testable hypotheses

Sequence, Structure, Expression and Interaction data provides convincing support

Regulation in Biological Systems

Introduction to biological networks & transcriptional regulatory networks

Discovery of sequence specific transcription factors in the malarial parasite

Page 16:

Networks in Biology

Network               Nodes                                 Links                        Interaction (A, B)
Protein interaction   Proteins                              Physical interaction         Protein-Protein
Metabolic             Metabolites                           Enzymatic conversion         Protein-Metabolite
Transcriptional       Transcription factor, target genes    Transcriptional interaction  Protein-DNA

Page 17:

Structure of the transcriptional regulatory network

Basic unit (components): transcriptional interaction between a transcription factor and a target gene

Motifs (local level): patterns of interconnections (Uri Alon & Rick Young)

Scale-free network (global level): all transcriptional interactions in a cell (Albert & Barabasi)

Madan Babu M, Luscombe N, Aravind L, Gerstein M & Teichmann SA, Current Opinion in Structural Biology (2004)

Page 18:

Properties of transcriptional networks

Local level: Transcriptional networks are made up of motifs, which perform information processing tasks

Global level: Transcriptional networks are scale-free, which confers robustness on the system

Page 19:

Transcriptional networks are made up of motifs

Network motif: “Patterns of interconnections that recur at different parts and with specific information processing task”

Single input motif
Example: ArgR; ArgD, ArgE, ArgF
Function:
- Co-ordinates expression
- Enforces order in expression
- Quicker response

Multiple input motif
Example: TrpR, TyrR; AroM, AroL
Function:
- Integrates different signals
- Quicker response

Feed forward motif
Example: Crp; AraC; AraBAD
Function:
- Responds to persistent signal
- Filters noise

Shen-Orr et al. Nature Genetics (2002) & Lee et al. Science (2002)
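As an illustration of what finding a motif means in practice, here is a minimal Python sketch that enumerates feed-forward motifs (X regulates Y, Y regulates Z, and X also regulates Z) in a directed edge list. The function name `feed_forward_loops` is hypothetical, and this brute-force enumeration is only a sketch, not the statistical motif-detection procedure of the cited papers. The toy edges reuse the gene names from the slide.

```python
def feed_forward_loops(edges):
    """Return sorted (X, Y, Z) triples where X->Y, Y->Z and X->Z all exist.

    edges: iterable of (regulator, target) pairs.
    """
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # skip auto-regulation
            for z in targets.get(y, ()):
                # Z must be a third, distinct node that X also regulates
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Toy network built from the examples named on the slide
edges = [
    ("Crp", "AraC"), ("AraC", "AraBAD"), ("Crp", "AraBAD"),  # feed-forward
    ("ArgR", "ArgD"), ("ArgR", "ArgE"), ("ArgR", "ArgF"),    # single input
]

print(feed_forward_loops(edges))
```

The single input example contributes no triple because none of ArgR's targets regulates another gene in this toy network.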

Page 20:

$N(k) \propto 1/k^{\gamma}$

Scale-free structure

Presence of few nodes with many links and many nodes with few links

Transcriptional networks are scale-free

Scale free structure provides robustness to the system

Albert & Barabasi, Rev Mod Phys (2002)
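A minimal Python sketch of the quantity behind this slide: N(k), the number of nodes that have exactly k links. The function name `degree_distribution` and the toy edge list are hypothetical. In a scale-free network, N(k) falls off roughly as a power of k; the toy network is far too small to show that, and only illustrates the counting.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: N(k)}, the number of nodes with exactly k links.

    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(sorted(Counter(degree.values()).items()))


# Toy network (hypothetical): one hub "H" plus a few sparsely linked nodes
edges = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("H", "e"), ("a", "b")]

print(degree_distribution(edges))
```

Here one node (the hub) has five links while most nodes have one or two, which is the "few nodes with many links, many nodes with few links" pattern in miniature.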

Page 21:

Scale-free networks exhibit robustness

Robustness – The ability of complex systems to maintain their function even when the structure of the system changes significantly

Tolerant to random removal of nodes (mutations)

Vulnerable to targeted attack of hubs (mutations) – Drug targets

Hubs are crucial components in such networks

Haiyuan Yu et al. Trends in Genetics (2004)

Page 22:

Summary I - Introduction

Transcriptional networks are made up of motifs that have specific information processing tasks

Transcriptional networks are scale-free, which confers robustness on such systems, with hubs assuming importance

Madan Babu M, Luscombe N et al., Current Opinion in Structural Biology (2004)

Page 23:

Are there differences in the sub-networks under different conditions?

Cell cycle

Sporulation

Stress

Static network

Across all cellular conditions

Dynamic nature of the regulatory network in yeast

How are the networks used under different conditions?

Page 24:

Dataset - gene regulatory network in Yeast

3,962 genes (142 TFs + 3,820 TGs), 7,074 regulatory interactions

Individual experiments (TRANSFAC DB + Kepes dataset)
288 genes + 477 genes
356 interactions + 906 interactions

ChIP-chip experiments (Snyder lab + Young lab)
1,560 genes + 2,416 genes
2,124 interactions + 4,358 interactions

Page 25:

Integrating gene regulatory network with expression data

Gene regulatory network: 142 TFs, 3,820 TGs, 7,074 interactions

Gene expression data for 5 cellular conditions: cell cycle, sporulation, DNA damage, diauxic shift, stress

After integration: 142 TFs, 1,808 TGs, 4,066 interactions

[Figure: transcription factors and target genes, with a legend for 1, 2, 3, 4, and 5 conditions]

Page 26:

Back-tracking method to find active sub-networks

Gene regulatory network

Identify differentially regulated genes

Find TFs that regulate the genes

Find TFs that regulate these TFs

Active sub-network
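The back-tracking steps listed on this slide can be sketched in Python as an upstream traversal from the differentially regulated genes. This is a minimal sketch under stated assumptions: the function name `active_subnetwork` and the toy gene names are hypothetical, and the sketch keeps tracing regulators of regulators until no new TF is found. The slide does not say how many levels the original method traced or whether TFs were also filtered by their own expression.

```python
def active_subnetwork(edges, differentially_regulated):
    """Trace regulatory interactions upstream of the given genes.

    edges: iterable of (tf, target) pairs from the static network.
    differentially_regulated: genes whose expression changes in the condition.
    Returns the set of (tf, target) edges reached by back-tracking.
    """
    regulators = {}
    for tf, target in edges:
        regulators.setdefault(target, set()).add(tf)

    active_edges = set()
    visited = set()
    frontier = set(differentially_regulated)
    while frontier:
        gene = frontier.pop()
        if gene in visited:
            continue
        visited.add(gene)
        for tf in regulators.get(gene, ()):
            active_edges.add((tf, gene))  # TF that regulates this gene
            if tf not in visited:
                frontier.add(tf)          # next: find TFs regulating this TF
    return active_edges


# Toy static network (hypothetical names)
edges = [("TF1", "g1"), ("TF2", "g1"), ("TF3", "TF1"), ("TF4", "g2"), ("TF3", "g3")]

active = active_subnetwork(edges, {"g1"})
print(sorted(active))
```

Edges into g2 and g3 are left out because neither gene is differentially regulated or upstream of one that is.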

Page 27:

Active sub-networks: How different are they?

Multi-stage processes: cell cycle, sporulation

Binary processes: DNA damage, diauxic shift, stress

Page 28:

Network Motifs

Milo et al. (2002), Lee et al. (2002)

Single Input Motif (SIM) – 23%

Feed-forward Motif (FF) – 27%

Multi-Input Motifs (MIM) – 50%

Page 29:

Sub-networks : Network motifs

Network motifs are used preferentially in the different cellular conditions

Page 30:

- Do different proteins become hubs under different conditions?

- Is it the same protein that acts as a regulatory hub?

Cell cycle Sporulation Diauxic shift DNA damage Stress

Condition specific networks are scale-free

Page 31:

Regulatory hubs change with conditions

Cluster TFs according to the number of target genes active in each condition

Different TFs become key regulators in different conditions

CC    SP    DS    DD    SR    TF
250   45    20    30    15    Swi6

(CC: cell cycle, SP: sporulation, DS: diauxic shift, DD: DNA damage, SR: stress response)

Page 32:

Hubs regulate other hubs to initiate cellular events

Suggests a structure which transfers weight between hubs to trigger cellular events

Page 33:

Network Parameters

Connectivity

Path length

Clustering coefficient

Page 34:

Network Parameters - Connectivity

Outgoing connections = 49.8

on average, each TF regulates ~50 genes

Changes

Incoming connections = 2.1

on average, each gene is regulated by ~2 TFs

Remains constant
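A minimal Python sketch of the two connectivity averages on this slide: outgoing connections averaged over TFs and incoming connections averaged over regulated genes. The function name `average_connectivity` and the toy edges are hypothetical. As a consistency check on the outgoing figure, the dataset slide lists 7,074 interactions and 142 TFs, and 7,074 / 142 ≈ 49.8. Averaging incoming connections over genes with at least one regulator is an assumption of this sketch.

```python
from collections import Counter


def average_connectivity(edges):
    """Return (mean outgoing links per TF, mean incoming links per target).

    edges: iterable of (tf, target) pairs; duplicates are counted once.
    """
    unique_edges = set(edges)
    if not unique_edges:
        raise ValueError("network has no interactions")
    out_degree = Counter(tf for tf, _ in unique_edges)
    in_degree = Counter(target for _, target in unique_edges)
    mean_out = sum(out_degree.values()) / len(out_degree)
    mean_in = sum(in_degree.values()) / len(in_degree)
    return mean_out, mean_in


# Toy network (hypothetical names)
edges = [("A", "g1"), ("A", "g2"), ("A", "g3"), ("B", "g1")]

mean_out, mean_in = average_connectivity(edges)
print(mean_out, mean_in)
```

In the toy network two TFs share four interactions (2.0 each on average), and three target genes receive those four interactions (about 1.33 each).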

Page 35:

Network parameters : Connectivity

Binary: quick, large-scale turnover of genes

Multi-stage: controlled, ticking over of genes at different stages

• “Binary conditions”: greater connectivity

• “Multi-stage conditions”: lower connectivity

Page 36:

Network Parameters – Path length

Path length: number of intermediate TFs until final target

Example: starting TF, 1 intermediate TF, final target: path length = 1

Indication of how immediate a regulatory response is

Average path length = 4.7
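A minimal Python sketch of the path length as defined on this slide: the number of intermediate TFs between a starting TF and a final target. The function name `intermediate_tfs` and the toy network are hypothetical, and using the shortest regulatory route (breadth-first search) is an assumption of this sketch.

```python
from collections import deque


def intermediate_tfs(edges, start, target):
    """Number of intermediate TFs on the shortest route from start to target.

    edges: iterable of (regulator, target) pairs.
    Returns 0 for direct regulation, or None if target is unreachable.
    """
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    queue = deque([(start, 0)])  # (node, intermediates passed so far)
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        for nxt in targets.get(node, ()):
            if nxt == target:
                return hops
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Toy network (hypothetical names)
edges = [("TF1", "TF2"), ("TF2", "g"), ("TF1", "h"), ("TF2", "TF3"), ("TF3", "k")]

print(intermediate_tfs(edges, "TF1", "g"))
```

Here TF1 reaches g through one intermediate TF (TF2), matching the slide's example of a path length of 1.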

Page 37:

Network parameters : Path length

• “Binary conditions”: shorter path length; “faster”, direct action

• “Multi-stage conditions”: longer path length; “slower”, indirect action; intermediate TFs regulate different stages

Page 38:

Network Parameters – Clustering coefficient

Ratio of existing links to maximum number of links for neighboring nodes

Measure of inter-connectedness of the network

Example: 4 neighbours, 6 possible links, 1 existing link

Clustering coefficient = existing links / possible links = 1/6 = 0.17

Average coefficient = 0.11
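The slide's worked example (4 neighbours, 6 possible links, 1 existing link) can be reproduced with a minimal Python sketch. The function name `clustering_coefficient` and the node names are hypothetical, and treating regulatory links as undirected is an assumption of this sketch.

```python
from itertools import combinations


def clustering_coefficient(edges, node):
    """Existing links among the neighbours of `node` / possible links.

    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    Returns 0.0 when the node has fewer than two neighbours.
    """
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # self-links do not connect two distinct nodes
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    neighbours = adjacency.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    possible = k * (k - 1) // 2
    existing = sum(
        1 for u, v in combinations(sorted(neighbours), 2) if v in adjacency[u]
    )
    return existing / possible


# Node "N" has 4 neighbours; only A and B are linked to each other
edges = [("N", "A"), ("N", "B"), ("N", "C"), ("N", "D"), ("A", "B")]

coefficient = clustering_coefficient(edges, "N")
print(round(coefficient, 2))
```

With 4 neighbours there are 4 × 3 / 2 = 6 possible neighbour pairs, and one of them is linked, giving 1/6.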

Page 39:

Network parameters : Clustering coefficient

• “Binary conditions”: smaller coefficients, less TF-TF inter-regulation

• “Multi-stage conditions”: larger coefficients, more TF-TF inter-regulation

Page 40:

Sub-networks have evolved both their local structure and global structure to respond to cellular conditions efficiently

multi-stage conditions
• fewer target genes
• longer path lengths
• more inter-regulation between TFs

binary conditions
• more target genes
• shorter path lengths
• less inter-regulation between TFs

Page 41:

Implications

First overview of the dynamics of the transcriptional regulatory network of a eukaryote

Key regulatory hubs identified under different conditions may serve as good drug targets

Provides insights into engineering regulatory interactions

Methods developed to reconstruct and compare active networks are generically applicable

Page 42:

Conclusions - II

Network motifs are preferentially used under the different cellular conditions and different proteins act as regulatory hubs in different cellular conditions

Sub-networks have evolved both their local structure and global structure to respond to cellular conditions efficiently

Integration of data can uncover organizing principles in living systems

Luscombe N, Madan Babu M et al., Nature (2004)

Page 43:

Acknowledgements

Balaji, Lakshminarayan, Aravind

National Center for Biotechnology Information, National Institutes of Health

MRC Laboratory of Molecular Biology

Nick Luscombe, Sarah Teichmann

Haiyuan Yu, Mike Snyder

Mark Gerstein