Adaptive evolution in prokaryotic transcriptional regulatory networks M. Madan Babu, PhD NCBI, NLM National Institutes of Health
Jan 20, 2016
Networks in Biology

Network              Nodes                                 Links                        Interaction
Protein interaction  Proteins                              Physical interaction         Protein-protein
Metabolic            Metabolites                           Enzymatic conversion         Protein-metabolite
Transcriptional      Transcription factors, target genes   Transcriptional interaction  Protein-DNA
Outline
Structure of the transcriptional regulatory network: components, local & global structure
Evolution of the regulatory network across organisms
- Evolution of components in the network (genes and interactions)
- Evolution of local network structure (motifs)
- Evolution of global network structure (scale-free structure)
Structure of the transcriptional regulatory network
Basic unit (components): a transcriptional interaction between a transcription factor and a target gene
Motifs (local level): patterns of interconnections (Uri Alon & Rick Young)
Scale-free network (global level): all transcriptional interactions in a cell (Albert & Barabasi)
Madan Babu M, Luscombe N, Aravind L, Gerstein M & Teichmann SA, Current Opinion in Structural Biology (2004)
Properties of transcriptional networks
Local level: transcriptional networks are made up of motifs, which perform information-processing tasks
Global level: transcriptional networks are scale-free, which confers robustness on the system
Transcriptional networks are made up of motifs
Network motif: "Patterns of interconnections that recur at different parts and with specific information processing task"
Single input motif (example: ArgR regulating ArgD, ArgE and ArgF)
- Co-ordinates expression
- Enforces order in expression
- Quicker response
Multiple input motif (example: TrpR and TyrR regulating AroM and AroL)
- Integrates different signals
- Quicker response
Feed-forward motif (example: Crp regulating AraC, with both regulating AraBAD)
- Responds to persistent signal
- Filters noise
Shen-Orr et al., Nature Genetics (2002) & Lee et al., Science (2002)
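As a rough illustration of how a feed-forward motif can be located in a network, the sketch below scans a list of (regulator, target) pairs for triples X, Y, Z in which X regulates Y and both X and Y regulate Z. The function name and the edge-list representation are assumptions made for this example; the cited studies additionally test whether such patterns are statistically over-represented, which is not attempted here.

```python
def feed_forward_loops(edges):
    """Return sorted (X, Y, Z) triples where X->Y, X->Z and Y->Z are all present.

    edges: iterable of (regulator, target) pairs (hypothetical input format).
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            # Y must itself be a regulator, and must differ from X.
            if y == x or y not in targets:
                continue
            for z in targets[y]:
                # Z must be regulated by both X and Y.
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Toy network built from the examples on the slide.
example_edges = [
    ("Crp", "AraC"), ("Crp", "AraBAD"), ("AraC", "AraBAD"),  # feed-forward
    ("ArgR", "ArgD"), ("ArgR", "ArgE"), ("ArgR", "ArgF"),    # single input
]
print(feed_forward_loops(example_edges))
```

Single input and multiple input motifs could be found with the same dictionary of targets, by grouping target genes according to the set of regulators they share.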
Transcriptional networks are scale-free
Scale-free structure: presence of few nodes with many links and many nodes with few links
$N(k) \propto 1/k^{\gamma}$, where $N(k)$ is the number of nodes with $k$ links
Scale-free structure provides robustness to the system
Albert & Barabasi, Rev Mod Phys (2002)
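To make the "few nodes with many links, many nodes with few links" statement concrete, the following minimal sketch tallies N(k), the number of transcription factors that regulate exactly k target genes, from an edge list. The function name, the input format and the target names g1 to g7 are hypothetical; fitting a power law to the resulting counts is a separate step not shown here.

```python
from collections import Counter


def out_degree_distribution(edges):
    """Return {k: N(k)}, where N(k) is the number of regulators with k targets.

    edges: iterable of (regulator, target) pairs; duplicates are ignored.
    """
    out_degree = Counter(regulator for regulator, _ in set(edges))
    return dict(Counter(out_degree.values()))


# Toy network with placeholder target names: one hub and several
# regulators with a single target each.
toy_edges = [
    ("Crp", "g1"), ("Crp", "g2"), ("Crp", "g3"), ("Crp", "g4"),
    ("Fnr", "g1"), ("ArgR", "g5"), ("TrpR", "g6"), ("TyrR", "g7"),
]
print(out_degree_distribution(toy_edges))
```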
Scale-free networks exhibit robustness
Robustness: the ability of complex systems to maintain their function even when the structure of the system changes significantly
- Tolerant to random removal of nodes (mutations)
- Vulnerable to targeted attack on hubs (mutations), which makes hubs potential drug targets
Hubs are crucial components in such networks
Haiyuan Yu et al., Trends in Genetics (2004)
Summary I - Structure
Transcriptional networks are made up of motifs that have specific information-processing tasks
Transcriptional networks are scale-free, which confers robustness on such systems, with hubs assuming importance
Madan Babu M, Luscombe N et al., Current Opinion in Structural Biology (2004)
Evolution of networks across organisms
How does the regulatory network change during the course of organismal evolution?
[Figure: an ancestral network in a particular environment; interactions evolve as the environment changes]
Dataset: E. coli transcriptional regulatory network
112 transcription factors (TFs)
711 target genes (TGs)
1295 interactions
Shen-Orr et al. (2002) Nature Genetics
Madan Babu & Teichmann (2003) Nucleic Acids Research
Salgado et al. (2002) Nucleic Acids Research
Procedure to reconstruct regulatory network
Step 1 (E. coli): define TFs and TGs
Step 2 (genome of interest): identify orthologs in the genome of interest
Step 3 (genome of interest): reconstruct interactions if orthologous TFs and TGs exist in the genome of interest and are known to interact in E. coli
Similar to Yu H et al., Genome Research (2004). Verified with COGs: Tatusov, Koonin, Lipman, Science (1998)
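The three steps above can be sketched as a simple transfer of interactions through an ortholog map. This is only an illustration under stated assumptions: the function name, the dictionary-based ortholog table and the gene identifiers `gX_001` and `gX_002` are hypothetical, and ortholog detection itself (Step 2) is taken as given.

```python
def reconstruct_network(reference_edges, orthologs):
    """Transfer known interactions from the reference genome to a genome of interest.

    reference_edges: iterable of (tf, tg) pairs known to interact in E. coli.
    orthologs: dict mapping an E. coli gene to its ortholog in the genome
               of interest; genes without an ortholog are simply absent.
    """
    reconstructed = set()
    for tf, tg in reference_edges:
        # An interaction is kept only if BOTH partners have orthologs.
        if tf in orthologs and tg in orthologs:
            reconstructed.add((orthologs[tf], orthologs[tg]))
    return reconstructed


# Illustrative input: three reference interactions, two genes with orthologs.
reference = [("fnr", "narL"), ("narL", "nuoN"), ("fnr", "nuoN")]
ortholog_map = {"fnr": "gX_001", "narL": "gX_002"}  # hypothetical identifiers
print(reconstruct_network(reference, ortholog_map))
```

Only the fnr to narL interaction is transferred here, because nuoN has no ortholog in the hypothetical genome.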
Reconstructed transcriptional networks
[Figure: example reconstructed networks, including Bacillus anthracis A2012 (5544 genes) and Streptomyces coelicolor (7769 genes). Counts as extracted from the figure: 12, 171, 78, 38, 250, 314, 41, 251, 326, 49, 156, 100]
175 completely sequenced prokaryotic genomes: 20 archaeal, 156 bacterial
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/reconstruct_net
Evolution of networks across organisms
How do regulatory interactions change during the course of organismal evolution?
Selection can operate at three levels of organization
Network: all transcriptional interactions in a cell
Motifs: patterns of interconnections
Interactions: a transcriptional interaction between a transcription factor and a target gene
Madan Babu M, Luscombe N et al., Current Opinion in Structural Biology (2004)
Evolution of the basic unit
Transcription factors and target genes may co-evolve or evolve independently of each other
Co-evolution vs. independent evolution
Work on protein interaction networks has shown that interacting proteins tend to co-evolve
[Figure: transcription factors present (%) plotted against target genes present (%) for the genomes analysed; both axes run from 0 to 100]
Transcription factors evolve rapidly and independently of their target genes
This does not mean that genomes lose transcription factors; instead, they evolve their own sets of regulators
Predicted transcription factors from the different genomes
Nimwegen, TIGS (2003); Renea et al., JMB (2004); Aravind et al., FEMS letters (2005)
[Figure: number of transcription factors (0 to 700) plotted against proteome size (0 to 9000). Labelled genomes: B. pertussis, B. parapertussis, B. bronchiseptica, M. magnetotacticum, Pirellula sp., D. hafniense, N. punctiforme, B. japonicum, Nostoc sp., L. interrogans, S. coelicolor]
[Figure: distribution of predicted transcription factors across DNA-binding domain families (winged HTH, classical prokaryotic HTH, C-terminal effector domain, Cro/C1-type HTH, FIS-like) in Escherichia coli K12 (4311 genes), Bacillus anthracis A2012 (5544 genes) and Streptomyces coelicolor (7769 genes). Counts as extracted, in chart order: 96, 42, 39, 25, 13; 111, 39, 45, 47, 6; 226, 200, 142, 87, 1]
Transcription factor conservation profile
Can be used to predict presence/absence of specific response regulatory pathways

Interaction conservation profile
For each organism, record whether each of the 1295 E. coli interactions is conserved (yes = 1, no = 0):

interaction   1  2  3  4  5  6  ..  1295
organism A    1  1  1  0  1  0  ..  1
organism B    1  0  1  1  1  0  ..  1
...
organism Z    0  0  0  1  1  0  ..  0
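A binary profile of this kind can be built directly from the reference interaction list and, for each organism, the set of E. coli genes that have an ortholog there. The sketch below is a minimal illustration; the function names, the input format and the gene sets are assumptions made for the example.

```python
def interaction_profile(reference_edges, genes_with_orthologs):
    """Return a 0/1 list, one entry per reference interaction.

    An interaction counts as conserved (1) when both the transcription
    factor and the target gene have an ortholog in the organism.
    """
    return [
        1 if tf in genes_with_orthologs and tg in genes_with_orthologs else 0
        for tf, tg in reference_edges
    ]


def profile_matrix(reference_edges, orthologs_by_organism):
    """Return {organism: profile} for every organism in the input dict."""
    return {
        organism: interaction_profile(reference_edges, genes)
        for organism, genes in orthologs_by_organism.items()
    }


# Illustrative data (hypothetical ortholog sets).
ref_edges = [("fnr", "narL"), ("narL", "nuoN"), ("crp", "araC")]
orthologs_present = {
    "organism A": {"fnr", "narL", "nuoN", "crp", "araC"},
    "organism B": {"fnr", "narL", "crp"},
    "organism Z": set(),
}
for name, profile in profile_matrix(ref_edges, orthologs_present).items():
    print(name, profile)
```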
Do organisms with similar lifestyle conserve similar interactions?
Procedure to construct tree based on similarity of conserved networks
$D_{AB} = 1 - \frac{\text{interactions conserved in A and B}}{\text{interactions conserved in A or B}}$, e.g. $D = 1 - \frac{8}{10} = 0.2$
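The distance between two organisms can be computed from their 0/1 interaction profiles as one minus the ratio of interactions conserved in both to interactions conserved in either. The sketch below assumes that reading of the formula; the function name is hypothetical, and returning 0.0 for two empty profiles is a choice made only for this example.

```python
def profile_distance(profile_a, profile_b):
    """Distance D = 1 - |A and B| / |A or B| between two 0/1 profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same interactions")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    if either == 0:
        # Assumption: two organisms with no conserved interactions
        # are treated as identical.
        return 0.0
    return 1 - both / either


# First six entries of the profiles of organisms A and B.
organism_a = [1, 1, 1, 0, 1, 0]
organism_b = [1, 0, 1, 1, 1, 0]
print(profile_distance(organism_a, organism_b))  # 3 shared out of 5 in total
```

Applying the function to every pair of organisms gives the distance matrix from which a tree can be built with a standard clustering or neighbour-joining tool.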
Tree based on network similarity
[Figure: distance tree based on interactions present; the leaves are the genomes analysed, labelled with their taxonomic group]
Closely related organisms with similar lifestyle cluster together
Organisms with similar lifestyle but belonging to different phylogenetic groups cluster together
In the example below, organisms O1 and O2 belong to lifestyle LS1, and organisms O3 and O4 belong to lifestyle LS2.

Each element in the matrix represents the normalized distance between motif profiles for a given pair of organisms:

            O1    O2    O3    O4
O1 (LS1)    0.0   0.2   0.6   0.7
O2 (LS1)    0.2   0.0   0.5   0.6
O3 (LS2)    0.6   0.5   0.0   0.3
O4 (LS2)    0.7   0.6   0.3   0.0

Average distance between organisms having different lifestyles:

        LS1    LS2
LS1     0.1    0.6
LS2     0.6    0.15

Row normalized distance between organisms having different lifestyles (each entry divided by the row maximum):

        LS1                 LS2
LS1     0.1/0.6 = 0.166     0.6/0.6 = 1.000
LS2     0.6/0.6 = 1.000     0.15/0.6 = 0.250

Each element in the matrix represents the normalized similarity between interaction or motif profiles for a given pair of organisms:

            O1    O2    O3    O4
O1 (LS1)    1.0   0.8   0.4   0.3
O2 (LS1)    0.8   1.0   0.5   0.4
O3 (LS2)    0.4   0.5   1.0   0.7
O4 (LS2)    0.3   0.4   0.7   1.0

Average similarity between organisms having different lifestyles:

        LS1    LS2
LS1     0.9    0.4
LS2     0.4    0.85

Row normalized similarity between organisms having different lifestyles (each entry divided by the row maximum):

        LS1                  LS2
LS1     0.9/0.9 = 1.000      0.4/0.9 = 0.444
LS2     0.4/0.85 = 0.470     0.85/0.85 = 1.000
Lifestyle similarity index
Define a lifestyle class for each of the 176 organisms based on 4 attributes: oxygen requirement, optimal growth temperature, habitat, and pathogen or not
e.g. E. coli would belong to the class Facultative:Mesophilic:Host-associated:No
$\mathrm{LSI} = \frac{\text{average similarity between organisms belonging to the same lifestyle}}{\text{average similarity between organisms belonging to different lifestyles}} = \frac{\sum \text{diagonal elements} \,/\, \text{number of diagonal elements}}{\sum \text{off-diagonal elements} \,/\, \text{number of off-diagonal elements}}$
Each cell represents the average similarity in interaction content between organisms
Lifestyle similarity index
LSI = 1.42, p-value < 10^-3
Organisms with similar lifestyle conserve similar interactions
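A lifestyle similarity index of this kind can be sketched in three steps: average the pairwise similarities within and between lifestyle classes, divide each row by its maximum, and take the ratio of the mean diagonal element to the mean off-diagonal element. The code below is an illustration under those assumptions, using a four-organism toy matrix; the function name and input format are hypothetical, and the significance test behind the reported p-value is not reproduced.

```python
def lifestyle_similarity_index(similarity, lifestyles):
    """Return (lsi, averaged_matrix, row_normalized_matrix).

    similarity: square list of lists of pairwise similarities.
    lifestyles: lifestyle class label of each organism, in matrix order.
    Requires at least two lifestyle classes and positive similarities.
    """
    classes = list(dict.fromkeys(lifestyles))  # keep first-seen order
    members = {c: [i for i, l in enumerate(lifestyles) if l == c] for c in classes}

    # Step 1: average similarity for every pair of lifestyle classes.
    averaged = []
    for c1 in classes:
        row = []
        for c2 in classes:
            values = [similarity[i][j] for i in members[c1] for j in members[c2]]
            row.append(sum(values) / len(values))
        averaged.append(row)

    # Step 2: divide each row by its maximum.
    normalized = [[value / max(row) for value in row] for row in averaged]

    # Step 3: mean diagonal element over mean off-diagonal element.
    n = len(classes)
    diagonal = [normalized[i][i] for i in range(n)]
    off_diagonal = [normalized[i][j] for i in range(n) for j in range(n) if i != j]
    lsi = (sum(diagonal) / len(diagonal)) / (sum(off_diagonal) / len(off_diagonal))
    return lsi, averaged, normalized


# Toy similarity matrix for organisms O1..O4 (O1, O2 in LS1; O3, O4 in LS2).
toy_similarity = [
    [1.0, 0.8, 0.4, 0.3],
    [0.8, 1.0, 0.5, 0.4],
    [0.4, 0.5, 1.0, 0.7],
    [0.3, 0.4, 0.7, 1.0],
]
toy_lifestyles = ["LS1", "LS1", "LS2", "LS2"]
lsi, averaged, normalized = lifestyle_similarity_index(toy_similarity, toy_lifestyles)
print(round(lsi, 3))
```

An index above 1 indicates that organisms sharing a lifestyle are, on average, more similar to each other than to organisms with a different lifestyle.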
Summary I - Evolution of the basic unit
Transcription factors tend to evolve more rapidly than their target genes. This, coupled with the observation that different genomes evolve their own transcription factors, means that organisms sense and respond to different signals in their environment.
At the level of regulatory interactions, organisms with similar lifestyle conserve similar regulatory interactions, indicating the influence of environment on gene regulation.
Evolution of network motifs across organisms
Interactions in motifs may be conserved as a unit or may evolve like any other interaction in the network
Complete conservation or absence vs. partial conservation
Work on protein interaction networks has shown that motifs tend to be completely conserved
Generation of motif conservation profiles
For E. coli and for each organism (A, B, ..., Z), consider the feed-forward motifs (F1 ... Fn), single input modules (S1 ... Sn) and multi input modules (M1 ... Mn). Each entry of the motif conservation profile is the fraction of interactions in the motif that are conserved in that organism.

Feed-forward motifs
Org   F1      F2   Fn
A     (3/3)   .    (1/3)
B     (3/3)   .    (1/3)
...
Z     .       .    .

Single input modules
Org   S1      S2   Sn
A     (2/3)   .    (3/3)
B     (2/3)   .    (3/3)
...
Z     .       .    .

Multi input modules
Org   M1      M2   Mn
A     (2/4)   .    (4/4)
B     (1/4)   .    (4/4)
...
Z     .       .    .

Clustering of motifs (e.g. K-means)
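One way to read entries such as (1/3) or (3/3) is as the fraction of a motif's interactions that are conserved in an organism. The sketch below computes that fraction, assuming an interaction is conserved when both genes have orthologs; the function name, the motif labels and the gene sets are illustrative assumptions.

```python
def motif_conservation_profile(motifs, genes_with_orthologs):
    """Return {motif name: fraction of its interactions that are conserved}.

    motifs: dict mapping a motif name to its list of (tf, tg) interactions
            in the reference network.
    genes_with_orthologs: set of reference genes that have an ortholog
            in the organism of interest.
    """
    profile = {}
    for name, edges in motifs.items():
        conserved = sum(
            1 for tf, tg in edges
            if tf in genes_with_orthologs and tg in genes_with_orthologs
        )
        profile[name] = conserved / len(edges)
    return profile


# Illustrative motifs: one feed-forward motif and one single input module.
reference_motifs = {
    "F1": [("fnr", "narL"), ("narL", "nuoN"), ("fnr", "nuoN")],
    "S1": [("argR", "argD"), ("argR", "argE"), ("argR", "argF")],
}
present_in_organism = {"fnr", "narL", "argR", "argD", "argE", "argF"}
print(motif_conservation_profile(reference_motifs, present_in_organism))
```

Here F1 is only partially conserved (one of three interactions) while S1 is fully conserved, which is the kind of mixed pattern the profiles are meant to capture.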
Motifs are only partially conserved in many genomes
[Figure: conservation of motifs (0% to 100%) across genomes, with E. coli marked; many motifs are partially conserved]
Are interactions in motifs more conserved than other interactions in the network?
Simulation of network evolution
- Negative selection for interactions in motifs: interactions in motifs are selected against
- Positive selection for interactions in motifs: interactions in motifs are selectively conserved
- Neutral selection for interactions in motifs: interactions in motifs are neutrally conserved
Interactions in motifs evolve like any other interaction in the network
[Figure: Cm (-1 to 1) plotted against % genes conserved (30 to 100). Curves: selection for motifs, observed trend in genomes, neutral conservation of motifs, selection against motifs]
Evolutionarily closely related organisms that have dissimilar lifestyle do not conserve network motifs
[Figure: motif involving Fnr, NarL and NuoN in Salmonella typhi, Vibrio cholerae, Haemophilus somnus, Xylella fastidiosa and Blochmannia floridanus (all γ-proteobacteria)]
Evolutionarily distantly related organisms that have similar lifestyle conserve network motifs
[Figure: motif involving Fnr and NarL in R. palustris (α-proteobacteria), B. pertussis (β-proteobacteria), N. punctiforme (Cyanobacteria), S. avermitilis (Actinobacteria) and D. hafniense (Firmicute)]
Orthologous genes can be embedded in different motifs according to requirements dictated by lifestyle
Interactions in motifs evolve like any other interaction in the network
Feed-forward motif: responds to persistent signal. E. coli: stable environment, requires persistent signal
Single input motif: quick response. H. influenzae: unstable environment, requires quick response
Lifestyle similarity index
Define a lifestyle class for each of the 176 organisms based on 4 attributes: oxygen requirement, optimal growth temperature, habitat, and pathogen or not
e.g. E. coli would belong to the class Facultative:Mesophilic:Host-associated:No
$\mathrm{LSI} = \frac{\text{average similarity between organisms belonging to the same lifestyle}}{\text{average similarity between organisms belonging to different lifestyles}} = \frac{\sum \text{diagonal elements} \,/\, \text{number of diagonal elements}}{\sum \text{off-diagonal elements} \,/\, \text{number of off-diagonal elements}}$
Each cell represents the average similarity in motif content between organisms
LSI = 1.34, p-value < 3x10^-3
Organisms with similar lifestyle conserve network motifs and hence may regulate target genes in a similar manner
Summary II - Evolution of network motifs
Even though motifs are not conserved as whole units, organisms with similar lifestyle tend to conserve similar motifs
Evolution of global structure
Regulatory hubs may be conserved or lost and replaced
Conservation of hubs vs. replacement of hubs
Work on protein interaction networks has shown that hubs tend to be conserved
Are hubs more conserved than other nodes in the network?
Simulation of network evolution
- Negative selection for hubs: hubs in networks are selected against
- Positive selection for hubs: hubs in networks are selectively conserved
- Neutral selection for nodes: nodes in networks are neutrally conserved
Regulatory hubs are lost as rapidly as other transcription factors in the network
[Figure: the hubs Crp and NarL in E. coli, H. influenzae and B. pertussis]
Regulatory hubs which are condition specific can be either lost or replaced
The same protein may confer different adaptive value in organisms with different lifestyles. Hence it may emerge as a regulatory hub in the organism to which it confers high adaptive value and not in the others
Different proteins should emerge as hubs in organisms with different lifestyle
Different proteins emerge as regulatory hubs
Known transcriptional regulatory network of B. subtilis: CcpA (85), ComK (48), AbrB (41), Fur (37), PhoP (33), CodY (30)
Known transcriptional regulatory network of E. coli: Crp (188), Fnr (109), Ihf (95), ArcA (69), NarL (65), Lrp (52)
Scale-free structure emerged independently in evolution
Hubs evolve according to requirements dictated by lifestyle
Summary III - Evolution of global structure
Even though hubs can be lost or replaced, organisms with different lifestyle evolve a scale-free structure where different proteins emerge as hubs as dictated by their lifestyle
Implications
- First overview of transcriptional regulatory systems, including prediction of transcription factors, in experimentally intractable organisms and pathogens
- Key regulatory hubs, once identified, can possibly serve as good drug targets
- Good starting point for studying experimentally how changes in cis-regulatory elements affect gene expression, and for engineering regulatory interactions
- Methods developed are generically applicable
Conclusion
- Transcription factors evolve independently of their target genes. Organisms with similar lifestyle conserve similar interactions
- Interactions in motifs are not conserved as whole units. Organisms with similar lifestyle conserve similar motifs
- Hubs are not completely conserved and can be lost or replaced. Different proteins emerge as hubs in organisms as dictated by lifestyle
Transcriptional networks in prokaryotes are flexible and adapt to their environment by tinkering with individual interactions
Acknowledgements
Sarah Teichmann, MRC-LMB, Cambridge, U.K.
L Aravind, NCBI, NIH, Bethesda, USA
MRC - Laboratory of Molecular Biology; National Institutes of Health
Evolutionary dynamics of prokaryotic transcriptional regulatory networks. Madan Babu M, Teichmann SA & Aravind L, submitted
http://www.mrc-lmb.cam.ac.uk/genomes/madanm/publications.html
Past work from our lab
Evolution of transcription factors: Madan Babu M & Teichmann SA, Nucleic Acids Research (2003); Trends in Genetics (2003)
Growth of transcriptional regulatory networks (duplication of TF, duplication of TG, duplication of TG + TF): Teichmann SA & Madan Babu M, Nature Genetics (2004)
Condition specific usage of transcriptional regulatory networks: Luscombe N, Madan Babu M et al., Nature (2004)
[Figure: relative distribution of Pearson correlation coefficients (-1 to 1) for co-regulated pairs of TGs, TF-TG pairs and random pairs of genes. Panel a: E. coli; panel b: V. cholerae]
Experimental network: 753 interactions, 569 proteins
Reconstructed network: 414 interactions, 322 proteins
Overlap: 123 proteins
Interactions in the experimentally determined network formed by the 123 proteins: 63
Interactions in the reconstructed network formed by the 123 proteins: 133
Interactions seen in both networks: 46
Interactions seen ONLY in experimental network: 17
Interactions seen ONLY in reconstructed network: 87
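A comparison of this kind can be sketched with set operations: restrict both networks to the proteins they share, then count the interactions found in both, only in the experimental network, or only in the reconstructed one. The function name and the toy networks below are hypothetical and serve only to show the bookkeeping.

```python
def compare_networks(experimental, reconstructed):
    """Compare two sets of (regulator, target) interactions.

    Both networks are first restricted to the proteins that occur in
    both of them. Returns (shared_proteins, both, only_experimental,
    only_reconstructed), the last three being sets of interactions.
    """
    def proteins(edges):
        return {gene for edge in edges for gene in edge}

    shared = proteins(experimental) & proteins(reconstructed)

    def restrict(edges):
        return {(a, b) for a, b in edges if a in shared and b in shared}

    exp_restricted = restrict(experimental)
    rec_restricted = restrict(reconstructed)
    return (
        shared,
        exp_restricted & rec_restricted,
        exp_restricted - rec_restricted,
        rec_restricted - exp_restricted,
    )


# Toy networks with placeholder protein names.
toy_experimental = {("a", "b"), ("b", "c"), ("c", "d")}
toy_reconstructed = {("a", "b"), ("a", "c"), ("e", "a")}
shared, both, only_exp, only_rec = compare_networks(toy_experimental, toy_reconstructed)
print(len(shared), len(both), len(only_exp), len(only_rec))
```

The three interaction counts always satisfy the same identities as the figures above: interactions in both plus interactions only in one network equals that network's total among the shared proteins.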