Introduction to Systems Biology
4. Gene (Regulatory) Networks
Toni Reverter – June 2010, Valencia, Spain.
Toni Reverter. Introduction to Systems Biology. 7 – 11 June, Valencia, Spain
4.1 Introduction to Gene Networks
When a comprehensive gene expression data set representing a large number of perturbations is available, the reverse engineering of gene regulatory networks becomes a logical step towards the elucidation of biological pathways of interest.
While developmental (i.e., time-series) experiments provide the ideal framework, Basso et al. (2005; Nature Genetics 37:382) showed that a large number of perturbations can do the trick… with the right mathematical approach.
Barabási & Oltvai (2004) Network Biology: Understanding the cell’s functional organization. Nature Reviews Genetics 5:101.
“Network theory offers unforeseen possibilities to understand the cell’s internal organization and evolution, fundamentally altering our view of cell biology”.
Networks contain small, repeating patterns of interconnections called network motifs.
Basic network motifs include: (1) feed-forward; (2) single input; (3) multiple input; (4) a combination of the above.
Think of motifs as simple building blocks of complex networks.
Much of a cell’s activity is organised as a network of interacting modules: sets of genes co-regulated to respond to different conditions.
Think of modules as clusters: genes are highly connected within a module but sparsely (if at all) connected across modules.
Understanding this organisation is crucial for understanding cellular responses to internal and external signals.
Once a network is built, both its (1) mathematical soundness (e.g., a scale-free, power-law distribution of its connectivity) and (2) biological soundness (e.g., targets via assays; effects via knock-outs) need to be validated.
Network motifs
Patterns of interconnections that frequently appear in the network.
Gene regulatory network of an organism: a directed graph in which nodes represent genes and edges represent transcriptional interactions.
Basic unit: a transcription factor, a target gene with its binding site, and a regulatory interaction between them.
Source: M. Madan-Babu, MRC Laboratory of Molecular Biology, Cambridge
Connectivity Rules
Does this map tell you which cities are important?
This one does!
The nodes with the largest number of links (connections) are the most important!
Gene expression is more strongly preserved than gene connectivity.
Hypothesis: molecular wiring makes us human.
Chimp vs Human
Additional hypotheses: (1) alternative splice variants of the same gene; (2) the role of non-coding DNA → RNAi.
Khaitovich et al. (2005)Science 309:1850.
Connectivity Rules
Connectivity Rules
Network theory used to capture Saddam Hussein!
“The intelligence background and link diagrams that we built were rooted in the concepts of network analysis. We constructed an elaborate product that traced the tribal and family linkages of Saddam Hussein thereby allowing us to focus on certain individuals who may have had (or presently had) close ties to [him],”
Major Brian J. Reed, http://www2.asanet.org/footnotes/julyaugust05/fn3.html
Connectivity Rules
http://christakis.med.harvard.edu/
Lee et al. (2002) Science 298:799-804.
MORE TERMINOLOGY:
Nodes (or vertices) are genes.
Connections (or edges, or links) are interactions.
Directed interactions (i.e., those of a regulatory nature) involve a transcription factor and its target(s).
In the main, we’ll deal with gene co-expression networks (…a way to explore the correlation matrix).
Basic Network Motifs
Basic Network Motifs …and their relevance
Single-input motif: one transcription factor (TF) regulates genes G1, G2 and G3.
Temporal order of expression: last-in-first-out (LIFO) order.
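As a toy illustration of the LIFO pattern, the sketch below assumes a step-function response in which each target gene is on whenever TF activity exceeds a gene-specific threshold; the thresholds and the rise-then-decay TF pulse are hypothetical. The gene with the lowest threshold switches on first and off last.

```python
import math


def switching_order(thresholds, n_steps=200):
    """Return the order in which genes switch on and off during a single TF pulse.

    thresholds: dict mapping gene name -> activation threshold (hypothetical values).
    A gene is 'on' whenever TF activity exceeds its threshold (step-function assumption).
    """
    on_order, off_order = [], []
    state = {gene: False for gene in thresholds}
    for step in range(n_steps):
        t = step / (n_steps - 1)       # time rescaled to [0, 1]
        tf = math.sin(math.pi * t)     # assumed pulse: rises from 0 to 1, then decays to 0
        for gene, k in sorted(thresholds.items(), key=lambda item: item[1]):
            is_on = tf > k
            if is_on and not state[gene]:
                on_order.append(gene)
            elif state[gene] and not is_on:
                off_order.append(gene)
            state[gene] = is_on
    return on_order, off_order


on, off = switching_order({"G1": 0.2, "G2": 0.5, "G3": 0.8})
print(on)   # ['G1', 'G2', 'G3']
print(off)  # ['G3', 'G2', 'G1']  (last in, first out)
```

Real promoters respond gradually rather than as on/off switches, but any monotone response with ordered thresholds gives the same ordering.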
Feed-forward motif: TF1 regulates TF2, and both TF1 and TF2 regulate genes G1, G2 and G3 (each link can act as an activator or a repressor).
Temporal order of expression: first-in-first-out (FIFO) order.
• The flagellum is a device made of ~30 proteins.
• It is only built if needed (i.e., if there is no food around).
• It is an electrical motor, powered by the flow of protons across the cell membrane.
• It is capable of pushing the cell at 30 microns/sec.
Temporal Order of Expression (KEGG database: http://www.genome.ad.jp/kegg/kegg2.html)
Basic Network Measures
The degree (or connectivity) of a node, k, is the number of links the node has.
The degree distribution, P(k), is the probability that a selected node has exactly k links. Networks are classified by their degree distribution.
The clustering coefficient, $C = 2k/(N(N-1))$, where k here is the total number of links among N nodes, measures cohesiveness: the tendency of nodes to form clusters or modules. (Strictly, this network-wide ratio is the density, or connectance, of the network; the node-level clustering coefficient is given in Note 3.)
Note 1: the maximum number of connections is N(N−1)/2 (i.e., the number of off-diagonal elements above the diagonal of the correlation matrix R), in which case C = 1.0.
Note 2: for many networks, $C(k) \sim k^{-1}$, which indicates the network's hierarchical character (more on this later).
Note 3: for a single node i, $C_i = 2n_i/(k_i(k_i-1))$, where $n_i$ is the number of links connecting the $k_i$ neighbours of node i with each other, and $k_i(k_i-1)/2$ is the total number of triangles that would pass through node i if all of its neighbours were connected with each other.
The path length is the number of links we need to pass through to travel between two nodes. The mean path length, $\ell$, is the average shortest path between all pairs of nodes and offers a measure of overall navigability.
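The measures above can be computed with a few lines of plain Python. This is a minimal sketch on a hypothetical four-gene network (a square with one diagonal); `density` is the network-wide ratio 2k/(N(N−1)), `local_clustering` is the node-level coefficient of Note 3, and all function names are illustrative.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """P(k): fraction of nodes with exactly k links."""
    counts = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def local_clustering(adj, i):
    """C_i = 2 n_i / (k_i (k_i - 1)); taken as 0 when node i has fewer than 2 neighbours."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    n_i = sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])  # links among neighbours
    return 2 * n_i / (k * (k - 1))


def density(n_nodes, n_links):
    """2k / (N (N - 1)): links present over links possible."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))


def mean_shortest_path(adj):
    """Average breadth-first-search distance over all pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")])
print(degree_distribution(adj))           # {2: 0.5, 3: 0.5}
print(local_clustering(adj, "B"))         # 1.0
print(round(mean_shortest_path(adj), 3))  # 1.167
print(round(density(22, 31), 4))          # 0.1342
```

For N = 22 genes and k = 31 connections, `density(22, 31)` gives about 0.1342.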
Computing Basic Network Measures
Degree Distribution:
Clustering coefficient, $C = 2k/(N(N-1))$:
N = 22 genes; k = 31 connections
$C = (2 \times 31)/(22 \times 21) = 0.1342 \approx 13.4\%$
Interpreting Basic Network Measures
Three types of networks according to their connectivity structure:
1. Random network
2. Scale-free network
3. Hierarchical network
NB: Biological networks are reported to be scale-free… a bit like having an airbag in your car.
Random Networks
Each pair of nodes is connected with probability p, creating a graph with approximately p × N(N−1)/2 randomly placed links.
The degree distribution follows a Poisson distribution: nodes that deviate from the average are rare, and their frequency decreases exponentially.
The clustering coefficient is independent of a node's degree of connectivity, so C(k) appears as a horizontal line.
The mean shortest path is $\ell \sim \log N$, indicating that most nodes are connected by a short path (small-world property).
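A minimal sketch of the random (Erdős–Rényi) model described above, assuming N = 2,000 nodes and p = 0.005 (hypothetical values). The number of links should be close to p × N(N−1)/2, and the mean and the variance of the degree should both be close to p(N−1), as expected for an approximately Poisson degree distribution.

```python
import random


def random_network(n, p, seed=None):
    """Erdős–Rényi G(N, p): each of the N(N-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def degree_list(n, edges):
    """Number of links per node."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


N, p = 2000, 0.005
edges = random_network(N, p, seed=1)
deg = degree_list(N, edges)
mean_k = sum(deg) / N
var_k = sum((k - mean_k) ** 2 for k in deg) / (N - 1)
expected_links = p * N * (N - 1) / 2     # about 9,995
print(len(edges), round(expected_links))
print(round(mean_k, 2), round(var_k, 2))  # both close to p * (N - 1), about 10
```

Strictly, the degree of each node is binomial, Bin(N−1, p); the Poisson form is its limit for large N and small p.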
Scale-Free (Power-Law) Networks
Most nodes are poorly connected, while a few are highly connected (hubs).
The degree distribution approximates a power law, $P(k) \sim k^{-\gamma}$, where γ is the degree exponent (a straight line on a log-log plot).
The smaller the γ, the more important the role of the hubs. Most biological networks have 2 < γ < 3. For γ > 3, hubs are irrelevant and the network behaves like a random network.
The mean shortest path length is proportional to log(log(N)), i.e., much shorter than the log(N) of the small-world property.
Like random networks, C(k) is independent of k (a horizontal line).
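One common (if crude) way to estimate γ is to fit a straight line to log P(k) versus log k. The sketch below applies a least-squares fit to an exact, hypothetical power law with γ = 2.5 and recovers the exponent.

```python
import math


def loglog_slope(ks, pk):
    """Least-squares slope of log P(k) against log k; -slope estimates gamma."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in pk]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


gamma = 2.5                            # hypothetical degree exponent
ks = list(range(1, 101))
norm = sum(k ** -gamma for k in ks)
pk = [k ** -gamma / norm for k in ks]  # exact power law, normalised to sum to 1
print(round(-loglog_slope(ks, pk), 6))  # 2.5
```

With real, noisy degree counts this fit can be biased by the sparse tail, and maximum-likelihood estimators are generally preferred.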
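Scale-free networks are invariant to changes in scale: rescaling the degree by a factor a changes P(k) only by a multiplicative factor, P(ak) = bP(k). For a power law, $P(ak) \propto (ak)^{-\gamma} = a^{-\gamma}k^{-\gamma}$, so $b = a^{-\gamma}$.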
This implies that scale-free networks are self-similar, i.e. any part of the network is statistically similar to the whole network and parameters are assumed to be independent of the system size.
Think of a cauliflower:
No matter which scale is chosen, the same distribution of degrees is observed among nodes.
Hierarchical Networks
To accommodate modularity, clusters combine in an iterative manner, generating a hierarchical network.
The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent γ = 1 + ln4/ln3 = 2.26.
The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows $C(k) \sim k^{-1}$: a straight line of slope −1 on a log–log plot.
Random vs Scale-Free vs Hierarchical
[Figure: for each network type, panels show the topology, the distribution of connectivity k, and the clustering coefficient C(k).]
Random vs Scale-Free Networks
In the random network (left), the five nodes with the most links (in red) are connected to only 27% of all nodes (green).
In the scale-free network (right), the five most connected nodes (red) are connected to 60% of all nodes (green).
Scale-free networks are resistant to random failure but vulnerable to targeted attack, specifically against hubs. This property has been held to account for the robustness of biological networks to perturbations such as mutation and environmental stress. A positive correlation between essentiality and connectivity has been demonstrated, linking topological centrality to functional essentiality.
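To make the robustness argument concrete, the sketch below uses a hypothetical toy network with two linked hubs, each carrying five peripheral genes. It compares the size of the largest connected component after a targeted attack (removing the most connected node) with the average size after a random failure (removing one node chosen uniformly at random, averaged exactly over all nodes).

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once the nodes in `removed` are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


# Hypothetical toy network: hubs H1 and H2 are linked; each has five peripheral genes.
edges = [("H1", "H2")]
edges += [("H1", f"a{i}") for i in range(1, 6)]
edges += [("H2", f"b{i}") for i in range(1, 6)]
adj = build_adjacency(edges)

hub = max(adj, key=lambda node: len(adj[node]))  # targeted attack: most connected node
targeted = largest_component(adj, {hub})
random_failure = sum(largest_component(adj, {node}) for node in adj) / len(adj)
print(len(adj), targeted, round(random_failure, 2))  # 12 6 10.17
```

Because most nodes are peripheral, a random failure usually removes a leaf and leaves 11 of 12 genes connected, whereas losing a hub splits the network in half.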
Luscombe et al. (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431:308-312
Hidalgo et al. (2009) PLoS Comp Biol 5:e1000353
Phenotypic Disease Network
Understanding and Exploiting the Flow of Information in Networks
[Figure: three panels with r = 0.53 (P = 0.002); r = −0.26 (P = 0.146); r = −0.73 (P < 0.001).]
A Model to Generate Networks
1. Evolution: continuous expansion by the addition of new nodes.
2. Growing schema: starting with a small number of nodes ($m_0 \ge 2$), at every time step we add a new node with $m \le m_0$ edges that link the new node to m different nodes already present in the system.
3. Preferential attachment (rich get richer): new nodes attach preferentially to sites that are already well connected → the probability P that a new vertex will be connected to vertex i depends on the connectivity $k_i$ of that vertex, so that $P(k_i) = k_i / \sum_j k_j$.
Barabási and Albert (1999) Science 286:509-512
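The three rules above can be sketched in Python as follows. Two details are assumptions made for this illustration rather than part of the slide: the m0 seed nodes start fully connected (so every node has k > 0), and preferential attachment is implemented by drawing uniformly from a list in which each node appears once per link, which gives node i a probability proportional to k_i.

```python
import random


def barabasi_albert(n, m, m0, seed=None):
    """Grow an undirected network to n nodes by preferential attachment."""
    if not (2 <= m0 < n and 1 <= m <= m0):
        raise ValueError("need 2 <= m0 < n and 1 <= m <= m0")
    rng = random.Random(seed)
    # Assumption: the m0 seed nodes start fully connected to each other.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Node i appears k_i times in this list, so a uniform draw picks i with probability k_i / sum_j k_j.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:  # m distinct existing nodes
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges


edges = barabasi_albert(n=1000, m=2, m0=3, seed=42)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()))  # 1997 links; the largest hub is far above the mean degree of ~4
```

Rejecting repeated draws to obtain m distinct targets slightly perturbs the exact k_i / Σk_j probabilities, a common simplification in sketches of this model.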
This network evolves into a scale-invariant state in which the probability that a node has k edges follows a power law with exponent γ = 2.9 ± 0.1.
4.2 Gene Co-Expression Networks
Why?
• Understand the “system” instead of reporting a list of individual parts.
• Describe the functioning of the engine instead of enumerating individual nuts and bolts.
• Focus on modules as opposed to individual genes; this greatly alleviates the multiple-testing problem.
• Network terminology is intuitive to biologists.
Rationale
• Genes showing higher than average co-expression are more likely to belong to the same pathway (class comparison)
• Genes controlled by the same transcription factor show higher than average co-expression (class discovery)
Wolfe CJ, Kohane IS and Butte AJ (2005) Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC Bioinformatics, 6, 227.
24 Normal Tissues (in duplicate) – 3,327 Genes & 15,841 Connections
http://bioinfow.dep.usal.es/coexpression/
Prieto et al. (2009) Human gene coexpression landscape: confident network derived from tissue transcriptomic profiles. PLoS ONE 3:e3911
Why Correlation?
Statistics are like a bikini: what they reveal is suggestive, but what they conceal is vital (Aaron Levenstein).
The correlation coefficient is the most “bikini” of all statistics (Toni Reverter).
[Figure: scatter plots of Gene 1 vs Gene 2 expression, Case 1 and Case 2.]
The definition of a “high” correlation is somewhat tricky.
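The “bikini” point can be checked numerically. In this hypothetical example, two genes show no linear relationship across nine conditions, yet adding a single extreme condition produces a correlation of about 0.95.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression of two genes across nine conditions: no linear trend.
gene1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
gene2 = [2, 1, 2, 1, 2, 1, 2, 1, 2]
print(abs(round(pearson(gene1, gene2), 3)))           # 0.0

# Add one extreme condition in which both genes are very high.
print(round(pearson(gene1 + [30], gene2 + [30]), 3))  # 0.949
```

Plotting the data before trusting r (or using a rank-based correlation) guards against this kind of single-sample artefact.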
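Standard error of a correlation coefficient r estimated from n conditions:
$$SE(r) = \sqrt{\frac{1 - r^2}{n - 2}}$$
[Figure: frequency distribution of correlation coefficients (−1 to 1) computed from 5, 15 and 50 conditions.]
Under the null hypothesis of no correlation:
$$\frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{n-2}$$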
Finding significant correlations is NOT a trivial task
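A minimal sketch of both formulas, plus a small simulation with hypothetical standard-normal data showing why significance is hard to reach with few conditions: for two unrelated genes, the spread of r shrinks only as the number of conditions grows (its standard deviation is close to 1/√(n−1)). For r = 0.5, SE(r) is 0.5 with 5 conditions but 0.125 with 50, and t rises from 1 to 4.

```python
import math
import random


def se_r(r, n):
    """Standard error of a correlation coefficient: sqrt((1 - r^2) / (n - 2))."""
    return math.sqrt((1 - r * r) / (n - 2))


def t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_r_spread(n, reps=2000, seed=0):
    """Standard deviation of r between two unrelated (independent normal) genes over n conditions."""
    rng = random.Random(seed)
    rs = [pearson([rng.gauss(0, 1) for _ in range(n)],
                  [rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
    mean = sum(rs) / reps
    return math.sqrt(sum((r - mean) ** 2 for r in rs) / (reps - 1))


for n in (5, 15, 50):
    print(n, round(se_r(0.5, n), 3), round(t_stat(0.5, n), 2), round(null_r_spread(n), 2))
```

A two-sided p-value can then be read from Student's t distribution with n − 2 degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available. With thousands of genes, the number of pairwise tests also calls for a multiple-testing correction.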
Weighted network (soft thresholding): all genes are connected; edge width reflects connection strength.
Un-weighted network (hard thresholding): only some genes are connected, all with equal strength.
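A sketch of the two options on a hypothetical 3 × 3 correlation matrix. The hard threshold τ = 0.6 and the soft-thresholding power β = 6 are illustrative choices; the power adjacency |r|^β follows the weighted framework of Zhang and Horvath (2005).

```python
def hard_threshold(r, tau):
    """Un-weighted adjacency: 1 if |r_ij| > tau, else 0 (no self-links)."""
    n = len(r)
    return [[1 if i != j and abs(r[i][j]) > tau else 0 for j in range(n)] for i in range(n)]


def soft_threshold(r, beta):
    """Weighted adjacency: |r_ij| ** beta keeps every link but shrinks weak ones (no self-links)."""
    n = len(r)
    return [[abs(r[i][j]) ** beta if i != j else 0.0 for j in range(n)] for i in range(n)]


R = [[1.0, 0.9, 0.3],
     [0.9, 1.0, -0.5],
     [0.3, -0.5, 1.0]]  # hypothetical correlation matrix for three genes

print(hard_threshold(R, tau=0.6))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
for row in soft_threshold(R, beta=6):
    print([round(a, 4) for a in row])  # 0.9**6 ≈ 0.5314, 0.5**6 ≈ 0.0156, 0.3**6 ≈ 0.0007
```

The hard threshold discards the 0.5 and 0.3 correlations entirely, while the soft version keeps them with strongly reduced weights, so the result is less sensitive to the exact cut-off.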
The PCIT Algorithm
Compare direct vs partial correlation:
$r_{xy}$ = strength of the relationship between x and y
Reverter and Chan (2008) Bioinformatics 24:2491
$r_{xy\cdot z}$ = strength of the relationship between x and y that is independent of z (trio of genes X, Y and Z)
$r_{xy\cdot z}/r_{xy}$ = proportion of the strength of the relationship between x and y that is independent of z
PCIT: use of Partial Correlations and an Information Theory approach to establish connections in the network
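For a trio of genes X, Y and Z, the first-order partial correlation is
$$r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
Threshold, the average ratio of partial to direct correlation across the trio (0.698 in this example):
$$\varepsilon = \frac{1}{3}\left(\frac{r_{xy\cdot z}}{r_{xy}} + \frac{r_{xz\cdot y}}{r_{xz}} + \frac{r_{yz\cdot x}}{r_{yz}}\right)$$
If $r_{xy} > 0.698\,r_{xz}$ and $r_{xy} > 0.698\,r_{yz}$: connection between X and Y.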
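A minimal sketch of the trio computation in Python, with hypothetical correlation values. It computes the first-order partial correlation and the tolerance ε (the average ratio of partial to direct correlation over the trio); the keep/discard test follows the rule as stated on this slide, using absolute values so that negative correlations are handled (an assumption). The published PCIT software loops over every trio of genes and may differ in detail.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def tolerance(r_xy, r_xz, r_yz):
    """epsilon: mean ratio of partial to direct correlation over the trio (undefined if any r is 0)."""
    return (partial_corr(r_xy, r_xz, r_yz) / r_xy        # r_xy.z / r_xy
            + partial_corr(r_xz, r_xy, r_yz) / r_xz      # r_xz.y / r_xz
            + partial_corr(r_yz, r_xy, r_xz) / r_yz) / 3  # r_yz.x / r_yz


def keep_xy(r_xy, r_xz, r_yz):
    """Slide's rule for one trio: connect X and Y if |r_xy| > |eps*r_xz| and |r_xy| > |eps*r_yz|."""
    eps = tolerance(r_xy, r_xz, r_yz)
    return abs(r_xy) > abs(eps * r_xz) and abs(r_xy) > abs(eps * r_yz)


# Hypothetical trios of correlations (r_xy, r_xz, r_yz)
print(round(tolerance(0.8, 0.6, 0.5), 3), keep_xy(0.8, 0.6, 0.5))  # 0.542 True
print(round(tolerance(0.2, 0.9, 0.3), 3), keep_xy(0.2, 0.9, 0.3))  # 0.364 False
```

Because ε is computed from the trio itself, the cut-off adapts to the local correlation structure instead of being fixed by hand.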
SIMULATED DATA → RANDOM NETWORK
[Figure: clustering coefficient (%) vs number of genes (0 to 5,000) for 5, 10, 15, 25 and 50 conditions; number of genes vs number of connections.]
REAL DATA → SCALE-FREE NETWORK
The PCIT Algorithm …and the Competition
Bin Zhang and Steve Horvath (2005) "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17.
Margolin AA et al. (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7, S7.
Basso K et al. (2005) Reverse engineering of regulatory networks in human B cells. Nat. Genet. 37, 382-390.
They require human intervention; PCIT is completely data-driven.
PCIT does not force the emergence of a scale-free topology.
[Figure: experimental design with Piedmontese and Wagyu samples, pre-natal and post-natal stages, and diets.]
A/ Compute the normalized mean expression of 6,603 genes across 26 experimental conditions to generate up to six interlaced gene networks using PCIT:
1. Overall: across the 26 conditions.
2. Piedmontese: across the 10 conditions of Piedmontese samples (circles).
3. Wagyu: across the 10 conditions of Wagyu samples (squares).
4. Pre-natal: across the 8 conditions of pre-natal samples (blue figures).
5. Post-natal: across the 12 conditions of post-natal samples (green figures).
6. Diets: across the 6 conditions of the diet experiment (orange triangles).
B/ Select edges (connections) conserved across the 6 networks → the “Always Conserved” network.
C/ Load the Always Conserved network into Cytoscape and display it with the organic layout algorithm, incorporating a number of gene attributes (including regulation, pathway, etc.) into the visualization schema to help distinguish “highly conserved modules”. Whenever possible, the genomic functionality of each module was labelled (e.g., cell cycle, mitochondria, ECM, Z-disk, …).
D/ The average absolute co-expression correlation of each transcriptional regulator with the genes in each module was computed to identify the key regulators.
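Steps B and D can be sketched as follows, with hypothetical gene names, edge lists and correlations rather than the study's data: conserved edges are those present in every network (edges treated as undirected), and candidate regulators of a module are ranked by their average |r| with its members.

```python
def always_conserved(networks):
    """Edges present in every network; edges are undirected, so (A, B) == (B, A)."""
    edge_sets = [{frozenset(edge) for edge in net} for net in networks]
    return set.intersection(*edge_sets)


def mean_abs_r(r, tf, module):
    """Average absolute correlation between a regulator and the module members (excluding itself)."""
    members = [gene for gene in module if gene != tf]
    return sum(abs(r[tf][gene]) for gene in members) / len(members)


def rank_regulators(r, tfs, module):
    """Candidate key regulators, highest average |r| first."""
    return sorted(tfs, key=lambda tf: mean_abs_r(r, tf, module), reverse=True)


# Hypothetical toy data: three networks and correlations of two TFs with module genes.
nets = [
    [("A", "B"), ("B", "C"), ("C", "D")],
    [("B", "A"), ("C", "B")],
    [("A", "B"), ("B", "C"), ("A", "D")],
]
conserved = always_conserved(nets)
print(sorted(sorted(edge) for edge in conserved))  # [['A', 'B'], ['B', 'C']]

module = ["A", "B", "C"]
r = {"TF1": {"A": 0.9, "B": -0.8, "C": 0.7},
     "TF2": {"A": 0.2, "B": 0.1, "C": -0.3}}
print(rank_regulators(r, ["TF2", "TF1"], module))  # ['TF1', 'TF2']
```

Using |r| rather than r lets a regulator that represses a module (negative correlations) rank as highly as one that activates it.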
The 6,603 genes (including 912 transcription factors) were selected as those showing ‘good’ signals throughout samples and being at least ‘minimally’ variable.
The Always Conserved network had 3,506 genes and 6,506 connections. Up to 66 modules were identified.
Catabolic
Cell cycle: determines fibre number → double-muscling
Translation: protein synthesis → double-muscling
Extracellular matrix: connective tissue → meat quality
Mitochondria: oxidative capacity → feed efficiency
Identification of “Key” Regulators
• TF included in each PCIT-derived module
• TF with large average |r| with module members
RNF14 Experimental Validation: transfection of C2C12 cells
1. Designed primers to amplify the mouse RNF14 transcript. PCR using a mouse muscle cDNA template amplified two bands → two transcript variants (one short, one long).
2. Both variants were engineered into an “expression construct” with a CMV promoter, which mammalian cells recognise as a cue to drive expression.
4.3 Cytoscape (www.cytoscape.org)
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au
THE END