Evolution & Design Principles in Biology: a consequence of evolution and natural selection
Rui Alves
University of Lleida
[email protected]
Course Website: http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/
Feb 26, 2016
Part I: Molecular Evolution
Theory of Evolution
• Evolution is the theory that allows us to understand how organisms came to be how they are.
• In probabilistic terms, it is likely that all living beings today originated from a single type of cell.
• These cells divided and occupied ecological niches, where they adapted to the new environments through natural selection.
How did the first cell create different cells?
• Neutral mutation (e.g. by error in genome replication)
• Deleterious mutation (e.g. by error in genome replication)
• Advantageous mutation (e.g. by error in genome replication)
And then there was sex…
Why Sex???
• Asexual reproduction is quicker and easier, with more offspring per individual.
• Sex may limit harmful mutations
– Asexual: all offspring get all mutations
– Sexual: random distribution of mutations. Those with the most harmful ones tend not to reproduce.
• Generate beneficial gene combinations
– Adaptation to changing environment
– Adaptation to all aspects of constant environment
– Can separate beneficial mutations from harmful ones
– Sample a larger space of gene combinations
What drives cells to adapt?
• New niche / new conditions in old niche
• New (better adapted) mutation
How do New Genes and Proteins appear?
• Genes (proteins) are built by combining domains.
• New proteins may appear either by intradomain mutation or by combining existing domains of other proteins.
[Figure: successive cell divisions]
The Coalescent
• This model of cellular evolution has implications for molecular evolution.
• Coalescent theory: a retrospective model of population genetics that traces all alleles of a gene in a sample from a population to a single ancestral copy shared by all members of the population, known as the most recent common ancestor.
Why is the coalescent the de facto standard today?
Alternatives?
• Current sequences have evolved from the same original sequence (coalescent).
• Current sequences have converged to a similar sequence from multiple origins of life.
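Back-of-the-envelope support for ?
ACDEFGHIKLMNPQRSTVWY 20
A EDYAHIKLMNPQRGTVWY 20

Per-position probabilities:
• $AA_i \to AA_k$ (the amino acid changes): $\log[p_1] \ll 0$
• $AA_k \to AA_k$ (the amino acid is unchanged): $\log[p_2] \approx 0$
• Hence $p_1 \ll p_2$, and $p_{tot}$ is the product of the per-position probabilities.

Convergence: $p_{tot} = p_1^{14}\,p_2^{6} \approx p_1^{14}$
Divergence: $p_{tot} = p_2^{14}\,p_1^{6}$

Which is more likely?
$\dfrac{\text{Convergence}}{\text{Divergence}} \ll 1$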
Back-of-the-envelope support for divergence
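The comparison can be checked numerically. The sketch below assumes the convergence scenario has probability p1^14 · p2^6 and the divergence scenario p2^14 · p1^6, where p1 is the per-position probability of ending up with a specific different amino acid and p2 the probability of keeping the same one. The values p1 = 0.05 and p2 = 0.95 are illustrative assumptions, not numbers from the slides.

```python
import math


def log10_probability(n_p1: int, n_p2: int, p1: float, p2: float) -> float:
    """Return log10(p1**n_p1 * p2**n_p2), computed in log space."""
    return n_p1 * math.log10(p1) + n_p2 * math.log10(p2)


# Assumed, illustrative per-position probabilities (p1 << p2).
p1, p2 = 0.05, 0.95

log_convergence = log10_probability(14, 6, p1, p2)  # p1^14 * p2^6
log_divergence = log10_probability(6, 14, p1, p2)   # p2^14 * p1^6

# The ratio convergence / divergence simplifies to (p1 / p2)^8.
log_ratio = log_convergence - log_divergence

print(f"log10 P(convergence) = {log_convergence:.2f}")
print(f"log10 P(divergence)  = {log_divergence:.2f}")
print(f"log10 ratio          = {log_ratio:.2f}")
```

Any choice with p1 much smaller than p2 gives a strongly negative log ratio, which is the sense in which divergence from a common ancestor is the more likely explanation.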
About the mutational process
Point mutations:
• Transitions (A↔G, C↔T) are more frequent than transversions (all other substitutions).
• In mammals, the CpG dinucleotide is frequently mutated to TG or CA (possibly related to the fact that most CpG dinucleotides are methylated at the C residues).
• Microsatellites frequently increase or decrease in size (possibly due to polymerase slippage during replication).

Gene and genome duplications (complete or partial) may lead to:
• pseudogenes: function-less copies of genes which rapidly accumulate (mostly deleterious) mutations; useful for estimating mutation rates!
• new genes after functional diversification

Chromosomal rearrangements (inversions and translocations) may lead to:
• meiotic incompatibilities, speciation

Estimated mutation rates:
• Human nuclear DNA: 3-5×10^-9 per year
• Human mitochondrial DNA: 3-5×10^-8 per year
• RNA and retroviruses: ~10^-2 per year
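The transition/transversion distinction is easy to make concrete. The following is a minimal sketch that classifies each difference between two aligned DNA sequences; the function names and the example sequences are hypothetical and chosen only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as identical, transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError(f"unexpected base pair: {ref!r}, {alt!r}")
    if ref == alt:
        return "identical"
    # A<->G stays within purines, C<->T stays within pyrimidines.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq_a: str, seq_b: str) -> dict:
    """Count transitions and transversions between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        kind = classify_substitution(a, b)
        if kind != "identical":
            counts[kind] += 1
    return counts


# Hypothetical example: A->G is a transition, G->T is a transversion.
example_counts = count_substitutions("ACGT", "GCTT")
print(example_counts)
```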
Consequences of the coalescent model?
So what if we accept the coalescent model?
A1   TSRISEIRR
A2   TSRISEIRR
A3   TSRISEIRR
A4   TSRISEIRR
A5   TSRISEIRR
A6   TSRISEIRR
A7   PSRISEIRR
A8   PKRISEVRR
A9   PKRISEVRR
A10  PQRISAIQR
A11  PQRISAIQR
A12  PQRISTIQR
A13  PQRISTIQR
A14  ASHLHNLQR
A15  TKHLQELQRE
A16  TKHLQELQRE
A17  TKHLQELQRE
A18  SKHLHELQRD
A19  PKNLHELQKD
A20  SKRLHEVQSE

Collapsing identical sequences:

A1-6    TSRISEIRR
A7      PSRISEIRR
A8-9    PKRISEVRR
A10-11  PQRISAIQR
A12-13  PQRISTIQR
A14     ASHLHNLQR
A15-17  TKHLQELQR
A18     SKHLHELQR
A19     PKNLHELQK
A20     SKRLHEVQS
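Collapsing identical sequences into groups can be sketched in a few lines. The example uses the first nine alignment columns of A1 to A20 as they appear on the slide; the function name `collapse_identical` is hypothetical.

```python
# First nine alignment columns of the twenty sequences from the slide.
ALIGNMENT = """
A1 TSRISEIRR
A2 TSRISEIRR
A3 TSRISEIRR
A4 TSRISEIRR
A5 TSRISEIRR
A6 TSRISEIRR
A7 PSRISEIRR
A8 PKRISEVRR
A9 PKRISEVRR
A10 PQRISAIQR
A11 PQRISAIQR
A12 PQRISTIQR
A13 PQRISTIQR
A14 ASHLHNLQR
A15 TKHLQELQR
A16 TKHLQELQR
A17 TKHLQELQR
A18 SKHLHELQR
A19 PKNLHELQK
A20 SKRLHEVQS
"""


def collapse_identical(alignment_text: str) -> dict:
    """Map each distinct sequence to the list of labels that carry it."""
    groups = {}
    for line in alignment_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        label, sequence = line.split()
        groups.setdefault(sequence, []).append(label)
    return groups


groups = collapse_identical(ALIGNMENT)
for sequence, labels in groups.items():
    print(",".join(labels), sequence)
```

The twenty sequences reduce to ten distinct ones, matching the grouped table (A1-6, A7, A8-9, and so on).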
Grouping the most similar sequences under inferred ancestors:
• A1-6 and A7 → A'1-7
• A10-11 and A12-13 → A'10-13

A'1-7    (p-t)SRISEIRR
A8-9     PKRISEVRR
A'10-13  PQRIS(a-t)IQR
A14      ASHLHNLQR
A15-17   TKHLQELQR
A18      SKHLHELQR
A19      PKNLHELQK
A20      SKRLHEVQS
4 3324 5 323
The study of sequence alignments can give information about the evolution of the different organisms!
Phylogenetic tree reconstruction, overview
Computational challenge: There is an enormous number of different topologies even for a relatively small number of sequences:
• 3 sequences: 1
• 4 sequences: 3
• 5 sequences: 15
• 10 sequences: 2,027,025
• 20 sequences: 221,643,095,476,699,771,875

Consequence: Most tree construction algorithms are heuristic methods not guaranteed to find the optimal topology.

Input data for two major classes of algorithms:
1. Distance matrix as input. Examples: UPGMA, neighbor-joining.
2. Multiple alignment as input. Examples: parsimony, maximum likelihood.

Distance matrix methods use distances computed from pairwise or multiple alignments as input.
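The topology counts listed for 3, 4, 5, 10 and 20 sequences agree with the standard count of unrooted, fully bifurcating trees, (2n − 5)!! = 3 · 5 · … · (2n − 5). That this is the formula behind the slide is an assumption; the sketch below simply checks that it reproduces the listed numbers.

```python
def count_unrooted_topologies(n_sequences: int) -> int:
    """Number of unrooted bifurcating tree topologies: (2n - 5)!! for n >= 3."""
    if n_sequences < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for odd in range(3, 2 * n_sequences - 5 + 1, 2):
        count *= odd
    return count


for n in (3, 4, 5, 10, 20):
    print(f"{n:2d} sequences: {count_unrooted_topologies(n):,}")
```

Python integers have arbitrary precision, so the value for 20 sequences is computed exactly.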
Building phylogenetic trees of proteins
[Figure: Genome 1, Genome 2, Genome 3, … each containing Protein A, Protein B, Protein C and Protein D]
Distance based phylogenetic trees
A1  ACTDEEGGGGSRGHI…
A2  A-TEEDGGAASRGHI…
A3  ACFDDEGGGGSRGHL…
…
[Figure: pairwise distances between A1, A2 and A3 (5, 3 and 8 substitutions) and the resulting tree, with branch lengths 3 and 5]
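The first step of a distance-based method can be sketched as follows: count mismatches for every pair of aligned sequences and find the closest pair, which clustering methods such as UPGMA would join first. The labels A1 to A3 are assigned to the three sequences in their order of appearance, which is an assumption; only the 15 columns visible on the slide are used, and a gap is counted as a mismatch.

```python
from itertools import combinations

# The visible part of the alignment from the slide (the trailing "…" is omitted).
MSA = {
    "A1": "ACTDEEGGGGSRGHI",
    "A2": "A-TEEDGGAASRGHI",
    "A3": "ACFDDEGGGGSRGHL",
}


def mismatches(seq_a: str, seq_b: str) -> int:
    """Number of alignment columns where the two sequences differ (gaps count)."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def distance_matrix(msa: dict) -> dict:
    """Pairwise mismatch counts, keyed by (label, label) with labels sorted."""
    return {
        (x, y): mismatches(msa[x], msa[y])
        for x, y in combinations(sorted(msa), 2)
    }


distances = distance_matrix(MSA)
closest = min(distances, key=distances.get)

for pair, d in distances.items():
    print(pair, d)
print("closest pair:", closest)
```

With this labeling the three distances are 5, 3 and 8 substitutions, the same values shown in the figure.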
Maximum likelihood phylogenetic trees
Alignment:
ACTDEEGGGGSRGHI…
A-TEEDGGAASRGHI…
ACFDDEGGGGSRGHL…
…

Probability of aa substitution:

     A       -       E       D       …
A    1       0.01    0.2     0.09    …
-    0.01    1       0.0001  0.0001  …
E    0.2     0.0001  1       0.5
D    0.09    0.0001  0.5     1
…
[Figure: for the same alignment of A1, A2 and A3 (5, 3 and 8 substitutions), pairwise probabilities p(1,2), p(1,3) and p(2,3) are computed; with p(2,3)>p(1,2)>p(1,3), alternative trees for A1, A2 and A3 are compared]
Statistical evaluation of trees: bootstrapping
[Figure: tree with leaves 1–5 and internal nodes 6–8]

Motivation: Some branching patterns in a tree may be uncertain for statistical reasons (short sequences, small number of mutational events).

Goal of bootstrapping: To assess the statistical robustness of each edge of the tree.

Note that each edge divides the leaf nodes into two subsets. For instance, edge 7–8 divides the leaves into subsets {1,2,3} and {4,5}. However, is this short edge statistically robust?

Method: Try to generate trees from resampled versions of the input data as follows:
• Randomly modify the input MSA by eliminating some columns and replacing them with existing ones. This results in duplication of columns.
• Compute a tree for each modified input MSA.
• For each edge of the tree derived from the real MSA, determine the fraction of trees derived from modified MSAs which contain an edge that divides the leaves into the same subsets. This fraction is called the bootstrap value. Edges with low bootstrap values (e.g. <0.9) are considered unreliable.
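The resampling procedure can be sketched on a toy case. With only three sequences there is no internal edge to test, so this sketch uses a simplified stand-in for an edge: the pair of sequences that a distance method would join first. It resamples alignment columns with replacement and reports how often the same pair is still the closest. The sequences are the visible columns of the three-sequence alignment used earlier in the deck; the labels, the number of replicates and the seed are assumptions.

```python
import random
from itertools import combinations

MSA = {
    "A1": "ACTDEEGGGGSRGHI",
    "A2": "A-TEEDGGAASRGHI",
    "A3": "ACFDDEGGGGSRGHL",
}


def mismatches(seq_a: str, seq_b: str) -> int:
    """Number of columns where two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_pair(msa: dict) -> tuple:
    """Pair of labels with the fewest mismatches (ties go to the first pair)."""
    pairs = combinations(sorted(msa), 2)
    return min(pairs, key=lambda p: mismatches(msa[p[0]], msa[p[1]]))


def resample_columns(msa: dict, rng: random.Random) -> dict:
    """Draw alignment columns with replacement, keeping the alignment length."""
    length = len(next(iter(msa.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


def bootstrap_support(msa: dict, n_replicates: int = 1000, seed: int = 0):
    """Fraction of resampled alignments that recover the original closest pair."""
    rng = random.Random(seed)
    reference = closest_pair(msa)
    hits = sum(
        closest_pair(resample_columns(msa, rng)) == reference
        for _ in range(n_replicates)
    )
    return reference, hits / n_replicates


reference, support = bootstrap_support(MSA)
print(f"grouping {reference} recovered in {support:.1%} of replicates")
```

For a real tree the same loop would rebuild the full tree from each resampled alignment and compare the leaf bipartitions edge by edge.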
Other Trees
• Use genomes
• Use enzymomes
• Use whatever group of molecules is important for a given function
Part II: Design principles
Outline
• What are design principles
• How to study design principles
• Examples
What are design principles?
• Recurrent qualitative or quantitative rules that are observed in similar types of systems as a solution to a given functional problem
• Exist at different levels:
– Nuclear targeting sequences
– Operon (Gene 1, Gene 2, Gene 3)
How can design principles emerge in molecular biology?
• Intelligent design? Not a scientific hypothesis; off the table.
• Evolution? Makes sense, but how could such regularities emerge?
Climbing down mount improbable
• Over time, edged stones would accumulate on the slope.
• Smooth, round stones accumulate at the bottom.
Design principles:
– Smooth, roundish rocks roll down the mountain.
– Edged, flat rocks don't.
Design principles in molecular biology
• Similarly, if a topology or set of parameters has appeared through mutation and it can be shown to create a molecular network that functionally outperforms all other possible alternatives in a given set of conditions, one can talk about a design principle for the system under those conditions.
[sensu engineering]
Index of talk
• How to identify design principles
• Design principles in:
– Gene expression
– Metabolic networks
– Signal transduction
– Development
• Design principles, what are they good for?
• Summary
First step, define the alternatives
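[Figure: a gene controlled by a positive (+) regulator versus a gene controlled by a negative (−) regulator]
[Figure: two alternative designs of the pathway X0, X1, X2, X3]
[Figure: X3 plotted against time t]
How strong should the feedback be?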
Then, create models for each alternative
[Figure: the alternatives defined in the first step: gene with a positive (+) or negative (−) regulator; pathway X0, X1, X2, X3]

Finally:
• Compare the dynamic behavior of the models for the two or more alternatives with respect to physiologically relevant criteria.
The demand theory for gene expression
• Are there situations where positive regulation of gene expression outperforms negative regulation of gene expression and vice versa?
[Figure: a gene controlled by a positive (+) regulator versus a gene controlled by a negative (−) regulator]
Regulating gene expression has principles
• Positive regulator:
– More effective when gene product is in demand for a large fraction of the life cycle.
– Less noise sensitive if signal is low.
• Negative regulator:
– More effective when gene product is in demand for a small fraction of the life cycle.
– Less noise sensitive if signal is high.
Genetics 149:1665; PNAS 103:3999; PNAS 104:7151; Nature 405:590
Negative overall feedback is a design principle in metabolic biosynthesis
[Figure: pathway X0, X1, X2, X3 with negative overall feedback]
• Negative overall feedback:
– More effective in coupling production to demand.
– More robust to fluctuations.
Bioinformatics 16:786; Biophysical J. 79:2290
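The coupling claim can be illustrated with a toy simulation. The model below is a hypothetical sketch, not the model used in the cited papers: a linear pathway X0 → X1 → X2 → X3 with first-order steps, a demand term that consumes X3, and an optional Hill-type inhibition of the first step by X3. All rate constants, the Hill exponent and the Euler step size are assumptions chosen for illustration.

```python
def steady_state_x3(demand: float, feedback: bool, x0: float = 10.0,
                    k_inhibition: float = 1.0, hill: int = 4,
                    dt: float = 0.01, t_end: float = 200.0) -> float:
    """Integrate the toy pathway with forward Euler and return the final X3."""
    x1 = x2 = x3 = 0.0
    for _ in range(int(t_end / dt)):
        # End-product inhibition of the first step (only if feedback is on).
        inhibition = 1.0 / (1.0 + (x3 / k_inhibition) ** hill) if feedback else 1.0
        v_in = x0 * inhibition
        dx1 = v_in - x1            # X0 -> X1 -> X2
        dx2 = x1 - x2              # X1 -> X2 -> X3
        dx3 = x2 - demand * x3     # X2 -> X3 -> consumed by demand
        x1 += dt * dx1
        x2 += dt * dx2
        x3 += dt * dx3
    return x3


def relative_drop(feedback: bool) -> float:
    """Fractional fall in steady-state X3 when demand doubles from 1 to 2."""
    low = steady_state_x3(demand=1.0, feedback=feedback)
    high = steady_state_x3(demand=2.0, feedback=feedback)
    return (low - high) / low


drop_without = relative_drop(feedback=False)
drop_with = relative_drop(feedback=True)
print(f"X3 drop when demand doubles, no feedback:   {drop_without:.0%}")
print(f"X3 drop when demand doubles, with feedback: {drop_with:.0%}")
```

In this sketch the pathway without feedback keeps producing at a fixed rate, so X3 falls in proportion to demand, whereas with feedback the input flux rises as X3 falls and the end product is buffered.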
Bifunctional sensors can be a design principle in signal transduction
• Bifunctional sensor:
– Performs best against cross talk
• Independent deactivator:
– Better integrator of signals
Mol. Microbiol. 48:25; Mol. Microbiol. 68:1196
[Figure: Signal → Sensor → Effector → Effect, shown for a bifunctional sensor and for a sensor with an independent Deactivator]
Design principles in development
[Figure: gene with a positive (+) or negative (−) regulator and a positive (+) or negative (−) signal, for four cases: high demand, low signal; high demand, high signal; low demand, high signal; low demand, low signal]
Genetics 149:1665; PNAS 103:3999; PNAS 104:7151;Nature 405: 590
Biological design principles help explain why biology works as it does
• Biological design principles may connect molecular determinants to functional effectiveness.
[Figure: heat shock. Panels: expression of important genes vs. time; growth rate vs. time]
BMC Bioinformatics 7:184
Underlying assumption
• Evolution of molecular networks can be treated as modules.
• Work in the group of Uri Alon suggests that
– Networks evolving to meet simultaneous goals evolve in a modular fashion
– Networks evolving to meet a single goal evolve globally
• Modularity seems like a reasonable first assumption
PNAS 102:13773; PLoS Comput Biol 4:e1000206; BMC Evol Biol 7:169
The good news about function
• Sometimes, you get stuff for free!!!
• For example:
– Networks that are responsive to signals may have inbuilt buffering of noise, just because they are responsive.
– Functions that are associated with marginally stable proteins are favored because, given the large dimensions of sequence space, most randomly selected sequences have a structure that is marginally stable.
PNAS 100:14463; PNAS 103:6435; Proteins 46:105
How can biological design principles be applied?
• Design of molecular circuits with specific behaviors!!
– Stable systems
– Unstable systems
– Oscillations
– Bistable systems
Cell 113: 597; PLoS Comput Biol. 5:e1000319; PNAS 106: 6435
Summary
• Design principles can be found in molecular networks.
• Such principles can sometimes be connected to selection for functional effectiveness.
• Even in the absence of such a connection, if they are valid they can be used to build biological circuits with specific behaviors.