Natalio Krasnogor
ASAP - Interdisciplinary Optimisation Laboratory, School of Computer Science
Centre for Integrative Systems Biology, School of Biosciences
Centre for Healthcare Associated Infections, Institute of Infection, Immunity & Inflammation
University of Nottingham
Synthetic Biology
Copyright is held by the author/owner(s). GECCO’10, July 7–11, 2010, Portland, Oregon, USA. ACM 978-1-4503-0073-5/10/07.
These slides were used for a tutorial I gave at GECCO 2010. These are similar, yet not identical, to the other tutorials. The keynote file is too large for slideshare but if anybody needs the original I would be happy to provide a url from where to download it.
Outline
•Essential Systems Biology
•Synthetic Biology
•Computational Modeling for Synthetic Biology
•Automated Model Synthesis and Optimisation
•A Note on Ethical, Social and Legal Issues
•Conclusions
L. Cronin, N. Krasnogor, B. G. Davis, C. Alexander, N. Robertson, J.H.G. Steinke, S.L.M. Schroeder, A.N. Khlobystov, G. Cooper, P. Gardner, P. Siepmann, and B. Whitaker. The imitation game - a computational chemical approach to recognizing life. Nature Biotechnology, 24:1203-1206, 2006.
The Basic Unit: A Gene’s Transcription Regulation Mechanics
[Figure: RNA polymerase binds the promoter of gene Y on the DNA; transcription yields mRNA and translation yields protein Y. With an activator, X (activated by a signal) binds the X binding site at the promoter and increases transcription and translation of protein Y. With a repressor, bound X gives no transcription, while unbound repressor X allows transcription and translation of protein Y.]
Network Motifs: Evolution’s Preferred Circuits
•Biological networks are complex and vast
•To understand their functionality in a scalable way one must choose the correct abstraction
•Moreover, these patterns are organised in non-trivial/non-random hierarchies
•Each network motif carries out a specific information-processing function
“Patterns that occur in the real network significantly more often than in randomized networks are called network motifs” Shai S. Shen-Orr et al., Network motifs in the transcriptional regulation
Radu Dobrin et al., Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics. 2004; 5: 10.
$\frac{dx}{dt} = \beta - \alpha x$
$x_{st} = \frac{\beta}{\alpha}$
$x(t) = x_{st} e^{-\alpha t}$
$t_{1/2} = \frac{\log 2}{\alpha}$
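The simple-regulation dynamics (dx/dt = β − αx, steady state x_st = β/α, response time t_1/2 = log 2/α) can be checked numerically. The sketch below is only an illustration: the values of β and α are arbitrary assumptions, not taken from the slides, and the function names are hypothetical. It integrates the equation from x = 0 with forward Euler and compares the time needed to reach half of x_st with log 2/α.

```python
import math

def simulate_simple_regulation(beta, alpha, t_end, dt=1e-3):
    """Forward-Euler integration of dx/dt = beta - alpha * x, starting at x = 0."""
    x, t = 0.0, 0.0
    trajectory = [(t, x)]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        x += dt * (beta - alpha * x)
        t += dt
        trajectory.append((t, x))
    return trajectory

def response_time(trajectory, x_st):
    """First time at which x reaches half of its steady-state level."""
    for t, x in trajectory:
        if x >= 0.5 * x_st:
            return t
    return None

beta, alpha = 2.0, 0.5          # illustrative values only (assumption)
x_st = beta / alpha             # steady state
traj = simulate_simple_regulation(beta, alpha, t_end=20.0)
t_half_numeric = response_time(traj, x_st)
t_half_analytic = math.log(2) / alpha
```

Note that the response time depends only on α, the degradation/dilution rate, and not on the production rate β.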
[Figure: simple regulation (Y positively regulates X), negative autoregulation and positive autoregulation, with their response dynamics]
U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics (2007) vol. 8 (6) pp. 450-461
Shai S. Shen-Orr et al., Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 64 - 68 (2002)
A feed-forward loop (FFL) consists of a general transcription factor (X) regulating a second TF, called the specific TF (Y), such that both regulate the effector operon Z.
In a coherent FFL, the direct effect of the general transcription factor (X) on the effector operon has the same sign (+/-) as the net indirect effect through Y.
If the direct effect of X on Z has the opposite sign to the net indirect effect through Y, the loop is an incoherent FFL.
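The coherent/incoherent distinction reduces to comparing two signs. A minimal sketch, assuming regulation is encoded as +1 (activation) and -1 (repression); the function names are hypothetical:

```python
def net_sign(*signs):
    """Multiply regulation signs (+1 activation, -1 repression) along a path."""
    result = 1
    for s in signs:
        if s not in (+1, -1):
            raise ValueError("signs must be +1 or -1")
        result *= s
    return result

def classify_ffl(x_to_z, x_to_y, y_to_z):
    """Compare the direct X->Z sign with the net sign of the path X->Y->Z."""
    direct = x_to_z
    indirect = net_sign(x_to_y, y_to_z)
    return "coherent" if direct == indirect else "incoherent"

c1 = classify_ffl(+1, +1, +1)   # all three edges are activations
i1 = classify_ffl(+1, +1, -1)   # X activates Y and Z, Y represses Z
```

With this encoding the all-activation loop comes out coherent, and the loop in which Y represses Z comes out incoherent.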
Most common in E. coli & S. cerevisiae
The C1-FFL is a ‘sign-sensitive delay’ element and a persistence detector.
The I1-FFL is a pulse generator and response accelerator
U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics (2007) vol. 8 (6) pp. 450-461
The C1-FFL is a ‘sign-sensitive delay’ element and a persistence detector.
If the integration function is “OR” (rather than “AND”), the C1-FFL has no delay after stimulation by Sx but, instead, manifests the delay when the stimulation stops.
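The ON delay of a C1-FFL with an AND input function can be illustrated with a small simulation. This is a sketch under the common textbook simplification of step-like (logic) input functions: while X is ON, Y obeys dY/dt = β_y − α_y·Y, and Z is produced only once Y exceeds a threshold. The parameter names and values are assumptions for illustration. Under these assumptions the delay should match (1/α_y)·log(1/(1 − K_yz/Y_st)).

```python
import math

def c1_ffl_and_on_delay(beta_y, alpha_y, k_yz, dt=1e-4, t_max=50.0):
    """Time after X switches ON at which Z production starts (AND gate).

    Y obeys dY/dt = beta_y - alpha_y * Y while X is ON; Z is produced only
    when X is ON and Y exceeds the threshold k_yz.
    """
    y, t = 0.0, 0.0
    while t < t_max:
        if y >= k_yz:
            return t
        y += dt * (beta_y - alpha_y * y)
        t += dt
    return None  # threshold never reached (k_yz is above the steady state of Y)

beta_y, alpha_y, k_yz = 1.0, 1.0, 0.5   # illustrative values only (assumption)
delay_numeric = c1_ffl_and_on_delay(beta_y, alpha_y, k_yz)
y_st = beta_y / alpha_y
delay_analytic = math.log(1.0 / (1.0 - k_yz / y_st)) / alpha_y
```

With an AND gate, Z production stops as soon as X switches OFF, so the delay appears only for ON steps; this asymmetry is what "sign-sensitive" refers to.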
U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics (2007) vol. 8 (6) pp. 450-461
The I1-FFL is a pulse generator and response accelerator
U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics (2007) vol. 8 (6) pp. 450-461
Shai S. Shen-Orr et al., Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 64 - 68 (2002)
SIM is defined by one TF controlling a set of operons, with the same signs and no additional control.
TFs in SIMs are mostly negative autoregulatory (70% in E. coli)
U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics (2007) vol. 8 (6) pp. 450-461
As the activity of the master regulator X changes in time, it crosses the different activation thresholds of the genes in the SIM at different times, thus prioritizing the activation of the operons.
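The temporal ordering produced by a SIM can be sketched by computing when a rising regulator crosses each gene's threshold. The example assumes X follows X(t) = x_st·(1 − e^(−αt)); the gene names and threshold values are hypothetical.

```python
import math

def activation_times(thresholds, x_st, alpha):
    """Time at which X(t) = x_st * (1 - exp(-alpha * t)) crosses each threshold."""
    times = {}
    for gene, k in thresholds.items():
        if k < x_st:
            times[gene] = -math.log(1.0 - k / x_st) / alpha
        else:
            times[gene] = math.inf   # threshold is never reached
    return times

thresholds = {"Z1": 0.2, "Z2": 0.5, "Z3": 0.8}   # hypothetical genes and thresholds
times = activation_times(thresholds, x_st=1.0, alpha=1.0)
order = sorted(times, key=times.get)             # genes in order of activation
```

Genes with lower thresholds switch on first, so the threshold values alone determine the activation programme.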
U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics (2007) vol. 8 (6) pp. 450-461
Shai S. Shen-Orr et al., Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 64 - 68 (2002)
DORs are layers of dense sets of TFs affecting multiple operons.
To understand the specific function of these “gate-arrays” one needs to know the input functions (AND/OR) for each output gene. This data is not currently available in most cases.
Shai S. Shen-Orr et al., Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 64 - 68 (2002)
•The correct abstractions facilitate understanding in complex systems.
•They provide a route to engineering & programming cells.
Outline
•Essential Systems Biology
•Synthetic Biology
•Computational Modeling for Synthetic Biology
•Automated Model Synthesis and Optimisation
•Conclusions
What is Synthetic Biology
Synthetic Biology is
A) the design and construction of new biological parts, devices, and systems, and
B) the re-design of existing, natural biological systems for useful purposes.
Synthetic Biology
• Aims at designing, constructing and developing artificial biological systems
• Offers new routes to ‘genetically modified’ organisms, synthetic living entities, smart drugs and hybrid computational-biological devices.
Synthetic Biology's basic assumption:
• Methods readily used to build non-biological systems could also be used to specify, design, implement, verify, test and deploy novel synthetic biosystems.
• These methods come from computer science, engineering and maths.
• Modeling and optimisation run through all of the above.
Synthetic & Systems Biology: Sister Disciplines
[Axis labels: Systems Biology, Synthetic Biology]
Basic goal: to clarify current understandings by formalising what the constitutive elements of a system are and how they interact
Intermediate goal: to test current understandings against experimental data
Advanced goal: to predict beyond current understanding and available data
Dream goal: (1) to combinatorially combine in silico well-understood components/models for the design and generation of novel experiments and hypotheses and, ultimately, (2) to design, program, optimise & control (new) biological systems
Top-Down Synthetic Biology: An Approach to Engineering Biology
Cells are information processors. DNA is their programming language.
DNA sequencing and PCR: Identification and isolation of cellular parts.
Recombinant DNA and DNA synthesis: Combination of DNA and construction of new systems.
Tools to make biology easier to engineer: Standardisation, encapsulation and abstraction (blueprints).
[Figure labels: E. coli, Vibrio fischeri, Pseudomonas aeruginosa, Discosoma sp., Aequorea victoria; plasmids; DNA synthesis; Circuit Blueprint; Chassis]
Synthetic Biology’s Brick & Mortar (I)
D. Sprinzak & M.B. Elowitz (2005). Reconstruction of genetic circuits, Nature 438:24, 443-448.
Synthetic Biology’s Brick & Mortar (II)
D. Sprinzak & M.B. Elowitz (2005). Reconstruction of genetic circuits, Nature 438:24, 443-448.
Example I: Elowitz & Leibler Repressilator
M.B. Elowitz & S. Leibler (2000). A Synthetic Oscillatory Network of Transcriptional Regulators. Nature, 403:20, 335-338
Example II: Combinatorial Synthetic Logic
C.C. Guet et al., Combinatorial Synthesis of Genetic Networks, Science 296, 1466-1470, 2002
Example III: Push-on/Push-off circuit
C. Lou et al., Synthesizing a novel genetic sequential logic circuit: a push-on push-off switch, Molecular Systems Biology 6; Article number 350
Example IV: Ron Weiss' Pulse Generator
• Two different bacterial strains carrying specific synthetic gene regulatory networks are used.
• The first strain produces a diffusible signal AHL.
• The second strain possesses a synthetic gene regulatory network which produces a pulse of GFP after AHL sensing within a range of values (Band Pass).
S. Basu, R. Mehreja, et al. (2004) Spatiotemporal control of gene expression with pulse generating networks, PNAS, 101, 6355-6360
Outline
•Essential Systems Biology
•Synthetic Biology
•Computational Modeling for Synthetic Biology
•Automated Model Synthesis and Optimisation
•Conclusions
What is modeling?
• Modeling is an attempt at describing, in a precise way, an understanding of the elements of a system of interest, their states and interactions
• A model should be operational, i.e. it should be formal, detailed and “runnable” or “executable”.
•“Feature selection” is the first issue one must confront when building a model
•One starts from a system of interest and must then decide what the model will include or leave out
•That is, at what level the model will be built
The goals of Modelling
•To capture the essential features of a biological entity/phenomenon
•To disambiguate the understanding behind those features and their interactions
•To move from qualitative knowledge towards quantitative knowledge
Modeling in Systems & Synthetic Biology
[Figure labels: Cells, Colonies, Networks]
Systems Biology
• Understanding
• Integration
• Prediction
• Life as it is
Computational modelling to elucidate and characterise modular patterns exhibiting robustness, signal filtering, amplification, adaptation, error correction, etc.
Synthetic Biology
• Control
• Design
• Engineering
• Life as it could be
Computational modelling to engineer and evaluate possible cellular designs exhibiting a desired behaviour by combining well studied and characterised cellular modules
There is potentially a distinction between modeling for Synthetic Biology vs Systems Biology:
•Systems Biology is concerned with Biology as it is
•Synthetic Biology is concerned with Biology as it could be
“Our view of engineering biology focuses on the abstraction and standardization of biological components” by R. Rettberg @ MIT newsbite August 2006.
“Well-characterized components help lower the barriers to modeling. The use of control elements (such as temperature for a temperature-sensitive protein, or an exogenous small molecule affecting a reaction) helps model validation” by Di Ventura et al, Nature, 2006
Co-design of parts and their models improves both and makes them more reliable
Model Development
From [E. Klipp et al, Systems Biology in Practice, 2005]:
1) Formulation of the problem
2) Verification of available information
3) Selection of model structure
4) Establishing a simple model
5) Sensitivity analysis
6) Experimental tests of the model predictions
7) Stating the agreements and divergences between experimental and modelling results
8) Iterative refinement of model
The Challenge of Scales
Within a cell the dissociation constants of DNA/ transcription factor binding to specific/non-specific sites differ by 4-6 orders of magnitude
DNA-protein binding occurs on a 1-10 s time scale, which is very fast in comparison to a cell’s life cycle.
R. Milo, et al., BioNumbers—the database of key numbers in molecular and cell biology. Nucleic Acids
The Challenge of Small Numbers in Cellular Systems
The most commonly recognised sources of noise in cellular systems are low numbers of molecules and slow molecular interactions.
Over 80% of genes in E. coli express fewer than a hundred proteins per cell.
Mesoscopic, discrete and stochastic approaches are more suitable:
• Only relevant molecules are taken into account.
• Focus on the statistics of the molecular interactions and how often they take place.
Mads Kærn et al. Stochasticity in Gene Expression: From Theories to Phenotypes. Nature Reviews Genetics, 6, 451-464 (2005)
Purnananda Guptasarma. Does replication-induced transcription regulate synthesis of the myriad low copy number proteins of E. coli. BioEssays, 17, 11, 987-997
Modelling Approaches
There exist many modeling approaches, each with its advantages and disadvantages.
• Macroscopic, Microscopic and Mesoscopic
• Quantitative and Qualitative
• Discrete and Continuous
• Deterministic and Stochastic
• Top-down or Bottom-up
Set of equations showing relationships between molecular quantities and how they change over time. They are approximated numerically (e.g. Ordinary Differential Equations, PDEs, etc.).
•Operational Semantics Models:
Algorithm (list of instructions) executable by an abstract machine whose computation resembles the behaviour of the system under study (e.g. a Finite State Machine).
Jasmin Fisher and Thomas Henzinger. Executable cell biology. Nature Biotechnology, 25, 11, 1239-1249 (2007)
A. Regev, E. Shapiro. The π-calculus as an abstraction for biomolecular systems. Modelling in Molecular Biology., pages 1–50. Springer Berlin., 2004.
D. Harel, "A Grand Challenge for Computing: Full Reactive Modeling of a Multi-Cellular Animal", Bulletin of the EATCS , European Association for Theoretical Computer Science, no. 81, 2003, pp. 226-235
Tools Suitability and Cost
•From [D.E Goldberg, 2002] (adapted): “Since science and math are in the description business, the model is the thing…The engineer or inventor has much different motives. The engineered object is the thing”
[Figure: error (ε) versus cost of modelling (C), with labels Synthetic Biologist and Computer Scientist/Mathematician]
There are good reasons to think that information processing is a key viewpoint to take when modeling
Life as we know it:
• is coded in discrete units (DNA, RNA, Proteins)
• combinatorially assembles interactions (DNA-RNA, DNA-Proteins, RNA-Proteins, etc) through evolution and self-organisation
• emerges from these interacting parts
Information is:
• transported in time (heredity, memory e.g. neural, immune system, etc)
• transported in space (molecular transport processes, channels, pumps, etc)
Each of these is a computational process:
• Transport in time = storage/memory
• Transport in space = communication
• Signal Transduction = processing
Computer Science Contributions
Methodologies designed to cope with:
• Languages to cope with complex, concurrent, systems of parts:
  • π-calculus
  • Process Calculi
  • P Systems
• Tools to analyse and optimise:
  • EA, ML
  • Model Checking
J. Twycross, L.R. Band, M.J. Bennett, J.R. King, and N. Krasnogor. Stochastic and deterministic multiscale models for systems biology: an auxin-transport case study. BMC Systems Biology, 4:34, March 2010
InfoBiotics Workbench and Dashboard
www.infobiotics.net
Molecular Species
A molecular species can be represented using individual objects.
A molecular species with relevant internal structure can be represented using a string.
Molecular Interactions
Comprehensive and relevant rule-based schema for the most common molecular interactions taking place in living cells.
• Transformation/Degradation
• Complex Formation and Dissociation
• Diffusion in / out
• Binding and Debinding
• Recruitment and Releasing
• Transcription Factor Binding/Debinding
• Transcription/Translation
Compartments / Cells
Compartments and regions are explicitly specified using membrane structures.
Colonies / Tissues
Colonies and tissues are represented as collections of P systems distributed over a lattice.
Objects can travel around the lattice through translocation rules.
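One way to picture molecular species as objects, compartments as multisets and a colony as a lattice is the sketch below. It is purely illustrative: the data layout and the `translocate` helper are assumptions for this example, not the Infobiotics data model or API.

```python
from collections import Counter

# Each lattice position holds one compartment, modelled as a multiset of objects.
lattice = {
    (0, 0): Counter({"AHL": 5}),
    (0, 1): Counter(),
}

def translocate(lattice, species, source, target, amount=1):
    """Move `amount` copies of `species` from one lattice position to another."""
    if lattice[source][species] < amount:
        raise ValueError("not enough objects to move")
    lattice[source][species] -= amount
    lattice[target][species] += amount

# A hypothetical translocation: two AHL molecules move to the neighbouring cell.
translocate(lattice, "AHL", (0, 0), (0, 1), amount=2)
```

Using a multiset per compartment keeps only molecule counts, which is the level of detail a discrete stochastic simulation needs.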
Molecular Interactions Inside Compartments
Passive Diffusion of Molecules
Signal Sensing and Active Transport
Specification of Transcriptional Regulatory Networks
Post-Transcriptional Processes
For each protein in the system, post-transcriptional processes like translational initiation, messenger and protein degradation, protein dimerisation, signal sensing, signal diffusion etc are represented using modules of rules.
Modules can also take as parameters the stochastic kinetic constants associated with the corresponding rules. This allows us to explore possible mutations in the promoters and ribosome binding sites in order to optimise the behaviour of the system.
Scalability through Modularity
Cellular functions arise from orchestrated interactions between motifs consisting of many interacting molecular species.
A P System model is a set of rules representing molecular interaction motifs that appear in many cellular systems.
Basic P System Modules Used
Characterisation/Encapsulation of Cellular Parts: Gene Promoters
A modeling language for the design of synthetic bacterial colonies.
A module, a set of rules describing the molecular interactions involving a cellular part, provides encapsulation and abstraction. Collections or libraries of reusable cellular parts and reusable models.
[Figure labels: LuxR, AHL, CI]
E. Davidson (2006) The Regulatory Genome, Gene Regulation Networks in Development and Evolution, Elsevier
Characterisation/Encapsulation of Cellular Parts: Degradation Tags
Degradation tags are amino acid sequences recognised by proteases. Once the corresponding DNA sequence is fused to a gene the half life of the protein is reduced considerably.
Infobiotics: An Integrated Framework
http://www.infobiotics.org/infobiotics-workbench/
• Cellular Parts
• Libraries of Modules
• Module Combinations
• Synthetic Circuits
• Single Cells
• Synthetic Multi-cellular Systems
• P systems; LPP systems
• A compiler based on a BNF grammar
• Multi Compartmental Stochastic Simulations based on Gillespie’s algorithm
• Spatio-temporal Dynamics Analysis using Model Checking with PRISM and MC2
• Automatic Design of Synthetic Gene Regulatory Circuits using Evolutionary Algorithms
Stochastic P Systems Are Executable Programs
The virtual machine running these programs is a “Gillespie Algorithm (SSA)”. It generates trajectories of a stochastic system:
• A stochastic constant is associated with each rule.
• A propensity is computed for each rule by multiplying the stochastic constant by the number of distinct possible combinations of the elements on the left hand side of the rule.
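The propensity calculation and one step of the stochastic simulation algorithm can be sketched as follows. This is a minimal illustration rather than the Infobiotics implementation: the rule encoding (a tuple of reactant multiset, product multiset and stochastic constant) and the example rule are assumptions.

```python
import math
import random
from collections import Counter

def propensity(rule, state):
    """Stochastic constant times the number of distinct reactant combinations."""
    reactants, _products, c = rule
    combos = 1
    for species, needed in reactants.items():
        combos *= math.comb(state[species], needed)
    return c * combos

def gillespie_step(rules, state, rng):
    """One SSA step: returns (waiting_time, rule_index), or None if nothing can fire."""
    props = [propensity(r, state) for r in rules]
    total = sum(props)
    if total == 0:
        return None
    tau = rng.expovariate(total)          # exponentially distributed waiting time
    pick = rng.random() * total           # choose rule j with probability a_j / a_0
    for index, a in enumerate(props):
        pick -= a
        if pick < 0:
            return tau, index
    return tau, len(rules) - 1            # guard against floating-point rounding

def apply_rule(rule, state):
    """Consume the left hand side of the rule and add its right hand side."""
    reactants, products, _c = rule
    state.subtract(reactants)
    state.update(products)

# Hypothetical complexation rule a + b -> c with stochastic constant 0.1
rules = [(Counter(a=1, b=1), Counter(c=1), 0.1)]
state = Counter(a=10, b=4)
a0 = propensity(rules[0], state)          # 0.1 * (10 * 4) distinct a-b pairs
```

For a rule needing two copies of the same species, the binomial coefficient counts unordered pairs, e.g. 45 combinations for 10 molecules.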
F. J. Romero-Campero, J. Twycross, M. Camara, M. Bennett, M. Gheorghe, and N. Krasnogor. Modular assembly of cell systems biology models using p systems. International Journal of Foundations of Computer Science, 2009
Multicompartmental Gillespie Algorithm
[Figure: three compartments 1, 2 and 3, each with its own rules r^i_1,…,r^i_{n_i} and multiset M_i. Panels:
1. Local Gillespie: each compartment i produces a triplet (i, τ_i, r^i_0).
2. Sort Compartments: with τ_2 < τ_1 < τ_3 the order is (2, τ_2, r^2_0), (1, τ_1, r^1_0), (3, τ_3, r^3_0).
3. Update Waiting Times: the remaining triplets become (1, τ_1−τ_2, r^1_0) and (3, τ_3−τ_2, r^3_0).
4. Insert new triplet: the new (2, τ_2’, r^2_0) is placed so that τ_1−τ_2 < τ_2’ < τ_3−τ_2.]
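The scheduling loop of the multicompartmental algorithm (local Gillespie, sort, update waiting times, insert new triplet) can be sketched as below. Several simplifications are assumptions made for brevity: each compartment has a fixed total propensity, the chosen rule is a placeholder string, rule application does not affect other compartments, and triplets are stored as (τ, compartment, rule) so that they sort by waiting time.

```python
import bisect
import random

def local_gillespie(compartment_id, total_propensity, rng):
    """Stand-in for a local Gillespie step: returns (tau, id, rule).

    The rule is a placeholder; a real model would select it by propensity.
    """
    tau = rng.expovariate(total_propensity)
    return (tau, compartment_id, "r0")

def run(propensities, n_events, rng):
    # 1. Local Gillespie in every compartment, then sort by waiting time.
    queue = sorted(local_gillespie(i, a, rng) for i, a in propensities.items())
    clock, log = 0.0, []
    for _ in range(n_events):
        # 2. Take the compartment with the smallest waiting time.
        tau, cid, rule = queue.pop(0)
        clock += tau
        log.append((clock, cid, rule))
        # 3. Update the waiting times of the remaining compartments.
        queue = [(t - tau, c, r) for (t, c, r) in queue]
        # 4. Rerun the executed compartment and insert its new triplet in order.
        bisect.insort(queue, local_gillespie(cid, propensities[cid], rng))
    return log

log = run({1: 1.0, 2: 5.0, 3: 0.5}, n_events=20, rng=random.Random(1))
```

Keeping the triplets sorted means that only the compartment that just fired has to be re-simulated at each step.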
An Important Difference with “Normal” Programs
•Executable Stochastic P systems are not programs with stochastic behavior
•A cell is a living example of distributed stochastic computing.
F. J. Romero-Campero, J. Twycross, M. Camara, M. Bennett, M. Gheorghe, and N. Krasnogor. Modular assembly of cell systems biology models using p systems. International Journal of Foundations of Computer Science, 2009
Rapid Model Prototyping
InfoBiotics Workbench and Dashboard
[Figure labels: Specification, Simulation, Analysis]
An example: A Pulse Generator
• Two different bacterial strains carrying specific synthetic gene regulatory networks are used.
• The first strain produces a diffusible signal AHL.
• The second strain possesses a synthetic gene regulatory network which produces a pulse of GFP after AHL sensing within a range of values (Band Pass).
S. Basu, R. Mehreja, et al. (2004) Spatiotemporal control of gene expression with pulse generating networks, PNAS, 101, 6355-6360
Sender Cells
[Figure labels: Pconst, luxI, LuxI, AHL]
SenderCell()=
{
  Pconst({X = luxI },…)
  PostTransc({X=LuxI},{c1=3.2,…})
  Diff({X=AHL},{c=0.1})
}
Pulse Generating Cells
[Figure labels: Pconst and luxR, Plux and cI, PluxOR1 and gfp; LuxR, CI, GFP, AHL]
PulseGenerator()=
{
  Pconst({X=luxR},…)
  PluxOR1({X=gfp},…)
  Plux({X=cI},…)
  …
  …
  Diff({X=AHL},…)
}
Spatial Distribution of Senders and Pulse Generators
Outline
•Brief Introduction to Computational Modeling
•Modeling for Top Down SB
  •Executable Biology
  •A pinch of Model Checking
•Modeling for the Bottom Up SB
  •Dissipative Particle Dynamics
•Automated Model Synthesis and Optimisation
•Conclusions
InfoBiotics Workbench and Dashboard
[Figure labels: Specification, Simulation, Analysis, Optimisation]
Automated Model Synthesis and Optimisation
• Modeling is an intrinsically difficult process
• It involves “feature selection” and disambiguation
• Model Synthesis requires
  • designing the topology or structure of the system in terms of molecular interactions
  • estimating the kinetic parameters associated with each molecular interaction
• All the above iterated
• Once a model has been prototyped, whether derived from existing literature or “ab initio” ➡ use some optimisation method to fine-tune parameters/model structure
• The approach adopts an incremental methodology: starting from very simple P system modules (BioBricks) specifying basic molecular interactions, more complicated modules are produced to model more complex molecular systems.
Large Literature on Model Synthesis
• Mason et al. use a random Local Search (LS) as the mutation to evolve electronic networks with desired dynamics
• Chickarmane et al. use a standard GA to optimize the kinetic parameters of a population of ODE-based reaction networks having the desired topology.
• Spieth et al. propose a memetic algorithm to find gene regulatory networks from experimental DNA microarray data where the network structure is optimized with a GA and the parameters are optimized with an Evolution Strategy (ES).
• Etc
Evolutionary Algorithms for Automated Model Synthesis and Optimisation
EAs are potentially very useful for AMSO
• There’s a substantial amount of work on:
  • using GP-like systems to evolve executable structures
  • using EAs for continuous/discrete optimisation
• An EA population represents alternative models (could lead to different experimental setups)
• EAs have the potential to capture, rather than avoid, evolvability of models
Evolving Executable Biology Models
• The main idea is to use a nested evolutionary algorithm where the outer layer evolves model structures while the inner layer acts as a local search for the parameters of the model.
• It uses stochastic P systems as a computational, modular and discrete-stochastic modelling framework.
• It adopts an incremental methodology, namely starting from very simple P system modules specifying basic molecular interactions, more complicated modules are produced to model more complex molecular systems.
• Successfully validated evolved models can then be added to the models library
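The nested idea (an outer search over structures, an inner local search over parameters) can be shown on a toy problem. Everything here is an assumption made for illustration: the "module library" is a set of simple functions rather than P system modules, the target is a deterministic series rather than stochastic simulation output, and the inner layer is plain hill-climbing. It is not the authors' algorithm.

```python
import random

# Toy stand-in for a module library: each "module" contributes c * f(t).
LIBRARY = {"const": lambda t: 1.0, "linear": lambda t: t, "quad": lambda t: t * t}
TIMES = [0.0, 0.5, 1.0, 1.5, 2.0]
TARGET = [2.0 + 3.0 * t for t in TIMES]   # generated by {"const": 2, "linear": 3}

def error(structure, params):
    """Root mean squared error between the candidate model and the target series."""
    total = 0.0
    for t, y in zip(TIMES, TARGET):
        pred = sum(c * LIBRARY[m](t) for m, c in zip(structure, params))
        total += (pred - y) ** 2
    return (total / len(TIMES)) ** 0.5

def optimise_parameters(structure, rng, steps=300):
    """Inner layer: hill-climbing over the constants of a fixed structure."""
    params = [1.0] * len(structure)
    best = error(structure, params)
    for _ in range(steps):
        trial = [c + rng.gauss(0.0, 0.3) for c in params]
        e = error(structure, trial)
        if e < best:
            params, best = trial, e
    return params, best

def mutate_structure(structure, rng):
    """Outer-layer operator: add or remove one module (never leaving it empty)."""
    modules = set(structure)
    modules ^= {rng.choice(sorted(LIBRARY))}
    return tuple(sorted(modules)) or structure

def nested_ea(generations, pop_size, rng):
    """Outer layer: evolve structures, scoring each by its inner-optimised error."""
    population = [(rng.choice(sorted(LIBRARY)),) for _ in range(pop_size)]
    scored = []
    for _ in range(generations):
        candidates = population + [mutate_structure(s, rng) for s in population]
        scored = sorted(
            (optimise_parameters(s, rng)[1], s) for s in sorted(set(candidates))
        )
        population = [s for _, s in scored[:pop_size]]
    return scored[0]

best_error, best_structure = nested_ea(generations=5, pop_size=4, rng=random.Random(0))
```

Scoring a structure only after its parameters have been tuned avoids discarding a good topology because of a poor initial parameter guess.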
Nested EA for Model Synthesis
F. Romero-Campero, H. Cao, M. Camara, and N. Krasnogor. Structure and parameter estimation for cell systems biology models. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2008), pages 331-338. ACM Publisher, 2008.
H. Cao, F.J. Romero-Campero, S. Heeb, M. Camara, and N. Krasnogor. Evolving cell models for systems and synthetic biology. Systems and Synthetic Biology, 2009.
Fitness Evaluation
Representation
Structural Operators
Parameter Optimisation
Simple Illustrative Case Studies
• Case 1: molecular complexation
• Case 2: enzymatic reaction
• Case 3: autoregulation in transcriptional networks
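To make the first case study concrete, here is a minimal sketch of how a molecular complexation model could be executed stochastically. It assumes a reversible reaction A + B -> C with constant c1 and C -> A + B with constant c2, simulated with a Gillespie-style direct method; the actual target structure and parameter values are on slide images that the transcript does not reproduce, so the reaction set, the initial counts and the constants below are all hypothetical.

```python
import random

def gillespie_complexation(a, b, c, c1, c2, t_end, seed=0):
    """Direct-method simulation of A + B -> C (c1) and C -> A + B (c2)."""
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, a, b, c)]
    while True:
        prop_bind = c1 * a * b    # propensity of complex formation
        prop_unbind = c2 * c      # propensity of dissociation
        total = prop_bind + prop_unbind
        if total == 0:
            break                 # no reaction can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * total < prop_bind:
            a, b, c = a - 1, b - 1, c + 1
        else:
            a, b, c = a + 1, b + 1, c - 1
        trajectory.append((t, a, b, c))
    return trajectory

# Hypothetical initial molecule counts and constants.
traj = gillespie_complexation(a=100, b=80, c=0, c1=0.01, c2=0.1, t_end=5.0, seed=1)
print(len(traj), traj[-1])
```

Each run gives one noisy time series, which is one reason the later slides describe executable models as noisy and expensive to evaluate: a fitness value typically needs several such runs.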
Case 1: Molecular Complexation
Target P System Model Structure
Target P System Model Parameters: c1, c2
Case 1: Experimental Results
50/50 runs found the same P system model structure as the target one
Best Model Parameters
Target Model Parameters
Target and Evolved Time Series
Case 2: Enzymatic Reaction
Target P System Model Structure
Target P System Model Parameters: c1, c2, c3
Case 2: Experimental Results
Best Model Parameters
Target Model Parameters
Target and Evolved Time Series

Two Approaches                                     | Using Basic Modules | Adding the Module Newly Found in Case 1
Number of runs the target P system model was found | 12/50               | 39/50
Mean RMSE                                          | 3.8                 | 2.85
Case 3.1: Negative Autoregulation
Target P System Model Structure
Target P System Model Parameters: c1, c2, c3, c4, c5, c6
Case 3.1: Experimental Results

Design         | Design 1        | Design 2
Number of runs | 46/50           | 4/50
Mean +/- STD   | 4.11 +/- 11.73  | 4.63 +/- 2.91

Best Model in Design 1
Best Model in Design 2
Case 3.2: Positive Autoregulation
Target P System Model Structure
Target P System Model Parameters: c1, c2, c3, c4, c5, c6, c7
Case 3.2: Experimental Results

Design         | Design 1        | Design 2
Number of runs | 30/50           | 20/50
Mean +/- STD   | 16.36 +/- 3.03  | 20.14 +/- 5.13

Best Model in Design 1
The Fitness Function
• Multiple time series per target
• Different time series have very different profiles, e.g., response times or maxima occur at different times/places
• Transient states are (sometimes) as important as steady states
• A plain RMSE can therefore mislead the search
H. Cao, F. Romero-Campero, M. Camara, N. Krasnogor (2009). Analysis of Alternative Fitness Methods for the Evolutionary Synthesis of Cell Systems Biology Models.
Four Fitness Functions
Notation: N time series, with M time points in each time series; simulated data and target data.
F1
F2
F2 normalises within a time series to the range [0,1].
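The equations for F1 and F2 are images on the slide, so the forms below are assumptions rather than the published definitions: F1 is taken to be an equally weighted sum of absolute errors over all N series and M time points, and F2 the same sum after each series has been rescaled to [0,1] using the target's minimum and maximum. The function names and the toy data are hypothetical. The sketch shows why an unnormalised error can mislead the search when series differ in scale.

```python
def f1_equal_weights(sim, target):
    """Assumed F1: every error, in every series, carries the same weight."""
    return sum(abs(s - t)
               for s_row, t_row in zip(sim, target)
               for s, t in zip(s_row, t_row))

def f2_normalised(sim, target):
    """Assumed F2: errors measured after rescaling each series to [0, 1]."""
    total = 0.0
    for s_row, t_row in zip(sim, target):
        lo, hi = min(t_row), max(t_row)
        span = (hi - lo) or 1.0  # guard against a constant target series
        # Rescaling both series by the same (lo, span) divides each error by span.
        total += sum(abs(s - t) / span for s, t in zip(s_row, t_row))
    return total

# Two target series on very different scales (hypothetical molecule counts).
target = [[0.0, 500.0, 1000.0], [0.0, 0.5, 1.0]]
model_a = [[0.0, 550.0, 1100.0], [0.0, 0.5, 1.0]]  # 10% off on the large series
model_b = [[0.0, 500.0, 1000.0], [1.0, 0.5, 0.0]]  # small series completely inverted

for name, model in [("A", model_a), ("B", model_b)]:
    print(name, f1_equal_weights(model, target), f2_normalised(model, target))
```

Under these assumptions the unnormalised sum ranks model B ahead of model A, even though B reverses the behaviour of the second series, while the normalised version ranks A ahead.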
F3
F3 uses a randomly generated normalised weight vector.
F3, unlike F1, does not assume an equal weighting of all the errors. It produces a randomly weighted average of the errors.
F4
F4 simply multiplies the errors, so even small ones within a set spanning multiple orders of magnitude can still have an effect. This might, however, lead to numerical instabilities due to nonlinear effects.
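As with F1 and F2, the equations for F3 and F4 are not reproduced in the transcript, so the following is only a sketch of the two ideas described in words: a randomly weighted average of per-series errors (F3) and a product of per-series errors (F4). The per-series absolute error, the weight sampling, and the small `eps` offset that keeps one perfect series from zeroing the product are all assumptions.

```python
import random

def series_errors(sim, target):
    """Summed absolute error of each time series (assumed error measure)."""
    return [sum(abs(s - t) for s, t in zip(s_row, t_row))
            for s_row, t_row in zip(sim, target)]

def f3_random_weights(sim, target, rng):
    """Assumed F3: average of series errors under random weights that sum to 1."""
    errs = series_errors(sim, target)
    raw = [rng.random() for _ in errs]
    total = sum(raw) or 1.0
    weights = [r / total for r in raw]
    return sum(w * e for w, e in zip(weights, errs))

def f4_product(sim, target, eps=1e-9):
    """Assumed F4: product of series errors (eps is an illustrative safeguard)."""
    result = 1.0
    for e in series_errors(sim, target):
        result *= e + eps
    return result

target = [[0.0, 500.0, 1000.0], [0.0, 0.5, 1.0]]
model = [[0.0, 550.0, 1100.0], [0.2, 0.5, 1.0]]         # series errors: 150 and 0.2
model_improved = [[0.0, 550.0, 1100.0], [0.1, 0.5, 1.0]]  # small series error halved

rng = random.Random(0)
print(f3_random_weights(model, target, rng))
print(f4_product(model, target), f4_product(model_improved, target))
```

Halving the error on the small-scale series roughly halves the product, whereas a plain sum would barely change; this is the sensitivity to small errors that the slide attributes to F4, and the same multiplication is where very large or very small products can cause numerical trouble.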
Problem Specification
Case Studies
Results Study Case 1
Results Study Case 2
Results Study Case 3
Alternative, biologically valid, modules
Target model
Results Study Case 4
Comparison of the constants of the best-fitness models obtained by F1 and F4 with those of the target model for Test Case 4 (values that differ from the target are in bold and underlined).
Target
Evolution of the average fitness for the four fitness methods (F1 to F4) over 20 runs for Test Case 4.
Evolution of the average model diversity for the four fitness methods (F1 to F4) over 20 runs for Test Case 4.
Multi-Objective Optimisation in Morphogenesis
The following slides are based on:
Rui Dilão, Daniele Muraro, Miguel Nicolau, Marc Schoenauer. Validation of a morphogenesis model of Drosophila early development by a multi-objective evolutionary optimization algorithm. Proc. 7th European Conference on Evolutionary Computation, ML and Data Mining in BioInformatics (EvoBIO'09), April 2009.
• The authors investigate the use of MOEAs for modelling morphogenesis
• The model organism is the Drosophila embryo
Initial Conditions
PDE model
Target
By evolving: a_bcd, a_cad, D_bcd/D_cad, r, the mRNA distributions and t
However, the goal is not a perfect fit but rather a robust fit
• The authors use both single- and multi-objective optimisation
• CMA-ES is at the core of both
• CMA-ES is a (µ,λ)-ES that employs multivariate Gaussian distributions
• It uses a “cumulative path” to adapt the covariance of this distribution
• For the MO case it uses global Pareto-dominance-based selection
Bi-Objective Function
Objective Function
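The Pareto-dominance idea behind the multi-objective selection can be illustrated with a short sketch. It assumes both objectives are minimised and uses made-up objective vectors; it shows only the dominance test and a non-dominated filter, not CMA-ES itself or the selection scheme of the cited paper.

```python
def dominates(u, v):
    """True if u Pareto-dominates v when every objective is minimised."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better

def non_dominated(points):
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (error on objective 1, error on objective 2) pairs.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(non_dominated(candidates))
```

In this example (3.0, 3.0) is discarded because (2.0, 2.0) is better on both objectives, while the other three represent different trade-offs and all survive.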
Parameter Optimisation in Metabolic Models
A. Drager et al. (2009). Modeling metabolic networks in C. glutamicum: a comparison of rate laws in combination with various parameter optimization strategies. BMC Systems Biology 2009, 3:5
Outline
•Essential Systems Biology
•Synthetic Biology
•Computational Modeling for Synthetic Biology
•Automated Model Synthesis and Optimisation
•A Note on Ethical, Social and Legal Issues
•Conclusions
Synthetic Biology
The new science of synthetic biology aims to re-engineer life at the molecular level and even create completely new forms of life. It has the potential to create new medicines and biofuels, to help tackle climate change through carbon capture, and to develop solutions that help clean up the environment.
What is Synthetic Biology?
Synthetic Biology is
A) the design and construction of new biological parts, devices, and systems, and
B) the re-design of existing, natural biological systems for useful purposes.
Transplantation
Itaya, M., Tsuge, K., Koizumi, M., and Fujita, K. Combining two genomes in one cell: Stable cloning of the Synechocystis PCC6803 genome in the Bacillus subtilis 168 genome. Proc. Natl. Acad. Sci. USA, 102, 15971-15976 (2005)
150 times larger than the human genome
A Technology Not a Product
“The problem of deaths and injury as a result of road accidents is now acknowledged to be a global phenomenon.... publications show that in 1990 road accidents as a cause of death or disability were in ninth place out of a total of over 100 identified causes.... by 2020 forecasts suggest... road accidents will move up to sixth...”
Estimating global road fatalities. G. Jacobs & A. Aeron-Thomas. Global Road Safety Partnership
And yet nobody seriously considers banning mechanical engineering
A Technology Not a Product
And yet nobody seriously considers banning printing technology
But ....
• Technologies are regulated:
  • Cars have seat belts and laws establish speed limits
  • Mein Kampf is banned in Germany and, e.g., by restricting Google, China bans countless written materials of all sorts!
• Societies must establish an informed dialogue involving:
  • Taxpayers
  • Scientists
  • Lobbies of all sorts
  • Government
What IS Synthetic Biology?
Synthetic Biology is
A) the design and construction of new biological parts, devices, and systems, and
B) the re-design of existing, natural biological systems for useful purposes.
http://syntheticbiology.org/
C) Through rigorous mathematical, computational engineering routes
Summary & Conclusions
• This talk has focused on an integrative methodology for Systems & Synthetic Biology
  • Executable Biology
  • Parameter and Model Structure Discovery
• Computational models (or executable models, in Fisher & Henzinger’s jargon) adhere (to a degree) to an operational semantics.
• Refer to the excellent review [Fisher & Henzinger, Nature Biotechnology, 2007]
Summary & Conclusions
• The gap present in mathematical models between the model and its algorithmic implementation disappears in computational models, as all of them are algorithms.
• A new gap appears between the biology and the modelling technique, and this can be solved by a judicious “feature selection”, i.e. the selection of the correct abstractions.
• Good computational models are more intuitive and analysable.
Summary & Conclusions
• Computational models can thus be executed (quite a few tools are out there, but lots are still missing).
• Quantitative vs qualitative modelling: computational models can be very useful even when not every detail about a system is known.
• Missing parameters/model structures can sometimes be fitted with off-the-shelf optimisation strategies (e.g. COPASI, GAs, etc.).
• Computational models can be analysed by model checking: thus they can be used for testing hypotheses and expanding experimental data in a principled way.
Summary & Conclusions
A nested evolutionary algorithm is proposed to automatically develop and optimise the modular structure and parameters of cellular models based on stochastic P systems. Several case studies with incremental model complexity demonstrate the effectiveness of our algorithm.
The fact that this algorithm produces alternative models for a specific biological signature is very encouraging, as it could help biologists to design new experiments to discriminate among competing hypotheses (models).
Comparing the results obtained using only the elementary modules with those obtained after adding newly found modules to the library shows the clear advantage of the incremental, modular methodology. This points to the great potential of a modular approach for automatically designing more complex cellular models in the future.
Summary & Conclusions
• Synthesising Synthetic Biology models is more like evolving general GP programs and less like fitting regression or inter/extrapolation
• We evolve executable structures, distributed programs(!)
• These are noisy and expensive to execute
• Like in GP programs, executable biology models might achieve similar behaviour through different program “structure”
• Prone to bloat
• Like in GP, there is a complex relation between diversity and solution quality
• However, diverse solutions of similar fit might lead to interesting experimental routes
• Co-design of models and wetware
Summary & Conclusions
• Some really nice tutorials and other sources:
  • Luca Cardelli’s Brane Calculus & BioAmbients
  • Simulating Biological Systems in the Stochastic π-calculus by Phillips and Cardelli
  • From Pathway Databases to Network Models by Aguda and Goryachev
  • Modeling and analysis of biological processes by Brane Calculi and Membrane Systems by Busi and Zandron
  • D. Gilbert’s website contains several nice papers with related methods and tutorials
Other Sources
F. J. Romero-Campero, J. Twycross, M. Camara, M. Bennett, M. Gheorghe, and N. Krasnogor. Modular assembly of cell systems biology models using P systems. International Journal of Foundations of Computer Science, (to appear), 2009.
F. J. Romero-Campero and N. Krasnogor. An approach to biomodel engineering based on P systems. In Proceedings of Computability in Europe (CiE 2009), 2009.
J. Smaldon, N. Krasnogor, M. Gheorghe, and A. Cameron. Liposome logic. In Proceedings of the 2009 Genetic and Evolutionary Computation Conference (GECCO 2009), 2009.
F. Romero-Campero, H. Cao, M. Camara, and N. Krasnogor. Structure and parameter estimation for cell systems biology models. In Maarten Keijzer et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2008), pages 331-338. ACM Publisher, 2008. This paper won the Best Paper award at the Bioinformatics track.
J. Smaldon, J. Blake, D. Lancet, and N. Krasnogor. A multi-scaled approach to artificial life simulation with P systems and dissipative particle dynamics. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2008). ACM Publisher, 2008.
Other Sources
Păun, Gh. Computing with membranes. Journal of Computer and System Sciences 61 (2000) 108-143.
P Systems Web Page http://psystems.disco.unimib.it/
Bianco L. Membrane Models of Biological Systems. PhD thesis, 2007.
Bernardini F, Gheorghe M, Krasnogor N, Terrazas G. Membrane Computing - Current Results and Future Problems. CiE 2005, 49-53.
Bernardini F, Gheorghe M, Krasnogor N. Quorum sensing P systems. Theoretical Computer Science 371 (2007) 20-33.
Miguel Nicolau, Marc Schoenauer. Evolving Specific Network Statistical Properties using a Gene Regulatory Network Model. In ECCS 2008, 5th European Conference on Complex Systems, 2008.
Miguel Nicolau, Marc Schoenauer. Evolving Scale-Free Topologies using a Gene Regulatory Network Model. CEC 2008, IEEE Congress on Evolutionary Computation, pp. 3748-3755, IEEE Press, 2008.
A. Ridwan. A parallel implementation of Gillespie's Direct Method. Proc. of the International Conference on Computational Science, p.284-291, Krakow, Poland, June 2004.
G. C. Ewing et al. Akaroa2: Exploiting network computing by distributing stochastic simulation. Proc. of the European Simulation Multiconference, p.175-181, Warsaw, June 1999.
P. Hellekalek. Don't trust parallel Monte Carlo! ACM SIGSIM Simulation Digest, 28(1):82-89, 1998.
M. Schwehm. Parallel stochastic simulation of whole cell models. Proc. of the Second International Conference on Systems Biology, p.333-341, CalTech, C.A., November 2001.
Acknowledgements
•Jonathan Blake
•Claudio Lima
•Francisco Romero-Campero
•Karima Righetti
•Jamie Twycross
Integrated Environment
Machine Learning & Optimisation
Modeling & Model Checking
Molecular Micro-Biology
Stochastic Simulations
Members of my team working on SB2
EP/E017215/1
EP/H024905/1
BB/F01855X/1
BB/D019613/1
University of Nottingham: Prof. M. Camara, Dr. S. Heeb, Dr. G. Rampioni, Prof. P. Williams
Weizmann Institute of Science: Prof. D. Lancet, Prof. I. Pilpel
GECCO 2010 organisers for inviting this tutorial.
You for listening!
Any Questions?
www.synbiont.org
Become a member and have access to a large international community of Synthetic Biologists