Computational Protein Design
3. Applications of Computational Protein Design
Pablo Carbonell
iSSB, Institute of Systems and Synthetic Biology
Genopole, University d’Évry-Val d’Essonne, France
mSSB: December 2010
Computational Protein Design. 3. Applications in Systems and Synthetic Biology
Generating cellular interaction networks: the structural interactome
The Structural Interactome
Applications of CPD in Synthetic Biology
Engineering signal transduction: modifying the specificity of receptors
Engineering genetic networks
Modifying transcription
Targeting gene repair and modification
Novel biosensors
Minimal cells and synthetic genomes
Metabolic pathway engineering
Feedback loop design and sensitivity analysis
Programmable switches: allosteric, epigenetic, riboswitches
Conditional delivery of drugs
Modulation of signal transduction pathways
Inhibition of protein function
Adoption of a toxic conformation
Cell-cell communication
Orthogonal genes
Mathematical dynamical models
Outline
1 Applications in Systems and Synthetic Biology
2 Protein Affinity Enhancement
3 Protein Modular Design
4 Protein Promiscuity Reengineering
5 Conclusions
Antibody-Antigen Interactions
Antibodies are gamma globulin proteins found in the immune system of vertebrates
Basic structural units:
Two large heavy chains (VH)
Two small light chains (VL)
The Fab region, or fragment antigen-binding, is the region of an antibody that binds to antigens
The Fc region, or fragment crystallizable region, is the tail region that interacts with cell surface receptors
The FV region: variable domain
The Variable Domain FV
The variable domain is the most important region for binding to antigens
The FV contains:
3 variable loops of β-strands on the light chain VL
3 variable loops of β-strands on the heavy chain VH
These loops are referred to as the complementarity determining regions (CDRs)
In Silico Design of Immunodiagnostics Assays for Anti TNF-α
Tumor necrosis factor-alpha (TNF-α), a cytokine involved in systemic inflammation, can induce several cell responses depending on the cellular context:
activation of NF-κB-mediated proliferative programs
programmed cell death
The early detection of unusual concentrations of TNF-α serves as a diagnostic biomarker of inflammatory conditions such as metabolic disorders (obesity), rheumatoid arthritis, tuberculosis, and cancer.
Moreover, TNF-α inhibitors have emerged in recent years as a new therapeutic approach for inflammatory immune-mediated diseases.
The TNF-α inhibitory molecules currently in use are antibodies or soluble TNF receptors, which sequester TNF-α.
Computational Protein Affinity Design for Anti TNF-α Antibodies
Building the Model
No crystal structure of the TNF-α antibody-antigen complex is available
Therefore, our first step is to build a model of the complex through structural homology and docking
TNF-α trimer
Anti-TNF-α model from Swiss-Model
Docking and Scoring
Using ZDOCK (Accelrys Inc.) for the generation of docked complexes
Fast Fourier Transform based protein docking program.
The top 2000 ranked predictions are returned.
Scoring the complexes through the use of FastContact
Contact binding free energy scoring tool for protein-protein complex structures
The estimates are based on rigid bodies
Hot-spots and Energy Minimization
Predicting hot-spots
By using FoldX, we performed an in silico alanine scanning in order to predict consensus hot-spots for the models.
These hot-spots were verified in the laboratory by the experimental group.
3 initial models were selected based on different criteria:
minimum predicted binding energy in FastContact
highest coverage of known hot-spots in anti-TNF-α.
Energy was then minimized for the complexes by using Discovery Studio (Accelrys Inc.).
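The consensus step of the alanine scan can be illustrated with a short sketch. It assumes that each docked model yields a table of ΔΔG values for mutation to alanine, and it calls a residue a consensus hot-spot when the value exceeds a cutoff in enough models. The residue labels, the ΔΔG values, the 1 kcal/mol cutoff and the two-model vote are illustrative assumptions, not results from the study or FoldX defaults.

```python
from collections import Counter

# Hypothetical ΔΔG of binding (kcal/mol) on mutation to alanine,
# one dict per docked model: residue label -> ΔΔG. Illustrative numbers only.
scans = [
    {"H:Y32": 2.4, "H:W52": 1.8, "L:D50": 0.3, "L:Y91": 1.2},
    {"H:Y32": 2.1, "H:W52": 0.7, "L:D50": 0.1, "L:Y91": 1.5},
    {"H:Y32": 1.9, "H:W52": 1.3, "L:D50": 0.4, "L:Y91": 0.6},
]


def consensus_hotspots(scans, ddg_cutoff=1.0, min_models=2):
    """Return residues with ΔΔG >= ddg_cutoff in at least min_models scans."""
    votes = Counter(
        res for scan in scans for res, ddg in scan.items() if ddg >= ddg_cutoff
    )
    return sorted(res for res, n in votes.items() if n >= min_models)


hotspots = consensus_hotspots(scans)
print(hotspots)
```

Raising `min_models` to the number of models would keep only the residues predicted in every model, which trades coverage for stricter agreement.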
In Silico Combinatorial Library
In silico combinatorial libraries of mutants around the complementarity determining regions (CDRs) were built as follows:
Models for single-mutation variants were computed through the use of Biopolymer and Builder (Accelrys Inc.) for rotamer selection and side chain positioning
Mutants were then submitted to a cluster of 64 × 4-core nodes for local energy minimization of the CDRs by using GROMACS
Virtual Screening
The most beneficial mutations were selected in order to build a combinatorial library of double and triple mutants.
Variants with the lowest predicted binding energy (that is, the highest predicted affinity) were shortlisted and compared with beneficial mutations reported in the literature
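Building double and triple mutants from a set of beneficial single mutations is a plain combinatorial enumeration. The sketch below assumes a hypothetical table of beneficial substitutions per CDR position and allows at most one substitution per position. The positions and amino acids are made up for illustration.

```python
from itertools import combinations

# Hypothetical beneficial single mutations: position -> substitutions.
beneficial = {
    "H:32": ["F", "W"],
    "H:52": ["Y"],
    "L:91": ["H", "R"],
}


def combinatorial_library(beneficial, orders=(2, 3)):
    """Enumerate multi-point variants with at most one substitution per position."""
    singles = [(pos, aa) for pos, aas in beneficial.items() for aa in aas]
    variants = []
    for k in orders:
        for combo in combinations(singles, k):
            positions = {pos for pos, _ in combo}
            if len(positions) == k:  # skip two substitutions at one position
                variants.append(combo)
    return variants


library = combinatorial_library(beneficial)
doubles = [v for v in library if len(v) == 2]
triples = [v for v in library if len(v) == 3]
print(len(doubles), "double mutants,", len(triples), "triple mutants")
```

Each enumerated variant would then be modelled, minimized and scored in the same way as the single mutants.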
5 organisms: E. coli, S. cerevisiae, C. elegans, D. melanogaster, H. sapiens
Binding site clustering:
Graph Modular Decomposition
Domains can be decomposed further into connectivity modules by clustering the domain contact map G(V, E, C)
Girvan-Newman algorithm [PNAS (2002)] with maximum modularity stop rule [Kashtan and Alon, PNAS (2005)]:
1 The betweenness of all existing edges in the network is calculated first.
Edge betweenness: the number of shortest paths between pairs of nodes that run along the edge
2 The edge with the highest betweenness is removed
3 The betweenness of all edges affected by the removal is recalculated
4 Repeat 2 and 3 until the modularity Q for the K connected clusters in the network becomes maximum
$$Q = \sum_{s=1}^{K} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] \qquad (1)$$
l_s = number of edges between nodes in module s
d_s = sum of node degrees in module s
L = total number of edges in the network
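The decomposition and Eq. (1) can be sketched in plain Python. This is a minimal illustration rather than the implementation used in the study: it recomputes every edge betweenness after each removal instead of only the affected ones, runs until no edges remain, and keeps the partition with the highest Q. The toy contact graph and all function names are assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (no self-loops)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to each node.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge used.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += credit
                delta[v] += credit
    return {e: b / 2.0 for e, b in bet.items()}  # each pair was counted twice


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def modularity(adj, modules):
    """Q from Eq. (1), evaluated on the original graph adj."""
    L = sum(len(nb) for nb in adj.values()) / 2
    q = 0.0
    for mod in modules:
        l_s = sum(len(adj[v] & mod) for v in mod) / 2
        d_s = sum(len(adj[v]) for v in mod)
        q += l_s / L - (d_s / (2 * L)) ** 2
    return q


def girvan_newman(adj):
    """Remove highest-betweenness edges; return the partition with maximum Q."""
    work = {v: set(nb) for v, nb in adj.items()}
    best = components(work)
    best_q = modularity(adj, best)
    while any(work.values()):
        bet = edge_betweenness(work)
        u, v = max(bet, key=bet.get)
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        q = modularity(adj, parts)
        if q > best_q:
            best, best_q = parts, q
    return best, best_q


# Toy contact map: two triangles joined by a single bridge edge (2, 3).
contacts = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best_modules, best_q = girvan_newman(contacts)
print(best_modules, round(best_q, 3))
```

On this toy graph the bridge carries the most shortest paths, so it is removed first and the two triangles are recovered as modules.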
Modularity
Modularity Q_s is a measure of how tightly members of a module s interact
$$Q_s = \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \qquad (2)$$
l_s = number of edges between nodes in module s
d_s = sum of node degrees in module s
L = total number of edges in the network
l_s / L: fraction of edges in the network that connect vertices in the module s
(d_s / 2L)^2: the expected value of the same quantity if edges fall at random
$$\hat{l}_s = \frac{d_s}{2}\, p_s = \frac{d_s}{2} \cdot \frac{d_s/2}{L} \qquad (3)$$
p_s: probability of an edge to connect nodes in module s
In a randomly partitioned network, the expected modularity is \hat{Q}_s = 0
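A short worked derivation, using only the definitions on this slide, shows why the expected modularity vanishes. Substituting the expected edge count of Eq. (3) into Eq. (2) cancels the two terms exactly:

```latex
\begin{align}
p_s &= \frac{d_s/2}{L}, \\
\hat{l}_s &= \frac{d_s}{2}\, p_s = \frac{d_s^2}{4L}, \\
\frac{\hat{l}_s}{L} &= \frac{d_s^2}{4L^2} = \left( \frac{d_s}{2L} \right)^2, \\
\hat{Q}_s &= \frac{\hat{l}_s}{L} - \left( \frac{d_s}{2L} \right)^2 = 0 .
\end{align}
```

A module therefore has positive modularity only when it holds more internal edges than its total degree would produce by chance.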
Binding Site and Modular Overlaps
Modular composition of binding site j:
$$m_j = (m_{j1}, m_{j2}, \ldots, m_{jM}) \qquad (4)$$
Similarity in modular composition between binding sites i and j:
$$M(i, j) = \frac{\sum_{k=1}^{M} m_{ik}\, m_{jk}}{|m_i|\,|m_j|} \qquad (5)$$
Relative interface between i and j:
$$C(i, j) = \frac{1}{2} \left[ \frac{n_i}{N_i} + \frac{n_j}{N_j} \right] \qquad (6)$$
n_i (n_j): number of residues in i (j) with contacts in j (i)
N_i (N_j): number of residues in binding site i (j)
Kringle domain (PF00051)
Binding site A (blue)
Binding site B (red)
$$C(A, B) = \frac{1}{2} \left( \frac{4}{10} + \frac{3}{8} \right) \qquad (7)$$
$$M(A, B) = \frac{(2, 8, 0, 0, 0) \cdot (0, 2, 3, 3, 0)^T}{\sqrt{68}\,\sqrt{22}} \qquad (8)$$
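The Kringle domain example can be checked numerically. The sketch below evaluates Eqs. (5) and (6) for the two modular composition vectors and residue counts given on the slide; the function names are assumptions.

```python
import math


def modular_similarity(m_i, m_j):
    """M(i, j), Eq. (5): cosine similarity of two modular composition vectors."""
    dot = sum(a * b for a, b in zip(m_i, m_j))
    norm_i = math.sqrt(sum(a * a for a in m_i))
    norm_j = math.sqrt(sum(b * b for b in m_j))
    return dot / (norm_i * norm_j)


def relative_interface(n_i, N_i, n_j, N_j):
    """C(i, j), Eq. (6): mean fraction of each site in contact with the other."""
    return 0.5 * (n_i / N_i + n_j / N_j)


# Binding sites A and B of the Kringle domain (PF00051), values from the slide.
m_A = (2, 8, 0, 0, 0)  # residues of site A per module, N_A = 10
m_B = (0, 2, 3, 3, 0)  # residues of site B per module, N_B = 8

C_AB = relative_interface(4, 10, 3, 8)
M_AB = modular_similarity(m_A, m_B)
print(f"C(A, B) = {C_AB:.4f}")
print(f"M(A, B) = {M_AB:.4f}")
```

The two sites share only the second module, so the similarity comes entirely from the product 8 × 2 in the numerator.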
The Modular Organization of Domain-Domain Interfaces
Non-overlapping binding sites are assigned to different modules
Modules with high modularity Q contain a significant percentage of binding site regions
[Del Sol, Carbonell, PLOS Comp. Biology, (2007)]
Using Modularity to Identify Binding Regions
Modularity can be used to identify binding surfaces
Accuracy and coverage of modularity and surface hydrophobic patches are greater than those of residue conservation
Combining modularity with the other two methods notably improves performance
Intra-Module Cooperativity and Inter-Module Independence
Human IL-4: a cytokine that plays a regulatory role in the immune system
IL-4 contains 3 energetically independent clusters of hot-spots located in 3 modules
These hot-spots can be used to generate binding affinity and specificity
Intra-Module Cooperativity and Inter-Module Independence
TEM1 β-lactamase confers antibiotic resistance to E. coli
This enzyme is inhibited by BLIP
A mutagenesis study showed that there are 2 hot-spot clusters which are energetically independent
These clusters are located in different modules
Intra-Module Cooperativity and Inter-Module Independence
RI (ribonuclease inhibitor). Hot-spots located in different modules are known to be independent
Modularity as a Measure of Residue Cooperativity
Protein domains can be decomposed into a set of modules that contain groups of specialized residues
Binding sites are usually located in highly cooperative modules
Modularity, combined with sequence conservation and surface patches, can be used to predict functional regions
This modular architecture confers robustness to protein structures and contributes to the determination of binding affinity and specificity
Energetic Determinants of Protein Binding Affinity
The modular decomposition of protein structures is a structural characterization of protein interactions
To learn more about the interplay between binding affinity and specificity, a thermodynamic characterization is necessary
We focus in this study on one specific interactome: the yeast interactome (main source: MIPS)
Structural interactome: 259 hubs (>5 partners) participating in 877 different interactions
Binding Site Clustering
Single and multiple interfaces
Binding sites correspond to residues interacting with the partner at a distance ≤ 5 Å
Binding sites are mapped into the reference sequence of the hub and clustered by using a version of the algorithm in Teyra et al. [2008]
1 Compute the N × N binary distance matrix D where
$$D(i, j) = \delta_{ij} = \begin{cases} 1 & i \cap j \neq \emptyset \\ 0 & i \cap j = \emptyset \end{cases} \qquad (9)$$
2 Start with k = N clusters
3 Compute the {k − 1}-means clustering of D
4 Recompute D for the k − 1 clusters
5 Repeat step 3 while all binding sites within clusters overlap
Total interfaces: 539, involved in 1 to 5 interactions
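The idea of grouping overlapping binding sites into interfaces can be sketched as follows. This is a simplified stand-in, not the algorithm of Teyra et al. or the version used in the study: the k-means step is replaced by a greedy merge that joins two clusters only when every binding site in one overlaps every binding site in the other. The partner names and residue numbers are hypothetical.

```python
from itertools import combinations


def overlap(site_a, site_b):
    """D(i, j) from Eq. (9): 1 if the two sites share a residue, else 0."""
    return 1 if site_a & site_b else 0


def cluster_binding_sites(sites):
    """Greedy merge of clusters while all sites inside a cluster overlap."""
    clusters = [[name] for name in sites]  # start with k = N clusters
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if all(overlap(sites[a], sites[b]) for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 + c2)  # now k - 1 clusters
                merged = True
                break  # restart the scan on the updated clusters
    return sorted(sorted(c) for c in clusters)


# Hypothetical hub: residues (reference numbering) contacted by each partner.
sites = {
    "partner1": {10, 11, 12, 15},
    "partner2": {12, 15, 16},
    "partner3": {40, 41, 44},
    "partner4": {11, 15, 17},
}
interfaces = cluster_binding_sites(sites)
print(interfaces)
```

In this toy case three partners share one interface on the hub and the fourth binds a separate one, so the hub has two interfaces, one of them involved in three interactions.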
Protein Binding Affinity and Specificity
Binding energies and alanine scanning for each complex estimated using FoldX [Schymkowitz et al., 2005]
Specific binding sites tend to bind their partners with higher affinity than promiscuous sites
Interactions between promiscuous binding sites tend to be weaker
Are hot-spots specific?
In most cases, hot-spots are specific to one interaction. Some of them are promiscuous
Binding site motifs of interacting partners are determinants of specificity
As the promiscuity of the hot-spots increases, the number of common motifs in the partners increases
A common evolutionary origin of divergent partners in promiscuous binding
Number of interactions in hot-spots    Average number of common motifs interacting with hot-spots
1                                      1.4
2                                      2.5
3                                      3.0
4                                      4.0
Hot-spots Modular Distribution and Specificity
We have already shown examples of energetic independence of hot-spots in modules
Furthermore, the relative number of binding site modules containing hot-spots increases with the number of partners
A small fraction of hot-spots participate in more than one interaction, probably acting as binding site anchors
[Carbonell, Nussinov, Del Sol, Proteomics, 2009]
Modular Distribution of Hot-spots and Specificity
Ubiquitin. A promiscuous protein with weak interactions
cdc42 GTPase. It contains a central module acting as a site anchor
Cytochrome b. An example of a specific binding site
Calmodulin-dependent kinase. An example of a specific binding site
The Role of Thermodynamics in Promiscuous Binding
In general, protein-protein interactions involving promiscuous binding sites are weaker
Proteins generally interact with partners with a similar degree of promiscuity
Hot-spots in promiscuous binding sites tend to be more distributed over different modules
Knowing the modular distribution of hot-spots involved in different interactionsmight allow us to rationally modify binding specificity and affinity
Large-scale Analysis Workflow
Outline
1 Applications in Systems and Synthetic Biology
2 Protein Affinity Enhancement
3 Protein Modular Design
4 Protein Promiscuity Reengineering
5 Conclusions
Applications in Synthetic Biology: Design of Metabolic Pathways
The Bio-RetroSynth project
ANR Chair d’Excellence, Faulon’s Lab
Tasks in the Bio-RetroSynth project
Bioretrosynthesis. Graphs for heterologous compound production in E. coli
Computational protein design. Machine learning to mine genomic databases for predicting protein function
Pathway design. Rank pathways to select the best to engineer
Quantitative Structure-Activity Relationship (QSAR) for enzyme activity and inhibition based on experimental databases and toxicity assays.
Metabolic engineering. E. coli plasmids in order to construct combinatorial libraries of the highest-ranked heterologous pathways found to produce a target product
Engineering optimization. Flux Balance Analysis (FBA) and non-linear optimization methods to maximize target yield
The Signature Reaction Space σ(R)
Examples of Retrosynthesis Graphs in the Reaction Signature Space
RetroPath: an online tool for retrosynthesis search of metabolic pathways
[D. Fichera, P. Carbonell, J.L. Faulon, Predicting heterologous compound-forming reaction pathways through retrosynthesis hypergraphs, in preparation]
Penicillin (antibiotic)
Galantamine (treatment of Alzheimer’s disease)
Ranking Pathways
Gene heterogeneity
Heterologous gene expression
Enzyme performance for the specified reaction
Compound toxicity
Estimation of nominal fluxes
Consistency of the predicted phenotype
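$$C(p) = \sum_{\mathrm{genes}(p)} \left( \frac{1}{\mathrm{perf}(\mathrm{gene})} + \mathrm{het}(\mathrm{gene}) + \sum_{\mathrm{prod}(\mathrm{gene})} \mathrm{tox}(\mathrm{prod}) \right) + \frac{1}{\mathrm{flux}} \qquad (10)$$
$$p^* = \arg\min_p C(p) \qquad (11)$$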
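The cost function of Eq. (10) and the selection rule of Eq. (11) translate directly into a few lines of Python. The sketch below assumes each pathway is described by its genes (enzyme performance, a heterologous-expression penalty, and the toxicities of the products) plus a nominal flux. The pathway names and all numeric values are hypothetical and carry no units from the study.

```python
def pathway_cost(pathway):
    """C(p), Eq. (10): per-gene penalties summed, plus the inverse flux."""
    gene_terms = sum(
        1.0 / gene["perf"] + gene["het"] + sum(gene["tox"])
        for gene in pathway["genes"]
    )
    return gene_terms + 1.0 / pathway["flux"]


# Hypothetical candidate pathways; every value is illustrative only.
pathways = {
    "pA": {
        "genes": [
            {"perf": 0.8, "het": 1.0, "tox": [0.1]},
            {"perf": 0.5, "het": 0.0, "tox": [0.2, 0.1]},
        ],
        "flux": 2.0,
    },
    "pB": {
        "genes": [{"perf": 0.4, "het": 1.0, "tox": [0.0]}],
        "flux": 0.5,
    },
}

costs = {name: pathway_cost(p) for name, p in pathways.items()}
best = min(costs, key=costs.get)  # p* = arg min_p C(p), Eq. (11)
print(costs, best)
```

Here the shorter pathway pB loses despite having fewer genes, because its poor enzyme performance and low flux dominate the cost.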
Predicting Compound Toxicity
MIC (IC50) assays in E. coli for commercial chemical compounds, including antibiotics
Molecular signature-based QSAR model
[A.G. Planson, E. Paillard, F. Vogliolo, P. Carbonell, J.L. Faulon, unpublished]
Enzyme Performance
Putative reactions R∗ discovered in the signature space ^hσ(R) by the retrosynthesis algorithm often lack annotated enzyme sequences in databases
A protein design procedure has to be implemented in order to identify the best heterologous enzyme sequence candidate to insert
Conceptually, the idea is to define
a metric in the reaction σ(R) and sequence σ(S) signature spaces
a convolution operation * between both spaces that generates the kernel function k((R1, S1), (R2, S2))
a machine-learning algorithm
In practical terms, we are searching in the sequence space S for enzymes with a putative level of promiscuity for the desired reaction R∗
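One common way to build a kernel on (reaction, sequence) pairs is to multiply a kernel on reactions by a kernel on sequences. The sketch below uses that product form as an assumption, since the slide does not specify the convolution operation. Signatures are represented as feature counts, with bond-type counts standing in for reaction signatures and 3-mer counts for sequence signatures. Both representations, the toy data and the function names are hypothetical.

```python
from collections import Counter


def signature_kernel(sig1, sig2):
    """Dot product of two signatures stored as feature -> count."""
    return sum(count * sig2[feat] for feat, count in sig1.items())


def kmer_signature(seq, k=3):
    """Stand-in sequence signature: counts of overlapping k-mers."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pair_kernel(r1, s1, r2, s2):
    """k((R1, S1), (R2, S2)) as a product of reaction and sequence kernels."""
    return signature_kernel(r1, r2) * signature_kernel(s1, s2)


# Toy reaction signatures (made-up feature counts) and toy sequences.
R1 = Counter({"C-O": 2, "C=O": 1})
R2 = Counter({"C-O": 1, "C-N": 1})
S1 = kmer_signature("MKVLA")
S2 = kmer_signature("KVLAG")

k12 = pair_kernel(R1, S1, R2, S2)
print(k12)
```

A kernel of this kind can be passed to any kernel-based learner to score how likely a sequence is to catalyze a given reaction; the choice of learner is left open here, as on the slide.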
Taking Advantage of Enzyme Promiscuity in Protein Engineering
Enzymes can potentially process multiple substrates or reactions
We can study enzyme promiscuity to enhance enzyme efficiency by protein engineering techniques
Enzyme promiscuity is an intermediate step in directed evolution
[Tracewell and Arnold, 2009]
A Quantitative Definition of Enzyme Promiscuity
Definitions
Enzyme multispecificity: the ability of enzymes to transform a broad range of closely related substrates
Promiscuous function: enzyme activities other than the native one
Using reaction signatures to measure promiscuity:
An enzyme is promiscuous if it catalyzes at least 2 reactions with different signatures
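The signature-based criterion separates promiscuity from multispecificity, which a small sketch makes concrete. It assumes each enzyme is annotated with the signature of every reaction it catalyzes, with signatures reduced to opaque labels. The enzyme names and labels are hypothetical.

```python
def is_promiscuous(reaction_signatures):
    """True if the enzyme catalyzes at least 2 reactions with different signatures."""
    return len(set(reaction_signatures)) >= 2


# Hypothetical annotations: one signature label per catalyzed reaction.
enzymes = {
    "enzA": ["sig1", "sig1", "sig1"],  # several substrates, same signature
    "enzB": ["sig1", "sig2"],          # two chemically distinct reactions
    "enzC": ["sig3"],
}

promiscuous = sorted(name for name, sigs in enzymes.items() if is_promiscuous(sigs))
print(promiscuous)
```

Under this rule enzA is multispecific but not promiscuous, because all of its reactions share one signature.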
Reaction chemical diversity for reactions RA and RB at height h:
Bibliography I
S. C. Rothman and J. F. Kirsch. How does an enzyme evolved in vitro compare to naturally occurring homologs possessing the targeted function? Tyrosine aminotransferase from aspartate aminotransferase. Journal of molecular biology, 327(3):593–608, March 2003. ISSN 0022-2836. URL http://view.ncbi.nlm.nih.gov/pubmed/12634055.
Joost Schymkowitz, Jesper Borg, Francois Stricher, Robby Nys, Frederic Rousseau, and Luis Serrano. The FoldX web server: an online force field. Nucleic acids research, 33(Web Server issue), July 2005. ISSN 1362-4962. doi: 10.1093/nar/gki387. URL http://dx.doi.org/10.1093/nar/gki387.
Joan Teyra, Maciej Paszkowski-Rogacz, Gerd Anders, and M. Teresa Pisabarro. SCOWLP classification: structural comparison and analysis of protein binding regions. BMC bioinformatics, 9:9+, January 2008. ISSN 1471-2105. doi: 10.1186/1471-2105-9-9. URL http://dx.doi.org/10.1186/1471-2105-9-9.
Cara A. Tracewell and Frances H. Arnold. Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Current opinion in chemical biology, 13(1):3–9, February 2009. ISSN 1879-0402. doi: 10.1016/j.cbpa.2009.01.017. URL http://dx.doi.org/10.1016/j.cbpa.2009.01.017.