The Inverse Protein Folding Problem*
Arvind Gupta
Simon Fraser University
May 24, 2005
*Joint work with J. Manuch, C. Mead, L. Stacho, B. Bhattacharyya, X. Huang
Canada-China Industrial Workshop, 2005, Hong Kong Baptist University
Outline
• Background
• Forces in Protein Folding
• Hydrophobic-Polar Model
• Protein Databank
• Determining Attributes of the Ideal Lattice
• Future Steps
DNA
• Genetic code
• A “string” of nucleotides over A, C, G, T
• Codes for all proteins
• Self-replicating
Proteins
• A “string” over 20 amino acids
• In solvent, will fold into a unique 3D spatial structure with minimal energy
Protein Structure
• Structure determines protein function.
• Proteins are normally in an aqueous environment.
• Proteins are globular.
Proteins in the body
• Proteins are involved in all processes in the body:
Insulin
Hemoglobin
Proteins and diseases
M. Thorpe, Protein Folding, HIV and Drug Design, Physics and Technology Forefronts (2003).
Forward Protein Folding Problem
• Identify the protein structure for a specific amino acid sequence.
MAGWTRLS..
• Central open problem in biology
• NP-hard under most models
Inverse Protein Folding Problem
• Given a structure (or a functionality), identify an amino acid sequence whose fold will be that structure (exhibit that functionality).
• Crucial problem in drug design.
• NP-hard under most models.
Forces acting on Proteins
• Hydrogen Bonding
• Van der Waals interactions
• Ion pairing
• Disulfide bonds
• Intrinsic properties (conformational preference)
• Hydrophobicity: the dominant force in protein folding (Dill, 1990)
Hydro (water), philic (loving), phobic (fearing)
Hydrophobic Interactions
• Each amino acid can be classified as either hydrophobic or hydrophilic (polar)
• Hydrophobic [polar] amino acids are in a higher [lower] energy state in an aqueous environment.
Hydrophobic – Polar (HP) Model
• Introduced by Dill (1985) and Chan (1985)
• “0” for polar; “1” for hydrophobic
• Protein sequence embedded on a lattice
• Each amino acid in exactly one cell
• Interactions across adjacent cells
• Empty lattice cells contain water
• Given a protein, maximize hydrophobic interactions (native fold).
• I.e., given a 0-1 string, embed it onto a lattice so that the number of adjacent 1s is maximized.
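The objective in the HP model can be made concrete with a small brute-force sketch. It assumes a 2D square lattice and counts a hydrophobic interaction as a pair of 1s on adjacent cells that are not consecutive in the chain; the function names `count_contacts` and `max_contacts` are illustrative and do not come from the talk. The search is exponential, so it is only usable for short strings.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def count_contacts(seq: str, fold: List[Coord]) -> int:
    """Count pairs of 1s on adjacent cells that are not chain neighbours."""
    index: Dict[Coord, int] = {cell: i for i, cell in enumerate(fold)}
    contacts = 0
    for i, (x, y) in enumerate(fold):
        if seq[i] != "1":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and seq[j] == "1":
                contacts += 1
    return contacts


def max_contacts(seq: str) -> int:
    """Exhaustively search self-avoiding walks for the best contact count."""
    best = 0
    fold: List[Coord] = [(0, 0)]
    used = {(0, 0)}

    def extend() -> None:
        nonlocal best
        if len(fold) >= len(seq):
            best = max(best, count_contacts(seq, fold))
            return
        x, y = fold[-1]
        # Fixing the first step removes rotational symmetry.
        moves = MOVES[:1] if len(fold) == 1 else MOVES
        for dx, dy in moves:
            cell = (x + dx, y + dy)
            if cell not in used:
                used.add(cell)
                fold.append(cell)
                extend()
                fold.pop()
                used.remove(cell)

    extend()
    return best
```

Under these assumptions, a native fold of a string is any embedding whose contact count equals `max_contacts(seq)`.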
• Problem: For a given shape find a protein (amino acid string) with a native fold approximating the shape.
• Example.
Constructible structures
Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S.
• Proof by induction:
– Base case: p(S)=010010010010
– Inductive case:
• Proof:
– Folds are saturated: every hydrophobic “1” is involved in two hydrophobic interactions
– Saturated implies native
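One way to see why a saturated fold is native, assuming a 2D square lattice: a non-terminal residue has two of its four neighbouring cells taken by its chain neighbours, so it can take part in at most two interactions, and a fold in which every 1 reaches that bound cannot be beaten. The sketch below checks saturation for the base-case string. The coordinates in `BASE_FOLD` are a hypothetical reconstruction (a 2x2 core of 1s with a pair of 0s looped around each side), since the figure is not part of this transcript.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]

# Hypothetical fold for the base case (reconstructed, not from the slides).
BASE_SEQ = "010010010010"
BASE_FOLD: List[Coord] = [
    (-1, 0), (0, 0), (0, -1), (1, -1), (1, 0), (2, 0),
    (2, 1), (1, 1), (1, 2), (0, 2), (0, 1), (-1, 1),
]


def hydrophobic_degrees(seq: str, fold: List[Coord]) -> List[int]:
    """For each 1 in chain order, count adjacent 1s that are not chain neighbours."""
    index: Dict[Coord, int] = {cell: i for i, cell in enumerate(fold)}
    degrees = []
    for i, (x, y) in enumerate(fold):
        if seq[i] != "1":
            continue
        degree = 0
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "1":
                degree += 1
        degrees.append(degree)
    return degrees


def is_saturated(seq: str, fold: List[Coord]) -> bool:
    """True if every hydrophobic 1 is involved in exactly two interactions."""
    return all(d == 2 for d in hydrophobic_degrees(seq, fold))
```

For example, `is_saturated(BASE_SEQ, BASE_FOLD)` holds for the reconstructed fold, while a fully stretched-out chain is not saturated.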
Stability of proteins
• A protein is stable if it has a unique “native fold” (a fold with minimal energy).
• Most natural proteins are stable.
• The protein in our example is not stable: together, it has 82 native folds!
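Stability in this sense can be tested by exhaustive enumeration for short strings. The sketch below assumes a 2D square lattice and counts folds up to rotation and reflection (first step fixed to the east, first turn fixed to the north). That counting convention and the name `native_folds` are assumptions, so the counts it produces need not match the conventions behind the figures quoted in the talk.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
EAST, NORTH = (1, 0), (0, 1)
MOVES = [EAST, NORTH, (-1, 0), (0, -1)]


def native_folds(seq: str) -> Tuple[int, int]:
    """Return (maximum contacts, number of folds reaching it), up to symmetry."""
    best, count = -1, 0
    fold: List[Coord] = [(0, 0)]
    used: Dict[Coord, int] = {(0, 0): 0}

    def contacts() -> int:
        total = 0
        for (x, y), i in used.items():
            if seq[i] != "1":
                continue
            for dx, dy in MOVES:
                j = used.get((x + dx, y + dy))
                if j is not None and j > i + 1 and seq[j] == "1":
                    total += 1
        return total

    def extend(turned: bool) -> None:
        nonlocal best, count
        if len(fold) >= len(seq):
            c = contacts()
            if c > best:
                best, count = c, 1
            elif c == best:
                count += 1
            return
        if len(fold) == 1:
            allowed = [EAST]          # removes rotations
        elif not turned:
            allowed = [EAST, NORTH]   # removes the mirror image
        else:
            allowed = MOVES
        x, y = fold[-1]
        for move in allowed:
            cell = (x + move[0], y + move[1])
            if cell in used:
                continue
            used[cell] = len(fold)
            fold.append(cell)
            extend(turned or move == NORTH)
            fold.pop()
            del used[cell]

    extend(False)
    return best, count


def is_stable(seq: str) -> bool:
    """A string is stable if exactly one fold reaches the maximum."""
    return native_folds(seq)[1] == 1
```

The string `1001` is stable under this convention because only the U-shaped fold brings its two 1s together, whereas `0000` has five equally good folds.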
Stability of proteins
Conjecture: For any constructible structure S, the protein p(S) is stable.
• Tested for >20,000 constructible structures.
• Mathematically proved for two simple infinite classes of constructible structures, L0 and L1.
(Figures: example L0 and L1 structures.)
Boundary squares
• Diagonal frame: the smallest diagonal rectangle containing all hydrophobic “1”s.
• Boundary square: a hydrophobic “1” lying on the border of the diagonal frame.
(Figure: example fold with 5 boundary squares.)
• Useful for finding the last tile of a constructible structure.
• A saturated fold has at least 4 of them.
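The two definitions can be sketched in code. This assumes that a "diagonal rectangle" is one whose sides run at 45 degrees to the lattice axes, so that it is an axis-aligned box in the rotated coordinates u = x + y and v = x - y. The function name and this reading of the definition are assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def boundary_squares(seq: str, fold: List[Coord]) -> List[Coord]:
    """Return the hydrophobic cells lying on the border of the diagonal frame."""
    ones = [cell for s, cell in zip(seq, fold) if s == "1"]
    if not ones:
        return []
    # In rotated coordinates the diagonal frame is a plain bounding box.
    us = [x + y for x, y in ones]
    vs = [x - y for x, y in ones]
    u_border = (min(us), max(us))
    v_border = (min(vs), max(vs))
    return [
        (x, y) for x, y in ones
        if x + y in u_border or x - y in v_border
    ]
```

For a 2x2 core of 1s, such as the four 1s of 010010010010 folded around a square, all four hydrophobic cells are boundary squares, which is the minimum stated for a saturated fold.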
Lemma. Let p=0{0,1}*0 be a protein string containing none of 11, 000, and 10101 as a substring. For every saturated fold of p, each boundary square not adjacent to a terminal is the main square of a corner-closed core.
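The hypotheses of the Lemma on the string p are purely syntactic and easy to check mechanically. A minimal sketch, with an illustrative function name:

```python
FORBIDDEN = ("11", "000", "10101")


def satisfies_lemma_conditions(p: str) -> bool:
    """Check that p has the form 0{0,1}*0 and avoids 11, 000 and 10101."""
    return (
        len(p) >= 2
        and set(p) <= {"0", "1"}
        and p[0] == "0"
        and p[-1] == "0"
        and not any(pattern in p for pattern in FORBIDDEN)
    )
```

The base-case string 010010010010 passes this check, while any string containing 10101 does not.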
Proof for L0 structures
• Take a saturated fold of p(S), for S in L0.
• It has at least 4 boundary squares, and at least 2 not adjacent to a terminal (the first or the last amino acid).
• By the Lemma, each is contained in a corner-closed core, i.e., is a red 1 of a substring 1001001 of the protein string.
• In p(S)=0(10010)^n(01001)^n0, there are only two occurrences of the substring 1001001, and they overlap.
• Hence, the cores match each other and form a fully-closed core (closed on 3 sides): the last tile.
• Cut the last tile and apply induction.
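The counting step in this proof can be checked directly for small n. The sketch below builds p(S)=0(10010)^n(01001)^n0 and lists every (possibly overlapping) occurrence of 1001001; the helper names are illustrative.

```python
from typing import List


def p_l0(n: int) -> str:
    """Build the L0 string 0(10010)^n(01001)^n0."""
    return "0" + "10010" * n + "01001" * n + "0"


def occurrences(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence, overlapping ones included."""
    return [
        i for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    ]
```

For n = 1 the string is 010010010010, and the two occurrences of 1001001 start three positions apart, so they share the block 1001.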
L1 structures are more complex
• p(S)=0(10010)^n010(10010)^m(01001)^m01(01001)^(n-1)0
• p(S) contains one occurrence of the substring 10101 (so the Lemma cannot be directly applied) and three occurrences of 1001001 (two corner-closed cores do not imply a fully-closed core).
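The two counts for L1 strings can be reproduced for small parameters. The sketch below builds the string from the formula and counts overlapping occurrences. The function names are illustrative, and the parameter range n >= 2, m >= 1 in the usage note is an assumption: for n = 1 the final block (01001)^(n-1) is empty and the count of 10101 comes out as zero.

```python
def p_l1(n: int, m: int) -> str:
    """Build 0(10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0."""
    return (
        "0" + "10010" * n + "010" + "10010" * m
        + "01001" * m + "01" + "01001" * (n - 1) + "0"
    )


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    return sum(
        1 for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )
```

For small values with n >= 2 and m >= 1, `count_overlapping(p_l1(n, m), "10101")` gives 1 and `count_overlapping(p_l1(n, m), "1001001")` gives 3, matching the counts stated above.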
Choosing a Lattice
• 2D is easier
– Fewer options for combinatorial case analysis
– More visually intuitive
– Torsion angles describe protein mainchain
• 3D is more relevant
– More biologically relevant
– More representative of actual protein structures
– Directly applicable to known protein structures
Protein Data Bank (PDB)
• Worldwide repository for 3-D biological macromolecular structure data
• Contains 30857 known protein structures (May 17, 2005)
• Structures derived using different techniques
– Nuclear Magnetic Resonance spectroscopy
– X-ray crystallography
• PDB ‘known structures’ are really models of the structure of a protein
RMS comparison of lattices
Lattice                  c-RMS    d-RMS    a-RMS
Truncated Octahedron     5.3053   3.2479   13.0982
Hexagonal Prism          3.8704   2.4312   10.0313
Truncated Tetrahedron    3.6913   2.4133   19.9030
Simple Cubic             3.1123   2.1081   21.1005
Cuboctahedron            2.5581   1.7427    8.3526
FCC                      1.8212   1.4369    8.3346
S+FCC                    2.1791   1.5819    6.2022
e-FCC                    1.5385   1.1048    2.5700
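The slides do not define the three error measures. Assuming that d-RMS denotes the usual distance RMS deviation (the root-mean-square difference between corresponding pairwise distances in the original structure and in its lattice approximation), it could be computed as in the sketch below. The function name and the choice to average over all pairs i < j are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def d_rms(original: Sequence[Point], fitted: Sequence[Point]) -> float:
    """Distance RMS deviation between two equal-length lists of 3D points."""
    n = len(original)
    if n != len(fitted) or n < 2:
        raise ValueError("need two point lists of equal length >= 2")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d_orig = math.dist(original[i], original[j])
            d_fit = math.dist(fitted[i], fitted[j])
            total += (d_orig - d_fit) ** 2
            pairs += 1
    return math.sqrt(total / pairs)
```

Because it compares only internal distances, this measure is unchanged by translating or rotating either structure, so no superposition step is needed.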
Angle comparison of lattices
Lattice               Degree   Closeness to 90   Closeness to 120
Trunc. octahedron     4        20                10
Hexagonal prism       5        18                24
Trunc. tetrahedron    6        42                36
Cubic                 6        18                36
Cuboctahedron         8        30                34.29
FCC                   12       30                32.73
S+FCC                 18       28.82             36.47
e-FCC                 42       31.40             38.72
Future
1. Investigate candidate lattices to determine an ideal lattice for inverse protein folding
2. Mathematically prove that the ideal lattice can generate stable sequences for specified protein shapes within the HP model
3. Attempt to assign specific amino acids to lattice sites
4. Investigate protein sequences generated by the model for stability and folding properties.
5. Incorporate other protein folding forces
– Hydrogen Bonding
– Van der Waals interactions
– Intrinsic properties (conformational preference)
– Ion pairing
– Disulfide bonds