1
Sample Protein Functional
Peggy Yao, Jean-Claude Latombe
Biomedical Informatics, Computer Science
Stanford University
2
Conformations
3
Motivation (I)
Proteins: the major molecules that carry out our metabolic activities. They function by interacting with other molecules.
These interactions are largely determined by protein structure.
Drugs: small molecules that inhibit or facilitate the interactions between proteins and specific other molecules.
Computer-aided drug design: design drugs based on protein structural modeling.
4
Motivation (II)
Protein structures are dynamic
Conformational selection theory
Experimental techniques can provide only a few conformations, which are not necessarily functional conformations.
Need computational methods
Protein Functional
Conformation Sampling
+
5
Problem Definition
Sampling protein functional conformations
Entire conformation space
Folded states
Functional (ligand/ion-binding) states
Input: one folded state conformation
Output: one or more functional conformations
General approach: sample protein conformations, and use available function-prediction or ligand-docking methods to check the conformations.
6
Challenges and Observations
Lots of variable elements and constraints
Variable elements: hundreds of atom positions, bond lengths, bond angles, dihedral angles, etc.
Constraints: hundreds of bonds; no steric clashes
Observations: not all variable elements are truly variable, for example, those in helices and sheets.
[Figure: variable elements D1, …, Di, Dj, Dk, …, Dn]
7
Research Framework
Protein Structure Model
Sampling Method
Applications: Protein-ligand Interaction Study
Bond Constraints Model
Rigidity Analysis
Aim 1:
To develop a good way to model bonds as constraints to facilitate efficient exploration of the conformation space.
Aim 2:
To develop a sampling approach to reach the functional states efficiently.
Aim 3:
To apply the sampling method to function-prediction or ligand docking.
8
Linkage Model
Variable elements: Dihedral angles
Assumptions: fixed bond lengths, fixed bond angles
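[Figure: alanine residue within a peptide chain, showing backbone atoms N, Cα, C, and O, side-chain Cβ, hydrogens, and the dihedral angles φ, ψ, and χ]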
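With bond lengths and bond angles fixed, each atom position follows from the three preceding atoms and one dihedral angle. The sketch below illustrates that idea in plain Python; it is not taken from the slides. The function names (`dihedral`, `place_atom`) and the numeric values in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) of p0-p1-p2-p3 about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = unit(sub(p2, p1))
    b2 = sub(p3, p2)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    return math.atan2(dot(cross(b1, v), w), dot(v, w))

def place_atom(a, b, c, length, angle, torsion):
    """Position of atom d bonded to c, given the c-d bond length,
    the b-c-d bond angle, and the a-b-c-d dihedral (angles in radians)."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal of the a-b-c plane
    m = cross(n, bc)                 # completes the local frame (bc, m, n)
    x = -length * math.cos(angle)
    y = length * math.sin(angle) * math.cos(torsion)
    z = length * math.sin(angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))

if __name__ == "__main__":
    # Hypothetical coordinates and geometry, for illustration only.
    a, b, c = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    d = place_atom(a, b, c, 1.5, math.radians(110.0), math.radians(-60.0))
    print(d, math.degrees(dihedral(a, b, c, d)))
```

Changing only the `torsion` argument moves atom d on a circle around the b-c axis, which is why the dihedral angles can serve as the only variable elements in this model.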
9
Rigidity Analysis
Model bonds as a distance constraint graph.
3D pebble game: an algorithm to identify rigid regions, over-constrained regions, and collective-motion regions.
[Figure: peptide backbone fragment with atoms N, Cα, C, O, and H, with the dihedral angles φ and ψ marked]
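The pebble game itself is too involved for a short example, but the counting idea it refines can be sketched. The snippet below is plain Maxwell constraint counting, not the 3D pebble game: for N atoms and m distance constraints it returns 3N - 6 - m, which is only a lower bound on the number of internal degrees of freedom because it cannot tell redundant constraints from independent ones. The function name is hypothetical.

```python
def maxwell_internal_dof(n_atoms, n_constraints):
    """Lower bound on internal degrees of freedom by Maxwell counting.

    3 * n_atoms coordinates, minus 6 rigid-body motions, minus one per
    distance constraint. Redundant constraints make the true value larger,
    which is what a pebble-game analysis is designed to detect.
    """
    return max(3 * n_atoms - 6 - n_constraints, 0)

if __name__ == "__main__":
    # Four atoms in a chain: 3 bond lengths plus 2 bond angles
    # (each bond angle modeled as a 1-3 distance) leave one dihedral free.
    print(maxwell_internal_dof(4, 5))
    # Four atoms with all 6 pairwise distances fixed form a rigid body.
    print(maxwell_internal_dof(4, 6))
```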
10
Bond Constraint Model
Question: which bonds should we model as distance constraints?
Bond types
Covalent bonds: strong and stable
Non-covalent bonds: hydrogen bonds and hydrophobic interactions; many, weak, and dynamic
Bond Set
Essential bonds to allow the conformational change
model
11
Hydrogen Bond Selection
Learn H-bond stability from MD (molecular dynamics) simulation
Stability measurement: P(presence)
P(presence) vs. energy
Decision tree
[Figure: P(presence) vs. H-bond energy (1EIA, Amber03, MD1). x-axis: H-bond energy interval (kcal/mol), from <=-7.8 to >0; left y-axis: P(presence), 0.3 to 1; right y-axis: percentage, 0 to 0.05]
[Figure: decision tree with split attributes Angle_D_A_AA, SSE_type, Chain_type, and Range; values shown include 0.46, 0.32, 0.59, 0.86, and 0.17]
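One simple way to estimate P(presence) from an MD trajectory is the fraction of frames in which a given H-bond is detected. The sketch below assumes that definition, which the slides do not spell out. The H-bond identifiers, the toy frames, and the 0.5 threshold are all hypothetical.

```python
def presence_probability(frames):
    """Fraction of frames in which each H-bond appears.

    frames: list of sets, one per MD frame, each holding the identifiers
    of the H-bonds detected in that frame.
    """
    counts = {}
    for present in frames:
        for hbond in present:
            counts[hbond] = counts.get(hbond, 0) + 1
    n_frames = len(frames)
    return {hbond: count / n_frames for hbond, count in counts.items()}

def stable_bonds(probabilities, threshold=0.5):
    """H-bonds whose estimated P(presence) meets a hypothetical threshold."""
    return sorted(h for h, p in probabilities.items() if p >= threshold)

if __name__ == "__main__":
    # Toy data for illustration only, not taken from any simulation.
    frames = [{"hb1", "hb2"}, {"hb1"}, {"hb1", "hb3"}, {"hb1", "hb2"}]
    probabilities = presence_probability(frames)
    print(probabilities)
    print(stable_bonds(probabilities))
```

Probabilities computed this way could then be binned by H-bond energy or used as training labels for a decision tree over H-bond attributes.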
12
Conformation Sampling
Goal: starting from a folded-state conformation, efficiently sample the valid conformation space until a functional conformation is obtained.
Assumption: software exists that can recognize a conformation in the functional state.
13
Randomly-Guided Conformation Sampling Tree
[Figure: sampling tree rooted at S0 with nodes S1 to S9, random structures R1 to R9, and a region labeled Functional]
While the goal has not been reached:
1. Generate a random structure.
2. Find the tree node closest to the random structure, say node i.
3. Identify all H-bonds in node i.
4. Select a subset of the H-bonds to act as constraints, together with all covalent bonds.
5. Linearly interpolate from node i toward the random structure for 100 steps while keeping all rigid bodies intact.
6. Insert the resulting conformation into the tree as a new node (a child of node i).
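The loop above can be sketched on a toy model in which a conformation is just a tuple of dihedral angles in degrees. This is a simplified illustration, not the authors' implementation: H-bond selection and rigidity analysis (steps 3 and 4) are replaced by a hypothetical `choose_free` callback that returns the indices of the dihedrals left free to move, and `is_goal` stands in for the assumed functional-state recognizer.

```python
import math
import random

def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def angle_diff(a, b):
    """Smallest signed difference b - a in degrees."""
    return wrap(b - a)

def distance(p, q):
    """Euclidean distance between two conformations, with angle wrap-around."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

def extend(start, target, free, n_steps=100, is_valid=lambda conf: True):
    """Step 5: interpolate from start toward target in n_steps increments.

    Only dihedrals whose index is in `free` move; the others stay locked,
    a stand-in for keeping rigid bodies intact. Stops early if a step
    fails the (hypothetical) validity check, e.g. a steric clash test.
    """
    deltas = [angle_diff(s, t) / n_steps if i in free else 0.0
              for i, (s, t) in enumerate(zip(start, target))]
    current = list(start)
    for _ in range(n_steps):
        candidate = [wrap(c + d) for c, d in zip(current, deltas)]
        if not is_valid(candidate):
            break
        current = candidate
    return tuple(current)

def grow_tree(root, is_goal, choose_free, max_nodes=100, seed=0):
    """Grow the sampling tree; returns (nodes, parents, goal_index or None)."""
    rng = random.Random(seed)
    nodes, parents = [tuple(root)], [None]
    if is_goal(nodes[0]):
        return nodes, parents, 0
    while len(nodes) < max_nodes:
        target = tuple(rng.uniform(-180.0, 180.0) for _ in root)   # step 1
        i = min(range(len(nodes)),
                key=lambda k: distance(nodes[k], target))           # step 2
        free = choose_free(nodes[i], rng)                           # steps 3-4
        new_node = extend(nodes[i], target, free)                   # step 5
        nodes.append(new_node)                                      # step 6
        parents.append(i)
        if is_goal(new_node):
            return nodes, parents, len(nodes) - 1
    return nodes, parents, None

if __name__ == "__main__":
    goal = (90.0, -60.0, 120.0, 30.0)          # hypothetical goal conformation
    nodes, parents, hit = grow_tree(
        root=(0.0, 0.0, 0.0, 0.0),
        is_goal=lambda conf: distance(conf, goal) < 60.0,
        choose_free=lambda conf, rng: {k for k in range(4) if rng.random() < 0.75},
    )
    print(len(nodes), hit)
```

Storing a parent index per node keeps the tree structure explicit, so the path from the initial conformation to any sampled node can be recovered by following `parents` back to the root.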
14
Preliminary Results
Catabolite Gene Activator Protein (1G6N)
200 amino acids => more than 800 total DOFs
Generated a tree with 100 nodes.
Green: initial conformation
Cyan: goal conformation
Magenta: best achieved