Using Sequence Information in the Protein Docking Procedure
Dec 30, 2015
What did we want to do?
Why did we want to do that?
How did we want to do that?
What did we want to do?
Incorporate sequence and experimental information into protein-protein and protein-ligand docking procedures
Test the method on a case where two proteins are known to bind and 3D models are available for both binding partners
Why did we want to do that?
Protein-protein interactions are important in cellular processes
We therefore need accurate tools to predict such events
Methods exist that attempt to predict protein-protein docking
They are based on:
• Shape complementarity (Shoichet & Kuntz, 1996; Janin et al., 1995)
• Surface match (Helmer-Citterich & Tramontano, 1994; Walls & Sternberg, 1992)
• Electrostatics (Gabdoulline & Wade, 1998; Vijayakumar et al., 1998)
• A combination of some of the above strategies (Gabb et al., 1997)
The amount of sequence and structural data is currently increasing
There is an urgent need to combine sequence and structural information to:
• Predict protein-protein and protein-ligand docking
• Improve existing methods
• Generate new approaches
How did we want to do that?
Define the type of protein sequence to use
Identify the conserved residues in that protein family or subfamily
Collect the experimental and structural information available in the literature
Combine sequence and experimental information to select the residues used to define distance constraints in the sdabf program (Gabdoulline and Wade, 2002)
We hope to make sampling faster than it is without sequence and experimental information
We hope to obtain the correct docked structure and to avoid false positives when sequence and experimental information are used
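The step "identify the conserved residues" can be sketched as follows. This is only a minimal illustration, not the procedure used in this work: the function name conserved_positions, the threshold and the toy alignment are all assumptions made for the example, and the sequences are invented rather than real WW domain sequences.

```python
from collections import Counter


def conserved_positions(alignment, threshold=0.9):
    """Return (column index, residue, fraction) for conserved columns.

    A column counts as conserved when its most common residue occurs in at
    least `threshold` of all sequences. Gap characters ("-") are ignored
    when counting residues but still count towards the number of sequences.
    """
    n_seqs = len(alignment)
    hits = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        fraction = count / n_seqs
        if fraction >= threshold:
            hits.append((index, residue, fraction))
    return hits


# Toy alignment, invented for illustration only (equal-length, pre-aligned rows).
toy_alignment = [
    "GWEERYT",
    "GWTEHYS",
    "PWEMAYT",
    "GWQKRYD",
]

for index, residue, fraction in conserved_positions(toy_alignment, threshold=1.0):
    print(index, residue, fraction)
```

Positions returned by such a filter would still have to be checked against the experimental and structural information before being turned into distance constraints.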
The WW domain
Definition: The WW domain is a protein-protein interaction module composed of 35-40 amino acids. It forms a three-stranded antiparallel beta-sheet and is stable in the absence of disulfide bonds, cofactors or ligands
[Figure: structure of the WW domain with residues W11, Y23, F25, W34 and P37 labelled]
The domain binds proline-rich or proline-containing ligands
It is evolutionarily well conserved and present in plants, yeast, worms, flies and vertebrates
Classification of WW domains
Groups/representatives | Consensus sequence of the ligand | Representative ligands
Group I: YAP65, Nedd4, Dystrophin | PPxY | PEBP2 transcriptional activator, ENaC sodium channel, beta-dystroglycan
Group II: Formin Binding Proteins, FE65 | PPPPPPL/RP | Formin, Mena, Bat2
Group III: Formin Binding Proteins | (PxxGMxPP)N | Splicing factors: SmB, SmB', U1C
Group IV: Ess1/Pin1 | Phospho-(S/T)P | RNA Pol II, Cdc25C, p53
Group V: Npw38/PQBP-1 | Rx(x)PPGPPPxR | NpwBP
Source: http://www.bork.embl-heidelberg.de/Modules/ww_classes.html
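The ligand consensus sequences listed on this slide can be read as simple patterns. The sketch below translates four of them into regular expressions under some assumptions: "x" is taken to mean any residue, "x(x)" one or two residues, "L/R" either L or R, and "(...)N" one or more repeats. The phospho-(S/T)P motif is left out because a plain one-letter sequence does not record phosphorylation. The dictionary MOTIFS and the function matching_motifs are illustrative names, and the test peptides are invented.

```python
import re

# Consensus string as written on the slide -> regular expression (assumed reading).
MOTIFS = {
    "PPxY": r"PP.Y",
    "PPPPPPL/RP": r"PPPPPP[LR]P",
    "(PxxGMxPP)N": r"(?:P..GM.PP)+",
    "Rx(x)PPGPPPxR": r"R..?PPGPPP.R",
}


def matching_motifs(peptide):
    """Return the consensus strings whose pattern occurs anywhere in the peptide."""
    return [name for name, pattern in MOTIFS.items() if re.search(pattern, peptide)]


# Invented peptides, for illustration only.
print(matching_motifs("GSPPPYTA"))
print(matching_motifs("RSPPGPPPGR"))
```

A match only says that a peptide contains the motif; it does not by itself say that the peptide binds a WW domain.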
Function of the WW domain
WW domains bind a variety of targets
They are therefore involved in a variety of cellular processes, such as:
- Co-activation of transcription and modulation of RNA pol II
- Mitotic regulation (G2/M transition)
- Protein processing …
They are implicated in several human diseases, such as:
- Muscular dystrophy
- Alzheimer's disease
- Hypertension
- Cancer …
Differences between the free and complexed Pin1 WW domain
[Flowchart: preparation and docking workflow]
- PDB file 1f8a
- Script "pdbExtractor": file with the coordinates of the WW domain; file with the coordinates of the phosphorylated peptide
- Remove phosphates (+PO3- ; -HG of SER): coordinates of the dephosphorylated peptide
- Script "do_all_whatif": add the H-atoms with "Whatif": dephosphorylated peptide + H-atoms; p1.pdb, p2.pdb
- Script "do_bf_prepare" (UHBD, ECM, mk_ds_grid): p1e.grd, p1.echa, p1ds.grd; p2e.grd, p2.echa, p2ds.grd
- Sequence alignment and experimental information; script "rxnaEditor.py": p1.rxna, p2.rxna
- sdabf12.in (input file) + other parameters (e.g. the distance constraint)
- Script "do-run": start the simulation (sdabfcw): fort.xx* and other output files
- Clustering and cluster analysis: Nmrclust, twopdb2rmsd.f, clusteranalyse.f
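The "remove phosphates" step of the workflow can be illustrated with a short sketch. It is not the script used here. It assumes that the phosphorylated serine is stored under the residue name SEP with phosphate atoms named P, O1P, O2P and O3P, which should be checked against the actual file, and the function name dephosphorylate is hypothetical. Hydrogens, including HG of SER, are not added; in the workflow that is done afterwards in a separate step.

```python
# Assumed atom names of the phosphate group; verify against the PDB file in use.
PHOSPHATE_ATOMS = {"P", "O1P", "O2P", "O3P"}


def dephosphorylate(pdb_lines, phospho_resname="SEP", plain_resname="SER"):
    """Drop phosphate atoms and rename the phosphorylated residue.

    Uses the fixed-column PDB layout: atom name in columns 13-16 and
    residue name in columns 18-20 (string slices 12:16 and 17:20).
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20] == phospho_resname:
            if line[12:16].strip() in PHOSPHATE_ATOMS:
                continue  # skip the phosphate group
            # Remaining atoms now belong to a standard residue: write them
            # as ATOM records with the plain residue name.
            line = "ATOM  " + line[6:17] + plain_resname + line[20:]
        out.append(line)
    return out
```

Typical use would be to read the peptide file line by line, pass the list through dephosphorylate, and write the result to a new file before hydrogens are added.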
Could sampling be sped up using sequence/experimental information?
Sampling time by number of constraints
program | 0 | 1 | 2 | 3
SDABF (grid sampling) | 30.5 h | 4-6 h | 6-9 h | 8-13 h
SDA (BD sampling) | ca. 2 h | 5-8 min; 1 h (W34)? | 6 min | 1-2.5 h
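The SDABF sampling times above can be turned into speed-up factors relative to the unconstrained run (30.5 h). The sketch below is plain arithmetic on the numbers from the slide; the function name speedup_range is illustrative.

```python
def speedup_range(baseline_hours, constrained_hours_range):
    """Return (smallest, largest) speed-up factor for a range of run times."""
    fastest, slowest = constrained_hours_range
    return baseline_hours / slowest, baseline_hours / fastest


# SDABF (grid sampling): run-time ranges in hours for 1, 2 and 3 constraints.
sdabf_hours = {1: (4, 6), 2: (6, 9), 3: (8, 13)}

for n_constraints, hours in sdabf_hours.items():
    low, high = speedup_range(30.5, hours)
    print(f"{n_constraints} constraint(s): {low:.1f}x to {high:.1f}x faster")
```

With these numbers a single constraint gives roughly a 5- to 8-fold speed-up for SDABF, and the gain shrinks as more constraints are added.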
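RMSD, no constraint vs. one constraint (Y23 in the free WW domain, Y27 in the complexed WW domain)
SDABF (grid sampling)
free WW domain, constraint 0: 17.9, 19.4, 25.7, 23.8, 10.2, 25.7, 23.0, 9.5, 18.1, 18.7
free WW domain, constraint Y23: 5.5, 6.4, 8.2, 7.5, 11.5, 13.2, 9.2, 10.5, 4.2, 8.0
complexed WW domain, constraint 0: 3.6, 10.0, 9.4, 14.2, 4.2, 6.2, 10.3, 15.9, 15.0, 9.7
complexed WW domain, constraint Y27: 3.6, 14.5, 15.4, 6.0, 11.8, 12.6, 11.0, 13.4, 14.5, 13.0
SDA (BD sampling)
free WW domain, constraint 0: 18.1, 17.8, 18.5, 24.9, 19.0, 18.9, 18.9, 17.3, 25.9, 26.1
free WW domain, constraint Y23: 9.7, 12.3, 7.2, 7.1, 13.5, 9.3, 5.0, 14.5, 12.9, 12.1
complexed WW domain, constraint 0: 2.8, 2.7, 2.4, 3.0, 3.4, 3.7, 2.5, 3.0, 3.1, 3.0
complexed WW domain, constraint Y27: 2.4, 3.0, 2.4, 2.9, 2.9, 3.2, 2.8, 2.8, 3.0, 2.2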
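For readers unfamiliar with the measure, the sketch below shows the basic RMSD formula between two equally long lists of atomic coordinates. It is not the twopdb2rmsd.f program named in the workflow. The slides do not say which atoms enter the calculation or whether the structures are superimposed first, so the sketch simply compares coordinates as given, and the function name rmsd is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Atoms are paired by position in the lists; no superposition is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, each displaced by 3 units along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 0, 3)]))
```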
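RMSD, one constraint (W34 or Y23 in the free WW domain, W38 or Y27 in the complexed WW domain)
SDABF (grid sampling)
free WW domain, constraint W34: 23.8, 23.0, 25.1, 22.1, 8.8, 13.5, 20.9, 22.1, 22.1, 10.6
free WW domain, constraint Y23: 5.5, 6.4, 8.2, 7.5, 11.5, 13.2, 9.2, 10.5, 4.2, 8.0
complexed WW domain, constraint W38: 3.6, 14.2, 4.2, 10.3, 15.9, 15.0, 9.7, 10.3, 11.4, 15.4
complexed WW domain, constraint Y27: 3.6, 14.5, 15.4, 6.0, 11.8, 12.6, 11.0, 13.4, 14.5, 13.0
SDA (BD sampling)
free WW domain, constraint W34: 8.1, 24.3, 24.8, 20.5, 10.3, 25.8, 8.2, 18.4, 22.9, 22.4
free WW domain, constraint Y23: 9.7, 12.3, 7.2, 7.1, 13.5, 9.3, 5.0, 14.5, 12.9, 12.1
complexed WW domain, constraint W38: 2.4, 3.0, 3.0, 3.6, 2.8, 3.4, 4.6, 3.2, 2.8, 5.2
complexed WW domain, constraint Y27: 2.4, 3.0, 2.4, 2.9, 2.9, 3.2, 2.8, 2.8, 3.0, 2.2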
RMSD, two constraints (R17Y23 or W34Y23* in the free WW domain, R21Y27 or W38Y27 in the complexed WW domain)
SDABF (grid sampling)
free WW domain, constraints R17Y23: 7.471, 11.078, 12.859, 10.838, 13.795, 11.513, 12.543, 14.428, 13.468, 12.956
free WW domain, constraints W34Y23*: 8.794, 8.149, 10.636, 8.606, 10.505, 8.792, 7.386, 8.357, 11.018, 8.497
complexed WW domain, constraints R21Y27: 17.814, 10.815, 13.899, 16.749, 17.944, 17.677, 17.006, 13.339, 18.032, 16.458
complexed WW domain, constraints W38Y27: 3.474, 10.340, 15.363, 14.461, 11.369, 11.184, 6.723, 11.958, 10.046, 12.838
SDA (BD sampling)
7.210, 6.573, 10.384, 6.314, 9.314, 10.434, 16.834, 8.327, 10.045, 16.688
*Independent distance = 6
RMSD, three constraints (R17Y23W34 or R21Y27W38)
SDABF
constraints R17Y23W34: 17.737, 17.809, 4.221, 4.192, 4.628, 3.072, 6.713, 15.895, 4.935, 4.412
constraints R21Y27W38: 4.510, 6.674, 11.710, 15.191, 15.835, 13.062, 13.853, 10.941, 11.070, 11.476
SDA
constraints R17Y23W34: 6.214, 17.543, 4.241, 2.430, 4.945, 3.669, 15.556, 5.661, 4.266, 3.063
constraints R21Y27W38: 2.808, 2.749, 2.362, 3.041, 2.364, 3.401, 3.504, 3.656, 2.455, 3.342
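One simple way to summarise such a list of RMSD values is to report the first-listed value together with the lowest value and its position. The sketch below does this for the first ten values listed under SDA above, written with decimal points. Reading the list as an ordered set of results for one constraint set is an assumption, as the slides do not state the ordering, and the function name summarise is illustrative.

```python
def summarise(values):
    """Return (first value, lowest value, 1-based position of the lowest value)."""
    if not values:
        raise ValueError("need at least one value")
    best = min(values)
    return values[0], best, values.index(best) + 1


# First ten SDA values from the three-constraint RMSD table.
sda_values = [6.214, 17.543, 4.241, 2.430, 4.945, 3.669, 15.556, 5.661, 4.266, 3.063]

first, best, position = summarise(sda_values)
print(f"first: {first}, lowest: {best} (entry {position} of {len(sda_values)})")
```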
ENERGY, three constraints (R17Y23W34 or R21Y27W38)
SDABF
constraints R17Y23W34: -6.4284, -6.4263, -6.1933, -6.1869, -6.1372, -6.0129, -5.9385, -5.6976, -5.6498, -5.6457
constraints R21Y27W38: -13.523, -11.443, -10.171, -9.4154, -8.7401, -8.484, -7.9132, -7.5226, -7.3882, -6.8053
SDA
constraints R17Y23W34: -6.31, -5.98, -5.92, -5.8, -5.78, -5.76, -5.75, -5.74, -5.71, -5.69
constraints R21Y27W38: -16.3, -16.2, -16.0, -15.9, -15.9, -15.8, -15.8, -15.7, -15.7, -15.7
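Conclusion
RMSD, free WW domain
SDABF (grid sampling)
constraint 0: 17.875 / 9.452
constraint W34: 23.832 / 8.794
constraint Y23: 5.491 / 4.221
constraints R17Y23: 7.471
constraints W34Y23: 8.794 / 7.4
constraints R17Y23W34: 17.737 / 3.072
SDA (BD sampling)
constraint 0: 18.138 / 17.297
constraint W34: 8.117
constraint Y23: 9.714 / 5.021
constraints R17Y23 or W34Y23: 7.210 / 6.314
constraints R17Y23W34: 6.214 / 2.430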
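RMSD, docked WW domain
SDABF (grid sampling)
constraint 0: 3.586
constraint W38: 3.586
constraint Y27: 3.611
constraints R21Y27: 17.814 / 10.815
constraints W38Y27: 3.474
constraints R21Y27W38: 4.510
SDA (BD sampling)
constraint 0: 2.808 / 2.455
constraint W38: 2.362
constraint Y27: 2.362 / 2.249
constraints R21Y27W38: 2.808 / 2.362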
RUN TIME by number of constraints
program | 0 | 1 | 2 | 3
SDABF | 30.5 h | 4-6 h | 6 h 40 min | 8-13 h
SDA | ca. 2 h | 5-8 min; 1 h (W34)? | 6 min | 1-2.5 h