Using Sequence Information Into Protein Docking Procedure


What did we want to do?

Why did we want to do that?

How did we want to do that?

What did we want to do?

Incorporate sequence and experimental information into the protein-protein or protein-ligand docking procedure

Test the method on a case where two proteins are known to bind and 3D models are available for both binding partners

Why did we want to do that?

Protein-protein interactions are important in cellular processes

We therefore need accurate tools to predict such events

Methods exist that attempt to predict protein-protein docking

They are based on:

• Shape complementarity (Shoichet & Kuntz, 1996; Janin et al., 1995)

• Surface match (Helmer-Citterich & Tramontano, 1994; Walls & Sternberg, 1992)

• Electrostatics (Gabdoulline & Wade, 1998; Vijayakumar et al., 1998)

• A combination of some of the above strategies (Gabb et al., 1997)

The amount of sequence and structural data is increasing

There is an urgent need to combine sequence and structural information to predict protein-protein/ligand docking, in order to:

• Improve existing methods

• Generate new approaches

How did we want to do that?

Define the type of protein sequence to use

Identify the conserved residues in that protein family or subfamily

Collect the experimental and structural information available in the literature

Combine sequence and experimental information to select the residues used to define distance constraints in the sdabf program (Gabdoulline and Wade, 2002)

Hope to make sampling faster than when sequence and experimental information are not used

Hope to obtain the correct docked structure and avoid false positives when sequence and experimental information are used
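The step of finding conserved residues in a protein family can be sketched as follows. This is only a minimal illustration, not the procedure used in this work: the function names, the conservation threshold and the toy alignment are all assumptions made for the example.

```python
from collections import Counter

def column_conservation(alignment):
    """Return, per alignment column, the most common residue and its frequency."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            result.append(("-", 0.0))  # column contains only gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Frequency is taken over all sequences, so gaps lower the score.
        result.append((residue, count / len(alignment)))
    return result

def conserved_positions(alignment, threshold=0.9):
    """List (1-based column, residue) pairs whose frequency meets the threshold."""
    scores = column_conservation(alignment)
    return [(i + 1, res) for i, (res, freq) in enumerate(scores) if freq >= threshold]

# Toy alignment invented for illustration; these are not real WW domain sequences.
toy_alignment = [
    "GWEAY",
    "GWKTY",
    "AWE-Y",
    "GWQSY",
]
print(conserved_positions(toy_alignment, threshold=1.0))
```

Columns that pass the threshold would then be candidates for distance constraints, to be cross-checked against the experimental information from the literature.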

WW domain alignment

The WW domain

Definition: The WW domain is a protein-protein interaction module composed of 35-40 amino acids. It has a three-stranded anti-parallel beta-sheet and is stable in the absence of disulfide bonds, cofactors or ligands

(Figure labels: W11, Y23, F25, W34, P37)

The domain binds proline-rich or proline-containing ligands

It is evolutionarily well conserved and present in plants, yeast, worm, fly and vertebrates

Classification of WW domains

Groups/representatives; consensus sequence of the ligand; representative ligands

Group I (YAP65, Nedd4, Dystrophin); PPxY; PEBP2 transcriptional activator, ENaC sodium channel, beta-dystroglycan

Group II (Formin Binding Proteins, FE65); PPPPPPL/RP; Formin, Mena, Bat2

Group III (Formin Binding Proteins); (PxxGMxPP)N; splicing factors: SmB, SmB', U1C

Group IV (Ess1/Pin1); Phospho-(S/T)P; RNA Pol II, Cdc25C, p53

Group V (Npw38/PQBP-1); Rx(x)PPGPPPxR; NpwBP

http://www.bork.embl-heidelberg.de/Modules/ww_classes.html

Function of the WW domain

Variety of targets

Therefore involved in a variety of cellular processes, such as:

- Co-activation of transcription and modulation of RNA pol II

- Mitotic regulation (G2/M transition)

- Protein processing …

Implicated in several human diseases, such as:

- Muscular dystrophy

- Alzheimer's disease

- Hypertension

- Cancer …

Differences between the free and complexed Pin1 WW domain

(Flowchart of the docking workflow)

PDB file 1f8a

Script "pdbExtractor": file with coordinates of the WW domain; file with coordinates of the phosphorylated peptide

Remove phosphates (+PO3-; -HG of SER): coordinates of the dephosphorylated peptide

Script "do_all_whatif": add the H-atoms with "Whatif": dephosphorylated peptide + H-atoms

p1.pdb, p2.pdb

Script "do_bf_prepare" (UHBD, ECM, mk_ds_grid): p1e.grd, p1.echa, p1ds.grd; p2e.grd, p2.echa, p2ds.grd

Sequence alignment, experimental information

Script "rxnaEditor.py": p1.rxna, p2.rxna

sdabf12.in (input file) + other parameters (e.g. the distance constraint)

Script "do-run": start the simulation (sdabfcw)

fort.xx* and other output files

Clustering and cluster analysis: Nmrclust, twopdb2rmsd.f, clusteranalyse.f
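The "remove phosphates" step of the workflow could look roughly like the sketch below. It is not the script used in this work. It assumes that the phosphorylated residues are stored as SEP or TPO records with phosphate atoms named P, O1P, O2P and O3P, and the function name and toy records are invented for the example.

```python
# Atom names of the phosphate group; assumed to follow the PDB naming for
# phosphoserine (SEP) and phosphothreonine (TPO) residues.
PHOSPHATE_ATOMS = {"P", "O1P", "O2P", "O3P"}
PARENT_RESIDUE = {"SEP": "SER", "TPO": "THR"}

def dephosphorylate(pdb_lines):
    """Drop phosphate atoms and rename phosphorylated residues to their parents."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()   # PDB columns 13-16
            res_name = line[17:20]            # PDB columns 18-20
            if res_name in PARENT_RESIDUE:
                if atom_name in PHOSPHATE_ATOMS:
                    continue  # skip the phosphate group
                # Rewrite the record as a standard ATOM line with the parent name.
                line = "ATOM  " + line[6:17] + PARENT_RESIDUE[res_name] + line[20:]
        kept.append(line)
    return kept

# Toy records invented for illustration; coordinates are not taken from 1f8a.
toy_peptide = [
    "HETATM    1  CB  SEP B   5      10.000  10.000  10.000  1.00 20.00           C",
    "HETATM    2  OG  SEP B   5      11.400  10.000  10.000  1.00 20.00           O",
    "HETATM    3  P   SEP B   5      12.300  11.300  10.000  1.00 20.00           P",
    "HETATM    4  O1P SEP B   5      13.700  10.900  10.000  1.00 20.00           O",
    "ATOM      5  N   PRO B   6      14.000  14.000  14.000  1.00 20.00           N",
]
for record in dephosphorylate(toy_peptide):
    print(record)
```

The sketch only removes heavy atoms. Hydrogen atoms would still have to be added in a separate step, as the workflow does with the "do_all_whatif" script.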

Could sampling be sped up using sequence/experimental information?

Sampling time by number of constraints

Constraints: 0; 1; 2; 3

SDABF (grid sampling): 30.5 h; 4-6 h; 6-9 h; 8-13 h

SDA (BD sampling): ca. 2 h; 5-8 min, 1 h (W34)?; 6 min; 1-2.5 h

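(Figure panels: E12 = Ecoul; E12 = Ecoul + Edes; E12 = Ecoul + Edes + Ehyd; labels: ComplexCen, FreeCen, Y23, Y27)

RMSD by program and constraint

SDABF (grid sampling)

free WW domain, constraint 0: 17.9, 19.4, 25.7, 23.8, 10.2, 25.7, 23.0, 9.5, 18.1, 18.7

free WW domain, constraint Y23: 5.5, 6.4, 8.2, 7.5, 11.5, 13.2, 9.2, 10.5, 4.2, 8.0

complexed WW domain, constraint 0: 3.6, 10.0, 9.4, 14.2, 4.2, 6.2, 10.3, 15.9, 15.0, 9.7

complexed WW domain, constraint Y27: 3.6, 14.5, 15.4, 6.0, 11.8, 12.6, 11.0, 13.4, 14.5, 13.0

SDA (BD sampling)

free WW domain, constraint 0: 18.1, 17.8, 18.5, 24.9, 19.0, 18.9, 18.9, 17.3, 25.9, 26.1

free WW domain, constraint Y23: 9.7, 12.3, 7.2, 7.1, 13.5, 9.3, 5.0, 14.5, 12.9, 12.1

complexed WW domain, constraint 0: 2.8, 2.7, 2.4, 3.0, 3.4, 3.7, 2.5, 3.0, 3.1, 3.0

complexed WW domain, constraint Y27: 2.4, 3.0, 2.4, 2.9, 2.9, 3.2, 2.8, 2.8, 3.0, 2.2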

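(Figure labels: Y23, W34; Y27, W38)

RMSD by program and constraint

SDABF (grid sampling)

constraint W34: 23.8, 23.0, 25.1, 22.1, 8.8, 13.5, 20.9, 22.1, 22.1, 10.6

constraint Y23: 5.5, 6.4, 8.2, 7.5, 11.5, 13.2, 9.2, 10.5, 4.2, 8.0

constraint W38: 3.6, 14.2, 4.2, 10.3, 15.9, 15.0, 9.7, 10.3, 11.4, 15.4

constraint Y27: 3.6, 14.5, 15.4, 6.0, 11.8, 12.6, 11.0, 13.4, 14.5, 13.0

SDA (BD sampling)

constraint W34: 8.1, 24.3, 24.8, 20.5, 10.3, 25.8, 8.2, 18.4, 22.9, 22.4

constraint Y23: 9.7, 12.3, 7.2, 7.1, 13.5, 9.3, 5.0, 14.5, 12.9, 12.1

constraint W38: 2.4, 3.0, 3.0, 3.6, 2.8, 3.4, 4.6, 3.2, 2.8, 5.2

constraint Y27: 2.4, 3.0, 2.4, 2.9, 2.9, 3.2, 2.8, 2.8, 3.0, 2.2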

RMSD by program and constraint

SDABF (grid sampling)

constraint R17Y23: 7.471, 11.078, 12.859, 10.838, 13.795, 11.513, 12.543, 14.428, 13.468, 12.956

constraint W34Y23*: 8.794, 8.149, 10.636, 8.606, 10.505, 8.792, 7.386, 8.357, 11.018, 8.497

constraint R21Y27: 17.814, 10.815, 13.899, 16.749, 17.944, 17.677, 17.006, 13.339, 18.032, 16.458

constraint W38Y27: 3.474, 10.340, 15.363, 14.461, 11.369, 11.184, 6.723, 11.958, 10.046, 12.838

SDA (BD sampling): 7.210, 6.573, 10.384, 6.314, 9.314, 10.434, 16.834, 8.327, 10.045, 16.688

*Independent distance = 6
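To make the idea of a distance constraint concrete, the sketch below checks whether a docked pose keeps selected residue pairs within a cutoff. It does not reproduce how sdabf applies its constraints. The function names, the residue label "P3", the coordinates and the cutoff of 6.0 (units assumed to match the coordinates) are all illustrative assumptions.

```python
import math

def min_distance(atoms_a, atoms_b):
    """Smallest pairwise distance between two sets of (x, y, z) points."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)

def satisfies_constraints(receptor, ligand, constraints, cutoff=6.0):
    """True if every (receptor residue, ligand residue) pair lies within cutoff."""
    return all(
        min_distance(receptor[rec_res], ligand[lig_res]) <= cutoff
        for rec_res, lig_res in constraints
    )

# Hypothetical coordinates: one atom per residue, chosen only for illustration.
receptor = {"Y23": [(0.0, 0.0, 0.0)], "W34": [(8.0, 0.0, 0.0)]}
pose_near = {"P3": [(3.0, 4.0, 0.0)]}    # 5.0 away from Y23
pose_far = {"P3": [(0.0, 0.0, 20.0)]}    # 20.0 away from Y23
constraints = [("Y23", "P3")]

print(satisfies_constraints(receptor, pose_near, constraints))  # True
print(satisfies_constraints(receptor, pose_far, constraints))   # False
```

Adding a second pair such as ("W34", "P3") makes the filter stricter, because every listed pair has to be satisfied at once.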

(Figure panels: R17Y23W34 and R21Y27W38, grid sampling; R17Y23W34 and R21Y27W38, BD sampling)

RMSD by program and constraint

SDABF

constraint R17Y23W34: 17.737, 17.809, 4.221, 4.192, 4.628, 3.072, 6.713, 15.895, 4.935, 4.412

constraint R21Y27W38: 4.510, 6.674, 11.710, 15.191, 15.835, 13.062, 13.853, 10.941, 11.070, 11.476

SDA

constraint R17Y23W34: 6.214, 17.543, 4.241, 2.430, 4.945, 3.669, 15.556, 5.661, 4.266, 3.063

constraint R21Y27W38: 2.808, 2.749, 2.362, 3.041, 2.364, 3.401, 3.504, 3.656, 2.455, 3.342

ENERGY by program and constraint

SDABF

constraint R17Y23W34: -6.4284, -6.4263, -6.1933, -6.1869, -6.1372, -6.0129, -5.9385, -5.6976, -5.6498, -5.6457

constraint R21Y27W38: -13.523, -11.443, -10.171, -9.4154, -8.7401, -8.484, -7.9132, -7.5226, -7.3882, -6.8053

SDA

constraint R17Y23W34: -6.31, -5.98, -5.92, -5.8, -5.78, -5.76, -5.75, -5.74, -5.71, -5.69

constraint R21Y27W38: -16.3, -16.2, -16.0, -15.9, -15.9, -15.8, -15.8, -15.7, -15.7, -15.7

Conclusion

RMSD, free WW domain

Constraints: 0, W34, Y23, R17Y23, W34Y23, R17Y23W34

SDABF (grid sampling): 17.875 9.452; 23.832 8.794; 5.491 4.221; 7.471 8.794 7.4; 17.737 3.072

SDA (BD sampling): 18.138 17.297; 8.117 9.714 5.021; 7.210 6.314; 6.214 2.430

RMSD, docked WW domain

Constraints: 0, W38, Y27, R21Y27, W38Y27, R21Y27W38

SDABF (grid sampling): 3.586 3.586 3.611 17.814; 10.815; 3.474 4.510

SDA (BD sampling): 2.808 2.455; 2.362 2.362 2.249; 2.808 2.362
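For reference, an RMSD between a docked pose and a reference structure can be computed as in the sketch below. It assumes that both coordinate lists contain the same atoms in the same order and are already in the same frame (no superposition is done). The function name and the toy coordinates are illustrative and do not come from the programs used here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy coordinates invented for illustration: the pose is the reference
# shifted by (0, 3, 4), so every atom is displaced by exactly 5.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0)]
print(rmsd(reference, docked))
```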

RUN TIME

Constraints: 0; 1; 2; 3

SDABF: 30.5 h; 4-6 h; 6 h 40 min; 8-13 h

SDA: ca. 2 h; 5-8 min, 1 h (W34)?; 6 min; 1-2.5 h

Thanks to

All the MCM group members

Special thanks to

Rebecca Wade

Razif Gabdoulline

Jan Lac and Ting Wang
