
Domains or not domains? ShuoYong Shi, Indraneel Majumdar and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry, University of.

Jan 13, 2016


Domains or not domains?

ShuoYong Shi, Indraneel Majumdar and Nick V. Grishin

Howard Hughes Medical Institute, Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas

http://prodata.swmed.edu/CASP8


Why domains?

Traditionally, CASP targets are evaluated as domains: each target structure is parsed into domains, and model quality is computed for each domain separately. This strategy makes sense for two reasons:

1. Domains can be mobile, and their relative packing can be influenced by ligand presence, by crystal packing in X-ray structures, or can be semi-random in NMR structures. Even a perfect prediction algorithm cannot cope with this adequately, for instance without knowing whether a ligand is present or what the crystal symmetry is.

2. Predictions may be better or worse for individual domains than for their assembly. This happens when domains differ in predictability, e.g. one has a close template and the other does not. Even if the domains of a target are equally difficult to predict, the mutual domain arrangement in the target structure, while predictable in principle, may differ from the template and thus be modeled incorrectly by predictors.

Comparison of the whole-chain evaluation with the domain-based evaluation separates the problem of 'individual domain' modeling from that of 'domain assembly' modeling and should help in the development of prediction methods.


How domains?

Evolutionary domains correspond to structurally compact evolutionary modules.

Example: the Ago protein (T0487) consists of 5 domains.

http://prodata.swmed.edu/CASP8/evaluation/DomainDefinition.htm


Do we need domains?

122 targets, 176 evolutionary domains: do we need that many?

Server predictions help us to reduce the number of domains: if whole-chain prediction quality is not much different from domain prediction quality, domain evaluation is not necessary.

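GDT-TS(whole chain) vs. the length-weighted sum of GDT-TS over domains:

\[
\frac{\sum_{i=1}^{N} \mathrm{Length}(\text{domain } i) \cdot \text{GDT-TS}(\text{domain } i)}{\sum_{i=1}^{N} \mathrm{Length}(\text{domain } i)}
\]

where N is the number of domains.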

http://prodata.swmed.edu/CASP8/evaluation/Domains.htm
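The weighted sum compared against whole-chain GDT-TS can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `weighted_domain_gdt_ts` and the two-domain example values are hypothetical, and per-domain GDT-TS scores are assumed to be already computed by some external tool.

```python
from typing import Sequence


def weighted_domain_gdt_ts(lengths: Sequence[int], scores: Sequence[float]) -> float:
    """Length-weighted mean of per-domain GDT-TS scores.

    lengths[i] is the number of residues in domain i,
    scores[i] is the GDT-TS of the model evaluated on domain i alone.
    """
    if len(lengths) != len(scores) or len(lengths) == 0:
        raise ValueError("need one score per domain and at least one domain")
    total_length = sum(lengths)
    if total_length <= 0:
        raise ValueError("total domain length must be positive")
    return sum(n * s for n, s in zip(lengths, scores)) / total_length


# Hypothetical two-domain target: 100 residues at GDT-TS 80, 50 residues at GDT-TS 50.
example = weighted_domain_gdt_ts([100, 50], [80.0, 50.0])
print(example)  # (100*80 + 50*50) / 150 = 70.0
```

For a single-domain target the weighted sum reduces to that domain's own GDT-TS, so any gap from the whole-chain score then comes only from how the domain boundaries were drawn.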


T0490: correlation between whole chain and domain predictions

Correlation between the residue-weighted sum of GDT-TS scores from the domain-based evaluation (y, vertical axis) and the whole-chain GDT-TS (x, horizontal axis).


Two parameters to describe correlation between whole chain and domain predictions

Each point represents the first model submitted by a server. Green, gray and black points are the top 10 models, the bottom 25% and the remaining models, respectively. The blue line is the best-fit line through the origin (intercept 0) for the top 10 server models. The red line is the diagonal.

1. The root mean square (RMS) difference between the weighted sum of GDT-TS on domains and GDT-TS on the whole chain (RMS of y−x) measures the absolute GDT-TS difference.

2. The slope of the best-fit line with intercept set to 0 measures the relative GDT-TS difference.

These parameters are computed on the top 10 predictions (ranked by the weighted sum).
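The two parameters can be sketched as follows. This is an illustration under stated assumptions: each model is a hypothetical `(x, y)` pair with x the whole-chain GDT-TS and y the weighted domain sum, and the best-fit line is taken to be an ordinary least-squares fit through the origin, which the slides do not spell out.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x = whole-chain GDT-TS, y = weighted domain sum)


def top_n_by_weighted_sum(points: List[Point], n: int = 10) -> List[Point]:
    """Keep the n models with the highest weighted domain sum (y)."""
    return sorted(points, key=lambda p: p[1], reverse=True)[:n]


def rms_difference(points: List[Point]) -> float:
    """RMS of (y - x): the absolute GDT-TS difference."""
    return math.sqrt(sum((y - x) ** 2 for x, y in points) / len(points))


def zero_intercept_slope(points: List[Point]) -> float:
    """Least-squares slope of y = slope * x (intercept fixed at 0).

    Assumed fitting method: minimizing sum (y - slope*x)^2 gives
    slope = sum(x*y) / sum(x*x).
    """
    sxx = sum(x * x for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(x * y for x, y in points) / sxx


# Hypothetical models, not CASP8 data.
models = [(10.0, 20.0), (20.0, 40.0), (5.0, 6.0)]
top = top_n_by_weighted_sum(models, n=2)
print(zero_intercept_slope(top), rms_difference(top))
```

A slope near 1 with a small RMS means the points lie close to the diagonal, i.e. domain-based and whole-chain scores agree for the best models.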


T0504 needs domain evaluation

Correlation between the residue-weighted sum of GDT-TS scores from the domain-based evaluation (y, vertical axis) and the whole-chain GDT-TS (x, horizontal axis).


T0447 does not need domain evaluation

Correlation between the residue-weighted sum of GDT-TS scores from the domain-based evaluation (y, vertical axis) and the whole-chain GDT-TS (x, horizontal axis).


Domain swaps!

5 out of 122 targets (4%!) exhibit domain swaps, e.g. T0459.

Ribbon diagram of T0459: 3df8 chain A (rainbow) with its symmetry mate (white).


Swapped domain in T0459

Ribbon diagram of T0459: 3df8 chain A with the swapped N-terminal β-hairpin from its symmetry-mate chain (rainbow), and the hairpin-swapped symmetry-mate chain (white).


Correlation plots for the two domain definitions (swapped, and swapped segment removed) of this single-domain target reveal differences.

Panels:

Whole chain T0459: 3df8 chain A

Domain-swapped T0459: 3df8 chain B*:-2-22 plus chain A:23-106

T0459 with domain-swapped segment removed: 3df8 chain A:23-106


All targets: RMS of the difference between the weighted GDT-TS on domains and GDT-TS on the whole chain (vertical axis) plotted against the slope of the zero-intercept best-fit line (horizontal axis), both computed on the top 10 server predictions.




Summary:

Comparison of domain-based predictions with whole-chain predictions revealed a natural, data-dictated cutoff (slope of the zero-intercept best-fit line above 1.3) for selecting targets that require domain-based evaluation. These 17 targets are:

T0397, T0405, T0407, T0409, T0416, T0419, T0429, T0443, T0457, T0462, T0472, T0478, T0487, T0496, T0501, T0504, T0510.

Predictions for the other targets follow the general trend: 'domain' and 'whole chain' predictions are of more similar quality, so domain-based evaluation may not be necessary for them.
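Applying the cutoff amounts to a simple filter on the per-target slope. The sketch below assumes a hypothetical mapping from target name to its zero-intercept slope; the names and slope values are placeholders for illustration, not CASP8 results.

```python
from typing import Dict, List

SLOPE_CUTOFF = 1.3  # cutoff quoted in the summary


def targets_needing_domain_evaluation(
    slopes: Dict[str, float], cutoff: float = SLOPE_CUTOFF
) -> List[str]:
    """Return targets whose zero-intercept best-fit slope is above the cutoff."""
    return sorted(name for name, slope in slopes.items() if slope > cutoff)


# Placeholder targets and slopes (hypothetical values).
example_slopes = {"target_A": 1.05, "target_B": 1.62, "target_C": 1.30, "target_D": 2.10}
print(targets_needing_domain_evaluation(example_slopes))
```

The comparison is strict, so a slope of exactly 1.3 is not selected, matching the wording "above 1.3".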


Acknowledgement

HHMI, NIH, UTSW, The Welch Foundation

Our group: Shuoyong Shi, Jing Tong, Ruslan Sadreyev, Lisa Kinch, Jimin Pei, Ming Tang, Sasha Safronova, Yuan Qi, Hua Cheng, Jamie Wrabl, Indraneel Majumdar, Erik Nelson, Yong Wang, S. Sri Krishna, Bong-Hyun Kim, Dorothee Staber

Collaborators: David Baker (U. Washington), Kimmen Sjölander (UC Berkeley), William Noble (U. Washington)