Prediction of protein disorder. Zsuzsanna Dosztányi, Institute of Enzymology, Budapest, Hungary, zsuzsa@enzim.hu
Page 1: Prediction of protein disorder Zsuzsanna Dosztányi Institute of Enzymology, Budapest, Hungary zsuzsa@enzim.hu

Prediction of protein disorder

Zsuzsanna Dosztányi

Institute of Enzymology, Budapest, Hungary

zsuzsa@enzim.hu

Page 2:

Protein Structure/Function Paradigm

Dominant view: 3D structure is a prerequisite for protein function

Amino acid sequence → Structure → Function

Page 3:

But….

Heat stability

Protease sensitivity

Failed attempts to crystallize

Lack of NMR signals

“Weird” sequences

…

Page 4:

Page 5:

IDPs

Intrinsically disordered proteins/regions (IDPs/IDRs)

Do not adopt a well-defined structure in isolation under native-like conditions

Highly flexible ensembles, little secondary structure, no folded structure

Functional proteins

Page 6:

Protein disorder is prevalent

[Figure: bar chart. y-axis: proteins with long disordered regions (LDR, 40<), %, 0 to 60; x-axis: kingdom (B, A, E)]

Page 7:

Protein disorder is important

Prion protein: prion disease

CFTR: cystic fibrosis

Alzheimer’s

α-synuclein: Parkinson’s

p53, BRCA1: cancer

Page 8:

Protein disorder is functional

[Figure: bar chart. y-axis: protein (%), 0 to 80; x-axis: length of disordered region (30<, 40<, 50<, 60<); series: regulatory, signaling, biosynthetic, metabolic]

Iakoucheva et al. (2002) J. Mol. Biol. 323, 573

Page 9:

p53 tumor suppressor

TAD (transactivation), DBD (DNA-binding), TD (tetramerization), RD (regulation)

Wells et al. PNAS 2008; 105: 5762

Page 10:

Heterogeneity in protein disorder

Flexible loop

RC-like Compact

Transient structures

Page 11:

Modularity in proteins

Many proteins contain multiple domains

Composed of ordered and disordered segments

Average length of a PDB chain is < 300

Average length of a human protein ~ 500

Average length of cancer-related proteins > 900

Structural properties of full length proteins …

Page 12:

Bioinformatics of protein disorder

Part 1: Databases; prediction of protein disorder

Part 2: Prediction of functional regions within IDPs

Page 13:

Datasets

Ordered proteins in the PDB

Over 94,000 structures, a few thousand folds

Some structures in the PDB classify as disordered! They only adopt a well-defined structure in complex, in crystals, with cofactors, proteins, …

Disorder in the PDB

Missing electron density regions from the PDB

NMR structures with large structural variations

Less than 10% of all positions

Usually short (<10 residues), often at the termini

Page 14:

DisProt

www.disprot.org

Current release: 6.02

Release date: 05/24/2013

Number of proteins: 694

Number of disordered regions: 1539

Experimentally verified disordered proteins collected from literature (X-ray, NMR, CD, proteolysis, SAXS, heat stability, gel filtration, …)

Page 15:

Additional databases

Combining experiments and predictions

Genome level annotations

MobiDB: http://mobidb.bio.unipd.it

D2P2: http://d2p2.pro

IDEAL: http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL

Page 16:

Amino acid compositions

He et al. Cell Res. 2009; 19: 929

Page 17:

Sequence properties of disordered proteins

Amino acid compositional bias

High proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys)

Low proportion of bulky, hydrophobic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr)

Low sequence complexity

Signature sequences identifying disordered proteins

Protein disorder is encoded in the amino acid sequence

Page 18:

Uversky plot: charge-hydrophobicity (two parameters)

[Figure: charge-hydrophobicity plot. x-axis: mean hydrophobicity; y-axis: mean net charge]

Uversky (2002) Eur. J. Biochem. 269, 2

Page 19:

p53

Prilusky (2005) Bioinformatics 21, 3435

Making it position specific: FoldIndex http://bip.weizmann.ac.il/fldbin/findex
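The two-parameter idea can be sketched in a few lines, first for a whole sequence and then in a sliding window to make it position specific. This is a minimal sketch under stated assumptions, not the FoldIndex server's code: the Kyte-Doolittle scale rescaled to 0..1, the boundary coefficients 2.785 and 1.151 (taken from the published charge-hydropathy analysis, not from the slides), the default window of 51 and the truncation of windows at the sequence ends are all assumptions, and the function names are made up for illustration.

```python
# Kyte-Doolittle hydropathy values; rescaled to 0..1 inside fold_index().
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def fold_index(segment):
    """Positive values suggest a folded segment, negative values a disordered one."""
    mean_h = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in segment) / len(segment)
    mean_r = abs(sum(CHARGE.get(aa, 0) for aa in segment)) / len(segment)
    # Assumed boundary line in the charge-hydrophobicity plane.
    return 2.785 * mean_h - mean_r - 1.151


def fold_index_profile(seq, window=51):
    """Score every position from the window centred on it (truncated at the ends)."""
    half = window // 2
    return [fold_index(seq[max(0, i - half): i + half + 1]) for i in range(len(seq))]
```

Calling `fold_index_profile(sequence)` on a standard 20-letter sequence returns one value per residue; stretches of negative values correspond to candidate disordered segments.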

Page 20:

Disorder Prediction Methods

Amino acid propensity scales

GlobPlot

Compare the tendency of amino acids to be in coil (irregular) structure with their tendency to be in regular secondary structure elements

Linding (2003) NAR 31, 3701

Page 21:

GlobPlot

Page 22:

GlobPlot

From position specific predictions

Where are the ordered domains?

Longer disordered segments?

Noise vs. real data

Page 23:

GlobPlot: http://globplot.embl.de/

Downhill regions correspond to putative domains (GlobDom)

Uphill regions correspond to predicted protein disorder
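The uphill/downhill reading comes from summing a per-residue propensity along the sequence. The sketch below illustrates that running sum only; the propensity values are invented for the example (they are not the GlobPlot scale), and the function names are hypothetical.

```python
from itertools import accumulate

# Hypothetical propensities: positive = prefers coil/disorder,
# negative = prefers regular secondary structure. NOT the GlobPlot values.
TOY_PROPENSITY = {
    "P": 0.6, "S": 0.4, "G": 0.4, "Q": 0.3, "E": 0.2, "K": 0.2,
    "L": -0.5, "I": -0.6, "V": -0.5, "F": -0.5, "W": -0.4, "Y": -0.3,
}


def cumulative_propensity(seq, default=0.0):
    """Running sum of propensities; residues missing from the toy scale add `default`."""
    return list(accumulate(TOY_PROPENSITY.get(aa, default) for aa in seq))


def uphill_positions(curve):
    """True where the curve rises relative to the previous position (disorder-like)."""
    flags = []
    previous = 0.0
    for value in curve:
        flags.append(value > previous)
        previous = value
    return flags
```

On a real profile the raw curve is noisy, which is why smoothing and a minimum segment length are needed before calling domains or disordered segments.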

Page 24:

Disorder Prediction Methods

Physical principles

IUPred

If a residue cannot form enough favorable interactions within its sequential environment, it will not adopt a well-defined structure: it will be disordered

Dosztanyi (2005) JMB 347, 827

Page 25:

Energy description of proteins

For example: L-I interaction is frequent (hydrophobic effect)

L-I interaction energy is low (favorable)

K-R interaction is rare (electrostatic repulsion)

K-R interaction energy is high (unfavorable)

Estimation of interaction energies based on statistical potentials:

Calculated from the frequency of amino acid interactions in globular proteins alone, based on the Boltzmann hypothesis.
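Under the Boltzmann hypothesis, a contact seen more often than expected gets a negative (favorable) energy and a contact seen less often gets a positive one. A minimal sketch of that conversion follows; the contact counts are invented for illustration, and the choice of reference state (how the expected count is obtained) is left open here.

```python
import math


def contact_energy(observed, expected, kT=1.0):
    """Inverse-Boltzmann estimate: E = -kT * ln(observed / expected)."""
    return -kT * math.log(observed / expected)


# Hypothetical counts, for illustration only.
energy_LI = contact_energy(observed=300, expected=150)  # frequent contact
energy_KR = contact_energy(observed=50, expected=100)   # rare contact
```

With these toy counts the L-I energy is negative and the K-R energy is positive, matching the qualitative statements on the slide.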

Page 26:

Predicting protein disorder - IUPred

The algorithm:

…PSVEPPLSQETFSDL WKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPRVA PAPAAPTPAA...

Based only on the amino acid composition of the environment of D, we try to predict whether it is in a disordered region or not:

Amino acid composition of environment:

A – 10%

C – 0%

D – 12%

E – 10%

F – 2%

etc…

Estimate the interaction energy between the residue and its environment

Decide the probability of the residue being disordered based on this
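The steps above can be sketched as follows. This is an illustration of the idea, not the IUPred implementation: `toy_pair_energy` is a hypothetical stand-in for the real energy predictor matrix, and the 2 to 100 residue separation range used to define the "environment" is an assumption.

```python
HYDROPHOBIC = set("AVLIMFWC")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def toy_pair_energy(a, b):
    """Hypothetical pair energy (NOT the IUPred matrix): negative is favorable."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return 1.0
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return -0.5
    return 0.0


def environment_composition(seq, i, min_sep=2, max_sep=100):
    """Fraction of each residue type among sequence neighbours of position i."""
    counts = {}
    for j in range(max(0, i - max_sep), min(len(seq), i + max_sep + 1)):
        if abs(i - j) >= min_sep:
            counts[seq[j]] = counts.get(seq[j], 0) + 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def estimated_energy(seq, i):
    """Composition-weighted energy of residue i with its environment."""
    composition = environment_composition(seq, i)
    return sum(toy_pair_energy(seq[i], aa) * frac for aa, frac in composition.items())
```

A residue whose estimated energy stays high (unfavorable) is a candidate for disorder; turning that energy into a probability requires a calibration step that is not shown here.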

Page 27:

IUPred: http://iupred.enzim.hu/

Page 28:

Disorder Prediction Methods

Machine learning

DISOPRED2

Binary classification problem

Ward (2004) JMB 337, 635

Page 29:

DISOPRED2 …..AMDDLMLSPDDIEQWFTED…..

Assign label: D or O D O

F(inp)

SVM with linear kernel
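The input/label scheme can be sketched as a sliding window turned into a feature vector, followed by a linear decision function. This is a simplified illustration: it uses one-hot encoding of the raw sequence rather than sequence profiles, the window size of 15 is an assumption, and the weights would come from training, which is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(seq, center, window=15):
    """One-hot encode the residues around `center`; the extra slot marks positions off the ends."""
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:
                one_hot[idx] = 1.0
        else:
            one_hot[-1] = 1.0
        features.extend(one_hot)
    return features


def linear_decision(features, weights, bias):
    """F(inp) for a linear kernel: a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def assign_label(score, cutoff=0.0):
    """D (disordered) above the cutoff, O (ordered) otherwise."""
    return "D" if score > cutoff else "O"
```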

Page 30:

DISOPRED2

Cutoff value!
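The cutoff matters because it trades true positives against false positives. The sketch below shows this with invented scores and labels; the helper name is hypothetical.

```python
def rates_at_cutoff(scores, is_disordered, cutoff):
    """Return (true positive rate, false positive rate) when score >= cutoff means 'disordered'."""
    true_pos = sum(1 for s, d in zip(scores, is_disordered) if d and s >= cutoff)
    false_pos = sum(1 for s, d in zip(scores, is_disordered) if not d and s >= cutoff)
    positives = sum(1 for d in is_disordered if d)
    negatives = len(is_disordered) - positives
    return true_pos / positives, false_pos / negatives


# Toy per-residue scores and experimental labels, for illustration only.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_labels = [True, True, False, True, False, False]

strict = rates_at_cutoff(toy_scores, toy_labels, 0.5)
lenient = rates_at_cutoff(toy_scores, toy_labels, 0.2)
```

Lowering the cutoff recovers more of the disordered residues in this toy set but also labels more ordered residues as disordered.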

Page 31:

PONDR VSL2

Differences in short and long disorder

Amino acid composition

Methods trained on one type of dataset and tested on the other dataset resulted in lower efficiencies

PONDR VSL2: separate predictors for short and long disorder, combined

Length-independent predictions

Peng (2006) BMC Bioinformatics 7, 208
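One way to combine a short-disorder and a long-disorder predictor is a per-residue weighted average, where a third model supplies the weight. The slide only states that the two predictors are combined, so the weighting scheme and the names below are assumptions made for illustration.

```python
def combine_short_long(score_short, score_long, weight_long):
    """Per-residue blend: weight_long * long score + (1 - weight_long) * short score."""
    return [
        w * long_s + (1.0 - w) * short_s
        for short_s, long_s, w in zip(score_short, score_long, weight_long)
    ]
```

With a weight near 1 the long-disorder predictor dominates at that residue, and with a weight near 0 the short-disorder predictor does.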

Page 32:

PONDR-FIT

Sequence → disorder prediction methods (PONDR VLXT, PONDR VL3, PONDR VSL2, IUPred, FoldIndex, TopIDP) → ANN meta-predictor → prediction

Xue et al. Biochim Biophys Acta. 2010; 1804: 996
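The meta-predictor step can be pictured as a small feed-forward network that takes the per-residue scores of the component methods as input. The sketch below is a generic one-hidden-layer forward pass with invented weights; the actual network architecture and trained weights are not given in the slides.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def meta_predict(component_scores, hidden_weights, hidden_biases, out_weights, out_bias):
    """Forward pass: component scores -> hidden layer -> single disorder score in (0, 1)."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, component_scores)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


# Hypothetical weights for six component predictors and two hidden units.
TOY_HIDDEN_WEIGHTS = [[1.0] * 6, [1.0] * 6]
TOY_HIDDEN_BIASES = [-3.0, -3.0]
TOY_OUT_WEIGHTS = [2.0, 2.0]
TOY_OUT_BIAS = -2.0

low = meta_predict([0.1] * 6, TOY_HIDDEN_WEIGHTS, TOY_HIDDEN_BIASES, TOY_OUT_WEIGHTS, TOY_OUT_BIAS)
high = meta_predict([0.9] * 6, TOY_HIDDEN_WEIGHTS, TOY_HIDDEN_BIASES, TOY_OUT_WEIGHTS, TOY_OUT_BIAS)
```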

Page 33:

Complexity of protein disorder

Page 34:

Prediction of protein disorder

Disordered residues can be predicted from the amino acid sequence: ~80% accuracy at the residue level

Methods can be specific to certain types of disorder; accordingly, accuracies vary depending on datasets

Predictions are based on binary classification of disorder