Basics of protein structure and modeling Rui Alves
Transcript
Page 1: Basics of protein structure and modeling Rui Alves.

Basics of protein structure and modeling

Rui Alves

Page 2

MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA

atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa

augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaaccgaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggaguaaccaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuugguuacaguacggaacuaaacgcgguaaagccgcuuaa

Proteins are the primary functional manifestation of genomes

DNA sequence → (transcription) → RNA sequence → (translation) → protein sequence → protein structure → protein function

Being able to predict the protein sequence from the gene sequence allows us to predict its structure, which in turn helps us understand how the protein does what it does.
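The DNA → RNA → protein flow can be checked on the example gene at the top of this slide. The minimal Python sketch below assumes the DNA shown is the coding strand, written 5'→3' and in frame from the ATG; it transcribes by replacing T with U and translates with the standard genetic code (NCBI translation table 1) until the first stop codon. The function names are illustrative, not taken from any library.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order, so the 64 one-letter codes below line up with them ("*" = stop).
BASES = "tcag"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = dict(zip(CODONS, STANDARD_AAS))


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: the same sequence with T replaced by U."""
    return dna.lower().replace("t", "u")


def translate(dna: str, code: dict = STANDARD_CODE) -> str:
    """Translate codon by codon from the first base until the first stop codon."""
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Gene and protein sequences copied from the slide.
DNA = "atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa"
PROTEIN = "MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA"

print(transcribe(DNA)[:12])       # augcaaacucuu
print(translate(DNA) == PROTEIN)  # True
```

Stopping at the first stop codon mirrors what the ribosome does; the trailing `taa` on the slide is the stop codon, which is why the protein has 71 residues for 72 codons.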

Page 3

Outline

• DNA sequence to protein sequence

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Page 4

Predicting protein sequence from DNA sequence

• Protein sequence can be predicted by translating the cDNA and using the genetic code.

Page 5

Translating cDNA to protein

ATG TCT CTT ATA TGA…

Met Ser Leu Ile Ter

No gene! The reading frame hits a stop codon (TGA, Ter) right after the fourth codon.

Page 6

Translating cDNA to Protein

Page 7

Translating yeast mitochondrial cDNA into protein sequence

ATG TCT CTT ATA TGA ……… SECIS sequence

Alternative reading: Met Ser Thr Met SeCys

Universal genetic code: Met Ser Leu Ile Ter

There is a gene, with a protein sequence considerably different from the one we would predict from the universal genetic code!
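The two readings differ only in the genetic code applied. As a sketch, the code below translates the slide's first five codons with the standard code (NCBI table 1) and with the yeast mitochondrial code (NCBI table 3, where CTN codes for Thr, ATA for Met and TGA for Trp). Note that table 3 reads the final TGA as Trp: the selenocysteine on the slide depends on the downstream SECIS element, which a plain codon table cannot capture. Function and variable names are illustrative.

```python
BASES = "tcag"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# One-letter amino acids for the 64 codons in TCAG order ("*" = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"    # NCBI table 1
YEAST_MITO = "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # NCBI table 3

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def translate_three_letter(dna, code_letters):
    """Translate every complete codon (stops included, shown as Ter)."""
    code = dict(zip(CODONS, code_letters))
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(THREE_LETTER[code[dna[i:i + 3]]] for i in range(0, usable, 3))


cdna = "ATGTCTCTTATATGA"
print(translate_three_letter(cdna, STANDARD))    # MetSerLeuIleTer
print(translate_three_letter(cdna, YEAST_MITO))  # MetSerThrMetTrp
```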

Page 8

Outline

• DNA sequence to protein sequence

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Page 9

Amino acids are the primary building blocks of proteins

• The sequence of AAs is the primary structure of proteins

• Sequence determines structure

• Amino acids don't fall neatly into classes

• How we casually speak of them can affect the way we think about their behavior. For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein, not paired with any other polar group. But this does happen.

• The properties of a residue type can also vary with conditions/environment

Page 10

Grouping the amino acids by properties

Livingstone & Barton, CABIOS, 9, 745-756, 1993.

Page 11

Proteins are made by controlled polymerization of amino acids

Two amino acids condense to form a dipeptide: a peptide bond is formed and water is eliminated.

H2N-CH(R1)-COOH + H2N-CH(R2)-COOH → H2N-CH(R1)-CO-NH-CH(R2)-COOH + H2O

In the product, residue 1 carries the N or amino terminus and residue 2 the C or carboxy terminus. If there are more residues, the chain becomes a polypeptide. Short polypeptide chains are usually called peptides, while longer ones are called proteins.
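Because each peptide bond releases one water molecule, a linear chain of n amino acids has the summed composition of its free amino acids minus (n - 1) H2O. A small Python sketch of that bookkeeping, using molecular formulas for a handful of amino acids and standard average atomic masses; only the residues listed are supported, and the function names are illustrative.

```python
from collections import Counter

# Molecular formulas of a few free amino acids (one-letter code -> element counts).
FREE_AA = {
    "G": {"C": 2, "H": 5, "N": 1, "O": 2},
    "A": {"C": 3, "H": 7, "N": 1, "O": 2},
    "S": {"C": 3, "H": 7, "N": 1, "O": 3},
    "C": {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1},
    "V": {"C": 5, "H": 11, "N": 1, "O": 2},
    "L": {"C": 6, "H": 13, "N": 1, "O": 2},
    "M": {"C": 5, "H": 11, "N": 1, "O": 2, "S": 1},
}
WATER = {"H": 2, "O": 1}
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def peptide_formula(seq):
    """Element counts of a linear peptide: free amino acids minus one H2O per peptide bond."""
    total = Counter()
    for aa in seq:
        total.update(FREE_AA[aa])
    for element, n in WATER.items():
        total[element] -= n * (len(seq) - 1)
    return dict(total)


def average_mass(formula):
    """Average molecular mass (g/mol) from element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


gly_ala = peptide_formula("GA")
print(gly_ala)                          # {'C': 5, 'H': 10, 'N': 2, 'O': 3}
print(round(average_mass(gly_ala), 2))  # 146.15
```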

Page 12

Repeating torsion angles

φ/ψ angles characterize the secondary structure
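Each φ or ψ value is a dihedral (torsion) angle defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A dependency-free Python sketch of the usual dihedral calculation from Cartesian coordinates (IUPAC sign convention, result in degrees between -180 and 180); the toy coordinates are made up for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along z, outer bonds along x and y.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```

For φ one would pass the C(i-1), N, Cα and C coordinates of residue i, and for ψ the N, Cα, C of residue i plus N(i+1).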

Page 13

Secondary structure elements in proteins

beta-strand (nonlocal interactions)

alpha-helix (local interactions)

A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles

Secondary structure elements reflect the tendency of the backbone to hydrogen-bond with itself in a semi-ordered fashion when compacted.

Page 14

Principal types of secondary structure found in proteins

Repeating (φ, ψ) values:

• α-helix, right-handed (i, i+4 H-bonds): φ = -63°, ψ = -42°

• 3₁₀ helix (i, i+3 H-bonds): φ = -57°, ψ = -30°

• Parallel β-sheet: φ = -119°, ψ = +113°

• Antiparallel β-sheet: φ = -139°, ψ = +135°
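The repeating (φ, ψ) pairs above can serve as reference points. As a toy illustration only (this is not how DSSP or any published method assigns secondary structure), the sketch labels a residue by the nearest reference pair, measuring angular differences with wrap-around at ±180°. The reference values are the ones on this slide; the names are illustrative.

```python
import math

# (phi, psi) reference values from the slide, in degrees.
REFERENCE = {
    "alpha-helix": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_conformation(phi, psi):
    """Name of the reference (phi, psi) pair closest to the given angles."""
    return min(
        REFERENCE,
        key=lambda name: math.hypot(angle_diff(phi, REFERENCE[name][0]),
                                    angle_diff(psi, REFERENCE[name][1])),
    )


print(nearest_conformation(-60.0, -45.0))    # alpha-helix
print(nearest_conformation(-140.0, -178.0))  # antiparallel beta-sheet (psi wraps around)
```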

Page 15

The alpha-helix: repeating i,i+4 H-bonds

[Figure: alpha-helix with residues 1-12 numbered and its backbone hydrogen bonds drawn; Ramachandran plot (φ and ψ axes from -180° to 180°) with the right-handed helical region marked at the α-helix (right-handed) values φ = -63°, ψ = -42°]

By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?
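DSSP (Kabsch & Sander) first decides which backbone C=O···H-N hydrogen bonds exist, using an electrostatic energy criterion, and then applies pattern rules: an "n-turn" at residue i is an H-bond from the C=O of residue i to the N-H of residue i+n, and two consecutive 4-turns at i-1 and i mark residues i to i+3 as α-helix (H). The sketch below applies only that last rule to an H-bond list supplied by hand, assuming hypothetically that every i→i+4 bond from 1→5 to 8→12 in the figure is present.

```python
def dssp_helix_residues(hbonds, n=4):
    """Residues labelled helical by the DSSP minimal-helix rule.

    hbonds: set of (i, j) pairs meaning the C=O of residue i is H-bonded
    to the N-H of residue j. An n-turn at i exists if (i, i + n) is present.
    """
    turns = {i for (i, j) in hbonds if j - i == n}
    helix = set()
    for i in turns:
        if i - 1 in turns:  # two consecutive n-turns at i-1 and i
            helix.update(range(i, i + n))
    return sorted(helix)


# Hypothetical H-bond pattern for the 12-residue helix: 1->5, 2->6, ..., 8->12.
hbonds = {(i, i + 4) for i in range(1, 9)}
print(dssp_helix_residues(hbonds))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Under this assumption residues 1 and 12 take part in H-bonds but are not labelled H, which is the kind of mismatch the question on the slide is probing.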

Page 16

Strands/sheets

Is this a parallel or anti-parallel sheet?

[Figure: beta-sheet with residues 49-57 labelled; Ramachandran plot (φ and ψ axes from -180° to 180°) with the beta-strand region marked at the parallel β-sheet values φ = -119°, ψ = +113°]

By DSSP definitions, which of residues 49-57 are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?

Page 17

Contact maps of protein structures

1avg: structure of triabin

Map of Cα-Cα distances < 6 Å, shown next to a rainbow ribbon diagram (blue to red: N to C terminus)

• Both axes are the sequence of the protein

• Near the diagonal: local contacts in the sequence

• Off the diagonal: long-range (nonlocal) contacts
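A contact map is straightforward to compute once Cα coordinates are available. The sketch below marks residue pairs whose Cα atoms are closer than 6 Å (the cutoff used on this slide) and then separates near-diagonal from off-diagonal contacts. The sequence-separation cutoff of 5 residues and the U-shaped toy chain (3.8 Å between consecutive Cα atoms) are assumptions for illustration.

```python
import math


def contact_map(ca_coords, cutoff=6.0):
    """Boolean matrix: True where two C-alpha atoms are closer than cutoff (in Å)."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) < cutoff for j in range(n)]
            for i in range(n)]


def split_contacts(cmap, min_long_range_sep=5):
    """Split contacts (i < j) into local and long-range by sequence separation."""
    local, long_range = [], []
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if cmap[i][j]:
                target = long_range if j - i >= min_long_range_sep else local
                target.append((i, j))
    return local, long_range


# Toy U-shaped chain: residues 0-2 run along x, 3-5 come back alongside them.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
      (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
local, long_range = split_contacts(contact_map(ca))
print(long_range)  # [(0, 5)]
```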

Page 18

What does secondary structure teach us?

• If one can predict secondary structure from the primary structure, this may help in predicting protein function, via evolutionary relationships with known folds.

Page 19

Outline

• DNA sequence to protein sequence

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Page 20

Tertiary structure in proteins

• Single polypeptide chain

• The number and order of secondary structures in the sequence (connectivity) and their arrangement in space defines a protein’s fold or topology

• The pattern of contacts between side chains/backbone is also an aspect of tertiary structure

• Outer surface and interior

Page 21

Obvious interactions in native protein structures

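[Figure: side chains R1, R2 and R3 in a folded chain, showing an S-S crosslink, polar interactions (CO2- with NH3+, and O with H-N) and hydrophobic contacts]

Disulfide crosslinks; polar interactions (hydrogen bond/salt bridge); hydrophobic interactions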

Page 22

The Protein Data Bank

The Protein Data Bank (PDB) is a central repository of protein structures

http://www.rcsb.org/pdb/home/home.do
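Entries from the PDB can be downloaded in the fixed-column PDB format, in which each ATOM record stores the atom name in columns 13-16, the residue name in 18-20, the chain in 22, the residue number in 23-26, and x, y, z in columns 31-38, 39-46 and 47-54. A minimal Python sketch that extracts Cα coordinates (the input a contact map like the one for 1avg needs); it ignores alternate locations, insertion codes, HETATM records and multiple models, so treat it as illustrative only. The example record is made up.

```python
def read_ca_atoms(lines):
    """Return (res_name, chain, res_seq, (x, y, z)) for every C-alpha ATOM record."""
    atoms = []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append((
                line[17:20].strip(),            # residue name, columns 18-20
                line[21],                       # chain identifier, column 22
                int(line[22:26]),               # residue number, columns 23-26
                (float(line[30:38]),            # x, columns 31-38
                 float(line[38:46]),            # y, columns 39-46
                 float(line[46:54])),           # z, columns 47-54
            ))
    return atoms


# A made-up ATOM record laid out in the fixed PDB columns described above.
record = ("ATOM  " + f"{2:>5}" + " " + " CA " + " " + "ALA" + " " + "A"
          + f"{1:>4}" + "    " + f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}")
print(read_ca_atoms([record]))  # [('ALA', 'A', 1, (11.104, 6.134, -6.504))]

# With a downloaded file (file name hypothetical):
# with open("1avg.pdb") as fh:
#     ca_atoms = read_ca_atoms(fh)
```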

Page 23

Major structure classification systems

• SCOP (Structural Classification of Proteins)

• CATH (Class-Architecture-Topology-Homology)

• DALI/FSSP (Fold classification based on Structure-Structure Alignment)

SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.

Page 24

Outline

• DNA sequence to protein sequence

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Page 25

The nuts and bolts behind fold prediction

1. Start from a database of known structures and the database of corresponding sequences (e.g. ACDEFGTYAEE……), with each residue assigned to α-helix, coil or β-strand.

2. Split them into a training set (known structures and their corresponding sequences) and a test set (known structures and their corresponding sequences).

3. From the training set, estimate the probability of each amino acid in each state, p(α-helix), p(coil), p(β-strand); e.g. for A: 0.23, 0.28, 0.5. The same can be done for amino acids in combination (A…C…): p(aa1-helix), p(aa1-coil), p(aa1-strand) …; e.g. 0.1…0.03, 0.04…0.002, 0.1…0.21.

4. Predict the secondary structure of the test-set sequences and compare it with their known structures.

Bad predictions: reshuffle training set and test set and repeat until predictions are correct.

Good predictions: method ready for secondary structure prediction of new sequences.
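A deliberately naive version of this scheme (single-residue probabilities only, no pair terms or sequence windows; not any particular published method): count how often each amino acid appears in each state in the training set, predict the most probable state for each residue, and score the test set with Q3, the fraction of residues predicted correctly. The training and test data are toy strings; H = helix, E = strand, C = coil.

```python
from collections import Counter, defaultdict

STATES = "HEC"  # helix, strand, coil


def train(examples):
    """Estimate p(state | amino acid) from (sequence, secondary structure) pairs."""
    counts = defaultdict(Counter)
    for seq, ss in examples:
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {aa: {s: c[s] / sum(c.values()) for s in STATES}
            for aa, c in counts.items()}


def predict(seq, probs, default="C"):
    """Most probable state per residue; unseen amino acids get the default state."""
    return "".join(max(STATES, key=probs[aa].get) if aa in probs else default
                   for aa in seq)


def q3(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


training = [("AAAGGG", "HHHCCC"), ("VVVAAA", "EEEHHH")]
test = [("AGV", "HCE")]

probs = train(training)
print(probs["A"])  # {'H': 1.0, 'E': 0.0, 'C': 0.0}
for seq, ss in test:
    pred = predict(seq, probs)
    print(pred, q3(pred, ss))  # HCE 1.0
```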

Page 26

How does a fold prediction server work?

Database of known structures; database of corresponding sequences; database of probabilities of amino acids in secondary structure

YOUR SEQUENCE → Server

• Strong homology → homology-based helix-coil-strand profile → folds database … fold prediction

• Weak/no homology → helix-coil-strand profile prediction … fold prediction

Page 27

Predicting protein folding

Page 28

Predicting protein structure

• Homology Modeling – 3D-JIGSAW, SWISS-MODEL

• Ab initio Modeling – ROBETTA

Page 29

Predicting protein structure by homology

Page 30

How does a homology modeling server work?

Database of known structures; database of corresponding sequences

Query: …YDVRSEQVENCE… → Server/Program → strong homologues

Best possible alignment (sequence + structure):

…YDVR-SEQVENCE…
…YDVRMSD-VDNCD…

Thread the sequence to predict over the known structure according to the alignment

… Optimization via energy minimization, etc.
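Whether a strong homologue is available is usually judged from sequence identity over the alignment. A minimal Python sketch that computes identity for the aligned fragment above, counting only columns without a gap in either sequence (one of several conventions in use), and then applies the ~30% rule of thumb quoted under "Accuracy of modelling"; the cut-off parameter and the function names are illustrative assumptions.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical residues over alignment columns with no gap in either sequence."""
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def suggest_approach(identity, threshold=30.0):
    """Rule of thumb: homology modelling is only advisable above roughly 30% identity."""
    return "homology modelling" if identity >= threshold else "fold recognition / ab initio"


query = "YDVR-SEQVENCE"     # fragment from the slide, ellipses dropped
template = "YDVRMSD-VDNCD"
pid = percent_identity(query, template)
print(round(pid, 1), suggest_approach(pid))  # 72.7 homology modelling
```

Here 8 of the 11 gap-free columns are identical; counting gap columns in the denominator instead would give a lower value (8 of 13), which is why identity figures should always be quoted with their convention.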

Page 31

Predicting protein structure

• Homology Modeling – 3D-JIGSAW, SWISS-MODEL

• Ab initio Modeling – ROSETTA

Page 32

Predicting protein structure by ab initio methods

Database of corresponding sequences

Query: …YDVRSEQVENCE… → Server/Program → NO homologues

Database of structures for smaller amino acid runs; pieces of the query are aligned to fragments:

…YDVR-SEQ / …YDVRMSD-…
…YDVR-SEQ / …YPVRMSD-…
…VENCE… / …YDNCD…
…VENCE… / …VEQCE…

… Assemble

Energy minimization & optimization
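The fragment idea can be sketched in a few lines: cut the query into short windows, pick the most similar fragment from a library of short runs with known conformations, and stitch the picks together. Everything here is hypothetical (the library entries, their conformation strings, the made-up query built from the slide's pieces, and the non-overlapping 4-residue windows); production methods such as ROSETTA pick many overlapping candidate fragments per position and assemble them by Monte Carlo search guided by an energy function.

```python
# Hypothetical fragment library: (fragment sequence, backbone conformation string).
# Conformation letters (H = helix, E = strand, C = coil) are made-up placeholders.
FRAGMENT_LIBRARY = [
    ("YDVR", "EEEE"),
    ("YPVR", "EECC"),
    ("VENC", "HHHH"),
    ("VEQC", "HHHC"),
]


def identities(a, b):
    """Number of identical positions between two equal-length peptides."""
    return sum(x == y for x, y in zip(a, b))


def pick_fragments(query, library, k=4):
    """For each non-overlapping k-residue window, keep the most similar library fragment."""
    picks = []
    for start in range(0, len(query) - k + 1, k):
        window = query[start:start + k]
        best = max(library, key=lambda frag: identities(window, frag[0]))
        picks.append((start, best))
    return picks


picks = pick_fragments("YDVRVENC", FRAGMENT_LIBRARY)
for start, (frag_seq, conformation) in picks:
    print(start, frag_seq, conformation)
# 0 YDVR EEEE
# 4 VENC HHHH

# "Assembly" here only concatenates the picked conformations; a real method
# builds 3D coordinates and refines them by energy minimization.
print("".join(conf for _, (_, conf) in picks))  # EEEEHHHH
```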

Page 33

Accuracy of modelling

• Accuracy varies widely.

• The quality of the model is VERY dependent on the quality of the alignment.

• Globular proteins are more accurately predicted.

• Membrane proteins are still a big problem.

• Homology modelling is “bad” if sequence identity < 30%.

• CASP is a biennial meeting where the accuracy of the different methods is assessed.
– The Baker group is usually and consistently more accurate than others.

http://www.predictioncenter.org/

Page 34

Summary

• DNA sequence to protein sequence

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Page 35

“Accessible Surface”

Lee & Richards, 1971; Shrake & Rupley, 1973

• Represent atoms as spheres with appropriate radii and eliminate overlapping parts...

• Mathematically roll a sphere all around that surface...

• The sphere's center traces out a surface as it rolls...
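Shrake & Rupley's numerical version of this idea places test points on a sphere of radius (atom radius + probe radius) around each atom, discards points that fall inside any neighbour's expanded sphere, and takes the accessible area as the surviving fraction of that sphere's area. The sketch assumes a 1.4 Å probe (a common choice for water), golden-spiral point placement and illustrative atom radii; it is a teaching sketch, not a validated implementation.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere from a golden-angle spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(coords, radii, probe=1.4, n_points=960):
    """Solvent-accessible surface area (Å^2) per atom by point sampling."""
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        # Only atoms whose expanded spheres overlap this one can bury its points.
        neighbours = [(cj, rj) for j, (cj, rj) in enumerate(zip(coords, expanded))
                      if j != i and math.dist(ci, cj) < ri + rj]
        free = 0
        for ux, uy, uz in unit:
            point = (ci[0] + ri * ux, ci[1] + ri * uy, ci[2] + ri * uz)
            if all(math.dist(point, cj) >= rj for cj, rj in neighbours):
                free += 1
        areas.append(4.0 * math.pi * ri * ri * free / n_points)
    return areas


# An isolated carbon-sized atom (radius 1.7 Å) versus two overlapping ones.
print(shrake_rupley([(0.0, 0.0, 0.0)], [1.7]))
print(shrake_rupley([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [1.7, 1.7]))
```

For an isolated atom every test point is free, so the result equals 4π(r + 1.4)² exactly; any neighbour can only reduce it.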

Page 36

The outer surface: water in protein structures

Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally

Water is not just surrounding the protein; it is interacting with it

Page 37

Water interacts with protein surfaces

Second-shell waters: only contact other waters

First-shell waters: in contact with/hydrogen bonded to the protein

Most waters visible in structures make hydrogen bonds to each other and/or to the protein, as donor, acceptor or both.
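The first-shell/second-shell distinction can be approximated as a distance classification. This sketch assumes a hypothetical 3.5 Å heavy-atom cutoff as a stand-in for "in contact/hydrogen bonded" (real analyses also check hydrogen-bond geometry), and uses made-up toy coordinates.

```python
import math


def classify_waters(waters, protein_atoms, cutoff=3.5):
    """Label each water oxygen as first shell, second shell, or unassigned.

    First shell: within `cutoff` Å of any protein atom.
    Second shell: not first shell, but within `cutoff` Å of a first-shell water.
    """
    first = [any(math.dist(w, p) <= cutoff for p in protein_atoms) for w in waters]
    labels = []
    for i, w in enumerate(waters):
        if first[i]:
            labels.append("first shell")
        elif any(first[j] and math.dist(w, waters[j]) <= cutoff
                 for j in range(len(waters)) if j != i):
            labels.append("second shell")
        else:
            labels.append("unassigned")
    return labels


protein = [(0.0, 0.0, 0.0)]
waters = [(2.8, 0.0, 0.0), (5.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(classify_waters(waters, protein))
# ['first shell', 'second shell', 'unassigned']
```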

Page 38

Side chain conformation

• side chains differ in their number of degrees of conformational freedom (some don't have any, such as Ala and Gly)

• but side chains of very different size can have the same number of χ angles.
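The point about size versus χ angles is easy to see by grouping residues by their usual number of side-chain χ angles. The counts below follow the common convention (Pro is left out because its ring constrains the side chain, and a few residues are counted differently in some references); the grouping code is a small illustrative sketch.

```python
from collections import defaultdict

# Usual number of side-chain chi angles per residue type (Pro omitted).
CHI_ANGLES = {
    "GLY": 0, "ALA": 0,
    "SER": 1, "CYS": 1, "THR": 1, "VAL": 1,
    "ASP": 2, "ASN": 2, "ILE": 2, "LEU": 2, "HIS": 2, "PHE": 2, "TYR": 2, "TRP": 2,
    "GLU": 3, "GLN": 3, "MET": 3,
    "LYS": 4, "ARG": 4,
}


def group_by_chi(table):
    """Map number of chi angles -> residue names with that many."""
    groups = defaultdict(list)
    for residue, n_chi in table.items():
        groups[n_chi].append(residue)
    return dict(groups)


groups = group_by_chi(CHI_ANGLES)
print(groups[1])  # ['SER', 'CYS', 'THR', 'VAL']
print(groups[2])  # ['ASP', 'ASN', 'ILE', 'LEU', 'HIS', 'PHE', 'TYR', 'TRP']
```

Asp and Trp, very different in size, both have two χ angles.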

Page 39

Supersecondary structures/structural motifs

• just as there are certain secondary structure elements that are common, there are also particular arrangements of multiple secondary structure elements that are common

• supersecondary structures emphasize issue of topology in protein structure

[Figure: example supersecondary motifs, including the Greek key motif]

Page 40

Topology: differences in connectivity

• example: a four-stranded antiparallel β sheet can have many different topologies based on the order in which the four β strands are connected:

[Figure: “Greek key” and “up-and-down” topologies]
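If, as a simplification, a four-stranded sheet's topology is described only by the left-to-right order of strands 1-4 across the sheet (an order and its reverse are the same sheet seen from the other edge), there are 4!/2 = 12 distinct orders. The up-and-down topology is the purely sequential order 1-2-3-4, and the Greek key is one of the other orders. A short sketch that enumerates them:

```python
from itertools import permutations


def canonical(order):
    """An order and its reverse describe the same sheet; keep the smaller tuple."""
    return min(order, order[::-1])


def distinct_strand_orders(n_strands=4):
    """All spatial orders of strands 1..n, counting mirror-image orders once."""
    return sorted({canonical(p) for p in permutations(range(1, n_strands + 1))})


orders = distinct_strand_orders()
print(len(orders))  # 12
print(orders[0])    # (1, 2, 3, 4), the up-and-down order
```

This count ignores strand direction and connection handedness, so it is a lower bound on the variety of real topologies rather than a full enumeration.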

Page 41

Topology: differences in handedness

• example: An extremely common supersecondary structure in proteins is the beta-alpha-beta motif, in which two adjacent beta-strands are arranged in parallel and are separated in the sequence by a helix which packs against them.

• if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands.

huge preference for right-handed arrangement in proteins

Page 42

DIY: The sequence

Page 43

DIY: The server

Page 44

DIY: The reply

Page 45

DIY: fine tuning

Page 46

DIY: That is it!

Page 47

The CATH Hierarchy

1. Divide PDB structure entries into domains (using domain recognition algorithms; the domain is the fundamental unit of structure classification).

2. Classify each domain according to a five-level hierarchy:

Class → Architecture → Topology → Homologous Superfamily → Sequence Family

The top 3 levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships.

The bottom two levels include some phyletic classification as well: groupings according to putative common ancestry based on structural similarity, functional similarity, and sequence similarity.

There is no purely phyletic system of protein classification! (It is also unlikely that there is any common ancestor to all proteins.)

Page 48

SCOP: A different (but similar) taxonomy system

Correspondences between SCOP and CATH hierarchies:

SCOP          CATH
class         class
(none)        architecture
fold          topology
superfamily   homologous superfamily
family        sequence family
domain        domain

CATH is more directed toward structural classification, whereas SCOP pays more attention to evolutionary relationships. Both have manual aspects and are curated by experts.

Page 49

Internal interactions in a protein

Page 50

Amino acids: the building blocks of proteins

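[Figure: an amino acid, H2N-CH(R)-COOH, with the amino group, carboxylic acid group, side chain (R) and alpha carbon labelled; its zwitterionic form, H3N+-CH(R)-COO-; and the alpha carbon drawn as a stereocenter carrying H3N+, COO-, R and H]

The zwitterionic form is the predominant form at neutral pH.

The alpha carbon is a chiral center: natural proteins are made of L amino acids (shown in the figure) as opposed to D amino acids.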
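Why the zwitterion dominates at neutral pH follows from the Henderson-Hasselbalch equation: near pH 7 the α-carboxyl group (pKa around 2) is almost fully deprotonated and the α-amino group (pKa around 9.5) almost fully protonated. The sketch assumes typical textbook pKa values for glycine (2.34 and 9.60) and treats the two groups as independent, which is a reasonable approximation because their pKa values are far apart.

```python
# Typical textbook pKa values for glycine (assumed here).
PKA_COOH = 2.34  # alpha-carboxyl group
PKA_NH3 = 9.60   # alpha-amino group


def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a single acidic group in its base form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def zwitterion_fraction(ph, pka_cooh=PKA_COOH, pka_nh3=PKA_NH3):
    """Fraction carrying both COO- and NH3+, treating the two groups as independent."""
    coo_minus = fraction_deprotonated(ph, pka_cooh)
    nh3_plus = 1.0 - fraction_deprotonated(ph, pka_nh3)
    return coo_minus * nh3_plus


for ph in (1.0, 7.0, 12.0):
    print(ph, round(zwitterion_fraction(ph), 3))
```

At pH 7 more than 99% of the molecules are zwitterions, while at strongly acidic or basic pH the cationic or anionic forms take over.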