Proteins are built of L-amino acids.
20 different amino acids are encoded by specific DNA base triplets.
The amino acids are linked together by amide bonds.
Proteins are linear chains of amino acids.
Peptides are short proteins (< 50 residues).
‘Peptide bond’ is another name for the amide bond connecting two amino acids.

Peptide bonds are planar due to partial double-bond character.
The ‘backbone’ of a protein consists of the atoms N, Cα and C’.
Side chain carbons are labelled β, γ, δ, etc.
Bond lengths and bond angles are invariant. Only dihedral angles vary.
By convention, a protein starts at the N-terminus and ends at the C-terminus. (The N-terminus is synthesized first during translation.)

Carl Branden & John Tooze, Introduction to Protein Structure, Garland, 1998

[Figure: dipeptide, with labels N and R]

glycine, Gly, G
alanine, Ala, A
arginine, Arg, R
asparagine, Asn, N
aspartic acid, Asp, D
cysteine, Cys, C
glutamic acid, Glu, E
glutamine, Gln, Q
histidine, His, H
isoleucine, Ile, I
leucine, Leu, L
lysine, Lys, K
methionine, Met, M
phenylalanine, Phe, F
proline, Pro, P
serine, Ser, S
threonine, Thr, T
tryptophan, Trp, W
tyrosine, Tyr, Y
valine, Val, V

hydrophobic: A I L M F P W V
positively charged: R K (H)
negatively charged: D E
polar: C N Q S T Y
tiny: G
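The grouping above can be written as a small lookup table. This is a minimal sketch in Python; the names `RESIDUE_CLASSES`, `classify` and `class_composition` are illustrative and not from the source. As a simplifying assumption, histidine is counted as positively charged even though the slide lists it only in parentheses.

```python
# One-letter codes grouped as on the slide.
# Assumption: H is counted as positively charged, although the slide
# puts it in parentheses (its charge depends on pH).
RESIDUE_CLASSES = {
    "hydrophobic": "AILMFPWV",
    "positively charged": "RKH",
    "negatively charged": "DE",
    "polar": "CNQSTY",
    "tiny": "G",
}


def classify(residue: str) -> str:
    """Return the class name for a single one-letter amino acid code."""
    code = residue.upper()
    if len(code) != 1:
        raise ValueError(f"expected a one-letter code, got {residue!r}")
    for name, members in RESIDUE_CLASSES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def class_composition(sequence: str) -> dict:
    """Count how many residues of a sequence fall into each class."""
    counts = {name: 0 for name in RESIDUE_CLASSES}
    for residue in sequence:
        counts[classify(residue)] += 1
    return counts


if __name__ == "__main__":
    print(class_composition("MGRARD"))
```

Counting classes along a sequence in this way gives a quick, rough impression of how much of a protein is likely to end up in the hydrophobic interior.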

The interior of protein structures is tightly packed.
Water is excluded, except for very few buried hydration water molecules.
Almost all residues in the interior are hydrophobic or, at least, uncharged.
Charged residues are almost always on the protein surface.
The same rules apply to protein-protein binding surfaces.

Regular secondary structures form because the backbone amide groups are polar and seek H-bonding partners when buried in a hydrophobic environment.

Only backbone and hydrophobic sidechains retained

CPK model of ubiquitin
yellow: hydrophobic
grey: polar but uncharged
blue: positively charged
red: negatively charged
green: backbone atoms

Backbone H-bonds

Hydrophobic cores are tightly packed

Since peptide bonds are planar (and virtually always ‘trans’), the backbone conformation of each amino acid is determined by only two dihedral angles, φ and ψ. Knowledge of the (φ, ψ) pair of each residue is sufficient to define the 3D structure of the entire backbone!
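To make the two backbone dihedral angles concrete, here is a minimal sketch in Python of how a signed dihedral angle can be computed from four atom positions. It assumes the usual definitions, φ = C’(i-1)-N-Cα-C’ and ψ = N-Cα-C’-N(i+1). The function names are illustrative, and coordinates are plain (x, y, z) tuples rather than any particular structure-file format.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    0 degrees is cis, +/-180 degrees is trans; the angle is positive when,
    looking from p1 towards p2, the p2-p3 bond is rotated clockwise
    relative to the p1-p0 bond.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue i: C'(i-1), N(i), CA(i), C'(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C'(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Four points forming a right-angle twist about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `phi` and `psi` to every residue of a structure yields the pairs of angles that fully describe the backbone conformation.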

Ramachandran plot

non-Gly residues; Gly
α-helix; β-sheet; left-handed α-helix

The Ramachandran plot displays experimentally observed (φ, ψ) combinations (one dot per residue).
Steric clashes between the side chains of neighboring amino acids limit the accessible conformational space.
Glycine can access larger regions in the Ramachandran plot than residues with longer side chains.

Two Cys residues can form a disulfide bridge:

2 -CH2-SH + 1/2 O2 ⇌ -CH2-S-S-CH2- + H2O
(forward: oxidation; reverse: reduction)

Primary, secondary, tertiary, quaternary structure

Primary structure = amino acid sequence
Secondary structure = helices, sheets, turns (i.e. regular sub-structures defined by H-bonds between backbone amides)
Tertiary structure = 3D structure
Quaternary structure = complex between different protein molecules (e.g. dimer, trimer, tetramer)

α-helix: 3.6 residues per turn, H-bonds between residues i and i+4
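The two numbers above translate directly into a rotation per residue and a list of H-bonded residue pairs. The following is a small sketch in Python; the function names are illustrative, and residues are simply numbered consecutively along an idealized helix.

```python
RESIDUES_PER_TURN = 3.6  # value quoted for the alpha-helix


def rotation_per_residue_deg() -> float:
    """Rotation about the helix axis from one residue to the next."""
    return 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def helix_hbond_pairs(first: int, last: int) -> list:
    """Backbone H-bond pairs (i, i+4) within a helix spanning first..last."""
    return [(i, i + 4) for i in range(first, last - 3)]


if __name__ == "__main__":
    print(rotation_per_residue_deg())
    print(helix_hbond_pairs(1, 8))
```

For a helix covering residues 1 to 8, only residues 1 to 4 have a partner four positions further along, so the list contains four pairs.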

antiparallel β-sheet; parallel β-sheet

In both types of β-sheets, the side chains point alternately above and below the plane of the sheet.

mixed β-sheet example: thioredoxin

Levinthal’s paradox

Assume a small protein with 100 amino acids, each of which can access 3 different conformations.
3^100 = 5 x 10^47 conformations
Fastest motions take 10^-15 sec, so sampling all conformations would take 5 x 10^32 sec.
60 x 60 x 24 x 365 = 31536000 = 3.1536 x 10^7 seconds in a year
Sampling all conformations would take 1.6 x 10^25 years, much longer than the age of the universe.

In nature, proteins fold correctly within seconds!
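The estimate can be reproduced in a few lines. This is a minimal sketch in Python using the same assumptions as the slide (100 residues, 3 conformations each, 10^-15 sec per conformation, 365-day years); the function name and its parameters are illustrative.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31536000, as on the slide


def levinthal_estimate(residues=100, conformations_per_residue=3,
                       seconds_per_conformation=1e-15):
    """Return (number of conformations, seconds, years) for exhaustive sampling."""
    conformations = conformations_per_residue ** residues  # exact integer
    seconds = conformations * seconds_per_conformation
    years = seconds / SECONDS_PER_YEAR
    return conformations, seconds, years


if __name__ == "__main__":
    n, s, y = levinthal_estimate()
    print(f"conformations: {n:.1e}")
    print(f"seconds:       {s:.1e}")
    print(f"years:         {y:.1e}")
```

Without intermediate rounding the number of conformations is slightly above 5 x 10^47, so the result in years agrees with the slide's 1.6 x 10^25 to two significant figures.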

The 3D structure is unambiguously encoded in the amino-acid sequence, but protein structures are very hard to predict from amino acid sequence, unless the structure of a similar protein (> 20% amino-acid sequence identity) is known.

Different proteins fold by different, unpredictable mechanisms (some of them even need helper proteins (“chaperones”) to fold).
The current picture is that of a folding funnel, where the vertical axis displays energy and the width of the funnel represents the accessible conformational space.

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MGRARDAILDALNLTAEEKLKKPKLELLSVPLREGYGRIPRGALLSMDALDLTDKLVSFYLETYGAELTA
CCHHHHHHHHHHHHHHHHHHHHHHHHHhhcchhHhcCcCcHHHHHhcCHHHHHHHHHHHHHHHHhHHHHH
NVLRDMGLQEMAGQLQAATHQ
HHHHHHHHHHHHHHHHHHccC

Sequence length : 91
PHD :
  Alpha helix (Hh)     : 77 is 84.62%
  310 helix (Gg)       :  0 is  0.00%
  Pi helix (Ii)        :  0 is  0.00%
  Beta bridge (Bb)     :  0 is  0.00%
  Extended strand (Ee) :  0 is  0.00%
  Beta turn (Tt)       :  0 is  0.00%
  Bend region (Ss)     :  0 is  0.00%
  Random coil (Cc)     : 14 is 15.38%
  Ambiguous states (?) :  0 is  0.00%
  Other states         :  0 is  0.00%

Secondary structure prediction

PHD is best: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html
Expected accuracy: 72%
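The percentages in the PHD summary are simple counts over the per-residue state string. The sketch below, in Python, shows one way to compute them. It assumes that upper- and lower-case letters denote the same state, as the summary labels such as "(Hh)" and "(Cc)" suggest; the function name is illustrative.

```python
from collections import Counter


def state_composition(prediction: str) -> dict:
    """Map each state letter to (count, percent of all residues).

    Upper- and lower-case letters are pooled into the same state.
    """
    counts = Counter(prediction.upper())
    total = len(prediction)
    return {state: (n, round(100.0 * n / total, 2))
            for state, n in counts.items()}


if __name__ == "__main__":
    # A stand-in string with the same totals as the output above
    # (77 helix and 14 coil residues), not the actual prediction.
    print(state_composition("H" * 77 + "C" * 14))
```

With 77 helical and 14 coil residues out of 91, this reproduces the 84.62% and 15.38% reported by the server.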

MGRARDAILDALNLTAEEKLKKPKLELLSVPLREGYGRIPRGALLSMDALDLTDKLVSFYLETYGAELTANVLRDMGLQEMAGQLQAATHQ
CCHHHHHHHHHHHHHHHHHHHHHHHHHhhcchhHhcCcCcHHHHHhcCHHHHHHHHHHHHHHHHhHHHHHHHHHHHHHHHHHHHHHHHccC
--HHHHHHHHHHH--HHHHHHHHHHH---------------HHHHHH--HHHHHHHHHHHH--HHHHHHHHHHHHHHHH---HHHHHHHHHH--

PYRIN DED CARD DD

ca. 20% sequence identity can still result in similar 3D structure
(sequence identity expected statistically, i.e. by chance: 6%)
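Sequence identity is measured on an alignment. The following is a minimal sketch in Python, assuming the two sequences are already aligned and padded with '-' for gaps. Conventions for the denominator differ between programs; here, as a stated assumption, only columns without a gap are counted. The function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment: five gap-free columns, four of them identical.
    print(percent_identity("MGRA-D", "MGKAID"))
```

Because the value depends on the alignment and on the chosen denominator, figures such as 20% or 25% should be read as approximate thresholds.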

Bars identify side-chains with less than 5% solvent accessibility in PYRIN domain
Boxes delineate helix boundaries
Vertical grey-shading: conserved core residues

Death domains (DD), death effector domains (DED), caspase activation and recruitment domains (CARD) and PYRIN domains form 4 branches of the death domain superfamily, i.e. their 3D structures are related while their sequence similarity is limited (J. Mol. Biol. 332, 1155 (2003)).

Swiss-Model: www.expasy.org/swissmod/SWISS-MODEL.html
If a template structure with >25% amino acid sequence identity is available in the Swiss-Model database, only the amino acid sequence needs to be submitted. Otherwise, the 3D coordinates of the desired template structure must be submitted too.

3D structure prediction

Swiss-Model and any other modelling software (best regarded is Modeller, www.salilab.org/modeller/modeller.html) depend crucially on the sequence alignment. Any model has the same coordinates for backbone and Cβ atoms as the template. Insertions and deletions are handled ‘gentlemanly’.

Length of target sequence: 91 residues
Searching sequences of known 3D structures
No suitable target found ==> Exit

Identification of similar structures with Dali

FSSP FAMILIES OF STRUCTURALLY SIMILAR PROTEINS, VERSION 1.0 (Apr 1 1995)
CREATED Fri Nov 1 01:15:14 GMT 2002 for dali on sputnik2-node68.ebi.ac.uk
METHOD Dali ver. 2.0: Holm, L., Sander, C. (1993) J.Mol.Biol. 233,123-138
DATABASE 3241 protein chains
PDBID 6340
HEADER Structure from MOLMOL
COMPND pyrin
SOURCE
AUTHOR
SEQLENGTH 90
NALIGN 54
WARNING pairs with Z<2.0 are structurally dissimilar
## SUMMARY: PDB/chain identifiers and structural alignment statistics
NR.  STRID1 STRID2  Z     RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN
 1:  6340   6340    23.2  0.0  90   90    100  0      0      1     S    pyrin
 2:  6340   1a1z     8.5  2.2  79   83     19  0      0      5     S    fadd protein fragment (fas-associating death domain-con
 3:  6340   1ich-A   7.3  2.3  75   87     19  0      0      6     S    tumor necrosis factor receptor-1 fragment (tnf-1) Muta
 4:  6340   3ygs-P   6.9  2.5  78   97     19  0      0      6     S    apoptotic protease activating factor 1 fragment procasp
 5:  6340   1ngr     6.6  2.6  73   85     10  0      0      5     S    p75 low affinity neurotrophin receptor fragment
 6:  6340   1dgn-A   5.9  2.8  77   89     13  0      0      6     S    iceberg (protease inhibitor) fragment
 7:  6340   1cy5-A   5.5  3.4  79   92     15  0      0      6     S    apoptotic protease activating factor 1 fragment (apaf-1
 8:  6340   1d2z-B   5.0  2.8  80   150     9  0      0      7     S    death domain of pelle death domain of tube
 9:  6340   3crd     4.6  3.0  75   100    15  0      0      7     S    raidd fragment
10:  6340   1ddf     4.3  2.9  73   127    12  0      0      4     S    fas
11:  6340   1g71-A   4.0  3.1  68   344    12  0      0      7     S    DNA primase
12:  6340   1au7-A   3.8  3.5  62   130     6  0      0      7     S    pit-1 fragment (ghf-1) Mutant biological_unit DNA
13:  6340   1d2z-A   3.7  2.5  63   102     5  0      0      6     S    death domain of pelle death domain of tube
14:  6340   1dly-A   3.5  3.2  65   121    12  0      0      5     S    hemoglobin

http://www.ebi.ac.uk/dali/

alternative: http://cl.sdsc.edu/
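Summary rows like those in the Dali output above are whitespace-separated and can be read with a few lines of code. This is a minimal sketch in Python, assuming the column order given in the header row (NR., STRID1, STRID2, Z, RMSD, LALI, LSEQ2, %IDE, REVERS, PERMUT, NFRAG, TOPO, PROTEIN). The function names and dictionary keys are illustrative, not part of Dali, and the three sample rows are copied from the output above.

```python
def parse_dali_summary(lines):
    """Parse Dali summary rows into dictionaries; non-data lines are skipped."""
    hits = []
    for line in lines:
        fields = line.split()
        # A data row starts with a running number such as "2:" and has
        # twelve fixed columns followed by the free-text protein name.
        if len(fields) < 13 or not fields[0].rstrip(":").isdigit():
            continue
        hits.append({
            "nr": int(fields[0].rstrip(":")),
            "query": fields[1],
            "hit": fields[2],
            "z": float(fields[3]),
            "rmsd": float(fields[4]),
            "lali": int(fields[5]),
            "lseq2": int(fields[6]),
            "ide": int(fields[7]),
            "protein": " ".join(fields[12:]),
        })
    return hits


def similar_structures(hits, min_z=2.0):
    """Keep hits at or above the Z-score cut-off named in the WARNING line."""
    return [h for h in hits if h["z"] >= min_z]


SAMPLE = [
    "NR. STRID1 STRID2 Z RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN",
    "1: 6340 6340 23.2 0.0 90 90 100 0 0 1 S pyrin",
    "2: 6340 1a1z 8.5 2.2 79 83 19 0 0 5 S fadd protein fragment",
    "14: 6340 1dly-A 3.5 3.2 65 121 12 0 0 5 S hemoglobin",
]

if __name__ == "__main__":
    for hit in similar_structures(parse_dali_summary(SAMPLE)):
        print(hit["hit"], hit["z"], hit["ide"], hit["protein"])
```

Sorting or filtering the parsed rows by Z-score and %IDE makes it easy to see that structurally similar hits can have sequence identities below 20%.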

Large ribosomal subunit from Haloarcula marismortui

Science 289, 905 (2000)

From Structure to Function

subtilisin; chymotrypsin

Convergent evolution: the overall structures of chymotrypsin and subtilisin are very different, but the catalytic triad (Asp, His, Ser: side-chains shown in blue) is conserved.

FMN-binding protein from Desulfovibrio vulgaris

Views differ by a 90° rotation around a vertical axis.

protease from hepatitis virus

Catalytically active residues shown as spheres.
The structure is composed of two domains, each of which is similar to the FMN-binding protein.
The binding site of FMN is at the site corresponding to the substrate binding site in the protease.

A mechanism for the evolution of proteolytic function

Two orthogonal views of chymotrypsin backbone

Chymotrypsin consists of two subdomains of similar structure.
The active site is at the interface. The residues of the catalytic triad are contributed by both domains.

substrate binding site

allosteric regulation site*

Nat. Struct. Biol. 4, 975 (1997)

• Chymotrypsin is activated by proteolytic cleavage of the N-terminal end, resulting in altered binding of the new N-terminus to the C-terminal domain.

Steps for evolution of proteolytic function:

- a primordial peptide-binding protein (similar to the FMN-binding protein)
- gene duplication, resulting in 2 domains linked by a polypeptide chain
- any proteolysis, even if inefficient, decreases cooperativity of binding, hence peptide fragments dissociate, enabling capture of new, uncleaved peptide

gene duplication

No increased chance for proteolytic activity

Proteolysis could occur if the correct residues approach the peptide in the cleft between both domains.

If this evolutionary pathway is correct, what happened to the peptide binding site that is not at the interface between the two domains? It became an allosteric regulation site, i.e. it is still a peptide binding site in a way.
