
Jan 11, 2016


CS 177 Proteins part 1: Structure-function relationships

Review of protein structures

Need for analyses of protein structures

Sources of protein structure information

Computational Modeling


Need for analyses of protein structures

A protein performs metabolic, structural, or regulatory functions in a cell. Cellular biochemistry is based on interactions between 3-D molecular structures.

The 3-D structure of a protein determines its function.

Therefore, the relationship of sequence to function is primarily concerned with understanding the 3-D folding of proteins and inferring protein functions from these 3-D structures (e.g. binding sites, catalytic activities, interactions with other molecules).

The study of protein structure is not only of fundamental scientific interest in terms of understanding biochemical processes, but also produces very valuable practical benefits

Medicine

The understanding of enzyme function allows the design of new and improved drugs

Agriculture

Therapeutic proteins and drugs for veterinary purposes and for treatment of plant diseases

Industry

Protein engineering has potential for the synthesis of enzymes to carry out various industrial processes on a mass scale


Need for analyses of protein structures

Protein 3-D structure has direct medical implications: an incorrectly folded protein will not function properly

Examples:

- Adult-onset diabetes

Protein misfolding may be responsible for blood-vessel damage, blindness and other debilitating effects of the disease

- Cystic Fibrosis

The most common mutation underlying cystic fibrosis hinders the dissociation of the transport-regulator protein from one of its chaperones. Thus, the final steps in normal folding cannot occur, and normal amounts of active protein are not produced.


Need for analyses of protein structures

Examples of diseases associated with protein misfolding (cont.):

- Alzheimer's disease

New studies indicate that Alzheimer's disease may be caused by small clumps of wrongly folded proteins. Scientists have found that misfolded amyloid beta protein molecules hinder memory processes in rat brains by blocking synapses

References

1. Walsh, D. M. et al. Naturally secreted oligomers of amyloid β protein potently inhibit hippocampal long-term potentiation in vivo. Nature 416, 535-539 (2002).

2. Bucciantini, M. et al. Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416, 507-511 (2002).

CT scan of the brain of an Alzheimer's patient showing widespread destruction (pink) of brain tissue (green)


Need for analyses of protein structures

Examples of diseases associated with protein misfolding (cont.):

- Transmissible Spongiform Encephalopathies (TSEs) (such as mad cow disease or the human version, Creutzfeldt-Jakob disease)

The infectious agent is probably a small misfolded protein called a prion. The prion protein occurs naturally in the brain, where its function is unknown. Infectious prions can cause correctly folded copies of the protein to misfold. Domino effect: large numbers of misfolded prions cause neural degeneration

- Other non-infectious brain diseases such as Parkinson’s, Huntington’s, and Lou Gehrig’s.


Sources of protein structure information

3-D macromolecular structures stored in databases

The most important database: the Protein Data Bank (PDB)

The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) and can be accessed at three different sites (plus a number of mirror sites outside the USA):

- http://rcsb.rutgers.edu/pdb (Rutgers University)

- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)

- http://tcsb.nist.gov/pdb/ (National Institute of Standards and Technology)

It is the very first “bioinformatics” database ever built


The Protein Data Bank (PDB)

PDB: 20,254 structures (4 March 2003)

SwissProt: 122,564 entries (5 March 2003)

Ratio: 1:6 (structure of more than 83% of proteins still unknown)
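The ratio quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the counts are the March 2003 figures from the slide, the function name is made up for illustration, and the calculation assumes every PDB entry maps to a distinct SwissProt protein, which overstates the real coverage because the PDB contains many redundant entries.

```python
# Counts quoted on the slide (March 2003 snapshots).
PDB_STRUCTURES = 20_254
SWISSPROT_ENTRIES = 122_564


def structure_coverage(n_structures: int, n_sequences: int) -> float:
    """Upper bound on the fraction of sequence entries with a known structure.

    Assumption: each structure belongs to a different sequence entry.
    """
    return n_structures / n_sequences


coverage = structure_coverage(PDB_STRUCTURES, SWISSPROT_ENTRIES)
print(f"about 1 structure per {1 / coverage:.1f} sequence entries")
print(f"structure unknown for at least {100 * (1 - coverage):.1f}% of entries")
```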


Sources of protein structure information

In practice, most biomolecular structures (>99% of structures in PDB) are determined using three techniques:

- X-ray crystallography (low to very high resolution) Problem: requires crystals; it is difficult to crystallize proteins while maintaining their native conformation; not all proteins can be crystallized

- Nuclear magnetic resonance (NMR) spectroscopy of proteins in solution (medium to high resolution) Problem: Works only with small and medium size proteins (~50% of proteins cannot be studied with this method); requires high solubility

- Electron microscopy and crystallography (low to medium resolution) Problem: (still) relatively low resolution

Experimental structure determination

Experimental methods are still very time consuming and expensive; in most cases the experimental data will contain errors and/or be incomplete. Thus the initial model needs to be refined and rebuilt


Sources of protein structure information

Researchers have been working for decades to develop procedures for predicting protein structure that are not so time consuming and not hindered by size and solubility constraints.

As protein sequences are encoded in DNA, it should in principle be possible to translate a gene sequence into an amino acid sequence, and to predict the three-dimensional structure of the resulting chain from this amino acid sequence

Computational Modeling


Some common terminology used in homology modeling

Motif (sequence context): conserved pattern of amino acids that is found in two or more proteins

Motif (structural context): combination of several secondary structure elements (also referred to as super-secondary structures and folds)

Domain (sequence context): (also referred to as homologous domain) extended sequence pattern, generally found by sequence alignment methods, that indicates a common evolutionary origin. A domain is generally longer than a motif (it may include all of a given protein sequence)

Domain (structural context): segment of the protein that can fold into a 3-D structure; they are considered elementary units of molecular function

Family (sequence context): group of proteins of similar biochemical function that are more than 50% identical when aligned

Family (structural context): structures that have a significant level of structural similarity but not necessarily significant sequence similarity

Superfamily: group of protein families that are related by distant yet detectable sequence similarities

Fold: (also referred to as a folding motif) larger combination of secondary structure units in the same configuration. Thus, proteins sharing the same fold have the same combination of secondary structures that are connected by similar loops


Gene finding

Identification of protein coding regions within DNA sequences (ORFs)

This is one of the single biggest challenges facing the bioinformatics specialists working on Genome Projects

Existing software is only about 90% accurate in predicting genes in large stretches of genomic DNA

The problem is made worse in eukaryotic genomes by the common occurrence of pseudogenes that are highly similar to real gene sequences but are not transcribed

How to find genes?

Similarity search against the expressed sequence tag (EST) database (e.g. dbEST)

Translation and similarity search against the protein databanks (e.g. SWISS-PROT and GenPept)

- automatic translate and search functions are implemented in BLASTX and TFASTA

- if a protein (or EST sequence) matches, it can be aligned with the unknown genomic sequence; start and stop codons should line up nicely and the introns should be obvious

- a small error rate remains

If there are no handy template sequences in the databanks, one must rely on knowledge of the DNA code

- translation generally starts at an ATG codon; the transcription start site usually lies about 30 bp downstream of a TATA box (TATAAA or some close approximation)

- a graphic map of all 6 reading frames can be produced to search for a long open reading frame

- several software packages are available that map ORFs (e.g. FRAMES, GeneWorks, MacVector, DNA Strider, GRAIL, ORF finder, DNA translation, BCM GeneFinder)

- problem: none of these programs is perfect; errors will occur

- confirming evidence can be collected by looking for regulatory sequences (promoters, enhancers, transcription factor binding sites; also known as signal sequences) that generally occur near ORFs. Several databases for signal sequences are available (e.g. TransFac) and several software tools make use of these databases (e.g. Signal Scan, FindPatterns)
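The six-frame search for long open reading frames can be illustrated with a short sketch. It is not how any of the packages named above work internally; it only assumes the standard genetic code (ATG start; TAA, TAG, TGA stops) and ignores introns, so it is closer to a prokaryotic ORF scan than to real eukaryotic gene finding. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def orfs_in_frame(dna: str, frame: int, min_codons: int):
    """Yield (start, end) of ATG...stop stretches in one forward frame.

    min_codons counts every codon of the ORF, including ATG and the stop.
    """
    start = None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i                      # first ATG opens a candidate ORF
        elif start is not None and codon in STOP_CODONS:
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3         # end index is exclusive
            start = None                   # look for the next ATG


def find_orfs(dna: str, min_codons: int = 30):
    """Scan all six reading frames; longest ORFs come first.

    Each hit is (strand, frame, start, end, sequence); coordinates refer
    to the strand that was read, not always to the input strand.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame, min_codons):
                hits.append((strand, frame, start, end, seq[start:end]))
    return sorted(hits, key=lambda hit: hit[3] - hit[2], reverse=True)
```

A call such as `find_orfs(genomic_dna, min_codons=100)` would list only candidates of at least 100 codons; the cutoff is a tunable assumption, since short ORFs occur frequently by chance.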


How to predict the protein structure?

Ab initio prediction of protein structure from sequence: not yet.

Problem: the information contained in protein structures lies essentially in the conformational torsion angles. Even if we only assume that every amino-acid residue has three such torsion angles, and that each of these three can only assume one of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 3 x 3 x 3 = 27 possible conformations per residue.

For a typical 200-amino acid protein, this would give 27^200 (roughly 1.87 x 10^286) possible conformations!

If we were able to evaluate 10^9 conformations per second, this would still keep us busy for about 4 x 10^259 times the current age of the universe
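The numbers in this estimate can be checked with exact integer arithmetic. The sketch below uses the slide's assumptions (three torsion angles per residue, three values per angle, 200 residues, 10^9 evaluations per second); the age of the universe, about 13.8 billion years, is an assumption added here and is not stated on the slide.

```python
ANGLES_PER_RESIDUE = 3        # torsion angles per residue (slide's assumption)
VALUES_PER_ANGLE = 3          # "ideal" values per angle (slide's assumption)
RESIDUES = 200                # length of a "typical" protein on the slide
RATE_PER_SECOND = 10**9       # conformations evaluated per second
# Assumption added for this sketch: universe age of about 13.8e9 years.
UNIVERSE_AGE_SECONDS = 13.8e9 * 365.25 * 24 * 3600

per_residue = VALUES_PER_ANGLE ** ANGLES_PER_RESIDUE   # 3**3 = 27
total = per_residue ** RESIDUES                        # exact integer 27**200


def sci(n: int) -> str:
    """Format a large positive integer as 'm.mm x 10^e' without float overflow."""
    digits = str(n)
    mantissa = int(digits[:4]) / 1000
    return f"{mantissa:.2f} x 10^{len(digits) - 1}"


seconds_needed = total // RATE_PER_SECOND
universe_ages = seconds_needed / UNIVERSE_AGE_SECONDS

print("conformations per residue:", per_residue)
print("total conformations:", sci(total))
print(f"search time in universe ages: {universe_ages:.1e}")
```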

There are optimized ab initio prediction algorithms available, as well as fold recognition algorithms that use threading (which compares the query protein with known fold structures from databases), but the results are still very poor

Q: Can’t we just generate all these conformations, calculate their energy and see which conformation has the lowest energy?


Solution: homology modeling

Homology (comparative) modeling attempts to predict structure on the strength of a protein’s sequence similarity to another protein of known structure

Basic idea: a significant alignment of the query sequence with a target sequence from PDB is evidence that the query sequence has a similar 3-D structure (current threshold ~ 40% sequence identity). Then multiple sequence alignment and pattern analysis can be used to predict the structure of the protein
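The identity threshold mentioned above can be made concrete with a small sketch. It assumes the query and target have already been aligned by some other tool and are supplied as equal-length strings with "-" for gaps. Percent identity has several definitions in the literature; the one used here (matches divided by columns where neither sequence has a gap) is an assumption, and the function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_target: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns without gaps."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_target)
        if q != gap and t != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def is_modeling_candidate(aligned_query: str, aligned_target: str,
                          threshold: float = 40.0) -> bool:
    """Apply the slide's rough ~40% identity threshold to one alignment."""
    return percent_identity(aligned_query, aligned_target) >= threshold
```

Identity alone is a coarse criterion: a real workflow would also consider alignment length and statistical significance before trusting a template.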


Flow chart for protein structure prediction (from Mount, 2001)


Protein sequence

- partial or full sequences; predicted through gene finding


Database similarity search

- sequence is used as a query in a database similarity search against proteins in PDB


Does the sequence align with a protein of known structure?

- Yes: if the database similarity search reveals a significant alignment between the query sequence and a PDB target sequence, the alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure

- No: proceed to protein family analysis


Protein family analysis/relationship to known structure

- Family (structural context): structures that have a significant level of structural similarity but not necessarily significant sequence similarity

- the goal is to exploit these structure-sequence relationships; two questions: 1) is the new protein a member of a family, 2) does the family have a predicted structural fold?

- analyze the sequence for family-specific profiles and patterns (available databases: 3D-Ali, 3D-PSSM, BLOCKS, eMOTIF, INTERPRO, Pfam, …)

- if the family analysis reveals that the query protein is a member of a family with a predicted structural fold, multiple alignment can be used for structural modeling


- if the family analysis is unsuccessful, proceed to structural analyses


Structural analysis

- several different types of analyses to infer structural information

- the presence of small amino acid motifs in a protein can be an indicator of a biochemical function associated with a particular structure. Motifs are available from the Prosite catalog

- spacing and arrangement of amino acids (e.g. hydrophobic amino acids) provide important structural clues that can be used for modeling

- certain amino acid combinations can occur in certain types of secondary structure

- These structural analyses can provide clues as to the presence of active sites and regions of secondary structure. This information can help to identify a new protein as a member of a known structural class
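Searching a sequence for a small motif of the kind catalogued in Prosite can be sketched with regular expressions. The converter below handles only a subset of the PROSITE pattern syntax (single residues, `x`, `[...]`, `{...}`, repeat counts, and the `<`/`>` anchors when they sit outside brackets) and is an illustration, not a replacement for the official scanning tools. The example pattern is the N-glycosylation site, `N-{P}-[ST]-{P}`; the function names and the test sequence are made up.

```python
import re

_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"      # residue specification
    r"(?:\((\d+)(?:,(\d+))?\))?"            # optional (n) or (n,m) repeat
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                       # any amino acid
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except these
        else:
            core = residue                   # literal or [allowed set]
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (0-based position, matched text) for all hits, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For example, `find_motif("MKNGSAANPTQ", "N-{P}-[ST]-{P}.")` reports the site starting at position 2 and skips the asparagine that is followed by a proline.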


3-D structural analysis in lab

- proteins that fail to show any relationship to proteins of known structure are candidates for structural analyses (X-ray crystallography, NMR). There are about 600 known fold families, and new structures are frequently found to have an already known structural fold. Accordingly, protein families with no relatives of known structure may represent a novel fold


Computational modeling: summary

- Protein sequence: partial or full sequences predicted through gene finding

- Database similarity search: similarity search against proteins in PDB

- Does the sequence align with a protein of known structure? If yes, the alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure

- Protein family analysis: find structures that have a significant level of structural similarity (but not necessarily significant sequence similarity). If the query is a member of a family with a predicted structural fold, multiple alignment can be used for structural modeling

- Structural analysis: infer structural information (e.g. presence of small amino acid motifs; spacing and arrangement of amino acids; certain typical amino acid combinations associated with certain types of secondary structure); this can provide clues as to the presence of active sites and regions of secondary structure

- Structural analyses in the lab (X-ray crystallography, NMR)
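The decision logic of this summary can be written down as a small function. This is a sketch of the order of the questions only: the three inputs stand for the outcomes of the similarity search, the family analysis, and the structural analysis, which in practice come from separate tools. The enum and function names are hypothetical.

```python
from enum import Enum


class Route(Enum):
    MODEL_FROM_ALIGNMENT = "position query residues using the aligned PDB structure"
    MODEL_FROM_FAMILY = "model from a multiple alignment of the protein family"
    ASSIGN_STRUCTURAL_CLASS = "use motif and secondary-structure clues to assign a class"
    LAB_ANALYSIS = "determine the structure experimentally (X-ray crystallography, NMR)"


def choose_route(aligns_with_pdb_entry: bool,
                 family_has_predicted_fold: bool,
                 structural_clues_found: bool) -> Route:
    """Walk the summary from top to bottom and stop at the first success."""
    if aligns_with_pdb_entry:
        return Route.MODEL_FROM_ALIGNMENT
    if family_has_predicted_fold:
        return Route.MODEL_FROM_FAMILY
    if structural_clues_found:
        return Route.ASSIGN_STRUCTURAL_CLASS
    return Route.LAB_ANALYSIS
```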


Computational modeling: summary

How to predict the protein structure?

Ab initio prediction of protein structure from sequence

Homology (comparative) modeling attempts to predict structure on the strength of a protein’s sequence similarity to another protein of known structure

Experimental structure determination


[Figure: ab initio prediction, homology modeling, and experimental structure determination]


Viewing protein structures

A number of molecular viewers are freely available and run on most computer platforms and operating systems

Examples:

Cn3D 4.0 (stand-alone)

Rasmol (stand-alone)

Chime (Web browser plug-in based on RasMol)

Swiss 3D viewer Spdbv (stand-alone)

All these viewers can use the PDB identification code or the structural file from PDB
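What a viewer reads from a structural file can be sketched by parsing the coordinate records directly. The example assumes the legacy fixed-column PDB text format, in which each `ATOM` line stores the atom name, residue, chain, and x/y/z coordinates at fixed character positions. It ignores `HETATM` records, alternate locations, and multiple models, and the names `Atom`, `parse_atoms`, and `alpha_carbons` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Extract ATOM records from the lines of a legacy-format PDB file."""
    atoms = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue                          # skip headers, HETATM, etc.
        atoms.append(Atom(
            serial=int(line[6:11]),           # columns 7-11
            name=line[12:16].strip(),         # columns 13-16
            res_name=line[17:20].strip(),     # columns 18-20
            chain=line[21],                   # column 22
            res_seq=int(line[22:26]),         # columns 23-26
            x=float(line[30:38]),             # columns 31-38
            y=float(line[38:46]),             # columns 39-46
            z=float(line[46:54]),             # columns 47-54
        ))
    return atoms


def alpha_carbons(atoms: Iterable[Atom]) -> List[Atom]:
    """Keep one backbone atom per residue: the C-alpha trace viewers often draw."""
    return [atom for atom in atoms if atom.name == "CA"]
```

With a downloaded file, usage would look like `alpha_carbons(parse_atoms(open("structure.pdb")))`, where `structure.pdb` is a placeholder file name.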