Biomolecular Modeling: building a 3D protein structure from its sequence
Prof Shoba Ranganathan
Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia &
Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore
([email protected])
Levels of protein structure
Tertiary: packing of secondary structure elements into a compact spatial unit
"Fold" or domain: this is the level to which structure prediction is currently possible
Levels of protein structure
Quaternary: assembly of homo- or heteromeric protein chains
Usually the functional unit of a protein, especially for enzymes
Structural classes
All-α (helical)
All-β (sheet)
α/β (parallel β-sheet)
α+β (antiparallel β-sheet)
Structural information
Protein Data Bank (PDB): maintained by the Research Collaboratory for Structural Bioinformatics (RCSB), http://www.rcsb.org/pdb
Over 45,744 protein structures
Also contains structures of DNA, carbohydrates, protein-DNA complexes and numerous small ligand molecules.
The PDB data
Text files
Each entry is identified by a unique 4-character code, e.g. 1emg
A 1emg entry contains:
Header information
Atomic coordinates in Å (1 Ångström = 1.0e-10 m)
PDB Header details
Identifies the molecule, any modifications and the deposition date of the PDB entry
Organism, keywords, experimental method
Authors, reference, resolution (if an X-ray structure)
Sequence, cross-references to sequence databases
HEADER    GREEN FLUORESCENT PROTEIN               12-NOV-98   1EMG
TITLE     GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T
TITLE    2 SUBSTITUTION, Q80R)
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: GREEN FLUORESCENT PROTEIN;
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R
COMPND   6 SUBSTITUTION;
COMPND   7 BIOLOGICAL_UNIT: MONOMER
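Header records like these can be read with a few string slices. The sketch below is a minimal illustration, not a full PDB parser: it assumes the fixed-column layout of the PDB format (record name in columns 1-6, free text from column 11; for HEADER, classification in columns 11-50, deposition date in 51-59 and ID code in 63-66). The function names are hypothetical, and COMPND parsing assumes a single molecule.

```python
from collections import defaultdict


def parse_header_line(line):
    """Split a PDB HEADER record into its fixed-column fields."""
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "dep_date": line[50:59].strip(),        # columns 51-59
        "id_code": line[62:66].strip(),         # columns 63-66
    }


def parse_header_records(lines):
    """Group records by name (columns 1-6) and join continuation lines.

    TITLE/COMPND text starts at column 11; the continuation number just
    before it is dropped. Words split across lines are not re-joined.
    """
    grouped = defaultdict(list)
    for line in lines:
        grouped[line[:6].strip()].append(line[10:].strip())
    return {name: " ".join(parts) for name, parts in grouped.items()}


def parse_compnd(text):
    """Turn joined COMPND text ('KEY: value; KEY: value') into a dict.

    Assumes one molecule: entries with several MOL_IDs would overwrite
    each other's keys here.
    """
    spec = {}
    for item in text.split(";"):
        if ":" in item:
            key, value = item.split(":", 1)
            spec[key.strip()] = value.strip()
    return spec


if __name__ == "__main__":
    header = "HEADER    " + "GREEN FLUORESCENT PROTEIN".ljust(40) + "12-NOV-98   1EMG"
    print(parse_header_line(header))
    records = parse_header_records([
        "COMPND    MOL_ID: 1;",
        "COMPND   2 MOLECULE: GREEN FLUORESCENT PROTEIN;",
        "COMPND   3 CHAIN: A;",
    ])
    print(parse_compnd(records["COMPND"]))
```

Slicing by column, rather than splitting on whitespace, is what the fixed-width format is designed for: free-text fields such as the classification contain spaces.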
The data itself
ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75
ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71
ATOM      3  C   SER A   2      26.659   9.634  51.463  1.00 82.64
ATOM      4  O   SER A   2      26.718   8.686  50.686  1.00 81.02
ATOM      5  CB  SER A   2      28.039  11.660  51.932  1.00 75.59
ATOM      6  OG  SER A   2      27.582  12.038  50.639  1.00 43.28
-------
ATOM   1737  CD1 ILE A 229      39.535  21.584  52.346  1.00 41.62
TER    1738      ILE A 229
Coordinates for each heavy (non-hydrogen) atom, from the first residue to the last
Any ligands (records starting with HETATM) follow the biomacromolecule
Oxygen atoms of water molecules (also HETATM records) come at the end
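The record order described above (ATOM, then ligand HETATM, then water HETATM) can be reproduced with a small fixed-column parser. This is a sketch assuming the standard PDB column positions; the `Atom` class and function names are hypothetical, and waters are assumed to be named HOH, as in current PDB files (older files may use other names, hence the `water_names` parameter).

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str       # "ATOM" or "HETATM"
    serial: int
    name: str         # atom name, e.g. "CA"
    res_name: str     # residue name, e.g. "SER"
    chain: str
    res_seq: int
    x: float          # coordinates in Å
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor


def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed PDB column layout."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def split_atoms(lines, water_names=("HOH",)):
    """Sort records into macromolecule atoms, ligand atoms and water oxygens."""
    protein, ligands, waters = [], [], []
    for line in lines:
        if line.startswith("ATOM  "):
            protein.append(parse_atom_line(line))
        elif line.startswith("HETATM"):
            atom = parse_atom_line(line)
            (waters if atom.res_name in water_names else ligands).append(atom)
    return protein, ligands, waters


if __name__ == "__main__":
    records = [
        "ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71",
        "HETATM 1800  O   HOH A 301      10.000  10.000  10.000  1.00 30.00",  # hypothetical water
        "TER    1738      ILE A 229",
    ]
    protein, ligands, waters = split_atoms(records)
    print(protein[0].name, protein[0].res_name, protein[0].x, len(ligands), len(waters))
```

TER and other non-coordinate records are simply skipped.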
Structural Families
SCOP: Structural Classification Of Proteins, http://scop.mrc-lmb.cam.ac.uk/scop
FSSP: Family of Structurally Similar Proteins, http://www.ebi.ac.uk/dali/fssp/
Proteins adopt a limited number of topologies.
Homologous sequences show very similar structures, with strong conservation of secondary structure elements and variation concentrated in non-conserved regions.
Even in the absence of sequence homology, some folds are adopted by vastly different sequences.
Structure comparison facts
The "active site" (a collection of functionally critical residues) is remarkably conserved, even when the protein fold is different.
Structural models (especially those based on homology) provide insights into the possible function of new proteins.
Implications for protein engineering, ligand/drug design and function assignment of genomic data.
Visualizing PDB information
RASMOL: most popular, available for all platforms (Sayle et al, 2005) http://www.bernstein-plus-sons.com/software/rasmol
DeepView Swiss-PDBViewer: from Swiss-Prot (Guex & Peitsch, 1997) http://tw.expasy.org/spdbv/
Chemscape Chime Plug-in: for PC and Mac http://www.mdli.com/products/framework/chemscape
PyMOL: Very good, available for all platforms (DeLano, W.L. The PyMOL Molecular Graphics System, 2002) http://pymol.sourceforge.net
Mapping Functional Regions
Immunoglobulin light chain: dimer
Hydrophobic residues in magenta
Hydrophilic and charged residues in cyan
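A property-based colouring like the one in this figure can be scripted. The sketch below assumes one common hydrophobic set (A, V, I, L, M, F, W, C; published scales disagree, especially about G, P and Y), continuous residue numbering from `first_resi`, and PyMOL's `color` command with `resi` selections joined by `+`. The function name is hypothetical.

```python
# Assumption: one common hydrophobic set; published scales disagree,
# particularly about Gly, Pro, Tyr and Cys.
HYDROPHOBIC = set("AVILMFWC")


def color_commands(sequence, first_resi=1):
    """Return PyMOL-style commands colouring hydrophobic residues magenta
    and all other (hydrophilic and charged) residues cyan.

    Residue numbers are assumed to run continuously from `first_resi`.
    """
    hydrophobic, hydrophilic = [], []
    for offset, residue in enumerate(sequence.upper()):
        target = hydrophobic if residue in HYDROPHOBIC else hydrophilic
        target.append(str(first_resi + offset))
    commands = []
    if hydrophobic:
        commands.append("color magenta, resi " + "+".join(hydrophobic))
    if hydrophilic:
        commands.append("color cyan, resi " + "+".join(hydrophilic))
    return commands


if __name__ == "__main__":
    for command in color_commands("SKGEEL", first_resi=2):
        print(command)
```

For the short hypothetical sequence SKGEEL numbered from 2, this prints `color magenta, resi 7` followed by `color cyan, resi 2+3+4+5+6`.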
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
Siblings and Cousins
Siblings or homologues: sequences with at least 30% sequence identity over an alignment length of at least 125 residues, and conservation of function.
Cousins or paralogues: < 30% identity, but with conservation of function.
Both show structural conservation.
Homologues can be located using a database search tool such as BLAST (free web server): http://www.ncbi.nlm.nih.gov/BLAST
Paralogues require a more sensitive method such as PSI-BLAST.
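The sequence half of the sibling definition (at least 30% identity over at least 125 residues) can be checked directly on a pairwise alignment. This is a sketch under one stated convention: identity is counted over columns where neither sequence has a gap, and that column count is taken as the alignment length. Other tools divide by a sequence length instead. Function names are hypothetical, and conservation of function still has to be judged separately.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity (%) over columns where neither aligned sequence has a gap.

    Returns (identity_percent, aligned_length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs), len(pairs)


def meets_sibling_threshold(aligned_a, aligned_b, min_identity=30.0, min_length=125):
    """Sequence part of the 'sibling' definition: >= 30% identity over >= 125 residues.

    Function conservation cannot be judged from the alignment; a pair that
    fails this test may still be a cousin.
    """
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and length >= min_length


if __name__ == "__main__":
    print(percent_identity("AC-D", "ACGE"))
```

In the usage line, the gapped column is ignored, leaving 2 identical residues out of 3 aligned columns.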
Multiple Sequence Alignment
Finding the best way to match the residues of related sequences:
Identical residues must be lined up
The rest should be arranged based on observed substitutions in protein families, chemical similarity and charge similarity
Where it is impossible to get the residues to line up, the biological concept of insertion/deletion is invoked: the 'gap' in alignments
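How gaps arise can be seen in a pairwise global alignment. The sketch below is the Needleman-Wunsch dynamic-programming algorithm with a linear gap penalty. It is pairwise only (MSA programs such as CLUSTALW build on pairwise alignments progressively), and the simple match/mismatch/gap scores are assumptions standing in for a substitution matrix such as BLOSUM.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score).
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap           # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = j * gap           # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align residues
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )
    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACDEFG", "ACEFG")
    print(top)
    print(bottom)
    print("score:", total)
```

Here the D has no partner, so the best alignment places a single gap opposite it (`ACDEFG` over `AC-EFG`, score 3).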
MSA Methods
CLUSTALW / CLUSTALX (Thompson et al, 1997): freely available for all platforms and one of the best alignment programs
Best results are typically obtained by combining several database search and knowledge-based tools:
3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
FUGUE http://www-cryst.bioc.cam.ac.uk/fugue/
4. Template Selection
One or many templates?
Sequence similarity: extract the template sequences and align them with the query; select the most similar structure
Completeness: missing data?
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     MET A     1
REMARK 465     THR A   230
REMARK 470   M RES CSSEQI  ATOMS
REMARK 470     GLU A   5    OE2
REMARK 470     GLU A   6    CG   CD   OE1  OE2
REMARK 470     GLU A  17    OE1
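Checking a candidate template for missing residues can be automated by scanning its REMARK 465 rows. The sketch below assumes the row layout shown above (optional model number, residue name, chain, sequence number with an optional insertion code) and matches protein 3-letter residue names only. The regular expression and function name are hypothetical, not part of any PDB library.

```python
import re

# One missing-residue row: optional model number, 3-letter residue name,
# chain ID, sequence number and optional insertion code.
_MISSING_ROW = re.compile(
    r"^REMARK 465\s+(?:\d+\s+)?([A-Z]{3})\s+(\S)\s+(-?\d+)([A-Z]?)\s*$"
)


def missing_residues(lines):
    """Return (chain, residue name, sequence number, insertion code) tuples.

    Header and explanatory REMARK 465 lines do not match the row pattern
    and are skipped.
    """
    found = []
    for line in lines:
        match = _MISSING_ROW.match(line)
        if match:
            res_name, chain, seq, icode = match.groups()
            found.append((chain, res_name, int(seq), icode))
    return found


if __name__ == "__main__":
    remark = [
        "REMARK 465 MISSING RESIDUES",
        "REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE",
        "REMARK 465   M RES C SSSEQI",
        "REMARK 465     MET A     1",
        "REMARK 465     THR A   230",
    ]
    print(missing_residues(remark))
```

Missing residues at the chain termini (as here) matter less for modelling than gaps inside the region the query needs.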
One or many templates?
X-ray or NMR?: prefer the X-ray structure with the best (numerically lowest) resolution; use X-ray structures first, then NMR; for NMR, use the average over the ensemble of models
One or many?: structurally align the Cα atoms; if 2 templates are very close, keep only one; keep templates that provide new information
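"Very close" can be made concrete as the Cα RMSD after optimal superposition. The sketch below uses the Kabsch algorithm (SVD of the covariance matrix, with a reflection correction) and then keeps a template only if it differs from every template already kept. It assumes the Cα atoms have already been equivalenced by a structural alignment (same count, same order). The 1.0 Å cutoff and the function names are hypothetical choices.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two equivalenced (N, 3) Cα arrays after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                 # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                          # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # avoid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c                 # rotate p onto q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def prune_redundant(templates, cutoff=1.0):
    """Keep a template only if it is > cutoff Å RMSD from every kept one.

    `templates` is a list of (name, coords) pairs, in order of preference.
    """
    kept = []
    for name, coords in templates:
        if all(kabsch_rmsd(coords, other) > cutoff for _, other in kept):
            kept.append((name, coords))
    return [name for name, _ in kept]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(20, 3))
    theta = np.pi / 6
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    b = a @ rz.T + 5.0                       # same structure, moved and rotated
    c = a + rng.normal(scale=3.0, size=a.shape)
    print(prune_redundant([("a", a), ("b", b), ("c", c)]))
```

Listing templates in order of preference (e.g. best resolution first) means the redundant copy that gets dropped is always the less preferred one.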
Many templates
The sequence alignment obtained from structure comparison of the templates (SSA) can differ from a simple sequence alignment (SA).
For model building:
1. align the templates structurally
2. extract the corresponding SSA
5. Aligning the Query Sequence to the Template Structure(s)
Query - Template Alignment
>40% identity: any alignment method is OK. Below this, checks are essential:
Collect close sequence homologues (about 10) and align them to the query to get an MSA (multiple sequence alignment)
Collect several structural templates (at least 5) and align them using structure comparison methods; extract the SSA (structural sequence alignment)
Align the MSA to the SSA using profile alignment
Extract the query and selected template(s) from the final alignment: the QTA (query-template alignment).
QTA Checks
Residue conservation checks
Functional regions: are patterns/motifs conserved?
Indels: combine gaps separated by only a few residues
Editing the alignment: move gaps from secondary structures to loops; within loops, move gaps to the loop ends, i.e. the turnaround point of the backbone
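The indel checks above can be flagged automatically before editing by hand. The sketch below finds runs of gaps in one aligned row, reports neighbouring runs separated by only a few residues (candidates for combining), and reports runs that fall inside a helix or strand. It assumes a secondary-structure string for the template given per alignment column, using H for helix, E for strand and C for coil (e.g. derived from DSSP). The `max_separation` default and the function names are hypothetical.

```python
import re


def gap_runs(aligned):
    """(start, end) column ranges of each run of '-' characters; end is exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", aligned)]


def close_gap_pairs(aligned, max_separation=3):
    """Neighbouring gap runs separated by at most `max_separation` residues."""
    runs = gap_runs(aligned)
    return [(a, b) for a, b in zip(runs, runs[1:]) if b[0] - a[1] <= max_separation]


def gaps_in_secondary_structure(aligned, sse):
    """Gap runs overlapping a helix (H) or strand (E) in the per-column `sse` string."""
    if len(aligned) != len(sse):
        raise ValueError("alignment row and secondary-structure string must match")
    return [(s, e) for s, e in gap_runs(aligned) if any(c in "HE" for c in sse[s:e])]


if __name__ == "__main__":
    query = "ACD--EF-GHIK---L"
    template_sse = "CCCHHHHCCCCCCCCC"
    print(gap_runs(query))
    print(close_gap_pairs(query))
    print(gaps_in_secondary_structure(query, template_sse))
```

Flagged runs are only candidates: whether to merge them, or shift them into a loop, is still decided by inspecting the structure.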
Visual Inspection of Indels
2-residue deletion from sequence alignment
End-of-loop 2-residue deletion
5. Fixing the VP1 Alignment
Structural alignment of templates using VAST (Gibrat, Madej, & Bryant, 1996)
Extract the corresponding sequence alignment
Match HFMDV VP1 to the aligned templates using profile alignment in CLUSTALW
5. VP1 alignment to templates
1BEV1  14 QAAGALVAGTSTSTHSVATDSTPALQAAETGATSTARDESMIETRTIVPTHGIHETSVESFFGRSSLVGM
1EAH1  24 ---ANNLPDTQSSGPAHS-KETPALTAVETGATNPLVPSDTVQTRHVIQKRTRSESTVESFFARGACVAI
1FPN1  15 ----LVVPNINSSNPTTS-NSAPALDAAETGHTSSVQPEDVIETRYVQTSQTRDEMSLESFLGRSGCIHE
EV711  23 ALPAPTGQNTQVSSHRLDTGEVPALQAAEIGASSNTSDESMIETRCVLNSHSTAETTLDSFFSRAGLVGE

1BEV1  84 PLLAT------GTSITHWRIDFREFVQLRAKMSWFTYMRFDVEFTIIATSS-TGQNVTTEQHTTYQVMYV
1EAH1  90 IEVDND-----SKLFSVWKITYKDTVQLRRKLEFFTYSRFDMEFTFVVTSNYTDANNGHALNQVYQIMYI
1FPN1  80 SKLEVTLANYNKENFTVWAINLQEMAQIRRKFELFTYTRFDSEITLVPCISAL---SQDIGHITMQYMYV
EV711  93 IDLPLE-GTTNPNGYANWDIDITGYAQMRRKVELFTYMRFDAEFTFVACTP-----TGEVVPQLLQYMFV

1BEV1 147 PPGAPVPSNQDSFQWQSGCNPSVFADTDGPPAQFSVPFMSSANAYSTVYDGYARFM---DT---DPDRYG
1EAH1 161 PPGAPIPGKWNDYTWQTSSNPSVFYTYGAPPARISVPYVGIANAYSHFYDGFAKVPLAGQASTEGDSLYG
1FPN1 147 PPGAPVPNSRDDYAWQSGTNASVFWQHGQAYPRFSLPFLSVASAYYMFYDGYDE----------QDQNYG
EV711 157 PPGAPKPESRESLAWQTATNPSVFVKLTDPPAQVSVPFMSPASAYQWFYDGYPTFG---EHKQEKDLEYG

1BEV1 211 ILPSNFLGFMYFRTLED---AAHQVRFRIYAKIKHTSCWIPRAPRQAPYKKRYNLVFS--G-DSDRICSN
1EAH1 231 AASLNDFGSLAVRVVNDHNPTKLTSKIRVYMKPKHVRVWCPRPPRAVPYYGP-GVDYK--D-GLAP-LPG
1FPN1 207 TANTNNMGSLCSRIVTEKHIHKVHIMTRIYHKAKHVKAWCPRPPRALEYTRAHRTNFKIEDRSIQTAIVT
EV711 224 ACPNNMMGTFSVRTVGSS-KSKYPLVVRIYMRMKHVRAWIPRPMRNQNYLFKANPNYA--GNSIKPTGTS

1BEV1 275 RASLTSY 281
1EAH1 296 K-GLTTY 301
1FPN1 277 RPIITTA 283
EV711 291 RTAITT- 296

Highlighted in the original figure (colours not recoverable): strands, helices, pocket-factor binding residues
5. Model building steps
Build all 4 capsid proteins (VP1-VP4) together to ensure 3D fit
Use 1BEV alone for VP2-VP4
For VP1: use aligned 1BEV, 1EAH,