Structural bioinformatics for glycobiology

Structural bioinformatics for glycobiology. Structural glycoinformatics approaches Structural modeling – Comparative modeling of glycoproteins – Complex.

Dec 31, 2015

Page 1

Structural bioinformatics for glycobiology

Page 2

Structural glycoinformatics approaches

• Structural modeling
  – Comparative modeling of glycoproteins
  – Complex modeling: glycoprotein replacement

• Modeling of the complex of glycans and GBPs and GTs:
  – Docking
  – Analysis of interaction specificities
    • Key residues vs. specific glycan conformations

• Molecular dynamics
  – Modeling the dynamics of the recognition of glycans by GBPs
  – Modeling the enzymology of GTs: quantum mechanical calculations

Page 3

Approaches to predicting protein structures

[Flowchart: obtain sequence (target) → fold assignment (sequence-sequence alignment or sequence-structure alignment) → comparative modeling (high identity, long alignment) or ab initio modeling (low identity, fragment alignment) → build, assess model]

Page 4

Comparative modeling of proteins

• Definition: Prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using as a template a homologous protein for which an X-ray or NMR structure is available.

• Why a model: A model is desirable when neither X-ray crystallography nor NMR spectroscopy can determine the structure of a protein in time, or at all. The built model provides a wealth of information on how the protein functions, down to the level of individual residue properties, e.g. the interaction with ligands, or of GBPs/GTs with glycans.

Page 5

Comparative Modeling (or homology modeling)

Target sequence (1alc), structure unknown (??):
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Known structure (8lyz), use as template & model:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

Homologous: the two proteins share similar sequence.

Page 6

Homology models have RMSDs less than 2Å more than 70% of the time.

Homology models can be very smart!
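The RMSD quoted above measures the average distance between corresponding atoms of a model and the experimental structure. A minimal sketch in Python, assuming the two structures are already superimposed and their atoms are listed in the same order (the function name `rmsd` and the toy coordinates are illustrative, not from the slides):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: the structures are already superimposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in Å: every atom of the model is shifted by 1 Å along z.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
    print(rmsd(model, native))
```

In practice the two structures are first optimally superimposed (for example on their Cα atoms) before the deviation is computed; that step is omitted here.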

Page 7

Sequence identity implies structural similarity

[Figure: percentage sequence identity/similarity (0 to 100) plotted against number of residues aligned (0 to 250). In the upper region, sequence identity implies structural similarity; the lower region is the "don't know" region, where it is unclear whether sequence similarity implies structural similarity. (B. Rost, Columbia, New York)]

Page 8

Step 1: Fold Identification
Aim: To find one or more template structures in the Protein Data Bank (PDB)

• Improved multiple sequence alignment methods: improve sensitivity for remote homologs (PSI-BLAST, CLUSTAL)

• Pairwise sequence alignment: finds high-homology sequences (BLAST)

• Fold recognition programs: find low-homology sequences (threading, profile-profile alignment)
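To make the pairwise-alignment idea concrete, here is a minimal sketch of a global (Needleman-Wunsch) alignment that reports percent identity and the number of residues aligned. It is not how BLAST works internally (BLAST uses heuristic local alignment with substitution matrices); the scoring values and function names below are illustrative assumptions.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of residue pairs aligned without gaps)."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs), len(pairs)


if __name__ == "__main__":
    a, b = global_align("KVFGRCELAAAMKRHGL", "KQFTKCELSQNLYDIDGYG")
    print(a)
    print(b)
    print(percent_identity(a, b))
```

Both returned numbers matter: a high identity over only a few aligned residues is much weaker evidence of structural similarity than the same identity over a long alignment.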

Page 9

Step 2: Model Construction
Aim: To build three-dimensional (3D) structures of the homologous proteins, i.e. the coordinates of every atom

Approach 1: protein structure buildup: cores, loops and sidechains;

Approach 2: whole protein modeling: constraint-based optimization.

Commonly used programs:
• Modeller (http://salilab.org/modeller/)
• Swiss-model (http://swissmodel.expasy.org/)
• Geno3D (http://geno3d-pbil.ibcp.fr/)
• … …

Page 10

Step 3: Model Construction

Page 11

Modeling of glycan-protein complexes

• Template: glycan-protein complex
  – Case 1: same glycan, different protein
    • Glycoprotein replacement: comparative modeling of protein structure
    • Energy minimization, allowing structural flexibility of glycans
  – Case 2: same protein, different glycan
    • Flexible docking of glycans
  – Case 3: different protein and different glycan
    • Comparative modeling of proteins
    • Flexible docking of glycan
    • Can also be applied without a template of complex

Page 12

Flexible docking

• Semi-flexible (rigid protein, flexible ligand)
  – Useful for drug screening
  – >150 programs: Dock, AutoDock, FlexX/FlexE, …
• Flexible protein: mainly sidechains (hard)
• Two elements of semi-flexible docking algorithms
  – Ligand sampling methods
    • Pattern matching: Genetic Algorithm, Molecular Dynamics, Monte Carlo…
  – Treatment of intermolecular forces:
    • Simplified scoring functions: empirical, knowledge-based and molecular mechanics, e.g. AMBER, CHARMM, GROMOS, ...
    • Very simple treatment of solvation and entropy, or completely ignored!

Page 13

Flexible docking of glycans to proteins

• Glycan structure sampling
  – Automatic generation / sampling of 3D glycan structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)
• Docking of each glycan conformation to the GBP: scoring schemes
  – Empirical scores
  – Forcefield
    • GLYCAM: modified AMBER forcefield / MD tools for glycans (R. Woods group)
  – Challenge: water molecules

Page 14

Flexibility of molecules

• Atoms connected by covalent bonds

• Bond lengths and bond angles are rigid

• Torsion (dihedral) angles are flexible
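A torsion angle is defined by four consecutive atoms: it is the angle between the plane through atoms 1-2-3 and the plane through atoms 2-3-4. A minimal sketch of computing it from Cartesian coordinates (the function name `dihedral` and the toy points are illustrative assumptions):

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four toy points; the outer bonds are perpendicular when viewed down the central bond.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a glycosidic linkage, the four points would be the atoms named in the chosen definition of ϕ, ψ or ω.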

Page 15

Frequently used definitions of glycosidic torsion angles

Angle              | NMR style      | C − 1 crystallographic style | C + 1 crystallographic style
ϕ                  | H1—C1—O—C′x    | O5—C1—O—C′x                  | O5—C1—O—C′x
ψ                  | C1—O—C′x—H′x   | C1—O—C′x—C′x−1               | C1—O—C′x—C′x+1
ψ [(1–6)-linkage]  | C1—O—C′6—C′5   | C1—O—C′6—C′5                 | C1—O—C′6—C′5
ω [(1–6)-linkage]  | O—C′6—C′5—H′5  | O—C′6—C′5—C′4                | O—C′6—C′5—O′5


sweet2: http://www.dkfz-heidelberg.de/spec/sweet2/

Page 16

Induced fit? Rigid receptor hypothesis

Page 17

Preferred torsion angles of glycans

Page 18

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs

Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)

Page 19

M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Combine structural analysis with the glycan array analysis: providing structural insights.

Page 20

M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Ligand binding by the scavenger receptor C-type lectin (SRCL) and LSECtin

Page 21

M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Binding of multiple classes of ligands to DC-SIGN and the macrophage galactose receptor. Model of the binding site in the macrophage galactose receptor with a bound GalNAc residue, based on the structure of the galactose-binding mutant of mannose-binding protein that was created by insertion of key binding site residues from the galactose-binding receptor.

Page 22

M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Mechanisms of mannose-binding protein interaction with ligands.

Page 23

Molecular Dynamics: simulation of molecular motions

• Energy model of conformation
• Two main approaches:
  – Monte Carlo: stochastic
  – Molecular dynamics: deterministic
• Understand molecular function and interactions
  – Catalysis of enzymes
• Complementary to experiments
• Obtain a movie of the interacting molecules

Page 24

Basic Concepts of simulation of molecular motion

1. Compute energy for the interaction between all pairs of atoms.

2. Move atoms to the next state.

3. Repeat.

Page 25

Energy Function

• Target function that MD uses to govern the motion of molecules (atoms)

• Describes the interaction energies of all atoms and molecules in the system

• Always an approximation
  – Closer to real physics --> more realistic, more computation time (i.e. smaller time steps and more interactions increase accuracy)

Page 26

Scale in Simulations

[Figure: simulation methods arranged by length scale (10^-10 m to 10^-4 m) and time scale (10^-12 s to 10^-6 s). Panel labels: quantum chemistry, molecular dynamics, Monte Carlo, domain, mesoscale, continuum; annotations: F = MA, exp(-E/kT).]

Taken from Grant D. Smith, Department of Materials Science and Engineering, Department of Chemical and Fuels Engineering, University of Utah, http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt

Page 27

The energy model

http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html

The NIH Guide to Molecular Modeling

• Proposed by Linus Pauling in the 1930s
• Bond angles and lengths are almost always the same
• Energy model broken up into two parts:
  – Covalent terms
    • Bond distances (1-2 interactions)
    • Bond angles (1-3)
    • Dihedral angles (1-4)
  – Non-covalent terms
    • Forces at a distance between all non-bonded atoms

Page 28

The energy equation

Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy

These equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a force field.

Page 29

Bond Stretching Energy

kb is the spring constant of the bond.

r0 is the bond length at equilibrium.

Unique kb and r0 are assigned for each bonded pair of atom types, e.g. C-C, O-H
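A harmonic (Hooke's law) form consistent with the symbols kb and r0 defined here is shown below. This is the usual molecular-mechanics convention and is assumed rather than quoted from the slide; some force fields include an extra factor of 1/2.

```latex
E_{\text{stretch}} = \sum_{\text{bonds}} k_b \,(r - r_0)^2
```

Here r is the current bond length, so the energy is zero at r = r0 and grows quadratically as the bond is stretched or compressed.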

Page 30

Bending Energy

kθ is the spring constant of the bend.

θ0 is the bond angle at equilibrium.

Unique parameters for angle bending are assigned to each bonded triplet of atoms based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.)
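By analogy with bond stretching, a harmonic form is commonly used for angle bending. The expression below is an assumed standard form, with kθ the bending spring constant and θ0 the equilibrium bond angle:

```latex
E_{\text{bend}} = \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
```

Here θ is the current angle formed by the bonded triplet of atoms.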

Page 31

Torsion Energy

A controls the amplitude of the curve

n controls its periodicity

φ shifts the entire curve along the rotation angle axis (τ).

The parameters are determined from curve fitting.

Unique parameters for torsional rotation are assigned to each bonded quartet of atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.)
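A periodic cosine form matching the roles described for the amplitude A, the periodicity n and the phase shift (written φ below), as a function of the rotation angle (written τ), is shown here. It is an assumed standard form, not quoted from the slide.

```latex
E_{\text{torsion}} = \sum_{\text{torsions}} A \,\bigl[\, 1 + \cos(n\tau - \phi) \,\bigr]
```

For example, n = 3 with φ = 0 gives three equivalent minima per full rotation, as expected for rotation about a C-C single bond between two sp3 carbons.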

Page 32

Non-bonded Energy

A determines the degree of attractiveness

B determines the degree of repulsion

q is the charge
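With A as the attractive coefficient, B as the repulsive coefficient and q as the atomic charge, a Lennard-Jones (6-12) term plus a Coulomb term has the assumed standard form below. Units are chosen so that the Coulomb constant is 1; note that some force fields label the two Lennard-Jones coefficients the other way round.

```latex
E_{\text{non-bonded}} =
  \sum_{i} \sum_{j > i} \left( -\frac{A_{ij}}{r_{ij}^{6}} + \frac{B_{ij}}{r_{ij}^{12}} \right)
  + \sum_{i} \sum_{j > i} \frac{q_i \, q_j}{r_{ij}}
```

Here r_ij is the distance between non-bonded atoms i and j. The r^-12 term dominates at short range (repulsion), the r^-6 term at intermediate range (attraction).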

Page 33

Simulating In A Solvent

• The smaller the system, the larger the fraction of particles on the surface
  – 1000-atom cubic crystal, 49% on surface
  – 10^6-atom cubic crystal, 6% on surface
• Would like to simulate infinite bulk surrounding N-particle system
• Two approaches:
  – Implicitly
  – Explicitly
• Periodic boundary conditions

Schematic representation of periodic boundary conditions.

http://www.ccl.net/cca/documents/molecular-modeling/node9.html
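The surface fractions quoted above follow from simple counting on a cubic lattice, and periodic boundary conditions are usually implemented with the minimum image convention. A minimal sketch of both, assuming a simple cubic arrangement with n atoms per edge and a cubic box (the function names are illustrative):

```python
def surface_fraction(n_side):
    """Fraction of atoms on the surface of an n_side x n_side x n_side cubic crystal."""
    total = n_side ** 3
    interior = max(n_side - 2, 0) ** 3  # atoms not on any face
    return (total - interior) / total


def minimum_image(delta, box_length):
    """Wrap one coordinate difference so it refers to the nearest periodic image."""
    return delta - box_length * round(delta / box_length)


if __name__ == "__main__":
    print(surface_fraction(10))    # 1000 atoms: 488 of 1000 on the surface
    print(surface_fraction(100))   # 10^6 atoms: 58808 of 1000000 on the surface
    # Two atoms 9 units apart in a box of length 10 are only 1 unit apart
    # through the periodic boundary.
    print(minimum_image(9.0, 10.0))
```

With 10 atoms per edge, 10^3 - 8^3 = 488 atoms lie on the surface (about 49%); with 100 per edge, 100^3 - 98^3 = 58808 (about 6%).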

Page 34

Parameters for MD: Forcefield

• Derived from direct experimental measurements on small molecules (~10 atoms)
• Commonly used: AMBER, CHARMM, GROMOS, etc.
  – GLYCAM for MD of glycoconjugates (derived from AMBER forcefield)
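A force field is the combination of energy functions and parameters. The sketch below evaluates the four energy terms for a list of interactions, assuming harmonic stretch and bend terms, a cosine torsion term, and a Lennard-Jones plus Coulomb non-bonded term with the Coulomb constant set to 1. All numerical parameters are hypothetical placeholders, not values from AMBER, CHARMM, GROMOS or GLYCAM.

```python
import math


def stretch_energy(r, kb, r0):
    """Harmonic bond stretching: kb * (r - r0)^2."""
    return kb * (r - r0) ** 2


def bend_energy(theta, k_theta, theta0):
    """Harmonic angle bending (angles in radians)."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(tau, amplitude, n, phase):
    """Periodic torsion term: A * [1 + cos(n*tau - phase)] (radians)."""
    return amplitude * (1.0 + math.cos(n * tau - phase))


def nonbonded_energy(r, a, b, qi, qj):
    """Lennard-Jones (a attractive, b repulsive) plus Coulomb, constant = 1."""
    return -a / r ** 6 + b / r ** 12 + qi * qj / r


def total_energy(bonds, angles, torsions, pairs):
    """Sum all terms; each argument is a list of argument tuples for its term."""
    return (sum(stretch_energy(*t) for t in bonds)
            + sum(bend_energy(*t) for t in angles)
            + sum(torsion_energy(*t) for t in torsions)
            + sum(nonbonded_energy(*t) for t in pairs))


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    bonds = [(1.60, 100.0, 1.50)]                       # (r, kb, r0)
    angles = [(math.radians(112.0), 50.0, math.radians(109.5))]
    torsions = [(math.radians(60.0), 1.4, 3, 0.0)]      # (tau, A, n, phase)
    pairs = [(3.5, 500.0, 600000.0, 0.2, -0.2)]         # (r, A, B, qi, qj)
    print(total_energy(bonds, angles, torsions, pairs))
```

Real force fields differ mainly in the parameter tables (one entry per combination of atom types) and in extra terms, which is why a carbohydrate-specific parameter set such as GLYCAM is needed for glycans.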

Page 35

Monte Carlo

Explore the energy surface by randomly probing the configuration space with a Markov chain approach.

Metropolis method (avoids local minima):

1. Specify the initial atom coordinates.
2. Select atom i randomly and move it by a random displacement.
3. Calculate the change of potential energy, $\Delta E$, corresponding to this displacement.
4. If $\Delta E < 0$, accept the new coordinates and go to step 2.
5. Otherwise, if $\Delta E \ge 0$, select a random R in the range [0,1] and:
   1. If $e^{-\Delta E/kT} > R$, accept and go to step 2
   2. If $e^{-\Delta E/kT} \le R$, reject and go to step 2
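The Metropolis steps can be sketched in a few lines. Note the direction of the test: an uphill move is accepted when the Boltzmann factor exp(-ΔE/kT) exceeds the random number R, so small energy increases are accepted often and large ones rarely. The example below samples a single coordinate in a harmonic well; the potential, step size and function names are illustrative assumptions.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng=random):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / kt)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def sample_harmonic(n_steps, kt=1.0, max_step=0.5, seed=0):
    """Sample positions in the toy potential E(x) = 0.5 * x^2."""
    rng = random.Random(seed)

    def energy(x):
        return 0.5 * x * x

    x = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)  # random displacement
        if metropolis_accept(energy(trial) - energy(x), kt, rng):
            x = trial                                 # accept the move
        samples.append(x)                             # rejected moves repeat x
    return samples


if __name__ == "__main__":
    xs = sample_harmonic(20000)
    mean_sq = sum(x * x for x in xs) / len(xs)
    print(mean_sq)  # should fluctuate around kT = 1 for this potential
```

Recording the current state again after a rejected move is part of the method; dropping rejected steps would bias the sampled distribution.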

Page 36

Deterministic Approach

• Provides us with a trajectory of the system.
  – From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
  – Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.
• Typical simulations of small proteins, including surrounding solvent, cover picoseconds.

$F_i = -\dfrac{\partial E}{\partial x_i}$

$F = m a$

Page 37

Deterministic / MD methodology

• From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.

• Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.

• There are efficient methods for integrating these elementary steps with Verlet and leapfrog algorithms being the most commonly used.
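For reference, the standard update rules of the two integrators named above are given below, with position r, velocity v, acceleration a = F/m and time step Δt. These are textbook forms and are not quoted from the slides.

```latex
% Verlet (positions only)
r(t + \Delta t) = 2\,r(t) - r(t - \Delta t) + a(t)\,\Delta t^{2}

% Leapfrog (velocities at half steps)
v\!\left(t + \tfrac{\Delta t}{2}\right) = v\!\left(t - \tfrac{\Delta t}{2}\right) + a(t)\,\Delta t
\qquad
r(t + \Delta t) = r(t) + v\!\left(t + \tfrac{\Delta t}{2}\right)\Delta t
```

Both schemes are time-reversible and need only one force evaluation per step, which is why they are preferred over higher-order methods for long simulations.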

Page 38

MD algorithm

• Initialize system
  – Ensure particles do not overlap in initial positions (can use lattice)
  – Randomly assign velocities.
• Move and integrate: $\{r(t), v(t)\} \rightarrow \{r(t+\Delta t), v(t+\Delta t)\}$

Leapfrog algorithm
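A minimal sketch of leapfrog integration for a single particle in one dimension, using a harmonic spring as the force (the function name `leapfrog` and the test system are illustrative assumptions; a real MD code applies the same update to every atom with forces from the force field):

```python
import math


def leapfrog(x, v, force, mass, dt, n_steps):
    """Integrate one coordinate with the leapfrog scheme.

    x, v are the position and velocity at time 0. Velocities are kept at
    half steps internally. Returns the list of positions at 0, dt, 2*dt, ...
    """
    v_half = v + 0.5 * dt * force(x) / mass   # shift velocity to t = dt/2
    trajectory = [x]
    for _ in range(n_steps):
        x = x + dt * v_half                   # drift: positions at full steps
        v_half = v_half + dt * force(x) / mass  # kick: velocities at half steps
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Harmonic oscillator with k = m = 1: exact solution x(t) = cos(t).
    traj = leapfrog(1.0, 0.0, lambda x: -x, 1.0, 0.01, 628)
    print(traj[-1], math.cos(6.28))
```

The time step must be small compared with the fastest motion in the system; in biomolecular MD this is bond vibration, which is why time steps of about a femtosecond are typical.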

Page 39

MD studies of Prion proteins

• Prion protein (PrP) is associated with an unusual class of neurodegenerative diseases
  – Scrapie (sheep); bovine spongiform encephalopathy (BSE) in cattle; kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) in humans

• Protein-only hypothesis (Prusiner, 1982): the disease is caused by an abnormal form of the 250 amino acid PrP, which accumulates in plaques in the brain.

• The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3-D structure; FTIR and CD spectra indicate it has a significantly increased content of β-sheet conformation compared with PrPC.

• Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc).

Page 40

PrP is a glyco-protein

• Available NMR structures are for non-glycosylated PrPC only

• Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc)

• Objective: study of the influence of two N-linked glycans (Asn181 and Asn197) and of the GPI anchor attached to Ser230

Zuegg et al., Glycobiology, 2000, 10(10):959-974.

Page 41

MD simulations

• Molecular dynamics simulations on the C-terminal region of human prion protein HuPrP(90–230), with and without the three glycans
• AMBER94 force field in a periodic box model with explicit water molecules, considering all long-range electrostatic interactions
• HuPrP(127–227) is stabilized overall by addition of the glycans, specifically by extension of two helices and reduced flexibility of the linking turn containing Asn197
• The stabilization appears indirect, by reducing the mobility of the surrounding water molecules, and not from specific interactions such as H bonds or ion pairs.
  – Asn197 has a stabilizing role, while Asn181 is within a region with already stable secondary structure

Zuegg et al., Glycobiology, 2000, 10(10):959-974.

Page 42

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs

Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)

A retrospective analysis

Page 43

MD simulation of glycan binding of influenza HAs

• A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA
  – Modeling the ligand-bound state of H5N1 HA using the isolate VN1194 bound to α2,3-sialyllactose as previously crystallized
  – Excess mutual information was computed between each residue of each monomer and the corresponding bound ligand, using the average mutual information between the residue and all residues as an estimate of the “background” mutual information.
  – Combine these results with sequence analysis of H5N1 mutational data to predict clusters of residues that undergo coordinated mutation, which have some capacity to vary but are subject to selective pressure relating mutation. These residues may be richer targets to change ligand specificity than residues absolutely conserved or residues that display uncorrelated mutations (involved in immune escape).

Kasson et al., JACS, 2009, 131 (32), pp 11338–11340
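The excess mutual information described above can be illustrated with a small sketch. It assumes that each residue and the ligand have been assigned a discrete conformational state in every simulation frame; how the states are defined, and the exact estimator used in the cited work, are not given here, so the functions below are an illustrative assumption rather than the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long state sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def excess_mutual_information(residue, ligand, other_residues):
    """MI between a residue and the ligand, minus the residue's average MI
    with all other residues (used as the background estimate)."""
    background = sum(mutual_information(residue, other)
                     for other in other_residues) / len(other_residues)
    return mutual_information(residue, ligand) - background


if __name__ == "__main__":
    # Toy per-frame states: this residue tracks the ligand perfectly and is
    # independent of the one other residue shown.
    residue = [0, 1, 0, 1, 0, 1, 0, 1]
    ligand = [0, 1, 0, 1, 0, 1, 0, 1]
    others = [[0, 0, 1, 1, 0, 0, 1, 1]]
    print(excess_mutual_information(residue, ligand, others))
```

Subtracting the background removes residues that are simply highly mobile and therefore appear correlated with everything, leaving those whose motion is specifically coupled to the ligand.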

Page 44

Experimentally identified ligand-binding mutations in red, the top 5% of residues by dynamics scoring in cyan (overlap of these two in magenta), and the six mutation sites identified by both dynamics and sequence analysis in yellow.

The top three mutations from the ligand dissociation analyses in yellow. A modeled α2,3-sialyllactose is shown in orange.

Page 45

Prediction of dissociation rate for HA mutants (in silico mutagenesis)

• Bayesian analysis methods are used to predict dissociation rates based on extensive simulation of each mutant, and to evaluate whether a mutant has a faster dissociation rate than the influenza clinical isolate used as a wild-type reference.

• These simulations were used to estimate the dissociation rate for each mutation.

• The mutation sites predicted by analysis of the molecular dynamics data include both residues immediately contacting the bound glycan and residues located farther away on the globular head of the hemagglutinin molecule.
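One simple way to frame such a comparison is sketched below. It assumes dissociation is a first-order (exponential) process, so that observing n dissociation events in a total simulated time T gives a Gamma posterior on the rate under a Gamma prior. This model, the prior values and the function name are assumptions made for illustration; the slides do not specify the statistical model actually used.

```python
import random


def prob_mutant_faster(n_mut, t_mut, n_wt, t_wt,
                       prior_shape=1.0, prior_rate=1.0,
                       draws=5000, seed=0):
    """Posterior probability that the mutant dissociation rate exceeds the
    wild-type rate, estimated by sampling from the two Gamma posteriors.

    n_*: number of dissociation events observed; t_*: total simulated time.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale); scale = 1 / posterior rate.
        k_mut = rng.gammavariate(prior_shape + n_mut, 1.0 / (prior_rate + t_mut))
        k_wt = rng.gammavariate(prior_shape + n_wt, 1.0 / (prior_rate + t_wt))
        if k_mut > k_wt:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    # Hypothetical counts: 50 vs 5 dissociation events in equal simulated time.
    print(prob_mutant_faster(50, 10.0, 5, 10.0))
```

The output is a probability rather than a single rate estimate, which makes it straightforward to rank mutants by how confidently they dissociate faster than the reference.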