Bioinformatics of Disease: immune epitope prediction
Shoba Ranganathan
Professor and Chair – Bioinformatics, Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore ([email protected])
Visiting scientist @ Institute for Infocomm Research (I2R), Singapore
Bioinformatics is …..
Bioinformatics is the study of living systems through computation
Computational models can help identify T cell epitopes
Suggest candidate epitopes by in silico screening of entire proteins and even proteomes, with specificity at:
  the allele level
  the supertype level
  disease-implicated alleles alone
Minimize the number of wet-lab experiments
Cut down the lead time involved in epitope discovery and vaccine design
187 curated pMHC entries, 16 with TCR (Human: 110; Murine: 74; Rat: 3); Alleles: 40
(interface area, H bonds, gap volume and gap index)
101 new entries
187 entries (Human: 110; Murine: 74; Rat: 3)
134 non-redundant entries (class I: 100; class II: 34)
121 class I and 41 class II entries
26 HLA alleles (class I: 18; class II: 8)
14 rodent alleles (class I: 8; class II: 6)
16 TCR/peptide/MHC complexes
Distribution of MHC by allele
Peptide/MHC binding motifs
Conserved peptide properties in solution structures, classified according to alleles and peptide length
Residue classes: polar, amide, basic, acidic, hydrophobic
How to obtain structures of experimentally unsolved alleles?
1. In 2006 there were only 36 crystal structures of unique MHC alleles vs. 1765 unique MHC alleles identified in the IMGT/HLA database
2. Structure determination through experimental methods is both expensive and time-consuming
3. Homology model building for alleles with no structural data!
Introduction
Structural Immunoinformatics
Database development
Data analysis of pMHC Class I complexes
Computational models
Applications
Data & text mining, Maths/Stats, Structures
MHC Class I superfamilies have different interaction characteristics

Superfamily | HLA-A2 (36 entries) | HLA-B7 (12 entries) | HLA-B27 (18 entries)
Interface area (Å²) | 846.3±48.9 | 876.7±72.4 | 934.0±136.0
Gap volume (Å³) | 799.8±195.2 | 870.2±198.0 | 985.1±101.5
Gap index | 0.9±0.2 | 1.0±0.1 | 1.0±0.3
Hydrogen bonds | 11.1±1.9, concentrated at pockets A, B, F | 14.3±2.3, well distributed | 17.9±2.8, concentrated at pockets A, B, F
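The gap index row can be related to the other two rows with a little arithmetic. The sketch below assumes the common definition of gap index as gap volume divided by interface area; the slide does not spell the definition out, so treat that as an assumption. It uses the mean values from the table above, and the ratio of two means need not equal the mean of the per-complex ratios, so only rough agreement with the reported values should be expected.

```python
# Mean values from the table above:
# (interface area in A^2, gap volume in A^3, reported gap index)
superfamilies = {
    "HLA-A2": (846.3, 799.8, 0.9),
    "HLA-B7": (876.7, 870.2, 1.0),
    "HLA-B27": (934.0, 985.1, 1.0),
}


def gap_index(gap_volume, interface_area):
    """Gap volume per unit interface area (assumed definition)."""
    return gap_volume / interface_area


# Ratio of the mean gap volume to the mean interface area per superfamily.
ratios = {
    name: gap_index(volume, area)
    for name, (area, volume, _reported) in superfamilies.items()
}

for name, ratio in ratios.items():
    reported = superfamilies[name][2]
    print(f"{name}: ratio of means = {ratio:.2f}, reported = {reported}")
```

A smaller gap index indicates tighter packing between peptide and groove, which is why it is useful next to the raw interface area.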
Single linkage cluster analysis of 68 pMHC Class I complexes from 13 alleles (all available A and B)
Data: 68 peptide–HLA complexes spanning 13 class I alleles from MPID-T
Hierarchical clustering: agglomerative algorithm. Distances between structures were computed by the single-linkage method (MATLAB version 7.0), based on the separation between each pair of data points. Nearest neighbors were merged into clusters. Smaller clusters were then merged into larger clusters based on inter-cluster distances, until all structures were combined. The last 3 levels were considered for defining HLA class I supertypes.
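The agglomerative procedure described above can be sketched in a few lines. This is a minimal pure-Python illustration of single-linkage clustering, not the MATLAB code used in the study; the function names and the toy two-dimensional descriptor vectors are hypothetical, and real descriptors (interface area, hydrogen bonds, gap volume, gap index) would normally be scaled before distances are computed.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters=1):
    """Merge nearest clusters until n_clusters remain.

    The distance between two clusters is the smallest distance between
    any member of one and any member of the other (single linkage).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical descriptor vectors for five complexes.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(single_linkage(toy, n_clusters=3))
```

Stopping at a chosen number of clusters plays the role of inspecting the last few levels of the dendrogram.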
Binding characteristics of HLA supertypes analyzed
Interaction parameters significant for the characterization of the peptide/MHC interface:
  Intermolecular hydrogen bonds
  pMHC interface area
  Gap volume
  Gap index
Legend: B27, B44, B7, B62, B8
Do the Class I alleles aggregate into “superfamilies” using receptor-ligand interaction patterns?
80 HLA class I complexes
13 class I alleles
Five descriptors
Hierarchical clustering using nearest neighbor algorithm
77% consensus with data
MHC Class I superfamilies from receptor-ligand interactions
Legend: B27, B44, B7, B62, B8
Tong, Tan and Ranganathan (2007) Bioinformatics, 23: 177-183
Introduction
Structural Immunoinformatics
Database development
Data analysis
Computational models
Applications
Maths/Stats, Structures, Sequences, Physics/Chemistry
Two-step approach to predict MHC-binding peptides
1. Finding the best fit conformation (docking) of peptides within the MHC binding groove
2. Screening potential binders from the background
Docking is a computationally exhaustive procedure
Large number of possible peptide conformations:
  3 global translational degrees of freedom
  3 global rotational degrees of freedom
  1 conformational degree of freedom for each rotatable bond
>10^10 possible conformations for a 10-residue peptide
[Figure: peptide backbone unit (N, C, C, O, R) shown against x, y, z axes]
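The size of the search space can be illustrated with a rough count. The sketch below is only a back-of-the-envelope estimate under stated assumptions: two backbone torsions (phi, psi) per residue, one side-chain torsion per residue on average, and three discrete states per rotatable bond. None of these numbers come from the slide; they are chosen to show how quickly the count passes 10^10.

```python
def degrees_of_freedom(n_rotatable_bonds):
    """3 translational + 3 rotational + one per rotatable bond."""
    return 3 + 3 + n_rotatable_bonds


def conformation_count(n_rotatable_bonds, states_per_bond=3):
    """Number of discrete conformers if each bond samples a few states."""
    return states_per_bond ** n_rotatable_bonds


n_residues = 10
backbone_torsions = 2 * n_residues    # phi and psi per residue (assumption)
side_chain_torsions = 1 * n_residues  # one chi angle per residue (assumption)
rotatable = backbone_torsions + side_chain_torsions

print("degrees of freedom:", degrees_of_freedom(rotatable))
print("internal conformers:", conformation_count(rotatable))
print("exceeds 10^10:", conformation_count(rotatable) > 10 ** 10)
```

Rigid-body placement multiplies this count further, which is the motivation for reducing the search space before docking.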
Conservation of nonamer peptide backbone conformation
Class I peptides
  N-termini residues: 0.02 – 0.29 Å
  C-termini residues: 0.00 – 0.25 Å
Class II binding registers (only 9 residues fit in the binding groove)
  N-termini residues: 0.01 – 0.22 Å
  C-termini residues: 0.02 – 0.27 Å
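Because only 9 residues fit in the class II groove, a longer peptide offers several candidate binding registers, one per 9-residue window. The sketch below enumerates them; the function name is hypothetical, positions are 1-based and inclusive, and the example sequence is the A-gliadin 56-70 peptide that appears among the ligands in this deck.

```python
def binding_registers(peptide, core_length=9):
    """List every contiguous core of core_length residues.

    Returns tuples (start, end, core) with 1-based inclusive positions.
    A peptide shorter than core_length yields an empty list.
    """
    return [
        (i + 1, i + core_length, peptide[i:i + core_length])
        for i in range(len(peptide) - core_length + 1)
    ]


a_gliadin_56_70 = "LQLQPFPQPQPFPPL"
for start, end, core in binding_registers(a_gliadin_56_70):
    print(f"aa {start}-{end}: {core}")
```

Each register would then be docked and scored separately, so a 15-residue peptide needs seven evaluations rather than one.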
Rapid docking of peptide to MHC
Tong, Tan & Ranganathan (2004) Protein Sci. 13:2523-2532
1. Anchoring root fragments to reduce search space (pseudo-Brownian rigid body docking)
2. Loop modeling (loop closure of central backbone by satisfaction of spatial restraints)
3. Ligand backbone and side-chain refinement (entire backbone and interacting side-chains)
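Benchmarking with existing techniques

Author | Technique | Peptide | RMSD (a) | RMSD (b)
Rognan et al. | Simulated Annealing | TLTSCNTSV | 1.04 | 0.46
Rognan et al. | Simulated Annealing | FLPSDFFPSV | 1.59 | 1.10
Rognan et al. | Simulated Annealing | GILGFVFTL | 0.46 | 0.32
Rognan et al. | Simulated Annealing | ILKEPVHGV | 0.87 | 0.87
Rognan et al. | Simulated Annealing | LLFGYPVYV | 0.78 | 0.33
Desmet et al. | Combinatorial Buildup Algorithm | RGYVYQGL | 0.56 | 0.32
Rosenfeld et al. | Multiple Copy Algorithm | FAPGNYPAL | 2.70 | 0.40
Rosenfeld et al. | Multiple Copy Algorithm | GILGFVFTL | 1.40 | 0.32
Sezerman et al. | Combinatorial Buildup Algorithm | LLFGYPVYV | 1.40 | 0.33
Sezerman et al. | Combinatorial Buildup Algorithm | ILKGPVHGV | 1.30 | 0.87
Sezerman et al. | Combinatorial Buildup Algorithm | GILGFVFTL | 1.60 | 0.32
Sezerman et al. | Combinatorial Buildup Algorithm | TLTSCNTSV | 2.20 | 0.46

(a) RMSD of peptide backbone obtained from respective authors.
(b) RMSD of peptide backbone obtained in our work from redocking bound complexes and single template respectively.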
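The RMSD columns compare a modeled peptide backbone with the experimentally determined one. A minimal sketch of that measure follows. It assumes both coordinate sets list the same atoms in the same order and are already in the same reference frame (for example, both expressed relative to the MHC), so no superposition step is performed. The coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical backbone atoms: the docked copy is shifted by (0.3, 0.4, 0.0).
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(x + 0.3, y + 0.4, z) for x, y, z in crystal]
print(f"backbone RMSD: {rmsd(crystal, docked):.2f}")
```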
Quantitative separation of binders from non-binders: empirical free energy scoring function
Gbind = binding free energy
GH = hydrophobic term
GS = decrease in side chain entropy
GEL = electrostatic term
C = entropy change in system due to external factors
α, β, γ optimized by least-square multivariate regression with experimental binding affinities (IC50) of MHC-peptides in training dataset (Rognan et al., 1999)
Gbind ≈ -RT ln (IC50) (Rognan et al., 1999).
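The fitting step can be sketched as an ordinary least-squares problem. The slide text lists the terms and coefficients but not the equation itself, so the linear form Gbind = α·GH + β·GS + γ·GEL + C used below is an assumption inferred from those definitions. The solver, function names and synthetic training values are all hypothetical and stand in for the real training dataset.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_scoring_function(terms, g_bind):
    """Least-squares fit of (alpha, beta, gamma, C) via the normal equations.

    terms: list of (g_h, g_s, g_el) per peptide; g_bind: target energies.
    """
    rows = [list(t) + [1.0] for t in terms]  # trailing 1.0 carries the constant C
    k = 4
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, g_bind)) for i in range(k)]
    return tuple(solve_linear(ata, atb))


def score(term, coeffs):
    """Evaluate the assumed linear scoring function for one peptide."""
    alpha, beta, gamma, c = coeffs
    g_h, g_s, g_el = term
    return alpha * g_h + beta * g_s + gamma * g_el + c


# Synthetic training data generated from known coefficients.
true_coeffs = (0.5, -1.0, 2.0, -3.0)
train_terms = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (2, 1, 3), (1, 3, 2)]
train_g = [score(t, true_coeffs) for t in train_terms]
fitted = fit_scoring_function(train_terms, train_g)
print([round(v, 6) for v in fitted])
```

With real data the targets would be derived from measured IC50 values, and the fitted coefficients would be checked on peptides held out of training.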
Test case: MHC Class II DQ8
DQ3.2 (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases:
  Celiac disease
  Insulin-dependent diabetes mellitus
  IDDM-associated periodontal disease
  Autoimmune polyendocrine syndrome type II
Data used
  Structure: 1JK8 - DQ3.2β–insulin B9-23 complex
  Dataset I: 127 peptides with experimentally determined
Ligands / Epitopes

LQLQPFPQPQPFPPL | A-gliadin 56-70 | -41.01 | 20
DMTPADALDDFDL | HSV | -40.53 | 173
AAAAAVAAEAY | Artificial sequence | -39.98 | 48
GVAGLLVALAV | IA-2 499-509 | -36.16 | 95
DSNIMNSINNVMDEIDFFEK | Pf ABRA 487–506 | -36.01 | 171
FESTGNLIAPEYGFKISY | HA 255–271Y | -35.70 | 62
YPFIEQEGPEFFDQE | MHC Ia 51–63 analog | -35.34 | 1156
LLDILDTAGLEEYSAMRD | p21 51–66; C out | -35.27 | 202
QPYPQPQPFPSQQPY | A-gliadin 41-55 | -35.26 | 1120
FPSQQPYLQLQPFPQ | A-gliadin 49-63 | -33.93 | 20
CDGERPTLAFLQDVM | GAD 101–115 | -33.57 | 69
SFPPQQPYPQPQPQY | A-gliadin 77-91 | -33.35 | 370
SQDLELSWNLNGLQADLSS | FceR 104–122 | -32.89 | 123
EPRAPWIEQEGPEYW | MHC Ia 46-63 | -32.89 | 519
PPLYATGRLSQAQLMPSPPM | VP16 | -32.59 | 538
SQDLELSWNLNGLQAY | FceR 104–122 analog | -32.49 | 118
IARAKMFPAVAEK | 34P3A | -31.91 | 541
Test Set 1: Improved detection of binders lacking position-specific binding motifs
Binding registers: 20/23 (87%) binding registers
Only register (aa 4-12) from Test Set 2 (Der p 2: 1-20)
(SE=0.80; SP(LMH)=0.90)
Top 5 predictions are experimental positives at very stringent threshold criteria (SE=0.95; SP(H)=0.63)
T-cell proliferation
Multiple registers (SP=0.95, SE(LMHP =0.81): 58% of Test Set 1)
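The SE and SP figures quoted above can be reproduced from a list of predicted scores and experimental labels once a threshold is chosen. The sketch below assumes SE and SP denote sensitivity and specificity, and that a more negative score means a stronger predicted binder, so a peptide is called a binder when its score is at or below the threshold. The scores, labels and threshold are invented for illustration.

```python
def sensitivity_specificity(scores, is_binder, threshold):
    """Return (SE, SP) when peptides with score <= threshold are called binders."""
    tp = fp = tn = fn = 0
    for s, positive in zip(scores, is_binder):
        predicted = s <= threshold
        if predicted and positive:
            tp += 1
        elif predicted and not positive:
            fp += 1
        elif not predicted and positive:
            fn += 1
        else:
            tn += 1
    se = tp / (tp + fn)  # fraction of true binders recovered
    sp = tn / (tn + fp)  # fraction of non-binders correctly rejected
    return se, sp


# Hypothetical predicted scores and experimental labels.
toy_scores = [-40.0, -36.0, -33.0, -30.0, -25.0, -20.0]
toy_labels = [True, True, False, True, False, False]
se, sp = sensitivity_specificity(toy_scores, toy_labels, threshold=-32.0)
print(f"SE={se:.2f}, SP={sp:.2f}")
```

Moving the threshold trades one measure against the other, which is why the slides quote SE at a fixed SP and vice versa.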
[Figure: number of peptides vs. number of binding registers, for weak, medium and strong binders]
Mainly for medium and high binders
Experimental support: Sinha et al. for DRB1*0402
Is this why binding motifs are unsuccessful?
Introduction
Structural Immunoinformatics
Database development
Data analysis
Computational models developed
Applications
Pemphigus vulgaris (PV)
Autoimmune blistering skin disorder
Characterized by autoantibodies targeting desmoglein-3 (Dsg3)
Strong association with DR4 and DR6 alleles
http://www.medscape.com
adam.about.com
www.aafp.org
Who are the major players in PV?
DR4 PV implicated alleles (for Semitic) possess > 2 binding registers
66% (354/539) bind both alleles at different registers
Similar proportion (70%) detected in known binders to both alleles
Both alleles bind similar peptides via different binding registers
[Figure: number of peptides vs. number of binding registers for DQB1*0503 and DRB1*0402]
What next?
We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values.
The model yields excellent results for test data (AROC=0.93).
Application to determine immunological hot spots for HIV-1 p24gag and gp160env glycoproteins shows binding energies similar to HLA-A and -B.
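The AROC figure summarizes how well predicted scores rank binders ahead of non-binders. A minimal sketch of the rank-based calculation follows: it counts, over all binder/non-binder pairs, how often the binder receives the better score, with ties counting as half. It assumes that a lower (more negative) score is better; the numbers are invented and do not come from the HLA-C model.

```python
def auc(binder_scores, nonbinder_scores):
    """Area under the ROC curve from pairwise comparisons.

    A pair counts as a win when the binder has the lower score;
    ties contribute half a win.
    """
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical predicted energies for known binders and non-binders.
toy_binders = [-40.0, -36.0, -30.0]
toy_nonbinders = [-33.0, -25.0, -20.0]
print(f"AUC = {auc(toy_binders, toy_nonbinders):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.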
Conclusions
Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data.
While computations can never completely replace “wet-lab” experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.
1. Genome analysis
Approaches: EST analysis; annotation pipeline using workflow strategies
Applications: parasitic nematodes; cancer EST data
Outcomes: comprehensive annotation at the gene and protein levels; novel and/or pathogen-specific genes; immune response evasion strategies
2. Transcriptome analysis
Approaches: graph formalism for alternative splicing; genome-wide analysis
Applications: Drosophila genome; chicken compared to human and mouse; kallikrein variants as markers
Outcomes: new mRNA-gDNA alignment method, MGAlign & MGAlignIt; first splicing graph database, DEDB; web server for splicing graphs, ASGS; sub-graph elements for alternative splicing; multi-species splicing graph database, GraphDB
3. Protein/Proteome research: Origin and evolution of structural domains
Approaches: intron mapping to domain boundary; all eukaryotic proteins analyzed
Applications: domain prediction in EST/genome data; effect of splice variants on domains
Outcomes: new database of protein coding genes, XPro; visualization of intronic locations on protein structural domains, XDomView; analysis tool, Go Module Viewer
3. Protein/Proteome research: Small disulfide-rich proteins (<100 aa per domain; ≥ 2 SS bonds)
Approaches: multiple structure alignment and hierarchical classification; comparative modeling rules; sequence, structure and evolutionary analysis of Potato II inhibitor family
Applications: design of protease inhibitors, channel modulators, growth regulators
Outcomes: new database, DSFD; server for model building, SDPMOD; understanding of wound-induced protease inhibitor folding
3. Protein/Proteome research: Protease cleavage site prediction
Approaches: detailed structural modeling and docking of signal peptide moiety to signal peptidase I; SVM for caspases
Applications: enhanced production of therapeutic and commercial heterologous proteins; apoptosis initiation
Outcomes: new databases, SPdb, CasBase; server for caspase cleavage prediction, CASVM; signal peptide cleavage prediction (under development)
4. Systems Biology
Approaches: holistic computational, molecular biology and FRET study to locate secretion roadblocks; EST analysis of host-parasite interactions
Applications: Trichoderma reesei as fungal bioreactor; parasites that lead to liver cancer (food-borne trematode Opisthorchis viverrini) and bladder cancer (Schistosoma haematobium)
Outcomes: improved heterologous protein production using filamentous fungi; understanding of how parasites evade host immune activation
6. Genome-Phenome mapping
Approaches: mutation data for non-laboratory animals; mapping to OMIM; mapping to structure
Applications: OMIA-OMIM mapping to structure; correlation between genotype and disease phenotype
Outcomes: OMIA database, with links to OMIM (courtesy NCBI); mutations linked to severity of disease for α-D-mannosidosis; predictions of new human disease mutations from known mutation sites in cow, cat and guinea pig
7. Biodiversity Informatics: Customary medicinal plants
Approaches: integrating, visualizing and analyzing ethnobotanical, phytochemical and pharmacological data on customary medicinal plants; data from Australian aboriginal elders and Indian Siddha doctors
Applications: novel antimicrobial, anti-inflammatory and anti-cancer lead compounds
Outcomes: CMkb, an integrated knowledgebase
Dedications
Prof. Bernard Pullman
Mme. Alberte Pullman
My brother, a CML survivor
Acknowledgements
Dr. (Victor) J.C. Tong, NUS & I2R, Singapore
A/Prof. Tin Wee Tan, NUS
Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA
Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN266200400085C).