Bioinformatics of Disease: immune epitope prediction Shoba Ranganathan Professor and Chair – Bioinformatics Dept. of Chemistry and Biomolecular Sciences & Adjunct Professor Biotechnology Research Institute Dept. of Biochemistry Macquarie University Yong Loo Lin School of Medicine Sydney, Australia National University of Singapore, Singapore ([email protected]) ([email protected]) Visiting scientist @ Institute for Infocomm Research (I 2 R), Singapore
Bioinformatics is ….. Bioinformatics is the study of living
Suggest candidate epitopes by in silico screening of entire proteins and even proteomes, with specificity at:
  the allele level
  the supertype level
  disease-implicated alleles alone
Minimize the number of wet-lab experiments
Cut down the lead time involved in epitope discovery and vaccine design
Computational models can help identify T cell epitopes
(interface area, H bonds, gap volume and gap index)
[Figure: bar chart of number of entries (y-axis, 0 to 40) per MHC allele (x-axis).
HLA class I: A*0101, A*0201, A*6801, B*1501, B*3501, B*0801, B*2705, B*2709, B*4402, B*4403, B*4405, B*5101, B*5301, Cw*0304, Cw*0401, E*0103, E*0101, G*0101
HLA class II: DRB1*0101, DRB1*0301, DRB1*1501, DRB5*0101, DQB1*0302, DQB1*0602, DRB1*0401, DQB1*0201
Rodent class I: RT1.Aa, RT1-A1C, H2-Db, H2-Dd, H2-Kb, H2-Ld, H2-M3, H2-Qa-2
Rodent class II: I-Ak, I-Ab, I-Ad, I-Au, I-Ek, I-Ag7]
101 new entries
187 entries (Human: 110; Murine: 74; Rat: 3)
134 non-redundant entries (class I: 100; class II: 34)
121 class I and 41 class II entries
26 HLA alleles (class I: 18; class II: 8)
14 rodent alleles (class I: 8; class II: 6)
16 TCR/peptide/MHC complexes
Distribution of MHC by allele
Peptide/MHC binding motifs
Conserved peptide properties in solution structures
Classified according to: alleles; peptide length
Polar Amide Basic Acidic Hydrophobic
1. In 2006, there were only 36 crystal structures of unique MHC alleles vs. 1765 unique MHC alleles identified in the IMGT/HLA database
2. Structure determination through experimental methods is both expensive and time-consuming
3. Homology model building for alleles with no structural data!
How to obtain structures of experimentally unsolved alleles?
Introduction
Structural Immunoinformatic Database development
Data Analysis of pMHC Class I complexes
Computational models
Applications
Data & text mining
Maths/Stats
Structures
MHC Class I superfamilies have different interaction characteristics

Superfamily            HLA-A2 (36 entries)               HLA-B7 (12 entries)    HLA-B27 (18 entries)
Interface area (Å²)    846.3±48.9                        876.7±72.4             934.0±136.0
Gap volume (Å³)        799.8±195.2                       870.2±198.0            985.1±101.5
Gap index              0.9±0.2                           1.0±0.1                1.0±0.3
Hydrogen bonds         11.1±1.9                          14.3±2.3               17.9±2.8
H-bond distribution    Concentrated at pockets A, B, F   Well distributed       Concentrated at pockets A, B, F
Single linkage cluster analysis of 68 pMHC Class I complexes from 13 alleles (all available A and B)
Data
  68 peptide–HLA complexes spanning 13 class I alleles from MPID-T
Hierarchical clustering
  Hierarchical clustering using the agglomerative algorithm.
  Distance between structures computed by the single-linkage method (MATLAB version 7.0), based on the separation between each pair of data points.
  Nearest neighbors merged into clusters.
  Smaller clusters were then merged into larger clusters based on inter-cluster distances, until all structures were combined.
  Last 3 levels considered for defining HLA class I supertypes.
Interaction parameters
  Significant for the characterization of the peptide/MHC interface:
  intermolecular hydrogen bonds; pMHC interface area; gap volume; gap index
Binding characteristics of HLA supertypes analyzed
Details
[Figure legend: supertypes B27, B44, B7, B62, B8]
Do the Class I alleles aggregate into “superfamilies” using receptor-ligand interaction patterns?
80 HLA class I complexes
13 class I alleles
Five descriptors
Hierarchical clustering using nearest neighbor algorithm
77% consensus with data
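The nearest-neighbor (single-linkage) merging described on these slides can be sketched in a few lines. This is only an illustration, not the MATLAB analysis used in the talk: the function name `single_linkage` is hypothetical, the three input rows are the superfamily means quoted in this deck (interface area, hydrogen bonds, gap volume, gap index) rather than the individual complexes, and the descriptors are left unscaled, so gap volume dominates the distance.

```python
from itertools import combinations
from math import dist


def single_linkage(points, labels):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels.
    """
    coords = dict(zip(labels, points))
    clusters = [(label,) for label in labels]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(coords[x], coords[y]) for x in a for y in b)

    while len(clusters) > 1:
        # Merge the two clusters that are nearest neighbors.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return merges


# Toy input: superfamily means from this deck, used only for illustration.
labels = ["HLA-A2", "HLA-B7", "HLA-B27"]
points = [
    (846.3, 11.1, 799.8, 0.9),
    (876.7, 14.3, 870.2, 1.0),
    (934.0, 17.9, 985.1, 1.0),
]
merges = single_linkage(points, labels)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

With real data each descriptor would likely be standardized before computing distances, so that no single descriptor dominates the clustering.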
Authors             Method                            Peptide      RMSD(a)   RMSD(b)
Desmet et al.       Combinatorial Buildup Algorithm   RGYVYQGL     0.56      0.32
Rosenfeld et al.    Multiple Copy Algorithm           FAPGNYPAL    2.70      0.40
                                                      GILGFVFTL    1.40      0.32
Sezerman et al.     Combinatorial Buildup Algorithm   LLFGYPVYV    1.40      0.33
                                                      ILKGPVHGV    1.30      0.87
                                                      GILGFVFTL    1.60      0.32
                                                      TLTSCNTSV    2.20      0.46
(a) RMSD of peptide backbone obtained from respective authors.
(b) RMSD of peptide backbone obtained in our work from redocking bound complexes and single template respectively.
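For reference, a backbone RMSD of the kind compared above can be computed as follows. This is a minimal sketch that assumes the docked and reference peptides are already in the same coordinate frame (no superposition step) and that atoms are listed in the same order; the function name and the toy coordinates are hypothetical.

```python
from math import sqrt


def backbone_rmsd(reference, model):
    """Root-mean-square deviation between two equally long lists of (x, y, z) atoms."""
    if len(reference) != len(model) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for ref_atom, model_atom in zip(reference, model):
        # Squared Euclidean distance between corresponding atoms.
        squared_sum += sum((r - m) ** 2 for r, m in zip(ref_atom, model_atom))
    return sqrt(squared_sum / len(reference))


# Toy coordinates: every atom displaced by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(backbone_rmsd(reference, model))
```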
Quantitative separation of binders from non-binders: empirical free energy scoring function
Gbind = binding free energy
GH = hydrophobic term
GS = decrease in side chain entropy
GEL = electrostatic term
C = entropy change in system due to external factors
α, β, γ optimized by least-squares multivariate regression with experimental binding affinities (IC50) of MHC-peptides in the training dataset (Rognan et al., 1999)
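The equation itself did not survive in this transcript. The term definitions suggest a weighted sum; as a sketch, assuming α, β and γ weight the hydrophobic, side-chain entropy and electrostatic terms in that order (the pairing is an assumption), it would read:

```latex
G_{bind} = \alpha\, G_{H} + \beta\, G_{S} + \gamma\, G_{EL} + C
```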
Gbind ≈ -RT ln (IC50) (Rognan et al., 1999).
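The relation above can be evaluated numerically. A minimal sketch, assuming IC50 is expressed in mol/L and T = 298.15 K (neither is stated on the slide). Sign conventions differ between sources: the slide writes -RT ln(IC50), while the hypothetical function below returns RT ln(IC50), which is negative for sub-molar IC50 values and so has the same sign as the binding energies reported elsewhere in this deck. Treat the sign choice as an assumption.

```python
from math import log

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ/(mol*K)


def binding_energy_from_ic50(ic50_nm, temperature_k=298.15):
    """Approximate binding free energy (kJ/mol) from an IC50 given in nM.

    Assumption: G = RT ln(IC50), with IC50 converted to mol/L.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_nm * 1e-9
    return GAS_CONSTANT_KJ * temperature_k * log(ic50_molar)


# A 20 nM binder comes out near -43.9 kJ/mol under these assumptions.
print(round(binding_energy_from_ic50(20), 1))
```

Lower IC50 values give more negative energies, which is why more negative thresholds select stronger binders.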
Test case: MHC Class II DQ8
DQ3.2β (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases:
  celiac disease
  insulin-dependent diabetes mellitus
  IDDM-associated periodontal disease
  autoimmune polyendocrine syndrome type II
Data used
Structure: 1JK8 - DQ3.2β–insulin B9-23 complex
Dataset I: 127 peptides with experimentally determined
Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, with 7 peptides eliciting DQ3.2β-restricted T-cell proliferation.
Training
  56 binding conformations with known registers
  30 non-binding conformations from 3 non-binders
Testing
  Test set 1: 68 peptides from biochemical studies (16 strong; 13 medium; 21 weak; 18 non-binders)
  Test set 2: 12 peptides from functional studies (7 elicit T-cell proliferation)
Scoring: Training & testing datasets
E285B 112-126 peptide: YQTIEENIKIFEEDA

Core sequence   Binding energy
YQTIEENIK       -23.12
QTIEENIKI       -21.34
TIEENIKIF       -25.32
IEENIKIFE       -29.53
EENIKIFEE       -32.27
ENIKIFEED       -21.72
NIKIFEEDA       -22.95
Screening class II binding register: a sliding window approach
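The sliding-window screen above can be sketched as follows: every 9-residue core of the peptide is scored and the lowest-energy core is taken as the predicted register. In this sketch the docking-based scoring step is replaced by a lookup of the energies listed on the slide; `sliding_cores`, `lookup_score` and `best_register` are hypothetical names.

```python
def sliding_cores(peptide, core_length=9):
    """Return (start_index, core) for every window of core_length residues."""
    last_start = len(peptide) - core_length
    return [(start, peptide[start:start + core_length]) for start in range(last_start + 1)]


# Binding energies copied from the slide; they stand in for a real scoring function.
SLIDE_ENERGIES = {
    "YQTIEENIK": -23.12,
    "QTIEENIKI": -21.34,
    "TIEENIKIF": -25.32,
    "IEENIKIFE": -29.53,
    "EENIKIFEE": -32.27,
    "ENIKIFEED": -21.72,
    "NIKIFEEDA": -22.95,
}


def lookup_score(core):
    """Placeholder scorer: returns the tabulated energy for a core."""
    return SLIDE_ENERGIES[core]


def best_register(peptide, score, core_length=9):
    """Return (start_index, core, energy) of the lowest-energy window."""
    energy, start, core = min(
        (score(core), start, core) for start, core in sliding_cores(peptide, core_length)
    )
    return start, core, energy


cores = sliding_cores("YQTIEENIKIFEEDA")
best = best_register("YQTIEENIKIFEEDA", lookup_score)
print(best)
```

The start index is 0-based within the 15-mer, so index 4 corresponds to the fifth residue of the peptide.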
Docking
A. Anchoring root fragments (probes) to reduce search space
B. Loop modeling
C. Refinement of binding register
D. Extension of flanking residues for MHC Class II
4-step protocol used
Sensitivity (SE) = fraction of binders correctly predicted = TP/AP, where AP = TP + FN
Specificity (SP) = fraction of non-binders correctly predicted = TN/AN, where AN = TN + FP
Accuracy estimates
Area under the ROC (receiver operating characteristic) curve: >90% excellent; >80% good
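These accuracy measures can be computed directly from predicted binding energies. A minimal sketch, assuming a peptide is predicted to bind when its energy is at or below a threshold (more negative means stronger binding) and using the rank-based (Mann-Whitney) form of the area under the ROC curve. The function names and the toy energies are hypothetical.

```python
def confusion_counts(binder_energies, nonbinder_energies, threshold):
    """Count TP, FN, TN, FP when 'energy <= threshold' means predicted binder."""
    tp = sum(e <= threshold for e in binder_energies)
    fn = len(binder_energies) - tp
    fp = sum(e <= threshold for e in nonbinder_energies)
    tn = len(nonbinder_energies) - fp
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def area_under_roc(binder_energies, nonbinder_energies):
    """Probability that a random binder scores lower than a random non-binder."""
    wins = 0.0
    for b in binder_energies:
        for n in nonbinder_energies:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5  # ties count as half
    return wins / (len(binder_energies) * len(nonbinder_energies))


# Toy energies in kJ/mol, invented for illustration only.
binders = [-35.0, -33.0, -31.0, -28.0]
nonbinders = [-30.0, -27.0, -25.0]
tp, fn, tn, fp = confusion_counts(binders, nonbinders, threshold=-29.10)
print(sensitivity(tp, fn), specificity(tn, fp), area_under_roc(binders, nonbinders))
```

Moving the threshold to more negative values trades sensitivity for specificity, which is the trade-off tabulated for the training set.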
Results for Training set

Specificity (SP) level   Group   Sensitivity (SE)   Binding energy threshold (kJ/mol)
SP = 0.80                LMH     0.90               -28.70
                         MH      0.85               -29.10
                         H       0.75               -30.82
SP = 0.90                LMH     0.84               -29.10
                         MH      0.77               -30.50
                         H       0.75               -32.74
SP = 0.95                LMH     0.81               -29.93
                         MH      0.73               -32.12
                         H       0.63               -33.59

High SE (good for most predictions)
Very few FPs, but also fewer predictions

Group   LMH    MH     H
AROC    0.88   0.93   0.93
Screening class II binding register: HLA-DQ8 prediction accuracy for Test Set I
Classification of binding peptides
High-affinity binders (H): IC50 ≤ 500 nM
Medium-affinity binders (M): 500 nM < IC50 ≤ 1500 nM
Low-affinity binders (L): 1500 nM < IC50 ≤ 5000 nM
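The affinity classes above translate into a simple lookup. A minimal sketch: the function and group names are hypothetical, and labelling anything above 5000 nM as a non-binder is an assumption, since the slide only defines the three binder classes.

```python
def affinity_class(ic50_nm):
    """Map an IC50 value in nM to H, M, L or 'non-binder'."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_nm <= 500:
        return "H"
    if ic50_nm <= 1500:
        return "M"
    if ic50_nm <= 5000:
        return "L"
    return "non-binder"  # assumption: not defined on the slide


# Evaluation groups used in the results tables: which classes count as binders.
GROUPS = {"LMH": {"L", "M", "H"}, "MH": {"M", "H"}, "H": {"H"}}


def is_binder(ic50_nm, group="LMH"):
    """True if the peptide's affinity class belongs to the chosen group."""
    return affinity_class(ic50_nm) in GROUPS[group]


for value in (20, 1156, 5000, 5001):
    print(value, affinity_class(value))
```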
Binding motif
Position: 1 4 6 7 9
Motif residues: T D R R Q S V V V N W M D D G K A A A D E I I I P D Y Y R Q E F L M

Ligands / Epitopes
Sequence               Source                 BE (kJ/mol)   IC50 (nM)
LQLQPFPQPQPFPPL        A-gliadin 56-70        -41.01        20
DMTPADALDDFDL          HSV                    -40.53        173
AAAAAVAAEAY            Artificial sequence    -39.98        48
GVAGLLVALAV            IA-2 499-509           -36.16        95
DSNIMNSINNVMDEIDFFEK   Pf ABRA 487–506        -36.01        171
FESTGNLIAPEYGFKISY     HA 255–271Y            -35.70        62
YPFIEQEGPEFFDQE        MHC Ia 51–63 analog    -35.34        1156
LLDILDTAGLEEYSAMRD     p21 51–66; C out       -35.27        202
QPYPQPQPFPSQQPY        A-gliadin 41-55        -35.26        1120
FPSQQPYLQLQPFPQ        A-gliadin 49-63        -33.93        20
CDGERPTLAFLQDVM        GAD 101–115            -33.57        69
SFPPQQPYPQPQPQY        A-gliadin 77-91        -33.35        370
SQDLELSWNLNGLQADLSS    FceR 104–122           -32.89        123
EPRAPWIEQEGPEYW        MHC Ia 46-63           -32.89        519
PPLYATGRLSQAQLMPSPPM   VP16                   -32.59        538
SQDLELSWNLNGLQAY       FceR 104–122 analog    -32.49        118
IARAKMFPAVAEK          34P3A                  -31.91        541
Test Set 1: Improved detection of binders
lacking position specific binding motifs
Binding registers: 20/23 (87%) binding registers; only register (aa 4-12) from Test Set 2 (Der p 2: 1-20) (SP=0.80; SE(LMH)=0.90)
T-cell proliferation: top 5 predictions are experimental positives at very stringent threshold criteria (SP=0.95; SE(H)=0.63)
Multiple registers (SP=0.95, SE(LMH)=0.81): 58% of Test Set 1
[Figure: number of peptides (y-axis, 0 to 14) vs. number of binding registers (x-axis, 1 to 7), for weak, medium and strong binders]
Mainly for medium and high binders
Experimental support: Sinha et al. for DRB1*0402
Is this why binding motifs are unsuccessful?
Introduction
Structural Immunoinformatic Database development
Data Analysis
Computational models developed
Applications
Autoimmune blistering skin disorder
Characterized by autoantibodies targeting desmoglein-3 (Dsg3)
Strong association with DR4 and DR6 alleles
Pemphigus vulgaris (PV)
http://www.medscape.com
adam.about.com
www.aafp.org
Who are the major players in PV?
DR4 PV implicated alleles (for Semitic)
DR4 PV
  8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402
  Atomic clashes with all other investigated DR4 subtypes
DR6 PV
  6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503
  Atomic clashes with all other investigated DR6 subtypes
HLA association in DR6 PV more likely to be at DQ than DR locus
Consistent with experimental work done by Sinha et al. (2002, 2005, 2006)
Disease associated alleles vs. innocent bystanders
Tong et al. (2006) Immunome Research, 2: 1
1/9 investigated Dsg3 peptides fits existing binding motifs
Flanking residues: clashes in fitting binding register
Register-shift for Peptide V (Dsg3 342-356)
possess > 2 binding registers
66% (354/539) bind both alleles at different registers
Similar proportion (70%) detected in known binders to both alleles
Both alleles bind similar peptides via different binding registers
[Figure: number of peptides (y-axis, 0 to 350) vs. number of binding registers (x-axis, 0 to 6), for DQB1*0503 and DRB1*0402]
What next?
We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values.
The model yields excellent results for test data (AROC=0.93).
Application to determine immunological hot spots for HIV-1 p24gag and gp160env glycoproteins shows binding energies similar to HLA-A and -B.
Conclusions
Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data.
While computations can never completely replace “wet-lab” experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.
1. Genome analysis
Approaches: EST analysis; annotation pipeline using workflow strategies
Applications: parasitic nematodes; cancer EST data
Outcomes: comprehensive annotation at the gene and protein levels; novel and/or pathogen-specific genes; immune response evasion strategies
2. Transcriptome analysis
Approaches: graph formalism for alternative splicing; genome-wide analysis
Applications: Drosophila genome; chicken compared to human and mouse; kallikrein variants as markers
Outcomes: new mRNA-gDNA alignment method, MGAlign & MGAlignIt; first splicing graph database, DEDB; web server for splicing graphs, ASGS; sub-graph elements for alternative splicing; multi-species splicing graph database, GraphDB
3. Protein/Proteome research: Origin and evolution of structural domains
Approaches: intron mapping to domain boundary; all eukaryotic proteins analyzed
Applications: domain prediction in EST/genome data; effect of splice variants on domains
Outcomes: new database of protein coding genes, XPro; visualization of intronic locations on protein structural domains, XDomView; analysis tool, Go Module Viewer
3. Protein/Proteome research: Small disulfide-rich proteins (<100 aa per domain; ≥ 2 SS bonds)
Approaches: multiple structure alignment and hierarchical classification; comparative modeling rules; sequence, structure and evolutionary analysis of Potato II inhibitor family
Outcomes: new database, DSFD; server for model building, SDPMOD; understanding of wound-induced protease inhibitor folding
Applications: design of protease inhibitors, channel modulators, growth regulators
3. Protein/Proteome research: Protease cleavage site prediction
Approaches: detailed structural modeling and docking of signal peptide moiety to signal peptidase I; SVM for caspases
Applications: enhanced production of therapeutic and commercial heterologous proteins; apoptosis initiation
Outcomes: new databases, SPdb, CasBase; server for caspase cleavage prediction, CASVM; signal peptide cleavage prediction (under development)
4. Systems Biology
Approaches: holistic computational, molecular biology and FRET study to locate secretion roadblocks; EST analysis of host-parasite interactions
Applications: Trichoderma reesei as fungal bioreactor; parasites that lead to liver cancer (food-borne trematode, Opisthorchis viverrini) and bladder cancer (Schistosoma haematobium)
Outcomes: improved heterologous protein production using filamentous fungi; understanding of how parasites evade host immune activation
6. Genome-Phenome mapping
Approaches: mutation data for non-laboratory animals; mapping to OMIM; mapping to structure
Applications: OMIA-OMIM mapping to structure; correlation between genotype and disease phenotype
Outcomes: OMIA database, with links to OMIM (courtesy NCBI); mutations linked to severity of disease for α-D-mannosidosis; predictions of new human disease mutations from known mutation sites in cow, cat and guinea pig
7. Biodiversity Informatics: Customary medicinal plants
Approaches: integrating, visualizing and analyzing ethnobotanical, phytochemical and pharmacological data on customary medicinal plants; data from Australian aboriginal elders and Indian Siddha doctors
Applications: novel antimicrobial, anti-inflammatory and anti-cancer lead compounds
Outcomes: CMkb, an integrated knowledgebase
Dedications Prof. Bernard Pullman
Mme. Alberte Pullman
My brother, a CML survivor
Acknowledgements
Dr. (Victor) J.C. Tong, NUS & I2R, Singapore
A/Prof. Tin Wee Tan, NUS
Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA
Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN266200400085C).