Immunological bioinformatics
Ole Lund, Center for Biological Sequence Analysis (CBS), Denmark.
World-wide Spread of SARS
Status as of July 11, 2003: 8437 Infected, 813 Dead
SARS
• First severe infectious disease to emerge in the post-genomic era
• Modern societies are vulnerable to epidemics
• Classical containment strategies have been successful in controlling the epidemic, but
– SARS may resurface (e.g. be seasonal)
– Suggested existence of an animal reservoir could compromise the containment strategy
• Need to develop a vaccine strategy
• Biotechnology has provided new tools to analyze genome/proteome information and guide vaccine development
• The causative virus, the SARS coronavirus (SARS CoV), has been isolated and sequenced in full length
Main scientific achievements
• Discovery of causative agent
• Genome(s)
• 3D structure of main proteinase
• Origin
– Similar virus found in Himalayan palm civets and other animals, including a raccoon-dog, and in humans working at an animal market in Guangdong, China (Guan et al., Sep 4, 2003).
Himalayan (Masked) palm civet
Ferret-Badger
Raccoon-dog
http://biobase.dk/~david-c/uk-dk-mammmal-list.htm
New corona viruses
1978 Porcine epidemic diarrhea virus (PEDV), probably from humans
1984 Porcine respiratory coronavirus
1987 Porcine reproductive and respiratory syndrome (PRRS)
1993 Bovine coronavirus
2003 SARS
Source: Michael Buchmeier, Beijing June, 2003
Will it be back?
When?
– Every year? Like the flu.
– Every few years? Like measles used to.
– Sporadic? Like Ebola.
– Never?
Lab safety: The patient, a 27-year-old virologist, worked on the West Nile virus in a biosafety level 3 lab at the Environmental Health Institute, where the SARS coronavirus was also studied (Enserink, 2003)
How does the immune system “see” a virus?
The immune system
• The innate immune system
– Found in animals and plants
– Fast response
– Complement, Toll-like receptors
• The adaptive immune system
– Found in vertebrates
– Stronger response 2nd time
– B lymphocytes
  • Produce antibodies (Abs) that recognize 3D shapes
  • Neutralize virus/bacteria outside cells
– T lymphocytes
  • Cytotoxic T lymphocytes (CTLs) - MHC class I
    – Recognize foreign protein sequences in infected cells
    – Kill infected cells
  • Helper T lymphocytes (HTLs) - MHC class II
    – Recognize foreign protein sequences presented by immune cells
    – Activate cells
Weight matrices (Hidden Markov models)
YMNGTMSQV
GILGFVFTL
ALWGFFPVV
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
WLSLLVPFV
FLPSDFFPS
CVGGLLTMV
FIAGNSAYE
A2 Logo
Protein sequence information content
• Entropy
– Average uncertainty in the random variable
– $H = -\sum_i p_i \log_2 p_i$, range: 0 to $\log_2(20) \approx 4.3$
– Logo height $I = \log_2(20) - H$
• Relative entropy (Kullback-Leibler distance)
– $D = \sum_i p_i \log_2(p_i/q_i)$, range: 0 to infinity
• Mutual information
– Reduction in uncertainty due to knowledge of another random variable (corresponds to correlation)
– $M = \sum_{i,j} p_{ij} \log_2\left(p_{ij}/(p_i p_j)\right)$
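The three quantities above can be sketched in a few lines of Python. This is only an illustration: it assumes the sequence run on the weight-matrix slide is a set of eleven aligned 9-mers, uses raw frequencies without pseudocounts or sequence weighting, and takes a uniform background for the relative entropy. The function names are made up for this sketch.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: the peptide run on the weight-matrix slide, split into 9-mers.
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]


def column_frequencies(peptides, pos):
    """Observed amino acid frequencies p_i at one alignment position."""
    counts = Counter(p[pos] for p in peptides)
    return {aa: c / len(peptides) for aa, c in counts.items()}


def entropy(p):
    """H = -sum_i p_i log2 p_i (in bits)."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def information_content(p):
    """Logo height I = log2(20) - H."""
    return math.log2(20) - entropy(p)


def relative_entropy(p, q):
    """D = sum_i p_i log2(p_i / q_i) against a background q."""
    return sum(x * math.log2(x / q[aa]) for aa, x in p.items() if x > 0)


def mutual_information(peptides, i, j):
    """M = sum_ij p_ij log2(p_ij / (p_i p_j)) between positions i and j."""
    n = len(peptides)
    p_i = column_frequencies(peptides, i)
    p_j = column_frequencies(peptides, j)
    pairs = Counter((p[i], p[j]) for p in peptides)
    return sum((c / n) * math.log2((c / n) / (p_i[a] * p_j[b]))
               for (a, b), c in pairs.items())


if __name__ == "__main__":
    for pos in range(9):
        p = column_frequencies(PEPTIDES, pos)
        print(f"position {pos + 1}: I = {information_content(p):.2f} bits")
```

With a uniform background, the relative entropy equals the logo height, which is a quick sanity check on the sign convention. With only eleven peptides the estimates are noisy, which is one reason real weight matrices use pseudocounts.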
Prediction of MHC binding specificity
• Simple motifs
– Allowed (non-allowed) amino acids
• Extended motifs
– Amino acid preferences
• Structural models
– Limitations: precision of force field, and speed of calculations
• Neural networks
– Can take correlations into account
Log odds ratios
• Used for scoring alignments (BLAST), HMMs, matrix methods
• Odds ratio of observing given amino acids
– Relative probability of observing amino acid i in motif position j
– $O_j = p(aa_i \text{ at pos } j)/p(aa_i)$
• Assumption of independence =>
– Odds for observing sequence = $O_1 O_2 \dots O_n$
• Log odds ratio
– $LO = \log(O_1 O_2 \dots O_n) = \log(O_1) + \log(O_2) + \dots + \log(O_n)$
– LO in half bits = $2\,LO/\log(2)$
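As a rough sketch of the log-odds scoring above, the following builds a position-specific matrix in half bits and scores a peptide by summing one entry per position. The uniform background (1/20), the pseudocount of 1 and the function names are assumptions made for this example; the training peptides are the 9-mers from the weight-matrix slide.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: the peptide run on the weight-matrix slide, split into 9-mers.
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]


def build_log_odds_matrix(peptides, background=None, pseudocount=1.0):
    """One dict per position: amino acid -> log odds in half bits."""
    if background is None:
        # Assumed uniform background p(aa) = 1/20.
        background = {aa: 1 / 20 for aa in AMINO_ACIDS}
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        column = {}
        for aa in AMINO_ACIDS:
            p_pos = (counts[aa] + pseudocount) / total
            # Half bits: 2 * log2(odds), i.e. 2 * ln(odds) / ln(2).
            column[aa] = 2 * math.log2(p_pos / background[aa])
        matrix.append(column)
    return matrix


def score(matrix, peptide):
    """Independence assumption: the log odds of the positions add up."""
    return sum(column[aa] for column, aa in zip(matrix, peptide))


if __name__ == "__main__":
    matrix = build_log_odds_matrix(PEPTIDES)
    for peptide in ("GILGFVFTL", "DDDDDDDDD"):
        print(peptide, round(score(matrix, peptide), 2))
```

The pseudocount keeps unseen amino acids from producing log(0); without it a single unobserved residue would give a score of minus infinity.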
Evaluation of prediction accuracy
Coverage = TP/actual_positive
Reliability = TP/predicted_positive
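A minimal sketch of the two measures, assuming each peptide has a prediction score and a measured binder/non-binder label, and that peptides scoring at or above a chosen threshold count as predicted positives. The function name and the example numbers are illustrative only.

```python
def coverage_and_reliability(scores, is_binder, threshold):
    """Return (coverage, reliability) at a given score threshold.

    coverage    = TP / actual_positive
    reliability = TP / predicted_positive
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, a in zip(predicted, is_binder) if p and a)
    actual_positive = sum(1 for a in is_binder if a)
    predicted_positive = sum(1 for p in predicted if p)
    coverage = tp / actual_positive if actual_positive else float("nan")
    reliability = tp / predicted_positive if predicted_positive else float("nan")
    return coverage, reliability


if __name__ == "__main__":
    # Made-up scores and labels, for illustration only.
    scores = [0.9, 0.8, 0.4, 0.3, 0.1]
    is_binder = [True, False, True, False, False]
    print(coverage_and_reliability(scores, is_binder, threshold=0.5))
```

Raising the threshold usually trades coverage for reliability, which is why results are often reported as one measure at a fixed value of the other.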
A*1101 performance (154 peptides, 9 binders)

Prediction method   Pearson correlation   True positive ratio   Coverage at
                    coefficient           at 50% coverage       95% reliability
SYFPEITHI           0.35                  0.11                  0
Bimas               0.45                  0.18                  0
HMM                 0.5                   0.33                  0
NN                  0.76                  0.43                  0.44
From Bill Paul, "Fundamental Immunology", 4th Ed
The MHC gene region
Human Leukocyte antigen (HLA = MHC in humans) polymorphism - alleles
A total of
229 HLA-A
464 HLA-B
111 HLA-C
class I alleles have been named, a total of
2 HLA-DRA, 364 HLA-DRB
22 HLA-DQA1, 48 HLA-DQB1
20 HLA-DPA1, 96 HLA-DPB1
class II sequences have also been assigned.
As of October 2001 (http://www.anthonynolan.com/HIG/index.html)
HLA polymorphism - supertypes
• Each HLA molecule within a supertype essentially binds the same peptides
• Nine major HLA class I supertypes have been defined
• HLA-A1, A2, A3, A24, B7, B27, B44, B58, B62
Sette et al, Immunogenetics (1999) 50:201-212
Supertypes Phenotype frequencies
Supertypes       Caucasian   Black   Japanese   Chinese   Hispanic   Average
A2, A3, B27      83 %        86 %    88 %       88 %      86 %       86 %
+A1, A24, B44    100 %       98 %    100 %      100 %     99 %       99 %
+B7, B58, B62    100 %       100 %   100 %      100 %     100 %      100 %
Sette et al, Immunogenetics (1999) 50:201-212
HLA polymorphism - frequencies
Conclusions
We suggest that
– some of the alleles in the A1 supertype be split into a new A26 supertype
– some of the alleles in the B27 supertype be split into a new B39 supertype
– the B8 alleles may define their own supertype
– the specificities of the class II molecules can be clustered into nine classes, which only partly correspond to the serological classification
Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, Sylvester-Hvid C, Lamberth K, Roder G, Justesen S, Buus S, Brunak S. Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics. 2004 Feb 13 [Epub ahead of print]
MHC class I binding of SARS peptides
• Predictions for all supertypes
– Broad population coverage
• Allele-specific neural networks
– Peptides with associated measured binding affinity
– A1 (A0101), A2 (A0204), A3 (A1101+A0301), B7 (B0702)
• Weight matrices
– Peptides from public databases (SYFPEITHI, MHCpep)
– A24, B27, B44, B58 and B62
Supertype weight matrices
[Figure: weight matrices for the B27, B44, B58 and B62 supertypes]
Proteasomal cleavage
Epitope predictions
• Binding to MHC class I
• High probability for C-terminal proteasomal cleavage
• No sequence variation
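The three selection criteria can be combined as a simple filter. The sketch below is hypothetical throughout: the `Candidate` fields, the 0.5 thresholds, the placeholder peptides and their scores are invented for illustration and are not the predictors or cut-offs used in the work described here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    binding_score: float   # hypothetical predicted MHC class I binding score
    cleavage_prob: float   # hypothetical predicted C-terminal cleavage probability
    is_conserved: bool     # True if no sequence variation has been observed


def select_epitopes(candidates, min_binding=0.5, min_cleavage=0.5):
    """Keep candidates that pass all three criteria, best binders first."""
    kept = [
        c for c in candidates
        if c.binding_score >= min_binding
        and c.cleavage_prob >= min_cleavage
        and c.is_conserved
    ]
    return sorted(kept, key=lambda c: c.binding_score, reverse=True)


if __name__ == "__main__":
    # Placeholder peptides with made-up scores.
    candidates = [
        Candidate("YMNGTMSQV", 0.6, 0.7, True),
        Candidate("GILGFVFTL", 0.9, 0.8, True),
        Candidate("ILKEPVHGV", 0.7, 0.3, True),   # fails the cleavage criterion
        Candidate("LLFGYPVYV", 0.8, 0.9, False),  # fails the conservation criterion
    ]
    for c in select_epitopes(candidates):
        print(c.peptide, c.binding_score)
```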
Inside out:
1. Position in RNA
2. Translated regions (blue)
3. Observed variable spots
4. Predicted proteasomal cleavage
5. Predicted A1 epitopes
6. Predicted A*0204 epitopes
7. Predicted A*1101 epitopes
8. Predicted A24 epitopes
9. Predicted B7 epitopes
10. Predicted B27 epitopes
11. Predicted B44 epitopes
12. Predicted B58 epitopes
13. Predicted B62 epitopes
Christina Sylvester-Hvid, University of Copenhagen, July, 2003
[Figure labels: β2m, heavy chain, peptide, incubation, peptide-MHC complex, development]
Strategy for the quantitative ELISA assay
C. Sylvester-Hvid, et al., Tissue antigens, 2002: 59:251
• Step I: Folding of MHC class I molecules in solution
• Step II: Detection of de novo folded MHC class I molecules by ELISA
Summary of peptide binding assays

Supertype   # tested   # binding (<500 nM)
A1          15         13
A2          15         12
A3          15         14
A24         0          -
B7          15         10
B27         13         2
B44         0          -
B58         15         13
B62         14         12
                              Initial polytope      Optimized polytope
                              (19 HIV epitopes)
New epitopes                  12                    1
Poor (weak) C-term cleavage   8                     3
Cleavage within               31                    7
Linker length                 12                    37
MHC class II Molecule
Virtual matrices
HLA-DR molecules sharing the same pocket amino acid pattern are assumed to have identical amino acid binding preferences.
MHC Class II binding
• Virtual matrices
– TEPITOPE: Hammer, J., Current Opinion in Immunology 7, 263-269, 1995
– PROPRED: Singh H, Raghava GP, Bioinformatics 2001 Dec;17(12):1236-7
• Web interface: http://www.imtech.res.in/raghava/propred
Prediction Results
MHC class II prediction
• Complexity of problem
– Peptides of different length
– Weak motif signal
• Alignment crucial
• Gibbs Monte Carlo sampler
RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQSALLSSDITASVNCAKPKYVHQNTLKLATGFKGEQGPKGEPDVFKELKVHHANENISRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE
Class II binding motif
RFFGGDRGAPKRG YLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK PKYVHQNTLKLAT GFKGEQGPKGEP DVFKELKVHHANENI SRYWAIRTRSGGI TYSTNEIDLQLSQEDGQTI
Alignment by Gibbs sampler
[Figure panels: Random, ClustalW, Gibbs sampler]
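To make the idea of aligning peptides of different length concrete, here is a small sketch of a classic motif Gibbs sampler: each peptide's 9-mer core offset is resampled in turn, with probability proportional to the odds of that core under a matrix built from all other peptides. This is a textbook variant chosen for brevity, not necessarily the sampler used for the results shown; the uniform background, the pseudocount, the iteration count and all names are assumptions. The example peptides are the entries that appear space-delimited in the extracted peptide list.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE = 9  # assumed length of the binding core

# Peptides that are space-delimited in the extracted list above.
PEPTIDES = [
    "RFFGGDRGAPKRG", "GSLFVYNITTNKYKAFLDKQ", "SALLSSDITASVNCAK",
    "PKYVHQNTLKLAT", "GFKGEQGPKGEP", "DVFKELKVHHANENI", "SRYWAIRTRSGGI",
]


def build_matrix(cores, pseudocount=1.0):
    """Log2 odds matrix (uniform background) from aligned cores."""
    total = len(cores) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(CORE):
        counts = Counter(core[pos] for core in cores)
        matrix.append({aa: math.log2((counts[aa] + pseudocount) / total * 20)
                       for aa in AMINO_ACIDS})
    return matrix


def core_score(matrix, core):
    """Sum of per-position log odds for one candidate core."""
    return sum(column[aa] for column, aa in zip(matrix, core))


def gibbs_align(peptides, iterations=200, seed=0):
    """Return one core start offset per peptide."""
    rng = random.Random(seed)
    offsets = [rng.randrange(len(p) - CORE + 1) for p in peptides]
    for _ in range(iterations):
        for k, peptide in enumerate(peptides):
            # Matrix from every peptide except the one being resampled.
            others = [p[o:o + CORE]
                      for i, (p, o) in enumerate(zip(peptides, offsets)) if i != k]
            matrix = build_matrix(others)
            starts = list(range(len(peptide) - CORE + 1))
            # 2 ** score is the product of the per-position odds ratios.
            weights = [2.0 ** core_score(matrix, peptide[s:s + CORE]) for s in starts]
            offsets[k] = rng.choices(starts, weights=weights)[0]
    return offsets


if __name__ == "__main__":
    for peptide, offset in zip(PEPTIDES, gibbs_align(PEPTIDES)):
        print(" " * (12 - offset) + peptide, "core:", peptide[offset:offset + CORE])
```

With so few peptides the sampled alignment mainly illustrates the mechanics; a usable motif needs many more sequences, and practical samplers typically add refinements such as annealing and sequence weighting.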
MHC class II predictions, allele DRB1_0401
[Figure: accuracy (axis 0 to 0.9) of Tepitope and Gibbs on the data sets MHCbench1 to MHCbench8, Southwood and Geluk]
Polytope construction
[Figure: polytope drawn from NH2 to COOH, starting with M, with epitopes joined by linkers; annotations: C-terminal cleavage, cleavage within epitopes, new epitopes]
Prediction of Antibody epitopes
• Linear
– Hydrophilicity scales (averaged over a window of ~7 residues)
  Hopp and Woods (1981)
  Kyte and Doolittle (1982)
  Parker et al. (1986)
– Other scales & combinations
  Pellequer and van Regenmortel
  Alix
• Discontinuous
– Protrusion (Novotny, Thornton, 1986)
• Neural networks (in preparation)
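The window-averaged scale approach for linear epitopes can be sketched as follows, using the Kyte and Doolittle hydropathy values. On that scale positive means hydrophobic, so the most hydrophilic stretch is the window with the lowest average. The window length of 7, the function names and the example sequence (taken from the class II peptide list) are choices made for this illustration.

```python
# Kyte and Doolittle (1982) hydropathy values; positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, scale, window=7):
    """Average scale value for every window; entry s covers sequence[s:s+window]."""
    return [
        sum(scale[aa] for aa in sequence[s:s + window]) / window
        for s in range(len(sequence) - window + 1)
    ]


def most_hydrophilic_window(sequence, scale=KYTE_DOOLITTLE, window=7):
    """Return (start, segment) of the window with the lowest average hydropathy."""
    profile = window_average(sequence, scale, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window]


if __name__ == "__main__":
    sequence = "GSLFVYNITTNKYKAFLDKQ"
    for start, value in enumerate(window_average(sequence, KYTE_DOOLITTLE)):
        print(start, sequence[start:start + 7], round(value, 2))
    print(most_hydrophilic_window(sequence))
```

Sequences shorter than the window give an empty profile, so `most_hydrophilic_window` raises a `ValueError` in that case.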
Secondary structure in epitopes
Sec struct:        H      T      B      E      S      G      I      .
Log odds ratio:   -0.19   0.30   0.21  -0.27   0.24  -0.04   0.00   0.17
H: Alpha-helix (hydrogen bond from residue i to residue i+4)
G: 3-10 helix (hydrogen bond from residue i to residue i+3)
I: Pi helix (hydrogen bond from residue i to residue i+5)
E: Extended strand
B: Beta bridge (one residue short strand)
S: Bend (five-residue bend centered at residue i)
T: H-bonded turn (3-turn, 4-turn or 5-turn)
. : Coil
Amino acids in epitopes
Amino acid   G      A      V      L      I      M      P      F      W      S
e/E          0.09   0.07   0.05   0.08   0.04   0.02   0.06   0.03   0.01   0.08
.            0.07   0.08   0.07   0.10   0.06   0.03   0.05   0.05   0.02   0.07

Amino acid   C      T      Q      N      H      Y      E      D      K      R
e/E          0.03   0.08   0.04   0.04   0.02   0.04   0.06   0.07   0.07   0.04
.            0.03   0.06   0.04   0.05   0.02   0.03   0.04   0.04   0.05   0.04
Dihedral angles in epitopes
Z-scores for number of dihedral angle combinations in epitopes vs. non epitopes
Phi\Psi 1 2 3 4 5 6 7 8 9 10 11 12
1 -0.47 0.44 -0.58 0.45 0.46 0.00 0.00 -0.73 -0.79 0.00 -0.83 1.42
2 -0.01 -0.12 -1.82 0.52 1.75 0.00 0.00 0.00 1.42 -0.82 0.00 0.00
3 1.82 -2.26 -1.57 0.48 0.10 0.00 -0.77 0.45 1.77 0.00 -0.82 0.99
4 1.76 1.15 -0.34 0.75 0.00 0.00 0.97 0.16 0.38 1.03 0.00 0.00
5 -0.85 0.45 -1.09 0.57 0.00 0.00 0.00 0.13 1.52 0.00 1.02 -0.79
6 0.60 1.28 1.30 1.73 0.00 0.00 0.00 0.00 1.32 -0.89 -0.76 0.00
7 0.27 -0.91 1.67 -0.51 0.00 0.00 0.00 0.00 -1.02 -1.09 0.00 0.00
8 0.93 1.21 -0.23 -3.63 0.49 0.00 0.00 0.00 0.00 -0.19 0.31 -0.82
9 0.00 0.28 -0.67 0.33 0.01 -0.83 0.00 0.00 0.87 0.23 0.00 0.00
10 0.00 0.95 1.71 -0.70 0.00 0.00 0.00 1.29 1.08 0.00 1.00 0.00
11 0.00 0.00 1.02 0.00 0.00 0.00 0.00 0.86 -0.75 0.00 0.00 0.00
12 0.42 0.83 0.28 1.68 0.00 0.00 0.00 0.00 1.03 -0.21 -0.79 0.93
Immunological bioinformatics
• Classical experimental research
– Few data points
– Data recorded by pencil and paper/spreadsheet
• New experimental methods
– Sequencing
– DNA arrays
– Proteomics
• Need to develop new methods for handling these large data sets
• Immunological Bioinformatics/Immunoinformatics
Acknowledgements
CBS, Technical University of Denmark
Søren Brunak (Director of CBS)
Morten Nielsen (Epitope prediction)
Peder Worning (Genome atlases)
Claus Lundegaard (Data bases)
Mette Børgesen (CTL prediction)
Jesper Schantz (Polytope optimization)
IMMI, University of Copenhagen
Søren Buus (Professor)
Christina Sylvester-Hvid (Experimental coordinator)
Kasper Lamberth (Peptide bank, Quality control)
Erland Johansson, Jeanette Nielsen (Preparations of peptides)
Hanne Møller (ELISA binding assay)