Immunological bioinformatics


Ole Lund, Center for Biological Sequence Analysis (CBS), Denmark.

World-wide spread of SARS
Status as of July 11, 2003: 8437 infected, 813 dead

SARS

First severe infectious disease to emerge in the post-genomic era
Modern societies are vulnerable to epidemics
Classical containment strategies have been successful in controlling the epidemic, but
– SARS may resurface (e.g. be seasonal)
– Suggested existence of an animal reservoir could compromise the containment strategy
Need to develop a vaccine strategy
Biotechnology has provided new tools to analyze genome/proteome information and guide vaccine development.
The causative virus, the SARS corona virus (SARS CoV), has been isolated and full-length sequenced.

Main scientific achievements

Discovery of causative agent
Genome(s)
3D structure of main proteinase
Origin
– Similar virus found in Himalayan palm civets and other animals, including a raccoon-dog, and in humans working at an animal market in Guangdong, China (Guan et al., Sep 4, 2003).

Himalayan (Masked) palm civet

Ferret-Badger

Raccoon-dog

http://biobase.dk/~david-c/uk-dk-mammmal-list.htm

New corona viruses

1978 Porcine Epidemic Diarrhea Virus (PEDV), probably from humans
1984 Porcine Respiratory Coronavirus
1987 Porcine Reproductive and Respiratory Syndrome (PRRS)
1993 Bovine corona virus
2003 SARS

Source: Michael Buchmeier, Beijing June, 2003

Will it be back?

When?
– Every year? Like the flu.
– Every few years? Like measles used to.
– Sporadic? Like Ebola.
– Never?

Lab safety: The patient, a 27-year-old virologist, worked on the West Nile virus in a biosafety level 3 lab at the Environmental Health Institute, where the SARS coronavirus was also studied (Enserink, 2003)

How does the immune system “see” a virus?

The immune system

The innate immune system
– Found in animals and plants
– Fast response
– Complement, Toll-like receptors

The adaptive immune system
– Found in vertebrates
– Stronger response 2nd time
– B lymphocytes
   Produce antibodies (Abs) that recognize 3D shapes
   Neutralize virus/bacteria outside cells
– T lymphocytes
   Cytotoxic T lymphocytes (CTLs) - MHC class I
   – Recognize foreign protein sequences in infected cells
   – Kill infected cells
   Helper T lymphocytes (HTLs) - MHC class II
   – Recognize foreign protein sequences presented by immune cells
   – Activate cells

Weight matrices (Hidden Markov models)

YMNGTMSQV
GILGFVFTL
ALWGFFPVV
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
WLSLLVPFV
FLPSDFFPS
CVGGLLTMV
FIAGNSAYE

A2 Logo

Protein sequence information content

Entropy
– Average uncertainty in the random variable
– $H = -\sum_i p_i \log_2 p_i$, range: 0 to $\log_2(20) = 4.3$
– Logo height $I = \log_2(20) - H$

Relative entropy (Kullback-Leibler distance)
– $D = \sum_i p_i \log_2(p_i/q_i)$, range: 0 to infinity

Mutual information
– Reduction in uncertainty due to knowledge of another random variable (corresponds to correlation)
– $M = \sum_{i,j} p_{ij} \log_2\left(p_{ij}/(p_i p_j)\right)$
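The three quantities on this slide can be sketched in a few lines of Python. This is an illustration only, not the code behind the logos: the function names are made up here, the background distribution q is assumed to be uniform over the 20 amino acids, and the example column is position 2 of the peptide string on the weight matrix slide, assuming that string splits into consecutive 9-mers.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column):
    """Observed amino acid frequencies p_i in one alignment column."""
    counts = Counter(column)
    return {aa: counts[aa] / len(column) for aa in AMINO_ACIDS}


def entropy(p):
    """H = -sum_i p_i log2 p_i; terms with p_i = 0 contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)


def information_content(p):
    """Logo height I = log2(20) - H."""
    return math.log2(len(AMINO_ACIDS)) - entropy(p)


def relative_entropy(p, q):
    """Kullback-Leibler distance D = sum_i p_i log2(p_i / q_i)."""
    return sum(pi * math.log2(pi / q[aa]) for aa, pi in p.items() if pi > 0)


def mutual_information(col_a, col_b):
    """M = sum_ij p_ij log2(p_ij / (p_i p_j)) for two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


# Position 2 of the eleven 9-mers (assumed split of the slide's peptide string).
column = "MILLLLLLLVI"
p = column_frequencies(column)
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}  # assumed background

print(f"H = {entropy(p):.2f} bits")
print(f"I = {information_content(p):.2f} bits")
print(f"D = {relative_entropy(p, uniform):.2f} bits")
```

With a uniform background, D and I coincide, which is why logo heights are often described as relative entropies against a flat background.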

Prediction of MHC binding specificity

Simple motifs
– Allowed (non allowed) amino acids

Extended motifs
– Amino acid preferences

Structural models
– Limitations: precision of force field, and speed of calculations

Neural networks
– Can take correlations into account

Log odds ratios

Used for scoring alignments (BLAST), HMMs, matrix methods

Odds ratio of observing given amino acids
– Relative probability of observing amino acid i in motif position j
– $O_j = p(aa_i \text{ at pos } j)/p(aa_i)$

Assumption of independence =>
– Odds for observing sequence $= O_1 O_2 \cdots O_n$

Log odds ratio
– $LO = \log(O_1 O_2 \cdots O_n) = \log(O_1) + \log(O_2) + \cdots + \log(O_n)$
– $LO$ in half bits $= 2\,LO/\log(2)$
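A log-odds weight matrix can be built directly from these definitions. The sketch below is a minimal illustration, not the method used for the results in this talk. It assumes a uniform background p(aa) = 1/20 and a pseudocount of 1 per amino acid (both choices made here for simplicity), and it takes the training peptides from the weight matrix slide, assuming that peptide string splits into consecutive 9-mers.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Eleven 9-mers from the weight matrix slide (assumed split of the peptide string).
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]


def build_log_odds_matrix(peptides, pseudocount=1.0):
    """Return one dict per motif position mapping amino acid -> log odds in half bits."""
    background = 1 / len(AMINO_ACIDS)  # assumed uniform p(aa)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for j in range(len(peptides[0])):
        counts = Counter(pep[j] for pep in peptides)
        row = {}
        for aa in AMINO_ACIDS:
            p_obs = (counts[aa] + pseudocount) / total  # p(aa at pos j)
            row[aa] = 2 * math.log2(p_obs / background)  # half bits
        matrix.append(row)
    return matrix


def score(matrix, peptide):
    """Sum of per-position log odds (independence between positions assumed)."""
    return sum(row[aa] for row, aa in zip(matrix, peptide))


matrix = build_log_odds_matrix(PEPTIDES)
print(f"GILGFVFTL: {score(matrix, 'GILGFVFTL'):.1f} half bits")
print(f"DDDDDDDDD: {score(matrix, 'DDDDDDDDD'):.1f} half bits")
```

Because the scores add across positions, a peptide is ranked by a simple table lookup per residue. Correlations between positions are ignored, which is the limitation neural networks address.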


Evaluation of prediction accuracy

Coverage = TP/actual_positive

Reliability = TP/predicted_positive
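As a small worked example of these two measures, the sketch below computes coverage and reliability for a score threshold. The scores, labels, threshold and function name are made up for illustration.

```python
def coverage_and_reliability(scores, is_binder, threshold):
    """Coverage = TP/actual_positive, reliability = TP/predicted_positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for pred, actual in zip(predicted, is_binder) if pred and actual)
    actual_positive = sum(is_binder)
    predicted_positive = sum(predicted)
    coverage = tp / actual_positive if actual_positive else float("nan")
    reliability = tp / predicted_positive if predicted_positive else float("nan")
    return coverage, reliability


# Hypothetical prediction scores and measured binder labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_binder = [True, True, False, True, False, False]

cov, rel = coverage_and_reliability(scores, is_binder, threshold=0.5)
print(f"coverage = {cov:.2f}, reliability = {rel:.2f}")
```

Raising the threshold usually trades coverage for reliability, which is why the methods are compared at a fixed coverage or a fixed reliability.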

A*1101 performance: 154 peptides, 9 binders

[Figure: three bar charts comparing the prediction methods SYFPEITHI, Bimas, HMM and NN. Bar values are listed in the order they were extracted.]
– Pearson correlation coefficient: 0.35, 0.45, 0.5, 0.76
– True positive ratio at 50% coverage: 0.11, 0.18, 0.33, 0.43
– Coverage at 95% reliability: 0, 0, 0, 0.44

From Bill Paul, ”Fundamental Immunology”, 4th Ed

The MHC gene region

Human Leukocyte antigen (HLA = MHC in humans) polymorphism - alleles

A total of
229 HLA-A
464 HLA-B
111 HLA-C
class I alleles have been named; a total of
2 HLA-DRA, 364 HLA-DRB
22 HLA-DQA1, 48 HLA-DQB1
20 HLA-DPA1, 96 HLA-DPB1
class II sequences have also been assigned.

As of October 2001 (http://www.anthonynolan.com/HIG/index.html)

HLA polymorphism - supertypes

• Each HLA molecule within a supertype essentially binds the same peptides
• Nine major HLA class I supertypes have been defined
• HLA-A1, A2, A3, A24, B7, B27, B44, B58, B62

Sette et al, Immunogenetics (1999) 50:201-212

Supertypes: phenotype frequencies

Supertypes       Caucasian  Black   Japanese  Chinese  Hispanic  Average
A2, A3, B27      83 %       86 %    88 %      88 %     86 %      86 %
+A1, A24, B44    100 %      98 %    100 %     100 %    99 %      99 %
+B7, B58, B62    100 %      100 %   100 %     100 %    100 %     100 %

Sette et al, Immunogenetics (1999) 50:201-212

HLA polymorphism - frequencies

Conclusions

We suggest to
– split some of the alleles in the A1 supertype into a new A26 supertype
– split some of the alleles in the B27 supertype into a new B39 supertype
– the B8 alleles may define their own supertype
– the specificities of the class II molecules can be clustered into nine classes, which only partly correspond to the serological classification

Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, Sylvester-Hvid C, Lamberth K, Roder G, Justesen S, Buus S, Brunak S. Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics. 2004 Feb 13 [Epub ahead of print]

MHC class I binding of SARS peptides

Predictions for all supertypes
– Broad population coverage

Allele specific neural networks
– Peptides with associated measured binding affinity
– A1 (A0101), A2 (A0204), A3 (A1101+A0301), B7 (B0702)

Weight matrices
– Peptides from public databases (SYFPEITHI, MHCpep)
– A24, B27, B44, B58 and B62

Super type weight matrices

B27

B62

B58

B44

Proteasomal cleavage

Epitope predictions

Binding to MHC class I
High probability for C-terminal proteasomal cleavage
No sequence variation
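The three selection criteria can be combined as a simple filter. The sketch below is illustrative only: the `Candidate` record, its field names, the placeholder peptide labels and the cleavage cut-off of 0.5 are assumptions made here. The 500 nM affinity cut-off matches the binding threshold used in the peptide binding assay summary.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str                  # placeholder label, not a real sequence
    predicted_affinity_nm: float  # predicted MHC class I binding affinity
    cterm_cleavage_prob: float    # predicted C-terminal proteasomal cleavage
    is_conserved: bool            # no observed sequence variation


def select_epitopes(candidates, max_affinity_nm=500.0, min_cleavage_prob=0.5):
    """Keep candidates that satisfy all three criteria."""
    return [
        c for c in candidates
        if c.predicted_affinity_nm <= max_affinity_nm
        and c.cterm_cleavage_prob >= min_cleavage_prob
        and c.is_conserved
    ]


# Hypothetical predictions for four candidate peptides.
candidates = [
    Candidate("peptide_1", 50.0, 0.9, True),
    Candidate("peptide_2", 50.0, 0.2, True),    # poor C-terminal cleavage
    Candidate("peptide_3", 5000.0, 0.9, True),  # weak binder
    Candidate("peptide_4", 50.0, 0.9, False),   # variable region
]

selected = select_epitopes(candidates)
print([c.peptide for c in selected])
```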

Inside out:
1. Position in RNA
2. Translated regions (blue)
3. Observed variable spots
4. Predicted proteasomal cleavage
5. Predicted A1 epitopes
6. Predicted A*0204 epitopes
7. Predicted A*1101 epitopes
8. Predicted A24 epitopes
9. Predicted B7 epitopes
10. Predicted B27 epitopes
11. Predicted B44 epitopes
12. Predicted B58 epitopes
13. Predicted B62 epitopes

Christina Sylvester-Hvid, University of Copenhagen , July, 2003

Development

β2m
Heavy chain
peptide
Incubation
Peptide-MHC complex

Strategy for the quantitative ELISA assay

C. Sylvester-Hvid, et al., Tissue antigens, 2002: 59:251

• Step I: Folding of MHC class I molecules in solution

• Step II: Detection of de novo folded MHC class I molecules by ELISA

Summary of peptide binding assays

Supertype  #tested  #binding (<500 nM)
A1         15       13
A2         15       12
A3         15       14
A24        0        -
B7         15       10
B27        13       2
B44        0        -
B58        15       13
B62        14       12

Initial polytope (19 HIV epitopes)
• New epitopes 12
• Poor C-term cleavage 8
• Cleavage within 31
• Linker length 12

Optimized polytope
• New epitopes 1
• Weak C-term cleavage 3
• Cleavage within 7
• Linker length 37

MHC class II Molecule

Virtual matrices

HLA-DR molecules sharing the same pocket amino acid pattern are assumed to have identical amino acid binding preferences.
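The virtual matrix idea can be sketched as a lookup: each pocket pattern has one binding profile, and an allele's matrix is assembled from the profiles of its pockets. Everything in the sketch below is a hypothetical placeholder (allele names, pattern names, pocket positions in the core, and all numbers); it is not data from TEPITOPE or any measured profile.

```python
# Hypothetical pocket profile library: (pocket, residue pattern) -> preference
# per peptide amino acid. Amino acids not listed score 0.
POCKET_PROFILES = {
    ("P1", "pattern_1"): {"F": 1.0, "Y": 0.8, "D": -1.0},
    ("P4", "pattern_2"): {"D": 0.9, "E": 0.7, "K": -0.8},
    ("P4", "pattern_3"): {"K": 0.9, "R": 0.7, "D": -0.8},
}

# Hypothetical alleles described by the pattern found in each pocket.
ALLELE_POCKETS = {
    "allele_X": {"P1": "pattern_1", "P4": "pattern_2"},
    "allele_Y": {"P1": "pattern_1", "P4": "pattern_3"},
}

# Assumed 0-based position in the 9-mer binding core probed by each pocket.
POCKET_TO_CORE_INDEX = {"P1": 0, "P4": 3}


def virtual_matrix(allele):
    """Assemble an allele's matrix from the shared pocket profiles."""
    return {
        pocket: POCKET_PROFILES[(pocket, pattern)]
        for pocket, pattern in ALLELE_POCKETS[allele].items()
    }


def score_core(matrix, core):
    """Sum the pocket preferences for the residues of a 9-mer core."""
    return sum(
        profile.get(core[POCKET_TO_CORE_INDEX[pocket]], 0.0)
        for pocket, profile in matrix.items()
    )


matrix_x = virtual_matrix("allele_X")
matrix_y = virtual_matrix("allele_Y")
print(score_core(matrix_x, "FAADAAAAA"), score_core(matrix_y, "FAADAAAAA"))
```

The two alleles share the P1 profile and differ only at P4, so a matrix can be written down for an allele that was never measured as long as each of its pocket patterns has been characterized in some other allele.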

MHC Class II binding

Virtual matrices
– TEPITOPE: Hammer, J., Current Opinion in Immunology 7, 263-269, 1995
– PROPRED: Singh H, Raghava GP, Bioinformatics 2001 Dec;17(12):1236-7

Web interface: http://www.imtech.res.in/raghava/propred

Prediction Results

MHC class II prediction

Complexity of problem
– Peptides of different length
– Weak motif signal

Alignment crucial
Gibbs Monte Carlo sampler

RFFGGDRGAPKRG YLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK PKYVHQNTLKLAT GFKGEQGPKGEP DVFKELKVHHANENI SRYWAIRTRSGGI TYSTNEIDLQLSQEDGQTIE

Class II binding motif

RFFGGDRGAPKRG YLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK PKYVHQNTLKLAT GFKGEQGPKGEP DVFKELKVHHANENI SRYWAIRTRSGGI TYSTNEIDLQLSQEDGQTI

Random ClustalW

Gibbs sampler

Alignment by Gibbs sampler
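The alignment step can be sketched as a small Monte Carlo search: each peptide gets a core start position, and random shifts are accepted or rejected so that the information content of the aligned 9-mer cores increases. This is a simplified illustration using Metropolis-style moves at a fixed temperature, not necessarily the sampler behind these slides. The core length of 9, the temperature, the step count and the function names are assumptions made here. The peptides are those that appear space-delimited on the class II binding motif slide.

```python
import math
import random
from collections import Counter

CORE_LENGTH = 9  # assumed length of the binding core

PEPTIDES = [
    "RFFGGDRGAPKRG", "GSLFVYNITTNKYKAFLDKQ", "SALLSSDITASVNCAK",
    "PKYVHQNTLKLAT", "GFKGEQGPKGEP", "DVFKELKVHHANENI", "SRYWAIRTRSGGI",
]


def alignment_information(peptides, offsets):
    """Sum over core columns of I = log2(20) - H for the aligned cores."""
    n = len(peptides)
    total = 0.0
    for j in range(CORE_LENGTH):
        counts = Counter(p[o + j] for p, o in zip(peptides, offsets))
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += math.log2(20) - h
    return total


def gibbs_align(peptides, n_steps=2000, temperature=0.2, seed=0):
    """Search core offsets by random single-peptide shifts; return the best found."""
    rng = random.Random(seed)
    offsets = [rng.randrange(len(p) - CORE_LENGTH + 1) for p in peptides]
    current = alignment_information(peptides, offsets)
    best_offsets, best = list(offsets), current
    for _ in range(n_steps):
        i = rng.randrange(len(peptides))
        old = offsets[i]
        offsets[i] = rng.randrange(len(peptides[i]) - CORE_LENGTH + 1)
        proposed = alignment_information(peptides, offsets)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if proposed >= current or rng.random() < math.exp((proposed - current) / temperature):
            current = proposed
            if current > best:
                best_offsets, best = list(offsets), current
        else:
            offsets[i] = old
    return best_offsets, best


offsets, info = gibbs_align(PEPTIDES)
for pep, off in zip(PEPTIDES, offsets):
    print(pep[off:off + CORE_LENGTH])
print(f"information content: {info:.2f} bits")
```

With only seven peptides the entropy estimate is strongly biased by small-sample effects, so a realistic implementation would add pseudocounts and sequence weighting. The sketch only shows the sampling loop.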

MHC class II predictions, allele DRB1_0401

[Figure: bar chart of accuracy (axis 0 to 0.9) for Tepitope and Gibbs on the data sets MHCbench1 to MHCbench8, Southwood and Geluk]

Polytope construction

NH2 COOH

Epitope

Linker

M

C-terminal cleavage

Cleavage within epitopes

New epitopes

cleavage

Prediction of Antibody epitopes

Linear
– Hydrophilicity scales (averaged over a window of ~7 residues)
   Hopp and Woods (1981)
   Kyte and Doolittle (1982)
   Parker et al. (1986)
– Other scales & combinations
   Pellequer and van Regenmortel
   Alix

Discontinuous
– Protrusion (Novotny, Thornton, 1986)

Neural networks (In preparation)
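The scale-based linear predictions all follow the same recipe: assign each residue a value from a scale and average over a sliding window. A minimal sketch using the Kyte and Doolittle hydropathy values with a window of 7 is shown below. The function name and the example sequence (one of the peptides shown earlier in the talk) are chosen here for illustration. Kyte-Doolittle values are positive for hydrophobic residues, so low window averages mark hydrophilic stretches.

```python
# Kyte and Doolittle (1982) hydropathy values; positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_average(sequence, scale, window=7):
    """Average scale value in a window centered on each residue.

    Returns one value per position that has a full window around it.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a residue")
    half = window // 2
    averages = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        averages.append(sum(scale[aa] for aa in segment) / window)
    return averages


sequence = "DVFKELKVHHANENI"  # example peptide from an earlier slide
profile = sliding_window_average(sequence, KYTE_DOOLITTLE)
for position, value in enumerate(profile, start=4):  # first centered residue is number 4
    print(position, sequence[position - 1], f"{value:.2f}")
```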

Secondary structure in epitopes

Sec struct:      H      T      B      E      S      G      I      .
Log odds ratio:  -0.19  0.30   0.21   -0.27  0.24   -0.04  0.00   0.17

H: Alpha-helix (hydrogen bond from residue i to residue i+4)

G: 3₁₀-helix (hydrogen bond from residue i to residue i+3)

I: Pi helix (hydrogen bond from residue i to residue i+5)

E: Extended strand

B: Beta bridge (one residue short strand)

S: Bend (five-residue bend centered at residue i)

T: H-bonded turn (3-turn, 4-turn or 5-turn)

. : Coil

Amino acids in epitopes

Amino acid  G     A     V     L     I     M     P     F     W     S
e/E         0.09  0.07  0.05  0.08  0.04  0.02  0.06  0.03  0.01  0.08
.           0.07  0.08  0.07  0.10  0.06  0.03  0.05  0.05  0.02  0.07

Amino acid  C     T     Q     N     H     Y     E     D     K     R
e/E         0.03  0.08  0.04  0.04  0.02  0.04  0.06  0.07  0.07  0.04
.           0.03  0.06  0.04  0.05  0.02  0.03  0.04  0.04  0.05  0.04
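The frequencies in the table can be turned into per-residue log odds in the same way as for the secondary structure classes. The sketch below assumes that the "e/E" row holds frequencies in epitope residues and the "." row holds frequencies in non-epitope residues. That reading of the row labels is an assumption, as is the use of log base 2.

```python
import math

AMINO_ACIDS = "GAVLIMPFWSCTQNHYEDKR"  # column order of the table

# Frequencies copied from the table; row meaning is assumed (see lead-in).
EPITOPE = [0.09, 0.07, 0.05, 0.08, 0.04, 0.02, 0.06, 0.03, 0.01, 0.08,
           0.03, 0.08, 0.04, 0.04, 0.02, 0.04, 0.06, 0.07, 0.07, 0.04]
NON_EPITOPE = [0.07, 0.08, 0.07, 0.10, 0.06, 0.03, 0.05, 0.05, 0.02, 0.07,
               0.03, 0.06, 0.04, 0.05, 0.02, 0.03, 0.04, 0.04, 0.05, 0.04]

# Positive values: residue is more frequent in epitopes than outside them.
log_odds = {
    aa: math.log2(f_epi / f_non)
    for aa, f_epi, f_non in zip(AMINO_ACIDS, EPITOPE, NON_EPITOPE)
}

for aa, value in sorted(log_odds.items(), key=lambda item: item[1], reverse=True):
    print(aa, f"{value:+.2f}")
```

The table values are rounded to two decimals, so the log odds for rare residues such as W and M are only rough.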

Dihedral angles in epitopes

Z-scores for number of dihedral angle combinations in epitopes vs. non epitopes

Phi\Psi 1 2 3 4 5 6 7 8 9 10 11 12

1 -0.47 0.44 -0.58 0.45 0.46 0.00 0.00 -0.73 -0.79 0.00 -0.83 1.42

2 -0.01 -0.12 -1.82 0.52 1.75 0.00 0.00 0.00 1.42 -0.82 0.00 0.00

3 1.82 -2.26 -1.57 0.48 0.10 0.00 -0.77 0.45 1.77 0.00 -0.82 0.99

4 1.76 1.15 -0.34 0.75 0.00 0.00 0.97 0.16 0.38 1.03 0.00 0.00

5 -0.85 0.45 -1.09 0.57 0.00 0.00 0.00 0.13 1.52 0.00 1.02 -0.79

6 0.60 1.28 1.30 1.73 0.00 0.00 0.00 0.00 1.32 -0.89 -0.76 0.00

7 0.27 -0.91 1.67 -0.51 0.00 0.00 0.00 0.00 -1.02 -1.09 0.00 0.00

8 0.93 1.21 -0.23 -3.63 0.49 0.00 0.00 0.00 0.00 -0.19 0.31 -0.82

9 0.00 0.28 -0.67 0.33 0.01 -0.83 0.00 0.00 0.87 0.23 0.00 0.00

10 0.00 0.95 1.71 -0.70 0.00 0.00 0.00 1.29 1.08 0.00 1.00 0.00

11 0.00 0.00 1.02 0.00 0.00 0.00 0.00 0.86 -0.75 0.00 0.00 0.00

12 0.42 0.83 0.28 1.68 0.00 0.00 0.00 0.00 1.03 -0.21 -0.79 0.93

Immunological bioinformatics

Classical experimental research
– Few data points
– Data recorded by pencil and paper/spreadsheet

New experimental methods
– Sequencing
– DNA arrays
– Proteomics

Need to develop new methods for handling these large data sets

Immunological Bioinformatics/Immunoinformatics

Acknowledgements

CBS, Technical University of Denmark

Søren Brunak (Director of CBS)

Morten Nielsen (Epitope prediction)

Peder Worning (Genome atlases)

Claus Lundegaard (Data bases)

Mette Børgesen (CTL prediction)

Jesper Schantz (Polytope optimization)

IMMI, University of Copenhagen

Søren Buus (Professor)

Christina Sylvester-Hvid (Experimental coordinator)

Kasper Lamberth (Peptide bank, Quality control)

Erland Johansson, Jeanette Nielsen (Preparations of peptides)

Hanne Møller (ELISA binding assay)
