

Anchor Profiles of HLA-Specific Peptides: Analysis by a Novel Affinity Scoring Method and Experimental Validation

Johan Desmet,1* Geert Meersseman,1 Nathalie Boutonnet,1 Jurgen Pletinckx,1 Krista De Clercq,1 Maja Debulpaep,2 Tessa Braeckman,2 and Ignace Lasters1

1AlgoNomics NV, Gent-Zwijnaarde, Belgium
2Vrije Universiteit Brussel, Labo Fysiologie-Immunologie, Brussel, Belgium

ABSTRACT The study of intermolecular interactions is a fundamental research subject in biology. Here we report on the development of a quantitative structure-based affinity scoring method for peptide–protein complexes, named PepScope. The method operates on the basis of a highly specific force field function (CHARMM) that is applied to all-atom structural representations of peptide–receptor complexes. Peptide side-chain contributions to total affinity are scored after detailed rotameric sampling followed by controlled energy refinement. A de novo approach to estimate dehydration energies was developed, based on the simulation of individual amino acids in a solvent box filled with explicit water molecules. Transferability of the method was demonstrated by its application to the hydrophobic HLA-A2 and -A24 receptors, the polar HLA-A1, and the sterically ruled HLA-B7 receptor. A combined theoretical and experimental study on 39 anchor substitutions in FxSKQYMTx/HLA-A2 and -A24 complexes indicated a prediction accuracy of about two thirds of a log-unit in Kd. Analysis of free energy contributions identified a great role of desolvation and conformational strain effects in establishing a given specificity profile. Interestingly, the method rightly predicted that most anchor profiles are less specific than so far assumed. This suggests that many potential T-cell epitopes could be missed with current prediction methods. The results presented in this work may therefore significantly affect T-cell epitope discovery programs applied in the field of peptide vaccine development. Proteins 2005;58:53–69. © 2004 Wiley-Liss, Inc.

Key words: peptide–receptor complex; MHC complex; HLA complex; anchor residue; anchor profile; binding specificity; affinity scoring function; solvent model; docking

INTRODUCTION

Peptides are important regulatory molecules involved in a variety of biological mechanisms. Their function is generally determined by processing kinetics, interaction specificity, and, more fundamentally, binding affinity. A thorough understanding of the contributions relevant for stable complex formation may form the basis of experimental rationalization, detection of novel ligands, and optimization of lead compounds. Here, predictive structure-based methods can be very helpful, provided that they are of sufficiently high accuracy.

Structure-based binding studies face two major technical barriers. The first resides in the prediction of accurate 3D structures for peptide–receptor complexes. Peptides are conformationally very flexible, since most of their chemical bonds are subject to free rotation. A partial solution is to perform flexible docking1,2 using predefined rotamers.3,4 Yet deviation from ideal rotameric states5 and small-scale flexibility due to bond angle bending6 present additional difficulties. Occasionally, peptides adopt variable binding modes,7–10 partly bulge out into solvent,11 or let some flanking residues hang out of the binding site.12 Yet it is often observed that one or more peptide side-chains are anchored into well-shaped pockets in the interface surface.13 In such cases, conformational flexibility is limited, which facilitates structure-based analysis.

The second problem is to derive accurate binding affinities from experimental or modeled representations. Even short peptides easily contain more than 100 atoms, making thousands of small pairwise atomic interactions. Further, ligand–receptor interface regions are rarely packed in an optimal way, and they often include multiple water molecules.14,15 Finally, binding affinity depends on thermodynamic properties of not only the bound but also the free states of the molecules involved.

In view of these complications, structure-based affinity scoring methods invariably include approximations and/or a parameterization step that makes use of experimental data. Different methodologies can be classified in two groups: dynamic and static approaches. The most successful dynamic methods are free energy perturbation and thermodynamic integration.16 They allow computation of relative binding free energies for a set of congeneric ligands. Since these methods are computationally very demanding, various assumptions and simplifications are usually introduced.17 Although promising results have been obtained on limited data, general transferability remains to be demonstrated.18 The second group of methodologies (i.e., static methods) largely ignores the dynamic behavior of molecules in solution. These methods attempt to derive binding affinities from one or more uniquely defined 3D structures. Here, a distinction is to be made between statistical and empirical methods. Statistical, or knowledge-based, scoring methods operate on the basis of atom (or group) contact potentials derived from known protein structures.19–21 Empirical, or partitioning, methods work with predefined physical energy terms, represented by parameterized mathematical equations that are optimized against experimental data.22–27 In view of the inevitable training step, validation on independent data is required. Here, it is not uncommon that methods performing relatively well on data similar to the training set are significantly less accurate on more divergent data sets28 or must even be retrained.29 Transferability therefore remains an important and delicate matter.30

Abbreviations: 3D, three-dimensional; A1, HLA-A*0101; A2, HLA-A*0201; A24, HLA-A*2402; ASA, accessible surface area; B7, HLA-B*0702; DMSO, dimethylsulfoxide; DTT, dithiothreitol; FEP, free energy perturbation; GSP, group solvation parameter; HLA, human leukocyte antigen; Kd, dissociation constant; MHC, major histocompatibility complex; pAla, poly-alanine; PDB, Protein Data Bank; RT, room temperature; wt, wild-type.

Grant sponsor: Vlaams Instituut voor de bevordering van het Wetenschappelijk-Technologisch onderzoek in de industrie; Grant number: 010265.

*Correspondence to: Johan Desmet, AlgoNomics NV, Technologiepark 4, B-9052 Gent, Belgium. E-mail: [email protected]

Received 23 April 2004; Accepted 20 July 2004

Published online 3 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20302

PROTEINS: Structure, Function, and Bioinformatics 58:53–69 (2005)

© 2004 WILEY-LISS, INC.

In this work, we have studied the binding characteristics of anchor residues within peptide ligands of HLA complexes. HLA class I molecules are immunologically important receptors involved in specific recognition between cytotoxic T lymphocytes and pathogen-infected cells.31 Pathogen-derived peptides, known as antigens, form stable complexes with HLA molecules that are presented at the cell surface as a result of a multistep processing mechanism.32 Bound peptides are mostly 8–10 residues long.33 Structural information from the PDB34 is available for 9 different HLA receptor subtypes. It is observed that all peptides adopt nearly extended conformations within the binding groove formed by 2 α-helices and a β-sheet. Common features of these complexes are the strong interactions between receptor side-chains and the N- and C-terminal ends of the peptide backbone. Nonameric peptides typically bulge out from the groove near residue positions P4–P5 (by convention, the N-terminal residue is assigned P1 and the C-terminal residue PΩ; since this work only deals with nonamers, P9 will be used instead of PΩ). The side-chains of peptide residues P2 (or occasionally P3) and P9 are located in well-formed pockets named B (or D) and F, respectively.35 Hence, the side-chain orientation of anchor residues and their structural context are relatively fixed. Yet there is a significant variety in anchor properties among different subtypes. Finally, anchor residues are dominant contributors to total affinity36 and greatly determine binding specificity.37 For all these reasons, anchor residues in HLA class I complexes are ideally suited to develop or test novel affinity scoring methods.

The main purpose of this work was to identify the most relevant physicochemical affinity determinants. Preliminary experimentation with possible contributors like contact-based potentials, weight-adapted conformational energy terms, shape complementarity, hydrophobic corrections, and different entropical components often yielded good results but poor transferability (results not shown). Further study indicated the great danger of overparameterization, leading to erroneous assignment of either false or redundant contributions. Underparameterization, in particular of conformational strain, turned out to be another problem. We therefore examined the possibility to develop a scoring function based on an established force field function. Recently, Ogata et al.38 published a peptide design method for HLA complexes based on the same option. Other design algorithms follow a similar approach.39 The principal advantage of force field based approaches is that different physicochemical interactions can be computed in a consistent way.

Here we report on the development of a novel affinity scoring method named PepScope. The original CHARMM force field40 was selected as the sole input function to PepScope. No parameterization steps were performed except at the level of the protocol (e.g., the number of energy minimization steps, cutoff distances, and model preparation strategy). Exactly the same force field function was used to derive (de)hydration energies from simulations of amino acid model compounds in explicit water environment. Potential inconsistencies between solvent terms derived from experimental data41–43 and intracomplex terms based on the force field were thus avoided.

The PepScope method has been applied to 4 physically different HLA receptors. A systematic study of all natural amino acid substitutions at the anchor positions in HLA-A*0101 (A1), HLA-A*0201 (A2), HLA-A*2402 (A24), and HLA-B*0702 (B7) was performed. The predictions were compared with experimental data either from literature (A1 and B7) or from our own binding assays (A2 and A24). The binding capacity of 39 nonamers FxSKQYMTx (x is any of the 20 natural amino acids) was tested on A2 and A24. The latter data are unique in that they display the anchor specificity over the entire range of possible substitutions. This has enabled us to quantify not only favorable but also disruptive effects. With respect to the latter, we emphasize the generally underestimated roles of desolvation and conformational strain.
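The count of 39 peptides can be checked with a short enumeration. The sketch below assumes that the panel consists of single substitutions at P2 or P9 of the sequence FLSKQYMTL (the peptide docked into A2 and A24 in the Methods), so that the unsubstituted peptide is shared by both series. The function name and the one-letter alphabet constant are illustrative, not part of PepScope.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 natural residue types
WILD_TYPE = "FLSKQYMTL"               # assumed parent peptide of the FxSKQYMTx panel

def anchor_variants(peptide, positions=(2, 9)):
    """Return all single substitutions at the given 1-based anchor positions."""
    variants = set()
    for pos in positions:
        for aa in AMINO_ACIDS:
            variants.add(peptide[:pos - 1] + aa + peptide[pos:])
    return variants

panel = anchor_variants(WILD_TYPE)
print(len(panel))  # 20 at P2 + 20 at P9 - 1 shared parent peptide
```

Under this assumption, the two series of 20 peptides overlap only in the parent sequence, which accounts for 39 distinct nonamers.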

MATERIALS AND METHODS

Model Preparation

The primary goal of the model preparation step is to generate templates suitable for estimating the affinity contribution of individual amino acid residues at anchor positions. General requirements are therefore (1) a receptor structure wherein at least the side-chains constituting the specificity pockets have correct conformations, (2) a peptide sequence and structure that is unbiased with respect to the anchor substitutions to be made, and (3) a peptide–receptor structure, or template, that is energetically relaxed. As described below, we utilized state-of-the-art side-chain modeling, peptide docking, and energy optimization tools to accomplish these goals. For a number of theoretical and practical reasons (see Discussion section), 3 templates were constructed per receptor type.

Several A2 complexes are directly available from the PDB.34 We selected the structure with PDB code 1DUZ because of its high resolution (1.8 Å) and because it contains a high-affinity nonameric peptide (LLFGYPVYV, Kd ≈ IC50 = 13 nM44). Since no structures are available for A1, A24, and B7, these had to be modeled by homology. For this purpose, we selected the following starting structures from the PDB: 1HSB (type Aw68) for A1, 1DUZ (type A2) for A24, and 1A9E (type B35) for B7. The coordinates for amino acid residues 1–181 (i.e., the α1α2 domain) and the extant peptide were extracted from the files. Water molecules were ignored in this study. The truncated peptide–HLA complexes were submitted to 200 steps of unrestrained steepest descent energy minimization.

In the second step, HLA receptor models were constructed from the selected homologs. A total of 21, 20, and 21 substitutions were required for A1, A24, and B7, respectively. (The model-building step was skipped for A2, since the 1DUZ structure was of the right type.) The correct residue types were first introduced in standard geometry (i.e., with standard bond lengths and angles). Then, the peptide was removed, since it would normally be incompatible with the receptor under construction. Next, we applied the FASTER algorithm45 to determine the energetically best side-chain packing of the substituted residues (conserved residues were kept fixed). This program has been described in detail before46 and was applied in an unmodified form. All substituted side-chains in the 3 constructed models could be placed free of interatomic clashes. Yet, since the FASTER method selects discrete conformations from a rotamer library, the final structure is susceptible to further relaxation (e.g., by gradient methods). The models were therefore subjected to another 200 steps of unrestrained steepest descent energy minimization. Finally, the peptide was placed back as poly-Ala, using the original coordinates for the backbone. (The peptide in the PDB structure 1HSB only contains residues P1–P3 and P8–P9, which is sufficient for the next step.)

In the third step, a high-affinity peptide was docked into each HLA model. Apart from being a quality control of the models, the main purpose of the docking step was to generate an ensemble of peptide backbone conformations within the HLA binding groove. The following sequences were docked: YTAVVPLVY for A1, FLSKQYMTL for both A2 and A24, and FPVRPQVPL for B7. The flexible docking algorithm and settings were applied exactly as described before.2,46 Briefly, all peptide bond lengths and angles were initialized in standard geometry. The backbone conformation was initialized with φ and ψ angles of −140° and 140° (extended mode) and with coordinates for the N, Cα, and C atoms of residue P1 copied from the poly-Ala peptide. The poly-Ala peptide was then removed. Next, the docking algorithm was instructed to rebuild the peptide from the N- toward the C-terminus, residue by residue, in a combinatorial fashion. An important feature of the algorithm is that all receptor side-chains in contact with the (growing) peptide are remodeled at each combinatorial step. Limited translation (max. 1 Å) of peptide fragments and, eventually, of the full-length peptide was allowed as well. Typically, the docking algorithm identifies 50 to a few hundred energetically favorable final complex structures. For the present study on anchor residues, 3 structures per receptor type were found to be the optimal compromise between structural diversity and computational efficiency requirements. The selections were made from the top 10 docking solutions by graphical inspection, aiming for maximal peptide backbone variation, while maintaining global (groove) and local (near anchors) structural integrity. In all cases, the lowest energy solution from docking was selected, plus 2 additional structures in accordance with the criteria indicated.

The model preparation step was completed by energy-minimizing all structures. They were submitted to 800 steps of conjugate gradient minimization (i.e., the same as in the succeeding scoring step). Finally, all peptides were again mutated into poly-Ala in order to avoid any bias from the peptide sequence when scoring individual residue types at anchor positions. The resulting structures, referred to as pAla-HLA models, were stored as input data for the scoring algorithm.

Affinity Scoring Algorithm

Affinity scoring of anchor residues was accomplished basically by a combined side-chain rotameric search and energy refinement approach. The following steps were performed individually for all amino acid side-chains at each anchor position in all pAla–HLA complex models.

First, the side-chain was introduced in standard geometry. Then, all rotamers from the same library as used in the model preparation were applied consecutively to the mutated side-chain. Dummy rotamers were used for Gly, Ala, and Pro. Next, each rotameric variant was submitted to 800 steps of conjugate gradient energy minimization. Moderate positional restraints (1 kcal/Å²) were applied to the full backbone of the complex except the substituted residue and its flanking peptidic groups. The cutoff for nonbonded interactions was set to 14 Å. These settings were the result of numerous trial experiments (data not shown) related to a single question: What is the fastest way to obtain the energetically most favorable structure for any possible anchor substitution in any of the pAla–HLA complexes? An extensive rotameric search in combination with thorough energy minimization was needed to adequately probe the conformational space. Conjugate gradient minimization was found superior to steepest descent. A total of 800 iteration steps allowed convergence to a more or less stable minimum, although a slow relaxation phase (or “drift”) was observed as well. The latter was mainly due to a global collapse of the receptor around the “tiny” pAla peptide. The application of positional restraints largely solved this problem. Finally, the 14-Å cutoff was chosen as the maximal value allowing acceptable computation times. It is stressed that none of these settings have been assigned as the result of a parameterization effort against experimental data.


The following analyses were carried out on each substituted and minimized structure: computation of ASA of the introduced side-chain rotamer, conversion of its ASA into percentage buried surface area (%BSA), calculation of the associated desolvation energy (Edesolv), computation of side-chain–receptor nonbonded interactions (Einter), and computation of conformational strain energy (Estrain). The modeling step is concluded by selecting the rotamer r with the best total energy score (Etotal), in accordance with Eq. (1):

$$E_{\mathrm{total}} = \min_{r}\,\{E_{\mathrm{desolv}}(r) + E_{\mathrm{inter}}(r) + E_{\mathrm{strain}}(r)\} \tag{1}$$

This expression defines a unique structure and associated energy score for a given amino acid at a given peptide anchor position in one of the models for a given HLA complex. These energies, also referred to as affinity scores, were calculated for all 20 natural amino acid substitutions at the anchor positions P2 and P9 in pAla-A2, -A24, and -B7, and positions P3 and P9 in pAla-A1. Noise effects due to imperfections in individual models or fluctuations in the minimization path were reduced by taking the average of the Etotal values over the 3 models constructed per receptor type.
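The selection and averaging just described can be illustrated with a minimal sketch. It assumes that each rotamer is represented by a tuple of its three energy terms and each template model by a list of such tuples; the function names and all numerical values are hypothetical and serve only to show the order of operations (minimum over rotamers first, mean over models second).

```python
from statistics import mean

def best_rotamer_score(rotamer_terms):
    """Eq. (1): lowest sum of desolvation, interaction, and strain energy."""
    return min(e_desolv + e_inter + e_strain
               for e_desolv, e_inter, e_strain in rotamer_terms)

def affinity_score(models):
    """Average the per-model E_total over the templates of one receptor type."""
    return mean(best_rotamer_score(rotamers) for rotamers in models)

# Hypothetical (E_desolv, E_inter, E_strain) triples in kcal/mol,
# two rotamers in each of three template models.
models = [
    [(2.0, -6.0, 1.0), (1.5, -4.0, 0.5)],
    [(2.5, -7.0, 1.5), (2.0, -3.0, 0.0)],
    [(1.0, -5.5, 0.0), (3.0, -6.0, 2.0)],
]
score = affinity_score(models)
```

With these made-up numbers the best rotamers score −3.0, −3.0, and −4.5 kcal/mol, so the averaged affinity score is −3.5 kcal/mol.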

The PepScope scoring function by default subdivides each of the 3 global energy terms into more elementary components, primarily for the sake of comprehensibility. In the case of the desolvation terms, this also enables working with GSPs42,43,47,48 rendering desolvation terms conformation sensitive. The global energies are subdivided according to Eqs. (2) through (4); the desolvation term is explained in the next section, and the interaction and strain contributions are discussed in the next paragraphs.

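$$E_{\mathrm{desolv}} = -\sum_{i} \%\mathrm{BSA}(i) \times \mathrm{GSP}(i) \tag{2}$$

$$E_{\mathrm{inter}} = \sum_{i} \{E_{\mathrm{vdw}}(i) + E_{\mathrm{hbo}}(i) + E_{\mathrm{ele}}(i)\} \tag{3}$$

$$E_{\mathrm{strain}} = E_{\mathrm{rec}} + E_{\mathrm{pep}} + E_{\mathrm{self}} \tag{4}$$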

In Eqs. (2) and (3), i denotes one of the chemical functions present in the mutated side-chain (defined in the Fig. 1 legend). Any given side-chain type is described by maximally 3 chemical groups. All side-chain types comprise group 1, defined as the aliphatic moiety. The subdivision of Edesolv and Einter into group contributions is justified in view of the additive nature of surface areas, as well as nonbonded energy terms.
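The group-wise bookkeeping of Eqs. (2) through (4) can be sketched as follows. This is an illustration under stated assumptions, not the PepScope implementation: GSPs are taken to be hydration energies (negative for favorable hydration), so the desolvation cost is their opposite weighted by the buried fraction; %BSA is converted from a percentage to a fraction; and the dictionary keys, function names, and numbers are hypothetical.

```python
def desolvation_energy(groups):
    """Eq. (2): cost of burying each chemical group, weighted by its buried fraction."""
    return -sum(g["pct_bsa"] / 100.0 * g["gsp"] for g in groups)

def interaction_energy(groups):
    """Eq. (3): sum of van der Waals, H-bond, and electrostatic terms per group."""
    return sum(g["vdw"] + g["hbo"] + g["ele"] for g in groups)

def strain_energy(e_rec, e_pep, e_self):
    """Eq. (4): receptor, peptide, and self strain."""
    return e_rec + e_pep + e_self

# Hypothetical two-group side-chain (aliphatic moiety plus hydroxyl), kcal/mol.
side_chain = [
    {"pct_bsa": 90.0, "gsp": -2.0, "vdw": -3.0, "hbo": 0.0, "ele": 0.0},
    {"pct_bsa": 50.0, "gsp": -6.0, "vdw": -0.5, "hbo": -1.5, "ele": -1.0},
]
e_desolv = desolvation_energy(side_chain)
e_inter = interaction_energy(side_chain)
e_strain = strain_energy(0.4, 0.3, 0.1)
```

For these made-up values the desolvation cost is 4.8 kcal/mol, the direct interactions sum to −6.0 kcal/mol, and the strain to 0.8 kcal/mol, which shows how a polar group that is only half buried can still dominate the desolvation penalty.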

Fig. 1. Computed hydration energies for amino acid side-chains in a water box. Contributions for van der Waals (black/gray bars), H-bonding (blue), and electrostatic interactions (red) are given separately. Values are further subdivided per chemical group present in a side-chain. Nine different functions are considered: (1) aliphatic CxHy; (2) aromatic CxHy; (3) aromatic NxHy; (4) hydroxyl, OH; (5) sulphur/sulphydryl, S/SH; (6) charged amine, NH3+; (7) carboxyl, COO−; (8) amide, CONH2; and (9) guanidinium, NHC(NH2)2 atoms. Thus, each side-chain is composed of maximally three groups. Values for the aliphatic moieties (all residue types) are indicated by thin outlines and with dark colors (black, dark blue, red). The second chemical moiety, if any, is indicated by thick outlines and light colors (gray, light blue, orange). The third moiety (only Tyr, Trp, and His) again is like the first chemical group. The sum of van der Waals, H-bonding, and electrostatic energies for a given chemical group is defined as the corresponding group solvation parameter [Eq. (2)]. The absolute value of the total cumulative energy of a side-chain corresponds to the maximal side-chain desolvation cost.

The direct side-chain–receptor interactions (Einter) are the most obvious contributions. They consist of van der Waals interactions quantified by a “6-12” Lennard–Jones potential [Evdw(i)], H-bonds represented by a “10-12” potential [Ehbo(i)], and electrostatic interactions calculated by a Coulombic equation with a distance-dependent dielectric constant49 [Eele(i)]. van der Waals and H-bond interactions were computed with a cutoff distance of 16 Å, while 25 Å was used for electrostatic energy. These very large cutoffs were chosen to include nearly all interactions (i.e., to make the method essentially cutoff-independent). Computed values for Evdw(i), where i refers to aromatic or guanidinium groups, were reduced to 80% of the original value to avoid overprediction. This adjustment resulted from the observation that the van der Waals parameters for these atom types in the CHARMM library are suspectedly favorable. (On average, the van der Waals energy of an aromatic carbon with any other atom type is 1.46 ± 0.03 times that of an aliphatic carbon. For guanidinium nitrogens, this is even a factor 1.65 ± 0.08.) Also, preliminary tests on HLA-A2 anchor substitutions led to significantly overpredicted (too negative) values, which could be explained only by assuming that the associated van der Waals parameters in CHARMM have been historically overestimated. Thus, we were forced to introduce a weight coefficient and to optimize it against our experimental data. The same coefficient was used for aromatic and guanidinium groups. Its optimal value (0.8) was determined solely on A2 data and, subsequently, applied to the other receptors without readjustment.
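The single fitted coefficient amounts to a per-group scaling of the van der Waals term. The following sketch shows one way to express it; the group labels follow the Fig. 1 legend, and treating both aromatic CxHy and aromatic NxHy as "aromatic groups" is an assumption of this example, as are the names used.

```python
VDW_WEIGHT = 0.8  # weight coefficient reported in the text, fitted on A2 data only

# Assumption: "aromatic groups" covers both aromatic functions of the Fig. 1 legend.
DOWNWEIGHTED_GROUPS = {"aromatic CxHy", "aromatic NxHy", "guanidinium"}

def weighted_vdw(group, e_vdw):
    """Scale the van der Waals energy of aromatic and guanidinium groups to 80%."""
    if group in DOWNWEIGHTED_GROUPS:
        return VDW_WEIGHT * e_vdw
    return e_vdw

aromatic = weighted_vdw("aromatic CxHy", -5.0)   # hypothetical raw value, kcal/mol
aliphatic = weighted_vdw("aliphatic CxHy", -5.0)
```

In this illustration a raw aromatic contribution of −5.0 kcal/mol becomes −4.0 kcal/mol, while the aliphatic contribution is left unchanged.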

The strain term Estrain stands for “every increment in energy due to the mutant side-chain, except direct interactions.” Computing it by comparing energies before and after minimization would mostly result in “negative increments” due to global structural drift independent of the substitution. Therefore, strain contributions are computed on the mutated and minimized structure, and compared with the same terms derived from the minimized pAla-HLA structure (“mutant” strain minus “Ala” strain). A first component of Estrain is Erec, the strain energy residing in the receptor, more precisely within the set of atoms closer than 15 Å from the Cα-atom of the mutated position. A second component is Epep, the strain felt by the entire poly-Ala peptide (i.e., the full peptide minus the substituted residue). This term includes both the self energy of the peptide and its interactions with the receptor. The third component is Eself, the self-tension of the mutation, including all bonded and nonbonded energies within the mutated residue (side-chain, main-chain, and flanking peptide groups). In contrast to Erec and Epep, Eself is measured relative to the self-energy of the same amino acid in the water box, and not relative to the minimized pAla-HLA structure. Occasionally, one or more of the strain terms assume unrealistically high values. However, rather than imposing general strain maxima, the full score is always calculated first [Eq. (1)] and then, if higher, truncated at 3.0 kcal/mol.
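The reference subtraction and the final truncation can be summarized in a few lines. The sketch below is illustrative only: it assumes that the cap is applied to the summed score rather than to individual strain terms, as the paragraph above states, and all names and values are hypothetical.

```python
SCORE_CAP = 3.0  # kcal/mol, upper bound applied to the full score

def relative_strain(mutant_energy, reference_energy):
    """Strain increment: energy in the mutant structure minus the reference.

    The reference is the minimized pAla-HLA structure for E_rec and E_pep,
    and the water-box self-energy for E_self.
    """
    return mutant_energy - reference_energy

def capped_total(e_desolv, e_inter, e_strain):
    """Compute the full score first, then truncate it at the cap if higher."""
    return min(e_desolv + e_inter + e_strain, SCORE_CAP)

# Hypothetical values in kcal/mol.
e_rec = relative_strain(-1200.0, -1201.5)        # receptor strain: +1.5
clash_case = capped_total(1.0, -2.0, 9.0)        # raw score 8.0, capped
normal_case = capped_total(1.0, -5.0, 1.0)       # raw score -3.0, unchanged
```

Capping the total instead of each term keeps favorable contributions from being offset by an unbounded strain value while leaving well-behaved scores untouched.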

Computation of Group Solvation Parameters

Acetylated and aminomethylated amino acids were placed at the center of a spherical water box with a radius of 37 Å and containing 6840 molecules in a TIP4P configuration.50 Side-chain conformations were retrieved from the same rotamer library as used in the preparation of complex models. Rotamers were considered one by one, for all 20 natural residue types. For each rotamer, the dimensions of the system were reduced by retaining only the water molecules in a 20 Å layer around the solute. Next, all overlapping water molecules were removed. Overlap was defined as a distance between any solute–water atom pair smaller than the sum of their respective van der Waals radii, minus a tolerance of 1 Å. The system was subsequently energy-minimized by performing 200 steps of conjugate gradient minimization using a 14 Å cutoff and 10 kcal/Å² positional restraint on the water oxygen atoms (not on the hydrogens, nor on the solute atoms). The whole procedure was repeated 100 times per rotamer using slightly different initial placements (a uniform random offset relative to the center of the sphere was applied to the X-, Y- and Z-coordinates of the solute, sampled from the interval [−2, 2] Å).
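The overlap criterion and the randomized placement lend themselves to a compact sketch. The code below assumes atoms are given as (coordinates, van der Waals radius) pairs and water molecules as lists of such atoms; the radii, coordinates, and function names are hypothetical, and no minimization is performed.

```python
import math
import random

def water_overlaps(solute_atoms, water_atoms, tolerance=1.0):
    """True if any solute-water atom pair is closer than the radii sum minus the tolerance."""
    return any(
        math.dist(s_xyz, w_xyz) < s_rad + w_rad - tolerance
        for s_xyz, s_rad in solute_atoms
        for w_xyz, w_rad in water_atoms
    )

def prune_waters(solute_atoms, waters, tolerance=1.0):
    """Keep only the water molecules that do not overlap with the solute."""
    return [w for w in waters if not water_overlaps(solute_atoms, w, tolerance)]

def random_offset(rng, half_width=2.0):
    """Uniform random X, Y, Z offset (in Å) for the initial solute placement."""
    return tuple(rng.uniform(-half_width, half_width) for _ in range(3))

# Hypothetical one-atom solute and two single-atom waters (radii in Å).
solute = [((0.0, 0.0, 0.0), 1.7)]
waters = [[((2.0, 0.0, 0.0), 1.5)], [((3.0, 0.0, 0.0), 1.5)]]
kept = prune_waters(solute, waters)
offset = random_offset(random.Random(0))
```

Here the overlap threshold is 1.7 + 1.5 − 1.0 = 2.2 Å, so the water at 2.0 Å is removed and the one at 3.0 Å is kept.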

Group solvation parameters were derived from these ensembles in a way very similar to peptides in complex. Direct side-chain–water interactions were computed using the same CHARMM force field, cutoffs, and chemical function definition [i.e., using an equation that is formally identical to Eq. (2)]. Rotamer self-energies (bonded and nonbonded energy of the central residue with its flanking peptide groups) were also calculated. Main-chain–water and water–water interactions were ignored, assuming for the former a negligible and for the latter a linear response effect of the side-chain considered.

In practice, from the ensemble of 100 simulations per side-chain rotamer, the 25 solutions having the best total side-chain–water plus self energy were retained and their values were averaged. Finally, the rotamer with the lowest average energy was retained for each residue type.
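The selection step just described (keep the best 25 of 100 runs, average them, then pick the rotamer with the lowest average) can be sketched as follows. The function names and the dictionary layout are hypothetical; each energy is assumed to be the total side-chain–water plus self energy of one simulation.

```python
from statistics import mean


def rotamer_energy(run_energies, keep=25):
    """Average of the `keep` lowest (most favorable) energies of one rotamer."""
    best = sorted(run_energies)[:keep]
    return mean(best)


def select_rotamer(ensembles, keep=25):
    """ensembles maps a rotamer label to the list of energies from its runs.

    Returns the label with the lowest averaged energy and that energy.
    """
    averages = {name: rotamer_energy(e, keep) for name, e in ensembles.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]
```

Averaging only the best quarter of the runs makes the estimate less sensitive to poorly relaxed placements than a mean over all 100 simulations would be.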

Figure 1 shows the computed energies, subdivided into van der Waals, H-bonding, and electrostatic interactions for each chemical group. These values were used as GSPs in Eq. (2), that is, it was assumed that the energetic cost associated with desolvation of a given chemical moiety (upon burial in a complex) can be approximated by the opposite of direct nonbonded solute–water interactions in the water box described. Theoretical and practical issues concerning this relatively simple approach will be discussed.

Peptide Binding Assays

IC50 values were determined using a cell-based assay, largely according to van der Burg et al.51 and Kessler et al.52 Briefly, immortalized B-cells displaying HLA-A*0201 or HLA-A*2402 homozygously [VOSE EBV (A*0201, B*4402, Cw*0501/0711) and HATT EBV (A*2402, B*4801, Cw*0801/1202), kind gifts from Pierre van der Bruggen, UCL, Brussels] are stripped of their self-peptides, followed by equilibrium binding of test peptide in competition with fluorescent reference peptide [FLPSDC(5Fluorescein)FPSV for A2 and RYLKC(5Fluorescein)QQLL for A24]. A 10-point concentration range of test peptide is used for each measurement, typically in 2-fold increments from 62.5 nM to 32 µM, in a constant background of 30 nM reference peptide. Adapted ranges were used for excellent binders (minimal concentration 7.8 nM) and weak binders (maximal concentration 128 µM). Fifty percent inhibitory concentrations (IC50 values) were calculated as averages obtained from at least 3 independent measurements (i.e., from different cell preparations and peptide dilutions).
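As a quick check of the standard concentration range, the short Python sketch below generates a 10-point, 2-fold series starting at 62.5 nM. The function name and its defaults are illustrative only; whether the adapted ranges for excellent and weak binders also use 10 points is not stated in the text.

```python
def dilution_series(start_nm=62.5, points=10, factor=2.0):
    """Concentrations in nM for a geometric dilution series, lowest first."""
    return [start_nm * factor**i for i in range(points)]


series = dilution_series()
# The last point is 62.5 nM * 2**9 = 32000 nM, i.e. 32 µM.
```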

Test peptides ≥ 95% pure (Thermo Electron GmbH) were stored at 10 mM in DMSO at −20°C. Cysteine-containing peptides were stored at 10 mM in 1 mM DTT DMSO. DTT did not affect binding of FLSKQYMTL control peptide but significantly improved binding and reproducibility for FCSKQYMTL and FLSKQYMTC on both A2 and A24 (not shown). Lack of binding of cysteine-containing peptide stocks (without DTT) could be fully restored by addition of DTT to 1 mM, to a 2 mM FCSKQYMTL or FLSKQYMTC DMSO stock (15 min incubation at RT prior to distribution). IC50 values were converted to binding free energies (ΔG) using the relationships

$$K_d = \frac{IC_{50}}{1 + c_{\mathrm{ref}}/K_d^{\mathrm{ref}}} \tag{5}$$

and

$$\Delta G = RT \ln(K_d) \tag{6}$$

where Kdref and cref are the dissociation constant and formal concentration of the reference peptide, respectively. The Kdref values were derived from an independent binding assay in which the fluorescence intensity was measured as a function of increasing concentrations of reference peptide. Nonlinear curve fitting using a single-site binding scheme gave approximate Kd’s of 3 nM for the A2 and 30 nM for the A24 reference peptides, respectively.
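The two conversions, Eqs. (5) and (6), can be written as a minimal Python sketch. Several points are assumptions rather than statements from the text: the temperature of 300 K (chosen because it reproduces the 5.74 kJ/mol per log-unit of Kd quoted later in the paper), the use of molar concentrations with a 1 M standard state, and the function names.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)
T_K = 300.0            # assumed temperature; not stated in the text


def ic50_to_kd(ic50, c_ref, kd_ref):
    """Eq. (5): correct an IC50 for competition with the reference peptide.

    All three arguments must be in the same concentration unit.
    """
    return ic50 / (1.0 + c_ref / kd_ref)


def kd_to_dg(kd_molar, temperature=T_K):
    """Eq. (6): binding free energy in kJ/mol from a Kd in mol/L."""
    return R_KJ * temperature * math.log(kd_molar)


# Example for the A2 assay: 30 nM reference peptide with a Kd of about 3 nM,
# so the correction factor is 1 + 30/3 = 11. The IC50 below is hypothetical.
kd = ic50_to_kd(330e-9, 30e-9, 3e-9)  # about 30 nM
dg = kd_to_dg(kd)
```

With these constants, a ten-fold change in Kd shifts ΔG by about 5.74 kJ/mol.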

RESULTS

Desolvation Model

The method used to quantify desolvation effects is critical for accurate prediction of binding affinities. A possible approach is to combine experimental (de)hydration energies with intracomplex energies.53,54 However, for one reason or another, both sources of data may show mutual inconsistencies.55 We have therefore attempted to derive a protocol for assessing hydration energies that maximally resembles the method for computing intracomplex terms. The requisites were (1) an all-atom representation of explicit water molecules, (2) rotameric sampling of amino acid conformational space, (3) energy refinement by the same (conjugate) gradient method, (4) evaluation of multiple representations, and (5) energetic analysis per chemical group. Importantly, we applied the same CHARMM force field in this solvent context as we did in the context of HLA receptors.

Solute–solvent interactions of the 20 natural amino acid residues, submerged into a water box (see Materials and Methods section), are shown in Figure 1. The energetic analysis was performed separately for van der Waals, H-bonding, and electrostatic interactions with solvent water, and for the different chemical functions present in each side-chain. It is seen that the aliphatic residues Ala, Pro, Val, Ile, and Leu have small, negative hydration energies arising almost purely from van der Waals interactions. They remain below 5 kcal/mol in absolute value. In contrast, polar side-chains other than Ser and Thr make about 3 to 4 times stronger interactions, roughly between −13 kcal/mol (Lys) and −19 kcal/mol (Glu). Thus, even nonpolar side-chains have favorable hydration energies, but the latter are much smaller than those of polar side-chains.

All OH groups (Ser, Thr, Tyr) have similar values: negligible van der Waals and about −3 kcal/mol H-bonding and electrostatic interactions. Amide and carboxylate groups have nearly identical H-bonding and electrostatic interactions (about −5 and −7 kcal/mol, respectively). This may seem surprising in view of the carboxylate groups carrying a net charge, as opposed to amide groups. Apparently, the OC- and the two HN-dipoles in an amide group are almost equivalent to the two OC-dipoles carrying negative charges in carboxylates. Compared to OH-groups, amides and carboxylates have much stronger van der Waals (≈7×), slightly stronger H-bonds (≈1.5×) and much stronger electrostatics (≈2.5×). Very similar results are obtained for the joint N�-/N�H-groups from the His imidazole ring. Considering that these contributions to hydration energy must be overcompensated in a complex in order for the corresponding groups to have a net stabilizing effect, the values obtained are very large.

The charged amine of Lys and the guanidinium function of Arg both have H-bonding terms similar to that of an OH-group (≈ −3.0 kcal/mol). Also, the electrostatic terms hardly differ: That of the guanidinium group is ≈1 kcal/mol smaller than for OH. This was yet another unexpected result in view of both Lys and Arg being charged and possessing many more dipoles than OH-containing residues. The main difference with OH-functions is observed at the level of van der Waals interactions: The Lys and Arg polar groups respectively have 2.5- and 12-fold stronger van der Waals interactions than OH.

The S-/SH-groups of Cys and Met are poor H-bond formers and make weak electrostatic interactions in water. In contrast, the van der Waals contribution is exceptionally large (−3.3 kcal/mol), which is obviously due to the greater polarizability of S-atoms. Finally, aromatic atoms have a relatively simple hydration profile: no H-bonds, very weak electrostatics, and regular van der Waals interactions of about −1.3 kcal/mol per heavy atom.

These computed hydration energies are to be considered as GSPs. Once established, they can be stored in a simple lookup table. This allows rapid assessment of the energetic cost associated with (partial) dehydration of any side-chain upon burial in a complex, in accordance with Eq. (2). In principle, the same approach can be followed for main-chain moieties, but this was considered unnecessary in this work in view of the constant conformation of the backbone part of anchor residues. Finally, we expressly avoided any form of parameterization in the establishing of GSP values in order to avoid bias to specific receptor types. What follows can therefore be seen as validation experiments.

Comparison With Experimental Dehydration Energy

The first requisite of the solvation model was compatibility with force field-based energy calculations in peptide–receptor complexes. Nevertheless, we have also directly compared the calculated values with experimental data from Ooi et al.42 They have decomposed experimental hydration free energies of small organic compounds into contributions for 7 functional groups by assuming proportionality with ASA. From these terms, the authors then reconstituted side-chain hydration free energies.

Figure 2 shows a correlation plot between the experimental data and our calculated values. It is seen that both data sets correlate well, except for Arg, Tyr, and Ser, which have very negative (“hydrophilic”) hydration energies in the experimental set. Ignoring the latter, a correlation with R² = 0.84 and a slope of 0.48 is obtained. The experimental data suggest that Arg and, to a lesser extent, Tyr and Ser are markedly more hydrophilic than expected on the basis of our computations.

Another important discrepancy is the anticorrelation for the aliphatic side-chains Leu, Ile, Val, Pro, Ala, and Gly. From Gly to Leu, we find a decrease in hydration energy of about 5 kcal/mol, while the experimental data show an increase of about 2 kcal/mol. Even after rescaling our values on the basis of the slope of the regression line, the difference remains as large as 2 + 0.48 × 5 ≈ 4.5 kcal/mol. In terms of dissociation constant, this corresponds to a pure “dehydration advantage” of Leu over Gly by more than 3 log-orders in Kd (1 logKd = 2.303RT = 1.37 kcal/mol). In addition, a Leu side-chain in the context of a particular complex will generally interact more favorably (by several kilocalories per mole) than the much smaller Gly, which further tips the scale in favor of the former. Thus, in the absence of any other compensatory effects, the usage of hydration data from an external (e.g., experimental) source would strongly favor the larger aliphatic side-chains over the smaller ones. In contrast, our force field-based hydration model suggests that the larger aliphatic groups are disfavored in comparison with the smaller ones.
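The arithmetic behind the "more than 3 log-orders" statement can be retraced with a few lines of Python. The temperature of 300 K is an assumption (it reproduces the quoted 1.37 kcal/mol per log-unit), and the input values are the rounded numbers given in the paragraph above, so the gap comes out at 4.4 rather than the quoted 4.5 kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
T_K = 300.0           # assumed temperature; not stated in the text

kcal_per_log_kd = math.log(10) * R_KCAL * T_K  # 2.303 RT, about 1.37 kcal/mol

computed_decrease = 5.0      # kcal/mol, Gly to Leu, this work (rounded)
experimental_increase = 2.0  # kcal/mol, Gly to Leu, experimental set (rounded)
regression_slope = 0.48      # slope of the regression line in Fig. 2

# Rescale the computed trend by the slope, then add the opposing experimental trend.
gap_kcal = experimental_increase + regression_slope * computed_decrease
gap_log_kd = gap_kcal / kcal_per_log_kd  # a little over 3 log-units in Kd
```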

Below, we discuss the origin of the deviations observed in Figure 2. For now, we only conclude that the calculated dehydration energies are in reasonable agreement with experimental free energies. Which approach performs best in combination with a force field-based scoring function applied to complexes remains to be demonstrated.

Affinity Scoring of HLA-A2 Complexes

HLA-A*0201 (A2) is one of the most extensively studied peptide-binding receptor molecules. It is known to show a strong preference for peptide ligands having Leu at position P2 and Val or Leu at P9.56,57 The modeling of all 20 natural amino acid residues at the anchor positions P2 and P9 was performed systematically for the 3 pAla-A2 models, as described. Figure 3 shows the averaged scoring values for position P2 in pAla-A2. Total affinity scores have been dissected into the 3 major energetic components: desolvation energy, side-chain–complex interaction (including interactions with all pAla residues but the mutated one), and “local strain” (including strain within the mutant residue). All values are expressed in units of kilocalories per mole (i.e., the units of the force field equations).

A striking observation from Figure 3 is that direct side-chain–complex interactions are very large: For 50% of the residues, they lie in the range −15 to −20 kcal/mol. It is seen that most of the side-chains “larger than Leu” have similar interactions, with values around −17 kcal/mol. The smaller side-chains, most of which are aliphatic, have variable interactions ranging from 0 (Gly) to −12 kcal/mol (Leu). Thus, in the absence of any compensatory effects, pure force field-derived energies would predict aromatic and polar side-chains to be largely preferred over aliphatic residues at position P2 in A2 complexes.

Rather, the opposite is true: Peptides binding strongly to A2 mostly have Leu, Met, Ile, and sometimes Val, Ala, or Thr at position P2.32 Figure 3 shows that the explanation is to be found at two levels, namely, the compensatory desolvation and strain terms. From Figure 2, it followed that polar and aromatic side-chains are characterized by relatively large desolvation energies. The latter appears to be the primary reason for Ser, Asn, Asp, and Glu to be undesired. Yet the prohibitive nature of various other side-chains can be explained only by an additional compensatory component. Here, this is found to be the strain induced in the receptor by ill-fitting side-chains. The latter is applicable mainly to the aromatic Phe, Tyr, Trp, His, and the basic Lys and Arg side-chains. Even the aliphatic Val, Ile, Leu, and especially Pro induce some tension in their environment.

Gln is the sole polar side-chain that is reasonably well accepted in the receptor pocket at P2. Figure 3 shows that Gln can be accommodated almost free of strain, but the desolvation cost is large (17.2 kcal/mol). Yet Gln seems to have no problem in compensating for this by making exceptionally favorable interactions. The inset in Figure 3 shows more detail. The aliphatic part of its side-chain interacts favorably with the pocket residues, almost exclusively through van der Waals terms. The amide group contributes even better: −9.6, −2.6, and −2.9 kcal/mol from van der Waals, H-bonds, and electrostatic interactions, respectively. The total van der Waals interaction in the complex is −15.7 kcal/mol, or roughly 3 times stronger than in the water box. In itself, this is not an exceptional situation for buried, well-packed side-chains, including the polar ones (data not shown). What really makes the difference is the ability of Gln to form 2 nearly ideal H-bonds, one between the first of its 2 amide Hε atoms and Glu A63.Oε, and the other between the second amide Hε and the backbone A63.O atom. Only Lys can make the same 2 H-bonds, but this occurs at the cost of a considerable strain of 5 kcal/mol distributed in proportions of 2:1:2 over side-chain self, peptide pAla energy, and receptor tension, respectively. Csh, Ser, Thr, and Asn can only make 1 H-bond, which is apparently not sufficient to make them preferred residues.

Fig. 2. Correlation between calculated and experimental hydration energies of amino acid side-chains. Experimental data (ΔGh) are taken from Ooi et al., Table 4.42 Arg, Tyr, and Ser (filled symbols) were ignored in the regression analysis.

The net result of the delicate balance is shown by the tiny black bars in Figure 3. When plotted on a scale suitable to display the individual contributions, the total energy score almost vanishes (the total energy often amounts to less than 10% of the direct side-chain–receptor interaction). Pending the experimental validation below, our top-10 ranked residues (L-M-V-I-Q-A-T-S-C-K) correlate well with those of the popular Bimas scoring matrix58,59 (L-M-I-Q-V-E-A-T-C-G). Only Glu at rank 6 in the Bimas series is in disagreement with our method, which ranks Glu as the fifth worst type at P2 in A2. It will be seen that Glu in the Bimas matrix is a false positive.

Experimental Validation for A2

Validation of the predicted binding capacity was achieved by direct comparison with experimental affinities obtained for a series of systematic P2 mutants of a selected peptide. We chose the nonameric peptide FLSKQYMTL from hepatitis B virus polymerase, residues 676–684, because of its cross-reactivity between A2 and A24.

Direct comparison with experimental data is impeded by some technical difficulties: (1) Prediction scores result from the assessment of individual side-chains in a reference pAla-HLA context, while experimental values are for full-length peptides; (2) prediction scores are expressed in units of the force field (i.e., kilocalories per mole), while experimental values were measured as IC50’s and converted to binding free energies (in kilojoules per mole) using Eqs. (5) and (6). While the former is a typical offset problem, the latter seems a trivial conversion problem (1 kcal = 4.187 kJ). However, values derived from a force field function are basically potential and not free energies. Yet all force field-based methods, including some free energy perturbation methods,18 assume a linear relationship between experimental free energy and computed values. With respect to said difficulties, this allows raw prediction scores (E) to be converted to theoretical free energies (E′) by means of least-squares fitting, in accordance with the equation

$$E'(\mathrm{kJ/mol}) = a + b\,E(\mathrm{kcal/mol}) \tag{7}$$

where a is the intercept and b the slope of the least-squares regression line of the correlation plot. For the P2 mutants, the slope of the regression line (b) equaled 2.44 kJ/kcal and the intercept was −33.6 kJ/mol. The latter value is a measure for the affinity contributed by the peptide minus the substituted side-chain. The slope of the curve amounts to 0.58 times the theoretical value of 4.187 kJ/kcal, which is fairly consistent with the simulations in the water box (Fig. 2).
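The calibration of Eq. (7) amounts to an ordinary least-squares fit of experimental free energies against raw scores. The Python sketch below uses the rounded A2/P2 values printed in Table I (Score and experimental ΔG columns); the function names are hypothetical. Because the inputs are rounded, the fit should land close to, but not exactly on, the reported slope of 2.44 kJ/kcal and intercept of −33.6 kJ/mol.

```python
# A2, position P2: raw scores (kcal/mol) and experimental dG (kJ/mol), Table I order
scores = [-4.8, -4.4, -2.7, -3.2, -2.4, -1.4, -2.1, 0.5, -0.5, -0.5,
          0.8, 0.5, 0.1, 2.2, 2.4, 0.1, 1.8, 0.2, 2.5, 1.5]
dg_exp = [-42.4, -41.8, -41.5, -40.8, -40.4, -39.9, -39.7, -39.5, -38.9, -37.3,
          -36.8, -35.9, -34.0, -29.9, -27.8, -27.2, -26.7, -26.3, -24.1, -23.2]


def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


a, b = least_squares(scores, dg_exp)


def calibrate(score_kcal):
    """Eq. (7): convert a raw score (kcal/mol) to a calibrated energy (kJ/mol)."""
    return a + b * score_kcal
```

The ratio `b / 4.187` then gives the factor of roughly 0.58 mentioned in the text.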

Table I shows the data for the wt peptide FLSKQYMTL and all 38 P2 and P9 substitutions. The experimental data confirm the potential of Leu at P2 for strong binding. The Met and Ile variants essentially bind with the same affinity. Val and Gln at P2 also allow strong binding. These data are highly consistent with what is known about the P2 preference of A2 binding peptides.32,58 However, our data set further shows that Thr, Ala, Gly, and Ser are feasible residues as well, something which is rarely recognized. Even the Csh and Phe variants have a dissociation constant that is less than 1 log-order worse than the wt (1 logKd = 5.74 kJ/mol). Only Tyr, Arg, Lys, Glu, Pro, Asp, and Trp are truly disruptive (logKd more than 2 units higher than the wt).

Fig. 3. Energy contributions calculated for different side-chains placed at P2 in the pAla-A2 complex. White bars, desolvation energies; light gray bars, strain terms; dark gray bars, sum of intracomplex interactions. Total energies are indicated by tiny black bars; they correspond to the scores in column 3 of Table I. Inset: detail of Gln interactions with the A2 receptor; 1, first group (aliphatic moiety); 2, second group (amide function); V, van der Waals; H, H-bonds; E, electrostatic energy. Numeric values are computed interaction energies in kcal/mol units.

Our predictions follow the observed trends in that our top 5 best ranked peptides (Leu, Met, Val, Ile, Gln) exactly coincide with the experimental top 5. The moderately binding Gly, Ser, and Phe mutants are underpredicted by approximately 1 logKd in magnitude. The opposite holds for the disruptive residues Lys, Pro, and Trp, although the predicted values are certainly not suggestive of favorable interaction. All other predictions are correct to within about 0.5 logKd. The overall standard deviation of the difference between predicted and experimental binding energy is 3.9 kJ/mol (≈2/3 logKd).
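The quoted spread can be recomputed from the Error column for A2/P2 in Table I, as in the Python sketch below. Using the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not name the estimator; the conversion factor of 5.74 kJ/mol per log-unit of Kd is the one given in the text.

```python
from statistics import stdev

# Error column for A2, position P2 (kJ/mol), in Table I order
errors_a2_p2 = [-2.9, -2.5, 1.3, -0.5, 1.0, 3.0, 1.0, 7.0, 4.1, 2.6,
                5.2, 3.7, 0.7, 1.6, 0.2, -6.2, -2.5, -6.9, -3.4, -6.7]

KJ_PER_LOG_KD = 5.74  # kJ/mol per log-unit of Kd, as quoted in the text

sigma_kj = stdev(errors_a2_p2)         # sample standard deviation, kJ/mol
sigma_log = sigma_kj / KJ_PER_LOG_KD   # same spread expressed in log-units of Kd
```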

The same theoretical and experimental analysis was performed on the second anchor position P9. Here again, the best peptides contained the well-known motif residues Val, Leu, or Met.32,58 More surprising was the observation that Ala, Ile, Thr, Phe, Ser, and Csh substitutions caused a reduction in affinity compared to wt of less than ≈0.5 logKd. Especially Phe and Ser are usually considered prohibitive. Further, Gly and Trp are also presumed to be disruptive, but they do fall within the 1 logKd range from wt. Truly prohibited at P9 (wt + 2 logKd) are only the charged residues Lys, Glu, Asp, and Arg.

Our P9 predictions are generally of the same or slightly better quality than for P2. Only the value for Lys was predicted with an error larger than 1 logKd (in the false-positive sense). A similar yet smaller error was observed for Arg. (Charged residues, especially Lys and Arg, showed some general false-positive tendency, which will be discussed.) Underpredicted (i.e., in the false-negative sense) are Trp and Phe, which experienced a relatively high strain in the models (12.1 and 5.6 kcal/mol, equivalent to 29.5 and 13.7 kJ/mol, respectively; the models were probably not fully relaxed). All other residue types were predicted with an error less than ≈0.5 logKd. The global standard error was 3.1 kJ/mol. Interestingly, many features that have not been recognized before were correctly predicted: (1) Val is not the best possible residue at P9; (2) Met, Leu, Ala, and Ile are almost equally preferred; and (3) 11 other residue types (50%) bind with lower but nonprohibitive strength. Analysis of the structural data showed that all but the aromatic side-chains can be accommodated free of strain into a medium-size, mainly hydrophobic pocket (the F pocket35). In the case of Phe and Trp, the computed strain was probably even overestimated. Moreover, Csh, Ser, Thr, Asn, and Gln can form 1 ideal H-bond. Though not enough for strong binding, this evidently helps to broaden the specificity profile. Our results are supported by the systematic survey of Rudolf et al.,60 who checked the A2 affinity of all possible 9-mers derived from HPV-18 E6 and E7 proteins. Fifteen out of the 247 peptides (6%) had a Kd below 1 µM, but only 4 had Val at P9, 4 others had Leu, 1 Ile, and 6 had a nonstandard anchor residue. In conclusion, the specificity profile at the anchor residues in A2 is well represented by our predictions and much more permissive than so far assumed.

TABLE I. Predicted Versus Experimental Affinities of FLSKQYMTL Mutantsᵃ

A2, P2                         | A2, P9                         | A24, P2                        | A24, P9
AA  ΔGexp  Score   ΔGpr  Error | AA  ΔGexp  Score   ΔGpr  Error | AA  ΔGexp  Score   ΔGpr  Error | AA  ΔGexp  Score   ΔGpr  Error
L   −42.4   −4.8  −45.3   −2.9 | M   −42.9   −3.3  −40.7    2.2 | W   −48.8   −6.2  −48.9   −0.1 | F   −46.2   −4.2  −40.7    5.5
M   −41.8   −4.4  −44.3   −2.5 | V   −42.4   −4.2  −42.0    0.4 | F   −46.9   −4.5  −45.5    1.4 | W   −46.1   −4.5  −41.8    4.3
I   −41.5   −2.7  −40.2    1.3 | L   −42.4   −3.3  −40.8    1.6 | Y   −46.4   −5.3  −47.1   −0.7 | I   −43.1   −4.2  −40.7    2.4
V   −40.8   −3.2  −41.3   −0.5 | A   −42.0   −2.0  −39.0    3.0 | M   −46.0   −4.0  −44.5    1.5 | L   −41.4   −4.0  −40.0    1.5
Q   −40.4   −2.4  −39.4    1.0 | I   −41.6   −2.5  −39.6    2.0 | Q   −44.9   −1.2  −38.7    6.2 | M   −38.4   −4.1  −40.3   −1.9
T   −39.9   −1.4  −36.9    3.0 | T   −40.5   −4.4  −42.4   −1.9 | A   −43.8   −1.7  −39.6    4.2 | V   −34.3   −2.3  −33.8    0.5
A   −39.7   −2.1  −38.7    1.0 | F   −40.5    0.1  −35.9    4.5 | G   −41.5    0.1  −36.0    5.5 | Y   −34.1   −1.6  −31.2    2.9
G   −39.5    0.5  −32.5    7.0 | S   −39.0   −1.8  −38.6    0.4 | L   −41.4   −3.9  −44.3   −2.8 | H   −32.5    1.0  −21.7   10.8
S   −38.9   −0.5  −34.8    4.1 | C   −38.9   −0.6  −37.0    1.9 | I   −41.2   −3.4  −43.1   −1.9 | C   −29.1   −1.1  −29.5   −0.4
C   −37.3   −0.5  −34.7    2.6 | G   −37.1   −0.3  −36.6    0.6 | V   −40.3   −3.1  −42.6   −2.3 | A   −28.6   −1.6  −31.4   −2.8
F   −36.8    0.8  −31.6    5.2 | W   −36.8    3.0  −31.8    5.0 | T   −39.7   −1.2  −38.6    1.0 | G   −26.4    0.3  −24.1    2.2
N   −35.9    0.5  −32.3    3.7 | Q   −36.0   −1.5  −38.2   −2.2 | S   −38.5   −0.3  −36.8    1.7 | Q   −20.3    0.7  −23.0   −2.6
H   −34.0    0.1  −33.3    0.7 | P   −35.4   −0.3  −36.5   −1.1 | H   −36.5    0.1  −35.9    0.6 | N   −20.2    3.0  −14.5    5.7
Y   −29.9    2.2  −28.3    1.6 | H   −35.2    0.4  −35.6   −0.4 | C   −34.8   −0.7  −37.7   −2.9 | T   −20.1   −0.9  −28.7   −8.6
R   −27.8    2.4  −27.6    0.2 | N   −34.3    0.3  −35.7   −1.4 | N   −34.4    1.8  −32.4    2.1 | S   −20.1   −0.7  −28.0   −7.9
K   −27.2    0.1  −33.4   −6.2 | Y   −34.0    3.0  −31.8    2.2 | R   −33.7    1.0  −34.1   −0.4 | K   −20.0   −0.4  −26.7   −6.7
E   −26.7    1.8  −29.2   −2.5 | K   −30.5   −1.6  −38.3   −7.8 | P   −31.8    0.3  −35.6   −3.8 | R   −19.9    0.5  −23.6   −3.7
P   −26.3    0.2  −33.1   −6.9 | E   −30.2    3.0  −31.8   −1.6 | E   −31.4    1.8  −32.3   −0.9 | P   −19.9    1.3  −20.8   −0.9
D   −24.1    2.5  −27.4   −3.4 | D   −30.0    3.0  −31.8   −1.8 | K   −30.7    0.0  −36.1   −5.4 | E   −19.2    0.4  −23.9   −4.7
W   −23.2    1.5  −29.9   −6.7 | R   −28.3    1.5  −34.0   −5.7 | D   −30.6    1.3  −33.4   −2.9 | D   −18.9    3.0  −14.5    4.5
σᵇ: 3.9                        | σ: 3.1                         | σ: 3.1                         | σ: 5.0

ᵃ AA, amino acid placed at the indicated anchor position (P2/P9) of the peptide FLSKQYMTL in the indicated complex (A2/A24); ΔGexp, experimental binding free energy in kJ/mol; Score, uncalibrated affinity score in kcal/mol; ΔGpr, calibrated affinity in kJ/mol [Eq. (9)]; Error, difference between ΔGexp and ΔGpr. Values < −3 kJ/mol are given in bold; values > 3 kJ/mol are underlined.
ᵇ σ, standard deviation of Error values.

Binding Specificity of A24

HLA-A*2402 (A24) is another abundant MHC class I receptor with a hydrophobicity similar to that of A2 but a significantly different (presumed) specificity profile. The anchor positions also appear at positions P2 and P9 but the consensus motif is P2(Y/F),P9(L/F/I/M)58 or P2(Y/F),P9(F/W/I/L).32 Both specificity pockets are considerably larger than in A2. For the pocket at P2, this is mainly due to the Phe→Ser mutation in the receptor at residue A9, while the Leu→Ala mutation at A81 can be held responsible for the larger F-pocket at the peptide C-terminus. The Asn at A77 (instead of Asp) also tends to form an H-bond with Tyr-A116, which rigidifies part of the pocket wall.

Our predictions suggested two interesting possibilities, namely, (1) that there is an overall similarity between the anchor preferences of A2 and A24, and (2) that Trp is the most preferred residue at both anchor positions. The latter is completely overlooked in the Bimas and Syfpeithi scoring matrices58,59,61 and partially in the motif analysis.32 Also, the idea that A24 would be compatible with the A2-motif, with additional preference for bulky aromatic anchors, is not known. We have therefore set up an A24 binding assay and tested the same FLSKQYMTL mutants on this receptor.

The experimental data in Table I confirmed our expectations. We found that the A2-P2 preferred residues Met, Gln, (Ala), (Gly), Leu, Ile, and Val, and the A2-P9 preferred Ile, Leu, and Met (not Val) have similar affinities for A24. Also, the polar residues show the same disruptive character. Moreover, aromatic side-chains (with the exception of Tyr at P9) are contributing very strongly to affinity, a feature that is specific to A24. Trp is the “winner” at both positions P2 and P9, which is in agreement with the predictions but not with the presumed A24 motif. It is possible that the lower intrinsic amino acid frequency of Trp can be held responsible for the gaps in motifs based on experimental data. For that matter, the same fact could also explain the often underestimated preference of Met at various positions in different receptors.

A number of deviations between theoretical and experimental data exist. The tiny side-chains Ala and Gly are underpredicted at P2 by about 1 order of magnitude in Kd. This is often seen at other positions as well. A possible explanation could be the presence of structured water molecules, which are ignored in the simulations. The underpredicted Gln might be explained in the same way. Structural analysis showed that Gln at P2 adopts the same conformation as in A2, but its van der Waals interactions are reduced by ≈1 kcal/mol (equivalent to ≈2.5 kJ/mol), due to the mutation Phe→Ser at receptor position A9. Given the constitution of the pocket, it is likely that one or more water molecules further stabilize the free Oε atom of Gln. After all, Gln is a well accepted (and commonly underestimated) residue at P2 in A24.

Position P9 in A24 is a difficult case. The pocket at P9 is very large but not well shaped to accommodate bulky side-chains. Only Trp can take full advantage of the Leu→Ala mutation at A81, but even here it suffers from a significant compensatory strain (ranging from 4.6 to 8.5 kcal/mol in the 3 models). Phe experiences less strain but the latter is also strongly fluctuating (0.7 to 5.5 kcal/mol). All in all, both Trp and Phe receive similar scores, in agreement with experiment, but both are somewhat underestimated due to incomplete relaxation in the simulations.

The P9 position in A24 is also special for another reason, namely, its enormous diversity in binding affinity. The experimental affinity range spans 27 kJ/mol or ≈5 orders of magnitude in Kd. As many as 10 of the 20 residue types (50%) cause very poor binding. The reason for this pronounced specificity is related to the weird shape of the F-pocket. Side-chains larger than Ala tend to bump into the relatively rigid wall formed by A77-Asn, A116-Tyr, and A147-Trp. This causes a residual strain of 1 kcal/mol (≈2.5 kJ/mol) or higher. More important, however, is the loss of interaction with residue A81 (Ala instead of Leu) for all side-chains that do not properly fill the pocket. Finally, the Ala mutation leaves a fully hydrophobic hole that does not harbor structured water well. The combination of these 3 effects ensures that small and/or polar side-chains are disruptive at P9 in A24. Though conceptually understood, the peculiarities of the P9 region severely complicate modeling, which explains why the predicted values fluctuate more than they usually do. Nevertheless, the global standard error remains acceptable (5.0 kJ/mol) and the top 5 predicted residues (Trp, Phe, Ile, Met, and Leu) exactly coincide with the experimental top 5.

When pooled, the predicted affinities of all peptides tested on A2 and A24 correlate with an R² of 0.77 and a standard error of 3.85 kJ/mol (0.67 logKd). It is recalled that no experimental information (apart from the calibration needed for conversion to free energy units) was used in the development of this structure-based scoring method. The method is therefore unbiased, the advantages of which show up in the discovery of previously unknown anchor preferences.

Binding Specificity of B7

We wanted to examine the performance of PepScope on HLA-B*0702 (B7), a receptor with a pronounced preference for Pro at position P2 and a P9 specificity with shared features from A2 and A24. Especially the P2 profile seemed intriguing: According to a point mutation analysis by Sidney et al.62 on the HIV nef 84-92 peptide (FPVRPQVPL), all P2 mutants were binding at least 100 times weaker than the native peptide sequence; thus, only the wt Pro would be allowed at P2.

We followed exactly the same approach as for A2 in the scoring of all possible amino acid side-chains at positions P2 and P9 in 3 B7 models. Table II shows the averaged total scores for the side-chains in a pAla-B7 context. Our values were compared to those of Sidney et al.62 and 2 frequently used scoring matrices (i.e., Bimas58 and Syfpeithi61).

PepScope indeed came up with Pro as the elected P2 residue. The gap with the second best (Ala) was about 3 kcal/mol, which is one of the largest differences we ever observed. Yet, if the Sidney profile based on a single peptide sequence were true in general, then our Ala score would still be somewhat overestimated. Analysis of the structural data indicated that the primary reason for the observed profile is the steric repulsion of all side-chains larger than Gly, Ala, Pro, and Ser. The Pro ring hydrocarbons fill the pocket in a strainless fashion and make strong van der Waals interactions (−8.1 kcal/mol), so it only has to compensate for desolvation. Ala can also bind free of strain but interacts more weakly (−3.6 kcal/mol). Val makes virtually the same van der Waals interactions as Pro (−8.2 kcal/mol) but induces 4.4 kcal/mol of strain; together with the desolvation term (3.7 kcal/mol) and the minor electrostatics (−0.3 kcal/mol), the balance becomes slightly favorable (−0.5 kcal/mol). Unfortunately, Sidney et al.62 do not provide data for the Val mutant, but the Bimas matrix does pick it up as a feasible residue. Several other side-chains can make relatively strong interactions, but they invariably pay a very high “strain” price for it (e.g., Leu). Ser, Thr, and Asn are subject to a more delicate balance between different energetic contributions: In the 3 models, they make 1–2 (Ser and Thr) or 2–3 (Asn) H-bonds, but again, this is overcompensated by induced strain. If the latter were not the case, these residue types would have even been preferred. It may be questioned whether the computed strain terms for these small residues are real, but the experimental data force an affirmative conclusion.

Position P9 has totally different features. Its profile looks similar to that of A2-P9 and A24-P9, with a strong preference for Met, Leu, Ile, Val, and Phe. The relatively strong affinity for Phe (and even Trp), in spite of the absence of the receptor A81 Leu→Ala mutation, could be explained by a tendency of the F-pocket to open up as a result of Asp A114 attracting Tyr A116 by means of an H-bond. Another striking observation is that our top-ranked residue type, Met, is indeed identified by Sidney et al.62 as the most potent one. Note that neither Bimas nor Syfpeithi has correctly appraised the capacity of Met. Also, Phe is significantly misjudged by Bimas: the coefficient <1 in the scoring table suggests it is disfavored, while it is actually one of the best. Overall, it can be concluded that our scores are in much better agreement with experimental affinities than the rankings by both Bimas and Syfpeithi.
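The agreement claimed above can be made concrete with a rank correlation. The sketch below computes Spearman coefficients between the Kd values of Sidney et al. and each of the three P9 scores in Table II, using only the 8 residues for which a numerical Kd is listed (bounded and undetermined entries are left out, which is a simplification). This analysis is not part of the original study, and the helper names are illustrative.

```python
from math import sqrt

# P9 rows of Table II with a measured Kd:
# (PepScope score in kcal/mol, Sidney Kd in nM, Bimas score, Syfpeithi score)
P9 = {
    "M": (-3.5, 3.8, 10, 6), "L": (-2.3, 13.5, 40, 10),
    "I": (-1.8, 22.4, 4, 6), "F": (-1.7, 9.5, 0.2, 6),
    "V": (-1.1, 23.8, 2, 6), "A": (-0.7, 62.3, 1, 6),
    "H": (-0.3, 238, 0.1, 0), "W": (0.0, 345, 0.2, 0),
}


def ranks(values):
    """Ascending 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


kd = [row[1] for row in P9.values()]
# Low PepScope scores mean strong binding; the matrix scores are negated so
# that a low value means strong binding for every method.
rho = {
    "PepScope": spearman([row[0] for row in P9.values()], kd),
    "Bimas": spearman([-row[2] for row in P9.values()], kd),
    "Syfpeithi": spearman([-row[3] for row in P9.values()], kd),
}
for method, value in rho.items():
    print(f"{method}: rho = {value:.2f}")
```

On this 8-residue subset the PepScope scores give the highest coefficient of the three methods. The subset is small and excludes all weak binders, so the comparison is indicative only.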

Binding Specificity of A1

Having examined the anchor specificity of 2 typically hydrophobic (A2 and A24) and 1 sterically driven receptor (B7), we decided to test PepScope on a more polar system. HLA-A*0101 (A1) is such a receptor, showing marked preference for Asp and Glu at P3 (not P2), and Tyr, Lys, Arg, and Phe at P9.32,58 Position P2 prefers small, polar side-chains like Thr and Ser, but a pronounced motif has not been identified. Hydrophobic side-chains seem to play a minor role in general. The A1 system was therefore considered an important test case for the validation of the solvent model and the scoring of H-bonds and electrostatic interactions.

Since no studies on systematic anchor substitutions are available, we have compared our scores with Bimas data (Fig. 4). The first impression is that the results from both methods are in fair agreement. There is a consensus about strong binding capacity of Thr at P2, Asp and Glu at P3, and Tyr, Lys, Phe, and Arg at P9. Both methods also agree about the prohibitive nature of Glu, His, Phe, Arg, Trp, and Lys at P2, Arg and Lys at P3, and Pro, Asp, and Glu at P9. However, some deviations are observed as well. According to our computations, Tyr is a poor yet tolerated residue at P2, while Bimas considers it to be prohibitive. The peptide IYQYMDDLY has been identified as a medium-affinity binder,63 suggesting that Tyr at P2 is at least feasible. Met and Trp at P3 and P9, respectively, are ranked higher by PepScope than by Bimas. The latter is recurrently observed for most receptor systems and is supported by direct affinity measurements (see previous sections). Finally, Thr received the fourth best score at P9 (−2.7 kcal/mol), similar to the known anchors Phe and Arg, while Bimas assigns a neutral score (1.0). We know of only one experimental example with Thr at P9, but it is quite convincing: The strongest A1-binding peptide from HPV-18 was found to be YSDSVYGDT, with a Kd of 188 nM.60

TABLE II. HLA-B7 Anchor Specificity (a)

| P2 AA | PepScope score (kcal/mol) | Sidney Kd (nM) | Bimas score (a.u.) | Syfpeithi score (a.u.) | P9 AA | PepScope score (kcal/mol) | Sidney Kd (nM) | Bimas score (a.u.) | Syfpeithi score (a.u.) |
|---|---|---|---|---|---|---|---|---|---|
| P | −4.8 | 13 | 20 | 10 | M | −3.5 | 3.8 | 10 | 6 |
| A | −1.9 | >1300 | 3 | 0 | L | −2.3 | 13.5 | 40 | 10 |
| S | −0.9 | >1300 | 1 | 0 | I | −1.8 | 22.4 | 4 | 6 |
| C | −0.6 | ND | 1 | 0 | F | −1.7 | 9.5 | 0.2 | 6 |
| V | −0.5 | ND | 5 | 0 | V | −1.1 | 23.8 | 2 | 6 |
| T | −0.5 | >1300 | 1 | 0 | A | −0.7 | 62.3 | 1 | 6 |
| G | 0.0 | >1300 | 1 | 0 | H | −0.3 | 238 | 0.1 | 0 |
| M | 0.7 | ND | 1 | 0 | G | 0.0 | ND | 0.1 | 0 |
| N | 0.8 | >1300 | 1 | 0 | W | 0.0 | 345 | 0.2 | 0 |
| I | 1.2 | ND | 1 | 0 | K | 0.4 | >380 | 0.1 | 0 |
| L | 1.2 | >1300 | 1 | 0 | Y | 0.6 | >380 | 0.2 | 0 |
| E | 1.5 | ND | 0.1 | 0 | R | 0.7 | ND | 0.1 | 0 |
| H | 1.7 | ND | 0.1 | 0 | S | 0.9 | ND | 0.2 | 0 |
| D | 2.0 | >1300 | 0.1 | 0 | T | 1.0 | >380 | 1 | 6 |
| Q | 2.0 | >1300 | 1 | 0 | Q | 1.0 | >380 | 0.1 | 0 |
| W | 2.1 | ND | 0.1 | 0 | C | 1.2 | >380 | 1 | 0 |
| Y | 2.5 | ND | 0.1 | 0 | D | 1.3 | >380 | 0.1 | 0 |
| F | 3.0 | >1300 | 0.1 | 0 | P | 1.4 | ND | 0.1 | 0 |
| R | 3.0 | ND | 0.1 | 0 | E | 2.8 | ND | 0.1 | 0 |
| K | 3.0 | >1300 | 0.1 | 0 | N | 3.0 | >380 | 0.2 | 0 |

(a) AA, amino acid placed at the indicated anchor position (P2/P9) of the peptide FPVRPQVPL in the B7 complex; PepScope score, uncalibrated affinity score; Sidney Kd, dissociation constant published by Sidney et al.62; Bimas score, prediction score58; Syfpeithi score, prediction score61; ND, not determined.

We have examined the underlying structural reasons for the observed polar preferences. A common feature for Thr/Ser at P2, Asp/Glu at P3, and Lys/Arg at P9 is their ability to bind relatively free of strain (the highest tensions were about 1.5 kcal/mol for Asp/Glu at P3 and 2.5 kcal/mol for Arg at P9). Thus, the interactions made by these residues in the complex are offset only by desolvation energy (Fig. 1). The OH-functions of both Ser and Thr at P2 can form 2 nearly ideal H-bonds (one as donor to Glu-A63 and one as acceptor from Asn-A66) for which they receive −2.4 kcal/mol. They also receive an equal (Ser) or slightly better (Thr: −2.9 kcal/mol) amount of electrostatic energy. Together with the other interactions, the balance is slightly in favor of Thr (−2.6 kcal/mol), immediately followed by Ser (−2.0 kcal/mol). The total scores for Ser and Thr are “good” though not “excellent”; another H-bond would be needed to make them preferred anchors.

Asp and Glu, in contrast, are true anchor residues at position P3. Asp can form a double bifurcated H-bond with Arg-A156 (−3.1 kcal/mol) and receives an extra −6.2 kcal/mol of electrostatic energy, mainly from the same Arg but also from the peptide backbone NH dipoles of residues P3 and P5. Glu forms 2 suboptimal H-bonds with Arg-A156 and another weak H-bond with His-A70, for which it receives a total of −2.7 and −5.2 kcal/mol in H-bonding and electrostatic energy, respectively.

Lys and Arg at P9 can both form a single H-bond with Asp-A116 (yielding −1.4 and −1.9 kcal/mol, respectively), but the electrostatics are only of moderate quality (−1.2 and −1.8 kcal/mol, respectively). However, Lys can bind in a totally relaxed way, while Arg induces 2.5 kcal/mol of strain. The van der Waals interactions largely suffice to compensate for desolvation, so that the end balance for both is relatively favorable (−4.2 and −2.6 kcal/mol, respectively). For comparison, Tyr has a total energy of −5.7 kcal/mol, a value that can rightly be associated with a strong anchor.
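As a consistency check on these numbers, the part of each total that is not accounted for by H-bonding, electrostatics, and strain can be obtained by subtraction. The sketch below assumes that the total score is a plain sum of the contributions, takes the strain of Lys as zero ("totally relaxed"), and attributes the remainder to van der Waals interactions plus desolvation. Variable and function names are illustrative.

```python
# Reported terms for Lys and Arg at position P9 of A1 (kcal/mol).
# Assumption: total = H-bond + electrostatics + strain
#                     + (van der Waals + desolvation).
TERMS = {
    "Lys": {"hbond": -1.4, "electrostatics": -1.2, "strain": 0.0, "total": -4.2},
    "Arg": {"hbond": -1.9, "electrostatics": -1.8, "strain": 2.5, "total": -2.6},
}


def vdw_plus_desolvation(t):
    """Remainder of the total after removing the three itemized terms."""
    itemized = t["hbond"] + t["electrostatics"] + t["strain"]
    return round(t["total"] - itemized, 1)


remainder = {aa: vdw_plus_desolvation(t) for aa, t in TERMS.items()}
for aa, value in remainder.items():
    print(f"{aa}: vdW + desolvation = {value:+.1f} kcal/mol")
```

Both remainders are negative (−1.6 kcal/mol for Lys and −1.4 kcal/mol for Arg), in line with the statement that the van der Waals interactions more than offset desolvation.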

DISCUSSION

Accurate modeling of the conformation of anchor residues in peptide–receptor complexes is possible provided that models are prepared with state-of-the-art methodology and that the conformational space of the anchor side-chains is thoroughly explored.64 However, the next problem to be solved, that is, to derive affinities from structural data, is far from trivial.65,66 An even greater challenge is to develop a robust, generally applicable scoring method that does not require reparameterization for different systems.

Affinity scoring algorithms face a dual problem, one structural and one dynamic. The binding of a ligand to its receptor is accompanied by partial desolvation of both molecules. From a structural point of view, this involves dramatic changes in intermolecular interactions: Ligand (and receptor) interactions with solvent disappear, solvent molecules reorganize, and the ligand forms novel contacts with its receptor. This picture is further complicated by dynamic aspects: Flexible ligands in solution assume a repertoire of conformations, liberated solvent molecules gain configurational freedom, and even stable complexes remain under thermal motion. The most rigorous affinity prediction methods are therefore molecular dynamics simulations.16 However, such methods are computationally extremely demanding and, so far, they have not been applied to HLA complexes, which contain long, flexible peptides.

At the other end of the spectrum are experimental–statistical prediction methods that typically derive rules from large data sets of peptide sequences and experimental binding data. The most successful statistical approaches are matrix-based prediction methods.58,61,67,68 As a matter of fact, they derive affinity profiles of the type described in this work, but using experimental information only. Since at least 1, but ideally 3 or more peptide sequences are needed per element of a 9 (positions) × 20 (amino acids) matrix, hundreds of synthetic peptides are required for each HLA receptor type. At present, such quantities are available only for a very limited number of types.

Fig. 4. Anchor profiles for HLA-A1. A comparison is shown between PepScope and Bimas58 predictions. Amino acid preferences for the anchor positions indicated at the left are listed in decreasing order of preference. Strong and nonpreferred residues are indicated in bold and italics, respectively. The corresponding cutoffs are −2.5 and 0 kcal/mol for PepScope, and 2 and 0.1 units for Bimas.


In between are a variety of methods combining experimental and theoretical information. Structural–statistical methods19–21 extract interatomic contact or distance preferences from known tertiary structures. These preferences are statistically processed and represented by a scoring function. Partitioning methods22–27 basically do the same, but they apply more sophisticated mathematical relationships to represent the physical determinants of stability and affinity. Since both approaches are, in one way or another, optimized against experimental data, the selection of learning data (accuracy, size, diversity, etc.) is critical for transferability of the method. Another major problem is the coupling of different physical effects. For example, hydrogen bonds are partially electrostatic in nature, formation of local interactions reduces conformational freedom (entropy), and hydration of apolar groups is accompanied by nonenthalpic, that is, purely entropic effects (the hydrophobic effect). Fortunately, since free energy is a state function, desolvation can be decoupled from complex formation, which allows us to study them separately. But here, another important problem immediately shows up: Net binding free energies are small compared to the isolated desolvation and formation free energies. (In part, this is due to the choice of the intermediate state where the ligand resides in vacuum.) Finally, structure-based methods using static molecular representations have to face the consequences of ignoring dynamical effects.

These considerations lay at the basis of the PepScope method. Without claiming the ultimate solution, we incorporated features of different approaches while focusing primarily on transferability. We investigated the usefulness of one and the same potential energy function, here the original CHARMM function and parameters, to assess both desolvation and intracomplex energies. No attempt was made to calculate free energies for both of the individual steps but, rather, potential energies were computed [Eq. (2) for desolvation and Eqs. (3) and (4) for complex formation], and these terms were added up [Eq. (1)] and scaled linearly by fitting calculated anchor profiles to experimental data [Eq. (7)]. This scaling does not alter the ranking of anchor residues, such ranking forming the main feature of a specificity profile and the primary aim of this work.
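To illustrate the last point, the sketch below fits a straight line between uncalibrated scores and experimental binding free energies and checks that the ranking is unchanged. The form of Eq. (7) is not reproduced in this section, so a plain least-squares line ΔG ≈ a·score + b is assumed here. The data are the 8 P9 entries of Table II that list a numerical Kd, a temperature of 298.15 K is assumed, and all function names are illustrative.

```python
from math import log

R_KCAL = 1.987e-3   # gas constant, kcal/(mol K)
T = 298.15          # assumed temperature, K

# (PepScope score in kcal/mol, Kd in nM) for B7-P9 residues M, L, I, F, V, A, H, W
DATA = [(-3.5, 3.8), (-2.3, 13.5), (-1.8, 22.4), (-1.7, 9.5),
        (-1.1, 23.8), (-0.7, 62.3), (-0.3, 238.0), (0.0, 345.0)]


def kd_to_dg(kd_nm):
    """Binding free energy (kcal/mol) from a dissociation constant in nM."""
    return R_KCAL * T * log(kd_nm * 1e-9)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def order(values):
    """Indices of the values sorted from most to least favorable."""
    return sorted(range(len(values)), key=lambda i: values[i])


scores = [s for s, _ in DATA]
dg_exp = [kd_to_dg(kd) for _, kd in DATA]
slope, intercept = fit_line(scores, dg_exp)
calibrated = [slope * s + intercept for s in scores]

print(f"slope = {slope:.2f}, intercept = {intercept:.2f} kcal/mol")
print("ranking preserved:", order(scores) == order(calibrated))
```

Because the fitted slope is positive, the calibrated values sort in the same order as the raw scores; only the energy scale changes.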

Still, the slope of potential versus free energy correlation plots does offer interesting information: For side-chain hydration energies (Fig. 2), it was 0.48, and for the 4 affinity profiles in Table I, it was 0.57 ± 0.22. This suggests that a large portion of ligand–solvent as well as ligand–receptor interaction energy is “somehow” compensated. For ligand–solvent systems, there is direct experimental evidence for this: Ooi et al.42 not only published hydration free energies (Fig. 2) but also enthalpies (Table 4 of ref. 42); expressing free energy as a function of enthalpy yields a correlation curve with R2 = 0.92 and a slope of 0.71. Thus, about 30% of hydration enthalpy is entropy-compensated. Moreover, the high correlation coefficient indicates that, at least for side-chain hydration energies, free energy can be well approximated as a linear function of enthalpy and, in a modeling context, by force field-based potential energy. In addition, the same idea formed the basis of a widespread hypothesis in FEP methods: the so-called “linear response assumption” of Åqvist et al.69 states that hydration free energy scales linearly with solute–solvent interactions, which allows avoidance of explicit sampling of responsive effects. Interestingly, the authors found an optimal coefficient of 1/2 (i.e., virtually the same as the slope of the curve in Fig. 2).

Some structure-based prediction methods rely on experimental data to assess the desolvation part of their affinity scoring function, while intracomplex interactions are calculated from a potential energy function.53,54 We reasoned that this may not be the optimal approach given the different nature of both data sources. In order to preserve maximal consistency between desolvation and complex formation, a very similar protocol and an identical force field function were used in both cases. In view of the simplicity of the solute–solvent protocol, it was rather surprising that very few systematic errors were observed in the validation experiments. Only one significant type of error was detected, namely, an overestimation of the affinity of aromatic groups, including the guanidinium moiety of Arg. Inspection of the CHARMM van der Waals parameters suggested that these might have been overestimated. We therefore assigned a weight coefficient to the corresponding Evdw terms in Eq. (3), the value of which was optimized using A2 data only. Since we wanted to avoid parameterization on principle, only a single coefficient was used. The value of 0.8 a posteriori seems quite satisfactory given the results for the aromatic-preferring A24-P2 and -P9 anchors and the (moderate) Arg preference of position P9 in A1. Somewhat higher accuracy could have been obtained using a slightly greater coefficient for true aromatic groups (Phe, Tyr, Trp, His) and a slightly smaller coefficient for the guanidinium group (Arg), but such adjustment was not performed in this work. In general, affinity scores for charged groups also showed some tendency toward the false-positive side, which could be due to underestimation of the corresponding GSPs (by 1–2 kcal/mol; i.e., about 5–10% of the total desolvation energy of charged side-chains).

The PepScope method generates multiple representations for all studied systems. In the water box, 100 independent energy minimizations are performed for each side-chain rotamer. In the evaluation of intracomplex energies, side-chain rotamers are minimized in 3 different template contexts. The systematic rotameric search performed for each side-chain type, both in the water box and in the complexes, presents another level of conformational diversity. In a sense, this multiple representation or “ensemble” approach can be envisaged as a way to sample conformational space in a nondynamic environment. Yet the ensemble approach in combination with energy minimization also serves another purpose, namely, to search for global energy optima starting from multiple suboptimal states, which is a typical search problem. Inspection of the energy refinement paths, in particular of side-chains with H-bonding capacity (e.g., Gln at P2 in A2; see Fig. 3), showed that the search problem is more prominent than the sampling problem. The rotameric searches and the usage of multiple models should therefore be seen as a practical way to identify near-optimal structures rather than an attempt to sample conformational space.
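The search aspect described above, starting from many suboptimal states and keeping the best optimum found, can be sketched as a generic multi-start search. The example is a toy model and not the PepScope protocol: the "minimization" is a stand-in that merely lowers a synthetic starting energy, the energies are random numbers, and every name in the code is hypothetical.

```python
import random


def minimize_from(start_energy, rng):
    """Toy stand-in for an energy minimization: returns a local optimum
    that is never higher than the starting energy (kcal/mol)."""
    return start_energy - abs(rng.gauss(0.0, 1.0))


def best_of(n_starts, rng):
    """Running best (lowest) minimized energy over n_starts starting states."""
    best, trace = float("inf"), []
    for _ in range(n_starts):
        start = rng.uniform(-1.0, 5.0)   # synthetic suboptimal starting state
        best = min(best, minimize_from(start, rng))
        trace.append(best)
    return trace


trace = best_of(100, random.Random(0))
print(f"best after 10 starts:  {trace[9]:.2f} kcal/mol")
print(f"best after 100 starts: {trace[-1]:.2f} kcal/mol")
```

The running best can only decrease as more starting states are tried, which is why a larger set of rotamers, minimization runs, and templates raises the chance of locating a near-optimal structure.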

The establishment of the modeling protocols was guided mainly by experience and common sense. It was completely decoupled from the other part of the PepScope method, that is, the scoring protocol. The latter actually resulted from an endeavor to use an integral force field function, without reparameterization, to derive affinity scores. As discussed, only van der Waals contributions of aromatic groups had to be adjusted. Thus, we largely succeeded in our attempt, especially since no adjustment at all was needed for HLA receptors with divergent physicochemical properties (A24, B7, and A1).

On reflection, the implications of this study are far-reaching: in general, the net ligand binding free energy appears to be a delicate balance between very large components, namely, free energy of desolvation and free energy of binding. When decomposing the latter into direct ligand–receptor interactions and induced strain, there are 2 positive and 1 negative contribution. Figure 3 illustrates their mutual proportions, computed for all amino acid side-chains at position P2 in HLA-A2. Given the fair correlation with experiment (Tables I and II), the individual terms are expected to be meaningful. From a structural bioinformatics perspective, these findings are startling. The net binding energy of an “average” side-chain in a “typical” receptor pocket amounts at best to about 25%, but mostly to around 10% or less, of the corresponding intracomplex interaction terms. This means that a 10% error in any of the independent contributions is likely to disturb the correlation with experiment. For example, failure to simulate a single, relevant H-bond (or the generation of a false H-bond) distorts the computed interaction energy by 1.5 kcal/mol in H-bonding and roughly an equal amount in electrostatic energy; this is equivalent to more than 1 log-order in Kd.
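The size of this effect can be estimated with a short calculation. The sketch assumes that the combined error of about 3 kcal/mol in potential energy is first scaled by the mean slope of 0.57 reported for the Table I profiles and then converted to a shift in log10(Kd) at an assumed temperature of 298.15 K. The text does not spell out this route, so it should be read as one plausible reconstruction; the names are illustrative.

```python
from math import log

R_KCAL = 1.987e-3   # gas constant, kcal/(mol K)
T = 298.15          # assumed temperature, K


def log10_kd_shift(potential_energy_error, slope=0.57):
    """Shift in log10(Kd) caused by an error in potential energy (kcal/mol).

    The error is scaled to a free energy with the given slope and then
    divided by RT ln(10), the free energy equivalent of one log unit in Kd.
    """
    return slope * potential_energy_error / (R_KCAL * T * log(10))


hbond_error = 1.5          # kcal/mol, from the text
electrostatic_error = 1.5  # "roughly an equal amount"
shift = log10_kd_shift(hbond_error + electrostatic_error)
print(f"shift in log10(Kd): {shift:.2f}")
```

Under these assumptions the shift comes out somewhat above 1 log unit, consistent with the statement above.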

This situation provides a possible explanation for the lack of transferability of many scoring functions. A simple train of thought makes this clear. Suppose, for example, that the hydration free energies from Ooi et al.42 had been combined as such with the intracomplex potential energies. Since the ratio of free to potential energy is roughly 1/2 (Fig. 2 and Åqvist and Marelius18), the difference in dehydration energy between polar and nonpolar residues would also be halved. When parameterizing a typical scoring function of the partitioning type against, for example, hydrophobic systems, this would logically lead to a suppression of polar interactions. Examples of significant differences in weight coefficients of polar and nonpolar contributions resulting from optimization on systems of different hydrophobicity are known,18,29 but systematic studies on the origins of (non)transferability are scarce. The PepScope approach could form a valuable reference in future studies on this important matter.

The validation part of this study was focused on HLA anchor residues. The latter form the primary contribution to peptide binding affinity36 and are therefore dominant determinants of specificity. Secondary anchors have also been identified.70,71 A wide variety of methods, specifically designed for affinity prediction of peptide-MHC complexes, have been published (see reviews72,73). Initial prediction methods were mainly based on motifs (i.e., specific combinations of amino acids in peptide sequences). Later methods utilized more refined motifs or the quantitative version known as scoring matrices or profiles.58,61,67,68 A bottleneck in deriving profiles is the lack of experimental data for many HLA subtypes. Therefore, structure-based prediction methods are expected to fill the gap.26,38,74–76 Although many successful applications were reported, the prediction accuracy of both experimental and structure-based approaches seems to have become stagnant of late (e.g., see the MHC class II benchmark results at http://www.imtech.res.in/raghava/mhcbench/result.html). Some possible reasons have been addressed in this work, but probably the main source of error is that all data sets used to parameterize scoring functions (or matrices) are heavily biased toward the presence of classical sequence motifs. There is a high risk for such types of bias to distort the scoring parameters, leading to newly selected sequences that are characterized by the same bias. Although the value of classical motifs is not questioned, there are 2 cases wherein motif-deficient peptides can still attain high affinity. First, we have shown that residue types Trp and Met are generally undervalued in common profiles. A plausible explanation is their low frequency in natural proteins. Second, our experiments (and predictions) indicate that many anchor substitutions are indeed suboptimal, but not to the extent that they would be disruptive. This implies that the loss in affinity can often be compensated by the contribution of nonanchor residues. It also implies that many HLA binding peptides, including immunologically active T-cell epitopes, may have been missed in the past by selection procedures based on traditional motifs or profiles.

Exactly the same conclusion was drawn from a recent study on the impact of anchor substitutions.77 The authors selected a high-affinity A2-binding peptide (ILDPFPVTV) from the results of a QSAR-based design experiment, synthesized all possible monosubstituted variants of the P2 and P9 anchors, and determined the binding capacities using a stability assay. In spite of the experimental differences, their results were in very good agreement with ours. The authors concluded that “one does not require traditional anchors if the rest of the peptide is sufficiently optimized” and also that “the relative importance of the anchor residues should thus be rethought.” It is our personal conviction that this goal can best be achieved by a joint experimental and theoretical effort, paying special attention to problems related to data selection, prediction accuracy, and especially transferability.

CONCLUSIONS

The binding specificity of peptide–HLA complexes is strongly determined by peptide anchor positions. Yet the anchor preferences for most HLA subtypes remain poorly characterized. We showed that many profiles are skewed in the sense that they contain false-positive and, especially, false-negative information. The binding capacities of Trp and Met are generally undervalued. Small, polar side-chains (e.g., Ser, Thr) are usually assigned neutral values in Bimas and Syfpeithi scoring matrices, while in reality their binding strength is highly variable: relatively strong at A2-P2, A2-P9, A1-P2, and A1-P9, but prohibitive at A24-P9, B7-P2, and B7-P9. In general, all current binding motifs and profiles have difficulties in correctly appreciating anchor residues with intermediate binding capacity. Another conclusion is that true binding profiles are usually broader (i.e., less specific) than assumed thus far.

The PepScope method uses only structural data and a standard force field function to score anchor residues in HLA complexes. This approach offers several advantages: (1) Its independence from experimental binding data guarantees unbiased analysis with a greater sensitivity; (2) the method is equally well applicable in cases where experimental information is scarce; and (3) computed affinities can be thoroughly rationalized either by dissection into physical contributions or by structural inspection. Furthermore, since the method does not contain a training step, it is characterized by great transferability. We have demonstrated this by validation on 4 HLA receptors with divergent physical properties.

We have developed an original approach to quantify desolvation effects. This was accomplished by in silico submersion of amino acid model compounds in explicit water, followed by standard energy minimization and proper rotameric sampling, energy calculation, averaging, and selection. Combination of solvent terms with intracomplex energies resulted in scores almost devoid of systematic errors. This confirmed our postulate that desolvation energies should be compatible with intracomplex terms and are therefore preferably derived from the same energy function.

The PepScope scoring function basically consists of three energetic components: desolvation, direct ligand–receptor interactions, and intracomplex strain. These terms are strongly affected by local conditions in a complex, the latter forming the basis of specificity. In order for a residue to be contributive, its local interactions have to compensate for unfavorable desolvation and strain. The net balance can be very delicate, especially when aromatic or polar side-chains are involved. From a modeling point of view, this imposes very high demands on the accuracy of complex models. Here, we have demonstrated that such a level of accuracy is attainable for buried anchors. Scoring of nonanchor residues or full peptide sequences obviously requires extension of the methodology. However, the latter goes beyond the scope of this work. The PepScope method should therefore be seen as a first yet important step toward affinity scoring of complete ligands.

ACKNOWLEDGMENTS

Prof. Pierre van der Bruggen, UCL, Brussels, is acknowledged for his kind gift of B-cells. We thank Prof. Kris Thielemans, VUB, Brussels, for his vivid interest in our work and its possible applications in vaccine development.

REFERENCES

1. Rosenfeld R, Zheng Q, Vajda S, DeLisi C. Flexible docking of peptides to class I major-histocompatibility-complex receptors. Genet Anal 1995;12:1–21.

2. Desmet J, Wilson IA, Joniau M, De Maeyer M, Lasters I. Computation of the binding of fully flexible peptides to proteins with flexible side chains. FASEB J 1997;11:164–172.

3. Dunbrack RL, Karplus M. Conformational analysis of the backbone-dependent rotamer preferences of protein side-chains. Nat Struct Biol 1994;1:335–340.

4. De Maeyer M, Desmet J, Lasters I. All in one: a highly detailed rotamer library improves both accuracy and speed in the modelling of sidechains by dead-end elimination. Fold Des 1997;2:53–66.

5. Schrauber H, Eisenhaber F, Argos P. Rotamers: to be or not to be?: an analysis of amino acid side-chain conformations in globular proteins. J Mol Biol 1993;230:592–612.

6. Mendes J, Baptista AM, Carrondo MA, Soares CM. Improved modeling of side-chains in proteins with rotamer-based methods: a flexible rotamer model. Proteins 1999;37:530–543.

7. Madden DR, Garboczi DN, Wiley DC. The antigenic identity of peptide-MHC complexes: a comparison of the conformations of five viral peptides presented by HLA-A2. Cell 1993;75:693–708.

8. Chen Y, Sidney J, Southwood S, Cox AL, Sakaguchi K, Henderson RA, Appella E, Hunt DF, Sette A, Engelhard VH. Naturally processed peptides longer than nine amino acid residues bind to the class I MHC molecule HLA-A2.1 with high affinity and in different conformations. J Immunol 1994;152:2874–2881.

9. Batalia MA, Collins EJ. Peptide binding by class I and class II MHC molecules. Biopolymers 1997;43:281–302.

10. Persson K, Schneider G. Three-dimensional structures of MHC class I-peptide complexes: implications for peptide recognition. Arch Immunol Ther Exp 2000;48:135–142.

11. Guo HC, Jardetzky TS, Garrett TP, Lane WS, Strominger JL, Wiley DC. Different length peptides bind to HLA-Aw68 similarly at their ends but bulge out in the middle. Nature 1992;360:364–366.

12. Stern LJ, Wiley DC. Antigenic peptide binding by class I and class II histocompatibility proteins. Structure 1994;2:245–251.

13. Zhang W, Young ACM, Imarai M, Nathenson SG, Sacchettini JC. Crystal structure of the major histocompatibility complex class I H-2Kb molecule containing a single viral peptide: implications for peptide binding and T-cell receptor recognition. Proc Natl Acad Sci USA 1992;89:8403–8407.

14. Smith KJ, Reid SW, Harlos K, McMichael AJ, Stuart DI, Bell JI, Jones EY. Bound water structure and polymorphic amino acids act together to allow the binding of different peptides to MHC class I HLA-B53. Immunity 1996;4:215–228.

15. Levy Y, Onuchic JN. Water and proteins: a love–hate relationship. Proc Natl Acad Sci USA 2004;101:3325–3326.

16. Reddy MR, Erion MD. Free energy calculations in rational drug design. New York: Kluwer Academic/Plenum Press; 2001.

17. Pearlman DA. Free energy calculations: methods for estimating ligand binding affinities. In: Reddy MR, Erion MD, editors. Free energy calculations in rational drug design. New York: Kluwer Academic/Plenum Press; 2001. p 9–35.

18. Åqvist J, Marelius J. The linear interaction energy method for computation of ligand binding affinities. In: Reddy MR, Erion MD, editors. Free energy calculations in rational drug design. New York: Kluwer Academic/Plenum Press; 2001. p 171–194.

19. Gilis D, Rooman M. Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials. J Mol Biol 1996;257:1112–1126.

20. Sippl MJ, Ortner M, Jaritz M, Lackner P, Flockner H. Helmholtz free energies of atom pair interactions in proteins. Fold Des 1996;1:289–298.

21. Jernigan RL, Bahar I. Structure-derived potentials and protein simulations. Curr Opin Struct Biol 1996;6:195–209.

22. Bohm HJ. The development of a simple empirical scoring function to estimate the binding constant for a protein–ligand complex of known three-dimensional structure. J Comput Aided Mol Des 1994;8:243–256.


23. Weng Z, Vajda S, DeLisi C. Prediction of complexes using empiri-cal free energy functions. Protein Sci 1996;5:614–626.

24. Vajda S, Sippl M, Novotny J. Empirical potentials and functionsfor protein folding and binding. Curr Opin Struct Biol 1997;7:222–228.

25. Wang R, Lai L, Wang S. Further development and validation ofempirical scoring functions for structure-based binding affinityprediction. J Comput Aided Mol Des 2002;16:11–26.

26. Doytchinova IA, Flower DR. Physicochemical explanation of pep-tide binding to HLA-A*0201 major histocompatibility complex: athree-dimensional quantitative structure-activity relationshipstudy. Proteins 2002;48:505–518.

27. Guerois R, Nielsen JE, Serrano L. Predicting changes in thestability of proteins and protein complexes: a study of more than1000 mutations. J Mol Biol 2002;320:369–387.

28. Froloff N, Windemuth A, Honig B. On the calculation of bindingfree energies using continuum methods: application to MHC classI protein–peptide interactions. Protein Sci 1997;6:1293–1301.

29. Rognan D, Lauemøller SL, Holm A, Buus S, Tschinke V. Predict-ing binding affinities of protein ligands from three-dimensionalmodels: application to peptide binding to class I major histocompat-ibility proteins. J Med Chem 1999;42:4650–4658.

30. Schapira M, Totrov M, Abagyan R. Prediction of the bindingenergy for small molecules, peptides and proteins. J Mol Recogn1999;12:177–190.

31. Engelhard VH. Structure of peptides associated with MHC class Imolecules. Curr Opin Immunol 1994;6:13–23.

32. Marsh SGE, Parham P, Barber LD. The HLA factsbook. SanDiego: Academic Press; 2000.

33. Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, StromingerJL, Wiley DC. The foreign antigen binding site and T cellrecognition regions of class I histocompatibility antigens. Nature1987;329:512–528.

34. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, WeissigH, Shindyalov IN, Bourne PE. The Protein Data Bank. NucleicAcids Res 2000;28:235–242.

35. Saper MA, Bjorkman PJ, Wiley DC. Refined structure of thehuman histocompatibility antigen HLA-A2 at 2.6 Å. J Mol Biol1991;219:277–319.

36. Ruppert J, Sidney J, Celis E, Kubo RT, Grey HM, Sette A.Prominent role of secondary anchor residues in peptide binding toHLA-A2.1 molecules. Cell 1993;74:929–937.

37. Matsumura M, Fremont DH, Peterson PA, Wilson IA. Emergingprinciples for the recognition of peptide antigens by MHC class Imolecules. Science 1992;257:927–934.

38. Ogata K, Jaramillo A, Cohen W, Briand JP, Connan F, Choppin J,Muller S, Wodak SJ. Automatic sequence design of major histocom-patibility complex class I binding peptides impairing CD8 T cellrecognition. J Biol Chem 2003;278:1281–1290.

39. Gordon DB, Marshall SA, Mayo SL. Energy functions for protein design. Curr Opin Struct Biol 1999;9:509–513.

40. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. CHARMM: a program for macromolecular energy minimization and dynamics calculations. J Comput Chem 1983;4:187–217.

41. Wolfenden R, Andersson L, Cullis PM, Southgate CC. Affinities of amino acid side chains for solvent water. Biochemistry 1981;20:849–855.

42. Ooi T, Oobatake M, Nemethy G, Scheraga HA. Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides. Proc Natl Acad Sci USA 1987;84:3086–3090.

43. Wimley WC, Creamer TP, White SH. Solvation energies of amino acid side chains and backbone in a family of host–guest pentapeptides. Biochemistry 1996;35:5109–5124.

44. Del Guercio MF, Sidney J, Hermanson G, Perez C, Grey HM, Kubo RT, Sette A. Binding of a peptide antigen to multiple HLA alleles allows definition of an A2-like supertype. J Immunol 1995;154:685–693.

45. Desmet J, Spriet J, Lasters I. Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization. Proteins 2002;48:31–43.

46. Desmet J, De Maeyer M, Spriet J, Lasters I. Flexible docking of peptide ligands to proteins. In: Webster D, editor. Methods in molecular biology: Vol. 143. Protein structure prediction: methods and protocols. Totowa, NJ: Humana Press; 2000. p 359–376.

47. Eisenberg D, McLachlan AD. Solvation energy in protein folding and binding. Nature 1986;319:199–203.

48. Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins 1999;35:133–152.

49. Warshel A, Levitt M. Theoretical studies of enzymatic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J Mol Biol 1976;103:227–249.

50. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J Chem Phys 1983;79:926–935.

51. van der Burg SH, Ras E, Drijfhout JW, Benckhuijsen WE, Bremers AJ, Melief CJM, Kast WM. An HLA class I peptide-binding assay based on competition for binding to class I molecules on intact human B cells: identification of conserved HIV-1 polymerase peptides binding to HLA-A*0301. Hum Immunol 1995;44:189–198.

52. Kessler JH, Mommaas B, Mutis T, Huijbers I, Vissers D, Benckhuijsen WE, Schreuder GM, Offringa R, Goulmy E, Melief CJ, van der Burg SH, Drijfhout JW. Competition-based cellular peptide binding assays for 13 prevalent HLA class I alleles using fluorescein-labeled synthetic peptides. Hum Immunol 2003;64:245–255.

53. Vajda S, Weng Z, Rosenfeld R, DeLisi C. Effect of conformational flexibility and solvation on receptor–ligand binding free energies. Biochemistry 1994;33:13977–13988.

54. Wernisch L, Hery S, Wodak SJ. Automatic protein design with all atom force-fields by exact and heuristic optimization. J Mol Biol 2000;301:713–736.

55. Das B, Meirovitch H. Solvation parameters for predicting the structure of surface loops in proteins: transferability and entropic effects. Proteins 2003;51:470–483.

56. Falk K, Rotzschke O, Stevanovic S, Jung G, Rammensee HG. Allele specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature 1991;351:290–296.

57. Rotzschke O, Falk K, Stevanovic S, Jung G, Rammensee HG. Peptide motifs of closely related HLA class I molecules encompass substantial differences. Eur J Immunol 1992;22:2453–2456.

58. Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol 1994;152:163–175.

59. Parker KC, Shields M, DiBrino M, Brooks A, Coligan JE. Peptide binding to MHC class I molecules: implications for antigenic peptide prediction. Immunol Res 1995;14:34–57.

60. Rudolf MP, Man S, Melief CJM, Sette A, Kast WM. Human T-cell responses to HLA-A-restricted high binding affinity peptides of human papillomavirus type 18 proteins E6 and E7. Clin Cancer Res 2001;7(Suppl 3):788s–795s.

61. Rammensee H-G, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 1999;50:213–219.

62. Sidney J, Southwood S, del Guercio MF, Grey HM, Chesnut RW, Kubo RT, Sette A. Specificity and degeneracy in peptide binding to HLA-B7-like class I molecules. J Immunol 1996;157:3480–3490.

63. Sette A, Kubo RT, Sidney J, Celis E, Grey HM, Southwood S. HLA-binding peptides and their uses. 1999;WO 99/45954.

64. Ota N, Agard DA. Binding mode prediction for a flexible ligand in a flexible pocket using multi-conformation simulated annealing pseudo crystallographic refinement. J Mol Biol 2001;314:607–617.

65. Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking: an overview of search algorithms and a guide to scoring functions. Proteins 2002;47:409–443.

66. Kroemer RT. Molecular modelling probes: docking and scoring. Biochem Soc Trans 2003;31:980–984.

67. Jung G, Fleckenstein B, von der Mulbe F, Wessels J, Niethammer D, Wiesmuller KH. From combinatorial libraries to MHC ligand motifs, T-cell superagonists and antagonists. Biologicals 2001;29:179–181.

68. Singh H, Raghava GPS. ProPred1: prediction of promiscuous MHC class-I binding sites. Bioinformatics 2003;19:1009–1014.

69. Åqvist J, Medina C, Samuelsson JE. A new method for predicting binding affinity in computer-aided drug design. Prot Eng 1994;7:358–391.

70. Kondo A, Sidney J, Southwood S, del Guercio MF, Appella E, Sakamoto H, Celis E, Grey HM, Chesnut RW, Kubo RT. Prominent roles of secondary anchor residues in peptide binding to HLA-A24 human class I molecules. J Immunol 1995;155:4307–4312.

71. Sidney J, Southwood S, Pasquetto V, Sette A. Simultaneous prediction of binding capacity for multiple molecules of the HLA B44 supertype. J Immunol 2003;171:5964–5974.

72. Yu K, Petrovsky N, Schonbach C, Koh JLY, Brusic V. Methods for prediction of peptide binding to MHC molecules: a comparative study. Mol Med 2002;8:137–148.

73. Lund O, Nielsen M, Kesmir C, Christensen JK, Lundegaard C, Worning P, Brunak S. Web-based tools for vaccine design. In: Korber BT, Brander C, Haynes BF, Koup R, Kuiken C, Moore JP, Walker BD, Watkins D, editors. HIV molecular immunology 2002. Los Alamos, NM: Theoretical Biology and Biophysics Group, Los Alamos National Laboratory; 2002. p 45–51.

74. Schueler-Furman O, Altuvia Y, Sette A, Margalit H. Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles. Protein Sci 2000;9:1838–1846.

75. Wollacott AM, Desjarlais JR. Virtual interaction profiles of proteins. J Mol Biol 2001;313:317–342.

76. Schafroth HD, Floudas CA. Predicting peptide binding to MHC pockets via molecular modeling, implicit solvation, and global optimization. Proteins 2004;54:534–556.

77. Doytchinova IA, Walshe VA, Jones NA, Gloster SE, Borrow P, Flower DR. Coupling in silico and in vitro analysis of peptide-MHC binding: a bioinformatic approach enabling prediction of superbinding peptides and anchorless epitopes. J Immunol 2004;172:7495–7502.
