DOI 10.1002/pmic.200300718. © 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. www.proteomics-journal.de
Industrial-scale proteomics: From liters of plasma to chemically synthesized proteins
Keith Rose1*, Lydie Bougueleret1*, Thierry Baussant1, Günter Böhm1, Paolo Botti1, Jacques Colinge1, Isabelle Cusin1, Hubert Gaertner1, Anne Gleizes1, Manfred Heller1, Silvia Jimenez1, Andrew Johnson1, Martin Kussmann1, Laure Menin1, Christoph Menzel1, Frederic Ranno1, Patricia Rodriguez-Tomé1, John Rogers1, Cedric Saudrais1, Matteo Villain1, Diana Wetmore1, Amos Bairoch1,2 and Denis Hochstrasser1,3
1 GeneProt, Geneva, Switzerland
2 Swiss Institute of Bioinformatics, CMU, Geneva, Switzerland
3 University Hospital of Geneva, Geneva, Switzerland
Human blood plasma is a useful source of proteins associated with both health and disease. Analysis of human blood plasma is a challenge due to the large number of peptides and proteins present and the very wide range of concentrations. In order to identify as many proteins as possible for subsequent comparative studies, we developed an industrial-scale (2.5 liter) approach involving sample pooling for the analysis of smaller proteins (Mr generally < ca. 40 000 and some fragments of very large proteins). Plasma from healthy males was depleted of abundant proteins (albumin and IgG), then smaller proteins and polypeptides were separated into 12 960 fractions by chromatographic techniques. Analysis of proteins and polypeptides was performed by mass spectrometry prior to and after enzymatic digestion. Thousands of peptide identifications were made, permitting the identification of 502 different proteins and polypeptides from a single pool, 405 of which are listed here. The numbers refer to chromatographically separable polypeptide entities present prior to digestion. Combining results from studies with other plasma pools we have identified over 700 different proteins and polypeptides in plasma. Relatively low abundance proteins such as leptin and ghrelin and peptides such as bradykinin, all invisible to two-dimensional gel technology, were clearly identified. Proteins of interest were synthesized by chemical methods for bioassays. We believe that this is the first time that the small proteins in human blood plasma have been separated and analyzed so extensively.
Keywords: Industrial-scale / Mass spectrometry / Plasma
Received 11/11/03
Revised 16/12/03
Accepted 22/12/03
Proteomics 2004, 4, 2125–2150
1 Introduction
The present article describes an industrial-scale approach (comprehensive as far as possible) [1] to the analysis of small proteins and polypeptides present in human blood plasma. Plasma is interesting for the following reasons: it contains many active proteins and telltale traces of diseases from many tissues; it may be obtained in fairly large quantities relatively noninvasively from both patients and control subjects (in contrast to tissue, which is often difficult to obtain in large amounts from controls); blood, plasma or serum are used quite generally for existing clinical tests; many of the cells responsible for the protein content of plasma are not found in the blood, which may limit a genomic approach; it is via the plasma that most therapeutic agents reach their targets; and finally it is a fluid and therefore may be pooled (much more difficult to achieve with tissue) in order to obtain a large and representative sample for analysis. Nonetheless, analysis of the plasma proteome is challenging [1, 2] in view of the large numbers of proteins expected to be present and the wide dynamic range of protein and polypeptide concentrations, known to span at least 11 to 12 orders of magnitude (albumin vs. TNF, Fig. 1).

Correspondence: Keith Rose, Ph.D., GeneProt, 2 rue Pré-de-la-Fontaine, CP125, CH-1217 Meyrin 2, Switzerland
E-mail: [email protected]
Fax: +41-22-719-39-70

Abbreviation: LIMS, laboratory information management system

* These authors contributed equally.
Figure 1. Abundance and dynamic range of plasma proteins. Plasma concentrations of some known proteins are shown in a log-log relationship with the number of protein species expected at the various concentration ranges. Adapted from [1]. The 100 or so most abundant proteins represent the “tip of the iceberg”, labeled “common” in the Figure. The less abundant proteins are represented by the area labeled “region of interest”.
In order to identify and characterize proteins and polypeptides present at very low concentrations, it is necessary to start with a large sample volume to ensure their presence in sufficient quantities. This stems from the sensitivity requirements of mass spectrometers, which are used for rapid and sensitive analysis of proteins. If 100 fmol (not an unreasonable amount) of protein is required for successful separation, digestion and mass spectrometric identification, then a protein concentration of at least 1 nM is required with a sample size of 100 µL. This required minimum concentration falls to 100 fM if one liter of sample is available.
Although this is a question of simple arithmetic, it is not always appreciated. Of course, these are minimum concentrations which may not actually be reached with the cited sample volumes, since extensive separation cannot be performed quantitatively, but it is important to appreciate the utility of larger sample sizes when comprehensive proteomics is attempted. In this article we outline our approach for the analysis of smaller proteins (Mr < ca. 40 000) present in 2.5 liter samples of pooled human plasma and present some preliminary results. We have carried out several separate large-scale analyses (500 mL and above), including comparative studies, and believe this is the first time that such an extensive separation and study of the small proteins present in human plasma or serum has been reported.
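The arithmetic behind these minimum concentrations can be written out as a short sketch. The function name and the choice of printed volumes are illustrative only; the 100 fmol requirement comes from the text, and the smaller sample is taken as 100 µL, the volume that corresponds to the stated 1 nM. As noted above, real recoveries are not quantitative, so these values are lower bounds.

```python
def min_concentration_molar(required_mol: float, sample_volume_l: float) -> float:
    """Lowest concentration (mol/L) at which the sample still contains required_mol."""
    if sample_volume_l <= 0:
        raise ValueError("sample volume must be positive")
    return required_mol / sample_volume_l


REQUIRED_MOL = 100e-15  # 100 fmol needed for separation, digestion and MS identification

# Sample volumes in liters: 100 uL, 1 L, and the 2.5 L pool used in this work.
for label, volume_l in [("100 uL", 100e-6), ("1 L", 1.0), ("2.5 L", 2.5)]:
    conc = min_concentration_molar(REQUIRED_MOL, volume_l)
    print(f"{label:>7}: {conc:.1e} M")
```

Under these idealized assumptions the 2.5 L pool lowers the theoretical minimum to about 40 fM.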
2 Materials and methods
2.1 Initial sample handling
A laboratory information management system (LIMS) was developed and employed to track plasma samples from the point of blood collection, to track fractions and fraction handling throughout the extensive separation process, and to monitor laboratory instruments and record data generated. Blood (100–450 mL) was obtained from healthy males using standard venous puncture procedures in a medical center setting (Duke University, Durham, NC, USA) after informed written consent. Samples were bar-coded and plasma was prepared by centrifugation and removal of white cells on Sepacell RZ-2000 nonwoven filters (Baxter, Asahi, Japan), according to the manufacturer’s instructions. Times, temperatures and centrifugation conditions and subsequent procedures were rigorously controlled to ensure similar treatment of all samples. Protease inhibitor (Complete; Roche Molecular Biochemicals, Indianapolis, IN, USA) was added according to the manufacturer’s instructions and mixed gently to ensure dissolution. Plasma samples were then frozen and stored at −80°C. After careful consideration of medical history and clinical chemistry parameters, major portions of 53 samples were pooled. Portions from each individual (10 mL) were retained unpooled for individual small-scale studies. From this pool (total volume 6 L), 2.5 L was used for analysis of the smaller proteins and polypeptides. The separation steps are summarized in Table 1.
Table 1. Separation steps for the small proteins of 2.5 L of plasma or serum

1. Depletion of abundant proteins (albumin and immunoglobulins): 53 g remain.
2. Gel filtration and reverse phase capture: 1.5 g remain (ca. 91%).
[...] fraction, online LC-MS.
8. Automated liquid handling and sample preparation of the 12 960 fractions from step 7: MALDI MS, digestion, MALDI and LC-MS/MS of digests.

Protein recovery (< 20 kDa) is given where available in parentheses. It is expressed as the % of eluted protein over protein injected. Both values were determined by size exclusion HPLC (using BSA as a standard, see Section 2.2).
Portions of frozen pooled plasma (125 mL) were thawed and filtered through 0.45 µm sterile filters in a sterile hood. Filtrate was applied to a tandem column combination consisting of 300 mL albumin affinity resin (column 5 cm ID, 15 cm length, laboratory prototype gel based on a peptidic compound linked to an agarose matrix; Amersham Biosciences, Uppsala, Sweden) and 100 mL Protein G Sepharose Fast Flow (5 cm ID, 5 cm length; Amersham Biosciences). Columns were equilibrated and washed with 50 mM PO4 buffer, pH 7.1, 0.15 M NaCl. A flow rate of 5 mL/min was used. The nonretained (flow-through) fraction (350 mL) was frozen until the second step. Twenty runs were performed. Protein content in the flow-through fraction was determined by analytical size exclusion HPLC using BSA as a standard.

The 20 flow-through fractions were applied in turn to gel filtration chromatography. Each portion was thawed and filtered through a 0.45 µm sterile filter in a sterile hood. Filtrate was injected on two in-line gel filtration columns: 2 × 9.5 L Superdex 75 (each 14 cm ID, 62 cm length; Amersham Biosciences). The tandem columns were equilibrated and then eluted at a rate of 40 mL/min with 50 mM PO4 buffer pH 7.4, 0.1 M NaCl, 8 M urea. Hydrophobic impurities in the buffer were retained on a reverse phase precolumn upstream of the injector (150 mL PLRPS; Polymer Labs, Amherst, MA, USA). During the elution of low Mr proteins (nominally < 20 kDa based on analytical SDS-PAGE) the effluent was switched to an in-line reverse phase capture column (50 mL PLRPS, 100 Å; Polymer Labs). The three-way valve controlling effluent switching to the PLRPS column was activated when the absorbance at 280 nm fell to 33 mAU after elution of the large proteins. The cut-off value was established during preliminary experiments using SDS-PAGE to monitor the eluate. After washing the PLRPS capture column, low Mr proteins and peptides were eluted with one column volume of 0.1% TFA, 80% CH3CN in water. Each eluted portion was frozen until further use. The 20 × 50 mL eluates were then thawed, pooled (1 L) and distributed into 7 polypropylene containers (143 mL each). Containers were kept at −20°C.
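As a plausibility check on the column sizes quoted above, the bed volume of a cylindrical column follows directly from its inner diameter and length. The sketch below is only a geometric estimate (the helper name is illustrative, and packing details are ignored); it suggests that the stated dimensions agree with nominal volumes of about 300 mL, 100 mL and 9.5 L per Superdex 75 column.

```python
import math


def bed_volume_ml(inner_diameter_cm: float, length_cm: float) -> float:
    """Geometric volume of a cylindrical column bed; 1 cm^3 equals 1 mL."""
    radius_cm = inner_diameter_cm / 2
    return math.pi * radius_cm ** 2 * length_cm


# (inner diameter in cm, length in cm) as given in the text
columns = {
    "albumin affinity resin": (5.0, 15.0),
    "Protein G Sepharose Fast Flow": (5.0, 5.0),
    "Superdex 75 (one of two)": (14.0, 62.0),
}

for name, (diameter_cm, length_cm) in columns.items():
    print(f"{name}: {bed_volume_ml(diameter_cm, length_cm):.0f} mL")
```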
2.3 Ion exchange chromatography
The seven portions were thawed in turn and mixed with an equal volume of cation exchange buffer A (glycine/HCl 50 mM, pH 2.7, urea 8 M). Each sample was injected onto a 100 mL Source 15S column (35 mm ID, 100 mm length; Amersham Biosciences) equilibrated and washed with buffer A. A flow rate of 10 mL/min was used. Proteins and peptides were eluted with step gradients from 100% buffer A to 100% buffer B (i.e. buffer A containing 1 M NaCl): 3 column volumes 7.5% buffer B (75 mM NaCl); 3 column volumes 10% buffer B (100 mM NaCl); 3 column volumes 17.5% buffer B (175 mM NaCl); 2 column volumes 22.5% buffer B (225 mM NaCl); 2 column volumes 27.5% buffer B (275 mM NaCl); 2 column volumes 100% buffer B (1 M NaCl). After seven runs, fractions were pooled in order to obtain 18 final fractions. Fractions were kept at −20°C until further use.
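The step gradient can be tabulated with a small sketch that converts each percentage of buffer B into a NaCl concentration and sums the elution volume. The constants are taken from the text; the derived elution time is an estimate that assumes elution at the stated 10 mL/min and ignores loading, washing and re-equilibration.

```python
BUFFER_B_NACL_MM = 1000.0  # buffer B is buffer A containing 1 M NaCl
COLUMN_VOLUME_ML = 100.0   # Source 15S column
FLOW_ML_PER_MIN = 10.0

# (number of column volumes, % buffer B) for each step, as listed in the text
STEPS = [(3, 7.5), (3, 10.0), (3, 17.5), (2, 22.5), (2, 27.5), (2, 100.0)]


def nacl_mm(percent_b: float) -> float:
    """NaCl concentration (mM) of a buffer A/B mixture at the given % B."""
    return percent_b * BUFFER_B_NACL_MM / 100


total_column_volumes = sum(cv for cv, _ in STEPS)
total_volume_ml = total_column_volumes * COLUMN_VOLUME_ML
elution_minutes = total_volume_ml / FLOW_ML_PER_MIN

for cv, percent_b in STEPS:
    print(f"{cv} CV at {percent_b:5.1f}% B = {nacl_mm(percent_b):6.1f} mM NaCl")
print(f"{total_column_volumes} CV, {total_volume_ml:.0f} mL, {elution_minutes:.0f} min per run")
```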
2.4 Reduction/alkylation and 1st RP HPLC fractionation
After adjusting the pH to 8.5 with concentrated Tris-HCl, each of the 18 cation exchange fractions was reduced with dithioerythritol (DTE; 30 mM, 3 h at 37°C) and alkylated with iodoacetamide (120 mM, 1 h at 25°C in the dark). The latter reaction was stopped by the addition of DTE (30 mM) followed by acidification (0.1% TFA). Fractions were then injected at a rate of 10 mL/min on an Uptispher C8 column (5 µm, 300 Å, 21 mm ID, 150 mm length; Interchim, Montlucon, France). The C8 column was equilibrated and washed with 0.1% TFA in water (solution A). Proteins and peptides were eluted at a rate of 20 mL/min with a biphasic gradient from 100% A to 100% B (0.1% TFA, 80% CH3CN in water) over 60 min. Thirty fractions of 40 mL were collected. Based on the measured OD of each fraction at 280 nm, aliquots of similar protein content were created for each fraction. Aliquots were frozen and kept for further use. One aliquot per fraction was dried with a Speed Vac (Savant, Holbrook, NY, USA) after addition of 500 µL 10% glycerol in water to prevent excess drying. Dried fractions were kept at −20°C until needed.
2.5 2nd RP HPLC fractionation
Dried samples from the previous step (see Section 2.4) were resuspended in 1 mL of solution A (0.03% TFA in water) and injected on a LCMS C4 column (5 µm, 300 Å, 4.6 mm ID, 150 mm length; Vydac, Hesperia, CA, USA). The column was equilibrated and washed at a rate of 0.8 mL/min with solution A. Proteins and peptides were eluted with a biphasic gradient adapted to the elution position of the fraction in the 1st RP dimension. Sixteen different gradients were used with a CH3CN concentration range from 5% below to 5% above the elution concentration used in the 1st RP dimension (RP1). However, for proteins eluted in the first dimension with a CH3CN concentration equal to or greater than 30% CH3CN, the starting elution conditions for the second dimension gradient were set at the first dimension elution concentration minus 30%. Twenty-four eluted fractions were collected in deep-well plates using optimized collection configurations designed for optimal SpeedVac concentration and further robotic treatment.
2.6 MS
ESI-MS data were collected online during the 2nd RP dimension by splitting 2.5% of the effluent to a mass spectrometer (Esquire 3000; Bruker Daltonics, Bremen, Germany). Aliquots of the 12 960 final fractions of undigested proteins were mixed with MALDI matrices, and spotted with automated spotting devices (MAP II/8; Bruker Daltonics) on MALDI plates (Anchor type; Bruker Daltonics) together with mass calibration standards and sensitivity standards. Two different MALDI matrices were employed: sinapinic acid (SA), and α-cyano-4-hydroxycinnamic acid (HCCA). MALDI spectra were obtained using Bruker Reflex III machines. Spectra were systematically acquired under three sets of conditions: low and medium mass ranges with HCCA in reflector mode, and high mass range with SA in linear mode.

The bulk of each sample in the 96-well plates was concentrated from 0.8 mL to about 50 µL per well in the vacuum centrifuge. Samples were then diluted to about 200 µL, reconcentrated to about 50 µL per well, and stored at 4°C. Proteins were then digested by re-buffering, adding trypsin to the wells, sealing and incubating the plates at 37°C for 12 h, followed by quenching (addition of formic acid to bring the pH to 2.0). The amount of trypsin added to the wells was based on the OD at 280 nm recorded for each particular fraction. This ensured an optimal use of trypsin and complete digestion of the most concentrated fractions. MALDI analysis was performed as described above except that only the HCCA matrix was used.

The major portion of each digest was analyzed by capillary LC-ESI-MS/MS (Esquire 3000 ion trap machines; Bruker Daltonics) at a rate of 2 µL/min. For this, each of 40 Bruker Esquire 3000 mass spectrometers was fitted with a pair of Waters Alliance chromatographs (Waters, Milford, MA, USA) through a low dead volume switching valve in order to maximize throughput of the mass spectrometers. While one chromatograph was equilibrating and injecting, the other was eluting components into the mass spectrometer. The roles were then reversed automatically. Machines were controlled using HyStar software (Bruker Daltonics).
2.7 Bioinformatics
Raw data were processed online on an acquisition PC, then data were collected by the LIMS, backed up, and peak lists were transferred to the supercomputer (1420 Compaq/HP Alpha processors) for processing. Identification against six different databanks was performed using commercial algorithms in early experiments, and then with a new search engine [5]. Integrated identification results were then transferred to an Oracle database, together with the relevant LIMS information (fraction number and position in the experimental process). Automatic procedures were put in place to filter out false positive peptide identifications after careful validation by annotators. Manual identification of proteins and/or fragments of proteins was then done by the annotators. Automatic annotation is kept in the database, as well as manual annotation when required.
2.8 Chemical synthesis
Proteins of interest were then synthesized by chemical methods [7–9], including Native Chemical Ligation. Final oxidative refolding was performed by standard methods [10, 11] based on results of trial experiments involving dilution or dialysis to remove denaturant while increasing the proportion of oxidized glutathione in a buffered mixture of oxidized and reduced forms of glutathione. Masses of all fragments and of final products were verified by MS and were within 0.5 amu of calculated values. Synthesized proteins were then submitted to a variety of in vitro and in vivo assays.
3 Results and discussion
3.1 Preseparation
Portions of plasma from 53 healthy male volunteers were pooled in order to dilute phenotypic differences and to provide a large pool volume. Great care was taken to ensure that sample collection and processing (including times, temperatures, centrifugation etc.) were as similar as possible for all of the samples. A volume of 2.5 L of such a pool was used for the analysis of the smaller proteins described here. Unpooled portions (ca. 10 mL) of individual samples were retained for further studies. The separation scheme, shown in Table 1, starts with a depletion step to reduce the concentrations of the most abundant proteins. It is necessary to remove abundant larger proteins as these interfere with the subsequent chromatographic procedures [2, 3]. We used a tandem column combination which adsorbs serum albumin and immunoglobulins (Fig. 2A).
Other groups have used immobilized Cibacron blue or immobilized antibodies for this purpose [2, 4] and such columns are available from Amersham, ABI (Foster City, CA, USA) and Agilent (Palo Alto, CA, USA). The advantage of the albumin binding medium which we used was its specificity. Besides albumin, the column adsorbed at least 20 polypeptides identified as albumin fragments after reduction, alkylation, 2-DE and MS. This depletion step greatly simplified the subsequent analysis of small proteins. Small quantities of very hydrophobic proteins (apolipoproteins) were also captured by the depletion media. These proteins were all clearly identified in the flow-through fraction also. Detailed results obtained with the albumin binding medium will be reported elsewhere. From 2.5 L pooled plasma we obtained approximately 53 g of protein depleted in albumin and immunoglobulins.

Figure 2. Chromatographic separations. (A) Depletion of albumin and IgG. The earlier peak is nonretained and the second (doublet) is obtained during regeneration. One-twentieth of the 2.5 L sample (125 mL) was loaded per run. (B) Gel filtration of depleted plasma (125 mL equivalent). Effluent between arrows was diverted to RP capture. (C) Cation exchange chromatography with fractions shown. Seven such runs were performed.
Gel filtration (Fig. 2B) was then performed. Approximately 1.5 g of small proteins was obtained, of which approximately 1.3 g was of Mr < 20 000, as determined by analytical gel filtration HPLC monitored at 210 nm and using BSA as a standard (data not shown). These small proteins and polypeptides were separated (Table 1) into 12 960 fractions by a combination of cation exchange chromatography (Fig. 2C, 18 fractions), a first RP HPLC dimension (30 fractions from each of the ion exchange fractions; Fig. 3 top panel), and a second HPLC procedure (24 fractions from each of the fractions from the first HPLC separation). Many (8533) of these final fractions were subsequently found not to contain an identifiable protein, 994 fractions contained a single identifiable protein, and the richest fraction contained 21 identifiable proteins. These final fractions were analyzed online by LC-ESI-MS and offline by MALDI-TOF MS, to obtain mass determination of intact species where possible. Of the 8533 empty fractions, 5892 arrived empty in the database because there was no successful automatic identification, either because there was no triggered MS/MS spectrum or because the quality was below our threshold. This tended to occur for the earliest (1–5) and latest (18–24) RP2 fractions of each run. The remaining 2641 of the 8533 empty fractions were emptied after annotation since the hits were of insufficient quality.

Figure 3. (A) 1st RP dimension (RP1) for ion exchange fraction 4. (B) 2nd RP dimension (RP2) for 9 consecutive RP1 fractions (10–18) of the RP1 run shown in the upper panel. Examples of the second-dimension RP chromatogram obtained from nine consecutive fractions (10–18) from the first-dimension RP run of ion exchange fraction 4 are shown. While the complexity of these fractions is very evident, the consecutive chromatograms are very different, testifying to the separating power of the first dimension.
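The fraction bookkeeping in this section can be checked with a few lines of arithmetic. The counts below are taken from the text; the number of 96-well plates and the number of fractions holding two or more proteins are derived here and are not stated in the article.

```python
ION_EXCHANGE_FRACTIONS = 18
RP1_FRACTIONS_PER_IEX = 30
RP2_FRACTIONS_PER_RP1 = 24
WELLS_PER_PLATE = 96

total_fractions = ION_EXCHANGE_FRACTIONS * RP1_FRACTIONS_PER_IEX * RP2_FRACTIONS_PER_RP1

# Empty fractions, split by the reason given in the text.
empty_no_automatic_id = 5892     # no triggered MS/MS spectrum or quality below threshold
empty_after_annotation = 2641    # hits rejected by annotators
empty_fractions = empty_no_automatic_id + empty_after_annotation

single_protein_fractions = 994

# Derived values (not stated in the article).
plates_needed, leftover_wells = divmod(total_fractions, WELLS_PER_PLATE)
multi_protein_fractions = total_fractions - empty_fractions - single_protein_fractions

print(f"total fractions: {total_fractions}")
print(f"empty fractions: {empty_fractions}")
print(f"96-well plates:  {plates_needed} (remainder {leftover_wells})")
print(f"fractions with two or more proteins: {multi_protein_fractions}")
```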
3.2 MS
As mentioned above, ESI-MS data of the intact proteins were acquired online during the second RP dimension. Small aliquots of collected fractions were prepared for MALDI-TOF MS using liquid handlers. After being concentrated in a vacuum centrifuge the proteins were digested, and optical density values of the fractions were used to automatically adjust the amount of enzyme (trypsin) which was added for the digestion step. Fragments produced by digestion were analyzed by MALDI-TOF MS in reflectron mode to obtain peptide mass fingerprint information, and also by LC-ESI-MS and MS/MS. MS/MS analysis was initiated automatically in a data-dependent fashion based on MS data acquired during the same run. Figure 4 shows an example of how all this information is used.
Figure 4. Sequence coverage of small inducible cytokine A14. Italic bold type indicates coverage by ESI-MS; underlined sequence indicates coverage by MALDI MS. A total coverage of 82% was obtained.
From Fig. 4, the close correspondence of the intact masses found (8903.0 by ESI, 8904.3 by MALDI) with the mass calculated for the entire reduced and carboxamidomethylated cytokine (8905.99) allows us to deduce that this protein was processed as expected (removal of the signal sequence) and that the cysteine residues were alkylated properly by our procedures. A coverage of 82% was obtained when combining MALDI and ESI data. Figure 5 shows that even with 12 960 fractions it is not possible to separate all the proteins present. The excellent signal-to-noise ratio obtained for the relatively low abundance protein leptin is also shown.
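To put the agreement between measured and calculated intact masses in numbers, the deviations can be expressed in mass units and in parts per million. This is an illustrative calculation only: the three masses are those quoted above, while the function name and the ppm presentation are additions and not part of the original analysis.

```python
CALCULATED_MASS = 8905.99  # reduced, carboxamidomethylated cytokine A14

MEASURED_MASSES = {
    "ESI": 8903.0,
    "MALDI": 8904.3,
}


def mass_deviation(measured: float, calculated: float) -> tuple[float, float]:
    """Return (absolute deviation in mass units, relative deviation in ppm)."""
    delta = measured - calculated
    return delta, delta / calculated * 1e6


for technique, measured in MEASURED_MASSES.items():
    delta, ppm = mass_deviation(measured, CALCULATED_MASS)
    print(f"{technique:>5}: {delta:+.2f} ({ppm:+.0f} ppm)")
```

Both measurements fall within about 3 mass units of the calculated value, which is the basis for the deduction made in the text.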
3.3 Bioinformatics
More than 1.5 million MS/MS spectra were generated and 330 000 of these were used for manual testing and validation of the algorithms developed for automated identifications. Preliminary interpretations of MS/MS spectra were performed using commercial algorithms. Later, MS/MS spectra were interpreted using a new identification engine (Olav) which has been previously described [5]. The engine accessed a set of six different databases containing either protein sequence data (Swiss-Prot), public EST or genomic sequence data, and commercial patent databases. Post-processing was performed to enhance the quality of the scoring of each identification. An integration step was then used to check the consistency of the global identification across the different databases used. Annotation (automatic and manual) was performed on the validated identifications to emphasize important features and further characterize the observed proteins or fragments of proteins.

Figure 5. MALDI mass spectrum of a tryptic digest of one of the 12 960 fractions, shown to contain at least two proteins: leptin and apolipoprotein A II.
3.4 Proteins identified
In total 405 nonredundant proteins (defined as chromatographically separable polypeptide entities) were definitively identified (Table 2) in a process involving a single plasma pool from healthy males. By separable polypeptide entities we refer to known fragments of a gene product. For example, several forms of the serine protease inhibitor Kazal-type 5 precursor are known to circulate as separate fragments of the gene product Q9NQ38. Another example concerns polypeptide chains from the same gene product which become chromatographically separable after denaturation, reduction and alkylation, such as plasma kallikrein heavy and light chains (P03952). In addition, in some cases, MS coverage permitted the existence of new fragments to be proposed, for example when identified tryptic fragments were localized exclusively towards the N-terminus or C-terminus of a long protein. Such cases brought the overall number of separable entities from 405 to 502. Furthermore, about one hundred peptides were identified from areas of the genome where no proteins have previously been predicted, although a possible mechanism for their production has been reported [6]. By combining results of similar analyses from other plasma pools, we have identified a total of over 700 different proteins in human plasma from males. Additional proteins have been identified in plasma or serum from female donors. This is a far greater number of nonredundant small proteins than has been found in previous smaller-scale studies of the small proteins of plasma or serum, described in the patent literature or reported in the scientific literature. While a large number of identified proteins (490) have been cited recently in an interesting article where LC-MS/MS of peptides was employed [3], in our opinion, when the list of identified serum proteins is restricted using the same criteria as we used (only classical tryptic peptides included, with no more than one missed cleavage), this number is reduced to about 164 of the more common proteins, in line with Fig. 1. The first 100–200 most abundant proteins are relatively easy to identify. The identification of the less abundant proteins requires a larger scale approach such as the one described here.
We show details of 405 of the 502 identified proteins in Table 2. After further study, details on the other proteins will be published elsewhere. Of the 405 proteins shown in Table 2, 210 are known to be secreted and plasmatic, 76 are secreted but not known to be plasmatic, 2 are not known to be secreted, 15 are cellular leakage proteins known to be found in plasma, 41 are probably cellular leakage proteins but have not been reported as present in plasma, and 61 are unclassified. This list includes low abundance proteins such as leptin, and low abundance peptides such as bradykinin. While the gel filtration step led to identification of a majority of small proteins, fragments of some very large proteins were identified, many of which (those carrying an asterisk in Table 2) were eluted during regeneration of the ion exchange column (fraction 18 in Fig. 2C). We identified 115 proteins through a single peptide (generally identified in multiple fractions, see relevant column in Table 2), 143 through 2–5 peptides, 77 through 6–10 peptides and 70 through more than 10 peptides.

Table 2. List of 405 of the 502 nonredundant proteins found in a single 2.5 L pool of human blood plasma from healthy males

Accession numbers and descriptions are from Swiss-Prot and TrEMBL when the protein is found there. WOSIGO (without signal peptide) is the mature form of the peptide. C0,1,2 . . . are chains as described in Swiss-Prot. An accurate description is given in the description column. P0,1,2 . . . are peptides as described in Swiss-Prot (see description column).
a) Known to be secreted and plasmatic
b) Secreted but not known to be plasmatic
c) Not known to be secreted
d) Cellular leakage proteins known to be found in plasma
e) Probably cellular leakage proteins but have not been reported as present in plasma
f) Unclassified
* indicates protein identified in the ion exchange column regeneration fraction.
Length refers to the number of amino acid residues in the identified protein. Number of distinct tryptic peptides refers to identified different classical tryptic peptides (no more than one missed cleavage allowed, no nontryptic cleavages). For proteins identified by a single peptide, number of runs refers to the number of separate (independent) RP2 runs (fractions) in which the protein was identified.
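The two breakdowns of the 405 listed proteins (by classification and by number of identifying peptides) can be cross-checked with a simple tally. The counts are those given in the text; the dictionary keys are shorthand labels, and the difference between 502 and 405 is computed here rather than quoted.

```python
LISTED_PROTEINS = 405
SEPARABLE_ENTITIES = 502

# Classification of the listed proteins, categories a) to f) of Table 2.
by_classification = {
    "a: secreted and plasmatic": 210,
    "b: secreted, not known to be plasmatic": 76,
    "c: not known to be secreted": 2,
    "d: cellular leakage, known in plasma": 15,
    "e: probable cellular leakage, not reported in plasma": 41,
    "f: unclassified": 61,
}

# Number of proteins by count of distinct tryptic peptides supporting them.
by_peptide_count = {
    "1 peptide": 115,
    "2-5 peptides": 143,
    "6-10 peptides": 77,
    "more than 10 peptides": 70,
}

entities_not_listed = SEPARABLE_ENTITIES - LISTED_PROTEINS

for label, breakdown in [("classification", by_classification), ("peptide count", by_peptide_count)]:
    total = sum(breakdown.values())
    status = "consistent" if total == LISTED_PROTEINS else "inconsistent"
    print(f"{label}: {total} ({status})")
print(f"entities identified but not listed in Table 2: {entities_not_listed}")
```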
3.5 Chemical protein synthesis
Proteins found to be of interest (e.g. new and secreted) are being synthesized chemically and studied. Bioactivity, such as enzymic activity, has already been demonstrated for some of the proteins identified. Refolding has been successful with proteins possessing 8 disulfide bonds. The largest proteins synthesized and refolded so far have more than 160 residues (Fig. 6). This is not the limit of the chemical method employed [7]. Purities assessed by HPLC and MS are in excess of 95% and yields of 5–40 mg, depending on length, are obtained. The nonbiological origin of the chemically synthesized proteins avoided the risk of contamination by viruses, prions or endotoxins which can exhibit biological effects in small animal studies.

Figure 6. Chemically synthesized 18 kDa protein assembled from 4 fragments. RP HPLC analysis of fully reduced protein (upper left panel), and after refolding (elutes earlier, lower left panel). ESI MS analysis (right panels) of refolded protein, which gave up 8 protons (4 disulfide bonds formed). The deduced mass is correct to within one amu.
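The mass change used to follow oxidative refolding can be estimated from the number of disulfide bonds: each bond formed removes two hydrogen atoms from the reduced protein. The sketch below assumes the standard average atomic mass of hydrogen (1.00794 u); the function name is illustrative and the calculation is a back-of-the-envelope check, not the authors' procedure.

```python
AVERAGE_HYDROGEN_MASS = 1.00794  # u, standard average atomic mass (assumed constant)


def disulfide_mass_loss(n_disulfides: int) -> float:
    """Expected decrease in average mass when n disulfide bonds form (2 H lost per bond)."""
    if n_disulfides < 0:
        raise ValueError("number of disulfide bonds cannot be negative")
    return 2 * n_disulfides * AVERAGE_HYDROGEN_MASS


# 4 bonds as in the 18 kDa protein of Fig. 6; 8 bonds as the largest number refolded so far.
for bonds in (4, 8):
    print(f"{bonds} disulfide bonds: -{disulfide_mass_loss(bonds):.2f} u")
```

For the four disulfide bonds of the protein in Fig. 6 the expected loss is about 8 u, so a mass measurement accurate to within one amu is sufficient to confirm complete oxidation.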
4 Concluding remarks
A procedure for the automated industrial-scale (2.5 L) analysis of human plasma has been described, and hundreds of small proteins reported. By comparing two large pools, one from diseased and one from carefully matched control subjects, differences even in low abundance proteins may be detected (data not shown). Once such proteins have been identified, high-throughput assays may be developed and used in follow-up validation studies involving the preserved unpooled aliquots from individual donors. Since completion of the work described here, we have upgraded our mass spectrometric instrumentation to the Esquire 3000+ model, and studied sample volumes of 500 mL and of 10 mL of serum or plasma. Preliminary results indicate that more than 250 nonredundant small proteins can be identified from a 10 mL sample of plasma.
5 References
[1] Rose, K., in: Cooper, D. N. (Ed.), Nature Encyclopedia of the Human Genome, Macmillan Publishers, London, England 2003, Vol. 3, pp. 435–439.
[2] Anderson, N. L., Anderson, N. G., Mol. Cell. Proteomics 2002, 1, 845–867.
[3] Adkins, J. N., Varnum, S. M., Auberry, K. J., Moore, R. J. et al., Mol. Cell. Proteomics 2002, 1, 947–955.
[4] Wang, Y. Y., Cheng, P., Chan, D. W., Proteomics 2003, 3, 243–248.
[5] Colinge, J., Masselot, A., Giron, M., Dessingy, T., Magnin, J., Proteomics 2003, 3, 1454–1463.
[6] Schwab, S. R., Li, K. C., Kang, C., Shastri, N., Science 2003, 301, 1367–1371.
[7] Dawson, P. E., Kent, S. B., Annu. Rev. Biochem. 2000, 69, 923–960.
[8] Low, D. W., Hill, M. G., Carrasco, M. R., Kent, S. B., Botti, P., Proc. Natl. Acad. Sci. USA 2001, 98, 6554–6559.
[9] Villain, M., Vizzavona, J., Gaertner, H., in: Lebel, M., Houghten, R. A. (Eds.), Peptides: The Wave of the Future. Proceedings of the 17th American Peptide Symposium, Kluwer Academic Publishers, Dordrecht, The Netherlands 2001, pp. 107–108.
[10] De Bernadez Clark, E., Schwartz, E., Rudolph, R., Meth. Enzymol. 1999, 309, 217–236.
[11] Armstrong, N., de Lancastre, A., Gouaux, E., Protein Sci. 1999, 8, 1475–1483.