Top Banner
RESEARCH ARTICLE HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy Michael Hamacher 1 , Rolf Apweiler 2 , Georg Arnold 3 , Albert Becker 4 , Martin Blüggel 5 , Odile Carrette 6 , Christine Colvis 7 , Michael J. Dunn 8 , Thomas Fröhlich 3 , Michael Fountoulakis 9 , André van Hall 1 , Friedrich Herberg 10 , Juango Ji 11 , Hans Kretzschmar 12 , Piotr Lewczuk 13 , Gert Lubec 14 , Katrin Marcus 1 , Lennart Martens 15 , Nadine Palacios Bustamante 1 , Young Mok Park 16 , Stephen R. Pennington 8 , Johan Robben 17 , Kai Stühler 1 , Kai A. Reidegeld 1 , Peter Riederer 18 , Jean Rossier 19 , Jean-Charles Sanchez 6 , Michael Schrader 20 , Christian Stephan 1 , Danilo Tagle 21 , Herbert Thiele 22 , Jing Wang 23 , Jens Wiltfang 13 , Jong Shin Yoo 16 , Chenggang Zhang 23 , Joachim Klose 24 and Helmut E. Meyer 1 1 Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany 2 EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK 3 Laboratory for Functional Genome Analysis LAFUGA, Gene Center, Ludwig-Maximilians-University Munich, Munich, Germany 4 Institut für Neuropathologie, Universitätsklinikum Bonn, Bonn, Germany 5 Protagen AG, Dortmund, Germany 6 Biomedical Proteomics Research Group, Geneva University, Faculty of Medicine, CMU, Geneva, Switzerland 7 Division of Basic Neurosciences and Behavioral Research, National Institute on Drug Abuse, Bethesda, MA, USA 8 Proteome Research Centre, UCD Conway Institute, University College Dublin, Dublin, Ireland 9 Academy of Athens, Foundation for Biomedical Research, Division of Biotechnology, Athens, Greece 10 Institut für Biologie, Abteilung Biochemie, Universität Kassel, Kassel, Germany 11 Department of Biochemistry and Molecular Biology, The National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing, P. R. China 12 Zentrum für Neuropathologie und Prionforschung, Ludwig-Maximilians-Universität München, München, Germany 13 Poliklinik & Institutsambulanz, Labor für Molekulare Neurobiologie, Klinik und Poliklinik für Psychiatrie und Psychotherapie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany 14 Medical University of Vienna, Department of Pediatrics, Division of Neuroproteomics, Vienna, Austria 15 Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium 16 Korea Basic Science Institute, Daejeon, Republic of Korea 17 Biomedisch Onderzoeksinstituut, Universiteit Hasselt, Diepenbeek, Belgium 18 Clinic and Policlinic of Psychiatry and Psychotherapy, Department Clinical Neurochemistry, University of Würzburg, Würzburg, Germany 19 Neurobiologie et Diversité Cellulaire CNRS UMR7637, Ecole Superieure de Physique et de Chimie Industrielles, Paris, France 20 Fachhochschule Weihenstephan, Fachbereich Biotechnologie, Freising, Germany 4890 Proteomics 2006, 6, 4890–4898 Correspondence: Dr. Michael Hamacher, Medizinisches Proteom- Center, Ruhr-Universität Bochum, ZKF E.143, Universitätsstraße 150, D-44801 Bochum, Germany E-mail: [email protected] Fax: 149-234-321-4554 Abbreviations: BPP , Brain Proteome Project; DCC, Data Collection Center; HUPO, Human Proteome Organisation; IPI, international protein index; PPP , Plasma Proteome Project; PSI, Proteomics Standards Initiative; RAIN, robust advanced image normalisation DOI 10.1002/pmic.200600295 © 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com
9

HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

May 04, 2023

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

RESEARCH ARTICLE

HUPO Brain Proteome Project: Summary of the pilot

phase and introduction of a comprehensive data

reprocessing strategy

Michael Hamacher1, Rolf Apweiler2, Georg Arnold3, Albert Becker4, Martin Blüggel5,Odile Carrette6, Christine Colvis7, Michael J. Dunn8, Thomas Fröhlich3,Michael Fountoulakis9, André van Hall1, Friedrich Herberg10, Juango Ji11,Hans Kretzschmar12, Piotr Lewczuk13, Gert Lubec14, Katrin Marcus1, Lennart Martens15,Nadine Palacios Bustamante1, Young Mok Park16, Stephen R. Pennington8,Johan Robben17, Kai Stühler1, Kai A. Reidegeld1, Peter Riederer18, Jean Rossier19,Jean-Charles Sanchez6, Michael Schrader20, Christian Stephan1, Danilo Tagle21,Herbert Thiele22, Jing Wang23, Jens Wiltfang13, Jong Shin Yoo16, Chenggang Zhang23,Joachim Klose24 and Helmut E. Meyer1

1 Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany2 EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK3 Laboratory for Functional Genome Analysis LAFUGA, Gene Center, Ludwig-Maximilians-University

Munich, Munich, Germany4 Institut für Neuropathologie, Universitätsklinikum Bonn, Bonn, Germany5 Protagen AG, Dortmund, Germany6 Biomedical Proteomics Research Group, Geneva University, Faculty of Medicine, CMU, Geneva, Switzerland7 Division of Basic Neurosciences and Behavioral Research, National Institute on Drug Abuse, Bethesda, MA, USA8 Proteome Research Centre, UCD Conway Institute, University College Dublin, Dublin, Ireland9 Academy of Athens, Foundation for Biomedical Research, Division of Biotechnology, Athens, Greece

10 Institut für Biologie, Abteilung Biochemie, Universität Kassel, Kassel, Germany11 Department of Biochemistry and Molecular Biology, The National Laboratory of Protein Engineering and Plant

Genetic Engineering, Peking University, Beijing, P. R. China12 Zentrum für Neuropathologie und Prionforschung, Ludwig-Maximilians-Universität München, München,

Germany13 Poliklinik & Institutsambulanz, Labor für Molekulare Neurobiologie, Klinik und Poliklinik für Psychiatrie und

Psychotherapie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany14 Medical University of Vienna, Department of Pediatrics, Division of Neuroproteomics, Vienna, Austria15 Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium16 Korea Basic Science Institute, Daejeon, Republic of Korea17 Biomedisch Onderzoeksinstituut, Universiteit Hasselt, Diepenbeek, Belgium18 Clinic and Policlinic of Psychiatry and Psychotherapy, Department Clinical Neurochemistry, University of

Würzburg, Würzburg, Germany19 Neurobiologie et Diversité Cellulaire CNRS UMR7637, Ecole Superieure de Physique et de Chimie

Industrielles, Paris, France20 Fachhochschule Weihenstephan, Fachbereich Biotechnologie, Freising, Germany

4890 Proteomics 2006, 6, 4890–4898

Correspondence: Dr. Michael Hamacher, Medizinisches Proteom-Center, Ruhr-Universität Bochum, ZKF E.143, Universitätsstraße150, D-44801 Bochum, GermanyE-mail: [email protected]: 149-234-321-4554

Abbreviations: BPP, Brain Proteome Project; DCC, Data CollectionCenter; HUPO, Human Proteome Organisation; IPI, internationalprotein index; PPP, Plasma Proteome Project; PSI, ProteomicsStandards Initiative; RAIN, robust advanced image normalisation

DOI 10.1002/pmic.200600295

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 2: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

Proteomics 2006, 6, 4890–4898 Technology 4891

21 National Institute of Neurological Disorders and Stroke, National Institutes of Health, NeuroscienceCenter, Bethesda, MD, USA

22 Bruker Daltonik, Bremen, Germany23 Department of Neurobiology, Beijing Institute of Radiation Medicine, Beijing, P. R. China24 Institut für Humangenetik, Charité-Universitätsmedizin Berlin, Humboldt-Universität, Berlin, Germany

The Human Proteome Organisation (HUPO) initiated several projects focusing on the proteomeanalysis of distinct human organs. The Brain Proteome Project (BPP) is the initiative dedicatedto the brain, its development and correlated diseases. Two pilot studies have been performedaiming at the comparison of techniques, laboratories and approaches. With the help of theresults gained, objective data submission, storage and reprocessing workflow have been estab-lished. The biological relevance of the data will be drawn from the inter-laboratory comparisonsas well as from the re-calculation of all data sets submitted by the different groups. In the fol-lowing, results of the single groups as well as the centralised reprocessing effort will be sum-marised and compared, showing the added value of this concerted work.

Received: April 19, 2006Revised: June 12, 2006

Accepted: June 12, 2006

Keywords:

Brain Proteome Project / Human / Human Proteome Organisation / Mouse / Pilot study

1 Introduction

The Human Proteome Organisation (HUPO) was launchedin February 2001 because of the need for an internationalproteomic forum to improve understanding of human dis-eases. At present, there are seven scientific initiatives ana-lysing a distinct organ each within an international con-sortium ([1] and www.hupo.org). The initiative dealing withthe brain is the Brain Proteome Project (HUPO BPP) [2–4],chaired by Helmut E. Meyer, Bochum, Germany.

The brain is one of the most complex tissues of higherorganisms, differing from other organs due to its many dif-ferent cell types, its structure at the cellular and tissue level,and the restricted regeneration capacity. Elucidating the pro-tein complement of the brain is therefore at the upper limitof significant challenges to today’s current technologies inproteome analysis. At the same time, the brain is of para-mount interest in medical research and in pharmaceuticalindustry because of the widespread social impact of the morecommon neurological diseases such as Alzheimer’s disease,Parkinson’s disease, multiple sclerosis, prion diseases andstroke. The prevalence of some of these diseases is increas-ing significantly, e.g. every 5th person over 80 years in indus-trial countries suffers from Alzheimer’s disease [5].

The most promising current approach to help under-standing the developmental and neurodegenerative changesin the brain is proteomics in combination with other well-established methods of molecular biology and humangenetics. One aim of the HUPO BPP is the characterisationof the human and mouse brain proteomes and the usage ofthe data (identified proteins, mRNA profiles, protein/proteininteractions, protein modifications and localisation, vali-dated protein targets) in understanding neurodegenerative

diseases and aging. In order to reach this fundamental goal,it is necessary to coordinate neuroproteomic activitiesworldwide and to enable every participant and active mem-ber of the HUPO BPP to access all data and new technologiesobtained through these studies (for an activity overview seeTable 1). Unfortunately, the exchange of raw data betweenseveral groups is often hampered by incompatible data for-mats and the lack of common software. Thus, HUPO BPPrecognised the need for guidelines from the very beginningand declared the elaboration of standards and the bioinfor-matics infrastructure as the primary goals. This is consistentwith the general effort to shape a new, reliable proteomicscodex [6, 7].

In order to evaluate existing approaches in brain prote-omics as well as to establish a standardised data-reprocessingpipeline, pilot studies have been initiated including bothmouse and human brain samples. Participating groups werefree to analyse these samples according to their own ap-proaches.

In the mouse pilot study, brain tissue from normal miceof three developmental stages had to be analysed by quanti-tative proteomics, while in the human pilot study, two hu-man brain tissue samples from an autopsy and a biopsy,respectively, had to be analysed by quantitative/qualitativeproteomics techniques.

Samples were sent out to several groups that could usetheir own standard analysis protocols (shipment costs cov-ered by BrainNet Europe and Applied Biosystems). Poolingof samples was not allowed. Standard protocols for proteinextraction from brain tissue for 2-D-PAGE were made avail-able by Joachim Klose, Berlin, Germany, and Gert Lubec,Vienna, Austria, as a non-obligatory offer. However, protocolsor modification within protocols had to be annotated. Data

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 3: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

4892 M. Hamacher et al. Proteomics 2006, 6, 4890–4898

Table 1. Workshops, meetings and achievements 2002 to 2006

Timeline Meeting

November 21–24, 2002 1st HUPO World Congress, Versailles, France Meyer & Klose offered to chair the brain initiative

January 15, 2003 PepTalk, San Diego, CA, USA, presenting of the HUPO BPP

April 28, 2003 Kick-off Meeting in Frankfurt/Main, Germany with 25 participants, announcement of project

May 2003 HUPO BPP homepage www.hbpp.org minutes & updates can be found here

July 10, 2003 Planning Committee Meeting in Frankfurt/Main, Germany updates and decision of first steps

September 5/6, 2003 1st HUPO BPP Workshop at Castle Mickeln, Germany with 50 participants establishing of com-mittees (Steering, Specimen, Technology & Standardisation, Bioinformatics, Training) idea ofpilot studies and master plan

October 8–10, 2003 Neuroproteomics Session at the 2nd HUPO World Congress in Montreal

January 20, 2004 1st Steering Committee Meeting at the ESPCI in Paris, France organised by Jean Rossier; prepa-ration of 2nd HUPO BPP Workshop

April 23/24, 2004 2nd HUPO BPP Workshop at the ESPCI in Paris, France, organised by Jean Rossier; official beginof pilot study and fixing of master plan (75 participants)

April 26/27, 2004 1st HUPO BPP ProteinScape Training Course for Pilot Study Participants at the MPC, Bochum,Germany

July 29, 2004 1st HUPO BPP Bioinformatics Meeting at the EBI in Hinxton, UK, organised by Rolf Apweiler;elaboration of a Data Collection Concept, a time line for reprocessing and publishing, im-plementation of ProteinScape as general analysis software

July 2004 Internet forum forum.hbpp.org with several discussion forums, announcements and downloads

September 30, 2004 2nd Steering Committee Meeting in Frankfurt/Main, Germany, updates and agreement of DataCollection Concept

October 15, 2004 2nd HUPO BPP ProteinScape Training Course for Pilot Study Participants at Protagen, Dortmund,Germany

October 23, 2004 Neuroproteomics Session & HUPO BPP Workshop at the 3rd HUPO World Congress in Beijing,P. R. China, presentation of project (e.g. pilot studies) and discussions, intensification ofexchange/contact with other HUPO projects and international societies

End of 2004 Implementation of Data Collection Center at the MPC, Bochum, Germany

November 5, 2004 2nd HUPO BPP Bioinformatics Meeting at the EBI in Hinxton, UK organised by Rolf Apweiler

December 15/16, 2004 3rd HUPO BPP Workshop at Castle Rauischholzhausen, Germany

December 17, 2004 3rd HUPO BPP ProteinScape Training Course for Pilot Study Participants at the MPC, Bochum,Germany

January 28, 2005 3rd HUPO BPP Bioinformatics Meeting at Protagen, Dortmund, Germany

April 8, 2005 4th HUPO BPP Bioinformatics Meeting at the EBI in Hinxton, UK

June 1, 2005 ”HUPO BPP International Workshop on Mouse Models for Neurodegeneration” in Doorwerth,The Netherlands

July 7, 2005 5th HUPO BPP Bioinformatics Meeting at the EBI in Hinxton, UK

August 27, 2005 4th HUPO BPP Workshop in Munich, Germany during the 4th HUPO World Congress

January 9 – 11, 2006 Jamboree of the HUPO BPP Bioinformatics Committee at the EBI in Hinxton, U.K.

February 15–16, 2006 5th HUPO BPP Workshop “Bridging Proteomics and Medical Science “ at the UCD, Dublin, Ireland

To come:

September 4–7, 2006 3rd Steering Committee meeting during the 7th Siena Meeting, Siena, Italy

October 27 – November 1, 2006 6th HUPO BPP Workshop at the 5th HUPO World Congress in Long Beach, CA, USA

had to be submitted to a Data Collection Center (DCC, datafile storage solution at the Medizinisches Proteom-Center,Bochum, Germany; see Stephan et al. in this Special Issue)for central reprocessing and will be publicly accessible at thePRIDE database (http://www.ebi.ac.uk/pride, [8, 9]) servingas reference data for future analysis.

In the course of these studies and the subsequent cen-tral reprocessing, a data collection, submission and storagepipeline has been established, a bioinformatics identifica-tion strategy has been elaborated and very interestinginsights into today’s proteomics approaches could begained.

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 4: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

Proteomics 2006, 6, 4890–4898 Technology 4893

In the following overview article, the results of the dis-tinct approaches of the individual groups as well as the cen-tralised reprocessing effort will be summarised. A detailedexplanation of the analysis strategy, the original researchpapers of the participating groups, and the results of thecentral reprocessing strategy are published in the reportscontained in this Special Issue “HUPO Brain ProteomeProject”. The biological relevance will be drawn from theinter-lab comparison as well as from the re-calculation of alldata sets submitted by the different groups, leading to a clearvote for consortia work by using common standards.

The HUPO BPP pilot studies were aiming at comparingprotein analyses techniques, laboratories and approaches.With the help of the results gained in the pilot studies anobjective data submission, storage and reprocessing work-flow has been elaborated.

2 Materials and methods

2.1 Samples

2.1.1 Human brain tissue

Brain tissue from a surgical procedure for epilepsy was pro-vided by Albert Becker, University of Bonn Medical Center,Germany. Post-mortem tissue was made available fromBrainNet Europe through Hans Kretzschmar, Ludwig-Max-imilians-University Munich, Germany.

The following participating groups uploaded data gener-ated in this human brain study to the DCC:

Biomedical Research Institute (BIOMED), UniversityHasselt, Belgium

College of Life Sciences, Beijing University, P.R. ChinaProteome Research Centre, UCD Conway Institute,

University College Dublin, IrelandGene Center – Laboratory for Functional Genome Anal-

ysis – and Institute for Neuropathology, Ludwig-Max-imilians-University Munich, Germany

Korea Basic Science Institute (KBSI), Daejeon, SouthKorea

2.1.1.1 Biopsy: human temporal lobe, Patient ID:

N 1147–1149/96

A biopsy specimen was obtained from a patient (male, age44) with chronic pharmacoresistant temporal lobe epilepsy[10], who underwent surgical treatment in the Epilepsy Sur-gery-Program at the University of Bonn Medical Center. Apresurgical evaluation using a combination of non-invasiveand invasive procedures [11] revealed that seizures originatedin the mesial temporal lobe. Surgical removal of the hippo-campus and temporal lobe was clinically indicated; tissuewas stored on dry ice immediately and kept at 2807C. Allprocedures were conducted in accordance with the Declara-

tion of Helsinki and approved by the ethics committee of theUniversity of Bonn Medical Center. Informed written con-sent was obtained from the patient. A standardised neuro-pathological evaluation of the hippocampus revealedAmmon’s horn sclerosis (AHS), i.e. segmental neuronal cellloss in CA1 as well as CA3/4, gliosis, and axonal reorganisa-tion [12]. The temporal lobe specimen showed heterotopicneurons in the white matter.

2.1.1.2 Autopsy: human temporal lobe, Patient ID:

RZ104 (control case)

The brain originated from a 60-year-old Caucasian male(post-mortem interval: 11 h). Pathological characterisationrevealed: neuropathological diagnosis: Braak&Braak: 1–2;Cerad: 0; Lewy bodies (Mckeiths): 0; clinical diagnosis: pros-tate carcinoma, no other markers of neurodegeneration;cause of death: peritonitis.

Samples were taken from the area corresponding to thebiopsy sample (superior temporal gyrus).

2.1.2 Mouse brain tissue

C57BL/6J mice, originally obtained from Charles River Ger-many some months prior to the project, were bred and keptin a barrier unit under specific pathogen-free conditionsaccording to the FELASA recommendations, using Makro-lon type IIL cages with soft wood bedding. Mice had freeaccess to an autoclaved standard rodent diet (Altromin 1314)and water from automatic valves. Rooms were ventilatedwith a 15-fold air exchange rate. Room temperature was 22 6

17C and relative humidity 50 6 10%. The light/dark rhythmwas 14/10 [13]. Mice were mated permanent monogamouslyand offspring used at 7 or 54–58 days of age. For productionof the 16-day-old foetuses, females were mated overnight andvaginal plug was controlled in the morning. When plugpositive, this was considered day 0 of gestation. Dissection ofthe females and collection of foetuses was done on day 16.All mice were sacrificed by neck dislocation; foetuses werekilled by cutting off the head with scissors. Brain wasremoved from skull, weighed, immediately frozen on dry iceand stored at 2807C. There were no gross abnormalities ofthe brain at autopsy. The colony from which the experi-mental cohorts were taken was tested in behavioural (openfield, elevated plus maze), neurological (observational neu-rological battery, rota rod) and cognitive terms (Morris WaterMaze and Multiple-T-Maze) as described previously [14] anddid not reveal any abnormalities. Animals were handledaccording to Austrian Law and Guidelines.

Participants were expected to analyse n = 5 mouse brainsper stage (no pooling of samples).

The following participating groups uploaded data gener-ated in this mouse brain study to the DCC:

Beijing Institute of Radiation Medicine, P.R. China

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 5: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

4894 M. Hamacher et al. Proteomics 2006, 6, 4890–4898

Biomedical Proteomics Research Group, University ofGeneva, Switzerland

Proteome Research Centre, UCD Conway Institute,University College Dublin, Ireland

Foundation for Biomedical Research of the Academy ofAthens, Greece

Gene Center – Laboratory for Functional Genome Anal-ysis – and Institute for Neuropathology, Ludwig-Max-imilians-University Munich, Germany

Institute for Human Genetics, Charité Berlin, GermanyMedizinisches Proteom-Center, Ruhr-Universitaet

Bochum, Germany

2.2 Methods

Participating laboratories used their own protocols concern-ing the following specifications:– No pooling of individual samples.– Protocols or modification within protocols had to be

annotated.– Data had to be submitted to the DCC for a central repro-

cessing.– ProteinScape™ should be used preferably for data collec-

tion and submission (see Stephan et al., this issue).– Data should be publicly accessible at the PRIDE database.– Preparation of joint publications.

For more details concerning the distinct groups, see therespective reports in this Special Issue.

2.3 DCC and bioinformatics

The heterogeneity of the data is very high because of thediverse analytical strategies and instruments. The AB 4700MALDI instrument was used in four laboratories. Twolaboratories used the LCQ mass spectrometer from ThermoElectron. The LTQ from the same company was also used intwo laboratories. Flex series instruments of Bruker wereused in two laboratories. For more details concerning thesetups as well as the identification strategies, see the articlesof the distinct participating groups.

We were interested in the assumption that a process ofcentral data reprocessing would lead to additional as well asmore reliable protein identifications. Thus, DCC and a cen-tral data reprocessing workflow have been elaborated. Inorder to collect, to analyse, and to reprocess these hetero-geneous data sets, a common, powerful and automated dataanalysis strategy was elaborated that is presented in detail byStephan et al. (this issue; see also [15–18]).

The following bioinformatics parameters were pre-set:(i) submission of data to the DCC localised at the MPC,

Bochum, Germany;(ii) use of the ProteinScape™ software (Bruker Daltonics,

Bremen, Germany) to allow handling, storage, andstandardised re-analysis of the submitted data (based on the

decision at the 2nd HUPO BPP Workshop in Paris 2004, freelicences were provided by Bruker);

(iii) use of a specified analysis guideline that was elabo-rated and discussed within the Bioinformatics Committee(IPI database release April 2005 to search against, minimaltwo identified peptides per protein etc.; full guideline avail-able at www.hbpp.org);

(iv) false positive rate of proteins should be lower than5%. This criterion has been used in several other studies,most notably by the HUPO Plasma Proteome Project(HUPO PPP) [19] that is compared with the HUPO BPP byMartens et al. (this issue).

Data submission ended on March 31, 2005, and databaseintegration of several gigabytes of data (peak lists, gel imagesand sample descriptions) was finished during April 2005. Re-analysis of the data sets according to the re-analysis criteriawas done in May through December 2005 (several iterativesteps were necessary), generating data sets containing non-evaluated peptide lists. To increase the number of identifiedproteins, different search engines were employed at thePAULA cluster, MPC, Bochum (Proteomics Cluster underLinux Architecture). These included (i) Sequest (ThermoFinnigan, cluster licence already existing) for MS/MS data;(ii) ProteinSolver (Bruker Daltonics, Bremen) for MS/MSdata; (iii) MASCOT (temporary free cluster licences byMatrix Science) for MS/MS and MS data; and (iv) ProFound(Rockefeller University, New York) for MS data.

Some of these search algorithms have been compared byKapp et al. [20], who published the results after the elabora-tion of the reprocessing guidelines and the beginning of therecalculation. Nevertheless, the HUPO BPP approach ispartly different from the Kapp approach, as the HUPO BPPcombined the results of the available search engines asdescribed by Stephan et al. (this Issue). All MS data sets weresearched against a specially prepared decoy protein databaseof the International Protein Index (IPI 3.05) databases foreach analysed species. For each protein in the original data-base, a decoy protein was added where all amino acids of thisprotein were shuffled.

The MS spectra were sent to MASCOT and ProFound asdescribed in the data reprocessing guideline. All proteinresults with a Metascore higher than 90 were assigned asidentified.

The MS/MS data from the distinct separation types (spot/band/fraction) were sent separately to MASCOT, Pro-teinSolver, and SEQUEST. To estimate the best peptidethreshold score for generating a pool of peptides, which willbe used for assembly of the proteins, the search parameterswere evaluated for each search engine prior to their use forthe automated approach. This enhances the maximum ofidentified proteins by a defined false positive rate of 5%. Theevaluation of parameters was calculated by analysing a subsetof 12 000 spectra. For assembling the different protein listsfrom MS/MS data and for removing redundancies, ProteinExtractor has been used. ProteinExtractor generates a proteinlist, which contains a minimal set of proteins with only those

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 6: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

Proteomics 2006, 6, 4890–4898 Technology 4895

isoforms, which can be distinguished by MS/MS data. It usesan iterative approach: first, selecting the most likely proteincandidate (highest summed peptide score), writing this pro-tein into the result list, then marking all spectra explainableby this protein as ‘used’, then selecting the most probablenext protein candidate from the still unused spectra, andrepeating this, until all spectra are marked as ‘used’. The al-gorithm obeys rules set by MS/MS experts for the proteins,which should appear in a minimal protein list. In thisapproach, the number of matched peptides for the identifiedproteins in any of the search engines has been used ratherthan the sum of the peptide scores. The resulting mergedprotein list has been sorted by the sum of the individual sum-scores of the algorithms, and proteins were marked identifieduntil the list contained 5% of the decoy proteins. Only 0–3%of peptides are corresponding to a decoy entry on the peptidelevel in a list containing less than 5% false positive proteins.The overlap of the search engines was 80–90%, therefore 10–20% more proteins can be found with this approach. Theflexible way of reprocessing the data in order to generate theprotein lists are described in Stephan et al. (this Issue).

The HUPO BPP is one of the first major projects to sup-port the mzData standard format of the HUPO ProteomicsStandards Initiative (PSI) (psidev.sf.net). Collected data willbe submitted to the PRIDE database for public access.

Moreover, the collected original peak lists have beensubmitted to two independent analysts who will reprocessthem in regard to their own strategies (results will be pub-lished elsewhere). This will allow estimating the advantagesand disadvantages of the different strategies that were used.

Analyses of the 2-D-gel images were done centrally usingthe new image-based gel matching technique, robustadvanced image normalisation (RAIN), as described byDowsey et al. in this Special Issue.

3 Results

3.1 General remarks

In these pilot studies, nine independent laboratories ana-lysed the samples according to their own approach (fordetails of the distinct groups, please refer to the articles andReidegeld et al. in this Special Issue). The number of the MSspectra is 1350 (0.2%) and of MS/MS spectra 740 000(99.8%), with approximately 80% of spectra belonging to thehuman samples and approximately 20% to the mouse sam-ples. Half of all spectra originate from gel-based or LC-basedapproaches, respectively.

The spectra can be classified in different ways to observethe diversity of experimental setups. Prior to the MS analysis,different separation techniques were applied: 32% of thespectra were acquired after 1-D PAGE separation, 22% after2-D PAGE and 46% after LC. In addition, more than 500 2-Dgels were generated by eight participating laboratories asdescribed below (not all were centrally analysed).

3.2 2-D PAGE

Dowsey et al. describe in this Special Issue the analyses of the2-D gels generated in the pilot studies. These gels wereobtained by several techniques (e.g. single-channel andDIGE) and were very inhomogeneous even within the samecategory, as every laboratory used different size, pH range,running conditions and/or image analysis software. Never-theless, Dowsey and colleagues used their automatic gel-matching algorithm RAIN to match gel spots of the samesamples to a master gel. Besides a detailed description ofpotential challenges and other theoretical background, theRAIN algorithm is explained and discussed. Despite thegiven heterogeneity, the use of RAIN showed substantialimprovement of gel matching due to the optimised transfor-mation and parameter sets. As the complexity of DIGE gelsis even higher, matching is still suboptimal. Further studiesand the standardisation of gel experiments will lead to themain goal of providing a centralised and standardised bioin-formatics resource for 2-D gel image analysis.

3.3 Bioinformatics strategy

Central data reprocessing of such huge data sets is very timeand labour consuming. To minimise the work, but to max-imise the statistical reliability of the results, an automatedworkflow was elaborated (see Stephan et al., this Issue). Thishad to be done in an iterative way, as there was no standard-ised approach available for processing of such heterogeneousdata. It proved to be very useful that data sets were collectedusing a common software (ProteinScape and other algo-rithms therein) supporting the new standard formatmzDATA. Though not all groups adhered to the default set-tings, the conversion into the common standards (e.g. to theright IPI version 3.05) turned out to be relatively easy.

The adaptation and implementation of additional algo-rithms such as ProteinExtractor and the determination of theparameters for the different search engines took quite a lot oftime, as several rounds of adaptation had to be performed.Now completed, this pipeline allows an easy, objective (in themeaning of the strict use of the parameters), comprehensiveand fast way for reprocessing of MS/MS spectra. It assures ahighly reliable list of identified proteins in combination withthe use of a decoy database to determine a false positive ratebelow 5%. The use of different search engines results inmore proteins than the single searches alone. Every searchengine finds a subset of the overall protein lists due to thedifferent algorithms; the combination of all leads to moreidentified proteins. In addition, proteins, corresponding gelsand differential expression level alterations can easily be cor-related even after submission to the DCC, as this informa-tion is represented in the ProteinScape database structure.Further analysis of the data will be done with additionalsearch engines and/or by external analysts and publishedlater (see Stephan et al.).

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 7: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

4896 M. Hamacher et al. Proteomics 2006, 6, 4890–4898

3.4 Single vs. centralised approaches

Nine participating laboratories analysed human and mousebrain samples, respectively, using a variety of different tech-niques. In the following, both the human and the mouseanalyses are briefly discussed together, as Reidegeld et al. aregiving a comprehensive overview of these approaches andthe results gained compared with the central reprocessing inthis Special Issue.

The questionnaire sent out at the very beginning alreadypromised a very heterogeneous set of methods and massspectrometers. Although the total amount of estimated gelsand spectra was slightly overestimated (e.g. 256106 MS/MSspectra from 2-D LC had been calculated), the overallamount of submitted data still is impressive. Thirty-sevendifferent analytical approaches were accomplished. Of theseanalyses, 17 were done differentially, i.e. the protein expres-sion patterns of the different samples (human or mouse)were compared.

Taken together, the individual groups identified over1200 redundant proteins. Some spectra sets were submittedwithout prior analysis due to the immense inherent time-consuming work (see Reidegeld et al. in this Issue). Thecomparison of these data is severely limited due to the dif-ferent used parameter sets (different protein databases,identification criteria etc.). By applying the described bioin-formatics workflow, it was possible to reprocess all availabledata sets with a fixed parameter set, in a reasonable period,with a suitable amount of labour and with a known qualitythreshold (false positive rate). By considering all spectra, theamount of proteins named increased fourfold to over 5400(redundant), mainly because of the previously non-analyseddata. In some data sets more proteins could be identified (upto over 90%), while others tend to lose identified proteins (upto 35%), probably due to the use of other, more stringent pa-rameters. After removing redundancies by using the ProteinExtractor (see Reidegeld et al.), 1832 proteins were identifiedin the human brain sample and 792 proteins in the mousebrain samples. Thus, the combination of all (normalised)data allowed the HUPO BPP bioinformatics team to gen-erate comprehensive, but still extended lists of proteins withan objective quality standard, thus making the best of relia-bility and effectiveness.

3.5 Comparison of technology platforms

We would like to stress the point that the overlap of identifiedproteins between the different participants is very low, e.g.,no one protein has been named by all groups concerning themouse study performing quantitative 2-D PAGE analyses.Nearly 80% of all proteins were listed by just one group andare uniquely identified, while only a subset of proteins weredetected by several groups (see Reidegeld et al., this Issue).Of course, the very different amount of proteins identified bythe different groups reduces the possibility of a big overlap.However, this distribution can be found in several big studies

(see e.g. [21]) and reflects the different features of theapproaches used and the subsequent sources of variationunder these conditions in the different laboratories. In gen-eral, the data reprocessing might be an additional source forthe small overlap, e.g. because of the application of distinctprotein assembly algorithms, uncertainty in the searchengine/database match and the different redundancyremoval steps.

Moreover, most proteins are very low abundant anddetection by recent methods is not sufficient for reproducibledetection throughout different laboratories. This also meansthat a weighting or valuation of the techniques is not mean-ingful as long as the approaches are not accurately standar-dised. However, the internal laboratory reproducibility has tobe proved.

3.6 PTM

In this Issue, Chamrad et al. address in their article thequestion of how unexplained spectra of the HUPO BPP datasets can be interpreted in regard to PTM. Using the PTM-Explorer software, they were able to automatically scan ap-proximately 30 000 spectra within less than 5 days, taking allusually occurring modifications and mass shifts intoaccount. After manually checking the potential hits, theypredict, e.g. several phosphorylations and other modifica-tions within this data subset, showing that PTM analysis ofhuge data is possible within a reasonable period.

3.7 Data mining and comparison with literature

Data mining has been one of the main goals of the pilotstudies. Mueller and colleagues from the European Bioin-formatics Institute (EBI) in Hinxton, UK (see article in thisIssue) elaborated a workflow for annotating identified pro-teins, including sequence features, genomic context, GOannotation, protein-protein interactions and disease associa-tion. In general, the results obtained reflect roughly the pro-tein composition one would expect when analysing thebrain. Mueller et al. report an enrichment of genes corre-sponding to the identified proteins that are encoded onchromosome 1. Moreover, they report that a large part of thedetected proteins are involved either in energy metabolism(mitochondrial electron transport, hydrolases) or in trans-port mechanisms (cytoskeleton-associated proteins), as thebiological context “brain” would predict. However, trans-membrane proteins are underrepresented, probably due tothe used sample preparation and/or analysis procedures.This extended data mining workflow allows a fast and auto-mated analysis of protein lists, and will therefore serve as animportant tool in further studies.

Interestingly, an astonishing number of proteins identi-fied in this pilot study are not named in other extensive pro-teome studies. This phenomenon is specified by Martens etal. (this Issue) describing, e.g. that the overlap between theHUPO BPP and the HUPO PPP [22] human pilot study is

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 8: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

Proteomics 2006, 6, 4890–4898 Technology 4897

slightly above 15, but nearly 50% between the HUPO BPPand the platelet proteome data set [23] after normalisation tothe latest IPI version. The proteins concerned mostly belongto housekeeping genes. The higher percentage overlap be-tween brain and platelets could be due to the many sharedfunctions and proteins that are found in both tissues.Omenn and colleagues [24] could show, e.g., that serotonincan be removed by reuptake into serotonergic neurons aswell as into platelets. In addition, the relatively small size ofthe platelet proteome could lead to a higher overlap com-pared to the plasma. Here, it is rather suggested that incom-plete data sets, but also artefacts resulting from study design,sample preparation or analysis procedures should be con-sidered for the smaller overlap.

Martens et al. also compared the strategies of the HUPOBPP and the HUPO PPP pilot studies. While the HUPO BPPused peak lists for the central reprocessing, the HUPO PPPinitially relied solely on peptide sequences as units andgathered identifications in a centralised database, leaving theclassification into low and high confident to the submittinglaboratory. The number of peptides per protein reported issimilar for both HUPO initiatives, reaching over eight pep-tides (human), while the overlap of peptides identified doesnot exceed 5% between the two data sets. However, theauthors conclude that the organisation of data managementthrough the synergistic effects of a consortium of collabora-tors is of outstanding importance.

4 Discussion

The HUPO BPP aims at the analysis of the human brainby applying proteomics technologies and related methods.To gain an overview about existing approaches and suit-able workflows as well as in order to elaborate a standard-ised central bioinformatics pipeline, the HUPO BPP per-formed two pilot studies outlined in this Special Issue ofPROTEOMICS. Although these studies were performedon a voluntary basis, the acceptance and the engagementof the participating laboratories were very impressiveleading to the submission of enormous data sets to theDCC.

The pilot phase of the HUPO Plasma Proteome Projecttook place several months before the announcement of theHUPO BPP studies, so that the HUPO BPP committeeswere fortunate to be able to consider the pitfalls, challengesand milestones of the preceding HUPO PPP efforts. There-fore, it was known that one of the severe problems on thebioinformatics site was the heterogeneous data handlingwithin the different groups (data format, data collection anddata interpretation). To avoid possible obstacles, it wasagreed upon to use a common data-handling concept and tosupport the mzDATA standard from the HUPO PSI. Byimplementing the ProteinScape software at most participat-ing laboratories it was possible to assure correct dataexchange and storage between the groups and the DCC.

Incompatible data formats and long unmanageable Excel listcould be by-passed with this approach. The bioinformaticscommittee of the HUPO BPP made great efforts to elaboratea bioinformatics workflow for the “objective” reprocessing ofthe heterogeneous data sets. By combing existing softwaresolutions, by adapting the search parameters to the existingdata and by applying the decoy database the committee suc-ceeded in elaborating a fast, reliable (FPR ,5%) and auto-mated data reprocessing pipeline. Once defined, this work-flow can be applied fast (processing the data set generated inthese studies now takes approximately 2 weeks) and is easy touse.

The contribution of Marcus et al. in this Special Issueclearly shows that the analysis of the same sample couldresult in different sets of proteins when using different pro-tein separation systems (e.g. here Klose and IPG gels). Thiscould be due to the different nature of the used systems andthe different resulting separation properties thereof. Thesame is true for the different approaches used by the groups,namely gel-based vs. non-gel-based analyses. Even withinsimilar approaches, slightly variable handling or parameterslead to changed protein sets detected, especially when ana-lysing low abundant proteins. This could explain the smalloverlap between the proteins named by the different groups,leaving the single sets unique, but not (necessarily) wrong.The overlap is smaller between the HUPO BPP data set andother data, e.g. summarised by Martens et al., even whenconsidering the different samples.

Three conclusions have to be drawn from these results:(i) Annotations concerning sample handling, prepara-

tion, separation and identification are necessary, so that dis-crepancies and differences can be traced back. This willbecome mandatory as the journals already recognised theneed for reliable data sets [6, 7, 23].

(ii) Different approaches and search engines have to beseen as complementary. The combination of the generatedoutput results in added value, as on the one hand identifiedproteins can be approved, while on the other hand newproteins can be detected by the combination of the peptidesidentified by different groups. The separation features of thedifferent techniques overlap and can be applied succes-sively.

(iii) Every study has to show the reproducibility of itsdata. As the overlap of the identified proteins in regard tothe different laboratories is not optimal, it is extremelyimportant that the own data are handled very critically andthat internal standard operating procedures (SOP) pergroup (NOT between the laboratories) are essential. Thisincludes, e.g., independent analyses of the biological sam-ples with an appropriate statistical number of repeats (morethan five, no pooling) and taking into account the limita-tions of the used technique. Only confident protein listsresulting from stringent criteria (see, e.g., also benchmark-ing of MS/MS search algorithms by Kapp and colleagues[20]) are of advantage to the scientific community and willlead to biomarkers.

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com

Page 9: HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy

4898 M. Hamacher et al. Proteomics 2006, 6, 4890–4898

The validation of the proteins identified is the nextessential step. However, the discussion of the biological con-text of the results will be done elsewhere, as this has not beenwithin the focus of the pilot study that concentrated on theline-up of available approaches and the elaboration of acomprehensive bioinformatics. The described reprocessingworkflow and especially the use of the same standards (e.g.software, data formats) turned out to be highly valuable toassure that results are comparable. This workflow will beimportant for further analyses in the HUPO BPP.

Concerning the upcoming main phase of HUPO BPP, ageneral roadmap was announced at the last HUPO BPPworkshop [25]. Cornerstone I consists of further proteomestudies (data collection) including the analyses of brain pro-teins, subcellular proteomes and CSF, cornerstone II coversbiomarker discovery and validation for early diagnosisincluding mouse models, human studies, human genetics,molecular biology, neurology, CSF and plasma as well asbioinformatics and data mining, while cornerstone IIIincludes the DCC and PRIDE. In general, the HUPO BPPwill learn from the lessons of these pilot studies and focuseven more intensively on standardisation and technologies,working closely together with the other HUPO initiatives,especially with the HUPO PSI, and other international andEU consortia.

Coordination of the HUPO BPP is supported by the GermanFederal Ministry for Education and Research with grants0313318B and 01GR0440. JK was supported by the GermanFederal Ministry for Education and Research within the frame-work of the German National Research Network (NGFN). Re-search in MJD laboratory is supported by the Science FoundationIreland under Grant No. 04/RP1/B499. YMP was supported bya grant from the Korea Institute of Science and Technology Eval-uation (M6–0403–00–0154), JSY by a grant from the 21CFrontier Functional Proteomics Project (FPR-05-A2–300) whichwere funded by the Korean Ministry of Science and Technology.LM would like to thank Prof. Kris Gevaert and Prof. Joël Vande-kerckhove for their support. L.M. is a Research Assistant of theFund for Scientific Research-Flanders (Belgium) (F.W.O.Vlaanderen).

The pilot project at the Biomedisch Onderzoekscentrum wassupported by the transnationale Universiteit Limburg (tUL)Impulsfinanciering. MS’ work has been supported by the GermanFederal Ministry for Education and Research grant 0312815.Fundings for CZ group: National Basic Research Project (973program) (2003CB715900); National Natural Science Founda-tion of China (90208017); Major Program for Science andTechnology Research of Beijing Municipal Bureau (7061004).HUPO BPP would like to thank all participating groups andactive members as well as all supporting companies (in alphabet-

ical order): Agilent, Applied Biosystems, BioVision, Bruker Dal-tonics, GE Healthcare, Ludesi, Nonlinear, Matrix Science, Pro-tagen AG, Thermo Electron.

5 References

[1] Hanash, S., Mol. Cell. Proteomics 2004, 3, 298–301.

[2] Meyer, H. E., Klose, J., Hamacher, M., Lancet Neurol. 2003, 2,657–658.

[3] Hamacher, M., Klose, J., Rossier, J., Marcus, K., Meyer, H. E.,Proteomics 2004, 4, 1932–1934.

[4] Hamacher, M., Meyer, H. E., Proteomics 2005, 5, 334–336.

[5] Beyreuther, K., Einhäupl, K. M., Förstl, H., Kurz, A., Demen-zen, Thieme Verlag Stuttgart 2002.

[6] Wilkins, M. R., Appel, R. D., Van Eyk, J. E., Chung, M. C. et al.,Proteomics 2006, 6, 4–8.

[7] Carr, S., Aebersold, R., Baldwin, M., Burlingame, A. et al.,Mol. Cell. Proteomics 2004, 3, 531–533.

[8] Martens, L., Hermjakob, H., Jones, P., Adamski, M. et al.,Proteomics 2005, 5, 3537–3545.

[9] Jones, P., Cote, R. G., Martens, L., Quinn, A. F. et al., NucleicAcids Res.. 2006, 34 (Database issue), D659–D663.

[10] Elger, C. E., Schramm, J. R., Radiologe 1993, 33, 165–171.

[11] Kral, T., Clusmann, H., Urbach, J., Schramm, J. et al., Zen-tralbl. Neurochir. 2002, 63, 106–110.

[12] Blumcke, I., Zuschratter, W., Schewe, J. C., Suter, B. et al., J.Comp. Neurol. 1999, 414, 437–453.

[13] Nicklas, W., Baneaux, P., Boot, R., Decelle, T. et al., Lab.Anim. 2002, 36, 20–42.

[14] Pollak, D. D., Herkner, K., Hoeger, H., Lubec, G., Behav. BrainRes. 2005, 163, 128–135.

[15] Bluggel, M., Bailey, S., Korting, G., Stephan, C. et al., Prote-omics 2004, 4, 2361–2362.

[16] Stephan, C., Reidegeld, K., Meyer, H. E., Hamacher, M., Pro-teomics 2005, 5, 2716–2717.

[17] Reidegeld, K. A., Blüggel, M., Körting, G., Chamrad, D. et al.,Eur. Pharm. Rev. 2006, 11, 33–38.

[18] Stephan, C., Hamacher, M., Bluggel, M., Korting, G. et al.,Proteomics 2005, 5, 3560–3562.

[19] States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D. et al.,Nat. Biotechnol. 2006, 24, 333–338.

[20] Kapp, E. A., Schutz, F., Connolly, L. M., Chakel, J. A. et al.,Proteomics 2005, 5, 3475–3490.

[21] Chamrad, D, Meyer, H. E., Nat. Methods 2005, 2, 647–648.

[22] Omenn, G. S., States, D. J., Adamski, M., Blackwell, T. W. etal., Proteomics 2005, 5, 3226–3245.

[23] Martens, L., Van Damme, P., Van Damme, J., Staes, A. et al.,Proteomics 2005, 5, 3193–3204.

[24] Omenn, G. S., Smith, L. T., J. Clin. Invest. 1978, 62, 235–240.

[25] Hamacher, M., Stephan, C., Palacios Bustamante, N. et al.,Proteomics 2006, 6, 2634–2637.

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com