Review

Knowledge discovery in metabolomics: An overview of MS data handling

Julien Boccard, Jean-Luc Veuthey, Serge Rudaz

School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland

Received September 29, 2009
Revised November 3, 2009
Accepted November 3, 2009

While metabolomics attempts to comprehensively analyse the small molecules characterising a biological system, MS has been promoted as the gold standard to study the wide chemical diversity and range of concentrations of the metabolome. On the other hand, extracting the relevant information from the overwhelming amount of data generated by modern analytical platforms has become an important issue for knowledge discovery in this research field. The appropriate treatment of such data is therefore of crucial importance if the data are to provide valuable information. The aim of this review is to provide a broad overview of the methodologies developed to handle and process MS metabolomic data, compare the samples and highlight the relevant metabolites, starting from the raw data and ending with biomarker discovery. As data handling can be further separated into data processing, data pre-treatment and data analysis, recent advances in each of these steps are detailed separately.

Keywords: Data mining / Data processing / Metabolomics / MS
DOI 10.1002/jssc.200900609

Abbreviations: ANN, artificial neural networks; DIMS, direct infusion mass spectrometry; HCA, hierarchical cluster analysis; PCA, principal component analysis; PLS, projection to latent structures by means of partial least squares; SVM, support vector machine; UV, unit variance

Correspondence: Dr. Serge Rudaz, School of Pharmaceutical Sciences – EPGL, University of Geneva, 20 Bd d’Yvoy, 1211 Geneva 4, Switzerland
E-mail: [email protected]
Fax: +41-22-379-68-08

© 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.jss-journal.com
J. Sep. Sci. 2010, 33, 1–15

1 Metabolomics

1.1 Systems biology

Metabolomics (also known as metabonomics) is a recent discipline that attempts to globally study metabolites and their concentrations, interactions and dynamics within complex samples [1]. It constitutes one of the tools of the post-genomic era [2], which is concerned with the study of the different functional levels of a biological system, i.e. the transcriptome, the proteome and the metabolome [3, 4]. As metabolites can be considered the downstream products of cellular regulatory processes [5], metabolomics data can precisely characterise cells, tissues, biofluids or whole organisms by defining specific biochemical phenotypes that are representative of physiological or developmental states.

The metabolome is the holistic quantitative set of low-molecular-weight compounds (<1000 Da), including many hundreds or thousands of molecules such as carbohydrates, vitamins, lipids and amino or fatty acids. These metabolites participate in the metabolic reactions necessary for the normal function, maintenance or growth of a cell [2]. Their origin can be either endogenous, such as the products of biosynthesis and catabolism, or exogenous, such as nutrients or the degradation products of pharmaceutical compounds [6, 7]. The chemical properties of the organic or inorganic constituents of this set are highly variable, and this diversity is a critical aspect to consider [8]. The variability in molecular weight, polarity or solubility and the wide dynamic range of concentrations (from pmol to mmol) constitute additional difficulties when analysing metabolites, particularly as no amplification process is available.

1.2 Applications of metabolomics

Measuring metabolite changes could offer deeper insights into biological mechanisms by describing the responses of systems to environmental or genetic modifications. Metabolomic analyses constitute a potent tool for the discovery of biomarkers related to a physiological response and the diagnosis of complex phenotypes. Numerous applications have already been developed in a variety of research fields of the post-genomic era, from medical science to agriculture.

Although early metabolic studies focused on pre-selected specific compounds, modern untargeted approaches represent efficient tools for the early detection of diseases [9–11]. The screening of drug candidates constitutes another prominent application through the assessment of the effects of metabolic modifications or toxicity [12, 13]. In addition, metabolomics has recently received increasing interest from the nutrition research field [14]. Several applications
interaction chromatography and ion exchange chromatography. The phase chemistry
greatly influences the nature of metabolites that can be investigated, and the
assessment of complete metabolomes is currently impossible using a single
chromatographic system [25, 53, 54]. The recent introduction of capillary LC
[55, 56] and ultra-high pressure LC [57] offers great potential for metabolomic
analyses of complex samples by markedly improving chromatographic performance.
Columns packed with sub-2-μm particles can lead either to better separation with
narrower chromatographic peaks or to faster analysis without loss of resolution,
compared with conventional LC. The use of elevated temperatures is another recent
area of investigation to improve the resolution of LC [58–60], whereas monolithic
columns constitute a further alternative for improving chromatographic
performance [53, 61, 62]. The typical 2-D structure of LC-MS data is shown in
Fig. 3.
3 Metabolomic data handling
3.1 Data handling
Metabolomic experiments usually produce large amounts of data. The handling and
analysis of such complex data sets have a great impact on the quality of the
identification and quantification of putative low-mass regulators, and therefore
on the resulting biological interpretation [63]. Data handling can be further
separated into data processing, data pre-treatment and data analysis [64]. The
distinct steps of the strategy applied for knowledge discovery are shown in Fig. 4.
The appropriate procedures for data handling should take the different sources of
variation into account and should be cautiously undertaken, as the choice of these
procedures depends heavily on: (i) the analytical platform used to generate the
data, (ii) the biological phenomenon under study, (iii) the downstream data
analysis method and (iv) the inherent properties of the data (e.g.
dimensionality) [65].
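To make the three-step split concrete, the following is a minimal sketch rather than a procedure prescribed by this section. It assumes that data processing has already produced a samples-by-features peak table, uses unit variance (UV) scaling as the pre-treatment and PCA as the data analysis; both methods are taken from the abbreviation list purely for illustration. The function names `uv_scale` and `pca_scores` and the toy `peak_table` values are hypothetical.

```python
import numpy as np

def uv_scale(x):
    """Pre-treatment: mean-centre each feature and scale it to unit variance."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant features unscaled (assumed convention)
    return (x - mean) / std

def pca_scores(x, n_components=2):
    """Data analysis: PCA of an already centred matrix via SVD."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical output of data processing: 4 samples x 3 features (peak areas).
peak_table = np.array([
    [10.0, 200.0, 3.0],
    [12.0, 180.0, 3.5],
    [30.0, 210.0, 9.0],
    [28.0, 190.0, 8.0],
])

scaled = uv_scale(peak_table)            # data pre-treatment
scores, explained = pca_scores(scaled)   # data analysis
```

Because the choice of procedure depends on the platform, the biological question and the downstream method, the scaling step here is one option among several and would be swapped out accordingly.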
Figure 2. 2-D GC-MS [173].
[Figure 3 panels: Total Ion Chromatogram; Single Mass Spectrum; Single Ion Chromatogram; Total Mass Spectrum. Axis: Intensity]
Figure 3. LC-MS data with the corresponding total ion chromatogram and total mass
spectrum. Single ion chromatogram and single mass spectrum are given as examples
of the specific monitoring of particular m/z or retention time.
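The four views named in Fig. 3 can be read as sums or slices of a single 2-D intensity matrix. The sketch below assumes a matrix with one row per retention-time scan and one column per m/z bin; the array values, axis values and function names are hypothetical and serve only to illustrate the data structure.

```python
import numpy as np

# Hypothetical toy LC-MS run: rows = retention-time scans, columns = m/z bins.
retention_times = np.array([1.0, 2.0, 3.0, 4.0])   # minutes (assumed values)
mz_bins = np.array([100.0, 150.0, 200.0])          # m/z (assumed values)
intensity = np.array([
    [0.0,  5.0, 1.0],
    [2.0, 20.0, 0.0],
    [1.0,  8.0, 3.0],
    [0.0,  1.0, 9.0],
])

def total_ion_chromatogram(x):
    """Sum over all m/z bins for each scan."""
    return x.sum(axis=1)

def total_mass_spectrum(x):
    """Sum over all scans for each m/z bin."""
    return x.sum(axis=0)

def single_ion_chromatogram(x, mz_axis, target_mz):
    """Intensity trace of the m/z bin closest to target_mz."""
    col = int(np.argmin(np.abs(mz_axis - target_mz)))
    return x[:, col]

def single_mass_spectrum(x, rt_axis, target_rt):
    """Mass spectrum of the scan closest to target_rt."""
    row = int(np.argmin(np.abs(rt_axis - target_rt)))
    return x[row, :]

tic = total_ion_chromatogram(intensity)
tms = total_mass_spectrum(intensity)
sic = single_ion_chromatogram(intensity, mz_bins, 150.0)
sms = single_mass_spectrum(intensity, retention_times, 2.0)
```

Real instrument data are not delivered on a regular m/z grid, so a binning step would normally precede this matrix view.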
[1] Nicholson, J. K., Lindon, J. C., Holmes, E., Xenobiotica 1999, 29, 1181–1189.
[2] Sumner, L. W., Mendes, P., Dixon, R. A., Phytochemistry 2003, 62, 817–836.
[3] Hollywood, K., Brison, D. R., Goodacre, R., Proteomics 2006, 6, 4716–4723.
[4] Oksman-Caldentey, K. M., Saito, K., Curr. Opin. Biotechnol. 2005, 16, 174–179.
[5] Oresic, M., Clish, C. B., Davidov, E. J., Verheij, E., Vogels, J., Havekes, L. M., Neumann, E., Adourian, A., Naylor, S., van der Greef, J., Plasterer, T., Appl. Bioinformatics 2004, 3, 205–217.
[6] Keun, H. C., Pharmacol. Ther. 2006, 109, 92–106.
[7] Plumb, R. S., Dear, G. J., Mallett, D. N., Higton, D. M., Pleasance, S., Biddlecombe, R. A., Xenobiotica 2001, 31, 599–617.
[8] Forster, J., Famili, I., Fu, P., Palsson, B. O., Nielsen, J., Genome Res. 2003, 13, 244–253.
[9] Ackermann, B. L., Hale, J. E., Duffin, K. L., Curr. Drug Metab. 2006, 7, 525–539.
[10] van der Greef, J., Martin, S., Juhasz, P., Adourian, A., Plasterer, T., Verheij, E. R., McBurney, R. N., J. Proteome Res. 2007, 6, 1540–1559.
[11] Sreekumar, A., Poisson, L. M., Rajendiran, T. M., Khan, A. P., Cao, Q., Yu, J. D., Laxman, B., Mehra, R., Lonigro, R. J., Li, Y., Nyati, M. K., Ahsan, A., Kalyana-Sundaram, S., Han, B., Cao, X. H., Byun, J., Omenn, G. S., Ghosh, D., Pennathur, S., Alexander, D. C., Berger, A., Shuster, J. R., Wei, J. T., Varambally, S., Beecher, C., Chinnaiyan, A. M., Nature 2009, 457, 910–914.
[12] Lindon, J. C., Nicholson, J. K., Holmes, E., Antti, H., Bollard, M. E., Keun, H., Beckonert, O., Ebbels, T. M., Reilly, M. D., Robertson, D., Stevens, G. J., Luke, P., Breau, A. P., Cantor, G. H., Bible, R. H., Niederhauser, U., Senn, H., Schlotterbeck, G., Sidelmann, U. G., Laursen, S. M., Tymiak, A., Car, B. D., Lehman-McKeeman, L., Colet, J. M., Loukaci, A., Thomas, C., Toxicol. Appl. Pharmacol. 2003, 187, 137–146.
[13] Dieterle, F., Schlotterbeck, G. T., Ross, A., Niederhauser, U., Senn, H., Chem. Res. Toxicol. 2006, 19, 1175–1181.
[14] Wishart, D. S., Trends Food Sci. Technol. 2008, 19, 482–493.
[15] Weckwerth, W., Annu. Rev. Plant. Biol. 2003, 54, 669–689.
[49] Schauer, N., Steinhauser, D., Strelkov, S., Schomburg, D., Allison, G., Moritz, T., Lundgren, K., Roessner-Tunali, U., Forbes, M. G., Willmitzer, L., Fernie, A. R., Kopka, J., FEBS Lett. 2005, 579, 1332–1337.
[50] Beens, J., Brinkman, U. A. T., Analyst 2005, 130, 123–127.
[51] Idborg, H., Edlund, P. O., Jacobsson, S. P., Rapid Commun. Mass. Spectrom. 2004, 18, 944–954.
[52] Niessen, W. M. A., J. Chromatogr. A 1999, 856, 179–197.
[53] Tolstikov, V. V., Lommen, A., Nakanishi, K., Tanaka, N.,Fiehn, O., Anal. Chem. 2003, 75, 6737–6740.
[54] Wang, C., Kong, H. W., Guan, Y. F., Yang, J., Gu, J. R., Yang, S. L., Xu, G. W., Anal. Chem. 2005, 77, 4108–4116.
[55] Ding, J., Sorensen, C. M., Zhang, Q. B., Jiang, H. L., Jaitly, N., Livesay, E. A., Shen, Y. F., Smith, R. D., Metz, T. O., Anal. Chem. 2007, 79, 6081–6093.
[56] Dear, G. J., Ayrton, J., Plumb, R., Fraser, I. J., Rapid Commun. Mass. Spectrom. 1999, 13, 456–463.
[57] Swartz, M. E., J. Liquid Chromatogr. Rel. Technol. 2005, 28, 1253–1263.
[58] Plumb, R. S., Rainville, P., Smith, B. W., Johnson, K. A., Castro-Perez, J., Wilson, I. D., Nicholson, J. K., Anal. Chem. 2006, 78, 7278–7283.
[59] Stoll, D. R., Cohen, J. D., Carr, P. W., J. Chromatogr. A 2006, 1122, 123–137.
[60] Grata, E., Boccard, J., Guillarme, D., Glauser, G., Carrupt, P. A., Farmer, E. E., Wolfender, J. L., Rudaz, S., J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 2008, 871, 261–270.
[61] Maruska, A., Kornysova, O., J. Chromatogr. A 2006, 1112, 319–330.
[62] Pham-Tuan, H., Kaskavelis, L., Daykin, C. A., Janssen, H. G., J. Chromatogr. B 2003, 789, 283–301.
[63] Hendriks, M. M. W. B., Cruz-Juarez, L., De Bont, D., Hall, R., Anal. Chim. Acta 2005, 545, 53–64.
[64] van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., van der Werf, M. J., BMC Genomics 2006, 7, 142.
[65] Jonsson, P., Bruce, S. J., Moritz, T., Trygg, J., Sjoestroem, M., Plumb, R., Granger, J., Maibaum, E., Nicholson, J. K., Holmes, E., Antti, H., Analyst 2005, 130, 701–707.
[66] Hilario, M., Kalousis, A., Pellegrini, C., Muller, M., Mass Spectrom. Rev. 2006, 25, 409–449.
[67] Katajamaa, M., Oresic, M., J. Chromatogr. A 2007, 1158, 318–328.
[68] Davis, R. A., Charlton, A. J., Godward, J., Jones, S. A., Harrison, M., Wilson, J. C., Chemometric Intell. Lab. Syst. 2007, 85, 144–154.
[69] Fleming, C. M., Kowalski, B. R., Apffel, A., Hancock, W. S., J. Chromatogr. A 1999, 849, 71–85.
[70] Radulovic, D., Jelveh, S., Ryu, S., Hamilton, T. G., Foss, E., Mao, Y., Emili, A., Mol. Cell. Proteom. 2004, 3, 984–997.
[71] Wang, W., Zhou, H., Lin, H., Roy, S., Shaler, T. A., Hill, L. R., Norton, S., Kumar, P., Anderle, M., Becker, C. H., Anal. Chem. 2003, 75, 4818–4826.
[72] Li, X. J., Yi, E. C., Kemp, C. J., Zhang, H., Aebersold, R., Mol. Cell. Proteom. 2005, 4, 1328–1340.
[73] Hastings, C. A., Norton, S. M., Roy, S., Rapid Commun. Mass. Spectrom. 2002, 16, 462–467.
[74] Katajamaa, M., Oresic, M., BMC Bioinformatics 2005, 6, 179.
[75] Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R., Siuzdak, G., Anal. Chem. 2006, 78, 779–787.
[76] Eanes, R. C., Marcus, R. K., Spectrochim. Acta B. At. Spectrosc. 2000, 55B, 403–428.
[77] Hermansson, M., Uphoff, A., Kakela, R., Somerharju, P., Anal. Chem. 2005, 77, 2166–2175.
[78] Leptos, K. C., Sarracino, D. A., Jaffe, J. D., Krastins, B., Church, G. M., Proteomics 2006, 6, 1770–1782.
[79] Windig, W., Phalp, J. M., Payne, A. W., Anal. Chem. 1996, 68, 3602–3606.
[80] Windig, W., Smith, W. F., J. Chromatogr. A 2007, 1158, 251–257.
[81] Gika, H. G., Macpherson, E., Theodoridis, G. A., Wilson, I. D., J. Chromatogr. B 2008, 871, 299–305.
[82] Lange, E., Tautenhahn, R., Neumann, S., Gropl, C., BMC Bioinformatics 2008, 9, 375.
[91] De Souza, D. P., Saunders, E. C., McConville, M. J., Likic, V. A., Bioinformatics 2006, 22, 1391–1396.
[92] Shen, H. L., Grung, B., Kvalheim, O. M., Eide, I., Anal. Chim. Acta 2001, 446, 313–328.
[93] Andreev, V. P., Rejtar, T., Chen, H. S., Moskovets, E. V., Ivanov, A. R., Karger, B. L., Anal. Chem. 2003, 75, 6314–6326.
[94] Scholz, M., Gatzek, S., Sterling, A., Fiehn, O., Selbig, J., Bioinformatics 2004, 20, 2447–2454.
[95] Bijlsma, S., Bobeldijk, L., Verheij, E. R., Ramaker, R., Kochhar, S., Macdonald, I. A., van Ommen, B., Smilde, A. K., Anal. Chem. 2006, 78, 567–574.
[96] Wu, L., Mashego, M. R., van Dam, J. C., Proell, A. M., Vinke, J. L., Ras, C., van Winden, W. A., van Gulik, W. M., Heijnen, J. J., Anal. Biochem. 2005, 336, 164–171.
[97] Birkemeyer, C., Luedemann, A., Wagner, C., Erban, A., Kopka, J., Trends Biotechnol. 2005, 23, 28–33.
[107] Bro, R., Smilde, A. K., J. Chemometrics 2003, 17, 16–33.
[108] Jackson, J. E., A User’s Guide to Principal Components, Wiley Interscience, New York 2004.
[109] Eriksson, L., Johansson, E., Kettaneh-Wold, N., Wold, S., Multi- and Megavariate Data Analysis: Principles and Applications, Umetrics Academy, Umea, Sweden 2002.
[110] Keun, H. C., Ebbels, T. M. D., Antti, H., Bollard, M. E., Beckonert, O., Holmes, E., Lindon, J. C., Nicholson, J. K., Anal. Chim. Acta 2003, 490, 265–276.
[111] Smilde, A. K., van der Werf, M. J., Bijlsma, S., van der Werff-van der Vat, B., Jellema, R. H., Anal. Chem. 2005, 77, 6729–6736.
[112] Kvalheim, O. M., Brakstad, F., Liang, Y. Z., Anal. Chem. 1994, 66, 43–51.
[113] Sokal, R. R., Rohlf, F. J., Biometry: The Principles and Practice of Statistics in Biological Research, W. H. Freeman and Company, New York 1995.
[114] Weckwerth, W., Morgenthal, K., Drug Discov. Today 2005, 10, 1551–1558.
[115] Broadhurst, D. I., Kell, D. B., Metabolomics 2006, 2, 171–196.
[116] Holmes, E., Antti, H., Analyst 2002, 127, 1549–1557.
[117] Massart, D. L., Vandeginste, B. G. M., Buydens, L. M. C., De Jong, S., Lewi, P. J., Smeyers-Verbeke, J. (Eds.), Handbook of Chemometrics and Qualimetrics, Part A, Elsevier, Oxford, UK 1997.
[118] Eriksson, L., Antti, H., Gottfries, J., Holmes, E., Johansson, E., Lindgren, F., Long, I., Lundstedt, T., Trygg, J., Wold, S., Anal. Bioanal. Chem. 2004, 380, 419–429.
[119] Hotelling, H., J. Educat. Psychol. 1933, 24, 417–441.
[120] Trygg, J., Gullberg, J., Johansson, A. I., Jonsson, P., Moritz, T., Biotechnol. Agri. Forestry 2006, 57, 117–128.
[121] Wiener, M. C., Sachs, J. R., Deyanova, E., Yates, N. A., Anal. Chem. 2004, 76, 6085–6096.
[122] Shaffer, J. P., Ann. Rev. Psychol. 1995, 46, 561–584.
[123] Schauer, N., Zamir, D., Fernie, A. R., J. Exp. Bot. 2005, 56, 297–307.
[124] Serkova, N. J., Jackman, M., Brown, J. L., Liu, T., Hirose, R., Roberts, J. P., Maher, J. J., Niemann, C. U., J. Hepatol. 2006, 44, 956–962.
[125] Pearson, K., Philos. Mag. 1901, 2, 559–572.
[126] Major, H. J., Williams, R., Wilson, A. J., Wilson, I. D., Rapid Commun. Mass Spectrom. 2006, 20, 3295–3302.
[127] Pierce, K. M., Hope, J. L., Hoggard, J. C., Synovec, R. E., Talanta 2006, 70, 797–804.
[128] Bylund, D., Samskog, J., Markides, K. E., Jacobsson, S. P., J. Am. Soc. Mass Spectrom. 2003, 14, 236–240.
[129] Belton, P. S., Colquhoun, I. J., Kemsley, E. K., Delgadillo, I., Roma, P., Dennis, M. J., Sharman, M., Holmes, E., Nicholson, J. K., Spraul, M., Food Chem. 1998, 61, 207–213.
[130] Jansen, J. J., Hoefsloot, H. C. J., Boelens, H. F. M., van der Greef, J., Smilde, A. K., Bioinformatics 2004, 20, 2438–2446.
[131] Chan, E. C. Y., Yap, S. L., Lau, A. J., Leow, P. C., Toh, D. F., Koh, H. L., Rapid Commun. Mass Spectrom. 2007, 21, 519–528.
[132] Eisen, M. B., Spellman, P. T., Brown, P. O., Botstein, D., Proc. Natl. Acad. Sci. USA 1998, 95, 14863–14868.
[133] Tikunov, Y., Lommen, A., De Vos, C. H. R., Verhoeven, H. A., Bino, R. J., Hall, R. D., Bovy, A. G., Plant Physiol. 2005, 139, 1125–1137.
[134] Beckonert, O., Bollard, M. E., Ebbels, T. M. D., Keun, H. C., Antti, H., Holmes, E., Lindon, J. C., Nicholson, J. K., Anal. Chim. Acta 2003, 490, 3–15.
[135] Wold, S., Johansson, E., Cocchi, M., 3D QSAR in Drug Design: Theory, Methods and Applications, Escom Science Publisher, Leiden 1993, pp. 523–550.