Metabolomics PCB 5530 Tom Niehaus Fall 2014
Dec 23, 2015
Learning Outcomes
- Learn the basics of metabolomics
- Understand the limitations of metabolomics
- Things to consider when using metabolomics for your own research
Day 1
• Lecture
Day 2
• Finish lecture
• Activity 1: Identifying an unknown peak
• Activity 2: Analyzing a metabolomics dataset
Definitions and Background
Metabolome = the total metabolite pool
• All low molecular weight (< 2000 Da) organic molecules in a sample such as a leaf, fruit, seedling, etc.
Sugars, nucleosides, organic acids, ketones, aldehydes, amines, amino acids, small peptides, lipids, steroids, terpenes, alkaloids, drugs (xenobiotics)
Metabolomics = high-throughput analysis of metabolites
Definitions and Background
Metabolomics is the simultaneous measurement of the levels of a large number of cellular metabolites (typically several hundred). Many of these are not identified (i.e., they appear only as peaks in a profile).
• Not hypothesis driven
• Provides a snapshot of metabolite levels at the time of sampling
Definitions and Background
Metabolomics: measure many compounds
Metabolic profiling: measure a set of related compounds
Targeted analysis: measure a specific compound
(Scope is greatest for metabolomics; accuracy is greatest for targeted analysis.)
Definitions and Background
History and Development
• Metabolic profiling is not new. Profiling of blood and urine samples for the clinical detection of human disease has been carried out for centuries.
This urine wheel was published in 1506 by Ullrich Pinder, in his book Epiphanie Medicorum. The wheel describes the possible colors, smells and tastes of urine, and uses them to diagnose disease.
Nicholson, J. K. & Lindon, J. C. Nature 455, 1054–1056 (2008).
Definitions and Background
History and Development
• Advanced chromatographic separation techniques were developed in the late 1960s.
• Linus Pauling published “Quantitative Analysis of Urine Vapor and Breath by Gas-Liquid Partition Chromatography” in 1971.
• Chuck Sweeley at MSU helped pioneer metabolic profiling using gas chromatography/mass spectrometry (GC-MS).
• Plant metabolic biochemists (e.g. Lothar Willmitzer) were also among the early leaders in the field.
• Metabolomics is expanding to catch up with other multiparallel analytical techniques (transcriptomics, proteomics) but remains far less developed and less accessible.
Definitions and Background
Plant Metabolome Size
• An estimated 90,000 to 200,000 compounds occur across all plant species.
• Each individual plant species contains about 5,000 to 30,000 compounds (e.g. ~5,000 in Arabidopsis).
The plant metabolome is much larger than that of yeast, where there are far fewer metabolites than genes or proteins (<600 metabolites vs. 6000 genes). The size of the plant metabolome reflects the vast array of plant secondary compounds. This makes metabolic profiling in plants much harder than in other organisms.
Definitions and Background
The Power of Metabolomics
Silent Knockout Mutations.
~90% of Arabidopsis knockout mutations are silent – i.e. have no visible phenotype and so provide no clues to gene function. (The search for some sort of visible phenotype therefore often becomes desperate.) The situation in yeast is similar – up to 85% of yeast genes are not needed for survival.
When a knockout mutant shows little or no change in growth rate (visible phenotype), metabolite pool sizes have shifted to compensate for the effect of the mutation, leaving metabolic fluxes unchanged. Thus, intuitively, mutations that are silent when scored for metabolic fluxes or growth rate (growth rate is the sum of all metabolic fluxes) should have obvious effects on metabolite levels. There is a firm theoretical basis for this in metabolic control analysis (MCA).
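To make the MCA argument concrete, here is a minimal Python sketch of a hypothetical two-step pathway S → X → P. It assumes the first step is saturated (so it delivers a constant flux `v_in`) and the second step is first-order in X (`v2 = k2 * X`); both rate laws and all parameter values are illustrative assumptions, not measured data. Under these assumptions all flux control sits in step 1, so a large loss of enzyme 2 activity leaves the flux unchanged while the pool of X rises.

```python
"""Toy steady-state model: a knockdown that is silent at the flux level still shifts a metabolite pool.

Hypothetical pathway S -> X -> P (assumed kinetics):
  step 1 is saturated (zero-order) and delivers a constant flux v_in;
  step 2 is first-order in X, v2 = k2 * X.
At steady state v_in = k2 * X, so X = v_in / k2 and the pathway flux J = v_in.
"""


def steady_state(v_in: float, k2: float) -> tuple[float, float]:
    """Return (pathway flux J, steady-state pool size of X)."""
    x = v_in / k2      # pool of X needed for step 2 to carry the incoming flux
    flux = k2 * x      # flux through step 2 equals the supply at steady state
    return flux, x


wild_type = steady_state(v_in=1.0, k2=2.0)
knockdown = steady_state(v_in=1.0, k2=0.2)  # hypothetical 90% loss of enzyme 2 activity

flux_ratio = knockdown[0] / wild_type[0]
pool_ratio = knockdown[1] / wild_type[1]
print(f"flux ratio (knockdown/WT): {flux_ratio:.2f}")  # growth-like readout: unchanged
print(f"[X] ratio  (knockdown/WT): {pool_ratio:.2f}")  # metabolite readout: 10-fold higher
```

In MCA terms, enzyme 2 in this toy model has a flux control coefficient of 0 but a concentration control coefficient on X of -1: halving `k2` doubles the steady-state level of X.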
Definitions and Background
The Power of Metabolomics
Example.
• In the Chloroplast 2010 project (phenotype analysis of knockouts of Arabidopsis genes encoding predicted chloroplast proteins):
• Various knockouts showed essentially normal growth and color but highly abnormal free amino acid profiles, e.g. At1g50770 (‘Aminotransferase-like’)
Definitions and Background
Limitations of metabolomics
• High biological variance in metabolite levels (i.e., high variation between genetically identical plants grown in the same conditions)
• Unlike nucleic acids and proteins, metabolites have a vast range of chemical structures and properties. Their molecular weights span two orders of magnitude (20–2000 Da). Therefore no single extraction or analysis method works for all metabolites, whereas DNA sequencing, microarrays, and MS analysis of proteins are all general methods.
• Metabolite concentrations span a huge range, from mM to pM (about nine orders of magnitude).
• Some metabolites are labile and won’t survive extraction and analysis
• Issues with chromatography, detection, and data analysis
Metabolomics
Steps in metabolomics
1) Sample preparation
2) Sample extraction
3) Chromatography
4) Detection
5) Data analysis
Sample Preparation
Growth/Sample Size
• Grow organisms (e.g. plants or bacteria) under identical conditions
• Randomize the treatment groups
(Make sure the effects you measure are due to the variable being tested)
• The number of replicates depends on what you want to find:
- Large differences = small replication needed
- Small differences = large replication needed
• In general, about six replicates per treatment are needed (due to high biological variability)
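The trade-off between effect size and replication can be made concrete with a standard power calculation. This Python sketch uses the normal-approximation formula for a two-sided, two-group comparison at alpha = 0.05 and 80% power; the effect sizes are illustrative assumptions (difference in means divided by the biological standard deviation), and the function name is hypothetical.

```python
"""Rough replicate estimate for comparing two treatment groups (normal approximation).

n per group = 2 * (z_(1 - alpha/2) + z_(power))**2 / d**2
where d = (difference in means) / (standard deviation).
"""
import math
from statistics import NormalDist


def replicates_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)                  # replicates come in whole numbers


for d in (2.0, 1.0, 0.5):  # large, medium, small difference relative to biological noise
    print(f"effect size {d}: ~{replicates_per_group(d)} replicates per group")
```

Halving the detectable difference roughly quadruples the replicates needed. The normal approximation underestimates slightly for small groups; t-distribution-based calculations give somewhat larger numbers.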
Sample Preparation
Sample collection
• Uniform sample sizes (e.g. hole punches in leaves)
• Be consistent: similar tissue, same time of day
• Quickly freeze sample in liquid nitrogen, store samples at -80°C
• Fast-harvesting method for bacteria (~30 sec)
Sample Extraction
Choosing an extraction method
• No universal extraction method exists
• Some solvents may degrade certain compounds
• It's good to have some idea of which metabolites you want to extract
Sample Extraction
Sample extraction
• The method should be consistent and reproducible
• Further workup may be required (e.g. solid phase extraction)
SPEX SamplePrep Grinder
Chromatography
Introduction
• Invented in 1900 by Mikhail Tsvet (used to separate plant pigments)
• There are several types of chromatography, but all consist of a stationary phase and a mobile phase. Compounds are separated based on differential partitioning between the two phases.
• Types include:
- TLC (thin-layer chromatography)
- GC (gas chromatography)
- LC (liquid chromatography)
GC and LC are routinely used in metabolomics
Chromatography
Gas Chromatography
• GC = ‘good chromatography’
• optimized over several decades
• ~5 columns routinely used
• high reproducibility
Limitations:
- high temperatures can destroy labile compounds
- polar compounds cannot ‘fly’ on GC columns and must first be derivatized
Example column stationary phase: 5% diphenyl/95% methyl siloxane
Identification based on retention time (RT)
Chromatography
Sample derivatization
Gas chromatography requires volatile compounds (two-step derivatization in the vial):
1) Methoximation of aldehyde and keto groups (primarily for opening the rings of reducing sugars)
2) Silylation of polar hydroxy, thiol, carboxy and amino groups with the silylation agent MSTFA
• A single compound with multiple active groups can result in multiple peaks (1TMS, 2TMS, 3TMS)
• GC-MS can distinguish between stereoisomers
Figure: EI mass spectra (m/z 80-500 vs. abundance) of the estrone minor peak (RI 950990) and major peak (RI 948753)
The Z/E isomers have the same mass spectrum but differ by 2 seconds in retention time.
Kind T, Wohlgemuth G, Lee do Y, Lu Y, Palazoglu M, Shahbaz S, Fiehn O. FiehnLib: mass spectral and retention index libraries for metabolomics based on quadrupole and time-of-flight gas chromatography/mass spectrometry. Anal Chem. 2009 Dec 15;81(24):10038-48. doi: 10.1021/ac9019522.
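Each derivatization step adds a fixed mass, which helps when matching GC-MS ions to a candidate structure. The Python sketch below assumes standard monoisotopic atomic masses and the usual net changes: methoximation turns C=O into C=N-OCH3 (net +CH3N, since water is released), and each TMS group replaces an active hydrogen with Si(CH3)3 (net +SiC3H8). Estrone (C18H22O2, one keto group and one phenolic OH) serves as the worked example; the helper names are hypothetical.

```python
"""Mass shifts from two-step GC derivatization (methoximation + trimethylsilylation)."""

# Monoisotopic atomic masses (u)
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "Si": 27.9769265325}


def formula_mass(counts: dict[str, int]) -> float:
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in counts.items())


MEOX_SHIFT = formula_mass({"C": 1, "H": 3, "N": 1})   # C=O -> C=N-OCH3, net +CH3N
TMS_SHIFT = formula_mass({"Si": 1, "C": 3, "H": 8})   # -H + Si(CH3)3, net +SiC3H8


def derivatized_mass(base_mass: float, n_meox: int, n_tms: int) -> float:
    """Mass after n_meox methoximations and n_tms silylations."""
    return base_mass + n_meox * MEOX_SHIFT + n_tms * TMS_SHIFT


estrone = formula_mass({"C": 18, "H": 22, "O": 2})
for n_tms in (0, 1):  # a partially silylated form would elute as a separate, lighter peak
    print(f"estrone, 1 MeOX, {n_tms} TMS: {derivatized_mass(estrone, 1, n_tms):.4f} u")
```

With one methoxime and one TMS group, estrone comes out at about 371.23 u (nominal mass 371).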
Chromatography
Liquid Chromatography
• LC = ‘Lousy chromatography’
• fairly new, with recent advances
• infinite solvent systems possible
• low reproducibility
• thousands of columns available: normal phase, ion exchange, reverse phase, HILIC
Advantages:
- compound can be collected after separation
- derivatization not necessary
- a separation protocol can be optimized for nearly any compound
Detection
Mass Spectrometry
• Mass spectrometry is a technique to measure the mass-to-charge ratio (m/z) of ions
• All mass spectrometers perform three main tasks:
1) Ionize molecules
2) Use electric and/or magnetic fields to accelerate ions and manipulate their flight
3) Detect ions (convert them to an electronic signal)
Detection
Mass Spectrometry
Example mass spectrum (figure: m/z vs. relative abundance)
Detection
Mass Spectrometry
Figure: GC-MS chromatogram (time [min], 0-22, vs. normalized intensity); a peak selector picks one chromatographic peak and displays its EI mass spectrum (m/z vs. normalized intensity)
Mass Spectrometry
Ionization: chemical vs. electron
Figure: caffeic acid (RI 745130) spectra acquired with chemical ionization (+) and electron ionization (+), m/z 60-480, with [M+H]+, [M+28]+ and [M+40]+ labeled; inset: TOF MS CI+ spectrum (m/z 394-406) with [M+H]+ at m/z 397.1690 and neighboring peaks at 396.1648, 398.1729 and 399.1714
Acquisition settings: 70 eV; 500 µA emission; 40% CI gas; mass range 65-800; scan rate 0.2-0.03; source temperature 200 °C; PushInter 40
• [M+H]+ is very abundant in chemical ionization (CI)
• Different ionization gases can be used, such as NH3, methane, and butane
In the example spectrum, adduct ions at M+28.02=[M+C2H5]+ and M+40.04=[M+C3H5]+ are used for verification of [M+H]+
Accurate mass [u]: 397.1690; mass accuracy [ppm]: 5; isotopic abundance error [%]: 5
A+1 [%]: 37.90; A+2 [%]: 17.84; A+3 [%]: 5.03
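The ethyl and allyl adducts typical of methane CI sit a fixed distance above [M+H]+: [M+C2H5]+ is heavier by a neutral C2H4 and [M+C3H5]+ by a neutral C3H4 (the electron masses cancel). This Python sketch computes where those confirmatory ions should appear for an [M+H]+ at m/z 397.1690, assuming standard monoisotopic masses; the function name is hypothetical.

```python
"""Expected methane-CI adduct ions relative to an observed [M+H]+ peak."""

H, C = 1.00782503207, 12.0  # monoisotopic atomic masses (u)

# [M+C2H5]+ and [M+C3H5]+ differ from [M+H]+ by neutral C2H4 and C3H4, respectively
OFFSETS = {"[M+C2H5]+": 2 * C + 4 * H, "[M+C3H5]+": 3 * C + 4 * H}


def expected_adducts(mh: float) -> dict[str, float]:
    """m/z values at which the confirmatory adducts should appear."""
    return {name: mh + delta for name, delta in OFFSETS.items()}


for name, mz in expected_adducts(397.1690).items():
    print(f"{name}: m/z {mz:.4f}")
```

Finding peaks at both expected positions (within the instrument's mass accuracy) supports assigning the m/z 397.1690 ion as [M+H]+ rather than a fragment.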
Adduct formation – expect the unexpected
Adduct ion Percent [%] Adduct ion Percent [%] Adduct ion Percent [%] Adduct ion Percent [%] Adduct ion Percent [%]
[M+H]+ 62.55381 [M+H-C3H8O]+ 0.02667 [M-CCl3]+ 0.00381 [M(37Cl)]+. 0.00190 [M-2H+Na]- 0.00127
[M+2H]2+ 11.44459 [M-H-H2O-CO2]- 0.02667 [M-H-CO2]- 0.00381 [M-CH3]+ 0.00190 [M-H+Co]+ 0.00127
[M+H-H2O]+ 8.77598 [M-H-H2O-HCO2H]- 0.02667 [M+H-C5H7PO6]+ 0.00381 [M+H-C4H11N]+ 0.00190 [M+H-(CH3)2NH-C3H6]+ 0.00127
[M-H]- 6.25214 [M+H-3H2O]+ 0.02540 [M+H-HCl]+ 0.00381 [M+H-NO2-CHO]+ 0.00190 [M+H-C10H6(OH)N]+ 0.00127
[M+Na]+ 5.51055 [M+H-CHN]+ 0.02540 [M+H-C12H12N2O3]+ 0.00381 [M-H-HF]- 0.00190 [M-H+Ni]+ 0.00127
[M+H-NH3]+ 1.19494 [M+K-3H]2- 0.01905 [M+H-CH3CO2H]+ 0.00381 [M(37Cl)+H]+ 0.00190 [M-H-H2O-C4H7CO2H]- 0.00127
[M+NH4]+ 0.73715 [M+H-(CH3)2NH]+ 0.01524 [M+H-CH3]+. 0.00381 [M-H-C6H10O5]- 0.00190 [M+H-OH]+ 0.00127
[M-H-H2O]- 0.34604 [M+H-CHNO]+ 0.01333 [M+H-H2]+ 0.00381 [M+H-H2O-C6H13N]+ 0.00190 [M(81Br)+H]+... 0.00127
[M-H+2Na]+ 0.32953 [M+H-C2H6O]+ 0.01333 [M+H-C3H8NO6P]+ 0.00317 [M+H-H2O-H3PO4]+ 0.00190 [M-H-CH2O-CH2NH]- 0.00127
[M-H+H2O]- 0.24508 [M+H-CH4O]+ 0.01270 [M+H-C5H14NO4P]+ 0.00317 [M+H-C5H7PO6-NH3]+ 0.00190 [M+H-CO-CONH]+ 0.00127
[M+NH4-H2O]+ 0.22984 [M+H-C7H13NO3]+ 0.01143 [M+Li-(CH3)3N]+ 0.00317 [M-H-C5H7PO6]- 0.00190 [M-H-CONH]- 0.00127
[M+H+H2O]+ 0.19429 [M+Na-2H]- 0.00952 [M+Li-C5H14NO4P]+ 0.00317 [M+H-H2S]+ 0.00190 [M+H-C3H4O2]+ 0.00127
[M+H+Na]2+ 0.18286 [M-H-CH2O]- 0.00952 [M+Cl]- 0.00317 [M+H-H2O-C8H8]+ 0.00190 [M+H-C3H6O4]+ 0.00127
[M+H+K]2+ 0.17524 [M+H-C11H12N2O3]+ 0.00952 [M(35Cl)-H]- 0.00317 [M+H-H2O-NH3-C8H8]+ 0.00190 [M+Na-H2S]+ 0.00127
[M-2H]2- 0.13968 [M+H-C13H16N3O4]+ 0.00952 [M(37Cl)-H]- 0.00317 [M+H-H2O-NH3-C8H8-CO]+ 0.00190 [M-H+2Na-H2S]+ 0.00127
[M+2Na]2+ 0.13778 [M+H-C17H25N3O4]+ 0.00952 [M-H-C5H7O6P]- 0.00317 [M+H-H2O-NH3]+ 0.00190 [M-C5H5Cl]+ 0.00127
[M+2H-NH3]2+ 0.13714 [M+CH3CO2]- 0.00889 [M+H-C3H7O5P]+ 0.00317 [M+H-C3H6]+ 0.00190 [M+H-N2]+ 0.00127
[M+K]+ 0.13651 [M-H2O+Na]+ 0.00825 [M-H-C6H6N8O]- 0.00317 [M+HCO2-320]- 0.00190 [M+H-H2O-CO]+ 0.00127
[M+H-2H2O]+ 0.11810 [M-H+NH3]- 0.00762 [M(81Br)+H]+ 0.00317 [M+H-C3H7N]+ 0.00190 [M-H-H3PO4]- 0.00127
[M+3H]3+ 0.06667 [M+H-C9H9NO]+ 0.00762 [M-C4H9]+ 0.00317 [M-H-H2]- 0.00190 [M+H+CH3CN]+ 0.00127
[M+2H-H2O]2+ 0.06476 [M+H-C15H21N2O3]+ 0.00762 [M-2H+3Li]+ 0.00254 [M-H-C16H30O-H2O]- 0.00190 [M+H-C4H6]+ 0.00127
[M]+. 0.05905 [M-2H+3Na]+ 0.00698 [M-H-HCl]- 0.00254 [M-H-CH4O]- 0.00190 [M+H-CH3OH]+ 0.00127
[M+2Na-H]+ 0.05143 [M+HCO2]- 0.00635 [M+2Li-H]+ 0.00254 [M+H-C10H8FN3]+ 0.00127 [M+H-HCCl3]+ 0.00127
[M-H+2K]+ 0.05079 [M+H-NO2]+ 0.00571 [M+H-C8H10O2]+ 0.00254 [M+Li-C3H5NO2]+ 0.00127 [M+H-C2H3N3]+ 0.00127
[M+H-CO]+ 0.04635 [M+H-C6H13NO2]+ 0.00571 [M+H-C2Cl4]+ 0.00254 [M+Li-H3PO4]+ 0.00127 [M+H-C3H6O2]+ 0.00127
[M+H-CO2]+ 0.04318 [M-H-C3H5NO2]- 0.00508 [M-H-C7H5NO]- 0.00254 [M-2H+3Li-C15H31CO2H]+ 0.00127 [M+H-CH2Cl2O]+ 0.00127
[M+H-CH2O2]+ 0.03810 [M(81Br)-H]- 0.00508 [M+H-C5H11N]+ 0.00254 [M-2H+3Na-C3H5NO2]+ 0.00127 [M(356)+H-HCl]+ 0.00127
[M-H-NH3]- 0.03746 [M+H-HCO2H]+ 0.00508 [M+Ba-H]+ 0.00254 [M-2H+Na+Co]+ 0.00127 [M-C4H4O4S]+ 0.00127
[M.Cl]- 0.03556 [M-2H+Li]- 0.00444 [M+H-C14H25NO3]+ 0.00254 [M-2H+Li-C3H5NO2]- 0.00127 [M+H-C8H14O3]+ 0.00127
[M+Li]+ 0.03111 [M+H-CH4]+ 0.00444 [M+H-C6H5NO2S]+ 0.00254 [M-2H+Li-C16H30O]- 0.00127 [M+H-C2H4]+ 0.00127
Statistics: adducts in the NIST12 MS/MS DB (80,000 spectra)
Most common adducts for LC-MS: [M+H]+, [M+Na]+, [M+NH4]+, [M+acetate]-
…around 290 different adducts in total
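Common adduct m/z values follow directly from the neutral monoisotopic mass by adding or removing the charged species. A minimal Python sketch, assuming singly charged ions and standard monoisotopic masses; it covers only a handful of adducts from the table, computes the acetate adduct as the negative ion [M+CH3CO2]-, and uses sucrose (342.116211 u) as the example input. The function name is hypothetical.

```python
"""m/z of common singly charged LC-MS adducts from a neutral monoisotopic mass."""

PROTON = 1.00727646688
ELECTRON = 0.00054857991
NA, N, H, C, O = 22.9897692809, 14.0030740048, 1.00782503207, 12.0, 15.99491461956

# Mass added to (or removed from) M for each adduct; the ion charge is in the name
ADDUCT_SHIFT = {
    "[M+H]+": PROTON,
    "[M+Na]+": NA - ELECTRON,                          # Na+ cation
    "[M+NH4]+": N + 4 * H - ELECTRON,                  # NH4+ cation
    "[M-H]-": -PROTON,
    "[M+CH3CO2]-": 2 * C + 3 * H + 2 * O + ELECTRON,   # acetate anion
}


def adduct_mz(neutral_mass: float) -> dict[str, float]:
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFT.items()}


sucrose = 342.116211  # monoisotopic mass of C12H22O11 (u)
for name, mz in adduct_mz(sucrose).items():
    print(f"{name:12s} {mz:.6f}")
```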
Mass Spectrometry
Mass Spectrometers
• There are several types of mass spectrometers:
- TOF (time of flight)
- Q, QQQ (quadrupole)
- Ion Trap
- Orbitrap
- FTICR (Fourier transform ion cyclotron resonance)
Figure: Quad TOF
Mass Spectrometry
Definitions and concepts
• isomers - compounds with the same chemical formula
e.g. propanol and isopropanol (C3H8O)
C8H10N2O has 100,082,479 isomers
• isobars - compounds with the same nominal mass but different exact masses
e.g. CO (27.9949) and C2H4 (28.0313)
• isotopes - atoms of the same element with different numbers of neutrons in their nuclei
e.g. 12C vs 13C
Mass Spectrometry
Definitions and concepts
• Resolution (resolving power): RP(FWHM) = measured mass / peak width at 50% peak intensity
• Accuracy: difference between the true mass and the measured mass
• Mass range: range of ions that can be detected (typically m/z 50-1000)
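The resolution and accuracy definitions reduce to simple ratios. This Python sketch computes RP(FWHM), a rough estimate of the resolving power needed to separate the CO/C2H4 isobar pair (mean mass divided by mass difference), and a mass error in ppm. The 0.01 u peak width and the measured value 343.1241 are hypothetical inputs; 343.123487 is the sucrose [M+H]+ value used in Activity 1.

```python
"""Resolving power, approximate RP needed for an isobaric pair, and mass error in ppm."""


def resolving_power(mz: float, fwhm: float) -> float:
    """RP(FWHM) = measured mass / peak width at 50% of peak height."""
    return mz / fwhm


def required_rp(m1: float, m2: float) -> float:
    """Rough RP needed to separate two peaks: mean mass / mass difference."""
    return ((m1 + m2) / 2) / abs(m2 - m1)


def ppm_error(measured: float, true: float) -> float:
    """Mass accuracy expressed in parts per million."""
    return (measured - true) / true * 1e6


print(f"RP of a 0.01 u wide peak at m/z 500: {resolving_power(500.0, 0.01):.0f}")
print(f"RP to separate CO and C2H4: {required_rp(27.9949, 28.0313):.0f}")
print(f"hypothetical 343.1241 vs true 343.123487: {ppm_error(343.1241, 343.123487):.2f} ppm")
```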
Mass Spectrometry
Why is resolution important?
• High resolution is needed to determine the accurate mass
• High resolution is also needed to determine accurate isotopic patterns
• Note:
- monoisotopic vs. average mass
- accurate mass can distinguish isobars, but not isomers
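The monoisotopic/average distinction is easy to see numerically. This Python sketch computes both masses for sucrose (C12H22O11), assuming standard monoisotopic masses for the lightest isotopes and conventional rounded average atomic weights (C 12.011, H 1.008, O 15.999).

```python
"""Monoisotopic vs. average mass of a molecular formula."""

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}  # lightest isotopes only
AVERAGE = {"C": 12.011, "H": 1.008, "O": 15.999}                     # natural isotope mixture


def mass(formula: dict[str, int], table: dict[str, float]) -> float:
    return sum(table[el] * n for el, n in formula.items())


sucrose = {"C": 12, "H": 22, "O": 11}
print(f"monoisotopic: {mass(sucrose, MONOISOTOPIC):.6f}")  # the all-12C/1H/16O ion seen by accurate-mass MS
print(f"average:      {mass(sucrose, AVERAGE):.3f}")       # molecular weight used for weighing bulk material
```

The two differ by about 0.18 u here, far larger than a ppm-level mass window, so accurate-mass searches must use the monoisotopic value.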
Mass Spectrometry
Definitions and concepts
• Dynamic range - the concentration range over which a linear response is obtained
Determines the capability of an instrument to do quantitative analysis
• Sensitivity - the lowest amount an instrument can detect
• Speed - the number of spectra or scans that can be acquired in one second
1 scan/sec = very slow; 500 scans/sec = very fast
• Matrix effects - signal is suppressed due to a complex sample or other unknown processes
Mass Spectrometry
Why is high speed important?
In order to deconvolute (separate/clean) overlapping peaks, enough mass spectra have to be acquired to perform the mathematical calculations. With only one spectrum per second this is impossible. That requires:
a) fast scanning detectors like time-of-flight (TOF)
b) fast data acquisition hardware/software (DAC/ADC)
The LECO TOF can acquire up to 500 mass spectra per second. For GC-MS, 20 spectra/second are sufficient; comprehensive GC (GCxGC) needs up to 200 spectra/second.
Source: LECO ChromaTOF Helpfile
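The speed requirement comes from simple arithmetic: spectra across a peak = peak width × acquisition rate. In this Python sketch the peak widths (about 3 s for a one-dimensional GC peak, about 0.15 s for a GCxGC peak) are illustrative assumptions, not LECO specifications.

```python
"""How many mass spectra are acquired across one chromatographic peak."""


def spectra_per_peak(peak_width_s: float, spectra_per_s: float) -> float:
    return peak_width_s * spectra_per_s


# Assumed, illustrative peak widths (seconds at base) and acquisition rates
cases = {
    "GC-MS, 3 s peak, 1 spectrum/s": (3.0, 1.0),
    "GC-MS, 3 s peak, 20 spectra/s": (3.0, 20.0),
    "GCxGC, 0.15 s peak, 20 spectra/s": (0.15, 20.0),
    "GCxGC, 0.15 s peak, 200 spectra/s": (0.15, 200.0),
}
for label, (width, rate) in cases.items():
    print(f"{label}: {spectra_per_peak(width, rate):.0f} spectra")
```

With only a few spectra across a peak, deconvolution has too few points to separate co-eluting compounds; the much narrower GCxGC peaks are why that technique needs roughly ten times the acquisition rate of ordinary GC-MS.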
Mass Spectrometry
Properties of various mass spectrometers
Property | TOF | Quad | Ion Trap | Orbitrap | FT-ICR
Resolving power | very good | fair | fair | very good | excellent
Dynamic range | very good | excellent | fair | fair | fair
Sensitivity | excellent | excellent | excellent | excellent | excellent
Speed | excellent | good | excellent | good | fair
Cost | 150-300K | 100K | 100K | 500K | 1M
Maintenance | average | average | average | average | very high
Data Analysis
Goals
• Identify all peaks
In practice this is very difficult, if not impossible
• Quantification or semi-quantification of compounds
Often involves comparing fold changes between samples or groups of samples, e.g. wild-type vs. knockout plant
Various statistical methods to look for differences between treatment groups, e.g. PCA, MCA, ANOVA
• Huge data files
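Fold changes are usually compared on a log2 scale so that increases and decreases are symmetric. This Python sketch computes the log2 fold change and Welch's t statistic for a single metabolite across two groups of six replicates. The peak intensities are made-up numbers for illustration; converting t to a p-value, and correcting for testing hundreds of metabolites at once, would need a statistics library.

```python
"""Log2 fold change and Welch's t statistic for one metabolite (hypothetical data)."""
import math
from statistics import mean, variance

# Hypothetical peak intensities, six biological replicates per group
wild_type = [1020.0, 980.0, 1100.0, 950.0, 1050.0, 1000.0]
knockout = [2100.0, 1900.0, 2300.0, 1800.0, 2050.0, 2150.0]


def log2_fold_change(treated: list[float], control: list[float]) -> float:
    return math.log2(mean(treated) / mean(control))


def welch_t(a: list[float], b: list[float]) -> float:
    """Difference in means scaled by the unpooled standard error (Welch's t)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


print(f"log2 fold change (KO/WT): {log2_fold_change(knockout, wild_type):.2f}")
print(f"Welch t statistic:        {welch_t(knockout, wild_type):.1f}")
```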
Data Analysis
Identifying peaks
• MS libraries can identify peaks (mostly GC/MS), especially when combined with RT information (GC/MS only), e.g. the NIST library
Data Analysis
Activity 1: Identifying peaks
• Can you find sucrose in an MS dataset?
Example: sucrose (C12H22O11)
Data Analysis
Activity 1: Identifying peaks
• Accurate mass can help determine the chemical formula:
Example: sucrose (C12H22O11)
- Determine the monoisotopic mass at http://www.chemspider.com/ (342.116211 Da)
- Determine [M+H]+ from the MS adduct Excel sheet (class website) (343.123487 Da)
Let's say you find that mass in the dataset, but is it really sucrose?
- Download the Molecular Weight Calculator at http://www.alchemistmatt.com/mwtwin.html
- Open the formula finder under Tools
- Enter the molecular weight target: 342.116211
- How many isobars are within 2 ppm? Within 0.1 ppm?
- Enter 342.116211 at ChemSpider: how many isomers?
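The formula-finder step can be sketched as a brute-force search: enumerate CcHhNnOo compositions, keep those whose monoisotopic mass falls inside a ppm window around the target, and discard impossible neutral molecules using the ring-plus-double-bond equivalent (RDBE). This Python sketch is a simplified stand-in, not the Molecular Weight Calculator's actual algorithm; the element ranges are assumptions, only C, H, N and O are considered, and real formula finders typically add more elements and chemical rules.

```python
"""Brute-force formula finder: CHNO compositions within a ppm window of a target mass."""

# Monoisotopic atomic masses (u)
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_name(c: int, h: int, n: int, o: int) -> str:
    """Formula string in C, H, N, O order; elements with a zero count are omitted."""
    parts = [("C", c), ("H", h), ("N", n), ("O", o)]
    return "".join(f"{el}{count}" for el, count in parts if count)


def find_formulas(target: float, ppm: float, max_c: int = 30,
                  max_n: int = 6, max_o: int = 20) -> list[tuple[str, float]]:
    """Return (formula, error in ppm) for neutral CHNO formulas matching the target mass."""
    tol = target * ppm * 1e-6  # convert the ppm window to an absolute tolerance (u)
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                # For ppm-scale windows only the nearest hydrogen count can match
                h = round((target - heavy) / MASS["H"])
                if h < 0:
                    continue
                error = heavy + h * MASS["H"] - target
                rdbe = c - h / 2 + n / 2 + 1  # rings plus double bonds
                # Keep plausible neutral molecules: RDBE >= 0 and a whole number
                if abs(error) <= tol and rdbe >= 0 and rdbe == int(rdbe):
                    hits.append((formula_name(c, h, n, o), error / target * 1e6))
    return hits


for window in (2.0, 0.1):
    hits = find_formulas(342.116211, window)
    print(f"within {window} ppm: {len(hits)} candidates")
    for formula, err in hits:
        print(f"  {formula}  ({err:+.2f} ppm)")
```

Narrowing the window from 2 ppm to 0.1 ppm shrinks the candidate list, but it can never separate isomers such as sucrose and other C12H22O11 disaccharides.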
Data Analysis
Example output of a metabolomics experiment
• Open the GC-TOF-MS dataset from the class website:
- How many compounds are identified? How many significant fold changes?
- Run a pathway analysis at http://www.metaboanalyst.ca/MetaboAnalyst/
- Enter compound names or KEGG IDs for the significant fold changes
- Choose organism ‘E. coli’ and submit
- Which pathways are affected in this dataset?
• Open the HILIC-TOF-MS dataset from the class website:
- How many compounds are identified? How many significant fold changes?
- How many unidentified peaks are there?
- Can you identify an unknown peak with a significant fold change?