Pre-processing of LC-MS Untargeted Metabolomics Data

Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

July 24, 2013

Outline

•  Introduction
•  Basic steps of data pre-processing
•  Software tools
•  Databases

Raw LC-MS data

Raw LC-MS data

TIC

MS and MS2 spectra

Goals of pre-processing

•  Extract qualitative and quantitative information about possible metabolites
–  Determine the identity
–  Estimate the relative abundance
•  Align samples to correct retention time shifts
•  Produce a table of possible metabolites with their quantitative information for subsequent statistical analysis

•  Feature: a 3D signal induced by a single ion species (e.g. [M+H]+ or [M-H]- of a compound)

Work flow

raw MS data → feature detection → feature filtering → feature grouping → feature annotation → alignment → stats → feature identification

Features

http://www.spectrolyzer.com/fileadmin/manual/html/Manual233.png

Features

Feature detection

•  Achieves suppression of noise, metabolite ID and quantification
•  Two steps
–  Separation of mass traces
•  Binning
•  Region of interest (ROI)
–  Detection of chromatographic features
•  Binning
–  Partition the mass-vs-RT map into bins of fixed width
–  Difficult to estimate optimal bin width
•  Too small → split features
•  Too wide → possible feature merging

Binning

•  Simple method: partition into bins of fixed width (e.g., 0.025 m/z)
•  Bins should be on the order of the mass accuracy of the instrument

Step 1: Separation of mass signals

Tautenhahn, R.; Böttcher, C.; Neumann, S., Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 2008, 9, 504.
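The fixed-width binning described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the XCMS implementation: the function name `bin_mass_traces`, the `(mz, rt, intensity)` tuple layout, and the three data points are all hypothetical.

```python
from collections import defaultdict
import math

def bin_mass_traces(points, bin_width=0.025):
    """Group (mz, rt, intensity) points into fixed-width m/z bins.

    Returns a dict mapping bin index -> list of (rt, intensity) pairs
    sorted by retention time. Bin i covers [i * bin_width, (i + 1) * bin_width).
    """
    bins = defaultdict(list)
    for mz, rt, intensity in points:
        index = math.floor(mz / bin_width)
        bins[index].append((rt, intensity))
    return {index: sorted(trace) for index, trace in bins.items()}

# Hypothetical data: the first two points plausibly belong to the same ion,
# the third has an m/z about 0.03 higher.
points = [
    (180.0605, 10.0, 500.0),
    (180.0610, 11.0, 900.0),
    (180.0915, 10.5, 300.0),
]
traces = bin_mass_traces(points, bin_width=0.025)
```

With the 0.025 m/z width the first two points share one bin and the third gets its own. Calling the same function with `bin_width=1.0` puts all three points in a single bin (feature merging), while `bin_width=0.0003` separates the first two points (a split feature), which is the trade-off the previous slide describes.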

ROI

Detect chromatographic peaks

•  Use wavelet transform
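As a rough illustration of wavelet-based peak detection, the sketch below correlates a chromatogram with a Mexican-hat (Ricker) wavelet at a single scale and reports local maxima of the coefficients. It is a simplification under stated assumptions: the slide does not specify the wavelet or the scales, published methods typically scan several scales, and the function names, the threshold `min_coeff`, and the synthetic chromatogram are all hypothetical.

```python
import math

def ricker(half_width, scale):
    """Sampled Mexican-hat (Ricker) wavelet on [-half_width, half_width]."""
    return [
        (1.0 - (t / scale) ** 2) * math.exp(-(t ** 2) / (2.0 * scale ** 2))
        for t in range(-half_width, half_width + 1)
    ]

def cwt_single_scale(signal, scale):
    """Correlate the signal with a Ricker wavelet at one scale (zero-padded edges)."""
    half_width = int(5 * scale)
    wavelet = ricker(half_width, scale)
    n = len(signal)
    coeffs = []
    for i in range(n):
        total = 0.0
        for k, w in enumerate(wavelet):
            j = i + k - half_width
            if 0 <= j < n:
                total += signal[j] * w
        coeffs.append(total)
    return coeffs

def find_peaks(signal, scale, min_coeff):
    """Indices where the wavelet coefficient is a local maximum above min_coeff."""
    c = cwt_single_scale(signal, scale)
    return [
        i for i in range(1, len(c) - 1)
        if c[i] > min_coeff and c[i] > c[i - 1] and c[i] >= c[i + 1]
    ]

# Hypothetical extracted ion chromatogram: flat baseline plus two Gaussian peaks.
eic = [
    50.0
    + 1000.0 * math.exp(-((i - 30) ** 2) / (2 * 3.0 ** 2))
    + 600.0 * math.exp(-((i - 70) ** 2) / (2 * 3.0 ** 2))
    for i in range(100)
]
peaks = find_peaks(eic, scale=3.0, min_coeff=500.0)
```

Because the wavelet has approximately zero mean, the constant baseline contributes almost nothing to the coefficients away from the edges, so the two peak apexes (scans 30 and 70) stand out without a separate baseline-subtraction step.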

Feature filtering and grouping

•  Feature quality measures
–  S/N
–  Feature width
–  Abundance
•  Feature grouping
–  Similarity measure: normalized dot product
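The normalized dot product named on this slide is the cosine of the angle between two intensity vectors. A minimal sketch follows; the assumption that the vectors are elution profiles sampled at the same retention times is ours (the slide only names the measure), and the three example profiles are made up.

```python
import math

def normalized_dot_product(a, b):
    """Cosine similarity between two equal-length intensity profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

# Hypothetical elution profiles sampled at the same retention times.
feature_a = [0.0, 10.0, 50.0, 100.0, 50.0, 10.0, 0.0]
feature_b = [0.0, 2.0, 10.0, 20.0, 10.0, 2.0, 0.0]   # same shape, 5x weaker
feature_c = [100.0, 50.0, 10.0, 0.0, 0.0, 0.0, 0.0]  # elutes earlier

same_shape = normalized_dot_product(feature_a, feature_b)
different = normalized_dot_product(feature_a, feature_c)
```

Because the measure is normalized, `feature_a` and `feature_b` score essentially 1 despite the fivefold intensity difference, while the early-eluting `feature_c` scores below 0.1. That scale invariance is what makes the measure usable for grouping ions of very different abundance.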

Feature annotation

Formula         N    Mass shift
[M+H]+          1      1.007276
[M+2H]2+        1      2.014552
[M+3H]3+        1      3.021828
[M+Na]+         1     22.98977
[M+K]+          1     38.963708
[M-C3H9N]+      1    -59.073499
[M+2Na-H]+      1     44.96563
[2M+Na]+        2     22.98977
[M+H-NH3]+      1    -16.01872
[2M+H]+         2      1.007276
[M-OH]+         1    -17.0028

m/z = (N × compound mass + mass shift) / CS, where CS is the charge state of the ion

•  Using the R package CAMERA
•  http://bioconductor.org/packages/devel/bioc/html/CAMERA.html

Ralf Tautenhahn, Christoph Böttcher and Steffen Neumann. Annotation of LC/ESI-MS Mass Signals. Bioinformatics Research and Development, Springer LNBI 4414, 2007.
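The formula m/z = (N × compound mass + mass shift) / CS can be checked with a short calculation. The sketch below is illustrative and does not call CAMERA: N and the mass shifts are copied from the table on this slide, the charge state CS is not listed there and is inferred from the ion type (an assumption), and glucose is an arbitrary example compound.

```python
# (name, N, mass shift, CS). N and mass shift come from the slide's table;
# CS is an assumption inferred from the ion type (2 for the doubly protonated ion).
ADDUCT_RULES = [
    ("[M+H]+", 1, 1.007276, 1),
    ("[M+2H]2+", 1, 2.014552, 2),
    ("[M+Na]+", 1, 22.98977, 1),
    ("[2M+H]+", 2, 1.007276, 1),
]

def adduct_mz(compound_mass, n, mass_shift, charge_state):
    """m/z = (N * compound mass + mass shift) / CS."""
    return (n * compound_mass + mass_shift) / charge_state

GLUCOSE_MASS = 180.06339  # monoisotopic mass of C6H12O6, example compound

expected_mz = {
    name: adduct_mz(GLUCOSE_MASS, n, shift, cs)
    for name, n, shift, cs in ADDUCT_RULES
}
for name, mz in expected_mz.items():
    print(f"{name:10s} {mz:.6f}")
```

Annotation runs this logic in reverse: if two co-eluting features differ by the mass difference between two rows of the table (for example, the gap between [M+Na]+ and [M+H]+), they are candidates for being adducts of the same compound.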

Alignment

•  Goal: Correct retention time shift from run to run

Smith, C. A.; Want, E. J.; O'Maille, G.; Abagyan, R.; Siuzdak, G., XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical chemistry 2006, 78 (3), 779-87.

Alignment

•  Alignment method in XCMS
–  Use “well-behaved” peak groups

–  For each well-behaved group, calculate the median retention time and, for every sample, a deviation from that median

–  Within a sample, the deviation generally changes over time in a nonlinear fashion.

–  Those changes are approximated using a local polynomial regression (loess).
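The steps listed on this slide can be sketched as follows. This is not the XCMS code: it is a pure-Python approximation in which a tricube-weighted local linear fit stands in for loess, and the function names, the `span` default, the sample names, and the retention times are all assumptions. The drift in the example is deliberately linear so the result is easy to verify; the same fit also follows smooth nonlinear drift.

```python
from statistics import median

def local_linear_fit(xs, ys, x0, span=0.75):
    """Tricube-weighted local linear regression evaluated at x0 (a basic loess)."""
    k = max(2, int(round(span * len(xs))))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth slightly beyond the k-th nearest point so all k get weight > 0.
    h = dists[k - 1] * 1.001 + 1e-12
    ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def align(groups):
    """groups: list of dicts, sample name -> RT of one well-behaved peak group.

    Returns, per sample, a function mapping an observed RT to a corrected RT.
    """
    medians = [median(g.values()) for g in groups]
    correctors = {}
    for sample in groups[0]:
        rts = [g[sample] for g in groups]
        # Deviation of this sample from the group median, one value per group.
        devs = [g[sample] - m for g, m in zip(groups, medians)]
        correctors[sample] = (
            lambda rt, rts=rts, devs=devs: rt - local_linear_fit(rts, devs, rt)
        )
    return correctors

# Hypothetical data: sample s3 elutes progressively later than s1 and s2.
true_rts = [60.0, 120.0, 200.0, 310.0, 420.0, 540.0]
groups = [{"s1": t, "s2": t, "s3": 1.01 * t + 2.0} for t in true_rts]
correct = align(groups)
corrected_s3 = [correct["s3"](g["s3"]) for g in groups]
```

After correction, the retention times of `s3` coincide with those of the other two samples. The returned functions can be applied to any peak in a sample, not only to the well-behaved groups used to fit the deviation curve.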

Alignment

Software tools

•  Proprietary

Vendor      Format                Viz           Software
Agilent     .d                    MassHunter    Mass Profiler Pro
AB Sciex    .wiff                               MarkerView
Waters      .raw                  MassLynx      MarkerLynx
Bruker      .d, YEP, BAF, FID     Compass       Metabolic Profiler
Thermo      .raw                  Xcalibur

•  Freely available
–  XCMS
–  MZmine
–  MAVEN
–  MetAlign

File format conversion

•  Convert files to mzXML using MSConvert

http://proteowizard.sourceforge.net/

Raw data visualization in Insilicos Viewer

XCMS

•  Free and open source
•  Written in R

Smith, C. A.; Want, E. J.; O'Maille, G.; Abagyan, R.; Siuzdak, G., XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical chemistry 2006, 78 (3), 779-87.

Tautenhahn, R.; Bottcher, C.; Neumann, S., Highly sensitive feature detection for high resolution LC/MS. BMC bioinformatics 2008, 9, 504.

Raw LC/MS data (NetCDF, mzXML, mzData)
→ Filter and identify peaks: xcmsSet()
→ Match peaks across samples: group()
→ Retention time correction: retcor()
→ Fill in missing peak data: fillPeaks()
→ Statistically analyze results: diffreport()
→ Visualize important peaks: getEIC()

Figure 1: Flow chart showing a high-level overview of the preprocessing/analysis methodology employed by xcms. Function/method names corresponding to each step are also given.

XCMS Online

Tautenhahn, R.; Patti, G. J.; Rinehart, D.; Siuzdak, G., XCMS Online: a web-based platform to process untargeted metabolomic data. Analytical chemistry 2012, 84 (11), 5035-9.

XCMS Online

•  Workflow

XCMS Online

XCMS Online

Feature identification

•  Current bottleneck of metabolomics
•  Use three pieces of information
–  molecular mass
–  retention time
–  MS/MS spectra

•  Match with metabolites in databases
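The first of the three pieces of information, molecular mass, is typically matched within a parts-per-million (ppm) tolerance. The sketch below shows the idea with a tiny in-memory table; it does not query HMDB, Metlin, or any real database API. The three-compound dictionary, the 10 ppm default, the function name, and the assumption that the observed ion is [M+H]+ are all illustrative choices.

```python
PROTON_MASS = 1.007276  # mass shift of [M+H]+

# Hypothetical mini-database: compound name -> monoisotopic neutral mass.
DATABASE = {
    "glucose": 180.06339,
    "caffeine": 194.08038,
    "tryptophan": 204.08988,
}

def match_by_mass(observed_mz, database, ppm_tol=10.0, mass_shift=PROTON_MASS):
    """Return (name, ppm error) for entries within ppm_tol of the neutral mass.

    Assumes a singly charged ion whose adduct mass shift is `mass_shift`.
    """
    neutral = observed_mz - mass_shift
    hits = []
    for name, mass in database.items():
        ppm_error = (neutral - mass) / mass * 1e6
        if abs(ppm_error) <= ppm_tol:
            hits.append((name, ppm_error))
    return sorted(hits, key=lambda hit: abs(hit[1]))

hits = match_by_mass(195.0877, DATABASE)
no_hits = match_by_mass(300.0, DATABASE)
```

Here the observed m/z of 195.0877 matches only caffeine. In a real database many compounds share the same formula and therefore the same mass, which is why retention time and MS/MS spectra are needed to narrow down the candidates.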

Feature identification

Courtesy of ASMS 2013 short course on metabolomics

Databases

•  Spectral databases
–  NIST
–  HMDB
–  Metlin
–  Fiehn GC-MS Database
–  BMRB
–  MMCD
–  MassBank
–  Golm Metabolome Database

•  Compound databases
–  PubChem
–  ChEBI
–  ChemSpider
–  KEGG Glycan
–  LIPID MAPS

•  Pathway databases
–  KEGG
–  MetaCyc
–  HumanCyc
–  BioCyc
–  Reactome

•  Drug databases
–  DrugBank
–  PharmGKB
–  SuperTarget
–  Therapeutic Target DB
–  STITCH

http://masspec.scripps.edu/metabo_science/metadbase.php
http://www.hmdb.ca/databases

HMDB

•  http://www.hmdb.ca
•  HMDB is a freely available electronic database containing detailed information about small molecule metabolites found in the human body.

HMDB: metabolite statistics

HMDB: metabolite statistics

HMDB: spectra statistics

HMDB: search

HMDB: MS search

HMDB: MS search result

Metlin

•  All data acquired on Agilent Q-TOF
•  Search by MS, MS/MS, fragment, and neutral loss

Metlin search

Metlin search result

MassBank

•  http://www.massbank.jp
•  Public repository of mass spectral data
•  Data from different platforms

MassBank: data sources

MassBank: database services

Identification, again

Stein, S., Mass spectral reference libraries: an ever-expanding resource for chemical identification. Analytical chemistry 2012, 84 (17), 7274-82.

“…there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.”

Acknowledgement

•  Wenchao Zhang, Ph.D., previously a postdoc in the Du Lab

Thank you!