Discrimination Models and Variance Stabilizing Transformations of Metabolomic NMR Data Institute on Research and Statistics, Sacramento 04/08/04 Parul Vora Purohit

Mar 27, 2015

Page 1:

Discrimination Models and Variance Stabilizing Transformations of Metabolomic NMR Data

Institute on Research and Statistics, Sacramento, 04/08/04, Parul Vora Purohit

Page 2:

Biodata and ‘omics

Genome Project
Genomics - study of genes
Proteomics - study of proteins
Metabolomics - study of metabolites *

cellomics, CHOmics, chromonomics, etc.

Analytical techniques
Microarray
Spectroscopy
Mass Spectroscopy
NMR Spectroscopy *

Page 3:

NMR Spectroscopy

Courtesy ~ Joseph Medendorp / Public Information / University of Kentucky

Intense, homogeneous magnetic field

High-powered RF transmitter capable of delivering short pulses; ~500 MHz pulses stimulate 1H nuclear spin transitions

Probe that holds the coils used to excite and detect the signal

Plot of signal vs shift in frequency from original pulse

Shift is measured in ppm (as a ratio to the original frequency)
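As a worked illustration of the ppm scale: chemical shift is conventionally the frequency offset from a reference, divided by the spectrometer frequency, times 10^6. The minimal sketch below assumes a 500 MHz instrument (in line with the ~500 MHz pulse above). The function name and the 1070 Hz offset are hypothetical and chosen only for illustration.

```python
def chemical_shift_ppm(freq_hz, ref_freq_hz, spectrometer_hz):
    """Chemical shift in ppm: offset from the reference, relative to the spectrometer frequency."""
    return (freq_hz - ref_freq_hz) / spectrometer_hz * 1e6

# Hypothetical example: a peak 1070 Hz above the reference on a 500 MHz instrument.
SPECTROMETER_HZ = 500e6
shift = chemical_shift_ppm(SPECTROMETER_HZ + 1070.0, SPECTROMETER_HZ, SPECTROMETER_HZ)
print(round(shift, 2))
```

Because the offset is divided by the spectrometer frequency, the same compound appears at the same ppm value on instruments of different field strength.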

Page 4:

NMR Data

Allows detection of compounds with H content
Shift characterizes the chemicals (metabolites)
Examples:

2.14 ppm – glutamine – γ CH2 group
2.27 ppm – valine – β CH group
6.91 ppm – tyrosine – C3, 5H ring

~65,000 points (variables) per sample

Page 5:

Questions

Classification ~ Can we distinguish sick organisms from the healthy ones?

Identification ~ Which metabolites play a role in the disease (biomarker)?

DIFFERENCES IN THE DETAILS!

Page 6:

Abalone Data

A set of 18 abalone: 8 healthy, 5 stunted, 5 sick

Tissue from muscle

Questions: Can we classify the abalone accurately? Can we detect any metabolites that are markers?

Page 7:

Problems / Solutions: Multivariate Techniques

Matrix of 65,000 (variables) x 18 (samples)

Problem: too many variables compared to the number of samples
Solution: dimension reduction by binning

Classification and metabolite marker identification using PCA and Cluster Analysis

Methods assume that the data is normally distributed with a constant variance

Generalized Log Transformation improves results!

Page 8:

NMR Data Pre-Processing

Background Subtraction

TMSP peak (standard at 0 ppm) removed

Water peak removal (4.72-4.96 ppm removed)

Normalization: integrated intensity normalized to 1.0 to remove the effects of systematic intensity changes between abalone

Binning / Size
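The removal and normalization steps above can be sketched as follows. This is a minimal illustration, not the author's pipeline: the toy spectrum is made up, the width of the TMSP window is an assumption (the slide only says the standard sits at 0 ppm), the integrated intensity is approximated by a plain sum over evenly spaced points, and background subtraction is not covered.

```python
def preprocess(ppm, intensity, exclude_regions):
    """Drop points inside excluded ppm regions, then scale so the intensities sum to 1.0."""
    kept = [
        (p, y) for p, y in zip(ppm, intensity)
        if not any(lo <= p <= hi for lo, hi in exclude_regions)
    ]
    total = sum(y for _, y in kept)
    return [p for p, _ in kept], [y / total for _, y in kept]

# Toy spectrum: 6 points; values are made up for illustration.
ppm = [0.0, 1.0, 3.0, 4.8, 5.4, 9.0]
intensity = [50.0, 2.0, 4.0, 30.0, 3.0, 1.0]

# Water window as given on the slide; the TMSP window width is an assumption.
regions = [(-0.1, 0.1), (4.72, 4.96)]
new_ppm, new_intensity = preprocess(ppm, intensity, regions)
print(new_ppm, new_intensity)
```

Normalizing after the removals keeps the large TMSP and water signals from dominating the scale factor.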

Page 9:

Binned Spectrum

Bin Size Range = 0.00125 ppm – 0.7 ppm

Intensity of Bin = Integrated Intensity of all points in Bin

Region of interest restricted to 0.2 ppm – 10.0 ppm

Bin Size = 0.04 ppm

239 Bins
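The bin count can be checked with a short sketch. A 0.2 to 10.0 ppm window cut into 0.04 ppm bins gives 245 bins. Dropping the six bins inside the 4.72-4.96 ppm water window leaves 239, which agrees with the slide. Numbering the remaining bins from 1 puts the center of bin 76 at 3.22 ppm and of bin 124 at 5.38 ppm, consistent with the bin labels on the "Significant Bins" slide. The function names are hypothetical, and treating the water window as the only excluded region is an assumption.

```python
def make_bins(start=0.2, stop=10.0, width=0.04, exclude=(4.72, 4.96)):
    """Return (low, high) ppm edges of the bins kept after dropping the excluded window.

    Edges are computed in integer hundredths of a ppm to avoid floating-point drift.
    """
    s, e, w = round(start * 100), round(stop * 100), round(width * 100)
    x_lo, x_hi = round(exclude[0] * 100), round(exclude[1] * 100)
    bins = []
    for lo in range(s, e, w):
        hi = lo + w
        if lo >= x_lo and hi <= x_hi:   # bin lies entirely inside the excluded window
            continue
        bins.append((lo / 100, hi / 100))
    return bins

def bin_spectrum(ppm, intensity, bins):
    """Intensity of each bin = sum of the intensities of all points falling in it."""
    out = [0.0] * len(bins)
    for p, y in zip(ppm, intensity):
        for i, (lo, hi) in enumerate(bins):
            if lo <= p < hi:
                out[i] += y
                break
    return out

bins = make_bins()
centers = [round((lo + hi) / 2, 2) for lo, hi in bins]
# Bin numbers on the slides are 1-based, so bin 76 is index 75 and bin 124 is index 123.
print(len(bins), centers[75], centers[123])
```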

Page 10:

Principal Component Analysis (PCA)

Technique that explains the variance-covariance structure of the variables through a few linear combinations of them

X = t_1 p_1^T + t_2 p_2^T + … + t_k p_k^T + E

p_i - eigenvectors

Projections of the original data matrix on these components give the relations between the samples – Scores Plot

A plot of the eigenvectors of the covariance matrix gives a relationship between the variables – Loadings Plot

Reduces the dimension of the problem; a few components suffice to explain the variance

* Courtesy Wise, B. M. and Gallagher, N. B., PLS_Toolbox 2.1
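The decomposition above can be sketched with a singular value decomposition. This is a minimal illustration using NumPy rather than the PLS_Toolbox cited on the slide. It assumes rows are samples and columns are binned variables, mean-centers each column, and uses random placeholder data in place of the abalone spectra.

```python
import numpy as np

def pca_scores_loadings(X, k):
    """Return scores T (samples x k), loadings P (variables x k) and residual E."""
    Xc = X - X.mean(axis=0)                       # mean-center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                          # scores t_i: projections of the samples
    P = Vt[:k].T                                  # loadings p_i: covariance eigenvectors
    E = Xc - T @ P.T                              # what the first k components leave out
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(18, 239))                    # placeholder: 18 samples x 239 bins
T, P, E = pca_scores_loadings(X, k=3)
print(T.shape, P.shape)
```

Plotting the first two columns of T gives a scores plot, and plotting the columns of P against bin position gives a loadings plot.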

Page 11:

PCA Results

Panels: Scores Plot; Loadings Plot

Page 12:

Cluster Analysis - Hierarchical

Transformed Data – Groups Clearly Identified

Untransformed Data

Page 13:

Generalized Log Transformation

It has been shown* that a transformation of the form

f(y) = ln( y + sqrt(y^2 + c) )

can have a variance-stabilizing effect on the data

The parameter c can be obtained by maximum likelihood or ANOVA methods and is approximately

c ~ σ^2 / S^2

where σ^2 is the variance of the noise and S^2 is the variance of the high peaks

*Durbin, B., Hardin, J., Rocke, D. M., Bioinformatics, 2002, 18, s105-s110

* Sue Geller, Jeff Gregg, Paul Hagerman, David Rocke, Transformation and Normalization of Oligonucleotide Microarray Data, 2003
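A minimal sketch of the transformation, taking it as f(y) = ln(y + sqrt(y^2 + c)), the form used in the cited Durbin et al. paper. The value of c is the one quoted later in the deck for the abalone data. The function name and the sample inputs are hypothetical.

```python
import math

def glog(y, c):
    """Generalized log transform f(y) = ln(y + sqrt(y^2 + c))."""
    return math.log(y + math.sqrt(y * y + c))

c = 2.7e-7  # value quoted later in the deck for the abalone data

# For intensities much larger than sqrt(c), the transform behaves like ln(2y).
print(glog(1.0, c), math.log(2.0))

# Zero and slightly negative intensities stay defined, unlike the plain logarithm.
print(glog(0.0, c), glog(-1e-4, c))
```

At y = 0 the transform equals ln(sqrt(c)), so c sets where the curve switches from roughly linear behaviour near the noise floor to logarithmic behaviour on the peaks.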

Page 14:

Maximum Likelihood*

Need replicates to determine SSE(c) accurately

Find the c that gives the minimum SSE

Step through c using Newton's method or educated intervals

* Box, G. and Cox, D. R. (1964) An analysis of transformations. J. Roy. Stat. Soc., Series B (Methodological), 26, 211.
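One way to read the search for c is as a grid search in the spirit of Box and Cox: transform the replicates, rescale by the Jacobian so that SSE values are comparable across different c, and keep the c with the smallest within-group SSE. The sketch below makes that reading concrete. The Jacobian rescaling, the function names, the replicate values and the grid are all assumptions for illustration, not the author's code or data.

```python
import math

def glog(y, c):
    """Generalized log transform f(y) = ln(y + sqrt(y^2 + c))."""
    return math.log(y + math.sqrt(y * y + c))

def normalized_sse(groups, c):
    """Within-group SSE of the Jacobian-normalized glog transform.

    groups: list of lists; each inner list holds replicate measurements of one quantity.
    """
    values = [y for g in groups for y in g]
    # d/dy glog(y, c) = 1 / sqrt(y^2 + c). Dividing by the geometric mean of this
    # derivative (i.e. multiplying by `scale`) makes SSE comparable across c.
    log_gm = sum(0.5 * math.log(y * y + c) for y in values) / len(values)
    scale = math.exp(log_gm)
    sse = 0.0
    for g in groups:
        z = [scale * glog(y, c) for y in g]
        mean = sum(z) / len(z)
        sse += sum((v - mean) ** 2 for v in z)
    return sse

def estimate_c(groups, grid):
    """Return the grid value of c with the smallest normalized SSE."""
    return min(grid, key=lambda c: normalized_sse(groups, c))

# Made-up replicate data: three quantities, three replicates each.
groups = [[0.0010, 0.0013, 0.0008], [0.020, 0.024, 0.018], [0.50, 0.58, 0.45]]
grid = [10.0 ** k for k in range(-9, -2)]
best_c = estimate_c(groups, grid)
print(best_c)
```

A coarse logarithmic grid like this one would normally be followed by a finer grid, or a Newton step, around the best value.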

[Plot: Error Sum of Squares (about 1.886*10^-9 to 1.889*10^-9) vs c (2.2*10^-7 to 3.2*10^-7)]

Page 15:

Transformed Spectrum

Bin Size = 0.04 ppm, 239 Bins, c = 2.7e-7

Calculate 'c' from the replicate data by maximum likelihood methods

Use a transformation of the form f(y) = ln( y + sqrt(y^2 + c) )

Transform the data to stabilize the variance

Page 16:

Stabilized Variance

Bin Size = 0.04 ppm

c = 2.7e-7
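The stabilizing effect can be illustrated on simulated data. The sketch below assumes a two-component error model, y = mu * exp(eta) + eps, with additive noise eps and multiplicative error eta. The model, both standard deviations and the sample size are assumptions made for illustration. It then sets c from the ratio of the two variances and compares the spread of a baseline signal with that of a high peak, before and after the transformation.

```python
import math
import random
import statistics

def glog(y, c):
    """Generalized log transform f(y) = ln(y + sqrt(y^2 + c))."""
    return math.log(y + math.sqrt(y * y + c))

random.seed(0)
SIGMA_NOISE = 1e-4   # sd of the additive noise (assumed)
SIGMA_PEAK = 0.1     # sd of the multiplicative error on high peaks (assumed)
c = SIGMA_NOISE ** 2 / SIGMA_PEAK ** 2   # c ~ sigma^2 / S^2

def simulate(mu, n=2000):
    """Draw n replicates from y = mu * exp(eta) + eps."""
    return [mu * math.exp(random.gauss(0, SIGMA_PEAK)) + random.gauss(0, SIGMA_NOISE)
            for _ in range(n)]

low, high = simulate(0.0), simulate(1.0)   # baseline region vs. a high peak

raw_ratio = statistics.stdev(high) / statistics.stdev(low)
glog_ratio = (statistics.stdev([glog(y, c) for y in high])
              / statistics.stdev([glog(y, c) for y in low]))
print(raw_ratio, glog_ratio)
```

On the raw scale the peak is far noisier than the baseline. After the transformation the two standard deviations are of similar size, which is the constant-variance condition that PCA and cluster analysis rely on.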

Page 17:

Scores Plot – Transformation Effects

Panels: Untransformed Data; Transformed Data

Page 18:

Loadings Plot – Transformation Effects

Panels: Untransformed Data; Transformed Data

Page 19:

Cluster Analysis - Hierarchical

Transformed Data – Groups Clearly Identified

Untransformed Data

Page 20:

Raw Spectra – Significant Bins

Bin 124 – 5.38 ppm
Bin 125 – 5.42 ppm
Bin 126 – 5.46 ppm

Bin 76 – 3.22 ppm
Bin 77 – 3.26 ppm
Bin 78 – 3.30 ppm

[Two panels, groups on each axis: Healthy, Stunted, Sick]

Glycogen, Sucrose, Fructose?

Page 21:

Conclusions

Demonstrated the use of data reduction and multivariate techniques for studying NMR and mass spectrometry data

Demonstrated the use of these techniques to identify metabolite and protein biomarkers

Showed that variance-stabilizing transformations make the data better suited to these analyses

Page 22:

Acknowledgements

David M. Rocke, CIPIC

David L. Woodruff, CIPIC

Mark R. Viant, U. of Birmingham, U. K.