Proteomic Mass Spectrometry

Dec 20, 2015
Transcript
Page 1

Proteomic Mass Spectrometry

Page 2

Outline

• Previous Research

• Project Goals

• Data and Algorithms

• Experimental Results

• Conclusions

• To Do List

Page 3

Motivation

• MS spectra have high dimensionality – most ML algorithms cannot handle such high-dimensional data.

• Dimensionality Reduction (DR) – preserve as much information as possible while reducing the dimensionality.

• Feature Extraction (FE) – remove irrelevant and/or redundant features (information).

Page 4

Previous Research

• Usually applies DR, then FE. Does the order matter?

• DR: down sampling, PCA, wavelets. FE: T-test, random forests, manual peak extraction.

• [conrads03] shows that high-resolution MS spectra produce better classification accuracy.
– Most previous research down samples spectra.
– CONJECTURE: down sampling is detrimental to performance (a simple down-sampling scheme is sketched below).
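The slides do not say how the spectra were down sampled; a minimal sketch of one common scheme, averaging groups of consecutive bins, where the factor of 16 is illustrative only:

```python
import numpy as np

def down_sample(spectrum, factor):
    """Down-sample a 1-D spectrum by averaging groups of `factor` consecutive bins.

    Bins at the end that do not fill a complete group are dropped.
    """
    n = (len(spectrum) // factor) * factor
    return spectrum[:n].reshape(-1, factor).mean(axis=1)

# Illustrative only: reduce a 164,168-dimensional spectrum by a factor of 16.
spectrum = np.random.rand(164168)
print(down_sample(spectrum, 16).shape)  # (10260,)
```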

Page 5

Project Goals

• Test Down Sampling Conjecture

• Compare FE algorithms (NOTE: optimal FE is NP-hard!)
– Use a simple but fast classifier to test a number of FE approaches.

• Test across different data sets.
– Are there any clearly superior FE algorithms?

Page 6

Three Data Sets

• Heart/Kidney (100/100) – 164,168 features, 2 classes

• Ovarian Cancer (91/162) – 15,154 features, 2 classes

• Prostate Cancer (63/190/26/43) – 15,154 features, 4 classes
– Normal, Benign, Stage 1, Stage 2 Cancer
– Transformed into Normal/Benign vs. Cancer (Stages 1 & 2)

Page 7

Algorithms

Centroid Classifier – given class means P, Q and a sample point s, assign s to the class with the nearer mean:

C(s) = argmin_{X ∈ {P, Q}} d(X, s)
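A minimal sketch of this centroid rule, assuming Euclidean distance for d (the experiments also compared other metrics; see Page 9):

```python
import numpy as np

def centroid_classify(s, P, Q, d=lambda a, b: np.linalg.norm(a - b)):
    """Assign sample s to class 0 or 1, whichever mean (P or Q) is nearer under d."""
    return 0 if d(P, s) <= d(Q, s) else 1

# Class means are estimated from training data; s is a test spectrum.
X0 = np.random.rand(50, 100)         # class-0 training samples
X1 = np.random.rand(50, 100) + 0.5   # class-1 training samples
P, Q = X0.mean(axis=0), X1.mean(axis=0)
s = np.random.rand(100) + 0.5
print(centroid_classify(s, P, Q))    # almost surely prints 1
```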

Page 8

Algorithms

• T-test – do the means of two distributions differ?

• KS-test – do the CDFs differ?

• Composite – (T-test) × (KS-test)

• IFE – Individual Feature Evaluation using the centroid classifier

• DPCA – discriminative principal component analysis
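A sketch of using these tests to rank features with SciPy. The slides only say the Composite is (T-test)×(KS-test), so multiplying the absolute t statistic by the KS statistic below is an assumption:

```python
import numpy as np
from scipy.stats import ttest_ind, ks_2samp

def rank_features(Xa, Xb):
    """Rank features by a composite of the t and KS statistics.

    Xa, Xb: (samples x features) arrays for the two classes.
    Returns feature indices sorted by descending composite score.
    """
    t = np.abs(ttest_ind(Xa, Xb, axis=0).statistic)          # T-test per feature
    ks = np.array([ks_2samp(Xa[:, j], Xb[:, j]).statistic    # KS-test per feature
                   for j in range(Xa.shape[1])])
    composite = t * ks                                       # assumed combination
    return np.argsort(composite)[::-1]
```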

Page 9

Preliminary Experiments

• Compare normalization approaches

• Compare similarity metrics
– Cross-correlation
– (-L1)
– Angular

• Across 3 data sets => 27 configurations

Normalizations: L1 → norm; 1 → norm; no normalization
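One plausible reading of the three similarity metrics in code; the exact definitions used in the experiments are not given, so these are assumptions (each is written so that larger values mean more similar):

```python
import numpy as np

def l1_normalize(x):
    """L1 normalization: scale so the absolute values sum to 1."""
    return x / np.abs(x).sum()

def neg_l1(a, b):
    """(-L1): negative L1 distance, so larger means more similar."""
    return -np.abs(a - b).sum()

def angular(a, b):
    """Angular similarity: negative angle between the two spectra."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return -np.arccos(np.clip(cos, -1.0, 1.0))

def cross_corr(a, b):
    """Cross-correlation at zero lag (Pearson correlation)."""
    a0, b0 = a - a.mean(), b - b.mean()
    return (a0 @ b0) / (np.linalg.norm(a0) * np.linalg.norm(b0))
```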

Page 10

Preliminary Experiments (cont)

• No single norm/metric is clearly superior on all data sets.

• A 2-5% increase in performance when a suitable normalization and similarity metric are chosen (up to 10% in some cases).

• The L1 norm with the angular similarity metric worked well on the Heart/Kidney and Ovarian Cancer sets (easy sets).

• The L1 norm with the L1 metric was best on the Prostate 2-class problem (hard set).

Page 11

Down Sampling

Page 12

Statistical Tests

• T-test, KS-test, Composite – rank features in terms of relevance.

• SFS – Sequential Forward Selection – selects ever-increasing feature sets, i.e., {1}; {1,2}; {1,2,3}; {1,2,3,4}.
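The {1}; {1,2}; {1,2,3} example suggests that SFS here grows prefixes of an already-ranked feature list rather than searching all remaining features at each step; a sketch under that assumption, where evaluate is a hypothetical callback returning cross-validated accuracy for a feature subset:

```python
def sfs(ranked, evaluate, max_k):
    """Forward selection over a ranked feature list: score the growing
    prefixes {1}; {1,2}; {1,2,3}; ... and return the best one.

    ranked:   feature indices ordered by a statistical test
    evaluate: hypothetical callback mapping a feature subset to accuracy
    """
    best_subset, best_score, subset = [], 0.0, []
    for f in ranked[:max_k]:
        subset.append(f)
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score
```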

Page 13

Heart/Kidney

Page 14

Ovarian Cancer

Page 15

Prostate Cancer

Page 16

Single Feature Classification

• Use each feature to classify test samples

• Rank features in terms of performance

• SFS
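A sketch of this single-feature evaluation using the one-dimensional centroid rule; the array names and the train/test split are illustrative:

```python
import numpy as np

def single_feature_ranking(Xa, Xb, Ta, Tb):
    """Rank features by the test accuracy of a one-feature centroid classifier.

    Xa, Xb: training samples per class; Ta, Tb: test samples per class
    (all arrays are samples x features).
    """
    accs = []
    for j in range(Xa.shape[1]):
        p, q = Xa[:, j].mean(), Xb[:, j].mean()                # per-class centroids
        hits_a = np.abs(Ta[:, j] - p) <= np.abs(Ta[:, j] - q)  # class-a correct
        hits_b = np.abs(Tb[:, j] - q) < np.abs(Tb[:, j] - p)   # class-b correct
        accs.append((hits_a.sum() + hits_b.sum()) / (len(Ta) + len(Tb)))
    return np.argsort(accs)[::-1]  # best-performing features first
```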

Page 17

Performance Comparison

Performance of Statistical Feature Extraction on the Heart/Kidney Data Set

[Bar chart: accuracy (y-axis, 0.75-1.0) by feature extraction method (# of features): None (164,168), KS-test (2,601), T-test (887), Composite (1,810), Best Single Feature, SFS-L1 (3), SFS-Angle (70).]

Page 18

Performance Comparison

Performance of Statistical Feature Extraction on the Ovarian Cancer Set

[Bar chart: accuracy (y-axis, 0.75-1.0) by feature extraction method (# of features): None (15,154), KS-test (12), T-test (91), Composite (20), Best Single Feature, SFS-L1 (48), SFS-Angle (14).]

Page 19

Performance Comparison

Performance of Statistical Feature Extraction on the Prostate Cancer Set

[Bar chart: accuracy (y-axis, 0.5-1.0) by feature extraction method (# of features).]

Page 20

Summary

• For each data set and each FE algorithm, ran 15,000 3-fold cross-validation experiments.

• A total of 810,000 FE experiments were run.

• DR experiments: ~100,000 experiments.

• An additional 50,000 experiments using the DPCA classifier did not produce significantly different results from the centroid classifier.
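A sketch of the repeated 3-fold cross-validation harness implied by these counts; classify_fold is a hypothetical callback (e.g. the centroid classifier sketched earlier), and the repeat count merely mirrors the numbers quoted above:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def repeated_cv_accuracy(X, y, classify_fold, repeats=10, seed=0):
    """Mean accuracy over `repeats` rounds of 3-fold cross validation.

    classify_fold(X_tr, y_tr, X_te) is a hypothetical callback that
    returns predicted labels for X_te (e.g. a centroid classifier).
    """
    rng = np.random.RandomState(seed)
    accs = []
    for _ in range(repeats):
        folds = StratifiedKFold(n_splits=3, shuffle=True,
                                random_state=rng.randint(2**31 - 1))
        for tr, te in folds.split(X, y):
            pred = classify_fold(X[tr], y[tr], X[te])
            accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs))
```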

Page 21

Conclusions

• HK and Ovarian data sets are considerably easier to classify than Prostate Cancer.

• Feature Extraction (in general) significantly improves performance on all data sets.

• No single technique is superior on all data sets.
– Best performance using SFS with feature weighting.
– Smallest feature set with the T-test or KS-test.
– The Composite test is inferior to all others.

• Down sampling appears to be detrimental.
– What about other Dim. Red. techniques? E.g., PCA and wavelets.

Page 22

Conclusions

• Down sampling appears to be detrimental.
– What about other Dim. Red. techniques? E.g., PCA and wavelets.

• What about FE after down sampling?
– On the Prostate data, performance appears to drop w.r.t. the best single feature.

Page 23

To Do List

• Check PCA, wavelets, and other DR techniques.

• Use other (better) classifiers.

• General hypothesis:
– Use a simple, fast classifier together with FE techniques to extract a good feature set.
– Then replace the classifier with a more effective one.
– Need to verify that other classifiers respond well to the extracted features.