Classification of multiple cancer types by multicategory support vector machines using gene expression data

Support Vector Machine

Mar 17, 2016

nanji

Transcript
Page 1: Support Vector Machine

Classification of multiple cancer types by multicategory support vector machines using gene expression data

Page 2: Support Vector Machine

Support Vector Machine

A classification method that has been applied successfully to cancer diagnosis problems

Two types:

Binary SVM: no optimal extension to more than two classes is known, which limits its application to multiple tumor types

Multicategory SVM (recently proposed): demonstrated on leukemia data and on small round blue cell tumors of childhood

Page 3: Support Vector Machine

DNA microarray technology

This method measures the relative amount of mRNA in isolated cells or biopsied tissues

Uses SVM and solves a series of binary problems (DAG SVM algorithm)

MSVM is applied to two gene expression data sets

Page 4: Support Vector Machine

Features

Effectiveness

Prediction strength

Effect of data preprocessing

Gene selection

Dimension reduction

Page 5: Support Vector Machine
Page 6: Support Vector Machine

Binary SVM

Page 7: Support Vector Machine

MSVM

Page 8: Support Vector Machine

Procedure: 3-class problem

Gene expression was monitored for classification of 2 leukemias: ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia)

ALL: B-cell, T-cell

Page 9: Support Vector Machine

Procedure (cont.)

Number of genes: 7129

38 samples: training set

34 samples: test set

Preprocessing steps performed:

Thresholding (floor 100, ceiling 16000)

Filtering of genes (max/min <= 5 and max - min <= 500)

Base 10 logarithmic transformation
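The three preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess`, its parameter names, and the toy data are hypothetical, and the filter is assumed to exclude genes that meet the stated condition (genes with little variation across samples).

```python
import math

def preprocess(expr, floor=100.0, ceiling=16000.0, ratio_cut=5.0, diff_cut=500.0):
    """expr: list of genes, each a list of expression values across samples.

    Returns (kept_indices, log10_rows) for the genes that pass the filter.
    """
    kept, out = [], []
    for g, row in enumerate(expr):
        # Thresholding: clip every value into [floor, ceiling].
        clipped = [min(max(v, floor), ceiling) for v in row]
        hi, lo = max(clipped), min(clipped)
        # Filtering (assumed to mean exclusion): skip genes with
        # max/min <= 5 and max - min <= 500, as written on the slide.
        if hi / lo <= ratio_cut and hi - lo <= diff_cut:
            continue
        kept.append(g)
        # Base 10 logarithmic transformation of the clipped values.
        out.append([math.log10(v) for v in clipped])
    return kept, out

# Made-up toy data: 3 genes measured on 3 samples.
toy = [
    [20.0, 150.0, 90.0],       # nearly flat after clipping, filtered out
    [100.0, 1000.0, 20000.0],  # varies strongly, kept (20000 clipped to 16000)
    [400.0, 450.0, 500.0],     # nearly flat, filtered out
]
kept, logged = preprocess(toy)
```

Because thresholding runs first, every value is at least 100, so the max/min ratio never divides by zero.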

Page 10: Support Vector Machine

Procedure (cont.)

Standardization of each variable

Variable selection

Prescreening measure: ratio of between-class sum of squares to within-class sum of squares for each gene (genes with the largest ratios are taken)
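A minimal sketch of the standardization and the prescreening ratio, assuming each gene is a list of values across samples and each sample has one class label. The helper names (`standardize`, `bss_wss_ratio`, `select_genes`) and the toy numbers are hypothetical, and using the population standard deviation is an assumption the slides do not settle.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center one variable to mean 0 and scale to unit standard deviation.

    Assumes the values are not all identical (otherwise the deviation is 0).
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def bss_wss_ratio(values, labels):
    """Between-class over within-class sum of squares for one gene.

    Assumes some within-class variation (otherwise the denominator is 0).
    """
    overall = mean(values)
    bss = wss = 0.0
    for k in set(labels):
        group = [v for v, y in zip(values, labels) if y == k]
        centre = mean(group)
        bss += len(group) * (centre - overall) ** 2
        wss += sum((v - centre) ** 2 for v in group)
    return bss / wss

def select_genes(expr, labels, n_keep):
    """Indices of the n_keep genes with the largest BSS/WSS ratios."""
    ratios = [bss_wss_ratio(row, labels) for row in expr]
    order = sorted(range(len(expr)), key=lambda g: ratios[g], reverse=True)
    return order[:n_keep]

# Made-up toy data: 2 genes, 6 samples, 3 classes.
labels = ["ALL-B", "ALL-B", "ALL-T", "ALL-T", "AML", "AML"]
expr = [
    [1.0, 1.2, 3.0, 3.2, 5.0, 5.2],  # class means differ: large ratio
    [1.0, 5.0, 2.0, 4.0, 3.0, 3.5],  # classes overlap: small ratio
]
top = select_genes(expr, labels, n_keep=1)
z = standardize([1.0, 2.0, 3.0])
```

A gene whose class means are far apart relative to the spread inside each class gets a large ratio, which is why the first toy gene is ranked ahead of the second.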

Page 11: Support Vector Machine

Heat map of the 40 most important genes in the training set

Page 12: Support Vector Machine

Small round blue cell tumors (SRBCTs) data

4 types:

Neuroblastoma (NB)

Rhabdomyosarcoma (RMS)

Non-Hodgkin lymphoma (NHL)

Ewing family of tumors (EWS)

Page 13: Support Vector Machine

Used Artificial Neural Networks (ANN)

Training set: 63 samples

Test set: 20 samples

Nearest neighbor, weighted voting, and linear SVM were applied to the data

MSVM was applied for comparison

Logarithm base 10 of expression levels

Page 14: Support Vector Machine
Page 15: Support Vector Machine

Predicted decision vectors

Page 16: Support Vector Machine

SANN

For multiclass classification

Classification results superior to ANN

ANN uses the back propagation algorithm

Why?

Nonlinear connections

Inclusion of interactions within independent variables (input)

Independence from conventional processes

Page 17: Support Vector Machine

Limitations

Learned knowledge is contained in hundreds to thousands of weights (synapses)

Cannot be analyzed in a single regression formula

Page 18: Support Vector Machine

Combining several ANNs

Through ensembles of networks

An ensemble: a collection of a finite number of different classifiers

Cascading ANNs

Page 19: Support Vector Machine

Two level ANN

Task: Chest Radiograms

Lung Nodules (Class A)

Without Lung Nodules (Class B)

Page 20: Support Vector Machine

Two level architecture carrying lower level and higher level concepts

Higher level task: differentiate normal cells (class A) from malignant cells (class B)

Lower level: Class B_1, Class B_2, Class B_3, Class B_4

Page 21: Support Vector Machine

One vs. all

Used with SVM

K binary classifiers: each distinguishes one class from all the others lumped together

Sample is assigned to the class whose classifier achieves the greatest output activity
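The one vs. all scheme can be sketched as two small steps: relabel the K-class data into K binary problems, then pick the class whose classifier responds most strongly. The function names and the output values below are made up for illustration, and no actual SVM is trained here.

```python
def one_vs_all_labels(labels, target):
    """Relabel a K-class problem as binary: +1 for target, -1 for all others."""
    return [1 if y == target else -1 for y in labels]

def one_vs_all_predict(outputs):
    """outputs maps each class to the output of its own binary classifier.

    The sample goes to the class with the greatest output activity.
    """
    return max(outputs, key=outputs.get)

# Made-up labels for five samples from the four SRBCT classes.
labels = ["NB", "RMS", "NHL", "EWS", "RMS"]
binary_problems = {k: one_vs_all_labels(labels, k) for k in sorted(set(labels))}

# Hypothetical outputs of the K = 4 trained classifiers for one new sample.
outputs = {"NB": -0.8, "RMS": 0.6, "NHL": -1.1, "EWS": 0.1}
winner = one_vs_all_predict(outputs)
```

In this toy case the RMS classifier gives the largest output, so the sample is assigned to RMS.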

Page 22: Support Vector Machine

All pairs approach

Builds K(K-1)/2 binary classifiers

Each class takes part in K-1 binary classifiers, which distinguish it from each of the other classes

Output activities are summed up: the class with the greatest activity is the winning class
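A small sketch of the all pairs bookkeeping, assuming each pairwise classifier returns a signed activity (positive favours the first class of the pair, negative the second). That sign convention, the function names, and the numbers are assumptions made for illustration.

```python
from itertools import combinations

def all_pairs(classes):
    """Every unordered pair of classes: K(K-1)/2 binary problems."""
    return list(combinations(classes, 2))

def all_pairs_predict(classes, pair_outputs):
    """Sum the output activities per class and return the winner.

    pair_outputs[(a, b)] > 0 favours a; < 0 favours b (assumed convention).
    """
    totals = {c: 0.0 for c in classes}
    for (a, b), activity in pair_outputs.items():
        totals[a] += activity
        totals[b] -= activity
    return max(totals, key=totals.get), totals

classes = ["NB", "RMS", "NHL", "EWS"]
pairs = all_pairs(classes)  # K = 4 gives 4 * 3 / 2 = 6 pairs

# Hypothetical activities of the six pairwise classifiers for one sample.
pair_outputs = {
    ("NB", "RMS"): -0.5,
    ("NB", "NHL"): 0.25,
    ("NB", "EWS"): -0.75,
    ("RMS", "NHL"): 1.0,
    ("RMS", "EWS"): -0.25,
    ("NHL", "EWS"): -0.5,
}
winner, totals = all_pairs_predict(classes, pair_outputs)
```

Each class appears in exactly K-1 = 3 of the six pairs, so every total is a sum of three activities.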

Page 23: Support Vector Machine

SANN

Oriented to human decision making

Exclusion is performed: preferences are narrowed down

The classification made by the first ANN is a preselection for the second, successive ANN
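The preselection idea can be illustrated with a two-stage sketch. The slides do not say how the first network's result is handed to the second, so keeping the `n_keep` best-scoring classes is an assumption, and the two stand-in scoring functions below are hypothetical placeholders rather than trained networks.

```python
def cascade_predict(sample, first_stage, second_stage, n_keep=2):
    """Two-stage decision: stage 1 narrows the classes, stage 2 decides."""
    # Stage 1: score every class and keep the n_keep best (the preselection).
    scores = first_stage(sample)
    shortlist = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    # Stage 2: decide only among the shortlisted classes.
    refined = second_stage(sample, shortlist)
    return max(shortlist, key=refined.get), shortlist

def toy_first_stage(sample):
    # Stand-in for the first network: made-up scores for all four classes.
    return {"NB": 0.1, "RMS": 0.4, "NHL": 0.05, "EWS": 0.45}

def toy_second_stage(sample, shortlist):
    # Stand-in for the second network: made-up scores for the shortlist only.
    table = {"RMS": 0.7, "EWS": 0.3}
    return {c: table[c] for c in shortlist}

winner, shortlist = cascade_predict(None, toy_first_stage, toy_second_stage)
```

In this toy run the first stage excludes NB and NHL, and the second stage then prefers RMS over EWS, even though EWS scored slightly higher in the first stage.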

Page 24: Support Vector Machine

References

http://info.cchmc.org/presentations/ylee_13Dec02.pdf