
Object Orie’d Data Analysis, Last Time

Dec 31, 2015


lesley-good

Page 1: Object Orie’d Data Analysis, Last Time

Object Orie’d Data Analysis, Last Time

• Kernel Embedding

– Embed data in higher dimensional manifold

– Gives greater flexibility to linear methods

• Support Vector Machines

– Aimed at very non-Gaussian Data

– E.g. from Kernel Embedding

• Distance Weighted Discrimination

– HDLSS (High Dimension, Low Sample Size) Improvement of SVM

Page 2: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Graphical View, using Toy Example:

• Find separating plane

• To maximize distances from data to plane

• In particular smallest distance

• Data points closest are called support vectors

• Gap between is called margin
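
As a rough illustration of the vocabulary on this slide, the sketch below takes one candidate plane (hand-picked, not optimized) and computes each point's distance to it, the smallest distance, and the points attaining it. The toy data, the plane `w`, `b`, and the function names are hypothetical; an actual SVM would search over planes to maximize the smallest distance.

```python
import math

def signed_distances(points, labels, w, b):
    """Distance y_i (w.x_i + b) / ||w|| of each point to the plane w.x + b = 0.

    Positive values mean the point lies on the side matching its label.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
            for x, y in zip(points, labels)]

def margin_and_support(points, labels, w, b, tol=1e-9):
    """Return the smallest distance and the indices of the points attaining it."""
    dists = signed_distances(points, labels, w, b)
    smallest = min(dists)
    support = [i for i, d in enumerate(dists) if d - smallest <= tol]
    return smallest, support

# Hypothetical toy data: two classes in the plane.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]
w, b = (1.0, 1.0), -2.0  # candidate plane x1 + x2 = 2

smallest, support = margin_and_support(points, labels, w, b)
```

For this plane the closest points are one from each class, so the gap between the classes along the normal direction is twice the smallest distance.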

Page 3: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Graphical View, using Toy Example:

Page 4: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Forgotten last time,

Important Extension:

Multi-Class SVMs

Hsu & Lin (2002)

Lee, Lin, & Wahba (2002)

• Defined for “implicit” version

• “Direction Based” variation???

Page 5: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Also forgotten last time,

Toy examples illustrating Explicit vs. Implicit Kernel Embedding

As well as effect of window width, σ, on Gaussian kernel embedding

Page 6: Object Orie’d Data Analysis, Last Time

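SVMs, Comput’n & Embedding

For an “Embedding Map”, $\Phi(x)$,

e.g. $\Phi(x) = (x, x^2)^t$

Explicit Embedding:

Maximize:

$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^t \Phi(x_j)$$

Get classification function:

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, \Phi(x)^t \Phi(x_i) + b$$

• Straightforward application of embedding

• But loses inner product advantage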
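
A small sketch of why an explicit embedding gives linear methods more flexibility, assuming the quadratic map that sends a scalar x to the pair (x, x squared). The 1-d toy data and the separating line in the embedded space are hypothetical and chosen by hand, not fitted by an SVM.

```python
def embed(x):
    """Explicit embedding map for scalar x: x -> (x, x^2)."""
    return (x, x * x)

def linear_rule(z, w, b):
    """Sign of the linear classification function w.z + b in the embedded space."""
    score = sum(wj * zj for wj, zj in zip(w, z)) + b
    return 1 if score > 0 else -1

# Hypothetical 1-d toy data: class -1 sits between two clumps of class +1,
# so no single threshold on x separates the classes.
xs = [-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]
ys = [+1, +1, -1, -1, -1, +1, +1]

# In the embedded space the horizontal line x^2 = 2 separates them.
w, b = (0.0, 1.0), -2.0
predictions = [linear_rule(embed(x), w, b) for x in xs]
```

The rule is linear in the embedded coordinates but corresponds to a two-sided cutoff in the original variable.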

Page 7: Object Orie’d Data Analysis, Last Time

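SVMs, Comput’n & Embedding

Implicit Embedding:

Maximize:

$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$$

Get classification function:

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, K(x, x_i) + b$$

(here $K$ is the kernel function standing in for the inner product of embedded points)

• Still defined only via inner products

• Retains optimization advantage

• Thus used very commonly

• Comparison to explicit embedding?

• Which is “better”???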
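
To make "defined only via inner products" concrete, the sketch below checks a textbook identity: the homogeneous quadratic kernel on 2-d inputs equals an ordinary inner product after an explicit 3-d map. It also evaluates the dual objective from a precomputed kernel matrix. All function names are hypothetical, and the quadratic kernel is a stand-in chosen because its explicit map is finite; it is not the Gaussian kernel used in the toy examples.

```python
import math

def poly_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x.z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def poly_embed(x):
    """Explicit map for 2-d input whose inner products reproduce poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def inner(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, ys, gram):
    """L_D(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j K_ij."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * ys[i] * ys[j] * gram[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

x, z = (1.0, 2.0), (3.0, -1.0)
implicit_value = poly_kernel(x, z)                      # never forms the embedding
explicit_value = inner(poly_embed(x), poly_embed(z))    # forms it, then multiplies
```

Only the kernel matrix enters `dual_objective`, which is why the optimization never needs the embedded coordinates themselves.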

Page 8: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Target Toy Data set:

Page 9: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Explicit Embedding, window σ = 0.1:

Page 10: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Explicit Embedding, window σ = 1:

Page 11: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Explicit Embedding, window σ = 10:

Page 12: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Explicit Embedding, window σ = 100:

Page 13: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Notes on Explicit Embedding:

• Window σ too small → poor generalizability

• Window σ too big → misses important regions

• Classical lessons from kernel smoothing

• Surprisingly large “reasonable region”

• I.e. parameter less critical (sometimes?)

Also explore projections (in kernel space)
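
One hedged way to see the two extremes of the window width is to look at the Gaussian kernel matrix itself. The sketch assumes the common convention of dividing the squared distance by twice sigma squared (the slides do not state their scaling), and the three 1-d points are hypothetical.

```python
import math

def gaussian_gram(points, sigma):
    """Matrix of Gaussian kernel values between all pairs of points."""
    def k(x, z):
        sq = sum((a - b) ** 2 for a, b in zip(x, z))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return [[k(x, z) for z in points] for x in points]

points = [(0.0,), (1.0,), (2.0,)]   # hypothetical data, spacing 1

small = gaussian_gram(points, 0.1)    # nearly the identity matrix
large = gaussian_gram(points, 100.0)  # nearly a matrix of ones
```

With a very small window each point resembles only itself, which fits the poor-generalizability note; with a very large window all points look alike, so local structure is lost.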

Page 14: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Kernel space projection, window σ = 0.1:

Page 15: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Kernel space projection, window σ = 1:

Page 16: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Kernel space projection, window σ = 10:

Page 17: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Kernel space projection, window σ = 100:

Page 18: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Kernel space projection, window σ = 100:

Page 19: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Notes on Kernel space projection:

• Window σ too small

– Great separation

– But recall, poor generalizability

• Window σ too big → no longer separable

• As above:

– Classical lessons from kernel smoothing

– Surprisingly large “reasonable region”

– I.e. parameter less critical (sometimes?)

Also explore projections (in kernel space)

Page 20: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Implicit Embedding, window σ = 0.1:

Page 21: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Implicit Embedding, window σ = 0.5:

Page 22: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Implicit Embedding, window σ = 1:

Page 23: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Implicit Embedding, window σ = 10:

Page 24: Object Orie’d Data Analysis, Last Time

Support Vector Machines

Notes on Implicit Embedding:

• Similar Large vs. Small lessons

• Range of “reasonable results” seems to be smaller (note different range of windows)

• Much different “edge” behavior

Interesting topic for future work…

Page 25: Object Orie’d Data Analysis, Last Time

Distance Weighted Discrim’n

2-d Visualization:

Pushes Plane Away From Data

All Points Have Some Influence

$$\min_{w,\beta} \sum_{i=1}^{n} \frac{1}{r_i}$$
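
The contrast between "smallest distance only" and "all points have some influence" can be sketched by evaluating both criteria for a fixed candidate plane. This is a simplified, separable-case illustration: it omits the slack variables and penalty of the full DWD formulation, does no optimization, and uses hypothetical names and toy data.

```python
import math

def residuals(points, labels, w, b):
    """r_i = y_i (w.x_i + b) / ||w||: distance of point i to the plane."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
            for x, y in zip(points, labels)]

def svm_criterion(r):
    """Smallest distance to the plane (SVM wants this large)."""
    return min(r)

def dwd_criterion(r):
    """Sum of reciprocal distances (DWD wants this small)."""
    return sum(1.0 / ri for ri in r)

labels = [+1, +1, -1, -1]
w, b = (1.0, 0.0), 0.0
near = [(1.0, 0.0), (3.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
far = [(1.0, 0.0), (6.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]  # one non-support point moved

r_near = residuals(near, labels, w, b)
r_far = residuals(far, labels, w, b)
```

Moving a point that is not closest to the plane leaves the smallest-distance criterion unchanged but changes the reciprocal-distance sum, with nearer points weighted more heavily.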

Page 26: Object Orie’d Data Analysis, Last Time

Distance Weighted Discrim’n

References for more on DWD:

• Current paper: Marron, Todd and Ahn (2007)

• Links to more papers: Ahn (2007)

• JAVA Implementation of DWD: caBIG (2006)

• SDPT3 Software: Toh (2007)

Page 27: Object Orie’d Data Analysis, Last Time

Batch and Source Adjustment

Recall from Class Notes 8/28/07

• For Stanford Breast Cancer Data (C. Perou)

• Analysis in Benito, et al (2004) Bioinformatics, 20, 105-114. https://genome.unc.edu/pubsup/dwd/

• Adjust for Source Effects

– Different sources of mRNA

• Adjust for Batch Effects

– Arrays fabricated at different times

Page 28: Object Orie’d Data Analysis, Last Time

Source Batch Adj: Biological Class Col. & Symbols

Page 29: Object Orie’d Data Analysis, Last Time

Source Batch Adj: Source Colors

Page 30: Object Orie’d Data Analysis, Last Time

Source Batch Adj: PC 1-3 & DWD direction

Page 31: Object Orie’d Data Analysis, Last Time

Source Batch Adj: DWD Source Adjustment

Page 32: Object Orie’d Data Analysis, Last Time

Source Batch Adj: Source Adj’d, PCA view

Page 33: Object Orie’d Data Analysis, Last Time

Source Batch Adj: S. & B Adj’d, Adj’d PCA

Page 34: Object Orie’d Data Analysis, Last Time

UNC, Stat & OR

Why not adjust using SVM?

Major Problem: Proj’d Distrib’al Shape

Triangular Dist’ns (opposite skewed)

Does not allow sensible rigid shift

Page 35: Object Orie’d Data Analysis, Last Time

Why not adjust using SVM?

Nicely Fixed by DWD

Projected Dist’ns near Gaussian

Sensible to shift
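
A minimal sketch of the rigid shift itself, assuming a unit direction vector is already available (here a hand-picked `u`, not a fitted DWD direction): each batch is moved along `u` so that its projected mean becomes zero. The function names and toy batches are hypothetical, and the adjustment in Benito et al. (2004) may differ in detail.

```python
def project(x, u):
    """Projection coefficient of x on the unit direction u."""
    return sum(xj * uj for xj, uj in zip(x, u))

def shift_batch(batch, u):
    """Rigidly shift a batch along u so its mean projection onto u is zero."""
    mean_proj = sum(project(x, u) for x in batch) / len(batch)
    return [tuple(xj - mean_proj * uj for xj, uj in zip(x, u)) for x in batch]

u = (1.0, 0.0)                          # hypothetical separating direction
batch_a = [(1.0, 0.0), (3.0, 1.0)]      # hypothetical data from one source
batch_b = [(-2.0, 0.5), (-4.0, 1.5)]    # hypothetical data from another source

adj_a = shift_batch(batch_a, u)
adj_b = shift_batch(batch_b, u)
```

Every point in a batch moves by the same vector, so the shape within each batch is untouched and coordinates orthogonal to `u` do not change.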

Page 36: Object Orie’d Data Analysis, Last Time

Why not adjust by means?

DWD is complicated: value added?

Xuxin Liu example…

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

(although still not perfect)
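
The way differing subtype ratios trip up the mean can be sketched with a deterministic 1-d toy case. The subtype locations (0 and 10), the batch effect (+5), and the counts are all hypothetical; the sketch only shows the failure of per-batch mean centering and does not run DWD.

```python
def center(batch):
    """Subtract the batch mean from every (value, subtype) pair."""
    m = sum(v for v, _ in batch) / len(batch)
    return [(v - m, s) for v, s in batch]

def subtype_mean(batch, subtype):
    """Mean value of one subtype within a batch."""
    vals = [v for v, s in batch if s == subtype]
    return sum(vals) / len(vals)

def subtype_gap(n_a1, n_b1, n_a2, n_b2, batch_shift=5.0):
    """Misalignment of subtype A between two batches after mean centering."""
    batch1 = [(0.0, "A")] * n_a1 + [(10.0, "B")] * n_b1
    batch2 = [(0.0 + batch_shift, "A")] * n_a2 + [(10.0 + batch_shift, "B")] * n_b2
    c1, c2 = center(batch1), center(batch2)
    return subtype_mean(c2, "A") - subtype_mean(c1, "A")

balanced_gap = subtype_gap(2, 2, 2, 2)     # same subtype ratio in both batches
unbalanced_gap = subtype_gap(3, 1, 1, 3)   # ratios differ between batches
```

With equal ratios the centered batches line up; with unequal ratios each batch mean is pulled toward its majority subtype, so the same subtype ends up in different places.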

Page 37: Object Orie’d Data Analysis, Last Time

Why not adjust by means?

Next time:

Work in before and after, slides like 138-141 from DWDnormPreso.ppt in Research/Bioinf/caBIG

Page 38: Object Orie’d Data Analysis, Last Time

Twiddle ratios of subtypes

Page 39: Object Orie’d Data Analysis, Last Time

Why not adjust by means?

DWD robust against non-proportional subtypes…

Mathematical Statistical Question:

Are there mathematics behind this?

(will answer next time…)

Page 40: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition

Face Images as Data

(with M. Benito & D. Peña)

Male – Female Difference?

Discrimination Rule?

Represented as long vector of pixel gray levels

Registration is critical

Registration is critical

Page 41: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

Registered Data

Shifts and scale

Manually chosen

To align eyes and mouth

Still large variation

See males vs. females???

Page 42: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

DWD Direction

Good separation

Images “make sense”

Garbage at ends? (extrapolation effects?)

Page 43: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

Unregistered Version

Much blurrier

Since features don’t properly line up

Nonlinear Variation

But DWD still works

Can see M-F difference?

Page 44: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

Interesting summary:

Jump between means (in DWD direction)

Clear separation of Maleness vs. Femaleness

Page 45: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

Fun Comparison:

Jump between means (in SVM direction)

Also distinguishes Maleness vs. Femaleness

But not as well as DWD

Page 46: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

Analysis of difference:

Project onto normals

SVM has “small gap” (feels noise artifacts?)

DWD “more informative” (feels real structure?)

Page 47: Object Orie’d Data Analysis, Last Time

DWD in Face Recognition, (cont.)

Current Work:

Focus on “drivers”: (regions of interest)

Relation to Discr’n?

Which is “best”?

Lessons for human perception?

Page 48: Object Orie’d Data Analysis, Last Time

Outcomes Data

Breast Cancer Study (C. M. Perou):

• Outcome of interest = death or survival

• Connection with gene expression?

Approach:

• Treat death vs. survival during study as “classes”

• Find “direction that best separates the classes”

Page 49: Object Orie’d Data Analysis, Last Time

Outcomes Data

Find “direction that best separates the classes”

Page 50: Object Orie’d Data Analysis, Last Time

Outcomes Data

Find “direction that best separates the classes”

Page 51: Object Orie’d Data Analysis, Last Time

Outcomes Data

Find “direction that best separates classes”

• DWD Projection

• SVM Projection

Notes:

• SVM is “better separated”? (recall “data piling” problems….)

• DWD gives “more spread between sub-populations”??? (perhaps “more stable”?)

Page 52: Object Orie’d Data Analysis, Last Time

Outcomes Data

Which is “better”?

Approach:

• Find “genes of interest”

• To maximize loadings of direction vectors

(reflects pointing in gene direction)

• Show intensity plot (of gene expression)

• Using top 20 genes in each direction

Page 53: Object Orie’d Data Analysis, Last Time

Outcomes Data

Which is “better”?

• Study with gene intensity plot

• Order cases by DWD score (proj’n)

• Order genes by DWD loading (vec. entry)

• Reduce to top & bottom 20

• Color map shows gene expression

• Shows genes that drive classification

• Gene names also available
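
The ordering steps listed above can be sketched as follows, assuming the data are a list of cases (rows) with one expression value per gene and that a direction vector of loadings is already in hand. The function names and the tiny example are hypothetical; the slides use the top and bottom 20 genes, while the example uses `k = 1` to stay readable.

```python
def top_and_bottom(loadings, k):
    """Indices of the k most negative and k most positive loadings, in loading order."""
    order = sorted(range(len(loadings)), key=lambda j: loadings[j])
    if 2 * k >= len(order):
        return order
    return order[:k] + order[-k:]

def ordered_intensity(data, direction, k):
    """Reorder cases by score (projection) and keep the extreme-loading genes."""
    scores = [sum(x * w for x, w in zip(row, direction)) for row in data]
    case_order = sorted(range(len(data)), key=lambda i: scores[i])
    gene_order = top_and_bottom(direction, k)
    matrix = [[data[i][j] for j in gene_order] for i in case_order]
    return matrix, case_order, gene_order

# Hypothetical expression matrix: 3 cases by 4 genes.
data = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
direction = [0.5, -0.8, 0.1, 0.3]   # hypothetical loadings

matrix, case_order, gene_order = ordered_intensity(data, direction, k=1)
```

The returned matrix is what a color map would display, with rows running from lowest to highest score and columns from most negative to most positive loading.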

Page 54: Object Orie’d Data Analysis, Last Time

Outcomes Data

Which is “better”? DWD direction

Page 55: Object Orie’d Data Analysis, Last Time

Outcomes Data

Which is “better”? SVM direction

Page 56: Object Orie’d Data Analysis, Last Time

Outcomes Data

Which is “better”?

• DWD finds genes showing better separation

• SVM genes are less informative

Page 57: Object Orie’d Data Analysis, Last Time

Outcomes Data

How about Centroid (Mean Diff’nce) Method?

Page 58: Object Orie’d Data Analysis, Last Time

Outcomes Data

How about Centroid (Mean Diff’nce) Method?

Page 59: Object Orie’d Data Analysis, Last Time

Outcomes Data

Compare to DWD direction

Page 60: Object Orie’d Data Analysis, Last Time

Outcomes Data

How about Centroid (Mean Diff’nce) Method?

• Best yet, in terms of red – green plot?

• Projections unacceptably mixed?

• These are two different goals…

• Try for trade-off?

Scale space approach???

• Interesting philosophical point:

Very simple things often “best”

Page 61: Object Orie’d Data Analysis, Last Time

Outcomes Data

Weakness of above analysis:

• Some with “genes prone to disease” have not died yet

• Perhaps can see in DWD expression plot?

Better analysis:

• More sophisticated survival methods

• Work in progress w/ Brent Johnson, Danyu Li, Helen Zhang

Page 62: Object Orie’d Data Analysis, Last Time

Distance Weighted Discrim’n

2-d Visualization:

Pushes Plane Away From Data

All Points Have Some Influence

$$\min_{w,\beta} \sum_{i=1}^{n} \frac{1}{r_i}$$

Page 63: Object Orie’d Data Analysis, Last Time

Distance Weighted Discrim’n

Maximal Data Piling

Page 64: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Main idea:

Comparison of

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions
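
Of the three methods, Mean Difference is simple enough to sketch in a few lines: the direction is the difference of the class means and the cutoff is the midpoint between them. The function names and toy data are hypothetical.

```python
def dot(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))

def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_mean_difference(pos, neg):
    """Return (w, b): w is the difference of class means, b puts the cutoff at their midpoint."""
    mp, mn = mean_vector(pos), mean_vector(neg)
    w = [a - c for a, c in zip(mp, mn)]
    midpoint = [(a + c) / 2.0 for a, c in zip(mp, mn)]
    return w, -dot(w, midpoint)

def classify(x, w, b):
    """Assign +1 or -1 by the side of the plane w.x + b = 0."""
    return 1 if dot(w, x) + b > 0 else -1

pos = [(2.0, 0.0), (4.0, 0.0)]     # hypothetical class +1
neg = [(-2.0, 0.0), (-4.0, 0.0)]   # hypothetical class -1
w, b = fit_mean_difference(pos, neg)
```

Because the direction depends only on the two class means, a few extreme points can move it a long way.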

Page 65: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Overall Approach:

• Study different known phenomena

– Spherical Gaussians

– Outliers

– Polynomial Embedding

• Common Sample Sizes: $n_+ = n_- = 25$

• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$

Page 66: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Spherical Gaussians:

Page 67: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Spherical Gaussians:

• Same setup as before

• Means shifted in dim 1 only, $\mu_1 = 2.2$

• All methods pretty good

• Harder problem for higher dimension

• SVM noticeably worse

• MD best (Likelihood method)

• DWD very close to MD

• Methods converge for higher dimension??
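
A sketch of how such a spherical Gaussian setup could be simulated with the standard library, assuming 25 points per class, identity covariance, and class means at plus and minus 2.2 in the first coordinate. The shift value is read from the slide and the symmetric placement of the two classes is an assumption; the seed and function names are hypothetical.

```python
import random

def spherical_gaussian_class(n, d, shift, rng):
    """n draws from a d-dimensional standard Gaussian, shifted in dimension 1 only."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x[0] += shift
        rows.append(x)
    return rows

def simulate(d, n_per_class=25, shift=2.2, seed=0):
    """Return (class +1 sample, class -1 sample) for dimension d."""
    rng = random.Random(seed)
    pos = spherical_gaussian_class(n_per_class, d, +shift, rng)
    neg = spherical_gaussian_class(n_per_class, d, -shift, rng)
    return pos, neg

dimensions = [10, 40, 100, 400, 1600]
pos, neg = simulate(dimensions[0])
```

Looping `simulate` over `dimensions` and recording test error for each method would give the kind of comparison across dimensions described on the slide.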

Page 68: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Outlier Mixture:

Page 69: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Outlier Mixture:

80% dim. 1 $\mu_1 = 2.2$, other dims 0

20% dim. 1 ±100, dim. 2 ±500, others 0

• MD is a disaster, driven by outliers

• SVM & DWD are both very robust

• SVM is best

• DWD very close to SVM (insig’t difference)

• Methods converge for higher dimension??

Ignore RLR (a mistake)
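
One hedged way to see why the outliers drive MD is to compute the population mean of the mixture and check how far the mean-difference direction tilts away from dimension 1. The sketch assumes the main component has mean 2.2 in dimension 1, that the outlier signs match the class sign, and that class -1 is the mirror image of class +1; none of these details is spelled out on the slide.

```python
import math

def mixture_mean(main_mean, outlier_mean, outlier_fraction):
    """Population mean of a two-component mixture."""
    f = outlier_fraction
    return [(1.0 - f) * m + f * o for m, o in zip(main_mean, outlier_mean)]

def cosine_with_first_axis(v):
    """Cosine of the angle between v and the dimension-1 axis."""
    return v[0] / math.sqrt(sum(c * c for c in v))

d = 10
main = [2.2] + [0.0] * (d - 1)               # assumed 80% component mean
outlier = [100.0, 500.0] + [0.0] * (d - 2)   # assumed 20% component mean

pos_mean = mixture_mean(main, outlier, 0.2)
md_direction = [2.0 * c for c in pos_mean]   # class -1 assumed to be the mirror image
cos1 = cosine_with_first_axis(md_direction)
```

Under these assumptions the direction is dominated by dimension 2, where only the outliers carry signal, and that is consistent with the slide's note that MD is driven by outliers.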

Page 70: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Wobble Mixture:

Page 71: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Wobble Mixture:

80% dim. 1 $\mu_1 = 2.2$, other dims 0

20% dim. 1 ±0.1, rand dim ±100, others 0

• MD still very bad, driven by outliers

• SVM & DWD are both very robust

• SVM loses (affected by margin push)

• DWD slightly better (by w’ted influence)

• Methods converge for higher dimension??

Ignore RLR (a mistake)

Page 72: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Nested Spheres:

Page 73: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Nested Spheres:

1st d/2 dim’s, Gaussian with var 1 or C

2nd d/2 dim’s, the squares of the 1st dim’s (as for 2nd degree polynomial embedding)

• Each method best somewhere

• MD best in highest d (data non-Gaussian)

• Methods not comparable (realistic)

• Methods converge for higher dimension??

• HDLSS space is a strange place

Ignore RLR (a mistake)
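
The nested spheres construction can be sketched as below: draw the first d/2 coordinates from a centered Gaussian with the class variance, then append their squares. The slide does not give the value of C, so `variance` is left as a parameter and the value 4.0 in the example is purely hypothetical, as are the seed and function name.

```python
import math
import random

def nested_spheres_point(d, variance, rng):
    """One draw: d/2 Gaussian coordinates followed by their squares (d must be even)."""
    half = d // 2
    first = [rng.gauss(0.0, math.sqrt(variance)) for _ in range(half)]
    return first + [v * v for v in first]

rng = random.Random(0)
inner_point = nested_spheres_point(10, 1.0, rng)   # class with variance 1
outer_point = nested_spheres_point(10, 4.0, rng)   # class with variance C (hypothetical C = 4)
```

Appending the squares mimics a second-degree polynomial embedding, so a rule that is linear in all d coordinates can pick up the difference in radius between the two classes.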

Page 74: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Conclusions:

• Everything (sensible) is best sometimes

• DWD often very near best

• MD weak beyond Gaussian

Caution about simulations (and examples):

• Very easy to cherry pick best ones

• Good practice in Machine Learning

– “Ignore method proposed, but read paper for useful comparison of others”

Page 75: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Caution: There are additional players

E.g. Regularized Logistic Regression

looks also very competitive

Interesting Phenomenon:

All methods come together

in very high dimensions???

Page 76: Object Orie’d Data Analysis, Last Time

HDLSS Discrim’n Simulations

Can we say more about:

All methods come together

in very high dimensions???

Mathematical Statistical Question:

Mathematics behind this???

(will answer next time)