Object Orie’d Data Analysis, Last Time
• Kernel Embedding
  – Embed data in higher dimensional manifold
  – Gives greater flexibility to linear methods
• Support Vector Machines
  – Aimed at very non-Gaussian data
  – E.g. from Kernel Embedding
• Distance Weighted Discrimination
  – HDLSS improvement of SVM
Support Vector Machines
Graphical View, using Toy Example:
• Find separating plane
• To maximize distances from data to plane
• In particular smallest distance
• Data points closest are called support vectors
• Gap between is called margin
Support Vector Machines
Graphical View, using Toy Example:
Support Vector Machines
Forgotten last time,
Important Extension:
Multi-Class SVMs
Hsu & Lin (2002)
Lee, Lin, & Wahba (2002)
• Defined for “implicit” version
• “Direction Based” variation???
Support Vector Machines
Also forgotten last time,
Toy examples illustrating
Explicit vs. Implicit
Kernel Embedding
As well as effect of window width, σ
on Gaussian kernel embedding
SVMs, Comput’n & Embedding
For an "Embedding Map" $\Phi(x)$, e.g. the quadratic map $\Phi(x) = \left( x, x^2 \right)$

Explicit Embedding:

Maximize:
$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j)$

Get classification function:
$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, \Phi(x_i) \cdot \Phi(x) + b$

• Straightforward application of embedding
• But loses inner product advantage
SVMs, Comput'n & Embedding
Implicit Embedding:

Maximize:
$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$

Get classification function:
$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, K(x_i, x) + b$

• Still defined only via inner products
• Retains optimization advantage
• Thus used very commonly
• Comparison to explicit embedding? (sketched below)
• Which is "better"???
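A minimal sketch of this contrast (not from the original slides; the toy data, scikit-learn calls, and parameter values are illustrative assumptions): the explicit route forms the embedded features and fits a linear SVM on them, while the implicit route hands the same inner products to the dual optimizer through a kernel.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # toy 2-d data
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # non-linearly separable labels

# Explicit embedding: compute Phi(x) (degree-2 polynomial features),
# then fit a *linear* SVM in the embedded space.
Phi = PolynomialFeatures(degree=2, include_bias=False)
X_emb = Phi.fit_transform(X)
explicit_svm = LinearSVC(C=1.0, max_iter=10000).fit(X_emb, y)

# Implicit embedding: never form Phi(x); the kernel supplies the inner
# products Phi(x_i) . Phi(x_j) inside the dual optimization.
implicit_svm = SVC(kernel="poly", degree=2, C=1.0).fit(X, y)

print(explicit_svm.score(X_emb, y), implicit_svm.score(X, y))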
Support Vector Machines: Target Toy Data set
Support Vector Machines: Explicit Embedding, window σ = 0.1
Support Vector Machines: Explicit Embedding, window σ = 1
Support Vector Machines: Explicit Embedding, window σ = 10
Support Vector Machines: Explicit Embedding, window σ = 100
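A minimal sketch of the window sweep shown in the preceding slides (the toy "target" data and the σ-to-gamma conversion are assumptions): scikit-learn's Gaussian (RBF) kernel is parameterized by gamma, which corresponds to 1/(2σ²), so each window σ becomes one fitted SVM.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = (np.hypot(X[:, 0], X[:, 1]) < 0.5).astype(int)   # "target"-style labels

for sigma in (0.1, 1.0, 10.0, 100.0):
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2)).fit(X, y)
    # tiny sigma: near-perfect training fit but poor generalizability;
    # huge sigma: decision boundary too smooth, misses the target region
    print(sigma, clf.score(X, y))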
Support Vector Machines
Notes on Explicit Embedding:
• Too small: poor generalizability
• Too big: miss important regions
• Classical lessons from kernel smoothing
• Surprisingly large "reasonable region"
• I.e. parameter less critical (sometimes?)
Also explore projections (in kernel space)
Support Vector Machines: Kernel space projection, window σ = 0.1
Support Vector Machines: Kernel space projection, window σ = 1
Support Vector Machines: Kernel space projection, window σ = 10
Support Vector Machines: Kernel space projection, window σ = 100
Support Vector Machines: Kernel space projection, window σ = 100
Support Vector Machines
Notes on Kernel space projection:
• Too small: great separation
  – But recall, poor generalizability
• Too big: no longer separable
• As above:
  – Classical lessons from kernel smoothing
  – Surprisingly large "reasonable region"
  – I.e. parameter less critical (sometimes?)
Also explore projections (in kernel space)
Support Vector Machines: Implicit Embedding, window σ = 0.1
Support Vector Machines: Implicit Embedding, window σ = 0.5
Support Vector Machines: Implicit Embedding, window σ = 1
Support Vector Machines: Implicit Embedding, window σ = 10
Support Vector Machines
Notes on Implicit Embedding:
• Similar Large vs. Small lessons
• Range of "reasonable results" seems to be smaller
  (note different range of windows)
• Much different "edge" behavior
  Interesting topic for future work…
Distance Weighted Discrim'n
2-d Visualization:
Pushes Plane Away From Data
All Points Have Some Influence
$\min_{\omega, b} \sum_{i=1}^{n} \frac{1}{r_i}$
(criterion sketched below)
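A minimal sketch of the quantity being minimized (a hard-separation version with no slack terms, which is an assumption; the real DWD fit solves a second-order cone program, e.g. via SDPT3): every point contributes its reciprocal distance 1/r_i, so near points push hardest but all points have some influence.

import numpy as np

def dwd_criterion(w, b, X, y):
    """Sum of 1/r_i, where r_i = y_i * (x_i . w + b) / ||w|| is the signed
    distance from point i to the candidate hyperplane (w, b); y in {-1, +1}."""
    w = np.asarray(w, dtype=float)
    r = y * (X @ w + b) / np.linalg.norm(w)
    if np.any(r <= 0):
        return np.inf    # this plane does not separate the classes
    return np.sum(1.0 / r)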
Distance Weighted Discrim'n
References for more on DWD:
• Current paper: Marron, Todd and Ahn (2007)
• Links to more papers: Ahn (2007)
• JAVA Implementation of DWD: caBIG (2006)
• SDPT3 Software: Toh (2007)
Batch and Source Adjustment
Recall from Class Notes 8/28/07
• For Stanford Breast Cancer Data (C. Perou)
• Analysis in Benito, et al (2004), Bioinformatics, 20, 105-114.
  https://genome.unc.edu/pubsup/dwd/
• Adjust for Source Effects
  – Different sources of mRNA
• Adjust for Batch Effects
  – Arrays fabricated at different times
Source Batch Adj: Biological Class Col. & Symbols
Source Batch Adj: Source Colors
Source Batch Adj: PC 1-3 & DWD direction
Source Batch Adj: DWD Source Adjustment
Source Batch Adj: Source Adj’d, PCA view
Source Batch Adj: S. & B Adj’d, Adj’d PCA
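A minimal sketch of the adjustment step shown in the preceding slides (the dwd_direction helper is hypothetical; the slides' actual tool is the caBIG JAVA implementation): project onto the DWD direction separating the batches, then rigidly shift each batch along that direction so the batch means coincide.

import numpy as np

def dwd_batch_adjust(X, batch, dwd_direction):
    """Rigidly shift each batch along the DWD direction so batch means agree."""
    w = dwd_direction(X, batch)          # hypothetical helper: vector of shape (d,)
    w = w / np.linalg.norm(w)
    scores = X @ w                       # projections onto the DWD direction
    grand_mean = scores.mean()
    X_adj = X.astype(float).copy()
    for b in np.unique(batch):
        idx = (batch == b)
        # remove this batch's mean offset along w (a rigid shift)
        X_adj[idx] -= (scores[idx].mean() - grand_mean) * w
    return X_adj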
Why not adjust using SVM?
Major Problem: Proj'd Distrib'al Shape
Triangular Dist’ns (opposite skewed)
Does not allow sensible rigid shift
Why not adjust using SVM?
Nicely Fixed by DWD
Projected Dist’ns near Gaussian
Sensible to shift
Why not adjust by means?
DWD is complicated: value added?
Xuxin Liu example…
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
(although still not perfect)
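A minimal sketch of the point about subtype sizes (illustrative numbers, not the Xuxin Liu example): when the biological subtypes appear in different proportions in the two batches, the batch-mean difference mixes the true batch effect with the biological subtype effect, so a mean shift distorts the biology.

import numpy as np

rng = np.random.default_rng(2)
subtype_shift = 5.0    # biological difference between subtypes (dim 1)
batch_shift = 2.0      # artifactual batch effect (dim 2)

def make_batch(n, frac_subtype_a, batch_offset):
    n_a = int(n * frac_subtype_a)
    X = rng.normal(size=(n, 2))
    X[:n_a, 0] += subtype_shift    # subtype A shifted in dim 1
    X[:, 1] += batch_offset        # whole batch shifted in dim 2
    return X

batch1 = make_batch(100, 0.8, 0.0)          # 80% subtype A
batch2 = make_batch(100, 0.2, batch_shift)  # only 20% subtype A

# dim-1 entry is large even though there is no batch effect in dim 1:
# the subtype imbalance leaks into the mean-based adjustment direction
print(batch1.mean(axis=0) - batch2.mean(axis=0))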
Why not adjust by means?
Next time:
Work in before and after, slides like
138-141 from DWDnormPreso.ppt
In Research/Bioinf/caBIG
Twiddle ratios of subtypes
Why not adjust by means?
DWD robust against non-proportional subtypes…
Mathematical Statistical Question:
Is there mathematics behind this?
(will answer next time…)
DWD in Face Recognition
Face Images as Data
(with M. Benito & D. Peña)
Male – Female Difference?
Discrimination Rule?
Represented as long vector of pixel gray levels
Registration is critical
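A minimal sketch of the "long vector of pixel gray levels" representation (the file list and use of Pillow are assumptions): each registered image becomes one row of a cases-by-pixels data matrix.

import numpy as np
from PIL import Image

def faces_to_matrix(image_paths):
    """Stack gray-level images as long vectors: one row per face.
    Assumes all images are registered to the same size."""
    rows = []
    for path in image_paths:
        img = Image.open(path).convert("L")                # gray levels
        rows.append(np.asarray(img, dtype=float).ravel())  # one long vector
    return np.vstack(rows)                                 # cases x pixels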
DWD in Face Recognition, (cont.)
Registered Data
Shifts and scale
Manually chosen
To align eyes and mouth
Still large variation
See males vs. females???
DWD in Face Recognition, (cont.)
DWD Direction
Good separation
Images "make sense"
Garbage at ends?
(extrapolation effects?)
DWD in Face Recognition, (cont.)
Unregistered Version
Much blurrier
Since features don't properly line up
Nonlinear Variation
But DWD still works
Can see M-F differ'ce?
DWD in Face Recognition, (cont.)
Interesting summary:
Jump between means
(in DWD direction)
Clear separation of Maleness vs. Femaleness
DWD in Face Recognition, (cont.)
Fun Comparison:
Jump between means
(in SVM direction)
Also distinguishes Maleness vs. Femaleness
But not as well as DWD
DWD in Face Recognition, (cont.)
Analysis of difference: Project onto normals
SVM has "small gap" (feels noise artifacts?)
DWD "more informative" (feels real structure?)
DWD in Face Recognition, (cont.)
Current Work:
Focus on “drivers”:
(regions of interest)
Relation to Discr’n?
Which is “best”?
Lessons for human perception?
Outcomes Data
Breast Cancer Study (C. M. Perou):
• Outcome of interest = death or survival
• Connection with gene expression?
Approach:
• Treat death vs. survival during study as “classes”
• Find “direction that best separates the classes”
Outcomes Data: Find "direction that best separates the classes"
Outcomes Data: Find "direction that best separates the classes"
Outcomes Data
Find "direction that best separates classes"
• DWD Projection
• SVM Projection
Notes:
• SVM is "better separated"?
  (recall "data piling" problems….)
• DWD gives "more spread between sub-populations"???
  (perhaps "more stable"?)
Outcomes Data
Which is "better"?
Approach:
• Find “genes of interest”
• To maximize loadings of direction vectors
(reflects pointing in gene direction)
• Show intensity plot (of gene expression)
• Using top 20 genes in each direction
Outcomes Data
Which is "better"?
• Study with gene intensity plot
• Order cases by DWD score (proj’n)
• Order genes by DWD loading (vec. entry)
• Reduce to top & bottom 20
• Color map shows gene expression
• Shows genes that drive classification
• Gene names also available
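A minimal sketch of that display (the names X and w, and the colormap, are assumptions): cases are ordered by their DWD score, genes by their DWD loading, and the heat map is reduced to the top and bottom 20 genes.

import numpy as np
import matplotlib.pyplot as plt

def gene_intensity_plot(X, w, n_genes=20):
    """X: cases x genes expression matrix; w: DWD direction (one loading per gene)."""
    scores = X @ w                       # DWD score (projection) for each case
    case_order = np.argsort(scores)
    gene_order = np.argsort(w)           # DWD loading (vector entry) per gene
    keep = np.concatenate([gene_order[:n_genes], gene_order[-n_genes:]])
    plt.imshow(X[np.ix_(case_order, keep)].T, aspect="auto", cmap="RdYlGn")
    plt.xlabel("cases, ordered by DWD score")
    plt.ylabel("top & bottom 20 genes by DWD loading")
    plt.show()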
Outcomes Data: Which is "better"? DWD direction
Outcomes Data: Which is "better"? SVM direction
Outcomes Data
Which is "better"?
• DWD finds genes showing better separation
• SVM genes are less informative
Outcomes Data: How about Centroid (Mean Diff'nce) Method?
Outcomes Data: How about Centroid (Mean Diff'nce) Method?
Outcomes Data: Compare to DWD direction
Outcomes Data
How about Centroid (Mean Diff'nce) Method?
• Best yet, in terms of red – green plot?
• Projections unacceptably mixed?
• These are two different goals…
• Try for trade-off?
Scale space approach???
• Interesting philosophical point:
Very simple things often “best”
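A minimal sketch of the Centroid (Mean Difference) method itself, illustrating how simple it is (labels are assumed coded as ±1): the discriminant direction is just the difference of the class means, with the cutoff at their midpoint.

import numpy as np

def md_fit(X, y):
    """Mean Difference (Centroid) direction and midpoint threshold, y in {-1, +1}."""
    m_pos = X[y == 1].mean(axis=0)
    m_neg = X[y == -1].mean(axis=0)
    w = m_pos - m_neg
    b = -0.5 * (m_pos + m_neg) @ w       # places the cutoff at the midpoint
    return w, b

def md_classify(X, w, b):
    return np.sign(X @ w + b)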
Outcomes Data
Weakness of above analysis:
• Some with “genes prone to disease” have not died yet
• Perhaps can see in DWD expression plot?
Better analysis:
• More sophisticated survival methods
• Work in progress w/ Brent Johnson, Danyu Li, Helen Zhang
Distance Weighted Discrim'n
2-d Visualization:
Pushes Plane Away From Data
All Points Have Some Influence
$\min_{\omega, b} \sum_{i=1}^{n} \frac{1}{r_i}$
Distance Weighted Discrim’n Maximal Data Piling
HDLSS Discrim’n Simulations
Main idea:
Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
HDLSS Discrim’n Simulations
Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common sample sizes: $n_+ = n_- = 25$
• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$
  (data generation sketched below)
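A minimal sketch of the spherical-Gaussian setup (the random seed, the placement of the 2.2 shift, and the evaluation loop body are assumptions filled in around the slides' n and d values): two samples of 25 points each, with means differing in dimension 1 only, generated across the listed dimensions.

import numpy as np

rng = np.random.default_rng(3)

def spherical_gaussian_sample(n, d, shift=2.2):
    """Two spherical Gaussian classes, means shifted in dim 1 only."""
    X_pos = rng.normal(size=(n, d))
    X_neg = rng.normal(size=(n, d))
    X_pos[:, 0] += shift
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y

for d in (10, 40, 100, 400, 1600):
    X_train, y_train = spherical_gaussian_sample(25, d)
    # ...fit linear SVM, DWD and MD here, then score on an independent test set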
HDLSS Discrim’n Simulations
Spherical Gaussians:
HDLSS Discrim’n Simulations
Spherical Gaussians:
• Same setup as before
• Means shifted in dim 1 only (by 2.2)
• All methods pretty good
• Harder problem for higher dimension
• SVM noticeably worse
• MD best (Likelihood method)
• DWD very close to MD
• Methods converge for higher dimension??
HDLSS Discrim’n Simulations
Outlier Mixture:
HDLSS Discrim’n Simulations
Outlier Mixture:
• 80% dim. 1 N(2.2, 1), other dims 0
• 20% dim. 1 ±100, dim. 2 ±500, others 0
• MD is a disaster, driven by outliers
• SVM & DWD are both very robust
• SVM is best
• DWD very close to SVM (insig't difference)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
HDLSS Discrim’n Simulations
Wobble Mixture:
HDLSS Discrim’n Simulations
Wobble Mixture:
• 80% dim. 1 N(2.2, 1), other dims 0
• 20% dim. 1 ±0.1, rand dim ±100, others 0
• MD still very bad, driven by outliers
• SVM & DWD are both very robust
• SVM loses (affected by margin push)
• DWD slightly better (by w'ted influence)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
HDLSS Discrim’n Simulations
Nested Spheres:
HDLSS Discrim'n Simulations
Nested Spheres:
• 1st d/2 dim's: Gaussian with var 1 or C
• 2nd d/2 dim's: the squares of the 1st dim's
  (as for 2nd degree polynomial embedding; construction sketched below)
• Each method best somewhere
• MD best in highest d (data non-Gaussian)
• Methods not comparable (realistic)
• Methods converge for higher dimension??
• HDLSS space is a strange place
Ignore RLR (a mistake)
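A minimal sketch of the nested-spheres construction (the variance constant C and the seed are assumed placeholder values): the first d/2 coordinates are Gaussian with variance 1 for one class and C for the other, and the second d/2 coordinates are their squares, i.e. a built-in degree-2 polynomial embedding.

import numpy as np

rng = np.random.default_rng(4)

def nested_spheres_sample(n, d, C=4.0):
    half = d // 2
    Z_neg = rng.normal(scale=1.0, size=(n, half))          # variance 1
    Z_pos = rng.normal(scale=np.sqrt(C), size=(n, half))   # variance C
    X = np.vstack([np.hstack([Z_pos, Z_pos ** 2]),         # squares of 1st half
                   np.hstack([Z_neg, Z_neg ** 2])])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y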
HDLSS Discrim'n Simulations
Conclusions:
• Everything (sensible) is best sometimes
• DWD often very near best
• MD weak beyond Gaussian
Caution about simulations (and examples):
• Very easy to cherry pick best ones
• Good practice in Machine Learning:
  – "Ignore method proposed, but read paper for useful comparison of others"
HDLSS Discrim’n Simulations
Caution: There are additional players
E.g. Regularized Logistic Regression also looks very competitive
Interesting Phenomenon:
All methods come together
in very high dimensions???
HDLSS Discrim’n Simulations
Can we say more about:
All methods come together
in very high dimensions???
Mathematical Statistical Question:
Mathematics behind this???
(will answer next time)