Object Oriented Data Analysis, Last Time
• HDLSS Discrimination
– MD much better
• Maximal Data Piling
– HDLSS space is a strange place
• Kernel Embedding
– Embed data in higher dimensional manifold
– Gives greater flexibility to linear methods
– Which manifold? Radial basis functions
– Careful about overfitting?
Kernel Embedding
Aizerman, Braverman and Rozonoer (1964)
• Motivating idea: extend scope of linear discrimination, by adding nonlinear components to data (embedding in a higher dimensional space)
• Better use of name: nonlinear discrimination?
Kernel Embedding
Stronger effects for higher order polynomial embedding:
E.g. for cubic, $x \mapsto (x, x^2, x^3): \mathbb{R} \to \mathbb{R}^3$,
linear separation can give 4 parts (or fewer)
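Why 4 parts: a linear rule in the embedded space is the sign of a cubic polynomial in x, and a cubic has at most 3 real roots. A minimal sketch, assuming a hand-picked hyperplane (the names `cubic_embed`, `linear_rule` and the weights `w`, `b` are illustrative, not from the slides):

```python
def cubic_embed(x):
    """Embed a scalar x as the point (x, x^2, x^3) in R^3."""
    return (x, x ** 2, x ** 3)

def linear_rule(z, w, b):
    """Class label from a linear rule in the embedded space: sign(w . z + b)."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1

# Hypothetical hyperplane: w . (x, x^2, x^3) + b = x^3 - x = (x + 1) x (x - 1)
w, b = (-1.0, 0.0, 1.0), 0.0

# Grid on (-2, 2), offset so that no grid point sits exactly on a root
grid = [-2.0 + 0.01 * k + 0.005 for k in range(400)]
labels = [linear_rule(cubic_embed(x), w, b) for x in grid]

# Count the maximal runs of constant label along the line
n_parts = 1 + sum(1 for a, c in zip(labels, labels[1:]) if a != c)
print(n_parts)  # prints 4
```

A hyperplane whose cubic has fewer real roots gives fewer parts, which is the "(or fewer)" case.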
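Kernel Embedding
General View: for original data matrix:
$$\begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{d1} & \cdots & x_{dn} \end{pmatrix}$$
add rows:
$$\begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{d1} & \cdots & x_{dn} \\ x_{11}^2 & \cdots & x_{1n}^2 \\ \vdots & & \vdots \\ x_{d1}^2 & \cdots & x_{dn}^2 \\ x_{11} x_{21} & \cdots & x_{1n} x_{2n} \\ \vdots & & \vdots \end{pmatrix}$$
i.e. embed in Higher Dimensional Space
Then slice with a hyperplane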
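The "add rows" step can be sketched as follows, assuming the data matrix is stored as d rows (variables) by n columns (cases) and that only squares and pairwise products are added. The function name `polynomial_embed` is hypothetical:

```python
from itertools import combinations

def polynomial_embed(data):
    """Append squared rows and pairwise product rows to a d x n data matrix.

    `data` is a list of d rows, each holding n values (one column per case).
    """
    squares = [[v * v for v in row] for row in data]
    products = [
        [a * b for a, b in zip(data[j], data[k])]
        for j, k in combinations(range(len(data)), 2)
    ]
    return data + squares + products

X = [[1.0, 2.0, 3.0],   # first variable, n = 3 cases
     [4.0, 5.0, 6.0]]   # second variable
embedded = polynomial_embed(X)
for row in embedded:
    print(row)
```

For d = 2 this gives 2 + 2 + 1 = 5 rows. Any linear method applied to the embedded matrix is then a quadratic method in the original variables.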
Kernel Embedding
Polynomial Embedding, Toy Example 3: Donut
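A rough stand-in for the donut example, since the original figures are not in the transcript. This sketch assumes simulated data (a central blob surrounded by a ring), the Mean Difference rule as the linear method, and an embedding that only adds squared coordinates. All names and radii are illustrative choices:

```python
import math
import random

def sample_ring(n, r_min, r_max, rng):
    """Draw n points uniformly from the ring r_min <= radius <= r_max."""
    points = []
    for _ in range(n):
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

def embed(p):
    """Quadratic embedding: append the squared coordinates."""
    x1, x2 = p
    return (x1, x2, x1 * x1, x2 * x2)

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def md_accuracy(pos, neg):
    """Fit the Mean Difference rule and return its training accuracy."""
    m_pos, m_neg = mean(pos), mean(neg)
    w = [a - b for a, b in zip(m_pos, m_neg)]
    mid = [(a + b) / 2.0 for a, b in zip(m_pos, m_neg)]

    def score(v):
        return sum(wi * (vi - ci) for wi, vi, ci in zip(w, v, mid))

    correct = sum(score(v) > 0 for v in pos) + sum(score(v) < 0 for v in neg)
    return correct / (len(pos) + len(neg))

rng = random.Random(0)
core = sample_ring(200, 0.0, 1.0, rng)   # class +1: the hole of the donut
ring = sample_ring(200, 3.0, 4.0, rng)   # class -1: the donut itself

acc_raw = md_accuracy(core, ring)
acc_embedded = md_accuracy([embed(p) for p in core], [embed(p) for p in ring])
print(acc_raw, acc_embedded)
```

In the original coordinates both class means sit near the origin, so no hyperplane helps. After embedding, the squared coordinates carry the radius, and a hyperplane there corresponds to a circle in the original plane.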
Kernel Embedding
Toy Example 4: Checkerboard
Very Challenging!
Linear Method?
Polynomial Embedding?
Kernel Embedding
Toy Example 4: Checkerboard
Polynomial Embedding:
• Very poor for linear
• Slightly better for higher degrees
• Overall very poor
• Polynomials don’t have needed flexibility
Kernel Embedding
Toy Example 4: Checkerboard
Radial Basis Embedding + FLD Is Excellent!
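A minimal sketch of radial basis embedding followed by FLD on a simulated checkerboard. Several choices here are assumptions rather than the lecture's settings: a 4 x 4 board on the unit square, Gaussian bumps centered at the training points, bandwidth `sigma=0.1`, and a small ridge term to keep the within-class covariance invertible. It uses NumPy and reports training accuracy only:

```python
import numpy as np

def checkerboard_labels(X, cells=4):
    """Label points in [0, 1)^2 by the colour of their checkerboard cell."""
    idx = np.floor(X * cells).astype(int)
    return np.where((idx[:, 0] + idx[:, 1]) % 2 == 0, 1, -1)

def rbf_embed(X, centers, sigma):
    """Map each row of X to Gaussian bumps centred at the given points."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_fld(F, y, ridge=1e-3):
    """Fisher Linear Discrimination, with a ridge term added for stability."""
    F_pos, F_neg = F[y == 1], F[y == -1]
    m_pos, m_neg = F_pos.mean(axis=0), F_neg.mean(axis=0)
    centered = np.vstack([F_pos - m_pos, F_neg - m_neg])
    S_w = centered.T @ centered / len(F)           # pooled within-class covariance
    w = np.linalg.solve(S_w + ridge * np.eye(F.shape[1]), m_pos - m_neg)
    b = -w @ (m_pos + m_neg) / 2.0                 # cutoff halfway between class means
    return w, b

def accuracy(F, y, w, b):
    return float(np.mean(np.sign(F @ w + b) == y))

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 2))
y = checkerboard_labels(X)

# FLD on the original two coordinates
w_lin, b_lin = fit_fld(X, y)
acc_linear = accuracy(X, y, w_lin, b_lin)

# FLD after radial basis embedding (one new coordinate per training point)
F = rbf_embed(X, X, sigma=0.1)
w_rbf, b_rbf = fit_fld(F, y)
acc_rbf = accuracy(F, y, w_rbf, b_rbf)
print(acc_linear, acc_rbf)
```

The bandwidth plays the role of a tuning parameter: too large and the bumps cannot follow the cells, too small and the rule starts to overfit individual points.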
Kernel Embedding
Other types of embedding:
• Explicit
• Implicit
Will be studied soon, after introduction to Support Vector Machines…
Kernel Embedding
There are generalizations of this idea to other types of analysis, & some clever computational ideas.
E.g. “Kernel based, nonlinear Principal Components Analysis”
Ref: Schölkopf, Smola and Müller (1998)
Support Vector Machines
Motivation:
• Find a linear method that “works well” for embedded data
• Note: Embedded data are very non-Gaussian
• Suggests value of really new approach
Support Vector Machines
Classical References:
• Vapnik (1982)
• Boser, Guyon & Vapnik (1992)
• Vapnik (1995)
Excellent Web Resource:
• http://www.kernel-machines.org/
Support Vector Machines
Recommended tutorial:
• Burges (1998)
Recommended Monographs:
• Cristianini & Shawe-Taylor (2000)
• Schölkopf & Smola (2002)
Support Vector Machines
Graphical View, using Toy Example:
• Find separating plane
• To maximize distances from data to plane
• In particular smallest distance
• Data points closest are called support vectors
• Gap between is called margin
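The quantities in this graphical view can be computed directly for a given plane. A minimal sketch, assuming hypothetical 2-d toy points and a hand-chosen plane (nothing is optimized here), and taking the margin as the gap between the two classes, i.e. twice the smallest distance:

```python
import math

def signed_distance(x, w, b):
    """Signed distance from point x to the plane {z : w . z + b = 0}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical toy data and candidate separating plane x1 + x2 = 2
points = [(2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -2.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = (1.0, 1.0), -2.0

# y_i * distance is positive when x_i lies on its own side of the plane
dists = [y * signed_distance(x, w, b) for x, y in zip(points, labels)]
smallest = min(dists)

# Points attaining the smallest distance are the support vectors
support = [x for x, d in zip(points, dists) if math.isclose(d, smallest)]
margin = 2.0 * smallest
print(support, margin)
```

The SVM searches over all planes for the one that makes `smallest` as large as possible. Moving any non-support point (without crossing the margin) leaves that plane unchanged.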
SVMs, Optimization Viewpoint
Formulate Optimization problem, based on:
• Data (feature) vectors
• Class Labels
• Normal Vector
• Location (determines intercept)
• Residuals (right side)
• Residuals (wrong side)
• Solve (convex problem) by quadratic programming
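The symbols for these ingredients did not survive in the transcript. For orientation, here is the standard soft-margin formulation in assumed notation, which may differ from the notation on the original slide: data vectors $x_i$, class labels $y_i = \pm 1$, normal vector $w$, location $b$, wrong-side residuals $\xi_i$, and a tuning parameter $C$:

```latex
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\,\|w\|^2 \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( x_i^t w + b \right) \;\ge\; 1 - \xi_i ,
\qquad \xi_i \ge 0 , \qquad i = 1, \dots, n .
```

The objective is quadratic and the constraints are linear, which is why this is a convex problem that quadratic programming can solve.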
SVMs, Computation
Major Computational Point:
• Classifier only depends on data through inner products!
• Thus enough to only store inner products
• Creates big savings in optimization
• Especially for HDLSS data
• But also creates variations in kernel embedding (interpretation?!?)
• This is almost always done in practice
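A small sketch of the storage point, assuming simulated Gaussian data with d = 1000 and n = 5 (an HDLSS setting chosen for illustration). The helper `gram_matrix` is a hypothetical name:

```python
import random

def gram_matrix(vectors):
    """All pairwise inner products x_i . x_j for a list of data vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in vectors] for xi in vectors]

rng = random.Random(0)
d, n = 1000, 5                      # high dimension, low sample size
data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

G = gram_matrix(data)
stored_raw = d * n                  # numbers needed to hold the data
stored_gram = n * n                 # numbers needed to hold the inner products
print(stored_raw, stored_gram)
```

Once the n x n matrix of inner products is stored, the optimization never needs the d-dimensional vectors again.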
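SVMs, Computation & Embedding
For an “Embedding Map”, $\Phi(x)$,
e.g. $\Phi(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}$
Explicit Embedding:
Maximize:
$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^t \Phi(x_j)$$
Get classification function:
$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, \Phi(x_i)^t \Phi(x) + b$$
• Straightforward application of embedding
• But loses inner product advantage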
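SVMs, Computation & Embedding
Implicit Embedding:
Maximize:
$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi\left(x_i^t x_j\right)$$
Get classification function:
$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, \Phi\left(x_i^t x\right) + b$$
• Still defined only via inner products
• Retains optimization advantage
• Thus used very commonly
• Comparison to explicit embedding?
• Which is “better”???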
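One case where the explicit and implicit routes coincide exactly is the degree-2 polynomial kernel, a standard identity. The sketch below assumes 2-d inputs, the kernel (1 + x·z)^2, and made-up dual weights `alphas` and intercept `b` (not the output of any solver). It evaluates the classification function both ways:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 embedding of a 2-d point (6 coordinates)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2)

def kernel(x, z):
    """Implicit version: a function of the original inner product only."""
    return (1.0 + dot(x, z)) ** 2

def classify(x, train, labels, alphas, b, inner):
    """f(x) = sum_i alpha_i y_i inner(x_i, x) + b, with a pluggable inner product."""
    return sum(a * y * inner(xi, x) for a, y, xi in zip(alphas, labels, train)) + b

# Hypothetical training points and dual weights, for illustration only
train = [(1.0, 2.0), (-1.0, 0.5), (0.0, -1.0)]
labels = [1, -1, 1]
alphas = [0.3, 0.5, 0.2]
b = -0.1
x_new = (0.5, -1.5)

f_explicit = classify(x_new, train, labels, alphas, b, lambda u, v: dot(phi(u), phi(v)))
f_implicit = classify(x_new, train, labels, alphas, b, kernel)
print(f_explicit, f_implicit)
```

The implicit route never forms the 6-d vectors, which is the retained inner product advantage. For other choices of the function applied to the inner product there need not be such a tidy explicit counterpart, which is one reading of the "interpretation?!?" remark.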
SVMs & Robustness
Usually not severely affected by outliers,
But a possible weakness: Can have very influential points
Toy E.g., only 2 points drive SVM
Notes:
• Huge range of chosen hyperplanes
• But all are “pretty good discriminators”
• Only happens when whole range is OK???
• Good or bad?
SVMs & Robustness
Effect of violators (toy example):
• Depends on distance to plane
• Weak for violators nearby
• Strong as they move away
• Can have major impact on plane
• Also depends on tuning parameter C
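One way to see where this behavior comes from, assuming the standard soft-margin objective (each point contributes C times its hinge loss). The plane and the violator positions below are hypothetical:

```python
def violation_penalty(x, y, w, b, C):
    """Soft-margin penalty C * max(0, 1 - y * (w . x + b)) for one point."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return C * max(0.0, 1.0 - y * score)

w, b = (1.0, 0.0), 0.0              # hypothetical plane x1 = 0, class +1 on the right
depths = (0.0, 1.0, 5.0)            # how far a class +1 point sits on the wrong side

for C in (0.1, 10.0):
    penalties = [violation_penalty((-t, 0.0), 1, w, b, C) for t in depths]
    print(C, penalties)
```

The penalty grows linearly as the violator moves away from the plane and is scaled by C, so a distant violator with large C pulls hard on the fitted plane, while a nearby one or a small C pulls weakly.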
SVMs, Computation
Caution: available algorithms are not created equal
Toy Example:
• Gunn’s Matlab code
• Todd’s Matlab code
Serious errors in Gunn’s version, does not find real optimum…