
Statistics 860 Lecture 19 © G. Wahba 2016

Based on "A Framework for Kernel Regularization With Application to Protein Clustering", by Fan Lu, Sündüz Keleş, Stephen J. Wright and Grace Wahba, PNAS, 102 (2005). Open access link available via the TRLIST link on my website: http://www.stat.wisc.edu/~wahba, or lu.etal.pnas05.pdf

and "Robust Manifold Unfolding With Kernel Regularization" by Fan Lu, Yi Lin and Grace Wahba, TR 1108, Oct. 2005. tr1108rr.pdf

1


An approach to supervised/semi-supervised machine learning is described, using Regularized Kernel Estimation. This can be used when only noisy dissimilarity data between objects in a training set is given. A particularly interesting example is provided by the use of BLAST scores in clustering and classifying protein sequence data; the BLAST scores give a pairwise dissimilarity score between protein sequences, labeled or not. A variant of the problem is appropriate as a dimension reduction tool for data that sits in a low dimensional nonlinear manifold in a high dimensional space. Tuning and computation issues and open questions will be mentioned. The next lecture will show how the RKE was used to analyze pedigree information.


OUTLINE

1. Motivation: To cluster and classify objects for which only dissimilarity information is known (ex. Protein Sequences)

2. Multidimensional Scaling, other related work

3. Dissimilarity information and RKE

4. Special case: l1 loss and trace penalty.

5. The General Convex Cone Problem

6. The "Newbie" formulation.

7. A numerical trick: subsampling

8. Eigensequence plots, truncation, tuning

9. Clustering and classification of protein sequences

10. Multiple sources of information

11. Robust manifold unfolding

2


[Figure: 3D scatter plot of the sequence space; no units on the axes]

3D representation of the sequence space for 280 proteins from the globin family. Red: α-globins, blue: β-globins, purple: myoglobins, green: a heterogeneous group of proteins from other small subfamilies within the globin family. Note that there are no units on the axes. The coordinate system has been derived from (noisy, scattered, incomplete) dissimilarity data (BLAST scores) via Regularized Kernel Estimation (RKE).

3


♣♣ Motivation: To Cluster and Classify Objects With Huge or Unconventional Descriptors

In the first figure we saw a multicategory support vector machine classifying clouds that could be characterized by a vector of real numbers that could be scaled to sit in Euclidean d space. (d = 2 in the plot, but there were actually radiance observations at 12 bandwidths.) Here we are interested in objects like images, texts, microarray gene chips, protein sequences and other objects that do not have natural, or useful, easily obtained vectors of real numbers encoding their attributes. What we do assume we have is some dissimilarity information which measures how far apart (in some sense) (a subset of) pairs of objects in the training set are. These may be crude subjective judgments, as might be obtained from a panel studying images, counts of matching words in the case of texts, projections of extremely large real vectors in the case of gene chips, or similarity information between protein sequences as might be obtained from popular techniques as found in the BLAST algorithms. By the use of regularized kernel estimation (RKE) we can turn the given information into Euclidean vectors.

4


♣♣ Multidimensional Scaling, Other Related Work

The work we will talk about is related to Multidimensional Scaling (MDS), a procedure that has been around for many years. However, the regularized kernel estimate we will obtain will work differently. In MDS, pairwise distances $d_{ij}, i, j = 1, \cdots, N$ between $N$ objects are given, generally in a higher dimension than one is interested in. In MDS a prior choice of dimension is selected, and one searches for a set $\hat d_{ij}$ of Euclidean distances in the chosen number of dimensions that best fits the data, typically in a least squares sense, that is, minimize

$$\sum_{i \le j} (d_{ij} - \hat d_{ij})^2.$$

In two or three dimensions, the results can be used to plot and visualize the N objects and see their relations. Note that the orientation of the plot is not determined. MDS is widely used in psychological studies; an example might be a plot beginning with the percent of agreement between judges.
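As a concrete point of reference, here is a minimal sketch of classical (Torgerson) MDS in Python/NumPy, which recovers coordinates from a complete distance matrix by double centering and an eigendecomposition. This is one standard variant of the least-squares idea above, not the RKE method of this lecture; the function name is illustrative.

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical (Torgerson) MDS: embed N objects in R^p from an
    N x N matrix D of pairwise Euclidean-like distances."""
    N = D.shape[0]
    # Double-center the squared distances to get an inner-product matrix B.
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (D ** 2) @ J
    # Coordinates come from the top-p eigenpairs of B.
    w, V = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:p]     # indices of the p largest
    lam = np.clip(w[idx], 0.0, None)  # guard small negatives from noise
    return V[:, idx] * np.sqrt(lam)   # N x p coordinate array

# Example: three points on a line are recovered up to rotation/reflection.
D = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.]])
X = classical_mds(D, p=1)
```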

5


♣♣ Dissimilarity Information and RKE

Given a set of N objects, suppose we have obtained a measure of dissimilarity, $d_{ij}$, for certain object pairs $(i, j)$. Regularized Kernel Estimation (RKE) finds K:

$$\min_{K \in S_N} \; \sum_{(i,j) \in \Omega} L\big(d_{ij}, \hat d_{ij}(K)\big) + \lambda J(K), \qquad (1)$$

where $S_N$ is the convex cone of all real nonnegative definite matrices of dimension N, $\Omega$ is the set of pairs with dissimilarity information $d_{ij}$, and the induced dissimilarity $\hat d_{ij}$ is

$$\hat d_{ij} = K(i, i) + K(j, j) - 2K(i, j),$$

where $K(i, j)$ is the $(i, j)$ entry of K. L measures the discrepancy between the observed and induced dissimilarity. L and J are convex in K and λ is a tuning parameter balancing fit to the data and the penalty or complexity on K.
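The induced dissimilarity above is just the squared distance implied by the inner products in K, and can be computed for all pairs at once; a one-line NumPy sketch (function name illustrative):

```python
import numpy as np

def induced_dissimilarity(K):
    """d_hat[i, j] = K[i, i] + K[j, j] - 2 K[i, j] for all pairs at once."""
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K
```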

6


No restrictions on the set of pairs other than requiring that the graph of the objects with pairs connected by edges be connected.

Observed dissimilarity information may be incomplete, may not satisfy the triangle inequality, may be noisy. It also may be crude, as for example when it encodes a small number of coded levels such as "very close", "close", "distant", and "very distant". A dimension is not specified in advance.

The literature has many examples of methods for estimating K from observations for the purpose of clustering and classification. The closest in spirit to the approach here might be Lanckriet et al. (2004) in Machine Learning, who assume that K is a linear combination of prespecified kernels and then estimate the coefficients by semidefinite programming.

7


♣♣ Special Case: $l_1$ loss and trace penalty

The special case used here is the $l_1$ loss function and trace penalty:

$$\min_{K \succeq 0} \; \sum_{(i,j) \in \Omega} |d_{ij} - \hat d_{ij}(K)| + \lambda\, \mathrm{trace}(K). \qquad (2)$$

This formulation can be posed as a special case of a general convex cone optimization problem for which efficient software is available. The sum of squares loss function with trace penalty can also be solved with convex cone software.
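For readers who want to experiment, problem (2) can be written almost verbatim in a modern modeling language. The sketch below uses CVXPY rather than the SDPT3/DSDP5 route mentioned later; the names are illustrative, and for large N this direct formulation will be slow.

```python
import cvxpy as cp

def rke_l1_trace(N, pairs, d_obs, lam):
    """Solve (2): min over K >= 0 of sum |d_ij - d_hat_ij(K)| + lam*trace(K).

    pairs : list of (i, j) index tuples (the set Omega)
    d_obs : observed dissimilarities, one per pair
    """
    K = cp.Variable((N, N), PSD=True)   # K constrained to the PSD cone
    fit = sum(cp.abs(d - (K[i, i] + K[j, j] - 2 * K[i, j]))
              for (i, j), d in zip(pairs, d_obs))
    prob = cp.Problem(cp.Minimize(fit + lam * cp.trace(K)))
    prob.solve()                        # defaults to an SDP-capable solver
    return K.value
```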

8


♣♣ The General Convex Cone Problem.

Notation:

• $R^p$ is Euclidean p-space.

• $P^p$ is the nonnegative orthant in $R^p$, that is, the set of vectors in $R^p$ whose components are all nonnegative.

• $Q^q$ is the second-order cone of dimension q, which is the set of vectors $x = (x^{(1)}, \dots, x^{(q)}) \in R^q$ that satisfy the condition $x^{(1)} \ge \big[\sum_{i=2}^{q} (x^{(i)})^2\big]^{1/2}$.

• $S^s$ is the cone of symmetric positive semidefinite $s \times s$ matrices of real numbers.

9


♣♣ The General Convex Cone Problem (cont).

$$\min_{X_j,\, x_i,\, z} \; \sum_{j=1}^{n_s} C_j \cdot X_j + \sum_{i=1}^{n_q} c_i \cdot x_i + g \cdot z \qquad (3)$$

$$\text{s.t.} \quad \sum_{j=1}^{n_s} A_{rj} \cdot X_j + \sum_{i=1}^{n_q} a_{ri} \cdot x_i + g_r \cdot z = b_r, \;\; \forall r,$$

$$X_j \in S^{s_j}, \qquad x_i \in Q^{q_i}, \qquad z \in P^p.$$

$C_j, A_{rj}$ are real symmetric matrices (not necessarily positive semidefinite) of dimension $s_j$; $c_i, a_{ri} \in R^{q_i}$; $g, g_r \in R^p$; $b_r \in R^1$.

10


The solution of a general convex cone problem can be obtained numerically using publicly available software such as SDPT3 or DSDP5.

SDPT3: Tütüncü, R. H., Toh, K. C. & Todd, M. J. (2003) Mathematical Programming 95, 189-217.

DSDP5: Benson, S. J. & Ye, Y. (2004) DSDP5: A software package implementing the dual-scaling algorithm for semidefinite programming, (Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL), Technical Report ANL/MCS-TM-255.

11


♣♣ The "Newbie" Formulation

Suppose a solution $K_N$ has been found for a "training" set of N objects. We wish to augment the optimal kernel (by one row and column), without changing any of its existing elements, to account for a new object. That is, find a new "pseudo-optimal" kernel $K_{N+1}$ of the form

$$K_{N+1} = \begin{bmatrix} K_N & b \\ b^T & c \end{bmatrix} \succeq 0,$$

(where $b \in R^N$ and c is a scalar) that solves the following optimization problem:

$$\min_{b,\, c \ge 0} \; \sum_{i \in \Omega_{newbie}} \big|\, d_{i,N+1} - \hat d_{i,N+1}(K_{N+1}) \,\big|$$

$$\text{s.t.} \quad b \in \mathrm{Range}(K_N), \qquad c - b^T K_N^{\dagger} b \ge 0.$$

12


$K_N^{\dagger}$ is the pseudo-inverse of $K_N$ and $\Omega_{newbie}$ is a subset of $\{1, 2, \dots, N\}$. The constraints in this problem are the necessary and sufficient conditions for $K_{N+1}$ to be positive semidefinite. Note that only a subset of the $d_{i,N+1}$ are required.
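A sketch of the newbie problem in CVXPY, under the parameterization b = K_N a, which enforces b ∈ Range(K_N) and turns the Schur-complement constraint into c ≥ aᵀK_N a. This reformulation and all names are mine, not from the paper, and it assumes K_N is symmetric and numerically positive semidefinite.

```python
import cvxpy as cp
import numpy as np

def solve_newbie(K_N, idx, d_new):
    """Place a new object given K_N and dissimilarities d_new[i] to the
    training objects listed in idx; returns the new column b and corner c."""
    N = K_N.shape[0]
    a = cp.Variable(N)
    c = cp.Variable(nonneg=True)
    b = K_N @ a                      # b = K_N a lies in Range(K_N)
    # Induced dissimilarity to training object i: K_N[i,i] + c - 2 b[i].
    fit = sum(cp.abs(d - (K_N[i, i] + c - 2 * b[i]))
              for i, d in zip(idx, d_new))
    # c - b^T K_N^dagger b >= 0 becomes c >= a^T K_N a when b = K_N a.
    prob = cp.Problem(cp.Minimize(fit), [cp.quad_form(a, K_N) <= c])
    prob.solve()
    return np.asarray(b.value), float(c.value)
```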

13


♣♣ Missing Data, Subsampling

The algorithm does not require a full set of $d_{ij}$. With large data sets it can work quite well with a fraction of the $N(N-1)/2$ possible $d_{ij}$.

Present algorithms require $O(m^2)$ storage, where m is the number of $d_{ij}$. To save on computer storage, even when a full set is available, a random subset of the available dissimilarity data is chosen such that each object i appears with roughly the same frequency among the (i, j) pairs of Ω. For each i, a fixed number k of pairs (i, j) with j ≠ i is chosen, as in the sketch below.

The parameter k is verified to be sufficiently large when the estimated $\hat d_{ij}$ from several different random subsets does not vary noticeably.
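A minimal sketch of this subsampling scheme (names illustrative; requires k ≤ N − 1):

```python
import numpy as np

def sample_pairs(N, k, rng=None):
    """For each object i, draw k partners j != i; return the set Omega of
    unordered pairs, so each object appears roughly equally often."""
    rng = np.random.default_rng() if rng is None else rng
    omega = set()
    for i in range(N):
        choices = rng.choice([j for j in range(N) if j != i],
                             size=k, replace=False)
        omega.update(tuple(sorted((i, j))) for j in choices)
    return sorted(omega)
```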

14


♣♣ Eigensequence Plots, Tuning and Truncation

Increasing λ will shrink the eigenvalues of the estimated $K_\lambda$. Ideally it will shrink the smaller eigenvalues while keeping the larger ones intact, thus providing clues to the minimum number of dimensions needed to retain most of the dissimilarity information.

15


♣♣ Eigensequence Plots, Tuning and Truncation (cont.)

[Figure: five log-scale eigensequence panels for λ = 1e−08, 0.1, 1, 10, 250; x-axis: Rank (1-280), y-axis: log10(eigenvalue); rightmost panel repeats λ = 1 on an expanded scale]

Log scale eigensequence plots for 5 values of λ. Protein sequence example with N = 280, k = 55. Right panel is the λ = 1 case on an expanded scale. Note natural breaks appear after both the second and the third eigenvalues. Eyeball method tuning choice: λ = 1, dimension p = 3.

16


Setting all the eigenvalues of K after the largest p to 0 results in the νth coordinate of the ith object as

$$x_i(\nu) = \sqrt{\lambda_\nu}\, \phi_\nu(i), \qquad \nu = 1, 2, \dots, p, \;\; i = 1, \cdots, N,$$

where the $\lambda_\nu, \phi_\nu$ are the first p eigenvalues and eigenvectors of K and $\phi_\nu(i)$ is the ith component of $\phi_\nu$.

For a newbie, once the $d_{i,N+1}$ are found for sufficiently many i, the coordinates of the newbie are easily found.

The coordinates of the original set of N objects are always centered at the origin since the RKE estimate of K always has the constant vector as a 0 eigenvector.
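A sketch of this truncation step in NumPy (function name illustrative), following the formula above:

```python
import numpy as np

def rke_coordinates(K, p):
    """x_i(nu) = sqrt(lambda_nu) * phi_nu(i) from the top-p eigenpairs of K."""
    w, V = np.linalg.eigh(K)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:p]     # indices of the p largest eigenvalues
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))  # N x p array
```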

17


♣♣ Clustering and Classification of Protein Sequences

A challenging problem in biology is inferring the molecular functions of unannotated proteins. One method for doing this is to examine the sequence similarity between the unannotated protein and a set of annotated proteins - those whose function is understood. The first problem is the clustering of large numbers of protein sequences into subfamilies to group similar proteins together. The second problem is to assign new unannotated proteins to the closest class, given labeled or clustered training data. (Much literature.)

Example: 630 globin sequences. Chose 280 sequences including three large subclasses of the globin family (112 alpha-globins, 101 beta-globins, 40 myoglobins) plus 27 globins in a heterogeneous category. The Bioconductor package pairseqsim was used to obtain global pairwise alignment scores for all pairs of the N = 280 sequences. k = 55, or about 36% of the total possible.

18


[Figure: 3D scatter plot of the 280-protein sequence space, color-coded by subfamily]

3D representation of the sequence space for 280 proteins from the globin family. Different subfamilies are encoded with different colors: red symbols are the alpha-globin subfamily, blue symbols are beta-globins, purple symbols represent the myoglobin subfamily, and green symbols, scattered in the middle, are a heterogeneous group encompassing proteins from other small subfamilies within the globin family.

19


Here, hemoglobin zeta chains are represented by the symbol +, fish myoglobins are marked by the purple box symbol, and the diverged alpha-globin HBAMRANCA is shown by the symbol *. Hemoglobin alpha-D chains, embedded within the alpha-globin cluster, are also highlighted.

20


[Figure: 3D scatter plot; test sequences placed in the training coordinate system]

Positioning test globin sequences in the coordinate system of 280 training sequences from the globin family. The newbie algorithm is used to locate one Hemoglobin zeta chain (black circle), one Hemoglobin theta chain (black star), and seventeen Leghemoglobins (black triangles) into the coordinate system of the training globin sequence data.

21


♣♣ CV2, pairwise cross validation. WORK IN PROGRESS

[Figure: scatter plot of the two-dimensional toy signal data]

Plot of toy (signal) data for test of CV2, pairwise cross-validation. Noisy data was generated by adding two dimensions of small Gaussian noise (σ ≈ .3), then binning the observed pairwise (four-dimensional) distances into 10 equally spaced bins.

22


♣♣ CV2, pairwise cross validation (cont.)

[Figure: Procrustes measure vs. log10(λ); x-axis: log10(λ) from −10 to 4, y-axis: Procrustes measure]

Normalized "Procrustes Measure" of the distance of $K_\lambda$ from K, the Gram matrix of the (toy) signal data:

$$d_{K, K_\lambda} = \frac{\mathrm{tr}\,K + \mathrm{tr}\,K_\lambda - 2\,\mathrm{tr}\,\big[K^{1/2} K_\lambda K^{1/2}\big]^{1/2}}{\sqrt{\mathrm{tr}\,K\,\mathrm{tr}\,K_\lambda}}.$$

Note minimum around log10(λ) = 2.25.
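A sketch of this normalized Procrustes measure using scipy.linalg.sqrtm for the matrix square roots (function name illustrative; assumes both Gram matrices are numerically positive semidefinite):

```python
import numpy as np
from scipy.linalg import sqrtm

def procrustes_measure(K, K_lam):
    """Normalized Procrustes distance between two Gram matrices."""
    R = sqrtm(K)                       # K^{1/2}
    cross = sqrtm(R @ K_lam @ R)       # [K^{1/2} K_lam K^{1/2}]^{1/2}
    num = np.trace(K) + np.trace(K_lam) - 2.0 * np.trace(cross).real
    return num / np.sqrt(np.trace(K) * np.trace(K_lam))
```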

23


An object space measure of distance between two kernels is

$$d_{K_\alpha, K_\beta} = \min_{M_{\alpha\beta} \in O} \| X_\alpha - M_{\alpha\beta} X_\beta \|^2$$

where O is the class of rotation matrices and the X's are arrays of the fitted coordinates: $X_\alpha' X_\alpha = K_\alpha$, similarly for $X_\beta$. Interestingly enough, this gives the same result as the unnormalized "Procrustes Measure" (see "Wahba's Problem", SIAM Review 8, 384-386 (1966), wahbasproblem.pdf).

24


♣♣ CV2, pairwise cross validation.

[Figure: six eigensequence panels for log10(λ) = −9, −1, 2, 2.25, 2.4, 2.5; x-axis: Rank (0-100), y-axis: log10(eigenvalue)]

Eigensequences as a function of λ for the $K_\lambda$ for the noisy toy data. Recall that the minimum of the procrustes measure is log10(λ) ≈ 2.25-2.4.

25


♣♣ CV2, pairwise cross validation (cont.)

To do CV2, the pairwise cross-validation, a separate tuning set with dissimilarity information is needed.

1. Fix λ. Solve the newbie problem separately for objects r and s in the tuning set to get $\hat d_{ir}$ and $\hat d_{is}$ for selected $i = i_1, \cdots, i_L$.

2. Using $\hat d_{ir}$ and $\hat d_{is}$ for $i_1, \cdots, i_L$, obtain the newbie coordinate vectors $x_r = x_r(\lambda)$ and $x_s = x_s(\lambda)$.

3. Compare the predicted distances $|x_r - x_s|$ to the observed dissimilarities $d_{rs}$ to get CV(λ):

$$CV(\lambda) = \sum_{r,s} \big|\, |x_r(\lambda) - x_s(\lambda)| - d_{rs} \,\big|.$$

4. Increment λ. A sketch of this loop appears below.
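A schematic of the CV2 loop in Python, reusing a hypothetical coords_fn that wraps the solve_newbie and rke_coordinates sketches from earlier slides; this is my paraphrase of the steps above, not code from the paper:

```python
import numpy as np
from itertools import combinations

def cv2_score(coords_fn, tune_ids, d, lambdas):
    """coords_fn(lam, r) -> coordinate vector x_r(lam) for tuning object r
    (e.g., built from solve_newbie plus the truncated eigen-coordinates);
    d[(r, s)] holds the observed tuning-set dissimilarities."""
    scores = {}
    for lam in lambdas:                               # step 4: loop over lambda
        x = {r: coords_fn(lam, r) for r in tune_ids}  # steps 1-2
        scores[lam] = sum(                            # step 3
            abs(np.linalg.norm(x[r] - x[s]) - d[(r, s)])
            for r, s in combinations(tune_ids, 2))
    return scores
```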

26


♣♣ CV2, pairwise cross validation.

[Figure: normalized CV2 vs. log10(λ); x-axis: log10(λ) from −10 to 4, y-axis: normalized CV]

CV2 plot. Tuning set of 30 objects to give $\binom{30}{2}$ pairs. Using 95% of the trace to evaluate the CV2.

27


♣♣ CV2, pairwise cross validation.

[Figure: CV2 vs. log10(λ); x-axis: log10(λ) from −10 to 4, y-axis: CV]

CV2 plot. Tuning set of 30 objects to give $\binom{30}{2}$ pairs. Using only the rank 2 approximation to $K_\lambda$ to evaluate CV2. (In this toy problem we know that the "right" answer is 2.)

28


♣♣ Multiple Sources of Information

In the case of proteins, there are other sources of information than sequence data, namely various kinds of structure information. The first question that arises when there are different sources of information might be: Are they telling you the same thing? When only (relative) distance information is available, the "Procrustes" measure provides pairwise dissimilarities between RKE's. Alternatively a simple measure of correlation is:

$$\frac{\sum_{ij} d_{ij\alpha}^{s/2}\, d_{ij\beta}^{s/2}}{\big(\sum_{ij} d_{ij\alpha}^{s}\big)^{1/2} \big(\sum_{ij} d_{ij\beta}^{s}\big)^{1/2}}$$

where α and β index the different sources of information and s is a real number to be chosen.
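This correlation is a cosine similarity between the elementwise powers $d^{s/2}$ of the two dissimilarity arrays; a minimal NumPy sketch (function name illustrative):

```python
import numpy as np

def source_correlation(d_alpha, d_beta, s=1.0):
    """Cosine-style correlation between two pairwise dissimilarity arrays."""
    a, b = d_alpha ** (s / 2), d_beta ** (s / 2)
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
```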

29


♣♣ Combining Multiple Sources of Information

With labeled data, these kernels can further be examined and combined in an optimal way for classification. One way to do that follows by letting

$$K = \sum_{\alpha=1}^{p} \mu_\alpha K_\alpha \qquad (4)$$

and tuning the $\mu_\alpha$ to optimize the classification in a Support Vector Machine:

G. Lanckriet, N. Cristianini, P. Bartlett, and L. El Ghaoui. Learning the kernel matrix with semidefinite programming. J. Mach. Learning Res., 5:27-72, 2004.

Numerous proposals have been made for in-sample tuning of SVM's; along with those in Lanckriet et al., see

G. Wahba, Y. Lin, Y. Lee, and H. Zhang. On the relation between the GACV and Joachims' ξα method for tuning support vector machines, with extensions to the non-standard case. Technical Report 1039, Statistics Department, University of Wisconsin, Madison WI, 2001. tr1039.pdf

and elsewhere for references.

30


Alternatively, given a fitted kernel for one source of data, an interesting question is how new information from a different source is best incorporated. The answer can be expected to be different according to whether classification (given a labeled training set) or clustering (with unlabeled data) is the goal.

Numerous open questions.


♣♣ Robust Manifold Unfolding

[Figure: six panels for noise cases λ = 0.002 and λ = 0.0025; (a) the roll, (b) True Parameterization, (c), (e) log(eigenvalue) sequences, (d), (f) First 2 Principal Coordinates]

Unrolling the Wisconsin Roll (a.k.a. "Swiss Roll"): (a), (b) "truth" and truth unrolled, (c), (d) eigenvalues and flattened manifold, noise case 1, (e), (f) noise case 2.

$$\min_{K \succeq 0} \; \sum_{6\text{-nn pairs } (i,j)} |d_{ij} - \hat d_{ij}(K)| - \lambda\, \mathrm{trace}(K).$$

F. Lu, Y. Lin, and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1108, Department of Statistics, University of Wisconsin, Madison WI, October, 2005. tr1108rr.pdf

31