Introduction Kernel Clustering Methods Spectral clustering Complex Networks An Introduction to Data Clustering 3 Francesco Masulli DIBRIS - Dip. Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi, University of Genova, ITALY & S.H.R.O. - Sbarro Institute for Cancer Research and Molecular Medicine Temple University, Philadelphia, PA, USA email: [email protected] ML-CI-2017 Francesco Masulli Introduction to Data Clustering 3

Oct 14, 2020

Page 1: An Introduction to Data Clustering 3 - unige.it · Francesco Masulli Introduction to Data Clustering 3. Introduction Kernel Clustering Methods Spectral clustering Complex Networks

Introduction · Kernel Clustering Methods · Spectral clustering · Complex Networks

An Introduction to Data Clustering 3

Francesco Masulli

DIBRIS - Dip. Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi, University of Genova, ITALY

& S.H.R.O. - Sbarro Institute for Cancer Research and Molecular Medicine, Temple University, Philadelphia, PA, USA

email: [email protected]

ML-CI-2017

Francesco Masulli Introduction to Data Clustering 3


Outline

1 Introduction

2 Kernel Clustering Methods

3 Spectral clustering

4 Complex Networks


Clustering Paradigms

Partitive (central) clustering: methods that seek a single partition of the data, often by optimizing an appropriate objective function. A good cluster is one in which the distances between the points and the cluster centroid are small (cluster compactness).


Clustering paradigms

Vicinity (connectivity) clustering: a good cluster is one in which each point shares the same cluster label as its nearest neighbor ⇒ it can represent any cluster shape that is an arbitrary manifold in the data space (Shared Nearest Neighbor Clustering (Jarvis et al., 1973; Ertoz et al., 2013), Spectral clustering (Filippone, 2008)).


Kernel and Spectral Clustering (Filippone, 2008)

Kernel clustering uses kernels (Mercer, 1909; Aronszajn, 1950; Saitoh, 1988) to kernelize the distance (metric) or to project data into high-dimensional spaces.
Spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.
Kernel and spectral clustering are able to model generic shapes of densities.
Fuzzy clustering is fruitfully used to extend the power of kernel and spectral clustering methods.


Kernel Clustering Methods

In machine learning, the use of kernel functions (Mercer, 1909) was introduced by Aizerman et al. in 1964. In 1995 Cortes and Vapnik introduced Support Vector Machines (SVMs), which perform better than other classification algorithms on several problems.
The success of SVMs has led to extending the use of kernels to other learning algorithms (e.g., Kernel PCA (Scholkopf, 1998)).
The choice of the kernel is crucial for incorporating a priori knowledge about the application, for which it is possible to design ad hoc kernels.


Mercer kernels (Aronszajn, 1950; Saitoh, 1988)

We consider, for the sake of simplicity, vectors in $\mathbb{R}^d$ instead of $\mathbb{C}^d$.

Definition (Positive definite kernel)

Let $X = \{x_1, \dots, x_n\}$ be a nonempty set where $x_i \in \mathbb{R}^d$. A function $K : X \times X \to \mathbb{R}$ is called a positive definite kernel (or Mercer kernel) if and only if $K$ is symmetric (i.e., $K(x_i, x_j) = K(x_j, x_i)$) and

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) \ge 0 \quad \forall n \ge 2, \qquad (1)$$

where $c_r \in \mathbb{R}$ $\forall r = 1, \dots, n$.
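As a rough illustration of condition (1), the sketch below spot-checks a candidate kernel on a small point set: it tests symmetry and evaluates the double sum for randomly drawn coefficients $c_r$. The helper names (`quadratic_form`, `looks_positive_definite`), the sample points, the tolerance, and the counter-example `negated_dot` are assumptions made for this example. Passing the check is only evidence, not a proof, that a function is a Mercer kernel.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def negated_dot(x, y):
    """Symmetric but NOT positive definite: the negated dot product."""
    return -sum(a * b for a, b in zip(x, y))

def quadratic_form(kernel, points, c):
    """Left-hand side of Eq. 1: sum_i sum_j c_i c_j K(x_i, x_j)."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(c, points)
               for cj, xj in zip(c, points))

def looks_positive_definite(kernel, points, trials=200, tol=1e-9, seed=0):
    """Spot check: symmetry, then Eq. 1 for randomly drawn coefficients."""
    rng = random.Random(seed)
    for xi in points:
        for xj in points:
            if abs(kernel(xi, xj) - kernel(xj, xi)) > tol:
                return False
    for _ in range(trials):
        c = [rng.uniform(-1.0, 1.0) for _ in points]
        if quadratic_form(kernel, points, c) < -tol:
            return False
    return True

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
gaussian_ok = looks_positive_definite(gaussian_kernel, points)
negated_ok = looks_positive_definite(negated_dot, points)
```

The Gaussian kernel passes, while the negated dot product fails because its quadratic form equals minus a squared norm.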


Mercer kernels (Aronszajn, 1950; Saitoh, 1988)

Each Mercer kernel can be expressed as follows:

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j), \qquad (2)$$

where $\Phi : X \to F$ performs a mapping from the input space $X$ to a high-dimensional feature space $F$.
One of the most relevant aspects in applications is that it is possible to compute Euclidean distances in $F$ without knowing $\Phi$ explicitly.


Mercer kernels (Aronszajn, 1950; Saitoh, 1988)

This can be done using the so-called distance kernel trick (Muller, 2001; Scholkopf, 1998):

$$\begin{aligned}
\|\Phi(x_i) - \Phi(x_j)\|^2 &= (\Phi(x_i) - \Phi(x_j)) \cdot (\Phi(x_i) - \Phi(x_j)) \\
&= \Phi(x_i) \cdot \Phi(x_i) + \Phi(x_j) \cdot \Phi(x_j) - 2\,\Phi(x_i) \cdot \Phi(x_j) \\
&= K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j) \qquad (3)
\end{aligned}$$

in which the computation of distances between vectors in feature space is just a function of the input vectors.
In fact, every algorithm in which input vectors appear only in dot products with other input vectors can be kernelized (Scholkopf, 2001).
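The identity in Eq. 3 can be checked numerically. The sketch below assumes, purely for illustration, a polynomial kernel of degree 2 on 2-D inputs, for which an explicit feature map can be written down. The names `poly2_kernel`, `poly2_feature_map`, and `kernel_distance_sq` are hypothetical helpers, not part of the lecture.

```python
import math

def poly2_kernel(x, y):
    """Polynomial kernel of degree 2: (1 + x . y)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2

def poly2_feature_map(x):
    """Explicit feature map of poly2_kernel for 2-D inputs (illustration only)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2)

def kernel_distance_sq(kernel, xi, xj):
    """Eq. 3: squared feature-space distance from kernel evaluations only."""
    return kernel(xi, xi) + kernel(xj, xj) - 2.0 * kernel(xi, xj)

xi, xj = (1.0, 2.0), (-0.5, 3.0)

# Distance computed the long way, through the explicit 6-D feature vectors.
explicit = sum((a - b) ** 2
               for a, b in zip(poly2_feature_map(xi), poly2_feature_map(xj)))

# Same distance without ever building Phi.
implicit = kernel_distance_sq(poly2_kernel, xi, xj)
```

Both routes give the same value up to rounding. For kernels such as the Gaussian, only the second route is practical.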


Mercer kernels (Aronszajn, 1950; Saitoh, 1988)

In order to simplify the notation we introduce the so-called Gram matrix $K$, where each element $k_{ij}$ is the scalar product $\Phi(x_i) \cdot \Phi(x_j)$.
Thus, Eq. 3 can be rewritten as:

$$\|\Phi(x_i) - \Phi(x_j)\|^2 = k_{ii} + k_{jj} - 2k_{ij}. \qquad (4)$$


Kernel Clustering Methods

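Examples of Mercer kernels (Vapnik, 1995):

linear:

$$K^{(l)}(x_i, x_j) = x_i \cdot x_j \qquad (5)$$

polynomial of degree $p$:

$$K^{(p)}(x_i, x_j) = (1 + x_i \cdot x_j)^p, \quad p \in \mathbb{N} \qquad (6)$$

Gaussian:

$$K^{(g)}(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \quad \sigma \in \mathbb{R} \qquad (7)$$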
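A minimal sketch of the three kernels in Eqs. 5-7 and of the Gram matrix they induce follows. The function names, the default parameter values, and the sample points are assumptions chosen for the example.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """Eq. 5."""
    return dot(x, y)

def polynomial_kernel(x, y, p=2):
    """Eq. 6 with integer degree p."""
    return (1.0 + dot(x, y)) ** p

def gaussian_kernel(x, y, sigma=1.0):
    """Eq. 7 with width sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(kernel, points):
    """Matrix of all pairwise kernel values k_ij = K(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in points] for xi in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K_lin = gram_matrix(linear_kernel, points)
K_gau = gram_matrix(gaussian_kernel, points)
```

Note that the Gaussian Gram matrix always has ones on its diagonal, a property used later when the metric is kernelized.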


Kernel Clustering Methods

The use of the linear kernel simply leads to the computation of the Euclidean norm in the input space. Indeed,

$$\begin{aligned}
\|x_i - x_j\|^2 &= x_i \cdot x_i + x_j \cdot x_j - 2\,x_i \cdot x_j \\
&= K^{(l)}(x_i, x_i) + K^{(l)}(x_j, x_j) - 2K^{(l)}(x_i, x_j) \\
&= \|\Phi(x_i) - \Phi(x_j)\|^2, \qquad (8)
\end{aligned}$$

shows that choosing the kernel $K^{(l)}$ implies $\Phi = I$ (where $I$ is the identity function).
⇒ Kernels can offer a more general way to represent the elements of a set $X$ and possibly, for some of these representations, the clusters can be easily identified.


Kernel Clustering Methods

In the literature, the applications of kernels to clustering can be divided into three categories, based respectively on:

kernelization of the metric (Wu, 2003; Zhang, 2003; Zhang, 2004);
clustering in feature space (Graepel, 1998; Inokuchi, 2004; MacDonald, 2000; Qin, 2004; Zhang, 2002);
description via support vectors (Camastra, 2005; BenHur, 2001).


Kernel Clustering Methods

Methods based on kernelization of the metric look for centroids in input space, and the distances between patterns and centroids are computed by means of kernels:

$$\|\Phi(x_h) - \Phi(v_i)\|^2 = K(x_h, x_h) + K(v_i, v_i) - 2K(x_h, v_i). \qquad (9)$$

Clustering in feature space is performed by mapping each pattern using the function $\Phi$ and then computing centroids in feature space ($v_i^\Phi$). It is possible to compute the distances $\|\Phi(x_h) - v_i^\Phi\|^2$ by means of the kernel trick.

The description via support vectors makes use of One Class SVM to find a minimum enclosing sphere in feature space that encloses almost all data, excluding outliers. The computed hypersphere corresponds to nonlinear surfaces in input space enclosing groups of patterns. The Support Vector Clustering algorithm assigns labels to the patterns in input space enclosed by the same surface.


Kernel K-Means

Given the data set $X$, we map our data into some feature space $F$ by means of a nonlinear map $\Phi$, and we consider $k$ centers in feature space ($v_i^\Phi \in F$ with $i = 1, \dots, k$) (Girolami, 2002; Scholkopf, 1998).
We call the set $V^\Phi = (v_1^\Phi, \dots, v_k^\Phi)$ the Feature Space Codebook, since in our representation the centers in the feature space play the same role as the codevectors in the input space.
In analogy with the codevectors in the input space, we define for each center $v_i^\Phi$ its Voronoi Region and Voronoi Set in feature space.


Kernel K-Means

The Voronoi Region in feature space ($R_i^\Phi$) of the center $v_i^\Phi$ is the set of all vectors in $F$ for which $v_i^\Phi$ is the closest vector:

$$R_i^\Phi = \left\{ x^\Phi \in F \;\middle|\; i = \arg\min_j \left\| x^\Phi - v_j^\Phi \right\| \right\}. \qquad (10)$$

The set of the Voronoi Regions in feature space defines a Voronoi Tessellation of the Feature Space.


Kernel K-Means

The Kernel K-Means algorithm has the following steps:

1 Project the data set $X$ into a feature space $F$ by means of a nonlinear mapping $\Phi$.
2 Initialize the codebook $V^\Phi = (v_1^\Phi, \dots, v_k^\Phi)$ with $v_i^\Phi \in F$.
3 Compute for each center $v_i^\Phi$ the Voronoi set $\pi_i^\Phi$.
4 Update the codevectors $v_i^\Phi$ in $F$:

$$v_i^\Phi = \frac{1}{|\pi_i^\Phi|} \sum_{x \in \pi_i^\Phi} \Phi(x) \qquad (11)$$

5 Go to step 3 until no $v_i^\Phi$ changes.
6 Return the feature space codebook.

This algorithm minimizes the quantization error in feature space.


Kernel K-Means

Since we do not know $\Phi$ explicitly, it is not possible to compute Eq. 11 directly.
Nevertheless, it is always possible to compute distances between patterns and codevectors by using the kernel trick, which yields the Voronoi sets in feature space $\pi_i^\Phi$.
Indeed, writing each centroid in feature space as a combination of data vectors in feature space we have:

$$v_j^\Phi = \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h), \qquad (12)$$

where

$$\gamma_{jh} = \begin{cases} \dfrac{1}{|\pi_j^\Phi|} & \text{if } x_h \in \pi_j^\Phi \\ 0 & \text{otherwise,} \end{cases} \qquad (13)$$

so that Eq. 12 coincides with the mean in Eq. 11.


Kernel K-Means

Now the quantity:

$$\left\| \Phi(x_i) - v_j^\Phi \right\|^2 = \left\| \Phi(x_i) - \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h) \right\|^2 \qquad (14)$$

can be expanded by using the scalar product and the kernel trick in Eq. 3:

$$\left\| \Phi(x_i) - \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h) \right\|^2 = k_{ii} - 2 \sum_{h} \gamma_{jh} k_{ih} + \sum_{r} \sum_{s} \gamma_{jr} \gamma_{js} k_{rs}. \qquad (15)$$

This makes it possible to compute the closest feature space codevector for each pattern and to update the coefficients $\gamma_{jh}$. Operations (13) and (15) can be iterated until no $\gamma_{jh}$ changes, which yields a Voronoi tessellation of the feature space.
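The iteration just described can be sketched on a precomputed Gram matrix. In this illustration the centroid weights are taken as $1/|\pi_j^\Phi|$ on the members of cluster $j$, so that each center is the mean in Eq. 11. The function names, the Gaussian kernel, the toy 1-D data, the starting labels, and the handling of empty clusters are all assumptions of the example, not choices prescribed by the lecture.

```python
import math

def gaussian_gram(points, sigma=1.0):
    """Gram matrix k_ij of the Gaussian kernel for a list of points."""
    def k(x, y):
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return [[k(x, y) for y in points] for x in points]

def kernel_kmeans(K, k, init_labels, max_iter=100):
    """Batch kernel K-Means using only the Gram matrix K (Eq. 15)."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[h for h in range(n) if labels[h] == j] for j in range(k)]
        # Third term of Eq. 15: mean of k_rs over all member pairs of cluster j.
        self_term = [sum(K[r][s] for r in mem for s in mem) / len(mem) ** 2
                     if mem else 0.0 for mem in members]
        new_labels = []
        for i in range(n):
            dists = []
            for j, mem in enumerate(members):
                if not mem:
                    dists.append(float("inf"))  # an empty cluster never wins
                    continue
                cross = sum(K[i][h] for h in mem) / len(mem)
                dists.append(K[i][i] - 2.0 * cross + self_term[j])
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
    return labels

points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
labels = kernel_kmeans(gaussian_gram(points), k=2,
                       init_labels=[0, 1, 0, 1, 0, 1])
```

On this toy set the deliberately mixed starting labels are corrected after one pass, and the two groups end up in separate clusters.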


Kernel K-Means

An on-line version of the kernel K-Means algorithm can be found in (Scholkopf, 1998).


Kernel K-Means version by Girolami (2002)

In this formulation the number of clusters is denoted by $c$ and a fuzzy membership matrix $U$ is introduced.
Each element $u_{ih}$ denotes the fuzzy membership of the point $x_h$ to the Voronoi set $\pi_i^\Phi$.
This algorithm tries to minimize the following functional with respect to $U$:

$$J^\Phi(U, V^\Phi) = \sum_{h=1}^{n} \sum_{i=1}^{c} u_{ih} \left\| \Phi(x_h) - v_i^\Phi \right\|^2. \qquad (16)$$


Kernel K-Means version by Girolami (2002)

The minimization technique used by Girolami is Deterministic Annealing (Rose, 1998), a deterministic counterpart of stochastic simulated annealing.
A parameter controls the fuzziness of the memberships during the optimization and can be thought of as proportional to the temperature of a physical system.
This parameter is gradually lowered during the annealing, and at the end of the procedure the memberships have become crisp; therefore a tessellation of the feature space is found.
This linear partitioning in $F$, mapped back to the input space, forms a nonlinear partitioning of the input space.


Kernel SOM (Inokuchi, 2004; MacDonald, 2000)

The method tries to adapt the grid of codevectors $v_j^\Phi$ in feature space.
We start by writing each codevector as a combination of points in feature space:

$$v_j^\Phi = \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h), \qquad (17)$$

where the coefficients $\gamma_{jh}$ are initialized once the grid is created.


Kernel SOM (Inokuchi, 2004; MacDonald, 2000)

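Then we randomly pick an input $x$ from $X$ and compute the winner by writing $s(x) = \arg\min_{v_j \in V} \|x - v_j\|$ in feature space:

$$s(\Phi(x)) = \arg\min_{v_j^\Phi \in V^\Phi} \left\| \Phi(x) - v_j^\Phi \right\| \qquad (18)$$

$$= \arg\min_{v_j^\Phi \in V^\Phi} \sqrt{ \left\| \Phi(x) - \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h) \right\|^2 }, \qquad (19)$$

which, denoting by $x_i$ the picked input $x$, can be written using the kernel trick as:

$$s(\Phi(x_i)) = \arg\min_{v_j^\Phi \in V^\Phi} \left( k_{ii} - 2 \sum_{h} \gamma_{jh} k_{ih} + \sum_{r} \sum_{s} \gamma_{jr} \gamma_{js} k_{rs} \right). \qquad (20)$$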


Kernel SOM (Inokuchi, 2004; MacDonald, 2000)

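To update the codevectors we rewrite $\Delta v_j = \varepsilon(t)\,h(d_{rs})\,(x - v_j)$ in feature space:

$$v_j^{\Phi\prime} = v_j^\Phi + \varepsilon(t)\,h(d_{rs}) \left( \Phi(x) - v_j^\Phi \right). \qquad (21)$$

Using Eq. 17:

$$v_j^{\Phi\prime} = \sum_{h=1}^{n} \gamma'_{jh} \Phi(x_h) = \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h) + \varepsilon(t)\,h(d_{rs}) \left( \Phi(x) - \sum_{h=1}^{n} \gamma_{jh} \Phi(x_h) \right). \qquad (22)$$

Thus, denoting by $i$ the index of the picked input ($x = x_i$), the rule for the update of $\gamma_{jh}$ is:

$$\gamma'_{jh} = \begin{cases} (1 - \varepsilon(t)\,h(d_{rs}))\,\gamma_{jh} + \varepsilon(t)\,h(d_{rs}) & \text{if } h = i \\ (1 - \varepsilon(t)\,h(d_{rs}))\,\gamma_{jh} & \text{otherwise.} \end{cases} \qquad (23)$$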
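A small sketch of the winner search and of the coefficient update for a single codevector follows. It assumes that the picked input is the pattern with index `i`, and it folds the product $\varepsilon(t)\,h(d_{rs})$ into one hypothetical argument `step`. The function names and the toy data are illustrative. In a full SOM the update would be applied to every codevector of the grid, each with its own neighborhood value.

```python
def som_distance_sq(K, gamma_j, i):
    """Squared distance between Phi(x_i) and the codevector with weights gamma_j."""
    n = len(K)
    cross = sum(gamma_j[h] * K[i][h] for h in range(n))
    self_term = sum(gamma_j[r] * gamma_j[s] * K[r][s]
                    for r in range(n) for s in range(n))
    return K[i][i] - 2.0 * cross + self_term

def som_winner(K, gamma, i):
    """Index of the codevector closest to Phi(x_i) in feature space."""
    return min(range(len(gamma)), key=lambda j: som_distance_sq(K, gamma[j], i))

def som_update(gamma_j, i, step):
    """New coefficients of one codevector; step stands for eps(t) * h(d_rs)."""
    return [(1.0 - step) * g + (step if h == i else 0.0)
            for h, g in enumerate(gamma_j)]

xs = [0.0, 1.0, 10.0]
K = [[a * b for b in xs] for a in xs]        # linear kernel, so Phi is the identity
gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # codevectors start on x_0 and x_2
winner = som_winner(K, gamma, i=1)
gamma[winner] = som_update(gamma[winner], i=1, step=0.5)
```

With the linear kernel the result is easy to read: the winning codevector moves from 0 halfway toward the picked input 1, and its coefficients still sum to one.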


Kernel Neural Gas (Qin, 2004)

The kernel version of neural gas applies the soft rule for the update to the codevectors in feature space.
Rewriting $\Delta v_j = \varepsilon(t)\,h_\lambda(\rho_j)\,(x - v_j)$ in feature space for the update of the codevectors we have:

$$\Delta v_j^\Phi = \varepsilon\,h_\lambda(\rho_j) \left( \Phi(x) - v_j^\Phi \right), \qquad (24)$$

where $\rho_j$ is the rank of the distance $\|\Phi(x) - v_j^\Phi\|$.
Again it is possible to write $v_j^\Phi$ as a linear combination of the $\Phi(x_i)$ as in Eq. 17, so that such distances can be computed by means of the kernel trick.
As in the kernel SOM, the updating rule for the centroids becomes an updating rule for the coefficients of such a combination.


One Class SVM (BenHur, 2000; Tax, 1999)

This approach provides a support vector description in feature space.
The idea is to use kernels to project data into a feature space and then to find the sphere enclosing almost all data, namely not including outliers.
Formally, a radius $R$ and the center $v$ of the smallest enclosing sphere in feature space are defined.
The constraint is thus:

$$\|\Phi(x_j) - v\|^2 \le R^2 + \xi_j \quad \forall j, \qquad (25)$$

where the non-negative slack variables $\xi_j$ have been added.


One Class SVM (BenHur, 2000; Tax, 1999)

The Lagrangian for this problem is defined as (Burges, 1998):

$$L = R^2 - \sum_{j} \left( R^2 + \xi_j - \|\Phi(x_j) - v\|^2 \right) \beta_j - \sum_{j} \xi_j \mu_j + C \sum_{j} \xi_j \qquad (26)$$

where:

$\beta_j \ge 0$ and $\mu_j \ge 0$ are Lagrange multipliers,
$C$ is a constant, and
$C \sum_j \xi_j$ is a penalty term.

Computing the partial derivatives of $L$ with respect to $R$, $v$ and $\xi_j$ and setting them to zero leads to the following equations:

$$\sum_{j} \beta_j = 1, \qquad v = \sum_{j} \beta_j \Phi(x_j), \qquad \beta_j = C - \mu_j. \qquad (27)$$


One Class SVM (BenHur, 2000; Tax, 1999)

The Karush-Kuhn-Tucker (KKT) complementary conditions (Burges, 1998) result in:

$$\xi_j \mu_j = 0, \qquad \left( R^2 + \xi_j - \|\Phi(x_j) - v\|^2 \right) \beta_j = 0. \qquad (28)$$

Following simple considerations regarding all these conditions it is possible to see that:

when $\xi_j > 0$, the image of $x_j$ lies outside the hypersphere. These points are called bounded support vectors;
when $\xi_j = 0$ and $0 < \beta_j < C$, the image of $x_j$ lies on the surface of the hypersphere. These points are called support vectors.


One Class SVM (BenHur, 2000; Tax, 1999)

Moreover, it is possible to write the Wolfe dual form (BenHur, 2000), whose optimization leads to this quadratic programming problem with respect to the $\beta_j$:

$$J_W = \sum_{j} k_{jj} \beta_j - \sum_{i} \sum_{j} k_{ij} \beta_i \beta_j \qquad (29)$$

The distance between the image of a point $x_j$ and the center $v$ of the enclosing sphere can be computed as follows:

$$d_j = \|\Phi(x_j) - v\|^2 = k_{jj} - 2 \sum_{r} \beta_r k_{jr} + \sum_{r} \sum_{s} \beta_r \beta_s k_{rs} \qquad (30)$$
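Once multipliers $\beta_j$ are available, Eqs. 29 and 30 are plain sums over the Gram matrix. The sketch below does not solve the quadratic program: the vector `beta` is picked by hand (uniform weights summing to one) only to exercise the formulas, and the linear kernel and the four corner points are likewise assumptions of the example.

```python
def dual_objective(K, beta):
    """Eq. 29: value of the Wolfe dual J_W for given multipliers beta."""
    n = len(K)
    return (sum(K[j][j] * beta[j] for j in range(n))
            - sum(K[i][j] * beta[i] * beta[j]
                  for i in range(n) for j in range(n)))

def sphere_distance_sq(K, beta, j):
    """Eq. 30: squared distance of Phi(x_j) from v = sum_r beta_r Phi(x_r)."""
    n = len(K)
    cross = sum(beta[r] * K[j][r] for r in range(n))
    center_norm_sq = sum(beta[r] * beta[s] * K[r][s]
                         for r in range(n) for s in range(n))
    return K[j][j] - 2.0 * cross + center_norm_sq

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
# Linear kernel, so feature space and input space coincide.
K = [[sum(a * b for a, b in zip(x, y)) for y in points] for x in points]
beta = [0.25] * 4   # hand-picked multipliers summing to 1 (not solved for)
distances = [sphere_distance_sq(K, beta, j) for j in range(4)]
dual_value = dual_objective(K, beta)
```

With these weights the center is the centroid (1, 1) of the square, so every corner sits at squared distance 2 from it.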


One Class SVM (BenHur, 2000; Tax, 1999)

Figure: One Class SVM applied to two data sets with outliers. The gray line shows the projection in input space of the smallest enclosing sphere in feature space. A linear kernel was used on the left and a Gaussian kernel on the right.


One Class SVM (BenHur, 2000; Tax, 1999)
Support Vector Clustering (BenHur, 2001)

Once boundaries in input space are found, a labeling procedure is necessary in order to complete clustering.
In Support Vector Clustering the cluster assignment procedure follows a simple geometric idea: any path connecting a pair of points belonging to different clusters must exit from the enclosing sphere in feature space.


Support Vector Clustering (BenHur, 2001)
One Class SVM (BenHur, 2000; Tax, 1999)

Denoting by $Y$ the image in feature space of one such path and by $y$ the elements of $Y$, it follows that $R(y) > R$ for some $y$.
Thus it is possible to define an adjacency structure between pairs of points $x_i$, $x_j$ in this form:

$$A_{ij} = \begin{cases} 1 & \text{if } R(y) < R \;\; \forall y \in Y \\ 0 & \text{otherwise,} \end{cases} \qquad (31)$$

where $Y$ is the image of the path connecting $x_i$ and $x_j$.
Clusters are simply the connected components of the graph with the adjacency matrix just defined.
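The labeling step can be sketched as follows. The example assumes a Gaussian kernel, hand-picked uniform multipliers `beta` instead of an optimized One Class SVM solution, a radius chosen so that the sphere just encloses every pattern, and straight segments sampled at 20 interior points as the paths. All names and values are illustrative. Points lying exactly on the sphere are treated as inside, a slight relaxation of the strict inequality in Eq. 31.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def make_radius_sq(points, beta, kernel):
    """Return y -> squared feature-space distance of Phi(y) from the center."""
    n = len(points)
    center_norm_sq = sum(beta[r] * beta[s] * kernel(points[r], points[s])
                         for r in range(n) for s in range(n))
    def radius_sq(y):
        cross = sum(b * kernel(y, x) for b, x in zip(beta, points))
        return kernel(y, y) - 2.0 * cross + center_norm_sq
    return radius_sq

def svc_labels(points, radius_sq, r_sq, samples=20):
    """Label points by the connected components of the adjacency graph."""
    n = len(points)
    def connected(a, b):
        # Every sampled interior point of the segment must stay in the sphere.
        for s in range(1, samples + 1):
            t = s / (samples + 1)
            y = tuple(p + t * (q - p) for p, q in zip(a, b))
            if radius_sq(y) > r_sq:
                return False
        return True
    adj = [[i != j and connected(points[i], points[j]) for j in range(n)]
           for i in range(n)]
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start], stack = current, [start]
        while stack:                       # depth-first search over the graph
            i = stack.pop()
            for j in range(n):
                if adj[i][j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
beta = [1.0 / len(points)] * len(points)   # hand-picked, not an optimized solution
radius_sq = make_radius_sq(points, beta, gaussian_kernel)
r_sq = max(radius_sq(x) for x in points)   # sphere just enclosing every pattern
labels = svc_labels(points, radius_sq, r_sq)
```

Segments inside one group of points stay within the sphere, while segments joining the two groups leave it, so two connected components are found.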


Support Vector Clustering (BenHur, 2001)
Variant of One-class SVM

In the implementation in (BenHur, 2000) the check is made by sampling the line segment $Y$ at 20 equidistant points.
There are some modifications of this labeling algorithm (e.g., (Lee, 2005; Yang, 2002)) that improve performance.
An improved version of the SVC algorithm with application to handwritten digit recognition can be found in (Chiang, 2003).
The algorithm proposed by Camastra & Verri (2005) uses a K-Means-like strategy, i.e., it repeatedly moves all centers $v_i^\Phi$ in the feature space, computing One Class SVM on their Voronoi sets $\pi_i^\Phi$, until no center changes anymore.


Kernel fuzzy clustering methods

Approaches:

Kernel Fuzzy c-Means with kernelization of the metric
Kernel Fuzzy c-Means in feature space
Possibilistic c-Means with the kernelization of the metric


Kernel Fuzzy c-Means with kernelization of the metric (Wu, 2003; Zhang, 2003; Zhang, 2004)

The basic idea is to minimize the functional:

$$J^\Phi(U, V) = \sum_{h=1}^{n} \sum_{i=1}^{c} (u_{ih})^m \|\Phi(x_h) - \Phi(v_i)\|^2, \qquad (32)$$

with the probabilistic constraint over the memberships $\sum_{i=1}^{c} u_{ih} = 1$, $\forall h = 1, \dots, n$.

The procedure for the optimization of $J^\Phi(U, V)$ is again the Picard iteration technique.


Kernel Fuzzy c-Means with kernelization of the metric (Wu, 2003; Zhang, 2003; Zhang, 2004)

Minimization of the functional in Eq. 32 has been proposed only in the case of a Gaussian kernel $K^{(g)}$.
The reason is that the derivative of $J^\Phi(U, V)$ with respect to the $v_i$ is particularly simple for a Gaussian kernel, since it can again be expressed through the kernel itself:

$$\frac{\partial K(x_h, v_i)}{\partial v_i} = \frac{(x_h - v_i)}{\sigma^2} K(x_h, v_i). \qquad (33)$$


Kernel Fuzzy c-Means with kernelization of the metric (Wu, 2003; Zhang, 2003; Zhang, 2004)

We obtain for the memberships:

$$u_{ih}^{-1} = \sum_{j=1}^{c} \left( \frac{1 - K(x_h, v_i)}{1 - K(x_h, v_j)} \right)^{\frac{1}{m-1}}, \qquad (34)$$

and for the codevectors:

$$v_i = \frac{\sum_{h=1}^{n} (u_{ih})^m K(x_h, v_i)\, x_h}{\sum_{h=1}^{n} (u_{ih})^m K(x_h, v_i)}. \qquad (35)$$
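One pass of the two update rules above can be sketched as below, assuming a Gaussian kernel. The function name `kfcm_step`, the guard `eps` against division by zero when a pattern coincides with a codevector, the toy 1-D data, the starting codevectors, and the fixed number of 20 passes are assumptions of the example.

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kfcm_step(points, centers, m=2.0, sigma=1.0, eps=1e-12):
    """One iteration: memberships from Eq. 34, then codevectors from Eq. 35."""
    c, n, dim = len(centers), len(points), len(points[0])
    # Kernel values between every codevector and every pattern.
    kv = [[gaussian_kernel(x, v, sigma) for x in points] for v in centers]
    expo = 1.0 / (m - 1.0)
    u = [[0.0] * n for _ in range(c)]
    for h in range(n):
        # 1 - K is proportional to the squared feature-space distance.
        d = [max(1.0 - kv[i][h], eps) for i in range(c)]
        for i in range(c):
            u[i][h] = 1.0 / sum((d[i] / d[j]) ** expo for j in range(c))
    new_centers = []
    for i in range(c):
        w = [(u[i][h] ** m) * kv[i][h] for h in range(n)]
        total = sum(w)
        new_centers.append(tuple(
            sum(w[h] * points[h][t] for h in range(n)) / total
            for t in range(dim)))
    return u, new_centers

points = [(0.0,), (0.2,), (5.0,), (5.2,)]
centers = [(0.5,), (4.5,)]
for _ in range(20):
    memberships, centers = kfcm_step(points, centers)
```

On this toy set the codevectors settle near the middle of each pair of patterns, and the memberships of every pattern sum to one as the probabilistic constraint requires.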


Kernel Fuzzy c-Means in feature space (Graepel, 1998; Zhang, 2002)

Here we derive the Fuzzy c-Means in feature space, a clustering method that finds a soft linear partitioning of the feature space.
This partitioning, mapped back to the input space, results in a soft nonlinear partitioning of the data.
The functional to optimize with the probabilistic constraint $\sum_{i=1}^{c} u_{ih} = 1$, $\forall h = 1, \dots, n$ is:

$$J^\Phi(U, V^\Phi) = \sum_{h=1}^{n} \sum_{i=1}^{c} (u_{ih})^m \left\| \Phi(x_h) - v_i^\Phi \right\|^2. \qquad (36)$$


Kernel Fuzzy c-Means in feature space (Graepel, 1998; Zhang, 2002)

It is possible to rewrite the norm in Eq. 36 explicitly by using:

$$v_i^\Phi = \frac{\sum_{h=1}^{n} (u_{ih})^m \Phi(x_h)}{\sum_{h=1}^{n} (u_{ih})^m} = a_i \sum_{h=1}^{n} (u_{ih})^m \Phi(x_h), \qquad (37)$$

which is the kernel version of the FCM equation for the center update, i.e., $v_i = \frac{\sum_{h=1}^{n} (u_{ih})^m x_h}{\sum_{h=1}^{n} (u_{ih})^m}$.

For simplicity of notation we use:

$$a_i^{-1} = \sum_{r=1}^{n} (u_{ir})^m. \qquad (38)$$


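Kernel Fuzzy c-Means in feature space (Graepel, 1998; Zhang, 2002)

Now it is possible to write the kernel version of the FCM equation for the membership update, i.e., $u_{ih}^{-1} = \sum_{j=1}^{c} \left( \frac{\|x_h - v_i\|}{\|x_h - v_j\|} \right)^{\frac{2}{m-1}}$:

$$u_{ih}^{-1} = \sum_{j=1}^{c} \left( \frac{k_{hh} - 2a_i \sum_{r=1}^{n} (u_{ir})^m k_{hr} + a_i^2 \sum_{r=1}^{n} \sum_{s=1}^{n} (u_{ir})^m (u_{is})^m k_{rs}}{k_{hh} - 2a_j \sum_{r=1}^{n} (u_{jr})^m k_{hr} + a_j^2 \sum_{r=1}^{n} \sum_{s=1}^{n} (u_{jr})^m (u_{js})^m k_{rs}} \right)^{\frac{1}{m-1}}. \qquad (39)$$

Eq. 39 gives the rule for the update of the membership $u_{ih}$.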
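The membership update in Eq. 39 only needs the Gram matrix and the current memberships. The sketch below splits it into the squared distances (numerator and denominator of the fraction) and the normalization. The function names, the guard `eps`, the linear kernel, and the starting memberships are assumptions made for the illustration.

```python
def feature_space_distances(K, u, m=2.0):
    """Squared distances ||Phi(x_h) - v_i||^2 with v_i written as in Eq. 37."""
    c, n = len(u), len(K)
    dist = [[0.0] * n for _ in range(c)]
    for i in range(c):
        w = [u[i][r] ** m for r in range(n)]
        a = 1.0 / sum(w)                                   # Eq. 38
        self_term = a * a * sum(w[r] * w[s] * K[r][s]
                                for r in range(n) for s in range(n))
        for h in range(n):
            cross = a * sum(w[r] * K[h][r] for r in range(n))
            dist[i][h] = K[h][h] - 2.0 * cross + self_term
    return dist

def update_memberships(K, u, m=2.0, eps=1e-12):
    """Eq. 39: new memberships from the current ones, using only K."""
    dist = feature_space_distances(K, u, m)
    c, n = len(u), len(K)
    expo = 1.0 / (m - 1.0)
    new_u = [[0.0] * n for _ in range(c)]
    for h in range(n):
        d = [max(dist[i][h], eps) for i in range(c)]
        for i in range(c):
            new_u[i][h] = 1.0 / sum((d[i] / d[j]) ** expo for j in range(c))
    return new_u

xs = [0.0, 0.2, 5.0, 5.2]
K = [[a * b for b in xs] for a in xs]     # linear kernel on 1-D patterns
u = [[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]]
new_u = update_memberships(K, u)
```

With the linear kernel this reduces to ordinary Fuzzy c-Means, which makes the output easy to verify by hand. Swapping in another Gram matrix changes nothing else in the code.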


Possibilistic c-Means with the kernelization of the metric (Zhang, 2003)

The formulation of the Possibilistic c-Means PCM-I with the kernelization of the metric involves the minimization of the following functional:

$$J^\Phi(U, V) = \sum_{h=1}^{n} \sum_{i=1}^{c} (u_{ih})^m \|\Phi(x_h) - \Phi(v_i)\|^2 + \sum_{i=1}^{c} \eta_i \sum_{h=1}^{n} (1 - u_{ih})^m \qquad (40)$$

Minimization leads to:

$$u_{ih}^{-1} = 1 + \left( \frac{\|\Phi(x_h) - \Phi(v_i)\|^2}{\eta_i} \right)^{\frac{1}{m-1}}. \qquad (41)$$


Possibilistic c-Means with the kernelization of themetric (Zhang, 2003)

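Eq. 41 can be rewritten, considering a Gaussian kernel (for which $K(x,x) = 1$ and hence $\|\Phi(x_h) - \Phi(v_i)\|^2 = 2\,(1 - K(x_h, v_i))$), as:

$$u_{ih}^{-1} = 1 + \left( \frac{2\,(1 - K(x_h, v_i))}{\eta_i} \right)^{\frac{1}{m-1}} . \tag{42}$$

The update of the codevectors follows:

$$v_i = \frac{\sum_{h=1}^{n} (u_{ih})^m \, K(x_h, v_i) \, x_h}{\sum_{h=1}^{n} (u_{ih})^m \, K(x_h, v_i)} . \tag{43}$$

The computation of the $\eta_i$ is straightforward.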
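The two update rules can be sketched in NumPy as one iteration step. This is only an illustration under stated assumptions: the Gaussian kernel is parameterized as exp(-‖x − v‖² / (2σ²)), the scale parameters `eta` are supplied by the caller (their computation is not given here), and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, V, sigma):
    """K[i, h] = exp(-||x_h - v_i||^2 / (2 sigma^2)); the width convention is an assumption."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_metric_pcm_step(X, V, eta, m=2.0, sigma=1.0):
    """One iteration: memberships from Eq. 42, then codevectors from Eq. 43.

    X : (n, d) data, V : (c, d) codevectors, eta : (c,) positive scale parameters.
    """
    K = gaussian_kernel(X, V, sigma)                 # (c, n)
    # For a Gaussian kernel K(x, x) = 1, so the feature-space squared
    # distance is 2 * (1 - K(x_h, v_i)).
    d2 = 2.0 * (1.0 - K)
    U = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
    W = (U ** m) * K                                 # weights appearing in Eq. 43
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)   # fixed-point update of v_i
    return U, V_new
```

Because each codevector appears on both sides of Eq. 43, the step is meant to be repeated until the codevectors stop moving.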


Possibilistic c-Means in feature space [FIL08]

M. Filippone, F. Camastra, F. Masulli, S. Rovetta, "A survey of kernel and spectral methods for clustering", Pattern Recognition, 41(1), pp. 176-190, 2008.


Spectral clustering methods

Clustering Graphs:

D degree matrix; W adjacency matrix; L ≡ D − W Laplacian matrix

Application of a clustering technique (such as K-Means) to data in an affine subspace spanned by the first k∗ eigenvectors of L.

The graph can be given, or D, W and L can be obtained from the data similarity matrix (symmetric and nonnegative) using, e.g., a Gaussian kernel.
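As a rough sketch of how D, W and L could be built from data, the NumPy code below uses a Gaussian kernel as the similarity. The kernel width `sigma`, the zeroed diagonal, and the reading of "first eigenvectors" as those with the smallest eigenvalues of L are assumptions made for this illustration, and the function names are hypothetical.

```python
import numpy as np

def graph_matrices(X, sigma=1.0):
    """Return (W, D, L) for a fully connected similarity graph on the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))   # symmetric, nonnegative similarities
    np.fill_diagonal(W, 0.0)               # no self-loops
    D = np.diag(W.sum(axis=1))             # degree matrix
    L = D - W                              # unnormalized Laplacian
    return W, D, L

def spectral_embedding(L, k):
    """Map each node onto the k eigenvectors of L with the smallest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]                  # (n, k): one row per data point
```

The rows of the returned embedding are the points that a clustering technique such as K-Means would then operate on.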


Ng-Jordan-Weiss algorithm (2002)

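$$L = D - W$$

$$w_{ij} = e^{-\frac{\|x_i - x_j\|}{\sigma_i \sigma_j}} \ \text{for } i \neq j, \qquad w_{ii} = 0$$

$$d_{ii} = \sum_j w_{ij}, \qquad d_{ij} = 0 \ \text{for } i \neq j$$

normalized Laplacian:

$$L_{sym} := D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$$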


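1 Set the number of clusters $k$, and the similarity matrix $S \in \mathbb{R}^{n \times n}$.

2 Compute the normalized Laplacian $L_{sym}$.

3 Obtain the $k$ eigenvectors $v_1, \dots, v_k$ of $L_{sym}$ associated with its $k$ smallest eigenvalues (equivalently, the top $k$ eigenvectors of $D^{-1/2} W D^{-1/2}$), and form $V \in \mathbb{R}^{n \times k}$ by stacking them as columns.

4 Get $U \in \mathbb{R}^{n \times k}$ by normalizing each row of $V$ to unit Euclidean norm: $u_{ij} = v_{ij} / \left( \sum_{l} v_{il}^2 \right)^{1/2}$.

5 For $i = 1, \dots, n$, let $y_i \in \mathbb{R}^k$ represent the $i$-th row of $U$.

6 Apply K-Means to cluster the instances $(y_i)_{i=1,\dots,n}$ into $k$ clusters.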
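The six steps can be sketched in NumPy as follows. This is an illustrative sketch, not a reference implementation: it assumes a symmetric, nonnegative similarity matrix with zero diagonal and no isolated nodes, and it swaps in a small deterministic K-Means (farthest-point initialization plus Lloyd iterations) so that the example stays self-contained. All function names are hypothetical.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def njw_spectral_clustering(S, k):
    """Steps 1-6 for a symmetric, nonnegative similarity matrix S (zero diagonal)."""
    d = S.sum(axis=1)                                   # node degrees, assumed > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)            # ascending eigenvalues
    V = eigvecs[:, :k]                                  # k smallest eigenvalues of L_sym
    U = V / np.linalg.norm(V, axis=1, keepdims=True)    # rows scaled to unit norm
    return kmeans(U, k)                                 # cluster the rows y_i
```

For well-separated groups the rows of U collapse onto k nearly orthogonal points, which is why a simple K-Means suffices in the last step.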


Ng-Jordan-Weiss algorithm (2002): Synthetic data set 2 (blobs)

[Figure: Spectral Clustering]


Networks

Complex networks have been used in many fields to represent various kinds of complex systems (Strogatz, 2001; Albert & Barabási, 2002)

statistical physics, particle physics, computer science, electrical engineering, biology, economics, operations research, sociology, logistical networks, the World Wide Web, Internet, gene regulatory networks, metabolic networks, social networks, epistemological networks, expression networks, pathway networks, protein interaction networks, etc.


Map of the human protein-protein interaction network

From (Stelzl & al., 2008)

Page 60: An Introduction to Data Clustering 3 - unige.it · Francesco Masulli Introduction to Data Clustering 3. Introduction Kernel Clustering Methods Spectral clustering Complex Networks

IntroductionKernel Clustering Methods

Spectral clusteringComplex Networks

Networks

Propagation of a tweet on the Internet


Network properties: small-world and scale-free (Watts & Strogatz, 1998; Barabási & Albert, 1999; Adamic & Huberman, 2000)

Various methods to capture the structure and characteristics of the network from different perspectives


Networks: Network clustering - Community detection

(Girvan & al., 2012) introduced community structure, or network clustering: communities are groups of vertices which probably share common properties and/or play similar roles within the network.

(Newman & al., 2006) introduced modularity, a widely used measure of community structure.

Community detection: identify the modules of networks and, possibly, their hierarchical organization, using only the information encoded in the graph topology (Fortunato, 2010)


Proposed methods:

centrality measures (Fortunato & al., 2004)

link density (Shen & al., 2009)

percolation theory (Palla & al., 2005)

modularity optimization (Newman, 2006; Blondel & al., 2008)

kernel k-means (Dhillon & al., 2004)

spectral grouping (Donetti & Munoz, 2004; White & Smyth, 2005)

The spectral method gives better solutions than the other methods. It can detect community structures close to the real ones even when those structures are not very obvious (Lancichinetti & Fortunato, 2012)


Proposed approach for Community Detection

Estimation of the number of clusters (communities) k by maximizing the modularity (Newman & Girvan, 2002). Note that k is used both for

  selecting the top eigenvectors of the Laplacian matrix, and

  setting the number of clusters in the clustering algorithm

Application of different clustering techniques in the affine subspace spanned by the first k eigenvectors of L:

  K-Means: K-Means Spectral Modularity (KSM) community detection method

  FCM: Fuzzy C-Means Spectral Clustering Modularity (FSM) community detection method

  PCM: Possibilistic Spectral Clustering Modularity (PSM) community detection method
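Since the choice of k relies on modularity, a short sketch of how the modularity Q of a given partition can be evaluated may be useful. It uses the standard Newman-Girvan definition, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), for an undirected graph. The search over candidate values of k is not specified in these slides and is not shown; the function name is hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    A      : (n, n) symmetric adjacency (or weight) matrix
    labels : length-n sequence of community indices
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                              # node degrees
    two_m = k.sum()                                # twice the total edge weight
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14, while placing every node in one community gives Q = 0, so the two-community partition is preferred.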


Fuzzy clustering techniques allow the detection of overlapping communities

Bridges: nodes with significant membership to different communities (clusters)

Hubs: nodes with many links
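One possible way to read these two definitions off a fuzzy membership matrix and an adjacency matrix is sketched below. The membership threshold and the minimum degree are illustrative choices, not values given in the slides, and the function names are hypothetical.

```python
import numpy as np

def find_bridges(U, threshold=0.3):
    """Nodes whose membership reaches `threshold` in at least two communities.

    U : (c, n) fuzzy membership matrix; `threshold` is an illustrative choice.
    """
    return np.flatnonzero((U >= threshold).sum(axis=0) >= 2)

def find_hubs(A, min_degree):
    """Nodes with at least `min_degree` links in the adjacency matrix A."""
    return np.flatnonzero(np.asarray(A).sum(axis=1) >= min_degree)
```

A crisp clustering assigns each node to exactly one community, so it cannot flag bridges in this way; the graded memberships are what make the overlap visible.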


Zachary Karate Club Social Network

(Zachary, 1977): After an argument between the club's administrator and the club's instructor, the network of club members split into two parties.

The dispute ended with the instructor establishing his own club and taking about half of the original club's members with him.


Protein-to-Protein Interaction (PPI) Networks

Protein-protein interactions (PPIs) occur when two or more proteins bind together in a cell, in vitro or in a living organism.

The interaction interface of proteins has evolved for a specific purpose.

Interactions between proteins are important for the majority of biological functions.

Not all possible PPIs will occur in any cell at a given time.


[Figure: protein-protein interaction communities identified in the HIV-1 biological network using SE-FSM]
