

1

Support Cluster Machine: Paper from ICML 2007

Read by Haiqin Yang

2007-10-18

This paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue, and was published at ICML in 2007.

2

Outline

Background and Motivation

Support Cluster Machine - SCM

Kernel in SCM

Experiments

An Interesting Application: Privacy-preserving Data Mining

Discussions

3

Background and Motivation

Large-scale classification problem: existing approaches

Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001

Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006

Parallel techniques: Collobert et al., 2001; Graf et al., 2004

Approximate formulations: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001

Choosing representatives: active learning (Schohn & Cohn, 2000); Cluster-Based SVM (Yu et al., 2003); Core Vector Machine (CVM; Tsang et al., 2005); Clustering SVM (Boley & Cao, 2004)

4

Support Cluster Machine - SCM

Given training samples (x_i, y_i), i = 1, ..., n, with x_i in R^d and labels y_i in {-1, +1}:

Procedure: cluster each class into Gaussian components and train an SVM-style machine whose training units are those components (the support clusters); a sketch follows.
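A minimal sketch of that pipeline, assuming scikit-learn's GaussianMixture in place of the paper's TOD/EM clustering, using the probability product kernel defined on the kernel slide below, and omitting the cluster-size weighting that the full SCM dual applies:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def gaussian_pp_kernel(mu1, S1, mu2, S2):
    # Probability product kernel (rho = 1) between two Gaussians:
    # integral of N(x; mu1, S1) * N(x; mu2, S2) dx = N(mu1 - mu2; 0, S1 + S2).
    S = S1 + S2
    diff = mu1 - mu2
    d = mu1.size
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    return np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / norm

def fit_scm(X_pos, X_neg, k=25, C=1.0):
    # Step 1: summarize each class by k Gaussian components (the support clusters).
    comps, labels = [], []
    for X, y in ((X_pos, +1), (X_neg, -1)):
        gmm = GaussianMixture(n_components=k, covariance_type="full").fit(X)
        for mu, S in zip(gmm.means_, gmm.covariances_):
            comps.append((mu, S))
            labels.append(y)
    # Step 2: train an SVM over the clusters with a precomputed Gram matrix.
    G = np.array([[gaussian_pp_kernel(m1, S1, m2, S2) for m2, S2 in comps]
                  for m1, S1 in comps])
    svm = SVC(C=C, kernel="precomputed").fit(G, labels)
    return comps, svm

def predict_scm(comps, svm, X_test):
    # Step 3: a test vector is a degenerate Gaussian with zero covariance.
    Z = np.zeros((X_test.shape[1], X_test.shape[1]))
    K = np.array([[gaussian_pp_kernel(x, Z, mu, S) for mu, S in comps]
                  for x in X_test])
    return svm.predict(K)
```

Training cost then scales with the number of clusters rather than the number of samples, which is the point of the method for large-scale problems.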

5

SCM Solution

Dual representation

Decision function
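The dual and decision function appeared as figures on the original slide. By analogy with the standard soft-margin SVM dual, with cluster distributions p_i in place of training vectors (the paper additionally scales the box constraints by cluster size, omitted in this sketch), they plausibly take the form:

\[
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(p_i, p_j)
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C
\]

\[
f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{N} \alpha_i y_i K(p_i, \delta_{\mathbf{x}}) + b\Big)
\]

where N is the number of clusters and \(\delta_{\mathbf{x}}\) treats the test vector as a point-mass distribution.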

6

Kernel

Probability product kernel

Under the Gaussian assumption, i.e., each cluster modeled as a Gaussian density, the kernel integral has a closed form, shown below.
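The formulas were figures on the original slide; the probability product kernel (in its rho = 1, expected-likelihood form) and its standard Gaussian closed form are:

\[
K(p, p') = \int p(\mathbf{x})\, p'(\mathbf{x})\, d\mathbf{x}
\]

\[
p = \mathcal{N}(\boldsymbol{\mu}, \Sigma),\ p' = \mathcal{N}(\boldsymbol{\mu}', \Sigma')
\ \Longrightarrow\
K(p, p') = (2\pi)^{-d/2}\, \lvert \Sigma + \Sigma' \rvert^{-1/2} \exp\!\Big(-\tfrac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}')^{\top} (\Sigma + \Sigma')^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}')\Big)
\]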

7

Kernel Property I

Decision function

Property II
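The property statements are not in the transcript; a hedged reconstruction consistent with the closed form above: for Property I, treating a test vector as a point mass \(\delta_{\mathbf{x}}\) reduces the kernel to a Gaussian evaluation, so the decision function applies directly to plain vectors; for Property II, equal spherical covariances reduce the kernel to a scaled RBF kernel:

\[
K\big(\mathcal{N}(\boldsymbol{\mu}, \Sigma), \delta_{\mathbf{x}}\big) = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \Sigma)
\qquad
K\big(\mathcal{N}(\boldsymbol{\mu}, \sigma^2 I), \mathcal{N}(\boldsymbol{\mu}', \sigma^2 I)\big) = (4\pi\sigma^2)^{-d/2} \exp\!\Big(-\frac{\lVert \boldsymbol{\mu} - \boldsymbol{\mu}' \rVert^{2}}{4\sigma^{2}}\Big)
\]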

8

Experiments

Datasets: Toydata; MNIST (handwritten digits '0'-'9' classification); Adult (privacy-preserving dataset)

Clustering algorithms: Threshold Order Dependent (TOD) and the EM algorithm (a sketch of TOD follows below)

Classification methods: libSVM, SVMTorch, SVMlight, CVM (Core Vector Machine), SCM

Model selection

CPU: 3.0 GHz
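TOD clustering is a simple one-pass, order-dependent scheme. A minimal sketch of the commonly described variant, where each sample joins the nearest existing cluster if it falls within a distance threshold and otherwise seeds a new one (the paper's exact variant may differ):

```python
import numpy as np

def tod_cluster(X, threshold):
    """One-pass Threshold Order Dependent (TOD) clustering sketch.

    Each sample joins the nearest existing cluster if the distance to
    that cluster's running mean is within `threshold`; otherwise it
    seeds a new cluster. The result depends on presentation order.
    """
    centers, counts, assign = [], [], []
    for x in X:
        if centers:
            dists = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(dists))
            if dists[j] <= threshold:
                counts[j] += 1
                centers[j] += (x - centers[j]) / counts[j]  # update running mean
                assign.append(j)
                continue
        centers.append(x.astype(float))  # start a new cluster at this sample
        counts.append(1)
        assign.append(len(centers) - 1)
    return np.array(centers), np.array(assign)
```

With the threshold chosen by model selection, the cluster means, per-cluster covariances, and counts (as priors) supply the Gaussian components that SCM consumes.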

9

Toydata

Samples: 2,500 per class, generated from a mixture of Gaussians

Clustering algorithm: TOD; clustering results: 25 positive and 25 negative clusters

10

MNIST

Data description: 10 classes, handwritten digits '0'-'9'; 60,000 training samples (about 6,000 per class); 10,000 testing samples

Construct 45 binary classifiers, one per pair of digits (10 x 9 / 2 = 45)

Results: 25 clusters for the EM algorithm

11

MNIST

Test results for TOD algorithm

12

Privacy-preserving Data Mining: Inter-Enterprise Data Mining

Problem: Two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information.

Horizontally partitioned: records (users) split across companies. Example: a credit card fraud detection model.

Vertically partitioned: attributes split across companies. Example: associations across websites.

13

Privacy-preserving Data Mining: Randomization Approach

[Diagram] Original records (50 | 40K | ..., 30 | 70K | ...) pass through a randomizer, which publishes perturbed records (65 | 20K | ..., 25 | 60K | ...); the distributions of Age and Salary are then reconstructed from the perturbed data and fed to data mining algorithms to build the model.
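As a sketch of this randomization approach (Agrawal & Srikant's scheme): each party perturbs values with additive noise from a known distribution, and the miner reconstructs the original distribution by iterative Bayesian updating. The noise range, bin grid, and iteration count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Original ages (never revealed) and the additive uniform randomizer.
ages = rng.normal(35.0, 10.0, size=5000).clip(15, 80)
randomized = ages + rng.uniform(-20, 20, size=ages.size)  # published values

# Discretize the support of Age into bins.
edges = np.linspace(-10.0, 105.0, 116)
centers = (edges[:-1] + edges[1:]) / 2
width = centers[1] - centers[0]

def noise_pdf(y):
    # Density of the Uniform(-20, 20) noise, known to the miner.
    return np.where(np.abs(y) <= 20, 1.0 / 40.0, 0.0)

# Iterative Bayesian reconstruction of the Age distribution:
# f_{t+1}(a) = (1/n) * sum_i f_Y(w_i - a) f_t(a) / sum_z f_Y(w_i - z) f_t(z)
f = np.full(centers.size, 1.0 / (centers.size * width))  # flat initial guess
lik = noise_pdf(randomized[:, None] - centers[None, :])  # lik[i, j] = f_Y(w_i - a_j)
for _ in range(50):
    posterior = lik * f                                   # per-record posterior
    posterior /= posterior.sum(axis=1, keepdims=True) + 1e-12
    f = posterior.mean(axis=0) / width                    # average back to a density

# `f` now approximates the density of the original ages on `centers`.
```

The reconstruction recovers only the aggregate distribution, never individual records, which is the privacy guarantee the approach relies on.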

14

Classification Example

Age | Salary | Repeat Visitor?
23  | 50K    | Repeat
17  | 30K    | Repeat
43  | 40K    | Repeat
68  | 50K    | Single
32  | 70K    | Single
20  | 20K    | Repeat

Decision tree learned from this data:

Age < 25?
  Yes -> Repeat
  No  -> Salary < 50K?
         Yes -> Repeat
         No  -> Single

15

Privacy-preserving Dataset: Adult

Data description: training samples: 30,162; testing samples: 15,060; percentage of positive samples: 24.78%

Procedure: horizontally partition the data into three subsets (parties); cluster each subset with the TOD algorithm; obtain three positive and three negative GMMs; combine them into one positive and one negative GMM with modified priors (see below); classify with SCM
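One natural reading of "modified priors" (an assumption; the slide's formula is not in the transcript): rescale each party's component weights by that party's share of the class, so the combined class-conditional density remains a valid mixture:

\[
p_{\pm}(\mathbf{x}) = \sum_{m=1}^{3} \frac{n_m^{\pm}}{n^{\pm}} \sum_{k} \pi_{mk}^{\pm}\, \mathcal{N}\big(\mathbf{x};\, \boldsymbol{\mu}_{mk}^{\pm},\, \Sigma_{mk}^{\pm}\big)
\]

where \(n_m^{\pm}\) is the number of class-\(\pm\) samples held by party m, \(n^{\pm} = \sum_m n_m^{\pm}\), and \((\pi_{mk}^{\pm}, \boldsymbol{\mu}_{mk}^{\pm}, \Sigma_{mk}^{\pm})\) are party m's GMM parameters.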

16

Privacy-preserving Dataset: Adult

Partition results

Experimental results

17

Discussions

Problems addressed:

Large-scale problems: downsample by clustering, then train the classifier on the clusters

Privacy-preserving problems: individual information stays hidden inside cluster-level statistics

Differences from other methods:

Training units are generative models, while testing units are vectors

Training units carry complete statistical information

Only one parameter for model selection

Easy implementation

Generalization ability is not yet clear, whereas the RBF kernel in SVM has the property that a larger width leads to a lower VC dimension

18

Discussions

Advantages of using priors and covariances

19

Thank you!
