Support Cluster Machine. Paper from ICML 2007. Read by Haiqin Yang, 2007-10-18. This paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue and was published in 2007.
Support Cluster Machine

Paper from ICML 2007

Read by Haiqin Yang

2007-10-18

This paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue and was published in 2007.


Outline

Background and Motivation

Support Cluster Machine - SCM

Kernel in SCM

Experiments

An Interesting Application: Privacy-preserving Data Mining

Discussions


Background and Motivation

Large-scale classification problem

Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001

Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006

Parallel techniques: Collobert et al., 2001; Graf et al., 2004

Approximate formula: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001

Choose representatives: Active learning (Schohn & Cohn, 2003); Cluster Based-SVM (Yu et al., 2003); Core Vector Machine, CVM (Tsang et al., 2005); Clustering SVM (Boley & Cao, 2004)


Support Cluster Machine - SCM

Given training samples:

Procedure


SCM Solution

Dual representation

Decision function


Kernel

Probability product kernel

By Gaussian assumption, i.e.,

Hence
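As a minimal sketch of a probability product kernel between two Gaussian clusters: assuming the exponent is 1 (the expected likelihood kernel) and diagonal covariances, the integral of the product of two Gaussian densities has the closed form N(mu1; mu2, var1 + var2). The function name and the diagonal-covariance restriction are assumptions for illustration; the exact form used on the slide may differ.

```python
import math


def gaussian_product_kernel(mu1, var1, mu2, var2):
    """Integral of N(x; mu1, var1) * N(x; mu2, var2) dx for diagonal Gaussians.

    mu1, mu2: per-dimension means; var1, var2: per-dimension variances.
    Equals the density of N(mu2, var1 + var2) evaluated at mu1.
    """
    log_k = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = v1 + v2  # variances add under the product integral
        log_k += -0.5 * (math.log(2.0 * math.pi * v) + (m1 - m2) ** 2 / v)
    return math.exp(log_k)


# Two 2-D clusters: the kernel shrinks as the means move apart.
near = gaussian_product_kernel([0.0, 0.0], [1.0, 1.0], [0.5, 0.0], [1.0, 1.0])
far = gaussian_product_kernel([0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [1.0, 1.0])
```

Working in log space keeps the product over dimensions numerically stable when the dimension is large, as it is for image data.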


Kernel Property I

That is

Decision function

Property II


Experiments

Datasets: Toydata; MNIST (handwritten digit '0'-'9' classification); Adult (privacy-preserving dataset)

Clustering algorithms: Threshold Order Dependent (TOD); EM algorithm

Classification methods: libSVM; SVMTorch; SVMlight; CVM (Core Vector Machine); SCM

Model selection

CPU: 3.0GHz
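The Threshold Order Dependent clustering named above can be sketched as a single pass over the samples: each sample joins its nearest cluster if it lies within a distance threshold, and otherwise starts a new cluster. This is one common reading of TOD, not necessarily the variant used in the paper; the running-mean center update and the function name are assumptions.

```python
import math


def tod_cluster(samples, threshold):
    """Single-pass threshold clustering; the result depends on sample order."""
    centers, members = [], []
    for x in samples:
        # Find the nearest existing center, if any.
        best, best_dist = None, float("inf")
        for i, c in enumerate(centers):
            dist = math.dist(x, c)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist <= threshold:
            members[best].append(x)
            n = len(members[best])
            # Running mean: move the center toward the new sample.
            centers[best] = [c + (xi - c) / n for c, xi in zip(centers[best], x)]
        else:
            # Too far from every center: the sample seeds a new cluster.
            centers.append(list(x))
            members.append([x])
    return centers, members


centers, members = tod_cluster([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], 1.0)
```

The threshold controls how many clusters come out, so it plays the role of the compression level for the training set.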


Toydata

Samples: 2,500 samples per class, generated from a mixture of Gaussians

Clustering algorithm: TOD

Clustering results: 25 positive and 25 negative clusters


MNIST

Data description
10 classes: handwritten digits '0'-'9'
Training samples: 60,000, about 6,000 for each class
Testing samples: 10,000
Construct 45 binary classifiers

Results
25 clusters for EM algorithm
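The count of 45 binary classifiers matches a one-versus-one scheme over the 10 digit classes, since 10 choose 2 is 45. The sketch below enumerates the pairs; the majority-vote combination and the `predict_one_vs_one` helper are assumptions, as the slide does not say how the binary outputs are combined.

```python
from collections import Counter
from itertools import combinations

# Every unordered pair of digit classes gets its own binary classifier.
pairs = list(combinations(range(10), 2))
print(len(pairs))  # 45


def predict_one_vs_one(x, classifiers):
    """classifiers maps a class pair (a, b) to a function returning a or b."""
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


# Dummy classifiers that always pick the smaller digit, for illustration only.
dummy = {(a, b): (lambda x, a=a: a) for a, b in pairs}
winner = predict_one_vs_one(None, dummy)
```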


MNIST

Test results for TOD algorithm


Privacy-preserving Data Mining

Inter-enterprise data mining

Problem: Two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases without revealing any unnecessary information.

Horizontally partitioned: records (users) split across companies. Example: credit card fraud detection model.

Vertically partitioned: attributes split across companies. Example: associations across websites.
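The two partitioning schemes can be illustrated on a toy record set. The records, field names, and helper functions below are hypothetical; the sketch only shows that a horizontal split divides rows while a vertical split divides columns and keeps a shared key.

```python
def horizontal_partition(records, sizes):
    """Split rows across parties; every party keeps all attributes."""
    parts, start = [], 0
    for size in sizes:
        parts.append(records[start:start + size])
        start += size
    return parts


def vertical_partition(records, key, attribute_groups):
    """Split attributes across parties; every party keeps the join key."""
    return [
        [{key: r[key], **{a: r[a] for a in attrs}} for r in records]
        for attrs in attribute_groups
    ]


records = [
    {"user": "u1", "age": 23, "salary_k": 50},
    {"user": "u2", "age": 17, "salary_k": 30},
    {"user": "u3", "age": 43, "salary_k": 40},
]
by_rows = horizontal_partition(records, [2, 1])
by_cols = vertical_partition(records, "user", [["age"], ["salary_k"]])
```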


Privacy-preserving Data Mining

Randomization approach

[Figure: original records (50 | 40K | ..., 30 | 70K | ...) pass through a Randomizer, giving randomized records (65 | 20K | ..., 25 | 60K | ...). The distributions of Age and Salary are reconstructed from the randomized records and passed to the Data Mining Algorithms, which produce the Model.]


Classification Example

Age | Salary | Repeat Visitor?
23 | 50K | Repeat
17 | 30K | Repeat
43 | 40K | Repeat
68 | 50K | Single
32 | 70K | Single
20 | 20K | Repeat

Decision tree:
Age < 25?
  Yes: Repeat
  No: Salary < 50K?
    Yes: Repeat
    No: Single
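The decision tree in this example can be written as a small function and checked against the six records in the table. The function name and the encoding of salaries in thousands are illustrative choices.

```python
def classify_visitor(age, salary_k):
    """Decision tree from the example: split on age first, then on salary."""
    if age < 25:
        return "Repeat"
    if salary_k < 50:
        return "Repeat"
    return "Single"


# (age, salary in thousands, label) rows from the table.
records = [
    (23, 50, "Repeat"),
    (17, 30, "Repeat"),
    (43, 40, "Repeat"),
    (68, 50, "Single"),
    (32, 70, "Single"),
    (20, 20, "Repeat"),
]
predictions = [classify_visitor(age, salary) for age, salary, _ in records]
```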


Privacy-preserving Dataset: Adult

Data description
Training samples: 30,162
Testing samples: 15,060
Percentage of positive samples: 24.78%

Procedure
Horizontally partition the data into three subsets (parties)
Cluster by TOD algorithm
Obtain three positive and three negative GMMs
Combine the positive and negative GMMs into one positive and one negative GMM with modified priors
Classify them by SCM
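The combination step can be sketched as follows. The slide does not say how the priors are modified; the sketch assumes each party's component priors are rescaled by that party's share of the class samples, so the merged priors again sum to one. The data layout and function name are hypothetical.

```python
def merge_gmms(party_gmms, party_sizes):
    """Merge per-party GMMs of one class into a single GMM.

    party_gmms: one list of (prior, mean, var) components per party.
    party_sizes: number of samples of this class held by each party.
    Assumption: new prior = old prior * (party size / total size).
    """
    total = sum(party_sizes)
    merged = []
    for components, n in zip(party_gmms, party_sizes):
        for prior, mean, var in components:
            merged.append((prior * n / total, mean, var))
    return merged


party_a = [(0.5, [0.0], [1.0]), (0.5, [2.0], [1.0])]  # 100 samples
party_b = [(1.0, [5.0], [2.0])]                       # 300 samples
merged = merge_gmms([party_a, party_b], [100, 300])
```

Only the component parameters and the sample counts leave each party, which is what lets the individual records stay hidden.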


Privacy-preserving Dataset: Adult

Partition results

Experimental results


Discussions

Solved problems
Large-scale problems: downsample by clustering + classifier
Privacy-preserving problems: hide individual information

Differences from other methods
Training units are generative models; testing units are vectors
Training units contain complete statistical information
Only one parameter for model selection
Easy implementation
Generalization ability is not clear, whereas the RBF kernel in SVM has the property that a larger width leads to a lower VC dimension
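To illustrate training on generative units and testing on vectors, here is a minimal sketch of a decision function. It assumes a test vector is treated as a Gaussian with vanishing covariance, so the kernel between a cluster and the vector reduces to the cluster's density at that point, and it assumes diagonal covariances. The multipliers `alpha` and the `bias` are hypothetical placeholders for values a trained model would supply.

```python
import math


def gaussian_density(x, mean, var):
    """Density of a diagonal Gaussian N(mean, var) at the vector x."""
    log_p = 0.0
    for xi, m, v in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return math.exp(log_p)


def scm_decision(x, clusters, bias):
    """clusters: list of (alpha, label, mean, var), with label in {+1, -1}."""
    score = sum(
        alpha * label * gaussian_density(x, mean, var)
        for alpha, label, mean, var in clusters
    )
    return score + bias


# One positive cluster at 0 and one negative cluster at 4 (made-up values).
clusters = [(1.0, +1, [0.0], [1.0]), (1.0, -1, [4.0], [1.0])]
left = scm_decision([0.5], clusters, 0.0)
right = scm_decision([3.5], clusters, 0.0)
```

The sign of the returned score gives the predicted class, as in a standard SVM.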


Discussions

Advantages of using priors and covariances


Thank you!