Classification & Clustering
魏志達 Jyh-Da Wei -- Parametric and Nonparametric Methods
Introduction to Machine Learning (Chap 4,5,7,8), E. Alpaydin

Jan 14, 2016

Transcript
Page 1

Classification & Clustering

魏志達 Jyh-Da Wei

-- Parametric and Nonparametric Methods

Introduction to Machine Learning (Chap 4,5,7,8), E. Alpaydin

Page 2

Classes vs. Clusters

Classification: supervised learning – Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron

Clustering: unsupervised learning – K-Means, Expectation Maximization, Self-Organizing Map

           Parametric    Nonparametric    Networks
Classes    PR            Kernel, KNN      MLP
Clusters   K-Means, EM   Agglomerative    SOM


Page 4

Bayes' Rule

P(C|x) = p(x|C) P(C) / p(x)

posterior = likelihood × prior / evidence

p(x) = p(x|C=1) P(C=1) + p(x|C=0) P(C=0)

P(C=0|x) + P(C=1|x) = 1

choose C=1 if P(C=1|x) > P(C=0|x)
(i.e., p(x|C=1) P(C=1) > p(x|C=0) P(C=0))

Because once x is given, p(x) is the same for every class.
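The decision rule above can be sketched concretely. This is an illustrative example, not from the slides: two classes with 1-D Gaussian class-conditional densities sharing one sigma; the names mu0, mu1, sigma, and prior1 are assumptions for the sketch.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def posterior_c1(x, mu0, mu1, sigma, prior1):
    """P(C=1 | x) by Bayes' rule: likelihood * prior / evidence."""
    prior0 = 1.0 - prior1
    lik0 = gaussian_pdf(x, mu0, sigma) * prior0   # p(x|C=0) P(C=0)
    lik1 = gaussian_pdf(x, mu1, sigma) * prior1   # p(x|C=1) P(C=1)
    return lik1 / (lik0 + lik1)                   # evidence = sum over both classes

def choose_class(x, mu0, mu1, sigma, prior1):
    """Pick C=1 iff P(C=1|x) > P(C=0|x); p(x) cancels in the comparison."""
    return 1 if posterior_c1(x, mu0, mu1, sigma, prior1) > 0.5 else 0
```

With equal priors and equal variances, the posterior crosses 0.5 exactly halfway between the two means, as the later slides show.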

Page 5

Bayes' Rule: K>2 Classes

P(Ci|x) = p(x|Ci) P(Ci) / p(x) = p(x|Ci) P(Ci) / Σk=1..K p(x|Ck) P(Ck)

P(Ci) ≥ 0 and Σi=1..K P(Ci) = 1

choose Ci if P(Ci|x) = maxk P(Ck|x)
(i.e., p(x|Ci) P(Ci) = maxk p(x|Ck) P(Ck))

Because once x is given, p(x) is the same for every class.

Page 6

Gaussian (Normal) Distribution

p(x) = 1/(√(2π)σ) exp(−(x − μ)²/(2σ²)),  p(x) = N(μ, σ²)

Estimate μ and σ²:

m = Σt xt / N

s² = Σt (xt − m)² / N

p̂(x) = 1/(√(2π)s) exp(−(x − m)²/(2s²))
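The two estimators above can be sketched directly; this is a minimal illustration (function names are mine, not the slides'):

```python
import math

def estimate_gaussian(samples):
    """Return (m, s2): m = sum(x^t)/N and s2 = sum((x^t - m)^2)/N, as on the slide."""
    n = len(samples)
    m = sum(samples) / n
    s2 = sum((x - m) ** 2 for x in samples) / n   # ML (biased) variance estimate
    return m, s2

def normal_pdf(x, m, s2):
    """Plug the estimates back into p(x) = 1/(sqrt(2*pi*s2)) * exp(-(x-m)^2/(2*s2))."""
    return math.exp(-(x - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
```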

Page 7

Equal variances

Single boundary at halfway between means

P(C1)=P(C2)

Page 8

Variances are different

Two boundaries

P(C1)=P(C2)

Page 9

Multivariate Normal Distribution

x ~ Nd(μ, ∑)

p(x) = 1/((2π)^(d/2) |∑|^(1/2)) exp(−(1/2)(x − μ)T ∑−1 (x − μ))

Page 10

Multivariate Normal Distribution

Mahalanobis distance: (x – μ)T ∑–1 (x – μ)

measures the distance from x to μ in terms of ∑ (normalizes for difference in variances and correlations)

Bivariate: d = 2, with zi = (xi − μi)/σi

p(x1, x2) = 1/(2πσ1σ2√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(z1² − 2ρz1z2 + z2²))
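A minimal sketch of the Mahalanobis distance for the bivariate case, inverting the 2×2 covariance matrix by hand; the function name and inputs are illustrative, not from the slides:

```python
def mahalanobis2(x, mu, sigma):
    """(x - mu)^T Sigma^{-1} (x - mu) for d = 2.
    x, mu: length-2 vectors; sigma: symmetric 2x2 covariance [[a, b], [b, c]]."""
    a, b = sigma[0]
    _, c = sigma[1]
    det = a * c - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Sigma^{-1} = (1/det) * [[c, -b], [-b, a]], expanded into the quadratic form
    return (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
```

With the identity covariance this reduces to squared Euclidean distance; larger variance along a direction shrinks distances along that direction, which is the normalization the slide describes.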

Page 11

Bivariate Normal

Page 12

Page 13

Estimation of Parameters

P̂(Ci) = Σt rti / N

mi = Σt rti xt / Σt rti

Si = Σt rti (xt − mi)(xt − mi)T / Σt rti
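The per-class estimates above can be sketched for 1-D data (so Si reduces to a scalar variance); variable names here are illustrative:

```python
def class_estimates(xs, rs):
    """Per-class ML estimates from labeled data, as on the slide.
    xs: 1-D samples x^t; rs: 0/1 labels r_i^t for membership in class Ci."""
    n = len(xs)
    ni = sum(rs)                                            # samples in Ci
    prior = ni / n                                          # P̂(Ci) = sum_t r_i^t / N
    mi = sum(r * x for x, r in zip(xs, rs)) / ni            # class mean
    si = sum(r * (x - mi) ** 2 for x, r in zip(xs, rs)) / ni  # class variance
    return prior, mi, si
```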

Page 14

likelihoods

posterior for C1

discriminant: P(C1|x) = 0.5

With only two classes, the boundary falls exactly at 0.5.

Page 15

break

Page 16

Classes vs. Clusters

Classification: supervised learning – Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron

Clustering: unsupervised learning – K-Means, Expectation Maximization, Self-Organizing Map

           Parametric    Nonparametric    Networks
Classes    PR            Kernel, KNN      MLP
Clusters   K-Means, EM   Agglomerative    SOM

Page 17

Parametric vs. Nonparametric

Parametric Methods

– Advantage: reduces the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters.

– Disadvantage: this assumption does not always hold, and we may incur a large error if it does not.

Nonparametric Methods

– Keep the training data; "let the data speak for itself"

– Given x, find a small number of closest training instances and interpolate from these

– Nonparametric methods are also called memory-based or instance-based learning algorithms.

Page 18

Density Estimation

Given the training set X = { xt }t drawn iid (independent and identically distributed) from p(x)

Divide the data into bins of size h

Histogram estimator: (Figure – next page)

p̂(x) = #{xt in the same bin as x} / (Nh)

Extreme case: p̂(x) = 1/h, when the bin holds all N samples

Here xt is the t-th element of the sample set.
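The histogram estimator can be sketched in a few lines; the `origin` parameter (where the bin grid starts) is an assumed choice, not something the slide fixes:

```python
import math

def histogram_estimate(x, samples, h, origin=0.0):
    """p̂(x) = #{x^t in the same bin as x} / (N h),
    with bins [origin + k*h, origin + (k+1)*h)."""
    bin_index = math.floor((x - origin) / h)
    count = sum(1 for xt in samples
                if math.floor((xt - origin) / h) == bin_index)
    return count / (len(samples) * h)
```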

Page 19

[Figure: histogram estimate p̂(x) = #{xt in the same bin as x} / (Nh) for a sample data set]

Page 20

Density Estimation

Given the training set X = { xt }t drawn iid from p(x); x is always at the center of a bin of size 2h

Naive estimator: (Figure – next page)

p̂(x) = #{x − h < xt ≤ x + h} / (2Nh)

or, letting each xt vote,

p̂(x) = (1/(Nh)) Σt w((x − xt)/h), where w(u) = 1/2 if |u| < 1, 0 otherwise

w(u): votes by proximity; each supporting vote counts 1/2, so the integral over [−1, 1] equals 1.
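The voting form of the naive estimator translates directly to code; a minimal sketch:

```python
def w(u):
    """The slide's weight function: 1/2 inside (-1, 1), 0 outside."""
    return 0.5 if abs(u) < 1 else 0.0

def naive_estimate(x, samples, h):
    """p̂(x) = (1/(N h)) * sum_t w((x - x^t)/h)
    = #{x - h < x^t <= x + h} / (2 N h)."""
    n = len(samples)
    return sum(w((x - xt) / h) for xt in samples) / (n * h)
```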

Page 21

[Figure: naive estimate p̂(x) = #{x − h < xt ≤ x + h} / (2Nh) for h = 0.25, 0.5, and 1]

Page 22

Kernel Estimator

Kernel function, e.g., Gaussian kernel:

K(u) = (1/√(2π)) exp(−u²/2)

Kernel estimator (Parzen windows): Figure – next page

p̂(x) = (1/(Nh)) Σt K((x − xt)/h)

If K is Gaussian, then p̂ will be smooth, having all the derivatives.

K(u): scores by proximity; its integral over the real line equals 1.
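A Parzen-window sketch using the Gaussian kernel from the slide:

```python
import math

def gaussian_kernel(u):
    """K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def kernel_estimate(x, samples, h):
    """p̂(x) = (1/(N h)) * sum_t K((x - x^t)/h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xt) / h) for xt in samples) / (n * h)
```

Because each sample contributes a smooth bump, the resulting estimate is smooth everywhere, unlike the histogram and naive estimators.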

Page 23

[Figure: Gaussian kernel K(u) = (1/√(2π)) exp(−u²/2), plotted for u ∈ [−5, 5]]

Page 24

[Figure: kernel estimate p̂(x) = (1/(Nh)) Σt K((x − xt)/h) for a sample data set]

Page 25

Generalization to Multivariate Data

Kernel density estimator

p̂(x) = (1/(Nh^d)) Σt K((x − xt)/h)

with the requirement that ∫Rd K(x) dx = 1

Multivariate Gaussian kernel

spheric: K(u) = (1/(2π)^(d/2)) exp(−‖u‖²/2)

ellipsoid: K(u) = 1/((2π)^(d/2)|S|^(1/2)) exp(−(1/2) uT S−1 u)

Page 26

k-Nearest Neighbor Estimator

Instead of fixing the bin width h and counting the number of instances, fix the number of neighbors k and check the bin width

dk(x): distance to the kth closest instance to x

p̂(x) = k / (2N dk(x))
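The k-NN estimator above fits in a few lines for 1-D data; a minimal sketch:

```python
def knn_estimate(x, samples, k):
    """p̂(x) = k / (2 N d_k(x)), where d_k(x) is the distance
    to the k-th closest sample to x."""
    dists = sorted(abs(x - xt) for xt in samples)
    dk = dists[k - 1]                     # distance to the k-th nearest instance
    return k / (2.0 * len(samples) * dk)
```

Where samples are dense, dk(x) is small and the estimate is high; where they are sparse, the window must grow and the estimate drops.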

Page 27

[Figure: k-NN estimate p̂(x) = k / (2N dk(x))]

Grow the window in both directions at once, and see how far it must reach to take in k samples.

Page 28

Nonparametric Classification (kernel estimator)

p̂(x|Ci) = (1/(Ni h^d)) Σt K((x − xt)/h) rti

p̂(x|Ci) P̂(Ci) = (1/(N h^d)) Σt K((x − xt)/h) rti

rti is 1 or 0 according to whether xt belongs to Ci.

Originally we would compare the values p(Ci|x) = p(x, Ci)/p(x); but once x is given, p(x) is the same for every class, so everyone drops it and the formula looks cleaner.

Ignoring the coefficient, look only at the sum: it accumulates the scores of the committee members, positive real values assigned by proximity.
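Since the factor 1/(Nh^d) is shared by every class, it suffices to compare the per-class score sums gi(x) = Σt rti K((x − xt)/h). A 1-D sketch (names are illustrative):

```python
import math

def gaussian_kernel(u):
    """K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def kernel_classify(x, samples, labels, h):
    """Return argmax_i g_i(x), where each labeled sample scores the
    query by proximity; the shared 1/(N h^d) factor is dropped."""
    scores = {}
    for xt, c in zip(samples, labels):
        scores[c] = scores.get(c, 0.0) + gaussian_kernel((x - xt) / h)
    return max(scores, key=scores.get)
```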

Page 29

Nonparametric Classification: k-nn estimator (1)

For the special case of the k-nn estimator

p̂(x|Ci) = ki / (Ni Vk(x))

where

ki: the number of neighbors out of the k nearest that belong to Ci

Vk(x): the volume of the d-dimensional hypersphere centered at x, with radius r = ‖x − x(k)‖

cd: the volume of the unit sphere in d dimensions, so Vk = cd r^d

For example,

d = 1: V = c1 r = 2r
d = 2: V = c2 r² = πr²
d = 3: V = c3 r³ = (4/3)πr³

Page 30

Nonparametric Classification: k-nn estimator (2)

From

p̂(x|Ci) = ki / (Ni Vk(x)),  P̂(Ci) = Ni / N,  p̂(x) = k / (N Vk(x))

Then

P̂(Ci|x) = p̂(x|Ci) P̂(Ci) / p̂(x) = ki / k

We want to compare the values p(Ci|x) = p(x, Ci)/p(x); although p(x) is the same for every class once x is given, writing it out here yields the cleaner derived formula.

Meaning: by the time k samples have been gathered, the class with the most members present wins.
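The result P̂(Ci|x) = ki/k is just a vote among the k nearest neighbors; a 1-D sketch:

```python
def knn_posteriors(x, samples, labels, k):
    """Return {class: k_i / k} over the classes present among
    the k nearest neighbors of x."""
    nearest = sorted(zip(samples, labels), key=lambda p: abs(x - p[0]))[:k]
    counts = {}
    for _, c in nearest:
        counts[c] = counts.get(c, 0) + 1
    return {c: n / k for c, n in counts.items()}
```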

Page 31

break

Page 32

Classes vs. Clusters

Classification: supervised learning – Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron

Clustering: unsupervised learning – K-Means, Expectation Maximization, Self-Organizing Map

           Parametric    Nonparametric    Networks
Classes    PR            Kernel, KNN      MLP
Clusters   K-Means, EM   Agglomerative    SOM

Page 33

Classes vs. Clusters

Supervised: X = { xt, rt }t

Classes Ci, i = 1, ..., K

where p(x|Ci) ~ N(μi, ∑i)

Φ = {P(Ci), μi, ∑i}, i = 1..K

p(x) = Σi=1..K p(x|Ci) P(Ci)

P̂(Ci) = Σt rti / N

mi = Σt rti xt / Σt rti

Si = Σt rti (xt − mi)(xt − mi)T / Σt rti

Unsupervised: X = { xt }t

Clusters Gi, i = 1, ..., k

where p(x|Gi) ~ N(μi, ∑i)

Φ = {P(Gi), μi, ∑i}, i = 1..k

p(x) = Σi=1..k p(x|Gi) P(Gi)

Labels rti?

Page 34

k-Means Clustering

Find k reference vectors (prototypes / codebook vectors / codewords) which best represent the data

Reference vectors: mj, j = 1, ..., k

Use the nearest (most similar) reference:

‖xt − mi‖ = minj ‖xt − mj‖

bti = 1 if ‖xt − mi‖ = minj ‖xt − mj‖, 0 otherwise

Reconstruction error

E({mi}i=1..k | X) = Σt Σi bti ‖xt − mi‖²

We want the cluster centers that minimize the total deviation.
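The procedure above can be sketched on 1-D data: assign each xt to its nearest center (winner takes all), then recompute each center as its group mean in one step. Names and the iteration cap are illustrative.

```python
def kmeans(samples, centers, iters=20):
    """Batch k-means: alternate hard assignment and group-mean update."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in samples:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[i].append(x)                      # winner takes all
        centers = [sum(g) / len(g) if g else c       # group mean in one step
                   for g, c in zip(groups, centers)]
    return centers

def reconstruction_error(samples, centers):
    """E = sum_t min_i ||x^t - m_i||^2, the slide's objective."""
    return sum(min((x - c) ** 2 for c in centers) for x in samples)
```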

Page 35

Encoding/Decoding

[Figure: encoder/decoder, with bti = 1 if ‖xt − mi‖ = minj ‖xt − mj‖, 0 otherwise]

Page 36

k-means Clustering

1. Winner takes all
2. No step-by-step correction; each group mean is recomputed in one shot
3. A worked example follows on the next page; a counterexample (boundary points "defecting" to another cluster) will be given in class

Page 37

Page 38

EM in Gaussian Mixtures

zti = 1 if xt belongs to Gi, 0 otherwise (the labels rti of supervised learning); assume p(x|Gi) ~ N(μi, ∑i)

E-step:

hti = E[zti | X, Φl] = P(Gi|xt, Φl) = p(xt|Gi, Φl) P(Gi) / Σj p(xt|Gj, Φl) P(Gj)

M-step:

P(Gi) = Σt hti / N

mi(l+1) = Σt hti xt / Σt hti

Si(l+1) = Σt hti (xt − mi(l+1))(xt − mi(l+1))T / Σt hti

Use estimated labels in place of the unknown labels

With P(Gi) as backing, there is no fear of boundary points defecting.
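One EM iteration can be sketched for a 1-D mixture of Gaussians, following the E- and M-steps above; all names are illustrative, and this is a sketch rather than a production implementation (no convergence check or variance floor).

```python
import math

def normal_pdf(x, m, s2):
    """N(m, s2) density at x."""
    return math.exp(-(x - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

def em_step(xs, priors, means, variances):
    """One EM iteration for a 1-D Gaussian mixture."""
    k, n = len(means), len(xs)
    # E-step: h_i^t = p(x^t|G_i) P(G_i) / sum_j p(x^t|G_j) P(G_j)
    h = []
    for x in xs:
        num = [priors[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
        z = sum(num)
        h.append([v / z for v in num])
    # M-step: weighted re-estimates, soft labels h in place of the unknown labels
    new_priors, new_means, new_vars = [], [], []
    for i in range(k):
        hi = sum(h[t][i] for t in range(n))
        new_priors.append(hi / n)
        mi = sum(h[t][i] * xs[t] for t in range(n)) / hi
        new_means.append(mi)
        new_vars.append(sum(h[t][i] * (xs[t] - mi) ** 2 for t in range(n)) / hi)
    return new_priors, new_means, new_vars
```

Unlike k-means, each sample contributes to every component in proportion to hti, and the priors P(Gi) are re-estimated alongside the means and variances.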

Page 39

P(G1|x)=h1=0.5

Page 40

Classes vs. Clusters

Classification: supervised learning – Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron

Clustering: unsupervised learning – K-Means, Expectation Maximization, Self-Organizing Map

           Parametric    Nonparametric    Networks
Classes    PR            Kernel, KNN      MLP
Clusters   K-Means, EM   Agglomerative    SOM

Page 41

Agglomerative Clustering

Start with N groups, each holding one instance, and merge the two closest groups at each iteration

Distance between two groups Gi and Gj:

– Single-link: d(Gi, Gj) = min over xr ∈ Gi, xs ∈ Gj of d(xr, xs)

– Complete-link: d(Gi, Gj) = max over xr ∈ Gi, xs ∈ Gj of d(xr, xs)

– Average-link, centroid
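The merge loop above can be sketched for single-link on 1-D points; `target` (the number of groups at which to stop) is an assumed parameter, since the slide leaves the stopping point to the dendrogram.

```python
def single_link(points, target):
    """Start with singleton groups; repeatedly merge the two groups whose
    minimum pairwise distance is smallest, until `target` groups remain."""
    groups = [[p] for p in points]
    while len(groups) > target:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(abs(a - b) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups
```

Replacing `min` with `max` in the pairwise distance gives complete-link.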

Page 42

Dendrogram

Example: Single-Link Clustering

[Dendrogram over: human, bonobo (pygmy chimpanzee), chimpanzee, gorilla, gibbon, macaque]

The grouping can be chosen dynamically by cutting the dendrogram at any level.