Unsupervised Learning and Clustering. Shyh-Kang Jeng, Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University

Jan 18, 2016

Page 1

Unsupervised Learning and Clustering

Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University

Page 2

Supervised vs. Unsupervised Learning

Supervised training procedures
– Use samples labeled by their category membership

Unsupervised training procedures
– Use unlabeled samples

Page 3

Reasons for interest

Collecting and labeling a large set of sample patterns can be costly
– e.g., speech

Train with a large amount of unlabeled data, then use supervision to label the groupings found
– For "data mining" applications

Improved performance when the characteristics of the patterns change slowly, by tracking them in an unsupervised mode
– Automated food classification when seasons change

Page 4

Reasons for interest

Unsupervised methods can find features that will then be useful for categorization
– Data-dependent "smart preprocessing" or "smart feature extraction"

Perform exploratory data analysis and gain insight into the nature or structure of the data
– Discovery of distinct clusters may suggest altering the approach to designing the classifier

Page 5

Basic Assumptions to Begin with

Samples come from a known number $c$ of classes
Prior probabilities $P(\omega_j)$ for each class are known
Forms for the class-conditional probability densities $p(\mathbf{x}|\omega_j,\boldsymbol{\theta}_j)$ are known
Values for the parameter vectors $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_c$ are unknown
Category labels are unknown

Page 6

Mixing Density

The probability density function for the samples is of the form

\[ p(\mathbf{x}|\boldsymbol{\theta}) = \sum_{j=1}^{c} p(\mathbf{x}|\omega_j,\boldsymbol{\theta}_j)\,P(\omega_j), \qquad \boldsymbol{\theta} = (\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_c)^t \]

$p(\mathbf{x}|\omega_j,\boldsymbol{\theta}_j)$: component densities
$P(\omega_j)$: mixing parameters
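The mixture form on this slide can be evaluated directly once a component family is chosen. The following is a minimal sketch, assuming univariate normal components with unit variance; the function names and the example values of the means and priors are hypothetical, chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, std=1.0):
    """Univariate normal density N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, means, priors):
    """p(x | theta) = sum_j p(x | omega_j, theta_j) P(omega_j)."""
    return sum(P * gaussian_pdf(x, m) for m, P in zip(means, priors))

# Hypothetical two-component example: theta = (-2, 2), priors (1/3, 2/3).
density_at_zero = mixture_pdf(0.0, means=[-2.0, 2.0], priors=[1 / 3, 2 / 3])
```

Here each $\boldsymbol{\theta}_j$ reduces to a single mean, and the priors play the role of the mixing parameters.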

Page 7

Goal and Approach

Use samples drawn from the mixture density to estimate the unknown parameter vector $\boldsymbol{\theta}$
With known $\boldsymbol{\theta}$, we can decompose the mixture into its components and use a maximum a posteriori classifier on the derived densities

Page 8

Existence of Solutions

Suppose an unlimited number of samples and nonparametric methods are available
If there is only one value of $\boldsymbol{\theta}$ that will produce the observed values for $p(\mathbf{x}|\boldsymbol{\theta})$, a solution is possible in principle
If several different values of $\boldsymbol{\theta}$ can produce the same values for $p(\mathbf{x}|\boldsymbol{\theta})$, then there is no hope of obtaining a unique solution

Page 9

Identifiable Density

$p(\mathbf{x}|\boldsymbol{\theta})$ is identifiable if $\boldsymbol{\theta} \neq \boldsymbol{\theta}'$ implies that there exists at least one $\mathbf{x}$ such that $p(\mathbf{x}|\boldsymbol{\theta}) \neq p(\mathbf{x}|\boldsymbol{\theta}')$
$p(\mathbf{x}|\boldsymbol{\theta})$ is not identifiable if we cannot recover a unique $\boldsymbol{\theta}$, even from an infinite amount of data
$p(\mathbf{x}|\boldsymbol{\theta})$ is completely unidentifiable if we cannot infer any of the individual parameters

Page 10

An Example of Unidentifiable Mixture of Discrete Distributions

$x$: binary

\[ P(x|\boldsymbol{\theta}) = \frac{1}{2}\theta_1^{x}(1-\theta_1)^{1-x} + \frac{1}{2}\theta_2^{x}(1-\theta_2)^{1-x} = \begin{cases} \frac{1}{2}(\theta_1+\theta_2) & \text{if } x = 1 \\ 1 - \frac{1}{2}(\theta_1+\theta_2) & \text{if } x = 0 \end{cases} \]

\[ P(x=1|\boldsymbol{\theta}) = 0.6,\quad P(x=0|\boldsymbol{\theta}) = 0.4 \;\Rightarrow\; \theta_1 + \theta_2 = 1.2 \]
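A quick numerical check makes the point of this example concrete: only the sum $\theta_1+\theta_2$ is determined by the observed probabilities. The sketch below uses a hypothetical helper `bernoulli_mixture` and three arbitrarily chosen parameter pairs that all sum to 1.2.

```python
def bernoulli_mixture(x, theta1, theta2):
    """P(x | theta) for an equal-weight mixture of two Bernoulli components."""
    return (0.5 * theta1 ** x * (1 - theta1) ** (1 - x)
            + 0.5 * theta2 ** x * (1 - theta2) ** (1 - x))

# Three different parameter vectors, all with theta1 + theta2 = 1.2.
candidates = [(0.5, 0.7), (0.3, 0.9), (0.6, 0.6)]
table = [(bernoulli_mixture(1, t1, t2), bernoulli_mixture(0, t1, t2))
         for t1, t2 in candidates]
# Every row of `table` is (0.6, 0.4) up to rounding, so the data cannot
# distinguish between the candidates.
```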

Page 11

An Example of Unidentifiable Mixture of Gaussian Distributions

\[ p(x|\boldsymbol{\theta}) = \frac{P(\omega_1)}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x-\theta_1)^2\right] + \frac{P(\omega_2)}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x-\theta_2)^2\right] \]

cannot be uniquely identified when $P(\omega_1) = P(\omega_2)$, because $\theta_1$ and $\theta_2$ can then be interchanged without changing $p(x|\boldsymbol{\theta})$
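The interchange of $\theta_1$ and $\theta_2$ can be checked numerically. This is a small sketch with hypothetical values ($\theta = (-2, 2)$ and a handful of test points): with equal priors the swapped parameters give the same density, while with unequal priors they do not.

```python
import math

def two_gaussian_mixture(x, theta1, theta2, prior1):
    """Two unit-variance normal components with priors prior1 and 1 - prior1."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    return (prior1 * norm * math.exp(-0.5 * (x - theta1) ** 2)
            + (1.0 - prior1) * norm * math.exp(-0.5 * (x - theta2) ** 2))

xs = [-3.0, -1.0, 0.0, 0.5, 2.0]

# Difference between the density and the density with the means swapped.
diff_equal_priors = [two_gaussian_mixture(x, -2.0, 2.0, 0.5)
                     - two_gaussian_mixture(x, 2.0, -2.0, 0.5) for x in xs]
diff_unequal_priors = [two_gaussian_mixture(x, -2.0, 2.0, 1 / 3)
                       - two_gaussian_mixture(x, 2.0, -2.0, 1 / 3) for x in xs]
```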

Page 12

Maximum-Likelihood Estimates

$D = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$: $n$ unlabeled samples drawn independently from

\[ p(\mathbf{x}|\boldsymbol{\theta}) = \sum_{j=1}^{c} p(\mathbf{x}|\omega_j,\boldsymbol{\theta}_j)\,P(\omega_j) \]

The full parameter vector $\boldsymbol{\theta}$ is fixed and unknown

Likelihood of the observed samples:

\[ p(D|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta}) \]

Maximum-likelihood estimate $\hat{\boldsymbol{\theta}}$: the value of $\boldsymbol{\theta}$ that maximizes $p(D|\boldsymbol{\theta})$
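In practice the product over samples is evaluated as a sum of logarithms. The following is a minimal sketch, assuming univariate unit-variance normal components; `log_likelihood` and its arguments are hypothetical names introduced for illustration.

```python
import math

def log_likelihood(samples, means, priors):
    """ln p(D | theta) = sum_k ln sum_j p(x_k | omega_j, theta_j) P(omega_j)."""
    total = 0.0
    for x in samples:
        # Mixture density at x, with unit-variance normal components.
        p = sum(P * math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2.0 * math.pi)
                for m, P in zip(means, priors))
        total += math.log(p)
    return total

# Hypothetical data and parameters.
value = log_likelihood([-2.1, -1.8, 1.9, 2.4], means=[-2.0, 2.0], priors=[0.5, 0.5])
```

Comparing `log_likelihood` for different candidate means is one simple way to see which parameter vector explains the unlabeled samples better.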

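Page 13

Maximum-Likelihood Estimates

\[ l = \ln p(D|\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k|\boldsymbol{\theta}) \]

\[ \nabla_{\boldsymbol{\theta}_i} l = \sum_{k=1}^{n} \frac{1}{p(\mathbf{x}_k|\boldsymbol{\theta})} \nabla_{\boldsymbol{\theta}_i}\left[\sum_{j=1}^{c} p(\mathbf{x}_k|\omega_j,\boldsymbol{\theta}_j)\,P(\omega_j)\right] \]

Assume that the elements of $\boldsymbol{\theta}_i$ and $\boldsymbol{\theta}_j$ are functionally independent if $i \neq j$

Posterior probability:

\[ P(\omega_i|\mathbf{x}_k,\boldsymbol{\theta}) = \frac{p(\mathbf{x}_k|\omega_i,\boldsymbol{\theta}_i)\,P(\omega_i)}{p(\mathbf{x}_k|\boldsymbol{\theta})} \]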

Page 14

Maximum-Likelihood Estimates

\[ \nabla_{\boldsymbol{\theta}_i} l = \sum_{k=1}^{n} P(\omega_i|\mathbf{x}_k,\boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}_i} \ln p(\mathbf{x}_k|\omega_i,\boldsymbol{\theta}_i) \]

\[ \sum_{k=1}^{n} P(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})\, \nabla_{\boldsymbol{\theta}_i} \ln p(\mathbf{x}_k|\omega_i,\hat{\boldsymbol{\theta}}_i) = 0, \qquad i = 1, \ldots, c \]
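The step from the gradient of the log-likelihood to the posterior-weighted form can be spelled out as follows. This is a sketch that relies on the functional-independence assumption (only the $i$-th term of the mixture depends on $\boldsymbol{\theta}_i$) and on the identity $\nabla f = f\,\nabla \ln f$.

```latex
\begin{align*}
\nabla_{\boldsymbol{\theta}_i} l
  &= \sum_{k=1}^{n} \frac{1}{p(\mathbf{x}_k|\boldsymbol{\theta})}
     \nabla_{\boldsymbol{\theta}_i}\bigl[p(\mathbf{x}_k|\omega_i,\boldsymbol{\theta}_i)\,P(\omega_i)\bigr] \\
  &= \sum_{k=1}^{n} \frac{p(\mathbf{x}_k|\omega_i,\boldsymbol{\theta}_i)\,P(\omega_i)}{p(\mathbf{x}_k|\boldsymbol{\theta})}
     \nabla_{\boldsymbol{\theta}_i} \ln p(\mathbf{x}_k|\omega_i,\boldsymbol{\theta}_i) \\
  &= \sum_{k=1}^{n} P(\omega_i|\mathbf{x}_k,\boldsymbol{\theta})\,
     \nabla_{\boldsymbol{\theta}_i} \ln p(\mathbf{x}_k|\omega_i,\boldsymbol{\theta}_i)
\end{align*}
```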

Page 15

Maximum-Likelihood Estimates for Unknown Priors

\[ l = \sum_{k=1}^{n} \ln p(\mathbf{x}_k|\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln \sum_{j=1}^{c} p(\mathbf{x}_k|\omega_j,\boldsymbol{\theta}_j)\,P(\omega_j) \]

The search for the maximum value of $l$ extends over $\boldsymbol{\theta}$ and $P(\omega_i)$, subject to the constraints

\[ P(\omega_i) \geq 0,\quad i = 1, \ldots, c, \qquad \sum_{i=1}^{c} P(\omega_i) = 1 \]

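Page 16

Maximum-Likelihood Estimates for Unknown Priors

Maximum-likelihood estimates for $P(\omega_i)$ and $\boldsymbol{\theta}_i$:

\[ \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}}) \]

\[ \sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})\, \nabla_{\boldsymbol{\theta}_i} \ln p(\mathbf{x}_k|\omega_i,\hat{\boldsymbol{\theta}}_i) = 0 \]

\[ \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}}) = \frac{p(\mathbf{x}_k|\omega_i,\hat{\boldsymbol{\theta}}_i)\,\hat{P}(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}_k|\omega_j,\hat{\boldsymbol{\theta}}_j)\,\hat{P}(\omega_j)} \]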
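The first and third conditions on this slide translate into a short computation: form the posterior of each class for each sample, then average the posteriors to get the prior estimate. The sketch below assumes univariate unit-variance normal components; the function names and sample values are hypothetical.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def posteriors(samples, means, priors):
    """Rows: samples k. Columns: classes i. Entry: P(omega_i | x_k, theta)."""
    table = []
    for x in samples:
        joint = [P * normal_pdf(x, m) for m, P in zip(means, priors)]
        total = sum(joint)
        table.append([j / total for j in joint])
    return table

def update_priors(post):
    """P(omega_i) estimate = (1/n) * sum_k P(omega_i | x_k, theta)."""
    n = len(post)
    return [sum(row[i] for row in post) / n for i in range(len(post[0]))]

post = posteriors([-2.0, 0.0, 2.0], means=[-2.0, 2.0], priors=[0.5, 0.5])
new_priors = update_priors(post)
```

Because each row of posteriors sums to one, the updated priors automatically satisfy the constraints of non-negativity and summing to one.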

Page 17

Application to Normal Mixtures

Component densities $p(\mathbf{x}|\omega_i,\boldsymbol{\theta}_i) \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$

Three cases (? = unknown, × = known)

Case   $\boldsymbol{\mu}_i$   $\boldsymbol{\Sigma}_i$   $P(\omega_i)$   $c$
1      ?       ×       ×       ×
2      ?       ?       ?       ×
3      ?       ?       ?       ?

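Page 18

Case 1: Unknown Mean Vectors

\[ \ln p(\mathbf{x}|\omega_i,\boldsymbol{\mu}_i) = -\ln\left[(2\pi)^{d/2}|\boldsymbol{\Sigma}_i|^{1/2}\right] - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) \]

\[ \nabla_{\boldsymbol{\mu}_i} \ln p(\mathbf{x}|\omega_i,\boldsymbol{\mu}_i) = \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) \]

\[ \sum_{k=1}^{n} P(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\mu}})\, \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i) = 0, \qquad \hat{\boldsymbol{\mu}} = (\hat{\boldsymbol{\mu}}_1, \ldots, \hat{\boldsymbol{\mu}}_c)^t \]

\[ \hat{\boldsymbol{\mu}}_i = \frac{\sum_{k=1}^{n} P(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\mu}})\,\mathbf{x}_k}{\sum_{k=1}^{n} P(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\mu}})} \]

\[ P(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\mu}}) = \frac{p(\mathbf{x}_k|\omega_i,\hat{\boldsymbol{\mu}}_i)\,P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}_k|\omega_j,\hat{\boldsymbol{\mu}}_j)\,P(\omega_j)} \]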
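The equation for $\hat{\boldsymbol{\mu}}_i$ is implicit, since the posteriors on the right depend on the means themselves. One common way to use it is as a fixed-point iteration from an initial guess. The following is a minimal sketch for the univariate, unit-variance case with known priors; the synthetic data, the seed, the starting point, and the stopping rule are all assumptions made for illustration, and the iteration may converge to a local rather than global maximum.

```python
import math
import random

def normal_pdf(x, mean):
    """Unit-variance univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def update_means(samples, means, priors):
    """One pass of mu_i <- sum_k P(omega_i|x_k) x_k / sum_k P(omega_i|x_k)."""
    c = len(means)
    weighted_sum = [0.0] * c
    weight = [0.0] * c
    for x in samples:
        joint = [P * normal_pdf(x, m) for m, P in zip(means, priors)]
        total = sum(joint)
        for i in range(c):
            w = joint[i] / total          # posterior P(omega_i | x_k, mu)
            weighted_sum[i] += w * x
            weight[i] += w
    return [s / w for s, w in zip(weighted_sum, weight)]

def estimate_means(samples, means, priors, tol=1e-8, max_iter=1000):
    """Repeat the update until the means stop changing (or max_iter is hit)."""
    for _ in range(max_iter):
        new_means = update_means(samples, means, priors)
        if max(abs(a - b) for a, b in zip(new_means, means)) < tol:
            return new_means
        means = new_means
    return means

# Synthetic data: priors 1/3 and 2/3, true means -2 and 2 (assumed values).
random.seed(0)
samples = [random.gauss(-2.0, 1.0) if random.random() < 1 / 3 else random.gauss(2.0, 1.0)
           for _ in range(500)]
estimate = estimate_means(samples, means=[-1.0, 1.0], priors=[1 / 3, 2 / 3])
```

Starting from a different initial guess, for example with the two means interchanged, can lead the same iteration to a different stationary point.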

Page 19

Case 1: Unknown Mean Vectors

\[ p(x|\mu_1,\mu_2) = \frac{1}{3\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x-\mu_1)^2\right] + \frac{2}{3\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x-\mu_2)^2\right] \]

Page 20

Case 1: Unknown Mean Vectors

True means: $\mu_1 = -2,\ \mu_2 = 2$
Maximum of the likelihood: $\hat{\mu}_1 = -2.130,\ \hat{\mu}_2 = 1.668$
A second peak of the likelihood: $\hat{\mu}_1 = 2.085,\ \hat{\mu}_2 = -1.257$

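Page 21

Case 2: All Parameters Unknown

\[ \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}}) \]

\[ \hat{\boldsymbol{\mu}}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})\,\mathbf{x}_k}{\sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})} \]

\[ \hat{\boldsymbol{\Sigma}}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})\,(\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i)(\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i)^t}{\sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})} \]

\[ \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}}) = \frac{p(\mathbf{x}_k|\omega_i,\hat{\boldsymbol{\theta}}_i)\,\hat{P}(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}_k|\omega_j,\hat{\boldsymbol{\theta}}_j)\,\hat{P}(\omega_j)} \]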
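These four equations can be applied as one update step: compute the posteriors with the current parameters, then re-estimate priors, means, and covariances from them. The sketch below treats the univariate special case, so each covariance matrix reduces to a scalar variance; the function names and the toy data are hypothetical, and no safeguard against a variance collapsing toward zero is included.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_all(samples, means, variances, priors):
    """One joint re-estimation of priors, means, and variances (d = 1)."""
    n, c = len(samples), len(means)
    # Posteriors P(omega_i | x_k, theta) under the current parameters.
    post = []
    for x in samples:
        joint = [priors[i] * normal_pdf(x, means[i], variances[i]) for i in range(c)]
        total = sum(joint)
        post.append([j / total for j in joint])
    new_means, new_vars, new_priors = [], [], []
    for i in range(c):
        weight = sum(row[i] for row in post)
        mean = sum(row[i] * x for row, x in zip(post, samples)) / weight
        # The variance uses the newly estimated mean, as in the slide's formula.
        var = sum(row[i] * (x - mean) ** 2 for row, x in zip(post, samples)) / weight
        new_means.append(mean)
        new_vars.append(var)
        new_priors.append(weight / n)
    return new_means, new_vars, new_priors

# Toy symmetric data set (assumed values).
data = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
m, v, P = update_all(data, means=[-1.0, 1.0], variances=[1.0, 1.0], priors=[0.5, 0.5])
```

Repeating `update_all` until the parameters stop changing gives an iterative scheme for the coupled equations; the result depends on the starting values.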

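Page 22

Case 2: All Parameters Unknown

\[ \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}}) = \frac{p(\mathbf{x}_k|\omega_i,\hat{\boldsymbol{\theta}}_i)\,\hat{P}(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}_k|\omega_j,\hat{\boldsymbol{\theta}}_j)\,\hat{P}(\omega_j)} = \frac{|\hat{\boldsymbol{\Sigma}}_i|^{-1/2} \exp\left[-\frac{1}{2}(\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i)^t \hat{\boldsymbol{\Sigma}}_i^{-1} (\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i)\right] \hat{P}(\omega_i)}{\sum_{j=1}^{c} |\hat{\boldsymbol{\Sigma}}_j|^{-1/2} \exp\left[-\frac{1}{2}(\mathbf{x}_k-\hat{\boldsymbol{\mu}}_j)^t \hat{\boldsymbol{\Sigma}}_j^{-1} (\mathbf{x}_k-\hat{\boldsymbol{\mu}}_j)\right] \hat{P}(\omega_j)} \]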

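Page 23

k-Means Clustering

$\hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})$ is large when $(\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i)^t \hat{\boldsymbol{\Sigma}}_i^{-1} (\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i)$ is small

Merely compute $\|\mathbf{x}_k-\hat{\boldsymbol{\mu}}_i\|^2$, find the mean $\hat{\boldsymbol{\mu}}_m$ nearest to $\mathbf{x}_k$, and approximate $\hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})$ as

\[ \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}}) \approx \begin{cases} 1 & \text{if } i = m \\ 0 & \text{otherwise} \end{cases} \]

Iteratively apply

\[ \hat{\boldsymbol{\mu}}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})\,\mathbf{x}_k}{\sum_{k=1}^{n} \hat{P}(\omega_i|\mathbf{x}_k,\hat{\boldsymbol{\theta}})} \]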

Page 24

k-Means Clustering

begin initialize n, c, μ1, μ2, …, μc
    do classify n samples according to nearest μi
        recompute μi
    until no change in μi
    return μ1, μ2, …, μc
end
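The procedure on this slide can be sketched in a few lines of Python. This is an illustrative implementation, not the author's code: samples are tuples of floats, distance is squared Euclidean, and two details the slide leaves open are filled in as assumptions (an empty cluster keeps its previous mean, and a `max_iter` cap guards against very long runs).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(samples, means, max_iter=100):
    """Alternate classification and mean recomputation until means are stable."""
    means = [tuple(m) for m in means]
    for _ in range(max_iter):
        # Classify each sample according to its nearest mean.
        clusters = [[] for _ in means]
        for x in samples:
            nearest = min(range(len(means)),
                          key=lambda i: squared_distance(x, means[i]))
            clusters[nearest].append(x)
        # Recompute each mean as the average of the samples assigned to it.
        new_means = []
        for old, members in zip(means, clusters):
            if members:
                new_means.append(tuple(sum(x[j] for x in members) / len(members)
                                       for j in range(len(old))))
            else:
                new_means.append(old)  # assumption: keep the mean of an empty cluster
        if new_means == means:         # no change in any mean: stop
            break
        means = new_means
    return means

# Toy data: two well-separated groups (assumed values).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers = k_means(samples, means=[(0.0, 0.0), (10.0, 10.0)])
```

Each pass over the data costs on the order of n·d·c operations, which is where the per-iteration part of the overall complexity comes from.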

Page 25

k-Means Clustering

Complexity $O(ndcT)$, for $n$ samples of dimension $d$, $c$ clusters, and $T$ iterations

In practice, the number of iterations $T$ is generally much less than the number of samples
The values obtained can be accepted as the answer, or can be used as starting points for more exact computations

Page 26

k-Means Clustering

k-means: $\hat{\mu}_1 = -2.176,\ \hat{\mu}_2 = 1.684$
Maximum-likelihood: $\hat{\mu}_1 = -2.130,\ \hat{\mu}_2 = 1.668$

Page 27

k-Means Clustering