Unsupervised Learning and Clustering
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University

Posted on 18-Jan-2016

Transcript

Slide 1: Unsupervised Learning and Clustering (title slide)
Slide 2: Supervised vs. Unsupervised Learning

- Supervised training procedures use samples labeled by their category membership.
- Unsupervised training procedures use unlabeled samples.
Slide 3: Reasons for Interest

- Collecting and labeling a large set of sample patterns can be costly (e.g., speech).
- One can train with a large amount of unlabeled data, then use supervision only to label the groupings found (useful for "data mining" applications).
- For data whose pattern characteristics change slowly over time, tracking the changes in an unsupervised mode can improve performance (e.g., automated food classification as seasons change).
Slide 4: Reasons for Interest (cont.)

- Unsupervised methods can find features that are then useful for categorization: data-dependent "smart preprocessing" or "smart feature extraction".
- Exploratory data analysis can give insight into the nature or structure of the data; the discovery of distinct clusters may suggest altering the approach to designing the classifier.
Slide 5: Basic Assumptions to Begin With

- The samples come from a known number c of classes.
- The prior probabilities P(ω_j) for each class are known.
- The forms of the class-conditional probability densities p(x|ω_j, θ_j) are known.
- The values of the parameter vectors θ_1, ..., θ_c are unknown.
- The category labels are unknown.
Slide 6: Mixing Density

The probability density function for the samples has the mixture form

    p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j),
    \qquad \theta = (\theta_1, \ldots, \theta_c)^t

where the p(x|ω_j, θ_j) are the component densities and the P(ω_j) are the mixing parameters.
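As an illustration, the mixture form above can be evaluated directly. The following sketch (the component means and mixing parameters are invented for illustration) computes p(x|θ) for a two-component univariate Gaussian mixture:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Univariate normal component density N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, priors, means):
    """Mixture density p(x|theta) = sum_j p(x|omega_j, theta_j) P(omega_j)."""
    return sum(P_j * normal_pdf(x, mu_j) for P_j, mu_j in zip(priors, means))

# Hypothetical two-component mixture: P(omega_1) = 0.3, P(omega_2) = 0.7
density_at_zero = mixture_pdf(0.0, [0.3, 0.7], [-2.0, 2.0])
```

The mixing parameters must sum to one for the result to be a proper density.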
Slide 7: Goal and Approach

- Use samples drawn from the mixture density to estimate the unknown parameter vector θ.
- Once θ is known, we can decompose the mixture into its components and use a maximum a posteriori classifier on the derived densities.
Slide 8: Existence of Solutions

- Suppose an unlimited number of samples is available and nonparametric methods can be used.
- If only one value of θ can produce the observed values of p(x|θ), a solution is possible in principle.
- If several different values of θ can produce the same values of p(x|θ), there is no hope of obtaining a unique solution.
Slide 9: Identifiable Density

p(x|θ) is identifiable if θ ≠ θ' implies that there exists at least one x such that

    p(x \mid \theta) \neq p(x \mid \theta')

p(x|θ) is not identifiable if we cannot recover a unique θ, even from an infinite amount of data; it is completely unidentifiable if we cannot infer any of the individual parameters.
Slide 10: An Example of an Unidentifiable Mixture of Discrete Distributions

For binary x,

    P(x \mid \theta) = \frac{1}{2}\,\theta_1^x (1-\theta_1)^{1-x}
                     + \frac{1}{2}\,\theta_2^x (1-\theta_2)^{1-x}
    = \begin{cases}
        \frac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 1 \\[4pt]
        1 - \frac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 0
      \end{cases}

If we observe P(x=1|θ) = 0.6 and P(x=0|θ) = 0.4, we learn only that θ_1 + θ_2 = 1.2; the individual values of θ_1 and θ_2 cannot be inferred.
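A quick numerical check of this unidentifiability (the particular parameter pairs below are invented for illustration): two different (θ_1, θ_2) pairs with the same sum produce identical observable probabilities.

```python
def bernoulli_mixture(x, theta1, theta2):
    """P(x|theta) for binary x: an equal-weight mixture of two Bernoulli components."""
    def component(theta):
        return theta if x == 1 else 1.0 - theta
    return 0.5 * component(theta1) + 0.5 * component(theta2)

# Both pairs sum to 1.2, so both give P(x=1) = 0.6 and P(x=0) = 0.4
p_a = bernoulli_mixture(1, 0.8, 0.4)
p_b = bernoulli_mixture(1, 0.9, 0.3)
```

No amount of data distinguishes the two parameter settings, since every observable probability agrees.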
Slide 11: An Example of an Unidentifiable Mixture of Gaussian Distributions

    p(x \mid \theta) = \frac{P(\omega_1)}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\theta_1)^2\right]
                     + \frac{P(\omega_2)}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\theta_2)^2\right]

When P(ω_1) = P(ω_2), interchanging θ_1 and θ_2 leaves the mixture unchanged, so θ cannot be recovered uniquely.
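The swap symmetry can be verified numerically; in the sketch below (the sample point and parameter values are invented for illustration), swapping θ_1 and θ_2 leaves the density unchanged exactly when the priors are equal.

```python
import math

def gaussian_mixture(x, theta1, theta2, p1=0.5, p2=0.5):
    """p(x|theta) for a mixture of two unit-variance Gaussian components."""
    def g(mu):
        return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)
    return p1 * g(theta1) + p2 * g(theta2)

# With equal priors, (theta1, theta2) and (theta2, theta1) are indistinguishable
same = gaussian_mixture(0.7, -1.0, 2.0) == gaussian_mixture(0.7, 2.0, -1.0)
```

With unequal priors the symmetry is broken, and the mixture becomes identifiable up to this relabeling.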
Slide 12: Maximum-Likelihood Estimates

Let D = {x_1, ..., x_n} be n unlabeled samples drawn independently from the mixture

    p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j)

where the full parameter vector θ is fixed but unknown. The likelihood of the observed samples is

    p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)

and the maximum-likelihood estimate \hat{\theta} is the value of θ that maximizes p(D|θ).
Slide 13: Maximum-Likelihood Estimates (cont.)

The log-likelihood is

    l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)

with gradient

    \nabla_{\theta_i} l = \sum_{k=1}^{n} \frac{1}{p(x_k \mid \theta)}
        \nabla_{\theta_i} \left[ \sum_{j=1}^{c} p(x_k \mid \omega_j, \theta_j)\, P(\omega_j) \right]

Assume that the elements of θ_i and θ_j are functionally independent if i ≠ j, and introduce the posterior probability

    P(\omega_i \mid x_k, \theta) = \frac{p(x_k \mid \omega_i, \theta_i)\, P(\omega_i)}{p(x_k \mid \theta)}
Slide 14: Maximum-Likelihood Estimates (cont.)

The gradient can then be written as

    \nabla_{\theta_i} l = \sum_{k=1}^{n} P(\omega_i \mid x_k, \theta)\,
        \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \theta_i)

so the maximum-likelihood estimate \hat{\theta}_i must satisfy

    \sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\theta})\,
        \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0
Slide 15: Maximum-Likelihood Estimates for Unknown Priors

If the prior probabilities are also unknown, search for the maximum of

    l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)
      = \sum_{k=1}^{n} \ln \left[ \sum_{j=1}^{c} p(x_k \mid \omega_j, \theta_j)\, P(\omega_j) \right]

over θ and the P(ω_i), subject to the constraints

    P(\omega_i) \geq 0, \quad i = 1, \ldots, c, \qquad \sum_{i=1}^{c} P(\omega_i) = 1
Slide 16: Maximum-Likelihood Estimates for Unknown Priors (cont.)

The maximum-likelihood estimates for P(ω_i) and θ_i satisfy

    \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})

    \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\,
        \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0

where

    \hat{P}(\omega_i \mid x_k, \hat{\theta}) =
        \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}
             {\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}
Slide 17: Application to Normal Mixtures

Component densities: p(x|ω_i, θ_i) ~ N(μ_i, Σ_i). Three cases (? = unknown, × = known):

Case | μ_i | Σ_i | P(ω_i) | c
-----|-----|-----|--------|---
 1   |  ?  |  ×  |   ×    | ×
 2   |  ?  |  ?  |   ?    | ×
 3   |  ?  |  ?  |   ?    | ?
Slide 18: Case 1: Unknown Mean Vectors

For a d-dimensional normal component density with known covariance,

    \ln p(x \mid \mu_i, \Sigma_i) = -\ln\!\left[(2\pi)^{d/2} |\Sigma_i|^{1/2}\right]
        - \frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i)

    \nabla_{\mu_i} \ln p(x \mid \mu_i, \Sigma_i) = \Sigma_i^{-1}(x - \mu_i)

Setting \sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})\, \Sigma_i^{-1}(x_k - \hat{\mu}_i) = 0, with \hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_c)^t, gives

    \hat{\mu}_i = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})\, x_k}
                       {\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})}

where

    P(\omega_i \mid x_k, \hat{\mu}) =
        \frac{p(x_k \mid \omega_i, \hat{\mu}_i)\, P(\omega_i)}
             {\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\mu}_j)\, P(\omega_j)}
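The fixed-point equation for \hat{\mu}_i can be iterated directly. Below is a minimal 1-D sketch (the sample values, priors, and initial means are invented for illustration), assuming unit-variance components with known, equal priors, as in Case 1:

```python
import math

def normal_pdf(x, mu):
    """Unit-variance normal density, the assumed component form."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def update_means(samples, priors, means):
    """One pass of mu_i = sum_k P(w_i|x_k, mu) x_k / sum_k P(w_i|x_k, mu)."""
    new_means = []
    for i in range(len(means)):
        num = den = 0.0
        for x in samples:
            mix = sum(P * normal_pdf(x, m) for P, m in zip(priors, means))
            post = priors[i] * normal_pdf(x, means[i]) / mix  # P(omega_i | x_k, mu-hat)
            num += post * x
            den += post
        new_means.append(num / den)
    return new_means

# Iterate the fixed-point update on two well-separated groups of samples
samples = [-2.1, -1.9, -2.0, 1.9, 2.1, 2.0]
means = [-1.0, 1.0]
for _ in range(50):
    means = update_means(samples, [0.5, 0.5], means)
```

As the slides note for the example that follows, the iteration can converge to different local maxima depending on the initial means.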
Slide 19: Case 1: Unknown Mean Vectors (example)

A two-component example:

    p(x \mid \mu_1, \mu_2) = \frac{1}{3\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\mu_1)^2\right]
                           + \frac{2}{3\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\mu_2)^2\right]
Slide 20: Case 1: Unknown Mean Vectors (example, cont.)

For samples drawn with true means μ_1 = -2, μ_2 = 2, the likelihood has two local maxima:

    \hat{\mu}_1 = -2.130, \quad \hat{\mu}_2 = 1.668

    \hat{\mu}_1 = 2.085, \quad \hat{\mu}_2 = -1.257
Slide 21: Case 2: All Parameters Unknown

The maximum-likelihood estimates satisfy

    \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})

    \hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\, x_k}
                       {\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})}

    \hat{\Sigma}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\,
                           (x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^t}
                          {\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})}

    \hat{P}(\omega_i \mid x_k, \hat{\theta}) =
        \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}
             {\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}
Slide 22: Case 2: All Parameters Unknown (cont.)

For normal component densities, the posterior probability becomes

    \hat{P}(\omega_i \mid x_k, \hat{\theta}) =
        \frac{|\hat{\Sigma}_i|^{-1/2}
              \exp\!\left[-\frac{1}{2}(x_k-\hat{\mu}_i)^t \hat{\Sigma}_i^{-1}(x_k-\hat{\mu}_i)\right]
              \hat{P}(\omega_i)}
             {\sum_{j=1}^{c} |\hat{\Sigma}_j|^{-1/2}
              \exp\!\left[-\frac{1}{2}(x_k-\hat{\mu}_j)^t \hat{\Sigma}_j^{-1}(x_k-\hat{\mu}_j)\right]
              \hat{P}(\omega_j)}
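These Case 2 update equations amount to one iteration of what is now called the EM algorithm for Gaussian mixtures. A minimal 1-D sketch (the sample values and initialization are invented for illustration):

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density N(mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(samples, priors, means, variances):
    """One pass of the Case 2 update equations (1-D)."""
    n, c = len(samples), len(means)
    # Posterior probabilities P-hat(omega_i | x_k, theta-hat)
    post = []
    for x in samples:
        mix = sum(priors[j] * normal_pdf(x, means[j], variances[j]) for j in range(c))
        post.append([priors[i] * normal_pdf(x, means[i], variances[i]) / mix
                     for i in range(c)])
    # Re-estimate priors, means, and variances from the posteriors
    new_priors, new_means, new_vars = [], [], []
    for i in range(c):
        w = sum(post[k][i] for k in range(n))
        new_priors.append(w / n)
        mu = sum(post[k][i] * samples[k] for k in range(n)) / w
        new_means.append(mu)
        new_vars.append(sum(post[k][i] * (samples[k] - mu) ** 2 for k in range(n)) / w)
    return new_priors, new_means, new_vars

samples = [-2.2, -1.9, -2.0, -1.8, 1.8, 2.1, 2.0, 2.2]
priors, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(50):
    priors, means, variances = em_step(samples, priors, means, variances)
```

Each pass computes the posteriors with the current parameters, then re-estimates all parameters from those posteriors, exactly mirroring the coupled equations on slides 21 and 22.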
Slide 23: k-Means Clustering

\hat{P}(\omega_i \mid x_k, \hat{\mu}) is large when the squared Mahalanobis distance (x_k-\hat{\mu}_i)^t \hat{\Sigma}^{-1}(x_k-\hat{\mu}_i) is small. As an approximation, merely compute the squared Euclidean distance \|x_k - \hat{\mu}_i\|^2, find the mean \hat{\mu}_m nearest to x_k, and approximate the posterior as

    \hat{P}(\omega_i \mid x_k, \hat{\mu}) \approx
        \begin{cases} 1 & \text{if } i = m \\ 0 & \text{otherwise} \end{cases}

Then iteratively apply

    \hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\mu})\, x_k}
                       {\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\mu})}
Slide 24: k-Means Clustering (algorithm)

begin initialize n, c, μ_1, μ_2, ..., μ_c
    do classify n samples according to nearest μ_i
       recompute μ_i
    until no change in μ_i
    return μ_1, μ_2, ..., μ_c
end
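The pseudocode above translates almost line for line into code. A 1-D sketch (the sample data and initial means are invented for illustration):

```python
def k_means(samples, means, max_iter=100):
    """k-means: assign each sample to its nearest mean, recompute the means,
    and stop when the means no longer change."""
    for _ in range(max_iter):
        # classify n samples according to nearest mu_i
        clusters = [[] for _ in means]
        for x in samples:
            nearest = min(range(len(means)), key=lambda i: (x - means[i]) ** 2)
            clusters[nearest].append(x)
        # recompute mu_i (keep the old mean if a cluster received no samples)
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # until no change in mu_i
            return new_means
        means = new_means
    return means

centers = k_means([0.0, 0.2, 3.8, 4.0], [0.0, 4.0])
```

The empty-cluster guard is a common safeguard not spelled out in the pseudocode; without it, a cluster that attracts no samples would cause a division by zero.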
Slide 25: k-Means Clustering (complexity)

- The computational complexity is O(ndcT), where d is the dimensionality, c the number of clusters, and T the number of iterations.
- In practice, the number of iterations T is generally much less than the number of samples.
- The values obtained can be accepted as the answer, or used as starting points for more exact computations.
Slide 26: k-Means Clustering (example results)

For the two-component example above, k-means gives

    \hat{\mu}_1 = -2.176, \quad \hat{\mu}_2 = 1.684

compared with the maximum-likelihood estimates

    \hat{\mu}_1 = -2.130, \quad \hat{\mu}_2 = 1.668
Slide 27: k-Means Clustering