Unsupervised Learning and Clustering
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University

Posted on 18-Jan-2016

Transcript

Slide 1: Unsupervised Learning and Clustering (title slide)
Slide 2: Supervised vs. Unsupervised Learning

- Supervised training procedures use samples labeled by their category membership.
- Unsupervised training procedures use unlabeled samples.
Slide 3: Reasons for Interest

- Collecting and labeling a large set of sample patterns can be costly (e.g., speech).
- One can train with a large amount of unlabeled data, then use supervision only to label the groupings found (useful for "data mining" applications).
- For data whose pattern characteristics change slowly over time, tracking the changes in an unsupervised mode can improve performance (e.g., automated food classification as seasons change).
Slide 4: Reasons for Interest (cont.)

- Unsupervised methods can find features that are then useful for categorization: data-dependent "smart preprocessing" or "smart feature extraction".
- Exploratory data analysis can give insight into the nature or structure of the data; the discovery of distinct clusters may suggest altering the approach to designing the classifier.
Slide 5: Basic Assumptions to Begin With

- The samples come from a known number c of classes.
- The prior probabilities P(ω_j) for each class are known.
- The forms of the class-conditional probability densities p(x|ω_j, θ_j) are known.
- The values of the parameter vectors θ_1, ..., θ_c are unknown.
- The category labels are unknown.
Slide 6: Mixing Density

The probability density function for the samples has the mixture form

    p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j),
    \qquad \theta = (\theta_1, \ldots, \theta_c)^t

where the p(x|ω_j, θ_j) are the component densities and the P(ω_j) are the mixing parameters.
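As an illustration, the mixture form above can be evaluated directly. The following sketch (the component means and mixing parameters are invented for illustration) computes p(x|θ) for a two-component univariate Gaussian mixture:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Univariate normal component density N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, priors, means):
    """Mixture density p(x|theta) = sum_j p(x|omega_j, theta_j) P(omega_j)."""
    return sum(P_j * normal_pdf(x, mu_j) for P_j, mu_j in zip(priors, means))

# Hypothetical two-component mixture: P(omega_1) = 0.3, P(omega_2) = 0.7
density_at_zero = mixture_pdf(0.0, [0.3, 0.7], [-2.0, 2.0])
```

The mixing parameters must sum to one for the result to be a proper density.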
Slide 7: Goal and Approach

- Use samples drawn from the mixture density to estimate the unknown parameter vector θ.
- Once θ is known, we can decompose the mixture into its components and use a maximum a posteriori classifier on the derived densities.
Slide 8: Existence of Solutions

- Suppose an unlimited number of samples is available and nonparametric methods can be used.
- If only one value of θ can produce the observed values of p(x|θ), a solution is possible in principle.
- If several different values of θ can produce the same values of p(x|θ), there is no hope of obtaining a unique solution.
Slide 9: Identifiable Density

p(x|θ) is identifiable if θ ≠ θ' implies that there exists at least one x such that

    p(x \mid \theta) \neq p(x \mid \theta')

p(x|θ) is not identifiable if we cannot recover a unique θ, even from an infinite amount of data; it is completely unidentifiable if we cannot infer any of the individual parameters.
Slide 10: An Example of an Unidentifiable Mixture of Discrete Distributions

For binary x,

    P(x \mid \theta) = \frac{1}{2}\,\theta_1^x (1-\theta_1)^{1-x}
                     + \frac{1}{2}\,\theta_2^x (1-\theta_2)^{1-x}
    = \begin{cases}
        \frac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 1 \\[4pt]
        1 - \frac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 0
      \end{cases}

If we observe P(x=1|θ) = 0.6 and P(x=0|θ) = 0.4, we learn only that θ_1 + θ_2 = 1.2; the individual values of θ_1 and θ_2 cannot be inferred.
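A quick numerical check of this unidentifiability (the particular parameter pairs below are invented for illustration): two different (θ_1, θ_2) pairs with the same sum produce identical observable probabilities.

```python
def bernoulli_mixture(x, theta1, theta2):
    """P(x|theta) for binary x: an equal-weight mixture of two Bernoulli components."""
    def component(theta):
        return theta if x == 1 else 1.0 - theta
    return 0.5 * component(theta1) + 0.5 * component(theta2)

# Both pairs sum to 1.2, so both give P(x=1) = 0.6 and P(x=0) = 0.4
p_a = bernoulli_mixture(1, 0.8, 0.4)
p_b = bernoulli_mixture(1, 0.9, 0.3)
```

No amount of data distinguishes the two parameter settings, since every observable probability agrees.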
Slide 11: An Example of an Unidentifiable Mixture of Gaussian Distributions

    p(x \mid \theta) = \frac{P(\omega_1)}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\theta_1)^2\right]
                     + \frac{P(\omega_2)}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\theta_2)^2\right]

When P(ω_1) = P(ω_2), interchanging θ_1 and θ_2 leaves the mixture unchanged, so θ cannot be recovered uniquely.
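The swap symmetry can be verified numerically; in the sketch below (the sample point and parameter values are invented for illustration), swapping θ_1 and θ_2 leaves the density unchanged exactly when the priors are equal.

```python
import math

def gaussian_mixture(x, theta1, theta2, p1=0.5, p2=0.5):
    """p(x|theta) for a mixture of two unit-variance Gaussian components."""
    def g(mu):
        return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)
    return p1 * g(theta1) + p2 * g(theta2)

# With equal priors, (theta1, theta2) and (theta2, theta1) are indistinguishable
same = gaussian_mixture(0.7, -1.0, 2.0) == gaussian_mixture(0.7, 2.0, -1.0)
```

With unequal priors the symmetry is broken, and the mixture becomes identifiable up to this relabeling.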
Slide 12: Maximum-Likelihood Estimates

Let D = {x_1, ..., x_n} be n unlabeled samples drawn independently from the mixture

    p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j)

where the full parameter vector θ is fixed but unknown. The likelihood of the observed samples is

    p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)

and the maximum-likelihood estimate \hat{\theta} is the value of θ that maximizes p(D|θ).
Slide 13: Maximum-Likelihood Estimates (cont.)

The log-likelihood is

    l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)

with gradient

    \nabla_{\theta_i} l = \sum_{k=1}^{n} \frac{1}{p(x_k \mid \theta)}
        \nabla_{\theta_i} \left[ \sum_{j=1}^{c} p(x_k \mid \omega_j, \theta_j)\, P(\omega_j) \right]

Assume that the elements of θ_i and θ_j are functionally independent if i ≠ j, and introduce the posterior probability

    P(\omega_i \mid x_k, \theta) = \frac{p(x_k \mid \omega_i, \theta_i)\, P(\omega_i)}{p(x_k \mid \theta)}
Slide 14: Maximum-Likelihood Estimates (cont.)

The gradient can then be written as

    \nabla_{\theta_i} l = \sum_{k=1}^{n} P(\omega_i \mid x_k, \theta)\,
        \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \theta_i)

so the maximum-likelihood estimate \hat{\theta}_i must satisfy

    \sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\theta})\,
        \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0
Slide 15: Maximum-Likelihood Estimates for Unknown Priors

If the prior probabilities are also unknown, search for the maximum of

    l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)
      = \sum_{k=1}^{n} \ln \left[ \sum_{j=1}^{c} p(x_k \mid \omega_j, \theta_j)\, P(\omega_j) \right]

over θ and the P(ω_i), subject to the constraints

    P(\omega_i) \geq 0, \quad i = 1, \ldots, c, \qquad \sum_{i=1}^{c} P(\omega_i) = 1
Slide 16: Maximum-Likelihood Estimates for Unknown Priors (cont.)

The maximum-likelihood estimates for P(ω_i) and θ_i satisfy

    \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})

    \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\,
        \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0

where

    \hat{P}(\omega_i \mid x_k, \hat{\theta}) =
        \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}
             {\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}
Slide 17: Application to Normal Mixtures

Component densities: p(x|ω_i, θ_i) ~ N(μ_i, Σ_i). Three cases (? = unknown, × = known):

Case | μ_i | Σ_i | P(ω_i) | c
-----|-----|-----|--------|---
 1   |  ?  |  ×  |   ×    | ×
 2   |  ?  |  ?  |   ?    | ×
 3   |  ?  |  ?  |   ?    | ?
Slide 18: Case 1: Unknown Mean Vectors

For a d-dimensional normal component density with known covariance,

    \ln p(x \mid \mu_i, \Sigma_i) = -\ln\!\left[(2\pi)^{d/2} |\Sigma_i|^{1/2}\right]
        - \frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i)

    \nabla_{\mu_i} \ln p(x \mid \mu_i, \Sigma_i) = \Sigma_i^{-1}(x - \mu_i)

Setting \sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})\, \Sigma_i^{-1}(x_k - \hat{\mu}_i) = 0, with \hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_c)^t, gives

    \hat{\mu}_i = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})\, x_k}
                       {\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})}

where

    P(\omega_i \mid x_k, \hat{\mu}) =
        \frac{p(x_k \mid \omega_i, \hat{\mu}_i)\, P(\omega_i)}
             {\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\mu}_j)\, P(\omega_j)}
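The fixed-point equation for \hat{\mu}_i can be iterated directly. Below is a minimal 1-D sketch (the sample values, priors, and initial means are invented for illustration), assuming unit-variance components with known, equal priors, as in Case 1:

```python
import math

def normal_pdf(x, mu):
    """Unit-variance normal density, the assumed component form."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def update_means(samples, priors, means):
    """One pass of mu_i = sum_k P(w_i|x_k, mu) x_k / sum_k P(w_i|x_k, mu)."""
    new_means = []
    for i in range(len(means)):
        num = den = 0.0
        for x in samples:
            mix = sum(P * normal_pdf(x, m) for P, m in zip(priors, means))
            post = priors[i] * normal_pdf(x, means[i]) / mix  # P(omega_i | x_k, mu-hat)
            num += post * x
            den += post
        new_means.append(num / den)
    return new_means

# Iterate the fixed-point update on two well-separated groups of samples
samples = [-2.1, -1.9, -2.0, 1.9, 2.1, 2.0]
means = [-1.0, 1.0]
for _ in range(50):
    means = update_means(samples, [0.5, 0.5], means)
```

As the slides note for the example that follows, the iteration can converge to different local maxima depending on the initial means.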
Slide 19: Case 1: Unknown Mean Vectors (example)

A two-component example:

    p(x \mid \mu_1, \mu_2) = \frac{1}{3\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\mu_1)^2\right]
                           + \frac{2}{3\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(x-\mu_2)^2\right]
Slide 20: Case 1: Unknown Mean Vectors (example, cont.)

For samples drawn with true means μ_1 = -2, μ_2 = 2, the likelihood has two local maxima:

    \hat{\mu}_1 = -2.130, \quad \hat{\mu}_2 = 1.668

    \hat{\mu}_1 = 2.085, \quad \hat{\mu}_2 = -1.257
Slide 21: Case 2: All Parameters Unknown

The maximum-likelihood estimates satisfy

    \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})

    \hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\, x_k}
                       {\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})}

    \hat{\Sigma}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\,
                           (x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^t}
                          {\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})}

    \hat{P}(\omega_i \mid x_k, \hat{\theta}) =
        \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}
             {\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}
Slide 22: Case 2: All Parameters Unknown (cont.)

For normal component densities, the posterior probability becomes

    \hat{P}(\omega_i \mid x_k, \hat{\theta}) =
        \frac{|\hat{\Sigma}_i|^{-1/2}
              \exp\!\left[-\frac{1}{2}(x_k-\hat{\mu}_i)^t \hat{\Sigma}_i^{-1}(x_k-\hat{\mu}_i)\right]
              \hat{P}(\omega_i)}
             {\sum_{j=1}^{c} |\hat{\Sigma}_j|^{-1/2}
              \exp\!\left[-\frac{1}{2}(x_k-\hat{\mu}_j)^t \hat{\Sigma}_j^{-1}(x_k-\hat{\mu}_j)\right]
              \hat{P}(\omega_j)}
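These Case 2 update equations amount to one iteration of what is now called the EM algorithm for Gaussian mixtures. A minimal 1-D sketch (the sample values and initialization are invented for illustration):

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density N(mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(samples, priors, means, variances):
    """One pass of the Case 2 update equations (1-D)."""
    n, c = len(samples), len(means)
    # Posterior probabilities P-hat(omega_i | x_k, theta-hat)
    post = []
    for x in samples:
        mix = sum(priors[j] * normal_pdf(x, means[j], variances[j]) for j in range(c))
        post.append([priors[i] * normal_pdf(x, means[i], variances[i]) / mix
                     for i in range(c)])
    # Re-estimate priors, means, and variances from the posteriors
    new_priors, new_means, new_vars = [], [], []
    for i in range(c):
        w = sum(post[k][i] for k in range(n))
        new_priors.append(w / n)
        mu = sum(post[k][i] * samples[k] for k in range(n)) / w
        new_means.append(mu)
        new_vars.append(sum(post[k][i] * (samples[k] - mu) ** 2 for k in range(n)) / w)
    return new_priors, new_means, new_vars

samples = [-2.2, -1.9, -2.0, -1.8, 1.8, 2.1, 2.0, 2.2]
priors, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(50):
    priors, means, variances = em_step(samples, priors, means, variances)
```

Each pass computes the posteriors with the current parameters, then re-estimates all parameters from those posteriors, exactly mirroring the coupled equations on slides 21 and 22.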
Slide 23: k-Means Clustering

\hat{P}(\omega_i \mid x_k, \hat{\mu}) is large when the squared Mahalanobis distance (x_k-\hat{\mu}_i)^t \hat{\Sigma}^{-1}(x_k-\hat{\mu}_i) is small. As an approximation, merely compute the squared Euclidean distance \|x_k - \hat{\mu}_i\|^2, find the mean \hat{\mu}_m nearest to x_k, and approximate the posterior as

    \hat{P}(\omega_i \mid x_k, \hat{\mu}) \approx
        \begin{cases} 1 & \text{if } i = m \\ 0 & \text{otherwise} \end{cases}

Then iteratively apply

    \hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\mu})\, x_k}
                       {\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\mu})}
Slide 24: k-Means Clustering (algorithm)

begin initialize n, c, μ_1, μ_2, ..., μ_c
    do classify n samples according to nearest μ_i
       recompute μ_i
    until no change in μ_i
    return μ_1, μ_2, ..., μ_c
end
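The pseudocode above translates almost line for line into code. A 1-D sketch (the sample data and initial means are invented for illustration):

```python
def k_means(samples, means, max_iter=100):
    """k-means: assign each sample to its nearest mean, recompute the means,
    and stop when the means no longer change."""
    for _ in range(max_iter):
        # classify n samples according to nearest mu_i
        clusters = [[] for _ in means]
        for x in samples:
            nearest = min(range(len(means)), key=lambda i: (x - means[i]) ** 2)
            clusters[nearest].append(x)
        # recompute mu_i (keep the old mean if a cluster received no samples)
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # until no change in mu_i
            return new_means
        means = new_means
    return means

centers = k_means([0.0, 0.2, 3.8, 4.0], [0.0, 4.0])
```

The empty-cluster guard is a common safeguard not spelled out in the pseudocode; without it, a cluster that attracts no samples would cause a division by zero.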
Slide 25: k-Means Clustering (complexity)

- The computational complexity is O(ndcT), where d is the dimensionality, c the number of clusters, and T the number of iterations.
- In practice, the number of iterations T is generally much less than the number of samples.
- The values obtained can be accepted as the answer, or used as starting points for more exact computations.
Slide 26: k-Means Clustering (example results)

For the two-component example above, k-means gives

    \hat{\mu}_1 = -2.176, \quad \hat{\mu}_2 = 1.684

compared with the maximum-likelihood estimates

    \hat{\mu}_1 = -2.130, \quad \hat{\mu}_2 = 1.668
Slide 27: k-Means Clustering