Model-Based Co-clustering for Ordinal Data

Julien JACQUES (1) & Christophe BIERNACKI (2)
(1) Université de Lyon, Lumière Lyon 2, ERIC
(2) Université Lille 1 et Inria
Ordinal data?

Definition
An ordinal variable x takes values among m fully ordered modalities:

x ∈ {1, . . . , m} with 1 < . . . < m

Widespread data
- Marketing: customer satisfaction surveys
- Sociology: education levels
- Medicine: pain evaluation
- Many nominal data are . . . ordinal!
Co-clustering?
Simultaneous clustering of the rows (individuals) and of the columns (features).
Overview
The BOS model for ordinal data
The Latent Block Model
Model inference
Numerical experiments
Model parameter estimation accuracy
Ability of ICL-BIC to select the number of co-clusters
Comparison with competitors
BOS(µ, π) model: parameters and properties [1]

- µ: position parameter
  - unique mode if π > 0
  - monotonic decrease around µ
- π: precision parameter
  - p(µ; µ, π) increases with π
  - p(µ; µ, π) − p(x; µ, π) increases with π (x ≠ µ)
  - uniform distribution if π = 0
  - Dirac in µ if π = 1
- identifiability if π ≠ 0

[1] Biernacki & Jacques (2015), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, to appear in Statistics and Computing.
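The properties above can be checked numerically. Below is a minimal sketch of the exact BOS pmf, computed by enumerating the stochastic binary search described in [1]; the rule that an accurate comparison selects the sub-interval closest to µ is our reading of the model, not code from the authors.

```python
def bos_pmf(x, m, mu, pi):
    """P(X = x) under the BOS(mu, pi) model (our reading of [1]):
    a binary search over {1, ..., m} whose comparisons are accurate
    with probability pi and blind (uniform) with probability 1 - pi."""
    def rec(e):
        if len(e) == 1:                      # search has converged
            return 1.0 if e[0] == x else 0.0
        total = 0.0
        for y in e:                          # breakpoint drawn uniformly in e
            parts = [p for p in ([z for z in e if z < y], [y],
                                 [z for z in e if z > y]) if p]
            # accurate choice: the sub-interval closest to mu (our assumption)
            best = min(parts, key=lambda p: min(abs(z - mu) for z in p))
            for p in parts:
                total += (1 / len(e)) * (pi * (p is best)
                                         + (1 - pi) * len(p) / len(e)) * rec(p)
        return total
    return rec(list(range(1, m + 1)))
```

With π = 0 this recursion yields the uniform distribution and with π = 1 a Dirac at µ, matching the properties listed above.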
BOS(µ, π) model: illustration

[Figure: BOS probability distributions plotted for µ ∈ {1, 2, 3} and π ∈ {0, 0.1, 0.2, 0.5}.]
BOS(µ, π) model: inference

Because the BOS(µ, π) model is defined through the latent trajectory of a stochastic binary search, its maximum likelihood estimate is computed with an EM algorithm [1].
Latent Block Model

[Figure: original data and co-clustering result, each block being modeled by a BOS(µkℓ, πkℓ) distribution.]
Latent Block Model

Latent Block Model (LBM)
The n × d random variables x are assumed to be independent once the row partition v = (vik)i,k and the column partition w = (whℓ)h,ℓ are fixed:

p(x; θ) = Σ_{v ∈ V} Σ_{w ∈ W} p(v; θ) p(w; θ) p(x | v, w; θ)

with
- V (resp. W) the set of possible partitions of the rows (resp. columns) into K (resp. L) groups,
- p(v; θ) = ∏_{i,k} αk^{vik} and p(w; θ) = ∏_{h,ℓ} βℓ^{whℓ},
- p(x | v, w; θ) = ∏_{i,h,k,ℓ} p(xih; µkℓ, πkℓ)^{vik whℓ}, where p(·; µkℓ, πkℓ) is the BOS(µkℓ, πkℓ) distribution,
- θ = (µkℓ, πkℓ, αk, βℓ).
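The generative model above can be sketched as a sampler: draw row labels from α, column labels from β, then each entry from its block's BOS distribution. The `bos_draw` helper simulates the stochastic binary search; its accurate-choice rule (nearest sub-interval to µ) is our assumption, and all names are ours.

```python
import random

def bos_draw(m, mu, pi, rng):
    # Simulate one stochastic binary search (our reading of the BOS model).
    e = list(range(1, m + 1))
    while len(e) > 1:
        y = rng.choice(e)                                  # breakpoint
        parts = [p for p in ([z for z in e if z < y], [y],
                             [z for z in e if z > y]) if p]
        if rng.random() < pi:                              # accurate comparison
            e = min(parts, key=lambda p: min(abs(z - mu) for z in p))
        else:                                              # blind comparison
            e = rng.choices(parts, weights=[len(p) for p in parts])[0]
    return e[0]

def lbm_sample(n, d, m, alpha, beta, mu, pi, seed=0):
    """Draw (x, v, w) from the LBM: v ~ alpha, w ~ beta,
    then x_ih ~ BOS(mu[k][l], pi[k][l]) given v_i = k, w_h = l."""
    rng = random.Random(seed)
    v = rng.choices(range(len(alpha)), weights=alpha, k=n)
    w = rng.choices(range(len(beta)), weights=beta, k=d)
    x = [[bos_draw(m, mu[v[i]][w[h]], pi[v[i]][w[h]], rng)
          for h in range(d)] for i in range(n)]
    return x, v, w
```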
LBM inference

The aim is to estimate θ by maximizing the observed log-likelihood

ℓ(θ; x̌) = ln Σ_{x̂} p(x; θ),    (1)

where x̌ denotes the observed part of the data x and x̂ the unobserved one.
- v and w are missing as well
- EM is not computationally tractable
- ⇒ a variational or stochastic version must be used
LBM inference

SEM-Gibbs algorithm for LBM inference
- init: θ(0), w(0)
- SE step
  - generate the row partition v(q+1)ik | x, w(q):

    p(vik = 1 | x, w(q); θ(q)) = αk(q) fk(xi· | w(q); θ(q)) / Σ_{k′} αk′(q) fk′(xi· | w(q); θ(q))

  - generate the column partition w(q+1)hℓ | x, v(q+1):

    p(whℓ = 1 | x, v(q+1); θ(q)) = βℓ(q) gℓ(x·h | v(q+1); θ(q)) / Σ_{ℓ′} βℓ′(q) gℓ′(x·h | v(q+1); θ(q))

- M step
  - estimate θ conditionally on v(q+1) and w(q+1) obtained at the SE step (and on x), using the EM algorithm for BOS inference.
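The row half of the SE step can be sketched as follows. Here `pmf` is any block pmf passed in as a stand-in for the BOS pmf, `toy_pmf` in the usage is a hypothetical sharp distribution, and all names are ours.

```python
import math
import random

def se_step_rows(x, w, alpha, mu, pi, pmf, rng):
    """Sample each row label k with probability proportional to
    alpha_k * prod_h pmf(x[i][h], mu[k][w[h]], pi[k][w[h]])
    (a sketch of the SE step for the rows)."""
    K = len(alpha)
    v = []
    for row in x:
        logp = [math.log(alpha[k])
                + sum(math.log(pmf(xih, mu[k][w[h]], pi[k][w[h]]))
                      for h, xih in enumerate(row))
                for k in range(K)]
        mx = max(logp)                      # log-sum-exp trick for stability
        weights = [math.exp(l - mx) for l in logp]
        v.append(rng.choices(range(K), weights=weights)[0])
    return v
```

Working in log space before exponentiating avoids underflow when d is large, since the per-row product involves d pmf values.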
LBM inference

SEM-Gibbs algorithm for LBM inference (continued)
- θ̂ is obtained as the mean / mode of the sampled distribution (after a burn-in period)
- the final bipartition (v̂, ŵ) is estimated by MAP conditionally on θ̂
LBM inference

Choosing K and L
We propose to adapt the ICL-BIC criterion developed by Keribin et al. (2014) for categorical data co-clustering based on the multinomial distribution. Thus, K and L can be chosen by maximizing

ICL-BIC(K, L) = log p(x, v̂, ŵ; θ̂) − (K − 1)/2 log n − (L − 1)/2 log d − KL/2 log(nd)

Missing data
Within the SEM-Gibbs framework, missing data are easily taken into account by treating them as additional missing random variables to be simulated in the SE step.
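The criterion above is straightforward to compute once the completed log-likelihood is available; a minimal sketch (the log-likelihood value is left as an input):

```python
import math

def icl_bic(log_complete_lik, K, L, n, d):
    """ICL-BIC(K, L) as on the slide (to be maximized): the completed
    log-likelihood penalized for the K - 1 row proportions, the L - 1
    column proportions, and the K * L block parameters."""
    return (log_complete_lik
            - (K - 1) / 2 * math.log(n)
            - (L - 1) / 2 * math.log(d)
            - K * L / 2 * math.log(n * d))
```

For a fixed likelihood value the criterion decreases as K and L grow, so the penalty trades model fit against complexity.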
Experimental setup

- K = L = 3 clusters in rows and columns
- d = 100 ordinal variables with m = 5 levels
- n = 100 observations
- values of (µkℓ, πkℓ):

Setting 1:
k\ℓ    1         2         3
1    (1, 0.9)  (2, 0.9)  (3, 0.9)
2    (4, 0.9)  (5, 0.9)  (1, 0.5)
3    (2, 0.5)  (3, 0.5)  (4, 0.5)

Setting 2:
k\ℓ    1         2         3
1    (1, 0.2)  (2, 0.2)  (3, 0.2)
2    (4, 0.2)  (5, 0.2)  (1, 0.1)
3    (2, 0.1)  (3, 0.1)  (4, 0.1)
Example of data

[Figures: simulated data matrices for Setting 1 and Setting 2.]
How many iterations do we need?

[Figure: SEM-Gibbs sample paths over 50 iterations for µ, π, α, β and for the row and column partitions.]
Accuracy of estimation and co-clustering

[Figure: for Settings 1 and 2, boxplots of the error in parameter estimation (µ, π, α, β) and of the quality of the row and column partitions (ARI).]
Selection of the number of co-clusters

Experimental setup
- experimental setting 1
- ICL-BIC computed for 2 to 4 clusters in rows / columns
- 50 simulations

Results (number of simulations in which each (K, L) is selected):

K\L    2    3    4
2      0    0    0
3      0   46    3
4      0    1    0
Comparison with competitors

Competitors
- R package blockcluster for nominal data
- R package blockcluster for continuous data
- Optimal: the Bayes classifier using the true model parameter values

Results (mean ARI, standard deviation in parentheses):

             Setting 1                      Setting 2
             ARI row        ARI column     ARI row        ARI column
BOS          0.971 (0.117)  0.960 (0.139)  0.581 (0.149)  0.589 (0.171)
bc categ.    1 (0)          0 (0)          0.288 (0.088)  0.018 (0.055)
bc conti.    0.841 (0.290)  0.833 (0.288)  0.421 (0.103)  0.270 (0.110)
Optimal      1 (0)          1 (0)          0.761 (0.087)  0.759 (0.087)
Conclusions

Results
- BOS: a new probability distribution for ordinal data which respects the ordinal scale of the data
- (co-)clustering algorithms have been developed based on the BOS model
- Applications are welcome!
References

[1] Biernacki & Jacques (2015), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, to appear in Statistics and Computing.
[2] Agresti (2010), Analysis of Ordinal Categorical Data, Wiley.
[3] Govaert, G. and Nadif, M. (2013), Co-Clustering, Wiley-ISTE.