Model-Based Co-clustering for Ordinal Data

Julien JACQUES (1) & Christophe BIERNACKI (2)
(1) Université de Lyon, Lumière Lyon 2, ERIC
(2) Université Lille 1 et Inria
Ordinal data?

Definition
An ordinal variable x takes values among m fully ordered modalities:

x ∈ {1, . . . , m} with 1 < . . . < m

Widespread data
- Marketing: customer satisfaction surveys
- Sociology: education levels
- Medicine: pain evaluation
- Many nominal data are . . . ordinal!
Co-clustering?
Simultaneous clustering of the rows (individuals) and of the columns (features).
Overview
The BOS model for ordinal data
The Latent Block Model
Model inference
Numerical experiments
Model parameter estimation accuracy
Ability of ICL-BIC to select the number of co-clusters
Comparison with competitors
BOS(µ, π) model: parameters and properties [1]

- µ: position parameter
  - unique mode if π > 0
  - monotonic decrease around µ
- π: precision parameter
  - p(µ; µ, π) increases with π
  - p(µ; µ, π) − p(x; µ, π) increases with π (x ≠ µ)
  - uniform distribution if π = 0
  - Dirac in µ if π = 1
- identifiability if π ≠ 0

[1] Biernacki & Jacques (2015), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, to appear in Statistics and Computing.
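The properties above can be checked numerically. Below is a minimal sketch of the exact BOS pmf, computed by enumerating the stochastic binary search described in [1]; the rule that an accurate comparison selects the sub-interval closest to µ is our reading of the model, not code from the authors.

```python
def bos_pmf(x, m, mu, pi):
    """P(X = x) under the BOS(mu, pi) model (our reading of [1]):
    a binary search over {1, ..., m} whose comparisons are accurate
    with probability pi and blind (uniform) with probability 1 - pi."""
    def rec(e):
        if len(e) == 1:                      # search has converged
            return 1.0 if e[0] == x else 0.0
        total = 0.0
        for y in e:                          # breakpoint drawn uniformly in e
            parts = [p for p in ([z for z in e if z < y], [y],
                                 [z for z in e if z > y]) if p]
            # accurate choice: the sub-interval closest to mu (our assumption)
            best = min(parts, key=lambda p: min(abs(z - mu) for z in p))
            for p in parts:
                total += (1 / len(e)) * (pi * (p is best)
                                         + (1 - pi) * len(p) / len(e)) * rec(p)
        return total
    return rec(list(range(1, m + 1)))
```

With π = 0 this recursion yields the uniform distribution and with π = 1 a Dirac at µ, matching the properties listed above.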
BOS(µ, π) model: illustration

[Figure: BOS probability distributions plotted for µ ∈ {1, 2, 3} and π ∈ {0, 0.1, 0.2, 0.5}.]
BOS(µ, π) model: inference

Because the BOS(µ, π) model is defined through the latent trajectory of a stochastic binary search, its maximum likelihood estimate is computed with an EM algorithm [1].
Latent Block Model

[Figure: original data and co-clustering result, each block being modeled by a BOS(µkℓ, πkℓ) distribution.]
Latent Block Model

Latent Block Model (LBM)
The n × d random variables x are assumed to be independent once the row partition v = (vik)i,k and the column partition w = (whℓ)h,ℓ are fixed:

p(x; θ) = Σ_{v ∈ V} Σ_{w ∈ W} p(v; θ) p(w; θ) p(x | v, w; θ)

with
- V (resp. W) the set of possible partitions of the rows (resp. columns) into K (resp. L) groups,
- p(v; θ) = ∏_{i,k} αk^{vik} and p(w; θ) = ∏_{h,ℓ} βℓ^{whℓ},
- p(x | v, w; θ) = ∏_{i,h,k,ℓ} p(xih; µkℓ, πkℓ)^{vik whℓ}, where p(·; µkℓ, πkℓ) is the BOS(µkℓ, πkℓ) distribution,
- θ = (µkℓ, πkℓ, αk, βℓ).
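The generative model above can be sketched as a sampler: draw row labels from α, column labels from β, then each entry from its block's BOS distribution. The `bos_draw` helper simulates the stochastic binary search; its accurate-choice rule (nearest sub-interval to µ) is our assumption, and all names are ours.

```python
import random

def bos_draw(m, mu, pi, rng):
    # Simulate one stochastic binary search (our reading of the BOS model).
    e = list(range(1, m + 1))
    while len(e) > 1:
        y = rng.choice(e)                                  # breakpoint
        parts = [p for p in ([z for z in e if z < y], [y],
                             [z for z in e if z > y]) if p]
        if rng.random() < pi:                              # accurate comparison
            e = min(parts, key=lambda p: min(abs(z - mu) for z in p))
        else:                                              # blind comparison
            e = rng.choices(parts, weights=[len(p) for p in parts])[0]
    return e[0]

def lbm_sample(n, d, m, alpha, beta, mu, pi, seed=0):
    """Draw (x, v, w) from the LBM: v ~ alpha, w ~ beta,
    then x_ih ~ BOS(mu[k][l], pi[k][l]) given v_i = k, w_h = l."""
    rng = random.Random(seed)
    v = rng.choices(range(len(alpha)), weights=alpha, k=n)
    w = rng.choices(range(len(beta)), weights=beta, k=d)
    x = [[bos_draw(m, mu[v[i]][w[h]], pi[v[i]][w[h]], rng)
          for h in range(d)] for i in range(n)]
    return x, v, w
```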
LBM inference

The aim is to estimate θ by maximizing the observed log-likelihood

ℓ(θ; x̌) = ln Σ_{x̂} p(x; θ),    (1)

where x̌ denotes the observed part of the data x and x̂ the unobserved one.
- v and w are missing as well
- EM is not computationally tractable
- ⇒ a variational or stochastic version must be used
LBM inference

SEM-Gibbs algorithm for LBM inference
- init: θ(0), w(0)
- SE step
  - generate the row partition v(q+1)ik | x, w(q):

    p(vik = 1 | x, w(q); θ(q)) = αk(q) fk(xi· | w(q); θ(q)) / Σ_{k′} αk′(q) fk′(xi· | w(q); θ(q))

  - generate the column partition w(q+1)hℓ | x, v(q+1):

    p(whℓ = 1 | x, v(q+1); θ(q)) = βℓ(q) gℓ(x·h | v(q+1); θ(q)) / Σ_{ℓ′} βℓ′(q) gℓ′(x·h | v(q+1); θ(q))

- M step
  - estimate θ conditionally on v(q+1) and w(q+1) obtained at the SE step (and on x), using the EM algorithm for BOS inference.
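The row half of the SE step can be sketched as follows. Here `pmf` is any block pmf passed in as a stand-in for the BOS pmf, `toy_pmf` in the usage is a hypothetical sharp distribution, and all names are ours.

```python
import math
import random

def se_step_rows(x, w, alpha, mu, pi, pmf, rng):
    """Sample each row label k with probability proportional to
    alpha_k * prod_h pmf(x[i][h], mu[k][w[h]], pi[k][w[h]])
    (a sketch of the SE step for the rows)."""
    K = len(alpha)
    v = []
    for row in x:
        logp = [math.log(alpha[k])
                + sum(math.log(pmf(xih, mu[k][w[h]], pi[k][w[h]]))
                      for h, xih in enumerate(row))
                for k in range(K)]
        mx = max(logp)                      # log-sum-exp trick for stability
        weights = [math.exp(l - mx) for l in logp]
        v.append(rng.choices(range(K), weights=weights)[0])
    return v
```

Working in log space before exponentiating avoids underflow when d is large, since the per-row product involves d pmf values.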
LBM inference

SEM-Gibbs algorithm for LBM inference (continued)
- θ̂ is obtained as the mean / mode of the sampled distribution (after a burn-in period)
- the final bipartition (v̂, ŵ) is estimated by MAP conditionally on θ̂
LBM inference

Choosing K and L
We propose to adapt the ICL-BIC criterion developed by Keribin et al. (2014) for categorical data co-clustering based on the multinomial distribution. Thus, K and L can be chosen by maximizing

ICL-BIC(K, L) = log p(x, v̂, ŵ; θ̂) − (K − 1)/2 log n − (L − 1)/2 log d − KL/2 log(nd)

Missing data
Within the SEM-Gibbs framework, missing data are easily taken into account by treating them as additional missing random variables to be simulated in the SE step.
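The criterion above is straightforward to compute once the completed log-likelihood is available; a minimal sketch (the log-likelihood value is left as an input):

```python
import math

def icl_bic(log_complete_lik, K, L, n, d):
    """ICL-BIC(K, L) as on the slide (to be maximized): the completed
    log-likelihood penalized for the K - 1 row proportions, the L - 1
    column proportions, and the K * L block parameters."""
    return (log_complete_lik
            - (K - 1) / 2 * math.log(n)
            - (L - 1) / 2 * math.log(d)
            - K * L / 2 * math.log(n * d))
```

For a fixed likelihood value the criterion decreases as K and L grow, so the penalty trades model fit against complexity.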
Experimental setup

- K = L = 3 clusters in rows and columns
- d = 100 ordinal variables with m = 5 levels
- n = 100 observations
- values of (µkℓ, πkℓ):

Setting 1:
k\ℓ    1         2         3
1    (1, 0.9)  (2, 0.9)  (3, 0.9)
2    (4, 0.9)  (5, 0.9)  (1, 0.5)
3    (2, 0.5)  (3, 0.5)  (4, 0.5)

Setting 2:
k\ℓ    1         2         3
1    (1, 0.2)  (2, 0.2)  (3, 0.2)
2    (4, 0.2)  (5, 0.2)  (1, 0.1)
3    (2, 0.1)  (3, 0.1)  (4, 0.1)
Example of data

[Figures: simulated data matrices for Setting 1 and Setting 2.]
How many iterations do we need?

[Figure: SEM-Gibbs sample paths over 50 iterations for µ, π, α, β and for the row and column partitions.]
Accuracy of estimation and co-clustering

[Figure: for Settings 1 and 2, boxplots of the error in parameter estimation (µ, π, α, β) and of the quality of the row and column partitions (ARI).]
Selection of the number of co-clusters

Experimental setup
- experimental setting 1
- ICL-BIC computed for 2 to 4 clusters in rows / columns
- 50 simulations

Results (number of simulations in which each (K, L) is selected):

K\L    2    3    4
2      0    0    0
3      0   46    3
4      0    1    0
Comparison with competitors

Competitors
- R package blockcluster for nominal data
- R package blockcluster for continuous data
- Optimal: the Bayes classifier using the true model parameter values

Results (mean ARI, standard deviation in parentheses):

             Setting 1                      Setting 2
             ARI row        ARI column     ARI row        ARI column
BOS          0.971 (0.117)  0.960 (0.139)  0.581 (0.149)  0.589 (0.171)
bc categ.    1 (0)          0 (0)          0.288 (0.088)  0.018 (0.055)
bc conti.    0.841 (0.290)  0.833 (0.288)  0.421 (0.103)  0.270 (0.110)
Optimal      1 (0)          1 (0)          0.761 (0.087)  0.759 (0.087)
Conclusions

Results
- BOS: a new probability distribution for ordinal data which respects the ordinal scale of the data
- (co-)clustering algorithms have been developed based on the BOS model
- Applications are welcome!
References

[1] Biernacki & Jacques (2015), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, to appear in Statistics and Computing.
[2] Agresti (2010), Analysis of Ordinal Categorical Data, Wiley.
[3] Govaert, G. and Nadif, M. (2013), Co-Clustering, Wiley-ISTE.