Model-Based Co-clustering for Ordinal Data

Julien JACQUES (1) & Christophe BIERNACKI (2)

(1) Université de Lyon, Lumière Lyon 2, ERIC
(2) Université Lille 1 and Inria

Model-Based Co-clustering for Ordinal Data · Overview The BOS model for ordinal data The Latent Block Model Model inference Numerical experiments Model parameter estimation accuracy

Aug 18, 2019


Ordinal data?

Definition: an ordinal variable µ takes values among m totally ordered modalities

µ ∈ {1, . . . , m} with 1 < . . . < m

Widespread data:
• Marketing: customer satisfaction surveys
• Sociology: education levels
• Medicine: pain evaluation
• Many nominal data are . . . ordinal!


Co-clustering?

Simultaneous clustering of rows (individuals) and columns (features)


Overview

The BOS model for ordinal data

The Latent Block Model

Model inference

Numerical experiments

Model parameter estimation accuracy

Ability of ICL-BIC to select the number of co-clusters

Comparison with competitors


BOS(µ, π) model: parameters and properties [1]

• µ: position parameter (unique mode if π > 0)
• monotonic decrease around µ
• π: precision parameter:
    • p(µ; µ, π) increases with π
    • p(µ; µ, π) − p(x; µ, π) increases with π (x ≠ µ)
    • uniform distribution if π = 0
    • Dirac in µ if π = 1
• identifiability (if π > 0)

[1] Biernacki & Jacques (2015), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, to appear in Statistics and Computing.
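
The properties listed on this slide can be checked numerically. The sketch below is one reading of the stochastic binary search construction described in [1], not code from the authors, and should be checked against that paper: at each step a breakpoint is drawn uniformly in the current interval, the interval is split into the part below the breakpoint, the breakpoint itself and the part above, and with probability π the part closest to µ is kept, otherwise a part is kept with probability proportional to its size. The function name `bos_pmf` and its argument order are hypothetical.

```python
from functools import lru_cache


def bos_pmf(x, mu, pi, m):
    """P(X = x) for X ~ BOS(mu, pi) on {1, ..., m}, by exact enumeration.

    Assumed construction (to be checked against [1]): a stochastic binary
    search that shrinks the interval {1, ..., m} down to a single value.
    """

    def distance(lo, hi):
        # Distance from mu to the interval {lo, ..., hi} (0 if mu is inside).
        if lo <= mu <= hi:
            return 0
        return min(abs(mu - lo), abs(mu - hi))

    @lru_cache(maxsize=None)
    def prob(lo, hi):
        # Probability that the search ends at x when started from {lo, ..., hi}.
        if lo == hi:
            return 1.0 if lo == x else 0.0
        size = hi - lo + 1
        total = 0.0
        for y in range(lo, hi + 1):  # breakpoint, uniform on the interval
            parts = [(a, b) for a, b in ((lo, y - 1), (y, y), (y + 1, hi)) if a <= b]
            # Blind comparison: a part is kept with probability proportional to its size.
            blind = sum((b - a + 1) / size * prob(a, b) for a, b in parts if a <= x <= b)
            # Accurate comparison: the part closest to mu is kept.
            best = min(parts, key=lambda part: distance(*part))
            accurate = prob(*best) if best[0] <= x <= best[1] else 0.0
            total += (pi * accurate + (1.0 - pi) * blind) / size
        return total

    return prob(1, m)
```

For instance, `[bos_pmf(x, 1, 0.5, 3) for x in (1, 2, 3)]` puts most of its mass on x = 1, while π = 0 yields the uniform distribution and π = 1 a point mass at µ, in line with the bullets above.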


BOS(µ, π) model: illustration

[Figure: BOS(µ, π) probability distributions, one panel per combination of µ ∈ {1, 2, 3} and π ∈ {0, 0.1, 0.2, 0.5}; vertical axis from 0 to 1]


BOS(µ, π) model: inference

Because the BOS(µ, π) model is built on a latent stochastic binary search [1], its parameters are estimated by maximum likelihood with an EM algorithm.


Latent Block Model

Latent Block Model (LBM)

[Figure: original data and co-clustering result; each block (k, ℓ) is labelled BOS(µ_kℓ, π_kℓ)]


Latent Block Model

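Latent Block Model (LBM)

The n × d random variables x are assumed to be independent once the row partition $v = (v_{ik})_{i,k}$ and the column partition $w = (w_{h\ell})_{h,\ell}$ are fixed:

$$p(x;\theta) = \sum_{v \in V} \sum_{w \in W} p(v;\theta)\, p(w;\theta)\, p(x \mid v, w;\theta)$$

with
• $V$ ($W$): set of possible partitions of the rows (columns) into $K$ ($L$) groups,
• $p(v;\theta) = \prod_{i,k} \alpha_k^{v_{ik}}$ and $p(w;\theta) = \prod_{h,\ell} \beta_\ell^{w_{h\ell}}$,
• $p(x \mid v, w;\theta) = \prod_{i,h,k,\ell} p(x_{ih};\mu_{k\ell},\pi_{k\ell})^{v_{ik} w_{h\ell}}$, where $p(x_{ih};\mu_{k\ell},\pi_{k\ell})$ is the BOS($\mu_{k\ell},\pi_{k\ell}$) probability of $x_{ih}$,
• $\theta = (\pi_{k\ell}, \mu_{k\ell}, \alpha_k, \beta_\ell)$.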
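
For a fixed pair of partitions, the three factors of the model multiply into the complete-data likelihood, which is straightforward to evaluate. The following is a minimal sketch under conventions that are assumptions, not taken from the slides: partitions are stored as 0-based label lists rather than indicator matrices, and `block_pmf[k][l][x - 1]` is a hypothetical lookup table holding p(x; µ_kℓ, π_kℓ).

```python
import math


def complete_log_likelihood(x, row_labels, col_labels, alpha, beta, block_pmf):
    """log p(v; theta) + log p(w; theta) + log p(x | v, w; theta)."""
    log_lik = sum(math.log(alpha[k]) for k in row_labels)
    log_lik += sum(math.log(beta[l]) for l in col_labels)
    for i, k in enumerate(row_labels):
        for h, l in enumerate(col_labels):
            log_lik += math.log(block_pmf[k][l][x[i][h] - 1])
    return log_lik


# Toy example: 2 x 2 data, one row cluster, two column clusters, m = 2 levels.
x = [[1, 2], [1, 2]]
block_pmf = [[[0.8, 0.2], [0.3, 0.7]]]
value = complete_log_likelihood(x, [0, 0], [0, 1], [1.0], [0.5, 0.5], block_pmf)
```

The full likelihood p(x; θ) would sum the exponential of this quantity over all pairs of partitions, which is what makes direct maximization impractical.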


LBM inference

• The aim is to estimate θ by maximizing the observed log-likelihood

$$\ell(\theta;\check{x}) = \sum_{\hat{x}} \ln p(x;\theta) \qquad (1)$$

  where $\check{x}$ denotes the observed data and $\hat{x}$ the unobserved data
• v and w are missing
• EM is not computationally tractable
• ⇒ a variational or stochastic version should be used
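
To see why plain EM is out of reach, note that the Latent Block Model likelihood sums over every pair of row and column label assignments, that is K^n × L^d terms. A quick check with the sizes used later in the experiments (n = d = 100, K = L = 3):

```python
# Number of (row assignment, column assignment) pairs summed over in p(x; theta).
n, d = 100, 100  # rows and columns
K, L = 3, 3      # row and column clusters

n_terms = K**n * L**d
print(f"{n_terms:.3e}")  # on the order of 1e95 terms
```

Even at this modest size the sum cannot be enumerated, which motivates the stochastic alternative.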


LBM inference

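SEM-Gibbs algorithm for LBM inference

• init: $\theta^{(0)}$, $w^{(0)}$
• SE step
    • generate the row partition $v_{ik}^{(q+1)} \mid x, w^{(q)}$ with

$$p(v_{ik} = 1 \mid x, w^{(q)}; \theta^{(q)}) = \frac{\alpha_k^{(q)} f_k(x_{i\cdot}^{(q)} \mid w^{(q)}; \theta^{(q)})}{\sum_{k'} \alpha_{k'}^{(q)} f_{k'}(x_{i\cdot}^{(q)} \mid w^{(q)}; \theta^{(q)})}$$

    • generate the column partition $w_{h\ell}^{(q+1)} \mid x, v^{(q+1)}$ with

$$p(w_{h\ell} = 1 \mid x, v^{(q+1)}; \theta^{(q)}) = \frac{\beta_\ell^{(q)} g_\ell(x_{\cdot h}^{(q)} \mid v^{(q+1)}; \theta^{(q)})}{\sum_{\ell'} \beta_{\ell'}^{(q)} g_{\ell'}(x_{\cdot h}^{(q)} \mid v^{(q+1)}; \theta^{(q)})}$$

• M step
    • estimate θ, conditionally on $v^{(q+1)}, w^{(q+1)}$ obtained at the SE step (and also conditionally on x), using the EM algorithm for BOS inference.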
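
As an illustration of the row half of the SE step, the sketch below computes the posterior probabilities p(v_ik = 1 | x, w; θ) for one row and draws a label from them. It assumes, consistently with the conditional independence of the LBM but not stated on the slide, that f_k(x_i· | w; θ) is the product over columns h of p(x_ih; µ_kℓ, π_kℓ), with ℓ the cluster of column h. The names `row_posterior` and `sample_row_label` and the lookup table `block_pmf[k][l][x - 1]` are hypothetical.

```python
import math
import random


def row_posterior(x_row, col_labels, alpha, block_pmf):
    """Posterior probabilities of the K row clusters for one data row."""
    log_weights = []
    for k, alpha_k in enumerate(alpha):
        log_w = math.log(alpha_k)
        for h, l in enumerate(col_labels):
            log_w += math.log(block_pmf[k][l][x_row[h] - 1])
        log_weights.append(log_w)
    # Normalise in log space to avoid underflow when d is large.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def sample_row_label(x_row, col_labels, alpha, block_pmf, rng=random):
    """Draw one row label (0-based) from its conditional distribution."""
    probs = row_posterior(x_row, col_labels, alpha, block_pmf)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The column half is symmetric: swap the roles of rows and columns, and of α and β. Probabilities of exactly zero in `block_pmf` would need special handling before taking logarithms.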


LBM inference

SEM-Gibbs algorithm for LBM inference

• θ is estimated by the mean / mode of the sample distribution (after a burn-in period)
• the final bipartition (v, w) is estimated by MAP conditionally on θ
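
A minimal sketch of this post-processing for a single block, assuming (the slide does not say which summary goes with which parameter) that the mean is used for the continuous parameter π and the mode for the discrete parameter µ. The chains and the burn-in length below are made-up illustrative values.

```python
from collections import Counter
from statistics import mean

burn_in = 3  # number of initial SEM-Gibbs iterations to discard (illustrative)

# Hypothetical sampled values of mu and pi for one block, one per iteration.
mu_chain = [1, 3, 2, 2, 2, 3, 2, 2]
pi_chain = [0.10, 0.35, 0.52, 0.48, 0.50, 0.47, 0.53, 0.50]

mu_hat = Counter(mu_chain[burn_in:]).most_common(1)[0][0]  # mode after burn-in
pi_hat = mean(pi_chain[burn_in:])                          # mean after burn-in
```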


LBM inference

Choosing K and L

We propose to adapt the ICL-BIC criterion developed in (Keribin et al. 2014) for categorical data co-clustering based on the multinomial distribution. Thus, K and L can be chosen by maximizing

$$\text{ICL-BIC}(K, L) = \log p(x, v, w; \theta) - \frac{K-1}{2}\log n - \frac{L-1}{2}\log d - \frac{KL}{2}\log(nd)$$

Missing data

Within the SEM-Gibbs framework, missing data can easily be taken into account by treating them as additional missing random variables to be simulated in the SE step.
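
The ICL-BIC criterion is cheap to evaluate once the complete-data log-likelihood is available. The sketch below implements the formula as written on the slide and picks the best (K, L) from a small grid; the function name and the log-likelihood values in the example are invented for illustration.

```python
import math


def icl_bic(complete_log_lik, K, L, n, d):
    """ICL-BIC(K, L) as given on the slide; larger is better."""
    return (
        complete_log_lik
        - (K - 1) / 2 * math.log(n)
        - (L - 1) / 2 * math.log(d)
        - K * L / 2 * math.log(n * d)
    )


# Made-up complete-data log-likelihoods for a few candidate (K, L) pairs.
candidates = {(2, 2): -9800.0, (3, 3): -9100.0, (4, 4): -9080.0}
n, d = 100, 100
scores = {kl: icl_bic(ll, *kl, n, d) for kl, ll in candidates.items()}
best_K, best_L = max(scores, key=scores.get)
```

In this toy grid the (4, 4) model has the highest log-likelihood, but its larger penalty makes (3, 3) the selected model.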


Experimental setup

• K = L = 3 clusters in rows and columns
• d = 100 ordinal variables with m = 5 levels
• n = 100 observations
• values of (µ_kℓ, π_kℓ):

Setting 1
k \ ℓ   1         2         3
1       (1,0.9)   (2,0.9)   (3,0.9)
2       (4,0.9)   (5,0.9)   (1,0.5)
3       (2,0.5)   (3,0.5)   (4,0.5)

Setting 2
k \ ℓ   1         2         3
1       (1,0.2)   (2,0.2)   (3,0.2)
2       (4,0.2)   (5,0.2)   (1,0.1)
3       (2,0.1)   (3,0.1)   (4,0.1)
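
Data such as those of the two settings can be simulated by running the stochastic binary search directly. The sketch below is an illustrative assumption, not the authors' code: the search follows the construction of [1] as understood here (uniform breakpoint, then with probability π keep the part closest to µ, otherwise keep a part with probability proportional to its size), and cluster labels are drawn with equal proportions, which the slide does not specify. All function names are hypothetical.

```python
import random


def _distance(part, mu):
    # Distance from mu to the interval part = (lo, hi); 0 if mu lies inside.
    lo, hi = part
    return 0 if lo <= mu <= hi else min(abs(mu - lo), abs(mu - hi))


def sample_bos(mu, pi, m, rng):
    """Draw one value from BOS(mu, pi) on {1, ..., m} (assumed construction)."""
    lo, hi = 1, m
    while lo < hi:
        y = rng.randint(lo, hi)  # breakpoint, uniform on the current interval
        parts = [(a, b) for a, b in ((lo, y - 1), (y, y), (y + 1, hi)) if a <= b]
        if rng.random() < pi:
            # Accurate comparison: keep the part closest to mu.
            lo, hi = min(parts, key=lambda part: _distance(part, mu))
        else:
            # Blind comparison: keep a part with probability proportional to its size.
            lo, hi = rng.choices(parts, weights=[b - a + 1 for a, b in parts], k=1)[0]
    return lo


def simulate(params, n, d, m, rng):
    """Simulate an n x d ordinal matrix from a K x L table of (mu, pi) pairs."""
    K, L = len(params), len(params[0])
    rows = [rng.randrange(K) for _ in range(n)]  # equal proportions (assumption)
    cols = [rng.randrange(L) for _ in range(d)]
    x = [[sample_bos(*params[k][l], m, rng) for l in cols] for k in rows]
    return x, rows, cols


setting_1 = [
    [(1, 0.9), (2, 0.9), (3, 0.9)],
    [(4, 0.9), (5, 0.9), (1, 0.5)],
    [(2, 0.5), (3, 0.5), (4, 0.5)],
]
x, rows, cols = simulate(setting_1, n=100, d=100, m=5, rng=random.Random(0))
```

Replacing `setting_1` with the Setting 2 table gives much noisier data, since π is at most 0.2 there.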


Example of data

[Figure: example data matrices for Setting 1 and Setting 2]


How many iterations do we need?

[Figure: six panels showing mu, pi, alpha, beta, the row partition and the column partition against the iteration number (0 to 50)]


Accuracy of estimation and co-clustering

[Figure, one pair of panels per setting (Setting 1, Setting 2): error in parameter estimation for mu, pi, alpha and beta (axis from 0 to 2.5), and quality of the row and column partitions measured by the ARI (axis from 0 to 1)]
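
Partition quality in these experiments is measured with the ARI (adjusted Rand index), which equals 1 for identical partitions and is close to 0 for independent ones. The slides do not spell it out, so the following is a sketch of the standard definition; the function name is an arbitrary choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions given as label sequences."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    pairs = Counter(zip(labels_true, labels_pred))  # contingency table
    sum_cells = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index is invariant to label permutations, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` equals 1, which matters here because cluster labels estimated by the algorithm are arbitrary.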


Selection of the number of co-clusters

Experimental setup
• experimental setting 1
• ICL-BIC computed for 2 to 4 clusters in rows / columns
• 50 simulations

Results (number of simulations, out of 50, selecting each (K, L))

K \ L   2    3    4
2       0    0    0
3       0    46   3
4       0    1    0


Comparison with competitors

Competitors
• R package blockcluster for nominal data (bc categ.).
• R package blockcluster for continuous data (bc conti.).
• Optimal is the Bayes classifier using the true model parameter values.

Results

            setting 1                       setting 2
            ARI row         ARI column      ARI row         ARI column
BOS         0.971 (0.117)   0.960 (0.139)   0.581 (0.149)   0.589 (0.171)
bc categ.   1 (0)           0 (0)           0.288 (0.088)   0.018 (0.055)
bc conti.   0.841 (0.290)   0.833 (0.288)   0.421 (0.103)   0.270 (0.110)
Optimal     1 (0)           1 (0)           0.761 (0.087)   0.759 (0.087)


Conclusions

Results
• BOS: a new probability distribution for ordinal data which respects the ordinal scale of the data
• (co-)clustering algorithms have been developed based on the BOS model
• Applications are welcome!


References

[1] Biernacki & Jacques (2015), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, to appear in Statistics and Computing.

[2] Agresti (2010), Analysis of ordinal categorical data, Wiley.

[3] Govaert & Nadif (2013), Co-Clustering, Wiley-ISTE.
