Introduction to latent variable models
Lecture 2
Francesco Bartolucci
Department of Economics, Finance and Statistics
University of Perugia, IT
– Typeset by FoilTEX – 1
Outline
• Examples on the EM algorithm for finite mixture and latent class
models
• Choice of the number of components/classes
• Computation of standard errors for the parameter estimates
• Item Response Theory models
• Dynamic versions of latent variable models for panel data
Examples on the EM algorithm for finite mixture and latent class models
Example on the EM algorithm for a finite mixture of Normal distributions
• A finite mixture of Normal distributions with common variance is
considered
• Data consist of 500 observations simulated from a model with 2
components
• In order to select the number of components, two criteria are
commonly used:
Akaike Information Criterion (AIC) = −2ℓ(θ̂k) + 2 × #param.
Bayesian Information Criterion (BIC) = −2ℓ(θ̂k) + log(n) × #param.
• The second criterion is usually preferred (McLachlan & Peel, 2000)
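To make the two criteria concrete, here is a minimal Python sketch (assuming NumPy; the data are simulated, not those of the example) that fits mixtures with k = 1, 2, 3 components by EM and compares AIC and BIC:

```python
import numpy as np

def em_normal_mixture(y, k, n_iter=300, seed=0):
    """EM for a k-component Normal mixture with common variance (1D).
    Returns the final log-likelihood, AIC and BIC."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pi = np.full(k, 1.0 / k)                    # mixing weights
    mu = rng.choice(y, size=k, replace=False)   # random starting means
    s2 = float(np.var(y))                       # common variance
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        w = pi * dens
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means and common variance
        nk = w.sum(axis=0)
        pi, mu = nk / n, (w * y[:, None]).sum(axis=0) / nk
        s2 = float((w * (y[:, None] - mu) ** 2).sum() / n)
    dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    ll = float(np.log((pi * dens).sum(axis=1)).sum())
    npar = 2 * k    # (k-1) weights + k means + 1 common variance = 2k
    return ll, -2 * ll + 2 * npar, -2 * ll + np.log(n) * npar

# 500 observations simulated from a 2-component model, as in the example
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 250), rng.normal(3, 1, 250)])
for k in (1, 2, 3):
    ll, aic, bic = em_normal_mixture(y, k)
    print(f"k={k}  loglik={ll:.1f}  AIC={aic:.1f}  BIC={bic:.1f}")
```

With components this well separated, BIC should attain its minimum at k = 2, the true number of components.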
Example on the EM algorithm for the LC model
• A latent class (LC) model for binary response variables is considered
• Data are collected on 216 subjects who responded to T = 4 items
concerning social aspects (Goodman, 1974, Biometrika)
• Data may be represented by a 2⁴-dimensional vector of frequencies
for all the response configurations:
n = (freq(0000), freq(0001), . . . , freq(1111))′ = (42, 23, . . . , 20)′
• Selection criteria AIC and BIC are used for the number of classes
• For both finite mixture and LC models the likelihood may be
multimodal
• A common strategy to overcome this problem is to try different
starting values for the EM algorithm, which are randomly chosen
• In both cases the vector of probabilities π = {πc} may be chosen to
be proportional to a vector with elements drawn from U(0,1)
• For the finite mixture case we can draw each µc from N(ȳ, S) and
let Σ = S
. ȳ: sample mean
. S: sample variance
• For the LC case we can draw every λtc from U(0,1)
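The random-start strategy for the LC case can be sketched as follows (a hypothetical illustration assuming NumPy and simulated binary data; π is drawn proportional to U(0,1) values and every λtc from U(0,1), as above):

```python
import numpy as np

def em_lc(Y, k, n_iter=300, seed=0):
    """One EM run for a latent class model with binary items, started
    from random values; returns the final log-likelihood."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    pi = rng.uniform(size=k)
    pi /= pi.sum()                       # pi proportional to U(0,1) draws
    lam = rng.uniform(size=(T, k))       # lam[t, c] = P(y_t = 1 | class c)
    for _ in range(n_iter):
        # E-step: posterior class probabilities (log scale for stability)
        logp = np.log(pi) + Y @ np.log(lam) + (1 - Y) @ np.log(1 - lam)
        m = logp.max(axis=1, keepdims=True)
        post = np.exp(logp - m)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: weighted proportions
        nk = np.maximum(post.sum(axis=0), 1e-10)
        pi = nk / n
        lam = np.clip((Y.T @ post) / nk, 1e-8, 1 - 1e-8)
    logp = np.log(pi) + Y @ np.log(lam) + (1 - Y) @ np.log(1 - lam)
    m = logp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))).sum())

def best_loglik(Y, k, n_starts=10):
    """Multi-start strategy: keep the highest log-likelihood over runs."""
    return max(em_lc(Y, k, seed=s) for s in range(n_starts))

# simulated data from a hypothetical 2-class model with 4 binary items
rng = np.random.default_rng(2)
z = rng.integers(0, 2, size=300)
probs = np.where(z[:, None] == 0, 0.2, 0.8)
Y = (rng.uniform(size=(300, 4)) < probs).astype(float)
print(best_loglik(Y, 2))
```

Keeping the run with the highest final log-likelihood guards against convergence to a local, non-global mode.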
Latent regression model
• Two possible choices to include individual covariates:
1. on the measurement model so that we have random intercepts (via
a logit or probit parametrization):
λitc = p(yit = 1|ui = ξc, Xi),
log[λitc/(1 − λitc)] = ξc + x′itβ, i = 1, . . . , n, t = 1, . . . , T, c = 1, . . . , k
2. on the model for the distribution of the latent variables (via a
multinomial logit parameterization):
πic = p(ui = ξc|Xi),
log(πic/πi1) = x′iβc, c = 2, . . . , k
• Alternative parameterizations are possible with ordinal response
variables or ordered latent classes
• The models based on the two extensions have a different
interpretation:
1. the latent variables are used to account for the unobserved
heterogeneity, and the model may then be seen as a discrete version
of the logistic model with one random effect
2. the main interest is on a latent variable which is measured through
the observable response variables (e.g. health status) and on how
this latent variable depends on the covariates
• Only the M-step of the EM algorithm must be modified by exploiting
standard algorithms for the maximization of:
1. the weighted likelihood of a logit model
2. the likelihood of a multinomial logit model
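For case 1, the weighted-likelihood maximization can be sketched as a Newton-Raphson (IRLS) routine (a hypothetical illustration assuming NumPy; `w` stands in for the posterior weights that the E-step would produce):

```python
import numpy as np

def weighted_logit_mstep(X, y, w, n_iter=30):
    """Newton-Raphson maximization of a weighted logit log-likelihood,
    the kind of update needed in the M-step of case 1; w would be the
    posterior probabilities computed in the E-step."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                    # weighted score
        info = (X * (w * p * (1 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(info, score)
    return beta

# hypothetical illustration on simulated data with unit weights
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-(0.5 + X[:, 1])))).astype(float)
beta_hat = weighted_logit_mstep(X, y, np.ones(500))
print(beta_hat)
```

With unit weights this reduces to ordinary logistic regression; inside the EM algorithm the same routine is simply called with the posterior weights.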
Example on the EM algorithm for the latent regression model (type 2)
• Data about 1,093 elderly people, admitted in 2003 to 11 nursing
homes in Umbria, who responded to 9 items about their health status:
Item                                                                                              %
1 [CC1] Does the patient show problems in recalling what recently happened (5 minutes)?           72.6
2 [CC2] Does the patient show problems in making decisions regarding tasks of daily life?         64.2
3 [CC3] Does the patient have problems in being understood?                                       43.9
4 [ADL1] Does the patient need support in moving to/from lying position, turning side to side
  and positioning body while in bed?                                                              54.4
5 [ADL2] Does the patient need support in moving to/from bed, chair, wheelchair and
  standing position?                                                                              59.0
6 [ADL3] Does the patient need support for eating?                                                28.7
7 [ADL4] Does the patient need support for using the toilet room?                                 63.5
8 [SC1] Does the patient show presence of pressure ulcers?                                        15.4
9 [SC2] Does the patient show presence of other ulcers?                                           23.1
• Binary responses to items are coded so that 1 is a sign of bad health
conditions
• The available covariates are:
. gender (0 = male, 1 = female)
. 11 dummies for the nursing homes
. age
• Many latent classes (k = 6) are selected through BIC; in order to
have an easier interpretation of the classes, the constraint of
monotonicity of the conditional probabilities should be used (ordered
latent classes: λt1 ≤ · · · ≤ λtk, t = 1, . . . , T )
Computation of the standard errors
• Unlike the Fisher-scoring and Newton-Raphson algorithms, the EM
algorithm does not provide the information matrix of the incomplete
data; this matrix is needed to obtain standard errors
• Many methods are available to obtain this matrix from the
information matrix of the complete data that is used within the EM
algorithm (McLachlan & Peel, 2000)
• A simple method has been used by Bartolucci & Farcomeni
(2009, JASA); it is based on the fact that
s(θ̂) = ∂ℓ(θ)/∂θ |θ=θ̂ = ∂Q(θ|θ̂)/∂θ |θ=θ̂
• The score of the incomplete data at θ̂ is then equal to the score of
the complete data (the first derivative of the expected value of the
complete-data log-likelihood, computed at the same point θ̂)
• By computing (minus) the numerical derivative of s(θ) we obtain an
approximation of the observed information matrix:
Ĵ(θ̂) ≈ J(θ̂) = −∂²ℓ(θ)/∂θ∂θ′ |θ=θ̂
• The standard error for each estimate θ̂j, se(θ̂j), is then obtained as
the square root of the corresponding diagonal element of J(θ̂)⁻¹
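A sketch of this procedure in Python (assuming NumPy; `score` is any function returning s(θ), here checked on a toy case with a known answer):

```python
import numpy as np

def se_from_score(score, theta_hat, eps=1e-5):
    """Approximate the observed information by (minus) a central numerical
    derivative of the score at theta_hat, then take standard errors as the
    square roots of the diagonal of its inverse."""
    p = len(theta_hat)
    J = np.zeros((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = eps
        J[:, j] = -(score(theta_hat + e) - score(theta_hat - e)) / (2 * eps)
    J = (J + J.T) / 2          # symmetrize against numerical noise
    return np.sqrt(np.diag(np.linalg.inv(J)))

# Known-answer check: for N(mu, 1) data the score is s(mu) = sum(y - mu),
# so J = n and se(mu_hat) = 1/sqrt(n)
y = np.random.default_rng(0).normal(size=400)
score = lambda th: np.array([(y - th[0]).sum()])
print(se_from_score(score, np.array([y.mean()])))   # ~ 1/sqrt(400) = 0.05
```

In a real application `score` would evaluate the derivative of Q(θ|θ̂), i.e. the complete-data score available inside the EM algorithm.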
Item Response Theory models
Item Response Theory (IRT) models
• IRT models are tailored to the analysis of data arising from the
administration of a questionnaire made of a series of items which
measure a common (continuous) latent trait
• The main application of these models is then for educational
assessment, where the latent trait corresponds to a certain type of
ability of an examinee
• Main references: Fischer & Molenaar (1995), Hambleton &
Swaminathan (1996), van der Linden & Hambleton (1997), Baker &
Kim (2004)
• Main IRT assumptions:
. unidimensionality: for each subject i, the responses to the T items
depend on the same latent variable ui
. local independence: for each subject i, the responses to the T
items are independent given ui
. monotonicity: the probability pt(ui) = p(yit = 1|ui) is a monotonic
increasing function of ui
• Most used Item Response Functions (IRF) for pt(ui):
. one-parameter logistic (1PL, Rasch, 1960):
pt(ui) = exp(ui − βt) / [1 + exp(ui − βt)]
∗ βt: difficulty level of item t
. two-parameter logistic (2PL, Birnbaum, 1968):
pt(ui) = exp[αt(ui − βt)] / {1 + exp[αt(ui − βt)]}
∗ αt: discriminating index of item t, measuring how strongly the
probability of success depends on the ability level
. three-parameter logistic (3PL, Birnbaum, 1968):
pt(ui) = γt + (1 − γt) exp[αt(ui − βt)] / {1 + exp[αt(ui − βt)]}
∗ γt: guessing parameter, corresponding to the probability of success
for a subject with ability level tending to −∞
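Since the 1PL and 2PL curves are special cases of the 3PL, the three IRFs can be written as one function (a minimal sketch assuming NumPy):

```python
import numpy as np

def irf_3pl(u, alpha=1.0, beta=0.0, gamma=0.0):
    """3PL item response function p_t(u) = gamma + (1-gamma) * logistic(alpha*(u-beta)).
    alpha = 1, gamma = 0 gives the 1PL (Rasch) curve; gamma = 0 alone gives the 2PL."""
    return gamma + (1.0 - gamma) / (1.0 + np.exp(-alpha * (u - beta)))

print(irf_3pl(0.0))                       # 1PL at u = beta: probability 0.5
print(irf_3pl(0.0, alpha=2.0, beta=1.0))  # 2PL below the difficulty level
print(irf_3pl(-10.0, gamma=0.25))         # 3PL lower asymptote near gamma
```

All three curves are monotonic increasing in u, consistent with the monotonicity assumption above.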
• Most used estimation methods:
. Joint Maximum Likelihood (JML): fixed-parameters approach
which consists of maximizing the likelihood of the model with
respect to the ability and item parameters jointly
. Conditional Maximum Likelihood (CML): applicable only to
estimate the difficulty parameters of the Rasch model. It is based
on the maximization of the conditional likelihood of these
parameters given a set of sufficient statistics for the ability
parameters
. Marginal Maximum Likelihood (MML): random-parameters
approach which consists of maximizing the marginal likelihood
corresponding to the manifest probability of the observed responses
Joint maximum likelihood method
• Local independence implies:
p(yi|ui) = ∏t pt(ui)^yit [1 − pt(ui)]^(1−yit)
• The joint likelihood is then
LJ(θ) = ∏i p(yi|ui) = ∏i ∏t pt(ui)^yit [1 − pt(ui)]^(1−yit)
. θ: parameter vector which contains the item parameters and the
ability parameters (ui)
• LJ(θ) is maximized by a standard Newton-Raphson algorithm
(attention must be paid to the implementation with many subjects)
• The method is simple to apply, but it is known to lead to an
inconsistent estimator
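A minimal JML sketch for the Rasch model (assuming NumPy and simulated data; the alternating Newton updates and the β1 = 0 identifiability constraint are one of several possible implementations):

```python
import numpy as np

def jml_rasch(Y, n_iter=500):
    """JML for the Rasch model: alternate Newton updates of the abilities u_i
    and the difficulties beta_t, fixing beta_1 = 0 for identifiability.
    Subjects with all-0 or all-1 response patterns have no finite ability
    estimate and should be removed beforehand."""
    n, T = Y.shape
    u, b = np.zeros(n), np.zeros(T)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(u[:, None] - b)))
        u = u + (Y - p).sum(axis=1) / (p * (1 - p)).sum(axis=1)
        p = 1.0 / (1.0 + np.exp(-(u[:, None] - b)))
        b = b - (Y - p).sum(axis=0) / (p * (1 - p)).sum(axis=0)
        u, b = u - b[0], b - b[0]   # shift both so that beta_1 = 0
    return u, b

# hypothetical simulated example (true abilities N(0,1), true beta_1 = 0)
rng = np.random.default_rng(0)
n, T = 300, 10
u0, b0 = rng.normal(size=n), np.linspace(0.0, 2.0, T)
Y = (rng.uniform(size=(n, T)) < 1 / (1 + np.exp(-(u0[:, None] - b0)))).astype(float)
keep = (Y.sum(axis=1) > 0) & (Y.sum(axis=1) < T)   # drop extreme patterns
u_hat, b_hat = jml_rasch(Y[keep])
print(b_hat)
```

Even when the algorithm converges, the resulting item-parameter estimator is inconsistent for fixed T, which is what motivates the CML and MML methods below.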
Conditional maximum likelihood method
• The method exploits the conditional likelihood of yi given
yi+ = Σt yit:
p(yi|yi+, ui) = p(yi|ui) / p(yi+|ui) = p(yi|yi+)
which does not depend on ui for the Rasch model
• The conditional likelihood is then
LC(β) = ∏i p(yi|yi+)
• LC(β) is maximized by a Newton-Raphson algorithm, which also
produces standard errors (attention must be paid to the
implementation with many items)
• The method leads to a consistent estimator, but only for the
difficulty parameters in β
Marginal maximum likelihood method
• The method exploits the manifest distribution of yi
p(yi) = ∫ p(yi|ui) p(ui) dui
• The marginal likelihood is then LM(θ) = ∏i p(yi)
. θ: parameter vector which contains the item parameters and the
parameters of the latent distribution
• The distribution of ui may be continuous or discrete; the discrete
case is seen as a semiparametric approach (Lindsay et al., 1991, JASA)
• Maximization of LM(θ) is carried out via a Newton-Raphson
algorithm (typically with a continuous latent distribution) or the EM
algorithm (typically with a discrete latent distribution)
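For the continuous case, the integral defining p(yi) can be approximated by Gauss-Hermite quadrature; a sketch of the resulting marginal log-likelihood for the Rasch model (assuming NumPy and u ~ N(0,1); its maximization over the difficulties b would then be handled by Newton-Raphson):

```python
import numpy as np

def rasch_marginal_loglik(b, Y, n_nodes=31):
    """Marginal log-likelihood of the Rasch model with abilities u ~ N(0,1),
    with the integral over u approximated by Gauss-Hermite quadrature;
    this is the objective that MML maximizes over the difficulties b."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * x                 # nodes rescaled for the N(0,1) density
    logw = np.log(w / np.sqrt(np.pi))    # matching log-weights
    p = 1.0 / (1.0 + np.exp(-(u[:, None] - np.asarray(b))))   # (nodes, T)
    # log p(y_i | u_q) for every subject i and quadrature node q
    ll_iq = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T
    m = ll_iq.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(ll_iq + logw - m).sum(axis=1))).sum())

# sanity check: with one item of difficulty 0 and u ~ N(0,1), symmetry
# gives P(y = 1) = 0.5, so the log-likelihood of a single correct
# response should be log(0.5)
print(np.exp(rasch_marginal_loglik([0.0], np.array([[1.0]]))))
```

With a discrete latent distribution the integral is replaced by a finite sum over the support points, and the EM algorithm of the latent class model applies.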
Example
• Application based on a dataset provided by the Educational Testing
Service (Bartolucci & Forcina, 2005, Psychometrika)
• Data concern responses of 1,510 students to 12 items on Math
within the National Assessment of Educational Progress 1996 project:
1 Round to thousand place
2 Write fraction that represents shaded region
3 Multiply two negative integers
4 Reason about sample space (number correct)
5 Find amount of restaurant tip
6 Identify representative sample
7 Read dials on a meter
8 Find (x, y) solution of linear equation
9 Translate words to symbols
10 Find number of diagonals in polygon from a vertex
11 Find perimeter (quadrilateral)
12 Reason about betweenness
estimate s.e. 95%-conf.int.
β1 0.000 – – –
β2 -0.051 0.097 -0.241 0.138
β3 0.755 0.093 0.574 0.936
β4 -1.140 0.111 -1.357 -0.923
β5 1.672 0.092 1.491 1.853
β6 0.014 0.096 -0.175 0.202
β7 0.724 0.093 0.542 0.905
β8 1.305 0.092 1.125 1.485
β9 0.365 0.094 0.181 0.549
β10 0.574 0.093 0.391 0.756
β11 2.697 0.098 2.505 2.888
β12 2.751 0.098 2.558 2.944
u1 -0.080 0.674 -1.400 1.241
u2 1.193 0.662 -0.104 2.491
u3 1.193 0.662 -0.104 2.491
u4 0.770 0.649 -0.501 2.041
u5 -0.080 0.674 -1.400 1.241
... ... ... ... ...
u1510 2.158 0.750 0.689 3.626
Table 1: JML estimates of ability and item parameters under the Rasch model
estimate s.e. 95%-conf.int.
β1 0.000 – – –
β2 -0.047 0.092 -0.229 0.134
β3 0.691 0.088 0.517 0.864
β4 -1.040 0.106 -1.247 -0.833
β5 1.521 0.088 1.349 1.693
β6 0.013 0.092 -0.168 0.193
β7 0.662 0.089 0.489 0.836
β8 1.191 0.088 1.019 1.363
β9 0.334 0.090 0.158 0.511
β10 0.525 0.089 0.351 0.700
β11 2.427 0.092 2.246 2.607
β12 2.474 0.093 2.292 2.655
Table 2: CML estimates of the item parameters of the Rasch model
#classes (k)  ℓM(θ̂k)  #parameters  BIC
1 -11009 12 22106
2 -10242 14 20586
3 -10166 16 20450
4 -10163 18 20458
Table 3: Selection of the number of classes for the latent class Rasch model (model
with one discrete latent variable)
class ability probability
1 -0.645 0.165
2 0.970 0.457
3 2.432 0.378
Table 4: MML estimates of the ability parameters of the Rasch model with 3 latent
classes
• Estimates of the item parameters are very similar to those obtained
with the CML approach