Introduction to latent variable models
Lecture 2
Francesco Bartolucci
Department of Economics, Finance and Statistics
University of Perugia, IT
– Typeset by FoilTEX – 1
Outline
• Examples on the EM algorithm for finite mixture and latent class
models
• Choice of the number of components/classes
• Computation of standard errors for the parameter estimates
• Item Response Theory models
• Dynamic versions of latent variable models for panel data
Examples on the EM algorithm for finite mixture and latent class models
Example on the EM algorithm for a finite mixture of Normal distributions
• A finite mixture of Normal distributions with common variance is
considered
• Data consist of 500 observations simulated from a model with 2
components
• In order to select the number of components, two criteria are
commonly used:
Akaike Information Criterion (AIC) = −2ℓ(θ̂k) + 2 × #param.
Bayesian Information Criterion (BIC) = −2ℓ(θ̂k) + log(n) × #param.
• The second criterion is usually preferred (McLachlan & Peel, 2000)
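To make the two criteria concrete, here is a minimal Python sketch (assuming NumPy; the data are simulated, not those of the example) that fits mixtures with k = 1, 2, 3 components by EM and compares AIC and BIC:

```python
import numpy as np

def em_normal_mixture(y, k, n_iter=300, seed=0):
    """EM for a k-component Normal mixture with common variance (1D).
    Returns the final log-likelihood, AIC and BIC."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pi = np.full(k, 1.0 / k)                    # mixing weights
    mu = rng.choice(y, size=k, replace=False)   # random starting means
    s2 = float(np.var(y))                       # common variance
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        w = pi * dens
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means and common variance
        nk = w.sum(axis=0)
        pi, mu = nk / n, (w * y[:, None]).sum(axis=0) / nk
        s2 = float((w * (y[:, None] - mu) ** 2).sum() / n)
    dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    ll = float(np.log((pi * dens).sum(axis=1)).sum())
    npar = 2 * k    # (k-1) weights + k means + 1 common variance = 2k
    return ll, -2 * ll + 2 * npar, -2 * ll + np.log(n) * npar

# 500 observations simulated from a 2-component model, as in the example
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 250), rng.normal(3, 1, 250)])
for k in (1, 2, 3):
    ll, aic, bic = em_normal_mixture(y, k)
    print(f"k={k}  loglik={ll:.1f}  AIC={aic:.1f}  BIC={bic:.1f}")
```

With components this well separated, BIC should attain its minimum at k = 2, the true number of components.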
Example on the EM algorithm for the LC model
• A latent class (LC) model for binary response variables is considered
• Data are collected on 216 subjects who responded to T = 4 items
concerning social aspects (Goodman, 1974, Biometrika)
• Data may be represented by a 2⁴-dimensional vector of frequencies
for all the response configurations:
n = (freq(0000), freq(0001), . . . , freq(1111))′ = (42, 23, . . . , 20)′
• Selection criteria AIC and BIC are used for the number of classes
• For both finite mixture and LC models the likelihood may be
multimodal
• A common strategy to overcome this problem is to try different
starting values for the EM algorithm, which are randomly chosen
• In both cases the vector of probabilities π = {πc} may be chosen to
be proportional to a vector with elements drawn from U(0,1)
• For the finite mixture case we can draw each µc from N(ȳ, S) and
let Σ = S
. ȳ: sample mean
. S: sample variance
• For the LC case we can draw every λtc from U(0,1)
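The random-start strategy for the LC case can be sketched as follows (a hypothetical illustration assuming NumPy and simulated binary data; π is drawn proportional to U(0,1) values and every λtc from U(0,1), as above):

```python
import numpy as np

def em_lc(Y, k, n_iter=300, seed=0):
    """One EM run for a latent class model with binary items, started
    from random values; returns the final log-likelihood."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    pi = rng.uniform(size=k)
    pi /= pi.sum()                       # pi proportional to U(0,1) draws
    lam = rng.uniform(size=(T, k))       # lam[t, c] = P(y_t = 1 | class c)
    for _ in range(n_iter):
        # E-step: posterior class probabilities (log scale for stability)
        logp = np.log(pi) + Y @ np.log(lam) + (1 - Y) @ np.log(1 - lam)
        m = logp.max(axis=1, keepdims=True)
        post = np.exp(logp - m)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: weighted proportions
        nk = np.maximum(post.sum(axis=0), 1e-10)
        pi = nk / n
        lam = np.clip((Y.T @ post) / nk, 1e-8, 1 - 1e-8)
    logp = np.log(pi) + Y @ np.log(lam) + (1 - Y) @ np.log(1 - lam)
    m = logp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))).sum())

def best_loglik(Y, k, n_starts=10):
    """Multi-start strategy: keep the highest log-likelihood over runs."""
    return max(em_lc(Y, k, seed=s) for s in range(n_starts))

# simulated data from a hypothetical 2-class model with 4 binary items
rng = np.random.default_rng(2)
z = rng.integers(0, 2, size=300)
probs = np.where(z[:, None] == 0, 0.2, 0.8)
Y = (rng.uniform(size=(300, 4)) < probs).astype(float)
print(best_loglik(Y, 2))
```

Keeping the run with the highest final log-likelihood guards against convergence to a local, non-global mode.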
Latent regression model
• Two possible choices to include individual covariates:
1. on the measurement model so that we have random intercepts (via
a logit or probit parametrization):
λitc = p(yit = 1|ui = ξc, Xi),
log[λitc/(1 − λitc)] = ξc + x′itβ, i = 1, . . . , n, t = 1, . . . , T, c = 1, . . . , k
2. on the model for the distribution of the latent variables (via a
multinomial logit parameterization):
πic = p(ui = ξc|Xi),
log(πic/πi1) = x′iβc, c = 2, . . . , k
• Alternative parameterizations are possible with ordinal response
variables or ordered latent classes
• The models based on the two extensions have a different
interpretation:
1. the latent variables are used to account for the unobserved
heterogeneity, and the model may then be seen as a discrete version
of the logistic model with one random effect
2. the main interest is on a latent variable which is measured through
the observable response variables (e.g. health status) and on how
this latent variable depends on the covariates
• Only the M-step of the EM algorithm must be modified by exploiting
standard algorithms for the maximization of:
1. the weighted likelihood of a logit model
2. the likelihood of a multinomial logit model
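For case 1, the weighted-likelihood maximization can be sketched as a Newton-Raphson (IRLS) routine (a hypothetical illustration assuming NumPy; `w` stands in for the posterior weights that the E-step would produce):

```python
import numpy as np

def weighted_logit_mstep(X, y, w, n_iter=30):
    """Newton-Raphson maximization of a weighted logit log-likelihood,
    the kind of update needed in the M-step of case 1; w would be the
    posterior probabilities computed in the E-step."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                    # weighted score
        info = (X * (w * p * (1 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(info, score)
    return beta

# hypothetical illustration on simulated data with unit weights
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-(0.5 + X[:, 1])))).astype(float)
beta_hat = weighted_logit_mstep(X, y, np.ones(500))
print(beta_hat)
```

With unit weights this reduces to ordinary logistic regression; inside the EM algorithm the same routine is simply called with the posterior weights.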
Example on the EM algorithm for the latent regression model (type 2)
• Data about 1,093 elderly people, admitted in 2003 to 11 nursing
homes in Umbria, who responded to 9 items about their health status:
Item                                                                                              %
1 [CC1] Does the patient show problems in recalling what recently happened (5 minutes)?           72.6
2 [CC2] Does the patient show problems in making decisions regarding tasks of daily life?         64.2
3 [CC3] Does the patient have problems in being understood?                                       43.9
4 [ADL1] Does the patient need support in moving to/from lying position, turning side to side
  and positioning body while in bed?                                                              54.4
5 [ADL2] Does the patient need support in moving to/from bed, chair, wheelchair and
  standing position?                                                                              59.0
6 [ADL3] Does the patient need support for eating?                                                28.7
7 [ADL4] Does the patient need support for using the toilet room?                                 63.5
8 [SC1] Does the patient show presence of pressure ulcers?                                        15.4
9 [SC2] Does the patient show presence of other ulcers?                                           23.1
• Binary responses to items are coded so that 1 is a sign of bad health
conditions
• The available covariates are:
. gender (0 = male, 1 = female)
. 11 dummies for the nursing homes
. age
• Many latent classes (k = 6) are selected through BIC; in order to
have an easier interpretation of the classes, the constraint of
monotonicity of the conditional probabilities should be used (ordered
latent classes: λt1 ≤ · · · ≤ λtk, t = 1, . . . , T )
Computation of the standard errors
• Unlike the Fisher-scoring and Newton-Raphson algorithms, the EM
algorithm does not provide the information matrix of the incomplete
data; this matrix is needed to obtain standard errors
• Many methods are available to obtain this matrix from the
information matrix of the complete data that is used within the EM
algorithm (McLachlan & Peel, 2000)
• A simple method has been used by Bartolucci & Farcomeni
(2009, JASA); it is based on the fact that
s(θ̂) = ∂ℓ(θ)/∂θ |θ=θ̂ = ∂Q(θ|θ̂)/∂θ |θ=θ̂
• The score of the incomplete data at θ̂ is then equal to the score of
the complete data (the first derivative of the expected value of the
complete-data log-likelihood, computed at the same point θ̂)
• By computing (minus) the numerical derivative of s(θ) we obtain an
approximation of the observed information matrix:
Ĵ(θ̂) ≈ J(θ̂) = −∂²ℓ(θ)/∂θ∂θ′ |θ=θ̂
• The standard error for each estimate θ̂j, se(θ̂j), is then obtained as
the square root of the corresponding diagonal element of J(θ̂)⁻¹
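A sketch of this procedure in Python (assuming NumPy; `score` is any function returning s(θ), here checked on a toy case with a known answer):

```python
import numpy as np

def se_from_score(score, theta_hat, eps=1e-5):
    """Approximate the observed information by (minus) a central numerical
    derivative of the score at theta_hat, then take standard errors as the
    square roots of the diagonal of its inverse."""
    p = len(theta_hat)
    J = np.zeros((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = eps
        J[:, j] = -(score(theta_hat + e) - score(theta_hat - e)) / (2 * eps)
    J = (J + J.T) / 2          # symmetrize against numerical noise
    return np.sqrt(np.diag(np.linalg.inv(J)))

# Known-answer check: for N(mu, 1) data the score is s(mu) = sum(y - mu),
# so J = n and se(mu_hat) = 1/sqrt(n)
y = np.random.default_rng(0).normal(size=400)
score = lambda th: np.array([(y - th[0]).sum()])
print(se_from_score(score, np.array([y.mean()])))   # ~ 1/sqrt(400) = 0.05
```

In a real application `score` would evaluate the derivative of Q(θ|θ̂), i.e. the complete-data score available inside the EM algorithm.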
Item Response Theory models
Item Response Theory (IRT) models
• IRT models are tailored to the analysis of data arising from the
administration of a questionnaire made of a series of items which
measure a common (continuous) latent trait
• The main application of these models is then for educational
assessment, where the latent trait corresponds to a certain type of
ability of an examinee
• Main references: Fischer & Molenaar (1995), Hambleton &
Swaminathan (1996), van der Linden & Hambleton (1997), Baker &
Kim (2004)
• Main IRT assumptions:
. unidimensionality: for each subject i, the responses to the T items
depend on the same latent variable ui
. local independence: for each subject i, the responses to the T
items are independent given ui
. monotonicity: the probability pt(ui) = p(yit = 1|ui) is a monotonic
increasing function of ui
• Most used Item Response Functions (IRF) for pt(ui):
. one-parameter logistic (1PL, Rasch, 1960):
pt(ui) = exp(ui − βt) / [1 + exp(ui − βt)]
∗ βt: difficulty level of item t
. two-parameter logistic (2PL, Birnbaum, 1968):
pt(ui) = exp[αt(ui − βt)] / {1 + exp[αt(ui − βt)]}
∗ αt: discriminating index of item t, measuring how strongly the
probability of success depends on the ability level
. three-parameter logistic (3PL, Birnbaum, 1968):
pt(ui) = γt + (1 − γt) exp[αt(ui − βt)] / {1 + exp[αt(ui − βt)]}
∗ γt: guessing parameter, corresponding to the probability of success
for a subject with ability level tending to −∞
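Since the 1PL and 2PL curves are special cases of the 3PL, the three IRFs can be written as one function (a minimal sketch assuming NumPy):

```python
import numpy as np

def irf_3pl(u, alpha=1.0, beta=0.0, gamma=0.0):
    """3PL item response function p_t(u) = gamma + (1-gamma) * logistic(alpha*(u-beta)).
    alpha = 1, gamma = 0 gives the 1PL (Rasch) curve; gamma = 0 alone gives the 2PL."""
    return gamma + (1.0 - gamma) / (1.0 + np.exp(-alpha * (u - beta)))

print(irf_3pl(0.0))                       # 1PL at u = beta: probability 0.5
print(irf_3pl(0.0, alpha=2.0, beta=1.0))  # 2PL below the difficulty level
print(irf_3pl(-10.0, gamma=0.25))         # 3PL lower asymptote near gamma
```

All three curves are monotonic increasing in u, consistent with the monotonicity assumption above.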
• Most used estimation methods:
. Joint Maximum Likelihood (JML): fixed-parameters approach
which consists of maximizing the likelihood of the model with
respect to the ability and item parameters jointly
. Conditional Maximum Likelihood (CML): applicable only to
estimate the difficulty parameters of the Rasch model. It is based
on the maximization of the conditional likelihood of these
parameters given a set of sufficient statistics for the ability
parameters
. Marginal Maximum Likelihood (MML): random-parameters
approach which consists of maximizing the marginal likelihood
corresponding to the manifest probability of the observed responses
Joint maximum likelihood method
• Local independence implies:
p(yi|ui) = ∏t pt(ui)^yit [1 − pt(ui)]^(1−yit)
• The joint likelihood is then
LJ(θ) = ∏i p(yi|ui) = ∏i ∏t pt(ui)^yit [1 − pt(ui)]^(1−yit)
. θ: parameter vector which contains the item parameters and the
ability parameters (ui)
• LJ(θ) is maximized by a standard Newton-Raphson algorithm
(attention must be paid to the implementation with many subjects)
• The method is simple to apply, but it is known to lead to an
inconsistent estimator
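A minimal JML sketch for the Rasch model (assuming NumPy and simulated data; the alternating Newton updates and the β1 = 0 identifiability constraint are one of several possible implementations):

```python
import numpy as np

def jml_rasch(Y, n_iter=500):
    """JML for the Rasch model: alternate Newton updates of the abilities u_i
    and the difficulties beta_t, fixing beta_1 = 0 for identifiability.
    Subjects with all-0 or all-1 response patterns have no finite ability
    estimate and should be removed beforehand."""
    n, T = Y.shape
    u, b = np.zeros(n), np.zeros(T)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(u[:, None] - b)))
        u = u + (Y - p).sum(axis=1) / (p * (1 - p)).sum(axis=1)
        p = 1.0 / (1.0 + np.exp(-(u[:, None] - b)))
        b = b - (Y - p).sum(axis=0) / (p * (1 - p)).sum(axis=0)
        u, b = u - b[0], b - b[0]   # shift both so that beta_1 = 0
    return u, b

# hypothetical simulated example (true abilities N(0,1), true beta_1 = 0)
rng = np.random.default_rng(0)
n, T = 300, 10
u0, b0 = rng.normal(size=n), np.linspace(0.0, 2.0, T)
Y = (rng.uniform(size=(n, T)) < 1 / (1 + np.exp(-(u0[:, None] - b0)))).astype(float)
keep = (Y.sum(axis=1) > 0) & (Y.sum(axis=1) < T)   # drop extreme patterns
u_hat, b_hat = jml_rasch(Y[keep])
print(b_hat)
```

Even when the algorithm converges, the resulting item-parameter estimator is inconsistent for fixed T, which is what motivates the CML and MML methods below.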
Conditional maximum likelihood method
• The method exploits the conditional likelihood of yi given
yi+ = Σt yit:
p(yi|yi+, ui) = p(yi|ui) / p(yi+|ui) = p(yi|yi+)
which does not depend on ui for the Rasch model
• The conditional likelihood is then
LC(β) = ∏i p(yi|yi+)
• LC(β) is maximized by a Newton-Raphson algorithm, which also
produces standard errors (attention must be paid to the
implementation with many items)
• The method leads to a consistent estimator, but only for the
difficulty parameters in β
Marginal maximum likelihood method
• The method exploits the manifest distribution of yi
p(yi) = ∫ p(yi|ui) p(ui) dui
• The marginal likelihood is then LM(θ) = ∏i p(yi)
. θ: parameter vector which contains the item parameters and the
parameters of the latent distribution
• The distribution of ui may be continuous or discrete; the discrete
case is seen as a semiparametric approach (Lindsay et al., 1991, JASA)
• Maximization of LM(θ) is carried out via a Newton-Raphson
algorithm (typically with a continuous latent distribution) or the EM
algorithm (typically with a discrete latent distribution)
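For the continuous case, the integral defining p(yi) can be approximated by Gauss-Hermite quadrature; a sketch of the resulting marginal log-likelihood for the Rasch model (assuming NumPy and u ~ N(0,1); its maximization over the difficulties b would then be handled by Newton-Raphson):

```python
import numpy as np

def rasch_marginal_loglik(b, Y, n_nodes=31):
    """Marginal log-likelihood of the Rasch model with abilities u ~ N(0,1),
    with the integral over u approximated by Gauss-Hermite quadrature;
    this is the objective that MML maximizes over the difficulties b."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * x                 # nodes rescaled for the N(0,1) density
    logw = np.log(w / np.sqrt(np.pi))    # matching log-weights
    p = 1.0 / (1.0 + np.exp(-(u[:, None] - np.asarray(b))))   # (nodes, T)
    # log p(y_i | u_q) for every subject i and quadrature node q
    ll_iq = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T
    m = ll_iq.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(ll_iq + logw - m).sum(axis=1))).sum())

# sanity check: with one item of difficulty 0 and u ~ N(0,1), symmetry
# gives P(y = 1) = 0.5, so the log-likelihood of a single correct
# response should be log(0.5)
print(np.exp(rasch_marginal_loglik([0.0], np.array([[1.0]]))))
```

With a discrete latent distribution the integral is replaced by a finite sum over the support points, and the EM algorithm of the latent class model applies.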
Example
• Application based on a dataset provided by the Educational Testing
Service (Bartolucci & Forcina, 2005, Psychometrika)
• Data concern responses of 1,510 students to 12 items on Math
within the National Assessment of Educational Progress 1996 project:
1 Round to thousand place
2 Write fraction that represents shaded region
3 Multiply two negative integers
4 Reason about sample space (number correct)
5 Find amount of restaurant tip
6 Identify representative sample
7 Read dials on a meter
8 Find (x, y) solution of linear equation
9 Translate words to symbols
10 Find number of diagonals in polygon from a vertex
11 Find perimeter (quadrilateral)
12 Reason about betweenness
estimate s.e. 95%-conf.int.
β1 0.000 – – –
β2 -0.051 0.097 -0.241 0.138
β3 0.755 0.093 0.574 0.936
β4 -1.140 0.111 -1.357 -0.923
β5 1.672 0.092 1.491 1.853
β6 0.014 0.096 -0.175 0.202
β7 0.724 0.093 0.542 0.905
β8 1.305 0.092 1.125 1.485
β9 0.365 0.094 0.181 0.549
β10 0.574 0.093 0.391 0.756
β11 2.697 0.098 2.505 2.888
β12 2.751 0.098 2.558 2.944
u1 -0.080 0.674 -1.400 1.241
u2 1.193 0.662 -0.104 2.491
u3 1.193 0.662 -0.104 2.491
u4 0.770 0.649 -0.501 2.041
u5 -0.080 0.674 -1.400 1.241
... ... ... ... ...
u1510 2.158 0.750 0.689 3.626
Table 1: JML estimates of ability and item parameters under the Rasch model
estimate s.e. 95%-conf.int.
β1 0.000 – – –
β2 -0.047 0.092 -0.229 0.134
β3 0.691 0.088 0.517 0.864
β4 -1.040 0.106 -1.247 -0.833
β5 1.521 0.088 1.349 1.693
β6 0.013 0.092 -0.168 0.193
β7 0.662 0.089 0.489 0.836
β8 1.191 0.088 1.019 1.363
β9 0.334 0.090 0.158 0.511
β10 0.525 0.089 0.351 0.700
β11 2.427 0.092 2.246 2.607
β12 2.474 0.093 2.292 2.655
Table 2: CML estimates of the item parameters of the Rasch model
#classes (k)  ℓM(θ̂k)  #parameters  BIC
1 -11009 12 22106
2 -10242 14 20586
3 -10166 16 20450
4 -10163 18 20458
Table 3: Selection of the number of classes for the latent class Rasch model (model
with one discrete latent variable)
class ability probability
1 -0.645 0.165
2 0.970 0.457
3 2.432 0.378
Table 4: MML estimates of the ability parameters of the Rasch model with 3 latent
classes
• Estimates of the item parameters are very similar to those obtained
with the CML approach