Latent Class Analysis
Foundation Entries
SAGE Research Methods Foundations
By: Jay Magidson, Jeroen K. Vermunt & John P. Madura
Published: 2020
Length: 10,000 Words
DOI: http://dx.doi.org/10.4135/9781526421036
Methods: Latent Class Analysis
Online ISBN: 9781526421036
Disciplines: Anthropology, Business and Management, Criminology and Criminal Justice,
Communication and Media Studies, Counseling and Psychotherapy, Economics, Education, Geography,
Health, History, Marketing, Nursing, Political Science and International Relations, Psychology, Social
Policy and Public Policy, Social Work, Sociology, Science, Technology, Computer Science, Engineering,
traditional LC modeling. In fact, it was Karl Pearson who first brought attention to this type of statistical
application. In his first statistical publication, Pearson (1883) dealt with the approximation of a complicated
continuous density as a finite mixture of simpler densities. In a classic application, he showed that the
asymmetric distribution of the forehead-to-body-length ratio in crabs can be explained as a mixture of
two normal probability density functions with different means and different variances (Pearson, 1894). He
interpreted the results as providing evidence that this population was evolving into two new species.
Because the inclusion of continuous indicators introduces variance parameters into LC models, classes can be
revealed that differ not only in their means on one or more indicators but also in their variances. As explained
in the next subsection, the ability to account for variance heterogeneity allows LC models to extract segments
that are more meaningful than those obtained from the K-means clustering algorithm. Although $\chi^2$ model fit
statistics such as $L^2$ and $X^2$ are not available for continuous variables, the $\mathrm{BIC}_{LL}$ statistic can be used with
such models, since its computation requires only the maximized log-likelihood, the number of parameters, and the sample size.
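Pearson's idea of a mixture of normals with different means and different variances, and the use of $\mathrm{BIC}_{LL} = -2\,\mathrm{LL} + \ln(N) \times \text{npar}$ to compare such models, can be illustrated with a minimal sketch. It assumes a two-class univariate normal mixture estimated by EM; the simulated data, start values, and function names are hypothetical, and this is not the estimation routine of any particular LC program.

```python
import numpy as np

def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_two_class_mixture(y, n_iter=500):
    """EM for a 2-class univariate normal mixture with class-specific means and variances."""
    y = np.asarray(y, dtype=float)
    mu = np.percentile(y, [25, 75])        # start: means at the quartiles
    var = np.full(2, y.var())              # start: common variance
    pi = np.array([0.5, 0.5])              # start: equal class sizes
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities, shape (N, 2)
        joint = pi * normal_pdf(y[:, None], mu, var)
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: weighted class sizes, means, and class-specific variances
        nk = post.sum(axis=0)
        pi = nk / y.size
        mu = (post * y[:, None]).sum(axis=0) / nk
        var = (post * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    loglik = np.log((pi * normal_pdf(y[:, None], mu, var)).sum(axis=1)).sum()
    return pi, mu, var, loglik

def bic_ll(loglik, n_params, n_cases):
    # BIC_LL = -2 LL + ln(N) * npar; smaller values indicate a better model
    return -2.0 * loglik + np.log(n_cases) * n_params

# Hypothetical data: two normal subpopulations with different means AND variances
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 600), rng.normal(5.0, 2.0, 400)])

pi, mu, var, ll_2class = fit_two_class_mixture(y)
ll_1class = np.log(normal_pdf(y, y.mean(), y.var())).sum()
bic_1 = bic_ll(ll_1class, 2, y.size)   # 1 mean + 1 variance
bic_2 = bic_ll(ll_2class, 5, y.size)   # 2 means + 2 variances + 1 free class size
```

The 2-class model should have the smaller $\mathrm{BIC}_{LL}$, and its more dispersed class is the one with the larger mean, which a model forcing equal variances could not represent.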
Relationship to K-Means
When all variables in a LC model are continuous, LC models can be compared directly to the popular
K-means approach to clustering, which has been shown to be equivalent to maximizing the classification
likelihood of a restricted mixture model (Vermunt, 2011). The K-means Euclidean distance criterion translates
into implicit assumptions of (a) local independence and (b) equal within-cluster variances. These assumptions
are often referred to as sphericity because the locus of points associated with each of the K clusters corresponds
to a sphere of the same size.
SAGE
2020 SAGE Publications, Ltd. All Rights Reserved.
SAGE Research Methods Foundations
Page 22 of 37 Latent Class Analysis
Not for redistribution beyond Jay Magidson's online course in Latent Class Modeling
In comparison to LC, K-means has been shown to perform poorly in cluster recovery when its assumption of
equal within-cluster variances fails to hold, which is common in practice (Magidson &
Vermunt, 2002a, 2002b; Vermunt, 2011).
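The link between K-means and a restricted mixture model can be checked numerically. This sketch, using hypothetical centroids and data, assigns cases by the K-means nearest-centroid rule and by the largest class-conditional normal log-density under local independence. With equal spherical variances and equal class sizes the two rules coincide; once one class is allowed a larger variance, a case lying closer to a compact class's centroid can nevertheless be assigned to the dispersed class.

```python
import numpy as np

def kmeans_assign(x, centroids):
    # K-means rule: nearest centroid in squared Euclidean distance
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def mixture_assign(x, means, sds, class_sizes):
    # Modal assignment under a normal mixture with local independence:
    # maximize log P(X = k) + sum_j log N(x_j; mean_kj, sd_kj^2), dropping shared constants
    z = (x[:, None, :] - means[None, :, :]) / sds[None, :, :]
    log_dens = -0.5 * (z ** 2).sum(axis=2) - np.log(sds).sum(axis=1)[None, :]
    return (np.log(class_sizes)[None, :] + log_dens).argmax(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(scale=3.0, size=(200, 2))                  # hypothetical cases
centroids = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
equal_sizes = np.full(3, 1 / 3)

# Restricted mixture: equal spherical variances and equal class sizes
km = kmeans_assign(x, centroids)
lc_restricted = mixture_assign(x, centroids, np.ones((3, 2)), equal_sizes)

# Unrestricted: class 1 is far more dispersed than classes 0 and 2
sds = np.array([[0.5, 0.5], [3.0, 3.0], [0.5, 0.5]])
point = np.array([[1.5, 0.0]])                            # closer to centroid 0
km_point = kmeans_assign(point, centroids)[0]
lc_point = mixture_assign(point, centroids, sds, equal_sizes)[0]
```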
As an example, Figure 6 shows how patients from three known groups (those with “overt diabetes,” those
with “chemical diabetes,” and “normal” individuals who do not have diabetes) compare with respect to three
continuous measurements: GLUCOSE, INSULIN, and SSPG (Reaven & Miller, 1999). In particular, the
scatterplot of GLUCOSE by INSULIN reveals that the variances of these variables are much larger in the
Overt diabetes group than in the other two groups.
Figure 6. Matrix scatterplot of diabetes data set by clinical classification.
Three-class LC models that allow variances to differ across classes not only provide the best fit to these data
in terms of the BIC statistic but also have been shown to recover the three structural groups with much higher
accuracy than K-means. Using simulated data, Vermunt (2011) confirms the general superiority of LC over
K-means in recovering true class membership in the commonly occurring situation where variances differ
across classes.
The scatterplot of GLUCOSE ($y_1$) by INSULIN ($y_2$) in Figure 6 shows that the Overt group has larger
variances than the other groups and also that a positive correlation between these variables remains within
this group, violating the local independence assumption. One way to handle such violations without increasing
the number of classes is to include a direct effect in the model. For a pair of continuous indicators, this is done by applying the bivariate
normal distribution to the variable pair, $P(y_1, y_2 \mid X = k)$, as shown in Equation 7, which introduces a covariance
parameter along with the mean and variance parameters:
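$$P(y_1, y_2 \mid X = k) = \frac{1}{2\pi\,|\Sigma_k|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_k)^{\top}\Sigma_k^{-1}(\mathbf{y} - \boldsymbol{\mu}_k)\right], \qquad \Sigma_k = \begin{pmatrix} \sigma_{k1}^2 & \sigma_{k12} \\ \sigma_{k12} & \sigma_{k2}^2 \end{pmatrix} \qquad (7)$$
where $\mathbf{y} = (y_1, y_2)^{\top}$, $\boldsymbol{\mu}_k$ contains the class-$k$ means, and $\sigma_{k12}$ is the within-class covariance.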
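Equation 7 can be evaluated directly. The sketch below uses hypothetical means and (co)variances, not the diabetes estimates, and confirms that fixing the covariance at 0 reduces the bivariate density to the product of two univariate normal densities, which is exactly the local independence assumption that the direct effect relaxes.

```python
import numpy as np

def bivariate_normal_pdf(y, mean, cov):
    """P(y1, y2 | X = k) for a bivariate normal with mean vector `mean` and covariance `cov`."""
    diff = np.asarray(y, float) - np.asarray(mean, float)
    cov = np.asarray(cov, float)
    quad = diff @ np.linalg.solve(cov, diff)          # (y - mu)' Sigma^{-1} (y - mu)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical class-k parameters (not estimates from the diabetes data)
mean_k = [1.0, 2.0]
cov_independent = [[4.0, 0.0], [0.0, 9.0]]   # covariance fixed at 0: local independence
cov_direct = [[4.0, 3.0], [3.0, 9.0]]        # free covariance: the "direct effect"
y = [2.0, 0.5]

dens_independent = bivariate_normal_pdf(y, mean_k, cov_independent)
dens_direct = bivariate_normal_pdf(y, mean_k, cov_direct)
dens_product = normal_pdf(y[0], mean_k[0], 4.0) * normal_pdf(y[1], mean_k[1], 9.0)
```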
Covariate Extension: One-Step and Three-Step Approaches
An important extension of the LC model involves inclusion of covariates predicting class membership (Dayton
& Macready, 1988; Kamakura, Wedel, & Agrawal, 1994). Denoting person $i$’s covariate vector by $z_i$, this
extended LC model is defined as:
$$P(y_i \mid z_i) = \sum_{k=1}^{K} P(X = k \mid z_i) \prod_{j=1}^{J} P(y_{ij} \mid X = k) \qquad (8)$$
The main change compared to the basic LC model is that the class membership probabilities may now
depend on $z_i$, whereas the conditional probabilities of the indicators, $P(y_{ij} \mid X = k)$, remain unchanged.
Note that an important additional assumption is made, namely that the effect of $z_i$ on $y_i$ is fully
mediated by the LCs. It is possible to test this assumption using local fit measures (BVRs) similar to those
discussed earlier, as well as to relax it by allowing for direct effects, which implies replacing $P(y_{ij} \mid X = k)$ by $P(y_{ij} \mid X = k, z_i)$ for one or more of the $y_{ij}$. Typically, $P(X = k \mid z_i)$ is modeled using a multinomial logistic
specification; that is,
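$$P(X = k \mid z_i) = \frac{\exp\left(\gamma_{0k} + \sum_{p} \gamma_{pk} z_{ip}\right)}{\sum_{k'=1}^{K} \exp\left(\gamma_{0k'} + \sum_{p} \gamma_{pk'} z_{ip}\right)} \qquad (9)$$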
where $\gamma_{0k}$ and $\gamma_{pk}$ represent the intercept and the slope of predictor $z_{ip}$ for LC $k$. For identification, we assume that the
parameters sum to 0 across classes (effect coding) or are fixed at 0 for one class (dummy coding).
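Equations 8 and 9 can be combined in a few lines. This minimal sketch assumes binary indicators and hypothetical parameter values. It computes $P(X = k \mid z_i)$ with the multinomial logit, plugs it into $P(y_i \mid z_i)$, and checks that dummy-coded and effect-coded parameters imply the same class probabilities, since the two codings differ only by a constant shift per case, which the logit ignores.

```python
import numpy as np

def class_probs(z, gamma0, gamma):
    """Equation 9: P(X = k | z_i) from a multinomial logit.
    z: (N, P) covariates; gamma0: (K,) intercepts; gamma: (P, K) slopes."""
    eta = gamma0[None, :] + z @ gamma
    eta = eta - eta.max(axis=1, keepdims=True)   # numerical stability only
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)

def lc_likelihood_with_covariates(y, z, gamma0, gamma, item_probs):
    """Equation 8 for binary indicators: sum_k P(X = k | z_i) prod_j P(y_ij | X = k).
    y: (N, J) 0/1 responses; item_probs: (K, J) with P(y_ij = 1 | X = k)."""
    cond = np.prod(np.where(y[:, None, :] == 1, item_probs[None], 1 - item_probs[None]), axis=2)
    return (class_probs(z, gamma0, gamma) * cond).sum(axis=1)

# Hypothetical parameters for K = 3 classes and one covariate
z = np.array([[0.0], [1.0], [2.5]])
gamma0_dummy = np.array([0.0, 0.5, -1.0])       # first class is the reference (dummy coding)
gamma_dummy = np.array([[0.0, 0.8, 1.2]])
# Effect coding: the same model with parameters centered to sum to 0 across classes
gamma0_effect = gamma0_dummy - gamma0_dummy.mean()
gamma_effect = gamma_dummy - gamma_dummy.mean(axis=1, keepdims=True)

p_dummy = class_probs(z, gamma0_dummy, gamma_dummy)
p_effect = class_probs(z, gamma0_effect, gamma_effect)

item_probs = np.array([[0.9, 0.8, 0.7], [0.5, 0.5, 0.5], [0.1, 0.2, 0.3]])
y = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
lik = lc_likelihood_with_covariates(y, z, gamma0_dummy, gamma_dummy, item_probs)
```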
The simultaneous modeling of the responses $y_i$ and covariates $z_i$ in this one-step approach may sometimes
be impractical, especially when the number of possibly relevant covariates is large. Moreover, in most
applications, we wish to obtain a clustering that is not affected by the chosen covariates but only by the
selected indicator variables. Therefore, most researchers prefer using a three-step approach involving:
1. estimating the LC model without covariates,
2. obtaining the individuals’ class assignments using the posterior membership probabilities, and
3. investigating how the class assignments are related to covariates.
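Steps 2 and 3 are straightforward once step 1 has produced estimates. The sketch below takes hypothetical step-1 estimates for binary indicators as given, computes the posterior membership probabilities $P(X = k \mid y_i) \propto P(X = k)\prod_j P(y_{ij} \mid X = k)$, assigns each case to its modal class, and cross-tabulates the assignments by a categorical covariate. This is the uncorrected three-step procedure.

```python
import numpy as np

def posterior_probs(y, class_sizes, item_probs):
    """P(X = k | y_i) for binary indicators under local independence.
    y: (N, J) 0/1 responses; class_sizes: (K,); item_probs: (K, J) = P(y_ij = 1 | X = k)."""
    cond = np.prod(np.where(y[:, None, :] == 1, item_probs[None], 1 - item_probs[None]), axis=2)
    joint = class_sizes[None, :] * cond
    return joint / joint.sum(axis=1, keepdims=True)

def modal_assignment(post):
    # Step 2: each case goes to the class with the largest posterior probability
    return post.argmax(axis=1)

def naive_crosstab(w, z, n_classes, n_levels):
    # Step 3 (uncorrected): distribution of assigned classes within each covariate level
    table = np.zeros((n_levels, n_classes))
    np.add.at(table, (z, w), 1)
    return table / table.sum(axis=1, keepdims=True)

# Step 1 output, taken as given (hypothetical estimates for K = 2 classes, J = 3 items)
class_sizes = np.array([0.6, 0.4])
item_probs = np.array([[0.8, 0.7, 0.9], [0.2, 0.3, 0.1]])
y = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0]])
z = np.array([0, 1, 0, 1])               # hypothetical categorical covariate

post = posterior_probs(y, class_sizes, item_probs)
w = modal_assignment(post)
table = naive_crosstab(w, z, n_classes=2, n_levels=2)
```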
However, as shown by Annabel Bolck, Marcel Croon, and Jacques Hagenaars (2004), this three-step
approach yields downward biased estimates of the covariate effects. Based on the work of these authors,
Vermunt (2010) proposed a simple method to adjust for this bias (see also Bakk, Tekle, & Vermunt, 2013).
The adjustment is based on the following relationship between the class assignments $w_i$ and the true class
memberships:
$$P(w_i \mid z_i) = \sum_{k=1}^{K} P(X = k \mid z_i)\, P(w_i \mid X = k) \qquad (10)$$
Note that this again is a LC model, but with $w_i$ as a single “response” variable. The adjustment proposed by
Vermunt (2010) therefore involves estimating a LC model with $z_i$ as concomitant variables and $w_i$ as the single
response variable, while fixing $P(w_i \mid X = k)$ at the values computed using the parameter estimates from
the first step.
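A sketch of this adjustment for a single categorical covariate, under stated assumptions: modal assignment; classification error probabilities $P(w = s \mid X = k)$ obtained by weighting the step-1 posteriors; and an EM routine that estimates $P(X = k \mid z)$ while holding those error probabilities fixed. With one categorical covariate and a saturated logit for $P(X = k \mid z)$, Equation 10 can be fitted separately at each covariate level, which keeps the sketch short. The error matrix and class proportions are hypothetical. The naive proportions of assigned classes show the attenuated covariate effect, whereas the adjusted estimates recover the true proportions.

```python
import numpy as np

def classification_errors(post):
    """P(w = s | X = k) under modal assignment, from step-1 posteriors (N, K):
    sum_i P(X = k | y_i) * 1[w_i = s] / sum_i P(X = k | y_i)."""
    n_classes = post.shape[1]
    assigned = np.eye(n_classes)[post.argmax(axis=1)]   # (N, K) one-hot modal assignments
    return (post.T @ assigned) / post.sum(axis=0)[:, None]

def adjusted_class_probs(w_counts, errors, n_iter=5000):
    """ML estimate of P(X = k | z) at one covariate level, treating w as the single
    indicator whose error probabilities P(w = s | X = k) are held fixed (Equation 10)."""
    n_classes = errors.shape[0]
    pi = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        joint = pi[:, None] * errors                     # P(X = k, w = s | z)
        resp = joint / joint.sum(axis=0, keepdims=True)  # P(X = k | w = s, z)
        pi = resp @ w_counts / w_counts.sum()            # expected class shares
    return pi

errors = np.array([[0.9, 0.1], [0.2, 0.8]])             # hypothetical P(w = s | X = k)
true_pi = {"z=0": np.array([0.8, 0.2]), "z=1": np.array([0.3, 0.7])}
counts = {lvl: 1000 * p @ errors for lvl, p in true_pi.items()}   # expected counts of w
naive = {lvl: c / c.sum() for lvl, c in counts.items()}
adjusted = {lvl: adjusted_class_probs(c, errors) for lvl, c in counts.items()}
```

Here the naive difference in the share of the first class between covariate levels is 0.35, whereas the true difference is 0.50.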
Multilevel LC Analysis
Vermunt (2003) proposed the multilevel LC model, which can be used when individuals (lower level units
such as students) belong to groups (higher level units such as schools) and the number of groups is
too large to use the grouping variable as a series of dummy variables in a LC model with covariates. The
description of the multilevel LC model requires expansion of our notation. We refer to a particular group or
higher level unit as $g$ and to the response vector of a group and of an individual within a group as $y_g$ and $y_{gi}$,
respectively. The number of individuals within a group is denoted by $n_g$, the group-level LC variable by $V$, a
group-level LC by $d$, and the number of group-level LCs by $D$. The lower level part of the two-level LC model
has the following form:
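$$P(y_{gi} \mid V = d) = \sum_{k=1}^{K} P(X = k \mid V = d) \prod_{j=1}^{J} P(y_{gij} \mid X = k) \qquad (11)$$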
which is the same as a standard LC model, except that the lower level LC proportions are allowed
to differ across the higher level LCs of $V$. As in the standard LC model, we assume local independence across the
$J$ indicators. The higher level part of the model, which connects the responses of the $n_g$ persons in group $g$,
equals
$$P(y_g) = \sum_{d=1}^{D} P(V = d) \prod_{i=1}^{n_g} P(y_{gi} \mid V = d) \qquad (12)$$
As can be seen, the main additional model assumptions are that each group belongs to one of $D$ group-level
LCs and that the individuals’ responses within a group are independent given the group’s class membership.
Combining Equations (11) and (12) yields the full equation of a two-level LC model:
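$$P(y_g) = \sum_{d=1}^{D} P(V = d) \prod_{i=1}^{n_g} \left[\sum_{k=1}^{K} P(X = k \mid V = d) \prod_{j=1}^{J} P(y_{gij} \mid X = k)\right] \qquad (13)$$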
As in a standard LC model, we can include covariates predicting the higher level and lower level class
memberships, using either a one-step or a three-step approach. Moreover, the assumption that the $y_{gij}$ are
independent of the group’s class membership given the individual’s class membership can be tested and
relaxed (Nagelkerke, Oberski, & Vermunt, 2016).
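The two-level likelihood translates directly into code. The sketch below assumes binary indicators and hypothetical parameters. It computes $P(y_g)$ for one group by first obtaining $P(y_{gi} \mid V = d)$ for each member and then mixing over the group-level classes, and it checks the result against a brute-force sum over every combination of the group's class and its members' classes.

```python
import itertools
import numpy as np

def item_likelihood(y_row, p_item_k):
    # prod_j P(y_gij | X = k) for binary indicators (local independence)
    return np.prod(np.where(y_row == 1, p_item_k, 1 - p_item_k))

def group_likelihood(y_g, p_V, p_X_given_V, p_item):
    """Two-level LC likelihood for one group.
    y_g: (n_g, J) 0/1 responses; p_V: (D,) = P(V = d);
    p_X_given_V: (D, K) = P(X = k | V = d); p_item: (K, J) = P(y_gij = 1 | X = k)."""
    cond = np.array([[item_likelihood(row, p_item[k]) for k in range(len(p_item))] for row in y_g])
    per_person = cond @ p_X_given_V.T                  # P(y_gi | V = d), shape (n_g, D)
    return float(p_V @ np.prod(per_person, axis=0))    # members independent given V

def group_likelihood_brute_force(y_g, p_V, p_X_given_V, p_item):
    # Check: sum over V and over every combination of the members' classes X
    n_group_classes, n_classes = p_X_given_V.shape
    total = 0.0
    for d in range(n_group_classes):
        for classes in itertools.product(range(n_classes), repeat=len(y_g)):
            term = p_V[d]
            for row, k in zip(y_g, classes):
                term *= p_X_given_V[d, k] * item_likelihood(row, p_item[k])
            total += term
    return total

# Hypothetical parameters: D = 2 group-level classes, K = 2 individual classes, J = 3 items
y_g = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1]])
p_V = np.array([0.3, 0.7])
p_X_given_V = np.array([[0.8, 0.2], [0.25, 0.75]])
p_item = np.array([[0.9, 0.6, 0.8], [0.2, 0.3, 0.5]])
```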
LM or Latent Transition Models
A LM model is a LC model for longitudinal data in which persons are allowed to switch between latent states
across measurement occasions. It is also referred to as a latent transition model (Collins & Lanza, 2010),
a hidden Markov model (MacDonald & Zucchini, 1997; Visser, 2011), or a Markov-switching or regime-switching
model.
More generally, a mixture LM model uses both LCs and latent states to study how the patterns of transition
from one latent state to another differ across LCs. For example, a Mover-Stayer model is a 2-class mixture
LM model in which one class consists of “stayers,” who always remain in the same state, while the other class
consists of “movers,” who may change from one state to another over time.
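The likelihood of a LM model is usually computed with the forward recursion known from hidden Markov models. This sketch assumes a single categorical indicator per occasion and hypothetical parameters. The Mover-Stayer mixture is obtained by giving the stayer class an identity transition matrix, so that its members never leave their initial state. A brute-force sum over all state sequences is included as a check.

```python
import itertools
import numpy as np

def lm_likelihood(obs, init, trans, emit):
    """Forward recursion for a latent Markov model with one categorical indicator.
    obs: (T,) observed categories; init: (S,) initial state probabilities;
    trans[r, s] = P(state s at t | state r at t - 1); emit[s, c] = P(obs = c | state s)."""
    alpha = init * emit[:, obs[0]]
    for t in range(1, len(obs)):
        alpha = (alpha @ trans) * emit[:, obs[t]]
    return alpha.sum()

def lm_likelihood_brute_force(obs, init, trans, emit):
    # Check: sum over every possible sequence of latent states
    total = 0.0
    for states in itertools.product(range(len(init)), repeat=len(obs)):
        p = init[states[0]] * emit[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= trans[states[t - 1], states[t]] * emit[states[t], obs[t]]
        total += p
    return total

def mover_stayer_likelihood(obs, p_stayer, init, trans_movers, emit):
    # 2-class mixture LM model: stayers have an identity transition matrix (never switch)
    stayers = lm_likelihood(obs, init, np.eye(len(init)), emit)
    movers = lm_likelihood(obs, init, trans_movers, emit)
    return p_stayer * stayers + (1 - p_stayer) * movers

# Hypothetical 2-state model observed on 4 occasions
obs = np.array([0, 0, 1, 1])
init = np.array([0.6, 0.4])
trans = np.array([[0.8, 0.2], [0.1, 0.9]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
```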
LC Regression or Conjoint Models
LC regression models differ from LC cluster models in that the parameters that differ across unobserved
subgroups are regression coefficients rather than conditional probabilities. Unlike standard
regression, in which the regression coefficients are treated as fixed effects, LC regression resembles mixed
models in allowing for heterogeneity in the regression coefficients between observations.
Typical LC regression applications involve multiple replications, or repeated univariate responses, for
each case, where the replications correspond to time (one replication for each time point) or to different
situations, such as ratings of different products. Applications of the former include LC growth modeling, where
observations are clustered or grouped based on the way they change over time. The latter include ratings-based
and choice-based conjoint applications, where observations are generally clustered or segmented
based on the attributes that drive their ratings (or choices).
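A compact EM sketch of a linear LC regression for data with several replications per case illustrates the key feature: class membership is defined at the case level, so the E-step adds up the log-densities of all replications belonging to a case before computing that case's posterior class probabilities. Normal errors with class-specific variances, the simulated data, and the random start are assumptions of this illustration; ordinal ratings or discrete choices would call for a different response model.

```python
import numpy as np

def lc_regression_em(X, y, case, n_classes=2, n_iter=300, seed=0):
    """EM for a linear LC regression on long-format data.
    X: (R, P) design matrix with an intercept column; y: (R,) responses;
    case: (R,) integer case ids 0..N-1. All replications of a case share one class."""
    rng = np.random.default_rng(seed)
    n_cases = case.max() + 1
    post = rng.dirichlet(np.ones(n_classes), size=n_cases)   # random start, (N, K)
    for _ in range(n_iter):
        # M-step: class sizes, then weighted least squares and residual variance per class
        pi = post.mean(axis=0)
        weights = post[case]                                  # (R, K)
        beta = np.empty((n_classes, X.shape[1]))
        sigma2 = np.empty(n_classes)
        for k in range(n_classes):
            Xw = X * weights[:, k:k + 1]
            beta[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = y - X @ beta[k]
            sigma2[k] = (weights[:, k] * resid ** 2).sum() / weights[:, k].sum()
        # E-step: add up replication log-densities within each case, then normalize
        resid = y[:, None] - X @ beta.T                       # (R, K)
        log_dens = -0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)
        case_ll = np.zeros((n_cases, n_classes))
        np.add.at(case_ll, case, log_dens)
        case_ll += np.log(pi)
        post = np.exp(case_ll - case_ll.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return pi, beta, sigma2, post

# Hypothetical data: 100 cases x 8 replications; half follow each regression line
rng = np.random.default_rng(42)
n_cases, n_reps = 100, 8
case = np.repeat(np.arange(n_cases), n_reps)
x = rng.uniform(0.0, 4.0, size=case.size)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([[1.0, 2.0], [5.0, -1.0]])
true_class = (case >= n_cases // 2).astype(int)
y = (X * true_beta[true_class]).sum(axis=1) + rng.normal(0.0, 0.5, size=case.size)

pi, beta, sigma2, post = lc_regression_em(X, y, case)
```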
To facilitate LC regression models that involve multiple replications per case, the data are organized as a
long file rather than the typical wide file format. For example, in a taste testing experiment, consumers were
asked to rate each of J = 15 cracker products on an M = 9-point liking scale (Popper et al., 2004; Magidson &
Vermunt, 2006). Figure 7 shows the data as a wide file in which each of the 15 ratings is stored in a separate
column, and each record corresponds to a different consumer (e.g., ratings for consumer #1101 are provided
in the first record). This format is typical for a LC cluster analysis.
Figure 7. Example of wide file format.
In contrast, Figure 8 shows the same data as a long file in which the product ratings for a given consumer
are stored in a single response variable, RATING, with a separate record for each of the 15 products
for that consumer. This format allows the dependent variable RATING to be regressed either on the single
nominal predictor PRODUCT or on separate product attributes. To accommodate the latter regression, the restructured data also include four appearance
attributes (JAPP1-JAPP4), four flavor attributes (FLV1-FLV4), and four texture attributes (TEX1-TEX4) for
each cracker product (Figure 8). The values for these sensory attributes were obtained from food experts
(see Popper et al., 2004).
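The restructuring from the wide file to the long file amounts to a simple reshape. In this sketch, the column names (CONSUMER, P1 to P3), the rating values, and the FLV1/TEX1 attribute values are hypothetical, and only 3 of the 15 products are shown. Each consumer contributes one record per product, and each product's attribute values are attached to every record for that product.

```python
# Hypothetical wide-file records: one row per consumer, one column per product
# (only 3 of the 15 products are shown; all rating values are made up).
wide = [
    {"CONSUMER": 1101, "P1": 7, "P2": 5, "P3": 8},
    {"CONSUMER": 1102, "P1": 3, "P2": 4, "P3": 2},
]
# Hypothetical product-level attribute values (in the study these came from food experts).
attributes = {
    "P1": {"FLV1": 0.2, "TEX1": 1.4},
    "P2": {"FLV1": 1.1, "TEX1": 0.3},
    "P3": {"FLV1": 0.7, "TEX1": 0.9},
}

def wide_to_long(rows, products, attributes):
    """One output record per consumer x product, with a single RATING column
    and the product's attribute values repeated for every consumer."""
    long_rows = []
    for row in rows:
        for product in products:
            record = {"CONSUMER": row["CONSUMER"], "PRODUCT": product, "RATING": row[product]}
            record.update(attributes[product])
            long_rows.append(record)
    return long_rows

long_rows = wide_to_long(wide, ["P1", "P2", "P3"], attributes)
```

With pandas, the same reshape is usually written with `DataFrame.melt` followed by `DataFrame.merge` on the product column.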
For these data, regardless of whether a LC cluster analysis, a LC regression with product as a
nominal predictor, or a LC regression with the product attributes as predictors is performed, the resulting class assignments
are similar. In all three cases, a 2-class model consists of respondents who tend to rate all crackers relatively
low (Class 1) and respondents who tend to rate all crackers relatively high (Class 2); see Figure 9. This result
is typical when analyzing ratings data. While successful in capturing the strong response level difference,
this type of result is not useful for food manufacturers, who want to know which types of crackers appeal to
different consumer segments.
Figure 8. Example of long file format for LC Cluster model.
Figure 9. Results from standard 2-class LC analysis.
Next, we describe some approaches that separate out confounding factors and thereby yield more meaningful classes.
In particular, the random intercept model is very useful when analyzing ratings data because it removes the
response level confound by capturing local dependencies in a structured manner (see “Random Intercept
Model for Analyzing Ratings” section), and the SALC model (see “SALC Tree Models for Analyzing Choices”
section) is useful in adjusting classes to remove the confounding effects of scale in choice models.
Regardless of whether ratings or choices are analyzed, the challenge in both cases is to extract classes that
are meaningful and free from the potential confounding effects of response level (for the analysis of ratings)
or scale effects (for the analysis of choices).
Random Intercept Model for Analyzing Ratings
In practice, when LC regression is used with ratings data, care must be taken to prevent regression intercept
heterogeneity from dominating the model, which results in LC segments that differ primarily in their rating
style: one class tends to rate all objects high, while a second class tends to give lower ratings to all objects.
To deal with this problem, a random intercept can be introduced into the LC regression model to account for
this heterogeneity, allowing the LCs to capture the more meaningful heterogeneity related to differences in
the regression coefficients. Using the ordinal scale type based on the adjacent-category logit model for rating
category m, the LC random intercept regression model (Magidson & Vermunt, 2006) for the rating of product t can
be expressed as:
(14)
where the continuous latent variable F is used to model the random intercept, allowing the regression
coefficients $\beta_{k1}, \beta_{k2}, \ldots, \beta_{k15}$ to assess liking for each of the 15 cracker products relative to individual i’s overall
liking of crackers (as reflected by their intercept). This is similar to centering, in which an individual's ratings
are measured relative to their average rating across all 15 crackers. Figure 10 plots the results obtained from the
2-class random intercept regression model. Compared to the results from the standard LC model (Figure 9), the
resulting classes now differ in their relative preferences for one cracker over another and thus are more useful
to food manufacturers.
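The idea that the random intercept isolates relative liking can be checked numerically under an assumed adjacent-category logit, $\log[P(Y = m)/P(Y = m - 1)] = \alpha_m + \beta_{kt} + F_i$. Here the loading of $F$ is fixed at 1 and the category parameters are hypothetical, and the parameterization of Equation 14 may differ. Because $F_i$ enters every product's linear predictor identically, it cancels from the difference in adjacent-category log-odds between two products, so that difference depends only on the class-specific coefficients, whatever the consumer's overall rating level.

```python
import numpy as np

def adjacent_category_probs(alpha, eta):
    """Probabilities of categories 1..M when log[P(Y = m) / P(Y = m - 1)] = alpha[m] + eta.
    alpha: (M - 1,) category parameters; eta: scalar linear predictor."""
    log_p = np.concatenate([[0.0], np.cumsum(alpha + eta)])   # category 1 is the baseline
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

def adjacent_log_odds(p):
    return np.log(p[1:] / p[:-1])

alpha = np.linspace(0.8, -0.8, 8)                 # hypothetical: 9-point scale -> 8 adjacent logits
beta_k = {"cracker A": 0.5, "cracker B": -0.3}    # hypothetical class-k product effects
F = {"low rater": -1.0, "high rater": 1.5}        # hypothetical random-intercept values

# Assumed linear predictor: eta = beta_kt + F_i
probs = {(i, t): adjacent_category_probs(alpha, beta_k[t] + F[i]) for i in F for t in beta_k}
```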
Figure 10. Results from 2-class LC random intercept regression analysis.
The alternative 2-class random intercept regression, in which the sensory attributes $Z_1, \ldots, Z_Q$ are
used as predictors instead of PRODUCT, yields results very similar to those in Figure 10, providing evidence
that class differences in the liking ratings reflect different preferences with respect to the cracker attributes.
The specification of this latter model is
(15)
SALC Tree Models for Analyzing Choices
By asking respondents to choose between two or more alternatives rather than rate each alternative,
choice-based conjoint avoids the response style problem inherent in ratings-based conjoint. However, LC choice
modeling has its own scaling problem that should be dealt with in order to avoid the confounding
effects of scale classes (Groothuis-Oudshoorn et al., 2018). The SALC model (Magidson & Vermunt, 2007) was
introduced as a potential solution to this problem: by decomposing utilities into separate scale and
preference components, it allows LCs to reflect differences in preferences rather than differences in scale.
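To see why scale can masquerade as preference, consider two hypothetical groups of respondents who share the same part-worths but differ in choice consistency (scale). In a multinomial logit, the scale multiplies all utilities, so the two groups rank the alternatives identically yet produce different choice probabilities, and a standard LC choice model would tend to place them in different classes. The sketch assumes the utility = scale times preference decomposition described above; it only illustrates the confound and is not an implementation of the SALC model.

```python
import numpy as np

def choice_probs(utilities, scale=1.0):
    """Multinomial logit choice probabilities; `scale` multiplies every utility."""
    v = scale * np.asarray(utilities, float)
    expo = np.exp(v - v.max())
    return expo / expo.sum()

# Hypothetical preference part-worths shared by two groups of respondents
preference = np.array([1.0, 0.4, -0.2, -1.2])
p_low = choice_probs(preference, scale=0.5)    # less consistent choosers
p_high = choice_probs(preference, scale=2.0)   # more consistent choosers

# Utilities a model without a scale component would attribute to each group:
# scale * preference, which differ even though the preferences are identical.
utilities_low = 0.5 * preference
utilities_high = 2.0 * preference
```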
Table 9 provides results from a standard LC best-worst choice model where each respondent selected their
most and least preferred principle to be used in health plan design (for details of these data, see Louviere &
Flynn, 2007).
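Best-worst data are often modeled sequentially: the best alternative is chosen from the full set with a logit on the utilities, and the worst is chosen from the remaining alternatives with a logit on the negated utilities. The sketch below assumes this specification (whether Table 9 was estimated with exactly this form is an assumption) and uses hypothetical class-specific utilities for a 4-alternative choice set. It checks that the probabilities over all admissible (best, worst) pairs sum to 1.

```python
import numpy as np

def best_worst_prob(utilities, best, worst):
    """P(best chosen as most preferred, worst as least preferred) under an assumed
    sequential logit: best from all items with utilities u, then worst from the
    remaining items with utilities -u."""
    if best == worst:
        raise ValueError("best and worst must be different items")
    u = np.asarray(utilities, float)
    p_best = np.exp(u[best]) / np.exp(u).sum()
    remaining = [j for j in range(u.size) if j != best]
    p_worst = np.exp(-u[worst]) / np.exp(-u[remaining]).sum()
    return p_best * p_worst

# Hypothetical class-specific utilities for a choice set of 4 principles
u_class = [0.9, 0.2, -0.1, -1.0]
p = best_worst_prob(u_class, best=0, worst=3)
```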
The class-specific utility estimates presented in Table 9 yield different rankings of the 15 principles, with the
highest estimate in each class identifying the principle that class most prefers. Because of the relatively
large number of classes and the potential scale confound, interpretation becomes somewhat difficult.
Table 9. Utility estimates for the 8-class model.
Description Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 Class 7 Class 8