Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion
Complex sampling in latent variable models
Daniel Oberski
Department of methodology and statistics
Complex sampling in latent variable models Daniel Oberski
• When doing latent class analysis, factor analysis, IRT, or structural equation modeling, should you use sampling weights, stratification, and clustering variables?
• What is complex about surveys?
• What is ``pseudo'' about pseudo-maximum likelihood?
• What are design effects and what makes them so deft?
Outline
1. Complex surveys
2. Latent variable models (LVM)
3. Estimation of LVM under complex sampling
4. Effect on LVM
5. Conclusion
Does it make a difference?
[Figure: two panels, ``Unweighted regression'' and ``Weighted regression''.]
Source: 1988 National Maternal and Infant Health Survey (Korn and Graubard, 1995).
Latent class analysis of eating vegetables
Unweighted LCA
                 Low    High
Latent class     33%    77%
Recall 1 high    60%    80%
Recall 2 high    51%    82%
Recall 3 high    40%    81%
Recall 4 high    46%    79%

LCA using weights
                 Low    High
Latent class     18%    82%
Recall 1 high    46%    78%
Recall 2 high    39%    76%
Recall 3 high    28%    77%
Recall 4 high    39%    73%
Source: The continuing Survey of Food Intakes by Individuals (Patterson et al., 2002).
Sample surveys, ``linear estimators''
Sample surveys
Purposes:
• Descriptive;
• Analytic.
[Figure: Assessment of Health Status and Social Determinants of Health (Padgol village, Gujarat, India). Source: Boston U. India Research and Outreach Initiative.]
Sample surveys
Idea of a sample survey: you can generalize from a sample to a population if the sample is ``like'' the population (the ``representative method'').
Sample of people ``like'' the population?
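• Neyman (1934) figured this would be true on average if you draw a random sample;
• This is the theory we still use today.
``Linear estimator'':
\[
E_\pi\left[ n^{-1} \sum_{i \in \text{sample}} y_i \right] = N^{-1} \sum_{i \in \text{population}} y_i ,
\]
and generally
\[
m_n \xrightarrow{d} N\left[\mu, \operatorname{var}(m_n)\right]
\]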
``Design-consistent''
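The claim that the sample mean equals the population mean ``on average'' can be checked by brute force on a toy example. The following is a minimal sketch, assuming simple random sampling without replacement and a made-up population of six values; the names `population`, `sample_means`, and so on are illustrative only.

```python
from itertools import combinations
from statistics import mean

# Hypothetical finite population of N = 6 values (illustrative numbers only).
population = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = 3  # sample size

# Under simple random sampling without replacement every subset of size n
# is equally likely, so the design expectation E_pi is a plain average
# over all possible samples.
sample_means = [mean(s) for s in combinations(population, n)]
expected_sample_mean = mean(sample_means)
population_mean = mean(population)

print(expected_sample_mean, population_mean)
```

Any single sample mean may be far from the population mean; it is only the average over all possible samples that matches.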
``Linear estimator''
• Most of the time when people talk about ``linear estimators'', they are thinking about means and totals.
• But a proportion is a linear estimator too;
• for example, the proportions observed for response patterns.
• Even the (co)variance is a linear estimator, if you redefine $d := (y - E(Y))(y - E(Y))^T$: then $\operatorname{var}(y) = (n - 1)^{-1} \sum d$.
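The bullets above can be illustrated in the univariate case. This is a minimal sketch with made-up data, using the sample mean `ybar` as a stand-in for $E(Y)$; the cutoff 3.5 used to build the indicator is arbitrary.

```python
from statistics import mean, variance

y = [1.0, 3.0, 4.0, 6.0, 11.0]  # hypothetical observations
n = len(y)

# A proportion is just the mean of a 0/1 indicator variable.
indicator = [1 if v > 3.5 else 0 for v in y]
proportion_above = mean(indicator)

# The variance is (n - 1)^{-1} times a plain sum of the redefined
# variable d_i = (y_i - ybar)^2, with ybar standing in for E(Y).
ybar = mean(y)
d = [(v - ybar) ** 2 for v in y]
var_from_d = sum(d) / (n - 1)

print(proportion_above, var_from_d, variance(y))
```

With several variables, each $d_i$ becomes an outer product and the same sum yields the covariance matrix.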
Complications → ``complex surveys'':
• Clustering
• Stratification
• Selection with unequal probabilities $\pi_i$
Equivalent: not independently and identically distributed (iid)
Clustering
Simple random sampling: a lot of driving
[Figure: A simple random sample of voter locations in the US. Source: Lumley (2010).]
Source: Heeringa et al. (2010)
Sample clustering for several reasons:
• Geographic clustering of elements for household surveys reduces interviewing costs by amortizing travel and related expenditures over a group of observations. E.g.: NCS-R, National Health and Nutrition Examination Survey (NHANES), Health and Retirement Study (HRS).
• Sample elements may not be individually identified on the available sampling frames but can be linked to aggregate cluster units (e.g., voters at precinct polling stations, students in colleges and universities). The available sampling frame often identifies only the cluster groupings.
• One or more stages of the sample are deliberately clustered to enable the estimation of multilevel models and components of variance in variables of interest (e.g., students in classes, classes within schools).
(Heeringa et al., 2010, p. 28)
Stratification
Sample stratified by region
Stratified sampling serves several purposes:
• Relative to an SRS of equal size, smaller standard errors;
• Disproportionately allocate the sample to subpopulations, that is, oversample specific subpopulations to ensure sufficient sample sizes for analysis.
(Heeringa et al., 2010, p. 32)
Unequal probabilities of selection
Common reasons for varying probabilities of case selection in sample surveys include (Heeringa et al., 2010, p. 38--43):
• Disproportionate sampling within strata to
  • achieve an optimally allocated sample
  • deliberately increase precision for subpopulations
• Differentially sample subpopulations, e.g. NHANES oversampling of people with disabilities.
• Subsampling of observational units within sample clusters, e.g. selecting a single random respondent from the eligible members of sample households.
• Sampling probability that can be obtained only in the process of the survey data collection, e.g. in a random digit dialing (RDD) telephone survey, the number of distinct landline telephone numbers.
• Nonresponse
Linear estimators in complex samples
Problems:
Bias: If some (types of) people have a differing chance $\pi_i$ of being in the sample, the usual sample statistics will no longer (on average) equal the population quantities.
Variance: Affected by clustering/stratification.

If $\hat{\mu}_n := n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i$, notice:
\[
E_\pi\left[ n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i \right]
= N^{-1} \sum_{i \in \text{population}} \frac{\pi_i}{\pi_i} y_i
= N^{-1} \sum_{i \in \text{population}} y_i
\]
Solutions:
• Weighted estimator $\hat{\mu}_n$ is unbiased (Horvitz and Thompson, 1952);
• Can obtain the variance of the weighted estimate, $\operatorname{var}(\hat{\mu}_n)$, under clustering and stratification.
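The bias of the unweighted mean and the unbiasedness of the weighted estimator can be verified exactly on a tiny population. The sketch below makes several assumptions that are not in the slides: a made-up population of four units, Poisson sampling (each unit enters the sample independently with probability $\pi_i$), and normalization by the known population size $N$, which is one common way to write the Horvitz-Thompson estimator of a mean.

```python
from itertools import product

y = [10.0, 20.0, 30.0, 40.0]   # hypothetical population values
pi = [0.2, 0.4, 0.6, 0.8]      # hypothetical inclusion probabilities
N = len(y)
population_mean = sum(y) / N

expected_ht = 0.0          # design expectation of the weighted estimator
expected_unweighted = 0.0  # expectation of the plain sample mean
prob_nonempty = 0.0        # the plain mean is undefined for an empty sample

# Enumerate all 2^N possible samples together with their probabilities.
for included in product([0, 1], repeat=N):
    prob = 1.0
    for flag, p in zip(included, pi):
        prob *= p if flag else (1.0 - p)
    sample = [i for i, flag in enumerate(included) if flag]
    expected_ht += prob * sum(y[i] / pi[i] for i in sample) / N
    if sample:
        expected_unweighted += prob * sum(y[i] for i in sample) / len(sample)
        prob_nonempty += prob

# Condition the unweighted mean on having drawn at least one unit.
expected_unweighted /= prob_nonempty

print(population_mean, expected_ht, expected_unweighted)
```

Because units with large values are more likely to be selected here, the unweighted mean overshoots on average, while the weighted estimator does not.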
Latent variable modeling
Latent variable modeling (LVM)
• (Confirmatory) factor analysis (CFA);
• Structural Equation Modeling (SEM);
• Latent Class Analysis/Modeling (LCA/LCM);
• Latent trait modeling;
• Item Response Theory (IRT) models;
• Mixture models;
• Random effects/hierarchical/multilevel models;
• ``Anchoring vignettes'' models;
• ... etc.
• Proportions can be turned into an LC or IRT analysis;
• Covariances can be turned into a SEM analysis.

Definition
Latent variable model estimation: a way of turning observed covariances/proportions (``moments'') into LVM parameter estimates.
\[
\text{LVM} : m_n \to \hat{\theta}_n
\]
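\[
\text{LVM} : m_n \to \hat{\theta}_n
\]
Example: confirmatory factor analysis (CFA) with 1 factor, 3 indicators:
\[
\begin{aligned}
\hat{\lambda}_{11} &= \sqrt{\operatorname{cor}(y_1, y_2)\,\operatorname{cor}(y_1, y_3) / \operatorname{cor}(y_2, y_3)} \\
\hat{\lambda}_{21} &= \sqrt{\operatorname{cor}(y_1, y_2)\,\operatorname{cor}(y_2, y_3) / \operatorname{cor}(y_1, y_3)} \\
\hat{\lambda}_{31} &= \sqrt{\operatorname{cor}(y_1, y_3)\,\operatorname{cor}(y_2, y_3) / \operatorname{cor}(y_1, y_2)}
\end{aligned}
\]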
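The three formulas map observed correlations directly to loadings, which is easy to express as a function. This is a minimal sketch; the function name `one_factor_loadings` and the input correlations (chosen so that the implied loadings come out near 0.6, 0.7, and 0.8) are illustrative assumptions.

```python
from math import sqrt

def one_factor_loadings(r12, r13, r23):
    """Standardized loadings of a 1-factor, 3-indicator CFA from correlations."""
    lambda1 = sqrt(r12 * r13 / r23)
    lambda2 = sqrt(r12 * r23 / r13)
    lambda3 = sqrt(r13 * r23 / r12)
    return lambda1, lambda2, lambda3

# Hypothetical correlations: 0.6*0.7, 0.6*0.8, and 0.7*0.8.
loadings = one_factor_loadings(0.42, 0.48, 0.56)

# Under the one-factor model each correlation is a product of two loadings,
# so multiplying the estimated loadings back should reproduce the inputs.
implied_r12 = loadings[0] * loadings[1]

print(loadings, implied_r12)
```

Since the loadings are a function of the moments only, any change in how the correlations are estimated (for example, by weighting) carries straight through to the parameter estimates.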
Inference in latent variable models under simple random sampling
Data generating process:
Model / Superpopulation → Finite population → Sample
Inference:
Model ← Finite population ← Sample
(Fuller, 2009).
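The two-step data generating process can be mimicked in a small simulation. The following is a sketch under stated assumptions: a one-factor model with a common standardized loading of 0.707 as the superpopulation, a finite population of 100 units, and a simple random sample of 30; the seed, the sizes, and all function names are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import mean

random.seed(1)    # arbitrary seed, only for reproducibility
LOADING = 0.707   # assumed common loading in the superpopulation model

def draw_unit():
    """Draw (y1, y2, y3) for one unit from the one-factor superpopulation."""
    eta = random.gauss(0, 1)                 # latent variable
    resid_sd = sqrt(1 - LOADING ** 2)        # keeps each indicator at unit variance
    return [LOADING * eta + resid_sd * random.gauss(0, 1) for _ in range(3)]

def correlation(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def pairwise_correlations(units):
    y1, y2, y3 = zip(*units)
    return correlation(y1, y2), correlation(y1, y3), correlation(y2, y3)

# Step 1: the model generates a finite population.
finite_population = [draw_unit() for _ in range(100)]
# Step 2: the survey design (here simple random sampling) selects a sample.
sample = random.sample(finite_population, 30)

population_cors = pairwise_correlations(finite_population)
sample_cors = pairwise_correlations(sample)
print(population_cors, sample_cors)
```

Inference runs the arrows backwards: sample moments estimate finite-population moments, which in turn estimate the model value of about 0.5 for each correlation.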
Superpopulation → Finite population of 100 subjects
Superpopulation loadings: 0.707
[Figure: scatterplot matrix of y1, y2, y3 in the finite population. Correlations: cor(y1, y2) = 0.442, cor(y1, y3) = 0.475, cor(y2, y3) = 0.321.]
Finite-population loadings: y1: 0.810, y2: 0.546, y3: 0.587
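As a check on the numbers in this figure, the one-factor formulas can be applied by hand. This worked example assumes the three correlations pair up as cor(y1, y2) = 0.442, cor(y1, y3) = 0.475, and cor(y2, y3) = 0.321; that pairing is an assumption, chosen because it reproduces the reported loadings.

```latex
\begin{align*}
\hat{\lambda}_{11} &= \sqrt{0.442 \times 0.475 / 0.321} \approx 0.81 \\
\hat{\lambda}_{21} &= \sqrt{0.442 \times 0.321 / 0.475} \approx 0.55 \\
\hat{\lambda}_{31} &= \sqrt{0.475 \times 0.321 / 0.442} \approx 0.59
\end{align*}
```

These agree with the reported finite-population loadings up to rounding of the correlations, and all three differ from the superpopulation value of 0.707.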
When does weighting make a difference for point estimates of latent variable models?
• (Usually) when weights represent omitted variable(s) that interact with observed or latent variables;
• (Sometimes, e.g. IRT, LCA) when selection is correlated with a dependent variable;
• When the model is strongly misspecified:
[Figure: y1 plotted against x. True curve (black line), overall linear regression line (green), and regression from unequal selection/weights.]
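The misspecification case can be reproduced numerically. The sketch below is built on assumptions that are not in the slides: a true curve $y = \log x$ on the interval 0.5 to 2.0, inclusion probabilities proportional to $x$, and a straight line as the (wrong) analysis model. Instead of drawing random samples, it weights each population point by $\pi_i$ to obtain the line that an unweighted analysis of the selected sample targets; applying weights $1/\pi_i$ to that sample undoes the selection and targets the census line.

```python
from math import log

def wls_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

x = [0.5 + 0.015 * i for i in range(101)]   # grid from 0.5 to 2.0
pi = [xi / 2.0 for xi in x]                 # hypothetical inclusion probabilities
ones = [1.0] * len(x)

# Misspecified case: the true relation is curved, the fitted model is a line.
y_curve = [log(xi) for xi in x]
slope_census = wls_slope(x, y_curve, ones)        # target of the weighted analysis
slope_unweighted = wls_slope(x, y_curve, pi)      # target of the unweighted analysis

# Correctly specified case: the true relation really is a line.
y_line = [2.0 * xi + 1.0 for xi in x]
line_census = wls_slope(x, y_line, ones)
line_unweighted = wls_slope(x, y_line, pi)

print(slope_census, slope_unweighted, line_census, line_unweighted)
```

When the line is the true model, selection on $x$ leaves the slope unchanged; when the line only approximates a curve, the unweighted and weighted analyses estimate different slopes.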
Should you weight?
1. Purpose of the analysis: analytical versus descriptive;
2. Anticipated bias from an unweighted analysis;
3. If unweighted analysis is unbiased, relative magnitude of inefficiency resulting from a weighted analysis;
4. Whether variables are available and known to model the sample design instead of weighting the analysis.
(Patterson et al., 2002, p. 727)
Conclusions
• Surveys are not usually simple random samples (or iid);
• Sample design may bias the results of latent variable modeling (confidence intervals, significance tests, fit measures, parameter estimates);
• Pseudo-maximum likelihood can take the design into account without additional assumptions;
• Implemented in software. SEM: lavaan.survey in R;
• Nonparametric correction for the design;
• ``Aggregate modeling'';
• Payment is in variance (efficiency);
• Alternative is modeling the effects of strata, clusters, and the covariates behind them: ``disaggregate modeling''.
References
Fuller, W. A. (2009). Sampling statistics. Wiley, New York.
Heeringa, S., West, B., and Berglund, P. (2010). Applied survey data analysis.
Horvitz, D. and Thompson, D. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663--685.
Kish, L. (1965). Survey sampling. New York: Wiley.
Korn, E. and Graubard, B. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49(3):291--295.
Lumley, T. (2010). Complex surveys: a guide to analysis using R. Wiley.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4):558--625.
Patterson, B., Dayton, C., and Graubard, B. (2002). Latent class analysis of complex sample survey data. Journal of the American Statistical Association, 97(459):721--741.
Skinner, C., Holt, D., and Smith, T. (1989). Analysis of complex surveys. John Wiley & Sons.