Motivation Example Complete Longitudinal Data Missing outcome Missing Covariates Missing both Response and Covariates
Doubly robust estimates for longitudinal data analysis with missing response and missing covariates
Xiao-Hua Andrew Zhou, Ph.D.
Co-Investigator and Senior Biostatistician, NACC
Professor, Department of Biostatistics
University of Washington
October 2009
ADC, 2009
1 NACC UDS
2 Analysis of Complete Longitudinal Data
3 Estimating Equations for Missing Outcome
4 Methods for Handling Missing Covariates
5 New Method
   Model Formulation for Missing Response and Covariates
   Estimation and Inference
6 Simulations and Applications
   Simulations
   Applications
7 Summary
A NACC example
Using the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS), we are interested in assessing the association between patients’ characteristics and the onset of dementia.
The response is the diagnosis of dementia (yes/no).
The covariates that may be related to dementia status include sex, congestive heart failure (CVCHF, yes/no), family history of dementia (FHDEM, yes/no), diabetes (yes/no), behavioral assessment (depression or dysphoria, yes/no), hypertension (yes/no), education (years), Mini-Mental State Exam (MMSE) score, and age.
A NACC example, continued
There are 16223 subjects from 29 Alzheimer’s Disease Centers included at the entry of this study.
Follow-up visits for subjects are scheduled at approximately one-year intervals, with up to three follow-ups at present.
An example, continued
The response and the behavioral assessment covariate are missing for some subjects.
There are 8724 subjects with complete data on scheduled visits.
About 11.9% of subjects are missing both the response and the behavioral assessment; about 31.2% are missing the response but have the behavioral assessment observed; about 3.2% are missing the behavioral assessment but have the response observed; and about 53.7% have both the response and the behavioral assessment covariate observed.
GEE Approach with Complete Longitudinal Data
The method of generalized estimating equations (GEE) is a popular method for analyzing longitudinal data.
It requires only the specification of a model for the marginal mean and variance of each measurement and of a "working" matrix for the correlation between measurements in a cluster.
Notations
Let $Y_{ij}$ denote the response of individual $i$ at time $j$ ($i = 1, \ldots, N$; $j = 1, \ldots, M_i$). Let $Y_i = (Y_{i1}, \ldots, Y_{iM_i})^T$.
Let $x_{ij}$ denote a vector of covariates for individual $i$ at time $j$, and $x_i = (x_{i1}^T, \ldots, x_{iM_i}^T)^T$.
Let $\mu_{ij} = E(Y_{ij} \mid x_{ij})$, $g(\mu_{ij}) = \beta^T x_{ij}$; let $\mu_i = (\mu_{i1}, \ldots, \mu_{iM_i})^T$.
GEE for Complete Data Analysis
The GEE for complete data are
$$\sum_{i=1}^{N} U_i(\beta, \rho; Y_i, x_i) = 0,$$
where
$$U_i(\beta, \rho; Y_i, x_i) = \frac{\partial \mu_i^T}{\partial \beta}\, V_i(\rho)^{-1} (Y_i - \mu_i),$$
and $V_i(\rho)$ is the working covariance matrix of $Y_i$.
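As a worked special case (an illustration, not taken from the slides), assume the link $g$ is the logit and the working covariance is the independence structure $V_i = \mathrm{diag}\{\mu_{ij}(1-\mu_{ij})\}$, so that $\rho$ drops out. Under those assumptions the variance terms cancel and the estimating function simplifies:

```latex
\begin{aligned}
\frac{\partial \mu_{ij}}{\partial \beta}
  &= \mu_{ij}(1-\mu_{ij})\, x_{ij}, \\
U_i(\beta; Y_i, x_i)
  &= \sum_{j=1}^{M_i} \mu_{ij}(1-\mu_{ij})\, x_{ij}\,
     \{\mu_{ij}(1-\mu_{ij})\}^{-1} (Y_{ij}-\mu_{ij})
   = \sum_{j=1}^{M_i} x_{ij}\,(Y_{ij}-\mu_{ij}).
\end{aligned}
```

In this case the GEE coincide with the score equations of an ordinary logistic regression fitted to the pooled observations; the within-subject correlation then enters only through the variance estimate.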
Asymptotic results
When $x_i$ contains only time-independent covariates, under some regularity conditions, the GEE yield estimators that are consistent.
If $x_i$ includes some time-dependent covariates, the GEE still yield consistent estimators under the additional assumption that $E(Y_{ij} \mid x_i) = E(Y_{ij} \mid x_{ij})$. If this is not the case, then for consistency the independence working correlation should be used.
Time-dependent Covariates
Let $L_{ij}$ denote all the data that should be collected on individual $i$ at time $j$.
Let $\bar{L}_{ij}$ denote the data available on individual $i$ by time $j$.
Let $\underline{L}_{ij}$ denote the data not yet available by time $j$.
Note that $L_{ij}$ includes both $Y_{ij}$ and $x_{ij}$.
Drop-out
Let $R_{ij} = 1$ if measurement $j$ on individual $i$ is observed and $R_{ij} = 0$ otherwise.
Assume monotone drop-out: $R_{ij} = 0$ implies $R_{ik} = 0$ for all times $k > j$.
Let $C_{ij} = 1$ if subject $i$'s last observed measurement is at time $j$, and $C_{ij} = 0$ otherwise.
We assume that the covariates included in $L_{ij}$ are chosen so that the data can be assumed to be Missing at Random (MAR), i.e., the probability of missingness depends only on the observed data.
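The drop-out notation can be made concrete with a small sketch. The function names below are hypothetical helpers introduced only for illustration; they check the monotone pattern and build the indicator of the last observed visit from one subject's response indicators.

```python
from typing import List


def is_monotone(r: List[int]) -> bool:
    """Return True if R_ij = 0 implies R_ik = 0 for all k > j."""
    seen_missing = False
    for r_j in r:
        if r_j == 0:
            seen_missing = True
        elif seen_missing:  # an observed visit after a missed one
            return False
    return True


def last_observed_indicator(r: List[int]) -> List[int]:
    """Return C_i, with C_ij = 1 only at the last observed visit."""
    if not is_monotone(r):
        raise ValueError("pattern is not monotone drop-out")
    n_observed = sum(r)
    return [1 if j == n_observed - 1 else 0 for j in range(len(r))]


r_i = [1, 1, 0, 0]  # observed at visits 1 and 2, then dropped out
print(is_monotone(r_i))              # True
print(last_observed_indicator(r_i))  # [0, 1, 0, 0]
print(is_monotone([1, 0, 1]))        # False: intermittent missingness
```

A pattern such as `[1, 0, 1]` is intermittent rather than monotone, so it falls outside the drop-out setting assumed on this slide.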
GEE for Complete-Data
$$\sum_{i=1}^{N} U_i(\beta, \rho; Y_i, x_i) = 0,$$
where
$$U_i(\beta, \rho; Y_i, x_i) = \frac{\partial \mu_i^T}{\partial \beta}\, V_i(\rho)^{-1} (Y_i - \mu_i),$$
and $V_i(\rho)$ is the working covariance matrix of $Y_i$.
These equations yield estimates that are consistent if the data are Missing Completely at Random (MCAR), but not necessarily if they are MAR.
Re-weighting
With missing data, we can base our estimates on the complete cases, but re-weight them according to the probability of being observed.
The estimating equations are then
$$\sum_{i=1}^{N} \frac{\partial \mu_i^T}{\partial \beta}\, V_i(\rho)^{-1} \Delta_i(\alpha) (Y_i - \mu_i) = 0,$$
where $\Delta_i(\alpha) = \mathrm{diag}(R_{i1}/\pi_{i1}, \ldots, R_{iM_i}/\pi_{iM_i})$ and $\pi_{ij} = \pi_{ij}(\alpha)$ is the probability, according to a specified dropout model, that measurement $j$ on subject $i$ is observed.
Under drop-out missing data,
$$\pi_{ij}(\alpha) = (1 - \lambda_{i1}(\alpha)) \cdots (1 - \lambda_{ij}(\alpha)),$$
where $\lambda_{ij}(\alpha) = P(R_{ij} = 0 \mid \bar{L}_{i,j-1}, R_{i,j-1} = 1)$ is the probability of dropping out at time $j$ given the data $\bar{L}_{i,j-1}$ available by time $j-1$.
The resulting estimates are consistent if the data are MAR, as long as the probability model for the missingness is correctly specified.
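The product formula for the observation probabilities and the resulting inverse-probability weights can be sketched as follows. The function names and the hazard values are hypothetical and chosen only for illustration; in practice the hazards would come from a fitted dropout model.

```python
from typing import List


def observation_probabilities(hazards: List[float]) -> List[float]:
    """Turn drop-out hazards lambda_i1..lambda_iM into pi_i1..pi_iM.

    pi_ij = (1 - lambda_i1) * ... * (1 - lambda_ij).
    """
    probs = []
    still_in = 1.0
    for lam in hazards:
        still_in *= 1.0 - lam
        probs.append(still_in)
    return probs


def ipw_weights(r: List[int], hazards: List[float]) -> List[float]:
    """Diagonal of Delta_i: R_ij / pi_ij for each visit j."""
    pis = observation_probabilities(hazards)
    return [r_j / pi_j for r_j, pi_j in zip(r, pis)]


hazards_i = [0.0, 0.2, 0.5]  # hypothetical fitted hazards for one subject
r_i = [1, 1, 0]              # subject dropped out before visit 3
print(observation_probabilities(hazards_i))
print(ipw_weights(r_i, hazards_i))
```

Visits that were unlikely to be observed receive larger weights, and unobserved visits receive weight zero, which is how the complete cases stand in for subjects who dropped out.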
Imputation
Alternatively, we can impute, or "guess", the missing values based on some probability model.
The estimates are then based on both the observed data and the imputed data.
The complete-case estimating equations are used, but after imputing missing responses with their expected values:
$$E(Y_{ij} \mid \bar{L}_{ik}, R_{ik} = 1), \quad \text{for } j > k,$$
where $\bar{L}_{ik}$ is the data available on individual $i$ by time $k$.
The imputations are based on specified regression models.
The resulting estimates are consistent if the data are MAR, as long as the probability model for the imputations is correct.
Doubly-robust Estimating Equations
The inverse probability weighting (IPW) estimates make no use of the available data on subjects with missing measurements.
Let $d(\bar{L}_M, \beta) = U(\beta, \rho; Y, x)$ be the contribution of a fully observed subject to the estimating equations.
For drop-out missing data, the IPW estimating equations can be augmented by a term $F(C, \bar{L}_C, \beta)$ satisfying $E_C\{F(C, \bar{L}_C, \beta) \mid \bar{L}_M\} = 0$.
The resulting augmented estimating equations are
$$\sum_{i=1}^{N} \left\{ \frac{R_{iM_i}}{\pi_{iM_i}}\, d(\bar{L}_{M_i}, \beta) + F(C, \bar{L}_C, \beta) \right\} = 0.$$
Doubly-robust Estimating Equations (2)
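The optimal choice of augmentation term is
$$F_{opt}(C, \bar{L}_C, \beta) = \sum_{j=1}^{M-1} \left( \frac{C_j - \lambda_{j+1} R_j}{\pi_{j+1}} \right) H_j(\beta),$$
where $H_j(\beta) = E_{\underline{L}_j}\{ d(\bar{L}_M, \beta) \mid \bar{L}_j, R_j = 1 \}$.
We specify models for $H_j(\beta)$, $j = 1, \ldots, M-1$, which involve parameters $\gamma$.
Let $\hat{\alpha}$ and $\hat{\gamma}$ denote consistent estimators of $\alpha$ and $\gamma$.
Then, in the estimating equations, replace $\lambda_j$, $\pi_j$, and $H_j$ with $\lambda_j(\hat{\alpha})$, $\pi_j(\hat{\alpha})$, and $H_j(\beta, \hat{\gamma})$.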
Properties of DR Estimating Equations
If:
the data are MAR,
the marginal model is correct, $g(\mu_{ij}) = \beta^T x_{ij}$, and
either the dropout model $\pi_j$ or the model for $H_j$ (or both) is correctly specified,
then the solution $\hat{\beta}$ to the estimating equations is consistent for $\beta$.
Furthermore, if both the dropout model and the model for $H_j$ are correct, then this solution $\hat{\beta}$ is optimal in the sense that it has the smallest asymptotic variance among estimates from augmented estimating equations. A consistent estimate of this variance exists in closed form.
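The double-robustness property can be checked exactly in a much simpler setting than the longitudinal one on these slides. The sketch below swaps in a cross-sectional problem: estimating a mean $E(Y)$ when $Y$ is missing at random given a binary covariate $X$, using the augmented IPW estimator. All probabilities are hypothetical, and the code computes the large-sample limit of the estimator analytically rather than simulating data.

```python
# Exact (population-level) check of double robustness for a simple mean.
# All numbers below are hypothetical and chosen only for illustration.

P_X = {0: 0.5, 1: 0.5}           # distribution of a binary covariate X
TRUE_MEAN_Y = {0: 0.2, 1: 0.6}   # E(Y | X = x), the true outcome model
TRUE_PI = {0: 0.9, 1: 0.5}       # P(R = 1 | X = x), the true missingness model

WRONG_MEAN_Y = {0: 0.3, 1: 0.3}  # misspecified outcome model (ignores X)
WRONG_PI = {0: 0.7, 1: 0.7}      # misspecified missingness model (ignores X)


def aipw_limit(pi_model, mean_model):
    """Large-sample limit of the augmented IPW estimator of E(Y).

    The estimator averages R*Y/pi(X) + (1 - R/pi(X)) * m(X). Under MAR
    given X, its expectation under the true law is the sum below.
    """
    total = 0.0
    for x, p_x in P_X.items():
        ratio = TRUE_PI[x] / pi_model[x]  # E(R | x) / pi(x)
        total += p_x * (ratio * TRUE_MEAN_Y[x] + (1.0 - ratio) * mean_model[x])
    return total


target = sum(P_X[x] * TRUE_MEAN_Y[x] for x in P_X)  # true E(Y)

print("target              ", target)
print("pi right, m wrong   ", aipw_limit(TRUE_PI, WRONG_MEAN_Y))
print("pi wrong, m right   ", aipw_limit(WRONG_PI, TRUE_MEAN_Y))
print("both wrong          ", aipw_limit(WRONG_PI, WRONG_MEAN_Y))
```

With either working model correct the limit equals the target; with both wrong it does not. This mirrors, in miniature, the "either the dropout model or the model for $H_j$" condition above.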
Methods for Handling Missing Covariates
Lipsitz et al. (1999) considered the doubly robust estimate in a cross-sectional study with a missing covariate.
Notations:
$y_i$: response
$x_i$: covariate vector that is always observed
$z_i$: covariate that is subject to missingness
$r_i$: missing indicator for $z_i$
Joint density of $(r_i, y_i, z_i \mid x_i)$:
$$\begin{aligned} p(r_i, y_i, z_i \mid x_i) &= p(r_i \mid y_i, z_i, x_i, \omega)\, p(y_i \mid z_i, x_i, \beta)\, p(z_i \mid x_i, \alpha) \\ &= p(r_i \mid y_i, x_i, \omega)\, p(y_i \mid z_i, x_i, \beta)\, p(z_i \mid x_i, \alpha) \quad \text{(MAR)} \end{aligned}$$
Score Equation for Complete Data
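The likelihood-based score equation:
$$\sum_{i=1}^{n} \begin{pmatrix} u_{1i}(\beta) \\ u_{2i}(\alpha) \\ u_{3i}(\omega) \end{pmatrix} = 0,$$
where
$$u_{1i}(\beta; y_i, x_i, z_i) = \frac{\partial \log p(y_i \mid x_i, z_i, \beta)}{\partial \beta},$$
$$u_{2i}(\alpha; z_i, x_i) = \frac{\partial \log p(z_i \mid x_i, \alpha)}{\partial \alpha},$$
$$u_{3i}(\omega; y_i, x_i, r_i) = \frac{\partial \log p(r_i \mid x_i, y_i, \omega)}{\partial \omega}.$$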
Methods for Handling Missing Covariates
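With missing data, the maximum likelihood estimate $\hat{\gamma} = (\hat{\beta}', \hat{\alpha}', \hat{\omega}')'$ solves
$$u^*(\hat{\gamma}) = \sum_{i=1}^{n} u_i^*(\hat{\gamma}) = \sum_{i=1}^{n} E\left[ \begin{pmatrix} u_{1i}(\hat{\beta}) \\ u_{2i}(\hat{\alpha}) \\ u_{3i}(\hat{\omega}) \end{pmatrix} \,\middle|\, \text{observed data} \right] = 0.$$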
Methods for Handling Missing Covariates
We can further show that
$$u^*(\gamma) = \sum_{i=1}^{n} \begin{pmatrix} r_i\, u_{1i}(\beta; y_i, x_i, z_i) + (1 - r_i)\, E_{z_i \mid y_i, x_i}[u_{1i}(\beta; y_i, x_i, z_i)] \\ r_i\, u_{2i}(\alpha; z_i, x_i) + (1 - r_i)\, E_{z_i \mid y_i, x_i}[u_{2i}(\alpha; z_i, x_i)] \\ u_{3i}(\omega; y_i, x_i, r_i) \end{pmatrix}$$
Solving $u^*(\hat{\gamma}) = 0$ gives the MLE.
The asymptotic properties of $(\hat{\beta}, \hat{\alpha})'$ do not depend on the missing data model.
If $p(y_i \mid x_i, z_i)$ and $p(z_i \mid x_i)$ are correctly specified, solving $u^*(\hat{\gamma}) = 0$ gives consistent estimates $(\hat{\beta}, \hat{\alpha})'$.
If $p(y_i \mid x_i, z_i)$ and/or $p(z_i \mid x_i)$ is misspecified, then $\hat{\beta}$ will not be consistent.
Methods for Handling Missing Covariates
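Weighted GEE
$$S(\gamma) = \sum_{i=1}^{n} \begin{pmatrix} \dfrac{r_i}{\pi_i}\, u_{1i}(\beta; y_i, x_i, z_i) + \left(1 - \dfrac{r_i}{\pi_i}\right) E_{z_i \mid y_i, x_i}[u_{1i}(\beta; y_i, x_i, z_i)] \\ \dfrac{r_i}{\pi_i}\, u_{2i}(\alpha; z_i, x_i) + \left(1 - \dfrac{r_i}{\pi_i}\right) E_{z_i \mid y_i, x_i}[u_{2i}(\alpha; z_i, x_i)] \\ u_{3i}(\omega; y_i, x_i, r_i) \end{pmatrix}$$
where $\pi_i = P(r_i = 1 \mid y_i, x_i)$.
Doubly robust estimate: solving $S(\hat{\gamma}) = 0$ gives an asymptotically unbiased estimate of $\beta$ when either $\pi_i$ or $p(z_i \mid x_i)$ is correctly specified.
EM algorithm for the estimate.
Asymptotic variance:
$$\mathrm{Var}(\hat{\gamma}) = \left\{ \sum_{i=1}^{n} E\left[ \frac{\partial S_i(\gamma)}{\partial \gamma'} \right] \right\}^{-1} \sum_{i=1}^{n} E[S_i(\gamma) S_i'(\gamma)] \left\{ \sum_{i=1}^{n} E\left[ \frac{\partial S_i(\gamma)}{\partial \gamma} \right] \right\}^{-1}$$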
Model Formulation (Continued)
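Missing data models: $\lambda_{ijk} = P(R_{ij} = k \mid \bar{R}_{ij}, Y_i, X_i, Z_i)$, $k = 0, 1, 2, 3$
$$\log\left( \frac{\lambda_{ijk}}{\lambda_{ij0}} \right) = u_{ijk}' \alpha_k, \quad k = 1, 2, 3$$
$\bar{R}_{ij}$: missing response indicator history
Covariate model: $\omega_{ij} = E(X_{ij} \mid \bar{X}_{ij}, Z_i)$
$$h(\omega_{ij}) = v_{ij}' \gamma$$
$\bar{X}_{ij}$: covariate history
$\theta = (\beta', \alpha', \gamma')'$, where $\beta$ is of interest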
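The missing data model is a baseline-category (multinomial) logit with pattern $k = 0$ as the reference. As a minimal sketch, assuming the three linear predictors $u_{ijk}'\alpha_k$ have already been computed, the four pattern probabilities can be recovered as below. The function name and the numeric values are hypothetical.

```python
import math
from typing import List


def pattern_probabilities(eta: List[float]) -> List[float]:
    """Return [lambda_0, lambda_1, lambda_2, lambda_3].

    eta[k-1] is the linear predictor u_ijk' alpha_k for k = 1, 2, 3, so
    log(lambda_k / lambda_0) = eta[k-1]; pattern 0 is the reference.
    """
    exp_eta = [math.exp(e) for e in eta]
    denom = 1.0 + sum(exp_eta)
    return [1.0 / denom] + [e / denom for e in exp_eta]


eta_ij = [0.5, -1.0, 2.0]  # hypothetical linear predictors for k = 1, 2, 3
lam = pattern_probabilities(eta_ij)
print(lam, sum(lam))
print([math.log(lam[k] / lam[0]) for k in (1, 2, 3)])  # recovers eta_ij
```

Because the probabilities are normalized by a common denominator, they sum to one and the log ratios against the reference pattern return the linear predictors.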
Model Formulation (Continued)
MAR assumption:
$$P(R_{ij} = k \mid \bar{R}_{ij}, Y_i, X_i, Z_i) = P(R_{ij} = k \mid \bar{R}_{ij}, Y_i^{(o)}, X_i^{(o)}, Z_i)$$
$Y_i = (Y_i^{(o)}, Y_i^{(m)})$
$X_i = (X_i^{(o)}, X_i^{(m)})$
Model Formulation (Continued)
Weighted GEE (WGEE) for β:
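$$S_1(\theta) = \sum_{i=1}^{n} \left[ D_i M_i (Y_i - \mu_i) + E_{Y_i^{(m)}, X_i^{(m)} \mid Y_i^{(o)}, X_i^{(o)}, Z_i}\left[ D_i N_i (Y_i - \mu_i) \right] \right] = 0$$
$M_i = \kappa^{-1} F_i^{-1/2} [C_i^{-1} \bullet \Delta_i] F_i^{-1/2}$
$N_i = \kappa^{-1} F_i^{-1/2} [C_i^{-1} \bullet (11' - \Delta_i)] F_i^{-1/2}$
$F_i = \mathrm{diag}(\mathrm{var}(Y_{ij} \mid X_{ij}, Z_{ij}),\ j = 1, \ldots, J_i)$
$C_i$: working correlation matrix
$\Delta_i = [\delta_{ijk}]$ with
$\delta_{ijk} = [I(R_{ij} = 1, R_{ik} = 3) + I(R_{ij} = 3, R_{ik} = 3)] / \pi_{ijk}$ for $j \neq k$
and $\delta_{ijj} = I(R_{ij} = 3) / \pi_{ij}$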
Model Formulation (Continued)
Weighted GEE (WGEE) for γ:
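$$S_2(\theta) = \sum_{i=1}^{n} \left[ v_i \Delta_i^* (X_i - \omega_i) + E_{X_i^{(m)} \mid X_i^{(o)}, Z_i}\left[ v_i (I - \Delta_i^*)(X_i - \omega_i) \right] \right] = 0$$
$\Delta_i^* = \mathrm{diag}(I(R_{ij} = 1 \text{ or } 3) / \pi_{ij}^x,\ j = 1, \ldots, J_i)$
$\pi_{ij}^x = P(R_{ij} = 1 \text{ or } 3 \mid Y_i, Z_i, X_i)$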
Model Formulation (Continued)
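Estimating function for the missing data parameter $\alpha$:
$$S_3(\alpha) = \sum_{i=1}^{n} \sum_{j=1}^{J_i} \sum_{k=0}^{3} \frac{I(R_{ij} = k)}{\lambda_{ijk}} \frac{\partial \lambda_{ijk}}{\partial \alpha} = 0$$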
Estimation and Inference
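Solve the estimating equations
$$S(\hat{\theta}) = \begin{pmatrix} S_1(\hat{\theta}) \\ S_2(\hat{\theta}) \\ S_3(\hat{\alpha}) \end{pmatrix} = \sum_{i=1}^{n} S_i(\hat{\theta}) = 0$$
EM algorithm for the estimation
Variance estimate
$$\mathrm{Var}(\hat{\theta}) = \left\{ \sum_{i=1}^{n} E\left[ \frac{\partial S_i(\theta)}{\partial \theta} \right] \right\}^{-1} \sum_{i=1}^{n} E[S_i(\theta) S_i'(\theta)] \left\{ \sum_{i=1}^{n} E\left[ \frac{\partial S_i(\theta)}{\partial \theta} \right]' \right\}^{-1}.$$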
Estimation and Inference (Continued)
Doubly robust estimate
If the missing data model is correctly specified, we get an asymptotically unbiased estimate of $\beta$ whether or not the covariate model is correctly specified.
If the covariate model is correctly specified, we get an asymptotically unbiased estimate of $\beta$ whether or not the missing data model is correctly specified.
Simulations
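Response model is $\mathrm{logit}(\mu_{ij}) = \beta_0 + \beta_1 x_{ij} + \beta_2 Z_{ij}$, $j = 1, 2, 3$, with exchangeable correlation $\rho$.
Covariate model
$$\mathrm{logit}\, \omega_{ij} = \gamma_0 + \gamma_1 X_{i,j-1} + \gamma_2 Z_{ij}$$
Missing data model
$$\log\left( \frac{\lambda_{ijk}}{\lambda_{ij0}} \right) = \alpha_{0k} + \alpha_{1k1} I(R_{i,j-1} = 1) + \alpha_{1k2} I(R_{i,j-1} = 2) + \alpha_{1k3} I(R_{i,j-1} = 3) + \alpha_{2k}\, y_{i,j-1}^{(o)} + \alpha_{3k}\, x_{i,j-1}^{(o)}$$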
Simulations (Continued)
Methods considered
1 EM(x+): EM with correct covariate model
2 WGEE(x+, r+): WGEE with correct covariate and missing data models
3 WGEE(x−, r+): WGEE with incorrect covariate and correct missing data models
4 WGEE(x+, r−): WGEE with correct covariate and incorrect missing data models
5 WGEE(x−, r−): WGEE with incorrect covariate and incorrect missing data models
6 cc: complete case MLE
Simulations (Continued)
Table: Empirical bias, standard deviation, and coverage probabilities for six approaches to estimation and inference with incomplete covariate and response data ($\rho = 0.6$, $\alpha_2 = \gamma_2 = -2$)
Table: Empirical bias, standard deviation, and coverage probabilities for six approaches to estimation and inference with incomplete covariate and response data ($\rho = 0.3$, $\alpha_2 = \gamma_2 = -2$)