GEE and Generalized Linear Mixed Models Tom Greene.

Post on 24-Dec-2015

GEE and Generalized Linear Mixed Models

Tom Greene

Outline

• Subject specific and population average inference in generalized linear models

• Review of classical generalized linear models with independent observations

• Generalized Estimating Equations

• Contrasts of GLMMs with GEEs

• GEE example

Classes of Generalized Linear Models

Linear Models (linear regression, ANOVA, ANCOVA)
E(Y) = X β
Responses independent

Generalized Linear Models (logistic regression, Poisson regression, etc.)
g(E(Y)) = X β
Responses independent

Linear Mixed Models
E(Y|b) = X β + Z b
Responses correlated; correlation modeled in part by “random effects”

Generalized Linear Mixed Models (GLMM)
g(E(Y|b)) = X β + Z b
Responses correlated; correlation modeled in part by “random effects”

Generalized Estimating Equations Approach (GEE)
g(E(Y)) = X β
Responses correlated

Classes of Generalized Linear Models for Correlated Data

Generalized Estimating Equations Approach (GEE): Population Average Inference
g(E(Y)) = X β
Responses correlated

• Analysis describes differences in the mean of Y across the entire population
• Analysis informative from a population perspective; most relevant from the perspective of
  – Policy makers
  – Providers desiring to optimize outcomes across the entire population

Generalized Linear Mixed Models (GLMM): Subject Specific Inference
g(E(Y|b)) = X β + Z b
Responses correlated

• Analysis describes differences in the mean of Y conditional on the patient’s specific random effect b
• Most relevant from an individual patient’s perspective
• Often b represents a dimension of frailty
  – Hence, X β tells about the relationship of Y to X among patients with the same frailty

Extreme Example

Subject specific effects of X on Pr(Death), OR = 20 per 1 unit increase in X

Population average effect of X on Pr(Death), OR = 2.7 per 1 unit increase in X
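The gap between the two odds ratios can be reproduced numerically. The sketch below assumes a random-intercept logistic model and averages the subject-specific curve over a normal random effect. The intercept and the random-effect standard deviation are hypothetical values chosen for illustration; only the subject-specific OR of 20 comes from the slide, and the code is not claimed to reproduce the slide's 2.7 exactly.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """Average expit(eta + b) over b ~ Normal(0, sigma^2) by the trapezoid rule."""
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * expit(eta + b) * density * step
    return total


def odds(p):
    return p / (1.0 - p)


# Assumptions for illustration: beta0 and sigma are hypothetical.
beta0 = -1.5            # hypothetical intercept
beta1 = math.log(20.0)  # subject-specific log odds ratio per unit of X (from the slide)
sigma = 5.0             # hypothetical SD of the random intercept b

subject_specific_or = math.exp(beta1)
population_average_or = (
    odds(marginal_prob(beta0 + beta1, sigma)) / odds(marginal_prob(beta0, sigma))
)

print(f"subject-specific OR:   {subject_specific_or:.1f}")
print(f"population-average OR: {population_average_or:.1f}")
```

The larger the spread of the random effect, the more the population-average odds ratio shrinks toward 1 relative to the subject-specific one; with sigma = 0 the two coincide.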

Example: Toenail Data

Toenail Dermatophyte Onychomycosis (TDO): Common toenail infection, difficult to treat, affecting more than 2% of population.

Design: Randomized, double-blind, parallel group, multicenter study for the comparison of two new compounds (A and B) for oral treatment.
• 2 × 189 patients randomized, 36 centers
• 48 weeks of total follow up (12 months)
• 12 weeks of treatment (3 months)
• Measurements at months 0, 1, 2, 3, 6, 9, 12

Research question: Severity relative to treatment of TDO?

Review of Generalized Linear Models (Independent Responses)

• Independent responses Yi, i = 1, 2, …, N
  – Yi with distribution from the exponential family
  – f(y; θ, ø) = exp{ [yθ − b(θ)] / a(ø) + c(y, ø) }
• Mean model
  – μi = E(Yi | Xi1, Xi2, …, Xip)
  – g(μi) = β0 + β1Xi1 + β2Xi2 + … + βpXip
• Variance function
  – Var(Yi) = ø V(μi)
  – V(μi) is a known function determined by the assumed distribution of Y within the exponential family

The mean model is the only part we have to get right for valid large-sample inference!!!
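For a response in the exponential-family form reviewed above, the mean is b′(θ) and the variance function is b″(θ). The sketch below checks this numerically for the Poisson and Bernoulli cases using finite differences. The helper names and the example values of μ are illustrative assumptions, not part of the slides.

```python
import math


def b_poisson(theta):
    """Cumulant function b(theta) for the Poisson family."""
    return math.exp(theta)


def b_bernoulli(theta):
    """Cumulant function b(theta) for the Bernoulli family."""
    return math.log1p(math.exp(theta))


def derivative(f, x, h=1e-5):
    """Central finite-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Poisson with mean 3 (hypothetical): canonical parameter is log(mu).
theta_pois = math.log(3.0)
pois_mean = derivative(b_poisson, theta_pois)        # should be close to mu
pois_var = second_derivative(b_poisson, theta_pois)  # should be close to V(mu) = mu

# Bernoulli with success probability 0.3 (hypothetical): canonical parameter is the logit.
theta_bern = math.log(0.3 / 0.7)
bern_mean = derivative(b_bernoulli, theta_bern)        # should be close to mu
bern_var = second_derivative(b_bernoulli, theta_bern)  # should be close to mu * (1 - mu)

print(f"Poisson:   mean {pois_mean:.4f}, variance function {pois_var:.4f}")
print(f"Bernoulli: mean {bern_mean:.4f}, variance function {bern_var:.4f}")
```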

Extension to GEE for Longitudinal Data

GEE: Generalized Estimating Equations (Liang & Zeger, 1986; Zeger & Liang, 1986)
• Method is semi-parametric
  – Estimating equations are derived without full specification of the joint distribution of a subject’s observations
• Instead, specification of
  – The mean model for the marginal distributions of the yij
  – The variance function of yij given μij
  – The “working” correlation matrix for the vector of repeated observations from each subject
• Relies on the independence across subjects (or clusters) to estimate consistently the variance of the regression coefficients

GEE Method Outline

1. Relate the marginal response μij = E(yij) to a linear combination of the covariates: g(μij) = Xij^T β
   • yij is the response for subject i at time j, j = 1, 2, …, J
   • Xij is a p × 1 vector of covariates
   • β is a p × 1 vector of regression coefficients
   • g(·) is the link function

2. Describe the variance of yij as a function of the mean: V(yij) = v(μij) ø
   • ø is a possibly unknown scale parameter
   • v(·) is a known variance function

Link and Variance Functions

• Normally-distributed response
  – g(μij) = μij  “Identity link”
  – v(μij) = 1
  – V(yij) = ø
• Binary response (Bernoulli)
  – g(μij) = log[μij/(1 − μij)]  “Logit link”
  – v(μij) = μij(1 − μij)
  – ø = 1
• Poisson response
  – g(μij) = log(μij)  “Log link”
  – v(μij) = μij
  – ø = 1
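The three link and variance pairs listed above can be collected in a small lookup table. This is a minimal sketch; the dictionary layout and the function name `mean_and_variance` are illustrative choices rather than any library's API.

```python
import math

# Link g, inverse link, and variance function v for the three responses on the slide.
FAMILIES = {
    "normal": {
        "link": lambda mu: mu,
        "inverse_link": lambda eta: eta,
        "variance": lambda mu: 1.0,
    },
    "bernoulli": {
        "link": lambda mu: math.log(mu / (1.0 - mu)),
        "inverse_link": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "variance": lambda mu: mu * (1.0 - mu),
    },
    "poisson": {
        "link": lambda mu: math.log(mu),
        "inverse_link": lambda eta: math.exp(eta),
        "variance": lambda mu: mu,
    },
}


def mean_and_variance(family, eta, scale=1.0):
    """Map a linear predictor eta to (mu, V(y)) with V(y) = scale * v(mu)."""
    fam = FAMILIES[family]
    mu = fam["inverse_link"](eta)
    return mu, scale * fam["variance"](mu)


print(mean_and_variance("bernoulli", 0.0))          # linear predictor 0 on the logit scale
print(mean_and_variance("poisson", math.log(4.0)))  # linear predictor log(4) on the log scale
print(mean_and_variance("normal", 2.5, scale=3.0))  # identity link with scale 3
```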

GEE Method Outline

3. Choose the form of an n × n “working” correlation matrix Ri for each Yi

Working Correlation Structures

(Slides illustrating the working correlation structures, including AR(1); the matrices themselves are not recoverable from this transcript.)
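The matrices on these slides did not survive extraction, so the sketch below builds three commonly used working correlation structures from their standard definitions. Only AR(1) is named in the transcript here; exchangeable appears later as `type=exch`, and independence is mentioned in the robustness notes. The AR(1) version indexes by visit number, which assumes equally spaced visits, whereas the toenail visits are not equally spaced.

```python
def independence(n):
    """Identity matrix: repeated measures treated as uncorrelated."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable(n, alpha):
    """Same correlation alpha between every pair of repeated measures."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def ar1(n, alpha):
    """Correlation alpha ** |j - k|, decaying with the lag between visits."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


def show(name, matrix):
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:.4f}" for value in row))


# alpha values are illustrative; 0.4212 is the exchangeable estimate printed later in the deck.
show("independence", independence(4))
show("exchangeable", exchangeable(4, 0.4212))
show("AR(1)", ar1(4, 0.5))
```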

GEE Estimation

• Define Ai = n × n diagonal matrix with v(μij) as the jth diagonal element
• Define Ri(α) = n × n “working” correlation matrix (of the n repeated measures)

Working variance–covariance matrix for Yi equals

Vi(α) = ø Ai^{1/2} Ri(α) Ai^{1/2}
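Because Ai is diagonal, the working covariance can be built element by element: entry (j, k) is ø · sqrt(v(μij)) · Ri[j][k] · sqrt(v(μik)). The sketch below does this for a binary response with an exchangeable working correlation. The fitted means are hypothetical, and the function name is an illustrative choice.

```python
import math


def working_covariance(mu, corr, variance_fn, scale=1.0):
    """V = scale * A^(1/2) R A^(1/2), with A = diag(variance_fn(mu_j))."""
    sd = [math.sqrt(variance_fn(m)) for m in mu]
    n = len(mu)
    return [[scale * sd[j] * corr[j][k] * sd[k] for k in range(n)] for j in range(n)]


# Hypothetical fitted probabilities for one subject at three visits.
mu = [0.5, 0.2, 0.1]

# Exchangeable working correlation; alpha is illustrative.
alpha = 0.4212
R = [[1.0 if j == k else alpha for k in range(3)] for j in range(3)]

# Bernoulli variance function v(mu) = mu * (1 - mu), scale fixed at 1.
V = working_covariance(mu, R, lambda m: m * (1.0 - m))

for row in V:
    print("  ".join(f"{value:.5f}" for value in row))
```

The diagonal of V holds the binomial variances, and the off-diagonal entries scale the working correlation by the two standard deviations.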

GEE vs. GLMM

1) Target of Inference:
• GEE: Population Average
• GLMM: Subject Specific

Notes: Recent work addresses how to perform population average inference under GLMM models

2) Outputs:
• GEE:
  – Coefficients relating Y to X
• GLMM:
  – Coefficients relating Y to X conditional on b
  – Estimates of subject specific random effects
  – Variance of subject specific random effects

3) Robustness:
• GEE (with robust variance estimates):
  – Inference valid in large samples even if distribution of Y and/or variance of Y are incorrectly specified
• GLMM (with model-based estimates):
  – Valid inference generally requires correct specification of distribution of Y and of variance of Y

Notes:
1) Recent proposals for robust variance estimates under GLMM
2) Inference for Linear Mixed Models remains valid if Y is not normal for large N
3) Caveat to GEE robustness: GEE can be biased if time dependent covariates are used unless an independence working correlation matrix is used

4) Efficiency (power and width of confidence intervals)
• GEE:
  – Usually fairly efficient if variance function is correctly specified
  – Between subject comparisons are nearly efficient if an independence covariance structure is used for balanced data
• GLMM:
  – Maximum likelihood estimates are asymptotically efficient as long as the model is correctly specified

5) Missing Data:
• “Classical” GEE (with robust variance estimates):
  – Valid inference if data are Missing Completely At Random (MCAR) even if variance model is wrong
  – If variance model is correct, estimate of β is still consistent if data are MAR but not MCAR (but standard errors are not correct)
• GLMM (with model-based estimates):
  – Valid inference if data are Missing At Random (MAR)

Notes:
1) Various strategies for valid GEE inference if data are MAR

Missing data
• Three general approaches to dealing with missing data under GEE which assume MAR but not MCAR
  1. Inverse probability weighting (Robins, Rotnitzky and Zhao, JASA, 1995)
  2. Multiple imputation
  3. Inverse probability weighting with augmentation, or doubly robust estimation
• Each method can incorporate covariate information not included in the GEE model itself. This can make the MAR assumption much more plausible.
• Methods 2 and 3 can be considerably more efficient than standard inverse probability weighting
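The idea behind inverse probability weighting can be seen on a toy mean estimate: each observed value is weighted by one over its probability of being observed, so observed subjects stand in for similar subjects who are missing. The data and observation probabilities below are hypothetical and treated as known. In practice the probabilities would be estimated from a dropout model, and in a weighted GEE the same weights multiply each observation's contribution to the estimating equations.

```python
# Each record: (stratum, outcome y, observed indicator r, probability pi of being observed).
# Missingness depends only on the observed stratum, so the data are MAR but not MCAR.
records = [
    ("A", 10.0, 1, 1.0), ("A", 10.0, 1, 1.0), ("A", 10.0, 1, 1.0), ("A", 10.0, 1, 1.0),
    ("B", 20.0, 1, 0.5), ("B", 20.0, 0, 0.5), ("B", 20.0, 1, 0.5), ("B", 20.0, 0, 0.5),
]

# Target: the mean that would be seen with no missing data.
full_data_mean = sum(y for _, y, _, _ in records) / len(records)

# Complete-case analysis: ignores why values are missing.
observed = [(y, pi) for _, y, r, pi in records if r == 1]
complete_case_mean = sum(y for y, _ in observed) / len(observed)

# Inverse probability weighting: weight each observed value by 1 / pi.
weights = [1.0 / pi for _, pi in observed]
ipw_mean = sum(w * y for w, (y, _) in zip(weights, observed)) / sum(weights)

print(f"full-data mean:     {full_data_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"IPW mean:           {ipw_mean:.2f}")
```

In this toy example the complete-case mean is pulled toward stratum A, which is never missing, while the weighted mean recovers the full-data value.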

6) Small to Moderate Samples:
• GEE (with robust variance estimates):
  – Estimated standard errors are unstable and biased downwards
    • Inefficient estimating equation for estimating variance
    • Effectively uses fully unstructured variance model
  – “Sample size” means the number of independent units
  – Various corrections have been proposed (available in PROC GLIMMIX)
• GLMM (with model-based estimates):
  – Large-sample approximations are often invoked, but performance usually better than GEE with small to moderate N if model is correctly specified.

More Toenail Data

• Multicenter trial comparing active vs. control oral treatments for toenail infection

• Repeated measurements of binary outcome:
  – 0 = none or mild separation
  – 1 = severe separation

• 1908 observations in 294 patients, mostly over 1 year

**** Standard GENMOD GEE program using Robust SEs ****;
**** Binary outcome leads to default logistic link function ****;
proc genmod descending;
  class id;
  model outcome = treatment month treatment*month / dist=bin;
  repeated subject=id / type=exch covb corrw;
  estimate 'Control Slope' month 1 / exp;
  estimate 'Treatment Slope' month 1 treatment*month 1 / exp;
run;

Working Correlation Matrix

        Col1    Col2    Col3    Col4    Col5    Col6    Col7
Row1  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212
Row2  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212
Row3  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212
Row4  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212
Row5  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212
Row6  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212
Row7  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter         Estimate   Std Error   95% Confidence Limits       Z   Pr > |Z|
Intercept          -0.5819      0.1720    -0.9191    -0.2446     -3.38     0.0007
treatment           0.0072      0.2595    -0.5013     0.5157      0.03     0.9779
month              -0.1713      0.0300    -0.2301    -0.1125     -5.71     <.0001
treatment*month    -0.0777      0.0541    -0.1838     0.0283     -1.44     0.1509

Contrast Estimate Results

Label                  Mean Estimate   Mean Confidence Limits   L'Beta Estimate   Standard Error
Control Slope                 0.4573     0.4427      0.4719             -0.1713           0.0300
Exp(Control Slope)                                                       0.8426           0.0253
Treatment Slope               0.4381     0.4165      0.4599             -0.2490           0.0450
Exp(Treatment Slope)                                                     0.7796           0.0351

Label                  Alpha   L'Beta Confidence Limits   Chi-Square   Pr > ChiSq
Control Slope           0.05    -0.2301      -0.1125           32.60       <.0001
Exp(Control Slope)      0.05     0.7945       0.8936
Treatment Slope         0.05    -0.3373      -0.1607           30.57       <.0001
Exp(Treatment Slope)    0.05     0.7137       0.8515
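The Exp(...) rows are the slopes and their confidence limits mapped through the exponential, which gives odds ratios per month. The sketch below redoes that arithmetic from the printed GEE estimates; the variable names are illustrative, and because the inputs are rounded to four decimals the results agree with the SAS output only to about that precision.

```python
import math

# Coefficients from the GEE parameter estimates table (log-odds scale).
beta_month = -0.1713        # slope per month in the control arm
beta_interaction = -0.0777  # additional slope per month in the treatment arm

control_slope = beta_month
treatment_slope = beta_month + beta_interaction

# Odds ratios per month: exponentiate the slopes.
control_or = math.exp(control_slope)
treatment_or = math.exp(treatment_slope)


def exp_interval(lower, upper):
    """Map confidence limits from the log-odds scale to the odds-ratio scale."""
    return math.exp(lower), math.exp(upper)


# L'Beta confidence limits copied from the contrast table.
control_ci = exp_interval(-0.2301, -0.1125)
treatment_ci = exp_interval(-0.3373, -0.1607)

print(f"Control slope   {control_slope:.4f}  OR per month {control_or:.4f}  "
      f"95% CI ({control_ci[0]:.4f}, {control_ci[1]:.4f})")
print(f"Treatment slope {treatment_slope:.4f}  OR per month {treatment_or:.4f}  "
      f"95% CI ({treatment_ci[0]:.4f}, {treatment_ci[1]:.4f})")
```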

Can ignore in this case

**** GLIMMIX GLMM Estimating Subject Specific Effects ****;
**** Binary outcome leading to default logistic link function ****;
proc glimmix method=RSPL data=toenail;
  class id;
  model outcome (event="1") = treatment month treatment*month / s dist=binary;
  random int / subject=id;
  estimate 'Control Slope' month 1 / or;
  estimate 'Treatment Slope' month 1 treatment*month 1 / or cl;
run;

Solutions for Fixed Effects

Effect            Estimate   Std Error     DF   t Value   Pr > |t|
Intercept          -0.7204      0.2370    292     -3.04     0.0026
treatment         -0.02594      0.3360   1612     -0.08     0.9385
month              -0.2782     0.03222   1612     -8.64     <.0001
treatment*month   -0.09583     0.05105   1612     -1.88     0.0607
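Placing the two fits side by side shows the population-average versus subject-specific contrast on real output. The sketch below converts the printed GEE and GLMM month slopes to odds ratios per month and reports how much larger in magnitude the GLMM slopes are. It is plain arithmetic on the rounded estimates shown in the tables, and the dictionary layout is an illustrative choice.

```python
import math

# Printed coefficients (log-odds scale) from the two analyses of the toenail data.
gee = {"month": -0.1713, "treatment*month": -0.0777}    # population average
glmm = {"month": -0.2782, "treatment*month": -0.09583}  # subject specific


def slopes(coef):
    """Return (control slope, treatment slope) per month."""
    return coef["month"], coef["month"] + coef["treatment*month"]


results = {}
for label, coef in (("GEE", gee), ("GLMM", glmm)):
    control, treatment = slopes(coef)
    results[label] = {
        "control_slope": control,
        "treatment_slope": treatment,
        "control_or": math.exp(control),
        "treatment_or": math.exp(treatment),
    }
    print(f"{label:5s} control OR/month {results[label]['control_or']:.3f}   "
          f"treatment OR/month {results[label]['treatment_or']:.3f}")

# Ratio of subject-specific to population-average slopes in the control arm.
slope_ratio = results["GLMM"]["control_slope"] / results["GEE"]["control_slope"]
print(f"GLMM / GEE control slope ratio: {slope_ratio:.2f}")
```

The subject-specific slopes are further from zero than the population-average ones, which is the same direction of attenuation as in the extreme example earlier in the deck.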

*** Small Sample;
data small;
  set toenail;
  if id <= 20;
run;

** Standard GENMOD GEE with Robust SEs: 17 Patients Only ***;
** Binary outcome leading to default logistic link function **;
proc genmod descending;
  class id;
  model outcome = treatment month treatment*month / dist=bin;
  repeated subject=id / type=exch covb corrw;
run;

Parameter         Estimate   Std Error   95% Confidence Limits       Z   Pr > |Z|
Intercept          -0.3558      0.6272    -1.5851     0.8736     -0.57     0.5706
treatment           0.0527      0.9679    -1.8444     1.9497      0.05     0.9566
month              -0.1543      0.0991    -0.3485     0.0400     -1.56     0.1196
treatment*month     0.0272      0.1725    -0.3109     0.3654      0.16     0.8746

**** GLIMMIX GEE program using Robust SEs;
**** Binary outcome leads to default logistic link function;
**** Restricted to 17 patients;
**** Small N Adjustment of Morel, Bokossa, and Neerchal (2003);
proc glimmix method=RSPL empirical=mbn data=small;
  class id;
  model outcome (event="1") = treatment month treatment*month / s dist=binary ddfm=kenwardroger;
  random _residual_ / subject=id type=cs;
run;

Solutions for Fixed Effects

Effect            Estimate   Std Error     DF   t Value   Pr > |t|
Intercept          -0.3605      0.7369     15     -0.49     0.6317
treatment          0.05762      1.1209     15      0.05     0.9597
month              -0.1530      0.1197     94     -1.28     0.2043
treatment*month    0.02560      0.1984     94      0.13     0.8976

THAT’s ALL
