Longitudinal Data Analysis - Note Set 13

7/24/2019 Longitudinal Data Analysis - Note Set 13

Topics in the Remainder of the Course

Longitudinal analysis when the model for the outcome variable is nonlinear

Binary outcomes (logistic regression)

Counts (Poisson regression)

Generalized estimating equations for marginal models

Nonlinear mixed effects analysis for subject-specific models

Inverse Probability of Missingness weighting

Hierarchical models for multilevel data

Topics for Today

Logistic Regression

Poisson Regression

The Generalized Linear Model

Review: Logistic and Poisson Regression

We begin with a review of logistic and Poisson regression for a single response variable. This review will explain why we need a different approach when the regression model for the response is nonlinear, that is, when the expected response does not change linearly with changes in the value of the covariates.

Logistic Regression:

So far, we have considered linear regression models for a continuous response, $Y$, of the following form:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + e$

In the univariate case, the response variable, $Y$, was assumed to have a normal distribution with mean

$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

and with variance $\sigma^2$.

The population intercept, $\beta_0$, represents the mean value of the response when all of the covariates have value equal to zero.

Regression coefficients for covariates, say $\beta_1$, represent the expected change in the mean response for a single-unit change in $X_1$, given no change in the values of the other covariates.

In many studies, however, we are interested in a response variable that is dichotomous rather than continuous.

This situation calls for a regression model for a binary (or dichotomous) response.

Let $Y$ be a binary response, where

$Y = 1$ represents a `success';

$Y = 0$ represents a `failure'.

Then the mean of the binary response variable, denoted $\pi$, is the proportion of successes or the probability that the response equals 1.

That is,

$\pi = E(Y) = \Pr(Y = 1) = \Pr(\text{`success'})$

With a binary response, we are usually interested in estimating the probability, $\pi$, and modeling how it depends on the values of covariates.

The most common approach to this problem is logistic regression.

A naive strategy for modeling a binary response is to perform an ordinary linear regression analysis using the regression model

$\pi = E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

This is sometimes done, but it is usually not employed in the analysis of subject-level data: $\pi$ is a probability and is restricted to values between 0 and 1, whereas the linear predictor on the right-hand side can take any real value.

Also, the usual assumption of homogeneity of variance would be violated, since the variance of a binary response depends on the mean, i.e.

$\mathrm{var}(Y) = \pi(1 - \pi)$

Instead, we can consider a logistic regression model where

$\ln[\pi/(1 - \pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
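The contrast between the naive linear model and the logistic model can be illustrated with a minimal Python sketch. The coefficient values and the function names (`logit`, `inv_logit`, `bernoulli_variance`) are assumptions made only for this illustration; they do not come from any data set in these notes.

```python
import math

def logit(p):
    """Log odds ln[p / (1 - p)] for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map any real linear predictor eta back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_variance(p):
    """Variance of a binary response with success probability p."""
    return p * (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = 0.2, 0.3

for x in (-2, 0, 2, 4):
    eta = beta0 + beta1 * x       # linear predictor
    naive = eta                   # naive linear model sets pi = eta; can leave [0, 1]
    pi = inv_logit(eta)           # logistic model: strictly between 0 and 1
    print(x, round(naive, 2), round(pi, 3), round(bernoulli_variance(pi), 3))
```

With these made-up coefficients the naive fitted value is negative at x = -2 and exceeds 1 at x = 4, while the logistic fitted probability stays inside the unit interval and its variance changes with the mean.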

[Figure: Relation between $\log(\pi/(1 - \pi))$ and $\pi$]


This model accommodates the constraint that $\pi$ is restricted to values between 0 and 1.

Recall that $\pi/(1 - \pi)$ is defined as the odds of success.

Therefore, modeling $\pi$ with a logistic function can be considered to be analogous to a linear regression model where the mean of the continuous response has been replaced by the logarithm of the odds of success.

Note that the relationship between $\pi$ and the covariates is non-linear. We can use ML estimation to obtain estimates of the logistic regression parameters, under the assumption that the binary responses are Bernoulli random variables.
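For reference, the likelihood that ML estimation maximizes can be written out as follows. This is a sketch under the univariate assumption of independent Bernoulli responses; the subject index $i = 1, \dots, N$, the symbol $\eta_i$ for the linear predictor, and the use of $\pi$ for the success probability are notation introduced here for the illustration.

```latex
% Notation introduced for this sketch: i indexes subjects, eta_i is the
% linear predictor for subject i, and responses are assumed independent.
\begin{align*}
\eta_i &= \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik},
\qquad \pi_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\
L(\beta) &= \prod_{i=1}^{N} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} \\
\ln L(\beta) &= \sum_{i=1}^{N} \left[ y_i \eta_i - \ln\!\left(1 + e^{\eta_i}\right) \right]
\end{align*}
```

The last line follows from $\ln[\pi_i/(1 - \pi_i)] = \eta_i$ and $\ln(1 - \pi_i) = -\ln(1 + e^{\eta_i})$. There is no closed-form maximizer, so the estimates are found iteratively.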

Given the logistic regression model

$\ln[\pi/(1 - \pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

the population intercept, $\beta_0$, has interpretation as the log odds of success when all of the covariates take on the value zero.

The population slope, say $\beta_1$, has interpretation in terms of the change in log odds of success for a single-unit change in $X_1$ given that all of the other covariates remain constant.

When one of the covariates is dichotomous, say $X_1$, then $\beta_1$ has a special interpretation:

$\exp(\beta_1)$ is the odds ratio, or ratio of odds of success, for the two possible levels of $X_1$ (given that all of the other covariates remain constant).
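The odds-ratio interpretation can be checked numerically with a short Python sketch. The intercept and slope values below are hypothetical, as are the helper names `prob_success` and `odds`; the point is only that the ratio of the two odds equals the exponentiated slope.

```python
import math

def prob_success(beta0, beta1, x1):
    """Success probability under a logistic model with one covariate."""
    eta = beta0 + beta1 * x1
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds of success for a probability p."""
    return p / (1.0 - p)

# Hypothetical intercept and slope for a dichotomous covariate X1.
beta0, beta1 = -1.0, 0.7

p0 = prob_success(beta0, beta1, 0)   # probability of success when X1 = 0
p1 = prob_success(beta0, beta1, 1)   # probability of success when X1 = 1
odds_ratio = odds(p1) / odds(p0)

print("odds ratio from probabilities:", odds_ratio)
print("exp(beta1):                   ", math.exp(beta1))
```

The two printed values agree up to floating-point error, and the log odds at X1 = 0 recovers the intercept.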

Keep in mind that as $\pi$ increases,

the odds of success increase

and the log odds of success increase.

Similarly, as $\pi$ decreases,

the odds of success decrease

and thus the log odds of success decrease.

Longitudinal Assessment of Neurocognitive Function After CABG

NEJM 2001; 344:395-402

The investigators studied predictors of cognitive decline five years after surgery in 261 patients undergoing surgery. Decline in postoperative function was defined as a drop of 1 SD or more in the scores on tests of any one of four domains of cognitive function.

Factors predicting long-term cognitive decline were determined by multivariable logistic regression.

The incidence of cognitive decline was 53 percent at discharge, 36 percent at six weeks, and 42 percent at five years.

Newman, M. F. et al. N Engl J Med 2001;344:395-402

Univariable and Multivariable Predictors of Cognitive Decline and Change in the Composite Cognitive Index at Five Years

Poisson Regression

In Poisson regression, the response variable is a count (e.g. number of cases of a disease in a given period of time) and the Poisson distribution provides the basis of likelihood-based inference.

Often the counts may be expressed as rates. That is, the count or absolute number of events is often not satisfactory because any comparison will depend on the sizes of the groups (or the `time at risk') that generated the observations.

Like a proportion or probability, a rate provides a basis for direct comparison.

In either case, Poisson regression relates the expected counts or rates to a set of covariates.

The Poisson regression model has two components:

1. The distributional assumption: The response variable is a count and is assumed to have a Poisson distribution.

That is, the probability that a specific number of events, $Y$, occurs is

$\Pr(Y \text{ events}) = e^{-\lambda} \lambda^{Y} / Y!$

Note that $\lambda$ is the expected count or number of events, and the expected rate is given by $\lambda/t$, where $t$ is a relevant baseline measure (e.g. $t$ might be the number of persons or the number of person-years of observation).
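The distributional assumption can be made concrete with a small Python sketch of the Poisson probability formula. The expected count, written `lam` here, and the person-years `t` are hypothetical values chosen for illustration, and the function names are not from the notes.

```python
import math

def poisson_pmf(y, lam):
    """Pr(Y = y) for a Poisson count with expected value lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def expected_rate(lam, t):
    """Expected rate: expected count divided by the baseline measure t."""
    return lam / t

lam = 3.0      # hypothetical expected number of events
t = 1500.0     # hypothetical person-years of observation

# Probabilities for counts 0..59; the tail beyond 59 is negligible for lam = 3.
probs = [poisson_pmf(y, lam) for y in range(60)]
mean = sum(y * p for y, p in enumerate(probs))

print("Pr(0 events):", poisson_pmf(0, lam))
print("sum of probabilities:", sum(probs))
print("mean count:", mean)
print("expected rate per person-year:", expected_rate(lam, t))
```

The probabilities sum to 1 and the mean count recovers `lam`, up to floating-point error.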

2. The regression model:

$\ln(\lambda/t) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

Since $\ln(\lambda/t) = \ln(\lambda) - \ln(t)$, the Poisson regression model can also be written as

$\ln(\lambda) = \ln(t) + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

where the `coefficient' associated with $\ln(t)$ is fixed as 1.

This adjustment term is known as an `offset'.
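The role of the offset can be seen in a minimal Python sketch with one covariate. The coefficient values and person-years are hypothetical; the sketch only shows that, because the coefficient of ln(t) is fixed at 1, doubling the time at risk doubles the expected count and leaves the expected rate unchanged.

```python
import math

def expected_count(t, beta0, beta1, x):
    """Expected count: exp(ln(t) + beta0 + beta1 * x), with offset ln(t)."""
    return math.exp(math.log(t) + beta0 + beta1 * x)

def expected_rate(t, beta0, beta1, x):
    """Expected rate: expected count per unit of the baseline measure t."""
    return expected_count(t, beta0, beta1, x) / t

# Hypothetical coefficients and covariate value, for illustration only.
beta0, beta1, x = -5.0, 0.03, 20

count_1000 = expected_count(1000.0, beta0, beta1, x)
count_2000 = expected_count(2000.0, beta0, beta1, x)

print("expected count, t = 1000:", count_1000)
print("expected count, t = 2000:", count_2000)
print("expected rate (same for both):", expected_rate(1000.0, beta0, beta1, x))
```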

Therefore, modeling $\lambda$ (or $\lambda/t$) with a log function can be considered analogous to a linear regression model where the mean of the continuous response has been replaced by the logarithm of the expected count (or rate). For this reason, the Poisson regression model is often called a log-linear model.

Once again, the relationship between $\lambda$ (or $\lambda/t$) and the covariates is non-linear.

We can use ML estimation to obtain estimates of the Poisson regression parameters, under the assumption that the responses are Poisson random variables.


Given the Poisson regression model

$\ln(\lambda/t) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

the population intercept, $\beta_0$, has interpretation as the log expected rate when all the covariates take on the value zero.

The population slope, say $\beta_1$, has interpretation in terms of the change in log expected rate for a single-unit change in $X_1$ given that all of the other covariates remain constant.

When one of the covariates is dichotomous, say $X_1$, then $\beta_1$ has a special interpretation:

$\exp(\beta_1)$ is the rate ratio for the two possible levels of $X_1$ (given that all of the other covariates remain constant).
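The rate-ratio interpretation follows in one step from the model. In the sketch below, the subscripts 1 and 0 on the expected count and baseline measure are notation introduced here to label the groups with $X_1 = 1$ and $X_1 = 0$; $\lambda$ denotes the expected count.

```latex
% Subscripts 1 and 0 label the two levels of the dichotomous covariate X_1;
% all other covariates are held at the same values in both lines.
\begin{align*}
\ln(\lambda_1/t_1) &= \beta_0 + \beta_1 (1) + \beta_2 X_2 + \dots + \beta_k X_k \\
\ln(\lambda_0/t_0) &= \beta_0 + \beta_1 (0) + \beta_2 X_2 + \dots + \beta_k X_k \\
\ln\!\left(\frac{\lambda_1/t_1}{\lambda_0/t_0}\right) &= \beta_1
\quad\Longrightarrow\quad
\frac{\lambda_1/t_1}{\lambda_0/t_0} = \exp(\beta_1)
\end{align*}
```

Subtracting the second line from the first cancels every term except $\beta_1$, which is why the other covariates must be held constant.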

Example: Prospective study of coronary heart disease (CHD).

The study observed 3154 men aged 40-50 for an average of 8 years and recorded incidence of cases of CHD.

The risk factors considered include:

Smoking exposure: 0, 10, 20, 30 cigs per day;

Systolic BP: 0 (< 140), 1 (≥ 140);

Behavior Type: 0 (type B), 1 (type A).

A simple Poisson regression model is:

$\ln(\lambda/t) = \ln(\text{rate of CHD}) = \beta_0 + \beta_1\,\text{Smoke}$

or, equivalently,

$\ln(\lambda) = \ln(t) + \beta_0 + \beta_1\,\text{Smoke}$

Person-Years   Smoking   Blood Pressure   Behavior   CHD

      5268.2         0                0          0    20
      2542.0        10                0          0    16
      1140.7        20                0          0    13
       614.6        30                0          0     3
      4451.1         0                0          1    41
      2243.5        10                0          1    24
      1153.6        20                0          1    27
       925.0        30                0          1    17
      1366.8         0                1          0     8
       497.0        10                1          0     9
       238.1        20                1          0     3
       146.3        30                1          0     7
      1251.9         0                1          1    29
       640.0        10                1          1    21
       374.5        20                1          1     7
       338.2        30                1          1    12

For these data, the ML estimate of $\beta_1$ is 0.0318. That is, the rate of CHD increases by a factor of $\exp(0.0318) = 1.032$ for each additional cigarette smoked per day.

Alternatively, the rate of CHD in smokers of one pack per day (20 cigs) is estimated to be $(1.032)^{20} = 1.88$ times the rate of CHD in non-smokers.
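One way to obtain such an estimate is Newton-Raphson maximization of the Poisson log-likelihood with ln(person-years) as the offset. The Python sketch below is an illustration under that assumption, not necessarily the software or algorithm used for these notes; the function name `fit_poisson_one_covariate` and the starting values are choices made here. The data are the 16 groups from the CHD table (person-years, cigarettes per day, CHD cases).

```python
import math

# (person-years, cigarettes per day, CHD cases) for the 16 groups in the table
data = [
    (5268.2, 0, 20), (2542.0, 10, 16), (1140.7, 20, 13), (614.6, 30, 3),
    (4451.1, 0, 41), (2243.5, 10, 24), (1153.6, 20, 27), (925.0, 30, 17),
    (1366.8, 0, 8), (497.0, 10, 9), (238.1, 20, 3), (146.3, 30, 7),
    (1251.9, 0, 29), (640.0, 10, 21), (374.5, 20, 7), (338.2, 30, 12),
]

def fit_poisson_one_covariate(rows, n_iter=25):
    """Newton-Raphson for the model ln(count) = ln(t) + b0 + b1 * x."""
    total_t = sum(t for t, _, _ in rows)
    total_y = sum(y for _, _, y in rows)
    b0, b1 = math.log(total_y / total_t), 0.0   # start at the overall log rate
    for _ in range(n_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, x, y in rows:
            mu = t * math.exp(b0 + b1 * x)      # fitted expected count
            g0 += y - mu
            g1 += x * (y - mu)
            h00 += mu
            h01 += x * mu
            h11 += x * x * mu
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0_hat, b1_hat = fit_poisson_one_covariate(data)
print("intercept:", round(b0_hat, 3))
print("slope for Smoke:", round(b1_hat, 4))
print("rate ratio per cigarette:", round(math.exp(b1_hat), 3))
```

The slope should come out close to the 0.0318 quoted above, and the fitted counts sum to the observed total number of CHD cases, as the intercept score equation requires.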

We can include the additional risk factors in the following model:

$\ln(\lambda/t) = \beta_0 + \beta_1\,\text{Smoke} + \beta_2\,\text{Type} + \beta_3\,\text{BP}$

Effect       Estimate   Std. Error

Intercept      -5.420        0.130
Smoke           0.027        0.006
Type            0.753        0.136
BP              0.753        0.129
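The multi-covariate fit can be sketched in plain Python by running Newton-Raphson on the Poisson log-likelihood with a small linear solver. This is an illustrative implementation under stated assumptions, not the software used for the notes: the helper names (`solve`, `score_and_info`, `fit_poisson`) are made up here, the design columns are ordered intercept, Smoke, Type, BP, and standard errors are taken from the inverse information matrix.

```python
import math

# Columns: person-years, smoke (cigs/day), BP (0/1), behavior type (0/1), CHD cases
rows = [
    (5268.2, 0, 0, 0, 20), (2542.0, 10, 0, 0, 16), (1140.7, 20, 0, 0, 13), (614.6, 30, 0, 0, 3),
    (4451.1, 0, 0, 1, 41), (2243.5, 10, 0, 1, 24), (1153.6, 20, 0, 1, 27), (925.0, 30, 0, 1, 17),
    (1366.8, 0, 1, 0, 8), (497.0, 10, 1, 0, 9), (238.1, 20, 1, 0, 3), (146.3, 30, 1, 0, 7),
    (1251.9, 0, 1, 1, 29), (640.0, 10, 1, 1, 21), (374.5, 20, 1, 1, 7), (338.2, 30, 1, 1, 12),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]      # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def score_and_info(beta, design, offsets, counts):
    """Score vector and information matrix of the Poisson log-likelihood."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xrow, off, y in zip(design, offsets, counts):
        mu = math.exp(off + sum(b * x for b, x in zip(beta, xrow)))
        for j in range(p):
            score[j] += xrow[j] * (y - mu)
            for k in range(p):
                info[j][k] += xrow[j] * xrow[k] * mu
    return score, info

def fit_poisson(design, offsets, counts, n_iter=25):
    """Newton-Raphson estimates and standard errors for a log-linear rate model."""
    p = len(design[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(counts) / sum(math.exp(o) for o in offsets))
    for _ in range(n_iter):
        score, info = score_and_info(beta, design, offsets, counts)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    _, info = score_and_info(beta, design, offsets, counts)
    unit = lambda j: [1.0 if i == j else 0.0 for i in range(p)]
    se = [math.sqrt(solve(info, unit(j))[j]) for j in range(p)]
    return beta, se

design = [[1.0, smoke, btype, bp] for _, smoke, bp, btype, _ in rows]
offsets = [math.log(t) for t, _, _, _, _ in rows]     # offset ln(person-years)
counts = [y for _, _, _, _, y in rows]

beta, se = fit_poisson(design, offsets, counts)
for name, b, s in zip(("Intercept", "Smoke", "Type", "BP"), beta, se):
    print(name, round(b, 3), round(s, 3))
```

If the table has been transcribed correctly, the printed estimates should be close to those reported above; small differences in the last digit are possible.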

Now, the adjusted rate of CHD (controlling for blood pressure and behavior type) increases by a factor of $\exp(0.027) = 1.027$ for each additional cigarette smoked per day.

Thus, the adjusted rate of CHD in smokers of one pack per day (20 cigs) is estimated to be $(1.027)^{20} = 1.704$ times the rate of CHD in non-smokers.
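The arithmetic behind these adjusted rate ratios can be reproduced in a few lines of Python. The only input is the adjusted Smoke coefficient, 0.027; the variable names are chosen here for illustration. The sketch also shows that raising the rounded per-cigarette factor to the 20th power gives a slightly smaller number than exponentiating 20 times the coefficient directly.

```python
import math

beta_smoke = 0.027                       # adjusted estimate for Smoke

per_cigarette = math.exp(beta_smoke)     # rate ratio per additional cigarette per day
pack_from_rounded = round(per_cigarette, 3) ** 20   # power of the rounded factor
pack_exact = math.exp(20 * beta_smoke)   # rate ratio for 20 cigarettes, no rounding

print("per cigarette:", round(per_cigarette, 3))
print("one pack, from rounded factor:", round(pack_from_rounded, 3))
print("one pack, from exp(20 * beta):", round(pack_exact, 3))
```

The gap between the last two values is purely a rounding effect; both describe the same fitted model.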

Finally, note that when a Poisson regression model is applied to data consisting of very small rates (say, $\lambda/t$


Barger, L. K. et al. N Engl J Med 2005;352:125-134

Weekly Hours That Interns Worked as a Percentage of Reported Weeks

Results

A total of 320 motor vehicle crashes were reported, including 133 that were consequential; 131 of the 320 crashes occurred on the commute from work.

Every extended shift (> 24 hrs) scheduled per month increased the monthly rate of any motor vehicle crash by 9.1 percent (95 percent confidence interval, 3.4 to 14.7 percent) and increased the monthly rate of a crash on the commute from work by 16.2 percent (95 percent confidence interval, 7.8 to 24.7 percent).
