BIOST 536 Lecture 14 1
Lecture 14 – Stratified Models
Outline
- Description of stratified models
- Exact and approximate conditional likelihood
- Comparison to standard logistic regression models
- Comparison to Mantel-Haenszel
Stratification may be done after the fact
Many cases and controls in the same stratum (no longer 1 case matched to m controls)
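Underlying model still:
$$\operatorname{logit}(P_{ji}) = \log\left(\frac{P_{ji}}{1 - P_{ji}}\right) = \alpha_j + \beta_1 X_{ji1} + \beta_2 X_{ji2} + \dots + \beta_k X_{jik}$$
for the j th stratum, i th person in that stratum
Stratified conditional model: “Remove” $\alpha_1, \alpha_2, \dots, \alpha_J$ from the model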
Can be compared to the standard logistic regression model to see if the coefficients are similar
Don’t have to think very hard about how to model the confounder, but do need to be conscious of the number of strata (every stratum has to have at least one case and at least one control, or the entire stratum is lost)
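The point about losing strata can be made concrete with a minimal sketch of the exact conditional likelihood contribution of a single stratum. It assumes one covariate for simplicity, and the function name and toy data are hypothetical. Conditioning on the number of cases in the stratum cancels the stratum intercept, so it never appears in the code.

```python
from itertools import combinations
from math import comb, exp


def stratum_contribution(x, y, beta):
    """Exact conditional likelihood contribution of one stratum.

    x    : covariate values, one per person in the stratum
    y    : 0/1 case indicators, same order as x
    beta : log odds ratio per unit of x
    """
    n_cases = sum(y)
    # Numerator: the set of people actually observed to be cases.
    num = exp(beta * sum(xi for xi, yi in zip(x, y) if yi == 1))
    # Denominator: every possible way to pick n_cases people from the stratum.
    den = sum(exp(beta * sum(subset)) for subset in combinations(x, n_cases))
    return num / den


# Hypothetical stratum: 2 cases, 3 controls.
informative = stratum_contribution([1, 0, 1, 0, 0], [1, 1, 0, 0, 0], beta=0.0)

# A stratum with no cases contributes the constant 1 whatever beta is,
# so it carries no information about beta.
no_cases = stratum_contribution([1, 0, 1], [0, 0, 0], beta=0.7)
```

With `beta = 0` every choice of cases is equally likely, so the contribution is one over the number of subsets. A stratum containing only cases or only controls has a single possible subset, which is why it drops out of the fit.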
Leisure world data
Suppose they stratified on 5-year age groups post hoc, rather than forming matched sets
- Multiple cases and controls in each stratum
- Need a minimum of one case and one control per stratum
- Cases are compared to only controls within their stratum and
Estimates are nonsensical – do not have enough data in the separate age categories to do this
Try a different way of assessing effect modification

    . gen alcage=alc*age
    . xi: clogit cc alc tob alcage , group(age)
    note: multiple positive outcomes within groups encountered.

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(3)      =     156.21
                                                      Prob > chi2     =     0.0000
    Log likelihood = -342.75918                       Pseudo R2       =     0.1856

    ------------------------------------------------------------------------------
              cc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             alc |   1.523761   .3955047     3.85   0.000     .7485858    2.298936
             tob |   .4328816   .0960843     4.51   0.000     .2445598    .6212033
          alcage |  -.1186174   .0963783    -1.23   0.218    -.3075154    .0702805
    ------------------------------------------------------------------------------

    . lrtest . A
    Likelihood-ratio test                                  LR chi2(1)  =      1.51
    (Assumption: A nested in .)                            Prob > chi2 =    0.2184
Alcohol/tobacco example
No evidence of effect modification of age on the tobacco effect using the model
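$$\operatorname{logit}(p) = \alpha_1 (\text{age}=1) + \dots + \alpha_6 (\text{age}=6) + \beta_1\,\text{tob} + \beta_2\,\text{alc} + \beta_3\,\text{tob} \times \text{age}$$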
This is a simpler form of effect modification that assumes the log OR for tobacco
changes linearly with age group
Interaction terms can be modeled in a simpler way than main effects, but not vice-versa
Test whether alcohol and tobacco interact

    . gen alctob=alc*tob
    . xi: clogit cc alc tob alctob , group(age) or

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(3)      =     155.78
                                                      Prob > chi2     =     0.0000
    Log likelihood = -342.97434                       Pseudo R2       =     0.1851

    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             alc |   3.521209   .7743686     5.72   0.000     2.288229    5.418565
             tob |   1.929266   .4469089     2.84   0.005     1.225219    3.037879
          alctob |   .9031615   .0875702    -1.05   0.293     .7468497    1.092189
    ------------------------------------------------------------------------------

    . lrtest . A
    Likelihood-ratio test                                  LR chi2(1)  =      1.08
    (Assumption: A nested in .)                            Prob > chi2 =    0.2978
No evidence of an interaction between the two grouped linear variables
Alcohol/tobacco example
Return to two variable model

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(2)      =     154.69
                                                      Prob > chi2     =     0.0000
    Log likelihood = -343.51645                       Pseudo R2       =     0.1838

    ------------------------------------------------------------------------------
              cc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             alc |   1.059052   .1043685    10.15   0.000     .8544933     1.26361
             tob |   .4360365   .0958063     4.55   0.000     .2482596    .6238133
Compute the estimated linear predictor
$$\hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_k X_k$$
which in this case is
$$0.436 \times \text{tobacco} + 1.059 \times \text{alcohol}$$

    . predict xb, xb
    . sum xb

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
          xb |     975    2.732436    1.110127   1.495088   5.980353

If $1 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_k X_k > 0$ for all observations then we can use a special test for deciding if the covariates act multiplicatively on the OR (Barlow, 1985)
Compute
$$z = \left(1 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_k X_k\right) \log\left(1 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_k X_k\right)$$
and fit z as another covariate; if z is not statistically significant then the multiplicative model is OK
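As a rough check on the construction of z, the sketch below recomputes the linear predictor from the two-variable coefficients and then forms the extra covariate. It assumes that `alc` and `tob` are coded 1 to 4, which is an inference from the reported minimum and maximum of `xb` and not something stated on the slide. The function names are hypothetical.

```python
from math import log

# Coefficients copied from the two-variable conditional fit.
B_TOB = 0.4360365
B_ALC = 1.059052


def linear_predictor(tob, alc):
    """Estimated linear predictor xb for given tobacco and alcohol levels."""
    return B_TOB * tob + B_ALC * alc


def barlow_z(xb):
    """Extra covariate z = (1 + xb) * log(1 + xb); requires 1 + xb > 0."""
    if 1 + xb <= 0:
        raise ValueError("z is only defined when 1 + xb > 0")
    return (1 + xb) * log(1 + xb)


# Assumption: both exposures take the grouped values 1, 2, 3, 4.
levels = range(1, 5)
xbs = [linear_predictor(t, a) for t in levels for a in levels]
zs = [barlow_z(xb) for xb in xbs]

print(f"xb ranges from {min(xbs):.6f} to {max(xbs):.6f}")
```

Under that coding assumption the smallest and largest values of `xb` agree with the `sum xb` output, and every `1 + xb` is positive, so z is defined for all observations.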
Alcohol/tobacco example

    . gen z=(1+xb)*log(1+xb)
    . clogit cc alc tob z, group(age)

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(3)      =     155.47
                                                      Prob > chi2     =     0.0000
    Log likelihood = -343.12662                       Pseudo R2       =     0.1847

    ------------------------------------------------------------------------------
              cc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             alc |   2.368205    1.48855     1.59   0.112    -.5492987    5.285709
             tob |   .9711536   .6152065     1.58   0.114     -.234629    2.176936
               z |  -.5048729   .5714515    -0.88   0.377    -1.624897    .6151514
    ------------------------------------------------------------------------------

    . lrtest . A
    Likelihood-ratio test                                  LR chi2(1)  =      0.78
    (Assumption: A nested in .)                            Prob > chi2 =    0.3772
No evidence against the multiplicative model
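The likelihood-ratio tests reported for the alcohol/tobacco models can be reproduced by hand from the printed log likelihoods. The sketch below assumes the stored model A is the two-variable fit (log likelihood -343.51645), which is consistent with each test having one degree of freedom. The helper name is hypothetical.

```python
from math import erfc, sqrt


def lr_test_1df(ll_full, ll_reduced):
    """Likelihood-ratio statistic and p-value for one extra parameter.

    For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    stat = 2 * (ll_full - ll_reduced)
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


LL_TWO_VAR = -343.51645  # alc + tob only

# Log likelihoods of the three models that each add one term.
fits = {
    "alcage": -342.75918,  # alcohol by age (linear) interaction
    "alctob": -342.97434,  # alcohol by tobacco interaction
    "z": -343.12662,       # test of the multiplicative model
}

results = {name: lr_test_1df(ll, LL_TWO_VAR) for name, ll in fits.items()}
for name, (stat, p) in results.items():
    print(f"{name:>7}: LR chi2(1) = {stat:.2f}, p = {p:.4f}")
```

Each statistic is twice the gain in log likelihood, and none of the three added terms comes close to the conventional 0.05 threshold.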
Binary exposure variable
Testing a binary exposure only with stratification
Suppose we are interested only in comparing high alcohol consumption (80+ g) with low alcohol consumption (0-79 g) and its relationship to esophageal cancer
How do we control for age group?
- Many choices for computing the OR adjusting for age (ordinary logistic with age as an explicit covariate;
conditional logistic with age as an implicit covariate; Mantel-Haenszel)
Unconditional logistic regression - Fit age group and add alcohol (binary) to the model and test with a LR test

    . gen alcbin=(alc>2)
    . xi: logistic cc i.age
    i.age             _Iage_1-6           (naturally coded; _Iage_1 omitted)

    Logistic regression                               Number of obs   =        975
                                                      LR chi2(5)      =     121.04
                                                      Prob > chi2     =     0.0000
    Log likelihood = -434.22195                       Pseudo R2       =     0.1223

    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         _Iage_2 |   5.447368   5.777946     1.60   0.110     .6812858    43.55562
         _Iage_3 |   31.67665   32.24812     3.39   0.001     4.307063    232.9685
         _Iage_4 |    52.6506   53.37904     3.91   0.000     7.218137    384.0445
         _Iage_5 |   59.66981   60.74305     4.02   0.000     8.114154    438.7995
         _Iage_6 |   48.22581   50.98864     3.67   0.000     6.071737    383.0417
    ------------------------------------------------------------------------------

    . est store A
Binary exposure variable

    . xi: logistic cc i.age alcbin
    i.age             _Iage_1-6           (naturally coded; _Iage_1 omitted)

    Logistic regression                               Number of obs   =        975
                                                      LR chi2(6)      =     200.57
                                                      Prob > chi2     =     0.0000
    Log likelihood = -394.46094                       Pseudo R2       =     0.2027

    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         _Iage_2 |   4.675303   4.983382     1.45   0.148     .5787862    37.76602
         _Iage_3 |   24.50217   25.06914     3.13   0.002     3.298423    182.0131
         _Iage_4 |   40.99664   41.75634     3.65   0.000      5.56895    301.8028
         _Iage_5 |   52.81958   54.03823     3.88   0.000     7.111389    392.3155
         _Iage_6 |   52.57232   55.99081     3.72   0.000     6.519386    423.9432
          alcbin |   5.311584   1.007086     8.81   0.000     3.662981    7.702174
    ------------------------------------------------------------------------------

    . lrtest . A
    Likelihood-ratio test                                  LR chi2(1)  =     79.52
    (Assumption: A nested in .)                            Prob > chi2 =    0.0000
Highly significant result - this is the maximum likelihood estimate
(See page 144, Breslow and Day)
However, note there is some evidence for a poor model fit

    . lfit

    Logistic model for cc, goodness-of-fit test

           number of observations =       975
     number of covariate patterns =        12
                  Pearson chi2(5) =      9.32
                      Prob > chi2 =    0.0970
Now try conditional logistic regression

    . clogit cc alcbin, group(age) or
    note: multiple positive outcomes within groups encountered.

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(1)      =      79.01
                                                      Prob > chi2     =     0.0000
    Log likelihood = -381.35643                       Pseudo R2       =     0.0939

    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          alcbin |   5.250918   .9913914     8.78   0.000     3.626815      7.6023
    ------------------------------------------------------------------------------

Pretty close to the unconditional estimate - the conditional estimate will also be very close to the Mantel-Haenszel estimate

    . mhodds cc alcbin age

    Mantel-Haenszel estimate of the odds ratio
    Comparing alcbin==1 vs. alcbin==0, controlling for age

        ----------------------------------------------------------------
         Odds Ratio    chi2(1)        P>chi2        [95% Conf. Interval]
        ----------------------------------------------------------------
           5.157623      85.01        0.0000         3.494918   7.611359
        ----------------------------------------------------------------

Consider another binary exposure variable (0-9 cigarettes versus more)

    . gen tobbin=(tob>1) if tob~=.
    . tabulate tobbin cc

               |          cc
        tobbin |         0          1 |     Total
    -----------+----------------------+----------
             0 |       447         78 |       525
             1 |       328        122 |       450
    -----------+----------------------+----------
         Total |       775        200 |       975
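For reference, the Mantel-Haenszel estimator is a weighted combination of stratum-specific 2×2 tables, and with a single table it reduces to the crude odds ratio. The sketch below applies it to the pooled `tobbin` by `cc` table only, because the age-specific tables are not shown in this transcript. The function name is hypothetical.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio over a list of 2x2 tables.

    Each table is (a, b, c, d):
      a = exposed cases,   b = exposed controls,
      c = unexposed cases, d = unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Pooled table from `tabulate tobbin cc` (all ages combined).
pooled = (122, 328, 78, 447)

# With one stratum the estimator is just the cross-product ratio ad/bc.
crude_or = mantel_haenszel_or([pooled])
print(f"crude OR for tobbin = {crude_or:.3f}")
```

The crude odds ratio of about 2.13 ignores age. Passing the six age-specific tables to the same function would give the age-adjusted Mantel-Haenszel estimate.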
Binary exposure variable
First fit a conditional logistic regression model

    . clogit cc tobbin, group(age) or

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(1)      =      22.65
                                                      Prob > chi2     =     0.0000
    Log likelihood = -409.53599                       Pseudo R2       =     0.0269

    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tobbin |    2.24614   .3874419     4.69   0.000      1.60181    3.149652
    ------------------------------------------------------------------------------

Now consider testing for effect modification by age - similar to testing for heterogeneity with the Mantel-Haenszel estimate

    . xi: clogit cc tobbin i.age*tobbin, group(age) or

    Conditional (fixed-effects) logistic regression   Number of obs   =        975
                                                      LR chi2(5)      =      26.96
                                                      Prob > chi2     =     0.0001
    Log likelihood = -407.38213                       Pseudo R2       =     0.0320

    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tobbin |   5.18e+07   3.02e+11     0.00   0.998            0           .
    _IageXtobb~2 |   8.65e-08   .0005046    -0.00   0.998            0           .
    _IageXtobb~3 |   5.13e-08   .0002995    -0.00   0.998            0           .
    _IageXtobb~4 |   4.87e-08   .0002845    -0.00   0.998            0           .
    _IageXtobb~5 |   2.67e-08   .0001557    -0.00   0.998            0           .
    _IageXtobb~6 |   4.02e-08   .0002346    -0.00   0.998            0           .
Model gives nonsensical estimates – must be an age stratum with too sparse data
Check for an age effect on condom use

    . logistic condom age

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(1)      =       8.11
                                                      Prob > chi2     =     0.0044
    Log likelihood = -90.463046                       Pseudo R2       =     0.0429

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9191588   .0280178    -2.77   0.006     .8658531    .9757463
    ------------------------------------------------------------------------------
Significant age effect (linear) with use decreasing with age - check to see if the linear age model is sufficient
    . fracpoly logistic condom age, compare
    -> gen double Iage__1 = X^-2-.1270867769 if e(sample)
    -> gen double Iage__2 = X^-2*ln(X)-.131082712 if e(sample)
       (where: X = age/10)

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(2)      =       8.75
                                                      Prob > chi2     =     0.0126
    Log likelihood = -90.144149                       Pseudo R2       =     0.0463

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         Iage__1 |   4.78e-07   7.34e-06    -0.95   0.343     4.23e-20     5416774
         Iage__2 |   4.53e+21   1.56e+23     1.45   0.147     2.42e-08    8.46e+50
    ------------------------------------------------------------------------------
    Deviance: 180.29. Best powers of age among 44 models fit: -2 -2.

    ---------------------------------------------------------------
    age               df    Deviance    Dev. dif.   P [*]   Powers
    ---------------------------------------------------------------
    Not in model       0     189.038      8.750     0.068
    Linear             1     180.926      0.638     0.888   1
    m = 1              2     180.665      0.377     0.828   3
    m = 2              4     180.288       --        --     -2 -2
    ---------------------------------------------------------------
    [*] P-value from deviance difference comparing reported model with m = 2 model
Linear age term seems sufficient; check survey variables

    . logistic condom age suscept

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(2)      =       9.46
                                                      Prob > chi2     =     0.0088
    Log likelihood = -89.786959                       Pseudo R2       =     0.0501

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9207313   .0281974    -2.70   0.007     .8670913    .9776895
         suscept |   .8528191    .117933    -1.15   0.250     .6503507     1.11832

The more susceptible to AIDS - the less likely to use condoms (!!)

    . logistic condom age severity

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(2)      =      10.52
                                                      Prob > chi2     =     0.0052
    Log likelihood = -89.259015                       Pseudo R2       =     0.0557

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |    .924917   .0284517    -2.54   0.011     .8708004    .9823968
        severity |   .7475505   .1430667    -1.52   0.128     .5137325    1.087787
    ------------------------------------------------------------------------------

The greater the belief that AIDS is severe - the less likely to use condoms (!!)

    . logistic condom age barrier1 barrier2 barrier3

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(4)      =      10.74
                                                      Prob > chi2     =     0.0296
    Log likelihood = -89.148431                       Pseudo R2       =     0.0568

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9188923   .0283186    -2.74   0.006      .865032    .9761062
        barrier1 |   .7901184   .1295718    -1.44   0.151     .5729324    1.089635
        barrier2 |   1.149433   .2147997     0.75   0.456     .7969219    1.657874
        barrier3 |   .9401959   .1583129    -0.37   0.714     .6759127    1.307814
    ------------------------------------------------------------------------------
None of the barriers appears to affect condom use
    . logistic condom age benefit1 benefit2 benefit3

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(4)      =      13.00
                                                      Prob > chi2     =     0.0113
    Log likelihood = -88.017402                       Pseudo R2       =     0.0688

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |    .917169   .0288622    -2.75   0.006     .8623094    .9755188
        benefit1 |   .9553348   .1838037    -0.24   0.812     .6552195    1.392914
        benefit2 |   1.137751   .1982079     0.74   0.459     .8086474    1.600793
        benefit3 |   1.493691   .2881765     2.08   0.038     1.023385     2.18013
    ------------------------------------------------------------------------------

    . logistic condom age benefit3

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(2)      =      12.43
                                                      Prob > chi2     =     0.0020
    Log likelihood = -88.305042                       Pseudo R2       =     0.0657

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |    .917197   .0285421    -2.78   0.005     .8629273    .9748797
        benefit3 |   1.453072   .2693064     2.02   0.044     1.010483    2.089514
Only one of the benefits appears to affect condom use
"Improving the relationship" increases condom use
Consider partner's age as an additional predictor to male’s age
    . logistic condom age ptage

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(2)      =       8.33
                                                      Prob > chi2     =     0.0155
    Log likelihood = -90.355054                       Pseudo R2       =     0.0441

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9305389   .0374007    -1.79   0.073     .8600478    1.006808
           ptage |   .9804494   .0416898    -0.46   0.642     .9020512    1.065661
    ------------------------------------------------------------------------------

Neither age effect is significant in a Wald test - possible collinearity?

    . corr age ptage
    (obs=137)

                 |      age    ptage
    -------------+------------------
             age |   1.0000
           ptage |   0.6715   1.0000

Reparametrize using age difference

    . gen agediff=age-ptage
    . logistic condom age agediff

    Logistic regression                               Number of obs   =        137
                                                      LR chi2(2)      =       8.33
                                                      Prob > chi2     =     0.0155
    Log likelihood = -90.355054                       Pseudo R2       =     0.0441

    ------------------------------------------------------------------------------
          condom | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9123463   .0316291    -2.65   0.008     .8524135    .9764929
         agediff |    1.01994    .043369     0.46   0.642     .9383845    1.108585
    ------------------------------------------------------------------------------
Model has the same predictive effect as previous model
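Why the two fits are the same model can be seen from the identity b1·age + b2·ptage = (b1 + b2)·age − b2·(age − ptage). On the odds ratio scale the new age OR is therefore the product of the two original ORs, and the agediff OR is the reciprocal of the ptage OR. A small numeric check using the printed estimates (variable names are hypothetical):

```python
# Odds ratios from `logistic condom age ptage`.
OR_AGE = 0.9305389
OR_PTAGE = 0.9804494

# Implied odds ratios after reparametrizing with agediff = age - ptage.
or_age_new = OR_AGE * OR_PTAGE   # coefficient b1 + b2 on age
or_agediff = 1 / OR_PTAGE        # coefficient -b2 on agediff

print(f"age OR in reparametrized model: {or_age_new:.7f}")
print(f"agediff OR:                     {or_agediff:.5f}")
```

Both values agree with the `logistic condom age agediff` output to the printed precision, and the identical log likelihood (-90.355054) confirms that only the parametrization changed. The age coefficient now has a smaller standard error because age and agediff are less strongly correlated than age and ptage.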
Ageeduc = 12 will drop out since no members use condoms

    . clogit condom benefit3 , group(ageeduc)
    note: 1 group (3 obs) dropped because of all positive or all negative outcomes.

    Conditional (fixed-effects) logistic regression   Number of obs   =        134
                                                      LR chi2(1)      =       3.06
                                                      Prob > chi2     =     0.0803
    Log likelihood = -69.653107                       Pseudo R2       =     0.0215

    ------------------------------------------------------------------------------
          condom |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        benefit3 |   .3190356   .1864274     1.71   0.087    -.0463554    .6844266
    ------------------------------------------------------------------------------
Slightly reduced OR for benefit3 - now technically not statistically significant; strata are too small
In this case the unconditional analysis might be preferred since the age effect is of scientific interest
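Because the conditional fit for benefit3 is reported on the coefficient scale, the "slightly reduced OR" is easiest to see after exponentiating. A small sketch using the printed coefficient and confidence limits (variable names are hypothetical):

```python
from math import exp

# Coefficient and 95% limits from `clogit condom benefit3, group(ageeduc)`.
COEF, LOWER, UPPER = 0.3190356, -0.0463554, 0.6844266

# Unconditional OR from `logistic condom age benefit3`, for comparison.
OR_UNCONDITIONAL = 1.453072

or_conditional = exp(COEF)
ci_low, ci_high = exp(LOWER), exp(UPPER)

print(f"conditional OR = {or_conditional:.3f} "
      f"(95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The conditional odds ratio is about 1.38 compared with 1.45 from the unconditional model, and its confidence interval now includes 1.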