1 Problems with infinite solutions in logistic regression Ian White MRC Biostatistics Unit, Cambridge UK Stata Users’ Group London, 12th September 2006.

1

Problems with infinite solutions in logistic regression

Ian WhiteMRC Biostatistics Unit, Cambridge

UK Stata Users’ GroupLondon, 12th September 2006

2

Introduction

• I teach logistic regression for the analysis of case-control studies to Epidemiology Master’s students, using Stata

• I stress how to work out degrees of freedom– e.g. if E has 2 levels and M has 4 levels then you get

3 d.f. for testing the E*M interaction

• Our practical uses data on 244 cases of leprosy and 1027 controls– previous BCG vaccination is the exposure of interest– level of schooling is a possible effect modifier– in what follows I’m ignoring other confounders

3

Leprosy data-> tabulation of d outcome

0=control, | 1=case | Freq. Percent Cum.------------+----------------------------------- 0 | 1,027 80.80 80.80 1 | 244 19.20 100.00------------+----------------------------------- Total | 1,271 100.00

-> tabulation of bcg exposure

BCG scar | Freq. Percent Cum.------------+----------------------------------- Absent | 743 58.46 58.46 Present | 528 41.54 100.00------------+----------------------------------- Total | 1,271 100.00

-> tabulation of school possible effect modifier

Schooling | Freq. Percent Cum.------------+----------------------------------- 0 | 282 22.19 22.19 1 | 606 47.68 69.87 2 | 350 27.54 97.40 3 | 33 2.60 100.00------------+----------------------------------- Total | 1,271 100.00

4

Main effects model. xi: logistic d i.bcg i.schooli.bcg _Ibcg_0-1 (naturally coded; _Ibcg_0 omitted)i.school _Ischool_0-3 (naturally coded; _Ischool_0 omitted)

Logistic regression Number of obs = 1271 LR chi2(4) = 97.50 Prob > chi2 = 0.0000Log likelihood = -572.86093 Pseudo R2 = 0.0784

------------------------------------------------------------------------------ d | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- _Ibcg_1 | .2908624 .0523636 -6.86 0.000 .204384 .4139314 _Ischool_1 | .7035071 .1197049 -2.07 0.039 .5040026 .9819836 _Ischool_2 | .4029998 .0888644 -4.12 0.000 .2615825 .6208704 _Ischool_3 | .09077 .0933769 -2.33 0.020 .0120863 .6816944------------------------------------------------------------------------------

. estimates store main

5

Interaction model. xi: logistic d i.bcg*i.schooli.bcg _Ibcg_0-1 (naturally coded; _Ibcg_0 omitted)i.school _Ischool_0-3 (naturally coded; _Ischool_0 omitted)i.bcg*i.school _IbcgXsch_#_# (coded as above)


------------------------------------------------------------------------------ d | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- _Ibcg_1 | .2248804 .0955358 -3.51 0.000 .0977993 .5170913 _Ischool_1 | .6626409 .1234771 -2.21 0.027 .4599012 .9547549 _Ischool_2 | .4116581 .1027612 -3.56 0.000 .2523791 .6714598 _Ischool_3 | 1.28e-08 1.42e-08 -16.41 0.000 1.46e-09 1.12e-07_IbcgXsch_~1 | 1.448862 .7046411 0.76 0.446 .5585377 3.758385_IbcgXsch_~2 | 1.086848 .6226504 0.15 0.884 .3536056 3.340553_IbcgXsch_~3 | 4.25e+07 . . . . .------------------------------------------------------------------------------Note: 17 failures and 0 successes completely determined.

. estimates store inter

6

The problem. table bcg school, by(d)

----------------------------------0=control |, 1=case |and BCG | Schoolingscar | 0 1 2 3----------+-----------------------0 | Absent | 141 257 129 17 Present | 57 229 182 15----------+-----------------------1 | Absent | 77 93 29 Present | 7 27 10 1----------------------------------

7

LR test

. xi: logistic d i.bcg i.school

LR chi2(4) = 97.50

Log likelihood = -572.86093

. estimates store main

. xi: logistic d i.bcg*i.school

LR chi2(7) = 101.43


. estimates store inter

. lrtest main inter

Likelihood-ratio test LR chi2(2) = 3.92

(Assumption: main nested in inter) Prob > chi2 = 0.1407

8

What is Stata doing? (guess)

• Recognises the information matrix is singular• Hence reduces model df by 1• In other situations Stata drops observations

– if a single variable perfectly predicts success/failure– this happens if the problematic cell doesn’t occur in a

reference category– then Stata refuses to perform lrtest, but we can force it to do so

– Stata still gets df=2; can use df(3) option

9

. gen bcgrev=1-bcg

. xi: logistic d i.bcgrev*i.schooli.bcgrev _Ibcgrev_0-1 (naturally coded; _Ibcgrev_0 omitted)i.school _Ischool_0-3 (naturally coded; _Ischool_0 omitted)i.bcg~v*i.sch~l _IbcgXsch_#_# (coded as above)

note: _IbcgXsch_1_3 != 0 predicts failure perfectly _IbcgXsch_1_3 dropped and 17 obs not used


------------------------------------------------------------------------------ d | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- _Ibcgrev_1 | 4.446809 1.889136 3.51 0.000 1.933895 10.22502 _Ischool_1 | .9600749 .4312915 -0.09 0.928 .3980361 2.315729 _Ischool_2 | .4474097 .2307071 -1.56 0.119 .1628482 1.229215 _Ischool_3 | .5428571 .6013396 -0.55 0.581 .0619132 4.75979_IbcgXsch_~1 | .6901971 .3356713 -0.76 0.446 .2660717 1.79039_IbcgXsch_~2 | .920092 .5271167 -0.15 0.884 .2993516 2.82801------------------------------------------------------------------------------

. est store interrev

. lrtest interrev mainobservations differ: 1254 vs. 1271r(498);

. lrtest interrev main, force

Likelihood-ratio test LR chi2(2) = 3.92(Assumption: main nested in interrev) Prob > chi2 = 0.1407

10

What’s right?

• Zero cell suggests small sample so asymptotic 2

distribution may be inappropriate for LRT– true in this case: have a bcg*school category with only 1

observation– but I’m going to demonstrate the same problem in hypothetical

example with expected cell counts > 3 but a zero observed cell count

• Could combine or drop cells to get rid of zeroes– but the cell with zeroes may carry information

• Problems with testing boundary values are well known – e.g. LRT for testing zero variance component isn’t 2

1

– but here the point estimate, not the null value, is on the boundary

11

Example to explain why LRT makes some sense

. tab x y, chi2 exact

| y x | 0 1 | Total-----------+----------------------+---------- 0 | 10 20 | 30 1 | 0 10 | 10-----------+----------------------+---------- Total | 10 30 | 40

Pearson chi2(1) = 4.4444 Pr = 0.035 Fisher's exact = 0.043 1-sided Fisher's exact = 0.035

12

Model: logit P(y=1|x) = + x-2

6-2

4-2

2-2

0-1

8Lo

g lik

elih

ood

0 5 10beta

Difference in log lik = 3.4

LRT = 6.8 on 0 df?

13

Example to explore correct df using Pearson / Fisher as gold standard

. tab x y, chi2 exact

| y x | 0 1 | Total-----------+----------------------+---------- 1 | 6 0 | 6 2 | 3 6 | 9 3 | 3 6 | 9 -----------+----------------------+---------- Total | 12 12 | 24

Pearson chi2(2) = 8.0000 Pr = 0.018 Fisher's exact = 0.029

• All expected counts ≥3

• Don’t want to drop or merge category 1 - contains the evidence for association!

14

. xi: logistic y i.xi.x _Ix_1-3 (naturally coded; _Ix_1 omitted)


------------------------------------------------------------------------------ y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- _Ix_2 | 1.61e+08 1.61e+08 18.90 0.000 2.27e+07 1.14e+09 _Ix_3 | 1.61e+08 . . . . .------------------------------------------------------------------------------Note: 6 failures and 0 successes completely determined.

. est store x

. xi: logistic y

Logistic regression Number of obs = 24 LR chi2(0) = 0.00 Prob > chi2 = .Log likelihood = -16.635532 Pseudo R2 = 0.0000

------------------------------------------------------------------------------ y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------------------------------------------------------------------------------------

. est store null

15

LRT

. xi: logistic y i.x


. est store x

. xi: logistic y


. est store null

. lrtest x null

Likelihood-ratio test LR chi2(1) = 10.36(Assumption: null nested in x) Prob > chi2 = 0.0013

16

Comparison of tests | y x | 0 1 | Total-----------+----------------------+---------- 1 | 6 0 | 6 2 | 3 6 | 9 3 | 3 6 | 9 -----------+----------------------+---------- Total | 12 12 | 24

Pearson chi2(2) = 8.0000 P = 0.018 Fisher's exact = P = 0.029 LR chi2(1) = 10.36 P = 0.0013 (using 2df: P = 0.0056)

Clearly LRT isn’t great. But 1df is even worse than 2df

17

Note

• In this example, we could use Pearson / Fisher as gold standard.

• Can’t do this in more complex examples (e.g. adjust for several covariates).

18

My proposal for Stata

• lrtest appears to adjust df for infinite parameter estimates: it should not

• Model df should be incremented to allow for any variables dropped because they perfectly predict success/failure– Don’t need to increment log lik as it is 0 for

the cases dropped

• Can the ad hoc handling of zeroes by xi:logistic be improved?

19

Conclusions for statisticians

• Must remember the 2 approximation is still poor for these LRTs– typically anti-conservative? (Kuss, 2002)

• Performance of LRT can be improved by using penalised likelihood (Firth, 1993; Bull, 2006) - like a mildly informative prior– worth using routinely?

• Gold standard: Bayes or exact logistic regression (logXact)?

20

The end

21

Output for example with 2-level x. logit y x


------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- _cons | .6931472 .3872983 1.79 0.074 -.0659436 1.452238------------------------------------------------------------------------------

. estimates store x

. logit y


------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- _cons | 1.098612 .3651484 3.01 0.003 .3829346 1.81429------------------------------------------------------------------------------

. estimates store null

. lrtest x nulldf(unrestricted) = df(restricted) = 1r(498);

. lrtest x null, force df(1)

Likelihood-ratio test LR chi2(1) = 6.80(Assumption: null nested in x) Prob > chi2 = 0.0091

1 Problems with infinite solutions in logistic regression Ian White MRC Biostatistics Unit, Cambridge UK Stata Users’ Group London, 12th September 2006.

Documents