Jul 07, 2020


CHAPTER 5 ST 544, D. Zhang

5 Building and Applying Logistic Regression Models

I Strategies in Model Selection

I.1 Number of x’s in a logistic regression model

• How many x’s can be entered in the model?

Rule of thumb: at least 10 outcomes of each type (both [Y = 1] and [Y = 0]) per x.

• Need to be aware of collinearity among the x’s.
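The rule of thumb can be turned into a quick check. The sketch below is illustrative only: the helper name `max_predictors` is hypothetical, and the counts 111 (Y = 1) and 62 (Y = 0) are an assumption taken from the standard version of the crab data used later in this chapter (they are consistent with the 111 × 62 = 6882 pairs reported in the Proc Logistic output).

```python
def max_predictors(n_events, n_nonevents, per_x=10):
    """Largest number of x's allowed by the '10 outcomes of each type per x' rule."""
    # The rarer outcome is the binding constraint.
    return min(n_events, n_nonevents) // per_x


# Assumed crab-data counts: 111 crabs with satellites (Y = 1), 62 without (Y = 0).
crab_limit = max_predictors(111, 62)
print(crab_limit)
```

Under these assumed counts the rule allows about 6 x’s, so a 7-term model is already at the edge of what the rule supports.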

Slide 248


I.2 Crab data revisited

• If we put all independent variables into the logistic regression:

logit{π} = α + β1c1 + β2c2 + β3c3 + β4s1 + β5s2 + β6wt + β7width

The LRT statistic for H0 : all β’s = 0 is 40.6 with df = 7 (p-value < 0.0001).

• However, only β2 is significantly different from 0! Something is wrong.

• Collinearity is an issue! Weight, width and color are correlated.

Slide 249
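The p-value quoted for the overall LRT (40.6 on 7 df) can be checked without statistical software. The function below is a minimal sketch: `chi2_sf` is a hypothetical helper name, and it uses the closed-form upper-tail expressions that exist for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the 1-df tail and add one correction term per extra 2 df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


# Overall LRT from the slide: statistic 40.6 with 7 degrees of freedom.
p_full = chi2_sf(40.6, 7)
print(p_full)
```

The same helper can be reused for the 1-df likelihood ratio tests that arise in forward selection or backward elimination.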


I.3 Variable selection

• Use traditional model selection procedures (used when p << n)

1. Forward selection (simple one + variant)

2. Backward elimination

3. Better to use LRT for variable selection

4. Can consider interactions (usually 2-way interactions)

• Use modern model selection procedures, usually in the form of penalized likelihood (can handle p > n); a new research area.

Slide 250


I.4 Backward elimination for crab data

The table indicates that model 5 (M3 on slide 241) may be considered

the final model.

Slide 251


I.5 Use AIC or BIC for model selection

• AIC formula (smaller, the better):

AIC = -2 (log likelihood - # of parameters in the model)

• AIC “penalizes a bigger model” by its complexity/size.

• For model 5 in Table 5.2, the SAS program and output:

data crab;
  input color spine width satell weight;
  weight=weight/1000;
  color=color-1;
  y=(satell>0);
  n=1;
  if color<4 then c=1;
  else c=0;
datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
....
;

Slide 252


proc genmod descending;
  model y/n = width c / dist=bin;
run;

************************************************************************

Criteria For Assessing Goodness Of Fit

Criterion                     DF        Value    Value/DF
Deviance                     170     187.9579      1.1056
Scaled Deviance              170     187.9579      1.1056
Pearson Chi-Square           170     167.4557      0.9850
Scaled Pearson X2            170     167.4557      0.9850
Log Likelihood                       -93.9789
Full Log Likelihood                  -93.9789
AIC (smaller is better)              193.9579
AICC (smaller is better)             194.0999
BIC (smaller is better)              203.4178

AIC = −2(−93.98 − 3) = 193.96 ≈ 194.
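The three information criteria in the output can be reproduced from the log likelihood alone. This is a sketch under two assumptions: n = 173 observations (170 residual df plus 3 parameters), and the usual small-sample form of AICC and the usual BIC penalty k·log(n).

```python
import math

loglik = -93.9789   # "Log Likelihood" from the Proc Genmod output
k = 3               # parameters: intercept, width, c
n = 173             # assumed sample size: 170 residual df + 3 parameters

aic = -2 * (loglik - k)                        # AIC formula from the slide
aicc = aic + 2 * k * (k + 1) / (n - k - 1)     # small-sample correction (assumed form)
bic = -2 * loglik + k * math.log(n)            # BIC penalty k * log(n) (assumed form)

print(round(aic, 4), round(aicc, 4), round(bic, 4))
```

The values agree with the AIC, AICC and BIC lines of the output up to rounding of the log likelihood.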

• Note: Proc Genmod and Proc Logistic no longer report the Pearson χ2 and deviance for binary data unless aggregate=(width c) is specified, in which case their df = (# of distinct settings determined by width and c) − (# of parameters in the model). In the above program, we tricked Proc Genmod by using y/n so that the procedure does not treat the data as binary.

Slide 253


I.6 Summarizing predictive power, classification tables and ROC curves

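• Suppose we have a binary response $Y_i = 1/0$ (success/failure) and $x_i$, a vector of covariates:

$$\pi(x_i) = P[Y_i = 1 \mid x_i], \qquad \mathrm{logit}\{\pi(x_i)\} = x_i^T\beta \quad \text{(can have more than one x)}.$$

After we fit the model, we get $\hat\beta$ ⇒ we get $\hat\pi_i$ as

$$\hat\pi_i = \frac{e^{x_i^T\hat\beta}}{1 + e^{x_i^T\hat\beta}}.$$

• Choose a known value $\pi_0$ (e.g., $\pi_0 = 0.5$), and predict $Y_i$ by

$$\hat Y_i = \begin{cases} 1 & \text{if } \hat\pi_i > \pi_0 \\ 0 & \text{otherwise.} \end{cases}$$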

Slide 254


and then construct the classification table (rows: observed Y; columns: predicted Ŷ)

                Ŷ
             1      0
Y     1     n11    n12
      0     n21    n22

The following two quantities tell us how good the prediction is:

$$\text{sensitivity} = \frac{n_{11}}{n_{11} + n_{12}}, \qquad \text{specificity} = \frac{n_{22}}{n_{21} + n_{22}}.$$

• Using only one table with one π0 loses information.

• Solution: use many different values of π0 ⇒ many classification tables ⇒ many pairs of sensitivity and specificity ⇒ plot sensitivity vs. 1 − specificity ⇒ ROC (receiver operating characteristic) curve ⇒ the area under the ROC curve (AUC) summarizes the predictive power of the model and is often called the c-index.

Slide 255


• An example:

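Predicted values Ŷ for each cutoff (Ŷ0.3− is the prediction when π0 is just below 0.3, Ŷ0.4− when π0 is just below 0.4, ..., Ŷ0.8+ when π0 ≥ 0.8):

Y    π̂     Ŷ0.3−  Ŷ0.4−  Ŷ0.5−  Ŷ0.6−  Ŷ0.7−  Ŷ0.8−  Ŷ0.8+
1    0.8     1      1      1      1      1      1      0
1    0.6     1      1      1      1      0      0      0
1    0.4     1      1      0      0      0      0      0
0    0.7     1      1      1      1      1      0      0
0    0.5     1      1      1      0      0      0      0
0    0.3     1      0      0      0      0      0      0

The corresponding classification tables (counts listed as Ŷ = 1, Ŷ = 0 for each row of Y; se = sensitivity, sp = specificity):

Cutoff    Y = 1 row    Y = 0 row    se     sp
Ŷ0.3−       3  0         3  0       3/3    0/3
Ŷ0.4−       3  0         2  1       3/3    1/3
Ŷ0.5−       2  1         2  1       2/3    1/3
Ŷ0.6−       2  1         1  2       2/3    2/3
Ŷ0.7−       1  2         1  2       1/3    2/3
Ŷ0.8−       1  2         0  3       1/3    3/3
Ŷ0.8+       0  3         0  3       0/3    3/3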

Slide 256


ROC curve for the example

Slide 257


• The AUC for the above ROC curve:

$$\text{AUC} = 1 - \frac{3}{9} = \frac{2}{3}$$

= proportion of concordant pairs in $(Y_i, \hat\pi_i)$ among all pairs with different outcomes $Y_i$.

# of pairs with different outcomes: 3× 3 = 9.

# of concordant pairs: 3 + 2 + 1 = 6.
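The pair counting above can be reproduced directly. This is a small sketch using the six observations of the example; the function name `auc_concordance` is hypothetical, and this version assumes no tied predicted probabilities.

```python
from fractions import Fraction


def auc_concordance(y, p):
    """Proportion of (Y=1, Y=0) pairs in which the Y=1 member has the larger p."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = sum(1 for a in pos for b in neg if a > b)
    return Fraction(concordant, len(pos) * len(neg))


# The six observations of the example (no ties in the predicted probabilities).
y = [1, 1, 1, 0, 0, 0]
p = [0.8, 0.6, 0.4, 0.7, 0.5, 0.3]
auc = auc_concordance(y, p)
print(auc)
```

Using `Fraction` keeps the result exact, so it can be compared with 6/9 = 2/3 without rounding.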

Slide 258


• If there are ties in the π̂i’s, we need to make an adjustment. For example, suppose the π̂i’s for one observation with Yi = 1 and one with Yi = 0 are the same (0.4):

Y    π̂     Ŷ0.4−  Ŷ0.5−  Ŷ0.6−  Ŷ0.7−  Ŷ0.8−  Ŷ0.8+
1    0.8     1      1      1      1      1      0
1    0.6     1      1      1      0      0      0
1    0.4     1      0      0      0      0      0
0    0.7     1      1      1      1      0      0
0    0.5     1      1      0      0      0      0
0    0.4     1      0      0      0      0      0

The corresponding classification tables are (counts listed as Ŷ = 1, Ŷ = 0 for each row of Y; se = sensitivity, sp = specificity):

Cutoff    Y = 1 row    Y = 0 row    se     sp
Ŷ0.4−       3  0         3  0       3/3    0/3
Ŷ0.5−       2  1         2  1       2/3    1/3
Ŷ0.6−       2  1         1  2       2/3    2/3
Ŷ0.7−       1  2         1  2       1/3    2/3
Ŷ0.8−       1  2         0  3       1/3    3/3
Ŷ0.8+       0  3         0  3       0/3    3/3

Slide 259


ROC curve when there are tied predictive probs

Slide 260


• AUC = 5.5/9, where

9 = # of pairs with different outcomes,

5.5 = # of concordant pairs (5) + 0.5 × # of ties in the π̂i’s with different outcomes (1).

• Note: Binomial data need to be decomposed into binary data. There will be a lot of tied predicted probabilities.

• The program to get π̂i, the ROC curve and the c-index:

Proc logistic; * may need descending for binary y;
  model y/n = x / outroc=roc;
  output out=outpred predicted=pihat;
run;

title "ROC Plot";
symbol1 v=dot i=join;
proc gplot data=roc;
  plot _sensit_*_1mspec_;
run;

Here the variable _1mspec_ means 1 minus specificity.
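The tie adjustment can also be checked geometrically: joining the ROC points by straight lines and taking the area by the trapezoid rule gives the same 5.5/9. The sketch below uses the tied example; `roc_points` and `trapezoid_auc` are hypothetical helper names, not SAS output.

```python
from fractions import Fraction


def roc_points(y, p):
    """(1 - specificity, sensitivity) for every cutoff, predicting 1 when p > cutoff."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    cutoffs = [float("-inf")] + sorted(set(p))
    points = []
    for c in cutoffs:
        tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi > c)
        fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi > c)
        points.append((Fraction(fp, n_neg), Fraction(tp, n_pos)))
    return sorted(points)


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the sorted ROC points."""
    area = Fraction(0)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Example with one tie: a Y = 1 and a Y = 0 observation both have 0.4.
y = [1, 1, 1, 0, 0, 0]
p = [0.8, 0.6, 0.4, 0.7, 0.5, 0.4]
tied_auc = trapezoid_auc(roc_points(y, p))
print(tied_auc)
```

The tied pair produces a diagonal segment of the ROC curve, and the triangle under that segment is exactly the half-credit given to the tie.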

Slide 261


• SAS program and output for the logistic model for crab data:

M3 : logit{π(x, c)} = α + β1c + β2x

title "ROC Curve and c-index";
proc logistic descending;
  model y = width c / link=logit outroc=roc;
  output out=outpred predicted=pihat;
run;

proc plot data=roc;
  plot _sensit_*_1mspec_;
run;

*************************************************************************

Analysis of Maximum Likelihood Estimates

                           Standard        Wald
Parameter   DF  Estimate      Error  Chi-Square  Pr > ChiSq
Intercept    1  -12.9795     2.7272     22.6502      <.0001
width        1    0.4782     0.1041     21.0841      <.0001
c            1    1.3005     0.5259      6.1162      0.0134

Association of Predicted Probabilities and Observed Responses

Percent Concordant    76.7     Somers’ D   0.544
Percent Discordant    22.3     Gamma       0.549
Percent Tied           0.9     Tau-a       0.252
Pairs                 6882     c           0.772
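The four association measures are simple functions of the concordant, discordant and tied percentages. The sketch below recomputes them from the rounded percentages in the output, so agreement is only to about three decimals; the total sample size N = 173 is an assumption (170 residual df plus 3 parameters in the earlier Genmod fit).

```python
n_total = 173        # assumed number of crabs
pairs = 6882         # pairs with different outcomes, from the output
conc, disc, tied = 76.7 / 100, 22.3 / 100, 0.9 / 100

c_index = conc + 0.5 * tied                    # ties get half credit
somers_d = conc - disc
gamma = (conc - disc) / (conc + disc)          # ties are ignored
tau_a = (conc - disc) * pairs / (n_total * (n_total - 1) / 2)

print(round(c_index, 3), round(somers_d, 3), round(gamma, 3), round(tau_a, 3))
```

Note that c = (Somers’ D + 1)/2, so the c-index and Somers’ D carry the same information.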

Slide 262


ROC curve from the model:

[Line-printer plot of _SENSIT_*_1MSPEC_ (legend: A = 1 obs, B = 2 obs, etc.): Sensitivity on the vertical axis against 1 - Specificity on the horizontal axis.]

Slide 263


II Model Checking for Logistic Models

II.1 LRT testing current model to more complex models

• Suppose we would like to see whether the logistic model (with only one x)

logit{π(x)} = α + βx

fits the data well. We can fit a more complex model such as

logit{π(x)} = α + β1x + β2x²

and test H0 : β2 = 0 using the Wald, score or likelihood ratio test (LRT). The LRT is usually preferred.

Slide 264


II.2 Goodness of fit using deviance and Pearson χ2 for grouped data

• For binomial data like the Snoring/Heart disease example:

                                      Heart Disease
            x                        Yes (yi)     No      ni
            0   Never                   24       1355    1379
Snoring     2   Occasionally            35        603     638
            4   Nearly every night      21        192     213
            5   Every night             30        224     254

where the ni are large (ni → ∞), we can use the deviance or Pearson χ2 to check the goodness of fit of the logistic model

logit{π(x)} = α + βx.

Slide 265


• Treating the data as if they came from an I × 2 table, the deviance $G^2(M)$ of the current model M can be shown to have the form

$$G^2(M) = 2\sum \text{obs} \times \log\left\{\frac{\text{obs}}{\text{fitted}}\right\}$$

and the Pearson χ2 has the form

$$\chi^2 = \sum \frac{(\text{obs} - \text{fitted})^2}{\text{fitted}},$$

where the summation is over the 2I cells (8 cells for the previous example).

• For snoring/HD example, we know that linear probability model has a

better fit than the logistic model.
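To make the two formulas concrete, the sketch below fits the logistic model to the snoring data with a few Newton-Raphson steps and then sums over the 8 cells. It is an illustration only: `fit_logistic` is a hypothetical helper written for this one-covariate case, not the algorithm used by any SAS procedure.

```python
import math

# Snoring scores, heart-disease counts (yes) and group sizes from the table.
x = [0, 2, 4, 5]
y = [24, 35, 21, 30]
n = [1379, 638, 213, 254]


def fit_logistic(x, y, n, iterations=25):
    """Newton-Raphson for logit(pi) = alpha + beta * x with binomial counts."""
    alpha = math.log(sum(y) / (sum(n) - sum(y)))  # start at the overall logit
    beta = 0.0
    for _ in range(iterations):
        pi = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]
        w = [ni * p * (1.0 - p) for ni, p in zip(n, pi)]
        r = [yi - ni * p for yi, ni, p in zip(y, n, pi)]
        u0, u1 = sum(r), sum(xi * ri for xi, ri in zip(x, r))      # score vector
        i00 = sum(w)                                               # information matrix
        i01 = sum(xi * wi for xi, wi in zip(x, w))
        i11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = i00 * i11 - i01 * i01
        alpha += (i11 * u0 - i01 * u1) / det
        beta += (i00 * u1 - i01 * u0) / det
    return alpha, beta


alpha_hat, beta_hat = fit_logistic(x, y, n)
pi_hat = [1.0 / (1.0 + math.exp(-(alpha_hat + beta_hat * xi))) for xi in x]

# Observed and fitted counts for the 2I = 8 cells (yes and no in each row).
cells = []
for yi, ni, p in zip(y, n, pi_hat):
    cells.append((yi, ni * p))
    cells.append((ni - yi, ni * (1.0 - p)))

g2 = 2 * sum(o * math.log(o / f) for o, f in cells if o > 0)
x2 = sum((o - f) ** 2 / f for o, f in cells)
print(round(alpha_hat, 3), round(beta_hat, 3), round(g2, 2), round(x2, 2))
```

Both statistics would be compared with a chi-square distribution on I − 2 = 2 degrees of freedom (4 binomial observations minus 2 parameters).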

Slide 266


II.3 Goodness of fit for ungrouped data, Hosmer-Lemeshow test

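• After fitting the logistic regression model to binary data (which can be recovered from binomial data), divide the data into g groups of approximately the same size based on the estimated success probabilities:

Group   Responses                                  Estimated probabilities                                        Size
1       $y_{11}, y_{12}, \cdots, y_{1n_1}$         $\hat\pi_{11}, \hat\pi_{12}, \cdots, \hat\pi_{1n_1}$           $n_1$
2       $y_{21}, y_{22}, \cdots, y_{2n_2}$         $\hat\pi_{21}, \hat\pi_{22}, \cdots, \hat\pi_{2n_2}$           $n_2$
· · ·
g       $y_{g1}, y_{g2}, \cdots, y_{gn_g}$         $\hat\pi_{g1}, \hat\pi_{g2}, \cdots, \hat\pi_{gn_g}$           $n_g$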

Slide 267


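• Then construct the following statistic:

$$\sum_{i=1}^{g} \frac{\left(\sum_{j=1}^{n_i} y_{ij} - \sum_{j=1}^{n_i} \hat\pi_{ij}\right)^2}{\left(\sum_{j=1}^{n_i} \hat\pi_{ij}\right)\left(n_i - \sum_{j=1}^{n_i} \hat\pi_{ij}\right)/n_i} \;\overset{H_0}{\sim}\; \chi^2_{g-2} \text{ (roughly)},$$

when the # of distinct covariate patterns is large.

• This is the Hosmer-Lemeshow test of goodness-of-fit.

• The test can be obtained using

Proc Logistic;
  model y/n = x1 x2 / lackfit;
Run;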
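The statistic is easy to compute once the fitted probabilities are available. The function below is a minimal sketch of the formula above; the name `hosmer_lemeshow`, the equal-count grouping rule and the 12 data points are all hypothetical, chosen only so that the arithmetic can be followed by hand. Grouping details in Proc Logistic may differ.

```python
def hosmer_lemeshow(y, p, g):
    """Hosmer-Lemeshow statistic with g groups of (nearly) equal size, ordered by p."""
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for k in range(g):
        idx = order[k * n // g:(k + 1) * n // g]
        n_k = len(idx)
        observed = sum(y[i] for i in idx)      # sum of y_ij in group k
        expected = sum(p[i] for i in idx)      # sum of fitted pi_ij in group k
        stat += (observed - expected) ** 2 / (expected * (n_k - expected) / n_k)
    return stat


# Hypothetical data: three groups of four with fitted probabilities 0.25, 0.5, 0.75.
p = [0.25] * 4 + [0.5] * 4 + [0.75] * 4
y = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
hl = hosmer_lemeshow(y, p, g=3)
print(hl)
```

With g = 3 the statistic would be referred to a chi-square distribution with g − 2 = 1 degree of freedom; in practice g = 10 is a common choice.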

Slide 268


II.4 Residuals from the logistic models

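• With data $y_i$ from Bin($n_i, \pi_i$), we fit the logistic model

$$\mathrm{logit}(\pi_i) = \alpha + \beta x_i.$$

After we get $\hat\alpha, \hat\beta$ ⇒ $\hat\pi_i$:

$$\hat\pi_i = \frac{e^{\hat\alpha + x_i\hat\beta}}{1 + e^{\hat\alpha + x_i\hat\beta}}.$$

• Pearson residual:

$$e_i = \frac{y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)}}$$

• Standardized Pearson residual:

$$e_i^{st} = \frac{y_i - n_i\hat\pi_i}{SE} = \frac{y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)(1 - h_i)}} = \frac{e_i}{\sqrt{1 - h_i}},$$

where $h_i$ is the ith diagonal element of the hat matrix.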

Slide 269


• $E(e_i^{st}) \approx 0$ and $\mathrm{var}(e_i^{st}) \approx 1$ for large $n_i$, so $e_i^{st}$ behaves like a N(0,1) random variable. A large $e_i^{st}$ ($|e_i^{st}| > 2$) indicates a potential outlier.

• Plots of $e_i^{st}$ vs. $x_i$ or $x_i\hat\beta$ may detect lack of fit.

• When $n_i = 1$ (binary data), $e_i^{st}$ is not very informative.

• Note: Proc Logistic does not report $e_i^{st}$. Need to use Proc GenMod to get $e_i^{st}$.

Slide 270


• Example 1: Residual plot for the crab data:

Model: logit(P[Y = 1|x, c]) = β0 + β1c1 + β2c2 + β3c3 + β4x

data crab;
  input color spine width satell weight;
  weight=weight/1000;
  color=color-1;
  satbin=(satell>0);
  c1 = (color=1);
  c2 = (color=2);
  c3 = (color=3);
  c4 = (color=4);
  s1 = (spine=1);
  s2 = (spine=2);
datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
4 3 24.8 0 2100
4 3 ...

proc genmod data=crab descending;
  model satbin = width c1 c2 c3 / dist=bin link=logit;
  output out=resid ResRaw=ResRaw ResChi=ResChi StdReschi=StdReschi;
run;

data _null_; set resid;
  file "crab_res";
  put stdreschi width;
run;

Slide 271


Slide 272


• Example 2: Admission to Graduate School at UF in 1997-1998 (Table 5.5)

Let π(k, g) = P[admission|D = k, G = g] for department D = k and gender G = g. We consider three models (written in shorthand for the terms in logit{π(k, g)}):

1. π(k, g) = Dk: Admission is independent of gender within each department.

2. π(k, g) = Dk + Gg: The Admission-Gender association is the same across departments (⇔ logit{π(k, g)} = Dk + Gg).

3. π(k, g) = Gg: The marginal Admission-Gender association, collapsed over departments.

options ls=75 ps=100;

data admit;
  input dept $ gender y yno;
  n = y+yno;
  male=gender-1;
cards;
anth 1 32 81
anth 2 21 41
astr 1 6 0
astr 2 3 8
chem 1 12 43
chem 2 34 110

Slide 273


...

title "Model 1: Logistic model assuming gender and admission are";
title2 "conditional independent given department";
proc genmod;
  class dept;
  model y/n = dept /dist=bin link=logit;
  output out=resid Resraw=Resraw Reschi=Reschi StdReschi=StdReschi;
run;

data resid; set resid;
  keep dept male Resraw Reschi StdReschi;
run;

title "Residuals from Model 1";
proc print data=resid;
run;

title "Model 2: Logistic model with homogeneous GA and DA association";
proc genmod data=admit;
  class dept;
  model y/n = dept male;
run;

title "Model 3: Logistic model for marginal GA association";
proc genmod data=admit;
  model y/n = male;
run;

Slide 274


Part of the output:

Model 1: Logistic model assuming gender and admission are
conditional independent given department

Criteria For Assessing Goodness Of Fit

Criterion               DF       Value    Value/DF
Deviance                23     44.7352      1.9450
Scaled Deviance         23     44.7352      1.9450
Pearson Chi-Square      23     40.8523      1.7762
Scaled Pearson X2       23     40.8523      1.7762

                                              Std
Obs   dept   male     Reschi     Resraw     Reschi
  1   anth    0     -0.45509   -2.22286   -0.76457
  2   anth    1      0.61438    2.22286    0.76457
  3   astr    0      2.30940    2.82353    2.87096
  4   astr    1     -1.70561   -2.82353   -2.87096
  5   chem    0     -0.22824   -0.71357   -0.26830
  6   chem    1      0.14105    0.71357    0.26830
  7   clas    0     -0.75593   -0.50000   -1.06904
  8   clas    1      0.75593    0.50000    1.06904
  9   comm    0     -0.16670   -1.04167   -0.63260
 10   comm    1      0.61024    1.04167    0.63260
 11   comp    0      0.85488    1.63636    1.15752
 12   comp    1     -0.78040   -1.63636   -1.15752
 13   engl    0      0.67452    3.32130    0.94209
 14   engl    1     -0.65769   -3.32130   -0.94209
 15   geog    0      1.79629    2.75000    2.16641
 16   geog    1     -1.21106   -2.75000   -2.16641
 17   geol    0     -0.21822   -0.30000   -0.26082
 18   geol    1      0.14286    0.30000    0.26082
 19   germ    0      0.89974    0.77273    1.88730
 20   germ    1     -1.65903   -0.77273   -1.88730

Slide 275


 21   hist    0     -0.14639   -0.31034   -0.17627
 22   hist    1      0.09820    0.31034    0.17627
 23   lati    0      1.22493    3.25676    1.64564
 24   lati    1     -1.09895   -3.25676   -1.64564
 25   ling    0      0.78403    2.13043    1.37298
 26   ling    1     -1.12711   -2.13043   -1.37298
 27   math    0      1.00845    3.30631    1.28844
 28   math    1     -0.80193   -3.30631   -1.28844
 29   phil    0      1.22474    1.00000    1.34164
 30   phil    1     -0.54772   -1.00000   -1.34164
 31   phys    0      1.17573    2.57576    1.32458
 32   phys    1     -0.61005   -2.57576   -1.32458
 33   poli    0     -0.18041   -0.68707   -0.23318
 34   poli    1      0.14772    0.68707    0.23318
 35   psyc    0     -1.16905   -2.41176   -2.27222
 36   psyc    1      1.94841    2.41176    2.27222
 37   reli    0      0.63246    0.75000    1.26491
 38   reli    1     -1.09545   -0.75000   -1.26491
 39   roma    0      0.05868    0.17647    0.13970
 40   roma    1     -0.12677   -0.17647   -0.13970
 41   soci    0      0.17272    0.56164    0.30123
 42   soci    1     -0.24679   -0.56164   -0.30123
 43   stat    0     -0.00960   -0.02439   -0.01229
 44   stat    1      0.00768    0.02439    0.01229
 45   zool    0     -1.23400   -3.10769   -1.75873
 46   zool    1      1.25314    3.10769    1.75873

Model 2: Logistic model with homogeneous GA and DA association

Criteria For Assessing Goodness Of Fit

Criterion               DF       Value    Value/DF
Deviance                22     42.3601      1.9255
Scaled Deviance         22     42.3601      1.9255
Pearson Chi-Square      22     38.9908      1.7723
Scaled Pearson X2       22     38.9908      1.7723

Slide 276


Analysis Of Maximum Likelihood Parameter Estimates

                                Standard      Wald 95%             Wald
Parameter        DF  Estimate      Error  Confidence Limits   Chi-Square
Intercept         1   -2.0323     0.2877  -2.5962  -1.4685        49.91
dept      anth    1    1.2585     0.3277   0.6162   1.9008        14.75
dept      astr    1    2.2622     0.5631   1.1586   3.3659        16.14
...
male              1   -0.1730     0.1123  -0.3932   0.0472         2.37

Model 3: Logistic model for marginal GA association

Criteria For Assessing Goodness Of Fit

Criterion               DF       Value    Value/DF
Deviance                44    449.3122     10.2116
Scaled Deviance         44    449.3122     10.2116
Pearson Chi-Square      44    409.4050      9.3047
Scaled Pearson X2       44    409.4050      9.3047

Analysis Of Maximum Likelihood Parameter Estimates

                         Standard      Wald 95%             Wald
Parameter   DF  Estimate    Error  Confidence Limits   Chi-Square
Intercept    1   -0.6455   0.0637  -0.7703  -0.5207       102.77
male         1    0.0662   0.0921  -0.1142   0.2467         0.52

Models 2 & 3 show Simpson’s Paradox.
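The reversal can be read off the two `male` estimates by converting them to odds ratios; the deviances of Models 1 and 2 also give the LRT for the gender effect within departments. The sketch below only re-uses numbers from the output above; the 1-df chi-square tail is computed with the identity P(χ²₁ > t) = erfc(√(t/2)).

```python
import math

# Gender effect (male vs. female) on the odds of admission.
or_conditional = math.exp(-0.1730)   # Model 2: adjusted for department
or_marginal = math.exp(0.0662)       # Model 3: collapsed over departments

# LRT of Model 1 (dept only) against Model 2 (dept + male).
lrt = 44.7352 - 42.3601
df = 23 - 22
p_value = math.erfc(math.sqrt(lrt / 2))   # upper tail of chi-square with 1 df

print(round(or_conditional, 3), round(or_marginal, 3), round(lrt, 4), round(p_value, 3))
```

The conditional odds ratio is below 1 while the marginal one is above 1, which is the change of direction behind the Simpson's Paradox remark; note that the LRT (about 2.38) is close to the Wald chi-square of 2.37 for `male` in Model 2, and neither gender effect is statistically significant.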

Slide 277


• Example 3: Heart disease and blood pressure (Table 5.6, P. 151)

data HD;
  input bp $ n y;
  if bp="<117" then x=111.5;
  else if bp="117-126" then x=121.5;
  else if bp="127-136" then x=131.5;
  else if bp="137-146" then x=141.5;
  else if bp="147-156" then x=151.5;
  else if bp="157-166" then x=161.5;
  else if bp="167-186" then x=176.5;
  else x=191.5;
cards;
<117 156 3
117-126 252 17
127-136 284 12
137-146 271 16
147-156 139 12
157-166 85 8
167-186 99 16
>186 43 8
;

proc genmod;
  model y/n = x /dist=bin link=logit residual;
run;

Slide 278


Criteria For Assessing Goodness Of Fit

Criterion               DF       Value    Value/DF
Deviance                 6      5.9092      0.9849
Scaled Deviance          6      5.9092      0.9849
Pearson Chi-Square       6      6.2899      1.0483
Scaled Pearson X2        6      6.2899      1.0483

Analysis Of Maximum Likelihood Parameter Estimates

                         Standard      Wald 95%             Wald
Parameter   DF  Estimate    Error  Confidence Limits   Chi-Square
Intercept    1   -6.0820   0.7243  -7.5017  -4.6624        70.51
x            1    0.0243   0.0048   0.0148   0.0338        25.25

              Raw      Pearson     Deviance  Std Deviance  Std Pearson   Likelihood
Obs      Residual     Residual     Residual      Residual     Residual     Residual
1       -2.194866    -0.979434    -1.061683     -1.198648    -1.105788    -1.179257
2       6.3932374    2.0057053    1.8501072     2.1903838    2.3745999    2.2447199
3       -3.072737    -0.813338    -0.841966     -0.978546    -0.945274    -0.970016
4       -2.081617     -0.50673     -0.51623     -0.583485    -0.572747    -0.581169
5       0.3836399    0.1175816    0.1170016     0.1254648    0.1260868    0.1255461
6       -0.856987    -0.304247    -0.308775     -0.330927    -0.326074    -0.330303
7        1.791237    0.5134723    0.5049657     0.6411542     0.651955    0.6452766
8       -0.361958    -0.139464    -0.140243     -0.178337    -0.177346    -0.177959
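The raw and Pearson residual columns can be reproduced approximately from the reported estimates. The sketch below plugs in the rounded coefficients (−6.0820 and 0.0243), so its results match the output only to about two decimals; the standardized residuals are not recomputed because they need the hat-matrix diagonals.

```python
import math

alpha_hat, beta_hat = -6.0820, 0.0243   # rounded estimates from the output

# Blood-pressure scores, group sizes and heart-disease counts from the data step.
x = [111.5, 121.5, 131.5, 141.5, 151.5, 161.5, 176.5, 191.5]
n = [156, 252, 284, 271, 139, 85, 99, 43]
y = [3, 17, 12, 16, 12, 8, 16, 8]

raw, pearson = [], []
for xi, ni, yi in zip(x, n, y):
    pi = 1.0 / (1.0 + math.exp(-(alpha_hat + beta_hat * xi)))
    raw.append(yi - ni * pi)                                   # y_i - n_i * pi_i
    pearson.append(raw[-1] / math.sqrt(ni * pi * (1.0 - pi)))  # Pearson residual e_i

for i, (r, e) in enumerate(zip(raw, pearson), start=1):
    print(i, round(r, 2), round(e, 2))
```

Observation 2 (blood pressure 117-126) has the only Pearson residual near 2, in line with the output, where its standardized Pearson residual is the one value exceeding 2 in absolute value.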

Slide 279


III Sparse Data

III.1 Complete separation and quasi-complete separation

• Consider the following data set:

Obs   x1   x2   y
  1    1    2   0
  2    2    3   0
  3    3    4   0
  4    4    5   0
  5    5    5   1
  6    6    6   1
  7    7    7   1
  8    8    8   1

There is complete separation in x1, and quasi-complete separation in x2.

• What would happen if we fit

M1 : logit(πi) = α + βx1i

and

M2 : logit(πi) = α + βx2i?

Slide 280


Complete separation in x1

If we fit M1, the ML estimates diverge: α̂ → −∞, β̂ → ∞.

How about M2?
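For a single covariate, the two kinds of separation can be detected by comparing the ranges of x in the two outcome groups. The function below is a hypothetical helper written for the one-covariate case only; with several covariates, separation is a property of linear combinations and needs a different check.

```python
def separation_type(x, y):
    """Classify separation of one covariate x with respect to a binary response y."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    # Complete: the two groups do not overlap at all.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Quasi-complete: the groups touch only at a boundary value.
    if max(x0) <= min(x1) or max(x1) <= min(x0):
        return "quasi-complete"
    return "none"


# Data set from the slide.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 3, 4, 5, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(separation_type(x1, y), separation_type(x2, y))
```

In both situations the likelihood has no finite maximizer in β, which is why software reports very large estimates and standard errors.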

Slide 281


III.2 Sparse 2× 2×K tables

Slide 282


• As we saw before, we may not be interested in the XY marginal association. Instead, we should focus on the conditional association.

• Consider the logistic model for π(x, z) = P[Y = 1|x, z]:

$$\mathrm{logit}\{\pi(x, z)\} = \beta x + \beta^Z_k$$

x = 1/0 for active drug/placebo, k = 1, 2, 3, 4, 5 for 5 centers.

Common odds ratio $\theta_{XY|Z} = e^{\beta}$ across centers.

• SAS program and part of the output:

data fungal;
  input center trt y y0;
  n=y+y0;
  control=1-trt;
cards;
1 1 0 5
1 0 0 9
2 1 1 12
2 0 0 10
3 1 0 7
3 0 0 5
4 1 6 3
4 0 2 6
5 1 5 9
5 0 2 12
;

Slide 283


proc genmod;
  class center;
  model y/n = center trt / noint;
run;

*********************************************************************************

Analysis Of Maximum Likelihood Parameter Estimates

                              Standard       Wald 95%              Wald
Parameter      DF  Estimate      Error   Confidence Limits    Chi-Square  Pr > ChiSq
Intercept       0    0.0000     0.0000    0.0000     0.0000        .         .
center    1     1  -28.0221   213410.4   -418305   418248.7       0.00     0.9999
center    2     1   -4.2025     1.1891   -6.5331    -1.8720      12.49     0.0004
center    3     1  -27.9293   188688.5   -369851   369794.7       0.00     0.9999
center    4     1   -0.9592     0.6548   -2.2426     0.3242       2.15     0.1430
center    5     1   -2.0223     0.6700   -3.3354    -0.7092       9.11     0.0025
trt             1    1.5460     0.7017    0.1708     2.9212       4.85     0.0276
Scale           0    1.0000     0.0000    1.0000     1.0000

• From the output, we know that for centers 1 & 3, $\hat\beta^Z_k = -\infty$.

• $\hat\beta = 1.546$, $SE(\hat\beta) = 0.702$, p-value from the Wald test = 0.0276. May not be valid!

Slide 284


IV Conditional Logistic Models and Exact Inference

IV.1 Conditional logistic regression for 2× 2×K tables

• If the number of centers K is large in the previous common odds-ratio example:

$$\mathrm{logit}\{\pi(x, z)\} = \beta x + \beta^Z_k, \quad z = 1, 2, ..., K,$$

then there will be too many $\beta^Z_k$’s and the ML inference on β may not be valid.

• Idea: find sufficient statistics for the center effects $\beta^Z_k$ (written βk below) and conduct inference on β based on the conditional distribution of the data given those sufficient statistics.

Slide 285


• Data from center k (Z = k):

                     Y
                 S        F
X    trt        n11k     n12k     n1+k
     control    n21k     n22k     n2+k

• It can be shown that n+1k = n11k + n21k (total # of successes at center k) is a sufficient statistic for βk.

⇒ Lk(β, βk|n+1k) = Lk(β|n+1k) should be free of βk: it is a non-central hypergeometric distribution.

When β = 0 (X ⊥ Y|Z), Lk(β|n+1k) is the standard hypergeometric distribution with no unknown parameter.

Slide 286


• The conditional logistic inference (on β) is based on the conditional likelihood

$$L_c(\beta \mid \{n_{+1k}\}) = \prod_{k=1}^{K} L_k(\beta, \beta_k \mid n_{+1k}),$$

which has only one parameter β no matter how large K is!

Treating this as a regular likelihood function, we can estimate β by maximizing Lc(β|{n+1k}). We can also conduct the Wald, score and LRT tests of H0 : β = 0.
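As an illustration of maximizing the conditional likelihood, the sketch below computes the conditional ML estimate of β for the five-center fungal data. It assumes the standard non-central hypergeometric form of Lk, in which n11k = t has weight C(n1+k, t)·C(n2+k, n+1k − t)·e^{βt}, and solves the score equation (observed minus conditionally expected total of n11k) by bisection; the function names are hypothetical.

```python
import math

# One tuple per center: (n11k, n1+k, n2+k, n+1k), built from the fungal data step.
strata = [(0, 5, 9, 0), (1, 13, 10, 1), (0, 7, 5, 0), (6, 9, 8, 8), (5, 14, 14, 7)]


def conditional_mean(beta, n1, n2, s):
    """E[n11k | margins] under the non-central hypergeometric distribution."""
    support = range(max(0, s - n2), min(n1, s) + 1)
    weights = [math.comb(n1, t) * math.comb(n2, s - t) * math.exp(beta * t) for t in support]
    return sum(t * w for t, w in zip(support, weights)) / sum(weights)


def conditional_score(beta):
    """Derivative of the conditional log likelihood: observed minus expected."""
    return sum(a - conditional_mean(beta, n1, n2, s) for a, n1, n2, s in strata)


# The score is decreasing in beta, so bisection finds its unique root.
lo, hi = -10.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if conditional_score(mid) > 0:
        lo = mid
    else:
        hi = mid
beta_cmle = (lo + hi) / 2
print(round(beta_cmle, 4), round(math.exp(beta_cmle), 3))
```

Centers 1 and 3 have no successes, so their conditional distributions are degenerate and they contribute nothing to the score; only centers 2, 4 and 5 carry information about β.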

Slide 287


• SAS program and output:

title "Use a conditional logistic regression to assess treatment effect";
proc logistic data=fungal;
  class center;
  model y/n = trt;
  strata center;
run;

********************************************************************************

The LOGISTIC Procedure

Conditional Analysis

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio         5.2269     1        0.0222
Score                    5.0170     1        0.0251
Wald                     4.6507     1        0.0310

Analysis of Conditional Maximum Likelihood Estimates

                          Standard        Wald
Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
trt          1    1.4706    0.6819      4.6507      0.0310

• However, since the tables are sparse, all three tests may not be valid ⇒ exact conditional inference!

Slide 288


IV.2 Exact conditional inference for 2× 2×K tables

• With common odds-ratio model for 2× 2×K tables

logit{π(x, z)} = βx+ βZk , z = 1, 2, ...,K

The conditional likelihood of β only depends on β.

• Under H0 : β = 0(X ⊥ Y |Z), the conditional likelihood Lk(β|n+1k) is

completely known, and is equal to the conditional distribution of n11k

given all the margins – hypergeometric dist.

• We can conduct exact inference for H0 : β = 0(X ⊥ Y |Z) using this

hypergeometric dist.
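The exact test can be sketched by building the null distribution of T = Σk n11k as a convolution of the per-center hypergeometric distributions. The code below does this for the five-center fungal data with exact fractions. The two-sided p-value is taken, as an assumption about the definition, to be the total probability of all outcomes whose score statistic is at least as large as the observed one, and the mid-p value subtracts half the probability of the observed outcome.

```python
from fractions import Fraction
from math import comb

# One tuple per center: (n11k, n1+k, n2+k, n+1k), built from the fungal data step.
strata = [(0, 5, 9, 0), (1, 13, 10, 1), (0, 7, 5, 0), (6, 9, 8, 8), (5, 14, 14, 7)]


def hypergeometric(n1, n2, s):
    """Null distribution of n11k given all margins of one 2 x 2 table."""
    total = comb(n1 + n2, s)
    return {t: Fraction(comb(n1, t) * comb(n2, s - t), total)
            for t in range(max(0, s - n2), min(n1, s) + 1)}


# Convolve the K independent hypergeometric distributions.
dist = {0: Fraction(1)}
for _, n1, n2, s in strata:
    new = {}
    for t0, p0 in dist.items():
        for t1, p1 in hypergeometric(n1, n2, s).items():
            new[t0 + t1] = new.get(t0 + t1, Fraction(0)) + p0 * p1
    dist = new

t_obs = sum(a for a, _, _, _ in strata)
mean = sum(t * p for t, p in dist.items())
p_obs = dist[t_obs]
# Score statistic is (t - mean)^2 / variance, so ordering by (t - mean)^2 suffices.
p_exact = sum(p for t, p in dist.items() if (t - mean) ** 2 >= (t_obs - mean) ** 2)
p_mid = p_exact - p_obs / 2

print(t_obs, float(p_obs), float(p_exact), float(p_mid))
```

Because every probability is an exact fraction, the comparison of score statistics is free of floating-point ties, which matters when outcomes on the two tails are nearly equidistant from the mean.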

Slide 289


• SAS program and part of the output:

proc logistic data=fungal;
  class center / param=ref;
  model y/n = center trt;
  exact trt;
run;

*************************************************************************

The LOGISTIC Procedure

Exact Conditional Tests

                                        --- p-Value ---
Effect    Test           Statistic      Exact       Mid
trt       Score             5.0170     0.0333    0.0235
          Probability       0.0197     0.0333    0.0235

• Note: The above exact test is based on the conditional distribution of n11k given the margins, which is the distribution on which the CMH test is based, so it can be shown that the above exact score test is actually the exact CMH test! Compare this to the large-sample CMH test on the next slide.

data y1; set fungal;
  count=y;
  drop y0;
  y=1;
run;

Slide 290


data y0; set fungal;
  count=y0;
  drop y0;
  y=0;
run;

data new; set y1 y0;
run;

title "MH test for conditional independence and MH common OR";
proc freq data=new order=data;
  weight count;
  tables center*trt*y/nopercent norow nocol cmh;
run;

****************************************************************************

MH test for conditional independence and MH common OR

The FREQ Procedure

Summary Statistics for trt by y
Controlling for center

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF      Value      Prob
---------------------------------------------------------------
    1        Nonzero Correlation        1     5.0170    0.0251
    2        Row Mean Scores Differ     1     5.0170    0.0251
    3        General Association        1     5.0170    0.0251
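The large-sample CMH statistic in this output can be reproduced from the five 2 × 2 tables with the usual hypergeometric mean and variance of n11k. The sketch below assumes the standard CMH formula without continuity correction and uses erfc for the 1-df chi-square tail.

```python
import math

# One tuple per center: (n11k, n1+k, n2+k, n+1k), built from the fungal data step.
strata = [(0, 5, 9, 0), (1, 13, 10, 1), (0, 7, 5, 0), (6, 9, 8, 8), (5, 14, 14, 7)]

diff = 0.0       # sum over centers of n11k - E(n11k)
variance = 0.0   # sum over centers of Var(n11k)
for a, n1, n2, s in strata:
    total = n1 + n2
    diff += a - n1 * s / total
    variance += n1 * n2 * s * (total - s) / (total ** 2 * (total - 1))

cmh = diff ** 2 / variance
p_value = math.erfc(math.sqrt(cmh / 2))   # upper tail of chi-square with 1 df
print(round(cmh, 4), round(p_value, 4))
```

The statistic equals the score statistic of the conditional logistic model (5.0170), which is the numerical counterpart of the remark that the exact score test is the exact CMH test.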

Slide 291


IV.3 Other exact conditional tests in logistic models

• For a logistic model:

logit{π(x)} = α + β1x1 + β2x2 + · · · + βpxp

We can find a sufficient statistic for each βk, denoted by Tk. Suppose we would like to make exact conditional inference on βp, say. Then the exact inference can be based on the conditional likelihood

f(y1, y2, ..., yn | T1, T2, ..., Tp−1) = L(βp).

For the exact test of H0 : βp = 0, the conditional dist. of the data (Y1, Y2, ..., Yn) given T1, T2, ..., Tp−1 is completely known, so we can do an exact score test based on L(βp).

We can also construct an exact CI for βp based on L(βp).

Software:

  Proc Logistic; *may use "descending" for binary response;
    model y/n = x1 x2 x3 / link=logit;
    exact x3;
  run;
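For very small data sets, the conditioning idea above can be checked by brute force. The Python sketch below is an illustration under stated assumptions, not what PROC LOGISTIC does internally: it enumerates all 2^n response vectors, keeps those whose sufficient statistics for the nuisance parameters match the observed ones, and uses the fact that under H0 these vectors are equally likely. It conditions on the sum of the responses (the sufficient statistic for α) as well as on the Tk of the other covariates. The function name and the data are hypothetical, and the conditional distribution is assumed to be non-degenerate.

```python
from itertools import product

def exact_conditional_test(X, y, target):
    """Exact conditional score test of H0: beta_target = 0 (brute force).

    X: list of covariate rows without an intercept column,
    y: list of 0/1 responses, target: column index of the tested covariate.
    Only feasible for small n, since all 2**n response vectors are visited.
    """
    n, p = len(y), len(X[0])
    nuisance = [j for j in range(p) if j != target]

    def suff(yy):
        # sum(y) is sufficient for alpha; sum_i x_ij * y_i is sufficient for beta_j
        cond = (sum(yy),) + tuple(
            round(sum(X[i][j] * yy[i] for i in range(n)), 10) for j in nuisance)
        t = round(sum(X[i][target] * yy[i] for i in range(n)), 10)
        return cond, t

    cond_obs, t_obs = suff(y)
    counts = {}
    for yy in product((0, 1), repeat=n):
        cond, t = suff(yy)
        if cond == cond_obs:
            counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    dist = {t: c / total for t, c in counts.items()}   # null dist. of T_target

    mean = sum(t * q for t, q in dist.items())
    var = sum((t - mean) ** 2 * q for t, q in dist.items())

    def score(t):
        return (t - mean) ** 2 / var

    obs = score(t_obs)
    p_value = sum(q for t, q in dist.items() if score(t) >= obs - 1e-9)
    return obs, p_value, dist

# Hypothetical data: columns are (z, x); we test x while adjusting for z
X = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
stat, p_value, dist = exact_conditional_test(X, y, target=1)
print(stat, p_value)
```

With continuous covariates the conditioning set often contains only the observed vector, in which case the conditional distribution is degenerate and this sketch would divide by zero.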

Slide 292


• Fisher’s Exact Test: We can consider a logistic model

logit(P [Y = 1]) = α+ βx

for the following 2× 2 table:

                  Y
              1        0
X     1      y1     n1 − y1     n1
      0      y2     n2 − y2     n2

It can be shown that a sufficient statistic for α is y1 + y2, the column margin. Fisher’s exact test can then be carried out with

  Proc Logistic;
    model y/n = x / link=logit;
    exact x;
  run;
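As a cross-check on the conditioning argument, the two-sided Fisher p-value can be computed directly from the hypergeometric distribution of y1 given the column margin y1 + y2. This is a minimal Python sketch with a hypothetical function name and made-up counts; it uses the common "probability ordering" definition of the two-sided p-value, which is an assumption about which two-sided rule is wanted.

```python
from math import comb

def fisher_exact_two_sided(y1, n1, y2, n2):
    """Two-sided Fisher p-value for rows (y1, n1 - y1) and (y2, n2 - y2)."""
    m = y1 + y2                       # column margin, sufficient for alpha
    denom = comb(n1 + n2, m)
    pmf = {k: comb(n1, k) * comb(n2, m - k) / denom
           for k in range(max(0, m - n2), min(n1, m) + 1)}
    p_obs = pmf[y1]
    # sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: 7 of 10 successes versus 2 of 10
print(fisher_exact_two_sided(7, 10, 2, 10))
```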

Slide 293


• Exact Cochran-Armitage trend test: If there is only one ordinal

x (with score denoted by x), then we conduct the exact test for β = 0

in the following logistic regression:

logit{π(x)} = α+ βx.

It can be shown that the resulting exact score test is the exact

Cochran-Armitage trend test.

• Example: Mother’s alcohol consumption and infant malformation

Alcohol                          Malformation
Consumption (score)    Present (Y = 1)    Absent (Y = 0)
0 (0)                        48               17,066
< 1 (0.5)                    38               14,464
1−2 (1.5)                     5                  788
3−5 (4)                       1                  126
≥ 6 (7)                       1                   37

Slide 294


• SAS program and part of the output:

  data table2_7;
    input alcohol malform count @@;
    datalines;
  0   1 48   0   0 17066
  0.5 1 38   0.5 0 14464
  1.5 1 5    1.5 0 788
  4   1 1    4   0 126
  7   1 1    7   0 37
  ;

  title "Exact Cochran-Armitage trend test";
  proc logistic;
    freq count;
    model malform (event="1") = alcohol / link=logit;
    * equivalent to model malform (ref="0") = alcohol / link=logit;
    exact alcohol;
  run;

*************************************************************************

The LOGISTIC Procedure

Exact Conditional Tests

                                       --- p-Value ---
Effect     Test            Statistic    Exact      Mid

alcohol    Score              6.5699   0.0172   0.0158
           Probability       0.00291   0.0217   0.0202

The exact Cochran-Armitage trend test has p-value = 0.0172 (mid p-value = 0.0158) ⇒ significant evidence for alcohol effect on infant malformation!

Slide 295
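The score statistic in the output above can be reproduced by hand from the table. The Python sketch below is an illustration with a hypothetical function name: it computes T = Σ x·y, centers and scales it by its conditional mean and variance given the total number of malformations, and attaches the large-sample chi-square (1 df) p-value. It does not compute the exact p-value, which requires the full conditional distribution of T.

```python
from math import erfc, sqrt

def trend_score_test(scores, cases, noncases):
    """Score statistic for a linear trend in a 2 x K table.

    Returns (statistic, large-sample chi-square(1) p-value).
    """
    totals = [c + d for c, d in zip(cases, noncases)]
    n, m = sum(totals), sum(cases)
    xbar = sum(x * t for x, t in zip(scores, totals)) / n
    sxx = sum(t * (x - xbar) ** 2 for x, t in zip(scores, totals))
    t_obs = sum(x * c for x, c in zip(scores, cases))
    mean = m * xbar                              # E[T | sum of y = m] under H0
    var = m * (n - m) * sxx / (n * (n - 1))      # Var[T | sum of y = m] under H0
    stat = (t_obs - mean) ** 2 / var
    return stat, erfc(sqrt(stat / 2))            # upper tail of chi-square(1)

# Alcohol consumption and infant malformation data from the slide
scores = [0, 0.5, 1.5, 4, 7]
present = [48, 38, 5, 1, 1]
absent = [17066, 14464, 788, 126, 37]
stat, p_large = trend_score_test(scores, present, absent)
print(round(stat, 4), round(p_large, 4))
```

The statistic should agree closely with the 6.5699 in the SAS output. The large-sample p-value is smaller than the exact 0.0172, which is one reason to prefer the exact test here: the higher-consumption rows contain very few malformations.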


V Sample Size Calculation for Comparing Two Proportions

• Sample size calculation is usually posed as a hypothesis testing

problem. For comparing two success probabilities π1 and π2 from two

groups, the null hypothesis is H0 : π1 = π2 and the alternative is

Ha : π1 ≠ π2.

• Suppose we have data: y1 ∼ Bin(n1, π1) and y2 ∼ Bin(n2, π2), we would construct a test statistic

$$T = \frac{p_1 - p_2}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}},$$

where p1 = y1/n1, p2 = y2/n2, and reject H0 : π1 = π2 at level α if

|T| ≥ zα/2,

when both n1 and n2 are large.

Slide 296


• If we would like to have power 1− β to detect a difference δ = π1 − π2

(w.l.o.g, assume δ > 0), then we need

P [T ≥ zα/2|Ha : π1 − π2 = δ] = 1− β.

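• Assume equal sample size for each group: n1 = n2, then the above power statement leads to (approximately)

$$P\left[\left.\frac{p_1 - p_2 - \delta}{\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_1}} \ge z_{\alpha/2} - \frac{\delta}{\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_1}} \;\right|\; H_a\right] = 1-\beta$$

$$P\left[Z \ge z_{\alpha/2} - \frac{\delta\sqrt{n_1}}{\sqrt{\pi_1(1-\pi_1) + \pi_2(1-\pi_2)}}\right] = 1-\beta,$$

where Z ∼ N(0, 1).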

Slide 297


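$$\Rightarrow\quad z_{\alpha/2} - \frac{\delta\sqrt{n_1}}{\sqrt{\pi_1(1-\pi_1) + \pi_2(1-\pi_2)}} = -z_{\beta}$$

$$n_1 = n_2 = \frac{(z_{\alpha/2} + z_{\beta})^2\,[\pi_1(1-\pi_1) + \pi_2(1-\pi_2)]}{(\pi_1 - \pi_2)^2}.$$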

• For example, if we would like to detect Ha : π1 = 0.3, π2 = 0.2 with 90% power at level 0.05, then

$$n_1 = n_2 = \frac{(z_{0.05/2} + z_{0.1})^2\,[0.3(1-0.3) + 0.2(1-0.2)]}{(0.3-0.2)^2} = \frac{(1.96 + 1.28)^2\,[0.3(1-0.3) + 0.2(1-0.2)]}{(0.3-0.2)^2} = 388.4,$$

which is rounded up to 389 per group.

• Note: The textbook also discussed the sample size calculation in

detecting β for a logistic regression model (p.161-162).
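The sample size formula above and the power expression it was derived from are easy to evaluate numerically. The Python sketch below uses only the standard library; the function names are hypothetical, and it relies on the same normal approximation with unpooled variances as the derivation, so it is a check on the arithmetic and not a general-purpose power routine.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def n_per_group(pi1, pi2, alpha=0.05, power=0.90):
    """Unrounded per-group sample size from the formula on the slide."""
    z_a = Z.inv_cdf(1 - alpha / 2)        # z_{alpha/2}
    z_b = Z.inv_cdf(power)                # z_{beta}, with beta = 1 - power
    var_sum = pi1 * (1 - pi1) + pi2 * (1 - pi2)
    return (z_a + z_b) ** 2 * var_sum / (pi1 - pi2) ** 2

def approx_power(n, pi1, pi2, alpha=0.05):
    """Approximate power with n subjects per group, P[Z >= z_{alpha/2} - ...]."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    var_sum = pi1 * (1 - pi1) + pi2 * (1 - pi2)
    return 1 - Z.cdf(z_a - abs(pi1 - pi2) * sqrt(n) / sqrt(var_sum))

n = n_per_group(0.3, 0.2)
print(n, ceil(n))                         # unrounded value and per-group size
print(approx_power(ceil(n), 0.3, 0.2))    # power achieved at the rounded size
```

With unrounded normal quantiles the formula gives about 388.8 instead of the 388.4 obtained from 1.96 and 1.28; both round up to 389 per group.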

Slide 298