CHAPTER 5 ST 544, D. Zhang
5 Building and Applying Logistic
Regression Models
I Strategies in Model Selection
I.1 Number of x’s in a logistic regression model
• # of x’s that can be entered in the model:
Rule of thumb: # of outcomes of each type ([Y = 1] and [Y = 0]) per x ≥ 10.
• Need to be aware of collinearity in x’s.
Slide 248
I.2 Crab data revisited
• If we throw all independent variables into the logistic regression:
logit{π} = α + β1c1 + β2c2 + β3c3 + β4s1 + β5s2 + β6wt + β7width
The LRT for H0 : all β’s = 0 is 40.6 with df = 7 (p-value < 0.0001).
• However, only β̂2 is significantly different from 0! Something is wrong.
• Collinearity is an issue! Weight, width and color are correlated.
Slide 249
I.3 Variable selection
• Use traditional model selection procedures (used when p << n)
1. Forward selection (simple one + variant)
2. Backward elimination
3. Better to use LRT for variable selection
4. Can consider interactions (usually 2-way interactions)
• Use modern model selection procedures, usually in the form of
penalized likelihood (can handle p > n); a newer research area.
Slide 250
I.4 Backward elimination for crab data
The table indicates that model 5 (M3 on slide 241) may be considered
the final model.
Slide 251
I.5 Use AIC or BIC for model selection
• AIC formula (smaller, the better):
AIC = -2 (log likelihood - # of parameters in the model)
• AIC “penalizes a bigger model” by its complexity/size.
• For model 5 in Table 5.2, the SAS program and output:

data crab;
input color spine width satell weight;
weight=weight/1000;
color=color-1;
y=(satell>0);
n=1;
if color<4 then c=1; else c=0;
datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
...
;
Slide 252
proc genmod descending;
model y/n = width c / dist=bin;
run;
************************************************************************

Criteria For Assessing Goodness Of Fit

Criterion                   DF        Value     Value/DF
Deviance                   170     187.9579       1.1056
Scaled Deviance            170     187.9579       1.1056
Pearson Chi-Square         170     167.4557       0.9850
Scaled Pearson X2          170     167.4557       0.9850
Log Likelihood                     -93.9789
Full Log Likelihood                -93.9789
AIC (smaller is better)            193.9579
AICC (smaller is better)           194.0999
BIC (smaller is better)            203.4178
AIC = −2(−93.98− 3) = 193.96 ≈ 194.
• Note: Proc Genmod and Proc Logistic now do not produce the
Pearson χ2 and deviance for binary data anymore, unless
aggregate=(width c) is used, in which case their df = # of distinct
settings determined by width and c − # of parameters in the model.
In the above program, we tricked Proc Genmod by using the
events/trials syntax y/n so the procedure does not treat the data as binary.
Slide 253
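The AIC and BIC lines in the output are just this arithmetic; a quick Python check of it (a sketch of ours, not SAS; n = 173 crabs follows from the deviance df of 170 plus 3 parameters):

```python
import math

# Values taken from the Genmod output above: log-likelihood = -93.9789,
# 3 parameters (intercept, width, c), n = 173 observations.
def aic(loglik, n_params):
    """AIC = -2(log likelihood - # of parameters); smaller is better."""
    return -2.0 * (loglik - n_params)

def bic(loglik, n_params, n_obs):
    """BIC = -2*log likelihood + (# of parameters)*log(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

print(round(aic(-93.9789, 3), 2))       # 193.96, matching the output
print(round(bic(-93.9789, 3, 173), 2))  # 203.42, matching the output
```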
I.6 Summarizing predictive power, classification tables and ROC curves
• Suppose we have a binary response Yi = 1/0 (success/failure), xi a
vector of covariates:
π(xi) = P[Yi = 1 | xi]
logit{π(xi)} = xiᵀβ   (can have more than 1 x)
After we fit the model, we get β̂ ⇒ we get π̂i as
π̂i = exp(xiᵀβ̂) / (1 + exp(xiᵀβ̂)).
• Choose a known value π0 (e.g., π0 = 0.5), and predict Yi as
Ŷi = 1 if π̂i > π0, and Ŷi = 0 otherwise.
Slide 254
and then construct the table (classification table)

               Ŷ
              1     0
   Y    1    n11   n12
        0    n21   n22

The following two quantities tell us how good the prediction is:
sensitivity = n11/(n11 + n12),   specificity = n22/(n21 + n22).
• Using only one table with one π0 loses information.
• Solution: use many different values of π0 ⇒ many classification tables
⇒ many (sensitivity, specificity) pairs ⇒ plot sensitivity vs.
1 − specificity ⇒ the ROC (receiver operating characteristic) curve.
The area under the ROC curve summarizes the predictive power of the
model and is often called the c-index.
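The classification-table quantities are easy to compute directly; a small Python sketch (the helper `sens_spec` is our own, not part of the SAS analysis), using the 6-observation example on the next slide with the convention that Ŷ = 1 when π̂ ≥ π0:

```python
def sens_spec(y, pihat, pi0):
    """Sensitivity and specificity of the classification table at cutoff pi0.

    Predicts yhat = 1 when pihat >= pi0 (the "pi0-" convention used in the
    worked example that follows).
    """
    yhat = [1 if p >= pi0 else 0 for p in pihat]
    n11 = sum(1 for yi, yh in zip(y, yhat) if yi == 1 and yh == 1)
    n12 = sum(1 for yi, yh in zip(y, yhat) if yi == 1 and yh == 0)
    n21 = sum(1 for yi, yh in zip(y, yhat) if yi == 0 and yh == 1)
    n22 = sum(1 for yi, yh in zip(y, yhat) if yi == 0 and yh == 0)
    return n11 / (n11 + n12), n22 / (n21 + n22)

# The 6-observation example from the next slide:
y     = [1, 1, 1, 0, 0, 0]
pihat = [0.8, 0.6, 0.4, 0.7, 0.5, 0.3]
se, sp = sens_spec(y, pihat, 0.5)   # se = 2/3, sp = 1/3
```

Sweeping `pi0` over a grid of cutoffs gives the (sensitivity, 1 − specificity) points that trace out the ROC curve.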
Slide 255
• An example:
Y     π̂    Ŷ0.3−  Ŷ0.4−  Ŷ0.5−  Ŷ0.6−  Ŷ0.7−  Ŷ0.8−  Ŷ0.8+
1    0.8     1      1      1      1      1      1      0
1    0.6     1      1      1      1      0      0      0
1    0.4     1      1      0      0      0      0      0
0    0.7     1      1      1      1      1      0      0
0    0.5     1      1      1      0      0      0      0
0    0.3     1      0      0      0      0      0      0

(Ŷπ0− predicts 1 when π̂ ≥ π0; Ŷ0.8+ predicts 1 only when π̂ > 0.8.)
The corresponding classification tables (rows: actual Y = 1, 0;
columns: predicted Ŷ = 1, 0):

π0      n11  n12  n21  n22    se     sp
0.3−     3    0    3    0    3/3    0/3
0.4−     3    0    2    1    3/3    1/3
0.5−     2    1    2    1    2/3    1/3
0.6−     2    1    1    2    2/3    2/3
0.7−     1    2    1    2    1/3    2/3
0.8−     1    2    0    3    1/3    3/3
0.8+     0    3    0    3    0/3    3/3
Slide 256
ROC curve for the example
Slide 257
• The AUC for the above ROC curve:
AUC = 1 − 3/9 = 2/3
= proportion of concordant pairs in (Yi, π̂i) among all pairs with
different outcomes Yi.
# of pairs with different outcomes: 3 × 3 = 9.
# of concordant pairs: 3 + 2 + 1 = 6.
Slide 258
• If there are ties in the π̂i’s, some adjustment is needed. For example,
suppose the π̂i’s for one Yi = 1 and one Yi = 0 are the same (0.4):
Y     π̂    Ŷ0.4−  Ŷ0.5−  Ŷ0.6−  Ŷ0.7−  Ŷ0.8−  Ŷ0.8+
1    0.8     1      1      1      1      1      0
1    0.6     1      1      1      0      0      0
1    0.4     1      0      0      0      0      0
0    0.7     1      1      1      1      0      0
0    0.5     1      1      0      0      0      0
0    0.4     1      0      0      0      0      0
The corresponding classification tables are:
(rows: actual Y = 1, 0; columns: predicted Ŷ = 1, 0)

π0      n11  n12  n21  n22    se     sp
0.4−     3    0    3    0    3/3    0/3
0.5−     2    1    2    1    2/3    1/3
0.6−     2    1    1    2    2/3    2/3
0.7−     1    2    1    2    1/3    2/3
0.8−     1    2    0    3    1/3    3/3
0.8+     0    3    0    3    0/3    3/3
Slide 259
ROC curve when there are tied predictive probs
Slide 260
• AUC = 5.5/9.
9 = # of pairs with different outcomes.
5.5 = # of concordant pairs (5) + 0.5 × # of tied pairs in π̂i with
different outcomes (1).
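The pair-counting rule is easy to code; a Python sketch (our own helper, not SAS) reproducing both AUC values, 6/9 for the untied example and 5.5/9 with the tie:

```python
def auc_concordance(y, pihat):
    """AUC = (concordant + 0.5*tied) / (# of pairs with different outcomes)."""
    pos = [p for yi, p in zip(y, pihat) if yi == 1]
    neg = [p for yi, p in zip(y, pihat) if yi == 0]
    score = 0.0
    for p1 in pos:
        for p0 in neg:
            if p1 > p0:
                score += 1.0      # concordant pair
            elif p1 == p0:
                score += 0.5      # tied pair counts half
    return score / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0]
auc_no_ties = auc_concordance(y, [0.8, 0.6, 0.4, 0.7, 0.5, 0.3])  # 6/9
auc_ties    = auc_concordance(y, [0.8, 0.6, 0.4, 0.7, 0.5, 0.4])  # 5.5/9
```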
• Note: For binomial data, we need to decompose them into binary data.
There will be a lot of tied predicted probabilities.
• The program to get π̂i, the ROC curve and the c-index:

Proc logistic; * may need descending for binary y;
model y/n = x / outroc=roc;
output out=outpred predicted=pihat;
run;

title "ROC Plot";
symbol1 v=dot i=join;
proc gplot data=roc;
plot _sensit_*_1mspec_;
run;

Here the variable _1mspec_ means 1 minus specificity.
Slide 261
• SAS program and output for the logistic model for crab data:
M3 : logit{π(x, c)} = α+ β1c+ β2x
title "ROC Curve and c-index";
proc logistic descending;
model y = width c / link=logit outroc=roc;
output out=outpred predicted=pihat;
run;

proc plot data=roc;
plot _sensit_*_1mspec_;
run;
*************************************************************************
Analysis of Maximum Likelihood Estimates

                             Standard        Wald
Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
Intercept    1   -12.9795      2.7272      22.6502       <.0001
width        1     0.4782      0.1041      21.0841       <.0001
c            1     1.3005      0.5259       6.1162       0.0134

Association of Predicted Probabilities and Observed Responses

Percent Concordant    76.7    Somers’ D    0.544
Percent Discordant    22.3    Gamma        0.549
Percent Tied           0.9    Tau-a        0.252
Pairs                 6882    c            0.772
Slide 262
ROC curve from the model:
Plot of _SENSIT_*_1MSPEC_. Legend: A = 1 obs, B = 2 obs, etc.
[Line-printer ROC plot: Sensitivity (vertical axis) vs. 1 - Specificity (horizontal axis)]
Slide 263
II Model Checking for Logistic Models
II.1 LRT testing current model to more complex models
• Suppose we would like to see whether the logistic model (with only one x)
logit{π(x)} = α + βx
fits the data well. We can fit a more complex model such as
logit{π(x)} = α + β1x + β2x²
and test H0 : β2 = 0 using the Wald, score or LR test. The LRT is
usually preferred.
Slide 264
II.2 Goodness of fit using deviance and Pearson χ2 for grouped data
• For binomial data like the Snoring/Heart disease example:
                                          Heart Disease
                               x     Yes (yi)      No       ni
         Never                 0         24      1355     1379
Snoring  Occasionally          2         35       605      638
         Nearly every night    4         21       192      213
         Every night           5         30       224      254

where ni → ∞, we can use the deviance or Pearson χ2 to check the
goodness of fit of the logistic model
logit{π(x)} = α + βx.
Slide 265
• Treating the data as if from an I × 2 table, the deviance G2(M) of the
current model M can be shown to have the form
G2(M) = 2 Σ obs × log(obs/fitted)
and the Pearson χ2 has the form
χ2 = Σ (obs − fitted)2 / fitted,
where the summation is over the 2I cells (8 cells for the previous example).
• For the snoring/HD example, we know that the linear probability model
has a better fit than the logistic model.
Slide 266
II.3 Goodness of fit for ungrouped data, Hosmer-Lemeshow test
• After fitting the logistic regression model for binary data (binomial
data can be decomposed into binary data), group the data into g groups of
approximately the same size based on the estimated success probabilities:
Group 1:  y11, y12, ..., y1n1  with  π̂11, π̂12, ..., π̂1n1   (n1 obs)
Group 2:  y21, y22, ..., y2n2  with  π̂21, π̂22, ..., π̂2n2   (n2 obs)
...
Group g:  yg1, yg2, ..., ygng  with  π̂g1, π̂g2, ..., π̂gng   (ng obs)
Slide 267
• Then construct the following statistic:

Σ_{i=1}^{g} (Oi − Ei)² / [Ei(ni − Ei)/ni]  ~  χ²_{g−2} (roughly) under H0,

where Oi = Σ_{j=1}^{ni} yij and Ei = Σ_{j=1}^{ni} π̂ij,
when the # of distinct covariate patterns is large.
• This is the Hosmer-Lemeshow test of goodness-of-fit.
• The test can be obtained using

Proc Logistic;
model y/n = x1 x2 / lackfit;
Run;
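For intuition, the statistic itself can be sketched in a few lines of Python (our own illustration with made-up groups; in practice lackfit does the grouping and computation):

```python
def hosmer_lemeshow(groups):
    """Hosmer-Lemeshow statistic.

    `groups` is a list of (y_list, pihat_list) pairs, one per group.
    Each group contributes (O - E)^2 / (E*(n - E)/n), where O = sum of y,
    E = sum of pihat, and n is the group size.
    """
    stat = 0.0
    for ys, pis in groups:
        n = len(ys)
        observed = sum(ys)
        expected = sum(pis)
        stat += (observed - expected) ** 2 / (expected * (n - expected) / n)
    return stat

# Toy check: when the fitted probabilities match the observed rates exactly
# in every group, the statistic is 0 (perfect calibration).
groups = [([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]),
          ([1, 0, 0, 0], [0.25, 0.25, 0.25, 0.25])]
stat = hosmer_lemeshow(groups)   # 0.0 here
```

The statistic would then be compared to a χ² distribution with g − 2 df.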
Slide 268
II.4 Residuals from the logistic models
• Suppose we have data yi from Bin(ni, πi) and we fit the logistic model
logit(πi) = α + βxi.
After we get α̂, β̂ ⇒ π̂i:
π̂i = exp(α̂ + xiβ̂) / (1 + exp(α̂ + xiβ̂)).
• Pearson residual:
ei = (yi − niπ̂i) / √(niπ̂i(1 − π̂i))
• Standardized Pearson residual:
ei^st = (yi − niπ̂i)/SE = (yi − niπ̂i) / √(niπ̂i(1 − π̂i)(1 − hi)) = ei / √(1 − hi),
where hi is the ith diagonal element of the hat matrix.
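A small numeric sketch of these residuals in Python (the values y = 8, n = 10, π̂ = 0.5, h = 0.2 are made up for illustration):

```python
import math

def pearson_residual(y, n, pihat):
    """Pearson residual e = (y - n*pihat) / sqrt(n*pihat*(1 - pihat))."""
    return (y - n * pihat) / math.sqrt(n * pihat * (1 - pihat))

def std_pearson_residual(y, n, pihat, h):
    """Standardized Pearson residual e / sqrt(1 - h), h from the hat matrix."""
    return pearson_residual(y, n, pihat) / math.sqrt(1 - h)

# e.g. y = 8 successes out of n = 10 with fitted pihat = 0.5:
e = pearson_residual(8, 10, 0.5)   # (8 - 5)/sqrt(2.5), about 1.897
```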
Slide 269
• E(ei^st) ≈ 0, var(ei^st) ≈ 1 for large ni. So ei^st behaves like a N(0, 1)
random variable. A large ei^st (|ei^st| > 2) indicates a potential outlier.
• Plots of ei^st vs. xi or xiβ̂ may detect lack of fit.
• When ni = 1 (binary data), ei^st is not very informative.
• Note: Proc Logistic does not report ei^st. Need to use Proc
GenMod to get ei^st.
Slide 270
• Example 1: Residual plot for the crab data:
Model: logit(P[Y = 1|x, c]) = β0 + β1c1 + β2c2 + β3c3 + β4x

data crab;
input color spine width satell weight;
weight=weight/1000;
color=color-1;
satbin=(satell>0);
c1=(color=1); c2=(color=2); c3=(color=3); c4=(color=4);
s1=(spine=1); s2=(spine=2);
datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
4 3 24.8 0 2100
...

proc genmod data=crab descending;
model satbin = width c1 c2 c3 / dist=bin link=logit;
output out=resid ResRaw=ResRaw ResChi=ResChi StdReschi=StdReschi;
run;

data _null_; set resid;
file "crab_res";
put stdreschi width;
run;
Slide 271
Slide 272
• Example 2: Admission to Graduate School at UF in 1997-1998 (Table
5.5)
Let π(k, g) = P [admission|D = k,G = g] for department D = k and
gender G = g. We consider three models:
1. logit{π(k, g)} = Dk: Admission is independent of gender at each
department.
2. logit{π(k, g)} = Dk + Gg: the Admission-Gender association is the
same across departments.
3. logit{π(k, g)} = Gg: gives the marginal Admission-Gender association
collapsed over departments.
options ls=75 ps=100;

data admit;
input dept $ gender y yno;
n = y+yno;
male = gender-1;
cards;
anth 1 32 81
anth 2 21 41
astr 1 6 0
astr 2 3 8
chem 1 12 43
chem 2 34 110
Slide 273
...
title "Model 1: Logistic model assuming gender and admission are";
title2 "conditionally independent given department";
proc genmod;
class dept;
model y/n = dept / dist=bin link=logit;
output out=resid Resraw=Resraw Reschi=Reschi StdReschi=StdReschi;
run;

data resid; set resid;
keep dept male Resraw Reschi StdReschi;
run;

title "Residuals from Model 1";
proc print data=resid; run;

title "Model 2: Logistic model with homogeneous GA and DA association";
proc genmod data=admit;
class dept;
model y/n = dept male;
run;

title "Model 3: Logistic model for marginal GA association";
proc genmod data=admit;
model y/n = male;
run;
Slide 274
Part of the output:

Model 1: Logistic model assuming gender and admission are
conditionally independent given department

Criteria For Assessing Goodness Of Fit

Criterion             DF      Value   Value/DF
Deviance              23    44.7352     1.9450
Scaled Deviance       23    44.7352     1.9450
Pearson Chi-Square    23    40.8523     1.7762
Scaled Pearson X2     23    40.8523     1.7762

                                           Std
Obs  dept  male    Reschi     Resraw     Reschi
  1  anth    0   -0.45509   -2.22286   -0.76457
  2  anth    1    0.61438    2.22286    0.76457
  3  astr    0    2.30940    2.82353    2.87096
  4  astr    1   -1.70561   -2.82353   -2.87096
  5  chem    0   -0.22824   -0.71357   -0.26830
  6  chem    1    0.14105    0.71357    0.26830
  7  clas    0   -0.75593   -0.50000   -1.06904
  8  clas    1    0.75593    0.50000    1.06904
  9  comm    0   -0.16670   -1.04167   -0.63260
 10  comm    1    0.61024    1.04167    0.63260
 11  comp    0    0.85488    1.63636    1.15752
 12  comp    1   -0.78040   -1.63636   -1.15752
 13  engl    0    0.67452    3.32130    0.94209
 14  engl    1   -0.65769   -3.32130   -0.94209
 15  geog    0    1.79629    2.75000    2.16641
 16  geog    1   -1.21106   -2.75000   -2.16641
 17  geol    0   -0.21822   -0.30000   -0.26082
 18  geol    1    0.14286    0.30000    0.26082
 19  germ    0    0.89974    0.77273    1.88730
 20  germ    1   -1.65903   -0.77273   -1.88730
Slide 275
 21  hist    0   -0.14639   -0.31034   -0.17627
 22  hist    1    0.09820    0.31034    0.17627
 23  lati    0    1.22493    3.25676    1.64564
 24  lati    1   -1.09895   -3.25676   -1.64564
 25  ling    0    0.78403    2.13043    1.37298
 26  ling    1   -1.12711   -2.13043   -1.37298
 27  math    0    1.00845    3.30631    1.28844
 28  math    1   -0.80193   -3.30631   -1.28844
 29  phil    0    1.22474    1.00000    1.34164
 30  phil    1   -0.54772   -1.00000   -1.34164
 31  phys    0    1.17573    2.57576    1.32458
 32  phys    1   -0.61005   -2.57576   -1.32458
 33  poli    0   -0.18041   -0.68707   -0.23318
 34  poli    1    0.14772    0.68707    0.23318
 35  psyc    0   -1.16905   -2.41176   -2.27222
 36  psyc    1    1.94841    2.41176    2.27222
 37  reli    0    0.63246    0.75000    1.26491
 38  reli    1   -1.09545   -0.75000   -1.26491
 39  roma    0    0.05868    0.17647    0.13970
 40  roma    1   -0.12677   -0.17647   -0.13970
 41  soci    0    0.17272    0.56164    0.30123
 42  soci    1   -0.24679   -0.56164   -0.30123
 43  stat    0   -0.00960   -0.02439   -0.01229
 44  stat    1    0.00768    0.02439    0.01229
 45  zool    0   -1.23400   -3.10769   -1.75873
 46  zool    1    1.25314    3.10769    1.75873
Model 2: Logistic model with homogeneous GA and DA association

Criteria For Assessing Goodness Of Fit

Criterion             DF      Value   Value/DF
Deviance              22    42.3601     1.9255
Scaled Deviance       22    42.3601     1.9255
Pearson Chi-Square    22    38.9908     1.7723
Scaled Pearson X2     22    38.9908     1.7723
Slide 276
Analysis Of Maximum Likelihood Parameter Estimates

                            Standard       Wald 95%            Wald
Parameter     DF  Estimate     Error   Confidence Limits   Chi-Square
Intercept      1   -2.0323    0.2877   -2.5962   -1.4685       49.91
dept  anth     1    1.2585    0.3277    0.6162    1.9008       14.75
dept  astr     1    2.2622    0.5631    1.1586    3.3659       16.14
...
male           1   -0.1730    0.1123   -0.3932    0.0472        2.37
Model 3: Logistic model for marginal GA association

Criteria For Assessing Goodness Of Fit

Criterion             DF       Value   Value/DF
Deviance              44    449.3122    10.2116
Scaled Deviance       44    449.3122    10.2116
Pearson Chi-Square    44    409.4050     9.3047
Scaled Pearson X2     44    409.4050     9.3047

Analysis Of Maximum Likelihood Parameter Estimates

                            Standard   Wald 95% Confidence       Wald
Parameter     DF  Estimate     Error         Limits           Chi-Square
Intercept      1   -0.6455    0.0637    -0.7703   -0.5207        102.77
male           1    0.0662    0.0921    -0.1142    0.2467          0.52
Models 2 & 3 show Simpson’s paradox: given department, the male effect
on admission is negative (−0.173), but marginally it is positive (0.066).
Slide 277
• Example 3: Heart disease and blood pressure (Table 5.6, P. 151)

data HD;
input bp $ n y;
if bp="<117" then x=111.5;
else if bp="117-126" then x=121.5;
else if bp="127-136" then x=131.5;
else if bp="137-146" then x=141.5;
else if bp="147-156" then x=151.5;
else if bp="157-166" then x=161.5;
else if bp="167-186" then x=176.5;
else x=191.5;
cards;
<117    156  3
117-126 252 17
127-136 284 12
137-146 271 16
147-156 139 12
157-166  85  8
167-186  99 16
>186     43  8
;

proc genmod;
model y/n = x / dist=bin link=logit residual;
run;
Slide 278
Criteria For Assessing Goodness Of Fit

Criterion             DF     Value   Value/DF
Deviance               6    5.9092     0.9849
Scaled Deviance        6    5.9092     0.9849
Pearson Chi-Square     6    6.2899     1.0483
Scaled Pearson X2      6    6.2899     1.0483

Analysis Of Maximum Likelihood Parameter Estimates

                            Standard   Wald 95% Confidence       Wald
Parameter     DF  Estimate     Error         Limits           Chi-Square
Intercept      1   -6.0820    0.7243    -7.5017   -4.6624         70.51
x              1    0.0243    0.0048     0.0148    0.0338         25.25

                    Raw     Pearson   Deviance  Std Deviance  Std Pearson  Likelihood
Observation    Residual    Residual   Residual      Residual     Residual    Residual
          1   -2.194866   -0.979434  -1.061683     -1.198648    -1.105788   -1.179257
          2    6.393237    2.005705   1.851007      2.190384     2.374600    2.244720
          3   -3.072737   -0.813338  -0.841966     -0.978546    -0.945274   -0.970016
          4   -2.081617   -0.506730  -0.516230     -0.583485    -0.572747   -0.581169
          5    0.383640    0.117582   0.117002      0.125465     0.126087    0.125546
          6   -0.856987   -0.304247  -0.308775     -0.330927    -0.326074   -0.330303
          7    1.791237    0.513472   0.504966      0.641154     0.651955    0.645277
          8   -0.361958   -0.139464  -0.140243     -0.178337    -0.177346   -0.177959
Slide 279
III Sparse Data
III.1 Complete separation and quasi-complete separation
• Consider the following data set:

Obs  x1  x2  y
  1   1   2  0
  2   2   3  0
  3   3   4  0
  4   4   5  0
  5   5   5  1
  6   6   6  1
  7   7   7  1
  8   8   8  1
There is a complete separation in x1, and quasi-complete separation in
x2.
• What would happen if we fit
M1 : logit(πi) = α+ βx1i
and
M2 : logit(πi) = α+ βx2i?
Slide 280
Complete separation in x1:
If we fit M1, the ML estimates diverge: α̂ → −∞, β̂ → ∞ (the likelihood
has no maximum).
How about M2?
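The divergence under complete separation can be seen numerically: along the path α = −4.5β the logit is β(x1 − 4.5), which separates the two outcome groups, and the log-likelihood increases toward its supremum 0 as β grows, so no finite MLE exists. A Python sketch of this check (our own, not part of the course code):

```python
import math

# Data with complete separation in x1 (y = 0 for x1 <= 4, y = 1 for x1 >= 5).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
y  = [0, 0, 0, 0, 1, 1, 1, 1]

def loglik(alpha, beta):
    """Bernoulli log-likelihood for logit(pi_i) = alpha + beta*x1_i."""
    ll = 0.0
    for xi, yi in zip(x1, y):
        pi = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        ll += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return ll

# The log-likelihood keeps increasing toward 0 as beta grows along
# alpha = -4.5*beta, so alpha-hat -> -inf and beta-hat -> +inf.
lls = [loglik(-4.5 * b, b) for b in (1.0, 5.0, 10.0)]
```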
Slide 281
III.2 Sparse 2× 2×K tables
Slide 282
• As we saw before, we may not be interested in the XY marginal
association; instead we should focus on the conditional association.
• Consider the logistic model for π(x, z) = P[Y = 1|x, z]:
logit{π(x, z)} = βx + βZk,
x = 1/0 for active drug/placebo, k = 1, 2, 3, 4, 5 for the 5 centers.
Common odds ratio θXY|Z = e^β across centers.
• SAS program and part of the output:

data fungal;
input center trt y y0;
n=y+y0;
control=1-trt;
cards;
1 1 0  5
1 0 0  9
2 1 1 12
2 0 0 10
3 1 0  7
3 0 0  5
4 1 6  3
4 0 2  6
5 1 5  9
5 0 2 12
;
Slide 283
proc genmod;
class center;
model y/n = center trt / noint;
run;
*********************************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
                             Standard       Wald 95%              Wald
Parameter    DF   Estimate      Error   Confidence Limits   Chi-Square   Pr > ChiSq
Intercept     0     0.0000     0.0000     0.0000    0.0000          .           .
center  1     1   -28.0221   213410.4    -418305  418248.7       0.00      0.9999
center  2     1    -4.2025     1.1891    -6.5331   -1.8720      12.49      0.0004
center  3     1   -27.9293   188688.5    -369851  369794.7       0.00      0.9999
center  4     1    -0.9592     0.6548    -2.2426    0.3242       2.15      0.1430
center  5     1    -2.0223     0.6700    -3.3354   -0.7092       9.11      0.0025
trt           1     1.5460     0.7017     0.1708    2.9212       4.85      0.0276
Scale         0     1.0000     0.0000     1.0000    1.0000
• From the output, we know that for centers 1 & 3 (which have no
successes), β̂Zk → −∞.
• β̂ = 1.546, SE(β̂) = 0.702, p-value from the Wald test = 0.0276. May
not be valid!
Slide 284
IV Conditional Logistic Models and Exact Inference
IV.1 Conditional logistic regression for 2× 2×K tables
• If the number of centers K is large in the previous common odds-ratio
example:
logit{π(x, z)} = βx + βZk,  k = 1, 2, ..., K,
then there will be too many βZk’s and the ML inference on β may not
be valid.
• Idea: find sufficient statistics for the βk’s and conduct inference on β
based on the conditional distribution of the data given those sufficient
statistics.
Slide 285
• Data from center k (Z = k):

                 Y
                S      F
X   trt        n11k   n12k   n1+k
    control    n21k   n22k   n2+k
• It can be shown that n+1k = n11k + n21k (total # of successes at
center k) is a sufficient statistic for βk.
⇒ Lk(β, βk|n+1k) = Lk(β|n+1k) is free of βk, a non-central
hypergeometric dist.
When β = 0 (X ⊥ Y |Z), Lk(β|n+1k) is the standard hypergeometric
dist. with no unknown parameter.
Slide 286
• The conditional logistic inference (on β) is based on the conditional
likelihood:
Lc(β|{n+1k}) = ∏_{k=1}^{K} Lk(β|n+1k),
which only has one parameter β no matter how large K is!
Treating this as a regular likelihood function, we can estimate β by
maximizing Lc(β|{n+1k}). We can also conduct the Wald, score and
LR tests of H0 : β = 0.
Slide 287
• SAS program and output:

title "Use a conditional logistic regression to assess treatment effect";
proc logistic data=fungal;
class center;
model y/n = trt;
strata center;
run;
********************************************************************************
The LOGISTIC Procedure
Conditional Analysis
Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square   DF   Pr > ChiSq
Likelihood Ratio         5.2269    1       0.0222
Score                    5.0170    1       0.0251
Wald                     4.6507    1       0.0310

Analysis of Conditional Maximum Likelihood Estimates

                             Standard        Wald
Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
trt          1     1.4706      0.6819       4.6507       0.0310
• However, since the tables are sparse, all three tests may not be valid
⇒ exact conditional inference!
Slide 288
IV.2 Exact conditional inference for 2× 2×K tables
• With the common odds-ratio model for 2 × 2 × K tables
logit{π(x, z)} = βx + βZk,  k = 1, 2, ..., K,
the conditional likelihood depends only on β.
• Under H0 : β = 0 (X ⊥ Y |Z), the conditional likelihood Lk(β|n+1k) is
completely known, and is equal to the conditional distribution of n11k
given all the margins, the hypergeometric dist.
• We can conduct exact inference for H0 : β = 0 (X ⊥ Y |Z) using this
hypergeometric dist.
Slide 289
• SAS program and part of the output:

proc logistic data=fungal;
class center / param=ref;
model y/n = center trt;
exact trt;
run;
*************************************************************************
The LOGISTIC Procedure
Exact Conditional Tests
                                       --- p-Value ---
Effect   Test           Statistic     Exact       Mid
trt      Score             5.0170    0.0333    0.0235
         Probability       0.0197    0.0333    0.0235
• Note: Since the above exact test is based on the conditional dist. of
n11k given the margins, which is the distribution the CMH test is based
on, it can be shown that the above exact score test is actually the exact
CMH test! Compare this to the large-sample CMH test on the next slide.

data y1; set fungal;
count=y;
drop y0;
y=1;
run;
Slide 290
data y0; set fungal;
count=y0;
drop y0;
y=0;
run;

data new; set y1 y0; run;

title "MH test for conditional independence and MH common OR";
proc freq data=new order=data;
weight count;
tables center*trt*y / nopercent norow nocol cmh;
run;
****************************************************************************
MH test for conditional independence and MH common OR

The FREQ Procedure

Summary Statistics for trt by y
Controlling for center

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic   Alternative Hypothesis     DF    Value     Prob
-----------------------------------------------------------
    1       Nonzero Correlation         1   5.0170   0.0251
    2       Row Mean Scores Differ      1   5.0170   0.0251
    3       General Association         1   5.0170   0.0251
Slide 291
IV.3 Other exact conditional tests in logistic models
• For a logistic model
logit{π(x)} = α + β1x1 + β2x2 + · · · + βpxp,
we can find the sufficient statistic for each βk, denoted by Tk. Suppose
we would like to make exact conditional inference on βp, say; then the
exact inference can be based on
f(y1, y2, ..., yn | T1, T2, ..., Tp−1) = L(βp).
For an exact test of H0 : βp = 0, the cond. dist. of the data
(Y1, Y2, ..., Yn) given T1, T2, ..., Tp−1 is completely known. We can do
an exact score test based on L(βp).
We can also construct an exact CI for βp based on L(βp).
Software:

Proc Logistic; * may use "descending" for binary response;
model y/n = x1 x2 x3 / link=logit;
exact x3;
run;
Slide 292
• Fisher’s Exact Test: We can consider a logistic model
logit(P[Y = 1]) = α + βx
for the following 2 × 2 table:

               Y
              1           0
X    1       y1      n1 − y1      n1
     0       y2      n2 − y2      n2
It can be shown that a sufficient statistic for α is y1 + y2 (the column
margin). Then Fisher’s exact test can be achieved by

Proc Logistic;
model y/n = x / link=logit;
exact x;
run;
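Under the hood this is just the hypergeometric distribution on the 2 × 2 margins; here is a stdlib-only Python sketch of the two-sided Fisher exact p-value (our own illustration on a made-up table; SAS works from the same null distribution):

```python
from math import comb

def fisher_exact_two_sided(y1, n1, y2, n2):
    """Two-sided Fisher exact p-value for a 2x2 table with rows
    (y1, n1-y1) and (y2, n2-y2): sum the hypergeometric probabilities of
    all tables (with the same margins) no more probable than the observed
    one."""
    m = y1 + y2                      # column margin (sufficient stat for alpha)
    total = comb(n1 + n2, m)
    def pmf(t):                      # P(first row has t successes | margins)
        return comb(n1, t) * comb(n2, m - t) / total
    p_obs = pmf(y1)
    lo, hi = max(0, m - n2), min(m, n1)
    return sum(pmf(t) for t in range(lo, hi + 1) if pmf(t) <= p_obs + 1e-12)

# Hypothetical table: X=1 row (3, 1), X=0 row (1, 3); p = 34/70, about 0.486
p = fisher_exact_two_sided(3, 4, 1, 4)
```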
Slide 293
• Exact Cochran-Armitage trend test: If there is only one ordinal
x (with scores denoted by x), then we conduct the exact test of β = 0
in the following logistic regression:
logit{π(x)} = α + βx.
It can be shown that the resulting exact score test is the exact
Cochran-Armitage trend test.
• Example: Mother’s alcohol consumption and infant malformation
Alcohol                          Malformation
Consumption (score)   Present (Y = 1)   Absent (Y = 0)
0      (0)                    48             17,066
< 1    (0.5)                  38             14,464
1−2    (1.5)                   5                788
3−5    (4)                     1                126
≥ 6    (7)                     1                 37
Slide 294
• SAS program and part of the output:

data table2_7;
input alcohol malform count @@;
datalines;
0 1 48     0 0 17066
0.5 1 38   0.5 0 14464
1.5 1 5    1.5 0 788
4 1 1      4 0 126
7 1 1      7 0 37
;

title "Exact Cochran-Armitage trend test";
proc logistic;
freq count;
model malform (event="1") = alcohol / link=logit;
* equivalent to: model malform (ref="0") = alcohol / link=logit;
exact alcohol;
run;
*************************************************************************
The LOGISTIC Procedure
Exact Conditional Tests
                                        --- p-Value ---
Effect     Test           Statistic     Exact       Mid
alcohol    Score             6.5699    0.0172    0.0158
           Probability      0.00291    0.0217    0.0202
The exact Cochran-Armitage trend test has p-value = 0.0172 (mid
p-value = 0.0158) ⇒ significant evidence for an alcohol effect on infant
malformation!
Slide 295
V Sample Size Calculation for Comparing Two Proportions
• Sample size calculation is usually posed as a hypothesis testing
problem. For comparing two success probabilities π1 and π2 from two
groups, the null hypothesis is H0 : π1 = π2 and the alternative is
Ha : π1 ≠ π2.
• Suppose we have data y1 ~ Bin(n1, π1) and y2 ~ Bin(n2, π2). We
construct the test statistic
T = (p1 − p2) / √(p1(1 − p1)/n1 + p2(1 − p2)/n2),
where p1 = y1/n1, p2 = y2/n2, and reject H0 : π1 = π2 at level α if
|T| ≥ zα/2,
when both n1 and n2 are large.
Slide 296
• If we would like to have power 1− β to detect a difference δ = π1 − π2
(w.l.o.g, assume δ > 0), then we need
P [T ≥ zα/2|Ha : π1 − π2 = δ] = 1− β.
• Assume equal sample size for each group: n1 = n2. Then the above
power statement leads to (approximately)

P[ (p1 − p2 − δ) / √(π1(1 − π1)/n1 + π2(1 − π2)/n1)
   ≥ zα/2 − δ / √(π1(1 − π1)/n1 + π2(1 − π2)/n1) | Ha ] = 1 − β

⇒

P[ Z ≥ zα/2 − δ√n1 / √(π1(1 − π1) + π2(1 − π2)) ] = 1 − β,

where Z ~ N(0, 1).
Slide 297
⇒
zα/2 − δ√n1 / √(π1(1 − π1) + π2(1 − π2)) = −zβ
⇒
n1 = n2 = (zα/2 + zβ)² [π1(1 − π1) + π2(1 − π2)] / (π1 − π2)².

• For example, if we would like to detect Ha : π1 = 0.3, π2 = 0.2 with
90% power at level 0.05, then
n1 = n2 = (z0.025 + z0.1)² [0.3(1 − 0.3) + 0.2(1 − 0.2)] / (0.3 − 0.2)²
        = (1.96 + 1.28)² [0.3(0.7) + 0.2(0.8)] / (0.1)² ≈ 388.4, round up to 389.
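The final formula is a one-liner; a Python sketch using the rounded quantiles z0.025 ≈ 1.96 and z0.1 ≈ 1.28 from the slide (exact normal quantiles give essentially the same n):

```python
import math

def n_per_group(pi1, pi2, z_alpha2, z_beta):
    """Per-group sample size for the two-proportion test:
    n = (z_{alpha/2} + z_beta)^2 [pi1(1-pi1) + pi2(1-pi2)] / (pi1 - pi2)^2,
    rounded up to the next integer."""
    n = (z_alpha2 + z_beta) ** 2 * (pi1 * (1 - pi1) + pi2 * (1 - pi2)) \
        / (pi1 - pi2) ** 2
    return math.ceil(n)

# 90% power at level 0.05 to detect pi1 = 0.3 vs. pi2 = 0.2:
n = n_per_group(0.3, 0.2, 1.96, 1.28)   # 388.4 -> 389 per group
```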
• Note: The textbook also discusses the sample size calculation for
detecting β in a logistic regression model (pp. 161-162).
Slide 298