April 11
• Logistic Regression
– Modeling interactions
– Analysis of case-control studies
– Data presentation
Subgroup Analyses
Journal Tables

             Treatment A   Treatment B   OR
Overall      100           150           0.67
Men          40            90            x.xx
Women        60            60            x.xx
Age <50      25            30            x.xx
Age 51-60    35            50            x.xx
Age 60+      40            70            x.xx
SBP < 160    40            70            x.xx
SBP ≥ 160    60            80            x.xx

Is there any evidence that the effect of treatment differs among subgroups?
TOMHS Example
• Question: Does the effect of active BP treatment on CVD differ for young versus older persons?
• Looking at an interaction effect (effect modification)
• Compare
– Relative odds of CVD (treatment vs. placebo) in younger patients
– Relative odds of CVD (treatment vs. placebo) in older patients
Logistic regression equation
Model log odds of outcome as a linear function of one or more variables
The model is:
$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$$
π = probability of the outcome
X_i = predictors, independent variables
β_i is the increase in log odds for a 1-unit increase in X_i
exp(β_i) is the relative odds for a 1-unit increase in X_i
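As a minimal sketch of how these coefficients are read, the Python below converts a log odds to a probability and a coefficient to relative odds. The coefficient values are made up for illustration and are not estimates from any study.

```python
import math

def log_odds(beta0, betas, xs):
    """Linear predictor: beta0 + beta1*x1 + beta2*x2 + ..."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def probability(lo):
    """Invert log(pi / (1 - pi)) = lo to recover pi."""
    return 1.0 / (1.0 + math.exp(-lo))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -2.0, 0.5

lo_x0 = log_odds(beta0, [beta1], [0])
lo_x1 = log_odds(beta0, [beta1], [1])

# A 1-unit increase in X adds beta1 to the log odds ...
diff = lo_x1 - lo_x0
# ... and multiplies the odds by exp(beta1).
relative_odds = math.exp(lo_x1) / math.exp(lo_x0)

print(f"log-odds difference = {diff:.3f}")
print(f"relative odds       = {relative_odds:.3f}")
print(f"P(outcome | X=0)    = {probability(lo_x0):.3f}")
print(f"P(outcome | X=1)    = {probability(lo_x1):.3f}")
```

The difference in probabilities depends on the starting point, while the relative odds do not, which is why the slides report effects as relative odds.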
Logistic Model For Interaction
X1 = 1 for active treatment and 0 for placebo
X2 = 1 for age ≥ 55 and 0 for age < 55
X3 = X1 * X2
$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
Log Odds (placebo, young) = β0
Log Odds (active, young) = β0 + β1
Log Odds (placebo, old) = β0 + β2
Log Odds (active, old) = β0 + β1 + β2 + β3
Dif = β1; exp(β1) is the relative odds (A v P) for young
Dif = β1 + β3; exp(β1 + β3) is the relative odds (A v P) for old
What does β3 mean?
$$\frac{\text{Odds (A v P) for old}}{\text{Odds (A v P) for young}} = \frac{\exp(\beta_1 + \beta_3)}{\exp(\beta_1)} = \exp(\beta_3)$$
A ratio of ratios!
Interaction Hypothesis
$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
H0: β3 = 0
Ha: β3 ≠ 0
Test in SAS just like any other coefficient
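The Wald test of the interaction coefficient can be checked by hand. The sketch below uses the active_old estimate and standard error reported in the TOMHS PROC LOGISTIC output (0.7771 and 0.4395); the variable names are my own, and the p-value is computed from the standard normal distribution, which is equivalent to a 1-df chi-square.

```python
import math

# Estimate and standard error for active_old (beta3), copied from the
# TOMHS PROC LOGISTIC output in these slides.
b3, se_b3 = 0.7771, 0.4395

z = b3 / se_b3
wald_chi_square = z ** 2

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z               = {z:.4f}")
print(f"Wald chi-square = {wald_chi_square:.4f}")
print(f"Pr > ChiSq      = {p_value:.4f}")
```

The results should agree with the Wald Chi-Square and Pr > ChiSq columns for active_old up to rounding of the inputs.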
TOMHS: Overall Effect of Active Treatment
PROC MEANS DATA=temp N MEAN SUM;
  CLASS active;
  VAR cvd;
RUN;

Analysis Variable : cvd

            N
active    Obs     N        Mean         Sum
===========================================
     0    234   234   0.1623932  38.0000000
     1    668   668   0.1107784  74.0000000
===========================================
Active: 74/668 or 11.1%
Placebo: 38/234 or 16.2%
RR = 0.68 (32% lower rate of CVD with active treatment)
OVERALL (ACTIVE VERSUS PLACEBO)

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                          Standard        Wald
Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
Intercept    1   -1.6405    0.1773     85.6626      <.0001
active       1   -0.4423    0.2159      4.1964      0.0405

Odds Ratio Estimates

           Point        95% Wald
Effect  Estimate  Confidence Limits
active     0.643     0.421    0.981
Active group has 36% lower odds of CVD compared to placebo.
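The relative risk and odds ratio above can be reproduced from the PROC MEANS counts, and the Wald confidence limits from the coefficient and its standard error. This is a hand check in Python, not part of the SAS run; the variable names are my own.

```python
import math

# Counts from the PROC MEANS output: active = 1 has 74 CVD events among 668,
# active = 0 (placebo) has 38 CVD events among 234.
cvd_active, n_active = 74, 668
cvd_placebo, n_placebo = 38, 234

risk_active = cvd_active / n_active
risk_placebo = cvd_placebo / n_placebo
relative_risk = risk_active / risk_placebo

odds_active = cvd_active / (n_active - cvd_active)
odds_placebo = cvd_placebo / (n_placebo - cvd_placebo)
odds_ratio = odds_active / odds_placebo

# Same odds ratio, plus 95% Wald limits, from the reported coefficient and SE.
b1, se_b1 = -0.4423, 0.2159
or_from_beta = math.exp(b1)
ci_low = math.exp(b1 - 1.96 * se_b1)
ci_high = math.exp(b1 + 1.96 * se_b1)

print(f"RR = {relative_risk:.3f}")
print(f"OR from counts = {odds_ratio:.3f}")
print(f"OR from beta   = {or_from_beta:.3f} ({ci_low:.3f}, {ci_high:.3f})")
```

With a single binary predictor the fitted odds ratio equals the odds ratio from the 2x2 table, so the two calculations should agree to rounding.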
Reading Data and Creating Indicator Variables and Interaction Variable
LIBNAME tomhs 'C:/';
DATA temp;
  SET tomhs.bpstudy;
  cvd = second;
  IF group = 6 THEN active = 0; ELSE active = 1;
  IF age < 55 THEN old = 0; ELSE old = 1;
  * compute interaction term (x3);
  active_old = active*old;
RUN;

* Get simple counts and proportions first;
PROC MEANS DATA=temp N MEAN SUM;
  CLASS old active;
  VAR cvd;
RUN;
The MEANS Procedure

Analysis Variable : cvd

                  N
old   active    Obs     N        Mean         Sum
=================================================
  0        0    115   115   0.1565217  18.0000000
           1    350   350   0.0714286  25.0000000
  1        0    119   119   0.1680672  20.0000000
           1    318   318   0.1540881  49.0000000
It appears the effect of treatment is mostly in younger patients
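To put numbers on that impression, the odds ratio (active vs. placebo) can be computed within each age group directly from the counts in the PROC MEANS table. The helper function below is my own naming, written as a quick hand check.

```python
def odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Odds ratio (treated vs. control) from event counts and group sizes."""
    odds_trt = events_trt / (n_trt - events_trt)
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    return odds_trt / odds_ctl

# Counts from the PROC MEANS output, by age group.
or_young = odds_ratio(25, 350, 18, 115)   # old = 0
or_old = odds_ratio(49, 318, 20, 119)     # old = 1

# Ratio of the two odds ratios: the quantity the interaction term estimates.
ratio_of_ratios = or_old / or_young

print(f"OR (A v P), young = {or_young:.4f}")
print(f"OR (A v P), old   = {or_old:.4f}")
print(f"Ratio of ORs      = {ratio_of_ratios:.4f}")
```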
PROC LOGISTIC DATA=temp DESCENDING;
  MODEL cvd = active old active_old;
  CONTRAST 'A v P (Young)' active 1 / ESTIMATE=BOTH;
  CONTRAST 'A v P (Old)' active 1 active_old 1 / ESTIMATE=BOTH;
  * Will give us beta1 + beta3;
RUN;
SAS OUTPUT

Response Profile

Ordered          Total
  Value  cvd  Frequency
      1    1        112
      2    0        790

Probability modeled is cvd=1.

Testing Global Null Hypothesis: BETA=0

Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio     15.7787   3      0.0013
Score                14.7851   3      0.0020
Wald                 14.0735   3      0.0028
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                           Standard        Wald
Parameter    DF  Estimate     Error  Chi-Square  Pr > ChiSq
Intercept     1   -1.6843    0.2566     43.0730      <.0001
active        1   -0.8806    0.3301      7.1180      0.0076
old           1    0.0850    0.3549      0.0573      0.8108
active_old    1    0.7771    0.4395      3.1261      0.0770

Odds Ratio Estimates

               Point        95% Wald
Effect      Estimate  Confidence Limits
active         0.415     0.217    0.792
old            1.089     0.543    2.183
active_old     2.175     0.919    5.147

b1 = active, b2 = old, b3 = active_old
Odds CVD (A v P) for younger patients = exp(b1) = 0.415
Odds CVD (A v P) for older patients = exp(b1 + b3) = exp(-0.1035) = 0.90
Ratio of Odds Ratios: exp(b3) = 2.175 ≈ 0.90/0.415
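As a check on how the four coefficients fit together, the sketch below rebuilds the log odds for each treatment-by-age cell from the estimates above and converts them back to proportions. Because this model has one parameter per cell, the fitted proportions should match the PROC MEANS proportions up to rounding. The function names are my own.

```python
import math

# Estimates from the PROC LOGISTIC output (Intercept, active, old, active_old).
b0, b1, b2, b3 = -1.6843, -0.8806, 0.0850, 0.7771

def cell_log_odds(active, old):
    """Log odds of CVD for one cell; the interaction term is active*old."""
    return b0 + b1 * active + b2 * old + b3 * active * old

def probability(lo):
    """Convert a log odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-lo))

for old in (0, 1):
    for active in (0, 1):
        lo = cell_log_odds(active, old)
        print(f"old={old} active={active}: log odds={lo:.4f}, "
              f"fitted proportion={probability(lo):.4f}")

# Treatment effect within each age group, and their ratio.
or_young = math.exp(cell_log_odds(1, 0) - cell_log_odds(0, 0))  # exp(b1)
or_old = math.exp(cell_log_odds(1, 1) - cell_log_odds(0, 1))    # exp(b1 + b3)
print(f"OR young = {or_young:.3f}, OR old = {or_old:.3f}, "
      f"ratio = {or_old / or_young:.3f}")
```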
CONTRAST 'A v P (Old)' active 1 active_old 1 / ESTIMATE=BOTH;
Computes 1*beta1 + 0*beta2 + 1*beta3 = beta1 + beta3
Plus test and 95% CI

Contrast Rows Estimation and Testing Results

                                    Standard           Lower    Upper
Contrast       Type  Row  Estimate     Error  Alpha    Limit    Limit
A v P (Young)  PARM    1   -0.8806    0.3301   0.05  -1.5275  -0.2337
A v P (Young)  EXP     1    0.4145    0.1368   0.05   0.2171   0.7916
A v P (Old)    PARM    1   -0.1035    0.2902   0.05  -0.6723   0.4653
A v P (Old)    EXP     1    0.9017    0.2617   0.05   0.5105   1.5925
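The EXP rows can be recovered from the PARM rows. The sketch below exponentiates each estimate and its limits, and approximates the standard error on the odds-ratio scale as OR times the SE of the log odds ratio (a delta-method approximation). That SAS computes the EXP standard error this way is my assumption; it is consistent with the numbers shown.

```python
import math

# PARM rows from the contrast output: (estimate, standard error).
contrasts = {
    "A v P (Young)": (-0.8806, 0.3301),
    "A v P (Old)": (-0.1035, 0.2902),
}

results = {}
for label, (estimate, se) in contrasts.items():
    odds_ratio = math.exp(estimate)
    results[label] = {
        "or": odds_ratio,
        # Delta-method approximation (assumed, see lead-in).
        "se_or": odds_ratio * se,
        "low": math.exp(estimate - 1.96 * se),
        "high": math.exp(estimate + 1.96 * se),
    }

for label, r in results.items():
    print(f"{label}: OR={r['or']:.4f} SE={r['se_or']:.4f} "
          f"95% CI=({r['low']:.4f}, {r['high']:.4f})")
```

Note that the standard error of beta1 + beta3 cannot be obtained from the two separate standard errors alone, since it also depends on their covariance; that is what the CONTRAST statement supplies.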
Description of Findings
Patients in the active group had 36% lower odds of CVD than the placebo group (OR: 0.64; 95% CI: 0.42-0.98). Analyses by age showed that the benefit of active treatment was greatest in younger patients. In patients under age 55, the odds of CVD were 58% lower with active treatment (OR: 0.42), whereas for patients aged 55 and over the odds were only 10% lower (OR: 0.90). The test for interaction between treatment and age approached significance (p = 0.07).
Logistic Regression for Case-Control Studies
• Same analyses as prospective study
• Outcome:
– Y = 1 is a case
– Y = 0 is a control
• Model log (odds) of being a case
• Odds ratios have same meaning
• Estimating probability of being a case not appropriate
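The last two points can be illustrated with a small calculation. The sketch below uses an entirely hypothetical source population, keeps every case, and samples 1 in 10 non-cases as controls regardless of exposure. The odds ratio is unchanged by this sampling, while the proportion of cases in the sample is set by the design rather than by the disease risk.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio for a 2x2 exposure-by-outcome table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

# Hypothetical source population (made-up counts).
cases_exposed, cases_unexposed = 200, 300
noncases_exposed, noncases_unexposed = 2000, 7500

population_or = odds_ratio(cases_exposed, cases_unexposed,
                           noncases_exposed, noncases_unexposed)

# Case-control sampling: all cases, 1 in 10 non-cases, independent of exposure.
controls_exposed = noncases_exposed // 10
controls_unexposed = noncases_unexposed // 10

sample_or = odds_ratio(cases_exposed, cases_unexposed,
                       controls_exposed, controls_unexposed)

n_cases = cases_exposed + cases_unexposed
population_case_fraction = n_cases / (n_cases + noncases_exposed + noncases_unexposed)
sample_case_fraction = n_cases / (n_cases + controls_exposed + controls_unexposed)

print(f"OR in population = {population_or:.2f}, OR in sample = {sample_or:.2f}")
print(f"Fraction of cases: population = {population_case_fraction:.3f}, "
      f"sample = {sample_case_fraction:.3f}")
```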
Example Colon Polyp Study
• Cases (N=574)
– Patients diagnosed with colorectal polyps from colonoscopy
• Controls (N=707)
– Patients clear of colorectal polyps from colonoscopy
• Risk Factors Under Study
– FH of colon cancer
– Smoking and alcohol
– Reproductive history factors
– Obesity and adiposity (waist-to-hip measures)
Example Colon Polyp Study
• Variables
– CC Status (1=case, 2=control)
– Age (years)
– FH colon cancer (1=Y, 0=N)
– Current Smoking (1=Y, 0=N)
– Gender (1=Men, 0=Women)
– Waist to Hip Ratio
• Variable Names
– CC, AGE, FHCC, SMOKERS, MEN, and WHIP
PROC LOGISTIC DATA=temp;
  MODEL cc = age fhcc smokers men whip;
  UNITS whip = 0.1;
RUN;

Response Profile

Ordered          Total
  Value  CC1  Frequency
      1    1        561
      2    2        690

Probability modeled is CC=1.

Testing Global Null Hypothesis: BETA=0

Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio    165.4379   5      <.0001
Score               155.7546   5      <.0001
Wald                139.8082   5      <.0001

Analysis of Maximum Likelihood Estimates

                          Standard        Wald
Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
Intercept    1   -4.0683    0.5953     46.7054      <.0001
AGE          1    0.0497   0.00618     64.8156      <.0001
FHCC         1   -0.4434    0.1505      8.6798      0.0032
smokers      1    0.5272    0.1623     10.5537      0.0012
men          1    0.8379    0.1503     31.0610      <.0001
WHIP         1    0.7491    0.6287      1.4197      0.2335

Odds Ratio Estimates

            Point        95% Wald
Effect   Estimate  Confidence Limits
AGE         1.051     1.038    1.064
FHCC        0.642     0.478    0.862
smokers     1.694     1.233    2.329
men         2.312     1.722    3.104
WHIP        2.115     0.617    7.253

UNITS whip = 0.1;

Effect    Unit  Estimate  95% Confidence Limits
WHIP    0.1000     1.078     0.953    1.219
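The UNITS line rescales the odds ratio to a 0.1-unit change in waist-to-hip ratio, which is a more realistic difference than a full unit. As a hand check in Python (variable names are my own), the rescaled estimate and limits follow from the WHIP coefficient and its standard error:

```python
import math

# WHIP coefficient and standard error from the output above.
b_whip, se_whip = 0.7491, 0.6287
unit = 0.1

# Odds ratio for a change of `unit` in WHIP is exp(unit * beta).
or_per_unit = math.exp(unit * b_whip)
ci_low = math.exp(unit * (b_whip - 1.96 * se_whip))
ci_high = math.exp(unit * (b_whip + 1.96 * se_whip))

print(f"OR per {unit} WHIP = {or_per_unit:.3f} ({ci_low:.3f}, {ci_high:.3f})")
```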
Interaction Model
• Is the relationship of waist to hip ratio different for men and women?
• Define interaction term
– whip * men
PROC LOGISTIC DATA=temp DESCENDING;
  MODEL cc = age fhcc smokers men whip whip_men;
RUN;

Analysis of Maximum Likelihood Estimates

                          Standard        Wald
Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
Intercept    1   -2.4771    0.8632      8.2349      0.0041
AGE          1    0.0511   0.00626     66.7467      <.0001
FHCC         1   -0.4528    0.1511      8.9866      0.0027
smokers      1    0.5487    0.1631     11.3203      0.0008
men          1   -2.5148    1.3103      3.6838      0.0549
WHIP         1   -1.2470    1.0235      1.4846      0.2231
whip_men     1    3.7225    1.4392      6.6897      0.0097

Odds Ratio Estimates

             Point        95% Wald
Effect    Estimate  Confidence Limits
AGE          1.052     1.040     1.065
FHCC         0.636     0.473     0.855
smokers      1.731     1.257     2.383
men          0.081     0.006     1.055
WHIP         0.287     0.039     2.136
whip_men    41.367     2.464   694.576
Note: the WHIP row (p = 0.2231) is the P-value for women (men = 0).
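With the interaction term in the model, the WHIP coefficient is the slope for women (men = 0) and WHIP + whip_men is the slope for men. The sketch below works this out from the estimates above and expresses each as an odds ratio per 0.1 unit of waist-to-hip ratio. The 0.1 unit is carried over from the earlier UNITS statement; a confidence interval for the men's slope is not attempted because it needs the covariance between the two coefficients, which the output does not show.

```python
import math

# Estimates from the interaction model output.
b_whip, b_whip_men = -1.2470, 3.7225

slope_women = b_whip                 # men = 0
slope_men = b_whip + b_whip_men      # men = 1

unit = 0.1
or_women = math.exp(unit * slope_women)
or_men = math.exp(unit * slope_men)

print(f"WHIP slope, women = {slope_women:.4f}; OR per {unit} = {or_women:.3f}")
print(f"WHIP slope, men   = {slope_men:.4f}; OR per {unit} = {or_men:.3f}")
```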
Some Practical Aspects for Analyses
• Divide the continuous variable of interest into 3-5 categories and compute relative odds for increasing categories.
• Summarize the trend with the beta coefficient from a model that uses the factor as a continuous variable.
Example Omega-3 Intake and CHD

Omega-3 Intake   CHD N   Odds Ratio (95% CI)
I                40      1.00
II               42      1.08 (0.80 – 1.45)
III              37      0.92 (0.68 – 1.32)
IV               35      0.89 (0.66 – 1.25)
V                24      0.61 (0.34 – 0.98)
Beta (SE)                0.30 (.15)
Advantages
• Can determine if risk increases linearly with increasing levels of factor
• No assumptions of pattern of risk when using categories
• Can determine if there is a threshold effect
• Eliminates possible effect of outliers
Analysis
• Create indicator variables for quintiles of omega-3 and run logistic regression
• Run regression using omega-3 as continuous variable
In Class Exercise
• Investigate whether the odds of CVD increase linearly with age
• Divide age into 4 categories
– < 50; 50-54; 55-59; 60+
• Two CVD endpoints:
– Clinical – major CVD
– Second – major + minor CVD
• Compute percent with CVD within each age category
• Run logistic regression with 3 indicator variables using < 50 as reference group
• Run logistic regression using age as continuous variable
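The coding step for the exercise can be sketched as follows. This covers only how the four age categories map to three indicator variables with < 50 as the reference group; the indicator names and the example ages are hypothetical, and the same logic would be written as IF/THEN statements in a SAS DATA step.

```python
def age_category(age):
    """Assign one of the four exercise categories."""
    if age < 50:
        return "<50"
    if age < 55:
        return "50-54"
    if age < 60:
        return "55-59"
    return "60+"

def age_indicators(age):
    """Three 0/1 indicators; all zero means the reference group (< 50)."""
    category = age_category(age)
    return {
        "age50_54": int(category == "50-54"),
        "age55_59": int(category == "55-59"),
        "age60plus": int(category == "60+"),
    }

# Hypothetical ages, chosen to hit each category boundary.
for age in [47, 50, 54, 55, 59, 60, 72]:
    print(age, age_category(age), age_indicators(age))
```

Each indicator's coefficient is then the log relative odds for that category versus the < 50 group, which can be compared with the single slope from the continuous-age model.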