
Source: publicifsv.sund.ku.dk/~pka/avepi18/day2-18.pdf

Mar 29, 2019


Ph.d. course in “Advanced statistical analysis of epidemiological studies”

Interaction

Dose-response models

www.biostat.ku.dk/~pka/avepi18

Clayton & Hills, Ch. 24-26

2 November 2018

Per Kragh Andersen


Hypothesis tests, ch. 24

Wald test for a single parameter:

$$W = \left(\frac{M - 0}{S}\right)^2 \sim \chi^2_1 \quad \text{(chi-square)}$$

directly based on computer output:

Table 24.1. Program output for the diet data.

Parameter Estimate, M SD, S W

Corner -5.4180 0.4420

Exposure (1) 0.8697 0.3080 7.97

Age (1) 0.1290 0.4753 0.07

Age (2) 0.6920 0.4614 2.25
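The W column of Table 24.1 can be recomputed from the estimates and standard deviations alone. The following is a minimal Python sketch; the function name `wald` and the dictionary layout are illustrative choices, not part of the course material.

```python
# Recompute the Wald statistics W = ((M - 0) / S)^2 from Table 24.1.
# The names wald and table_24_1 are illustrative assumptions.

def wald(estimate, sd, null_value=0.0):
    """Wald chi-square statistic (1 d.f.) for a single parameter."""
    return ((estimate - null_value) / sd) ** 2

# Parameter: (estimate M, standard deviation S), copied from Table 24.1.
table_24_1 = {
    "Exposure (1)": (0.8697, 0.3080),
    "Age (1)": (0.1290, 0.4753),
    "Age (2)": (0.6920, 0.4614),
}

wald_stats = {name: wald(m, s) for name, (m, s) in table_24_1.items()}
for name, w in wald_stats.items():
    print(f"{name}: W = {w:.2f}")
```

Each value is referred to a chi-square distribution with 1 d.f., as stated above.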


Likelihood ratio test: compare max. log likelihoods “under” and “outside” the hypothesis.

Test statistic = -2 × difference between max. log likelihoods

Model Max. log likelihood # parameters

Corner + Age + Exposure -247.03 4

Corner + Exposure -249.04 2

Corner + Age -251.18 3

# parameters removed = # d.f. in χ2

Exposure: 8.30, 1 d.f.

Age: 4.02, 2 d.f.

Models have to be “nested”: we cannot compare the last two models in the table.
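The two likelihood ratio tests above can be reproduced from the table of maximised log likelihoods. This is a sketch only: the helper names `lr_test` and `chi2_sf` are illustrative, and the p-value helper uses the closed-form chi-square tail for 1 and 2 d.f. rather than a statistics library.

```python
import math

# Model: (max. log likelihood, number of parameters), from the table above.
models = {
    "Corner + Age + Exposure": (-247.03, 4),
    "Corner + Exposure": (-249.04, 2),
    "Corner + Age": (-251.18, 3),
}

def lr_test(full, reduced):
    """LR statistic and d.f. for a reduced model nested in the full model."""
    ll_full, p_full = models[full]
    ll_red, p_red = models[reduced]
    return 2 * (ll_full - ll_red), p_full - p_red

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (closed forms for 1 and 2 d.f.)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles 1 or 2 d.f.")

lr_exposure, df_exposure = lr_test("Corner + Age + Exposure", "Corner + Age")
lr_age, df_age = lr_test("Corner + Age + Exposure", "Corner + Exposure")
p_exposure = chi2_sf(lr_exposure, df_exposure)
p_age = chi2_sf(lr_age, df_age)

print(f"Exposure: LR = {lr_exposure:.2f}, {df_exposure} d.f., p = {p_exposure:.4f}")
print(f"Age:      LR = {lr_age:.2f}, {df_age} d.f., p = {p_age:.4f}")
```

The function is only meaningful for nested pairs; it does not check nesting, so comparing "Corner + Exposure" with "Corner + Age" would return a number without any interpretation.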


Max. log likelihood: measure of goodness of fit of a model:

larger log likelihood =⇒ better fit

Interpretation of –247.03 ? No!

Sometimes, the deviance is introduced as a supplement to the max. log likelihood.


Interaction, sect. 24.3

We have assumed that the effect of exposure is constant over age bands (and vice versa).

Is that reasonable?

Or is there interaction between age and exposure?

log(Rate)=Corner + Exposure + Age + Exposure·Age

Introducing interaction terms into the regression model, we:

• get a quantification of heterogeneity

• are able to adjust for other explanatory variables when examining interaction


Table 24.5. Estimates of parameters in the model with interaction

Parameter               Estimate   SD
Corner                  -5.0237    0.500
Exposure(1)             -0.0258    0.866
Age(1)                  -0.5153    0.671
Age(2)                   0.3132    0.612
Age(1) · Exposure(1)     1.2720    1.020
Age(2) · Exposure(1)     0.8719    0.973

Test for no interaction: Max. log likelihood for

Corner + Age + Exposure + Age.Exposure

is -246.19 leading to the LR test 1.67 (2 d.f.)


Illustrative example without interaction

Table 22.4

         Exposure
Age      0             1
0        5.0           15.0
1        12.0          36.0
2        30.0          90.0

0        5.0           5.0 × 3.0
1        12.0          12.0 × 3.0
2        30.0          30.0 × 3.0

0        5.0           5.0 × 3.0
1        5.0 × 2.4     5.0 × 2.4 × 3.0
2        5.0 × 6.0     5.0 × 6.0 × 3.0

Corner = 5.0          Age(1) = 2.4
Exposure(1) = 3.0     Age(2) = 6.0
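As a small check of the multiplicative structure, the six rates in Table 22.4 can be rebuilt from the four parameters listed above. A minimal Python sketch, with variable names chosen for illustration:

```python
# Rebuild the rates of Table 22.4 from Corner, Exposure(1), Age(1) and Age(2).
corner = 5.0
exposure_effect = {0: 1.0, 1: 3.0}          # Exposure(1) = 3.0
age_effect = {0: 1.0, 1: 2.4, 2: 6.0}       # Age(1) = 2.4, Age(2) = 6.0

# Without interaction, each rate is the product of the three factors.
rates = {
    (age, exp): corner * age_effect[age] * exposure_effect[exp]
    for age in age_effect
    for exp in exposure_effect
}

for age in age_effect:
    ratio = rates[(age, 1)] / rates[(age, 0)]
    print(f"Age {age}: {rates[(age, 0)]:.1f} {rates[(age, 1)]:.1f}  rate ratio {ratio:.1f}")
```

The exposure rate ratio is the same in every age band, which is exactly what "no interaction" means on the multiplicative scale.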


Example: Illustrative values of rates with interaction

Table 24.2. Definition of interactions in terms of exposure

         Exposure
Age      0          1
0        5.0        15.0
1        12.0       42.0
2        30.0       135.0

0        5.0        5.0 × 3.0
1        12.0       12.0 × 3.5
2        30.0       30.0 × 4.5

0        5.0        5.0 × 3.0
1        12.0       12.0 × 3.0 × 1.167
2        30.0       30.0 × 3.0 × 1.5

(1.167 and 1.5: interaction parameters)


Example: Illustrative values of rates with interaction

Table 24.3. Definition of interactions in terms of age

         Exposure
Age      0             1
0        5.0           15.0
1        12.0          42.0
2        30.0          135.0

0        5.0           15.0
1        5.0 × 2.4     15.0 × 2.8
2        5.0 × 6.0     15.0 × 9.0

0        5.0           15.0
1        5.0 × 2.4     15.0 × 2.4 × 1.167
2        5.0 × 6.0     15.0 × 6.0 × 1.5

(1.167 and 1.5: interaction parameters)


Table 24.4. Definition of interactions in terms of exposure and age

         Exposure
Age      0             1
0        5.0           5.0 × 3.0
1        5.0 × 2.4     5.0 × 3.0 × 2.4 × 1.167
2        5.0 × 6.0     5.0 × 3.0 × 6.0 × 1.5
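The parameters in Table 24.4 can be recovered from the six illustrative rates (5.0, 15.0, 12.0, 42.0, 30.0, 135.0) by taking ratios. A minimal Python sketch, with names chosen for illustration:

```python
# Illustrative rates with interaction: age band -> (unexposed, exposed).
rates = {0: (5.0, 15.0), 1: (12.0, 42.0), 2: (30.0, 135.0)}

corner = rates[0][0]
exposure_1 = rates[0][1] / rates[0][0]             # exposure effect in age band 0
age = {a: rates[a][0] / corner for a in (1, 2)}    # age effects among the unexposed

# Interaction parameter: exposure rate ratio in band a relative to that in band 0.
interaction = {a: (rates[a][1] / rates[a][0]) / exposure_1 for a in (1, 2)}

print("Corner:", corner, "Exposure(1):", exposure_1)
print("Age:", {a: round(v, 3) for a, v in age.items()})
print("Interaction:", {a: round(v, 3) for a, v in interaction.items()})
```

The same interaction parameters result if the roles of age and exposure are swapped, which is why Tables 24.2 and 24.3 end with the same two numbers.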

Exercise 24.4.


Exercise 24.4: solution

Parameter               Estimate   SD
Corner                  -5.0237    0.500
Exposure(1)             -0.0258    0.866
Age(1)                  -0.5153    0.671
Age(2)                   0.3132    0.612
Age(1) · Exposure(1)     1.2720    1.020
Age(2) · Exposure(1)     0.8719    0.973

$$\log\frac{4}{607.9} = -5.0237, \qquad \log\frac{2/311.9}{4/607.9} = -0.0258$$

(except for rounding errors)
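The model with interaction has six parameters for six cells, so its estimates can be reproduced directly from the observed rates. The sketch below extends the two logarithms above to all six parameters, using the cases and person-years from the SAS data step that follows. Which cell serves as the reference (eksp=1, alder=2) is read off the SAS output; the variable names are illustrative.

```python
import math

# (eksp, alder, person-years, cases), as in the SAS data step.
data = [(0, 2, 311.9, 2), (0, 1, 878.1, 12), (0, 0, 667.5, 14),
        (1, 2, 607.9, 4), (1, 1, 1272.1, 5), (1, 0, 888.9, 8)]
log_rate = {(e, a): math.log(d / y) for e, a, y, d in data}

# Reference cell: eksp=1, alder=2. Exposure(1) is eksp=0;
# Age(1) is alder=1 and Age(2) is alder=0.
corner = log_rate[(1, 2)]
exposure_1 = log_rate[(0, 2)] - corner
age_1 = log_rate[(1, 1)] - corner
age_2 = log_rate[(1, 0)] - corner
age1_exposure1 = log_rate[(0, 1)] - corner - exposure_1 - age_1
age2_exposure1 = log_rate[(0, 0)] - corner - exposure_1 - age_2

for name, value in [("Corner", corner), ("Exposure(1)", exposure_1),
                    ("Age(1)", age_1), ("Age(2)", age_2),
                    ("Age(1).Exposure(1)", age1_exposure1),
                    ("Age(2).Exposure(1)", age2_exposure1)]:
    print(f"{name:20s} {value:8.4f}")
```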


SAS program

data ihd;
  input eksp alder pyrs cases;
  lpyrs=log(pyrs);   /* log person-years, used as offset below */
  datalines; /* or, alternatively, read from www */
0 2 311.9 2
0 1 878.1 12
0 0 667.5 14
1 2 607.9 4
1 1 1272.1 5
1 0 888.9 8
;
run;

/* Poisson regression with exposure-by-age interaction */
proc genmod data=ihd;
  class eksp alder;
  model cases=eksp alder eksp*alder/dist=poi offset=lpyrs type3;
run;


The GENMOD Procedure

Model Information

Data Set WORK.IHD

Distribution Poisson

Link Function Log

Dependent Variable cases

Offset Variable lpyrs

Observations Used 6

Class Level Information

Class Levels Values

eksp 2 0 1

alder 3 0 1 2


Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 0 0.0000 .

Scaled Deviance 0 0.0000 .

Pearson Chi-Square 0 0.0000 .

Scaled Pearson X2 0 0.0000 .

Log Likelihood 53.3799


Analysis Of Parameter Estimates

Standard Wald 95% Chi-

Parameter DF Estimate Error Confidence Limits Square

Intercept 1 -5.0237 0.5000 -6.0037 -4.0437 100.95

eksp 0 1 -0.0258 0.8660 -1.7232 1.6716 0.00

eksp 1 0 0.0000 0.0000 0.0000 0.0000 .

alder 0 1 0.3132 0.6124 -0.8871 1.5134 0.26

alder 1 1 -0.5153 0.6708 -1.8301 0.7995 0.59

alder 2 0 0.0000 0.0000 0.0000 0.0000 .

eksp*alder 0 0 1 0.8719 0.9728 -1.0349 2.7786 0.80

eksp*alder 0 1 1 1.2720 1.0165 -0.7204 3.2643 1.57


Analysis Of Parameter

Estimates

Parameter Pr > ChiSq

Intercept <.0001

eksp 0 0.9762

eksp 1 .

alder 0 0.6091

alder 1 0.4424

alder 2 .

eksp*alder 0 0 0.3701

eksp*alder 0 1 0.2108


LR Statistics For Type 3 Analysis

Chi-

Source DF Square Pr > ChiSq

eksp 1 3.09 0.0790

alder 2 4.37 0.1125

eksp*alder 2 1.67 0.4333


Table 24.5. Reporting estimates from the model with interaction:

Reparametrize into separate effects of Exposure within each Age band.

Parameter               Estimate   SD      RR
Corner                  -5.0237    0.500
Exposure(1)·Age(0)      -0.0258    0.866   0.97
Exposure(1)·Age(1)       1.2461    0.532   3.48
Exposure(1)·Age(2)       0.8461    0.443   2.33
Age(1)                  -0.5153    0.671   0.60
Age(2)                   0.3132    0.612   1.37
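The exposure effects within each age band in the table above can be checked from the raw counts. Because the model is saturated, each effect is the log of an observed rate ratio, and its standard deviation follows the usual formula sqrt(1/D1 + 1/D0) for a log rate ratio based on D1 and D0 cases. A minimal Python sketch, with names chosen for illustration:

```python
import math

# Age band (book coding) -> ((cases, pyrs) for Exposure(1), (cases, pyrs) for reference).
# Book coding versus SAS data: Exposure(1) is eksp=0; Age(0), Age(1), Age(2)
# are alder=2, 1, 0.
cells = {
    0: ((2, 311.9), (4, 607.9)),
    1: ((12, 878.1), (5, 1272.1)),
    2: ((14, 667.5), (8, 888.9)),
}

def exposure_effect(band):
    """Return (log rate ratio, its SD, rate ratio) for one age band."""
    (d1, y1), (d0, y0) = cells[band]
    log_rr = math.log((d1 / y1) / (d0 / y0))
    sd = math.sqrt(1 / d1 + 1 / d0)
    return log_rr, sd, math.exp(log_rr)

for band in cells:
    log_rr, sd, rr = exposure_effect(band)
    print(f"Exposure(1).Age({band}): {log_rr:8.4f}  SD {sd:.3f}  RR {rr:.2f}")
```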


Interactions: which to study?

When the model contains p covariates there are p(p − 1)/2 possible two-factor interactions (e.g., 45 for p = 10).

It is out of the question to study them all, so a general recommendation is to restrict attention to those that were pre-specified in the research protocol:

“Don’t ask a question if you are not interested in the reply!”

There will also be a type I error problem: “if you ask too many questions you will get too many wrong answers”.
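To put numbers on the type I error problem, the sketch below counts the two-factor interactions and computes the chance of at least one false positive. It rests on an assumption not made in the slides: the tests are treated as independent, each at the 5% level, with no true interactions. Function names are illustrative.

```python
import math

def n_two_factor_interactions(p):
    """Number of possible two-factor interactions among p covariates: p(p-1)/2."""
    return math.comb(p, 2)

def prob_any_false_positive(n_tests, alpha=0.05):
    """P(at least one rejection) for n independent tests when every null is true."""
    return 1 - (1 - alpha) ** n_tests

n = n_two_factor_interactions(10)
print(n, "interactions;",
      f"P(at least one false positive) = {prob_any_false_positive(n):.2f}")
```

Real interaction tests within one data set are correlated, so the exact figure differs, but the qualitative message is unchanged.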


Interaction is scale dependent.

Table of disease rates:

             Factor B
Factor A     Absent   Present
Absent       0.1      0.2
Present      0.3      λ

If λ = 0.6 then the rate ratio associated with the presence of factor A is 3 whether factor B is absent or present, and the rate ratio associated with the presence of factor B is 2 whether factor A is absent or present.

However, the rate difference associated with the presence of factor A is 0.2 when factor B is absent and 0.4 when it is present, and the rate difference associated with the presence of factor B is 0.1 when factor A is absent and 0.3 when it is present.


             Factor B
Factor A     Absent   Present
Absent       0.1      0.2
Present      0.3      λ

If λ = 0.4 then the rate difference associated with the presence of factor A is 0.2 whether factor B is absent or present, and the rate difference associated with the presence of factor B is 0.1 whether factor A is absent or present.

However, the rate ratio associated with the presence of factor A is 3 when factor B is absent and 2 when it is present, and the rate ratio associated with the presence of factor B is 2 when factor A is absent and 1.33 when it is present.
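The two cases λ = 0.6 and λ = 0.4 can be checked mechanically. A minimal Python sketch; the function name `contrasts` and the 0/1 coding of absent/present are illustrative choices.

```python
def contrasts(lam):
    """Rate ratios (rr) and rate differences (rd) for each factor, by level of the other.

    Keys of `rate` are (factor A, factor B) with 0 = absent and 1 = present.
    Each list gives the contrast when the other factor is absent, then present.
    """
    rate = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): lam}
    return {
        "rr_A": [round(rate[(1, b)] / rate[(0, b)], 2) for b in (0, 1)],
        "rd_A": [round(rate[(1, b)] - rate[(0, b)], 2) for b in (0, 1)],
        "rr_B": [round(rate[(a, 1)] / rate[(a, 0)], 2) for a in (0, 1)],
        "rd_B": [round(rate[(a, 1)] - rate[(a, 0)], 2) for a in (0, 1)],
    }

for lam in (0.6, 0.4):
    print("lambda =", lam, contrasts(lam))
```

With λ = 0.6 the ratio lists are constant and the difference lists are not; with λ = 0.4 it is the other way round.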

More on additive models for rates later.


Example: case-control study of oral cancer

Table 24.6. Cases (controls) for oral cancer study.

                 Alcohol (oz/day, 1 drink ∼ 0.3 oz/day)
Tobacco          0          1          2          3
(cigs/day)       0          0.1-0.3    0.4-1.5    1.6+
0 (0)            10 (38)    7 (27)     4 (12)     5 (8)
1 (1-19)         11 (26)    16 (35)    18 (16)    21 (20)
2 (20-39)        13 (36)    50 (60)    60 (49)    125 (52)
3 (40+)          9 (8)      16 (19)    27 (14)    91 (27)

Table 24.7. Case/control ratios for the oral cancer data.

Alcohol

Tobacco 0 1 2 3

0 0.26 0.26 0.33 0.63

1 0.42 0.46 1.13 1.05

2 0.36 0.83 1.22 2.40

3 1.12 0.84 1.93 3.37
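The case/control ratios in Table 24.7 are the cell-wise quotients of the counts in Table 24.6. A minimal Python sketch (rows are tobacco levels 0 to 3, columns alcohol levels 0 to 3; variable names are illustrative):

```python
# Counts from Table 24.6: rows = tobacco level, columns = alcohol level.
cases = [[10, 7, 4, 5],
         [11, 16, 18, 21],
         [13, 50, 60, 125],
         [9, 16, 27, 91]]
controls = [[38, 27, 12, 8],
            [26, 35, 16, 20],
            [36, 60, 49, 52],
            [8, 19, 14, 27]]

# Case/control ratio in each cell.
ratios = [[d / h for d, h in zip(case_row, control_row)]
          for case_row, control_row in zip(cases, controls)]

total_cases = sum(map(sum, cases))
total_controls = sum(map(sum, controls))

for tobacco, row in enumerate(ratios):
    print(tobacco, " ".join(f"{r:5.2f}" for r in row))
print("cases:", total_cases, "controls:", total_controls)
```

The totals can be compared with the "Number Of Events" and "Number Of Trials" lines in the GENMOD output for these data.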


Is the effect of tobacco the same for all levels of alcohol consumption?

= INTERACTION

(∼ SYNERGY?)

But CORRELATION is something completely different


Fig. 24.2. Nesting of models.

5. Corner+Alcohol+Tobacco+Alcohol.Tobacco
                 ↓
4. Corner+Alcohol+Tobacco
        ↙                 ↘
2. Corner+Alcohol     3. Corner+Tobacco
        ↘                 ↙
1. Corner

Exercise 24.6


Exercise 24.6: Log-likelihoods

5. -577.65
       ↓
4. -580.99
    ↙       ↘
2. -596.62     3. -608.59
    ↘       ↙
1. -643.93
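One way to read these log likelihoods is to compute the LR statistic along each arrow of the nesting diagram. In the sketch below the parameter counts are not taken from the slides; they are derived from the four levels of each factor (1 corner, 3 + 3 main effects, 9 interaction parameters). Names are illustrative.

```python
# Max. log likelihoods for models 1-5 (Exercise 24.6).
loglik = {1: -643.93, 2: -596.62, 3: -608.59, 4: -580.99, 5: -577.65}
# Parameter counts derived from 4 alcohol and 4 tobacco levels (an assumption here).
n_params = {1: 1, 2: 4, 3: 4, 4: 7, 5: 16}

def lr(big, small):
    """LR statistic and d.f. for model `small` nested in model `big`."""
    return round(2 * (loglik[big] - loglik[small]), 2), n_params[big] - n_params[small]

comparisons = {
    "interaction (5 vs 4)": lr(5, 4),
    "tobacco adjusted for alcohol (4 vs 2)": lr(4, 2),
    "alcohol adjusted for tobacco (4 vs 3)": lr(4, 3),
    "alcohol alone (2 vs 1)": lr(2, 1),
    "tobacco alone (3 vs 1)": lr(3, 1),
}
for name, (stat, df) in comparisons.items():
    print(f"{name}: LR = {stat}, {df} d.f.")
```

Models 2 and 3 are not nested in each other, so no statistic is computed between them.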


Dose-response models

Explanatory variables with ordered categories.

Table 25.1. Alcohol and tobacco treated as categorical variables

Parameter Estimate SD

Corner -1.6090 0.2654

Alcohol(1) 0.2897 0.2327

Alcohol(2) 0.8437 0.2383

Alcohol(3) 1.3780 0.2256

Tobacco(1) 0.5887 0.2844

Tobacco(2) 1.0260 0.2544

Tobacco(3) 1.4090 0.2823


Alternative: monotone effect of tobacco

Fig. 20.1. Log-linear trend

[Figure: log(odds) against dose z = 0, 1, 2, 3; the plotted points lie on a straight line, rising by β for each unit increase in dose.]


Look at successive differences between effects:

Tobacco(1), Tobacco(2)-Tobacco(1), Tobacco(3)-Tobacco(2)

Exercise 25.1

Introduce a variable

taking values 0, 1, 2 or 3 and denote its effect by

[Tobacco]

Model: log(Odds) = Corner + Alcohol + [Tobacco]


Exercise 25.1: solution.

Table 25.1. Alcohol and tobacco treated as categorical variables

Parameter Estimate SD Succ. diff.

Corner -1.6090 0.2654

Alcohol(1) 0.2897 0.2327 0.2897

Alcohol(2) 0.8437 0.2383 0.5540

Alcohol(3) 1.3780 0.2256 0.5343

Tobacco(1) 0.5887 0.2844 0.5887

Tobacco(2) 1.0260 0.2544 0.4373

Tobacco(3) 1.4090 0.2823 0.3830
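The successive differences are simple subtractions of neighbouring estimates, with level 0 (the reference) counted as 0. A minimal Python sketch, with names chosen for illustration:

```python
# Estimates for levels 1, 2, 3 from Table 25.1 (level 0 is the reference).
estimates = {
    "Alcohol": [0.2897, 0.8437, 1.3780],
    "Tobacco": [0.5887, 1.0260, 1.4090],
}

def successive_differences(effects):
    """Differences between each level and the one before, starting from level 0."""
    levels = [0.0] + list(effects)
    return [round(b - a, 4) for a, b in zip(levels, levels[1:])]

succ_diff = {name: successive_differences(e) for name, e in estimates.items()}
for name, diffs in succ_diff.items():
    print(name, diffs)
```

Roughly equal successive differences are what a linear effect per level would produce, which motivates the [Tobacco] model.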


Model: log(Odds) = Corner + Alcohol + [Tobacco]

Table 25.2. The linear effect of tobacco consumption

Alcohol Tobacco log(Odds)=Corner + ...

0 0 -

0 1 1×[Tobacco]

0 2 2×[Tobacco]

0 3 3×[Tobacco]

1 0 Alcohol(1)

1 1 Alcohol(1)+1×[Tobacco]

1 2 Alcohol(1)+2×[Tobacco]

1 3 Alcohol(1)+3×[Tobacco]

2 0 Alcohol(2)

2 1 Alcohol(2)+1×[Tobacco]

2 2 Alcohol(2)+2×[Tobacco]

2 3 Alcohol(2)+3×[Tobacco]

3 0 Alcohol(3)

3 1 Alcohol(3)+1×[Tobacco]

3 2 Alcohol(3)+2×[Tobacco]

3 3 Alcohol(3)+3×[Tobacco]


Table 25.3. Linear effect of tobacco per level

Parameter Estimate SD

Corner –1.5250 0.219

Alcohol(1) 0.3020 0.232

Alcohol(2) 0.8579 0.237

Alcohol(3) 1.3880 0.225

[Tobacco] 0.4541 0.083


Similarly with alcohol consumption: introduce a variable with values 0, 1, 2 or 3 and denote its effect [Alcohol]

Table 25.4. Linear effects of alcohol and tobacco per level

Parameter Estimate SD

Corner –1.6290 0.1860

[Alcohol] 0.4901 0.0676

[Tobacco] 0.4517 0.0833

Exercise 25.3


Exercise 25.3: solution.

Tobacco(3) + Alcohol(3) = 2.7870

3× [Tobacco] + 3× [Alcohol] = 2.8254


Alternative ways of scoring

Tobacco: cigarettes/day (0 : 0, 1-19 : 10, 20-39 : 30, 40+ : 50)

Alcohol: ounces/day (0.0 : 0, 0.1-0.3 : 0.2, 0.4-1.5 : 1.0, 1.6+ : 2.0)

Table 25.5. Alcohol in ounces/day and tobacco in cigarettes/day

Parameter Estimate SD

Corner –1.2657 0.1539

[Alcohol] 0.6484 0.0881

[Tobacco] 0.0253 0.0046
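To see what the choice of scores does in practice, the sketch below compares the log odds ratio for the highest versus the lowest category under the three codings (Tables 25.1, 25.4 and 25.5). It is only an arithmetic illustration, not a model comparison; the dictionary names are illustrative.

```python
# Estimates copied from Tables 25.1, 25.4 and 25.5.
categorical = {"Alcohol": 1.3780, "Tobacco": 1.4090}   # level 3 vs. level 0
per_level = {"Alcohol": 0.4901, "Tobacco": 0.4517}     # scores 0, 1, 2, 3
per_unit = {"Alcohol": 0.6484, "Tobacco": 0.0253}      # per oz/day, per cigarette/day
top_score = {"Alcohol": 2.0, "Tobacco": 50}            # score of the top category

top_vs_zero = {
    v: {
        "categorical": categorical[v],
        "level scores": round(3 * per_level[v], 4),
        "unit scores": round(top_score[v] * per_unit[v], 4),
    }
    for v in categorical
}

# Joint effect of top tobacco and top alcohol category (cf. Exercise 25.3).
joint_categorical = round(sum(categorical.values()), 4)
joint_level_scores = round(sum(r["level scores"] for r in top_vs_zero.values()), 4)

for v, row in top_vs_zero.items():
    print(v, row)
print("joint:", joint_categorical, joint_level_scores)
```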


Testing

– Test for linearity:

1) Comparing the “nested” models:

log(Odds) = Corner + Alcohol + Tobacco

and

log(Odds) = Corner + Alcohol + [Tobacco],

here: LR test = 0.38, 2 d.f.,

or

2) eliminating [Tobsq] (= 0, 1, 4, 9) from

log(Odds) = Corner + Alcohol + [Tobacco] + [Tobsq],

here: LR test = 0.02.


– Trend test (1 d.f.): eliminating [Tobacco] from the model

log(Odds) = Corner + Alcohol + [Tobacco],

here: LR test = 30.88.
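The LR statistics for linearity and trend can be turned into p-values with the closed-form chi-square tail for 1 and 2 d.f. This is a sketch: the helper `chi2_sf` is illustrative, and the 1 d.f. for the [Tobsq] test follows from one parameter being removed.

```python
import math

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (closed forms for 1 and 2 d.f.)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles 1 or 2 d.f.")

# Test: (LR statistic, d.f.)
tests = {
    "linearity, categorical vs. [Tobacco]": (0.38, 2),
    "linearity, eliminating [Tobsq]": (0.02, 1),
    "trend, eliminating [Tobacco]": (30.88, 1),
}
p_values = {name: chi2_sf(stat, df) for name, (stat, df) in tests.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3g}")
```

Neither linearity test speaks against the linear effect of tobacco, while the trend test is highly significant.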

Why not use individual levels, that is, a truly quantitative covariate and no categorization at all?

Pros and cons

• Information is lost by categorization

• Categories may be more robust (e.g., smoking)

• Few outliers may have large influence (“Casanova effect”!)

• Model with a linear effect is no longer “nested” in the categorical model ⇒ alternative alternatives are needed when testing linearity


Indicator (‘dummy’) variables

The way in which the categorical covariates are entered into the regression model.

Table 25.8. Indicator variables for the three alcohol parameters

A1 A2 A3 Level log(Odds) = Corner + · · ·

0 0 0 0 –

1 0 0 1 Alcohol(1)

0 1 0 2 Alcohol(2)

0 0 1 3 Alcohol(3)


The use of indicator variables enables the programmer to choose his/her preferred reference level.

Interaction terms are simple products of indicator variables.

Table 25.10. Indicator variables for interaction parameters

A1   A2   A3   T    A1·T   A2·T   A3·T
0    0    0    0    0      0      0
0    0    0    1    0      0      0
1    0    0    0    0      0      0
1    0    0    1    1      0      0
0    1    0    0    0      0      0
0    1    0    1    0      1      0
0    0    1    0    0      0      0
0    0    1    1    0      0      1

NB: Tobacco is here on 2 levels only
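The rows of Table 25.10 can be generated by forming the alcohol indicators and multiplying each by the two-level tobacco variable. A minimal Python sketch; the function name `indicators` is illustrative.

```python
def indicators(alcohol_level, tobacco):
    """Return [A1, A2, A3, T, A1*T, A2*T, A3*T] for one alcohol/tobacco combination."""
    a = [int(alcohol_level == k) for k in (1, 2, 3)]   # level 0 is the reference
    return a + [tobacco] + [ai * tobacco for ai in a]

# Alcohol on 4 levels, tobacco on 2 levels, in the row order of Table 25.10.
rows = [indicators(alc, tob) for alc in range(4) for tob in (0, 1)]
for row in rows:
    print(*row)
```

Changing the reference level only means changing which level gets all-zero indicators in the first line of the function.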


Treating the zero level differently

Fig. 25.1. Separating zero exposure from the dose-response.

[Figure: log(odds) against dose z = 0, 1, 2, 3. Labels in the plot: Corner, Corner+Non-smoker, Non-smoker.]


Corresponds to adding a new variable [Non-smoker]

Table 25.11. Separating zero exposure from the dose-response

Tobacco Non-smoker log(Odds) = Corner + · · ·

0 1 [Non-smoker]

1 0 1 × [Tobacco]

2 0 2 × [Tobacco]

3 0 3 × [Tobacco]

(Alternative: include [Smoker] = 1 - [Non-smoker])


Indicator variables may be chosen in several ways.

E.g., to model successive differences

Table 25.12. Indicators to compare each level with the one before

Tobacco   D1   D2   D3
0         0    0    0
1         1    0    0
2         1    1    0
3         1    1    1

Here, D1 = indicator for Tobacco ≥ 1

D2 = indicator for Tobacco ≥ 2

D3 = indicator for Tobacco ≥ 3
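With the indicators D1, D2, D3, the coefficient of each indicator is the successive difference between neighbouring levels, and summing the coefficients of the indicators that are switched on gives back the usual effect of a level. The sketch below illustrates this with the tobacco successive differences from the solution to Exercise 25.1; function names are illustrative.

```python
def d_indicators(tobacco):
    """D1, D2, D3: indicators for tobacco level >= 1, >= 2, >= 3."""
    return [int(tobacco >= k) for k in (1, 2, 3)]

# Successive differences for tobacco (solution to Exercise 25.1).
succ_diff = [0.5887, 0.4373, 0.3830]

def effect(level):
    """Effect of a tobacco level relative to level 0, built from the D coding."""
    return round(sum(d * c for d, c in zip(d_indicators(level), succ_diff)), 4)

for level in range(4):
    print(level, d_indicators(level), effect(level))
```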


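data oral;
  /* read the data directly from the course web page */
  filename oraldata
    url 'http://publicifsv.sund.ku.dk/~pka/epidata/oral.txt';
  infile oraldata firstobs=2;   /* start reading at line 2 of the file */
  input cases controls alc tob;
  total=cases+controls;
run;

/* logistic regression, tobacco and alcohol both categorical (reference level 0) */
proc genmod data=oral;
  class tob (ref='0') alc (ref='0');
  model cases/total=tob alc/dist=binomial type3;
run;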


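/* ***** NEW DATA STEP IN SAME SAS PROGRAM ***** */
data oral; set oral;
  /* 0/1 indicator variables for each tobacco and alcohol level */
  tob0=(tob=0); tob1=(tob=1);
  tob2=(tob=2); tob3=(tob=3);
  alc0=(alc=0); alc1=(alc=1);
  alc2=(alc=2); alc3=(alc=3);
run;

/* ***** FURTHER PROCedure STEPS ***** */

/* linear effect of tobacco; alcohol categorical with level 0 as reference */
proc genmod data=oral;
  model cases/total=tob alc1 alc2 alc3/dist=binomial type3;
run;

/* same model with alcohol level 3 as reference */
proc genmod data=oral;
  model cases/total=tob alc0 alc1 alc2/dist=binomial type3;
run;

/* tob0 added: zero exposure separated from the dose-response */
proc genmod data=oral;
  model cases/total=tob tob0 alc0 alc1 alc2/dist=binomial type3;
run;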


Model Information

Data Set WORK.ORAL

Distribution Binomial

Link Function Logit

Response Variable (Events) cases

Response Variable (Trials) total

Observations Used 16

Number Of Events 483

Number Of Trials 930

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 11 7.0730 0.6430

Scaled Deviance 11 7.0730 0.6430

Pearson Chi-Square 11 7.0049 0.6368

Scaled Pearson X2 11 7.0049 0.6368

Log Likelihood -581.1858


Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi-

Parameter DF Estimate Error Limits Square

Intercept 1 -1.5252 0.2193 -1.9551 -1.0954 48.37

tob 1 0.4541 0.0834 0.2907 0.6175 29.67

alc1 1 0.3020 0.2316 -0.1519 0.7558 1.70

alc2 1 0.8579 0.2369 0.3935 1.3223 13.11

alc3 1 1.3880 0.2249 0.9472 1.8288 38.09

Scale 0 1.0000 0.0000 1.0000 1.0000

Parameter Pr > ChiSq

Intercept <.0001

tob <.0001

alc1 0.1922

alc2 0.0003

alc3 <.0001


LR Statistics For Type 3 Analysis

Chi-

Source DF Square Pr > ChiSq

tob 1 30.88 <.0001

alc1 1 1.72 0.1900

alc2 1 13.49 0.0002

alc3 1 40.19 <.0001

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 11 7.0730 0.6430

Scaled Deviance 11 7.0730 0.6430

Pearson Chi-Square 11 7.0049 0.6368

Scaled Pearson X2 11 7.0049 0.6368

Log Likelihood -581.1858


Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi-

Parameter DF Estimate Error Limits Square

Intercept 1 -0.1373 0.2081 -0.5452 0.2706 0.44

tob 1 0.4541 0.0834 0.2907 0.6175 29.67

alc0 1 -1.3880 0.2249 -1.8288 -0.9472 38.09

alc1 1 -1.0860 0.1834 -1.4455 -0.7265 35.06

alc2 1 -0.5301 0.1870 -0.8965 -0.1636 8.04

Scale 0 1.0000 0.0000 1.0000 1.0000

Parameter Pr > ChiSq

Intercept 0.5094

tob <.0001

alc0 <.0001

alc1 <.0001

alc2 0.0046


LR Statistics For Type 3 Analysis

Chi-

Source DF Square Pr > ChiSq

tob 1 30.88 <.0001

alc0 1 40.19 <.0001

alc1 1 35.95 <.0001

alc2 1 8.03 0.0046

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 10 6.7227 0.6723

Scaled Deviance 10 6.7227 0.6723

Pearson Chi-Square 10 6.6852 0.6685

Scaled Pearson X2 10 6.6852 0.6685

Log Likelihood -581.0106


Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi-

Parameter DF Estimate Error Limits Square

Intercept 1 -0.0347 0.2711 -0.5660 0.4966 0.02

tob 1 0.4086 0.1133 0.1865 0.6306 13.01

tob0 1 -0.1965 0.3328 -0.8487 0.4558 0.35

alc0 1 -1.3795 0.2254 -1.8213 -0.9377 37.45

alc1 1 -1.0880 0.1835 -1.4476 -0.7285 35.17

alc2 1 -0.5332 0.1870 -0.8998 -0.1667 8.13

Scale 0 1.0000 0.0000 1.0000 1.0000

Parameter Pr > ChiSq

Intercept 0.8982

tob 0.0003

tob0 0.5549

alc0 <.0001

alc1 <.0001

alc2 0.0044


The GENMOD Procedure

LR Statistics For Type 3 Analysis

Chi-

Source DF Square Pr > ChiSq

tob 1 13.22 0.0003

tob0 1 0.35 0.5539

alc0 1 39.41 <.0001

alc1 1 36.07 <.0001

alc2 1 8.12 0.0044


More on quantitative covariates

In a model like

log(Rate) = Corner + Exposure + [x]

the effect of x is assumed to be linear, i.e. [x] expresses the change in log(Rate) per 1 unit change of x.

To test for linearity, one may add [xsq] to the model where xsq = x².

An alternative alternative is a linear spline.


Linear splines

An alternative to a straight line is a broken line.

Introduce break points for x, e.g., a1, a2, a3 and add the three linear splines

I1 × [x − a1], I2 × [x − a2], I3 × [x − a3]

to [x]:

Here, I1 = indicator for x ≥ a1
I2 = indicator for x ≥ a2
I3 = indicator for x ≥ a3

The parameter for the spline I1 × [x − a1] (= $x_1^+$, say) gives the change in slope at the break point a1. Similarly at a2, a3.

Splines are easy to program and parameters are easier to interpret than for quadratic terms.


SAS code for linear splines

data new; set old;
  /* a1, a2: the break points (to be replaced by numbers or defined as variables) */
  if x<a1 then spline1=0; if x>=a1 then spline1=x-a1;
  if x<a2 then spline2=0; if x>=a2 then spline2=x-a2;
  /* Alternatively - use dummy variables:
  spline1=(x-a1)*(x>=a1); spline2=(x-a2)*(x>=a2); */
run;
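The same spline construction, written out in Python to show how the spline parameters act as changes in slope. The break points and coefficients below are hypothetical values chosen only for illustration; they do not come from any fitted model in the course.

```python
def linear_spline_terms(x, knots):
    """Return the spline variables: (x - a) if x >= a, else 0, for each break point a."""
    return [max(x - a, 0.0) for a in knots]

def linear_predictor(x, intercept, slope, knots, slope_changes):
    """Broken line: intercept + slope*x + sum of slope changes times spline variables."""
    terms = linear_spline_terms(x, knots)
    return intercept + slope * x + sum(c * t for c, t in zip(slope_changes, terms))

# Hypothetical break points and coefficients (illustration only).
knots = [4.0, 7.0]
coef = dict(intercept=-2.0, slope=0.5, slope_changes=[-0.3, -0.2])

values = {x: linear_predictor(x, knots=knots, **coef) for x in (2, 4, 5, 7, 8)}
for x, v in values.items():
    print(x, round(v, 3))
```

In this example the slope is 0.5 below the first break point, 0.5 − 0.3 = 0.2 between the break points and 0.2 − 0.2 = 0 above the second, so each spline coefficient is read directly as a change in slope.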


[Figure: four panels, each plotting the linear predictor (vertical axis, −2.0 to 0.5) against x (horizontal axis, 2 to 10).]


Quadratic and restricted splines

The linear spline function is continuous but not smooth. The ‘linear predictor’ is:

$$a + b_1 x_i + b_{1,1} x_{i1}^+ + \dots + b_{1,3} x_{i3}^+.$$

Linear predictor with quadratic splines:

$$a + b_1 x_i + b_2 x_i^2 + b_{1,1} (x_{i1}^+)^2 + \dots + b_{1,3} (x_{i3}^+)^2.$$

No simple interpretation of coefficients, but a smooth curve is obtained. LR test for linearity: $b_2 = b_{1,1} = \dots = b_{1,3} = 0$.

The quadratic effect $b_2 x_i^2$ may be quite dramatic for large (both positive and negative) values of $x_i$ (‘Casanova’ again!).

This may be avoided using restricted splines. The idea is that for large (positive or negative) x’s, the curve is linear instead of quadratic.

Also cubic splines may be defined: $(x_{ij}^+)^3$.


Additive models for rates

Additive Poisson model for diet data:

Rate = Corner + Age + Exposure,

Cases / Pyrs = Corner + Age + Exposure,

Cases = Pyrs × (Corner + Age + Exposure).

That is:

• ‘identity link’ instead of log link

• Corner replaced by Pyrs times Corner

• explanatory variables also multiplied by Pyrs

• no ‘offset’ needed


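SAS program using indicator variables

data ihd;
  /* eksp: 0 = exposed, 1 = unexposed; alder: 2 = 40-49, 1 = 50-59, 0 = 60-69 years (cf. Table 22.6) */
  input eksp alder pyrs cases;
  pyrs=pyrs/1000;   /* person-years in units of 1000 */
  /* indicator variables multiplied by person-years */
  age0=pyrs*(alder=2); age1=pyrs*(alder=1); age2=pyrs*(alder=0);
  eksp0=pyrs*(eksp=1); eksp1=pyrs*(eksp=0);
datalines;
0 2 311.9 2
0 1 878.1 12
0 0 667.5 14
1 2 607.9 4
1 1 1272.1 5
1 0 888.9 8
;
run;

proc genmod data=ihd;
  /* identity link, no intercept: the coefficient of pyrs is the corner */
  /* (rate per 1000 person-years for unexposed aged 40-49)              */
  model cases=pyrs age1 age2 eksp1/dist=poi link=id noint;
run;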
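For comparison, the multiplicative model needs the log link and an offset. A minimal sketch, assuming the data set ihd created above; the names ihd2 and logpyrs are introduced here. With a class statement, GENMOD by default takes the last level of each factor as reference, so with this coding the corner would again be unexposed (eksp=1) aged 40–49 (alder=2).

```sas
/* Multiplicative (log-linear) rate model for the same data. */
data ihd2;
  set ihd;
  logpyrs = log(pyrs);   /* pyrs is already in units of 1000 person-years */
run;

proc genmod data=ihd2;
  class eksp alder;      /* categorical factors; last level is reference */
  model cases = eksp alder / dist=poisson link=log offset=logpyrs;
run;
```

In this version the explanatory variables are not multiplied by pyrs; the person-years enter only through the offset.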


SAS output (edited)

The GENMOD Procedure

Model Information

Data Set                       WORK.IHD
Distribution                   Poisson
Link Function                  Identity
Dependent Variable             cases

Number of Observations Read    6
Number of Observations Used    6

Criteria For Assessing Goodness Of Fit

Criterion             DF      Value    Value/DF
Deviance               2     2.1024      1.0512
Scaled Deviance        2     2.1024      1.0512
Pearson Chi-Square     2     1.8369      0.9185
Scaled Pearson X2      2     1.8369      0.9185
Log Likelihood              52.3287


Analysis Of Parameter Estimates

                           Standard   Wald 95% Confidence     Chi-
Parameter   DF  Estimate      Error         Limits          Square   Pr > ChiSq
Intercept    0    0.0000     0.0000    0.0000     0.0000       .        .
pyrs         1    5.1758     2.5085    0.2593    10.0924     4.26     0.0391
age1         1   -1.0028     3.0344   -6.9501     4.9445     0.11     0.7410
age2         1    4.8503     3.9117   -2.8164    12.5171     1.54     0.2150
eksp1        1    8.4316     3.2113    2.1376    14.7255     6.89     0.0086
Scale        0    1.0000     0.0000    1.0000     1.0000


Observed and predicted rates

Table 22.6. Energy intake and IHD incidence per 1000 person-years (D = Cases, Y = P-yrs). Predicted rates from the additive model are given in parentheses; RD is the observed rate difference (exposed minus unexposed).

              Exposed (< 2750 kcal)          Unexposed (≥ 2750 kcal)
Current age    D      Y     Rate              D      Y      Rate              RD
40–49          2    311.9    6.41 (13.61)     4    607.9    6.58 (5.18)     -0.17
50–59         12    878.1   13.67 (12.61)     5   1272.1    3.93 (4.18)      9.74
60–69         14    667.5   20.97 (18.46)     8    888.9    9.00 (10.03)    11.97
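As a check of the table, the predicted rates can be reproduced by adding the estimates from the additive model (pyrs = 5.1758, age2 = 4.8503, eksp1 = 8.4316 in the GENMOD output), all per 1000 person-years. Three of the six cells are shown as an illustration:

```latex
% Predicted rates = corner + age effect + exposure effect (additive scale)
\begin{align*}
  \text{Unexposed, 40--49:} &\quad 5.1758 \approx 5.18 \\
  \text{Exposed, 40--49:}   &\quad 5.1758 + 8.4316 = 13.6074 \approx 13.61 \\
  \text{Exposed, 60--69:}   &\quad 5.1758 + 4.8503 + 8.4316 = 18.4577 \approx 18.46
\end{align*}
```

Under the additive model the predicted rate difference is the same, 8.43, in every age group, whereas the observed differences are -0.17, 9.74 and 11.97.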


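SAS program using indicator variables: interaction

data ihd;
  /* eksp: 0 = exposed, 1 = unexposed; alder: 2 = 40-49, 1 = 50-59, 0 = 60-69 years (cf. Table 22.6) */
  input eksp alder pyrs cases;
  pyrs=pyrs/1000;   /* person-years in units of 1000 */
  age0=pyrs*(alder=2); age1=pyrs*(alder=1); age2=pyrs*(alder=0);
  eksp0=pyrs*(eksp=1); eksp1=pyrs*(eksp=0);
  /* interaction columns: exposed in age groups 50-59 and 60-69 */
  age1exp1=(alder=1)*(eksp=0)*pyrs; age2exp1=(alder=0)*(eksp=0)*pyrs;
datalines;
0 2 311.9 2
0 1 878.1 12
0 0 667.5 14
1 2 607.9 4
1 1 1272.1 5
1 0 888.9 8
;
run;

proc genmod data=ihd;
  /* six parameters for six cells: the model is saturated */
  model cases=pyrs age1 age2 eksp1 age1exp1 age2exp1/dist=poi link=id noint;
run;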


SAS output (edited)

The GENMOD Procedure

Model Information

Data Set                       WORK.IHD
Distribution                   Poisson
Link Function                  Identity
Dependent Variable             cases

Number of Observations Read    6
Number of Observations Used    6

Criteria For Assessing Goodness Of Fit

Criterion             DF      Value    Value/DF
Deviance               0     0.0000       .
Scaled Deviance        0     0.0000       .
Pearson Chi-Square     0     0.0000       .
Scaled Pearson X2      0     0.0000       .
Log Likelihood              53.3799


Analysis Of Parameter Estimates

                           Standard   Wald 95% Confidence     Chi-
Parameter   DF  Estimate      Error         Limits          Square   Pr > ChiSq
Intercept    0    0.0000     0.0000     0.0000     0.0000      .        .
pyrs         1    6.5800     3.2900     0.1317    13.0283    4.00     0.0455
age1         1   -2.6495     3.7301    -9.9605     4.6614    0.50     0.4775
age2         1    2.4199     4.5770    -6.5509    11.3906    0.28     0.5970
eksp1        1   -0.1677     5.6021   -11.1476    10.8121    0.00     0.9761
age1exp1     1    9.9031     7.0736    -3.9609    23.7671    1.96     0.1615
age2exp1     1   12.1416     8.5399    -4.5962    28.8794    2.02     0.1551
Scale        0    1.0000     0.0000     1.0000     1.0000
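The interaction model has one parameter per cell, so its estimates reproduce the observed rates and rate differences of Table 22.6. A likelihood ratio test of no interaction on the additive scale can be worked out from the two log likelihoods reported by GENMOD (52.3287 without and 53.3799 with interaction), as sketched here:

```latex
% Age-specific rate differences recovered from the interaction model
\begin{align*}
  \text{RD}_{40\text{--}49} &= -0.1677 \approx -0.17 \\
  \text{RD}_{50\text{--}59} &= -0.1677 + 9.9031 = 9.7354 \approx 9.74 \\
  \text{RD}_{60\text{--}69} &= -0.1677 + 12.1416 = 11.9739 \approx 11.97
\end{align*}
% LR test of age1exp1 = age2exp1 = 0 (2 degrees of freedom)
\[
  -2 \times (52.3287 - 53.3799) = 2.1024 \sim \chi^2_2 ,
  \qquad P = \exp(-2.1024/2) \approx 0.35
\]
```

The test statistic equals the deviance of the additive model, because the interaction model is saturated. For 2 degrees of freedom the chi-square tail probability has the closed form $\exp(-x/2)$.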


Exercise 1

Consider the data from last time from Doll & Hill’s famous study of smoking and cardiovascular mortality among British doctors (see the file doctors.txt).

1. Examine if there is an interaction between age and smoking by adding the term smoke*agegrp to the model including smoke and agegrp.

2. In the model with interaction, drop the ‘main effect’ (smoke) and interpret the resulting parameter estimates.

3. Examine if the effect of agegrp can be assumed to be linear.

4. Fit a model where the effect of agegrp is a linear spline. Do the test for linearity.

5. Return to treating age and smoking as categorical variables. Fit an additive rate model and study whether there is an interaction between age and smoking on the additive scale.


Exercise 2

This exercise continues the analysis of the oral cancer example in Clayton & Hills’ Table 24.6 (see the file oral.txt).

1. Fit all the models in Clayton & Hills Fig. 24.2 (slide 24) and reconstruct the likelihood ratio tests from Clayton & Hills Exercise 24.6 (slide 25).

2. Try to test for linearity for tob by defining a new variable (tobsq=tob*tob) and adding it to the model where both tob and alc enter linearly. What is the conclusion concerning linearity of tob?

3. Test for linearity when tobacco is instead coded as in C & H’s Table 25.5 (slide 34). For which of the two ways of coding the variable does the linear model give the better fit?

4. Fit the model where tob enters linearly and alc is categorical. What happens if an interaction term is added? Interpret the results.

5. Fit the model where both tob and alc enter linearly. What happens if an interaction term is added? Interpret the results.


Description of data set oral.txt

The data set is available from

publicifsv.sund.ku.dk/~pka/epidata

• There are 1 + 16 lines: line no. 1 contains names of variables, and there is one line for each of the cells in the four-by-four table.

• cases: number of cases

• controls: number of controls

• alc: 0, 1, 2 or 3 giving the level of alcohol consumption

• tob: 0, 1, 2 or 3 giving the level of tobacco consumption
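The file could be read into SAS roughly as follows. This is a sketch under assumptions: oral.txt is taken to be blank-delimited with the columns in the order cases, controls, alc, tob (the order is not stated above), the case-control data are assumed to be analysed by logistic regression, and total is a name introduced here.

```sas
/* Assumption: blank-delimited file, header on line 1, columns in the */
/* order cases, controls, alc, tob. Adjust the path as needed.        */
data oral;
  infile 'oral.txt' firstobs=2;   /* skip the line with variable names */
  input cases controls alc tob;
  total = cases + controls;       /* subjects in each cell of the table */
  tobsq = tob*tob;                /* quadratic term for tob             */
run;

/* Logistic regression with tob and alc entering linearly, plus tobsq */
proc genmod data=oral;
  model cases/total = tob alc tobsq / dist=binomial link=logit;
run;
```

Adding a class statement (for example class alc;) would make the corresponding variable enter as a categorical factor instead.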
