BIOST 536 Lecture 12 1

Lecture 12 – Introduction to Matching

Matching/Stratification

Group subjects into subsets on the basis of assumed confounders (e.g., age, gender, clinic)

Matching variables should not be of direct scientific interest

Matching

Done in advance, often in a specified ratio: 1 case to 1 control, or 1 case to m controls

May be matched on an underlying relationship that cannot be quantified (twins, siblings, neighbors)

Stratification

Usually done after data collection on the basis of recorded covariates

Little control over the number of cases and controls in each stratum

$n_j$ cases matched to $m_j$ controls in stratum $j$

Strata with either no cases or no controls are eliminated from the analysis, since they carry no information about the exposure-outcome association within the stratum

Matched data can be analyzed as stratified data

Suppose we match a 40-44 year old case with four 40-44 year old controls, then match another 40-44 year old case with four different 40-44 year old controls

Combine all together and analyze as two cases matched to eight controls (“frequency matching”)

BIOST 536 Lecture 12 2

Conditional logistic regression

Stratified or matched data are often analyzed with conditional logistic regression

Matched data can be analyzed with unconditional logistic regression in some cases

Unmatched data can be analyzed with conditional logistic regression in some cases (post-hoc stratification)

Conditional logistic regression

For finely matched or stratified data with many nuisance parameters

Simplest case

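1 to 1 matching for J pairs of observations

In the unconditional model ($j = 1, 2, \ldots, J$), for subject $i$ in pair $j$

$$\operatorname{logit}(p_{ij}) = \alpha_j + \beta_1 X_{1ij} + \cdots + \beta_K X_{Kij}$$

Each matched pair has its own $\alpha_j$

Number of parameters is $J + K$

As the number of pairs $J \to \infty$, the number of parameters grows as well

The usual asymptotic properties of maximum likelihood estimates (consistency, approximate normality) fail if this happens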

BIOST 536 Lecture 12 3

Conditional logistic regression

Solutions

1. Do not try to estimate the $\alpha_j$'s directly; instead, estimate the distribution generating them, e.g. assume they come from a normal distribution $N(0, \sigma^2)$ and estimate $\sigma^2$ instead of the individual $\alpha_j$'s

This approach is called a "random effects model"

2. Get rid of the $\alpha_j$'s by conditioning them out of the likelihood and estimate the remaining parameters (conditional logistic regression)

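Consider the likelihood contribution of the $j$th matched pair (one case, one control)

$$\frac{e^{\alpha_j + \beta X_{\text{case}}}}{e^{\alpha_j + \beta X_{\text{case}}} + e^{\alpha_j + \beta X_{\text{control}}}} = \frac{e^{\alpha_j}\, e^{\beta X_{\text{case}}}}{e^{\alpha_j}\left(e^{\beta X_{\text{case}}} + e^{\beta X_{\text{control}}}\right)} = \frac{e^{\beta X_{\text{case}}}}{e^{\beta X_{\text{case}}} + e^{\beta X_{\text{control}}}}$$

which depends only on the covariates and $\beta$, not on $\alpha_j$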
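As a quick numerical check of this cancellation, the Python sketch below (an illustration, not part of the original lecture) computes the pair's conditional probability once from the full logistic probabilities and once from the $\alpha$-free form, for several values of $\alpha_j$. The value $\beta = \log(2.6)$ and the $\alpha_j$ values are arbitrary illustrative choices; the function names are hypothetical.

```python
import math


def expit(z):
    """Inverse logit: e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def pair_contribution(alpha_j, beta, x_case, x_control):
    """P(case is the case | exactly one of the pair is a case), built from P_case and P_control."""
    p_case = expit(alpha_j + beta * x_case)
    p_control = expit(alpha_j + beta * x_control)
    numerator = p_case * (1.0 - p_control)
    return numerator / (numerator + p_control * (1.0 - p_case))


def conditional_form(beta, x_case, x_control):
    """The alpha-free form e^(beta*x_case) / (e^(beta*x_case) + e^(beta*x_control))."""
    w_case = math.exp(beta * x_case)
    return w_case / (w_case + math.exp(beta * x_control))


beta = math.log(2.6)  # arbitrary illustrative value
for alpha_j in (-3.0, 0.0, 2.5):
    # Both columns agree for every alpha_j: the pair-specific intercept cancels
    print(alpha_j, pair_contribution(alpha_j, beta, 1, 0), conditional_form(beta, 1, 0))
```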

BIOST 536 Lecture 12 4

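Conditional logistic regression

Situation where a conditional likelihood is necessary, or at least useful

Let the first stratum be the referent group, and define the stratum indicators

$$Z_2 = \begin{cases} 1 & \text{in stratum 2} \\ 0 & \text{not in stratum 2} \end{cases}, \quad Z_3 = \begin{cases} 1 & \text{in stratum 3} \\ 0 & \text{not in stratum 3} \end{cases}, \quad \ldots, \quad Z_J = \begin{cases} 1 & \text{in stratum } J \\ 0 & \text{not in stratum } J \end{cases}$$

$$\operatorname{logit}(p) = \alpha_1 + \alpha_2 Z_2 + \alpha_3 Z_3 + \cdots + \alpha_J Z_J + \beta_1 X_1 + \cdots + \beta_k X_k$$

This model has $J$ matched sets (strata)

Suppose each set is 1 case : m controls; then there are $n = J(m+1)$ observations

If $J$ is small relative to $n$, then we can use standard logistic regression and estimate the $\alpha$'s

If $J \to \infty$ as $n \to \infty$, then we do need to use the conditional likelihood

If the unconditional likelihood is used, the estimates of the $\beta$'s are biased away from the null, i.e. anti-conservatively (too high for $\beta > 0$)

For example, in 1-1 matching with a single exposure and many matched sets, $\hat\beta_{\text{unconditional}}$ estimates $2\beta$, not $\beta$, and $\widehat{OR}_{\text{unconditional}}$ estimates $OR^2$, not $OR$

Breslow & Day (page 252) show the effects of this bias under different sample sizes and true ORs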
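The doubling can be seen directly in a small Python sketch (an illustration under stated assumptions, not from the slides). It assumes 1-1 matching and a single binary exposure. Concordant pairs contribute only constants, and for an exposure-discordant pair, maximizing the pair's likelihood over $\alpha_j$ gives $\alpha_j = -\beta/2$ (a step worked out here, not on the slides), so the profile log-likelihood is $2 n_{10} \log \operatorname{expit}(\beta/2) + 2 n_{01} \log \operatorname{expit}(-\beta/2)$. The sketch maximizes it numerically and compares the result with the conditional estimate $\log(n_{10}/n_{01})$. The counts $n_{10} = 13$, $n_{01} = 5$ are the discordant pairs from the 1-1 gall bladder example later in this lecture; the function names are hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def unconditional_profile_loglik(beta, n10, n01):
    """Unconditional log-likelihood for 1-1 pairs with one binary exposure,
    after maximizing out every pair-specific alpha_j (constants dropped).

    For an exposure-discordant pair the maximizing alpha_j is -beta/2,
    leaving expit(beta/2)^2 (case exposed) or expit(-beta/2)^2 (control exposed).
    """
    return 2 * n10 * math.log(expit(beta / 2)) + 2 * n01 * math.log(expit(-beta / 2))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximizer of a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0


n10, n01 = 13, 5  # exposure-discordant pairs (gall bladder, 1-1 matching)
beta_conditional = math.log(n10 / n01)
beta_unconditional = golden_section_max(
    lambda b: unconditional_profile_loglik(b, n10, n01), -10.0, 10.0)

print(math.exp(beta_conditional))    # conditional OR, about 2.6
print(math.exp(beta_unconditional))  # unconditional OR, about 2.6 ** 2 = 6.76
```

For a binary exposure with 1-1 matching the squaring is exact for the point estimate ($\hat\beta_{\text{unconditional}} = 2\log(n_{10}/n_{01})$), not only in large samples.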

BIOST 536 Lecture 12 5

Conditional logistic regression

Few strata relative to sample size:

Age group categorized into six groups: 30-39, 40-49, 50-59, 60-69, 70-79, 80+

Only six strata, and the number of strata does not increase as $n \to \infty$

Can use unconditional logistic regression:

$$\operatorname{logit}(p) = \alpha_1 + \alpha_2\,\text{AGEGRP}_2 + \cdots + \alpha_6\,\text{AGEGRP}_6 + \beta_1 X_1 + \cdots + \beta_k X_k$$

Can also use conditional logistic regression and condition out the six $\alpha$ parameters

Often compare both methods to make sure they agree

My personal rule: I tend to use conditional logistic regression, and would always use it when the number of nuisance parameters ($\alpha$'s) is greater than 10

BIOST 536 Lecture 12 6

Example: Oxford data with a parameter for each year of birth, 1934-1954

Cases and controls were frequency matched on year of birth. Ungrouped data: n = 11,852 observations

. tabulate xray y

           |           y
      xray |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,324      4,994 |    10,318
         1 |       602        932 |     1,534
-----------+----------------------+----------
     Total |     5,926      5,926 |    11,852

. table yob y

----------------------
          |     y
      yob |    0     1
----------+-----------
     1934 |   28    28
     1935 |   53    53
     1936 |  108   108
     1937 |  175   175
     1938 |  226   226
     1939 |  281   281
     1940 |  345   345
     1941 |  429   429
     1942 |  455   455
     1943 |  529   529
     1944 |  486   486
     1945 |  471   471
     1946 |  457   457
     1947 |  378   378
     1948 |  405   405
     1949 |  358   358
     1950 |  272   272
     1951 |  192   192
     1952 |  146   146
     1953 |  100   100
     1954 |   32    32
----------------------

BIOST 536 Lecture 12 7

Example: Usual odds ratio and Mantel-Haenszel odds ratio adjusting for year of birth

Standard logistic regression

. cc y xray

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       932        4994  |       5926       0.1573
        Controls |       602        5324  |       5926       0.1016
-----------------+------------------------+------------------------
           Total |      1534       10318  |      11852       0.1294
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         1.650475       |    1.477037     1.84433  (exact)
 Attr. frac. ex. |         .3941138       |    .3229688    .4577978  (exact)
 Attr. frac. pop |         .0619835       |
                 +-------------------------------------------------
                               chi2(1) =    81.55  Pr>chi2 = 0.0000

. mhodds y xray yob

Mantel-Haenszel estimate of the odds ratio
Comparing xray==1 vs. xray==0, controlling for yob

----------------------------------------------------------------
 Odds Ratio    chi2(1)        P>chi2       [95% Conf. Interval]
----------------------------------------------------------------
   1.654310      81.99        0.0000       1.481817    1.846883
----------------------------------------------------------------

. logistic y xray

Logistic regression                               Number of obs   =      11852
                                                  LR chi2(1)      =      82.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -8174.1279                       Pseudo R2       =     0.0050

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xray |   1.650475   .0922224     8.97   0.000     1.479268    1.841496
------------------------------------------------------------------------------
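The Mantel-Haenszel estimate reported by `mhodds` pools the stratum-specific 2 x 2 tables as $\sum_j a_j d_j / n_j$ divided by $\sum_j b_j c_j / n_j$. A minimal Python sketch of that formula follows; the (a, b, c, d) layout and the function name are assumptions of this sketch. The per-year-of-birth exposure counts are not shown in the slides, so the check uses only the crude table, where a single stratum reproduces the crude odds ratio 1.650475.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over 2 x 2 tables.

    Each stratum is a tuple (a, b, c, d):
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Crude table from 'cc y xray': a single stratum reproduces the crude OR
print(mantel_haenszel_or([(932, 4994, 602, 5324)]))  # about 1.650475
```

A stratum with no cases (a = b = 0) or no controls (c = d = 0) adds zero to both sums, which matches the earlier point that such strata drop out of the analysis.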

BIOST 536 Lecture 12 8

Example: Unconditional logistic regression adjusting for YOB

. xi: logistic y xray i.yob
i.yob             _Iyob_1934-1954     (naturally coded; _Iyob_1934 omitted)

Logistic regression                               Number of obs   =      11852
                                                  LR chi2(21)     =      82.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -8173.8379                       Pseudo R2       =     0.0050

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xray |   1.656325   .0928747     9.00   0.000      1.48394    1.848735
  _Iyob_1935 |   .9752808   .3228944    -0.08   0.940     .5097007     1.86614
  _Iyob_1936 |   .9830131    .295339    -0.06   0.955     .5455344    1.771318
  _Iyob_1937 |   .9855205   .2841592    -0.05   0.960     .5600609    1.734188
  _Iyob_1938 |    .973024   .2761799    -0.10   0.923      .557855    1.697172
  _Iyob_1939 |   .9695006   .2721963    -0.11   0.912     .5591977    1.680857
  _Iyob_1940 |   .9651519   .2686923    -0.13   0.899     .5592764    1.665578
  _Iyob_1941 |   .9628661   .2660777    -0.14   0.891     .5602031    1.654956
  _Iyob_1942 |   .9537434   .2631168    -0.17   0.864     .5553973    1.637794
  _Iyob_1943 |   .9605308    .263891    -0.15   0.883     .5606049    1.645757
  _Iyob_1944 |   .9489256   .2613148    -0.19   0.849     .5531317     1.62793
  _Iyob_1945 |   .9460176   .2607497    -0.20   0.840     .5511675    1.623734
  _Iyob_1946 |   .9405123   .2594737    -0.22   0.824     .5476844    1.615097
  _Iyob_1947 |    .957647   .2657444    -0.16   0.876     .5559034    1.649725
  _Iyob_1948 |    .965126   .2671821    -0.13   0.898     .5609709    1.660457
  _Iyob_1949 |   .9878059   .2745763    -0.04   0.965      .572884    1.703243
  _Iyob_1950 |   .9738951   .2738298    -0.09   0.925     .5612806    1.689835
  _Iyob_1951 |   .9763592   .2798153    -0.08   0.933      .556754    1.712206
  _Iyob_1952 |    .980806   .2866674    -0.07   0.947     .5530905    1.739282
  _Iyob_1953 |   .9795656   .2967346    -0.07   0.946     .5409829    1.773714
  _Iyob_1954 |   .9649768   .3540197    -0.10   0.923     .4701514    1.980597
------------------------------------------------------------------------------

BIOST 536 Lecture 12 9

Example

. est store A

. xi: logistic y i.yob
i.yob             _Iyob_1934-1954     (naturally coded; _Iyob_1934 omitted)

Logistic regression                               Number of obs   =      11852
                                                  LR chi2(20)     =      -0.00
                                                  Prob > chi2     =     1.0000
Log likelihood = -8215.1804                       Pseudo R2       =    -0.0000

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Iyob_1935 |          1   .3304004     0.00   1.000     .5233144    1.910897
  _Iyob_1936 |          1   .2999118     0.00   1.000     .5555391    1.800053
  _Iyob_1937 |          1   .2878492     0.00   1.000     .5688298    1.757995
  _Iyob_1938 |          1    .283334     0.00   1.000     .5738861    1.742506
  _Iyob_1939 |          1   .2802606     0.00   1.000     .5773534    1.732041
  _Iyob_1940 |          1   .2778951     0.00   1.000     .5800364     1.72403
  _Iyob_1941 |          1   .2758452     0.00   1.000     .5823715    1.717117
  _Iyob_1942 |          1   .2753619     0.00   1.000     .5829234    1.715491
  _Iyob_1943 |          1   .2742431     0.00   1.000     .5842031    1.711734
  _Iyob_1944 |          1   .2748523     0.00   1.000     .5835059    1.713779
  _Iyob_1945 |          1   .2750906     0.00   1.000     .5832335    1.714579
  _Iyob_1946 |          1    .275327     0.00   1.000     .5829634    1.715374
  _Iyob_1947 |          1    .276983     0.00   1.000     .5810743     1.72095
  _Iyob_1948 |          1   .2763455     0.00   1.000     .5818007    1.718802
  _Iyob_1949 |          1   .2775161     0.00   1.000     .5804675    1.722749
  _Iyob_1950 |          1   .2806804     0.00   1.000     .5768785    1.733467
  _Iyob_1951 |          1   .2860861     0.00   1.000     .5707989    1.751931
  _Iyob_1952 |          1   .2917657     0.00   1.000     .5644801    1.771542
  _Iyob_1953 |          1   .3023716     0.00   1.000     .5528672    1.808753
  _Iyob_1954 |          1   .3659625     0.00   1.000     .4880813    2.048839
------------------------------------------------------------------------------

. est store B

. lrtest A B

Likelihood-ratio test                                 LR chi2(1)  =     82.69
(Assumption: B nested in A)                           Prob > chi2 =    0.0000
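The LR statistic printed by `lrtest` is twice the difference between the log likelihoods of the two nested models. A short Python sketch (assuming the models differ by exactly one parameter, so the reference distribution is chi-square with 1 df, whose upper tail is $\operatorname{erfc}(\sqrt{x/2})$) reproduces the 82.69 above from the two printed log likelihoods; the function name is hypothetical.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """LR statistic and p-value for nested models that differ by one parameter."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Model B (i.yob only) is nested in model A (xray + i.yob)
statistic, p_value = lr_test_1df(-8215.1804, -8173.8379)
print(statistic, p_value)  # statistic about 82.69, p-value far below 0.0001
```

The same helper applies to the later `lrtest` comparisons of conditional models in this lecture, since they also differ by one parameter.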

BIOST 536 Lecture 12 10

Example: Conditional logistic regression stratified on YOB, with m cases : n controls for each YOB ("true stratification")

In all the analyses, the OR and 95% CI are about the same because of the close frequency matching

. clogit y xray, group(yob)
note: multiple positive outcomes within groups encountered.

Iteration 0:   log likelihood = -8105.5126
Iteration 1:   log likelihood = -8105.5114
Iteration 2:   log likelihood = -8105.5114

Conditional (fixed-effects) logistic regression   Number of obs   =      11852
                                                  LR chi2(1)      =      82.55
                                                  Prob > chi2     =     0.0000
Log likelihood = -8105.5114                       Pseudo R2       =     0.0051

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xray |   .5037908   .0560286     8.99   0.000     .3939767    .6136048
------------------------------------------------------------------------------

. clogit , or

Conditional (fixed-effects) logistic regression   Number of obs   =      11852
                                                  LR chi2(1)      =      82.55
                                                  Prob > chi2     =     0.0000
Log likelihood = -8105.5114                       Pseudo R2       =     0.0051

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xray |   1.654983   .0927264     8.99   0.000     1.482866    1.847078
------------------------------------------------------------------------------

BIOST 536 Lecture 12 11

Conditional logistic regression

Philosophy underlying conditional logistic regression

Cohort study:

Suppose subjects are pair matched on one or more variables, and then all are followed for a fixed period of time

A binary outcome is recorded at the end of the observation period

Assume that the matching variable is strongly related to the probability of the outcome

If both people in the pair have the same outcome, the pair gives no information about the association between the covariates and the outcome (the shared outcome could be due to the matching variables or to the covariates)

Only pairs discordant on the outcome are informative, since the difference in outcome is attributed to the covariates, not to the common matching variables

Performing a cohort study this way is inefficient, since the concordant pairs are followed but contribute nothing

BIOST 536 Lecture 12 12

1-1 matching

Matched case-control study (1 case : 1 control):

The design guarantees that each pair is discordant on the outcome; covariate information is then collected

Simplest case: Single binary exposure variable

The four possible 2 x 2 tables for a matched pair:

                 Exposure     Exposure     Exposure     Exposure
                 Pos  Neg     Pos  Neg     Pos  Neg     Pos  Neg    Total
Case              1    0       1    0       0    1       0    1       1
Control           1    0       0    1       1    0       0    1       1
Total             2    0       1    1       1    1       0    2       2
Number of
such tables         n11          n10          n01          n00

Let $P_1 = \dfrac{e^{\alpha_j + \beta X_{\text{Case}}}}{1 + e^{\alpha_j + \beta X_{\text{Case}}}}$ and $P_0 = \dfrac{e^{\alpha_j + \beta X_{\text{Control}}}}{1 + e^{\alpha_j + \beta X_{\text{Control}}}}$

Now assume that I do not know which one was actually the case (the labels are missing), but I do know that exactly one of them is the case

BIOST 536 Lecture 12 13

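1-1 matching

The probability that I correctly identify the case is a conditional probability:

$$\frac{P_1(1-P_0)}{P_1(1-P_0) + P_0(1-P_1)} = \frac{\dfrac{e^{\alpha_j + \beta X_{\text{Case}}}}{1 + e^{\alpha_j + \beta X_{\text{Case}}}} \cdot \dfrac{1}{1 + e^{\alpha_j + \beta X_{\text{Control}}}}}{\dfrac{e^{\alpha_j + \beta X_{\text{Case}}}}{1 + e^{\alpha_j + \beta X_{\text{Case}}}} \cdot \dfrac{1}{1 + e^{\alpha_j + \beta X_{\text{Control}}}} + \dfrac{e^{\alpha_j + \beta X_{\text{Control}}}}{1 + e^{\alpha_j + \beta X_{\text{Control}}}} \cdot \dfrac{1}{1 + e^{\alpha_j + \beta X_{\text{Case}}}}} = \frac{e^{\alpha_j + \beta X_{\text{Case}}}}{e^{\alpha_j + \beta X_{\text{Case}}} + e^{\alpha_j + \beta X_{\text{Control}}}} = \frac{e^{\beta X_{\text{Case}}}}{e^{\beta X_{\text{Case}}} + e^{\beta X_{\text{Control}}}}$$

which is the contribution of this pair to the conditional likelihood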

BIOST 536 Lecture 12 14

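1-1 matching

We want to estimate $\beta$ using this conditional likelihood

What if $X_{\text{Case}} = X_{\text{Control}} = 1$ (both exposed)?

$$\frac{e^{\beta X_{\text{Case}}}}{e^{\beta X_{\text{Case}}} + e^{\beta X_{\text{Control}}}} = \frac{e^{\beta}}{e^{\beta} + e^{\beta}} = \frac{1}{2}$$

does not help us estimate $\beta$

Similarly if $X_{\text{Case}} = X_{\text{Control}} = 0$ (both unexposed)

$$\frac{e^{\beta X_{\text{Case}}}}{e^{\beta X_{\text{Case}}} + e^{\beta X_{\text{Control}}}} = \frac{e^{0}}{e^{0} + e^{0}} = \frac{1}{2}$$

does not help us estimate $\beta$

Only those pairs for which $X_{\text{Case}} = 1$ and $X_{\text{Control}} = 0$, or $X_{\text{Case}} = 0$ and $X_{\text{Control}} = 1$, can help us estimate $\beta$

In this simple case

$$e^{\hat\beta} = \frac{n_{10}}{n_{01}} = \frac{\text{\# sets with the case exposed (control unexposed)}}{\text{\# sets with the control exposed (case unexposed)}}$$

Matched pair must be discordant on outcome (guaranteed by the design)

Matched pair must be discordant on exposure

                      Case unexposed    Case exposed
Control unexposed          n00               n10
Control exposed            n01               n11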
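The closed form $e^{\hat\beta} = n_{10}/n_{01}$ can be checked by maximizing the conditional likelihood numerically. The Python sketch below (an illustration, not Stata's implementation; function names are hypothetical) runs Newton-Raphson on the exposure-discordant pairs only, and lets you confirm that concordant pairs add only a constant $\log(1/2)$ each. The counts $n_{11} = 4$, $n_{10} = 13$, $n_{01} = 5$, $n_{00} = 41$ are taken from the gall bladder `mcc` table later in this lecture.

```python
import math


def conditional_loglik(beta, n11, n10, n01, n00):
    """Conditional log-likelihood for 1-1 matched pairs with one binary exposure.

    n11: both exposed, n10: only the case exposed,
    n01: only the control exposed, n00: neither exposed.
    """
    def contribution(x_case, x_control):
        eta_case, eta_control = beta * x_case, beta * x_control
        return eta_case - math.log(math.exp(eta_case) + math.exp(eta_control))

    return (n11 * contribution(1, 1) + n10 * contribution(1, 0)
            + n01 * contribution(0, 1) + n00 * contribution(0, 0))


def fit_beta(n10, n01, beta=0.0, tol=1e-12, max_iter=100):
    """Newton-Raphson for beta using only the exposure-discordant pairs."""
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-beta))  # P(the exposed member is the case)
        score = n10 - (n10 + n01) * p
        information = (n10 + n01) * p * (1.0 - p)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


beta_hat = fit_beta(13, 5)
print(math.exp(beta_hat))  # about 13 / 5 = 2.6
```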

BIOST 536 Lecture 12 15

1-1 matching

Several covariates with 1 to 1 matching

Suppose the case covariates are $X_{11}, X_{12}, \ldots, X_{1k}$

and the control covariates are $X_{01}, X_{02}, \ldots, X_{0k}$

We want to estimate $\beta_1, \beta_2, \ldots, \beta_k$

$$\frac{e^{\beta_1 X_{11} + \beta_2 X_{12} + \cdots + \beta_k X_{1k}}}{e^{\beta_1 X_{11} + \beta_2 X_{12} + \cdots + \beta_k X_{1k}} + e^{\beta_1 X_{01} + \beta_2 X_{02} + \cdots + \beta_k X_{0k}}}$$

is the contribution of this pair to the conditional likelihood

Note there is no $\beta_0$, that is no intercept (const), in a conditional model

BIOST 536 Lecture 12 16

Example: Endometrial cancer in Leisure World

Total of 63 cancer cases identified

One case matched to four controls from the same community

Matched on age (within one year); age at entry into the retirement community

Controls had to be alive and at risk for endometrial cancer

Exposure variables assessed from medical records

In our first examples we will use only one control per case and ignore the other three (63 matched sets)

Consider some dichotomous exposure variables

First few records

. sort set case

. list set case age gall hyperten obes estrogen estdose in 1/6

     +----------------------------------------------------------------+
     | set   case   age   gall   hyperten   obes   estrogen   estdose |
     |----------------------------------------------------------------|
  1. |   1      0    75      0          0      .          0         0 |
  2. |   1      1    74      0          0      1          1         4 |
  3. |   2      0    67      0          0      0          1         6 |
  4. |   2      1    67      0          0      0          1         6 |
  5. |   3      0    76      0          1      1          1         2 |
  6. |   3      1    76      0          1      1          1         1 |
     +----------------------------------------------------------------+

BIOST 536 Lecture 12 17

Example

The unmatched tabulation of case by gall at the end of the output below is not really what we want, since we want to retain the matching and compare gall bladder disease in each case with that in its matched control

First look at the descriptives

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         set |       126          32    18.25683          1         63
        case |       126          .5     .501996          0          1
         age |       126    70.69841    6.333744         57         83
        gall |       126    .2063492    .4062996          0          1
    hyperten |       126    .3730159    .4855368          0          1
        obes |       107    .6728972     .471363          0          1
     thyroid |       126    .2777778    .4496913          0          1
      barbit |       126    .2936508     .457252          0          1
       tranq |       126    .5079365    .5019328          0          1
    estrogen |       126    .6825397    .4673464          0          1
     estdose |       122    1.434426    1.876288          0          6
    premarin |       119    32.47899    39.71655          0         96
       other |       126    1.206349    .4062996          1          2

. tabulate case gall

           |         gall
      case |         0          1 |     Total
-----------+----------------------+----------
         0 |        54          9 |        63
         1 |        46         17 |        63
-----------+----------------------+----------
     Total |       100         26 |       126

BIOST 536 Lecture 12 18

Example: Use a small trick to get the case and control values for gall bladder disease on the same line

The data are sorted by set and then case, so each control's row is immediately followed by its case's row; gallcase therefore sits on the control's row

. gen gallcont=gall if case==0
(63 missing values generated)

. gen gallcase=gall[_n+1] if set==set[_n+1]
(63 missing values generated)

. list set case age gall gallcase gallcont in 1/12

     +-----------------------------------------------+
     | set   case   age   gall   gallcase   gallcont |
     |-----------------------------------------------|
  1. |   1      0    75      0          0          0 |
  2. |   1      1    74      0          .          . |
  3. |   2      0    67      0          0          0 |
  4. |   2      1    67      0          .          . |
  5. |   3      0    76      0          0          0 |
  6. |   3      1    76      0          .          . |
  7. |   4      0    70      1          0          1 |
  8. |   4      1    71      0          .          . |
  9. |   5      0    69      0          1          0 |
 10. |   5      1    69      1          .          . |
 11. |   6      0    71      0          0          0 |
 12. |   6      1    70      0          .          . |
     +-----------------------------------------------+

. tabulate gallcont gallcase

           |       gallcase
  gallcont |         0          1 |     Total
-----------+----------------------+----------
         0 |        41         13 |        54
         1 |         5          4 |         9
-----------+----------------------+----------
     Total |        46         17 |        63
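The Stata trick depends on the sort order. A hedged Python sketch of the same reshaping, using the first twelve rows listed above, groups records by set instead, so row order does not matter; the function names are illustrative, not part of any library.

```python
from collections import defaultdict


def pair_up(records):
    """Return {set_id: (case_value, control_value)} for complete 1-1 matched sets.

    records: iterable of (set_id, case_flag, value); row order does not matter.
    """
    by_set = defaultdict(dict)
    for set_id, case_flag, value in records:
        by_set[set_id][case_flag] = value
    return {s: (d[1], d[0]) for s, d in by_set.items() if 0 in d and 1 in d}


def cross_tab(pairs):
    """Counts keyed by (control_value, case_value), like 'tabulate gallcont gallcase'."""
    counts = defaultdict(int)
    for case_value, control_value in pairs.values():
        counts[(control_value, case_value)] += 1
    return dict(counts)


# (set, case, gall) for the first twelve rows listed above
rows = [(1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0), (3, 0, 0), (3, 1, 0),
        (4, 0, 1), (4, 1, 0), (5, 0, 0), (5, 1, 1), (6, 0, 0), (6, 1, 0)]
pairs = pair_up(rows)
print(pairs[4], pairs[5])  # (case, control): (0, 1) and (1, 0)
print(cross_tab(pairs))
```

Sets lacking either a case or a control are skipped, mirroring how incomplete sets contribute nothing to the matched analysis.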

BIOST 536 Lecture 12 19

Example: Can use the matched case-control command (mcc)

It gives the OR easily, together with confidence intervals and an exact p-value based on the binomial distribution with null hypothesis p = 0.50 and n = number of pairs discordant on exposure status

Easier to just use conditional logistic regression

. mcc gallcase gallcont

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |         4          13  |         17
       Unexposed |         5          41  |         46
-----------------+------------------------+------------
           Total |         9          54  |         63

McNemar's chi2(1) =      3.56    Prob > chi2 = 0.0593
Exact McNemar significance probability       = 0.0963

Proportion with factor
        Cases       .2698413
        Controls    .1428571     [95% Conf. Interval]
                   ---------     --------------------
        difference  .1269841     -.017101    .2710693
        ratio       1.888889     .9643767    3.699697
        rel. diff.  .1481481     .0060224    .2902738

        odds ratio       2.6      .869851    9.314679   (exact)
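The McNemar statistics and the matched-pair odds ratio in this output depend only on the two discordant counts. A small Python sketch (assuming Python 3.8+ for `math.comb`; the function name is hypothetical) reproduces the chi-square of 3.56, the exact p-value of 0.0963 and the odds ratio of 2.6 from $n_{10} = 13$ and $n_{01} = 5$. It does not reproduce the exact confidence interval.

```python
import math


def mcnemar(n10, n01):
    """McNemar chi-square, exact two-sided p-value, and matched-pair odds ratio.

    n10: pairs with the case exposed and the control unexposed
    n01: pairs with the case unexposed and the control exposed
    """
    n = n10 + n01
    chi2 = (n10 - n01) ** 2 / n
    # Under H0 each discordant pair has the case exposed with probability 0.5
    k = min(n10, n01)
    lower_tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2.0 * lower_tail)
    return chi2, p_exact, n10 / n01


chi2, p_exact, odds_ratio = mcnemar(13, 5)
print(round(chi2, 2), round(p_exact, 4), odds_ratio)  # 3.56 0.0963 2.6
```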

BIOST 536 Lecture 12 20

Example

Previous gall bladder disease; univariate odds ratio

. clogit case gall, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs   =        126
                                                  LR chi2(1)      =       3.68
                                                  Prob > chi2     =     0.0550
Log likelihood = -41.826784                       Pseudo R2       =     0.0422

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |        2.6   1.368211     1.82   0.069     .9269183    7.292984
------------------------------------------------------------------------------

Hypertension; univariate odds ratio

. clogit case hyperten, group(set) or

Conditional (fixed-effects) logistic regression     Number of obs =       126
                                                    chi2(1)       =      0.81
                                                    Prob > chi2   =    0.3681
Log Likelihood = -43.26328                          Pseudo R2     =    0.0093

------------------------------------------------------------------------------
    case | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
hyperten |   1.384615   .5039669      0.894   0.371       .678441    2.825831
------------------------------------------------------------------------------

Obesity; univariate odds ratio

. clogit case obes, group(set) or
Note: 17 groups (17 obs) dropped due to all positive or negative outcomes.

Conditional (fixed-effects) logistic regression     Number of obs =        90
                                                    chi2(1)       =      0.25
                                                    Prob > chi2   =    0.6166
Log Likelihood = -31.066296                         Pseudo R2     =    0.0040

------------------------------------------------------------------------------
    case | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    obes |   1.285714   .6479391      0.499   0.618      .4788276     3.45231
------------------------------------------------------------------------------

BIOST 536 Lecture 12 21

Example

Estrogen use; univariate odds ratio

. clogit case estrogen, group(set) or

Conditional (fixed-effects) logistic regression     Number of obs =       126
                                                    chi2(1)       =     24.45
                                                    Prob > chi2   =    0.0000
Log Likelihood = -31.443696                         Pseudo R2     =    0.2799

------------------------------------------------------------------------------
    case | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
estrogen |   9.666667   5.862608      3.741   0.000      2.944712    31.73296
------------------------------------------------------------------------------

Gall bladder and estrogen use; multivariate odds ratios

. clogit case gall estrogen, group(set) or

Conditional (fixed-effects) logistic regression     Number of obs =       126
                                                    chi2(2)       =     25.79
                                                    Prob > chi2   =    0.0000
Log Likelihood = -30.77245                          Pseudo R2     =    0.2953

------------------------------------------------------------------------------
    case | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    gall |   2.003172   1.233219      1.128   0.259      .5993636    6.694932
estrogen |   9.107077   5.552653      3.623   0.000      2.756735    30.08589
------------------------------------------------------------------------------

Estrogen use odds ratio reduced from 9.67 to 9.11 after adjusting for gall bladder disease

Gall bladder odds ratio reduced from 2.60 to 2.00 after adjusting for estrogen use

BIOST 536 Lecture 12 22

Example

What can we do in conditional logistic regression?

Model building proceeds in a fashion similar to unconditional logistic regression

Wald tests and LR tests are both available

Fit gall bladder disease

. clogit case gall, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs   =        126
                                                  LR chi2(1)      =       3.68
                                                  Prob > chi2     =     0.0550
Log likelihood = -41.826784                       Pseudo R2       =     0.0422

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |        2.6   1.368211     1.82   0.069     .9269183    7.292984
------------------------------------------------------------------------------

Gall bladder disease is marginally significant (LR test p = 0.055)

Now add estrogen use to the model

. est store A

. clogit case gall estrogen, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs   =        126
                                                  LR chi2(2)      =      25.79
                                                  Prob > chi2     =     0.0000
Log likelihood = -30.77245                        Pseudo R2       =     0.2953

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   2.003172   1.233221     1.13   0.259     .5993628    6.694941
    estrogen |   9.107077   5.552675     3.62   0.000     2.756722    30.08604
------------------------------------------------------------------------------

BIOST 536 Lecture 12 23

Example

. est store B

. lrtest A B

Likelihood-ratio test                                 LR chi2(1)  =     22.11
(Assumption: A nested in B)                           Prob > chi2 =    0.0000

Consider the interaction of estrogen and gall bladder disease

. xi: clogit case i.gall*estrogen, group(set) or
i.gall            _Igall_0-1          (naturally coded; _Igall_0 omitted)
i.gall*estrogen   _IgalXestro_#       (coded as above)

Conditional (fixed-effects) logistic regression   Number of obs   =        126
                                                  LR chi2(3)      =      28.35
                                                  Prob > chi2     =     0.0000
Log likelihood = -29.492281                       Pseudo R2       =     0.3246

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Igall_1 |   9.898637   12.10023     1.88   0.061     .9016854    108.6665
    estrogen |   14.45528   10.88972     3.55   0.000     3.302049    63.28045
_IgalXestr~1 |   .1174832   .1609567    -1.56   0.118      .008013    1.722488
------------------------------------------------------------------------------

. est store C

. lrtest B C

Likelihood-ratio test                                 LR chi2(1)  =      2.56
(Assumption: B nested in C)                           Prob > chi2 =    0.1096

. lincom _Igall_1 + estrogen + _IgalXestro_1, or

 ( 1)  [case]_Igall_1 + [case]estrogen + [case]_IgalXestro_1 = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   16.81038    15.2039     3.12   0.002     2.855758    98.95412
------------------------------------------------------------------------------
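The `lincom` estimate is the odds ratio for having both gall bladder disease and estrogen use versus neither: the exponential of the sum of the three coefficients, which equals the product of the three reported odds ratios. A one-line Python check using the printed values (variable names are illustrative):

```python
import math

# Odds ratios printed in the interaction model above
or_gall, or_estrogen, or_interaction = 9.898637, 14.45528, 0.1174832

# exp(b_gall + b_estrogen + b_interaction) is the product of the three odds ratios
or_both = math.exp(math.log(or_gall) + math.log(or_estrogen) + math.log(or_interaction))
print(or_both)  # about 16.81, matching the lincom estimate
```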

Remove the interaction and consider whether gall bladder is needed

BIOST 536 Lecture 12 24

Example

. clogit case estrogen, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs   =        126
                                                  LR chi2(1)      =      24.45
                                                  Prob > chi2     =     0.0000
Log likelihood = -31.443696                       Pseudo R2       =     0.2799

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    estrogen |   9.666667   5.862625     3.74   0.000     2.944702    31.73307
------------------------------------------------------------------------------

. est store A

. lrtest A B

Likelihood-ratio test                                 LR chi2(1)  =      1.34
(Assumption: A nested in B)                           Prob > chi2 =    0.2466

Estrogen alone is a strong predictor, and gall bladder disease does not add significantly

Compute the conditional probabilities of being a case given 1-1 matching and the covariates

. clogitp condprob, group(set)

. list set case condprob estrogen

           set      case   condprob   estrogen
  1.         1         1     .90625          1
  2.         1         0     .09375          0
  etc.
123.        62         0     .09375          0
124.        62         1     .90625          1
125.        63         0         .5          1
126.        63         1         .5          1

The conditional probabilities within each set always sum to 1 (this also holds for 1-m matching)

Sets where the covariate is the same for the case and the control (condprob = 0.5) contribute only a constant factor of 1/2 to the conditional likelihood, so they carry no information about $\beta$
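The `clogitp` values follow from the fitted coefficient alone: within a set, each member's conditional probability is $e^{\hat\beta x}$ divided by the sum of $e^{\hat\beta x}$ over the set. A minimal Python sketch, plugging in the printed estrogen odds ratio 9.666667 from the 1-1 fit, reproduces .90625/.09375 for set 1 and .5/.5 for set 63; the function name is hypothetical.

```python
import math


def conditional_probs(beta, xs):
    """Conditional probability that each member of a matched set is the case."""
    weights = [math.exp(beta * x) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]


beta_hat = math.log(9.666667)  # estrogen odds ratio from the 1-1 clogit fit above
print(conditional_probs(beta_hat, [1, 0]))  # set 1: case exposed, control not
print(conditional_probs(beta_hat, [1, 1]))  # set 63: both exposed
```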

BIOST 536 Lecture 12 25

1-m matching

1 case matched to m controls

For simplicity, assume m = 3

For the $j$th matched set, let

$X_{j0}$ = covariate for the case

$X_{j1}$ = covariate for the first control

$X_{j2}$ = covariate for the second control

$X_{j3}$ = covariate for the third control

and $P_{ij} = \dfrac{e^{\alpha_j + \beta X_{ji}}}{1 + e^{\alpha_j + \beta X_{ji}}}$, where $i = 0, 1, 2, 3$

Then $P_{ij}$ is the absolute probability of becoming a case

However, what we know is that one and only one person became a case, so what we want is the conditional probability of being a case given that exactly one person is a case

$$\begin{aligned} p_0 &= \frac{P_{0j}(1-P_{1j})(1-P_{2j})(1-P_{3j})}{P_{0j}(1-P_{1j})(1-P_{2j})(1-P_{3j}) + P_{1j}(1-P_{0j})(1-P_{2j})(1-P_{3j}) + P_{2j}(1-P_{0j})(1-P_{1j})(1-P_{3j}) + P_{3j}(1-P_{0j})(1-P_{1j})(1-P_{2j})} \\ &= \frac{e^{\alpha_j + \beta X_{j0}}}{e^{\alpha_j + \beta X_{j0}} + e^{\alpha_j + \beta X_{j1}} + e^{\alpha_j + \beta X_{j2}} + e^{\alpha_j + \beta X_{j3}}} = \frac{e^{\beta X_{j0}}}{e^{\beta X_{j0}} + e^{\beta X_{j1}} + e^{\beta X_{j2}} + e^{\beta X_{j3}}} \end{aligned}$$

BIOST 536 Lecture 12 26

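1-m matching

Since $\alpha_j$ is not estimated, we cannot get the unconditional probability $P_{ij}$

We can estimate the conditional probabilities $p_0, p_1, p_2, p_3$ instead

These are multinomial probabilities:

$$p_1 = \frac{e^{\beta X_{j1}}}{e^{\beta X_{j0}} + e^{\beta X_{j1}} + e^{\beta X_{j2}} + e^{\beta X_{j3}}}, \quad p_2 = \frac{e^{\beta X_{j2}}}{e^{\beta X_{j0}} + e^{\beta X_{j1}} + e^{\beta X_{j2}} + e^{\beta X_{j3}}}, \quad p_3 = \frac{e^{\beta X_{j3}}}{e^{\beta X_{j0}} + e^{\beta X_{j1}} + e^{\beta X_{j2}} + e^{\beta X_{j3}}}$$

are the conditional probabilities that each control was the one and only case in the set

Note: $p_0 + p_1 + p_2 + p_3 = 1$ by the way they are defined

If the covariate is the same for all members of the set ($X_{j0} = X_{j1} = X_{j2} = X_{j3} = x$), we would have the likelihood contribution

$$p_0 = \frac{e^{\beta X_{j0}}}{e^{\beta X_{j0}} + e^{\beta X_{j1}} + e^{\beta X_{j2}} + e^{\beta X_{j3}}} = \frac{e^{\beta x}}{e^{\beta x} + e^{\beta x} + e^{\beta x} + e^{\beta x}} = \frac{1}{4}$$

so it would not tell us anything about $\beta$

If there are several covariates, then the conditional probability for 1 to m matching is

$$\frac{e^{\beta_1 X_{j01} + \beta_2 X_{j02} + \cdots + \beta_k X_{j0k}}}{e^{\beta_1 X_{j01} + \beta_2 X_{j02} + \cdots + \beta_k X_{j0k}} + \sum_{i=1}^{m} e^{\beta_1 X_{ji1} + \beta_2 X_{ji2} + \cdots + \beta_k X_{jik}}}$$

where the sum runs over the controls $i = 1, \ldots, m$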
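The multi-covariate formula above can be written as a small Python function for the log conditional likelihood of one 1:m set, summed over sets. This is a sketch, not Stata's `clogit`: the data layout (one covariate list for the case and one list per control) and the function names are assumptions. The largest linear predictor is subtracted before exponentiating, which is safe because, just as with $\alpha_j$, a constant shared by every member of the set cancels.

```python
import math


def set_loglik(beta, case_x, control_xs):
    """Log conditional likelihood contribution of one 1:m matched set.

    beta:       k coefficients (no intercept)
    case_x:     k covariate values for the case
    control_xs: m lists of k covariate values, one per control
    """
    def linear_predictor(x):
        return sum(b * xi for b, xi in zip(beta, x))

    etas = [linear_predictor(case_x)] + [linear_predictor(x) for x in control_xs]
    # Subtract the largest value before exponentiating; a constant shared by
    # every member of the set cancels, exactly as alpha_j does.
    top = max(etas)
    log_denominator = top + math.log(sum(math.exp(e - top) for e in etas))
    return etas[0] - log_denominator


def total_loglik(beta, matched_sets):
    """Sum of set contributions; matched_sets holds (case_x, control_xs) pairs."""
    return sum(set_loglik(beta, case_x, control_xs) for case_x, control_xs in matched_sets)


# Hypothetical 1:3 set with identical covariates: contributes log(1/4) for any beta
print(set_loglik([0.7, -1.2], [1, 0], [[1, 0], [1, 0], [1, 0]]))
```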

BIOST 536 Lecture 12 27

1-m matching

Note there is no $\beta_0$ in this model (no const)

Interactions can be included in the same framework

If all members of a set have exactly the same values for all covariates, then the set drops out

If all members of a set have exactly the same values for some of the covariates, then that set does not help estimate the coefficients of those covariates

If the case has a missing value for any covariate in the model, the case drops out and takes the entire set with it

BIOST 536 Lecture 12 28

ExampleEndometrial cancer with 4 controls per case

Stratum outcome age gall bladdr

hyper tensn

obesity thyroid use

barbit use

tranq use

est use

conj dose

premar use

other drugs

1 1 74 0 0 1 1 1 1 1 4 96 1 1 0 75 0 0 9 0 0 0 0 0 0 2 1 0 74 0 0 9 0 0 0 0 0 0 2 1 0 74 0 0 9 0 0 0 0 0 0 2 1 0 75 0 0 1 0 0 1 1 1 48 1 2 1 67 0 0 0 0 0 1 1 6 96 1 2 0 67 0 0 0 0 0 0 1 6 5 2 2 0 67 0 1 1 0 0 0 0 0 0 1 2 0 67 0 0 0 0 0 0 1 3 53 2 2 0 68 0 0 0 0 0 0 1 3 45 1 3 1 76 0 1 1 0 0 1 1 1 9 1 3 0 76 0 1 1 1 0 1 1 2 96 1 3 0 76 0 1 0 0 1 1 1 0 3 1 3 0 76 0 1 1 0 1 1 1 3 15 1 3 0 77 0 0 0 0 0 0 1 0 36 1

n = 315: 63 sets of five observations (1 case, 4 controls) matched on age and neighborhood

In the first three sets every member has the same gall bladder value (0), so these sets contribute nothing when estimating the effect of gall bladder disease

The first set has three missing values for obesity (code 9); because the case's value is not missing, the set is kept but ends up as a 1-1 match
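To make these two observations concrete, this Python sketch encodes the case indicator, gall bladder and obesity columns of the first three sets in the table (code 9 = missing). It checks, one covariate at a time, whether each set can carry information about that covariate. The helper `usable_sets` and its labels are hypothetical. "Informative" here means informative in a model containing that covariate alone.

```python
MISSING = 9   # obesity code for a missing value in this data set

# (set, case, gall bladder, obesity) for the first three sets of the table above
rows = [
    (1, 1, 0, 1), (1, 0, 0, 9), (1, 0, 0, 9), (1, 0, 0, 9), (1, 0, 0, 1),
    (2, 1, 0, 0), (2, 0, 0, 0), (2, 0, 0, 1), (2, 0, 0, 0), (2, 0, 0, 0),
    (3, 1, 0, 1), (3, 0, 0, 1), (3, 0, 0, 0), (3, 0, 0, 1), (3, 0, 0, 0),
]

def usable_sets(rows, col):
    """For covariate column `col` on its own: drop members whose value is missing,
    lose the whole set if the case is missing (or no control remains), and flag
    whether the case differs from at least one remaining control."""
    result = {}
    for s in sorted({r[0] for r in rows}):
        members = [r for r in rows if r[0] == s and r[col] != MISSING]
        cases = [r for r in members if r[1] == 1]
        controls = [r for r in members if r[1] == 0]
        if not cases or not controls:
            result[s] = ("set lost", len(controls))
            continue
        differs = any(c[col] != cases[0][col] for c in controls)
        result[s] = ("informative" if differs else "no information", len(controls))
    return result

print(usable_sets(rows, 2))  # gall bladder: no set is informative
print(usable_sets(rows, 3))  # obesity: set 1 keeps only 1 control
```

For obesity, set 1 survives as a 1-1 match. In this particular set the remaining control has the same obesity value as the case, so it would also contribute nothing in a model with obesity alone.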


Example

Change missing value codes to Stata missing values

. replace obes=. if obes==9
(51 real changes made, 51 to missing)

. replace estdose=. if estdose==9
(8 real changes made, 8 to missing)

. replace premarin=. if premarin==99
(17 real changes made, 17 to missing)

Fit gall bladder disease

. clogit case gall, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs =    315
                                                  LR chi2(1)    =  11.98
                                                  Prob > chi2   = 0.0005
Log likelihood = -95.404465                       Pseudo R2     = 0.0591

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   3.691907   1.372794     3.51   0.000     1.781317    7.651743
------------------------------------------------------------------------------

. est store A

Gall bladder disease is highly significant: the odds ratio is 3.69, compared with the 2.60 we found before with 1-1 matching
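The reported standard errors are on the odds-ratio scale. This sketch assumes the usual delta-method relationship SE(OR) = OR × SE(b), which is consistent with the numbers shown. On that basis, this Python sketch recovers the log-odds coefficient, the Wald z and the 95% interval for gall bladder disease. `or_wald_summary` is a hypothetical helper.

```python
import math

def or_wald_summary(odds_ratio, se_or, z_crit=1.959964):
    """From an odds ratio and its (delta-method) standard error, recover the
    log-odds coefficient b, SE(b), the Wald z statistic and the 95% CI for the OR."""
    b = math.log(odds_ratio)
    se_b = se_or / odds_ratio            # delta method: SE(OR) = OR * SE(b)
    z = b / se_b
    lower = math.exp(b - z_crit * se_b)  # CI is symmetric on the log scale
    upper = math.exp(b + z_crit * se_b)
    return b, se_b, z, lower, upper

b, se_b, z, lo, hi = or_wald_summary(3.691907, 1.372794)   # gall, 1-4 matching
print(round(z, 2), round(lo, 4), round(hi, 4))   # compare: 3.51, 1.781317, 7.651743
```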

Now add estrogen use


Example

. clogit case gall estrogen, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs =    315
                                                  LR chi2(2)    =  45.05
                                                  Prob > chi2   = 0.0000
Log likelihood = -78.871308                       Pseudo R2     = 0.2221

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   3.577465   1.469865     3.10   0.002     1.598984    8.003994
    estrogen |   8.287802   3.644927     4.81   0.000     3.500144    19.62424
------------------------------------------------------------------------------

. est store B
. lrtest A B

Likelihood-ratio test                             LR chi2(1)  =     33.07
(Assumption: A nested in B)                       Prob > chi2 =    0.0000

Estrogen use is also highly significant: the odds ratio is 8.29 (adjusted for gall bladder disease), and gall bladder disease remains significant
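The likelihood-ratio statistic is twice the difference in log likelihoods. Here is a minimal Python sketch that uses the two log likelihoods printed above. It relies on the identity P(χ²₁ > x) = erfc(√(x/2)) for one degree of freedom. `lr_test_1df` is a hypothetical helper.

```python
import math

def lr_test_1df(ll_smaller, ll_larger):
    """Likelihood-ratio statistic 2 * (ll_larger - ll_smaller) and its p-value
    from a chi-square distribution with 1 df: P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    stat = 2.0 * (ll_larger - ll_smaller)
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Model A (gall only) vs model B (gall + estrogen), log likelihoods from the output
stat, p = lr_test_1df(-95.404465, -78.871308)
print(round(stat, 2), p)   # statistic 33.07; p is far below 0.0001 (Stata shows 0.0000)
```

The same function applies to the other one-degree-of-freedom comparisons in this example. For instance, lr_test_1df(-78.871308, -76.730576) gives about 4.28 with p ≈ 0.0385 for the gall bladder by estrogen interaction.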

Consider an interaction term between gall bladder and estrogen

. xi: clogit case i.gall*estrogen, group(set) or
i.gall            _Igall_0-1          (naturally coded; _Igall_0 omitted)
i.gall*estrogen   _IgalXestro_#       (coded as above)

Conditional (fixed-effects) logistic regression   Number of obs =    315
                                                  LR chi2(3)    =  49.33
                                                  Prob > chi2   = 0.0000
Log likelihood = -76.730576                       Pseudo R2     = 0.2432

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Igall_1 |   18.07166   15.95823     3.28   0.001     3.201415    102.0127
    estrogen |   14.88179   9.104216     4.41   0.000     4.486594    49.36211
_IgalXestr~1 |   .1283818   .1277365    -2.06   0.039     .0182633     .902457
------------------------------------------------------------------------------


Example

. est store C
. lrtest B C

Likelihood-ratio test                             LR chi2(1)  =      4.28
(Assumption: B nested in C)                       Prob > chi2 =    0.0385

The interaction between gall bladder disease and estrogen is statistically significant, so consider the joint effects

. lincom _Igall_1 + estrogen + _IgalXestro_1, or

 ( 1)  [case]_Igall_1 + [case]estrogen + [case]_IgalXestro_1 = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   34.52683   24.97059     4.90   0.000     8.366586    142.4837
------------------------------------------------------------------------------

With an interaction term between gall bladder and estrogen

Estimated OR       No estrogen   Estrogen
No gall bladder    1.0           14.88
Gall bladder       18.07         18.07*14.88*0.1284 = 34.52

Without an interaction term between gall bladder and estrogen

Estimated OR       No estrogen   Estrogen
No gall bladder    1.0           8.29
Gall bladder       3.58          8.29*3.58 = 29.65
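The cell entries are products of odds ratios because the log odds ratios add. This Python sketch multiplies the unrounded estimates from the output above. The helper `joint_or` is hypothetical.

```python
def joint_or(*odds_ratios):
    """Joint odds ratio = product of the odds ratios (sum of the log-odds coefficients)."""
    result = 1.0
    for value in odds_ratios:
        result *= value
    return result

# Unrounded estimates copied from the two clogit fits
with_interaction = joint_or(18.07166, 14.88179, 0.1283818)  # _Igall_1, estrogen, interaction
main_only = joint_or(3.577465, 8.287802)                     # gall, estrogen (model B)
print(round(with_interaction, 2), round(main_only, 2))      # → 34.53 29.65
```

The 34.52 in the table comes from multiplying the rounded values 18.07 × 14.88 × 0.1284. The unrounded product agrees with the lincom estimate of 34.52683.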


Consider a larger set of main effects to find significant predictors (ignore interactions here)

Need to be careful with missing values (obesity, estdose)

. clogit case gall estrogen hyperten obes thyroid barbit tranq other, group(set) or
note: 6 groups (15 obs) dropped because of all positive or all negative outcomes.

If the case has a missing value, the entire set is lost

Conditional (fixed-effects) logistic regression   Number of obs =    249
                                                  LR chi2(8)    =  35.63
                                                  Prob > chi2   = 0.0000
Log likelihood = -65.021923                       Pseudo R2     = 0.2150

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   3.564345   1.569089     2.89   0.004     1.504061    8.446836
    estrogen |   6.743934   3.497201     3.68   0.000     2.440652    18.63463
    hyperten |   .7363322   .2787066    -0.81   0.419     .3506618    1.546177
        obes |    1.50281   .6185417     0.99   0.322     .6707458    3.367055
     thyroid |   1.039858    .414751     0.10   0.922     .4758495    2.272369
      barbit |   1.126494   .4418284     0.30   0.761     .5222451    2.429871
       tranq |   .8861228   .3554218    -0.30   0.763     .4037187    1.944952
       other |   .5143312    .326884    -1.05   0.295     .1479995    1.787415
------------------------------------------------------------------------------

. est store D

To compare the models with a likelihood-ratio test, refit the smaller model on the same 249 observations

. clogit case gall estrogen if obes~=., group(set) or
note: 6 groups (15 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression   Number of obs =    249
                                                  LR chi2(2)    =  32.60
                                                  Prob > chi2   = 0.0000
Log likelihood = -66.535452                       Pseudo R2     = 0.1968

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   3.514066   1.533611     2.88   0.004     1.493922    8.265933
    estrogen |   7.210559    3.63667     3.92   0.000     2.683258     19.3765
------------------------------------------------------------------------------


Example

. lrtest D .

Likelihood-ratio test                             LR chi2(6)  =      3.03
(Assumption: . nested in D)                       Prob > chi2 =    0.8054

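Of the main effects considered, only estrogen use and gall bladder disease appear to be risk factors; the six other covariates together add little (LR chi2(6) = 3.03, p = 0.81)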
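The 6-df test above can be reproduced from the two log likelihoods. For an even number of degrees of freedom the chi-square tail probability has a closed form, P(χ²_df > x) = e^{-x/2} Σ_{k<df/2} (x/2)^k / k!. This Python sketch uses that form so that only the standard library is needed. `chi2_sf_even_df` is a hypothetical helper.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

ll_full = -65.021923     # model D: gall, estrogen and six other covariates (n = 249)
ll_reduced = -66.535452  # gall and estrogen only, refit on the same 249 observations
stat = 2.0 * (ll_full - ll_reduced)
p_value = chi2_sf_even_df(stat, df=6)
print(round(stat, 2), round(p_value, 4))   # → 3.03 0.8054
```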

Look at estrogen dose coded 0-6 where 0 = lowest dose

Consider whether a dose-response model is an improvement

. table estrogen estdose

----------------------------------------------------
          |                 estdose
 estrogen |    0     1     2     3     4     5     6
----------+-----------------------------------------
        0 |  132
        1 |   62    22    16    40    16    12     7
----------------------------------------------------

. clogit case gall estrogen estdose, group(set) or
note: 4 groups (16 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression   Number of obs =    291
                                                  LR chi2(3)    =  51.12
                                                  Prob > chi2   = 0.0000
Log likelihood = -68.502974                       Pseudo R2     = 0.2717

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   4.883757   2.265871     3.42   0.001     1.967117     12.1249
    estrogen |   6.030169   2.975007     3.64   0.000     2.292904     15.8589
     estdose |   1.229028   .1265513     2.00   0.045     1.004419    1.503865
------------------------------------------------------------------------------

. est store C


Example

. clogit case gall estrogen if estdose~=., group(set) or
note: 4 groups (16 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression   Number of obs =    291
                                                  LR chi2(2)    =  47.05
                                                  Prob > chi2   = 0.0000
Log likelihood = -70.539476                       Pseudo R2     = 0.2501

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |    4.84514   2.186536     3.50   0.000     2.000655    11.73385
    estrogen |   8.676606   3.935269     4.76   0.000     3.566876    21.10628
------------------------------------------------------------------------------

. lrtest . C

Likelihood-ratio test                             LR chi2(1)  =      4.07
(Assumption: . nested in C)                       Prob > chi2 =    0.0436

The difference is significant, so keep estrogen dose in the model

. clogit case gall estrogen estdose, group(set) or

Conditional (fixed-effects) logistic regression   Number of obs =    291
                                                  LR chi2(3)    =  51.12
                                                  Prob > chi2   = 0.0000
Log likelihood = -68.502974                       Pseudo R2     = 0.2717

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gall |   4.883757   2.265871     3.42   0.001     1.967117     12.1249
    estrogen |   6.030169   2.975007     3.64   0.000     2.292904     15.8589
     estdose |   1.229028   .1265513     2.00   0.045     1.004419    1.503865
------------------------------------------------------------------------------


Example

Compute the ORs for gall = 0, estrogen = 1 at different doses (each relative to a subject with gall = 0 and no estrogen use)

. lincom estrogen+0*estdose, or

 ( 1)  [case]estrogen = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   6.030169   2.975007     3.64   0.000     2.292904     15.8589
------------------------------------------------------------------------------

. lincom estrogen+1*estdose, or

 ( 1)  [case]estrogen + [case]estdose = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   7.411248   3.463135     4.29   0.000     2.965795    18.52003
------------------------------------------------------------------------------

. lincom estrogen+2*estdose, or

 ( 1)  [case]estrogen + 2 [case]estdose = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   9.108635   4.218685     4.77   0.000     3.674662    22.57819
------------------------------------------------------------------------------
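Each lincom above exponentiates b_estrogen + d·b_estdose, so the point estimates are OR_estrogen × OR_estdose^d. This Python sketch reproduces them for every dose level from the rounded output. `estrogen_dose_or` is a hypothetical helper. The confidence intervals also need the covariance of the two coefficients; lincom uses it, but it is not printed here.

```python
OR_ESTROGEN = 6.030169   # OR for estrogen use from: clogit case gall estrogen estdose
OR_PER_DOSE = 1.229028   # OR per one-unit increase in estdose

def estrogen_dose_or(dose):
    """OR for an estrogen user at the given estdose level versus a non-user
    (gall = 0 in both): exp(b_estrogen + dose * b_estdose)."""
    return OR_ESTROGEN * OR_PER_DOSE ** dose

for dose in range(7):   # estdose is coded 0-6
    print(dose, round(estrogen_dose_or(dose), 2))
```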

Compute the conditional probabilities under this model

. clogitp condprob, group(set)
. list set case condprob gall estrogen estdose


Example

     +---------------------------------------------------+
     | set   case   condprob   gall   estrogen   estdose |
     |---------------------------------------------------|
  1. |   1      1   .5692481      0          1         4 |
  2. |   1      0   .0413737      0          0         0 |
  3. |   1      0   .0413737      0          0         0 |
  4. |   1      0   .0413737      0          0         0 |
  5. |   1      0   .3066308      0          1         1 |
     |---------------------------------------------------|
  6. |   2      1   .3199554      0          1         6 |
  7. |   2      0   .3199554      0          1         6 |
  8. |   2      0   .0153953      0          0         0 |
  9. |   2      0    .172347      0          1         3 |
 10. |   2      0    .172347      0          1         3 |
     |---------------------------------------------------|
 11. |   3      1   .1863293      0          1         1 |
 12. |   3      0   .2290041      0          1         2 |
 13. |   3      0    .151607      0          1         0 |
 14. |   3      0   .2814525      0          1         3 |
 15. |   3      0    .151607      0          1         0 |
     |---------------------------------------------------|
 16. |   4      1          .      0          1         . |
 17. |   4      0   .6281511      1          1         2 |
 18. |   4      0   .0851503      0          1         0 |
 19. |   4      0   .1580782      0          1         3 |
 20. |   4      0   .1286204      0          1         2 |
     +---------------------------------------------------+
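The listed values can be reproduced by hand. Within a set, each subject's conditional probability is its exp(linear predictor) divided by the set total. This Python sketch uses the ORs from the gall + estrogen + estdose model and the covariates of sets 1 and 2 from the listing. The helper names are hypothetical. Small differences in the last digits come from using rounded ORs.

```python
# Odds ratios from: clogit case gall estrogen estdose, group(set) or
OR = {"gall": 4.883757, "estrogen": 6.030169, "estdose": 1.229028}

def risk_score(gall, estrogen, estdose):
    """exp(linear predictor) for one subject = product of OR ** covariate."""
    return (OR["gall"] ** gall) * (OR["estrogen"] ** estrogen) * (OR["estdose"] ** estdose)

def set_probs(members):
    """Conditional probability that each member is the single case in its set."""
    scores = [risk_score(*m) for m in members]
    total = sum(scores)
    return [s / total for s in scores]

# (gall, estrogen, estdose) for sets 1 and 2 of the listing; the case is listed first
set1 = [(0, 1, 4), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 1, 1)]
set2 = [(0, 1, 6), (0, 1, 6), (0, 0, 0), (0, 1, 3), (0, 1, 3)]
print([round(p, 4) for p in set_probs(set1)])   # → [0.5692, 0.0414, 0.0414, 0.0414, 0.3066]
print([round(p, 4) for p in set_probs(set2)])   # → [0.32, 0.32, 0.0154, 0.1723, 0.1723]
```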

The last set shown (set 4) is lost because estdose is missing for the case, so the set contributes nothing to the estimates

Caution is necessary with matching - missing values can eliminate the entire set

If all the covariate values are the same within the set, the set does not contribute to estimation for that covariate


Summary

1-1 matching case-control

Only sets where the covariate is different between case and control supply information about that covariate

Cannot get absolute probabilities, just conditional probabilities

Missing value for the case or control will cause loss of the set

1-m matching case-control

Only sets where the covariate is different between the case and at least one control will supply information about that covariate

Cannot get absolute probabilities, just conditional probabilities

Missing value for the case will cause loss of the set (a missing value for a control removes only that control)

Can use Wald and LR tests as before for model fitting