Post on 03-Jan-2016

LOGISTIC REGRESSION

A statistical procedure to relate the probability of an event to explanatory variables

Used in epidemiology to describe and evaluate the effect of a risk factor on the occurrence of a disease event.

Example: Framingham Heart Study

Coronary heart disease and blood pressure

LOGISTIC REGRESSION: AN EXAMPLE

Event: Coronary Heart Disease

Occurrence is the dependent variable,

which takes 2 values: Yes or No.

Risk factor: Blood pressure

Systolic blood pressure is the independent variable X, a continuous measurement.

The probability of getting coronary heart disease depends on blood pressure.

DATA

MAN       SYSTOLIC BP (mmHg)   DEVELOPED CHD   CODED (1 = YES)
John      130                  NO              0
Steven    140                  NO              0
Sean      145                  NO              0
Brian     150                  NO              0
Michael   155                  YES             1
Terry     160                  NO              0
Joseph    165                  YES             1
Patrick   170                  YES             1
Teddy     175                  YES             1
Ryan      180                  YES             1
...       ...                  ...             ...

SCATTER PLOT

[Figure: scatter plot of CHD (vertical axis, 0.0 to 1.0) against systolic blood pressure (horizontal axis, 120 to 200).]

LINEAR REGRESSION FOR Prob(CHD): NOT A GOOD IDEA!

[Figure: straight-line fit of Prob(CHD) (vertical axis, -0.4 to 1.2) against systolic blood pressure (horizontal axis, 120 to 200).]
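
Why a straight line is a poor model for a probability can be sketched with a small least-squares fit. This is only an illustration: it uses the ten rows visible in the data table (the full data set is not shown), and the names `least_squares_line` and `linear_prob` are made up for this example.

```python
# Visible rows of the data table only (the slide's table continues beyond them).
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # 0/1 coded column


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = least_squares_line(sbp, chd)


def linear_prob(pressure):
    """Fitted 'probability' from the straight line; not bounded to [0, 1]."""
    return intercept + slope * pressure


for pressure in (120, 160, 200):
    print(pressure, round(linear_prob(pressure), 3))
```

At the ends of the plotted range the fitted value falls below 0 and rises above 1, which cannot be a probability.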

PROPORTION WITH CHD BY SBP GROUP

Systolic BP range   CHD / men   Proportion
130-149 mmHg        0/3         0.00
150-169 mmHg        2/4         0.50
170-189 mmHg        3/3         1.00
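
The grouped proportions can be reproduced from the ten visible rows of the data table, using the 0/1 coded column. A minimal sketch; the function name `proportion_by_group` is an assumption made for this example.

```python
# Visible rows of the data table (0/1 coded column).
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# SBP bands used on the slide, in mmHg (inclusive bounds).
bands = [(130, 149), (150, 169), (170, 189)]


def proportion_by_group(x, y, groups):
    """Return (low, high, cases, men, proportion) for each SBP band."""
    rows = []
    for low, high in groups:
        outcomes = [yi for xi, yi in zip(x, y) if low <= xi <= high]
        cases = sum(outcomes)
        rows.append((low, high, cases, len(outcomes), cases / len(outcomes)))
    return rows


table = proportion_by_group(sbp, chd, bands)
for low, high, cases, men, proportion in table:
    print(f"{low}-{high} mmHg  {cases}/{men}  {proportion:.2f}")
```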

LOGISTIC REGRESSION PROBABILITY MODEL

$$p(X) = \frac{1}{1 + \exp(-\beta_0 - \beta_1 X)}$$

The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.
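
A short sketch of the logistic curve may help here. It assumes the fitted values quoted elsewhere in these slides (intercept -6.08, slope 0.0243 per mmHg); the function name `logistic_probability` is made up for the example.

```python
import math


def logistic_probability(x, beta0, beta1):
    """p(X) = 1 / (1 + exp(-beta0 - beta1 * X))."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x))


# Coefficients quoted in the slides for CHD versus SBP.
BETA0, BETA1 = -6.08, 0.0243

# The curve passes through p = 0.5 where the log odds are zero.
midpoint = -BETA0 / BETA1

for sbp in (100, 200, midpoint, 300):
    print(round(sbp, 1), round(logistic_probability(sbp, BETA0, BETA1), 3))
```

Whatever the value of X, the result stays strictly between 0 and 1, and with a positive slope it increases with X.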

LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP

[Figure: logistic curve of the probability of CHD (vertical axis, 0 to 1) against systolic blood pressure (horizontal axis, 0 to 300).]

$$\text{prob}(\text{CHD}) = \frac{1}{1 + \exp\{6.08 - 0.0243(\text{SBP})\}}$$

LOGISTIC MODEL: LOG ODDS

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

The log of the odds of the event is a linear function of X.

Log(odds of CHD) = - 6.08 + 0.0243(SBP)

ODDS

The odds of an event is the chance that the event occurs divided by the chance of its not occurring:

Odds = p/(1 - p) = p/q
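
The conversion between probability, odds, and log odds can be sketched in a few lines. The function names are assumptions made for this illustration.

```python
import math


def odds_from_probability(p):
    """Odds = p / (1 - p), for 0 <= p < 1."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Invert the odds: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


for p in (0.1, 0.5, 0.8):
    odds = odds_from_probability(p)
    print(p, round(odds, 3), round(math.log(odds), 3))
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0; probabilities above 0.5 give positive log odds and those below give negative log odds.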

$\beta_1$: KEY PARAMETER OF THE LOGISTIC MODEL

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

The parameter $\beta_1$ is like the slope of a linear regression model.

$\beta_1 = 0$ indicates that X has no effect on the probability, e.g., a man’s chance of CHD does not depend on his SBP.

$\beta_1$: KEY PARAMETER

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

The coefficient $\beta_1$ measures the amount of change in the log of the odds per unit change in X.

$\beta_1$: KEY PARAMETER

$$\log \text{odds}(X+1) = \beta_0 + \beta_1 (X+1) = \beta_0 + \beta_1 X + \beta_1$$

$$\log \text{odds}(X) = \beta_0 + \beta_1 X$$

Difference in log odds = $\beta_1$

E.g., the log of the odds of getting CHD increases by 0.0243 for an increase of 1 mmHg of systolic blood pressure.

(Hard to explain to a patient!)

THE COEFFICIENT $\beta_1$ AND THE ODDS RATIO

The difference in log odds given by $\beta_1$ translates into the odds ratio (OR).

$\exp(\beta_1)$ = OR = ratio of the odds at risk level X+1 to the odds at risk level X.

$\beta_1 = 0$ corresponds to OR = 1.

THE COEFFICIENT $\beta_1$ AND THE ODDS RATIO

For example, the odds of CHD are multiplied by the factor exp(0.0243) = 1.025 for every increase of 1 mmHg in SBP.

A difference of 10 mmHg multiplies the odds of CHD by exp(10 × 0.0243), or 1.275 (approximately (1.025)^10).
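
The odds-ratio arithmetic above can be checked with a short sketch. It assumes the coefficients quoted in the slides (-6.08 and 0.0243); the helper `odds` is a name made up for this example.

```python
import math

BETA0, BETA1 = -6.08, 0.0243  # coefficients quoted in the slides


def odds(sbp):
    """Odds of CHD at a given SBP: exp(beta0 + beta1 * SBP)."""
    return math.exp(BETA0 + BETA1 * sbp)


or_per_1_mmhg = math.exp(BETA1)        # odds ratio for +1 mmHg
or_per_10_mmhg = math.exp(10 * BETA1)  # odds ratio for +10 mmHg

print(round(or_per_1_mmhg, 3), round(or_per_10_mmhg, 3))

# The ratio of odds one unit apart is the same at any starting SBP.
print(round(odds(151) / odds(150), 6), round(odds(181) / odds(180), 6))
```

The 10 mmHg factor is computed as exp(10 × 0.0243), which is the same as raising exp(0.0243) to the tenth power before any rounding.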

ESTIMATION OF THE PARAMETERS

Technique:

Maximum likelihood estimation

For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient $\beta_1$.
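
One common way to carry out maximum likelihood estimation for this model is Newton-Raphson iteration; the slides do not say which algorithm was used, so the sketch below is an assumption. It fits only the ten rows visible in the data table, so its estimates will not match the slide's values, and with ten men the normal-approximation interval is purely illustrative. The names `fit_logistic` and `log_likelihood` are made up for the example.

```python
import math

# Visible rows of the data table only; the full data set is not shown.
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def log_likelihood(a, b, x, y):
    """Bernoulli log-likelihood when the log odds are a + b * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = a + b * xi
        # Numerically stable log(1 + exp(eta)).
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson with step halving. Returns (beta0, beta1, se_beta1)."""
    mean_x = sum(x) / len(x)
    z = [xi - mean_x for xi in x]  # centre X for better conditioning
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(a + b * zi))) for zi in z]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * zi for zi, yi, pi in zip(z, y, p))
        # Information matrix (negative Hessian).
        w = [pi * (1.0 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * zi for wi, zi in zip(w, z))
        i11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        if abs(da) < tol and abs(db) < tol:
            break
        # Halve the step until the log-likelihood does not decrease.
        step, current = 1.0, log_likelihood(a, b, z, y)
        while step > 1e-8 and log_likelihood(a + step * da, b + step * db, z, y) < current:
            step /= 2.0
        a, b = a + step * da, b + step * db
    se_b = math.sqrt(i00 / det)   # from the inverse information matrix
    return a - b * mean_x, b, se_b  # intercept back on the original SBP scale


beta0, beta1, se_beta1 = fit_logistic(sbp, chd)
ci = (beta1 - 1.96 * se_beta1, beta1 + 1.96 * se_beta1)  # normal approximation
print(beta0, beta1, se_beta1, ci)
```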

HYPOTHESIS TESTING

Ho: $\beta_1 = 0$

No difference in risk at different levels of the risk factor X.

No association between risk factor X and probability of occurrence.

HYPOTHESIS TESTING

Ha: $\beta_1$ =/= 0 or

$\beta_1 > 0$ (risk increases with X) or

$\beta_1 < 0$ (risk goes down as X increases)

HYPOTHESIS TESTING

Ho: OR = 1

Ha: OR =/= 1 or

OR > 1 (risk increases with X) or

OR < 1 (X is protective)

RESULTS OF LOGISTIC REGRESSION

The OR, with its confidence interval and p-value, indicates whether there is a significant association between the level of the risk factor and the chance of occurrence.

OR = 1.025 (1.015, 1.034), p < 0.001
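
A result of this form can be sketched as a large-sample normal (Wald) calculation on the log-odds scale. The standard error is not given in the slides; the value 0.0047 below is an assumption, chosen because it is roughly what the reported interval implies.

```python
import math

beta1 = 0.0243     # coefficient quoted in the slides
se_beta1 = 0.0047  # ASSUMPTION: not in the slides, back-calculated from the interval

# Test of Ho: beta1 = 0 (equivalently OR = 1) using the normal distribution.
z = beta1 / se_beta1
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

# Confidence limits are formed for beta1, then exponentiated to the OR scale.
odds_ratio = math.exp(beta1)
ci_low = math.exp(beta1 - 1.96 * se_beta1)
ci_high = math.exp(beta1 + 1.96 * se_beta1)

print(f"OR = {odds_ratio:.3f} ({ci_low:.3f}, {ci_high:.3f}), p = {p_value:.2g}")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the OR.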

RESULTS OF LOGISTIC REGRESSION

Can be used to predict an individual’s risk:

prob. of CHD when SBP = 180:

p/q = exp{-6.082 + 0.0243(180)} = exp(-1.708) = 0.181

Solve for p: p = 0.181/(1 + 0.181)

prob. of CHD = 0.153
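
The two steps of the prediction (compute the odds, then solve for p) can be sketched as follows, taking the coefficients exactly as printed in the expression above. The function name `predicted_probability` is an assumption made for this example.

```python
import math


def predicted_probability(sbp, beta0=-6.082, beta1=0.0243):
    """Return the predicted probability of CHD at a given SBP.

    Step 1: odds p/q = exp(beta0 + beta1 * SBP).
    Step 2: solve for p, which gives p = odds / (1 + odds).
    """
    odds = math.exp(beta0 + beta1 * sbp)
    return odds / (1.0 + odds)


print(round(predicted_probability(180), 3))
```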

MULTIVARIATE LOGISTIC REGRESSION

Model with additional risk factors:

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

Log(odds of CHD) = $\beta_0 + \beta_1(\text{SBP}) + \beta_2(\text{CHOL}) + \beta_3(\text{smoker})$
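
A model with several risk factors is evaluated the same way: add up the terms of the log odds, then convert to a probability. The slides give no fitted values for this model, so every coefficient below is hypothetical and chosen only to make the sketch run; coding smoker as 0 or 1 is also an assumption.

```python
import math

# HYPOTHETICAL coefficients, for illustration only (not from the slides).
coefficients = {"intercept": -7.0, "sbp": 0.02, "chol": 0.005, "smoker": 0.6}


def chd_probability(sbp, chol, smoker, coef=coefficients):
    """Probability of CHD from log odds = b0 + b1*SBP + b2*CHOL + b3*smoker."""
    log_odds = (
        coef["intercept"]
        + coef["sbp"] * sbp
        + coef["chol"] * chol
        + coef["smoker"] * (1 if smoker else 0)  # smoker coded 1, non-smoker 0
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


p_smoker = chd_probability(150, 220, smoker=True)
p_nonsmoker = chd_probability(150, 220, smoker=False)

# With SBP and CHOL held fixed, the odds ratio for smoking is exp(b3).
odds_ratio = (p_smoker / (1 - p_smoker)) / (p_nonsmoker / (1 - p_nonsmoker))
print(round(p_smoker, 3), round(p_nonsmoker, 3), round(odds_ratio, 3))
```

Each coefficient is read as before, as the change in log odds per unit of its own variable with the other variables held fixed.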
