Logistic Regression, Dr Mike Blyth, February 2006

Logistic regression (blyth 2006) (simplified)

Jan 24, 2015



MikeBlyth

An introduction to logistic regression for physicians, public health students and other health workers. Logistic regression is a way to look at the effect of a numeric independent variable on a binary (yes-no) dependent variable. For example, you can analyze or model the effect of birth weight on survival.
Transcript
Page 1: Logistic regression (blyth 2006) (simplified)

Logistic Regression

Dr Mike Blyth, February 2006


Logistic Regression

A way to look at the effect of
– a "numeric" (interval or ratio) independent variable
on
– a binary (yes-no) dependent variable


Review Linear Regression

Dependent variable is continuous interval or ratio (numeric).
Independent variables are also interval or ratio.
Examples:
– Effect of weight on blood pressure
– Effect of drug dose on reticulocyte count


Linear Regression

[Figure: dependent variable plotted against independent variable]


Logistic Regression

[Figure: dependent variable plotted against independent variable]


Logistic Regression

Dependent variable is a binary (yes/no) outcome.

Independent variables are continuous (interval or ratio).

Examples:
– Relation of weight and BP to 10-year risk of death
– Relation of CD4 count to 1-year risk of AIDS diagnosis


Why do we need it?

Could use categorical analysis such as a frequency table:

| | AIDS | No AIDS |
|---|---|---|
| CD4 > 350 | 80 | 20 |
| 150 < CD4 < 350 | 50 | 50 |
| CD4 < 150 | 20 | 80 |

• Problems

a) Some information is lost when we collapse the numeric data into categories. This leads to loss of power.

b) No estimate of the magnitude of the relation.


Odds

Probability:
– p = probability of the event
– 1 - p = probability of not having the event (also called q)
– p varies from 0 to 1

Odds:
– Ratio of the probability of the event to the probability of not having the event: odds = p/(1 - p)
– When p = 0.5, odds = 1 (or "1:1 odds")
– When p = 0.1, odds = 0.1/0.9 = 0.11
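As a quick check of the definition, the sketch below computes odds = p/(1 - p) in Python and applies it to the proportion with AIDS in each row of the CD4 frequency table above (each row totals 100). The function name `odds` is our own, chosen for illustration.

```python
def odds(p: float) -> float:
    """Return the odds p / (1 - p) for an event with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


# Proportion with AIDS in each row of the CD4 frequency table (out of 100 per row).
rows = {"CD4 > 350": 80 / 100, "150 < CD4 < 350": 50 / 100, "CD4 < 150": 20 / 100}
for label, p in rows.items():
    print(f"{label}: p = {p:.2f}, odds = {odds(p):.2f}")

print(f"p = 0.1: odds = {odds(0.1):.2f}")  # the slide's example, 0.1/0.9
```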


Log Odds

The log odds (also called "logit") is simply the natural logarithm of the odds:

$$\operatorname{logit} = \ln(\text{odds}) = \ln\left(\frac{p}{1-p}\right) = \ln(p) - \ln(1-p)$$

ln(1) = 0, so the logit is 0 when the odds are 1:1, i.e. when the probability is 50%.

The logit for an event of probability p is the negative of the logit for the probability of not having the event: logit(1 - p) = -logit(p).
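A minimal Python sketch of the logit, using natural logarithms as on the slide. It confirms that the logit is 0 at p = 0.5 and that logit(1 - p) = -logit(p). The function name `logit` is illustrative.

```python
import math


def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)); defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


print(logit(0.5))              # 0.0, because the odds are 1:1
print(logit(0.1), logit(0.9))  # same size, opposite sign
```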


Relation between probability p and logit

[Figure: S-shaped curve of probability p (0 to 1) against logit = ln[p/(1-p)] (-8 to 8)]
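Going the other way, the probability is recovered from the logit with the inverse (logistic) function p = 1/(1 + e^(-logit)), which traces the S-shaped curve in the figure. The sketch below tabulates it over the plotted range; the name `inv_logit` is our own.

```python
import math


def inv_logit(z: float) -> float:
    """Probability corresponding to log odds z: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Tabulate the curve over the plotted range of logits, -8 to 8.
for z in range(-8, 9, 2):
    print(f"logit = {z:+d}  ->  p = {inv_logit(z):.4f}")
```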


Logistic regression model

The linear regression model with one variable is

$$y = a + bx + e$$

The logistic regression model with one variable is

$$\operatorname{logit} = a + bx$$

where

$$\operatorname{logit} = \ln\left(\frac{p}{1-p}\right)$$

Unlike the linear model, there is no separate error term e: the random variation comes from whether each yes/no outcome occurs, given its probability p.


Logistic regression model

The logistic regression model with one variable is

$$\operatorname{logit} = a + bx, \quad \text{where } \operatorname{logit} = \ln\left(\frac{p}{1-p}\right)$$

In other words, the model says the log odds of the event happening are
– a constant (a)
– plus some other constant (b)
– times a numeric risk factor (x) (for example, systolic blood pressure, SBP)


Logistic regression model

Given the values of the independent variables, the regression equation predicts the log odds (logit) of the event.
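To make the prediction step concrete, the sketch below turns a value of x into a predicted log odds and then into a probability. It borrows the constant (-7.2014) and BP coefficient (0.0579) from the hypothetical Epi Info output shown later in this deck; the function name `predict` and the SBP values are illustrative assumptions.

```python
import math

# Constant and BP coefficient from the hypothetical Epi Info output (illustration only).
A = -7.2014
B = 0.0579


def predict(sbp, a=A, b=B):
    """Return (log odds, probability) predicted for a given SBP."""
    log_odds = a + b * sbp
    p = 1.0 / (1.0 + math.exp(-log_odds))  # convert log odds back to probability
    return log_odds, p


for sbp in (100, 120, 140, 160):
    z, p = predict(sbp)
    print(f"SBP {sbp}: log odds = {z:+.3f}, p = {p:.3f}")
```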


Logistic regression model

The statistics program calculates the coefficient b.

The coefficient b shows how much the log odds change with a one-unit change in the independent variable.

Positive b: higher risk with higher values

Negative b: lower risk with higher values
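The slides do not say how the statistics program finds b. A standard approach is maximum likelihood, and one common way to compute it is Newton-Raphson iteration; the NumPy sketch below applies that method to simulated data generated with assumed "true" values a = -3 and b = 0.05. It is not Epi Info's code, and `fit_logistic` is a hypothetical helper. The standard errors come from the inverse of the information matrix at the solution, which is the usual basis for the S.E. column in regression output.

```python
import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Hypothetical helper: maximum-likelihood fit of logit(p) = a + b*x by Newton-Raphson.

    Returns (coefficients [a, b], standard errors [se_a, se_b]).
    """
    X = np.column_stack([np.ones_like(x, dtype=float), x])  # intercept column + x
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # current predicted probabilities
        score = X.T @ (y - p)                          # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # information matrix (minus the Hessian)
        step = np.linalg.solve(info, score)            # Newton-Raphson update
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))        # Wald standard errors
    return beta, se


# Simulated data under assumed "true" values a = -3, b = 0.05 (hypothetical).
rng = np.random.default_rng(0)
x = rng.uniform(20, 100, size=5000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3.0 + 0.05 * x))))

(a_hat, b_hat), (se_a, se_b) = fit_logistic(x, y)
print(f"a = {a_hat:.3f} (SE {se_a:.3f}), b = {b_hat:.4f} (SE {se_b:.4f}), e^b = {np.exp(b_hat):.4f}")
```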


Logistic regression model

A hypothetical example examines the relation of BP to risk of stroke/death. The model predicts:

$$\ln(\text{odds}) = c + b \cdot \text{SBP}$$

$$e^{\ln(\text{odds})} = e^{c + b \cdot \text{SBP}}$$

$$\text{odds} = e^{c + b \cdot \text{SBP}} = e^{c} \cdot e^{b \cdot \text{SBP}}$$


Logistic regression model

The coefficient b shows how much the odds change with a change in the independent variable:

$$\text{odds} = e^{c} \cdot e^{bx}$$

In other words,

$$\text{odds} = \text{something} \cdot \left(e^{b}\right)^{x}$$


Logistic regression model

$$\text{odds} = \text{constant} \cdot \left(e^{b}\right)^{x}$$

So $e^{b}$ is the factor indicating the effect of x on the event.

Each one-unit change in x multiplies the odds by a factor of $e^{b}$; this factor is the odds ratio for a one-unit change in x.


Logistic regression model

$$\text{odds} = \text{constant} \cdot \left(e^{b}\right)^{x}$$

– Suppose b = 0.693, so $e^{b} \approx 2$: a one-unit change in x will double the odds.

– Suppose b = -0.693, so $e^{b} \approx 0.5$: a one-unit change in x will halve the odds.

– If b = 0, $e^{b} = 1$, and x has no effect on the odds.
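The multiplicative effect can be checked numerically: under logit = c + bx, the odds at x + 1 divided by the odds at x equal $e^{b}$, whatever the constant c. The sketch uses an arbitrary c = -1 and the three values of b from the slide; `odds_at` is an illustrative name.

```python
import math


def odds_at(x: float, c: float, b: float) -> float:
    """Odds implied by logit = c + b*x, written as e^c * (e^b)^x."""
    return math.exp(c) * math.exp(b) ** x


c = -1.0  # arbitrary illustrative constant; it cancels in the ratio
for b in (0.693, -0.693, 0.0):
    ratio = odds_at(3.0, c, b) / odds_at(2.0, c, b)  # effect of a one-unit change in x
    print(f"b = {b:+.3f}: e^b = {math.exp(b):.3f}, odds ratio for +1 unit = {ratio:.3f}")
```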


Logistic regression model

For the hypothetical example above, the report given by Epi Info is:

| Term | Odds Ratio | 95% CI | Coeff | S.E. | Z | P |
|---|---|---|---|---|---|---|
| BP | 1.0597 | 1.022 to 1.098 | 0.0579 | 0.0185 | 3.131 | 0.0017 |
| Const | * | * | -7.201 | 2.2994 | 3.131 | 0.0017 |


Logistic regression model

| Term | Odds Ratio | 95% CI | Coefficient | S.E. | Z | P-value |
|---|---|---|---|---|---|---|
| BP | 1.0597 | 1.022 to 1.098 | 0.0579 | 0.018 | 3.131 | 0.0017 |
| Constant | * | * | -7.2014 | 2.299 | 3.131 | 0.0017 |

Coefficient, or beta, or b, is the slope or magnitude of the effect.


Logistic regression model

| Term | Odds Ratio | 95% CI | Coefficient | S.E. | Z | P-value |
|---|---|---|---|---|---|---|
| BP | 1.0597 | 1.0220 to 1.0987 | 0.0579 | 0.0185 | 3.1319 | 0.0017 |
| Constant | * | * | -7.2014 | 2.2994 | 3.1319 | 0.0017 |

Odds ratio for a one-unit change in the independent variable (e.g. BP). This is the calculated $e^{b}$.

A one-unit (1 mmHg) change in BP multiplies the odds by 1.0597.
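The Odds Ratio column is e raised to the coefficient. Computing it from the displayed coefficient gives 1.0596; the small difference from Epi Info's 1.0597 comes from the coefficient being shown to only four decimal places.

```python
import math

coef_bp = 0.0579           # BP coefficient as displayed in the Epi Info output (rounded)
or_bp = math.exp(coef_bp)  # odds ratio for a one-unit (1 mmHg) increase in BP
print(f"e^{coef_bp} = {or_bp:.4f}")

# Going back: the natural log of the reported odds ratio recovers the coefficient.
print(f"ln(1.0597) = {math.log(1.0597):.4f}")
```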


Logistic regression model

| Term | Odds Ratio | 95% CI | Coeff | S.E. | Z | P-value |
|---|---|---|---|---|---|---|
| BP | 1.0597 | 1.022 to 1.098 | 0.0579 | 0.0185 | 3.1319 | 0.0017 |
| Constant | * | * | -7.2014 | 2.2994 | 3.1319 | 0.0017 |

95% confidence interval for that odds ratio.

The confidence interval does not include 1, so the effect is statistically significant (at the 5% level).
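The confidence interval and p-value can be reproduced approximately from the coefficient and its standard error, assuming the usual Wald (normal-approximation) method: 95% CI = exp(b ± 1.96·SE) and z = b/SE. Whether Epi Info uses exactly this method is an assumption here; the results agree with the table to about three decimal places.

```python
import math

coef, se = 0.0579, 0.0185  # BP row of the Epi Info output
z_crit = 1.96              # two-sided 95% critical value of the standard normal

lower = math.exp(coef - z_crit * se)
upper = math.exp(coef + z_crit * se)
z = coef / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value, 2 * (1 - Phi(|z|))

print(f"OR 95% CI: {lower:.4f} to {upper:.4f}")
print(f"Wald z = {z:.3f}, p = {p_value:.4f}")
print("CI excludes 1:", lower > 1 or upper < 1)
```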


Using more than one independent variable

Single variable:

$$\operatorname{logit} = c + bx$$

$$\text{odds} = c' \cdot \left(e^{b}\right)^{x}, \quad \text{where } c' = e^{c}$$

Multiple variables:

$$\operatorname{logit} = c + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

$$\text{odds} = c' \cdot \left(e^{b_1}\right)^{x_1} \cdot \left(e^{b_2}\right)^{x_2} \cdots \left(e^{b_n}\right)^{x_n}$$

Note that the terms multiply their effects on the odds.
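The statement that the terms multiply their effects on the odds is the identity e^(c + b1·x1 + ... + bn·xn) = e^c · (e^b1)^x1 ··· (e^bn)^xn. The sketch checks it numerically with made-up coefficients and covariate values (all hypothetical).

```python
import math


def odds_linear(c, bs, xs):
    """Odds from the log-odds form: e^(c + b1*x1 + ... + bn*xn)."""
    return math.exp(c + sum(b * x for b, x in zip(bs, xs)))


def odds_product(c, bs, xs):
    """Same odds written as c' * (e^b1)^x1 * ... * (e^bn)^xn, with c' = e^c."""
    result = math.exp(c)
    for b, x in zip(bs, xs):
        result *= math.exp(b) ** x  # each term multiplies the odds
    return result


# Hypothetical coefficients and covariate values, for illustration only.
c, bs, xs = -2.0, [0.4, -0.7, 0.05], [1, 0, 30]
print(odds_linear(c, bs, xs), odds_product(c, bs, xs))
```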


Using more than one independent variable

The analysis reports a b coefficient for each independent variable.

That coefficient is the effect of the given independent variable, separated from (adjusted for) the effects of all the other independent variables in the model.


Real Life Example

Prospective cohort study of causes of cardiac disease: Evans County Study, 1965

Independent variables = age, gender, race, social index, SBP, diabetes, smoking, cholesterol, and an obesity index

Dependent variable = risk of dying during a 10-year period


| Variable | Range | b coeff | SE | p |
|---|---|---|---|---|
| Constant | | -6.376 | 1.634 | <0.001 |
| Age | 40-69 y | 0.086 | 0.115 | <0.001 |
| Gender | 0=m, 1=f | 1.500 | 0.967 | 0.121 |
| Age x gender | | -0.043 | 0.017 | 0.011 |
| Social index | 20-84 | -0.056 | 0.040 | 0.160 |
| (Soc ind)² | 400-7056 | 0.0006 | 0.0003 | 0.082 |
| SBP | 88-310 | 0.019 | 0.002 | <0.001 |
| Diabetes | 0=n, 1=y | 1.123 | 0.261 | <0.001 |
| Smoking | 0=n, 1=y | 0.317 | 0.157 | 0.043 |
| Cholesterol | 94-546 | 0.0031 | 0.0015 | 0.041 |
| Quetelet index | 2.11-8.76 | -1.064 | 0.432 | 0.014 |
| (Quetelet index)² | 4.44-76.8 | 0.112 | 0.049 | 0.022 |

Cited in Kelsey et al., Methods in Observational Epidemiology, 1986



Statistical Significance

The p value indicates statistical significance.

Age is positively associated with risk of death.

Gender has a positive b coefficient, but the p value is 0.12, indicating that we cannot say that there is a significant relationship.

| Variable | Range | b coeff | SE | p |
|---|---|---|---|---|
| Age | 40-69 y | 0.086 | 0.115 | <0.001 |
| Gender | 0=m, 1=f | 1.500 | 0.967 | 0.121 |


Dichotomous (yes-no) variables

Gender is coded as 0 for male, 1 for female.

$e^{b}$ [$e^{1.5} = 4.48$] is the factor by which the odds change for a 1-unit change in gender, i.e. the OR for females relative to males.

$e^{b}$ for any dummy variable (coded 0-1) is the adjusted OR for that risk factor, since "1 unit of change" = presence vs. absence of the risk factor.

| Variable | Range | b coeff | SE | p |
|---|---|---|---|---|
| Constant | | -6.376 | 1.634 | <0.001 |
| Age | 40-69 y | 0.086 | 0.115 | <0.001 |
| Gender | 0=m, 1=f | 1.500 | 0.967 | 0.121 |
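Exponentiating the b coefficients of the 0/1 variables in the Evans County table gives their adjusted odds ratios, as sketched below. One caution: because the model also contains an age × gender interaction, e^1.5 is strictly the female-versus-male odds ratio only where that interaction term is zero (age 0, outside the data); the interaction is discussed under Interaction terms.

```python
import math

# b coefficients for the 0/1-coded variables in the Evans County table (rounded).
dummy_coefs = {"gender (female vs male)": 1.500, "diabetes": 1.123, "smoking": 0.317}

for name, b in dummy_coefs.items():
    # e^b = adjusted odds ratio for presence vs absence of the factor
    print(f"{name}: b = {b:.3f}, e^b = {math.exp(b):.2f}")
```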


Squared terms

Social index squared is included as well as social index itself.

Squared terms allow for curvilinear relationships, just as in ordinary regression.

| Variable | Range | b coeff | SE | p |
|---|---|---|---|---|
| Age x gender | | -0.043 | 0.017 | 0.011 |
| Social index | 20-84 | -0.056 | 0.040 | 0.160 |
| (Soc ind)² | 400-7056 | 0.0006 | 0.0003 | 0.082 |
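With both social index (s) and its square in the model, the contribution of s to the log odds is the parabola -0.056·s + 0.0006·s², so predicted risk first falls and then rises. The sketch evaluates it with the rounded table coefficients and finds the turning point at s = 0.056/(2 × 0.0006) ≈ 46.7. Given the squared term's p value of 0.082, this curve is suggestive rather than established; the names are illustrative.

```python
# Social index coefficients from the Evans County table (rounded).
B_LIN, B_SQ = -0.056, 0.0006


def social_index_logit(s: float) -> float:
    """Contribution of social index s to the log odds (linear + squared terms)."""
    return B_LIN * s + B_SQ * s ** 2


# The parabola bottoms out where its derivative, B_LIN + 2*B_SQ*s, is zero.
s_min = -B_LIN / (2 * B_SQ)
print(f"lowest predicted risk near social index {s_min:.1f}")
for s in (20, 40, s_min, 60, 84):
    print(f"s = {s:5.1f}: contribution to log odds = {social_index_logit(s):+.3f}")
```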


Interaction terms

Age and gender are entered into the model as separate terms.

Age x gender is included to see whether age has a different effect in males than in females.

| Variable | Range | b coeff | SE | p |
|---|---|---|---|---|
| Age | 40-69 y | 0.086 | 0.115 | <0.001 |
| Gender | 0=m, 1=f | 1.500 | 0.967 | 0.121 |
| Age x gender | M: 0-0; F: 40-69 | -0.043 | 0.017 | 0.011 |
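To see what the interaction coefficient means in practice: with other variables held equal, the log odds for a woman minus those for a man of the same age is 1.500 - 0.043 × age, and the per-year age effect is 0.086 in men but 0.086 - 0.043 in women. The sketch below computes the corresponding odds ratios from the rounded table coefficients; the constant and function names are illustrative.

```python
import math

# Evans County coefficients: age, gender (0 = male, 1 = female), age x gender (rounded).
B_AGE, B_GENDER, B_AGE_X_GENDER = 0.086, 1.500, -0.043


def female_vs_male_or(age: float) -> float:
    """Odds ratio, females vs males of the same age, other variables held fixed."""
    return math.exp(B_GENDER + B_AGE_X_GENDER * age)


# Odds ratio per extra year of age, separately by sex.
or_per_year_male = math.exp(B_AGE)
or_per_year_female = math.exp(B_AGE + B_AGE_X_GENDER)

print(f"per-year OR: males {or_per_year_male:.3f}, females {or_per_year_female:.3f}")
for age in (40, 55, 69):
    print(f"age {age}: female vs male OR = {female_vs_male_or(age):.2f}")
```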


Interpretation

With binary (dummy) variables, $e^{b}$ is the odds ratio. You can compare the strength (slope) of the effects by comparing b.

With numeric variables, b is not a direct measure of strength of effect.
– Example: b is quite small for the effect of BP on mortality, because it is the effect of only a one mmHg change in BP. BP is still an important factor in mortality because there is a wide range in BP.
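Because b is per unit, a more informative summary for SBP is the odds ratio for a clinically sized change, e^(b·Δ) = (e^b)^Δ. The sketch uses the Evans County SBP coefficient (0.019 per mmHg); the increments chosen are arbitrary.

```python
import math

b_sbp = 0.019  # Evans County SBP coefficient, per 1 mmHg (rounded)

for delta in (1, 10, 20, 50):
    # odds ratio for an increase of `delta` mmHg, other variables held fixed
    print(f"+{delta} mmHg SBP: odds multiplied by {math.exp(b_sbp * delta):.2f}")
```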


Interpretation

In a prospective cohort study we can use the logistic regression model to predict the probability of the event given the independent variables. We can also derive the relative risk.

In a cross-sectional study we only have the odds ratio.
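In a cohort study the fitted model gives predicted probabilities directly, and a relative risk is the ratio of two such probabilities. The sketch plugs the rounded Evans County coefficients into the model for a hypothetical 55-year-old man, with and without diabetes. The covariate values, and the assumption that each variable enters on the scale its range in the table suggests, are ours; this is an illustration, not a validated risk calculator. The odds ratio stays at e^1.123 ≈ 3.07, while the relative risk is smaller because the outcome is not rare at these risk levels.

```python
import math

# Evans County b coefficients (rounded, as shown in the table).
COEF = {
    "const": -6.376, "age": 0.086, "gender": 1.500, "age_x_gender": -0.043,
    "soc": -0.056, "soc_sq": 0.0006, "sbp": 0.019, "diabetes": 1.123,
    "smoking": 0.317, "chol": 0.0031, "quet": -1.064, "quet_sq": 0.112,
}


def risk(age, female, soc, sbp, diabetes, smoking, chol, quet):
    """Predicted 10-year probability of death from the fitted logistic model."""
    z = (COEF["const"] + COEF["age"] * age + COEF["gender"] * female
         + COEF["age_x_gender"] * age * female
         + COEF["soc"] * soc + COEF["soc_sq"] * soc ** 2
         + COEF["sbp"] * sbp + COEF["diabetes"] * diabetes
         + COEF["smoking"] * smoking + COEF["chol"] * chol
         + COEF["quet"] * quet + COEF["quet_sq"] * quet ** 2)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 55-year-old man; covariate values chosen only for illustration.
base = dict(age=55, female=0, soc=50, sbp=140, smoking=0, chol=220, quet=3.5)
p_diab = risk(diabetes=1, **base)
p_no_diab = risk(diabetes=0, **base)

print(f"risk with diabetes:    {p_diab:.3f}")
print(f"risk without diabetes: {p_no_diab:.3f}")
print(f"relative risk: {p_diab / p_no_diab:.2f}")
print(f"odds ratio:    {(p_diab / (1 - p_diab)) / (p_no_diab / (1 - p_no_diab)):.2f}")
```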


Selection of variables

Same principle as with ordinary regression:

Forward selection: add one variable at a time until there are no more that make a significant difference.

Backward selection: start with all variables, then remove one at a time to see whether each made a significant contribution.

Epi Info has suggestions on how to do this.