Assessing Binary Outcomes: Logistic Regression

Posted on 13-Feb-2016


Peter T. Donnan
Professor of Epidemiology and Biostatistics
Statistics for Health Research

Objectives of Session

• Understand what is meant by a binary outcome
• Understand how analyses of binary outcomes are implemented in the logistic regression model
• Understand when a logistic model is appropriate
• Be able to implement the model in SPSS and interpret logistic model output

Binary Outcome

Extremely common in health research:
• Dead / Alive
• Hospitalisation (Yes / No)
• Diagnosis of diabetes (Yes / No)
• Met target, e.g. total cholesterol < 5.0 mmol/l (Yes / No)

N.b. Any coding such as 1 / 2 can be used, but it is mathematically easier to use 0 / 1.

How is the relationship formulated?

For a linear model the simplest equation is:

$$ y_i = a + b x_i + e_i $$

where y is the outcome; a is the intercept; b is the slope related to x, the explanatory variable; and e is the error term or random 'noise'.

Can we fit y as a probability in the range 0 to 1?

$$ y_i = a + b x_i + e_i $$

Not quite! As a continuous variable, y can take any value from -∞ to +∞, whereas the outcome is a probability of an event, Π (or p), on a scale of 0 to 1. Certain transformations of p give the required scale. The probit is a normal transformation of p, but its results are not easy to interpret.

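We can now fit p as a probability in the range 0 to 1, with y in the range -∞ to +∞:

$$ y_i = \operatorname{logit}(p_i) = a + b x_i $$

The logit transformation works!

$$ \log\left(\frac{p_i}{1 - p_i}\right) = a + b x_i $$

Unlike the linear model there is no separate error term: the random variation comes from the binary outcome itself, which is 1 with probability p_i and 0 otherwise.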
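A minimal Python sketch of the logit and its inverse, showing how the logit stretches probabilities in (0, 1) onto the whole real line and how the inverse maps any value back. The function names `logit` and `inv_logit` are our own illustrative choices, not taken from SPSS or any particular library.

```python
import math


def logit(p):
    """Log odds of p: maps a probability in (0, 1) onto (-inf, +inf)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse logit: maps any real z back into (0, 1), up to float rounding."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return e / (1.0 + e)


for p in (0.001, 0.2, 0.5, 0.8, 0.999):
    z = logit(p)
    print(f"p = {p:<5}  logit(p) = {z:+.3f}  back-transformed = {inv_logit(z):.3f}")
```

Probabilities near 0 or 1 map to large negative or positive log odds and logit(0.5) = 0, so a linear predictor on the logit scale can never produce an impossible probability.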

Logistic Regression Model

$$ \log\left(\frac{p_i}{1 - p_i}\right) = a + b x_i $$

This has very useful properties. The term p/(1 - p) is called the 'odds' of an event; note that this is not the same as the probability of the event, p. If x is binary coded 0/1, the log odds are a + b in those coded 1 and a in those coded 0, so b is the difference in log odds and

exp(b) = ODDS RATIO

for the outcome in those coded 1 relative to those coded 0, e.g. the odds of death in men (1) vs. women (0).

Logistic Regression Model

Consider the LDL data. It has two binary outcomes:
1) LDL target achieved
2) Chol target achieved

For example, consider gender as a predictor: Male = 1 & Female = 2.

For a binary x we can express results as odds ratios (available in Crosstabs).

LDL target achieved
           No    Yes
Male      140    563
Female    149    531

Odds yes (male) = 563/140 = 4.02
Odds yes (female) = 531/149 = 3.56

Odds ratio = 3.56 / 4.02 = 0.886 (female cf. male)

N.b. Odds is different from probability: for men p = 563/(140 + 563) = 0.80, or 80%.
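The arithmetic above can be checked with a few lines of Python. This is only a sketch of the hand calculation; the helper names `odds` and `prob_from_odds` are our own.

```python
def odds(events, non_events):
    """Odds of an event = number with the event / number without it."""
    return events / non_events


def prob_from_odds(o):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return o / (1.0 + o)


# LDL target achieved (Yes / No) by gender, from the 2 x 2 table
odds_male = odds(563, 140)            # about 4.02
odds_female = odds(531, 149)          # about 3.56
or_f_vs_m = odds_female / odds_male   # about 0.886
or_m_vs_f = 1.0 / or_f_vs_m           # about 1.13
p_male = prob_from_odds(odds_male)    # 563 / 703, about 0.80
print(f"{odds_male:.2f} {odds_female:.2f} {or_f_vs_m:.3f} {or_m_vs_f:.2f} {p_male:.2f}")
```

Converting odds back to a probability with p = odds / (1 + odds) recovers 563/703 exactly, which makes the distinction between odds (4.02) and probability (0.80) concrete.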

Odds ratio from Crosstabs

Obtain odds ratios for 2 x 2 tables from Crosstabs and select the option 'Risk'.

Results from Crosstabs

Odds ratios for achieving LDL target in females vs. males

N.b. The OR given for female vs. male = 0.886

Fit Logistic Regression Model

• Dependent is the binary outcome: LDL target met (Yes = 1, No = 0)
• Independent: Gender (1 = M, 2 = F)
• Should get the same result as Crosstabs
• Select Analyze / Regression / Binary Logistic
• Select the option of 95% CI for exp(b)

Regression / Binary Logistic...

Odds ratio from logistic model results for a binary predictor

EXP(B) = odds ratio F vs. M. Note that the OR for men vs. women = 1/0.886 = 1.13.
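To see why the logistic model should give the same result as Crosstabs, here is a minimal Newton-Raphson fit written from scratch in plain Python. It is an illustrative sketch, not SPSS's own implementation. It assumes gender recoded as x = 0 (male), 1 (female); with the slide's 1/2 coding only the intercept a changes, because b still measures the difference in log odds for a one-unit step.

```python
import math


def fit_logistic_grouped(groups, iterations=25):
    """Fit logit(p) = a + b*x by Newton-Raphson on grouped binary data.

    groups: list of (x, events, total) tuples, one per covariate pattern.
    Returns the maximum-likelihood estimates (a, b).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix entries
        for x, events, total in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = events - total * p
            w = total * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += inverse(information) @ score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# LDL target achieved, recoded x = 0 (male), 1 (female)
ldl = [(0, 563, 563 + 140), (1, 531, 531 + 149)]
a, b = fit_logistic_grouped(ldl)
print(f"exp(b)   = {math.exp(b):.3f}")      # 0.886, OR female vs. male
print(f"1/exp(b) = {1 / math.exp(b):.3f}")  # 1.128, OR male vs. female
```

With a single binary predictor the model is saturated, so the fitted exp(b) equals the Crosstabs odds ratio exactly rather than approximately.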

Fit Logistic Regression Model: continuous predictor

• Dependent is the binary outcome: LDL target met
• Independent: continuous predictor, adherence
• B represents the change in the log odds for a 1 unit increase in adherence, so exp(B) is the ODDS RATIO for a 1 unit increase
• 10 x B represents the change in the log odds for a 10 unit increase in adherence, so exp(10 x B) is the ODDS RATIO for a 10 unit increase

Odds ratio from logistic model results for a continuous predictor

EXP(B) = odds ratio for a 1% increase in adherence. The OR for a 10% increase is exp(10 x 0.010) = 1.105, i.e. a 10.5% increase in the odds of meeting the LDL target for each 10% increase in adherence.
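A short sketch of rescaling a continuous coefficient, using the rounded B = 0.010 per 1% adherence implied by the slide. The function name `odds_ratio_per` is hypothetical.

```python
import math


def odds_ratio_per(b, delta):
    """Odds ratio for a `delta`-unit increase in a predictor with log-odds coefficient b."""
    return math.exp(delta * b)


b_adherence = 0.010  # B per 1% adherence, as rounded on the slide
or_1 = odds_ratio_per(b_adherence, 1)
or_10 = odds_ratio_per(b_adherence, 10)
print(f"OR per 1%:  {or_1:.3f}")   # 1.010
print(f"OR per 10%: {or_10:.3f}")  # 1.105
print(f"change in odds per 10%: {(or_10 - 1) * 100:.1f}%")  # 10.5%
```

Because the coefficient lives on the log-odds scale, exp(10 x B) equals exp(B) raised to the 10th power, not 10 x exp(B).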

Fit Logistic Regression Model: categorical predictor

• Dependent is the binary outcome: LDL target met
• Independent: APOE genotype (1 to 6)
• Choose a reference category; in this case the worst outcome is for genotype 6, so choose 6 to give ORs > 1
• exp(B) represents the OR for each category relative to the reference category
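Choosing a reference category amounts to indicator (dummy) coding: each non-reference genotype gets its own 0/1 column, and the reference category is the row of all zeros. The sketch below shows that idea in plain Python; `dummy_code` is a hypothetical helper, not the SPSS procedure itself.

```python
def dummy_code(values, categories, reference):
    """Indicator coding: one 0/1 column per non-reference category."""
    categories = list(categories)
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    levels = [c for c in categories if c != reference]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# APOE genotypes 1 to 6 with genotype 6 as the reference category
levels, rows = dummy_code([2, 6, 4, 1], categories=range(1, 7), reference=6)
print(levels)  # [1, 2, 3, 4, 5]
for row in rows:
    print(row)
```

Each column gets its own coefficient B, so exp(B) for the genotype 2 column is the odds ratio for APOE (2) vs. APOE (6).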

Regression / Binary Logistic... Choose Categorical

Odds ratios from logistic model results for a categorical predictor

EXP(B) = odds ratio for APOE (2) vs. APOE (6): OR = 4.381 (95% CI 1.742, 11.021)
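A sketch of how a 95% CI for exp(B) relates to B and its standard error, assuming the usual Wald interval exp(B ± 1.96 SE). The slide shows only the OR and CI, so B and SE below are back-calculated from them, not read from the SPSS output.

```python
import math


def or_with_wald_ci(b, se, z=1.96):
    """Odds ratio exp(b) with Wald limits exp(b - z*se) and exp(b + z*se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Back-calculated from OR = 4.381 (95% CI 1.742, 11.021)
b = math.log(4.381)
se = (math.log(11.021) - math.log(1.742)) / (2 * 1.96)
odds_ratio, lower, upper = or_with_wald_ci(b, se)
print(f"B = {b:.3f}, SE = {se:.3f}")
print(f"OR = {odds_ratio:.3f} (95% CI {lower:.3f}, {upper:.3f})")
```

The interval is symmetric around B on the log-odds scale, which is why it looks lopsided around 4.381 on the odds-ratio scale.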

Epidemiological Designs

• The logistic model is common in epidemiological research
• In case-control designs, cases are coded 1 and controls 0, and this is used as the dependent variable
• In a cohort study the outcome (e.g. death) is used as the binary outcome in the logistic model
• Note that in a cohort study exp(b) from a logistic model is still an odds ratio (OR), not a relative risk (RR); it approximates the RR only when the outcome is rare
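A small comparison of the relative risk and the odds ratio computed from the same 2 x 2 counts. It uses the LDL table (a common outcome) and a hypothetical rare outcome; the helper name is our own.

```python
def risk_ratio_and_odds_ratio(events1, total1, events0, total0):
    """Compare group 1 with group 0: returns (relative risk, odds ratio)."""
    risk1 = events1 / total1
    risk0 = events0 / total0
    rr = risk1 / risk0
    odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
    return rr, odds_ratio


# Common outcome: LDL target achieved, females (531/680) vs. males (563/703)
rr, or_ = risk_ratio_and_odds_ratio(531, 680, 563, 703)
print(f"common outcome: RR = {rr:.3f}, OR = {or_:.3f}")  # RR = 0.975, OR = 0.886

# Hypothetical rare outcome: 10/1000 vs. 5/1000
rr, or_ = risk_ratio_and_odds_ratio(10, 1000, 5, 1000)
print(f"rare outcome:   RR = {rr:.3f}, OR = {or_:.3f}")  # RR = 2.000, OR = 2.010
```

With about 80% achieving the LDL target the two measures differ noticeably, while for the rare outcome they almost coincide.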

Definition: Clinical Prediction Rule

• Clinical tool that quantifies the contribution of:
  – History
  – Examination
  – Diagnostic tests
• Stratifies patients according to their probability of having the target disorder
• Outcome can be in terms of diagnosis, prognosis, referral or treatment

Thresholds for decision making

(Figure: derived probability of disease on a scale from 0% to 100%, divided by a test/reassurance threshold and a diagnosis/test threshold into three zones: reassurance, further diagnostic testing, and treatment.)
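The threshold logic can be sketched as a function that assigns a derived probability to one of the three zones. The threshold values used here are hypothetical, since no numbers are given for them.

```python
def decision_zone(prob, test_threshold, treatment_threshold):
    """Map a derived probability of disease to its decision zone."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    if not test_threshold <= treatment_threshold:
        raise ValueError("test threshold must not exceed treatment threshold")
    if prob < test_threshold:
        return "reassurance"
    if prob < treatment_threshold:
        return "further diagnostic testing"
    return "treatment"


# Hypothetical thresholds: 10% (test/reassurance) and 60% (diagnosis/test)
for p in (0.05, 0.30, 0.85):
    print(p, decision_zone(p, test_threshold=0.10, treatment_threshold=0.60))
```

In practice the thresholds would come from clinical judgement about the costs of missed diagnoses, unnecessary tests and unnecessary treatment, not from the prediction model itself.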

Ottawa ankle rule

Risk Stratification: Kaiser-Permanente Pyramid

Identify high risk through 'risk stratification' and intervene through case management at highest risk

Framingham Risk Algorithm

• Prediction of risk: cardiovascular (Framingham)

55-yr-old woman: 15-20% 5-yr risk

Increasing appearance of "prediction models" in the literature (ISI Web of Knowledge v3)

Stages of development and assessment of a CPR

Step 1 Derivation (cross-sectional or cohort study): identification of factors with predictive power

Step 2 Validation (cross-sectional or cohort study): evidence of reproducible accuracy. Application of the rule in similar clinical settings and population, or better still in multiple clinical settings and different populations with varying prevalence and outcomes of disease

Step 3 Impact Analysis (randomized controlled trial): evidence that the rule changes physician behaviour and improves patient outcomes and/or reduces costs

How to derive a CPR?

1. Toss a coin to make a decision?
2. Individual opinion and experience?
3. Huddle of wise ones: Delphi technique to reach consensus?
4. Statistical prediction models!

Regression Models for prediction

• In all of these models we combine a set of factors: usually between 2 and 20 predictors; Occam's razor suggests smaller is better
• Fit a multiple regression model
• Extract probabilities of outcome or diagnosis
• Create CPR

Regression Models for prediction

• Linear if outcome continuous
• Binary outcomes: logistic regression model; survival models (Cox PH, Weibull, log-logistic, etc.)
• Ordinal or nominal outcomes: ordinal logistic regression


Statistical prediction Models

Logistic regression model:

$$ \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$

p = probability of the event; the factors (x) increase or decrease the risk of this event.

Derivation of probability of events

Logistic regression model:

$$ \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$

Call the linear predictor X, a linear function of the predictors x_1, x_2, x_3, etc.:

$$ X = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$

Derivation of probability of events

Then:

$$ \log\left(\frac{p}{1 - p}\right) = X $$

Take exp of both sides:

$$ \frac{p}{1 - p} = \exp(X) $$

Derivation of probability of events

Then rearrange: since p = (1 - p)exp(X), we have p(1 + exp(X)) = exp(X), so

$$ p = \frac{\exp(X)}{1 + \exp(X)} $$

Or, dividing top and bottom by exp(X):

$$ p = \frac{1}{1 + \exp(-X)} $$
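A sketch of extracting a predicted probability for one patient: form the linear predictor X, then apply p = 1/(1 + exp(-X)). The coefficients and patient values are hypothetical and for illustration only; they are not the PEONY model.

```python
import math


def linear_predictor(beta0, betas, xs):
    """X = beta0 + beta1*x1 + beta2*x2 + ..."""
    if len(betas) != len(xs):
        raise ValueError("need one coefficient per predictor")
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def probability(X):
    """p = 1 / (1 + exp(-X)), algebraically the same as exp(X) / (1 + exp(X))."""
    return 1.0 / (1.0 + math.exp(-X))


# Hypothetical model: intercept -3.0, 0.04 per year of age,
# 0.7 for a prior admission (coded 0/1)
X = linear_predictor(-3.0, [0.04, 0.7], [50, 1])  # -3.0 + 2.0 + 0.7 = -0.3
p = probability(X)
print(f"X = {X:.2f}, p = {p:.3f}")
```

Applying the logit to p returns X again, so the derivation can be checked numerically in both directions.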

Example: PEONY model to predict risk of emergency admission to hospital over the next year

• Now implemented in NHS Tayside as part of Virtual Wards management of long-term conditions (LTC)
• PEONY II model developed: watch this space!
• Donnan et al. Arch Intern Med 2008

Risk Stratification based on derived probabilities

Other binary models

The logistic model is only applicable when the length of follow-up is the same for each individual, e.g. 5-yr follow-up of a cohort. For binary outcomes where censoring occurs, i.e. people leave the cohort through death or migration, the length of follow-up varies and survival models such as the Cox Proportional Hazards model are needed.

Summary

• Logistic model easily fitted in SPSS
• Clear link with ODDS RATIOS
• Common model for case-control and cohort studies, as well as development of clinical prediction models

General References

• Campbell MJ, Machin D. Medical Statistics: A Commonsense Approach. 3rd ed. Wiley, New York, 1999.

• Hosmer DW, Lemeshow S. Applied Logistic Regression. John Wiley & Sons, New Jersey, 2000.

• Altman DG. Practical Statistics for Medical Research. Chapman and Hall, London, 1991.

• Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Blackwell Scientific, Oxford, 1994.

• Agresti A. An Introduction to Categorical Data Analysis. Wiley, New York, 1996.

Practical: Fit Multiple Logistic Regression Model

• Dependent is the binary outcome: LDL target met (Yes = 1, No = 0)
• Independent: Gender (1 = M, 2 = F); add APOE, adherence, etc.
• Remember: select Analyze / Regression / Binary Logistic
• Select the option of 95% CI for exp(b)

3) Screening for variables to eliminate

• Consider screening procedures to eliminate a number of the variables under consideration
• Test each variable separately
• If p > 0.3, a variable would have to be a very strong confounder to become significant on adjustment in a multiple regression, so it could be discarded
• Hosmer-Lemeshow criteria
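A sketch of the univariable screening step, assuming each candidate has already been fitted on its own and that its p-value comes from the Wald test z = B/SE (a likelihood-ratio test is a common alternative). The variable names and estimates are hypothetical, not results from the LDL data.

```python
import math


def wald_p_value(b, se):
    """Two-sided p-value for z = b / se under the normal approximation."""
    z = abs(b / se)
    return math.erfc(z / math.sqrt(2.0))


def screen(univariable, cutoff=0.3):
    """Split candidates into kept (p <= cutoff) and dropped (p > cutoff)."""
    kept, dropped = [], []
    for name, (b, se) in univariable.items():
        target = kept if wald_p_value(b, se) <= cutoff else dropped
        target.append(name)
    return kept, dropped


# Hypothetical univariable estimates (B, SE) for three candidate predictors
candidates = {"age": (0.030, 0.010), "smoker": (0.10, 0.40), "bmi": (0.05, 0.04)}
kept, dropped = screen(candidates)
print("kept:", kept)        # ['age', 'bmi']
print("dropped:", dropped)  # ['smoker']
```

The generous cutoff deliberately keeps weak predictors such as `bmi` (p about 0.21), because they may still matter once the other variables are adjusted for.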

4) A mixture of automatic procedures and self selection

• Use automatic procedures as a guide
• Compare stepwise and backward elimination
• Think about what factors are important
• Add 'important' factors
• Do not blindly follow statistical significance

Remember Occam's Razor

'Entia non sunt multiplicanda praeter necessitatem'
'Entities must not be multiplied beyond necessity'

William of Ockham, 14th-century friar and logician, 1288-1347
