Introduction to Propensity Score Analysis

Chencan Zhu

Biostatistician, Biostatistical Consulting Core


Outline:

Randomized controlled trial (RCT)

Observational studies & issues

Methods of accounting for confounding

Propensity score methods

Sensitivity analysis

Example



Randomized Controlled Trial


Definition: A randomized trial is “a trial having a

parallel treatment design in which treatment

assignment for persons (treatment units) enrolled

is determined by a randomization process”.

RCT

Ref: https://library.downstate.edu/EBM2/2200.htm


Each subject receives either control or treatment:

T=0 vs. T=1

Each subject has a pair of potential outcomes:

• Y(0): outcome under control

• Y(1): outcome under treatment

We only observe Y, the outcome under the actual

control/treatment received.



What is an RCT estimating?

Under randomization we have that:

$E[Y(1) - Y(0)] = E[Y \mid T = 1] - E[Y \mid T = 0]$.

Therefore, in RCTs one can obtain an

unbiased estimate of the average effect of

the treatment, at the population level.
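The identity above can be checked with a small simulation. This is a minimal sketch, not part of the original lecture: the sample size, the normal outcome distribution, the true effect of 2.0, and the function name `simulate_rct` are all illustrative assumptions.

```python
import random
from statistics import mean


def simulate_rct(n=20000, true_effect=2.0, seed=0):
    """Simulate a two-arm RCT and return the difference in observed means.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)   # potential outcome under control, Y(0)
        y1 = y0 + true_effect      # potential outcome under treatment, Y(1)
        t = rng.random() < 0.5     # randomized assignment, independent of Y(0), Y(1)
        # Only the potential outcome for the arm actually received is observed.
        if t:
            treated.append(y1)
        else:
            control.append(y0)
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print(f"difference in means: {simulate_rct():.3f} (true effect: 2.0)")
```

Because assignment does not depend on the potential outcomes, the difference in observed means should land close to the true average effect.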



Estimation of treatment effect: outcomes can be

compared directly between treatment arms.

• Continuous outcomes:

o Difference in means

• Count outcomes:

o Relative risks

• Time-to-event outcomes:

o Unadjusted survival curves

o Median survival time

ESTIMATION OF TREATMENT EFFECT


The absolute risk reduction (ARR) is the

reduction in the probability of the outcome

due to treatment.

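$\mathrm{ARR} = p_C - p_T$

where $p_C$ and $p_T$ are the probabilities of the outcome in the control and treated groups.

The number needed to treat (NNT) is the number of subjects that one must treat to avoid one outcome.

$\mathrm{NNT} = 1 / \mathrm{ARR}$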


FOR BINARY OUTCOMES


The relative risk is defined by $\mathrm{RR} = p_T / p_C$.

The relative risk reduction is defined similarly:

$\mathrm{RRR} = (p_C - p_T) / p_C$

These two measures convey information

about the relative reduction in the

probability of the outcome due to the

treatment or exposure.




The odds of the outcome for treated subjects are defined as $p_T / (1 - p_T)$.

The odds of an event are the probability of the event occurring divided by the probability of the event not occurring.

The odds ratio is defined as: $\mathrm{OR} = \dfrac{p_T / (1 - p_T)}{p_C / (1 - p_C)}$

*Clinically important questions are best addressed using relative risks, risk differences, and NNT (Sinclair and Bracken, J Clin Epidemiol).
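The measures for binary outcomes defined above can be computed together. The sketch below is illustrative only: the function name `binary_effect_measures` and the example probabilities (20% in the control group, 15% in the treated group) are assumptions, not values from the lecture.

```python
def binary_effect_measures(p_t, p_c):
    """Effect measures for a binary outcome.

    p_t: probability of the outcome in the treated group
    p_c: probability of the outcome in the control group
    """
    arr = p_c - p_t  # absolute risk reduction
    return {
        "ARR": arr,
        "NNT": 1.0 / arr if arr != 0 else float("inf"),  # number needed to treat
        "RR": p_t / p_c,                                 # relative risk
        "RRR": (p_c - p_t) / p_c,                        # relative risk reduction
        "OR": (p_t / (1 - p_t)) / (p_c / (1 - p_c)),     # odds ratio
    }


if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    for name, value in binary_effect_measures(p_t=0.15, p_c=0.20).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical inputs the absolute risk reduction is 0.05, so about 20 subjects would need to be treated to avoid one outcome, while the relative risk is 0.75.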




Randomization may not be feasible for

several reasons:

• It is unethical to withhold treatment

considered the standard of care.

• The exposure is believed to be harmful.

• Participants have strong attachments to

specific treatments.

RANDOMIZATION MAY NOT BE FEASIBLE


Observational Study & Issues


An observational study is an empirical

investigation in which the objective is to

elucidate cause-and-effect relationships…

[in which] it is not feasible to use controlled

experimentation, in the sense of being able

to impose the procedures or treatments

whose effects it is desired to discover, or to

assign subjects at random to different

procedures (WG Cochran).

OBSERVATIONAL STUDIES


Consequences of absence of randomization:

• Treatment selection is influenced by subject (patient) characteristics.

o Treated subjects often differ systematically from untreated subjects.

• Outcomes cannot be directly compared between treated and untreated subjects.

o Treatment is confounded with subject characteristics.
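To see why a direct comparison fails without randomization, consider the following sketch. It is an illustrative simulation under assumed numbers: a single binary characteristic `x` that raises both the chance of treatment (0.8 versus 0.2) and the outcome (by 2.0), with a true treatment effect of 1.0. None of these values come from the lecture.

```python
import random
from statistics import mean


def naive_difference(n=20000, true_effect=1.0, seed=0):
    """Unadjusted difference in means when treatment selection depends on x."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0   # baseline characteristic
        p_treat = 0.8 if x == 1 else 0.2     # treatment selection depends on x
        t = 1 if rng.random() < p_treat else 0
        # x also affects the outcome, so it confounds the comparison.
        y = true_effect * t + 2.0 * x + rng.gauss(0.0, 1.0)
        if t:
            treated.append(y)
        else:
            control.append(y)
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print(f"naive difference: {naive_difference():.3f} (true effect: 1.0)")
```

Under these assumptions the naive difference is about 1 + 2 × (0.8 − 0.2) = 2.2 in expectation, more than twice the true effect, because treated subjects are far more likely to have x = 1.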



Issues in designing non-randomized studies:

• Selection of patients

• Defining baseline time

• Accounting for confounding



Accounting for Confounding


Accounting for confounding in observational

studies:

• Analysis: Regression adjustment

• Design: Stratification/Matching

ACCOUNTING FOR CONFOUNDING


Regression is frequently used to estimate the

adjusted effect of exposure on outcomes in

observational studies.

• Linear regression: adjusted difference in

means

• Logistic regression: adjusted odds ratios

• Cox regression: adjusted hazard ratios

REGRESSION ADJUSTMENT


Regression limitations:

• Insufficient covariate overlap between treatment

groups.

• Difficult to assess whether confounding has been adequately removed.

• The outcome is always in sight while the model is being specified.

• Limited adjustment with rare outcomes.

• Only suggests correlation, not a causal relationship.



Using the Propensity Score to

Design and Analyze

Observational Studies


Definition: The probability of treatment

assignment conditional on observed

baseline covariates

$e(X) = \Pr(T = 1 \mid X)$

In RCTs, the true propensity score is known

from the study design.

In observational studies, the propensity score

must be estimated using the sample data.
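As a minimal illustration of estimating the propensity score from sample data, the sketch below handles the simplest case: a single categorical covariate, where the treated fraction at each level is a direct estimate of Pr(T = 1 | X). This is an assumption made to keep the example dependency-free; with several covariates a model such as logistic regression is the usual choice. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def estimate_propensity(covariates, treatments):
    """Estimate e(x) = Pr(T = 1 | X = x) as the treated fraction at each level of x.

    covariates: one categorical value per subject
    treatments: 1 for treated, 0 for untreated
    """
    n_total = defaultdict(int)
    n_treated = defaultdict(int)
    for x, t in zip(covariates, treatments):
        n_total[x] += 1
        n_treated[x] += t
    score_by_level = {x: n_treated[x] / n_total[x] for x in n_total}
    return [score_by_level[x] for x in covariates]


if __name__ == "__main__":
    # Toy data: level "A" is mostly treated, level "B" mostly untreated.
    x = ["A", "A", "A", "A", "B", "B", "B", "B"]
    t = [1, 1, 1, 0, 1, 0, 0, 0]
    print(estimate_propensity(x, t))
```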

PROPENSITY SCORE


[Figure: density of propensity scores (horizontal axis from 0 to 1) for the control group and for the treatment group; the range where the two densities overlap is the region of common support.]
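One simple way to locate a region of common support is the min-max rule sketched below. This rule is an assumption chosen for illustration (other conventions exist, and the lecture does not specify one); the function name and the example scores are hypothetical.

```python
def common_support(ps_treated, ps_control):
    """Return (low, high) bounds of the overlap of two sets of propensity scores.

    Uses the min-max rule: the overlap runs from the larger of the two minima
    to the smaller of the two maxima. Returns None if the ranges do not overlap.
    """
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return (low, high) if low <= high else None


if __name__ == "__main__":
    # Hypothetical propensity scores.
    print(common_support([0.3, 0.5, 0.9], [0.1, 0.4, 0.7]))
```

Subjects whose scores fall outside the returned interval have no comparable subjects in the other group.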


Four methods of using the propensity score for

estimating treatment effects:

• Propensity score matching

• Stratification on the propensity score

• Inverse probability of treatment weighting using

the propensity score (IPTW)

• Regression adjustment using the propensity

score

PS METHODS


Propensity score matching:

• Creates matched sets of treated and untreated

subjects with similar values of the propensity

score.

• 1:1 pair matching is the most common

implementation.

• Outcomes can be compared directly between

treated and untreated subjects in the matched

sample.
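A 1:1 pair match can be sketched as a greedy nearest-neighbor search on the propensity score. The version below is illustrative only: matching without replacement, processing treated subjects in input order, and a caliper of 0.05 on the raw score are all assumptions (calipers are often defined on the logit of the score instead), and `greedy_match` is a hypothetical name.

```python
def greedy_match(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a list of (treated_index, control_index) pairs. A treated subject
    stays unmatched if no remaining control lies within the caliper.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, e_t in enumerate(ps_treated):
        if not available:
            break
        # Closest still-unmatched control; ties go to the lowest index.
        j = min(available, key=lambda k: (abs(ps_control[k] - e_t), k))
        if abs(ps_control[j] - e_t) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


if __name__ == "__main__":
    # Hypothetical scores; the third treated subject has no control within the caliper.
    print(greedy_match([0.30, 0.62, 0.90], [0.10, 0.33, 0.60, 0.61]))
```

The unmatched treated subject in this toy example shows why matching can shrink the analysis sample.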

PS MATCHING


PS MATCHING

Ref: https://www.summitllc.us/propensity-score-matching


PS MATCHING

Ref: https://stats.stackexchange.com/questions/300622/how-to-assess-for-balance-of-propensity-score-matching-covariates-in-stata


PS MATCHING

Ref: SAS/STAT 14.2 User’s Guide, The PSMATCH Procedure


Stratification on the propensity score:

• Subjects are divided into strata based on the

rank-ordered propensity score.

• Outcomes are compared between treated and

untreated subjects within each PS stratum.

o Each stratum can be seen as a mini ‘quasi-RCT’.

• An overall treatment effect is pooled across

strata.

o Similar to a meta-analysis of ‘quasi-RCTs’.
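The stratify-then-pool procedure can be sketched as follows. This is a minimal illustration under stated assumptions: equal-size strata formed from the rank-ordered score, a difference in means within each stratum, and pooling weights proportional to stratum size. The function name and the toy data are hypothetical.

```python
from statistics import mean


def stratified_effect(ps, treatments, outcomes, n_strata=5):
    """Pool stratum-specific differences in means, weighting by stratum size."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])  # rank-order by score
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        y_t = [outcomes[i] for i in idx if treatments[i] == 1]
        y_c = [outcomes[i] for i in idx if treatments[i] == 0]
        if not y_t or not y_c:
            continue  # a stratum lacking one group cannot contribute a comparison
        total += len(idx) * (mean(y_t) - mean(y_c))
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and untreated subjects")
    return total / used


if __name__ == "__main__":
    # Toy data: outcomes rise with the score, and treatment adds 1 within each stratum.
    ps = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
    t = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(stratified_effect(ps, t, y))
```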

PS STRATIFICATION


PS STRATIFICATION

Ref: http://www.basug.org/downloads/2011q3/Scott.pdf


Inverse probability of treatment weighting (IPTW):

• Subjects are weighted by the inverse probability of

the treatment received:

o $w = \dfrac{T}{e} + \dfrac{1 - T}{1 - e}$

• In this synthetic, weighted dataset, the confounding

between observed baseline covariates and treatment

has been eliminated.

• Outcomes can be compared directly between treated

and untreated subjects in this weighted sample.

o Variance estimation must account for the sample weights.
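The weighting step can be sketched in a few lines. The example is illustrative: it uses weighted means normalized within each group and a toy dataset with known propensity scores, and it does not attempt the variance estimation mentioned above. The function names and data are hypothetical.

```python
def iptw_weights(treatments, ps):
    """w = T/e + (1 - T)/(1 - e): inverse probability of the treatment received."""
    return [t / e + (1 - t) / (1 - e) for t, e in zip(treatments, ps)]


def iptw_difference(treatments, ps, outcomes):
    """Difference in weighted outcome means between treated and untreated subjects."""
    w = iptw_weights(treatments, ps)
    num_t = sum(wi * y for wi, t, y in zip(w, treatments, outcomes) if t == 1)
    den_t = sum(wi for wi, t in zip(w, treatments) if t == 1)
    num_c = sum(wi * y for wi, t, y in zip(w, treatments, outcomes) if t == 0)
    den_c = sum(wi for wi, t in zip(w, treatments) if t == 0)
    return num_t / den_t - num_c / den_c


if __name__ == "__main__":
    # Toy data: two covariate levels with e = 0.75 and e = 0.25;
    # treatment adds 1 to the outcome within each level.
    t = [1, 1, 1, 0, 1, 0, 0, 0]
    e = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]
    y = [3, 3, 3, 2, 1, 0, 0, 0]
    print(iptw_difference(t, e, y))
```

In this toy dataset the unweighted difference in means is 2.0, while the weighted difference recovers the within-level effect of 1.0.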

IPTW


Regression adjustment using the propensity score:

Proposed by Rosenbaum and Rubin (1983) for use

with linear models. The outcome is regressed on:

• An indicator for treatment

• The propensity score.
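Written out for a continuous outcome, such a model might take the following form. The coefficient symbols and the error term are notation introduced here for illustration; they do not appear in the original slides.

```latex
Y_i = \beta_0 + \beta_1 T_i + \beta_2 \, e(X_i) + \varepsilon_i
```

In this form, $\beta_1$ would be read as the treatment effect adjusted for the propensity score, which is only valid if the relationship between the outcome and $e(X)$ is correctly specified.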

PS REGRESSION ADJUSTMENT


Comparison of different propensity score methods:

• PS matching, stratification, and IPTW use design to remove confounding: treatment assignment is independent of measured baseline covariates in the matched sample, in the weighted sample, or within each stratum.

• These three methods remove confounding without reference to the outcome, which separates design from analysis, as in an RCT.

COMPARISON


• PS matching removes a greater degree of the

systematic differences between groups than

does stratification on the PS.

• PS matching results in a diminished sample size

compared to PS stratification.

• PS weighting and PS matching remove

approximately equivalent amounts of imbalance.



Limitations of PS regression adjustment:

• Assumes that the outcome regression model is

correctly specified.

• Loses the ability to mimic the design of an RCT.

• More difficult to estimate clinically meaningful

measures of treatment effect (risk difference,

relative risk, NNT).

• May include treated subjects for whom there are

no comparable untreated subjects (and vice

versa).




PS METHODS

Ref: SAS/STAT 14.2 User’s Guide, The PSMATCH Procedure


Summary of steps in a propensity score analysis:

1. Estimate the propensity score

2. Balance assessment

3. Estimate treatment effect

4. Sensitivity analysis



Sensitivity Analysis


Sensitivity analysis for PS studies:

There are many modeling decisions that can affect

the results, including

• Specification of propensity score equation

• What variables to include

• How many interactions to include and at what level

• Matching method and caliper to use

Must test sensitivity of results to these decisions

• If results are not robust to these changes, this should

raise a question mark about their reliability

SENSITIVITY ANALYSIS


Sensitivity analysis for PS studies:

Propensity score methods assume that treatment assignment and prognosis are conditionally independent given the observed covariates.

• Assume that there are no unmeasured variables that influence treatment assignment.

Methods have been proposed to assess the

robustness of results to this assumption.



Framework for sensitivity analysis:

Cornfield et al. conducted the first formal sensitivity

analysis in an observational study (JNCI 1959).

They examined whether the association between

smoking and lung cancer was causal, or whether

the relationship was due to unmeasured

differences between smokers and non-smokers.



Rosenbaum and Rubin have proposed sensitivity

analysis for observational studies based on the

framework of Cornfield.

• It assumes that there is an unmeasured (possibly binary) covariate that is associated with treatment assignment.

For specific values of $\gamma$, one can compute the range of possible p-values for the association between exposure and outcome under the following model:

• $\log\left(\pi_j / (1 - \pi_j)\right) = \kappa(X_j) + \gamma U_j$

• $0 \le U_j \le 1$

• $\pi_j$ is the probability of treatment selection
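For a concrete sense of how such p-value ranges arise, the sketch below computes bounds for one simple case: a one-sided sign test on matched pairs. This choice of test is an assumption; the slides do not say which statistic underlies their examples. Here `gamma_odds` stands for exp(γ), so an unmeasured covariate that raises the odds of treatment by 5% corresponds to `gamma_odds = 1.05`. The pair counts in the usage example are hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sign_test_pvalue_bounds(n_discordant, n_favoring_treatment, gamma_odds):
    """Bounds on the one-sided sign-test p-value for matched pairs.

    gamma_odds is exp(gamma): the largest factor by which an unmeasured
    covariate may multiply the odds of treatment within a matched pair.
    Within a discordant pair, the chance that the treated subject has the
    better outcome then lies between 1/(1+gamma_odds) and
    gamma_odds/(1+gamma_odds) under the null hypothesis.
    """
    p_low = 1 / (1 + gamma_odds)
    p_high = gamma_odds / (1 + gamma_odds)
    lower = binomial_upper_tail(n_discordant, n_favoring_treatment, p_low)
    upper = binomial_upper_tail(n_discordant, n_favoring_treatment, p_high)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 100 discordant pairs, 65 favoring treatment.
    for pct in (0, 5, 10, 15, 20, 25):
        lo, hi = sign_test_pvalue_bounds(100, 65, 1 + pct / 100)
        print(f"{pct:>3}%  lower={lo:.4f}  upper={hi:.4f}")
```

With no hidden bias (0%) the two bounds coincide at the usual p-value, and the interval widens as the assumed increase in the odds of treatment grows.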



Example 1:

Even if there is an unmeasured variable that increases the odds of treatment by 25%, the upper bound of the p-value is still below 0.05 (0.0219). Thus the comparison results are robust.


Increase in the odds of treatment    p-value lower bound    p-value upper bound
0%                                   0.0025                 0.0025
5%                                   0.0014                 0.0041
10%                                  0.0008                 0.0067
15%                                  0.0005                 0.0103
20%                                  0.0003                 0.0152
25%                                  0.0002                 0.0219


Example 2:

If there is an unmeasured variable that increases the odds of treatment by 15%, the upper bound of the p-value rises above 0.05 (0.0618), which means the treatment difference can no longer be considered significant.


Increase in the odds of treatment    p-value lower bound    p-value upper bound
0%                                   0.0101                 0.0101
5%                                   0.0048                 0.0200
10%                                  0.0022                 0.0364
15%                                  0.0010                 0.0618
20%                                  0.0005                 0.0982
25%                                  0.0002                 0.1475


Example


Data set: Bariatric surgery in SPARCS during 2009-2011 with 2-year pre-operative and 4-year post-operative records

Study objective: To compare clinical outcomes, i.e.

post-operative yearly hospital visit, yearly cumulative

length of stay (LOS), between Roux-en-Y gastric bypass

(RYGB) and sleeve gastrectomy (LSG) patients

Treatment: LSG (N=1121, 16.72%) vs RYGB (N=5584,

83.28%)

Outcome: post-operative yearly hospital visit (binary),

post-operative yearly cumulative LOS (continuous)

EXAMPLE

Ref: C. Zhu, J. Yang, D. Spaniolas, S. Wu, “A practical guide of propensity score analysis for longitudinal observational study”, Poster Presentation, CSP 2019, New Orleans, LA, Feb 2019.


Baseline characteristics: Patients’ demographics (gender, age, race, region, insurance), 28 comorbidities, and pre-operative information (1st/2nd-year cumulative LOS, 1st/2nd-year number of ED visits). The characteristic with the largest standardized differences before PS matching (region) is shown below.


Variable (level)       Total (N=6,705)   RYGB (N=5,584)   LSG (N=1,121)   Standardized difference (original sample)   Standardized difference (matched sample)
Region: West           1176 (17.54%)     1104 (19.77%)    72 (6.42%)      0.404                                       0.011
Region: Mid/North      2222 (33.14%)     2148 (38.47%)    74 (6.60%)      0.825                                       0.030
Region: close to NYC   521 (7.77%)       373 (6.68%)      148 (13.20%)    0.219                                       0.003
Region: NYC area       2107 (31.42%)     1414 (25.32%)    693 (61.82%)    0.792                                       0.013
Region: Long Island    679 (10.13%)      545 (9.76%)      134 (11.95%)    0.071                                       0.035
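The slides do not state how the standardized differences were computed. The sketch below uses a commonly cited definition for a binary variable, which is an assumption here; applied to the reported RYGB and LSG percentages it appears to reproduce the original-sample column to within rounding. The function name is hypothetical.

```python
from math import sqrt


def standardized_difference(p1, p2):
    """Absolute standardized difference between two proportions.

    d = |p1 - p2| / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    """
    pooled_sd = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return abs(p1 - p2) / pooled_sd


if __name__ == "__main__":
    # Proportions taken from the table above (RYGB vs. LSG, original sample).
    rows = {
        "Region: West": (0.1977, 0.0642),
        "Region: Mid/North": (0.3847, 0.0660),
        "Region: NYC area": (0.2532, 0.6182),
    }
    for level, (p_rygb, p_lsg) in rows.items():
        print(f"{level}: {standardized_difference(p_rygb, p_lsg):.3f}")
```

The matched-sample column cannot be checked this way because the matched-sample proportions are not reported.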


Methods:

• Regular regression

• PS matching (1:1)

• PS stratification

o ATE/ATT used to average treatment effects in each stratum

• PS regression adjustment

o Ver.1: outcome regressed on treatment and propensity score

o Ver.2: outcome regressed on treatment, propensity score and other covariates

o Ver.3: outcome regressed on treatment and propensity score quintile (treated as a categorical variable)

• IPTW



References:

• Workshop handout by Dr. Peter Austin, 2010

• https://library.downstate.edu/EBM2/2200.htm

• https://stats.stackexchange.com/questions/300622/how-to-assess-for-balance-of-propensity-score-matching-covariates-in-stata

• https://www.summitllc.us/propensity-score-matching

• http://www.basug.org/downloads/2011q3/Scott.pdf

• SAS/STAT 14.2 User’s Guide, The PSMATCH Procedure

• Elze, M.C. et al. J Am Coll Cardiol. 2017;69(3):345-57

• C. Zhu, J. Yang, D. Spaniolas, S. Wu, “A practical guide of propensity score analysis for longitudinal observational study”, Poster Presentation, CSP 2019, New Orleans, LA, Feb 2019



Please check our website for future lectures:

https://osa.stonybrookmedicine.edu/research-core-facilities/bcc/education

Next lecture:

6/19, Wednesday, noon-1pm,

Introduction to regression models



Thank you!
