Top Banner
Predicting Hospital Readmission using TreeNet™ Robert Aronoff MD
31
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Predicting Hospital Readmission Using TreeNet

Predicting Hospital Readmission using TreeNet™

Robert Aronoff MD

Page 2: Predicting Hospital Readmission Using TreeNet

The vision …..

Capture data entered as

part of routine clinical

workflow

EMR

Clinical Workflow

Automated E-T-L

processes

Machine learning

algorithms for target class

prediction

Predictive Model

Creation

Vendor ‘neutral’ scoring

tools:

Intranet based

JSON serialization

Decision Support

at Point of Care

Streamlined sequence of processes

Page 3: Predicting Hospital Readmission Using TreeNet

Agenda / Table of contents

Readmission after Heart Failure

Data Structure of an Electronic Medical Record

TreeNet™ Modeling with our Dataset

Lessons Learned and Next Step(s)

1

2

3

4

Page 4: Predicting Hospital Readmission Using TreeNet

PREFACE:

Page 5: Predicting Hospital Readmission Using TreeNet

Data Modeling Paradigm

© Salford Systems, 2011

Page 6: Predicting Hospital Readmission Using TreeNet

Model Accuracy

Feature Selection

Feature Fit

Target Class

Completeness of Set

Kattan MW, EuroUro. 2011; (59): 566-567

Page 7: Predicting Hospital Readmission Using TreeNet

Model Error: The Bias – Variance Decomposition

Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning :

Data mining, inference, and prediction. New York, NY: Springer; 2009.

Prediction Error = Irreducible Error + Bias² + Variance

Page 8: Predicting Hospital Readmission Using TreeNet

Model caveats

Association does not prove causality

Models are retrospective (observational)

and therefore hypothesis generating (i.e.

not hypothesis proving)

Page 9: Predicting Hospital Readmission Using TreeNet

READMISSION AFTER HEART FAILURE

Page 10: Predicting Hospital Readmission Using TreeNet

Congestive Heart Failure

Common cause for admission.

Readmission in excess of 23%.

Risk factors for readmission extensively studied.

Published reviews cite over 120 studies.

- Methods: Logistic regression; Cox proportional hazard

- C-statistic in 0.6-0.7 range

Reduction of readmission has been declared a national goal.

Improved risk models have the potential to more effectively deploy

targeted disease management.

Bueno, H. et al.

JAMA 2010;303:2141-2147

Page 11: Predicting Hospital Readmission Using TreeNet

EMR data structure

Data collected for clinical workflow.

Large volume

- Multiple observations; repeated measures

- Many interactions and interdependencies

Complex dataset

- Continuous, Ordinal, Nominal (low and high order), Binary

- ‘High-order variable-dimension nominal variables’

Missing data:

- May represent error or practice patterns

Unbalanced classes

Outliers and Entry errors

Page 12: Predicting Hospital Readmission Using TreeNet

Preliminary Dataset

Administrative candidate predictors

•Admission source, status, service

•Age, gender, race

•Primary/secondary payers

•Primary/secondary diagnoses (names

and condition categories)

•Total length of stay, ICU length of stay

•Hospital costs and charges

•Discharge status and disposition

•All-cause same-center admission in

preceding year

Clinical candidate predictors

•Specialty medical services consulted

•Specialty ancillary services consulted

•Blood laboratory values

•Medications name / therapeutic class

•Dosages of medications

•Patient weights during hospitalization

•Transfusions during hospitalization

•Nursing assessments

•Education topics

•Diagnostic tests ordered

•Ordersets utilized

- 1612 consecutive heart failure discharges abstracted

- 1280 candidate predictors screened

- Target class: Readmission at 30 days ( binary )

Preliminary Unpublished Data

Page 13: Predicting Hospital Readmission Using TreeNet

Benefits of Stochastic Gradient Boosting

Does not require data transformation

Handles large numbers of categorical and continuous variables

Has mechanisms for:

- Feature selection

- Managing missing values

- Assessing the relationship of predictors to target

Robust to:

- Data entry errors, Outliers, Target misclassification

Friedman JH. Stochastic gradient boosting. Computational Statistics and Data Analysis 2002; 38(4):367- 378.

Input and processing

High model accuracy

Classification and regression

Non-parametric application of logistic , L1, L2, or Huber-M loss function

Output

Page 14: Predicting Hospital Readmission Using TreeNet

TreeNet™ Modeling with our Dataset

Parameters of ‘feature fit’

Parameters of ‘feature selection’

Elements of insight

Putting it all together

1

2

3

4

Page 15: Predicting Hospital Readmission Using TreeNet

TreeNet – parameters of ‘feature fit’

Do not forget the manual …..

Page 16: Predicting Hospital Readmission Using TreeNet

Feature selection – variable importance

Variable Importance

Calculation

Squared relative improvement

is the sum of squared

improvements (in squared

error risk) over all internal

nodes for which the variable

was chosen as the splitting

variable

Measurements are relative

and is customary to express

largest at 100 and scale other

variables appropriately

Page 17: Predicting Hospital Readmission Using TreeNet

Insight into the model

Illuminating the ‘black box’ with partial dependence

Preliminary Unpublished Data

Page 18: Predicting Hospital Readmission Using TreeNet

Approach to feature selection

Start with a subset

based on univariate

significance (i.e. P-

value below a given

level) or variance

above a given

threshold

Domain Neutral

Know your data

Univariate stats

Application of Variable

Importance

Screening with

batteries

Forward and backward

stepwise progression

Both

Use all potential

predictors

Use knowledge of

target and predictors to

make decisions on

inclusion (or rejection)

of predictor

Domain Centric

Domain ‘Neutral’ vs. Domain ‘Centric’

Page 19: Predicting Hospital Readmission Using TreeNet

Model Variability

Variation

The model is fit via sampling (i.e.

stochastic) process.

Accuracy / Precision

S.E.M.= S.D. / sqrt ( N )

Precision (95%) ≈ 4 * S.E.M.

S.D. = 0.03

Establishing AUC precision and accuracy

N trials 10 30 300

S.E.M. .0095 .0055 .0017

Precision (95%) .038 .022 .007

Page 20: Predicting Hospital Readmission Using TreeNet

Precision and predictor selection

AUC estimated using CV-10 ( = 10 trees) SEM .0095 and

precision (95%) of .038

Repeating CV-10 (using CVR battery) 30 times SEM .0017 and

precision (95%) of .007

Profound implication on dimensionality of model achievable without

domain knowledge input.

0.50

0.55

0.60

0.65

0.70

0.75

Avg

. R

OC

Test ROC

STEP_66 (737) (0.703)

Min = 0.5057

Median = 0.6738

Mean = 0.6500

Max = 0.7034

STEP_1 (197) (0.531)

Page 21: Predicting Hospital Readmission Using TreeNet

How much of a change in AUC is clinically relevant ?

Gain Curve complements ROC curve

Preliminary Unpublished

Data

Page 22: Predicting Hospital Readmission Using TreeNet

Useful batteries for feature selection

STEPWISE

SHAVING

Testing set to CV-10

Select predictor 1-2 at

a time

Confirm with CVR

battery

Methods of forward and backward selection

Page 23: Predicting Hospital Readmission Using TreeNet

BUILDING A MODEL

Step 1 Step 2 Step 3 Step 4 Step 5

Run model with all candidate

predictors. Select N highest

important predictors.

N= 2-3 x final size

This is a multi-step process

Use backward and forward

selection to reduce

preliminary model to a core

5 -15 predictors.**

Re-examine discarded

predictors in smaller

groups. Use backward

and forward selection.**

Run batteries to assess

parameters of feature ‘fit’.

Assess model (AUC)

variability. Repeat as

needed through process.

Review predictors and use domain

knowledge to eliminate redundant

(dependent) predictors and consider

predictors of known value. **

**Each change confirmed with

CVR (30 reps). Review partial

dependence plot.

Page 24: Predicting Hospital Readmission Using TreeNet

Initial runs

6 Nodes

2 Nodes

Information content and irreducible error

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

Cro

ss E

ntr

op

y

Number of Trees

Train

Test

0.576

0.577

#880 (0.518) #880 (0.518) (0.457)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

Cro

ss E

ntr

op

y

Number of Trees

Train

Test

0.576

0.576

#287 (0.519) #287 (0.519) (0.436)

Preliminary Unpublished Data

Page 25: Predicting Hospital Readmission Using TreeNet

Sample Model

GAIN CURVE ROC CURVE

FEATURE SELECTION SET

MODEL TRAINING SET

Preliminary Unpublished Data

Page 26: Predicting Hospital Readmission Using TreeNet

Sample partial dependence plots

The value of non-parametric regression

Admissions within prior year ICU Days

Anion Gap Initial Systolic BP

Final BNP BUN-Creatinine Ratio

Preliminary Unpublished Data

Page 27: Predicting Hospital Readmission Using TreeNet

Prospective application

Additional heart failure discharges can be scored against the model

GAIN curve ROC curve

Causes for performance shift

Overfitting in the original model

Concomitant intervention programs are altering

patient risk of readmission

Preliminary Unpublished

Data

Page 28: Predicting Hospital Readmission Using TreeNet

Non-influential candidate predictors

Diagnoses and QualNet Condition Categories

Medications and Therapeutic Categories

Diagnostic Tests

Ordersets Submitted

Models favor continuous over binary ‘dummy’ variables

Preliminary Unpublished Data

Page 29: Predicting Hospital Readmission Using TreeNet

Lessons learned

TreeNet ( stochastic gradient boosting) is extremely well suited to

data structure of EMR data.

Insight in to dataset is a rich feature (in and above prediction

performance).

Model performance variance is important in feature selection.

- Consequence of limited information content in our dataset.

Batteries are useful.

- PARTITION – Variability assessment

- CVR – Model assessment

- STEPWISE – Forward selection

- SHAVING – Backward selection

There is great value in learning on a non-trivial dataset within a

familiar domain.

Page 30: Predicting Hospital Readmission Using TreeNet

Next steps ……

Explore options to manage model variability

and increase dimensionality of predictor set.

Extend analysis of predictor interactions.

Develop mechanism of ‘point-of-care’ patient

scoring.

Apply techniques to new problems and

dataset.

Page 31: Predicting Hospital Readmission Using TreeNet

Any Questions?

[email protected]