A SAS Macro to Compute Added Predictive Ability of New Markers in Logistic Regression
Kevin Kennedy, MS, Saint Luke's Hospital, Kansas City, MO
Kansas City Area SAS User Group Meeting, September 3, 2009
Acknowledgment
• A special thanks to Michael Pencina (PhD, Biostatistics Dept, Boston University, Harvard Clinical Research Institute) for his valuable input, ideas, and additional formulas to strengthen output.
Motivation
• Predicting dichotomous outcomes is important
– Will a patient develop a disease?
– Will an applicant default on a loan?
– Will KSU win a game in the NCAA tournament?
• Improving an already usable model should be a continued goal
Motivation
• Many published models exist
– Risk of bleeding after PCI
• Predicts a patient's risk of bleeding after a PCI (Percutaneous Coronary Intervention) based on 10 patient characteristics: age, gender, shock, prior intervention, kidney function, etc.
• Example: a 78-year-old female with prior shock and congestive heart failure has a ~8% chance of bleeding after the procedure (national average = 2-3%)
– Why important? We can treat those at high risk appropriately
Mehta et al. Circ Intervention, June 2009
However…..
• These models aren't etched in stone
• New markers (variables) should be investigated to improve model performance
• Important question: how do we determine whether to add these new markers to the model?
Project Goal
• Compare:
• How much does the new variable add to model performance?
• Output of interest: predicted probability of the event (one for each model), computed with the logistic regression model
model 1: logit(y) = β₀ + β₁var₁ + … + βₙvarₙ
model 2: logit(y) = β₀ + β₁var₁ + … + βₙvarₙ + βₙ₊₁varₙ₊₁

with var₁, …, varₙ the existing covariates and varₙ₊₁ the new marker
Outline
• Traditional comparisons
– Receiver Operating Characteristic curve
• New measures
– IDI, NRI, Vickers decision curve
• SAS macro to obtain output
Traditional Approach-AUCs
• Common to plot the Receiver Operating Characteristic (ROC) curve and report the area underneath it (AUC), or c-statistic
• Measures model discrimination
• Equivalent to the probability that the predicted risk is higher for an event than for a non-event [4]
AUC/ROC Background
• The ROC/AUC plot depicts the trade-off between benefit (true positives) and cost (false positives) by defining a "cut-off" that classifies individuals as positive or negative
[Figure: overlapping distributions of predicted probabilities for non-events and events. A vertical cut-off splits each distribution into true negatives, false negatives, false positives, and true positives; at the illustrated cut-off, the true positive rate (sensitivity) = .9 and the false positive rate (1 − specificity) = .4. A second panel shows the resulting ROC curve: true positive rate (0%-100%) plotted against false positive rate (0%-100%).]
[Figure: AUC scale running from .5 (no discrimination) through .65, ~.7, and ~.9 up to 1 (perfect discrimination), annotated informally "Ugh", "Ehh", and "Yippy!" as the AUC increases.]
AUC Computation
• All possible pairs are made between events and non-events
– A dataset with 100 events and 1000 non-events would have 100*1000 = 100,000 pairs
– If the predicted probability is higher for the subject actually experiencing the event, give the pair a '1' (concordant); otherwise a '0' (discordant)
– The c-statistic is the average of the 1's and 0's (.5 for ties)
• Now use the methods of DeLong to compare the AUCs of model 1 and model 2
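The pairwise computation above can be sketched in a few lines of Python (an illustration, not part of the macro):

```python
def c_statistic(p_events, p_nonevents):
    """Average over all event/non-event pairs: 1 if the event's
    predicted probability is higher (concordant), 0 if lower
    (discordant), and 0.5 for ties."""
    total = 0.0
    for pe in p_events:
        for pn in p_nonevents:
            if pe > pn:
                total += 1.0
            elif pe == pn:
                total += 0.5
    return total / (len(p_events) * len(p_nonevents))

# 2 events and 3 non-events -> 6 pairs: 5 concordant, 1 discordant
events = [0.9, 0.6]
nonevents = [0.8, 0.5, 0.5]
print(c_statistic(events, nonevents))  # 5/6 = 0.8333...
```

The O(#events × #non-events) double loop mirrors the "all possible pairs" description; production code would use a rank-based shortcut instead.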
Advantages
• Used all the time
• Recommended guidelines exist for excellent/good/poor discrimination
• Default output in PROC LOGISTIC, with new extensions in version 9.2 (ROC and ROCCONTRAST statements), along with other computing packages
Disadvantages
• Rank based
– A comparison of .51 to .50 is treated the same as .7 to .1
• Doesn't imply a useful model
– Example: all events have probability .51 and all non-events have probability .50: perfect discrimination (c = 1) but not useful
• Extremely hard to find markers that result in a high AUC
– Pepe [5] claims an odds ratio of 3 doesn't yield good discrimination
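The "perfect but useless" case above is easy to verify numerically (an illustration, not from the talk): every event at p = .51 and every non-event at p = .50 yields c = 1 even though the probabilities are nearly identical.

```python
def c_statistic(p_events, p_nonevents):
    # Pairwise (rank-based) c-statistic: 1 concordant, 0.5 tie, 0 discordant
    pairs = [(pe, pn) for pe in p_events for pn in p_nonevents]
    score = sum(1.0 if pe > pn else 0.5 if pe == pn else 0.0
                for pe, pn in pairs)
    return score / len(pairs)

# 100 events at .51, 1000 non-events at .50: every pair is concordant
print(c_statistic([0.51] * 100, [0.50] * 1000))  # 1.0
```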
Alternatives
• Pencina and D'Agostino in 2008 (Statistics in Medicine) suggest 2 additional statistics:
– IDI (Integrated Discrimination Improvement)
– NRI (Net Reclassification Improvement)
• Vickers [2,3] developed graphical techniques for comparing models
IDI
• Measure of improved sensitivity without sacrificing specificity
• The formula measures how much 'p' increases for events and how much it decreases for non-events
Absolute_IDI = (p_event_2 − p_event_1) + (p_nonevent_1 − p_nonevent_2)

Relative_IDI = (p_event_2 − p_nonevent_2) / (p_event_1 − p_nonevent_1) − 1

where p_event_2 is the mean predicted probability of the events in the 2nd model, and similarly for the other terms
IDI
• Mean 'p' for events increased by 5% (.20 to .25); mean 'p' for non-events decreased by 3% (.15 to .12)
• Absolute_IDI = (.25 − .2) + (.15 − .12) = .08
• Relative_IDI = (.25 − .12) / (.2 − .15) − 1 = 1.6, a 160% relative improvement
NRI
• Measures how well the new model reclassifies events and non-events
• Dependent on how we decide to classify observations
– Example: 0-10% low, 10-20% moderate, >20% high
• Question: do the patients experiencing an event go up in risk (moderate to high between models 1 and 2)?
• Do patients not experiencing an event go down in risk (moderate to low)?
NRI
• Formula:

NRI = (#Events moving up − #Events moving down) / #Events
    + (#Non-events moving down − #Non-events moving up) / #Non-events
NRI Computation Example
• 3 groups defined
– <10% (low)
– 10-20% (moderate)
– >20% (high)
• Each individual will have a group for models 1 and 2
• 2 cross-tabulation tables (events and non-events)
NRI Computation Example
• Crosstab 1: Events (100 events); rows = model 1 group, columns = model 2 group

Model 1 \ Model 2   Low   Mod   High
Low                  10     8      2
Mod                   3    30     10
High                  2     5     30

• 20 events moving up, 10 events moving down, 70 events not moving
• Net of 10/100 (10%) of events reclassified correctly
NRI Computation Example
• Crosstab 2: Non-events (200 non-events); rows = model 1 group, columns = model 2 group

Model 1 \ Model 2   Low   Mod   High
Low                  50     5      0
Mod                  20    40     10
High                  5    10     60

• 15 non-events moving up, 35 non-events moving down, 150 non-events not moving
• Net of 20/200 (10%) of non-events reclassified correctly

NRI = 20/100 − 10/100 + 35/200 − 15/200 = .2
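The NRI arithmetic from the two crosstabs can be reproduced in a few lines (an illustration, not part of the macro; the tables use the slide's numbers with rows = model 1 group, columns = model 2 group, ordered low to high):

```python
def net_movement(table):
    """(# moving up - # moving down) / total for one 3x3 reclassification table.
    Above the diagonal (j > i) is an upward move, below (j < i) a downward move."""
    up = sum(table[i][j] for i in range(3) for j in range(3) if j > i)
    down = sum(table[i][j] for i in range(3) for j in range(3) if j < i)
    total = sum(sum(row) for row in table)
    return (up - down) / total

events = [[10, 8, 2], [3, 30, 10], [2, 5, 30]]       # 100 events
nonevents = [[50, 5, 0], [20, 40, 10], [5, 10, 60]]  # 200 non-events

# Events should move up; non-events should move down
nri = net_movement(events) - net_movement(nonevents)
print(nri)  # 0.1 - (-0.1) = 0.2
```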
NRI Caveats
• Dependent on groups
– Would we reach similar conclusions with groups <15, 15-30, >30?
• Alternative ways to define groups
– Any up/down movement: a change in p from .15 to .151 would be an 'up' movement
– A threshold: e.g., a change of 3% would constitute an up/down, i.e., .33 to .37 would be an 'up' but .33 to .34 would be no movement
• Good news: the macro handles all these cases, and you can request all at once
Vickers Decision Curve
• A graphical comparison of models 1 and 2 based on 'net' benefit (first attributed to Peirce, 1884)
• Useful if a threshold is important
• Example: if a person's predicted probability of an outcome is greater than 10%, we treat with strategy A
– Here we'd want to compare the models at this threshold
NetBenefit = (True Positives)/n − (False Positives)/n × pt/(1 − pt)

where pt is the threshold probability
Vickers Decision Curve
• Example:
– n = 1000, dichotomous event, 10% as threshold

Predicted Probability \ Outcome   Yes   No
≥10%                               80   200
<10%                               20   700

• True positive count = 80; false positive count = 200

NetBenefit = 80/1000 − 200/1000 × .1/(1 − .1) = .0578

• 5.78 net true-positive results per 100 compared with treating all as negative
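The net-benefit calculation above is simple to reproduce (an illustration using the slide's numbers, not part of the macro):

```python
def net_benefit(true_pos, false_pos, n, pt):
    """Net benefit at threshold probability pt: true-positive rate minus
    the false-positive rate weighted by the odds of the threshold, pt/(1-pt)."""
    return true_pos / n - false_pos / n * (pt / (1 - pt))

# Slide example: n=1000, 80 true positives, 200 false positives, 10% threshold
nb = net_benefit(true_pos=80, false_pos=200, n=1000, pt=0.10)
print(round(nb, 4))  # 0.0578
```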
Vickers Decision Curve
[Figure: decision curve plotting net benefit (−0.01 to 0.06) against threshold probability (0%-50%) for Treat All, Model 1, and Model 2. At low thresholds there is no difference between models 1 and 2 and treat-all; at higher thresholds Model 2 seems to outperform Model 1.]
SAS Macro to Obtain Output
• %added_pred(data= , id= , y= , model1cov= , model2cov= , nripoints=ALL, nriallcut=%str(), vickersplot=FALSE, vickerpoints=%str());
• model1cov = initial model
• model2cov = new model
• nripoints = NRI levels (e.g., for <10, 10-20, >20 insert nripoints=.1 .2)
• nriallcut = if you want to test an amount of increase or decrease instead of levels, i.e., to know whether a person increases/decreases by 5%, insert nriallcut=.05
• vickerpoints = thresholds to test (e.g., 10%)
AUC section of Macro
• AUC comparisons are easy with SAS 9.2

PROC LOGISTIC DATA=&DATA;
  MODEL &y=&model1cov &model2cov;
  ROC 'FIRST' &model1cov;
  ROC 'SECOND' &model2cov;
  ROCCONTRAST REFERENCE='FIRST';
  ODS OUTPUT ROCASSOCIATION=ROCASS ROCCONTRASTESTIMATE=ROCDIFF;
RUN;

• If working from an earlier version, the %roc macro will be called from the SAS website [6]
AUC Section of Macro
• Sample output
• %added_pred(data=data, id=id, y=event, model1cov=x1 x2 x3 x4, model2cov=x1 x2 x3 x4 x5, …….);

Model1 AUC   Model2 AUC   Difference in AUC   Std Error for Difference   P-value for Difference   95% CI for Difference
0.77907      0.79375      0.0147              0.0042                     0.0005                   (0.00646, 0.0229)
IDI Section of Macro
• Use the PROC LOGISTIC output dataset for models 1 and 2

PROC LOGISTIC DATA=&DATA;
  MODEL &y=&model1cov;
  OUTPUT OUT=OLD PRED=P_OLD;
RUN;
PROC LOGISTIC DATA=&DATA;
  MODEL &y=&model2cov;
  OUTPUT OUT=NEW PRED=P_NEW;
RUN;
proc sql noprint;
  create table probs as
  select *, (p_new-p_old) as pdiff
  from old(keep=&id &y &model1cov &model2cov p_old) as a
    join new(keep=&id &y &model1cov &model2cov p_new) as b
    on a.&id=b.&id
  order by &y;
quit;

• Now use PROC MEANS or SQL to obtain: p_event_new, p_event_old, p_nonevent_new, p_nonevent_old
IDI Section of Macro
• Sample output

IDI     IDI Std Err   Z-value   P-value   95% CI             Probability change for events   Probability change for non-events   Relative IDI
.0207   .0064         6.3       <.0001    (0.0143, 0.0272)   .0186                           −.002125                            .167
NRI Section of Macro
• In 3 parts:
– All groups (any up/down movement)
– User defined (e.g., <10, 10-20, >20)
– Threshold (e.g., a change of 3%)
• Coding is more involved here, containing:
– Counts for the number of groups
– Do-loops for the various thresholds and user groups
NRI Section of Macro
User-defined groups (e.g., <10, 10-20, >20: nripoints=.1 .2)

%if &nripoints^=ALL %then %do;
  %let numgroups=%eval(%words(&nripoints)+1); /*figure out how many ordinal groups*/
  data nriprobs;
    set probs;
    /*define first ordinal group for pre and post*/
    if 0<=p_old<=%scan(&nripoints,1,' ') then group_pre=1;
    if 0<=p_new<=%scan(&nripoints,1,' ') then group_post=1;
    %let i=1;
    %do %until(&i>%eval(&numgroups-1));
      if %scan(&nripoints,&i,' ')<p_old then do; group_pre=&i+1; end;
      if %scan(&nripoints,&i,' ')<p_new then do; group_post=&i+1; end;
      %let i=%eval(&i+1);
    %end;
    if &y=0 then do;
      if group_post>group_pre then up_nonevent=1;
      if group_post<group_pre then down_nonevent=1;
    end;
    if &y=1 then do;
      if group_post>group_pre then up_event=1;
      if group_post<group_pre then down_event=1;
    end;
    if up_nonevent=. then up_nonevent=0;
    if down_nonevent=. then down_nonevent=0;
    if up_event=. then up_event=0;
    if down_event=. then down_event=0;
  run;
NRI Section of Macro
• Sample output
• %added_pred(data=,…..,nripoints=.1 .2, nriallcut=.03,…..);

Group     NRI    Std Err   Z-value   P-value   95% CI             % of events correctly reclassified   % of non-events correctly reclassified
ALL       .454   .09       9.7       <.0001    (0.3649, 0.5447)   (10%)                                56%
CUT_.03   .101   .08       2.46      .014      (0.0205, 0.1817)   4%                                   6%
USER      .127   .05       4.95      <.0001    (0.0769, 0.177)    5%                                   8%
Vickers’ Section of Macro
• Default is no analysis
• Can test multiple thresholds
• Uses bootstrapping techniques to create 95% CIs
• If testing thresholds, run time will increase due to the bootstrapping
Vickers Section of Macro

%added_pred(data=,…..,vickersplot=TRUE,…..)

[Figure: Vickers decision curve plotting net benefit (−0.01 to 0.11) against threshold probability (0%-100%) for Treat All, Model 1, and Model 2.]
Conclusions
• Don't rely only on the AUC and statistical significance to assess a marker's added predictive ability; use a combination of methods
• Future: extend to time-to-event analysis
Q’s or Comments
• If you want to use the macro or obtain literature contact me at:
• Email: [email protected] or [email protected]
References
1) Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27:157-72.
2) Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making 2006;26(6):565-74.
3) Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Medical Informatics and Decision Making 2008;8(1):53.
4) Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928-935.
5) Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol 2004;159:882-890.
6) http://support.sas.com/kb/25/017.html