
The Neglected “R” in the Risk-Needs-Responsivity Model: A New Approach for Assessing Responsivity to Correctional Interventions

Authors

Grant Duwe, Ph.D.

Research Director

1450 Energy Park Drive, Suite 200

St. Paul, MN 55108-5219

Email: [email protected]

KiDeuk Kim, Ph.D.

Senior Fellow, Urban Institute

2100 M Street NW

Washington, DC 20037

Email: [email protected]

1450 Energy Park Drive, Suite 200

St. Paul, Minnesota 55108-5219

651/361-7200

TTY 800/627-3529

www.doc.state.mn.us

January 2019

This information will be made available in alternative format upon request.

Printed on recycled paper with at least 10 percent post-consumer waste


Research Summary

Prevailing correctional practice holds that offenders should be assigned to interventions

on the basis of assessments for risk, needs, and responsivity. Assessments of responsivity,

however, typically consist of little more than a checklist of items such as motivation,

gender, language, or culture. We introduce a new actuarial approach for assessing

responsivity, which focuses on predicting whether individuals will desist after

participating in an intervention. We assess responsivity by using multiple classification

methods and predictive performance metrics to analyze various approaches for

prioritizing individuals for correctional interventions. The results suggest that adding an

actuarial responsivity assessment to the existing risk and needs assessments would likely

improve treatment assignments and further enhance the effectiveness of an already effective

intervention. We conclude by discussing the implications of more rigorous responsivity

assessments for correctional research, policy and practice.


Introduction

As correctional agencies have increasingly embraced the idea of evidence-based

practices, risk-needs-responsivity (RNR) has become the prevailing model to guide the

delivery of correctional interventions. The risk principle holds that programming

resources should be reserved for higher-risk individuals, whereas the needs principle

dictates that interventions should target criminogenic needs areas, or dynamic risk factors

that are susceptible to change. The responsivity principle, meanwhile, suggests that

programming should be tailored to the strengths, abilities, and learning styles of

individuals.

Because the RNR paradigm holds that the effective delivery of programming

should be customized to an individual’s risk, needs, and responsivity, the use of

assessment instruments is central to this model. Currently, most of the widely used tools

simultaneously assess for risk and needs. Risk assessment involves predicting who is

most likely to recidivate, while needs assessment focuses on identifying which

interventions would be the most appropriate for an individual (Gottfredson and Moriarty,

2006).

Among existing assessment instruments, the only component that is truly actuarial

is the assessment of recidivism risk. That is, these instruments rely on statistical methods

to estimate the likelihood that an individual will commit a new crime in the future. The

assessment of criminogenic needs, on the other hand, does not use an actuarial approach.

Instead, the common strategy involves tallying up the number of items related to each

criminogenic needs area. The needs areas, or domains, with the highest scores are those

that presumably should be targeted for programming. For example, if an individual scores


highest for the substance abuse domain, then substance abuse treatment would be

considered an appropriate—if not the most appropriate—intervention.
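To make the summative approach concrete, the sketch below tallies endorsed items within each needs domain and ranks the domains by their totals. It is a minimal illustration under stated assumptions: the domain names, the item lists, and the function `rank_needs_domains` are hypothetical and are not drawn from any particular instrument.

```python
def rank_needs_domains(endorsed_items: dict[str, list[bool]]) -> list[tuple[str, int]]:
    """Tally endorsed items in each needs domain and rank domains by total.

    Hypothetical illustration: domain names and items are placeholders, not
    taken from any specific risk and needs instrument.
    """
    # Summative scoring: a domain's score is simply the count of endorsed items.
    scores = {domain: sum(items) for domain, items in endorsed_items.items()}
    # Highest-scoring domains first; Python's sort is stable, so ties keep input order.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical individual: 3 of 4 substance abuse items, 1 of 3 employment items, 2 of 3 peer items.
ranked = rank_needs_domains({
    "substance_abuse": [True, True, True, False],
    "employment": [True, False, False],
    "peers": [True, True, False],
})
print(ranked)  # [('substance_abuse', 3), ('peers', 2), ('employment', 1)]
```

Nothing in such a tally checks whether the top-ranked domain is the one where an intervention would most reduce recidivism; it only counts items.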

As with the assessment of criminogenic needs, contemporary instruments do not

use actuarial methods to assess responsivity. Of the three principles within the RNR

model, responsivity is generally an afterthought. Indeed, rather than being described as

risk, needs, and responsivity assessment tools, the most widely used instruments are

typically referred to as risk and needs assessments. And even when there is an attempt to

account for responsivity, the assessment of responsivity is barely more than a checklist of

items such as motivation, gender, and culture.

Present Study

In this study, we introduce an actuarial approach for assessing responsivity, which

involves estimating the likelihood that an individual’s participation in an intervention will

result in desistance. If an individual participated in, say, substance abuse treatment, what

is the probability it would lead to desistance? With risk assessment, the focus is on

identifying who will recidivate. It is the opposite for responsivity assessment, where the

focus is on identifying who will desist, or not recidivate. Yet, because responsivity

assessment also considers participation in correctional interventions, it attempts to predict

whether participating in an intervention will result in desistance. In doing so, the

responsivity assessment we present in this study also accounts for the efficacy of an

intervention.

Our sample consists of more than 23,000 offenders released from Minnesota

prisons between 2003 and 2011. We focus on prisoner participation in prison-based

chemical dependency treatment, which has been found to be effective in reducing


recidivism for Minnesota prisoners (Duwe, 2010). Recognizing potential gender

differences, we conducted separate analyses for male and female offenders.

Each of the offenders had been assessed for chemical dependency (CD) needs

upon their entry to prison. We developed baseline recidivism prediction models (i.e., risk

assessment) along with responsivity assessment models that predict desistance following

participation in prison-based CD treatment. Using the risk, needs, and responsivity

assessment data, we then examined the performance of various prioritization schemes in

reducing recidivism. In addition to prioritizing on the basis of risk and needs, we

prioritized offenders on the basis of risk-needs-responsivity, risk and responsivity, and

needs and responsivity. We estimate the overall impact on recidivism and conclude by

discussing the implications for correctional research, policy, and practice.

Risk, Needs and Responsivity Assessments

Over the last half century, risk assessment within corrections has transitioned

from reliance on professional judgment in making classification decisions to the

widespread use of empirically based, actuarial instruments. Even though more objective,

actuarial methods for assessing risk had been around since the late 1920s (Burgess,

1928), it was not until the 1970s that clinical judgment began to give way to the

development of what Bonta and Andrews (2007) refer to as second-generation risk

assessment instruments. Consisting mostly of static items such as criminal history, these

actuarial instruments, which have been found to consistently outperform clinical

judgment in predicting recidivism (Brennan, Dieterich, and Ehret, 2009), were developed

through statistical analyses. Following the emergence of the “what works” literature and

the growing acceptance of the risk-needs-responsivity (RNR) model, which places an


emphasis on assessing and targeting an offender’s criminogenic needs (dynamic risk

factors) for interventions, third-generation instruments began to incorporate both static

and dynamic predictors of recidivism. Continuing this focus on assessing static and

dynamic risk factors, fourth-generation risk assessment tools have been designed to

follow individuals from intake to case closure, be administered on multiple occasions,

and better integrate protective factors (i.e., factors that reduce recidivism risk) within the

assessment process (Brennan et al., 2009).

In calling for a concentration of programming resources on the highest-risk

offenders, the risk principle makes sense at both the individual and aggregate levels.

While an intervention has an aggregate effect size, its effects on individuals will vary.

After completing an intervention, even an effective one, some individuals will

recidivate while others will desist. For example, let us assume we have an intervention

that reduces recidivism by 25 percent. If we applied this intervention to, say, 100 higher-

risk individuals whose baseline recidivism probability was 80 percent, we would expect

the intervention to lower the expected number of recidivists by 25 percent, from 80 to 60. In other

words, the intervention produced desistance for 20 of the 100 offenders.

But what if we applied the intervention to a lower-risk group whose baseline

recidivism probability was 40 percent? If we assume the intervention lowers recidivism

by 25 percent, then the expected 40 recidivists would fall to 30; that is, the intervention produced

desistance for 10 of the 100 offenders, which is half the number we observed for the

higher-risk group. Conceptually, adhering to the risk principle can help maximize an

effective intervention's impact on recidivism.
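The arithmetic in the two scenarios above can be checked with a short sketch. It assumes, as the example does, that the intervention's 25 percent effect is a relative reduction applied uniformly to each group's expected number of recidivists; the function name `treatment_outcomes` is illustrative.

```python
def treatment_outcomes(n: int, baseline_rate: float, relative_reduction: float) -> tuple[float, float]:
    """Return (expected recidivists after treatment, desistance attributable to treatment).

    Assumes the intervention lowers the expected number of recidivists by a
    fixed relative amount, regardless of the group's baseline risk.
    """
    baseline_recidivists = n * baseline_rate
    treated_recidivists = baseline_recidivists * (1 - relative_reduction)
    return treated_recidivists, baseline_recidivists - treated_recidivists


for label, rate in [("higher-risk", 0.80), ("lower-risk", 0.40)]:
    recidivists, desisters = treatment_outcomes(100, rate, 0.25)
    print(f"{label}: {recidivists:.0f} recidivists, {desisters:.0f} treatment-produced desisters")
# higher-risk: 60 recidivists, 20 treatment-produced desisters
# lower-risk: 30 recidivists, 10 treatment-produced desisters
```

The fixed relative reduction is the key assumption here: if the effect is smaller for higher-risk individuals, the advantage of treating them narrows.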


One outstanding question, however, is whether the higher-risk group would still

experience the same 25 percent reduction as the lower-risk group. For the

higher-risk group, who may be more entrenched in a criminal lifestyle, it could be that

one intervention is insufficient to bring about desistance. To be sure, the extant literature

suggests that higher-risk individuals require more intensive programming (Bonta,

Wallace-Capretti, and Rooney, 2000; Lowenkamp and Latessa, 2005). But the use of risk

and needs assessments operates on the assumption that we assign individuals to

interventions on the basis of risk (high) and needs (high); that is, if a high-risk individual

has a high substance abuse need, we would presumably want to prioritize this person for

CD treatment. But would CD treatment be the most appropriate intervention or, more

specifically, the most effective in reducing recidivism risk for this individual?

While risk assessment involves predicting who is most likely to recidivate, the

goal of needs assessment is, or at least should be, to identify the areas in which

interventions would likely have the greatest impact in lowering recidivism risk. If an

individual is in prison for, say, 6 months and can only participate in one intervention,

which one would have the greatest impact on recidivism? Ostensibly, needs assessment

should be able to help us identify what type of intervention would be most beneficial.

None of the existing risk and needs assessments, however, have demonstrated

they can validly predict or identify needs. The existing literature seems to assume that if a

tool predicts recidivism, then it also predicts needs. But examining how well a tool

performs in predicting recidivism is an evaluation of its ability to assess risk, not needs.

Indeed, the factors that heighten the need for an intervention within a particular area may

not be predictive of recidivism. For example, the extent of chemical use in the 12 months


prior to prison may be more indicative of the need for CD treatment than it is for

recidivism risk.

But even if current assessments were able to accurately predict recidivism and

identify the salient needs areas of offenders, it is still critical to assess responsivity to

programming. For instance, a potential problem can arise from assigning high-risk, high-

need individuals to interventions that do not reduce recidivism because they are either

ineffective in general or insufficient for higher-risk individuals. Here it helps to distinguish

general from specific responsivity. General responsivity refers to the types of programming

that are most effective in reducing recidivism, such as cognitive-behavioral interventions.

Specific responsivity, on the other hand, includes individual barriers that may limit the

likelihood of program participation

and successful completion (Bonta and Andrews, 2007). Examples of specific responsivity

include motivation, anxiety, different forms of learning styles, language, transportation,

gender, and culture (Cullen, 2002).

Existing responsivity assessments have not been developed through the use of

statistical methods, or an actuarial approach. Instead, these assessments are, for the most

part, barely more than a list of items for practitioners to consider when making program

placement decisions. None of the responsivity assessments used on correctional

populations have, to our knowledge, been evaluated to determine whether they are valid

or reliable. It is therefore unclear whether these assessments perform well in identifying

responsivity factors or whether their use has led to more appropriate program

assignments and, ultimately, better recidivism outcomes.

In this study, we introduce a more rigorous, actuarial approach for assessing both

general and specific responsivity. In particular, we assess responsivity by estimating the


likelihood that an individual’s participation in an intervention will result in desistance. If

an individual participated in, say, CD treatment, what is the probability it would lead to

desistance? As we demonstrate later, this approach to responsivity assessment accounts not

only for the efficacy of an intervention but also for the varying effects an intervention

has on individuals. By improving the process by which individuals are assigned to

interventions, we propose that an actuarial approach to responsivity assessment can help

achieve better recidivism outcomes.

Chemical Dependency Treatment in MnDOC

Shortly after their admission to prison in Minnesota, prisoners with at least six

months to serve in prison undergo a brief (20-40 minutes) chemical dependency (CD)

assessment conducted by a licensed assessor. CD assessors use DSM-IV criteria for

substance abuse and dependence in their diagnoses, which are based on both self-report and collateral

information. The criteria for abuse include problems at work or school, not taking care of

personal responsibilities, financial problems, engaging in dangerous behavior while

intoxicated, legal problems, problems at home or in relationships, and continued use

despite experiencing problems. The criteria for dependence, on the other hand, include

increased tolerance; withdrawal symptoms; greater use than intended over a relatively

long period of time; inability to cut down or quit; a lot of time spent acquiring, using, or

recovering from use; missing important family, work, or social activities; and knowledge

that continued use would exacerbate a serious medical or psychological condition. After

completing the assessment, CD assessors assign prisoners a rating of no need, moderate

need, or high need for CD treatment.


Even though most newly admitted offenders are considered to be chemically

abusive or dependent, the number of prisoners directed to CD treatment greatly exceeds

the number of CD treatment beds available. In fact, among prisoners who receive a CD

assessment, roughly one-fourth enter CD treatment during their confinement. As a result,

the Minnesota Department of Corrections (MnDOC) has used a relatively simple,

summative algorithm to prioritize prisoners for CD treatment.

The algorithm produces a score that ranges from a low of 0 points to a high of 40

points. Of the 40 possible points, 10 are based on the assessed need for CD treatment.

More specifically, prisoners are given 0 points for no need, 5 points for moderate need,

and 10 points for high need. Likewise, 10 of the 40 points are based on assessed

recidivism risk. As noted below, our sample contains prisoners released from Minnesota

prisons between 2003 and 2011. During this time, the MnDOC used the Level of Service

Inventory-Revised (LSI-R) to assess recidivism risk. Depending on their LSI-R score,

prisoners were given either 10 points (very high risk), 7 points (high risk), 4 points

(medium risk), or 0 points (low risk) for recidivism risk.

While risk and needs make up half of the 40 points in the algorithm, the offense

for which offenders are imprisoned accounts for a total of 10 points. In particular, offenders

in prison for a felony DWI are given 10 points while those in prison for other offenses

receive 0 points. The final 10 points cover items related to factors such as mental illness,

traumatic brain injury, and a history of assaultive behavior. Based on the score (ranging

from 0 to 40) from the algorithm, prisoners are then given a CD treatment priority level

of 1 (score of 20 or higher), 2 (score between 14 and 19), or 3 (score of 13 or lower).


Priority level 1 prisoners are most likely to receive a CD treatment offer, followed by

those in priority level 2 and priority level 3.
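The summative prioritization algorithm described above can be sketched as follows. This is an illustration, not the MnDOC's production code. The source does not report the LSI-R cutoffs for each risk category or how the final 10 points are divided among mental illness, traumatic brain injury, and assaultive history. The sketch therefore takes the risk category as a label and the remaining points as a hypothetical `other_points` input (0 to 10); the category label strings are also assumptions.

```python
# Point values as described in the text; the category label strings are hypothetical.
NEED_POINTS = {"no need": 0, "moderate need": 5, "high need": 10}
RISK_POINTS = {"low": 0, "medium": 4, "high": 7, "very high": 10}  # from LSI-R category


def cd_priority_score(need: str, risk: str, felony_dwi: bool, other_points: int) -> int:
    """Sum the four 10-point components into a 0-40 score."""
    if not 0 <= other_points <= 10:
        raise ValueError("other_points must be between 0 and 10")
    offense_points = 10 if felony_dwi else 0  # felony DWI commitment earns 10 points
    return NEED_POINTS[need] + RISK_POINTS[risk] + offense_points + other_points


def cd_priority_level(score: int) -> int:
    """Map a score to priority level 1 (20+), 2 (14-19), or 3 (13 or lower)."""
    if score >= 20:
        return 1
    if score >= 14:
        return 2
    return 3


# High need and high risk, not in prison for felony DWI, no other points: 10 + 7 + 0 + 0 = 17.
score = cd_priority_score("high need", "high", felony_dwi=False, other_points=0)
print(score, cd_priority_level(score))  # 17 2
```

Under these point values, high need plus very high risk (20 points) reaches priority level 1 on risk and needs alone, while a felony DWI commitment with moderate need and medium risk (19 points) falls just short.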

A prior evaluation of the MnDOC's CD treatment showed it is effective in

reducing recidivism. Using propensity score matching to match 926 treated offenders

released in 2005 with 926 inmates who had been untreated, Duwe (2010) found that

treatment decreased the risk of recidivism by 17 percent for rearrest, 21 percent for

reconviction, and 25 percent for reimprisonment for a new felony offense. Moreover,

consistent with earlier research (Wexler et al., 1990), the results showed that increased

treatment time appeared to lower the risk of recidivism, but only up to a point. While

short-term (90 days) and medium-term (180 days) programs had a statistically significant

impact on all three recidivism measures, no significant effects were found for long-term

(365 days) programming.

Data and Method

Our overall sample consists of 23,034 offenders released from Minnesota prisons

between 2003 and 2011 who had been assessed for chemical dependency. Within this

sample, there were 2,314 females and 20,720 males. Each of the 23,034 prisoners was

given a treatment need level from one of three categories—high need for treatment,

moderate need, and no need—as well as a CD treatment priority level. Of the 23,034

prisoners, a total of 5,414 (24 percent) participated in CD treatment during their

confinement prior to their release.

While the treatment need level (high, moderate or no need) provides us with the

assessed CD treatment needs for the prisoners in our sample, we developed predictive

models for assessments of recidivism risk and responsivity. More specifically, because


there are important gender differences with respect to risk and needs, we initially

separated our overall sample into males (N = 20,720) and females (N = 2,314). Next, we

separated these samples into three sets by the year prisoners were released from prison.

Our first set, the training set, consisted of individuals (10,517 males and 1,250

females) released from Minnesota prisons between 2003 and 2007. Our second set, the

test set, contained individuals (4,876 males and 555 females) released from prison

between 2008 and 2009. Our final set, the validation set, consisted of individuals (5,327

males and 509 females) released from prison in either 2010 or 2011.
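The gender-specific, release-year split described above can be sketched as follows. The record fields `sex` and `release_year` and the function names are hypothetical; only the year boundaries come from the text.

```python
from collections import defaultdict


def assign_set(release_year: int) -> str:
    """Map a release year to the training, test, or validation set."""
    if 2003 <= release_year <= 2007:
        return "training"
    if 2008 <= release_year <= 2009:
        return "test"
    if 2010 <= release_year <= 2011:
        return "validation"
    raise ValueError(f"release year {release_year} is outside 2003-2011")


def partition(records: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group records by (sex, set) so each gender gets its own three sets."""
    parts: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for record in records:
        parts[(record["sex"], assign_set(record["release_year"]))].append(record)
    return dict(parts)


records = [
    {"id": 1, "sex": "M", "release_year": 2004},
    {"id": 2, "sex": "F", "release_year": 2009},
    {"id": 3, "sex": "M", "release_year": 2011},
    {"id": 4, "sex": "F", "release_year": 2003},
]
for key, group in sorted(partition(records).items()):
    print(key, [r["id"] for r in group])
```

Splitting by release year rather than at random places the most recent releases in the validation set, so performance there more closely reflects how the models would perform on later cohorts.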

Focusing first on the assessment of recidivism risk, we developed predictive

models on the training set data. As shown in Tables 1 and 2, our dataset contained a total

of 36 predictors that are available when a MnDOC prisoner goes through intake at the

time of admission to prison. These predictors encompass items commonly found to be

predictive of recidivism, such as criminal history, age at release, gang affiliation, and

marital status. We also included items such as prison admission type and offense type. Our

measure of recidivism is reconviction for a misdemeanor, gross misdemeanor, or felony

within three years of release from prison. We obtained reconviction data on all 23,034

prisoners from the Minnesota Bureau of Criminal Apprehension. As shown in Tables 1

and 2, females had lower recidivism rates compared to males.

For both males and females, we used two different classification methods—

logistic regression and random forests—to develop predictive models on the training set.

Over the last few decades, regression modeling has been increasingly used to develop prediction tools in the criminal justice field (Brennan and Oliver, 2000; Duwe, 2012; Duwe, 2014; Duwe and Freske, 2012; Lowenkamp and Whetzel, 2009), while the use of machine learning algorithms such as random forests has been more recent (Barnes and Hyatt, 2012). Created by Breiman (2001), random forests is an ensemble method that involves growing a forest of many trees, each of which is grown on an independent bootstrap sample from the training data. At each node of a tree, only a randomly selected subset of the predictor variables is considered, and the best split is chosen from among those selected predictors. The trees are grown to maximum depth, and a consensus prediction is obtained by aggregating the votes of all the trees.

Table 1. Descriptive Statistics for Male Prisoner Sample

| Predictor | Description | Training Mean | Training SD | Test Mean | Test SD | Validation Mean | Validation SD |
|---|---|---|---|---|---|---|---|
| Static/Criminal History | | | | | | | |
| Total Convictions | Total # of convictions (any offense level) | 11.05 | 8.47 | 13.05 | 9.49 | 14.15 | 10.46 |
| Felony Convictions | Total # of felony convictions | 1.70 | 1.68 | 2.29 | 2.05 | 2.82 | 2.38 |
| Felony Specialization/Diversity | Degree of specialization/diversity in felony offenses | 0.86 | 0.26 | 0.87 | 0.24 | 0.85 | 0.25 |
| Violent Convictions | Total # of violent offense convictions | 1.46 | 1.81 | 1.69 | 2.04 | 1.87 | 2.09 |
| Violent Specialization/Diversity | Degree of specialization/diversity in violent offenses | 0.91 | 0.21 | 0.92 | 0.19 | 0.92 | 0.19 |
| Total Assault Convictions | Total # of assault offense convictions | 0.93 | 1.51 | 1.11 | 1.70 | 1.24 | 1.77 |
| Total Robbery Convictions | Total # of robbery convictions | 0.17 | 0.57 | 0.20 | 0.68 | 0.20 | 0.67 |
| VOFP Convictions | Total VOFP, stalking and harassment convictions | 0.14 | 0.54 | 0.22 | 0.73 | 0.32 | 0.89 |
| Disorderly Conduct Convictions | Total # of disorderly conduct convictions | 0.09 | 0.34 | 0.15 | 0.46 | 0.24 | 0.63 |
| Prostitution Convictions | Total # of prostitution offense convictions | 0.01 | 0.12 | 0.01 | 0.14 | 0.01 | 0.15 |
| Drug Offense Convictions | Total # of drug offense convictions | 0.99 | 1.34 | 1.11 | 1.49 | 1.12 | 1.60 |
| Drug Offense Specialization/Diversity | Degree of specialization/diversity in drug offenses | 0.93 | 0.18 | 0.95 | 0.15 | 0.96 | 0.14 |
| False Information to Police Cons. | Total # of false information to police convictions | 0.44 | 0.90 | 0.52 | 0.99 | 0.55 | 1.03 |
| Flee/Escape Convictions | Total # of flee/escape police convictions | 0.23 | 0.61 | 0.27 | 0.66 | 0.31 | 0.71 |
| Weapons Offense Convictions | Total # of weapons offense convictions | 0.08 | 0.32 | 0.10 | 0.36 | 0.12 | 0.40 |
| Total Property Convictions | Total # of property offense convictions | 2.91 | 4.09 | 3.03 | 4.38 | 3.29 | 4.77 |
| Property Offense Specialization/Diversity | Degree of specialization/diversity in prop. offenses | 0.89 | 0.19 | 0.92 | 0.15 | 0.92 | 0.15 |
| Driving While Intoxicated (DWI) Convictions | Total # of DWI convictions | 0.29 | 0.67 | 0.59 | 1.03 | 0.66 | 1.12 |
| Failure to Register (FTR) Convictions | Total # of FTR convictions | 0.05 | 0.28 | 0.08 | 0.36 | 0.09 | 0.40 |
| Total Supervision Failures | Total # of revocations on probation and parole | 1.08 | 1.36 | 1.17 | 1.50 | 1.30 | 1.63 |
| Intake | | | | | | | |
| Metro County of Commitment | Commit from Twin Cities metro-area county | 0.51 | 0.50 | 0.51 | 0.50 | 0.51 | 0.50 |
| Length of Stay in Prison (Months) | Difference in months between admission and release | 18.72 | 16.15 | 21.14 | 20.83 | 22.03 | 22.90 |
| New Court Commitment | Admitted to prison directly from court | 0.60 | 0.49 | 0.65 | 0.48 | 0.68 | 0.47 |
| Probation Violator | Admitted to prison for probation violation | 0.36 | 0.48 | 0.34 | 0.47 | 0.31 | 0.46 |
| Release Violator | Admitted to prison for parole violation | 0.04 | 0.21 | 0.01 | 0.12 | 0.01 | 0.11 |
| Person Offense | Most serious index offense is person offense = 1 | 0.21 | 0.41 | 0.22 | 0.41 | 0.23 | 0.42 |
| Sex Offense | Most serious index offense is sex offense = 1 | 0.08 | 0.26 | 0.06 | 0.24 | 0.06 | 0.24 |
| Drug Offense | Most serious index offense is drugs = 1 | 0.30 | 0.46 | 0.27 | 0.45 | 0.24 | 0.43 |
| Property Offense | Most serious index offense is property = 1 | 0.24 | 0.42 | 0.19 | 0.39 | 0.17 | 0.38 |
| DWI Offense | Most serious index offense is DWI = 1 | 0.05 | 0.21 | 0.12 | 0.32 | 0.09 | 0.29 |
| Other Offense | Most serious index offense is “Other” offense = 1 | 0.13 | 0.34 | 0.15 | 0.35 | 0.20 | 0.40 |
| Suicidal History | Suicidal history = 1; no history = 0 | 0.11 | 0.32 | 0.16 | 0.36 | 0.17 | 0.37 |
| Security Threat Group (STG) | Total # of STG criteria (0-10) | 0.92 | 1.69 | 0.88 | 1.66 | 0.94 | 1.69 |
| Marital Status | Married = 1; unmarried = 0 | 0.11 | 0.32 | 0.11 | 0.31 | 0.11 | 0.31 |
| Age at Release | Age in years at time of release | 33.15 | 9.36 | 34.51 | 9.85 | 34.60 | 10.02 |
| Unsupervised Release | Released to no correctional supervision | 0.02 | 0.15 | 0.01 | 0.12 | 0.01 | 0.10 |
| Recidivism within 3 Years | | | | | | | |
| General Recidivism | Reconviction for misd., gross misd., or felony | 0.68 | 0.47 | 0.62 | 0.49 | 0.63 | 0.48 |
| N | | 10,517 | | 4,876 | | 5,327 | |

Recent research has advocated testing multiple classification methods when

developing a predictive model (Duwe and Kim, 2016; Ridgeway, 2013), given that no single

algorithm yields the best performance in every situation (Caruana and Niculescu-Mizil,

2006; Wolpert, 1996).1 To identify the best predictive models, we

evaluated performance on the test set data. After doing so, we then applied the best-

performing models to the validation set data. In the validation sets, each prisoner received

a predicted probability that reflects his or her likelihood of recidivating within three years

of release from prison.

1 Prior research has provided mixed evidence on the performance of machine learning algorithms, such as random forests, versus older, more traditional approaches like logistic regression. Some studies have found little or no difference between these two sets of classification methods (Hamilton et al., 2015; Liu et al., 2011; Tollenaar and van der Heijden, 2013), whereas others have observed a performance advantage for machine learning approaches (Berk and Bleich, 2013; Caruana et al., 2006; Duwe and Kim, 2015, 2016; Hess and Turner, 2013). The evidence seems to be clearer that statistical and machine learning algorithms outperform simplistic, Burgess-style methods (Duwe and Kim, 2016). Because no single algorithm performs best in every situation, research has advocated testing multiple algorithms (Duwe and Kim, 2015, 2016; Ridgeway, 2013), which is the approach we have followed here.

Table 2. Descriptive Statistics for Female Prisoner Sample

| Predictor | Description | Training Mean | Training SD | Test Mean | Test SD | Validation Mean | Validation SD |
|---|---|---|---|---|---|---|---|
| Static/Criminal History | | | | | | | |
| Total Convictions | Total # of convictions (any offense level) | 9.24 | 7.67 | 10.75 | 9.38 | 11.86 | 10.23 |
| Felony Convictions | Total # of felony convictions | 1.58 | 1.58 | 2.20 | 2.07 | 2.66 | 2.47 |
| Felony Specialization/Diversity | Degree of specialization/diversity in felony offenses | 0.84 | 0.29 | 0.83 | 0.27 | 0.81 | 0.28 |
| Violent Convictions | Total # of violent offense convictions | 0.86 | 1.89 | 0.80 | 1.49 | 0.98 | 1.98 |
| Violent Specialization/Diversity | Degree of specialization/diversity in violent offenses | 0.94 | 0.19 | 0.96 | 0.13 | 0.94 | 0.18 |
| Total Assault Convictions | Total # of assault offense convictions | 0.37 | 0.91 | 0.41 | 0.99 | 0.53 | 1.35 |
| Total Robbery Convictions | Total # of robbery convictions | 0.07 | 0.32 | 0.07 | 0.30 | 0.10 | 0.48 |
| VOFP Convictions | Total VOFP, stalking and harassment convictions | 0.03 | 0.23 | 0.05 | 0.31 | 0.08 | 0.39 |
| Disorderly Conduct Convictions | Total # of disorderly conduct convictions | 0.04 | 0.22 | 0.11 | 0.41 | 0.19 | 0.54 |
| Prostitution Convictions | Total # of prostitution offense convictions | 0.34 | 1.46 | 0.18 | 0.88 | 0.24 | 1.17 |
| Drug Offense Convictions | Total # of drug offense convictions | 1.11 | 1.35 | 1.21 | 1.61 | 1.31 | 1.74 |
| Drug Offense Specialization/Diversity | Degree of specialization/diversity in drug offenses | 0.89 | 0.25 | 0.91 | 0.21 | 0.92 | 0.19 |
| False Information to Police Cons. | Total # of false information to police convictions | 0.49 | 0.97 | 0.64 | 1.29 | 0.56 | 1.05 |
| Flee/Escape Convictions | Total # of flee/escape police convictions | 0.10 | 0.36 | 0.10 | 0.38 | 0.08 | 0.30 |
| Weapons Offense Convictions | Total # of weapons offense convictions | 0.01 | 0.09 | 0.01 | 0.15 | 0.02 | 0.15 |
| Total Property Convictions | Total # of property offense convictions | 3.38 | 4.87 | 3.58 | 5.61 | 3.78 | 5.67 |
| Property Offense Specialization/Diversity | Degree of specialization/diversity in prop. offenses | 0.83 | 0.24 | 0.86 | 0.22 | 0.86 | 0.21 |
| Driving While Intoxicated (DWI) Convictions | Total # of DWI convictions | 0.20 | 0.59 | 0.49 | 0.90 | 0.58 | 1.04 |
| Failure to Register (FTR) Convictions | Total # of FTR convictions | 0.00 | 0.03 | 0.01 | 0.11 | 0.01 | 0.13 |
| Total Supervision Failures | Total # of revocations on probation and parole | 0.85 | 0.92 | 1.04 | 1.06 | 0.97 | 1.12 |
| Intake | | | | | | | |
| Metro County of Commitment | Commit from Twin Cities metro-area county | 0.50 | 0.50 | 0.42 | 0.49 | 0.43 | 0.50 |
| Length of Stay in Prison (Months) | Difference in months between admission and release | 11.93 | 11.61 | 14.30 | 14.26 | 17.11 | 15.79 |
| New Court Commitment | Admitted to prison directly from court | 0.49 | 0.50 | 0.44 | 0.50 | 0.51 | 0.50 |
| Probation Violator | Admitted to prison for probation violation | 0.47 | 0.50 | 0.50 | 0.50 | 0.48 | 0.50 |
| Release Violator | Admitted to prison for parole violation | 0.04 | 0.20 | 0.06 | 0.24 | 0.01 | 0.10 |
| Person Offense | Most serious index offense is person offense = 1 | 0.13 | 0.34 | 0.13 | 0.34 | 0.15 | 0.36 |
| Sex Offense | Most serious index offense is sex offense = 1 | 0.01 | 0.08 | 0.01 | 0.08 | 0.01 | 0.10 |
| Drug Offense | Most serious index offense is drugs = 1 | 0.44 | 0.50 | 0.44 | 0.50 | 0.44 | 0.50 |
| Property Offense | Most serious index offense is property = 1 | 0.33 | 0.47 | 0.28 | 0.45 | 0.24 | 0.42 |
| DWI Offense | Most serious index offense is DWI = 1 | 0.03 | 0.17 | 0.07 | 0.26 | 0.11 | 0.31 |
| Other Offense | Most serious index offense is “Other” offense = 1 | 0.06 | 0.23 | 0.07 | 0.26 | 0.06 | 0.24 |
| Suicidal History | Suicidal history = 1; no history = 0 | 0.21 | 0.41 | 0.32 | 0.47 | 0.36 | 0.48 |
| Security Threat Group (STG) | Total # of STG criteria (0-10) | 0.13 | 0.51 | 0.16 | 0.60 | 0.12 | 0.53 |
| Marital Status | Married = 1; unmarried = 0 | 0.10 | 0.30 | 0.09 | 0.29 | 0.12 | 0.33 |
| Age at Release | Age in years at time of release | 34.61 | 8.45 | 35.70 | 9.36 | 35.88 | 9.33 |
| Unsupervised Release | Released to no correctional supervision | 0.03 | 0.18 | 0.06 | 0.23 | 0.00 | 0.00 |
| Recidivism within 3 Years | | | | | | | |
| General Recidivism | Reconviction for misd., gross misd., or felony | 0.59 | 0.49 | 0.57 | 0.50 | 0.50 | 0.50 |
| N | | 1,250 | | 555 | | 509 | |

To assess responsivity, we also developed predictive models on the training set data. The main difference between the recidivism risk and responsivity assessments was the outcome being predicted. For the recidivism risk assessment, the predicted outcome was whether individuals recidivated within three years. For the responsivity assessment, the predicted outcome was whether individuals had 1) participated in CD treatment and 2) desisted within three years of release from prison. Therefore, for the entire dataset, we created a variable, CD treatment desistance, that assigned a value of “1” to desistors who participated in CD treatment and a value of “0” to all other offenders. As a result, offenders who participated in CD treatment but recidivated were given a value of “0”. Likewise, offenders who desisted but did not participate in CD treatment were assigned a value of “0” for this item.
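The two outcome codings differ only in how treatment participation enters the label. A minimal sketch, assuming each record carries a CD treatment participation flag and a three-year reconviction flag (the function and argument names are hypothetical):

```python
def recidivism_label(reconvicted_within_3y: bool) -> int:
    """Risk assessment outcome: 1 if reconvicted within three years of release."""
    return int(reconvicted_within_3y)


def cd_treatment_desistance(participated_in_cd: bool, reconvicted_within_3y: bool) -> int:
    """Responsivity outcome: 1 only for CD treatment participants who desisted."""
    return int(participated_in_cd and not reconvicted_within_3y)


# All four combinations of participation and reconviction.
for participated in (True, False):
    for reconvicted in (True, False):
        print(
            f"participated={participated!s:<5} reconvicted={reconvicted!s:<5} "
            f"risk={recidivism_label(reconvicted)} "
            f"responsivity={cd_treatment_desistance(participated, reconvicted)}"
        )
```

Only one of the four combinations is coded 1 for responsivity, so the positive cases are a subset of all desistors and the outcome is necessarily no more common than desistance itself.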

After developing responsivity assessment models on the training set data for both

males and females, we evaluated predictive performance on the test sets. We then applied

the best-performing models to the validation sets for males and females. In the validation

sets, each offender received a predicted probability that reflects his or her likelihood of

desisting as a result of CD treatment.
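The train, test, and validate workflow can be sketched with stand-in models. In this sketch the two scoring functions are hypothetical placeholders for fitted classifiers such as logistic regression and random forests (no model fitting is shown). Selection uses test-set AUC alone, which is a simplifying assumption, since the study compared candidates on several metrics.

```python
from typing import Callable, Sequence

Model = Callable[[dict], float]  # maps a record to a predicted probability


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_and_apply(models: dict[str, Model], test_x: list[dict], test_y: list[int],
                     validation_x: list[dict]) -> tuple[str, dict[str, float], list[float]]:
    """Pick the model with the highest test-set AUC, then score the validation set."""
    test_auc = {name: auc(test_y, [model(x) for x in test_x]) for name, model in models.items()}
    best = max(test_auc, key=test_auc.get)
    return best, test_auc, [models[best](x) for x in validation_x]


# Hypothetical stand-ins for two fitted classifiers.
models = {
    "candidate_a": lambda x: x["score_a"],
    "candidate_b": lambda x: x["score_b"],
}
test_x = [{"score_a": 0.2, "score_b": 0.9}, {"score_a": 0.7, "score_b": 0.8},
          {"score_a": 0.6, "score_b": 0.3}, {"score_a": 0.4, "score_b": 0.1}]
test_y = [1, 1, 0, 0]
validation_x = [{"score_a": 0.5, "score_b": 0.65}]

best, test_auc, validation_probs = select_and_apply(models, test_x, test_y, validation_x)
print(best, test_auc, validation_probs)
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for illustration but slow for samples of several thousand; rank-based formulas give the same value more efficiently.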

Predictive Performance Metrics

To measure predictive performance, we used six metrics to capture the three main

areas of predictive validity—accuracy, discrimination, and calibration. To evaluate

predictive accuracy, which assesses how well a model makes correct classification

decisions, we used accuracy (ACC). For predictive discrimination, which measures the

degree to which the model separates the recidivists from the desistors, we used three

separate metrics—the AUC, the H measure developed by Hand (2009), and the precision-

recall curve (PRC). The AUC has been one of the most widely used predictive

performance metrics, and it is relatively robust across different recidivism base rates and

selection ratios (Smith, 1996). Still, the AUC can provide overly optimistic estimates of

predictive discrimination for imbalanced datasets (Davis and Goadrich, 2006), and it can

provide misleading results if receiver operating characteristic (ROC) curves cross (Hand,

2009). As a result, we also used Hand’s H-measure, which uses a common cost

distribution for all classifiers (Hand, 2009), and the precision-recall curve (PRC), which

assesses discrimination with the precision and recall values. Precision measures the

percent of positive predictions that were correct (based on the 50 percent threshold),

whereas recall reflects the percentage of actual positives (e.g., recidivists in the risk models) that were correctly identified.

Compared to the AUC, the PRC has been found to be a better metric for highly

imbalanced datasets (i.e., making predictions for an infrequently occurring outcome)

(Davis and Goadrich, 2006).
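To make the discrimination metrics concrete, the minimal sketch below computes precision and recall at a 50 percent cutoff, along with the AUC as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case (ties count as one-half). The function names and toy data are hypothetical. Treating a probability of exactly 0.5 as a positive prediction is an assumption. The sketch does not reimplement Hand's H measure or the area under the precision-recall curve.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall for the positive class at a fixed probability cutoff."""
    # Assumption: a probability equal to the cutoff counts as a positive prediction.
    pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for y, yhat in zip(y_true, pred) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, pred) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(y_true, pred) if y == 1 and yhat == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of positive predictions that were correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that were captured
    return precision, recall


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a random positive > score of a random negative)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = positive outcome, 0 = negative outcome
y = [1, 1, 0, 0, 1, 0]
p = [0.90, 0.60, 0.55, 0.20, 0.40, 0.10]
print(precision_recall(y, p))  # precision 2/3, recall 2/3
print(round(auc(y, p), 3))     # 0.889
```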

Calibration assesses how well the predicted probabilities from a model correspond

with the observed outcome being predicted. For our calibration metric, we used root

mean square error (RMSE), which measures the square root of the average squared

difference between observed recidivism and predicted probabilities. The sixth metric we

used is the SAR (squared error, accuracy, and ROC area) statistic developed by Caruana,

Niculescu-Mizil, Crew, and Ksikes (2004). SAR is a combined measure of

discrimination, accuracy and calibration, and the formula for SAR is: (ACC + AUC + (1

– RMSE))/3 (Caruana, Niculescu-Mizil, Crew, and Ksikes, 2004).
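Accuracy, RMSE, and the SAR composite can be sketched in the same way. The function names below are hypothetical, and the accuracy function assumes a 50 percent classification cutoff. As a check, the test-set values for the male logistic regression recidivism model in Table 3 (ACC = 0.693, AUC = 0.749, RMSE = 0.447) reproduce the reported SAR of 0.665.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of cases whose thresholded prediction matches the observed 0/1 outcome."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def rmse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Square root of the mean squared gap between observed outcomes and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def sar(acc: float, auc: float, rmse_value: float) -> float:
    """SAR = (ACC + AUC + (1 - RMSE)) / 3, following Caruana et al. (2004)."""
    return (acc + auc + (1 - rmse_value)) / 3


# Male logistic regression recidivism model, test set (Table 3)
print(round(sar(0.693, 0.749, 0.447), 3))  # 0.665
```

Because RMSE enters as 1 - RMSE, better calibration (a smaller error) raises SAR, just as higher accuracy and AUC do.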

Prioritizing Prisoners for CD Treatment

In the validation sets for the male and female prisoners, each offender had been

assessed for risk, needs, and responsivity. Put another way, each of the 5,327 males and

509 females in the validation sets had values for 1) recidivism risk probability, 2) CD

treatment need, and 3) responsivity probability. The values for both recidivism risk and

responsivity ranged from a low of 0 percent to a high of 100 percent. A higher predicted

probability for recidivism signifies a higher risk for recidivism. On the other hand, a

higher predicted probability for responsivity denotes a greater likelihood that an

individual will desist after participating in CD treatment. The values for CD treatment

need consisted of “1” for no need, “2” for moderate need, and “3” for high need.

Using these values from the risk, needs, and responsivity assessments, we

examined several different ways of prioritizing prisoners for CD treatment. In particular,

we prioritized offenders on the basis of 1) risk and needs, 2) risk and responsivity, 3)

needs and responsivity, and 4) risk, needs, and responsivity. For example, in prioritizing

prisoners by risk and needs, we added the values from the risk and needs assessments to

form a total risk-needs score. Likewise, to prioritize prisoners by risk, needs, and

responsivity, we added the values from the risk, needs, and responsivity assessments to

form a total risk-needs-responsivity score. Therefore, individuals with the highest scores

are presumably those with the highest risk, needs, and responsivity to CD treatment.
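The summative prioritization and the quartile split can be sketched as follows. The `Assessment` record and the function names are hypothetical. The document also does not state the scale on which the three values were added. This sketch assumes risk and responsivity are probabilities between 0 and 1 and that need is coded 1 to 3, and each component's relative weight would change under a different scale. Passing a subset of components produces the risk-needs, risk-responsivity, and needs-responsivity schemes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class Assessment:
    """Hypothetical per-offender record holding the three assessment values."""
    offender_id: int
    risk: float          # predicted recidivism probability (assumed 0-1 scale)
    need: int            # CD treatment need: 1 = none, 2 = moderate, 3 = high
    responsivity: float  # predicted probability of CD treatment desistance (assumed 0-1)


def composite_score(a: Assessment,
                    components: Iterable[str] = ("risk", "need", "responsivity")) -> float:
    """Simple summative score: add the selected assessment values."""
    return sum(getattr(a, c) for c in components)


def assign_quartiles(assessments: Sequence[Assessment],
                     components: Iterable[str] = ("risk", "need", "responsivity")) -> Dict[int, int]:
    """Rank by descending score; label quartiles 1 (top 25%) through 4 (bottom 25%)."""
    components = tuple(components)
    ranked = sorted(assessments, key=lambda a: composite_score(a, components), reverse=True)
    n = len(ranked)
    return {a.offender_id: 1 + (4 * i) // n for i, a in enumerate(ranked)}


people = [
    Assessment(1, risk=0.80, need=3, responsivity=0.60),
    Assessment(2, risk=0.30, need=1, responsivity=0.10),
    Assessment(3, risk=0.90, need=3, responsivity=0.05),  # high risk, low responsivity
    Assessment(4, risk=0.50, need=3, responsivity=0.70),
]
print(assign_quartiles(people))                    # {1: 1, 4: 2, 3: 3, 2: 4}
print(assign_quartiles(people, ("risk", "need")))  # {3: 1, 1: 2, 4: 3, 2: 4}
```

In the toy data, offender 3 ranks first under risk-needs but drops to the third quartile once responsivity is added. With 5,327 records, this integer split yields quartiles of 1,332, 1,332, 1,332, and 1,331, which match the male quartile sizes in Table 4. How the authors broke ties or allocated remainders is not stated.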

A little more than one-fourth of the prisoners in the male and female validation

sets entered CD treatment. For example, 1,377 (26%) of the 5,327 male offenders in the

validation set participated in CD treatment. The recidivism rate for the treated offenders

was 49 percent versus 68 percent for those who were untreated. For females, 145 (28%)

of the 509 offenders participated in CD treatment. The recidivism rate for the treated

offenders was 35 percent compared to 56 percent for those who were untreated.

To determine how each prioritization scheme might perform in assigning

individuals for CD treatment, we divided each validation set into quartiles based on each scheme's composite score and then

analyzed recidivism outcomes by CD treatment participation. To illustrate with the 5,327

male offenders in the validation set, the recidivism rate was 49 percent for the 1,377

(26%) who entered CD treatment and 68 percent for the 3,950 (74%) who did not. The

rate was therefore 27 percent lower for the treated offenders. With a recidivism rate of 49

percent among the 1,377 treated offenders, there were still 677 who were recidivists. Yet,

if we assume that none of the 1,377 had entered treatment and that their recidivism rate

matched the untreated rate (67.7 percent), then 932 would have been recidivists. Delivering

CD treatment to the 1,377 offenders is thus associated with a reduction of 255 recidivists

(932 minus 677).
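The counterfactual arithmetic above, which applies the untreated recidivism rate to the treated group, can be sketched as a small function. The function name is hypothetical. It uses the unrounded rates from Tables 4 and 5 (49.2 and 67.7 percent for males, 35.2 and 56.0 percent for females) and reports the relative reduction that the tables list as the effect. The quartile rows in Tables 4 and 5 appear to apply the same calculation to the full quartile (for example, N = 1,332) rather than to its treated members only.

```python
from typing import Tuple


def treatment_impact(n_group: int, rate_treated: float,
                     rate_untreated: float) -> Tuple[int, int, int, float]:
    """Recidivists with treatment, without treatment, prevented, and relative reduction."""
    with_treatment = round(n_group * rate_treated)       # recidivists at the treated rate
    without_treatment = round(n_group * rate_untreated)  # counterfactual: untreated rate applied
    prevented = without_treatment - with_treatment
    relative_reduction = round(1 - rate_treated / rate_untreated, 3)  # "Effect" column
    return with_treatment, without_treatment, prevented, relative_reduction


print(treatment_impact(1377, 0.492, 0.677))  # (677, 932, 255, 0.273)   males, overall
print(treatment_impact(145, 0.352, 0.560))   # (51, 81, 30, 0.371)      females, overall
print(treatment_impact(1332, 0.593, 0.856))  # (790, 1140, 350, 0.307)  males, RNR top quartile
```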

If we prioritized the top one-fourth of offenders (i.e., roughly the observed CD treatment capacity) on

the basis of risk-needs, risk-responsivity, needs-responsivity, or risk-needs-responsivity,

would we still see a 27 percent reduction? Similarly, if we prioritized the top one-fourth

on the basis of these four prioritization schemes, would we still observe 255 prevented

recidivists? Would the treatment effect sizes and number of prevented recidivists be

smaller, larger, or about the same? To answer these questions, we present the findings in

the following section.

Results

In Table 3, we present the predictive performance results from the recidivism and

responsivity assessments for males and females. As noted above, we used two types of

classification methods—logistic regression and random forests. The results in Table 3

indicate the recidivism risk models for both classification methods predicted recidivism

relatively well for both males and females. For male offenders, the logistic regression

model slightly outperformed the random forests model across each of the six predictive

performance metrics in both the test and validation sets. For female offenders, the

random forests model slightly outperformed logistic regression.

Table 3. Predictive Performance Results

Metrics ACC AUC H PRC RMSE SAR

TEST SET

Recidivism Baseline

Males

Logistic Regression 0.693 0.749 0.211 0.807 0.447 0.665

Random Forests 0.679 0.740 0.197 0.804 0.451 0.656

Females


Random Forests 0.699 0.771 0.268 0.816 0.446 0.675

Responsivity

Males


Random Forests 0.870 0.813 0.296 0.350 0.313 0.790

Females


Random Forests 0.852 0.838 0.311 0.379 0.317 0.791

VALIDATION SET

Recidivism Baseline

Males


Random Forests 0.687 0.741 0.207 0.801 0.451 0.659

Females


Random Forests 0.695 0.792 0.385 0.799 0.451 0.679

Responsivity

Males


Random Forests 0.869 0.816 0.202 0.362 0.316 0.790

Females


Random Forests 0.829 0.839 0.319 0.486 0.345 0.774

Notes: ACC = Accuracy; AUC = Area Under the (ROC) Curve; H = H measure (Hand, 2009); PRC = Precision-Recall Curve; RMSE = Root Mean Squared Error; SAR = Squared Error, Accuracy, and ROC (Receiver Operating Characteristic) Area.

When we focus on assessing responsivity to CD treatment, the random forest

models performed best for male and female offenders in both the test and validation sets.

In particular, the validation set results for males indicate the random forests model had

good predictive discrimination (AUC = 0.82) and it yielded a correct classification rate of

87 percent. With an SAR value of 0.79, the random forests model had strong overall

predictive performance. We observed similar findings for females. The random forests

model achieved an accuracy rate of 83 percent, an AUC of 0.84, and a SAR of 0.77.

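Overall, the responsivity assessment models outperformed those for recidivism risk on

accuracy, AUC, RMSE, and SAR, although their PRC values were considerably lower, which likely reflects the relatively low base rate of CD treatment desistance.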

Table 4. Male Prisoner Results

Measures Recidivism Rates Number of Recidivists

Treated N Untreated N Effect Treated Untreated Prevented Total N

Overall 0.492 1,377 0.677 3,950 0.273 677 932 255 5,327

Overall Adjusted 0.492 1,332 0.677 3,950 0.273 655 902 247 5,282

Risk-Needs-Responsivity

1 (Top 25%) 0.593 479 0.856 853 0.307 790 1,140 350 1,332

2 (26-50%) 0.550 350 0.760 982 0.276 733 1,012 279 1,332

3 (51-75%) 0.390 323 0.610 1,009 0.361 519 813 294 1,332

4 (Bottom 25%) 0.320 221 0.530 1,110 0.396 426 705 279 1,331

Risk-Needs

1 (Top 25%) 0.770 184 0.860 1,148 0.105 1,026 1,146 120 1,332

2 (26-50%) 0.630 329 0.720 1,003 0.125 839 959 120 1,332

3 (51-75%) 0.460 490 0.570 842 0.193 613 759 146 1,332

4 (Bottom 25%) 0.280 374 0.480 957 0.417 373 639 266 1,331

Risk-Responsivity

1 (Top 25%) 0.588 466 0.851 866 0.309 783 1,134 351 1,332

2 (26-50%) 0.553 349 0.764 983 0.276 737 1,018 281 1,332

3 (51-75%) 0.448 290 0.662 1,042 0.323 597 882 285 1,332

4 (Bottom 25%) 0.294 272 0.468 1,059 0.372 391 623 232 1,331

Needs-Responsivity

1 (Top 25%) 0.398 732 0.443 600 0.102 530 590 60 1,332

2 (26-50%) 0.549 419 0.634 913 0.134 731 844 113 1,332

3 (51-75%) 0.760 129 0.739 1,203 -0.028 1,012 984 -28 1,332

4 (Bottom 25%) 0.598 97 0.762 1,234 0.215 796 1,014 218 1,331

In Tables 4 and 5, we present the results for the male and female offenders. Here,

we focus only on the classification method that produced the best results. Whereas

random forests yielded the best outcomes for males, it was logistic regression for

females. For both males and females, we analyzed the results according to the four

different schemes for prioritizing offenders for CD treatment. Therefore, we compared

the overall results from the validation set with the four prioritization schemes: 1) risk-

needs, 2) risk-responsivity, 3) needs-responsivity, and 4) risk-needs-responsivity.

Table 5. Female Prisoner Results

Measures Recidivism Rates Number of Recidivists

Treated N Untreated N Effect Treated Untreated Prevented Total N

Overall 0.352 145 0.560 364 0.371 51 81 30 509

Overall Adjusted 0.352 127 0.560 364 0.371 45 71 26 491

Risk-Needs-Responsivity

1 (Top 25%) 0.500 34 0.828 93 0.396 64 105 41 127

2 (26-50%) 0.333 30 0.619 97 0.462 42 79 37 127

3 (51-75%) 0.386 44 0.482 83 0.199 49 61 12 127

4 (Bottom 25%) 0.189 37 0.297 91 0.364 24 38 14 128

Risk-Needs

1 (Top 25%) 0.706 17 0.855 110 0.174 90 109 19 127

2 (26-50%) 0.571 21 0.581 106 0.017 73 74 1 127

3 (51-75%) 0.333 48 0.354 79 0.059 42 45 3 127

4 (Bottom 25%) 0.186 59 0.290 69 0.359 24 37 13 128

Risk-Responsivity

1 (Top 25%) 0.529 34 0.817 93 0.353 67 104 37 127

2 (26-50%) 0.286 28 0.616 99 0.536 36 78 42 127

3 (51-75%) 0.439 41 0.512 86 0.143 56 65 9 127

4 (Bottom 25%) 0.167 42 0.267 86 0.374 21 34 13 128

Needs-Responsivity

1 (Top 25%) 0.261 69 0.293 58 0.109 33 37 4 127

2 (26-50%) 0.349 43 0.417 84 0.163 44 53 9 127

3 (51-75%) 0.632 19 0.648 108 0.025 80 82 2 127

4 (Bottom 25%) 0.429 14 0.719 114 0.403 55 92 37 128

In Table 4, the results show there were 1,377 treated offenders and 3,950

untreated offenders. The three-year reconviction rate was 49.2 percent for the treated and

67.7 percent for the untreated. The treated rate was therefore 27 percent lower than the

untreated rate. Among the 1,377 who were treated, there were 677 recidivists. If the

1,377 treated offenders had not been treated and their rate had been 67.7 percent, then 932

would have been recidivists. CD treatment is therefore estimated to have prevented 255 recidivists. We also

show overall adjusted figures that assume one-fourth of the validation set (N = 1,332), the size of each

prioritization quartile, participated in CD treatment.

In Table 5, the results for females show there were 145 treated offenders and 364

untreated offenders. The three-year reconviction rate was 35.2 percent for the treated and

56.0 percent for the untreated. The treated rate was therefore 37 percent lower than the

untreated rate. Among the 145 who were treated, there were 51 recidivists. If the 145

treated offenders had not been treated and their rate was 56.0 percent, then 81 would have

been recidivists. CD treatment is therefore estimated to have prevented 30 recidivists. We also show

overall adjusted figures that assume one-fourth of the validation set (N = 127) participated in CD treatment.

As shown in Tables 4 and 5, the risk-needs-responsivity (RNR) scheme

performed the best for both males and females, followed by risk-responsivity (RR), risk-

needs (RN), and needs-responsivity (NR). For males, the RNR and RR schemes

performed roughly the same, while the RNR scheme for females was clearly better than

the RR scheme. For both males and females, the RNR and RR prioritization schemes

increased the number of prevented recidivists while preserving the treatment effect size.

To illustrate, when we focus on the RNR scheme for males, we see that the effect

size among the top one-fourth (30.7 percent reduction) is actually a little larger than it is

for the overall sample (27.3 percent reduction). Moreover, because the RNR scheme

effectively isolated the higher-risk offenders, it prevented a larger number of recidivists.

Indeed, the number of prevented recidivists in the RNR scheme (350) was more than 100

higher than the number (247) for the overall adjusted sample. Focusing on the RNR

scheme for females, we see the effect size (39.6 percent) is larger than the overall effect

size (37.1 percent). The number of recidivists prevented (41) is also 15 higher than that

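observed (26) for the overall adjusted sample. Combined across males and females, the RNR scheme accounted for

118 additional prevented recidivists (103 for males and 15 for females), whereas the RR scheme accounted for 115

additional prevented recidivists (104 for males and 11 for females).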

In contrast, neither the NR nor RN schemes performed well for either males or

females. For example, the NR scheme would produce an estimated 207 fewer prevented

recidivists, whereas the RN scheme yielded 133 fewer prevented recidivists. Although

the NR scheme may have performed well in identifying who needs CD treatment and

who would benefit from it the most, it is still important to account for recidivism risk.

Likewise, the RN scheme was effective in identifying higher-risk offenders, but it did not

perform well in identifying those who would benefit from CD treatment. Recidivism

rates were higher for offenders in the upper quartiles, but the treatment effect size was

smaller. It is possible that for some higher-risk offenders, CD treatment alone is

insufficient to help bring about desistance. These offenders may need another

intervention or, more precisely, multiple interventions. Assessing for responsivity,

however, helps identify who would benefit the most from CD treatment, even among the

higher-risk offenders.

In Table 6, we estimate the overall impact that each prioritization scheme might

have on recidivism. Combined, the male and female validation sets included 5,836

released prisoners, of whom 1,522 were treated. The recidivism rate was 61.8 percent for

all 5,836 offenders, resulting in 3,606 recidivists. If none of the 1,522 had been treated, the

estimated rate would have been 66.7 percent, resulting in 3,891 recidivists. The

current prioritization scheme used by the MnDOC thus yielded 285 prevented recidivists (3,891 minus 3,606).

Table 6. Overall Results

N Treated Recidivists Rate Prevented Recidivists NNT

No Treatment 5,836 0 3,891 66.7%

Current State 5,836 1,522 3,606 61.8% 285 5.34

Needs-Responsivity 5,836 1,522 3,826 65.6% 65 23.42

Risk-Needs 5,836 1,522 3,749 64.2% 142 10.72

Risk-Responsivity 5,836 1,522 3,487 59.8% 404 3.77

Risk-Needs-Responsivity 5,836 1,522 3,481 59.7% 410 3.71

NNT = Number Needed to Treat

The number needed to treat (NNT) is a statistic that has been used, often in

epidemiology, to measure the efficacy of different types of treatment. NNT quantifies the

number of participants who would need to participate in an intervention in order to

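produce one beneficial outcome. The NNT formula for this study is: 1 / (recidivism rate

for untreated prisoners – recidivism rate for treated prisoners), which is equivalent to the number of treated prisoners divided by the number of prevented recidivists. With 1,522 receiving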

treatment, the number needed to treat (NNT) to achieve one desistor was 5.34.
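The NNT values in Table 6 can be recomputed from the number treated and the number of prevented recidivists. The helper below is a hypothetical sketch: it divides 1 by the absolute reduction in recidivism among the treated. This is equivalent to dividing the 1,522 treated prisoners by each scheme's prevented recidivists.

```python
def number_needed_to_treat(n_treated: int, prevented: float) -> float:
    """NNT = 1 / absolute risk reduction, where the reduction is prevented / treated."""
    if prevented <= 0:
        raise ValueError("NNT is undefined when no recidivists are prevented")
    absolute_risk_reduction = prevented / n_treated
    return 1 / absolute_risk_reduction


# Prevented recidivists by scheme, taken from Table 6
table6 = {
    "Current State": 285,
    "Needs-Responsivity": 65,
    "Risk-Needs": 142,
    "Risk-Responsivity": 404,
    "Risk-Needs-Responsivity": 410,
}
for scheme, prevented in table6.items():
    print(f"{scheme}: {number_needed_to_treat(1522, prevented):.2f}")
# Prints 5.34, 23.42, 10.72, 3.77, and 3.71, matching the NNT column in Table 6
```

A smaller NNT means fewer prisoners must be treated to produce one additional desistor, which is why the RNR and RR schemes compare favorably.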

When we examine the overall impact for each of the four prioritization schemes,

we see that both the NR and RN schemes performed worse than the current scheme used

by the MnDOC. The NR model achieved 65 desistors for a NNT of 23.42, whereas the

RN scheme was slightly better with 142 desistors and a NNT of 10.72. In contrast, the

RR model netted 404 desistors, resulting in a NNT of 3.77. The RNR model yielded 410

desistors, resulting in a NNT of 3.71. Compared to the current scheme used by the

MnDOC, the RNR model would produce 125 more desistors, lowering the recidivism

rate by a little more than two percentage points.

Conclusion

Often consisting of little more than a checklist of items, the assessment of

responsivity has been the neglected “R” in the RNR model. Here we introduced a more

rigorous, actuarial approach for assessing responsivity by attempting to predict which

prisoners would desist from crime after participating in a correctional intervention. The

responsivity assessment we presented in this study not only accounts for the efficacy of

an intervention, but it can also be combined with risk and needs assessments to

potentially produce better treatment assignments.

The results showed the responsivity assessments had relatively high levels of

predictive performance for male and female prisoners. More important, however, the

findings suggest that including an actuarial assessment for responsivity can help further

enhance the effectiveness of an effective intervention. We observed the best recidivism

outcomes when we combined the responsivity assessments with those for risk and needs.

Prioritizing the highest risk and need offenders who would likely benefit the most from

CD treatment increased the treatment effect size, improved the NNT metric, and lowered

the overall recidivism rate by two percentage points. Even though the prevention of more

than 100 individuals from becoming recidivists may not seem substantial, a reduction of

this magnitude is notable because crime is costly. Indeed, the costs resulting from crime

include victimization costs, criminal justice system (law enforcement, courts, and

corrections) costs, offender lost productivity, and public willingness-to-pay costs (Cohen

and Piquero, 2009). Although property offenses generally incur a relatively low cost, it

has been estimated that a violent crime such as a sex offense can cost society up to half a

million dollars and that a single murder costs between $10 and $20 million

(in 2018 dollars) (Cohen and Piquero, 2009; DeLisi, Kosloski, Sween, Hachmeister,

Moore, and Drury, 2010; McCollister, French, and Fang, 2010).

While the findings suggest that using actuarial responsivity assessments may help

maximize the public safety benefits from effective interventions by prioritizing offenders

more effectively, several limitations are worth highlighting. Most notably, we examined

only one intervention (CD treatment) for one needs area (substance abuse) for prisoners

from one jurisdiction (Minnesota). In addition, we examined only two types of

classification methods (logistic regression and random forests), and we used a very

simplistic, summative approach for combining the risk, needs, and responsivity

assessments. Therefore, the extent to which the findings presented here, which should be

considered preliminary, are generalizable is unclear. Still, because the findings are

promising, below we discuss the implications they may have for correctional research,

policy, and practice.

First, the results suggest that factors commonly associated with recidivism, such

as criminal history, gang affiliation, or marital status, may also have an impact on

responsivity. Indeed, it is worth reiterating that our responsivity assessment models had

better predictive performance than those for recidivism. Therefore, factors affecting

responsivity to correctional interventions may not only include those typically considered

such as gender, culture, language, and motivation, but also those more commonly

associated with recidivism risk.

Second, in addition to considering factors normally associated with recidivism,

the approach for assessing responsivity we introduced here has the advantage of helping

empirically determine whether an intervention would be effective in reducing recidivism

for individual offenders. Within the current RNR framework, offenders are assigned to

interventions on the basis of risk, needs and, in some instances, responsivity. It is

generally unclear, however, whether the intervention is actually effective or, even if it is,

whether the individual would benefit from the intervention. Just because the literature

indicates that prison-based drug treatment is generally effective does not mean that a

specific drug treatment program will be effective in reducing recidivism. After all, issues

such as a lack of program integrity can compromise the effectiveness of a correctional

intervention (Duwe and Clark, 2015). Yet, by assigning individuals to effective

interventions that are, in turn, the best interventions for those individuals, the use of an

actuarial approach for assessing responsivity holds the potential of delivering better

recidivism outcomes overall.

Third, even though the RNR model recommends assigning offenders on the basis

of risk, needs, and responsivity, treatment assignment decisions are often made strictly on

the basis of risk and needs due to the absence of any formal assessments for responsivity.

As such, offenders who are prioritized for programming are those with the highest risk

and needs. Our findings suggest, however, that assigning offenders strictly on the basis of

risk and needs may not deliver the desired results. Indeed, when we assigned offenders

just on the basis of risk and needs, we observed a reduced effect size for CD treatment, a

higher NNT, and fewer prevented recidivists. What these findings suggest is that many of

the highest-risk individuals may be too entrenched in a criminal lifestyle to desist as a

result of participating in CD treatment. While CD treatment may be enough to get lower-

risk prisoners to desist, more programming is needed for the higher-risk offenders. This

finding is consistent with the notion that greater doses of programming (i.e., multiple

interventions that address multiple needs areas) are needed for the highest-risk offenders

to help bring about desistance (Lowenkamp and Latessa, 2005).

Finally, notwithstanding the focus on a single correctional intervention in this

study, we suggest that simultaneously assessing responsivity to multiple interventions

may yield the greatest benefits. Correctional agencies typically have more than one

intervention to offer offenders and, as noted above, a single intervention may be

insufficient to bring about desistance for those with a higher risk for recidivism.

Therefore, the goal should involve conducting responsivity assessments for all

interventions an agency may have to provide offenders.

For example, let us assume a corrections agency has five interventions to which

offenders can be assigned on the basis of a risk and needs assessment. Responsivity

assessments for each of the five interventions may help better identify which programs

would work best for each individual offender. Moreover, for the higher-risk offenders

with longer confinement periods, which would allow for participation in multiple

programs, the responsivity assessment could evaluate which combinations of

interventions would most likely lead to desistance.

To illustrate, let us assume we have a very high-risk individual who will be in

prison for two years, which is ample time to participate in multiple interventions. Let us

further assume a single intervention is unlikely to result in desistance for this individual.

If completing, say, CD treatment is unlikely to help this individual desist, what would his

probability for desistance be after completing CD treatment and an employment program

or cognitive-behavioral therapy? Responsivity assessments to multiple interventions

might reveal the best combination of programming for this individual and, in doing so,

would help deliver better recidivism outcomes overall.

As indicated by the limitations noted earlier, this study should be considered a

first step towards taking a more rigorous, actuarial approach to responsivity assessment.

Future research should examine whether this approach is effective for other types of

interventions for different offender populations in other jurisdictions. Along the same

lines, future studies should look at whether actuarial responsivity assessments can

accommodate multiple interventions so as to identify which intervention might work best

for an individual or whether multiple interventions are needed to achieve desistance for

higher-risk offenders. In addition, because we used a simple summative approach in

combining the values from the risk, needs, and responsivity assessments, future research

should examine whether there are more effective procedures for consolidating values into

a composite score.

If an actuarial approach for assessing responsivity is proven to be viable and

generalizable, there would undoubtedly be questions about how best to implement this

approach in practice. Given the reliance on historical programming data to assess

responsivity, the method we introduced here would seem to favor a more customized

assessment process that is specific to an agency and the programming it provides. This

does not mean, however, that a more generic actuarial responsivity assessment could not

be developed and integrated with global, off-the-shelf risk and needs assessments that are

used across multiple jurisdictions. Regardless of whether a valid and reliable generic

assessment can be developed, our findings suggest that actuarial responsivity assessment

is an area in need of more research in the future due to the potential impact it could have

on the programming assignment process and, more broadly, public safety.

REFERENCES

Barnes, G.C. & Hyatt, J.M. (2012). Classifying adult probationers by forecasting future

offending. Washington, DC: National Institute of Justice.

Berk, R.A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior:

A comparative assessment. Criminology & Public Policy, 12, 513-544.

Bonta, J. & Andrews, D.A. (2007). Risk-Needs-Responsivity Model for Offender

Assessment and Rehabilitation. Ottawa: Public Safety Canada.

Bonta, J., Wallace-Capretta, S., & Rooney, J. (2000). A Quasi-Experimental Evaluation of

an Intensive Rehabilitation Supervision Program. Criminal Justice and Behavior,

27, 312-329.

Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.

Brennan, T., Dieterich, W., & Ehret, B. (2009). Evaluating the predictive validity of the

COMPAS risk and needs assessment system.

Brennan, T., & Oliver, W.L. (2000). Evaluation of Reliability and Validity of COMPAS

Scales: National Aggregate Sample. Traverse City, MI: Northpointe Institute for

Public Management.

Burgess, E.W. (1928). Factors determining success or failure on parole. In A.A. Bruce,

E.W. Burgess, J. Landesco, & A.J. Harno (Eds.), The workings of the

indeterminate sentence law and the parole system in Illinois, (pp. 221–234).

Springfield, IL: Illinois State Board of Parole.

Caruana, R., Niculescu-Mizil, A., Crew, G., & Ksikes, A. (2004). Ensemble selection

from libraries of models, in Proceedings of the 21st International Conference on

Machine Learning, Canada: Banff, 1-12.

Caruana, R. & Niculescu-Mizil, A. (2006). An empirical comparison of supervised

learning algorithms using different performance metrics, in Proceedings of the

23rd International Conference on Machine Learning, New York: Association for

Computing Machinery, 161-168.

Cohen, M.A., & Piquero, A.R. (2009). New evidence on the monetary value of saving a

high risk youth. Journal of Quantitative Criminology, 25, 25-49.

Cullen, F.T. (2002). Rehabilitation and treatment programs. In J.Q. Wilson & J.

Petersilia (Eds.), Crime: Public policies for crime control (2nd ed.). San

Francisco: ICS Press.

Davis, J. & Goadrich, M. (2006). The relationship between precision-recall and ROC

curves, in Proceedings of the 23rd International Conference on Machine

Learning, New York: Association for Computing Machinery, 233-240.

DeLisi, M., Kosloski, A., Sween, M., Hachmeister, E., Moore, M., & Drury, A. (2010).

Murder by numbers: Monetary costs imposed by a sample of homicide offenders.

The Journal of Forensic Psychiatry & Psychology, 21, 501-513.

Duwe, G. (2010). Prison-based chemical dependency treatment in Minnesota: An

outcome evaluation. Journal of Experimental Criminology, 6, 57-81.

Duwe, G. (2012). Predicting first-time sexual offending among prisoners without a prior

sex offense history: The Minnesota Sexual Criminal Offending Risk Estimate

(MnSCORE). Criminal Justice and Behavior, 39, 1434-1454.

Duwe, G. (2014). The development, validity, and reliability of the Minnesota Screening

Tool Assessing Recidivism Risk (MnSTARR). Criminal Justice Policy Review,

25, 579-613.

Duwe, G. & Clark, V. (2015). Importance of program integrity: Outcome evaluation of a

gender-responsive, cognitive-behavioral program for female offenders.

Criminology & Public Policy, 14, 301-328.

Duwe, G. & Freske, P. (2012). Using logistic regression modeling to predict sex offense

recidivism: The Minnesota Sex Offender Screening Tool-3 (MnSOST-3). Sexual

Abuse: A Journal of Research and Treatment, 24, 350-377.

Duwe, G. & Kim, K. (2016). Sacrificing accuracy for transparency in recidivism

risk assessment: The impact of classification method on predictive performance.

Corrections: Policy, Practice and Research, 1, 155-176.

Gottfredson, S.D. & Moriarty, L.J. (2006). Statistical risk assessment: Old problems and

new applications. Crime and Delinquency, 52(1), 178–200.

Hamilton, Z., Neuilly, M-A., Lee, S., & Barnoski, R. (2014). Isolating modeling effects

in offender risk assessment. Journal of Experimental Criminology. DOI:

10.1007/s11292-014-9221-8.

Hand, D. J. (2009). Measuring classifier performance: a coherent alternative to the area

under the ROC curve. Machine Learning, 77, 103-123.

Hess, J. & Turner, S. (2013). Risk Assessment Accuracy in Corrections Population

Management: Testing the Promise of Tree Based Ensemble Predictions. Center

for Evidence-Based Corrections: The University of California, Irvine.

Liu, Y.Y., Yang, M., Ramsay, M., Li, X.S., & Coid, J.W. (2011). A comparison of

logistic regression, classification and regression tree, and neural network models

in predicting violent re-offending. Journal of Quantitative Criminology, 27, 547-

573.

Lowenkamp, C.T. & Latessa, E.J. (2005). Increasing the effectiveness of correctional

programming through the risk principle: Identifying offenders for residential

placement. Criminology and Public Policy, 4, 501-528.

Lowenkamp, C.T. & Whetzel, J. (2009). The development of an actuarial risk assessment

instrument for U.S. Pretrial Services. Federal Probation, 73, 33-36.

McCollister, K.E., French, M.T., & Fang, H. (2010). The cost of crime to society: New

crime-specific estimates for policy and program evaluation. Drug and Alcohol

Dependence, 108, 98-109.

Ridgeway, G. (2013). The Pitfalls of Prediction. National Institute of Justice Journal,

Issue No. 271.

Smith, W. (1996). The effects of base rate and cutoff point choice on commonly used

measures of association and accuracy in recidivism research. Journal of

Quantitative Criminology, 12, 83-111.

Tollenaar, N., & van der Heijden, P.G.M. (2013). Which method predicts recidivism

best? A comparison of statistical, machine learning and data mining predictive

methods. Journal of the Royal Statistical Society: Series A, 176(2), 565-584.

Wexler, H.K., Falkin, G.P. & Lipton, D.S. (1990). Outcome evaluation of a prison

therapeutic community for substance abuse treatment. Criminal Justice and

Behavior, 17, 71-92.

Wolpert, D.H. (1996). The lack of a priori distinctions between learning algorithms.

Neural Computation, 8, 1341-1390.
