
Gender Differences in Structured Risk Assessment: Comparing the Accuracy of Five Instruments

Jan 30, 2023


Jeremy Coid, Min Yang, Simone Ullrich, and Tianqiang Zhang
Barts and the London School of Medicine and Dentistry, Queen Mary University of London, Forensic Psychiatry Research Unit

Steve Sizmur
Ministry of Justice (formerly the Home Office), England and Wales

Colin Roberts
University of Oxford

David P. Farrington
University of Cambridge

Robert D. Rogers
University of Oxford

Structured risk assessment should guide clinical risk management, but it is uncertain which instrument has the highest predictive accuracy among men and women. In the present study, the authors compared the Psychopathy Checklist—Revised (PCL–R; R. D. Hare, 1991, 2003); the Historical, Clinical, Risk Management–20 (HCR-20; C. D. Webster, K. S. Douglas, D. Eaves, & S. D. Hart, 1997); the Risk Matrix 2000–Violence (RM2000[V]; D. Thornton et al., 2003); the Violence Risk Appraisal Guide (VRAG; V. L. Quinsey, G. T. Harris, M. E. Rice, & C. A. Cormier, 1998); the Offenders Group Reconviction Scale (OGRS; J. B. Copas & P. Marshall, 1998; R. Taylor, 1999); and the total previous convictions among prisoners, prospectively assessed prerelease. The authors compared predischarge measures with subsequent offending and ranked the instruments using multivariate regression. Most instruments demonstrated significant but moderate predictive ability. The OGRS ranked highest for violence among men, and the PCL–R and HCR-20 H subscale ranked highest for violence among women. The OGRS and total previous acquisitive convictions demonstrated greatest accuracy in predicting acquisitive offending among men and women. Actuarial instruments requiring no training to administer performed as well as personality assessment and structured risk assessment and were superior among men for violence.

Keywords: risk assessment, gender differences, actuarial instruments

Structured risk-assessment instruments outperform clinical judgment for the prediction of subsequent violence and sexual behavior (Grove, Zald, Lebow, Snitz, & Nelson, 2000; Hanson & Bussiere, 1996; Hanson & Morton-Bourgon, 2004; Hood, Shute, Feilzer, & Wilcox, 2002; McNeil, Sandberg, & Binder, 1998). However, most instruments have been standardized on samples of male prisoners and psychiatric patients. It remains unclear whether risk-assessment instruments demonstrate acceptable levels of predictive accuracy for use with female offender populations, and if so, which instrument should be recommended for routine use, and for which offending outcomes.

Current lack of knowledge over risk prediction among women may be partly due to the overrepresentation of men in prisons and secure hospitals where the development of risk-assessment instruments has been undertaken and where large samples are required to achieve adequate reliability. It may also be due to the fact that women offenders reoffend at a lower rate than men, according to official statistics (Coid, Hickey, Kahtan, Zhang, & Yang, 2007; Cuppleditch & Evans, 2005; Kershaw, Goodman, & White, 1999). Furthermore, certain actuarial instruments that have been developed on mixed sex samples ascribe a negative weighting to female gender in terms of future risk of offending (Copas & Marshall, 1998). Compared with men, women offenders may not be seen as a “problem” in terms of future risk to the public.

The majority of studies on risk of violence in women offenders have been carried out with the Psychopathy Checklist—Revised (PCL–R; Hare, 1991, 2003) or the Psychopathy Checklist: Screening Version (PCL:SV; Hart, Cox, & Hare, 1995). It was demonstrated that the interpersonal and affective factor was moderately correlated with violent recidivism but not the social deviance component (Salekin, Rogers, Ustad, & Sewell, 1998). Gender differences were reported by Strand and Belfrage (2005), who found that female psychopaths displayed significantly more lying, deceitfulness, and lack of control, whereas male psychopaths demonstrated more antisocial features. It has been suggested that correlates of psychopathy in women relevant to risk assessment for crime and violence tend to be modest (Nicholls, Ogloff, Brink, & Spidel, 2005) and that it may be questionable whether the instruments that are used to assess psychopathy are tapping the same construct across gender (Forouzan & Cooke, 2005). Using psychopathy in violence risk assessments with women should therefore be approached with caution (Falkenbach, 2008).

Jeremy Coid, Min Yang, Simone Ullrich, and Tianqiang Zhang, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, Forensic Psychiatry Research Unit, London, England; Steve Sizmur, Ministry of Justice (formerly the Home Office), England and Wales, Westminster, London; Colin Roberts, Centre for Criminology, University of Oxford, Oxford, England; David P. Farrington, Institute of Criminology, University of Cambridge, Cambridge, England; Robert D. Rogers, Department of Psychiatry, University of Oxford.

Steve Sizmur is now at the Picker Institute, Oxford, England.

This study was funded by the Ministry of Justice (formerly the Home Office), England and Wales.

Correspondence concerning this article should be addressed to Jeremy Coid, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, Forensic Psychiatry Research Unit, William Harvey House, 61 Bartholomew Close, London EC1A 7BE, United Kingdom. E-mail: [email protected]

Journal of Consulting and Clinical Psychology, 2009, Vol. 77, No. 2, 337–348. © 2009 American Psychological Association. 0022-006X/09/$12.00. DOI: 10.1037/a0015155

One study used a modified version of the Violence Risk Appraisal Guide (VRAG; Quinsey, Harris, Rice, & Cormier, 1998) in a nonforensic clinical sample for prediction of self-reported violence. The modified version yielded a large effect size in the prediction of postdischarge severe violence over 20- and 50-week periods, and the accuracy of the VRAG predictions was unrelated to gender (Harris, Rice, & Camilleri, 2004).

Two studies examining the predictive ability of the Historical, Clinical, Risk Management–20 (HCR-20; Webster, Douglas, Eaves, & Hart, 1997) that have reported data for women separately did not find significant differences between men and women (Strand & Belfrage, 2001; Webster et al., 1997; Webster, Eaves, Douglas, & Wintoup, 1995). However, a more recent study carried out in a Dutch forensic psychiatric hospital (de Vogel & de Ruiter, 2005) showed that the HCR-20 demonstrated much lower predictive accuracy for violent outcome in women compared with men and that only the final risk judgment, and not the total score, was a significant predictor of violence.

Unfortunately, most previous studies that have compared the predictive ability of two or more instruments have been restricted to men (Belfrage, Franson, & Strand, 2000; Douglas, Yeomans, & Boer, 2005; Glover, Nicholson, Hemmati, Bernfeld, & Quinsey, 2002; Grevatt, Thomas-Peter, & Hughes, 2004; Kroner & Loza, 2001; Kroner & Mills, 2001; Mills & Kroner, 2006; Morrissey et al., 2007; Snowden, Gray, Taylor, & MacCulloch, 2007; Stadtland et al., 2005) or have combined their male and female participants for the purpose of statistical analysis when comparing instruments (de Vogel, de Ruiter, Hildebrand, Bos, & van de Ven, 2004; Doyle & Dolan, 2006; Doyle, Dolan, & McGovern, 2002; Grann, Belfrage, & Tengstrom, 2000; Gray et al., 2003, 2004). Warren et al. (2005) compared the PCL–R and HCR-20 in a retrospective study according to their associations with previous offending in a sample of female maximum security inmates. They did not demonstrate any differences between the two instruments, but this study did not include male participants. The study of Nicholls, Ogloff, and Douglas (2004) is the only one currently available that has compared male and female offenders on more than one instrument. They compared the HCR-20, PCL:SV, and the Violence Screening Checklist (VSC; McNeil & Binder, 1994) in a sample of involuntarily hospitalized male and female psychiatric patients and observed significantly higher mean total scores on each of the three instruments among men compared with women. Retrospective examination of medical and correctional records did not demonstrate predictive ability above a level of chance among men on any of the three instruments for inpatient violence. However, the HCR-20 and PCL:SV, together with their subscales, demonstrated good to moderate predictive ability among women (the VSC failed to predict above chance and failed to predict community violence above a level of chance among both men and women). When examining the predictive ability of the instruments for violent crime, the HCR-20 and PCL:SV generally fared better among women for total scores and certain subscales, according to Area Under the receiver operating characteristic Curve (AUC) values. Nevertheless, examination of the confidence intervals in Nicholls et al.’s data indicates that they were not significantly better among women. A similar pattern was observed for a combined category of any crime, in which the PCL:SV total and Part 2 scores just achieved a level of statistical significance above that of men.

Nicholls et al.’s (2004) study has important implications for the present study. First, certain instruments may be better predictors among participants of one gender than another. Second, instruments that include subscales, such as the PCL:SV and HCR-20, may show differing predictive effects between genders. Third, the predictive ability of individual instruments may differ between men and women according to outcome, for example, different categories of criminal conviction or violent incidents in institutions and the community.

We compared the predictive accuracy of five risk-assessment instruments for violent and other offending behaviors among a sample of male and female prisoners released from prisons in England and Wales. Structured risk assessment is broadly divided into three classifications: (a) structured risk-assessment guides, (b) personality assessment, and (c) actuarial methods (Gray et al., 2004). These instruments can numerically quantify or stratify risk for individuals or estimate the probability of a future incident by referencing to groups with known reoffending rates. This can guide subsequent clinical management. We therefore included instruments in this study from each of these three classifications, irrespective of whether they had been originally standardized on male only or mixed gender samples. In addition, we created three simple measures of risk by adding together the total number of previous convictions for the categories of violence (homicide, major violence, minor violence, and weapons offenses), acquisitive (burglary, theft, receiving, forgery, deception, and obtaining pecuniary advantage), and a combined category of any reconviction. Our intention was to investigate whether the five instruments could improve on the predictive ability of these simple measures for future criminal behavior and whether there were differences according to gender.
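The three simple measures described above amount to counting previous convictions within fixed offense groupings. The following is a minimal sketch of that tally; the offense labels follow the groupings named in the text, but the record format (a flat list of offense labels per participant) and the function name are hypothetical, and the "any" count is assumed to be the total of all previous convictions.

```python
from collections import Counter

# Groupings as named in the text; the exact labels are illustrative assumptions.
VIOLENCE = {"homicide", "major violence", "minor violence", "weapons"}
ACQUISITIVE = {"burglary", "theft", "receiving", "forgery", "deception",
               "obtaining pecuniary advantage"}


def simple_risk_measures(previous_convictions):
    """Count one participant's previous convictions per category.

    previous_convictions is a hypothetical flat list of offense labels,
    one entry per conviction.
    """
    counts = Counter(previous_convictions)
    violence = sum(n for offense, n in counts.items() if offense in VIOLENCE)
    acquisitive = sum(n for offense, n in counts.items() if offense in ACQUISITIVE)
    # Assumption: the combined category counts every previous conviction.
    return {"violence": violence,
            "acquisitive": acquisitive,
            "any": len(previous_convictions)}


history = ["theft", "theft", "minor violence", "burglary", "drug supply"]
measures = simple_risk_measures(history)
```

Each count can then be treated as a predictor in its own right and compared with the instrument scores.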

Method

We carried out a prospective study of a cohort of male and female prisoners in England and Wales released between November 14, 2002, and October 13, 2005, in the case of men and between November 14, 2002, and February 9, 2007, in the case of women. Participants were interviewed during the 6–12 month period before their expected date of release by trained interviewers using a battery of clinical and risk-assessment measures for violent and other criminal behavior. The dependent variable was the proportion of participants who were or were not reconvicted, within different categories of offending behavior, derived from criminal records. Reoffending was measured following their release into the community over a mean follow-up of 1.97 years (SD = 0.48; range = 8–819 days for time at risk) among men and 1.40 years (SD = 0.86; range = 7–1,317 days for time at risk) among women. Only 15 men and 1 woman (0.96%) reoffended or were recalled to prison because of breach of conditions of parole license within 1 month of release. These participants were included to avoid bias against participants with special characteristics that had led to rapid reoffending.

Sample

Prisoners were sampled from the Prison Service Inmate Information System if they met the following criteria: (a) serving a prison sentence of 2 years or more for a sexual or violent principal offense (excluding life sentence prisoners), (b) being 18 years of age or older, and (c) having 1 year left to serve. Information was provided on previous criminal history from the Home Office Offenders Index on all prisoners in England and Wales meeting these criteria. On the basis of their current and previous convictions, we identified a stratified sample of 3,143, with overselection of prisoners who were either from ethnic minority groups or younger age groups, or who were potentially high-risk offenders, identified using the highest scoring 10% on the Offenders Group Reconviction Scale (OGRS; Copas & Marshall, 1998; Taylor, 1999). Because the population of women prisoners was much smaller, stratification was not applied, and all women (n = 391) meeting the inclusion criteria were selected.

Among the male selected sample, 663 (21.1%) refused to participate, and 1,081 (34.4%) could not be interviewed, were found to be unsuitable for inclusion, had died, or had been deported before they left prison. Among the female sample, the numbers not interviewed for the same reasons were much smaller: 35 (9.0%) and 35 (9.0%), respectively. In 1,116 cases (men and women), an interview was requested, but the majority of those (n = 963; 86.3%) were released from prison before the interview could be arranged.

A total of 1,396 male and 321 female prisoners were interviewed by 12 research assistants (psychology graduates), who usually spent a day completing the assessments for each participant, first reading and extracting data from prison files and then carrying out an interview lasting 3–4 hours. The interview initially established the criminal history and nature of the index offense. The Structured Clinical Interview for DSM–IV Axis II Personality Disorders (First, Gibbon, Spitzer, Williams, & Benjamin, 1997) and modules to identify current or lifetime schizophrenia or delusional disorder, substance misuse disorders, and depressive disorders were administered, followed by the risk-assessment instruments. Participants who were not released from prison during the follow-up period (17 women and 43 men) were excluded.

Risk Measures

Measures evaluated included some of those in use in the Dangerous and Severe Personality Disorder pilot services for England and Wales and in the probation services in the United Kingdom. We developed a semistructured interview to collect all relevant data using the instruments. Rating the risk-assessment instruments required exploration of criminal history, which was made available to the researcher before interview. Researchers were trained in administration and scoring of all risk-assessment instruments except the OGRS, which was obtained from computerized records.

The PCL–R (Hare, 1991, 2003) assessed psychopathy as part of a comprehensive clinical assessment of personality. It consists of 20 items scored 0, 1, or 2 on the basis of clinical interview and review of file information. Item scores are summed to create a total score ranging from 0 to 40 and to reflect an estimate of the degree to which an individual matches the prototypical psychopath at a cutoff of 30. Although not originally developed as a risk-assessment instrument, two meta-analyses have demonstrated that the PCL–R is a strong predictor of violent recidivism (Hemphill, Hare, & Wong, 1998; Salekin, Rogers, & Sewell, 1996). This has resulted in psychopathy, as measured by the PCL–R, being included as a risk factor within other risk-assessment instruments, such as the HCR-20 and VRAG (see below).
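The scoring arithmetic described for the PCL–R can be sketched as follows. This is only an illustration of the summation and the cutoff stated in the text: the function names are hypothetical, the handling of omitted items is left out, and reading "a cutoff of 30" as a total of 30 or more is an assumption.

```python
def pclr_total(item_scores):
    """Sum 20 item scores (each 0, 1, or 2) into a total between 0 and 40."""
    if len(item_scores) != 20:
        raise ValueError("expected 20 item scores")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, or 2")
    return sum(item_scores)


def at_or_above_cutoff(total, cutoff=30):
    """Assumption: the cutoff of 30 is read as 'total of 30 or more'."""
    return total >= cutoff


example_items = [2] * 12 + [1] * 5 + [0] * 3   # illustrative ratings only
example_total = pclr_total(example_items)       # 12*2 + 5*1 = 29
```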

The VRAG (Quinsey et al., 1998) is a 12-item actuarial instrument developed from files of male criminal offenders and forensic patients with attributed integer weights, ranging from –5 to +12. The instrument was designed for use with forensic populations; items require rating of the index offense, psychopathy, alcohol use, and past nonviolent crime.

The HCR-20 (Webster et al., 1997) is a structured risk-assessment guide and composite of 20 risk factors for future violence in adult offenders with a violent history and/or a major mental disorder or personality disorder. The instrument is divided into three subscales with 10 historical items relating to past, relatively stable violence risk factors; 5 clinical items reflecting current, dynamic correlates of violence that are thought to be changeable; and 5 risk-management items focusing on situational factors that might aggravate or mitigate risk. In this study, clinical and risk management items were rated prior to release on the basis of clinical presentation and anticipated situational factors. The HCR-20 was included as total score and subscale scores in subsequent analyses.
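Because the HCR-20 entered the analyses both as a total score and as three subscale scores, the split of the 20 items can be sketched as below. The item ordering (10 historical, then 5 clinical, then 5 risk-management items) and the function name are assumptions made for illustration, and the ratings are invented.

```python
def hcr20_scores(item_ratings):
    """Split 20 item ratings into H, C, and R subscale sums plus a total.

    Assumes the ratings are ordered: 10 historical, 5 clinical,
    5 risk-management items.
    """
    if len(item_ratings) != 20:
        raise ValueError("expected 20 item ratings")
    historical = sum(item_ratings[:10])
    clinical = sum(item_ratings[10:15])
    risk_management = sum(item_ratings[15:])
    return {"H": historical,
            "C": clinical,
            "R": risk_management,
            "total": historical + clinical + risk_management}


ratings = [2, 1, 0, 2, 1, 1, 0, 2, 1, 2,   # historical items
           1, 0, 1, 2, 0,                  # clinical items
           1, 1, 0, 2, 1]                  # risk-management items
scores = hcr20_scores(ratings)
```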

The Risk Matrix 2000–Violence (RM2000[V]; Thornton et al., 2003) is intended for use with men 21 years of age or older, but it was applied to participants 18 years of age and older in this study. The instrument was developed as a simple, cost-effective, actuarial predictor of violence on the basic premise that most criminal behavior is predictable from a simple combination of age and some indicators of prior offending of the type being predicted (Friendship, Thornton, Erikson, & Beech, 2001). The RM2000(V) includes only three items: (a) age at commencement of risk (age when next able to offend, i.e., on release), (b) violent appearances in court that led to conviction, and (c) any burglaries. The individual can score between 0 and 6.

The OGRS–II (Copas & Marshall, 1998; Taylor, 1999) is a criminogenic actuarial instrument based solely on history of offending and certain demographic variables. The OGRS–II estimates the probability that offenders will be reconvicted of any offense within 2 years of release on the basis of nine variables (e.g., age, gender, current and previous offenses, and rate of conviction). It does not use clinical judgment, and estimates of reliability are not necessary as all ratings are computer generated. The score cannot be calculated for persons without previous convictions, and the instrument does not include any assessment or weighting of mental health variables.

Because of limited funding and resources, interrater agreement could not be evaluated throughout the study. However, because PCL–R and HCR-20 are particularly difficult to rate, the 12 research assistants were sent to a course carried out by established trainers to test the quality of their ratings. A 1-day assessment on interrater reliability for the HCR-20 took place in March 2003. The standard measure of interrater reliability (when a number of raters are scoring a number of cases) is the intraclass correlation (ICC; Shrout & Fleiss, 1979). In short, ICC compares the variability in the data due to true differences in the HCR-20 scores with the variability in the data due to differences in the raters. Obviously, if the reliability of the raters is high, then the variability in the raters should be small compared with the variability of the HCR-20 scores. This will result in a high ICC. It is generally accepted that an ICC reliability of 0.9 is excellent, 0.8 is good, and 0.7 is barely acceptable. An ICC of below 0.7 is not considered acceptable. ICCs were calculated for the group as a whole and resulted in the following: HCR-20 total ICC = 0.98; history ICC = 0.98; clinical ICC = 0.80; and risk ICC = 0.87.
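To make the variance comparison behind the ICC concrete, the sketch below computes one of the forms defined by Shrout and Fleiss (1979), ICC(2,1), from a cases-by-raters matrix. The text does not state which ICC form was used in the study, so the choice of ICC(2,1) is an assumption, and the ratings are a small illustrative matrix rather than study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one row per rated case, one column per rater.
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    case_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into cases, raters, and residual.
    ss_cases = k * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_cases - ss_raters

    bms = ss_cases / (n - 1)                 # between-cases mean square
    jms = ss_raters / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Six cases scored by four raters (illustrative values only).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_example = icc_2_1(example)

# Raters in perfect agreement give an ICC of 1.
icc_perfect = icc_2_1([[3, 3, 3], [7, 7, 7], [5, 5, 5]])
```

In the illustrative matrix the raters differ systematically in how high they score, so the rater variance is large relative to the case variance and the ICC is low, which is the situation the text describes in reverse for reliable raters.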

A 2-day assessment for the interrater reliability for the PCL–R took place in June 2003. The 2 days involved the scoring of six reliability cases. The ICC for the PCL–R total score was 0.70. To examine the reasons for this relatively low ICC, we calculated the difference between each rater's score and the correct criterion score. Each rater's average discrepancy (ignoring the sign of the difference) and range of discrepancy scores (taking account of the sign of the difference) were then calculated. The standard error of measurement (SEM) represents the standard deviation of observed scores if the true score is held constant. In the case of the PCL–R, this means that if 100 trained raters assessed the same participant at the same time, 68% of the scores would fall within ±1 SEM, whereas 95% of the scores would fall within ±2 SEM. Using the SEM as reported in the PCL–R manual (3 points for the PCL–R total score), we created a “pass criterion” of having the average discrepancy fall within 1 SEM and for no individual measurement to be greater than ±2 SEM for each of the types of scores. Using this criterion, we identified those raters who “passed” with acceptable reliability and those who did not. From the discrepancy analysis, most raters were good, with one exception who failed. After exclusion of this person from the analysis, the results improved (ICC PCL–R total score = 0.85).
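The pass criterion can be written as a short check on each rater's discrepancies from the criterion scores. The sketch below is a hedged reading of the rule as described: the SEM default of 3 points for the total score, the function name, and the example scores are all assumptions for illustration, and only the total score is checked.

```python
def passes_reliability(rater_scores, criterion_scores, sem=3.0):
    """Apply the pass criterion to one rater's total scores.

    Pass if the mean absolute discrepancy is within 1 SEM and no single
    discrepancy lies outside +/- 2 SEM.
    """
    discrepancies = [r - c for r, c in zip(rater_scores, criterion_scores)]
    mean_abs = sum(abs(d) for d in discrepancies) / len(discrepancies)
    all_within_2_sem = all(abs(d) <= 2 * sem for d in discrepancies)
    return mean_abs <= sem and all_within_2_sem


# Six reliability cases with invented criterion and rater scores.
criterion = [22, 31, 15, 27, 9, 34]
rater_a = [24, 29, 16, 30, 8, 33]   # discrepancies: 2, -2, 1, 3, -1, -1
rater_b = [30, 31, 14, 28, 9, 35]   # discrepancies: 8, 0, -1, 1, 0, 1

result_a = passes_reliability(rater_a, criterion)
result_b = passes_reliability(rater_b, criterion)
```

Rater B illustrates why both conditions are needed: the average discrepancy is small, but one case is scored 8 points above the criterion, which exceeds 2 SEM.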

Little information is available on the psychometric properties of the above mentioned risk-assessment instruments in women, and information has focused almost exclusively on the PCL–R and PCL:SV. Salekin, Rogers, and Sewell (1997) established the construct validity of psychopathy in a female offender sample using a multitrait–multimethod evaluation. The 2-year, test–retest reliability of the PCL–R was examined in a sample of male and female methadone patients (Rutherford, Cacciola, Alterman, McKay, & Cook, 1999). Factor 1 was more reliably measured in women compared with men; furthermore, in women Factor 2 was significantly less reliable than Factor 1 or the total score. In a review on reliability, validity, and implications for clinical utility (Vitale & Newman, 2001), it was concluded that there is support for the measure’s reliability and modest support for its validity. The manual of the PCL–R, 2nd edition (Hare, 2003) provides further support for sufficient interrater reliability and internal consistency for female offenders (ICC1 = 0.94, ICC2 = 0.93, α = .84) and the validity (e.g., content, concurrent, convergent, and discriminant) of the instrument.

Ethical approval was obtained from South East Multi-Centre Research Ethics Committee, Kent and Medway Strategic Health Authority. Participants gave written informed consent for the interview and for the searching of their criminal records.

Outcome Measures

Outcome data on 1,353 male and 304 female prisoners were derived from reconvictions recorded in the Police National Computer, an operational police database containing criminal histories of all offenders in England, Wales, and Scotland. This source has a lower failure rate than the Home Office Offenders Index for nonidentification and is updated more regularly (Howard & Kershaw, 2000).

Outcome variables included reconviction categories of violent, acquisitive, and any reoffending. Sexual reconviction was not assessed owing to the small number of these reoffenders. For categorization of violent offenses, we used offenses in the Home Office’s Standard List for definition of violence (committed) plus threats to commit such an offense for England, Wales, and Scotland. We did not include offenses involving damage to property or robbery.

Statistical Analysis

Reoffenders in each offense category were compared with all other participants, including other categories and nonoffenders. Chi-square tests were used to compare gender differences in categorical variables, such as prevalence of personality disorders and ethnicity, and t tests were used to compare mean differences on continuous variables, such as instrumental scale scores and age. The AUC was used to assess the predictive accuracy for each risk instrument by offender category, with 95% confidence intervals. The possible sampling error in the difference between the estimated AUC and the chance value 0.5 was tested for significance for each instrument. The difference in the predictive efficacy of each instrument between genders was tested by fitting a multivariate logistic regression model with gender as a covariate and its interaction with each instrument score in the model. (The term multivariate refers to multiple outcome variables from multiple risk-assessment instruments. It involves fitting parallel regression models for all outcomes simultaneously. The correlation between outcomes of instruments in this case is taken into account by the variance–covariance residual matrix of the model.) We adjusted for age in this analysis.
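As an illustration of the AUC and its comparison with the chance value of 0.5, the sketch below computes the AUC as the proportion of reoffender and non-reoffender pairs in which the reoffender has the higher score (ties counted as one half). The text does not say how the standard error was obtained; the Hanley and McNeil (1982) approximation used here is an assumption, and the scores are invented.

```python
import math


def auc(scores_reoffenders, scores_others):
    """Probability that a random reoffender outscores a random non-reoffender."""
    wins = 0.0
    for p in scores_reoffenders:
        for q in scores_others:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_reoffenders) * len(scores_others))


def hanley_mcneil_se(a, n_pos, n_neg):
    """Approximate standard error of an AUC (assumed method, not from the text)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    variance = (a * (1 - a)
                + (n_pos - 1) * (q1 - a * a)
                + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(variance)


reoffenders = [28, 31, 22, 35, 26]        # invented instrument scores
others = [18, 25, 20, 30, 15, 22, 12]

a = auc(reoffenders, others)
se = hanley_mcneil_se(a, len(reoffenders), len(others))
ci_lower = max(0.0, a - 1.96 * se)
ci_upper = min(1.0, a + 1.96 * se)
z_vs_chance = (a - 0.5) / se              # compared against a standard normal
```

An AUC of 0.5 means the instrument ranks reoffenders no better than chance, so a confidence interval that excludes 0.5 corresponds to significant predictive ability in this sense.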

To compare the predictive efficacy among instrument scales, we performed multivariate linear regression analysis for men and women samples separately. Similar to the above multivariate logistic model, this analysis estimates the discriminant effects of all instruments independent of others and allows simultaneous estimation of the pairwise variance and covariance among instruments. All scale scores were z-standardized to an equal scale with a mean of 0 and a standard deviation of 1 for direct comparison of regression estimates. The regression coefficient was interpreted as the mean difference of the scale score between reoffenders in a specific category and others. The standard z test was used to examine the significance of the regression coefficient. The larger the coefficient, the greater was the discrimination or predictive power of the scale.
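The interpretation of the coefficient as a standardized mean difference can be illustrated for a single scale. The sketch below z-standardizes one set of scores and takes the difference in mean z score between reoffenders and others, which equals the slope from regressing the z score on a 0/1 reoffending indicator. It deliberately leaves out the simultaneous estimation across instruments that the multivariate model provides, and the scores are invented.

```python
from statistics import mean, stdev


def z_standardize(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]


def standardized_mean_difference(scores, reoffended):
    """Mean z score of reoffenders minus mean z score of all others."""
    z = z_standardize(scores)
    z_reoffenders = [zi for zi, r in zip(z, reoffended) if r]
    z_others = [zi for zi, r in zip(z, reoffended) if not r]
    return mean(z_reoffenders) - mean(z_others)


scale_scores = [10, 14, 22, 30, 18, 26]   # invented scores on one scale
reoffended = [0, 0, 1, 1, 0, 1]           # 1 = reconvicted in the category

z_scores = z_standardize(scale_scores)
difference = standardized_mean_difference(scale_scores, reoffended)
```

Because every scale is put on the same unit, differences obtained this way for different instruments can be compared directly, which is the purpose of the standardization described above.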


We conducted further multiple ad hoc tests, using generalized Wald tests, to compare the standardized regression coefficients, to examine whether they were significantly predictive, and to rank the discriminant effects of these scales. SPSS v12.0 for Windows was used for descriptive analysis, and MLwiN v2.02 (Rasbash, Steele, & Browne, 2003) was used for fitting multivariate logistic and linear models.

Results

The released sample consisted of 1,353 male participants with a mean age of 30.7 years (SD = 11.4; range = 18–75 years), and 304 female participants with a mean age of 28.2 years (SD = 8.8; range = 18–60 years). The mean length of sentence completed prior to release was 4.9 years (SD = 2.7; range = 0.1–28.0) for men, and 2.0 years (SD = 2.01; range = 0.42–12.8 years) for women. Table 1 demonstrates the demographic and criminological characteristics of male and female prisoners and compares the two samples, indicating a similar mean follow-up period for both men and women. However, while having a somewhat shorter time at risk, more women had been returned to prison (because of breach of conditions of parole license) at the end of the follow-up period, and the men were significantly older. There were no gender differences according to ethnicity.

For both men and women, the most common index offense was robbery, followed by violent offending. Robbery was significantly more prevalent among women prisoners, and sexual offenses were significantly more prevalent among men. There were no differences in the number of previous violent and robbery offenses between the male and female samples, although men demonstrated significantly more previous acquisitive offending. Most participants had a personality disorder, with men significantly more likely to receive a diagnosis of antisocial personality disorder than

Table 1
Comparison of Male and Female Prisoners

| Variable | Female prisoners (n = 304) | Male prisoners (n = 1,353) | Statistic | p |
|---|---|---|---|---|
| Follow-up in years: M | 2.1 | 2.0 | | |
| Minimum–maximum | 23 days–4.1 years | 6 days–2.9 years | | |
| No. returned to prison: n (%) | 75 (24.7) | 180 (13.3) | 24.6a | .000 |
| Time at risk (in days): M (SD) | 685.9 (341.2) | 721.0 (198.3) | 9.80b | .000 |
| Age (years): M (SD) | 28.2 (8.8) | 30.7 (11.4) | 3.59b | .000 |
| Ethnicity: n (%) | | | | |
| White | 248 (81.6) | 1,065 (78.7) | 4.67a | .198 |
| Black | 35 (11.5) | 204 (15.1) | | |
| Asian | 1 (0.3) | 41 (3.0) | | |
| Other | 20 (6.6) | 43 (3.2) | | |
| Index offense: n (%) | | | | |
| Violence | 125 (41.4) | 528 (39.1) | 0.46a | .498 |
| Sex | 11 (3.6) | 315 (23.3) | 60.7a | .000 |
| Robbery | 180 (59.2) | 649 (48.0) | 12.6a | .000 |
| Drug | 7 (2.3) | 50 (3.7) | 1.25a | .264 |
| Acquisitive | 38 (12.5) | 213 (15.7) | 2.03a | .154 |
| No. of pre- and index offenses: M (SD) | | | | |
| Violence | 2.3 (3.2) | 2.3 (3.2) | 0b | |
| Robbery | 1.1 (1.4) | 1.1 (1.8) | 0b | |
| Acquisitive | 7.8 (12.6) | 11.2 (14.9) | 4.13b | .000 |
| Personality disorder: n (%) | | | | |
| ASPD | 160 (53.0) | 866 (64.0) | 13.6a | .000 |
| Other PD | 122 (40.1) | 548 (41.1) | 0.01a | .920 |
| No PD | 97 (31.1) | 359 (26.5) | 3.59a | .058 |
| Axis I disorders: n (%) | | | | |
| Schizophrenia | 59 (19.4) | 106 (7.8) | 37.1a | .000 |
| Delusional disorder | 12 (3.9) | 37 (2.7) | 1.27a | .259 |
| Lifetime depression | 206 (67.8) | 411 (30.4) | 148.5a | .000 |
| Drug dependence | 182 (60.1) | 524 (38.7) | 45.4a | .000 |
| Alcohol disorder | 84 (27.8) | 276 (20.4) | 7.64a | .006 |
| Reconviction rate: n (%) | | | | |
| Violence | 25 (8.2) | 178 (13.3) | 5.61a | .018 |
| Sex | 5 (1.6) | 7 (0.5) | 2.73a | .098 |
| Robbery | 7 (2.3) | 64 (4.7) | 4.18a | .041 |
| Drug | 14 (4.6) | 128 (9.5) | 7.47a | .006 |
| Acquisitive | 47 (15.5) | 301 (22.2) | 6.89a | .009 |
| Any | 88 (28.9) | 609 (45.0) | 26.3a | .000 |

Note. Violence offense includes homicide, major violence, minor violence, and weapons offenses. Acquisitive offense includes burglary, theft, receiving, forgery, deception, and obtaining pecuniary advantage. The magnitude of the percentages varied slightly because of different numbers of missing cases. ASPD = antisocial personality disorder; PD = personality disorder.
a Chi-square test. b t test.


women. Women prisoners were significantly more likely to receive Axis I diagnoses of schizophrenia, depression, drug dependence, and alcohol dependence compared with male prisoners. Table 1 also demonstrates that male prisoners were significantly more likely to be reconvicted of all categories of crime (except for sex offending) at follow-up than women.
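The chi-square comparisons in Table 1 can be checked from the reported counts. The sketch below recomputes the statistic for the number returned to prison (75 of 304 women vs. 180 of 1,353 men). It assumes a Pearson chi-square without continuity correction, which is an assumption about the procedure rather than something stated in the text; under that assumption the result is close to the tabled value of 24.6.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]],
    without continuity correction (an assumed choice for this sketch)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts reported in Table 1: number returned to prison by gender.
women_returned, women_total = 75, 304
men_returned, men_total = 180, 1353

chi2 = pearson_chi_square_2x2(
    women_returned,
    women_total - women_returned,
    men_returned,
    men_total - men_returned,
)
print(f"Chi-square = {chi2:.1f}")
```

Other rows of Table 1 may not reproduce as closely, because the table note indicates that percentages were based on slightly different numbers of nonmissing cases.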

Descriptive Statistics

The means and standard deviations of the total scores of the five instruments, together with their subscales, are reported in Table 2. Men scored significantly higher than women on the OGRS, VRAG, PCL–R total score, PCL–R Factor 1 score, and HCR-20 C subscale, whereas women scored significantly higher than men on the HCR-20 total and HCR-20 H and R subscales.
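The t tests in Table 2 can be approximated from the published summary statistics. The sketch below applies a pooled-variance (Student) t test to the OGRS row; the pooled form is an assumption, as the text does not specify the variant used. With the rounded means and standard deviations from Table 2, the result falls close to the reported value of 4.67.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics, assuming equal
    variances (pooled-variance form, an assumption of this sketch)."""
    pooled_variance = (
        (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# OGRS summary statistics as reported in Table 2 (men, then women).
t_ogrs = pooled_t_from_summary(56.9, 28.3, 1350, 48.6, 23.7, 290)
print(f"t = {t_ogrs:.2f}")
```

Small discrepancies from the tabled values are to be expected, because the published means and standard deviations are rounded.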

Receiver Operating Characteristic Analyses

In Table 3, we demonstrate the predictive accuracy of the five instruments (including PCL–R Factors 1 and 2 and the HCR-20 subscales) using receiver operating characteristic analyses. On the basis of a comparison with the literature, AUCs in the range of 0.75–0.80 are considered moderate to large effect sizes (Mossman, 1994; Rice, 1997). The table demonstrates that few measures achieved this level in this prospective study. However, all instruments significantly predicted reoffending behavior for each of the three offending categories among men (ranging from AUC = .59–.72), except for PCL–R Factor 1 in the case of violence. More instruments and their subscales failed to achieve statistical significance among women, including the OGRS and HCR-20 R subscale for violence, and the PCL–R Factor 1 and HCR-20 R subscale for acquisitive offending. No instrument predicted above a moderate level of ability for any offending outcome.

The AUC values indicated that the OGRS had the highest level of predictive accuracy for violent reoffending among men, followed by the VRAG, RM2000(V), PCL–R Factor 2, HCR-20 total score, and PCL–R total score. The HCR-20 R subscale had lowest predictive ability for violent reoffending. Adding the three subscales of the HCR-20 together and Factors 1 and 2 of the PCL–R did not improve the overall accuracy of these instruments. PCL–R Factor 2 demonstrated higher AUC values than the PCL–R total score for violent, acquisitive, and any offending among men.

Table 3 demonstrates that certain trends observed among men were reversed for women and that the OGRS, VRAG, and RM2000(V) demonstrated lowest predictive ability for violence. Among female prisoners, the total PCL–R score demonstrated the highest predictive power for violence, followed by PCL–R Factor 2 and HCR-20 H subscale scores. These trends differed for acquisitive and any offending in which the OGRS demonstrated the highest predictive ability. The PCL–R and its two factors, the total HCR-20 score and its H and C subscales, all achieved higher AUC values for the prediction of violent reoffending among women than among men.

Table 3 demonstrates that the OGRS, VRAG, and RM2000(V) achieved higher AUC values for violent reoffending among men than among women. For outcomes of acquisitive offending and a combined category of any offending, all instruments demonstrated higher AUC values among male prisoners compared with women. There were marginal differences between male and female prisoners in the case of the PCL–R total score and HCR-20 R subscale for the combined category of any reoffending.

In Table 3, we also demonstrate the accuracy, using AUC values, of the total number of previous violent and acquisitive convictions (including the original index offense leading to imprisonment) in predicting subsequent reoffending. A simple measure of previous violence incidents demonstrated similar or higher AUC values for subsequent violent reoffending than the HCR-20 total and its subscale scores, and both the PCL–R total and Factor 1 scores among men. However, previous violence demonstrated lower AUC values for subsequent violent reconvictions among women when compared with all instruments, except the OGRS and HCR-20 R subscale. Previous violence demonstrated lower AUC values than any risk-assessment instrument for future acquisitive offending among men and was not predictive beyond the level of chance among women. Previous violence also demonstrated lower AUC values than most instruments for a combined category of any offending among both men and women.

Previous acquisitive offending demonstrated higher AUC values among men for subsequent violent offending compared with the

Table 2
Comparison of Means and Standard Deviations for Five Risk Instruments

| Measure | Men: n | Men: M | Men: SD | Women: n | Women: M | Women: SD | t test | p |
|---|---|---|---|---|---|---|---|---|
| OGRS | 1,350 | 56.9 | 28.3 | 290 | 48.6 | 23.7 | 4.67 | .000 |
| VRAG | 1,343 | 11.7 | 10.9 | 302 | 9.98 | 9.95 | 2.53 | .014 |
| RM2000(V) | 1,337 | 4.37 | 1.83 | 303 | 4.17 | 1.47 | 1.79 | .073 |
| PCL–R | 1,345 | 18.1 | 7.62 | 305 | 16.4 | 7.48 | 3.49 | .000 |
| PCL–R Factor 1 | 1,353 | 6.04 | 3.69 | 305 | 4.86 | 3.44 | 5.09 | .000 |
| PCL–R Factor 2 | 1,353 | 9.24 | 4.37 | 305 | 8.77 | 4.19 | 1.71 | .087 |
| HCR-20: Total | 1,271 | 19.1 | 7.80 | 302 | 20.2 | 7.48 | 2.26 | .023 |
| HCR-20 H subscale | 1,281 | 11.1 | 4.56 | 302 | 12.1 | 4.29 | 3.27 | .001 |
| HCR-20 C subscale | 1,339 | 3.39 | 2.15 | 302 | 3.11 | 2.11 | 2.04 | .041 |
| HCR-20 R subscale | 1,337 | 4.50 | 2.55 | 302 | 5.03 | 2.42 | 3.28 | .001 |

Note. OGRS = Offenders Group Reconviction Scale; VRAG = Violence Risk Appraisal Guide; RM2000(V) = Risk Matrix 2000–Violence; PCL–R = Psychopathy Checklist–Revised; HCR-20 = Historical, Clinical, Risk Management–20.


PCL–R total and Factor 1 scores, and compared with the HCR-20 C and R subscales. Among women, its predictive ability was no better than chance. In contrast, the AUC value for previous acquisitive offending in predicting subsequent acquisitive offending among men was exceeded only by the OGRS. Among women, the AUC value for previous acquisitive offending was exceeded only by the OGRS and VRAG. The predictive ability of previous acquisitive offending was exceeded by the OGRS, VRAG, and RM2000(V) for a combined category of any reoffending among men. However, among women, the predictive ability of previous acquisitive offending exceeded only that of PCL–R Factor 1 scores and the HCR-20 R subscale for any offending. The total number of any previous convictions demonstrated higher AUC values compared with all instruments and subscales, except the OGRS, for the outcome any offending among both men and women.

Comparisons of Predictive Accuracy Using Multivariate Regression

To establish the independent associations of all risk-assessment scores and subscales with the outcome measures, we examined the predictive ability of each instrument using multivariate regression analysis. Table 4 compares the predictive ability of scales and subscales of the five instruments. In the first step, the predictive ability of the PCL–R factors was examined. Among men, Factor 1 demonstrated poor predictive ability for both violent and a combined category of any reoffending. Factor 1 demonstrated predictive ability for acquisitive offending at a lower level of significance. In contrast, Factor 1 demonstrated significant predictive ability for violence and a combined category of any offending among women, but it did not demonstrate predictive accuracy above chance for acquisitive offending. Factor 2 demonstrated significant predictive ability for each category of offending among both men and women (see Table 4).

In the second step, the HCR-20 subscales were examined, confirming that each of the H, C, and R subscales had an independent significant predictive ability among men for each of the three categories of offending. There was a trend for the HCR-20 H subscale to rank above the HCR-20 C subscale in its predictive accuracy for each category of offending, and for the HCR-20 R subscale to perform less accurately than the other two subscales. Among women, similar trends could be observed, but the HCR-20 R subscale did not show predictive accuracy above chance for violence and acquisitive offending, nor did the HCR-20 C subscale score for acquisitive offending. Among women, the HCR-20 C subscale was not demonstrably superior to the H subscale for violent offending and a combined category of any offending.

In the third step of the analysis, the HCR-20 subscales and PCL–R Factor 2 score, which had shown superior ability in steps 1 and 2, were next compared with the OGRS, VRAG, RM2000(V), and total number of previous convictions for the same category of offense. Among men, the OGRS and VRAG demonstrated a similar level of predictive ability for violence and were ranked first, above the RM2000(V), PCL–R Factor 2, HCR-20 H subscale, and number of previous violent convictions. The VRAG, RM2000(V), PCL–R Factor 2, and HCR-20 H subscale demonstrated similar but secondary levels of predictive ability for violence. Among women, the VRAG, RM2000(V), PCL–R Factor 2, and the HCR-20 H subscale demonstrated similar levels of predictive ability. However, the OGRS and the total number of previous violence convictions did not demonstrate predictive ability above chance.

Table 3
Predictive Validity of Risk Scales With Respect to Violent, Acquisitive, and Any Reconviction

| Instrument | Violence: Men (base rate % = 13.2) | Violence: Women (base rate % = 8.2) | Acquisitive: Men (base rate % = 22.2) | Acquisitive: Women (base rate % = 15.5) | Any: Men (base rate % = 45.0) | Any: Women (base rate % = 28.9) |
|---|---|---|---|---|---|---|
| OGRS | .72 (.68–.75) | .54 (.43–.66)a,b | .75 (.72–.78) | .69 (.61–.77) | .76 (.74–.79) | .68 (.61–.74) |
| VRAG | .70 (.66–.74) | .65 (.55–.75) | .71 (.68–.74) | .66 (.59–.74) | .72 (.70–.75) | .66 (.59–.72) |
| RM2000(V) | .69 (.65–.72) | .66 (.55–.76) | .70 (.67–.73) | .61 (.53–.70)c | .71 (.69–.74) | .62 (.55–.69)d |
| PCL–R | .64 (.60–.68) | .73 (.63–.83) | .66 (.63–.69) | .63 (.55–.72) | .65 (.62–.68) | .67 (.60–.74) |
| PCL–R Factor 1 | .54 (.49–.58)a | .65 (.54–.77) | .55 (.51–.58) | .55 (.46–.64)a | .53 (.50–.56) | .59 (.52–.66) |
| PCL–R Factor 2 | .68 (.64–.72) | .71 (.62–.80) | .71 (.68–.74) | .64 (.56–.72)e | .70 (.67–.73) | .67 (.61–.73) |
| HCR-20: Total | .67 (.63–.71) | .70 (.60–.80) | .69 (.66–.72) | .61 (.53–.69) | .67 (.64–.70) | .67 (.60–.73) |
| HCR-20 H subscale | .66 (.63–.70) | .73 (.64–.82) | .70 (.67–.73) | .62 (.54–.70) | .69 (.66–.71) | .67 (.61–.73) |
| HCR-20 C subscale | .64 (.60–.68) | .69 (.59–.79) | .63 (.60–.67) | .60 (.51–.68) | .62 (.59–.65) | .67 (.61–.74) |
| HCR-20 R subscale | .59 (.54–.63) | .59 (.47–.70)a | .61 (.57–.64) | .55 (.47–.64)a | .58 (.55–.61) | .59 (.52–.66) |
| No. previous and index offense violent convictions | .68 (.64–.73) | .62 (.51–.72) | .58 (.54–.61) | .55 (.45–.65)a | .63 (.60–.66) | .59 (.51–.66) |
| No. previous and index offense acquisitive convictions | .65 (.61–.69) | .48 (.39–.57)a | .74 (.71–.77) | .64 (.57–.72) | .70 (.67–.72) | .65 (.59–.72) |
| No. previous and index offense any convictions | .68 (.64–.72) | .53 (.43–.62)a | .74 (.71–.77) | .63 (.55–.71) | .72 (.69–.75) | .67 (.60–.73) |

Note. The values represent Area Under the receiver operating characteristic Curve (AUC) values, and the 95% confidence intervals (CIs) are in parentheses. OGRS = Offenders Group Reconviction Scale; VRAG = Violence Risk Appraisal Guide; RM2000(V) = Risk Matrix 2000–Violence; PCL–R = Psychopathy Checklist–Revised; HCR-20 = Historical, Clinical, Risk Management–20.
a An instrument with the 95% CI of its AUC value including 0.5 indicates a nonsignificant level of accuracy for the instrument. Comparison between men and women by multivariate logistic model, adjusted for age, follows: b p = .003. c p = .044. d p � .05. e p � .05.


For acquisitive reconvictions, all instruments and subscales demonstrated significant predictive ability among men, with the OGRS and number of previous acquisitive convictions ranked first, PCL–R Factor 2 and HCR-20 H subscale second, and RM2000(V) third. Among women, all instruments and subscales had significant and similar predictive ability. However, the RM2000(V) was less predictive than the OGRS.

For the combined category of any reoffending, all instruments and subscales demonstrated significant predictive ability among both men and women. The OGRS was ranked as most predictive among men; VRAG and total number of all previous offenses second; and RM2000(V), PCL–R Factor 2, and HCR-20 H subscale third. Multivariate regression failed to demonstrate superiority of any single instrument or subscale among women, except for the RM2000(V), which was less predictive than the OGRS and HCR-20 H subscale.

Discussion

It has been suggested that there are at least two distinct positions on the assessment of violence risk among women (Nicholls et al., 2004). According to the “gendered perspective,” women’s crime and violence is linked with their unique experiences as women, and this in turn should influence a valid assessment of their risk. Very different factors may be associated with their violence and criminality, and their aggression may take different forms from those of men. In contrast, the “nongendered perspective” argues that, while recognizing certain sex differences, risk-assessment instruments developed on men are nevertheless valid for use with women. The latter argument would imply that the same predictive items within these instruments are equally accurate for men and women. The nongendered perspective of risk assessment is largely supported by this study, but certain important differences between instruments also give a degree of support to the gendered approach.

Most structured risk instruments demonstrated predictive ability above the level of chance for violent, acquisitive, and a combined category of any offending among both men and women. However, it was notable that few instruments achieved a level of accuracy above a moderate AUC value. When comparing the predictive accuracy of different instruments between men and women, lack of statistically significant differences was the most consistent observation. Only the OGRS, an instrument designed to measure the

Table 4
Comparison of Predictive Effects of Instruments for Reoffending Using the Multivariate Regression Model for Both Men (n = 1,353) and Women (n = 304)

| Gender | Instrument | Violence: Standardized Z score | Violence: Rank | Acquisitive: Standardized Z score | Acquisitive: Rank | Any: Standardized Z score | Any: Rank |
|---|---|---|---|---|---|---|---|
| Men | PCL–R: Factor 1 | 0.13 | | 0.15* | 2 | 0.11 | |
| Men | PCL–R: Factor 2 | 0.62*** | 1 | 0.72*** | 1 | 0.67*** | 1 |
| Women | PCL–R: Factor 1 | 0.56** | 2 | 0.19 | | 0.31** | 2 |
| Women | PCL–R: Factor 2 | 0.70** | 1 | 0.45** | 1 | 0.57*** | 1 |
| Men | HCR-20 H subscale | 0.58*** | 1 | 0.71*** | 1 | 0.66*** | 1 |
| Men | HCR-20 C subscale | 0.47*** | 2 | 0.44*** | 2 | 0.40*** | 2 |
| Men | HCR-20 R subscale | 0.33*** | 3 | 0.39*** | 2 | 0.29*** | 3 |
| Women | HCR-20 H subscale | 0.79*** | 1 | 0.43** | 1 | 0.60*** | 1 |
| Women | HCR-20 C subscale | 0.65** | 1 | 0.27 | | 0.54** | 1 |
| Women | HCR-20 R subscale | 0.23 | | 0.15 | 2 | 0.26* | 2 |
| Men | OGRS | 0.75*** | 1 | 0.85*** | 1 | 0.90*** | 1 |
| Men | VRAG | 0.72*** | 1 | 0.74*** | 2 | 0.78*** | 2 |
| Men | RM2000(V) | 0.57*** | 2 | 0.61*** | 3 | 0.67*** | 3 |
| Men | PCL–R: Factor 2 | 0.62*** | 2 | 0.72*** | 2 | 0.67*** | 3 |
| Men | HCR-20 H subscale | 0.58*** | 2 | 0.71*** | 2 | 0.66*** | 3 |
| Men | Previous and index same offense | 0.60*** | 2 | 0.81*** | 1 | 0.75*** | 2 |
| Women | OGRS | 0.17 | | 0.63*** | 1 | 0.62*** | 1 |
| Women | VRAG | 0.52* | 1 | 0.55*** | 1 | 0.54*** | 1 |
| Women | RM2000(V) | 0.47* | 1 | 0.35* | <OGRS | 0.39** | <OGRS, HCR-20 H subscale |
| Women | PCL–R: Factor 2 | 0.70** | 1 | 0.45** | 1 | 0.57*** | 1 |
| Women | HCR-20 H subscale | 0.79*** | 1 | 0.43** | 1 | 0.60*** | 1 |
| Women | Previous and index same offense | 0.37 | | 0.48** | 1 | 0.54*** | 1 |

Note. PCL–R = Psychopathy Checklist–Revised; HCR-20 = Historical, Clinical, Risk Management–20; OGRS = Offenders Group Reconviction Scale; VRAG = Violence Risk Appraisal Guide; RM2000(V) = Risk Matrix 2000–Violence. The cell “<OGRS” indicates that the rank was lower than the OGRS; the cell “<OGRS, HCR-20 H subscale” indicates that the rank was lower than both the OGRS and the HCR-20 H subscale.
* p < .05. ** p < .01. *** p < .001.


risk of general reoffending, failed to predict violence among women. This could be due to the sampling frame for the cohort; owing to stratification, the male sample was overselected from within the top 10% of the high-risk level measured by the OGRS score, whereas the female sample was randomly selected from the entire female sentenced prison population.

As expected, the RM2000(V), which was developed to predict violent reoffending, was less predictive for acquisitive offending among women but not men. This suggests that the RM2000(V), while predicting acquisitive offending above a level of chance for women, demonstrated a lack of specificity for outcome that was more apparent for men than women. This trend could be observed, to a lesser degree, for certain other instruments. Considering these findings, it can be questioned why instruments designed to predict general, primarily acquisitive, reoffending (such as the OGRS) should perform better, or as well, among men for the prediction of future violence than those specifically designed to predict violence (such as the VRAG, RM2000[V], and HCR-20 H subscale). They also raise the question why the latter are able to predict acquisitive offending with a relatively high degree of accuracy among men and, if not with the same level of accuracy as men, with a moderate degree of accuracy among women.

Previous research has demonstrated that the PCL–R predicts both violence and general recidivism (Hemphill et al., 1998; Salekin et al., 1996; Serin, 1996) and that the VRAG, despite being developed to predict violence, also predicts general recidivism (Glover et al., 2002). There are high correlations between instruments, with some items in common (Belfrage, 1998; Glover et al., 2002; Kroner & Mills, 2001; Simourd & Hoge, 2000). Our findings therefore challenge the notion of always matching specific instruments to specific outcomes in the case of men and also challenge the instrument-outcome specificity effect. However, this effect was weaker among women than among men in our study.

Kroner, Mills, and Reddon (2005) have argued that risk-assessment instruments are only measuring general criminal risk and that no single instrument has entirely fulfilled the original theoretical basis on which it was developed. However, this argument was applied to male samples. General criminal risk may be of lesser importance in the future prediction of risk of violence among women. Alternatively, additional factors may be more important among women than men. For example, we observed that PCL–R Factor 1, which included personality characteristics (such as conning manipulativeness) and features of affective deficiency, demonstrated predictive ability for violent reoffending among women but was not predictive above a level of chance among men. Furthermore, women participants had significantly higher prevalence of Axis I clinical syndromes, such as affective disorder, psychotic illness, and substance use dependence. The HCR-20 H subscale, which includes these items, ranked higher in its predictive ability relative to other instruments among women for violent reoffending compared with men.

Comparing Structured Instruments With Previous Offending

Previous behavior is believed to be the best predictor of future behavior. However, we are not aware of a previous study that has simply added the number of previous convictions for each of its participants and then compared the predictive ability of this measure with that of structured risk-assessment instruments. It was hypothesized that such a measure should constitute a baseline above which a structured instrument should excel if it is to be introduced into routine clinical practice. For most instruments, and for most outcomes, the structured instruments in our study failed to achieve this standard.

Among men, the total number of previous violent convictions demonstrated higher or similar AUC values to the PCL–R total score, the PCL–R Factor 1 score, and the HCR-20 and its subscales. Its predictive ability was exceeded slightly only by the OGRS and the VRAG. However, among women, previous violent offending demonstrated lower AUC values than most other instruments, and it was not predictive above chance when compared with other instruments using multivariate regression. Nevertheless, for acquisitive reoffending, the total number of previous convictions in the acquisitive category performed just as well as any structured risk-assessment instrument for both men and women. For a combined category of any offending, the total number of all previous convictions performed as well as any structured risk assessment among men and was exceeded only by the OGRS. In women, though, all instruments predicted similarly except the RM2000(V), which proved to be less predictive than the OGRS and the HCR-20 H subscale.

These findings correspond to the tendency of men to demonstrate a lack of instrument-outcome specificity as described above, with violent offending predicting acquisitive offending and vice versa. This would imply that, for men, the effort involved in applying structured risk-assessment instruments, some of which require considerable time to administer and expensive training, is not justified if the intention is merely to stratify individuals into levels of risk. The number of previous convictions among different categories of offending behavior will perform just as well. This effect can be observed for women in the case of acquisitive reoffending but not violent reoffending.

These findings have a further important implication for risk prediction because they question whether a substantial component of the predictive ability of most instruments is merely previous criminal history and point to the possibility that additional items convey little additional predictive power. Each of the five instruments in this study included a measure of general criminal risk, for example, criminal versatility in the PCL–R, or previous court appearances for burglary in the RM2000(V). Nevertheless, the relatively good ability of previous acquisitive convictions to predict future violent convictions among the men in our sample could have an alternative explanation. This may have been partly due to a sampling effect. Many male prisoners had multiple previous acquisitive convictions despite their imprisonment for a sexual or violent offense at the time of interview. Criminological research has demonstrated that violent offense specialization tends to be unusual in the criminal careers of offender populations and that violent offenses are committed at random in the course of criminal careers (Farrington, 1991). By accurately predicting violent reoffending, previous acquisitive offending (and the OGRS) may have achieved this by accurately predicting acquisitive reoffending in a representative sample of male prisoners. However, the less extensive criminal histories of female prisoners meant that there was a considerably smaller contribution made by this component to the prediction of women’s violence. Alternatively, the underlying factors leading to violence among women may have been very


different than for men. For example, the violent propensities of men embedded within the context of a generalized criminal lifestyle may have applied to a lesser extent among women, in which personality abnormality and Axis I disorders may have been more important.

Limitations

The prisoner cohort included a large sample, prospectively interviewed prior to release. However, it did not include life-sentence prisoners as it would not have been possible to assess them prospectively in anticipation of their release date. Furthermore, the base rate of sexual reoffending for the whole sample during the follow-up was very low, which would have resulted in insufficient statistical power.

Few prisoners declined interview, and attrition was primarily due to delays in access, unexpected transfer, or release of prisoners. However, the study outcome was limited to criminal convictions, and the base rate of violence would have been higher if additional measures had been collected, such as self-reported violence following release (see Doyle & Dolan, 2006). Nicholls et al. (2004) demonstrated that noncriminalized violence in different settings was predicted differently by certain instruments among men and women. Furthermore, the follow-up period was at a mean of approximately 2 years rather than all participants being measured at the 2-year stage. This meant that a subgroup may not have been in the community long enough to have recorded a violent conviction.

It can be further argued that the retention of the 1-month recidivists might present a problem of outliers and that each instrument might respond differently to the bias introduced by these cases. However, 5.5% (74/1,353) of male participants were followed up for less than 1 year, and 3 among those (4.1%) were reconvicted for a violent offense. Among women, 35 out of 304 (11.5%) were followed up for less than 1 year, and none of them were reconvicted for a violent offense. We repeated our analyses by removing those participants who were followed up for less than 1 year and found similar results when compared with the whole sample, with no new trends observed. Given the small proportion of outliers and the large sample size after removing them, our main findings remained robust.

Although the use of the HCR-20 as an actuarial tool has been discouraged, it has been frequently evaluated in this manner and compared with other risk-assessment instruments in previous research. However, the timing of when clinical and risk management items are rated is crucial, as they are intended to reflect current mental state, functioning, and future contextual factors. It is therefore unsurprising that the C and R measures, both when rated independently in this study and when combined in the HCR-20, were less predictive over longer time periods. However, it is recommended in the HCR-20 manual to reassess individuals at least every 6–12 months because it is quite possible for risk factors to fluctuate in severity over time (particularly the clinical and risk management factors, which are primarily dynamic in nature).

Certain items within instruments originally standardized on clinical populations, such as psychiatric patients (e.g., VRAG), may have had more limited predictive ability among general populations of offenders, such as prisoners in our study. Because these instruments were originally developed for clinical use with patients, this may explain why they did not perform with a level of accuracy equal to instruments originally developed using actuarial methods on similar populations of prisoners, for example, the OGRS. This should not detract from the clinical importance of accurately making a diagnosis of psychopathy using the PCL–R or guiding clinical risk management using the HCR-20, as part of the rationale for the latter is that it encourages the clinician to identify how each risk factor functions in the life of the individual. However, with respect to the predictive abilities of HCR-20 H subscale scores in women, this study demonstrates a predictive power equal to any other instrument, including actuarial measures, for violent, acquisitive, and any offending.

Recommendations for the Use of Instruments

Recommendations for the use of instruments to predict specific offending outcomes among men and women must be considered on the basis of the limitations demonstrated by this study. Furthermore, it has been argued that actuarial risk-assessment instruments should not be used to estimate an individual’s risk because they lack an acceptable level of accuracy when applied at the individual level (Hart, Michie, & Cooke, 2007). This argument has largely been applied to their use in guiding negative sanctions, such as extended detention for offenders who have committed serious violent and sexual offenses. The argument may become even stronger if structured instruments result in negative sanctions for women, especially if women in general are perceived as posing lesser risk. In this representative sample, women prisoners had less extensive previous criminal histories, and their reoffending rates at follow-up were lower. An alternative proposal has been made for the use of risk-assessment instruments with the intention of improving clinical risk management of those released into the community after serving their sentences or following discharge from hospital (Coid, Yang, et al., 2007). Risk factors that comprise only static or historical variables, especially those consisting primarily of demography and previous criminal history, cannot be changed by clinical management. Their purpose is therefore to stratify individuals into levels of potential risk to guide subsequent case management. For clinicians, attributing a score of risk on a continuous scale with a very broad range (e.g., 0–100) will convey few benefits. Stratification into a limited number of levels (e.g., low, medium, high, and very high risk) provides a more practical approach to determining the intensity of treatment intervention and degree of supervision in the community setting that is necessary to reduce risk. This approach would largely resolve the dilemma posed by Hart et al.’s (2007) argument. The clinical emphasis would then be upon the dynamic risk factors encountered following release (e.g., criminogenic influences from social networks, return to substance abuse) and how to intervene to reduce the effects of these factors on reoffending.
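
The stratification of a continuous score into a small number of levels can be sketched in a few lines. The following is a minimal illustration in Python, not a procedure used in this study: the cut-points (25, 50, 75) and the function name `stratify` are hypothetical, chosen only to show how a 0–100 score maps onto four ordered levels. Actual thresholds would need to be derived and validated for a given instrument and population.

```python
from bisect import bisect_right

# Hypothetical cut-points on a 0-100 score; NOT taken from any instrument manual.
CUT_POINTS = [25, 50, 75]
LEVELS = ["low", "medium", "high", "very high"]


def stratify(score: float) -> str:
    """Map a continuous 0-100 risk score to one of four ordered levels."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    # bisect_right counts how many cut-points are <= score,
    # which is the index of the level the score falls into.
    return LEVELS[bisect_right(CUT_POINTS, score)]


if __name__ == "__main__":
    for s in (10, 25, 60, 90):
        print(s, stratify(s))
```

Under these assumed cut-points, a score exactly on a boundary is assigned to the higher level; whether boundaries should round up or down is itself a design choice that would need to be stated for any real scheme.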

Our findings demonstrate that if the intention is to stratify male prisoners into levels of future risk for violence, then actuarial instruments, the OGRS and VRAG, are the most accurate. However, although the VRAG can partially be scored on the basis of information derived from case files, a PCL–R score is also required to complete the assessment. No training was required to score the OGRS in our study because a computerized algorithm had been automatically applied to the prisoner’s criminal records prior to inclusion in the study. This contrasted with the intensive and expensive training required to administer the PCL–R, VRAG, and HCR-20 (a PCL–R or PCL:SV score is also necessary for completion of the interview). For women, however, the use of the OGRS and total number of previous violent convictions cannot be recommended for the prediction of future violence. The VRAG, RM2000(V), PCL–R Factor 2, and HCR-20 H subscale all achieved a similar level of predictive ability for future violence among women. If the intention is to achieve an equivalent level of accuracy using a single instrument for the prediction of future violence among both men and women prisoners, then the VRAG achieves a similar level of accuracy to the OGRS among men, and to the RM2000(V), PCL–R Factor 2 score, and HCR-20 H subscale among women. However, because PCL–R Factor 1 has some predictive efficacy for future violence among women, and because the total PCL–R score predicts as well as the VRAG or HCR-20, use of the PCL–R alone might be justified for this outcome in the case of women.

To stratify men according to their risk of future acquisitive offending, the OGRS and total number of previous acquisitive offenses performed better than other measures. Most instruments appeared to demonstrate predictive ability among women, with similar patterns observed for a combined category of any reoffending behavior. However, if the intention is to screen women for future acquisitive and general reoffending, then economy and ease of usage would favor the same actuarial approach as for men.

References

Belfrage, H. (1998). Implementing the HCR-20 scheme for risk assessment in a forensic psychiatric hospital: Integrating research and clinical practice. Journal of Forensic Psychiatry, 9, 328–338.

Belfrage, H., Franson, G., & Strand, S. (2000). Prediction of violence using the HCR-20: A prospective study in two maximum-security correctional institutions. Journal of Forensic Psychiatry, 11, 167–175.

Coid, J., Hickey, H., Kahtan, N., Zhang, T., & Yang, M. (2007). Patients discharged from medium secure forensic psychiatry services: Reconvictions and risk factors. British Journal of Psychiatry, 190, 223–229.

Coid, J., Yang, M., Ullrich, S., Zhang, T., Roberts, A., Roberts, C., et al. (2007). Predicting and understanding risk of re-offending: The Prisoner Cohort Study. Ministry of Justice, Research Summary, 6.

Copas, J. B., & Marshall, P. (1998). The Offender Group Reconviction Scale: The statistical reconviction score for use by probation officers. Journal of the Royal Statistical Society, 47, 159–171.

Cuppleditch, L., & Evans, W. (2005). Reoffending in adults: Results from the 2002 cohort (Home Office Statistical Bulletin, 25/05). London: Home Office.

de Vogel, V., & de Ruiter, C. (2005). The HCR-20 in personality disordered female offenders: A comparison with a matched sample of males. Clinical Psychology and Psychotherapy, 12, 226–240.

de Vogel, V., de Ruiter, C., Hildebrand, M., Bos, B., & van de Ven, P. (2004). Type of discharge and risk of recidivism measured by the HCR-20: A retrospective study in a Dutch sample of treated forensic psychiatric patients. International Journal of Forensic Mental Health, 3(2), 149–165.

Douglas, K. S., Yeomans, M., & Boer, D. (2005). Comparative validity of multiple measures of violence risk in a sample of criminal offenders. Criminal Justice and Behavior, 32, 479–510.

Doyle, M., & Dolan, M. (2006). Predicting community violence from patients discharged from mental health services. British Journal of Psychiatry, 189, 520–526.

Doyle, M., Dolan, M., & McGovern, J. (2002). The validity of North American risk assessment tools in predicting inpatient violent behaviour in England. Legal and Criminological Psychology, 7, 141–154.

Falkenbach, D. M. (2008). Psychopathy and the assessment of violence in women. Journal of Forensic Psychology Practice, 8(2), 212–224.

Farrington, D. P. (1991). Childhood aggression and adult violence: Early precursors and later life outcomes. In D. J. Pepler & K. H. Rubin (Eds.), The development and treatment of childhood aggression (pp. 5–29). Hillsdale, NJ: Erlbaum.

First, M. B., Gibbon, M., Spitzer, R. L., Williams, J. B. W., & Benjamin, L. (1997). Structured clinical interviews for DSM-IV Axis-II personality disorders. Washington, DC: American Psychiatric Press.

Forouzan, E., & Cooke, D. J. (2005). Figuring out la femme fatale: Conceptual and assessment issues concerning psychopathy in females. Behavioral Sciences and the Law, 23, 765–778.

Friendship, C., Thornton, D., Erikson, M., & Beech, A. (2001). Reconviction: A critique and comparison of two main data sources in England and Wales. Legal and Criminological Psychology, 6(1), 121–129.

Glover, A., Nicholson, D., Hemmati, T., Bernfeld, G., & Quinsey, V. (2002). A comparison of predictors of general and violent recidivism among high risk federal offenders. Criminal Justice and Behavior, 29, 235–249.

Grann, M., Belfrage, H., & Tengstrom, A. (2000). Actuarial assessment of risk for violence: Predictive validity of the VRAG and the historical part of the HCR-20. Criminal Justice and Behavior, 27(1), 97–114.

Gray, N., Hill, C., McGleish, A., Timmons, D., MacCulloch, M. J., & Snowden, R. J. (2003). Prediction of violence and self-harm in mentally disordered offenders: A prospective study of the efficacy of HCR-20, PCL–R, and psychiatric symptomatology. Journal of Consulting and Clinical Psychology, 71(3), 443–451.

Gray, N. S., Snowden, R. J., MacCulloch, S., Phillips, H., Taylor, J., & MacCulloch, M. J. (2004). Relative efficacy of criminological, clinical, and personality measures of future risk of offending in mentally disordered offenders: A comparative study of HCR-20, PCL:SV, and OGRS. Journal of Consulting and Clinical Psychology, 72, 523–530.

Grevatt, M., Thomas-Peter, B., & Hughes, G. (2004). Violence, mental disorder and risk assessment: Can structured clinical assessments predict the short-term risk of inpatient violence? Journal of Forensic Psychiatry and Psychology, 15, 278–292.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30.

Hanson, R. K., & Bussiere, M. T. (1996). Sex offender risk predictors: A summary of research results. Forum on Corrections Research, 8(2), 10–12.

Hanson, R. K., & Morton-Bourgon, K. (2004). Predictors of sexual recidivism: An updated meta-analysis (User Report No. 2004–02). Ottawa, Ontario, Canada: Public Safety and Emergency Preparedness Canada.

Hare, R. D. (1991). The Psychopathy Checklist—Revised (PCL–R). Toronto, Ontario, Canada: Multi-Health Systems.

Hare, R. D. (2003). The Psychopathy Checklist—Revised (PCL–R), Second Edition. Toronto, Ontario, Canada: Multi-Health Systems.

Harris, G. T., Rice, M. E., & Camilleri, J. A. (2004). Applying a forensic actuarial assessment (the Violence Risk Appraisal Guide) to nonforensic patients. Journal of Interpersonal Violence, 19(9), 1063–1074.

Hart, S. D., Cox, D. N., & Hare, R. D. (1995). The Hare PCL:SV Psychopathy Checklist: Screening Version. Toronto, Ontario, Canada: Multi-Health Systems.

Hart, S. D., Michie, C., & Cooke, D. J. (2007). Precision of actuarial risk assessment instruments. British Journal of Psychiatry, 190(Suppl. 49), s60–s65.

Hemphill, J. F., Hare, R. D., & Wong, S. (1998). Psychopathy and recidivism: A review. Legal and Criminological Psychology, 3, 139–170.

Hood, R., Shute, S., Feilzer, M., & Wilcox, A. (2002). Sex offenders emerging from long-term imprisonment: A study of their long-term reconviction rates and of parole board members’ judgments of their risk. British Journal of Criminology, 42(2), 371–394.

Howard, P., & Kershaw, C. (2000). Using criminal career data in evaluation. British Criminology Conference: Selected Proceedings, 3, available online at www.lboro.ac.uk/departments/ss/bsc/bccsp/vol03/howard.html

Kershaw, C., Goodman, J., & White, S. (1999). Reconvictions of offenders sentenced or discharged from prison in 1995. England and Wales: Development and Statistical Directorate (Home Office Research Findings, 19/99). London: Home Office.

Kroner, D. G., & Loza, W. (2001). Evidence for the efficiency of self-report in predicting nonviolent and violent crime recidivism. Journal of Interpersonal Violence, 16, 168–177.

Kroner, D. G., & Mills, J. F. (2001). The accuracy of five risk appraisal instruments in predicting institutional misconduct and new convictions. Criminal Justice and Behavior, 28, 471–489.

Kroner, D. G., Mills, J. F., & Reddon, J. R. (2005). A coffee can, factor analysis, and prediction of antisocial behaviour: The structure of criminal risk. International Journal of Law and Psychiatry, 28, 360–374.

McNeil, D. E., & Binder, R. L. (1994). Screening for risk of inpatient violence: Validation of an actuarial tool. Law and Human Behavior, 18, 579–586.

McNeil, D. E., Sandberg, D. A., & Binder, R. L. (1998). The relationship between confidence and accuracy in clinical assessment of psychiatric patients’ potential for violence. Law and Human Behavior, 22, 655–669.

Mills, J. F., & Kroner, D. G. (2006). The effect of discordance among violence and general recidivism risk estimates on predictive accuracy. Criminal Behaviour and Mental Health, 16, 155–166.

Morrissey, C., Hogue, T., Mooney, P., Allen, P., Johnston, S., Hollins, C., et al. (2007). Predictive validity of the PCL–R in offenders with intellectual disability in a high secure hospital setting: Institutional aggression. Journal of Forensic Psychiatry and Psychology, 18, 1–15.

Mossman, D. (1994). Assessing prediction of violence: Being accurate about accuracy. Journal of Consulting and Clinical Psychology, 62, 783–792.

Nicholls, T. L., Ogloff, J. R., Brink, J., & Spidel, A. (2005). Psychopathy in women: A review of its clinical usefulness for assessing risk for aggression and criminality. Behavioral Sciences and the Law, 23, 779–802.

Nicholls, T. L., Ogloff, J. R. P., & Douglas, K. S. (2004). Assessing risk for violence among male and female civil psychiatric patients: The HCR-20, PCL:SV, and VSC. Behavioral Sciences and the Law, 22, 127–158.

Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (1998). Violent offenders: Appraising and managing risk. Washington, DC: American Psychological Association.

Rasbash, J., Steele, F., & Browne, W. (2003). A user’s guide to MLwiN. London: University of London, Institute of Education, Centre for Multilevel Modeling.

Rice, M. (1997). Violent offender research and implications for the criminal justice system. American Psychologist, 52, 414–423.

Rutherford, M., Cacciola, J. S., Alterman, A. I., McKay, J. R., & Cook, T. G. (1999). The 2-year test-retest reliability of the Psychopathy Checklist—Revised in methadone patients. Assessment, 6(3), 285–291.

Salekin, R., Rogers, R., & Sewell, K. (1996). A review and meta-analysis of the Psychopathy Checklist and Psychopathy Checklist—Revised: Predictive validity of dangerousness. Clinical Psychology: Science and Practice, 3, 203–215.

Salekin, R., Rogers, R., & Sewell, K. (1997). Construct validity of psychopathy in a female offender sample: A multitrait–multimethod evaluation. Journal of Abnormal Psychology, 106(4), 576–585.

Salekin, R. T., Rogers, R., Ustad, K. L., & Sewell, K. W. (1998). Psychopathy and recidivism among female inmates. Law and Human Behavior, 22(1), 109–128.

Serin, R. C. (1996). Violent recidivism in criminal psychopaths. Law and Human Behavior, 20, 207–216.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

Simourd, D. J., & Hoge, R. D. (2000). Criminal psychopathy: A risk-and-need perspective. Criminal Justice and Behavior, 27, 256–272.

Snowden, R. J., Gray, N. S., Taylor, J., & MacCulloch, M. J. (2007). Actuarial prediction of violent recidivism in mentally disordered offenders. Psychological Medicine, 37, 1539–1549.

Stadtland, C., Hollway, M., Kleindienst, N., Dietl, J., Reich, U., & Nedopil, N. (2005). Risk assessment and prediction of violent and sexual recidivism in sex offenders: Long-term predictive validity of four risk assessment instruments. Journal of Forensic Psychiatry and Psychology, 16, 92–108.

Strand, S., & Belfrage, H. (2001). Comparison of HCR-20 scores in violent mentally disordered men and women: Gender differences and similarities. Psychology, Crime and Law, 7, 71–79.

Strand, S., & Belfrage, H. (2005). Gender differences in psychopathy in a Swedish offender sample. Behavioral Sciences and the Law, 23, 837–850.

Taylor, R. (1999). Predicting reconvictions for sexual and violent offenses using the Revised Offender Group Reconviction Scale (Home Office Research Findings, No. 104). London: Home Office.

Thornton, D., Mann, R., Webster, S., Blud, L., Travers, R., Friendship, C., & Erikson, M. (2003). Distinguishing and combining risks for sexual and violent recidivism. Annals of the New York Academy of Sciences, 989, 225–235.

Vitale, J. E., & Newman, J. P. (2001). Using the Psychopathy Checklist—Revised with female samples: Reliability, validity, and implications for clinical utility. Clinical Psychology: Science and Practice, 8, 117–132.

Warren, J. I., South, S. C., Burnette, M. C., Rogers, A., Friend, R., Bale, R., & Patten, I. V. (2005). Understanding the risk factors for violence and criminality in women: The concurrent validity of the PCL–R and HCR-20. International Journal of Law and Psychiatry, 28, 269–289.

Webster, C. D., Douglas, K. S., Eaves, D., & Hart, S. D. (1997). HCR-20: Assessing risk of violence (Version 2). Vancouver, British Columbia, Canada: Simon Fraser University, Mental Health Law and Policy Institute.

Webster, C. D., Eaves, D., Douglas, K., & Wintoup, A. (1995). The HCR-20 scheme: The assessment of dangerousness and risk. Vancouver, British Columbia, Canada: Simon Fraser University and Forensic Psychiatric Services Commission of British Columbia.

Received January 16, 2008
Revision received November 13, 2008
Accepted December 3, 2008
