Medical Decision Making, Vol. 4, No. 2, 1984
Statistical Approaches to the Analysis of Receiver Operating Characteristic (ROC) Curves
Barbara J. McNeil, M.D., Ph.D., and James A. Hanley, Ph.D.
In this article we review published and some unpublished work in statistical analyses of ROC curves. We describe both single and joint indices and indicate the approaches that have been taken to consider between-reader variations and correlations, within-reader variations, and variations and correlations between cases. We then discuss in detail a single index: the TP ratio at a fixed FP ratio (designated TP_FP), or the FP ratio at a fixed TP ratio (designated FP_TP). We show how to calculate confidence limits around any point on the curve; we further show, using the conventional Dorfman and Alf program and the jackknifing technique, how to calculate these confidence limits for multiple curves derived from the same sample of patients. (Med Decis Making 4:137-150, 1984)
Over the past 20 years investigators have proposed a number of indices to describe receiver operating characteristic (ROC) curves and have developed statistical techniques to compare two or more curves [1-3]. In this article we briefly review past work in this field and indicate an approach to the problem of comparing differences between two or more ROC curves at a single operating point in either the true positive (TP) or false positive (FP) dimension. We first review commonly used indices and discuss their statistical evaluation for experiments with unpaired and with paired designs.
This work was supported in part by a grant from the Hartford Foundation and the National Cancer Institute (CA 28847).
From the Departments of Radiology and Preventive Medicine and Clinical Epidemiology, Harvard Medical School and the Brigham and Women's Hospital, and the Department of Epidemiology and Health, McGill University. Please address requests for reprints to Dr. McNeil, Department of Radiology, Harvard Medical School, 25 Shattuck Street, Boston, Massachusetts 02115, USA.

Indices to Describe ROC Curves
AREA-RELATED MEASUREMENTS. Much recent work has involved the area under the ROC curve. When discrete rating data are used (e.g., 5-point or 6-point rating scales for radiology imaging experiments), and when ROC curves are assumed to be based on two underlying Gaussian distributions, a maximum likelihood estimation program by Dorfman and Alf [4] can be used to fit data points to a smooth curve and thereby to derive (along with other indices) the area under this fitted curve and its associated standard error. This area is designated A_z and ranges in value from 0.0 to 1.0.
If the ROC curve is drawn by connecting the pairs of observed TP and FP ratios, and if the trapezoidal areas are summed, the resulting nonparametric area is designated P(A). The common availability of the Dorfman and Alf program, underestimation of the area, and undue dependence on extreme points have decreased the use of the P(A) index for rating data.
When continuous data are available (as from chemistry laboratory tests, white cell counts, or individualized predictions from logistic regression or discriminant analysis) and ROC curves are created, no assumptions about the underlying distributions need be made to obtain area measurements. Instead, Bamber's recognition of the equivalence between the area under the ROC curve and the Wilcoxon statistic W allows immediate and direct calculation of W and hence of the area [5]. Hanley and McNeil's [6] derivation of a closed-form approximate expression for the standard error associated with the Wilcoxon statistic can be used to approximate the standard error of the area.
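As a minimal sketch of the continuous-data route just described, the area can be computed as the Mann-Whitney form of the Wilcoxon statistic (each abnormal-normal pair scores 1 if the abnormal value is higher and 1/2 for a tie), and its standard error from the Hanley-McNeil closed-form approximation. The function names are illustrative, and the sketch assumes that larger values point toward abnormality.

```python
from math import sqrt
from typing import Sequence


def wilcoxon_area(abnormal: Sequence[float], normal: Sequence[float]) -> float:
    """Area under the empirical ROC curve via the Mann-Whitney/Wilcoxon statistic.

    Each (abnormal, normal) pair contributes 1 if the abnormal value is larger,
    1/2 if the two are tied, and 0 otherwise; the area is the mean over pairs.
    Assumes larger values point toward abnormality.
    """
    score = 0.0
    for x in abnormal:
        for y in normal:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(abnormal) * len(normal))


def hanley_mcneil_se(area: float, n_abnormal: int, n_normal: int) -> float:
    """Closed-form approximate standard error of the area (Hanley and McNeil)."""
    q1 = area / (2.0 - area)             # two abnormals both ranked above one normal
    q2 = 2.0 * area ** 2 / (1.0 + area)  # one abnormal ranked above two normals
    variance = (
        area * (1.0 - area)
        + (n_abnormal - 1) * (q1 - area ** 2)
        + (n_normal - 1) * (q2 - area ** 2)
    ) / (n_abnormal * n_normal)
    return sqrt(variance)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    abnormal = [3.0, 4.0, 5.0]
    normal = [1.0, 2.0, 3.0]
    a = wilcoxon_area(abnormal, normal)
    print(round(a, 4), round(hanley_mcneil_se(a, len(abnormal), len(normal)), 4))
```

The double loop is kept for transparency; for large samples a rank-based computation gives the same area more quickly.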
SLOPE-RELATED AND INTERCEPT-RELATED INDICES. When ROC curves are assumed to be based on underlying Gaussian distributions, the expected ROC points should follow a straight line when plotted on binormal coordinate paper [2]. This assumption has led to the development of a series of indices related to the slope and intercept of the straight line fit to the observed ROC points. In general, these are derived after the observed data are fitted with the Dorfman and Alf program. The true slope of the line is designated b and its true intercept a (estimates of these are designated â and b̂; however, for simplicity, throughout this manuscript the estimate sign will be omitted and the symbols a and b will designate estimates); a divided by b is called Δm, an index commonly used in radiology phantom studies. Conceptually, Δm is a form of standardized difference between the means of two normal distributions (which may have different variances). Other derived indices are also available (see [2] for a complete summary). The Dorfman and Alf maximum likelihood program provides estimates for a and b, var(a), var(b), and covar(a, b). Either of the above pairs of indices (e.g., Δm and the slope, or a and b), or any other pair derived from them, is sufficient to describe fully a binormal ROC curve.
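Because a and b together determine the binormal curve, derived indices follow directly from them. The sketch below computes Δm = a/b and, as one further derived index, the area under the fitted binormal curve, A_z = Φ(a/√(1 + b²)), using the conventional normal-deviate form z(TP) = a + b·z(FP) with z = Φ⁻¹(p). The parameter values in the usage lines are the BWH estimates reported in Table 3; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def delta_m(a: float, b: float) -> float:
    """Delta-m: intercept divided by slope of the binormal line."""
    return a / b


def binormal_area(a: float, b: float) -> float:
    """A_z, the area under the binormal ROC curve z(TP) = a + b * z(FP)."""
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


def binormal_tp(a: float, b: float, fp: float) -> float:
    """TP fraction on the fitted curve at false-positive fraction fp (0 < fp < 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fp))


if __name__ == "__main__":
    a, b = 0.6665, 0.4316  # BWH estimates from Table 3
    print(f"delta_m = {delta_m(a, b):.3f}")
    print(f"A_z     = {binormal_area(a, b):.3f}")
    for fp in (0.05, 0.10, 0.20):
        print(f"TP at FP={fp:.2f}: {binormal_tp(a, b, fp):.3f}")
```

With a = 0 and b = 1 the fitted curve is the chance diagonal, which is a quick sanity check on any such implementation.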
Statistical Treatment: Area Index
UNPAIRED DATA. When the area index is used and we have only one [...] and comparing the critical ratio with the table of the normal distribution. The areas and associated standard errors can be obtained directly from the Dorfman and Alf program if the data consist of ratings, or from the Wilcoxon statistic if the data consist of measurements [6]. The Wilcoxon statistic can also be used to provide a closed-form expression for the standard error and thereby to estimate sample size and power.
PAIRED DATA. Statistical tests for paired comparisons are more difficult, whatever the index used. Swets and colleagues have provided a general expression to take into account three types of variances (and associated correlations) that may be present in a paired comparison: the variances and covariances induced by using the same cases; the variance induced by having a reader read the same set of cases more than once (within-reader consistency); and the variances and covariances induced by having multiple readers read the same set of cases (between-reader consistency) [1]. For the area index, Hanley and McNeil [7] suggested a more feasible method of calculating the correlation between scans induced by studying the same cases, a quantity that is otherwise difficult to assess.
SLOPE-INTERCEPT INDEX. Comparing ROC curves simultaneously on both their slopes and their intercepts can [...] only if there is a complete coincidence of the curves. (This coincidence is not necessary, as indicated below, for area measurements to be equal.) For unpaired comparison of slope-intercept pairs, Metz has developed a test statistic that follows a chi-square distribution if the two sets of [...] correlated slope-intercept pairs [16]; this is not yet generally available, however.
TP POINT INDEX. All of the above indices and resulting statistical techniques assume that the investigator is interested in an overall one-dimensional index of performance for the entire ROC curve, or in a description of the entire curve. Such may not always be the case, however. In some cases, for example, two ROC curves might cross, and although the areas for the two imaging modalities may be almost the same, in the clinical range of interest one may be superior to the other. In addition, even if two curves do not cross, one could imagine that differences would exist at one point (the clinically relevant one, perhaps) on the curve but would not be detected in any global test. In either of these two situations raw rating scale data may not allow direct comparisons to be made, since it is unusual for identical interpretive criteria (i.e., identical FP ratios) to exist in different experiments. In particular, this means that the observed TP ratio on one curve at a particular criterion cannot be directly compared with that on another curve, because the associated FP ratios observed may be different.
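The area comparisons described above reduce to a critical ratio referred to the normal table. A minimal sketch follows; for paired data it includes a correlation term of the kind Hanley and McNeil [7] proposed estimating, z = (A_1 − A_2)/√(SE_1² + SE_2² − 2r·SE_1·SE_2), with r = 0 for independent samples. How r is estimated is outside this sketch, and the areas and standard errors in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def area_critical_ratio(area1: float, se1: float, area2: float, se2: float,
                        r: float = 0.0) -> float:
    """Critical ratio for the difference between two ROC areas.

    r is the correlation between the two area estimates induced by using the
    same cases; r = 0 corresponds to independent (unpaired) samples.
    """
    se_diff = sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
    return (area1 - area2) / se_diff


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal table."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical areas and standard errors, for illustration only.
    z_unpaired = area_critical_ratio(0.90, 0.03, 0.80, 0.04)
    z_paired = area_critical_ratio(0.90, 0.03, 0.80, 0.04, r=0.5)
    print(round(z_unpaired, 2), round(two_sided_p(z_unpaired), 4))
    print(round(z_paired, 2), round(two_sided_p(z_paired), 4))
```

A positive correlation shrinks the standard error of the difference, so treating a paired design as unpaired (r = 0) yields a conservative test.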
Methods
In this section we elaborate an approach to comparing differences between two ROC curves at one point (either TP or FP). This is also part of a comprehensive computer package being prepared by Metz [10]. Our approach is based on fitted TP and FP ratios, obtained from fitted parameters from the Dorfman and Alf maximum likelihood estimation program for rating-based ROC curves [4]. The method yields confidence intervals around true positive ratios at a fixed false positive ratio (designated TP_FP) or around false positive ratios at a fixed true positive ratio (designated FP_TP). Illustrative examples are presented.
The maximum likelihood estimation program of Dorfman and Alf provides parameters that allow calculation of TP ratios at any FP ratio and thus provides the basis for comparing two ROC curves at either the same TP ratio or the same FP ratio. In brief, the relevant outputs for this purpose are: (1) a, the normal deviate value of the intercept of the ROC curve with the Y axis; and (2) b, the slope of the ROC curve obtained from ROC curves plotted on normal deviate axes. The equation for this purpose in normal deviate space (Z) for the TP ratio is:
$$Z_{TP} = b\,Z_{FP} - a \qquad (2)$$
The quantity Z_TP on the Z scale can be converted to TP on the 0-100% scale by determining what percentage of the normal probability distribution lies above (i.e., to the right of) Z_TP.
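Equation (2) and the conversion rule can be sketched in a few lines. In this document's convention a Z value corresponds to the proportion of the standard normal distribution lying to its right, so Z_FP for a chosen FP ratio is the upper-tail deviate (for example, about 1.645 for FP = 5 percent) and TP is read back as the upper-tail area at Z_TP. The helper names are hypothetical; the inverse, FP at a fixed TP (FP_TP), simply solves equation (2) for Z_FP.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def upper_tail_z(p: float) -> float:
    """Normal deviate with proportion p of the distribution to its right."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def upper_tail_p(z: float) -> float:
    """Proportion of the standard normal distribution lying above z."""
    return 1.0 - _STD_NORMAL.cdf(z)


def tp_at_fixed_fp(a: float, b: float, fp: float) -> float:
    """TP_FP: equation (2), Z_TP = b * Z_FP - a, converted back to a proportion."""
    z_tp = b * upper_tail_z(fp) - a
    return upper_tail_p(z_tp)


def fp_at_fixed_tp(a: float, b: float, tp: float) -> float:
    """FP_TP: equation (2) solved for Z_FP = (Z_TP + a) / b."""
    z_fp = (upper_tail_z(tp) + a) / b
    return upper_tail_p(z_fp)


if __name__ == "__main__":
    print(round(upper_tail_z(0.05), 3))   # deviate used for FP = 5 percent
    print(round(upper_tail_p(-0.48), 3))  # Z_TP = -0.48 is a TP of about 68 percent
```

Under this convention, a more negative Z_TP means a higher TP, which is worth keeping in mind when confidence limits are mapped back to percentages.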
The Dorfman and Alf output also provides the variance and covariance terms var(a), var(b), and covar(a, b). These can be used to calculate the sampling variance, or uncertainty, of Z_TP and thus of TP itself. This is done in two steps. First we calculate a confidence interval for Z_TP (in the Z scale), and second we transform the confidence interval back into the usual 0-100% scale. The relevant equation for calculating the variance around a Z_TP ratio is:
$$\operatorname{var}(Z_{TP}) = Z_{FP}^{2}\,\operatorname{var}(b) + \operatorname{var}(a) - 2\,Z_{FP}\,\operatorname{covar}(a, b) \qquad (3)$$
The confidence interval for Z_TP becomes Z_TP ± some multiple of the SE of Z_TP. If the sample is large, [...] should have Gaussian distributions, regardless of the underlying models. (Metz [...] has shown empirically that a and b have distributions that are reasonably near normal even for n as low as [...].) Once one has obtained the confidence interval for Z_TP, it is a simple matter of translating the upper and lower limits for TP, using the normal probability tables. Generally, the [...]
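The two-step procedure (a confidence interval on the Z scale from the variance of Z_TP = b·Z_FP − a, namely Z_FP²·var(b) + var(a) − 2·Z_FP·covar(a, b), followed by conversion of its limits with the normal table) can be sketched as follows. The usage lines plug in the quantities from the worked example in this section (FP = 5 percent, so Z_FP = 1.645; Z_TP = −0.48; and 0.0467, 0.0974, and 0.0478 taken as var(b), var(a), and covar(a, b) to match the terms of that variance). The function names, and the use of the exact 97.5th percentile rather than 1.96, are choices of this sketch.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def var_z_tp(z_fp: float, var_a: float, var_b: float, covar_ab: float) -> float:
    """Sampling variance of Z_TP = b * Z_FP - a."""
    return z_fp ** 2 * var_b + var_a - 2.0 * z_fp * covar_ab


def tp_confidence_limits(z_tp, z_fp, var_a, var_b, covar_ab, level=0.95):
    """Return (SE of Z_TP, lower TP limit, upper TP limit), limits as proportions."""
    se = sqrt(var_z_tp(z_fp, var_a, var_b, covar_ab))
    multiple = _STD_NORMAL.inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95 percent
    z_low, z_high = z_tp - multiple * se, z_tp + multiple * se
    # TP is the area to the right of Z, so the larger Z gives the lower TP limit.
    tp_lower = 1.0 - _STD_NORMAL.cdf(z_high)
    tp_upper = 1.0 - _STD_NORMAL.cdf(z_low)
    return se, tp_lower, tp_upper


if __name__ == "__main__":
    se, lo, hi = tp_confidence_limits(z_tp=-0.48, z_fp=1.645,
                                      var_a=0.0974, var_b=0.0467, covar_ab=0.0478)
    print(f"SE(Z_TP) = {se:.4f}; TP limits = {lo:.0%} to {hi:.0%}")
```

With these inputs the sketch reproduces an SE of about 0.2579 and TP limits of about 49 and 84 percent, matching the hand calculation.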
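$$\operatorname{var}(b_1 - b_2) = \operatorname{var}(b_1) + \operatorname{var}(b_2) - 2\,\operatorname{covar}(b_1, b_2)$$
$$\operatorname{var}(a_1 - a_2) = \operatorname{var}(a_1) + \operatorname{var}(a_2) - 2\,\operatorname{covar}(a_1, a_2)$$
$$\operatorname{covar}(b_1 - b_2,\, a_1 - a_2) = \operatorname{covar}(b_1, a_1) - \operatorname{covar}(b_1, a_2) - \operatorname{covar}(b_2, a_1) + \operatorname{covar}(b_2, a_2)$$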
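For two curves fitted to the same patients, Z_1 − Z_2 = (b_1 − b_2)Z_FP − (a_1 − a_2), so its variance combines var(b_1 − b_2), var(a_1 − a_2), and covar(b_1 − b_2, a_1 − a_2) in the same way that the single-curve variance combines var(b), var(a), and covar(a, b). The sketch below computes these components from a 4 × 4 covariance matrix of (a_1, b_1, a_2, b_2), of the kind a paired-binormal fit would report; the matrix layout and the function names are assumptions. The usage lines plug in the subcomponents reported in the history example later in the paper (0.3820, 1.0416, and 0.5877 at FP = 10 percent, Z_FP = 1.28).

```python
from math import sqrt
from typing import Sequence, Tuple

# Index positions in the assumed covariance-matrix layout (a1, b1, a2, b2).
A1, B1, A2, B2 = 0, 1, 2, 3


def difference_components(cov: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return var(b1 - b2), var(a1 - a2), covar(b1 - b2, a1 - a2)."""
    var_b_diff = cov[B1][B1] + cov[B2][B2] - 2.0 * cov[B1][B2]
    var_a_diff = cov[A1][A1] + cov[A2][A2] - 2.0 * cov[A1][A2]
    covar_diff = cov[B1][A1] - cov[B1][A2] - cov[B2][A1] + cov[B2][A2]
    return var_b_diff, var_a_diff, covar_diff


def var_z_difference(z_fp: float, var_b_diff: float, var_a_diff: float,
                     covar_diff: float) -> float:
    """Variance of Z1 - Z2 = (b1 - b2) * Z_FP - (a1 - a2)."""
    return z_fp ** 2 * var_b_diff + var_a_diff - 2.0 * z_fp * covar_diff


if __name__ == "__main__":
    v = var_z_difference(1.28, 0.3820, 1.0416, 0.5877)
    print(f"var = {v:.4f}, SE = {sqrt(v):.2f}")
```

Dividing the observed Z_1 − Z_2 by the resulting SE gives the critical ratio that is referred to the normal table.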
The second method is useful if Metz's program is unavailable. We can approximate var(Z_1 - Z_2) by the method of jackknifing (see Fleiss [11] for a general introduction to jackknifing, and Efron [12], equations 6.11 and 6.17, for the jackknife variance in two-sample problems). When the rating data come from n_N normals and n_A abnormals, the jackknife method consists of obtaining n_N + n_A different estimates of Z_1 - Z_2 and using the quantity
$$\sum_{i=1}^{n_N + n_A} \left[ (Z_1 - Z_2) - (Z_{1(i)} - Z_{2(i)}) \right]^2 \qquad (6)$$
as the jackknife variance of (Z_1 - Z_2). The quantity (Z_1 - Z_2) is obtained from the entire data set. The ith jackknife estimate (Z_1(i) - Z_2(i)) is obtained by fitting two separate ROC curves to the data set (of n_N + n_A - 1 subjects) formed by deleting the paired ratings of the ith subject from the original data set. Although this may sound computer-intensive, the number of pairs of ROC curves to be fitted depends on the number of rating categories and not on the number of patients. For example, data on a five-point rating scale will involve at most 10 distinct values for each modality; the n_N + n_A quantities being summed in equation (6) will therefore occur in multiples.
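The bookkeeping behind the jackknife sum, including the observation that subjects with identical rating patterns yield identical leave-one-out estimates, can be sketched generically. Here `estimator` is a stand-in for "fit both ROC curves and return Z_1 − Z_2" (for instance, two runs of the Dorfman and Alf program), and `toy_estimator` is a purely hypothetical substitute used only to make the sketch runnable. Each subject is represented by an assumed tuple of (status, rating in modality 1, rating in modality 2), so each distinct pattern is refitted once and its squared deviation weighted by its count. The sum follows equation (6) as printed, without further scaling.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Subject = Tuple[str, int, int]  # (status "ND"/"D", rating 1, rating 2): assumed layout


def jackknife_variance(subjects: Sequence[Subject],
                       estimator: Callable[[List[Subject]], float]) -> float:
    """Sum over subjects of (full estimate - leave-one-out estimate) squared."""
    full = estimator(list(subjects))
    total = 0.0
    for pattern, count in Counter(subjects).items():
        reduced = list(subjects)
        reduced.remove(pattern)          # delete one subject with this pattern
        deviation = full - estimator(reduced)
        total += count * deviation ** 2  # identical patterns give identical terms
    return total


def toy_estimator(subjects: List[Subject]) -> float:
    """Hypothetical stand-in for Z1 - Z2: difference between modalities in the
    fraction of diseased subjects rated 4 or higher (not the paper's estimator)."""
    diseased = [s for s in subjects if s[0] == "D"]
    frac1 = sum(s[1] >= 4 for s in diseased) / len(diseased)
    frac2 = sum(s[2] >= 4 for s in diseased) / len(diseased)
    return frac1 - frac2


if __name__ == "__main__":
    # Hypothetical rating patterns, for illustration only.
    data = [("ND", 1, 2)] * 5 + [("ND", 2, 3)] * 4 + [("D", 4, 3)] * 2 + [("D", 5, 5)] * 3
    print(jackknife_variance(data, toy_estimator))
```

Grouping by pattern gives exactly the same sum as deleting each subject in turn; it only avoids repeating identical fits.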
EXTENSION OF THE METHOD TO MULTIPLE READERS. As we have done so far, we still base the analysis on equation (4), but now need to include between-reader (S²_br) and within-reader (S²_wr) variances. For this purpose we use equation (5) (Chapter 4) from Swets [1], as shown here:
$$SE_{\mathrm{diff}} = \left\{ 2\left[ \frac{S^2_{br+wr}\,(1 - r_{br})}{\ell} + S^2_{c+wr}\,(1 - r_{c+wr}) - S^2_{wr} \right] \right\}^{1/2} \qquad (7)$$
where
r_br: the observable correlation between the Z_TP's obtained when a set of readers reads the same cases in the two settings
r_c+wr: the observable correlation between the Z_TP's obtained when a single reader reads the same set of cases in two settings
S²_c+wr = S²_c + S²_wr: the observable variance in Z_TP that would be found by having one reader read once each of a set of different case samples
S²_br+wr = S²_br + S²_wr: the observable variance in Z_TP that would be found by having one case sample read once by each of a set of different readers
S²_wr: the observable variance in Z_TP that would be found by having one reader read one case sample on two or more independent occasions
ℓ: the number of independent readers
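The multiple-reader standard error is straightforward to evaluate once its variance and correlation components have been estimated. The sketch below assumes the form SE_diff = {2[S²_br+wr(1 − r_br)/ℓ + S²_c+wr(1 − r_c+wr) − S²_wr]}^{1/2}, with the symbols defined above; the function name is illustrative and the input values in the usage lines are hypothetical.

```python
from math import sqrt


def se_difference_multireader(s2_br_wr: float, r_br: float,
                              s2_c_wr: float, r_c_wr: float,
                              s2_wr: float, n_readers: int) -> float:
    """SE of the difference in mean Z_TP between two settings, assuming
    SE = {2[S2_br+wr (1 - r_br) / l + S2_c+wr (1 - r_c+wr) - S2_wr]}^(1/2)."""
    variance = 2.0 * (s2_br_wr * (1.0 - r_br) / n_readers
                      + s2_c_wr * (1.0 - r_c_wr)
                      - s2_wr)
    if variance < 0.0:
        raise ValueError("components imply a negative variance; check the estimates")
    return sqrt(variance)


if __name__ == "__main__":
    # Hypothetical component estimates, for illustration only.
    for readers in (1, 3, 6):
        se = se_difference_multireader(0.20, 0.4, 0.30, 0.5, 0.05, readers)
        print(readers, round(se, 3))
```

Under this form only the between-reader term shrinks as readers are added, so beyond some point extra readers buy little precision compared with extra cases.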
Table 1. ROC Curve Obtained on a Five-Point Rating Scale
[Table body illegible in the source scan.]
Equation (3) gives the variance from which confidence limits about this value follow:
$$\operatorname{var}(Z_{TP}) = (1.645)^2(0.0467) + 0.0974 - 2(1.645)(0.0478) = 0.0665$$
$$SE(Z_{TP}) = 0.2579$$
The 95 percent confidence interval for Z_TP is thus -0.48 ± 1.96(0.2579), or (-0.99 to 0.03). Using the same method used to transform Z = -0.48 to a rounded TP value of 68 percent, we can transform Z = -0.99 and 0.03 back to upper and lower TP limits of 84 percent and 49 percent. Table 2 summarizes the standard errors obtained in this way for three points on the above ROC curve.
COMPARING TWO CURVES (UNPAIRED DATA) AT ONE TP_FP. Table 3 displays two sets of rating data for previously published results on gallium imaging in the search for a focal source of sepsis [13]. The top set of data was obtained at the Brigham and Women's Hospital (BWH) using an Anger camera, and the bottom set at Johns Hopkins Hospital (JHH) using a rectilinear scanner. The output from the Dorfman and Alf program is shown in Table 3 and allows calculation, as shown above, of Z_TP (or TP values) for each of the two settings. They are as follows: [...]

Table 3. Rating Data for an Unpaired Experiment [13]
[Rating frequencies for nondiseased and diseased patients at BWH and JHH are illegible in the source scan.]
BWH: a = 0.6665, b = 0.4316, var(a) = 0.07234, var(b) = 0.03639, covar(a, b) = 0.0163
JHH: [estimates illegible in the source scan]

[Table: maximum likelihood estimates of the ROC curve parameters (1 = with history; 2 = without history), with their variances and covariances; values illegible in the source scan.]

Table 4. Rating Data on CT Scans from a Paired Experiment [14]
[Ratings read with history cross-tabulated against ratings read without history, for nondiseased and diseased patients; counts illegible in the source scan.]
The three subcomponents, calculated as for equation (5), yield values of 0.3820, 1.0416, and 0.5877, so that equation (5) yields
$$\operatorname{var}(Z_1 - Z_2) = (1.28)^2(0.3820) + 1.0416 - 2(1.28)(0.5877) = 0.1630$$
$$SE(Z_1 - Z_2) = \sqrt{0.1630} = 0.40$$
Thus the critical ratio is -0.90/0.40, or 2.25 in absolute value, indicating that at FP = 10% the TP's obtained with history are statistically higher.
With the jackknife technique, the paired ratings of successive patients were eliminated, as described in the Appendix, to create 54 + 35 = 89 different data sets, each with 88 patients. Using the steps in the Appendix, we obtained the jackknife estimate SE(Z_1 - Z_2) = √0.1966 = 0.44, which is only 10 percent higher than the more parametric SE of 0.40 calculated by the "paired binormal" model of Metz.
Discussion
The work was motivated by two concerns with the use of area indices for comparing two ROC curves: (1) that two curves might cross, and in such cases similar areas might result; and (2) that even in the absence of crossing curves, similar areas might result when, in fact, statistical differences could exist in the region of clinical interest. These concerns pointed to the need to make comparisons at single points on either the TP or the FP axis. In the process of developing the analysis discussed here we realized that there was a need to provide a brief overview of commonly used indices for ROC analysis, whether paired or unpaired experimental designs were used.
The major point of our review and analysis is this: once we assume binormal distributions underlying an ROC curve, all statistical properties are determined by the parameters a and b of the maximum likelihood fit to the data. This paper has emphasized the use of these parameters for calculating confidence limits around single TP or FP points anywhere along the ROC curve. Others, particularly Swets and Pickett [1], have discussed confidence limits explicitly in relation to area measurements and to TP points corresponding to observed TP-FP pairs.
To put this work in perspective it is worthwhile to summarize previous work in the general area of statistical analyses of ROC curves. The work falls along two lines: (1) the use of a single index (for example, the area or TP_FP) versus joint indices (for example, slope and intercept); and (2) consideration of between-reader variations and correlations, within-reader variations, and variations and correlations between cases.
Swets and Pickett [1] give formulas for single indices that cover all possible experimental designs. Hanley and McNeil [6] elaborated on statistical considerations relating to a specific single index, namely the area; because of the unique relationship of the area to the Wilcoxon statistic, it is possible to calculate explicitly the associated standard error due to case sampling.
The second paper of Hanley and McNeil [7] also deals with areas, but for ROC curves derived from the same set of cases. It thus complements the work of [...] and the general formulas suggested by Swets and Pickett [1]. The present investigation deals with another single index, this time TP_FP, and gives explicit formulas for confidence intervals at any point on a single ROC curve [...]

Appendix
To obtain var(Z_1 - Z_2), where Z_1 and Z_2 refer to the two TP's obtained from the same set of patients evaluated by two modalities:
(1) Data from Table 4 are used to fit separate ROC curves for each modality (ND = nondiseased; D = diseased). Note that the Z_1 and Z_2 values here are derived from two separate runs of the Dorfman and Alf program, whereas those shown earlier in connection with Table 4 were estimated jointly from a single run of the Metz program. Thus, they are slightly different.
[Table of fitted Z_1, Z_2, and Z_1 - Z_2 by rating with and without history; values illegible in the source scan.]
(2) The new data sets are obtained by leaving out successive patients. Delete a nondiseased patient who received ratings of 2 with history and 3 without history.
[Row of Z_1, Z_2, and Z_1 - Z_2 values illegible in the source scan.]
Delete nondiseased patients with ratings of 1 with history and [...] without history; there are five such patients.
[Appendix table of Z_1, Z_2, and Z_1 - Z_2 for the jackknife data sets, by rating with and without history for nondiseased (ND) and diseased (D) patients; values illegible in the source scan.]
Delete nondiseased patients with ratings of 2 with history and [...] without history; there are four such patients. Notice that the four Z_1 values for these last four patients are each equal to the Z_1 value for the very first patient. Similarly, the Z_2 values for these four are equal to the Z_2 values for the previous five patients.
[Row of Z_1, Z_2, and Z_1 - Z_2 values illegible in the source scan.]
Delete diseased patients with ratings of [...] with history and 3 without history; there are two such patients.
[...]

References
[References 1-3 illegible in the source scan.]
4. Dorfman DD, Alf E Jr: Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating method data. J Math Psychol 6:487-496, 1969
5. Bamber D: The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol 12:387-415, 1975
6. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36, 1982
7. Hanley JA, McNeil BJ: Method for comparing the area under two ROC curves derived from the same cases. Radiology 148:839-843, 1983
8. Metz CE, Kronman HB: Statistical significance tests for binormal ROC curves. J Math Psychol 22:218-243, 1980
9. Metz CE, Kronman HB: A test for the statistical significance of differences between ROC curves. Inserm 88:647-660, 1979
10. Metz CE, Wang P-L, Kronman HB: A new approach for testing the significance of differences between ROC curves measured from correlated data. In Deconinck F, ed: Information Processing in Medical Imaging VIII. The Hague: Martinus Nijhoff, 1984
11. Fleiss J, Davis M: Jackknifing functions of multinomial frequencies, with an application to a measure of concordance. Am J Epidemiol 115:841-845, 1982
12. Efron B: Bootstrap methods: another look at the jackknife. Ann Stat 7:1-26, 1979
13. McNeil BJ, Sanders R, Alderson PO, et al: A prospective study of computed tomography, ultrasound and gallium imaging in patients with fever. Radiology 139:647-653, 1981
14. McNeil BJ, Hanley JA, Funkenstein HH, Wallman J: The use of paired ROC curves in studying the impact of history on radiography interpretation: CT of the head as a case study. Radiology 149:75-77, 1983