The Lung Image Database Consortium (LIDC): A Comparison of Different Size Metrics for Pulmonary Nodule Measurements¹

Anthony P. Reeves, PhD, Alberto M. Biancardi, PhD, Tatiyana V. Apanasovich, PhD, Charles R. Meyer, PhD, Heber MacMahon, MD, Edwin J.R. van Beek, MD, Ella A. Kazerooni, MD, MS, David Yankelevitz, MD, Michael F. McNitt-Gray, PhD, Geoffrey McLennan, MD, PhD, Samuel G. Armato III, PhD, Claudia I. Henschke, PhD, MD, Denise R. Aberle, MD, Barbara Y. Croft, PhD, Laurence P. Clarke, PhD

Rationale and Objectives. The goal was to investigate the effects of choosing between different metrics in estimating the size of pulmonary nodules as a factor both of nodule characterization and of performance of computer-aided detection systems, because the latter are always qualified with respect to a given size range of nodules.

Materials and Methods. This study used 265 whole-lung CT scans documented by the Lung Image Database Consortium (LIDC) using their protocol for nodule evaluation. Each inspected lesion was reviewed independently by four experienced radiologists who provided boundary markings for nodules larger than 3 mm. Four size metrics, based on the boundary markings, were considered: a unidimensional and two bidimensional measures on a single image slice and a volumetric measurement based on all the image slices. The radiologist boundaries were processed and those with four markings were analyzed to characterize the interradiologist variation, while those with at least one marking were used to examine the difference between the metrics.

Results. The processing of the annotations found 127 nodules marked by all of the four radiologists and an extended set of 518 nodules each having at least one observation with three-dimensional sizes ranging from 2.03 to 29.4 mm (average 7.05 mm, median 5.71 mm). A very high interobserver variation was observed for all these metrics: 95% of estimated standard deviations were in the following ranges for the three-dimensional, unidimensional, and two bidimensional size metrics, respectively (in mm): 0.49–1.25, 0.67–2.55, 0.78–2.11, and 0.96–2.69. Also, a very large difference among the metrics was observed: 0.95 probability-coverage region widths for the volume estimation conditional on unidimensional, and the two bidimensional size measurements of 10 mm were 7.32, 7.72, and 6.29 mm, respectively.

Conclusions. The selection of data subsets for performance evaluation is highly impacted by the size metric choice. The LIDC plans to include a single size measure for each nodule in its database. This metric is not intended as a gold standard for nodule size; rather, it is intended to facilitate the selection of unique repeatable size-limited nodule subsets.

Key words. Quantitative image analysis; X-ray CT; detection; lung nodule annotation; size metrics.

© AUR, 2007. Acad Radiol 2007; 14:1475–1485. doi:10.1016/j.acra.2007.09.005

¹ From the School of Electrical and Computer Engineering, Rhodes Hall (A.P.R., A.M.B.), and Operational Research and Information Engineering (T.V.A.), Cornell University, Ithaca, NY 14853; Department of Radiology, The University of Michigan, Ann Arbor, MI, USA (C.R.M., E.A.K.); Department of Radiology, The University of Chicago, Chicago, IL (H.M., S.G.A.); Department of Radiology, University of Iowa, Iowa City, IA, USA (E.J.R.v.B.); Weill Medical College, Cornell University, New York, NY (D.Y., C.I.H.); David Geffen School of Medicine at UCLA, Los Angeles, CA (M.F.M.-G., D.R.A.); Medicine and Biomedical Engineering, University of Iowa, Iowa City, IA (G.M.); and Cancer Imaging Program, National Cancer Institute, Bethesda, MD (B.Y.C., L.P.C.). This research was funded in part by the National Institutes of Health, National Cancer Institute, Cancer Imaging Program by the following grants: R33 CA101110, R01 CA078905, 1U01 CA 091085, 1U01 CA 091090, 1U01 CA 091099, 1U01 CA 091100, and 1U01 CA 091103. Address correspondence to: A.P.R. e-mail: [email protected]
REEVES ET AL Academic Radiology, Vol 14, No 12, December 2007

Accurate and reliable measurement of pulmonary nodule size from CT scans has an important role in computer-assisted evaluation of lung lesions. It is a key factor in the diagnosis of lung cancer, as the estimation of nodule growth rates serves as a predictor of malignancy; size change can also be used to assess the efficacy of a therapeutic treatment. Additionally, nodule sizing is a critical aspect of computer-assisted diagnosis (CAD) systems, and in particular their detection subsystems, because they are always qualified with respect to a given size range of nodules.

The usual approach is to characterize a system based on its performance on a subset of images from a documented image database having a specified size range. In the context of spatial extent for a three-dimensional (3D) object without restrictions on shape, the size is best expressed by the volume occupied by that object. However, other important considerations in choosing a method for estimation of size include the imaging modality and the time available to the physician.

Image modalities may be only two-dimensional (2D) or highly anisotropic with regard to the third dimension. Manually measuring the lesion volume involves inspecting all images that include the lesion, a process that is very time consuming. To provide a standard method for lesion size measurement, the World Health Organization (WHO) proposed in 1979 the use of the product of the maximal diameter and its largest perpendicular (1), while the Response Evaluation Criteria in Solid Tumors (RECIST) working group in 1998–2000 proposed the use of the (unidimensional) maximal diameter as a more efficient standard estimator of lesion volume (2).

Another measure that has been seen in some present studies (3, 4) is the bidimensional Modified Schwartz equation (MS), first introduced by Usuda et al. in 1994 (5). This measure is similar to the WHO measure in using the maximal diameter and its largest perpendicular but differs from WHO because the lesion is assumed to be an ellipsoid with a long axis equal to the lesion maximal diameter and with equal-length short axes equal to the largest perpendicular.

Addressing the problem from a different perspective is the active interest in developing computer-assisted methods that will aid the physician in measuring the size of lesions using volumetric methods (6–11). The challenge here is being able to calibrate and validate such methods. Currently, for images of real lesions the only accepted method to establish their size is based on annotations performed by expert radiologists.

In 2000, the National Institutes of Health launched a cooperative effort, known as the Lung Image Database Consortium (LIDC) (12), to construct a set of annotated lung images, especially low-dose helical CT scans of adults screened for lung cancer, and related technical and clinical data, for the development, testing, and evaluation of different computer-aided cancer screening and diagnosis technologies (13). The LIDC developed a pulmonary nodule documentation process (14, 15) where expert radiologists marked the visible lesion boundary belonging to each lesion in all of the relevant axial images.

The LIDC annotation process did not require the expert radiologist to provide either unidimensional or bidimensional measurement, a technique commonly used in clinical practice. However, given the full boundary of the lesion as marked by the radiologist, we used computer algorithms to apply the rules from RECIST (unidimensional) and WHO (bidimensional) to provide estimates for the largest diameter and the largest perpendicular measures. We also computed volumetric-based measurements by processing the boundary documentations.

This report addresses primarily the study of the absolute size of nodules, especially given its relevance to the CAD community, in two ways: by analyzing the variation between the expert markings and by comparing the fully 3D volume measurements with the estimated RECIST, WHO, and MS measurement methods, which involve a single 2D image.

MATERIALS AND METHODS

The evaluation of the impact of different size metrics was carried out on 265 documented whole-lung CT scans, of which 197 had nodules documented with radiologists’ boundaries. All of the 197 scans were acquired from multidetector row CT scanners with pixel size ranging from 0.508 to 0.946 mm (average 0.66 mm) and an axial slice thickness ranging from 0.625 to 3.000 mm (average 1.7 mm, median 1.8 mm). The tube current ranged from 40 to 582 mA (average 177.5 mA, median 160 mA); tube voltage was 120 kVp for more than half of the cases, with the remaining ones having voltages equal to 130 kVp (n = 8), 135 kVp (n = 23), and 140 kVp (n = 28). All the processing was performed on anonymized data, devoid of any identifying information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule (14, 15), that was provided by the LIDC institutions after approval by their respective IRBs.

As per the LIDC process model (15, 16), each scan was assessed by four experienced thoracic radiologists, using calibrated monitors with magnification capabilities and detailed reading rules (e.g., initial window and level settings of 1500 HU and −500 HU, as described earlier [16]). Within those rules, the radiologists were instructed to mark the entire boundary, in all the relevant axial scan images, of all the nodules they estimated to be greater than 3 mm in diameter (for smaller nodules, which were not included in this study, just the central location was requested).

The outer boundary was chosen to be made of those pixels that were just outside the region of the nodule. As some nodules have cavities or holes in them, radiologists could also draw inner boundaries when they wanted to express the fact that a portion of the surrounding region did not belong to the actual nodule.

All the markings are stored as families of sets of boundary pixels located in the axial image planes. A typical marking on a single image is illustrated in Figure 1, where four radiologist-drawn boundaries are superimposed onto the CT scan image. Figure 2 shows in more detail the LIDC rules regarding the definition of the lesion region. Here, the original scan image is above while the marked image is below, with the marker’s boundary points shown in white and the region designating the marked lesion shown in black. The National Cancer Imaging Archive (NCIA) repository of the National Cancer Institute (17) makes available the CT scans that have been fully documented by the LIDC together with XML documentation files that contain the boundary points chosen by all of the expert radiologists.

Figure 1. An example of a single image section of the markings provided by the LIDC database.

All the scans and the XML files that were available as of this writing were imported and parsed to extract the radiologists’ markings represented by outline information. For each nodule, the markings for each radiologist were converted, according to the LIDC process rules (e.g., the black inner region in Figure 2B), into 3D binary occupancy images on which size measurements could be made. For example, a nodule that had been marked by just three of the four radiologists would have three corresponding 3D binary images, one for each radiologist.

Figure 2. An example of the LIDC rules in documenting nodules. Above (a), the original image data is presented. Below (b), the white boundary shows the actual boundary drawn by the radiologist that encloses the black inner region belonging to the nodule.

The 3D images were used to compute the key values for the metrics under investigation: nodule volume, largest diameter, and its largest perpendicular.

The total lesion volume is estimated by counting the number of nodule pixels in each of the image slices and then multiplying their sum by the voxel volume (18); this method is frequently used in CAD tools. Pixels belonging to the excluded inner regions do not belong to the nodule region and therefore are not counted when computing the nodule volume.
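The voxel-counting estimate described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `nodule_volume_mm3`, the nested-list mask layout, and the pixel size and slice thickness values are all assumptions chosen for the example.

```python
def nodule_volume_mm3(mask, pixel_size_mm, slice_thickness_mm):
    """Estimate volume from a 3D binary occupancy image by voxel counting.

    mask is a list of axial slices; each slice is a list of rows of 0/1
    values, where 1 marks a nodule pixel. Excluded inner regions (holes)
    are assumed to be 0 already, so they are not counted.
    """
    voxel_volume = pixel_size_mm * pixel_size_mm * slice_thickness_mm
    n_voxels = sum(sum(sum(row) for row in image) for image in mask)
    return n_voxels * voxel_volume


# Hypothetical two-slice nodule; the first slice has a one-pixel hole.
example_mask = [
    [[1, 1, 1],
     [1, 0, 1],
     [1, 1, 1]],
    [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 0]],
]

# Assumed acquisition geometry: 0.5 mm pixels, 2.0 mm slices.
volume = nodule_volume_mm3(example_mask, pixel_size_mm=0.5, slice_thickness_mm=2.0)
print(volume)  # 13 voxels * 0.5 mm^3 per voxel = 6.5
```

Because the hole pixel is stored as 0, it drops out of the count without any special handling, which mirrors the rule that excluded inner regions do not contribute to the volume.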

The largest diameter is determined as the maximum diameter, that is, the largest rectilinear segment that completely lies within the nodule region (Fig. 3), among all the axial-planar subsets of the nodule; this measure is similar to that used in RECIST (2). The computer estimation of the RECIST measurement is illustrated in Figure 3A. The solid line is the largest diameter that can be placed in any axial image within the marked boundary. When lesions have cavities or holes in them, as is possible in the LIDC database, additional care must be taken to estimate the largest diameter.

As a hypothetical example, if the radiologist had marked the pixels within the nodule shown white with an “X” in Figure 3B, this would imply a hole for that region that the radiologist wished to exclude. For the RECIST criterion, the diameter must be within the lesion and not include the cavity; therefore, the diameter shown by the solid line would be the newly computer-determined RECIST measurement for that lesion.
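One way to picture the "segment must stay inside the region" rule is a brute-force search over pixel pairs in a single axial slice. The sketch below is an illustration under stated assumptions, not the algorithm used in the study: the helper names `segment_inside` and `largest_diameter`, the sampling of the segment at half-pixel steps, and the pixel size are all hypothetical, and the quadratic search is only practical for small regions.

```python
import math
from itertools import combinations


def segment_inside(region, p, q):
    """Return True if every pixel sampled along segment p-q is in the region."""
    (r0, c0), (r1, c1) = p, q
    # Sample at roughly half-pixel steps along the longer axis (p != q here).
    steps = 2 * max(abs(r1 - r0), abs(c1 - c0)) + 1
    for k in range(steps):
        t = k / (steps - 1)
        pixel = (round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0)))
        if pixel not in region:
            return False
    return True


def largest_diameter(slice_mask, pixel_size_mm):
    """Longest pixel-to-pixel segment that lies entirely within the region."""
    region = {(r, c)
              for r, row in enumerate(slice_mask)
              for c, value in enumerate(row) if value}
    best = 0.0
    for p, q in combinations(sorted(region), 2):
        length = math.dist(p, q) * pixel_size_mm
        if length > best and segment_inside(region, p, q):
            best = length
    return best


# Hypothetical 5x5 region, first solid and then with the center pixel excluded.
solid = [[1] * 5 for _ in range(5)]
with_hole = [row[:] for row in solid]
with_hole[2][2] = 0

PIXEL_MM = 0.7  # assumed pixel size
d_solid = largest_diameter(solid, PIXEL_MM)
d_hole = largest_diameter(with_hole, PIXEL_MM)
print(d_solid, d_hole)
```

In the solid square the corner-to-corner diagonal is accepted; once the center pixel is excluded, both diagonals cross the hole and a shorter segment is selected, which is the behavior described for Figure 3B.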

For the WHO and MS measurements, the largest perpendicular is also computed by considering every diameter pixel to determine the largest sum of the two half-perpendiculars stemming from that pixel. For the WHO measurement, the lesion area $a$ is computed as $a = \frac{\pi}{4} l w$, whereas according to the MS equation, the lesion volume $v$ is computed as $v = \frac{\pi}{6} l w^2$; in both cases, $l$ is the largest diameter length and $w$ is the length of its largest perpendicular.

For our statistical evaluation, each measure is made equivalent and directly comparable to the others by expressing its value in terms of the diameter of an equivalent sphere or circle according to the measure dimension. For any given volume estimate, the equivalent diameter $d$ of a sphere with the same volume is calculated using $d = 2\sqrt[3]{\frac{3v}{4\pi}}$, where the volume of the equivalent sphere is $v = \frac{4}{3}\pi\left(\frac{d}{2}\right)^3$. For any given area estimate, the equivalent diameter $d$ of a circle with the same area is calculated using $d = 2\sqrt{\frac{a}{\pi}}$, where the area of the equivalent circle is $a = \pi\left(\frac{d}{2}\right)^2$. Hence, for the bidimensional and MS metrics, the measured size is equivalent to a diameter of $\sqrt{lw}$ and $\sqrt[3]{lw^2}$, respectively, where $l$ is the longest diameter length and $w$ is the length of its largest perpendicular.

Figure 3. This figure, above (a), describes graphically how the diameter and its largest perpendicular are computed as surrogates of radiologist actions. Below (b), if the subregion with the pixels marked with a cross were to be hypothetically removed from the actual nodule region, then the previous diameter would not be valid any longer and the new diameter with the relative largest perpendicular would have to be determined. The three-dimensional metric size would be affected, too, being computed on the decreased nodule volume.

For the interreader variation, we considered only those nodules that had markings by all four radiologists. For those nodules, we computed mean and standard deviation values based on radiologists’ equivalent measurements. We also computed the smoothed estimators of the variation as a function of the nodule size.
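The equivalent-diameter conversions above reduce to a few closed-form expressions. The following sketch checks numerically that the WHO area and the MS volume map to diameters of $\sqrt{lw}$ and $\sqrt[3]{lw^2}$; the function names and the sample values of `l` and `w` are assumptions made for illustration only.

```python
import math


def equivalent_diameter_from_volume(v):
    """Diameter of the sphere whose volume equals v."""
    return 2.0 * (3.0 * v / (4.0 * math.pi)) ** (1.0 / 3.0)


def equivalent_diameter_from_area(a):
    """Diameter of the circle whose area equals a."""
    return 2.0 * math.sqrt(a / math.pi)


def who_area(l, w):
    """WHO bidimensional area: (pi / 4) * l * w."""
    return math.pi / 4.0 * l * w


def ms_volume(l, w):
    """Modified Schwartz ellipsoid volume: (pi / 6) * l * w**2."""
    return math.pi / 6.0 * l * w ** 2


# Hypothetical measurements in mm: largest diameter and largest perpendicular.
l, w = 12.0, 8.0

unidimensional = l
bidimensional = equivalent_diameter_from_area(who_area(l, w))
ms = equivalent_diameter_from_volume(ms_volume(l, w))

print(unidimensional, bidimensional, ms)
```

For any nodule with `w < l`, the three single-slice sizes are ordered unidimensional > bidimensional > MS, since `l > sqrt(l*w) > cbrt(l*w**2)`; this ordering follows from the formulas alone and is independent of the data.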

For the metric comparison when multiple markings were available for a nodule, the median value of each size metric from these markings was used to represent the size for that nodule. From a preliminary analysis, we observed that the measurement distributions were nonsymmetrical and possibly nonunimodal. Therefore, we chose to use a nonparametric method to estimate the probability density functions, because such methods are more flexible and do not assume a known functional form; in particular, we selected kernel estimators that work by smoothing out the contribution of each observed data point over a local neighborhood of that data point (19). We determined the kernel estimates of the conditional distribution of the volume measurement given the unidimensional, bidimensional, and MS sizes. From the estimates of the conditional distribution, estimates of conditional highest density regions (HDRs) (20) were computed. HDRs are subsets of the measurement values for which all the points inside a region have a higher probability density than all the points outside that region, and they provide tight confidence intervals with coverage properties superior to those of their alternatives. From these, we computed the 0.95 and 0.99 probability coverage regions to represent our comparative conditional distributions.
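To make the HDR idea concrete, here is a simplified one-dimensional sketch. It is not the conditional estimator used in the study: it assumes a Gaussian kernel, a rule-of-thumb bandwidth, and a grid approximation of the density, and the function names and sample values are hypothetical.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


def highest_density_region(samples, coverage=0.95, grid_points=400):
    """Approximate the HDR as a list of (low, high) intervals on a grid."""
    # Rule-of-thumb bandwidth (an assumption, not taken from the paper).
    bandwidth = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)
    density = gaussian_kde(samples, bandwidth)
    lo = min(samples) - 3.0 * bandwidth
    hi = max(samples) + 3.0 * bandwidth
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    dens = [density(x) for x in grid]
    total = sum(dens)

    # Keep the highest-density grid cells until they hold the requested mass.
    keep, mass = set(), 0.0
    for i in sorted(range(grid_points), key=lambda i: dens[i], reverse=True):
        keep.add(i)
        mass += dens[i] / total
        if mass >= coverage:
            break

    # Merge runs of adjacent kept cells into intervals.
    intervals, start = [], None
    for i in range(grid_points):
        if i in keep and start is None:
            start = i
        if start is not None and (i not in keep or i == grid_points - 1):
            end = i if i in keep else i - 1
            intervals.append((grid[start], grid[end]))
            start = None
    return intervals


# Hypothetical, deliberately bimodal size measurements in mm.
samples = [4.0, 4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 12.0, 12.4, 12.6, 13.0]
region_95 = highest_density_region(samples, coverage=0.95)
region_50 = highest_density_region(samples, coverage=0.50)
print(region_95)
print(region_50)
```

Unlike an equal-tailed interval, the region returned here may consist of several disjoint intervals when the density has more than one mode, which is why HDRs suit distributions that are nonsymmetrical or nonunimodal.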

RESULTS

The entire dataset contained 522 lesions for which boundaries were marked. Four nodules had diameters derived from their 3D measures greater than 30 mm (33, 36, 48, and 68 mm, respectively) and were not considered in the analysis because they fell outside the LIDC nodule size range (16). Of the remaining 518 lesions, only 127 were considered to be nodules greater than 3 mm by all four radiologists and, therefore, had four sets of boundary markings available.

The between-reader variation was analyzed for 127 nodules with four readings each and 3D sizes ranging from 3.36 to 23.8 mm (average 9.01 mm, median 7.37 mm). Figure 4 shows the smoothed estimators for the four metrics [two data points fell outside the plot area for the unidimensional metric with values of (25.7, 7.3) and (36.8, 3.2), and one for the MS metric with values of (16.3, 5.1)]. For the 3D metric, 95% of estimated standard deviations were within the range 0.49–1.25. For the unidimensional, bidimensional, and MS size metrics, most of the standard deviations (95%) were within the ranges 0.67–2.55, 0.78–2.11, and 0.96–2.69, respectively. For the 3D derived diameters, 2 nodules (1.6%) had an SD greater than 2 mm, while 15 nodules (11.8%), 13 nodules (10.2%), and 25 nodules (19.7%) had an SD greater than 2 mm for the unidimensional, bidimensional, and MS metrics, respectively. As an example, for the 3D, unidimensional, bidimensional, and MS size measurements of 10 mm, the estimated SDs were 0.85, 1.16, 1.18, and 1.39, respectively. If the size under consideration is 15 mm, then the estimated SDs are 1.09, 1.59, 1.54, and 1.79, respectively.

The full set of 518 nodules we considered for the metric comparison had nodule sizes, based on the 3D metric, ranging from 2.03 to 29.4 mm (average 7.05 mm, median 5.71 mm); Figure 5 shows their size distribution. Of 518 nodules, 452 (87.3%) met the recommendations of RECIST (2). The remaining 66 nodules (average 3D metric size 4.42 mm, median 4.60 mm) were technically too small with respect to slice thickness to meet the RECIST recommendations for measurement, described in Appendix 1 of (2); in detail, 30 were on 16 scans with a slice thickness of 3.00 mm, 24 were on 15 scans with a slice thickness of 2.50 mm, and the remaining 12 were on 11 scans with a slice thickness between 1.25 and 2.00 mm.

The RECIST rule to skip inner holes was applied in 17 cases with an average relative decrease in length of 5.5% (median 4.5%) with respect to maximal segments that ignore the rule. Figure 6 shows the case where the change was the largest (4.11 mm, 16.5%) because the inner region with the light boundary was marked as not being part of the nodule.

Figure 7 shows HDRs for the 3D metric with 95% and 99% coverage conditional on the unidimensional, bidimensional, and MS size measurements. The conditional medians are marked with dots. For example, if a radiologist reports a unidimensional size measurement of 10 mm, then the regions of probability coverage 0.95 and 0.99 with the smallest extent for the 3D metric measurement are the intervals 4.16–11.48 and 3.60–12.21, respectively. If the 10-mm measure was from a bidimensional measurement, then the region intervals would be 5.04–12.77 and 3.65–13.29, respectively. In the case of the MS metric, they would be 5.33–11.61 and 4.24–12.69, respectively.

For comparison, if the 10-mm measure is from the bidimensional metric, the regions of probability coverage 0.95 and 0.99 with the smallest extent for the MS metric are 10.62–12.61 and 10.54–12.63, respectively, which respectively represent a 74% and 78% relative decrease in the range of the measure estimation.

DISCUSSION

The evaluation we performed on a set of 265 documented whole-lung scans focused on two aspects: (a) the interreader variability and its relationship to the analyzed metrics and (b) the level of agreement between the analyzed metrics.

Previous studies (21, 22) evaluated interobserver and intraobserver variability on single image measures, either monodimensional or bidimensional, and they found considerable variability in those measures. Meyer et al. (23) took advantage of the boundaries drawn by six radiologists around 23 lung nodules to compute lung nodule volumes, showing that radiologists’ subjectivity is a major source of variability.

Figure 4. Scatterplot of the standard deviation versus means of four experts’ measurements along with a nonparametric regression curve for three-dimensional (a), unidimensional (b), bidimensional (c), and MS (d) size estimates.

Figure 5. The size distribution (according to the three-dimensional metric) of the full set of 518 nodules.

Our analysis of 127 pulmonary nodules found that reader subjectivity on boundary locations may propagate into a very large interobserver variation of the size estimates. It is clear, as shown by Figure 4, that the unidimensional, bidimensional, and MS size metrics have a slightly higher interobserver variation than the 3D metric. Additionally, the smoothed estimate of the 3D metric variation starts lower and increases more slowly than the variation estimates of the unidimensional, bidimensional, and MS metrics.

One reason for this high variability is that when the nodule has a complex shape, each radiologist may mark some boundary pixels as belonging or not belonging to the nodule region. This in turn is reflected in diameters and perpendiculars that can show significant differences in length, as shown in Figure 8.

Figure 6. A nodule with an inner region marked by a light boundary. As the inner region and its boundary are not part of the nodule, the depicted segment cannot be considered a diameter by the RECIST rules.

As far as the comparison between metrics is concerned, a previous study by Van Hoe et al. in 1997 (24) compared five different measurement methods for change in lesion size between scans on CT images of liver metastases. The authors concluded that the 3D methods were a viable alternative to 2D methods. Other studies on size change for treatment-response assessment in lung cancer have compared unidimensional and automated 3D measures (21), unidimensional, bidimensional, and 3D measures (25, 26), and manual bidimensional measures with an automated contour technique (27). The studies that involved volume measurements concluded that there was poor agreement between volumetric-based measures and single-image-based methods.

Figure 7. The 95% and 99% HDRs for the three-dimensional metric size estimate conditional on the unidimensional metric (a), on the bidimensional metric (b), and on the MS metric (c).

Our analysis of 518 nodules, marked by at least one radiologist, showed that the differences between the single image measurements and the 3D measurements are very large. The 0.95 and 0.99 probability coverage regions are very wide and their projections on the ordinate axis have large overlaps; moreover, the intervals are not centered around the marginal values, implying an underlying bias or a size-dependent scaling factor between the measures as well.

Poor agreement between the 3D and 2D methods is commonly seen when the nodule does not conform to the approximately spherical or ellipsoidal assumptions that underlie the 1D and 2D measurements, respectively. Examples of this are shown in Figures 9 and 10.

In Figure 9, the largest extent of the lesion is aligned with the axial dimension; hence, the 2D measurements underestimate the 3D-derived diameter. In Figure 10, the largest extent of the nodule is perpendicular to the axial dimension; hence, the 2D measurements overestimate the 3D-derived diameter. In general, for the unidimensional metric, we anticipated an overestimation of the 3D value because it is a measure of maximal extent rather than mean extent in two dimensions.

Figure 8. An example of variability among radiologists. Each image shows the slice where the largest diameter (dark line) and largest perpendicular (gray line) were determined according to the markings provided by each of the four radiologists (a–d). The first image (a) is on a different slice than the other three (b–d); this is possible because each slice selected for measurement is based on a radiologist’s individual marking.

In the context of CAD algorithm evaluation, the selection of the size metric may have an important impact. For example, consider that a CAD system is designed to detect pulmonary nodules larger than 6 mm; to evaluate this system, we select from the documented database all nodules larger than 6 mm as the set of true positives. For the LIDC data set, if we use the 6-mm size limit, the number of nodules selected from the 518 total nodules is as follows: 3D 228, unidimensional 310, bidimensional 197, and MS 242.

tted l

cular

REEVES ET AL Academic Radiology, Vol 14, No 12, December 2007

A further problem is the difference between the se-lected subsets. For example, if we compare the selectionsfor the 3D and bidimensional metrics, we find that 45nodules in the 3D metric list are not in the bidimensionalmetric list and 14 nodules that are in the bidimensionalmetric list are not in the 3D metric list. The consequenceis that the measured system sensitivity is reduced if dif-ferent metrics for size cutoff are used for test set selectionand CAD system implementation. Consider the perfectCAD system that will detect every nodule marked by theLIDC and that it uses the 3D metric for a 6-mm mini-mum size cutoff.

If we test this system selecting from the LIDC cases also using the 3D metric, then all the nodules will be correctly identified for 100% sensitivity. However, if we test the same system with the selection by the bidimensional metric criterion, then, of the 197 nodules, 14 will not be identified because they are considered to be less than 6 mm; therefore, the measured sensitivity will be 183 of 197, or 93%. In addition, the 45 nodules found only in the 3D list would be considered as false positives. If the unidimensional metric had been used for selecting the test cases, then the measured sensitivity would be further reduced to 223 of 310, or 72%.

Figure 9. A selected case where the three-dimensional size (10.0 mm) is greater than the unidimensional (8.3 mm), bidimensional (8.0 mm), and MS (7.9 mm) sizes. The tiled frames on the right show all the nodule regions, in consecutive axial slices, used to compute the three-dimensional metric measure. The frame with dashed boundary is enlarged on the left to show the largest diameter (solid line) and its largest perpendicular (dotted line).

Figure 10. A selected case where the three-dimensional size (10.6 mm) is smaller than the unidimensional (21.7 mm), bidimensional (14.1 mm), and MS (12.2 mm) sizes. The tiled frames on the right show all the nodule regions, in consecutive axial slices, used to compute the three-dimensional metric measure. The frame with the dotted boundary is enlarged on the left to show the largest diameter (solid line) and its largest perpendicular (dotted line).

The selection of data subsets with size limits can only be directly compared if the same size metric is used in both cases. The LIDC plans to include a single size measure for each nodule in the database to facilitate the selection of unique repeatable size limited nodule subsets. This metric is not intended as a gold standard for nodule size. Further, when evaluating the performance of CAD detection algorithms, any difference between the data selection metric and the algorithm size selection criterion needs to be considered.
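The sensitivity figures in the perfect-CAD example above follow directly from the subset counts. As a rough check, the sketch below recomputes them in Python; the function name `measured_sensitivity` and the constant names are assumptions introduced here, and only the counts (197, 14, 310, 223) come from the text.

```python
def measured_sensitivity(n_test: int, n_detected: int) -> float:
    """Fraction of test-set nodules that the CAD system reports."""
    if n_test <= 0:
        raise ValueError("test set must contain at least one nodule")
    if not 0 <= n_detected <= n_test:
        raise ValueError("detected count must lie between 0 and the test set size")
    return n_detected / n_test


# Counts quoted in the text for a 6-mm cutoff and a CAD system using the 3D metric.
BIDIM_TEST_SET = 197    # nodules selected by the bidimensional metric
BIDIM_NOT_IN_3D = 14    # of those, nodules below 6 mm under the 3D metric
UNIDIM_TEST_SET = 310   # nodules selected by the unidimensional metric
UNIDIM_DETECTED = 223   # of those, nodules also in the 3D list

bidim_sensitivity = measured_sensitivity(
    BIDIM_TEST_SET, BIDIM_TEST_SET - BIDIM_NOT_IN_3D
)
unidim_sensitivity = measured_sensitivity(UNIDIM_TEST_SET, UNIDIM_DETECTED)

print(f"bidimensional test set: {bidim_sensitivity:.1%}")
print(f"unidimensional test set: {unidim_sensitivity:.1%}")
```

With these counts the two ratios evaluate to about 0.929 and 0.719. The system itself is unchanged in both cases; only the metric used to choose the test set differs.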
