Hindawi Publishing Corporation
Computational and Mathematical Methods in Medicine
Volume 2013, Article ID 683216, 9 pages
http://dx.doi.org/10.1155/2013/683216

Research Article
Discriminative Random Field Segmentation of Lung Nodules in CT Studies
Brian Liu1 and Ashish Raj2
1 Cornell University, Ithaca, NY 14853, USA
2 Weill Cornell Medical College, New York, NY 10065, USA

Correspondence should be addressed to Brian Liu; yl435@cornell.edu
Received 20 March 2013; Revised 2 June 2013; Accepted 15 June 2013

Academic Editor: Tianye Niu

Copyright © 2013 B. Liu and A. Raj. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The ability to conduct high-quality semiautomatic 3D segmentation of lung nodules in CT scans is of high value to busy radiologists. Discriminative random fields (DRFs) were used to segment 3D volumes of lung nodules in CT scan data using only one seed point per nodule. Optimal parameters for the DRF inference were first found using simulated annealing. These parameters were then used to solve the inference problem using the graph cuts algorithm. Results of the segmentation exhibited high precision and recall. The system can be adapted to facilitate the process of longitudinal studies but will still require human checking for failed cases.
1. Introduction
Traditionally, the analysis of tumors through computed tomography (CT) scans involved time-consuming manual segmentation of tumor volumes, where a radiologist or technician would draw ROIs encapsulating the tumor areas by hand. Numerous semiautomatic segmentation algorithms have been proposed for a variety of tumors, including brain [1], liver [2], breast [3], and lung [4]. In certain cases, such as Zhang et al. [1], the proposed method was not specific to a certain kind of tumor. In other cases, such as Kostis et al. [4], the segmentation required prior knowledge about the characteristics of the types of tumors observed in order to do morphological processing.
There exists a significant opportunity for reducing the human input required for nodule segmentation in longitudinal studies. An initial seed point given at the first time point can be coregistered and extrapolated to subsequent studies, under the assumption that nodules do not exhibit significant movement. This is particularly useful in a clinical application for tracking small pulmonary nodules in the lungs to determine malignancy [4].
Markov random fields (MRFs) have been used in the area of computer vision for segmentation by solving an energy minimization problem [5]. We use the pixel grid as a graph in which each pixel is a vertex and neighboring pixels share an edge between them. We can then define an energy cost for any given labeling as a function of various features of the MRF. In the traditional MRF definition, the energy potential can be expressed as an association potential function of each node and an interaction potential function of pairs of neighbors. The goal is then to find an optimal labeling which minimizes the total energy. Solving the inference problem afterwards can be done quickly and optimally (exactly for binary labels, and within an approximation factor for multiple labels) using an optimization method such as graph cuts [6–9]. Picking the right potential functions can often be a matter of trial and error.
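As a concrete illustration of the energy being minimized (a hypothetical Python sketch, not code from this paper; the function name, 4-neighborhood, and uniform pairwise weight are all illustrative assumptions):

```python
import numpy as np

def labeling_energy(labels, unary_cost, pairwise_weight):
    """Energy of a binary labeling on a 2D pixel grid (illustrative sketch).

    labels: (H, W) array with entries in {-1, +1}.
    unary_cost: (H, W) array giving the association cost of the label
        already assigned at each pixel.
    pairwise_weight: cost added for each pair of 4-neighbors whose
        labels disagree (a simple Ising-style interaction).
    """
    energy = float(unary_cost.sum())  # association term
    # Interaction term: count disagreements between horizontal and vertical neighbors.
    energy += pairwise_weight * float((labels[:, 1:] != labels[:, :-1]).sum())
    energy += pairwise_weight * float((labels[1:, :] != labels[:-1, :]).sum())
    return energy
```

Inference then amounts to searching for the labeling with minimum total energy; for binary labels with submodular interactions, graph cuts recover the exact minimizer.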
There are several variants of MRFs in the literature. In particular, conditional random fields (CRFs) generalize the MRF formulation by allowing data to factor into the traditional MRF interaction potential formulation, with a discriminative model instead of a generative model. Kumar and Hebert's discriminative random fields (DRFs) [10] extend the usual work of conditional random fields to multiple dimensions. In particular, Kumar and Hebert's construction allows for the use of a variety of discriminative models, like SVMs [11].
DRFs do suffer from some problems, however. Because the learning process uses a pseudolikelihood approximation, the results tend to overestimate the interaction potential parameters unless careful regularization is used [10]. We avoid this issue by optimizing using simulated annealing on the F-score, so that inference results play a direct role in the optimization. The F-score is a direct measure of inference performance, so optimization based on the F-score should give us better results than pseudolikelihood maximization. Unfortunately, this sacrifices many of the nice properties of the original formulation, such as convexity. In practice, however, F-score optimization consistently produces slightly better results. This method has been tried before for CRFs, with better reported performance than standard CRF training [12].
Our goal in this paper is to apply DRF methodology to the segmentation of lung nodules in CT scans. To our knowledge, this has never been attempted before. A recent work by Ye et al. [13] has used graph cuts to segment lung nodules but did not use an underlying discriminative model to train their energy function. DRFs have been used by Lee et al. [11] for brain tumors in MRI scans, with good results. The DRF methodology provides a strong, flexible framework for image segmentation tasks that provides more robust segmentations than nongraphical models. For example, a previous study by Kostis et al. has demonstrated a successful lung nodule segmentation algorithm through thresholding and morphological processing [4] that required identification of nodule type (e.g., juxtapleural, juxtavascular). This followed earlier work by Zhao et al. using progressive thresholding and a conditional shape constraint [14]. These methods require different parameters for different kinds of nodules, which makes the job of segmentation more time consuming. More recently, Hayashi et al. used thresholding and morphological filtering to accomplish the same goal [15]. While morphological filtering can do well at estimating volumes, the filters often smooth away surface data, which has to be restored via some other method.
On the other hand, Xu et al. used dynamic programming and expectation maximization to calculate the optimal boundaries of lung nodules, using a shape constraint to counter the problem of juxtapleural nodules [16]. This method avoids the problem of smoothing away surface data but does not always perform well, requiring human intervention. In addition, Xu et al. work on each slice independently, which does not take advantage of the spatial information available from working in three dimensions. Similarly, work has been done by Okada et al. on robust 2D ellipsoid fitting on synthetic data [17], though their work does not focus on the end segmentation.
Using DRFs, we can incorporate simpler, more approximate morphological filtering into a set of other features and pairwise constraints to achieve an overall more accurate and robust segmentation. Coming up with good features is rarely a systematic process; instead, one must often rely on intuition and human knowledge of the problem. In the case of lung nodules, it is known that a lung nodule is generally located around its seed point, has CT intensities in a certain range, and is usually round [18]. This paper shows that good results can be achieved even with simple features containing this information. Furthermore, we can easily learn parameters from training data and test performance on test data to avoid the risk of overfitting. The DRF framework allows us to swap out features as we see fit, giving us the ability to adapt the method for other volumes that need segmentation. Since the ultimate goal of this research is to create a semiautomatic segmentation algorithm that can be applied to other types of tumor segmentation tasks, this is a great advantage.
2. Materials and Methods
2.1. Data. The data set consisted of 4 pairs of training nodules and 50 pairs of testing nodules from the VOLCANO'09 Challenge [19]. For training and individual results, only the first of each pair was used. For longitudinal comparison results, we numbered each individual nodule such that nodules x and 50 + x are the first and second nodules in pair x, respectively. These numbers will be used throughout. Seed points were given with the data sets. Training was done on the supplied training set only, with results evaluated on the supplied testing set only.
The training set nodules showed variation in image noise but lacked variation in nodule position. In particular, the training set contained no juxtapleural or juxtavascular nodules; these kinds of nodules do show up in the testing set. In order to maintain consistency with the VOLCANO'09 Challenge, however, the training and test sets were not rearranged.
Ground truth voxel labelings for all nodules were done manually by a graduate research fellow trained by a radiologist.
2.2. Algorithm Summary. Several features, such as estimated radius and approximate segmentation, are first calculated through a morphological filtering process. We then use supervised learning to learn the weights for these features in a DRF model of lung nodules from labeled training scans. The details of the feature generation and parameter learning are described in the following section. After we have learned the parameters, we can solve the inference problem using the same feature generation process and graph cuts to obtain a segmentation on new scans.
2.2.1. Constants and Nodule Feature Extraction. We first calculate several global constants from the data. A Gaussian model of nodule voxel intensities was calculated from the training data, with constants μ_int and σ_int for the mean and standard deviation, respectively. A uniform model (threshold model) was calculated from the training data, with constants t_min and t_max as the minimum and maximum thresholds. As seen in Figure 1, a Gaussian distribution can fit the nodule voxels to a first approximation.
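These global constants can be estimated directly from labeled training voxels. The sketch below is hypothetical: the paper does not state how t_min and t_max were chosen, so low/high percentiles of the training intensities are used here as one plausible rule.

```python
import numpy as np

def fit_intensity_models(nodule_intensities, tail_pct=1.0):
    """Fit the two global intensity models from training nodule voxels:
    a Gaussian (mu_int, sigma_int) and a threshold model (t_min, t_max).

    tail_pct: assumed percentile rule for the thresholds (not specified
    in the paper).
    """
    v = np.asarray(nodule_intensities, dtype=float)
    mu_int, sigma_int = float(v.mean()), float(v.std())
    t_min = float(np.percentile(v, tail_pct))
    t_max = float(np.percentile(v, 100.0 - tail_pct))
    return mu_int, sigma_int, t_min, t_max
```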
In addition, for each nodule, its radius was estimated by taking the following steps.

(1) Denoising: an in-slice Gaussian filter of one voxel standard deviation was applied to smooth out high frequency noise, and then upper and lower thresholds were applied to obtain an initial segmentation.
Figure 1: (a) Histogram of voxel intensities of positive examples (nodule voxels) and negative examples (everything else) in training data. Negative examples overwhelm positive examples at all intensities, even in the local area shown in this histogram; we must thus exploit locality to achieve a good segmentation. (b) Histogram normalized by number of voxels.
(2) Subvolume and initial radius estimation: a rough estimate of radius r_init was obtained by growing a bounding box and stopping when the fraction of voxels not in the initial segmentation reached 0.75 of the total volume.
(3) Lung subvolume extraction: a morphological close followed by a morphological open operation with an anisotropic sphere with 6 mm radius (under the assumption that most features in the lung are smaller than 6 mm) was performed on the inverse of the initial segmentation. The nodule area was filled in with an anisotropic sphere of radius r_init/2 centered at the input point, and a morphological close operation was applied to arrive at the final lung volume.
(4) The initial segmentation was filtered to only include voxels in the lung volume, and filtered again to only include the voxels in the same connected component as the seed point.
(5) The center of the nodule was recalculated by finding the local maximum of the 2D distance transform (distance from outside the smoothed segmentation) closest to the seed point on the same slice.
(6) The final estimated nodule radius r was calculated by expanding a sphere from the new center until we included no more segmented voxels or the fraction of smoothed segmentation voxels inside the sphere reached less than 0.5.
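Step (2) above can be sketched in a few lines. This is an illustrative reconstruction (the function name and degenerate-case fallback are assumptions), not the paper's Matlab code:

```python
import numpy as np

def initial_radius(init_seg, seed, stop_fraction=0.75):
    """Step (2): grow a cubic bounding box around the seed voxel, stopping
    when the fraction of box voxels NOT in the initial segmentation
    reaches stop_fraction of the box volume.

    init_seg: 3D boolean array (the thresholded initial segmentation).
    seed: (z, y, x) index of the seed point.
    """
    z, y, x = seed
    for r in range(1, max(init_seg.shape)):
        box = init_seg[max(z - r, 0):z + r + 1,
                       max(y - r, 0):y + r + 1,
                       max(x - r, 0):x + r + 1]
        fraction_outside = 1.0 - box.mean()
        if fraction_outside >= stop_fraction:
            return r
    # Degenerate case (assumed): segmentation fills the whole volume.
    return max(init_seg.shape)
```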
2.3. DRF Framework. We construct a DRF model of the CT volume as follows.
Let G = (S, E) be the graph that represents the 3D volume, where each node in S represents a voxel and an edge in E connects adjacent voxels in a 6-neighborhood. Let n_i be the observed intensity at voxel s_i ∈ S, let p_i be the 3-vector of the relative coordinates of voxel s_i in the volume, and let label x_i ∈ {−1, 1} be the label associated with s_i. We define an observation y_i = (n_i, p_i). The random variables x_i obey the Markov property that Pr(x_i | y, x_{S\i}) = Pr(x_i | y, x_{N_i}), where N_i is the set of neighbors of s_i and S\i is everything in S except s_i.

Assuming only pairwise clique potentials to be nonzero, the posterior takes the form

P(x | y) = (1/Z) exp( Σ_{i∈S} A_i(x_i, y) + Σ_{i∈S} Σ_{j∈N_i} I_{ij}(x_i, x_j, y) ),

where Z is the partition function, A_i is an association potential, and I_{ij} is an interaction potential.
2.3.1. Association Potential. We model the association potential discriminatively using a logistic model, since the labels are binary. We will define a feature vector f_i at site s_i as a function of the observations y. The location of the lung nodule voxels was also modeled as a Gaussian deviating from a prior known location, normalized by the estimated nodule radius r (calculated automatically) and constants l = (l_x, l_y, l_z) and σ_loc = d·v, where v is the size of the voxel in x, y, and z physical coordinates.
We then define our feature vector to be

f_g = sqrt( (n_i − μ_int)² / σ_int² ),

f_u = 0 if t_min < n_i < t_max, and 1 otherwise,

f_l = ‖p_i − l‖ / (r · σ_loc).

The first two features capture the cost of a voxel's intensity in a Gaussian model and a uniform model, respectively. The third feature captures the cost for a voxel distant from the expected nodule center.

We then have the option of transforming our feature vector via some nonlinear transformation to h_i(y) = [1, φ_1(f_i(y)), φ_2(f_i(y)), …]^T, which is a kernel mapping of our original feature vector with the introduction of a bias element. We chose not to use a kernel, so φ(f_i(y)) = f_i(y).
The features are then weighted by a parameter w. We formulate our association potential as a probability by applying a logistic function:

Pr(x_i = 1 | y) = 1 / (1 + e^(−w^T h_i(y))). (3)

Since Pr(x_i = −1 | y) = 1 − Pr(x_i = 1 | y), this can be written compactly as Pr(x_i | y) = 1 / (1 + e^(−x_i w^T h_i(y))).

Finally, we model the association potential as the log of this probability, A_i(x_i, y) = log Pr(x_i | y), in order to preserve the logistic regression characteristics when the interaction potential factor is zero [10].

The parameter to learn in the association potential is then w.
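A minimal sketch of this association potential, under the stated convention that h_i(y) carries a leading bias element and x_i ∈ {−1, +1} (function names are illustrative, not from the paper):

```python
import math

def logistic(s):
    """Numerically stable logistic function 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s)) if s >= 0 else math.exp(s) / (1.0 + math.exp(s))

def association_potential(x_i, w, h_i):
    """A_i(x_i, y) = log Pr(x_i | y) = log sigma(x_i * w^T h_i(y)).

    x_i: label in {-1, +1}.
    w:   weight vector (same length as h_i).
    h_i: transformed feature vector, including the leading bias element 1.
    """
    s = x_i * sum(wk * hk for wk, hk in zip(w, h_i))
    return math.log(logistic(s))
```

With w = 0 the potential reduces to log(1/2) for either label, matching the logistic model's uninformative prior.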
2.3.2. Interaction Potential. We model the interaction potential using the pairwise smoothing of the Ising model, normalized by a constant minus the difference in intensities of the two sites. We define a new feature δ_{ij} = |n_i − n_j|, the difference in intensities of sites s_i and s_j, so that the interaction potential takes the form

I_{ij}(x_i, x_j, y) = v · x_i x_j (β − δ_{ij}).

The β term is a constant controlling whether the smoothing cost affects the potential. The parameter to optimize, then, is v.
2.4. Learning and Inference
2.4.1. Performance Metrics. The primary performance metrics used for evaluation are precision and recall. Given a calculated labeling O and the ground truth labeling G, where nodule voxels are positive samples and nonnodule voxels are negative, tp denotes true positives, fp denotes false positives, and fn denotes false negatives. Precision and recall are then defined as

precision = tp / (tp + fp),
recall = tp / (tp + fn). (7)
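These metrics, and the F-score used later as the training objective, are straightforward to compute voxelwise; a small illustrative helper (function names are assumptions):

```python
import numpy as np

def precision_recall(pred, truth):
    """Voxelwise precision and recall of a predicted binary labeling
    against ground truth (both boolean arrays of the same shape)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.logical_and(pred, truth).sum()   # true positives
    fp = np.logical_and(pred, ~truth).sum()  # false positives
    fn = np.logical_and(~pred, truth).sum()  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

def f_score(pred, truth):
    """F-score: harmonic mean of precision and recall (the simulated
    annealing objective of Section 2.4.2)."""
    p, r = precision_recall(pred, truth)
    return 2 * p * r / (p + r) if p + r else 0.0
```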
2.4.2. Learning. Optimal parameters were learned using simulated annealing on the F-score of inference results on training data.

Given parameters θ = (w, v), there exists an optimal labeling O such that for each x_i, given y, A(x_i, y) + Σ_{j∈N_i} I(x_i, x_j, y) is greater than the corresponding sum for −x_i (the opposite label of x_i). The optimal labeling is calculated using graph cuts [5].

Optimal parameters were found by performing simulated annealing on the F-score function, defined as 2·(precision · recall) / (precision + recall). At a given iteration i, a segmentation was calculated with graph cuts using parameters θ_i generated randomly from the previous parameters θ_{i−1}, constrained distance-wise by a "temperature" parameter that slowly decays as the iterations increase. The calculated segmentation is then used to calculate the F-score, which is compared to the F-score of the previous iteration as part of the simulated annealing process. Matlab's simulated annealing implementation was used to find the optimal parameters. Boundary parameters were (−Inf, Inf) for all parameters in θ, and initial parameters for simulated annealing were θ = 0. After the initial run, boundary parameters were picked by hand to include the optimum, with tighter one-sided bounds to improve running time for subsequent runs. This did not change the optimum parameters appreciably, so the initial parameters were changed to the optimum parameters. Again, this did not change the optimum parameters upon rerunning simulated annealing. This gives us more confidence that the optimum parameters we found are in fact optimal within their local neighborhood.
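The annealing loop can be sketched as follows. This is a schematic stand-in for Matlab's simulated annealing routine, with the graph-cut inference and F-score evaluation passed in as callables; the Gaussian proposal distribution and geometric cooling schedule are assumptions.

```python
import math
import random

def anneal_fscore(segment, f_score, theta0, iters=200, t0=1.0, decay=0.98, seed=0):
    """Simulated annealing on the F-score of inference results (sketch).

    segment(theta): runs inference (e.g., graph cuts) with parameters theta.
    f_score(labeling): scores the labeling against ground truth in [0, 1].
    """
    rng = random.Random(seed)
    theta = list(theta0)
    best = cur = f_score(segment(theta))
    best_theta = list(theta)
    temp = t0
    for _ in range(iters):
        # Propose parameters within a temperature-scaled distance of the current ones.
        cand = [p + rng.gauss(0.0, temp) for p in theta]
        score = f_score(segment(cand))
        # Accept improvements always; worse moves with Boltzmann probability.
        if score >= cur or rng.random() < math.exp((score - cur) / temp):
            theta, cur = cand, score
            if cur > best:
                best, best_theta = cur, list(theta)
        temp *= decay  # temperature slowly decays as iterations increase
    return best_theta, best
```

On a toy objective the loop behaves as expected: the best score found never drops below the starting score, and worse proposals are still occasionally accepted early on, which lets the search escape local optima.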
2.4.3. Inference. The volume was first smoothed with a one-voxel-radius Gaussian filter to get rid of high frequency noise. An exact maximum a posteriori solution was then obtained for the pairwise Ising model by a graph cuts algorithm. Graph cuts were performed using Olga Veksler's gco-3.0 library in C++ with a Matlab wrapper [6, 9].
Figure 2: Tumor 11, comparison of inferred segmentation versus the ground truth labeling, with the unsegmented subvolume for reference. (a) Inferred segmentation; (b) ground truth segmentation; (c) unsegmented.
Figure 3: Tumor 23, comparison of inferred segmentation versus the ground truth labeling, with the unsegmented subvolume for reference. (a) Inferred segmentation; (b) ground truth segmentation; (c) unsegmented.
3. Results
3.1. Segmentation. The parameters were learned from the first nodules of the 4 given pairs of training nodules. Results were segmented using graph cuts on the first nodules of the 50 pairs of test nodules. The mean precision was 0.92 and the mean recall was 0.89, not accounting for the size of the nodules. Example segmentations and the ground truth can be seen in Figures 2 and 3. When all 50 pairs (100 nodules) were evaluated, the mean precision was 0.91 and the mean recall was 0.89.
The segmented physical volumes were plotted against the ground truth physical volumes in Figure 4. An ordinary least squares fit was applied to the data, and the fit line closely
Figure 4: Plot of segmented volume size versus ground truth volume size. An ordinary least squares fit (y = 1.046x − 8935) is shown along with the expected fit y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. Our method accurately estimates the volumes compared to ground truth, with no significant bias towards either a larger or a smaller segmentation.
Figure 5: Histogram of errors relative to ground truth volume.
approximates the expected fit line y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. This shows that our method accurately estimates the volumes compared to ground truth and that there is no significant bias towards either a larger or a smaller segmentation.
The relative volume error compared to ground truth was calculated for each of the first 50 test examples. The maximum positive error was 0.33, and the maximum negative error was −0.31. A histogram of the relative errors is shown in Figure 5.
Figure 6: Histogram of precision and recall of the first 50 segmented examples.
A 2D histogram of the precisions and recalls is shown in Figure 6. Most examples had precisions and recalls within the 0.8 to 1.0 range.
As a comparison test, performance was compared to the Robust Statistical Segmentation (RSS) procedure implemented in Slicer. The RSS method uses a statistics-driven active contour model for segmentation [20]. Approximate volumes were specified using ground truth data. Boundary and intensity uniformity parameters were tuned by hand for each nodule until a satisfactory or best possible segmentation was achieved. Slicer RSS achieved a mean of 0.78 precision and 0.78 recall under these conditions. A histogram of the results can be seen in Figure 7. RSS is more inconsistent in its performance compared to our method. Some segmentations can be seen in Figures 8, 9, and 10, and a volume rendering can be seen in Figure 11. As a whole, our method performed better than RSS in Slicer, but in some individual cases, like Figure 10, RSS performed better. There are examples in which both methods performed poorly as well. Tumor 30 is such an example, largely due to significant vascularization of the nodule and its juxtapleural position. A volume rendering comparison of Tumor 30 can be seen in Figure 12: RSS oversegmented the nodule significantly, while DRF also oversegmented the nodule, to a lesser extent. A slice-by-slice comparison can be seen in Figure 13.
The metric used to evaluate performance in the VOLCANO'09 Challenge is percent volume change, (V2 − V1)/V1, from the first sample volume of a pair (V1) to the second one (V2). In Figure 14, the percentage change for each testing pair was plotted against the percentage change from a participant [15] and against the percentage change of our ground truth. Because there was no previous ground truth percentage change established for the challenge, our ground truth does not reflect the desired results of the challenge.
4. Discussion

Due to the lack of widely available dedicated lung nodule segmentation software currently, it is difficult to compare
Figure 7: Histogram of precision and recall of the first 50 Slicer RSS segmented examples.
our results with existing standards. In comparison with similar work, Ye et al. report a mean Dice's coefficient of 0.79 on 101 nodules [13]. Our Dice's coefficient (which is equivalent in definition to the F-score in this context) is 0.90. The standard deviations of our F-scores were both around 0.06. We suspect that our superior performance despite simpler features can be explained by two factors: first, our discriminative model and training gave us a better energy function, and second, simpler metrics may prove to be more tolerant to error. Dehmeshki et al. did not do a voxelwise comparison but instead reported an "acceptability" metric of 0.84, as determined by radiologist examination [21]. Kostis et al. seemed to have achieved very good results, but they did not report explicit performance metrics comparing their results to ground truth [4]. Neither Zhao et al. [14] nor Xu et al. [16] reported data sets or performance metrics compared to ground truth. The comparison with Robust Statistical Segmentation in Slicer shows our performance against a state-of-the-art generalized segmentation tool, and our method on average performs better.
One must also be wary of placing too much trust in ground truth. Manual segmentations currently in use may differ significantly between users, as Opfer and Wiemker pointed out [22]. Without a better idea of the variation in acceptable segmentations, one runs the risk of overfitting. For a case like Tumor 30 (which was challenging for both our algorithm and other comparison algorithms), the nearby vasculature and pleura may affect the accuracy of manual segmentations as well.
Several groups participated in the VOLCANO'09 Challenge [15, 18, 19], but because the challenge was focused on evaluating volume change in longitudinal studies instead of measuring volume itself, only volume change metrics were reported. Volume change metrics from our results were comparable to the results from Hayashi et al. [15]. Because aggregate results for the VOLCANO Challenge were renumbered before reporting in Reeves et al. [19], we did not compare their aggregate results. Given our established ground truth, however, we believe that precision and recall are a better measure of our performance in general.
Figure 8: Tumor 20, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, RSS overestimated the roundness and undersegmented the nodule; our method successfully segmented the bumps.

Figure 9: Tumor 40, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, both segmentation methods performed well.

Figure 10: Tumor 50, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. This is an example in which our method oversegmented into the pleural wall while RSS did not.
A natural extension of this work would be to apply the same method to segmentation of other tumors in the body. The problem of segmentation in other anatomical areas has, of course, been studied; for example, Lee et al.'s work involved segmenting MRI data on brain tumors, with results implying precision and recall of around 0.8 each [11].
The main advantage of the DRF learning framework is the automatic learning of energy function parameters for segmentation. Since all specific knowledge about the type of tumor we are looking for is learned automatically from the training examples, as opposed to knowledge built into the algorithm, we can in theory train our model to work with other types of tumors than the lung nodules presented in this paper. In practice, lung nodules are generally easier to distinguish due to their high contrast to surrounding tissue, so applying the model to other tumors will likely produce worse results.
If the problem has been formulated properly, the theoretical optimum solution for the parameters should be the maximum likelihood solution to the DRF. Our investigation, however, found that the maximum likelihood solution favored oversegmentation, achieving a very high recall but with losses in precision. We thus decided to use a more practical approach and optimize directly based on the metric we were using to evaluate the algorithm: the F-score, the harmonic mean of precision and recall. Our results give better
Figure 11: Tumor 20, comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume.

Figure 12: Tumor 30, comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume. Both segmentation methods performed poorly, but RSS vastly oversegmented the nodule compared to our method.
Figure 13: Tumor 30, comparison of inferred segmentation versus RSS, with ground truth and the unsegmented subvolume for reference. (a) Inferred segmentation; (b) RSS segmentation; (c) ground truth segmentation; (d) unsegmented.
recall with similar precision compared to the maximum pseudolikelihood solution for the parameters; the difference is on the order of a few percentage points.
In practice, the inference step required to segment new nodules can be solved via fast polynomial-time algorithms using graph cuts. Using unoptimized Matlab code on a 3.3 GHz quad-core desktop with 8 GB RAM, this translated
Figure 14: Percent volume change, (V2 − V1)/V1, of our segmentations for each test case, plotted with Hayashi et al.'s reported percent volume change and our ground truth percent volume change.
to sub-10-second segmentations for the volumes tested. With optimized, compiled code, this will likely be much faster.
4.1. Conclusion. Our DRF semiautomatic segmentation produces results that are generally very accurate, with on average 90% precision and recall. This system can be used to facilitate lung nodule size tracking applications. Further work includes creating a clinical application in order to investigate the consistency and clinical applicability of such a system. Future work can be done to expand the algorithm's performance to different types of tumors, such as brain or liver. More consistency can be established with better radius estimation, which can be achieved through a better initial segmentation. Another possibility would be to try extending the robust ellipsoid fitting algorithm from Okada et al. [17] to three dimensions, allowing us to get a better estimate of nodule shape.
References
[1] J. Zhang, K. K. Ma, M. H. Er, and V. Chong, "Tumor segmentation from magnetic resonance imaging by learning via one-class support vector machine," in Proceedings of the International Workshop on Advanced Image Technology, pp. 207–211, 2004.

[2] Y. L. Huang, J. H. Chen, and W. C. Shen, "Diagnosis of hepatic tumors with texture analysis in nonenhanced computed tomography images," Academic Radiology, vol. 13, no. 6, pp. 713–720, 2006.

[3] H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark, "Markov random field for tumor detection in digital mammography," IEEE Transactions on Medical Imaging, vol. 14, no. 3, pp. 565–576, 1995.
[4] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images," IEEE Transactions on Medical Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.
[5] S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, Secaucus, NJ, USA, 2001.

[6] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.

[7] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.

[8] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.

[9] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2173–2180, June 2010.

[10] S. Kumar and M. Hebert, "Discriminative fields for modeling spatial dependencies in natural images," in Advances in Neural Information Processing Systems, MIT Press, 2003.

[11] C. H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, and R. Greiner, "Segmenting brain tumors with conditional random fields and support vector machines," in Computer Vision for Biomedical Image Applications, vol. 3765 of Lecture Notes in Computer Science, pp. 469–478, 2005.

[12] J. Suzuki, E. McDermott, and H. Isozaki, "Training conditional random fields with multivariate evaluation measures," in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL '06), pp. 217–224, Association for Computational Linguistics, Stroudsburg, PA, USA, July 2006.

[13] X. Ye, G. Beddoe, and G. Slabaugh, "Automatic graph cut segmentation of lesions in CT using mean shift superpixels," International Journal of Biomedical Imaging, vol. 2010, Article ID 983963, 14 pages, 2010.

[14] B. Zhao, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional multicriterion automatic segmentation of pulmonary nodules of helical computed tomography images," Optical Engineering, vol. 38, no. 8, pp. 1340–1347, 1999.

[15] T. Hayashi, X. Zhou, and H. Fujita, "Automated approach to measure pulmonary nodule volume based on radius and CT number," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 377–388, 2009.

[16] N. Xu, N. Ahuja, and R. Bansal, "Automated lung nodule segmentation using dynamic programming and EM-based classification," in Medical Imaging 2002: Image Processing, M. Sonka and J. M. Fitzpatrick, Eds., vol. 4684 of Proceedings of SPIE, pp. 666–676, May 2002.

[17] K. Okada, D. Comaniciu, and A. Krishnan, "Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT," IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 409–423, 2005.

[18] T. Duindam, B. de Hoop, and B. van Ginneken, "Nodule volume change estimation in thoracic CT using sphere fitting, morphological segmentation and image registration," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 389–400, 2009.

[19] A. Reeves, A. Jirapatnakul, A. Biancardi, et al., "The VOLCANO '09 challenge: preliminary results," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 353–364, 2009.

[20] Y. Gao, A. Tannenbaum, and R. Kikinis, "Simultaneous multi-object segmentation using local robust statistics and contour interaction," in Proceedings of the 2010 International MICCAI Conference on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging (MCV '10), pp. 195–203, Springer, 2011, http://portal.acm.org/citation.cfm?id=1964478.1964502.

[21] J. Dehmeshki, H. Amin, M. Valdivieso, and X. Ye, "Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach," IEEE Transactions on Medical Imaging, vol. 27, no. 4, pp. 467–480, 2008.
[22] R Opfer and R Wiemker ldquoA new general tumor segmentationframework based on radial basis function energy minimizationwith a validation study on LIDC lung nodulesrdquo in MedicalImaging 2007 Image Processing vol 6512 of Proceedings of SPIEFebruary 2007 article 651217
2 Computational and Mathematical Methods in Medicine
the results tend to overestimate the interaction potential parameters unless careful regularization is used [10]. We avoid this issue by optimizing the F-score using simulated annealing, so that inference results play a direct role in the optimization. The F-score is a direct measure of inference performance, so optimization based on the F-score should give us better results than pseudo-likelihood maximization. Unfortunately, this sacrifices many of the nice properties of the original formulation, such as convexity. In practice, however, F-score optimization consistently produces slightly better results. This method has been tried before for CRFs, with better reported performance than standard CRF training [12].
Our goal in this paper is to apply DRF methodology to the segmentation of lung nodules in CT scans. To our knowledge, this has never been attempted before. A recent work by Ye et al. [13] used graph cuts to segment lung nodules but did not use an underlying discriminative model to train their energy function. DRFs have been used by Lee et al. [11] for brain tumors in MRI scans with good results. The DRF methodology provides a strong, flexible framework for image segmentation tasks that yields more robust segmentations than nongraphical models. For example, a previous study by Kostis et al. demonstrated a successful lung nodule segmentation algorithm using thresholding and morphological processing [4] that required identification of nodule type (e.g., juxtapleural, juxtavascular). This followed earlier work by Zhao et al. using progressive thresholding and a conditional shape constraint [14]. These methods require different parameters for different kinds of nodules, which makes the job of segmentation more time consuming. More recently, Hayashi et al. used thresholding and morphological filtering to accomplish the same goal [15]. While morphological filtering can do well at estimating volumes, the filters often smooth away surface data, which has to be restored via some other method.
On the other hand, Xu et al. used dynamic programming and expectation maximization to calculate the optimal boundaries of lung nodules, using a shape constraint to counter the problem of juxtapleural nodules [16]. This method avoids the problem of smoothing away surface data but does not always perform well, requiring human intervention. In addition, Xu et al. work on each slice independently, which does not take advantage of the spatial information available from working in three dimensions. Similarly, Okada et al. have worked on robust 2D ellipsoid fitting on synthetic data [17], though their work does not focus on the end segmentation.
Using DRFs, we can incorporate simpler, more approximate morphological filtering into a set of other features and pairwise constraints to achieve an overall more accurate and robust segmentation. Coming up with good features is rarely a systematic process; instead, one must often rely on intuition and human knowledge of the problem. In the case of lung nodules, it is known that a lung nodule is generally located around its seed point, has CT intensities in a certain range, and is usually round [18]. This paper shows that good results can be achieved even with simple features containing this information. Furthermore, we can easily learn parameters from training data and test performance on test data to avoid
the risk of overfitting. The DRF framework allows us to swap out features as we see fit, giving us the ability to adapt the method for other volumes that need segmentation. Since the ultimate goal of this research is to create a semiautomatic segmentation algorithm that can be applied to other types of tumor segmentation tasks, this is a great advantage.
2 Materials and Methods
2.1 Data. The data set consisted of 4 pairs of training nodules and 50 pairs of testing nodules from the VOLCANO'09 Challenge [19]. For training and individual results, only the first of each pair was used. For longitudinal comparison results, we numbered each individual nodule such that nodules x and 50 + x are the first and second nodules in pair x, respectively. These numbers will be used throughout. Seed points were given with the data sets. Training was done on the supplied training set only, with results evaluated on the supplied testing set only.
The training set nodules showed variation in image noise but lacked variation in nodule position. In particular, the training set contained no juxtapleural or juxtavascular nodules. These kinds of nodules do show up in the testing set. In order to maintain consistency with the VOLCANO'09 Challenge, however, the training and test sets were not rearranged.
Ground truth voxel labelings for all nodules were done manually by a graduate research fellow trained by a radiologist.
2.2 Algorithm Summary. Several features, such as estimated radius and an approximate segmentation, are first calculated through a morphological filtering process. We then use supervised learning to learn the weights for these features in a DRF model of lung nodules from labeled training scans. The details of the feature generation and parameter learning are described in the following section. After we have learned the parameters, we can solve the inference problem on new scans using the same feature generation process and graph cuts to obtain a segmentation.
2.2.1 Constants and Nodule Feature Extraction. We first calculate several global constants from the data. A Gaussian model of nodule voxel intensities was calculated from the training data, with constants μ_int and σ_int for the mean and standard deviation, respectively. A uniform model (threshold model) was calculated from the training data, with constants t_min and t_max as the minimum and maximum thresholds. As seen in Figure 1, a Gaussian distribution can fit the nodule voxels to a first approximation.
In addition, for each nodule, its radius was estimated by taking the following steps.

(1) Denoising: an in-slice Gaussian filter of one voxel standard deviation was applied to smooth out high-frequency noise, and then upper and lower thresholds were applied to obtain an initial segmentation.
Figure 1: (a) Histogram of voxel intensities of positive examples (nodule voxels) and negative examples (everything else) in training data. Negative examples overwhelm positive examples at all intensities, even in the local area, as shown in this histogram; we must thus exploit locality to achieve a good segmentation. (b) Histogram normalized by number of voxels.
(2) Subvolume and initial radius estimation: a rough estimate of the radius, r_init, was obtained by growing a bounding box and stopping when the fraction of voxels not in the initial segmentation reached 0.75 of the total volume.
(3) Lung subvolume extraction: a morphological close followed by a morphological open operation with an anisotropic sphere of 6 mm radius (under the assumption that most features in the lung are smaller than 6 mm) was performed on the inverse of the initial segmentation. The nodule area was filled in with an anisotropic sphere of radius r_init/2 centered at the input point, and a morphological close operation was applied to arrive at the final lung volume.
(4) The initial segmentation was filtered to include only voxels in the lung volume, and filtered again to include only the voxels in the same connected component as the seed point.

(5) The center of the nodule was recalculated by finding the local maximum of the 2D distance transform (distance from outside the smoothed segmentation) closest to the seed point on the same slice.

(6) The final estimated nodule radius r was calculated by expanding a sphere from the new center until it included no more segmented voxels or the fraction of smoothed segmentation voxels inside the sphere fell below 0.5.
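Steps (1) and (2) above can be sketched as follows. This is a minimal illustration rather than the authors' Matlab code; the one-voxel in-slice smoothing and the 0.75 stopping fraction come from the text, while the function names and the exact bounding-box growth are assumptions:

```python
import numpy as np
from scipy import ndimage

def initial_segmentation(volume, t_min, t_max, sigma=1.0):
    # Step (1): in-slice Gaussian denoising (one voxel std. dev., no
    # smoothing across slices), then upper and lower thresholds.
    smoothed = ndimage.gaussian_filter(volume.astype(float), sigma=(0, sigma, sigma))
    return (smoothed > t_min) & (smoothed < t_max)

def initial_radius(seg, seed, stop_fraction=0.75):
    # Step (2): grow a cube around the seed point and stop once the
    # fraction of voxels outside the initial segmentation reaches 0.75.
    z, y, x = seed
    for r in range(1, max(seg.shape)):
        box = seg[max(z - r, 0):z + r + 1,
                  max(y - r, 0):y + r + 1,
                  max(x - r, 0):x + r + 1]
        if (~box).mean() >= stop_fraction:
            return r
    return max(seg.shape)
```

The later morphological steps (3)–(6) would build on this mask with operations such as `scipy.ndimage.binary_closing` and `binary_opening`.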
2.3 DRF Framework. We construct a DRF model of the CT volume as follows.

Let G = (S, E) be the graph that represents the 3D volume, where each node in S represents a voxel and an edge in E connects adjacent voxels in a 6-neighborhood. Let n_i be the observed intensity at voxel s_i ∈ S, let p_i be the 3-vector of the relative coordinates of voxel s_i in the volume, and let label x_i ∈ {−1, 1} be the label associated with s_i. We define an observation y_i = (n_i, p_i). The random variables x_i obey the Markov property that Pr(x_i | y, x_{S∖i}) = Pr(x_i | y, x_{N_i}), where N_i is the set of neighbors of s_i and x_{S∖i} denotes all labels in S except x_i.

Assuming only pairwise clique potentials to be nonzero, the posterior factors as

Pr(x | y) = (1/Z) exp( Σ_{i∈S} A_i(x_i, y) + Σ_{i∈S} Σ_{j∈N_i} I_{ij}(x_i, x_j, y) ),

where Z is the partition function, A_i is an association potential, and I_{ij} is an interaction potential.
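To make the graph concrete, the 6-neighborhood edge set E over a 3D volume can be enumerated as in this sketch (a hypothetical helper, not part of the paper's implementation):

```python
import numpy as np

def six_neighborhood_edges(shape):
    """Return one row (i, j) per edge of G = (S, E), linking each voxel
    to its face-adjacent neighbor along each of the three axes."""
    idx = np.arange(np.prod(shape)).reshape(shape)
    edges = []
    for axis in range(3):
        a = np.take(idx, range(shape[axis] - 1), axis=axis).ravel()
        b = np.take(idx, range(1, shape[axis]), axis=axis).ravel()
        edges.append(np.stack([a, b], axis=1))
    return np.concatenate(edges)

# A D x H x W volume has D*H*W nodes and 3*D*H*W - (D*H + H*W + W*D) edges.
```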
2.3.1 Association Potential. We model the association potential discriminatively using a logistic model, since the labels are binary. We define a feature vector f_i at site s_i as a function of the observations y. The location of the lung nodule voxels was also modeled as a Gaussian deviating from a prior known location, normalized by the estimated nodule radius r (calculated automatically) and the constants l = (l_x, l_y, l_z) and σ_loc = d·v, where v is the size of the voxel in x, y, and z physical coordinates.
We then define our feature vector to be

f_g = √((n_i − μ_int)² / σ_int²),

f_u = 0 if t_min < n_i < t_max, and 1 otherwise.

The first two features capture the cost of a voxel's intensity in a Gaussian model and a uniform model, respectively. The third feature captures the cost of a voxel's distance from the expected nodule center.
We then have the option of transforming our feature vector via some nonlinear transformation to h_i(y) = [1, φ_1(f_i(y)), φ_2(f_i(y)), ...]^T, which is a kernel mapping of our original feature vector with the introduction of a bias element. We chose not to use a kernel, so φ(f_i(y)) = f_i(y). The features are then weighted by a parameter w.

We formulate our association potential as a probability by applying a logistic function:

Pr(x_i = 1 | y) = 1 / (1 + e^(−w^T h_i(y))).   (3)
Since Pr(x_i = −1 | y) = 1 − Pr(x_i = 1 | y), this can be written compactly as Pr(x_i | y) = 1 / (1 + e^(−x_i w^T h_i(y))).
Finally, we model the association potential as the log of this probability, in order to preserve the logistic regression characteristics when the interaction potential factor is zero [10].
The parameter to learn in the association potential is then w.
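As an illustration, the features and the logistic association potential can be sketched as below. The Gaussian and threshold features follow the definitions in the text; the exact form of the location feature f_l is an assumption based on the description (distance from the seed location l, normalized by the radius r and σ_loc):

```python
import numpy as np

def features(n_i, p_i, mu_int, sigma_int, t_min, t_max, l, sigma_loc, r):
    """h_i(y) = [1, f_g, f_u, f_l]: bias, Gaussian intensity cost,
    threshold-model cost, and (assumed form) distance-from-center cost."""
    f_g = np.sqrt((n_i - mu_int) ** 2 / sigma_int ** 2)
    f_u = 0.0 if t_min < n_i < t_max else 1.0
    f_l = np.linalg.norm((np.asarray(p_i) - np.asarray(l)) / (r * np.asarray(sigma_loc)))
    return np.array([1.0, f_g, f_u, f_l])

def association_log_prob(x_i, h_i, w):
    """A(x_i, y) = log Pr(x_i | y) = log sigmoid(x_i * w^T h_i(y))."""
    return -np.log1p(np.exp(-x_i * np.dot(w, h_i)))
```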
2.3.2 Interaction Potential. We model the interaction potential using the pairwise smoothing of the Ising model, normalized by a constant minus the difference in intensities of the two sites. We define a new feature δ_ij from the difference in intensities at sites s_i and s_j. The β term is a constant controlling whether the smoothing cost affects the potential. The parameter to optimize is then v.
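The interaction term can be sketched as follows; since the original equation is not reproduced in the text, the exact form used here (the Ising product x_i·x_j, scaled by v, times β minus a normalized intensity difference) is an assumption consistent with the description:

```python
def interaction_potential(x_i, x_j, n_i, n_j, v, beta, sigma_int):
    # Ising-style pairwise smoothing: label agreement (x_i * x_j = +1)
    # is rewarded most when the two intensities are similar.
    delta_ij = abs(n_i - n_j) / sigma_int   # assumed normalization
    return v * x_i * x_j * (beta - delta_ij)
```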
2.4 Learning and Inference
2.4.1 Performance Metrics. The primary performance metrics used for evaluation are precision and recall. Given a calculated labeling O and the ground truth labeling G, where nodule voxels are positive samples and nonnodule voxels are negative, let tp denote true positives, fp false positives, and fn false negatives. Precision and recall are then defined as
precision = tp / (tp + fp),
recall = tp / (tp + fn).   (7)
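These metrics can be computed voxelwise from boolean label volumes, for example:

```python
import numpy as np

def precision_recall(predicted, truth):
    """Voxelwise precision and recall for boolean label arrays."""
    tp = np.sum(predicted & truth)
    fp = np.sum(predicted & ~truth)
    fn = np.sum(~predicted & truth)
    return tp / (tp + fp), tp / (tp + fn)

def f_score(precision, recall):
    """Harmonic mean of precision and recall (equivalent to Dice here)."""
    return 2 * precision * recall / (precision + recall)
```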
2.4.2 Learning. Optimal parameters were learned using simulated annealing on the F-score of inference results on training data.
Given parameters θ = (w, v), there exists an optimal labeling O such that, for each x_i given y, A(x_i, y) + Σ_{j∈N_i} I(x_i, x_j, y) is greater than A(x̄_i, y) + Σ_{j∈N_i} I(x̄_i, x_j, y), where x̄_i is the opposite label of x_i. The optimal labeling is calculated using graph cuts [5].

Optimal parameters were found by performing simulated annealing on the F-score function, defined as 2·(precision × recall)/(precision + recall). At a given iteration i, a segmentation was calculated with graph cuts using parameters θ_i generated randomly from the previous parameters θ_{i−1}, constrained distance-wise by a "temperature" parameter that slowly decays as the iterations increase. The calculated segmentation is then used to calculate the F-score, which is compared to the F-score of the previous iteration as part of the simulated annealing process. Matlab's simulated annealing implementation was used to find the optimal parameters. Boundary parameters were (−Inf, Inf) for all parameters in θ, and initial parameters were θ = 0. After the initial run, boundary parameters were picked by hand to include the optimum, with tighter one-sided bounds to improve running time for subsequent runs. This did not change the optimum parameters appreciably, so the initial parameters were changed to the optimum parameters. Again, this did not change the optimum parameters upon rerunning simulated annealing. This gives us more confidence that the parameters we found are in fact optimal within their local neighborhood.
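The search can be sketched as a plain simulated-annealing loop. The paper used Matlab's implementation; here `f_score_of` is a stand-in for running graph-cuts inference with the candidate parameters and scoring the result against ground truth, and the schedule constants are assumptions:

```python
import numpy as np

def anneal_parameters(f_score_of, theta0, n_iter=800, t0=1.0, decay=0.99, seed=0):
    """Maximize f_score_of(theta) by simulated annealing: candidates are
    drawn around the current theta, constrained by a decaying temperature."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    cur = f_score_of(theta)
    best_theta, best = theta, cur
    temp = t0
    for _ in range(n_iter):
        cand = theta + rng.normal(scale=temp, size=theta.shape)
        score = f_score_of(cand)
        if score > best:                      # remember the best candidate seen
            best_theta, best = cand, score
        # Accept improvements always; worse moves with Boltzmann probability.
        if score > cur or rng.random() < np.exp((score - cur) / max(temp, 1e-12)):
            theta, cur = cand, score
        temp *= decay
    return best_theta, best
```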
2.4.3 Inference. The volume was first smoothed with a one-voxel-radius Gaussian filter to remove high-frequency noise. An exact maximum a posteriori solution was then obtained for the pairwise Ising model by a graph cuts algorithm. Graph cuts were performed using Olga Veksler's gco-v3.0 library in C++ with a Matlab wrapper [6, 9].
Figure 2: Tumor 11, comparison of (a) inferred segmentation versus (b) the ground truth labeling, with (c) the unsegmented subvolume for reference.
Figure 3: Tumor 23, comparison of (a) inferred segmentation versus (b) the ground truth labeling, with (c) the unsegmented subvolume for reference.
3 Results
3.1 Segmentation. The parameters were learned from the first nodules of the 4 given pairs of training nodules. Results were segmented using graph cuts on the first nodules of the 50 pairs of test nodules. The mean precision was 0.92 and the mean recall was 0.89, not accounting for the size of the nodules. Example segmentations and the ground truth can be seen in Figures 2 and 3. When all 50 pairs (100 nodules) were evaluated, the mean precision was 0.91 and the mean recall was 0.89.
The segmented physical volumes were plotted against the ground truth physical volumes in Figure 4. An ordinary least squares fit was applied to the data, and the fit line closely
Figure 4: Plot of segmented volume size versus ground truth volume size (axes ×10⁴). An ordinary least squares fit is shown along with the expected fit y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. Our method accurately estimates the volumes compared to ground truth, with no significant bias towards either a larger or a smaller segmentation.
Figure 5: Histogram of errors relative to ground truth volume (x-axis: relative error; y-axis: number of examples).
approximates the expected fit line y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. This shows that our method accurately estimates the volumes compared to ground truth and that there is no significant bias towards either a larger or a smaller segmentation.
The relative volume error compared to ground truth was calculated for each of the first 50 test examples. The maximum positive error was 0.33 and the maximum negative error was −0.31. A histogram of the relative errors is shown in Figure 5.
Figure 6: Histogram of precision and recall of the first 50 segmented examples.
A 2D histogram of the precisions and recalls is shown in Figure 6. Most examples had precisions and recalls within the 0.8 to 1.0 range.
As a comparison test, performance was compared to the Robust Statistical Segmentation (RSS) procedure implemented in Slicer. The RSS method uses a statistics-driven active contour model for segmentation [20]. Approximate volumes were specified using ground truth data. Boundary and intensity uniformity parameters were tuned by hand for each nodule until a satisfactory or best possible segmentation was achieved. Slicer RSS achieved a mean of 0.78 precision and 0.78 recall under these conditions. A histogram of the results can be seen in Figure 7. RSS is more inconsistent in its performance compared to our method. Some segmentations can be seen in Figures 8, 9, and 10, and a volume rendering can be seen in Figure 11. As a whole, our method performed better than RSS in Slicer, but in some individual cases, like Figure 10, RSS performed better. There are examples in which both methods performed poorly as well. Tumor 30 is such an example, largely due to significant vascularization of the nodule and its juxtapleural position. A volume rendering comparison of Tumor 30 can be seen in Figure 12. RSS oversegmented the nodule significantly, while DRF also oversegmented the nodule but to a lesser extent. A slice-by-slice comparison can be seen in Figure 13.
The metric used to evaluate performance in the VOLCANO'09 Challenge is percent volume change, (V2 − V1)/V1, from the first sample volume of a pair (V1) to the second one (V2). In Figure 14, the percentage change for each testing pair was plotted against the percentage change from a participant [15] and against the percentage change of our ground truth. Because there was no previous ground truth percentage change established for the challenge, our ground truth does not reflect the desired results of the challenge.
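For reference, the challenge metric is straightforward to compute:

```python
def percent_volume_change(v1, v2):
    """VOLCANO'09 metric: relative change from the first scan volume V1
    to the second scan volume V2."""
    return (v2 - v1) / v1
```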
4 Discussion
Due to the lack of widely available, dedicated lung nodule segmentation software, it is currently difficult to compare
Figure 7: Histogram of precision and recall of the first 50 Slicer RSS segmented examples.
our results with existing standards. In comparison with similar work, Ye et al. report a mean Dice's coefficient of 0.79 on 101 nodules [13]. Our Dice's coefficient (which is equivalent in definition to the F-score in this context) is 0.90. The standard deviations of our F-scores were both around 0.06. We suspect that our superior performance, despite simpler features, can be explained by two factors: first, our discriminative model and training gave us a better energy function; second, simpler metrics may prove to be more tolerant to error. Dehmeshki et al. did not do a voxelwise comparison but instead reported an "acceptability" metric of 0.84 as determined by radiologist examination [21]. Kostis et al. seem to have achieved very good results, but they did not report explicit performance metrics comparing their results to ground truth [4]. Neither Zhao et al. [14] nor Xu et al. [16] reported data sets or performance metrics compared to ground truth. The comparison with Robust Statistical Segmentation in Slicer shows our performance against a state-of-the-art generalized segmentation tool, and our method on average performs better.
One must also be wary of placing too much trust in ground truth. Manual segmentations currently in use may differ significantly between users, as Opfer and Wiemker pointed out [22]. Without a better idea of the variation in acceptable segmentations, one runs the risk of overfitting. For a case like Tumor 30 (which was challenging for both our algorithm and the comparison algorithms), the nearby vasculature and pleura may affect the accuracy of manual segmentations as well.
Several groups participated in the VOLCANO'09 Challenge [15, 18, 19], but because the challenge was focused on evaluating volume change in longitudinal studies instead of measuring volume itself, only volume change metrics were reported. Volume change metrics from our results were comparable to the results from Hayashi et al. [15]. Because aggregate results for the VOLCANO'09 Challenge were renumbered before reporting in Reeves et al. [19], we did not compare against their aggregate results. Given our established ground truth, however, we believe that precision and recall are a better measure of our performance in general.
Figure 8: Tumor 20, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, RSS overestimated the roundness and undersegmented the nodule; our method successfully segmented the bumps.
Figure 9: Tumor 40, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, both segmentation methods performed well.
Figure 10: Tumor 50, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. This is an example in which our method oversegmented into the pleural wall while RSS did not.
A natural extension of this work would be to apply the same method to the segmentation of other tumors in the body. The problem of segmentation in other anatomical areas has, of course, been studied; for example, Lee et al.'s work involved segmenting MRI data on brain tumors, with results implying precision and recall of around 0.8 each [11].
The main advantage of the DRF learning framework is the automatic learning of energy function parameters for segmentation. Since all specific knowledge about the type of tumor we are looking for is learned automatically from the training examples, as opposed to knowledge built into the algorithm, we can in theory train our model to work with other types of tumors than the lung nodules presented
in this paper. In practice, lung nodules are generally easier to distinguish due to their high contrast with surrounding tissue, so applying the model to other tumors will likely produce worse results.
If the problem has been formulated properly, the theoretical optimum solution for the parameters should be the maximum likelihood solution to the DRF. Our investigation, however, found that the maximum likelihood solution favored oversegmentation, achieving very high recall but with losses in precision. We thus decided to use a more practical approach and optimize directly based on the metric we were using to evaluate the algorithm: the F-score, the harmonic mean of precision and recall. Our results give better
Figure 11: Tumor 20, comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume.
Figure 12: Tumor 30, comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume. Both segmentation methods performed poorly, but RSS vastly oversegmented the nodule compared to our method.
Figure 13: Tumor 30, comparison of (a) inferred segmentation versus (b) RSS, with (c) ground truth and (d) the unsegmented subvolume for reference.
recall with similar precision compared to the maximum pseudo-likelihood solution for the parameters. The difference is on the order of a few percentage points.
In practice, the inference step required to segment new nodules can be solved via fast polynomial-time graph cuts algorithms. Using unoptimized Matlab code on a 3.3 GHz quad-core desktop with 8 GB of RAM, this translated to sub-10-second segmentations for the volumes tested. With optimized, compiled code this will likely be much faster.

Figure 14: Percent volume change, (V2 − V1)/V1, for each case, compared with Hayashi et al.'s reported percent volume change and our ground truth percent volume change.
4.1 Conclusion. Our DRF semiautomatic segmentation produces results that are generally very accurate, with on average 0.90 precision and recall. This system can be used to facilitate lung nodule size tracking applications. Further work includes creating a clinical application in order to investigate the consistency and clinical applicability of such a system. Future work can also expand the algorithm's performance to different types of tumors, such as brain or liver. More consistency can be established with better radius estimation, which can be achieved through a better initial segmentation. Another possibility would be to extend the robust ellipsoid fitting algorithm of Okada et al. [17] to three dimensions, allowing us to get a better estimate of nodule shape.
References
[1] J. Zhang, K. K. Ma, M. H. Er, and V. Chong, "Tumor segmentation from magnetic resonance imaging by learning via one-class support vector machine," in Proceedings of the International Workshop on Advanced Image Technology, pp. 207–211, 2004.
[2] Y. L. Huang, J. H. Chen, and W. C. Shen, "Diagnosis of hepatic tumors with texture analysis in nonenhanced computed tomography images," Academic Radiology, vol. 13, no. 6, pp. 713–720, 2006.
[3] H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark, "Markov random field for tumor detection in digital mammography," IEEE Transactions on Medical Imaging, vol. 14, no. 3, pp. 565–576, 1995.
[4] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images," IEEE Transactions on Medical Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.
[5] S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, Secaucus, NJ, USA, 2001.
[6] Y. Boykov, O. Veksler, and R. Zabih, "Efficient approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1222–1239, 2001.
[7] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
[8] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
[9] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2173–2180, June 2010.
[10] S. Kumar and M. Hebert, "Discriminative fields for modeling spatial dependencies in natural images," in Advances in Neural Information Processing Systems, MIT Press, 2003.
[11] C. H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, and R. Greiner, "Segmenting brain tumors with conditional random fields and support vector machines," in Computer Vision for Biomedical Image Applications, vol. 3765 of Lecture Notes in Computer Science, pp. 469–478, 2005.
[12] J. Suzuki, E. McDermott, and H. Isozaki, "Training conditional random fields with multivariate evaluation measures," in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL '06), pp. 217–224, Association for Computational Linguistics, Stroudsburg, Pa, USA, July 2006.
[13] X. Ye, G. Beddoe, and G. Slabaugh, "Automatic graph cut segmentation of lesions in CT using mean shift superpixels," International Journal of Biomedical Imaging, vol. 2010, Article ID 983963, 14 pages, 2010.
[14] B. Zhao, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional multicriterion automatic segmentation of pulmonary nodules of helical computed tomography images," Optical Engineering, vol. 38, no. 8, pp. 1340–1347, 1999.
[15] T. Hayashi, X. Zhou, and H. Fujita, "Automated approach to measure pulmonary nodule volume based on radius and CT number," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 377–388, 2009.
[16] N. Xu, N. Ahuja, and R. Bansal, "Automated lung nodule segmentation using dynamic programming and EM-based classification," in Medical Imaging 2002: Image Processing, M. Sonka and J. M. Fitzpatrick, Eds., vol. 4684 of Proceedings of SPIE, pp. 666–676, May 2002.
[17] K. Okada, D. Comaniciu, and A. Krishnan, "Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT," IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 409–423, 2005.
[18] T. Duindam, B. de Hoop, and B. van Ginneken, "Nodule volume change estimation in thoracic CT using sphere fitting, morphological segmentation and image registration," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 389–400, 2009.
[19] A. Reeves, A. Jirapatnakul, A. Biancardi, et al., "The VOLCANO '09 challenge: preliminary results," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 353–364, 2009.
[20] Y. Gao, A. Tannenbaum, and R. Kikinis, "Simultaneous multi-object segmentation using local robust statistics and contour interaction," in Proceedings of the 2010 International MICCAI Conference on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging (MCV '10), pp. 195–203, Springer, 2011, http://portal.acm.org/citation.cfm?id=1964478.1964502.
[21] J. Dehmeshki, H. Amin, M. Valdivieso, and X. Ye, "Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach," IEEE Transactions on Medical Imaging, vol. 27, no. 4, pp. 467–480, 2008.
[22] R. Opfer and R. Wiemker, "A new general tumor segmentation framework based on radial basis function energy minimization with a validation study on LIDC lung nodules," in Medical Imaging 2007: Image Processing, vol. 6512 of Proceedings of SPIE, February 2007, article 651217.
Figure 1: (a) Histogram of voxel intensities of positive examples (nodule voxels) and negative examples (everything else) in training data. Negative examples overwhelm positive examples at all intensities, even in the local area, as shown in this histogram; we must thus exploit locality to achieve a good segmentation. (b) Histogram normalized by number of voxels.
(2) Subvolume and initial radius estimation: a rough estimate of the radius r_init was obtained by growing a bounding box and stopping when the fraction of voxels not in the initial segmentation reached 0.75 of the total volume.
(3) Lung subvolume extraction: a morphological close followed by a morphological open operation with an anisotropic sphere of 6 mm radius (under the assumption that most features in the lung are smaller than 6 mm) was performed on the inverse of the initial segmentation. The nodule area was filled in with an anisotropic sphere of radius r_init/2 centered at the input point, and a morphological close operation was applied to arrive at the final lung volume.
(4) The initial segmentation was filtered to include only voxels in the lung volume, and filtered again to include only the voxels in the same connected component as the seed point.
(5) The center of the nodule was recalculated by finding the local maximum of the 2D distance transform (distance from outside the smoothed segmentation) closest to the seed point on the same slice.
(6) The final estimated nodule radius r was calculated by expanding a sphere from the new center until it included no more segmented voxels or the fraction of smoothed segmentation voxels inside the sphere fell below 0.5.
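The radius estimate of step (6) can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function and parameter names are ours, and only the 0.5 stopping fraction comes from the text.

```python
import numpy as np

def estimate_radius(seg, center, spacing=(1.0, 1.0, 1.0), max_r=30.0, step=0.5):
    """Expand a sphere from `center` and report the radius at which the
    fraction of segmented voxels inside it first drops below 0.5."""
    zz, yy, xx = np.indices(seg.shape)
    # physical distance of every voxel from the center (anisotropic voxels)
    dist = np.sqrt(((zz - center[0]) * spacing[0]) ** 2 +
                   ((yy - center[1]) * spacing[1]) ** 2 +
                   ((xx - center[2]) * spacing[2]) ** 2)
    r = step
    while r < max_r:
        inside = dist <= r
        frac = seg[inside].mean() if inside.any() else 0.0
        if frac < 0.5 or not seg[inside].any():
            break
        r += step
    return r
```

For a solid sphere of radius 5, the loop stops near r ≈ 5·2^(1/3), where the enclosed-fraction (5/r)^3 crosses 0.5.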
2.3. DRF Framework. We construct a DRF model of the CT volume as follows.
Let G = (S, E) be the graph that represents the 3D volume, where each node in S represents a voxel and an edge in E connects adjacent voxels in a 6-neighborhood. Let n_i be the observed intensity at voxel s_i ∈ S, let p_i be the 3-vector of the relative coordinates of voxel s_i in the volume, and let label x_i ∈ {−1, 1} be the label associated with s_i. We define an observation y_i = (n_i, p_i). The random variables x_i obey the Markov property that Pr(x_i | y, x_{S∖{i}}) = Pr(x_i | y, x_{N_i}), where N_i is the set of neighbors of s_i and S∖{i} is everything in S except s_i.
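Building the 6-neighborhood edge set E for a 3D volume is straightforward; a minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def grid_edges_6(shape):
    """Edges of a 3D grid graph with 6-connectivity: each voxel is
    linked to its face neighbors along the z, y, and x axes."""
    idx = np.arange(np.prod(shape)).reshape(shape)
    edges = []
    for axis in range(3):
        a = np.moveaxis(idx, axis, 0)
        # pair every slice with the next one along this axis
        edges.append(np.stack([a[:-1].ravel(), a[1:].ravel()], axis=1))
    return np.concatenate(edges)
```

An n1 × n2 × n3 grid has (n1−1)n2n3 + n1(n2−1)n3 + n1n2(n3−1) such edges.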
Assuming only pairwise clique potentials to be nonzero, the posterior over labels factors as
Pr(x | y) = (1/Z) exp( Σ_{i∈S} A(x_i, y) + Σ_{i∈S} Σ_{j∈N_i} I(x_i, x_j, y) ),
where Z is the partition function, A is an association potential, and I is an interaction potential.
2.3.1. Association Potential. We model the association potential discriminatively using a logistic model, since the labels are binary. We define a feature vector f_i at site s_i as a function of the observations y. The location of the lung nodule voxels was also modeled as a Gaussian deviation from a prior known location, normalized by the estimated nodule radius r (calculated automatically) and constants l = (l_x, l_y, l_z) and σ_loc = d·v, where v is the size of the voxel in x, y, and z physical coordinates.
4 Computational and Mathematical Methods in Medicine
We then define our feature vector to be
f_g = sqrt( (n_i − μ)² / σ² ),
f_u = 0 if n_i > t_min and n_i < t_max.
The first two features capture the cost of a voxel's intensity in a Gaussian model and a uniform model, respectively. The third feature captures the cost of a voxel distant from the expected nodule center.
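A per-voxel feature vector along these lines might be computed as below. The out-of-range value of f_u and the exact form of the location term are not fully recoverable from the text, so those specifics are our assumptions:

```python
import numpy as np

def features(n_i, p_i, mu, sigma, t_min, t_max, center, r, sigma_loc):
    """Per-voxel feature vector [f_g, f_u, f_loc].
    The value of f_u outside [t_min, t_max] and the exact location term
    are our assumptions; the paper only specifies the in-range case."""
    f_g = np.sqrt((n_i - mu) ** 2 / sigma ** 2)    # Gaussian intensity cost
    f_u = 0.0 if t_min < n_i < t_max else 1.0      # uniform-model cost (assumed penalty)
    d = np.linalg.norm((np.asarray(p_i) - np.asarray(center)) / r)
    f_loc = d ** 2 / (2 * sigma_loc ** 2)          # Gaussian location cost (assumed form)
    return np.array([f_g, f_u, f_loc])
```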
We then have the option of transforming our feature vector via some nonlinear transformation to h_i(y) = [1, φ_1(f_i(y)), φ_2(f_i(y))]^T, which is a kernel mapping of our original feature vector with the introduction of a bias element. We chose not to use a kernel, so φ(f_i(y)) = f_i(y).
The features are then weighted by a parameter w. We formulate our association potential as a probability by applying a logistic function:
Pr(x_i = 1 | y) = 1 / (1 + e^{−w^T h_i(y)}).  (3)
Since Pr(x_i = −1 | y) = 1 − Pr(x_i = 1 | y), the model can be written compactly as Pr(x_i | y) = 1 / (1 + e^{−x_i w^T h_i(y)}).
Finally, we model the association potential as the log of this probability, in order to preserve the logistic regression characteristics when the interaction potential factor is zero [10].
The parameter to learn in the association potential is then w.
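Putting the pieces above together, the association potential is the log of the logistic probability; a sketch, where the compact ±1 form follows from the symmetry of the two class probabilities:

```python
import numpy as np

def association_potential(x_i, h_i, w):
    """A(x_i, y) = log Pr(x_i | y) for the logistic model,
    with x_i in {-1, +1} and h_i = [1, f_1, f_2, ...]."""
    z = x_i * np.dot(w, h_i)
    # log(1 / (1 + e^{-z})), computed stably via log1p
    return -np.log1p(np.exp(-z))
```

When w^T h_i = 0 the potential is log 0.5 for both labels, and the two class probabilities always sum to one.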
2.3.2. Interaction Potential. We model the interaction potential using the pairwise smoothing of the Ising model, modulated by a constant minus the normalized difference in intensities of the two sites; we define a new feature δ from this intensity difference.
The β term is a constant controlling whether the smoothing cost affects the potential. The parameter to optimize is then v.
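The interaction formula itself did not survive extraction; the following Ising-style form, a constant β minus a normalized intensity difference, scaled by v, is one plausible reading of the description and should be taken as our assumption, not the paper's exact equation:

```python
def interaction_potential(x_i, x_j, n_i, n_j, v, beta, n_range):
    """Ising pairwise smoothing modulated by intensity similarity.
    The exact equation is not recoverable from the text; this form
    (beta minus a normalized intensity difference) is our assumption."""
    delta = abs(n_i - n_j) / n_range          # normalized intensity difference
    return v * x_i * x_j * (beta - delta)     # rewards equal labels when intensities agree
```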
2.4. Learning and Inference
2.4.1. Performance Metrics. The primary performance metrics used for evaluation are precision and recall. Given a calculated labeling O and the ground truth labeling G, where nodule voxels are positive samples and nonnodule voxels are negative, tp denotes true positives, fp denotes false positives, and fn denotes false negatives. Precision and recall are then defined as
precision = tp / (tp + fp),
recall = tp / (tp + fn).  (7)
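The definitions in (7) translate directly to boolean voxel masks (a sketch; names are ours):

```python
import numpy as np

def precision_recall(pred, truth):
    """Voxelwise precision and recall for boolean masks."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return tp / (tp + fp), tp / (tp + fn)
```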
2.4.2. Learning. Optimal parameters were learned using simulated annealing on the F-score of inference results on training data.
Given parameters θ = (w, v), there exists an optimal labeling O such that for each x_i, given y, A(x_i, y) + Σ_{j∈N_i} I(x_i, x_j, y) is greater than A(−x_i, y) + Σ_{j∈N_i} I(−x_i, x_j, y), where −x_i is the opposite label of x_i. The optimal labeling is calculated using graph cuts [5].
Optimal parameters were found by performing simulated annealing on the F-score, defined as 2 · (precision × recall)/(precision + recall). At a given iteration i, a segmentation was calculated with graph cuts using parameters θ_i generated randomly from the previous parameters θ_{i−1}, constrained in distance by a "temperature" parameter that slowly decays as the iterations increase. The calculated segmentation is then used to calculate the F-score, which is compared to the F-score of the previous iteration as part of the simulated annealing process. Matlab's simulated annealing implementation was used to find the optimal parameters. Boundary parameters were (−Inf, Inf) for all parameters in θ, and initial parameters for simulated annealing were θ = 0. After the initial run, boundary parameters were picked by hand to include the optimum, with tighter one-sided bounds to improve running time for subsequent runs. This did not change the optimum parameters appreciably, so the initial parameters were changed to the optimum parameters. Again, this did not change the optimum parameters upon rerunning simulated annealing. This gives us more confidence that the optimum parameters we found are in fact optimal within their local neighborhood.
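The authors used Matlab's simulated annealing; a minimal Python sketch of the same loop follows, with a generic objective standing in for the graph-cut segmentation plus F-score (all names are ours):

```python
import math
import random

def anneal(f_score, theta0, iters=2000, t0=1.0, seed=0):
    """Maximize f_score(theta) by simulated annealing: propose a
    temperature-scaled random step, always accept improvements, and
    accept worse moves with probability exp(delta / T)."""
    rng = random.Random(seed)
    theta = list(theta0)
    score = f_score(theta0)
    best, best_theta = score, list(theta)
    for i in range(1, iters + 1):
        T = t0 / i                                     # slowly decaying temperature
        cand = [t + rng.gauss(0.0, T) for t in theta]  # step size shrinks with T
        s = f_score(cand)
        if s > score or rng.random() < math.exp((s - score) / max(T, 1e-12)):
            theta, score = cand, s
            if s > best:
                best, best_theta = s, list(cand)
    return best_theta, best
```

Because improvements are always accepted, the best score found is guaranteed to be at least the score of the starting point.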
2.4.3. Inference. The volume was first smoothed with a one-voxel-radius Gaussian filter to remove high-frequency noise. An exact maximum a posteriori solution was then obtained for the pairwise Ising model by a graph cuts algorithm. Graph cuts were performed using Olga Veksler's gco-v3.0 library in C++ with a Matlab wrapper [6, 9].
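For intuition, the MAP labeling that graph cuts computes can be cross-checked by exhaustive enumeration on a tiny graph. This is illustrative only; the paper's implementation uses the gco library, and enumeration is feasible only for a handful of nodes:

```python
from itertools import product

def map_brute_force(n_nodes, edges, A, I):
    """Maximize sum_i A(i, x_i) + sum_(i,j) I(x_i, x_j) over x in {-1,+1}^n.
    Graph cuts find this exactly for submodular pairwise terms; here we
    simply enumerate all labelings."""
    best, best_x = float("-inf"), None
    for x in product((-1, 1), repeat=n_nodes):
        e = sum(A(i, x[i]) for i in range(n_nodes))
        e += sum(I(x[i], x[j]) for i, j in edges)
        if e > best:
            best, best_x = e, x
    return best_x
```

On a 3-node chain with a strong unary preference for +1 at one end and Ising smoothing on the edges, the smoothing term propagates the preferred label along the chain.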
Figure 2: Tumor 11: comparison of inferred segmentation (a) versus the ground truth labeling (b), with the unsegmented subvolume (c) for reference.
Figure 3: Tumor 23: comparison of inferred segmentation (a) versus the ground truth labeling (b), with the unsegmented subvolume (c) for reference.
3. Results
3.1. Segmentation. The parameters were learned from the first nodules of the 4 given pairs of training nodules. Results were segmented using graph cuts on the first nodules of the 50 pairs of test nodules. The mean precision was 0.92 and the mean recall was 0.89, not accounting for the size of the nodules. An example segmentation and the ground truth can be seen in Figures 2 and 3. When all 50 pairs (100 nodules) were evaluated, the mean precision was 0.91 and the mean recall was 0.89.
The segmented physical volumes were plotted against theground truth physical volumes in Figure 4 An ordinary leastsquares fit was applied to the data and the fit line closely
Figure 4: Plot of segmented volume size versus ground truth volume size (both axes ×10^4). An ordinary least squares fit (y = 1.046x − 893.5) is shown along with the expected fit y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. Our method accurately estimates the volumes compared to ground truth, with no significant bias towards either a larger or a smaller segmentation.
Figure 5: Histogram of errors relative to ground truth volume.
approximates the expected fit line y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. This shows that our method accurately estimates the volumes compared to ground truth and that there is no significant bias towards either a larger or a smaller segmentation.
The relative volume error compared to ground truth was calculated for each of the first 50 test examples. The maximum positive error was 0.33 and the maximum negative error was −0.31. A histogram of the relative errors is shown in Figure 5.
Figure 6: Histogram of precision and recall of the first 50 segmented examples.
A 2D histogram of the precisions and recalls is shown in Figure 6. Most examples had precisions and recalls within the 0.8 to 1.0 range.
As a comparison test, performance was compared to the Robust Statistical Segmentation procedure implemented in Slicer. The RSS method uses a statistics-driven active contour model for segmentation [20]. Approximate volumes were specified using ground truth data. Boundary and intensity uniformity parameters were tuned by hand for each nodule until a satisfactory or best possible segmentation was achieved. Slicer RSS achieved a mean of 0.78 precision and 0.78 recall under these conditions. A histogram of the results can be seen in Figure 7. RSS is more inconsistent in its performance compared to our method. Some segmentations can be seen in Figures 8, 9, and 10, and a volume rendering can be seen in Figure 11. As a whole, our method performed better than RSS in Slicer, but in some individual cases, like Figure 10, RSS performed better. There are examples in which both methods performed poorly as well. Tumor 30 is such an example, largely due to significant vascularization of the nodule and its juxtapleural position. A volume rendering comparison of Tumor 30 can be seen in Figure 12. RSS oversegmented the nodule significantly, while DRF also oversegmented the nodule, to a lesser extent. A slice-by-slice comparison can be seen in Figure 13.
The metric used to evaluate performance in the VOLCANO'09 Challenge is percent volume change, (V2 − V1)/V1, from the first sample volume of a pair (V1) to the second one (V2). In Figure 14, the percentage change for each testing pair was plotted against the percentage change from a participant [15] and against the percentage change of our ground truth. Because there was no previous ground truth percentage change established for the challenge, our ground truth does not reflect the desired results of the challenge.
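The challenge metric is a one-line computation on the two physical volumes (a sketch; names are ours):

```python
def percent_volume_change(v1_voxels, v2_voxels, voxel_volume_mm3=1.0):
    """VOLCANO'09 metric: (V2 - V1) / V1 on physical volumes."""
    v1 = v1_voxels * voxel_volume_mm3
    v2 = v2_voxels * voxel_volume_mm3
    return (v2 - v1) / v1
```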
4. Discussion
Due to the lack of widely available dedicated lung nodule segmentation software, it is currently difficult to compare
Figure 7: Histogram of precision and recall of the first 50 Slicer RSS segmented examples.
our results with existing standards. In comparison with similar work, Ye et al. report a mean Dice coefficient of 0.79 on 101 nodules [13]. Our Dice coefficient (which is equivalent by definition to the F-score in this context) is 0.90. The standard deviations of our F-scores were both around 0.06. We suspect that our superior performance despite simpler features can be explained by two factors: first, our discriminative model and training gave us a better energy function, and second, simpler metrics may prove to be more tolerant of error. Dehmeshki et al. did not do a voxelwise comparison, but instead reported an "acceptability" metric of 0.84 as determined by radiologist examination [21]. Kostis et al. seemed to have achieved very good results, but they did not report explicit performance metrics comparing their results to ground truth [4]. Neither Zhao et al. [14] nor Xu et al. [16] reported data sets or performance metrics compared to ground truth. The comparison with Robust Statistical Segmentation in Slicer shows our performance against a state-of-the-art generalized segmentation tool, and our method on average performs better.
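The stated equivalence of the Dice coefficient and the F-score is easy to verify: for voxel sets, Dice = 2|O∩G|/(|O|+|G|) = 2PR/(P+R). A quick numerical check (illustrative code, names ours):

```python
def dice(pred, truth):
    """Dice coefficient between two voxel index sets."""
    inter = len(pred & truth)
    return 2 * inter / (len(pred) + len(truth))

def f_score(pred, truth):
    """F-score from voxelwise precision and recall."""
    tp = len(pred & truth)
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

Substituting P = tp/|O| and R = tp/|G| into 2PR/(P+R) gives 2·tp/(|O|+|G|), which is exactly Dice.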
One must also be wary of placing too much trust in ground truth. Manual segmentations currently in use may differ significantly between users, as Opfer and Wiemker pointed out [22]. Without a better idea of the variation in acceptable segmentations, one runs the risk of overfitting. For a case like Tumor 30 (which was challenging both for our algorithm and for the comparison algorithms), the nearby vasculature and pleura may affect the accuracy of manual segmentations as well.
Several groups participated in the VOLCANO'09 Challenge [15, 18, 19], but because the challenge was focused on evaluating volume change in longitudinal studies instead of measuring volume itself, only volume change metrics were reported. Volume change metrics from our results were comparable to the results from Hayashi et al. [15]. Because aggregate results for the VOLCANO Challenge were renumbered before reporting in Reeves et al. [19], we did not compare their aggregate results. Given our established ground truth, however, we believe that precision and recall are a better measure of our performance in general.
Figure 8: Tumor 20: comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, RSS overestimated the roundness and undersegmented the nodule. Our method successfully segmented the bumps.
Figure 9: Tumor 40: comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, both segmentation methods performed well.
Figure 10: Tumor 50: comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. This is an example in which our method oversegmented into the pleural wall, while RSS did not.
A natural extension of this work would be to apply the same method to segmentation of other tumors in the body. The problem of segmentation in other anatomical areas has of course been studied; for example, Lee et al.'s work involved segmenting MRI data on brain tumors, with results implying precision and recall of around 0.8 each [11].
The main advantage of the DRF learning framework is the automatic learning of energy function parameters for segmentation. Since all specific knowledge about the type of tumor we are looking for is learned automatically from the training examples, as opposed to knowledge built into the algorithm, we can in theory train our model to work with types of tumors other than the lung nodules presented in this paper. In practice, lung nodules are generally easier to distinguish due to their high contrast with surrounding tissue, so applying the model to other tumors will likely produce worse results.
If the problem has been formulated properly, the theoretical optimum solution for the parameters should be the maximum likelihood solution to the DRF. Our investigation, however, found that the maximum likelihood solution favored oversegmentation, achieving a very high recall but with losses in precision. We thus decided to use a more practical approach and optimize directly on the metric we were using to evaluate the algorithm: the F-score, the harmonic mean of precision and recall. Our results give better
Figure 11: Tumor 20: comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume.
Figure 12: Tumor 30: comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume. Both segmentation methods performed poorly, but RSS vastly oversegmented the nodule compared to our method.
Figure 13: Tumor 30: comparison of inferred segmentation (a) versus RSS (b), with ground truth (c) and the unsegmented subvolume (d) for reference.
recall with similar precision compared to the maximum pseudo-likelihood solution for the parameters. The difference is on the order of a few percentage points.
In practice, the inference step required to segment new nodules can be solved via fast polynomial-time algorithms using graph cuts. Using unoptimized Matlab code on a 3.3 GHz quad-core desktop with 8 GB RAM, this translated
Figure 14: Percent volume change (V2 − V1)/V1 for each case: segmented volume change, Hayashi et al.'s reported volume change [15], and our ground truth volume change.
to sub-10-second segmentations for the volumes tested. With optimized, compiled code, this will likely be much faster.
4.1. Conclusion. Our DRF semiautomatic segmentation produces results that are generally very accurate, with on average 90% precision and recall. This system can be used to facilitate lung nodule size tracking applications. Further work includes creating a clinical application in order to investigate the consistency and clinical applicability of such a system. Future work can be done to expand the algorithm's performance to different types of tumors, such as brain or liver. More consistency can be established with better radius estimation, which can be achieved through a better initial segmentation. Another possibility would be to try extending the robust ellipsoid fitting algorithm from Okada et al. [17] to three dimensions, allowing us to get a better estimate of nodule shape.
References
[1] J. Zhang, K. K. Ma, M. H. Er, and V. Chong, "Tumor segmentation from magnetic resonance imaging by learning via one-class support vector machine," in Proceedings of the International Workshop on Advanced Image Technology, pp. 207–211, 2004.
[2] Y. L. Huang, J. H. Chen, and W. C. Shen, "Diagnosis of hepatic tumors with texture analysis in nonenhanced computed tomography images," Academic Radiology, vol. 13, no. 6, pp. 713–720, 2006.
[3] H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark, "Markov random field for tumor detection in digital mammography," IEEE Transactions on Medical Imaging, vol. 14, no. 3, pp. 565–576, 1995.
[4] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images," IEEE Transactions on Medical Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.
[5] S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, Secaucus, NJ, USA, 2001.
[6] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[7] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
[8] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
[9] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2173–2180, June 2010.
[10] S. Kumar and M. Hebert, "Discriminative fields for modeling spatial dependencies in natural images," in Advances in Neural Information Processing Systems, MIT Press, 2003.
[11] C. H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, and R. Greiner, "Segmenting brain tumors with conditional random fields and support vector machines," in Computer Vision for Biomedical Image Applications, vol. 3765 of Lecture Notes in Computer Science, pp. 469–478, 2005.
[12] J. Suzuki, E. McDermott, and H. Isozaki, "Training conditional random fields with multivariate evaluation measures," in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL '06), pp. 217–224, Association for Computational Linguistics, Stroudsburg, Pa, USA, July 2006.
[13] X. Ye, G. Beddoe, and G. Slabaugh, "Automatic graph cut segmentation of lesions in CT using mean shift superpixels," International Journal of Biomedical Imaging, vol. 2010, Article ID 983963, 14 pages, 2010.
[14] B. Zhao, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional multicriterion automatic segmentation of pulmonary nodules of helical computed tomography images," Optical Engineering, vol. 38, no. 8, pp. 1340–1347, 1999.
[15] T. Hayashi, X. Zhou, and H. Fujita, "Automated approach to measure pulmonary nodule volume based on radius and CT number," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 377–388, 2009.
[16] N. Xu, N. Ahuja, and R. Bansal, "Automated lung nodule segmentation using dynamic programming and EM-based classification," in Medical Imaging 2002: Image Processing, M. Sonka and J. M. Fitzpatrick, Eds., vol. 4684 of Proceedings of SPIE, pp. 666–676, May 2002.
[17] K. Okada, D. Comaniciu, and A. Krishnan, "Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT," IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 409–423, 2005.
[18] T. Duindam, B. de Hoop, and B. van Ginneken, "Nodule volume change estimation in thoracic CT using sphere fitting, morphological segmentation and image registration," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 389–400, 2009.
[19] A. Reeves, A. Jirapatnakul, A. Biancardi, et al., "The VOLCANO '09 challenge: preliminary results," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 353–364, 2009.
[20] Y. Gao, A. Tannenbaum, and R. Kikinis, "Simultaneous multi-object segmentation using local robust statistics and contour interaction," in Proceedings of the 2010 International MICCAI Conference on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging (MCV '10), pp. 195–203, Springer, 2011, http://portal.acm.org/citation.cfm?id=1964478.1964502.
[21] J. Dehmeshki, H. Amin, M. Valdivieso, and X. Ye, "Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach," IEEE Transactions on Medical Imaging, vol. 27, no. 4, pp. 467–480, 2008.
[22] R. Opfer and R. Wiemker, "A new general tumor segmentation framework based on radial basis function energy minimization with a validation study on LIDC lung nodules," in Medical Imaging 2007: Image Processing, vol. 6512 of Proceedings of SPIE, February 2007, article 651217.
Several groups participated in the VOLCANOrsquo09 Chal-lenge [15 18 19] but because the challenge was focusedon evaluating volume change in longitudinal studies insteadof measuring volume itself only volume change metricswere reported Volume change metrics from our resultswere comparable to the results from Hayashi et al [15]Because aggregate results for the VOLCANOChallenge wererenumbered before reporting in Reeves et al [19] we didnot compare their aggregate results Given our establishedground truth however we believe that the precision andrecall are a better measure of our performance in general
Computational and Mathematical Methods in Medicine 7
Figure 8 Tumor 20 comparison of inferred segmentation versus RSS ground truth and the unsegmented subvolume for reference In thisexample RSS overestimated the roundness and undersegmented the nodule Our method successfully segmented the bumps
Figure 9 Tumor 40 comparison of inferred segmentation versus RSS ground truth and the unsegmented subvolume for reference In thisexample both segmentation methods performed well
Figure 10 Tumor 50 comparison of inferred segmentation versus RSS ground truth and the unsegmented subvolume for reference This isan example in which our method oversegmented into the pleural wall while RSS did not
A natural extension of this work would be to apply thesame method to segmentation of other tumors in the bodyThe problem of segmentation in other anatomical areas hasof course been studied for example Lee et alrsquos work involvedsegmenting MRI data on brain tumors with results implyingtheir precision and recall were around 08 that each [11]
The main advantage of the DRF learning framework isthe automatic learning of energy function parameters forsegmentation Since all specific knowledge about the type oftumor we are looking for is learned automatically from thetraining examples as opposed to knowledge that is built intothe algorithm we can in theory train our model to workwith other types of tumors than the lung nodules presented
in this paper In practice lung nodules are generally easier todistinguish due to their high contrast to surrounding tissueso applying the model to other tumors will likely produceworse results
If the problem has been formulated properly the theo-retical optimum solution for the parameters should be themaximum likelihood solution to the DRF Our investiga-tion however found that the maximum likelihood solutionfavored oversegmentation achieving a very high recall butwith losses in precision We thus decided to use a morepractical approach and optimize directly based on the metricwe were using to evaluate the algorithm the F-score theharmonicmean of precision and recall Our results give better
8 Computational and Mathematical Methods in Medicine
Figure 11 Tumor 20 comparison of our segmented volume versusRSS and ground truth for reference From left to right our seg-mented volume ground truth volume and RSS segmented volume
Figure 12 Tumor 30 comparison of our segmented volume versusRSS and ground truth for reference From left to right our seg-mented volume ground truth volume and RSS segmented volumeBoth segmentation methods performed poorly but RSS vastlyoversegmented the nodule compared to our method
(a) Inferred segmentation
(b) RSS segmentation
(c) Ground truth segmentation
(d) Unsegmented
Figure 13 Tumor 30 comparison of inferred segmentation versusRSS with ground truth and unsegmented subvolume for reference
recall with similar precision compared to the maximumpseudo-likelihood solution for the parametersThe differenceis on the order of a few percentage points
In practice the inference step required to segment newnodules can be solved via fast polynomial time algorithmsusing graph cuts Using unoptimized Matlab code on a33 GHz quad core desktop with 8GB RAM this translated
0 10 20 30 40 50
3
35
Case number
Volume change comparison
Segmented volume changeHayashi reported volume changeGround truth volume change
05
15
25
0
1
2
minus05
minus1
Volu
me c
hang
e (V2minusV1)V
1Figure 14 Percent volume change versus Hayashi et alrsquos percentvolume change and our ground truth percent volume change
to sub-10 second segmentations for the volumes tested Withoptimized compiled code this will likely be much faster
41 Conclusion OurDRF semi-automatic segmentation pro-duces results that are generally very accurate with on average90 precision and recallThis system can be used to facilitatelung nodule size tracking applications Further work includescreating a clinical application in order to investigate theconsistency and clinical applicability of such a system Futurework can be done to expand the algorithmrsquos performanceto different types of tumors such as brain or liver Moreconsistency can be established with better radius estimationwhich can be achieved through a better initial segmentationAnother possibility would be to try extending the robustellipsoid fitting algorithm from Okada et al [17] to threedimensions allowing us to get a better estimate of noduleshape
References
[1] J Zhang K K Ma M H Er and V Chong ldquoTumor segmenta-tion frommagnetic resonance imaging by learning via one-classsupport vector machinerdquo in Proceedings of the InternationalWorkshop on Advanced Image Technology pp 207ndash211 2004
[2] Y L Huang J H Chen and W C Shen ldquoDiagnosis ofhepatic tumors with texture analysis in nonenhanced computedtomography imagesrdquoAcademic Radiology vol 13 no 6 pp 713ndash720 2006
[3] H D Li M Kallergi L P Clarke V K Jain and R A ClarkldquoMarkov random field for tumor detection in digital mammog-raphyrdquo IEEE Transactions on Medical Imaging vol 14 no 3 pp565ndash576 1995
[4] W J Kostis A P Reeves D F Yankelevitz and C I HenschkeldquoThree-dimensional segmentation and growth-rate estimation
Computational and Mathematical Methods in Medicine 9
of small pulmonary nodules in helical CT imagesrdquo IEEETransactions on Medical Imaging vol 22 no 10 pp 1259ndash12742003
[5] S Z Li Markov Random Field Modeling in Image AnalysisSpringer Secaucus NJ USA 2001
[6] Y Boykov O Veksler and R Zabih ldquoEfficient approximateenergy minimization via graph cutsrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 20 no 12 pp1222ndash1239 2001
[7] V Kolmogorov and R Zabih ldquoWhat energy functions can beminimized via graph cutsrdquo IEEE Transactions on Pattern Anal-ysis and Machine Intelligence vol 26 no 2 pp 147ndash159 2004
[8] Y Boykov and V Kolmogorov ldquoAn experimental comparisonof min-cutmax-flow algorithms for energy minimization invisionrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 26 no 9 pp 1124ndash1137 2004
[9] A Delong A Osokin H N Isack and Y Boykov ldquoFast approx-imate energy minimization with label costsrdquo in Proceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo10) pp 2173ndash2180 June 2010
[10] S Kumar and M Hebert ldquoDiscriminative fields for modelingspatial dependencies in natural imagesrdquo in Advances in NeuralInformation Processing Systems MIT Press 2003
[11] C H Lee M Schmidt A Murtha A Bistritz J Sander andR Greiner ldquoSegmenting brain tumors with conditional randomfields and support vector machinesrdquo in Computer Vision forBiomedical Image Applications vol 3765 of Lecture Notes inComputer Science pp 469ndash478 2005
[12] J Suzuki E McDermott and H Isozaki ldquoTraining conditionalrandom fields with multivariate evaluation measuresrdquo in Pro-ceedings of the 21st International Conference on ComputationalLinguistics and 44th Annual Meeting of the Association forComputational Linguistics (COLINGACL rsquo06) pp 217ndash224Association for Computational Linguistics Stroudsburg PaUSA July 2006
[13] X Ye G Beddoe and G Slabaugh ldquoAutomatic graph cut seg-mentation of lesions in CT using mean shift superpixelsrdquoInternational Journal of Biomedical Imaging vol 2010 ArticleID 983963 14 pages 2010
[14] B Zhao A P Reeves D F Yankelevitz and C I HenschkeldquoThree-dimensional multicriterion automatic segmentation ofpulmonary nodules of helical computed tomography imagesrdquoOptical Engineering vol 38 no 8 pp 1340ndash1347 1999
[15] T Hayashi X Zhou and H Fujita ldquoAutomated approach tomeasure pulmonary nodule volume based on radius and CTnumberrdquo in Proceedings of the 2nd International Workshop onPulmonary Image Analysis pp 377ndash388 2009
[16] N Xu N Ahuja and R Bansal ldquoAutomated lung nodule seg-mentation using dynamic programming and EM-based classi-ficationrdquo in Medical Imaging 2002 Image Processing M Sonkaand J M Fitzpatrick Eds vol 4684 of Proceedings of SPIE pp666ndash676 May 2002
[17] K Okada D Comaniciu and A Krishnan ldquoRobust anisotropicGaussian fitting for volumetric characterization of pulmonarynodules in multislice CTrdquo IEEE Transactions on Medical Imag-ing vol 24 no 2 pp 409ndash423 2005
[18] T Duindam B deHoop and B vanGinneken ldquoNodule volumechange estimation in thoracic CT using sphere fitting morpho-logical segmentation and image registrationrdquo in Proceedings ofthe 2nd International Workshop on Pulmonary Image Analysispp 389ndash400 2009
[19] A Reeves A Jirapatnakul A Biancardi et al ldquoThe volcanorsquo09 challenge preliminary resultsrdquo in Proceedings of the 2ndInternational Workshop on Pulmonary Image Analysis pp 353ndash364 2009
[20] Y Gao A Tannenbaum and R Kikinis ldquoSimultaneous multi-object segmentation using local robust statistics and contourinteractionrdquo in Proceedings of the 2010 International MICCAIConference on Medical Computer Vision Recognition Tech-niques and Applications in Medical Imaging (MCV rsquo10) pp195ndash203 Springer 2011 httpportalacmorgcitationcfmid=19644781964502
[21] J Dehmeshki H Amin M Valdivieso and X Ye ldquoSegmen-tation of pulmonary nodules in thoracic CT scans a regiongrowing approachrdquo IEEE Transactions on Medical Imaging vol27 no 4 pp 467ndash4480 2008
[22] R Opfer and R Wiemker ldquoA new general tumor segmentationframework based on radial basis function energy minimizationwith a validation study on LIDC lung nodulesrdquo in MedicalImaging 2007 Image Processing vol 6512 of Proceedings of SPIEFebruary 2007 article 651217
(a) Inferred segmentation
(b) Ground truth segmentation
(c) Unsegmented
Figure 2: Tumor 11, comparison of inferred segmentation versus the ground truth labeling, with the unsegmented subvolume for reference.
(a) Inferred segmentation
(b) Ground truth segmentation
(c) Unsegmented
Figure 3: Tumor 23, comparison of inferred segmentation versus the ground truth labeling, with the unsegmented subvolume for reference.
3. Results
3.1. Segmentation. The parameters were learned from the first nodules of the 4 given pairs of training nodules. Segmentations were then produced using graph cuts on the first nodules of the 50 pairs of test nodules. The mean precision was 0.92 and the mean recall was 0.89, not accounting for the size of the nodules. An example segmentation and the ground truth can be seen in Figures 2 and 3. When all 50 pairs (100 nodules) were evaluated, the mean precision was 0.91 and the mean recall was 0.89.
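The voxelwise precision and recall used here can be computed directly from binary masks. A minimal sketch (the function name and the toy 4×4×4 volumes are ours, not the study's evaluation code):

```python
import numpy as np

def precision_recall(seg, truth):
    """Voxelwise precision and recall of a binary segmentation
    against a binary ground truth mask."""
    seg = seg.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(seg, truth).sum()  # true positive voxels
    precision = tp / seg.sum()             # fraction of segmented voxels that are correct
    recall = tp / truth.sum()              # fraction of true voxels recovered
    return precision, recall

# Tiny 3D example: a 2x2x2 ground truth cube, segmentation shifted by one slice.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
seg = np.zeros_like(truth)
seg[1:3, 1:3, 0:2] = True
p, r = precision_recall(seg, truth)  # both 0.5: half the voxels overlap
```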
The segmented physical volumes were plotted against the ground truth physical volumes in Figure 4. An ordinary least squares fit was applied to the data, and the fit line closely approximates the expected fit line y = x. The correlation coefficient R = 0.99 and the P value P = 0.00. This shows that our method accurately estimates the volumes compared to ground truth and that there is no significant bias towards either a larger or a smaller segmentation.

Figure 4: Plot of segmented volume size versus ground truth volume size (both axes ×10^4). An ordinary least squares fit (y = 1.046x − 8935) is shown along with the expected fit y = x.

Figure 5: Histogram of errors relative to ground truth volume.
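The fit statistics reported above come from a standard ordinary-least-squares fit of segmented versus ground-truth volume. A sketch on made-up volume pairs (the paper's per-nodule volumes are not reproduced here); with exactly linear data the recovered slope, intercept, and correlation match the construction:

```python
import numpy as np

# Hypothetical (ground truth, segmented) volume pairs in voxels,
# constructed to lie exactly on a near-ideal line y = 1.05x - 100.
gt = np.array([5000.0, 8000.0, 12000.0, 15000.0, 20000.0])
seg = 1.05 * gt - 100.0

slope, intercept = np.polyfit(gt, seg, 1)  # ordinary least squares, degree 1
r = np.corrcoef(gt, seg)[0, 1]             # Pearson correlation coefficient
# slope ~ 1.05, intercept ~ -100, r ~ 1.0 for this exactly linear data
```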
The relative volume error compared to ground truth was calculated for each of the first 50 test examples. The maximum positive error was 0.33 and the maximum negative error was −0.31. A histogram of the relative errors is shown in Figure 5.
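The relative volume error above is the signed deviation of each segmented volume from its ground truth. A sketch, with illustrative volumes chosen to reproduce the reported extremes (not the study's actual data):

```python
import numpy as np

def relative_errors(seg_vols, gt_vols):
    """Per-case relative volume error (V_seg - V_gt) / V_gt."""
    seg_vols = np.asarray(seg_vols, dtype=float)
    gt_vols = np.asarray(gt_vols, dtype=float)
    return (seg_vols - gt_vols) / gt_vols

# Three hypothetical cases: +33% oversegmentation, -31% undersegmentation, exact.
errs = relative_errors([13300, 6900, 9000], [10000, 10000, 9000])
# errs -> [0.33, -0.31, 0.0]
```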
Figure 6: Histogram of precision and recall of the first 50 segmented examples.
A 2D histogram of the precisions and recalls is shown in Figure 6. Most examples had precisions and recalls within the 0.8 to 1.0 range.
As a comparison, test performance was evaluated against the Robust Statistical Segmentation (RSS) procedure implemented in Slicer. The RSS method uses a statistics-driven active contour model for segmentation [20]. Approximate volumes were specified using ground truth data, and boundary and intensity uniformity parameters were tuned by hand for each nodule until a satisfactory or best possible segmentation was achieved. Slicer RSS achieved a mean of 0.78 precision and 0.78 recall under these conditions. A histogram of the results can be seen in Figure 7. RSS is more inconsistent in its performance compared to our method. Some segmentations can be seen in Figures 8, 9, and 10, and a volume rendering can be seen in Figure 11. As a whole, our method performed better than RSS in Slicer, but in some individual cases, like Figure 10, RSS performed better. There are also examples in which both methods performed poorly. Tumor 30 is such an example, largely due to significant vascularization of the nodule and its juxtapleural position. A volume rendering comparison of Tumor 30 can be seen in Figure 12: RSS oversegmented the nodule significantly, while DRF also oversegmented the nodule but to a lesser extent. A slice-by-slice comparison can be seen in Figure 13.
The metric used to evaluate performance in the VOLCANO'09 Challenge is percent volume change, (V2 − V1)/V1, from the first sample volume of a pair (V1) to the second one (V2). In Figure 14, the percentage change for each testing pair was plotted against the percentage change from a participant [15] and against the percentage change of our ground truth. Because there was no previous ground truth percentage change established for the challenge, our ground truth does not reflect the desired results of the challenge.
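The challenge metric is a one-line computation; a sketch (the function name is ours, and the example volumes are illustrative):

```python
def percent_volume_change(v1, v2):
    """VOLCANO'09 metric: relative change from the first scan
    volume V1 of a pair to the second scan volume V2."""
    return (v2 - v1) / v1

# A nodule growing from 10000 to 12000 voxels shows a 0.2 (20%) change;
# shrinking to 8000 voxels shows a -0.2 (-20%) change.
growth = percent_volume_change(10000, 12000)
shrinkage = percent_volume_change(10000, 8000)
```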
4. Discussion
Due to the lack of widely available, dedicated lung nodule segmentation software, it is currently difficult to compare
Figure 7: Histogram of precision and recall of the first 50 Slicer RSS segmented examples.
our results with existing standards. In comparison with similar work, Ye et al. report a mean Dice's coefficient of 0.79 on 101 nodules [13]. Our Dice's coefficient (which is equivalent in definition to the F-score in this context) is 0.90, and the standard deviations of our F-scores were both around 0.06. We suspect that our superior performance despite simpler features can be explained by two factors: first, our discriminative model and training gave us a better energy function, and second, simpler metrics may prove to be more tolerant to error. Dehmeshki et al. did not do a voxelwise comparison but instead reported an "acceptability" metric of 0.84 as determined by radiologist examination [21]. Kostis et al. seemed to have achieved very good results, but they did not report explicit performance metrics comparing their results to ground truth [4]. Neither Zhao et al. [14] nor Xu et al. [16] reported data sets or performance metrics compared to ground truth. The comparison with Robust Statistical Segmentation in Slicer shows our performance against a state-of-the-art generalized segmentation tool, and our method on average performs better.
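The equivalence of Dice's coefficient and the F-score noted above follows directly from the definitions: F = 2PR/(P + R) with P = TP/|S| and R = TP/|T| simplifies to 2·TP/(|S| + |T|), which is Dice. A quick numerical check on random masks (a sketch, not the study's evaluation code):

```python
import numpy as np

rng = np.random.default_rng(0)
seg = rng.random(1000) > 0.5    # random binary "segmentation" mask
truth = rng.random(1000) > 0.5  # random binary "ground truth" mask

tp = np.logical_and(seg, truth).sum()
precision = tp / seg.sum()
recall = tp / truth.sum()
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
dice = 2 * tp / (seg.sum() + truth.sum())                # Dice's coefficient

# The two quantities are algebraically identical for binary masks.
```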
One must also be wary of placing too much trust in ground truth. Manual segmentations currently in use may differ significantly between users, as Opfer and Wiemker pointed out [22]. Without a better idea of the variation in acceptable segmentations, one runs the risk of overfitting. For a case like Tumor 30 (which was challenging for both our algorithm and other comparison algorithms), the nearby vasculature and pleura may affect the accuracy of manual segmentations as well.
Several groups participated in the VOLCANO'09 Challenge [15, 18, 19], but because the challenge was focused on evaluating volume change in longitudinal studies instead of measuring volume itself, only volume change metrics were reported. Volume change metrics from our results were comparable to the results from Hayashi et al. [15]. Because aggregate results for the VOLCANO Challenge were renumbered before reporting in Reeves et al. [19], we did not compare against their aggregate results. Given our established ground truth, however, we believe that precision and recall are a better measure of our performance in general.
Figure 8: Tumor 20, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, RSS overestimated the roundness and undersegmented the nodule; our method successfully segmented the bumps.
Figure 9: Tumor 40, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, both segmentation methods performed well.
Figure 10: Tumor 50, comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. This is an example in which our method oversegmented into the pleural wall while RSS did not.
A natural extension of this work would be to apply the same method to segmentation of other tumors in the body. The problem of segmentation in other anatomical areas has, of course, been studied; for example, Lee et al.'s work involved segmenting MRI data on brain tumors, with results implying a precision and recall of around 0.8 each [11].
The main advantage of the DRF learning framework is the automatic learning of energy function parameters for segmentation. Since all specific knowledge about the type of tumor we are looking for is learned automatically from the training examples, as opposed to being built into the algorithm, we can in theory train our model to work with types of tumors other than the lung nodules presented in this paper. In practice, lung nodules are generally easier to distinguish due to their high contrast to surrounding tissue, so applying the model to other tumors will likely produce worse results.
If the problem has been formulated properly, the theoretical optimum solution for the parameters should be the maximum likelihood solution to the DRF. Our investigation, however, found that the maximum likelihood solution favored oversegmentation, achieving a very high recall but with losses in precision. We thus decided to use a more practical approach and optimize directly on the metric we were using to evaluate the algorithm: the F-score, the harmonic mean of precision and recall. Our results give better
Figure 11: Tumor 20, comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume.
Figure 12: Tumor 30, comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume. Both segmentation methods performed poorly, but RSS vastly oversegmented the nodule compared to our method.
(a) Inferred segmentation
(b) RSS segmentation
(c) Ground truth segmentation
(d) Unsegmented
Figure 13: Tumor 30, comparison of inferred segmentation versus RSS, with ground truth and the unsegmented subvolume for reference.
recall with similar precision compared to the maximum pseudo-likelihood solution for the parameters. The difference is on the order of a few percentage points.
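A parameter search of the kind described in the abstract (simulated annealing driven by the F-score rather than the likelihood) can be sketched as a generic annealing loop. This is a toy illustration: the parameter vector, step size, cooling schedule, and the quadratic stand-in objective are our assumptions, not the authors' actual energy model or training data.

```python
import math
import random

def anneal(score_fn, theta0, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize score_fn (standing in for mean F-score over training
    nodules) by simulated annealing over a parameter vector."""
    rng = random.Random(seed)
    theta = list(theta0)
    s = score_fn(theta)
    best, s_best = list(theta), s
    t = t0
    for _ in range(steps):
        cand = [x + rng.gauss(0, 0.1) for x in theta]  # random perturbation
        s_cand = score_fn(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if s_cand > s or rng.random() < math.exp((s_cand - s) / t):
            theta, s = cand, s_cand
            if s > s_best:
                best, s_best = list(theta), s
        t *= cooling  # geometric cooling schedule
    return best, s_best

# Toy objective standing in for "mean F-score": peaked at theta = (1, -2).
objective = lambda th: -((th[0] - 1) ** 2 + (th[1] + 2) ** 2)
theta, score = anneal(objective, [0.0, 0.0])  # converges near the peak
```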
In practice, the inference step required to segment new nodules can be solved via fast polynomial-time algorithms using graph cuts. Using unoptimized Matlab code on a 3.3 GHz quad-core desktop with 8 GB RAM, this translated to sub-10-second segmentations for the volumes tested. With optimized compiled code this will likely be much faster.

Figure 14: Percent volume change (V2 − V1)/V1 by case number, versus Hayashi et al.'s reported percent volume change and our ground truth percent volume change.
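To illustrate why this inference step is fast, here is a toy version of the standard s-t graph construction solved exactly with a small Edmonds-Karp max-flow, on a 1D binary chain rather than a 3D volume. This is a sketch of the general technique, not the authors' implementation; the example unary costs are made up.

```python
from collections import deque, defaultdict

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a capacity dict cap[u][v]; mutates cap
    into the residual graph and returns (flow, s-side of the min cut)."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow, parent  # parent's keys are exactly the s-side nodes
        # Find the bottleneck capacity and push flow along the path.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck

def segment_chain(unary, lam):
    """Exact MAP labeling of a 1D binary MRF with energy
    E(y) = sum_i unary[i][y_i] + lam * sum_i [y_i != y_{i+1}],
    via an s-t min cut (nodes on the source side take label 1)."""
    n = len(unary)
    cap = defaultdict(lambda: defaultdict(int))
    for i in range(n):
        cap['s'][i] += unary[i][0]  # cut if pixel i takes label 0
        cap[i]['t'] += unary[i][1]  # cut if pixel i takes label 1
        if i + 1 < n:               # Potts pairwise term, both directions
            cap[i][i + 1] += lam
            cap[i + 1][i] += lam
    energy, s_side = max_flow(cap, 's', 't')
    labels = [1 if i in s_side else 0 for i in range(n)]
    return labels, energy

# Four pixels strongly prefer label 1; one noisy pixel weakly prefers 0.
# With lam = 2 the smoothness term flips the noisy pixel to 1 (energy 2).
unary = [[3, 0], [3, 0], [0, 2], [3, 0], [3, 0]]  # [cost(y=0), cost(y=1)]
labels, energy = segment_chain(unary, lam=2)
```

The min cut equals the minimum of the energy because every cut must pay exactly one unary term per pixel plus one pairwise term per label boundary.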
4.1. Conclusion. Our DRF semiautomatic segmentation produces results that are generally very accurate, with on average 90% precision and recall. This system can be used to facilitate lung nodule size tracking applications. Further work includes creating a clinical application in order to investigate the consistency and clinical applicability of such a system. Future work can also be done to extend the algorithm to different types of tumors, such as brain or liver. More consistency can be established with better radius estimation, which can be achieved through a better initial segmentation. Another possibility would be to try extending the robust ellipsoid fitting algorithm from Okada et al. [17] to three dimensions, allowing us to get a better estimate of nodule shape.
References
[1] J. Zhang, K. K. Ma, M. H. Er, and V. Chong, "Tumor segmentation from magnetic resonance imaging by learning via one-class support vector machine," in Proceedings of the International Workshop on Advanced Image Technology, pp. 207–211, 2004.
[2] Y. L. Huang, J. H. Chen, and W. C. Shen, "Diagnosis of hepatic tumors with texture analysis in nonenhanced computed tomography images," Academic Radiology, vol. 13, no. 6, pp. 713–720, 2006.
[3] H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark, "Markov random field for tumor detection in digital mammography," IEEE Transactions on Medical Imaging, vol. 14, no. 3, pp. 565–576, 1995.
[4] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images," IEEE Transactions on Medical Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.
[5] S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, Secaucus, NJ, USA, 2001.
[6] Y. Boykov, O. Veksler, and R. Zabih, "Efficient approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1222–1239, 2001.
[7] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
[8] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
[9] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2173–2180, June 2010.
[10] S. Kumar and M. Hebert, "Discriminative fields for modeling spatial dependencies in natural images," in Advances in Neural Information Processing Systems, MIT Press, 2003.
[11] C. H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, and R. Greiner, "Segmenting brain tumors with conditional random fields and support vector machines," in Computer Vision for Biomedical Image Applications, vol. 3765 of Lecture Notes in Computer Science, pp. 469–478, 2005.
[12] J. Suzuki, E. McDermott, and H. Isozaki, "Training conditional random fields with multivariate evaluation measures," in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL '06), pp. 217–224, Association for Computational Linguistics, Stroudsburg, Pa, USA, July 2006.
[13] X. Ye, G. Beddoe, and G. Slabaugh, "Automatic graph cut segmentation of lesions in CT using mean shift superpixels," International Journal of Biomedical Imaging, vol. 2010, Article ID 983963, 14 pages, 2010.
[14] B. Zhao, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional multicriterion automatic segmentation of pulmonary nodules of helical computed tomography images," Optical Engineering, vol. 38, no. 8, pp. 1340–1347, 1999.
[15] T. Hayashi, X. Zhou, and H. Fujita, "Automated approach to measure pulmonary nodule volume based on radius and CT number," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 377–388, 2009.
[16] N. Xu, N. Ahuja, and R. Bansal, "Automated lung nodule segmentation using dynamic programming and EM-based classification," in Medical Imaging 2002: Image Processing, M. Sonka and J. M. Fitzpatrick, Eds., vol. 4684 of Proceedings of SPIE, pp. 666–676, May 2002.
[17] K. Okada, D. Comaniciu, and A. Krishnan, "Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT," IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 409–423, 2005.
[18] T. Duindam, B. de Hoop, and B. van Ginneken, "Nodule volume change estimation in thoracic CT using sphere fitting, morphological segmentation and image registration," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 389–400, 2009.
[19] A. Reeves, A. Jirapatnakul, A. Biancardi et al., "The VOLCANO '09 challenge: preliminary results," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 353–364, 2009.
[20] Y. Gao, A. Tannenbaum, and R. Kikinis, "Simultaneous multi-object segmentation using local robust statistics and contour interaction," in Proceedings of the 2010 International MICCAI Conference on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging (MCV '10), pp. 195–203, Springer, 2011, http://portal.acm.org/citation.cfm?id=1964478.1964502.
[21] J. Dehmeshki, H. Amin, M. Valdivieso, and X. Ye, "Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach," IEEE Transactions on Medical Imaging, vol. 27, no. 4, pp. 467–480, 2008.
[22] R. Opfer and R. Wiemker, "A new general tumor segmentation framework based on radial basis function energy minimization with a validation study on LIDC lung nodules," in Medical Imaging 2007: Image Processing, vol. 6512 of Proceedings of SPIE, article 651217, February 2007.
6 Computational and Mathematical Methods in Medicine
1
02020404
06 060808
1 0246
Precision and recall histogram
Precision Recall
02020404
06 060808
Precision Recall
Figure 6 Histogram of precision and recall of first 50 segmentedexamples
A 2D histogram of the precisions and recalls is shown inFigure 6Most examples had precisions and recalls within the08 to 10 range
As a comparison test performance was compared to theRobust Statistical Segmentation procedure implemented inSlicerThe RSS method uses a statistics-driven active contourmodel for segmentation [20] Approximate volumes werespecified using ground truth data Boundary and intensityuniformity parameters were tuned by hand for each nod-ule until a satisfactory or best possible segmentation wasachieved Slicer RSS achieved a mean of 078 precision and078 recall under these conditions A histogram of the resultscan be seen in Figure 7 RSS is more inconsistent with itsperformance compared to our method Some segmentationscan be seen in Figures 8 9 and 10 and a volume renderingcan be seen in Figure 11 As a whole our method performedbetter than RSS used by Slicer but in some individual caseslike Figure 10 RSS performed better There are examples inwhich both methods performed poorly as well Tumor 30 issuch an example largely due to significant vascularization ofthe nodule and its juxtapleural position A volume renderingcomparison of Tumor 30 can be seen in Figure 12 RSSoversegmented the nodule significantly while DRF alsooversegmented the nodule to a lesser extent A slice-by-slicecomparison can be seen in Figure 13
The metric used to evaluate performance in the VOL-CANOrsquo09 Challenge is percent volume change (1198812minus1198811)1198811from the first sample volume of a pair (1198811) to the second one(1198812) In Figure 14 the percentage change for each testing pairwas plotted against the percentage change from a participant[15] and against the percentage change of our ground truthBecause there was no previous ground truth percentagechange established for the challenge our ground truth doesnot reflect the desired results of the challenge
4 Discussion
Due to the lack of widely available dedicated lung nodulesegmentation software currently it is difficult to compare
10123
Precision and recall histogram
02020404
06 0608
081
Precision Recall
Figure 7 Histogram of precision and recall of first 50 Slicer RSSsegmented examples
our results with existing standards In comparison withsimilar work Ye et al report a mean Dicersquos coefficient of079 on 101 nodules [13] Our Dicersquos coefficient (which is anequivalent definition to the F-score in this context) is 090The standard deviations of our F-scores were both around006 We suspect that our superior performance despitesimpler features can be explained by two factors first ourdiscriminative model and training gave us a better energyfunction and second simpler metrics may prove to be moretolerant to error Dehmeshki et al did not do a voxelwisecomparison but instead reported an ldquoacceptabilityrdquo metric of084 as determined by radiologist examination [21] Kostiset al seemed to have achieved very good results but theydid not report explicit performance metrics comparing theirresults to ground truth [4] Neither Zhao et al [14] orXu et al [16] reported data sets or performance metricscompared to ground truth The comparison with RobustStatistical Segmentation in Slicer shows our performanceagainst a state-of-the-art generalized segmentation tool andour method on average performs better
One must also be wary of placing too much trust in ground truth. Manual segmentations currently in use may differ significantly between users, as Opfer and Wiemker pointed out [22]. Without a better idea of the variation in acceptable segmentations, one runs the risk of overfitting. For a case like Tumor 30 (which was challenging for both our algorithm and other comparison algorithms), the nearby vasculature and pleura may affect the accuracy of manual segmentations as well.
Several groups participated in the VOLCANO'09 Challenge [15, 18, 19], but because the challenge focused on evaluating volume change in longitudinal studies instead of measuring volume itself, only volume change metrics were reported. Volume change metrics from our results were comparable to the results from Hayashi et al. [15]. Because aggregate results for the VOLCANO Challenge were renumbered before reporting in Reeves et al. [19], we did not compare with their aggregate results. Given our established ground truth, however, we believe that precision and recall are a better measure of our performance in general.
Computational and Mathematical Methods in Medicine 7
Figure 8: Tumor 20 comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, RSS overestimated the roundness and undersegmented the nodule. Our method successfully segmented the bumps.
Figure 9: Tumor 40 comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. In this example, both segmentation methods performed well.
Figure 10: Tumor 50 comparison of inferred segmentation versus RSS, ground truth, and the unsegmented subvolume for reference. This is an example in which our method oversegmented into the pleural wall while RSS did not.
A natural extension of this work would be to apply the same method to segmentation of other tumors in the body. The problem of segmentation in other anatomical areas has, of course, been studied; for example, Lee et al.'s work involved segmenting MRI data on brain tumors, with results implying that their precision and recall were each around 0.8 [11].
The main advantage of the DRF learning framework is the automatic learning of energy function parameters for segmentation. Since all specific knowledge about the type of tumor we are looking for is learned automatically from the training examples, as opposed to knowledge built into the algorithm, we can in theory train our model to work with types of tumors other than the lung nodules presented in this paper. In practice, lung nodules are generally easier to distinguish due to their high contrast with surrounding tissue, so applying the model to other tumors will likely produce worse results.
If the problem has been formulated properly, the theoretically optimal solution for the parameters should be the maximum likelihood solution to the DRF. Our investigation, however, found that the maximum likelihood solution favored oversegmentation, achieving a very high recall but with losses in precision. We thus decided to use a more practical approach and optimize directly based on the metric we were using to evaluate the algorithm: the F-score, the harmonic mean of precision and recall. Our results give better
Figure 11: Tumor 20 comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume.
Figure 12: Tumor 30 comparison of our segmented volume versus RSS and ground truth for reference. From left to right: our segmented volume, ground truth volume, and RSS segmented volume. Both segmentation methods performed poorly, but RSS vastly oversegmented the nodule compared to our method.
(a) Inferred segmentation
(b) RSS segmentation
(c) Ground truth segmentation
(d) Unsegmented
Figure 13: Tumor 30 comparison of inferred segmentation versus RSS, with ground truth and unsegmented subvolume for reference.
recall with similar precision compared to the maximum pseudo-likelihood solution for the parameters. The difference is on the order of a few percentage points.
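The size of that trade-off is easy to see numerically: a parameter setting that oversegments can reach near-perfect recall and still score below a balanced setting on the F-score. The numbers below are a hypothetical illustration, not our measured results:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical settings: oversegmentation buys recall but pays in precision.
oversegmenting = f_score(precision=0.70, recall=0.99)  # ~0.82
balanced = f_score(precision=0.90, recall=0.90)        # 0.90
```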
In practice, the inference step required to segment new nodules can be solved via fast polynomial-time algorithms using graph cuts. Using unoptimized Matlab code on a 3.3 GHz quad core desktop with 8 GB RAM, this translated
Figure 14: Volume change comparison by case number: our segmented volume change, Hayashi et al.'s reported volume change, and our ground truth volume change, each expressed as percent volume change (V2 − V1)/V1.
to sub-10-second segmentations for the volumes tested. With optimized, compiled code, this will likely be much faster.
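In this spirit, the sketch below illustrates min-cut inference on a toy problem: it segments a 1-D intensity signal by building the standard s/t graph for a binary MRF (absolute-difference unary terms toward assumed foreground/background means, plus a Potts smoothness term) and solving it with Edmonds-Karp max-flow. This is a didactic stand-in written in Python, not the paper's Matlab implementation or its learned DRF energy:

```python
from collections import deque

def maxflow_mincut(n_nodes, edges, s, t):
    """Edmonds-Karp max-flow; returns the set of nodes on the
    source side of the minimum cut."""
    cap = [[0.0] * n_nodes for _ in range(n_nodes)]
    adj = [[] for _ in range(n_nodes)]
    for u, v, c in edges:
        if cap[u][v] == 0 and cap[v][u] == 0:
            adj[u].append(v)
            adj[v].append(u)
        cap[u][v] += c
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n_nodes
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in adj[u]:
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        # Find the bottleneck capacity, then push flow along the path.
        f, v = float("inf"), t
        while v != s:
            f = min(f, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= f
            cap[v][u] += f
            v = u
    # Source side of the cut = nodes still reachable from s.
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen and cap[u][v] > 1e-12:
                seen.add(v)
                q.append(v)
    return seen

def segment(intensities, mu_fg, mu_bg, lam):
    """Binary MRF segmentation of a 1-D signal by min-cut."""
    n = len(intensities)
    s, t = n, n + 1
    edges = []
    for i, x in enumerate(intensities):
        edges.append((s, i, abs(x - mu_bg)))  # cut if i labeled background
        edges.append((i, t, abs(x - mu_fg)))  # cut if i labeled foreground
    for i in range(n - 1):                    # Potts term between neighbors
        edges.append((i, i + 1, lam))
        edges.append((i + 1, i, lam))
    src_side = maxflow_mincut(n + 2, edges, s, t)
    return [1 if i in src_side else 0 for i in range(n)]

print(segment([0.1, 0.2, 0.9, 0.8, 0.15],
              mu_fg=0.85, mu_bg=0.15, lam=0.2))  # [0, 0, 1, 1, 0]
```

The same construction extends to 3-D by using a 6-connected voxel neighborhood instead of the 1-D chain; the min-cut then exactly minimizes the submodular binary energy, which is why inference stays fast at segmentation time.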
4.1. Conclusion. Our DRF semiautomatic segmentation produces results that are generally very accurate, with on average 0.90 precision and recall. This system can be used to facilitate lung nodule size tracking applications. Further work includes creating a clinical application in order to investigate the consistency and clinical applicability of such a system. Future work can be done to extend the algorithm's performance to different types of tumors, such as brain or liver. More consistency can be established with better radius estimation, which can be achieved through a better initial segmentation. Another possibility would be to try extending the robust ellipsoid fitting algorithm from Okada et al. [17] to three dimensions, allowing us to get a better estimate of nodule shape.
References
[1] J. Zhang, K. K. Ma, M. H. Er, and V. Chong, "Tumor segmentation from magnetic resonance imaging by learning via one-class support vector machine," in Proceedings of the International Workshop on Advanced Image Technology, pp. 207–211, 2004.
[2] Y. L. Huang, J. H. Chen, and W. C. Shen, "Diagnosis of hepatic tumors with texture analysis in nonenhanced computed tomography images," Academic Radiology, vol. 13, no. 6, pp. 713–720, 2006.
[3] H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark, "Markov random field for tumor detection in digital mammography," IEEE Transactions on Medical Imaging, vol. 14, no. 3, pp. 565–576, 1995.
[4] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images," IEEE Transactions on Medical Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.
[5] S. Z. Li, Markov Random Field Modeling in Image Analysis, Springer, Secaucus, NJ, USA, 2001.
[6] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[7] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
[8] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
[9] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2173–2180, June 2010.
[10] S. Kumar and M. Hebert, "Discriminative fields for modeling spatial dependencies in natural images," in Advances in Neural Information Processing Systems, MIT Press, 2003.
[11] C. H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, and R. Greiner, "Segmenting brain tumors with conditional random fields and support vector machines," in Computer Vision for Biomedical Image Applications, vol. 3765 of Lecture Notes in Computer Science, pp. 469–478, 2005.
[12] J. Suzuki, E. McDermott, and H. Isozaki, "Training conditional random fields with multivariate evaluation measures," in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL '06), pp. 217–224, Association for Computational Linguistics, Stroudsburg, Pa, USA, July 2006.
[13] X. Ye, G. Beddoe, and G. Slabaugh, "Automatic graph cut segmentation of lesions in CT using mean shift superpixels," International Journal of Biomedical Imaging, vol. 2010, Article ID 983963, 14 pages, 2010.
[14] B. Zhao, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, "Three-dimensional multicriterion automatic segmentation of pulmonary nodules of helical computed tomography images," Optical Engineering, vol. 38, no. 8, pp. 1340–1347, 1999.
[15] T. Hayashi, X. Zhou, and H. Fujita, "Automated approach to measure pulmonary nodule volume based on radius and CT number," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 377–388, 2009.
[16] N. Xu, N. Ahuja, and R. Bansal, "Automated lung nodule segmentation using dynamic programming and EM-based classification," in Medical Imaging 2002: Image Processing, M. Sonka and J. M. Fitzpatrick, Eds., vol. 4684 of Proceedings of SPIE, pp. 666–676, May 2002.
[17] K. Okada, D. Comaniciu, and A. Krishnan, "Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT," IEEE Transactions on Medical Imaging, vol. 24, no. 2, pp. 409–423, 2005.
[18] T. Duindam, B. de Hoop, and B. van Ginneken, "Nodule volume change estimation in thoracic CT using sphere fitting, morphological segmentation and image registration," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 389–400, 2009.
[19] A. Reeves, A. Jirapatnakul, A. Biancardi et al., "The VOLCANO '09 challenge: preliminary results," in Proceedings of the 2nd International Workshop on Pulmonary Image Analysis, pp. 353–364, 2009.
[20] Y. Gao, A. Tannenbaum, and R. Kikinis, "Simultaneous multi-object segmentation using local robust statistics and contour interaction," in Proceedings of the 2010 International MICCAI Conference on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging (MCV '10), pp. 195–203, Springer, 2011, http://portal.acm.org/citation.cfm?id=1964478.1964502.
[21] J. Dehmeshki, H. Amin, M. Valdivieso, and X. Ye, "Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach," IEEE Transactions on Medical Imaging, vol. 27, no. 4, pp. 467–480, 2008.
[22] R. Opfer and R. Wiemker, "A new general tumor segmentation framework based on radial basis function energy minimization with a validation study on LIDC lung nodules," in Medical Imaging 2007: Image Processing, vol. 6512 of Proceedings of SPIE, February 2007, article 651217.