1
R2 ImageChecker CT CAD PMA: Clinical Results
Nicholas Petrick, Ph.D.
Office of Science and Technology
Center for Devices and Radiological Health
U.S. Food and Drug Administration
2
Outline
• Applicability of Az in analysis
  • Az is the same as the area under the ROC curve (AUC)
• Pool of CT cases for clinical study
• Defining actionable nodules by panel of experts
• Clinical studies
  • Primary analysis: analysis using fixed expert panel
  • Secondary analysis: analysis using random panels of experts
• Measurement of CAD standalone performance
  • Algorithm’s performance with no reader involvement
3
Applicability of Az in analysis
• Average reader ROC curves (pre/post CAD)
[Figure: average reader ROC curves, TPP versus FPP, both axes from 0.0 to 1.0; curves shown: Pre-CAD ROC and Post-CAD ROC]
4
Applicability of Az in analysis
• Pre- and post-CAD curves do not cross
  • No substantial pre/post-CAD crossing in either averaged or individual ROC curves
• Az is an appropriate performance measure
  • Az used as figure of merit in all analyses
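To make the figure of merit concrete, here is a minimal sketch of an area-under-the-ROC-curve calculation. It uses the empirical (Mann-Whitney) area as a stand-in; Az is often reported as the area under a fitted ROC curve, and the slides do not say which estimator was used. The function name and the ratings are hypothetical, not study data.

```python
def empirical_auc(scores_neg, scores_pos):
    """Empirical ROC area: probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical reader confidence ratings (illustrative only, not study data).
ratings_without_nodule = [1, 2, 2, 3]
ratings_with_nodule = [2, 3, 4, 4]
auc = empirical_auc(ratings_without_nodule, ratings_with_nodule)
```

A single area summarizes a reader fairly only when the curves being compared do not cross, which is the condition checked on this slide.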
5
Pool of CT Cases
• Nodule cases
  • Documented cancers
    • Primary neoplasm or extrathoracic neoplasm with presumptive spread to lungs
  • Cases were allowed to contain non-nodule pathologic processes (e.g., pneumonia, emphysema)
• Non-nodule cases
  • Normal cases
    • No nodule deemed present by site P.I.
    • Primarily relied upon original radiology report
  • History of cancer, radiation therapy, or even previous thoracotomy allowed
6
Defining Actionable Nodules by Panel of Experts
• ‘Actionable’ nodules are objects of interest
• Panel of expert radiologists identify actionable nodules
• Nodules defined using a 2-pass process
7
Defining Actionable Nodules by Panel of Experts
• 1st reading of CT cases
  • Cases read independently and blinded by 3 expert radiologists
  • Radiologists were provided the subject’s age, gender, and indication for exam
  • Marked all findings deemed lung nodules
  • Radiologist provided rating
    • Intervention: Actionable, further workup advised
    • Surveillance: Actionable, monitor with follow-up studies
    • Probably Benign, calcified: no action required
    • Probably Benign, non-calcified: no action required
8
Defining Actionable Nodules by Panel of Experts
• 2nd pass
  • Findings that lacked 100% consensus after 1st pass were reviewed unblinded by all 3 radiologists
    • Locations that 2/3 or 1/3 radiologists called a nodule were reevaluated
  • Radiologists rated (or re-rated) the actionability of the nodule candidates
• Thresholds applied to all findings
  • >4 mm diameter
  • > -100 HU maximum density
• Each lung quadrant categorized by the highest actionable finding within quadrant
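The threshold and quadrant rules above can be sketched as follows. This is an illustration under stated assumptions: the field names, the numeric ranking of the ratings, and the example findings are all hypothetical; the slides give only the two thresholds and the "highest finding per quadrant" rule.

```python
# Assumed ordering of ratings, highest first (hypothetical numeric scale).
RATING_RANK = {"intervention": 3, "surveillance": 2, "probably_benign": 1}


def passes_thresholds(finding):
    """Keep findings with diameter > 4 mm and maximum density > -100 HU."""
    return finding["diameter_mm"] > 4.0 and finding["max_density_hu"] > -100.0


def categorize_quadrants(findings):
    """Map each quadrant to the highest-ranked rating among its kept findings."""
    result = {}
    for f in findings:
        if not passes_thresholds(f):
            continue
        q = f["quadrant"]
        if q not in result or RATING_RANK[f["rating"]] > RATING_RANK[result[q]]:
            result[q] = f["rating"]
    return result


# Hypothetical findings for one case (illustrative only).
example_findings = [
    {"quadrant": "RU", "diameter_mm": 6.0, "max_density_hu": 20.0, "rating": "surveillance"},
    {"quadrant": "RU", "diameter_mm": 9.0, "max_density_hu": 35.0, "rating": "intervention"},
    {"quadrant": "LL", "diameter_mm": 3.0, "max_density_hu": 40.0, "rating": "intervention"},
    {"quadrant": "LU", "diameter_mm": 5.0, "max_density_hu": -300.0, "rating": "surveillance"},
]
quadrant_labels = categorize_quadrants(example_findings)
```

Quadrants with no finding that passes both thresholds are simply absent from the result in this sketch.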
9
Defining Actionable Nodules by Panel of Experts
Disposition    Unanimous Actionable (3/3)    Majority Actionable (2/3)    Minority Actionable (1/3)
Sample Size    142                           168                          149
• 3 experts per panel
10
Clinical Studies
• ROC observer study
  • Az is test statistic
  • Analysis of a 90-case dataset (360 quadrants)
• Confidence intervals and significance testing
  • ANOVA-after-jackknife
  • Bootstrap analysis
11
Clinical Studies Analysis Flowchart
[Flowchart boxes: Pool of Cases; Pool of Experts; Definition of Nodules; Pool of Readers; MRMC ROC Observer Study; Resampling Scheme (Jackknife or Bootstrap); Az Estimates]
12
ANOVA-after-Jackknife Analysis
• Parametric analysis
  • Leave-one-case-out (all 4 quadrants, quadrant-based analysis)
  • Analysis assumes modality as a fixed effect and readers, cases and all interactions as random effects
• Example
  • Set: [1 2 3], Partitions: [1 2], [1 3], [2 3]
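The leave-one-out example can be sketched in code as below. The pseudovalue step is the standard jackknife construction and is included as an assumption about what feeds the ANOVA; the slides do not spell it out, and the mixed-effects ANOVA itself is not shown. Function names are hypothetical.

```python
def leave_one_out(items):
    """Return every subset obtained by dropping exactly one item."""
    return [items[:i] + items[i + 1:] for i in range(len(items))]


def jackknife_pseudovalues(items, statistic):
    """Standard jackknife pseudovalues: n*theta_all - (n-1)*theta_without_i."""
    n = len(items)
    full = statistic(items)
    return [n * full - (n - 1) * statistic(sub) for sub in leave_one_out(items)]


def mean(values):
    return sum(values) / len(values)


partitions = leave_one_out([1, 2, 3])
pseudovalues = jackknife_pseudovalues([1, 2, 3], mean)
```

For the set [1 2 3] this produces the same three subsets listed on the slide, in a different order. In the study the dropped unit is a whole case (all 4 quadrants), and the statistic would be Az rather than a mean.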
13
Bootstrap Analysis
• Nonparametric analysis
  • Randomly generated datasets, drawn from the original data with replacement
• Example
  • Set: [1 2 3], Bootstrap samples: [3 2 3], [3 1 2], [1 1 2], …
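A minimal sketch of resampling with replacement follows. The percentile interval is one common way to turn bootstrap replicates into confidence limits and is an assumption here; the slides do not state how the limits were derived. Function names and the seed are hypothetical.

```python
import random


def bootstrap_samples(items, n_boot, seed=0):
    """Draw n_boot resamples, each the same size as items, with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(items) for _ in items] for _ in range(n_boot)]


def percentile_interval(values, alpha=0.05):
    """Approximate (alpha/2, 1 - alpha/2) percentile limits of the replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lower = ordered[int((alpha / 2) * last)]
    upper = ordered[int((1 - alpha / 2) * last)]
    return lower, upper


samples = bootstrap_samples([1, 2, 3], n_boot=200)
replicate_means = [sum(s) / len(s) for s in samples]
lower_limit, upper_limit = percentile_interval(replicate_means)
```

Unlike the jackknife subsets, a bootstrap sample can repeat an item and omit others, as in [3 2 3] on the slide.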
14
Clinical Studies Primary Analysis
[Flowchart boxes: Pool of Cases; Pool of Experts; Definition of Nodules; Pool of Readers; MRMC ROC Observer Study; Resampling Scheme (Jackknife or Bootstrap); Az Estimates]
• Fixed 3-member nodule definition panels (unanimous consensus)
• ANOVA-after-jackknife and bootstrap analysis
15
Clinical Studies Primary Analysis
• Fixed 3-member nodule definition panels
Variance Analysis    Pre-CAD Az    Post-CAD Az    ΔAz      p-value    Lower C.L.    Upper C.L.
Jackknife            0.881         0.905          0.024    0.003      0.008         0.040
Bootstrap            0.879         0.903          0.025    <0.001     0.009         0.045
16
Clinical Studies Primary Analysis
• Statistically significant improvement in Az pre- to post-CAD
  • ΔAz ~ 0.025
  • ANOVA-after-jackknife and bootstrap analyses are consistent
• Analysis limited because it did not take into account any variation in the expert panel
  • Variability of panel would add uncertainty to performance estimates
  • How would performance change with a different panel makeup?
    • Different number of panel members
    • Different set of experts
17
Clinical Studies Secondary Analysis
[Flowchart boxes: Pool of Cases; Pool of Experts; Definition of Nodules; Pool of Readers; MRMC ROC Observer Study; Resampling Scheme (Bootstrap); Az Estimates]
• Random 3-, 2-, and 1-member nodule definition panels (unanimous consensus)
• Only bootstrap analysis possible
18
Clinical Studies Secondary Analysis
• Bootstrap analysis
• Random 3-member nodule definition panels

Random Panel Size    Pre-CAD Az    Post-CAD Az    ΔAz      p-value    Lower C.L.    Upper C.L.
3-members            0.845         0.868          0.022    <0.001     0.008         0.040
2-members            0.832         0.854          0.022    0.002      0.008         0.039
1-member             0.817         0.838          0.021    <0.001     0.008         0.037
19
Clinical Studies Secondary Analysis
• Sponsor's analysis takes into account random nature of expert panel for defining ‘actionable’ nodules
  • Different number of panel members: 3-, 2-, and 1-member panels
  • Different panel makeup: bootstrap selection of panel
• All variations of panel makeup confirm a statistically significant improvement in Az from pre- to post-CAD
  • ΔAz ~ 0.02
• Likely to be a more appropriate analysis for assessment of devices when only panel truth is available
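The idea of a randomly drawn truth panel can be sketched as below. Drawing panel members with replacement is an assumption based on the word "bootstrap"; the slides do not describe the selection mechanics. Expert labels, finding identifiers, and function names are hypothetical.

```python
import random


def random_panel(experts, size, rng):
    """Draw a panel of the given size with replacement (assumed bootstrap draw)."""
    return rng.choices(experts, k=size)


def unanimous_actionable(calls, panel):
    """Findings rated actionable by every member of the panel."""
    return set.intersection(*[calls[expert] for expert in panel])


# Hypothetical actionable calls per expert, keyed by finding id (not study data).
calls = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3, 4}}

rng = random.Random(0)
panel = random_panel(list(calls), size=3, rng=rng)
truth = unanimous_actionable(calls, panel)
```

Each bootstrap iteration would draw a new panel, rebuild the truth set, and recompute Az, so panel variability is carried into the confidence limits.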
20
CAD Standalone Performance
• Performance of the CAD algorithm alone
  • Algorithm sensitivity and specificity (no reader involvement)
• Standalone CAD performance is important
  • Radiologist needs this information to appropriately weight their confidence in the CAD markings
  • Benchmark for future revisions to the algorithm
• What is an appropriate performance measure for this device?
21
CAD Standalone Performance
• Many of 142 findings (fixed 3-member panel) did not meet criteria as a solid, discrete, spherical density
• Second panel reevaluated nodules for appearance
  • 5 independent radiologists
  • 2 categories
    • Classic nodule: discrete, solid, spherical or ovoid
    • Non-classic:
      • Not discrete
      • Hyperdense
      • Irregularly shaped
      • Normal structure
      • Not a nodule
22
CAD Standalone Performance
No. Panelists defining as classic    No. of Findings    CAD TPF (%)    TP Median Diameter (mm)
<3/5                                 65                 32.3           7.6-9.0
3/5                                  13                 69.2           7.4
4/5                                  11                 81.8           11.2
5/5                                  53                 83.0           6.9
All                                  142                58.5           7.9
≥3/5                                 77                 80.5           6.9-11.2
CAD false marker rate: ~3 per case
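The pooled rows of the table can be cross-checked from the per-stratum rows with a short calculation. The true-positive counts below are not given on the slide; they are inferred by rounding findings times TPF, which is an assumption. Names are hypothetical.

```python
# (number of findings, CAD TPF in percent) for the <3/5, 3/5, 4/5, and 5/5 rows.
strata = [(65, 32.3), (13, 69.2), (11, 81.8), (53, 83.0)]


def implied_true_positives(n_findings, tpf_percent):
    """Infer the detected count by rounding (assumed, not reported on the slide)."""
    return round(n_findings * tpf_percent / 100)


def pooled_tpf(rows):
    """Pooled sensitivity in percent across the given strata."""
    detected = sum(implied_true_positives(n, tpf) for n, tpf in rows)
    total = sum(n for n, _ in rows)
    return 100 * detected / total


tpf_all = round(pooled_tpf(strata), 1)           # all 142 findings
tpf_classic = round(pooled_tpf(strata[1:]), 1)   # findings called classic by >=3/5
```

Under that rounding assumption the pooled values agree with the "All" and "≥3/5" rows of the table.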
23
CAD Standalone Performance
• Large variation in CAD performance depending on physicians' assessment of nodule appearance as “classic”
24
Summary
• Az appropriate test statistic for clinical analysis
  • No substantial crossing of pre/post-CAD ROC curves
• Primary analysis
  • Nodule definition panel
    • Fixed 3-member expert panel
  • Shows statistically significant Az improvement in detection with CAD
  • ANOVA-after-jackknife and bootstrap are comparable
25
Summary
• Secondary analysis
  • Nodule definition panel
    • Varied number of panel members
    • Varied the panel makeup (bootstrap selection of panel members)
  • Confirmed statistically significant Az improvement in detection with CAD
• Standalone performance
  • Large variation in CAD performance based on reassessment of nodule appearance
  • Necessary for appropriate utilization of the device by clinicians in the field and assessment of future algorithm revisions