Deep Learning in Radiology: Promise and Caveats
John R. Zech, M.D., M.A., PGY-1 Prelim Medicine, CPMC


Transcript
Page 1:

Deep Learning in Radiology: Promise and Caveats

John R. Zech, M.D., M.A.
PGY-1 Prelim Medicine, CPMC

Page 2:

About Me: John Zech

Preliminary medicine intern, CPMC

Future radiology resident, Columbia

Studied machine learning at Columbia (M.A. Statistics)

Prior to medicine: developed quantitative models in investment management

Page 3:

The rise of the machines

Page 4:

The rise of the machines: digit recognition

LeCun et al., 1998

Page 5:

The rise of the machines: object recognition

Deng et al., 2009, Russakovsky et al., 2015

Page 6:

Segmentation CNNs

https://arxiv.org/pdf/1405.0312.pdf

Page 7:

Neural networks: biologically inspired

Hubel and Wiesel, 1962

https://youtu.be/IOHayh06LJ4

Page 8:

Neural networks: biologically inspired

Hubel and Wiesel, 1962 - https://youtu.be/IOHayh06LJ4


Page 9:

Neural networks: biologically inspired

Hubel and Wiesel, 1962

Page 10:

Neural networks: biologically inspired

http://cs231n.github.io/

Page 11:

Classification CNNs
Maps an image to a single classification

https://medium.com/@pechyonkin/key-deep-learning-architectures-lenet-5-6fc3c59e6f4
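To make "maps an image to a single classification" concrete, here is a minimal LeNet-style classification CNN sketched in PyTorch; the layer sizes are illustrative, not those of any specific model in these slides.

```python
import torch
import torch.nn as nn

# A minimal LeNet-style classification CNN: convolution/pooling layers
# extract features, and a fully connected head maps them to one score
# per category (10 classes here, as in digit recognition).
class SmallClassifierCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 1x32x32 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, num_classes),      # one score per class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One 32 x 32 grayscale image in, one vector of 10 class scores out.
logits = SmallClassifierCNN()(torch.randn(1, 1, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```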

Page 12:

Learned features are sometimes interpretable

[Figure: visualized features at lower, mid, and high levels]

Lee et al., 2009

Page 13:

CNNs are trained in an iterative process using stochastic gradient descent
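A schematic of that iterative process, with random tensors standing in for a real labeled dataset (a minimal sketch, not any particular paper's setup):

```python
import torch
import torch.nn as nn

# Schematic SGD training loop: repeatedly sample a mini-batch, compute
# the loss, backpropagate gradients, and nudge the weights downhill.
# A tiny linear model stands in for a CNN; the data is random.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                      # iterate over mini-batches
    images = torch.randn(16, 1, 32, 32)      # placeholder batch
    labels = torch.randint(0, 10, (16,))     # placeholder ground truth
    optimizer.zero_grad()                    # clear old gradients
    loss = loss_fn(model(images), labels)    # forward pass + loss
    loss.backward()                          # backward pass (gradients)
    optimizer.step()                         # gradient descent update
```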

Page 14:

CNNs have gotten more complex over 20 years

MNIST: 32 x 32 pixels

ImageNet: varies, typically 224 x 224 to 299 x 299

Page 15:

CNNs have gotten more complex over 20 years

Page 16:

CNNs have gotten more complex over 20 years

Page 17:

CNNs have gotten more complex over 20 years

https://software.intel.com/en-us/articles/hands-on-ai-part-16-modern-deep-neural-network-architectures-for-image-classification

Page 18:

CNNs are challenging to train, but...

You can start with a pre-trained model and ‘fine-tune’ it to your problem
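A minimal fine-tuning sketch with torchvision, assuming an ImageNet-pretrained ResNet-18 as the starting point (the model choice and the 2-class head are illustrative):

```python
import torch.nn as nn
from torchvision import models

# Fine-tuning sketch: start from an ImageNet-pretrained backbone,
# freeze the pretrained weights, and retrain a new task-specific head.
model = models.resnet18(pretrained=True)

for param in model.parameters():     # freeze pretrained weights
    param.requires_grad = False

# Replace the final layer with a fresh one for a 2-class problem;
# only this layer (and anything later unfrozen) will be trained.
model.fc = nn.Linear(model.fc.in_features, 2)
```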

Page 19:

The rise of the machines: two case studies in human-level clinical prediction

Page 20:

Esteva et al. (2017)

Page 21:

Esteva et al. (2017)

- Inception v3 model (299 x 299) pre-trained in another domain (ImageNet)

- Fine-tuned CNN with 129,450 clinical images
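A sketch of that setup using torchvision's Inception v3 (299 x 299 inputs). The class count and everything beyond the two bullets above are assumptions for illustration:

```python
import torch.nn as nn
from torchvision import models

# Sketch of the Esteva et al. setup: ImageNet-pretrained Inception v3
# with its classification head (and auxiliary head) swapped out for the
# clinical label set. Training details are not shown on the slide and
# are omitted here; num_classes is an illustrative placeholder.
num_classes = 757  # placeholder for the paper's fine-grained training classes

model = models.inception_v3(pretrained=True)  # pre-trained in another domain
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)
# The model is then fine-tuned on the 129,450 clinical images.
```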

Page 22:

Esteva et al. (2017)

Page 23:

Esteva et al. (2017)

All comparison happened on 1,942 held-out biopsy-proven images: strong ground truth

Page 24:

Esteva et al. (2017)

Compare to 21 dermatologists on:

1. Keratinocyte carcinomas vs benign seborrheic keratoses (most common skin cancer)

2. Malignant melanomas versus benign nevi (most deadly skin cancer)

Page 25:

Esteva et al. (2017)

Page 26:

What worked well in Esteva et al. (2017):

● Image resolution not a limitation
● Clinical information outside the image may have limited value
● Strong ground truth comparison: biopsy results

Page 27:

Rajpurkar et al. (2017)

Page 28:

Rajpurkar et al. (2017)

● Pre-trained DenseNet-121
○ 224 x 224 pixels

Page 29:

Rajpurkar et al. (2017)

● Pre-trained DenseNet-121
○ 224 x 224 pixels

● 112,120 NIH chest x-rays
○ 70% train, 10% tune, 20% test

Page 30:

Rajpurkar et al. (2017)

● Pre-trained DenseNet-121
○ 224 x 224 pixels

● 112,120 NIH chest x-rays
○ 70% train, 10% tune, 20% test

● 14 diagnoses, including pneumonia
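A CheXNet-style sketch of those three bullets: DenseNet-121 with a 14-way multi-label head, one sigmoid probability per diagnosis, since one x-ray can carry several findings at once (the loss choice is the standard one for this setup; details beyond the slide are assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

# CheXNet-style sketch: pre-trained DenseNet-121 with its 1000-class
# head replaced by a 14-output multi-label head.
model = models.densenet121(pretrained=True)
model.classifier = nn.Linear(model.classifier.in_features, 14)

# Sigmoid + binary cross-entropy per label: each diagnosis is an
# independent yes/no, not one of 14 mutually exclusive classes.
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(1, 3, 224, 224)   # one chest x-ray, resized to 224 x 224
probs = torch.sigmoid(model(x))   # 14 probabilities in [0, 1]
```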

Page 31:

Rajpurkar et al. (2017)

● 14 diagnoses, including pneumonia
● AUC for pneumonia: 0.7680

Page 32:

Rajpurkar et al. (2017)

● Human comparison: special 420 x-ray test set, labeled by 4 Stanford radiologists.

https://en.wikipedia.org/wiki/F1_score

Page 33:

Rajpurkar et al. (2017)

● Human comparison: special 420 x-ray test set, labeled by 4 Stanford radiologists.

https://en.wikipedia.org/wiki/F1_score
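For reference, F1 is the harmonic mean of precision and recall, computed from binary yes/no calls rather than probabilities, which is why it suits comparison against individual radiologists. A tiny worked example with made-up counts:

```python
# F1 = harmonic mean of precision and recall, from binary calls.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)   # of the positive calls, how many were right
    recall = tp / (tp + fn)      # of the true positives, how many were found
    return 2 * precision * recall / (precision + recall)

# e.g., 30 true positives, 10 false positives, 20 false negatives:
print(f1_score(tp=30, fp=10, fn=20))  # precision 0.75, recall 0.60 -> F1 0.667
```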

Page 34:

Human-level performance?

● AUC of Rajpurkar et al. (2017) for pneumonia: 0.7680
○ By comparison, AUC of Esteva et al. (2017): 0.91-0.96

● Why did Rajpurkar et al. (2017) compare using 4 radiologists and an F1 score?
○ Low radiologist agreement

Page 35:

Reproduce-CheXNet: Zech (2018)

https://github.com/jrzech/reproduce-chexnet

Page 36:
Page 37:

Reproduce-CheXNet: Zech (2018)

Page 38:

Reproduce-CheXNet: Zech (2018)

Pneumonia AUC similar to Rajpurkar et al. (2017): 0.7680 vs. 0.7651

Page 39:

Will CheXNet generalize?

Page 40:

Confounders in Radiology: Zech et al. 2018

Page 41:

Confounders in Radiology: Zech et al. 2018

● How well does CheXNet generalize?
● Trained CheXNet using data from:
○ NIH
○ Mount Sinai
○ Indiana University
● Trained / tested using different combinations of data sources

Page 42:

Building the Mount Sinai dataset: Zech et al. 2018

● Exported and preprocessed 48,915 DICOM files from Mount Sinai PACS

● Used NLP to automatically infer labels

Page 43:

Building the Mount Sinai dataset: Zech et al. 2018

Page 44:

Building the Mount Sinai dataset: Zech et al. 2018

Page 45:

Building the Mount Sinai dataset: Zech et al. 2018

“Without evidence of pneumonia”
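A toy sketch of such a labeler, including a negation rule that handles phrases like the one above; the actual pipeline was more sophisticated than this:

```python
import re

# Toy keyword-based label inference from report text: a mention of
# "pneumonia" counts as positive unless it sits inside a negation
# phrase like "without evidence of pneumonia".
NEGATION = re.compile(
    r"\b(no|without|negative for)\s+(\w+\s+){0,3}pneumonia", re.IGNORECASE
)

def infer_pneumonia_label(report: str) -> int:
    if NEGATION.search(report):
        return 0                                  # explicitly negated
    return int("pneumonia" in report.lower())     # plain mention -> positive

print(infer_pneumonia_label("Without evidence of pneumonia."))  # 0
print(infer_pneumonia_label("Right lower lobe pneumonia."))     # 1
```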

Page 46:

Building the Mount Sinai dataset: Zech et al. 2018

● These are imperfect labels
○ ~90% sensitivity, specificity

Page 47:

Building the Mount Sinai dataset: Zech et al. 2018

● These are imperfect labels
○ ~90% sensitivity, specificity

● What could introduce biases into these labels?

Page 48:

Building the Mount Sinai dataset: Zech et al. 2018

● These are imperfect labels
○ ~90% sensitivity, specificity

● What could introduce biases into these labels?
○ Radiologist thresholds for calling pathology
○ Institutional templates
○ Clinical scenario (e.g., ICU films for line placement)

Page 49:

Confounders in Radiology: Zech et al. 2018

Page 50:

Confounders in Radiology: Zech et al. 2018

Page 51:

Confounders in Radiology: Zech et al. 2018

Page 52:

Confounders in Radiology: Zech et al. 2018

Why better performance on the joint NIH + Mount Sinai dataset?

Page 53:

Confounders in Radiology: Zech et al. 2018

Why better performance on the joint NIH + Mount Sinai dataset?

It’s learning to detect site: Mount Sinai has a much higher pneumonia rate
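A hypothetical simulation of that effect, with prevalences assumed purely for illustration: a "classifier" that only detects the site and outputs that site's base rate, never looking at the lungs, still scores well above chance on the pooled test set:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Two sites with very different (assumed) pneumonia prevalences.
rng = np.random.default_rng(0)
n = 10_000
prev_sinai, prev_nih = 0.30, 0.01   # illustrative, not the published rates

y_sinai = rng.binomial(1, prev_sinai, n)   # ground-truth labels per site
y_nih = rng.binomial(1, prev_nih, n)

# The "classifier" outputs only the site's base rate: pure site detection.
labels = np.concatenate([y_sinai, y_nih])
scores = np.concatenate([np.full(n, prev_sinai), np.full(n, prev_nih)])

print(roc_auc_score(labels, scores))  # well above 0.5, despite ignoring the image
```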

Page 54:

Confounders in Radiology: Zech et al. 2018

● The CNN learned something very useful for making predictions, but not clinically helpful.

Page 55:

Confounders in Radiology: Zech et al. 2018

● The CNN learned something very useful for making predictions, but not clinically helpful.

● CNNs are hard to interpret: >6 million parameters

Page 56:

CNNs can detect hospital system:

can they detect department within a hospital?

Page 57:

CNNs can detect hospital system:

can they detect department within a hospital?

Yes.

At Mount Sinai, CNNs could detect which department (inpatient vs. ED) a portable x-ray came from with near-perfect accuracy

We don’t have metadata for NIH, but...

Page 58:
Page 59:

Confounders in Radiology: Zech et al. 2018

Page 60:

Confounders in Radiology: Zech et al. 2018

Page 61:

Confounders in Radiology: Zech et al. 2018

Page 62:

Confounders in Radiology: Zech et al. 2018
● CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions.

Page 63:

Confounders in Radiology: Zech et al. 2018
● CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions.
● Scanner type (especially portable vs. regular PA/lateral) is easily exploited.

Page 64:

Confounders in Radiology: Zech et al. 2018
● CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions.
● Scanner type (especially portable vs. regular PA/lateral) is easily exploited.
● Whole-image, low-res classification is especially vulnerable.

Page 65:

Confounders in Radiology: Zech et al. 2018
● CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions.
● Scanner type (especially portable vs. regular PA/lateral) is easily exploited.
● Whole-image, low-res classification is especially vulnerable.

Page 66:

Confounders in Radiology: Zech et al. 2018
● CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions.
● Scanner type (especially portable vs. regular PA/lateral) is easily exploited.
● Whole-image, low-res classification is especially vulnerable.

[Figure: the same image at 32 x 32 vs. 224 x 224 resolution]

Page 67:

Rajpurkar et al. (2017)

● If the algorithm and the radiologists are given different tasks, is the comparison fair?
○ Algorithm: use all information, including metadata implied by images, to optimize predictions
○ Radiologist: identify disease-specific findings
● What does the ‘pneumonia’ label mean?
○ Remarkably low agreement among radiologists
○ Low accuracy of CNN
○ Imaging findings are REQUIRED for the diagnosis → raises questions given low inter-rater agreement

Page 68:

How do we move forward from weakly-supervised ImageNet-based transfer learning?

Page 69:

How do we move forward from weakly-supervised ImageNet-based transfer learning?

Domain-adapted approaches that use segmentation

Page 70:

Domain-adapted CNN


Page 71:

Domain-adapted CNN


Page 72:

Domain-adapted CNN


https://arxiv.org/pdf/1709.07330.pdf

Page 73:

Classification CNNs
Maps an image to a single classification

https://medium.com/@pechyonkin/key-deep-learning-architectures-lenet-5-6fc3c59e6f4

Page 74:

Segmentation CNNs

https://arxiv.org/pdf/1405.0312.pdf

Page 75:

Segmentation: U-Net

https://arxiv.org/pdf/1505.04597.pdf
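A much-reduced U-Net-style sketch showing the three ingredients: a contracting path, an expanding path, and a skip connection carrying fine spatial detail across (the original network is far deeper):

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = conv_block(1, 16)
        self.down2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.up1 = conv_block(32, 16)       # 32 = 16 upsampled + 16 skip
        self.head = nn.Conv2d(16, 1, 1)     # one segmentation logit per pixel

    def forward(self, x):
        d1 = self.down1(x)                  # full-resolution features
        d2 = self.down2(self.pool(d1))      # half-resolution features
        u1 = self.up(d2)                    # back to full resolution
        u1 = self.up1(torch.cat([u1, d1], dim=1))  # concat skip connection
        return self.head(u1)

mask_logits = TinyUNet()(torch.randn(1, 1, 128, 128))
print(mask_logits.shape)  # torch.Size([1, 1, 128, 128])
```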

Page 76:

Segmentation: U-Net

https://arxiv.org/pdf/1709.07330.pdf

Page 77:

Domain-adapted CNN: Gale et al. (2017)

Page 78:

Domain-adapted CNN: Gale et al. (2017)

● Broad training dataset: 53,278 pelvis x-rays from Royal Adelaide Hospital

● Test set: only ED films

Page 79:

Domain-adapted CNN: Gale et al. (2017)

Four CNNs Used:

1. Filter for frontal x-rays
2. Locate head of femur: 1024 x 1024 pixels
3. Exclude films with metal implants
4. Customized DenseNet

Page 80:

Domain-adapted CNN: Gale et al. (2017)

Customized DenseNet

● 1024 x 1024 receptive field
● 1,434,176 parameters
● Two loss functions:
○ Fracture / no fracture
○ Location: intra-capsular, extra-capsular, or no fracture
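A sketch of the two-loss idea: one shared backbone feeding two heads, with the training loss summing a fracture term and a location term. The backbone and sizes are placeholders, not Gale et al.'s actual customized DenseNet:

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.fracture_head = nn.Linear(feat_dim, 2)  # fracture / no fracture
        self.location_head = nn.Linear(feat_dim, 3)  # intra-, extra-capsular, none

    def forward(self, x):
        feats = self.backbone(x)
        return self.fracture_head(feats), self.location_head(feats)

# Placeholder backbone; Gale et al. used a customized DenseNet instead.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
model = TwoHeadNet(backbone, feat_dim=128)

x = torch.randn(4, 1, 64, 64)
frac_target = torch.randint(0, 2, (4,))
loc_target = torch.randint(0, 3, (4,))

frac_logits, loc_logits = model(x)
loss = nn.functional.cross_entropy(frac_logits, frac_target) + \
       nn.functional.cross_entropy(loc_logits, loc_target)
loss.backward()  # gradients from both losses flow into the shared backbone
```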

Page 81:

Domain-adapted CNN: Gale et al. (2017)

Page 82:

Domain-adapted CNN: Gale et al. (2017)

● Careful data cleaning to avoid confounding variables
○ Normalization
○ No metal
● Chose a test set reflecting a real clinical use scenario: ED
● Followed a radiologist’s process:
○ Zooming in on the femur
○ Maintaining high resolution

Page 83:

Domain-adapted CNN: Chang et al. (2018)

● Identified hemorrhage on 10,159 head CTs
● Used a segmentation-based approach
● Results in a challenging ED environment, in true forward out-of-sample testing:
○ 0.989 AUC
○ 97.2% accuracy
○ 0.951 sensitivity
○ 0.973 specificity
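For reference, how those four reported numbers relate to raw model outputs, shown here on made-up data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# AUC scores the ranking of predicted probabilities; accuracy,
# sensitivity, and specificity come from thresholded calls.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])                    # ground truth
y_prob = np.array([0.95, 0.90, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05])
y_pred = (y_prob >= 0.5).astype(int)                           # thresholded calls

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

print("AUC:        ", roc_auc_score(y_true, y_prob))
print("accuracy:   ", (tp + tn) / len(y_true))
print("sensitivity:", tp / (tp + fn))   # recall on positives
print("specificity:", tn / (tn + fp))   # recall on negatives
```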

Page 84:

Domain-adapted CNN: Chang et al. (2018)

Page 85:

Domain-adapted CNN: Chang et al. (2018)

Page 86:

Stronger approach and results, but needs generalization testing on new sites

Page 87:

Recht et al. (2018)

CIFAR-10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks

Page 88:

“Current accuracy numbers are brittle and susceptible to even minute natural variations in the data distribution.”

Recht et al. (2018)

Page 89:

How will we use deep learning in radiology?

Page 90:

How will we use deep learning in radiology?

● Can perform well at well-specified, clearly-designed imaging tasks: fracture, hemorrhage detection

Page 91:

How will we use deep learning in radiology?

● Can perform well at well-specified, clearly-designed imaging tasks: fracture, hemorrhage detection
○ but must be carefully designed

Page 92:

How will we use deep learning in radiology?

● Can perform well at well-specified, clearly-designed imaging tasks: fracture, hemorrhage detection
○ but must be carefully designed

● Could flag important information that affects interpretation, e.g., structured EHR data, text of physician notes

Page 93:

Are they truly ‘artificially intelligent’?

Page 94:

Or a (really intriguing) statistical model?

Page 95:

Figure courtesy Marcus Badgeley

How will we combine this new information with our prior beliefs?
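One hedged way to frame this question: treat the CNN as a diagnostic test with known sensitivity and specificity, and update the clinical pretest probability with Bayes' rule (all numbers illustrative):

```python
# Bayesian update: posterior odds = prior odds x likelihood ratio.
def posterior_prob(pretest: float, sens: float, spec: float) -> float:
    prior_odds = pretest / (1 - pretest)
    lr_positive = sens / (1 - spec)      # likelihood ratio of a positive call
    post_odds = prior_odds * lr_positive
    return post_odds / (1 + post_odds)

# A positive call from a 95%-sensitive, 90%-specific model moves a
# 10% pretest probability to about 51%:
print(posterior_prob(pretest=0.10, sens=0.95, spec=0.90))
```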

Page 96:

Takeaways
● What a convolutional neural network does

Page 97:

Takeaways
● What a convolutional neural network does

● Early promising results in dermatology

Page 98:

Takeaways
● What a convolutional neural network does

● Early promising results in dermatology

● Now used for weakly-supervised diagnosis in radiology, but CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions

Page 99:

Takeaways
● What a convolutional neural network does

● Early promising results in dermatology

● Now used for weakly-supervised diagnosis in radiology, but CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions

● Domain-adapted approaches are promising, but generalization performance needs assessment

Page 100:

Takeaways
● What a convolutional neural network does

● Early promising results in dermatology

● Now used for weakly-supervised diagnosis in radiology, but CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions

● Domain-adapted approaches are promising, but generalization performance needs assessment

● And how to put it all together?

Page 101:

Thank you!

...and everyone else who contributed to these projects!