Page 1:

Network Dissection: Quantifying Interpretability of Deep Visual Representations
By David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

CS 381V
Thomas Crosley and Wonjoon Goo

Page 2:

Detectors

Page 3:

Credit: slide from the original paper

Page 4:

Unit Distributions

● Compute internal activations for entire dataset

● Gather distribution for each unit across dataset
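A rough sketch of this step, assuming a PyTorch model; `layer` and `loader` are hypothetical stand-ins, not names from the paper's code:

```python
# Sketch: record the activation maps of every unit in one layer
# over the whole dataset using a forward hook.
import torch

def collect_activations(model, layer, loader):
    acts = []
    handle = layer.register_forward_hook(
        lambda module, inputs, output: acts.append(output.detach().cpu()))
    with torch.no_grad():
        for images, _ in loader:
            model(images)
    handle.remove()
    # shape (num_images, num_units, H, W): one distribution per unit
    return torch.cat(acts, dim=0)
```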

Page 5:

Top Quantile

● Compute T_k such that P(a_k > T_k) = 0.005

● T_k is the top-quantile threshold

● Detected regions at test time are those with a_k > T_k
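A minimal sketch of computing T_k; the (images, units, H, W) array layout matches the sketch above and is an assumption, not the authors' code:

```python
# Sketch: per-unit threshold T_k such that P(a_k > T_k) = 0.005,
# taken over all images and spatial positions.
import numpy as np

def top_quantile_thresholds(acts, p=0.005):
    # acts: numpy array of shape (num_images, num_units, H, W)
    num_units = acts.shape[1]
    per_unit = acts.transpose(1, 0, 2, 3).reshape(num_units, -1)
    # T_k is the (1 - p) quantile of unit k's activation distribution
    return np.quantile(per_unit, 1.0 - p, axis=1)
```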

Page 6:

Detector Concept

● Score of each unit is its IoU with the label

● Detectors are selected with IoU above a threshold

● The threshold is IoU_{k,c} > 0.04
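A sketch of the scoring, assuming the unit masks and concept label masks are already binary and at a common resolution (array names are illustrative):

```python
# Sketch: IoU between a unit's binary masks M_k and a concept's
# ground-truth label masks L_c, accumulated over the dataset.
import numpy as np

def iou_score(unit_masks, label_masks):
    # both: boolean arrays of shape (num_images, H, W)
    intersection = np.logical_and(unit_masks, label_masks).sum()
    union = np.logical_or(unit_masks, label_masks).sum()
    return intersection / union if union > 0 else 0.0

# Unit k is reported as a detector for concept c when the score > 0.04.
```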

Page 7:

Test Data

● Compute the activation map a_k for each neuron k in the network

Page 8:

Scaling Up

● Scale each unit’s activation up to the original image size

● Call this the mask-resolution map S_k

● Use bilinear interpolation

Page 9:

Thresholding

● Now make the binary segmentation mask M_k

● M_k = S_k > T_k

[Figure: the upscaled activation map S_k and the thresholded mask M_k]
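A sketch combining the upscaling and thresholding steps, assuming PyTorch; `unit_mask` is a hypothetical helper, not the authors' code:

```python
# Sketch: bilinearly upscale a unit's activation map a_k to image
# resolution (S_k), then threshold at T_k to get the binary mask M_k.
import torch.nn.functional as F

def unit_mask(a_k, t_k, image_size):
    # a_k: (H, W) activation map; t_k: scalar top-quantile threshold
    s_k = F.interpolate(a_k[None, None], size=image_size,
                        mode='bilinear', align_corners=False)[0, 0]
    return s_k > t_k  # boolean mask M_k
```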

Page 10:

Experiment: Detector Robustness

● Interest in adversarial examples

● Invariance to noise

● Composition by parts or statistics

Page 11:

Noisy Images

[Figure: a sample image with added noise: + Unif[0, 1], + 5 * Unif[0, 1], + 10 * Unif[0, 1], + 100 * Unif[0, 1]]
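The perturbation itself is simple; a sketch, assuming uint8 images with pixel values in [0, 255]:

```python
# Sketch: add scaled uniform noise to an image, matching the
# scales in the slides (1, 5, 10, 100).
import numpy as np

def add_uniform_noise(image, scale):
    noisy = image.astype(np.float32) + scale * np.random.uniform(size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```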

Page 12:

Conv3

[Figure: Conv3 detector masks for the original image and with + Unif[0, 1], + 5 * Unif[0, 1], + 10 * Unif[0, 1], and + 100 * Unif[0, 1] noise]

Page 13:

Conv4

[Figure: Conv4 detector masks for the original image and with + Unif[0, 1], + 5 * Unif[0, 1], + 10 * Unif[0, 1], and + 100 * Unif[0, 1] noise]

Page 14:

Conv5

[Figure: Conv5 detector masks for the original image and with + Unif[0, 1], + 5 * Unif[0, 1], + 10 * Unif[0, 1], and + 100 * Unif[0, 1] noise]

Page 15:

Rotated Images

[Figure: the original image and rotations by 10, 45, and 90 degrees]
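A sketch of the rotation perturbation (using scipy.ndimage; filling the exposed corners with zeros is an assumption about how the inputs were made):

```python
# Sketch: rotate an image by a given angle, keeping the original frame.
from scipy.ndimage import rotate

def rotate_image(image, degrees):
    # reshape=False keeps the output the same size as the input;
    # regions rotated in from outside the frame are filled with 0
    return rotate(image, angle=degrees, reshape=False, mode='constant', cval=0)
```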

Page 16:

conv3

[Figure: conv3 detector masks for the original image and 10, 45, and 90 degree rotations]

Page 17:

conv4

[Figure: conv4 detector masks for the original image and 10, 45, and 90 degree rotations]

Page 18:

conv5

[Figure: conv5 detector masks for the original image and 10, 45, and 90 degree rotations]

Page 19:

Rearranged Images

Page 20:

Rearranged Images

Page 21:

Rearranged Images
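The rearranged inputs here are presumably produced by cutting each image into a grid of patches and shuffling them; a minimal sketch, assuming the image height and width divide evenly by the grid size:

```python
# Sketch: split an (H, W, C) image into an n x n grid of patches
# and reassemble them in a random order.
import numpy as np

def rearrange_patches(image, n):
    h, w = image.shape[0] // n, image.shape[1] // n
    patches = [image[i*h:(i+1)*h, j*w:(j+1)*w]
               for i in range(n) for j in range(n)]
    order = np.random.permutation(len(patches))
    rows = [np.concatenate([patches[order[i*n + j]] for j in range(n)], axis=1)
            for i in range(n)]
    return np.concatenate(rows, axis=0)
```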

Page 22:

Conv3

[Figure: Conv3 detector masks for the original image and 4x4 / 8x8 patch rearrangements]

Page 23:

Conv4

[Figure: Conv4 detector masks for the original image and 4x4 / 8x8 patch rearrangements]

Page 24:

Conv5

[Figure: Conv5 detector masks for the original image and 4x4 / 8x8 patch rearrangements]

Page 25:

Axis-Aligned Interpretability

Page 26:

Axis-Aligned Interpretability

● Hypothesis 1:
○ A linear combination of high-level units serves just as well or better
○ There is no specialized interpretation for each unit

● Hypothesis 2 (the authors' argument):
○ A linear combination will degrade interpretability
○ Each unit serves a unique concept

How similar is the way a CNN learns to the way a human does?

Page 27:

Axis-Aligned Interpretability: Results from the Authors

Figure: from the paper

● It seems a valid argument, but is it the best way to show it?
● Problems:
○ The result depends on the rotation matrix used for the test
○ A 90 degree rotation between two axes does not affect the number of unique detectors
○ The test should be run multiple times, reporting means and standard deviations

Page 28:

Experiment: Axis-Aligned Interpretability

Page 29:

Is it really axis-aligned?

● Principal Component Analysis (PCA)
○ Finds the orthonormal vectors that best explain the samples
○ Projections onto the first vector u_1 have the highest variance

Figure: from Andrew Ng's lecture notes on PCA

❖ Argument: if a unit by itself explains a concept, then
➢ Projections onto the unit vectors should have high variance
➢ The principal axis (loading) from PCA should be similar to one of the unit vectors

Page 30:

Our method

1. Calculate the mean and std of each unit's activations over the dataset
2. Grab the activations for a specific concept
3. Standardize: subtract the mean and divide by the std
4. Perform SVD
5. Print the loading

Hypothesis 1: the concept is interpreted through a combination of elementary basis vectors

Hypothesis 2: the concept can be interpreted with a single elementary basis vector (e.g. e_502 := (0, ..., 0, 1, 0, ..., 0))
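A minimal sketch of steps 1-5 in code; the array layout, variable names, and the standardization epsilon are assumptions, not the slide authors' implementation:

```python
# Sketch: standardize per-unit activations for one concept, run SVD,
# and inspect the first principal axis (loading).
import numpy as np

def first_loading(concept_acts, mean, std):
    # concept_acts: (num_samples, num_units) activations for one concept;
    # mean, std: per-unit statistics over the whole dataset (step 1)
    X = (concept_acts - mean) / (std + 1e-8)  # step 3; epsilon avoids /0
    # rows of Vt are the principal axes; the first row is the top loading
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    loading = Vt[0]
    print(np.argsort(-np.abs(loading))[:10])  # units with largest weight
    return loading
```

If one unit dominates the loading, its index should stand out in the printed list (Hypothesis 2); a spread-out loading points to Hypothesis 1.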

Page 31:

(Supplementary) PCA and Singular Value Decomposition (SVD)

● Optimization target: maximize u^T Σ u subject to u^T u = 1, where Σ is the covariance matrix of the activations

● With a Lagrange multiplier: L(u, λ) = u^T Σ u - λ(u^T u - 1); setting the gradient to zero gives Σ u = λ u

● The eigenvector with the highest eigenvalue becomes the principal axis (loading)

From Cheng Li and Bingyu Wang's notes

Page 32:

PCA Results - Activations for Bird Concept

● Unit 502 stands out; the concept bird is aligned with that unit
● Does unit 502 serve only the concept bird?
○ Yes
○ It does not stand out for any concept other than bird
● This supports Hypothesis 2

Page 33:

PCA Results - Activations for Train Concept

● No unit stands out for the concept train
○ A linear combination of units has better interpretability
○ This supports Hypothesis 1

Page 34:

PCA Results - Activations for Train Concept

● No unit stands out for the concept train
○ A linear combination of units carries the interpretability

Some objects with circles and trestles?

Page 35:

PCA Results - Activations for Train Concept

● No unit stands out for the concept train
○ A linear combination of units carries the interpretability

A sequence of square boxes?

Page 36:

PCA Results - Activations for Train Concept

● No unit stands out for the concept train
○ A linear combination of units carries the interpretability

Dog face!

Page 37:

Conclusion…?

● Actually, the results seem mixed!
● A CNN learns some human concepts naturally, but not always
○ It might be highly correlated with the labels we give

Page 38:

Other Thoughts

● What if we regularize the network to encourage its interpretability?
○ Taxonomy-Regularized Semantic Deep Convolutional Neural Networks, Wonjoon Goo, Juyong Kim, Gunhee Kim, and Sung Ju Hwang, ECCV 2016

Page 39:

Thanks!
Any questions?