The Singular Values of Convolutional Layers

Hanie Sedghi, Google Brain

Joint work with Vineet Gupta and Phil Long

Convolutional Layers The Singular Values of Google Brain ... · Key: singular values of linear layers A 1 A 2 A l... x Main threats are linear layers Singular values bound factor

Jan 27, 2020

Neural Networks

Tremendous practical impact with deep learning

Deep Network Architecture

[Figure: layers alternate between a linear operation and an elementwise nonlinearity]

Exploding and vanishing gradients

[Figure: input x passing through linear layers A1, A2, ..., Al]

● Gradients are backpropagated through every layer

● Danger of explosion (NaN) and of vanishing (negligibly small updates)

● The same danger exists in the forward direction

Key: singular values of linear layers

● The main threats are the linear layers

● A layer's singular values bound the factor by which it increases or decreases the length of its input
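As a quick numerical illustration (a NumPy sketch; the random 6 × 6 matrix and the helper names `stretch_bounds` and `stretch_ratio` are hypothetical, not from the talk), the factor ‖Ax‖/‖x‖ always lies between the smallest and largest singular values of a square matrix A, and the largest is attained at the top right singular vector:

```python
import numpy as np

def stretch_bounds(A):
    """Return (smallest, largest) singular value of a square matrix A."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    return s[-1], s[0]

def stretch_ratio(A, x):
    """Factor by which A changes the length of x."""
    return np.linalg.norm(A @ x) / np.linalg.norm(x)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
lo, hi = stretch_bounds(A)
ratios = [stretch_ratio(A, rng.standard_normal(6)) for _ in range(1000)]
print(lo - 1e-12 <= min(ratios), max(ratios) <= hi + 1e-12)  # expected: True True
```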

Operator norm

● Network Lipschitz constant ≤ product of the operator norms of the linear layers (when the nonlinearities are 1-Lipschitz, e.g. ReLU)

● Motivates regularization via control of operator norms.
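The bound can be checked empirically with a small sketch, assuming a plain stack of square linear layers with ReLU in between (the layer sizes, scaling, and function names are hypothetical). `np.linalg.norm(A, 2)` returns the largest singular value, i.e. the operator norm:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(layers, x):
    """Linear layers with ReLU in between (ReLU is 1-Lipschitz)."""
    for A in layers[:-1]:
        x = relu(A @ x)
    return layers[-1] @ x

def lipschitz_upper_bound(layers):
    """Product of operator norms; np.linalg.norm(A, 2) is A's largest singular value."""
    return float(np.prod([np.linalg.norm(A, 2) for A in layers]))

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) / 3.0 for _ in range(4)]
bound = lipschitz_upper_bound(layers)
# Largest observed ratio ||f(x) - f(y)|| / ||x - y|| over random pairs.
worst = max(
    np.linalg.norm(forward(layers, x) - forward(layers, y)) / np.linalg.norm(x - y)
    for x, y in (rng.standard_normal((2, 8)) for _ in range(1000))
)
print(worst <= bound)  # expected: True
```

The observed ratio is usually well below the product, which is one reason the bound is a regularization target rather than an exact measure.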

Residual networks

[Figure: residual block with linear layers A1 and A2 and an identity skip connection (+)]

If the operator norms of A1 and A2 are small, the singular values of the block are near 1.
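A minimal sketch of why this holds, assuming a linearized block x ↦ x + A2 A1 x (the nonlinearity is dropped for illustration, and the sizes are hypothetical). By Weyl's inequality for singular values, every singular value of I + A2 A1 lies within ‖A2 A1‖ ≤ ‖A2‖ ‖A1‖ of 1:

```python
import numpy as np

def linear_residual_block(A1, A2):
    """Matrix of x -> x + A2 @ A1 @ x (the nonlinearity is omitted in this sketch)."""
    return np.eye(A1.shape[1]) + A2 @ A1

rng = np.random.default_rng(0)
n = 16
A1 = 0.05 * rng.standard_normal((n, n))
A2 = 0.05 * rng.standard_normal((n, n))
eps = np.linalg.norm(A1, 2) * np.linalg.norm(A2, 2)  # upper bound on ||A2 @ A1||
sv = np.linalg.svd(linear_residual_block(A1, A2), compute_uv=False)
# All singular values of the block lie in [1 - eps, 1 + eps].
print(eps, sv.min(), sv.max())
```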

Control the operator norm

● Regularization [Drucker and Le Cun, 1992; Hein and Andriushchenko, 2017; Yoshida and Miyato, 2017; Miyato et al., 2018]

● Generalization [Bartlett et al., 2017]

● Robustness to adversarial examples [Cisse et al., 2017]

Operator norm for Convolution

● Regularize networks by reducing the operator norm of the linear transformation.

● Previous authors identified the operator norm as important, but did not find a way to compute it exactly for convolutional layers.

● Instead, they resorted to approximations (Yoshida and Miyato, 2017; Miyato et al., 2018; Gouk et al., 2018a).
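One generic way to approximate an operator norm is power iteration on AᵀA. The sketch below applies it to an explicit matrix with known singular values; it illustrates the general technique only and is not a reproduction of any of the cited methods. The function name and the test matrix are hypothetical:

```python
import numpy as np

def power_iteration_norm(A, num_iters=100, seed=0):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        u = A @ v
        u /= np.linalg.norm(u)
        v = A.T @ u
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(A @ v))

# A 6 x 4 matrix with known singular values (3, 1.5, 1, 0.5) for checking the estimate.
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q1[:, :4] @ np.diag([3.0, 1.5, 1.0, 0.5]) @ Q2.T
print(power_iteration_norm(A), np.linalg.norm(A, 2))  # both close to 3.0
```

The convergence speed depends on the gap between the two largest singular values, so a fixed, small number of iterations gives only an estimate from below.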

Our Contribution:

● Characterize singular values of convolutional layers

● Simple, fast algorithm

● Regularizer (via projection)

Convolution Layer

Discrete convolution

Linear combination of pixels

Applied locally

[Figure: a kernel applied to a local patch of the input produces one pixel of the output]

(Reproduced from medium.freecodecamp.org.)

Multi-channel convolutional layer

1D Circular Convolution

The operator matrix is a circulant matrix

[Figure: input * kernel (circular convolution) and the corresponding operator matrix]

Discrete Fourier Transform: the columns of the DFT matrix F are eigenvectors of the circulant matrix.

Singular values: the magnitudes of the eigenvalues, i.e. the absolute values of the DFT of the zero-padded kernel
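The three facts on this slide can be checked numerically. This is a NumPy sketch with a hypothetical length-3 kernel on n = 8 points; the helper names are illustrative. It builds the circulant operator matrix, confirms that the Fourier vectors are eigenvectors with eigenvalues `np.fft.fft(kernel, n)`, and compares the singular values with their magnitudes (a circulant matrix is normal, so its singular values are the absolute values of its eigenvalues):

```python
import numpy as np

def circulant_from_kernel(kernel, n):
    """Operator matrix of 1D circular convolution: y[i] = sum_j c[j] * x[(i - j) % n]."""
    c = np.zeros(n)
    c[: len(kernel)] = kernel
    # Column j is c cyclically shifted down by j, so M[i, j] = c[(i - j) % n].
    return np.stack([np.roll(c, j) for j in range(n)], axis=1)

def circulant_singular_values(kernel, n):
    """Singular values via the DFT: |FFT of the zero-padded kernel| (unsorted)."""
    return np.abs(np.fft.fft(kernel, n))

n = 8
kernel = np.array([1.0, -2.0, 0.5])
C = circulant_from_kernel(kernel, n)

# Columns of the (unnormalized) Fourier matrix F are eigenvectors of C.
idx = np.arange(n)
F = np.exp(2j * np.pi * np.outer(idx, idx) / n)
eigenvalues = np.fft.fft(kernel, n)
print(np.allclose(C @ F, F * eigenvalues))  # expected: True

direct = np.linalg.svd(C, compute_uv=False)
print(np.allclose(np.sort(direct), np.sort(circulant_singular_values(kernel, n))))  # expected: True
```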

2D Single-channel Convolution

Operator matrix is a doubly-block circulant matrix
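The same reasoning extends to two dimensions. The sketch below assumes circular (wrap-around) boundary handling and a hypothetical 3 × 3 kernel on a 5 × 5 input. It builds the n² × n² operator matrix column by column from basis images, checks the doubly-block circulant structure, and compares its singular values with the magnitudes of the 2D DFT of the zero-padded kernel:

```python
import numpy as np

def circular_conv2d(x, kernel):
    """Single-channel 2D circular convolution:
    y[i, j] = sum_{a,b} kernel[a, b] * x[(i - a) % n, (j - b) % n]."""
    y = np.zeros(x.shape)
    for a in range(kernel.shape[0]):
        for b in range(kernel.shape[1]):
            # np.roll(x, (a, b))[i, j] == x[(i - a) % n, (j - b) % n]
            y += kernel[a, b] * np.roll(x, shift=(a, b), axis=(0, 1))
    return y

def operator_matrix(kernel, n):
    """Dense n^2 x n^2 matrix of the layer, built column by column from basis images."""
    cols = []
    for idx in range(n * n):
        e = np.zeros(n * n)
        e[idx] = 1.0
        cols.append(circular_conv2d(e.reshape(n, n), kernel).ravel())
    return np.stack(cols, axis=1)

n = 5
rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
M = operator_matrix(kernel, n)

# Doubly-block circulant: entry ((i, j), (p, q)) depends only on ((i - p) % n, (j - q) % n).
K_pad = np.zeros((n, n))
K_pad[:3, :3] = kernel
M4 = M.reshape(n, n, n, n)
i, j, p, q = np.indices((n, n, n, n))
print(np.allclose(M4, K_pad[(i - p) % n, (j - q) % n]))  # expected: True

# Singular values are the magnitudes of the 2D DFT of the zero-padded kernel.
fast = np.abs(np.fft.fft2(kernel, s=(n, n))).ravel()
slow = np.linalg.svd(M, compute_uv=False)
print(np.allclose(np.sort(fast), np.sort(slow)))  # expected: True
```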

2D Multi-channel Convolution
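For the multi-channel case, the characterization reduces to one 2D FFT per (input channel, output channel) pair followed by one small SVD per spatial frequency. The sketch below is written in that spirit, and the code released at the repository listed at the end of the talk may differ in details. It assumes a kernel of shape (k, k, c_in, c_out), circular padding, and an n × n input. It then checks the result against a dense operator matrix on a tiny example. The helper names are hypothetical:

```python
import numpy as np

def conv_singular_values(kernel, input_shape):
    """Singular values of a circular 2D multi-channel convolution.

    kernel: shape (k, k, c_in, c_out); input_shape: (n, n).
    Returns shape (n, n, min(c_in, c_out)): the singular values at each frequency (u, v).
    """
    # 2D DFT of every (c_in, c_out) kernel slice, zero-padded to n x n.
    transforms = np.fft.fft2(kernel, s=input_shape, axes=(0, 1))
    # One small c_in x c_out SVD per frequency; np.linalg.svd batches over leading axes.
    return np.linalg.svd(transforms, compute_uv=False)

def circular_conv2d(x, k2d):
    """y[i, j] = sum_{a,b} k2d[a, b] * x[(i - a) % n, (j - b) % n]."""
    y = np.zeros(x.shape)
    for a in range(k2d.shape[0]):
        for b in range(k2d.shape[1]):
            y += k2d[a, b] * np.roll(x, shift=(a, b), axis=(0, 1))
    return y

def dense_operator(kernel, n):
    """Explicit (n*n*c_out) x (n*n*c_in) matrix of the same layer (for checking only)."""
    _, _, c_in, c_out = kernel.shape
    cols = []
    for idx in range(n * n * c_in):
        x = np.zeros(n * n * c_in)
        x[idx] = 1.0
        x = x.reshape(n, n, c_in)
        y = np.zeros((n, n, c_out))
        for c in range(c_in):
            for d in range(c_out):
                y[:, :, d] += circular_conv2d(x[:, :, c], kernel[:, :, c, d])
        cols.append(y.ravel())
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 2, 3))
fast = conv_singular_values(kernel, (4, 4))
slow = np.linalg.svd(dense_operator(kernel, 4), compute_uv=False)
print(fast.shape)  # expected: (4, 4, 2)
print(np.allclose(np.sort(fast.ravel()), np.sort(slow)))  # expected: True
```

The layer's operator norm is then `fast.max()`. The sketch uses the convolution index convention; the cross-correlation used by most frameworks conjugates and transposes each per-frequency matrix, which leaves the singular values unchanged.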

Proof Sketch

All blocks in M have the same eigenvectors (the 2D Fourier basis), so M is unitarily similar to a matrix whose blocks are all diagonal.

We need the singular values of this matrix of diagonal blocks.

If (v1, v2, v3) is a right singular vector of the blue matrix (built from the first diagonal entry of each block), then (v1,0,0,0,v2,0,0,0,v3,0,0,0) is a right singular vector of the whole matrix.
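The block-structure argument can be checked numerically. In this hypothetical example, m = 3 channels and N = 4 frequencies, matching the 12-entry vector above, so M is a 3 × 3 grid of 4 × 4 diagonal blocks. Its singular values should be the union of the singular values of the four 3 × 3 matrices formed from the i-th diagonal entry of each block (the blue matrix is the case i = 0). The function names are illustrative:

```python
import numpy as np

def block_matrix(diag_entries):
    """Assemble an (m*N) x (m*N) matrix whose (c, d) block is diag(diag_entries[c, d])."""
    m = diag_entries.shape[0]
    return np.block([[np.diag(diag_entries[c, d]) for d in range(m)] for c in range(m)])

def diag_block_singular_values(diag_entries):
    """Union over i of the singular values of the m x m matrix diag_entries[:, :, i]."""
    N = diag_entries.shape[2]
    return np.concatenate([np.linalg.svd(diag_entries[:, :, i], compute_uv=False) for i in range(N)])

m, N = 3, 4
rng = np.random.default_rng(0)
diag_entries = rng.standard_normal((m, m, N))
M = block_matrix(diag_entries)
print(np.allclose(np.sort(diag_block_singular_values(diag_entries)),
                  np.sort(np.linalg.svd(M, compute_uv=False))))  # expected: True

# Embed a right singular vector (v1, v2, v3) of the i = 0 matrix as (v1,0,0,0,v2,0,0,0,v3,0,0,0).
blue = diag_entries[:, :, 0]
U_small, s_small, Vh_small = np.linalg.svd(blue)
v = np.zeros(m * N)
v[0::N] = Vh_small[0]
u = np.zeros(m * N)
u[0::N] = U_small[:, 0]
print(np.allclose(M @ v, s_small[0] * u))  # expected: True
```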

Computational Complexity

vs
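A back-of-the-envelope count comparing the Fourier approach with a dense SVD of the operator matrix. It assumes an n × n input with m input and m output channels, a dense SVD of an N × N matrix costing O(N³), and an FFT over n² points costing O(n² log n):

```latex
\begin{align*}
\text{Dense SVD of the } n^2 m \times n^2 m \text{ operator matrix:}\quad & O\big((n^2 m)^3\big) = O(n^6 m^3) \\
\text{FFTs of the } m^2 \text{ kernel slices, zero-padded to } n \times n:\quad & O(m^2 n^2 \log n) \\
n^2 \text{ SVDs of } m \times m \text{ matrices:}\quad & O(n^2 m^3) \\
\text{Total for the Fourier approach:}\quad & O\big(n^2 m^2 (m + \log n)\big)
\end{align*}
```

For example, with n = 32 and m = 64 the dense operator matrix is 65536 × 65536, whereas the Fourier approach needs 1024 SVDs of 64 × 64 matrices.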

Application: Regularization

● Regularize deep convolutional networks by bounding the operator norm of each layer

● Improves generalization [Bartlett et al., 2017; Neyshabur et al., 2017]

● Improves robustness to adversarial attacks [Cisse et al., 2017]

Theorem [Lefkimmiatis et al., 2013] (paraphrased): Clipping the singular values of a matrix A at c projects A (in Frobenius norm) onto the set of matrices whose operator norm is at most c.
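A minimal NumPy sketch of the clipping step for an ordinary matrix (the function name and the 6 × 4 test matrix are hypothetical). The result has operator norm at most c, is left unchanged by a second clip, and is at least as close to A in Frobenius norm as any other matrix with operator norm at most c:

```python
import numpy as np

def clip_singular_values(A, c):
    """Project A onto {B : ||B||_2 <= c} (Frobenius-nearest) by clipping singular values at c."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # U * s scales column j of U by s[j], i.e. U @ diag(min(s, c)).
    return (U * np.minimum(s, c)) @ Vh

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
P = clip_singular_values(A, 1.0)
print(np.linalg.norm(P, 2) <= 1.0 + 1e-9)  # expected: True
```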

Bounding the Operator Norm

● Clip singular values

● Problem: the clipped operator generally corresponds to a convolution with larger neighborhoods than k×k

● Solution: alternating projections, i.e.

○ project onto the set of operators with bounded operator norm

○ project onto the set of convolutions with k×k neighborhoods

● For the projection onto the intersection, Dykstra’s algorithm can be used

● In experiments, we use a simpler algorithm
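The two projections can be sketched as follows, assuming a kernel of shape (k1, k2, c_in, c_out) and circular padding on an n × n input. Clipping per frequency in the Fourier domain gives an operator with norm at most `clip_to` but with full n × n support; keeping only the first k1 × k2 taps is the projection back onto convolutions with the original neighborhood. The simpler algorithm used in the experiments is not spelled out here, so this sketch simply runs a fixed number of plain alternating steps. Plain alternation finds a point in the intersection but not necessarily the nearest one, which is what Dykstra's algorithm adds. All names are hypothetical:

```python
import numpy as np

def conv_operator_norm(kernel, input_shape):
    """Largest singular value of a circular multi-channel conv with kernel (k1, k2, c_in, c_out)."""
    transforms = np.fft.fft2(kernel, s=input_shape, axes=(0, 1))
    return float(np.linalg.svd(transforms, compute_uv=False).max())

def clip_in_fourier_domain(kernel, input_shape, clip_to):
    """Projection onto operators with norm <= clip_to; the result has full n x n support."""
    transforms = np.fft.fft2(kernel, s=input_shape, axes=(0, 1))
    U, s, Vh = np.linalg.svd(transforms, full_matrices=False)
    clipped = (U * np.minimum(s, clip_to)[..., None, :]) @ Vh
    # The clipped spectrum keeps the conjugate symmetry of a real kernel,
    # so taking .real only discards rounding noise.
    return np.fft.ifft2(clipped, axes=(0, 1)).real

def truncate_support(full_kernel, k1, k2):
    """Projection onto convolutions with k1 x k2 neighborhoods: keep only those taps."""
    return full_kernel[:k1, :k2]

def alternating_projections(kernel, input_shape, clip_to, num_iters=10):
    k1, k2 = kernel.shape[:2]
    for _ in range(num_iters):
        full = clip_in_fourier_domain(kernel, input_shape, clip_to)
        kernel = truncate_support(full, k1, k2)
    return kernel

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 4, 4))
shape = (8, 8)
projected = alternating_projections(kernel, shape, clip_to=1.0)
print(conv_operator_norm(kernel, shape), conv_operator_norm(projected, shape))
```

Comparing the two printed norms shows how far a given number of alternating steps has moved the kernel toward the target bound.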

Experiments

Experiments: Efficiency

Effect of Clipping on test error

Clipping to 0.5 and 1.0 yielded test errors of 5.3% and 5.5% respectively.

Baseline error rate: 6.2%

ResNet-32 on the CIFAR-10 dataset

Effect of Clipping on test error

ResNet-32 on the CIFAR-10 dataset

The projection does not slow down training much.

On batch normalization

● The earlier baseline uses batch normalization, which effectively rescales the weights

● Batch norm and our method interact in complicated ways

● We repeated the experiments without batch normalization

Robustness to hyperparameter changes

Operator-norm regularization and batch normalization are not redundant, and neither dominates the other.

Singular values for ResNet V2

Layers closer to the input are plotted in colors with a greater share of red.

The transformations with the largest operator norms are closest to the input.

Conclusion

● Characterized the singular values of a 2D multi-channel convolutional layer.

● Provided an efficient and practical method for computing them in deep networks.

● This opens the door to various regularizers.

● Showed an effective projection onto the set of operators with bounded norm.

Future work

● Experiments on more datasets.

● Improve state-of-the-art models such as Generative Adversarial Networks.

Paper: to appear in ICLR 2019

Code: https://github.com/brain-research/conv-sv

Thank You!
