
Page 1

SMASH: One-Shot Model Architecture

Search Through HyperNetworks

Authors: Andrew Brock, Theodore Lim, J.M. Ritchie, and Nick Weston

April 14, 2018

Presentation by Kamal Rai

Page 2

The Motivation

When training neural networks, we:

• Fix the network architecture

• Specify a loss function L

• Find optimal weights W using backprop, following the gradient dL/dW

Iterate over design decisions until we obtain a good model

Model hyperparameters: Depth, width, connectivity
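The standard workflow above can be sketched in a few lines. This is a minimal illustration with a toy linear "architecture" and a squared-error loss; all names and shapes here are invented for the example.

```python
import numpy as np

# Fix the architecture (a linear model), specify a loss L (mean squared
# error), and find weights W by following dL/dW with gradient descent.

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))            # toy inputs
true_W = np.array([1.0, -2.0, 0.5])
y = X @ true_W                          # toy regression targets

W = np.zeros(3)                         # weights of the fixed architecture
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ W - y) / len(y)   # dL/dW for squared error
    W -= lr * grad                          # gradient-descent step
```

Changing depth, width, or connectivity means rerunning this whole loop, which is what motivates automating the outer search.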


Page 3

The Motivation

Finding optimal architectures requires extensive experimentation

Current automated architecture selection methods are expensive

Evolutionary techniques and reinforcement learning

Given randomly sampled hyperparameters c, we can iteratively:

1. Optimize the weights of an auxiliary network using ∂L(W_c)/∂W_c · ∂W_c/∂c

2. Optimize the weights of the main network
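A hedged sketch of step 1: here the auxiliary network is assumed to be a plain linear map H from a hyperparameter encoding c to main-network weights W_c = H @ c, so the chain rule gives dL/dH = outer(dL/dW_c, c). The shapes and the linear form are assumptions made for illustration, not the paper's architecture.

```python
import numpy as np

# Train auxiliary weights H so that the generated W_c = H @ c fits the data.

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -1.0, 2.0])      # toy targets

H = np.zeros((3, 4))                    # auxiliary-network weights
c = rng.normal(size=4)                  # one sampled hyperparameter vector
lr = 0.02
for _ in range(1000):
    W_c = H @ c                         # weights generated for c
    grad_W = 2 * X.T @ (X @ W_c - y) / len(y)   # dL/dW_c
    H -= lr * np.outer(grad_W, c)       # chain rule: dW_c/dH contributes c

loss = np.mean((X @ (H @ c) - y) ** 2)
```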


Page 4

The HyperNetwork

Figure 1: Generate weights using an auxiliary network


Page 5

The Training Algorithm

Algorithm 1: Smash

Input: Space of all candidate architectures Rc

Initialize HyperNet weights H

loop

Sample input minibatch x_i, random architecture c, and architecture weights W(c)

Get training error E_t = f_c(W, x_i) = f_c(H(c), x_i), backprop ∂E_t/∂W through the HyperNet and then update H

end loop

loop

Sample a random architecture c and evaluate its error on the validation set: E_v = f_c(H(c), x_v)

end loop

Fix architecture and train normally with freely-varying weights W
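Algorithm 1 can be rendered as a runnable toy under strong simplifying assumptions: an "architecture" c is just a binary mask over input features, and the HyperNet is a linear map H with W(c) = H @ c. Everything beyond the loop structure is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, -1.0, 0.0, 2.0])
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

H = np.zeros((5, 5))                                   # HyperNet weights
lr = 0.02

def error(c, Xb, yb):
    W = H @ c                                          # W(c) = H(c)
    return np.mean((Xb @ (W * c) - yb) ** 2)           # E = f_c(H(c), x)

# Loop 1: train the HyperNet on randomly sampled architectures.
for _ in range(2000):
    idx = rng.integers(0, 150, size=32)                # minibatch x_i
    c = rng.integers(0, 2, size=5).astype(float)       # random architecture c
    resid = X_tr[idx] @ ((H @ c) * c) - y_tr[idx]
    grad_W = 2 * (X_tr[idx] * c).T @ resid / 32        # dE_t/dW
    H -= lr * np.outer(grad_W, c)                      # backprop into H, update

# Loop 2: rank random architectures by validation error E_v.
candidates = [rng.integers(0, 2, size=5).astype(float) for _ in range(20)]
best = min(candidates, key=lambda c: error(c, X_val, y_val))
err_best = error(best, X_val, y_val)
# Final step (omitted): fix `best` and retrain its weights freely from scratch.
```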


Page 6

Sampling Weights

Figure 2: Sampling from a hypernetwork


Page 7

Ranking Candidate Models

Figure 3: Exploring performance on CIFAR-100


Page 8

The strength of correlation depends on

• The capacity of the hypernet

• The ratio of hypernet-generated weights to freely learned weights
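The ranking quality can be quantified with a rank correlation: how well does a cheap proxy score (validation error under hypernet-generated weights) order architectures compared to their true errors? A hedged illustration with synthetic error values, not figures from the paper:

```python
import numpy as np

def spearman(a, b):
    # Spearman rank correlation: Pearson correlation of the ranks.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(3)
true_err = rng.uniform(0.2, 0.6, size=50)               # "true" benchmark errors
proxy_err = true_err + rng.normal(0.0, 0.02, size=50)   # noisy one-shot proxy

rho = spearman(true_err, proxy_err)   # near 1.0 means the proxy ranks well
```

A higher-capacity hypernet (or a larger share of generated weights) would correspond, in this picture, to a less noisy proxy and thus a higher rho.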


Page 9

The Memory Model

Figure 4: Layers are ops that read and write to memory


Page 10

An Experiment

Figure 5: Benchmark results


Page 11

Limitations

• The space of candidate architectures must be pre-specified

• Does not address regularization or learning rate

• Not jointly training the hypernet and the main network

• Not using gradients to optimize the choice of main network


Page 12

Conclusion

Can efficiently explore architectures using Hypernet weights

Two Related Works

• Hyperparameter Optimization with Hypernets. J. Lorraine and D. Duvenaud

• Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization. L. Li, K. Jamieson, and G. DeSalvo
