Page 1

Meta-Learning Unsupervised Update Rules

Paper by Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

Page 2

Outline

Motivation

Problem Breakdown

Method Overview

Meta-Learning Setup

Inner Loop

Outer Loop

Experimental Results

Critiques


Page 5

Motivation

Unsupervised learning enables representation learning on mountains of unlabeled data for downstream tasks.

Unsupervised Learning Rules
● VAE: severe overfitting to the training distribution.
● GANs: great for images, weak on discrete data (e.g., text).
● Both: the learning rule itself is not unsupervised (e.g., it relies on a surrogate loss).

Question: Can we meta-learn an unsupervised learning rule?


Page 7

Semi-Supervised Few-Shot Classification

[Figure: semi-supervised few-shot pipeline. An unsupervised rule tunes the encoder on unlabeled training inputs (x1…x5); the tuned encoder maps the labeled training inputs (x1…x4, labels y1…y4) to compact vectors, and a model is fit on those vectors.]

Can we meta-learn this unsupervised learning rule?

Page 8

Learning the Learning Rule

Backpropagation (schematically): ɸ ← ɸ − α ∇ɸ L(ɸ), i.e., a gradient step on a fixed, hand-designed loss.

Unsupervised Update (schematically): ɸ ← ɸ + Δɸ(x, ɸ; θ), i.e., an update produced by a learned rule with meta-parameters θ.

Page 9

Method Overview

● Inner loop: learn the encoder using the unsupervised update rule.

● Outer loop: optimize the meta-objective with respect to the update rule's parameters.
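To make this structure concrete, here is a minimal runnable sketch of the two loops, using a toy one-layer encoder and a stand-in Hebbian-style rule whose single scalar θ plays the role of the meta-parameters (names and shapes are hypothetical, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, phi):
    """Base model g(x; phi): a single linear layer with a tanh nonlinearity."""
    return np.tanh(x @ phi)

def learned_update(x, phi, theta):
    """Stand-in for the meta-learned rule: a Hebbian-style local update
    whose scale is set by the (here scalar) meta-parameter theta."""
    h = encoder(x, phi)
    return theta * (x.T @ h) / len(x)

def inner_loop(theta, x_unlabeled, phi, steps=10):
    """Inner loop: adapt encoder parameters phi with the learned rule."""
    for _ in range(steps):
        phi = phi + learned_update(x_unlabeled, phi, theta)
    return phi

# Example: tune a 16-d -> 4-d encoder on a batch of unlabeled data.
phi0 = 0.1 * rng.standard_normal((16, 4))
x_unlab = rng.standard_normal((128, 16))
phi = inner_loop(theta=0.05, x_unlabeled=x_unlab, phi=phi0)
```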


Page 10

Meta-Learning Setup

Page 11

Meta-Learning Setup

The inner loop applies an unsupervised learning algorithm to unlabeled data.

The outer loop evaluates the unsupervised learning algorithm using labeled data.




Page 14

Inner Loop

Question: Given a base model, g(x; ɸ), which encodes inputs into compact vectors, how do we learn its parameters ɸ to give useful features?

Idea: What if we use another neural network to generate a neuron-specific error signal?

Then we can learn its parameters θ (the meta-parameters) to produce useful error signals.


Page 15

Inner Loop: Forward Pass

1) Take an input

2) Generate intermediate activations

3) Produce a feature representation
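Continuing the sketch above, a deeper base model's forward pass simply retains every intermediate activation, since the learned rule consumes them later (a sketch; layer sizes are illustrative):

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass of the base MLP, keeping each layer's activations.
    Returns [h0, h1, ..., hL]; the final entry is the feature representation."""
    activations = [x]
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(h @ W + b)   # intermediate activation h_i
        activations.append(h)
    return activations
```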


Page 16

Inner Loop: Generate Error Signal

1) Feed each layer's activations through an MLP

2) Output a per-neuron error vector
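One way to realize this in the sketch (not necessarily the paper's exact architecture): a tiny MLP with meta-parameters θ, applied independently to each neuron's activation, so that the same rule works for layers of any width:

```python
import numpy as np

def init_error_mlp(hidden=8, seed=0):
    """Hypothetical meta-parameters theta for a shared per-neuron error MLP."""
    rng = np.random.default_rng(seed)
    return {
        "W1": 0.1 * rng.standard_normal((1, hidden)),
        "b1": np.zeros(hidden),
        "W2": 0.1 * rng.standard_normal((hidden, 1)),
        "b2": np.zeros(1),
    }

def error_signal(h, theta):
    """Map each scalar activation through the shared MLP to a per-neuron
    error. Input h has shape (batch, units); output has the same shape."""
    z = np.maximum(0.0, h[..., None] @ theta["W1"] + theta["b1"])  # ReLU hidden
    return (z @ theta["W2"] + theta["b2"])[..., 0]
```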


Page 17

Inner Loop: Backward Pass

1) Initialize the top-level error with the output of the MLP
2) Backprop the error

3) At each layer, linearly combine the output from the MLP with the backpropagated error
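A sketch of this backward pass, reusing forward and error_signal from the sketches above; the fixed weight alpha stands in for the learned combination coefficients:

```python
def backward(activations, weights, theta, alpha=0.5):
    """Backward pass of the learned rule: the top-level error comes from the
    meta-MLP, and at each lower layer the backpropagated error is linearly
    combined with a fresh MLP-generated error."""
    deltas = [None] * len(weights)
    d = error_signal(activations[-1], theta)        # 1) top error from the MLP
    for i in reversed(range(len(weights))):
        d = d * (1.0 - activations[i + 1] ** 2)     # 2) backprop through tanh
        deltas[i] = d
        if i > 0:
            backproped = d @ weights[i].T            # error sent down one layer
            local = error_signal(activations[i], theta)
            d = alpha * backproped + (1.0 - alpha) * local  # 3) linear combo
    return deltas
```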


Page 18

Inner Loop: Update 𝝓

𝝓 consists of all base-model parameters W_i, V_i, and b_i.

Updates like ΔW_i and ΔV_i are linear* functions of the local error quantities h_{i-1} and h_i.

*There are also nonlinear normalizations within this function.
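In the sketch, each weight update then reduces to an outer product of purely local quantities, mirroring the structure of a backprop gradient (the nonlinear normalizations mentioned above are omitted):

```python
def weight_updates(activations, deltas, lr=1e-2):
    """Each Delta-W_i depends only on the layer's local quantities: its input
    activation h_{i-1} and its error delta_i, combined as a batch-averaged
    outer product, just as in an ordinary backprop gradient."""
    updates = []
    for i, d in enumerate(deltas):
        h_prev = activations[i]            # h_{i-1}
        dW = h_prev.T @ d / len(d)         # linear in h_{i-1} and delta_i
        updates.append(-lr * dW)
    return updates
```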


Page 19

Inner Loop: Key Points

● The error-generating network replicates the mechanics of backprop for unsupervised learning

● Iterative updates tune 𝝓 for some higher-level objective

● The outer loop sets the objective by modifying the error-generating function


Page 23

Outer Loop


Page 27

Outer Loop: Compute MetaObjective

[Figure: MetaObjective pipeline. The unsupervised rule (with meta-parameters θ) tunes the encoder on the unlabeled support inputs (x1…x5); the tuned encoder embeds the labeled support set (x1…x4, labels y1…y4) and the labeled query set (x*1, x*2, labels y*1, y*2); a linear model is fit on the support embeddings and evaluated on the query embeddings, yielding a mean-squared (MS) error.]
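A sketch of the MetaObjective, reusing encoder from the first sketch: embed the labeled support and query sets with the tuned encoder, fit a closed-form linear model on the support embeddings (the ridge term is an added assumption for numerical stability), and score it by mean-squared error on the query set:

```python
import numpy as np

def meta_objective(phi, x_support, y_support, x_query, y_query, ridge=1e-3):
    """Fit a linear model on support embeddings; return MS error on the query."""
    z_s = encoder(x_support, phi)               # apply the tuned encoder
    z_q = encoder(x_query, phi)
    A = z_s.T @ z_s + ridge * np.eye(z_s.shape[1])
    w = np.linalg.solve(A, z_s.T @ y_support)   # closed-form linear fit
    pred = z_q @ w
    return np.mean((pred - y_query) ** 2)       # MS error on the query set
```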

Backprop all the way back to θ

Truncated backprop is used through the unrolled inner loop.
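The numpy sketch has no automatic differentiation, so a central finite difference over a K-step truncated unroll stands in for "backprop all the way back to θ"; the actual method differentiates the unrolled computation exactly:

```python
def meta_gradient_fd(theta, x_unlab, x_s, y_s, x_q, y_q, phi0, K=5, eps=1e-4):
    """Finite-difference stand-in for the meta-gradient: unroll only K inner
    steps (the truncation), evaluate the MetaObjective, and differentiate
    with respect to the scalar meta-parameter theta."""
    def unrolled_objective(t):
        phi = inner_loop(t, x_unlab, phi0.copy(), steps=K)  # truncated unroll
        return meta_objective(phi, x_s, y_s, x_q, y_q)
    j_plus, j_minus = unrolled_objective(theta + eps), unrolled_objective(theta - eps)
    return (j_plus - j_minus) / (2.0 * eps)
```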

Page 28

Results

Training data: CIFAR-10 & ImageNet.

● Generalization over datasets.

● Generalization over domains.

● Generalization over network architectures.


Page 29

Results: Generalization over Datasets

What's going on?
- Evaluation of the unsupervised learning rule on different datasets.
- Comparison to other methods.


Page 30

Results: Generalization over Domains

What's going on?
- Evaluation of the unsupervised learning rule on 2-way text classification.
- 30h vs. 200h of meta-training.


Page 31

Results: Generalization over Networks

What's going on?
- Evaluation of the unsupervised learning rule on different network architectures.


Page 32

Critiques: Limitations

● Computationally expensive: 8 days on 512 workers.

● Many, many tricks.

● Lack of ablative analysis.

● Reproducibility: how many labeled examples? How many unlabeled?


Page 33

Critiques: Suggestions

● Ablative analysis.

● Implicit MAML?

● Investigate generalization to CNN and attention-based models.

● Better way to encode the learning rule? Is this architecture expressive enough?
