Silicon Valley AI Lab Deep Learning scaling is predictable (empirically) Greg Diamos December 9, 2017

Nov 18, 2021

Page 1: Deep Learning scaling is predictable (empirically)

Silicon Valley AI Lab

Deep Learning scaling is predictable (empirically)

Greg Diamos, December 9, 2017

Page 2: Deep Learning scaling is predictable (empirically)

AI

• AI is like electricity

Page 3: Deep Learning scaling is predictable (empirically)

Deep Learning scales

[Figure: Accuracy vs. Data + Model Size; curves: Deep Learning, Traditional methods]

Page 4: Deep Learning scaling is predictable (empirically)

Why?

• Why do deep neural networks scale so well?

• How much data do we need?

• How fast do computers need to be?

Page 5: Deep Learning scaling is predictable (empirically)

This talk: looking deeper

Page 6: Deep Learning scaling is predictable (empirically)

SVAIL’s ASIMOV supercomputer

• We used an 11 PFLOP/s GPU supercomputer to study deep learning scaling

• 1500 GPUs

• 2 months training time

• **This experiment would cost over $2 million USD if performed on AWS**
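As a rough sanity check on the cost figure, the sketch below turns the slide's numbers into GPU-hours and the price per GPU-hour at which the run would reach $2 million. It assumes "2 months" means about 60 days of continuous use of all 1500 GPUs; the slide does not say which AWS instances or prices were used, so no specific price is assumed.

```python
# Back-of-the-envelope arithmetic for the scaling experiment.
# Assumption: all 1500 GPUs run continuously for ~60 days ("2 months").
NUM_GPUS = 1500
DAYS = 60
HOURS_PER_DAY = 24
CLUSTER_PFLOPS = 11.0        # peak cluster throughput from the slide, PFLOP/s
STATED_COST_USD = 2_000_000  # "over $2 million" from the slide

gpu_hours = NUM_GPUS * DAYS * HOURS_PER_DAY
# Price per GPU-hour at which the run would cost exactly the stated amount.
break_even_price = STATED_COST_USD / gpu_hours
# Average peak throughput per GPU, in TFLOP/s.
tflops_per_gpu = CLUSTER_PFLOPS * 1000 / NUM_GPUS

print(f"GPU-hours: {gpu_hours:,}")
print(f"Break-even price: ${break_even_price:.2f} per GPU-hour")
print(f"Peak per GPU: {tflops_per_gpu:.1f} TFLOP/s")
```

Under these assumptions the run is about 2.16 million GPU-hours, so any cloud price above roughly $0.93 per GPU-hour puts it over $2 million.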

Page 7: Deep Learning scaling is predictable (empirically)

Application domains

• Speech Recognition

• Speech Synthesis

• Natural Language Understanding

• Computer Vision

Page 8: Deep Learning scaling is predictable (empirically)

State of the art neural nets

[Figure: architecture diagrams (labels: +, relu, weights, H, C, T); CONV + RNN, SPRECTRA NET, RECURRENT HIGHWAY NET, RNN + ATTENTION, RESNET]
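The "+ / relu / weights / weights" labels match a residual block, and the H, T and C labels match the gates of a highway layer (transform H, transform gate T, carry gate C). A minimal pure-Python sketch of both building blocks, assuming the common choice C = 1 − T from the original highway-network formulation; the helper names, sizes and weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def matvec(W, x):
    # Dense layer without bias: one output per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def residual_block(x, W1, W2):
    # y = x + W2 . relu(W1 . x): two weight layers plus an identity shortcut ("+").
    f = matvec(W2, relu(matvec(W1, x)))
    return [xi + fi for xi, fi in zip(x, f)]

def highway_layer(x, W_h, W_t):
    # y = H(x) * T(x) + x * C(x), assuming the carry gate C = 1 - T.
    h = relu(matvec(W_h, x))     # H: candidate transformation
    t = sigmoid(matvec(W_t, x))  # T: transform gate, values in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]

ident = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, -2.0]
print(residual_block(x, ident, ident))  # -> [2.0, -2.0]
print(highway_layer(x, ident, zeros))   # gate T = 0.5 everywhere -> [1.0, -1.0]
```

With zero gate weights the highway layer passes exactly half of the transformed signal and half of the input, which is the "carry" behavior the C gate provides.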

Page 9: Deep Learning scaling is predictable (empirically)

Methodology

More Data, Bigger Model
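One way to read "More Data, Bigger Model" as a procedure: grow the training set geometrically and, at each size, search over model sizes and keep the best validation error. The sketch below assumes that structure; `toy_validation_error` is a hypothetical stand-in for a full training run, with made-up constants, not a model of any result in the talk.

```python
def toy_validation_error(num_examples, num_params):
    # HYPOTHETICAL stand-in for "train this model on this data and measure
    # validation error"; in a real study each call is a full training run.
    underfit = 10.0 * num_params ** -0.5   # model too small
    overfit = num_params / num_examples    # too little data for the model
    return underfit + overfit + 0.05       # 0.05: hypothetical error floor

def sweep(data_sizes, model_sizes):
    # For each training-set size, keep the model size with the best error.
    results = []
    for m in data_sizes:
        best_p = min(model_sizes, key=lambda p: toy_validation_error(m, p))
        results.append((m, best_p, toy_validation_error(m, best_p)))
    return results

data_sizes = [2 ** k for k in range(10, 21, 2)]  # training sets grow geometrically
model_sizes = [2 ** k for k in range(4, 24)]     # candidate parameter counts
for m, p, err in sweep(data_sizes, model_sizes):
    print(f"examples={m:>8}  best params={p:>8}  val error={err:.4f}")
```

Only the best model at each data size is kept, because the curves of interest describe the best achievable error for a given amount of data.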

Page 10: Deep Learning scaling is predictable (empirically)

Generalization error scaling

Page 11: Deep Learning scaling is predictable (empirically)

Generalization error scaling

[Figure panels: Neural Language Model; Deep Speech]

Page 12: Deep Learning scaling is predictable (empirically)

Model size scaling data

Page 13: Deep Learning scaling is predictable (empirically)

Model size scaling data

[Figure panels: ResNet-50; Object Detection; Neural Language Model]

Page 14: Deep Learning scaling is predictable (empirically)

What do you think?

Page 15: Deep Learning scaling is predictable (empirically)

We find: generalization error scaling consistently follows a power law

[Figure: log(Error) vs. log(Data); labels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.)]
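On log-log axes, the power-law region of the learning curve is a straight line whose slope is the exponent. A minimal sketch, assuming the form error(m) = α·m^β + γ with β < 0 and the irreducible error γ known in advance (in practice γ must be estimated as well); all parameter values are hypothetical, not fits from the talk.

```python
import math

def power_law_error(m, alpha, beta, gamma):
    # Reducible part alpha * m**beta (beta < 0) plus irreducible error gamma.
    return alpha * m ** beta + gamma

def fit_power_law(data_sizes, errors, gamma):
    # Ordinary least squares on (log m, log(error - gamma)):
    # log(error - gamma) = log(alpha) + beta * log(m).
    xs = [math.log(m) for m in data_sizes]
    ys = [math.log(e - gamma) for e in errors]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(y_mean - beta * x_mean)
    return alpha, beta

# Synthetic, noise-free learning curve with hypothetical parameters.
TRUE_ALPHA, TRUE_BETA, GAMMA = 5.0, -0.35, 0.1
sizes = [2 ** k for k in range(10, 21)]
errors = [power_law_error(m, TRUE_ALPHA, TRUE_BETA, GAMMA) for m in sizes]
alpha_hat, beta_hat = fit_power_law(sizes, errors, GAMMA)
print(f"alpha={alpha_hat:.3f}, beta={beta_hat:.3f}")
```

Subtracting γ before taking logs matters: near the irreducible-error floor, log(error) alone flattens out and a straight-line fit would underestimate the slope.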

Page 16: Deep Learning scaling is predictable (empirically)

We find: model size scales sublinearly

[Figure: Best Model Size vs. Data; SOTA Models marked]
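Sublinear model-size scaling can be written as s(m) = σ·m^(β_p) with 0 < β_p < 1, where m is the number of training examples. A minimal sketch with hypothetical σ and β_p (not values from the talk), showing that the best model grows with data while parameters per example shrink.

```python
def best_model_size(num_examples, sigma, beta_p):
    # Power-law form s(m) = sigma * m**beta_p; sublinear when 0 < beta_p < 1.
    return sigma * num_examples ** beta_p

SIGMA, BETA_P = 2.0, 0.7  # hypothetical values
for m in (1e6, 1e7, 1e8):
    s = best_model_size(m, SIGMA, BETA_P)
    print(f"examples={m:.0e}  params={s:.3g}  params per example={s / m:.3g}")

# Doubling the data multiplies the best model size by 2**beta_p (~1.62 here).
growth_per_doubling = best_model_size(2e6, SIGMA, BETA_P) / best_model_size(1e6, SIGMA, BETA_P)
print(f"growth per data doubling: {growth_per_doubling:.2f}x")
```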

Page 17: Deep Learning scaling is predictable (empirically)

Acknowledgements

Page 18: Deep Learning scaling is predictable (empirically)

The Deep Learning Recipe

Page 19: Deep Learning scaling is predictable (empirically)

Data-limited problems

[Figure: log(Error) vs. log(Data); labels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.); annotation: Not Enough Data!]

Page 20: Deep Learning scaling is predictable (empirically)

Compute-limited problems

[Figure: log(Error) vs. log(Data); labels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.); annotation: It Takes Forever!]

Page 21: Deep Learning scaling is predictable (empirically)

Solved problems

[Figure: log(Error) vs. log(Data); labels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.), Acceptable Error]

Page 22: Deep Learning scaling is predictable (empirically)

Impossible problems

[Figure: log(Error) vs. log(Data); labels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.), Acceptable Error]
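The four problem types above can be told apart by inverting the power-law learning curve for the acceptable error. A minimal sketch, assuming error(m) = α·m^β + γ and a hypothetical `data_trainable` input (the largest dataset the compute budget can train on); the categories follow the slide titles, but the decision rule and all numbers are illustrative assumptions.

```python
from typing import Optional

def data_needed(target_error: float, alpha: float, beta: float, gamma: float) -> Optional[float]:
    """Invert error(m) = alpha * m**beta + gamma for m; None if target <= gamma."""
    if target_error <= gamma:
        return None  # at or below the irreducible error: no amount of data helps
    return ((target_error - gamma) / alpha) ** (1.0 / beta)

def classify(target_error, alpha, beta, gamma, data_available, data_trainable):
    """Sort a problem into one of the four categories on the slides.

    data_trainable is a HYPOTHETICAL input: the largest dataset we can train
    on within our compute budget.
    """
    m = data_needed(target_error, alpha, beta, gamma)
    if m is None:
        return "impossible"
    if m > data_available:
        return "data-limited"
    if m > data_trainable:
        return "compute-limited"
    return "solved"

PARAMS = dict(alpha=5.0, beta=-0.35, gamma=0.1)  # hypothetical learning curve
print(classify(0.05, **PARAMS, data_available=1e6, data_trainable=1e7))
print(classify(0.2, **PARAMS, data_available=1e4, data_trainable=1e7))
print(classify(0.2, **PARAMS, data_available=1e6, data_trainable=1e4))
print(classify(0.2, **PARAMS, data_available=1e6, data_trainable=1e7))
```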

Page 23: Deep Learning scaling is predictable (empirically)

Impossible problem

Page 24: Deep Learning scaling is predictable (empirically)

Implications

Page 25: Deep Learning scaling is predictable (empirically)

#1: Data is extremely valuable

• If all you need is scale, then we should invest in data

• How can we reduce the cost to collect and label data?
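The value of data follows directly from the power law: if the reducible error scales as m^β, cutting it by a factor r requires r^(1/β) times more data. A minimal sketch with illustrative exponents (not fits reported on this slide), computing the data multiplier needed to halve the reducible error.

```python
def data_multiplier_for_error_reduction(beta, reduction):
    # Reducible error scales as m**beta (beta < 0). Solving
    # (k * m)**beta = reduction * m**beta gives k = reduction**(1 / beta).
    return reduction ** (1.0 / beta)

# Illustrative exponents only.
for beta in (-0.07, -0.35, -0.5):
    k = data_multiplier_for_error_reduction(beta, 0.5)
    print(f"beta={beta}: halving the reducible error needs {k:,.1f}x more data")
```

The shallower the slope, the more data each improvement costs, which is why lowering the cost of collecting and labeling data pays off.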

Page 26: Deep Learning scaling is predictable (empirically)

#2: Achievable error follows Moore’s Law

[Figure: log(Error) vs. log(Data), with a second axis log(Computer Speed); labels: Random Guessing, Irreducible Error (model bias, Bayes Error, etc.), Acceptable Error]

Page 27: Deep Learning scaling is predictable (empirically)

#2: Achievable error follows Moore’s Law: Supporting Evidence

[Figure: log(Computer Speed) vs. Time; source: http://cpudb.stanford.edu/]

Page 28: Deep Learning scaling is predictable (empirically)

#2: Achievable error follows Moore’s Law

[Figure: log(Achievable Error) vs. Time; labels: Random Guessing, Irreducible Error (model bias, Bayes Error, etc.)]
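The Moore's Law argument can be sketched under one simplifying assumption: the largest dataset we can train on grows in step with computer speed, which doubles every fixed number of years. Then log(reducible error) falls linearly in time with slope β·ln 2 / T_double until it approaches the irreducible error. All parameter values below are hypothetical.

```python
import math

def achievable_error(t_years, alpha, beta, gamma, m0, doubling_years):
    # Assumption: the largest trainable dataset grows in step with computer
    # speed, doubling every `doubling_years` years (a Moore's-law-style trend).
    m = m0 * 2 ** (t_years / doubling_years)
    return alpha * m ** beta + gamma

# Hypothetical parameters, not fits from the talk.
ALPHA, BETA, GAMMA = 50.0, -0.35, 0.05
M0, DOUBLING_YEARS = 1e6, 2.0

for year in range(0, 21, 5):
    err = achievable_error(year, ALPHA, BETA, GAMMA, M0, DOUBLING_YEARS)
    print(f"year {year:>2}: error {err:.4f}, log(reducible) {math.log(err - GAMMA):.3f}")

# The log of the reducible error falls by a constant amount per year.
slope_per_year = BETA * math.log(2) / DOUBLING_YEARS
print(f"slope of log(reducible error) per year: {slope_per_year:.4f}")
```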

Page 29: Deep Learning scaling is predictable (empirically)

#3: Requirements are predictable

• We can now predict:
  • How much data we need
  • How fast computers need to be
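Combining the two power laws gives a rough compute requirement. A minimal sketch, assuming training compute is proportional to (number of examples) × (parameters) and that the best model size grows as data^(β_p), so compute grows as data^(1 + β_p). The 6 FLOPs per parameter per example is a rule-of-thumb assumption for a forward and backward pass of a dense network (sequence models also scale with sequence length), and the throughput and sizes are hypothetical.

```python
def compute_multiplier(data_multiplier, beta_p):
    # Assumption: compute ~ examples * params and params ~ examples**beta_p,
    # so compute ~ examples**(1 + beta_p).
    return data_multiplier ** (1.0 + beta_p)

def training_days(num_examples, num_params, epochs, sustained_flops_per_s,
                  flops_per_param_per_example=6.0):
    # flops_per_param_per_example=6 is a rule-of-thumb ASSUMPTION for a dense
    # network's forward plus backward pass, not a figure from the talk.
    total_flops = flops_per_param_per_example * num_params * num_examples * epochs
    return total_flops / sustained_flops_per_s / 86_400

print(f"10x data -> {compute_multiplier(10, 0.7):.1f}x compute")  # hypothetical beta_p
days = training_days(num_examples=1e9, num_params=1e9, epochs=20,
                     sustained_flops_per_s=1e15)  # hypothetical sustained 1 PFLOP/s
print(f"~{days:.2f} days at a sustained 1 PFLOP/s")
```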

Page 30: Deep Learning scaling is predictable (empirically)

#4: Model architecture search

• Search may be feasible in the small-data regime
  • if architecture affects the intercept, not the slope

• Caveats:
  • variance
  • models with different irreducible error
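If two architectures share the power-law slope and differ only in the intercept α, their ranking on small data carries over to large data. If their irreducible errors differ, the ranking can flip, which is the second caveat above. A toy illustration with three hypothetical architectures (not measurements from the talk).

```python
def error(m, alpha, beta, gamma):
    # Power-law learning curve: reducible part alpha * m**beta plus floor gamma.
    return alpha * m ** beta + gamma

# Hypothetical architectures.
arch_a = dict(alpha=4.0, beta=-0.35, gamma=0.10)
arch_b = dict(alpha=6.0, beta=-0.35, gamma=0.10)  # same slope, worse intercept
arch_c = dict(alpha=3.0, beta=-0.35, gamma=0.15)  # better intercept, higher floor

SMALL, LARGE = 1e3, 1e9
for name, arch in (("A", arch_a), ("B", arch_b), ("C", arch_c)):
    print(f"{name}: small-data error={error(SMALL, **arch):.4f}  "
          f"large-data error={error(LARGE, **arch):.4f}")
```

Here A beats B at both scales, so a small-data search would pick correctly between them; C looks best on small data but loses to A at scale because of its higher floor.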

Page 31: Deep Learning scaling is predictable (empirically)

We need you!

Page 32: Deep Learning scaling is predictable (empirically)

Reproduce our work

[Figure: architecture diagrams (labels: +, relu, weights, H, C, T); SPRECTRA NET, RECURRENT HIGHWAY NET, RNN + ATTENTION, RESNET, ?]

Page 33: Deep Learning scaling is predictable (empirically)

Build AI Data Centers

• AI Node (2017): 1x

• AI Data Center (2025): 10,000x to 100,000x

• Improved AI Chips (2025): 10x to 100x

Page 34: Deep Learning scaling is predictable (empirically)

Join Us!

• http://bit.ly/join-svail

Page 35: Deep Learning scaling is predictable (empirically)

Deep Learning scaling is predictable (empirically)

http://research.baidu.com/deep-learning-scaling-predictable-empirically/

https://arxiv.org/abs/1712.00409

Greg Diamos, December 9, 2017