Tuning the Untunable: Techniques for Accelerating Deep Learning Optimization
Talk ID: S9313

May 28, 2020
Transcript
Page 1

Tuning the Untunable: Techniques for Accelerating Deep Learning Optimization

Talk ID: S9313

Page 2

How I got here: 10+ years of tuning models

Page 3

SigOpt is an experimentation and optimization platform

[Diagram: where SigOpt sits in the ML workflow]
● Data Preparation: Transformation, Labeling, Pre-Processing, Pipeline Dev., Feature Eng., Feature Stores
● Experimentation, Training, Evaluation: Notebook, Library, Framework; Experimentation & Model Optimization covers Insights, Tracking, Collaboration; Model Search, Hyperparameter Tuning; and Resource Scheduler, Management
● Model Deployment: Validation, Serving, Deploying, Monitoring, Managing, Inference, Online Testing
● Hardware Environment: On-Premise, Hybrid, Multi-Cloud

Page 4

Experimentation drives better results

● Data and models stay private
● Iterative, automated optimization
● Built specifically for scalable enterprise use cases

[Diagram: optimization loop over the REST API, where Training Data feeds the AI/ML Model, Model Evaluation on Testing Data produces an Objective Metric, and SigOpt returns New Configurations that lead to Better Results]

EXPERIMENT INSIGHTS: Organize and introspect experiments
OPTIMIZATION ENSEMBLE: Explore and exploit with a variety of techniques
ENTERPRISE PLATFORM: Built to scale with your models in production

Page 5

Previous Work: Tuning CNNs for Competing Objectives

Takeaway: Real-world problems have trade-offs; proper tuning maximizes impact

https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/

Page 6

Previous Work: Tuning Survey on NLP CNNs

Takeaway: Hardware speedups and tuning efficiency speedups are multiplicative

https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/

Page 7

Previous Work: Tuning MemN2N for QA Systems

Takeaway: Tuning impact grows for models with complex, dependent parameter spaces

https://devblogs.nvidia.com/optimizing-end-to-end-memory-networks-using-sigopt-gpus/

Page 8

Takeaway: Real-world applications require specialized experimentation and optimization tools

● Multiple metrics
● Jointly tuning architecture + hyperparameters
● Complex, dependent spaces
● Long training cycles

sigopt.com/blog

Page 9

How do you more efficiently tune models that take a long time to train?

Page 10

AlexNet to AlphaGo Zero: A 300,000x Increase in Compute

[Chart: training compute in petaflop/s-days (log scale, 0.00001 to 10,000) by year, 2012 to 2019. Milestones include AlexNet, Dropout, Visualizing and Understanding Conv Nets, DQN, VGG, Seq2Seq, GoogleNet, DeepSpeech2, ResNets, Xception, Neural Architecture Search, Neural Machine Translation, AlphaGo Zero, AlphaZero, and TI7 Dota 1v1.]

Page 11

[Image panels: Speech Recognition, Deep Reinforcement Learning, Computer Vision]

Page 12

Hardware can help

Page 13

[Chart: tuning acceleration gain vs. level of effort for a modeler to build]

● Parallel Tuning: gains mostly proportional to distributed tuning width
● Tuning Method: Bayesian can drive 10x+ acceleration over random
● Tuning Technique: multitask and early termination can reduce tuning time by 30%+ (today's focus)

Page 14

Start with a simple idea: we can use information about "partially trained" models to more efficiently inform hyperparameter tuning

Page 15

Previous Work: Hyperband / Early Termination

Random search, but stop poor performers early at a grid of checkpoints. Converges to traditional random search quickly.

https://www.automl.org/blog_bohb/ and Li et al., https://openreview.net/pdf?id=ry18Ww5ee
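To make the early-termination idea concrete, here is a minimal successive-halving sketch (the routine Hyperband runs at several aggressiveness settings). The `sample_config` and `train_and_eval` functions are hypothetical placeholders supplied by the caller, not anything from the talk:

```python
def successive_halving(sample_config, train_and_eval,
                       n_configs=27, min_budget=1, eta=3):
    """Minimal sketch of successive halving, the core of Hyperband.

    sample_config: () -> dict, draws one random hyperparameter config.
    train_and_eval: (config, budget) -> float, validation score after
    training `config` for `budget` units (e.g., epochs).
    """
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Evaluate every surviving config at the current budget.
        scored = sorted(configs,
                        key=lambda c: train_and_eval(c, budget),
                        reverse=True)
        # Keep the top 1/eta; survivors get eta times the budget next round.
        configs = scored[:max(1, len(scored) // eta)]
        budget *= eta
    return configs[0]
```

In practice each round resumes training from a checkpoint rather than starting over, which is where the compute savings come from.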

Page 16

Building on prior research related to successive halving and Bayesian techniques, Multitask samples lower-cost tasks to inexpensively learn about the model and accelerate full Bayesian Optimization.

Swersky, Snoek, and Adams, "Multi-Task Bayesian Optimization", http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf

Page 17

Visualizing Multitask: Learning from Approximation

[Figure: partial-cost vs. full-cost task evaluations]

Source: Klein et al., https://arxiv.org/pdf/1605.07079.pdf

Page 18

"Cheap approximations promise a route to tractability, but bias and noise complicate their use. An unknown bias arises whenever a computational model incompletely models a real-world phenomenon, and is pervasive in applications."

Poloczek, Wang, and Frazier, "Multi-Information Source Optimization", https://papers.nips.cc/paper/7016-multi-information-source-optimization.pdf

Page 19

Visualizing Multitask: Power of Correlated Approximation Functions

Source: Swersky et al., http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf

Page 20

Why multitask optimization?

Page 21

Case: Putting Multitask Optimization to the Test

Goal: Benchmark the performance of multitask and early-termination methods

Model: SVM
Datasets: Covertype, Vehicle, MNIST

Methods:
● Multitask Enhanced (Fabolas)
● Multitask Basic (MTBO)
● Early Termination (Hyperband)
● Baseline 1 (Expected Improvement)
● Baseline 2 (Entropy Search)

Source: Klein et al., https://arxiv.org/pdf/1605.07079.pdf

Page 22

Result: Multitask Outperforms Other Methods

[Figure: benchmark results reproduced from the paper]

Source: Klein et al., https://arxiv.org/pdf/1605.07079.pdf

Page 23

Case study

Can we accelerate optimization and improve performance on a prevalent deep learning use case?

Page 24

Case: Cars Image Classification

Stanford Dataset: https://ai.stanford.edu/~jkrause/cars/car_dataset.html

16,185 images, 196 classes
Labels: Make, Model, Year

Page 25

ResNet: A powerful tool for image classification

Page 26

Experiment Scenarios: Architecture Comparison and Model Tuning Impact Analysis

● ResNet 50: Scenario 1a (Baseline): pre-train on ImageNet, tune the fully connected layer. Scenario 1b (SigOpt Multitask): optimize hyperparameters to tune the fully connected layer.
● ResNet 18: Scenario 2a (Baseline): fine-tune the full network. Scenario 2b (SigOpt Multitask): optimize hyperparameters to fine-tune the full network.
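A rough sketch of how the two scenario families map onto standard torchvision transfer learning; the talk's actual code lives in the GitHub repo linked on the final slide, so treat this as an illustration (written against the older `pretrained=True` torchvision API):

```python
import torch.nn as nn
from torchvision import models

def build_model(scenario, num_classes=196):
    """Illustrative setup for the two scenario families on Stanford Cars."""
    if scenario == 1:
        # Scenario 1 (ResNet 50): freeze the ImageNet-pretrained backbone
        # and train only a newly attached fully connected layer.
        model = models.resnet50(pretrained=True)
        for param in model.parameters():
            param.requires_grad = False
    else:
        # Scenario 2 (ResNet 18): fine-tune the whole network end to end.
        model = models.resnet18(pretrained=True)
    # Either way, replace the head to match the 196 car classes; the new
    # layer's parameters are trainable by default.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```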

Page 27

Hyperparameter setup

Hyperparameter            Lower Bound   Upper Bound   Categorical Values   Transformation
Learning Rate             1.2e-4        1.0           -                    log
Learning Rate Scheduler   0             0.99          -                    -
Batch Size                16            256           -                    Powers of 2
Nesterov                  -             -             True, False          -
Weight Decay              1.2e-5        1.0           -                    log
Momentum                  0             0.9           -                    -
Scheduler Step            1             20            -                    -
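A sketch of this tuning space expressed with the SigOpt Python client (see the multitask docs linked on the final slide). The field names follow the public API, but the token, experiment name, and the specific task costs here are assumptions, and the powers-of-2 batch size is handled by tuning the exponent:

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token

experiment = conn.experiments().create(
    name="ResNet fine-tuning on Stanford Cars",  # assumed name
    parameters=[
        dict(name="learning_rate", type="double",
             bounds=dict(min=1.2e-4, max=1.0), transformation="log"),
        dict(name="learning_rate_scheduler", type="double",
             bounds=dict(min=0.0, max=0.99)),
        # Batch size 16..256 in powers of 2: tune the exponent 4..8
        # and use 2 ** log2_batch_size in the training code.
        dict(name="log2_batch_size", type="int",
             bounds=dict(min=4, max=8)),
        dict(name="nesterov", type="categorical",
             categorical_values=[dict(name="True"), dict(name="False")]),
        dict(name="weight_decay", type="double",
             bounds=dict(min=1.2e-5, max=1.0), transformation="log"),
        dict(name="momentum", type="double",
             bounds=dict(min=0.0, max=0.9)),
        dict(name="scheduler_step", type="int",
             bounds=dict(min=1, max=20)),
    ],
    # Multitask: lower-cost tasks train for a fraction of the full budget
    # (these three costs are illustrative, not the talk's exact settings).
    tasks=[dict(name="full", cost=1.0),
           dict(name="half", cost=0.5),
           dict(name="quarter", cost=0.25)],
    observation_budget=220,  # matches the 220 observations per experiment
)
print("Created experiment:", experiment.id)
```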

Page 28

Results: Optimizing and tuning the full network outperforms

● ResNet 50 (tune fully connected layer only): Scenario 1a (Baseline) 46.41%; Scenario 1b (SigOpt Multitask) 47.99% (+1.58%)
● ResNet 18 (fine-tune full network): Scenario 2a (Baseline) 83.41%; Scenario 2b (SigOpt Multitask) 87.33% (+3.92%)

Two callouts from the slide: there is opportunity for hyperparameter optimization to impact performance, and fully tuning the network outperforms tuning only the head.

Page 29

Insight: Multitask improved optimization efficiency

Example: Cost allocation and accuracy over time

[Charts: low-cost tasks are sampled heavily at the beginning, and they inform the full-cost task to drive accuracy over time]
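Continuing the hypothetical client sketch from the hyperparameter setup slide, that cost allocation falls out of a loop like this, where the suggested task's cost scales the training budget (`train_model` and `MAX_EPOCHS` are placeholders):

```python
MAX_EPOCHS = 50  # assumed full-cost training budget

for _ in range(experiment.observation_budget):
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # A cost of 0.25 means training for a quarter of the full budget,
    # so early low-cost observations are cheap probes of the space.
    epochs = max(1, round(suggestion.task.cost * MAX_EPOCHS))
    accuracy = train_model(suggestion.assignments, epochs=epochs)  # placeholder
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(value=accuracy)],
    )
```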

Page 30

Insight: Multitask efficiency at the hyperparameter level

Example: Learning rate accuracy and values by cost of task over time

[Charts: progression of observations over time; accuracy and value for each observation; parameter importance analysis]

Page 31

Insight: Optimization improves real-world outcomes

Example: Misclassifications by the baseline that were accurately classified by the optimized model

● Partial images: predicted Chrysler 300, actual Scion xD
● Name and design should help: predicted Chevy Monte Carlo, actual Lamborghini
● Busy images: predicted smart fortwo, actual Dodge Sprinter
● Multiple cars: predicted Nissan Hatchback, actual Chevy Sedan

Page 32

Insight: Parallelization further accelerates wall-clock time

● 928 total hours to optimize ResNet 18
● 220 observations per experiment
● 20 p2.xlarge AWS EC2 instances
● 45 hours actual wall-clock time
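The parallel speedup needs no special coordination code: each machine runs the same suggest/observe loop, and the optimizer serves each open suggestion to exactly one worker. A minimal sketch, with local processes standing in for the 20 EC2 instances and a stub in place of the real training job:

```python
from multiprocessing import Process
from sigopt import Connection

def train_model(assignments):
    return 0.0  # stand-in for the real training run

def run_worker(experiment_id, api_token, n_observations):
    """One tuning worker; run many of these concurrently."""
    conn = Connection(client_token=api_token)
    for _ in range(n_observations):
        suggestion = conn.experiments(experiment_id).suggestions().create()
        accuracy = train_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id, values=[dict(value=accuracy)])

if __name__ == "__main__":
    # 220 observations spread across 20 workers, 11 each in this sketch.
    workers = [Process(target=run_worker,
                       args=("EXPERIMENT_ID", "YOUR_API_TOKEN", 11))
               for _ in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```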

Page 33

Implication: Multiple benefits from multitask

Cost efficiency:

                         Multitask   Bayesian   Random
  Hours per training     4.2         4.2        4.2
  Observations           220         646        646
  Number of runs         1           1          20
  Total compute hours    924         2,713      54,264
  Cost per GPU-hour      $0.90       $0.90      $0.90
  Total compute cost     $832        $2,442     $48,838

Time to optimize:

                         Multitask   Bayesian   Random
  Total compute hours    924         2,713      54,264
  Number of machines     20          20         20
  Wall-clock time (hrs)  46          136        2,713

1.7% the cost of random search to achieve similar performance.
58x faster wall-clock time to optimize with multitask than with random search.
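The headline figures follow directly from the table; a quick check of the arithmetic:

```python
hours_per_training = 4.2
cost_per_gpu_hour = 0.90
machines = 20

# Multitask: 220 observations in a single run.
multitask_hours = 220 * hours_per_training           # 924 compute hours
multitask_cost = multitask_hours * cost_per_gpu_hour # ~$832
multitask_wall = multitask_hours / machines          # ~46 hours

# Random search: 646 observations per run, 20 runs.
random_hours = 646 * hours_per_training * 20         # ~54,264 compute hours
random_cost = random_hours * cost_per_gpu_hour       # ~$48,838
random_wall = random_hours / machines                # ~2,713 hours

print(f"{multitask_cost / random_cost:.1%} of random search's cost")  # ~1.7%
print(f"{random_wall / multitask_wall:.0f}x faster wall-clock")       # ~59x (slide rounds to 58x)
```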

Page 34

Impact of efficient tuning grows with model complexity

Page 35

Summary

● Optimizing particularly expensive models is a tough challenge
● Hardware is part of the solution, as is adding width to your experiment
● Algorithmic solutions offer compelling ways to further accelerate
● These solutions typically improve model performance and wall-clock time

Page 36

Thank you!

Learn more about Multitask Optimization: https://app.sigopt.com/docs/overview/multitask

Free access for Academics & Nonprofits: https://sigopt.com/edu

Solution-oriented program for the Enterprise: https://sigopt.com/pricing

Leading applied optimization research: https://sigopt.com/research

GitHub repo for this use case: https://github.com/sigopt/sigopt-examples/tree/master/stanford-car-classification

… and we're hiring! https://sigopt.com/careers