

DEPARTMENT OF HEALTH AND HUMAN SERVICES • National Institutes of Health • National Cancer Institute
Frederick National Laboratory is a federally funded research and development center operated by Leidos Biomedical Research, Inc., for the National Cancer Institute.

NIH.AI Workshop on Hyperparameters Optimization

Intro to hyperparameter sweep techniques
George Zaki, Ph.D. [C]
BIDS, Frederick National Lab for Cancer Research (FNLCR)

July 18, 2019

Model’s Parameters

• Are fit during training
• Are the result of model fitting or training
• Are also optimized during training

• Examples:
  – Slope and intercept in linear modeling
  – Weights and biases in neural nets

What are Hyperparameters?

• Parameters of your system with no straightforward method for setting their values:
  – Usually set before the learning process
  – Not directly estimated from the data

deepai.org

Examples of Hyperparameters

• The depth of a decision tree
• Number of trees in a forest
• Number of hidden layers and neurons in a neural network
• Degree of regularization to prevent overfitting
• K in K-means
• Learning rate schedule in Stochastic Gradient Descent (SGD)
• Activation functions
• …
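To make the parameter/hyperparameter distinction concrete, here is a minimal scikit-learn sketch (not from the slides; the data and the chosen values are illustrative): hyperparameters are fixed before training, while the model's parameters are produced by fitting.

```python
# Minimal sketch (illustrative values): hyperparameters vs. fitted parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters: set by the user *before* training.
model = RandomForestClassifier(
    n_estimators=100,  # number of trees in the forest
    max_depth=4,       # depth of each decision tree
)
model.fit(X, y)

# Parameters: the result of training (here, the fitted trees themselves).
print(len(model.estimators_))  # 100 fitted trees
```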


Generalized Machine Learning Workflow

https://github.com/ECP-CANDLE/Tutorials/tree/master/2019/ECP

What is Hyperparameter Optimization?

• Models have a large number of possible configuration parameters, called hyperparameters

• Applying optimization can automate part of the design of machine learning models

• Involves two problems:
  – How to set the values of the hyperparameters?
  – How to manage multiple evaluations on compute resources?

Hyperparameter Optimization (tuning) = HPO

Generalized HPO Diagram

https://sigopt.com/blog/common-problems-in-hyperparameter-optimization/

Basic HPO Strategies

• Grid search:
  – Generate all possible combinatorial configurations
  – Example: 6 hyperparameters, each with 4 values, gives 4^6 = 4096 configurations (see the sketch after this list)

• Random search:
  – Randomly select some configurations to evaluate

• Sequential grid search:
  – Sequentially adjust one hyperparameter at a time, while fixing all other hyperparameters

• Generic optimization:
  – Evolutionary algorithms (simulated annealing, particle swarm, genetic algorithms)
  – Bayesian optimization
  – Gradient-based optimization
  – Model-based optimization (mlrMBO in R)
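As a rough sketch of the grid-search combinatorics mentioned above (6 hyperparameters with 4 values each), the code below (placeholder names and values, not from the slides) enumerates the full grid and then draws a random subset of it, which is essentially what random search evaluates.

```python
# Sketch of a 6-hyperparameter, 4-value grid (placeholder names and values).
import random
from itertools import product

search_space = {f"hp{i}": [1, 2, 3, 4] for i in range(6)}  # 6 hyperparameters x 4 values each

# Grid search: enumerate every combinatorial configuration.
grid = [dict(zip(search_space, values)) for values in product(*search_space.values())]
print(len(grid))  # 4**6 = 4096 configurations

# Random search: evaluate only a randomly chosen subset of the configurations.
subset = random.sample(grid, k=50)
```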

U-Net Hyperparameters Example: 288 possible configurations

• Only 2 levels of max-pooling? N_layers = {2, 3, 4, 5}
• How many convolution filters? Num_filters = {16, 32, 64}
• What is the activation function? Activation = {relu, softmax, tanh}
• Size of the convolution filter? Filter_size = {3x3, 5x5}
• Drop out some results to avoid overfitting? Drop_out = {0, 0.2, 0.4, 0.6, 0.8}
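As a quick illustration (the dictionary layout and the sampling code are assumptions; the value sets are taken from the slide), the U-Net search space above can be written as a Python dict, and a single candidate configuration drawn from it:

```python
# U-Net search space from the slide, written as a Python dict (layout assumed).
import random

unet_space = {
    "N_layers": [2, 3, 4, 5],
    "Num_filters": [16, 32, 64],
    "Activation": ["relu", "softmax", "tanh"],
    "Filter_size": [(3, 3), (5, 5)],
    "Drop_out": [0, 0.2, 0.4, 0.6, 0.8],
}

# One candidate configuration, drawn at random from each value set.
config = {name: random.choice(values) for name, values in unet_space.items()}
print(config)  # e.g. {'N_layers': 3, 'Num_filters': 32, 'Activation': 'relu', ...}
```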

Effect of Hyperparameter Sweep on the Objective Function

(Plot: DICE value versus configuration ID across the sweep; Venn diagram of the ground-truth and predicted masks showing their intersection and union)

DICE = 2 × |Ground Truth ∩ Predicted| / (|Ground Truth| + |Predicted|)
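A small NumPy sketch of the DICE coefficient between a ground-truth mask and a predicted mask (the toy masks are illustrative; the formula is the standard definition):

```python
# DICE coefficient for binary segmentation masks (standard definition).
import numpy as np

ground_truth = np.array([[0, 1, 1],
                         [0, 1, 0]], dtype=bool)
predicted    = np.array([[0, 1, 0],
                         [1, 1, 0]], dtype=bool)

intersection = np.logical_and(ground_truth, predicted).sum()
dice = 2.0 * intersection / (ground_truth.sum() + predicted.sum())
print(dice)  # 2*2 / (3 + 3) = 0.666...
```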


Baseline Methods: Grid Search & Random Search

• Grid search:
  – Embarrassingly parallel
  – Curse of dimensionality
• Random search:
  – Embarrassingly parallel
  – Does not learn from history
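A hedged scikit-learn sketch of both baselines on a toy problem (the estimator, parameter grid, and data below are illustrative assumptions, not from the slides):

```python
# Grid search and random search with scikit-learn (illustrative setup).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8, None]}

# Grid search: all 3 x 4 = 12 configurations, each cross-validated.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random search: only n_iter configurations, sampled from the same space.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=6, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```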


Bayesian Optimization

1. Initially select random configurations to evaluate

2. Build a surrogate Gaussian process as an approximation of the objective function, based on the evaluations seen so far (posterior distribution)

3. Select good configurations to evaluate next based on an acquisition function computed from the surrogate of your real objective

4. Balance exploration versus exploitation

5. Repeat steps 2-4 until you reach your compute budget
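A minimal sketch of this loop using scikit-optimize's Gaussian-process optimizer gp_minimize (scikit-optimize appears later in the HPO packages list); the search space and the toy objective below are stand-ins for training and scoring a real model:

```python
# Bayesian optimization sketch with scikit-optimize. The objective is a toy
# stand-in for "train the model with these hyperparameters, return the loss".
from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real

space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(2, 5, name="n_layers"),
    Categorical(["relu", "tanh"], name="activation"),
]

def objective(params):
    learning_rate, n_layers, activation = params
    # Placeholder: in practice, train and evaluate the model here.
    return (learning_rate - 0.01) ** 2 + 0.1 * n_layers + (0.0 if activation == "relu" else 0.05)

result = gp_minimize(
    objective,
    space,
    n_calls=30,           # total compute budget (step 5)
    n_initial_points=10,  # random configurations to seed the surrogate (step 1)
    acq_func="EI",        # acquisition function: expected improvement (step 3)
    random_state=0,
)
print(result.x, result.fun)  # best configuration found and its objective value
```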

Figure: Gaussian process approximation of the objective function, from Brochu, Cora, and de Freitas (2010)


Random versus Bayesian

https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f


HPO packages

• Python:
  – Hyperopt
  – scikit-optimize
  – Spearmint
• R:
  – mlrMBO
• Cloud:
  – Google’s Hypertune
  – Amazon’s SageMaker
• NN hyperparameter-specific optimization:
  – NEAT, Optunity, …
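For example, a hedged sketch with Hyperopt, the first Python package listed above (the search space and objective are placeholders for a real training run):

```python
# Hyperopt sketch: Tree-structured Parzen Estimator (TPE) search over a toy space.
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

space = {
    "learning_rate": hp.loguniform("learning_rate", -9, -2),  # e^-9 .. e^-2
    "dropout": hp.uniform("dropout", 0.0, 0.8),
    "activation": hp.choice("activation", ["relu", "tanh"]),
}

def objective(params):
    # Placeholder for training/validating a model with `params`.
    loss = (params["learning_rate"] - 0.01) ** 2 + params["dropout"] * 0.1
    return {"loss": loss, "status": STATUS_OK}

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)  # best hyperparameter values found within the 50-evaluation budget
```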


HPO and High Performance Computing (HPC)

• HPO requires a good amount of compute resources
• HPC is used to manage large-scale training runs:
  – Hyperparameter searches: O(10^4) jobs
  – Cross validation (5-fold, 10-fold, etc.)
• Each job could use 10s to 100s of nodes
• At NIH, we can use the Biowulf HPC cluster to perform these evaluations
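A rough sketch of how such a sweep might be dispatched on a cluster like Biowulf: write one training command per configuration into a command file and submit it with the cluster's batch tooling (Biowulf's swarm utility is one option). The script name, command-line flags, and file layout below are assumptions.

```python
# Sketch: write one command line per hyperparameter configuration, to be
# submitted as a job swarm/array on an HPC cluster. `train_unet.py` and its
# command-line flags are hypothetical.
from itertools import product

search_space = {
    "n_layers": [2, 3, 4, 5],
    "num_filters": [16, 32, 64],
    "drop_out": [0, 0.2, 0.4],
}

with open("hpo_sweep.swarm", "w") as f:
    for values in product(*search_space.values()):
        config = dict(zip(search_space, values))
        flags = " ".join(f"--{name} {value}" for name, value in config.items())
        f.write(f"python train_unet.py {flags}\n")

# Each line becomes an independent job, e.g. on Biowulf:
#   swarm -f hpo_sweep.swarm -g 16 -t 8 --partition=gpu
```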

Survey

• Please use the following link to share your thoughts about the workshop:

https://bit.ly/2JPagbe


References

• https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf

• https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-cloud-machine-learning-engine-using-bayesian-optimization

• https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html

• https://roamanalytics.com/2016/09/15/optimizing-the-hyperparameter-of-which-hyperparameter-optimizer-to-use/

• https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters
