HYPERNOMAD: Hyperparameter optimization of deep neural networks using mesh adaptive direct search

Sébastien Le Digabel, Dounia Lakhmiri, Christophe Tribes

CORS 2019, 2019-05-28
Presentation outline
Blackbox optimization
The MADS algorithm with categorical variables
Hyperparameters Optimization (HPO)
Computational experiments
Discussion
Blackbox optimization
Blackbox optimization (BBO) problems
- Optimization problem:

      min_{x ∈ Ω} f(x)

- Evaluations of f (the objective function) and of the functions defining Ω are usually the result of a computer code (a blackbox).
- Variables are typically continuous, but in this work, some of them are discrete: integers or categorical variables.
Blackbox optimization

We consider

    min_{x ∈ Ω} f(x)

where the evaluations of f and the functions defining Ω are the result of a computer simulation (a blackbox).

[Diagram: x ∈ Rⁿ → blackbox → f(x), and whether x ∈ Ω.]

- Each call to the simulation may be expensive.
- The simulation can fail.
- Sometimes f(x) ≠ f(x): two evaluations at the same point may return different values.
- Derivatives are not available and cannot be approximated.
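These properties can be mimicked with a toy blackbox. Everything below (the quadratic objective, the failure region, the noise level, the name `blackbox`) is an illustrative assumption, not part of any real simulation:

```python
import random

def blackbox(x):
    """Hypothetical expensive simulation: returns f(x), or None on failure."""
    if any(abs(xi) > 10 for xi in x):        # the simulation can fail
        return None
    noise = random.gauss(0.0, 1e-3)          # repeated calls may disagree: f(x) != f(x)
    return sum(xi ** 2 for xi in x) + noise  # only the value is exposed, no derivatives

value = blackbox([1.0, 2.0])
```

An optimizer driving this function sees nothing but the returned value (or the failure), which is exactly the setting MADS is designed for.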
The MADS algorithm with categorical variables
General framework

[Diagram: the algorithm proposes a trial point x; the blackbox returns f(x) and whether x ∈ Ω.]
Mesh Adaptive Direct Search (MADS) in Rⁿ

- [Audet and Dennis, Jr., 2006].
- Iterative algorithm that evaluates the blackbox at some trial points on a spatial discretization called the mesh.
- One iteration = search and poll.
- The search allows trial points generated anywhere on the mesh.
- The poll consists in generating a list of trial points constructed from poll directions. These directions grow dense.
- At the end of the iteration, the mesh size is reduced if no new success point is found.
- Algorithm backed by a convergence analysis.
[0] Initializations: x0, ∆0 (initial poll size)
[1] Iteration k
    Let δk ≤ ∆k be the mesh size parameter.
    Search: test a finite number of mesh points.
    Poll (if the Search failed):
        construct a set of directions Dk;
        test the poll set Pk = { xk + δk d : d ∈ Dk }, with ‖δk d‖ ≈ ∆k.
[2] Updates
    If success: xk+1 ← success point; increase ∆k.
    Else: xk+1 ← xk; decrease ∆k.
    k ← k + 1; stop if ∆k ≤ ∆min, otherwise go to [1].
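The loop above can be sketched as follows. This is a deliberately simplified, poll-only version: it polls along fixed coordinate directions and uses a single step size (δk = ∆k), whereas true MADS draws poll directions that grow dense and keeps a finer mesh size δk ≤ ∆k:

```python
def mads_poll_sketch(f, x0, delta0=1.0, delta_min=1e-6):
    """Poll-only direct search in R^n; a coordinate-search sketch of MADS."""
    x, delta = list(x0), delta0
    fx = f(x)
    while delta > delta_min:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):      # poll set: x_k +/- delta * e_i
                trial = list(x)
                trial[i] += sign * delta
                if f(trial) < fx:         # success: move to the better point
                    x, fx, improved = trial, f(trial), True
                    break
            if improved:
                break
        # update step [2]: enlarge the step on success, shrink it on failure
        delta = delta * 2.0 if improved else delta / 2.0
    return x, fx

best_x, best_f = mads_poll_sketch(lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2, [0.0, 0.0])
```

Note that `f(trial)` is evaluated twice on success for brevity; with an expensive blackbox one would of course cache the value.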
Poll illustration (successive fails and mesh shrinks)

[Figure, three frames of the poll around xk. Frame 1: δk = 1, ∆k = 1, trial points p1, p2, p3. Frame 2: δk+1 = 1/4, ∆k+1 = 1/2, trial points p4, p5, p6. Frame 3: δk+2 = 1/16, ∆k+2 = 1/4, trial points p7, p8, p9. After each failed poll, the poll size ∆ is halved while the mesh size δ = ∆² shrinks faster, so more poll directions become available.]
Types of variables in MADS

- MADS has been initially designed for continuous variables.
- Some theory exists for categorical variables [Audet and Dennis, Jr., 2001, Abramson, 2004, Abramson et al., 2009].
- (Other discrete variables now considered in MADS: integer, binary, granular [Audet et al., 2019].)
- Two kinds of “categorical” variables:
  - non-orderable and unrelaxable discrete variables;
  - an integer whose value changes the number of variables of the problem.
Example: A thermal insulation system

[Figure: cross-section of the insulation system: n intermediate insulators at positions xi, with temperatures Ti, materials, and thicknesses ∆xi.]

    min_{∆x, T, n, M}  power(∆x, T, n, M)
    s.t.  ∆x ≥ 0,  TC ≤ T ≤ TH,  n ∈ N,  M ∈ Materials
MADS with categorical variables

- [Abramson et al., 2009].
- The search is still a finite search on the mesh, free of any rules.
- The poll is the failsafe step that evaluates function values at mesh neighbors for the continuous variables, and in a user-defined set of neighbors N(xk).
- This set of neighbors defines a notion of local optimality.
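For the thermal insulation example, such a user-defined neighbor set N(xk) could be sketched as below. The add/remove rule, the duplication of the last insulator as a starting guess, and the tuple encoding are illustrative assumptions, not the definitions used in the cited papers:

```python
def neighbors(n, thickness, temperature):
    """Hypothetical N(x_k): a neighbor adds or removes one intermediate
    insulator, so the categorical variable n changes the number of
    continuous variables (thicknesses and temperatures)."""
    out = []
    if n > 1:
        # remove the last insulator
        out.append((n - 1, thickness[:-1], temperature[:-1]))
    # add one insulator, duplicating the last one as a starting guess;
    # the continuous poll around this neighbor would then refine it
    out.append((n + 1, thickness + [thickness[-1]], temperature + [temperature[-1]]))
    return out

for m, dx, t in neighbors(2, [0.1, 0.2], [40.0, 120.0]):
    print(m, dx, t)
```

Local optimality is then relative to this set: a point is a local solution only if no neighbor (after refining its continuous variables) improves on it.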
Extended poll

[Figure, two frames. Frame 1: from the current point xk, the user-defined categorical neighbors yk and zk are evaluated; a promising neighbor yk triggers extended poll steps leading to yjk. Frame 2: the resulting iterate sequences xk−1, xk, xk+1 → x; yk−1, yk, yk+1 → y; zk−1, zk, zk+1 → z, with the extended poll point yjk near y.]
Hyperparameters Optimization (HPO)
HPO with HYPERNOMAD

- PhD project of Dounia Lakhmiri.
- We focus on the HPO of deep neural networks.
- Our advantages:
  - Blackbox optimization problem: one blackbox call = training + validation + test, for a fixed set of hyperparameters.
  - Presence of categorical variables (e.g., the number of layers).
- Existing methods are mostly heuristics (grid search, random search, GAs, etc.).
- Based on the NOMAD implementation of MADS [Le Digabel, 2011].
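A minimal sketch of such a blackbox call, assuming the hyperparameters arrive as a dictionary. The function name and the synthetic score are stand-ins for an actual training + validation + test run, not HYPERNOMAD's real interface:

```python
def hpo_blackbox(hp):
    """One blackbox call: train, validate, and test a network for a fixed
    set of hyperparameters, returning the value MADS minimizes.

    Note the dimension change: "units" has one entry per layer, so
    "n_layers" behaves like a categorical variable that alters the
    number of hyperparameters in the problem.
    """
    assert len(hp["units"]) == hp["n_layers"]
    # Placeholder for the validation accuracy of the trained network:
    # favors learning rates near 1e-2 and, here, slightly deeper nets.
    accuracy = 1.0 / (1.0 + 100.0 * abs(hp["lr"] - 1e-2)) + 0.01 * hp["n_layers"]
    return -accuracy  # MADS minimizes, so return the negative accuracy

f = hpo_blackbox({"n_layers": 2, "units": [64, 32], "lr": 1e-2})
```

Since each such call costs a full training run, the budget of blackbox evaluations is the scarce resource that MADS is designed to spend carefully.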
References I

Abramson, M. (2004). Mixed variable optimization of a load-bearing thermal insulation system using a filter pattern search algorithm. Optimization and Engineering, 5(2):157–177.

Abramson, M., Audet, C., Chrissis, J., and Walston, J. (2009). Mesh Adaptive Direct Search Algorithms for Mixed Variable Optimization. Optimization Letters, 3(1):35–47.

Audet, C. and Dennis, Jr., J. (2001). Pattern search algorithms for mixed variable programming. SIAM Journal on Optimization, 11(3):573–594.

Audet, C. and Dennis, Jr., J. (2006). Mesh Adaptive Direct Search Algorithms for Constrained Optimization. SIAM Journal on Optimization, 17(1):188–217.

Audet, C., Le Digabel, S., and Tribes, C. (2019). The Mesh Adaptive Direct Search Algorithm for Granular and Discrete Variables. SIAM Journal on Optimization, 29(2):1164–1189.

Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281–305.
References II
Deshpande, A. (2019). A Beginner’s Guide To Understanding Convolutional Neural Networks. https:

Diaz, G., Fokoue, A., Nannicini, G., and Samulowitz, H. (2017). An effective algorithm for hyperparameter optimization of neural networks. IBM Journal of Research and Development, 61(4):9:1–9:11.

Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization, pages 507–523. Springer.

Le Digabel, S. (2011). Algorithm 909: NOMAD: Nonlinear Optimization with the MADS algorithm. ACM Transactions on Mathematical Software, 37(4):44:1–44:15.