Efficient Hyperparameter Optimization of Deep Learning Algorithms using Deterministic RBF Surrogates

Ilija Ilievski (a), Taimoor Akhtar (b), Jiashi Feng (c), Christine Annette Shoemaker (b)
[email protected], [email protected], [email protected], [email protected]
a) Graduate School for Integrative Sciences and Engineering
b) Industrial and Systems Engineering
c) Electrical and Computer Engineering
More: bit.ly/hord-aaai
Supplement and more at: ilija139.github.io

Deep learning algorithms are powerful but very sensitive to their many hyperparameters: the number of layers and nodes, the learning rate, the weight initialization, and so on.

➢ In short
We circumvent the expensive evaluation with a deterministic RBF surrogate.
We tackle the high dimensionality by reducing the probability φ of searching along each dimension.
We escape local minima by computing a compound score for each candidate point: a dynamically weighted average of a metric based on the distance from the best found solution and a metric based on the candidate's surrogate value.

➢ Algorithm
HORD: Hyperparameter Optimization using a deterministic RBF surrogate and DYCORS
Input: n0 initial evaluations, Nmax maximum evaluations, m candidates per iteration
Sample n0 points X_n0 := {x_i}
Populate A_n0 := {X_n0, f(X_n0)}
while n < Nmax:
    update the surrogate S_n with A_n := {X_n, f(X_n)}
    set x_best := argmin{f(X_n)}
    sample m candidate points t_i around x_best according to probabilities φ_n
    compute V_ev, V_dm, and W_n for all t_i
    set x* := argmin{W_n(t_i)}
    set A_{n+1} := A_n ∪ {(x*, f(x*))}
end while
return x_best

➢ Results
Optimization of 6 MLP hyperparameters
Optimization of 8 CNN hyperparameters
Optimization of 15 CNN hyperparameters
Optimization of 19 CNN hyperparameters

➢ Comparison
Evaluations required by HORD to match the best error found by state-of-the-art hyperparameter optimization algorithms.
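To make the pseudocode above concrete, here is a minimal runnable sketch of the HORD loop. It is an illustration under stated assumptions, not the authors' implementation: SciPy's `RBFInterpolator` stands in for the deterministic RBF surrogate, a cheap quadratic function `f` stands in for the expensive validation-error evaluation, and the perturbation probability `phi`, radius `sigma`, and fixed weight `w` are simplified placeholders for the DYCORS schedules.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
d = 4                       # number of hyperparameters, scaled to [0, 1]
n0, n_max, m = 8, 30, 50    # init evals, max evals, candidates per iteration

def f(x):
    """Stand-in for the expensive evaluation (e.g. validation error)."""
    return float(np.sum((x - 0.3) ** 2))

X = rng.uniform(0, 1, size=(n0, d))       # sample n0 points X_n0
y = np.array([f(x) for x in X])           # populate A_n0 = {X_n0, f(X_n0)}

for n in range(n0, n_max):
    surrogate = RBFInterpolator(X, y)     # update surrogate S_n with A_n
    x_best = X[np.argmin(y)]              # x_best = argmin f(X_n)
    phi = max(1.0 / d, 1.0 - n / n_max)   # placeholder for the decaying phi_n
    sigma = 0.2 * (1.0 - n / n_max)       # shrinking perturbation radius
    # sample m candidates t_i around x_best, perturbing each dimension
    # with probability phi (at least one dimension is always perturbed)
    mask = rng.random((m, d)) < phi
    mask[np.arange(m), rng.integers(0, d, size=m)] = True
    cand = np.clip(x_best + mask * rng.normal(0.0, sigma, (m, d)), 0.0, 1.0)
    s = surrogate(cand)                                    # surrogate values
    dist = np.linalg.norm(cand - X[:, None], axis=2).min(axis=0)
    v_ev = (s - s.min()) / max(np.ptp(s), 1e-12)           # V_ev for all t_i
    v_dm = (dist.max() - dist) / max(np.ptp(dist), 1e-12)  # V_dm for all t_i
    w = 0.5                               # placeholder for the cycling weight w_n
    x_star = cand[np.argmin(w * v_ev + (1.0 - w) * v_dm)]  # argmin W_n(t_i)
    X = np.vstack([X, x_star])            # A_{n+1} = A_n ∪ {(x*, f(x*))}
    y = np.append(y, f(x_star))

print(round(float(y.min()), 4))           # best found value after n_max evals
```

The blending of the two metrics is what balances exploitation (trusting the surrogate's low predictions) against exploration (moving away from already-evaluated points), which is how the algorithm escapes local minima.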
Optimizing the validation error with respect to the hyperparameters requires minimizing a highly multimodal and expensive function in high dimensions. We propose an algorithm that matches the performance of state-of-the-art hyperparameter optimization algorithms while using up to six times fewer evaluations.

➢ Details
Distance metric: V_dm
Surrogate value metric: V_ev
Final candidate score: W_n, a dynamically weighted average of V_dm and V_ev
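The candidate-scoring quantities named above can be illustrated with a small numeric example. This is a sketch assuming DYCORS-style min-max normalization over the candidate set; the normalization and the fixed weight `w_n=0.5` are illustrative assumptions, not the poster's exact formulas (those appear in the poster figure).

```python
import numpy as np

def compound_score(s_cand, dist_cand, w_n):
    """W_n for each candidate: a weighted average of the surrogate value
    metric V_ev (low predicted value is good) and the distance metric
    V_dm (being far from already-evaluated points is good)."""
    # surrogate value metric: min-max normalized surrogate predictions
    v_ev = (s_cand - s_cand.min()) / max(np.ptp(s_cand), 1e-12)
    # distance metric: penalize candidates close to evaluated points
    v_dm = (dist_cand.max() - dist_cand) / max(np.ptp(dist_cand), 1e-12)
    return w_n * v_ev + (1.0 - w_n) * v_dm   # final candidate score W_n

# three candidates: surrogate predictions and min distances to evaluated points
s = np.array([0.9, 0.2, 0.5])
dist = np.array([0.05, 0.40, 0.20])
scores = compound_score(s, dist, w_n=0.5)
print(int(np.argmin(scores)))  # -> 1: lowest prediction, farthest away, wins
```

The candidate chosen for the next expensive evaluation is the one minimizing W_n, so a point with a mediocre surrogate value can still win if it lies in an unexplored region, and vice versa.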