Overview and Recent Advances in Derivative Free Optimization

Transcript

Page 1:

Overview and Recent Advances in Derivative Free Optimization

Katya Scheinberg

Joint work with A. Berahas, J. Blanchet, L. Cao, C. Cartis, A. R. Conn, M. Menickelly, C. Paquette, L. Vicente

School of Operations Research and Information Engineering

IPAM Workshop: From Passive to Active: Generative and Reinforcement Learning with Physics, Sept 23-27, 2019

Page 2:

Local and Global Optimization

From Roos, Terlaky and DeKlerk, "Nonlinear Optimisation", 2002.

Page 3:

Optimization and gradient descent

Page 4:

Black Box Optimization Problems

min_{x ∈ R^n} f(x)

x → BLACK BOX → f(x)

f nonlinear function; derivatives of f not available

Noisy functions, stochastic or deterministic

min_{x ∈ R^n} f(x) = φ(x) + ε(x)        min_{x ∈ R^n} f(x) = φ(x)(1 + ε(x))
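As a rough illustration of the two noise models, here is a minimal sketch of additive and multiplicative noisy black-box oracles (the smooth function φ, the noise level, and all names are illustrative assumptions, not from the slides):

```python
import numpy as np

def phi(x):
    # Hypothetical smooth objective (illustrative choice, not from the slides).
    return float(np.sum(x ** 2))

def additive_noise_oracle(x, noise_level=1e-3, rng=np.random.default_rng(0)):
    # f(x) = phi(x) + eps(x): absolute (additive) noise.
    return phi(x) + noise_level * rng.standard_normal()

def multiplicative_noise_oracle(x, noise_level=1e-3, rng=np.random.default_rng(1)):
    # f(x) = phi(x) * (1 + eps(x)): relative (multiplicative) noise.
    return phi(x) * (1.0 + noise_level * rng.standard_normal())

x = np.ones(5)
print(additive_noise_oracle(x), multiplicative_noise_oracle(x))
```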

Page 5:

Motivation

Machine Learning

Source(s): https://blog.statsbot.co/, https://campus.datacamp.com/

Deep Learning

Source(s): https://medium.com/

Reinforcement Learning

Source(s): http://people.csail.mit.edu/

Page 6:

Optimizing properties obtained from expensive simulations or experiments

Critical temperatures from molecular dynamics simulations

Dignon et al. ACS Cent. Sci., Article ASAP

Reaction rate estimation from kinetic Monte Carlo simulations

Activation barriers from quantum mechanical nudged elastic band calculations

ZACROS (http://zacros.org/tutorials) Andersen et al. Front. Chem. 2019

Yield estimation from experimental organic synthesis reactor systems

• Many examples exist in the domain of molecular and materials science where calculating a property requires expensive computations or experiments

• In many of these cases, derivatives are not available

Holmes et al. React. Chem. Eng., 2016, 1, 36

Page 7:

Derivative-free methods: direct and random search

Iterative algorithms that converge to a local optimum.

In each iteration:

1 Evaluate a set of sample points around the current iterate;

2 Choose the sample point with the best function value;

3 Make this point the next iterate;
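A minimal sketch of such a direct/random-search loop (the Gaussian sampling, the radius-shrinking rule, and all names are illustrative assumptions):

```python
import numpy as np

def random_search(f, x0, step=1.0, n_samples=10, max_iters=200, seed=0):
    """Minimal random-search sketch: sample around the iterate, keep the best point."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iters):
        # 1. Evaluate a set of sample points around the current iterate.
        candidates = x + step * rng.standard_normal((n_samples, x.size))
        values = np.array([f(c) for c in candidates])
        best = values.argmin()
        # 2./3. Choose the best sample point and make it the next iterate (if it improves).
        if values[best] < fx:
            x, fx = candidates[best], values[best]
        else:
            step *= 0.5  # no improvement: shrink the sampling radius (a common safeguard)
    return x, fx

x_best, f_best = random_search(lambda z: float(np.sum((z - 1.0) ** 2)), np.zeros(5))
```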

Page 8:

Derivative-free methods: model-based

Iterative algorithms that converge to a local optimum.

In each iteration:

1 Evaluate a set of sample points around the current iterate;

2 Interpolate the sample points with a linear or quadratic model;

3 Use this model to find the next iterate;

Page 9:

Model-Based Trust Region Method (pioneered by M.J.D. Powell)

(a) starting point (b) initial sampling

Pages 10–18: Model-Based Trust Region Method (figure-only slides illustrating successive iterations)

Page 19:

Model-Based Trust Region Method

Shrinking and expanding trust region radius, exploiting curvature, efficient in terms of samples
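A compact sketch of a model-based trust-region iteration in this spirit, using a linear interpolation model and a standard ratio test (the parameter values and names are illustrative assumptions, not Powell's actual algorithm):

```python
import numpy as np

def dfo_trust_region(f, x0, delta=1.0, max_iters=100, eta=0.1, seed=0):
    """Sketch of model-based trust-region steps: linear interpolation model + ratio test."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n, fx = x.size, f(x)
    for _ in range(max_iters):
        # Sample n directions and build a linear model m(x+s) = fx + g.s inside the region.
        Y = rng.standard_normal((n, n))
        F = np.array([f(x + delta * y) for y in Y]) - fx
        g = np.linalg.solve(delta * Y, F)              # interpolation: sigma * M_Y g = F_Y
        if np.linalg.norm(g) < 1e-8:
            break
        s = -delta * g / np.linalg.norm(g)             # model minimizer on the trust region
        f_new = f(x + s)
        predicted = -g @ s                             # model decrease (> 0 here)
        rho = (fx - f_new) / predicted
        if rho >= eta:                                 # good agreement: accept step, expand radius
            x, fx, delta = x + s, f_new, 2.0 * delta
        else:                                          # poor agreement: reject step, shrink radius
            delta *= 0.5
    return x, fx

x_star, f_star = dfo_trust_region(lambda z: float(np.sum((z - 2.0) ** 2)), np.zeros(4))
```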

Page 20:

Direct Search

11307 function evaluations

Page 21:

Random Search

3705 function evaluations

Page 22:

Trust Region Method

69 function evaluations

Page 23:

Active learning, generative models and derivative-free optimization

What does model-based derivative-free optimization do?

Using some "labeled" data (x, f(x)), build a model m(x). What do we want from that model m(x)? Quality? Simplicity?

Optimize m(x) or a "related function" to obtain a new, potentially interesting data point. What do we optimize?

Modify the model (how?), repeat.

What do we need for convergence?

Page 24:

Assumptions on models for convergence

For trust region, first-order convergence

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(∆_k),

For trust region, second-order convergence

‖∇²f(x_k) − ∇²m_k(x_k)‖ ≤ O(∆_k)

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(∆_k²)

For line search, first-order convergence

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(α_k ‖∇m_k‖)

Intuition

In other words, the model should agree with the true function to the same order as a Taylor expansion, measured with respect to the step size.

Page 25:

Assumptions on models for convergence

For trust region, first-order convergence

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(∆_k), w.p. 1 − δ

For trust region, second-order convergence

‖∇²f(x_k) − ∇²m_k(x_k)‖ ≤ O(∆_k) w.p. 1 − δ

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(∆_k²) w.p. 1 − δ

For line search, first-order convergence

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(α_k ‖∇m_k‖) w.p. 1 − δ

Intuition

In other words, the model should agree with the true function to the same order as a Taylor expansion, measured with respect to the step size.

Page 26:

Building models via linear interpolation

m(y) = f(x) + g(x)^T (y − x) :  m(y) = f(y), ∀y ∈ Y.

Page 27:

Building models via linear interpolation

m(y) = f(x) + g(x)^T (y − x) :  m(y) = f(y), ∀y ∈ Y.

Let Y = {x + σy_1, ..., x + σy_n}, σ > 0,

F_Y = [ f(x + σy_1) − f(x), ..., f(x + σy_n) − f(x) ]^T ∈ R^n,   M_Y = [ y_1^T ; ... ; y_n^T ] ∈ R^{n×n}

Page 28:

Building models via linear interpolation

m(y) = f(x) + g(x)^T (y − x) :  m(y) = f(y), ∀y ∈ Y.

Let Y = {x + σy_1, ..., x + σy_n}, σ > 0,

F_Y = [ f(x + σy_1) − f(x), ..., f(x + σy_n) − f(x) ]^T ∈ R^n,   M_Y = [ y_1^T ; ... ; y_n^T ] ∈ R^{n×n}

Model m(y) constructed to satisfy interpolation conditions:

σ M_Y g = F_Y

Page 29:

Building models via linear interpolation

m(y) = f(x) + g(x)^T (y − x) :  m(y) = f(y), ∀y ∈ Y.

Let Y = {x + σy_1, ..., x + σy_n}, σ > 0,

F_Y = [ f(x + σy_1) − f(x), ..., f(x + σy_n) − f(x) ]^T ∈ R^n,   M_Y = [ y_1^T ; ... ; y_n^T ] ∈ R^{n×n}

Model m(y) constructed to satisfy interpolation conditions:

σ M_Y g = F_Y

Theorem [Conn, Scheinberg & Vicente, 2008]

Let Y = {x, x + σy_1, . . . , x + σy_n} be a set of interpolation points such that max_i ‖y_i‖ ≤ 1 and M_Y is nonsingular. Suppose that the function f has L-Lipschitz continuous gradients. Then,

‖∇m(x) − ∇f(x)‖ ≤ ‖M_Y^{-1}‖_2 √n σ L / 2.

Cost: O(n³) (reduces to O(n²) if M_Y is orthonormal and O(n) if M_Y = I)
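A small sketch of this construction in code, using randomly drawn (almost surely nonsingular) directions; the test function and parameters are illustrative assumptions:

```python
import numpy as np

def linear_interpolation_gradient(f, x, sigma=1e-3, seed=0):
    """Build g from linear interpolation: solve sigma * M_Y g = F_Y."""
    rng = np.random.default_rng(seed)
    n = x.size
    M_Y = rng.standard_normal((n, n))                    # rows y_i^T (nonsingular w.p. 1)
    M_Y /= np.linalg.norm(M_Y, axis=1, keepdims=True)    # scale so that max_i ||y_i|| <= 1
    fx = f(x)
    F_Y = np.array([f(x + sigma * y) - fx for y in M_Y])
    return np.linalg.solve(sigma * M_Y, F_Y)

# Example: the error should behave like ||M_Y^{-1}|| sqrt(n) sigma L / 2 for an L-smooth f.
f = lambda z: float(np.sum(z ** 2) + np.sin(z[0]))
x = np.ones(5)
g = linear_interpolation_gradient(f, x)
true_grad = 2 * x + np.array([np.cos(x[0]), 0, 0, 0, 0])
print(np.linalg.norm(g - true_grad))
```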

Page 30:

Quadratic Interpolation Models

m(y) = f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H(x)(y − x) :  m(y) = f(y), ∀y ∈ Y.

Page 31:

Quadratic Interpolation Models

m(y) = f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H(x)(y − x) :  m(y) = f(y), ∀y ∈ Y.

Let Y = {x + σy_1, ..., x + σy_N}, σ > 0,

F_Y = [ f(x + σy_1) − f(x), ..., f(x + σy_N) − f(x) ]^T ∈ R^N,   M_Y = [ y_1^T  vec(y_1 y_1^T)^T ; ... ; y_N^T  vec(y_N y_N^T)^T ] ∈ R^{N×N}

Page 32:

Quadratic Interpolation Models

m(y) = f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H(x)(y − x) :  m(y) = f(y), ∀y ∈ Y.

Let Y = {x + σy_1, ..., x + σy_N}, σ > 0,

F_Y = [ f(x + σy_1) − f(x), ..., f(x + σy_N) − f(x) ]^T ∈ R^N,   M_Y = [ y_1^T  vec(y_1 y_1^T)^T ; ... ; y_N^T  vec(y_N y_N^T)^T ] ∈ R^{N×N}

Model m(y) constructed to satisfy interpolation conditions:

σ M_Y (g, vec(H)) = F_Y

Page 33:

Quadratic Interpolation Models

m(y) = f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H(x)(y − x) :  m(y) = f(y), ∀y ∈ Y.

Let Y = {x + σy_1, ..., x + σy_N}, σ > 0,

F_Y = [ f(x + σy_1) − f(x), ..., f(x + σy_N) − f(x) ]^T ∈ R^N,   M_Y = [ y_1^T  vec(y_1 y_1^T)^T ; ... ; y_N^T  vec(y_N y_N^T)^T ] ∈ R^{N×N}

Model m(y) constructed to satisfy interpolation conditions:

σ M_Y (g, vec(H)) = F_Y

Theorem [Conn, Scheinberg & Vicente, 2008]

Let Y = {x, x + σy_1, . . . , x + σy_{n+n(n+1)/2}} be a set of interpolation points such that max_i ‖y_i‖ ≤ 1 and M_Y is nonsingular. Suppose that the function f has L-Lipschitz continuous Hessians. Then,

‖∇m(x) − ∇f(x)‖ ≤ O(‖M_Y^{-1}‖_2 n σ² L),

‖∇²m(x) − ∇²f(x)‖ ≤ O(‖M_Y^{-1}‖_2 n σ L).

Cost: O(n⁶)
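A sketch of the analogous quadratic construction, parametrizing H by its upper triangle so the system stays N × N (the random directions, names, and test function are illustrative assumptions):

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_interpolation_model(f, x, sigma=1e-2, seed=0):
    """Fit m(x+s) = f(x) + g^T s + 0.5 s^T H s through N = n + n(n+1)/2 sample points."""
    rng = np.random.default_rng(seed)
    n = x.size
    pairs = list(combinations_with_replacement(range(n), 2))   # index pairs (j, k), j <= k
    N = n + len(pairs)
    Y = rng.standard_normal((N, n))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    fx = f(x)
    S = sigma * Y
    # Design matrix: columns are the monomials s_j and s_j s_k evaluated at each sample step.
    A = np.hstack([S, np.array([[s[j] * s[k] for (j, k) in pairs] for s in S])])
    rhs = np.array([f(x + s) for s in S]) - fx
    coef = np.linalg.solve(A, rhs)
    g, c = coef[:n], coef[n:]
    H = np.zeros((n, n))
    for (j, k), cjk in zip(pairs, c):
        H[j, k] = H[k, j] = cjk if j != k else 2.0 * cjk       # recover the 0.5 s^T H s convention
    return g, H

# Example on a quadratic, where the model should be (nearly) exact: g ~ Q x, H ~ Q.
Q = np.diag([1.0, 2.0, 3.0])
f = lambda z: float(0.5 * z @ Q @ z)
g, H = quadratic_interpolation_model(f, np.ones(3))
```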

Page 34:

Interpolation model quality

Page 35:

Model deterioration

Page 36:

Some conclusions so far

Interpolation models allow old points to be reused and hence are very economical in terms of samples.

Linear algebra is expensive and, more importantly, can be ill-conditioned.

Can improve linear algebra cost and conditioning by using pre-designed sample sets, but this is more expensive in terms of samples (e.g., FD needs n samples per gradient estimate; see the sketch below).
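A minimal forward-difference sketch, corresponding to the pre-designed sample set M_Y = I (the step size and names are illustrative assumptions):

```python
import numpy as np

def forward_difference_gradient(f, x, h=1e-6):
    """Forward finite differences: n extra evaluations, M_Y = I (well conditioned by construction)."""
    fx = f(x)
    g = np.empty_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - fx) / h     # one function evaluation per coordinate
    return g

g = forward_difference_gradient(lambda z: float(np.sum(np.sin(z))), np.linspace(0.0, 1.0, 4))
```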

What alternatives are there?

Page 37:

Gaussian Smoothing

F(x) = E_{ε∼N(0,I)} [ f(x + σε) ] = ∫_{R^n} f(x + σε) π(ε | 0, I) dε

π(y | x, Σ) is the pdf of N(x, Σ) evaluated at y

F(x) is a Gaussian-smoothed approximation to f(x)

∇F(x) = (1/σ) E_{ε∼N(0,I)} [ f(x + σε) ε ]

Idea: Approximate ∇f(x) by a sample average approximation of ∇F(x)

g(x) = (1/(Nσ)) Σ_{i=1}^{N} f(x + σε_i) ε_i

Page 38:

Gaussian Smoothing

F(x) = E_{ε∼N(0,I)} [ f(x + σε) ] = ∫_{R^n} f(x + σε) π(ε | 0, I) dε

π(y | x, Σ) is the pdf of N(x, Σ) evaluated at y

F(x) is a Gaussian-smoothed approximation to f(x)

∇F(x) = (1/σ) E_{ε∼N(0,I)} [ f(x + σε) ε ]

Idea: Approximate ∇f(x) by a sample average approximation of ∇F(x)

g(x) = (1/(Nσ)) Σ_{i=1}^{N} f(x + σε_i) ε_i

Issue: Variance → ∞ as σ → 0

Page 39:

Gaussian Smoothing

F(x) = E_{ε∼N(0,I)} [ f(x + σε) ] = ∫_{R^n} f(x + σε) π(ε | 0, I) dε

π(y | x, Σ) is the pdf of N(x, Σ) evaluated at y

F(x) is a Gaussian-smoothed approximation to f(x)

∇F(x) = (1/σ) E_{ε∼N(0,I)} [ (f(x + σε) − f(x)) ε ]

Idea: Approximate ∇f(x) by a sample average approximation of ∇F(x)

g(x) = (1/(Nσ)) Σ_{i=1}^{N} (f(x + σε_i) − f(x)) ε_i
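A sketch of this forward-difference GSG estimator (the default N = 4n mirrors the later experiments, but that choice and all names are illustrative assumptions):

```python
import numpy as np

def gaussian_smoothed_gradient(f, x, sigma=1e-2, N=None, seed=0):
    """GSG estimate: g(x) = (1/(N*sigma)) * sum_i (f(x + sigma*eps_i) - f(x)) * eps_i."""
    rng = np.random.default_rng(seed)
    n = x.size
    N = 4 * n if N is None else N
    fx = f(x)
    eps = rng.standard_normal((N, n))                       # eps_i ~ N(0, I)
    vals = np.array([f(x + sigma * e) for e in eps]) - fx
    return (vals[:, None] * eps).sum(axis=0) / (N * sigma)

# Example usage: the estimate should be close to the true gradient 2*x.
f = lambda z: float(np.sum(z ** 2))
print(gaussian_smoothed_gradient(f, np.ones(5)))
```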

Page 40:

Gaussian Smoothing

N = 1, theoretical analysis of convergence rates for convex problems

used in reinforcement learning, no theory, N is large

uses interpolation on top of sample average approximation

uniform distribution on a ball for online learning

uniform distribution on a ball for model-free LQR

Page 41:

Analysis of Variance for Gaussian Smoothing

‖g(x) − ∇f(x)‖ ≤ ‖g(x) − ∇F(x)‖ (sample average error) + ‖∇F(x) − ∇f(x)‖ (smoothing error) ≤ r + √n σ L

Theorem [Berahas, Cao, S., 2019]

Suppose that the function f(x) has L-Lipschitz continuous gradients. Let g(x) denote the GSG approximation to ∇f(x). If

N ≥ (1/(δ r²)) ( 3n ‖∇f(x)‖² + n(n² + 6n + 8) L² σ² / 4 ),

then ‖g(x) − ∇f(x)‖ ≤ r + √n σ L, with probability at least 1 − δ.

Essentially N ∼ 3n

Page 42:

Gradient Approximation Accuracy

Numerical experiment setup and results:

f(x) = Σ_{i=1}^{n/2} [ M sin(x_{2i−1}) + cos(x_{2i}) ] + ((L − M)/(2n)) x^T 1_{n×n} x,

which has ‖∇f(0)‖ = √(n/2) M. We use n = 20, M = 1, L = 2, σ = 0.01, and N = 4n for the smoothing methods.
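A hypothetical re-creation of this setup, comparing the GSG estimate to the true gradient at x = 0 (illustrative only; the numbers will not match the slide's figures):

```python
import numpy as np

def make_test_problem(n=20, M=1.0, L=2.0):
    """Test function from this slide, with its analytic gradient (assembled here for illustration)."""
    ones = np.ones((n, n))
    def f(x):
        return float(np.sum(M * np.sin(x[0::2]) + np.cos(x[1::2]))
                     + (L - M) / (2 * n) * (x @ ones @ x))
    def grad(x):
        g = np.empty(n)
        g[0::2] = M * np.cos(x[0::2])
        g[1::2] = -np.sin(x[1::2])
        return g + (L - M) / n * np.full(n, x.sum())
    return f, grad

n, sigma, N = 20, 0.01, 4 * 20
f, grad_f = make_test_problem(n)
x = np.zeros(n)
rng = np.random.default_rng(0)
eps = rng.standard_normal((N, n))
vals = np.array([f(x + sigma * e) for e in eps]) - f(x)
g_hat = (vals[:, None] * eps).sum(axis=0) / (N * sigma)    # forward-difference GSG estimate
# ||grad f(0)|| = sqrt(n/2) * M = sqrt(10); the second number is the approximation error.
print(np.linalg.norm(grad_f(x)), np.linalg.norm(g_hat - grad_f(x)))
```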

Page 43:

Gradient Approximation Accuracy

Page 44:

Algorithm Performance

Moré & Wild problem set (53 smooth problems)

Methods compared: FFD (LBFGS), CFD (LBFGS), GSG (SD, n), BSG (LBFGS, 4n), DFOTR.

Performance and data profiles for the best variant of each method, at accuracy levels τ = 10⁻¹, 10⁻³, 10⁻⁵. Top row: performance profiles (x-axis: performance ratio); bottom row: data profiles (x-axis: number of function evaluations).
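For readers unfamiliar with these plots, here is a small sketch of how Dolan–Moré-style performance profiles can be computed from a table of evaluation counts (the solver names and data below are made up purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, taus):
    """costs[p, s] = evaluations solver s needed on problem p (np.inf if it failed)."""
    ratios = costs / costs.min(axis=1, keepdims=True)          # performance ratio per problem
    return np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                     for s in range(costs.shape[1])])

# Made-up evaluation counts for 53 problems and 3 hypothetical solvers.
rng = np.random.default_rng(0)
costs = rng.integers(10, 500, size=(53, 3)).astype(float)
taus = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
profiles = performance_profile(costs, taus)
for name, prof in zip(["solver A", "solver B", "solver C"], profiles):
    plt.step(taus, prof, where="post", label=name)
plt.xscale("log", base=2)
plt.xlabel("Performance Ratio")
plt.ylabel("Fraction of problems solved")
plt.legend()
plt.show()
```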

Page 45:

Algorithm Performance

FD = forward finite difference
LIOD = linear interpolation of orthogonal directions
LS = (backtracking) line search
GSG = Gaussian smoothed gradient

Three reward-vs-iterations plots comparing FD, LIOD, LIOD (LS), and GSG on the Swimmer, HalfCheetah, and Reacher tasks.

Reinforcement learning tasks: Swimmer (left), HalfCheetah (center), Reacher (right).

Page 46:

Conclusions

Model-based derivative-free methods are efficient and theoretically sound

Select the type of model according to the application, but make sure the theory applies

Use randomization only when necessary, as it can slow down convergence

Andrew R. Conn, Katya Scheinberg, and Luis N. Vicente. Introduction to Derivative-Free Optimization. MPS-SIAM Series on Optimization. SIAM, Philadelphia, USA, 2008.

Albert Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg. A theoretical and empirical comparison of gradient approximations in derivative-free optimization. arXiv preprint arXiv:1905.01332, 2019.

Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative-free optimization methods. arXiv preprint arXiv:1904.11585, 2019.

Thank you!
