Page 1: Simulation-Based Optimization II: Stochastic Gradient and Sample Path Methods

CHAPTER 15
SIMULATION-BASED OPTIMIZATION II:
STOCHASTIC GRADIENT AND SAMPLE PATH METHODS

• Organization of chapter in ISSO
  – Introduction to gradient estimation
  – Interchange of derivative and integral
  – Gradient estimation techniques
    • Likelihood ratio/score function (LR/SF)
    • Infinitesimal perturbation analysis (IPA)
  – Optimization with gradient estimates
  – Sample path method

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Page 2: Issues in Gradient Estimation

• Estimate the gradient of the loss function with respect to the parameters from simulation outputs:

  $g(\theta) \equiv \frac{\partial L(\theta)}{\partial \theta}$

  where $L(\theta)$ is a scalar-valued loss function to minimize and $\theta$ is a p-dimensional vector of parameters

• Essential properties of a gradient estimate $\hat{g}(\theta)$:
  – Unbiased: $E[\hat{g}(\theta)] = g(\theta)$
  – Small variance

Page 3: Two Types of Parameters

• The loss function has the form

  $L(\theta) = E[Q(\theta_S, V)] = \int Q(\theta_S, v)\, p_V(v \mid \theta_D)\, dv$

  where V is the random effect in the system and $p_V(\cdot \mid \theta_D)$ is the probability density function of V (see the toy sketch below)

• Distributional parameters $\theta_D$: elements of $\theta$ that enter via their effect on the probability distribution of V. For example, if scalar V has distribution $N(\mu, \sigma^2)$, then $\mu$ and $\sigma^2$ are distributional parameters

• Structural parameters $\theta_S$: elements of $\theta$ that have direct effects on the loss function (via Q)

• Distinction not always obvious
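
A minimal Python sketch of the two parameter types (a toy problem, not from ISSO): the structural parameter beta enters Q directly, while the distributional parameter mu enters only through the density of V.

  import numpy as np

  rng = np.random.default_rng(0)

  def Q(beta, v):
      # Loss for one realization of the random effect V
      return (beta - v) ** 2

  def L_estimate(beta, mu, n=100_000):
      # Monte Carlo approximation of L(theta) = E[Q(beta, V)], V ~ N(mu, 1);
      # beta is structural, mu is distributional
      v = rng.normal(mu, 1.0, size=n)
      return Q(beta, v).mean()

  # Analytically L(theta) = (beta - mu)**2 + 1; check at beta = 2, mu = 0.5:
  print(L_estimate(2.0, 0.5))  # approx (2 - 0.5)**2 + 1 = 3.25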

Page 4: Interchange of Derivative and Integral

• Unbiased gradient estimates using only one simulation require the interchange of derivative and integral:

  $\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta} \int Q(\theta, v)\, p_V(v \mid \theta)\, dv \overset{?}{=} \int \frac{\partial [Q(\theta, v)\, p_V(v \mid \theta)]}{\partial \theta}\, dv$

• Above is not true in general. Technical conditions are needed for validity:
  – $Q \cdot p_V$ and $\partial (Q \cdot p_V)/\partial \theta$ are continuous
  – $|Q(\theta, v)\, p_V(v \mid \theta)| \le q_0(v)$ and $\left\| \partial (Q \cdot p_V)/\partial \theta \right\| \le q_1(v)$ for dominating functions with $\int q_0(v)\, dv < \infty$ and $\int q_1(v)\, dv < \infty$

• Above has implications in practical applications (a numerical check follows below)
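
A numerical sanity check of the interchange for a simple case (a Python sketch; the specific Q and density are taken from the exponential example on page 8): Q(theta, v) = v and p_V(v|theta) = exp(-v/theta)/theta, so L(theta) = E[V] = theta and dL/dtheta = 1 exactly.

  import numpy as np
  from scipy.integrate import quad

  theta = 2.0

  def d_Qp_dtheta(v):
      # d[Q * p_V]/dtheta computed analytically: Q = v and
      # dp_V/dtheta = exp(-v/theta) * (v/theta - 1) / theta**2
      return v * np.exp(-v / theta) * (v / theta - 1.0) / theta**2

  integral, _ = quad(d_Qp_dtheta, 0.0, np.inf)
  print(integral)  # approx 1.0 = dL/dtheta, so the interchange is valid here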

Page 5: A General Form of Gradient Estimate

• Assume that all the conditions required for the interchange of derivative and integral are satisfied. Then

  $g(\theta) = \int \frac{\partial [Q(\theta, v)\, p_V(v \mid \theta)]}{\partial \theta}\, dv$

  $= \int \left[ \frac{\partial p_V(v \mid \theta)}{\partial \theta}\, Q(\theta, v) + p_V(v \mid \theta)\, \frac{\partial Q(\theta, v)}{\partial \theta} \right] dv$

  $= \int \left[ \frac{\partial \log p_V(v \mid \theta)}{\partial \theta}\, Q(\theta, v) + \frac{\partial Q(\theta, v)}{\partial \theta} \right] p_V(v \mid \theta)\, dv$

  $= E\left[ \frac{\partial \log p_V(V \mid \theta)}{\partial \theta}\, Q(\theta, V) + \frac{\partial Q(\theta, V)}{\partial \theta} \right]$

• Hence, an unbiased gradient estimate can be obtained as

  $\hat{g}(\theta) = \frac{\partial \log p_V(V \mid \theta)}{\partial \theta}\, Q(\theta, V) + \frac{\partial Q(\theta, V)}{\partial \theta}$

  Output from one simulation!
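
A Python sketch of the one-simulation estimate (the toy problem is an assumption for illustration): V ~ N(theta, 1) and Q(theta, v) = theta * v, so theta is both distributional and structural, L(theta) = theta**2, and g(theta) = 2 * theta.

  import numpy as np

  rng = np.random.default_rng(1)
  theta = 1.5

  def g_hat(theta, v):
      score = v - theta    # d log p_V(v|theta)/d theta for N(theta, 1)
      q = theta * v        # Q(theta, v)
      dq_dtheta = v        # dQ/dtheta (structural part)
      return score * q + dq_dtheta

  # Averaging many one-simulation estimates confirms unbiasedness:
  v = rng.normal(theta, 1.0, size=1_000_000)
  print(g_hat(theta, v).mean())  # approx 2 * theta = 3.0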

Page 6: Two Gradient Estimates: LR/SF and IPA

• The general estimate splits into two terms:

  $\hat{g}(\theta) = \underbrace{\frac{\partial \log p_V(V \mid \theta)}{\partial \theta}\, Q(\theta, V)}_{\text{pure LR/SF}} + \underbrace{\frac{\partial Q(\theta, V)}{\partial \theta}}_{\text{pure IPA}}$

• Likelihood ratio/score function (LR/SF): only distributional parameters ($\partial Q/\partial \theta = 0$), so

  $\hat{g}_{LR/SF}(\theta) = \frac{\partial \log p_V(V \mid \theta)}{\partial \theta}\, Q(V)$

• Infinitesimal perturbation analysis (IPA): only structural parameters, so

  $\hat{g}_{IPA}(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta}$

Page 7: Comparison of Pure LR/SF and IPA

• In practice, neither extreme (LR/SF or IPA) may provide a framework for reasonable implementation:
  – LR/SF may require deriving a complex distribution function starting from U(0,1)
  – IPA may lead to an intractable $\partial Q/\partial \theta$ with a complex $Q(\theta, V)$

• Pure LR/SF gradient estimates tend to suffer from large variance (the variance can grow with the number of components in V)

• Pure IPA may result in a $Q(\theta, V)$ that fails to meet the conditions for a valid interchange of derivative and integral, and hence can lead to a biased gradient estimate

• In many cases where IPA is feasible, it leads to a low-variance gradient estimate

Page 8: A Simple Example: Exponential Distribution

• Let Z be an exponential random variable with mean $\theta$; that is, $p_Z(z \mid \theta) = e^{-z/\theta}/\theta$. Define $L(\theta) \equiv E(Z) = \theta$. Then $\partial L/\partial \theta = 1$.

  – LR/SF estimate: V = Z; $Q(\theta, V) = V$:

    $\hat{g}_{LR/SF}(\theta) = \frac{\partial \log p_V(V \mid \theta)}{\partial \theta}\, V = \left( \frac{V}{\theta} - 1 \right) \frac{V}{\theta}$

  – IPA estimate: $V \sim U(0,1)$; $Q(\theta, V) = -\theta \log V$ (so $Z = -\theta \log V$):

    $\hat{g}_{IPA}(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta} = -\log V$

• Both the LR/SF and IPA estimators are unbiased (verified numerically below)
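
A Monte Carlo check in Python (a sketch, not ISSO code) that both estimators average to dL/dtheta = 1; it also illustrates the variance comparison from the previous page, since here Var(g_LRSF) = 13 while Var(g_IPA) = 1.

  import numpy as np

  rng = np.random.default_rng(2)
  theta, n = 2.0, 1_000_000

  # LR/SF: V = Z ~ exponential with mean theta, Q(theta, V) = V
  z = rng.exponential(theta, size=n)
  g_lrsf = (z / theta - 1.0) * z / theta

  # IPA: V ~ U(0,1), Q(theta, V) = -theta * log(V), so dQ/dtheta = -log(V)
  u = rng.uniform(size=n)
  g_ipa = -np.log(u)

  print(g_lrsf.mean(), g_lrsf.var())  # mean approx 1, variance approx 13
  print(g_ipa.mean(), g_ipa.var())    # mean approx 1, variance approx 1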

Page 9: Stochastic Optimization with Gradient Estimate

• Use the gradient estimates in a root-finding stochastic approximation (SA) algorithm to minimize the loss function $L(\theta) = E[Q(\theta, V)]$: find $\theta^*$ such that $g(\theta^*) = 0$ based on simulation outputs

• A general root-finding SA algorithm (sketched in code below):

  $\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k)$

  where $Y_k(\hat{\theta}_k)$ is an estimate of $g(\hat{\theta}_k)$ and $a_k$ is the step size with $a_k > 0$, $a_k \to 0$, $\sum_k a_k = \infty$

• If $Y_k$ is unbiased and has bounded variance (and other appropriate assumptions hold), then $\hat{\theta}_k \to \theta^*$ (a.s.)
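
A generic Python sketch of the recursion (the gain sequence a_k = a/(k+1) is an assumed choice satisfying the conditions above):

  import numpy as np

  def sa_minimize(grad_estimate, theta0, a=1.0, n_iters=10_000):
      # grad_estimate(theta) returns a noisy, unbiased estimate of g(theta)
      theta = np.asarray(theta0, dtype=float)
      for k in range(n_iters):
          theta = theta - (a / (k + 1)) * grad_estimate(theta)
      return theta

  # Example: minimize L(theta) = theta**2 using the noisy gradient
  # Y = 2*theta + noise; the iterates converge toward theta* = 0
  rng = np.random.default_rng(3)
  print(sa_minimize(lambda th: 2 * th + rng.normal(), theta0=5.0))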

Page 10: Simulation-Based Optimization

• Use the gradient estimate derived from one simulation run in each iteration of SA:

  $Y_k(\hat{\theta}_k) = \frac{\partial \log p_V(V_k \mid \hat{\theta}_k)}{\partial \theta}\, Q(\hat{\theta}_k, V_k) + \frac{\partial Q(\hat{\theta}_k, V_k)}{\partial \theta}$

  where $V_k$ is the realization of V from a simulation run with the parameter set at $\hat{\theta}_k$

• The iteration loop (a worked sketch follows below):
  – Run one simulation with $\hat{\theta}_k$ to obtain $V_k$
  – Derive the gradient estimate $Y_k(\hat{\theta}_k)$ from $V_k$
  – Iterate SA with the gradient estimate to obtain $\hat{\theta}_{k+1}$
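
A Python sketch of the full loop using a pure LR/SF estimate (the target here is an assumption for illustration, not from ISSO): minimize L(theta) = E[(Z - c)**2] with Z exponential of mean theta, whose true minimizer is theta* = c/2.

  import numpy as np

  rng = np.random.default_rng(4)
  c, theta, a = 4.0, 1.0, 0.02

  for k in range(200_000):
      v = rng.exponential(theta)                       # 1. one simulation run
      y = (v / theta**2 - 1.0 / theta) * (v - c) ** 2  # 2. LR/SF estimate Y_k
      theta -= (a / (k + 1) ** 0.7) * y                # 3. SA update
      theta = max(theta, 0.01)    # projection keeps theta a valid mean

  print(theta)  # settles near c/2 = 2.0 (stochastic, so the value varies)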

Page 11: Example: Experimental Response (Examples 15.4 and 15.5 in ISSO)

• Let $\{V_k\}$ be i.i.d. randomly generated binary (on-off) stimuli with "on" probability $\lambda$, so that $p_V(v \mid \lambda) = \lambda^v (1 - \lambda)^{1-v}$, $v = 0$ or 1. Assume $Q(\lambda, \beta, V_k)$ represents the negative of the specimen response, where $\beta$ is a design parameter. The objective is to design the experiment to maximize the response (i.e., minimize Q) by selecting values for $\lambda$ and $\beta$.

• Gradient estimate for $\theta = [\lambda, \beta]^T$ and $\hat{\theta}_k = [\hat{\lambda}_k, \hat{\beta}_k]^T$:

  $Y_k(\hat{\theta}_k) = \begin{bmatrix} \dfrac{V_k - \hat{\lambda}_k}{\hat{\lambda}_k (1 - \hat{\lambda}_k)}\, Q(\hat{\lambda}_k, \hat{\beta}_k, V_k) + Q_\lambda'(\hat{\lambda}_k, \hat{\beta}_k, V_k) \\ Q_\beta'(\hat{\lambda}_k, \hat{\beta}_k, V_k) \end{bmatrix}$

  where $Q_x'$ denotes the derivative of Q w.r.t. x; the first component uses $\partial \log p_V(V \mid \lambda)/\partial \lambda = (V - \lambda)/(\lambda(1 - \lambda))$

Page 12: Experimental Response (continued)

• Specific response function:

  $Q(\lambda, \beta, V) = \beta^2 V - \beta \lambda (1 - V)$, with $p_V(v \mid \lambda) = \lambda^v (1 - \lambda)^{1-v}$, $v = 0$ or 1,

  where $\beta$ is a structural parameter, but $\lambda$ is both a distributional and a structural parameter. Then:

  $L(\theta) = E[Q(\lambda, \beta, V)] = \beta^2 \lambda - \beta \lambda + \beta \lambda^2$

• Setting $g(\theta) = \partial L/\partial \theta = 0$ gives $\lambda^* = 1/3$ and $\beta^* = 1/3$ (an SA run is sketched below)
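
A Python sketch of the SA iteration for this example, using the response function as written above (treat the exact form of Q as an assumption of this sketch); under that Q the iterates should approach (1/3, 1/3).

  import numpy as np

  rng = np.random.default_rng(5)
  lam, beta, a = 0.5, 0.5, 0.1

  def Q(lam, beta, v):
      return beta**2 * v - beta * lam * (1 - v)

  for k in range(500_000):
      v = rng.binomial(1, lam)               # one binary stimulus
      score = (v - lam) / (lam * (1 - lam))  # d log p_V / d lambda
      y_lam = score * Q(lam, beta, v) - beta * (1 - v)  # LR/SF + structural
      y_beta = 2 * beta * v - lam * (1 - v)  # purely structural (IPA) part
      a_k = a / (k + 1) ** 0.7
      lam = float(np.clip(lam - a_k * y_lam, 0.01, 0.99))  # valid probability
      beta = beta - a_k * y_beta

  print(lam, beta)  # both near 1/3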

Page 13: Search Path in Experimental Response Problem

[Figure: search path of the SA iterates in the experimental response problem]

Page 14: Sample Path Method

• Sample path method is based on reusing a fixed set of simulation runs

• Method is based on minimizing $\bar{L}_N(\theta)$ rather than $L(\theta)$
  – $\bar{L}_N(\theta)$ represents the sample mean of N simulation runs

• If N is large, then the minimum of $\bar{L}_N(\theta)$ is close to the minimum of $L(\theta)$ (under conditions)

• The optimization problem with $\bar{L}_N(\theta)$ is effectively deterministic:
  – Can use standard nonlinear programming (illustrated below)
  – IPA and/or LR/SF methods of gradient estimation still relevant

• Generally need to choose a fixed value of $\theta$ (reference value) to produce the N simulation runs

• Choice of reference value has an impact on $\bar{L}_N(\theta)$ for finite N
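
A Python sketch of the sample path idea (the specific Q is an assumption carried over from the exponential example): draw the N underlying uniforms once, then minimize the resulting deterministic sample mean with a standard optimizer. Because theta enters Q structurally here, no reference value is needed; with distributional parameters the fixed runs would instead be reweighted relative to the reference value.

  import numpy as np
  from scipy.optimize import minimize_scalar

  rng = np.random.default_rng(6)
  c, N = 4.0, 100_000
  v = rng.uniform(size=N)        # fixed set of simulation runs (reused)

  def L_bar(theta):
      z = -theta * np.log(v)     # sample path of Z = -theta*log(V)
      return np.mean((z - c) ** 2)

  res = minimize_scalar(L_bar, bounds=(0.1, 10.0), method="bounded")
  print(res.x)  # close to the true minimizer theta* = c/2 = 2.0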

Page 15: Accuracy of Sample Path Method

• Interested in the accuracy of the sample path method in seeking the true optimum $\theta^*$ (minimum of $L(\theta)$)

• Let $\bar{\theta}_N^*$ represent the minimum of the surrogate loss $\bar{L}_N(\theta)$

• Let $\hat{\theta}_N$ denote the final solution from the nonlinear programming method

• Hence, the error in the estimate $\hat{\theta}_N$ is due to two sources:
  – Error in the nonlinear programming solution for finding $\bar{\theta}_N^*$
  – Difference between $\bar{\theta}_N^*$ and $\theta^*$

• The triangle inequality can be used to bound the overall error:

  $\|\hat{\theta}_N - \theta^*\| \le \|\hat{\theta}_N - \bar{\theta}_N^*\| + \|\bar{\theta}_N^* - \theta^*\|$

• Sometimes numerical values can be assigned to the two right-hand terms in the triangle inequality