Page 1
Tutorial: A Unified Framework for
Optimization under Uncertainty
Enterprisewide Optimization Group, Carnegie Mellon University
September 11, 2018
Warren B. Powell
Princeton University, Department of Operations Research
and Financial Engineering
© 2018 Warren B. Powell, Princeton University
Page 2
Ad-click bidding
Roomsage.com
Page 3
Revenue management
Earning vs. learning
» You want to maximize
revenues, but you do not
know how demand
responds to price.
You earn the most with
prices near the middle, but
you do not learn anything.
You learn the most by sampling
endpoints, but then you do not
earn anything.
Page 4
Learning problems
Health sciences
» Sequential design of
experiments for drug discovery
» Drug delivery – Optimizing the
design of protective
membranes to control drug
release
» Medical decision making –
Optimal learning for medical
treatments.
Page 5
Drug discovery
Designing molecules
» X and Y are sites where we can hang substituents to change the
behavior of the molecule. We approximate the performance using
a linear belief model:

Y = θ_0 + Σ_{sites i} Σ_{substituents j} θ_ij X_ij
» How to sequence experiments to
learn the best molecule as quickly
as possible?
(Figure: regret over experiments.)
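A minimal sketch of evaluating such a linear belief model (the sites, substituents, and coefficient values below are invented for illustration):

```python
import numpy as np

def predict_performance(theta0, theta, X):
    """Predicted molecule performance Y = theta_0 + sum_ij theta_ij X_ij,
    where X[i][j] = 1 if substituent j is attached at site i."""
    return theta0 + float(np.sum(theta * X))

theta0 = 0.5
theta = np.array([[0.2, -0.1],   # site 1: coefficients for substituents A, B
                  [0.4,  0.3]])  # site 2: coefficients for substituents A, B
X = np.array([[1, 0],            # substituent A at site 1
              [0, 1]])           # substituent B at site 2
y_hat = predict_performance(theta0, theta, X)
```

Sequential experimental design then chooses the next (X) to test based on how it would update the beliefs about θ.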
Page 6
Ride sharing
Uber/Lyft
» Provides real-time, on-demand
transportation.
» Drivers are encouraged to enter or leave
the system using pricing signals and
informational guidance.
Decisions:
» How to price to get the right balance of
drivers relative to customers.
» Real-time management of drivers.
» Policies (rules for managing drivers,
customers, …)
Page 7
Matching buyers with sellers
Now we have a logistic curve for
each origin-destination pair (i,j):

P^Y(p_aij | θ_ij) = e^{θ_ij0 + θ_ij1 p_aij} / (1 + e^{θ_ij0 + θ_ij1 p_aij})

Number of offers for each (i,j) pair
is relatively small.
Need to generalize the learning
across hundreds to thousands of
markets.
(Figure: probability that a buyer accepts a seller's offered price.)
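The logistic acceptance curve can be computed directly; a small sketch (the θ values are illustrative, not fitted):

```python
import math

def p_accept(price, theta0, theta1):
    """Probability a buyer accepts a seller's offered price:
    P(Y=1 | p) = e^(theta0 + theta1*p) / (1 + e^(theta0 + theta1*p))."""
    z = theta0 + theta1 * price
    return math.exp(z) / (1.0 + math.exp(z))

# Higher prices reduce the acceptance probability when theta1 < 0.
p = p_accept(price=10.0, theta0=4.0, theta1=-0.5)
```

With few offers per market, the θ_ij would be estimated jointly (e.g. hierarchically) so that learning generalizes across markets.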
Page 8
Emergency storm response
Hurricane Sandy
» Once in 100 years?
» Rare convergence of events
» But, meteorologists did an
amazing job of forecasting
the storm.
The power grid
» Loss of power creates
cascading failures (lack of
fuel, inability to pump water)
» How to plan?
» How to react?
Page 9
Meeting variability with portfolios of generation
with mixtures of dispatchability
Page 10
Storage applications
How much energy to store in a battery to handle the
volatility of wind and spot prices to meet demands?
Page 11
Modeling
Before we can solve complex problems, we have
to know how to think about them.
The biggest challenge when making decisions
under uncertainty is modeling.
Mathematician:
min E{cx}
subject to Ax = b, x ≥ 0
Software:
Organize class libraries, and set up
communications and databases.
Page 12
Modeling
For deterministic problems, we speak the language
of mathematical programming
» Linear programming:

min_x cx
subject to Ax = b, x ≥ 0

» For time-staged problems:

min_{x_0,...,x_T} Σ_{t=0}^T c_t x_t
subject to
A_t x_t − B_{t−1} x_{t−1} = b_t
D_t x_t ≤ u_t
x_t ≥ 0
Arguably Dantzig’s biggest
contribution, more so than the
simplex algorithm, was his
articulation of optimization
problems in a standard format,
which has given algorithmic
researchers a common
language.
Page 13
Stochastic programming
Markov decision processes
Reinforcement learning
Optimal control
Model predictive control
Robust optimization
Approximate dynamic programming
Online computation
Simulation optimization
Stochastic search
Decision analysis
Stochastic control
Dynamic programming and control
Optimal learning
Bandit problems
Page 15
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 16
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 17
Modeling dynamic systems
All sequential decision problems can be modeled
using five core components:
» State variables – What do we need to know at time t?
» Decision variables – What are our decisions?
» Exogenous information – What do we learn for the first time between t and t+1?
» Transition function – How do the state variables evolve over time?
» Objective function – What are our performance metrics?
Page 18
Modeling dynamic problems
The state variable:
Controls community:
"Information state" x_t
Operations research/MDP/Computer science:
S_t = (R_t, I_t, B_t) = System state, where:
R_t = Resource state (physical state)
  Location/status of truck/train/plane
  Energy in storage
I_t = Information state
  Prices
  Weather
B_t = Belief state ("state of knowledge")
  Belief about performance of a drug or catalyst
  Belief about the status of equipment
Page 19
The state variable
My definition of a state variable:
» The first depends on a policy. The second depends
only on the problem (and includes the constraints).
» Using either definition, all properly modeled problems
are Markovian!
Page 20
Modeling dynamic problems
Decisions:
a_t – Markov decision processes/Computer science:
  Discrete action
u_t – Control theory:
  Low-dimensional continuous vector
x_t – Operations research:
  Usually a discrete or continuous but high-dimensional
  vector of decisions.
At this point, we do not specify how to make a decision.
Instead, we define the function X^π(s) (or A^π(s) or U^π(s)),
where π specifies the type of policy. "π" carries information
about the type of function f ∈ F, and any tunable parameters θ.
Page 21
The decision variables
Styles of decisions
» Binary: x ∈ X = {0, 1}
» Finite: x ∈ X = {1, 2, ..., M}
» Continuous scalar: x ∈ X = [a, b]
» Continuous vector: x = (x_1, ..., x_K), x_k ∈ R
» Discrete vector: x = (x_1, ..., x_K), x_k ∈ Z
» Categorical: x = (a_1, ..., a_I), a_i is a category (e.g. red/green/blue)
Page 22
Modeling dynamic problems
Exogenous information:
W_t = New information that first became known at time t
    = (R̂_t, D̂_t, p̂_t, Ê_t)
R̂_t = Equipment failures, delays, new arrivals;
      new drivers being hired to the network
D̂_t = New customer demands
p̂_t = Changes in prices
Ê_t = Information about the environment (temperature, ...)

Note: Any variable indexed by t is known at time t. This convention,
which is not standard in control theory, dramatically simplifies the
modeling of information.

Below, we let W_1, W_2, ... represent a sequence of actual observations.
W_t(ω) refers to a sample realization of the random variable W_t.
W W
Page 23
Modeling dynamic problems
The transition function:

S_{t+1} = S^M(S_t, x_t, W_{t+1})

Inventories:    R_{t+1} = R_t + x_t + R̂_{t+1}
Spot prices:    p_{t+1} = p_t + p̂_{t+1}
Market demands: D_{t+1} = D_t + D̂_{t+1}
Also known as the:
“System model”
“State transition model”
“Plant model”
“Plant equation”
“Transition law”
“State equation”
“Transfer function”
“Transformation function”
“Law of motion”
“Model”
For many applications, these equations are unknown. This
is known as “model-free” dynamic programming.
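For the inventory/price/demand example above, the transition function S^M can be sketched directly (the max(0, ·) truncations are an added assumption to keep inventories and demands nonnegative):

```python
def transition(S, x, W):
    """One step of S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the storage example:
    inventory, spot price, and market demand, each updated by new information."""
    R, p, D = S
    R_hat, p_hat, D_hat = W            # exogenous: new supply, price change, demand change
    R_next = max(0.0, R + x + R_hat)   # R_{t+1} = R_t + x_t + Rhat_{t+1}
    p_next = p + p_hat                 # p_{t+1} = p_t + phat_{t+1}
    D_next = max(0.0, D + D_hat)       # D_{t+1} = D_t + Dhat_{t+1}
    return (R_next, p_next, D_next)

S1 = transition(S=(5.0, 30.0, 2.0), x=1.0, W=(0.5, -2.0, 1.0))
```

In "model-free" settings this function is unknown, and we can only observe the next state.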
Page 24
Objective functions
Modeling stochastic, dynamic problems
» Cumulative reward ("online learning")
• Policies have to work well over time.

  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }

» Final reward ("offline learning")
• We only care about how well the final decision x^{π,N} works.

  max_π E { F(x^{π,N}, Ŵ) | S_0 }

» Risk (ρ is a risk measure):

  max_π ρ( C(S_0, X^π_0(S_0)), C(S_1, X^π_1(S_1)), ..., C(S_T, X^π_T(S_T)) | S_0 )
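Since the expectation is rarely computable, the cumulative-reward objective is typically estimated by simulating the policy; a generic sketch (the toy model at the bottom is invented, and happens to be deterministic so the result is reproducible):

```python
import random

def simulate_policy(policy, transition, contribution, S0, T, seed=0):
    """One sample-path estimate of the cumulative reward
    sum_t C(S_t, X^pi(S_t), W_{t+1}). The model functions are supplied."""
    rng = random.Random(seed)
    S, total = S0, 0.0
    for t in range(T):
        x = policy(S)
        W = rng.gauss(0.0, 1.0)              # exogenous information W_{t+1}
        total += contribution(S, x, W)
        S = transition(S, x, W)
    return total

# Toy inventory model: order up to 5, pay 0.1 per unit, sell up to 2 per period.
policy = lambda S: max(0, 5 - S)
transition = lambda S, x, W: max(0, S + x - 2)   # W ignored: deterministic toy
contribution = lambda S, x, W: -0.1 * x + min(S + x, 2.0)
total = simulate_policy(policy, transition, contribution, S0=0, T=10)
```

Averaging over many seeds gives a Monte Carlo estimate of E{·}.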
Page 25
The complete model:
Modeling stochastic, dynamic problems
» Objective function
• Cumulative reward ("online learning"):
  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }
• Final reward ("offline learning"):
  max_π E { F(x^{π,N}, Ŵ) | S_0 }
• Risk:
  max_π ρ( C(S_0, X^π_0(S_0)), C(S_1, X^π_1(S_1)), ..., C(S_T, X^π_T(S_T)) | S_0 )
» Transition function:
  S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:
  (S_0, W_1, W_2, ..., W_T)
Page 26
The modeling process
Modeling real applications
» I conduct a conversation with a domain expert to fill in
the elements of a problem:
State variables – What we need to know
(and only what we need)
Decision variables – What we control
New information – What we didn't know
when we made our decision
Transition function – How the state variables evolve
Objective function – Performance metrics
Page 27
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 28
Modeling uncertainty
Classes of uncertainty
» Observational uncertainty
» Prognostic uncertainty (forecasting)
» Experimental noise/variability
» Transitional uncertainty
» Inferential uncertainty
» Model uncertainty
» Systematic exogenous uncertainty
» Control/implementation uncertainty
» Algorithmic noise
» Goal uncertainty
Modeling uncertainty in the context of stochastic optimization is a
relatively untapped area of research.
Page 29
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 30
Designing policies
We have to start by describing what we mean by a
policy.
» Definition:
A policy is a mapping from a state to an action.
… any mapping.
How do we search over an arbitrary space of
policies?
Page 31
Designing policies
“Policies” and the English language
Behavior Habit Procedure
Belief Laws/bylaws Process
Bias Manner Protocols
Commandment Method Recipe
Conduct Mode Ritual
Convention Mores Rule
Culture Patterns Style
Customs Plans Technique
Dogma Policies Tenet
Etiquette Practice Tradition
Fashion Prejudice Way of life
Formula Principle
Page 32
Designing policies
Two fundamental strategies for finding policies:
1) Policy search – Search over a class of functions for
making decisions to optimize some metric:

  max_{π=(f,θ^f): f∈F, θ^f∈Θ^f} E { Σ_{t=0}^T C(S_t, X^π(S_t | θ^f)) | S_0 }

2) Lookahead approximations – Approximate the impact
of a decision now on the future:

  X*_t(S_t) = arg max_{x_t} ( C(S_t, x_t)
      + E { max_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π(S_{t'})) | S_{t+1} } | S_t, x_t } )
Page 33
Designing policies
Policy search:
1a) Policy function approximations (PFAs): x_t = X^{PFA}(S_t | θ)
• Lookup tables
– "when in this state, take this action"
• Parametric functions
– Order-up-to policies: if inventory is less than s, order up to S.
– Affine policies: X^{PFA}(S_t | θ) = Σ_{f∈F} θ_f φ_f(S_t)
– Neural networks
• Locally/semi/non parametric
– Requires optimizing over local regions
1b) Cost function approximations (CFAs)
• Optimizing a deterministic model modified to handle uncertainty
(buffer stocks, schedule slack):

  X^{CFA}(S_t | θ) = arg max_{x ∈ X^π_t(θ)} C̄^π(S_t, x | θ)
Page 34
Designing policies
Lookahead approximations – Approximate the impact of a
decision now on the future:
2a) Approximating the value of being in a state (VFA):

  X*_t(S_t) = arg max_{x_t} ( C(S_t, x_t) + E { V_{t+1}(S_{t+1}) | S_t, x_t } )
  X^{VFA}_t(S_t) = arg max_{x_t} ( C(S_t, x_t) + E { V̄_{t+1}(S_{t+1}) | S_t, x_t } )
              = arg max_{x_t} ( C(S_t, x_t) + V̄^x_t(S^x_t) )

2b) Direct lookahead (DLA)
Optimal policy:

  X*_t(S_t) = arg max_{x_t} ( C(S_t, x_t)
      + E { max_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π(S_{t'})) | S_{t+1} } | S_t, x_t } )

Approximate policy – solve an approximate lookahead model:

  X^{DLA}_t(S_t) = arg max_{x_t} ( C(S_t, x_t)
      + Ẽ { max_π̃ Ẽ { Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, X̃^π̃(S̃_{tt'})) | S̃_{t,t+1} } | S_t, x_t } )
Page 35
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
» X^{CFA}(S_t | θ) = arg max_{x ∈ X^π_t(θ)} C̄^π(S_t, x | θ)
3) Policies based on value function approximations (VFAs)
» X^{VFA}_t(S_t) = arg max_{x_t} ( C(S_t, x_t) + V̄^x_t(S^x_t) )
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control:
  X^{LA-D}_t(S_t) = arg max_{x_{tt},...,x_{t,t+H}} Σ_{t'=t}^{t+H} C(S̃_{tt'}, x̃_{tt'})
» Chance constrained programming:
  P[A_t x_t ≥ f(W_t)] ≥ 1 − θ
» Stochastic lookahead/stochastic prog./Monte Carlo tree search:
  X^{LA-S}_t(S_t) = arg max_{x_{tt},(x̃_{t,t+1}(ω),...,x̃_{t,t+H}(ω))} Σ_{ω∈Ω̃_t} p(ω) Σ_{t'=t}^{t+H} C(S̃_{tt'}(ω), x̃_{tt'}(ω))
» "Robust optimization":
  X^{LA-RO}_t(S_t) = arg max_{x_{tt},...,x_{t,t+H}} min_{w∈W_t(θ)} Σ_{t'=t}^{t+H} C(S̃_{tt'}(w), x̃_{tt'}(w))
(Classes 1–2 are designed by policy search; classes 3–4 by lookahead approximations.)
Page 36
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
3) Policies based on value function approximations (VFAs)
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead/stochastic prog./Monte Carlo tree search
» "Robust optimization"
(Same slide as above, highlighting the role of function approximation.)
Page 37
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
3) Policies based on value function approximations (VFAs)
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead/stochastic prog./Monte Carlo tree search
» "Robust optimization"
(Same slide as above, highlighting the embedded optimization within each policy.)
Page 38
Learning problems
Classes of learning problems in stochastic
optimization
1) Approximating the objective
𝐹(𝑥|𝜃) ≈ 𝔼𝐹(𝑥,𝑊).
2) Designing a policy 𝑋𝜋 𝑆 𝜃 .
3) A value function approximation
𝑉𝑡(𝑆𝑡|𝜃) ≈ 𝑉𝑡(𝑆𝑡).
4) Designing a cost function approximation:
• The objective function C̄^π(S_t, x_t | θ)
• The constraints X^π(S_t | θ)
5) Approximating the transition function
𝑆^𝑀(𝑆_𝑡, 𝑥_𝑡, 𝑊_{𝑡+1} | 𝜃) ≈ 𝑆^𝑀(𝑆_𝑡, 𝑥_𝑡, 𝑊_{𝑡+1})
Page 39
Approximation strategies
» Lookup tables
• Independent beliefs
• Correlated beliefs
» Linear parametric models
• Linear models
• Sparse-linear
• Tree regression
» Nonlinear parametric models
• Logistic regression
• Neural networks
» Nonparametric models
• Gaussian process regression
• Kernel regression
• Support vector machines
• Deep neural networks
Page 40
Learning challenges
The learning challenge
From big (batch) data… … to recursive learning
Page 41
Learning challenges
Variable-dimensional learning
Page 42
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 43
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 44
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 45
Policy function approximations
Battery arbitrage – When to charge, when to
discharge, given volatile LMPs
Page 46
(Figure: hourly electricity prices over several days.)
Grid operators require that batteries bid charge and
discharge prices, an hour in advance.
We have to search for the best values for the policy
parameters θ^{Charge} and θ^{Discharge}.
Policy function approximations
Page 47
Policy function approximations
Our policy function might be the parametric
model (this is nonlinear in the parameters):

X^π(S_t | θ) =  +1 if p_t < θ^{charge}                (charge)
                 0 if θ^{charge} ≤ p_t ≤ θ^{discharge}  (hold)
                −1 if p_t > θ^{discharge}              (discharge)

Energy in storage: R_t
Price of electricity: p_t
Page 48
Policy function approximations
Finding the best policy
» We need to maximize

  max_θ F(θ) = E { Σ_{t=0}^T C(S_t, X^π(S_t | θ)) }

» We cannot compute the expectation, so we run simulations:
(Figure: simulated F(θ) over the (θ^{Charge}, θ^{Discharge}) plane.)
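A sketch of this simulation-based search (the price process, capacity limit, and parameter grid are illustrative assumptions, not values from the slides):

```python
import itertools
import random

def battery_policy(p, theta_charge, theta_discharge):
    """PFA: charge when the price is low, discharge when it is high."""
    if p < theta_charge:
        return +1      # charge
    if p > theta_discharge:
        return -1      # discharge
    return 0           # hold

def simulate(theta_charge, theta_discharge, T=1000, seed=1):
    """One sample estimate of F(theta): discharge revenue minus charging cost."""
    rng = random.Random(seed)
    R, profit = 0, 0.0
    for t in range(T):
        p = max(0.0, rng.gauss(40.0, 15.0))   # illustrative price process
        x = battery_policy(p, theta_charge, theta_discharge)
        if x == +1 and R < 10:                # illustrative capacity limit
            R += 1; profit -= p
        elif x == -1 and R > 0:
            R -= 1; profit += p
    return profit

# Crude policy search: grid over (theta_charge, theta_discharge).
best = max(itertools.product(range(10, 40, 5), range(45, 80, 5)),
           key=lambda th: simulate(*th))
```

In practice the search would use common random numbers across θ values and a smarter optimizer than a grid.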
Page 49
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 50
Cost function approximations
Lookup table
» We can organize potential catalysts into groups
» Scientists using domain knowledge can estimate
correlations in experiments between similar catalysts.
Page 51
Cost function approximations
Correlated beliefs: Testing one material teaches us about other
materials.
(Figure: belief curves over five alternatives.)
Page 52
Cost function approximations
Cost function approximations (CFA)
» Upper confidence bounding:

  X^{UCB}(S^n | θ^{UCB}) = arg max_x ( μ̄^n_x + θ^{UCB} sqrt( log n / N^n_x ) )

» Interval estimation:

  X^{IE}(S^n | θ^{IE}) = arg max_x ( μ̄^n_x + θ^{IE} σ̄^n_x )

» Boltzmann exploration ("soft max")
• Choose x with probability:

  P^n(x) = e^{θ μ̄^n_x} / Σ_{x'} e^{θ μ̄^n_{x'}}
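The three policies can be compared on the same beliefs; a small sketch with invented means, standard deviations, counts, and θ values:

```python
import numpy as np

mu = np.array([1.0, 1.2, 0.9])      # estimated means  mu_x^n
sigma = np.array([0.1, 0.3, 0.6])   # posterior std devs sigma_x^n
N = np.array([20, 10, 4])           # times each alternative was tried
n = int(N.sum())

# Upper confidence bound with theta_UCB = 1:
ucb = int(np.argmax(mu + 1.0 * np.sqrt(np.log(n) / N)))
# Interval estimation with theta_IE = 2:
ie = int(np.argmax(mu + 2.0 * sigma))
# Boltzmann probabilities with theta = 3:
boltz = np.exp(3.0 * mu) / np.exp(3.0 * mu).sum()
```

Here both UCB and IE pick the under-explored, high-variance alternative 2, while Boltzmann spreads probability over all three.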
Page 53
Cost function approximations
Picking θ^{IE} = 0 means we are evaluating each choice
at the mean.
(Figure: belief curves over five alternatives.)
Page 54
Cost function approximations
Picking θ^{IE} = 2 means we are evaluating each choice
at the 95th percentile.
(Figure: belief curves over five alternatives.)
Page 55
Cost function approximations
Optimizing the policy
» We optimize θ^{IE} to maximize:

  max_{θ^{IE}} F(θ^{IE}) = E { F(x^{π,N}, Ŵ) }

where

  x^n = X^{IE}(S^n | θ^{IE}) = arg max_x ( μ̄^n_x + θ^{IE} σ̄^n_x )

Notes:
» This can handle any belief model,
including correlated beliefs, nonlinear
belief models.
» All we require is that we be able to
simulate a policy.
Page 56
Cost function approximations
Other applications
» Airlines optimizing schedules with schedule slack to
handle weather uncertainty.
» Manufacturers using buffer stocks to hedge against
production delays and quality problems.
» Grid operators scheduling extra generation capacity in
case of outages.
» Adding time to a trip planned by Google maps to
account for uncertain congestion.
Page 57
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 58
Value function approximations
Q-learning (for discrete actions):

  q̂^n(s^n, a^n) = r(s^n, a^n) + γ max_{a'} Q̄^{n−1}(s', a')
  Q̄^n(s^n, a^n) = (1 − α_{n−1}) Q̄^{n−1}(s^n, a^n) + α_{n−1} q̂^n(s^n, a^n)

» But what if the action a is a vector?
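A minimal tabular Q-learning sketch for discrete actions (the toy chain problem and exploration scheme are invented; a fixed step size α stands in for the declining α_{n−1}):

```python
import random
from collections import defaultdict

def q_learning(transition, reward, actions, S0, n_iters=5000,
               gamma=0.9, alpha=0.1, eps=0.5, seed=0):
    """Tabular Q-learning: qhat = r(s,a) + gamma * max_a' Qbar(s',a'),
    then Qbar <- (1 - alpha) * Qbar + alpha * qhat."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    s = S0
    for _ in range(n_iters):
        if rng.random() < eps:                                  # explore
            a = rng.choice(actions)
        else:                                                   # exploit
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = transition(s, a, rng)
        q_hat = reward(s, a) + gamma * max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * q_hat
        s = s_next
    return Q

# Toy chain: states 0..3; a reward is collected while sitting at state 3.
Q = q_learning(
    transition=lambda s, a, rng: min(3, max(0, s + a)),
    reward=lambda s, a: 1.0 if s == 3 else 0.0,
    actions=[-1, +1], S0=0)
```

The lookup table Q[(s, a)] is exactly what breaks when a is a high-dimensional vector: we can no longer enumerate the actions inside the max.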
Page 59
Blood management
Managing blood inventories
Page 60
Blood management
Managing blood inventories over time
(Figure: the sequence S_0, x_0, W_1 = (R̂_1, D̂_1), S_1, x_1, S^x_1, W_2 = (R̂_2, D̂_2), S_2, x_2, S^x_2, W_3 = (R̂_3, D̂_3), S_3, x_3, unfolding over weeks 0, 1, 2, 3.)
Page 61
(Figure: assignment network for S_t = (R_t, D̂_t). Inventory nodes R_t,(AB+,0), R_t,(AB+,1), R_t,(AB+,2), ..., R_t,(O−,0), R_t,(O−,1), R_t,(O−,2), over the eight blood types AB+, AB−, A+, A−, B+, B−, O+, O−, either satisfy a demand D̂_t,AB+, ..., D̂_t,O− or are held, producing the post-decision inventory R^x_t.)
Page 62
(Figure: the post-decision inventory R^x_t, aged one period, combines with new donations R̂_{t+1,AB+}, ..., R̂_{t+1,O−} to produce the next pre-decision inventory R_{t+1}, with ages 0–3.)
Page 63
(Figure: the single-period network from R_t through the assignment decisions against demands D̂_t to the post-decision inventory R^x_t.)
Page 64
F_t(R_t)
(Figure: the same single-period assignment network.)
Solve this as a
linear program.
Page 65
F_t(R_t)
(Figure: the same network, with dual variables π̂_t,(AB+,0), ..., π̂_t,(O−,2) on the inventory nodes.)
Duals: dual variables give the
value of an additional
unit of blood.
Page 66
Updating the value function approximation
Estimate the gradient π̂^n_t,(AB+,2) of F_t(R_t) at R^n_t,(AB+,2).
(Figure: slope of F_t(R_t) at the sampled resource level R^n_t.)
Page 67
Updating the value function approximation
Update the value function V̄^{n−1}_{t−1}(R^x_{t−1}) at R^{x,n}_{t−1} using the sampled gradient π̂^n_t,(AB+,2).
Page 68
Updating the value function approximation
Update the value function at R^{x,n}_{t−1}: the slope π̂^n_t,(AB+,2) is smoothed into V̄^{n−1}_{t−1}(R^x_{t−1}).
Page 69
Updating the value function approximation
Update the value function at R^{x,n}_{t−1}, giving the new approximation V̄^n_{t−1}(R^x_{t−1}).
Page 70
Exploiting concavity
Derivatives are used to estimate a piecewise linear approximation V̄_t(R_t).
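One way to maintain such a concave piecewise linear approximation is to smooth each sampled slope in and then restore concavity by averaging violating neighbors (a SPAR-style sketch; the slope values are invented, and the simple pooling pass is adequate for single-point updates):

```python
import numpy as np

def update_concave_vfa(v, r, vhat, alpha=0.2):
    """Update a piecewise linear, concave VFA stored as slopes
    v[0] >= v[1] >= ...  Smooth the slope at resource level r toward the
    sampled marginal value vhat, then average any neighbors that violate
    concavity."""
    v = v.copy()
    v[r] = (1 - alpha) * v[r] + alpha * vhat
    for i in range(r, 0, -1):            # pool violators below r
        if v[i] > v[i - 1]:
            v[i] = v[i - 1] = 0.5 * (v[i] + v[i - 1])
    for i in range(r, len(v) - 1):       # pool violators above r
        if v[i] < v[i + 1]:
            v[i] = v[i + 1] = 0.5 * (v[i] + v[i + 1])
    return v

v = np.array([10.0, 8.0, 6.0, 4.0])          # current slopes (concave)
v_new = update_concave_vfa(v, r=2, vhat=20.0)  # a large sample pushes slope 2 up
```

The concavity projection is what lets the resulting value functions be embedded directly in the linear program at each time period.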
Page 71
Iterative learning
Page 72
Iterative learning
Page 73
Iterative learning
Page 74
Iterative learning
Page 75
(Figure: objective function vs. iterations, rising from roughly 1.2 million to 1.9 million over 1,000 iterations.)
Approximate dynamic programming
… a typical performance graph.
Page 76
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 77
Lookahead policies
Planning your next chess move:
» You put your finger on the piece while you think about
moves into the future. This is a lookahead policy,
illustrated for a problem with discrete actions.
Page 79
Lookahead policies
Decision trees:
Page 80
Lookahead policies
Modeling lookahead policies
» Lookahead policies solve a lookahead model, which is an
approximation of the future.
» It is important to understand the difference between the:
• Base model – this is the model we are trying to solve by finding
the best policy. This is usually some form of simulator.
• The lookahead model, which is our approximation of the future
to help us make better decisions now.
» The base model is typically a simulator, or it might be the
real world.
Page 81
Emergency storm response
(Figure: distribution network with estimated outage probabilities on each segment, updated as customer calls arrive.)
Page 82
Emergency storm response
(Figure: the same network after more calls; the outage probabilities are updated.)
Page 83
Emergency storm response
(Figure: the same network with further updated outage probabilities.)
Page 84
(Decision tree: alternating decision and outcome nodes — Decision, Outcome, Decision, Outcome, Decision.)
Page 85
Lookahead policies
Monte Carlo tree search:
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S.
Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games,
vol. 4, no. 1, pp. 1–49, March 2012.
Page 87
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 88
Parametric cost function approximation
An energy storage problem:
Page 89
Parametric cost function approximation
Forecasts evolve over time as new information arrives:
Actual
Rolling forecasts,
updated each
hour.
Forecast made at
midnight:
Page 90
Parametric cost function approximation
Benchmark policy – Deterministic lookahead
Page 91
Parametric cost function approximation
Parametric cost function approximations
» Replace the constraint (x^{wr}_{tt'}, x^{wd}_{tt'} are planned wind-to-storage
and wind-to-demand flows; f^E_{tt'} is the wind energy forecast)

  x^{wr}_{tt'} + x^{wd}_{tt'} ≤ f^E_{tt'}

with:

  x^{wr}_{tt'} + x^{wd}_{tt'} ≤ θ_{t'−t} f^E_{tt'}

» Lookup table modified forecasts (one adjustment term for
each time in the future)
» Exponential function for adjustments (just two parameters)
» Constant adjustment (one parameter)
Page 92
(Figure: tuned adjustment parameters θ for the lookup table, constant-parameter, and exponential parameterizations, shown for forecast-error settings f_s = 0, 10, 20, 30.)
Page 93
Parametric cost function approximation
Improvement over deterministic benchmark:
(Figure: percent improvement for the lookup table, exponential, and constant adjustments.)
Page 94
An energy storage problem
Consider a basic energy storage problem:
» We are going to show that with minor variations in the
characteristics of this problem, we can make each class
of policy work best.
Page 95
An energy storage problem
We can create distinct flavors of this problem:
» Problem class 1 – Best for PFAs
• Highly stochastic (heavy-tailed) electricity prices
• Stationary data
» Problem class 2 – Best for CFAs
• Stochastic prices and wind (but not heavy-tailed)
• Stationary data
» Problem class 3 – Best for VFAs
• Stochastic wind and prices (but not too random)
• Time-varying loads, but inaccurate wind forecasts
» Problem class 4 – Best for deterministic lookaheads
• Relatively low-noise problem with accurate forecasts
» Problem class 5 – A hybrid policy worked best here
• Stochastic prices and wind, nonstationary data, noisy forecasts.
Page 96
An energy storage problem
The policies
» The PFA:
• Charge the battery when the price is below p1
• Discharge when the price is above p2
» The CFA:
• Optimize over a horizon H; maintain upper and lower bounds (u, l)
for every time period except the first (note that this is a hybrid with a
lookahead).
» The VFA:
• Piecewise linear, concave value function in terms of energy, indexed
by time.
» The lookahead (deterministic):
• Optimize over a horizon H (the only tunable parameter) using forecasts of
demand, prices and wind energy.
» The lookahead CFA:
• Use a deterministic lookahead policy, but with a tunable parameter
that improves robustness.
Page 97
An energy storage problem
Each policy is best on certain problems
» Results are percent of posterior optimal solution
» … any policy might be best depending on the data.
Joint research with Prof. Stephan Meisel, University of Muenster, Germany.
Page 98
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 99
From deterministic to stochastic
Imagine that you would like to solve the time-dependent
linear program:

  min_{x_0,...,x_T} Σ_{t=0}^T c_t x_t

» subject to

  A_0 x_0 = b_0
  A_t x_t − B_{t−1} x_{t−1} = b_t,  t ≥ 1.

We can convert this to a proper stochastic model by
replacing x_t with a policy X_t(S_t) and taking an expectation:

  min_π E { Σ_{t=0}^T c_t X_t(S_t) }

The policy has to satisfy A_t x_t = R_t with transition function:

  S_{t+1} = S^M(S_t, x_t, W_{t+1})
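A scalar sketch of this conversion (A_t = 1 so x_t = R_t is trivially feasible; the requirement process and all numbers are invented):

```python
import random

def X_pi(R):
    """A simple feasible policy: serve whatever resource is available,
    so A_t x_t = R_t holds with A_t = 1."""
    return R

def evaluate(c, T=20, seed=0):
    """One sample path of sum_t c_t * X^pi(S_t); here c_t = c is constant."""
    rng = random.Random(seed)
    R, total = 5.0, 0.0
    for t in range(T):
        x = X_pi(R)
        total += c * x
        b_next = rng.uniform(2.0, 8.0)    # exogenous W_{t+1}: random requirement
        R = max(0.0, b_next - 0.5 * x)    # R_{t+1} = b_{t+1} - B_t x_t, B_t = 0.5
    return total

# Monte Carlo estimate of E { sum_t c_t X_t(S_t) }:
avg = sum(evaluate(c=1.0, seed=s) for s in range(100)) / 100
```

The point is structural: the decision variable x_t of the LP becomes a function X_t(S_t), and the objective becomes an expectation over sample paths.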
Page 100
Modeling
Deterministic
» Objective function:
  min_{x_0,...,x_T} Σ_{t=0}^T c_t x_t
» Decision variables: x_0, ..., x_T
» Constraints:
• At time t: A_t x_t = R_t, x_t ≥ 0
• Transition function: R_{t+1} = b_{t+1} − B_t x_t
Stochastic
» Objective function:
  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }
» Policy: X^π : S → X
» Constraints at time t: x_t = X^π_t(S_t) ∈ X_t
» Transition function: S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information: (S_0, W_1, W_2, ..., W_T)
Page 101
From deterministic to stochastic
Deterministic problems
» Modeling is important, but not
central.
» Algorithms are the most
important, and hardest, part.
» Evaluating a solution? Huh?
Just add up the costs!!
Stochastic problems
» Modeling is the most
important, and hardest, aspect
of stochastic optimization.
» Searching for policies is
important, but less critical.
» Modeling uncertainty is often
overlooked, but is of central
importance.
» Evaluating a policy is
important, and difficult. In a
simulator? In the field?
Page 102
The universal objective function:

  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }

with

  S_{t+1} = S^M(S_t, x_t, W_{t+1})

You next need to develop a stochastic model:
» Model uncertainty about parameters in S_0
» Model the stochastic process W_1, W_2, ..., W_N (for training)
» Model the random variable W (for testing, if necessary)
Then search for policies:
» Policy search:
• PFAs, CFAs
» Lookahead policies:
• VFAs, DLAs
Page 103
Thank you!
For more information, go to
http://www.castlelab.princeton.edu/jungle/
Scroll to “Educational materials”
Page 107
Theory
Applications
Computation
Modeling