Page 1
Tutorial: A Unified Framework for
Optimization under Uncertainty
Enterprisewide Optimization Group, Carnegie Mellon University
September 11, 2018
Warren B. Powell
Princeton University, Department of Operations Research
and Financial Engineering
© 2018 Warren B. Powell, Princeton University
Page 2
Ad-click bidding
Roomsage.com
Page 3
Revenue management
Earning vs. learning
» You want to maximize
revenues, but you do not
know how demand
responds to price.
You earn the most with
prices near the middle, but
you do not learn anything.
You learn the most by sampling
endpoints, but then you do not
earn anything.
Page 4
Learning problems
Health sciences
» Sequential design of
experiments for drug discovery
» Drug delivery – Optimizing the
design of protective
membranes to control drug
release
» Medical decision making –
Optimal learning for medical
treatments.
Page 5
Drug discovery
Designing molecules
» X and Y are sites where we can hang substituents to change the
behavior of the molecule. We approximate the performance using
a linear belief model:

Y = θ_0 + Σ_{sites i} Σ_{substituents j} θ_ij X_ij
» How to sequence experiments to
learn the best molecule as quickly
as possible?
(Figure: regret over experiments.)
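A minimal sketch of evaluating such a linear belief model (the sites, substituents, and coefficient values below are invented for illustration):

```python
import numpy as np

def predict_performance(theta0, theta, X):
    """Predicted molecule performance Y = theta_0 + sum_ij theta_ij X_ij,
    where X[i][j] = 1 if substituent j is attached at site i."""
    return theta0 + float(np.sum(theta * X))

theta0 = 0.5
theta = np.array([[0.2, -0.1],   # site 1: coefficients for substituents A, B
                  [0.4,  0.3]])  # site 2: coefficients for substituents A, B
X = np.array([[1, 0],            # substituent A at site 1
              [0, 1]])           # substituent B at site 2
y_hat = predict_performance(theta0, theta, X)
```

Sequential experimental design then chooses the next (X) to test based on how it would update the beliefs about θ.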
Page 6
Ride sharing
Uber/Lyft
» Provides real-time, on-demand
transportation.
» Drivers are encouraged to enter or leave
the system using pricing signals and
informational guidance.
Decisions:
» How to price to get the right balance of
drivers relative to customers.
» Real-time management of drivers.
» Policies (rules for managing drivers,
customers, …)
Page 7
Matching buyers with sellers
Now we have a logistic curve for
each origin-destination pair (i,j):

P^Y(p_aij | θ_ij) = e^{θ_ij0 + θ_ij1 p_aij} / (1 + e^{θ_ij0 + θ_ij1 p_aij})

Number of offers for each (i,j) pair
is relatively small.
Need to generalize the learning
across hundreds to thousands of
markets.
(Figure: probability that a buyer accepts a seller's offered price.)
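The logistic acceptance curve can be computed directly; a small sketch (the θ values are illustrative, not fitted):

```python
import math

def p_accept(price, theta0, theta1):
    """Probability a buyer accepts a seller's offered price:
    P(Y=1 | p) = e^(theta0 + theta1*p) / (1 + e^(theta0 + theta1*p))."""
    z = theta0 + theta1 * price
    return math.exp(z) / (1.0 + math.exp(z))

# Higher prices reduce the acceptance probability when theta1 < 0.
p = p_accept(price=10.0, theta0=4.0, theta1=-0.5)
```

With few offers per market, the θ_ij would be estimated jointly (e.g. hierarchically) so that learning generalizes across markets.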
Page 8
Emergency storm response
Hurricane Sandy
» Once in 100 years?
» Rare convergence of events
» But, meteorologists did an
amazing job of forecasting
the storm.
The power grid
» Loss of power creates
cascading failures (lack of
fuel, inability to pump water)
» How to plan?
» How to react?
Page 9
Meeting variability with portfolios of generation
with mixtures of dispatchability
Page 10
Storage applications
How much energy to store in a battery to handle the
volatility of wind and spot prices to meet demands?
Page 11
Modeling
Before we can solve complex problems, we have
to know how to think about them.
The biggest challenge when making decisions
under uncertainty is modeling.
Mathematician:
min E{cx}
subject to Ax = b, x ≥ 0
Software:
Organize class libraries, and set up
communications and databases.
Page 12
Modeling
For deterministic problems, we speak the language
of mathematical programming
» Linear programming:

min_x cx
subject to Ax = b, x ≥ 0

» For time-staged problems:

min_{x_0,...,x_T} Σ_{t=0}^T c_t x_t
subject to
A_t x_t − B_{t−1} x_{t−1} = b_t
D_t x_t ≤ u_t
x_t ≥ 0
Arguably Dantzig’s biggest
contribution, more so than the
simplex algorithm, was his
articulation of optimization
problems in a standard format,
which has given algorithmic
researchers a common
language.
Page 13
Stochastic programming
Markov decision processes
Reinforcement learning
Optimal control
Model predictive control
Robust optimization
Approximate dynamic programming
Online computation
Simulation optimization
Stochastic search
Decision analysis
Stochastic control
Dynamic programming and control
Optimal learning
Bandit problems
Page 15
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 16
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 17
Modeling dynamic systems
All sequential decision problems can be modeled
using five core components:
» State variables – What do we need to know at time t?
» Decision variables – What are our decisions?
» Exogenous information – What do we learn for the first time between t and t+1?
» Transition function – How do the state variables evolve over time?
» Objective function – What are our performance metrics?
Page 18
Modeling dynamic problems
The state variable:
Controls community:
"Information state" x_t
Operations research/MDP/Computer science:
S_t = (R_t, I_t, B_t) = System state, where:
R_t = Resource state (physical state)
  Location/status of truck/train/plane
  Energy in storage
I_t = Information state
  Prices
  Weather
B_t = Belief state ("state of knowledge")
  Belief about performance of a drug or catalyst
  Belief about the status of equipment
Page 19
The state variable
My definition of a state variable:
» The first depends on a policy. The second depends
only on the problem (and includes the constraints).
» Using either definition, all properly modeled problems
are Markovian!
Page 20
Modeling dynamic problems
Decisions:
a_t – Markov decision processes/Computer science:
  Discrete action
u_t – Control theory:
  Low-dimensional continuous vector
x_t – Operations research:
  Usually a discrete or continuous but high-dimensional
  vector of decisions.
At this point, we do not specify how to make a decision.
Instead, we define the function X^π(s) (or A^π(s) or U^π(s)),
where π specifies the type of policy. "π" carries information
about the type of function f ∈ F, and any tunable parameters θ.
Page 21
The decision variables
Styles of decisions
» Binary: x ∈ X = {0, 1}
» Finite: x ∈ X = {1, 2, ..., M}
» Continuous scalar: x ∈ X = [a, b]
» Continuous vector: x = (x_1, ..., x_K), x_k ∈ R
» Discrete vector: x = (x_1, ..., x_K), x_k ∈ Z
» Categorical: x = (a_1, ..., a_I), a_i is a category (e.g. red/green/blue)
Page 22
Modeling dynamic problems
Exogenous information:
W_t = New information that first became known at time t
    = (R̂_t, D̂_t, p̂_t, Ê_t)
R̂_t = Equipment failures, delays, new arrivals;
      new drivers being hired to the network
D̂_t = New customer demands
p̂_t = Changes in prices
Ê_t = Information about the environment (temperature, ...)

Note: Any variable indexed by t is known at time t. This convention,
which is not standard in control theory, dramatically simplifies the
modeling of information.

Below, we let W_1, W_2, ... represent a sequence of actual observations.
W_t(ω) refers to a sample realization of the random variable W_t.
W W
Page 23
Modeling dynamic problems
The transition function:

S_{t+1} = S^M(S_t, x_t, W_{t+1})

Inventories:    R_{t+1} = R_t + x_t + R̂_{t+1}
Spot prices:    p_{t+1} = p_t + p̂_{t+1}
Market demands: D_{t+1} = D_t + D̂_{t+1}
Also known as the:
“System model”
“State transition model”
“Plant model”
“Plant equation”
“Transition law”
“State equation”
“Transfer function”
“Transformation function”
“Law of motion”
“Model”
For many applications, these equations are unknown. This
is known as “model-free” dynamic programming.
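For the inventory/price/demand example above, the transition function S^M can be sketched directly (the max(0, ·) truncations are an added assumption to keep inventories and demands nonnegative):

```python
def transition(S, x, W):
    """One step of S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the storage example:
    inventory, spot price, and market demand, each updated by new information."""
    R, p, D = S
    R_hat, p_hat, D_hat = W            # exogenous: new supply, price change, demand change
    R_next = max(0.0, R + x + R_hat)   # R_{t+1} = R_t + x_t + Rhat_{t+1}
    p_next = p + p_hat                 # p_{t+1} = p_t + phat_{t+1}
    D_next = max(0.0, D + D_hat)       # D_{t+1} = D_t + Dhat_{t+1}
    return (R_next, p_next, D_next)

S1 = transition(S=(5.0, 30.0, 2.0), x=1.0, W=(0.5, -2.0, 1.0))
```

In "model-free" settings this function is unknown, and we can only observe the next state.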
Page 24
Objective functions
Modeling stochastic, dynamic problems
» Cumulative reward ("online learning")
• Policies have to work well over time.

  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }

» Final reward ("offline learning")
• We only care about how well the final decision x^{π,N} works.

  max_π E { F(x^{π,N}, Ŵ) | S_0 }

» Risk (ρ is a risk measure):

  max_π ρ( C(S_0, X^π_0(S_0)), C(S_1, X^π_1(S_1)), ..., C(S_T, X^π_T(S_T)) | S_0 )
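Since the expectation is rarely computable, the cumulative-reward objective is typically estimated by simulating the policy; a generic sketch (the toy model at the bottom is invented, and happens to be deterministic so the result is reproducible):

```python
import random

def simulate_policy(policy, transition, contribution, S0, T, seed=0):
    """One sample-path estimate of the cumulative reward
    sum_t C(S_t, X^pi(S_t), W_{t+1}). The model functions are supplied."""
    rng = random.Random(seed)
    S, total = S0, 0.0
    for t in range(T):
        x = policy(S)
        W = rng.gauss(0.0, 1.0)              # exogenous information W_{t+1}
        total += contribution(S, x, W)
        S = transition(S, x, W)
    return total

# Toy inventory model: order up to 5, pay 0.1 per unit, sell up to 2 per period.
policy = lambda S: max(0, 5 - S)
transition = lambda S, x, W: max(0, S + x - 2)   # W ignored: deterministic toy
contribution = lambda S, x, W: -0.1 * x + min(S + x, 2.0)
total = simulate_policy(policy, transition, contribution, S0=0, T=10)
```

Averaging over many seeds gives a Monte Carlo estimate of E{·}.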
Page 25
The complete model:
Modeling stochastic, dynamic problems
» Objective function
• Cumulative reward ("online learning"):
  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }
• Final reward ("offline learning"):
  max_π E { F(x^{π,N}, Ŵ) | S_0 }
• Risk:
  max_π ρ( C(S_0, X^π_0(S_0)), C(S_1, X^π_1(S_1)), ..., C(S_T, X^π_T(S_T)) | S_0 )
» Transition function:
  S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:
  (S_0, W_1, W_2, ..., W_T)
Page 26
The modeling process
Modeling real applications
» I conduct a conversation with a domain expert to fill in
the elements of a problem:
State variables – What we need to know
(and only what we need)
Decision variables – What we control
New information – What we didn't know
when we made our decision
Transition function – How the state variables evolve
Objective function – Performance metrics
Page 27
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 28
Modeling uncertainty
Classes of uncertainty
» Observational uncertainty
» Prognostic uncertainty (forecasting)
» Experimental noise/variability
» Transitional uncertainty
» Inferential uncertainty
» Model uncertainty
» Systematic exogenous uncertainty
» Control/implementation uncertainty
» Algorithmic noise
» Goal uncertainty
Modeling uncertainty in the context of stochastic optimization is a
relatively untapped area of research.
Page 29
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 30
Designing policies
We have to start by describing what we mean by a
policy.
» Definition:
A policy is a mapping from a state to an action.
… any mapping.
How do we search over an arbitrary space of
policies?
Page 31
Designing policies
“Policies” and the English language
Behavior Habit Procedure
Belief Laws/bylaws Process
Bias Manner Protocols
Commandment Method Recipe
Conduct Mode Ritual
Convention Mores Rule
Culture Patterns Style
Customs Plans Technique
Dogma Policies Tenet
Etiquette Practice Tradition
Fashion Prejudice Way of life
Formula Principle
Page 32
Designing policies
Two fundamental strategies for finding policies:
1) Policy search – Search over a class of functions for
making decisions to optimize some metric:

  max_{π=(f,θ^f): f∈F, θ^f∈Θ^f} E { Σ_{t=0}^T C(S_t, X^π(S_t | θ^f)) | S_0 }

2) Lookahead approximations – Approximate the impact
of a decision now on the future:

  X*_t(S_t) = arg max_{x_t} ( C(S_t, x_t)
      + E { max_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π(S_{t'})) | S_{t+1} } | S_t, x_t } )
Page 33
Designing policies
Policy search:
1a) Policy function approximations (PFAs): x_t = X^{PFA}(S_t | θ)
• Lookup tables
– "when in this state, take this action"
• Parametric functions
– Order-up-to policies: if inventory is less than s, order up to S.
– Affine policies: X^{PFA}(S_t | θ) = Σ_{f∈F} θ_f φ_f(S_t)
– Neural networks
• Locally/semi/non parametric
– Requires optimizing over local regions
1b) Cost function approximations (CFAs)
• Optimizing a deterministic model modified to handle uncertainty
(buffer stocks, schedule slack):

  X^{CFA}(S_t | θ) = arg max_{x ∈ X^π_t(θ)} C̄^π(S_t, x | θ)
Page 34
Designing policies
Lookahead approximations – Approximate the impact of a
decision now on the future:
2a) Approximating the value of being in a state (VFA):

  X*_t(S_t) = arg max_{x_t} ( C(S_t, x_t) + E { V_{t+1}(S_{t+1}) | S_t, x_t } )
  X^{VFA}_t(S_t) = arg max_{x_t} ( C(S_t, x_t) + E { V̄_{t+1}(S_{t+1}) | S_t, x_t } )
              = arg max_{x_t} ( C(S_t, x_t) + V̄^x_t(S^x_t) )

2b) Direct lookahead (DLA)
Optimal policy:

  X*_t(S_t) = arg max_{x_t} ( C(S_t, x_t)
      + E { max_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π(S_{t'})) | S_{t+1} } | S_t, x_t } )

Approximate policy – solve an approximate lookahead model:

  X^{DLA}_t(S_t) = arg max_{x_t} ( C(S_t, x_t)
      + Ẽ { max_π̃ Ẽ { Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, X̃^π̃(S̃_{tt'})) | S̃_{t,t+1} } | S_t, x_t } )
Page 35
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
» X^{CFA}(S_t | θ) = arg max_{x ∈ X^π_t(θ)} C̄^π(S_t, x | θ)
3) Policies based on value function approximations (VFAs)
» X^{VFA}_t(S_t) = arg max_{x_t} ( C(S_t, x_t) + V̄^x_t(S^x_t) )
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control:
  X^{LA-D}_t(S_t) = arg max_{x_{tt},...,x_{t,t+H}} Σ_{t'=t}^{t+H} C(S̃_{tt'}, x̃_{tt'})
» Chance constrained programming:
  P[A_t x_t ≥ f(W_t)] ≥ 1 − θ
» Stochastic lookahead/stochastic prog./Monte Carlo tree search:
  X^{LA-S}_t(S_t) = arg max_{x_{tt},(x̃_{t,t+1}(ω),...,x̃_{t,t+H}(ω))} Σ_{ω∈Ω̃_t} p(ω) Σ_{t'=t}^{t+H} C(S̃_{tt'}(ω), x̃_{tt'}(ω))
» "Robust optimization":
  X^{LA-RO}_t(S_t) = arg max_{x_{tt},...,x_{t,t+H}} min_{w∈W_t(θ)} Σ_{t'=t}^{t+H} C(S̃_{tt'}(w), x̃_{tt'}(w))
(Classes 1–2 are designed by policy search; classes 3–4 by lookahead approximations.)
Page 36
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
3) Policies based on value function approximations (VFAs)
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead/stochastic prog./Monte Carlo tree search
» "Robust optimization"
(Same slide as above, highlighting the role of function approximation.)
Page 37
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
3) Policies based on value function approximations (VFAs)
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead/stochastic prog./Monte Carlo tree search
» "Robust optimization"
(Same slide as above, highlighting the embedded optimization within each policy.)
Page 38
Learning problems
Classes of learning problems in stochastic
optimization
1) Approximating the objective
𝐹(𝑥|𝜃) ≈ 𝔼𝐹(𝑥,𝑊).
2) Designing a policy 𝑋𝜋 𝑆 𝜃 .
3) A value function approximation
𝑉𝑡(𝑆𝑡|𝜃) ≈ 𝑉𝑡(𝑆𝑡).
4) Designing a cost function approximation:
• The objective function C̄^π(S_t, x_t | θ)
• The constraints X^π(S_t | θ)
5) Approximating the transition function
𝑆^𝑀(𝑆_𝑡, 𝑥_𝑡, 𝑊_{𝑡+1} | 𝜃) ≈ 𝑆^𝑀(𝑆_𝑡, 𝑥_𝑡, 𝑊_{𝑡+1})
Page 39
Approximation strategies
» Lookup tables
• Independent beliefs
• Correlated beliefs
» Linear parametric models
• Linear models
• Sparse-linear
• Tree regression
» Nonlinear parametric models
• Logistic regression
• Neural networks
» Nonparametric models
• Gaussian process regression
• Kernel regression
• Support vector machines
• Deep neural networks
Page 40
Learning challenges
The learning challenge
From big (batch) data… … to recursive learning
Page 41
Learning challenges
Variable-dimensional learning
Page 42
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 43
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 44
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 45
Policy function approximations
Battery arbitrage – When to charge, when to
discharge, given volatile LMPs
Page 46
(Figure: hourly electricity prices over several days.)
Grid operators require that batteries bid charge and
discharge prices, an hour in advance.
We have to search for the best values for the policy
parameters θ^{Charge} and θ^{Discharge}.
Policy function approximations
Page 47
Policy function approximations
Our policy function might be the parametric
model (this is nonlinear in the parameters):

X^π(S_t | θ) =  +1 if p_t < θ^{charge}                (charge)
                 0 if θ^{charge} ≤ p_t ≤ θ^{discharge}  (hold)
                −1 if p_t > θ^{discharge}              (discharge)

Energy in storage: R_t
Price of electricity: p_t
Page 48
Policy function approximations
Finding the best policy
» We need to maximize

  max_θ F(θ) = E { Σ_{t=0}^T C(S_t, X^π(S_t | θ)) }

» We cannot compute the expectation, so we run simulations:
(Figure: simulated F(θ) over the (θ^{Charge}, θ^{Discharge}) plane.)
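A sketch of this simulation-based search (the price process, capacity limit, and parameter grid are illustrative assumptions, not values from the slides):

```python
import itertools
import random

def battery_policy(p, theta_charge, theta_discharge):
    """PFA: charge when the price is low, discharge when it is high."""
    if p < theta_charge:
        return +1      # charge
    if p > theta_discharge:
        return -1      # discharge
    return 0           # hold

def simulate(theta_charge, theta_discharge, T=1000, seed=1):
    """One sample estimate of F(theta): discharge revenue minus charging cost."""
    rng = random.Random(seed)
    R, profit = 0, 0.0
    for t in range(T):
        p = max(0.0, rng.gauss(40.0, 15.0))   # illustrative price process
        x = battery_policy(p, theta_charge, theta_discharge)
        if x == +1 and R < 10:                # illustrative capacity limit
            R += 1; profit -= p
        elif x == -1 and R > 0:
            R -= 1; profit += p
    return profit

# Crude policy search: grid over (theta_charge, theta_discharge).
best = max(itertools.product(range(10, 40, 5), range(45, 80, 5)),
           key=lambda th: simulate(*th))
```

In practice the search would use common random numbers across θ values and a smarter optimizer than a grid.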
Page 49
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 50
Cost function approximations
Lookup table
» We can organize potential catalysts into groups
» Scientists using domain knowledge can estimate
correlations in experiments between similar catalysts.
Page 51
Cost function approximations
Correlated beliefs: Testing one material teaches us about other
materials.
(Figure: belief curves over five alternatives.)
Page 52
Cost function approximations
Cost function approximations (CFA)
» Upper confidence bounding:

  X^{UCB}(S^n | θ^{UCB}) = arg max_x ( μ̄^n_x + θ^{UCB} sqrt( log n / N^n_x ) )

» Interval estimation:

  X^{IE}(S^n | θ^{IE}) = arg max_x ( μ̄^n_x + θ^{IE} σ̄^n_x )

» Boltzmann exploration ("soft max")
• Choose x with probability:

  P^n(x) = e^{θ μ̄^n_x} / Σ_{x'} e^{θ μ̄^n_{x'}}
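The three policies can be compared on the same beliefs; a small sketch with invented means, standard deviations, counts, and θ values:

```python
import numpy as np

mu = np.array([1.0, 1.2, 0.9])      # estimated means  mu_x^n
sigma = np.array([0.1, 0.3, 0.6])   # posterior std devs sigma_x^n
N = np.array([20, 10, 4])           # times each alternative was tried
n = int(N.sum())

# Upper confidence bound with theta_UCB = 1:
ucb = int(np.argmax(mu + 1.0 * np.sqrt(np.log(n) / N)))
# Interval estimation with theta_IE = 2:
ie = int(np.argmax(mu + 2.0 * sigma))
# Boltzmann probabilities with theta = 3:
boltz = np.exp(3.0 * mu) / np.exp(3.0 * mu).sum()
```

Here both UCB and IE pick the under-explored, high-variance alternative 2, while Boltzmann spreads probability over all three.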
Page 53
Cost function approximations
Picking θ^{IE} = 0 means we are evaluating each choice
at the mean.
(Figure: belief curves over five alternatives.)
Page 54
Cost function approximations
Picking θ^{IE} = 2 means we are evaluating each choice
at the 95th percentile.
(Figure: belief curves over five alternatives.)
Page 55
Cost function approximations
Optimizing the policy
» We optimize θ^{IE} to maximize:

  max_{θ^{IE}} F(θ^{IE}) = E { F(x^{π,N}, Ŵ) }

where

  x^n = X^{IE}(S^n | θ^{IE}) = arg max_x ( μ̄^n_x + θ^{IE} σ̄^n_x )

Notes:
» This can handle any belief model,
including correlated beliefs, nonlinear
belief models.
» All we require is that we be able to
simulate a policy.
Page 56
Cost function approximations
Other applications
» Airlines optimizing schedules with schedule slack to
handle weather uncertainty.
» Manufacturers using buffer stocks to hedge against
production delays and quality problems.
» Grid operators scheduling extra generation capacity in
case of outages.
» Adding time to a trip planned by Google maps to
account for uncertain congestion.
Page 57
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 58
Value function approximations
Q-learning (for discrete actions):

  q̂^n(s^n, a^n) = r(s^n, a^n) + γ max_{a'} Q̄^{n−1}(s', a')
  Q̄^n(s^n, a^n) = (1 − α_{n−1}) Q̄^{n−1}(s^n, a^n) + α_{n−1} q̂^n(s^n, a^n)

» But what if the action a is a vector?
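A minimal tabular Q-learning sketch for discrete actions (the toy chain problem and exploration scheme are invented; a fixed step size α stands in for the declining α_{n−1}):

```python
import random
from collections import defaultdict

def q_learning(transition, reward, actions, S0, n_iters=5000,
               gamma=0.9, alpha=0.1, eps=0.5, seed=0):
    """Tabular Q-learning: qhat = r(s,a) + gamma * max_a' Qbar(s',a'),
    then Qbar <- (1 - alpha) * Qbar + alpha * qhat."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    s = S0
    for _ in range(n_iters):
        if rng.random() < eps:                                  # explore
            a = rng.choice(actions)
        else:                                                   # exploit
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = transition(s, a, rng)
        q_hat = reward(s, a) + gamma * max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * q_hat
        s = s_next
    return Q

# Toy chain: states 0..3; a reward is collected while sitting at state 3.
Q = q_learning(
    transition=lambda s, a, rng: min(3, max(0, s + a)),
    reward=lambda s, a: 1.0 if s == 3 else 0.0,
    actions=[-1, +1], S0=0)
```

The lookup table Q[(s, a)] is exactly what breaks when a is a high-dimensional vector: we can no longer enumerate the actions inside the max.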
Page 59
Blood management
Managing blood inventories
Page 60
Blood management
Managing blood inventories over time
(Figure: the sequence S_0, x_0, W_1 = (R̂_1, D̂_1), S_1, x_1, S^x_1, W_2 = (R̂_2, D̂_2), S_2, x_2, S^x_2, W_3 = (R̂_3, D̂_3), S_3, x_3, unfolding over weeks 0, 1, 2, 3.)
Page 61
(Figure: assignment network for S_t = (R_t, D̂_t). Inventory nodes R_t,(AB+,0), R_t,(AB+,1), R_t,(AB+,2), ..., R_t,(O−,0), R_t,(O−,1), R_t,(O−,2), over the eight blood types AB+, AB−, A+, A−, B+, B−, O+, O−, either satisfy a demand D̂_t,AB+, ..., D̂_t,O− or are held, producing the post-decision inventory R^x_t.)
Page 62
(Figure: the post-decision inventory R^x_t, aged one period, combines with new donations R̂_{t+1,AB+}, ..., R̂_{t+1,O−} to produce the next pre-decision inventory R_{t+1}, with ages 0–3.)
Page 63
(Figure: the single-period network from R_t through the assignment decisions against demands D̂_t to the post-decision inventory R^x_t.)
Page 64
F_t(R_t)
(Figure: the same single-period assignment network.)
Solve this as a
linear program.
Page 65
F_t(R_t)
(Figure: the same network, with dual variables π̂_t,(AB+,0), ..., π̂_t,(O−,2) on the inventory nodes.)
Duals: dual variables give the
value of an additional
unit of blood.
Page 66
Updating the value function approximation
Estimate the gradient π̂^n_t,(AB+,2) of F_t(R_t) at R^n_t,(AB+,2).
(Figure: slope of F_t(R_t) at the sampled resource level R^n_t.)
Page 67
Updating the value function approximation
Update the value function V̄^{n−1}_{t−1}(R^x_{t−1}) at R^{x,n}_{t−1} using the sampled gradient π̂^n_t,(AB+,2).
Page 68
Updating the value function approximation
Update the value function at R^{x,n}_{t−1}: the slope π̂^n_t,(AB+,2) is smoothed into V̄^{n−1}_{t−1}(R^x_{t−1}).
Page 69
Updating the value function approximation
Update the value function at R^{x,n}_{t−1}, giving the new approximation V̄^n_{t−1}(R^x_{t−1}).
Page 70
Exploiting concavity
Derivatives are used to estimate a piecewise linear approximation V̄_t(R_t).
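One way to maintain such a concave piecewise linear approximation is to smooth each sampled slope in and then restore concavity by averaging violating neighbors (a SPAR-style sketch; the slope values are invented, and the simple pooling pass is adequate for single-point updates):

```python
import numpy as np

def update_concave_vfa(v, r, vhat, alpha=0.2):
    """Update a piecewise linear, concave VFA stored as slopes
    v[0] >= v[1] >= ...  Smooth the slope at resource level r toward the
    sampled marginal value vhat, then average any neighbors that violate
    concavity."""
    v = v.copy()
    v[r] = (1 - alpha) * v[r] + alpha * vhat
    for i in range(r, 0, -1):            # pool violators below r
        if v[i] > v[i - 1]:
            v[i] = v[i - 1] = 0.5 * (v[i] + v[i - 1])
    for i in range(r, len(v) - 1):       # pool violators above r
        if v[i] < v[i + 1]:
            v[i] = v[i + 1] = 0.5 * (v[i] + v[i + 1])
    return v

v = np.array([10.0, 8.0, 6.0, 4.0])          # current slopes (concave)
v_new = update_concave_vfa(v, r=2, vhat=20.0)  # a large sample pushes slope 2 up
```

The concavity projection is what lets the resulting value functions be embedded directly in the linear program at each time period.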
Page 71
Iterative learning
Page 72
Iterative learning
Page 73
Iterative learning
Page 74
Iterative learning
Page 75
(Figure: objective function vs. iterations, rising from roughly 1.2 million to 1.9 million over 1,000 iterations.)
Approximate dynamic programming
… a typical performance graph.
Page 76
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 77
Lookahead policies
Planning your next chess move:
» You put your finger on the piece while you think about
moves into the future. This is a lookahead policy,
illustrated for a problem with discrete actions.
Page 79
Lookahead policies
Decision trees:
Page 80
Lookahead policies
Modeling lookahead policies
» Lookahead policies solve a lookahead model, which is an
approximation of the future.
» It is important to understand the difference between the:
• Base model – this is the model we are trying to solve by finding
the best policy. This is usually some form of simulator.
• The lookahead model, which is our approximation of the future
to help us make better decisions now.
» The base model is typically a simulator, or it might be the
real world.
Page 81
Emergency storm response
(Figure: distribution network with estimated outage probabilities on each segment, updated as customer calls arrive.)
Page 82
Emergency storm response
(Figure: the same network after more calls; the outage probabilities are updated.)
Page 83
Emergency storm response
(Figure: the same network with further updated outage probabilities.)
Page 84
(Decision tree: alternating decision and outcome nodes — Decision, Outcome, Decision, Outcome, Decision.)
Page 85
Lookahead policies
Monte Carlo tree search:
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S.
Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games,
vol. 4, no. 1, pp. 1–49, March 2012.
Page 87
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
Page 88
Parametric cost function approximation
An energy storage problem:
Page 89
Parametric cost function approximation
Forecasts evolve over time as new information arrives:
Actual
Rolling forecasts,
updated each
hour.
Forecast made at
midnight:
Page 90
Parametric cost function approximation
Benchmark policy – Deterministic lookahead
Page 91
Parametric cost function approximation
Parametric cost function approximations
» Replace the constraint (x^{wr}_{tt'}, x^{wd}_{tt'} are planned wind-to-storage
and wind-to-demand flows; f^E_{tt'} is the wind energy forecast)

  x^{wr}_{tt'} + x^{wd}_{tt'} ≤ f^E_{tt'}

with:

  x^{wr}_{tt'} + x^{wd}_{tt'} ≤ θ_{t'−t} f^E_{tt'}

» Lookup table modified forecasts (one adjustment term for
each time in the future)
» Exponential function for adjustments (just two parameters)
» Constant adjustment (one parameter)
Page 92
(Figure: tuned adjustment parameters θ for the lookup table, constant-parameter, and exponential parameterizations, shown for forecast-error settings f_s = 0, 10, 20, 30.)
Page 93
Parametric cost function approximation
Improvement over deterministic benchmark:
(Figure: percent improvement for the lookup table, exponential, and constant adjustments.)
Page 94
An energy storage problem
Consider a basic energy storage problem:
» We are going to show that with minor variations in the
characteristics of this problem, we can make each class
of policy work best.
Page 95
An energy storage problem
We can create distinct flavors of this problem:
» Problem class 1 – Best for PFAs
• Highly stochastic (heavy-tailed) electricity prices
• Stationary data
» Problem class 2 – Best for CFAs
• Stochastic prices and wind (but not heavy-tailed)
• Stationary data
» Problem class 3 – Best for VFAs
• Stochastic wind and prices (but not too random)
• Time-varying loads, but inaccurate wind forecasts
» Problem class 4 – Best for deterministic lookaheads
• Relatively low-noise problem with accurate forecasts
» Problem class 5 – A hybrid policy worked best here
• Stochastic prices and wind, nonstationary data, noisy forecasts.
Page 96
An energy storage problem
The policies
» The PFA:
• Charge the battery when the price is below p1
• Discharge when the price is above p2
» The CFA:
• Optimize over a horizon H; maintain upper and lower bounds (u, l)
for every time period except the first (note that this is a hybrid with a
lookahead).
» The VFA:
• Piecewise linear, concave value function in terms of energy, indexed
by time.
» The lookahead (deterministic):
• Optimize over a horizon H (the only tunable parameter) using forecasts of
demand, prices and wind energy.
» The lookahead CFA:
• Use a deterministic lookahead policy, but with a tunable parameter
that improves robustness.
Page 97
An energy storage problem
Each policy is best on certain problems
» Results are percent of posterior optimal solution
» … any policy might be best depending on the data.
Joint research with Prof. Stephan Meisel, University of Muenster, Germany.
Page 98
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
Page 99
From deterministic to stochastic
Imagine that you would like to solve the time-dependent
linear program:

  min_{x_0,...,x_T} Σ_{t=0}^T c_t x_t

» subject to

  A_0 x_0 = b_0
  A_t x_t − B_{t−1} x_{t−1} = b_t,  t ≥ 1.

We can convert this to a proper stochastic model by
replacing x_t with a policy X_t(S_t) and taking an expectation:

  min_π E { Σ_{t=0}^T c_t X_t(S_t) }

The policy has to satisfy A_t x_t = R_t with transition function:

  S_{t+1} = S^M(S_t, x_t, W_{t+1})
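A scalar sketch of this conversion (A_t = 1 so x_t = R_t is trivially feasible; the requirement process and all numbers are invented):

```python
import random

def X_pi(R):
    """A simple feasible policy: serve whatever resource is available,
    so A_t x_t = R_t holds with A_t = 1."""
    return R

def evaluate(c, T=20, seed=0):
    """One sample path of sum_t c_t * X^pi(S_t); here c_t = c is constant."""
    rng = random.Random(seed)
    R, total = 5.0, 0.0
    for t in range(T):
        x = X_pi(R)
        total += c * x
        b_next = rng.uniform(2.0, 8.0)    # exogenous W_{t+1}: random requirement
        R = max(0.0, b_next - 0.5 * x)    # R_{t+1} = b_{t+1} - B_t x_t, B_t = 0.5
    return total

# Monte Carlo estimate of E { sum_t c_t X_t(S_t) }:
avg = sum(evaluate(c=1.0, seed=s) for s in range(100)) / 100
```

The point is structural: the decision variable x_t of the LP becomes a function X_t(S_t), and the objective becomes an expectation over sample paths.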
Page 100
Modeling
Deterministic
» Objective function:
  min_{x_0,...,x_T} Σ_{t=0}^T c_t x_t
» Decision variables: x_0, ..., x_T
» Constraints:
• At time t: A_t x_t = R_t, x_t ≥ 0
• Transition function: R_{t+1} = b_{t+1} − B_t x_t
Stochastic
» Objective function:
  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }
» Policy: X^π : S → X
» Constraints at time t: x_t = X^π_t(S_t) ∈ X_t
» Transition function: S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information: (S_0, W_1, W_2, ..., W_T)
Page 101
From deterministic to stochastic
Deterministic problems
» Modeling is important, but not
central.
» Algorithms are the most
important, and hardest, part.
» Evaluating a solution? Huh?
Just add up the costs!!
Stochastic problems
» Modeling is the most
important, and hardest, aspect
of stochastic optimization.
» Searching for policies is
important, but less critical.
» Modeling uncertainty is often
overlooked, but is of central
importance.
» Evaluating a policy is
important, and difficult. In a
simulator? In the field?
Page 102
The universal objective function:

  max_π E { Σ_{t=0}^T C(S_t, X^π_t(S_t), W_{t+1}) | S_0 }

with

  S_{t+1} = S^M(S_t, x_t, W_{t+1})

You next need to develop a stochastic model:
» Model uncertainty about parameters in S_0
» Model the stochastic process W_1, W_2, ..., W_N (for training)
» Model the random variable W (for testing, if necessary)
Then search for policies:
» Policy search:
• PFAs, CFAs
» Lookahead policies:
• VFAs, DLAs
Page 103
Thank you!
For more information, go to
http://www.castlelab.princeton.edu/jungle/
Scroll to “Educational materials”
Page 107
Theory
Applications
Computation
Modeling