The Dynamic Energy Resource Model
Group Peer Review Committee, Lawrence Livermore National Laboratories
July 2007
Warren Powell, Alan Lamont, Jeffrey Stewart, Abraham George
© 2007 Warren B. Powell, Princeton University
© 2007 Warren B. Powell Slide 1
© 2007 Warren B. Powell Slide 2
Dynamic energy resource management
Questions:
» How will the market evolve in terms of the adoption of competing energy technologies?
  • How many windmills, and where?
  • How much ethanol capacity?
  • How will the capacity of coal, natural gas and oil evolve?
» What government policies should be implemented?
  • Carbon tax? Cap and trade?
  • Tax credits for windmills and solar panels?
  • Tax credits for ethanol?
» Where should we invest R&D dollars?
  • Ethanol or hydrogen?
  • Batteries or windmills?
  • Hydrogen production, storage or conversion?
© 2007 Warren B. Powell Slide 3
Dynamic energy resource management
Uncertainties:
» Technology:
  • Carbon sequestration
  • The cost of batteries, fuel cells, solar panels
  • The storage of hydrogen, efficiency of solar panels, …
» Climate:
  • Global and regional temperatures
  • Changing patterns of snow storage on mountains
  • Wind patterns
» Markets:
  • Global supplies of oil and natural gas
  • International consumption patterns
  • Domestic purchasing behaviors (SUVs?)
  • Tax policies
  • The price of oil and natural gas
© 2007 Warren B. Powell Slide 4
Dynamic energy resource management
Research challenges:
» Making decisions
  • Finding the best decisions (capacity decisions, R&D decisions, government policies) requires solving high-dimensional stochastic, dynamic programs.
  • How do we obtain practical solutions to stochastic, dynamic programs which exhibit state variables with millions of dimensions?
» Modeling multiple time scales
  • We have to represent wind, temperature, rain and snowfall, market prices and government policies.
  • This requires modeling hourly, daily, seasonal and yearly dynamics.
» Modeling multiple levels of resolution
  • Spatial: We need to represent the location of windmills at state, regional and county levels.
  • Behavioral: We need to capture the differences between travel behavior patterns (long commutes vs. short trips, commercial fleet vehicles vs. personal use), or the difference between light and heavy industrial power use.
© 2007 Warren B. Powell Slide 5
Dynamic energy resource management
Alternative ways of solving large stochastic optimization problems:
» Simulation using myopic policies: rules determine decisions based on the current state of the system. Rules are hard to design, and decisions made now do not consider their impact on the future.
» Deterministic optimization: ignores uncertainty (and problems are still very large scale).
» Rolling horizon procedures: use point estimates of what might happen in the future. Will not produce robust behaviors.
» Stochastic programming: cannot handle multiple sources of uncertainty over multiple time periods.
» Markov decision processes: discrete state, discrete action will not scale (“curse of dimensionality”).
© 2007 Warren B. Powell Slide 6
Dynamic energy resource management
Proposed approach: Approximate dynamic programming
» Our research combines mathematical programming, simulation and statistics in a dynamic programming framework.
  • Math programming handles high-dimensional decisions.
  • Simulation handles complex dynamics and high-dimensional information processes.
  • Statistical learning is used to improve decisions iteratively.
  • Solution strategy is highly intuitive and tends to mimic human behavior.
» Features:
  • Scales to very large problems.
  • Easily handles complex dynamics and information processes.
  • Rigorous theoretical foundation.
» Research challenge:
  • Calibrating the model.
  • Designing high quality policies using the tools of approximate dynamic programming.
  • Evaluating the quality of these policies.
© 2007 Warren B. Powell Slide 7
Outline
» My experiences
» A resource allocation model
» ADP and the post-decision state variable
» Illustration using blood management
» Laboratory and theoretical results
» The dynamic energy resource model
© 2007 Warren B. Powell Slide 9
Yellow Freight System
© 2004 Warren B. Powell, Princeton University
© 2007 Warren B. Powell Slide 14
The fractional jet ownership industry
© 2007 Warren B. Powell Slide 15
NetJets Inc.
© 2007 Warren B. Powell Slide 16
Planning for a risky world
Weather
• Robust design of emergency response networks.
• Design of financial instruments to hedge against weather emergencies to protect individuals, companies and municipalities.
• Design of sensor networks and communication systems to manage responses to major weather events.
Disease
• Models of disease propagation for response planning.
• Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
• Robust design of supply chains to mitigate the disruption of transportation systems.
© 2007 Warren B. Powell Slide 19
Energy management
Energy resource management
• How to balance investment in ethanol, windmills, nuclear, coal-to-hydrogen?
• When should we make multidecade commitments to evolving technologies?
• What is the pattern of demands?
• How will climate change affect adoption patterns?
Energy R&D portfolio planning
• Where should DOE, NSF, … invest R&D dollars for new technologies?
• How do we balance investments in different components of an energy technology pathway?
• How do we evaluate the probability of a successful R&D program?
• How do we solve multistage resource allocation problems for R&D problems?
© 2007 Warren B. Powell Slide 20
Part VII - CASTLE Lab News
CASTLE Lab News (Thursday, March 2, 1999; 75 cents)
New Modeling Language Captures Complexities of Real-World Operations!
Spans the gap between simulation and optimization.
CASTLE Lab announced the development of a powerful new simulation environment for modeling complex operations in transportation and logistics. The dissertation of Dr. Joel Shapiro, it offers the flexibility of simulation environments, but the intelligence of optimization. The modeling language will allow managers to quickly test (continued on page 3)
© 2007 Warren B. Powell Slide 23
A resource allocation model
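Attribute vectors:
\[
a =
\begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{A/C type} \\ \text{Fuel level} \\ \text{Home shop} \\ \text{Crew} \\ \text{Eqpt}_1 \\ \vdots \\ \text{Eqpt}_{100} \end{bmatrix}
\qquad
\begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{Home} \\ \text{Experience} \\ \text{Driving hours} \end{bmatrix}
\qquad
\begin{bmatrix} \text{Type} \\ \text{Location} \\ \text{Age} \end{bmatrix}
\qquad
\begin{bmatrix} \text{Asset class} \\ \text{Time invested} \end{bmatrix}
\]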
© 2007 Warren B. Powell Slide 24
Energy resource modeling
The state of a resource:
\[
a_t =
\begin{pmatrix} \text{Capacity of facilities} \\ \text{Location} \\ \text{Cost} \\ \text{Carbon output} \\ \text{Age} \\ \text{Reserves} \end{pmatrix}
\]
© 2007 Warren B. Powell Slide 25
A resource allocation model
Modeling resources:
» The state of a single resource:
\[
a = \text{the attributes of a single resource}, \qquad a \in \mathcal{A} = \text{the attribute space}
\]
» The state of multiple resources:
\[
R_{ta} = \text{the number of resources with attribute } a, \qquad R_t = (R_{ta})_{a \in \mathcal{A}} = \text{the resource state vector}
\]
» The information process:
\[
\hat{R}_{ta} = \text{the change in the number of resources with attribute } a
\]
© 2007 Warren B. Powell Slide 26
A resource allocation model
Modeling demands:
» The attributes of a single demand:
\[
b = \text{the attributes of a demand to be served}, \qquad b \in \mathcal{B} = \text{the attribute space}
\]
» The demand state vector:
\[
D_{tb} = \text{the number of demands with attribute } b, \qquad D_t = (D_{tb})_{b \in \mathcal{B}} = \text{the demand state vector}
\]
» The information process:
\[
\hat{D}_{tb} = \text{the change in the number of demands with attribute } b
\]
© 2007 Warren B. Powell Slide 27
Energy resource modeling
The system state:
\[
S_t = (R_t, D_t, \rho_t) = \text{system state}
\]
where:
» $R_t$ = Resource state (how much capacity, reserves)
» $D_t$ = Market demands
» $\rho_t$ = “System parameters”:
  • State of the technology (costs, performance)
  • Climate, weather (temperature, rainfall, wind)
  • Government policies (tax rebates on solar panels)
  • Market prices (oil, coal)
© 2007 Warren B. Powell Slide 28
Energy resource modeling
The decision variable:
\[
x_t = \begin{pmatrix} \text{New capacity} \\ \text{Retired capacity} \end{pmatrix}
\quad \text{for each: type, location, technology}
\]
© 2007 Warren B. Powell Slide 29
Energy resource modeling
Exogenous information:
\[
W_t = \text{new information} = (\hat{R}_t, \hat{D}_t, \hat{\rho}_t)
\]
» $\hat{R}_t$ = Exogenous changes in capacity, reserves
» $\hat{D}_t$ = New demands for energy from each source
» $\hat{\rho}_t$ = Changes in technology (due to R&D)
© 2007 Warren B. Powell Slide 30
Energy resource modeling
The transition function
\[
S_{t+1} = S^M(S_t, x_t, W_{t+1})
\]
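The modeling elements above (state, decision, exogenous information, transition function) can be sketched in code. The following is a minimal illustration, not the model's actual implementation: the field names, the additive capacity update, and the assumption that newly revealed demands replace the previous ones are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """S_t = (R_t, D_t, rho_t); all field names are illustrative."""
    capacity: dict  # R_t: installed capacity by technology
    demand: dict    # D_t: market demand by technology
    cost: dict      # rho_t: unit cost by technology


@dataclass(frozen=True)
class Decision:
    """x_t: capacity added and retired, by technology."""
    new_capacity: dict
    retired_capacity: dict


@dataclass(frozen=True)
class Exogenous:
    """W_{t+1} = (R_hat, D_hat, rho_hat)."""
    capacity_change: dict  # R_hat: exogenous change in capacity
    new_demand: dict       # D_hat: newly revealed demand
    cost_change: dict      # rho_hat: change in technology cost


def transition(s: State, x: Decision, w: Exogenous) -> State:
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}) under simple additive dynamics (an assumption)."""
    capacity = {
        k: max(0.0, s.capacity[k]
               + x.new_capacity.get(k, 0.0)
               - x.retired_capacity.get(k, 0.0)
               + w.capacity_change.get(k, 0.0))
        for k in s.capacity
    }
    # Assumption: demand is fully replaced by the newly revealed demand.
    demand = dict(w.new_demand)
    cost = {k: s.cost[k] + w.cost_change.get(k, 0.0) for k in s.cost}
    return State(capacity, demand, cost)
```

Because the transition is an ordinary function rather than a probability matrix, it can be called inside a simulation loop without ever enumerating the state space.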
© 2007 Warren B. Powell Slide 31
A resource allocation model
The three states of our system
» The state of a single resource/entity
\[
a_t = \begin{bmatrix} a_{t1} \\ a_{t2} \\ a_{t3} \end{bmatrix}
\]
» The resource state vector
\[
R_t = \begin{bmatrix} R_{ta_1} \\ R_{ta_2} \\ R_{ta_3} \end{bmatrix}
\]
» The system state vector
\[
S_t = (R_t, D_t, \rho_t)
\]
© 2007 Warren B. Powell Slide 32
A resource allocation model
[Figure: resources assigned to demands.]
[Figure: the assignment of resources to demands repeated over time periods t, t+1 and t+2, contrasting optimizing at a point in time with optimizing over time.]
© 2007 Warren B. Powell Slide 35
[Figure: decision tree for scheduling a game, alternating state, action and information; the legend distinguishes decision nodes from outcome nodes.]
» Use weather report:
  • Forecast sunny (.6): Rain .1, Clouds .2, Sun .7
  • Forecast cloudy (.3): Rain .1, Clouds .5, Sun .4
  • Forecast rain (.1): Rain .8, Clouds .2, Sun .0
» Do not use weather report: Rain .2, Clouds .3, Sun .5
» At each forecast (or with no report) the decision is Schedule game or Cancel game:
  • Schedule game: Rain -$2000, Clouds $1000, Sun $5000
  • Cancel game: -$200 under every outcome
© 2007 Warren B. Powell Slide 37
Laying the foundation
Dynamic programming review:
» Let:
\[
\begin{aligned}
S_t &= \text{“state” of our “system” at time } t, \\
x_t &= \text{“action” that we take to change the system}, \\
C(S_t, x_t) &= \text{contribution earned when we take action } x_t \text{ from state } S_t.
\end{aligned}
\]
» We model system dynamics using:
\[
p(S_{t+1} \mid S_t, x_t) = \text{probability that action } x_t \text{ takes us from state } S_t \text{ to state } S_{t+1}
\]
© 2007 Warren B. Powell Slide 38
Laying the foundation
Bellman’s equation:
» Standard form:
\[
V_t(S_t) = \max_{x_t} \Big( C(S_t, x_t) + \sum_{s'} p(s' \mid S_t, x_t)\, V_{t+1}(S_{t+1} = s') \Big)
\]
» Expectation form:
\[
V_t(S_t) = \max_{x_t} \Big( C(S_t, x_t) + E\big\{ V_{t+1}\big(S_{t+1}(S_t, x_t)\big) \mid S_t \big\} \Big)
\]
[Figure: the weather-report decision tree solved by stepping backward.]
» Step 1, expected value at each outcome node:
  • Forecast rain (.1): Schedule game -$1400, Cancel game -$200
  • Forecast cloudy (.3): Schedule game $2300, Cancel game -$200
  • Forecast sunny (.6): Schedule game $3500, Cancel game -$200
  • No weather report: Schedule game $2400, Cancel game -$200
» Step 2, best decision at each decision node:
  • Forecast rain: -$200
  • Forecast cloudy: $2300
  • Forecast sunny: $3500
  • No weather report: $2400
» Step 3, value of each initial choice:
  • Use weather report: $2770
  • Do not use weather report: $2400
© 2007 Warren B. Powell Slide 43
Bellman’s equation
We just solved Bellman’s equation:
\[
V_t(S_t) = \max_{x_t \in \mathcal{X}} \Big( C(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)
\]
» We found the value of being in each state by stepping backward through the tree.
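The backward pass through the weather-report tree can be reproduced in a few lines. This is a sketch rather than anything from the original slides: the pairing of each conditional distribution with its forecast is inferred from the rolled-back values shown on the tree, and the function and variable names are illustrative.

```python
# Payoffs from the tree: scheduling depends on the weather, cancelling does not.
PAYOFF_SCHEDULE = {"rain": -2000, "clouds": 1000, "sun": 5000}
PAYOFF_CANCEL = -200

# Probability of each forecast, and of the weather given the forecast.
FORECAST_PROB = {"sunny": 0.6, "cloudy": 0.3, "rain": 0.1}
WEATHER_GIVEN_FORECAST = {
    "sunny": {"rain": 0.1, "clouds": 0.2, "sun": 0.7},
    "cloudy": {"rain": 0.1, "clouds": 0.5, "sun": 0.4},
    "rain": {"rain": 0.8, "clouds": 0.2, "sun": 0.0},
}
WEATHER_NO_REPORT = {"rain": 0.2, "clouds": 0.3, "sun": 0.5}


def outcome_value(weather_probs):
    """Expected payoff of scheduling: the value of an outcome node."""
    return sum(p * PAYOFF_SCHEDULE[w] for w, p in weather_probs.items())


def decision_value(weather_probs):
    """Value of a decision node: the better of scheduling and cancelling."""
    return max(outcome_value(weather_probs), PAYOFF_CANCEL)


# Roll back to the root: one value per initial choice.
value_no_report = decision_value(WEATHER_NO_REPORT)
value_with_report = sum(
    FORECAST_PROB[f] * decision_value(WEATHER_GIVEN_FORECAST[f])
    for f in FORECAST_PROB
)
```

Up to floating-point rounding, `value_with_report` and `value_no_report` match the $2770 and $2400 shown at the root of the tree.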
© 2007 Warren B. Powell Slide 44
Bellman’s equation
The challenge of dynamic programming:
\[
V_t(S_t) = \max_{x_t \in \mathcal{X}} \Big( C(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)
\]
Problem: Curse of dimensionality
© 2007 Warren B. Powell Slide 45
The curses of dimensionality
What happens if we apply this idea to our blood problem?
» State variable is:
  • The supply of each type of blood, along with its age
    – 8 blood types
    – 6 ages
    – = 48 “blood types”
  • The demand for each type of blood
    – 8 blood types
» Decision variable is how much of 48 blood types to supply to 8 demand types.
  • 216-dimensional decision vector
» Random information
  • Blood donations by week (8 types)
  • New demands for blood (8 types)
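To get a feel for how quickly the state space grows, the short calculation below counts discrete states for the blood problem. The cap of 100 units per supply or demand dimension is a hypothetical number chosen only for illustration; the dimension counts come from the slide.

```python
blood_types = 8
ages = 6

supply_dims = blood_types * ages  # 48 (blood type, age) combinations
demand_dims = blood_types         # 8 demand types

# Hypothetical assumption: each dimension holds between 0 and 100 units.
max_units = 100
levels_per_dim = max_units + 1

# A lookup-table value function would need one entry per state.
num_states = levels_per_dim ** (supply_dims + demand_dims)
num_digits = len(str(num_states))
```

Under this assumption `num_states` has more than a hundred digits, which is why a lookup table over the full state is out of reach.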
© 2007 Warren B. Powell Slide 46
The curses of dimensionality
The challenge of dynamic programming:
\[
V_t(S_t) = \max_{x_t \in \mathcal{X}} \Big( C(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)
\]
Problem: Curse of dimensionality
Three curses:
» State space
» Outcome space
» Action space (feasible region)
© 2007 Warren B. Powell Slide 47
The curses of dimensionality
The computational challenge:
\[
V_t(S_t) = \max_{x_t \in \mathcal{X}} \Big( C(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)
\]
» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?
[Figure: the weather-report decision tree shown alongside Bellman’s equation,
\[
V_t(S_t) = \max_{x_t \in \mathcal{X}} \Big( C(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big),
\]
with $S_t$ and $S_{t+1}$ labeled on the tree; the legend distinguishes decision nodes from outcome nodes.]
© 2007 Warren B. Powell Slide 50
Pre- and post-decision states
New concept:
» The “pre-decision” state variable:
  • $S_t$ = The information required to make a decision $x_t$.
  • Same as a “decision node” in a decision tree.
» The “post-decision” state variable:
  • $S^x_t$ = The state of what we know immediately after we make a decision.
  • Same as an “outcome node” in a decision tree.
© 2007 Warren B. Powell Slide 51
Pre- and post-decision states
Pre-decision, state-action, and post-decision
» Pre-decision state: $3^9$ states
» State-action: $3^9 \times 9$ state-action pairs
» Post-decision state: $3^9$ states
© 2007 Warren B. Powell Slide 52
A single, complex entity
Pre- and post-decision attributes for our nomadic truck driver, with attribute vector (City, ETA, Equip):
» Pre-decision, t = 40: (Dallas, 41.2, Good)
» Decision, t = 40: (Chicago, -, -)
» Post-decision, t = 40: (Chicago, 54.7, Good)
» Pre-decision, t = 50: (Chicago, 56.2, Repair)
…
© 2007 Warren B. Powell Slide 53
Pre- and post-decision states
» Pre-decision state (resources and demands): $S_t = (R_t, D_t)$
» Post-decision state: $S^x_t = S^{M,x}(S_t, x_t)$
» New information: $W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$
» Next pre-decision state: $S_{t+1} = S^{M,W}(S^x_t, W_{t+1})$
© 2007 Warren B. Powell Slide 57
System dynamics
It is traditional to assume you are given the one-step transition matrix:
\[
p(S_{t+1} \mid S_t, x_t) = \text{probability that action } x_t \text{ takes us from state } S_t \text{ to state } S_{t+1}
\]
» Computing the transition matrix is impossible for the vast majority of problems.
We are going to assume that we are given a transition function:
\[
S_{t+1} = S^M(S_t, x_t, W_{t+1})
\]
» This is at the heart of any simulation model.
» Often rule-based. Very easy to compute, even for large-scale problems.
© 2007 Warren B. Powell Slide 58
The transition function
Computing the post-decision state:
» Method 1: Divide the effect of decisions and information
\[
\begin{aligned}
S^x_t &= S^{M,x}(S_t, x_t) && \text{the pure effect of a decision} \\
S_{t+1} &= S^{M,W}(S^x_t, W_{t+1}) && \text{the effect of the exogenous information}
\end{aligned}
\]
» Method 2: State-action pairs (“Q-learning”)
\[
S^x_t = (S_t, x_t) \qquad \text{produces a huge post-decision state space}
\]
» Method 3: Post-decision state based on a point estimate
\[
\begin{aligned}
S^x_t &= S^M(S_t, x_t, \bar{W}_{t,t+1}) && \bar{W}_{t,t+1} \text{ is a point estimate of } W_{t+1} \text{ at time } t \\
S_{t+1} &= S^M(S_t, x_t, W_{t+1})
\end{aligned}
\]
© 2007 Warren B. Powell Slide 59
The transition function
Actually, we have three transition functions:
» The attribute transition function:
\[
\begin{aligned}
a^x_t &= a^{M,x}(a_t, x_t) && \text{the pure effect of a decision} \\
a_{t+1} &= a^{M,W}(a^x_t, W_{t+1}) && \text{the effect of the exogenous information}
\end{aligned}
\]
» The resource transition function:
\[
\begin{aligned}
R^x_t &= R^{M,x}(R_t, x_t) && \text{the pure effect of a decision} \\
R_{t+1} &= R^{M,W}(R^x_t, W_{t+1}) && \text{the effect of the exogenous information}
\end{aligned}
\]
» The general transition function:
\[
\begin{aligned}
S^x_t &= S^{M,x}(S_t, x_t) && \text{the pure effect of a decision} \\
S_{t+1} &= S^{M,W}(S^x_t, W_{t+1}) && \text{the effect of the exogenous information}
\end{aligned}
\]
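A sketch of the resource transition function split into its two halves, using a blood-style inventory indexed by (type, age). The dictionary layout, the `MAX_AGE` value and the rule that units older than `MAX_AGE` are discarded are assumptions made for the example, not details taken from the slides.

```python
MAX_AGE = 2  # hypothetical: units older than this are discarded


def post_decision(R, x):
    """R^x_t = R^{M,x}(R_t, x_t): remove the units assigned to demands.

    R maps (blood_type, age) to units on hand; x maps the same keys to
    units used. Anything not used is held.
    """
    Rx = {}
    for key, units in R.items():
        used = x.get(key, 0)
        if used > units:
            raise ValueError(f"cannot use {used} units of {key}, only {units} on hand")
        Rx[key] = units - used
    return Rx


def next_pre_decision(Rx, donations):
    """R_{t+1} = R^{M,W}(R^x_t, R_hat_{t+1}): age held units, then add donations.

    donations maps blood_type to newly donated units, which arrive at age 0.
    """
    R_next = {}
    for (blood_type, age), units in Rx.items():
        if age + 1 <= MAX_AGE:
            key = (blood_type, age + 1)
            R_next[key] = R_next.get(key, 0) + units
    for blood_type, units in donations.items():
        key = (blood_type, 0)
        R_next[key] = R_next.get(key, 0) + units
    return R_next
```

The first function is deterministic given the decision; all randomness enters through the second, which is the property the post-decision state is designed to exploit.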
© 2007 Warren B. Powell Slide 60
Bellman’s equations with the post-decision state
Bellman’s equations broken into stages:
» Optimization problem (making the decision):
\[
V_t(S_t) = \max_{x_t} \Big( C(S_t, x_t) + V^x_t\big(S^{M,x}(S_t, x_t)\big) \Big)
\]
  • Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):
\[
V^x_t(S^x_t) = E\big\{ V_{t+1}\big(S^{M,W}(S^x_t, W_{t+1})\big) \mid S^x_t \big\}
\]
© 2007 Warren B. Powell Slide 61
Bellman’s equations with the post-decision state
Challenges
» For most practical problems, we are not going to be able to compute $V^x_t(S^x_t)$ in
\[
V_t(S_t) = \max_{x_t} \Big( C(S_t, x_t) + V^x_t(S^x_t) \Big)
\]
» Concept: replace it with an approximation $\bar{V}_t(S^x_t)$ and solve
\[
V_t(S_t) = \max_{x_t} \Big( C(S_t, x_t) + \bar{V}_t(S^x_t) \Big)
\]
» So now we face:
  • What should the approximation look like?
  • How do we estimate it?
© 2007 Warren B. Powell Slide 62
Approximating the value function
Value function approximations:
» Linear (in the resource state):
\[
\bar{V}_t(R^x_t) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} \cdot R^x_{ta}
\]
  • Best when assets are complex, which means that $R_{ta}$ is small (typically 0 or 1).
» Piecewise linear, separable:
\[
\bar{V}_t(R^x_t) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R^x_{ta})
\]
  • Best when assets are simple, which means that $R_{ta}$ may be larger.
» Indexed PWL separable:
\[
\bar{V}_t(R^x_t) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\big(R^x_{ta} \mid \text{features}_t\big)
\]
  • Helps to capture dependencies, e.g. status of technology, climate, …
© 2007 Warren B. Powell Slide 63
Approximating the value function
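Value function approximations:
» Ridge regression (Klabjan and Adelman)
\[
\bar{V}_t(R^x_t) = \sum_{f \in \mathcal{F}} \bar{V}_{tf}(R_{tf}), \qquad R_{tf} = \sum_{a \in \mathcal{A}_f} \theta_{fa} R_{ta}
\]
» Benders cuts (more on this later)
[Figure: cuts approximating $\bar{V}_t(R_t)$ over $x_0$ and $x_1$.]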
© 2007 Warren B. Powell Slide 64
Our general algorithm
Step 1: Start with a post-decision state $S^{x,n}_{t-1}$.
Step 2 (simulation): Obtain a Monte Carlo sample of $W_t(\omega^n)$ and compute the next pre-decision state:
\[
S^n_t = S^{M,W}\big(S^{x,n}_{t-1}, W_t(\omega^n)\big)
\]
Step 3 (optimization): Solve the deterministic optimization using an approximate value function:
\[
\hat{v}^n_t = \max_{x_t} \Big( C(S^n_t, x_t) + \bar{V}^{n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big)
\]
to obtain $x^n_t$.
Step 4 (statistics): Update the value function approximation:
\[
\bar{V}^n_{t-1}(S^{x,n}_{t-1}) = (1 - \alpha_{n-1})\, \bar{V}^{n-1}_{t-1}(S^{x,n}_{t-1}) + \alpha_{n-1}\, \hat{v}^n_t
\]
Step 5: Find the next post-decision state:
\[
S^{x,n}_t = S^{M,x}(S^n_t, x^n_t)
\]
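The five steps can be sketched on a toy problem. Everything below other than the step structure is an assumption made for illustration: a single-product inventory with hypothetical prices and demands, a lookup-table value function indexed by time and post-decision inventory, an optimistic initial guess (so that pure exploitation does not get stuck at "never order"), and a 1/n stepsize in place of the generic stepsize in Step 4.

```python
import random

T = 10                   # time periods (hypothetical)
MAX_INV = 10             # storage cap (hypothetical)
PRICE, COST = 5.0, 3.0   # revenue per unit sold, cost per unit ordered
ORDERS = range(0, 6)     # feasible order quantities x_t


def sample_demand(rng):
    """Exogenous information W_t: demand revealed at time t."""
    return rng.randint(0, 5)


def contribution(inv, demand, order):
    """C(S_t, x_t): sell what we can now, pay for what we order."""
    return PRICE * min(inv, demand) - COST * order


def post_state(inv, demand, order):
    """S^{M,x}: inventory after sales and ordering, before new demand."""
    return min(MAX_INV, inv - min(inv, demand) + order)


def adp(iterations=200, seed=0):
    rng = random.Random(seed)
    # v_bar[t][r] approximates the value of post-decision inventory r at time t.
    # Optimistic start for t < T; the terminal values at t = T stay at zero.
    v_bar = [[PRICE * r for r in range(MAX_INV + 1)] for _ in range(T)]
    v_bar.append([0.0] * (MAX_INV + 1))
    for n in range(1, iterations + 1):
        alpha = 1.0 / n                 # stepsize (an assumed rule)
        post = 0                        # Step 1: initial post-decision state
        for t in range(1, T + 1):
            demand = sample_demand(rng) # Step 2: sample W_t; pre-decision state is (inv, demand)
            inv = post
            # Step 3: deterministic optimization against the approximation
            v_hat, order = max(
                (contribution(inv, demand, q) + v_bar[t][post_state(inv, demand, q)], q)
                for q in ORDERS
            )
            # Step 4: smooth v_hat into the value of the PREVIOUS post-decision state
            v_bar[t - 1][post] = (1 - alpha) * v_bar[t - 1][post] + alpha * v_hat
            # Step 5: move to the next post-decision state
            post = post_state(inv, demand, order)
    return v_bar
```

Note that the maximization in Step 3 contains no expectation: the sampled demand is already part of the pre-decision state, and the future is summarized by `v_bar` evaluated at the post-decision state.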
© 2007 Warren B. Powell Slide 65
Competing updating methods
Comparison to other methods:
» Classical MDP (value iteration):
\[
V^n(S) = \max_x \Big( C(S, x) + E\, V^{n-1}(S_{t+1}) \Big)
\]
» Classical ADP (pre-decision state): $\hat{v}_t$ updates $\bar{V}_t(S_t)$
\[
\begin{aligned}
\hat{v}^n_t &= \max_{x_t} \Big( C(S^n_t, x_t) + \sum_{s'} p(s' \mid S^n_t, x_t)\, \bar{V}^{n-1}_{t+1}(s') \Big) \\
\bar{V}^n_t(S^n_t) &= (1 - \alpha_{n-1})\, \bar{V}^{n-1}_t(S^n_t) + \alpha_{n-1}\, \hat{v}^n_t
\end{aligned}
\]
» Our method (update around post-decision state): $\hat{v}_t$ updates $\bar{V}_{t-1}(S^x_{t-1})$
\[
\begin{aligned}
\hat{v}^n_t &= \max_{x_t} \Big( C(S^n_t, x_t) + \bar{V}^{x,n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big) \\
\bar{V}^n_{t-1}(S^{x,n}_{t-1}) &= (1 - \alpha_{n-1})\, \bar{V}^{n-1}_{t-1}(S^{x,n}_{t-1}) + \alpha_{n-1}\, \hat{v}^n_t
\end{aligned}
\]
© 2007 Warren B. Powell Slide 67
Blood management
Managing blood inventories
© 2007 Warren B. Powell Slide 68
Blood management
Managing blood inventories over time
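[Figure: timeline over t = 0, 1, 2, 3. Starting from $S_0$, each week $t$ brings new information $(\hat{R}_t, \hat{D}_t)$, which gives the pre-decision state $S_t$; the decision $x_t$ then produces the post-decision state $S^x_t$.]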
[Figure: blood assignment network at time $t$. Supply nodes $R_{t,(AB+,0)}, R_{t,(AB+,1)}, R_{t,(AB+,2)}, \ldots, R_{t,(O-,0)}, R_{t,(O-,1)}, R_{t,(O-,2)}$, indexed by blood type and age, can either satisfy a demand $\hat{D}_{t,b}$ for a blood type $b$ (AB+, AB-, A+, A-, B+, B-, O+, O-) or be held. The pre-decision state is $S_t = (R_t, \hat{D}_t)$. Held blood forms the post-decision resource vector $R^x_t$, which is combined with new donations $\hat{R}_{t+1,AB+}, \ldots, \hat{R}_{t+1,O-}$ to give $R_{t+1}$.]
[Figure: the same network with the objective $F(R_t)$ attached to the held blood.]
Solve this as a linear program.
Dual variables give the value of an additional unit of blood:
\[
\hat{\nu}_{t,(AB+,0)},\ \hat{\nu}_{t,(AB+,1)},\ \hat{\nu}_{t,(AB+,2)},\ \ldots,\ \hat{\nu}_{t,(O-,0)},\ \hat{\nu}_{t,(O-,1)},\ \hat{\nu}_{t,(O-,2)}
\]
© 2007 Warren B. Powell Slide 74
Updating the value function approximation
» Estimate the gradient at $R^n_t$.
[Figure: $F(R_t)$ plotted against $R^n_{t,(AB+,2)}$, with slope $\hat{\nu}^n_{t,(AB+,2)}$ at the current point.]
» Update the value function at $R^{x,n}_{t-1}$.
[Figure: the slope $\hat{\nu}^n_{t,(AB+,2)}$ applied to the previous approximation $\bar{V}^{n-1}_{t-1}(R^x_{t-1})$ at $R^{x,n}_{t-1}$, giving the updated approximation $\bar{V}^n_{t-1}(R^x_{t-1})$.]
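One way to carry out this kind of update for a single resource dimension is sketched below. The value function is stored as a list of slopes, the sampled marginal value is smoothed into the slope at the visited point, and neighboring slopes are then "leveled" so the function stays concave. The leveling rule is one common choice and is used here as an assumption; the slides do not say which concavity-restoring step is applied.

```python
def update_slopes(slopes, r, nu_hat, alpha):
    """Smooth a sampled marginal value into a concave piecewise-linear function.

    slopes[i] is the approximate value of the (i+1)-th unit of the resource,
    so concavity means the list is nonincreasing. r is the index of the slope
    being updated, nu_hat the sampled marginal value (for example an LP dual),
    and alpha the stepsize.
    """
    new = list(slopes)
    new[r] = (1 - alpha) * new[r] + alpha * nu_hat
    # Leveling step: restore monotonicity around the updated slope.
    for i in range(r):
        new[i] = max(new[i], new[r])
    for i in range(r + 1, len(new)):
        new[i] = min(new[i], new[r])
    return new


def value(slopes, units):
    """Evaluate the piecewise-linear function at an integer resource level."""
    return sum(slopes[:units])
```

For example, with slopes `[10, 8, 6, 4, 2]`, an observed marginal value of 12 at index 2 and a stepsize of 0.5, the updated slope is 9, and the slope to its left is raised from 8 to 9 so that the list stays nonincreasing.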
© 2007 Warren B. Powell Slide 78
Blood management
[Figure: total shortages (# units, 0 to 500) versus iterations (0 to 200), comparing “Not using value functions” with “Using value functions”.]
© 2007 Warren B. Powell Slide 89
Implementation metrics
Results from the real world:
[Figure: bar chart of percent (0 to 45), History versus Model, for setouts, swaps, nonpreferred consists, underpowered and overpowered.]
© 2007 Warren B. Powell Slide 90
Schneider National
© 2007 Warren B. Powell Slide 91
© 2007 Warren B. Powell Slide 92
Case study: truckload trucking
[Figure: two bar charts by capacity category (US_SOLO, US_IC, US_TEAM), one for revenue per WU and one for utilization, each showing the calibrated model's simulation result against the historical minimum and maximum.]
© 2007 Warren B. Powell Slide 94
Two-stage optimization
Piecewise linear, separable value function approximations:
\[
\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{V}_{tl}(R_{tl})
\]
© 2007 Warren B. Powell Slide 95
Two-stage optimization
Benders decomposition:
[Figure: cuts approximating $\bar{V}_t(R_t)$ over $x_0$ and $x_1$.]
Multidimensional cuts produce a provably convergent, nonseparable value function approximation.
© 2007 Warren B. Powell Slide 96
The competition
Exact solutions using Benders:
» “L-shaped” decomposition (Van Slyke and Wets)
» Stochastic decomposition (Higle and Sen)
» CUPPS (Chen and Powell)
[Figure: for each method, the cuts approximating $V_0$ as a function of $x_0$.]
© 2007 Warren B. Powell Slide 97
The competition
Percent over optimal after 100 iterations
[Figure: percent error (0 to 45) for SD, L-shaped and CUPPS (Benders) and for SPAR (separable), on problems with 10, 25, 50 and 100 locations (increasing problem size).]
© 2007 Warren B. Powell Slide 99
Multistage problems
Deterministic, (integer) multicommodity flow
© 2007 Warren B. Powell Slide 100
Multistage problems
Deterministic, (integer) multicommodity flow
[Figure: percent of optimal (60 to 105) across test instances Base, T_30, T_90, I_10, I_40, C_II, C_III, C_IV, R_1, R_5, R_100, R_400, C_1 and C_8; 100 = optimal continuous relaxation.]
© 2007 Warren B. Powell Slide 101
Multistage problems
Stochastic, (integer) multicommodity flow
[Figure: percent of posterior optimal (0 to 120) for a rolling horizon procedure versus ADP, across test instances Base, I_10, I_40, C_II, C_III, C_IV, R_1, R_5, R_100, R_400, C_1 and C_8.]
© 2007 Warren B. Powell Slide 102
Two-stage stochastic programming
Properties of separable, piecewise linear value function approximations:
» Converges to optimal when we sample all points infinitely often:
  • H. Topaloglu and W. B. Powell, OR Letters, 2003.
» Provably optimal for two-stage, nonseparable functions with continuously differentiable second stage:
  • R. K.-L. Cheung and W. B. Powell, Operations Research, 2000.
» Provably optimal for two-stage, separable problems:
  • W. B. Powell, A. Ruszczynski and H. Topaloglu, Mathematics of Operations Research, 2004.
» Near-optimal for two-stage, nonseparable problems with nondifferentiable second stage:
  • W. B. Powell, A. Ruszczynski and H. Topaloglu, Mathematics of Operations Research, 2004.
» Provably optimal for scalar, finite-horizon multistage problems:
  • J. Nascimento and W. B. Powell (under review, Math of OR)
  • J. Nascimento and W. B. Powell (in preparation)
Results apply when we use “pure exploitation”: we do not assume points are sampled infinitely often.
© 2007 Warren B. Powell Slide 104
Energy resource modeling
[Figure: for each energy source (oil, wind, coal, corn) and each year (2008, 2009), the resource state $R_t$ and decision $x_t$, followed by new information $(\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$ that leads to $R_{t+1}$, $x_{t+1}$ and $(\hat{R}_{t+1}, \hat{D}_{t+1}, \hat{\rho}_{t+1})$.]
© 2007 Warren B. Powell Slide 105
Energy resource modeling
» We have to allocate resources before we know the demands for different types of energy in the future.
» We use value function approximations of the future to make decisions now.
» This determines how much capacity to provide: $R^{x,n}_{t,1}, R^{x,n}_{t,2}, \ldots, R^{x,n}_{t,5}$.
» Marginal value of each capacity: $\hat{v}^n_{t,1}(\omega), \hat{v}^n_{t,2}(\omega), \ldots, \hat{v}^n_{t,5}(\omega)$.
[Figures: the allocation network at each of these four stages.]
© 2007 Warren B. Powell Slide 109
Energy resource modeling
Using the marginal values, we iteratively estimate piecewise linear functions.
[Figure: the approximation $\bar{V}_{t-1,AB+}(R^x_{t-1,AB+})$ at the point $R^{x,n}_{t-1,AB+}$, updated over successive iterations using the right derivative $\hat{v}^{k+}_t$ and the left derivative $\hat{v}^{k-}_t$, then $\hat{v}^{(k+1)+}_t$ and $\hat{v}^{(k+1)-}_t$.]
© 2007 Warren B. Powell Slide 112
Two-stage stochastic programming
Linear value function approximations (linear in the resource state):
\[
\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{v}_{tl} \cdot R_{tl}
\]
© 2007 Warren B. Powell Slide 113
Two-stage stochastic programming
Piecewise linear, separable value function approximations:
\[
\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{V}_{tl}(R_{tl})
\]
© 2007 Warren B. Powell Slide 114
Research challenges
Approximate dynamic programming:
» At the heart of an ADP algorithm is the challenge of finding a value function approximation “that works”:
  • Can be used within commercial LP solvers
  • Can be updated (estimated) easily
  • Is stable
  • Provides high quality solutions
» Assessing solution quality
  • Is it realistic?
    – Do we seem to mimic markets and public policy?
  • Is it robust?
    – Do we achieve energy goals under different scenarios?
© 2007 Warren B. Powell Slide 115
Research challenges
For the dynamic energy resource model, it is not enough to have a value function that depends purely on the resource vector.
» The value of coal plants depends on our ability to sequester carbon.
» We need to capture the “state of the world” in our value function approximations.
Strategies:
» Let $S_t$ be the full system state vector, capturing the cost of technologies, government policies, etc.
» Let $\phi_f(S_t),\ f \in \mathcal{F}$, be a set of “features” that appear to be important explanatory variables. Identifying features is the “art” of ADP.
» We can then fit value functions that depend on the features:
\[
\bar{V}_t(R_t \mid \phi(S_t)) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta} \mid \phi(S_t))
\]
© 2007 Warren B. Powell Slide 116
Research challenges
Strategies for fitting
\[
\bar{V}_t(R_t \mid \phi(S_t)) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta} \mid \phi(S_t))
\]
» Lookup-table
  • Very general, but suffers from the curse of dimensionality.
» Linear regression with low-dimensional polynomials
  • Can work; depends on the problem.
» Kernel regression
  • Powerful strategy that combines lookup-table with regression models.
  • Use within ADP is surprisingly young.
  • Variety of issues unique to ADP.
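As a rough illustration of the kernel regression idea, the sketch below estimates a value at a query feature as a weighted average of previously observed values, with weights that decay with distance in feature space (a Nadaraya-Watson estimator with a Gaussian kernel). The scalar feature, the bandwidth and the function name are hypothetical; how this would be embedded in the value function updates is exactly the open question the slide raises.

```python
import math


def kernel_estimate(query, observations, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel.

    observations is a list of (feature, value) pairs, for example a scalar
    feature phi(S_t) and an observed marginal value. Observations whose
    feature is close to the query receive the most weight.
    """
    weights = [
        math.exp(-0.5 * ((query - feature) / bandwidth) ** 2)
        for feature, _ in observations
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * v for w, (_, v) in zip(weights, observations)) / total
```

A very small bandwidth behaves like a lookup table (only the nearest observations matter), while a very large one averages everything, which is the sense in which the method sits between the two.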
© 2007 Warren B. Powell Slide 117
Research challenges
Approximate dynamic programming:
» How do we establish that we are getting “good” solutions?
  • Demonstrate techniques on simpler problems.
  • Compare against other methods for larger problems.
» We need algorithms that are fast and stable.
  • Identifying variance reduction methods from the simulation community that work on this problem class.
  • Developing kernel regression techniques for improved fitting of the value function.
  • Finding the best smoothing techniques for recursive updating.
  • Parallel processing for accelerating simulations.
© 2007 Warren B. Powell Slide 118
Research challenges
System modeling
» Modeling the evolution of technology using compact representations
  • If we invest in technology, how do we describe the change process?
» Modeling physical processes at multiple scales
  • Wind, temperature, rainfall at different levels of discretization.
» We need a software architecture that allows a larger community to participate in the modeling
  • We need to tap into various types of domain expertise, such as climate modeling, transportation modeling, …
© 2007 Warren B. Powell Slide 120
Outline
» R&D for hydrogen fuel cells
© 2007 Warren B. Powell Slide 121
R&D optimization for hydrogen fuel cell
We have been testing two methods for solving the hydrogen fuel cell R&D portfolio problem:
» Brute force
  • Enumerate all decisions.
  • Use Monte Carlo sampling to estimate the value of a particular set of technologies.
  • Will not scale to large problems.
» Approximate dynamic programming
  • Replace value function with linear approximation.
  • Determine portfolio by solving a knapsack problem using a solver.
  • Scales to large problems, but how large is the error introduced by the linear value function approximation?
© 2007 Warren B. Powell Slide 122
R&D optimization for hydrogen fuel cell
Test problems
» Smaller dataset
  • 12 projects
  • Must choose 5 to research
  • 792 combinations
» Larger dataset
  • 18 projects
  • Must choose 5 to fund
  • 8568 combinations
» General
  • First choose projects to research.
  • Learn results of research.
  • Choose the best technologies for the fuel cell, and evaluate the cost of the fuel cell.
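The combination counts follow directly from choosing 5 projects out of 12 or 18. The sketch below checks them and contrasts brute-force enumeration with the portfolio implied by a linear value function. The project values are made up, and the portfolio value is assumed to be purely additive, which is the case where the two approaches must agree; in the study itself the brute-force value comes from Monte Carlo simulation of the fuel cell, and with every project consuming one slot the knapsack reduces to taking the top 5.

```python
from itertools import combinations
from math import comb

# Hypothetical marginal values for 12 projects (illustrative numbers only).
PROJECT_VALUES = [7, 3, 9, 1, 12, 5, 8, 2, 11, 4, 6, 10]
K = 5  # projects to fund


def portfolio_value(portfolio):
    """Additive (linear) value of a portfolio: an assumption of this sketch."""
    return sum(PROJECT_VALUES[i] for i in portfolio)


def brute_force(n, k):
    """Enumerate every k-subset of n projects and keep the best one."""
    return max(combinations(range(n), k), key=portfolio_value)


def top_k(n, k):
    """Portfolio implied by a linear value function with unit-size projects."""
    ranked = sorted(range(n), key=lambda i: PROJECT_VALUES[i], reverse=True)
    return tuple(sorted(ranked[:k]))


num_small = comb(12, 5)  # size of the smaller dataset
num_large = comb(18, 5)  # size of the larger dataset
best = brute_force(len(PROJECT_VALUES), K)
```

When the true value is not additive across projects, `brute_force` with a simulated evaluation and `top_k` can disagree, and the size of that gap is the error the slides set out to measure.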
© 2007 Warren B. Powell Slide 123
R&D optimization for hydrogen fuel cell
Elements of our hydrogen fuel cell problem
© 2007 Warren B. Powell Slide 124
R&D optimization for hydrogen fuel cell
The optimization problem» Performance of fuel cell depends on parameters » Choose a subset of projects to perform additional
research.» Parameters for the chosen project will change in a
random way.
© 2007 Warren B. Powell Slide 125
R&D optimization for hydrogen fuel cell
Shape of the cost function
© 2007 Warren B. Powell Slide 126
Marginal value of each research project
» Top 5 are funded.
» Projects in common color compete.
» Project 6 drops out of R&D portfolio; project 9 enters.
© 2007 Warren B. Powell Slide 127
Results from 792 R&D portfolios
© 2007 Warren B. Powell Slide 128
Confidence interval for the value of the solution resulting from a particular R&D portfolio (five projects).
Best estimate
© 2007 Warren B. Powell Slide 129
» Best overall solution (from brute force)
» Optimal solution chosen by ADP
© 2007 Warren B. Powell Slide 130
Results from 8568 R&D portfolios
» Optimal solution chosen by ADP
» Best overall solution (from brute force)