
© 2007 Warren B. Powell Slide 1

The Dynamic Energy Resource Model

Group Peer Review Committee
Lawrence Livermore National Laboratories

July, 2007

Warren Powell
Alan Lamont
Jeffrey Stewart
Abraham George

© 2007 Warren B. Powell, Princeton University


Dynamic energy resource management

Questions:
» How will the market evolve in terms of the adoption of competing energy technologies?
  • How many windmills, and where?
  • How much ethanol capacity?
  • How will the capacity of coal, natural gas and oil evolve?
» What government policies should be implemented?
  • Carbon tax? Cap and trade?
  • Tax credits for windmills and solar panels?
  • Tax credits for ethanol?
» Where should we invest R&D dollars?
  • Ethanol or hydrogen?
  • Batteries or windmills?
  • Hydrogen production, storage or conversion?


Dynamic energy resource management

Uncertainties:
» Technology:
  • Carbon sequestration
  • The cost of batteries, fuel cells, solar panels
  • The storage of hydrogen, efficiency of solar panels, …
» Climate:
  • Global and regional temperatures
  • Changing patterns of snow storage on mountains
  • Wind patterns
» Markets:
  • Global supplies of oil and natural gas
  • International consumption patterns
  • Domestic purchasing behaviors (SUVs?)
  • Tax policies
  • The price of oil and natural gas


Research challenges:
» Making decisions
  • Finding the best decisions (capacity decisions, R&D decisions, government policies) requires solving high-dimensional stochastic, dynamic programs.
  • How do we obtain practical solutions to stochastic, dynamic programs which exhibit state variables with millions of dimensions?
» Modeling multiple time scales
  • We have to represent wind, temperature, rain and snow fall, market prices and government policies.
  • This requires modeling hourly, daily, seasonal and yearly dynamics.
» Modeling multiple levels of resolution
  • Spatial: We need to represent the location of windmills at state, regional and county levels.
  • Behavioral: We need to capture the differences between travel behavior patterns (long commutes vs. short trips, commercial fleet vehicles vs. personal use), or the difference between light and heavy industrial power use.



Alternative ways of solving large stochastic optimization problems:
» Simulation using myopic policies – Using rules to determine decisions based on the current state of the system. Rules are hard to design, and decisions now do not consider the impact on the future.
» Deterministic optimization – Ignores uncertainty (and problems are still very large scale).
» Rolling horizon procedures – Uses point estimates of what might happen in the future. Will not produce robust behaviors.
» Stochastic programming – Cannot handle multiple sources of uncertainty over multiple time periods.
» Markov decision processes – Discrete state, discrete action will not scale (“curse of dimensionality”).



Proposed approach: Approximate dynamic programming
» Our research combines mathematical programming, simulation and statistics in a dynamic programming framework.
  • Math programming handles high-dimensional decisions.
  • Simulation handles complex dynamics and high-dimensional information processes.
  • Statistical learning is used to improve decisions iteratively.
  • Solution strategy is highly intuitive – tends to mimic human behavior.
» Features:
  • Scales to very large scale problems.
  • Easily handles complex dynamics and information processes.
  • Rigorous theoretical foundation.
» Research challenge:
  • Calibrating the model.
  • Designing high quality policies using the tools of approximate dynamic programming.
  • Evaluating the quality of these policies.



Outline
» My experiences
» A resource allocation model
» ADP and the post-decision state variable
» Illustration using blood management
» Laboratory and theoretical results
» The dynamic energy resource model




Yellow Freight System

© 2004 Warren B. Powell, Princeton University






The fractional jet ownership industry

NetJets Inc.


Planning for a risky world

Weather
  • Robust design of emergency response networks.
  • Design of financial instruments to hedge against weather emergencies to protect individuals, companies and municipalities.
  • Design of sensor networks and communication systems to manage responses to major weather events.

Disease
  • Models of disease propagation for response planning.
  • Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
  • Robust design of supply chains to mitigate the disruption of transportation systems.




Energy management

Energy resource management
  • How to balance investment in ethanol, windmills, nuclear, coal-to-hydrogen?
  • When should we make multidecade commitments to evolving technologies?
  • What is the pattern of demands?
  • How will climate change affect adoption patterns?

Energy R&D portfolio planning
  • Where should DOE, NSF, … invest R&D dollars for new technologies?
  • How do we balance investments in different components of an energy technology pathway?
  • How do we evaluate the probability of a successful R&D program?
  • How do we solve multistage resource allocation problems for R&D problems?


Part VII - CASTLE Lab News

CASTLE Lab News

New Modeling Language Captures Complexities of Real-World Operations!

75 cents

Spans the gap between simulation and optimization.

CASTLE Lab announced the development of a powerful new simulation environment for modeling complex operations in transportation and logistics. The dissertation of Dr. Joel Shapiro, it offers the flexibility of simulation environments, but the intelligence of optimization. The modeling language will allow managers to quickly test

continued on page 3

Thursday, March 2, 1999





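A resource allocation model

Attribute vectors (four examples of an attribute vector $a$):

$$
a = \begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{A/C type} \\ \text{Fuel level} \\ \text{Home shop} \\ \text{Crew} \\ \text{Eqpt1} \\ \vdots \\ \text{Eqpt100} \end{bmatrix}
\qquad
\begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{Home} \\ \text{Experience} \\ \text{Driving hours} \end{bmatrix}
\qquad
\begin{bmatrix} \text{Type} \\ \text{Location} \\ \text{Age} \end{bmatrix}
\qquad
\begin{bmatrix} \text{Asset class} \\ \text{Time invested} \end{bmatrix}
$$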


Energy resource modeling

The state of a resource:

$$
a_t = \begin{pmatrix} \text{Capacity of facilities} \\ \text{Location} \\ \text{Cost} \\ \text{Carbon output} \\ \text{Age} \\ \text{Reserves} \end{pmatrix}
$$



Modeling resources:
» The state of a single resource:
  $a$ = The attributes of a single resource
  $a \in \mathcal{A}$ = The attribute space
» The state of multiple resources:
  $R_{ta}$ = The number of resources with attribute $a$
  $R_t = (R_{ta})_{a \in \mathcal{A}}$ = The resource state vector
» The information process:
  $\hat{R}_{ta}$ = The change in the number of resources with attribute $a$.



Modeling demands:
» The attributes of a single demand:
  $b$ = The attributes of a demand to be served.
  $b \in \mathcal{B}$ = The attribute space
» The demand state vector:
  $D_{tb}$ = The number of demands with attribute $b$
  $D_t = (D_{tb})_{b \in \mathcal{B}}$ = The demand state vector
» The information process:
  $\hat{D}_{tb}$ = The change in the number of demands with attribute $b$.



The system state:

$S_t = (R_t, D_t, \rho_t)$ = System state, where:
  $R_t$ = Resource state (how much capacity, reserves)
  $D_t$ = Market demands
  $\rho_t$ = "system parameters":
    • State of the technology (costs, performance)
    • Climate, weather (temperature, rainfall, wind)
    • Government policies (tax rebates on solar panels)
    • Market prices (oil, coal)



The decision variable:

$$
x_t = \begin{pmatrix} \text{New capacity} \\ \text{Retired capacity} \end{pmatrix}
\quad \text{for each: Type, Location, Technology}
$$



Exogenous information:

$W_t$ = New information = $(\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$
  $\hat{R}_t$ = Exogenous changes in capacity, reserves
  $\hat{D}_t$ = New demands for energy from each source
  $\hat{\rho}_t$ = Changes in technology (due to R&D)



The transition function

$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$



The three states of our system
» The state of a single resource/entity

$$a_t = \begin{bmatrix} a_{t1} \\ a_{t2} \\ a_{t3} \end{bmatrix}$$

» The resource state vector

$$R_t = \begin{bmatrix} R_{ta_1} \\ R_{ta_2} \\ R_{ta_3} \end{bmatrix}$$

» The system state vector

$$S_t = (R_t, D_t, \rho_t)$$



[Figure: resources assigned to demands. Top panel: optimizing at a point in time. Bottom panel: optimizing over time (periods t, t+1, t+2).]



[Figure: decision tree for scheduling a game. The first decision is "Use weather report" or "Do not use weather report". With the report, the forecast is sunny (.6), cloudy (.3) or rain (.1), and each forecast is followed by the decision "Schedule game" or "Cancel game". Outcomes shown for one branch: schedule game gives rain .8 -$2000, clouds .2 $1000, sun .0 $5000; cancel game gives -$200 in every case. Legend: decision nodes (state, then action) and outcome nodes (information).]

Laying the foundation

Dynamic programming review:
» Let:
  $S_t$ = "State" of our "system" at time $t$.
  $x_t$ = "Action" that we take to change the system.
  $C(S_t, x_t)$ = Contribution earned when we take action $x_t$ from state $S_t$.
» We model system dynamics using:
  $p(S_{t+1} \mid S_t, x_t)$ = Probability that action $x_t$ takes us from state $S_t$ to state $S_{t+1}$.


Laying the foundation

Bellman’s equation:
» Standard form:

$$V_t(S_t) = \max_{x_t} \Big( C_t(S_t, x_t) + \sum_{s'} p(s' \mid S_t, x_t)\, V_{t+1}(S_{t+1} = s') \Big)$$

» Expectation form:

$$V_t(S_t) = \max_{x_t} \Big( C_t(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}(S_t, x_t)) \mid S_t \big\} \Big)$$

[Figure sequence: rolling back the decision tree.
Step 1, expected values at the outcome nodes. With the weather report: forecast rain: schedule -$1400, cancel -$200; forecast cloudy: schedule $2300, cancel -$200; forecast sunny: schedule $3500, cancel -$200. Without the weather report: schedule $2400, cancel -$200.
Step 2, best action at each decision node. Forecast sunny (.6): $3500; forecast cloudy (.3): $2300; forecast rain (.1): -$200; without the report: $2400.
Step 3, value of the first decision. Use weather report: $2770; do not use weather report: $2400.]

Bellman’s equation

We just solved Bellman’s equation:

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \Big( C_t(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)$$

» We found the value of being in each state by stepping backward through the tree.
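The backward pass through the tree can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the tuple-based node encoding and the function names are assumptions made here. The forecast probabilities, the payoffs and the sunny and rain outcome distributions come from the slides; the outcome probabilities for the cloudy forecast and for the no-report branch are not legible in the slides, so their expected values ($2300 and $2400) are entered directly as leaves.

```python
# Roll back the game-scheduling decision tree.
# Decision nodes take the max over actions; outcome nodes take an expectation.

def rollback(node):
    """Return the value of a node in a decision tree."""
    kind = node[0]
    if kind == "leaf":        # ("leaf", value)
        return node[1]
    if kind == "outcome":     # ("outcome", [(probability, child), ...])
        return sum(p * rollback(child) for p, child in node[1])
    if kind == "decision":    # ("decision", {action: child, ...})
        return max(rollback(child) for child in node[1].values())
    raise ValueError(f"unknown node type: {kind}")

def weather(p_rain, p_clouds, p_sun):
    """Outcome node for a scheduled game under a given weather distribution."""
    return ("outcome", [(p_rain, ("leaf", -2000)),
                        (p_clouds, ("leaf", 1000)),
                        (p_sun, ("leaf", 5000))])

CANCEL = ("leaf", -200)

def game(schedule_node):
    """Decision node: schedule the game or cancel it."""
    return ("decision", {"schedule": schedule_node, "cancel": CANCEL})

use_report = ("outcome", [
    (0.6, game(weather(0.1, 0.2, 0.7))),  # forecast sunny
    (0.3, game(("leaf", 2300))),          # forecast cloudy: slide value used directly
    (0.1, game(weather(0.8, 0.2, 0.0))),  # forecast rain
])
no_report = game(("leaf", 2400))          # slide value used directly

tree = ("decision", {"use report": use_report, "no report": no_report})

value_with_report = rollback(use_report)
value_without_report = rollback(no_report)
best_value = rollback(tree)
```

Under these inputs the rollback reproduces the slide's numbers up to floating-point rounding: about $2770 with the weather report against $2400 without it.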


Bellman’s equation

The challenge of dynamic programming:

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \Big( C_t(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)$$

Problem: Curse of dimensionality


The curses of dimensionality

What happens if we apply this idea to our blood problem?
» State variable is:
  • The supply of each type of blood, along with its age
    – 8 blood types
    – 6 ages
    – = 48 “blood types”
  • The demand for each type of blood
    – 8 blood types
» Decision variable is how much of 48 blood types to supply to 8 demand types.
  • 216-dimensional decision vector
» Random information
  • Blood donations by week (8 types)
  • New demands for blood (8 types)
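A quick count shows why a lookup table over this state variable is hopeless. The sketch below is illustrative only: the cap on the number of units per dimension is a hypothetical parameter that does not appear in the slides.

```python
# Count the states of the blood problem if every dimension of the
# state variable can take the values 0, 1, ..., max_units_per_dim.
SUPPLY_DIMS = 8 * 6   # 8 blood types x 6 ages = 48 supply dimensions
DEMAND_DIMS = 8       # one demand dimension per blood type

def num_states(max_units_per_dim):
    """Size of the discrete state space under a hypothetical per-dimension cap."""
    levels = max_units_per_dim + 1
    return levels ** (SUPPLY_DIMS + DEMAND_DIMS)

tiny_cap = num_states(1)     # each dimension is 0 or 1
small_cap = num_states(9)    # each dimension is 0..9
```

Even with at most one unit per dimension the state space has 2^56 elements; with up to nine units it has 10^56.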



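The challenge of dynamic programming:

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \Big( C_t(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)$$

Problem: Curse of dimensionality

Three curses
» State space
» Outcome space
» Action space (feasible region)


The computational challenge:

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \Big( C_t(S_t, x_t) + E\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big)$$

» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?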

[Figure sequence: the game-scheduling decision tree shown against Bellman’s equation, with the state $S_t$ marked at a decision node and $S_{t+1}$ at the decision node that follows it. Outcomes shown for the forecast-sunny branch: schedule game gives rain .1 -$2000, clouds .2 $1000, sun .7 $5000; cancel game gives -$200 in every case. Legend: decision nodes and outcome nodes.]


Pre- and post-decision states

New concept:
» The “pre-decision” state variable:
  • $S_t$ = The information required to make a decision $x_t$
  • Same as a “decision node” in a decision tree.
» The “post-decision” state variable:
  • $S^x_t$ = The state of what we know immediately after we make a decision.
  • Same as an “outcome node” in a decision tree.


Pre-decision, state-action, and post-decision

» Pre-decision state: $3^9$ states
» State-action: $3^9 \times 9$ state-action pairs
» Post-decision state: $3^9$ states


A single, complex entity

Pre- and post-decision attributes for our nomadic truck driver:

Attribute   Pre-decision (t=40)   Decision (t=40)   Post-decision (t=40)   Pre-decision (t=50)
City        Dallas                Chicago           Chicago                Chicago
ETA         41.2                  -                 54.7                   56.2
Equip       Good                  -                 Good                   Repair



Pre-decision: resources and demands

$$S_t = (R_t, D_t)$$

$$S^x_t = S^{M,x}(S_t, x_t)$$

$$W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$$

$$S_{t+1} = S^{M,W}(S^x_t, W_{t+1})$$


System dynamics

It is traditional to assume you are given the one-step transition matrix:

  $p(S_{t+1} \mid S_t, x_t)$ = Probability that action $x_t$ takes us from state $S_t$ to state $S_{t+1}$

» Computing the transition matrix is impossible for the vast majority of problems.

We are going to assume that we are given a transition function:

$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$

» This is at the heart of any simulation model.
» Often rule-based. Very easy to compute, even for large-scale problems.



Computing the post-decision state:
» Method 1 – Divide the effect of decisions and information

$$S^x_t = S^{M,x}(S_t, x_t) \quad \text{(the pure effect of a decision)}$$
$$S_{t+1} = S^{M,W}(S^x_t, W_{t+1}) \quad \text{(the effect of the exogenous information)}$$

» Method 2 – State-action pairs (“Q-learning”)

$$S^x_t = (S_t, x_t) \quad \text{(produces a huge post-decision state space)}$$

» Method 3 – Post-decision based on point estimate

$$S^x_t = S^M(S_t, x_t, \bar{W}_{t,t+1}), \quad \bar{W}_{t,t+1} \text{ is a point estimate of } W_{t+1} \text{ at time } t$$
$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$
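Method 1 can be sketched for a single blood type. This is a hedged illustration: the list-by-age encoding, the function names, and the choice to age held blood inside the post-decision step are assumptions made here, not details confirmed by the slides, and demands are left out to keep the example short.

```python
def post_decision(inventory, used):
    """S^{M,x}: the pure effect of a decision.

    inventory[k] is the number of units of age k; used[k] is how many of
    them the decision assigns to demands. Held units age by one period
    (an assumption of this sketch), leaving the age-0 slot empty.
    """
    held = [r - x for r, x in zip(inventory, used)]
    if any(h < 0 for h in held):
        raise ValueError("cannot use more blood than is in inventory")
    return [0] + held

def next_pre_decision(post_state, donations, max_age):
    """S^{M,W}: the effect of exogenous information.

    New donations arrive at age 0; blood older than max_age is discarded.
    """
    nxt = list(post_state)
    nxt[0] += donations
    return nxt[:max_age + 1]

inventory = [5, 3, 2]                    # ages 0, 1, 2
post = post_decision(inventory, used=[0, 1, 2])
nxt = next_pre_decision(post, donations=4, max_age=2)
```

Note that `post_decision` is deterministic given the decision, while `next_pre_decision` needs the random donations. That split is what lets the optimization step avoid an expectation.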



Actually, we have three transition functions:
» The attribute transition function:

$$a^x_t = a^{M,x}(a_t, x_t), \qquad a_{t+1} = a^{M,W}(a^x_t, W_{t+1})$$

» The resource transition function:

$$R^x_t = R^{M,x}(R_t, x_t), \qquad R_{t+1} = R^{M,W}(R^x_t, W_{t+1})$$

» The general transition function:

$$S^x_t = S^{M,x}(S_t, x_t), \qquad S_{t+1} = S^{M,W}(S^x_t, W_{t+1})$$


Bellman’s equations with the post-decision state

Bellman’s equations broken into stages:

» Optimization problem (making the decision):

$$V_t(S_t) = \max_{x_t} \Big( C_t(S_t, x_t) + V^x_t\big(S^{M,x}(S_t, x_t)\big) \Big)$$

  • Note: this problem is deterministic!

» Simulation problem (the effect of exogenous information):

$$V^x_t(S^x_t) = E\big\{ V_{t+1}\big(S^{M,W}(S^x_t, W_{t+1})\big) \mid S^x_t \big\}$$


Bellman’s equations with the post-decision state

Challenges
» For most practical problems, we are not going to be able to compute $V^x_t(S^x_t)$:

$$V_t(S_t) = \max_{x_t} \Big( C_t(S_t, x_t) + V^x_t(S^x_t) \Big)$$

» Concept: replace it with an approximation $\bar{V}_t(S^x_t)$ and solve

$$V_t(S_t) = \max_{x_t} \Big( C_t(S_t, x_t) + \bar{V}_t(S^x_t) \Big)$$

» So now we face:
  • What should the approximation look like?
  • How do we estimate it?


Approximating the value function

Value function approximations:
» Linear (in the resource state):

$$\bar{V}_t(R^x_t) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} \cdot R^x_{ta}$$

  Best when assets are complex, which means that $R_{ta}$ is small (typically 0 or 1).

» Piecewise linear, separable:

$$\bar{V}_t(R^x_t) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R^x_{ta})$$

  Best when assets are simple, which means that $R_{ta}$ may be larger.

» Indexed PWL separable:

$$\bar{V}_t(R^x_t) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\big(R^x_{ta} \mid (\text{features})_t\big)$$

  Helps to capture dependencies, e.g. status of technology, climate, …


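Approximating the value function

Value function approximations:
» Ridge regression (Klabjan and Adelman)

$$\bar{V}_t(R^x_t) = \sum_{f \in \mathcal{F}} \bar{V}_{tf}(R_{tf}), \qquad R_{tf} = \sum_{a \in \mathcal{A}} \theta_{fa} R_{ta}$$

» Benders cuts (more on this later)

[Figure: cuts approximating $V_t(R_t)$ over the axes $x_0$ and $x_1$.]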


Our general algorithm

Step 1: Start with a post-decision state $S^{x,n}_{t-1}$.

Step 2 (Simulation): Obtain a Monte Carlo sample of $W_t(\omega^n)$ and compute the next pre-decision state:

$$S^n_t = S^{M,W}\big(S^{x,n}_{t-1}, W_t(\omega^n)\big)$$

Step 3 (Optimization): Solve the deterministic optimization using an approximate value function:

$$\hat{v}^n_t = \max_{x_t} \Big( C_t(S^n_t, x_t) + \bar{V}^{n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big)$$

to obtain $x^n_t$.

Step 4 (Statistics): Update the value function approximation:

$$\bar{V}^n_{t-1}(S^{x,n}_{t-1}) = (1 - \alpha_{n-1})\, \bar{V}^{n-1}_{t-1}(S^{x,n}_{t-1}) + \alpha_{n-1}\, \hat{v}^n_t$$

Step 5: Find the next post-decision state:

$$S^{x,n}_t = S^{M,x}(S^n_t, x^n_t)$$
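The five steps can be sketched on a toy single-product inventory problem with a lookup-table value function around the post-decision state. Everything specific here is an assumption made for illustration: the problem data (`T`, `CAP`, `PRICE`, `COST`, `DEMANDS`), the stepsize rule 1/n, and the optimistic initial values, which are used only so that a purely greedy policy does not get stuck at "never order". It is not the authors' implementation.

```python
import random

T = 5                      # number of time periods
CAP = 5                    # largest inventory we can hold
PRICE, COST = 10.0, 4.0    # revenue per unit sold, cost per unit ordered
DEMANDS = [0, 1, 2, 3]     # equally likely demands (hypothetical)

def contribution(r, d, x):
    """C(S_t, x_t) for S_t = (r, d): sales revenue minus ordering cost."""
    return PRICE * min(r, d) - COST * x

def post_state(r, d, x):
    """S^{M,x}(S_t, x_t): stock left after sales plus the new order."""
    return r - min(r, d) + x

def adp(iterations, seed=0):
    rng = random.Random(seed)
    # v_bar[t][s] approximates the value of post-decision state s at time t.
    # Rows 0..T-1 start optimistically; the terminal row T stays at zero.
    v_bar = [[PRICE * s for s in range(CAP + 1)] for _ in range(T)]
    v_bar.append([0.0] * (CAP + 1))
    for n in range(1, iterations + 1):
        alpha = 1.0 / n                    # stepsize alpha_{n-1}
        sx_prev = 0                        # Step 1: initial post-decision state
        for t in range(1, T + 1):
            d = rng.choice(DEMANDS)        # Step 2: sample W_t ...
            r = sx_prev                    # ... giving pre-decision state (r, d)
            leftover = r - min(r, d)
            # Step 3: deterministic optimization over feasible order sizes
            v_hat, x_best = max(
                (contribution(r, d, x) + v_bar[t][post_state(r, d, x)], x)
                for x in range(CAP - leftover + 1))
            # Step 4: smooth v_hat into the value of the PREVIOUS post-decision state
            v_bar[t - 1][sx_prev] = ((1 - alpha) * v_bar[t - 1][sx_prev]
                                     + alpha * v_hat)
            # Step 5: move to the next post-decision state
            sx_prev = post_state(r, d, x_best)
    return v_bar

values = adp(200, seed=1)
```

Only the post-decision states that are actually visited get updated, which is the trade-off of a lookup table: no expectation and no loop over all states, but no generalization across states either.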


Competing updating methods

Comparison to other methods:
» Classical MDP (value iteration):

$$V^n(S) = \max_x \Big( C(S, x) + E\, V^{n-1}(S_{t+1}) \Big)$$

» Classical ADP (pre-decision state):

$$\hat{v}^n_t = \max_{x_t} \Big( C_t(S^n_t, x_t) + \sum_{s'} p(s' \mid S^n_t, x_t)\, \bar{V}^{n-1}_{t+1}(s') \Big)$$
$$\bar{V}^n_t(S^n_t) = (1 - \alpha_{n-1})\, \bar{V}^{n-1}_t(S^n_t) + \alpha_{n-1}\, \hat{v}^n_t$$

  Here $\hat{v}_t$ updates $\bar{V}_t(S_t)$.

» Our method (update $\bar{V}^{x,n-1}_t$ around post-decision state):

$$\hat{v}^n_t = \max_{x_t} \Big( C_t(S^n_t, x_t) + \bar{V}^{x,n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big)$$
$$\bar{V}^n_{t-1}(S^{x,n}_{t-1}) = (1 - \alpha_{n-1})\, \bar{V}^{n-1}_{t-1}(S^{x,n}_{t-1}) + \alpha_{n-1}\, \hat{v}^n_t$$

  Here $\hat{v}_t$ updates $\bar{V}_{t-1}(S^x_{t-1})$.




Blood management

Managing blood inventories


Blood management

Managing blood inventories over time

[Figure: timeline over t=0, 1, 2, 3, one week per period. Starting from $S_0$, new information $(\hat{R}_1, \hat{D}_1)$ gives $S_1$, the decision $x_1$ gives $S^x_1$; then $(\hat{R}_2, \hat{D}_2)$, $S_2$, $x_2$, $S^x_2$; then $(\hat{R}_3, \hat{D}_3)$, $S_3$, $x_3$, $S^x_3$.]

[Figure sequence: the blood network for one period. The state is $S_t = (R_t, \hat{D}_t)$. Supply nodes $R_{t,(AB+,0)}$, $R_{t,(AB+,1)}$, $R_{t,(AB+,2)}$, …, $R_{t,(O-,0)}$, $R_{t,(O-,1)}$, $R_{t,(O-,2)}$ hold blood by type and age. Each unit is used to satisfy a demand $\hat{D}_t$ (demand nodes AB+, AB-, A+, A-, B+, B-, O+, O-) or is held, which gives the post-decision inventory $R^x_t$ with nodes (AB+,0) to (AB+,3), …, (O-,0) to (O-,3). New donations $\hat{R}_{t+1,AB+}$, …, $\hat{R}_{t+1,O-}$ then produce $R_{t+1}$.]

[Figure: the same network with the function $F(R_t)$ attached.]

Solve this as a linear program.

Dual variables give the value of an additional unit of blood.

Duals: $\hat{\nu}_{t,(AB+,0)}$, $\hat{\nu}_{t,(AB+,1)}$, $\hat{\nu}_{t,(AB+,2)}$, $\hat{\nu}_{t,(O-,0)}$, $\hat{\nu}_{t,(O-,1)}$, $\hat{\nu}_{t,(O-,2)}$


Updating the value function approximation

Estimate the gradient at $R^n_t$

[Figure: $F(R_t)$ plotted against $R^n_{t,(AB+,2)}$; the dual $\hat{\nu}^n_{t,(AB+,2)}$ is the slope at $R^n_t$.]

Update the value function at $R^{x,n}_{t-1}$

[Figure sequence: the slope $\hat{\nu}^n_{t,(AB+,2)}$ from $F(R_t)$ is used to update $\bar{V}^{n-1}_{t-1}(R^x_{t-1})$ at the point $R^{x,n}_{t-1}$, producing $\bar{V}^n_{t-1}(R^x_{t-1})$.]


Blood management

[Figure: total shortages (# units, 0 to 500) versus iterations (0 to 200), comparing "Not using value functions" with "Using value functions".]








Implementation metrics

Results from the real world:

[Figure: bar chart of percent (0 to 45), history versus model, for five metrics: setouts, swaps, nonpreferred consists, underpowered, overpowered. Data labels as extracted, in order: 25, 21, 30, 32, 41, 21, 37.7, 10.6, 12.]


Schneider National

Case study: truckload trucking

[Figure: two bar charts by capacity category (US_SOLO, US_IC, US_TEAM). Left: revenue per WU (0 to 1400). Right: utilization (0 to 1200). Each chart shows the historical maximum, the simulation, and the historical minimum. Caption: historical min and max versus calibrated model.]




Two-stage optimization

Piecewise linear, separable value function approximations:

Piecewise linear, separable:

$$\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{V}_{tl}(R_{tl})$$


Two-stage optimization

Benders decomposition:

[Figure: cuts approximating $V_t(R_t)$ over the axes $x_0$ and $x_1$.]

Multidimensional cuts produce provably convergent, nonseparable value function approximation.


The competition

Exact solutions using Benders:
» “L-Shaped” decomposition (Van Slyke and Wets)
» Stochastic decomposition (Higle and Sen)
» CUPPS (Chen and Powell)

[Figure: for each method, cuts approximating $V_0$ as a function of $x_0$.]


The competition

[Figure: percent from optimal after 100 iterations (0 to 45) for SD, L-shaped, CUPPS and SPAR, on problems with 10, 25, 50 and 100 locations. Overlay: percent error (0 to 40) versus increasing problem size for Benders and separable approximations.]


Multistage problems

Deterministic, (integer) multicommodity flow

[Figure: percent of optimal (60 to 105) for the test problems Base, T_30, T_90, I_10, I_40, C_II, C_III, C_IV, R_1, R_5, R_100, R_400, C_1, C_8. 100 = optimal continuous relaxation.]


Multistage problems

Stochastic, (integer) multicommodity flow

[Figure: percent of posterior optimal (0 to 120) for the test problems Base, I_10, I_40, C_II, C_III, C_IV, R_1, R_5, R_100, R_400, C_1, C_8, comparing rolling horizon with ADP.]


Properties of separable, piecewise linear value function approximations:
» Converges to optimal when we sample all points infinitely often:
  • H. Topaloglu and W. B. Powell, OR Letters, 2003.
» Provably optimal for two-stage, nonseparable functions with continuously differentiable second stage:
  • R. K.-L. Cheung and W. B. Powell, Operations Research, 2000.
» Provably optimal for two-stage, separable problems:
  • W. B. Powell, A. Ruszczynski and H. Topaloglu, Mathematics of Operations Research, 2004.
» Near-optimal for two-stage, nonseparable with nondifferentiable second stage:
  • W. B. Powell, A. Ruszczynski and H. Topaloglu, Mathematics of Operations Research, 2004.
» Provably optimal for scalar, finite-horizon multistage problems:
  • J. Nascimento and W. B. Powell (under review, Math of OR)
  • J. Nascimento and W. B. Powell (in preparation)

Results apply when we use “pure exploitation” – do not assume points are sampled infinitely often.

Two-stage stochastic programming





[Figure: two-period diagram for 2008 (time $t$) and 2009 (time $t+1$). For each energy source (oil, wind, coal, corn) the diagram shows the resource state $R_t$, the decision $x_t$, and the new information $(\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$, followed by $R_{t+1}$, $x_{t+1}$ and $(\hat{R}_{t+1}, \hat{D}_{t+1}, \hat{\rho}_{t+1})$.]


We have to allocate resources before we know the demands for different types of energy in the future:

[Figure]

We use value function approximations of the future to make decisions now:

[Figure]

This determines how much capacity to provide: $R^{x,n}_{t,1}, R^{x,n}_{t,2}, R^{x,n}_{t,3}, R^{x,n}_{t,4}, R^{x,n}_{t,5}$

Marginal value: $\hat{v}^n_{t,1}(\omega), \hat{v}^n_{t,2}(\omega), \hat{v}^n_{t,3}(\omega), \hat{v}^n_{t,4}(\omega), \hat{v}^n_{t,5}(\omega)$, one for each capacity $R^{x,n}_{t,1}, \dots, R^{x,n}_{t,5}$

Using the marginal values, we iteratively estimate piecewise linear functions.

[Figure sequence: the piecewise linear function $\bar{V}_{t-1,AB+}(R^x_{t-1,AB+})$ around the point $R^{x,n}_{t-1,AB+}$, with left derivative $v^{k-}_t$ and right derivative $v^{k+}_t$ at iteration $k$, followed by the updated slopes $v^{(k+1)-}_t$ and $v^{(k+1)+}_t$.]



Linear value function approximations:

Linear (in the resource state):

$$\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{v}_{tl} \cdot R_{tl}$$



Piecewise linear, separable value function approximations:

Piecewise linear, separable:

$$\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{V}_{tl}(R_{tl})$$



Research challenges

Approximate dynamic programming:
» At the heart of an ADP algorithm is the challenge of finding a value function approximation “that works”
  • Can be used within commercial LP solvers
  • Can be updated (estimated) easily
  • Is stable
  • Provides high quality solutions
» Assessing solution quality
  • Is it realistic?
    – Do we seem to mimic markets and public policy?
  • Is it robust?
    – Do we achieve energy goals under different scenarios?


Research challenges

For the dynamic energy resource model, it is not enough to have a value function that depends purely on the resource vector.
» The value of coal plants depends on our ability to sequester carbon.
» We need to capture the “state of the world” in our value function approximations.

Strategies:
» Let $S_t$ be the full system state vector, capturing the cost of technologies, government policies, etc.
» Let $\phi_f(S_t), f \in \mathcal{F}$ be a set of “features” that appear to be important explanatory variables. Identifying features is the “art” of ADP.
» We can then fit value functions that depend on the features:

$$\bar{V}_t(R_t \mid \phi(S_t)) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta} \mid \phi(S_t))$$


Research challenges

Strategies for fitting

$$\bar{V}_t(R_t \mid \phi(S_t)) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta} \mid \phi(S_t))$$

» Lookup-table
  • Very general, but suffers from curse of dimensionality
» Linear regression with low-dimensional polynomials
  • Can work – depends on the problem.
» Kernel regression
  • Powerful strategy that combines lookup-table with regression models.
  • Use within ADP is surprisingly young.
  • Variety of issues unique to ADP.


Research challenges

Approximate dynamic programming:
» How do we establish that we are getting “good” solutions?
  • Demonstrate techniques on simpler problems.
  • Compare against other methods for larger problems.
» We need algorithms that are fast and stable.
  • Identifying variance reduction methods from the simulation community that work on this problem class.
  • Developing kernel regression techniques for improved fitting of the value function.
  • Finding the best smoothing techniques for recursive updating.
  • Parallel processing for accelerating simulations.


Research challenges

System modeling
» Modeling the evolution of technology using compact representations
  • If we invest in technology, how do we describe the change process?
» Modeling physical processes at multiple scales
  • Wind, temperature, rainfall at different levels of discretization.
» We need a software architecture that allows a larger community to participate in the modeling
  • We need to tap into various types of domain expertise, such as climate modeling, transportation modeling, …



Outline
» R&D for hydrogen fuel cells


R&D optimization for hydrogen fuel cell

We have been testing two methods for solving the hydrogen fuel cell R&D portfolio problem
» Brute force
  • Enumerate all decisions
  • Use Monte Carlo sampling to estimate the value of a particular set of technologies
  • Will not scale to large problems
» Approximate dynamic programming
  • Replace value function with linear approximation
  • Determine portfolio by solving a knapsack problem using a solver.
  • Scales to large problems, but how large is the error introduced by the linear value function approximation?


R&D optimization for hydrogen fuel cell

Test problems
» Smaller dataset
  • 12 projects
  • Must choose 5 to research
  • 792 combinations
» Larger dataset
  • 18 projects
  • Must choose 5 to fund
  • 8568 combinations
» General
  • First choose projects to research
  • Learn results of research
  • Choose the best technologies for the fuel cell, and evaluate the cost of the fuel cell.
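The portfolio counts above are binomial coefficients, and the two solution methods can be contrasted on a toy instance. In the sketch below the project values are invented for illustration, and the portfolio value is taken to be purely additive (a linear value function). Under that assumption, picking the five largest marginal values matches brute-force enumeration exactly; the real fuel cell problem is not additive, which is why the slides ask how much error the linear approximation introduces.

```python
from itertools import combinations
from math import comb

# Number of ways to choose 5 projects out of 12 and out of 18.
smaller_dataset = comb(12, 5)
larger_dataset = comb(18, 5)

# Hypothetical marginal value of each of 12 projects (not from the slides).
values = [3.0, 1.5, 4.2, 0.7, 2.8, 5.1, 0.3, 2.2, 3.9, 1.1, 4.8, 0.9]

def brute_force(values, k):
    """Enumerate every k-project portfolio and keep the most valuable one."""
    return max(combinations(range(len(values)), k),
               key=lambda portfolio: sum(values[i] for i in portfolio))

def top_k(values, k):
    """With an additive value and a 'choose k' constraint, fund the k best projects."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return tuple(sorted(ranked[:k]))

best_by_enumeration = brute_force(values, 5)
best_by_ranking = top_k(values, 5)
```

Enumeration touches all 792 portfolios here, while ranking needs only one sort, which is the scaling argument made on the slide.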


R&D optimization for hydrogen fuel cell

Elements of our hydrogen fuel cell problem



The optimization problem
» Performance of fuel cell depends on parameters
» Choose a subset of projects to perform additional research.
» Parameters for the chosen project will change in a random way.



Shape of the cost function


Marginal value of each research project

Top 5 are funded.
Projects in common color compete.

Project 6 drops out of R&D portfolio; project 9 enters


Results from 792 R&D portfolios


Confidence interval for the value of the solution resulting from a particular R&D portfolio (five projects).

Best estimate


Best overall solution (from brute force)
Optimal solution chosen by ADP


Results from 8568 R&D portfolios

Optimal solution chosen by ADP
Best overall solution (from brute force)
