
Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Jul 15, 2015


IFPRI-EPTD
Transcript
Page 1: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Day 4: Stochastic Dynamic Programming


Page 2: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Understand Bellman's Principle of Optimality and the basic Stochastic Dynamic Programming problem

Solve the SDP with value function iteration

Apply the concepts to models of agro-forestry and livestock herd dynamics

Make changes to the SDP and simulate the corresponding change in optimal solution


Page 3: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Re-cap on rangeland stocking model

Introduction to Stochastic Dynamic Programming
◦ Extend the DP framework to include stochastic state variables and apply it to herd and agro-forestry management

Stochastic Cake Eating

Multi-State Models
◦ Function Approximation

Agro-Forestry Application
◦ Input Data and State Space
◦ Simulation

Herd Dynamics Application
◦ Input Data
◦ Simulation


Page 4: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

An Application to Reservoir Management


Page 5: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Howitt RE, Msangi S, Reynaud A, Knapp KC (2005). "Estimating Intertemporal Preferences for Resource Allocation." American Journal of Agricultural Economics (AJAE) 87(4): 969-983.

What started out as a calibration exercise ended up as a research project, with some interesting research discoveries.


Page 6: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Many of the Important Policy Questions in Natural Resource Management Revolve Around How to Deal with Uncertainty over Time (Global Climate Change, Extreme Weather Events, Invasive Species Encroachment, Disease Outbreak, etc.)

Policy Makers look to Economic Models to Provide them with Guidance on Best Management Practices

Page 7: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Economic Policy Models Have Typically Downplayed the Role of Risk in the Preferences of the Decision-maker

Few Studies Have Ever Tried to Measure the Degree to Which Risk Aversion Matters in Resource Management Problems

Time-Additive Separability in Dynamic Models Imposes Severe Constraints on Intertemporal Preferences

Page 8: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

In order to Address this Gap in the Natural Resources literature….

We Applied Dynamic Estimation Methods to an Example of Reservoir Management

We Relaxed the Assumption of Time-Additive Separability of the Decision-Maker’s Utility

We Tested Alternative Utility Forms to Determine the Importance of Risk Aversion

Page 9: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Koopmans (1960) laid the foundation for eliminating deficiencies of TAS with recursive preferences.

Recursive Utility is a class of functionals designed to offer greater generality in time preferences while still maintaining time consistency in behavior.

It allows for the potential smoothing of consumption by allowing complementarity between time periods.

Page 10: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The recursive formulation

$$U(\mathbf{c}) = W\big(u(c_1),\, U(S\mathbf{c})\big)$$

states the weak separability of the future from the present, where $W(\cdot)$ is an aggregator function and $S\mathbf{c}$ denotes the consumption stream shifted forward one period.

For TAS, the aggregator is simply $W\big(u(c), x\big) = u(c) + \beta\,x$.

Page 11: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

So we choose our aggregator to be

$$W\big(u(c), x\big) = \Big[(1-\beta)\,u(c)^{\rho} + \beta\,x^{\rho}\Big]^{1/\rho}, \qquad \rho \in (-\infty, 0) \cup (0, 1],$$

and the implied elasticity ("resistance") of intertemporal substitution is given by

$$\sigma\;(\text{EIS}) = \frac{1}{1-\rho}.$$
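For example, the estimate reported later in these notes, $\rho = -9$, implies $\sigma = 1/(1-(-9)) = 0.10$: a very low willingness to substitute consumption across time. Values of $\rho$ close to 1 imply a very large $\sigma$, the near-linear case.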

Page 12: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Time-Additive Separable Utility

Using Bellman's recursive relationship:

$$V(x_t) = \max_{c_t}\big\{U(c_t) + \beta\,V(x_{t+1})\big\}$$

$$V(x_{t+1}) = \max_{c_{t+1}}\big\{U(c_{t+1}) + \beta\,V(x_{t+2})\big\}$$

$$V(x_{t+2}) = \max_{c_{t+2}}\big\{U(c_{t+2}) + \beta\,V(x_{t+3})\big\}$$

Substituting and simplifying:

$$V(x_t) = \max_{c_t,\,c_{t+1},\,c_{t+2}}\big\{U(c_t) + \beta\,U(c_{t+1}) + \beta^{2}\,U(c_{t+2}) + \beta^{3}\,V(x_{t+3})\big\}$$

Note that the MRS between any two adjacent periods depends only on consumption in those two periods:

$$MRS_{t+1,\,t+2} = \frac{u'(c_{t+1})}{\beta\,u'(c_{t+2})}$$

Page 13: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Iso-Elastic Recursive Utility: a utility function with a CES across time periods.

$$V(x_t) = \max_{c_t}\Big[(1-\beta)\,U(c_t)^{\rho} + \beta\,V(x_{t+1})^{\rho}\Big]^{1/\rho}$$

$$V(x_{t+1}) = \max_{c_{t+1}}\Big[(1-\beta)\,U(c_{t+1})^{\rho} + \beta\,V(x_{t+2})^{\rho}\Big]^{1/\rho}$$

$$V(x_{t+2}) = \max_{c_{t+2}}\Big[(1-\beta)\,U(c_{t+2})^{\rho} + \beta\,V(x_{t+3})^{\rho}\Big]^{1/\rho}$$

Substituting and simplifying:

$$V(x_t) = \max_{c_t,\,c_{t+1},\,c_{t+2}}\Big[(1-\beta)\,U(c_t)^{\rho} + \beta(1-\beta)\,U(c_{t+1})^{\rho} + \beta^{2}(1-\beta)\,U(c_{t+2})^{\rho} + \beta^{3}\,V(x_{t+3})^{\rho}\Big]^{1/\rho}$$

The marginal value of consumption in each period now carries that period's own CES bracket:

$$\frac{\partial V(x_{t+1})}{\partial c_{t+1}} = (1-\beta)\,u'(c_{t+1})\,U(c_{t+1})^{\rho-1}\Big[(1-\beta)\,U(c_{t+1})^{\rho} + \beta(1-\beta)\,U(c_{t+2})^{\rho} + \beta^{2}\,V(x_{t+3})^{\rho}\Big]^{\frac{1-\rho}{\rho}}$$

$$\frac{\partial V(x_{t+2})}{\partial c_{t+2}} = (1-\beta)\,u'(c_{t+2})\,U(c_{t+2})^{\rho-1}\Big[(1-\beta)\,U(c_{t+2})^{\rho} + \beta(1-\beta)\,U(c_{t+3})^{\rho} + \beta^{2}\,V(x_{t+4})^{\rho}\Big]^{\frac{1-\rho}{\rho}}$$

Page 14: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

With Recursive Utility All Periods Enter into MRS

$$MRS_{t+1,\,t+2} = \frac{\partial V(x_{t+1})/\partial c_{t+1}}{\beta\,\partial V(x_{t+2})/\partial c_{t+2}} = \frac{u'(c_{t+1})\,U(c_{t+1})^{\rho-1}\Big[(1-\beta)\,U(c_{t+1})^{\rho} + \beta(1-\beta)\,U(c_{t+2})^{\rho} + \beta^{2}\,V(x_{t+3})^{\rho}\Big]^{\frac{1-\rho}{\rho}}}{\beta\,u'(c_{t+2})\,U(c_{t+2})^{\rho-1}\Big[(1-\beta)\,U(c_{t+2})^{\rho} + \beta(1-\beta)\,U(c_{t+3})^{\rho} + \beta^{2}\,V(x_{t+4})^{\rho}\Big]^{\frac{1-\rho}{\rho}}}$$

In micro-economics we have an appreciation of the difference between linear and CES utility in static consumer theory

The same intuition applies here in a dynamic context….

Page 15: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The previous equations show that the marginal rate of substitution across time is path dependent.

Timing is now an explicit economic control variable.

We no longer assume that "the marginal rate of substitution between lunch and dinner is independent of the amount of breakfast" (Henry Wan).

A smaller elasticity of intertemporal substitution flattens out the optimal time path of resource use, yielding a time-consistent, sustainable result.

Page 16: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Stochastic Equations of Motion link Stocks and Flows

Randomness in the equations of motion or exogenous random shocks change the system evolution

The current state and future distributions are usually known to decision makers

Management decisions inherently optimize a stochastic dynamic path of resource use and consequently maximize dynamic stochastic utility

Page 17: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

A Simple Resource Network with a Single State Variable

[Figure: schematic of the network. Stochastic inflow $\tilde e_{1t}$ enters reservoir storage $S_t$; release $w_t$, together with stochastic downstream accretion $\tilde e_{2t}$, meets Demand.]

Page 18: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The Optimization Problem

$$\max_{w_t}\; U_t = \Big[(1-\beta)\Big(\mathrm{E}_{e_2}\big[W(q_t)^{\alpha}\big]\Big)^{\rho/\alpha} + \beta\Big(\mathrm{E}_{e_1}\big[U_{t+1}^{\alpha}\big]\Big)^{\rho/\alpha}\Big]^{1/\rho}$$

subject to

$$S_{t+1} = S_t - w_t + \tilde e_{1t}, \qquad q_t = w_t + \tilde e_{2t},$$

$$w_t \ge 0, \qquad S_{t+1} \ge 0, \qquad S_{t+1} \le \bar S.$$
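As a quick illustration of these equations of motion (not part of the course GAMS code), the sketch below simulates the reservoir mass balance under a hypothetical fixed-fraction release rule; the inflow distributions, capacity, and release rule are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

S_BAR = 3.5     # illustrative storage capacity (MAF); assumed, not from the slides
T = 25          # years to simulate

def simulate(release_fraction=0.5, S0=2.0):
    """Simulate the single-reservoir mass balance from the slides:
       S[t+1] = S[t] - w[t] + e1[t],  q[t] = w[t] + e2[t],
       with 0 <= w[t] <= S[t] and S[t+1] capped at S_BAR (excess spills)."""
    S = np.empty(T + 1)
    S[0] = S0
    w = np.empty(T)
    q = np.empty(T)
    for t in range(T):
        e1 = rng.gamma(shape=2.0, scale=0.5)        # illustrative upstream inflow draw
        e2 = rng.gamma(shape=1.5, scale=0.2)        # illustrative downstream accretion draw
        w[t] = min(release_fraction * S[t], S[t])   # hypothetical myopic release rule
        q[t] = w[t] + e2                            # water reaching demand
        S[t + 1] = min(S[t] - w[t] + e1, S_BAR)     # storage update, spilling at capacity
    return S, w, q

S, w, q = simulate()
print("mean delivery:", round(q.mean(), 2), "MAF; final storage:", round(S[-1], 2), "MAF")
```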

Page 19: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Which can be re-stated in terms of Bellman's Recurrence Relationship:

$$V(S_t) = \max_{w_t \ge 0}\Big[(1-\beta)\Big(\int W(w_t + e_2)^{\alpha}\,\mathrm{d}\Phi_2\Big)^{\rho/\alpha} + \beta\Big(\int V(S_{t+1}, e_1)^{\alpha}\,\mathrm{d}\Phi_1\Big)^{\rho/\alpha}\Big]^{1/\rho}$$

...and which we solve numerically with continuous-valued state and control variables.

Page 20: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Solving for the Expected Value Function

Initialize with:
◦ The intermediate value function $W(X_t, u_t)$
◦ Nodes for state evaluation and for the stochastic inflow values, with probabilities $p_k$ of inflow over the $k$ stochastic node values
◦ The state variable defined on the $[-1, 1]$ interval at each polynomial node $j$, giving nodes $x_j$
◦ Value function Chebychev polynomial coefficients $a_i$

Value iteration loop ($n$ steps), $n = n + 1$:

$$PVNB_j^{\,n} = \max_{u_t}\; W(X_t, u_t) + \beta \sum_k p_k \sum_i a_i^{\,n-1}\,\phi_i\big(X_{t+1,\,j,\,k}\big) \qquad \forall j$$

$$a_i^{\,n} = \frac{\sum_j \phi_i(X_j)\,PVNB_j^{\,n}}{\sum_j \phi_i(X_j)\,\phi_i(X_j)}$$

$$\text{Error} = \sum_i \big(a_i^{\,n} - a_i^{\,n-1}\big)^{2}$$

If Error < 0.1e-7, stop; otherwise continue the loop.
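The following Python sketch mirrors this flowchart for a single-state version of the reservoir problem: Chebychev nodes on the storage interval, the PVNB maximization at each node (by grid search over releases), the least-squares coefficient update, and the squared-error stopping rule. It is a minimal illustration, not the course GAMS implementation; the inflow nodes, probabilities, and capacity are assumed, deliveries are collapsed to the release for brevity, and the net benefit function uses the cubic W(q) reported later in these notes.

```python
import numpy as np

# Problem data (illustrative assumptions; the course GAMS model has its own values)
BETA, S_BAR, N = 0.95, 3.5, 7                  # discount factor, capacity (MAF), # Chebychev nodes
e_nodes = np.array([0.3, 0.8, 1.5])            # assumed discrete inflow values
p_k = np.array([0.3, 0.5, 0.2])                # their probabilities

def W(q):
    """One-period net benefit of water deliveries q (cubic form from these notes)."""
    return 150*q - 0.45*q**2 + 0.0067*q**3

# Chebyshev nodes on [-1, 1], mapped to the storage interval [0, S_BAR]
z = np.cos((2*np.arange(1, N + 1) - 1) * np.pi / (2*N))
S_nodes = (z + 1) * S_BAR / 2

def basis(S):
    """Chebyshev basis functions phi_1..phi_N evaluated at storage level(s) S."""
    x = 2*np.atleast_1d(S)/S_BAR - 1           # map back to [-1, 1]
    phi = np.empty((N, x.size))
    phi[0], phi[1] = 1.0, x
    for j in range(2, N):
        phi[j] = 2*x*phi[j-1] - phi[j-2]       # recurrence T_j = 2 x T_{j-1} - T_{j-2}
    return phi

PHI = basis(S_nodes)                           # N x N matrix: phi_i at each state node
a = np.zeros(N)                                # value-function coefficients a_i
w_grid = np.linspace(0.0, S_BAR, 81)           # candidate releases (grid search over the control)

for n in range(500):
    PVNB = np.empty(N)
    for j, S in enumerate(S_nodes):
        w = np.minimum(w_grid, S)                               # cannot release more than storage
        S_next = np.clip(S - w[:, None] + e_nodes, 0, S_BAR)    # next storage at each inflow node
        EV = (p_k * (a @ basis(S_next.ravel())).reshape(S_next.shape)).sum(axis=1)
        PVNB[j] = np.max(W(w) + BETA*EV)                        # maximized Bellman right-hand side
    a_new = (PHI @ PVNB) / (PHI**2).sum(axis=1)                 # least-squares coefficient update
    if np.sum((a_new - a)**2) < 1e-7:                           # stopping rule from the flowchart
        a = a_new
        break
    a = a_new

print(f"stopped after {n + 1} iterations; coefficients: {np.round(a, 2)}")
```

The coefficient update works here because Chebyshev polynomials are discretely orthogonal at Chebyshev nodes, so the simple ratio of sums reproduces the least-squares fit the flowchart describes.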

Page 21: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming
Page 22: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming
Page 23: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Current Profit Function:

$$W(q) = 150\,q - 0.45\,q^{2} + 0.0067\,q^{3}$$

Spill Function: $sp(\tilde e_{1t}, cap_t)$, a cubic polynomial in the stochastic inflow $\tilde e_{1t}$ and reservoir capacity $cap_t$, with estimated coefficients 0.095382, 0.005024, 0.000993, and -0.02305.

Page 24: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Net Benefit Function for Water

[Figure: the net benefit function $W(q)$, in 1000 M$US (0 to 6), plotted against water deliveries $q$ in MAF (0 to 56).]

Page 25: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

We employ a nested procedure to solve the SDP problem with value iteration, while we systematically change the parameter values of the objective function to maximize a likelihood function.

We employ a derivative-free (Nelder-Mead) search algorithm to implement the 'hill-climbing' procedure that searches for the likelihood-maximizing values of the preference parameters.
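A minimal sketch of that nested structure is below, using scipy's Nelder-Mead implementation as a stand-in for the authors' search routine. The inner SDP solve and the likelihood are deliberately stubbed out: solve_sdp, the toy policy rule, and the synthetic observed series are hypothetical placeholders, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the observed release series (the paper uses 1974-1996 data).
observed_w = np.array([1.2, 1.5, 0.9, 1.8, 2.1, 1.7, 1.4, 1.6])
storage_path = np.linspace(2.0, 3.0, observed_w.size)        # matching hypothetical storage levels

def solve_sdp(rho, alpha, beta=0.95):
    """Placeholder for the inner step: solve the recursive-utility SDP by value iteration
    and return the optimal release policy. A real implementation would call the Chebychev
    value-iteration routine; here a toy rule stands in so the outer loop can run."""
    def policy(storage):
        return max(0.0, 0.5*storage + 0.1*rho + 0.1*alpha)    # hypothetical decision rule
    return policy

def neg_log_likelihood(theta, sigma=0.25):
    """Outer step: given preference parameters theta = (rho, alpha), solve the SDP,
    simulate releases along the storage path, and score them against the observations."""
    rho, alpha = theta
    policy = solve_sdp(rho, alpha)
    predicted = np.array([policy(s) for s in storage_path])
    resid = observed_w - predicted
    return 0.5*np.sum(resid**2)/sigma**2 + observed_w.size*np.log(sigma)

# Derivative-free 'hill-climbing' search over the preference parameters.
result = minimize(neg_log_likelihood, x0=np.array([-1.0, 0.5]), method="Nelder-Mead")
print("likelihood-maximizing (rho, alpha):", np.round(result.x, 3))
```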

Page 26: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Parameter   Estimated Value   Standard Error   Implied preference measure
ρ           -9.000            4.60             EIS = 1/(1-ρ) = 0.100
α           -0.440            0.23             Coeff. of risk aversion = 1-α = 1.440

Log Likelihood: -10.257

These parameters were estimated with a fixed discount rate of β = 0.95. Standard errors are based on 500 bootstrap repetitions.

Page 27: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

For the Risk-Neutral Recursive model (RNR): set α = 1, estimate ρ, and fix β = 0.95.

For the Risk-Neutral (non-Recursive) model (RN): set ρ = α = 1 and fix β = 0.95.

For the Non-Recursive model (with Risk), use CRRA utility and estimate α and β:

$$U_t = \frac{W_t^{\,1-\alpha}}{1-\alpha} + \beta\,\mathrm{E}_t\big(U_{t+1}\big)$$

Page 28: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

[Figure: simulated (RN model) vs. actual values, 1974-1996, in million acre-feet; vertical axis 0 to 3.5.]

Page 29: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

[Figure: simulated (CRRA model) vs. actual values, 1974-1996, in million acre-feet; vertical axis 0 to 3.5.]

Page 30: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

[Figure: simulated (REC model) vs. actual values, 1974-1996, in million acre-feet; vertical axis 0 to 3.5.]

Page 31: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

[Figure: simulated (RN model) vs. actual values, 1974-1996, in million acre-feet; vertical axis 0 to 9.]

Page 32: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

[Figure: simulated (REC and CRRA models) vs. actual values, 1974-1996, in million acre-feet; vertical axis 0 to 9.]

Page 33: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Clearly, a non-recursive model that ignores risk fares the worst when compared to actual storage and releases.

Adding risk, but not recursivity of preferences, gets you closer to actual values, but not all the way.

A Recursive Specification outperforms both of these, with or without risk aversion

Page 34: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Estimation of the Fully-Recursive model is robust to Discount Values and the Parameter Estimates appear to be Stationary over the Study Period

Once we allow Intertemporal Preferences to be recursive, the role of Risk in explaining Resource Management Behavior is Reduced

Imposing Time-Additive Separability on Dynamic Models may have more severe implications for behavior than most researchers realize…..

Page 35: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming


Page 36: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Extend DP framework to include stochastic state variables in the model

Apply the new framework to herd dynamics and agro-forestry management

Return to cake eating example


Page 37: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Stochastic Cake Eating
◦ What if I want cake today, but not tomorrow?

Cake Eating Example: CakeEatingDP_ChebyAprox_Stochastic_Day4.gms

Consider a taste shock $\varepsilon$, so that utility from cake consumption is now $u(c, \varepsilon)$.
◦ The agent knows the value of the stochastic shock today, but it is unknown for future periods.
◦ The agent should factor potential future shocks into today's consumption decision.

Page 38: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Step 1: Define the nature of the stochastic shock
◦ First-order Markov process: the probability of future shocks is described by the current period.
◦ Two states, $l$ and $h$, described by $\varepsilon_l$ and $\varepsilon_h$.
◦ The transition between states follows a first-order Markov process, described by the matrix $\Pi$:

$$\Pi = \begin{bmatrix} \pi_{ll} & \pi_{lh} \\ \pi_{hl} & \pi_{hh} \end{bmatrix}$$

◦ An element of the matrix yields the probability of moving from state $i$ to state $j$ in the next period:

$$\pi_{ij} \equiv \Pr\big(\varepsilon_{t+1} = \varepsilon_j \,\big|\, \varepsilon_t = \varepsilon_i\big)$$
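A small numerical illustration of this transition structure follows; the shock values and probabilities are assumed, not from the course files.

```python
import numpy as np

# Two taste-shock states (low, high) with assumed magnitudes and transition probabilities.
eps = np.array([0.8, 1.2])                 # illustrative shock values eps_l, eps_h
Pi = np.array([[0.7, 0.3],                 # [pi_ll, pi_lh]
               [0.4, 0.6]])                # [pi_hl, pi_hh]; each row sums to one

# Conditional expectation of next-period values given today's state i: sum_j pi_ij * v_j
v_next = np.array([5.0, 7.5])              # e.g. continuation values at eps_l and eps_h
print("E[v' | eps_l], E[v' | eps_h]:", Pi @ v_next)

# Long-run (stationary) distribution of the shock, for reference
vals, vecs = np.linalg.eig(Pi.T)
stationary = np.real(vecs[:, np.argmax(np.real(vals))])
print("stationary distribution:", stationary / stationary.sum())
```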

Page 39: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The agent's choice of how much cake to eat depends on:
◦ The size of the cake
◦ The realization of the taste shock

With knowledge of the current shock and its expected transition to future periods, the stochastic cake-eating problem can be written as:

$$V(x_t, \varepsilon_t) = \max_{c_t}\Big\{u(c_t, \varepsilon_t) + \beta\,\mathrm{E}_{\varepsilon_{t+1}|\varepsilon_t}\big[V(x_{t+1}, \varepsilon_{t+1})\big]\Big\}, \qquad x_{t+1} = x_t - c_t, \quad \varepsilon \sim \text{Markov}$$

Page 40: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The Markov process for the evolution of the taste shock states that today's preferences yield the probabilities of tomorrow's preferences.
◦ This may not hold if we believe that tomorrow does not depend on the value today.

We can specify any type of random variable in the SDP problem.
◦ Consider specifying the taste shock as a random variable.
◦ Define $e$ points, each with known probability $pr_e$, of a shock with magnitude $shk_e$; we define the probabilities such that:

$$\sum_e pr_e = 1$$

Page 41: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

After defining the known probabilities and shock magnitudes, we can re-write the stochastic cake-eating problem as:

$$V(x_t, e_t) = \max_{c_t}\Big\{u(c_t, e_t) + \beta\,\mathrm{E}_{e_{t+1}|e_t}\big[V(x_{t+1}, e_{t+1})\big]\Big\}, \qquad x_{t+1} = x_t - c_t, \quad e \sim RV$$

Assume the stochastic shock affects utility multiplicatively:

$$V(x_t, e_t) = \max_{c_t}\Big\{shk(e_t)\,u(c_t) + \beta \sum_{e_{t+1}} pr(e_{t+1})\,V\big(x_{t+1}, e_{t+1}\big)\Big\}, \qquad x_{t+1} = x_t - c_t$$

This is a simple stochastic process in which the distribution of $e$ in future periods is independent of the current period and independent of the other states and the control.

The contraction mapping theorem holds: there exists a fixed point of the functional equation (Bellman).
◦ Solve for this point using the same methods as for the deterministic DP.
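The sketch below solves this discrete-shock, multiplicative-utility cake-eating problem by value function iteration on a grid (rather than the Chebychev approximation used in CakeEatingDP_ChebyAprox_Stochastic_Day4.gms); the shock magnitudes, probabilities, and log period utility are illustrative assumptions.

```python
import numpy as np

BETA = 0.95
shk = np.array([0.8, 1.0, 1.3])          # assumed taste-shock magnitudes shk(e)
pr = np.array([0.25, 0.50, 0.25])        # their probabilities (sum to one)

x_grid = np.linspace(1e-3, 1.0, 201)     # grid for the cake size x
u = np.log                               # period utility u(c); log chosen for illustration

V = np.zeros((x_grid.size, shk.size))    # V(x, e)
for it in range(1000):
    EV = V @ pr                          # E[V(x', e')]: iid shock, so the same for every e today
    V_new = np.empty_like(V)
    policy = np.zeros_like(V, dtype=int)
    for i, x in enumerate(x_grid):
        c = x_grid[x_grid <= x]                      # feasible consumption levels c <= x
        EV_next = np.interp(x - c, x_grid, EV)       # continuation value at x' = x - c
        for s in range(shk.size):
            rhs = shk[s]*u(c) + BETA*EV_next         # Bellman right-hand side
            policy[i, s] = np.argmax(rhs)
            V_new[i, s] = rhs[policy[i, s]]
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

print("iterations:", it + 1)
print("optimal consumption with a full cake, by shock state:",
      [round(float(x_grid[policy[-1, s]]), 3) for s in range(shk.size)])
```

A higher taste shock today raises today's marginal utility of cake, so the optimal consumption out of a given cake is larger in the high-shock state, which is the intuition the slide describes.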

Page 42: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The SDP and DP frameworks both extend naturally to models with several state variables.
◦ These will generally involve multiple states that we need to model simultaneously, for example herd stocking (prices, disease, rainfall, herd size and population dynamics).
◦ In general, we can write, for any number of states $m$:

$$V(x_t) = \max_{c_t}\Big\{f\big(c_t, x_t^{1}, \dots, x_t^{m}\big) + \beta\,V\big(x_{t+1}^{1}, \dots, x_{t+1}^{m}\big)\Big\}, \qquad x_{t+1}^{m} = g\big(x_t^{m}, c_t\big)$$

There are computational costs to extending the dynamic framework to many states.
◦ As the number of states increases, so does the number of points at which we must evaluate and solve the DP.
◦ This is the "curse of dimensionality."
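For example, with 10 evaluation nodes per state variable, one state requires 10 points per Bellman iteration, three states require 10^3 = 1,000 points, and six states require 10^6 = 1,000,000 points.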

Page 43: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Function Approximation
◦ Extends naturally to multi-state applications via the Chebychev approximation approach.
◦ Extension to $m$ states: define each state variable's lower and upper bounds, $[L^m, U^m]$.

Map to the $[-1, 1]$ interval using the same formula:

$$\hat x_j^{\,m} = \cos\!\left(\frac{(2j-1)\,\pi}{2n}\right), \qquad j = 1, \dots, n$$

The transformation back to the $[L^m, U^m]$ interval can be calculated as:

$$x_j^{\,m} = L^m + \frac{\big(\hat x_j^{\,m} + 1\big)\big(U^m - L^m\big)}{2}$$
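A minimal sketch of the node construction and the interval mapping (the bounds in the example call are assumed):

```python
import numpy as np

def cheb_nodes(n, L, U):
    """n Chebyshev nodes: x_hat_j = cos((2j - 1) pi / (2n)) on [-1, 1], then mapped to [L, U]."""
    x_hat = np.cos((2*np.arange(1, n + 1) - 1) * np.pi / (2*n))
    x = L + (x_hat + 1)*(U - L)/2          # affine map from [-1, 1] onto [L, U]
    return x_hat, x

# e.g. five nodes for a state variable assumed to lie between 0 and 200
x_hat, x = cheb_nodes(5, 0.0, 200.0)
print(np.round(x_hat, 3), np.round(x, 1))
```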

Page 44: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

◦ Given the mapping back to the $[L^m, U^m]$ interval, we can now define the Chebychev interpolation matrix using the recursive formula:

$$\phi_1 = 1, \qquad \phi_2 = \hat x^{\,m}, \qquad \phi_j = 2\,\hat x^{\,m}\,\phi_{j-1} - \phi_{j-2} \quad \forall j \ge 3$$

◦ Having defined the state space, the Chebychev nodes, and the basis functions for each state variable $m$, we can write the Chebychev approximation to the value function as:

$$V \approx \sum_{j_1}\cdots\sum_{j_m} a_{j_1 \ldots j_m}\,\prod_{m} \phi_{j_m}$$

◦ The value function approximation with multiple states simply extends the Chebychev polynomials to additional dimensions, approximating the solution over each state.
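The sketch below implements the basis recursion and the tensor-product approximation for m state variables; the bounds, polynomial order, and coefficient values are placeholders (in practice the coefficients come from the value-iteration fit):

```python
import numpy as np

def cheb_basis(x_hat, n):
    """phi_1 = 1, phi_2 = x_hat, phi_j = 2*x_hat*phi_{j-1} - phi_{j-2} for j >= 3."""
    phi = np.empty(n)
    phi[0], phi[1] = 1.0, x_hat
    for j in range(2, n):
        phi[j] = 2*x_hat*phi[j-1] - phi[j-2]
    return phi

def V_approx(x, bounds, a):
    """Tensor-product Chebyshev approximation of the value function for m states:
       V(x) ~= sum over (j_1..j_m) of a[j_1,...,j_m] * prod_k phi_{j_k}(x_hat_k)."""
    n = a.shape[0]                                   # same polynomial order in every state
    x_hat = [2*(xk - L)/(U - L) - 1 for xk, (L, U) in zip(x, bounds)]   # map each state to [-1, 1]
    phis = [cheb_basis(xh, n) for xh in x_hat]       # one basis vector per state variable
    out = a
    for phi in phis:                                 # contract one dimension at a time
        out = np.tensordot(out, phi, axes=([0], [0]))
    return float(out)

# three-state example (e.g. early, mature, old tree areas), with assumed bounds and random coefficients
bounds = [(0, 100), (0, 100), (0, 100)]
a = np.random.default_rng(1).normal(size=(4, 4, 4))  # a[j1, j2, j3]
print(V_approx([30.0, 40.0, 10.0], bounds, a))
```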

Page 45: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Agro-Forestry Example: AgroForestryModel_DP_Day4.gms
◦ Trees vary in age, expected yield, and profitability: how do I manage a fixed amount of land with new plantings and removals?

Input Data and State Space
◦ 20 year time horizon
◦ Early, mature and old trees
◦ 60% of early tree plantings transition to mature trees, and 30% of mature trees transition to old trees

Page 46: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

The transitions between age profiles are as follows:

Transition Matrix   Early   Mature   Old
Early               0.4     0.6      0
Mature              0       0.7      0.3
Old                 0       0        1

Model Data
◦ 100 hectares
◦ Cost to uproot is 20/ha
◦ Cost to replant is 100/ha
◦ 5% discount rate

Key Model Parameters

Model Data                    Early   Mature   Old
Price per kg                  10      10       10
Yield (kg/ha)                 0       10       5
Initial profile (plantings)   10      5        4
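As a quick check on these inputs, the sketch below ages an initial planting profile forward with the transition matrix under a fixed planting and removal rule; that rule is purely illustrative, not the optimal policy the DP model solves for.

```python
import numpy as np

# Transition of area between age classes, taken from the table above.
T = np.array([[0.4, 0.6, 0.0],    # early  -> (early, mature, old)
              [0.0, 0.7, 0.3],    # mature -> (early, mature, old)
              [0.0, 0.0, 1.0]])   # old stays old until removed
price = np.array([10, 10, 10])
yield_kg = np.array([0, 10, 5])   # kg/ha by age class
uproot_cost, replant_cost, r = 20, 100, 0.05

area = np.array([10.0, 5.0, 4.0])         # initial profile (ha) by age class
plant, remove = 2.0, 1.0                  # illustrative fixed decisions (ha/yr), not optimal

npv = 0.0
for t in range(20):                       # 20-year horizon from the slides
    revenue = price @ (yield_kg * area)
    cost = replant_cost*plant + uproot_cost*remove
    npv += (revenue - cost) / (1 + r)**t
    area = T.T @ area                     # age the stand one period
    area[0] += plant                      # new plantings enter the early class
    area[2] = max(area[2] - remove, 0.0)  # removals taken from old trees
print("area by class after 20 years:", np.round(area, 1), "| NPV:", round(npv, 1))
```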

Page 47: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Simulation
◦ Three state variables: early, mature, old
◦ Approximate the solution of the infinite-horizon problem by a Chebychev approximation of the value function: define $m = 3$, and

$$V \approx \sum_{j_1}\sum_{j_2}\sum_{j_3} a_{j_1 j_2 j_3}\,\prod_{m=1}^{3} \phi_{j_m}$$

Page 48: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Herd Dynamics Example: HerdDynamics_DP_Day4.gms
◦ Varying degrees of age and productivity
◦ Three state variables: juvenile, female adult and male adult
  Productive output: milk and meat
  Grazing land: fixed amount and known productivity
  Minimum number of livestock for breeding purposes
◦ When do we add to the herd, or sell from the herd, given market conditions and resource constraints?

Page 49: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Input Data
◦ 40 year time horizon
◦ 5% discount rate
◦ Other key input assumptions:
◦ Female birth rate = 1.5 juveniles per year; 30% of juveniles remain juveniles, 30% transition to adult males, and 40% transition to adult females
◦ The herd can be fed by grazing on a fixed amount of land, or from off-farm purchased feed, with different nutrient content and ultimately different animal productivity

Input Data                           Juvenile   Adult Male   Adult Female
Animal weight                        40         300          275
Milk yield (kg/yr/animal)            0          0            50
Initial animals                      60         20           30
Birth rate per female (animal/yr)    1.5        0            0

Transition Matrix   Juvenile   Adult Male   Adult Female
Juvenile            0.3        0.3          0.4
Adult Male          0          1            0
Adult Female        0          0            1
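The herd dynamics implied by these inputs can be illustrated the same way; the annual off-take below is an arbitrary placeholder for the sale decisions the DP chooses optimally, and the sketch ignores the feed and grazing constraints.

```python
import numpy as np

# Herd transition matrix and data from the tables above.
T = np.array([[0.3, 0.3, 0.4],     # juvenile -> (juvenile, adult male, adult female)
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
birth_rate = 1.5                   # juveniles per adult female per year
milk_yield = np.array([0, 0, 50])  # kg/yr/animal

herd = np.array([60.0, 20.0, 30.0])      # initial (juvenile, adult male, adult female)
sales = np.array([10.0, 5.0, 0.0])       # illustrative annual off-take, not the optimal policy

for year in range(10):
    herd = T.T @ herd                     # age/sex transitions
    herd[0] += birth_rate * herd[2]       # births from adult females enter the juvenile class
    herd = np.maximum(herd - sales, 0.0)  # sell animals (bounded below at zero)
    milk = milk_yield @ herd
print("herd after 10 years:", np.round(herd, 1), "| milk output (kg/yr):", round(float(milk), 1))
```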

Page 50: Biosight: Quantitative Methods for Policy Analysis: Stochastic Dynamic Programming

Simulation
◦ Over a 100 year time horizon
◦ Approximates the value function at 3 Chebychev nodes
◦ The agent maximizes the present value of profits by determining optimal rates of animals sold and purchased, and milk sold
◦ The agent may purchase off-farm feed, and responds to fixed and known market demand and supply for inputs and outputs
◦ Herd age evolves endogenously according to the defined parameters