Page 1:

Information collection in a linear program

Ilya O. Ryzhov, Warren B. Powell

Operations Research and Financial Engineering, Princeton University

Princeton, NJ 08544, USA

International Conference on Stochastic Programming, August 17, 2010

Page 2:

Outline

1 Introduction

2 Mathematical model

3 The knowledge gradient algorithm
    Derivation
    Computation
    Theory: asymptotic optimality

4 Experimental results

5 Conclusions

Page 4:

Motivation: emergency response

Our goal is to find the shortest (least congested) path across a network

This is an LP where each objective coefficient represents the congestion on an edge

We can measure the local congestion on an individual edge (e.g. from the air) and change our estimate of the congestion on that edge

Page 5:

Motivation: agricultural planning

We solve an LP to maximize total crop yield subject to acreage constraints in different fields

The exact yield from planting a certain field is unknown

Before settling on a plan, we perform expensive soil tests on different fields to improve our beliefs about the yield

Page 6:

LP formulation

We consider an LP in standard form,

$$V(c) = \max_x \; c^T x \quad \text{s.t.} \quad Ax = b, \; x \ge 0,$$

where the vector c ∈ R^M is unknown

We have a Bayesian prior belief about c in which the coefficients are correlated

We can measure a coefficient (e.g. perform a soil test) and observe a result that changes our beliefs

We are given N measurements to learn the true optimal value V(c)... what should we measure?
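As a concrete illustration (not part of the original slides), here is a minimal Python sketch that solves this standard-form LP for a given belief vector c using scipy.optimize.linprog; the toy instance at the bottom is a made-up placeholder.

```python
# Minimal sketch: solve V(c) = max_x { c^T x : Ax = b, x >= 0 }.
# scipy's linprog minimizes, so we negate the objective; the default
# variable bounds (0, None) already enforce x >= 0.
import numpy as np
from scipy.optimize import linprog

def solve_lp(c, A, b):
    """Return (V(c), x(c)) for the standard-form LP above."""
    res = linprog(-np.asarray(c, dtype=float), A_eq=A, b_eq=b, method="highs")
    if not res.success:
        raise RuntimeError(f"LP solve failed: {res.message}")
    return -res.fun, res.x

# Hypothetical toy instance: max c1*x1 + c2*x2 s.t. x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
value, x_opt = solve_lp(np.array([2.0, 3.0]), A, b)   # value = 3, x = (0, 1)
```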

Page 7:

The effect of learning

Changing our estimate of a single objective coefficient can drastically change what we believe to be the optimal solution

Consider the shortest-path problem:

Page 10:

Correlated beliefs in optimal learning

By measuring one coefficient, we can obtain information about many other coefficients

In a traffic network, if edge (i, j) is congested, it is likely that edges into i and out of j are congested

Page 13:

Correlated beliefs in optimal learning

Correlations are modeled using a covariance matrix

We assume c ∼ N(c^0, Σ^0). Example:

$$\Sigma^0 = \begin{pmatrix} 12 & 6 & 3 \\ 6 & 7 & 4 \\ 3 & 4 & 15 \end{pmatrix}$$

The value Σ^0_{j,k} represents our belief about the covariance of coefficients j and k
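To make the example concrete, here is a small illustrative numpy rendering of this prior (my sketch; the zero prior mean is a hypothetical placeholder):

```python
# The 3x3 covariance from the slide; Sigma0[j, k] encodes our belief about
# Cov(c_j, c_k). The prior mean c0 below is a made-up placeholder.
import numpy as np

Sigma0 = np.array([[12.0,  6.0,  3.0],
                   [ 6.0,  7.0,  4.0],
                   [ 3.0,  4.0, 15.0]])
c0 = np.zeros(3)

assert np.all(np.linalg.eigvalsh(Sigma0) > 0)  # a valid prior covariance is positive definite
c_draw = np.random.default_rng(0).multivariate_normal(c0, Sigma0)  # one sample of c
```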

Page 14:

A quick literature review

Stochastic linear programming
    - Theoretical properties of the expected optimal value of a stochastic LP (Madansky 1960, Itami 1974)
    - Approximate algorithms for multi-stage problems (Birge 1982)

Parametric linear programming/sensitivity analysis
    - Linear programs with varying objective coefficients (Jansen et al. 1997)

Optimal learning
    - Simple underlying optimization models, e.g. ranking and selection (Bechhofer et al. 1995) and multi-armed bandits (Gittins 1989)
    - Recent work on learning with correlated beliefs (Frazier et al. 2009) and independent beliefs on graphs (Ryzhov & Powell 2010)

Our work synthesizes and builds on concepts from all of these areas.

Page 16:

Preliminaries

We assume that the feasible region is known and bounded

Let x(c) be the optimal solution, i.e. the solution of

$$V(c) = \max_x \; c^T x \quad \text{s.t.} \quad Ax = b, \; x \ge 0.$$

By strong duality, the dual LP has the same optimal value:

$$V(c) = \min_{y,s} \; b^T y \quad \text{s.t.} \quad A^T y - s = c, \; s \ge 0.$$

Let y(c) and s(c) denote the optimal dual solution

Page 17:

Learning with correlated Bayesian beliefs

At first, we believe that

$$c \sim N(c^0, \Sigma^0).$$

We measure the jth coefficient and observe

$$c^1_j \sim N(c_j, \lambda_j).$$

As a result, our beliefs change:

$$c^1 = c^0 + \frac{c^1_j - c^0_j}{\lambda_j + \Sigma^0_{jj}} \, \Sigma^0 e_j$$

$$\Sigma^1 = \Sigma^0 - \frac{\Sigma^0 e_j e_j^T \Sigma^0}{\lambda_j + \Sigma^0_{jj}}$$

Repeat the process to obtain c^2, c^3, ...

The vector e_j is given by e_j = (0, ..., 1, ..., 0)^T, where component j is equal to 1.
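A direct transcription of this update into Python might look as follows (a sketch under the slide's assumption of known measurement variance λ_j; the function name is mine):

```python
# Sketch of the correlated Bayesian update: observe y ~ N(c_j, lam_j) for
# coefficient j, shift the mean along column j of Sigma, and apply the
# rank-one covariance reduction from the slide.
import numpy as np

def update_beliefs(c, Sigma, j, y, lam_j):
    Sigma_ej = Sigma[:, j]                    # Sigma^n e_j (Sigma is symmetric)
    denom = lam_j + Sigma[j, j]
    c_new = c + (y - c[j]) / denom * Sigma_ej
    Sigma_new = Sigma - np.outer(Sigma_ej, Sigma_ej) / denom
    return c_new, Sigma_new
```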

Page 18:

Dynamic programming formulation

The optimal measurement strategy can be described using Bellman's equation:

$$V^{*,N}(c^N, \Sigma^N) = V(c^N)$$

$$V^{*,n}(c^n, \Sigma^n) = \max_j \mathbb{E}\left[ V^{*,n+1}(c^{n+1}, \Sigma^{n+1}) \,\middle|\, c^n, \Sigma^n, j^n = j \right]$$

The optimal measurement J^{*,n}(c^n, Σ^n) is the choice of j that achieves the argmax in V^{*,n}(c^n, Σ^n)

Due to the curse of dimensionality, this equation is computationally intractable

Page 20:

Definition

Originally developed for ranking and selection (Gupta & Miescke 1996, Frazier et al. 2008)

The KG decision rule is given by

$$J^{KG,n}(c^n, \Sigma^n) = \arg\max_j \mathbb{E}^n_j \left[ V(c^{n+1}) - V(c^n) \right]$$

The KG factor

$$\nu^{KG,n}_j = \mathbb{E}^n_j \left[ V(c^{n+1}) - V(c^n) \right]$$

is the expected improvement in our estimate of the optimal value of the LP that is achieved by measuring j

The future beliefs c^{n+1} are random at time n, meaning that KG computes the expected value of a stochastic LP

Page 21:

Derivation

It can be shown (Frazier et al. 2009) that, given c^n and Σ^n, and given that we measure j at time n, the conditional distribution of c^{n+1} is

$$c^{n+1} \sim c^n + \frac{\Sigma^n e_j}{\sqrt{\lambda_j + \Sigma^n_{jj}}} \cdot Z$$

where Z is a one-dimensional standard normal random variable.

Thus, the KG factor becomes

$$\nu^{KG,n}_j = \mathbb{E}\left[ V(c^n + \Delta c^n_j \cdot Z) \right] - V(c^n)$$

where $\Delta c^n_j = \Sigma^n e_j / \sqrt{\lambda_j + \Sigma^n_{jj}}$.
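Because V is just the LP value from before, the KG factor can also be estimated by brute-force Monte Carlo, which makes a useful sanity check on the exact formula derived below. This is my own sketch, reusing the hypothetical solve_lp helper from earlier:

```python
# Monte Carlo estimate of the KG factor: sample Z ~ N(0,1), form the
# predictive mean c^n + (Delta c^n_j) * Z, re-solve the LP, and average.
import numpy as np

def kg_factor_mc(c, Sigma, j, lam_j, A, b, n_samples=2000, seed=0):
    dc_j = Sigma[:, j] / np.sqrt(lam_j + Sigma[j, j])   # Delta c^n_j
    v_now, _ = solve_lp(c, A, b)
    zs = np.random.default_rng(seed).standard_normal(n_samples)
    vals = [solve_lp(c + z * dc_j, A, b)[0] for z in zs]
    return float(np.mean(vals)) - v_now
```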

Page 22:

Graphical illustration

The solution x(c^n + z∆c^n_j) is constant if z is in a certain interval

Varying z rotates the level curve of the objective function

Page 26:

Derivation (continued)

The set of values of z for which x(c^n + z∆c^n_j) is constant is known (Hadigheh & Terlaky 2006) as the invariant support set

Let −∞ = z_1 < z_2 < ... < z_I = ∞ be a partition of the real line into invariant support sets

Let x_i = x(c^n + z∆c^n_j) for z ∈ (z_i, z_{i+1})

Then,

$$\mathbb{E}\, V(c^n + \Delta c^n_j \cdot Z) = \sum_i \int_{z_i}^{z_{i+1}} \left[ (c^n + z \Delta c^n_j)^T x_i \right] \phi(z)\, dz$$

where φ is the standard normal pdf

Page 27:

Graphical illustration

The optimal solution x(c^n + z∆c^n_j) changes at the breakpoints z_i

The level curve of c^n + z_i ∆c^n_j is tangent to a face of the polyhedron

[Figure: animation frames showing the level curve rotating through the cases z > z_i, z = z_i, and z < z_i]

Page 31:

The KG formula

After some algebra, we can obtain an expression

$$\nu^{KG,n}_j = \sum_i (b_{i+1} - b_i)\, f(-|z_i|)$$

where
    - b_i = (∆c^n_j)^T x_i
    - f(z) = zΦ(z) + φ(z)
    - Φ is the standard normal cdf

This formula gives the exact value of the KG factor, provided that we can compute the breakpoints z_i of the piecewise linear function
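Given the finite breakpoints and the slope b_i on each invariant interval (one more slope than breakpoints, since z_1 = −∞ and z_I = ∞ contribute nothing), the sum above is only a few lines of code. This is my sketch, not the authors' implementation:

```python
# Exact KG factor from the closed-form sum: nu = sum_i (b_{i+1} - b_i) * f(-|z_i|),
# with f(s) = s*Phi(s) + phi(s). `z` holds the finite breakpoints and `slopes`
# the values (Delta c^n_j)^T x_i per interval, so len(slopes) == len(z) + 1.
import numpy as np
from scipy.stats import norm

def kg_factor_exact(z, slopes):
    z = np.asarray(z, dtype=float)
    b = np.asarray(slopes, dtype=float)
    s = -np.abs(z)
    f = s * norm.cdf(s) + norm.pdf(s)
    return float(np.sum((b[1:] - b[:-1]) * f))
```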

Page 32:

Computation of the breakpoints

At time n, we start with one optimal solution x(c^n) for z = 0

We determine whether z = 0 is itself a breakpoint by solving two LPs

$$z^- = \min_{y,s,z} \; z \quad \text{s.t.} \quad A^T y - s - z \Delta c^n_j = c^n, \quad x(c^n)^T s = 0, \quad s \ge 0,$$

and

$$z^+ = \max_{y,s,z} \; z \quad \text{s.t.} \quad A^T y - s - z \Delta c^n_j = c^n, \quad x(c^n)^T s = 0, \quad s \ge 0.$$

The values z^- and z^+ are the smallest and largest values of z for which x(c^n) is optimal (Roos et al. 1997)
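The two LPs share everything but the direction of optimization, so one helper can return both. Below is a sketch (the helper name z_range is mine) with decision vector (y, s, z), where y and z are free, s ≥ 0, and the condition x(c^n)^T s = 0 is appended as one extra equality row; an unbounded solve is read as z^- = −∞ (or z^+ = +∞):

```python
# Sketch: compute z- and z+ for a given optimal solution x_opt = x(c^n).
import numpy as np
from scipy.optimize import linprog

def z_range(A, c, dc_j, x_opt):
    m, M = A.shape
    A_eq = np.block([
        [A.T, -np.eye(M), -dc_j.reshape(-1, 1)],                      # A^T y - s - z*dc_j = c
        [np.zeros((1, m)), x_opt.reshape(1, -1), np.zeros((1, 1))],   # x(c^n)^T s = 0
    ])
    b_eq = np.append(c, 0.0)
    bounds = [(None, None)] * m + [(0, None)] * M + [(None, None)]
    obj = np.zeros(m + M + 1)
    obj[-1] = 1.0                                     # the objective is z itself
    lo = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    hi = linprog(-obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    z_minus = lo.fun if lo.success else -np.inf       # failure here means unbounded
    z_plus = -hi.fun if hi.success else np.inf
    return z_minus, z_plus
```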

Page 33:

Computation of the breakpoints

If either z^- or z^+ is equal to zero, then z = 0 is a breakpoint

Suppose that z^- = 0 and z^+ > 0

[Figure: animation frames showing the current solution x(c^n) and the neighbouring extreme point x_l(c^n) on the polyhedron]

Page 37:

Finding the neighbouring extreme point

The point x_l(c^n) is the optimal solution to the LP

$$V_l(c^n) = \min_x \; (\Delta c^n_j)^T x \quad \text{s.t.} \quad Ax = b, \quad (s(c^n))^T x = 0, \quad x \ge 0$$

The quantity (∆c^n_j)^T x_l(c^n) is the left derivative of the piecewise linear function at the breakpoint z = 0

The right derivative is (∆c^n_j)^T x(c^n) itself

Page 38:

Finding the next breakpoint

However, x_l(c^n) is also optimal at two breakpoints, zero and z_l(c^n)

[Figure: animation frames showing x_l(c^n) optimal at both z = 0 and z = z_l(c^n)]

Page 41:

Finding the next breakpoint

This next breakpoint is the optimal value of the LP

$$z_l(c^n) = \min_{y,s,z} \; z \quad \text{s.t.} \quad A^T y - s - z \Delta c^n_j = c^n, \quad (x_l(c^n))^T s = 0, \quad s \ge 0.$$

This LP is identical to the one we used to find z^-, but with x(c^n) replaced by x_l(c^n)

We can now find a new z^- and repeat the procedure until z^- = −∞

Page 42:

Other cases

If z^- < 0 and z^+ = 0, we can find the neighbouring extreme point x_u(c^n) by solving

$$V_u(c^n) = \max_x \; (\Delta c^n_j)^T x \quad \text{s.t.} \quad Ax = b, \quad (s(c^n))^T x = 0, \quad x \ge 0$$

The next breakpoint is the optimal value of an LP

$$z_u(c^n) = \max_{y,s,z} \; z \quad \text{s.t.} \quad A^T y - s - z \Delta c^n_j = c^n, \quad (x_u(c^n))^T s = 0, \quad s \ge 0$$

Again, the process can be repeated until z^+ = ∞

Page 43:

Other cases

z^- < 0 < z^+: zero is not a breakpoint, but both z^- and z^+ are

z^- = z^+ = 0: zero is a breakpoint, but x(c^n) is not an extreme point

[Figure: a polyhedron with x(c^n) lying on a face between the extreme points x_l(c^n) and x_u(c^n)]

Page 44:

Summary of algorithm for computing KG factors

Given a set of beliefs (c^n, Σ^n), do the following for j = 1, ..., M:

1 Let z = 0 and solve for x(c^n), y(c^n) and s(c^n).

2 Solve two LPs to obtain z^- and z^+, and decide whether z = 0 is a breakpoint.

3 Solve a sequence of LPs to obtain the entire vector z of breakpoints and the set x of invariant solutions.

4 Compute ν^{KG,n}_j using z and x.

Finally, we measure the coefficient with the largest ν^{KG,n}_j.
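Putting the pieces together, the whole measurement loop might be wired up as follows. This sketch leans on the hypothetical helpers from earlier (update_beliefs, plus a kg_factor routine standing in for steps 1-4), and measure is a user-supplied observation oracle:

```python
# End-to-end sketch of the KG measurement loop from this slide.
import numpy as np

def kg_loop(c0, Sigma0, lam, A, b, N, measure, kg_factor):
    """Run N measurements; measure(j) returns a noisy observation of c_j."""
    c, Sigma = np.array(c0, dtype=float), np.array(Sigma0, dtype=float)
    for _ in range(N):
        # Steps 1-4: compute the KG factor of every coefficient j.
        scores = [kg_factor(c, Sigma, j, lam[j], A, b) for j in range(len(c))]
        j_star = int(np.argmax(scores))   # measure the largest KG factor
        c, Sigma = update_beliefs(c, Sigma, j_star, measure(j_star), lam[j_star])
    return c, Sigma
```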

Page 45:

Asymptotic optimality property of KG

Proposition

For any measurement strategy π,

$$\mathbb{E}^\pi V(c^N) \le \mathbb{E}\, V(c).$$

Theorem

$$\lim_{N \to \infty} \mathbb{E}^{KG} V(c^N) = \mathbb{E}\, V(c).$$

Recall that our objective is to maximize E^π V(c^N)

As N → ∞, the KG method achieves the highest possible value

Page 47:

Experimental results: shortest-path problem

Ten layered graphs (22 nodes, 50 edges)

Ten larger layered graphs (38 nodes, 102 edges)

Page 49:

Conclusions

We have proposed a new class of optimal learning problems where the underlying optimization model is a linear program

We have derived a knowledge gradient method for deciding what to measure in this setting

The KG method computes the value of a single measurement exactly and is asymptotically optimal

The algorithm for finding breakpoints terminates in finite time, but is computationally expensive

Page 50:

References

Bechhofer, R., Santner, T. & Goldsman, D. (1995) Design and Analysis of Experiments for Statistical Selection, Screening and Multiple Comparisons. John Wiley and Sons, New York.

Birge, J. (1982) "The value of the stochastic solution in stochastic linear programs with fixed recourse." Mathematical Programming 24, 314–325.

Frazier, P.I., Powell, W. & Dayanik, S. (2008) "A knowledge-gradient policy for sequential information collection." SIAM J. on Control and Optimization 47:5, 2410–2439.

Frazier, P.I., Powell, W.B. & Dayanik, S. (2009) "The knowledge-gradient policy for correlated normal rewards." INFORMS J. on Computing 21:4, 599–613.

Page 51:

References

Gittins, J.C. (1989) Multi-Armed Bandit Allocation Indices. John Wiley and Sons, New York.

Gupta, S. & Miescke, K. (1996) "Bayesian look ahead one stage sampling allocation for selecting the best population." J. of Statistical Planning and Inference 54, 229–244.

Hadigheh, A. & Terlaky, T. (2006) "Sensitivity analysis in linear optimization: Invariant support set intervals." European Journal of Operational Research 169:3, 1158–1175.

Itami, H. (1974) "Expected Value of a Stochastic Linear Program and the Degree of Uncertainty of Parameters." Management Science 21:3, 291–301.

Page 52:

References

Jansen, B., de Jong, J., Roos, C. & Terlaky, T. (1997) "Sensitivity analysis in linear programming: just be careful!" European Journal of Operational Research 101:1, 15–28.

Madansky, A. (1960) "Inequalities for stochastic linear programming problems." Management Science 6:2, 197–204.

Roos, C., Terlaky, T. & Vial, J. (1997) Theory and Algorithms for Linear Optimization: An Interior Point Approach. John Wiley and Sons, Chichester, UK.

Ryzhov, I.O. & Powell, W.B. (2010) "Information collection on a graph." To appear in Operations Research.
