
A Hierarchy of Near-Optimal Policies for Multi-stage Adaptive Optimization

Dimitris Bertsimas∗, Dan A. Iancu†, Pablo A. Parrilo‡

June 15, 2009

Abstract

In this paper, we propose a new tractable framework for dealing with linear dynamical systems affected by uncertainty, applicable to multi-stage robust optimization and stochastic programming. We introduce a hierarchy of near-optimal polynomial disturbance-feedback control policies, and show how these can be computed by solving a single semidefinite programming problem. The approach yields a hierarchy parameterized by a single variable (the degree of the polynomial policies), which controls the trade-off between the optimality gap and the computational requirements. We evaluate our framework in the context of two classical inventory management applications, in which very strong numerical performance is exhibited, at relatively modest computational expense.

1 Introduction

Multistage optimization problems under uncertainty are prevalent in numerous fields of engineering, economics, and finance, and have elicited interest on both a theoretical and a practical level from diverse research communities. Among the most established methodologies for dealing with such problems are dynamic programming (DP) Bertsekas [2001], stochastic programming Birge and Louveaux [2000], robust control Zhou and Doyle [1998], Dullerud and Paganini [2005], and, more recently, robust optimization (see Kerrigan and Maciejowski [2003], Ben-Tal et al. [2005a, 2006], Bertsimas et al. [2010] and references therein).

In the current paper, we consider discrete-time, linear dynamical systems of the form

$$\mathbf{x}(k+1) = A(k)\,\mathbf{x}(k) + B(k)\,\mathbf{u}(k) + \mathbf{w}(k), \tag{1}$$

evolving over a finite planning horizon, $k = 0, \dots, T-1$. The variables $\mathbf{x}(k) \in \mathbb{R}^{n}$ represent the state, and the controls $\mathbf{u}(k) \in \mathbb{R}^{n_u}$ denote actions taken by the decision maker. $A(k)$ and $B(k)$ are matrices of appropriate dimensions, describing the evolution of the system, and the initial state, $\mathbf{x}(0)$, is assumed known. The system is affected by unknown$^1$, additive disturbances, $\mathbf{w}(k)$, which are assumed to lie in a given compact, basic semialgebraic set,

$$\mathcal{W}_k \overset{\text{def}}{=} \left\{ \mathbf{w}(k) \in \mathbb{R}^{n_w} : g_j(\mathbf{w}(k)) \ge 0,\ j \in \mathcal{J}_k \right\}, \tag{2}$$

where $g_j \in \mathbb{R}[\mathbf{w}]$ are multivariate polynomials depending on the vector of uncertainties at time $k$, $\mathbf{w}(k)$, and $\mathcal{J}_k$ is a finite index set. We note that this formulation captures many uncertainty sets of interest, such as polytopic (all $g_j$ affine), $p$-norms, ellipsoids, and intersections thereof. For now, we restrict our description to uncertainties that are additive and independent across time, but our framework can also be extended to cases where the uncertainties are multiplicative (e.g., affecting the system matrices), and also dependent across time (please refer to Section 3.3 for details).

∗ Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E40-147, Cambridge, MA 02139, USA. Email: [email protected].

† Operations Research Center, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E40-130, Cambridge, MA 02139, USA. Email: [email protected].

‡ Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 32D-726, Cambridge, MA 02139, USA. Email: [email protected].

1 We use the convention that the disturbance $\mathbf{w}(k)$ is revealed in period $k$ after the control action $\mathbf{u}(k)$ is taken, so that $\mathbf{u}(k+1)$ is the first decision allowed to depend on $\mathbf{w}(k)$.

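We assume that the dynamic evolution of the system is constrained by a set of linear inequalities,

$$\begin{cases} E_x(k)\,\mathbf{x}(k) + E_u(k)\,\mathbf{u}(k) \le \mathbf{f}(k), & k = 0, \dots, T-1, \\ E_x(T)\,\mathbf{x}(T) \le \mathbf{f}(T), & \end{cases} \tag{3}$$

where $E_x(k) \in \mathbb{R}^{r_k \times n}$, $E_u(k) \in \mathbb{R}^{r_k \times n_u}$, $\mathbf{f}(k) \in \mathbb{R}^{r_k}$ for the respective $k$, and the system incurs penalties that are piece-wise affine and convex in the states and controls,

$$h(k, \mathbf{x}(k), \mathbf{u}(k)) = \max_{i \in \mathcal{I}_k} \left[ c_0(k,i) + \mathbf{c}_x(k,i)^T \mathbf{x}(k) + \mathbf{c}_u(k,i)^T \mathbf{u}(k) \right], \tag{4}$$

where $\mathcal{I}_k$ is a finite index set, and $c_0(k,i) \in \mathbb{R}$, $\mathbf{c}_x(k,i) \in \mathbb{R}^{n}$, $\mathbf{c}_u(k,i) \in \mathbb{R}^{n_u}$ are pre-specified cost parameters. The goal is to find non-anticipatory control policies $\mathbf{u}(0), \mathbf{u}(1), \dots, \mathbf{u}(T-1)$ that minimize the cost incurred by the system in the worst-case scenario,

$$J = h(0, \mathbf{x}(0), \mathbf{u}(0)) + \max_{\mathbf{w}(0)} \Big[ h(1, \mathbf{x}(1), \mathbf{u}(1)) + \dots + \max_{\mathbf{w}(T-2)} \Big[ h(T-1, \mathbf{x}(T-1), \mathbf{u}(T-1)) + \max_{\mathbf{w}(T-1)} h(T, \mathbf{x}(T)) \Big] \dots \Big].$$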
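To make the ingredients of (1), (4) and the worst-case cost $J$ concrete, the following is a minimal sketch for a scalar system with a fixed (open-loop) control sequence. The function names, the box uncertainty set and the numerical cost parameters are illustrative assumptions, not taken from the paper. For fixed controls the total cost is convex in the disturbances, so its maximum over a box is attained at a vertex, and enumerating vertices is exact for small $T$.

```python
from itertools import product

def stage_cost(pieces, x, u):
    """Piece-wise affine convex cost as in (4), scalar case: max_i [c0 + cx*x + cu*u]."""
    return max(c0 + cx * x + cu * u for (c0, cx, cu) in pieces)

def total_cost(x0, controls, disturbances, a, b, pieces, terminal_pieces):
    """Roll x(k+1) = a*x(k) + b*u(k) + w(k) forward and add up stage and terminal costs."""
    x, cost = x0, 0.0
    for u, w in zip(controls, disturbances):
        cost += stage_cost(pieces, x, u)
        x = a * x + b * u + w
    # The terminal cost h(T, x(T)) depends on the state only.
    cost += max(c0 + cx * x for (c0, cx) in terminal_pieces)
    return cost

def worst_case_open_loop(x0, controls, w_lo, w_hi, a, b, pieces, terminal_pieces):
    """Worst-case cost over the box [w_lo, w_hi]^T for fixed controls (vertex enumeration)."""
    T = len(controls)
    return max(
        total_cost(x0, controls, ws, a, b, pieces, terminal_pieces)
        for ws in product((w_lo, w_hi), repeat=T)
    )

# Hypothetical data: cost max(x, -2x) per stage, no control cost, two periods.
pieces = [(0.0, 1.0, 0.0), (0.0, -2.0, 0.0)]
terminal_pieces = [(0.0, 1.0), (0.0, -2.0)]
worst = worst_case_open_loop(0.0, [0.0, 0.0], -1.0, 1.0, 1.0, 1.0, pieces, terminal_pieces)
print(worst)
```

The sketch only evaluates a given open-loop sequence; the policies studied in the paper let each control depend on previously observed disturbances, which is what makes the problem hard.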

With the state of the dynamical system at time $k$ given by $\mathbf{x}(k)$, one can resort to the Bellman optimality principle of DP Bertsekas [2001] to compute optimal policies, $\mathbf{u}^\star(k, \mathbf{x}(k))$, and optimal value functions, $J^\star(k, \mathbf{x}(k))$. Although DP is a powerful technique for the theoretical characterization of the optimal policies, it is plagued by the well-known curse of dimensionality, in that the complexity of the underlying recursive equations grows quickly with the size of the state-space, rendering the approach ill suited to the computation of actual policy parameters. Therefore, in practice, one would typically solve the recursions numerically (e.g., by multi-parametric programming Bemporad et al. [2000, 2002, 2003]), or resort to approximations, such as approximate DP Bertsekas and Tsitsiklis [1996], Powell [2007], stochastic approximation Asmussen and Glynn [2007], simulation based optimization (Glasserman and Tayur [1995], Marbach and Tsitsiklis [2001]), and others. Some of the approximations also come with performance guarantees in terms of the objective value in the problem, and many ongoing research efforts are aimed at characterizing the sub-optimality gaps resulting from specific classes of policies (the interested reader can refer to the books Bertsekas [2001], Bertsekas and Tsitsiklis [1996] and Powell [2007] for a thorough review).

An alternative approach, originally proposed in the stochastic programming community (see Birge and Louveaux [2000], Garstka and Wets [1974] and references therein), is to consider control policies that are parametrized directly in the sequence of observed uncertainties, and typically referred to as recourse decision rules. For the case of linear constraints on the controls, with uncertainties regarded as random variables having bounded support and known distributions, and the goal of minimizing an expected piece-wise quadratic, convex cost, the authors in Garstka and Wets [1974] show that piece-wise affine decision rules are optimal, but pessimistically conclude that computing the actual parameterization is usually an “impossible task” (for a precise quantification of that statement, see Dyer and Stougie [2006] and Nemirovski and Shapiro [2005]).

Disturbance-feedback parameterizations have recently been used by researchers in robust control and robust optimization (see Lofberg [2003], Kerrigan and Maciejowski [2003, 2004], Goulart and Kerrigan [2005], Ben-Tal et al. [2004, 2005a, 2006], Bertsimas and Brown [2007], Skaf and Boyd [2008a,b], and references therein). In most of the papers, the authors restrict attention to the case of affine policies, and show how reformulations can be done that allow the computation of the policy parameters by solving convex optimization problems, which vary from linear and quadratic (e.g. Ben-Tal et al. [2005a], Kerrigan and Maciejowski [2004]), to second-order conic and semidefinite programs (e.g. Lofberg [2003], Ben-Tal et al. [2005a], Bertsimas and Brown [2007], Skaf and Boyd [2008a]). Some of the first steps towards analyzing the properties of disturbance-affine policies were taken in Kerrigan and Maciejowski [2004], Ben-Tal et al. [2005a], where it was shown that, under suitable conditions, the resulting parametrization has certain desirable system theoretic properties (stability and robust invariance), and that the class of affine disturbance feedback policies is equivalent to the class of affine state feedback policies with memory of prior states, thus subsuming the well-known open-loop and pre-stabilizing control policies.

With the exception of a few classical cases, such as linear quadratic Gaussian or linear exponential quadratic Gaussian$^2$, characterizing the performance of affine policies in terms of objective function value is typically very hard. The only result in a constrained, robust setting that the authors are aware of is our recent paper Bertsimas et al. [2010], in which it is shown that, in the case of one-dimensional systems, with independent state and control constraints $\left( L_k \le u_k \le U_k,\ L^x_k \le x_k \le U^x_k \right)$, linear control costs and any convex state costs, disturbance-affine policies are, in fact, optimal, and can be found efficiently. As a downside, the same paper presents simple examples of multi-dimensional systems where affine policies are sub-optimal.

In fact, in most applications, the restriction to the affine case is done for purposes of tractability, and almost invariably results in loss of performance (see the remarks at the end of Nemirovski and Shapiro [2005]), with the optimality gap being sometimes very large. In an attempt to address this problem, recent work has considered parameterizations that are affine in a new set of variables, derived by lifting the original uncertainties into a higher dimensional space. For example, the authors in Chen and Zhang [2009], Chen et al. [2008], Sim and Goh [2009] suggest using so-called segregated linear decision rules, which are affine parameterizations in the positive and negative parts of the original uncertainties. Such policies provide more flexibility, and their computation (for two-stage decision problems in a robust setting) requires roughly the same complexity as that needed for a set of affine policies in the original variables. Another example following similar ideas is Chatterjee et al. [2009], where the authors consider arbitrary functional forms of the disturbances, and show how, for specific types of $p$-norm constraints on the controls, the problems of finding the coefficients of the parameterizations can be relaxed into convex optimization problems. A similar approach is taken in Skaf and Boyd [2008b], where the authors also consider arbitrary functional forms for the policies, and show how, for a problem with convex state-control constraints and convex costs, such policies can be found by convex optimization, combined with Monte-Carlo sampling (to enforce constraint satisfaction). Chapter 14 of the recent book Ben-Tal et al. [2009] also contains a thorough review of several other classes of such adjustable rules, and a discussion of cases when sophisticated rules can actually improve over the affine ones.

The main drawback of some of the above approaches is that the right choice of functional form for the decision rules is rarely obvious, and there is no systematic way to influence the trade-off between the performance of the resulting policies and the computational complexity required to obtain them, rendering the frameworks ill-suited for general multi-stage dynamical systems, involving complicated constraints on both states and controls.

2 These refer to problems that are unconstrained, with Gaussian disturbances, and the goal of minimizing expected costs that are quadratic or exponential of a quadratic, respectively. For these, the optimal policies are affine in the states; see Bertsekas [2001] and references therein.

The goal of our current paper is to introduce a new framework for modeling and (approximately) solving such multi-stage dynamical problems. While we restrict attention mainly to the robust, mini-max objective setting, our ideas can be extended to deal with stochastic problems, in which the uncertainties are random variables with known, bounded support and distribution that is either fully or partially known$^3$ (see Section 3.3 for a discussion). Our main contributions are summarized below:

• We introduce a natural extension of the aforementioned affine decision rules, by considering control policies that depend polynomially on the observed disturbances. For a fixed polynomial degree $d$, we develop a convex reformulation of the constraints and objective of the problem, using Sums-Of-Squares (SOS) techniques. In the resulting framework, polynomial policies of degree $d$ can be computed by solving a single semidefinite programming problem (SDP), which, for a fixed precision, can be done in polynomial time Vandenberghe and Boyd [1996]. Our approach is advantageous from a modelling perspective, since it places little burden on the end user (the only choice is the polynomial degree $d$), while at the same time providing a lever for directly controlling the trade-off between performance and computation (higher $d$ translates into policies with better objectives, obtained at the cost of solving larger SDPs).

• To test our polynomial framework, we consider two classical problems arising in inventory management (single echelon with cumulative order constraints, and serial supply chain with lead-times), and compare the performance of affine, quadratic and cubic control policies. The results obtained are very encouraging: in particular, for all problem instances considered, quadratic policies considerably improve over affine policies (typically by a factor of 2 or 3), while cubic policies essentially close the optimality gap (the relative gap in all simulations is less than 1%, with a median gap of less than 0.01%).

The paper is organized as follows. Section 2 presents the mathematical formulation of the problem, briefly discusses relevant solution techniques in the literature, and introduces our framework. Section 3, which is the main body of the paper, first shows how to formulate and solve the problem of searching for the optimal polynomial policy of fixed degree, and then discusses the specific case of polytopic uncertainties. Section 3.3 also elaborates on immediate extensions of the framework to more general multi-stage decision problems. Section 5 translates two classical problems from inventory management into our framework, and Section 6 presents our computational results, exhibiting the strong performance of polynomial policies. Section 7 concludes the paper and suggests directions of future research.

1.1 Notation

Throughout the rest of the paper, we denote scalar quantities by lowercase, non-bold face symbols (e.g. $x \in \mathbb{R}$, $k \in \mathbb{N}$), vector quantities by lowercase, boldface symbols (e.g. $\mathbf{x} \in \mathbb{R}^{n}$, $n > 1$), and matrices by uppercase symbols (e.g. $A \in \mathbb{R}^{n \times n}$, $n > 1$). Also, in order to avoid transposing vectors several times, we use the comma operator ( , ) to denote vertical vector concatenation, e.g. with $\mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^{n}$ and $\mathbf{y} = (y_1, \dots, y_m) \in \mathbb{R}^{m}$, we write $(\mathbf{x}, \mathbf{y}) \overset{\text{def}}{=} (x_1, \dots, x_n, y_1, \dots, y_m) \in \mathbb{R}^{m+n}$.

3 In the latter case, the cost would correspond to the worst-case distribution consistent with the partial information.

We refer to quantities specific to time-period $k$ by either including the index in parenthesis, e.g. $\mathbf{x}(k)$, $J^\star(k, \mathbf{x}(k))$, or by using an appropriate subscript, e.g. $\mathbf{x}_k$, $J^\star_k(\mathbf{x}_k)$. When referring to the $j$-th component of a vector at time $k$, we always use the parenthesis notation for time, and subscript for $j$, e.g., $x_j(k)$.

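Since we seek policies parameterized directly in the uncertainties, we introduce $\mathbf{w}_{[k]} \overset{\text{def}}{=} (\mathbf{w}_0, \dots, \mathbf{w}_{k-1})$ to denote the history of known disturbances at the beginning of period $k$, and $\mathcal{W}_{[k]} \overset{\text{def}}{=} \mathcal{W}_0 \times \dots \times \mathcal{W}_{k-1}$ to denote the corresponding uncertainty set. By convention, $\mathbf{w}_{[0]} \equiv \{\emptyset\}$.

With $\mathbf{x} = (x_1, \dots, x_n)$, we denote by $\mathbb{R}[\mathbf{x}]$ the ring of polynomials in variables $x_1, \dots, x_n$, and by $\mathcal{P}_d[\mathbf{x}]$ the $\mathbb{R}$-vector space of polynomials in $x_1, \dots, x_n$, with degree at most $d$. We also let

$$B_d(\mathbf{x}) \overset{\text{def}}{=} \left( 1, x_1, x_2, \dots, x_n, x_1^2, x_1 x_2, \dots, x_1 x_n, x_2^2, x_2 x_3, \dots, x_n^d \right) \tag{5}$$

be the canonical basis of $\mathcal{P}_d[\mathbf{x}]$, and $s(d) \overset{\text{def}}{=} \binom{n+d}{d}$ be its dimension. Any polynomial $p \in \mathcal{P}_d[\mathbf{x}]$ is written as a finite linear combination of monomials,

$$p(\mathbf{x}) = p(x_1, \dots, x_n) = \sum_{\alpha \in \mathbb{N}^n} p_\alpha \mathbf{x}^\alpha = \mathbf{p}^T B_d(\mathbf{x}), \tag{6}$$

where $\mathbf{x}^\alpha \overset{\text{def}}{=} x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$, and the sum is taken over all $n$-tuples $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{N}^n$ satisfying $\sum_{i=1}^{n} \alpha_i \le d$. In the expression above, $\mathbf{p} = (p_\alpha) \in \mathbb{R}^{s(d)}$ is the vector of coefficients of $p(\mathbf{x})$ in the basis (5). In situations where the coefficients $p_\alpha$ of a polynomial are decision variables, in order to avoid confusion, we refer to $\mathbf{x}$ as the indeterminate (similarly, we refer to $p(\mathbf{x})$ as a polynomial in indeterminate $\mathbf{x}$). By convention, we take $p(\emptyset) \equiv p_{0,0,\dots,0}$, i.e., a polynomial without indeterminate is simply a constant.

For a polynomial $p \in \mathbb{R}[\mathbf{x}]$, we use $\deg(p)$ to denote the largest degree of a monomial present in $p$.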

2 Problem Description

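Using the notation mentioned in the introduction, our goal is to find non-anticipatory control policies $\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_{T-1}$ that minimize the cost incurred by the system in the worst-case scenario. In other words, we seek to solve the problem:

$$(P) \quad \min_{\mathbf{u}_0} \Big[ h_0(\mathbf{x}_0, \mathbf{u}_0) + \max_{\mathbf{w}_0} \min_{\mathbf{u}_1} \Big[ h_1(\mathbf{x}_1, \mathbf{u}_1) + \dots + \min_{\mathbf{u}_{T-1}} \Big[ h_{T-1}(\mathbf{x}_{T-1}, \mathbf{u}_{T-1}) + \max_{\mathbf{w}_{T-1}} h_T(\mathbf{x}_T) \Big] \dots \Big] \Big] \tag{7a}$$

$$\text{s.t.} \quad \mathbf{x}_{k+1} = A_k \mathbf{x}_k + B_k \mathbf{u}_k + \mathbf{w}_k, \quad \forall\, k \in \{0, \dots, T-1\}, \tag{7b}$$

$$E_x(k)\,\mathbf{x}_k + E_u(k)\,\mathbf{u}_k \le \mathbf{f}_k, \quad \forall\, k \in \{0, \dots, T-1\}, \tag{7c}$$

$$E_x(T)\,\mathbf{x}_T \le \mathbf{f}_T. \tag{7d}$$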

As already mentioned, the control actions $\mathbf{u}_k$ do not have to be decided entirely at time period $k = 0$, i.e., $(P)$ does not have to be solved as an open-loop problem. Rather, $\mathbf{u}_k$ is allowed to depend on the information set available$^4$ at time $k$, resulting in control policies $\mathbf{u}_k : \mathcal{F}_k \to \mathbb{R}^{n_u}$, where $\mathcal{F}_k$ consists of past states, controls and disturbances, $\mathcal{F}_k = \{\mathbf{x}_t\}_{0 \le t \le k} \cup \{\mathbf{u}_t\}_{0 \le t < k} \cup \{\mathbf{w}_t\}_{0 \le t < k}$.

4 More formally, the decision process $\mathbf{u}_k$ is adapted to the filtration generated by past values of the disturbances and controls.

While $\mathcal{F}_k$ is a large (expanding with $k$) set, the state $\mathbf{x}_k$ represents sufficient information for taking optimal decisions at time $k$. Thus, with control policies depending on the states, one can resort to the Bellman optimality principle of Dynamic Programming (DP) Bertsekas [2001], to compute optimal policies, $\mathbf{u}^\star_k(\mathbf{x}_k)$, and optimal value functions, $J^\star_k(\mathbf{x}_k)$. As suggested in the introduction, the approach is limited due to the curse of dimensionality, so that, in practice, one typically resorts to approximate schemes for computing suboptimal, state-dependent policies Bertsekas and Tsitsiklis [1996], Powell [2007], Marbach and Tsitsiklis [2001].

In this paper, we take a slightly different approach, and consider instead policies parametrized directly in the observed uncertainties,

$$\mathbf{u}_k : \mathcal{W}_0 \times \mathcal{W}_1 \times \dots \times \mathcal{W}_{k-1} \to \mathbb{R}^{n_u}. \tag{8}$$

In this context, the decisions that must be taken are the parameters defining the specific functional form sought for $\mathbf{u}_k$. One such example of disturbance-feedback policies, often considered in the literature, is the affine case, i.e., $\mathbf{u}_k = L_k \cdot (1, \mathbf{w}_0, \dots, \mathbf{w}_{k-1})$, where the decision variables are the coefficients of the matrices $L_k \in \mathbb{R}^{n_u \times (1 + k \cdot n_w)}$, $k = 0, \dots, T-1$.

In this framework, with (7b) used to express the dependency of states $\mathbf{x}_k$ on past uncertainties, the state-control constraints (7c), (7d) at time $k$ can be written as functions of the parametric decisions $L_0, \dots, L_k$ and the uncertainties $\mathbf{w}_0, \dots, \mathbf{w}_{k-1}$, and one typically requires these constraints to be obeyed robustly, i.e., for any possible realization of the uncertainties.

As already mentioned, this approach has been explored before in the literature, in both the stochastic and robust frameworks (Birge and Louveaux [2000], Garstka and Wets [1974], Lofberg [2003], Kerrigan and Maciejowski [2003, 2004], Goulart and Kerrigan [2005], Ben-Tal et al. [2004, 2005a, 2006], Bertsimas and Brown [2007], Skaf and Boyd [2008a]). The typical restriction to the sub-class of affine policies, done for purposes of tractability, almost invariably results in loss of performance Nemirovski and Shapiro [2005], with the gap being sometimes very large.

To illustrate this effect, we introduce the following simple example$^5$, motivated by a similar case in Chen and Zhang [2009]:

Example 1. Consider a two-stage problem, where $\mathbf{w} \in \mathcal{W}$ is the uncertainty, with $\mathcal{W} = \left\{ \mathbf{w} \in \mathbb{R}^{N} : \|\mathbf{w}\|_2 \le 1 \right\}$, $x \in \mathbb{R}$ is a first-stage decision (taken before $\mathbf{w}$ is revealed), and $\mathbf{y} \in \mathbb{R}^{N}$ is a second-stage decision (allowed to depend on $\mathbf{w}$). We would like to solve the following optimization:

$$\begin{aligned} \underset{x,\ \mathbf{y}(\mathbf{w})}{\text{minimize}} \quad & x \\ \text{such that} \quad & x \ge \sum_{i=1}^{N} y_i, \quad \forall\, \mathbf{w} \in \mathcal{W}, \\ & y_i \ge w_i^2, \quad \forall\, \mathbf{w} \in \mathcal{W}. \end{aligned} \tag{9}$$

It can be easily shown (see Lemma 1 in Section 8.1) that the optimal objective in Problem (9) is 1, corresponding to $y_i(\mathbf{w}) = w_i^2$, while the best objective achievable under affine policies $\mathbf{y}(\mathbf{w})$ is $N$, for $y_i(\mathbf{w}) = 1$, $\forall\, i$. In particular, this simple example shows that the optimality gap resulting from the use of affine policies can be made arbitrarily large (as the problem size increases).
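The gap in Example 1 can be checked numerically for the two policies named in the text. The sketch below samples points of the unit ball (half of them on the boundary, where the quadratic policy is at its worst) and reports the sampled worst-case value of $\sum_i y_i(\mathbf{w})$ for each policy. The sampling scheme, sample size and function names are illustrative assumptions. Sampling only gives a lower estimate of each worst case and does not prove that $N$ is the best value achievable by affine policies; that claim rests on the lemma cited in the text.

```python
import math
import random

def sample_unit_ball(n, rng):
    """Draw a point of the unit 2-norm ball; about half of the draws lie on the boundary."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = 1.0 if rng.random() < 0.5 else rng.random() ** (1.0 / n)
    return [radius * v / norm for v in g]

def worst_case_sum(policy, n, samples, rng):
    """Sampled estimate of the maximum over w of sum_i y_i(w), checking y_i >= w_i^2."""
    worst = 0.0
    for _ in range(samples):
        w = sample_unit_ball(n, rng)
        y = policy(w)
        if any(yi < wi * wi - 1e-12 for yi, wi in zip(y, w)):
            raise ValueError("policy violates y_i >= w_i^2")
        worst = max(worst, sum(y))
    return worst

def quadratic_policy(w):
    return [wi * wi for wi in w]      # y_i(w) = w_i^2

def constant_policy(w):
    return [1.0 for _ in w]           # y_i(w) = 1, the affine policy from the text

rng = random.Random(0)
N = 5
quad_value = worst_case_sum(quadratic_policy, N, 2000, rng)
const_value = worst_case_sum(constant_policy, N, 2000, rng)
print(quad_value, const_value)
```

With $N = 5$ the quadratic policy's sampled worst case stays at 1 (up to rounding), while the constant policy always costs $N$, in line with the values stated in the example.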

5 We note that this example can be easily cast as an instance of Problem $(P)$. We opt for the simpler notation to keep the ideas clear.

Motivated by these facts, in the current paper we explore the performance of a more general class of disturbance-feedback control laws, namely policies that are polynomial in past-observed uncertainties. More precisely, for a specified degree $d$, and with $\mathbf{w}_{[k]}$ denoting the vector of all disturbances in $\mathcal{F}_k$,

$$\mathbf{w}_{[k]} \overset{\text{def}}{=} (\mathbf{w}_0, \mathbf{w}_1, \dots, \mathbf{w}_{k-1}) \in \mathbb{R}^{k \cdot n_w}, \tag{10}$$

we consider a control law at time $k$ in which every component is a polynomial of degree at most $d$ in variables $\mathbf{w}_{[k]}$, i.e., $u_j(k, \mathbf{w}_{[k]}) \in \mathcal{P}_d[\mathbf{w}_{[k]}]$, and thus:

$$\mathbf{u}_k(\mathbf{w}_{[k]}) = L_k\, B_d(\mathbf{w}_{[k]}), \tag{11}$$

where $B_d(\mathbf{w}_{[k]})$ is the canonical basis of $\mathcal{P}_d[\mathbf{w}_{[k]}]$, given by (5). The new decision variables become the matrices of coefficients $L_k \in \mathbb{R}^{n_u \times s(d)}$, $k = 0, \dots, T-1$, where $s(d) = \binom{k \cdot n_w + d}{d}$ is the dimension of $\mathcal{P}_d[\mathbf{w}_{[k]}]$. Therefore, with a fixed degree $d$, the number of decision variables remains polynomially bounded in the size of the problem input, $T, n_u, n_w$.
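The growth in the number of policy coefficients can be tabulated directly from the dimension formula. The sketch below counts the entries of $L_0, \dots, L_{T-1}$ for given $T$, $n_u$, $n_w$ and $d$; the function names and the sample sizes are illustrative assumptions.

```python
from math import comb

def basis_dimension(k, nw, d):
    """s(d) = C(k*nw + d, d): number of monomials of degree <= d in the k*nw entries of w_[k]."""
    return comb(k * nw + d, d)

def num_policy_coefficients(T, nu, nw, d):
    """Total number of entries in L_0, ..., L_{T-1}, where L_k has size nu x s(d)."""
    return sum(nu * basis_dimension(k, nw, d) for k in range(T))

# Hypothetical problem sizes: one control, one disturbance per period.
for d in (1, 2, 3):
    print(d, num_policy_coefficients(T=10, nu=1, nw=1, d=d))
```

For fixed $d$ the count is a polynomial of degree $d + 1$ in $T$, so raising the degree enlarges the resulting problem noticeably, which is the trade-off the degree parameter is meant to control.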

This class of policies constitutes a natural extension of the disturbance-affine control laws, i.e., the case $d = 1$. Furthermore, with sufficiently large degree, one can expect the performance of the polynomial policies to become near-optimal (recall that, by the Stone-Weierstrass Theorem Rudin [1976], any continuous function on a compact set can be approximated as closely as desired by polynomial functions). The main drawback of the approach is that searching over arbitrary polynomial policies typically results in non-convex optimization problems. To address this issue, in the next section, we develop a tractable, convex reformulation of the problem based on Sum-Of-Squares (SOS) techniques Parrilo [2000, 2003], Lasserre [2001].

3 Polynomial Policies and Convex Reformulations Using Sums-Of-Squares

Under polynomial policies of the form (11), one can use the dynamical equation (7b) to express every component of the state at time $k$, $x_j(k)$, as a polynomial in indeterminate $\mathbf{w}_{[k]}$, whose coefficients are linear combinations of the entries in $\{L_t\}_{0 \le t \le k-1}$. As such, with $\mathbf{e}_x(k,j)^T$ and $\mathbf{e}_u(k,j)^T$ denoting the $j$-th row of $E_x(k)$ and $E_u(k)$, respectively, a typical state-control constraint (7c) can be written

$$\mathbf{e}_x(k,j)^T \mathbf{x}_k + \mathbf{e}_u(k,j)^T \mathbf{u}_k \le f_j(k) \iff p^{\text{con}}_{j,k}(\mathbf{w}_{[k]}) \overset{\text{def}}{=} f_j(k) - \mathbf{e}_x(k,j)^T \mathbf{x}_k - \mathbf{e}_u(k,j)^T \mathbf{u}_k \ge 0, \quad \forall\, \mathbf{w}_{[k]} \in \mathcal{W}_{[k]}.$$

In particular, feasibility of the state-control constraints at time $k$ is equivalent to ensuring that the coefficients $\{L_t\}_{0 \le t \le k-1}$ are such that the polynomials $p^{\text{con}}_{j,k}(\mathbf{w}_{[k]})$, $j = 1, \dots, r_k$, are non-negative on the domain $\mathcal{W}_{[k]}$.

Similarly, the expression (4) for the stage cost at time $k$ can be written as

$$h_k(\mathbf{x}_k, \mathbf{u}_k) = \max_{i \in \mathcal{I}_k} p^{\text{cost}}_i(\mathbf{w}_{[k]}), \qquad p^{\text{cost}}_i(\mathbf{w}_{[k]}) \overset{\text{def}}{=} c_0(k,i) + \mathbf{c}_x(k,i)^T \mathbf{x}_k(\mathbf{w}_{[k]}) + \mathbf{c}_u(k,i)^T \mathbf{u}_k(\mathbf{w}_{[k]}),$$

i.e., the cost $h_k$ is a piece-wise polynomial function of the past-observed disturbances $\mathbf{w}_{[k]}$. Therefore, under polynomial control policies, we can rewrite the original Problem $(P)$ as the following polynomial optimization problem:

$$(P_{\text{POP}}) \quad \min_{L_0} \Big[ \max_{i \in \mathcal{I}_0} p^{\text{cost}}_i(\mathbf{w}_{[0]}) + \max_{\mathbf{w}_0} \min_{L_1} \Big[ \max_{i \in \mathcal{I}_1} p^{\text{cost}}_i(\mathbf{w}_{[1]}) + \dots + \max_{\mathbf{w}_{T-2}} \min_{L_{T-1}} \Big[ \max_{i \in \mathcal{I}_{T-1}} p^{\text{cost}}_i(\mathbf{w}_{[T-1]}) + \max_{\mathbf{w}_{T-1}} \max_{i \in \mathcal{I}_T} p^{\text{cost}}_i(\mathbf{w}_{[T]}) \Big] \dots \Big] \Big] \tag{12a}$$

$$\text{s.t.} \quad p^{\text{con}}_{j,k}(\mathbf{w}_{[k]}) \ge 0, \quad \forall\, k = 0, \dots, T,\ \forall\, j = 1, \dots, r_k,\ \forall\, \mathbf{w}_{[k]} \in \mathcal{W}_{[k]}. \tag{12b}$$

In this formulation, the decision variables are the coefficients $\{L_t\}_{0 \le t \le T-1}$, and (12b) summarizes all the state-control constraints. We emphasize that the expression of the polynomial controls (11) and the dynamical system equation (7b) should not be interpreted as real constraints in the problem (rather, they are only used to derive the dependency of the polynomials $p^{\text{cost}}_i(\mathbf{w}_{[k]})$ and $p^{\text{con}}_{j,k}(\mathbf{w}_{[k]})$ on $\{L_t\}_{0 \le t \le k-1}$ and $\mathbf{w}_{[k]}$).

3.1 Reformulating the Constraints

As mentioned in the previous section, under polynomial control policies, a typical state-control constraint (12b) in program $(P_{\text{POP}})$ can now be written as:

$$p(\boldsymbol{\xi}) \ge 0, \quad \forall\, \boldsymbol{\xi} \in \mathcal{W}_{[k]}, \tag{13}$$

where $\boldsymbol{\xi} \equiv \mathbf{w}_{[k]} \in \mathbb{R}^{k \cdot n_w}$ is the history of disturbances, and $p(\boldsymbol{\xi})$ is a polynomial in variables $\xi_1, \xi_2, \dots, \xi_{k \cdot n_w}$ with degree at most $d$,

$$p(\boldsymbol{\xi}) = \mathbf{p}^T B_d(\boldsymbol{\xi}),$$

whose coefficients $p_i$ are affine combinations of the decision variables $L_t$, $0 \le t \le k-1$. It is easy to see that constraint (13) can be rewritten equivalently as

$$p(\boldsymbol{\xi}) \ge 0, \quad \forall\, \boldsymbol{\xi} \in \mathcal{W}_{[k]} \overset{\text{def}}{=} \left\{ \boldsymbol{\xi} \in \mathbb{R}^{k \cdot n_w} : g_j(\boldsymbol{\xi}) \ge 0,\ j = 1, \dots, m \right\}, \tag{14}$$

where $\{g_j\}_{1 \le j \le m}$ are all the polynomial functions describing the compact basic semi-algebraic set $\mathcal{W}_{[k]} \equiv \mathcal{W}_0 \times \dots \times \mathcal{W}_{k-1}$, immediately derived from (2). In this form, (14) falls in the general class of constraints that require testing polynomial non-negativity on a basic closed, semi-algebraic set, i.e., a set given by a finite number of polynomial equalities and inequalities. To this end, note that a sufficient condition for (14) to hold is:

$$p = \sigma_0 + \sum_{j=1}^{m} \sigma_j\, g_j, \tag{15}$$

where $\sigma_j \in \mathbb{R}[\boldsymbol{\xi}]$, $j = 0, \dots, m$, are polynomials in the variables $\boldsymbol{\xi}$ which are furthermore sums of squares (SOS). This condition translates testing the non-negativity of $p$ on the set $\mathcal{W}_{[k]}$ into a system of linear equality constraints on the coefficients of $p$ and $\sigma_j$, $j = 0, \dots, m$, and a test whether the $\sigma_j$ are SOS. This is valuable because testing whether a polynomial of fixed degree is SOS is equivalent to solving a semidefinite programming problem (SDP) (refer to Parrilo [2000, 2003], Lasserre [2001] for details), which, for a fixed precision, can be done in polynomial time, by interior point methods Vandenberghe and Boyd [1996].
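A one-dimensional instance shows what a certificate of the form (15) looks like. Take the hypothetical data $p(\xi) = 1 + \xi$ and $g(\xi) = 1 - \xi^2$, so that the set is $[-1, 1]$; then $p = \sigma_0 + \sigma_1 g$ with $\sigma_0 = \tfrac{1}{2}(1 + \xi)^2$ and $\sigma_1 = \tfrac{1}{2}$. The sketch below verifies this identity coefficient by coefficient, and checks that $\sigma_0 = \mathbf{z}^T Q_0 \mathbf{z}$ with $\mathbf{z} = (1, \xi)$ for a positive semidefinite Gram matrix $Q_0$, which is the form in which the SOS requirement enters an SDP. In an actual SDP the entries of $Q_0$ and $\sigma_1$ would be decision variables; here they are fixed by hand, and all names are illustrative assumptions.

```python
def poly_mul(p, q):
    """Multiply univariate polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0) for i in range(n)]

def gram_to_poly(Q):
    """Coefficients of z^T Q z for z = (1, xi) and a 2x2 matrix Q."""
    return [Q[0][0], Q[0][1] + Q[1][0], Q[1][1]]

def is_psd_2x2(Q, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are non-negative."""
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    return Q[0][0] >= -tol and Q[1][1] >= -tol and det >= -tol

p = [1.0, 1.0, 0.0]               # p(xi) = 1 + xi, padded to degree 2
g = [1.0, 0.0, -1.0]              # g(xi) = 1 - xi^2, so the set is [-1, 1]
Q0 = [[0.5, 0.5], [0.5, 0.5]]     # Gram matrix of sigma_0 = 0.5 * (1 + xi)^2
sigma1 = [0.5]                    # constant multiplier, trivially SOS

rhs = poly_add(gram_to_poly(Q0), poly_mul(sigma1, g))
certificate_ok = is_psd_2x2(Q0) and all(abs(a - b) < 1e-12 for a, b in zip(p, rhs))
print(certificate_ok)
```

Matching coefficients gives the linear equality constraints mentioned in the text, and the positive semidefiniteness of the Gram matrix is the semidefinite constraint.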

At first sight, condition (15) might seem overly restrictive. However, it is motivated by recent powerful results in real algebraic geometry (Putinar [2003], Jacobi and Prestel [2001]), which, under mild conditions$^6$ on the functions $g_j$, state that any polynomial that is strictly positive on a compact semi-algebraic set $\mathcal{W}_{[k]}$ must admit a representation of the form (15), where the degrees of the $\sigma_j$ polynomials are not a priori bounded. In our framework, in order to obtain a tractable formulation, we furthermore restrict these degrees so that the total degree of every product $\sigma_j g_j$ is at most $\max\big(d, \max_j \deg(g_j)\big)$, the maximum between the degree of the control policies (11) under consideration and the largest degree of the polynomials $g_j$ giving the uncertainty sets. While this requirement is more restrictive, and could, in principle, result in conservative parameter choices, it avoids ad-hoc modeling decisions and has the advantage of keeping a single parameter that is adjustable to the user (the degree $d$), which directly controls the trade-off between the size of the resulting SDP formulation and the quality of the overall solution. Furthermore, in our numerical simulations, we find that this choice performs very well in practice, and never results in infeasible conditions.

3.2 Reformulating the Objective

Recall from our discussion in the beginning of Section 3 that, under polynomial control policies, a typical stage cost becomes a piecewise polynomial function of past uncertainties, i.e., a maximum of several polynomials. A natural way to bring such a cost into the framework presented before is to introduce, for every stage $k = 0, \dots, T$, a polynomial function of past uncertainties, and require it to be an upper-bound on the true (piecewise polynomial) cost.

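More precisely, and to fix ideas, consider the stage cost at time $k$, which, from our earlier discussion, can be written as

$$h_k(x_k, u_k) = \max_{i \in \mathcal{I}_k} p_i^{\text{cost}}(w_{[k]}),$$

$$p_i^{\text{cost}}(w_{[k]}) = c_0(k,i) + c_x(k,i)^T x_k(w_{[k]}) + c_u(k,i)^T u_k(w_{[k]}), \quad \forall\, i \in \mathcal{I}_k.$$

In this context, we introduce a modified stage cost $\tilde{h}_k \in \mathcal{P}_d[w_{[k]}]$, which we constrain to satisfy

$$\tilde{h}_k(w_{[k]}) \ge p_i^{\text{cost}}(w_{[k]}), \quad \forall\, w_{[k]} \in \mathcal{W}_{[k]}, \ \forall\, i \in \mathcal{I}_k,$$

and we replace the overall cost for Problem $(P_{\text{POP}})$ with the sum of the modified stage costs. In other words, instead of minimizing the objective (7a), we seek to solve:

\begin{align}
\min \quad & J \notag \\
\text{s.t.} \quad & J \ge \sum_{k=0}^{T} \tilde{h}_k(w_{[k]}), \quad \forall\, w_{[T]} \in \mathcal{W}_{[T]}, \tag{16a} \\
& \tilde{h}_k(w_{[k]}) \ge p_i^{\text{cost}}(w_{[k]}), \quad \forall\, w_{[k]} \in \mathcal{W}_{[k]}, \ \forall\, i \in \mathcal{I}_k. \tag{16b}
\end{align}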

The advantage of this approach is that, now, constraints (16a) and (16b) are of exactly the same nature as (13), and thus fit into the SOS framework developed earlier. As a result, we can use the same semidefinite programming approach to enforce them, while preserving the tractability of the formulation and the trade-off between performance and computation delivered by the degree $d$. The main drawback is that the cost $J$ may, in general, over-bound the optimal cost of Problem $(P)$, for several reasons:

1. We are replacing the (true) piece-wise polynomial cost $h_k$ with an upper bound given by the polynomial cost $\tilde{h}_k$. Therefore, the optimal value $J$ of problem (16a) may, in general, be larger than the true cost corresponding to the respective polynomial policies, i.e., the cost of problem $(P_{\text{POP}})$.

2. All the constraints in the model, namely (16a), (16b), and the state-control constraints (12b), are enforced using SOS polynomials with fixed degree (see the discussion in Section 3.1), and this is sufficient, but not necessary.

⁶ These are readily satisfied when the $g_j$ are affine, or can be satisfied by simply appending a redundant constraint that bounds the 2-norm of the vector $\xi$.

However, despite these multiple layers of approximation, our numerical experiments, presented in Section 6, suggest that most of the above considerations are second-order effects when compared with the fact that polynomial policies of the form (11) are themselves, in general, suboptimal. In fact, our results suggest that with a modest polynomial degree (3, and sometimes even 2), one can close most of the optimality gap between the SDP formulation and the optimal value of Problem $(P)$.
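To make the first source of over-bounding concrete, the sketch below takes a toy piece-wise affine cost $\max\{w, -w\}$ on $[-1, 1]$ and a hand-picked quadratic upper bound $\tfrac12 (w^2 + 1)$. Both are assumptions made for illustration only and are not taken from the paper. The bound is valid everywhere and tight at the endpoints, yet it overestimates the true cost in the interior.

```python
from fractions import Fraction

def true_cost(w):
    """Toy piece-wise affine stage cost: max of the two pieces w and -w."""
    return max(w, -w)

def quad_bound(w):
    """Hand-picked polynomial upper bound (w^2 + 1) / 2 >= |w|."""
    return (w * w + 1) / 2

# Exact rational grid on the (assumed) uncertainty set [-1, 1]
grid = [Fraction(i, 100) - 1 for i in range(201)]

min_slack = min(quad_bound(w) - true_cost(w) for w in grid)   # bound never violated
worst_true = max(true_cost(w) for w in grid)                  # worst case of the true cost
worst_bound = max(quad_bound(w) for w in grid)                # worst case of the bound
gap_at_zero = quad_bound(Fraction(0)) - true_cost(Fraction(0))
```

In this toy case the two worst-case values coincide, while the pointwise gap at $w = 0$ equals $1/2$; when several such bounds are summed across stages, as in (16a), pointwise slack of this kind is one way in which $J$ can exceed the true worst-case cost.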

To summarize, our framework can be presented as the sequence of steps below:

Framework for computing polynomial policies of degree $d$

1: Consider polynomial control policies in the disturbances, $u_k(w_{[k]}) = L_k\, \mathcal{B}_d(w_{[k]})$.

2: Express all the states $x_k$ according to equation (7b). Each component of a typical state $x_k$ becomes a polynomial in indeterminate $w_{[k]}$, with coefficients given by linear combinations of $\{L_t\}_{0 \le t \le k-1}$.

3: Replace a typical stage cost $h_k(x_k, u_k) = \max_{i \in \mathcal{I}_k} p_i^{\text{cost}}(w_{[k]})$ with a modified stage cost $\tilde{h}_k \in \mathcal{P}_d[w_{[k]}]$, constrained to satisfy $\tilde{h}_k(w_{[k]}) \ge p_i^{\text{cost}}(w_{[k]})$, $\forall\, w_{[k]} \in \mathcal{W}_{[k]}$, $\forall\, i \in \mathcal{I}_k$.

4: Replace the overall cost with the sum of the modified stage costs.

5: Replace a typical constraint $p(w_{[k]}) \ge 0$, $\forall\, w_{[k]} \in \{\xi : g_j(\xi) \ge 0,\ j = 1, \dots, m\}$ (for either state-control or costs) with the requirements:

$$
\begin{aligned}
& p = \sigma_0 + \sum_{j=1}^{m} \sigma_j g_j && \text{(linear constraints on coefficients)} \\
& \sigma_j \ \text{SOS}, \quad j = 0, \dots, m && \text{($m + 1$ SDP constraints)} \\
& \deg(\sigma_j g_j) \le \max\bigl(d, \max_j \deg(g_j)\bigr), \\
& \deg(\sigma_0) = \max_j \bigl(\deg(\sigma_j g_j)\bigr).
\end{aligned}
$$

6: Solve the resulting SDP to obtain the coefficients $L_k$ of the policies.
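Step 1 of the framework can be sketched in a few lines. The code below enumerates the monomials of total degree at most $d$ in the entries of $w_{[k]}$ and evaluates a scalar policy as a linear combination of them. The function names and the ordering of the monomials are assumptions made here for illustration; the paper does not prescribe a particular ordering of the basis $\mathcal{B}_d$.

```python
from itertools import product
from math import comb, prod

def monomial_exponents(n_vars, degree):
    """All exponent tuples alpha with sum(alpha) <= degree in n_vars variables."""
    return [a for a in product(range(degree + 1), repeat=n_vars)
            if sum(a) <= degree]

def basis(w, degree):
    """Evaluate every monomial w^alpha of total degree <= degree at the point w."""
    return [prod(x ** e for x, e in zip(w, a))
            for a in monomial_exponents(len(w), degree)]

def policy(L_row, w, degree):
    """One component of u_k(w_[k]) = L_k B_d(w_[k]) for a given coefficient row."""
    return sum(c * m for c, m in zip(L_row, basis(w, degree)))

# Two past disturbances, degree 2: the basis has C(2 + 2, 2) = 6 monomials.
w_hist = (2, 3)
values = basis(w_hist, 2)                       # monomials evaluated at (2, 3)
u = policy([1, 0, 0, 1, 0, 0], w_hist, 2)       # hypothetical coefficients: 1 + w_0
```

With an empty history (`w_hist = ()`), the basis reduces to the constant monomial, which matches the fact that the first-stage decision cannot depend on any disturbance.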

The size of the overall formulation is controlled by the following parameters:

• There are $O\Bigl(T^2 \cdot \max_k (r_k + |\mathcal{I}_k|) \cdot (\max_k |\mathcal{J}_k|) \cdot \binom{T \cdot n_w + \hat{d}}{\hat{d}}\Bigr)$ linear constraints.

• There are $O\bigl(T^2 \cdot \max_k (r_k + |\mathcal{I}_k|) \cdot (\max_k |\mathcal{J}_k|)\bigr)$ SDP constraints, each of size at most $\binom{T \cdot n_w + \lceil \hat{d}/2 \rceil}{\lceil \hat{d}/2 \rceil}$.

• There are $O\Bigl(T \cdot \bigl[n_u + T \cdot \max_k (r_k + |\mathcal{I}_k|) \cdot (\max_k |\mathcal{J}_k|)\bigr] \binom{T \cdot n_w + \hat{d}}{\hat{d}}\Bigr)$ variables.

Above, $\hat{d} \overset{\text{def}}{=} \max\bigl(d, \max_j \deg(g_j)\bigr)$, i.e., the larger of $d$ and the degree of any polynomial $g_j$ defining the uncertainty sets. Since, for all practical purposes, most uncertainty sets considered in the literature are polyhedral or quadratic, the main parameter that controls the complexity is $d$ (for $d \ge 2$).
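The binomial factors in these bounds are easy to evaluate numerically. The helper below is a minimal sketch (the function name and the sample parameters are assumptions, not values reported in the paper) that returns $\hat{d}$, the number of coefficients of a polynomial of degree $\hat{d}$ in $T \cdot n_w$ variables, and the largest side length of an SDP block.

```python
from math import comb

def formulation_sizes(T, n_w, d, deg_g_max):
    """Return (d_hat, coefficients per polynomial, max SDP block side length)."""
    d_hat = max(d, deg_g_max)              # d_hat = max(d, max_j deg(g_j))
    n_vars = T * n_w                       # length of the full history w_[T]
    coeffs = comb(n_vars + d_hat, d_hat)   # monomials of degree <= d_hat
    half = (d_hat + 1) // 2                # ceil(d_hat / 2)
    block = comb(n_vars + half, half)      # Gram-matrix side for an SOS multiplier
    return d_hat, coeffs, block

quadratic = formulation_sizes(T=10, n_w=1, d=2, deg_g_max=1)
cubic = formulation_sizes(T=10, n_w=1, d=3, deg_g_max=2)
```

For these sample parameters the block side grows from 11 to 66 when moving from degree 2 to degree 3, which gives a rough sense of why the SDP constraints dominate the computation.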

As the main computational bottleneck comes from the SDP constraints, we note that their size and number could be substantially reduced by requiring the control policies to only depend on a partial history of the uncertainties, e.g., by considering policies $u_k$ defined on $\mathcal{W}_{k-q} \times \mathcal{W}_{k-q+1} \times \dots \times \mathcal{W}_{k-1}$, for some fixed $q > 0$, and by restricting $x_k$ in a similar fashion. In this case, there would be $O\bigl(T \cdot q \cdot \max_k (r_k + |\mathcal{I}_k|) \cdot (\max_k |\mathcal{J}_k|)\bigr)$ SDP constraints, each of size at most $\binom{q \cdot n_w + \lceil \hat{d}/2 \rceil}{\lceil \hat{d}/2 \rceil}$, and only $O\bigl(\sum_k |\mathcal{J}_k|\bigr)$ SDP constraints of size $\binom{T \cdot n_w + \lceil \hat{d}/2 \rceil}{\lceil \hat{d}/2 \rceil}$.
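As a purely illustrative calculation (the values $T = 10$, $n_w = 1$, $\hat{d} = 3$ and $q = 3$ are assumptions, not settings used in the experiments), the size of a typical SDP block with full and with truncated history compares as follows, using $\lceil \hat{d}/2 \rceil = 2$:

```latex
\binom{T \cdot n_w + 2}{2} = \binom{12}{2} = 66,
\qquad
\binom{q \cdot n_w + 2}{2} = \binom{5}{2} = 10 .
```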

3.3 Extensions

For completeness, we conclude our discussion by briefly mentioning several modelling extensions that can be readily captured in our framework:

1. Although we only consider uncertainties that are “independent” across time, i.e., the history $w_{[k]}$ always belongs to the Cartesian product $\mathcal{W}_0 \times \dots \times \mathcal{W}_{k-1}$, our approach could be immediately extended to situations in which the uncertainty sets characterize partial sequences. As an example, instead of $\mathcal{W}_k$, we could specify a semi-algebraic description for the history $\mathcal{W}_{[k]}$,

$$(w_0, w_1, \dots, w_{k-1}) \in \mathcal{W}_{[k]} = \bigl\{ \xi \in \mathbb{R}^{k \times n_w} : g_j(\xi) \ge 0, \ \forall\, j \in \mathcal{J}_k \bigr\},$$

which could be particularly useful in situations where the uncertainties are generated by processes that are dependent across time. The only modification would be to use the new specification for the set $\mathcal{W}_{[k]}$ in the typical state-control constraints (13) and the cost reformulation constraints (16a), (16b).

2. While we restrict the exposition to uncertainties that only affect the system dynamics additively, i.e., by means of equation (1), the framework can be extended to situations where the system and constraint matrices, $A(k)$, $B(k)$, $E_x(k)$, $E_u(k)$, $f(k)$, or the cost parameters, $c_x(k,i)$ or $c_u(k,i)$, are also affected by uncertainty. These situations are of utmost practical interest, both in the inventory examples that we consider in the current paper and in other realistic dynamical systems. As an example, suppose that the matrix $A(k)$ is affinely dependent on uncertainties $\zeta_k \in \mathcal{Z}_k \subset \mathbb{R}^{n_\zeta}$,

$$A(k) = A_0(k) + \sum_{i=1}^{n_\zeta} \zeta_i(k)\, A_i(k),$$

where $A_i(k) \in \mathbb{R}^{n \times n}$, $\forall\, i \in \{0, \dots, n_\zeta\}$, are deterministic matrices, and $\mathcal{Z}_k$ are closed, basic semi-algebraic sets. Then, provided that the uncertainties $w_k$ and $\zeta_k$ are both observable in every period⁷, our framework can be immediately extended to decision policies that depend on the histories of both sources of uncertainty, i.e., $u_k(w_0, \dots, w_{k-1}, \zeta_0, \dots, \zeta_{k-1})$.

⁷ When only the states $x_k$ are observable, one might not be able to simultaneously discriminate and measure both uncertainties.


3. Note that, instead of considering uncertainties as lying in given sets, and adopting a min-max (worst-case) objective, we could accommodate the following modelling assumptions:

(a) The uncertainties are random variables, with bounded support given by the set $\mathcal{W}_0 \times \mathcal{W}_1 \times \dots \times \mathcal{W}_{T-1}$, and known probability distribution function $F$. The goal is to find $u_0, \dots, u_{T-1}$ so as to obey the state-control constraints (3) almost surely, and to minimize the expected costs,

$$\min_{u_0} \Bigl[ h_0(x_0, u_0) + \mathbb{E}_{w_0 \sim F} \min_{u_1} \Bigl[ h_1(x_1, u_1) + \dots + \min_{u_{T-1}} \bigl[ h_{T-1}(x_{T-1}, u_{T-1}) + \mathbb{E}_{w_{T-1} \sim F}\, h_T(x_T) \bigr] \dots \Bigr] \Bigr]. \tag{17}$$

In this case, since our framework already enforces almost sure (robust) constraint satisfaction, the only potential modifications would be in the reformulation of the objective. Since the distribution of the uncertainties is assumed known, and the support is bounded, the moments exist and can be computed up to any fixed degree $d$. Therefore, we could preserve the reformulation of state-control constraints and stage-costs in our framework (i.e., Steps 2 and 4), but then proceed to minimize the expected sum of the polynomial costs $\tilde{h}_k$ (note that the expected value of a polynomial function of uncertainties can be immediately obtained as a linear function of the moments).

(b) The uncertainties are random variables, with the same bounded support as above, but unknown distribution function $F$, belonging to a given set of distributions, $\mathcal{F}$. The goal is to find control policies obeying the constraints almost surely, and minimizing the expected costs corresponding to the worst-case distribution $F$,

$$\min_{u_0} \Bigl[ h_0(x_0, u_0) + \sup_{F \in \mathcal{F}} \mathbb{E}_{w_0} \min_{u_1} \Bigl[ h_1(x_1, u_1) + \dots + \min_{u_{T-1}} \bigl[ h_{T-1}(x_{T-1}, u_{T-1}) + \sup_{F \in \mathcal{F}} \mathbb{E}_{w_{T-1}}\, h_T(x_T) \bigr] \dots \Bigr] \Bigr]. \tag{18}$$

In this case, if partial information (such as the moments of the distribution up to degree $d$) is available, then the framework in (a) could be applied. Otherwise, if the only information available about $F$ were the support, then our framework could be applied without modification, but the solution obtained would exactly correspond to the min-max approach, and hence be quite conservative.

While these extensions are certainly worthy of attention, we do not pursue them here, and restrict our discussion in the remainder of the paper to the original worst-case formulation.
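The remark in extension (a), that the expectation of a polynomial cost is linear in the moments of the uncertainty, can be illustrated with a scalar example. The sketch below assumes a uniform distribution on $[0, 1]$ and a hypothetical quadratic cost $1 + 2w + 3w^2$; neither is taken from the paper.

```python
from fractions import Fraction

def uniform_moment(a, b, i):
    """i-th raw moment of the uniform distribution on [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return (b ** (i + 1) - a ** (i + 1)) / ((i + 1) * (b - a))

def expected_value(coeffs, moments):
    """E[sum_i c_i w^i] = sum_i c_i E[w^i]: linear in the polynomial coefficients."""
    return sum(c * m for c, m in zip(coeffs, moments))

coeffs = [Fraction(1), Fraction(2), Fraction(3)]        # hypothetical cost 1 + 2w + 3w^2
moments = [uniform_moment(0, 1, i) for i in range(3)]   # E[w^0], E[w^1], E[w^2]
value = expected_value(coeffs, moments)
```

Because the moments are fixed numbers, `expected_value` is a linear function of the coefficients, so an expected-cost objective of this kind would remain linear in the decision variables of the SDP.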

4 Other Methodologies for Computing Decision Rules or Exact Values

Our goal in the current section is to discuss the relation between our polynomial hierarchy and several other established methodologies in the literature⁸ for computing affine or quadratic decision rules. More precisely, for the case of ∩-ellipsoidal uncertainty sets, we show that our framework delivers policies of degree 1 or 2 with performance at least as good as that obtained by applying the methods in Ben-Tal et al. [2009]. In the second part of the section, we discuss the particular case of polytopic uncertainty sets, where exact values for Problem $(P)$ can be found (which are very useful for benchmarking purposes).

⁸ We are grateful to one of the anonymous referees for pointing out reference Ben-Tal et al. [2009], which was not at our disposal at the time of conducting the research.

4.1 Affine and Quadratic Policies for ∩-Ellipsoidal Uncertainty Sets

Let us consider the specific case when the uncertainty sets $\mathcal{W}_k$ are given by the intersection of finitely many convex quadratic forms, and have nonempty interior (this is one of the most general classes of uncertainty sets treated in the robust optimization literature, see, e.g., Ben-Tal et al. [2009]).

We first focus attention on affine disturbance-feedback policies, i.e., $u_k(w_{[k]}) = L_k\, \mathcal{B}_1(w_{[k]})$, and perform the same substitution of a piece-wise affine stage cost with an affine cost that over-bounds it⁹. Finding the optimal affine policies then requires solving the following instance of Problem $(P_{\text{POP}})$:

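\begin{align}
(P_{\text{AFF}}) \qquad \min_{L_k,\, z_k,\, z_{k,0},\, J} \quad & J \tag{19a} \\
& J \ge \sum_{k=0}^{T} \bigl( z_k^T w_{[k]} + z_{k,0} \bigr), \tag{19b} \\
& z_k^T \mathcal{B}_1(w_{[k]}) \ge c_0(k,i) + c_x(k,i)^T x_k(w_{[k]}) + c_u(k,i)^T u_k(w_{[k]}), \tag{19c} \\
& \qquad \forall\, w_{[k]} \in \mathcal{W}_{[k]}, \ \forall\, i \in \mathcal{I}_k, \ \forall\, k \in \{0, \dots, T-1\}, \notag \\
& z_T^T \mathcal{B}_1(w_{[T]}) \ge c_0(T,i) + c_x(T,i)^T x_T(w_{[T]}), \tag{19d} \\
& \qquad \forall\, w_{[T]} \in \mathcal{W}_{[T]}, \ \forall\, i \in \mathcal{I}_T, \notag \\
& \bigl( x_{k+1}(w_{[k+1]}) = A_k x_k(w_{[k]}) + B_k u_k(w_{[k]}) + w(k), \bigr) \tag{19e} \\
& \qquad \forall\, k \in \{0, \dots, T-1\}, \notag \\
& f_k \ge E_x(k)\, x_k(w_{[k]}) + E_u(k)\, u_k(w_{[k]}), \tag{19f} \\
& \qquad \forall\, w_{[k]} \in \mathcal{W}_{[k]}, \ \forall\, k \in \{0, \dots, T-1\}, \notag \\
& f_T \ge E_x(T)\, x_T(w_{[T]}), \tag{19g} \\
& \qquad \forall\, w_{[T]} \in \mathcal{W}_{[T]}. \notag
\end{align}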

In this formulation, the decision variables are $\{L_k\}_{0 \le k \le T-1}$, $\{z_k\}_{0 \le k \le T}$ and $J$, and equation (19e) should be interpreted as giving the dependency of $x_k$ on $w_{[k]}$ and the decision variables, which can then be used in the constraints (19c), (19d), (19f), and (19g). Note that, in the above optimization problem, all the constraints are bi-affine functions of the uncertainties and the decision variables, and thus, since the uncertainty sets $\mathcal{W}_{[k]}$ have tractable conic representations, the techniques in Ben-Tal et al. [2009] can be used to compute the optimal decisions in $(P_{\text{AFF}})$.

Letting $J^\star_{\text{AFF}}$ denote the optimal value in $(P_{\text{AFF}})$, and with $J^\star_{d=r}$ representing the optimal value obtained from our polynomial hierarchy (with SOS constraints) for degree $d = r$, we have the following result.

Theorem 1. If the uncertainty sets $\mathcal{W}_k$ are given by the intersection of finitely many convex quadratic forms, and have nonempty interior, then the objective functions obtained from the polynomial hierarchy satisfy the following relation

$$J^\star_{\text{AFF}} \ge J^\star_{d=1} \ge J^\star_{d=2} \ge \dots$$

Proof. See Section 8.3 of the Appendix.

⁹ This is the same approach as that taken in Ben-Tal et al. [2009]; when the stage costs $h_k$ are already affine in $x_k$, $u_k$, the step is obviously not necessary.

The above result implies that the performance of our polynomial hierarchy can never be worse than that of the best affine policies.

For the same case of $\mathcal{W}_k$ given by the intersection of convex quadratic forms, a popular technique introduced by Ben-Tal and Nemirovski in the robust optimization literature, and based on using the approximate S-Lemma, could be used for computing quadratic decision rules. More precisely, the resulting problem $(P_{\text{QUAD}})$ can be obtained from $(P_{\text{AFF}})$ by using $u_k(w_{[k]}) = L_k \cdot \mathcal{B}_2(w_{[k]})$, and by substituting $z_k^T \mathcal{B}_2(w_{[k]})$ and $z_T^T \mathcal{B}_2(w_{[T]})$ into (19c) and (19d), respectively. Since all the constraints become quadratic polynomials in indeterminates $w_{[k]}$, one can use the Approximate S-Lemma to enforce the resulting constraints (see Chapter 14 in Ben-Tal et al. [2009] for details). If we let $J^\star_{\text{QUAD}}$ denote the optimal value resulting from this method, a proof paralleling that of Theorem 1 can be used to show that $J^\star_{\text{QUAD}} \ge J^\star_{d=2}$, i.e., the performance of the polynomial hierarchy for $d \ge 2$ cannot be worse than that delivered by the S-Lemma method.

In view of these results, one can think of the polynomial framework as a generalization of two classical methods in the literature, with the caveat that (for degree $d \ge 3$), the resulting SOS problems that need to be solved can be more computationally challenging.

4.2 Determining the Optimal Value for Polytopic Uncertainties

Here, we briefly discuss a specific class of Problems $(P)$, for which the exact optimal value can be computed by solving a (large) mathematical program. This is particularly useful for benchmarking purposes, since it allows a precise assessment of the polynomial framework’s performance (note that the approach presented in Section 3 is applicable to the general problem, described in the introduction).

Consider the particular case of polytopic uncertainty sets, i.e., when all the polynomial functions $g_j$ in (2) are actually affine. It can be shown (see Theorem 2 in Bemporad et al. [2003]) that piece-wise affine state-feedback policies¹⁰ $u_k(x_k)$ are optimal for the resulting Problem $(P)$, and that the sequence of uncertainties that achieves the min-max value is an extreme point of the uncertainty set, that is, $w_{[T]} \in \text{ext}(\mathcal{W}_0) \times \dots \times \text{ext}(\mathcal{W}_{T-1})$. As an immediate corollary of this result, the optimal value for Problem $(P)$, as well as the optimal decision at time $k = 0$ for a fixed initial state $x_0$, $u_0^\star(x_0)$, can be computed by solving the following optimization problem (see Ben-Tal et al. [2005a], Bemporad et al. [2002, 2003] for a proof):

\begin{align}
(P)_{\text{ext}} \qquad \min_{u_k(w_{[k]}),\, z_k(w_{[k]}),\, J} \quad & J \tag{20a} \\
& J \ge \sum_{k=0}^{T} z_k(w_{[k]}), \tag{20b} \\
& z_k(w_{[k]}) \ge h_k\bigl(x_k(w_{[k]}), u_k(w_{[k]})\bigr), \quad k = 0, \dots, T-1, \tag{20c} \\
& z_T(w_{[T]}) \ge h_T\bigl(x_T(w_{[T]})\bigr), \tag{20d} \\
& x_{k+1}(w_{[k+1]}) = A_k x_k(w_{[k]}) + B_k u_k(w_{[k]}) + w(k), \quad k = 0, \dots, T-1, \tag{20e} \\
& f_k \ge E_x(k)\, x_k(w_{[k]}) + E_u(k)\, u_k(w_{[k]}), \quad k = 0, \dots, T-1, \tag{20f} \\
& f_T \ge E_x(T)\, x_T(w_{[T]}). \tag{20g}
\end{align}

¹⁰ One could also immediately extend the result of Garstka and Wets [1974] to argue that disturbance-feedback policies $u_k(w_{[k]})$ are also optimal.

In this formulation, non-anticipatory control values $u_k(w_{[k]})$ and corresponding states $x_k(w_{[k]})$ are computed for every vertex of the disturbance set, i.e., for every $w_{[k]} \in \text{ext}(\mathcal{W}_0) \times \dots \times \text{ext}(\mathcal{W}_{k-1})$, $k = 0, \dots, T-1$. The variables $z_k(w_{[k]})$ are used to model the stage cost at time $k$, in scenario $w_{[k]}$. Note that constraints (20c), (20d) can be immediately rewritten in linear form, since the functions $h_k(x, u)$, $h_T(x)$ are piece-wise affine and convex in their arguments.

We emphasize that the formulation does not seek to compute an actual policy $u_k^\star(x_k)$, but rather the values that this policy would take (and the associated states and costs), when the uncertainty realizations are restricted to extreme points of the uncertainty set. As such, the variables $u_k(w_{[k]})$, $x_k(w_{[k]})$ and $z_k(w_{[k]})$ must also be forced to satisfy a non-anticipativity constraint¹¹, which is implicitly taken into account when only allowing them to depend on the portion of the extreme sequence available at time $k$, i.e., $w_{[k]}$. Due to this coupling constraint, Problem $(P)_{\text{ext}}$ results in a Linear Program whose size is exponential in the horizon $T$, with the number of variables and the number of constraints both proportional to the number of extreme sequences in the uncertainty set, $O\bigl(\prod_{k=0}^{T-1} |\text{ext}(\mathcal{W}_k)|\bigr)$. Therefore, solving $(P)_{\text{ext}}$ is relevant only for small horizons, but is very useful for benchmarking purposes, since it provides the optimal value of the original problem.
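The growth of the scenario set, and the way non-anticipativity is encoded by indexing variables with prefixes of extreme sequences, can be sketched as follows. The box sets $[8, 12]$ and the horizon $T = 3$ are hypothetical values chosen for illustration, and the helper names are not from the paper.

```python
from itertools import product

def extreme_sequences(vertex_sets):
    """All sequences (w_0, ..., w_{T-1}) with each w_k a vertex of W_k."""
    return list(product(*vertex_sets))

def history_counts(vertex_sets):
    """counts[k] = number of distinct extreme histories w_[k], for k = 0..T."""
    counts = [1]  # the empty history at k = 0
    for verts in vertex_sets:
        counts.append(counts[-1] * len(verts))
    return counts

# Hypothetical box uncertainty: T = 3 periods, W_k = [8, 12] for every k.
vertex_sets = [(8, 12)] * 3
sequences = extreme_sequences(vertex_sets)
counts = history_counts(vertex_sets)

# Non-anticipativity: decisions are indexed by the prefix w_[k] only, so two
# sequences sharing their first k entries share the same variable u_k.
prefixes = [{seq[:k] for seq in sequences} for k in range(len(vertex_sets) + 1)]
```

Each element of `prefixes[k]` would carry one copy of $u_k$, $x_k$ and $z_k$ in the linear program, so the number of variables doubles with every additional period in this example.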

We conclude this section by examining a particular example when the uncertainty sets take an even simpler form, and polynomial policies (11) are provably optimal. More precisely, we consider the case of scalar uncertainties ($n_w = 1$), and

$$w(k) \in \mathcal{W}(k) \overset{\text{def}}{=} [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}, \quad \forall\, k = 0, \dots, T-1, \tag{21}$$

known in the literature as box uncertainty Ben-Tal and Nemirovski [2002], Ben-Tal et al. [2004]. Under this model, any partial uncertain sequence $w_{[k]} \overset{\text{def}}{=} (w_0, \dots, w_{k-1})$ will be a $k$-dimensional vector, lying inside the hypercube $\mathcal{W}_{[k]} \overset{\text{def}}{=} \mathcal{W}_0 \times \dots \times \mathcal{W}_{k-1} \subset \mathbb{R}^k$.

Introducing the subclass of multi-affine policies¹² of degree $d$, given by

$$u_j(k, w_{[k]}) = \sum_{\alpha \in \{0,1\}^k} \ell_\alpha\, (w_{[k]})^\alpha, \quad \text{where} \ \sum_{i=1}^{k} \alpha_i \le d, \tag{22}$$

one can show (see Theorem 2 in the Appendix) that multi-affine policies of degree $T - 1$ are, in fact, optimal for Problem $(P)$. While this theoretical result is of minor practical importance (due to the large degree needed for the policies, which translates into prohibitive computation), it provides motivation for restricting attention to polynomials of smaller degree, as a midway solution that preserves tractability, while delivering high quality objective values.

¹¹ In our current notation, non-anticipativity is equivalent to requiring that, for any two sequences $(w_0, \dots, w_{T-1})$ and $(\hat{w}_0, \dots, \hat{w}_{T-1})$ satisfying $w_t = \hat{w}_t$, $\forall\, t \in \{0, \dots, k-1\}$, we have $u_t(w_{[t]}) = u_t(\hat{w}_{[t]})$, $\forall\, t \in \{0, \dots, k\}$.

¹² Note that these are simply polynomial policies of the form (11), involving only square-free monomials, i.e., every monomial, $w_{[k]}^\alpha \overset{\text{def}}{=} \prod_{i=0}^{k-1} w_i^{\alpha_i}$, satisfies the condition $\alpha_i \in \{0, 1\}$.
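As a small worked instance of (22), take $k = 2$ and $d = 2$ (values chosen here only for illustration). The multi-affine policy then has four terms, whereas a general polynomial of degree 2 in $(w_0, w_1)$ would also contain the two squares $w_0^2$ and $w_1^2$:

```latex
u_j(2, w_{[2]}) = \ell_{(0,0)} + \ell_{(1,0)}\, w_0 + \ell_{(0,1)}\, w_1 + \ell_{(1,1)}\, w_0 w_1,
\qquad
\#\{\alpha \in \{0,1\}^k : \textstyle\sum_i \alpha_i \le d\} = \sum_{j=0}^{d} \binom{k}{j} = 1 + 2 + 1 = 4 .
```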

For completeness, we remark that, for the case of box-uncertainty, the authors in Ben-Tal et al. [2009] show one can seek separable polynomial policies of the form

$$u_j(k, w_{[k]}) = \sum_{i=1}^{k} p_i(w_i), \quad \forall\, j \in \{1, \dots, n_u\}, \ \forall\, k \in \{0, \dots, T-1\},$$

where $p_i \in \mathcal{P}_d[x]$ are univariate polynomials in indeterminate $x$. The advantage of this approach is that the reformulation of a typical state-control constraint would be exact (refer to Lemma 14.3.4 in Ben-Tal et al. [2009]). The main pitfall, however, is that for the case of box-uncertainty, such a rule would never improve over purely affine rules, i.e., where all the polynomials $p_i$ have degree 1 (refer to Lemma 14.3.6 in Ben-Tal et al. [2009]). However, as we will see in our numerical results (to be presented in Section 6), polynomial policies that are not separable, i.e., are of the general form (11), can and do improve over the affine case.

5 Examples from Inventory Management

To test the performance of our proposed policies, we consider two problems arising in inventory management.

5.1 Single Echelon with Cumulative Order Constraints

This first example was originally discussed in a robust framework by Ben-Tal et al. [2005b], in the context of a more general model for the problem of negotiating flexible contracts between a retailer and a supplier in the presence of uncertain orders from customers. We describe a simplified version of the problem, which is sufficient to illustrate the benefit of our approach, and refer the interested reader to Ben-Tal et al. [2005b] for more details.

The setting is the following: consider a single-product, single-echelon, multi-period supply chain, in which inventories are managed periodically over a planning horizon of $T$ periods. The unknown demands $w_k$ from customers arrive at the (unique) echelon, henceforth referred to as the retailer, and are satisfied from the on-hand inventory, denoted by $x_k$ at the beginning of period $k$. The retailer can replenish the inventory by placing orders $u_k$, at the beginning of each period $k$, for a cost of $c_k$ per unit of product. These orders are immediately available, i.e., there is no lead-time in the system, but there are capacities on the order size in every period, $L_k \le u_k \le U_k$, as well as on the cumulative orders placed in consecutive periods, $\hat{L}_k \le \sum_{t=0}^{k} u_t \le \hat{U}_k$. After the demand $w_k$ is realized, the retailer incurs holding costs $H_{k+1} \cdot \max\{0, x_k + u_k - w_k\}$ for all the amounts of supply stored on her premises, as well as penalties $B_{k+1} \cdot \max\{w_k - x_k - u_k, 0\}$ for any demand that is backlogged.

In the spirit of robust optimization, we assume that the only information available about the demand at time $k$ is that it resides within an interval centered around a nominal (mean) demand $d_k$, which results in the uncertainty set $\mathcal{W}_k = \{ w_k \in \mathbb{R} : |w_k - d_k| \le \rho \cdot d_k \}$, where $\rho \in [0, 1]$ can be interpreted as an uncertainty level.

With the objective function to be minimized as the cost resulting in the worst-case scenario, we immediately obtain an instance of our original Problem $(P)$, i.e., a linear system with $n = 2$ states and $n_u = 1$ control, where $x_1(k)$ represents the on-hand inventory at the beginning of time $k$, and $x_2(k)$ denotes the total amount of orders placed in prior times, $x_2(k) = \sum_{t=0}^{k-1} u(t)$. The dynamics are specified by

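$$
\begin{aligned}
x_1(k+1) &= x_1(k) + u(k) - w(k), \\
x_2(k+1) &= x_2(k) + u(k),
\end{aligned}
$$

with the constraints

$$
\begin{aligned}
L_k &\le u(k) \le U_k, \\
\hat{L}_k &\le x_2(k) + u(k) \le \hat{U}_k,
\end{aligned}
$$

and the costs

$$
\begin{aligned}
h_k(x_k, u_k) &= \max\bigl\{ c_k u_k + [H_k, 0]^T x_k, \ c_k u_k + [-B_k, 0]^T x_k \bigr\}, \\
h_T(x_T) &= \max\bigl\{ [H_T, 0]^T x_T, \ [-B_T, 0]^T x_T \bigr\}.
\end{aligned}
$$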

We remark that the cumulative order constraints, $\hat{L}_k \le \sum_{t=0}^{k} u_t \le \hat{U}_k$, are needed here, since otherwise, the resulting (one-dimensional) system would fit the theoretical results from Bertsimas et al. [2010], which would imply that polynomial policies of the form (11) and polynomial stage costs of the form (16b) are already optimal for degree $d = 1$ (affine). Therefore, testing for higher order polynomial policies would not add any benefit.
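The model above can be exercised with a short simulation. The sketch below evaluates a fixed (open-loop) order sequence, which is an assumption made here to keep the example small and is not one of the policies studied in the paper, against every extreme demand path of the box uncertainty set. All parameter values and function names are hypothetical.

```python
from itertools import product

def feasible(orders, L, U, L_cum, U_cum):
    """Check L_k <= u_k <= U_k and L_cum_k <= sum_{t<=k} u_t <= U_cum_k."""
    total = 0
    for k, u in enumerate(orders):
        total += u
        if not (L[k] <= u <= U[k] and L_cum[k] <= total <= U_cum[k]):
            return False
    return True

def total_cost(x0, orders, demands, c, H, B):
    """Sum of the stage costs h_k plus the terminal cost h_T on one demand path."""
    x, cost = x0, 0
    for k, (u, w) in enumerate(zip(orders, demands)):
        cost += c[k] * u + max(H[k] * x, -B[k] * x)   # h_k(x_k, u_k)
        x = x + u - w                                 # x1(k+1) = x1(k) + u(k) - w(k)
    T = len(orders)
    return cost + max(H[T] * x, -B[T] * x)            # h_T(x_T)

def worst_case_cost(x0, orders, d, rho, c, H, B):
    """Maximize over extreme demand paths; the cost is convex in the demands."""
    boxes = [(dk * (1 - rho), dk * (1 + rho)) for dk in d]
    return max(total_cost(x0, orders, w, c, H, B) for w in product(*boxes))

# Hypothetical instance with T = 2 periods
orders = [10, 10]
ok = feasible(orders, L=[0, 0], U=[12, 12], L_cum=[0, 15], U_cum=[12, 20])
worst = worst_case_cost(0, orders, d=[10, 10], rho=0.5,
                        c=[1, 1], H=[1, 1, 1], B=[2, 2, 2])
```

For this instance the worst case is attained when both demands are at their upper bound of 15, so that backlog penalties accumulate in every period.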

5.2 Serial Supply Chain

As a second problem, we consider a serial supply chain, in which there are $J$ echelons, numbered $1, \dots, J$, managed over a planning horizon of $T$ periods by a centralized decision maker. The $j$-th echelon can hold inventory on its premises, for a per-unit cost of $H_j(k)$ in time period $k$. In every period, echelon 1 faces the unknown, external demands $w(k)$, which it must satisfy from the on-hand inventory. Unmet demands can be backlogged, incurring a particular per-unit cost, $B_1(k)$. The $j$-th echelon can replenish its on-hand inventory by placing orders with the immediate echelon in the upstream, $j + 1$, for a per-unit cost of $c_j(k)$. For simplicity, we assume the orders are received with zero lead-time, and are only constrained to be non-negative, and we assume that the last echelon, $J$, can replenish inventory from a supplier with infinite capacity.

Following a standard requirement in inventory theory Zipkin [2000], we maintain that, under centralized control, orders placed by echelon $j$ at the beginning of period $k$ cannot be backlogged at echelon $j + 1$, and thus must always be sufficiently small to be satisfiable from on-hand inventory at the beginning¹³ of period $k$ at echelon $j + 1$. As such, instead of referring to orders placed by echelon $j$ to the upstream echelon $j + 1$, we will refer to physical shipments from $j + 1$ to $j$, in every period.

This problem can be immediately translated into the linear systems framework mentioned before, by introducing the following states, controls, and uncertainties:

• Let $x_j(k)$ denote the local inventory at stage $j$, at the beginning of period $k$.

• Let $u_j(k)$ denote the shipment sent in period $k$ from echelon $j + 1$ to echelon $j$.

• Let the unknown external demands arriving at echelon 1 represent the uncertainties, $w(k)$.

¹³ This implies that the order placed by echelon $j$ in period $k$ (to the upstream echelon, $j + 1$) cannot be used to satisfy the order in period $k$ from the downstream echelon, $j - 1$. Technically, this corresponds to an effective lead time of 1 period, and a more appropriate model would redefine the state vector accordingly. We have opted to keep our current formulation for simplicity.

The dynamics of the linear system can then be formulated as

$$
\begin{aligned}
x_1(k+1) &= x_1(k) + u_1(k) - w(k), && k = 0, \dots, T-1, \\
x_j(k+1) &= x_j(k) + u_j(k) - u_{j-1}(k), && j = 2, \dots, J, \ k = 0, \dots, T-1,
\end{aligned}
$$

with the following constraints on the states and controls

$$
\begin{aligned}
u_j(k) &\ge 0, && j = 1, \dots, J, \ k = 0, \dots, T-1, && \text{(non-negative shipments)} \\
x_j(k) &\ge u_{j-1}(k), && j = 2, \dots, J, \ k = 0, \dots, T-1, && \text{(downstream order $\le$ upstream inventory)}
\end{aligned}
$$

and the costs

$$
\begin{aligned}
h_1\bigl(k, x_1(k), u_1(k)\bigr) &= c_1(k)\, u_1(k) + \max\bigl\{ H_1(k)\, x_1(k), \ -B_1(k)\, x_1(k) \bigr\}, && k = 0, \dots, T-1, \\
h_1\bigl(T, x_1(T)\bigr) &= \max\bigl\{ H_1(T)\, x_1(T), \ -B_1(T)\, x_1(T) \bigr\}, \\
h_j\bigl(k, x_j(k), u_j(k)\bigr) &= c_j(k)\, u_j(k) + H_j(k)\, x_j(k), && k = 0, \dots, T-1, \\
h_j\bigl(T, x_j(T)\bigr) &= H_j(T)\, x_j(T).
\end{aligned}
$$

With the same model of uncertainty as before, $\mathcal{W}_k = \bigl[ d_k (1 - \rho), \ d_k (1 + \rho) \bigr]$, for some known mean demand $d_k$ and uncertainty level $\rho \in [0, 1]$, and the goal to decide shipment quantities $u_j(k)$ so as to minimize the cost in the worst-case scenario, we obtain a different example of Problem $(P)$.
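One period of the serial supply chain can be simulated directly from the equations above. The sketch uses 0-based lists for the $J$ echelons, and it assumes that the stage cost of the whole chain is the sum of the echelon costs $h_j$; that aggregation, the function names, and the numbers are assumptions made for illustration.

```python
def is_feasible(x, u):
    """u_j >= 0 for all j, and x_j >= u_{j-1} for j = 2..J (0-based lists)."""
    return (all(s >= 0 for s in u)
            and all(x[j] >= u[j - 1] for j in range(1, len(x))))

def step(x, u, w):
    """Inventory update: echelon 1 faces demand w, echelon j ships u_{j-1} downstream."""
    nxt = [x[0] + u[0] - w]
    nxt += [x[j] + u[j] - u[j - 1] for j in range(1, len(x))]
    return nxt

def stage_cost(x, u, c, H, B1):
    """Ordering costs, holding/backlog cost at echelon 1, holding costs upstream."""
    cost = sum(cj * uj for cj, uj in zip(c, u))
    cost += max(H[0] * x[0], -B1 * x[0])
    cost += sum(H[j] * x[j] for j in range(1, len(x)))
    return cost

# Hypothetical two-echelon instance
x, u, w = [5, 8], [4, 6], 7
feasible_now = is_feasible(x, u)
x_next = step(x, u, w)
cost_now = stage_cost(x, u, c=[1, 1], H=[1, 1], B1=3)
```

The feasibility check mirrors the two constraint families in the model: shipments are non-negative, and the shipment into echelon $j - 1$ cannot exceed the inventory on hand at echelon $j$.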

6 Numerical Experiments

In this section, we present numerical simulations testing the performance of polynomial policies in each of the two problems mentioned in Section 5. In order to examine the dependency of our results on the size of the problem, we proceed in the following fashion.

6.1 First Example

For the first model (single echelon with cumulative order constraints), we vary the horizon of the problem from $T = 4$ to $T = 10$, and for every value of $T$, we:

1. Create 100 problem instances, by randomly generating the cost parameters and the constraints, in which the performance of polynomial policies of degree 1 (affine) is suboptimal.

2. For every such instance, compute:

• The optimal cost $OPT$, by solving the exponential Linear Program $(P)_{\text{ext}}$.

• The optimal cost $P_d$ obtained with polynomial policies of degree $d = 1, 2$, and $3$, respectively, by solving the corresponding SDP formulations, as introduced in Section 3.

We also record the relative optimality gap corresponding to each polynomial policy, defined as $(P_d - OPT)/OPT$, and the solver time.

3. Compute statistics over the 100 different instances (recording the mean, standard deviation, min, max and median) for the optimality gaps and solver times corresponding to all three polynomial parameterizations.
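The bookkeeping in steps 2 and 3 can be sketched as follows. The cost values are made up for illustration, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, median, stdev

def gap_stats(opt_costs, policy_costs):
    """Relative optimality gaps (in %) and their summary statistics."""
    gaps = [100.0 * (p - o) / o for o, p in zip(opt_costs, policy_costs)]
    return {
        "avg": mean(gaps),
        "std": stdev(gaps),      # sample standard deviation (assumed)
        "mdn": median(gaps),
        "min": min(gaps),
        "max": max(gaps),
    }

# Hypothetical costs for three instances: OPT and P_d for some fixed degree d
opt_costs = [100.0, 200.0, 50.0]
policy_costs = [102.0, 200.0, 51.0]
stats = gap_stats(opt_costs, policy_costs)
```

The dictionary keys follow the column headers used in the tables of this section (avg, std, mdn, min, max).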


Table 1 and Table 2 record these statistics for relative gaps and solver times, respectively. The following conclusions can be drawn from the results:

• Policies of higher degree decrease the performance gap considerably. In particular, while affine policies yield an average gap between 2.8% and 3.7% (with a median gap between 2% and 2.7%), quadratic policies reduce both average and median gaps by a factor of 3, and cubic policies essentially close the optimality gap (all gaps are smaller than 1%, with a median gap smaller than 0.01%). To better see this, Figure 1 illustrates the box-plots corresponding to the three policies for a typical case (here, $T = 6$).

• The reductions in the relative gaps are not very sensitive to the horizon, $T$. Figure 2(a) illustrates this effect for the case of quadratic policies, and similar plots can be drawn for the affine and cubic cases.

• The computational time grows polynomially with the horizon size. While computations for cubic policies are rather expensive, the quadratic case, shown in Figure 2(b), shows promise for scalability: for horizon $T = 10$, the median and average solver times are below 15 seconds.

6.2 Second Example

For the second model (serial supply chain), we fix the problem horizon to $T = 7$, and vary the number of echelons from $J = 2$ to $J = 5$. For every resulting size, we go through the same steps 1-3 as outlined above, and record the same statistics, displayed in Table 3 and Table 4, respectively. Essentially the same observations as before hold. Namely, policies of higher degree result in strict improvements of the objective function, with cubic policies always resulting in gaps smaller than 1% (see Figure 3(a) for a typical case). Also, increasing the problem size (here, this corresponds to the number of echelons, $J$) does not affect the reductions in gaps, and the computational requirements do not increase drastically (see Figure 3(b), which corresponds to quadratic policies).

All our computations were done in a MATLAB® environment, on the MIT Operations Research Center computational machine (3 GHz Intel® Dual Core Xeon® 5050 Processor, with 8 GB of RAM, running Ubuntu Linux). The optimization problems were formulated using YALMIP Lofberg [2004], and the resulting SDPs were solved with SDPT3 Toh et al. [1999].

We remark that the computational times could be substantially reduced by exploiting the structure of the polynomial optimization problems (e.g., Nie [2009]), and by utilizing more suitable techniques for solving smooth large-scale SDPs (see, e.g., Lan et al. [2009] and the references therein). Such techniques are immediately applicable to our setting, and could provide a large speed-up over general-purpose algorithms (such as the interior point methods implemented in SDPT3), hence allowing much larger and more complicated instances to be solved.

Table 1: Relative gaps (in %) for polynomial policies in Example 1

    |         Degree d = 1          |         Degree d = 2          |         Degree d = 3
  T |   avg   std   mdn   min   max |   avg   std   mdn   min   max |   avg   std   mdn   min   max
  4 |  2.84  2.41  2.18  0.02  9.76 |  0.75  0.85  0.47  0.00  3.79 |  0.03  0.12  0.00  0.00  0.91
  5 |  2.82  2.29  2.52  0.04 11.22 |  0.62  0.71  0.39  0.00  3.92 |  0.02  0.09  0.00  0.00  0.56
  6 |  3.09  2.63  2.36  0.01  9.82 |  0.69  0.89  0.25  0.00  3.47 |  0.03  0.10  0.00  0.00  0.59
  7 |  3.25  2.95  2.58  0.13 15.00 |  0.83  0.99  0.43  0.00  4.79 |  0.06  0.17  0.00  0.00  0.93
  8 |  3.66  3.29  2.69  0.03 18.36 |  1.06  1.17  0.74  0.00  5.81 |  0.10  0.17  0.00  0.00  0.99
  9 |  2.93  2.78  2.12  0.05 11.56 |  0.80  0.86  0.55  0.00  3.39 |  0.07  0.13  0.00  0.00  0.61
 10 |  3.44  3.60  2.09  0.00 18.20 |  0.76  1.16  0.26  0.00  5.76 |  0.05  0.12  0.00  0.00  0.74

Figure 1: Box plots comparing the performance of different polynomial policies for horizon T = 6. [Plot omitted; axes: relative gaps (in %) versus degree of the policy (1, 2, 3).]

Figure 2: Performance of quadratic policies for Example 1. (a) illustrates the weak dependency of the improvement on the problem size (measured in terms of the horizon T), while (b) compares the solver times required for different problem sizes. [Plots omitted; panel (a): relative gaps (in %) versus horizon; panel (b): solver times (seconds) versus horizon.]

Table 2: Solver times (in seconds) for polynomial policies in Example 1

    |         Degree d = 1          |         Degree d = 2          |              Degree d = 3
  T |   avg   std   mdn   min   max |   avg   std   mdn   min   max |     avg     std     mdn     min     max
  4 |  0.47  0.05  0.46  0.38  0.63 |  1.27  0.10  1.27  1.13  1.62 |    3.33    0.21    3.24    3.01    4.03
  5 |  0.58  0.06  0.58  0.46  0.75 |  2.03  0.20  1.97  1.69  2.65 |    7.51    0.91    7.27    6.58   12.08
  6 |  0.73  0.11  0.72  0.62  1.50 |  2.29  0.22  2.28  1.87  3.26 |   18.96    2.54   18.25   16.07   31.86
  7 |  0.88  0.08  0.87  0.72  1.07 |  3.08  0.23  3.10  2.47  3.67 |   48.83    5.63   47.99   40.65   74.09
  8 |  1.13  0.12  1.11  0.94  1.92 |  4.79  0.32  4.75  3.97  5.96 |  157.73   20.67  153.91  126.15  217.80
  9 |  1.53  0.17  1.51  1.27  2.66 |  7.65  0.51  7.65  6.10  9.59 |  420.75   60.10  411.09  334.71  760.13
 10 |  1.31  0.15  1.30  1.07  2.19 | 14.77  1.24 14.80 11.81 18.57 | 1846.94  600.89 1640.10 1313.18 4547.09

Figure 3: Performance of polynomial policies for Example 2. (a) compares the three policies for problems with J = 3 echelons, and (b) shows the solver times needed to compute quadratic policies for different problem sizes. [Plots omitted; panel (a): relative gaps (in %) versus degree of polynomial policies; panel (b): solver times (seconds) versus number of echelons.]

Table 3: Relative gaps (in %) for polynomial policies in Example 2

    |         Degree d = 1          |         Degree d = 2          |         Degree d = 3
  J |   avg   std   mdn   min   max |   avg   std   mdn   min   max |   avg   std   mdn   min   max
  2 |  1.87  1.48  1.47  0.00  8.27 |  1.38  1.16  1.11  0.00  6.48 |  0.06  0.14  0.01  0.00  0.96
  3 |  1.47  0.89  1.27  0.16  4.46 |  1.08  0.68  0.93  0.14  3.33 |  0.04  0.06  0.00  0.00  0.32
  4 |  1.14  2.46  0.70  0.05 24.63 |  0.67  0.53  0.53  0.01  2.10 |  0.04  0.07  0.00  0.00  0.38
  5 |  0.35  0.37  0.21  0.03  1.85 |  0.27  0.32  0.15  0.00  1.59 |  0.02  0.03  0.00  0.00  0.15

Table 4: Solver times (in seconds) for polynomial policies in Example 2

    |         Degree d = 1          |         Degree d = 2          |              Degree d = 3
  J |   avg   std   mdn   min   max |   avg   std   mdn   min   max |     avg     std     mdn     min     max
  2 |  1.22  0.20  1.18  0.86  2.35 |  5.58  1.05  5.44  3.82  8.79 |   81.64   14.02   80.88   52.55  116.56
  3 |  1.72  0.26  1.70  1.21  3.09 |  8.84  1.40  8.53  6.83 13.19 |  115.08   20.91  109.96   77.29  183.84
  4 |  1.57  0.22  1.55  1.20  2.85 | 12.59  1.63 12.44  8.86 17.86 |  160.05   19.34  159.29   82.11  207.56
  5 |  2.59  1.46  1.97  1.51  8.18 | 18.97  6.59 17.59 13.21 63.71 |  250.43  109.96  227.56  144.54  952.37

7 Conclusions

In this paper, we have presented a new method for dealing with multi-stage decision problems affected by uncertainty, applicable to robust optimization and stochastic programming. Our approach consists of constructing a hierarchy of sub-optimal polynomial policies, parameterized directly in the observed uncertainties. The problem of computing such an optimal polynomial policy can be reformulated as an SDP, which can be solved efficiently with interior point methods. Furthermore, the approach allows modelling flexibility, in that the degree of the polynomial policies explicitly controls the trade-off between the quality of the approximation and the computational requirements. To test the quality of the policies, we have considered two applications in inventory management, one involving a single echelon with constrained cumulative orders, and the second involving a serial supply chain. For both examples, quadratic policies (with modest computational requirements) were able to substantially reduce the optimality gap, and cubic policies (at a higher computational cost) were always within 1% of optimal. Given that our tests were run using publicly-available, general-purpose SDP solvers, we believe that, with the advent of more powerful (commercial) packages for interior point methods, as well as dedicated algorithms for solving SOS problems, our method should have applicability to large scale, real-world optimization problems.

8 Appendix

8.1 Suboptimality of Affine Policies

Lemma 1. Consider Problem (9), written below for convenience. Recall that $x$ is a (first-stage) non-adjustable decision, while $y$ is a second-stage adjustable policy (allowed to depend on $w$).

\begin{align}
\min_{x,\, y(w)} \quad & x \notag \\
\text{such that} \quad & x \ge \sum_{i=1}^{N} y_i, \quad \forall\, w \in \mathcal{W} = \left\{ (w_1, \dots, w_N) \in \mathbb{R}^N : \|w\|_2 \le 1 \right\}, \tag{23a} \\
& y_i \ge w_i^2, \quad \forall\, w \in \mathcal{W}. \tag{23b}
\end{align}

The optimal value in the problem is 1, corresponding to policies $y_i(w) = w_i^2$, $i = 1, \dots, N$. Furthermore, the optimal achievable objective under affine policies $y(w)$ is $N$.

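Proof. Note that for any feasible $x, y$, we have $x \ge \sum_{i=1}^{N} y_i \ge \sum_{i=1}^{N} w_i^2$, for any $w \in \mathcal{W}$. Therefore, with $\sum_{i=1}^{N} w_i^2 = 1$, we must have $x \ge 1$. Also note that $y_i^\star(w) = w_i^2$ is robustly feasible for constraint (23b), and results in an objective $x^\star = \max_{w \in \mathcal{W}} \sum_{i=1}^{N} w_i^2 = 1$, which equals the lower bound, and is hence optimal.

Consider an affine policy in the second stage, $y_i^{AFF}(w) = \beta_i + \alpha_i^T w$, $i = 1, \dots, N$. With $e_i$ denoting the $i$-th unit vector (1 in the $i$-th component, 0 otherwise) and $\alpha_i(i)$ the $i$-th component of $\alpha_i$, constraint (23b) gives, for any $i = 1, \dots, N$:

\[
\left.
\begin{aligned}
w = e_i \in \mathcal{W} \;&\Rightarrow\; \beta_i + \alpha_i(i) \ge 1 \\
w = -e_i \in \mathcal{W} \;&\Rightarrow\; \beta_i - \alpha_i(i) \ge 1
\end{aligned}
\right\}
\;\Rightarrow\; \beta_i \ge 1.
\]

This implies that $x^{AFF} \ge \sum_{i=1}^{N} y_i^{AFF}(w) \ge N + \sum_{i=1}^{N} \alpha_i^T w$. In particular, with $w = 0 \in \mathcal{W}$, we have $x^{AFF} \ge N$. The optimal choice, in this case, will be to set $\alpha_i = 0$ (with $\beta_i = 1$), resulting in $x^{AFF} = N$.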
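The gap between the two policies in Lemma 1 can be illustrated numerically. The sketch below is a sampling-based sanity check, not a proof: the dimension N = 5, the sample size, the random seed and all function names are assumptions chosen for illustration. It evaluates the worst-case cost of the quadratic policy y_i(w) = w_i^2 and of the constant policy y_i(w) = 1 (that is, β_i = 1, α_i = 0) over points of the unit ball, including the unit vectors ±e_i.

```python
import math
import random


def sample_unit_ball(n, rng):
    """Draw a point uniformly from the Euclidean unit ball in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / n)
    return [radius * v / norm for v in g]


def worst_case_cost(policy, samples):
    """Smallest x with x >= sum_i y_i(w) for every sampled w."""
    return max(sum(policy(w)) for w in samples)


def quadratic_policy(w):
    """y_i(w) = w_i^2, the optimal adjustable policy in Lemma 1."""
    return [wi * wi for wi in w]


def constant_policy(w):
    """Affine policy with beta_i = 1 and alpha_i = 0."""
    return [1.0 for _ in w]


N = 5
rng = random.Random(0)
samples = [sample_unit_ball(N, rng) for _ in range(2000)]
# Add the extreme points +/- e_i, which attain the worst case.
for i in range(N):
    for sign in (1.0, -1.0):
        samples.append([sign if j == i else 0.0 for j in range(N)])

x_quadratic = worst_case_cost(quadratic_policy, samples)
x_affine = worst_case_cost(constant_policy, samples)
# The constant policy satisfies y_i >= w_i^2 on every sample.
affine_feasible = all(
    y >= wi * wi - 1e-12
    for w in samples
    for y, wi in zip(constant_policy(w), w)
)
print(x_quadratic, x_affine, affine_feasible)
```

On these samples the quadratic policy attains a cost of about 1, while the constant policy costs N, in line with the statement of the lemma.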

8.2 Optimality of Multi-affine Policies

Theorem 2. Multi-affine policies of the form (22), with degree at most $d = T - 1$, are optimal for problem $(P)$.

Proof. The following trivial observation will be useful in our analysis:

Observation 1. A multi-affine policy $u_j$ of the form (22) is an affine function of a given variable $w_i$, when all the other variables $w_l$, $l \ne i$, are fixed. Also, with $u_j$ of degree at most $d$, the number of coefficients $\ell_\alpha$ is $\binom{k}{0} + \binom{k}{1} + \cdots + \binom{k}{d}$.
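As a small worked instance of Observation 1, take $k = 3$ disturbances and degree at most $d = 2$. The expansion below assumes that form (22) writes each component as a sum of products of distinct disturbances; the subscript labelling of the coefficients is chosen here for illustration only.

```latex
\begin{align*}
u_j(w) &= \ell_{\emptyset} + \ell_{1} w_1 + \ell_{2} w_2 + \ell_{3} w_3
          + \ell_{12} w_1 w_2 + \ell_{13} w_1 w_3 + \ell_{23} w_2 w_3, \\
\#\{\text{coefficients}\} &= \binom{3}{0} + \binom{3}{1} + \binom{3}{2} = 1 + 3 + 3 = 7, \\
u_j(w) &= \bigl(\ell_{\emptyset} + \ell_{2} w_2 + \ell_{3} w_3 + \ell_{23} w_2 w_3\bigr)
          + \bigl(\ell_{1} + \ell_{12} w_2 + \ell_{13} w_3\bigr)\, w_1 .
\end{align*}
```

The last line regroups the same polynomial to show that, with $w_2$ and $w_3$ fixed, $u_j$ is affine in $w_1$.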


Recall that the optimal value in Problem $(P)$ is the same as the optimal value in Problem $(P)_{\text{ext}}$ from Section 4.2. Let us denote the optimal decisions obtained from solving problem $(P)_{\text{ext}}$ by $u_k^{\text{ext}}(w_{[k]})$, $x_k^{\text{ext}}(w_{[k]})$, respectively. Note that, at time $k$, there are at most $2^k$ such distinct values $u_k^{\text{ext}}(w_{[k]})$, and, correspondingly, at most $2^k$ values $x_k^{\text{ext}}(w_{[k]})$, due to the non-anticipativity condition and the fact that the extreme uncertainty sequences at time $k$, $w_{[k]} \in \text{ext}(\mathcal{W}_{[k]}) = \text{ext}(\mathcal{W}_0) \times \cdots \times \text{ext}(\mathcal{W}_{k-1})$, are simply the vertices of the hypercube $\mathcal{W}_{[k]} \subset \mathbb{R}^k$. In particular, at the last time when decisions are taken, $k = T - 1$, there are at most $2^{T-1}$ distinct optimal values $u_{T-1}^{\text{ext}}(w_{[T-1]})$ computed.

Consider now a multi-affine policy of the form (22), of degree $T - 1$, implemented at time $T - 1$.

By Observation 1, the number of coefficients in the $j$-th component of such a policy is exactly $\binom{T-1}{0} + \binom{T-1}{1} + \cdots + \binom{T-1}{T-1} = 2^{T-1}$, by Newton's binomial formula. Therefore, the total $n_u \cdot 2^{T-1}$ coefficients for $u_{T-1}$ could be computed so that

\[
u_{T-1}(w_{[T-1]}) = u_{T-1}^{\text{ext}}(w_{[T-1]}), \quad \forall\, w_{[T-1]} \in \text{ext}(\mathcal{W}_{[T-1]}), \tag{24}
\]

that is, the value of the multi-affine policy exactly matches the $2^{T-1}$ optimal decisions computed in $(P)_{\text{ext}}$, at the $2^{T-1}$ vertices of $\mathcal{W}_{[T-1]}$. The same process can be conducted for times $k = T - 2, \dots, 1, 0$, to obtain multi-affine policies of degree at most $T - 1$ (see footnote 14) that match the values $u_k^{\text{ext}}(w_{[k]})$ at the extreme points of $\mathcal{W}_{[k]}$.

With such multi-affine control policies, it is easy to see that the states $x_k$ become multi-affine functions of $w_{[k]}$. Furthermore, we have $x_k(w_{[k]}) = x_k^{\text{ext}}(w_{[k]})$, $\forall\, w_{[k]} \in \text{ext}(\mathcal{W}_{[k]})$. A typical state-control constraint (7c) written at time $k$ amounts to ensuring that

\[
e_x(k, j)^T x_k(w_{[k]}) + e_u(k, j)^T u_k(w_{[k]}) - f_j(k) \le 0, \quad \forall\, w_{[k]} \in \mathcal{W}_{[k]},
\]

where $e_x(k, j)^T$, $e_u(k, j)^T$ denote the $j$-th row of $E_x(k)$ and $E_u(k)$, respectively. Note that the left-hand side of this expression is also a multi-affine function of the variables $w_{[k]}$. Since, by our observation, the maximum of multi-affine functions is reached at the vertices of the feasible set, i.e., $w_{[k]} \in \text{ext}(\mathcal{W}_{[k]})$, and, by (24), we have that for any such vertex, $u_k(w_{[k]}) = u_k^{\text{ext}}(w_{[k]})$, $x_k(w_{[k]}) = x_k^{\text{ext}}(w_{[k]})$, we immediately conclude that the constraint above is satisfied, since $u_k^{\text{ext}}(w_{[k]})$, $x_k^{\text{ext}}(w_{[k]})$ are certainly feasible.

A similar argument can be invoked for constraint (7d), and also to show that the maximum of the objective function is reached on the set of vertices $\text{ext}(\mathcal{W}_{[T]})$, and, since the values of the multi-affine policies exactly correspond to the optimal decisions in program $(P)_{\text{ext}}$, optimality is preserved.
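The interpolation step used in the proof, matching prescribed values at the vertices of a hypercube with a multi-affine polynomial, can be sketched as follows. The example assumes the hypercube has been rescaled to {0, 1}^k (a coordinate-wise affine change of variables, which keeps the policy multi-affine), and the vertex values are made-up numbers standing in for the optimal decisions from $(P)_{\text{ext}}$. The function names and the use of Möbius inversion are choices made here, not taken from the paper.

```python
from itertools import combinations, product
from math import comb, prod


def multiaffine_coefficients(vertex_values, k):
    """Coefficients c[S] of p(w) = sum_S c[S] * prod_{i in S} w_i that
    reproduce vertex_values at every vertex of {0, 1}^k (Moebius inversion)."""
    coeffs = {}
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            total = 0.0
            for r in range(size + 1):
                for inner in combinations(subset, r):
                    vertex = tuple(1 if i in inner else 0 for i in range(k))
                    total += (-1) ** (size - r) * vertex_values[vertex]
            coeffs[subset] = total
    return coeffs


def evaluate(coeffs, w):
    """Evaluate the multi-affine polynomial at the point w."""
    return sum(c * prod(w[i] for i in subset) for subset, c in coeffs.items())


k = 3
# Hypothetical vertex decisions, one per vertex of {0, 1}^k.
vertex_values = {
    v: float(3 * v[0] - 2 * v[1] * v[2] + max(v))
    for v in product((0, 1), repeat=k)
}

coeffs = multiaffine_coefficients(vertex_values, k)
num_coeffs = len(coeffs)
max_error = max(abs(evaluate(coeffs, v) - val) for v, val in vertex_values.items())
print(num_coeffs, max_error)
```

With k = T - 1 variables, the number of coefficients equals the number of vertices, 2^(T-1), which is why the vertex values can always be matched.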

8.3 Comparison with Other Methodologies

Theorem. If the uncertainty sets $\mathcal{W}_k$ are given by the intersection of finitely many convex quadratic forms, and have nonempty interior, then the objective functions obtained from the polynomial hierarchy satisfy the following relation:

\[
J^\star_{AFF} \ge J^\star_{d=1} \ge J^\star_{d=2} \ge \cdots
\]

Proof. First, note that the hierarchy can only improve when the polynomial degree $d$ is increased (this is because any feasible solution for a particular degree $d$ remains feasible for degree $d + 1$). Therefore, we only need to prove the first inequality.

Footnote 14: In fact, multi-affine policies of degree $k$ would be sufficient at time $k$.


Consider any feasible solution to Problem $(P_{AFF})$ under disturbance-affine policies, i.e., any choice of matrices $\{L_k\}_{0 \le k \le T-1}$, coefficients $\{z_k\}_{0 \le k \le T}$ and cost $J$, such that all constraints in $(P_{AFF})$ are satisfied.

Note that a typical constraint in Problem $(P_{AFF})$ becomes

\[
f(w_{[k]}) \ge 0, \quad \forall\, w_{[k]} \in \mathcal{W}_{[k]},
\]

where $f$ is a degree 1 polynomial in indeterminate $w_{[k]}$, with coefficients that are affine functions of the decision variables. By the assumption in the statement of the theorem, the sets $\mathcal{W}_k$ are convex, with nonempty interior, $\forall\, k \in \{0, \dots, T-1\}$, which implies that $\mathcal{W}_{[k]} = \mathcal{W}_0 \times \cdots \times \mathcal{W}_{k-1}$ is also convex, with non-empty interior.

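Therefore, the typical constraint above can be written as

\[
f(w_{[k]}) \ge 0, \quad \forall\, w_{[k]} \in \left\{ \xi \in \mathbb{R}^{k \times n_w} : g_j(\xi) \ge 0, \; j \in \mathcal{J} \right\},
\]

where $\mathcal{J}$ is a finite index set, and $-g_j(\cdot)$ are convex (i.e., each $g_j$ is concave, so that every constraint $g_j(\xi) \ge 0$ describes a convex set). By the nonlinear Farkas Lemma (see, e.g., Proposition 3.5.4 in Bertsekas et al. [2003]), there must exist multipliers $0 \le \lambda_j \in \mathbb{R}$, $\forall\, j \in \mathcal{J}$, such that

\[
f(w_{[k]}) \ge \sum_{j \in \mathcal{J}} \lambda_j \, g_j(w_{[k]}).
\]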

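But then, recall that our SOS framework required the existence of polynomials $\sigma_j(w_{[k]})$, $j \in \{0\} \cup \mathcal{J}$, such that

\[
f(w_{[k]}) = \sigma_0(w_{[k]}) + \sum_{j \in \mathcal{J}} \sigma_j(w_{[k]}) \, g_j(w_{[k]}).
\]

By choosing $\sigma_j(w_{[k]}) \equiv \lambda_j$, $\forall\, j \in \mathcal{J}$, and $\sigma_0(w_{[k]}) = f(w_{[k]}) - \sum_{j \in \mathcal{J}} \lambda_j \, g_j(w_{[k]})$, we can immediately see that: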

• $\forall\, j \ne 0$, $\sigma_j$ are SOS (they are nonnegative constants).

• Since $g_j$ are quadratic, and $f$ is affine, $\sigma_0$ is a quadratic polynomial which is non-negative, for any $w_{[k]}$. Therefore, since any such polynomial can be represented as a sum-of-squares (see Parrilo [2003], Lasserre [2001]), we also have that $\sigma_0$ is SOS.

By these two observations, we can conclude that the particular choice $L_k, z_k, J$ will also remain feasible in our SOS framework applied to degree $d = 1$, and, hence, $J^\star_{AFF} \ge J^\star_{d=1}$.
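A one-dimensional instance may help make the construction in the proof concrete. The choice of $f$, $g$ and the multiplier $\lambda$ below is an illustrative assumption, not an example taken from the paper: the uncertainty set is the interval $[-1, 1]$, described by a single quadratic inequality, and $f$ is an affine function that is non-negative on it.

```latex
\begin{align*}
\mathcal{W} &= \{ w \in \mathbb{R} : g(w) = 1 - w^2 \ge 0 \} = [-1, 1],
\qquad f(w) = 1 - w \ge 0 \quad \forall\, w \in \mathcal{W}, \\
\lambda &= \tfrac{1}{2}: \qquad
f(w) - \lambda\, g(w) = 1 - w - \tfrac{1}{2}\,(1 - w^2)
  = \tfrac{1}{2}\,(w - 1)^2 \ge 0 \quad \forall\, w \in \mathbb{R}, \\
\sigma_1(w) &\equiv \tfrac{1}{2}, \qquad
\sigma_0(w) = \Bigl( \tfrac{1}{\sqrt{2}}\,(w - 1) \Bigr)^{2}, \qquad
f(w) = \sigma_0(w) + \sigma_1(w)\, g(w).
\end{align*}
```

Here $\sigma_1$ is a nonnegative constant and $\sigma_0$ is a single square, so both are SOS, and the affine constraint $f \ge 0$ on $\mathcal{W}$ is certified exactly as in the argument above.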

References

Soren Asmussen and Peter Glynn. Stochastic Simulation: Algorithms and Analysis. Springer, 2007. ISBN 978-0387306797.

A. Bemporad, F. Borrelli, and M. Morari. Optimal Controllers for Hybrid Systems: Stability and Piecewise Linear Explicit Form. In Decision and Control, 2000. Proceedings of the 39th IEEE Conference on, volume 2, pages 1810–1815 vol.2, 2000. doi: 10.1109/CDC.2000.912125.

A. Bemporad, F. Borrelli, and M. Morari. Model Predictive Control Based on Linear Programming - The Explicit Solution. IEEE Transactions on Automatic Control, 47(12):1974–1985, 2002.


A. Bemporad, F. Borrelli, and M. Morari. Min-Max Control of Constrained Uncertain Discrete-Time Linear Systems. IEEE Transactions on Automatic Control, 48(9):1600–1606, 2003. ISSN 0018-9286.

A. Ben-Tal and A. Nemirovski. Robust optimization - methodology and applications. Math. Program., 92(3):453–480, 2002. ISSN 0025-5610. doi: http://dx.doi.org/10.1007/s101070100286.

A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski. Adjustable robust solutions of uncertain linear programs. Math. Program., 99(2):351–376, 2004. ISSN 0025-5610. doi: http://dx.doi.org/10.1007/s10107-003-0454-y.

A. Ben-Tal, S. Boyd, and A. Nemirovski. Control of Uncertainty-Affected Discrete Time Linear Systems via Convex Programming. Working paper, 2005a.

A. Ben-Tal, B. Golany, A. Nemirovski, and J.-P. Vial. Retailer-Supplier Flexible Commitments Contracts: A Robust Optimization Approach. Manufacturing & Service Operations Management, 7(3):248–271, 2005b. ISSN 1526-5498. doi: http://dx.doi.org/10.1287/msom.1050.0081.

A. Ben-Tal, S. Boyd, and A. Nemirovski. Extending Scope of Robust Optimization: Comprehensive Robust Counterparts of Uncertain Problems. Mathematical Programming, 107(1):63–89, 2006. ISSN 0025-5610. doi: http://dx.doi.org/10.1007/s10107-005-0679-z.

Aharon Ben-Tal, Laurent El-Ghaoui, and Arkadi Nemirovski. Robust Optimization. Princeton Series in Applied Mathematics. Princeton University Press, 2009.

D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 2001. ISBN 1886529272.

D. P. Bertsekas and J. N. Tsitsiklis. Neurodynamic Programming. Athena Scientific, Belmont, MA, 1996.

D. P. Bertsekas, A. Nedic, and A. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003. ISBN 1886529450.

Dimitris Bertsimas and David B. Brown. Constrained Stochastic LQC: A Tractable Approach. IEEE Transactions on Automatic Control, 52(10):1826–1841, Oct. 2007.

Dimitris Bertsimas, Dan A. Iancu, and Pablo A. Parrilo. Optimality of Affine Policies in Multistage Robust Optimization. Mathematics of Operations Research, 35(2):363–394, 2010. doi: 10.1287/moor.1100.0444. URL http://mor.journal.informs.org/cgi/content/abstract/35/2/363.

John R. Birge and Francois Louveaux. Introduction to Stochastic Programming. Springer, 2000. ISBN 0387982175.

Debasish Chatterjee, Peter Hokayem, and John Lygeros. Stochastic model predictive control with bounded control inputs: a vector space approach. Submitted for publication, March 2009. URL http://arxiv.org/abs/0903.5444.

Xin Chen and Yuhan Zhang. Uncertain Linear Programs: Extended Affinely Adjustable Robust Counterparts. Operations Research, page opre.1080.0605, 2009. doi: 10.1287/opre.1080.0605. URL http://or.journal.informs.org/cgi/content/abstract/opre.1080.0605v1.

Xin Chen, Melvyn Sim, Peng Sun, and Jiawei Zhang. A Linear Decision-Based Approximation Approach to Stochastic Programming. Operations Research, 56(2):344–357, 2008. doi: 10.1287/opre.1070.0457.

G. E. Dullerud and F. Paganini. A Course in Robust Control Theory. Springer, 2005. ISBN 0387989455.

Martin Dyer and Leen Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106(3):423–432, 2006. ISSN 0025-5610. doi: http://dx.doi.org/10.1007/s10107-005-0597-0.


Stanley J. Garstka and Roger J.-B. Wets. On Decision Rules in Stochastic Programming. Mathematical Programming, 7(1):117–143, 1974. ISSN 0025-5610. doi: 10.1007/BF01585511.

Paul Glasserman and Sridhar Tayur. Sensitivity Analysis for Base-Stock Levels in Multiechelon Production-Inventory Systems. Management Science, 41(2):263–281, 1995. doi: 10.1287/mnsc.41.2.263. URL http://mansci.journal.informs.org/cgi/content/abstract/41/2/263.

P. J. Goulart and E.C. Kerrigan. Relationships Between Affine Feedback Policies for Robust Control with Constraints. In Proceedings of the 16th IFAC World Congress on Automatic Control, July 2005.

T. Jacobi and A. Prestel. Distinguished Representations of Strictly Positive Polynomials. Journal für die reine und angewandte Mathematik (Crelles Journal), 2001(532):223–235, 2001. doi: 10.1515/crll.2001.023. URL http://www.reference-global.com/doi/abs/10.1515/crll.2001.023.

E.C. Kerrigan and J.M. Maciejowski. On robust optimization and the optimal control of constrained linear systems with bounded state disturbances. In Proceedings of the 2003 European Control Conference, Cambridge, UK, September 2003.

E.C. Kerrigan and J.M. Maciejowski. Properties of a New Parameterization for the Control of Constrained Systems with Disturbances. Proceedings of the 2004 American Control Conference, 5:4669–4674 vol.5, June-July 2004. ISSN 0743-1619. URL http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=01384049.

Guangui Lan, Zhaosong Lu, and Renato D. C. Monteiro. Primal-dual First-order Methods with O(1/ε) Iteration-complexity for Cone Programming. Mathematical Programming, 2009. doi: 10.1007/s10107-008-0261-6.

J. B. Lasserre. Global Optimization with Polynomials and the Problem of Moments. SIAM Journal on Optimization, 11:796–817, 2001.

J. Lofberg. Approximations of closed-loop minimax MPC. Proceedings of the 42nd IEEE Conference on Decision and Control, 2:1438–1442 Vol.2, Dec. 2003. ISSN 0191-2216. doi: 10.1109/CDC.2003.1272813.

J. Lofberg. YALMIP : A Toolbox for Modeling and Optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004. URL http://control.ee.ethz.ch/~joloef/yalmip.php.

Peter Marbach and John N. Tsitsiklis. Simulation-based optimization of Markov reward processes. IEEE Transactions on Automatic Control, 46(2):191–209, January 2001. doi: 10.1109/9.905687.

Arkadi Nemirovski and Alexander Shapiro. Continuous Optimization: Current Trends and Applications, volume 99 of Applied Optimization, chapter On Complexity of Stochastic Programming Problems, pages 111–146. Springer US, 2005. doi: 10.1007/b137941.

Jiawang Nie. Regularization Methods for Sum of Squares Relaxations in Large Scale Polynomial Optimization. Submitted for publication, September 2009. URL http://arxiv.org/abs/0909.3551.

Pablo A. Parrilo. Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000.

Pablo A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming Series B, 96(2):293–320, May 2003. doi: http://dx.doi.org/10.1007/s10107-003-0387-5.

Warren Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. Wiley-Interscience, 2007.

M. Putinar. Positive polynomials on compact semi-algebraic sets. Indiana University Mathematics Journal, 42:969–984, 2003.


Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill Science/Engineering/Math, 1976. ISBN 007054235X.

Melvyn Sim and Joel Goh. Distributionally Robust Optimization and its Tractable Approximations. Accepted in Operations Research, April 2009.

J. Skaf and S. Boyd. Design of Affine Controllers Via Convex Optimization. Submitted to IEEE Transactions on Automatic Control, 2008a. URL http://www.stanford.edu/~boyd/papers/affine_contr.html.

J. Skaf and S. Boyd. Nonlinear Q-Design for Convex Stochastic Control. Submitted to IEEE Transactions on Automatic Control, 2008b. URL http://www.stanford.edu/~boyd/papers/nonlin_Q_param.html.

K.C. Toh, M.J. Todd, and R. Tutuncu. SDPT3 a MATLAB Software Package for Semidefinite Programming, 1999. URL http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.

Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996. ISSN 0036-1445. doi: http://dx.doi.org/10.1137/1038003.

K. Zhou and J. C. Doyle. Essentials of Robust Control. Prentice Hall, 1998. ISBN 0135258332.

P. Zipkin. Foundations of Inventory Management. McGraw Hill, 2000. ISBN 0256113793.
