
Distribution of the Optimal Value of a Stochastic Mixed

Zero-One Linear Optimization Problem under Objective

Uncertainty

Karthik Natarajan∗ Chung-Piaw Teo† Zhichao Zheng‡

June 20, 2011

Abstract

This paper is motivated by the question of how to approximate the distribution of the completion

time of a project network with random activity durations. In general, we consider the

mixed zero-one linear optimization problem under objective uncertainty, and develop an

approach to approximate the distribution of its optimal value when the random objective

coefficients follow a multivariate normal distribution. Linking our model to the classical

Stein’s Identity, we show that the best normal approximation of the random optimal value,

under the L2-norm, can be computed by solving the persistency problem, first introduced

by Bertsimas et al. (2006). We further extend our method to the minimum quadratic

regret problem, and show that for any general mixed zero-one linear optimization problem,

the minimum quadratic regret solution can be computed by solving a related persistency

problem. Extensive computational studies are presented to demonstrate the advantages

of the new method.

Key words: stochastic mixed zero-one linear optimization; persistency; distribution approx-

imation; regret; completely positive programming; project management; portfolio selection

1 Introduction

One of the fundamental problems in project management is to identify the project completion

time when the activity durations are random. It is well known that any project can be rep-

resented as a directed acyclic graph. In this paper, we adopt the conventional activity-on-arc

representation of the project network, where activities are represented by arcs and nodes represent
the milestones that indicate the starting or ending of the activities. The length of an
arc is the duration of the activity represented by that arc. Hence, if all the activities have
deterministic durations, finding the project completion time is as easy as finding the longest
path in a corresponding directed acyclic graph, which can be solved as a linear programming
(LP) problem. However, when the activity durations are stochastic, the analysis of the random
project completion time becomes nontrivial.

∗Department of Management Sciences, City University of Hong Kong, Hong Kong.
†Department of Decision Sciences, NUS Business School, National University of Singapore.
‡Department of Decision Sciences, NUS Business School, National University of Singapore.

It has long been the interest of both researchers and practitioners to estimate the distribution

of the project completion time. Over the past few decades, various ways have been invented

to approximate this distribution (cf. Dodin (1985), Cox (1995), etc.). Unfortunately, to the

best of our knowledge, none of these offer a rigorous approach to guarantee the accuracy of

their approximations. In this paper, we partially address this issue under the assumption that

the activity durations follow a multivariate normal distribution, and we construct a normal

distribution that is optimal under L2-norm to approximate the random project completion

time. In fact, our method applies to any general random mixed 0-1 LP problem under objective

uncertainty:

\[
Z(c) := \max_{x \in P} \sum_{j=1}^{n} c_j x_j \tag{1}
\]
where $c = (c_1, \ldots, c_n)^T$ is the random cost coefficient vector with mean vector $\mu$ and covariance
matrix $\Sigma$, and $P$ is the domain of the feasible solutions (assumed to be bounded) defined by
\[
P := \left\{ x \in \mathbb{R}^n : a_i^T x = b_i, \ \forall i = 1, \ldots, m, \quad x_j \in \{0, 1\}, \ \forall j \in B \subseteq \{1, \ldots, n\}, \quad x \ge 0 \right\}.
\]

In the project management problem, P characterizes the set of paths in the project network,

and cj is the random duration of activity j. Note that there is by now a huge literature on

finding the distribution of Z (c) for various classes of combinatorial optimization problems,

including minimum assignment, spanning tree, and traveling salesman problem (cf. Aldous &

Steele (2003)). These problems are notoriously hard, and often only partial results (asymptotic,

or when c is uniformly distributed) are known. Finding the exact distribution for a general mixed
0-1 LP problem appears to be almost impossible.

Under the Critical Path Method (CPM), which is often used in the project management

community, the random project completion time is often estimated by replacing cj with its

expected value µj , i.e., Z (µ) is used to approximate the project completion time. In the classical

Program Evaluation and Review Technique (PERT), this is taken one step further where the

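distribution of the project completion time is approximated by
\[
\sum_{j=1}^{n} \gamma_j c_j = \sum_{j=1}^{n} \gamma_j (c_j - \mu_j) + Z(\mu),
\]
with
\[
\gamma_j = \begin{cases} 1, & \text{if arc } j \text{ is active in the longest path when solving } Z(\mu), \\ 0, & \text{otherwise.} \end{cases}
\]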
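The PERT construction can be illustrated with a minimal sketch. The four-arc network, its means and variances, and the helper name `pert_estimate` below are hypothetical and chosen only for illustration; the sketch also assumes independent activity durations and nodes numbered in topological order.

```python
from math import sqrt

# Hypothetical activity-on-arc network: arc -> (tail, head, mean, variance).
# Nodes are assumed to be numbered in topological order.
ARCS = {
    "a": (0, 1, 4.0, 1.0),
    "b": (0, 2, 3.0, 4.0),
    "c": (1, 3, 2.0, 1.0),
    "d": (2, 3, 2.5, 0.25),
}

def pert_estimate(arcs, source, sink):
    """Return (Z(mu), gamma, variance) for the longest path under mean durations."""
    best = {source: (0.0, [])}  # node -> (longest mean length, arcs on that path)
    # Process arcs by increasing tail node; valid for topologically numbered nodes.
    for name, (tail, head, mean, _var) in sorted(arcs.items(), key=lambda kv: kv[1][0]):
        if tail not in best:
            continue  # tail is not reachable from the source
        length = best[tail][0] + mean
        if head not in best or length > best[head][0]:
            best[head] = (length, best[tail][1] + [name])
    z_mu, path = best[sink]
    gamma = {name: int(name in path) for name in arcs}  # indicator of the critical path
    variance = sum(arcs[name][3] for name in path)      # uses the independence assumption
    return z_mu, gamma, variance

z_mu, gamma, variance = pert_estimate(ARCS, source=0, sink=3)
std_dev = sqrt(variance)
```

Here `gamma` plays the role of $\gamma_j$ in the formula above, and `z_mu` and `variance` are the mean and variance that PERT would assign to the project completion time.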

Due to the simplicity of the approach, PERT has gained a lot of popularity, and the random

project networks are sometimes also called PERT networks. This has led to a natural estimation


problem:

\[
\text{(P)} \quad \min_{f, \gamma} \; E\left[ \left( Z(c) - f - \sum_{j} \gamma_j (c_j - \mu_j) \right)^2 \right]
\]
which is the central question addressed by this paper. Note that we are solving for the best

normal approximation (in L2-norm) to the random project duration, as an affine function of

the individual task activity duration. We explicitly obtain the (approximate) solution to this

optimization problem.

Besides the project management problem, Problem (P) is also related to another class of

interesting problems, which is the minimum quadratic regret solution to mixed 0-1 LP problem.

In the simplest case, we consider a class of portfolio selection problem, where cj represents

the random return of a financial asset j and P characterizes the set of all possible investment

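decisions. To differentiate this setting from Problem (P), we change the notation from $c_j$ to $r_j$ in this case.
In particular, we are interested in solving
\[
\min_{\sum_{j=1}^{n} x_j = 1, \; x \ge 0} \; E\left[ \left( Z(r) - \sum_{j=1}^{n} r_j x_j \right)^2 \right]
\]
where $Z(r)$ represents the best possible investment return, i.e.,
\[
Z(r) := \max_{\sum_{j=1}^{n} x_j = 1, \; x \ge 0} \; \sum_{j=1}^{n} r_j x_j. \tag{2}
\]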

We show that finding the optimal portfolio strategy to minimize the quadratic regret also reduces

to solving a related persistency problem.

Bertsimas et al. (2006) introduced the notion of the persistency of a binary decision variable

in Problem (1) as the probability that the variable is active (i.e., takes value of 1) in an optimal

solution to Problem (1). We generalize this concept to include continuous variables as follows:

Definition 1 The persistency of the decision variable xj in Problem (1) is defined as E[xj (c)],

where xj (c) denotes an optimal value of xj as a function of the random vector c. When xj is

a binary variable, E[xj (c)] = P(xj (c) = 1).

Regarding the issue of multiple optima, when c is continuous, the support of c over which

Problem (1) has multiple optimal solutions has measure zero and x (c) is unique almost surely1.

For discretely distributed c with possibly multiple optimal solutions over a support of strictly

positive measure, x (c) is defined to be an optimal solution randomly selected from the set of

optimal solutions at c.
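As an illustration of Definition 1, the persistency can be estimated by simulation when the deterministic problem is easy to solve. The sketch below assumes the simplex feasible region of Problem (2), for which an optimal solution is the unit vector of the largest coefficient; the function name `estimate_persistency` and the numerical inputs are hypothetical.

```python
import numpy as np

def estimate_persistency(mu, cov, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[x_j(c)] for Z(c) = max_j c_j over the simplex."""
    rng = np.random.default_rng(seed)
    c = rng.multivariate_normal(mu, cov, size=n_samples)
    winners = np.argmax(c, axis=1)                    # optimal vertex for each sample
    counts = np.bincount(winners, minlength=len(mu))  # how often each variable equals 1
    return counts / n_samples

mu = np.array([1.0, 1.0, 0.5])
cov = np.diag([1.0, 1.0, 1.0])
persistency = estimate_persistency(mu, cov)
```

Because c is continuous here, ties occur with probability zero and no random tie-breaking rule is needed.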

The notion of persistency generalizes “criticality index” in project networks and “choice

probability” in discrete choice models (cf. Bertsimas et al. (2006), Natarajan et al. (2009),
Mishra et al. (2011)). By the persistency problem, we mean the problem of estimating the persistency values.

1 In case c is normally distributed, x (c) would be unique almost surely.

Outline of this paper: In the next section, we review the related literature. In Sec-

tion 3, we introduce our distribution approximation model, followed by an extension in regret

minimization problem using portfolio management as an example in Section 4. Results from

extensive computational studies are presented and discussed in Section 5. Finally, we provide

some concluding remarks in Section 6.

2 Literature Review

Our problem has a long history, and is related to the classical “distribution problem of stochastic

linear programming” literature (cf. Ewbank et al. (1974), Prekopa (1966) and the references

therein). The distribution of the optimum is often approximated by some cumbersome numerical

methods such as the Cartesian integration method (cf. Bereanu (1963)). These methods have

been studied under the general framework when the uncertain parameters may appear in the

objective, constraint matrix, or the right hand side of the LP problems. However, the total
number of random variables is very limited due to the numerical methods employed. In the

case of project management problem, finding the distribution of completion time in a PERT

network is still an active area of research with a rich literature (cf. Yao & Chu (2007) and the

references therein). Most of the work in this area has been focused on using some graphical

approaches to reduce the size of the graph and to reduce the complexity of estimating the

distribution of the project completion time (e.g., Dodin (1985)). A substream of the research

tries to find a good normal approximation to the project completion time distribution, but these
methods rely only on the Central Limit Theorem and some moment estimation methods
(e.g., Cox (1995)). We solve this open problem and show that the best normal approximation

to the completion time distribution, under L2-norm, can be obtained by solving the related

persistency problem introduced by Bertsimas et al. (2006), and further studied in Natarajan et

al. (2009).

Brown et al. (1997) brought up the issue of persistence and persistent modeling in optimization
through a series of case studies. Although the concept of persistence discussed in that

paper is very broad and different from the persistency defined above, they are closely related

through the issue of data uncertainty in optimization, which is also related to robust optimiza-

tion. The authors pointed out that from the perspective of persistence, robust optimization

seeks a baseline solution that will persist as best possible with a number of alternate forecast

revisions. On the other hand, persistency describes the degree of persistence of each individual

decision variable in an optimization problem with data uncertainty. Indeed, we can further

generalize Definition 1 to the persistency of a feasible solution, i.e., the probability that this

feasible solution is optimal. However, it is beyond the scope of this paper and we would like to

leave it for future research.


Over the past few years, a substream of research in the field of persistency estimation
has yielded a series of semidefinite programming (SDP) models based on the connection between
the moment cone and the semidefinite cone. A common feature of these models is that they
only assume the knowledge of moment information of the uncertainty rather than the exact
forms of the distribution. Hence, they are also referred to as distributionally robust stochastic
programming (DRSP) models.

Bertsimas et al. (2006) introduced arguably the first computational approach to approximate

the persistency by solving a class of SDPs called Marginal Moment Model (MMM) under the

assumption that the random vector c is described only through the marginal moments of each

cj and all the decision variables in Problem (1) are binary. Natarajan et al. (2009) extended

MMM to general mixed-integer LP problems, but their model formulation is based on the

characterization of the convex hull of the binary reformulation which is typically difficult to

derive. Mishra et al. (2011) presented an SDP model named Cross Moment Model (CMM) for c

described by both the marginal and cross moments. The formulation of CMM is based on the

extreme point enumeration of Problem (1). Hence, the size of CMM can easily go exponential

for general LP problems. Inspired by a recent application of conic optimization on mixed 0-1 LP

problems due to Burer (2009), Natarajan et al. (2011) developed a parsimonious convex conic

optimization model without binary variables to estimate the persistency of a general mixed 0-1

LP problem when c is described by both the marginal and cross moments. They referred to their
model as the Completely Positive Cross Moment Model (CPCMM). In this paper, we mainly exploit
this model to estimate the persistency values. A detailed review of this model will be provided in

Section 5.1.

In a related vein, Vandenberghe et al. (2007) used SDP to obtain a generalized Chebyshev

bound on the probability that a random vector lies within a set defined by several strict quadratic

inequalities. Unfortunately, their bound is not suitable for general persistency estimation (cf.

Natarajan et al. (2011)).

A recent paper by Agrawal et al. (2011) investigated the loss incurred by ignoring correla-

tions in a DRSP model and proposed a new concept called price of correlations (POC). They

showed that POC is bounded from above for a certain class of cost functions, suggesting that

the intuitive approach of assuming independent distributions may actually work well for these

problems. However, independence conditions can be extremely difficult to capture as well. One

of the most famous examples is given by Hagstrom (1988), who showed that computing the

expected value of the longest path in a directed acyclic graph is #P-complete when the arc

lengths are restricted to taking two possible values and independent of each other. Perhaps a

DRSP model with correlation conditions is more tractable. Agrawal et al. (2011) also admitted

that for some cost functions, POC can be particularly large, indicating the need for DRSP models
to capture correlations. Fortunately, CPCMM partially fills this gap, which in turn further
strengthens our approximation method.


3 Optimal Normal Approximation to the Distribution

As discussed in the introduction, our main idea is to approximate the distribution of Z (c) by

a normal distribution, W (c) with the following form:

\[
W(c) = f + \sum_{j=1}^{n} \gamma_j (c_j - \mu_j), \tag{3}
\]

where f and γj ’s are adjustable parameters. Note that Equation (3) is sufficient to represent

any normal distribution. The objective is to choose f and γj ’s such that the expected squared

deviation between W (c) and Z (c) is minimized. In particular, we aim to solve:

\[
\text{(P)} \quad \min_{f, \gamma} \; E\left[ \left( Z(c) - f - \sum_{j=1}^{n} \gamma_j (c_j - \mu_j) \right)^2 \right]
\]
It turns out that the solution to Problem (P) under the normality assumption of c is related to

the concept of persistency in a straightforward manner as shown in the following theorem.

Theorem 1 When c follows a multivariate normal distribution, a solution to Problem (P) is

\[
f^* = E[Z(c)], \qquad \gamma_l^* = E[x_l(c)], \quad l = 1, \ldots, n.
\]

This solution is unique when the covariance matrix Σ is positive definite.

The proof of Theorem 1 utilizes the following surprising covariance identity due to Stein,

and its proof is enclosed in Appendix A for completeness.

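Lemma 1 [Stein’s Identity] Let the random vector $X = (X_1, \ldots, X_n)^T$ be multivariate normally
distributed with mean vector $\mu$ and covariance matrix $\Sigma$. Let $h(x_1, \ldots, x_n) : \mathbb{R}^n \to \mathbb{R}$ be any function
such that $\partial h(x_1, \ldots, x_n)/\partial x_j$ exists almost everywhere and $E[|\partial h(X)/\partial x_j|] < \infty$,
$\forall j = 1, \ldots, n$, and denote $\nabla h(X) = (\partial h(X)/\partial x_1, \ldots, \partial h(X)/\partial x_n)^T$. Then
\[
\mathrm{Cov}(X, h(X)) = \Sigma \, E[\nabla h(X)].
\]
Specifically,
\[
\mathrm{Cov}\left(X_l, h(X_1, \ldots, X_n)\right) = \sum_{j=1}^{n} \mathrm{Cov}(X_l, X_j) \, E\left[ \frac{\partial}{\partial x_j} h(X_1, \ldots, X_n) \right], \quad \forall l = 1, \ldots, n.
\]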
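Stein's Identity can be checked numerically. The following is a minimal Monte Carlo sketch, assuming the test function $h(x) = \max_j x_j$, whose gradient is (almost everywhere) the indicator vector of the largest coordinate; the mean vector, covariance matrix and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 0.5, 1.0])
sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
x = rng.multivariate_normal(mu, sigma, size=400_000)

h = x.max(axis=1)                    # h(X) = max_j X_j
grad = np.eye(3)[x.argmax(axis=1)]   # gradient of h: indicator of the argmax

# Left-hand side: sample estimate of Cov(X, h(X)).
lhs = ((x - x.mean(axis=0)) * (h - h.mean())[:, None]).mean(axis=0)
# Right-hand side: Sigma times the sample estimate of E[grad h(X)].
rhs = sigma @ grad.mean(axis=0)
```

Up to sampling error, `lhs` and `rhs` should agree componentwise.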

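Proof of Theorem 1. The necessary and sufficient optimality conditions of Problem (P) are
\[
E\left[ Z(c) - f - \sum_{j=1}^{n} \gamma_j (c_j - \mu_j) \right] = 0, \quad \text{and}
\]
\[
E\left[ \left( Z(c) - f - \sum_{j=1}^{n} \gamma_j (c_j - \mu_j) \right) (c_l - \mu_l) \right] = 0, \quad \forall l = 1, \ldots, n.
\]
Hence, the optimal solution to (P), $(f^*, \gamma^*)$, should satisfy
\[
f^* = E[Z(c)], \quad \text{and}
\]
\[
E\left[ \left( Z(c) - E[Z(c)] - \sum_{j=1}^{n} \gamma_j^* (c_j - \mu_j) \right) (c_l - \mu_l) \right] = 0, \quad \forall l = 1, \ldots, n.
\]
Rearranging the second set of conditions, we get
\[
\mathrm{Cov}(c_l, Z(c)) = \sum_{j=1}^{n} \gamma_j^* \, \mathrm{Cov}(c_l, c_j), \quad \forall l = 1, \ldots, n. \tag{4}
\]
Observe that
\[
E\left[ \frac{\partial Z(c)}{\partial c_l} \right] = E\left[ \frac{\partial}{\partial c_l} \sum_{j=1}^{n} c_j x_j(c) \right] = E[x_l(c)], \quad \forall l = 1, \ldots, n,
\]
since $\partial x_j(c)/\partial c_l$ exists almost everywhere and equals zero whenever it exists for all $j, l = 1, \ldots, n$.
Applying Stein's Identity (Lemma 1) with $h = Z$ then gives
$\mathrm{Cov}(c_l, Z(c)) = \sum_{j=1}^{n} \mathrm{Cov}(c_l, c_j) E[x_j(c)]$ for all $l$.
Thus, we get $\gamma_l^* = E[x_l(c)]$, $l = 1, \ldots, n$, as one solution to Equation (4), which is also
unique if the covariance matrix $\Sigma$ is positive definite. Therefore, the proof is completed.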

With Theorem 1, the problem of finding the best normal approximation to the distribution

of Z(c) is transformed into computing the persistency in Problem (1) as well as E[Z(c)]. In the

next section, we present another application of our method in the quadratic regret minimization

problem using the portfolio management problem as an example.
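Theorem 1 can be illustrated by simulation. The sketch below is not part of the computational study: it assumes a hypothetical project with two parallel paths (arcs a, c and arcs b, d), hypothetical moments, and normal durations (which may take negative values in this toy setting). It compares the least-squares coefficients of Z(c) on c − µ, i.e., the sample version of Problem (P), with the simulated persistency values.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([4.0, 3.0, 2.0, 2.5])          # mean durations of arcs a, b, c, d
sigma = np.array([[1.0, 0.2, 0.0, 0.0],
                  [0.2, 4.0, 0.0, 0.5],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.0, 0.25]])
paths = np.array([[1, 0, 1, 0],               # path using arcs a and c
                  [0, 1, 0, 1]])              # path using arcs b and d

c = rng.multivariate_normal(mu, sigma, size=200_000)
lengths = c @ paths.T                         # length of each path for each sample
z = lengths.max(axis=1)                       # Z(c): longest path length
x_opt = paths[lengths.argmax(axis=1)]         # optimal solution x(c) for each sample

persistency = x_opt.mean(axis=0)              # Monte Carlo estimate of E[x(c)]

# Least-squares fit of Z on [1, c - mu]: the sample analogue of Problem (P).
design = np.column_stack([np.ones(len(c)), c - mu])
coef, *_ = np.linalg.lstsq(design, z, rcond=None)
f_hat, gamma_hat = coef[0], coef[1:]
```

Up to sampling error, `gamma_hat` should match `persistency` and `f_hat` should match the sample mean of `z`, as the theorem predicts.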

4 Regret in Portfolio Management

The portfolio management problem deals with allocating a budget among various financial assets with

the aim to earn a good return while keeping the risk low. The pioneering work by Markowitz

(1952) proposed an optimization framework to balance the expected return and the associated

risk characterized by variance as follows:

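\[
\text{(M)} \quad \begin{array}{ll} \min & x^T \Sigma x \\ \text{s.t.} & \mu^T x \ge \tau \\ & Ax = b \\ & x \ge 0 \end{array}
\]
where $\mu$ and $\Sigma$ are the mean and covariance matrix of the random return vector $r$, respectively,
and $\tau$ is the target expected return. $\mu$ and $\Sigma$ are assumed to be known. The constraints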

Ax = b and x ≥ 0 describe the restrictions on portfolio selection. Note that the presence of

the constraints x ≥ 0 does not rule out short selling. Short sales of an asset can be modeled


by introducing short and long positions in each asset as separate variables and adjusting the

corresponding returns. However, we assume that the short selling is limited through the constraints
$Ax = b$ and $x \ge 0$. The simplest example is $\sum_{j=1}^{n} x_j = 1$ and $x \ge 0$. Next, Markowitz
introduced the notion of efficient portfolio to denote the portfolio with minimal variance for a
given level of target expected return, and the continuum of such portfolios forms a mean/variance

efficient frontier.

Building on the foundation laid by Markowitz, modern portfolio theory flourished. A

substream of the research has gained a lot of attention due to the fact that investors tend to

evaluate their choices against some target return distributions, i.e., benchmark returns. Such

models are generally referred to as tracking models (cf. Dembo & King (1992)). The investor

incurs a cost when the portfolio return falls below the benchmark. This cost is also called

“regret” if the benchmark is the investment return under the assumption that the investor has

the ability to predict future asset returns exactly. Precisely, regret of an investment strategy x

is defined as

\[
\mathrm{Regret}(x, r) := Z(r) - \sum_{j=1}^{n} r_j x_j,
\]
where $Z(r)$ represents the best possible investment return, i.e.,
\[
Z(r) := \max_{Ax = b, \; x \ge 0} \; \sum_{j=1}^{n} r_j x_j. \tag{5}
\]

However, if the portfolio is selected by simply minimizing the expected regret, the result is to

risk all the budget on the asset with the highest expected return (and short on the lowest one

as much as possible). A common approach is to minimize the quadratic regret. In other words,

we try to find a portfolio with a return distribution that best approximates the ideal return

distribution in the sense of expected squared deviation, i.e.,

\[
\text{(R)} \quad \min_{Ax = b, \; x \ge 0} \; E\left[ \left( Z(r) - \sum_{j=1}^{n} r_j x_j \right)^2 \right]
\]
Unlike Problem (P), we lose one degree of freedom in this distribution approximation problem,
i.e., in Problem (R), there is no equivalent of f as in Problem (P).

Interestingly, the regret minimization approach is related to another stream of online port-

folio selection research based on aggregate return maximization and the concept of universal

portfolio introduced by Cover (1991) (cf. Helmbold et al. (1998)). These portfolio models

usually apply the logarithm transform on the aggregate return and then use the Taylor series

expansion to approximate the objective function, e.g., $E[\log(1 + r^T x)] \approx E[r^T x - (r^T x)^2/2]$,
provided the return is sufficiently small. Then maximizing the approximated aggregate return
is the same as minimizing $E[(1 - r^T x)^2]$, i.e., using 1 as the benchmark when the return is sufficiently
small. This works well except when the return is high, so we change the benchmark from
1 to the theoretical optimal return. Another advantage of using the best return distribution as


the benchmark is the ease of defining the cost function. Because regret is always nonnegative,

we do not need an asymmetric cost function to penalize the downside risk only, which is more

difficult to optimize (cf. King & Jensen (1992)).
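The equivalence between maximizing the second-order Taylor approximation of the logarithmic return and tracking the benchmark 1 follows from completing the square. Writing $y = r^T x$ (a shorthand introduced here only for this remark),
\[
y - \tfrac{1}{2} y^2 = \tfrac{1}{2} - \tfrac{1}{2} (1 - y)^2
\quad \Longrightarrow \quad
E\left[ r^T x - \tfrac{1}{2} (r^T x)^2 \right] = \tfrac{1}{2} - \tfrac{1}{2} E\left[ (1 - r^T x)^2 \right],
\]
so maximizing the left-hand side over $x$ is the same as minimizing $E[(1 - r^T x)^2]$.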

Theorem 2 Under the assumption that r follows a multivariate normal distribution, Problem

(R) can be solved as a convex quadratic programming problem as follows:

\[
\text{(R}'\text{)} \quad \min_{Ax = b, \; x \ge 0} \; x^T \left( \Sigma + \mu \mu^T \right) x - 2 \left( \Sigma E[x(r)] + E[Z(r)] \mu \right)^T x
\]
where $E[x(r)]$ is the persistency value in Problem (5). Furthermore, when $\Sigma + \mu\mu^T$ is positive
definite, this problem can be solved as a second-order cone programming (SOCP) problem.

Proof. This theorem can be proved by expanding the objective function in Problem (R) and

then applying Stein’s Identity. Since it is a straightforward algebraic manipulation, we omit the

details.
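For the reader's convenience, the omitted manipulation can be sketched as follows; this is one way to fill in the algebra using only the definitions above and Stein's Identity. Expanding the square gives
\[
E\left[ \left( Z(r) - r^T x \right)^2 \right] = E\left[ Z(r)^2 \right] - 2 E\left[ Z(r) r \right]^T x + x^T E\left[ r r^T \right] x ,
\]
where $E[Z(r)^2]$ does not depend on $x$ and $E[r r^T] = \Sigma + \mu \mu^T$. Moreover,
\[
E[Z(r) r] = \mathrm{Cov}(r, Z(r)) + E[Z(r)] \mu = \Sigma E[\nabla Z(r)] + E[Z(r)] \mu = \Sigma E[x(r)] + E[Z(r)] \mu ,
\]
where the second equality is Stein's Identity with $h = Z$ and the third uses $\partial Z(r)/\partial r_l = x_l(r)$ almost everywhere, as in the proof of Theorem 1.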

Therefore, the problem of finding the minimum quadratic regret portfolio can also be solved

by computing the persistency values in Problem (5) as well as E[Z(r)]. Note that the regret

minimization model (i.e., Problem (R)) presented in this section is applicable to any general

problem of finding minimum quadratic regret solution to a (discrete) optimization problem, e.g.,

least squared regret shortest path (cf. Aissi et al. (2009)). Our result then says that such a problem
can be reduced to solving a class of (mixed integer) convex quadratic optimization problems.

It would be interesting to compare the formulation of Problem (R) with the Markowitz mod-

el. Rather than (M), it has been shown by King & Jensen (1992) that the efficient mean/variance

frontier can be drawn by solving the following special case of a tracking problem:

\[
\text{(M}'\text{)} \quad \min_{Ax = b, \; x \ge 0} \; E\left[ \left( \sum_{j=1}^{n} r_j x_j - t \right)^2 \right]
\]
as the parameter t varies between certain bounds depending on the value of the Lagrange
multipliers for the target expected return constraints at the extremes of the frontier. They also
showed that this problem is equivalent to the following problem:
\[
\text{(M}''\text{)} \quad \min_{Ax = b, \; x \ge 0} \; x^T \Sigma x - 2 \lambda \mu^T x
\]
as the parameter $\lambda$ varies from 0 to $+\infty$. Interestingly, apart from the parameter adjusted according
to the target expected return, the objective coefficients in (M′′), $\Sigma$ and $\mu$, are replaced by
$E[r r^T]$ and $E[Z(r) r^T]$ in (R), respectively. Hopefully, the increased difficulty of (R) due
to the requirement of estimating $E[Z(r)]$ and $E[Z(r) r^T]$ may pay off as a better portfolio
selection strategy. To demonstrate the difference in the portfolio construction, we present

the analytical solutions to Problem (R) and (M) when there are only two assets available to

construct the portfolio in the following example.


Example 1 Suppose that there are two financial assets with random returns $r_1$ and $r_2$. Moreover,
$E[r_j] = \mu_j$, $\mathrm{Var}(r_j) = \sigma_j^2$, $j = 1, 2$, and $\mathrm{Cov}(r_1, r_2) = \sigma_{12}$ are known. Assume that the
only constraint encoded by $Ax = b$ is $x_1 + x_2 = 1$. Then the KKT conditions of Problem (R) are
\[
\begin{aligned}
2 E[r_1^2] x_1 + 2 E[r_1 r_2] x_2 + \alpha - 2 E[Z(r) r_1] - \beta_1 &= 0 \\
2 E[r_2 r_1] x_1 + 2 E[r_2^2] x_2 + \alpha - 2 E[Z(r) r_2] - \beta_2 &= 0 \\
x_1 + x_2 &= 1 \\
x_1 \beta_1 = 0, \quad x_2 \beta_2 &= 0 \\
x_1, x_2, \beta_1, \beta_2 &\ge 0
\end{aligned}
\]
Note that in this case $Z(r) = \max\{r_1, r_2\} = r_2 + (r_1 - r_2)^+$, where $(y)^+ := \max\{y, 0\}$. Hence,
with a little algebra, the solution to the above system of equations can be expressed as
\[
\begin{aligned}
x_1 &= \frac{E\left[ (r_1 - r_2)^2 I_{\{r_1 \ge r_2\}} \right]}{E\left[ (r_1 - r_2)^2 \right]} = \frac{E\left[ (r_1 - r_2)^2 \mid r_1 \ge r_2 \right]}{E\left[ (r_1 - r_2)^2 \right]} \, P(r_1 \ge r_2), \\
x_2 &= \frac{E\left[ (r_1 - r_2)^2 I_{\{r_1 < r_2\}} \right]}{E\left[ (r_1 - r_2)^2 \right]} = \frac{E\left[ (r_1 - r_2)^2 \mid r_1 < r_2 \right]}{E\left[ (r_1 - r_2)^2 \right]} \, P(r_1 < r_2),
\end{aligned}
\]
where $I$ denotes the indicator function. As discussed before, the solution in this case is not
simply the persistency values, i.e., $P(r_1 \ge r_2)$ and $P(r_1 < r_2)$, but rather the persistency tilted
by some factors related to the squared deviation of the two random returns. Please note that we
do not involve Stein’s Identity in this example due to the simplicity of the problem.
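The closed-form weights of the two-asset regret problem can be checked by simulation. The sketch below assumes hypothetical return moments and compares the formula for $x_1$ with a direct grid search over the sample quadratic regret; none of the numbers come from the computational study.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.05, 0.08])
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
r = rng.multivariate_normal(mu, sigma, size=100_000)

d = r[:, 0] - r[:, 1]                                  # r1 - r2
x1_closed = np.mean(d**2 * (d >= 0)) / np.mean(d**2)   # closed-form weight of asset 1
x2_closed = np.mean(d**2 * (d < 0)) / np.mean(d**2)    # closed-form weight of asset 2

# Direct check: minimize the sample quadratic regret over x1 on a grid (x2 = 1 - x1).
z = r.max(axis=1)                                      # Z(r) = max{r1, r2}
grid = np.linspace(0.0, 1.0, 201)
regret = [np.mean((z - x1 * r[:, 0] - (1 - x1) * r[:, 1]) ** 2) for x1 in grid]
x1_grid = grid[int(np.argmin(regret))]

p1 = np.mean(d >= 0)                                   # persistency P(r1 >= r2)
```

Comparing `x1_closed` with `p1` shows how far the regret-optimal weight is tilted away from the plain persistency value for the chosen moments.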

On the other hand, in the Markowitz model (M), let the expected return of the minimum

variance portfolio computed from (M) without the expected return constraint be τ∗. It can be

verified that as long as the target return τ ≥ τ∗ and the covariance matrix Σ is positive definite,

the expected return constraint would be tight. Although one can justify this claim through the

convex analysis, we offer an alternative argument based on the KKT conditions of Problem (M):

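\[
\begin{aligned}
2 \sigma_1^2 x_1 + 2 \sigma_{12} x_2 - \mu_1 \theta + \alpha - \beta_1 &= 0 \\
2 \sigma_2^2 x_2 + 2 \sigma_{12} x_1 - \mu_2 \theta + \alpha - \beta_2 &= 0 \\
x_1 + x_2 &= 1 \\
\theta (\tau - \mu_1 x_1 - \mu_2 x_2) &= 0 \\
\tau - \mu_1 x_1 - \mu_2 x_2 &\le 0 \\
x_1 \beta_1 = 0, \quad x_2 \beta_2 &= 0 \\
x_1, x_2, \beta_1, \beta_2 &\ge 0
\end{aligned}
\]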

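Assume that the constraint $\mu_1 x_1 + \mu_2 x_2 \ge \tau$ is not tight, i.e., $\tau - \mu_1 x_1 - \mu_2 x_2 < 0$. Then $\theta = 0$,
and we can obtain
\[
(x_1, x_2) =
\begin{cases}
\left( \dfrac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 - 2\sigma_{12} + \sigma_2^2}, \; \dfrac{\sigma_1^2 - \sigma_{12}}{\sigma_1^2 - 2\sigma_{12} + \sigma_2^2} \right)
= \left( \dfrac{\sigma_2^2 - \sigma_{12}}{\mathrm{Var}(r_1 - r_2)}, \; \dfrac{\sigma_1^2 - \sigma_{12}}{\mathrm{Var}(r_1 - r_2)} \right), & \text{if } \sigma_1^2 \ge \sigma_{12} \text{ and } \sigma_2^2 \ge \sigma_{12}, \\[2ex]
(0, 1), & \text{if } \sigma_2^2 < \sigma_{12}, \\[1ex]
(1, 0), & \text{if } \sigma_1^2 < \sigma_{12},
\end{cases}
\]
which exactly resembles the minimum variance portfolio, i.e., the solution to the following problem:
\[
\begin{array}{ll} \min & \sigma_1^2 x_1^2 + 2 \sigma_{12} x_1 x_2 + \sigma_2^2 x_2^2 \\ \text{s.t.} & x_1 + x_2 = 1 \\ & x_1, x_2 \ge 0 \end{array}
\]
Then the condition $\tau - \mu_1 x_1 - \mu_2 x_2 < 0$ is just $\tau < \tau^*$. Note that this solution makes sense
only if
\[
\mathrm{Var}(r_1 - r_2) \neq 0. \tag{6}
\]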

In another situation when the constraint µ1x1 + µ2x2 ≥ τ is tight, together with the budget

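condition $x_1 + x_2 = 1$, we can easily find the optimal portfolio:
\[
x_1 = \frac{\tau - \mu_2}{\mu_1 - \mu_2}, \qquad x_2 = \frac{\mu_1 - \tau}{\mu_1 - \mu_2},
\]
which is valid under the assumption that
\[
\tau \in [\mu_1, \mu_2] \text{ or } [\mu_2, \mu_1], \quad \text{and} \quad \mu_1 \neq \mu_2. \tag{7}
\]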

Those situations when some of the conditions in (6) and (7) do not hold correspond to the

extreme cases that are trivial to analyze. For example, $\mathrm{Var}(r_1 - r_2) = 0$ implies the two
assets are perfectly correlated and their variances are the same, i.e., $\sigma_{12} = \sigma_1 \sigma_2$ and $\sigma_1 = \sigma_2$.
Consequently, the covariance matrix is not positive definite. In this case, the variance of the
portfolio would be constant and equal to $\sigma_1^2$. Then the only concern is to meet the target return

requirement.
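The two regimes of the two-asset Markowitz model can be summarized in a short sketch. The function names and the numerical inputs in the usage lines are hypothetical; the first function covers the case where the target return constraint is not tight, the second the case where it is tight (so that $x_2 = 1 - x_1$ follows from the budget constraint).

```python
def min_variance_two_assets(var1, var2, cov12):
    """Minimum-variance weights (x1, x2) with x1 + x2 = 1 and x1, x2 >= 0."""
    spread = var1 - 2.0 * cov12 + var2   # Var(r1 - r2), must be nonzero (condition (6))
    if spread <= 0.0:
        raise ValueError("Var(r1 - r2) must be positive")
    if var2 < cov12:                     # unconstrained x1 would be negative
        return 0.0, 1.0
    if var1 < cov12:                     # unconstrained x2 would be negative
        return 1.0, 0.0
    return (var2 - cov12) / spread, (var1 - cov12) / spread

def tight_return_portfolio(mu1, mu2, tau):
    """Weights when the target-return constraint holds with equality (condition (7))."""
    if mu1 == mu2:
        raise ValueError("mu1 and mu2 must differ")
    if not min(mu1, mu2) <= tau <= max(mu1, mu2):
        raise ValueError("tau must lie between mu1 and mu2")
    x1 = (tau - mu2) / (mu1 - mu2)
    return x1, 1.0 - x1

weights_min_var = min_variance_two_assets(0.04, 0.09, 0.01)
weights_tight = tight_return_portfolio(0.05, 0.08, 0.06)
```

Note that neither function uses any distributional information beyond the first two moments, which is the separation property discussed in the text.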

One salient feature of the Markowitz model is the separation on the treatment of mean and

variance of the portfolios. Especially in this example with only two assets, the target expected

return constraint and the budget constraint are enough to determine the optimal portfolio under

certain situations as discussed above. In contrast, the quadratic regret approach is able to capture

more distributional information of the random returns through an integrated framework. The

detailed performance of these two models will be analyzed later in Section 5.3.

5 Computational Study

In this section, we first review some approaches for persistency estimation and then demonstrate

the performance of our approximation model through two classes of problems, i.e., project and

portfolio management problems. Simulation is conducted and serves as a benchmark in all the

experiments, which is possible because the deterministic versions of Problem (1) considered in

this section are all polynomial time solvable. If the deterministic problem is NP-hard, imple-

menting a simulation method would require the solution to a number of NP-hard problems,

which can be very resource consuming.


5.1 Persistency Models

As analyzed in the previous section, the success of our approximation method hinges on the

accuracy of the estimation on the expected objective value of the random optimization problem

as well as the persistency values.

In the literature, the problem of estimating the expected objective value of random optimization
problems has been studied for a long time. In the case of the project management problem, the
search for the expected project completion time started half a century ago (cf. Fulkerson (1962))

and is still an active research topic (cf. Yao & Chu (2007)). On the other hand, the persistency

problem has only been brought into the optimization area since Bertsimas et al. (2006). As

reviewed before, there have been several models developed for the purpose of estimating the per-

sistency values (cf. Natarajan et al. (2009), Mishra et al. (2011), Natarajan et al. (2011), Kong

et al. (2011) etc.). In this section, we will review the most recent progress on the persistency

estimation mainly contributed by Natarajan et al. (2011).

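Natarajan et al. (2011) consider the following stochastic optimization problem:
\[
Z_P := \sup_{c \sim (\mu, \Sigma)^+} E[Z(c)],
\]
where $c \sim (\mu, \Sigma)^+$ means that the set of distributions of the random coefficient vector c (assumed
to be nonempty) is defined by the nonnegative support $\mathbb{R}^n_+$, finite mean vector $\mu$ and
finite second moment matrix $\Sigma$, i.e., $c \in \{X : E[X] = \mu, \; E[XX^T] = \Sigma, \; P(X \ge 0) = 1\}$. They
proved that $Z_P$ can be solved as the following convex conic optimization problem:
\[
\begin{array}{rll}
Z_C = & \max & \displaystyle\sum_{j=1}^{n} Y_{j,j} \\
& \text{s.t.} & a_i^T X a_i - 2 b_i a_i^T x + b_i^2 = 0, \quad \forall i = 1, \ldots, m \\
& & X_{j,j} = x_j, \quad \forall j \in B \\
& & \begin{pmatrix} 1 & \mu^T & x^T \\ \mu & \Sigma & Y^T \\ x & Y & X \end{pmatrix} \succeq_{cp} 0
\end{array}
\]
(i.e., $Z_P = Z_C$) where the decision variables are $x \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times n}$, and $Y \in \mathbb{R}^{n \times n}$, and for
a matrix $A \in \mathbb{R}^{n \times n}$, $A \succeq_{cp} 0$ means that A lies in the cone of completely positive matrices of
dimension n defined as
\[
CP_n := \left\{ A \in \mathbb{R}^{n \times n} \;\middle|\; \exists V \in \mathbb{R}^{n \times k}_+ \text{ such that } A = V V^T \right\}.
\]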

A linear program over the convex cone of completely positive matrices is called a completely
positive program (CPP), and $Z_C$ is a typical CPP. The authors named this model
CPCMM, which stands for Completely Positive Cross Moment Model. In a subsequent paper,

Kong et al. (2011) re-developed CPCMM from a different perspective based on the general conic

optimization theory, which significantly generalizes CPCMM such that more support constraints

can be incorporated, e.g., some ellipsoid bounding constraints on the random vector.

In the formulation of $Z_C$, the variables x, Y and X attempt to encode the information
$x_j = E[x_j(c)]$, $Y_{i,j} = E[c_j x_i(c)]$ and $X_{i,j} = E[x_i(c) x_j(c)]$. Thus, through solving $Z_C$, the
optimal objective value gives the value of $E[Z(c)]$, and the optimal value of x is simply the

persistency value. However, CPCMM ignores the distributional information. Hence, when

c is normally distributed, CPCMM only gives an upper bound to E [Z(c)] and estimates on

the persistency values. Another issue is that although the completely positive cone is closed,

convex and pointed, it is widely believed that CPPs are NP-hard to solve. Fortunately, there

are various hierarchies of tractable approximations for the completely positive cone (cf. Bomze

et al. (2000), Parrilo (2000) and Klerk et al. (2002) etc.). In the computational study, when

we need to solve CPCMM, we will exploit the simple SDP approximation of the completely

positive constraint, i.e., A ≽cp 0 is relaxed to A ≽ 0 and A ≥ 0, where A ≽ 0 means that A is

positive semidefinite.
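The relaxation replaces membership in the completely positive cone by two conditions that are easy to verify. As a minimal sketch (the helper name `is_doubly_nonnegative`, the tolerance and the test matrices are hypothetical), the following checks that a matrix built as $VV^T$ with $V \ge 0$ satisfies both relaxed conditions, while a positive semidefinite matrix with a negative entry does not.

```python
import numpy as np

def is_doubly_nonnegative(M, tol=1e-9):
    """Check the relaxed conditions: M symmetric positive semidefinite and M >= 0 entrywise."""
    symmetric = np.allclose(M, M.T)
    psd = np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol
    nonneg = (M >= -tol).all()
    return bool(symmetric and psd and nonneg)

rng = np.random.default_rng(4)
V = rng.uniform(0.0, 1.0, size=(5, 8))   # entrywise nonnegative factor
A = V @ V.T                              # completely positive by construction

B = np.array([[1.0, -0.5],
              [-0.5, 1.0]])              # positive semidefinite, but has negative entries

a_ok = is_doubly_nonnegative(A)
b_ok = is_doubly_nonnegative(B)
```

Every completely positive matrix passes this check, but the converse does not hold in general, which is why the resulting SDP is only an approximation of CPCMM.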

Despite all these numerical inaccuracies, we show in the next section that our approximation

method is still practically attractive due to the use of persistency in the approximation and the

ability to capture correlations through CPCMM.

5.2 Project Management Problem

In this section, we present the results for approximating the distribution of the project com-

pletion time when the activity durations are stochastic. We try to compare our approach with

as many existing approximation methods as possible, except some approaches that require the

activity duration distribution to be either discrete or at least represented in some discrete form,

e.g., bounding distribution method by Kleindorfer (1971), and the finite-state discrete approx-

imation with cascading collapse method by Ord (1991)2, etc.

Another critical issue is that almost all the approximated distributions derived using previous

methods do not reside in the same probability space as Z(c), which makes the computation of the

squared deviation impossible. This problem arises since the traditional approaches solely focus

on the distribution (like tail probabilities, etc.) but overlook the approximation error between

the approximated completion time and the true completion time under a specific realization of

the random activity durations. For example, Cox (1995) assumed the project completion time

to be normally distributed at first, and then tried to estimate the moments of the completion

time. Thus the final approximated distribution does not admit a calculation of the squared

deviation from the true project completion time distribution. Hence, we have to resort to other

measures to compare the performance of different approximation methods, e.g., some descriptive

statistics, like mean and standard deviation. In addition, we also deploy the following measure

to quantify the distance between two distributions:

$$\text{Square Norm Distance}(F, G) := \int_0^1 \left[ F^{-1}(y) - G^{-1}(y) \right]^2 \, dy$$

where F and G are the cumulative distribution functions of two distributions.
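The measure above can be evaluated numerically once the two quantile functions are available. The following is a minimal sketch, not the procedure used in the paper: the function name, the midpoint rule, and the two normal distributions in the example are assumptions made only for illustration. For two normal distributions the integral reduces to (μ1 − μ2)² + (σ1 − σ2)², which gives a convenient check.

```python
from statistics import NormalDist


def square_norm_distance(inv_cdf_f, inv_cdf_g, n=20000):
    """Midpoint-rule estimate of the integral over (0, 1) of (F^-1(y) - G^-1(y))^2."""
    total = 0.0
    for k in range(n):
        # Midpoints avoid y = 0 and y = 1, where quantiles can be infinite.
        y = (k + 0.5) / n
        diff = inv_cdf_f(y) - inv_cdf_g(y)
        total += diff * diff
    return total / n


# Illustrative inputs only (not taken from the tables in the paper).
f = NormalDist(mu=3.0, sigma=1.5)
g = NormalDist(mu=3.5, sigma=1.2)

numeric = square_norm_distance(f.inv_cdf, g.inv_cdf)
# Closed form that holds when both distributions are normal.
closed_form = (f.mean - g.mean) ** 2 + (f.stdev - g.stdev) ** 2
print(numeric, closed_form)
```

When only samples are available, the same function can be used with empirical quantile functions in place of `inv_cdf`.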

² Although the approach developed in Ord (1991) is based on the finite-state discrete characterization of all the activity durations as well as the completion time of each milestone, it could still be applied to the case when the activity durations are continuously distributed. The only concern is to find an appropriate discrete approximation of each activity duration distribution.


On the other hand, the PERT approach simply considers the expected duration of each activity when choosing the critical path (i.e., the path with the longest expected completion time), and then uses the mean and variance of this critical path to approximate the mean and variance of the project completion time, respectively. Finally, resorting to the Central Limit Theorem, PERT assumes that the project completion time is normally distributed (cf. MacCrimmon & Ryavec (1964)). Therefore, when the project activities follow a multivariate normal distribution, the PERT estimation admits the computation of the squared deviation and thus can be compared to our approach in greater detail.
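The PERT procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: activities are independent, nodes are numbered in topological order, and the four-node, five-arc network with unit means and variances in the demonstration is hypothetical input chosen for the example. All function and variable names are likewise illustrative.

```python
import math


def pert_normal_approximation(num_nodes, arcs, means, variances):
    """Return (critical_path_arc_indices, mean, std) for the PERT approximation.

    arcs: list of (tail, head) pairs with tail < head; nodes are numbered
    1..num_nodes in topological order, node 1 is the start and node
    num_nodes is the end. Activities are assumed to be independent.
    """
    best = {1: 0.0}  # longest expected length from node 1 to each node
    pred = {}        # index of the arc used to reach each node on that path
    # Processing arcs by increasing tail guarantees that best[tail] is final.
    for a in sorted(range(len(arcs)), key=lambda idx: arcs[idx][0]):
        tail, head = arcs[a]
        if tail not in best:
            continue  # tail is not reachable from the start node
        candidate = best[tail] + means[a]
        if head not in best or candidate > best[head]:
            best[head] = candidate
            pred[head] = a
    # Walk back from the end node to recover the critical path.
    path = []
    node = num_nodes
    while node != 1:
        a = pred[node]
        path.append(a)
        node = arcs[a][0]
    path.reverse()
    mean = sum(means[a] for a in path)
    variance = sum(variances[a] for a in path)  # valid for independent activities
    return path, mean, math.sqrt(variance)


# Hypothetical bridge-shaped network: arcs 1->2, 1->3, 2->3, 2->4, 3->4.
arcs = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
path, mean, std = pert_normal_approximation(4, arcs, [1.0] * 5, [1.0] * 5)
print(path, mean, round(std, 3))
```

For this input the critical path is 1 → 2 → 3 → 4, so the approximation is a normal distribution with mean 3 and standard deviation √3.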

The most important advantage of our approximation approach using CPCMM is its ability

to capture the correlations among the random variables. To the best of our knowledge, none of

the previous studies addresses the issue of correlated activities for the project management

problem. However, positive correlations among the activity durations may exist in many

project networks. For example, if some troubles are encountered in the delivery of concrete

for a task in a construction project, this problem is likely to influence the duration of numer-

ous activities involving concrete pours in this project. With the following example network,

besides comparing the performance of different approximation methods, we also illustrate the

importance of considering the correlations among activities.

Example 2 The network consists of four nodes and five arcs as shown in Figure 1. All the

activities are independent and normally distributed with mean and variance both equal to one.

Figure 1: Network for Example 2
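As a hedged illustration of how the simulation benchmark for such a network can be produced, the sketch below draws independent N(1, 1) durations and records the longest path. The arc layout (1→2, 1→3, 2→3, 2→4, 3→4) is an assumption, since the figure is not reproduced here; it is the usual bridge layout and is consistent with the critical path 1 → 2 → 3 → 4 mentioned later in the text. The sample size, seed and names are also illustrative.

```python
import math
import random


def simulate_bridge(num_samples=200_000, seed=0):
    """Monte Carlo estimates of E[T], sigma(T) and PERT's expected squared deviation."""
    rng = random.Random(seed)
    total = total_sq = total_dev_sq = 0.0
    for _ in range(num_samples):
        # Independent N(1, 1) durations for arcs 1->2, 1->3, 2->3, 2->4, 3->4.
        c12, c13, c23, c24, c34 = (rng.gauss(1.0, 1.0) for _ in range(5))
        critical = c12 + c23 + c34               # length of path 1->2->3->4
        t = max(c12 + c24, critical, c13 + c34)  # completion time = longest path
        total += t
        total_sq += t * t
        # PERT approximates T by the length of the critical path.
        total_dev_sq += (t - critical) ** 2
    mean = total / num_samples
    std = math.sqrt(total_sq / num_samples - mean * mean)
    return mean, std, total_dev_sq / num_samples


mean_t, std_t, pert_esd = simulate_bridge()
print(round(mean_t, 3), round(std_t, 3), round(pert_esd, 3))
```

Under these assumptions the estimates can be compared with the simulation row and the PERT row of Table 1; the exact digits depend on the seed and the sample size.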

The network in Example 2 is the “Wheatstone bridge” network from Lindsey (1972), later regarded as the “forbidden graph” by Dodin (1985) since it is the basic evidence of graph irreducibility. This network has been studied in almost every piece of research in this field. Ord (1991) summarized the results for this graph with normally distributed activity durations, and also provided the results from his discrete approximation method with a parameter k indicating the number of discrete points used to approximate the normal distribution³. All these results are presented in Table 1, where T denotes the project completion time and σ(·) denotes its standard deviation. The new result from our method is also presented in Table 1 under “CPCMM”, since we use CPCMM to find the estimates of the expected project completion time and the persistency of the arcs.

³ Note that the approximated distribution obtained by Ord (1991) should be a discrete distribution as well. However, we modified his theory slightly in computing the square norm distance by assuming that the final approximated distribution follows a normal distribution with the moments derived from his original procedure.

Method                   E[T]    σ(T)   Error on σ(T)   Expected Squared Deviation   Square Norm Distance
10^6 simulation rounds   3.516   1.39   -               -                            -
Numerical integration    3.483   1.47   5.76%           -                            0.017
Ord (1991), k = 2        3.261   0.70   49.64%          -                            0.543
Ord (1991), k = 3        3.485   1.04   25.18%          -                            0.128
Ord (1991), k = 4        3.525   1.08   22.32%          -                            0.101
Ord (1991), k = 5        3.582   1.15   17.27%          -                            0.068
Ord (1991), k = 6        3.594   1.15   17.27%          -                            0.069
Cox (1995)               3.639   1.69   21.58%          -                            0.116
PERT                     3.000   1.73   24.46%          0.973                        0.395
CPCMM                    3.918   1.30   6.47%           0.473                        0.176
Combo                    3.525   1.30   6.47%           0.311                        0.015

Table 1: Results for Example 2

From Table 1, we can see that, as predicted, CPCMM gives only an upper bound on the expected completion time, but it produces the best estimate of the standard deviation apart from the numerical integration approach. Despite its high accuracy, the integration approach would be too tedious to be applicable even for medium-size networks. This suggests that using persistency could be a promising way to estimate the variability in the project completion time. Recall that in our approximation model, the variance is solely determined by the persistency values (i.e., γj’s in Equation (3)).

Now we propose a better way to utilize our approximation model. Thanks to the flexibility of the model, we can use E[T ] estimated from another approach together with the persistency estimated from CPCMM to form a new approximation using Equation (3). Since there are many more well-established results on estimating the expected project completion time, they provide us with a rich pool of resources to construct better approximations. The last row of Table 1 showcases such an example, named the “Combo” distribution, which is constructed using E[T ] estimated from “Ord (1991) k = 4” and the persistency estimated from CPCMM. The new distribution’s square norm distance from the optimal distribution drops even lower than that of the numerical integration approach! Figure 2 plots the density and cumulative distribution functions of different approximations together with the simulation results.
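The construction can be sketched in a few lines. Equation (3) is not reproduced in this section, so the sketch assumes that it has the linear form E[Z] + Σj γj (cj − μj), whose variance is γᵀΣγ. Under that assumption any estimate of the mean can be paired with any estimate of the persistency. The function name and the persistency values in the second call are hypothetical and serve only to show the mechanics.

```python
import math


def normal_from_persistency(expected_value, gamma, cov):
    """Mean and standard deviation of E[Z] + sum_j gamma_j * (c_j - mu_j).

    Assumed form of the approximation: the mean is supplied by the caller
    and the variance is gamma' * cov * gamma.
    """
    n = len(gamma)
    variance = sum(gamma[i] * cov[i][j] * gamma[j]
                   for i in range(n) for j in range(n))
    return expected_value, math.sqrt(variance)


# Five independent activities with unit variance.
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

# Putting weight 1 on three arcs of a single path and 0 elsewhere gives the
# variance of that path, which is what PERT uses for its critical path.
pert_mean, pert_std = normal_from_persistency(3.0, [1, 0, 1, 0, 1], identity)

# Hypothetical persistency values combined with a mean taken from elsewhere.
gamma = [0.75, 0.25, 0.50, 0.25, 0.75]
combo_mean, combo_std = normal_from_persistency(3.525, gamma, identity)
print(round(pert_std, 3), round(combo_std, 3))
```

The design choice worth noting is that the mean and the persistency enter separately, which is what allows a “Combo”-style approximation to borrow each ingredient from the method that estimates it best.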

To demonstrate the importance of having correlated activities, let us consider a simple

variation of Example 2.

Example 3 Consider the same network as Example 2 with the same mean and variance for

each activity, but the activities are correlated with each other. For analysis purposes, consider two

types of randomly generated correlation: one is the general correlation without any restriction,

and the other is the sparse and positive correlation, which is more natural in a project network.

Note that all the previous methods except PERT give the same approximations since they ignore correlations among activities. The approximated distribution from PERT is simply the distribution of the critical path based on the expected activity completion time, i.e., Path 1 → 2 → 3 → 4.

[Figure 2, two panels: “Density of the Project Completion Time and Its Approximations” and “Distribution of the Project Completion Time and Its Approximations”; horizontal axis: Project Completion Time; vertical axis of the second panel: Probability; curves: Simulation, PERT, CPCMM, Combo.]

Figure 2: Distributions for Example 2

Table 2 summarizes the results for ten instances of Example 3, with the first five instances corresponding to the networks with generally correlated activities and the last five instances corresponding to the networks with sparsely and positively correlated activities. All the correlation matrices are randomly generated for analysis. We have conducted this experiment for hundreds of random correlation matrices, and the findings are the same as those discussed here. Thus, we only report ten instances in this paper for succinctness. The sample size for all the simulations is 10^6.

#    Approximation Method   E[T]    σ(T)    Expected Squared Deviation
1    Simulation             3.489   1.454   -
     PERT                   3.000   1.802   0.882
     CPCMM                  3.889   1.374   0.452
2    Simulation             3.776   1.646   -
     PERT                   3.000   2.239   1.902
     CPCMM                  4.288   1.502   0.797
3    Simulation             3.584   1.533   -
     PERT                   3.000   1.940   1.162
     CPCMM                  4.006   1.432   0.553
4    Simulation             3.474   1.473   -
     PERT                   3.000   1.819   0.829
     CPCMM                  [values missing in source]
5    Simulation             [values missing in source]
     PERT                   3.000   1.928   1.130
     CPCMM                  3.994   1.458   0.537
6    Simulation             3.331   1.813   -
     PERT                   3.000   2.154   0.486
     CPCMM                  3.663   1.786   0.288
7    Simulation             3.449   1.774   -
     PERT                   3.000   2.099   0.782
     CPCMM                  [values missing in source]
8    Simulation             [values missing in source]
     PERT                   3.000   2.369   1.029
     CPCMM                  3.844   1.854   0.437
9    Simulation             3.316   1.959   -
     PERT                   3.000   2.270   0.575
     CPCMM                  3.530   1.937   0.272
10   Simulation             3.630   2.177   0
     PERT                   3.000   2.836   1.530
     CPCMM                  4.040   2.117   0.589

(Simulation Results for Uncorrelated Case: E[T ] = 3.516, σ(T ) = 1.390)

Table 2: Results for Example 3

It is observed that both the mean and the variance of the project completion time can differ substantially from the independent case, especially the variance. The variance estimates from CPCMM again agree closely with the true variances. For the sparse and positive correlation case, CPCMM works even better, with an average relative error of less than 3%. When the activities are positively correlated, PERT tends to substantially overestimate the variance of the project completion time, and the expected squared deviation of our method is significantly lower than that of PERT, with a reduction of around 50%. Figure 3 showcases the approximated distribution for one instance

randomly picked with the following correlation matrix,

$$\begin{pmatrix} 1 & 0.148 & 0.738 & 0 & 0.805 \\ 0.148 & 1 & 0 & 0 & 0.029 \\ 0.738 & 0 & 1 & 0 & 0.980 \\ 0 & 0 & 0 & 1 & 0 \\ 0.805 & 0 & 0.980 & 0 & 1 \end{pmatrix},$$

which is instance #10 in Table 2.
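As a worked check of how correlation enters the PERT estimate, assume (the indexing is not stated in the text) that activities 1, 3 and 5 of the matrix are the arcs of the critical path 1 → 2 → 3 → 4, and that each activity has unit variance as in Example 2. The variance of the path length is then the sum of the corresponding correlation entries:

```latex
\sigma_{\mathrm{PERT}}^2
  = \sum_{i,j \in \{1,3,5\}} \rho_{ij}
  = 3 + 2\,(0.738 + 0.805 + 0.980)
  = 8.046,
\qquad
\sigma_{\mathrm{PERT}} = \sqrt{8.046} \approx 2.837 .
```

Under this assumption the result agrees, up to rounding of the matrix entries, with the PERT standard deviation of 2.836 reported for instance #10 in Table 2, and it shows why positive correlation along the critical path inflates the PERT variance relative to the independent value of 3.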

[Figure 3, two panels: “Density of the Project Completion Time and Its Approximations” and “Distribution of the Project Completion Time and Its Approximations”; vertical axis of the second panel: Probability; curves: Simulation, PERT, CPCMM.]

Figure 3: Distributions for Example 3 Instance #10

To further justify the robustness of our model, we also test our approach on a series of

random project networks of larger sizes, and compare the results with PERT.

Example 4 Consider the random project network generated by the following algorithm.

Algorithm of Random Project Network Generation:

1. Randomly set the number of nodes (n′) in the project network.

2. Construct a zero adjacency matrix. Go through every matrix entry in the upper triangle (above the diagonal), and replace 0 by 1 if an independent realization of a uniform random variable U(0, 1) is less than s, where s ∈ [0, 1]. The parameter s controls the density of the graph. For example, when s = 1, we get a complete graph. More precisely, after this step, the random network has an expected number of arcs E[m′] = s·n′(n′−1)/2, and each node has s(n′ − 1) neighbours in expectation. We set s = 0.2 in our experiments.

3. Remove all the isolated nodes in the network.

4. Create an initial node s. For each node i, add an arc s → i, if node i has no incoming

arcs.


5. Create a terminal node t. For each node i, add an arc i → t, if node i has no outgoing

arcs. After this step, the structure of the network is fixed. Denote the number of nodes as

n and the number of arcs as m.

6. For arc i, generate the random arc length with mean µi uniformly drawn between 1 and

10, and standard deviations uniformly drawn from 0 to 0.7µi.

7. Randomly generate a general or sparse and positive correlation matrix for the activities.
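Steps 2 to 6 of the algorithm can be sketched as below. This is an illustrative reading rather than the authors' code: an arc is included with probability s, which matches the stated expected arc count s·n′(n′−1)/2; the number of nodes is passed in by the caller because the range used in Step 1 is not specified; and Step 7 (the correlation matrix) is omitted. Node labels 0 and n′ + 1 for the start and end nodes are a convention of this sketch.

```python
import random


def random_project_network(num_nodes, s=0.2, seed=None):
    """Sketch of steps 2-6. Returns (arcs, means, stds).

    Inner nodes are labelled 1..num_nodes, the start node is 0 and the end
    node is num_nodes + 1, so every arc goes from a smaller to a larger label.
    """
    rng = random.Random(seed)
    # Step 2: include arc i -> j (i < j) with probability s (assumed direction
    # of the threshold test, consistent with E[m'] = s * n'(n' - 1) / 2).
    inner = [(i, j)
             for i in range(1, num_nodes + 1)
             for j in range(i + 1, num_nodes + 1)
             if rng.random() < s]
    # Step 3: isolated nodes are dropped by keeping only nodes that appear in an arc.
    nodes = sorted({v for arc in inner for v in arc})
    has_incoming = {head for _, head in inner}
    has_outgoing = {tail for tail, _ in inner}
    start, end = 0, num_nodes + 1
    # Step 4: connect the start node to every node without incoming arcs.
    arcs = [(start, v) for v in nodes if v not in has_incoming]
    arcs += inner
    # Step 5: connect every node without outgoing arcs to the end node.
    arcs += [(v, end) for v in nodes if v not in has_outgoing]
    # Step 6: mean ~ U(1, 10), standard deviation ~ U(0, 0.7 * mean).
    means = [rng.uniform(1.0, 10.0) for _ in arcs]
    stds = [rng.uniform(0.0, 0.7 * mu) for mu in means]
    return arcs, means, stds


arcs, means, stds = random_project_network(num_nodes=15, s=0.2, seed=1)
print(len(arcs), arcs[:5])
```

Because arcs only run from lower to higher labels, the result is acyclic by construction, and every inner node lies on at least one path from the start node to the end node.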

The results for twenty random networks are presented in Table 3. The first ten instances are

generated with general random correlation matrices, and the next ten instances are generated

with sparse and positive random correlation matrices. For this example, we also constructed

and analyzed more than one hundred random networks, and all the findings are the same as

those discussed later. Thus, for succinctness we only report twenty examples in this paper. The

sample size for all the simulations is 2 × 10^4.

#    n    m    Method       E[T]     σ(T)     Expected Squared Deviation
1    16   32   Simulation   33.237   4.745    -
               PERT         29.886   5.923    33.410
               CPCMM        37.480   3.576    25.879
               Combo        29.886   3.576    19.137
2    14   30   Simulation   44.010   5.403    -
               PERT         41.105   6.870    28.445
               CPCMM        47.977   4.389    22.977
               Combo        41.105   4.389    15.157
3    12   21   Simulation   39.979   6.127    -
               PERT         39.687   6.412    1.854
               CPCMM        43.064   5.602    10.987
               Combo        38.687   5.602    1.748
4    13   25   Simulation   43.792   7.016    -
               PERT         42.404   8.584    10.534
               CPCMM        46.475   6.289    27.129
               Combo        42.404   6.289    6.827
5    9    13   Simulation   26.616   3.854    -
               PERT         25.326   4.582    5.881
               CPCMM        28.522   3.498    5.797
               Combo        25.326   3.498    3.733
6    15   27   Simulation   29.976   4.355    -
               PERT         29.420   5.048    3.481
               CPCMM        34.000   3.544    18.276
               Combo        29.420   3.544    2.755
7    23   54   Simulation   48.164   6.437    -
               PERT         46.247   7.691    14.772
               CPCMM        54.627   5.526    47.342
               Combo        46.247   5.526    9.028
8    20   39   Simulation   42.818   6.225    -
               PERT         40.832   7.244    18.317
               CPCMM        49.696   4.820    54.846
               Combo        40.832   4.820    11.809
9    8    15   Simulation   37.501   6.069    -
               PERT         35.921   6.480    10.961
               CPCMM        39.418   5.732    7.697
               Combo        35.921   5.732    6.750
10   13   24   Simulation   40.850   6.409    0
               PERT         39.188   7.704    16.221
               CPCMM        45.180   5.514    25.265
               Combo        39.188   5.514    9.681
11   16   30   Simulation   37.146   8.503    -
               PERT         33.551   10.426   32.549
               CPCMM        41.982   7.669    31.818
12   13   21   Simulation   35.275   6.102    -
               PERT         31.748   5.521    33.864
               CPCMM        38.866   5.390    19.872
13   17   33   Simulation   33.663   5.603    -
               PERT         28.109   9.710    78.637
               CPCMM        39.405   3.938    46.236
14   9    20   Simulation   33.163   5.041    -
               PERT         29.595   6.307    30.245
               CPCMM        36.247   4.407    14.260
15   18   36   Simulation   33.553   3.923    -
               PERT         28.131   6.836    60.655
               CPCMM        37.833   1.706    28.119
16   14   26   Simulation   32.591   3.996    -
               PERT         30.214   5.929    19.506
               CPCMM        36.153   3.167    17.278
17   16   30   Simulation   37.777   5.489    -
               PERT         33.551   7.083    46.085
               CPCMM        42.861   4.184    36.043
18   11   16   Simulation   24.558   3.898    -
               PERT         21.409   6.655    27.180
               CPCMM        26.754   3.209    9.480
19   13   22   Simulation   27.946   3.574    -
               PERT         26.178   4.716    13.833
               CPCMM        29.771   2.873    7.205
20   14   25   Simulation   43.964   5.691    -
               PERT         40.991   7.245    27.092
               CPCMM        47.977   4.629    24.708

Table 3: Results of random project networks for Example 4


For general random correlation, as the size of the network increases, the mixture of positive and negative correlations among activities tends to cancel out, and PERT produces competitively good estimates of the expected project completion time for most instances. Meanwhile, for some graphs with dominant critical paths, PERT gives quite accurate estimates of both the mean and the variance of the project completion time. On the other hand, since the approximation error of CPP gets larger as the size of the problem increases, the upper bounds on the expected project completion time from CPCMM get worse. Thus, we observe that PERT sometimes attains a lower expected squared deviation than CPCMM. Nevertheless, the persistency estimates from CPCMM still perform very well, so the approximated distribution designed using the mean from PERT and the persistency from CPCMM (shown as “Combo” in Table 3) tends to give a much lower expected squared deviation. For projects with sparsely and positively correlated activities, PERT substantially underestimates the mean and overestimates the variance, and the performance of CPCMM dominates that of PERT.

5.3 Portfolio Management Problems

In this section, we test the performance of the portfolio selected based on the minimum quadratic

regret criterion using our method developed in Section 4 with the constraint Ax = b specified

by $\sum_{j=1}^{n} x_j = 1$.

Data

We collect the daily return data of n industry portfolios (n = 10 and 17⁴) over the past twenty

years between 1991 and 2010 from the Fama/French Data Library⁵. The portfolios are constructed

in the following manner:

Each NYSE, AMEX, and NASDAQ stock is assigned to an industry portfolio at the

end of June of year t based on its four-digit SIC code at that time. (The Compustat

SIC codes are used for the fiscal year ending in calendar year t − 1. Whenever

Compustat SIC codes are not available, the CRSP SIC codes are used for June of

year t.) Then the returns are computed from July of t to June of t + 1. All the

stocks are equally weighted in the portfolio.

The mean (AVG), standard deviation (STD) and coefficient of variation (CV) of the daily returns for each industry portfolio are summarized in Table 4. We also examine the frequency plots of the daily returns for each industry portfolio and confirm the applicability of the normal distribution assumption. See Figure 4 for two example frequency plots (on the top row) of the “Food” and “Util” industry portfolios from the 17 industry portfolios data⁶. Although the time series plots of the daily return data (middle row of Figure 4) suggest that the underlying return distribution may not be stationary over time, we would like to argue that as long as the observed history and the investment horizon are long enough, the assumption of a unique multivariate normal return distribution for the portfolios may still hold in our experiment. However, the trouble is whether the historical information is sufficient to predict the true distribution. Thus, we plot the cumulative distribution functions of the returns for the time period from 1991 to 2010 as well as the time period from 1991 to 2001, as shown in the last row of Figure 4. Despite the exclusion of the financial crisis period (2007 to 2010), the distribution observed in 1991 to 2001 is quite close to the one with ten more years of observation. Nevertheless, the analysis above is merely a detailed description of the data; the focus of this experiment is to assess the quality of the investment decisions from different portfolio selection strategies.

Industry  NoDur     Durbl     Manuf     Enrgy     HiTec     Telcm     Shops     Hlth      Utils     Other
AVG       0.0870    0.1095    0.0893    0.0845    0.0882    0.0851    0.1015    0.0846    0.0845    0.0724
STD       0.8416    1.1158    1.0357    1.4580    1.2462    1.3093    0.9779    1.0892    0.7948    0.8365
CV        9.6736    10.1900   11.5980   17.2544   14.1293   15.3854   9.6345    12.8747   9.4059    11.5539

(a) 10 industry portfolios

Industry  Food      Mines     Oil       Clths     Durbl     Chems     Cnsum     Cnstr     Steel     FabPr
AVG       0.0870    0.1095    0.0893    0.0845    0.0882    0.0851    0.1015    0.0846    0.0714    0.1211
STD       0.7621    1.5778    1.4367    1.0422    1.0003    1.1348    1.1088    1.1227    1.4089    2.5584
CV        8.7598    14.4091   16.0885   12.3337   11.3413   13.3349   10.9241   13.2707   19.7325   21.1263

Industry  Machn     Cars      Trans     Utils     Rtail     Finan     Other
AVG       0.1002    0.0786    0.0809    0.0608    0.0901    0.0898    0.0947
STD       1.1929    1.2875    1.0655    0.7948    1.0445    0.8012    1.0298
CV        11.9052   16.3804   13.1706   13.0724   11.5927   8.9220    10.8743

(b) 17 industry portfolios

Table 4: Characteristics of the return data

⁴ Similar experiments are conducted for n = 30 and the findings are the same as what will be presented later. We omit the detailed reports for the n = 30 case due to the space constraint.

⁵ http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

⁶ The reason for demonstrating these two industry portfolios is that they represent two classes of portfolios in our data set. The food industry is relatively more stable, while the “Util” is more volatile, which is particularly reflected in their return fluctuations during the last financial crisis period from 2007 to 2010.

Methodology

When the daily return data of n industry portfolios are considered, the investment decision is

to divide the total budget (normalized to 1) into these n portfolios. One practical example of

such problem could be the investment in various funds.

Rolling Horizon Framework:

Assume the past k-year data (k = 5 or 10) immediately before time period t (exclusive of

t) are used to support the portfolio selection at period t and the performance of the investment

decision at period t is recorded (i.e., out-of-sample performance). This framework is rolling

forward from period to period, i.e., the budget allocations are potentially different from period to

period. We assume that there are no transaction costs for changing the portfolio structure. For

each portfolio selection strategy, which will be explained later, the performance is continuously

tracked for 10 years, i.e., from 2001 to 2010.

[Figure 4, panels (a) Food and (b) Util. Middle row: “Daily Return of "Food" Industry Portfolio from 1991 to 2010” and “Daily Return of "Util" Industry Portfolio from 1991 to 2010” (axes: Time, Daily Return). Bottom row: “Cumulative Distribution Function of the Daily Return of "Food" Industry Portfolio” and “Cumulative Distribution Function of the Daily Return of "Util" Industry Portfolio” (axes: Daily Return, Probability; curves: 1991 − 2010 and 1991 − 2001).]

Figure 4: Distribution analysis of the daily returns of two portfolios picked from 17 industry portfolios

For example, if we choose to utilize the past 10-year information to support the current investment decision, then the data from Jan 2nd, 1991 (the first trading day in 1991) to Dec 29th, 2000 (the last trading day in 2000) is the initial history. The investment decision developed from this information set is then tested on Jan 2nd, 2001, i.e., the return of the investment based on the actual portfolio returns on Jan 2nd, 2001 is recorded. This experimental framework then shifts forward day by day with a fixed length of history in terms of the number of days. If we choose to utilize only the past 5-year information to support the current investment decision, then the data from Jan 2nd, 1996 to Dec 29th, 2000 would be the initial history.
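The rolling horizon framework can be sketched as a short backtesting loop. This is a minimal illustration, not the experimental code of the paper: the function names, the tiny synthetic data set and the two simple strategies (an even split and an all-in bet on the best historical average) are assumptions used only to show how the history window and the out-of-sample period relate.

```python
def rolling_backtest(returns, window, strategy):
    """Out-of-sample returns of a strategy under a rolling history window.

    returns: list of periods, each a list of asset returns.
    strategy: function mapping a history (list of periods) to weights that
    are non-negative and sum to 1.
    """
    realized = []
    for t in range(window, len(returns)):
        history = returns[t - window:t]  # the `window` periods strictly before t
        weights = strategy(history)
        realized.append(sum(w * r for w, r in zip(weights, returns[t])))
    return realized


def uniform_strategy(history):
    """Split the budget evenly; the history is used only to count the assets."""
    n = len(history[0])
    return [1.0 / n] * n


def expected_return_strategy(history):
    """Put the whole budget on the asset with the highest average historical return."""
    n = len(history[0])
    averages = [sum(day[j] for day in history) / len(history) for j in range(n)]
    best = max(range(n), key=averages.__getitem__)
    return [1.0 if j == best else 0.0 for j in range(n)]


# Tiny synthetic data set (three assets, five periods), for illustration only.
data = [
    [0.01, 0.00, -0.01],
    [0.02, 0.01, 0.00],
    [0.00, 0.03, 0.01],
    [-0.01, 0.02, 0.02],
    [0.01, -0.02, 0.03],
]
uniform_out = rolling_backtest(data, window=2, strategy=uniform_strategy)
greedy_out = rolling_backtest(data, window=2, strategy=expected_return_strategy)
print(uniform_out, greedy_out)
```

Keeping the strategy as a function of the history alone makes it straightforward to plug in other allocation rules without touching the loop.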

Portfolio Selection Strategies:

The abundance of the historical return data forms a natural sample of the stochastic portfolio

returns. Hence, besides relying on the persistency models, we can also estimate the

persistency from the historical data. The strategies based on both approaches are considered.

The following six different portfolio selection strategies are investigated including our minimum

quadratic regret approach:

1. Expected Return

The entire budget is devoted to the portfolio with the highest average historical return.


2. Uniform

At the beginning of every period, the budget is reallocated and divided evenly among the

n portfolios. Since the portfolios are always rebalanced to a fixed proportion, this is also

called a constant-rebalanced portfolio strategy.

3. Markowitz

The Markowitz mean/variance model is used to compute the optimal portfolio, and the

target return is the average of the overall historical returns, i.e., the average of the average

historical returns of each industry portfolio. Effectively, the constraint requires the ex-

pected return to exceed the expected return of the “Uniform” strategy when the historical

returns are used to predict the future returns.

4. Persistency

(a) Empirical: For each portfolio, the percentage of all the trading days on which it outperforms the remaining portfolios is calculated from the historical data, and then the budget is allocated according to these percentages, i.e., the empirical persistency.

(b) CPCMM: The budget is allocated according to the persistency estimated from CPCMM.

5. Conditional Value-at-Risk (CVaR)

CVaR of a random variable X (representing a loss) at a given probability level β ∈ (0, 1)

is defined as the conditional expectation of X exceeding a certain threshold, called Value-at-Risk (VaR), which is the percentile of X at β, i.e.,

$$\mathrm{VaR}_\beta(X) = \inf\left\{ x \in \mathbb{R} : P(X \le x) \ge \beta \right\}, \quad \text{and} \quad \mathrm{CVaR}_\beta(X) = E\left[ X \mid X > \mathrm{VaR}_\beta(X) \right].$$

As a coherent risk measure, CVaR has gained a lot of interest over the past ten years. Rockafellar & Uryasev (2000) showed that CVaR can be determined by minimizing a more tractable auxiliary function, without predetermining VaR first, as follows:

$$\mathrm{CVaR}_\beta(X) = \min_{\alpha \in \mathbb{R}} \left\{ \alpha + \frac{1}{1-\beta} E\left[ (X - \alpha)^+ \right] \right\}.$$

Then the budget allocation can be chosen to minimize CVaR of loss at a given tolerance

level β, i.e.,

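$$(C) \quad \min_{\alpha \in \mathbb{R},\ \sum_{j=1}^{n} \gamma_j = 1,\ \gamma \ge 0} \left\{ \alpha + \frac{1}{1-\beta} E\left[ (-r^T \gamma - \alpha)^+ \right] \right\}$$

This is a risk-averse approach since it aims to minimize the conditional expectation of the loss exceeding VaR, while the loss stays below α with probability at least β. Please refer to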

Rockafellar & Uryasev (2000) and Rockafellar & Uryasev (2004) for more details on the

development of CVaR, and Zhu & Fukushima (2009) for the most recent review in this

area of research. We tested β = 0.8, 0.9 and 0.95 in the experiments, and since the results

are almost the same, only the results corresponding to β = 0.9 are reported.

(a) Empirical: Solve Problem (C) as a stochastic programming problem with the historical

returns as the sample.

(b) CPCMM: Solve Problem (C) using the extensions of CPCMM developed in Natarajan

et al. (2011).

6. Quadratic Regret

(a) Empirical: The investment decision is obtained by solving Problem (R) with parameters empirically determined from the historical returns as in “Persistency (a)”, which include the first two moments of the returns, E [x(c)] as well as E [Z (c)].

(b) CPCMM: The investment decision is obtained by solving Problem (R) with E [x(c)]

and E [Z (c)] estimated from solving CPCMM.
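Two of the empirical quantities used by the strategies above are easy to compute directly from a sample, and the following sketch shows one way to do so. It is an illustration under stated assumptions rather than the implementation used in the experiments: ties in the persistency count are broken in favour of the lowest index, the CVaR routine only evaluates the Rockafellar & Uryasev auxiliary function for a fixed loss sample (it does not optimize over the allocation γ as Problem (C) does), and all names and data are hypothetical.

```python
def empirical_persistency(history):
    """Fraction of periods in which each asset has the highest return."""
    n = len(history[0])
    wins = [0] * n
    for day in history:
        # Ties are broken in favour of the lowest index (an assumption).
        wins[max(range(n), key=day.__getitem__)] += 1
    return [w / len(history) for w in wins]


def empirical_cvar(losses, beta):
    """min over alpha of alpha + E[(X - alpha)^+] / (1 - beta) for a loss sample."""
    n = len(losses)

    def objective(alpha):
        excess = sum(max(x - alpha, 0.0) for x in losses)
        return alpha + excess / ((1.0 - beta) * n)

    # The objective is convex and piecewise linear with kinks at the sample
    # points, so a minimiser can be found among the sample values.
    return min(objective(alpha) for alpha in set(losses))


# Illustrative data only.
history = [
    [0.01, 0.00, -0.01],
    [0.02, 0.01, 0.00],
    [0.00, 0.03, 0.01],
    [-0.01, 0.01, 0.02],
]
persistency = empirical_persistency(history)

losses = [float(k) for k in range(1, 11)]  # equally likely losses 1, 2, ..., 10
cvar = empirical_cvar(losses, beta=0.8)
print(persistency, cvar)
```

In the loss example, the 0.8-level VaR under the definition above is 8, and the average of the losses exceeding it (9 and 10) is 9.5, which is the value the auxiliary function attains at its minimum.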

Performance Measures:

We use the following measures to gauge the performance of different portfolio selection

strategies.

1. Regret

Two types of regret are considered, i.e., absolute regret (or difference regret) and quadratic regret. For each type of regret, the total regret over the ten-year investment horizon (from 2001 to 2010) is recorded, which also reflects the average level of regret the investor may face every day. Note that although our investment decision is developed based on minimizing the quadratic regret, the actual performance of our strategy on the quadratic regret measure may not be the best, because the test we conducted is out-of-sample and numerical errors, either from the return moment estimation or from solving the tractable approximation of CPCMM, are unavoidable.

2. Annual Return

The average (AVG), standard deviation (STD) and signal-to-noise ratio (SNR = AVG/STD) of the annual returns, as well as the volatility-adjusted annual return (VR = AVG − 0.5 × STD^2), are computed. These measures help capture the risk and return tradeoff for different strategies in the short term.

3. Aggregate Return

We also keep track of the aggregate return for different strategies over the ten-year

investment horizon compounded daily. This measure reflects the return from different

strategies in the long run.
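The measures above can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the sample (n − 1) standard deviation is used, returns are expressed as decimals, the aggregate measure is reported as the final wealth from an initial wealth of 1, and regret is measured against the best single portfolio in hindsight, which is the optimal value when the budget constraint is Σ xj = 1 with non-negative weights. All function names are illustrative.

```python
import math


def annual_summary(annual_returns):
    """Return (AVG, STD, SNR, VR) with SNR = AVG / STD and VR = AVG - 0.5 * STD^2."""
    n = len(annual_returns)
    avg = sum(annual_returns) / n
    # Sample standard deviation; the text does not say which convention is used.
    std = math.sqrt(sum((r - avg) ** 2 for r in annual_returns) / (n - 1))
    return avg, std, avg / std, avg - 0.5 * std ** 2


def aggregate_wealth(daily_returns):
    """Wealth after compounding daily returns (as decimals), starting from 1."""
    wealth = 1.0
    for r in daily_returns:
        wealth *= 1.0 + r
    return wealth


def regrets(portfolio_return, asset_returns):
    """Absolute and quadratic regret against the best single asset in hindsight."""
    gap = max(asset_returns) - portfolio_return
    return gap, gap * gap


def pct_increase(value, benchmark):
    """Percentage change of `value` relative to `benchmark`."""
    return (value / benchmark - 1.0) * 100.0


avg, std, snr, vr = annual_summary([0.10, 0.30, -0.05, 0.20])
wealth = aggregate_wealth([0.10, -0.10])
gap, gap_sq = regrets(0.01, [0.03, -0.01])
inc = pct_increase(6.2697, 5.4964)
print(round(snr, 4), round(vr, 4), round(wealth, 4), round(inc, 2))
```

As a consistency check on the definitions, the “Uniform” row of Table 5 reports AVG = 0.1913 and STD = 0.2089, which give SNR ≈ 0.9157 and VR ≈ 0.1695, matching the reported values up to rounding; likewise, aggregate returns of 6.2697 against 5.4964 give an increase of about 14.07%.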

Results and Discussion

Table 5 to Table 8 summarize the performance of different portfolio selection strategies on different industry portfolios. The top and second best performance for each measure are underscored in the tables. Each figure below the table depicts the growth of one’s fortune (assumed to be 1 initially) over the ten-year investment horizon under different portfolio selection strategies. Since the performance of the empirical and CPCMM estimations does not vary too much nor cross over with the performance of the other strategies, we choose to plot the empirical results to keep the figure readable. We also plot the growth of the S&P 500 Index as a simple reference point.

                               Regret                Annual Return                       Aggregate
Strategy                       Quadratic  Absolute   AVG      STD      SNR      VR       Return     Inc(%)
Uniform                        0.4048     25.6094    0.1913   0.2089   0.9158   0.1695   5.4964     -
Expected Return                0.6450     25.1376    0.2382   0.3326   0.7162   0.1829   6.2697     14.07
Markowitz                      0.4902     25.8735    0.1650   0.1683   0.9809   0.1508   4.5595     -17.05
Persistency (CPCMM)            0.3886     25.6728    0.1850   0.2306   0.8024   0.1584   4.9162     -10.56
Persistency (Empirical)        0.3890     25.6181    0.1904   0.2362   0.8062   0.1625   5.1235     -6.79
CVaR (CPCMM)                   0.5061     26.0535    0.1471   0.1686   0.8727   0.1329   3.8063     -30.75
CVaR (Empirical)               0.5163     26.1357    0.1390   0.1682   0.8265   0.1249   3.5089     -36.16
Quadratic Regret (CPCMM)       0.4003     25.4729    0.2049   0.2350   0.8717   0.1773   5.9391     8.05
Quadratic Regret (Empirical)   0.3971     25.4932    0.2028   0.2380   0.8524   0.1745   5.7789     5.14

[Figure below the table: “Wealth Growth from 2001 to 2010”; axes: Time, Wealth; curves: S & P 500 Index, Uniform, Expected Return, Markowitz, Persistency, CVaR, Quadratic Regret.]

Table 5: Performance of different portfolio selection strategies on 10 industry portfolios using 5 years as history

In terms of assessing the performance, we use the “Uniform” strategy as a benchmark for

comparison. The percentage increases (or decreases) in the aggregate return for other strategies

against the “Uniform” strategy are reported in the tables.

It can be observed that our “Quadratic Regret” strategy usually generates the highest aggregate return, except in one case when it is outperformed by the “Expected Return” strategy. Fortunately, the performance of the “Quadratic Regret” strategy is much more stable than that of the “Expected Return” strategy. As mentioned before, the “Quadratic Regret” strategy does not guarantee a minimal quadratic regret in the out-of-sample performance test. However, the performance is still satisfactory, as the regret is very close to the best performance among the strategies tested. Interestingly, the “Quadratic Regret” strategy gives the lowest absolute regret in almost all cases.

                               Regret                Annual Return                       Aggregate
Strategy                       Quadratic  Absolute   AVG      STD      SNR      VR       Return     Inc(%)
Uniform                        0.4048     25.6094    0.1913   0.2089   0.9158   0.1695   5.4964     -
Expected Return                0.6369     25.5171    0.2005   0.3462   0.5791   0.1406   4.0984     -25.43
Markowitz                      0.4709     25.7712    0.1752   0.1717   1.0204   0.1605   5.0201     -8.67
Persistency (CPCMM)            0.3875     25.6270    0.1895   0.2241   0.8458   0.1644   5.2236     -4.96
Persistency (Empirical)        0.3861     25.5861    0.1936   0.2285   0.8474   0.1675   5.3874     -1.98
CVaR (CPCMM)                   0.4958     25.9990    0.1526   0.1732   0.8811   0.1376   3.9883     -27.44
CVaR (Empirical)               0.4982     26.0630    0.1462   0.1739   0.8407   0.1311   3.7362     -32.02
Quadratic Regret (CPCMM)       0.3926     25.3044    0.2216   0.2235   0.9914   0.1966   7.2190     31.34
Quadratic Regret (Empirical)   0.3897     25.3628    0.2158   0.2265   0.9528   0.1901   6.7643     23.07

[Figure below the table: wealth growth over time; axes: Time, Wealth.]

Table 6: Performance of different portfolio selection strategies on 10 industry portfolios using 10 years as history

The extreme strategy “Expected Return” gives the most unstable performance, as indicated by the smallest signal-to-noise ratio in all scenarios as well as the most volatile aggregate return. Furthermore, in line with the common saying that “high expectation = big disappointment”, the investor taking such an extreme strategy usually suffers from the largest regret.

For the “Markowitz” strategy, although we explicitly require the expected return to exceed that of the “Uniform” strategy, when tested out-of-sample, it performs significantly worse than the “Uniform” strategy. This might be due to the way that the Markowitz mean/variance model penalizes the risk through variance (two-sided) and achieves the tradeoff between return and risk through a separate treatment of mean and variance. In this experiment, this is reflected in its lowest standard deviation and largest signal-to-noise ratio of the annual return.

Another interesting finding is that among all the strategies we have tested, “CVaR” is the

most conservative one. Although it achieves the minimal variation in return, the opportunity

25


Uniform 0.7113 35.5749 0.2145 0.2174 0.9865 0.1909 6.8134 -Expected Return 0.9547 35.4593 0.2260 0.3313 0.6821 0.1711 5.5682 -17.55Markowitz 0.8680 35.7727 0.1948 0.1512 1.2888 0.1834 6.3251 -0.07Persistency (CPCMM) 0.6906 35.4390 0.2280 0.2364 0.9644 0.2001 7.4705 9.64Persistency (Empirical) 0.6924 35.3802 0.2339 0.2450 0.9543 0.2039 7.7575 13.86CVaR (CPCMM) 0.8827 36.0414 0.1681 0.1507 1.1156 0.1567 4.8390 -28.98CVaR (Empirical) 0.8799 36.0720 0.1651 0.1521 1.0853 0.1535 4.6832 -31.26Quadratic Regret (CPCMM) 0.7134 35.4136 0.2305 0.2435 0.9467 0.2009 7.5307 10.53Quadratic Regret (Empirical) 0.7082 35.3473 0.2371 0.2475 0.9580 0.2065 7.9669 16.93

[Figure: wealth over time, 2001 to 2011 (horizontal axis: Time; vertical axis: Wealth)]



Table 7: Performance of different portfolio selection strategies on 17 industry portfolios using 5 years as history

cost appears to be too high, as indicated by the lowest average annual return (and consequently the lowest aggregate return) as well as the large regret.

The “Persistency” strategy delivers moderate and stable performance. Interestingly, it usually gives the smallest regret, especially in terms of the quadratic regret.

Besides the number of portfolios, the length of history, k, also has some impact on the performance of the different strategies. In general, increasing k improves the performance of the investment strategies, except for the “Uniform” strategy, which does not rely on any historical information. Intuitively, more available data means a more accurate estimate of the underlying return distribution, so as long as the assumption on the uniqueness of this distribution holds, as justified before, strategies based on more information would perform better.
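The intuition about the length of history can be made concrete under a simplifying assumption that is ours and not part of the experiment: suppose the k historical return vectors r_1, ..., r_k are independent draws from a single distribution with mean µ and covariance Σ. The sample mean used to estimate µ then satisfies

```latex
\hat{\mu}_k = \frac{1}{k}\sum_{t=1}^{k} r_t, \qquad
E\bigl[\hat{\mu}_k\bigr] = \mu, \qquad
\operatorname{Cov}\bigl(\hat{\mu}_k\bigr) = \frac{1}{k}\,\Sigma ,
```

so the standard error of each estimated mean return shrinks at the rate 1/√k. Doubling k therefore reduces this estimation error by a factor of about 1.41, but only as long as the returns are in fact drawn from one common distribution.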

To summarize, the “Quadratic Regret” strategy could consistently achieve a higher return, and it appears best suited to investment problems with more portfolios to choose from and more historical information.


                              Regret               Annual Return                   Aggregate
                              Quadratic  Absolute  AVG     STD     SNR             Return  Inc(%)
Uniform                       0.7113     35.5749   0.2145  0.2174  0.9865  0.1909  6.8134  -
Expected Return               0.9964     35.0902   0.2627  0.3456  0.7602  0.2030  7.6648  12.50
Markowitz                     0.8602     35.9246   0.1801  0.1524  1.1822  0.1685  5.4451  -20.08
Persistency (CPCMM)           0.6910     35.4051   0.2314  0.2331  0.9928  0.2042  7.7903  14.34
Persistency (Empirical)       0.6904     35.3328   0.2386  0.2395  0.9960  0.2099  8.2448  21.01
CVaR (CPCMM)                  0.8720     36.0677   0.1655  0.1537  1.0767  0.1537  4.6913  -31.15
CVaR (Empirical)              0.8708     36.0689   0.1654  0.1550  1.0674  0.1534  4.6770  -31.36
Quadratic Regret (CPCMM)      0.7045     35.3120   0.2406  0.2327  1.0341  0.2135  8.5563  25.58
Quadratic Regret (Empirical)  0.7000     35.2626   0.2455  0.2377  1.0330  0.2172  8.8827  30.37

[Figure: wealth over time, 2001 to 2011 (horizontal axis: Time; vertical axis: Wealth)]



Table 8: Performance of different portfolio selection strategies on 17 industry portfolios using 10 years as history
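For readers who wish to recompute the derived columns, the following minimal sketch shows how the SNR and Inc(%) entries of Table 8 can be reproduced from the other columns. It assumes that SNR is the ratio of the average annual return to its standard deviation and that Inc(%) is the percentage increase in aggregate return over the “Uniform” strategy; both definitions are inferred from the reported numbers and are not stated in this section, and the function names are illustrative.

```python
def signal_to_risk(avg_return, std_return):
    """Average annual return divided by its standard deviation (assumed SNR definition)."""
    return avg_return / std_return


def increase_over_benchmark(aggregate_return, benchmark_aggregate_return):
    """Percentage increase in aggregate return relative to a benchmark (assumed Inc(%) definition)."""
    return 100.0 * (aggregate_return / benchmark_aggregate_return - 1.0)


# Values copied from Table 8 (17 industry portfolios, 10 years of history).
uniform = {"avg": 0.2145, "std": 0.2174, "aggregate": 6.8134}
quad_regret_empirical = {"avg": 0.2455, "std": 0.2377, "aggregate": 8.8827}

snr_uniform = signal_to_risk(uniform["avg"], uniform["std"])
inc_quad_regret = increase_over_benchmark(
    quad_regret_empirical["aggregate"], uniform["aggregate"]
)

print(f"SNR (Uniform): {snr_uniform:.4f}")
print(f"Inc(%) (Quadratic Regret, Empirical): {inc_quad_regret:.2f}")
```

The recomputed values agree with the tabulated 0.9865 and 30.37 up to the rounding of the four-decimal inputs.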

6 Discussion and Conclusion

In this paper, we show that several classes of stochastic optimization problems, such as the project completion time distribution approximation problem and the quadratic regret minimization problem, can be transformed into related persistency problems. Extensive computational experiments were presented to demonstrate the advantages of our distribution approximation method, especially the benefits of introducing persistency into the distribution approximation problem.

The results in this paper can be developed further in several ways. The computational experiments on the portfolio management problem presented in this paper can be made more comprehensive. We can further compare the investment strategy with other online portfolio selection strategies, such as those based on geometric weight updates, that are often used in the machine learning community (cf. Helmbold et al. (1998)). With knowledge of the distribution of the optimal value, we can now conduct more in-depth risk analysis or parameter calibration for the underlying stochastic mixed zero-one linear optimization problem. We leave these and other related issues for future research.

Appendix A. Proof of Lemma 1

Proof. The proof is consolidated from Stein (1972), Stein (1981) and Liu (1994).

We begin by showing the univariate version of Stein’s Identity (cf. Stein (1972) and Stein (1981)).

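Let $Y$ follow a standard normal distribution, $N(0, 1)$, and let $\phi(y)$ denote the standard normal density, with derivative $\phi'(y) = -y\phi(y)$. For any function $g : \mathbb{R} \to \mathbb{R}$ such that $g'$ exists almost everywhere and $E[|g'(Y)|] < \infty$,

\[
\begin{aligned}
E[g'(Y)] &= \int_{-\infty}^{\infty} g'(y)\,\phi(y)\,dy \\
&= \int_{0}^{\infty} g'(y) \left[ -\int_{y}^{\infty} \bigl(-z\phi(z)\bigr)\,dz \right] dy
 + \int_{-\infty}^{0} g'(y) \left[ \int_{-\infty}^{y} \bigl(-z\phi(z)\bigr)\,dz \right] dy \\
&= \int_{0}^{\infty} z\phi(z) \left[ \int_{0}^{z} g'(y)\,dy \right] dz
 - \int_{-\infty}^{0} z\phi(z) \left[ \int_{z}^{0} g'(y)\,dy \right] dz \\
&= \left( \int_{0}^{\infty} + \int_{-\infty}^{0} \right) \bigl\{ z\phi(z)\,[g(z) - g(0)] \bigr\}\,dz \\
&= \int_{-\infty}^{\infty} z\phi(z)\,g(z)\,dz \\
&= E[Y g(Y)],
\end{aligned}
\]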

where the third equality is justified by Fubini’s Theorem. Since $E[Y] = 0$ and $\operatorname{Var}(Y) = 1$, the equality proved above is equivalent to

\[
\operatorname{Cov}\bigl(Y, g(Y)\bigr) = \operatorname{Var}(Y)\, E[g'(Y)]. \tag{8}
\]
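A quick numerical check of the univariate identity E[g′(Y)] = E[Y g(Y)] may help fix ideas. The sketch below evaluates both sides by Gauss-Hermite quadrature with NumPy. The test functions g(y) = y³ and g(y) = sin y, the helper names `normal_expectation` and `stein_sides`, and the quadrature order are illustrative choices and not part of the original proof.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_expectation(f, order=40):
    """Approximate E[f(Y)] for Y ~ N(0, 1) by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)  # weight function exp(-y^2 / 2)
    return float(np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi))


def stein_sides(g, dg):
    """Return (E[g'(Y)], E[Y g(Y)]) for a function g with derivative dg."""
    lhs = normal_expectation(dg)
    rhs = normal_expectation(lambda y: y * g(y))
    return lhs, rhs


# Illustrative test functions paired with their derivatives.
cases = {
    "cubic": (lambda y: y**3, lambda y: 3.0 * y**2),
    "sine": (np.sin, np.cos),
}

for name, (g, dg) in cases.items():
    lhs, rhs = stein_sides(g, dg)
    print(f"{name}: E[g'(Y)] = {lhs:.6f}, E[Y g(Y)] = {rhs:.6f}")
```

For the cubic, both sides equal E[3Y²] = E[Y⁴] = 3; for the sine, both sides equal exp(−1/2).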

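Next, we generalize the above result to the multivariate case (cf. Stein (1981) and Liu (1994)).

Let $Z = (Z_1, \ldots, Z_n)^T$, where the $Z_j$'s are independent and identically distributed standard normal random variables, and let $Z_{-j} = (Z_1, \ldots, Z_{j-1}, Z_{j+1}, \ldots, Z_n)$ denote the vector $Z$ without its $j$th component. It is straightforward to show by applying Equation (8) to one coordinate at a time, with the remaining coordinates held fixed, that for any function $\tilde{h} : \mathbb{R}^n \to \mathbb{R}$ satisfying the same conditions as $h$,

\[
E\bigl[ Z_j\, \tilde{h}(Z) \,\big|\, Z_{-j} \bigr]
= E\left[ \frac{\partial \tilde{h}(Z)}{\partial z_j} \,\middle|\, Z_{-j} \right],
\quad \forall j = 1, \ldots, n.
\]

Taking the expectation of both sides, we find that

\[
E\bigl[ Z_j\, \tilde{h}(Z) \bigr]
= E\left[ \frac{\partial \tilde{h}(Z)}{\partial z_j} \right],
\quad \forall j = 1, \ldots, n,
\]

i.e.,

\[
\operatorname{Cov}\bigl( Z, \tilde{h}(Z) \bigr) = E\bigl[ \nabla \tilde{h}(Z) \bigr].
\]

Note that the random vector $X$ can be written as $X = \Sigma^{1/2} Z + \mu$. Consider $\tilde{h}(Z) = h\bigl( \Sigma^{1/2} Z + \mu \bigr)$; then $\nabla \tilde{h}(Z) = \Sigma^{1/2} \nabla h(X)$. Hence,

\[
\operatorname{Cov}\bigl( X, h(X) \bigr)
= \operatorname{Cov}\bigl( \Sigma^{1/2} Z, \tilde{h}(Z) \bigr)
= \Sigma^{1/2} E\bigl[ \nabla \tilde{h}(Z) \bigr]
= \Sigma\, E\bigl[ \nabla h(X) \bigr].
\]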
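The multivariate identity Cov(X, h(X)) = Σ E[∇h(X)] can be checked by simulation. The sketch below is a Monte Carlo illustration under assumptions of our own: the mean vector, the covariance matrix, the sample size, and the choice h(x) = max_j x_j are all hypothetical. For this h, the gradient exists almost everywhere and equals the indicator vector of the maximizing coordinate, so E[∇h(X)] is the vector of probabilities that each coordinate attains the maximum.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical parameters of X ~ N(mu, Sigma); any positive definite Sigma works.
mu = np.array([0.0, 0.5, 1.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])

n_samples = 400_000
X = rng.multivariate_normal(mu, Sigma, size=n_samples)

# h(x) = max_j x_j; its gradient (defined almost everywhere) is the
# indicator vector of the coordinate that attains the maximum.
h = X.max(axis=1)
grad = np.zeros_like(X)
grad[np.arange(n_samples), X.argmax(axis=1)] = 1.0

# Left-hand side: sample estimate of Cov(X, h(X)), one entry per coordinate.
cov_lhs = ((X - X.mean(axis=0)) * (h - h.mean())[:, None]).mean(axis=0)

# Right-hand side: Sigma times the estimated E[grad h(X)].
cov_rhs = Sigma @ grad.mean(axis=0)

max_gap = float(np.max(np.abs(cov_lhs - cov_rhs)))
print("Cov(X, h(X))      :", np.round(cov_lhs, 3))
print("Sigma E[grad h(X)]:", np.round(cov_rhs, 3))
print("largest gap       :", round(max_gap, 4))
```

Since both sides are sample estimates, they agree only up to Monte Carlo error, which shrinks as the sample size grows.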


References

Agrawal, S., Y. Ding, A. Saberi, Y. Ye (2011) Price of correlations in stochastic optimization, Manuscript.

Aissi, H., C. Bazgan, D. Vanderpooten (2009) Min-max and min-max regret versions of combinatorial optimization problems: A survey, European Journal of Operational Research, 197, pp. 427–438.

Aldous, D., M. Steele (2003) The objective method: Probabilistic combinatorial optimization and local weak convergence, in Probability on Discrete Structures, H. Kesten (ed), Springer, Berlin, 110, pp. 1–72.

Bereanu, B. (1963) On stochastic linear programming. I: Distribution problems: A single random variable, Romanian Journal of Pure and Applied Mathematics, 8, pp. 683–697.

Berman, A., N. Shaked-Monderer (2003) Completely Positive Matrices, World Scientific, Singapore.

Bertsimas, D., X. V. Doan, K. Natarajan, C. P. Teo (2010) Models for minimax stochastic linear optimization problems with risk aversion, Mathematics of Operations Research, 35, pp. 580–602.

Bertsimas, D., K. Natarajan, C. P. Teo (2004) Probabilistic combinatorial optimization: moments, semidefinite programming and asymptotic bounds, SIAM Journal on Optimization, 15, pp. 185–209.

Bertsimas, D., K. Natarajan, C. P. Teo (2006) Persistence in discrete optimization under data uncertainty, Mathematical Programming, 108, pp. 251–274.

Bomze, I. M., M. Dur, E. de Klerk, C. Roos, A. J. Quist, T. Terlaky (2000) On copositive programming and standard quadratic optimization problems, Journal of Global Optimization, 18, pp. 301–320.

Boyd, S., L. Vandenberghe (2004) Convex Optimization, Cambridge University Press.

Borkar, V. (1995) Probability Theory: An Advanced Course, S. Axler, F. W. Gehring, P. R. Halmos (eds), Springer, New York.

Brown, G. G., R. F. Dell, R. K. Wood (1997) Optimization and persistence, Interfaces, 27, pp. 15–37.

Burer, S. (2009) On the copositive representation of binary and continuous nonconvex quadratic programs, Mathematical Programming, 120, pp. 479–495.

Cover, T. M. (1991) Universal portfolios, Mathematical Finance, 1, pp. 1–29.

Cox, M. A. (1995) Simple normal approximation to the completion time distribution for a PERT network, International Journal of Project Management, 13, pp. 265–270.

Dembo, R. S., A. J. King (1992) Tracking models and the optimal regret distribution in asset allocation, Applied Stochastic Models and Data Analysis, 8, pp. 195–207.

Dodin, B. (1985) Bounding the project completion time distribution in PERT networks, Operations Research, 33, pp. 862–881.

Dur, M. (2009) Copositive programming: A survey, in M. Diehl, F. Glineur, E. Jarlebring, W. Michiels (eds), Recent Advances in Optimization and its Applications in Engineering, Springer, pp. 3–20.

Ewbank, J. B., B. L. Foote, H. J. Kumin (1974) A method for the solution of the distribution problem of stochastic linear programming, SIAM Journal on Applied Mathematics, 26, pp. 225–238.

Fulkerson, D. R. (1962) Expected critical path lengths in PERT networks, Operations Research, 10, pp. 808–817.

Hagstrom, J. N. (1988) Computational complexity of PERT problems, Networks, 18, pp. 139–147.

Helmbold, D. P., R. E. Schapire, Y. Singer, M. K. Warmuth (1997) A comparison of new and old algorithms for a mixture estimation problem, Machine Learning, 22, pp. 97–119.

Helmbold, D. P., R. E. Schapire, Y. Singer, M. K. Warmuth (1998) On-line portfolio selection using multiplicative updates, Mathematical Finance, 8, pp. 325–347.

Kamburowski, J. (1985) A Note on the Stochastic Shortest Route Problem, Operations Research, 33, pp. 696–698.


King, A. J., D. L. Jensen (1992) Linear-quadratic efficient frontiers for portfolio optimization, Applied Stochastic Models and Data Analysis, 8, pp. 195–207.

Klerk, E. de, D. V. Pasechnik (2002) Approximation of the stability number of a graph via copositive programming, SIAM Journal on Optimization, 12, pp. 875–892.

Kleindorfer, G. B. (1971) Bounding distributions for a stochastic acyclic network, Operations Research, 19, pp. 1586–1601.

Kong, Q., C. Y. Lee, C. P. Teo, Z. Zheng (2011) Scheduling arrivals to a stochastic service delivery system using copositive cones, Manuscript.

Liu, J. S. (1994) Siegel’s formula via Stein’s identities, Statistics & Probability Letters, 21, pp. 247–251.

Lindsey, J. H. (1972) An estimate of expected critical-path length in PERT networks, Operations Research, 20, pp. 800–812.

MacCrimmon, K. R., C. A. Ryavec (1964) An analytical study of the PERT assumptions, Operations Research, 12, pp. 16–37.

Markowitz, H. M. (1952) Portfolio selection, Journal of Finance, 7, pp. 77–91.

Markowitz, H. M. (1959) Portfolio Selection: Efficient Diversification of Investments, Wiley, New York.

Mishra, V. K., K. Natarajan, H. Tao, C. P. Teo (2011) Choice prediction with semidefinite optimization when utilities are correlated, Manuscript.

Natarajan, K., M. Song, C. P. Teo (2009) Persistency model and its applications in choice modeling, Management Science, 55, pp. 453–469.

Natarajan, K., C. P. Teo, Z. Zheng (2011a) Mixed zero-one linear programs under objective uncertainty: a completely positive representation, Forthcoming in Operations Research.

Ord, J. K. (1991) A simple approximation to the completion time distribution for a PERT network, The Journal of the Operational Research Society, 42, pp. 1011–1017.

Papadatos, N., V. Papathanasiou (2003) Multivariate covariance identities with an application to order statistics, Sankhya: The Indian Journal of Statistics, 65, pp. 307–316.

Parrilo, P. A. (2000) Structured Semidefinite Programs and Semi-algebraic Geometry Methods in Robustness and Optimization, Ph.D. Dissertation, California Institute of Technology.

Prekopa, A. (1966) On the probability distribution of the optimum of a random linear program, SIAM Journal on Control and Optimization, 4, pp. 211–222.

Rockafellar, R. T., S. Uryasev (2000) Optimization of conditional value-at-risk, Journal of Risk, 2, pp. 21–42.

Rockafellar, R. T., S. Uryasev (2004) Conditional value-at-risk for general loss distributions, Journal of Banking and Finance, 26, pp. 1443–1471.

Sculli, D. (1983) The completion time of PERT networks, The Journal of the Operational Research Society, 34, pp. 155–158.

Siegel, A. F. (1993) A surprising covariance involving the minimum of multivariate normal variables, Journal of the American Statistical Association, 88, pp. 77–80.

Stein, C. M. (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, 2, pp. 583–602.

Stein, C. M. (1981) Estimation of the mean of a multivariate normal distribution, The Annals of Statistics, 9, pp. 1135–1151.

Vandenberghe, L., S. Boyd, K. Comanor (2007) Generalized Chebyshev bounds via semidefinite programming, SIAM Review, 49, pp. 52–64.

Yao, M. J., W. M. Chu (2007) A new approximation algorithm for obtaining the probability distribution function for project completion time, Computers & Mathematics with Applications, 54, pp. 282–295.

Zhu, S., M. Fukushima (2009) Worst-case Conditional Value-at-Risk with application to robust portfolio management, Operations Research, 57, pp. 1155–1168.

