Advanced Input Modeling: Properties of the NORTA Method In Higher Dimensions

May 10, 2023


Proceedings of the 2002 Winter Simulation Conference
E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds.

PROPERTIES OF THE NORTA METHOD IN HIGHER DIMENSIONS

Soumyadip Ghosh
Shane G. Henderson

School of Operations Research and Industrial Engineering
Cornell University
Ithaca, NY 14853, U.S.A.


ABSTRACT

The NORTA method for multivariate generation is a fast general purpose method for generating samples of a random vector with given marginal distributions and given product-moment or rank correlation matrix. However, this method has been shown to fail to work for some feasible correlation matrices. (A matrix is feasible if there exists a random vector with the given marginal distributions and the matrix as the correlation matrix.) We investigate how this feasibility problem behaves as the dimension of the random vector is increased and find the problem to become acute rapidly. We also find that a modified NORTA procedure, augmented by a semidefinite program (SDP) that aims to generate a correlation matrix “close” to the desired one, performs well with increasing dimension.

1 INTRODUCTION

Cario and Nelson (1997) described the NORTA method for generating random vectors with prescribed correlation matrix. This method belongs to a family of methods available for multivariate generation that address the specific problem of generating samples of a finite dimensional random vector such that the generated samples match a given set of marginal distributions for the individual components, and some measure of dependence between them, typically chosen to be either the product-moment or the rank correlation matrix. (The product-moment correlation matrix for a random vector $X = (X_1, \ldots, X_d)$ is the matrix $\Sigma_X = (\Sigma_X(i,j) : 1 \le i, j \le d)$ where

$$\Sigma_X(i,j) = \frac{\mathrm{cov}(X_i, X_j)}{(\mathrm{var}\,X_i \ \mathrm{var}\,X_j)^{1/2}}.$$

The rank correlation matrix is of the same form except that now

$$\Sigma_X(i,j) = \frac{\mathrm{cov}(F_i(X_i), F_j(X_j))}{(\mathrm{var}\,F_i(X_i) \ \mathrm{var}\,F_j(X_j))^{1/2}},$$

where $F_i$ and $F_j$ are the distribution functions of $X_i$ and $X_j$ respectively.)

The philosophy of specifying marginals and correlations to model dependent random variates is clearly an approximate one, since the joint distribution is not completely specified. However, almost all the methods that have been suggested to model and generate from the full joint-distribution suffer from some serious drawbacks, for instance the enormous amount of information needed to specify (and fit) the joint distribution, and the specific nature of these methods that precludes an easy adaptation to cases where a joint distribution of a different nature is to be modelled. These drawbacks make their use impractical for a model of even moderate complexity. Hence, by aiming for the simpler goal of matching only the marginal distributions and the correlation matrix, one hopes to capture the essence of the dependence between the components while being able to work with easily implementable methods that work well in higher dimensions.

Another argument in support of modelling random vectors in this way involves the use of diffusion approximations to model queueing systems. In many cases the limiting diffusions depend only on the first two moments of the input distributions. Therefore, there is some insensitivity in performance measures computed from these models to the exact form of the input distributions. In general then, if a form of this insensitivity is present in a model, the approach discussed here for modelling random vectors is quite reasonable.

The NORTA method involves a componentwise transformation of a multivariate normal random vector, and capitalizes on the fact that multivariate normal random vectors are easily generated; see e.g., Law and Kelton (2000), p. 480. Cario and Nelson (1997) traced the roots of the method back to Mardia (1970) who looked at bivariate distributions, and to Li and Hammond (1975) who concentrated on the case where all of the marginals have densities (with respect to Lebesgue measure). Iman and Conover (1982) implemented the same transformation procedure to induce a given rank correlation in the output. Their method is only approximate, in that the output will have only very approximately the desired rank correlation.

The NORTA method is a very efficient, easy to implement general purpose generation method, and has seen adaptations to various contexts. Clemen and Reilly (1999) described how to use the NORTA procedure to induce a desired rank correlation in the context of decision and risk analysis. Lurie and Goldberg (1998) implemented a variant of the NORTA method for generating samples of a predetermined size. Henderson, Chiera, and Cooke (2000) adapt the NORTA method to generate samples of dependent quasi-random numbers.

Due to its attractive properties, the NORTA procedure, when it works, is often the method of choice for generating random vectors with arbitrary marginals and any given feasible correlation matrix. It is thus natural to ask whether the NORTA procedure can match any feasible correlation matrix for a given set of marginals.

For 2-dimensional random vectors, the NORTA method can match any feasible correlation matrix. This follows immediately from the characterizations in Whitt (1976). However, this does not hold for dimensions 3 and greater. Both Li and Hammond (1975) and Lurie and Goldberg (1998) postulate examples in 3 dimensions where the NORTA procedure might fail for feasible correlation matrices, but do not establish that the counterexamples exist. In Ghosh and Henderson (2001a), Ghosh and Henderson (2001b), we give a computational procedure based on chessboard distributions to determine whether a given correlation matrix is feasible for the marginal distributions or not. Using this procedure, we rigorously established that such counterexamples do exist. Let us call feasible correlation matrices that cannot be matched using the NORTA method NORTA defective matrices.

Based on the numerical results obtained in Ghosh and Henderson (2001b) we had also conjectured that this feasibility problem might get steadily worse as the dimension increases, in the sense that the NORTA method would be increasingly likely to fail for feasible correlation matrices as the dimension of the matrices grew. We investigate this aspect of the feasibility problem further in this paper. We estimate, for each dimension, the probability that the NORTA procedure fails to work for a feasible rank correlation matrix chosen uniformly from the set of all feasible correlation matrices. Kurowicka and Cooke (2001) also looked at this problem, but they work with a probability distribution that is not uniform over the set of all feasible correlation matrices. Our results confirm their finding that the probability the NORTA procedure fails to work grows rapidly with dimension.

Suppose we are willing to trade off accuracy for the sake of the efficiency of the NORTA generation procedure, i.e., we wish to use NORTA to generate a random vector with the prescribed marginals, and a correlation matrix that is, at least approximately, the required correlation matrix. In Ghosh and Henderson (2001b) we describe a semidefinite programming approach that can assist in this regard.

The proposed augmented NORTA method works in exactly the same manner as the original method unless a NORTA defective matrix is encountered. For such a matrix, a semidefinite program is set up and solved, and the results are then used to modify the inputs given to the NORTA generation step in the hope that the generated random vector has a correlation matrix that is “close” to the desired one (it has the same marginal distributions). The numerical results in Ghosh and Henderson (2001b) indicate that this is typically true for the 3-dimensional case. In this paper, we examine higher dimensions, exploring how the augmented NORTA method performs as the dimension increases. The results indicate that NORTA can typically get very close to a target correlation matrix, even in very high dimensions. So in high dimensions, while NORTA is unlikely to be able to exactly match a desired correlation matrix, it may be able to match a correlation matrix that is very close to the desired one.

The next section reviews the NORTA procedure and indicates why some matrices may be NORTA defective. Section 3 studies how the NORTA feasibility problem affects its performance as the dimension of the random vector is increased. Section 4 briefly describes the SDP augmentation proposed in Ghosh and Henderson (2001b), and studies how this augmented method performs in higher dimensions. Finally, Section 5 summarizes the conclusions that we were able to draw from our studies.

2 THE NORTA PROCEDURE

Suppose that we wish to generate i.i.d. replicates of a random vector $X = (X_1, \ldots, X_d)$ with prescribed marginal distributions

$$F_i(\cdot) = P(X_i \le \cdot), \quad i = 1, \ldots, d,$$

and product-moment or rank correlation matrix

$$\Sigma_X = \Sigma_X(i,j), \quad 1 \le i, j \le d.$$

If we assume $\Sigma_X$ to be feasible for the marginals, then the NORTA method generates i.i.d. replicates of $X$ by the following procedure.

1. Generate an $\mathbb{R}^d$ valued joint normal random vector $Z = (Z_1, \ldots, Z_d)$ with mean vector 0 and covariance matrix $\Sigma_Z = (\Sigma_Z(i,j) : 1 \le i, j \le d)$, where $\Sigma_Z(i,i) = 1$ for $i = 1, \ldots, d$.

2. Compute the vector $X = (X_1, \ldots, X_d)$ via

$$X_i = F_i^{-1}(\Phi(Z_i)), \tag{1}$$

for $i = 1, \ldots, d$, where $\Phi$ is the distribution function of a standard normal random variable, and

$$F_i^{-1}(u) = \inf\{x : F_i(x) \ge u\}. \tag{2}$$

The vector $X$ generated by this procedure will have the prescribed marginal distributions. To see this, note that each $Z_i$ has a standard normal distribution, so that $\Phi(Z_i)$ is uniformly distributed on $(0, 1)$, and so $F_i^{-1}(\Phi(Z_i))$ will have the required marginal distribution.

The covariance matrix $\Sigma_Z$ should be chosen, in a preprocessing phase, so that it induces the prescribed correlation matrix $\Sigma_X$ on $X$. However, there is no general closed form expression that gives $\Sigma_Z$ in terms of $\Sigma_X$. Indeed, determining the right $\Sigma_Z$ is the most difficult step in implementing the NORTA method.
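The two generation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `cholesky` and `norta_sample`, the choice of an exponential(1) and a uniform(0, 1) marginal, and the input correlation 0.6 are all hypothetical, and the hand-written Cholesky factorisation requires the input covariance matrix to be strictly positive definite.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Return lower-triangular L with L L^T = a; raise ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def norta_sample(sigma_z, inverse_cdfs, rng):
    """One replicate of X via steps 1 and 2 of the procedure."""
    d = len(sigma_z)
    low = cholesky(sigma_z)  # recomputed per call for brevity; cache it in real use
    # Step 1: Z = L W is joint normal with mean 0 and covariance sigma_z.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [sum(low[i][k] * w[k] for k in range(i + 1)) for i in range(d)]
    # Step 2: X_i = F_i^{-1}(Phi(Z_i)), equation (1).
    phi = NormalDist().cdf
    return [inverse_cdfs[i](phi(z[i])) for i in range(d)]


# Assumed example inputs: exponential(1) and uniform(0, 1) marginals.
inverse_cdfs = [lambda u: -math.log(1.0 - u), lambda u: u]
sigma_z = [[1.0, 0.6], [0.6, 1.0]]

rng = random.Random(0)
samples = [norta_sample(sigma_z, inverse_cdfs, rng) for _ in range(20000)]
mean_x1 = sum(x[0] for x in samples) / len(samples)
mean_x2 = sum(x[1] for x in samples) / len(samples)
print(mean_x1, mean_x2)  # sample means, which should be near 1 and 0.5
```

The sketch takes $\Sigma_Z$ as given; choosing it so that the output has the desired correlation matrix is the separate preprocessing problem described in the text.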

Each component of $\Sigma_X$ has been shown to depend only on the corresponding component of $\Sigma_Z$. As in Cario and Nelson (1997), we can define $c_{ij}(z) = \Sigma_X(i,j)$ to represent the correlation between $X_i$ and $X_j$ as a function of the correlation $z$ between $Z_i$ and $Z_j$, when $X_i$ and $X_j$ are generated as in (1). Cario and Nelson (1997) show that under certain very mild conditions $c_{ij}(\cdot)$ is a non-decreasing, continuous function. This result helps us perform an efficient numerical search for a value $\Lambda_Z(i,j)$ that solves

$$c_{ij}(\Lambda_Z(i,j)) = \Sigma_X(i,j). \tag{3}$$

Hence a numerical estimate $\Lambda_Z$ of $\Sigma_Z$ can be determined by solving a number of one-dimensional root-finding problems. Unless stated otherwise, we assume that a solution exists for (3).
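Because $c_{ij}$ is continuous and non-decreasing, each instance of (3) can be solved by bisection. The sketch below is a minimal illustration with assumed names (`solve_for_z`, `c_uniform`). In general $c_{ij}$ has to be evaluated numerically; here the uniform-marginal case is used as a stand-in, for which $c(z) = (6/\pi)\arcsin(z/2)$ is the inverse of relation (4) of this section.

```python
import math


def solve_for_z(c, target, lo=-1.0, hi=1.0, tol=1e-12):
    """Bisection for z in [lo, hi] with c(z) = target, assuming c is continuous and non-decreasing."""
    if not (c(lo) <= target <= c(hi)):
        raise ValueError("target is outside the range of c on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c(mid) < target:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
    return 0.5 * (lo + hi)


def c_uniform(z):
    """Correlation between the uniforms induced by normal correlation z."""
    return (6.0 / math.pi) * math.asin(z / 2.0)


z_star = solve_for_z(c_uniform, 0.8)
print(z_star, 2.0 * math.sin(math.pi * 0.8 / 6.0))  # the two values should agree closely
```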

Henderson, Chiera, and Cooke (2000) also show that under stronger assumptions the value of $z$ in (3) is uniquely determined by $\Sigma_X(i,j)$. They infer from this that if their assumptions hold and if NORTA can work, then it will.

The matrix $\Lambda_Z$ is constructed in a way that does not necessarily ensure that it is positive semidefinite. It might indeed turn out to be indefinite, in which case it cannot be a valid covariance matrix for a joint normal distribution. Can this happen, i.e., can there exist a feasible correlation matrix that, under exact numerical estimation in (3), gives an indefinite $\Lambda_Z$?

Li and Hammond (1975) postulated the following counterexample. Suppose $X = (X_1, X_2, X_3)$ is a random vector with uniform $(0, 1]$ marginals, and correlation matrix

$$\Sigma_X = \begin{pmatrix} 1 & -0.4 & 0.2 \\ -0.4 & 1 & 0.8 \\ 0.2 & 0.8 & 1 \end{pmatrix}.$$

For this special case of uniform marginals, the equations (3) can be solved analytically as (Kruskal 1958)

$$\Lambda_Z(i,j) = 2 \sin\left(\frac{\pi}{6} \Sigma_X(i,j)\right). \tag{4}$$

The unique solution $\Lambda_Z$ for the given $\Sigma_X$ turns out to be indefinite.
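The indefiniteness can be checked numerically. The following sketch applies (4) entrywise to the matrix above and tests definiteness by attempting a Cholesky factorisation. The helper name `is_positive_definite` is an assumption made for illustration, and the helper tests strict positive definiteness, which is enough here because the failing pivot is clearly negative (the determinant of $\Lambda_Z$ is about $-0.02$).

```python
import math


def is_positive_definite(a):
    """Attempt a Cholesky factorisation; it succeeds exactly when a is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


sigma_x = [[1.0, -0.4, 0.2],
           [-0.4, 1.0, 0.8],
           [0.2, 0.8, 1.0]]

# Equation (4) off the diagonal; the diagonal stays at 1.
lambda_z = [[1.0 if i == j else 2.0 * math.sin(math.pi * sigma_x[i][j] / 6.0)
             for j in range(3)] for i in range(3)]

print(is_positive_definite(sigma_x))   # True
print(is_positive_definite(lambda_z))  # False
```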

This counterexample is of course valid only if such a uniform random vector $X$ exists. Li and Hammond (1975) did not show this, and no general purpose method previously existed to determine the feasibility of a correlation matrix for a given set of marginals. We have since been able to develop a computational procedure in Ghosh and Henderson (2001b) that can determine, for almost any (in a Lebesgue measure sense) given correlation matrix, whether it is feasible for a given set of marginal distributions or not.

Applying this algorithm to the Li and Hammond example gives a construction of the random vector, so that it does, indeed, exist. In Ghosh and Henderson (2001b) we generate a number of such feasible matrices for three-dimensional uniform random vectors that are NORTA defective. The numerical results suggest a structure to the failure of NORTA. To explain this observation more carefully we need some notation.

Suppose that the marginal distributions $F_1, \ldots, F_d$ have densities with bounded support, and are fixed. We can, with an abuse of notation, view a $d \times d$ correlation matrix as an element of $d(d-1)/2$ dimensional vector space. This follows because there are $d(d-1)/2$ elements above the diagonal, the matrix is symmetric, and the diagonal elements are equal to 1. Let $\Omega$ denote the set of feasible correlation matrices. (Under the assumptions made on the marginals, the results hold identically for both rank and product-moment correlations, and hence no distinction will be made between them.) We view this set as a subset of $d(d-1)/2$ dimensional space. Ghosh and Henderson (2001b) prove that in this setting $\Omega$ is nonempty, convex, closed and full-dimensional.

Returning to the discussion above, we found that in 3 dimensions, NORTA defective matrices tended to occur near the boundary of $\Omega$. Moreover, the indefinite correlation matrices $\Lambda_Z$ determined for the joint normal distribution from (3) seemed to lie close to (but outside of) the set of symmetric positive semidefinite matrices. So NORTA defective matrices tended to occur near the boundary, and they were never too distant from a NORTA feasible matrix.

3 NORTA IN HIGHER DIMENSIONS

As mentioned above, NORTA appears to fail most often when the correlation matrix is close to the boundary of the set $\Omega$. Now, in a sense that can be made precise, “most” points in certain sets in high dimensions lie close to the boundary. For example, consider the interior of the unit hypercube $[-\frac{1}{2}, \frac{1}{2}]^d$ in $\mathbb{R}^d$ represented by the hypercube $[-\frac{1-\epsilon}{2}, \frac{1-\epsilon}{2}]^d$. The ratio of the volumes of the interior and the whole set is then $(1 - \epsilon)^d$, which decreases rapidly to 0 as $d$ increases.
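To make the rate concrete, take $\epsilon = 0.1$ as an arbitrary illustrative value: the interior then holds $0.9^3 = 0.729$ of the volume at $d = 3$ but only about $0.072$ at $d = 25$. The short sketch below tabulates the ratio; the function name `interior_fraction` is hypothetical.

```python
def interior_fraction(eps, d):
    """Volume of [-(1 - eps)/2, (1 - eps)/2]^d relative to the unit hypercube."""
    return (1.0 - eps) ** d


for d in (1, 3, 10, 25):
    print(d, round(interior_fraction(0.1, d), 4))
```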

This suggests that feasible matrices within the set $\Omega$ may become increasingly likely to be NORTA defective as the dimension of the problem increases, so that the feasibility problem that NORTA faces becomes increasingly acute as the dimension increases.

We investigate this dimensionality aspect of the NORTA feasibility problem in the context of generating samples of a uniform random vector, i.e., a random vector with uniform $(0, 1]$ marginal distributions. This case has special significance to the NORTA method because, by construction, the method has to generate a uniform random vector as the first (intermediary) step. Furthermore, the rank correlation matrix of a NORTA generated vector with continuous marginal distributions coincides with the product moment correlation matrix for the intermediate uniform random vector.

This special case also has two advantages. First, the function $c_{ij}$ is explicitly known; see (4). Hence any feasible correlation matrix for a uniform random vector can be easily tested for NORTA feasibility.

Second, it has recently been established (Kurowicka and Cooke 2001) that the set of all feasible correlation matrices for uniform marginals, say $\Omega$, coincides with the set of all symmetric positive semidefinite matrices with ones on the diagonal. Thus the problem of estimating the probability of NORTA infeasibility reduces to the following algorithm.

1. Let $n \ge 1$ be given.

2. Let $\Sigma_X(1), \ldots, \Sigma_X(n)$ be an i.i.d. sample chosen uniformly from

$$\Omega = \{\Sigma : \Sigma = \Sigma^T, \ \Sigma \succeq 0, \ \Sigma_{ii} = 1 \ \forall i\}.$$

3. For each $i = 1, \ldots, n$ let $\Lambda_Z(i)$ be obtained from $\Sigma_X(i)$ using the componentwise relation (4).

4. Estimate the probability of NORTA infeasibility by the proportion of matrices in $\{\Lambda_Z(i) : i = 1, \ldots, n\}$ that are not positive semidefinite.

(The matrix inequality $A \succeq 0$ signifies a constraint that the matrix $A$ be positive semidefinite.)
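The four steps above can be sketched as follows for a small dimension. This is not the sampling scheme developed in the paper: as a stand-in, the uniform sample from $\Omega$ is drawn by plain rejection from the hypercube of off-diagonal entries, which is exact but only practical for small $d$. All function names are assumptions made for illustration, and definiteness is tested via an attempted Cholesky factorisation (boundary cases have probability zero).

```python
import math
import random


def is_positive_definite(a):
    """Attempt a Cholesky factorisation; it succeeds exactly when a is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def random_unit_diagonal(d, rng):
    """Symmetric d x d matrix with unit diagonal and uniform(-1, 1) off-diagonal entries."""
    m = [[1.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m


def to_lambda_z(sigma_x):
    """Apply relation (4) to every off-diagonal entry."""
    d = len(sigma_x)
    return [[1.0 if i == j else 2.0 * math.sin(math.pi * sigma_x[i][j] / 6.0)
             for j in range(d)] for i in range(d)]


def estimate_defect_probability(d, n, rng):
    """Proportion of n uniformly sampled feasible matrices whose Lambda_Z is not PSD."""
    defective = 0
    accepted = 0
    while accepted < n:
        sigma_x = random_unit_diagonal(d, rng)
        if not is_positive_definite(sigma_x):
            continue  # rejection step: keep only points of Omega
        accepted += 1
        if not is_positive_definite(to_lambda_z(sigma_x)):
            defective += 1
    return defective / n


p_hat = estimate_defect_probability(3, 20000, random.Random(1))
print(p_hat)
```

For $d = 3$ the estimate can be compared with the count of 524 NORTA defective matrices out of 15,000 reported in Table 1.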

Note that in estimating the probability of NORTA infeasibility we have had to choose a probability distribution on $\Omega$. The uniform distribution (with respect to Lebesgue measure) is a natural choice, and is the one we prefer to work with. Kurowicka and Cooke (2001) also give estimates for the probability of NORTA feasibility but they use a different distribution on $\Omega$.

3.1 Sampling Uniformly from $\Omega$

Our first attempt at estimating the probability of NORTA infeasibility was to combine two well-known methods in simulation estimation: importance sampling and ratio estimation. We used importance sampling on the hypercube $[-1, 1]^{d(d-1)/2}$ ($\Omega$ is a strict subset of this hypercube) to choose correlation vectors from $\Omega$. We then used ratio estimation (see, e.g., Henderson (2001)) to estimate the probability of NORTA infeasibility. The estimator of the probability of NORTA infeasibility was therefore of the form

$$\frac{\sum_{i=1}^{n} \left[ I(\Sigma_X(i) \succeq 0, \ \Lambda_Z(i) \not\succeq 0) \, \dfrac{2^{-d}}{\phi(\Sigma_X(i))} \right]}{\sum_{i=1}^{n} \left[ I(\Sigma_X(i) \succeq 0) \, \dfrac{2^{-d}}{\phi(\Sigma_X(i))} \right]},$$

where the matrices $\Sigma_X(i)$ were chosen independently with density $\phi$ from the hypercube $[-1, 1]^d$. We chose the density $\phi$ in a heuristic fashion.

This method of estimation worked well in lower dimensions but we found that it became excessively slow as the dimension increased. Indeed, it took more than two days to generate on the order of a thousand samples of positive definite matrices even for a dimension as low as $d = 12$. Clearly, a better sampling technique was needed.

Investing some further thought into the problem led us to construct a method that samples exactly from the uniform (in a Lebesgue measure sense) distribution on the set $\Omega$. This method starts with the one-dimensional matrix [1] and then “grows out” the matrix to the dimension desired by successively adding an extra row (and the corresponding mirrored column) chosen from an appropriate distribution. To be more precise the method is as follows.

1. Let $\Sigma$ be the $1 \times 1$ matrix 1.

2. For $i = 2, \ldots, d$

   (a) Let $U$ be a column vector in $\mathbb{R}^{i-1}$ chosen, independently of all else, from distribution $\varphi_i$ say.

   (b) Set

   $$\Sigma = \begin{bmatrix} \Sigma & U \\ U^T & 1 \end{bmatrix}.$$

   (c) Next $i$.

The distributions $\varphi_i$ are conditional distributions that depend on the partial matrix $\Sigma$ constructed thus far. We do not specify them further here.

This method has two key advantages over the first one. First, sampling from $\varphi_i$ can be reduced to the problem of sampling from a univariate beta distribution, a very well-studied problem for which efficient algorithms are available (see Law and Kelton 2000, p. 453-458). Consequently the method scales very well with dimension. In our study we were able to generate samples consisting of many thousands of matrices up to dimension $d = 25$ in a matter of hours.

Second, this method does not involve a ratio-estimation step, which means that the estimation is more straightforward to implement. For a given sample size, we also found the results to be more accurate.

We used the exact sampling approach to estimate the probability of NORTA infeasibility for various dimensions. Our results are given in Figure 1, where the probability is plotted against dimension. The plot establishes that the feasibility problem rapidly becomes acute as the dimension increases. As seen from the results, the probability of a matrix being NORTA defective is almost 1 even in as low a dimension as seventeen.

[Figure: plot titled “Probability of NORTA defectiveness”; horizontal axis: Dimension (0 to 25); vertical axis: Probability]

Figure 1: Probability of NORTA infeasibility, based on sampling 15,000 matrices uniformly from $\Omega$ in each dimension. Also shown are 95% confidence intervals.

4 FIXING NORTA

To recap, we previously noted that NORTA defective matrices appear to lie close to the boundary of the set $\Omega$. The reason matrices are NORTA defective is that the correlation matrix $\Lambda_Z$ determined for the joint normal distribution turns out to be indefinite, and hence infeasible. Moreover, the results from the previous section confirm our intuition that since most points in a set lie near the boundary in higher dimensions, the NORTA infeasibility problem grows with dimension.

However, we also observed that the indefinite matrices $\Lambda_Z$ lie very close to the set of feasible correlation matrices for joint normal random vectors (i.e., the set of positive semidefinite matrices with ones on the diagonal). This led to the suggestion in Ghosh and Henderson (2001b) that the setup stage of NORTA be augmented with an SDP that is used, if $\Lambda_Z$ turns out indefinite, to find a matrix $\Sigma_Z$ that is “close” to $\Lambda_Z$ and is positive semidefinite. The matrix $\Sigma_Z$ is then used within the NORTA method.

Why is this approach reasonable? In Theorem 2 of Cario and Nelson (1997) it is shown that under a certain moment condition, the output correlation matrix is a continuous function of the input covariance matrix $\Sigma_Z$ used in the NORTA procedure. So if $\Sigma_Z$ is “close” to $\Lambda_Z$, then we can expect the correlation matrix of the NORTA generated random vectors to be close to the desired matrix $\Sigma_X$. The moment condition always holds when we are attempting to match rank correlations, and we can expect it to hold almost invariably when matching product-moment correlations. Therefore, it is eminently reasonable to try and minimize some measure of distance $r(\Lambda_Z, \Sigma_Z)$ say, between $\Lambda_Z$ and $\Sigma_Z$.

The SDP falls under the broad class of matrix completion problems; see Alfakih and Wolkowicz (2000), or Johnson (1990). For this case, given $\Lambda_Z$ as data, we wish to choose a symmetric matrix $\Sigma_Z$ to

$$\begin{array}{ll} \text{minimize} & r(\Sigma_Z, \Lambda_Z) \\ \text{subject to} & \Sigma_Z \succeq 0, \\ & \Sigma_Z(i,i) = 1. \end{array} \tag{5}$$

The metric $r(\cdot, \cdot)$ can be chosen as desired. In particular, choosing either the $L_1$ metric

$$r(A, B) = \sum_{i > j} |A_{ij} - B_{ij}|$$

or the $L_\infty$ metric

$$r(A, B) = \max_{i > j} |A_{ij} - B_{ij}|$$

makes the minimization problem an SDP-constrained problem with a linear objective function. Efficient algorithms, and public domain codes implementing them, are available for solving semidefinite problems of this type; see Wolkowicz, Saigal, and Vandenberghe (2000).
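Solving (5) properly requires a semidefinite programming solver. The sketch below deliberately swaps in a much cruder technique, shrinking $\Lambda_Z$ towards the identity until it becomes positive definite, purely to illustrate the constraint set and the two metrics. It returns a point that satisfies the constraints of (5) but is in general not a minimizer, so the distances it reports are only upper bounds on the optimal values. All function names are hypothetical, and the input is the Li and Hammond example of Section 2.

```python
import math


def is_positive_definite(a):
    """Attempt a Cholesky factorisation; it succeeds exactly when a is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def shrink_to_feasible(lambda_z, iterations=60):
    """Return (1 - t) * lambda_z + t * I for roughly the smallest t that makes it positive definite."""
    d = len(lambda_z)

    def blend(t):
        return [[1.0 if i == j else (1.0 - t) * lambda_z[i][j] for j in range(d)]
                for i in range(d)]

    if is_positive_definite(lambda_z):
        return blend(0.0)
    lo, hi = 0.0, 1.0  # t = 1 gives the identity, which is always feasible
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if is_positive_definite(blend(mid)):
            hi = mid
        else:
            lo = mid
    return blend(hi)


def l1_distance(a, b):
    """Sum of absolute differences over entries below the diagonal."""
    return sum(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(i))


def linf_distance(a, b):
    """Largest absolute difference over entries below the diagonal."""
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(i))


sigma_x = [[1.0, -0.4, 0.2], [-0.4, 1.0, 0.8], [0.2, 0.8, 1.0]]
lambda_z = [[1.0 if i == j else 2.0 * math.sin(math.pi * sigma_x[i][j] / 6.0)
             for j in range(3)] for i in range(3)]
sigma_z = shrink_to_feasible(lambda_z)
print(l1_distance(sigma_z, lambda_z), linf_distance(sigma_z, lambda_z))
```

Bisection is valid here because the feasible set is convex and contains the identity, so once the blend is positive definite for some $t$ it stays so for every larger $t$.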

The SDP framework allows us to include preferences on how the search for $\Sigma_Z$ is performed. For example, we can require that for some $(i, j)$, $\Sigma_Z(i,j) \ge \Lambda_Z(i,j)$, or that the value $\Lambda_Z(i,j)$ change by at most $\delta > 0$.

Numerical studies conducted in Ghosh and Henderson (2001b) indicate that in 3 dimensions this SDP augmentation yields NORTA generated random vectors with correlation matrices that are close to the desired ones. One might then ask whether this remains the case as the dimension increases.

We use a setting identical to that used in Section 3 for this study, and our measure of performance is the expected $L_1$ distance that we have to move from the desired correlation matrix to reach a NORTA feasible one. This means that the minimization problem (5) is solved with $r(\cdot, \cdot)$ as the $L_1$ metric and no additional constraints are added.

[Figure: plot with horizontal axis: Dimension (3 to 9); vertical axis: Distance (0.01 to 0.09); legend: “Average L1 distances moved”, “Average of corresponding L∞ distances”]

Figure 2: Performance of the SDP-augmented NORTA in higher dimensions. 15,000 matrices were generated uniformly from $\Omega$ and the semidefinite program, with $r$ taken as the $L_1$ distance, solved for the NORTA defective cases. The solid line gives the expected $L_1$ distance with 95% confidence intervals as marked, with the average taken only over NORTA defective matrices. The dotted line gives the corresponding expected distance as measured in the $L_\infty$ metric.

Figure 2 plots the results. We see that the expected $L_1$ distance increases as the dimension $d$ increases at what might be perceived as a linear rate, although one could reasonably argue for a superlinear rate. If the rate of increase is indeed linear then, since there are $d(d-1)/2$ matrix entries above the diagonal, the average change per entry is (eventually) decreasing with dimension. Of course, it is possible that a small number of entries change by a large amount. The $L_\infty$ distance is also shown, and we see that indeed, at least one entry is changed by an increasing amount as the dimension increases.

It might be preferable from a modelling standpoint to instead minimize the $L_\infty$ distance, so that one tries to minimize the maximum deviation from the target correlations. The results in this case are shown in Table 1.

We see that the expected $L_\infty$ distance appears to remain constant at around 0.005 or even decrease with dimension.

One might also attempt a hybrid of the $L_1$ and $L_\infty$ approaches, perhaps by minimizing the $L_1$ distance subject to an upper bound on the $L_\infty$ distance.

Table 1: The SDP-augmented NORTA in higher dimensions. For each dimension, 15,000 matrices were generated uniformly from $\Omega$ and the semidefinite program, with $r$ taken as the $L_\infty$ distance, solved for NORTA defective matrices. The second column gives the number of NORTA defective matrices encountered. The third column gives the point estimate, taken as an average only over the NORTA defective matrices, and not over all 15,000 matrices. The final column gives the halfwidth of 95% confidence intervals.

d    ND       L∞         CI
3    524      0.0057     0.0004
4    1640     0.0053     0.0002
5    3271     0.0049     0.0001
6    4961     0.0045     0.0001
7    6988     0.0043     0.0001
8    8826     0.00414    0.00005
9    10428    0.00404    0.00004
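The ND column also yields an estimate of the probability of NORTA infeasibility in each dimension, namely ND divided by 15,000. The sketch below computes these proportions from the table, together with a normal-approximation 95% halfwidth. Treating the counts as binomial outcomes and using the normal approximation are assumptions made here, not statements about how the paper's confidence intervals were computed.

```python
import math

# (dimension, number of NORTA defective matrices) copied from Table 1
table_1 = [(3, 524), (4, 1640), (5, 3271), (6, 4961), (7, 6988), (8, 8826), (9, 10428)]
n = 15000  # matrices generated per dimension


def proportion_with_halfwidth(count, n, z=1.96):
    """Sample proportion and normal-approximation 95% confidence halfwidth."""
    p = count / n
    return p, z * math.sqrt(p * (1.0 - p) / n)


estimates = {d: proportion_with_halfwidth(nd, n) for d, nd in table_1}
for d, (p, hw) in estimates.items():
    print(d, round(p, 4), round(hw, 4))
```

The proportions rise from roughly 3.5% at $d = 3$ to roughly 70% at $d = 9$, in line with the rapid growth described in Section 3.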

Thus, the SDP-augmented NORTA procedure performs well on average even in higher dimensions. It generates random vectors with correlation matrices which are close to the desired ones, while keeping changes to the individual correlations within reasonable limits.

5 CONCLUSIONS

We have empirically reached the following conclusions:

• The feasibility problem that the NORTA procedure faces becomes steadily worse with dimension. NORTA fails in the vast majority of cases even in as low a dimension as seventeen.

• The NORTA procedure, when augmented with the SDP optimization of Section 4, can generate samples with the required marginal distributions, and a correlation matrix that is a close approximation to the one desired, and the approximation remains accurate as the dimension increases.

An added bonus is the exact sampling procedure of Section 3.1 which can be generalized to sample uniformly from the set of all positive semidefinite matrices with diagonals fixed at specific values. We are presently working on refining this procedure and plan to publish the results elsewhere.

ACKNOWLEDGMENTS

This research was partially supported by National Science Foundation Grant Number DMI 9984717.


REFERENCES

Alfakih, A., and H. Wolkowicz. 2000. Matrix completion problems. In Handbook of Semidefinite Programming: Theory, Algorithms and Applications, ed. H. Wolkowicz, R. Saigal, and L. Vandenberghe, 533–545. Boston: Kluwer.

Cario, M. C., and B. L. Nelson. 1997. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.

Clemen, R. T., and T. Reilly. 1999. Correlations and copulas for decision and risk analysis. Management Science 45:208–224.

Ghosh, S., and S. G. Henderson. 2001a. Chessboard distributions. In Proceedings of the 2001 Winter Simulation Conference, ed. B. A. Peters, J. S. Smith, D. J. Medeiros, and M. W. Rohrer, 385–393. Piscataway NJ: IEEE.

Ghosh, S., and S. G. Henderson. 2001b. Chessboard distributions and random vectors with specified marginals and covariance matrix. Operations Research. To appear.

Henderson, S. G. 2001. Mathematics for simulation. In Proceedings of the 2001 Winter Simulation Conference, ed. B. A. Peters, J. S. Smith, D. J. Medeiros, and M. W. Rohrer, 83–94. Piscataway NJ: IEEE.

Henderson, S. G., B. A. Chiera, and R. M. Cooke. 2000. Generating “dependent” quasi-random numbers. In Proceedings of the 2000 Winter Simulation Conference, ed. J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, 527–536. Piscataway NJ: IEEE.

Iman, R., and W. Conover. 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics: Simulation and Computation 11:311–334.

Johnson, C. R. 1990. Matrix completion problems: a survey. Proceedings of Symposia in Applied Mathematics 40:171–198.

Kruskal, W. 1958. Ordinal measures of association. Journal of the American Statistical Association 53:814–861.

Kurowicka, D., and R. M. Cooke. 2001. Conditional, partial and rank correlation for the elliptical copula; dependence modelling in uncertainty analysis. Technical report, Delft University of Technology, Mekelweg 4, 2628CD Delft, Netherlands.

Law, A. M., and W. D. Kelton. 2000. Simulation Modeling and Analysis. 3rd ed. New York: McGraw-Hill.

Li, S. T., and J. L. Hammond. 1975. Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man, and Cybernetics 5:557–561.

Lurie, P. M., and M. S. Goldberg. 1998. An approximate method for sampling correlated random variables from partially-specified distributions. Management Science 44:203–218.

Mardia, K. V. 1970. A translation family of bivariate distributions and Fréchet’s bounds. Sankhya A32:119–122.

Whitt, W. 1976. Bivariate distributions with given marginals. The Annals of Statistics 4:1280–1289.

Wolkowicz, H., R. Saigal, and L. Vandenberghe (Eds.). 2000. Handbook of Semidefinite Programming: Theory, Algorithms and Applications. Boston: Kluwer.

AUTHOR BIOGRAPHIES

SOUMYADIP GHOSH is a doctoral student in the School of Operations Research and Industrial Engineering at Cornell University. He has a Masters degree in Operations Research from the University of Michigan, Ann Arbor. He can be contacted at <[email protected]>.

SHANE G. HENDERSON is an assistant professor in the School of Operations Research and Industrial Engineering at Cornell University. He has previously held positions in the Department of Industrial and Operations Engineering at the University of Michigan and the Department of Engineering Science at the University of Auckland. He is an associate editor for the ACM Transactions on Modeling and Computer Simulation, Mathematics of Operations Research, and Operations Research Letters, and the newsletter editor for the INFORMS College on Simulation. His research interests include discrete-event simulation, queueing theory and scheduling problems. His e-mail address is <[email protected]>, and his webpage is <www.orie.cornell.edu/~shane>.