
European Journal of Operational Research 42 (1989) 254-267 North-Holland

Theory and Methodology

Linearly constrained estimation by mathematical programming

Emil KLAFSZKY *, Department of Computer Science, Technical University of Miskolc, 3515 Miskolc-Egyetemváros, Hungary

János MAYER *, Department of Applied Mathematics, Computer and Automation Institute of the Hungarian Academy of Sciences, 1132 Budapest, Victor Hugo u. 18-20, Hungary

Tamás TERLAKY *, Department of Operations Research, Eötvös Loránd University, 1088 Budapest, Múzeum krt. 6-8, Hungary

Abstract: Some mathematical programming models of the mixing problem are discussed in this paper. Five models, based on different discrepancies, are considered and their fundamental properties are examined. Using the variational and Smirnov distances, linear programming models are obtained. The Pearson divergence leads to quadratic programming, the Hellinger divergence leads to $l_p$-programming, and the Kullback-Leibler information divergence gives a special geometric programming model. Finally some computational experiences are presented.

Keywords: Statistical estimation, mixing, distance, information divergence, linear programming, quadratic programming, $l_p$-programming, geometric programming.

1. Introduction

Mathematical models of mixing problems have several applications in practice. Two examples are the following:

Gasoline blending. From given hydrocarbon distillates the 'best approximation' of a given kind of petrol with prescribed quality parameters is to be mixed. (Parameters may be given according to any standards.) Because of its practical importance this problem received attention already in the early years of mathematical programming, see [3,7].

* Research partially supported by the Hungarian National Research Foundation under Grant no. OTKA-1049.

Received April 1988; revised August 1988

Concrete mixing. From different sand and gravel components with various granule distributions the 'best approximation' of the 'ideal' sand and gravel mixture for a given kind of concrete is to be mixed. This is one of the standard problems engineers are faced with when building trusses, see [26,20].

Mixing of chemicals, liquids, gases or other kinds of material frequently leads to the same mathematical problem. This indicates that the mixing problem considered in this paper has several practical applications. Five mathematical models of the mixing problem are studied here. The mixing problem is closely related to the decomposition of distributions, see Medgyessy [18].

The general mathematical model of mixing and five measures of discrepancy are presented in the second section. Four of these were summarized by Csiszár [5,6] as special cases of the f-divergence, while the fifth (the Smirnov distance) can be found in Rényi's book [21]. Section 2 also contains a brief discussion of the pertinent works in the literature. Based on the different discrepancies, five models are constructed in the third section. Basic properties and the special structure of these models, as well as their interconnections, are examined. In the fourth section computational results are presented, and the final section contains a brief summary together with conclusions.

2. Mathematical models for mixing

The general mathematical model of mixing is presented in this section, and five divergences, frequently used in mathematical statistics, are discussed.

The mathematical model of mixing. Let $a^{(i)} = (\alpha_{i1}, \ldots, \alpha_{ij}, \ldots, \alpha_{in})$, $i = 1, \ldots, m$, be probability distributions. Denote by

$$A = \begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(m)} \end{pmatrix} = (a_1, \ldots, a_j, \ldots, a_n)$$

the $m \times n$ matrix having $a^{(1)}, \ldots, a^{(m)}$ as its row vectors; $a_1, \ldots, a_j, \ldots, a_n$ denote the column vectors of the matrix $A$. Using the distributions $a^{(1)}, \ldots, a^{(m)}$, our aim is to find the best approximation of the probability distribution

$$c = (\gamma_1, \ldots, \gamma_j, \ldots, \gamma_n).$$

In this case $A \geq 0$, $A \cdot 1 = 1$, $c \cdot 1 = 1$, where $1 = (1, \ldots, 1)$ denotes the all-one vector.

Problem. Find a vector $x \in \mathbb{R}^m$ for which $1 \cdot x = 1$, $x \geq 0$ (mixing conditions) hold, and the mixture $xA = z$ is a 'good' approximation of the goal distribution, i.e. for which $D(z \| c)$ is minimal.

Here the function $D(z \| c)$ measures the deviation of the distributions $z$ and $c$. Depending on the choice of the function $D(z \| c)$, different mixing models are obtained.

Divergence functions. It was proved by Csiszár [5,6] that most of the divergences that are commonly used to measure the discrepancy between distributions belong to the family of f-divergences. Some of these are presented here.

Let $f: \mathbb{R}_+ \to \mathbb{R}$ be a convex function with $f(1) = 0$; for the sake of convenience we may assume that

$$f(0) = \lim_{u \to 0+} f(u), \qquad 0 \cdot f\!\left(\frac{a}{0}\right) = a \cdot \lim_{u \to \infty} \frac{f(u)}{u}.$$

Definition 2.1. Let $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ be two arbitrary distributions; then

$$D_f(p \| q) = \sum_{j=1}^{n} q_j \cdot f\!\left(\frac{p_j}{q_j}\right)$$

is the f-divergence of $p$ and $q$.

With different functions $f$, different discrepancies are derived.

Variational distance ($l_1$-norm). Let $f(u) = |u - 1|$; then

$$D_f(p \| q) = \sum_{j=1}^{n} q_j \left| \frac{p_j}{q_j} - 1 \right| = \sum_{j=1}^{n} |p_j - q_j|.$$

Pearson's Q-divergence. Let $f(u) = (u - 1)^2$; then

$$D_f(p \| q) = \sum_{j=1}^{n} q_j \left( \frac{p_j}{q_j} - 1 \right)^2 = \sum_{j=1}^{n} \frac{(p_j - q_j)^2}{q_j}.$$

Kullback-Leibler information divergence (entropy). Let $f(u) = u \cdot \log u$, and for the sake of simplicity of presentation let

$$0 \cdot \log \frac{0}{a} = \lim_{u \to 0+} u \cdot \log \frac{u}{a} = \lim_{u \to 0+} u \cdot \log u = 0$$

and

$$a \cdot \log \frac{a}{0} = \lim_{u \to 0+} a \cdot \log \frac{a}{u} = a \cdot \log a - a \cdot \lim_{u \to 0+} \log u = +\infty.$$


So in this case

$$D_f(p \| q) = \sum_{j=1}^{n} q_j \cdot \frac{p_j}{q_j} \log \frac{p_j}{q_j} = \sum_{j=1}^{n} p_j \log \frac{p_j}{q_j}.$$

This $D_f(p \| q)$ is called information divergence.

Hellinger divergence. Let $f(u) = 1 - \sqrt{u}$; then

$$D_f(p \| q) = \sum_{j=1}^{n} q_j \left( 1 - \sqrt{\frac{p_j}{q_j}} \right) = 1 - \sum_{j=1}^{n} \sqrt{p_j q_j}.$$

Smirnov distance ($l_\infty$-norm). Let $P = (P_1, \ldots, P_n)$ and $Q = (Q_1, \ldots, Q_n)$ be cumulative distributions, i.e.

$$P_j = \sum_{i=1}^{j} p_i \quad \text{and} \quad Q_j = \sum_{i=1}^{j} q_i, \qquad j = 1, \ldots, n.$$

Then

$$D(p \| q) = \max_{1 \leq j \leq n} |P_j - Q_j|$$

is the Smirnov distance of the distributions $p$ and $q$.

Mixing models defined by the above given distances are examined in the next section. For the sake of simplicity, mixing models obtained by using the variational, Pearson, etc. discrepancy measures will be called the variational model, the Pearson model, etc.
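To make the definitions concrete, here is a minimal sketch (our illustration, not part of the original paper) that evaluates the five discrepancy measures for two discrete distributions; it assumes $q > 0$ and, for the Kullback-Leibler case, $p > 0$ (otherwise the conventions above, e.g. $0 \cdot \log 0 = 0$, must be handled explicitly):

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p || q) = sum_j q_j * f(p_j / q_j), cf. Definition 2.1; assumes q > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(q * f(p / q))

# The four f-divergences used in the paper:
variational = lambda p, q: f_divergence(p, q, lambda u: np.abs(u - 1.0))
pearson     = lambda p, q: f_divergence(p, q, lambda u: (u - 1.0) ** 2)
kullback    = lambda p, q: f_divergence(p, q, lambda u: u * np.log(u))  # needs p > 0
hellinger   = lambda p, q: f_divergence(p, q, lambda u: 1.0 - np.sqrt(u))

def smirnov(p, q):
    """Smirnov distance: max_j |P_j - Q_j| over the cumulative distributions."""
    return np.max(np.abs(np.cumsum(p) - np.cumsum(q)))
```

The Smirnov distance is kept separate because, unlike the other four, it is not an f-divergence.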

The variational, Pearson, Kullback-Leibler and Hellinger models as formulated above belong to the following class of linearly constrained statistical estimation problems:

minimize $D(p \| q)$
subject to $1 \cdot p = 1$, $p \geq 0$, $p \in \mathcal{P}$, (2.1)

where $\mathcal{P}$ is a given convex polyhedron and $q$ is a fixed discrete probability distribution.

Using the variational distance in (2.1) leads to a problem which can easily be reformulated as a linear programming problem. This is common knowledge, see e.g. [23].

Before discussing the other three models a short comment is to be made. The polyhedron $\mathcal{P}$ in (2.1) can either be given as an intersection of half-spaces, i.e. $\mathcal{P} = \{p \mid Hp \leq h\}$ for some given matrix $H$ and vector $h$, or it can be specified as the convex hull of some given vectors, i.e.

$$\mathcal{P} = \operatorname{conv}\{a^{(1)}, \ldots, a^{(m)}\}.$$

Our mixing problem as formulated above clearly falls in this second class of problems. It is a well-known fact that these two formulations are far from being equivalent from the point of view of the numerical solution of problems with linear constraints. The transition from one representation of $\mathcal{P}$ to the other requires the application of algorithms which are very time consuming.

Selecting the Pearson divergence as the objective function in (2.1), the resulting problem is clearly a convex quadratic programming problem. In the case when $\mathcal{P}$ is given by a system of linear inequalities, standard quadratic programming methods can be applied, while for the representation of $\mathcal{P}$ as a convex hull, special-purpose algorithms have been developed, see Wolfe [27,28].

Using the information divergence in (2.1) results in estimation models introduced by Kullback and Leibler, see Kullback [16]. These types of models with an inequality representation of $\mathcal{P}$ have been thoroughly investigated by Jaynes, Kullback and Tribus, see Jaynes [14]; a corresponding duality theory was developed and dual-type algorithms were constructed via unconstrained convex programming by Brockett, Charnes and Cooper [2], and Charnes and Cooper [4]. Quadratically constrained models of this type have been introduced and studied by Fang and Rajasekera [12]. The connection of (2.1) with geometric programming is well known, see Duffin and Peterson [9]; a generalization is given in Scott and Jefferson [24]. In this paper problem (2.1) is considered with the alternative representation of $\mathcal{P}$; in fact, $\mathcal{P}$ is assumed to be given as the convex hull of a finite set of distributions. Using geometric programming, this gives rise to a duality relationship and a dual-type algorithmic approach different from those given in [2,4] (constrained dual).

For the case when Hellinger distance is used in (2.1) the authors of this paper did not succeed in finding pertinent works in the literature.

The Smirnov model does not belong to the family specified by (2.1), but it is a well-known fact that it can be reformulated as a linear programming problem, see e.g. [23].

Concerning our notation, remark that matrices will be denoted by Latin capital letters, vectors by Latin small letters, and their components by the corresponding Greek letters, e.g.

$$A = \begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(m)} \end{pmatrix} = (a_1, \ldots, a_n) = (\alpha_{ij}), \quad i = 1, \ldots, m, \; j = 1, \ldots, n.$$

The following notations for the $l_1$- and $l_\infty$-norms will also be used:

$$\|p\|_1 = \sum_{j=1}^{n} |p_j| \quad \text{and} \quad \|p\|_\infty = \max_{1 \leq j \leq n} |p_j|.$$

3. Duality relationships, algorithmic aspects

In this section the mixing models obtained by the above introduced discrepancy measures are examined. By constructing primal-dual pairs of mathematical programming problems, algorithms for the solution of the mixing models are proposed as well. For convenience, positivity of the goal distribution ($c > 0$) is assumed below. To allow for a short formulation of the duality propositions, the following notion will be introduced.

Definition (Gapless duality). Two mathematical programming problems are said to be gapless duals of each other if

(i) one of them is formulated as a minimization, the other one as a maximization problem,

(ii) there exists an optimal solution for both problems and the optimal values of the objective functions coincide,

(iii) for any feasible solution of the minimization problem the objective function value majorizes the objective function value of the maximization problem at any feasible solution of that problem.

3.1. Variational model

If the variational distance measures the discrepancy between two distributions, we have the following mixing model.

Find a vector $x \in \mathbb{R}^m$ for which

$$\sum_{j=1}^{n} |x a_j - \gamma_j| \quad \text{is minimal}, \qquad (3.1)$$

subject to $1 \cdot x = 1$, $x \geq 0$.

The basic duality relationship concerning this estimation problem is formulated in the following proposition.

Proposition 1. The following optimization problems are gapless duals.

Primal problem (P):

minimize $\|xA - c\|_1$
subject to $1 \cdot x = 1$, $x \geq 0$.

Dual problem (D):

maximize $cy + \eta$
subject to $Ay + 1 \cdot \eta \leq 0$, $\|y\|_\infty \leq 1$.

Proof. Introducing new variables $z^+ = (\zeta_1^+, \ldots, \zeta_n^+) \geq 0$ and $z^- = (\zeta_1^-, \ldots, \zeta_n^-) \geq 0$ with $\zeta_j^+ - \zeta_j^- = x a_j - \gamma_j$, $j = 1, \ldots, n$, the primal problem is equivalent to the following linear programming problem:

minimize $1 \cdot z^+ + 1 \cdot z^-$
subject to $A^T x + z^+ - z^- = c$, (3.2)
$1 \cdot x = 1$, $x \geq 0$, $z^+ \geq 0$, $z^- \geq 0$.

The dual of (3.2) is as follows:

maximize $cy + \eta$
subject to $Ay + 1 \cdot \eta \leq 0$, (3.3)
$-1 \leq y \leq 1$.

Problem (3.3) is the linear programming formulation of the dual in the proposition, so LP duality proves Proposition 1. □

Remark. The linear programming primal-dual formulation of the variational model is well known, see e.g. [23].

The matrix structure of problems (3.2) and (3.3) is illustrated in Figure 1.

[Figure 1. Structure of the LP-equivalent of the variational model]

A special feature of problem (3.2) is that it contains all positive and negative unit vectors except $\pm e_{n+1}$. Utilizing this special structure, a starting feasible solution can be found by a single pivot operation, and so the first phase of the simplex method becomes trivial. In fact, any unit element of the last row (e.g. the first one) can be chosen as the first pivot element. In this case the unit matrices $\pm E$ of the tableau remain unchanged and, depending on the sign of the new right-hand side components, the previously chosen vector together with unit vectors $+e_j$ or $-e_j$, $j = 1, \ldots, n$, can be chosen as an initial feasible basis to start the simplex method with.

To summarize the considerations above: The variational model has been transformed into a special linear programming problem where the first phase of the simplex method turns out to be trivial.

Remark. When the dual problem (3.3) is solved by the simplex method, the upper-bounding technique can be used. Fixing the components of $y$ at their lower bound ($-1$), the slack variables provide an initial feasible solution.
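As an illustration of the reformulation (3.2), the following sketch (ours, not the authors' special-purpose simplex implementation) sets up and solves the variational model with a general-purpose LP solver:

```python
import numpy as np
from scipy.optimize import linprog

def variational_mix(A, c):
    """Solve min ||xA - c||_1 s.t. 1.x = 1, x >= 0 via the LP (3.2).
    Variables are stacked as (x, z_plus, z_minus)."""
    m, n = A.shape
    obj = np.concatenate([np.zeros(m), np.ones(2 * n)])
    A_eq = np.zeros((n + 1, m + 2 * n))
    A_eq[:n, :m] = A.T                       # A^T x
    A_eq[:n, m:m + n] = np.eye(n)            # + z_plus
    A_eq[:n, m + n:] = -np.eye(n)            # - z_minus
    A_eq[n, :m] = 1.0                        # 1.x = 1
    b_eq = np.concatenate([c, [1.0]])
    res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x[:m], res.fun
```

A tailored implementation would instead exploit the $\pm E$ blocks to skip the first simplex phase, as described above.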

3.2. Pearson model

Using Pearson's Q-divergence for building the mixing model, the following problem is derived:

minimize $(xA - c) S (A^T x - c)$
subject to $1 \cdot x = 1$, (3.4)
$x \geq 0$,

where $S$ is an $n \times n$ matrix defined by $S = \operatorname{diag}(1/\gamma_j)$, i.e. $\sigma_{jj} = 1/\gamma_j$, $j = 1, \ldots, n$, and $\sigma_{ij} = 0$ if $i \neq j$.

Proposition 2. An equivalent formulation of the Pearson model is the following quadratic programming problem.

Primal problem (P):

minimize $\tfrac{1}{2} z^T S z$
subject to $xA - z = 0$, (3.5)
$1 \cdot x = 1$, $x \geq 0$.

Let us formulate the following dual problem.

Dual problem (D):

maximize $-\tfrac{1}{2} y^T S^{-1} y + \eta$ (3.6)
subject to $Ay + 1 \cdot \eta \leq 0$.

The above formulated problems (3.5) and (3.6) are gapless duals.

Proof. Denote $z = xA$; then the objective function in (3.4) can be reformulated as follows:

$$(z^T - c^T) S (z - c) = z^T S z - 2 z^T S c + c^T S c.$$

Using the relations $Sc = 1$ and $z \cdot 1 = 1$, it is clear that problems (3.4) and (3.5) are equivalent. The gapless duality follows from $l_p$-programming duality theory, see e.g. Terlaky [25]. □

It is well known, see [25], that

$$y + Sz = 0, \qquad x(Ay + 1 \cdot \eta) = 0, \qquad (3.7)$$

together with primal and dual feasibility, is an optimality criterion for problems (3.5) and (3.6).

Remark. Notice that for any optimal solution $(y, \eta)$ of (3.6), $Ay \leq 0$ holds, since $y = -Sz = -S A^T x$ for all optimal solutions $x$ of (3.5). This implies $Ay = -A S A^T x \leq 0$, since $A \geq 0$, $S \geq 0$, and $x \geq 0$. (This property will be used in the algorithm for solving the Kullback-Leibler model.)

Quadratic programming problems (3.5) and (3.6) will be solved as a linear complementarity problem. The corresponding linear complementarity problem is the following:

$$xA - z = 0, \quad 1 \cdot x = 1, \quad Ay + 1 \cdot \eta + u = 0, \quad y + Sz = 0, \quad x \geq 0, \quad u \geq 0, \quad xu = 0.$$

Using the relations $z = xA$ and $y = -Sz$, and introducing the notation $P = A S A^T$, the following linear complementarity problem is obtained:

$$1 \cdot x = 1, \quad 1 \cdot \eta - Px + u = 0, \quad x \geq 0, \quad u \geq 0, \quad xu = 0. \qquad (3.8)$$

The data structure of problem (3.8) is illustrated in Figure 2.

In this case the variables $(\eta, \mu_1, \mu_2, \ldots, \mu_m)$ give a starting basis (by two pivot operations; the pivot positions for Lemke's [17] complementary pivot algorithm are indicated in Figure 2).

[Figure 2. Structure of the linear complementarity problem corresponding to the Pearson model]

Note that $\eta$ is a free variable, and so Lemke's method can be modified, like the simplex method, so that $\eta$ is kept as a basic variable, i.e. it is never a candidate for leaving the basis.
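For readers without a Lemke implementation at hand, the Pearson model (3.4) can also be treated directly as a convex QP; a minimal sketch (our own, using scipy's SLSQP rather than complementary pivoting):

```python
import numpy as np
from scipy.optimize import minimize

def pearson_mix(A, c):
    """Pearson model (3.4): minimize (xA - c) S (xA - c)^T with S = diag(1/c_j),
    subject to 1.x = 1, x >= 0."""
    m, _ = A.shape
    s = 1.0 / c                                  # diagonal of S
    objective = lambda x: np.sum(s * (x @ A - c) ** 2)
    cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0},)
    res = minimize(objective, np.full(m, 1.0 / m),
                   bounds=[(0.0, None)] * m, constraints=cons, method='SLSQP')
    return res.x
```

Lemke's method remains preferable for the structured LCP (3.8); the sketch only shows that the model itself is a standard convex QP.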

3.3. Kullback-Leibler model

Using information divergence for measuring the discrepancy between the goal distribution and the mixture we have the following problem.

Find a vector $x \in \mathbb{R}^m$ which

minimizes $\sum_{j=1}^{n} \zeta_j \log \frac{\zeta_j}{\gamma_j}$
subject to $xA - z = 0$, (3.9)
$1 \cdot x = 1$, $x \geq 0$.

The duality relationship underlying this problem is formulated as follows.

Proposition 3. The following primal-dual problems are gapless duals. The primal is equivalent to (3.9).

Primal problem (P):

minimize $-\sum_{j=1}^{n} \zeta_j \log \gamma_j + \sum_{j=1}^{n} \zeta_j \log \zeta_j$
subject to $xA - z = 0$, (3.10)
$1 \cdot z = 1$, $x \geq 0$, $z \geq 0$.

Dual problem (D):

maximize $-\log \sum_{j=1}^{n} \gamma_j e^{-\eta_j}$ (3.11)
subject to $Ay \leq 0$.


Proof. Since $xA = z$ and $A \geq 0$, $z \geq 0$, and $A \cdot 1 = 1$, these relations imply that $1 \cdot x = 1$ is equivalent to $1 \cdot z = 1$ in this case. So (3.9) is equivalent to (3.10). It can easily be seen that problem (3.10) is a geometric programming problem (Duffin, Peterson and Zener [8]; Klafszky [15]) whose dual problem is as follows:

maximize $\eta$
subject to $e^{a^{(i)} y} \leq 1$, $i = 1, \ldots, m$, (3.12)
$e^{\eta} \sum_{j=1}^{n} \gamma_j e^{-\eta_j} \leq 1$.

The first group of constraints in problem (3.12) can be reformulated as the inequality $Ay \leq 0$ by taking the logarithm of both sides. The last inequality is clearly active at any optimal solution, so by eliminating the variable $\eta$ the dual problem of the proposition results. Observing the Slater regularity of problem (3.10), the gapless duality follows from geometric programming duality theory [8,15]. □

Problem (3.12) is clearly equivalent to the following problem:

minimize $\sum_{j=1}^{n} \gamma_j e^{-\eta_j}$ (3.13)
subject to $Ay \leq 0$.

Let us consider the system of equilibrium conditions of problems (3.10) and (3.12), as given below:

$$xA - z = 0, \qquad Ay \leq 0,$$
$$1 \cdot z = 1, \qquad e^{\eta} \sum_{j=1}^{n} \gamma_j e^{-\eta_j} = 1,$$
$$x \geq 0, \quad z \geq 0, \qquad xAy = 0,$$
$$\gamma_j e^{-\eta_j + \eta} = \zeta_j, \quad j = 1, \ldots, n.$$

Let us introduce the notation

$$G(y) = \sum_{j=1}^{n} \gamma_j e^{-\eta_j}.$$

In this case $G(y) = \|\nabla G(y)\|_1$; moreover, the variables $z$ and $\eta$ can be eliminated from the equilibrium system, since

$$e^{\eta} = \frac{1}{G(y)} \quad \text{and} \quad z = \frac{-1}{G(y)} \nabla G(y)$$

hold. This means that the equilibrium system can equivalently be formulated in the simple form as follows:

$$xA = \frac{-1}{G(y)} \nabla G(y), \qquad Ay \leq 0, \qquad x \geq 0, \qquad xAy = 0. \qquad (3.14)$$

Notice that system (3.14) is exactly the Kuhn-Tucker system of the dual problem (3.13).

Numerical solution of the Kullback-Leibler model. One possible way to solve the Kullback-Leibler model would be to solve directly problem (3.9) or the equivalent problem

minimize $\sum_{j=1}^{n} (a_j x) \log \frac{a_j x}{\gamma_j}$
subject to $1 \cdot x = 1$, $x \geq 0$.

Another possibility is the solution of the dual problem (3.13); in this case the optimal values of the primal variables can be obtained from the equilibrium conditions (3.14).

The first way is alluring, since the reformulated primal problem is a convex programming problem with a single linear equality constraint and nonnegativity requirements for the variables. The pitfall inherent in this formulation can be discovered when the partial derivatives of the objective are computed: it turns out that the objective function is not differentiable along the subspaces $a_j x = 0$, $j = 1, \ldots, n$. Considering the geometric programming formulation (3.10) of the primal problem, it turns out that this trouble is just a special instance of a general difficulty of the same nature, reported in connection with the numerical solution of general geometric programming problems [22]. A new approach, based on controlled dual perturbations, has recently been published by Fang, Peterson and Rajasekera [11] for handling this problem.

To avoid the difficulties mentioned above, in our approach the second way was chosen, i.e. the dual problem (3.13) was solved and the optimal proportions for mixing were computed from the Kuhn-Tucker system (3.14).

Algorithm. The algorithm developed for the solution of the linearly constrained nonlinear programming problem (3.13) is based on the reduced gradient type algorithm of Murtagh and Saunders, which was successfully implemented in their MINOS program package [19]. So the nonbasic variables are divided into two subsets, as free ('superbasic') and fixed ('nonbasic') variables. For the minimization in the subspace of free variables the Davidon-Fletcher-Powell method [13] has been implemented. The special features of the problem have been utilized in the following way:

Starting point. In connection with the Pearson model it has been shown above that for the optimal solution of that model the inequality $Ay \leq 0$ holds. This means that the optimal solution of the Pearson model can be used as a feasible starting point for our algorithm. Since the Pearson divergence can be considered as an approximation of the Kullback-Leibler divergence (using a Taylor series expansion), this provides a 'good' starting point.
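For completeness, here is the expansion behind this remark (our addition; it is a standard fact in mathematical statistics, cf. [1]): writing $p_j = q_j(1 + t_j)$ and expanding $\log(1 + t_j)$ to second order,

$$\sum_{j} p_j \log \frac{p_j}{q_j} = \sum_{j} q_j (1 + t_j) \log(1 + t_j) \approx \sum_{j} q_j \left( t_j + \frac{t_j^2}{2} \right) = \frac{1}{2} \sum_{j} \frac{(p_j - q_j)^2}{q_j},$$

since $\sum_j q_j t_j = \sum_j (p_j - q_j) = 0$. That is, near $p = q$ the Kullback-Leibler divergence is approximately half the Pearson divergence, which also explains the roughly constant ratio of the P and I columns observed in Tables 2, 4 and 6 below.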

No pivot transformations needed. If $A$ is of full row rank, then choosing any starting basis from the column vectors of $A$, no basis transformations are needed in the solution process, since the $\eta_j$ variables are all free variables.

Computation of primal variables. If $y^*$ is an optimal solution of problem (3.13), then an optimal solution $(x^*, z^*)$ of (3.9) can be determined as follows. (Here again full row rank of matrix $A$ is assumed.) It was proved above that the following relation determines the optimal $z^*$:

$$z^* = \frac{-1}{G(y^*)} \nabla G(y^*).$$

The vector $x^*$ can be determined easily as follows:

$$x^* A_B = z_B^* \;\Rightarrow\; x^* = z_B^* A_B^{-1} = \frac{-1}{G(y^*)} \nabla G(y^*)_B A_B^{-1}.$$

Here the subscripts $B$ and $N$ denote the basic and nonbasic parts, respectively. The vector

$$\nabla G(y^*)_B A_B^{-1}$$

consists of those components of the reduced gradient of $G(y^*)$ which belong to the slack variables, so it is computed at each step of the algorithm which solves (3.13).


As the proposed method can be considered as a specialization of the algorithm published in [19], its theoretical convergence behavior is at least as good as that of the method in [19].
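A compact way to reproduce this dual approach with off-the-shelf tools (a sketch under our own naming, not the authors' reduced-gradient code): solve (3.13) with a general NLP solver, recover $z$ from (3.14), and obtain the weights $x$ by nonnegative least squares:

```python
import numpy as np
from scipy.optimize import minimize, nnls

def kl_mix_dual(A, c):
    """Solve the dual (3.13): minimize G(y) = sum_j c_j * exp(-y_j), A y <= 0,
    then recover z = -grad G(y*) / G(y*) and the weights x from x A = z."""
    m, n = A.shape
    G = lambda y: np.sum(c * np.exp(-y))
    cons = ({'type': 'ineq', 'fun': lambda y: -(A @ y)},)   # A y <= 0
    res = minimize(G, np.zeros(n), constraints=cons, method='SLSQP')
    y = res.x
    z = c * np.exp(-y) / G(y)        # optimal mixture distribution, cf. (3.14)
    x, _ = nnls(A.T, z)              # nonnegative solution of A^T x = z
    return x / x.sum(), z
```

The final normalization guards against small numerical drift; with exact arithmetic $1 \cdot x = 1$ would hold automatically.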

3.4. Hellinger model

If the Hellinger divergence is used as the objective function of the mixing problem, then the following model results:

minimize $1 - \sum_{j=1}^{n} \sqrt{(a_j x)\, \gamma_j}$ (3.15)
subject to $1 \cdot x = 1$, $x \geq 0$.

Proposition 4. Problem (3.15) can equivalently be formulated as an $l_p$-programming problem.

Proof. First notice that the minimization of

$$1 - \sum_{j=1}^{n} \sqrt{(a_j x)\, \gamma_j}$$

is equivalent to the maximization of

$$\sum_{j=1}^{n} \sqrt{(a_j x)\, \gamma_j}.$$

Introducing new variables $\omega_j$, $j = 1, \ldots, n$, under the conditions $a_j x \geq \omega_j^2$, $j = 1, \ldots, n$, the maximization of the above specified function is equivalent to the maximization of

$$\sum_{j=1}^{n} \sqrt{\gamma_j}\, \omega_j,$$

since increasing $\omega_j$ implies increasing $a_j x$. Clearly $a_j x = \omega_j^2$ for all $j$'s at an optimal solution.

Since $a^{(i)} \geq 0$ and $a^{(i)} \neq 0$ hold, the constraint $1 \cdot x = 1$ can be replaced by the constraint $1 \cdot x \leq 1$, because again equality holds for all optimal solutions.

Based on the above considerations, problem (3.15) can be reformulated as the following $l_p$-programming primal problem:

maximize $\sum_{j=1}^{n} \delta_j \omega_j$
subject to $\tfrac{1}{2} \omega_j^2 - \tfrac{1}{2} a_j x \leq 0$, $j = 1, \ldots, n$, (3.16)
$1 \cdot x - 1 \leq 0$, $-Ex \leq 0$,

where $\delta_j = \sqrt{\gamma_j}$, $j = 1, \ldots, n$.

This proves the proposition. □

Proposition 5. Problem (3.16) as primal problem and the dual given below are gapless duals.

Dual problem (D):

minimize $-\frac{\eta}{2} + \sum_{j=1}^{n} \frac{\gamma_j}{2\eta_j}$
subject to $Ay + 1 \cdot \eta \leq 0$, (3.17)
$y > 0$, $\eta \leq 0$.

Proof. Utilizing theoretical results of $l_p$-programming [25], the dual problem of (3.16) can be formulated as follows:

minimize $\zeta + \sum_{j=1}^{n} \frac{\delta_j^2}{2\eta_j}$
subject to $-\tfrac{1}{2} Ay + 1 \cdot \zeta - v = 0$,
$y \geq 0$, $\zeta \geq 0$, $v \geq 0$,
$\eta_j = 0 \Rightarrow \delta_j = 0$, $j = 1, \ldots, n$.

This problem simplifies if we use the conditions $\delta_j = \sqrt{\gamma_j} > 0$: in this case $\eta_j = 0$ cannot occur, i.e. $y > 0$ holds for all feasible solutions. The dual problem becomes the following:

minimize $\zeta + \sum_{j=1}^{n} \frac{\gamma_j}{2\eta_j}$
subject to $\tfrac{1}{2} Ay \leq 1 \cdot \zeta$,
$y > 0$, $\zeta \geq 0$.

By introducing the notation $\eta = -2\zeta$, the dual problem (3.17) given in the proposition has been derived. Observing that problem (3.16) is Slater regular, the desired gapless duality relationship results from $l_p$-programming duality theory, see [25]. □

For simplicity, denote $G(y) = \sum_{j=1}^{n} \gamma_j / \eta_j$ in the sequel.

Let us consider the equilibrium system of the primal-dual pair (3.16) and (3.17). This is the following system:

$$\omega_j^2 - a_j x \leq 0, \qquad Ay + 1 \cdot \eta \leq 0,$$
$$1 \cdot x \leq 1, \qquad \eta \leq 0,$$
$$x \geq 0, \qquad y > 0,$$
$$\xi_i \left( a^{(i)} y + \eta \right) = 0, \quad i = 1, \ldots, m,$$
$$\eta \left( 1 \cdot x - 1 \right) = 0,$$
$$\eta_j \left( \omega_j^2 - a_j x \right) = 0, \quad j = 1, \ldots, n,$$
$$\eta_j \omega_j = \delta_j, \quad j = 1, \ldots, n.$$

Considering the complementarity conditions, the first one is obviously equivalent to the equality $xAy + \eta = 0$, and the third one to $\omega_j^2 - a_j x = 0$ for all $j$'s, since $y > 0$ holds. Furthermore, matrix $A$ has no zero vector among its row vectors, so $\eta < 0$ and $1 \cdot x = 1$ hold. Using these observations, the above given equilibrium system has the following equivalent formulation:

$$xA = -\nabla G(y), \qquad Ay + 1 \cdot \eta \leq 0,$$
$$1 \cdot x = 1, \qquad y > 0, \qquad x \geq 0, \qquad (3.18)$$
$$xAy = -\eta,$$

where

$$\nabla G(y) = \left( -\frac{\gamma_1}{\eta_1^2}, \ldots, -\frac{\gamma_n}{\eta_n^2} \right).$$

Since $G(y) = -\nabla G(y) \cdot y$, for any vectors satisfying (3.18) the following relations hold:

$$-\eta = xAy = -\nabla G(y) \cdot y = G(y).$$

By eliminating $\eta$, (3.18) is equivalent to the following:

$$xA = -\nabla G(y), \qquad Ay - 1 \cdot G(y) \leq 0,$$
$$1 \cdot x = 1, \qquad y > 0, \qquad x \geq 0. \qquad (3.19)$$

It can easily be derived from the considerations above that for any optimal solution of the dual problem (3.17), $\eta = -G(y)$ holds, so (3.17) is equivalent to the following convex programming problem:

minimize $G(y)$
subject to $Ay - 1 \cdot G(y) \leq 0$, $y > 0$.

Notice that for optimal solutions

$$\|\nabla G(y)\|_1 = -\nabla G(y) \cdot 1 = xA \cdot 1 = x \cdot 1 = 1$$

holds.

Numerical solution of the Hellinger model. Considering the numerical solution of the primal problem (3.15), the same arguments hold as for the Kullback-Leibler model, i.e. the objective function is not differentiable along the subspaces $a_j x = 0$, $j = 1, \ldots, n$, which is a source of several numerical difficulties. So the dual problem (3.17) was solved, and the optimal value of the mixing variable $x$ was computed utilizing the equilibrium conditions (3.19). For general $l_p$-problems a new algorithm has been published recently by Fang and Rajasekera [10]: they handle the above-mentioned difficulties directly in the primal problem by controlled dual perturbations.

For the solution of the linearly constrained problem (3.17) the same version of the reduced gradient method as for the Kullback-Leibler model has been implemented. The following special features of the problem are utilized.

Starting point selection, no pivoting needed. The remarks made in Section 3.3 concerning the basis are valid here as well. Starting from a feasible solution of the dual problem (3.17), the inequality $y > 0$ holds throughout the procedure, since the objective function acts in this case like a penalty function.

Computation of primal variables. Having computed the optimal solution of (3.17), the optimal solution of the primal problem can be computed analogously to the case of the Kullback-Leibler model.

For the theoretical convergence behavior the same remark holds as for the Kullback-Leibler model.
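Analogously to the Kullback-Leibler case, the dual route can be sketched with a general-purpose solver (our illustration; the interior lower bound eps stands in for the open constraint $y > 0$, and the recovery of $x$ uses nonnegative least squares instead of the basis formula):

```python
import numpy as np
from scipy.optimize import minimize, nnls

def hellinger_mix_dual(A, c, eps=1e-9):
    """Solve the dual (3.17): minimize -eta/2 + sum_j c_j / (2 y_j),
    subject to A y + 1*eta <= 0, y > 0, eta <= 0; recover x via (3.19)."""
    m, n = A.shape
    obj = lambda v: -0.5 * v[n] + 0.5 * np.sum(c / v[:n])
    cons = ({'type': 'ineq', 'fun': lambda v: -(A @ v[:n] + v[n])},)
    bounds = [(eps, None)] * n + [(None, 0.0)]
    v0 = np.concatenate([np.ones(n), [-1.0]])   # feasible: A.1 = 1, eta = -1
    res = minimize(obj, v0, bounds=bounds, constraints=cons, method='SLSQP')
    y = res.x[:n]
    z = c / y ** 2                   # z = x A = -grad G(y), cf. (3.19)
    x, _ = nnls(A.T, z)
    return x / x.sum()
```

The starting point exploits the fact that the rows of $A$ are distributions, so $y = 1$, $\eta = -1$ is dual feasible.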

3.5. Smirnov model

If cumulative distributions are mixed instead of probability distributions (densities), then Smirnov distance can be used.

For simplicity of notation, pass to the cumulative distributions, i.e. redefine

$$\gamma_j := \sum_{k=1}^{j} \gamma_k, \quad j = 1, \ldots, n, \qquad \alpha_{ij} := \sum_{k=1}^{j} \alpha_{ik}, \quad j = 1, \ldots, n; \; i = 1, \ldots, m.$$

So the following Smirnov model results:

minimize $\max_{1 \leq j \leq n} |x a_j - \gamma_j|$
subject to $1 \cdot x = 1$, (3.20)
$x \geq 0$.

The underlying duality relationship is as follows.

Proposition 6. The following two problems are gapless duals.

Primal problem (P):

minimize $\|xA - c\|_\infty$
subject to $1 \cdot x = 1$, $x \geq 0$.

Dual problem (D):

maximize $cy + \eta$
subject to $Ay + 1 \cdot \eta \leq 0$, $\|y\|_1 \leq 1$.

Proof. The primal problem is clearly equivalent to the following linear programming problem:

minimize $\zeta$
subject to $xA + \zeta \cdot 1 \geq c$,
$-xA + \zeta \cdot 1 \geq -c$, (3.21)
$x \cdot 1 = 1$, $x \geq 0$.

The dual problem of (3.21) is the following:

maximize $c y^1 - c y^2 + \eta$
subject to $A y^1 - A y^2 + 1 \cdot \eta \leq 0$, (3.22)
$1 \cdot y^1 + 1 \cdot y^2 = 1$, $y^1 \geq 0$, $y^2 \geq 0$,

which is just a reformulation of the dual problem given in the proposition. It is obvious that at an optimal solution $\|y\|_1 = 1$ holds. So the gapless duality follows from LP duality theory. □

Remark. The primal-dual LP-formulation of the Smirnov model is well known, see e.g. [23].

Considering the size of problems (3.21) and (3.22), it is more convenient to solve the dual problem (3.22) with the simplex method. The matrix structure of problem (3.22) is illustrated in Figure 3.

The first phase of the simplex method can be passed by performing a single pivot step. In fact, one can choose an element of the shaded part of the last row (e.g. the first one) as a pivot element.

[Figure 3. Structure of the LP-equivalent of the Smirnov model]

After pivoting, the right-hand side is nonnegative again (since $A \geq 0$ holds), and so the variable corresponding to the pivot column together with the slack variables provide a feasible basic solution.

Note that simply by two pivots an initial feasible basis can be constructed for the primal problem as well. The verification of this is left to the reader.

Remark. Comparing the duality relations in Proposition 6 with the analogous relations given in connection with the variational model, the complete symmetry of the $l_1$- and $l_\infty$-norms can be observed.
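To round off Section 3, here is a sketch of the Smirnov model via the LP (3.21) (again our own setup with a generic solver; the paper's implementation solves the dual (3.22) instead):

```python
import numpy as np
from scipy.optimize import linprog

def smirnov_mix(A, c):
    """Smirnov model via the LP (3.21); the cumulation is done here, so A and c
    are the raw (non-cumulative) distributions. Variables: (x, zeta)."""
    m, n = A.shape
    Ac, cc = np.cumsum(A, axis=1), np.cumsum(c)
    obj = np.concatenate([np.zeros(m), [1.0]])
    # x Ac + zeta >= cc  and  x Ac - zeta <= cc, written as A_ub v <= b_ub:
    A_ub = np.block([[-Ac.T, -np.ones((n, 1))],
                     [ Ac.T, -np.ones((n, 1))]])
    b_ub = np.concatenate([-cc, cc])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.fun
```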

4. Computational experiences

A program system, based on the proposed algorithms, has been developed for the solution of the different mixing models, in order to be able to compare them computationally.

The program system has been developed in the FORTRAN-77 language and was run on an 8 MHz IBM PC/AT equipped with a 10 MHz math coprocessor.

The algorithms implemented for the various models utilize the specific features of the individual models, and they are based on the following methods:

Variational model: simplex method with the upper-bounding technique for the dual problem.

Pearson model: Lemke's complementary pivot method.

Kullback-Leibler model: reduced gradient method for the dual problem.

Hellinger model: reduced gradient method for the dual problem.

Smirnov model: simplex method for the dual problem.

In the cases of the Kullback-Leibler and Hellinger models our experience was that computing a solution of the primal problem with a given accuracy, via solving the dual and utilizing the equilibrium conditions, requires the solution of the dual with a significantly higher accuracy. So, e.g., to get a primal solution with an accuracy of $10^{-5}$, the norm of the reduced gradient for the dual problem had to be decreased below $10^{-8}$; to a primal accuracy of $10^{-3}$ there corresponds a dual accuracy of $10^{-6}$ in the same sense. For the critical operations (scalar products of vectors, divisions) double precision computations were employed.

Table 1. Granule distributions for sand and gravel mixtures

1.    0.200  0.174  0.214  0.087  0.058  0.048  0.161  0.016  0.015  0.027
2.    0.150  0.100  0.200  0.080  0.090  0.130  0.210  0.010  0.010  0.020
3.    0.000  0.000  0.010  0.140  0.210  0.120  0.070  0.440  0.010  0.000
a4.   0.265  0.140  0.180  0.130  0.100  0.070  0.055  0.040  0.005  0.015
5.    0.000  0.010  0.020  0.030  0.030  0.120  0.710  0.050  0.030  0.000
a6.   0.230  0.230  0.190  0.100  0.100  0.060  0.040  0.050  0.000  0.000
a7.   0.130  0.170  0.160  0.140  0.110  0.110  0.080  0.060  0.010  0.030
8.    0.000  0.050  0.100  0.080  0.090  0.180  0.430  0.050  0.020  0.000
9.    0.000  0.039  0.337  0.139  0.087  0.085  0.245  0.017  0.017  0.034
10.   0.640  0.189  0.070  0.006  0.005  0.013  0.042  0.005  0.010  0.020
11.   0.620  0.250  0.045  0.005  0.008  0.006  0.032  0.004  0.010  0.020
12.   0.010  0.350  0.510  0.030  0.010  0.010  0.010  0.010  0.007  0.053
13.   0.000  0.250  0.260  0.200  0.090  0.050  0.020  0.050  0.020  0.060
14.   0.000  0.000  0.000  0.000  0.000  0.000  0.020  0.330  0.300  0.350
a15.  0.200  0.100  0.170  0.140  0.120  0.090  0.090  0.020  0.020  0.050
a16.  0.110  0.080  0.150  0.120  0.120  0.130  0.130  0.090  0.020  0.050

a This distribution represents a desirable mixture for some specific kind of concrete.

Scaling the input data (the columns of matrix $A$ and the vector $c$) turned out to be a useful tool for improving the numerical stability of the algorithms.

Some of our computational results are presented here. Our models were tested on two different data sets: one of them is a practical data set and the other is randomly generated.

Practical data set. The data in this case originate in a practical mixing problem. The data represent granule distributions of different kinds of sand and gravel mixtures; the rows of Table 1 contain the granule distributions. Rows marked by an 'a' contain distributions which represent desirable mixtures for some specific kind of concrete; they can be used as goal distributions. The first column of Table 1 serves as identification for the distributions; the individual distributions will be referred to by their serial numbers.

Table 2. Divergences for Example 1

      P      I      H      V      S
P   0.129  0.060  0.030  0.303  0.113
I   0.130  0.059  0.028  0.298  0.110
H   0.131  0.059  0.028  0.297  0.110
V   0.186  0.079  0.038  0.281  0.103
S   0.263  0.123  0.062  0.406  0.054

Some typical results of our computations are presented below. The following abbreviations are used:

P: Pearson divergence.
I: I-divergence, Kullback-Leibler divergence.
H: Hellinger distance.
V: Variational distance.
S: Smirnov distance.

Example 1. Distributions 1, 2, 3, 5, 9 and 10 are mixed to approximate distribution 7. The rows of Table 2 correspond to the different mixing models; the diagonal elements are the optimal values of the divergence function, whereas the off-diagonal entries are the substitution values of the optimal weights into the other divergence functions.

Table 3. Optimal weights for Example 1

P   0.768  0.063  0.170  0.000  0.000  0.000
I   0.672  0.161  0.167  0.000  0.000  0.000
H   0.640  0.194  0.169  0.000  0.000  0.000
V   0.230  0.657  0.113  0.000  0.000  0.000
S   0.358  0.610  0.001  0.000  0.000  0.032
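The cross-tabulations of Tables 2, 4, 6 and 9 are easy to reproduce once the optimal weights are available; a small self-contained sketch (ours, with weights given as a dict mapping each model label to its weight vector):

```python
import numpy as np

def divergence_table(weights, A, c, eps=1e-12):
    """Evaluate each model's optimal mixture under all five measures,
    reproducing the layout of Tables 2, 4, 6 and 9."""
    measures = {
        'P': lambda z: np.sum((z - c) ** 2 / c),
        'I': lambda z: np.sum(z * np.log((z + eps) / c)),   # eps guards log(0)
        'H': lambda z: 1.0 - np.sum(np.sqrt(z * c)),
        'V': lambda z: np.sum(np.abs(z - c)),
        'S': lambda z: np.max(np.abs(np.cumsum(z) - np.cumsum(c))),
    }
    return {model: {name: f(x @ A) for name, f in measures.items()}
            for model, x in weights.items()}
```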


Table 4. Divergences for Example 2

      P      I      H      V      S
P   0.024  0.012  0.006  0.147  0.042
I   0.024  0.012  0.006  0.148  0.042
H   0.024  0.012  0.006  0.148  0.042
V   0.046  0.021  0.010  0.149  0.053
S   0.029  0.015  0.008  0.150  0.019

Table 6. Divergences for Example 3

      P      I      H      V      S
P   0.101  0.051  0.026  0.282  0.118
I   0.102  0.051  0.026  0.280  0.108
H   0.105  0.051  0.026  0.280  0.103
V   0.132  0.066  0.034  0.272  0.165
S   0.245  0.114  0.058  0.371  0.048

Table 5. Optimal weights for Example 2

P   0.000  0.665  0.174  0.000  0.000  0.000  0.000  0.115  0.046
I   0.000  0.664  0.172  0.000  0.000  0.000  0.000  0.116  0.048
H   0.000  0.663  0.172  0.000  0.000  0.000  0.000  0.117  0.049
V   0.000  0.733  0.144  0.000  0.000  0.000  0.000  0.027  0.096
S   0.000  0.608  0.201  0.000  0.000  0.000  0.000  0.163  0.028

The optimal weights obtained from the five models are presented in Table 3, all values being rounded to 3 decimals. The sum of the weights for the original (unrounded) values is 1.0 with an accuracy of $10^{-5}$.

These results represent the average behavior of the models, i.e. the results for the P, I and H fits are nearly the same, with I and H being very close to each other and P differing from them to a small extent. The results for the divergences V and S differ from the previous results and from each other significantly.

Example 2. In this example distribution 16 was approximated by distributions 1, 2, 3, 5, 8, 9, 12, 13 and 14. The results are presented in Tables 4 and 5 analogously to the previous example; the optimal weights are shown in Table 5.

These results show a deviation from the average behavior in the direction of considerable coincidence of the results for the P, I and H fits. The divergences V and S behave as in the previous example.

Example 3. Distribution 7 is approximated by distributions 1, 2, 3, 5, 8, 9, 10, 11 and 12. The results can be observed in Tables 6 and 7.

Table 7. Optimal weights for Example 3

P   0.000  0.490  0.182  0.000  0.000  0.000  0.000  0.124  0.205
I   0.159  0.410  0.184  0.000  0.000  0.000  0.000  0.082  0.165
H   0.236  0.375  0.184  0.000  0.000  0.000  0.000  0.060  0.144
V   0.000  0.524  0.117  0.000  0.000  0.000  0.000  0.078  0.280
S   0.000  0.808  0.016  0.000  0.000  0.018  0.000  0.000  0.159

In this case the results differ significantly for the different models, this being true also for the divergences P, I and H. Notice that there is a structural difference between the weights for P and I: the first entry in the first row is a 'true' zero, not just a rounded value.

Remark. Looking at the first three columns in Tables 2, 4 and 6, it can be observed that the ratio of the entries in the first and second columns is roughly 2, the same being true for the second and third columns. This is in accordance with mathematical statistics, see [1].

Randomly generated test problems. To explore the behavior of the different mixing models, a population consisting of 80 randomly generated distributions has been computed, and using these data 150 test problems have been solved. This computational experiment confirmed the conclusions drawn from our practical problem. The average behavior shows slight differences between the P, I and H fits. We have found, however, problems where the Pearson model gave completely different weights than the Kullback-Leibler and Hellinger models. Such a problem is given in the next example.

Example 4. The first two distributions in Table 8 are mixed to approximate the third distribution. The computational results are presented in Table 9.

Table 8. Distributions from a randomly generated data set

1.   0.049  0.044  0.044  0.044  0.058  0.067  0.091  0.107  0.126  0.128  0.128  0.114
2.   0.052  0.045  0.044  0.048  0.059  0.077  0.097  0.116  0.117  0.131  0.122  0.092
a3.  0.007  0.022  0.028  0.050  0.077  0.106  0.134  0.135  0.129  0.119  0.113  0.080

a This distribution is the goal distribution to be approximated.

Table 9. Divergences and optimal weights for Example 4

      P      I      H      V      S     Optimal weights
P   0.340  0.096  0.040  0.276  0.034   1.00  0.00
I   0.352  0.089  0.036  0.234  0.024   0.00  1.00
H   0.352  0.089  0.036  0.234  0.024   0.00  1.00
V   0.352  0.089  0.036  0.234  0.024   0.00  1.00
S   0.340  0.096  0.040  0.276  0.080   1.00  0.00

The results show that for the Pearson and Smirnov models the optimal mixture consists solely of the first component, while the optimal mixture resulting from the Kullback-Leibler, Hellinger and variational models consists exclusively of the second component. According to our experience, problems with a significant difference between the results obtained from the Pearson model and those from the Kullback-Leibler and Hellinger models may appear in situations when there are 'similar' distributions among the components.

The CPU time for the linear and quadratic models was just a few seconds. For the Kullback-Leibler and Hellinger models the CPU times reported below were obtained in the case of $10^{-5}$ primal accuracy, with an accuracy of $10^{-8}$ for the reduced gradient in the dual problem. For distributions with 10 elements, the CPU time for mixing 9 distributions was on average around 20 seconds for the Kullback-Leibler model and around 40 seconds for the Hellinger model. With distributions consisting of 12 elements and mixing 2 components, the average CPU time for the Kullback-Leibler model was 15 seconds and for the Hellinger model 35 seconds. Due to the lack of a good starting point, solving the Hellinger model required in all cases significantly more CPU time than solving the Kullback-Leibler model.

5. Conclusions

Five mathematical programming models belonging to the class of linearly constrained statistical estimation problems have been investigated in this paper. Four of them are associated with the unifying framework of f-divergences, while the fifth (Smirnov distance) has been included for the sake of completeness. The models are characterized within the field of linearly constrained estimation by the fact that the constraint polyhedron is given as the convex hull of some fixed distributions. For the Kullback-Leibler and Hellinger models new duality relations and dual-type algorithms have been developed, and the Hellinger model has been formulated as an $l_p$-programming problem.

Summarizing our computational experiences, we can say that in most cases the results from the Pearson model gave a very good approximation of the results obtained by applying the Kullback-Leibler or the Hellinger model. Table 7 shows, however, that sometimes this is not the case, and the appropriate model is to be selected carefully. If costs are associated with the different distributions (as is the case in practice), then significantly different costs can be obtained for the mixture by using different models.

The Pearson model can be solved very efficiently by Lemke's method. The reduced gradient method for the I and H mixing turned out to be much slower, but the result for the Pearson model can be utilized as a starting point for the iterations, thus reducing the number of iterations significantly.

The results obtained for the variational and Smirnov models differ from each other and from the results of the other three models, but these models can be solved effectively with the simplex method.

Acknowledgements

We would like to thank the referees of this paper for their helpful and constructive suggestions made on the first draft.


References

[1] Borovkov, A.A., Mathematical Statistics, Nauka, Moscow, 1984 (in Russian).

[2] Brockett, P.L., Charnes, A., and Cooper, W.W., "MDI estimation via unconstrained convex programming", Communications in Statistics B - Simulation and Computation 9 (1980) 223-234.

[3] Charnes, A., Cooper, W.W., and Mellon, B., "Blending aviation gasolines - a study in programming interdependent activities in an integrated oil company", Econometrica 20 (1952) 135-159.

[4] Charnes, A., and Cooper, W.W., "Constrained Kullback-Leibler estimation; generalized Cobb-Douglas balance, and unconstrained convex programming", Rendiconti di Accademia Nazionale dei Lincei, Serie VIII, LVIII/4 (1975) 568-576.

[5] Csiszár, I., "Information divergence type measures for discrepancy between probability distributions", MTA III. Oszt. Közleményei 17 (1967) 123-149, 267-291 (in Hungarian).

[6] Csiszár, I., "I-divergence geometry of probability distributions and minimization problems", Annals of Probability 3 (1975) 146-158.

[7] Dantzig, G.B., Linear Programming and Extensions, Princeton University Press, Princeton, NJ, 1963.

[8] Duffin, R.J., Peterson, E.L., and Zener, C., Geometric Programming: Theory and Applications, Wiley, New York, 1967.

[9] Duffin, R.J., and Peterson, E.L., "Optimization and insight by geometric programming", Journal of Applied Physics 60 (1986) 1860-1864.

[10] Fang, S.C., and Rajasekera, J.R., "Controlled dual perturbations for $l_p$-programming", Zeitschrift für Operations Research 30 (1986) A29-A42.

[11] Fang, S.C., Peterson, E.L., and Rajasekera, J.R., "Controlled dual perturbations for posynomial programs", European Journal of Operational Research 35 (1988) 111-117.

[12] Fang, S.C., and Rajasekera, J.R., "Quadratically constrained minimum cross-entropy analysis", Mathematical Programming, forthcoming.

[13] Gill, P.E., Murray, W., and Wright, M.H., Practical Optimization, Academic Press, London, 1981.

[14] Jaynes, E.T., "Where do we stand on maximum entropy?", in: R.D. Levine and M. Tribus (eds.), The Maximum Entropy Formalism, The MIT Press, Cambridge, MA, 1979, 15-118.

[15] Klafszky, E., "Geometric programming", Systems Analysis & Related Topics, Seminary Notes 11 (1976), Budapest.

[16] Kullback, S., Information Theory and Statistics, Wiley, 1959.

[17] Lemke, C.E., "Bimatrix equilibrium points and mathematical programming", Management Science 11 (1965) 681-689.

[18] Medgyessy, P., Decomposition of Superpositions of Density Functions and Discrete Distributions, Wiley, 1977.

[19] Murtagh, B.A., and Saunders, M.A., "Large scale linearly constrained optimization", Mathematical Programming 14 (1978) 41-72.

[20] Rényi, A., "On the mathematical theory of comminution", Építőanyag (1960) 1-8 (in Hungarian).

[21] Rényi, A., Wahrscheinlichkeitsrechnung mit einem Anhang über Informationstheorie, VEB Deutscher Verlag der Wissenschaften, Berlin, 1962 (in German).

[22] Sarma, P.V.L.N., Martens, X.M., Reklaitis, G.V., and Rijckaert, M.J., "A comparison of computational strategies for geometric programs", Journal of Optimization Theory and Applications 26 (1978).

[23] Schrage, L., Linear, Integer, and Quadratic Programming with LINDO, The Scientific Press, 1984.

[24] Scott, C.H., and Jefferson, T.R., "A generalisation of geometric programming with an application to information theory", Information Sciences 12 (1977) 263-269.

[25] Terlaky, T., "On $l_p$-programming", European Journal of Operational Research 22 (1985) 70-100.

[26] Wesche, K., Baustoffe für tragende Bauteile, Bauverlag GmbH, Wiesbaden-Berlin, 1974.

[27] Wolfe, P., "Algorithm for a least-distance programming problem", Mathematical Programming Study 1 (1974) 190-205.

[28] Wolfe, P., "Finding the nearest point in a polytope", Mathematical Programming 11 (1976) 128-149.