Optimization Methods
Draft of August 26, 2005

III. Solving Linear Programs by Interior-Point Methods

Robert Fourer
Department of Industrial Engineering and Management Sciences
Northwestern University, Evanston, Illinois 60208-3119, U.S.A.
(847) 491-3151
[email protected]
http://www.iems.northwestern.edu/4er/

Copyright © 1989–2004 Robert Fourer
10. Essential Features
Simplex methods get to the solution of a linear program by moving from vertex to vertex along edges of the feasible region. It seems reasonable that some better method might get to an optimum faster by instead moving through the interior of the region, directly toward the optimal point. This is not as easy as it sounds, however.

As in other respects, the low-dimensional geometry of linear programs can be misleading. It is convenient to think of two-dimensional and three-dimensional feasible regions as being polyhedrons that are fairly round in shape, but these are the cases in which a long step through the middle is easy to see and makes great progress. When there are thousands or even millions of variables, it is quite another matter to see a path through the feasible polyhedron, which typically is highly elongated. One possibility, building on the simplex method, would be to increase from zero not one but all variables that have negative reduced costs. No practical way has been found, however, to compute steps based only on the reduced costs that tend to move through the center of the polyhedron toward the optimum rather than across to boundary points that are far from the optimum (and are not necessarily vertex points).

The key to an effective interior-point method is to borrow a few simple ideas from nonlinear optimization. In the context of linear programming, these ideas are sufficiently elementary that we can develop them independently. Applications to general nonlinear programming will be taken up in subsequent chapters.
10.1 Preliminaries
We show in this chapter how an effective interior-point method can be derived from a simple idea for solving the optimality conditions for linear programming. We consider in particular the complementary slackness conditions that were derived in Part III for primal and dual linear programs in the form

    Minimize    cᵀx          Maximize    bᵀπ
    Subject to  Ax = b       Subject to  Aᵀπ ≤ c
                x ≥ 0

Complementary slackness says that x and π are optimal provided that they satisfy

• Primal feasibility: Ax = b, x ≥ 0
• Dual feasibility: Aᵀπ ≤ c
• Complementarity: Either x_j = 0 or a_jᵀπ = c_j (or both), for each j = 1, …, n

To make these conditions easier to work with, we begin by writing them as equations in nonnegative variables. We treat all vectors as column vectors.

We start by introducing a vector of slack variables, σ, so that Aᵀπ ≤ c may be expressed equivalently by Aᵀπ + σ = c and σ ≥ 0. In these terms,
a_jᵀπ = c_j if and only if σ_j = 0, so the complementary slackness conditions become x_j = 0 or σ_j = 0 (or both). But saying that x_j = 0 or σ_j = 0 is equivalent to saying that x_jσ_j = 0. Thus we have the following equivalent statement of the complementary slackness conditions: x and π are optimal provided that they satisfy

• Primal feasibility: Ax = b, x ≥ 0
• Dual feasibility: Aᵀπ + σ = c, σ ≥ 0
• Complementarity: x_jσ_j = 0 for every j = 1, …, n

These conditions comprise a square system of m + 2n equations in the m + 2n variables (x, π, σ), plus nonnegativity of x and σ.
It remains to collect the equations x_jσ_j = 0 into a matrix equation. For this purpose, we define diagonal matrices X and Σ whose only nonzero elements are X_jj = x_j and Σ_jj = σ_j, respectively. For example, for n = 4,

    X = [ x1  0   0   0  ]        Σ = [ σ1  0   0   0  ]
        [ 0   x2  0   0  ]   and      [ 0   σ2  0   0  ]
        [ 0   0   x3  0  ]            [ 0   0   σ3  0  ]
        [ 0   0   0   x4 ]            [ 0   0   0   σ4 ]

The diagonal elements of XΣ are x_jσ_j, exactly the expressions that must be zero by complementary slackness, so we could express complementary slackness as

    XΣ = [ x1σ1   0     0     0   ]   [ 0 0 0 0 ]
         [  0    x2σ2   0     0   ] = [ 0 0 0 0 ]
         [  0     0    x3σ3   0   ]   [ 0 0 0 0 ]
         [  0     0     0    x4σ4 ]   [ 0 0 0 0 ]

But this is n² equations, of which all but n are 0 = 0. Instead we collapse the equations to the n significant ones by writing

    XΣe = [ x1σ1   0     0     0   ] [ 1 ]   [ x1σ1 ]   [ 0 ]
          [  0    x2σ2   0     0   ] [ 1 ] = [ x2σ2 ] = [ 0 ]
          [  0     0    x3σ3   0   ] [ 1 ]   [ x3σ3 ]   [ 0 ]
          [  0     0     0    x4σ4 ] [ 1 ]   [ x4σ4 ]   [ 0 ]

where e is a vector whose elements are all 1. We will write this as XΣe = 0, with the 0 understood as in previous chapters to refer to a vector of zeroes.
We have now shown that solving the primal optimization problem for x and the dual optimization problem for π is equivalent to solving the following combined system of equations and nonnegativity restrictions:

    Ax = b
    Aᵀπ + σ = c
    XΣe = 0
    x ≥ 0, σ ≥ 0
We can regard the interior points (x, π, σ) of this system to be those that satisfy the inequalities strictly: x > 0, σ > 0. Our goal is to show how interior-point methods can generate a series of such points that tend toward a solution of the linear program.

Diagonal matrices will prove to be convenient throughout the development of interior-point methods. If F and G are matrices having the vectors f = (f1, …, fn) and g = (g1, …, gn) on their diagonals and zeroes elsewhere, then

• Fᵀ = F.
• FG = GF is a diagonal matrix having nonzero elements f_jg_j.
• F⁻¹ is a diagonal matrix having nonzero elements f_j⁻¹, or 1/f_j.
• F⁻¹G = GF⁻¹ is a diagonal matrix having nonzero elements g_j/f_j.
• Fe = f, and F⁻¹f = e, where e is a vector of all 1s.

We write F = diag(f) to say that F is the diagonal matrix constructed from f. As in these examples, we will normally use lower-case letters for vectors and the corresponding upper-case letters for their diagonal matrices.
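These identities are easy to check numerically. The following snippet (an illustration in Python with NumPy, not part of the original development) verifies each one; note that an actual interior-point code would store and apply diagonal matrices as vectors, never as full n × n arrays.

    import numpy as np

    f = np.array([2.0, 5.0, 0.5])
    g = np.array([3.0, 1.0, 4.0])
    F, G = np.diag(f), np.diag(g)                     # F = diag(f), G = diag(g)
    e = np.ones(3)                                    # vector of all 1s

    assert np.allclose(F.T, F)                        # F^T = F
    assert np.allclose(F @ G, np.diag(f * g))         # FG has elements f_j g_j
    assert np.allclose(np.linalg.inv(F), np.diag(1 / f))       # F^{-1} has 1/f_j
    assert np.allclose(np.linalg.inv(F) @ G, np.diag(g / f))   # F^{-1}G has g_j/f_j
    assert np.allclose(F @ e, f)                      # Fe = f
    assert np.allclose(np.linalg.inv(F) @ f, e)       # F^{-1} f = e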
10.2 A simple interior-point method
Much as we did in the derivation of the simplex method, we'll start off by assuming that we already know primal-feasible and dual-feasible interior-point solutions x and (π, σ):

    Ax = b, x > 0        Aᵀπ + σ = c, σ > 0

Our goal is to find solutions x and (π, σ) such that Ax = b and Aᵀπ + σ = c, but with x and σ being not interior but instead complementary: x_j ≥ 0, σ_j ≥ 0, and at least one of the two is = 0, for each j = 1, …, n. We'll later show (in Section 11.2) that the interior-point approach is easily extended to start from any point (x, π, σ) that has x > 0 and σ > 0, regardless of whether the equations are initially satisfied.
We can now give an elementary explanation of the method. Starting from a feasible, interior-point solution (x, π, σ), we wish to find a step (Δx, Δπ, Δσ) such that (x + Δx, π + Δπ, σ + Δσ) is a better such solution, in the sense that it comes closer to satisfying the complementarity conditions.

To find the desired step, we substitute (x + Δx, π + Δπ, σ + Δσ) into the equations for feasibility and complementarity of the solution. Writing X = diag(x), Σ = diag(σ) and ΔX = diag(Δx), ΔΣ = diag(Δσ) in line with our previous notation, we have

    A(x + Δx) = b
    Aᵀ(π + Δπ) + (σ + Δσ) = c
    (X + ΔX)(Σ + ΔΣ)e = 0

Since we're given Ax = b and Aᵀπ + σ = c, these equations simplify to
    AΔx = 0
    AᵀΔπ + Δσ = 0
    ΣΔx + XΔσ = −XΣe − ΔXΔΣe

We would like to solve these m + 2n equations for the steps (the m + 2n Δ-values), but although all the terms on the left are linear in the steps, the term ΔXΔΣe on the right is nonlinear. So long as each Δx_j is small relative to x_j and each Δσ_j is small relative to σ_j, however, we can expect each Δx_jΔσ_j to be especially small relative to x_jσ_j. This suggests that we can get a reasonable approximate solution for the steps by solving the linear equations that are produced by dropping the ΔXΔΣe term from the above equations. (The same approach will return in a later chapter, in a more general setting, as Newton's method for solving nonlinear equations.)

With the ΔXΔΣe term dropped, we can solve the third equation for Δσ,

    Δσ = X⁻¹(−XΣe − ΣΔx) = −σ − X⁻¹ΣΔx,

and substitute for Δσ in the second equation to get

    AΔx = 0
    AᵀΔπ − X⁻¹ΣΔx = σ

The matrix X⁻¹ requires no special work to compute; because X has nonzero entries only along its diagonal, so does X⁻¹, with the entries being 1/x_j.

Rearranging our equations in matrix terms, we have an (m + n) × (m + n) equation system in the unknowns Δx and Δπ:

    [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ σ ]
    [   A      0 ] [ Δπ ] = [ 0 ]

Because X⁻¹Σ is another diagonal matrix (it has entries σ_j/x_j along its diagonal) we can optionally solve for Δx,

    Δx = XΣ⁻¹(AᵀΔπ − σ) = −x + (XΣ⁻¹)AᵀΔπ,

and substitute for Δx in AΔx = 0 to arrive at an m × m equation system,

    A(XΣ⁻¹)AᵀΔπ = Ax = b.

The special forms of these equation systems guarantee that they have solutions and allow them to be solved efficiently.
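As a concrete illustration (ours, not the text's), the step computation might be coded in Python with NumPy as follows; the dense solve of the (m + n) × (m + n) system is a sketch only, since practical codes use sparse factorizations.

    import numpy as np

    def affine_scaling_step(A, x, sigma):
        """Solve [-X^{-1}Sigma  A^T; A  0][dx; dpi] = [sigma; 0]
        and recover dsigma from the eliminated third equation."""
        m, n = A.shape
        d = sigma / x                          # diagonal of X^{-1}Sigma, as a vector
        K = np.block([[np.diag(-d), A.T],
                      [A, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([sigma, np.zeros(m)]))
        dx, dpi = sol[:n], sol[n:]
        dsigma = -sigma - d * dx               # dsigma = -sigma - X^{-1}Sigma dx
        return dx, dpi, dsigma

Equivalently, one could solve the m × m normal equations A(XΣ⁻¹)AᵀΔπ = b and then recover Δx = −x + (XΣ⁻¹)AᵀΔπ; either form yields the same step.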
At this point we could hope to use (x + Δx, π + Δπ, σ + Δσ) as our new, improved solution. Some of the elements of x + Δx and σ + Δσ might turn out to be negative, however, whereas they must be positive to keep our new solution within the interior of the feasible region. To prevent this, we instead take our new solution to be

    (x + θ(Δx), π + θ(Δπ), σ + θ(Δσ))

where θ is a positive fraction ≤ 1. Because the elements of the vectors x and σ are themselves > 0, we know that x + θ(Δx) and σ + θ(Δσ) will also be strictly positive so long as θ is chosen small enough. (The vectors Δx and Δσ serve here as what are known as step directions rather than whole steps, and θ is the step length.)

The derivation of a formula for θ is much like the derivation of the minimum ratio criterion for the simplex method (Part II). For each Δx_j ≥ 0, the value of x_j + θ(Δx_j) can only increase along with θ; but for Δx_j < 0, the value decreases. Thus we require

    x_j + θ(Δx_j) > 0  ⟹  θ < −x_j/Δx_j for each j such that Δx_j < 0.

The same reasoning applied to σ + θ(Δσ) gives

    σ_j + θ(Δσ_j) > 0  ⟹  θ < −σ_j/Δσ_j for each j such that Δσ_j < 0.

For θ to be less than all of these values, it must be less than their minimum, and so we require

    θ ≤ 1,  θ < θ_x = min_{j: Δx_j < 0} −x_j/Δx_j,  θ < θ_σ = min_{j: Δσ_j < 0} −σ_j/Δσ_j.
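In code, each of these minima is a short ratio test; the helper below (our naming, an illustrative sketch) returns +∞ when no component of the direction is negative, in which case only the limit θ ≤ 1 applies.

    import numpy as np

    def ratio_limit(v, dv):
        """Largest t with v + t*dv > 0: the min of -v_j/dv_j over dv_j < 0."""
        neg = dv < 0
        return np.min(-v[neg] / dv[neg]) if np.any(neg) else np.inf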
    Given x such that Ax = b, x > 0.
    Given π, σ such that Aᵀπ + σ = c, σ > 0.
    Choose a step fraction 0 < γ < 1.
    Choose a complementarity tolerance ε > 0.

    Repeat
        Solve
            [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ σ ]
            [   A      0 ] [ Δπ ] = [ 0 ]
        and set Δσ = −σ − X⁻¹ΣΔx.
        Let θ_x = min_{j: Δx_j < 0} −x_j/Δx_j,
            θ_σ = min_{j: Δσ_j < 0} −σ_j/Δσ_j,
            θ = min(1, γθ_x, γθ_σ).
        Set x := x + θ(Δx), π := π + θ(Δπ), σ := σ + θ(Δσ).
    Until max_j x_jσ_j < ε.
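Putting the pieces together gives a compact rendering of the boxed method, reusing affine_scaling_step and ratio_limit from above. This is a sketch under the stated assumptions (a feasible interior starting point, dense linear algebra; b enters only through that starting point):

    def affine_scaling(A, x, pi, sigma, gamma=0.995, eps=1e-5):
        """Affine-scaling interior-point method from a feasible interior point."""
        while np.max(x * sigma) >= eps:
            dx, dpi, dsigma = affine_scaling_step(A, x, sigma)
            theta = min(1.0, gamma * ratio_limit(x, dx),
                             gamma * ratio_limit(sigma, dsigma))
            x = x + theta * dx
            pi = pi + theta * dpi
            sigma = sigma + theta * dsigma
        return x, pi, sigma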
10.3 An example

Consider the linear program

    Minimize    2x1 + 1.5x2
    Subject to  12x1 + 24x2 ≥ 120
                16x1 + 16x2 ≥ 120
                30x1 + 12x2 ≥ 120
                x1 ≤ 15,  x2 ≤ 15
                x1 ≥ 0,   x2 ≥ 0

whose feasible region is the one plotted in Figure 10-2. Starting from a point in the interior, the iterates will progress through the interior toward the optimal vertex, (1 2/3, 5 5/6).
To illustrate the computations, we add appropriate slack variables to transform this problem to one of equalities in nonnegative variables:

    Minimize    2x1 + 1.5x2
    Subject to  12x1 + 24x2 − x3 = 120
                16x1 + 16x2 − x4 = 120
                30x1 + 12x2 − x5 = 120
                x1 + x6 = 15
                x2 + x7 = 15
                x1, …, x7 ≥ 0
From the figure it is clear that (x1, x2) = (10, 10) is a point near the middle of the feasible set. Substituting into the equations, we can easily solve for the values of the slack variables (x3, …, x7), which are all positive at any interior point. Then we have as an initial primal iterate

    x = (10, 10, 240, 200, 300, 5, 5)ᵀ > 0.
Figure 10-2. The feasible region for the example of Section 10.3, with two iteration paths of an affine-scaling interior-point method.

For an initial dual iterate, the algorithm requires a π such that σ = c − Aᵀπ > 0. Writing these equations explicitly, we have
    σ1 = 2   − 12π1 − 16π2 − 30π3 − 1π4 > 0
    σ2 = 1.5 − 24π1 − 16π2 − 12π3 − 1π5 > 0
    σ3 = 0 + 1π1 > 0
    σ4 = 0 + 1π2 > 0
    σ5 = 0 + 1π3 > 0
    σ6 = 0 − 1π4 > 0
    σ7 = 0 − 1π5 > 0

These can be satisfied by picking positive values for the first three π-variables, and then setting the remaining two sufficiently negative; π1 = π2 = π3 = 1 and π4 = π5 = −60 will do, for example. We then have

    π = (1, 1, 1, −60, −60)ᵀ,   σ = (4, 9.5, 1, 1, 1, 60, 60)ᵀ > 0.
In our matrix terminology, we also have

    c = (2, 1.5, 0, 0, 0, 0, 0)ᵀ,

    A = [ 12  24  −1   0   0   0   0 ]        [ 120 ]
        [ 16  16   0  −1   0   0   0 ]        [ 120 ]
        [ 30  12   0   0  −1   0   0 ],   b = [ 120 ]
        [  1   0   0   0   0   1   0 ]        [  15 ]
        [  0   1   0   0   0   0   1 ]        [  15 ]
We choose the step fraction γ = 0.995, which for our examples gives reliable results that cannot be improved upon by settings closer to 1. Finally, we specify a complementarity tolerance ε = 0.00001, so that the algorithm will stop when all x_jσ_j < 0.00001.
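In code, the data of this example and a call to the affine_scaling sketch of Section 10.2 might look as follows; the run should reproduce the iterates reported below up to rounding.

    import numpy as np

    c = np.array([2.0, 1.5, 0, 0, 0, 0, 0])
    A = np.array([[12, 24, -1,  0,  0, 0, 0],
                  [16, 16,  0, -1,  0, 0, 0],
                  [30, 12,  0,  0, -1, 0, 0],
                  [ 1,  0,  0,  0,  0, 1, 0],
                  [ 0,  1,  0,  0,  0, 0, 1]], dtype=float)
    b = np.array([120.0, 120, 120, 15, 15])

    x0 = np.array([10.0, 10, 240, 200, 300, 5, 5])   # interior primal point
    pi0 = np.array([1.0, 1, 1, -60, -60])            # interior dual point
    sigma0 = c - A.T @ pi0                           # = (4, 9.5, 1, 1, 1, 60, 60)

    x, pi, sigma = affine_scaling(A, x0, pi0, sigma0, gamma=0.995, eps=1e-5)
    print(x[:2])                                     # approaches (1.6667, 5.8333)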
To begin an iteration, the algorithm must form the linear equation system

    [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ σ ]
    [   A      0 ] [ Δπ ] = [ 0 ]

In the matrix's upper left-hand corner is a diagonal block whose entries are −σ_j/x_j. Below this block is a copy of the matrix A, and to its right is a copy of Aᵀ, with the lower right-hand block filled out by zeros. The right-hand side is a copy of σ filled out by zeros. So the entire system to be solved at the first iteration in our example is

    [ −.4000     0       0       0       0      0     0  |  12  16  30   1   0 ] [ Δx1 ]   [  4  ]
    [    0    −.9500     0       0       0      0     0  |  24  16  12   0   1 ] [ Δx2 ]   [ 9.5 ]
    [    0       0    −.0042     0       0      0     0  |  −1   0   0   0   0 ] [ Δx3 ]   [  1  ]
    [    0       0       0    −.0050     0      0     0  |   0  −1   0   0   0 ] [ Δx4 ]   [  1  ]
    [    0       0       0       0    −.0033    0     0  |   0   0  −1   0   0 ] [ Δx5 ]   [  1  ]
    [    0       0       0       0       0    −12     0  |   0   0   0   1   0 ] [ Δx6 ]   [ 60  ]
    [    0       0       0       0       0      0   −12  |   0   0   0   0   1 ] [ Δx7 ] = [ 60  ]
    [ ------------------------------------------------- + ------------------- ]
    [   12      24      −1       0       0      0     0  |   0   0   0   0   0 ] [ Δπ1 ]   [  0  ]
    [   16      16       0      −1       0      0     0  |   0   0   0   0   0 ] [ Δπ2 ]   [  0  ]
    [   30      12       0       0      −1      0     0  |   0   0   0   0   0 ] [ Δπ3 ]   [  0  ]
    [    1       0       0       0       0      1     0  |   0   0   0   0   0 ] [ Δπ4 ]   [  0  ]
    [    0       1       0       0       0      0     1  |   0   0   0   0   0 ] [ Δπ5 ]   [  0  ]
(The vertical and horizontal lines are shown only to emphasize the matrix's block structure; they have no mathematical significance.)

The study of solving an equation system of this kind is a whole topic in itself. For now, we simply report that the solution is

    Δx = (−0.1017, −0.0658, −2.7997, −2.6803, −3.8414, 0.1017, 0.0658)ᵀ,
    Δπ = (−0.9883, −0.9866, −0.9872, 61.2208, 60.7895)ᵀ.

We can then set

    Δσ = −σ − X⁻¹ΣΔx
       = −(4, 9.5, 1, 1, 1, 60, 60)ᵀ
         − diag(.4000, .9500, .0042, .0050, .0033, 12, 12)(−0.1017, −0.0658, −2.7997, −2.6803, −3.8414, 0.1017, 0.0658)ᵀ
       = (−3.9593, −9.4375, −0.9883, −0.9866, −0.9872, −61.2208, −60.7895)ᵀ.
The entire step vector (Δx, Δπ, Δσ) has now been computed. The remainder of the iteration determines the length of the step. The ratio −x_j/Δx_j is computed for each of the five Δx_j < 0, and θ_x is set to the smallest:

    Δx1 < 0:  −x1/Δx1 = 10/0.1017  =  98.3284
    Δx2 < 0:  −x2/Δx2 = 10/0.0658  = 152.0034
    Δx3 < 0:  −x3/Δx3 = 240/2.7997 =  85.7242
    Δx4 < 0:  −x4/Δx4 = 200/2.6803 =  74.6187 = θ_x
    Δx5 < 0:  −x5/Δx5 = 300/3.8414 =  78.0972

The ratios −σ_j/Δσ_j are computed in the same way to determine θ_σ:

    Δσ1 < 0:  −σ1/Δσ1 = 4/3.9593   = 1.0103
    Δσ2 < 0:  −σ2/Δσ2 = 9.5/9.4375 = 1.0066
    Δσ3 < 0:  −σ3/Δσ3 = 1/0.9883   = 1.0118
    Δσ4 < 0:  −σ4/Δσ4 = 1/0.9866   = 1.0136
    Δσ5 < 0:  −σ5/Δσ5 = 1/0.9872   = 1.0130
    Δσ6 < 0:  −σ6/Δσ6 = 60/61.2208 = 0.9801 = θ_σ
    Δσ7 < 0:  −σ7/Δσ7 = 60/60.7895 = 0.9870
The step length is thus given by

    θ = min(1, γθ_x, γθ_σ) = min(1, .995 × 74.6187, .995 × 0.9801) = 0.975159.

Thus the iteration concludes with the computation of the next iterate as

    x = x + θ(Δx)
      = (10, 10, 240, 200, 300, 5, 5)ᵀ + 0.975159 (−0.1017, −0.0658, −2.7997, −2.6803, −3.8414, 0.1017, 0.0658)ᵀ
      = (9.9008, 9.9358, 237.2699, 197.3863, 296.2541, 5.0992, 5.0642)ᵀ,

and similarly for π and σ.

Although there are 7 variables in the form of the problem that the algorithm works on, we can show the algorithm's progress in Figure 10-2 by plotting the points defined by the x1 and x2 components of the iterates. In the first iteration, we have moved from (10, 10) to (9.9008, 9.9358).
If we continue in the same way, then the algorithm carries out a total of 9 iterations before reaching a solution that satisfies the stopping conditions:

    iter      x1        x2         θ        max x_jσ_j
      0    10.0000   10.0000               300.000000
      1     9.9008    9.9358   0.975159     11.058316
      2     6.9891    9.2249   0.423990      6.728827
      3     3.2420    8.5423   0.527256      2.878729
      4     1.9835    6.6197   0.697264      1.156341
      5     2.0266    5.4789   0.693037      0.486016
      6     1.8769    5.6231   0.581321      0.189301
      7     1.7204    5.7796   0.841193      0.027134
      8     1.6683    5.8317   0.979129      0.000836
      9     1.6667    5.8333   0.994501      0.000004
The step length θ drops at iteration 2 but then climbs toward the ideal step length of 1. The max x_jσ_j term steadily falls, until at the end it has a value that is not significantly different from zero.
Consider now what happens when we try a different starting point, not so well centered:

    x = (14, 1, 72, 120, 312, 1, 14)ᵀ,   π = (0.01, 0.01, 0.01, −1.00, −1.00)ᵀ.
The first 10 iterations proceed as follows:
    iter      x1        x2         θ        max x_jσ_j
      0    14.0000    1.0000                33.880000
      1    13.3920    0.7515   0.386962     21.275227
      2    11.9960    0.7102   0.126892     18.625818
      3     9.7350    0.6538   0.242316     14.301476
      4     8.6222    0.6915   0.213267     11.413212
      5     8.1786    0.9107   0.197309      9.246931
      6     6.9463    1.5269   0.318442      6.536069
      7     5.0097    2.4951   0.331122      4.467188
      8     5.0000    2.5000   0.117760      3.942131
      9     5.0000    2.5000   0.217604      3.084315
     10     5.0000    2.5000   0.361839      1.968291
The iterates have become stuck near the boundary, in particular near the non-optimal vertex point (5, 2.5). (The iterates indeed appear to reach the vertex point, but that is only because we have rounded the output at the 4th place.) Over the next 10 iterations the steps make little further progress, with θ falling to minuscule values:

    iter      x1        x2         θ        max x_jσ_j
     11     5.0000    2.5000   0.002838      1.962706
     12     4.9999    2.5001   0.000019      1.962668
     13     4.9999    2.5001   0.000011      1.962647
     14     4.9998    2.5002   0.000021      1.962606
     15     4.9996    2.5004   0.000042      1.962525
     16     4.9991    2.5009   0.000083      1.962362
     17     4.9983    2.5017   0.000165      1.962037
     18     4.9966    2.5034   0.000330      1.961390
     19     4.9932    2.5068   0.000658      1.960100
     20     4.9865    2.5135   0.001312      1.957527
Then finally the iterates start to move along (but slightly interior to) the edge defined by 16x1 + 16x2 ≥ 120, eventually approaching the vertex that is optimal:

    iter      x1        x2         θ        max x_jσ_j
     21     4.9732    2.5268   0.002617      1.952404
     22     4.9466    2.5534   0.005216      1.942219
     23     4.8941    2.6059   0.010386      1.922042
     24     4.7909    2.7091   0.020642      1.882350
     25     4.5915    2.9085   0.040872      1.805354
     26     4.2178    3.2822   0.080335      1.660168
     27     3.5612    3.9388   0.155645      1.401765
     28     2.5521    4.9479   0.293628      0.994245
     29     1.6711    5.8289   0.418568      0.603264
     30     1.6667    5.8333   0.227067      0.466640
     31     1.6667    5.8333   0.398038      0.280902
     32     1.6667    5.8333   0.635639      0.102350
     33     1.6667    5.8333   0.864063      0.013913
     34     1.6667    5.8333   0.977201      0.000317
     35     1.6667    5.8333   0.994594      0.000002

Although x1 and x2 have reached their optimal values at the 30th iteration, the algorithm requires 5 more iterations to bring down the maximum x_jσ_j and so to prove optimality. The step values θ also remain low until near the very end.
Figure 10-2 shows the paths taken in both of our examples. It is easy to see here that one starting point was much better centered than the other, but finding a well-centered starting point for a large linear program is in general a hard problem, as hard as finding an optimal point. Thus there is no reliable way to keep the affine scaling method from sometimes getting stuck near the boundary and taking a large number of iterations. This difficulty motivates a centered method that we consider next.
10.4 A centered interior-point method
To avoid the poor performance of the affine scaling approach, we need a way to keep the iterates away from the boundary of the feasible region, until they begin to approach the optimum. One very effective way to accomplish this is to keep the iterates near a well-centered path to the optimum.

Given the affine scaling method, the changes necessary to produce such a centered method are easy to describe:

• Change the complementary slackness conditions x_jσ_j = 0 to x_jσ_j = μ, where μ is a positive constant.
• Start with a large value of μ, and gradually reduce it toward 0 as the algorithm proceeds.

We explain first how these changes affect the computations (the differences are minor) and why they have the desired centering effect. We can then motivate a simple formula for choosing μ at each step.
The centering steps. The modified complementary slackness conditions represent only a minor change in the equations we seek to solve. In matrix form they are:

    Ax = b
    Aᵀπ + σ = c
    XΣe = μe
    x ≥ 0, σ ≥ 0

(Since e is a vector of all ones, μe is a vector whose elements are all μ.) Because none of the terms involving variables have been changed, the equations for the step come out the same as before, except with the extra μe term on the right:

    AΔx = 0
    AᵀΔπ + Δσ = 0
    ΣΔx + XΔσ = μe − XΣe − ΔXΔΣe

Dropping the ΔXΔΣ term once more, and solving the third equation to substitute

    Δσ = X⁻¹(μe − XΣe − ΣΔx) = −σ − X⁻¹(ΣΔx − μe),

into the second, we arrive at almost the same equation system, the only change being the replacement of σ by σ − μX⁻¹e in the right-hand side:
    [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ σ − μX⁻¹e ]
    [   A      0 ] [ Δπ ] = [     0     ]

(The elements of the vector μX⁻¹e are μ/x_j.) Once these equations are solved, a step is determined as in affine scaling, after which the centering parameter μ may be reduced as explained later in this section.
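Relative to the affine-scaling step sketched in Section 10.2, only the right-hand side changes. A hedged rendering (our function name; setting μ = 0 recovers affine scaling):

    import numpy as np

    def barrier_step(A, x, sigma, mu):
        """Centered step directions: the RHS gains the mu terms."""
        m, n = A.shape
        d = sigma / x                                # diagonal of X^{-1}Sigma
        K = np.block([[np.diag(-d), A.T],
                      [A, np.zeros((m, m))]])
        rhs = np.concatenate([sigma - mu / x,        # sigma - mu X^{-1} e
                              np.zeros(m)])
        sol = np.linalg.solve(K, rhs)
        dx, dpi = sol[:n], sol[n:]
        dsigma = -sigma - (sigma * dx - mu) / x      # -sigma - X^{-1}(Sigma dx - mu e)
        return dx, dpi, dsigma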
Intuitively, changing x_jσ_j = 0 to x_jσ_j = μ tends to produce a more centered solution because it encourages both x_j and σ_j to stay away from 0, and hence the boundary, as the algorithm proceeds. But there is a deeper reason. The modified complementarity conditions are the optimality conditions for a modified optimization problem. Specifically, x and (π, σ) satisfy the conditions

    Ax = b, x ≥ 0
    Aᵀπ + σ = c, σ ≥ 0
    x_jσ_j = μ, for each j = 1, …, n

if and only if x is optimal for

    Minimize    cᵀx − μ ∑_{j=1}^n log x_j
    Subject to  Ax = b
                x ≥ 0
This is known as the log barrier problem for the linear program, because the log terms can be viewed as forcing the optimal values x_j away from zero. Indeed, as any x_j approaches zero, −μ log x_j goes to infinity, thus ruling out any sufficiently small values of x_j from being part of the optimal solution. Reducing μ does allow the optimal values to get closer to zero, but so long as μ is positive the barrier effect remains. (The constraints x ≥ 0 are needed only when μ = 0.)
A centered interior-point algorithm is thus often called a barrier method for linear programming. Only interior points are generated, but since μ is decreased gradually toward 0, the iterates can converge to an optimal solution in which some of the x_j are zero. The proof is not as mathematically elementary as that for the simplex method, so we delay it to a future section, along with further modifications that are important to making barrier methods a practical alternative for solving linear programs.

Figure 10-3. The central path for the feasible region shown in Figure 10-2.
Barrier methods also have an intuitive geometric interpretation. We have seen that as μ → 0, the optimal solution to the barrier problem approaches the optimal solution to the original linear program. On the other hand, as μ → ∞, the cᵀx part of the barrier problem's objective becomes insignificant and the optimal solution to the barrier problem approaches the minimizer of −∑_{j=1}^n log x_j subject to Ax = b. This latter point is known as the analytic center; in a sense it is the best-centered point of the feasible region.

If all of the solutions to the barrier problem for all values of μ are taken together, they form a curve or path from the analytic center of the feasible region to a non-interior optimal solution. Figure 10-3 depicts the central path for the feasible region plotted in Figure 10-2.
Choice of the centering parameter. If we were to run the interior-point method with a fixed value μ of the barrier parameter, it would eventually converge to the point (x, π, σ) on the central path that satisfies

    Ax = b, x ≥ 0
    Aᵀπ + σ = c, σ ≥ 0
    x_jσ_j = μ, for each j = 1, …, n

We do not want the method to find such a point for any positive μ, however, but to approximately follow the central path to a point that satisfies these equations for μ = 0. Thus rather than fixing μ, we would like in general to choose successively smaller values as the algorithm proceeds. On the other hand we do not want to choose too small a value too soon, as then the iterates may fail to become sufficiently centered and may exhibit the slow convergence typical of affine scaling.
These considerations suggest that we take a more adaptive approach. Given the current iterate (x, π, σ), we first make a rough estimate of a value of μ corresponding to a nearby point on the central path. Then, to encourage the next iterate to lie further along the path, we set the barrier parameter at the next iteration to be some fraction of our estimate.

To motivate a formula for an estimate of μ, we observe that, if the current iterate were actually on the central path, then it would satisfy the modified complementarity conditions. For some μ, we would have

    x1σ1 = x2σ2 = ⋯ = xnσn = μ

The current iterate is not on the central path, so these terms are in general all different. As our estimate, however, we can reasonably take their average,

    (1/n) ∑_{j=1}^n x_jσ_j = σᵀx/n.

Then we can take as our centering parameter at the next iteration some fixed fraction β, 0 < β < 1, of this estimate:

    μ = β σᵀx/n.
    Given x such that Ax = b, x > 0.
    Given π, σ such that Aᵀπ + σ = c, σ > 0.
    Choose a step feasibility fraction 0 < γ < 1.
    Choose a step complementarity fraction 0 < β < 1.
    Choose a complementarity tolerance ε > 0.

    Repeat
        Let μ = β σᵀx/n.
        Solve
            [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ σ − μX⁻¹e ]
            [   A      0 ] [ Δπ ] = [     0     ]
        and set Δσ = −σ − X⁻¹(ΣΔx − μe).
        Let θ_x = min_{j: Δx_j < 0} −x_j/Δx_j,
            θ_σ = min_{j: Δσ_j < 0} −σ_j/Δσ_j,
            θ = min(1, γθ_x, γθ_σ).
        Set x := x + θ(Δx), π := π + θ(Δπ), σ := σ + θ(Δσ).
    Until max_j x_jσ_j < ε.
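A sketch of the boxed method, reusing barrier_step and ratio_limit from earlier. The default parameter values shown are our assumptions, chosen to be consistent with the runs reported in the next section (whose μ columns behave like β = 0.1 with γ = 0.99995):

    def barrier_method(A, x, pi, sigma, gamma=0.99995, beta=0.1, eps=1e-5):
        """Centered (barrier) interior-point method from a feasible interior point."""
        n = len(x)
        while np.max(x * sigma) >= eps:
            mu = beta * (sigma @ x) / n          # fraction of average x_j sigma_j
            dx, dpi, dsigma = barrier_step(A, x, sigma, mu)
            theta = min(1.0, gamma * ratio_limit(x, dx),
                             gamma * ratio_limit(sigma, dsigma))
            x = x + theta * dx
            pi = pi + theta * dpi
            sigma = sigma + theta * dsigma
        return x, pi, sigma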
10.5 An example of the barrier method

To illustrate the barrier method, we apply it to the example of Section 10.3, starting from the same well-centered point (10, 10) and with the same tolerance ε = 0.00001; we take the step feasibility fraction to be γ = .99995. At the initial iterate, σᵀx/n = 1475/7 = 210.7143, and with a step complementarity fraction β = 0.1 the first iteration uses the centering parameter μ = 21.0714. The first block of the right-hand side therefore becomes

    σ − μX⁻¹e = (4, 9.5, 1, 1, 1, 60, 60)ᵀ − 21.0714 (1/10, 1/10, 1/240, 1/200, 1/300, 1/5, 1/5)ᵀ
              = (1.8929, 7.3929, 0.9122, 0.8946, 0.9298, 55.7857, 55.7857)ᵀ.
The matrix and the rest of the right-hand side are as they were before, however, so we solve

    [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ σ − μX⁻¹e ]
    [   A      0 ] [ Δπ ] = [     0     ]
                          = (1.8929, 7.3929, 0.9122, 0.8946, 0.9298, 55.7857, 55.7857, 0, 0, 0, 0, 0)ᵀ.
Once this system has been solved for Δx and Δπ, the rest of the barrier iteration is the same as an affine scaling iteration. Thus we omit the details, and report only that the step length comes out to be

    θ = min(1, γθ_x, γθ_σ) = min(1, .99995 × 93.7075, .99995 × 1.0695) = 1

and the next iterate is computed as

    x = x + θ(Δx)
      = (10, 10, 240, 200, 300, 5, 5)ᵀ + 1.0 (0.0314, 0.0534, 1.6576, 1.3564, 1.5829, −0.0314, −0.0534)ᵀ
      = (10.0314, 10.0534, 241.6576, 201.3564, 301.5829, 4.9686, 4.9466)ᵀ.
Whereas the first iteration of affine scaling on this example caused x1 and x2 to decrease, the first step of the barrier method increases them. Although the path to the solution is different, however, the number of iterations to optimality is still 9:
    iter      x1        x2         θ           μ        max x_jσ_j
      0    10.0000   10.0000                            300.000000
      1    10.0314   10.0534   1.000000   21.071429     24.013853
      2     9.1065    9.5002   0.896605    2.107143      5.536077
      3     1.8559    7.6205   0.840474    0.406796      1.725153
      4     1.6298    5.9255   0.780709    0.099085      0.654632
      5     1.7156    5.7910   1.000000    0.029464      0.060567
      6     1.6729    5.8294   1.000000    0.002946      0.003894
      7     1.6671    5.8332   1.000000    0.000295      0.000304
      8     1.6667    5.8333   1.000000    0.000029      0.000030
      9     1.6667    5.8333   1.000000    0.000003      0.000003

Figure 10-5. Iteration paths of the barrier interior-point method, starting from the same two points as in the Figure 10-2 plots for the affine scaling method.

Figure 10-6. Detail from the plots for the (14, 1) starting point shown in Figures 10-2 and 10-5. The affine scaling iterates (open diamonds) get stuck near the sub-optimal vertex (5, 2.5), while the barrier iterates (filled circles) skip right past the vertex.
As our discussion of the barrier method would predict, the centering parameter μ starts out large but quickly falls to near zero as the optimum is approached. Also, thanks to the centering performed at the first several steps, the step distance soon returns to its ideal value of 1.
When the barrier method is instead started from the poorly-centered point of Section 10.3, it requires more iterations, but only about a third as many as affine scaling:

    iter      x1        x2         θ           μ        max x_jσ_j
      0    14.0000    1.0000                             33.880000
      1    13.0939    0.9539   0.465821    0.798571      19.325423
      2     7.9843    1.0080   0.481952    0.463779      10.839497
      3     7.0514    1.6930   0.443953    0.262612       6.544525
      4     4.5601    2.9400   0.479414    0.157683       3.747461
      5     4.4858    3.1247   0.539274    0.089647       1.806038
      6     2.2883    5.2847   0.543138    0.046137       0.885009
      7     1.6356    5.9110   0.495625    0.023584       0.507843
      8     1.7235    5.7945   0.971205    0.013064       0.034860
      9     1.6700    5.8312   1.000000    0.001645       0.002064
     10     1.6669    5.8332   1.000000    0.000164       0.000167
     11     1.6667    5.8333   1.000000    0.000016       0.000016
     12     1.6667    5.8333   1.000000    0.000002       0.000002
Figure 10-5 plots the iteration paths taken by the barrier method from our two starting points. The path from the point that is not well-centered still tends to be near the boundary, but that is not so surprising: the first 8 iterations take a step that is .99995 times the step that would actually reach the boundary.

The important thing is that the barrier method avoids vertex points that are not optimal. This can be seen more clearly in Figure 10-6, which, for the starting point (14, 1), plots the iterates of both methods in the neighborhood of the vertex (5, 2.5). Affine scaling jams around this vertex, while the barrier iterations pass it right by.
11. Practical Refinements and Extensions
This chapter considers various improvements to interior-point methods that have proved to be very useful in practice:

• Taking separate primal and dual step lengths
• Starting from points that do not satisfy the constraint equations
• Handling simple bounds implicitly

Further improvements, not discussed here, include the computation of a predictor and a corrector step at each iteration to speed convergence, and the use of a homogeneous form of the linear program to provide more reliable detection of infeasible and unbounded problems.
11.1 Separate primal and dual step lengths
To keep things simple, we have defined a single step length θ for both the primal and the dual iterates. But to gain some flexibility, we can instead choose x + θ_P(Δx) as the next primal iterate and (π + θ_D(Δπ), σ + θ_D(Δσ)) as the next dual iterate. As before, since Ax = b and AΔx = 0, every iterate remains primal-feasible:

    A(x + θ_P(Δx)) = Ax + θ_P(AΔx) = b.

Moreover, because Aᵀπ + σ = c and AᵀΔπ + Δσ = 0, every iterate remains dual-feasible:

    Aᵀ(π + θ_D(Δπ)) + (σ + θ_D(Δσ)) = Aᵀπ + σ + θ_D(AᵀΔπ + Δσ) = c.

The different primal and dual steps retain these properties because the primal and dual variables are involved in separate constraints.

We now have for the primal variables, as before,

    x_j + θ_P(Δx_j) > 0  ⟹  θ_P < −x_j/Δx_j for each j such that Δx_j < 0.

But now for the dual variables,

    σ_j + θ_D(Δσ_j) > 0  ⟹  θ_D < −σ_j/Δσ_j for each j such that Δσ_j < 0.

It follows that we may take

    θ_P = min(1, γθ_x),   θ_x = min_{j: Δx_j < 0} −x_j/Δx_j,
    θ_D = min(1, γθ_σ),   θ_σ = min_{j: Δσ_j < 0} −σ_j/Δσ_j,
where γ < 1 is a chosen parameter as before.
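In code the change is slight: the single θ becomes two. A sketch (our naming, reusing the ratio_limit helper from Section 10.2):

    def take_separate_steps(x, pi, sigma, dx, dpi, dsigma, gamma):
        """Update the iterate with separate primal and dual step lengths."""
        theta_P = min(1.0, gamma * ratio_limit(x, dx))          # keeps x > 0
        theta_D = min(1.0, gamma * ratio_limit(sigma, dsigma))  # keeps sigma > 0
        return (x + theta_P * dx,
                pi + theta_D * dpi,
                sigma + theta_D * dsigma)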
11.2 Infeasible starting points
The assumption of an initial feasible solution has been convenient for our derivations, but is not essential. If Ax = b does not hold, then the equations for the step are still A(x + Δx) = b, but they only rearrange to

    AΔx = b − Ax

rather than simplifying to AΔx = 0. Similarly if Aᵀπ + σ = c does not hold, then the associated equations for the step are still Aᵀ(π + Δπ) + (σ + Δσ) = c, but they only rearrange to

    AᵀΔπ + Δσ = c − Aᵀπ − σ

rather than simplifying to AᵀΔπ + Δσ = 0. As was the case in our derivation of the barrier method, however, this generalization only changes the constant terms of the step equations. Thus we can proceed as before to drop a ΔXΔΣ term and eliminate Δσ to give

    [ −X⁻¹Σ   Aᵀ ] [ Δx ]   [ (c − Aᵀπ) − μX⁻¹e ]
    [   A      0 ] [ Δπ ] = [       b − Ax      ]

Again the matrix is the same. The only difference is the addition of a few terms in the right-hand side. After the step equations have been solved, the computation of the step length and the determination of a new iterate can proceed as before.
Methods of this kind are called, naturally, infeasible interior-point methods. Their iterates may eventually achieve feasibility:

• If at any iteration the primal step length θ_P = 1, then the new primal iterate x + θ_P(Δx) is simply x + Δx, which satisfies A(x + Δx) = Ax + AΔx = Ax + (b − Ax) = b. Hence the next iterate is primal-feasible, after which the algorithm works like the previous, feasible one and all subsequent iterates are primal-feasible.

• If at any iteration the dual step length θ_D = 1, then the new dual iterate (π + θ_D(Δπ), σ + θ_D(Δσ)) is simply (π + Δπ, σ + Δσ), which satisfies Aᵀ(π + Δπ) + (σ + Δσ) = (Aᵀπ + σ) + (AᵀΔπ + Δσ) = (Aᵀπ + σ) + (c − Aᵀπ − σ) = c. Hence the next iterate is dual-feasible, after which the algorithm works like the previous, feasible one and all subsequent iterates are dual-feasible.

After both a θ_P = 1 and a θ_D = 1 step have been taken, therefore, the algorithm behaves like a feasible interior-point method. It can be proved, however, that even if all primal step lengths θ_P < 1 or all dual step lengths θ_D < 1, the iterates must still converge to a feasible point.
11.3 Bounded variables
The logic of our previous interior-point method derivations can be extended straightforwardly to the case where the variables are subject to bounds x_j ≤ u_j. We work with the following primal-dual pair, which incorporates additional dual variables ω corresponding to the bound constraints:

    Minimize    cᵀx          Maximize    bᵀπ − uᵀω
    Subject to  Ax = b       Subject to  Aᵀπ − ω ≤ c
                x ≤ u                    ω ≥ 0
                x ≥ 0

There are now two sets of complementarity conditions:

• Either x_j = 0 or a_jᵀπ − ω_j = c_j (or both), for each j = 1, …, n
• Either ω_j = 0 or x_j = u_j (or both), for each j = 1, …, n

Writing σ_j for c_j − (a_jᵀπ − ω_j) and s_j for u_j − x_j, these conditions are equivalent to x_jσ_j = 0 and ω_js_j = 0.

Thus the primal feasibility, dual feasibility, complementarity, and nonnegativity conditions for the barrier problem are

    Ax = b,  x + s = u
    Aᵀπ − ω + σ = c
    XΣe = μe,  SΩe = μe
    x ≥ 0, σ ≥ 0, s ≥ 0, ω ≥ 0

From this point the derivation of the step equations is much as before. We substitute x + Δx for x, s + Δs for s, and so forth for the other variables; drop the terms ΔXΔΣe and ΔSΔΩe; and use three of the five equations to eliminate the vectors Δσ, Δω, and Δs. The remaining equations are

    [ −(X⁻¹Σ + S⁻¹Ω)   Aᵀ ] [ Δx ]   [ (c − Aᵀπ + ω) − S⁻¹Ω(u − x) − μ(X⁻¹ − S⁻¹)e ]
    [        A          0 ] [ Δπ ] = [                   b − Ax                    ]
These equations may look a lot more complicated, but they're not much more work to assemble than the ones we previously derived. In the matrix, the only change is to replace X⁻¹Σ by X⁻¹Σ + S⁻¹Ω, another diagonal matrix, whose entries are σ_j/x_j + ω_j/s_j. The expression for the right-hand side vector, although it has a few more terms, still involves only vectors and diagonal matrices, and so remains inexpensive to compute.
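A sketch of the assembly (our naming; ω is passed as omega, and the vectors are combined exactly as in the display above):

    import numpy as np

    def bounded_step(A, b, c, u, x, s, pi, sigma, omega, mu):
        """Step directions (dx, dpi) with bounds x <= u; s are the bound
        slacks and omega the duals for the bounds, all assumed positive."""
        m, n = A.shape
        d = sigma / x + omega / s                  # X^{-1}Sigma + S^{-1}Omega
        K = np.block([[np.diag(-d), A.T],
                      [A, np.zeros((m, m))]])
        r1 = ((c - A.T @ pi + omega)
              - (omega / s) * (u - x)              # S^{-1}Omega (u - x)
              - mu * (1 / x - 1 / s))              # mu (X^{-1} - S^{-1}) e
        rhs = np.concatenate([r1, b - A @ x])
        sol = np.linalg.solve(K, rhs)
        return sol[:n], sol[n:]                    # dx, dpi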
The starting point for this extended method can be any x > 0, σ > 0, s > 0, ω > 0. As in the infeasible method described previously, the solution need not initially satisfy any of the equations. In particular, the solution may initially fail to satisfy x + s = u, and in fact x may start at a value greater than u. This is because the method handles x ≤ u like any other constraint, adding slacks to make it just another equality in nonnegative variables.
As an alternative we can define an interior point to be one that is strictly within all its bounds, upper as well as lower. This means that the initial solution has to satisfy 0 < x < u as well as σ > 0, ω > 0. Thus rather than treating s as an independent variable, we can define s = u − x > 0, in which case the right-hand-side term S⁻¹Ω(u − x) = S⁻¹Ωs = ω and the step equations can be simplified to look more like the ones for the non-bounded case:

    [ −(X⁻¹Σ + S⁻¹Ω)   Aᵀ ] [ Δx ]   [ (c − Aᵀπ) − μ(X⁻¹ − S⁻¹)e ]
    [        A          0 ] [ Δπ ] = [           b − Ax          ]

To proceed in this way it is necessary to pick the step length θ so that 0 < x + θ(Δx) < u. We require

    x_j + θ(Δx_j) > 0  ⟹  θ < −x_j/Δx_j for each j such that Δx_j < 0,

and also

    x_j + θ(Δx_j) < u_j  ⟹  θ < (u_j − x_j)/Δx_j for each j such that Δx_j > 0.

To keep θ less than all these values, we must choose it less than their minimum. Adding the conditions necessary to keep σ + θ(Δσ) > 0 and ω + θ(Δω) > 0, we arrive at

    θ ≤ 1,  θ < θ_x = min( min_{j: Δx_j < 0} −x_j/Δx_j,  min_{j: Δx_j > 0} (u_j − x_j)/Δx_j ),
            θ < θ_σ = min_{j: Δσ_j < 0} −σ_j/Δσ_j,  θ < θ_ω = min_{j: Δω_j < 0} −ω_j/Δω_j.
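In code, the two-sided ratio test might look as follows (a sketch; the one-sided ratio_limit from Section 10.2 still serves for σ and ω):

    import numpy as np

    def ratio_limit_bounded(x, dx, u):
        """Largest t keeping 0 < x + t*dx < u."""
        t = np.inf
        lo = dx < 0                                 # components moving toward 0
        if np.any(lo):
            t = min(t, np.min(-x[lo] / dx[lo]))
        hi = dx > 0                                 # components moving toward u
        if np.any(hi):
            t = min(t, np.min((u[hi] - x[hi]) / dx[hi]))
        return t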