Computational Optimization and Applications, 25, 85–122, 2003. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Truncated Newton Method for the Solution of Large-Scale Inequality Constrained Minimization Problems

FRANCISCO FACCHINEI [email protected]
GIAMPAOLO LIUZZI [email protected]
STEFANO LUCIDI [email protected]
Dipartimento di Informatica e Sistemistica “A. Ruberti”, Università di Roma “La Sapienza”, Via Buonarroti 12, 00185 Roma, Italy

Received December 27, 2001; Accepted September 9, 2002

Dedicated to: This paper is dedicated to Lucien Polak on the occasion of his 72nd birthday, in appreciation of his outstanding and sustained contributions to optimization. His work has been a constant source of inspiration for many researchers and it is a pleasure and an honor for us to contribute to this special issue.

Abstract. A new active set Newton-type algorithm for the solution of inequality constrained minimization problems is proposed. The algorithm possesses the following favorable characteristics: (i) global convergence under mild assumptions; (ii) superlinear convergence of primal variables without strict complementarity; (iii) a Newton-type direction computed by means of a truncated conjugate gradient method. Preliminary computational results are reported to show viability of the approach in large scale problems having only a limited number of constraints.

Keywords: constrained optimization, active set, Newton-type method, exact penalty function, strict complementarity

1. Introduction

We consider the numerical solution of the inequality constrained minimization problem

minimize f(x)
s.t. g(x) ≤ 0,                                                                (P)

where f : R^n → R and g : R^n → R^m are three times continuously differentiable and where we assume that second order derivatives of f and g are available.

For the sake of simplicity, we do not consider equality constraints. In fact, the analysis of the equality constrained case would complicate considerably the presentation without adding much to the basic ideas we want to convey. However, equality constraints can be easily handled and indeed this is actually done in our implementation, see Section 7.

Constrained nonlinear optimization is regarded as an active and challenging research field, especially in the case of large scale problems. To date, a considerable number of


solution techniques have been proposed in the literature to solve Problem (P). In particular, on the basis of these methods some efficient codes have been developed. We recall, as an example, LANCELOT by Conn et al. [10], which is a projected augmented Lagrangian method with trust region step computation and direct handling of simple bound constraints; NITRO by Byrd et al. [7, 8], which is a primal-dual interior point SQP method using trust region techniques. Byrd, Nocedal and coworkers successively improved their interior point trust region algorithm and came up with KNITRO [9], a stable and reliable optimization package both for small and large scale constrained and unconstrained nonlinear programs. As concerns recent SQP methods, we cite SNOPT by Gill, Murray and Saunders [23], which is an SQP method making use of a smooth augmented Lagrangian merit function along with a line search step computation procedure. The filter SQP method of Fletcher and Leyffer [21, 22, 38] is basically an SQP trust region method but, rather than using a merit function to enforce convergence toward a solution, it makes use of a “filter” which allows a step to be accepted if it reduces either the objective function or a measure of the constraint violation. Finally, we cite LOQO by Shanno and Vanderbei [35], which is essentially a primal-dual interior point method. All these codes have proved [2, 5, 6, 23, 31] to be very effective in solving large scale nonlinear programs.

In this paper we are interested in solving Problem (P) when the number of constraints is small with respect to the number of variables, which is assumed to be large. Our aim is to exploit this particular aspect of the structure of such problems in order to develop a new algorithm for locating KKT points of Problem (P) which enjoys good global and local convergence properties under reasonable assumptions. The approach proposed in this paper can be described quite easily and informally in the following way. At each iteration a search direction p^k is calculated; then the main iteration of the algorithm has the form

x^{k+1} = x^k + α_k p^k,

where the step size α_k is chosen to yield a sufficient reduction of a suitable merit function Z(x; ε), namely an exact differentiable penalty function which ensures progress toward optimality. By weighting the possible decrease in the objective function against the requirement of enforcing the feasibility of iterates, such a merit function allows us to decide whether x^{k+1} is a better approximation to the solution than x^k. Furthermore, the use of a proper merit function allows us both to evaluate if the search direction is able to endow the algorithm with global convergence and, if this is not the case, to provide some alternative descent directions.

The merit function usually depends upon a penalty parameter ε that has to be tuned during the optimization process. Assuming that Z(x; ε) is differentiable, a possible, slightly more detailed scheme is given below.

Generic line search algorithm (GLSA) for Problem (P)

0. Data: x^0 ∈ R^n, ε > 0, γ ∈ (0, 1/2), σ ∈ (0, 1).

1. Initialization. Set k := 0.
2. Test for convergence. If x^k is a KKT point then STOP.
3. Choice of the search direction. Compute a search direction p^k.
4. Test on the penalty parameter. If ε is not sufficiently small, then reduce it and go to Step 3.
5. Line search. Compute α_k = (σ)^h, where h is the smallest nonnegative integer such that

   Z(x^k + α_k p^k; ε) ≤ Z(x^k; ε) + γ α_k ∇Z(x^k; ε)^T p^k,

   set x^{k+1} = x^k + α_k p^k, k := k + 1 and go to Step 2.
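To fix ideas, the following Python sketch mirrors the control flow of GLSA. The callables merit, grad_merit, search_direction, is_kkt and penalty_too_large are placeholders for problem-specific routines (they are not part of the paper), so this is only an illustration of the scheme, not the authors' implementation.

```python
import numpy as np

def glsa(x0, merit, grad_merit, search_direction, is_kkt,
         penalty_too_large, eps=1.0, gamma=1e-4, sigma=0.5,
         rho=0.5, max_iter=100):
    """Control flow of the generic line search algorithm (GLSA); illustrative only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if is_kkt(x):                          # Step 2: test for convergence
            break
        p = search_direction(x, eps)           # Step 3: compute a search direction
        while penalty_too_large(x, p, eps):    # Step 4: reduce eps until the test passes
            eps *= rho
            p = search_direction(x, eps)
        Zx = merit(x, eps)                     # Step 5: Armijo backtracking line search
        slope = grad_merit(x, eps) @ p
        alpha = 1.0
        while merit(x + alpha * p, eps) > Zx + gamma * alpha * slope:
            alpha *= sigma
        x = x + alpha * p
    return x, eps
```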

Schemes similar to GLSA can also be given in the case where the merit function Z is not differentiable. In general there is a trade-off: the calculation of nondifferentiable merit functions is very simple, while differentiable exact merit functions tend to be more computationally expensive; on the other hand, as can easily be expected, algorithms based on nondifferentiable functions are conceptually more involved than those which employ a differentiable merit function.

Roughly speaking, the global convergence properties of GLSA depend on the merit function Z and on the search direction p, which obviously has to be a descent direction for Z. A superlinear convergence rate depends instead on the local properties of the search direction p and on the step size α eventually being equal to one. We remark that, if we do not consider the updating of the penalty parameter ε at Step 4, algorithm GLSA looks like an algorithm for the unconstrained minimization of Z; however, since usually there is a correspondence between the unconstrained solutions of Z and those of Problem (P) only if ε is sufficiently small, we have to iteratively update ε. It is obviously desirable, for numerical reasons, that the penalty parameter does not go to zero and eventually stays fixed.

The exact definition of an algorithm following the scheme of GLSA depends on the specific choices we make in Steps 3–5 and on the selection of the merit function. Exact differentiable penalty functions have been profitably used in the definition of globally and superlinearly convergent algorithms for inequality constrained nonlinear programs. In particular, Di Pillo et al. [13], Facchinei [19] and Qi and Yang [34] propose SQP approaches stabilized by means of a continuously differentiable exact penalty function. Basically, the above algorithms differ in the treatment of inconsistent quadratic subproblems. In fact, while in [13] and [19] a first order approximation to the antigradient of the exact penalty function is used as an alternative direction, in [34], whenever an infeasible subproblem is encountered, the authors resort to the regularization technique proposed by Spellucci [36]. More recently, Qi and Yang [33] again employ the exact continuously differentiable penalty function of [29] and propose an active set QP-free algorithm which is globally and superlinearly convergent and requires only the solution of linear systems. All the abovementioned algorithms employ a quasi-Newton approximation to the Hessian of the Lagrangian.

In this paper we propose a new active set Newton-type algorithm for inequality constrained nonlinear programs, which is globally and superlinearly convergent without requiring the strict complementarity condition to hold, and which requires only the solution of linear systems. The approach that we propose is based upon the use of the exact continuously differentiable penalty function introduced by Contaldi et al. [11], which allows us to tackle problems having unbounded feasible set. Another distinguishing feature of the proposed approach is that we compute a search direction inexactly by means of an iterative truncated conjugate gradient procedure.

In Section 2 we introduce some notation and assumptions. In Section 3 we recall the merit function Z(x; ε) of [11] along with its main exactness properties. In Section 4 we describe how to compute a suitable search direction. In Section 5 we derive some properties of the search direction. In Section 6 we first discuss the strategy for updating the penalty parameter ε and the choice of the step length, and then we give a detailed description of the algorithm and of its properties. Finally, in the last section of this paper we present some preliminary numerical results on a selection of large scale problems.

2. Notations and assumptions

Given Problem (P), we denote by

F = {x ∈ R^n : g(x) ≤ 0}

its feasible set and by

I_0(x) = {i : g_i(x) = 0}

the set of active constraints at x. The Lagrangian function associated with Problem (P) is the function

L(x, λ) = f(x) + λ^T g(x).

A Kuhn-Tucker (KKT) pair for Problem (P) is a pair (x, λ) ∈ R^n × R^m such that

∇_x L(x, λ) = 0,    g(x)^T λ = 0,    λ ≥ 0,    g(x) ≤ 0.                      (2.1)
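For illustration only, the following NumPy sketch computes the residuals of the KKT conditions (2.1) for a candidate pair (x, λ); the callables grad_f, g and jac_g are assumed to be supplied by the user and are not part of the paper.

```python
import numpy as np

def kkt_residuals(x, lam, grad_f, g, jac_g):
    """Residuals of the KKT conditions (2.1) for a candidate pair (x, lam)."""
    gx = g(x)
    stationarity = grad_f(x) + jac_g(x).T @ lam               # grad_x L(x, lam)
    complementarity = abs(gx @ lam)                           # |g(x)^T lam|
    primal_violation = np.linalg.norm(np.maximum(gx, 0.0))    # violation of g(x) <= 0
    dual_violation = np.linalg.norm(np.maximum(-lam, 0.0))    # violation of lam >= 0
    return np.linalg.norm(stationarity), complementarity, primal_violation, dual_violation
```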

Let α ∈ R be a strictly positive constant. We consider the following open perturbation of the feasible set F of Problem (P):

S = { x ∈ R^n : Σ_{i=1}^m g_i^+(x)^3 < α } ⊃ F,                               (2.2)

and we denote by cl(S) its closure, by int(S) its interior and by ∂S its boundary. Moreover, we introduce the function

a(x) = α − Σ_{i=1}^m g_i^+(x)^3,                                              (2.3)

which takes positive values on S and is zero on its boundary.


The global convergence property of many optimization algorithms relies on so-called “a posteriori” assumptions such as, for instance, the boundedness of certain sequences produced by the algorithm itself. Unfortunately, requiring these assumptions may result in a lack of reliability of optimization methods, which may fail to converge even on relatively easy and well-posed problems (see [37]).

Having this problem in mind, we prefer to use “a priori” assumptions, that is, assumptions on the behavior of the objective function and constraints which, at least in principle, might be verified before the solution process begins.

In subsequent sections, we suppose the following three assumptions to be always satisfied.

Assumption A1. At least one of the two following conditions is satisfied:

(a) The set S is compact.
(b) There exists a known feasible point x̄ ∈ F and f(x) is radially unbounded on S (that is, for any sequence {x^k} ⊆ S such that ‖x^k‖ → ∞ we have f(x^k) → ∞).

Assumption A2. For every x ∈ F the gradients ∇g_i(x), i ∈ I_0(x), are linearly independent.

Assumption A3. At least one of the two following conditions is satisfied:

(a) At every point x ∈ S \ F,

    Σ_{i: g_i(x)>0} v_i(x) ∇g_i(x) ≠ 0,

    where

    v_i(x) = g_i(x) + (3/2) ‖g(x)‖^2 g_i(x)^2 / a(x),

    and a(x) is given by (2.3).
(b) There exists a known feasible point x̄ ∈ F.

It seems to us appropriate to discuss these assumptions. First of all, we note that all of them are weaker than the ones commonly used in defining continuously differentiable exact merit functions (see, e.g., [3, 14, 28]). More specifically, we see that by point (b) of A1 constrained optimization problems with unbounded feasible set can be tackled, provided that a feasible point is known and that the objective function is radially unbounded on S. Assumption A2 is an unavoidable assumption whenever we deal with exact penalty functions, which are not even defined if A2 is not satisfied. Nevertheless, we stress the fact that the linear independence property is required to hold only in the feasible region rather than in the whole space R^n. Additionally, it is worth noting that Assumption A2 is of great importance whenever we require the multipliers, or an estimate of them, to be unique.

Finally, we have Assumption A3. As is readily seen, this assumption is a weakening of the well-known generalized Mangasarian-Fromovitz constraint qualification condition ([30] and [16]), and we point out that A3 is related to the feasibility of Problem (P), and more precisely to the behavior of the constraints outside the feasible region. In fact, as has already been shown in [29], in the case of F compact and g_i convex, with i = 1, . . . , m, this is a necessary and sufficient condition for the feasible set not to be empty; that is, at least in this case, this is the weakest possible assumption.

Given two vectors u, v ∈ R^p we denote by max{u, v} the vector whose i-th component is max{u_i, v_i}, by V the diagonal matrix V = diag{v_1, . . . , v_p}, and by v^+ the vector v^+ = max{v, 0}. In the following, by I_p we denote the p × p identity matrix and by e the vector of all ones of appropriate dimension.

3. The merit function

We use the differentiable exact merit function studied in [11]. The main advantage of this merit function over other ones proposed in the literature is that it allows us to tackle problems with unbounded feasible set.

Another main characteristic of this merit function is that it is not defined on the whole space but only on the open set S, and that it goes to infinity on the boundary of S. From this point of view, this merit function can be regarded as an exterior barrier penalty function.

The basis for constructing a continuously differentiable penalty function is the definition of a continuously differentiable multiplier function which yields an estimate of the multiplier vector associated with Problem (P), namely a function λ(x) such that, if (x̄, λ̄) ∈ R^n × R^m is a KKT pair for Problem (P), then λ(x̄) = λ̄. In this context we use the multiplier function proposed in [29]; this function is defined under Assumption A2 and its expression is given by

λ(x) = −M(x)^{-1} ∇g(x)^T ∇f(x),                                              (3.4)

with M(x) an m × m matrix defined by

M(x) = ∇g(x)^T ∇g(x) + G(x)^2 + (Σ_{i=1}^m g_i^+(x)^3) I_m.                   (3.5)

Following analogous reasoning as in [24] and [15], it is easy to check that, under Assumption A2, the matrix M(x) given by (3.5) is nonsingular for every x ∈ R^n.

We remark that the evaluation of λ(x) requires the solution of a square linear system of order m, which is not a difficult task provided that the number of constraints is small, as in our context.
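As an illustration of this computation, the following NumPy sketch evaluates the multiplier function by assembling M(x) as in (3.5) and solving the m × m system implied by (3.4). The callables grad_f, g and jac_g (the m × n Jacobian of g) are assumed user-supplied; the code is only a transcription of the formulas, not the authors' implementation.

```python
import numpy as np

def multiplier_function(x, grad_f, g, jac_g):
    """lambda(x) = -M(x)^{-1} grad g(x)^T grad f(x), with M(x) as in (3.5)."""
    gx = g(x)                       # m-vector of constraint values
    J = jac_g(x)                    # m x n Jacobian; rows are grad g_i(x)^T
    gplus = np.maximum(gx, 0.0)
    # M(x) = grad g^T grad g + G(x)^2 + (sum_i g_i^+(x)^3) I_m
    M = J @ J.T + np.diag(gx**2) + np.sum(gplus**3) * np.eye(len(gx))
    return np.linalg.solve(M, -J @ grad_f(x))
```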

On the open set S of (2.2), we can define the following penalty function (see [11])

Z(x; ε) = f(x) + Σ_{i=1}^m [ λ_i(x) max[g_i(x), −ε p_i(x; ε) λ_i(x)]
          + (1 / (2 ε p_i(x; ε))) max[g_i(x), −ε p_i(x; ε) λ_i(x)]^2 ],        (3.6)

where

p_i(x; ε) = a(x) / (1 + 2 ε a(x) λ_i(x)^2),    i = 1, . . . , m.

It is clear that the main computational burden associated with the evaluation of Z(x; ε) is the evaluation of the multiplier vector λ(x). Obviously, this cannot be considered a difficult task when the number of constraints m is small. However, it should also be pointed out that most algorithms for constrained optimization require, at each iteration, an estimate of the Lagrange multipliers based on the solution of a linear system.
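A minimal sketch of the evaluation of Z(x; ε), written directly from (2.3), (3.6) and the definition of p_i(x; ε) above, may help to fix ideas. Here f, g and lam (the multiplier function λ(x)) are assumed user-supplied callables and alpha is the constant defining S in (2.2); the sketch assumes x ∈ S so that a(x) > 0.

```python
import numpy as np

def merit_Z(x, eps, f, g, lam, alpha):
    """Evaluate Z(x; eps) of (3.6); illustrative sketch only."""
    gx, lx = g(x), lam(x)
    a = alpha - np.sum(np.maximum(gx, 0.0)**3)     # a(x) from (2.3); positive on S
    p = a / (1.0 + 2.0 * eps * a * lx**2)          # p_i(x; eps)
    m = np.maximum(gx, -eps * p * lx)              # max[g_i, -eps p_i lambda_i]
    return f(x) + np.sum(lx * m + m**2 / (2.0 * eps * p))
```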

A detailed study of the properties of Z(x; ε) has been carried out in [11], to which we refer the interested reader. Here we only report those properties of the function Z(x; ε) that are relevant to our algorithm. The proofs of the following four propositions are reported in [11].

Proposition 3.1. For all x ∈ S and for all positive ε, Z(x; ε) is continuously differentiable and its gradient is given by

∇Z(x; ε) = ∇f(x) + ∇g(x) λ(x) + Σ_{i=1}^m max[g_i(x), −ε p_i(x; ε) λ_i(x)]
           × [ (1 / (ε p_i(x; ε))) ∇g_i(x) + ∇λ_i(x)
               − (max[g_i(x), −ε p_i(x; ε) λ_i(x)] / (2 ε p_i(x; ε)^2)) ∇p_i(x; ε) ],        (3.7)

where, for i = 1, . . . , m,

∇p_i(x; ε) = −p_i(x; ε)^2 [ (3 / a(x)^2) Σ_{j=1}^m g_j^+(x)^2 ∇g_j(x) + 4 ε λ_i(x) ∇λ_i(x) ],   (3.8)

∇λ(x) = −M(x)^{-1} [ ∇g(x)^T ∇^2_x L(x, λ(x)) + Σ_{i=1}^m e_i ∇_x L(x, λ(x))^T ∇^2 g_i(x)
         + 2 Λ(x) G(x) ∇g(x)^T + 3 λ(x) Σ_{i=1}^m g_i^+(x)^2 ∇g_i(x)^T ],                        (3.9)

and where

∇_x L(x, λ(x)) := [∇_x L(x, λ)]_{λ=λ(x)},    ∇^2_x L(x, λ(x)) := [∇^2_x L(x, λ)]_{λ=λ(x)}.

Furthermore, for every positive ε, ∇Z(x; ε) is locally Lipschitz continuous at every point x ∈ S.

As regards expression (3.9), it can be derived, with minor modifications, as in [24] and [15]. Now we report a more technical result which is needed to prove the global convergence properties of our algorithm. In order to do that, we first define the following level set relative to the open set S:

Ω(x̄; ε) = {x ∈ S : Z(x; ε) ≤ Z(x̄; ε)},

where x̄ ∈ F if a feasible point is known (in particular, when points (b) of Assumptions A1 and A3 hold); otherwise, if a feasible point is not available, we choose x̄ ∈ S.

Proposition 3.2.
(a) For every ε > 0 the set Ω(x̄; ε) is closed. Furthermore, a compact subset Ω̂ of S exists such that

    Ω(x̄; ε) ⊆ Ω̂

    for every ε > 0.
(b) Let {ε_k} be a sequence of positive numbers converging to zero and let {x^k} be a sequence of points such that x^k ∈ Ω(x̄; ε_k). If

    lim_{k→∞} ε_k ∇Z(x^k; ε_k) = 0,

    then every limit point of {x^k} belongs to F.
(c) Let x̂ ∈ F. Then, there exist numbers ε(x̂) > 0, η(x̂) > 0 and ρ(x̂) > 0 such that, for all ε ∈ (0, ε(x̂)] and for all x ∈ S satisfying ‖x − x̂‖ ≤ η(x̂), the following holds:

    ε^2 ‖∇g(x)^T ∇Z(x; ε)‖^2 ≥ ρ(x̂) ‖max[g(x), −ε P(x; ε) λ(x)]‖^2.

In the next two propositions we briefly report the main exactness properties [11] of the exact penalty function Z(x; ε).

Proposition 3.3. For every ε > 0,

(a) if (x̄, λ̄) is a KKT pair of Problem (P), then λ(x̄) = λ̄ and

    ∇Z(x̄; ε) = 0;

(b) if x̄ ∈ S is a point such that

    ∇Z(x̄; ε) = 0    and    max[g(x̄), −ε P(x̄; ε) λ(x̄)] = 0,

    then (x̄, λ(x̄)) is a KKT pair of Problem (P).

Proposition 3.4. There exists a value ε* > 0 such that, for all ε ∈ (0, ε*],
(a) if x ∈ Ω(x̄; ε) is a stationary point of Z(x; ε), the pair (x, λ(x)) is a KKT pair of Problem (P);
(b) if x ∈ Ω(x̄; ε) is a global minimum point of Z(x; ε), then x is a global minimum point of Problem (P) and λ(x) is its corresponding KKT multiplier, and conversely.


4. The search direction

We use two directions in our algorithm. The first one, indicated by d^k, is a truncated Newton-type direction which should be regarded as the standard choice. The second one is the antigradient of the merit function Z; this latter direction is used when d^k is not defined or is, in some suitable technical sense to be defined later, unsuitable. There is not much to say about the antigradient of the merit function, except that its evaluation does not require costly computations, apart from the solution of the linear system involved in the definition of λ(x), which, however, is already carried out in evaluating the value of Z and need not be repeated when evaluating its gradient.

Then, we pass to the description of the truncated Newton-type direction. A basic ingredient in the definition of d^k is an estimate of the active set at a solution point. We use the following estimate

A(x^k) = { i : g_i(x^k) + ε p_i(x^k; ε) λ_i(x^k) ≥ 0 }.                       (4.10)
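In code, the estimate (4.10) is a one-liner; the sketch below assumes the vectors g(x^k), p(x^k; ε) and λ(x^k) have already been computed as NumPy arrays.

```python
import numpy as np

def active_set_estimate(gx, p, lam, eps):
    """Indices in the estimate A(x^k) of (4.10): g_i + eps * p_i * lambda_i >= 0."""
    return np.where(gx + eps * p * lam >= 0.0)[0]
```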

The definition of A(x^k) (which will also be denoted by A^k in the following) may look awkward at first sight, but it is obviously related to the max terms in the definition of the penalty function Z(x; ε). In any case, taking into account that every p_i(x; ε) is a positive function bounded away from 0 around a KKT point x̄, and recalling that λ(x̄) = λ̄, it is easy to check, see [20] for details, that

I_+(x̄) ⊆ A(x) ⊆ I_0(x̄),    ∀x ∈ B(x̄),                                        (4.11)

where

I_+(x̄) = { i : g_i(x̄) = 0 and λ̄_i > 0 },                                     (4.12)

and B(x̄) is a suitably small neighborhood of x̄.

The search direction d^k is obtained by approximately solving the following system (see e.g. [20] and [17]):

[ ∇^2_x L(x^k, λ(x^k))    ∇g_{A^k}(x^k) ] [ d^k ]      [ ∇f(x^k)      ]
[ ∇g_{A^k}(x^k)^T         0             ] [ z^k ]  = − [ g_{A^k}(x^k) ],       (4.13)

which, omitting the arguments, can also be rewritten as

[ ∇^2_x L^k          ∇g^k_{A^k} ] [ d^k              ]      [ ∇f^k + ∇g^k_{A^k} λ^k_{A^k} ]
[ (∇g^k_{A^k})^T     0          ] [ z^k − λ^k_{A^k}   ]  = − [ g^k_{A^k}                  ].    (4.14)
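For problems with few variables, system (4.13) can simply be assembled and solved with a dense factorization; the sketch below does exactly that and serves only as a reference point for the truncated procedure developed in the remainder of the section. The arrays hess_L, grad_f, jac_gA and gA are assumed user-supplied.

```python
import numpy as np

def newton_direction_dense(hess_L, grad_f, jac_gA, gA):
    """Solve the KKT system (4.13) exactly (small, dense problems only).
    hess_L: n x n Hessian of the Lagrangian; jac_gA: |A| x n Jacobian of the
    active constraints; grad_f: n-vector; gA: |A|-vector of active values."""
    n, mA = hess_L.shape[0], jac_gA.shape[0]
    K = np.block([[hess_L, jac_gA.T],
                  [jac_gA, np.zeros((mA, mA))]])
    rhs = -np.concatenate([grad_f, gA])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]            # the pair (d^k, z^k)
```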

In the sequel, for the sake of simplicity, we shall use the following assumption:

Assumption B1. For every k the matrix ∇g_{A^k}(x^k) has full column rank.

We stress that Assumption B1 is not an additional assumption and will never be invoked as such in establishing the properties of the algorithm. Assumption A2 and property (4.11) imply that Assumption B1 holds in a sufficiently small neighborhood of a KKT point, and this is all we really need. However, in this section and in the next one, for simplicity we will make reference to Assumption B1 when establishing some intermediate results.

As suggested in [7] and [8] (see also [17]), we compute a truncated solution of system (4.13) by decomposing the search direction d^k into a vertical step d^k_v ∈ R(∇g_{A^k}(x^k)) and a horizontal step d^k_o ∈ N(∇g_{A^k}(x^k)^T), namely

d^k = d^k_v + d^k_o,                                                          (4.15)

with (d^k_v)^T d^k_o = 0 and ∇g_{A^k}(x^k)^T d^k = ∇g_{A^k}(x^k)^T d^k_v.

Then we define the following projection operator P^k : R^n → N(∇g_{A^k}(x^k)^T):

P^k = I − ∇g_{A^k}(x^k) (∇g_{A^k}(x^k)^T ∇g_{A^k}(x^k))^{-1} ∇g_{A^k}(x^k)^T,      (4.16)

using which we can break system (4.13) into the following three systems of equations:

P^k ( ∇^2_x L(x^k, λ(x^k)) (d^k_o + d^k_v) + ∇g_{A^k}(x^k) z^k + ∇f(x^k) ) = 0,        (4.17)
(I − P^k) ( ∇^2_x L(x^k, λ(x^k)) (d^k_o + d^k_v) + ∇g_{A^k}(x^k) z^k + ∇f(x^k) ) = 0,  (4.18)
∇g_{A^k}(x^k)^T d^k_v = −g_{A^k}(x^k).                                                 (4.19)

Hence, we obtain d^k_v as the minimum norm solution of Eq. (4.19). Therefore, taking into account Assumption B1, we have that d^k_v is given by

d^k_v = −∇g_{A^k}(x^k) (∇g_{A^k}(x^k)^T ∇g_{A^k}(x^k))^{-1} g_{A^k}(x^k).              (4.20)

Once the vertical step d^k_v of the search direction has been obtained, we compute the horizontal step d^k_o by solving Eq. (4.17) which, taking into account that P^k(∇g_{A^k}(x^k) z^k) = 0, can be rewritten as

P^k ∇^2_x L(x^k, λ(x^k)) d^k_o = −P^k ( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) ).        (4.21)

System (4.21), which constitutes a major difficulty in the computation of the direction d^k (in case m ≪ n), can be solved by means of a truncated Newton scheme based on a projected conjugate gradient method, which we shall further investigate in the sequel, and which produces an approximate solution for the horizontal component d^k_o of the search direction. Finally, having calculated the complete direction d^k we can compute z^k, which will be used in the definition of algorithm TNA in Section 6, by means of the following formula

z^k = −(∇g_{A^k}(x^k)^T ∇g_{A^k}(x^k))^{-1} ∇g_{A^k}(x^k)^T ( ∇^2_x L(x^k, λ(x^k)) d^k + ∇f(x^k) ).     (4.22)

Vector z^k, thus calculated, exactly solves system (4.18).

Let us further analyze the calculation of the horizontal component d^k_o of the search direction. Since any d_o ∈ N(∇g_{A^k}(x^k)^T) can be written as d_o = P^k y, with y ∈ R^n, the horizontal step d^k_o can be obtained by computing a solution y^k of the following system:

P^k ∇^2_x L(x^k, λ(x^k)) P^k y = −P^k ( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) ),        (4.23)


and then setting d^k_o = P^k y^k. In the case of large scale problems, instead of exactly solving system (4.23), which frequently is an expensive task, it is possible to efficiently compute a truncated solution by using some conjugate gradient based method. Similarly to preconditioned conjugate gradient algorithms, all the computations for solving system (4.23) can be directly referred to the original vector d_o. In this way it is possible to derive an iterative procedure that, using the same computations as a conjugate gradient method except for an additional projection per iteration, approximately computes the horizontal step d^k_o.

Such an iterative procedure was first proposed in [7]. In this paper we employ the projected CG method introduced in [17], which is reported below.

PCG algorithm:

Data: η_k > 0, c_1 ∈ (0, 1), c_2 ∈ (0, 1). Define tol_k = (1/k) ‖P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k))‖.

Step 0: Set t^0 = 0, θ^0 = 0, r^0 = −P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k)), r̄^0 = r^0, s^0 = r̄^0, and i = 0.

Step 1: If |(s^i)^T ∇^2_x L^k s^i| ≤ c_1 ‖s^i‖^2 then set

        d^k_o = r^0 if i = 0,    d^k_o = θ^i if i > 0,

        and STOP.

Step 2: Compute: ρ^i = (s^i)^T r̄^i / ((s^i)^T ∇^2_x L^k s^i), t^{i+1} = t^i + ρ^i s^i, r^{i+1} = r^i − ρ^i ∇^2_x L^k s^i, r̄^{i+1} = P^k r^{i+1},

        θ^{i+1} = θ^i − ρ^i s^i if (s^i)^T ∇^2_x L^k s^i < −c_1 ‖s^i‖^2;
        θ^{i+1} = θ^i + ρ^i s^i if (s^i)^T ∇^2_x L^k s^i > c_1 ‖s^i‖^2.

Step 3: If ‖r̄^{i+1}‖ > tol_k then compute β^i = (r̄^{i+1})^T r̄^{i+1} / ((r̄^i)^T r̄^i), s^{i+1} = r̄^{i+1} + β^i s^i, set i := i + 1 and go to Step 1; else go to Step 4.

Step 4: Set

        t = t^{i+1} if −(r^0)^T t^{i+1} ≤ 0;
        t = −t^{i+1} if −(r^0)^T t^{i+1} > 0.

        If |(r^0)^T t| ≥ c_2 ‖r^0‖^2 then set d^k_o = t, else set d^k_o = θ^{i+1}. STOP.

The above procedure derives from the application of the truncated scheme proposed in [25] to the solution of system (4.21).

Algorithm PCG has the following distinctive properties:

– it is a projected conjugate gradient method, in the sense that it generates conjugate directions lying in the null space of the transposed gradients of the active constraints;
– if no conjugate direction with curvature too small in absolute value is encountered, it terminates producing a direction satisfying system (4.21) within a prefixed tolerance tol_k;
– if the truncated direction t produced by the algorithm does not satisfy a sufficient angle condition with the initial residual P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k)), then the direction t is replaced by a perturbation θ obtained by reversing those conjugate directions having a negative curvature; in this way a direction d^k_o is always produced which guarantees “good properties” to the global direction d^k in (4.15), as we shall clarify in the following section.

We stress the fact that all the computations required to come up with the search direction d^k, namely the computation of the vertical step d^k_v and the projections necessary to execute Step 2 of the PCG algorithm, can be carried out by means of a unique matrix factorization, as described in the appendix.
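As a rough illustration of this remark, the sketch below derives both the projection P^k of (4.16) and the vertical step (4.20) from a single thin QR factorization of ∇g_{A^k}(x^k); this is only one possible realization under Assumption B1 and is not necessarily the factorization described in the appendix.

```python
import numpy as np

def make_projection_tools(grad_gA):
    """grad_gA: n x |A| matrix whose columns are the active-constraint gradients.
    Returns (project, vertical_step), both built on one thin QR factorization."""
    Q, R = np.linalg.qr(grad_gA)            # grad_gA = Q R, R invertible under B1

    def project(v):
        # P^k v = v - Q Q^T v: orthogonal projection onto N(grad_gA^T), cf. (4.16)
        return v - Q @ (Q.T @ v)

    def vertical_step(gA):
        # minimum-norm solution of grad_gA^T d = -gA, i.e. (4.20): d_v = -Q R^{-T} gA
        return -Q @ np.linalg.solve(R.T, gA)

    return project, vertical_step
```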

5. Properties of the search direction

In this section we study some important properties of the search direction d^k given by

d^k = d^k_o + d^k_v,

where d^k_v solves system (4.19) and d^k_o is the truncated direction computed by means of algorithm PCG. This section is highly technical and the properties we establish here are the key to the study of our algorithm. The main results in the section are Propositions 5.5, 5.6 and 5.7.

Proposition 5.5 essentially says that the quantity

‖d^k‖ + ‖z^k − λ_{A^k}(x^k)‖ + ‖λ_{N^k}(x^k)‖

is a stationarity measure, in that it gauges whether we are approaching a stationary point. Here N^k denotes the complement of A^k, that is,

N^k = {1, . . . , m} \ A^k.

Proposition 5.6 shows that, if the penalty parameter ε is chosen sufficiently small, then the direction d^k satisfies an “angle condition” which guarantees that the above stationarity measure is driven to zero by the algorithm to be described in the next section.

Finally, Proposition 5.7 establishes that the “local” iteration x^{k+1} = x^k + d^k is superlinearly convergent under suitable assumptions. The superlinear convergence of the global algorithm will obviously hinge on this property.

The rest of the results in this section are used to establish the above main properties.

Proposition 5.1. Let {x^k} be a bounded sequence and let d^k_o be computed by means of the PCG algorithm. Then, positive constants ρ_1 and ρ_2 exist, which do not depend on the iteration index k, such that

(d^k_o)^T P^k(∇^2_x L(x^k, λ^k) d^k_v + ∇f(x^k)) ≤ −ρ_1 ‖P^k(∇^2_x L(x^k, λ^k) d^k_v + ∇f(x^k))‖^2,      (5.24)

‖d^k_o‖ ≤ ρ_2 ‖P^k(∇^2_x L(x^k, λ^k) d^k_v + ∇f(x^k))‖.                                                   (5.25)

Proof: Let us set

P^k ∇^2_x L(x^k, λ(x^k)) P^k = H    and    P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k)) = g

in system (4.23) and consider that, by the Projection Theorem, ‖P^k y‖ ≤ ‖y‖; then the proof easily follows from the results of Theorem 2.2 of Ref. [25].

The above proposition states an important technical result. It is an extension to the constrained case of the results reported in [25] and guarantees that the direction d^k_o produced by algorithm PCG is related to the right hand side of system (4.21).

This connection is able to guarantee that the complete direction d^k is, roughly speaking, a gradient related direction for the penalty function (see Proposition 5.6 below). The following proposition states another quite technical result that will be used when proving the abovementioned connection.

Proposition 5.2. Let {x^k} be a bounded sequence and let d^k_o be computed by means of the PCG algorithm. Then a positive constant ρ_1 exists, which does not depend on the iteration index k, such that

‖P^k(∇f(x^k) + ∇g_{A^k}(x^k) z^k)‖ ≤ (1/ρ_1) ‖d^k_o‖ + ‖P^k ∇^2_x L(x^k, λ(x^k)) d^k_v‖.                 (5.26)

Proof: By Proposition 5.1, d^k_o satisfies relation (5.24), from which it follows that

|(d^k_o)^T P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k))| ≥ ρ_1 ‖P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k))‖^2,

which, employing the Schwarz inequality, yields

(1/ρ_1) ‖d^k_o‖ ≥ ‖P^k(∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k))‖.

The proposition is finally proved by noting that P^k(∇g_{A^k}(x^k) z^k) = 0.

It is possible to relate the gradient of the Lagrangian function to the quantities ‖d^k‖, ‖z^k − λ_{A^k}(x^k)‖ and ‖λ_{N^k}(x^k)‖, as shown by the following proposition.


Proposition 5.3. Suppose that Assumption B1 holds. Let {x^k} be a bounded sequence, let z^k satisfy (4.18) and let d^k_o be computed by means of the PCG algorithm. Then, a positive constant ρ, not depending on k, exists such that

‖∇_x L(x^k, λ(x^k))‖ ≤ ρ ( ‖d^k‖ + ‖λ_{N^k}(x^k)‖ + ‖z^k − λ_{A^k}(x^k)‖ ).                              (5.27)

Proof: By Proposition 5.2, d^k_o satisfies (5.26). Using Eq. (4.18) and the properties of the projection operator P^k, it results that

‖∇_x L(x^k, λ(x^k))‖ ≤ ‖P^k(∇f(x^k) + ∇g_{A^k}(x^k) λ_{A^k}(x^k))‖
                      + ‖(I − P^k)(∇f(x^k) + ∇g_{A^k}(x^k) z^k)‖
                      + ‖(I − P^k) ∇g_{A^k}(x^k)‖ ‖z^k − λ_{A^k}(x^k)‖
                      + ‖∇g_{N^k}(x^k)‖ ‖λ_{N^k}(x^k)‖
                    ≤ (1/ρ_1) ‖d^k_o‖ + ‖P^k ∇^2_x L(x^k, λ(x^k))‖ ‖d^k_v‖
                      + ‖(I − P^k) ∇^2_x L(x^k, λ(x^k))‖ ‖d^k‖ + ‖∇g_{N^k}(x^k)‖ ‖λ_{N^k}(x^k)‖
                      + ‖(I − P^k) ∇g_{A^k}(x^k)‖ ‖z^k − λ_{A^k}(x^k)‖.

The proof easily follows by the continuity assumption and by taking into account that the sequence {x^k} is bounded.

With these preliminary properties, it is now possible for us to state the following two propositions regarding the overall solution (d^k, z^k) of system (4.13).

Proposition 5.4. Suppose that Assumption B1 holds and, for every positive value of ε, let {x^k}, {d^k}, {z^k} be sequences such that:
(i) {x^k} and {z^k} are bounded;
(ii) for every k, z^k satisfies (4.18) and the direction d^k is given by (4.15), where d^k_v satisfies (4.19) and d^k_o is computed by means of the PCG algorithm.
Then, there exist positive constants c_1 and c_2, not depending on ε and k, such that

‖λ_{A^k}(x^k) − z^k‖^2 + (1 − ε^2 c_1) ‖λ_{N^k}(x^k)‖^2 ≤ c_2 ‖d^k‖^2.

Proof: By the definition of the multiplier function λ(x) we have:

M(x^k) ( λ_{A^k}(x^k) − z^k ; λ_{N^k}(x^k) )
  + [ ∇g(x^k)^T ∇g_{A^k}(x^k) + ( G_{A^k}(x^k)^2 ; 0 ) + ( (Σ_{i=1}^m g_i^+(x^k)^3) I_{A^k} ; 0 ) ] z^k
  = −∇g(x^k)^T ∇f(x^k),                                                                               (5.28)

which implies

M(x^k) ( λ_{A^k}(x^k) − z^k ; λ_{N^k}(x^k) )
  = −∇g(x^k)^T ( ∇f(x^k) + ∇g_{A^k}(x^k) z^k )
    + [ G_{A^k}(x^k) Z^k ; 0 ] ∇g_{A^k}(x^k)^T d^k − [ z^k e^T G^+(x^k)^2 ; 0 ] g^+(x^k).             (5.29)

By using (5.29) we obtain:

γ_min(M(x^k))^2 ( ‖λ_{A^k}(x^k) − z^k‖^2 + ‖λ_{N^k}(x^k)‖^2 )
  ≤ ‖∇g(x^k)^T ( ∇f^k + ∇g_{A^k}(x^k) z^k )‖^2 + ‖Q^k_1‖^2 ‖d^k‖^2 + ‖Q^k_2‖^2 ‖g^+(x^k)‖^2,          (5.30)

where γ_min(M(x^k)) is the minimum eigenvalue of the matrix M(x^k) and

Q^k_1 = [ G_{A^k}(x^k) Z^k ; 0 ] ∇g_{A^k}(x^k)^T,    Q^k_2 = [ z^k e^T G^+(x^k)^2 ; 0 ].              (5.31)

By the preceding definitions we have that

‖Q^k_1‖^2 ≤ ‖g(x^k)‖^2 ‖z^k‖^2 ‖∇g(x^k)‖^2,                                                           (5.32)
‖Q^k_2‖^2 ≤ ‖z^k‖^2 ‖g^+(x^k)‖^4.                                                                     (5.33)

Now, using (4.18) and property (5.26) of the horizontal step d^k_o, and omitting the arguments to simplify the notation, we have:

‖(∇g^k)^T ( ∇f^k + ∇g^k_{A^k} z^k )‖^2
  ≤ ‖∇g^k‖^2 ( ‖P^k( ∇f^k + ∇g^k_{A^k} z^k )‖^2 + ‖(I − P^k)( ∇f^k + ∇g^k_{A^k} z^k )‖^2 )
  ≤ ‖∇g^k‖^2 [ ( (1/ρ_1) ‖d^k_o‖ + ‖P^k ∇^2_x L^k d^k_v‖ )^2 + ‖(I − P^k) ∇^2_x L^k d^k‖^2 ].          (5.34)

By recalling the properties of the max function and Eq. (4.19), we have:

‖g^+(x^k)‖^2 ≤ ‖max[g(x^k), −εP(x^k; ε)λ(x^k)]‖^2
             ≤ ‖g_{A^k}(x^k)‖^2 + ε^2 ‖p_{N^k}(x^k; ε)‖^2 ‖λ_{N^k}(x^k)‖^2
             ≤ ‖∇g_{A^k}(x^k)^T d^k‖^2 + ε^2 ‖p_{N^k}(x^k; ε)‖^2 ‖λ_{N^k}(x^k)‖^2.                     (5.35)

By using (5.34) and (5.35) in Eq. (5.30) we obtain

γ_min(M(x^k))^2 ( ‖λ_{A^k}(x^k) − z^k‖^2 + ‖λ_{N^k}(x^k)‖^2 )
  ≤ ‖∇g(x^k)‖^2 [ ( (1/ρ_1) ‖d^k_o‖ + ‖P^k ∇^2_x L(x^k, λ(x^k)) d^k_v‖ )^2
      + ‖(I − P^k) ∇^2_x L(x^k, λ(x^k)) d^k‖^2 ]
    + ‖Q^k_1‖^2 ‖d^k‖^2
    + ‖Q^k_2‖^2 ( ‖∇g_{A^k}^T d^k‖^2 + ε^2 ‖p_{N^k}(x^k; ε)‖^2 ‖λ_{N^k}(x^k)‖^2 ).                     (5.36)

The thesis easily follows from Eq. (5.36), the continuity assumption and the boundedness hypothesis on the sequences {x^k} and {z^k}.

Proposition 5.5. Let ε be any fixed positive scalar and let {x^k} be a bounded sequence.
(a) Suppose that Assumption B1 holds. Let z^k satisfy (4.18) and let d^k be given by (4.15), where d^k_v satisfies (4.19) and d^k_o is computed by means of the PCG algorithm. Assume, moreover, that

    lim_{k→∞} ( ‖d^k‖ + ‖z^k − λ_{A^k}(x^k)‖ + ‖λ_{N^k}(x^k)‖ ) = 0;

    then every accumulation point x̄ of {x^k} is a KKT point.
(b) If every accumulation point x̄ of {x^k} is a KKT point, then eventually the matrices ∇g_{A^k}(x^k) have full rank, the direction d^k and the vector z^k, given by (4.15) and (4.22), respectively, where the components d^k_o, d^k_v verify (4.20), (5.24) and (5.25), can be defined and satisfy the following limit:

    lim_{k→∞} ( ‖d^k‖ + ‖z^k − λ_{A^k}(x^k)‖ + ‖λ_{N^k}(x^k)‖ ) = 0.

Proof:

(a) If x̄ is an accumulation point of the sequence {x^k} of generated points, and if we take into account that the number of subsets of {1, . . . , m} is finite and, hence, so is the number of possible estimates A^k and N^k, we can extract a subsequence (which, without any loss of generality, we rename {x^k}) such that

    x^k → x̄,    A^k = A(x̄),    N^k = N(x̄),    for all k.

    Now, it follows from the conditions of (a) that d^k → 0, z^k → λ̄_A and

    λ̄_N = 0;                                                                 (5.37)

    then, passing to the limit in (4.19) and (5.27) and considering that λ(x^k) → λ̄, we obtain

    g_A(x̄) = 0,                                                              (5.38)
    ∇_x L(x̄, λ̄) = 0.                                                         (5.39)

    Recalling the definitions of the index sets A^k and N^k we get

    g_N(x̄) ≤ 0,                                                              (5.40)
    λ̄_A ≥ 0.                                                                 (5.41)

    Hence, (5.37)–(5.41) imply that the accumulation point x̄ is a KKT point.

(b) Since every accumulation point is a KKT point, by property (4.11) and taking into account Assumption A2 and the continuity assumption, we have that, for k sufficiently large, the matrices ∇g_{A^k}(x^k) have full column rank, that is, eventually Assumption B1 is satisfied. Therefore, we can compute d^k_v and z^k by means of formulas (4.20) and (4.22), respectively. Furthermore, let d^k_o be computed by means of algorithm PCG. We now proceed by contradiction. Let us suppose that

    lim_{k→∞} ( ‖d^k‖ + ‖z^k − λ_{A^k}(x^k)‖ + ‖λ_{N^k}(x^k)‖ ) = 0

    does not hold. If this were the case, we could extract subsequences, which we rename again {x^k}, {d^k} and {z^k}, such that

    lim_{k→∞} x^k = x̄,
    lim_{k→∞} λ(x^k) = λ(x̄) = λ̄,                                             (5.42)
    lim_{k→∞} ( ‖d^k‖ + ‖z^k − λ_{A^k}(x^k)‖ + ‖λ_{N^k}(x^k)‖ ) > 0,

    where (x̄, λ̄) is a KKT pair. Since the number of possibly different estimates A^k and N^k is finite, also in this case we assume, without loss of generality, that A^k = A(x̄) and N^k = N(x̄) for every k. Furthermore, for k sufficiently large, by (4.11) we have:

    A(x̄) ⊆ I_0(x̄),
    N(x̄) ⊆ { i : λ̄_i = 0 }.                                                  (5.43)

    Now, considering that for k sufficiently large the gradients ∇g_i(x^k), i ∈ A(x̄), are linearly independent and recalling expressions (4.20), (4.22) and property (5.25) of the horizontal step d^k_o, we can extract subsequences {d^k} and {z^k} such that d^k → d̄ and z^k → z̄. From (5.43) we get

    g_A(x^k) → 0,    λ_N(x^k) → 0.                                            (5.44)

    This and Eq. (4.20) yield d^k_v → 0. Now, recalling that

    – by hypothesis, x^k → x̄, which is a KKT point,
    – the term P^k( ∇_x L(x^k, λ(x^k)) − ∇g_N(x^k) λ_N(x^k) ) tends to zero, due to (5.44),
    – P^k( ∇f(x^k) ) = P^k( ∇_x L(x^k, λ(x^k)) − ∇g_A(x^k) λ_A(x^k) − ∇g_N(x^k) λ_N(x^k) ) tends to zero,

    we get, using property (5.25) of the horizontal step d^k_o, that also d^k_o tends to zero. Hence, d^k_o + d^k_v = d^k → 0.

    Considering that, by expression (4.16), (I − P^k) ∇g_A(x^k) = ∇g_A(x^k), and taking the limit of the equation

    (I − P^k)( ∇^2_x L^k d^k + ∇g^k_A (z^k − λ^k_A) − ∇g^k_N λ^k_N + ∇_x L^k ) = 0,

    where we have omitted the arguments, we obtain (z^k − λ_A(x^k)) → 0. Therefore, we get

    ‖d^k‖ + ‖z^k − λ_A(x^k)‖ + ‖λ_N(x^k)‖ → 0,

    which contradicts assumption (5.42) above.

The following proposition finally relates the truncated Newton-type direction to the merit function Z(x; ε).

Proposition 5.6. Suppose that Assumption B1 holds and let {ε_k}, {x^k}, {d^k}, {z^k} be sequences such that:

(i) lim_{k→∞} ε_k = 0;
(ii) x^k ∈ Ω(x̄; ε_k);
(iii) for every k, d^k and z^k are given by (4.15) and (4.22), respectively, where d^k_v is given by (4.20) and d^k_o is computed by means of algorithm PCG;
(iv) the sequences {d^k}, {z^k} are bounded.

Then k̄ and γ > 0 exist such that for every k ≥ k̄ we have:

∇Z(x^k; ε_k)^T d^k ≤ −γ ( ‖d^k‖^2 + ‖z^k − λ_{A^k}(x^k)‖^2 + ‖λ_{N^k}(x^k)‖^2 ).

Proof: First of all we note that by Proposition 3.2(a) and assumption (ii) we have that the sequence {x^k} is bounded. Then the proof follows by contradiction. Therefore, assume there exist sequences {x^k}, {ε_k}, {γ^k}, {d^k} and {z^k} such that

x^k → x̄,    ε_k ↓ 0,    γ^k ↓ 0,
∇Z(x^k; ε_k)^T d^k > −γ^k ( ‖d^k‖^2 + ‖z^k − λ_{A^k}(x^k)‖^2 + ‖λ_{N^k}(x^k)‖^2 ).                       (5.45)

Furthermore, if we take into account that the number of possible different estimates A^k and N^k is finite, we can extract subsequences, that without loss of generality we rename {x^k}, {ε_k}, {γ^k}, {d^k} and {z^k}, such that

A^k = A(x̄),    N^k = N(x̄),    for all k.

Now from (3.7) and (4.13), omitting the arguments to simplify the notation, we have

(∇Z^k)^T d^k = [ P^k( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k + [ (I − P^k)( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k
    + (λ^k_N)^T (∇g^k_N)^T d^k
    + Σ_{i∈A} [ (g^k_i / (ε_k p^k_i)) (∇g^k_i)^T d^k + g^k_i (∇λ^k_i)^T d^k − ((g^k_i)^2 / (2 ε_k (p^k_i)^2)) (∇p^k_i)^T d^k ]
    + Σ_{i∈N} [ −λ^k_i (∇g^k_i)^T d^k − ε_k p^k_i λ^k_i (∇λ^k_i)^T d^k + (ε_k/2) (λ^k_i)^2 (∇p^k_i)^T d^k ]
  = [ P^k( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k + [ (I − P^k)( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k
    + Σ_{i∈A} [ (g^k_i / (ε_k p^k_i)) (∇g^k_i)^T d^k + g^k_i (∇λ^k_i)^T d^k − ((g^k_i)^2 / (2 ε_k (p^k_i)^2)) (∇p^k_i)^T d^k ]
    + Σ_{i∈N} [ −ε_k p^k_i λ^k_i (∇λ^k_i)^T d^k + (ε_k/2) (λ^k_i)^2 (∇p^k_i)^T d^k ].                    (5.46)

By Proposition 5.1, d^k_o satisfies (5.24) and (5.25), hence we get

(d^k_o)^T P^k( ∇^2_x L^k d^k_v + ∇f^k ) ≤ −(ρ_1/ρ_2^2) ‖d^k_o‖^2,

which, by taking into account that P^k ∇g^k_A λ^k_A = 0 and P^k d^k_o = P^k d^k, is equivalent to

[ P^k( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k ≤ −(ρ_1/ρ_2^2) ‖d^k_o‖^2 − (d^k_o)^T P^k ∇^2_x L^k d^k_v.

Hence, we obtain

[ P^k( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k ≤ −(ρ_1/ρ_2^2) ‖d^k_o‖^2 + ‖P^k‖ ‖∇^2_x L^k‖ ‖d^k_o‖ ‖d^k_v‖.       (5.47)

Moreover, by adding and subtracting ∇g^k_A λ^k_A in (4.18) we get

(I − P^k)( ∇f^k + ∇g^k_A λ^k_A ) = −(I − P^k)( ∇^2_x L^k d^k + ∇g^k_A (z^k − λ^k_A) ).                   (5.48)

Now, by considering that (I − P^k) d^k = (I − P^k) d^k_v and that, by Proposition 5.4, ‖z^k − λ^k_A‖ ≤ c_2 ‖d^k‖, from (5.48) we obtain

[ (I − P^k)( ∇f^k + ∇g^k_A λ^k_A ) ]^T d^k ≤ ‖d^k_v‖ ‖I − P^k‖ ( ‖∇^2_x L^k‖ + c_2 ‖∇g^k_A‖ ) ‖d^k‖.     (5.49)

By recalling (3.8), we have for all i = 1, . . . , m:

−(1 / (2 ε_k p_i(x^k; ε_k)^2)) ∇p_i(x^k; ε_k)^T d^k
  = 2 λ_i(x^k) ∇λ_i(x^k)^T d^k + (3 / (2 ε_k a(x^k)^2)) Σ_{j: g_j(x^k)>0} g_j(x^k)^2 ∇g_j(x^k)^T d^k,    (5.50)

and, since for sufficiently small values of ε_k it holds that

{ j : g_j(x^k) > 0 } ⊆ A(x^k),

we have that, for sufficiently large values of k, (5.50) becomes

−(∇p_i(x^k; ε_k)^T d^k) / (2 ε_k p_i(x^k; ε_k)^2)
  = −Σ_{j: g_j(x^k)>0} (3 g_j(x^k)^3) / (2 ε_k a(x^k)^2) + 2 λ_i(x^k) ∇λ_i(x^k)^T d^k
  ≤ 2 λ_i(x^k) ∇λ_i(x^k)^T d^k,    i = 1, . . . , m.                                                      (5.51)

From (5.45), (5.46), (5.47), (5.49) and (5.50) above, we obtain

0 < γ^k ( ‖d^k‖^2 + ‖z^k − λ_{A^k}(x^k)‖^2 + ‖λ_{N^k}(x^k)‖^2 )
    − (ρ_1/ρ_2^2) ‖d^k_o‖^2 + ‖P^k‖ ‖∇^2_x L^k‖ ‖d^k_o‖ ‖d^k_v‖ − (1/ε_k) (g^k_A)^T (P^k_A)^{-1} g^k_A
    + ‖I − P^k‖ ( ‖∇^2_x L^k‖ + c_2 ‖∇g^k_A‖ ) ‖d^k_v‖^2 + ε_k ‖λ^k_N‖ ‖p^k_N‖ ‖∇λ^k_N‖ ‖d^k‖
    + ‖I − P^k‖ ( ‖∇^2_x L^k‖ + c_2 ‖∇g^k_A‖ ) ‖d^k_v‖ ‖d^k_o‖ + ‖∇λ^k_A‖ ‖d^k‖ ‖g^k_A‖
    + (ε_k/2) ‖λ^k_N‖^2 ‖∇p^k_N‖ ‖d^k‖ + 2 ‖g^k_A‖^2 ‖λ^k_A‖ ‖∇λ^k_A‖ ‖d^k‖.                             (5.52)

Now, taking limits in (5.52) and taking into account the boundedness of the sequences {x^k}, {d^k} and {1/p_i(x^k; ε_k)}, considering that the sequence {ε_k} goes to zero, and by using the continuity assumption on λ(x), ∇λ(x), p_i(x; ε) and ∇p_i(x; ε), we can derive that:

lim_{k→∞} g_A(x^k) = g_A(x̄) = 0.                                                                        (5.53)

This limit shows that the index set A(x̄) is such that:

A(x̄) ⊆ I_0(x̄).                                                                                          (5.54)

Recalling the definition of the index set N(x^k) and the boundedness of the sequence {x^k} we obtain:

lim_{k→∞} g_N(x^k) = g_N(x̄) ≤ 0.                                                                        (5.55)

Hence, (5.53) and (5.55) together imply that x̄ ∈ F.

Applying Proposition 5.4, we know that a constant c_2 exists such that

‖z^k − λ^k_A‖ ≤ c_2 ‖d^k‖,    ‖λ^k_N‖ ≤ c_2 ‖d^k‖.

Taking into account the above inequalities, we can thus rewrite (5.52) in such a way as to have

0 < γ^k c ‖d^k‖^2 − (ρ_1/ρ_2^2) ‖d^k_o‖^2 + ‖P^k‖ ‖∇^2_x L^k‖ ‖d^k_o‖ ‖d^k_v‖ + ‖∇λ^k_A‖ ‖d^k‖ ‖g^k_A‖
    + ( ‖I − P^k‖ ( ‖∇^2_x L^k‖ + c_2 ‖∇g^k_A‖ ) ‖d^k_v‖ ) ( ‖d^k_v‖ + ‖d^k_o‖ )
    − (1/ε_k) (g^k_A)^T (P^k_A)^{-1} g^k_A + ε_k c_2 ‖p^k_N‖ ‖∇λ^k_N‖ ‖d^k‖^2
    + (ε_k/2) c_2 ‖λ^k_N‖ ‖∇p^k_N‖ ‖d^k‖^2 + 2 ‖g^k_A‖^2 ‖λ^k_A‖ ‖∇λ^k_A‖ ‖d^k‖.

Considering that the above formula can be rewritten in the following way

0 < γ^k c ‖d^k‖^2 − (ρ_1/ρ_2^2) ‖d^k_o‖^2 + ‖I − P^k‖ ( ‖∇^2_x L^k‖ + c_2 ‖∇g^k_A‖ ) ‖d^k_v‖^2
    + [ ( ‖P^k‖ + ‖I − P^k‖ ) ‖∇^2_x L^k‖ + c_2 ‖I − P^k‖ ‖∇g^k_A‖ ] ‖d^k_o‖ ‖d^k_v‖
    − (1/ε_k) (g^k_A)^T (P^k_A)^{-1} g^k_A + c_3 ‖g^k_A‖ ‖d^k‖,

and remembering that d^k_v is given by Eq. (4.20), which relates it to g_{A^k}, we can write

0 < −( ρ_1/ρ_2^2 − γ^k c ) ‖d^k_o‖^2 + 2 c_4 ‖d^k_o‖ ‖g^k_A‖ − ( c_5/ε_k − c_6 ) ‖g^k_A‖^2,

which, for k sufficiently large, leads to a contradiction.

Our last task is showing that the search direction d^k defined in the previous section brings local superlinear convergence. To this end we note that system (4.13) is the same system considered in [20]. Theorem 3.1 in [20] then implies that, under standard assumptions, the iteration x^{k+1} = x^k + d^k is locally superlinearly convergent in a neighborhood of a KKT point x̄ of Problem (P) if d^k is the exact solution of system (4.13).

In the next proposition we analyze the effect of solving system (4.13) inexactly. The proposition is an extension to the constrained framework of classical results on truncated Newton methods (see [12] and [25]). In fact, the procedure proposed in the previous section to calculate the solution of (4.13) simply amounts to solving the system

[ ∇^2_x L^k          ∇g^k_{A^k} ] [ d^k              ]      [ ∇f^k + ∇g^k_{A^k} λ^k_{A^k} ]   [ r^k ]
[ (∇g^k_{A^k})^T     0          ] [ z^k − λ^k_{A^k}   ]  = − [ g^k_{A^k}                  ] + [ 0   ]      (5.56)

instead of (4.13), where we have omitted the arguments and

r^k = P^k( ∇^2_x L(x^k, λ(x^k)) d^k + ∇f(x^k) )

is the error made when approximately solving the first n rows of system (4.13).

For the sake of clarity, a distinction is made between the actual (projected) residual r^k and the residual r̄^{i*} returned by algorithm PCG at its final iteration i*. In fact, the above two residuals may be different and coincide only if algorithm PCG exits at Step 4 setting d^k_o = t^{i*}. With all this in mind we can now proceed to prove our last result.


Proposition 5.7. Let (x̄, λ̄) be a KKT pair for Problem (P) which satisfies the Strong Second Order Sufficient Condition (SSOSC), that is, such that

w^T ∇^2_x L(x̄, λ̄) w > 0,    ∀w ≠ 0 : ∇g_{I_+}(x̄)^T w = 0.

Furthermore, let {x^k} be a sequence converging to x̄ and let {d^k} be the sequence given by the inexact (d-part) solution of system (4.14) obtained by applying algorithm PCG (and therefore satisfying system (5.56)). Then, a sufficiently small positive constant c̄_1 (see PCG) exists such that, if c_1 ≤ c̄_1,

lim_{k→∞} ‖x^k + d^k − x̄‖ / ‖x^k − x̄‖ = 0.

Proof: For the sake of clarity, we divide the proof into three steps. In the first one (a) we establish, as a simple consequence of the (SSOSC), that algorithm PCG always terminates at Step 4 setting d^k_o = t = t^{i*} and that the norm of the residual ‖r^k‖ is smaller than tol_k.

In the second, crucial step (b) we show that the residual r^k goes to zero faster than the r.h.s. of system (4.14), i.e. that

lim_{k→∞} ‖r^k‖ / ‖( ∇f(x^k) + ∇g_{A^k}(x^k) λ_{A^k}(x^k) ; g_{A^k}(x^k) )‖ = 0.                         (5.57)

The assertion of the proposition will then follow from standard arguments; this is part (c).

Step (a)

First of all we prove that, if (x̄, λ̄) satisfies the (SSOSC), then a sufficiently small neighborhood of x̄ exists such that

w^T ∇^2_x L(x, λ(x)) w > 0,    ∀w ≠ 0 : ∇g_{A(x)}(x)^T w = 0.

Well-known properties of quadratic forms (see, e.g., [1]) and the (SSOSC) imply that there exists a positive constant η such that the matrix

∇^2_x L(x̄, λ̄) + η ∇g_{I_+}(x̄) ∇g_{I_+}(x̄)^T

is positive definite. By continuity, there exists a neighborhood Ω_1 of x̄ such that the matrix

∇^2_x L(x, λ(x)) + η ∇g_{I_+}(x) ∇g_{I_+}(x)^T

is positive definite for all x ∈ Ω_1. This implies (see, e.g., [1]) that, for all x ∈ Ω_1,

w^T ∇^2_x L(x, λ(x)) w > 0,    ∀w ≠ 0 : ∇g_{I_+}(x)^T w = 0.

Recalling property (4.11), we can find a neighborhood Ω_2 ⊆ Ω_1 such that

I_+(x̄, λ̄) ⊆ A(x) ⊆ I_0(x̄),    for all x ∈ Ω_2.

Therefore, we obtain that, for all x ∈ Ω_2,

w^T ∇^2_x L(x, λ(x)) w > 0,    ∀w ≠ 0 : ∇g_{A(x)}(x)^T w = 0.

By the above property and noting that all the vectors s^i generated by the PCG algorithm are such that ∇g_{A^k}(x^k)^T s^i = 0, it follows that a positive constant c̄_1 exists such that, if c_1 ≤ c̄_1, then the PCG algorithm never stops at Step 1.

Then, standard arguments on conjugate gradient methods imply that eventually the test at Step 3 of algorithm PCG, namely ‖r̄^{i+1}‖ > tol_k, will not be satisfied. That is, an iteration index i* exists such that algorithm PCG will stop with

‖r̄^{i*}‖ < tol_k.

Now we prove that r^k = r̄^{i*}. To this end, we show that, under the (SSOSC), d^k_o is set equal to t^{i*}. Indeed, it results that

−(r^0)^T t^{i*} = −(r^0)^T Σ_{h=0}^{i*−1} ρ^h s^h.

On the other hand, well-known properties of preconditioned conjugate gradient methods and the properties of the projection operator P^k yield

– (r^0)^T r̄^h = 0 for all h = 1, 2, . . . , i* − 1, and
– (r^0)^T s^h = (r̄^h)^T r̄^h = ‖r̄^h‖^2, for all h = 0, 1, . . . , i* − 1.

Hence, by the (SSOSC), we obtain

−(r^0)^T t^{i*} = −Σ_{h=0}^{i*−1} ( ‖r̄^h‖^2 ‖r̄^h‖^2 ) / ( (s^h)^T ∇^2_x L^k s^h ) ≤ 0,

so that t = t^{i*}. Moreover, again by the (SSOSC), (s^h)^T ∇^2_x L^k s^h > c_1 ‖s^h‖^2 for all h = 0, 1, . . . , i* − 1 and c_1 ≤ c̄_1, that is, no negative curvature direction is encountered during the iterations of the PCG algorithm. Thus, we have θ^{i*} = t^{i*} and algorithm PCG will produce d^k_o = t = t^{i*} = θ^{i*}.

Thus we can conclude that r̄^{i*} = r^k and

‖r^k‖ = ‖P^k( ∇^2_x L(x^k, λ(x^k)) d^k + ∇f(x^k) )‖ ≤ tol_k.

Step (b)

By noting that the stopping test tolerance of algorithm PCG is set equal to

tol_k = (1/k) ‖P^k( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) )‖,

it obviously results that

lim_{k→∞} ‖r^k‖ / ‖P^k( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) )‖ = 0.

In order to prove (5.57), we show that

‖P^k( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) )‖ / ‖K(x^k)‖ ≤ c    for all k,                               (5.58)

where c is a positive constant and K(x^k) is the r.h.s. of system (4.14), that is,

K(x^k) = ( ∇f(x^k) + ∇g_{A^k}(x^k) λ_{A^k}(x^k) ; g_{A^k}(x^k) ).

By considering that P^k( ∇g_{A^k}(x^k) λ_{A^k}(x^k) ) = 0, we have

‖P^k( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) )‖ / ‖K(x^k)‖
  ≤ ‖P^k‖ ( ‖∇^2_x L(x^k, λ(x^k)) d^k_v‖ + ‖K(x^k)‖ ) / ‖K(x^k)‖.

Then, for k sufficiently large and by Assumption A2, the vertical step d^k_v is given by (4.20), so that

‖P^k( ∇^2_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k) )‖ / ‖K(x^k)‖
  ≤ ‖P^k‖ [ 1 + ‖∇^2_x L(x^k, λ(x^k)) ∇g_{A^k}(x^k) (∇g_{A^k}(x^k)^T ∇g_{A^k}(x^k))^{-1} g_{A^k}(x^k)‖ / ‖K(x^k)‖ ]
  ≤ ‖P^k‖ [ 1 + ‖∇^2_x L(x^k, λ(x^k)) ∇g_{A^k}(x^k) (∇g_{A^k}(x^k)^T ∇g_{A^k}(x^k))^{-1}‖ ‖g_{A^k}(x^k)‖ / ‖g_{A^k}(x^k)‖ ].

This last inequality, along with the continuity assumption, proves relation (5.58) and therefore (5.57).

Step (c)

Reasoning as in Theorem 3.1 of Ref. [20] and taking into account that system (4.14) is solved with residual (r^k; 0), it is possible to show that positive constants µ, L, M_1 and M_2 exist such that

‖x^k + d^k − x̄‖ ≤ µ [ L ‖x^k − x̄‖^2 + M_1 ‖λ(x^k) − λ(x̄)‖ ‖x^k − x̄‖ + M_1 ‖x^k − x̄‖^2 + ‖r^k‖ ],       (5.59a)
‖K(x^k)‖ = ‖K(x^k) − K(x̄)‖ ≤ M_2 ‖x^k − x̄‖.                                                             (5.59b)

Hence, from (5.59), we obtain

‖x^k + d^k − x̄‖ / ‖x^k − x̄‖ ≤ µ [ (L + M_2) ‖x^k − x̄‖ + M_1 ‖λ(x^k) − λ(x̄)‖ + M_2 ‖r^k‖ / ‖K(x^k)‖ ].

The above relation concludes the proof.

The above proposition states that the truncated direction d^k is a Newton-type direction in the sense that, at least locally and under the required assumptions, it is able to guarantee a superlinear convergence rate.

6. The complete algorithm

In this section we describe the complete algorithm.

6.1. Updating the penalty parameter ε

Appropriate updating of the penalty parameter ε is crucial both to the theoretical convergence properties of the algorithm and to its practical efficiency. At each iteration we test for the appropriateness of the value of ε just after calculating the search direction. According to which direction has been chosen, the truncated Newton-type (NT) direction or the antigradient (AG) of the merit function, we use a different, simple test to decide whether to reduce the penalty parameter value or not.

We first describe the test for the case in which the NT direction has been selected. So, let (d^k, z^k) be the solution of system (4.13); we decrease the value of ε if the following inequality is not satisfied:

∇Z(x^k; ε)^T d^k ≤ −ε Δ^k,                                                    (6.60)

where Δ^k is defined by

Δ^k = ‖d^k‖^2 + ‖z^k − λ_{A^k}(x^k)‖^2 + ‖λ_{N^k}(x^k)‖^2.                    (6.61)

The technical reasons for this choice are rather involved, but the basic ideas behind it can be explained as follows. Suppose for simplicity that the NT direction d^k is always used. It is not difficult to prove [20] that if Δ^k goes to 0, every limit point of the sequence of points produced by the algorithm is a KKT point of Problem (P). Furthermore, it is possible to show that, after a finite number of reductions of ε, test (6.60) is always satisfied. This means that the value of ε eventually stays fixed, so that the algorithm basically reduces to the unconstrained minimization of the merit function for a fixed value of the penalty parameter. Therefore, inequality (6.60) implies on the one hand that d^k is a direction of sufficient decrease for the merit function Z(x; ε) while, on the other hand, it also guarantees that Δ^k is driven to 0, so that every limit point of the sequence of points produced is a KKT point of Problem (P).
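A direct transcription of test (6.60)–(6.61) for the NT direction is sketched below; all vectors are assumed to be NumPy arrays computed elsewhere, so this is only an illustration of the test, not the authors' code.

```python
import numpy as np

def penalty_test_nt(grad_Z, d, z, lam_A, lam_N, eps):
    """Return True if test (6.60) with Delta^k from (6.61) holds for the NT direction.
    grad_Z = grad Z(x^k; eps); (d, z) is the truncated Newton solution;
    lam_A, lam_N are lambda(x^k) restricted to A^k and to its complement N^k."""
    delta = d @ d + (z - lam_A) @ (z - lam_A) + lam_N @ lam_N
    return grad_Z @ d <= -eps * delta      # if False, reduce eps and recompute the direction
```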


If the direction used is the antigradient (AG), we still use test (6.60) to decide whether to reduce ε or not; however, Δ^k is no longer given by (6.61), but by

Δ^k = Σ_{i=1}^m max[g_i(x^k), −ε p_i(x^k; ε) λ_i(x^k)]^2.

The different definition of Δ^k is justified by the different direction selected; however, the rationale behind test (6.60) is the same we just described for the NT search direction.

There is another point which we need to discuss at this juncture. Suppose that at iteration k we reduce the value of the penalty parameter from ε to, say, ε/2. Since the iterative process we are performing is a descent process, we have Z(x^{k−1}; ε) > Z(x^k; ε); however, after the reduction of ε, there is no guarantee that Z(x^{k−1}; ε/2) > Z(x^k; ε/2). Since reductions of ε can occur only a finite number of times, this is not a serious threat to convergence; however, if Z(x^{k−1}; ε/2) ≤ Z(x^k; ε/2) one could wonder whether it is wiser to continue the minimization process from x^{k−1} rather than from x^k. More generally, after a reduction of ε has occurred, one might wish to restart the minimization process from the point x^i, among those already generated, which gives the lowest value Z(x^i; ε/2). On the other hand, one should also be careful not to throw away too easily all the information gathered by the minimization process up to that moment. Our choice is then the following. When a reduction of ε occurs we compare the value of Z(x^k; ε/2) with Z(x⁰; ε/2) and continue the process from whichever point gives the lowest merit function value. This seems reasonable: if Z(x^k; ε/2) > Z(x⁰; ε/2), it really means that the whole minimization process up to that point did not even lead to an improvement with respect to the starting point, so it seems likely that, because of the wrong value of ε, we were wandering away from KKT points.
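To fix ideas, the following minimal Python sketch combines the penalty test and the restart rule just described. All names are hypothetical placeholders (Z evaluates the merit function, grad_Z_val is its gradient at x^k, delta_k is the quantity Δ_k computed from (6.61) or from its AG counterpart, x_start is the starting point x⁰), and ε is reduced simply by the factor ρ used in the algorithm below.

```python
import numpy as np

def penalty_update(Z, grad_Z_val, p, delta_k, x_k, x_start, eps, rho=0.1):
    """Sketch of the penalty test and restart rule of Section 6.1.

    Z          : callable evaluating the merit function, Z(x, eps)
    grad_Z_val : gradient of Z(.; eps) at the current iterate x_k
    p          : current search direction (NT or AG)
    delta_k    : Delta_k from (6.61) or from its AG counterpart
    x_start    : the starting point x^0
    """
    if np.dot(grad_Z_val, p) <= -eps * delta_k:       # test (6.60)/(6.64) satisfied
        return eps, x_k
    eps_new = rho * eps                               # reduce the penalty parameter
    # Restart rule: continue from whichever of x^0 and x^k has the lower merit value.
    x_next = x_start if Z(x_start, eps_new) <= Z(x_k, eps_new) else x_k
    return eps_new, x_next
```

In the actual algorithm the reduced value may be any number in (0, ρε); the fixed factor above is only a concrete choice for illustration.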

6.2. Calculation of the steplength αk

We use the following Armijo line search procedure:

Z(x^k + α_k p^k; ε) ≤ Z(x^k; ε) + γ α_k ∇Z(x^k; ε)ᵀ p^k,    (6.62)

where p^k is a suitable search direction. However, this procedure has to be slightly modified to take into account that the merit function Z is defined only on the open set S. Since Z goes to infinity on the boundary of S, it is well known and easily verified that it is sufficient to select the largest α_k in {1, 1/2, 1/4, . . .} such that x^k + α_k p^k belongs to S and (6.62) is satisfied.

We remark that the above (monotone) line search procedure may be replaced by any line search technique (for instance, a nonmonotone line search) that guarantees the property

lim_{k→∞} ∇Z(x^k; ε)ᵀ p^k = 0,

which is used when proving the global convergence property of the complete algorithm below. A minimal sketch of the backtracking step is given next; the complete optimization algorithm then follows.
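The sketch below is written in Python; Z(x, eps), grad_Z(x, eps) and in_S(x) are hypothetical callables for the merit function, its gradient and membership in the open set S, and the default values of γ and σ follow the choices reported in Section 7.

```python
import numpy as np

def armijo_step(Z, grad_Z, in_S, x, p, eps, gamma=1e-3, sigma=0.5, max_backtracks=60):
    """Largest alpha in {1, sigma, sigma^2, ...} with x + alpha*p in S and (6.62) satisfied."""
    slope = np.dot(grad_Z(x, eps), p)      # directional derivative; negative for a descent direction
    z0 = Z(x, eps)
    alpha = 1.0
    for _ in range(max_backtracks):
        trial = x + alpha * p
        if in_S(trial) and Z(trial, eps) <= z0 + gamma * alpha * slope:
            return alpha, trial
        alpha *= sigma                     # backtrack: alpha = sigma^h
    raise RuntimeError("line search step became too small")
```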


Algorithm model TNA

Data: x̄ ∈ Rⁿ, ε⁰ > 0, δ > 0, γ ∈ (0, 1/2), σ ∈ (0, 1), ρ ∈ (0, 1), and a positive α such that x̄ ∈ S.

Step 1: Initialization. Set k = 0, x⁰ = x̄, ε = ε⁰.

Step 2: Test for convergence. If x^k is a KKT point then STOP, else go to Step 3.

Step 3: Choice of the search direction. If system (4.13) has a truncated Newton solution (d^k, z^k) and this solution is such that

‖d^k‖ + ‖λ_{A^k}(x^k) − z^k‖ ≤ δ,    (6.63)

then (NT) set p^k = d^k and Δ_k = ‖d^k‖² + ‖λ_{A^k}(x^k) − z^k‖² + ‖λ_{N^k}(x^k)‖²;
else (AG) set p^k = −∇Z(x^k; ε) and Δ_k = ∑_{i=1}^m max[ g_i(x^k), −ε p_i(x^k; ε) λ_i(x^k) ]².

Step 4: Test on the penalty parameter. If

∇Z(x^k; ε)ᵀ p^k > −ε Δ_k,    (6.64)

then choose ε ∈ (0, ρε);
    if Z(x⁰; ε) ≤ Z(x^k; ε) then set x^{k+1} = x⁰, else set x^{k+1} = x^k;
    set k = k + 1 and go to Step 3;
else go to Step 5.

Step 5: Line search. Compute α_k = σ^h, where h is the smallest nonnegative integer such that

x^k + α_k p^k ∈ S,
Z(x^k + α_k p^k; ε) ≤ Z(x^k; ε) + γ α_k ∇Z(x^k; ε)ᵀ p^k;

set x^{k+1} = x^k + α_k p^k, k = k + 1 and go to Step 2.

The only partially new element in algorithm TNA is test (6.63). In Section 2.2 we said that if the truncated Newton-type (NT) direction d^k exists and is "suitable" in some sense we use it. Test (6.63) embodies our notion of suitability. Note that if the sequence {x^k} is converging to a KKT point, test (6.63) will eventually always be satisfied, since in this case d^k converges to 0 and z^k to λ_{A^k}(x^k), as stated by Proposition 5.5. On the other hand if, when far from a KKT point, d^k becomes overly large or z^k is too far from the estimate λ_{A^k}(x^k), then we expect the direction d^k to be no longer reliable. This phenomenon is similar to what happens in Newton's method for unconstrained optimization, where it is well known that an excessively large direction is often evidence of a pathological situation.
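As a sketch of Step 3, the Python fragment below (again with hypothetical helper names; active is a boolean mask of the estimated active set A^k) accepts the truncated Newton pair only when test (6.63) holds and otherwise falls back to the antigradient, returning the corresponding Δ_k in either case:

```python
import numpy as np

def choose_direction(nt_solution, grad_Z_val, g_vals, p_vals, lam_vals, active, eps, delta=1e6):
    """Sketch of Step 3: accept the truncated Newton pair only if test (6.63) holds.

    nt_solution : (d, z) solving system (4.13), or None if no truncated solution exists
    grad_Z_val  : gradient of the merit function Z(.; eps) at x^k
    g_vals, p_vals, lam_vals : g_i(x^k), p_i(x^k; eps), lambda_i(x^k) for i = 1, ..., m
    active      : boolean mask of the estimated active set A^k
    """
    lam_active, lam_inactive = lam_vals[active], lam_vals[~active]
    if nt_solution is not None:
        d, z = nt_solution
        if np.linalg.norm(d) + np.linalg.norm(lam_active - z) <= delta:      # test (6.63)
            delta_k = (d @ d + (lam_active - z) @ (lam_active - z)
                       + lam_inactive @ lam_inactive)                        # Delta_k as in (6.61)
            return d, delta_k                                                # NT direction
    delta_k = np.sum(np.maximum(g_vals, -eps * p_vals * lam_vals) ** 2)      # AG form of Delta_k
    return -grad_Z_val, delta_k                                              # AG direction
```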


We are now ready to state the main theorem of this paper.

Theorem 6.1. Algorithm TNA is well defined. Let {x^k}, {p^k}, {z^k} be the sequences produced by the algorithm. Then
(a) the sequence {x^k} is bounded;
(b) after a finite number of steps the penalty parameter is no longer updated, i.e. the penalty parameter test (6.64) is eventually never satisfied;
(c) every accumulation point x̄ of the sequence {x^k} is a KKT point;
(d) if x̄ is an accumulation point of the sequence {x^k} which is an isolated KKT point, then the whole sequence {x^k} converges to x̄;
(e) if the sequence {x^k} is converging to a point x̄, then eventually the truncated Newton-type direction is taken as search direction;
(f) a positive constant c̄₁ exists such that, if c₁ ≤ c̄₁ in algorithm PCG and if the sequence {x^k} converges to a point x̄ such that (x̄, λ(x̄)) satisfies the Strong Second Order Sufficient Condition (SSOSC), namely,

wᵀ ∇²_x L(x̄, λ(x̄)) w > 0,   ∀ w ≠ 0 : ∇g_{I⁺}(x̄)ᵀ w = 0,

then eventually α_k = 1 and the rate of convergence is q-superlinear.

Proof:

(a) We note that, by the instructions of Step 4, every time we update ε the point x⁰ belongs to the level set of Z(·; ε) determined by the starting point. By this fact and by the instructions of Step 5, we also have that every x^k belongs to this level set. Then, from Proposition 3.2(a), the whole sequence {x^k} is contained in a compact set, so that it is bounded.

(b) The proof of this point is by contradiction. Therefore, suppose the assertion is false. Then, there exist sequences {x^j}, {ε^j}, {p^j}, and {Δ_j} such that

ε^j ↓ 0,    (6.65)

x^j → x̄ (because of point (a)),

∇Z(x^j; ε^j)ᵀ p^j > −ε^j Δ_j (because (6.64) is satisfied).    (6.66)

The sequences {x^j}, {p^j} and {Δ_j} are the subsequences of the corresponding quantities at the iterations in which the penalty parameter is updated, while {ε^j} is the sequence of corresponding values of the penalty parameter.

Let us first consider the case in which p^j = −∇Z(x^j; ε^j) (AG) for an infinite number of indices j. We can actually assume, without loss of generality, that p^j = −∇Z(x^j; ε^j) for every j. Then, by (6.66) we have that

‖∇Z(x^j; ε^j)‖² ≤ ε^j Δ_j.    (6.67)

By the fact that the sequence {x^j} is contained in a compact set and by the continuity assumption, we have that {Δ_j} is a bounded sequence as well. Hence, by (6.65) we get that ‖∇Z(x^j; ε^j)‖ tends to 0 so that, by Proposition 3.2(b), we have x̄ ∈ F. But then,


by Proposition 3.2(c) and by the definition of Δ_j, we get a contradiction to (6.67) if we prove that

(ε^j)² ‖∇g(x^j)ᵀ ∇Z(x^j; ε^j)‖² ≤ ‖∇Z(x^j; ε^j)‖².

But this last inequality is eventually true by continuity and by (6.65).

Therefore we can assume, without loss of generality, that p^j is the truncated Newton-type (NT) direction for every j (p^j = d^j). In this case Proposition 5.6 holds, because x^j belongs to a compact set for every j and d^j and z^j satisfy (6.63) for every j. Recalling that in this case

Δ_j = ‖d^j‖² + ‖λ_{A^j}(x^j) − z^j‖² + ‖λ_{N^j}(x^j)‖²,

we get a contradiction between Proposition 5.6 and (6.65)–(6.66); this completes the proof of point (b).

(c) Let {x^k}_K be a subsequence of {x^k} converging to x̄. We want to show that x̄ is a KKT point. By point (b) we can assume, without loss of generality, that the penalty parameter is fixed at a value ε. Since the sequence {x^k} is contained in a compact set on which Z(·; ε) is bounded from below, it is standard to show that the Armijo line search technique used in Step 5 implies

lim_{k→∞, k∈K} α_k ∇Z(x^k; ε)ᵀ p^k = 0.    (6.68)

We now show that

lim_{k→∞, k∈K} ∇Z(x^k; ε)ᵀ p^k = 0.    (6.69)

If (6.69) were not true, we could assume, renumbering if necessary, and taking into account (6.64) and the continuity of ∇Z(·; ε) on the compact set, that

lim_{k→∞, k∈K} ∇Z(x^k; ε)ᵀ p^k < 0.    (6.70)

On the other hand, (6.68) implies that α_k goes to 0 which, in turn, because of Proposition 3.2(a) and the rules of Step 5, implies that, for k sufficiently large,

x^k + (α_k/σ) p^k ∈ S,

Z(x^k + (α_k/σ) p^k; ε) − Z(x^k; ε) > γ (α_k/σ) ∇Z(x^k; ε)ᵀ p^k.    (6.71)

Now, if we divide (6.71) by α_k/σ and pass to the limit for k ∈ K, we obtain a contradiction to (6.70) since γ ∈ (0, 1/2). So (6.69) is true.

In turn, since (6.64) is never satisfied, we have, eventually, that

∇Z(x^k; ε)ᵀ p^k ≤ −ε Δ_k,

so that (6.69) implies

lim_{k→∞, k∈K} Δ_k = 0.    (6.72)


Now point (c) follows from the definition of Δ_k and Proposition 5.5(a) if, for an infinite number of k ∈ K, p^k = d^k, while it follows from the definition of Δ_k and Proposition 4.2(a) of [11] if, for an infinite number of k ∈ K, p^k = −∇Z(x^k; ε).

(d) We observe that, by the instructions of Step 3 and Step 5 and by point (a) of this theorem, we have

lim_{k→∞} α_k p^k = 0,

from which we obtain

lim_{k→∞} ‖x^{k+1} − x^k‖ = 0.    (6.73)

Now, point (d) follows from (6.73) and from well known results (see, for example, page 478 of [32]).

(e) We prove this point by contradiction. Assume that the assertion is false. Then there exists a converging subsequence {x^k}_K such that p^k = −∇Z(x^k; ε) for all k ∈ K. Point (c) yields that the limit x̄ of {x^k}_K is a KKT point. Then, Assumption A2, relation (4.11) and point (b) of Proposition 5.5 guarantee that, for sufficiently large k ∈ K, a truncated solution (d^k, z^k) of system (4.13) is defined and ‖d^k‖ + ‖z^k − λ_{A^k}(x^k)‖ ≤ δ. Therefore we obtain the contradiction that, for such values of k, the test at Step 3 would be satisfied and yet the algorithm would not set p^k = d^k, that is, the truncated direction.

(f) By point (e) we know that, eventually, the truncated Newton-type direction d^k is always accepted as search direction. Point (b) of this theorem guarantees that the penalty parameter test (6.64) is eventually never satisfied, that is, for all k sufficiently large we have

∇Z(x^k; ε)ᵀ d^k ≤ −ε Δ_k.

Hence, the definition of Δ_k, the fact that Z(·; ε) is an SC¹ function and Proposition 5.7 show that we can apply Theorem 3.3 of [18], which guarantees that, eventually, the unit stepsize (α_k = 1) is always accepted at Step 5 of algorithm TNA. This fact and, again, Proposition 5.7 conclude the proof of point (f).

Remark 1. We point out that the properties of the above algorithm are not affected if the assignment p^k = −∇Z(x^k; ε) in Step 3 is modified as follows:

p^k = −B^k ∇Z(x^k; ε),

where {B^k} is any sequence of positive definite matrices whose smallest eigenvalues are bounded away from zero and whose largest eigenvalues are bounded above.

Remark 2. Algorithm TNA should incorporate a way of automatically updating the constant c₁, which appears in the definition of algorithm PCG. Nevertheless, in the context of truncated Newton methods, it is common practice to fix c₁, once and for all, to a small positive value (see the next section).


7. Preliminary numerical results

In order to get a feel for the viability of the proposed approach, we produced a Fortran 90 version of TNA, properly extended to handle both inequality and equality constraints (see [16] for details), and tested it on a selection of large scale nonlinear constrained problems from the CUTE collection [4]. As we have already mentioned in the introduction, TNA has been designed to cope with problems having a large number of variables (say, a few thousand) and a limited number of constraints (say, a bit more than one hundred).

Problem n m q Problem n m q

ARGTRIG 150 0 150 EIGENCCO 462 0 231

AUG2D 850 0 400 FLOSP2HH 323 0 323

AUG2DC 850 0 400 FLOSP2HM 323 0 323

AUG3D 1360 0 300 HAGER1 2000 0 1000

AUG3DC 1360 0 300 HAGER2 2000 0 1000

BRATU2D 900 0 900 HAGER3 2000 0 1000

BRATU3D 1000 0 1000 HAGER4 1000 500 500

BROYDN3D 1000 0 1000 HANGING 288 180 0

CATENA 497 0 166 LCH 600 0 1

CATENARY 497 0 166 MSQRTA 529 0 529

CBRATU2D 882 0 882 MSQRTB 529 0 529

CBRATU3D 1024 0 1024 OPTCTRL3 299 0 200

DRCAVTY1 100 0 100 OPTCTRL6 299 0 200

DRCAVTY3 100 0 100 OPTMASS 606 101 404

DTOC1L 1194 0 796 ORTHREGA 517 0 256

DTOC1NA 1194 0 796 ORTHREGC 505 0 250

DTOC1NB 1194 0 796 ORTHREGE 999 1 662

DTOC1NC 1194 0 796 ORTHREGF 680 2 225

DTOC1ND 1194 0 796 ORTHRGDM 503 0 250

DTOC2 1194 0 796 ORTHRGDS 503 0 250

DTOC3 1497 0 998 POROUS1 900 0 900

DTOC4 1497 0 998 POROUS2 900 0 900

DTOC6 1000 0 500 POWELL20 1000 1000 0

EIGENB 110 0 110 PRIMAL1 325 86 0

EIGENC 462 0 462 PRIMAL2 649 97 0

EIGENB2 110 0 55 PRIMAL3 745 112 0

EIGENACO 110 0 55 PRIMAL4 1489 76 0

EIGENBCO 110 0 55 PRIMALC5 287 286 0

EIGENC2 462 0 231


Unfortunately, the CUTE collection, though providing us with an extraordinary variety of problems, does not contain many nonlinear programs with the required properties. Therefore, in order to obtain a significant number of test problems, we selected those problems having fewer constraints than variables and, in any case, fewer than one thousand constraints. In the above table we describe the test set used by reporting, for each problem, its name, the number n of variables, the number m of inequality constraints (including bound constraints on the variables), and the number q of equality constraints.

We did not include any problem having only bound constraints on the variables, since our code does not exploit this particular structure in any way. Moreover, we decided not to consider those problems on which our algorithm failed to converge due to the presence of points not satisfying the assumptions. Naturally, an efficient implementation of TNA should include strategies that, near nonregular points, are able to continue the minimization process, thus preventing the algorithm from getting stuck. Nevertheless, the development of an extremely efficient code is beyond the scope of the present paper, hence such procedures were not considered. For the same reason, a fine tuning of the parameters used to define the algorithm has not been carried out. In particular, as concerns the algorithm model TNA, we have chosen the following values for the parameters:

– α = 1 + ∑_{i=1}^m g_i⁺(x̄)³ and ε⁰ = max[10⁻³, min[10⁻¹, ε̃]] / α, where ε̃ = max[10⁻⁶, ‖g⁺(x̄)‖] / ‖∇L(x̄, λ(x̄))‖ and x̄ is the initial point;
– δ = 10⁶;
– γ = 10⁻³, σ = 0.5;
– ρ = 10⁻¹.
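For illustration only, the starting quantities α and ε⁰ above can be computed along the following lines (a Python sketch; g0 and grad_L0 are assumed to hold g(x̄) and ∇L(x̄, λ(x̄)), and the small safeguard on the denominator is ours, not part of the formula):

```python
import numpy as np

def initial_penalty_parameter(g0, grad_L0):
    """Compute alpha and eps^0 from the data at the initial point (g0 = g(x_bar),
    grad_L0 = grad_x L(x_bar, lambda(x_bar)))."""
    g_plus = np.maximum(g0, 0.0)                        # g_i^+(x_bar)
    alpha = 1.0 + np.sum(g_plus ** 3)
    den = max(np.linalg.norm(grad_L0), 1e-12)           # safeguard against a zero gradient
    eps_tilde = max(1e-6, np.linalg.norm(g_plus)) / den
    eps0 = max(1e-3, min(1e-1, eps_tilde)) / alpha
    return alpha, eps0
```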

As regards the truncated Newton procedure PCG, we have set

– c₁ = c₂ = 10⁻⁶;
– tol_k = min[0.9/k, ‖r⁰‖] · min[10⁻³, ‖r⁰‖] · ‖P^k(∇²_x L(x^k, λ(x^k)) d^k_v + ∇f(x^k))‖.

In implementing algorithm TNA we replaced the Armijo-type line search with the nonmonotone line search strategy described in [26]. As already mentioned in Section 6.2, the use of such a nonmonotone globalization technique does not modify the convergence properties of the constrained algorithm TNA. We did not define TNA directly in terms of this nonmonotone technique in order to avoid an excessively complex exposition. On the other hand, the use of such a nonmonotone line search strategy allows us to better cope with possible ill-conditioning of the penalty function.

All runs were performed on an Intel Pentium III based personal computer with 512 MB of main memory, using IEEE double precision arithmetic. In the following table we report the results of TNA compared with those of LANCELOT [10]. More precisely, for each algorithm we report the number of iterations (IT), the number of function (nF) and gradient (nG) evaluations, the final function value (F.Value) and the solution time (Time) required to get a KKT point. Both algorithms were interrupted either when the number of function evaluations or iterations reached 2000, or when the execution time reached ten minutes of CPU time, or when the step taken by the algorithm was smaller than the machine precision.


Finally, by "Infeasible point" (see problems DRCAVTY3, FLOSP2HH, FLOSP2HM and ORTHRGDS) we mean that the algorithm terminated unexpectedly, signaling the impossibility of reducing the infeasibility of the current point.

TNA LANCELOT

Problem IT nF nG F. Value Time IT/nF nG F. Value Time

ARGTRIG 3 4 4 0.275492D−16 7.65 15 14 0.275492D−16 2.36

AUG2D 2 3 3 0.996013D+03 1.34 20 21 0.996013D+03 0.59

AUG2DC 1 2 2 0.155645D+04 1.17 24 25 0.155645D+04 0.61

AUG3D 2 3 3 0.132918D+02 1.76 7 8 0.132918D+02 0.42

AUG3DC 1 2 2 0.157727D+03 2.14 11 12 0.157727D+03 0.51

BRATU2D 2 3 3 0.657207D−09 8.01 2 3 0.657207D−09 0.29

BRATU3D 3 4 4 0.691848D−14 27.23 4 5 0.691848D−14 0.28

BROYDN3D 4 5 5 0.566682D−17 5.54 5 6 0.566682D−17 0.08

CATENA 8 13 9 −0.348530D+06 1.41 137 125 −0.348530D+06 0.62

CATENARY Max f. evals. 826 687 −0.348400D+06 8.70

CBRATU2D 3 4 4 0.284644D−14 9.66 3 4 0.284644D−14 0.30

CBRATU3D 3 4 4 0.812799D−14 31.68 3 4 0.812799D−14 0.30

DRCAVTY1 9 14 11 0.215494D−13 1.11 48 40 0.215494D−13 0.55

DRCAVTY3 7 10 8 0.135917D−10 0.82 Infeasible point

DTOC1L 8 9 9 0.816221D+00 7.37 9 10 0.816221D+00 0.31

DTOC1NA 8 9 9 0.855190D+00 16.54 10 11 0.855190D+00 0.83

DTOC1NB 8 9 9 0.144966D+01 16.44 12 13 0.144966D+01 0.94

DTOC1NC 6 8 8 0.700780D+01 16.37 13 14 0.700780D+01 0.96

DTOC1ND 6 8 8 0.947069D+01 16.34 21 19 0.947069D+01 1.48

DTOC2 8 10 10 0.490343D+00 10.61 34 34 0.489160D+00 2.07

DTOC3 1 2 2 0.235084D+03 9.53 28 29 0.235084D+03 0.78

DTOC4 3 4 4 0.288285D+01 11.04 15 16 0.288285D+01 0.88

DTOC6 8 9 9 0.684661D+04 4.33 87 88 0.684661D+04 4.10

EIGENB 31 138 41 0.375868D−11 12.98 172 141 0.375868D−11 2.19

EIGENC 16 31 17 0.579975D−16 286.95 627 524 0.579975D−16 132.81

EIGENB2 1 2 2 0.180000D+02 0.22 45 40 0.553000D−09 0.24

EIGENACO 1 2 2 0.000000D+00 0.25 24 24 0.671210D−12 0.19

EIGENBCO 1 2 2 0.900000D+01 0.25 124 102 0.344000D−09 1.32

EIGENC2 19 46 23 0.810000D+03 15.48 644 527 0.671210D−12 39.55

EIGENCCO 22 58 28 0.790000D+03 123.25 280 231 0.656270D−11 46.15

FLOSP2HH Max CPU time Infeasible point

FLOSP2HM Max CPU time Infeasible point

HAGER1 1 2 2 0.880797D+00 10.45 12 13 0.880797D+00 0.40


HAGER2 1 2 2 0.432082D+00 16.61 11 12 0.432082D+00 0.50

HAGER3 1 2 2 0.140961D+00 20.45 12 13 0.140961D+00 0.66

HAGER4 4 5 5 0.279451D+01 3.94 9 10 0.279451D+01 1.43

HANGING 103 337 131 −0.620176D+03 15.64 94 85 −0.620176D+03 1.02

LCH 2 3 3 −0.309886D+01 0.48 35 35 −0.431830D+01 0.60

MSQRTA 5 8 6 0.191014D−07 112.34 19 16 0.191014D−07 0.06

MSQRTB 5 8 6 0.198496D−07 143.42 17 16 0.198496D−07 0.05

OPTCTRL3 19 25 21 0.503897D+04 3.47 68 69 0.503897D+04 0.25

OPTCTRL6 19 25 21 0.503897D+04 3.51 68 69 0.503897D+04 0.25

OPTMASS Max CPU time Max iteration num.

ORTHREGA 31 32 32 0.166476D+04 8.79 161 152 0.141410D+04 2.10

ORTHREGC 6 7 7 0.958196D+01 1.55 57 50 0.126130D+02 0.72

ORTHREGE 92 194 117 0.330009D+03 208.87 Step taken too small

ORTHREGF 118 200 127 0.455091D+02 22.97 147 124 0.129800D+02 1.72

ORTHRGDM 109 107 87 0.753944D+02 15.29 Step taken too small

ORTHRGDS 182 267 145 0.761044D+02 47.94 Infeasible point

POROUS1 11 14 12 0.294127D−10 17.43 30 28 0.294127D−10 4.69

POROUS2 6 9 7 0.599071D−08 16.27 32 28 0.599071D−08 2.30

POWELL20 Max CPU time Step taken too small

PRIMAL1 7 14 9 −0.350130D−01 2.19 9 10 −0.350130D−01 0.62

PRIMAL2 5 29 23 −0.337337D−01 6.34 8 10 −0.337337D−01 1.09

PRIMAL3 6 34 27 −0.135756D+00 35.90 9 10 −0.135756D+00 6.90

PRIMAL4 7 40 28 −0.746091D+00 20.93 9 10 −0.746091D+00 6.71

PRIMALC5 Linesearch step too small 25 25 −0.427230D+03 0.39

From the above table it clearly emerges that our approach is less efficient, in terms of solution time, than LANCELOT. This is mainly due to the linear algebra connected with the evaluation of the multiplier functions and their Jacobians, which is particularly time consuming in an exact penalty approach, especially when the number of constraints (m + q) becomes considerable. Nevertheless, the scenario is not that bad. In fact, as we can see, algorithm TNA performs better than LANCELOT, on most problems, in terms of both function and gradient evaluations. The latter aspect could be relevant when the problem functions are expensive to evaluate. Unfortunately, at the moment, CUTE does not provide us with such problems. Moreover, the efficiency of the code in terms of CPU time could be enhanced by developing the following points:

– defining strategies that, while retaining the convergence properties, avoid the computation of the multiplier functions at every iteration;


– using more efficient linear algebra routines;
– studying more appropriate termination criteria for the truncated conjugate gradient algorithm used to compute the horizontal step;
– using automatic differentiation techniques.

Nevertheless, an investigation of these aspects is beyond the scope of the present paper, which aims instead at a more theoretical analysis of the properties of the algorithm model described herein.

Appendix: Efficient solution of linear systems

The calculation of the search direction d^k and of z^k requires the solution of several linear systems of the following type:

∇g_{A^k}ᵀ ∇g_{A^k} w = b^k.    (A.74)

In particular, such systems occur in determining the vertical step d^k_v, the value of z^k and the projection P^k r^{i+1} in the PCG algorithm. It is thus evident that an efficient solution of these linear systems may considerably speed up the overall procedure that computes the search direction. As suggested by Byrd et al. [7], we solve the aforementioned linear systems using the augmented matrix

[ I          ∇g_{A^k} ]
[ ∇g_{A^k}ᵀ     0     ].

In fact, it can be shown that all the quantities mentioned above can be computed by means of the augmented system

[ I          ∇g_{A^k} ] [ y ]   [ b_1 ]
[ ∇g_{A^k}ᵀ     0     ] [ w ] = [ b_2 ]    (A.75)

for different values of the right hand side (b_1, b_2). More specifically, as regards d_v, we claim that Eq. (4.20) can be written, omitting the argument x^k and the iteration index k, as

d_v = −∇g_A w,    (A.76)

provided that

w = (∇g_Aᵀ ∇g_A)⁻¹ g_A,

that is, w is the analytic solution of the following linear system:

∇g_Aᵀ ∇g_A w = g_A.


The above system, using (A.76), can be rewritten as −∇g_Aᵀ d_v = g_A. In conclusion, we have come up with the following augmented system of equations:

d_v + ∇g_A w = 0,
∇g_Aᵀ d_v = −g_A,

that is, system (A.75) where the right hand side is set equal to (0, −g_A). Let us now consider the expression of the projection of a vector r onto the null space of the transposed gradients of the active constraints, that is, Pr. Recalling (4.17), we claim that Pr = r − ∇g_A w provided that w = (∇g_Aᵀ ∇g_A)⁻¹ ∇g_Aᵀ r. Now, considering that ∇g_Aᵀ Pr = 0, we can finally write

Pr + ∇g_A w = r,
∇g_Aᵀ Pr = 0,

that is, system (A.75) where, this time, the right hand side is set equal to (r, 0).

To conclude, as regards z^k, we note that expression (4.22) can be rewritten, in a more convenient way, as

−∇g_Aᵀ (∇g_A z + b) = 0,    (A.77)

provided that b = ∇²_x L d + ∇f. Now, if we introduce the auxiliary variable y together with the equation y = −(∇g_A z + b), from (A.77) we again obtain system (A.75), namely

y + ∇g_A z = −b,
∇g_Aᵀ y = 0,

where we have set the right hand side equal to (−b, 0).

To factorize and solve system (A.75) we employ routine MA27 from the Harwell Subroutine Library [27], which implements a direct factorization method based on a sparse variant of multifrontal Gaussian elimination and is thus suitable for large symmetric linear systems.
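To illustrate the mechanism, the Python sketch below forms the augmented matrix of (A.75) once and reuses its factorization for the three right hand sides discussed above. It relies on SciPy's general sparse LU factorization rather than on the symmetric multifrontal code MA27 actually used in our implementation, and all names (jac_active for ∇g_A, and so on) are ours:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_augmented_solver(jac_active):
    """jac_active: n x |A| sparse matrix whose columns are the active constraint
    gradients (i.e., the columns of grad g_A).  Returns a solver for system (A.75)."""
    n, na = jac_active.shape
    K = sp.bmat([[sp.identity(n), jac_active],
                 [jac_active.T,   None]], format="csc")   # augmented matrix of (A.75)
    lu = spla.splu(K)                                      # factorize once, reuse for every r.h.s.

    def solve(b1, b2):
        sol = lu.solve(np.concatenate([b1, b2]))
        return sol[:n], sol[n:]                            # the (y, w) blocks
    return solve

# Hypothetical usage, with grad_gA, gA, r and b computed elsewhere:
#   solve = make_augmented_solver(grad_gA)
#   dv, _ = solve(np.zeros(n), -gA)       # vertical step: r.h.s. (0, -g_A), first block is d_v
#   Pr, _ = solve(r, np.zeros(na))        # projection:    r.h.s. (r, 0),    first block is P r
#   _,  z = solve(-b, np.zeros(na))       # multipliers:   r.h.s. (-b, 0),   second block is z
```

Reusing one factorization for all right hand sides is precisely what makes the augmented formulation attractive.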

Acknowledgments

We thank two anonymous referees whose very detailed and constructive criticisms greatlyhelped improving the presentation.

References

1. R. Bellman, Introduction to Matrix Analysis. McGraw-Hill: New York, 1970.
2. H.Y. Benson, D.F. Shanno, and R.J. Vanderbei, “Interior-point methods for nonconvex nonlinear programming: Jamming and comparative numerical testing,” Tech. Rep. ORFE-00-02, Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA, 2000.
3. D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Academic Press: New York, 1982.


4. I. Bongartz, A.R. Conn, N.I.M. Gould, and Ph.L. Toint, “CUTE: Constrained and unconstrained testing environment,” ACM Transactions on Mathematical Software, vol. 21, pp. 123–160, 1995.
5. I. Bongartz, A.R. Conn, N.I.M. Gould, M. Saunders, and Ph.L. Toint, “A numerical comparison between the LANCELOT and MINOS packages for large-scale nonlinear optimization,” Tech. Rep. 97/13, Department of Mathematics, FUNDP, Namur, Belgium, 1997.
6. I. Bongartz, A.R. Conn, N.I.M. Gould, M. Saunders, and Ph.L. Toint, “A numerical comparison between the LANCELOT and MINOS packages for large-scale nonlinear optimization: The complete results,” Tech. Rep. 97/14, Department of Mathematics, FUNDP, Namur, Belgium, 1997.
7. R.H. Byrd, M.E. Hribar, and J. Nocedal, “An interior point algorithm for large-scale nonlinear programming,” SIAM J. Optimization, vol. 9, pp. 877–900, 1999.
8. R.H. Byrd, J.Ch. Gilbert, and J. Nocedal, “A trust region method based on interior point techniques for nonlinear programming,” Math. Programming, vol. 89, pp. 149–185, 2000.
9. R.H. Byrd, J. Nocedal, and R.A. Waltz, “Feasible interior methods using slacks for nonlinear optimization,” Tech. Rep. OTC 2000/11, Optimization Technology Center, Evanston, IL, USA, 2000.
10. A.R. Conn, N.I.M. Gould, and P.L. Toint, “LANCELOT: A Fortran package for large-scale nonlinear optimization,” vol. 17 of Springer Series in Computational Mathematics, Springer Verlag, Heidelberg, New York, 1992.
11. G. Contaldi, G. Di Pillo, and S. Lucidi, “A continuously differentiable exact penalty function for nonlinear programming problems with unbounded feasible set,” Operations Research Letters, vol. 14, pp. 153–161, 1993.
12. R.S. Dembo and T. Steihaug, “Truncated-Newton algorithms for large-scale unconstrained optimization,” Math. Programming, vol. 26, pp. 190–212, 1983.
13. G. Di Pillo, F. Facchinei, and L. Grippo, “An RQP algorithm using a differentiable exact penalty function for inequality constrained problems,” Math. Programming, vol. 55, pp. 49–68, 1992.
14. G. Di Pillo and L. Grippo, “An augmented Lagrangian for inequality constraints in nonlinear programming problems,” J. Optim. Theory and Appl., vol. 36, pp. 495–519, 1982.
15. G. Di Pillo and L. Grippo, “A continuously differentiable exact penalty function for nonlinear programming problems with inequality constraints,” SIAM J. Control and Optimization, vol. 23, pp. 72–84, 1985.
16. G. Di Pillo and L. Grippo, “Exact penalty functions in constrained optimization,” SIAM J. Control and Optimization, vol. 27, pp. 1333–1360, 1989.
17. G. Di Pillo, G. Liuzzi, S. Lucidi, and L. Palagi, “Use of a truncated Newton direction in an augmented Lagrangian framework,” TR 18–02, Department of Computer and Systems Science, University of Rome “La Sapienza,” Rome, Italy, 2002.
18. F. Facchinei, “Minimization of SC1 functions and the Maratos effect,” Operations Research Letters, vol. 17, pp. 131–137, 1995.
19. F. Facchinei, “Robust recursive quadratic programming algorithm model with global and superlinear convergence properties,” J. Optim. Theory and Appl., vol. 92, pp. 543–579, 1997.
20. F. Facchinei and S. Lucidi, “Quadratically and superlinearly convergent algorithms for the solution of inequality constrained minimization problems,” J. Optim. Theory and Appl., vol. 85, pp. 265–289, 1995.
21. R. Fletcher, N.I.M. Gould, S. Leyffer, and Ph.L. Toint, “Global convergence of trust-region SQP-filter algorithms for nonlinear programming,” Tech. Rep. 99/03, Department of Mathematics, University of Namur, 61 rue de Bruxelles, B-5000, Namur, Belgium, 1999.
22. R. Fletcher and S. Leyffer, “Nonlinear programming without a penalty function,” Math. Programming, vol. 91, pp. 239–270, 2002.
23. P.E. Gill, W. Murray, and M.A. Saunders, “SNOPT: An SQP algorithm for large-scale constrained optimization,” SIAM J. Optimization, vol. 12, no. 4, pp. 976–1006, 2002.
24. T. Glad and E. Polak, “A multiplier method with automatic limitation of penalty growth,” Math. Programming, vol. 17, pp. 140–155, 1979.
25. L. Grippo, F. Lampariello, and S. Lucidi, “A truncated Newton method with non-monotone line search for unconstrained optimization,” J. Optim. Theory and Appl., vol. 60, pp. 401–419, 1989.
26. L. Grippo, F. Lampariello, and S. Lucidi, “A class of nonmonotone stabilization methods in unconstrained optimization,” Numerische Mathematik, vol. 59, pp. 779–805, 1991.


27. Harwell Subroutine Library, “A Catalogue of Subroutines” (Release 12), AEA Technology, Harwell, Oxfordshire, England, 1995.
28. S. Lucidi, “New results on a class of exact augmented Lagrangians,” J. Optim. Theory and Appl., vol. 58, pp. 259–282, 1988.
29. S. Lucidi, “New results on a continuously differentiable exact penalty function,” SIAM J. Optimization, vol. 2, pp. 558–574, 1992.
30. O.L. Mangasarian and S. Fromowitz, “The Fritz-John necessary optimality conditions in the presence of equality constraints,” J. Math. Analysis and Appl., vol. 17, pp. 34–47, 1967.
31. J.L. Morales, J. Nocedal, R.A. Waltz, G. Liu, and J.P. Goux, “Assessing the potential of interior methods for nonlinear optimization,” Tech. Rep. OTC 2001/6, Optimization Technology Center, Evanston, IL, USA, 2001.
32. J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press: New York, 1970.
33. L. Qi and Y. Yang, “Globally and superlinearly convergent QP-free algorithm for nonlinear constrained optimization,” J. Optim. Theory and Appl., vol. 113, pp. 297–323, 2001.
34. L. Qi and Y. Yang, “A globally and superlinearly convergent SQP algorithm for nonlinear constrained optimization,” Journal of Global Optim., vol. 21, pp. 157–184, 2001.
35. D.F. Shanno and R.J. Vanderbei, “An interior point algorithm for nonconvex nonlinear programming,” Computational Optimization and Applications, vol. 13, pp. 231–252, 1999.
36. P. Spellucci, “A new technique for inconsistent QP problems in the SQP methods,” Math. Methods of Operations Research, vol. 47, pp. 355–400, 1998.
37. A. Wachter and L.T. Biegler, “Failure of global convergence for a class of interior point methods for nonlinear programming,” Math. Programming, vol. 88, pp. 565–574, 2000.
38. A. Wachter and L.T. Biegler, “Global and local convergence of line search filter methods for nonlinear programming,” Tech. Rep. B-01-09, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA, 2001.