A semismooth Newton method for SOCCPs based on a one-parametric class of SOC complementarity functions

Shaohua Pan 1
School of Mathematical Sciences
South China University of Technology
Guangzhou 510640, China

Jein-Shan Chen 2
Department of Mathematics
National Taiwan Normal University
Taipei 11677, Taiwan

March 26, 2007
(revised on July 30, 2007)
(2nd revised on October 28, 2007)

Abstract. In this paper, we present a detailed investigation for the properties of a one-parametric class of SOC complementarity functions, which include the globally Lipschitz continuity, strong semismoothness, and the characterization of the B-subdifferential at a general point. Moreover, for the merit functions induced by them for the second-order cone complementarity problem (SOCCP), we provide a condition for each stationary point being a solution of the SOCCP and establish the boundedness of their level sets, by exploiting Cartesian P-properties. We also propose a semismooth Newton method based on the reformulation of the nonsmooth system of equations involving the class of SOC complementarity functions. The global and superlinear convergence results are obtained, and among others, the superlinear convergence is established under strict complementarity. Preliminary numerical results are reported for DIMACS second-order cone programs, which confirm the favorable theoretical properties of the method.

Key words. Second-order cone, complementarity, semismooth, B-subdifferential, Newton’s method.

1 The author’s work is partially supported by the Doctoral Starting-up Foundation (B13B6050640) of GuangDong Province. E-mail:[email protected].
2 Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author’s work is partially supported by National Science Council of Taiwan. E-mail: [email protected].
1 Introduction
We consider the following conic complementarity problem of finding $\zeta \in \mathbb{R}^n$ such that
$$F(\zeta) \in \mathcal{K}, \quad G(\zeta) \in \mathcal{K}, \quad \langle F(\zeta), G(\zeta) \rangle = 0, \tag{1}$$
where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product, $F$ and $G$ are mappings from $\mathbb{R}^n$ to $\mathbb{R}^n$ which are assumed to be continuously differentiable, and $\mathcal{K}$ is the Cartesian product of second-order cones (SOCs), also called Lorentz cones [8]. In other words,
$$\mathcal{K} = \mathcal{K}^{n_1} \times \mathcal{K}^{n_2} \times \cdots \times \mathcal{K}^{n_m}, \tag{2}$$
where $m, n_1, \ldots, n_m \ge 1$, $n_1 + n_2 + \cdots + n_m = n$, and
$$\mathcal{K}^{n_i} := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n_i - 1} \mid x_1 \ge \|x_2\| \right\},$$
with $\|\cdot\|$ denoting the Euclidean norm and $\mathcal{K}^1$ denoting the set of nonnegative reals $\mathbb{R}_+$. We will refer to (1)–(2) as the second-order cone complementarity problem (SOCCP). In addition, we write $F = (F_1, \ldots, F_m)$ and $G = (G_1, \ldots, G_m)$ with $F_i, G_i : \mathbb{R}^n \to \mathbb{R}^{n_i}$.
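As a concrete reading of (1)–(2), the following minimal Python sketch tests membership in $\mathcal{K}$ block by block and checks the three conditions of (1) at given values of $F(\zeta)$ and $G(\zeta)$. The function names, the `dims` argument (the block sizes $n_1, \ldots, n_m$) and the tolerances are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def in_soc(x, tol=1e-12):
    """True if x = (x1, x2) satisfies x1 >= ||x2|| (up to tol)."""
    x = np.asarray(x, dtype=float)
    # For a block of size 1, x[1:] is empty and its norm is 0, so this tests x1 >= 0.
    return bool(x[0] >= np.linalg.norm(x[1:]) - tol)

def in_product_cone(x, dims, tol=1e-12):
    """True if every block x_i of x lies in K^{n_i}, where dims = (n_1, ..., n_m)."""
    x = np.asarray(x, dtype=float)
    if sum(dims) != x.size:
        raise ValueError("block sizes must sum to len(x)")
    blocks = np.split(x, np.cumsum(dims)[:-1])
    return all(in_soc(b, tol) for b in blocks)

def is_soccp_solution(Fz, Gz, dims, tol=1e-8):
    """Check the three conditions in (1) for given values F(zeta) and G(zeta)."""
    Fz = np.asarray(Fz, dtype=float)
    Gz = np.asarray(Gz, dtype=float)
    return (in_product_cone(Fz, dims, tol)
            and in_product_cone(Gz, dims, tol)
            and abs(Fz @ Gz) <= tol)
```

For instance, the pair $(1, 1, 0)$ and $(1, -1, 0)$ in $\mathcal{K}^3$ satisfies all three conditions, whereas $(1, 0, 0)$ paired with itself lies in the cone but is not complementary.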
An important special case of the SOCCP corresponds to $G(\zeta) = \zeta$ for all $\zeta \in \mathbb{R}^n$. Then (1) reduces to
$$F(\zeta) \in \mathcal{K}, \quad \zeta \in \mathcal{K}, \quad \langle F(\zeta), \zeta \rangle = 0, \tag{3}$$
which is a natural extension of the nonlinear complementarity problem (NCP), where $\mathcal{K} = \mathcal{K}^1 \times \cdots \times \mathcal{K}^1$. Another important special case corresponds to the Karush-Kuhn-Tucker (KKT) conditions of the convex second-order cone program (SOCP):
$$\min \ g(x) \quad \text{s.t.} \quad Ax = b, \ x \in \mathcal{K}, \tag{4}$$
where $A \in \mathbb{R}^{m \times n}$ has full row rank, $b \in \mathbb{R}^m$ and $g : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable convex function. From [7], the KKT conditions for (4), which are sufficient but not necessary for optimality, can be written in the form of (1) with
$$F(\zeta) := d + (I - A^T (AA^T)^{-1} A)\zeta, \quad G(\zeta) := \nabla g(F(\zeta)) - A^T (AA^T)^{-1} A \zeta, \tag{5}$$
where $d \in \mathbb{R}^n$ is any vector satisfying $Ad = b$. For large problems with a sparse $A$, (5) has the advantage that the main cost of evaluating the Jacobians $\nabla F$ and $\nabla G$ lies in inverting $AA^T$, which can be done efficiently via sparse Cholesky factorization.
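The construction in (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_FG` is a hypothetical helper name, `grad_g` is a user-supplied callable for $\nabla g$, the vector $d$ is taken to be the minimum-norm solution of $Ad = b$ (one admissible choice), and a dense solve stands in for the sparse Cholesky factorization mentioned in the text.

```python
import numpy as np

def make_FG(A, b, grad_g):
    """Build the mappings F and G of (5) for the SOCP (4)."""
    AAT = A @ A.T                       # nonsingular because A has full row rank
    d = A.T @ np.linalg.solve(AAT, b)   # assumed choice: minimum-norm d with A d = b

    def P(z):
        # Apply A^T (A A^T)^{-1} A to z (dense solve; a sparse Cholesky
        # factorization of A A^T would be used for large sparse A).
        return A.T @ np.linalg.solve(AAT, A @ z)

    def F(z):
        return d + z - P(z)

    def G(z):
        return grad_g(F(z)) - P(z)

    return F, G
```

By construction $A F(\zeta) = b$ for every $\zeta$, so the linear constraint of (4) holds automatically, and $G(\zeta) - \nabla g(F(\zeta))$ lies in the range of $A^T$.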
There have been various methods proposed for solving SOCPs and SOCCPs. They include interior-point methods [1, 2, 17, 18, 24], non-interior smoothing Newton methods [4, 9], the smoothing-regularization method [13], the merit function method [7] and the semismooth Newton method [15]. Among others, the last four kinds of methods are all based on an SOC complementarity function or a smooth merit function induced by it. Given a mapping $\phi : \mathbb{R}^l \times \mathbb{R}^l \to \mathbb{R}^l$ ($l \ge 1$), we call $\phi$ an SOC complementarity function associated with the cone $\mathcal{K}^l$ if for any $(x, y) \in \mathbb{R}^l \times \mathbb{R}^l$,
$$\phi(x, y) = 0 \iff x \in \mathcal{K}^l, \ y \in \mathcal{K}^l, \ \langle x, y \rangle = 0. \tag{6}$$
Clearly, when $l = 1$, an SOC complementarity function reduces to an NCP function, which plays an important role in the solution of NCPs; see [22] and references therein.
A popular choice of $\phi$ is the Fischer-Burmeister (FB) function [10, 11], defined by
$$\phi_{\rm FB}(x, y) := (x^2 + y^2)^{1/2} - (x + y), \tag{7}$$
where $x^2$ means $x \circ x$ with "$\circ$" denoting the Jordan product, and $x + y$ denotes the usual componentwise addition of vectors. More specifically, for any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$, we define their Jordan product associated with $\mathcal{K}^l$ as
$$x \circ y := (\langle x, y \rangle, \ y_1 x_2 + x_1 y_2). \tag{8}$$
The Jordan product, unlike scalar or matrix multiplication, is not associative, which is the main source of complication in the analysis of SOCCPs. The identity element under this product is $e := (1, 0, \ldots, 0)^T \in \mathbb{R}^l$. It is known that $x^2 \in \mathcal{K}^l$ for all $x \in \mathbb{R}^l$. Moreover, if $x \in \mathcal{K}^l$, then there exists a unique vector in $\mathcal{K}^l$, denoted by $x^{1/2}$, such that $(x^{1/2})^2 = x^{1/2} \circ x^{1/2} = x$. Thus, $\phi_{\rm FB}$ in (7) is well-defined for all $(x, y) \in \mathbb{R}^l \times \mathbb{R}^l$. The function $\phi_{\rm FB}$ was proved in [9] to satisfy the equivalence (6), and its squared norm
$$\psi_{\rm FB}(x, y) := \frac{1}{2} \|\phi_{\rm FB}(x, y)\|^2$$
has been shown to be continuously differentiable everywhere by Chen and Tseng [7].
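To make (7) and (8) concrete, here is a minimal Python sketch of the Jordan product, the square root in $\mathcal{K}^l$ and the FB function. The square root is computed from the spectral values $x_1 \mp \|x_2\|$ recalled in Section 2; the function names and the clipping of small negative round-off are assumptions made for illustration only.

```python
import numpy as np

def jordan(x, y):
    """Jordan product (8): x o y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(x):
    """Square root of x in K^l, built from the spectral values x1 -/+ ||x2||."""
    n2 = np.linalg.norm(x[1:])
    s1 = np.sqrt(max(x[0] - n2, 0.0))   # clip tiny negative round-off
    s2 = np.sqrt(max(x[0] + n2, 0.0))
    # When x2 = 0 the two spectral values coincide, so the direction is irrelevant.
    direction = x[1:] / n2 if n2 > 0 else np.zeros_like(x[1:])
    return np.concatenate(([0.5 * (s1 + s2)], 0.5 * (s2 - s1) * direction))

def phi_fb(x, y):
    """Fischer-Burmeister function (7)."""
    return soc_sqrt(jordan(x, x) + jordan(y, y)) - (x + y)

def psi_fb(x, y):
    """Merit function: half the squared norm of phi_fb."""
    r = phi_fb(x, y)
    return 0.5 * float(r @ r)
```

With $a = (1,1,0)$, $b = (1,0,1)$ and $c = (0,1,1)$ one finds $(a \circ b) \circ c = (2,1,1)$ but $a \circ (b \circ c) = (2,2,1)$, which illustrates the lack of associativity noted above.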
Another popular choice of $\phi$ is the natural residual function $\phi_{\rm NR} : \mathbb{R}^l \times \mathbb{R}^l \to \mathbb{R}^l$ given by
$$\phi_{\rm NR}(x, y) := x - [x - y]_+,$$
where $[\,\cdot\,]_+$ means the minimum Euclidean distance projection onto $\mathcal{K}^l$. This function was studied in [9, 13], where it is used in smoothing methods for the SOCCP, and recently it was used to develop a semismooth Newton method for nonlinear SOCPs by Kanzow and Fukushima [15]. The function $\phi_{\rm NR}$ also induces a merit function
$$\psi_{\rm NR}(x, y) := \frac{1}{2} \|\phi_{\rm NR}(x, y)\|^2,$$
but, compared to $\psi_{\rm FB}$, it has a notable drawback: it is not differentiable everywhere.
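The natural residual function can be sketched as follows. The projection onto $\mathcal{K}^l$ is computed with the standard spectral formula (replace each spectral value by its nonnegative part), which is a well-known fact but is not spelled out in this section; the function names are illustrative assumptions.

```python
import numpy as np

def proj_soc(x):
    """Euclidean projection [x]_+ onto K^l via the spectral values of x."""
    n2 = np.linalg.norm(x[1:])
    p1 = max(x[0] - n2, 0.0)            # nonnegative part of lambda_1
    p2 = max(x[0] + n2, 0.0)            # nonnegative part of lambda_2
    xbar = x[1:] / n2 if n2 > 0 else np.zeros_like(x[1:])
    return np.concatenate(([0.5 * (p1 + p2)], 0.5 * (p2 - p1) * xbar))

def phi_nr(x, y):
    """Natural residual function: x - [x - y]_+."""
    return x - proj_soc(x - y)

def psi_nr(x, y):
    """Merit function: half the squared norm of phi_nr."""
    r = phi_nr(x, y)
    return 0.5 * float(r @ r)
```

The projection leaves points of $\mathcal{K}^l$ unchanged and maps points of $-\mathcal{K}^l$ to the origin, and `phi_nr` vanishes at complementary pairs such as $x = (1,1,0)$, $y = (1,-1,0)$.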
In this paper, we consider a one-parametric class of vector-valued functions
$$\phi_\tau(x, y) := \left[ (x - y)^2 + \tau (x \circ y) \right]^{1/2} - (x + y) \tag{9}$$
with $\tau$ being an arbitrary fixed parameter from $(0, 4)$. The class of functions is a natural extension of the family of NCP functions proposed by Kanzow and Kleinmichel [14], and has been shown to satisfy the characterization (6) in [6]. It is not hard to see that when $\tau = 2$, $\phi_\tau$ reduces to the FB function $\phi_{\rm FB}$ in (7), while it becomes a multiple of the natural residual function $\phi_{\rm NR}$ (namely $-2\phi_{\rm NR}$) as $\tau \to 0^+$. With this class of SOC complementarity functions, the SOCCP can be reformulated as a nonsmooth system of equations
$$\Phi_\tau(\zeta) := \begin{pmatrix} \phi_\tau(F_1(\zeta), G_1(\zeta)) \\ \vdots \\ \phi_\tau(F_i(\zeta), G_i(\zeta)) \\ \vdots \\ \phi_\tau(F_m(\zeta), G_m(\zeta)) \end{pmatrix} = 0, \tag{10}$$
which induces a natural merit function $\Psi_\tau : \mathbb{R}^n \to \mathbb{R}_+$ given by
$$\Psi_\tau(\zeta) = \frac{1}{2} \|\Phi_\tau(\zeta)\|^2 = \sum_{i=1}^{m} \psi_\tau(F_i(\zeta), G_i(\zeta)), \tag{11}$$
with
$$\psi_\tau(x, y) = \frac{1}{2} \|\phi_\tau(x, y)\|^2. \tag{12}$$
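Equations (9)–(12) translate directly into code. The following Python sketch is a minimal illustration, not the implementation used for the numerical results: `F` and `G` are assumed to be callables returning vectors in $\mathbb{R}^n$, `dims` lists the block sizes $n_1, \ldots, n_m$, and all function names are hypothetical.

```python
import numpy as np

def jordan(x, y):
    """Jordan product (8)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(w):
    """Square root of w in K^l from its spectral values w1 -/+ ||w2||."""
    n2 = np.linalg.norm(w[1:])
    s1 = np.sqrt(max(w[0] - n2, 0.0))   # clip round-off below zero
    s2 = np.sqrt(max(w[0] + n2, 0.0))
    direction = w[1:] / n2 if n2 > 0 else np.zeros_like(w[1:])
    return np.concatenate(([0.5 * (s1 + s2)], 0.5 * (s2 - s1) * direction))

def phi_tau(x, y, tau):
    """phi_tau from (9); tau is expected to lie in (0, 4)."""
    d = x - y
    return soc_sqrt(jordan(d, d) + tau * jordan(x, y)) - (x + y)

def Phi_tau(zeta, F, G, dims, tau):
    """Stack phi_tau(F_i(zeta), G_i(zeta)) over the blocks, as in (10)."""
    cuts = np.cumsum(dims)[:-1]
    Fb, Gb = np.split(F(zeta), cuts), np.split(G(zeta), cuts)
    return np.concatenate([phi_tau(f, g, tau) for f, g in zip(Fb, Gb)])

def Psi_tau(zeta, F, G, dims, tau):
    """Merit function (11): half the squared norm of Phi_tau."""
    r = Phi_tau(zeta, F, G, dims, tau)
    return 0.5 * float(r @ r)
```

Evaluating `phi_tau` at $\tau = 2$ reproduces the FB function, and for very small $\tau$ its value approaches $-2\,(x - [x - y]_+)$, in line with the two limiting cases mentioned above.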
In [6], we studied the continuous differentiability of $\psi_\tau$ and proved that each stationary point of $\Psi_\tau$ is a solution of the SOCCP if $\nabla F$ and $-\nabla G$ are column monotone. This paper focuses on other properties of $\phi_\tau$, including global Lipschitz continuity, strong semismoothness, and the characterization of the B-subdifferential. In particular, we provide a weaker condition than that of [6] for each stationary point of $\Psi_\tau$ to be a solution of the SOCCP and establish the boundedness of the level sets of $\Psi_\tau$ by using Cartesian $P$-properties. We also propose a semismooth Newton method based on (10) and obtain the corresponding global and superlinear convergence results. Among other things, superlinear convergence is established under strict complementarity.
Throughout this paper, $I$ represents an identity matrix of suitable dimension, and $\mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_m}$ is identified with $\mathbb{R}^{n_1 + \cdots + n_m}$. For a differentiable mapping $F : \mathbb{R}^n \to \mathbb{R}^m$, $\nabla F(x)$ denotes the transpose of the Jacobian $F'(x)$. For a symmetric matrix $A \in \mathbb{R}^{n \times n}$, we write $A \succeq O$ (respectively, $A \succ O$) to mean $A$ is positive semidefinite (respectively, positive definite). Given a finite number of square matrices $Q_1, \ldots, Q_m$, we denote the block diagonal matrix with these matrices as block diagonals by ${\rm diag}(Q_1, \ldots, Q_m)$ or by ${\rm diag}(Q_i, i = 1, \ldots, m)$. If $\mathcal{J}$ and $\mathcal{B}$ are index sets such that $\mathcal{J}, \mathcal{B} \subseteq \{1, 2, \ldots, m\}$, we denote by $P_{\mathcal{J}\mathcal{B}}$ the block matrix consisting of the submatrices $P_{jk} \in \mathbb{R}^{n_j \times n_k}$ of $P$ with $j \in \mathcal{J}$, $k \in \mathcal{B}$, and by $x_{\mathcal{B}}$ a vector consisting of subvectors $x_i \in \mathbb{R}^{n_i}$ with $i \in \mathcal{B}$.
2 Preliminaries
This section recalls some background materials and preliminary results that will be used in the subsequent sections. We begin with the interior and the boundary of $\mathcal{K}^l$ ($l \ge 1$). It is known that $\mathcal{K}^l$ is a closed convex self-dual cone with nonempty interior given by
$${\rm int}(\mathcal{K}^l) := \left\{ x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{l-1} \mid x_1 > \|x_2\| \right\}$$
and the boundary given by
$${\rm bd}(\mathcal{K}^l) := \left\{ x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{l-1} \mid x_1 = \|x_2\| \right\}.$$
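For each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$, the determinant and the trace of $x$ are defined by
$$\det(x) := x_1^2 - \|x_2\|^2, \quad {\rm tr}(x) := 2x_1.$$
In general, $\det(x \circ y) \ne \det(x)\det(y)$ unless $x_2 = \alpha y_2$ for some $\alpha \in \mathbb{R}$. A vector $x \in \mathbb{R}^l$ is said to be invertible if $\det(x) \ne 0$, and its inverse is denoted by $x^{-1}$. Given a vector $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$, we often use the following symmetric matrix
$$L_x := \begin{bmatrix} x_1 & x_2^T \\ x_2 & x_1 I \end{bmatrix}, \tag{13}$$
which can be viewed as a linear mapping from $\mathbb{R}^l$ to $\mathbb{R}^l$. It is easy to verify $L_x y = x \circ y$ and $L_{x+y} = L_x + L_y$ for any $x, y \in \mathbb{R}^l$. Furthermore, $x \in \mathcal{K}^l$ if and only if $L_x \succeq O$, and $x \in {\rm int}(\mathcal{K}^l)$ if and only if $L_x \succ O$. If $x \in {\rm int}(\mathcal{K}^l)$, then $L_x$ is invertible with
$$L_x^{-1} = \frac{1}{\det(x)} \begin{bmatrix} x_1 & -x_2^T \\ -x_2 & \dfrac{\det(x)}{x_1} I + \dfrac{1}{x_1} x_2 x_2^T \end{bmatrix}. \tag{14}$$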
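The matrix in (13) and the closed-form inverse in (14) are easy to check numerically. The sketch below is a small Python illustration; `arrow` and `arrow_inv` are hypothetical names chosen here.

```python
import numpy as np

def arrow(x):
    """Matrix L_x from (13)."""
    l = len(x)
    L = x[0] * np.eye(l)
    L[0, 1:] = x[1:]
    L[1:, 0] = x[1:]
    return L

def arrow_inv(x):
    """Closed form (14) for the inverse of L_x, valid for x in int(K^l)."""
    x1, x2 = x[0], x[1:]
    det = x1**2 - x2 @ x2
    l = len(x)
    M = np.empty((l, l))
    M[0, 0] = x1
    M[0, 1:] = -x2
    M[1:, 0] = -x2
    M[1:, 1:] = (det / x1) * np.eye(l - 1) + np.outer(x2, x2) / x1
    return M / det
```

For $x = (2, 1, 1) \in {\rm int}(\mathcal{K}^3)$, the product `arrow(x) @ arrow_inv(x)` is the identity up to round-off and all eigenvalues of `arrow(x)` are positive, while for $x = (1, 2, 0) \notin \mathcal{K}^3$ the smallest eigenvalue of `arrow(x)` is negative.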
We recall from [9] that each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$ admits a spectral factorization, associated with $\mathcal{K}^l$, of the form
$$x = \lambda_1(x)\, u_x^{(1)} + \lambda_2(x)\, u_x^{(2)},$$
where $\lambda_i(x)$ and $u_x^{(i)}$ for $i = 1, 2$ are the spectral values and the associated spectral vectors of $x$, respectively, given by
$$\lambda_i(x) = x_1 + (-1)^i \|x_2\|, \quad u_x^{(i)} = \frac{1}{2} \left( 1, \ (-1)^i \bar{x}_2 \right) \tag{15}$$
with $\bar{x}_2 = x_2 / \|x_2\|$ if $x_2 \ne 0$, and otherwise $\bar{x}_2$ being any vector in $\mathbb{R}^{l-1}$ such that $\|\bar{x}_2\| = 1$. If $x_2 \ne 0$, then the factorization is unique. The spectral decompositions of $x$, $x^2$ and $x^{1/2}$ have some basic properties, stated in Property 2.1 below, whose proofs can be found in [9].
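A minimal Python sketch of the spectral factorization (15) is given here for illustration. The function name `spectral` is hypothetical, and when $x_2 = 0$ the code picks the first coordinate direction as $\bar{x}_2$, which is one arbitrary admissible choice.

```python
import numpy as np

def spectral(x):
    """Return ((lambda_1, lambda_2), (u1, u2)) as defined in (15)."""
    n2 = np.linalg.norm(x[1:])
    if n2 > 0:
        xbar = x[1:] / n2
    else:
        # Any unit vector is allowed when x2 = 0; pick the first coordinate axis.
        xbar = np.zeros(len(x) - 1)
        if xbar.size:
            xbar[0] = 1.0
    lam1, lam2 = x[0] - n2, x[0] + n2
    u1 = 0.5 * np.concatenate(([1.0], -xbar))
    u2 = 0.5 * np.concatenate(([1.0], xbar))
    return (lam1, lam2), (u1, u2)
```

For $x = (1, 3, 4)$ this gives $\lambda_1 = -4$ and $\lambda_2 = 6$, so that $\lambda_1 \lambda_2 = \det(x) = -24$ and $\lambda_1 + \lambda_2 = {\rm tr}(x) = 2$.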
Property 2.1 For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$ with the spectral values $\lambda_1(x), \lambda_2(x)$ and spectral vectors $u_x^{(1)}, u_x^{(2)}$ given as above, the following results hold:
(a) $x \in \mathcal{K}^l$ if and only if $\lambda_1(x) \ge 0$, and $x \in {\rm int}(\mathcal{K}^l)$ if and only if $\lambda_1(x) > 0$.
is nonempty and is called the B-subdifferential of $H$ at $z$, where $D_H \subseteq \mathbb{R}^n$ denotes the set of points at which $H$ is differentiable. The convex hull $\partial H(z) := {\rm conv}\, \partial_B H(z)$ is the generalized Jacobian of $H$ at $z$ in the sense of Clarke [5]. For the concepts of (strongly) semismooth functions, please refer to [20, 21] for details. We next present definitions of Cartesian $P$-properties for a matrix $M \in \mathbb{R}^{n \times n}$, which are in fact special cases of those introduced by Chen and Qi [3] for a linear transformation.
Definition 2.1 A matrix $M \in \mathbb{R}^{n \times n}$ is said to have
(a) the Cartesian $P$-property if for any $0 \ne x = (x_1, \ldots, x_m) \in \mathbb{R}^n$ with $x_i \in \mathbb{R}^{n_i}$, there exists an index $\nu \in \{1, 2, \ldots, m\}$ such that $\langle x_\nu, (Mx)_\nu \rangle > 0$;
(b) the Cartesian $P_0$-property if for any $0 \ne x = (x_1, \ldots, x_m) \in \mathbb{R}^n$ with $x_i \in \mathbb{R}^{n_i}$, there exists an index $\nu \in \{1, 2, \ldots, m\}$ such that $x_\nu \ne 0$ and $\langle x_\nu, (Mx)_\nu \rangle \ge 0$.
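Definition 2.1(a) says that the quantity $\max_\nu \langle x_\nu, (Mx)_\nu \rangle$ is positive for every nonzero $x$. The Python sketch below evaluates this quantity and runs a random search for a violating direction. It is only a falsification aid under stated assumptions: the function names, the number of trials and the seed are illustrative, and finding no counterexample does not prove that the property holds.

```python
import numpy as np

def cartesian_p_margin(M, x, dims):
    """max over blocks nu of <x_nu, (M x)_nu> for the block sizes in dims."""
    cuts = np.cumsum(dims)[:-1]
    xb, yb = np.split(x, cuts), np.split(M @ x, cuts)
    return max(float(a @ b) for a, b in zip(xb, yb))

def find_counterexample(M, dims, trials=2000, seed=0):
    """Random search for a unit vector x whose margin is <= 0.

    Returns such an x, or None if the search finds nothing. None is NOT a
    proof of the Cartesian P-property.
    """
    rng = np.random.default_rng(seed)
    n = sum(dims)
    for _ in range(trials):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x)
        if cartesian_p_margin(M, x, dims) <= 0:
            return x
    return None
```

The block structure matters. For $M = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}$ with two blocks of size one the margin is positive whenever $x \ne 0$, whereas with a single block of size two the property reduces to $x^T M x > 0$, which fails at $x = (1, -1)$.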
Some nonlinear generalizations of these concepts in the setting of $\mathcal{K}$ are defined as follows.
Definition 2.2 Given a mapping $F = (F_1, \ldots, F_m)$ with $F_i : \mathbb{R}^n \to \mathbb{R}^{n_i}$, $F$ is said to
(a) have the uniform Cartesian $P$-property if there exists a constant $\rho > 0$ such that, for any $x = (x_1, \ldots, x_m),\ y = (y_1, \ldots, y_m) \in \mathbb{R}^n$, there is an index $\nu \in \{1, 2, \ldots, m\}$ with
$$\langle x_\nu - y_\nu, F_\nu(x) - F_\nu(y) \rangle \ge \rho \|x - y\|^2;$$
(b) have the Cartesian $P_0$-property if for any $x = (x_1, \ldots, x_m),\ y = (y_1, \ldots, y_m) \in \mathbb{R}^n$ with $x \ne y$, there exists an index $\nu \in \{1, 2, \ldots, m\}$ such that
$$x_\nu \ne y_\nu \quad \text{and} \quad \langle x_\nu - y_\nu, F_\nu(x) - F_\nu(y) \rangle \ge 0.$$
3 Properties of the functions φτ and Φτ
First, we study the favorable properties of $\phi_\tau$, including global Lipschitz continuity, strong semismoothness and the characterization of the B-subdifferential at any point.
Proposition 3.1 The function $\phi_\tau$ defined as in (9) has the following properties.
(a) $\phi_\tau$ is (continuously) differentiable at $(x, y)$ if and only if $w(x, y) \in {\rm int}(\mathcal{K}^l)$. Also,
$$\nabla_x \phi_\tau(x, y) = L_{x + \frac{\tau - 2}{2} y} L_z^{-1} - I, \quad \nabla_y \phi_\tau(x, y) = L_{y + \frac{\tau - 2}{2} x} L_z^{-1} - I.$$
(b) $\phi_\tau$ is globally Lipschitz continuous with the Lipschitz constant independent of $\tau$.
(c) $\phi_\tau$ is strongly semismooth at any $(x, y) \in \mathbb{R}^l \times \mathbb{R}^l$.
(d) The squared norm of $\phi_\tau$, i.e. $\psi_\tau$, is continuously differentiable everywhere.
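Proof. (a) The proof directly follows from Lemma 2.1 and the fact that
$$\phi_\tau(x, y) = z(x, y) - (x + y). \tag{23}$$
(b) By (23), it suffices to prove that $z(x, y)$ is globally Lipschitz continuous. Let
$$z(x, y, \epsilon) := \left[ (x - y)^2 + \tau (x \circ y) + \epsilon e \right]^{1/2} \tag{24}$$
for any $\epsilon > 0$ and $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$. Then, applying Lemma A.1 in the appendix and the Mean-Value Theorem, we have
$$\begin{aligned}
\|z(x, y) - z(a, b)\| &= \left\| \lim_{\epsilon \to 0^+} z(x, y, \epsilon) - \lim_{\epsilon \to 0^+} z(a, b, \epsilon) \right\| \\
&\le \lim_{\epsilon \to 0^+} \left\| z(x, y, \epsilon) - z(a, y, \epsilon) + z(a, y, \epsilon) - z(a, b, \epsilon) \right\| \\
&\le \lim_{\epsilon \to 0^+} \left\| \int_0^1 \nabla_x z(a + t(x - a), y, \epsilon)(x - a)\, dt \right\| + \lim_{\epsilon \to 0^+} \left\| \int_0^1 \nabla_y z(a, b + t(y - b), \epsilon)(y - b)\, dt \right\| \\
&\le \sqrt{2}\, C \|(x, y) - (a, b)\|
\end{aligned}$$
for any $(x, y), (a, b) \in \mathbb{R}^l \times \mathbb{R}^l$, where $C > 0$ is a constant independent of $\tau$.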
(c) From the definitions of $\phi_\tau$ and $\phi_{\rm FB}$, it is not hard to check that
$$\phi_\tau(x, y) = \phi_{\rm FB}\left( x + \frac{\tau - 2}{2} y, \ \frac{\sqrt{\tau(4 - \tau)}}{2} y \right) + \frac{1}{2} \left( \tau - 4 + \sqrt{\tau(4 - \tau)} \right) y.$$
Notice that $\phi_{\rm FB}$ is strongly semismooth by [23, Corollary 3.3], and the functions $x + \frac{\tau - 2}{2} y$, $\frac{1}{2} \sqrt{\tau(4 - \tau)}\, y$ and $\frac{1}{2} (\tau - 4 + \sqrt{\tau(4 - \tau)})\, y$ are also strongly semismooth. Therefore, $\phi_\tau$ is a strongly semismooth function since by [11, Theorem 19] the composition of strongly semismooth functions is strongly semismooth.
(d) The proof can be found in Proposition 3.3 of [6]. □
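Two of the statements above lend themselves to a quick numerical sanity check: the derivative formula in Proposition 3.1(a) and the identity relating $\phi_\tau$ to $\phi_{\rm FB}$ used in part (c). The Python sketch below is an illustration only. It assumes $z = [(x - y)^2 + \tau (x \circ y)]^{1/2}$, works with the Jacobian $L_z^{-1} L_{x + \frac{\tau - 2}{2} y} - I$ (the transpose of $\nabla_x \phi_\tau$ under the convention of this paper), and compares it with central finite differences at a point where $z$ lies in the interior of the cone. All function names are hypothetical.

```python
import numpy as np

def jordan(x, y):
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(w):
    n2 = np.linalg.norm(w[1:])
    s1, s2 = np.sqrt(max(w[0] - n2, 0.0)), np.sqrt(max(w[0] + n2, 0.0))
    direction = w[1:] / n2 if n2 > 0 else np.zeros_like(w[1:])
    return np.concatenate(([0.5 * (s1 + s2)], 0.5 * (s2 - s1) * direction))

def arrow(x):
    L = x[0] * np.eye(len(x))
    L[0, 1:] = x[1:]
    L[1:, 0] = x[1:]
    return L

def phi_tau(x, y, tau):
    d = x - y
    return soc_sqrt(jordan(d, d) + tau * jordan(x, y)) - (x + y)

def phi_fb(x, y):
    return soc_sqrt(jordan(x, x) + jordan(y, y)) - (x + y)

def jac_x(x, y, tau):
    """Jacobian of phi_tau in x at a differentiable point: L_z^{-1} L_{x+(tau-2)/2 y} - I."""
    d = x - y
    z = soc_sqrt(jordan(d, d) + tau * jordan(x, y))
    return np.linalg.solve(arrow(z), arrow(x + 0.5 * (tau - 2) * y)) - np.eye(len(x))

def fd_jac_x(x, y, tau, h=1e-6):
    """Central finite-difference Jacobian of phi_tau with respect to x."""
    cols = []
    for j in range(len(x)):
        e = np.zeros(len(x))
        e[j] = h
        cols.append((phi_tau(x + e, y, tau) - phi_tau(x - e, y, tau)) / (2 * h))
    return np.column_stack(cols)

def fb_identity_rhs(x, y, tau):
    """Right-hand side of the identity used in the proof of part (c)."""
    s = np.sqrt(tau * (4 - tau))
    return phi_fb(x + 0.5 * (tau - 2) * y, 0.5 * s * y) + 0.5 * (tau - 4 + s) * y
```

At a test point such as $x = (1, 2, 0)$, $y = (0.5, -1, 1)$ with $\tau = 1$, the vector $(x - y)^2 + \tau (x \circ y) = (8.75, 3, 0)$ is interior, so the analytic and finite-difference Jacobians can be compared directly.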
Proposition 3.1(c) indicates that, when a smoothing or nonsmooth Newton method is used to solve system (10), a fast convergence rate (at least superlinear) may be expected. To develop a semismooth Newton method for the SOCCP, we need to characterize the B-subdifferential $\partial_B \phi_\tau(x, y)$ at a general point $(x, y)$. The B-subdifferential of $\phi_{\rm FB}$ was discussed in [19], and here we generalize that discussion to $\phi_\tau$ for any $\tau \in (0, 4)$. The detailed derivation is included in the appendix for completeness.
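Proposition 3.2 Given a general point $(x, y) \in \mathbb{R}^l \times \mathbb{R}^l$, each element in $\partial_B \phi_\tau(x, y)$ is of the form $V = [V_x - I \ \ V_y - I]$ with $V_x$ and $V_y$ having the following representation:
(a) If $(x - y)^2 + \tau (x \circ y) \in {\rm int}(\mathcal{K}^l)$, then $V_x = L_z^{-1} L_{x + \frac{\tau - 2}{2} y}$ and $V_y = L_z^{-1} L_{y + \frac{\tau - 2}{2} x}$.
(b) If $(x - y)^2 + \tau (x \circ y) \in {\rm bd}(\mathcal{K}^l)$ and $(x, y) \ne (0, 0)$, then
$$\begin{aligned}
V_x &\in \left\{ \frac{1}{2\sqrt{2 w_1}} \begin{pmatrix} 1 & \bar{w}_2^T \\ \bar{w}_2 & 4I - 3 \bar{w}_2 \bar{w}_2^T \end{pmatrix} \left( L_x + \frac{\tau - 2}{2} L_y \right) + \frac{1}{2} \begin{pmatrix} 1 \\ -\bar{w}_2 \end{pmatrix} u^T \right\}, \\
V_y &\in \left\{ \frac{1}{2\sqrt{2 w_1}} \begin{pmatrix} 1 & \bar{w}_2^T \\ \bar{w}_2 & 4I - 3 \bar{w}_2 \bar{w}_2^T \end{pmatrix} \left( L_y + \frac{\tau - 2}{2} L_x \right) + \frac{1}{2} \begin{pmatrix} 1 \\ -\bar{w}_2 \end{pmatrix} v^T \right\}
\end{aligned} \tag{25}$$
for some $u = (u_1, u_2),\ v = (v_1, v_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$ satisfying $|u_1| \le \|u_2\| \le 1$ and $|v_1| \le \|v_2\| \le 1$, where $\bar{w}_2 = w_2 / \|w_2\|$.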
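(c) If $(x, y) = (0, 0)$, then $V_x \in \{L_u\}$, $V_y \in \{L_v\}$ for some $u = (u_1, u_2),\ v = (v_1, v_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$ satisfying $\|u\|, \|v\| \le 1$ and $u_1 v_2 + v_1 u_2 = 0$, or
$$\begin{aligned}
V_x &\in \left\{ \frac{1}{2} \begin{pmatrix} 1 \\ \bar{w}_2 \end{pmatrix} \xi^T + \frac{1}{2} \begin{pmatrix} 1 \\ -\bar{w}_2 \end{pmatrix} u^T + 2 \begin{pmatrix} 0 & 0 \\ (I - \bar{w}_2 \bar{w}_2^T) s_2 & (I - \bar{w}_2 \bar{w}_2^T) s_1 \end{pmatrix} \right\}, \\
V_y &\in \left\{ \frac{1}{2} \begin{pmatrix} 1 \\ \bar{w}_2 \end{pmatrix} \eta^T + \frac{1}{2} \begin{pmatrix} 1 \\ -\bar{w}_2 \end{pmatrix} v^T + 2 \begin{pmatrix} 0 & 0 \\ (I - \bar{w}_2 \bar{w}_2^T) \omega_2 & (I - \bar{w}_2 \bar{w}_2^T) \omega_1 \end{pmatrix} \right\}
\end{aligned} \tag{26}$$
for some $u = (u_1, u_2),\ v = (v_1, v_2),\ \xi = (\xi_1, \xi_2),\ \eta = (\eta_1, \eta_2) \in \mathbb{R} \times \mathbb{R}^{l-1}$ satisfying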