Heinrich-Heine-Universität Düsseldorf – Lehrstuhl für Mathematische Optimierung
http://www.opt.uni-duesseldorf.de/

Interior Methods for Linear and Convex Optimization

Tunis, Nov. 18, 2004
Florian Jarre, [email protected]
Survey
Linear Programming, Simplex Method
Linear Programming, Interior Point Methods
Convex Programming, Interior Point Methods
Semidefinite Optimization
• Theory (Duality, Newton’s Method)
• Sensitivity Analysis
• Applications (Lyapunov stability, Max-Cut Approximation)
PART 0
Brief Introduction to Linear Programming
A Diet Problem

Table 1: Data for the diet problem.

                    carbohydrates   proteins   vitamins   costs
1 unit Company A        20 E          15 E        5 E     10 Euro
1 unit Company B        20 E           3 E       10 E      7 Euro
demand/day              60 E          15 E       20 E
minimize 10x1 + 7x2
subject to 20x1 + 20x2 ≥ 60,
15x1 + 3x2 ≥ 15,
5x1 + 10x2 ≥ 20,
x1 ≥ 0, x2 ≥ 0.
(1)
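As a quick check, the small LP (1) can be handed to an off-the-shelf solver; the following sketch uses scipy.optimize.linprog (our choice of tool, not part of the lecture):

```python
from scipy.optimize import linprog

# Diet problem (1): minimize 10*x1 + 7*x2 subject to the demand constraints.
# linprog expects A_ub @ x <= b_ub, so the ">=" rows are negated.
c = [10, 7]
A_ub = [[-20, -20], [-15, -3], [-5, -10]]
b_ub = [-60, -15, -20]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # optimal x = (0.5, 2.5) with cost 22.5
```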
Figure 1: Feasible set for the diet problem.
Introduction of slack variables

    minimize    10x1 + 7x2
    subject to  20x1 + 20x2 − x3 = 60,
                15x1 + 3x2 − x4 = 15,
                5x1 + 10x2 − x5 = 20,
                xi ≥ 0 (1 ≤ i ≤ 5).        (2)

Crash course on the simplex method. Linear program in standard form:

    minimize    c^T x
    subject to  Ax = b,  x ≥ 0.            (3)
Finding a “vertex” of the feasible set
1. “Vertex” of a polyhedron in IRn: Intersection of n (linearly independent)
linear constraints, so-called active constraints.
2. For A ∈ IRm×n, the m equality constraints given by Ax = b are always
active at any feasible point.
3. “Pick” additional n − m constraints of the form xi ≥ 0, i.e. set n − m
components of x to zero (non-basic variables). The remaining variables
(basic variables) are fixed such that Ax = b holds true. (This is possible
if the columns of A that correspond to the basic variables are linearly
independent).
4. If we pick the non-basic variables in such a way that the basic variables
are nonnegative when solving Ax = b then the basis (i.e. the set of basic
variables) is called feasible.
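The bookkeeping of items 3.-4. can be made concrete for the standard form (2) of the diet problem; the choice of non-basic variables below is ours, for illustration:

```python
import numpy as np

# Standard form (2) of the diet problem: A x = b, x >= 0 with n = 5, m = 3.
A = np.array([[20.0, 20, -1, 0, 0],
              [15.0, 3, 0, -1, 0],
              [5.0, 10, 0, 0, -1]])
b = np.array([60.0, 15, 20])

# Choose x3, x4 as the n - m = 2 non-basic variables (set to zero) and solve
# A_B x_B = b for the basic variables x1, x2, x5 (columns 0, 1, 4).
basic = [0, 1, 4]
x = np.zeros(5)
x[basic] = np.linalg.solve(A[:, basic], b)
print(x)   # basic solution (0.5, 2.5, 0, 0, 7.5): all >= 0, a feasible basis
```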
Key point of the simplex method:

By "intelligent bookkeeping", one can identify all directions (all edges of
the polyhedron) along which the objective function c^T x improves. Moreover,
it is easy to compute the vertex "at the other end" of a given edge. Choosing
such an edge and the vertex at the other end is called a "pivot step" of the
simplex method.

Moving from one vertex to the next along edges that improve c^T x, the
simplex method either finds an optimal vertex or it finds a feasible ray
along which the objective function tends to −∞.

(Some technicalities arise when there happen to be more than n active
constraints at a given vertex: several changes of basic and non-basic
variables at the same vertex, and strategies to avoid "cycling".)
Examples
www.mcs.anl.gov/home/otc/Guide/CaseStudies/diet/index.html
www.mcs.anl.gov/home/otc/Guide/CaseStudies/simplex/index.html
Caution
The polyhedral structure (network of neighboring vertices) may be much more
complicated than one might expect from our 3-dimensional intuition:
- Consider polyhedra for which each vertex is connected to each other vertex by
an edge. For n = 1, 2, 3 such a polyhedron must have at most n + 1 vertices.
For n = 4 it may have 6 vertices, for n = 45 it may have 512 vertices!
- Consider a polyhedron P1 ⊂ IRn and a polyhedron P2 ⊂ IRk for some k < n
such that

    P2 = {x | (x, y) ∈ P1, x ∈ IRk, y ∈ IRn−k}.

If P1 has m vertices, P2 may have 2^{O(m)} vertices. (The projection may have
exponentially many more vertices.)
There are open questions still today, like the conjecture of Hirsch (1959):
"There exists a choice of pivot steps such that the simplex method terminates
after at most n − m steps." (?)

(Unfortunately, the "opposite" is easy to prove: for many commonly used pivot
step selection rules there are examples on which the simplex method takes
2^{O(n)} steps.)

Only in 1979 was a (new) method found that is guaranteed to converge in a
polynomial number of steps (Khachian's proof of convergence for the ellipsoid
method due to Yudin and Nemirovski / Shor). However, this method was very
slow in practical applications.

In 1984, Karmarkar presented another class of polynomial methods that is also
very efficient in practice and will be discussed next:
PART I
Interior Point Methods for Linear Programming
Duality
Figure 2: Optimality conditions for max{b^T y | A^T y ≤ c}.
If y is optimal, then there exist x1, x2 such that

    b = x1 a1 + x2 a2   with   xi ≥ 0.

More generally, there exists x such that

    b = ∑_{i=1}^n xi ai,   x ≥ 0,   and   xi · (ci − ai^T y) = 0.

Rewritten: Find x, y, s such that

    Ψ0(x, y, s) := ( Ax − b,  A^T y + s − c,  Xs ) = (0, 0, 0),   x, s ≥ 0,   (4)

where X = Diag(x1, . . . , xn).
Can we use Newton's method to solve Ψ0(x, y, s) = 0?

    Ψ0(x + ∆x, y + ∆y, s + ∆s) ≈ Ψ0(x, y, s) + DΨ0(x, y, s)(∆x, ∆y, ∆s),

and setting the right-hand side to zero gives

    (∆x, ∆y, ∆s) = −(DΨ0(x, y, s))^{-1} Ψ0(x, y, s),

where

                      ( A     0     0 )
    DΨ0(x, y, s)  =   ( 0     A^T   I ).
                      ( S     0     X )
When A has full row rank (preprocessing!) and when x > 0, s > 0, one can
verify that DΨ0(x, y, s) is invertible.

Problem: The solutions x, s of Ψ0(x, y, s) = 0 contain (many) zero
components. Newton's method typically converges to a "solution" where some
components of x and s are negative.

Therefore: For fixed µ > 0 consider

    Ψµ(x, y, s) := ( Ax − b,  A^T y + s − c,  Xs − µe ) = (0, 0, 0),   x, s > 0.

The solutions (x(µ), y(µ), s(µ)) of this system are called points on the
central path. The central path is well defined for all µ > 0 if there exists
an "interior point" (x, y, s) with Ax = b, A^T y + s = c and x, s > 0.
The derivatives of Ψ0 and Ψµ coincide. If Newton’s method
starts sufficiently close to (x(µ), y(µ), s(µ)), it will converge to
(x(µ), y(µ), s(µ)) (which is a strictly positive solution).
Interior point method:
Fix µ > 0.
1. Do one Newton step to approximate (x(µ), y(µ), s(µ)).
2. Reduce µ > 0 and repeat.
Note:

Points on the central path are also solutions of the following convex
logarithmic barrier problems:

    min { c^T x / µ − ∑_{i=1}^n ln xi  |  Ax = b },

    min { −b^T y / µ − ∑_{i=1}^n ln si  |  A^T y + s = c }.

Using Newton's method to solve one or both of these problems will yield
different search directions (the so-called primal direction or dual
direction).
Convergence:

Define a positive diagonal matrix D by D² := XS^{-1}.

When Ax = b and A^T y + s = c hold, the right-hand side of the Newton system
DΨµ(x, y, s)(∆x, ∆y, ∆s) = −Ψµ(x, y, s) is just (0, 0, µe − Xs).

Let r := Xs − µe and q := DX^{-1}r. Then the solution of the Newton system is
given by

    ∆y = (AD²A^T)^{-1} ADq,
    ∆x = D²A^T ∆y − Dq,
    ∆s = −D^{-1}q − D^{-2}∆x.
To analyze the Newton step we recall that

    ΠR := DA^T (AD²A^T)^{-1} AD

is the orthogonal projection onto

    R = R(DA^T) = {DA^T w | w ∈ IRm}.

Moreover,

    R(DA^T) ⊥ N(AD)   and   R(DA^T) ⊕ N(AD) = IRn,

where N = N(AD) := {y ∈ IRn | ADy = 0} is the null space of AD.
The above definitions imply:

    DA^T ∆y = ΠR q,                                (5)
    ∆x = D(ΠR q − q) = −D ΠN q,                    (6)
    ∆s = −D^{-1}(q + D^{-1}∆x) = −D^{-1} ΠR q.     (7)
Analyze one step, (x, y, s) → (x + ∆x, y + ∆y, s + ∆s). For the new residual
r+ := (X + ∆X)(s + ∆s) − µe we obtain, using X∆s + S∆x = −r,

    r+ = Xs + X∆s + S∆x + ∆X∆s − µe = Xs − r + ∆X∆s − µe = ∆X∆s.

Set

    ∆x̄ := −D^{-1}∆x = ΠN q   and   ∆s̄ := −D∆s = ΠR q,

so that ∆X∆s = ∆X̄∆s̄. Note that Xs = r + µe; let R = Diag(r) and
β := ‖r‖2/µ. Then

    ‖DX^{-1}‖2² = ‖(R + µI)^{-1}‖2 = max_{1≤i≤n} 1/|ri + µ| ≤ 1/(µ(1 − β)),   (8)

which implies

    β̄ := ‖q‖2 = ‖DX^{-1}r‖2 ≤ β √(µ/(1 − β)),                                 (9)
and

    ‖∆x̄‖2 = β̄ cos θ,   ‖∆s̄‖2 = β̄ sin θ,                                     (10)

where θ is the angle between q and ΠN q. From the representation of r+, (9)
and (10), it follows that

    ‖r+‖2 = ‖∆X̄∆s̄‖2 ≤ β̄² cos θ sin θ ≤ β²µ / (2(1 − β)) ≤ µβ²,

where the second-to-last inequality uses |2 cos θ sin θ| = |sin(2θ)| ≤ 1 and
the last one uses β ≤ 1/2. Hence, the relative error ‖Xs − µe‖2/µ = ‖r‖2/µ is
(essentially) squared in each step of Newton's method.

(NOTE: We can give precise bounds, and not just a statement like "there
exists a neighborhood U and a constant M < ∞ such that for any starting point
in U we have ‖x + ∆x − x∗‖ ≤ M ‖x − x∗‖²".)
Short-Step Algorithm.

Input: x0 > 0, y0, s0 > 0 and µ0 > 0 such that Ax0 = b, A^T y0 + s0 = c,
X0 s0 − µ0 e = r0 and ‖r0‖2/µ0 ≤ 1/2. (This means the starting point is
close to the central path.) Let ε > 0 be given. Set k = 0.

1. Do one step of Newton's method for finding (x(µk), y(µk), s(µk)).
   Let xk+1 = xk + ∆x, yk+1 = yk + ∆y and sk+1 = sk + ∆s.
2. Reduce µk to µk+1 := µk (1 − 1/(6√n)).
3. Set k = k + 1.
4. If µk ≤ ε/n, then STOP; else go to Step 1.
Mehrotra's predictor-corrector algorithm

Input: (x0, y0, s0) with (x0, s0) > 0, ε > 0, a bound M ≫ 0. Set k := 0.

1. Set (x, y, s) := (xk, yk, sk), µk := (xk)^T sk / n.
2. If ‖Ax − b‖ < ε, ‖A^T y + s − c‖ < ε and µk < ε: STOP, the iterate is an
   approximate solution of the linear program.
3. If ‖x‖ > M or ‖s‖ > M: STOP, the problem either has no feasible solution
   or is badly conditioned.
4. Solve

    ( A    0    0 ) ( ∆xN )   ( b − Ax        )
    ( 0    A^T  I ) ( ∆yN ) = ( c − A^T y − s ).
    ( S    0    X ) ( ∆sN )   ( −Xs           )
5. Compute the maximum step lengths along ∆xN and ∆sN:

    αxN := min{1, min_{i: ∆xiN < 0} (−xi/∆xiN)},
    αsN := min{1, min_{i: ∆siN < 0} (−si/∆siN)}.

6. Set

    µ+N := (x + αxN ∆xN)^T (s + αsN ∆sN) / n

   and

    µC := µk · (µ+N/µk)³.
7. Solve

    ( A    0    0 ) ( ∆xC )   ( b − Ax               )
    ( 0    A^T  I ) ( ∆yC ) = ( c − A^T y − s        ).
    ( S    0    X ) ( ∆sC )   ( µC e − Xs − ∆XN ∆sN  )
8. Select a damping parameter ηk ∈ [0.8, 1.0) and compute the primal and dual
   step lengths along ∆xC and ∆sC via

    αmax,xC := min_{i: ∆xiC < 0} (−xi/∆xiC),   αmax,sC := min_{i: ∆siC < 0} (−si/∆siC)

   and

    αxC := min{1, ηk αmax,xC},   αsC := min{1, ηk αmax,sC}.

9. Set

    xk+1 := xk + αxC ∆xC,   (yk+1, sk+1) := (yk, sk) + αsC (∆yC, ∆sC),

   as well as k := k + 1, and go to Step 1.
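Steps 1-9 can be sketched as follows (a numpy sketch using the usual normal-equations elimination of ∆s and ∆x; the fixed damping ηk = 0.9 and the test data are our simplifications):

```python
import numpy as np

def mehrotra(A, b, c, x, y, s, eps=1e-8, bound=1e8, max_iter=100):
    """A sketch of Mehrotra's predictor-corrector method (Steps 1-9)."""
    n = A.shape[1]

    def max_step(v, dv):                 # largest alpha with v + alpha*dv >= 0
        neg = dv < 0
        return (-v[neg] / dv[neg]).min() if neg.any() else np.inf

    for _ in range(max_iter):
        mu = x @ s / n                                        # Step 1
        rp, rd = b - A @ x, c - A.T @ y - s
        if max(np.linalg.norm(rp), np.linalg.norm(rd), mu) < eps:
            break                                             # Step 2
        if np.linalg.norm(x) > bound or np.linalg.norm(s) > bound:
            raise RuntimeError("no feasible solution or badly conditioned")

        D2 = x / s
        K = (A * D2) @ A.T               # normal equations matrix A D^2 A^T

        def solve(rc):                   # eliminate Delta-s and Delta-x
            dy = np.linalg.solve(K, rp + A @ (D2 * rd - rc / s))
            ds = rd - A.T @ dy
            return (rc - x * ds) / s, dy, ds

        dxN, dyN, dsN = solve(-x * s)                         # Step 4
        axN = min(1.0, max_step(x, dxN))                      # Step 5
        asN = min(1.0, max_step(s, dsN))
        mu_plus = (x + axN * dxN) @ (s + asN * dsN) / n       # Step 6
        muC = mu * (mu_plus / mu) ** 3
        dx, dy, ds = solve(muC - x * s - dxN * dsN)           # Step 7
        ax = min(1.0, 0.9 * max_step(x, dx))                  # Step 8
        a_s = min(1.0, 0.9 * max_step(s, ds))
        x, y, s = x + ax * dx, y + a_s * dy, s + a_s * ds     # Step 9
    return x, y, s

# Diet problem in standard form (2), started from the infeasible point e:
A = np.array([[20.0, 20, -1, 0, 0], [15, 3, 0, -1, 0], [5, 10, 0, 0, -1]])
b = np.array([60.0, 15, 20])
c = np.array([10.0, 7, 0, 0, 0])
x, y, s = mehrotra(A, b, c, np.ones(5), np.zeros(3), np.ones(5))
print(c @ x)   # close to 22.5, the optimal cost of the diet problem
```

Note that, unlike the short-step method, the starting point need not be feasible: the residuals rp and rd are simply carried along in the right-hand sides.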
PART II
Interior Point Methods for Convex Programming
Basic problem

    minimize f0(x)   s.t.   x ∈ IRn :  fi(x) ≤ 0 for 1 ≤ i ≤ m.

Most of the following results are due to Nesterov and Nemirovskii: Interior
Point Polynomial Methods in Convex Programming, SIAM 1994.

Assumptions

• The fi are three times continuously differentiable.
• f0 and −∑_{i≥1} ln(−fi) are convex. (Always satisfied if all fi are convex.)
Simplifications

• f0 is linear, f0(x) ≡ c^T x. (Without loss of generality!)
• The feasible set

      S := {x | fi(x) ≤ 0 for 1 ≤ i ≤ m}

  is bounded and has nonempty interior. (Some restriction of generality.)
Logarithmic barrier method

    φ(x) := −∑_{i=1}^m ln(−fi(x))

is the logarithmic barrier function for S, and for some fixed θ ≥ 1 and
λ > min{c^T x | x ∈ S},

    ϕ(x, λ) := −θ ln(λ − c^T x) + φ(x)

is a barrier function for S(λ) := S ∩ {x | c^T x ≤ λ}.

The minimizer x of φ is called the analytic center of S.
The minimizer x(λ) of ϕ( · , λ) is called the analytic center of S(λ).
Method of analytic centers (Huard, Sonnevend, Renegar):

Let x0 ∈ S◦ and λ0 > c^T x0 be given. Set σ = 1/2 and k = 0.

Repeat
1.) Reduce λk to λk+1 := λk − σ(λk − c^T xk).
2.) Starting at xk, do a few steps of Newton's method for minimizing
    ϕ( · , λk+1).
3.) Set k := k + 1.
End.

Crucial questions:
1. How fast does Newton's method converge?
2. How big is λk − c^T xk compared to the unknown distance c^T xk − λopt?
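The loop above can be sketched as follows (a dense numpy implementation for linear constraints Ax ≤ b; the backtracking on ϕ and the stopping rule are our own safeguards, not part of the slides):

```python
import numpy as np

def method_of_centers(A, b, c, x, lam, theta=1.0, sigma=0.5, iters=80):
    """A numpy sketch of the method of centers for min c^T x s.t. Ax <= b."""
    def phi(z, lam):                       # barrier for S(lam)
        d, t = b - A @ z, lam - c @ z
        if np.any(d <= 0) or t <= 0:
            return np.inf
        return -theta * np.log(t) - np.log(d).sum()

    for _ in range(iters):
        if lam - c @ x < 1e-8:             # error bound small enough: stop
            break
        lam = lam - sigma * (lam - c @ x)  # 1.) reduce lambda
        for _ in range(8):                 # 2.) a few Newton steps
            d, t = b - A @ x, lam - c @ x
            g = A.T @ (1.0 / d) + theta * c / t          # gradient of phi
            H = (A.T / d ** 2) @ A + theta * np.outer(c, c) / t ** 2
            dx = np.linalg.solve(H, -g)
            alpha = 1.0                    # damping keeps x in S(lam)
            while phi(x + alpha * dx, lam) > phi(x, lam):
                alpha *= 0.5
            x = x + alpha * dx
    return x, lam

# Diet problem (1) written as Ax <= b with x >= 0 folded into A and b:
A = np.array([[-20.0, -20], [-15, -3], [-5, -10], [-1, 0], [0, -1]])
b = np.array([-60.0, -15, -20, 0, 0])
c = np.array([10.0, 7])
x, lam = method_of_centers(A, b, c, np.array([4.0, 4.0]), 100.0)
print(c @ x)   # close to the optimal value 22.5
```

The quantity λk − c^T xk serves as the computable error bound from question 2.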
Distance to optimality
First question:

• For simplicity, we restrict the examination to φ – and apply the results
  for Newton's method later to ϕ, which has the same structure!
• Intuitively, we want ∇²φ to be "nearly constant" for Newton's method to
  work well.
• In fact, we want the relative change of ∇²φ to be small.
• The absolute change of ∇²φ is given by ∇³φ.
• Hence we want ∇³φ to be small compared to ∇²φ.
• Look at "Karmarkar's barrier function" for the positive real axis:
  φ(t) := −ln t. Here, |φ′′′(t)| ≤ 2 (φ′′(t))^{3/2}.
• Generalize this to n dimensions:
Self-concordance

The barrier function φ : S◦ → IR is called (strongly) self-concordant if for
any x ∈ S◦ and any h ∈ IRn the restriction l = l_{x,h} of φ to the line
x + th,

    l(t) := φ(x + th),

satisfies |l′′′(0)| ≤ 2 (l′′(0))^{3/2}.

(This assumes l is C³-smooth and l′′ ≥ 0, i.e. l is convex.)
(The power 3/2 guarantees invariance w.r.t. the length of h!)
Does this make sense?

I.) Do there exist functions that satisfy this condition?

1. Easy to verify (binomial formula): if φ1 and φ2 are self-concordant, then
   so is φ1 + φ2 (provided the domains of φ1 and φ2 intersect).

2. Very easy to verify: if A is an affine mapping and φ is self-concordant,
   then so is φ(A( · )) (provided the range of A intersects the domain of φ).

3. This implies that −∑_i ln(bi − ai^T x) is a self-concordant barrier
   function for the polyhedron {x | ai^T x ≤ bi}.
4. Convex quadratic constraints q(x) ≤ 0:
For fixed x with q(x) < 0 and fixed h, the term q(x + th) can
be factored into the product of two linear terms, so that by 1.
and 2., the function l(t) = − ln(−q(x+th)) is self-concordant!
5. Semidefinite constraints X ⪰ 0 (X = X^T ∈ IRn×n):

   The barrier function φ(X) := −ln det(X) is self-concordant: for fixed
   X ≻ 0 and fixed H = H^T ∈ IRn×n it follows that

       l(t) = −ln det(X + tH)
            = −ln det(X^{1/2}(I + tX^{-1/2}HX^{-1/2})X^{1/2})
            = −2 ln det(X^{1/2}) − ln ∏_{i=1}^n (1 + tλi)
            = −ln det(X) − ∑_{i=1}^n ln(1 + tλi),

   where the λi are the eigenvalues of X^{-1/2}HX^{-1/2}, independent of t.
   Again, by 1. and 2. it follows that φ is self-concordant. Note that

       l′(0) = −trace(X^{-1/2}HX^{-1/2}) = −trace(HX^{-1}) =: −H • X^{-1}.
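The derivation can be checked numerically: from l(t) = −ln det(X) − ∑ ln(1 + tλi) one gets l′′(0) = ∑λi² and l′′′(0) = −2∑λi³, and |∑λi³| ≤ (∑λi²)^{3/2} gives self-concordance. A sanity check with random data (our own test, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)                 # a random positive definite X
H = rng.standard_normal((n, n))
H = H + H.T                                 # a random symmetric H

w, U = np.linalg.eigh(X)
Xmh = U @ np.diag(w ** -0.5) @ U.T          # X^{-1/2}
lam = np.linalg.eigvalsh(Xmh @ H @ Xmh)     # the eigenvalues lambda_i

l2 = np.sum(lam ** 2)                       # l''(0)  =    sum lambda_i^2
l3 = -2.0 * np.sum(lam ** 3)                # l'''(0) = -2 sum lambda_i^3
print(abs(l3) <= 2.0 * l2 ** 1.5)           # True: self-concordance at t = 0

# cross-check l''(0) against finite differences of l(t) = -ln det(X + tH)
l = lambda t: -np.linalg.slogdet(X + t * H)[1]
print(abs((l(1e-4) - 2 * l(0.0) + l(-1e-4)) / 1e-8 - l2) < 1e-4)  # True
```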
Moreover, let symmetric matrices A(0), . . . , A(m) be given and consider the
problem

    minimize b^T y   s.t.   A(y) := A(0) + ∑_{i=1}^m yi A(i) ⪰ 0.

By affine invariance, the function Φ(y) := −ln det(A(y)) is also
self-concordant. Moreover, the derivatives of Φ can be stated explicitly:

    ∂Φ(y)/∂yi = −A(y)^{-1} • A(i),

    ∂²Φ(y)/(∂yi ∂yj) = A(y)^{-1} A(i) A(y)^{-1} • A(j).
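The gradient formula for Φ can likewise be verified against finite differences (random symmetric test matrices of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A0 = 10.0 * np.eye(n)                       # keeps A(y) positive definite
Ai = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    Ai.append(M + M.T)                      # random symmetric A(i)
y = 0.1 * rng.standard_normal(m)

Amap = lambda v: A0 + sum(vi * M for vi, M in zip(v, Ai))
Phi = lambda v: -np.linalg.slogdet(Amap(v))[1]

# gradient formula: dPhi/dy_i = -A(y)^{-1} . A(i) (elementwise sum works
# because both matrices are symmetric, so trace(PQ) = sum_ij P_ij Q_ij)
Ainv = np.linalg.inv(Amap(y))
grad = np.array([-(Ainv * M).sum() for M in Ai])

h = 1e-6
fd = np.array([(Phi(y + h * e) - Phi(y - h * e)) / (2 * h) for e in np.eye(m)])
print(np.allclose(grad, fd, atol=1e-5))     # True
```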
II.) Does this condition really guarantee that Newton's method converges
well?

Since |l′′′(0)| ≤ 2 l′′(0)^{3/2} holds for any x (defining l), this in fact
implies |l′′′(t)| ≤ 2 l′′(t)^{3/2} for any t in the domain of l.

Let u(t) := l′′(t); then u′(t) ≤ 2 u(t)^{3/2} is a valid differential
inequality. The extremal solution of v′(t) = 2 v(t)^{3/2} with initial value

    v(0) = u(0) = h^T ∇²φ(x) h =: δ²

is given by v(t) = 1/(δ^{-1} − t)².

Whenever v has finite values, u must be finite, and hence so must be l.
This implies the following lemma:

Inner Ellipsoid

Let E(x) := {h | h^T ∇²φ(x) h ≤ 1} be the unit ball of the (semi-)norm given
by the Hessian Hx := ∇²φ(x) of φ at x. (The (semi-)norm is
‖h‖_{Hx} := √(h^T Hx h).)

Then, for any x ∈ S◦ the inclusion x + E(x) ⊂ S holds true.
Inner ellipsoid
Further Results

Equivalent relative Lipschitz condition:

    |h^T (∇²φ(x + ∆x) − ∇²φ(x)) h| ≤ δ M(δ) h^T ∇²φ(x) h,

where

    δ := ‖∆x‖_{Hx}   and   M(δ) := 2/(1 − δ) + δ/(1 − δ)² = 2 + O(δ).

(Somewhat more difficult to show.)
Newton's method

Let x ∈ S◦ be given and denote Hx := ∇²φ(x). Let ∆x := −Hx^{-1} ∇φ(x) be the
Newton step for minimizing φ starting at x ∈ S◦, and assume that
δ := ‖∆x‖_{Hx} < 1. Then, by the inner ellipsoid, x̄ := x + ∆x ∈ S◦.

Let ∆x̄ := −Hx̄^{-1} ∇φ(x̄) be the "next" Newton step. Then

    ‖∆x̄‖_{Hx̄} ≤ δ² / (1 − δ)².

This implies quadratic convergence in at least one fifth of the inner
ellipsoid about the center. (Related idea of proof as for the inner
ellipsoid.)
The second question, distance to optimality:

Self-concordance is a (relative) Lipschitz condition on the Hessian of φ.

• By adding a linear perturbation to φ, the self-concordance condition
  obviously does not change.
• But by adding a linear perturbation to φ we can make any point x ∈ S◦ a
  minimizer of the perturbed function.
• For x close to the boundary, the perturbation will have to be large, and
  the perturbed function will have a "large gradient" at the minimizer of φ.
• If we want to prevent points close to the boundary from being minimizers
  of "our" barrier function, we may limit the norm of its gradient – of
  course with respect to the canonical norm ‖ · ‖_{Hx}.
With the notation used in the self-concordance condition, we require for some
fixed θ ≥ 1:

    |l′(0)| ≤ √θ · l′′(0)^{1/2}.

If φ is self-concordant and satisfies the above condition, we say φ is
θ-self-concordant. This condition is also affine invariant.

It is "additive" w.r.t. θ, in the sense that if φ1 and φ2 satisfy the
condition with values θ1 ≥ 1 and θ2 ≥ 1, then so does φ1 + φ2 with value
θ1 + θ2.

The previous examples satisfy the condition with θ = 1 for a linear or convex
quadratic constraint, and θ = l for an l × l semidefinite constraint.
Results

1. We have λ − c^T x(λ) > c^T x(λ) − λopt when θ in the definition of ϕ is
   chosen at least as large as the self-concordance parameter of φ.
   ("Identical" proof as for the inner ellipsoid, just the other way round.)

2. Let x ∈ S◦ be arbitrary and let φ be a θ-self-concordant barrier function
   for S. Let H := {y | (y − x)^T ∇φ(x) ≥ 0} be a half space cutting through
   x and let E(x) be the inner ellipsoid. Then

       S ∩ H ⊂ x + (θ + 2√θ) E(x).

   (H = IRn when ∇φ(x) = 0.) Again, similar proof as for the inner ellipsoid.
Inner and outer ellipsoid
• Now we have all the essential tools to show that the method of centers
  converges at a fixed rate.
• If λk is changed only a little at each iteration (namely σ = 1/(8√θ)
  rather than σ = 1/2), then only one step of Newton's method suffices per
  iteration, and after 12√θ iterations the unknown distance λk − λopt is
  reduced by a factor of at least 1/2.
Discussion

The given rate of convergence (for a problem with 10000 convex quadratic
constraints, 1200 iterations are needed to reduce the error bound λk − λopt
by a factor of 1/2) is too slow for practical implementations.

BUT it guarantees a very weak dependence on the data of the problem: the rate
depends only on a weighted number of constraints, where complicated
constraints such as semidefiniteness constraints are counted with a somewhat
higher weight.
• No dependence on the number of unknowns.
  (Generalization to Hilbert space by Renegar.)
• Assuming exact arithmetic – unlike the conjugate gradient or steepest
  descent methods – no dependence on any condition numbers of the problem.
• Assuming exact arithmetic – unlike the simplex method – no dependence on
  degeneracy.

Hence, the CONCEPT is very robust.
Find an acceleration based on this concept.
Modifications

• Infeasible starting points, empty interior, unbounded set of optimal
  solutions.
• Predictor-corrector strategy: Under "mild" conditions, the central path
  x(λ) forms a smooth curve leading to an optimal solution.
  Through any given point x ∈ S◦ one can define a perturbed central path
  x̄(λ) leading to an optimal solution as well, and the tangent to this curve
  is "easily" computable. (Same system as used for Newton's method.)
  Do some extrapolation along this tangent and start the Newton corrections
  from the extrapolated point.
Conic formulations

• Each convex program that possesses a self-concordant barrier function can
  be expressed in conic form with a self-concordant barrier function of the
  same order of magnitude. (Nesterov and Nemirovskii 1994; Freund and J.
  1999, "optimal barrier".)
• Conic formulations allow for primal-dual methods, which have turned out to
  be more efficient in practical implementations. (Their theoretical
  complexity is the same as that of the method of centers.)
• Many programs (like semidefinite programs) are naturally given in conic
  form and thus allow for direct application of primal-dual methods.
PART III
Semidefinite Programming
Notation

Sn: the space of symmetric n × n matrices.
X ⪰ 0 (X ≻ 0): X ∈ Sn is positive semidefinite (positive definite).

Standard scalar product on the space of n × n matrices:

    〈C, X〉 := C • X := trace(C^T X) = ∑_{i,j} Ci,j Xi,j,

inducing the Frobenius norm, X • X = ‖X‖F².
Notation (continued)

For given symmetric matrices A(i), a linear map A from Sn to IRm is given by

    A(X) = ( A(1) • X, . . . , A(m) • X )^T.

The adjoint operator A∗ satisfying

    〈A∗(y), X〉 = y^T A(X)   for all X ∈ Sn, y ∈ IRm,

is given by

    A∗(y) = ∑_{i=1}^m yi A(i).
Linear Semidefinite Programs

    minimize C • X   where   A(X) = b,  X ⪰ 0.

Similar structure as linear programs; only the condition "x ≥ 0
(componentwise)" is replaced by the condition "X ⪰ 0 (semidefinite)".

Can be solved by the (accelerated!) method of centers from Part II – or by
specialized primal-dual methods.
Basic Theory

If there exists X ≻ 0 with A(X) = b (strict feasibility), then

    (P)  inf { C • X  |  A(X) = b,  X ⪰ 0 }
       = (D)  sup { b^T y  |  A∗(y) + S = C,  S ⪰ 0 }.

If (P) and (D) have strictly feasible solutions, then the optimal solutions
Xopt and yopt, Sopt of both problems exist and satisfy the equation

    Xopt Sopt = 0.
Basic Theory (continued)

Conversely, any pair X and y, S of feasible points for (P) and (D) satisfying

    A(X) = b,          X ⪰ 0,
    A∗(y) + S = C,     S ⪰ 0,
    XS = 0  (or SX = 0)

is optimal for both problems.
For Newton's method one symmetrizes the last equation (Monteiro et al.) and
perturbs it for some small µ > 0: replace XS = 0 by M(X, S) = µI with, e.g.,

    M(X, S) = (1/2)(XS + SX)   (AHO).
Basic Theory (continued)

For example (AHO), the linearization of

    A(X) = b,
    A∗(y) + S = C,
    XS + SX = 2µI

yields the linear system for ∆X, ∆y, ∆S:

    A(∆X) = b − A(X),
    A∗(∆y) + ∆S = C − A∗(y) − S,
    X∆S + ∆X S + S∆X + ∆S X = 2µI − XS − SX.
Summarizing, linear SDPs are well analyzed, and there exists numerically
efficient and polynomial public-domain software for linear semidefinite
programs, e.g. SeDuMi by Jos Sturm.
Example

In our first example we consider the differential equation

    ẋ(t) = Ax(t)   with initial value   x(0) = x⁰.

The matrix A is called stable if for all initial values x(0) = x⁰ the
solutions x(t) converge to zero as t → ∞.

It is well known that this is the case if, and only if, the real parts of all
eigenvalues of A are negative,

    Re(λi(A)) < 0   for 1 ≤ i ≤ n.
By Lyapunov's theorem, this is the case if, and only if,

    ∃ P ≻ 0 :  A^T P + PA ≺ 0.

Motivation:

    (d/dt) ‖x(t)‖P² = (d/dt) x(t)^T P x(t)
                    = ẋ(t)^T P x(t) + x(t)^T P ẋ(t)
                    = (Ax(t))^T P x(t) + x(t)^T P A x(t)
                    = x(t)^T (A^T P + PA) x(t)
                    < 0

for all x(t) ≠ 0; hence, the P-norm of x(t) is strictly monotonically
decreasing as t → ∞.
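For a single known matrix A, the Lyapunov condition can be checked directly by solving a Lyapunov equation; the sketch below uses scipy.linalg.solve_continuous_lyapunov with a test matrix of our own:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 2.0], [-2.0, -1.0]])   # eigenvalues -1 +/- 2i: stable

# Solve A^T P + P A = -I; by Lyapunov's theorem, A is stable
# if and only if the (unique) solution P is positive definite.
P = solve_continuous_lyapunov(A.T, -np.eye(2))
print(np.linalg.eigvalsh(P))                # here P = I/2, eigenvalues 0.5
```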
If the matrices Pi, 1 ≤ i ≤ n(n + 1)/2, form a basis of Sn, the determination
of a matrix P = ∑_i yi Pi with P ≻ 0 and A^T P + PA ≺ 0 leads to the linear
semidefinite program

    min { λ  |  ∑ yi Pi ⪰ 0,   λI − ∑ yi (A^T Pi + Pi A) ⪰ 0 },

asking whether the optimal value is < 0; this is a linear semidefinite
program in the "dual" form. (In this form it is typically unbounded.)

(There are cheaper ways of checking numerically that the real parts of all
eigenvalues of A are negative!)
Now consider the system

    ẋ(t) = A(t)x(t),   (∗)

where the matrix A(t) is not known explicitly. (For example, when there are
small unknown perturbations of the matrix A.)

If there exist matrices A(i), i = 1, 2, . . . , K, with

    A(t) ∈ conv({A(i)}_{1≤i≤K})   for all t ≥ 0,

then the existence of a Lyapunov matrix P ≻ 0 with

    (A(i))^T P + P A(i) ≺ 0   for 1 ≤ i ≤ K

is a sufficient condition for the stability of (∗), because then

    A(t)^T P + PA(t) ≺ 0   and   (d/dt) ‖x(t)‖P² = x(t)^T (A(t)^T P + PA(t)) x(t) < 0.
Just a brief reminder...

The condition that the real parts of all eigenvalues of all A(i) are negative
is necessary but not sufficient: even when the real parts of all eigenvalues
of two matrices A and B are negative, this may fail for (1/2)(A + B). Choose
e.g.

    A = ( −1   4 )         B = ( −1   0 )
        (  0  −1 ),            (  4  −1 ).

(All eigenvalues are real and equal to −1, but (1/2)(A + B) has the
eigenvalue +1.)
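The counterexample is easy to confirm numerically:

```python
import numpy as np

A = np.array([[-1.0, 4.0], [0.0, -1.0]])
B = np.array([[-1.0, 0.0], [4.0, -1.0]])
print(np.linalg.eigvals(A).real)              # [-1. -1.]: A is stable
print(np.linalg.eigvals(0.5 * (A + B)).real)  # approx. 1 and -3: unstable
```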
A sensitivity result for linear SDPs

Uniqueness assumption:

The data D of a pair (P) and (D) of primal and dual linear semidefinite
programs is

    D = [A, b, C]   with   A : Sn → IRm,  b ∈ IRm,  C ∈ Sn.

Assume that (P) and (D) satisfy Slater's condition, and that X ∈ Sn and
y ∈ IRm, S ∈ Sn are unique and strictly complementary solutions of (P) and
(D), that is,

    A(X) = b,  X ⪰ 0,
    A∗(y) + S = C,  S ⪰ 0,
    XS = 0,  X + S ≻ 0.
Theorem (Freund & J. 2003)

If the data of (P) and (D) is changed by sufficiently small perturbations
∆D = [∆A, ∆b, ∆C], then the optimal solutions X(D), y(D), S(D) of the
semidefinite programs are differentiable functions of the perturbations, i.e.

    X(D + ∆D) = X(D) + DDX[∆D] + o(‖∆D‖).

Furthermore, the derivatives

    Ẋ := DDX[∆D],   ẏ := DDy[∆D],   and   Ṡ := DDS[∆D]

of the solutions X(D), y(D), S(D) satisfy

    A(Ẋ) = ∆b − ∆A(X),
    A∗(ẏ) + Ṡ = ∆C − ∆A∗(y),
    ẊS + XṠ = 0.
Idea of Proof:
1) Slater and continuity:
The perturbed problem must have a solution.
2) Subtract optimality conditions and take limit:
We get precisely the statement of the theorem.
3) It remains to be shown that this system is “nonsingular”.
(It is an overdetermined system just by the number of
equations and unknowns.)
Idea of Proof (continued):

By complementarity, XS = 0 = SX, and thus the matrices X ⪰ 0 and S ⪰ 0
commute. This guarantees that there exist a unitary matrix U and diagonal
matrices

    Λ = Diag(λ1, λ2, . . . , λn) ⪰ 0   and   Σ = Diag(σ1, σ2, . . . , σn) ⪰ 0

such that

    X = UΛU^T   and   S = UΣU^T.

By strict complementarity we may partition the spectra such that

    λ1, λ2, . . . , λk > 0   and   σk+1, σk+2, . . . , σn > 0.

Transform so that, without loss of generality, X = Λ and S = Σ, and consider
the upper triangular part

    Πup(∆X Σ + Λ ∆S) = 0,

along with A(∆X) = 0 and A∗(∆y) + ∆S = 0.
Idea of Proof: (continued II)
Using the structure of the equation and uniqueness of the optimal solution
shows that this system has only the zero solution.
Idea of Proof (last):

4) Implicit function theorem ... (Done)

The key was to identify a nonsingular part of the overdetermined system, one
that can also be used numerically. The proof does not use the central path or
interior-point techniques.

5) Upper semicontinuity of optimal solutions for more general cone programs
was established by Robinson (1982). Here we consider special (linear
semidefinite) cone programs. There is a simple example showing that this
theorem does not hold for strictly complementary solutions of more general
cone programs:
Example
Maximize x1 subject to these (infinitely many) linear constraints, and add
the redundant constraint x1 ≤ 1. (All other constraints are "facet
defining".)

Then the optimal solution is unique and strictly complementary, but the only
active constraint is the redundant constraint x1 ≤ 1.

If the objective gradient (1, 0)^T is changed a bit, the optimal solution
jumps between the "vertices" close to (1, 0)^T. In particular, it is not
differentiable.

From this set, form a closed convex cone in IR3 to obtain a conic program.
Corollary

Any step X + tẊ for t ≠ 0 is (typically) infeasible in the sense that
X + tẊ is not positive semidefinite. In some applications the following
formula for the second directional derivative Ẍ := (1/2) D²DX(D)[∆D, ∆D] may
be useful:

    A(Ẍ) = −∆A(Ẋ),
    A∗(ÿ) + S̈ = −∆A∗(ẏ),
    ẌS + XS̈ = −ẊṠ.

This is the same system matrix as for the first derivative, with a different
right-hand side.
Note:

When X = UΛU^T, where the diagonal matrix Λ has a leading nonzero diagonal
block Λ1 as in the preceding proof, then Ẋ has the structure

    Ẋ = U ( A    B ) U^T,
          ( B^T  0 )

and Ẍ has the structure

    Ẍ = U ( ∗    ∗             ) U^T.
          ( ∗    B^T Λ1^{-1} B )

Setting ∗ = 0 yields a minimum-norm second-order correction towards the
positive semidefinite cone, maintaining the multiplicity of the zero
eigenvalue (up to third-order terms).
A combinatorial Application (Max-Cut)
From the set V of vertices of a graph select a subset V1 such that
the number (or the total weight) of the edges from V1 to V \V1 is
maximized.
Defining a vector x with components xi = 1 if i ∈ V1 and xi = −1 if i ∉ V1,
one can construct a symmetric matrix Q from the edges of the graph such that
the max-cut problem is equivalent to solving the following binary quadratic
program:

    minimize x^T Q x   s.t.   x ∈ {−1, 1}n.
Note: For x ∈ {−1, 1}n we have

    x^T Q x = trace(x^T Q x) = trace(Q x x^T) = trace(QX) = Q • X

with X = x x^T satisfying

    X ⪰ 0,   rank(X) = 1,   Diag(X) = e := (1, 1, . . . , 1)^T.
Relaxation

X = x x^T with x ∈ {±1}n if and only if

    X ⪰ 0,
    rank(X) = 1,   ←− omit
    Diag(X) = e := (1, 1, . . . , 1)^T.

Relaxed problem:

    min_{X ⪰ 0} { Q • X  |  Diag(X) = e }.

Linear objective function, linear constraints, semidefiniteness condition.
"Very good approximation to (BQP)": Goemans and Williamson (1995).
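Given a solution X of the relaxed problem (from an SDP solver such as SeDuMi), the Goemans-Williamson scheme rounds via a Gram factorization X = V^T V and random hyperplanes. A sketch (the helper name gw_round is ours; for illustration we round the feasible point X = I, which amounts to uniform random ±1 signs):

```python
import numpy as np

def gw_round(Q, X, trials=500, seed=0):
    """Random-hyperplane rounding of a feasible X of the relaxed problem."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.clip(w, 0.0, None))).T    # X = V^T V, columns v_i
    best_x, best_val = None, np.inf
    for _ in range(trials):
        r = rng.standard_normal(V.shape[0])       # random hyperplane normal
        x = np.sign(V.T @ r)                      # x_i = sign(v_i^T r)
        x[x == 0] = 1.0
        val = x @ Q @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# 5-cycle with Q = adjacency matrix: min x^T Q x over {-1,1}^5 is -6
# (the maximum cut of C5 has 4 edges). Round the feasible point X = I.
Q = np.zeros((5, 5))
for i in range(5):
    Q[i, (i + 1) % 5] = Q[(i + 1) % 5, i] = 1.0
x, val = gw_round(Q, np.eye(5))
print(val)   # best sampled cut value; -6 is optimal here
```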
The SDP approximation uses

    MC := conv({ x x^T | x ∈ {−1, 1}n })
        = conv({ X ⪰ 0 | rank(X) = 1, Diag(X) = e })
        = conv({ X ⪰ 0 | rank(X) = 1 } ∩ { X | Diag(X) = e })
        ⊂ conv({ X ⪰ 0 | rank(X) = 1 }) ∩ { X | Diag(X) = e }
        = { X ⪰ 0 | Diag(X) = e } =: SDP.

Omitting the rank condition means ignoring (n² − n)/2 nonlinear equations.
(There is no reason to believe this ignorance will be very helpful.)

The above interpretation shows:

1.) Instead of minimizing a linear objective over some set of isolated
    points, one can equivalently minimize over the convex hull. (Huge
    increase of the feasible set.)
2.) In this situation, the deletion of the rank condition corresponds to
    some form of Lagrange relaxation.
Goemans and Williamson (1995), Nesterov (1998):

By sin[X] (with square brackets) we denote the componentwise application of
the sin function to the matrix X. Then the following inner approximation of
the max-cut polytope MC holds true:

    TA := { X ⪰ 0 | sin[(π/2) X] ⪰ 0, Diag(X) = e } ⊂ MC,

and the extremal points of the trigonometric approximation TA and of MC
coincide.
Hirschfeld and J. (2003):

All one- and two-dimensional boundary manifolds of MC are contained in TA.
(In particular, for n = 3 we have MC = TA.)

Consequence: Each locally optimal vertex of TA is also globally optimal over
TA as well as over MC.

("Local minimization of a linear function over TA is (probably)
NP-complete.")

The set TA is explicitly given by two analytic semidefiniteness constraints;
it is star-shaped, contains a ball of radius 1, and is contained in a ball of
radius n. Local minimization over TA is the solution of a nonlinear
semidefinite program.

But the constraint qualification is violated: the tangential cone at any
vertex has a nonempty interior, but the interior of the tangential cone at
midpoints of certain edges is empty.
The sets MC and TA