Heinrich-Heine-Universität Düsseldorf – Lehrstuhl für Mathematische Optimierung
http://www.opt.uni-duesseldorf.de/

Interior Methods for Linear and Convex Optimization

Tunis, Nov. 18, 2004
Florian Jarre, [email protected]
Survey
Linear Programming, Simplex Method
Linear Programming, Interior Point Methods
Convex Programming, Interior Point Methods
Semidefinite Optimization
• Theory (Duality, Newton’s Method)
• Sensitivity Analysis
• Applications (Lyapunov stability, Max-Cut Approximation)
PART 0
Brief Introduction to Linear Programming
A Diet Problem

Table 1: Data for the diet problem.

                    carbohydrates   proteins   vitamins   costs
1 unit Company A        20 E          15 E        5 E     10 Euro
1 unit Company B        20 E           3 E       10 E      7 Euro
demand/day              60 E          15 E       20 E
minimize 10x1 + 7x2
subject to 20x1 + 20x2 ≥ 60,
15x1 + 3x2 ≥ 15,
5x1 + 10x2 ≥ 20,
x1 ≥ 0, x2 ≥ 0.
(1)
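As a quick check, the small LP (1) can be handed to an off-the-shelf solver; the following sketch uses scipy.optimize.linprog (our choice of tool, not part of the lecture):

```python
from scipy.optimize import linprog

# Diet problem (1): minimize 10*x1 + 7*x2 subject to the demand constraints.
# linprog expects A_ub @ x <= b_ub, so the ">=" rows are negated.
c = [10, 7]
A_ub = [[-20, -20], [-15, -3], [-5, -10]]
b_ub = [-60, -15, -20]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # optimal x = (0.5, 2.5) with cost 22.5
```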
Figure 1: Feasible set for the diet problem.
Introduction of slack variables

    minimize    10x1 + 7x2
    subject to  20x1 + 20x2 − x3 = 60,
                15x1 + 3x2 − x4 = 15,
                5x1 + 10x2 − x5 = 20,
                xi ≥ 0 (1 ≤ i ≤ 5).        (2)

Crash course on the simplex method. Linear program in standard form:

    minimize    c^T x
    subject to  Ax = b,  x ≥ 0.            (3)
Finding a “vertex” of the feasible set
1. “Vertex” of a polyhedron in IRn: Intersection of n (linearly independent)
linear constraints, so-called active constraints.
2. For A ∈ IRm×n, the m equality constraints given by Ax = b are always
active at any feasible point.
3. “Pick” additional n − m constraints of the form xi ≥ 0, i.e. set n − m
components of x to zero (non-basic variables). The remaining variables
(basic variables) are fixed such that Ax = b holds true. (This is possible
if the columns of A that correspond to the basic variables are linearly
independent).
4. If we pick the non-basic variables in such a way that the basic variables
are nonnegative when solving Ax = b then the basis (i.e. the set of basic
variables) is called feasible.
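The bookkeeping of items 3.-4. can be made concrete for the standard form (2) of the diet problem; the choice of non-basic variables below is ours, for illustration:

```python
import numpy as np

# Standard form (2) of the diet problem: A x = b, x >= 0 with n = 5, m = 3.
A = np.array([[20.0, 20, -1, 0, 0],
              [15.0, 3, 0, -1, 0],
              [5.0, 10, 0, 0, -1]])
b = np.array([60.0, 15, 20])

# Choose x3, x4 as the n - m = 2 non-basic variables (set to zero) and solve
# A_B x_B = b for the basic variables x1, x2, x5 (columns 0, 1, 4).
basic = [0, 1, 4]
x = np.zeros(5)
x[basic] = np.linalg.solve(A[:, basic], b)
print(x)   # basic solution (0.5, 2.5, 0, 0, 7.5): all >= 0, a feasible basis
```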
Key point of the simplex method:

By "intelligent bookkeeping", one can identify all directions (all edges of
the polyhedron) along which the objective function c^T x improves. Moreover,
it is easy to compute the vertex "at the other end" of a given edge. Choosing
such an edge and the vertex at the other end is called a "pivot step" of the
simplex method.

Moving from one vertex to the next along edges that improve c^T x, the
simplex method either finds an optimal vertex or it finds a feasible ray
along which the objective function tends to −∞.

(Some technicalities arise when there happen to be more than n active
constraints at a given vertex: several changes of basic and non-basic
variables at the same vertex, and strategies to avoid "cycling".)
Examples
www.mcs.anl.gov/home/otc/Guide/CaseStudies/diet/index.html
www.mcs.anl.gov/home/otc/Guide/CaseStudies/simplex/index.html
Caution
The polyhedral structure (network of neighboring vertices) may be much more
complicated than one might expect from our 3-dimensional intuition:
- Consider polyhedra for which each vertex is connected to each other vertex by
an edge. For n = 1, 2, 3 such a polyhedron must have at most n + 1 vertices.
For n = 4 it may have 6 vertices, for n = 45 it may have 512 vertices!
- Consider a polyhedron P1 ⊂ IRn and a polyhedron P2 ⊂ IRk for some k < n
such that

    P2 = {x | (x, y) ∈ P1, x ∈ IRk, y ∈ IRn−k}.

If P1 has m vertices, P2 may have 2^{O(m)} vertices. (The projection may have
exponentially many more vertices.)
There are open questions still today, like the conjecture of Hirsch (1959):
"There exists a choice of pivot steps such that the simplex method terminates
after at most n − m steps." (?)

(Unfortunately, the "opposite" is easy to prove: for many commonly used pivot
step selection rules there are examples on which the simplex method takes
2^{O(n)} steps.)

Only in 1979 was a (new) method found that is guaranteed to converge in a
polynomial number of steps (Khachian's proof of convergence for the ellipsoid
method due to Yudin and Nemirovski / Shor). However, this method was very
slow in practical applications.

In 1984, Karmarkar presented another class of polynomial methods that is also
very efficient in practice and will be discussed next:
PART I
Interior Point Methods for Linear Programming
Duality
Figure 2: Optimality conditions for max{b^T y | A^T y ≤ c}.
If y is optimal, then there exist x1, x2 such that

    b = x1 a1 + x2 a2   with   xi ≥ 0.

More generally, there exists x such that

    b = ∑_{i=1}^n xi ai,   x ≥ 0,   and   xi · (ci − ai^T y) = 0.

Rewritten: Find x, y, s such that

    Ψ0(x, y, s) := ( Ax − b,  A^T y + s − c,  Xs ) = (0, 0, 0),   x, s ≥ 0,   (4)

where X = Diag(x1, . . . , xn).
Can we use Newton's method to solve Ψ0(x, y, s) = 0?

    Ψ0(x + ∆x, y + ∆y, s + ∆s) ≈ Ψ0(x, y, s) + DΨ0(x, y, s)(∆x, ∆y, ∆s),

and setting the right-hand side to zero gives

    (∆x, ∆y, ∆s) = −(DΨ0(x, y, s))^{-1} Ψ0(x, y, s),

where

                      ( A     0     0 )
    DΨ0(x, y, s)  =   ( 0     A^T   I ).
                      ( S     0     X )
When A has full row rank (preprocessing!) and when x > 0, s > 0, one can
verify that DΨ0(x, y, s) is invertible.

Problem: The solutions x, s of Ψ0(x, y, s) = 0 contain (many) zero
components. Newton's method typically converges to a "solution" where some
components of x and s are negative.

Therefore: For fixed µ > 0 consider

    Ψµ(x, y, s) := ( Ax − b,  A^T y + s − c,  Xs − µe ) = (0, 0, 0),   x, s > 0.

The solutions (x(µ), y(µ), s(µ)) of this system are called points on the
central path. The central path is well defined for all µ > 0 if there exists
an "interior point" (x, y, s) with Ax = b, A^T y + s = c and x, s > 0.
The derivatives of Ψ0 and Ψµ coincide. If Newton’s method
starts sufficiently close to (x(µ), y(µ), s(µ)), it will converge to
(x(µ), y(µ), s(µ)) (which is a strictly positive solution).
Interior point method:
Fix µ > 0.
1. Do one Newton step to approximate (x(µ), y(µ), s(µ)).
2. Reduce µ > 0 and repeat.
Note:

Points on the central path are also solutions of the following convex
logarithmic barrier problems:

    min { c^T x / µ − ∑_{i=1}^n ln xi  |  Ax = b },

    min { −b^T y / µ − ∑_{i=1}^n ln si  |  A^T y + s = c }.

Using Newton's method to solve one or both of these problems will yield
different search directions (the so-called primal direction or dual
direction).
Convergence:

Define a positive diagonal matrix D by D² := XS^{-1}.

When Ax = b and A^T y + s = c hold, the right-hand side of the Newton system
DΨµ(x, y, s)(∆x, ∆y, ∆s) = −Ψµ(x, y, s) is just (0, 0, µe − Xs).

Let r := Xs − µe and q := DX^{-1}r. Then the solution of the Newton system is
given by

    ∆y = (AD²A^T)^{-1} ADq,
    ∆x = D²A^T ∆y − Dq,
    ∆s = −D^{-1}q − D^{-2}∆x.
To analyze the Newton step we recall that

    ΠR := DA^T (AD²A^T)^{-1} AD

is the orthogonal projection onto

    R = R(DA^T) = {DA^T w | w ∈ IRm}.

Moreover,

    R(DA^T) ⊥ N(AD)   and   R(DA^T) ⊕ N(AD) = IRn,

where N = N(AD) := {y ∈ IRn | ADy = 0} is the null space of AD.
The above definitions imply:

    DA^T ∆y = ΠR q,                                (5)
    ∆x = D(ΠR q − q) = −D ΠN q,                    (6)
    ∆s = −D^{-1}(q + D^{-1}∆x) = −D^{-1} ΠR q.     (7)
Analyze one step, (x, y, s) → (x + ∆x, y + ∆y, s + ∆s). For the new residual
r+ := (X + ∆X)(s + ∆s) − µe we obtain, using X∆s + S∆x = −r,

    r+ = Xs + X∆s + S∆x + ∆X∆s − µe = Xs − r + ∆X∆s − µe = ∆X∆s.

Set

    ∆x̄ := −D^{-1}∆x = ΠN q   and   ∆s̄ := −D∆s = ΠR q,

so that ∆X∆s = ∆X̄∆s̄. Note that Xs = r + µe; let R = Diag(r) and
β := ‖r‖2/µ. Then

    ‖DX^{-1}‖2² = ‖(R + µI)^{-1}‖2 = max_{1≤i≤n} 1/|ri + µ| ≤ 1/(µ(1 − β)),   (8)

which implies

    β̄ := ‖q‖2 = ‖DX^{-1}r‖2 ≤ β √(µ/(1 − β)),                                 (9)
and

    ‖∆x̄‖2 = β̄ cos θ,   ‖∆s̄‖2 = β̄ sin θ,                                     (10)

where θ is the angle between q and ΠN q. From the representation of r+, (9)
and (10), it follows that

    ‖r+‖2 = ‖∆X̄∆s̄‖2 ≤ β̄² cos θ sin θ ≤ β²µ / (2(1 − β)) ≤ µβ²,

where the second-to-last inequality uses |2 cos θ sin θ| = |sin(2θ)| ≤ 1 and
the last one uses β ≤ 1/2. Hence, the relative error ‖Xs − µe‖2/µ = ‖r‖2/µ is
(essentially) squared in each step of Newton's method.

(NOTE: We can give precise bounds, and not just a statement like "there
exists a neighborhood U and a constant M < ∞ such that for any starting point
in U we have ‖x + ∆x − x∗‖ ≤ M ‖x − x∗‖²".)
Short-Step Algorithm.

Input: x0 > 0, y0, s0 > 0 and µ0 > 0 such that Ax0 = b, A^T y0 + s0 = c,
X0 s0 − µ0 e = r0 and ‖r0‖2/µ0 ≤ 1/2. (This means the starting point is
close to the central path.) Let ε > 0 be given. Set k = 0.

1. Do one step of Newton's method for finding (x(µk), y(µk), s(µk)).
   Let xk+1 = xk + ∆x, yk+1 = yk + ∆y and sk+1 = sk + ∆s.
2. Reduce µk to µk+1 := µk (1 − 1/(6√n)).
3. Set k = k + 1.
4. If µk ≤ ε/n, then STOP; else go to Step 1.
Mehrotra's predictor-corrector algorithm

Input: (x0, y0, s0) with (x0, s0) > 0, ε > 0, a bound M ≫ 0. Set k := 0.

1. Set (x, y, s) := (xk, yk, sk), µk := (xk)^T sk / n.
2. If ‖Ax − b‖ < ε, ‖A^T y + s − c‖ < ε and µk < ε: STOP, the iterate is an
   approximate solution of the linear program.
3. If ‖x‖ > M or ‖s‖ > M: STOP, the problem either has no feasible solution
   or is badly conditioned.
4. Solve

    ( A    0    0 ) ( ∆xN )   ( b − Ax        )
    ( 0    A^T  I ) ( ∆yN ) = ( c − A^T y − s ).
    ( S    0    X ) ( ∆sN )   ( −Xs           )
5. Compute the maximum step lengths along ∆xN and ∆sN:

    αxN := min{1, min_{i: ∆xiN < 0} (−xi/∆xiN)},
    αsN := min{1, min_{i: ∆siN < 0} (−si/∆siN)}.

6. Set

    µ+N := (x + αxN ∆xN)^T (s + αsN ∆sN) / n

   and

    µC := µk · (µ+N/µk)³.
7. Solve

    ( A    0    0 ) ( ∆xC )   ( b − Ax               )
    ( 0    A^T  I ) ( ∆yC ) = ( c − A^T y − s        ).
    ( S    0    X ) ( ∆sC )   ( µC e − Xs − ∆XN ∆sN  )
8. Select a damping parameter ηk ∈ [0.8, 1.0) and compute the primal and dual
   step lengths along ∆xC and ∆sC via

    αmax,xC := min_{i: ∆xiC < 0} (−xi/∆xiC),   αmax,sC := min_{i: ∆siC < 0} (−si/∆siC)

   and

    αxC := min{1, ηk αmax,xC},   αsC := min{1, ηk αmax,sC}.

9. Set

    xk+1 := xk + αxC ∆xC,   (yk+1, sk+1) := (yk, sk) + αsC (∆yC, ∆sC),

   as well as k := k + 1, and go to Step 1.
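Steps 1-9 can be sketched as follows (a numpy sketch using the usual normal-equations elimination of ∆s and ∆x; the fixed damping ηk = 0.9 and the test data are our simplifications):

```python
import numpy as np

def mehrotra(A, b, c, x, y, s, eps=1e-8, bound=1e8, max_iter=100):
    """A sketch of Mehrotra's predictor-corrector method (Steps 1-9)."""
    n = A.shape[1]

    def max_step(v, dv):                 # largest alpha with v + alpha*dv >= 0
        neg = dv < 0
        return (-v[neg] / dv[neg]).min() if neg.any() else np.inf

    for _ in range(max_iter):
        mu = x @ s / n                                        # Step 1
        rp, rd = b - A @ x, c - A.T @ y - s
        if max(np.linalg.norm(rp), np.linalg.norm(rd), mu) < eps:
            break                                             # Step 2
        if np.linalg.norm(x) > bound or np.linalg.norm(s) > bound:
            raise RuntimeError("no feasible solution or badly conditioned")

        D2 = x / s
        K = (A * D2) @ A.T               # normal equations matrix A D^2 A^T

        def solve(rc):                   # eliminate Delta-s and Delta-x
            dy = np.linalg.solve(K, rp + A @ (D2 * rd - rc / s))
            ds = rd - A.T @ dy
            return (rc - x * ds) / s, dy, ds

        dxN, dyN, dsN = solve(-x * s)                         # Step 4
        axN = min(1.0, max_step(x, dxN))                      # Step 5
        asN = min(1.0, max_step(s, dsN))
        mu_plus = (x + axN * dxN) @ (s + asN * dsN) / n       # Step 6
        muC = mu * (mu_plus / mu) ** 3
        dx, dy, ds = solve(muC - x * s - dxN * dsN)           # Step 7
        ax = min(1.0, 0.9 * max_step(x, dx))                  # Step 8
        a_s = min(1.0, 0.9 * max_step(s, ds))
        x, y, s = x + ax * dx, y + a_s * dy, s + a_s * ds     # Step 9
    return x, y, s

# Diet problem in standard form (2), started from the infeasible point e:
A = np.array([[20.0, 20, -1, 0, 0], [15, 3, 0, -1, 0], [5, 10, 0, 0, -1]])
b = np.array([60.0, 15, 20])
c = np.array([10.0, 7, 0, 0, 0])
x, y, s = mehrotra(A, b, c, np.ones(5), np.zeros(3), np.ones(5))
print(c @ x)   # close to 22.5, the optimal cost of the diet problem
```

Note that, unlike the short-step method, the starting point need not be feasible: the residuals rp and rd are simply carried along in the right-hand sides.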
PART II
Interior Point Methods for Convex Programming
Basic problem

    minimize f0(x)   s.t.   x ∈ IRn :  fi(x) ≤ 0 for 1 ≤ i ≤ m.

Most of the following results are due to Nesterov and Nemirovskii: Interior
Point Polynomial Methods in Convex Programming, SIAM 1994.

Assumptions

• The fi are three times continuously differentiable.
• f0 and −∑_{i≥1} ln(−fi) are convex. (Always satisfied if all fi are convex.)
Simplifications

• f0 is linear, f0(x) ≡ c^T x. (Without loss of generality!)
• The feasible set

      S := {x | fi(x) ≤ 0 for 1 ≤ i ≤ m}

  is bounded and has nonempty interior. (Some restriction of generality.)
Logarithmic barrier method

    φ(x) := −∑_{i=1}^m ln(−fi(x))

is the logarithmic barrier function for S, and for some fixed θ ≥ 1 and
λ > min{c^T x | x ∈ S},

    ϕ(x, λ) := −θ ln(λ − c^T x) + φ(x)

is a barrier function for S(λ) := S ∩ {x | c^T x ≤ λ}.

The minimizer x of φ is called the analytic center of S.
The minimizer x(λ) of ϕ( · , λ) is called the analytic center of S(λ).
Method of analytic centers (Huard, Sonnevend, Renegar):

Let x0 ∈ S◦ and λ0 > c^T x0 be given. Set σ = 1/2 and k = 0.

Repeat
1.) Reduce λk to λk+1 := λk − σ(λk − c^T xk).
2.) Starting at xk, do a few steps of Newton's method for minimizing
    ϕ( · , λk+1).
3.) Set k := k + 1.
End.

Crucial questions:
1. How fast does Newton's method converge?
2. How big is λk − c^T xk compared to the unknown distance c^T xk − λopt?
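The loop above can be sketched as follows (a dense numpy implementation for linear constraints Ax ≤ b; the backtracking on ϕ and the stopping rule are our own safeguards, not part of the slides):

```python
import numpy as np

def method_of_centers(A, b, c, x, lam, theta=1.0, sigma=0.5, iters=80):
    """A numpy sketch of the method of centers for min c^T x s.t. Ax <= b."""
    def phi(z, lam):                       # barrier for S(lam)
        d, t = b - A @ z, lam - c @ z
        if np.any(d <= 0) or t <= 0:
            return np.inf
        return -theta * np.log(t) - np.log(d).sum()

    for _ in range(iters):
        if lam - c @ x < 1e-8:             # error bound small enough: stop
            break
        lam = lam - sigma * (lam - c @ x)  # 1.) reduce lambda
        for _ in range(8):                 # 2.) a few Newton steps
            d, t = b - A @ x, lam - c @ x
            g = A.T @ (1.0 / d) + theta * c / t          # gradient of phi
            H = (A.T / d ** 2) @ A + theta * np.outer(c, c) / t ** 2
            dx = np.linalg.solve(H, -g)
            alpha = 1.0                    # damping keeps x in S(lam)
            while phi(x + alpha * dx, lam) > phi(x, lam):
                alpha *= 0.5
            x = x + alpha * dx
    return x, lam

# Diet problem (1) written as Ax <= b with x >= 0 folded into A and b:
A = np.array([[-20.0, -20], [-15, -3], [-5, -10], [-1, 0], [0, -1]])
b = np.array([-60.0, -15, -20, 0, 0])
c = np.array([10.0, 7])
x, lam = method_of_centers(A, b, c, np.array([4.0, 4.0]), 100.0)
print(c @ x)   # close to the optimal value 22.5
```

The quantity λk − c^T xk serves as the computable error bound from question 2.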
Distance to optimality
First question:

• For simplicity, we restrict the examination to φ – and apply the results
  for Newton's method later to ϕ, which has the same structure!
• Intuitively, we want ∇²φ to be "nearly constant" for Newton's method to
  work well.
• In fact, we want the relative change of ∇²φ to be small.
• The absolute change of ∇²φ is given by ∇³φ.
• Hence we want ∇³φ to be small compared to ∇²φ.
• Look at "Karmarkar's barrier function" for the positive real axis:
  φ(t) := −ln t. Here, |φ′′′(t)| ≤ 2 (φ′′(t))^{3/2}.
• Generalize this to n dimensions:
Self-concordance

The barrier function φ : S◦ → IR is called (strongly) self-concordant if for
any x ∈ S◦ and any h ∈ IRn the restriction l = l_{x,h} of φ to the line
x + th,

    l(t) := φ(x + th),

satisfies |l′′′(0)| ≤ 2 (l′′(0))^{3/2}.

(This assumes l is C³-smooth and l′′ ≥ 0, i.e. l is convex.)
(The power 3/2 guarantees invariance w.r.t. the length of h!)
Does this make sense?

I.) Do there exist functions that satisfy this condition?

1. Easy to verify (binomial formula): if φ1 and φ2 are self-concordant, then
   so is φ1 + φ2 (provided the domains of φ1 and φ2 intersect).

2. Very easy to verify: if A is an affine mapping and φ is self-concordant,
   then so is φ(A( · )) (provided the range of A intersects the domain of φ).

3. This implies that −∑_i ln(bi − ai^T x) is a self-concordant barrier
   function for the polyhedron {x | ai^T x ≤ bi}.
4. Convex quadratic constraints q(x) ≤ 0:
For fixed x with q(x) < 0 and fixed h, the term q(x + th) can
be factored into the product of two linear terms, so that by 1.
and 2., the function l(t) = − ln(−q(x+th)) is self-concordant!
5. Semidefinite constraints X ⪰ 0 (X = X^T ∈ IRn×n):

   The barrier function φ(X) := −ln det(X) is self-concordant: for fixed
   X ≻ 0 and fixed H = H^T ∈ IRn×n it follows that

       l(t) = −ln det(X + tH)
            = −ln det(X^{1/2}(I + tX^{-1/2}HX^{-1/2})X^{1/2})
            = −2 ln det(X^{1/2}) − ln ∏_{i=1}^n (1 + tλi)
            = −ln det(X) − ∑_{i=1}^n ln(1 + tλi),

   where the λi are the eigenvalues of X^{-1/2}HX^{-1/2}, independent of t.
   Again, by 1. and 2. it follows that φ is self-concordant. Note that

       l′(0) = −trace(X^{-1/2}HX^{-1/2}) = −trace(HX^{-1}) =: −H • X^{-1}.
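The derivation can be checked numerically: from l(t) = −ln det(X) − ∑ ln(1 + tλi) one gets l′′(0) = ∑λi² and l′′′(0) = −2∑λi³, and |∑λi³| ≤ (∑λi²)^{3/2} gives self-concordance. A sanity check with random data (our own test, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)                 # a random positive definite X
H = rng.standard_normal((n, n))
H = H + H.T                                 # a random symmetric H

w, U = np.linalg.eigh(X)
Xmh = U @ np.diag(w ** -0.5) @ U.T          # X^{-1/2}
lam = np.linalg.eigvalsh(Xmh @ H @ Xmh)     # the eigenvalues lambda_i

l2 = np.sum(lam ** 2)                       # l''(0)  =    sum lambda_i^2
l3 = -2.0 * np.sum(lam ** 3)                # l'''(0) = -2 sum lambda_i^3
print(abs(l3) <= 2.0 * l2 ** 1.5)           # True: self-concordance at t = 0

# cross-check l''(0) against finite differences of l(t) = -ln det(X + tH)
l = lambda t: -np.linalg.slogdet(X + t * H)[1]
print(abs((l(1e-4) - 2 * l(0.0) + l(-1e-4)) / 1e-8 - l2) < 1e-4)  # True
```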
Moreover, let symmetric matrices A(0), . . . , A(m) be given and consider the
problem

    minimize b^T y   s.t.   A(y) := A(0) + ∑_{i=1}^m yi A(i) ⪰ 0.

By affine invariance, the function Φ(y) := −ln det(A(y)) is also
self-concordant. Moreover, the derivatives of Φ can be stated explicitly:

    ∂Φ(y)/∂yi = −A(y)^{-1} • A(i),

    ∂²Φ(y)/(∂yi ∂yj) = A(y)^{-1} A(i) A(y)^{-1} • A(j).
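The gradient formula for Φ can likewise be verified against finite differences (random symmetric test matrices of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A0 = 10.0 * np.eye(n)                       # keeps A(y) positive definite
Ai = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    Ai.append(M + M.T)                      # random symmetric A(i)
y = 0.1 * rng.standard_normal(m)

Amap = lambda v: A0 + sum(vi * M for vi, M in zip(v, Ai))
Phi = lambda v: -np.linalg.slogdet(Amap(v))[1]

# gradient formula: dPhi/dy_i = -A(y)^{-1} . A(i) (elementwise sum works
# because both matrices are symmetric, so trace(PQ) = sum_ij P_ij Q_ij)
Ainv = np.linalg.inv(Amap(y))
grad = np.array([-(Ainv * M).sum() for M in Ai])

h = 1e-6
fd = np.array([(Phi(y + h * e) - Phi(y - h * e)) / (2 * h) for e in np.eye(m)])
print(np.allclose(grad, fd, atol=1e-5))     # True
```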
II.) Does this condition really guarantee that Newton's method converges
well?

Since |l′′′(0)| ≤ 2 l′′(0)^{3/2} holds for any x (defining l), this in fact
implies |l′′′(t)| ≤ 2 l′′(t)^{3/2} for any t in the domain of l.

Let u(t) := l′′(t); then u′(t) ≤ 2 u(t)^{3/2} is a valid differential
inequality. The extremal solution of v′(t) = 2 v(t)^{3/2} with initial value

    v(0) = u(0) = h^T ∇²φ(x) h =: δ²

is given by v(t) = 1/(δ^{-1} − t)².

Whenever v has finite values, u must be finite, and hence so must be l.
This implies the following lemma:

Inner Ellipsoid

Let E(x) := {h | h^T ∇²φ(x) h ≤ 1} be the unit ball of the (semi-)norm given
by the Hessian Hx := ∇²φ(x) of φ at x. (The (semi-)norm is
‖h‖_{Hx} := √(h^T Hx h).)

Then, for any x ∈ S◦ the inclusion x + E(x) ⊂ S holds true.
Inner ellipsoid
Further Results

Equivalent relative Lipschitz condition:

    |h^T (∇²φ(x + ∆x) − ∇²φ(x)) h| ≤ δ M(δ) h^T ∇²φ(x) h,

where

    δ := ‖∆x‖_{Hx}   and   M(δ) := 2/(1 − δ) + δ/(1 − δ)² = 2 + O(δ).

(Somewhat more difficult to show.)
Newton's method

Let x ∈ S◦ be given and denote Hx := ∇²φ(x). Let ∆x := −Hx^{-1} ∇φ(x) be the
Newton step for minimizing φ starting at x ∈ S◦, and assume that
δ := ‖∆x‖_{Hx} < 1. Then, by the inner ellipsoid, x̄ := x + ∆x ∈ S◦.

Let ∆x̄ := −Hx̄^{-1} ∇φ(x̄) be the "next" Newton step. Then

    ‖∆x̄‖_{Hx̄} ≤ δ² / (1 − δ)².

This implies quadratic convergence in at least one fifth of the inner
ellipsoid about the center. (Related idea of proof as for the inner
ellipsoid.)
The second question, distance to optimality:

Self-concordance is a (relative) Lipschitz condition on the Hessian of φ.

• By adding a linear perturbation to φ, the self-concordance condition
  obviously does not change.
• But by adding a linear perturbation to φ we can make any point x ∈ S◦ a
  minimizer of the perturbed function.
• For x close to the boundary, the perturbation will have to be large, and
  the perturbed function will have a "large gradient" at the minimizer of φ.
• If we want to prevent points close to the boundary from being minimizers
  of "our" barrier function, we may limit the norm of its gradient – of
  course with respect to the canonical norm ‖ · ‖_{Hx}.
With the notation used in the self-concordance condition, we require for some
fixed θ ≥ 1:

    |l′(0)| ≤ √θ · l′′(0)^{1/2}.

If φ is self-concordant and satisfies the above condition, we say φ is
θ-self-concordant. This condition is also affine invariant.

It is "additive" w.r.t. θ, in the sense that if φ1 and φ2 satisfy the
condition with values θ1 ≥ 1 and θ2 ≥ 1, then so does φ1 + φ2 with value
θ1 + θ2.

The previous examples satisfy the condition with θ = 1 for a linear or convex
quadratic constraint, and θ = l for an l × l semidefinite constraint.
Results

1. We have λ − c^T x(λ) > c^T x(λ) − λopt when θ in the definition of ϕ is
   chosen at least as large as the self-concordance parameter of φ.
   ("Identical" proof as for the inner ellipsoid, just the other way round.)

2. Let x ∈ S◦ be arbitrary and let φ be a θ-self-concordant barrier function
   for S. Let H := {y | (y − x)^T ∇φ(x) ≥ 0} be a half space cutting through
   x and let E(x) be the inner ellipsoid. Then

       S ∩ H ⊂ x + (θ + 2√θ) E(x).

   (H = IRn when ∇φ(x) = 0.) Again, similar proof as for the inner ellipsoid.
Inner and outer ellipsoid
• Now we have all the essential tools to show that the method of centers
  converges at a fixed rate.
• If λk is changed only a little at each iteration (namely σ = 1/(8√θ)
  rather than σ = 1/2), then only one step of Newton's method suffices per
  iteration, and after 12√θ iterations the unknown distance λk − λopt is
  reduced by a factor of at least 1/2.
Discussion

The given rate of convergence (for a problem with 10000 convex quadratic
constraints, 1200 iterations are needed to reduce the error bound λk − λopt
by a factor of 1/2) is too slow for practical implementations.

BUT it guarantees a very weak dependence on the data of the problem: the rate
depends only on a weighted number of constraints, where complicated
constraints such as semidefiniteness constraints are counted with a somewhat
higher weight.
• No dependence on the number of unknowns.
  (Generalization to Hilbert space by Renegar.)
• Assuming exact arithmetic – unlike the conjugate gradient or steepest
  descent methods – no dependence on any condition numbers of the problem.
• Assuming exact arithmetic – unlike the simplex method – no dependence on
  degeneracy.

Hence, the CONCEPT is very robust.
Find an acceleration based on this concept.
Modifications

• Infeasible starting points, empty interior, unbounded set of optimal
  solutions.
• Predictor-corrector strategy: Under "mild" conditions, the central path
  x(λ) forms a smooth curve leading to an optimal solution.
  Through any given point x ∈ S◦ one can define a perturbed central path
  x̄(λ) leading to an optimal solution as well, and the tangent to this curve
  is "easily" computable. (Same system as used for Newton's method.)
  Do some extrapolation along this tangent and start the Newton corrections
  from the extrapolated point.
Conic formulations

• Each convex program that possesses a self-concordant barrier function can
  be expressed in conic form with a self-concordant barrier function of the
  same order of magnitude. (Nesterov and Nemirovskii 1994; Freund and J.
  1999, "optimal barrier".)
• Conic formulations allow for primal-dual methods, which have turned out to
  be more efficient in practical implementations. (Their theoretical
  complexity is the same as that of the method of centers.)
• Many programs (like semidefinite programs) are naturally given in conic
  form and thus allow for direct application of primal-dual methods.
PART III
Semidefinite Programming
Notation

Sn: the space of symmetric n × n matrices.
X ⪰ 0 (X ≻ 0): X ∈ Sn is positive semidefinite (positive definite).

Standard scalar product on the space of n × n matrices:

    〈C, X〉 := C • X := trace(C^T X) = ∑_{i,j} Ci,j Xi,j,

inducing the Frobenius norm, X • X = ‖X‖F².
Notation (continued)

For given symmetric matrices A(i), a linear map A from Sn to IRm is given by

    A(X) = ( A(1) • X, . . . , A(m) • X )^T.

The adjoint operator A∗ satisfying

    〈A∗(y), X〉 = y^T A(X)   for all X ∈ Sn, y ∈ IRm,

is given by

    A∗(y) = ∑_{i=1}^m yi A(i).
Linear Semidefinite Programs

    minimize C • X   where   A(X) = b,  X ⪰ 0.

Similar structure as linear programs; only the condition "x ≥ 0
(componentwise)" is replaced by the condition "X ⪰ 0 (semidefinite)".

Can be solved by the (accelerated!) method of centers from Part II – or by
specialized primal-dual methods.
Basic Theory

If there exists X ≻ 0 with A(X) = b (strict feasibility), then

    (P)  inf { C • X  |  A(X) = b,  X ⪰ 0 }
       = (D)  sup { b^T y  |  A∗(y) + S = C,  S ⪰ 0 }.

If (P) and (D) have strictly feasible solutions, then the optimal solutions
Xopt and yopt, Sopt of both problems exist and satisfy the equation

    Xopt Sopt = 0.
Basic Theory (continued)

Conversely, any pair X and y, S of feasible points for (P) and (D) satisfying

    A(X) = b,          X ⪰ 0,
    A∗(y) + S = C,     S ⪰ 0,
    XS = 0  (or SX = 0)

is optimal for both problems.
For Newton's method one symmetrizes the last equation (Monteiro et al.) and
perturbs it for some small µ > 0: replace XS = 0 by M(X, S) = µI with, e.g.,

    M(X, S) = (1/2)(XS + SX)   (AHO).
Basic Theory (continued)

For example (AHO), the linearization of

    A(X) = b,
    A∗(y) + S = C,
    XS + SX = 2µI

yields the linear system for ∆X, ∆y, ∆S:

    A(∆X) = b − A(X),
    A∗(∆y) + ∆S = C − A∗(y) − S,
    X∆S + ∆X S + S∆X + ∆S X = 2µI − XS − SX.
Summarizing, linear SDPs are well analyzed, and there exists numerically
efficient and polynomial public-domain software for linear semidefinite
programs, e.g. SeDuMi by Jos Sturm.
Example

In our first example we consider the differential equation

    ẋ(t) = Ax(t)   with initial value   x(0) = x⁰.

The matrix A is called stable if for all initial values x(0) = x⁰ the
solutions x(t) converge to zero as t → ∞.

It is well known that this is the case if, and only if, the real parts of all
eigenvalues of A are negative,

    Re(λi(A)) < 0   for 1 ≤ i ≤ n.
By Lyapunov's theorem, this is the case if, and only if,

    ∃ P ≻ 0 :  A^T P + PA ≺ 0.

Motivation:

    (d/dt) ‖x(t)‖P² = (d/dt) x(t)^T P x(t)
                    = ẋ(t)^T P x(t) + x(t)^T P ẋ(t)
                    = (Ax(t))^T P x(t) + x(t)^T P A x(t)
                    = x(t)^T (A^T P + PA) x(t)
                    < 0

for all x(t) ≠ 0; hence, the P-norm of x(t) is strictly monotonically
decreasing as t → ∞.
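For a single known matrix A, the Lyapunov condition can be checked directly by solving a Lyapunov equation; the sketch below uses scipy.linalg.solve_continuous_lyapunov with a test matrix of our own:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 2.0], [-2.0, -1.0]])   # eigenvalues -1 +/- 2i: stable

# Solve A^T P + P A = -I; by Lyapunov's theorem, A is stable
# if and only if the (unique) solution P is positive definite.
P = solve_continuous_lyapunov(A.T, -np.eye(2))
print(np.linalg.eigvalsh(P))                # here P = I/2, eigenvalues 0.5
```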
If the matrices Pi, 1 ≤ i ≤ n(n + 1)/2, form a basis of Sn, the determination
of a matrix P = ∑_i yi Pi with P ≻ 0 and A^T P + PA ≺ 0 leads to the linear
semidefinite program

    min { λ  |  ∑ yi Pi ⪰ 0,   λI − ∑ yi (A^T Pi + Pi A) ⪰ 0 },

asking whether the optimal value is < 0; this is a linear semidefinite
program in the "dual" form. (In this form it is typically unbounded.)

(There are cheaper ways of checking numerically that the real parts of all
eigenvalues of A are negative!)
Now consider the system

    ẋ(t) = A(t)x(t),   (∗)

where the matrix A(t) is not known explicitly. (For example, when there are
small unknown perturbations of the matrix A.)

If there exist matrices A(i), i = 1, 2, . . . , K, with

    A(t) ∈ conv({A(i)}_{1≤i≤K})   for all t ≥ 0,

then the existence of a Lyapunov matrix P ≻ 0 with

    (A(i))^T P + P A(i) ≺ 0   for 1 ≤ i ≤ K

is a sufficient condition for the stability of (∗), because then

    A(t)^T P + PA(t) ≺ 0   and   (d/dt) ‖x(t)‖P² = x(t)^T (A(t)^T P + PA(t)) x(t) < 0.
Just a brief reminder...

The condition that the real parts of all eigenvalues of all A(i) are negative
is necessary but not sufficient: even when the real parts of all eigenvalues
of two matrices A and B are negative, this may fail for (1/2)(A + B). Choose
e.g.

    A = ( −1   4 )         B = ( −1   0 )
        (  0  −1 ),            (  4  −1 ).

(All eigenvalues are real and equal to −1, but (1/2)(A + B) has the
eigenvalue +1.)
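The counterexample is easy to confirm numerically:

```python
import numpy as np

A = np.array([[-1.0, 4.0], [0.0, -1.0]])
B = np.array([[-1.0, 0.0], [4.0, -1.0]])
print(np.linalg.eigvals(A).real)              # [-1. -1.]: A is stable
print(np.linalg.eigvals(0.5 * (A + B)).real)  # approx. 1 and -3: unstable
```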
A sensitivity result for linear SDPs

Uniqueness assumption:

The data D of a pair (P) and (D) of primal and dual linear semidefinite
programs is

    D = [A, b, C]   with   A : Sn → IRm,  b ∈ IRm,  C ∈ Sn.

Assume that (P) and (D) satisfy Slater's condition, and that X ∈ Sn and
y ∈ IRm, S ∈ Sn are unique and strictly complementary solutions of (P) and
(D), that is,

    A(X) = b,  X ⪰ 0,
    A∗(y) + S = C,  S ⪰ 0,
    XS = 0,  X + S ≻ 0.
Theorem (Freund & J. 2003)

If the data of (P) and (D) is changed by sufficiently small perturbations
∆D = [∆A, ∆b, ∆C], then the optimal solutions X(D), y(D), S(D) of the
semidefinite programs are differentiable functions of the perturbations, i.e.

    X(D + ∆D) = X(D) + DDX[∆D] + o(‖∆D‖).

Furthermore, the derivatives

    Ẋ := DDX[∆D],   ẏ := DDy[∆D],   and   Ṡ := DDS[∆D]

of the solutions X(D), y(D), S(D) satisfy

    A(Ẋ) = ∆b − ∆A(X),
    A∗(ẏ) + Ṡ = ∆C − ∆A∗(y),
    ẊS + XṠ = 0.
Idea of Proof:
1) Slater and continuity:
The perturbed problem must have a solution.
2) Subtract optimality conditions and take limit:
We get precisely the statement of the theorem.
3) It remains to be shown that this system is “nonsingular”.
(It is an overdetermined system just by the number of
equations and unknowns.)
Idea of Proof (continued):

By complementarity, XS = 0 = SX, and thus the matrices X ⪰ 0 and S ⪰ 0
commute. This guarantees that there exist a unitary matrix U and diagonal
matrices

    Λ = Diag(λ1, λ2, . . . , λn) ⪰ 0   and   Σ = Diag(σ1, σ2, . . . , σn) ⪰ 0

such that

    X = UΛU^T   and   S = UΣU^T.

By strict complementarity we may partition the spectra such that

    λ1, λ2, . . . , λk > 0   and   σk+1, σk+2, . . . , σn > 0.

Transform so that, without loss of generality, X = Λ and S = Σ, and consider
the upper triangular part

    Πup(∆X Σ + Λ ∆S) = 0,

along with A(∆X) = 0 and A∗(∆y) + ∆S = 0.
Idea of Proof: (continued II)
Using the structure of the equation and uniqueness of the optimal solution
shows that this system has only the zero solution.
Idea of Proof (last):

4) Implicit function theorem ... (Done)

The key was to identify a nonsingular part of the overdetermined system, one
that can also be used numerically. The proof does not use the central path or
interior-point techniques.

5) Upper semicontinuity of optimal solutions for more general cone programs
was established by Robinson (1982). Here we consider special (linear
semidefinite) cone programs. There is a simple example showing that this
theorem does not hold for strictly complementary solutions of more general
cone programs:
Example
Maximize x1 subject to these (infinitely many) linear constraints, and add
the redundant constraint x1 ≤ 1. (All other constraints are "facet
defining".)

Then the optimal solution is unique and strictly complementary, but the only
active constraint is the redundant constraint x1 ≤ 1.

If the objective gradient (1, 0)^T is changed a bit, the optimal solution
jumps between the "vertices" close to (1, 0)^T. In particular, it is not
differentiable.

From this set, form a closed convex cone in IR3 to obtain a conic program.
Corollary

Any step X + tẊ for t ≠ 0 is (typically) infeasible in the sense that
X + tẊ is not positive semidefinite. In some applications the following
formula for the second directional derivative Ẍ := (1/2) D²DX(D)[∆D, ∆D] may
be useful:

    A(Ẍ) = −∆A(Ẋ),
    A∗(ÿ) + S̈ = −∆A∗(ẏ),
    ẌS + XS̈ = −ẊṠ.

This is the same system matrix as for the first derivative, with a different
right-hand side.
Note:

When X = UΛU^T, where the diagonal matrix Λ has a leading nonzero diagonal
block Λ1 as in the preceding proof, then Ẋ has the structure

    Ẋ = U ( A    B ) U^T,
          ( B^T  0 )

and Ẍ has the structure

    Ẍ = U ( ∗    ∗             ) U^T.
          ( ∗    B^T Λ1^{-1} B )

Setting ∗ = 0 yields a minimum-norm second-order correction towards the
positive semidefinite cone, maintaining the multiplicity of the zero
eigenvalue (up to third-order terms).
A combinatorial Application (Max-Cut)
From the set V of vertices of a graph select a subset V1 such that
the number (or the total weight) of the edges from V1 to V \V1 is
maximized.
Defining a vector x with components xi = 1 if i ∈ V1 and xi = −1 if i ∉ V1,
one can construct a symmetric matrix Q from the edges of the graph such that
the max-cut problem is equivalent to solving the following binary quadratic
program:

    minimize x^T Q x   s.t.   x ∈ {−1, 1}n.
Note: For x ∈ {−1, 1}n we have

    x^T Q x = trace(x^T Q x) = trace(Q x x^T) = trace(QX) = Q • X

with X = x x^T satisfying

    X ⪰ 0,   rank(X) = 1,   Diag(X) = e := (1, 1, . . . , 1)^T.
Relaxation

X = x x^T with x ∈ {±1}n if and only if

    X ⪰ 0,
    rank(X) = 1,   ←− omit
    Diag(X) = e := (1, 1, . . . , 1)^T.

Relaxed problem:

    min_{X ⪰ 0} { Q • X  |  Diag(X) = e }.

Linear objective function, linear constraints, semidefiniteness condition.
"Very good approximation to (BQP)": Goemans and Williamson (1995).
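Given a solution X of the relaxed problem (from an SDP solver such as SeDuMi), the Goemans-Williamson scheme rounds via a Gram factorization X = V^T V and random hyperplanes. A sketch (the helper name gw_round is ours; for illustration we round the feasible point X = I, which amounts to uniform random ±1 signs):

```python
import numpy as np

def gw_round(Q, X, trials=500, seed=0):
    """Random-hyperplane rounding of a feasible X of the relaxed problem."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.clip(w, 0.0, None))).T    # X = V^T V, columns v_i
    best_x, best_val = None, np.inf
    for _ in range(trials):
        r = rng.standard_normal(V.shape[0])       # random hyperplane normal
        x = np.sign(V.T @ r)                      # x_i = sign(v_i^T r)
        x[x == 0] = 1.0
        val = x @ Q @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# 5-cycle with Q = adjacency matrix: min x^T Q x over {-1,1}^5 is -6
# (the maximum cut of C5 has 4 edges). Round the feasible point X = I.
Q = np.zeros((5, 5))
for i in range(5):
    Q[i, (i + 1) % 5] = Q[(i + 1) % 5, i] = 1.0
x, val = gw_round(Q, np.eye(5))
print(val)   # best sampled cut value; -6 is optimal here
```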
The SDP approximation uses

    MC := conv({ x x^T | x ∈ {−1, 1}n })
        = conv({ X ⪰ 0 | rank(X) = 1, Diag(X) = e })
        = conv({ X ⪰ 0 | rank(X) = 1 } ∩ { X | Diag(X) = e })
        ⊂ conv({ X ⪰ 0 | rank(X) = 1 }) ∩ { X | Diag(X) = e }
        = { X ⪰ 0 | Diag(X) = e } =: SDP.

Omitting the rank condition means ignoring (n² − n)/2 nonlinear equations.
(There is no reason to believe this ignorance will be very helpful.)

The above interpretation shows:

1.) Instead of minimizing a linear objective over some set of isolated
    points, one can equivalently minimize over the convex hull. (Huge
    increase of the feasible set.)
2.) In this situation, the deletion of the rank condition corresponds to
    some form of Lagrange relaxation.
Goemans and Williamson (1995), Nesterov (1998):

By sin[X] (with square brackets) we denote the componentwise application of
the sin function to the matrix X. Then the following inner approximation of
the max-cut polytope MC holds true:

    TA := { X ⪰ 0 | sin[(π/2) X] ⪰ 0, Diag(X) = e } ⊂ MC,

and the extremal points of the trigonometric approximation TA and of MC
coincide.
Hirschfeld and J. (2003):

All one- and two-dimensional boundary manifolds of MC are contained in TA.
(In particular, for n = 3 we have MC = TA.)

Consequence: Each locally optimal vertex of TA is also globally optimal over
TA as well as over MC.

("Local minimization of a linear function over TA is (probably)
NP-complete.")

The set TA is explicitly given by two analytic semidefiniteness constraints;
it is star-shaped, contains a ball of radius 1, and is contained in a ball of
radius n. Local minimization over TA is the solution of a nonlinear
semidefinite program.

But the constraint qualification is violated: the tangential cone at any
vertex has a nonempty interior, but the interior of the tangential cone at
midpoints of certain edges is empty.
The sets MC and TA