
Sequential Convex Programming

• sequential convex programming

• alternating convex optimization

• convex-concave procedure

EE364b, Stanford University


Methods for nonconvex optimization problems

• convex optimization methods are (roughly) always global, always fast

• for general nonconvex problems, we have to give up one of these

– local optimization methods are fast, but need not find the global solution (and even when they do, cannot certify it)

– global optimization methods find the global solution (and certify it), but are not always fast (indeed, are often slow)

• this lecture: local optimization methods based on solving a sequence of convex problems


Sequential convex programming (SCP)

• a local optimization method for nonconvex problems that leverages convex optimization

– convex portions of a problem are handled ‘exactly’ and efficiently

• SCP is a heuristic

– it can fail to find an optimal (or even feasible) point
– results can (and often do) depend on the starting point
(can run the algorithm from many initial points and take the best result)

• SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value


Problem

we consider the nonconvex problem

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, . . . , m
            hi(x) = 0,  i = 1, . . . , p

with variable x ∈ Rn

• f0 and fi (possibly) nonconvex

• hi (possibly) non-affine


Basic idea of SCP

• maintain estimate of solution x(k), and convex trust region T (k) ⊂ Rn

• form convex approximation f̂i of fi over trust region T (k)

• form affine approximation ĥi of hi over trust region T (k)

• x(k+1) is an optimal point for the approximate convex problem

minimize    f̂0(x)
subject to  f̂i(x) ≤ 0,  i = 1, . . . , m
            ĥi(x) = 0,  i = 1, . . . , p
            x ∈ T (k)
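the loop above can be sketched in one dimension; this is an illustrative toy, not from the slides (the objective f and the radius ρ = 0.2 are made up): the convex model is the PSD part of the second-order Taylor expansion, minimized over a box trust region

```python
import numpy as np

# Toy SCP loop in 1-D: convexify f at x_k via the PSD part of the
# second-order Taylor expansion, minimize over the box trust region.
def f(x):   return x**4 - 3 * x**2 + x
def df(x):  return 4 * x**3 - 6 * x + 1
def d2f(x): return 12 * x**2 - 6

def scp_step(xk, rho):
    g, H = df(xk), d2f(xk)
    p = max(H, 0.0)                  # PSD part of the (scalar) Hessian
    lo, hi = xk - rho, xk + rho      # trust region T = [xk - rho, xk + rho]
    if p > 0:
        x_new = xk - g / p           # unconstrained minimizer of the convex model
    else:
        x_new = lo if g > 0 else hi  # affine model: move to the better endpoint
    return min(max(x_new, lo), hi)   # keep the step inside the trust region

x = 2.0
for _ in range(50):
    x = scp_step(x, rho=0.2)
# x is now near a local minimizer of f (df(x) ≈ 0)
```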


Trust region

• typical trust region is box around current point:

T (k) = {x | |xi − x(k)i | ≤ ρi, i = 1, . . . , n}

• if xi appears only in convex inequalities and affine equalities, can take ρi = ∞


Affine and convex approximations via Taylor expansions

• (affine) first-order Taylor expansion:

f̂(x) = f(x(k)) + ∇f(x(k))T (x − x(k))

• (convex part of) second-order Taylor expansion:

f̂(x) = f(x(k)) + ∇f(x(k))T (x − x(k)) + (1/2)(x − x(k))T P (x − x(k))

P = (∇2f(x(k)))+, the PSD part of the Hessian

• these give local approximations, which don’t depend on the trust region radii ρi
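the PSD part (∇2f(x(k)))+ can be computed from an eigendecomposition; a minimal sketch (the matrix H is made-up illustrative data):

```python
import numpy as np

# PSD part of a symmetric matrix: clip negative eigenvalues at zero
# and reassemble. For symmetric H this is the closest PSD matrix to H
# in Frobenius norm.
def psd_part(H):
    w, V = np.linalg.eigh(H)        # eigendecomposition of symmetric H
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T

H = np.array([[1.0, 2.0],
              [2.0, -3.0]])         # indefinite Hessian (example data)
P = psd_part(H)                     # P is symmetric PSD
```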


Quadratic trust regions

• full second-order Taylor expansion:

f̂(x) = f(x(k)) + ∇f(x(k))T (x − x(k)) + (1/2)(x − x(k))T ∇2f(x(k))(x − x(k))

• trust region is a compact ellipsoid around the current point: for some P ≻ 0,

T (k) = {x | (x − x(k))T P (x − x(k)) ≤ ρ}

• update is any x(k+1) for which there is λ ≥ 0 s.t.

∇2f(x(k)) + λP ⪰ 0,   λ((x(k+1) − x(k))T P (x(k+1) − x(k)) − ρ) = 0,

(∇2f(x(k)) + λP )(x(k+1) − x(k)) = −∇f(x(k))


Particle method

• particle method:

– choose points z1, . . . , zK ∈ T (k)

(e.g., all vertices, some vertices, grid, random, . . . )
– evaluate yi = f(zi)
– fit data (zi, yi) with a convex (or affine) function

(using convex optimization)

• advantages:

– handles nondifferentiable functions, or functions for which evaluating derivatives is difficult

– gives regional models, which depend on the current point and the trust region radii ρi


Fitting affine or quadratic functions to data

fit a convex quadratic function to data (zi, yi):

minimize    ∑Ki=1 ( (zi − x(k))T P (zi − x(k)) + qT (zi − x(k)) + r − yi )2
subject to  P ⪰ 0

with variables P ∈ Sn, q ∈ Rn, r ∈ R

• can use other objectives, add other convex constraints

• no need to solve exactly

• this problem is solved for each nonconvex constraint, each SCP step
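the affine special case of this fit is an ordinary least-squares problem; a minimal sketch (the function f, sizes, and trust region are made-up illustrative data, and the full convex-quadratic fit with P ⪰ 0 would need an SDP solver):

```python
import numpy as np

# Particle-method fit, affine special case: sample K points z_i in the box
# trust region, evaluate y_i = f(z_i), and fit a^T z + b by least squares.
rng = np.random.default_rng(0)
n, K = 3, 50
f = lambda z: np.sin(z[0]) + z[1] * z[2]     # some nonconvex function

xk, rho = np.zeros(n), 0.2
Z = xk + rho * (2 * rng.random((K, n)) - 1)  # K particles in the trust region
y = np.array([f(z) for z in Z])

A = np.hstack([Z, np.ones((K, 1))])          # columns for slope a and offset b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef[:n], coef[n]
# a, b define the affine regional model z -> a^T z + b over the trust region
```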


Quasi-linearization

• a cheap and simple method for affine approximation

• write h(x) as A(x)x+ b(x) (many ways to do this)

• use ĥ(x) = A(x(k))x + b(x(k))

• example:

h(x) = (1/2)xTPx+ qTx+ r = ((1/2)Px+ q)Tx+ r

• hql(x) = ((1/2)Px(k) + q)Tx+ r

• htay(x) = (Px(k) + q)T (x− x(k)) + h(x(k))
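a quick numerical check of this example (P, q, r, x(k) are made-up illustrative data): both hql and htay agree with h at x(k), but they have different slopes in general

```python
import numpy as np

# Quasi-linearized vs. Taylor affine approximations of a quadratic h.
P = np.array([[2.0, 1.0], [1.0, -1.0]])
q = np.array([1.0, -2.0])
r = 0.5

h     = lambda x: 0.5 * x @ P @ x + q @ x + r
xk    = np.array([1.0, 2.0])
h_ql  = lambda x: (0.5 * P @ xk + q) @ x + r         # quasi-linearization
h_tay = lambda x: (P @ xk + q) @ (x - xk) + h(xk)    # first-order Taylor
# both match h at xk; the slopes (1/2)P xk + q and P xk + q differ
```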


Example

• nonconvex QP

minimize    f(x) = (1/2)xT Px + qT x
subject to  ‖x‖∞ ≤ 1

with P symmetric but not PSD

• use approximation

f(x(k)) + (Px(k) + q)T (x− x(k)) + (1/2)(x− x(k))TP+(x− x(k))


• example with x ∈ R20

• SCP with ρ = 0.2, started from 10 different points

[figure: f(x(k)) versus iteration k for the 10 runs]

• runs typically converge to points between −60 and −50

• dashed line shows lower bound on optimal value ≈ −66.5


Lower bound via Lagrange dual

• write the constraints as x2i ≤ 1 and form the Lagrangian

L(x, λ) = (1/2)xT Px + qT x + ∑ni=1 λi(x2i − 1)
        = (1/2)xT (P + 2 diag(λ))x + qT x − 1T λ

• g(λ) = −(1/2)qT (P + 2 diag(λ))−1 q − 1T λ; need P + 2 diag(λ) ≻ 0

• solve the dual problem to get the best lower bound:

maximize    −(1/2)qT (P + 2 diag(λ))−1 q − 1T λ
subject to  λ ⪰ 0,  P + 2 diag(λ) ≻ 0
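evaluating g(λ) at any λ ⪰ 0 with P + 2 diag(λ) ≻ 0 already gives a valid lower bound by weak duality; a minimal sketch (P, q, and λ are made-up illustrative data, not the instance from the slides):

```python
import numpy as np

# Dual lower bound for the nonconvex box-constrained QP:
# g(lam) = -(1/2) q^T (P + 2 diag(lam))^{-1} q - 1^T lam.
rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
P = B + B.T                          # symmetric, typically indefinite
q = rng.standard_normal(n)

lam = np.full(n, 10.0)               # large enough that P + 2 diag(lam) > 0
M = P + 2 * np.diag(lam)

g = -0.5 * q @ np.linalg.solve(M, q) - lam.sum()   # dual lower bound

x = np.clip(rng.standard_normal(n), -1, 1)         # any feasible point
fx = 0.5 * x @ P @ x + q @ x
# weak duality: g <= fx for every feasible x
```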


Some (related) issues

• approximate convex problem can be infeasible

• how do we evaluate progress when x(k) isn’t feasible? need to take into account

– objective f0(x(k))

– inequality constraint violations fi(x(k))+

– equality constraint violations |hi(x(k))|

• controlling the trust region size

– ρ too large: approximations are poor, leading to bad choice of x(k+1)

– ρ too small: approximations are good, but progress is slow


Exact penalty formulation

• instead of the original problem, we solve the unconstrained problem

minimize φ(x) = f0(x) + λ ( ∑mi=1 fi(x)+ + ∑pi=1 |hi(x)| )

where λ > 0

• for λ large enough, the minimizer of φ is the solution of the original problem

• for SCP, use the convex approximation

φ̂(x) = f̂0(x) + λ ( ∑mi=1 f̂i(x)+ + ∑pi=1 |ĥi(x)| )

• approximate problem always feasible


Trust region update

• judge algorithm progress by the decrease in φ, using the solution x̄ of the approximate problem

• decrease with approximate objective: δ̂ = φ̂(x(k)) − φ̂(x̄)
(called the predicted decrease)

• decrease with exact objective: δ = φ(x(k)) − φ(x̄)

• if δ ≥ αδ̂, ρ(k+1) = βsuccρ(k), x(k+1) = x̄
(α ∈ (0, 1), βsucc ≥ 1; typical values α = 0.1, βsucc = 1.1)

• if δ < αδ̂, ρ(k+1) = βfailρ(k), x(k+1) = x(k)
(βfail ∈ (0, 1); typical value βfail = 0.5)

• interpretation: if the actual decrease is more (less) than a fraction α of the predicted decrease, then increase (decrease) the trust region size
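the update rule above fits in a few lines (using the slide's typical parameter values; the numeric inputs in the example are made up):

```python
# Trust region update: compare actual decrease delta against a fraction
# alpha of the predicted decrease delta_hat.
def update(phi_k, phi_hat_bar, phi_bar, rho_k,
           alpha=0.1, beta_succ=1.1, beta_fail=0.5):
    """Return (accept, new_rho) for one SCP trust region update."""
    delta_hat = phi_k - phi_hat_bar   # predicted decrease (convexified phi)
    delta     = phi_k - phi_bar       # actual decrease (exact phi)
    if delta >= alpha * delta_hat:
        return True, beta_succ * rho_k    # accept x_bar, grow trust region
    return False, beta_fail * rho_k       # reject, shrink trust region

accept, rho = update(phi_k=10.0, phi_hat_bar=6.0, phi_bar=7.0, rho_k=1.0)
# actual decrease 3.0 >= 0.1 * predicted 4.0, so accept with rho = 1.1
```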


Nonlinear optimal control

[figure: 2-link manipulator with joint angles θ1, θ2, applied torques τ1, τ2, and link lengths/masses l1, m1 and l2, m2]

• 2-link system, controlled by torques τ1 and τ2 (no gravity)


• dynamics given by M(θ)θ̈ + W(θ, θ̇)θ̇ = τ, with

M(θ) = [ (m1 + m2)l1²          m2l1l2(s1s2 + c1c2)
         m2l1l2(s1s2 + c1c2)   m2l2²               ]

W(θ, θ̇) = [ 0                         m2l1l2(s1c2 − c1s2)θ̇2
             m2l1l2(s1c2 − c1s2)θ̇1    0                     ]

si = sin θi, ci = cos θi

• nonlinear optimal control problem:

minimize    J = ∫T0 ‖τ(t)‖2² dt
subject to  θ(0) = θinit, θ̇(0) = 0, θ(T) = θfinal, θ̇(T) = 0
            ‖τ(t)‖∞ ≤ τmax, 0 ≤ t ≤ T


Discretization

• discretize with time interval h = T/N

• J ≈ h ∑Ni=1 ‖τi‖2², with τi = τ(ih)

• approximate derivatives as

θ̇(ih) ≈ (θi+1 − θi−1)/(2h),   θ̈(ih) ≈ (θi+1 − 2θi + θi−1)/h²

• approximate dynamics as a set of nonlinear equality constraints:

M(θi) (θi+1 − 2θi + θi−1)/h² + W(θi, (θi+1 − θi−1)/(2h)) (θi+1 − θi−1)/(2h) = τi

• θ0 = θ1 = θinit; θN = θN+1 = θfinal
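the central-difference formulas above are standard O(h²) approximations; a quick numerical check on a smooth test signal (θ(t) = sin t is a made-up example, not the manipulator trajectory):

```python
import numpy as np

# Central differences: first and second derivative of theta(t) = sin t.
h = 1e-3
t = 1.234
theta = lambda s: np.sin(s)

d1 = (theta(t + h) - theta(t - h)) / (2 * h)               # ~ cos t
d2 = (theta(t + h) - 2 * theta(t) + theta(t - h)) / h**2   # ~ -sin t
# both approximations have O(h^2) error
```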


• discretized nonlinear optimal control problem:

minimize    h ∑Ni=1 ‖τi‖2²
subject to  θ0 = θ1 = θinit, θN = θN+1 = θfinal
            ‖τi‖∞ ≤ τmax, i = 1, . . . , N
            M(θi)(θi+1 − 2θi + θi−1)/h² + W(θi, (θi+1 − θi−1)/(2h))(θi+1 − θi−1)/(2h) = τi

• replace the equality constraints with their quasilinearized versions

M(θ(k)i)(θi+1 − 2θi + θi−1)/h² + W(θ(k)i, (θ(k)i+1 − θ(k)i−1)/(2h))(θi+1 − θi−1)/(2h) = τi

• trust region: only on θi

• initialize with θi = θinit + ((i − 1)/(N − 1))(θfinal − θinit), i = 1, . . . , N


Numerical example

• m1 = 1, m2 = 5, l1 = 1, l2 = 1

• N = 40, T = 10

• θinit = (0,−2.9), θfinal = (3, 2.9)

• τmax = 1.1

• α = 0.1, βsucc = 1.1, βfail = 0.5, ρ(1) = 90◦

• λ = 2


SCP progress

[figure: φ(x(k)) versus iteration k]


Convergence of J and torque residuals

[figures: J(k) versus k; sum of torque residuals versus k (log scale)]


Predicted and actual decreases in φ

[figures: predicted decrease δ̂ (dotted) and actual decrease δ (solid) versus k; trust region size ρ(k) in degrees versus k (log scale)]


Trajectory plan

[figures: planned torques τ1, τ2 versus t; planned joint angles θ1, θ2 versus t]


Convex composite

• general form: for h : Rm → R convex, c : Rn → Rm smooth,

f(x) = h(c(x))

• exact penalty formulation of

minimize f(x) subject to c(x) = 0

• approximate f locally by convex approximation: near x,

f(y) ≈ f̂x(y) = h(c(x) + ∇c(x)T (y − x))


Convex composite (prox-linear) algorithm

given function f = h ◦ c and convex domain C,
line search parameters α ∈ (0, 0.5), β ∈ (0, 1), stopping tolerance ε > 0

k := 0
repeat
    use model f̂ = f̂x(k)
    set x(k+1) = argminx∈C f̂(x) and direction ∆(k+1) = x(k+1) − x(k)
    set δ(k) = f̂(x(k) + ∆(k+1)) − f(x(k))
    set t = 1
    while f(x(k) + t∆(k+1)) ≥ f(x(k)) + αtδ(k): t := βt
    if ‖∆(k+1)‖2/t ≤ ε, quit
    k := k + 1


Nonlinear measurements (phase retrieval)

• phase retrieval problem: for ai ∈ Cn, x⋆ ∈ Cn, observe

bi = |a∗i x⋆|2, i = 1, . . . , m

• goal is to find x; natural objectives are of the form

f(x) = ‖ |Ax|2 − b ‖

• “robust” phase retrieval problem:

f(x) = ∑mi=1 | |a∗i x|2 − bi |

• or quadratic objective:

f(x) = (1/2) ∑mi=1 ( |a∗i x|2 − bi )2


Numerical example

• m = 200, n = 50, over reals R (sign retrieval)

• Generate 10 independent examples, A ∈ Rm×n, b = |Ax⋆|2,

Aij ∼ N (0, 1), x⋆ ∼ N (0, I)

• Two sets of experiments: initialize at

x(0) ∼ N (0, I) or x(0) ∼ N (x⋆, I)

• Use h(z) = ‖z‖1 or h(z) = ‖z‖22, c(x) = (Ax)2 − b.
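with the squared loss h(z) = ‖z‖22, each prox-linear step reduces to a linear least-squares solve (the Gauss–Newton method); a minimal sketch on the sign retrieval setup above, with made-up assumptions: a fixed random seed, a good initialization (x⋆ plus small noise), and plain undamped steps chosen so the iteration converges

```python
import numpy as np

# Gauss-Newton sketch for sign retrieval: c(x) = (Ax)^2 - b, f = (1/2)||c||^2.
# Jacobian of c at x is J = 2 diag(Ax) A; each step solves J s = -c(x)
# in the least-squares sense.
rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = (A @ x_star) ** 2

c = lambda x: (A @ x) ** 2 - b
x = x_star + 0.1 * rng.standard_normal(n)    # good initialization (assumed)
for _ in range(20):
    J = 2 * (A @ x)[:, None] * A             # Jacobian of c at x
    step, *_ = np.linalg.lstsq(J, -c(x), rcond=None)
    x = x + step
# up to a global sign, x recovers x_star (zero-residual problem)
```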


Numerical example (absolute loss, random initialization)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]


Numerical example (absolute loss, good initialization)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]


Numerical example (squared loss, random init)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]


Numerical example (squared loss, good init)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]


Extensions and convergence of basic prox-linear method

• regularization or “trust” region: update

x(k+1) = argminx∈C { h(c(x(k)) + ∇c(x(k))T (x − x(k))) + (1/(2αk)) ‖x − x(k)‖22 }

• with line search or αk small enough, and a lower bound infx f(x) = infx h(c(x)) > −∞, convergence to a stationary point is guaranteed

• when h(z) = ‖z‖22, this is often called the ‘Gauss–Newton’ method; some variants are called ‘Levenberg–Marquardt’


‘Difference of convex’ programming

• express problem as

minimize    f0(x) − g0(x)
subject to  fi(x) − gi(x) ≤ 0, i = 1, . . . , m

where fi and gi are convex

• fi − gi are called ‘difference of convex’ functions

• problem is sometimes called ‘difference of convex programming’


Convex-concave procedure

• obvious convexification at x(k): replace f(x) − g(x) with

f̂(x) = f(x) − g(x(k)) − ∇g(x(k))T (x − x(k))

• since f̂(x) ≥ f(x) − g(x) for all x, no trust region is needed

– the true objective at x is better than the convexified objective
– the true feasible set contains the feasible set of the convexified problem

• SCP here is sometimes called the ‘convex-concave procedure’
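a one-line toy instance of the procedure (this example is made up for illustration): minimize x⁴ − 2x², written as f(x) − g(x) with f(x) = x⁴ and g(x) = 2x² both convex; linearizing g at x(k) gives argminx { x⁴ − 4x(k)x }, i.e. x(k+1) = x(k)^(1/3)

```python
# Convex-concave procedure on x^4 - 2x^2 = f(x) - g(x),
# f(x) = x^4 (convex), g(x) = 2x^2 (convex).
# Each iteration has the closed form x_{k+1} = x_k^(1/3).
x = 0.5                       # made-up starting point (stays positive)
for _ in range(60):
    x = x ** (1 / 3)          # closed-form CCP update
# x converges to the stationary point x = 1, a minimizer of x^4 - 2x^2
```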


Example (BV §7.1)

• given samples y1, . . . , yN ∈ Rn from N (0,Σtrue)

• negative log-likelihood function is

f(Σ) = log det Σ + Tr(Σ−1Y ),   Y = (1/N) ∑Ni=1 yiyTi

(dropping a constant and a positive scale factor)

• ML estimate of Σ, with prior knowledge Σij ≥ 0:

minimize    f(Σ) = log det Σ + Tr(Σ−1Y )
subject to  Σij ≥ 0, i, j = 1, . . . , n

with variable Σ (the constraint Σ ≻ 0 is implicit)


• the first term in f is concave; the second term is convex

• linearize the first term in the objective to get

f̂(Σ) = log det Σ(k) + Tr( (Σ(k))−1(Σ − Σ(k)) ) + Tr(Σ−1Y )


Numerical example

convergence of problem instance with n = 10, N = 15

[figure: f(Σ) versus iteration k]


Alternating convex optimization

• given nonconvex problem with variable (x1, . . . , xn) ∈ Rn

• I1, . . . , Ik ⊂ {1, . . . , n} are index subsets with ⋃j Ij = {1, . . . , n}

• suppose the problem is convex in the variables xi, i ∈ Ij, when xi, i ∉ Ij are fixed

• alternating convex optimization method: cycle through j, in each step optimizing over the variables xi, i ∈ Ij

• special case: bi-convex problem

– x = (u, v); problem is convex in u (v) with v (u) fixed
– alternate optimizing over u and v


Nonnegative matrix factorization

• NMF problem:

minimize    ‖A − XY ‖F
subject to  Xij ≥ 0, Yij ≥ 0

with variables X ∈ Rm×k, Y ∈ Rk×n and data A ∈ Rm×n

• a difficult problem, except for a few special cases (e.g., k = 1)

• alternating convex optimization: solve QPs to optimize over X, then Y , then X, . . .
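a minimal sketch of the alternation (sizes, seed, and data are made up; a simple projected-gradient loop stands in here for the QP solver, since each subproblem is a nonnegative least-squares problem):

```python
import numpy as np

# Alternating convex optimization for NMF: fix Y and solve a nonnegative
# least-squares problem for X, then fix X and solve for Y, and repeat.
def nnls_pg(M, B, X0, iters=200):
    """Approximately minimize ||M X - B||_F over X >= 0 by projected gradient."""
    L = np.linalg.norm(M, 2) ** 2            # Lipschitz constant of the gradient
    X = X0.copy()
    for _ in range(iters):
        X = np.maximum(X - (M.T @ (M @ X - B)) / L, 0.0)
    return X

rng = np.random.default_rng(0)
m, n, k = 20, 15, 3
A = rng.random((m, k)) @ rng.random((k, n))  # data with an exact rank-k NMF
X = rng.random((m, k))
Y = rng.random((k, n))

res = [np.linalg.norm(A - X @ Y, 'fro')]
for _ in range(10):
    X = nnls_pg(Y.T, A.T, X.T).T             # optimize over X with Y fixed
    Y = nnls_pg(X, A, Y)                     # optimize over Y with X fixed
    res.append(np.linalg.norm(A - X @ Y, 'fro'))
# the residuals res are nonincreasing (each subproblem step is a descent step)
```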


Example

• convergence for an example with m = n = 50, k = 5 (five starting points)

[figure: ‖A − XY ‖F versus iteration k for the five starting points]
