
Lipschitzian Piecewise Smooth Minimization via Algorithmic Differentiation

Sabrina Fiege¹, Andreas Griewank², Andrea Walther¹

¹ Institut für Mathematik, Universität Paderborn; ² Yachay Tech, Ecuador

18th Euro AD Workshop 2015, Paderborn, Germany

Motivation

New Optimization Approach

Our goal: Locate local optima of a piecewise smooth function by

successive approximation with piecewise linear models (⇒ piecewise linearization), and

explicit handling of the kink structure in the PL model.

Hierarchy of problems:

locally Lipschitz continuous

⊃ piecewise smooth (PS)

⊃ piecewise linear (PL)

⊃ piecewise linear and convex

S. Fiege, A. Griewank, and A. Walther 1 / 30 December 1, 2015

Lipschitz Optimization based on gray-box piecewise linearization, A. Griewank, A. Walther, S. Fiege, T. Bosse, Mathematical Programming, 2015


Work in progress! Today's talk.



Observations

Solving $\min f(x)$ with $f$ PL is not easy:

Global minimization is NP-hard.

Steepest descent with exact line search may fail.

Zeno behaviour is possible, i.e., a solution trajectory with an infinite number of direction changes in a finite amount of time.

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, Springer, 1993.

[Figure: surface plot of f(x, y) and contour plot in the (x1, x2) plane showing the nondifferentiable points of f, the pieces f0, f1, f2, f−1, f−2, and the starting point x0 = (9, −3)]



Assumptions

We consider Lipschitzian piecewise smooth functions

$f : \mathbb{R}^n \to \mathbb{R}$.

All nondifferentiabilities are incorporated via abs().

Hence $\min(u, v) = (u + v - \mathrm{abs}(v - u))/2$ and $\max(u, v) = (u + v + \mathrm{abs}(v - u))/2$ are covered, and so are complementarity conditions.

Handling of abs() is included in the algorithmic differentiation tool ADOL-C.
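As a small sanity check of the two identities, the following Python sketch (our own illustration, not part of ADOL-C) evaluates min and max using abs() only. It also expresses a complementarity condition $u \ge 0$, $v \ge 0$, $uv = 0$ in the equivalent form $\min(u, v) = 0$. The names `abs_min`, `abs_max` and `complementary` are hypothetical.

```python
def abs_min(u, v):
    """min(u, v) expressed through abs() only."""
    return (u + v - abs(v - u)) / 2


def abs_max(u, v):
    """max(u, v) expressed through abs() only."""
    return (u + v + abs(v - u)) / 2


def complementary(u, v, tol=1e-12):
    """u >= 0, v >= 0, u * v = 0 holds exactly when min(u, v) = 0."""
    return abs(abs_min(u, v)) <= tol


# The identities agree with the built-in min/max on a few sample pairs.
for u, v in [(3.0, -1.5), (0.0, 2.0), (-4.0, -4.0)]:
    assert abs_min(u, v) == min(u, v) and abs_max(u, v) == max(u, v)
```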


AD Drivers

Outline

1 Motivation

2 AD Drivers: Piecewise Linearization, Directional Active Gradient, Abs-normal Form

3 Lipschitzian Piecewise Smooth Minimization: Minimization of Piecewise Linear Functions, Minimization of Piecewise Smooth Function, Numerical Results

4 Conclusion and Outlook


AD Drivers: Piecewise Linearization

Adapted Evaluation Procedure for PS Objectives

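$$
\begin{aligned}
v_{i-n} &= x_i, && i = 1, \dots, n\\
z_i &= \psi_i\big((v_j)_{j \prec i}\big),\quad \sigma_i = \operatorname{sign}(z_i),\quad v_i = \sigma_i z_i = \operatorname{abs}(z_i), && i = 1, \dots, s\\
y &= \psi_{s+1}\big((v_j)_{j \prec s+1}\big)
\end{aligned}
$$

Table: Reduced evaluation procedure

$s \in \mathbb{N}$ is the number of evaluations of the absolute value function. $\sigma \in \{-1, 0, 1\}^s$ is called the signature vector, and $z \in \mathbb{R}^s$ is called the switching vector.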
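To illustrate the procedure, the Python sketch below hand-codes it for the function $f(x_1, x_2) = \max\{x_2^2 - \max\{x_1, 0\}, 0\}$, which serves as an example later in the talk, with both max operations rewritten via abs(). Each call of abs() creates one switching variable $z_i$, so $s = 2$ here. The function name is hypothetical, and this is only a hand-written illustration, not how ADOL-C's drivers work internally.

```python
def evaluate_with_switches(x1, x2):
    """Reduced evaluation procedure for f(x1, x2) = max{x2^2 - max{x1, 0}, 0}.
    Returns y = f(x), the switching vector z and the signature vector sigma."""
    z, sigma = [], []

    def switch(zi):
        # z_i = psi_i(...), sigma_i = sign(z_i), v_i = sigma_i * z_i = |z_i|
        z.append(zi)
        sigma.append((zi > 0) - (zi < 0))
        return abs(zi)

    v1 = switch(x1)                # max(x1, 0) = (x1 + |x1|) / 2
    w = x2 * x2 - (x1 + v1) / 2    # smooth elemental operations
    v2 = switch(w)                 # max(w, 0) = (w + |w|) / 2
    y = (w + v2) / 2
    return y, z, sigma
```

At $x_0 = (-1, 1)$ this gives $y = 1$, $z = (-1, 1)$ and $\sigma = (-1, 1)$.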



Piecewise Linearization

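Construction of a tangent approximation for each elemental function:

$$
\begin{aligned}
\Delta v_i &= \Delta v_j \pm \Delta v_k && \text{for } v_i = v_j \pm v_k\\
\Delta v_i &= v_j\,\Delta v_k + v_k\,\Delta v_j && \text{for } v_i = v_j \cdot v_k\\
\Delta v_i &= \varphi_i'\big((v_j)_{j \prec i}\big)\,\Delta (v_j)_{j \prec i} && \text{for } v_i = \varphi_i\big((v_j)_{j \prec i}\big) \ne \operatorname{abs}(v_j)\\
\Delta v_i &= \operatorname{abs}(v_j + \Delta v_j) - v_i && \text{for } v_i = \operatorname{abs}(v_j)
\end{aligned}
$$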

One obtains the piecewise linearization

$$ f_{PL,x}(\Delta x) = f(x) + \Delta f(x; \Delta x) $$

of the original PS function $f$ at a point $x$ with the argument $\Delta x$.

Andreas Griewank. On stable piecewise linearization and generalized algorithmic differentiation, Optimization Methods & Software, 28(6), 1139–1178, 2013.
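As an illustration of these propagation rules, the hypothetical Python sketch below linearizes $f(x_1, x_2) = |x_1 x_2 - 1|$ by applying the product rule, the subtraction rule and the abs rule to one evaluation procedure. Since the only nonlinear smooth term is the product, the error $|f(x + \Delta x) - f_{PL,x}(\Delta x)|$ is bounded here by $|\Delta x_1 \Delta x_2| \le \|\Delta x\|^2$.

```python
def f(x):
    return abs(x[0] * x[1] - 1.0)


def f_pl(x, dx):
    """f_PL,x(dx) = f(x) + Delta f(x; dx) for f(x1, x2) = |x1 * x2 - 1|."""
    # v1 = x1 * x2, product rule: Delta v1 = x1 * Delta x2 + x2 * Delta x1
    v1 = x[0] * x[1]
    dv1 = x[0] * dx[1] + x[1] * dx[0]
    # v2 = v1 - 1, subtraction rule: Delta v2 = Delta v1
    v2, dv2 = v1 - 1.0, dv1
    # v3 = abs(v2), abs rule: Delta v3 = abs(v2 + Delta v2) - v3
    v3 = abs(v2)
    dv3 = abs(v2 + dv2) - v3
    return v3 + dv3
```

At the kink $x = (1, 1)$ with $\Delta x = (0.5, -0.25)$ the PL gives $0.25$, while $f(1.5, 0.75) = 0.125$.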



Example: Minimum and Maximum

Remark: The linearizations of the max and min functions are the maximum and the minimum, respectively, of the linearized arguments.

[Figure: max{x² − 1, −0.1(x − 2)³ + 1} (first panel) and min{x² − 1, −0.1(x − 2)³ + 1} (second panel) for x ∈ [−4, 4], each shown with the linearizations of x² − 1 and −0.1(x − 2)³ + 1 and with the maximum (resp. minimum) of the two linearizations]
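The remark can be checked numerically. The Python sketch below applies the propagation rules to $\max\{u, v\}$ and $\min\{u, v\}$ with $u(x) = x^2 - 1$ and $v(x) = -0.1(x - 2)^3 + 1$, written as $(u + v \pm |v - u|)/2$ as on the assumptions slide. The result coincides with the maximum (minimum) of the two tangent lines. The function name `pl_max_min` is our own.

```python
def pl_max_min(x, dx, sign=1):
    """f_PL,x(dx) for f = max{u, v} (sign=+1) or min{u, v} (sign=-1), where
    u(x) = x^2 - 1, v(x) = -0.1 (x - 2)^3 + 1 and f = (u + v + sign * |v - u|) / 2."""
    u, du = x * x - 1.0, 2.0 * x * dx                                # smooth rule for u
    v, dv = -0.1 * (x - 2.0) ** 3 + 1.0, -0.3 * (x - 2.0) ** 2 * dx  # smooth rule for v
    w, dw = v - u, dv - du                                           # subtraction rule
    a = abs(w)
    da = abs(w + dw) - a                                             # abs rule
    y = (u + v + sign * a) / 2.0
    dy = (du + dv + sign * da) / 2.0                                 # sums and scaling by 1/2
    return y + dy
```

Expanding $f(x) + \Delta f$ gives $\big((u + \Delta u) + (v + \Delta v) \pm |(v + \Delta v) - (u + \Delta u)|\big)/2$, which is exactly the max (min) of the linearized arguments.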



AD Drivers provided by ADOL-C

zos_pl_forward(tag,1,n,1,x,y,z)

Evaluates the PL at $x$ and returns the function value $y$ and the switching vector $z$ at that point.

s=get_num_switches(tag)

Returns the number of evaluations of the absolute value function.

fos_pl_forward(tag,1,n,x,deltax,y,deltay,z,deltaz)

Computes the increment $\Delta y = \Delta f(x; \Delta x)$ and additionally returns the switching vector $z$ and its linearization $\Delta z$.

ADOL-C: https://projects.coin-or.org/ADOL-C


AD Drivers: Directional Active Gradient

Selection Functions and Limiting Gradients

PS functions can be represented by selection functions $f_\sigma$ as

$$ f(x) \in \{ f_\sigma(x) : \sigma \in E \subset \{-1, 0, 1\}^s \}, $$

where the selection functions $f_\sigma$ are continuously differentiable on open neighborhoods of points.

The Clarke subdifferential is given by

$$ \partial f(x) \equiv \operatorname{conv}\big(\partial^L f(x)\big) \quad\text{with}\quad \partial^L f(x) \equiv \{ \nabla f_\sigma(x) : f_\sigma(x) = f(x) \}, $$

where the elements of $\partial^L f(x)$ are called limiting gradients.



AD Drivers provided by ADOL-C

A directionally active gradient $g$ is given by

$$ g \equiv g(x; d) \in \partial^L f(x) \quad\text{such that}\quad f'(x; d) = g^\top d, $$

and $g(x; d)$ equals $\nabla f_\sigma(x)$ of a locally differentiable selection function $f_\sigma$.

directional_active_gradient(tag,n,x,d,g)

Returns $g(x; d)$ at a given point $x$ and a given direction $d$.
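For a pointwise maximum of affine pieces, a directionally active gradient can be picked by hand: among the pieces active at $x$, take one with the largest slope along $d$. The Python sketch below does this for the function from the numerical example, $f(x_1, x_2) = \max\{-100, 3x_1 - 2x_2, 2x_1 - 5x_2, 3x_1 + 2x_2, 2x_1 + 5x_2\}$. It is a hypothetical helper, not ADOL-C's directional_active_gradient, and it assumes that this maximizer is unique; ties would need a further tie-breaking rule.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# f(x) = max{-100, 3x1 - 2x2, 2x1 - 5x2, 3x1 + 2x2, 2x1 + 5x2}, stored as (gradient, offset)
PIECES = [((0, 0), -100), ((3, -2), 0), ((2, -5), 0), ((3, 2), 0), ((2, 5), 0)]


def f(x):
    return max(dot(a, x) + b for a, b in PIECES)


def directional_active_gradient_max_affine(x, d, tol=1e-12):
    """Hypothetical g(x; d): among the pieces active at x, one with the largest
    slope a^T d. If that maximizer is unique, the piece stays active on x + t d
    for small t > 0, so f'(x; d) = g^T d."""
    vals = [dot(a, x) + b for a, b in PIECES]
    fx = max(vals)
    active = [a for (a, _), v in zip(PIECES, vals) if v >= fx - tol]
    return max(active, key=lambda a: dot(a, d))
```

At $x_0 = (9, -3)$ the pieces $3x_1 - 2x_2$ and $2x_1 - 5x_2$ are both active with value 33; for $d = (-1, 0)$ the sketch selects $g = (2, -5)$, so $f'(x_0; d) = -2$.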


AD Drivers: Abs-normal Form

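The Abs-normal Form for PL Functions (1)

Example

$$ F(x_1, x_2) = x_1 + |z_1| + |z_3| \quad\text{with}\quad z_1 = x_1 - x_2,\quad z_2 = x_2,\quad z_3 = x_1 - |z_2| $$

$$
\begin{bmatrix} z_1\\ z_2\\ z_3\\ y \end{bmatrix}
=
\begin{bmatrix} 0\\ 0\\ 0\\ 0 \end{bmatrix}
+
\begin{bmatrix}
1 & -1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & -1 & 0\\
1 & 0 & 1 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ |z_1|\\ |z_2|\\ |z_3| \end{bmatrix}
$$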
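This matrix form can be evaluated row by row, because $|z_3|$ only depends on $|z_2|$, which is computed before it. The Python sketch below (illustrative names, not an ADOL-C interface) evaluates $z = c_1 + Zx + L|z|$ and $y = c_2 + a^\top x + b^\top |z|$ with the data read off the matrix above ($c_1 = 0$, $c_2 = 0$) and compares the result with a direct evaluation of $F$.

```python
def eval_abs_normal(c1, c2, Z, L, a, b, x):
    """Evaluate z = c1 + Z x + L|z| and y = c2 + a^T x + b^T |z|.
    L is strictly lower triangular, so z_i only needs |z_1|, ..., |z_{i-1}|."""
    z = []
    for i in range(len(c1)):
        zi = c1[i] + sum(Zik * xk for Zik, xk in zip(Z[i], x))
        zi += sum(L[i][j] * abs(z[j]) for j in range(i))  # only earlier switches enter
        z.append(zi)
    y = c2 + sum(ak * xk for ak, xk in zip(a, x)) + sum(bi * abs(zi) for bi, zi in zip(b, z))
    return y, z


# Data of F(x1, x2) = x1 + |z1| + |z3|: columns x1, x2 go to Z and a, columns |z| to L and b
Z = [[1, -1], [0, 1], [1, 0]]
L = [[0, 0, 0], [0, 0, 0], [0, -1, 0]]
a, b = [1, 0], [1, 0, 1]
c1, c2 = [0, 0, 0], 0


def F(x1, x2):
    """Direct evaluation of the example for comparison."""
    z1, z2 = x1 - x2, x2
    z3 = x1 - abs(z2)
    return x1 + abs(z1) + abs(z3)
```

For instance, at $x = (1, 2)$ one obtains $z = (-1, 2, -1)$ and $y = F(1, 2) = 3$.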



The Abs-normal Form for PL Functions (2)

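Definition: Abs-normal form for PL $F : \mathbb{R}^n \to \mathbb{R}$

$$
\begin{bmatrix} z\\ y \end{bmatrix}
=
\begin{bmatrix} c_1\\ c_2 \end{bmatrix}
+
\begin{bmatrix} Z & L\\ a^\top & b^\top \end{bmatrix}
\begin{bmatrix} x\\ |z| \end{bmatrix},
\qquad Z \in \mathbb{R}^{s\times n},\; L \in \mathbb{R}^{s\times s},\; a \in \mathbb{R}^n,\; b \in \mathbb{R}^s,\; c_1 \in \mathbb{R}^s,\; c_2 \in \mathbb{R}
$$

$L$ is strictly lower triangular.

$\Sigma \equiv \operatorname{diag}(\sigma)$ and $|z| = \Sigma z$.

The PL function $f_{PL}$ is an approximation of the PS function.

The PL function $f_{PL,x} \equiv y$ can be written in abs-normal form.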

Andreas Griewank. On stable piecewise linearization and generalized algorithmic differentiation, Optimization Methods & Software, 28(6), 1139–1178, 2013.



The Abs-normal Form for PL Functions (2)


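$$
\begin{bmatrix} z\\ y \end{bmatrix}
=
\begin{bmatrix} c_1\\ c_2 \end{bmatrix}
+
\begin{bmatrix} Z & L\\ a^\top & b^\top \end{bmatrix}
\begin{bmatrix} x\\ \Sigma z \end{bmatrix}
$$

Take the first block row, $z = c_1 + Zx + L\Sigma z$, solve it for $z = (I - L\Sigma)^{-1}(c_1 + Zx)$ (the matrix $I - L\Sigma$ is unit lower triangular and hence invertible), and substitute the result into the second row:

$$
f_\sigma(x) \equiv y = \underbrace{c_2 + b^\top \Sigma (I - L\Sigma)^{-1} c_1}_{\equiv\, \gamma_\sigma(x)} + \underbrace{\big(a^\top + b^\top \Sigma (I - L\Sigma)^{-1} Z\big)}_{\equiv\, g_\sigma(x)}\, x
$$

The abs-normal form represents a PL function $f_\sigma : \mathbb{R}^n \to \mathbb{R}$ with

$$ f_\sigma(x) = \gamma_\sigma(x) + g_\sigma(x) \cdot x. $$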
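For a fixed signature, $\gamma_\sigma$ and $g_\sigma$ need no explicit inverse: $I - L\Sigma$ is unit lower triangular, so $(I - L\Sigma)^{-1}[c_1 \mid Z]$ follows by forward substitution. The Python sketch below (hypothetical function name) does this and applies it to the example $F$ from the first abs-normal slide. For $\sigma = (-1, 1, -1)$, i.e. $x_1 < x_2$ and $x_2 > 0$, one has $|z_1| = |z_3| = x_2 - x_1$ and therefore $F = 2x_2 - x_1$, which the sketch reproduces.

```python
def selection_coefficients(c1, c2, Z, L, a, b, sigma):
    """gamma_sigma = c2 + b^T Sigma (I - L Sigma)^{-1} c1 and
    g_sigma = a^T + b^T Sigma (I - L Sigma)^{-1} Z, computed by forward substitution."""
    s, n = len(c1), len(a)
    # Row i of W = (I - L Sigma)^{-1} [c1 | Z]:  W_i = [c1_i, Z_i] + sum_{j<i} L_ij sigma_j W_j
    W = []
    for i in range(s):
        row = [c1[i]] + list(Z[i])
        for j in range(i):
            coef = L[i][j] * sigma[j]
            row = [r + coef * wj for r, wj in zip(row, W[j])]
        W.append(row)
    gamma = c2 + sum(b[i] * sigma[i] * W[i][0] for i in range(s))
    g = [a[k] + sum(b[i] * sigma[i] * W[i][k + 1] for i in range(s)) for k in range(n)]
    return gamma, g


# Abs-normal data of F(x1, x2) = x1 + |z1| + |z3| (first abs-normal slide)
Z = [[1, -1], [0, 1], [1, 0]]
L = [[0, 0, 0], [0, 0, 0], [0, -1, 0]]
a, b = [1, 0], [1, 0, 1]
c1, c2 = [0, 0, 0], 0
```

On the region with $\sigma = (1, 1, 1)$, i.e. $x_1 > x_2 > 0$, the same function returns $F = 3x_1 - 2x_2$.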



AD Driver provided by ADOL-C


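$$
\begin{bmatrix} z\\ y \end{bmatrix}
=
\begin{bmatrix} c_1\\ c_2 \end{bmatrix}
+
\begin{bmatrix} Z & L\\ a^\top & b^\top \end{bmatrix}
\begin{bmatrix} x\\ \Sigma z \end{bmatrix}
$$

abs_normal(tag,n,x,sigma,y,z,c1,c2,a,b,Z,L)

Computes a PL for a given PS function $f$ and a given point $x$.

Remark: $c_1$, $c_2$, $a$, $b$, $Z$ and $L$ only depend on the PS function $f$.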


Lipschitzian Piecewise Smooth Minimization: Minimization of PL Functions




Description of Polyhedral Structure

The polyhedra $P_\sigma \equiv \{x \in \mathbb{R}^n : \sigma(x) = \sigma\}$

are relatively open and convex,

are mutually disjoint, and their union is the whole $\mathbb{R}^n$.

[Figure: surface plot of f(x, y) and the partition of the (x1, x2) plane into polyhedra labeled by their signatures σ = (−1, −1), (−1, 1), (−1, 0), (1, −1), (1, 1), (1, 0), (0, −1), (0, 1), (0, 0)]




Further properties:

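$f_\sigma$ is essentially active at all points in the closure $\bar P_\sigma$, provided $P_\sigma$ is open.

The corresponding $\sigma$ are called essential, and

$$ E = \{ \sigma \in \{-1, 0, 1\}^s : \emptyset \ne P_\sigma \text{ open} \}. $$

The signature vectors are partially ordered by

$$ \sigma \preceq \tilde\sigma \;:\Longleftrightarrow\; \sigma_i^2 \le \tilde\sigma_i \sigma_i \quad\text{for } 1 \le i \le s, $$

i.e., $\tilde\sigma$ agrees with $\sigma$ in every component where $\sigma_i \ne 0$.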
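Both notions are easy to state in code. The Python sketch below (hypothetical helper names) computes a signature vector from a switching vector and tests the partial order. Since a zero component of $\sigma$ imposes no condition, $(0, 0) \preceq \tilde\sigma$ holds for every $\tilde\sigma$.

```python
def signature(z):
    """sigma(x) = sign(z(x)), componentwise, as a tuple of -1, 0, 1."""
    return tuple((zi > 0) - (zi < 0) for zi in z)


def precedes(sigma, sigma_t):
    """sigma <= sigma_t  iff  sigma_i^2 <= sigma_t_i * sigma_i for all i."""
    return all(s * s <= t * s for s, t in zip(sigma, sigma_t))
```

For example, $(0, 1) \preceq (1, 1)$, but $(1, 1) \not\preceq (0, 1)$.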




Minimization of a PL Function by PLMin()

PLMin(): Preconditions: $x_0 \in \mathbb{R}^n$, $q \ge 0$, $\Delta x = 0$, $\sigma = \sigma(x_0)$

1 Determine the solution $\Delta x$ of the local QP on the current polyhedron $P_\sigma$.

2 Compute the bundle $G$.

3 Compute a direction $d$ that identifies the new polyhedron $P_\sigma$.

4 Update $x_{k+1} = x_k + \Delta x$, $k = k + 1$.

5 If $d = 0$: STOP, else go to 1.



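Step 1: Solve local quadratic problem

Solve the local QP on the current, open polyhedron $P_\sigma$:

$$
\min_{\Delta x}\; f_\sigma + \frac{q}{2}\|\Delta x\|^2
\quad\text{s.t.}\quad
e_i^\top\big(z(x_k) + \nabla z(x_k)^\top \Delta x\big)
\begin{cases} \ge 0 & \text{if } \sigma_i > 0,\\ \le 0 & \text{if } \sigma_i < 0. \end{cases}
$$

This yields $x_{k+1} = x_k + \Delta x$, $\hat\sigma = \sigma(x_{k+1})$, and the active set $\hat A = \{ i : \hat\sigma_i = 0 \text{ or } \lambda_i \ne 0 \}$, where $\lambda_i$ are the Lagrange multipliers of the constraints.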




Step 2 & 3: Compute bundle G and direction d (1)

Given $q \ge 0$ and $\emptyset \ne G \subset \partial^L f(x)$, compute the new direction $d$ by

$$
d(x) = \operatorname{shortest}(qx, G) = \operatorname{argmin}\Big\{ \|d\| \;\Big|\; d = \sum_{j=1}^m \beta_j g_j + qx,\; g_j \in G,\; \beta_j \ge 0,\; \sum_{j=1}^m \beta_j = 1 \Big\}.
$$

Interpretation of $d$:

$d = 0$: stationary point

$(g + qx)^\top d < 0$: direction of descent

$(g + qx)^\top d > 0$: use computeStep() to collect further gradients $g$





computeStep(x, q, G)

repeat { d = −shortest(qx, G); g = g(x; d); G = G ∪ {g} }

until $(g + qx)^\top d \le -\|d\|^2$

G = ∅

return d, G
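The Python sketch below illustrates shortest() and computeStep() under explicit assumptions. The minimum-norm element of $qx + \operatorname{conv}(G)$ is found by brute force over all subsets of the bundle, solving the small KKT system of each affine hull; this is only sensible for tiny bundles, and a practical implementation would use a dedicated QP solver instead. The oracle `max_affine_gradient` is a hypothetical stand-in for ADOL-C's directional_active_gradient, written for the max-of-affine example function from the numerical results. The sketch returns the enlarged bundle, and a small tolerance guards the stopping test against round-off.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, rhs, tol=1e-12):
    """Gaussian elimination with partial pivoting; None if A is numerically singular."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < tol:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def shortest(qx, G):
    """Minimum-norm element of qx + conv(G), by brute force over subsets of G."""
    P = [[gi + qi for gi, qi in zip(g, qx)] for g in G]  # shifted points p_j = g_j + qx
    best = None
    for k in range(1, len(P) + 1):
        for S in combinations(range(len(P)), k):
            # KKT system of: min ||sum_j beta_j p_j||^2  s.t.  sum_j beta_j = 1, j in S
            A = [[dot(P[i], P[j]) for j in S] + [1.0] for i in S] + [[1.0] * k + [0.0]]
            sol = solve(A, [0.0] * k + [1.0])
            if sol is None or min(sol[:k]) < -1e-12:
                continue  # affinely dependent subset, or its optimum lies outside conv(S)
            d = [sum(sol[t] * P[j][c] for t, j in enumerate(S)) for c in range(len(qx))]
            if best is None or dot(d, d) < dot(best, best):
                best = d
    return best


def compute_step(x, q, G, active_gradient, max_iter=50, tol=1e-10):
    """Sketch of computeStep(x, q, G): enlarge the bundle until g(x; d) confirms d."""
    qx = [q * xi for xi in x]
    G = [list(g) for g in G]
    for _ in range(max_iter):
        d = [-c for c in shortest(qx, G)]
        g = active_gradient(x, d)
        G.append(list(g))
        # accept d once (g + qx)^T d <= -||d||^2 (tol absorbs round-off)
        if dot([gi + qi for gi, qi in zip(g, qx)], d) <= -dot(d, d) + tol:
            break
    return d, G


# Hypothetical oracle for f(x) = max{-100, 3x1 - 2x2, 2x1 - 5x2, 3x1 + 2x2, 2x1 + 5x2}
PIECES = [((0, 0), -100), ((3, -2), 0), ((2, -5), 0), ((3, 2), 0), ((2, 5), 0)]


def max_affine_gradient(x, d, tol=1e-12):
    """Among the pieces active at x, return one with the largest slope along d."""
    vals = [dot(a, x) + b for a, b in PIECES]
    active = [a for (a, _), v in zip(PIECES, vals) if v >= max(vals) - tol]
    return max(active, key=lambda a: dot(a, d))
```

Started at $x_0 = (9, -3)$ with $q = 0$ and the bundle $\{(2, -5)\}$, the first direction $d = (-2, 5)$ is rejected because the new gradient $(3, -2)$ violates the test. After it is added, shortest() returns $(3, -2)$, and the step stops with $d = (-3, 2)$, along which $f$ decreases with slope $-13$.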








Convergence of Algorithm

The argument space is divided into only finitely many polyhedra.

The function value decreases each time we switch from one polyhedron to another.

Hence the algorithm must reach a stationary point $\hat x$ after finitely many steps.


Lipschitzian Piecewise Smooth Minimization: Minimization of PS Function

LiPsMin

Lipschitzian Piecewise Smooth Minimization

LiPsMin(): Let $f$ be a PS function. Preconditions: $x_0 \in \mathbb{R}^n$, $q \ge 0$

for k = 0, 1, 2, ...

1 Generate the local model $\hat f_{x_k}(\Delta x) = f_{PL,x_k}(\Delta x) + \frac{q}{2}\|\Delta x\|^2$ with $q \ge 0$.

2 Compute $\Delta x$ as a stationary point of the local model such that $f(x_k + \Delta x) < f(x_k)$.

3 Update $x_{k+1} = x_k + \Delta x$.

4 If $\|\Delta x\| = 0$: STOP.

5 Update $q = \max\{q, \hat q(x_k)\}$ and $k = k + 1$.
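To make the loop concrete, here is a one-dimensional Python sketch on the hypothetical toy objective $f(x) = |x^2 - 1|$. Assumptions: the exact minimizer of the 1-D local model replaces PLMin() (it compares the kink with the stationary points of the two quadratic pieces), the requirement $f(x_k + \Delta x) < f(x_k)$ of step 2 is enforced by doubling $q$ as an arbitrary choice, and $\hat q$ is taken as the relative second-order error $|f(x_k + \Delta x) - f_{PL,x_k}(\Delta x)| / \Delta x^2$, in line with the penalty coefficient slide. This is not the authors' implementation.

```python
def f(x):
    """Hypothetical toy PS objective f(x) = |x^2 - 1| with minimizers x = -1 and x = 1."""
    return abs(x * x - 1.0)


def f_pl(x, dx):
    """f_PL,x(dx): the smooth part z = x^2 - 1 is linearized, abs() is kept."""
    return abs(x * x - 1.0 + 2.0 * x * dx)


def minimize_local_model(x, q):
    """Exact minimizer of f_PL,x(dx) + q/2 * dx^2 in 1-D (stands in for PLMin())."""
    c, g = x * x - 1.0, 2.0 * x
    candidates = [0.0]
    if g != 0.0:
        candidates.append(-c / g)           # kink of the PL model
    if q > 0.0:
        candidates += [-g / q, g / q]       # stationary points of the two quadratic pieces
    return min(candidates, key=lambda t: f_pl(x, t) + 0.5 * q * t * t)


def lipsmin_1d(x, q=0.0, tol=1e-12, max_iter=100, max_increase=60):
    for _ in range(max_iter):
        # Steps 1-2: minimize the local model, increasing q until f decreases
        for _ in range(max_increase):
            dx = minimize_local_model(x, q)
            if dx == 0.0 or f(x + dx) < f(x):
                break
            q = 2.0 * q if q > 0.0 else 1.0  # arbitrary choice of "increase q"
        else:
            dx = 0.0                         # no descent found: treat x as stationary
        if abs(dx) <= tol:                   # Step 4: stop
            break
        # relative second-order error of the PL model
        q_hat = abs(f(x + dx) - f_pl(x, dx)) / (dx * dx)
        x = x + dx                           # Step 3
        q = max(q, q_hat)                    # Step 5
    return x, q
```

From $x_0 = 3$ with $q = 0$ the first step is the Newton step for $z = x^2 - 1$. The step lands on the kink of the model, where the PL error is exactly $\Delta x^2$, so afterwards $q = \hat q = 1$, and the iterates converge quickly to the minimizer $x^* = 1$.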



Step 1: Generate Local Model

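The piecewise linearization can be written in abs-normal form.

The PL approximates $f$ with an error that is of second order in the distance to the base point.

A quadratic term is added to ensure that the local model is bounded below.

Generate the local model $\hat f_{x_k}(\Delta x) = f_{PL,x_k}(\Delta x) + \frac{q}{2}\|\Delta x\|^2$ with $q \ge 0$.

Example: $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x_1, x_2) = \max\{x_2^2 - \max\{x_1, 0\}, 0\}$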

[Figure: PS function and its local model at x0 = (−1, 1) with q = 0.01]
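The local model in the figure can be reproduced by hand. The Python sketch below rewrites both max operations via abs(), propagates the increments with the rules from the piecewise linearization slide, and adds the quadratic term; the function names are hypothetical. At $x_0 = (-1, 1)$ it shows the second-order error of the PL ($f(-1, 0.5) = 0.25$, while $f_{PL,x_0}((0, -0.5)) = 0$) and that the model with $q = 0.01$ is smaller at $\Delta x = (0, -0.5)$ than at nearby points, consistent with the new iterate $x_1 = (-1, 0.5)$.

```python
def f(x1, x2):
    return max(x2 * x2 - max(x1, 0.0), 0.0)


def local_model(x, dx, q):
    """f_hat_x(dx) = f_PL,x(dx) + q/2 * ||dx||^2 for f(x1, x2) = max{x2^2 - max{x1, 0}, 0}."""
    x1, x2 = x
    d1, d2 = dx
    # switch 1: z1 = x1, max(x1, 0) = (x1 + |x1|) / 2
    a1 = abs(x1)
    da1 = abs(x1 + d1) - a1
    m, dm = (x1 + a1) / 2, (d1 + da1) / 2
    # smooth part: w = x2^2 - max(x1, 0) with tangent increment 2 x2 d2 - dm
    w, dw = x2 * x2 - m, 2 * x2 * d2 - dm
    # switch 2: z2 = w, max(w, 0) = (w + |w|) / 2
    a2 = abs(w)
    da2 = abs(w + dw) - a2
    y, dy = (w + a2) / 2, (dw + da2) / 2
    return y + dy + 0.5 * q * (d1 * d1 + d2 * d2)
```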



Step 2 & 3: Optimization of Local Model (1)

Compute $\Delta x$ as a stationary point of the local model $\hat f_{x_k}$ by PLMin().

Exploit the structure of the domain of the function.

Update xk+1 = xk + ∆x .


[Figure: minimization of the local model and the new iterate x̂ = x1 = (−1, 0.5) of the PS function]



Step 2 & 3: Optimization of Local Model (2)

PLMin() does not guarantee that $f(x_k + \Delta x) < f(x_k)$. Therefore we add a third routine:

GuaranteeDescent(): // Preconditions: $x, \Delta x \in \mathbb{R}^n$, $q \ge 0$

for k = 0, 1, 2, ...

1 Set $\Delta x = 0$.

2 Call PLMin($x$, $\Delta x$, $q$).

3 If $f(x + \Delta x) < f(x)$: STOP, else increase $q$ and go to 1.

Ongoing work: prove that the algorithm above terminates after finitely many iterations.



Step 5: Penalty coefficient

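Update $q = \max\{0.9q + 0.1\hat q(x_k, \Delta x),\ \hat q(x_k, \Delta x),\ q_0\}$ with $\Delta x = x_{k+1} - x_k$ and

$$ \hat q(x_k, \Delta x) = \frac{|f(x_{k+1}) - f(x_k) - \Delta f(x_k; \Delta x)|}{\|\Delta x\|^2}. $$

The quadratic coefficient $q$ ensures that the local model is also bounded below. Example: for $f(x) = x^2$ one obtains at $\bar x = 1$ the local model (with $q = 0$) $\hat f_{\bar x}(x - \bar x) = 2x - 1$, which is unbounded below.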


[Figure: PS function and its PL model with and without the quadratic term, q = 1]
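The update rule and the example can be checked with a few lines of Python (hypothetical names). For $f(x) = x^2$ the PL is the tangent line, $\Delta f(x; \Delta x) = 2x\Delta x$, and the remainder is exactly $\Delta x^2$, so $\hat q = 1$ for every step. With $q = 1$ the model at $\bar x = 1$ becomes $2x - 1 + \tfrac{1}{2}(x - 1)^2$, which is bounded below (its minimum $-1$ is attained at $x = -1$).

```python
def q_hat(f, delta_f, x, dx):
    """q_hat(x, dx) = |f(x + dx) - f(x) - Delta f(x; dx)| / ||dx||^2 for scalar x, dx != 0."""
    return abs(f(x + dx) - f(x) - delta_f(x, dx)) / (dx * dx)


def update_q(q, qh, q0):
    """q <- max{0.9 q + 0.1 q_hat, q_hat, q0}."""
    return max(0.9 * q + 0.1 * qh, qh, q0)


def square(x):
    return x * x


def square_increment(x, dx):
    """Delta f(x; dx) = 2 x dx: the piecewise linearization of x^2 is its tangent."""
    return 2.0 * x * dx
```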



Convergence of Algorithm

Convergence of LiPsMin

Under the assumptions

the PS function $f$ has a bounded level set $\{x : f(x) \le f(x_0)\}$, where $x_0$ is the starting point,

$\{q_k\}$ is bounded, $\{\Delta x_k\}$ and $\{\hat q_k\}$ are uniformly bounded, and GuaranteeDescent() terminates after finitely many iterations,

all cluster points $x^*$ of the infinite sequence $\{x_k\}_{k \in \mathbb{N}}$ generated by LiPsMin satisfy the first-order minimality condition $f'(x^*; \cdot) \ge 0$ for Lipschitzian piecewise smooth problems.


Lipschitzian Piecewise Smooth Minimization: Numerical Results

Example

$f : \mathbb{R}^2 \to \mathbb{R}$, $f(x_1, x_2) = \max\{-100,\ 3x_1 - 2x_2,\ 2x_1 - 5x_2,\ 3x_1 + 2x_2,\ 2x_1 + 5x_2\}$

[Figure: contour plot in the (x1, x2) plane with the pieces f0, f1, f2, f−1, f−2, the starting point x0 = (9, −3) and x* = (−50, 0)]



Results: Chained LQ

$$ f(x) = \sum_{i=1}^{n-1} \max\big\{ -x_i - x_{i+1},\ -x_i - x_{i+1} + x_i^2 + x_{i+1}^2 - 1 \big\} $$

with $x^0_i = -0.5$ for all $i = 1, \dots, n$ and $f(x^*) = -(n-1)\sqrt{2}$.

Method    n    f*        #f     #g     #QP   #iter
LiPsMin   5    -5.657    29     63     63    14
LiPsMin   10   -12.728   21     57     57    10
LiPsMin   20   -26.87    21     660    659   10
MPBNGC    5    -5.657    88     88     -     51
MPBNGC    10   -12.728   123    123    -     106
MPBNGC    20   -26.87    1011   1011   -     1000

MPBNGC is a multiobjective proximal bundle method for nonconvex, nonsmooth (nondifferentiable) and generally constrained minimization; see M.M. Mäkelä. Multiobjective Proximal Bundle Method for Nonconvex, Nonsmooth Optimization: Fortran Subroutine MPBNGC 2.0, Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing, No. B 13/2003, University of Jyväskylä, 2003.
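For reference, a direct Python evaluation of the Chained LQ function (our own sketch). It reproduces the starting value $f(x^0) = n - 1$ and confirms that the point with all $x_i = 1/\sqrt{2}$ attains the stated optimal value $-(n-1)\sqrt{2}$, e.g. $-4\sqrt{2} \approx -5.657$ for $n = 5$.

```python
import math


def chained_lq(x):
    """Chained LQ: sum_{i=1}^{n-1} max{-x_i - x_{i+1}, -x_i - x_{i+1} + x_i^2 + x_{i+1}^2 - 1}."""
    return sum(
        max(-a - b, -a - b + a * a + b * b - 1.0)
        for a, b in zip(x[:-1], x[1:])
    )
```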



Results: Active faces

$$ f(x) = \max_{1 \le i \le n} \Big\{ g\Big(-\sum_{j=1}^n x_j\Big),\ g(x_i) \Big\} \quad\text{with}\quad g(y) = \ln(|y| + 1) $$

with $x^0_i = 1$ for all $i = 1, \dots, n$ and $f(x^*) = 0$.

Method    n    f*      #f     #g     #QP   #iter
LiPsMin   5    1e-15   5      6      6     2
LiPsMin   10   1e-15   7      7      7     3
LiPsMin   20   1e-15   9      11     11    4
MPBNGC    5    0       18     18     -     15
MPBNGC    10   1e-11   1000   1000   -     994
MPBNGC    20   1e-11   1000   1000   -     991

Test problems: see M. Haarala, K. Miettinen, M.M. Mäkelä. New Limited Memory Bundle Method for Large Scale Nonsmooth Optimization, OMS, 2007.
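Likewise, a direct Python evaluation of the Active faces function (our own sketch). At the starting point $x^0 = (1, \dots, 1)$ the first argument dominates, so $f(x^0) = \ln(n + 1)$, and $f(0) = 0$ is the optimal value stated above.

```python
import math


def g(y):
    return math.log(abs(y) + 1.0)


def active_faces(x):
    """f(x) = max over i of { g(-sum_j x_j), g(x_i) } with g(y) = ln(|y| + 1)."""
    return max([g(-sum(x))] + [g(xi) for xi in x])
```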


Conclusion and Outlook


AD drivers provided by ADOL-C

Minimization method for Lipschitzian PS functions: LiPsMin

Numerical results

Future Work:

Convergence theory

Strategy for building the bundle

Thank you for your attention! Questions?



