Ellipsoid Method
Prof. S. Boyd, EE392o, Stanford University · 2003. 9. 18.
Topics: ellipsoid method, convergence proof, inequality constraints, feasibility problems

Jan 29, 2021

  • Ellipsoid Method

    • ellipsoid method

    • convergence proof

    • inequality constraints

    • feasibility problems

    Prof. S. Boyd, EE392o, Stanford University

  • Challenges in cutting-plane methods

    • can be difficult to compute appropriate next query point

    • localization polyhedron grows in complexity as algorithm progresses

    can get around these challenges . . .

    ellipsoid method is another approach

    • developed in 70s by Shor and Yudin

    • used in 1979 by Khachian to give polynomial time algorithm for LP


  • Ellipsoid algorithm

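    idea: localize $x^\star$ in an ellipsoid instead of a polyhedron

    1. at iteration $k$ we know $x^\star \in \mathcal{E}^{(k)}$

    2. set $x^{(k+1)} := \mathrm{center}(\mathcal{E}^{(k)})$; evaluate $\nabla f(x^{(k+1)})$ (or $g^{(k)} \in \partial f(x^{(k+1)})$)

    3. hence we know

    $x^\star \in \mathcal{E}^{(k)} \cap \{ z \mid \nabla f(x^{(k+1)})^T (z - x^{(k+1)}) \le 0 \}$

    (a half-ellipsoid)

    4. set $\mathcal{E}^{(k+1)}$ := minimum volume ellipsoid covering $\mathcal{E}^{(k)} \cap \{ z \mid \nabla f(x^{(k+1)})^T (z - x^{(k+1)}) \le 0 \}$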


  • [Figure: ellipsoid $\mathcal{E}^{(k)}$ with center $x^{(k+1)}$, gradient $\nabla f(x^{(k+1)})$, and the covering ellipsoid $\mathcal{E}^{(k+1)}$]

    compared to cutting-plane method:

    • localization set doesn’t grow more complicated

    • easy to compute query point

    • but, we add unnecessary points in step 4


  • Properties of ellipsoid method

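    • reduces to bisection for $n = 1$

    • simple formula for $\mathcal{E}^{(k+1)}$ given $\mathcal{E}^{(k)}$, $\nabla f(x^{(k+1)})$

    • $\mathcal{E}^{(k+1)}$ can be larger than $\mathcal{E}^{(k)}$ in diameter (max semi-axis length), but is always smaller in volume

    • $\mathrm{vol}(\mathcal{E}^{(k+1)}) < e^{-\frac{1}{2n}} \, \mathrm{vol}(\mathcal{E}^{(k)})$

    (note that volume reduction factor depends on $n$)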

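The volume bound can be checked numerically. The sketch below assumes the exact volume ratio implied by the update formula for $A^+$ given on the "Updating the ellipsoid" slides, namely $\sqrt{\det A^+ / \det A} = \frac{n}{n+1} \left( \frac{n^2}{n^2-1} \right)^{(n-1)/2}$ for $n > 1$. The function names `volume_ratio` and `volume_bound` are illustrative and not part of the slides.

```python
import math

def volume_ratio(n):
    """vol(E+)/vol(E) implied by the central-cut update (n > 1)."""
    return (n / (n + 1.0)) * (n * n / (n * n - 1.0)) ** ((n - 1) / 2.0)

def volume_bound(n):
    """The slide's upper bound e^{-1/(2n)} on that ratio."""
    return math.exp(-1.0 / (2 * n))

# Compare the exact ratio with the bound for a few dimensions.
for n in (2, 5, 10, 50):
    print(n, round(volume_ratio(n), 6), round(volume_bound(n), 6))
```

Both numbers approach 1 as $n$ grows, which is the dependence on $n$ noted above.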

  • Example

    [Figure: six panels showing the ellipsoid iterates with centers $x^{(0)}, \dots, x^{(5)}$]


  • Updating the ellipsoid

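    $\mathcal{E}(x, A) = \{ z \mid (z - x)^T A^{-1} (z - x) \le 1 \}$

    [Figure: ellipsoid $\mathcal{E}$ with center $x$, cut direction $g$, and updated ellipsoid $\mathcal{E}^+$ with center $x^+$]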


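  • (for $n > 1$) minimum volume ellipsoid containing

    $\mathcal{E} \cap \{ z \mid g^T (z - x) \le 0 \}$

    is given by

    $x^+ = x - \frac{1}{n+1} A \tilde{g}$

    $A^+ = \frac{n^2}{n^2 - 1} \left( A - \frac{2}{n+1} A \tilde{g} \tilde{g}^T A \right)$

    where $\tilde{g} \triangleq g / \sqrt{g^T A g}$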

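The update formulas translate almost line by line into code. This is a minimal sketch in plain Python (lists of floats, no linear algebra library), assuming $A$ is symmetric positive definite and $n > 1$. The function name `ellipsoid_update` is illustrative.

```python
import math

def ellipsoid_update(x, A, g):
    """Return (x_plus, A_plus) for the cut g^T (z - x) <= 0, assuming n > 1."""
    n = len(x)
    Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
    gAg = sum(g[i] * Ag[i] for i in range(n))
    # A g~ with g~ = g / sqrt(g^T A g)
    Agt = [v / math.sqrt(gAg) for v in Ag]
    x_plus = [x[i] - Agt[i] / (n + 1) for i in range(n)]
    c = n * n / (n * n - 1.0)
    # A is symmetric, so A g~ g~^T A = (A g~)(A g~)^T
    A_plus = [[c * (A[i][j] - 2.0 / (n + 1) * Agt[i] * Agt[j])
               for j in range(n)] for i in range(n)]
    return x_plus, A_plus

# Unit disc cut by the halfspace z_1 <= 0.
x_plus, A_plus = ellipsoid_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print(x_plus, A_plus)
```

For the unit disc, the new ellipse is centered at $(-1/3, 0)$ with semi-axes $2/3$ and $2/\sqrt{3}$, so it touches the retained half-disc at $(-1, 0)$ and $(0, \pm 1)$.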

  • Stopping criterion

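    $x^\star \in \mathcal{E}^{(k)}$, so

    $f(x^\star) \ge f(x^{(k)}) + \nabla f(x^{(k)})^T (x^\star - x^{(k)})$

    $\quad \ge f(x^{(k)}) + \inf_{x \in \mathcal{E}^{(k)}} \nabla f(x^{(k)})^T (x - x^{(k)})$

    $\quad = f(x^{(k)}) - \sqrt{\nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)})}$

    simple stopping criterion:

    $\sqrt{\nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)})} \le \epsilon$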

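The last equality in the lower bound uses the closed form for the minimum of a linear function over an ellipsoid. A short derivation, writing $g = \nabla f(x^{(k)})$ and $A = A^{(k)}$, and assuming $A$ is symmetric positive definite so that $A^{1/2}$ exists:

```latex
\begin{aligned}
\inf_{z \in \mathcal{E}(x, A)} g^T (z - x)
  &= \inf_{\|u\|_2 \le 1} g^T A^{1/2} u
     && \text{(substitute } z = x + A^{1/2} u \text{)} \\
  &= -\| A^{1/2} g \|_2
     && \text{(minimizer } u = -A^{1/2} g / \| A^{1/2} g \|_2 \text{)} \\
  &= -\sqrt{g^T A g}.
\end{aligned}
```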

  • [Figure: $f(x^{(k)})$ and the lower bound $f(x^{(k)}) - \sqrt{\nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)})}$ versus iteration $k$ (0 to 30), with $f^\star$ for reference]


  • more sophisticated stopping criterion: $U_k - L_k \le \epsilon$, where

    $U_k = \min_{i \le k} f(x^{(i)})$

    $L_k = \max_{i \le k} \left( f(x^{(i)}) - \sqrt{\nabla f(x^{(i)})^T A^{(i)} \nabla f(x^{(i)})} \right)$


  • [Figure: $L_k$ and $U_k$ versus iteration $k$ (0 to 30), with $f^\star$ for reference]


  • Basic ellipsoid algorithm

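    ellipsoid described as $\mathcal{E}(x, A) = \{ z \mid (z - x)^T A^{-1} (z - x) \le 1 \}$

    given ellipsoid $\mathcal{E}(x, A)$ containing $x^\star$, accuracy $\epsilon > 0$

    repeat

    1. evaluate $\nabla f(x)$ (or $g \in \partial f(x)$)

    2. if $\sqrt{\nabla f(x)^T A \nabla f(x)} \le \epsilon$, return($x$)

    3. update ellipsoid

    3a. $\tilde{g} := \nabla f(x) / \sqrt{\nabla f(x)^T A \nabla f(x)}$

    3b. $x := x - \frac{1}{n+1} A \tilde{g}$

    3c. $A := \frac{n^2}{n^2 - 1} \left( A - \frac{2}{n+1} A \tilde{g} \tilde{g}^T A \right)$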

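The steps of the basic algorithm can be sketched in plain Python as follows. Several things here are assumptions rather than part of the slides: the function name `ellipsoid_minimize`, the iteration cap `max_iter`, the fallback of returning the best iterate if the cap is reached, and the small quadratic test problem.

```python
import math

def ellipsoid_minimize(f, grad, x, A, eps, max_iter=2000):
    """Basic ellipsoid method; returns (point, number of updates performed)."""
    n = len(x)
    best = list(x)
    for k in range(max_iter):
        g = grad(x)                                              # step 1
        Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
        s = math.sqrt(max(sum(g[i] * Ag[i] for i in range(n)), 0.0))
        if s <= eps:                                             # step 2
            return x, k
        if f(x) < f(best):
            best = list(x)
        Agt = [v / s for v in Ag]                                # step 3a, as A g~
        x = [x[i] - Agt[i] / (n + 1) for i in range(n)]          # step 3b
        c = n * n / (n * n - 1.0)
        A = [[c * (A[i][j] - 2.0 / (n + 1) * Agt[i] * Agt[j])    # step 3c
              for j in range(n)] for i in range(n)]
    return best, max_iter   # fallback: best iterate seen (an assumption)

# Hypothetical test problem with minimum value 0 at x* = (1, -0.5).
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

A0 = [[9.0, 0.0], [0.0, 9.0]]   # ball of radius 3 around the origin, contains x*
x_hat, iters = ellipsoid_minimize(f, grad, [0.0, 0.0], A0, eps=1e-3)
print(x_hat, f(x_hat), iters)
```

When the loop exits through step 2, the lower bound from the stopping-criterion slides gives $f(x) - f^\star \le \epsilon$ for the returned point.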

  • Interpretation

    • change coordinates so uncertainty (E) is unit ball

    • take gradient (or subgradient) step with fixed length 1/(n + 1)

    properties:

    • can propagate Cholesky factor of $A$; get $O(n^2)$ update

    • not a descent method

    • often slow but robust in practice


  • Proof of convergence

    assumptions:

    • $f$ is Lipschitz: $|f(y) - f(x)| \le G \|y - x\|$

    • $\mathcal{E}^{(0)}$ is ball with radius $R$

    suppose $f(x^{(i)}) > f^\star + \epsilon$, $i = 0, \dots, k$

    then $f(x) \le f^\star + \epsilon \implies x \in \mathcal{E}^{(k)}$

    since at iteration $i$ we only discard points with $f \ge f(x^{(i)})$


  • from Lipschitz condition,

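    $\|x - x^\star\| \le \epsilon/G \implies f(x) \le f^\star + \epsilon \implies x \in \mathcal{E}^{(k)}$

    so $B = \{ x \mid \|x - x^\star\| \le \epsilon/G \} \subseteq \mathcal{E}^{(k)}$

    hence $\mathrm{vol}(B) \le \mathrm{vol}(\mathcal{E}^{(k)})$, so

    $\beta_n (\epsilon/G)^n \le e^{-k/2n} \, \mathrm{vol}(\mathcal{E}^{(0)}) = e^{-k/2n} \beta_n R^n$

    ($\beta_n$ is volume of unit ball in $\mathbf{R}^n$)

    therefore $k \le 2n^2 \log(RG/\epsilon)$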

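The step from the volume inequality to the bound on $k$ is a matter of cancelling $\beta_n$ and taking logarithms. Spelled out, reading the exponent $k/2n$ as $k/(2n)$:

```latex
\begin{aligned}
\beta_n (\epsilon/G)^n \le e^{-k/(2n)} \beta_n R^n
  &\iff \left( \frac{\epsilon}{RG} \right)^n \le e^{-k/(2n)} \\
  &\iff n \log \frac{\epsilon}{RG} \le -\frac{k}{2n} \\
  &\iff k \le 2 n^2 \log \frac{RG}{\epsilon}.
\end{aligned}
```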

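  • [Figure: $\mathcal{E}^{(0)}$, $\mathcal{E}^{(k)}$ with center $x^{(k)}$, the sublevel set $\{ x \mid f(x) \le f^\star + \epsilon \}$, and the ball $B = \{ x \mid \|x - x^\star\| \le \epsilon/G \}$ around $x^\star$]

    conclusion: for $K > 2n^2 \log(RG/\epsilon)$,

    $\min_{i = 0, \dots, K} f(x^{(i)}) \le f^\star + \epsilon$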


  • Interpretation of complexity

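    since $x^\star \in \mathcal{E}^{(0)} = \{ x \mid \|x - x^{(0)}\| \le R \}$, our prior knowledge of $f^\star$ is

    $f^\star \in [ f(x^{(0)}) - GR, \; f(x^{(0)}) ]$

    our prior uncertainty in $f^\star$ is $GR$

    after $k$ iterations our knowledge of $f^\star$ is

    $f^\star \in \left[ \min_{i = 0, \dots, k} f(x^{(i)}) - \epsilon, \; \min_{i = 0, \dots, k} f(x^{(i)}) \right]$

    posterior uncertainty in $f^\star$ is $\le \epsilon$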


  • iterations required:

    $2n^2 \log \frac{RG}{\epsilon} = 2n^2 \log \frac{\text{prior uncertainty}}{\text{posterior uncertainty}}$

    efficiency: $0.72/n^2$ bits per gradient evaluation (degrades with $n$)

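The efficiency figure follows from the iteration count. Halving the uncertainty (gaining one bit) takes $2n^2 \log 2$ iterations under the bound, so each gradient evaluation yields $1/(2 n^2 \log 2) \approx 0.72/n^2$ bits. A small sketch of both quantities, with illustrative function names that do not come from the slides:

```python
import math

def iteration_bound(n, R, G, eps):
    """Bound 2 n^2 log(RG/eps) on the number of iterations."""
    return 2 * n ** 2 * math.log(R * G / eps)

def bits_per_gradient(n):
    """Bits of uncertainty in f* removed per gradient evaluation under that bound."""
    return 1.0 / (2 * n ** 2 * math.log(2))

print(iteration_bound(n=10, R=1.0, G=1.0, eps=1e-3))
print(bits_per_gradient(1), bits_per_gradient(10))
```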

  • Inequality constrained problems

    minimize $f_0(x)$
    subject to $f_i(x) \le 0$, $i = 1, \dots, m$

    same idea: maintain ellipsoids $\mathcal{E}^{(k)}$ that

    • contain $x^\star$

    • decrease in volume to zero


  • case 1: $x^{(k)}$ feasible, i.e., $f_i(x^{(k)}) \le 0$, $i = 1, \dots, m$

    • then do usual update of $\mathcal{E}^{(k)}$ based on $\nabla f_0(x^{(k)})$

    • rules out halfspace of points with larger function value than current point

    case 2: $x^{(k)}$ infeasible, say, $f_j(x^{(k)}) > 0$;

    • then $\nabla f_j(x^{(k)})^T (x - x^{(k)}) \ge 0 \implies f_j(x) > 0 \implies x$ infeasible, so update $\mathcal{E}^{(k)}$ based on $\nabla f_j(x^{(k)})$

    • rules out halfspace of infeasible points

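The two cases can be sketched as a single loop that picks the cut direction from the objective or from a violated constraint. This is an illustrative sketch and not the author's code: the helper names, the choice of the first violated constraint, the `max_iter` cap, and the test problem (minimize $x_1 + x_2$ over the unit disc) are all assumptions. The termination tests are the ones given on the stopping-criterion slides for the constrained problem.

```python
import math

def quad_form(A, g):
    """Return (A g, sqrt(g^T A g))."""
    n = len(g)
    Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
    return Ag, math.sqrt(max(sum(g[i] * Ag[i] for i in range(n)), 0.0))

def update(x, A, Ag, s):
    """Minimum volume ellipsoid update for the cut defined by g (n > 1)."""
    n = len(x)
    Agt = [v / s for v in Ag]
    x = [x[i] - Agt[i] / (n + 1) for i in range(n)]
    c = n * n / (n * n - 1.0)
    A = [[c * (A[i][j] - 2.0 / (n + 1) * Agt[i] * Agt[j])
          for j in range(n)] for i in range(n)]
    return x, A

def constrained_ellipsoid(grad0, cons, x, A, eps, max_iter=5000):
    """cons is a list of (f_j, grad_j) pairs; returns (status, x)."""
    for _ in range(max_iter):
        violated = [(fj, gj) for fj, gj in cons if fj(x) > 0]
        if not violated:                 # case 1: cut with the objective gradient
            Ag, s = quad_form(A, grad0(x))
            if s <= eps:
                return "optimal", x
        else:                            # case 2: cut with a violated constraint
            fj, gj = violated[0]
            Ag, s = quad_form(A, gj(x))
            if fj(x) - s > 0:
                return "infeasible", x
        x, A = update(x, A, Ag, s)
    return "max_iter", x

# Hypothetical problem: minimize x1 + x2 subject to x1^2 + x2^2 - 1 <= 0.
grad0 = lambda x: [1.0, 1.0]
cons = [(lambda x: x[0] ** 2 + x[1] ** 2 - 1.0,
         lambda x: [2.0 * x[0], 2.0 * x[1]])]
A0 = [[9.0, 0.0], [0.0, 9.0]]            # ball of radius 3 around (1, 1)
status, x_opt = constrained_ellipsoid(grad0, cons, [1.0, 1.0], A0, eps=1e-3)
print(status, x_opt)
```

The starting center $(1, 1)$ is infeasible, so the first update is a feasibility cut. The optimal value of this test problem is $-\sqrt{2}$, attained at $(-1/\sqrt{2}, -1/\sqrt{2})$.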

  • Example

    [Figure: six panels showing the iterates $x^{(0)}, \dots, x^{(5)}$ relative to the constraint boundary $f_1(x) = 0$; the cuts use $\nabla f_1$ at $x^{(0)}$ and $x^{(3)}$, and $\nabla f_0$ at $x^{(1)}$, $x^{(2)}$, $x^{(4)}$, $x^{(5)}$]


  • Stopping criterion

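    if $x^{(k)}$ is feasible, we have a lower bound on $f^\star$ as before:

    $f^\star \ge f_0(x^{(k)}) - \sqrt{\nabla f_0(x^{(k)})^T A^{(k)} \nabla f_0(x^{(k)})}$

    if $x^{(k)}$ is infeasible, we have for all $x \in \mathcal{E}^{(k)}$

    $f_j(x) \ge f_j(x^{(k)}) + \nabla f_j(x^{(k)})^T (x - x^{(k)})$

    $\quad \ge f_j(x^{(k)}) + \inf_{z \in \mathcal{E}^{(k)}} \nabla f_j(x^{(k)})^T (z - x^{(k)})$

    $\quad = f_j(x^{(k)}) - \sqrt{\nabla f_j(x^{(k)})^T A^{(k)} \nabla f_j(x^{(k)})}$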


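  • hence, problem is infeasible if for some $j$,

    $f_j(x^{(k)}) - \sqrt{\nabla f_j(x^{(k)})^T A^{(k)} \nabla f_j(x^{(k)})} > 0$

    stopping criteria:

    • if $x^{(k)}$ is feasible and $\sqrt{\nabla f_0(x^{(k)})^T A^{(k)} \nabla f_0(x^{(k)})} \le \epsilon$ ($x^{(k)}$ is $\epsilon$-suboptimal)

    • if $f_j(x^{(k)}) - \sqrt{\nabla f_j(x^{(k)})^T A^{(k)} \nabla f_j(x^{(k)})} > 0$ (problem is infeasible)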


  • Ellipsoid method for feasibility

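    abstract feasibility problem: find $x \in C \subset \mathbf{R}^n$ or determine $C = \emptyset$

    separating hyperplane oracle: for any $x$, oracle either

    • confirms $x \in C$, or

    • returns $g \ne 0$ s.t. $z \in C \Rightarrow g^T (z - x) \le 0$

    [Figure: set $C$, ellipsoid $\mathcal{E}^{(k)}$ with center $x^{(k)}$, cut direction $g^{(k)}$, and updated ellipsoid $\mathcal{E}^{(k+1)}$]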


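  • start with $\mathcal{E}^{(0)}$ which intersects $C$

    1. If $x^{(k)} := \mathrm{center}(\mathcal{E}^{(k)}) \in C$, quit. Else, compute $g \ne 0$, s.t. $x \in C \Rightarrow g^T (x - x^{(k)}) \le 0$

    2. $\mathcal{E}^{(k+1)}$ := minimum volume ellipsoid covering $\mathcal{E}^{(k)} \cap \{ z \mid g^T (z - x^{(k)}) \le 0 \}$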

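The feasibility variant only needs the separating hyperplane oracle. Below is a minimal sketch for a hypothetical set $C$ given by linear inequalities $a_i^T x \le b_i$, where the oracle returns the normal of the first violated inequality. The names `find_feasible`, `polyhedron_oracle`, `HALFSPACES`, the triangle used for $C$, and the `max_iter` cap are all assumptions made for illustration.

```python
import math

def find_feasible(oracle, x, A, max_iter=200):
    """oracle(x) returns None if x is in C, else a nonzero g with
    g^T (z - x) <= 0 for every z in C. Returns a point of C or None."""
    n = len(x)
    for _ in range(max_iter):
        g = oracle(x)
        if g is None:
            return x
        Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
        s = math.sqrt(sum(g[i] * Ag[i] for i in range(n)))
        Agt = [v / s for v in Ag]
        x = [x[i] - Agt[i] / (n + 1) for i in range(n)]
        c = n * n / (n * n - 1.0)
        A = [[c * (A[i][j] - 2.0 / (n + 1) * Agt[i] * Agt[j])
              for j in range(n)] for i in range(n)]
    return None

# Hypothetical C: the triangle x1 >= 1, x2 >= 1, x1 + x2 <= 3, as (a_i, b_i) pairs.
HALFSPACES = [([-1.0, 0.0], -1.0), ([0.0, -1.0], -1.0), ([1.0, 1.0], 3.0)]

def polyhedron_oracle(x):
    for a, b in HALFSPACES:
        if a[0] * x[0] + a[1] * x[1] > b:
            # z in C gives a^T z <= b < a^T x, hence a^T (z - x) <= 0
            return a
    return None

# Start from the ball of radius 5 around the origin, which contains C.
x_feas = find_feasible(polyhedron_oracle, [0.0, 0.0], [[25.0, 0.0], [0.0, 25.0]])
print(x_feas)
```

Because this $C$ has positive area and lies inside the initial ball, the volume reduction property forces a center to land in $C$ after finitely many cuts.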

  • Example

    [Figure: four panels; contents not recoverable from the transcript]
