Transcript
  • Slide 1/42

    Gradient Methods

    Yaron Lipman

    May 2003

  • Slide 2/42

    Preview

    Background

    Steepest Descent

    Conjugate Gradient

  • Slide 3/42

    Preview

    Background

    Steepest Descent

    Conjugate Gradient

  • Slide 4/42

    Background

    Motivation

    The gradient notion

    The Wolfe Theorems

  • Slide 5/42

    Motivation

    The min (max) problem:  $\min_x f(x)$

    But we learned in calculus how to solve that kind of question!

  • Slide 6/42

    Motivation

    Not exactly.

    Functions: high-order polynomials, e.g.

    $x - \frac{1}{6} x^3 + \frac{1}{120} x^5 - \frac{1}{5040} x^7$

    What about functions that don't have an analytic representation: a "Black Box"

    $f: \mathbb{R}^n \to \mathbb{R}$

  • Slide 7/42

    Motivation

    A real-world problem: finding a harmonic mapping.

    General problem: find a global min (max).

    This lecture will concentrate on finding a local minimum.

    $E_{harm} = \frac{1}{2} \sum_{(i,j) \in E} k_{ij} \, \| v_i - v_j \|^2$

    $E_{harm}(x_1, y_1, \ldots, x_n, y_n) : \mathbb{R}^{2n} \to \mathbb{R}$
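    As a concrete illustration of the energy above, here is a minimal Python sketch (not from the slides) that evaluates $E_{harm}$ on a toy edge set; the vertex positions, edges, and spring constants $k_{ij}$ are made-up assumptions.

        import numpy as np

        def harmonic_energy(vertices, edges, k):
            # E_harm = 0.5 * sum over edges (i, j) of k[(i, j)] * ||v_i - v_j||^2
            return 0.5 * sum(k[(i, j)] * np.sum((vertices[i] - vertices[j]) ** 2)
                             for (i, j) in edges)

        # toy example (made-up data): a triangle with unit spring constants
        V = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}
        E = [(0, 1), (1, 2), (0, 2)]
        K = {e: 1.0 for e in E}
        print(harmonic_energy(V, E, K))   # 0.5 * (1 + 2 + 1) = 2.0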

  • Slide 8/42

    Background

    Motivation

    The gradient notion

    The Wolfe Theorems

  • Slide 9/42

    $f(x, y) := \cos(\tfrac{1}{2} x) \, \cos(\tfrac{1}{2} y) \, x$

  • Slide 10/42

    Directional Derivatives: first, the one-dimensional derivative.

  • Slide 11/42

    Directional Derivatives: Along the Axes

    $\frac{\partial f}{\partial x}(x, y), \qquad \frac{\partial f}{\partial y}(x, y)$

  • Slide 12/42

    Directional Derivatives: In a General Direction

    $\frac{\partial f}{\partial v}(x, y), \qquad v \in \mathbb{R}^2, \ \| v \| = 1$

  • Slide 13/42

    Directional Derivatives

    (figure: the partial derivatives $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$)

  • Slide 14/42

    The Gradient: Definition in the Plane

    $f: \mathbb{R}^2 \to \mathbb{R}$

    $\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

  • Slide 15/42

    The Gradient: Definition

    $f: \mathbb{R}^n \to \mathbb{R}$

    $\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
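    A minimal numerical sketch of this definition (my addition, not from the slides): approximating $\nabla f$ by central differences. The test function is the surface from slide 9 as reconstructed above, and the step size eps is an arbitrary choice.

        import numpy as np

        def numerical_gradient(f, x, eps=1e-6):
            # central-difference approximation of grad f at x, one coordinate at a time
            x = np.asarray(x, dtype=float)
            g = np.zeros_like(x)
            for k in range(x.size):
                e = np.zeros_like(x)
                e[k] = eps
                g[k] = (f(x + e) - f(x - e)) / (2.0 * eps)
            return g

        # f(x, y) = cos(x/2) * cos(y/2) * x
        f = lambda p: np.cos(0.5 * p[0]) * np.cos(0.5 * p[1]) * p[0]
        print(numerical_gradient(f, [1.0, 2.0]))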

  • Slide 16/42

    The Gradient Properties

    The gradient defines a (hyper)plane approximating the function infinitesimally:

    $\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$

  • Slide 17/42

    The Gradient Properties

    By the chain rule (important for later use):

    $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle, \qquad \| v \| = 1$
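    A quick numerical check of this identity (my own sketch, not part of the deck): the one-dimensional derivative of $t \mapsto f(p + t v)$ at $t = 0$ agrees with $\langle \nabla f(p), v \rangle$. The point $p$ and the unit direction $v$ are arbitrary choices.

        import numpy as np

        f = lambda p: np.cos(0.5 * p[0]) * np.cos(0.5 * p[1]) * p[0]
        p = np.array([1.0, 2.0])
        v = np.array([3.0, 4.0]); v = v / np.linalg.norm(v)   # ||v|| = 1

        eps = 1e-6
        # directional derivative: derivative of t -> f(p + t v) at t = 0
        dir_deriv = (f(p + eps * v) - f(p - eps * v)) / (2 * eps)
        # gradient by central differences, then the inner product <grad f(p), v>
        grad = np.array([(f(p + eps * e) - f(p - eps * e)) / (2 * eps) for e in np.eye(2)])
        print(dir_deriv, grad @ v)   # the two numbers agree up to rounding error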

  • Slide 18/42

    The Gradient Properties

    Proposition 1:

    $\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{\nabla f(p)}{\| \nabla f(p) \|}$,

    and minimal when choosing $v = -\frac{\nabla f(p)}{\| \nabla f(p) \|}$.

    (Intuitive: the gradient points in the direction of greatest change.)

  • Slide 19/42

    The Gradient Properties

    Proof (only for the minimum case): assign $v = -\frac{\nabla f(p)}{\| \nabla f(p) \|}$; by the chain rule:

    $\frac{\partial f}{\partial v}(p) = \left\langle \nabla f(p), \, -\frac{\nabla f(p)}{\| \nabla f(p) \|} \right\rangle = -\frac{\| \nabla f(p) \|^2}{\| \nabla f(p) \|} = -\| \nabla f(p) \|$

  • Slide 20/42

    The Gradient Properties

    On the other hand, for a general unit vector $v$ (by the Cauchy-Schwarz inequality):

    $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle \ge -\| \nabla f(p) \| \, \| v \| = -\| \nabla f(p) \|$

    so the choice above attains the minimum.

  • Slide 21/42

    The Gradient Properties

    Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$-smooth function around $p$. If $f$ has a local minimum (maximum) at $p$, then

    $\nabla f(p) = 0$

    (Intuitive: a necessary condition for a local min (max).)

  • Slide 22/42

    The Gradient Properties

    Proof:

    Intuitive:

  • Slide 23/42

    The Gradient Properties

    Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

    $\left. \frac{d \, f(p + t v)}{d t} \right|_{t = 0} = \langle \nabla f(p), v \rangle = 0$

    and since this holds for every $v$, it follows that $\nabla f(p) = 0$.

  • Slide 24/42

    The Gradient Properties

    We found the best INFINITESIMAL DIRECTION at each point. Looking for a minimum: the "blind man" procedure.

    How can we derive the way to the minimum using this knowledge?

  • Slide 25/42

    Background

    Motivation

    The gradient notion

    The Wolfe Theorems

  • Slide 26/42

    The Wolfe Theorem

    This is the link from the previous gradient properties to the constructive algorithm. The problem:

    $\min_x f(x)$

  • Slide 27/42

    The Wolfe Theorem

    We introduce a model algorithm:

    Data: $x_0 \in \mathbb{R}^n$

    Step 0: set $i = 0$

    Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$

    Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

    Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
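    A Python skeleton of this model algorithm (my sketch; the names model_algorithm and search_direction, the tolerance, and the bounded line search over $[0, 10^3]$ are illustrative assumptions, not from the slides).

        import numpy as np
        from scipy.optimize import minimize_scalar

        def model_algorithm(f, grad_f, x0, search_direction, tol=1e-8, max_iter=500):
            x = np.asarray(x0, dtype=float)
            for _ in range(max_iter):
                g = grad_f(x)
                if np.linalg.norm(g) < tol:        # Step 1: gradient (numerically) zero
                    break
                h = search_direction(x, g)         # Step 1: search direction h_i
                # Step 2: one-dimensional step-size problem, solved by a bounded line search
                lam = minimize_scalar(lambda t: f(x + t * h),
                                      bounds=(0.0, 1e3), method="bounded").x
                x = x + lam * h                    # Step 3
            return x

    Steepest descent (later slides) corresponds to passing search_direction = lambda x, g: -g.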

  • Slide 28/42

    The Wolfe Theorem

    The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$-smooth, and there exists a continuous function $k: \mathbb{R}^n \to [0, 1]$ with

    $\forall x: \ \nabla f(x) \ne 0 \ \Rightarrow \ k(x) > 0$

    and the search vectors constructed by the model algorithm satisfy:

    $\langle \nabla f(x_i), h_i \rangle \le -k(x_i) \, \| \nabla f(x_i) \| \, \| h_i \|$

  • Slide 29/42

    The Wolfe Theorem

    And $\nabla f(x_i) \ne 0 \ \Rightarrow \ h_i \ne 0$.

    Then if $\{ x_i \}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, then any accumulation point $y$ of this sequence satisfies:

    $\nabla f(y) = 0$

  • Slide 30/42

    The Wolfe Theorem

    The theorem has a very intuitive interpretation: always go in a descent direction, i.e. the search vector $h_i$ makes an acute angle with $-\nabla f(x_i)$.

  • Slide 31/42

    Preview

    Background

    Steepest Descent

    Conjugate Gradient

  • Slide 32/42

    Steepest Descent

    What does it mean? We now use what we have learned to implement the most basic minimization technique.

    First we introduce the algorithm, which is a version of the model algorithm.

    The problem: $\min_x f(x)$

  • Slide 33/42

    Steepest Descent

    The steepest descent algorithm:

    Data: $x_0 \in \mathbb{R}^n$

    Step 0: set $i = 0$

    Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$

    Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

    Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
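    A self-contained Python sketch of these steps (mine, not the author's code); the exact line search uses a bounded scalar minimizer, and the quadratic test function with its $H$ and $d$ is a made-up example.

        import numpy as np
        from scipy.optimize import minimize_scalar

        def steepest_descent(f, grad_f, x0, tol=1e-8, max_iter=500):
            x = np.asarray(x0, dtype=float)
            for _ in range(max_iter):
                g = grad_f(x)
                if np.linalg.norm(g) < tol:       # Step 1: stop when the gradient vanishes
                    break
                h = -g                             # Step 1: h_i = -grad f(x_i)
                lam = minimize_scalar(lambda t: f(x + t * h),
                                      bounds=(0.0, 1e3), method="bounded").x   # Step 2
                x = x + lam * h                    # Step 3
            return x

        # example: minimize f(x) = 0.5 <x, Hx> + <d, x> with a made-up H, d
        H = np.array([[3.0, 1.0], [1.0, 2.0]]); d = np.array([1.0, -1.0])
        f = lambda x: 0.5 * x @ H @ x + d @ x
        grad_f = lambda x: H @ x + d
        print(steepest_descent(f, grad_f, np.zeros(2)))   # close to np.linalg.solve(H, -d)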

  • Slide 34/42

    Steepest Descent

    Theorem: if $\{ x_i \}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:

    $\nabla f(y) = 0$

    Proof: from the Wolfe theorem.

  • Slide 35/42

    Steepest Descent

    From the chain rule:

    $\frac{d}{d \lambda} f(x_i + \lambda h_i) \Big|_{\lambda = \lambda_i} = \langle \nabla f(x_i + \lambda_i h_i), h_i \rangle = 0$

    so each new gradient is orthogonal to the previous search direction. Therefore the method of steepest descent looks like this:
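    A small numerical illustration of this orthogonality (my sketch; the quadratic $H$, $d$, and the starting point are made-up): after an exact line search, the new gradient is perpendicular to the previous direction.

        import numpy as np
        from scipy.optimize import minimize_scalar

        # made-up quadratic test function f(x) = 0.5 <x, Hx> + <d, x>
        H = np.array([[3.0, 0.5], [0.5, 1.0]]); d = np.array([1.0, -2.0])
        f = lambda x: 0.5 * x @ H @ x + d @ x
        grad = lambda x: H @ x + d

        x = np.array([2.0, 2.0])
        h = -grad(x)                                      # steepest-descent direction
        lam = minimize_scalar(lambda t: f(x + t * h)).x   # exact line search over t
        x_next = x + lam * h
        print(grad(x_next) @ h)   # ~ 0: the new gradient is orthogonal to h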

  • Slide 36/42

    Steepest Descent

  • Slide 37/42

    Steepest Descent

    Steepest descent finds critical points and local minima.

    Implicit step-size rule: we actually reduced the problem to finding the minimum of a one-dimensional function $f: \mathbb{R} \to \mathbb{R}$.

    There are extensions that give the step-size rule in a discrete sense (Armijo).
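    For reference, a minimal sketch of an Armijo-style backtracking rule in Python (my addition, not from the slides; the constants lam0, beta, and c are conventional but arbitrary choices). It can stand in for the exact $\arg\min$ in Step 2 of the algorithm above.

        import numpy as np

        def armijo_step(f, grad_f, x, h, lam0=1.0, beta=0.5, c=1e-4):
            # shrink lam until f(x + lam*h) <= f(x) + c * lam * <grad f(x), h>
            fx, slope = f(x), grad_f(x) @ h   # slope < 0 for a descent direction
            lam = lam0
            while f(x + lam * h) > fx + c * lam * slope:
                lam *= beta                   # geometric backtracking
            return lam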

  • Slide 38/42

    Preview

    Background

    Steepest Descent

    Conjugate Gradient

  • Slide 39/42

    Conjugate Gradient

    Modern optimization methods: conjugate direction methods.

    A method to solve quadratic function minimization:

    $\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} \langle x, H x \rangle + \langle d, x \rangle$

    ($H$ is symmetric and positive definite)

  • Slide 40/42

    Conjugate Gradient

    Originally aimed at solving linear problems:

    $\min_{x \in \mathbb{R}^n} \| A x - b \|^2, \qquad A x = b$

    Later extended to general functions, under the rationale that a quadratic approximation to a function is quite accurate.

  • Slide 41/42

    Conjugate Gradient

    The basic idea: decompose the n-dimensional quadratic problem into n problems of 1 dimension.

    This is done by exploring the function in conjugate directions.

    Definition (H-conjugate vectors): $\{ u_i \}_{i=1}^{n} \subset \mathbb{R}^n$ with $\langle u_i, H u_j \rangle = 0$ for $i \ne j$.
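    One way to see the definition in action (my sketch, not from the slides): Gram-Schmidt with the inner product $\langle u, H v \rangle$ turns any basis into an H-conjugate one. The matrix $H$ below is a made-up symmetric positive definite example.

        import numpy as np

        def h_conjugate_basis(H):
            # Gram-Schmidt in the H-inner product: rows u_i satisfy <u_i, H u_j> = 0 for i != j
            us = []
            for e in np.eye(H.shape[0]):
                u = e.copy()
                for v in us:
                    u -= (e @ H @ v) / (v @ H @ v) * v
                us.append(u)
            return np.array(us)

        H = np.array([[4.0, 1.0], [1.0, 3.0]])
        U = h_conjugate_basis(H)
        print(U @ H @ U.T)   # off-diagonal entries are (numerically) zero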

  • Slide 42/42

    Conjugate Gradient

    If there is an H-conjugate basis then, writing $x = x_0 + \sum_j \lambda_j h_j$:

    $f(x) := \tfrac{1}{2} \langle x, H x \rangle + \langle d, x \rangle = f(x_0) + \sum_j \left( \tfrac{1}{2} \lambda_j^2 \langle h_j, H h_j \rangle + \lambda_j \langle d + H x_0, h_j \rangle \right)$

    N problems in 1 dimension (a simple "smiling" quadratic in each $\lambda_j$).

    The global minimizer is calculated sequentially, starting from $x_0$:

    $x_{i+1} = x_i + \lambda_i h_i, \qquad i = 0, 1, \ldots, n-1$
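    Finally, a compact Python sketch of conjugate gradient for this quadratic problem (my own implementation, using the Fletcher-Reeves update to build the H-conjugate directions; $H$, $d$, and the starting point are made-up).

        import numpy as np

        def conjugate_gradient(H, d, x0, tol=1e-10):
            # minimize 0.5*<x, Hx> + <d, x> for symmetric positive definite H,
            # i.e. solve H x = -d; at most n = len(d) steps are needed
            x = np.asarray(x0, dtype=float)
            g = H @ x + d                 # gradient of the quadratic at x
            h = -g                        # first direction: steepest descent
            for _ in range(len(d)):
                if np.linalg.norm(g) < tol:
                    break
                Hh = H @ h
                lam = -(g @ h) / (h @ Hh)          # exact minimizer along h
                x = x + lam * h
                g_new = H @ x + d
                beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
                h = -g_new + beta * h              # next H-conjugate direction
                g = g_new
            return x

        # example: agrees with the direct solution of H x = -d
        H = np.array([[4.0, 1.0], [1.0, 3.0]]); d = np.array([1.0, 2.0])
        print(conjugate_gradient(H, d, np.zeros(2)), np.linalg.solve(H, -d))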