8/3/2019 cg_ex10
1/42
Gradient Methods
Yaron Lipman
May 2003
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
The min(max) problem:

  $\min_{x} f(x)$

But we learned in calculus how to solve that kind of question!
Motivation
Not exactly.
Functions: high-order polynomials, e.g.

  $x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$

What about functions $f: \mathbb{R}^n \to \mathbb{R}$ that don't have an analytic presentation: a black box.
Motivation
A real-world problem: finding a harmonic mapping.

  $E_{harm}(x_1, y_1, \ldots, x_n, y_n) = \frac{1}{2} \sum_{(i,j) \in E} k_{i,j} \, \| v_i - v_j \|^2, \qquad E_{harm}: \mathbb{R}^{2n} \to \mathbb{R}$

General problem: find a global min(max).
This lecture will concentrate on finding a local minimum.
Example surface:

  $f(x, y) := \cos\!\left(\tfrac{1}{2}x\right) \cos\!\left(\tfrac{1}{2}y\right) x$
Directional derivatives: first, the one-dimensional derivative.
Directional derivatives: along the axes

  $\frac{\partial f}{\partial x}(x, y), \qquad \frac{\partial f}{\partial y}(x, y)$
Directional derivatives: in a general direction

  $\frac{\partial f}{\partial v}(x, y), \qquad v \in \mathbb{R}^2, \ \|v\| = 1$
Directional derivatives

  $\frac{\partial f}{\partial x}(x, y), \qquad \frac{\partial f}{\partial y}(x, y)$
The Gradient: definition in the plane $\mathbb{R}^2$

  $f: \mathbb{R}^2 \to \mathbb{R}, \qquad \nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
The Gradient: definition

  $f: \mathbb{R}^n \to \mathbb{R}, \qquad \nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
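The definition above is easy to check numerically. A minimal sketch (assuming NumPy; the test function is a hypothetical example, not from the lecture) approximates $\nabla f$ by central differences and compares it with the hand-computed gradient:

```python
import numpy as np

def num_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f: R^n -> R by central differences."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Illustrative function: f(x, y) = x^2 + 3y, so grad f = (2x, 3).
f = lambda v: v[0] ** 2 + 3 * v[1]
print(num_gradient(f, [1.0, 2.0]))  # ≈ [2., 3.]
```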
The Gradient Properties
The gradient defines the (hyper)plane approximating the function infinitesimally:

  $\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$
The Gradient properties
By the chain rule (important for later use):

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle, \qquad \|v\| = 1$
The Gradient properties
Proposition 1: $\frac{\partial f}{\partial v}(p)$ is

maximal choosing $\ v = \frac{1}{\|\nabla f(p)\|}\,\nabla f(p)$

minimal choosing $\ v = -\frac{1}{\|\nabla f(p)\|}\,\nabla f(p)$

(Intuitive: the gradient points in the direction of greatest change.)
The Gradient properties
Proof (only for the minimum case):

Assign $v = -\frac{1}{\|\nabla f(p)\|}\,\nabla f(p)$. By the chain rule:

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle = -\frac{1}{\|\nabla f(p)\|}\,\langle \nabla f(p), \nabla f(p) \rangle = -\frac{\|\nabla f(p)\|^2}{\|\nabla f(p)\|} = -\|\nabla f(p)\|$
The Gradient properties
On the other hand, for a general unit vector $v$, by Cauchy–Schwarz:

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle \ \ge\ -\|\nabla f(p)\|\,\|v\| = -\|\nabla f(p)\|$

so the choice above indeed attains the minimum.
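Proposition 1 can be sanity-checked numerically. The sketch below (NumPy assumed; the gradient value is an arbitrary illustration) samples random unit directions and verifies the directional derivative never leaves $[-\|\nabla f(p)\|, \|\nabla f(p)\|]$, with the bounds attained at $v = \pm \nabla f(p)/\|\nabla f(p)\|$:

```python
import numpy as np

rng = np.random.default_rng(0)
grad_p = np.array([3.0, -4.0])   # illustrative gradient at p, norm 5
gnorm = np.linalg.norm(grad_p)

# Directional derivative along unit v is <grad_p, v>; Cauchy-Schwarz bounds it.
for _ in range(1000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    dd = grad_p @ v
    assert -gnorm - 1e-12 <= dd <= gnorm + 1e-12

# The bounds are attained exactly at v = ±grad_p/‖grad_p‖.
print(grad_p @ (grad_p / gnorm), grad_p @ (-grad_p / gnorm))  # ±‖∇f(p)‖ = ±5
```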
The Gradient Properties
Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$-smooth function around $p$. If $f$ has a local minimum (maximum) at $p$, then

  $\nabla f(p) = 0$

(Intuitive: a necessary condition for a local min(max).)
The Gradient Properties
Proof. Intuitively: if $\nabla f(p) \ne 0$, moving slightly in the direction $-\nabla f(p)$ decreases $f$, contradicting local minimality.
The Gradient Properties
Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

  $0 = \left.\frac{d\,f(p + tv)}{dt}\right|_{t=0} = \langle \nabla f(p), v \rangle \quad \Rightarrow \quad \nabla f(p) = 0$
The Gradient Properties
We found the best INFINITESIMAL DIRECTION at each point; looking for a minimum is then a "blind man" procedure. How can we derive the way to the minimum using this knowledge?
The Wolfe Theorem
This is the link from the previous gradient properties to a constructive algorithm. The problem:

  $\min_{x} f(x)$
The Wolfe Theorem
We introduce a model for the algorithm:

Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$
Step 2: compute the step size $\lambda_i = \operatorname{argmin}_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, $i = i + 1$; go to Step 1
The Wolfe Theorem
The theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$-smooth, and there exists a continuous function $k: \mathbb{R}^n \to [0, 1]$ with

  $\forall x: \ \nabla f(x) \ne 0 \ \Rightarrow \ k(x) > 0$

and the search vectors constructed by the model algorithm satisfy

  $\langle \nabla f(x_i), h_i \rangle \ \le\ -k(x_i)\,\|\nabla f(x_i)\|\,\|h_i\|$
The Wolfe Theorem
And

  $\nabla f(x_i) \ne 0 \ \Rightarrow \ h_i \ne 0$

Then, if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, any accumulation point $y$ of this sequence satisfies

  $\nabla f(y) = 0$
The Wolfe Theorem
The theorem has a very intuitive interpretation: always go in a descent direction, i.e., $h_i$ must make a strictly negative inner product with $\nabla f(x_i)$.
Steepest Descent
What does it mean? We now use what we have learned to implement the most basic minimization technique. First we introduce the algorithm, which is a version of the model algorithm. The problem:

  $\min_{x} f(x)$
Steepest Descent
Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step size $\lambda_i = \operatorname{argmin}_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, $i = i + 1$; go to Step 1
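The algorithm above can be sketched for the quadratic case, where the exact line search $\lambda_i = \operatorname{argmin}_{\lambda} f(x_i + \lambda h_i)$ has a closed form; a minimal NumPy sketch (the matrix $H$ and vector $d$ below are illustrative choices, not from the lecture):

```python
import numpy as np

def steepest_descent_quadratic(H, d, x0, tol=1e-8, max_iter=1000):
    """Minimize f(x) = 0.5 <x, Hx> + <d, x> by steepest descent.
    For a quadratic, the exact step size argmin_l f(x + l*h) is closed-form."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = H @ x + d                  # gradient of f at x
        if np.linalg.norm(g) < tol:    # Step 1: stop at a critical point
            break
        h = -g                         # steepest-descent direction
        lam = (g @ g) / (h @ H @ h)    # exact line search; h.T H h > 0 since H ≻ 0
        x = x + lam * h                # Step 3
    return x

H = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
d = np.array([-1.0, 1.0])
x_star = steepest_descent_quadratic(H, d, np.zeros(2))
print(np.allclose(H @ x_star, -d))       # the minimizer solves Hx = -d
```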
Steepest Descent
Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies

  $\nabla f(y) = 0$

Proof: from the Wolfe theorem (with $k(x) \equiv 1$).
Steepest Descent
From the chain rule:

  $\frac{d}{d\lambda} f(x_i + \lambda h_i)\Big|_{\lambda = \lambda_i} = \langle \nabla f(x_i + \lambda_i h_i), h_i \rangle = 0$

Therefore consecutive directions are orthogonal, and the method of steepest descent follows a zigzag path.
Steepest Descent
Steepest descent finds a critical point, a candidate local minimum. The step-size rule is implicit: we actually reduced the problem to finding the minimum of a one-dimensional function $f: \mathbb{R} \to \mathbb{R}$ at each iteration. There are extensions that give the step-size rule in a discrete sense (e.g., the Armijo rule).
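The Armijo rule mentioned above replaces the exact argmin by backtracking until a sufficient-decrease condition holds; a minimal sketch assuming NumPy, with conventional parameter values ($\beta$, $\sigma$) that are not specified in the lecture:

```python
import numpy as np

def armijo_step(f, grad_f, x, h, beta=0.5, sigma=1e-4, lam0=1.0):
    """Backtracking (Armijo) rule: shrink lam until the sufficient-decrease
    condition f(x + lam*h) <= f(x) + sigma*lam*<grad f(x), h> holds."""
    fx, g = f(x), grad_f(x)
    lam = lam0
    while f(x + lam * h) > fx + sigma * lam * (g @ h):
        lam *= beta
    return lam

# Illustrative use on f(x) = ||x||^2 with the steepest-descent direction.
f = lambda x: x @ x
grad_f = lambda x: 2 * x
x = np.array([2.0, -1.0])
lam = armijo_step(f, grad_f, x, -grad_f(x))
print(f(x + lam * (-grad_f(x))) < f(x))  # True: the accepted step decreases f
```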
Conjugate Gradient
Modern optimization methods: conjugate direction methods. A method to solve quadratic function minimization:

  $\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\langle x, Hx \rangle + \langle d, x \rangle$

($H$ is symmetric and positive definite)
Conjugate Gradient
Originally aimed at solving linear problems:

  $\min_{x \in \mathbb{R}^n} \|Ax - b\|^2 \quad \Leftrightarrow \quad Ax = b$

Later extended to general functions, under the rationale that the quadratic approximation to a function is quite accurate.
Conjugate Gradient
The basic idea: decompose the $n$-dimensional quadratic problem into $n$ problems of dimension 1. This is done by exploring the function in conjugate directions.

Definition ($H$-conjugate vectors):

  $\{u_i\}_{i=1}^{n} \subset \mathbb{R}^n, \qquad \langle u_i, H u_j \rangle = 0 \ \ \forall i \ne j$
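One concrete way to see that $H$-conjugate bases exist: the eigenvectors of a symmetric $H$ are orthogonal, hence $H$-conjugate, since $\langle u_i, H u_j \rangle = \lambda_j \langle u_i, u_j \rangle = 0$ for $i \ne j$. A small NumPy check (the matrix is an arbitrary example):

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite

# eigh returns orthonormal eigenvectors of a symmetric matrix,
# so any two distinct columns are H-conjugate.
_, U = np.linalg.eigh(H)
u1, u2 = U[:, 0], U[:, 1]
print(abs(u1 @ H @ u2) < 1e-12)   # True: the pair is H-conjugate
```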
Conjugate Gradient
If there is an $H$-conjugate basis $\{h_j\}$ then, writing $x = x_0 + \sum_j \lambda_j h_j$ and $f(x) := \tfrac{1}{2}\langle x, Hx \rangle + \langle d, x \rangle$:

  $f(x) = f\Big(x_0 + \sum_j \lambda_j h_j\Big) = f(x_0) + \sum_j \Big( \tfrac{1}{2}\lambda_j^2 \langle h_j, H h_j \rangle + \lambda_j \langle d + H x_0, h_j \rangle \Big)$

i.e., $n$ problems in one dimension (simple "smiling" quadratics). The global minimizer is calculated sequentially, starting from $x_0$:

  $x_{i+1} = x_i + \lambda_i h_i, \qquad i = 0, 1, \ldots, n-1$
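The sequential update above is the heart of the (linear) conjugate gradient method; a minimal NumPy sketch that builds the conjugate directions on the fly, using the standard residual-based recurrences (these explicit update formulas are textbook CG, not spelled out in these slides, and $H$, $d$ are illustrative):

```python
import numpy as np

def conjugate_gradient(H, d, x0, tol=1e-10):
    """Minimize f(x) = 0.5 <x, Hx> + <d, x> (H symmetric positive definite)
    by building H-conjugate directions; converges in at most n steps."""
    x = np.asarray(x0, dtype=float)
    r = -(H @ x + d)          # residual = -grad f(x)
    h = r.copy()              # first direction: steepest descent
    for _ in range(len(x)):
        if np.linalg.norm(r) < tol:
            break
        lam = (r @ r) / (h @ H @ h)        # exact 1-d minimization along h
        x = x + lam * h                    # the sequential update x_{i+1} = x_i + lam_i h_i
        r_new = r - lam * (H @ h)
        beta = (r_new @ r_new) / (r @ r)   # keeps the next direction H-conjugate
        h = r_new + beta * h
        r = r_new
    return x

H = np.array([[3.0, 1.0], [1.0, 2.0]])
d = np.array([-1.0, 1.0])
x_star = conjugate_gradient(H, d, np.zeros(2))
print(np.allclose(H @ x_star, -d))   # True after at most n = 2 steps
```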