Transcript
Page 1

Gradient Methods

May 2005

Page 2

Preview

Background, Steepest Descent, Conjugate Gradient

Page 3

Preview

Background, Steepest Descent, Conjugate Gradient

Page 4

Background

Motivation, the gradient notion, the Wolfe theorems

Page 5

Motivation

The min (max) problem:

$$\min_{x} f(x)$$

But we learned in calculus how to solve that kind of question!

Page 6

Motivation

Not exactly. Functions $f : \mathbb{R}^n \to \mathbb{R}$, e.g. high-order polynomials:

$$x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$$

And what about functions that don't have an analytic presentation at all: a "black box"?

Page 7

Motivation: a "real world" problem

Connectivity shapes (Isenburg, Gumhold, Gotsman)

$$\text{mesh} = \{\, C = (V, E),\ \text{geometry} \,\}$$

What do we get only from C, without the geometry?

Page 8

Motivation: a "real world" problem

First we introduce error functionals and then try to minimize them:

$$E_s(x) = \sum_{(i,j)\in E} \big( \|x_i - x_j\| - 1 \big)^2, \qquad x \in \mathbb{R}^{3n}$$

$$L(x_i) = \frac{1}{d_i} \sum_{(i,j)\in E} x_j \;-\; x_i$$

$$E_r(x) = \sum_{i=1}^{n} \| L(x_i) \|^2$$

Page 9

Motivation: a "real world" problem

Then we minimize:

$$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \big[ \lambda E_s(x) + (1-\lambda) E_r(x) \big]$$

This is a high-dimensional non-linear problem. The authors use the conjugate gradient method, which is perhaps the most popular optimization technique, based on what we'll see here.

Page 10

Motivation: a "real world" problem

Changing the parameter $\lambda$:

$$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \big[ \lambda E_s(x) + (1-\lambda) E_r(x) \big]$$

Page 11

Motivation

General problem: find a global min (max). This lecture will concentrate on finding a local minimum.

Page 12

Background

Motivation, the gradient notion, the Wolfe theorems

Page 13

An example surface (plotted on the slide):

$$f(x, y) := \cos\!\left(\tfrac{1}{2}x\right)\cos\!\left(\tfrac{1}{2}y\right)\, x$$

Page 14

Directional derivatives: first, the one-dimension derivative:
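The slide illustrates this with a picture; for reference, the standard definition is

$$f'(x_0) = \lim_{t \to 0} \frac{f(x_0 + t) - f(x_0)}{t}.$$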

Page 15

Directional derivatives: along the axes…

$$\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}$$

Page 16

Directional derivatives: in a general direction…

$$\frac{\partial f(x,y)}{\partial v}, \qquad v \in \mathbb{R}^2, \quad \|v\| = 1$$
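The slide shows this pictorially; the standard limit definition (not spelled out on the slide) is

$$\frac{\partial f}{\partial v}(p) = \lim_{t \to 0} \frac{f(p + t v) - f(p)}{t}.$$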

Page 17

Directional Derivatives

$$\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}$$

Page 18

The Gradient: Definition in the plane $\mathbb{R}^2$

$$f : \mathbb{R}^2 \to \mathbb{R}$$

$$\nabla f(x, y) := \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)$$

Page 19

The Gradient: Definition

$$f : \mathbb{R}^n \to \mathbb{R}$$

$$\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1},\ \ldots,\ \frac{\partial f}{\partial x_n} \right)$$
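As an aside (not part of the slides), a minimal Python sketch of a central-difference numerical gradient, the kind of thing one would use for the "black box" functions mentioned in the motivation:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of grad f at x (a 1-D numpy array)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Example: f(x, y) = x^2 + 3*y^2 has gradient (2x, 6y).
f = lambda p: p[0]**2 + 3 * p[1]**2
print(numerical_gradient(f, np.array([1.0, 2.0])))  # ~[2., 12.]
```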

Page 20

The Gradient Properties

The gradient defines a (hyper)plane approximating the function infinitesimally:

$$\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$$
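Equivalently (a standard restatement, not written on the slide), the first-order tangent-plane approximation near p reads

$$f(p + \Delta) \approx f(p) + \langle \nabla f|_{p},\ \Delta \rangle.$$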

Page 21

The Gradient Properties

By the chain rule (important for later use):

$$\frac{\partial f}{\partial v}(p) = \langle \nabla f|_{p},\ v \rangle, \qquad \|v\| = 1$$

Page 22

The Gradient Properties

Proposition 1: $\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \dfrac{\nabla f|_{p}}{\|\nabla f|_{p}\|}$,

and minimal when choosing $v = -\dfrac{\nabla f|_{p}}{\|\nabla f|_{p}\|}$.

(Intuitively: the gradient points in the direction of greatest change.)

Page 23

The Gradient Properties

Proof (only for the minimum case): Assign $v = -\dfrac{\nabla f|_{p}}{\|\nabla f|_{p}\|}$ (so $\|v\| = 1$). By the chain rule:

$$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_{p},\ -\frac{\nabla f|_{p}}{\|\nabla f|_{p}\|} \right\rangle = -\frac{\|\nabla f|_{p}\|^2}{\|\nabla f|_{p}\|} = -\|\nabla f|_{p}\|$$

Page 24

The Gradient Properties

On the other hand, for a general unit vector v (by Cauchy-Schwarz):

$$\frac{\partial f}{\partial v}(p) = \langle \nabla f|_{p},\ v \rangle \ \ge\ -\|\nabla f|_{p}\|\,\|v\| = -\|\nabla f|_{p}\|$$

Page 25

The Gradient Properties

Proposition 2: let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ smooth function around p. If f has a local minimum (maximum) at p, then

$$\nabla f|_{p} = 0$$

(Intuitively: this is a necessary condition for a local min (max).)

Page 26

The Gradient Properties

Proof (intuition, illustrated on the slide):

Page 27

The Gradient Properties

Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$, the function $t \mapsto f(p + t v)$ has a local minimum (maximum) at $t = 0$, so we get

$$0 = \frac{d\, f(p + t v)}{dt}\bigg|_{t=0} = \langle \nabla f|_{p},\ v \rangle,$$

and since this holds for every v, $\nabla f|_{p} = 0$.

Page 28

The Gradient Properties

We found the best INFINITESIMAL DIRECTION at each point. Looking for a minimum: a "blind man" procedure. How can we derive the way to the minimum using this knowledge?

Page 29

Background

Motivation, the gradient notion, the Wolfe theorems

Page 30

The Wolfe Theorem

This is the link from the previous gradient properties to the constructive algorithm.

The problem:

$$\min_{x} f(x)$$

Page 31

The Wolfe Theorem

We introduce a model for the algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set $i = 0$

Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$

Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, go to Step 1
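A minimal Python sketch of this model algorithm (my illustration, not from the slides), assuming f and its gradient are supplied as callables; the exact line search in Step 2 is approximated with scipy's bounded scalar minimizer, and the search-direction rule is a parameter:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def model_algorithm(f, grad_f, x0, direction, tol=1e-8, max_iter=1000):
    """Generic line-search scheme: Steps 0-3 from the slide.

    direction(x) must return a search direction h with <h, grad_f(x)> < 0.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:           # Step 1: stop at a critical point
            break
        h = direction(x)                       # Step 1: search direction
        # Step 2: step-size by (approximate) exact line search over lambda >= 0
        lam = minimize_scalar(lambda t: f(x + t * h),
                              bounds=(0.0, 1e3), method="bounded").x
        x = x + lam * h                        # Step 3
    return x
```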

Page 32

The Wolfe Theorem

The theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function $k : \mathbb{R}^n \to [0, 1]$ with

$$\forall x:\ \nabla f(x) \neq 0 \ \Rightarrow\ k(x) > 0,$$

and the search vectors constructed by the model algorithm satisfy

$$\langle h_i,\ \nabla f(x_i) \rangle \ \le\ -\,k(x_i)\,\|h_i\|\,\|\nabla f(x_i)\|$$

Page 33

The Wolfe Theorem

And

$$\nabla f(x_i) \neq 0 \ \Rightarrow\ h_i \neq 0.$$

Then if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, any accumulation point y of this sequence satisfies

$$\nabla f(y) = 0.$$

Page 34

The Wolfe Theorem

The theorem has a very intuitive interpretation: always go in a descent direction, i.e. one making a negative inner product $\langle h_i, \nabla f(x_i) \rangle$ with the gradient.

Page 35

Preview

Background, Steepest Descent, Conjugate Gradient

Page 36

Steepest Descent

What does it mean? We now use what we have learned to implement the most basic minimization technique.

First we introduce the algorithm, which is a version of the model algorithm.

The problem:

$$\min_{x} f(x)$$

Page 37

Steepest Descent

Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set $i = 0$

Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$

Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, go to Step 1
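In the Python sketch given after the Page 31 algorithm, steepest descent is just the direction choice $h = -\nabla f(x)$; a small (hypothetical) usage example reusing that model_algorithm sketch:

```python
# Steepest descent on a simple quadratic, reusing model_algorithm from above.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])

x_star = model_algorithm(f, grad_f, x0=[3.0, 1.0],
                         direction=lambda x: -grad_f(x))
print(x_star)  # should be close to the minimizer (0, 0)
```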

Page 38

Steepest Descent

Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies

$$\nabla f(y) = 0.$$

Proof: from the Wolfe theorem.

Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given (are calculated numerically).

Page 39

Steepest Descent

From the chain rule, at the minimizing step-size:

$$\frac{d}{d\lambda} f(x_i + \lambda h_i) = \langle \nabla f(x_i + \lambda h_i),\ h_i \rangle = 0,$$

so each new gradient is orthogonal to the previous search direction. Therefore the method of steepest descent looks like this:

Page 40

Steepest Descent

Page 41

Steepest Descent

Steepest descent finds critical points and local minima.

Implicit step-size rule: actually we reduced the problem to finding the minimum of a one-dimensional function $f : \mathbb{R} \to \mathbb{R}$.

There are extensions that give the step-size rule in a discrete sense (Armijo).
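The slide only names the Armijo rule; a minimal backtracking sketch of it in Python (standard textbook form, assumed details, not from the slides):

```python
import numpy as np

def armijo_step(f, grad_f, x, h, beta=0.5, sigma=1e-4, lam0=1.0):
    """Backtracking (Armijo) step-size: shrink lambda until the sufficient-decrease
    condition f(x + lam*h) <= f(x) + sigma*lam*<grad f(x), h> holds."""
    lam = lam0
    fx, slope = f(x), np.dot(grad_f(x), h)   # slope < 0 for a descent direction h
    while f(x + lam * h) > fx + sigma * lam * slope:
        lam *= beta
    return lam
```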

Page 42

Steepest Descent

Back to our connectivity shapes: the authors solve the one-dimensional problem

$$\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$$

analytically. They change the spring energy to

$$E_s(x) = \sum_{(i,j)\in E} \big( \|x_i - x_j\|^2 - 1 \big)^2, \qquad x \in \mathbb{R}^{3n},$$

and get a quartic polynomial in x.

Page 43

Preview

Background, Steepest Descent, Conjugate Gradient

Page 44

Conjugate Gradient

From now on we assume we want to minimize the quadratic function:

$$f(x) = \frac{1}{2} x^T A x - b^T x + c$$

This is equivalent to solving the linear problem:

$$0 = \nabla f(x) = A x - b$$

There are generalizations to general functions.
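One intermediate step the slide skips (assuming, as is standard for conjugate gradient, that A is symmetric):

$$\nabla f(x) = \tfrac{1}{2}\,(A + A^{T})\,x - b \;=\; A x - b \qquad (A = A^{T}).$$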

Page 45

Conjugate Gradient

What is the problem with steepest descent?

We can repeat the same directions over and over…

Conjugate gradient takes at most n steps.

Page 46

Conjugate Gradient

Let $\tilde{x}$ be the exact solution, $A\tilde{x} = b$, and define the error $e_i = x_i - \tilde{x}$.

Search directions $d_0, d_1, \ldots, d_j, \ldots$ should span $\mathbb{R}^n$.

$$x_{i+1} = x_i + \lambda_i d_i$$

$$\nabla f(x_i) = A x_i - b = A x_i - A\tilde{x} = A e_i$$

Page 47

Conjugate Gradient

Given $d_j$, how do we calculate $\lambda_j$? (as before, by exact line search)

$$0 = \frac{d}{d\lambda} f(x_{i+1}) \;\Rightarrow\; d_i^T \nabla f(x_{i+1}) = 0 \;\Rightarrow\; d_i^T A e_{i+1} = 0$$

$$d_i^T A (e_i + \lambda_i d_i) = 0 \;\Rightarrow\; \lambda_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$$

Page 48

Conjugate Gradient

How do we find $d_j$? We want that after n steps the error will be 0. Since the directions span $\mathbb{R}^n$, write

$$e_0 = \sum_{i=0}^{n-1} \delta_i d_i.$$

After j steps the error is

$$e_j = e_0 + \lambda_0 d_0 + \lambda_1 d_1 + \cdots + \lambda_{j-1} d_{j-1} = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \lambda_i d_i.$$

Page 49

Conjugate Gradient

Here is an idea: if $\lambda_j = -\delta_j$ then

$$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \lambda_i d_i = \sum_{i=j}^{n-1} \delta_i d_i,$$

so if $j = n$, then $e_n = 0$.

Page 50

Conjugate Gradient

So we look for $d_j$ such that $\lambda_j = -\delta_j$. A simple calculation shows that this holds if we take the directions to be A-conjugate (A-orthogonal):

$$d_j^T A d_i = 0 \qquad \text{for } j \neq i.$$

Page 51

Conjugate Gradient

We have to find an A-conjugate basis $d_j,\ j = 0, \ldots, n-1$.

We can do a "Gram-Schmidt" process, but we should be careful since it is an O(n³) process:

$$d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik}\, d_k,$$

where $u_0, u_1, \ldots, u_{n-1}$ is some series of vectors.
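A minimal Python sketch of this A-conjugation process (my own illustration, not from the slides), using the standard coefficient $\beta_{ik} = -\dfrac{u_i^T A d_k}{d_k^T A d_k}$, which enforces $d_i^T A d_k = 0$:

```python
import numpy as np

def conjugate_gram_schmidt(A, U):
    """Turn linearly independent vectors U = [u_0, ..., u_{n-1}] into
    A-conjugate directions d_i = u_i + sum_k beta_ik d_k, with beta_ik
    chosen so that d_i^T A d_k = 0 for k < i. O(n^3) overall."""
    D = []
    for u in U:
        d = u.astype(float).copy()
        for dk in D:
            beta = -(u @ A @ dk) / (dk @ A @ dk)   # enforce d^T A dk = 0
            d += beta * dk
        D.append(d)
    return D

# Tiny check with a symmetric positive-definite A and the standard basis as u_i.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
d0, d1 = conjugate_gram_schmidt(A, [np.eye(2)[0], np.eye(2)[1]])
print(d0 @ A @ d1)  # ~0: the directions are A-conjugate
```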

Page 52

Conjugate Gradient

So for an arbitrary choice of $u_i$ we don't gain anything.

Luckily, we can choose $u_i$ so that the conjugate-direction calculation is O(m), where m is the number of non-zero entries in $A$.

The correct choice of $u_i$ is:

$$u_i = -\nabla f(x_i)$$

Page 53

Conjugate Gradient

So the conjugate gradient algorithm for minimizing f:

Data: $x_0 \in \mathbb{R}^n$

Step 0: $d_0 := r_0 = -\nabla f(x_0)$

Step 1: $\lambda_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$

Step 2: $x_{i+1} = x_i + \lambda_i d_i$, and $r_{i+1} := -\nabla f(x_{i+1})$

Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$

Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat n times.
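A minimal Python sketch of these steps for the quadratic $f(x) = \tfrac{1}{2}x^T A x - b^T x + c$, where $r_i = -\nabla f(x_i) = b - A x_i$ (my illustration, assuming A is symmetric positive definite):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, max_iter=None, tol=1e-10):
    """Steps 0-4 from the slide, for f(x) = 0.5*x^T A x - b^T x + c."""
    n = b.size
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x                 # Step 0: r_0 = -grad f(x_0)
    d = r.copy()                  # Step 0: d_0 = r_0
    for _ in range(max_iter or n):
        rr = r @ r
        if np.sqrt(rr) <= tol:
            break
        lam = rr / (d @ A @ d)    # Step 1
        x = x + lam * d           # Step 2
        r = r - lam * (A @ d)     # Step 2: r_{i+1} = b - A x_{i+1}
        beta = (r @ r) / rr       # Step 3
        d = r + beta * d          # Step 4
    return x

# Usage: solve A x = b, i.e. minimize the quadratic.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # ~ np.linalg.solve(A, b)
```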