Gradient Methods
May 2005
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
The min(max) problem:
$$\min_{x} f(x)$$
But we learned in calculus how to solve that kind of question!
Motivation
Not exactly.
Functions: high-order polynomials, e.g.
$$x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$$
What about functions that don't have an analytic presentation, i.e. a black box
$$f: \mathbb{R}^n \to \mathbb{R}\,?$$
Motivation: real-world problem
Connectivity shapes (Isenburg, Gumhold, Gotsman):
a mesh is $\{C = (V,E),\ \text{geometry}\}$. What do we get from the connectivity $C$ alone, without the geometry?
Motivation: real-world problem
First we introduce error functionals and then try to minimize them:
$$E_s(x) = \sum_{(i,j)\in E} \left(\|x_i - x_j\| - 1\right)^2\,,\qquad x\in\mathbb{R}^{3n}$$
$$L(x_i) = x_i - \frac{1}{d_i}\sum_{(i,j)\in E} x_j$$
$$E_r(x) = \sum_{i=1}^{n} \|L(x_i)\|^2$$
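As a concrete reading of these functionals, here is a minimal numerical sketch (our own illustration, not from the paper: vertex positions are an n-by-3 numpy array, edges a list of index pairs, and the function names are hypothetical):

    import numpy as np

    def spring_energy(x, edges):
        # E_s(x) = sum over edges (||x_i - x_j|| - 1)^2
        return sum((np.linalg.norm(x[i] - x[j]) - 1.0) ** 2 for i, j in edges)

    def roundness_energy(x, edges):
        # E_r(x) = sum over vertices ||L(x_i)||^2, where
        # L(x_i) = x_i - (1/d_i) * (sum of x_i's neighbors)
        n = len(x)
        nbrs = [[] for _ in range(n)]
        for i, j in edges:
            nbrs[i].append(j)
            nbrs[j].append(i)
        total = 0.0
        for i in range(n):
            if nbrs[i]:
                L = x[i] - np.mean([x[j] for j in nbrs[i]], axis=0)
                total += float(L @ L)
        return total

    # Toy usage: a tetrahedron's connectivity with arbitrary positions.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    x = np.random.rand(4, 3)
    print(spring_energy(x, edges), roundness_energy(x, edges))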
Motivation: real-world problem
Then we minimize:
$$E(C) = \arg\min_{x\in\mathbb{R}^{3n}} \left[\lambda\, E_s(x) + (1-\lambda)\, E_r(x)\right]$$
This is a high-dimensional non-linear problem.
The authors use the conjugate gradient method, which is maybe the most popular optimization technique, based on what we'll see here.
Motivation: real-world problem
Changing the parameter $\lambda$:
$$E(C) = \arg\min_{x\in\mathbb{R}^{3n}} \left[\lambda\, E_s(x) + (1-\lambda)\, E_r(x)\right]$$
Background
Motivation
The gradient notion
The Wolfe Theorems
An example surface (shown as a plot):
$$f(x,y) := \cos\left(\tfrac{1}{2}x\right)\cos\left(\tfrac{1}{2}y\right)x$$
Directional Derivatives:
first, the one-dimensional derivative:
Directional Derivatives: Along the Axes
$$\frac{\partial f}{\partial x}(x,y)\,,\qquad \frac{\partial f}{\partial y}(x,y)$$
Directional Derivatives: In a General Direction
$$\frac{\partial f}{\partial v}(x,y)\,,\qquad v\in\mathbb{R}^2\,,\ \|v\|=1$$
Directional Derivatives
$$\frac{\partial f}{\partial x}(x,y)\,,\qquad \frac{\partial f}{\partial y}(x,y)$$
The Gradient: Definition in the Plane
$$f:\mathbb{R}^2\to\mathbb{R}$$
$$\nabla f(x,y) := \left(\frac{\partial f}{\partial x}\,,\ \frac{\partial f}{\partial y}\right)$$
The Gradient: Definition
$$f:\mathbb{R}^n\to\mathbb{R}$$
$$\nabla f(x_1,\dots,x_n) := \left(\frac{\partial f}{\partial x_1}\,,\ \dots\,,\ \frac{\partial f}{\partial x_n}\right)$$
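To make the definition concrete, a small finite-difference sketch (our own illustration; grad_fd and the step size h are assumptions, not part of the slides):

    import numpy as np

    def grad_fd(f, x, h=1e-6):
        # Approximate (df/dx_1, ..., df/dx_n) by central differences.
        x = np.asarray(x, dtype=float)
        g = np.zeros_like(x)
        for k in range(x.size):
            e = np.zeros_like(x)
            e[k] = h
            g[k] = (f(x + e) - f(x - e)) / (2.0 * h)
        return g

    # Example on the surface shown earlier: f(x,y) = cos(x/2) cos(y/2) x
    f = lambda p: np.cos(p[0] / 2.0) * np.cos(p[1] / 2.0) * p[0]
    print(grad_fd(f, [1.0, 2.0]))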
The Gradient Properties
The gradient defines a (hyper)plane approximating the function infinitesimally:
$$\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$$
The Gradient Properties
By the chain rule (important for later use):
$$\frac{\partial f}{\partial v}(p) = \langle \nabla f_p\,,\ v\rangle\,,\qquad \|v\| = 1$$
The Gradient Properties
Proposition 1: $\dfrac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \dfrac{\nabla f_p}{\|\nabla f_p\|}$ and minimal when choosing $v = -\dfrac{\nabla f_p}{\|\nabla f_p\|}$.
(Intuitive: the gradient points in the direction of greatest change.)
The Gradient Properties
Proof (only for the minimum case):
Assign $v = -\dfrac{\nabla f_p}{\|\nabla f_p\|}$. By the chain rule:
$$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f_p\,,\ -\frac{\nabla f_p}{\|\nabla f_p\|}\right\rangle = -\frac{\langle \nabla f_p, \nabla f_p\rangle}{\|\nabla f_p\|} = -\|\nabla f_p\|$$
The Gradient Properties
On the other hand, for a general unit vector $v$, by the Cauchy-Schwarz inequality:
$$\frac{\partial f}{\partial v}(p) = \langle \nabla f_p\,,\ v\rangle\ \geq\ -\|\nabla f_p\|\,\|v\| = -\|\nabla f_p\|$$
so no direction does better.
The Gradient Properties
Proposition 2: let $f:\mathbb{R}^n\to\mathbb{R}$ be a $C^1$-smooth function around $p$. If $f$ has a local minimum (maximum) at $p$, then
$$\nabla f_p = 0\,.$$
(Intuitive: a necessary condition for a local min(max).)
The Gradient Properties
Proof:
Intuitive: at a local minimum no direction goes downhill, so every directional derivative must vanish.
The Gradient Properties
Formally, for any $v\in\mathbb{R}^n\setminus\{0\}$ we get:
$$0 = \frac{d\,f(p+tv)}{dt}\bigg|_{t=0} = \langle \nabla f_p\,,\ v\rangle$$
Since this holds for every direction $v$, necessarily $\nabla f_p = 0$.
The Gradient Properties
We found the best INFINITESIMAL DIRECTION at each point.
Looking for the minimum: a 'blind man' procedure, feeling the local slope and stepping downhill.
How can we derive the way to the minimum using this knowledge?
Background
Motivation
The gradient notion
The Wolfe Theorems
The Wolfe Theorem
This is the link from the previous gradient
properties to the constructive algorithm.
The problem:
$$\min_{x} f(x)$$
The Wolfe Theorem
We introduce a model algorithm:
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in \mathbb{R}^n$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda\ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
The Wolfe Theorem
The theorem: suppose $f:\mathbb{R}^n\to\mathbb{R}$ is $C^1$-smooth, and there exists a continuous function $k:\mathbb{R}^n\to[0,1]$ with
$$\forall x:\ \nabla f(x) \neq 0\ \Rightarrow\ k(x) > 0\,,$$
and the search vectors constructed by the model algorithm satisfy:
$$\langle h_i\,,\ -\nabla f(x_i)\rangle\ \geq\ k(x_i)\,\|h_i\|\,\|\nabla f(x_i)\|$$
The Wolfe Theorem
And, $\nabla f(x_i) \neq 0\ \Rightarrow\ h_i \neq 0$.
Then, if $\{x_i\}_{i\ge 0}$ is the sequence constructed by the model algorithm, any accumulation point $y$ of this sequence satisfies:
$$\nabla f(y) = 0$$
The Wolfe Theorem
The theorem has a very intuitive interpretation: always go in a descent direction, i.e. $h_i$ must make an acute angle with $-\nabla f(x_i)$.
Preview
Background
Steepest Descent
Conjugate Gradient
Steepest Descent
What does it mean?
We now use what we have learned to implement the most basic minimization technique.
First we introduce the algorithm, which is a version of the model algorithm.
The problem:
$$\min_{x} f(x)$$
Steepest Descent
The steepest descent algorithm:
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$, stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda\ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
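A runnable sketch of this algorithm (our own illustration; the stopping tolerance and the use of scipy's scalar minimizer for the one-dimensional Step 2 are assumptions):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def steepest_descent(f, grad, x0, tol=1e-8, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            h = -grad(x)                    # Step 1: search direction
            if np.linalg.norm(h) < tol:     # numerical stand-in for grad f = 0
                break
            # Step 2: line search; h is a descent direction, so the
            # minimizer along the ray lies at some lambda > 0
            lam = minimize_scalar(lambda t: f(x + t * h)).x
            x = x + lam * h                 # Step 3
        return x

    # Example: minimize ||x - (1, 2)||^2, minimum at (1, 2)
    f = lambda x: float(np.sum((x - np.array([1.0, 2.0])) ** 2))
    g = lambda x: 2.0 * (x - np.array([1.0, 2.0]))
    print(steepest_descent(f, g, [0.0, 0.0]))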
Steepest Descent
Theorem: if $\{x_i\}_{i\ge 0}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:
$$\nabla f(y) = 0$$
Proof: from the Wolfe theorem.
Remark: the Wolfe theorem also gives us numerical stability if the derivatives aren't given analytically (are calculated numerically).
Steepest Descent
From the chain rule, at the optimal step size:
$$\frac{d}{d\lambda} f(x_i + \lambda h_i)\Big|_{\lambda=\lambda_i} = \langle \nabla f(x_i + \lambda_i h_i)\,,\ h_i\rangle = 0$$
so consecutive search directions are orthogonal, and the method of steepest descent looks like this:
Steepest Descent
(Figure: the characteristic zig-zag path of the steepest descent iterates.)
Steepest Descent
The steepest descent method finds a critical point, typically a local minimum.
Implicit step-size rule: we actually reduced the problem to finding the minimum of a one-dimensional function $f:\mathbb{R}\to\mathbb{R}$ along the search line.
There are extensions that give the step-size rule in a discrete sense (Armijo).
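For reference, a minimal sketch of the Armijo backtracking rule mentioned above (our own illustration; the constants sigma and beta are conventional but arbitrary choices):

    import numpy as np

    def armijo_step(f, grad, x, h, sigma=1e-4, beta=0.5, lam=1.0):
        # Shrink lam until the sufficient-decrease condition holds:
        #   f(x + lam*h) <= f(x) + sigma * lam * <grad f(x), h>
        # Assumes h is a descent direction, i.e. <grad f(x), h> < 0.
        fx = f(x)
        slope = float(grad(x) @ h)
        while f(x + lam * h) > fx + sigma * lam * slope:
            lam *= beta
        return lam

    # Usage: one backtracking step on f(x) = ||x||^2 from x0 = (3, -1)
    f = lambda x: float(np.sum(x ** 2))
    g = lambda x: 2.0 * x
    x0 = np.array([3.0, -1.0])
    print(armijo_step(f, g, x0, -g(x0)))

Plugging armijo_step into Step 2 replaces the exact one-dimensional minimization with a finite, discrete search.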
Steepest Descent
Back to our connectivity shapes: the authors solve the one-dimensional problem
$$\lambda_i = \arg\min_{\lambda\ge 0} f(x_i + \lambda h_i)$$
analytically. They change the spring energy to
$$E_s(x) = \sum_{(i,j)\in E} \left(\|x_i - x_j\|^2 - 1\right)^2$$
and get a quartic polynomial in $x$, so its restriction to the search line can be minimized in closed form.
Preview
Background
Steepest Descent
Conjugate Gradient
Conjugate Gradient
From now on we assume we want to minimize the quadratic function
$$f(x) = \tfrac{1}{2}\,x^T A x - b^T x + c$$
(with $A$ symmetric positive-definite). This is equivalent to solving the linear problem:
$$0 = \nabla f(x) = Ax - b$$
There are generalizations to general functions.
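A quick numerical check of this identity (our own sketch; the particular A, b, and the finite-difference comparison are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    A = M @ M.T + 4.0 * np.eye(4)   # symmetric positive-definite
    b = rng.standard_normal(4)
    x = rng.standard_normal(4)

    f = lambda z: 0.5 * z @ A @ z - b @ z
    # central differences vs. the closed form grad f(x) = A x - b
    h = 1e-6
    g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(4)])
    print(np.allclose(g_fd, A @ x - b, atol=1e-4))   # True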
Conjugate Gradient
What is the problem with steepest descent?
We can repeat the same directions over and over.
Conjugate gradient takes at most $n$ steps (on the quadratic problem, in exact arithmetic).
Conjugate Gradient
Let $\tilde{x}$ be the solution of $A\tilde{x} = b$ and define the error $e_i = x_i - \tilde{x}$.
The search directions $d_0, d_1, \dots, d_j, \dots$ should span $\mathbb{R}^n$, with updates
$$x_{i+1} = x_i + \lambda_i d_i$$
Note that
$$\nabla f(x_i) = Ax_i - b = Ax_i - A\tilde{x} = A(x_i - \tilde{x}) = Ae_i$$
Conjugate Gradient
Given $d_j$, how do we calculate $\lambda_j$? (As before, by exact line search:)
$$0 = d_i^T\,\nabla f(x_{i+1}) = d_i^T A e_{i+1} = d_i^T A\left(e_i + \lambda_i d_i\right)$$
$$\Rightarrow\qquad \lambda_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T\,\nabla f(x_i)}{d_i^T A d_i}$$
Conjugate Gradient
How do we find $d_j$? We want the error to be 0 after $n$ steps. Expand the initial error over the search directions:
$$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$$
Since $x_{i+1} = x_i + \lambda_i d_i$:
$$e_j = e_0 + \lambda_0 d_0 + \lambda_1 d_1 + \dots = e_0 + \sum_{i=0}^{j-1}\lambda_i d_i = \sum_{i=0}^{n-1}\delta_i d_i + \sum_{i=0}^{j-1}\lambda_i d_i$$
Conjugate Gradient
Here is an idea: if $\lambda_i = -\delta_i$, then:
$$e_j = \sum_{i=0}^{n-1}\delta_i d_i - \sum_{i=0}^{j-1}\delta_i d_i = \sum_{i=j}^{n-1}\delta_i d_i$$
So if $j = n$: $e_n = 0$.
Conjugate Gradient
So we look for $d_j$ such that $\lambda_j = -\delta_j$. A simple calculation shows that this holds if we take the directions $A$-conjugate ($A$-orthogonal):
$$d_j^T A\, d_i = 0\,,\qquad i \neq j$$
Conjugate Gradient
We have to find an $A$-conjugate basis $d_j$, $j = 0,\dots,n-1$.
Given some series of vectors $u_0, u_1, \dots, u_{n-1}$, we can run a Gram-Schmidt process,
$$d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik}\, d_k\,,$$
but we should be careful, since in general this is an expensive (cubic in $n$) process.
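A direct sketch of this conjugation step (our own illustration of the general, expensive case; a_conjugate_basis is a hypothetical name):

    import numpy as np

    def a_conjugate_basis(A, U):
        # Gram-Schmidt in the A-inner product <u, v>_A = u^T A v:
        # d_i = u_i - sum_k ((u_i^T A d_k) / (d_k^T A d_k)) d_k
        D = []
        for u in U:
            d = np.asarray(u, dtype=float).copy()
            for dk in D:
                d -= (u @ A @ dk) / (dk @ A @ dk) * dk
            D.append(d)
        return D

    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    d0, d1 = a_conjugate_basis(A, list(np.eye(2)))
    print(d0 @ A @ d1)   # ~ 0: the directions are A-conjugate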
Conjugate Gradient
So for an arbitrary choice of $u_i$ we gain nothing.
Luckily, we can choose $u_i$ so that the conjugate-direction calculation is $O(m)$, where $m$ is the number of non-zero entries in $A$.
The correct choice of $u_i$ is:
$$u_i = -\nabla f(x_i)$$
Conjugate Gradient
So the conjugate gradient algorithm for minimizing $f$, with residuals $r_i := -\nabla f(x_i)$:
Data: $x_0 \in \mathbb{R}^n$
Step 0: $d_0 = r_0 := -\nabla f(x_0)$
Step 1: $\lambda_i = \dfrac{r_i^T r_i}{d_i^T A\, d_i}$
Step 2: $x_{i+1} = x_i + \lambda_i d_i$
Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat $n$ times.
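Finally, a compact runnable version of these steps for the quadratic case (our own sketch, using $r_i = b - Ax_i = -\nabla f(x_i)$ and updating the residual incrementally):

    import numpy as np

    def conjugate_gradient(A, b, x0=None):
        n = b.size
        x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
        r = b - A @ x                 # r_0 = -grad f(x_0)
        d = r.copy()                  # Step 0: d_0 = r_0
        for _ in range(n):            # at most n steps in exact arithmetic
            rr = r @ r
            if rr == 0.0:
                break
            lam = rr / (d @ A @ d)    # Step 1
            x = x + lam * d           # Step 2
            r = r - lam * (A @ d)     # equivalent to recomputing b - A x
            beta = (r @ r) / rr       # Step 3
            d = r + beta * d          # Step 4
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))   # matches np.linalg.solve(A, b)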