8/3/2019 cg_ex10
1/42
Gradient Methods
Yaron Lipman
May 2003
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
The min(max) problem:

  $\min_{x} f(x)$

But we learned in calculus how to solve that kind of question!
Motivation
Not exactly.
Functions: high-order polynomials, e.g.

  $x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$

What about functions $f: \mathbb{R}^n \to \mathbb{R}$ that don't have an analytic presentation: a black box.
Motivation
A real-world problem: finding a harmonic mapping.

  $E_{harm}(x_1, y_1, \ldots, x_n, y_n) = \frac{1}{2} \sum_{(i,j) \in E} k_{i,j} \, \| v_i - v_j \|^2, \qquad E_{harm}: \mathbb{R}^{2n} \to \mathbb{R}$

General problem: find a global min(max).
This lecture will concentrate on finding a local minimum.
Example surface:

  $f(x, y) := \cos\!\left(\tfrac{1}{2}x\right) \cos\!\left(\tfrac{1}{2}y\right) x$
Directional derivatives: first, the one-dimensional derivative.
Directional derivatives: along the axes

  $\frac{\partial f}{\partial x}(x, y), \qquad \frac{\partial f}{\partial y}(x, y)$
Directional derivatives: in a general direction

  $\frac{\partial f}{\partial v}(x, y), \qquad v \in \mathbb{R}^2, \ \|v\| = 1$
Directional derivatives

  $\frac{\partial f}{\partial x}(x, y), \qquad \frac{\partial f}{\partial y}(x, y)$
The Gradient: definition in the plane $\mathbb{R}^2$

  $f: \mathbb{R}^2 \to \mathbb{R}, \qquad \nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
The Gradient: definition

  $f: \mathbb{R}^n \to \mathbb{R}, \qquad \nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
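The definition above is easy to check numerically. A minimal sketch (assuming NumPy; the test function is a hypothetical example, not from the lecture) approximates $\nabla f$ by central differences and compares it with the hand-computed gradient:

```python
import numpy as np

def num_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f: R^n -> R by central differences."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Illustrative function: f(x, y) = x^2 + 3y, so grad f = (2x, 3).
f = lambda v: v[0] ** 2 + 3 * v[1]
print(num_gradient(f, [1.0, 2.0]))  # ≈ [2., 3.]
```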
The Gradient Properties
The gradient defines the (hyper)plane approximating the function infinitesimally:

  $\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$
The Gradient properties
By the chain rule (important for later use):

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle, \qquad \|v\| = 1$
The Gradient properties
Proposition 1: $\frac{\partial f}{\partial v}(p)$ is

maximal choosing $\ v = \frac{1}{\|\nabla f(p)\|}\,\nabla f(p)$

minimal choosing $\ v = -\frac{1}{\|\nabla f(p)\|}\,\nabla f(p)$

(Intuitive: the gradient points in the direction of greatest change.)
The Gradient properties
Proof (only for the minimum case):

Assign $v = -\frac{1}{\|\nabla f(p)\|}\,\nabla f(p)$. By the chain rule:

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle = -\frac{1}{\|\nabla f(p)\|}\,\langle \nabla f(p), \nabla f(p) \rangle = -\frac{\|\nabla f(p)\|^2}{\|\nabla f(p)\|} = -\|\nabla f(p)\|$
The Gradient properties
On the other hand, for a general unit vector $v$, by Cauchy–Schwarz:

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f(p), v \rangle \ \ge\ -\|\nabla f(p)\|\,\|v\| = -\|\nabla f(p)\|$

so the choice above indeed attains the minimum.
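Proposition 1 can be sanity-checked numerically. The sketch below (NumPy assumed; the gradient value is an arbitrary illustration) samples random unit directions and verifies the directional derivative never leaves $[-\|\nabla f(p)\|, \|\nabla f(p)\|]$, with the bounds attained at $v = \pm \nabla f(p)/\|\nabla f(p)\|$:

```python
import numpy as np

rng = np.random.default_rng(0)
grad_p = np.array([3.0, -4.0])   # illustrative gradient at p, norm 5
gnorm = np.linalg.norm(grad_p)

# Directional derivative along unit v is <grad_p, v>; Cauchy-Schwarz bounds it.
for _ in range(1000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    dd = grad_p @ v
    assert -gnorm - 1e-12 <= dd <= gnorm + 1e-12

# The bounds are attained exactly at v = ±grad_p/‖grad_p‖.
print(grad_p @ (grad_p / gnorm), grad_p @ (-grad_p / gnorm))  # ±‖∇f(p)‖ = ±5
```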
The Gradient Properties
Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$-smooth function around $p$. If $f$ has a local minimum (maximum) at $p$, then

  $\nabla f(p) = 0$

(Intuitive: a necessary condition for a local min(max).)
The Gradient Properties
Proof. Intuitively: if $\nabla f(p) \ne 0$, moving slightly in the direction $-\nabla f(p)$ decreases $f$, contradicting local minimality.
The Gradient Properties
Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

  $0 = \left.\frac{d\,f(p + tv)}{dt}\right|_{t=0} = \langle \nabla f(p), v \rangle \quad \Rightarrow \quad \nabla f(p) = 0$
The Gradient Properties
We found the best INFINITESIMAL DIRECTION at each point; looking for a minimum is then a "blind man" procedure. How can we derive the way to the minimum using this knowledge?
The Wolfe Theorem
This is the link from the previous gradient properties to a constructive algorithm. The problem:

  $\min_{x} f(x)$
The Wolfe Theorem
We introduce a model for the algorithm:

Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$
Step 2: compute the step size $\lambda_i = \operatorname{argmin}_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, $i = i + 1$; go to Step 1
The Wolfe Theorem
The theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$-smooth, and there exists a continuous function $k: \mathbb{R}^n \to [0, 1]$ with

  $\forall x: \ \nabla f(x) \ne 0 \ \Rightarrow \ k(x) > 0$

and the search vectors constructed by the model algorithm satisfy

  $\langle \nabla f(x_i), h_i \rangle \ \le\ -k(x_i)\,\|\nabla f(x_i)\|\,\|h_i\|$
The Wolfe Theorem
And

  $\nabla f(x_i) \ne 0 \ \Rightarrow \ h_i \ne 0$

Then, if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, any accumulation point $y$ of this sequence satisfies

  $\nabla f(y) = 0$
The Wolfe Theorem
The theorem has a very intuitive interpretation: always go in a descent direction, i.e., $h_i$ must make a strictly negative inner product with $\nabla f(x_i)$.
Steepest Descent
What does it mean? We now use what we have learned to implement the most basic minimization technique. First we introduce the algorithm, which is a version of the model algorithm. The problem:

  $\min_{x} f(x)$
Steepest Descent
Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step size $\lambda_i = \operatorname{argmin}_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, $i = i + 1$; go to Step 1
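The algorithm above can be sketched for the quadratic case, where the exact line search $\lambda_i = \operatorname{argmin}_{\lambda} f(x_i + \lambda h_i)$ has a closed form; a minimal NumPy sketch (the matrix $H$ and vector $d$ below are illustrative choices, not from the lecture):

```python
import numpy as np

def steepest_descent_quadratic(H, d, x0, tol=1e-8, max_iter=1000):
    """Minimize f(x) = 0.5 <x, Hx> + <d, x> by steepest descent.
    For a quadratic, the exact step size argmin_l f(x + l*h) is closed-form."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = H @ x + d                  # gradient of f at x
        if np.linalg.norm(g) < tol:    # Step 1: stop at a critical point
            break
        h = -g                         # steepest-descent direction
        lam = (g @ g) / (h @ H @ h)    # exact line search; h.T H h > 0 since H ≻ 0
        x = x + lam * h                # Step 3
    return x

H = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
d = np.array([-1.0, 1.0])
x_star = steepest_descent_quadratic(H, d, np.zeros(2))
print(np.allclose(H @ x_star, -d))       # the minimizer solves Hx = -d
```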
Steepest Descent
Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies

  $\nabla f(y) = 0$

Proof: from the Wolfe theorem (with $k(x) \equiv 1$).
Steepest Descent
From the chain rule:

  $\frac{d}{d\lambda} f(x_i + \lambda h_i)\Big|_{\lambda = \lambda_i} = \langle \nabla f(x_i + \lambda_i h_i), h_i \rangle = 0$

Therefore consecutive directions are orthogonal, and the method of steepest descent follows a zigzag path.
Steepest Descent
Steepest descent finds a critical point, a candidate local minimum. The step-size rule is implicit: we actually reduced the problem to finding the minimum of a one-dimensional function $f: \mathbb{R} \to \mathbb{R}$ at each iteration. There are extensions that give the step-size rule in a discrete sense (e.g., the Armijo rule).
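The Armijo rule mentioned above replaces the exact argmin by backtracking until a sufficient-decrease condition holds; a minimal sketch assuming NumPy, with conventional parameter values ($\beta$, $\sigma$) that are not specified in the lecture:

```python
import numpy as np

def armijo_step(f, grad_f, x, h, beta=0.5, sigma=1e-4, lam0=1.0):
    """Backtracking (Armijo) rule: shrink lam until the sufficient-decrease
    condition f(x + lam*h) <= f(x) + sigma*lam*<grad f(x), h> holds."""
    fx, g = f(x), grad_f(x)
    lam = lam0
    while f(x + lam * h) > fx + sigma * lam * (g @ h):
        lam *= beta
    return lam

# Illustrative use on f(x) = ||x||^2 with the steepest-descent direction.
f = lambda x: x @ x
grad_f = lambda x: 2 * x
x = np.array([2.0, -1.0])
lam = armijo_step(f, grad_f, x, -grad_f(x))
print(f(x + lam * (-grad_f(x))) < f(x))  # True: the accepted step decreases f
```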
Conjugate Gradient
Modern optimization methods: conjugate direction methods. A method to solve quadratic function minimization:

  $\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\langle x, Hx \rangle + \langle d, x \rangle$

($H$ is symmetric and positive definite)
Conjugate Gradient
Originally aimed at solving linear problems:

  $\min_{x \in \mathbb{R}^n} \|Ax - b\|^2 \quad \Leftrightarrow \quad Ax = b$

Later extended to general functions, under the rationale that the quadratic approximation to a function is quite accurate.
Conjugate Gradient
The basic idea: decompose the $n$-dimensional quadratic problem into $n$ problems of dimension 1. This is done by exploring the function in conjugate directions.

Definition ($H$-conjugate vectors):

  $\{u_i\}_{i=1}^{n} \subset \mathbb{R}^n, \qquad \langle u_i, H u_j \rangle = 0 \ \ \forall i \ne j$
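One concrete way to see that $H$-conjugate bases exist: the eigenvectors of a symmetric $H$ are orthogonal, hence $H$-conjugate, since $\langle u_i, H u_j \rangle = \lambda_j \langle u_i, u_j \rangle = 0$ for $i \ne j$. A small NumPy check (the matrix is an arbitrary example):

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite

# eigh returns orthonormal eigenvectors of a symmetric matrix,
# so any two distinct columns are H-conjugate.
_, U = np.linalg.eigh(H)
u1, u2 = U[:, 0], U[:, 1]
print(abs(u1 @ H @ u2) < 1e-12)   # True: the pair is H-conjugate
```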
Conjugate Gradient
If there is an $H$-conjugate basis $\{h_j\}$ then, writing $x = x_0 + \sum_j \lambda_j h_j$ and $f(x) := \tfrac{1}{2}\langle x, Hx \rangle + \langle d, x \rangle$:

  $f(x) = f\Big(x_0 + \sum_j \lambda_j h_j\Big) = f(x_0) + \sum_j \Big( \tfrac{1}{2}\lambda_j^2 \langle h_j, H h_j \rangle + \lambda_j \langle d + H x_0, h_j \rangle \Big)$

i.e., $n$ problems in one dimension (simple "smiling" quadratics). The global minimizer is calculated sequentially, starting from $x_0$:

  $x_{i+1} = x_i + \lambda_i h_i, \qquad i = 0, 1, \ldots, n-1$
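The sequential update above is the heart of the (linear) conjugate gradient method; a minimal NumPy sketch that builds the conjugate directions on the fly, using the standard residual-based recurrences (these explicit update formulas are textbook CG, not spelled out in these slides, and $H$, $d$ are illustrative):

```python
import numpy as np

def conjugate_gradient(H, d, x0, tol=1e-10):
    """Minimize f(x) = 0.5 <x, Hx> + <d, x> (H symmetric positive definite)
    by building H-conjugate directions; converges in at most n steps."""
    x = np.asarray(x0, dtype=float)
    r = -(H @ x + d)          # residual = -grad f(x)
    h = r.copy()              # first direction: steepest descent
    for _ in range(len(x)):
        if np.linalg.norm(r) < tol:
            break
        lam = (r @ r) / (h @ H @ h)        # exact 1-d minimization along h
        x = x + lam * h                    # the sequential update x_{i+1} = x_i + lam_i h_i
        r_new = r - lam * (H @ h)
        beta = (r_new @ r_new) / (r @ r)   # keeps the next direction H-conjugate
        h = r_new + beta * h
        r = r_new
    return x

H = np.array([[3.0, 1.0], [1.0, 2.0]])
d = np.array([-1.0, 1.0])
x_star = conjugate_gradient(H, d, np.zeros(2))
print(np.allclose(H @ x_star, -d))   # True after at most n = 2 steps
```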