MATH2070 Optimisation
Nonlinear optimisation without constraints
Semester 2, 2012
Lecturer: I.W. Guo
Lecture slides courtesy of J.R. Wishart
Introduction Bivariate Multivariate Hessian Convex
Review: Nonlinear optimisation without constraints
The full non-linear optimisation problem with no constraints
Bivariate (two dimensional) method
Generalised Multivariate (n-dimensional) method
Use of Eigenvalues in Hessian
Convex functions and sets
Non-linear optimisation
Interested in optimising a function of several variables with no constraints.
Multivariate framework
Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn
Objective Function Z = f (x1, x2, . . . , xn)
Example
Typical non-linear functions:
- f(x_1, x_2) = 7x_1^5 − 10x_2^3 + 3x_1x_2.
- f(x_1, x_2, x_3) = e^{−x_3}(x_1^2 + x_2^2).
- f(x_1, x_2) = (1/(2π)) exp(−(x_1^2 + x_2^2)/2).
Example: Trigonometric function
Figure 1: Objective function: f(x_1, x_2) = sin(x_1) × sin(x_2)
Example: Bivariate Normal density
Figure 2: Objective function: f(x_1, x_2) = (1/(π√3)) exp(−(2/3)(x_1^2 + x_2^2 − x_1x_2))
Optimisation of multivariate functions
Consider first minimising the objective function over its domain D,

Z* = min_{(x_1, x_2, . . . , x_n) ∈ D} f(x_1, x_2, . . . , x_n).

Written in vector notation,

Z* = min_{x ∈ D} f(x).
Consider the problem of local minima, which is simpler.
Methodology
1. Find critical points, i.e. points where the first partial derivatives of f are zero.
2. Determine the nature of the critical points by looking at the second-order derivatives.
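Both steps can be sketched numerically; below is an illustrative sketch (the sample function and the finite-difference helpers `grad` and `hessian` are not from the slides):

```python
import numpy as np

def grad(f, x, h=1e-5):
    """Central-difference approximation to the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation to the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Sample function with an obvious minimum at (1, -3)
f = lambda x: (x[0] - 1)**2 + 2 * (x[1] + 3)**2
x0 = np.array([1.0, -3.0])

# Step 1: the gradient vanishes, so x0 is a critical point
assert np.allclose(grad(f, x0), 0, atol=1e-6)
# Step 2: all Hessian eigenvalues positive, so x0 is a local minimum
assert np.all(np.linalg.eigvalsh(hessian(f, x0)) > 0)
```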
No loss of generality
Can extend this to maximisation through the following:

max_{x ∈ D} f(x) = −min_{x ∈ D} (−f(x))
Figure 3: Objective function: f(x_1, x_2) = −(1/(π√3)) exp(−(2/3)(x_1^2 + x_2^2 − x_1x_2))
Univariate review
Univariate framework
Domain x ∈ D ⊂ R.
Objective function f : [a, b] → R.

Necessary conditions for Extrema
- Occur at the boundary of D, or
- in the interior of D when df/dx = 0.
Example
[Figure: a univariate function f(x) with extrema x* marked, illustrating:]
- Global max at the left boundary
- Global min in the interior
- Local max near the right boundary.
Nature of extrema
Generalised higher derivative test
Let m be a positive integer and assume that there exists an x_0 ∈ (a, b) such that

f^{(1)}(x_0) = f^{(2)}(x_0) = · · · = f^{(2m−1)}(x_0) = 0.

Then the following holds:
1. If f^{(2m)}(x_0) > 0, then x_0 is the location of a local minimum.
2. If f^{(2m)}(x_0) < 0, then x_0 is the location of a local maximum.
3. If f^{(2m)}(x_0) = 0 and f^{(2m+1)}(x_0) ≠ 0, then the test fails.
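As a quick added illustration (not from the slides): for f(x) = x^4 at x_0 = 0 the first three derivatives vanish and f^{(4)}(0) = 24 > 0, so the test with m = 2 gives a local minimum. A sketch checking this with numpy's polynomial derivatives:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# f(x) = x^4 as a coefficient array, lowest degree first
coeffs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
x0 = 0.0

# Differentiate repeatedly until a derivative is non-zero at x0
k, c = 0, coeffs
while True:
    c = P.polyder(c)
    k += 1
    if abs(P.polyval(x0, c)) > 1e-12:
        break

# First non-vanishing derivative has order 4 = 2m with m = 2,
# and its value 24 is positive, so x0 = 0 is a local minimum.
assert k == 4
assert P.polyval(x0, c) > 0
```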
Taylor’s Theorem
Theorem
(Taylor) Suppose that f ∈ C^p[a, b] and the derivative f^{(p+1)} exists on [a, b], and let x_0 ∈ [a, b]. For every x ∈ [a, b] there exists η(x) between x_0 and x such that

f(x) = f(x_0) + f^{(1)}(x_0)(x − x_0) + f^{(2)}(x_0)(x − x_0)^2/2! + · · · + f^{(p)}(x_0)(x − x_0)^p/p! + f^{(p+1)}(η)(x − x_0)^{p+1}/(p + 1)! .

To prove the results on the previous slide, choose p = 2m − 1.
Proof of result 1.
Choosing p = 2m − 1 in Taylor's theorem yields

f(x) = f(x_0) + f^{(2m)}(η)(x − x_0)^{2m}/(2m)! ,

where η is between x and x_0 (the intermediate terms vanish since f^{(1)}(x_0) = · · · = f^{(2m−1)}(x_0) = 0).
- The term (x − x_0)^{2m} is guaranteed to be positive for x ≠ x_0.
- By assumption, f^{(2m)}(η) will be positive if x is in a neighbourhood of x_0. Then f(x) > f(x_0) when x is in a neighbourhood of x_0.
Proof of result 2.
Again, choosing p = 2m − 1 in Taylor's theorem,

f(x) = f(x_0) + f^{(2m)}(η)(x − x_0)^{2m}/(2m)! ,

where η is between x and x_0.
- The term (x − x_0)^{2m} is guaranteed to be positive for x ≠ x_0.
- By assumption, f^{(2m)}(η) will be negative if x is in a neighbourhood of x_0. Then f(x) < f(x_0) when x is in a neighbourhood of x_0.
Bivariate Analysis
Consider now the necessary and sufficient conditions for extrema of a bivariate function.

Bivariate framework
Domain x = (x, y) ∈ D ⊂ R^2.
Objective function f = f(x, y) : D → R.

Extend the Taylor expansion argument to the two-dimensional case.
Necessary condition for extrema
Critical point
A point x_0 is a stationary point of the function f = f(x) if

∂f/∂x = ∂f/∂y = 0.

Introduce notation for partial derivatives.

Notation
Let f = f(x, y); then

∂f/∂x = f_x , ∂f/∂y = f_y , ∂^2f/∂x^2 = f_xx , ∂^2f/∂x∂y = f_xy ,

and so on.
Minimisation example
Example
Minimise z = f(x, y) = (1/2)(x − 1)^2 + (1/2)(y − 2)^2 + 1.

Solution
Consider the necessary conditions:

∂f/∂x = x − 1 = 0 ,
∂f/∂y = y − 2 = 0 ,

and the result follows: the only critical point is (x, y) = (1, 2), where z = 1.
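A small numerical cross-check of this example (illustrative only):

```python
import numpy as np

# Objective and its first partial derivatives
f = lambda x, y: 0.5 * (x - 1)**2 + 0.5 * (y - 2)**2 + 1
fx = lambda x, y: x - 1
fy = lambda x, y: y - 2

# The necessary conditions give the critical point (1, 2)
x_star, y_star = 1.0, 2.0
assert fx(x_star, y_star) == 0 and fy(x_star, y_star) == 0

# The Hessian is the 2x2 identity, which is positive definite,
# so (1, 2) is a minimum, with objective value 1
H = np.eye(2)
assert np.all(np.linalg.eigvalsh(H) > 0)
assert f(x_star, y_star) == 1.0
```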
Bivariate version
Theorem (Taylor’s Theorem)
Suppose that f(x, y) and its partial derivatives of all orders less than or equal to p + 1 are continuous on D ⊆ R^2 and let x_0 = (x_0, y_0) ∈ D. For every x = (x, y) ∈ D, there exists ξ between x and x_0, and η between y and y_0, such that

f(x, y) = f(x_0, y_0) + f_x(x_0)(x − x_0) + f_y(x_0)(y − y_0)
 + (1/2!) [ f_xx(x_0)(x − x_0)^2 + 2f_xy(x_0)(x − x_0)(y − y_0) + f_yy(x_0)(y − y_0)^2 ]
 + · · · + (1/p!) Σ_{j=0}^{p} C(p, j) (∂^p f/∂x^{p−j}∂y^j)_0 (x − x_0)^{p−j}(y − y_0)^j + R_p(x, y, ξ, η),

where R_p is a remainder term.
Example: Inverted linear function
Figure 4: Objective function: z = f(x, y) = 1/(x − y)
Quadratic level Taylor expansion: Example
Find a quadratic approximation to z = 1/(x − y) at (x, y) = (1, 0).

Let z = 1/(x − y) = (x − y)^{−1}. Finding the partial derivatives,

z_x = −1/(x − y)^2 = −z_y ,  z_xx = 2/(x − y)^3 = z_yy = −z_xy .

Evaluating z and these derivatives at x_0 = (1, 0) gives

z(1, 0) = 1 ,  z_x(1, 0) = −1 = −z_y ,  z_xx = 2 = z_yy = −z_xy .

Apply Taylor's Theorem (note the mixed term carries the factor 2z_xy = −4):

z(x, y) = 1 − (x − 1) + y + (1/2)( 2(x − 1)^2 − 4(x − 1)y + 2y^2 ) + . . .
        = 3 − 3x + 3y − 2xy + x^2 + y^2 + . . .
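The quality of the quadratic approximation can be checked numerically; a sketch built from the derivative values at (1, 0) (the evaluation point (1.1, 0.05) is an arbitrary choice):

```python
import numpy as np

f = lambda x, y: 1.0 / (x - y)

x0 = np.array([1.0, 0.0])
g = np.array([-1.0, 1.0])                 # (z_x, z_y) at (1, 0)
H = np.array([[2.0, -2.0], [-2.0, 2.0]])  # Hessian at (1, 0): z_xx = z_yy = 2, z_xy = -2

p = np.array([1.1, 0.05])
d = p - x0
quad = f(*x0) + g @ d + 0.5 * d @ H @ d   # quadratic Taylor approximation
lin = f(*x0) + g @ d                      # linear approximation, for comparison

exact = f(*p)
assert abs(quad - exact) < abs(lin - exact)  # the quadratic term improves the estimate
assert abs(quad - exact) < 1e-3              # error is third order in the step
```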
Hessian matrix
Consider the Taylor expansion in matrix form.
Introduce the gradient operator

∇f := ( ∂f/∂x , ∂f/∂y )^T.

Definition
Define the Hessian matrix

H(x) = H(x, y) := [ ∂^2f/∂x^2   ∂^2f/∂x∂y ]
                  [ ∂^2f/∂y∂x   ∂^2f/∂y^2 ] .
Matrix version of Taylor formula
Define the displacement from x_0 to x as

d := x − x_0 = ( x − x_0 , y − y_0 )^T.

Then Taylor's formula can be rewritten.

Matrix version
f(x) = f(x_0 + d) = f(x_0) + d^T ∇f(x_0) + (1/2) d^T H(x_0) d + · · ·
Proof of Matrix version
∇f(x_0)^T d = ∂f/∂x(x_0) (x − x_0) + ∂f/∂y(x_0) (y − y_0) ,

d^T H(x_0) d = ∂^2f/∂x^2(x_0) (x − x_0)^2 + 2 ∂^2f/∂x∂y(x_0) (x − x_0)(y − y_0) + ∂^2f/∂y^2(x_0) (y − y_0)^2 .
Application of Matrix version
Recall, the necessary condition for a critical point at x_0 is

∂f/∂x = ∂f/∂y = 0  ⇒  ∇f(x_0) = 0.

So, at a critical point the Taylor expansion reduces to

f(x) = f(x_0) + (1/2) d^T H(x_0) d + . . .

Focus the analysis on the matrix H to determine sufficient conditions for the behaviour of critical points.
Quadratic Form
Assume the matrix M is symmetric.

Definition
The expression Q := x^T M x is known as the quadratic form associated with the matrix M.

It is known as a quadratic form due to the pairwise multiplication (powers of two) of elements in the vector.

Hessian matrices are symmetric provided the second partial derivatives of f are continuous.
Invariant under order of partial differentiation
If the second partial derivatives of f are continuous, then

∂^2f/∂x∂y = ∂^2f/∂y∂x ,

and therefore the Hessian matrix is symmetric:

H(x) = [ ∂^2f/∂x^2   ∂^2f/∂y∂x ]   [ ∂^2f/∂x^2   ∂^2f/∂x∂y ]
       [ ∂^2f/∂x∂y   ∂^2f/∂y^2 ] = [ ∂^2f/∂x∂y   ∂^2f/∂y^2 ] .

Why is this useful?
Quadratic form
Example
If H is the symmetric matrix

H = [ 2 5 ]
    [ 5 2 ] ,

the associated quadratic form Q = x^T H x is

Q = ( x  y ) [ 2 5 ] ( x )
             [ 5 2 ] ( y )
  = 2x^2 + 5xy + 5yx + 2y^2
  = 2x^2 + 10xy + 2y^2 .
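This expansion can be spot-checked numerically (an illustrative sketch):

```python
import numpy as np

H = np.array([[2.0, 5.0], [5.0, 2.0]])
quad_form = lambda v: v @ H @ v                    # Q = x^T H x
expanded = lambda x, y: 2*x**2 + 10*x*y + 2*y**2   # the expanded polynomial

# The two forms agree on a handful of random vectors
rng = np.random.default_rng(0)
for v in rng.standard_normal((5, 2)):
    assert np.isclose(quad_form(v), expanded(*v))
```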
General Quadratic form for bivariate functions
Bivariate Quadratic form
Given x = (x, y)^T and a symmetric matrix

M = [ a b ]
    [ b c ] ,

then

Q := x^T M x = ax^2 + 2bxy + cy^2.
Inverse operation of Quadratic form
Inverse Quadratic form result
Consider the quadratic form Q := ax^2 + bxy + cyx + dy^2. Then the associated symmetric matrix M is

M = [ a         (b + c)/2 ]
    [ (b + c)/2     d     ] .

Example
If Q = 3x^2 + 14xy + y^2, then the associated symmetric matrix M is

M = [ 3 7 ]
    [ 7 1 ] .
Link to Quadratic form
The quadratic form Q = x^T H x, and its associated symmetric matrix H, may be classified as follows:
1. Q and H are positive definite if Q > 0 for all x ≠ 0.
2. Q and H are negative definite if Q < 0 for all x ≠ 0.
3. Q and H are positive semi-definite if Q ≥ 0 and Q = 0 for some x ≠ 0.
4. Q and H are negative semi-definite if Q ≤ 0 and Q = 0 for some x ≠ 0.
5. Q and H are indefinite if there exist x_1, x_2 such that Q > 0 at x = x_1 and Q < 0 at x = x_2.
Positive definite function
Figure 5: Q_1(x_1, x_2) = x_1^2 + x_2^2

Notice that the function is positive for all values in its domain (excluding x = 0).
Negative definite function
Figure 6: Q_2(x_1, x_2) = −x_1^2 − x_2^2

Notice that the function is negative for all values in its domain (excluding x = 0).
Indefinite function
Figure 7: Q_3(x_1, x_2) = x_1^2 − x_2^2

Notice the saddle point around x = 0.
Positive semi-definite function
Figure 8: Q_4(x_1, x_2) = x_1^2 + 2x_1x_2 + x_2^2

The function is non-negative but is zero along the line x_1 = −x_2.
Negative semi-definite function
Figure 9: Q_5(x_1, x_2) = −(x_1^2 + 2x_1x_2 + x_2^2)

The function is non-positive but is zero along the line x_1 = −x_2.
Why bother with Quadratic form?
Reason
There exist sufficient conditions for the nature of critical points, and those conditions are linked to the Hessian H and its quadratic form Q.
Sufficient conditions
1. If H is positive definite, then x0 is a local minimum of f(x).
2. If H is negative definite, then x0 is a local maximum of f(x).
3. If H is positive semi-definite, then the test fails.
4. If H is negative semi-definite, then the test fails.
5. If H is indefinite, then x0 is a saddle point of f(x).
Why does the test fail sometimes?
Recall the Taylor expansion,

f(x) = f(x_0) + d^T ∇f(x_0) + (1/2) d^T H(x_0) d + · · ·

If H(x_0) is semi-definite then d^T H(x_0) d vanishes for some directions d, so the quadratic term alone cannot settle the sign of f(x) − f(x_0).

So, the nature is determined by higher order terms in the expansion.
Example of completing the square
Example
Let Q be given by Q = x^2 + 6xy + 11y^2. Determine the nature of this quadratic form.

Solution

Q = x^2 + 6xy + 11y^2
  = x^2 + 6xy + (3y)^2 + 2y^2
  = (x + 3y)^2 + 2y^2
  > 0

for (x, y) ≠ (0, 0). Therefore Q is positive definite.
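The same conclusion follows from the associated symmetric matrix: here b = 6 gives off-diagonal entries 3, and both eigenvalues are positive (a numerical sketch using the eigenvalue test that appears later in these slides):

```python
import numpy as np

# Q = x^2 + 6xy + 11y^2 has associated symmetric matrix M
M = np.array([[1.0, 3.0], [3.0, 11.0]])

eigs = np.linalg.eigvalsh(M)
assert np.all(eigs > 0)   # all eigenvalues positive -> Q is positive definite
```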
Generalise the optimisation
Consider the full generalised framework.
Multivariate framework
Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn
Objective Function Z = f (x1, x2, . . . , xn)
Wish to extend the arguments in the Bivariate section with Hessian and Quadratic forms to the n-dimensional case.

Notation
For the generalised partial derivatives denote

∂^p f / (∂x_1^{i_1} ∂x_2^{i_2} · · · ∂x_n^{i_n}) = f_{x_1^{i_1} x_2^{i_2} · · · x_n^{i_n}} .
Taylor’s Theorem
Theorem
Suppose that f(x) and its partial derivatives of all orders less than or equal to p + 1 are continuous on an open set D ⊂ R^n and let x_0 ∈ D be such that the line segment joining x to x_0 lies in D. For every x ∈ D, there exists ξ between x and x_0 such that

f(x) = f(x_0) + Σ_{i=1}^{n} f_{x_i}(x_0)(x_i − x_{0i})
 + (1/2!) Σ_{i=1}^{n} Σ_{j=1}^{n} f_{x_i x_j}(x_0)(x_i − x_{0i})(x_j − x_{0j}) + · · ·
 + (1/p!) Σ_{i ∈ S} [ p!/(i_1! i_2! · · · i_n!) ] f_{x_1^{i_1} x_2^{i_2} · · · x_n^{i_n}}(x_0)(x_1 − x_{01})^{i_1}(x_2 − x_{02})^{i_2} · · · (x_n − x_{0n})^{i_n} + R_p(x),

where the summation set S = { (i_1, i_2, . . . , i_n) : 0 ≤ i_1, i_2, . . . , i_n ≤ p, i_1 + i_2 + · · · + i_n = p } and R_p is the remainder term.
Generalised gradient operator
Definition
Define the n-dimensional gradient operator

∇f := ( ∂f/∂x_1 , ∂f/∂x_2 , . . . , ∂f/∂x_n )^T.

So the first summation is

Σ_{i=1}^{n} f_{x_i}(x_0)(x_i − x_{0i}) = (x − x_0)^T ∇f(x_0).
Generalised Hessian
Definition
Define the Hessian matrix

H = ( ∂^2f/∂x_i∂x_j )_{1 ≤ i,j ≤ n} =
[ ∂^2f/∂x_1^2     ∂^2f/∂x_1∂x_2   · · ·   ∂^2f/∂x_1∂x_n ]
[ ∂^2f/∂x_2∂x_1   ∂^2f/∂x_2^2     · · ·        ⋮        ]
[      ⋮               ⋮           ⋱           ⋮        ]
[ ∂^2f/∂x_n∂x_1   · · ·           · · ·   ∂^2f/∂x_n^2   ] .

Then the second summation is

Σ_{i=1}^{n} Σ_{j=1}^{n} f_{x_i x_j}(x_0)(x_i − x_{0i})(x_j − x_{0j}) = d^T H(x_0) d.
Functions
Consider functions f such that

∂^2f/∂x_i∂x_j = ∂^2f/∂x_j∂x_i ,

so that the Hessian H is always a symmetric matrix, i.e. H_ij = H_ji:

H = [ ∂^2f/∂x_1^2     ∂^2f/∂x_1∂x_2   · · ·   ∂^2f/∂x_1∂x_n ]
    [ ∂^2f/∂x_1∂x_2   ∂^2f/∂x_2^2     · · ·        ⋮        ]
    [      ⋮               ⋮           ⋱           ⋮        ]
    [ ∂^2f/∂x_1∂x_n   · · ·           · · ·   ∂^2f/∂x_n^2   ] .
Necessary conditions
Critical point
A point x0 is a stationary point of the function f = f(x) if
∂f/∂x_1 = ∂f/∂x_2 = · · · = ∂f/∂x_n = 0.
General Quadratic form for trivariate functions
Trivariate Quadratic form
Given x = (x, y, z)^T and a symmetric matrix

M = [ a b c ]
    [ b d e ]
    [ c e f ] ,

then

Q := x^T M x = ax^2 + 2bxy + 2cxz + dy^2 + 2eyz + fz^2.
Example of going Q→M
Example
The quadratic form

Q = x_1^2 + 4x_1x_2 + 5x_2^2 + 6x_2x_3 + 2x_3^2 + 2x_1x_3

has the associated symmetric matrix

M = [ 1 2 1 ]
    [ 2 5 3 ]
    [ 1 3 2 ] .
Inverse operation of Quadratic form (Trivariate)
Inverse Quadratic form result
Consider the quadratic form

Q := ax^2 + bxy + cxz + dyz + ey^2 + fz^2.

Then the associated symmetric matrix M is

M = [ a    b/2  c/2 ]
    [ b/2  e    d/2 ]
    [ c/2  d/2  f   ] .
Completing the square
Example
Complete the square for the form

Q = x_1^2 + 4x_1x_2 + 5x_2^2 + 6x_2x_3 + 2x_3^2 + 2x_1x_3.

Solution

Q = (x_1^2 + 4x_1x_2 + 2x_1x_3) + 5x_2^2 + 6x_2x_3 + 2x_3^2
  = (x_1 + 2x_2 + x_3)^2 + x_2^2 + 2x_2x_3 + x_3^2
  = (x_1 + 2x_2 + x_3)^2 + (x_2 + x_3)^2 .

- Q is non-negative (a sum of squares).
- Q can be zero if x = (a, −a, a) for some a ≠ 0, so Q is positive semi-definite.
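The semi-definiteness can be confirmed numerically from the associated symmetric matrix (a sketch):

```python
import numpy as np

# Associated matrix of Q = x1^2 + 4x1x2 + 5x2^2 + 6x2x3 + 2x3^2 + 2x1x3
M = np.array([[1.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 2.0]])

eigs = np.linalg.eigvalsh(M)
assert np.all(eigs > -1e-9)              # no negative eigenvalues
assert np.isclose(eigs.min(), 0.0, atol=1e-9)  # smallest eigenvalue is (numerically) zero

# Q vanishes on the non-zero vector (a, -a, a), here with a = 1
v = np.array([1.0, -1.0, 1.0])
assert np.isclose(v @ M @ v, 0.0)
```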
The same sufficient conditions apply in the multivariate case.

Let x = (x_1, x_2, . . . , x_n)^T and let H be the associated Hessian matrix.

Assume ∇f(x_0) = 0 for some x_0 ∈ R^n.
Sufficient conditions
1. If H is positive definite, then x0 is a local minimum of f(x).
2. If H is negative definite, then x0 is a local maximum of f(x).
3. If H is positive semi-definite, then the test fails.
4. If H is negative semi-definite, then the test fails.
5. If H is indefinite, then x0 is a saddle point of f(x).
3-dimensional example
Find the critical points of z = f(x_1, x_2, x_3) = 3x_1^2 x_2^2 + x_3^2.
Show that the Hessian is only positive semi-definite and the local minimum test fails.
Are the critical points actually local minima?
Alternative to completing the square
The first method considered to check the nature of H is completing the square.

An alternative method exists to determine the nature of the Hessian H, which checks the eigenvalues of the matrix H.

In particular, we check the signs of the eigenvalues of the matrix to show it is positive or negative definite.
Brief review of linear algebra
Definition
The eigenvalues of a square n × n matrix H are the values λ_1, λ_2, . . . , λ_n such that

det(H − λ_i I) = 0,

where det(M) is the determinant of the matrix M and I is the identity matrix.

Recall, H is a symmetric matrix, which means the eigenvalues are real.

To ease the notation, assume the eigenvalues are ordered so that

λ_1 ≤ λ_2 ≤ · · · ≤ λ_n .
Sufficient conditions for nature of H
Eigenvalue Test
1. H is positive definite if and only if all its eigenvalues are positive, i.e. λ_1 > 0.
2. H is negative definite if and only if all its eigenvalues are negative, i.e. λ_n < 0.
3. H is indefinite if and only if it has positive and negative eigenvalues, i.e. λ_1 < 0 and λ_n > 0.
4. H is positive semi-definite if and only if all its eigenvalues are non-negative and at least one is zero, i.e. λ_1 = 0.
5. H is negative semi-definite if and only if all its eigenvalues are non-positive and at least one is zero, i.e. λ_n = 0.
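This test translates directly into a short routine (an illustrative sketch; the function name and tolerance are arbitrary choices):

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a symmetric matrix H by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(H)   # returned in ascending order: lam[0] <= ... <= lam[-1]
    if lam[0] > tol:
        return "positive definite"
    if lam[-1] < -tol:
        return "negative definite"
    if lam[0] < -tol and lam[-1] > tol:
        return "indefinite"
    # remaining cases have a (numerically) zero extreme eigenvalue
    return "positive semi-definite" if lam[0] >= -tol else "negative semi-definite"

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # eigenvalues 2, 2
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))    # eigenvalues 0, 2
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # eigenvalues -1, 1
```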
Simple Hessian eigenvalue example
Example
Find all critical points for the function
f(x, y) = x^2 + y^2
and determine the nature of the critical points.
Solution
Finding the critical points,

∂f/∂x = 2x = 0 ,  ∂f/∂y = 2y = 0 .

Critical point at x_0 = (0, 0). Find the Hessian,

H = [ 2 0 ]
    [ 0 2 ] .

The eigenvalues are λ = 2, 2, both positive, so H is positive definite and (0, 0) is a local minimum.
3-dimensional Hessian example
Find all extrema of the function

f(x, y, z) = 3xy − x^3 − y^3 − 3z^2.

Critical points:

∂f/∂x = 3y − 3x^2 = 0 ,  ∂f/∂y = 3x − 3y^2 = 0 ,  ∂f/∂z = −6z = 0 .

There are two critical points for these equations, (1, 1, 0) and (0, 0, 0). The Hessian is

H(x, y, z) = [ −6x   3    0  ]
             [  3   −6y   0  ]
             [  0    0   −6  ] .
At (0, 0, 0) the Hessian is

H(0, 0, 0) = [ 0  3   0 ]
             [ 3  0   0 ]
             [ 0  0  −6 ] .

The eigenvalues of H(0, 0, 0) are λ = 3, −3, −6. Therefore H(0, 0, 0) is indefinite and (0, 0, 0) is a saddle point.

At the other critical point (1, 1, 0),

H(1, 1, 0) = [ −6   3   0 ]
             [  3  −6   0 ]
             [  0   0  −6 ] .

The eigenvalues are λ = −3, −6, −9, so H(1, 1, 0) is negative definite and hence there is a local maximum at (1, 1, 0).
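These eigenvalue computations can be verified numerically (a sketch):

```python
import numpy as np

def H(x, y, z):
    # Hessian of f(x, y, z) = 3xy - x^3 - y^3 - 3z^2
    return np.array([[-6.0 * x, 3.0, 0.0],
                     [3.0, -6.0 * y, 0.0],
                     [0.0, 0.0, -6.0]])

eig_saddle = np.linalg.eigvalsh(H(0, 0, 0))   # ascending order
eig_max = np.linalg.eigvalsh(H(1, 1, 0))

assert np.allclose(eig_saddle, [-6.0, -3.0, 3.0])   # mixed signs: indefinite -> saddle
assert np.allclose(eig_max, [-9.0, -6.0, -3.0])     # all negative: local maximum
```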
Convex Set
Definition
A set Ω ⊆ R^n is convex if, for any points P, Q in Ω, the line segment PQ joining P and Q lies in Ω. If p and q are the position vectors of P and Q, then the point R with position vector r = cp + (1 − c)q lies in Ω for all 0 ≤ c ≤ 1.

Roughly speaking, if two points x, y ∈ Ω, then the line segment joining the points x and y is also in Ω.

- Linear constraints are convex:

g(x) = Σ_{i=1}^{n} c_i x_i − b = 0,

for some constants c_i and b.
Convex function
Definition
A function f(x) = f(x_1, x_2, . . . , x_n) is convex on the convex set Ω if

f(cp + (1 − c)q) ≤ cf(p) + (1 − c)f(q)

for all p, q ∈ Ω and 0 ≤ c ≤ 1.

Note in particular that linear functions are convex. Proof: left as an exercise.

- A function is called strictly convex if the inequality is strict for 0 < c < 1 and p ≠ q.
- A function f is (strictly) concave if −f is (strictly) convex.
Convex function example
[Figure: a convex curve f with points p, r = cp + (1 − c)q and q marked on the x-axis; the chord value cf(p) + (1 − c)f(q) sits above f(r).]

Roughly speaking, the chord between two points on f lies above f.
Theorem
If {f_i}_{i=1}^{n} are convex functions on the convex set Ω and c_i ≥ 0, then Σ_{i=1}^{n} c_i f_i is convex on Ω.

Proof is left as an exercise.

Theorem
If f is a convex function on the convex set Ω, then

U_c = { x ∈ Ω : f(x) ≤ c }

is a convex set for all c ∈ R.
Differentiable Convex functions
Theorem
Let f ∈ C^1(Ω) with Ω convex. Then f is convex if and only if

f(y) ≥ f(x) + (y − x)^T ∇f(x)

for all x, y ∈ Ω.

Theorem
Let f ∈ C^2(Ω), where Ω is a non-empty convex set. Then f is convex on Ω if and only if H_f, the Hessian of f, is positive semi-definite on Ω.
Check for Convexity
Example
Is the function
f(x, y) = 2x^2 + xy + 2y^2 − 12x + 12y − 12
convex?
Check for Convexity
Example
Is the function
f(x, y) = −2x^2 + xy + 2y^2 − 2x + 8y + 1
convex?
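Both convexity questions reduce to the Hessian criterion above; since each function is quadratic, its Hessian is constant (a numerical sketch):

```python
import numpy as np

# Constant Hessians of the two quadratics
H1 = np.array([[4.0, 1.0], [1.0, 4.0]])   # from f(x, y) = 2x^2 + xy + 2y^2 - 12x + 12y - 12
H2 = np.array([[-4.0, 1.0], [1.0, 4.0]])  # from f(x, y) = -2x^2 + xy + 2y^2 - 2x + 8y + 1

e1 = np.linalg.eigvalsh(H1)
e2 = np.linalg.eigvalsh(H2)

assert np.all(e1 > 0)        # eigenvalues 3 and 5: positive definite, so convex
assert e2[0] < 0 < e2[-1]    # mixed signs: indefinite, so not convex
```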
Convex optimisation
Theorem
If f is a convex function on a convex set Ω, then the set of minimisers

M = { x* ∈ Ω : f(x*) = min_{x ∈ Ω} f(x) }

is a convex set, and any local minimum is a global minimum.

Theorem
If f ∈ C^1(Ω) with Ω convex and there exists an x* such that for all y ∈ Ω,

(y − x*)^T ∇f(x*) ≥ 0,

then x* is a global minimum of f over Ω.