MATH2070 Optimisation
Nonlinear optimisation without constraints
Semester 2, 2012
Lecturer: I.W. Guo
Lecture slides courtesy of J.R. Wishart
Introduction Bivariate Multivariate Hessian Convex
Review: Nonlinear optimisation without constraints
The full non-linear optimisation problem with no constraints
Bivariate (two dimensional) method
Generalised Multivariate (n-dimensional) method
Use of Eigenvalues in Hessian
Convex functions and sets
Non-linear optimisation
Interested in optimising a function of several variables with no constraints.
Multivariate framework
Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn
Objective Function Z = f (x1, x2, . . . , xn)
Example
Typical non-linear functions:
- f(x_1, x_2) = 7x_1^5 − 10x_2^3 + 3x_1x_2.
- f(x_1, x_2, x_3) = e^{−x_3}(x_1^2 + x_2^2).
- f(x_1, x_2) = (1/(2π)) exp(−(x_1^2 + x_2^2)/2).
Example: Trigonometric function
Figure 1: Objective function: f(x_1, x_2) = sin(x_1) × sin(x_2)
Example: Bivariate Normal density
Figure 2: Objective function: f(x_1, x_2) = (1/(π√3)) exp(−(2/3)(x_1^2 + x_2^2 − x_1x_2))
Optimisation of multivariate functions
Consider first minimising the objective function over its domain D,

Z* = min_{(x_1, x_2, . . . , x_n) ∈ D} f(x_1, x_2, . . . , x_n).

Written in vector notation,

Z* = min_{x ∈ D} f(x).
Consider the problem of local minima, which is simpler.
Methodology
1. Find critical points, i.e. points where the first partial derivatives of f are zero.
2. Determine the nature of the critical points by looking at the second-order derivatives.
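Both steps can be sketched numerically; below is an illustrative sketch (the sample function and the finite-difference helpers `grad` and `hessian` are not from the slides):

```python
import numpy as np

def grad(f, x, h=1e-5):
    """Central-difference approximation to the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation to the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Sample function with an obvious minimum at (1, -3)
f = lambda x: (x[0] - 1)**2 + 2 * (x[1] + 3)**2
x0 = np.array([1.0, -3.0])

# Step 1: the gradient vanishes, so x0 is a critical point
assert np.allclose(grad(f, x0), 0, atol=1e-6)
# Step 2: all Hessian eigenvalues positive, so x0 is a local minimum
assert np.all(np.linalg.eigvalsh(hessian(f, x0)) > 0)
```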
No loss of generality
Can extend this to maximisation through the following:

max_{x ∈ D} f(x) = −min_{x ∈ D} (−f(x))
Figure 3: Objective function: f(x_1, x_2) = −(1/(π√3)) exp(−(2/3)(x_1^2 + x_2^2 − x_1x_2))
Univariate review
Univariate framework
Domain x ∈ D ⊂ R.
Objective function f : [a, b] → R.

Necessary conditions for Extrema
- Occur at the boundary of D, or
- in the interior of D when df/dx = 0.
Example
[Figure: a univariate function f(x) with extrema x* marked, illustrating:]
- Global max at the left boundary
- Global min in the interior
- Local max near the right boundary.
Nature of extrema
Generalised higher derivative test
Let m be a positive integer and assume that there exists an x_0 ∈ (a, b) such that

f^{(1)}(x_0) = f^{(2)}(x_0) = · · · = f^{(2m−1)}(x_0) = 0.

Then the following holds:
1. If f^{(2m)}(x_0) > 0, then x_0 is the location of a local minimum.
2. If f^{(2m)}(x_0) < 0, then x_0 is the location of a local maximum.
3. If f^{(2m)}(x_0) = 0 and f^{(2m+1)}(x_0) ≠ 0, then the test fails.
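As a quick added illustration (not from the slides): for f(x) = x^4 at x_0 = 0 the first three derivatives vanish and f^{(4)}(0) = 24 > 0, so the test with m = 2 gives a local minimum. A sketch checking this with numpy's polynomial derivatives:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# f(x) = x^4 as a coefficient array, lowest degree first
coeffs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
x0 = 0.0

# Differentiate repeatedly until a derivative is non-zero at x0
k, c = 0, coeffs
while True:
    c = P.polyder(c)
    k += 1
    if abs(P.polyval(x0, c)) > 1e-12:
        break

# First non-vanishing derivative has order 4 = 2m with m = 2,
# and its value 24 is positive, so x0 = 0 is a local minimum.
assert k == 4
assert P.polyval(x0, c) > 0
```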
Taylor’s Theorem
Theorem
(Taylor) Suppose that f ∈ C^p[a, b] and the derivative f^{(p+1)} exists on [a, b], and let x_0 ∈ [a, b]. For every x ∈ [a, b] there exists η(x) between x_0 and x such that

f(x) = f(x_0) + f^{(1)}(x_0)(x − x_0) + f^{(2)}(x_0)(x − x_0)^2/2! + · · · + f^{(p)}(x_0)(x − x_0)^p/p! + f^{(p+1)}(η)(x − x_0)^{p+1}/(p + 1)! .

To prove the results on the previous slide, choose p = 2m − 1.
Proof of result 1.
Choosing p = 2m − 1 in Taylor's theorem yields

f(x) = f(x_0) + f^{(2m)}(η)(x − x_0)^{2m}/(2m)! ,

where η is between x and x_0 (the intermediate terms vanish since f^{(1)}(x_0) = · · · = f^{(2m−1)}(x_0) = 0).
- The term (x − x_0)^{2m} is guaranteed to be positive for x ≠ x_0.
- By assumption, f^{(2m)}(η) will be positive if x is in a neighbourhood of x_0. Then f(x) > f(x_0) when x is in a neighbourhood of x_0.
Proof of result 2.
Again, choosing p = 2m − 1 in Taylor's theorem,

f(x) = f(x_0) + f^{(2m)}(η)(x − x_0)^{2m}/(2m)! ,

where η is between x and x_0.
- The term (x − x_0)^{2m} is guaranteed to be positive for x ≠ x_0.
- By assumption, f^{(2m)}(η) will be negative if x is in a neighbourhood of x_0. Then f(x) < f(x_0) when x is in a neighbourhood of x_0.
Bivariate Analysis
Consider now the necessary and sufficient conditions for extrema of a bivariate function.

Bivariate framework
Domain x = (x, y) ∈ D ⊂ R^2.
Objective function f = f(x, y) : D → R.

Extend the Taylor expansion argument to the two-dimensional case.
Necessary condition for extrema
Critical point
A point x_0 is a stationary point of the function f = f(x) if

∂f/∂x = ∂f/∂y = 0.

Introduce notation for partial derivatives.

Notation
Let f = f(x, y); then

∂f/∂x = f_x , ∂f/∂y = f_y , ∂^2f/∂x^2 = f_xx , ∂^2f/∂x∂y = f_xy ,

and so on.
Minimisation example
Example
Minimise z = f(x, y) = (1/2)(x − 1)^2 + (1/2)(y − 2)^2 + 1.

Solution
Consider the necessary conditions:

∂f/∂x = x − 1 = 0 ,
∂f/∂y = y − 2 = 0 ,

and the result follows: the only critical point is (x, y) = (1, 2), where z = 1.
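A small numerical cross-check of this example (illustrative only):

```python
import numpy as np

# Objective and its first partial derivatives
f = lambda x, y: 0.5 * (x - 1)**2 + 0.5 * (y - 2)**2 + 1
fx = lambda x, y: x - 1
fy = lambda x, y: y - 2

# The necessary conditions give the critical point (1, 2)
x_star, y_star = 1.0, 2.0
assert fx(x_star, y_star) == 0 and fy(x_star, y_star) == 0

# The Hessian is the 2x2 identity, which is positive definite,
# so (1, 2) is a minimum, with objective value 1
H = np.eye(2)
assert np.all(np.linalg.eigvalsh(H) > 0)
assert f(x_star, y_star) == 1.0
```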
Bivariate version
Theorem (Taylor’s Theorem)
Suppose that f(x, y) and its partial derivatives of all orders less than or equal to p + 1 are continuous on D ⊆ R^2 and let x_0 = (x_0, y_0) ∈ D. For every x = (x, y) ∈ D, there exists ξ between x and x_0, and η between y and y_0, such that

f(x, y) = f(x_0, y_0) + f_x(x_0)(x − x_0) + f_y(x_0)(y − y_0)
 + (1/2!) [ f_xx(x_0)(x − x_0)^2 + 2f_xy(x_0)(x − x_0)(y − y_0) + f_yy(x_0)(y − y_0)^2 ]
 + · · · + (1/p!) Σ_{j=0}^{p} C(p, j) (∂^p f/∂x^{p−j}∂y^j)_0 (x − x_0)^{p−j}(y − y_0)^j + R_p(x, y, ξ, η),

where R_p is a remainder term.
Example: Inverted linear function
Figure 4: Objective function: z = f(x, y) = 1/(x − y)
Quadratic level Taylor expansion: Example
Find a quadratic approximation to z = 1/(x − y) at (x, y) = (1, 0).

Let z = 1/(x − y) = (x − y)^{−1}. Finding the partial derivatives,

z_x = −1/(x − y)^2 = −z_y ,  z_xx = 2/(x − y)^3 = z_yy = −z_xy .

Evaluating z and these derivatives at x_0 = (1, 0) gives

z(1, 0) = 1 ,  z_x(1, 0) = −1 = −z_y ,  z_xx = 2 = z_yy = −z_xy .

Apply Taylor's Theorem (note the mixed term carries the factor 2z_xy = −4):

z(x, y) = 1 − (x − 1) + y + (1/2)( 2(x − 1)^2 − 4(x − 1)y + 2y^2 ) + . . .
        = 3 − 3x + 3y − 2xy + x^2 + y^2 + . . .
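The quality of the quadratic approximation can be checked numerically; a sketch built from the derivative values at (1, 0) (the evaluation point (1.1, 0.05) is an arbitrary choice):

```python
import numpy as np

f = lambda x, y: 1.0 / (x - y)

x0 = np.array([1.0, 0.0])
g = np.array([-1.0, 1.0])                 # (z_x, z_y) at (1, 0)
H = np.array([[2.0, -2.0], [-2.0, 2.0]])  # Hessian at (1, 0): z_xx = z_yy = 2, z_xy = -2

p = np.array([1.1, 0.05])
d = p - x0
quad = f(*x0) + g @ d + 0.5 * d @ H @ d   # quadratic Taylor approximation
lin = f(*x0) + g @ d                      # linear approximation, for comparison

exact = f(*p)
assert abs(quad - exact) < abs(lin - exact)  # the quadratic term improves the estimate
assert abs(quad - exact) < 1e-3              # error is third order in the step
```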
Hessian matrix
Consider the Taylor expansion in matrix form.
Introduce the gradient operator

∇f := ( ∂f/∂x , ∂f/∂y )^T.

Definition
Define the Hessian matrix

H(x) = H(x, y) := [ ∂^2f/∂x^2   ∂^2f/∂x∂y ]
                  [ ∂^2f/∂y∂x   ∂^2f/∂y^2 ] .
Matrix version of Taylor formula
Define the displacement from x_0 to x as

d := x − x_0 = ( x − x_0 , y − y_0 )^T.

Then Taylor's formula can be rewritten.

Matrix version
f(x) = f(x_0 + d) = f(x_0) + d^T ∇f(x_0) + (1/2) d^T H(x_0) d + · · ·
Proof of Matrix version
∇f(x_0)^T d = ∂f/∂x(x_0) (x − x_0) + ∂f/∂y(x_0) (y − y_0) ,

d^T H(x_0) d = ∂^2f/∂x^2(x_0) (x − x_0)^2 + 2 ∂^2f/∂x∂y(x_0) (x − x_0)(y − y_0) + ∂^2f/∂y^2(x_0) (y − y_0)^2 .
Application of Matrix version
Recall, the necessary condition for a critical point at x_0 is

∂f/∂x = ∂f/∂y = 0  ⇒  ∇f(x_0) = 0.

So, at a critical point the Taylor expansion reduces to

f(x) = f(x_0) + (1/2) d^T H(x_0) d + . . .

Focus the analysis on the matrix H to determine sufficient conditions for the behaviour of critical points.
Quadratic Form
Assume the matrix M is symmetric.

Definition
The expression Q := x^T M x is known as the quadratic form associated with the matrix M.

It is known as a quadratic form due to the pairwise multiplication (powers of two) of elements in the vector.

Hessian matrices are symmetric provided the second partial derivatives of f are continuous.
Invariant under order of partial differentiation
If the second partial derivatives of f are continuous, then

∂^2f/∂x∂y = ∂^2f/∂y∂x ,

and therefore the Hessian matrix is symmetric:

H(x) = [ ∂^2f/∂x^2   ∂^2f/∂y∂x ]   [ ∂^2f/∂x^2   ∂^2f/∂x∂y ]
       [ ∂^2f/∂x∂y   ∂^2f/∂y^2 ] = [ ∂^2f/∂x∂y   ∂^2f/∂y^2 ] .

Why is this useful?
Quadratic form
Example
If H is the symmetric matrix

H = [ 2 5 ]
    [ 5 2 ] ,

the associated quadratic form Q = x^T H x is

Q = ( x  y ) [ 2 5 ] ( x )
             [ 5 2 ] ( y )
  = 2x^2 + 5xy + 5yx + 2y^2
  = 2x^2 + 10xy + 2y^2 .
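This expansion can be spot-checked numerically (an illustrative sketch):

```python
import numpy as np

H = np.array([[2.0, 5.0], [5.0, 2.0]])
quad_form = lambda v: v @ H @ v                    # Q = x^T H x
expanded = lambda x, y: 2*x**2 + 10*x*y + 2*y**2   # the expanded polynomial

# The two forms agree on a handful of random vectors
rng = np.random.default_rng(0)
for v in rng.standard_normal((5, 2)):
    assert np.isclose(quad_form(v), expanded(*v))
```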
General Quadratic form for bivariate functions
Bivariate Quadratic form
Given x = (x, y)^T and a symmetric matrix

M = [ a b ]
    [ b c ] ,

then

Q := x^T M x = ax^2 + 2bxy + cy^2.
Inverse operation of Quadratic form
Inverse Quadratic form result
Consider the quadratic form Q := ax^2 + bxy + cyx + dy^2. Then the associated symmetric matrix M is

M = [ a         (b + c)/2 ]
    [ (b + c)/2     d     ] .

Example
If Q = 3x^2 + 14xy + y^2, then the associated symmetric matrix M is

M = [ 3 7 ]
    [ 7 1 ] .
Link to Quadratic form
The quadratic form Q = x^T H x, and its associated symmetric matrix H, may be classified as follows:
1. Q and H are positive definite if Q > 0 for all x ≠ 0.
2. Q and H are negative definite if Q < 0 for all x ≠ 0.
3. Q and H are positive semi-definite if Q ≥ 0 and Q = 0 for some x ≠ 0.
4. Q and H are negative semi-definite if Q ≤ 0 and Q = 0 for some x ≠ 0.
5. Q and H are indefinite if there exist x_1, x_2 such that Q > 0 at x = x_1 and Q < 0 at x = x_2.
Positive definite function
Figure 5: Q_1(x_1, x_2) = x_1^2 + x_2^2

Notice that the function is positive for all values in its domain (excluding x = 0).
Negative definite function
Figure 6: Q_2(x_1, x_2) = −x_1^2 − x_2^2

Notice that the function is negative for all values in its domain (excluding x = 0).
Indefinite function
Figure 7: Q_3(x_1, x_2) = x_1^2 − x_2^2

Notice the saddle point around x = 0.
Positive semi-definite function
Figure 8: Q_4(x_1, x_2) = x_1^2 + 2x_1x_2 + x_2^2

The function is non-negative but is zero along the line x_1 = −x_2.
Negative semi-definite function
Figure 9: Q_5(x_1, x_2) = −(x_1^2 + 2x_1x_2 + x_2^2)

The function is non-positive but is zero along the line x_1 = −x_2.
Why bother with Quadratic form?
Reason
There exist sufficient conditions for the nature of critical points, and those conditions are linked to the Hessian H and its quadratic form Q.
Sufficient conditions
1. If H is positive definite, then x0 is a local minimum of f(x).
2. If H is negative definite, then x0 is a local maximum of f(x).
3. If H is positive semi-definite, then the test fails.
4. If H is negative semi-definite, then the test fails.
5. If H is indefinite, then x0 is a saddle point of f(x).
Why does the test fail sometimes?
Recall the Taylor expansion,

f(x) = f(x_0) + d^T ∇f(x_0) + (1/2) d^T H(x_0) d + · · ·

If H(x_0) is semi-definite then d^T H(x_0) d vanishes for some directions d, so the quadratic term alone cannot settle the sign of f(x) − f(x_0).

So, the nature is determined by higher order terms in the expansion.
Example of completing the square
Example
Let Q be given by Q = x^2 + 6xy + 11y^2. Determine the nature of this quadratic form.

Solution

Q = x^2 + 6xy + 11y^2
  = x^2 + 6xy + (3y)^2 + 2y^2
  = (x + 3y)^2 + 2y^2
  > 0

for (x, y) ≠ (0, 0). Therefore Q is positive definite.
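The same conclusion follows from the associated symmetric matrix: here b = 6 gives off-diagonal entries 3, and both eigenvalues are positive (a numerical sketch using the eigenvalue test that appears later in these slides):

```python
import numpy as np

# Q = x^2 + 6xy + 11y^2 has associated symmetric matrix M
M = np.array([[1.0, 3.0], [3.0, 11.0]])

eigs = np.linalg.eigvalsh(M)
assert np.all(eigs > 0)   # all eigenvalues positive -> Q is positive definite
```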
Generalise the optimisation
Consider the full generalised framework.
Multivariate framework
Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn
Objective Function Z = f (x1, x2, . . . , xn)
Wish to extend the arguments in the Bivariate section with Hessian and Quadratic forms to the n-dimensional case.

Notation
For the generalised partial derivatives denote

∂^p f / (∂x_1^{i_1} ∂x_2^{i_2} · · · ∂x_n^{i_n}) = f_{x_1^{i_1} x_2^{i_2} · · · x_n^{i_n}} .
Taylor’s Theorem
Theorem
Suppose that f(x) and its partial derivatives of all orders less than or equal to p + 1 are continuous on an open set D ⊂ R^n and let x_0 ∈ D be such that the line segment joining x to x_0 lies in D. For every x ∈ D, there exists ξ between x and x_0 such that

f(x) = f(x_0) + Σ_{i=1}^{n} f_{x_i}(x_0)(x_i − x_{0i})
 + (1/2!) Σ_{i=1}^{n} Σ_{j=1}^{n} f_{x_i x_j}(x_0)(x_i − x_{0i})(x_j − x_{0j}) + · · ·
 + (1/p!) Σ_{i ∈ S} [ p!/(i_1! i_2! · · · i_n!) ] f_{x_1^{i_1} x_2^{i_2} · · · x_n^{i_n}}(x_0)(x_1 − x_{01})^{i_1}(x_2 − x_{02})^{i_2} · · · (x_n − x_{0n})^{i_n} + R_p(x),

where the summation set S = { (i_1, i_2, . . . , i_n) : 0 ≤ i_1, i_2, . . . , i_n ≤ p, i_1 + i_2 + · · · + i_n = p } and R_p is the remainder term.
Generalised gradient operator
Definition
Define the n-dimensional gradient operator

∇f := ( ∂f/∂x_1 , ∂f/∂x_2 , . . . , ∂f/∂x_n )^T.

So the first summation is

Σ_{i=1}^{n} f_{x_i}(x_0)(x_i − x_{0i}) = (x − x_0)^T ∇f(x_0).
Generalised Hessian
Definition
Define the Hessian matrix

H = ( ∂^2f/∂x_i∂x_j )_{1 ≤ i,j ≤ n} =
[ ∂^2f/∂x_1^2     ∂^2f/∂x_1∂x_2   · · ·   ∂^2f/∂x_1∂x_n ]
[ ∂^2f/∂x_2∂x_1   ∂^2f/∂x_2^2     · · ·        ⋮        ]
[      ⋮               ⋮           ⋱           ⋮        ]
[ ∂^2f/∂x_n∂x_1   · · ·           · · ·   ∂^2f/∂x_n^2   ] .

Then the second summation is

Σ_{i=1}^{n} Σ_{j=1}^{n} f_{x_i x_j}(x_0)(x_i − x_{0i})(x_j − x_{0j}) = d^T H(x_0) d.
Functions
Consider functions f such that

∂^2f/∂x_i∂x_j = ∂^2f/∂x_j∂x_i ,

so that the Hessian H is always a symmetric matrix, i.e. H_ij = H_ji:

H = [ ∂^2f/∂x_1^2     ∂^2f/∂x_1∂x_2   · · ·   ∂^2f/∂x_1∂x_n ]
    [ ∂^2f/∂x_1∂x_2   ∂^2f/∂x_2^2     · · ·        ⋮        ]
    [      ⋮               ⋮           ⋱           ⋮        ]
    [ ∂^2f/∂x_1∂x_n   · · ·           · · ·   ∂^2f/∂x_n^2   ] .
Necessary conditions
Critical point
A point x0 is a stationary point of the function f = f(x) if
∂f/∂x_1 = ∂f/∂x_2 = · · · = ∂f/∂x_n = 0.
General Quadratic form for trivariate functions
Trivariate Quadratic form
Given x = (x, y, z)^T and a symmetric matrix

M = [ a b c ]
    [ b d e ]
    [ c e f ] ,

then

Q := x^T M x = ax^2 + 2bxy + 2cxz + dy^2 + 2eyz + fz^2.
Example of going Q→M
Example
The quadratic form

Q = x_1^2 + 4x_1x_2 + 5x_2^2 + 6x_2x_3 + 2x_3^2 + 2x_1x_3

has the associated symmetric matrix

M = [ 1 2 1 ]
    [ 2 5 3 ]
    [ 1 3 2 ] .
Inverse operation of Quadratic form (Trivariate)
Inverse Quadratic form result
Consider the quadratic form

Q := ax^2 + bxy + cxz + dyz + ey^2 + fz^2.

Then the associated symmetric matrix M is

M = [ a    b/2  c/2 ]
    [ b/2  e    d/2 ]
    [ c/2  d/2  f   ] .
Completing the square
Example
Complete the square for the form

Q = x_1^2 + 4x_1x_2 + 5x_2^2 + 6x_2x_3 + 2x_3^2 + 2x_1x_3.

Solution

Q = (x_1^2 + 4x_1x_2 + 2x_1x_3) + 5x_2^2 + 6x_2x_3 + 2x_3^2
  = (x_1 + 2x_2 + x_3)^2 + x_2^2 + 2x_2x_3 + x_3^2
  = (x_1 + 2x_2 + x_3)^2 + (x_2 + x_3)^2 .

- Q is non-negative (a sum of squares).
- Q can be zero if x = (a, −a, a) for some a ≠ 0, so Q is positive semi-definite.
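The semi-definiteness can be confirmed numerically from the associated symmetric matrix (a sketch):

```python
import numpy as np

# Associated matrix of Q = x1^2 + 4x1x2 + 5x2^2 + 6x2x3 + 2x3^2 + 2x1x3
M = np.array([[1.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 2.0]])

eigs = np.linalg.eigvalsh(M)
assert np.all(eigs > -1e-9)              # no negative eigenvalues
assert np.isclose(eigs.min(), 0.0, atol=1e-9)  # smallest eigenvalue is (numerically) zero

# Q vanishes on the non-zero vector (a, -a, a), here with a = 1
v = np.array([1.0, -1.0, 1.0])
assert np.isclose(v @ M @ v, 0.0)
```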
The same sufficient conditions apply in the multivariate case.

Let x = (x_1, x_2, . . . , x_n)^T and let H be the associated Hessian matrix.

Assume ∇f(x_0) = 0 for some x_0 ∈ R^n.
Sufficient conditions
1. If H is positive definite, then x0 is a local minimum of f(x).
2. If H is negative definite, then x0 is a local maximum of f(x).
3. If H is positive semi-definite, then the test fails.
4. If H is negative semi-definite, then the test fails.
5. If H is indefinite, then x0 is a saddle point of f(x).
3-dimensional example
Find the critical points of z = f(x_1, x_2, x_3) = 3x_1^2 x_2^2 + x_3^2.
Show that the Hessian is only positive semi-definite and the local minimum test fails.
Are the critical points actually local minima?
Alternative to completing the square
The first method considered to check the nature of H is completing the square.

An alternative method exists to determine the nature of the Hessian H, which checks the eigenvalues of the matrix H.

In particular, we check the signs of the eigenvalues of the matrix to show it is positive or negative definite.
Brief review of linear algebra
Definition
The eigenvalues of a square n × n matrix H are the values λ_1, λ_2, . . . , λ_n such that

det(H − λ_i I) = 0,

where det(M) is the determinant of the matrix M and I is the identity matrix.

Recall, H is a symmetric matrix, which means the eigenvalues are real.

To ease the notation, assume the eigenvalues are ordered so that

λ_1 ≤ λ_2 ≤ · · · ≤ λ_n .
Sufficient conditions for nature of H
Eigenvalue Test
1. H is positive definite if and only if all its eigenvalues are positive, i.e. λ_1 > 0.
2. H is negative definite if and only if all its eigenvalues are negative, i.e. λ_n < 0.
3. H is indefinite if and only if it has positive and negative eigenvalues, i.e. λ_1 < 0 and λ_n > 0.
4. H is positive semi-definite if and only if all its eigenvalues are non-negative and at least one is zero, i.e. λ_1 = 0.
5. H is negative semi-definite if and only if all its eigenvalues are non-positive and at least one is zero, i.e. λ_n = 0.
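This test translates directly into a short routine (an illustrative sketch; the function name and tolerance are arbitrary choices):

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a symmetric matrix H by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(H)   # returned in ascending order: lam[0] <= ... <= lam[-1]
    if lam[0] > tol:
        return "positive definite"
    if lam[-1] < -tol:
        return "negative definite"
    if lam[0] < -tol and lam[-1] > tol:
        return "indefinite"
    # remaining cases have a (numerically) zero extreme eigenvalue
    return "positive semi-definite" if lam[0] >= -tol else "negative semi-definite"

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # eigenvalues 2, 2
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))    # eigenvalues 0, 2
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # eigenvalues -1, 1
```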
Simple Hessian eigenvalue example
Example
Find all critical points for the function
f(x, y) = x^2 + y^2
and determine the nature of the critical points.
Solution
Finding the critical points,

∂f/∂x = 2x = 0 ,  ∂f/∂y = 2y = 0 .

Critical point at x_0 = (0, 0). Find the Hessian,

H = [ 2 0 ]
    [ 0 2 ] .

The eigenvalues are λ = 2, 2, both positive, so H is positive definite and (0, 0) is a local minimum.
3-dimensional Hessian example
Find all extrema of the function

f(x, y, z) = 3xy − x^3 − y^3 − 3z^2.

Critical points:

∂f/∂x = 3y − 3x^2 = 0 ,  ∂f/∂y = 3x − 3y^2 = 0 ,  ∂f/∂z = −6z = 0 .

There are two critical points for these equations, (1, 1, 0) and (0, 0, 0). The Hessian is

H(x, y, z) = [ −6x   3    0  ]
             [  3   −6y   0  ]
             [  0    0   −6  ] .
At (0, 0, 0) the Hessian is

H(0, 0, 0) = [ 0  3   0 ]
             [ 3  0   0 ]
             [ 0  0  −6 ] .

The eigenvalues of H(0, 0, 0) are λ = 3, −3, −6. Therefore H(0, 0, 0) is indefinite and (0, 0, 0) is a saddle point.

At the other critical point (1, 1, 0),

H(1, 1, 0) = [ −6   3   0 ]
             [  3  −6   0 ]
             [  0   0  −6 ] .

The eigenvalues are λ = −3, −6, −9, so H(1, 1, 0) is negative definite and hence there is a local maximum at (1, 1, 0).
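These eigenvalue computations can be verified numerically (a sketch):

```python
import numpy as np

def H(x, y, z):
    # Hessian of f(x, y, z) = 3xy - x^3 - y^3 - 3z^2
    return np.array([[-6.0 * x, 3.0, 0.0],
                     [3.0, -6.0 * y, 0.0],
                     [0.0, 0.0, -6.0]])

eig_saddle = np.linalg.eigvalsh(H(0, 0, 0))   # ascending order
eig_max = np.linalg.eigvalsh(H(1, 1, 0))

assert np.allclose(eig_saddle, [-6.0, -3.0, 3.0])   # mixed signs: indefinite -> saddle
assert np.allclose(eig_max, [-9.0, -6.0, -3.0])     # all negative: local maximum
```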
Convex Set
Definition
A set Ω ⊆ R^n is convex if, for any points P, Q in Ω, the line segment PQ joining P and Q lies in Ω. If p and q are the position vectors of P and Q, then the point R with position vector r = cp + (1 − c)q lies in Ω for all 0 ≤ c ≤ 1.

Roughly speaking, if two points x, y ∈ Ω, then the line segment joining the points x and y is also in Ω.

- Linear constraints are convex:

g(x) = Σ_{i=1}^{n} c_i x_i − b = 0,

for some constants c_i and b.
Convex function
Definition
A function f(x) = f(x_1, x_2, . . . , x_n) is convex on the convex set Ω if

f(cp + (1 − c)q) ≤ cf(p) + (1 − c)f(q)

for all p, q ∈ Ω and 0 ≤ c ≤ 1.

Note in particular that linear functions are convex. Proof: left as an exercise.

- A function is called strictly convex if the inequality is strict for 0 < c < 1 and p ≠ q.
- A function f is (strictly) concave if −f is (strictly) convex.
Convex function example
[Figure: a convex curve f with points p, r = cp + (1 − c)q and q marked on the x-axis; the chord value cf(p) + (1 − c)f(q) sits above f(r).]

Roughly speaking, the chord between two points on f lies above f.
Theorem
If {f_i}_{i=1}^{n} are convex functions on the convex set Ω and c_i ≥ 0, then Σ_{i=1}^{n} c_i f_i is convex on Ω.

Proof is left as an exercise.

Theorem
If f is a convex function on the convex set Ω, then

U_c = { x ∈ Ω : f(x) ≤ c }

is a convex set for all c ∈ R.
Differentiable Convex functions
Theorem
Let f ∈ C^1(Ω) with Ω convex. Then f is convex if and only if

f(y) ≥ f(x) + (y − x)^T ∇f(x)

for all x, y ∈ Ω.

Theorem
Let f ∈ C^2(Ω), where Ω is a non-empty convex set. Then f is convex on Ω if and only if H_f, the Hessian of f, is positive semi-definite on Ω.
Check for Convexity
Example
Is the function
f(x, y) = 2x^2 + xy + 2y^2 − 12x + 12y − 12
convex?
Check for Convexity
Example
Is the function
f(x, y) = −2x^2 + xy + 2y^2 − 2x + 8y + 1
convex?
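Both convexity questions reduce to the Hessian criterion above; since each function is quadratic, its Hessian is constant (a numerical sketch):

```python
import numpy as np

# Constant Hessians of the two quadratics
H1 = np.array([[4.0, 1.0], [1.0, 4.0]])   # from f(x, y) = 2x^2 + xy + 2y^2 - 12x + 12y - 12
H2 = np.array([[-4.0, 1.0], [1.0, 4.0]])  # from f(x, y) = -2x^2 + xy + 2y^2 - 2x + 8y + 1

e1 = np.linalg.eigvalsh(H1)
e2 = np.linalg.eigvalsh(H2)

assert np.all(e1 > 0)        # eigenvalues 3 and 5: positive definite, so convex
assert e2[0] < 0 < e2[-1]    # mixed signs: indefinite, so not convex
```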
Convex optimisation
Theorem
If f is a convex function on a convex set Ω, then the set of minimisers

M = { x* ∈ Ω : f(x*) = min_{x ∈ Ω} f(x) }

is a convex set, and any local minimum is a global minimum.

Theorem
If f ∈ C^1(Ω) with Ω convex and there exists an x* such that for all y ∈ Ω,

(y − x*)^T ∇f(x*) ≥ 0,

then x* is a global minimum of f over Ω.