Nonlinear Programming Models

In LP ... the objective function & constraints are linear and the problems are “easy” to solve.

Most real-world problems have nonlinear elements and are hard to solve.


Minimize f(x)

s.t. gi(x) (≤, ≥, =) bi, i = 1,…,m

x is the n-dimensional vector of decision variables

f(x) is the objective function

gi(x) are the constraint functions

bi are fixed known constants

General NLP

Example 1 Max 3x1 + 2x2

4

s.t. x1 + x2 1, x1 0, x2 unrestricted

2

Example 2: Max $e^{c_1 x_1} e^{c_2 x_2} \cdots e^{c_n x_n}$

s.t. Ax = b, x ≥ 0

Example 3: Min $\sum_{j=1}^{n} f_j(x_j)$

s.t. Ax = b, x ≥ 0

where each fj(xj) is of the form

[Figure: fj(xj) plotted against xj]

Problems with “decreasing efficiencies”

Examples 2 and 3 can be reformulated as LPs
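Why Example 2 reduces to an LP can be checked numerically: the logarithm is strictly increasing, so maximizing the product of exponentials picks the same point as maximizing the linear function c1x1 + … + cnxn. The sketch below uses a made-up cost vector `c` and a made-up list of candidate points (both are assumptions for illustration, not data from the slides).

```python
import math

# Hypothetical data: a cost vector and a few candidate feasible points.
c = [2.0, -1.0, 0.5]
candidates = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0), (1.0, 1.0, 2.0)]

def product_of_exponentials(x):
    """Nonlinear objective of Example 2: e^(c1 x1) * ... * e^(cn xn)."""
    return math.prod(math.exp(cj * xj) for cj, xj in zip(c, x))

def linear_objective(x):
    """Logarithm of the objective above: c1 x1 + ... + cn xn."""
    return sum(cj * xj for cj, xj in zip(c, x))

# Both objectives rank the candidates identically, so they share a maximizer.
best_nonlinear = max(candidates, key=product_of_exponentials)
best_linear = max(candidates, key=linear_objective)
```

Only the objective changes under the transformation; the constraints Ax = b, x ≥ 0 are untouched, so the transformed problem is an ordinary LP.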

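Max f(x1, x2) = x1x2

s.t. 4x1 + x2 ≤ 8

x1, x2 ≥ 0

[Figure: feasible region in the (x1, x2) plane bounded by the line 4x1 + x2 = 8 (intercepts x1 = 2 and x2 = 8), with the contours f(x) = 1 and f(x) = 2]

Optimal solution will lie on the line g(x) = 4x1 + x2 – 8 = 0.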

NLP Graphical Solution Method

• Solution is not a vertex of feasible region.

• For this particular problem the solution is on the boundary of the feasible region.

• This is not always the case.

Solution Characteristics

Gradient of f(x): ∇f(x1, x2) ≡ (∂f/∂x1, ∂f/∂x2)^T

This gives ∂f/∂x1 = x2, ∂f/∂x2 = x1

and ∂g/∂x1 = 4, ∂g/∂x2 = 1

At optimality we have ∇f(x1, x2) = λ∇g(x1, x2), i.e., x2 = 4λ and x1 = λ. Substituting into 4x1 + x2 = 8 gives λ = 1,

or x1* = 1 and x2* = 4
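The optimality condition for this example can be checked with a few lines of code. The sketch below scans the boundary line 4x1 + x2 = 8 on a grid (the grid size `n` is an arbitrary choice) and then tests whether the gradient of f is a multiple of the gradient of g at the best point.

```python
def f(x1, x2):
    return x1 * x2

# Parametrize the boundary 4*x1 + x2 = 8 by x1 in [0, 2] and scan a fine grid.
n = 2000
grid = [2.0 * k / n for k in range(n + 1)]
best_x1 = max(grid, key=lambda x1: f(x1, 8.0 - 4.0 * x1))
best_x2 = 8.0 - 4.0 * best_x1

# Gradients at the best grid point: grad f = (x2, x1), grad g = (4, 1).
grad_f = (best_x2, best_x1)
grad_g = (4.0, 1.0)
lam = grad_f[0] / grad_g[0]  # multiplier implied by the first component
```

If the two gradients are parallel, the second component must satisfy `grad_f[1] == lam * grad_g[1]` as well, which is the case at (1, 4).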

[Figure: f(x) plotted against x, with two local minima, a local maximum, a global maximum, and a stationary point marked]

Nonconvex Function

Let S ⊆ R^n be the set of feasible solutions to an NLP.

Definition: A global minimum is any x0 ∈ S such that

f(x0) ≤ f(x)

for all feasible x not equal to x0.

Function with Unique Global Minimum at x = (1, –3)

What is the optimal solution if x1 ≥ 0 and x2 ≥ 0?

Min {f(x) = sin(x) : 0 ≤ x ≤ 5}

Function with Multiple Maxima and Minima

Constrained Function with Unique Global Maximum and Unique Global Minimum

Convex for univariate f: d²f(x)/dx² ≥ 0 for all x.

Convex function: If you draw a straight line between any two points on the graph of f(x), the line lies on or above the graph between those points.

Concave function: If f(x) is convex then –f(x) is concave.

Linear functions are both convex and concave.

Convexity

Definition of Convexity

Let x1 and x2 be two points in S ⊆ R^n. A function f(x) is convex if and only if

f(λx1 + (1–λ)x2) ≤ λf(x1) + (1–λ)f(x2)

for all 0 < λ < 1. It is strictly convex if the inequality sign ≤ is replaced with the sign <.
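The definition suggests a simple numerical experiment: sample pairs of points and values of λ, and look for a violation of the inequality. The helper below (`violates_convexity` is a name chosen for this sketch) can only disprove convexity by finding a counterexample; finding none is evidence, not proof.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=10000, seed=0):
    """Return a sampled (x1, x2, lam) that breaks the convexity inequality, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        point = lam * x1 + (1 - lam) * x2
        chord = lam * f(x1) + (1 - lam) * f(x2)
        # A small tolerance absorbs floating-point rounding.
        if f(point) > chord + 1e-12:
            return (x1, x2, lam)
    return None

square_counterexample = violates_convexity(lambda x: x * x, -5.0, 5.0)
sine_counterexample = violates_convexity(math.sin, 0.0, 5.0)
```

For f(x) = x² no violation is found, while sin(x) on [0, 5] fails quickly because it is concave on [0, π].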

[Figure: 1-dimensional example. At the point λx1 + (1–λ)x2, the chord value λf(x1) + (1–λ)f(x2) lies above the function value f(λx1 + (1–λ)x2)]

[Figure: f(x) plotted against x]

Nonconvex -- Nonconcave Function

A positively weighted sum of convex functions is convex:

if $f_k(x)$, $k = 1,\dots,m$, are convex and $\alpha_1,\dots,\alpha_m \ge 0$

then $f(x) = \sum_{k=1}^{m} \alpha_k f_k(x)$ is convex.

Theoretical Result for Convex Functions

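Hessian of f at x:

$$
\nabla^2 f(x) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
$$

Example: f(x) = 2(x1)^3 + 3(x2)^2 – 4(x1)^2 x2 + 5x1 – 8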

Determining Convexity

Single Dimensional Functions:

A function f(x) ∈ C^1 is convex if and only if it is underestimated by linear extrapolation; i.e.,

f(x2) ≥ f(x1) + (df(x1)/dx)(x2 – x1) for all x1 and x2.

A function f(x) ∈ C^2 is convex if and only if its second derivative is nonnegative:

d²f(x)/dx² ≥ 0 for all x.

If the inequality is strict (>), the function is strictly convex.

[Figure: f(x) with the points x1 and x2 marked on the x-axis]

Multiple Dimensional Functions

Definition: The Hessian matrix H(x) associated with f(x) is the n × n symmetric matrix of second partial derivatives of f(x) with respect to the components of x.

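Example: f(x) = 3(x1)^2 + 4(x2)^3 – 5x1x2 + 4x1

$$
\nabla f(x) = \begin{bmatrix} 6x_1 - 5x_2 + 4 \\ 12x_2^2 - 5x_1 \end{bmatrix}
\quad \text{and} \quad
H(x) = \begin{bmatrix} 6 & -5 \\ -5 & 24x_2 \end{bmatrix}
$$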

When f(x) is quadratic, H(x) has only constant terms;

when f(x) is linear, H(x) is the zero matrix.
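A hand-derived Hessian is easy to get wrong, so a finite-difference comparison is a useful sanity check. The sketch below applies it to the example f(x) = 3(x1)^2 + 4(x2)^3 – 5x1x2 + 4x1; the step size `h` and the test point (1, 2) are arbitrary choices.

```python
def f(x1, x2):
    return 3 * x1**2 + 4 * x2**3 - 5 * x1 * x2 + 4 * x1

def analytic_hessian(x1, x2):
    """Hessian derived by hand: [[6, -5], [-5, 24*x2]]."""
    return [[6.0, -5.0], [-5.0, 24.0 * x2]]

def numeric_hessian(func, x1, x2, h=1e-3):
    """Central-difference approximation of the 2 x 2 Hessian."""
    f11 = (func(x1 + h, x2) - 2 * func(x1, x2) + func(x1 - h, x2)) / h**2
    f22 = (func(x1, x2 + h) - 2 * func(x1, x2) + func(x1, x2 - h)) / h**2
    f12 = (func(x1 + h, x2 + h) - func(x1 + h, x2 - h)
           - func(x1 - h, x2 + h) + func(x1 - h, x2 - h)) / (4 * h**2)
    return [[f11, f12], [f12, f22]]

exact = analytic_hessian(1.0, 2.0)
approx = numeric_hessian(f, 1.0, 2.0)
```

At (1, 2) the only entry that depends on x is the lower-right one, 24·x2 = 48.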

Properties of the Hessian

• H(x) is positive semi-definite (PSD) if and only if xTHx ≥ 0 for all x; it is PSD but not PD when, in addition, there exists an x ≠ 0 such that xTHx = 0.

• H(x) is positive definite (PD) if and only if xTHx > 0 for all x ≠ 0.

• H(x) is indefinite if and only if xTHx > 0 for some x, and xTHx < 0 for some other x.

How can we use the Hessian to determine whether or not f(x) is convex?

Multiple Dimensional Functions and Convexity

• f(x) is convex if and only if f(x2) ≥ f(x1) + ∇Tf(x1)(x2 – x1) for all x1 and x2.

• f(x) is convex (strictly convex) if its associated Hessian matrix H(x) is positive semi-definite (definite) for all x.

• f(x) is concave if and only if f(x2) ≤ f(x1) + ∇Tf(x1)(x2 – x1) for all x1 and x2.

• f(x) is concave (strictly concave) if its associated Hessian matrix H(x) is negative semi-definite (definite) for all x.

• f(x) is neither convex nor concave if its associated Hessian matrix H(x) is indefinite.

Testing for Definiteness

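Let the Hessian be

$$
H = \begin{bmatrix}
h_{11} & h_{12} & \cdots & h_{1n} \\
h_{21} & h_{22} & \cdots & h_{2n} \\
\vdots & \vdots & & \vdots \\
h_{n1} & h_{n2} & \cdots & h_{nn}
\end{bmatrix}
$$

Definition: The ith leading principal submatrix of H is the matrix formed by taking the intersection of its first i rows and first i columns.

Let Hi be the value of the corresponding determinant:

$$
H_1 = h_{11}, \quad
H_2 = \begin{vmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{vmatrix}, \quad
\text{and so on until } H_n \text{ is obtained.}
$$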

• Definition – The kth order principal submatrices of an n × n symmetric matrix A are the k × k matrices obtained by deleting n – k rows and the corresponding n – k columns of A (where k = 1, ... , n).

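• Example

$$
A = \begin{bmatrix} 2 & 1 & 8 \\ 0 & 7 & 5 \\ 6 & 4 & 3 \end{bmatrix}
$$

Principal submatrices of order 1: [2], [7], [3]

Principal submatrices of order 2:

$$
\begin{bmatrix} 2 & 1 \\ 0 & 7 \end{bmatrix}, \quad
\begin{bmatrix} 2 & 8 \\ 6 & 3 \end{bmatrix}, \quad
\begin{bmatrix} 7 & 5 \\ 4 & 3 \end{bmatrix}
$$

Principal submatrix of order 3: A itself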

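Example:

$$
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$

Principal submatrices of order 1 ($PS_1(A)$):

$$
[a_{11}], \quad [a_{22}], \quad [a_{33}]
$$

Principal submatrices of order 2 ($PS_2(A)$):

$$
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}, \quad
\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
$$

Principal submatrix of order 3: A itself

Leading principal submatrices ($LPS(A)$):

$$
[a_{11}], \quad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$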

Rules for Definiteness

• H is positive definite if and only if the determinants of all the leading principal submatrices are positive; i.e., Hi > 0 for i = 1,…,n.

• H is negative definite if and only if H1 < 0 and the remaining leading principal determinants alternate in sign: H2 > 0, H3 < 0, H4 > 0, . . .

• H is positive semi-definite if and only if all principal submatrices (not only the leading ones) have nonnegative determinants.

• H is negative semi-definite if and only if every principal submatrix of odd order has a determinant ≤ 0 and every principal submatrix of even order has a determinant ≥ 0.
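The rules above translate directly into code. The following is a minimal sketch for small symmetric matrices stored as lists of lists; the function names and the tolerance `tol` are choices made for this illustration, and the cofactor determinant is only practical for small n.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def leading_minors(h):
    """Determinants H1, ..., Hn of the leading principal submatrices."""
    return [det([row[:k] for row in h[:k]]) for k in range(1, len(h) + 1)]

def principal_minors(h, k):
    """Determinants of all principal submatrices of order k."""
    return [det([[h[i][j] for j in idx] for i in idx])
            for idx in combinations(range(len(h)), k)]

def classify(h, tol=1e-12):
    """Classify a symmetric matrix as PD, ND, PSD, NSD or indefinite."""
    lead = leading_minors(h)
    if all(d > tol for d in lead):
        return "PD"
    if all(d < -tol if k % 2 == 1 else d > tol for k, d in enumerate(lead, start=1)):
        return "ND"
    minors = {k: principal_minors(h, k) for k in range(1, len(h) + 1)}
    if all(d >= -tol for ds in minors.values() for d in ds):
        return "PSD"
    if all(d <= tol if k % 2 == 1 else d >= -tol for k, ds in minors.items() for d in ds):
        return "NSD"
    return "indefinite"
```

The semi-definite tests use all principal minors, not only the leading ones. The zero matrix is both PSD and NSD; this sketch reports it as "PSD" because that test runs first.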

Quadratic Functions

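Example 1: f(x) = 3x1x2 + (x1)^2 + 3(x2)^2

$$
\nabla f(x) = \begin{bmatrix} 3x_2 + 2x_1 \\ 3x_1 + 6x_2 \end{bmatrix}
\quad \text{and} \quad
H(x) = \begin{bmatrix} 2 & 3 \\ 3 & 6 \end{bmatrix}
$$

so H1 = 2 and H2 = 12 – 9 = 3

Conclusion: f(x) is strictly convex because H(x) is positive definite.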

Quadratic Functions

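Example 2: f(x) = 24x1x2 + 9(x1)^2 + 16(x2)^2

$$
\nabla f(x) = \begin{bmatrix} 24x_2 + 18x_1 \\ 24x_1 + 32x_2 \end{bmatrix}
\quad \text{and} \quad
H(x) = \begin{bmatrix} 18 & 24 \\ 24 & 32 \end{bmatrix}
$$

H1 = 18 and H2 = 576 – 576 = 0 → H is not PD

• H is positive semi-definite (the determinants of all principal submatrices, 18, 32 and 0, are nonnegative) → f(x) is convex.

• Note, xTHx = 18(x1 + (4/3)x2)^2 ≥ 0.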
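The completed-square identity in the note can be spot-checked numerically. The sketch below compares the quadratic form with the closed form at a handful of arbitrarily chosen points; the last point, (4, –3), is picked so that x1 + (4/3)x2 = 0.

```python
H = [[18.0, 24.0], [24.0, 32.0]]

def quad_form(h, x):
    """Evaluate x^T H x for a 2 x 2 matrix."""
    return sum(h[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def closed_form(x):
    """Completed-square expression 18 (x1 + (4/3) x2)^2."""
    return 18.0 * (x[0] + (4.0 / 3.0) * x[1]) ** 2

points = [(1.0, 0.0), (0.0, 1.0), (2.0, -1.0), (-3.0, 5.0), (4.0, -3.0)]
values = [quad_form(H, p) for p in points]
```

The value at (4, –3) is zero for a nonzero x, which is why H is positive semi-definite but not positive definite.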

Nonquadratic Functions

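Example 3: f(x) = (x2 – (x1)^2)^2 + (1 – x1)^2

$$
H(x) = \begin{bmatrix} 12x_1^2 - 4x_2 + 2 & -4x_1 \\ -4x_1 & 2 \end{bmatrix}
$$

Thus the Hessian depends on the point under consideration:

At x = (1, 1),

$$
H(1,1) = \begin{bmatrix} 10 & -4 \\ -4 & 2 \end{bmatrix},
$$

which is positive definite.

At x = (0, 1),

$$
H(0,1) = \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix},
$$

which is indefinite.

Thus f(x) is not convex although it is strictly convex near (1, 1).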
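For a function that is not quadratic, the definiteness test has to be repeated at each point of interest. The sketch below evaluates the Hessian of Example 3 at the two points discussed and computes the leading principal determinants H1 and H2; the helper names are chosen for this illustration.

```python
def hessian(x1, x2):
    """Hessian of f(x) = (x2 - x1^2)^2 + (1 - x1)^2."""
    return [[12 * x1**2 - 4 * x2 + 2, -4 * x1],
            [-4 * x1, 2]]

def leading_minors_2x2(h):
    """Return (H1, H2) for a 2 x 2 matrix."""
    return h[0][0], h[0][0] * h[1][1] - h[0][1] * h[1][0]

minors_at_1_1 = leading_minors_2x2(hessian(1, 1))  # both positive: PD
minors_at_0_1 = leading_minors_2x2(hessian(0, 1))  # H2 < 0: indefinite
```

A negative determinant of a 2 × 2 symmetric matrix means its two eigenvalues have opposite signs, so the matrix is indefinite.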

Example

$$
A = \begin{bmatrix} 0 & 60 \\ 60 & 180 \end{bmatrix}
$$

Is matrix A PD or PSD or ND or NSD or indefinite?
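One way to check an answer to this question is through the eigenvalues, which for a symmetric 2 × 2 matrix have a closed form. The sketch below assumes the matrix reads A = [[0, 60], [60, 180]]; the function name is chosen for this illustration.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius

low, high = eigenvalues_2x2(0.0, 60.0, 180.0)
```

Eigenvalues of opposite sign indicate an indefinite matrix, which agrees with the determinant 0·180 – 60·60 = –3600 being negative.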

Convex Sets

Definition: A set S ⊆ R^n is convex if any point on the line segment connecting any two points x1, x2 ∈ S is also in S.

Mathematically, this is equivalent to

x0 = λx1 + (1–λ)x2 ∈ S for all λ such that 0 ≤ λ ≤ 1.

[Figure: example sets, each with a line segment drawn between two points x1 and x2]

S = {(x1, x2) : (0.5x1 – 0.6)x2 ≤ 1, 2(x1)^2 + 3(x2)^2 ≥ 27; x1, x2 ≥ 0}

(Nonconvex) Feasible Region

Convex Sets and Optimization

Let S = { x ∈ R^n : gi(x) ≤ bi, i = 1,…,m }

Fact: If gi(x) is a convex function for each i = 1,…,m then S is a convex set.

Convex Programming Theorem: Let x ∈ R^n and let f(x) be a convex function defined over a convex constraint set S. If a finite solution exists to the problem

Minimize {f(x) : x ∈ S}

then all local optima are global optima. If f(x) is strictly convex, the optimum is unique.

Note

• Let s = { x ∈ R^n : g(x) ≤ b }. Fact: If g(x) is a convex function, then s is a convex set.

• Let S = { x ∈ R^n : gi(x) ≤ bi, i = 1,…,m }. Fact: If gi(x) is a convex function for each i = 1,…,m then S is a convex set.

• Let t = { x ∈ R^n : g(x) ≥ b }. Fact: If g(x) is a concave function, then t is a convex set.

• Let T = { x ∈ R^n : gi(x) ≥ bi, i = 1,…,m }. Fact: If gi(x) is a concave function for each i = 1,…,m then T is a convex set.

Max f(x1,…,xn)

s.t. gi(x1,…,xn) ≤ bi, i = 1,…,m

x1 ≥ 0,…,xn ≥ 0

is a convex program if f is concave and each gi is convex.

Convex Programming

Min f(x1,…,xn)

s.t. gi(x1,…,xn) ≤ bi, i = 1,…,m

x1 ≥ 0,…,xn ≥ 0

is a convex program if f is convex and each gi is convex.

[Figure: feasible region in the (x1, x2) plane, both axes marked from 1 to 5]

Maximize f(x) = (x1 – 2)^2 + (x2 – 2)^2

subject to –3x1 – 2x2 ≤ –6

–x1 + x2 ≤ 3

x1 + x2 ≤ 7

2x1 – 3x2 ≤ 4

Linearly Constrained Convex Function with Unique Global Maximum

(Nonconvex) Optimization Problem

Commercial optimization software cannot, in general, guarantee that the solution it returns for a nonconvex program is globally optimal.

Importance of Convex Programs

NLP algorithms try to find a point where the gradient of the Lagrangian function is zero – a stationary point – and complementary slackness holds.

Given L(x, λ) = f(x) + λ(g(x) – b)

we want

∇L(x, λ) = 0, g(x) – b ≤ 0, λ[g(x) – b] = 0, x ≥ 0, λ ≥ 0

However, for a convex program, all local solutions are global optima.

We want to build a cylinder (with a top and a bottom) of maximum volume such that its surface area is no more than S units.

Max V(r, h) = πr^2 h

s.t. 2πr^2 + 2πrh = S

r ≥ 0, h ≥ 0

[Figure: cylinder with radius r and height h]

There are a number of ways to approach this problem. One way is to solve the surface area constraint for h and substitute the result into the objective function.

Example: Cylinder Design

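$$
h = \frac{S - 2\pi r^2}{2\pi r}
$$

$$
\text{Volume} = V = \pi r^2 \left[ \frac{S - 2\pi r^2}{2\pi r} \right] = \frac{rS}{2} - \pi r^3
$$

$$
\frac{dV}{dr} = 0 \;\Rightarrow\; r = \left( \frac{S}{6\pi} \right)^{1/2}, \quad
h = \frac{S}{2\pi r} - r = 2 \left( \frac{S}{6\pi} \right)^{1/2}
$$

$$
V = \pi r^2 h = 2\pi \left( \frac{S}{6\pi} \right)^{3/2}
$$

Solution: $r = (S/6\pi)^{1/2}$, $h = 2(S/6\pi)^{1/2}$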

Is this a global optimal solution?

Solution by Substitution

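$$
V(r) = \frac{rS}{2} - \pi r^3, \quad
\frac{dV(r)}{dr} = \frac{S}{2} - 3\pi r^2, \quad
\frac{d^2 V(r)}{dr^2} = -6\pi r
$$

$$
\frac{d^2 V}{dr^2} \le 0 \text{ for all } r \ge 0
$$

Thus V(r) is concave on r ≥ 0 so the solution is a global maximum.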
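The closed-form result from the substitution approach can be verified numerically. The sketch below uses an arbitrary surface area S = 100 (an assumed value for illustration) and checks the formulas r = (S/6π)^(1/2) and h = 2r against the constraint and against nearby radii.

```python
import math

def cylinder_optimum(S):
    """Radius, height and volume from the stationarity condition dV/dr = 0."""
    r = math.sqrt(S / (6 * math.pi))
    h = S / (2 * math.pi * r) - r
    return r, h, math.pi * r**2 * h

def volume(r, S):
    """Volume after substituting the surface-area constraint for h."""
    return r * S / 2 - math.pi * r**3

S = 100.0  # hypothetical surface area
r_star, h_star, v_star = cylinder_optimum(S)
surface = 2 * math.pi * r_star**2 + 2 * math.pi * r_star * h_star
```

The optimal height equals the diameter, h = 2r, regardless of the value of S.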

Test for Convexity

• A company wants to advertise in two regions.

• The marketing department says that if $x1 is spent in region 1, sales volume will be 6(x1)^(1/2).

• If $x2 is spent in region 2 the sales volume will be 4(x2)^(1/2).

• The advertising budget is $100.

Model: Max f(x) = 6(x1)^(1/2) + 4(x2)^(1/2)

s.t. x1 + x2 ≤ 100, x1 ≥ 0, x2 ≥ 0

Advertising (with Diminishing Returns)

Solution: x1* = 69.2, x2* = 30.8, f(x*) = 72.1

Is this a global optimum?
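The reported solution can be reproduced without a solver. Both square-root terms are concave and the constraint is linear, so this is a convex program and a point satisfying the first-order conditions is a global maximum. With the budget fully spent, equating marginal returns gives xj proportional to the square of its coefficient; the sketch below implements that closed form and cross-checks it with a scan of the budget line (the function names are chosen for this illustration).

```python
import math

def allocate(coefs, budget):
    """Spend the budget in proportion to the squared coefficients."""
    total = sum(a * a for a in coefs)
    return [budget * a * a / total for a in coefs]

def sales(coefs, x):
    """Total sales volume: sum of a_j * sqrt(x_j)."""
    return sum(a * math.sqrt(xj) for a, xj in zip(coefs, x))

coefs, budget = [6.0, 4.0], 100.0
x_star = allocate(coefs, budget)
f_star = sales(coefs, x_star)

# Cross-check: scan x1 = 0.00, 0.01, ..., 100.00 with x2 = budget - x1.
f_scan = max(sales(coefs, [k / 100.0, budget - k / 100.0]) for k in range(10001))
```

The rounded values agree with the solution above and with the spreadsheet output: 69.231, 30.769 and 72.111.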

Excel Add-in Solution

Nonlinear Model. Name: Adv100; Type: NLP1; Goal: Max; Objective: 72.111
Solver: Excel Solver; Type: Nonlinear2; Sens.: Yes; Comp. Time: 00:00; Status: Optimal
Objective Terms. Linear: 0; NonLinear 1: 72.111; NonLinear 2: 0

Variables                 1        2
Name:                     X1       X2
Values:                   69.231   30.769
Lower Bounds:             0        0
Linear Obj. Coef.:        0        0
Nonlinear Obj. Terms:     8.3205   5.547
Nonlinear Obj. Coef.:     6        4

Constraints
Num.  Name  Value  Rel.  RHS    Linear Constraint Coefficients
1     Con1  100    <=    100    1  1
2     Con2  0      <=    10000  0  0

Let μj = expected return

σjj = variance of return

We are also concerned with the covariance terms:

σij = cov(ri, rj)

If σij > 0 then returns on i and j are positively correlated.

If σij < 0 returns are negatively correlated.

Portfolio Selection with Risky Assets (Markowitz)

• Suppose that we may invest in (up to) n stocks.

• Investors worry about (1) expected gain (2) risk.

Decision Variables: xj = # of shares of stock j purchased

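Expected return of the portfolio: $R(x) = \sum_{j=1}^{n} \mu_j x_j$

Variance (measure of risk): $V(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j$

Example: If x1 = x2 = 1 and

$$
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}
= \begin{bmatrix} \sigma^2 & -\sigma^2 \\ -\sigma^2 & \sigma^2 \end{bmatrix},
$$

we get

V(x) = σ11x1x1 + σ12x1x2 + σ21x2x1 + σ22x2x2

= σ² + (–σ²) + (–σ²) + σ² = 0

Thus we can construct a “risk-free” portfolio (from variance point of view) if we can find stocks “fully” negatively correlated.

If

$$
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}
= \begin{bmatrix} \sigma^2 & \sigma^2 \\ \sigma^2 & \sigma^2 \end{bmatrix},
$$

then purchasing stock 2 is just like purchasing additional shares of stock 1.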
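The two-stock examples can be reproduced with a direct implementation of R(x) and V(x). The numbers below (the variance `s2` and the expected returns `mu`) are made up for illustration; only the sign pattern of the covariance matrix follows the examples.

```python
def expected_return(mu, x):
    """R(x): sum of mu_j * x_j."""
    return sum(m * xj for m, xj in zip(mu, x))

def variance(sigma, x):
    """V(x): double sum of sigma_ij * x_i * x_j."""
    n = len(x)
    return sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

s2 = 0.04            # hypothetical common variance sigma^2
mu = [0.10, 0.10]    # hypothetical expected returns
x = [1, 1]

fully_negative = [[s2, -s2], [-s2, s2]]
fully_positive = [[s2, s2], [s2, s2]]

v_negative = variance(fully_negative, x)   # risk cancels out
v_positive = variance(fully_positive, x)   # same risk as 2 shares of one stock
v_two_shares = variance([[s2]], [2])
```

With fully positive correlation, one share of each stock carries the same variance as two shares of a single stock, which is the sense in which buying stock 2 is like buying more of stock 1.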

Nonlinear optimization models …

Let pj = price of stock j,

b = our total budget

β = risk-aversion factor (β = 0 means risk is not a factor)

Consider 3 different models:

1) Max f(x) = R(x) – βV(x)

s.t. $\sum_{j=1}^{n} p_j x_j \le b$, xj ≥ 0, j = 1,…,n

where β ≥ 0 is determined by the decision maker

2) Max f(x) = R(x)

s.t. V(x) ≤ α, $\sum_{j=1}^{n} p_j x_j \le b$, xj ≥ 0, j = 1,…,n

where α ≥ 0 is determined by the investor. Smaller values of α represent greater risk aversion.

3) Min f(x) = V(x)

s.t. R(x) ≥ γ, $\sum_{j=1}^{n} p_j x_j \le b$, xj ≥ 0, j = 1,…,n

where γ ≥ 0 is the desired rate of return (minimum expectation) selected by the investor.
