Linear Programming 2011 1 IE 531 Linear Programming Spring 2011 박 성 수박 성 수.

Linear Programming 2011 1

IE 531 Linear Programming

Spring 2011

박 성 수


Course Objectives

Why need to study LP?Important tool by itselfTheoretical basis for later developments (IP, Network, Graph, Nonlinear,

scheduling, Sets, Coding, Game, … )Formulation + package is not enough for advanced applications and

interpretation of results

Objectives of the class:Understand the theory of linear optimizationPreparation for more in-depth optimization theoryModeling capabilitiesIntroduction to use of software (Xpress-MP and/or CPLEX)


Prerequisite: basic linear algebra/calculus,

earlier exposure to LP/OR helpful,

mathematical maturity (reading proofs, logical thinking)

Be steady in studying.


Mathematical Programming Problem:

min/max f(x)

subject to gi(x) 0, i = 1, ..., m,

(hj(x) = 0, j = 1, ..., k,)

( x X Rn)

f, gi, hj : Rn R

If f, gi, hj linear (affine) function linear programming problem

If f, gi, hj (or part of them) nonlinear function nonlinear programming problem

If solution set restricted to be integer points integer programming problem

Chapter 1 Introduction


Linear programming: problem of optimizing (maximize or minimize) a linear (objective) function subject to linear inequality (and equality) constraints.

General form:

{max, min} c'x

subject to ai'x bi , iM1

ai'x bi , iM2

ai'x = bi , iM3

xj 0, jN1 , xj 0, jN2

c, ai , x Rn

(There may exist variables unrestricted in sign) inner product of two column vectors x, y Rn :

x’y = i = 1n xiyi

If x’y = 0, x, y 0, then x, y are said to be orthogonal. In 3-D, the angle between the two vectors is 90 degrees.

( vectors are column vectors unless specified otherwise)


Big difference from systems of linear equations is the existence of objective function and linear inequalities (instead of equalities)

Much deeper theoretical results and applicability than systems of linear equations.

x1, x2, …, xn : (decision) variables

bi : right-hand-side

ai'x { , , } bi : i th constraint

xj { , } 0 : nonnegativity (nonpositivity) constraint

c'x : objective function

Other terminology:

feasible solution, feasible set (region), free (unrestricted) variable, optimal (feasible) solution, optimal cost, unbounded


Important submatrix multiplications

Interpretation of constraints: see as submatrix multiplication.

A: mn matrix

1

1

' | |

' | |n

m

a

A A A

a

denote constraints as Ax { , , } b

nj

mi iijj xeaxAAx 1 1 ' , where ei is i-th unit vector

mi

nj jjii eAyayAy 1 1 ''''


Any LP can be expressed as min c'x, Ax b

max c'x min (-c'x) and take negative of the optimal cost

ai'x bi -ai'x -bi

ai'x = bi ai'x bi , -ai'x -bi

nonnegativity (nonpositivity) are special cases of inequalities which will be handled separately in the algorithms.

Feasible solution set of LP can always be expressed as Ax b (or Ax b) (called polyhedron, a set which can be described as a solution set of finitely many linear inequalities)

We may sometimes use max c'x, Ax b form (especially, when we study polyhedron)


Brief History of LP (or Optimization) Gauss: Gaussian elimination to solve systems of equations

Fourier(early 19C) and Motzkin(20C) : solving systems of linear inequalities

Farkas, Minkowski, Weyl, Caratheodory, … (19-20C): Mathematical structures related to LP (polyhedron, systems of

alternatives, polarity)

Kantorovich (1930s) : efficient allocation of resources

(Nobel prize in 1975 with Koopmans)

Dantzig (1947) : Simplex method

1950s : emergence of Network theory, Integer and combinatorial optimization, development of computer

1960s : more developments

1970s : disappointment, NP-completeness, more realistic expectations

Khachian (1979) : ellipsoid method for LP


1980s : personal computer, easy access to data, willingness to use models

Karmarkar (1984) : Interior point method

1990s : improved theory and software, powerful computerssoftware add-ins to spreadsheets, modeling languages,

large scale optimization, more intermixing of O.R. and A.I.

Markowitz (1990) : Nobel prize for portfolio selection (quadratic programming)

Nash (1994) : Nobel prize for game theory

21C (?) : Lots of opportunities

more accurate and timely data available

more theoretical developments

better software and computer

need for more automated decision making for complex systems

need for coordination for efficient use of resources (e.g.

supply chain management, APS, traditional engineering problems, bio, finance, ...)


Application Areas of Optimization

Operations Managements

Production Planning

Scheduling (production, personnel, ..)

Transportation Planning, Logistics

Energy

Military

Finance

Marketing

E-business

Telecommunications

Games

Engineering Optimization (mechanical, electrical, bioinformatics, ... )

System Design

…


Resources Societies:

INFORMS (the Institute for Operations Research and Management Sciences) : www.informs.org

MPS (The Mathematical Programming Society) : www.mathprog.orgKorean Institute of Industrial Engineers : kiie.orgKorean Operations Research Society : www.korms.or.kr

Journals:

Operations Research, Management Science, Mathematical Programming, Networks, European Journal of Operational Research, Naval Research Logistics, Journal of the Operational Research Society, Interfaces, …


Standard form problems

Standard form : min c'x, Ax = b, x 0

Two view points:Find optimal weights (nonnegative) from possible nonnegative linear

combinations of columns of A to obtain b vectorFind optimal solution that satisfies linear equations and nonnegativity

Reduction to standard form

Free (unrestricted) variable xj xj+ - xj

- , xj+, xj

- 0

j aijxij bi j aijxij + si = bi , si 0 (slack variable)

j aijxij bi j aijxij - si = bi , si 0 (surplus variable)

nj

mi iijj xeaxAAx 1 1 '


Any (practical) algorithm can solve the LP problem in equality form only (except nonnegativity)

Modified form of the simplex method can solve the problem with free variables directly (w/o using difference of two variables).

It gives more sensible interpretation of the behavior of the algorithm.


1.2 Formulation examples See other examples in the text. Minimum cost network flow problem

Directed network G=(N, A), (|N| = n )

arc capacity uij , (i, j) A, unit flow cost cij , (i, j) A

bi : net supply at node i (bi > 0: supply node, bi < 0: demand node), (i bi = 0)

Find min cost transportation plan that satisfies supply, demand at each node and arc capacities.

minimize (i, j)A cijxij

subject to {j : (i, j)A} xij - {j : (j, i)A} xji = bi , i = 1, …, n

(out flow - in flow = net flow at node i)

(some people use, in flow – out flow = net flow)

xij uij , (i, j)A

xij 0 , (i, j)A


Choosing paths in a communication network ( (fractional) multicommodity flow problem)

Multicommodity flow problem: Several commodities share the network. For each commodity, it is min cost network flow problem. But the commodities must share the capacities of the arcs. Generalization of min cost network flow problem. Many applications in communication, distribution / transportation systems

Several commodities caseActually one commodity. But there are multiple origin and destination

pairs of nodes (telecom, logistics, ..)

Given telecommunication network (directed) with arc set A, arc capacity uij bits/sec, (i, j) A, unit flow cost cij /bit , (i, j) A, demand bkl bits/sec for traffic from node k to node l.

Data can be sent using more than one path.

Find paths to direct demands with min cost.


Decision variables:

xijkl : amount of data with origin k and destination l that

traverses link (i, j) A

Let bikl = bkl if i = k

-bkl if i = l

0 otherwise

Formulation (flow formulation)

minimize (i, j)A k l cijxijkl

subject to {j : (i, j)A} xijkl - {j : (j, i)A} xji

kl = bikl , i, k, l = 1, …, n

(out flow - in flow = net flow at node i for

commodity from node k to node l)

k l xijkl uij , (i, j)A

(The sum of all commodities should not exceed the

capacity of link (i, j) )

xijkl 0 , (i, j)A, k, l =1, …, n


Alternative formulation (path formulation)

Let K: set of origin-destination pairs (commodities)

P(k): set of all possible paths for sending commodity k K

P(k;e): set of paths in P(k) that traverses arc e A

E(p): set of links contained in path p

Decision variables:

ypk : fraction of commodity k sent on path p

minimize kK pP(k) wpkyp

k

subject to pP(k) ypk = 1, for all kK

kK pP(k; e) bkypk ue , for all eA

0 ypk 1, for all p P(k), k K

where wpk = bkeE(p) ce

If ypk {0, 1}, it is a single path routing problem (path selection,

integer multicommodity flow).


path formulation has smaller number of constraints, but enormous number of variables.

can be solved easily by column generation technique (later).

Integer version is more difficult to solve.

Extensions: Network design - also determine the number and type of facilities to be installed on the links (and/or nodes) together with routing of traffic.

Variations: Integer flow. Bifurcation of traffic may not be allowed. Determine capacities and routing considering rerouting of traffic in case of network failure, Robust network design (data uncertainty), ...


Pattern classification (Linear classifier)

Given m objects with feature vector ai Rn , i = 1, …, m.

Objects belong to one of two classes. We know the class to which each sample object belongs.

We want to design a criterion to determine the class of a new object using the feature vector.

Want to find a vector (x, xn+1) Rn+1 with x Rn such that, if i S, then ai'x xn+1, and if i S, then ai'x < xn+1. (if it is possible)


Find a feasible solution (x, xn+1) that satisfies

ai'x xn+1, i S

ai'x < xn+1. i S

for all sample objects i

Is this a linear programming problem?

( no objective function, strict inequality in constraints)


Is strict inequality allowed in LP?

consider min x, x > 0 no minimum point. only infimum of objective value exists

If the system has a feasible solution (x, xn+1), we can make the difference of the rhs and lhs large by using solution M(x, xn+1) for M > 0 and large. Hence there exists a solution that makes the difference at least 1 if the system has a solution.

Remedy: Use ai'x xn+1, i S

ai'x xn+1-1, i S

Important problem in data mining with applications in target marketing, bankruptcy prediction, medical diagnosis, process monitoring, …

VariationsWhat if there are many choices of hyperplanes? any reasonable criteria?What if there is no hyperplane separating the two classes?Do we have to use only one hyperplane?Use of nonlinear function possible? How to solve them?

• SVM (support vector machine), convex optimizationMore than two classes?



1.3 Piecewise linear convex obj. functions

Some problems involving nonlinear functions can be modeled as LP.

Def: Function f : Rn R is called a convex function if for all x, y Rn and all [0, 1]

f(x + (1- )y) f(x) + (1- )f(y).

( the domain may be restricted)

f called concave if -f is convex

(picture: the line segment joining (x, f(x)) and (y, f(y)) in Rn+1 is not below the locus of f(x) )


Def: x, y Rn, 1, 2 0, 1+ 2 = 1

Then 1x + 2y is said to be a convex combination of x, y.

Generally, i=1k ixi , where i=1

k i = 1 and i 0, i = 1, ..., k is a convex combination of the points x1, ..., 타

Def: A set S Rn is convex if for any x, y S, we have 1x + 2y S for any 1, 2 0, 1+ 2 = 1.

Picture: 1x + 2y = 1x + (1 - 1) y, 0 1 1

= y + 1 (x – y), 0 1 1

(line segment joining x, y lies in S)x (1 = 1)

y (1 = 0)

(x-y)

(x-y)

If we have 1x + 2y, 1+ 2 = 1 (without 1, 2 0), it is called an affine combination of x and y.

Picture: 1x + 2y = 1x + (1 - 1) y,

= y + 1 (x – y), (1 is arbitrary)

(line passing through x, y)



Picture of convex function

x y(1 )x y

( (1 ) )f x y

( ) (1 ) ( ) f x f y

)(xf

nx R

1( , ( )) nx f x R ( , ( ))y f y

))()1()(,)1(( yfxfyx


relation between convex function and convex set

Def: f: Rn R. Define epigraph of f as epi(f) = { (x, ) Rn+1 : f(x) } Then previous definition of convex function is equivalent to epi(f) being

a convex set. When dealing with convex functions, we frequently consider epi(f) to exploit the properties of convex sets.

Consider operations on functions that preserve convexity and operations on sets that preserve convexity.


Example:

Consider f(x) = max i = 1, …, m (ci'x + di), ci Rn, di R

(maximum of affine functions, called a piecewise linear convex function.)

nx R

)(xf

c1'x+d1

c2'x+d2

c3'x+d3

x


Thm: Let f1, …, fm : Rn R be convex functions. Then

f(x) = max i = 1, …, m fi(x) is also convex.

pf) f(x + (1- )y) = max i=1, …, m fi(x + (1- )y )

max i=1, …, m (fi(x) + (1- )fi(y) )

max i=1, …, m fi(x) + max i=1, …, m (1- )fi(y) = f(x) + (1- )f(y)


Min of piecewise linear convex functions

Minimize max I=1, …, m (ci'x + di)Subject to Ax b

Minimize z Subject to z ci'x + di , i = 1, …, m

Ax b


Q: What can we do about finding max of a piecewise linear convex function?

maximum of a piecewise linear concave function (can be obtained as min of affine functions)?

Min of a piecewise linear concave function?


Convex function has a nice property such that a local min point is a global min point. (when domain is Rn or convex set) (HW later)

Hence finding min of a convex function defined over a convex set is usually easy. But finding a max of a convex function is difficult to solve. Basically, we need to examine all local max points.

Similarly, finding a max of concave function is easy, but finding min of a concave function is difficult.


In constraints, f(x) h

where f(x) is piecewise linear convex function f(x) = max i=1, …, m (fi'x + gi).

fi'x + gi h, i = 1, … , m

Q: What about constraints f(x) h ? Can it be modeled as LP?

Def: f: Rn R, convex function, R

The set C = { x: f(x) } is called the level set of f

level set of a convex function is a convex set. (HW later)

solution set of LP is convex (easy) non-convex solution set can’t be modeled as LP.


Problems involving absolute values

Minimize i = 1, …, n ci |xi|

subject to Ax b (assume ci 0)

More direct formulations than piecewise linear convex function is possible.

(1)Min i ci zi subject to Ax b xi zi , i = 1, …, n -xi zi , i = 1, …, n

(2)Min i ci (xi

+ + xi-)

subject to Ax+ - Ax- b x+ , x- 0

(want xi+ = xi

if xi 0, xi- = -xi

if xi < 0 and xi

+xi- = 0, i.e., at most one of

xi+, xi

- is positive in an optimal solution.ci 0 guarantees that.)


Data Fitting

Regression analysis using absolute value function

Given m data points (ai , bi ), i = 1, …, m, ai Rn , bi R.

Want to find x Rn that predicts results b given a with function b = a'x

Want x that minimizes prediction error | bi - ai'x | for all i.

minimize z

subject to bi - ai'x z, i = 1, … , m

-bi + ai'x z, i = 1, … , m


Alternative criterion

minimize i = 1, …, m | bi - ai'x |

minimize z1 + … + zm

subject to bi - ai'x zi , i = 1, … , m

-bi + ai'x zi , i = 1, … , m

Quadratic error function can't be modeled as LP, but need calculus method (closed form solution)


Special case of piecewise linear objective function : separable piecewise linear objective function.

function f: Rn R, is called separable if f(x) = f1(x1) + f2(x2) + … + fn(xn)

xi

fi(xi)

a3a2a10

c3

c2

c1

c4

slope: ci

x1i x2i x3i x4i

c1 < c2 < c3 < c4


Express xi in the constraints as xi x1i + x2i + x3i + x4i , where

0 x1i a1, 0 x2i a2 - a1 , 0 x3i a3 - a2, 0 x4i

In the objective function, use :

min c1x1i + c2x2i + c3x3i + c4x4i

Since we solve min problem, it is guaranteed that we get

xki > 0 in an optimal solution implies xji , j < k have values at their upper bounds.


1.4 Graphical representation and solution

Let a Rn, b R.

Geometric intuition for the solution sets of

{ x : a’x = 0 }

{ x : a’x 0 }

{ x : a’x 0 }

{ x : a’x = b }

{ x : a’x b }

{ x : a’x b }


{ x : a’x 0 }

{ x : a’x 0 }

Geometry in 2-D

a

{ x : a’x = 0 }

0


Let z be a (any) point satisfying a’x = b. Then

{ x : a’x = b } = { x : a’x = a’z } = { x : a’(x – z) = 0 }

Hence x – z = y, where y is any solution to a’y = 0, or x = y + z.

Similarly, for { x : a’x b }, { x : a’x b }.

{ x : a’x b }

{ x : a’x b }

a

{ x : a’x = 0 }

0{ x : a’x = b }

z


min c1x1 + c2x2

s.t. -x1 + x2 1, x1 0, x2 0

c=(-1, -1)

c=(1, 1)

c=(1, 0)

c=(0, 1)

{x: x1 + x2 = 0} {x: x1 + x2 = z}

x1

x2


Representing complex solution set in 2-D

( n variables, m equations (coefficient vectors are linearly independent), nonnegativity, and n – m = 2 )

x1

x3

x2x1 = 0

x2 = 0 x3 = 0

See text sec. 1.5, 1.6 for more backgrounds

Linear Programming 2011 1 IE 531 Linear Programming Spring 2011 박 성 수박 성 수.

Documents