Optimization Algorithms
Karsten Weihe
Algorithmics, Technische Universität Darmstadt
http://www.algo.informatik.tu-darmstadt.de/
Winter Term 2012/2013
Copyright © 2012 by Matthias Müller-Hannemann and Karsten Weihe. All rights reserved.
© 2006 M. Müller-Hannemann & K. Weihe, Algorithmics, TU Darmstadt
Not a formal requirement, but will be an essential part of the exam!
Algorithmic problem:
Input: a finite set of rectangles, each given by its two edge lengths (aka width and height).
Output: a placement of all rectangles in the plane such that
the rectangles are placed openly disjoint, that is, the open interiors of any two rectangles do not intersect, and
the placement of each rectangle is axis-parallel, that is, the edges are parallel to the coordinate axes (rectangles may be turned by 90 degrees).
Objective: minimize the area of the bounding box, that is, the smallest axis-parallel rectangular area enclosing all input rectangles.
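The feasibility condition and the objective above can be sketched in a few lines of Python. This is a minimal illustration, assuming each placed rectangle is given as a tuple (x, y, width, height) of its lower-left corner and edge lengths (this representation is an assumption, not part of the problem statement):

```python
def overlap_open(r1, r2):
    # Open interiors intersect iff the rectangles overlap strictly in both axes.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1

def is_feasible(placement):
    # Feasible iff no two open interiors intersect (touching edges are allowed).
    rects = list(placement)
    return all(not overlap_open(rects[i], rects[j])
               for i in range(len(rects)) for j in range(i + 1, len(rects)))

def bounding_box_area(placement):
    # Area of the smallest axis-parallel rectangle enclosing all rectangles.
    xmin = min(x for x, y, w, h in placement)
    ymin = min(y for x, y, w, h in placement)
    xmax = max(x + w for x, y, w, h in placement)
    ymax = max(y + h for x, y, w, h in placement)
    return (xmax - xmin) * (ymax - ymin)
```

Two rectangles that merely share an edge count as openly disjoint, which the strict inequalities above capture.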
H.H. Hoos and T. Stützle: Stochastic Local Search, Morgan Kaufmann Publishers Inc, 2004.
J. Hromkovič: Algorithmics for Hard Problems. Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics (Texts in Theoretical Computer Science, An EATCS Series), Springer Verlag, 2001.
The total length of the tree is not the only relevant objective.
The pins are not completely exchangeable: one designated pin serves as a driver (“source”) which sends a signal to all other pins (“sinks”) in its set.
→ Look for a directed rooted Steiner tree (rooted at the source).
An important objective is (roughly) to minimize the maximal run time from each driver to all of its sinks.
This objective is much harder.
For certain stages of the VLSI design process, the objectives from Slide no. 22 are sensible alternatives:
much easier to handle mathematically,
probably (hopefully!) close enough to reality.
A communication network should not become disconnected if one server (node) or one connection (arc) breaks down due to hardware/software failures.
In terms of graph theory: Any two nodes A and B should be connected through the network by at least two paths which are
edge–disjoint in case only break–downs of edges are relevant;
node–disjoint except for A and B (a.k.a. internally node–disjoint) in case break–downs of servers shall also be taken into account.
Further realistic variants:
At least three, four... connecting, disjoint paths for any two nodes A and B.
Different numbers of required disjoint paths for pairs A, B with different priorities (e.g. important companies vs. people in the outback).
A set of rooms, each coming with the number of seats.
A set of time slots, supposed to be non-overlapping (to make things easier).
A set of courses, each filling exactly one time slot (to make things easier).
A set of students, each coming with a list of courses in which this student is registered.
Find: an assignment of rooms and time slots to courses such that
no two courses are assigned the same room and time slot,
the audience of a course is not larger than the capacity of the room assigned to this course, and
no student is registered for two courses at the same time.
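The three side constraints can be checked mechanically. A minimal sketch, assuming the (hypothetical) representation: `assignment` maps each course to a (room, slot) pair, `capacity` maps rooms to seat counts, and `registrations` maps students to their course lists:

```python
def check_timetable(assignment, capacity, registrations):
    # Constraint 1: no two courses share the same (room, slot) pair.
    if len(set(assignment.values())) < len(assignment):
        return False
    # Count the audience of each course from the registrations.
    audience = {}
    for courses in registrations.values():
        for c in courses:
            audience[c] = audience.get(c, 0) + 1
    # Constraint 2: each audience fits into the assigned room.
    for c, (room, slot) in assignment.items():
        if audience.get(c, 0) > capacity[room]:
            return False
    # Constraint 3: no student has two courses in the same time slot.
    for courses in registrations.values():
        slots = [assignment[c][1] for c in courses]
        if len(set(slots)) < len(slots):
            return False
    return True
```

This is only a feasibility checker; finding a feasible assignment is the hard part of the problem.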
Background: These points are probes which would ideally (i.e. without inaccuracies in the measured values) reveal an affine–linear relation between the two parameters (= lie on a straight line).
Desired output: the “best approximating straight line”.
Given two distinct points x, y ∈ Rn, a convex combination of them is any point of the form z = λx + (1−λ)y for λ ∈ R and 0 ≤ λ ≤ 1 (this is a strict convex combination if 0 < λ < 1).
A set S ⊆ Rn is convex if it contains all convex combinations of pairs of points x, y ∈ S.
Example 6: convex optimization problem
[Figure: graph of a convex function f over an interval [a, b], with points x, z, y]
A function f : S → R (where S ⊆ Rn is a convex set) is convex in S if for any two points x, y ∈ S and 0 ≤ λ ≤ 1 we have
λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y).
Examples of convex functions: linear, quadratic (with nonnegative leading coefficient), exponential.
Let S ⊆ Rn be a convex set, and f : Rn → R be a convex function. Then the problem of finding an x ∈ S that minimizes f(x) among all x ∈ S is called a convex minimization problem.
Example 8: Integer Linear Programs
Given c ∈ Rn, A ∈ R(m,n), b ∈ Rm, then
maximize cᵀx subject to Ax ≤ b, x ∈ Zn
is called an integer linear programming problem (ILP).
If a variable is restricted to be zero-one valued, it is called a binary variable.
If all variables are binary variables, the problem is called a binary linear programming problem.
If only a subset of the variables is required to be integer-valued, the corresponding problem is called a mixed integer linear programming problem (MILP).
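As an illustration of the binary case, the problem can be solved by exhaustive enumeration of all 0/1 vectors. This brute-force sketch is exponential in n and is shown only to make the definition concrete, not as a practical solver:

```python
from itertools import product

def solve_binary_lp(c, A, b):
    """Maximize c^T x subject to A x <= b, x in {0,1}^n, by enumeration."""
    n = len(c)
    best, best_x = None, None
    for x in product((0, 1), repeat=n):
        # Check all m constraints A x <= b.
        if all(sum(A[i][j] * x[j] for j in range(n)) <= b[i]
               for i in range(len(A))):
            val = sum(c[j] * x[j] for j in range(n))
            if best is None or val > best:
                best, best_x = val, x
    return best, best_x
```

For example, a knapsack instance (maximize 3x1 + 2x2 + 2x3 subject to 2x1 + x2 + x3 ≤ 2) is a binary linear program.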
Introduction 1.3 General Discussion of Algorithmic Problems
Construction and Optimization Problems
Algorithms for decision problems are typically constructive, which means they solve the corresponding construction problem as well.
Trivially, each construction problem may be viewed as an optimization problem: just define an objective that assigns the same value to each solution.
Moreover, many generic algorithms are in fact applicable to optimization problems only.
To apply such a generic algorithm to a construction problem, it has to be transformed into an optimization problem (later on, we will see general techniques for that).
→ For all of these reasons, we may focus on optimization problems in the following.
More Formal Specification of the Matching Problem
For an instance G = (V, E) of the matching problem, the elements of SG may be alternatively encoded as the set of all 0/1–vectors x defined on the index set E.
Interpretation: x[e] = 1 if and only if e ∈ M.
Side constraints: For x ∈ SG and all v ∈ V, it must hold that
∑_{e ∈ E : v ∈ e} x[e] ≤ 1.
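The side constraint (each node is covered by at most one chosen edge) is easy to verify. A minimal sketch, assuming edges are represented as 2-tuples of node identifiers and x as a dict mapping edges to 0/1:

```python
def is_matching_vector(edges, x):
    # For every node v, sum x[e] over all edges e incident to v must be <= 1.
    load = {}
    for e in edges:
        if x.get(e, 0) == 1:
            for v in e:
                load[v] = load.get(v, 0) + 1
    return all(cnt <= 1 for cnt in load.values())
```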
More Complex Example: the General TSP Revisited
I can be viewed as the set of all quadratic real–valued matrices D:
D[i , j ] = distance from point no. i to point no. j .
−→ Cf. Slide no. 14.
For an (n × n)–matrix I ∈ I, SI may then be the set of all quadratic 0/1–matrices X of size n:
X[i, j] = 1 ⇐⇒ j follows i immediately on the cycle corresponding to X.
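To make this encoding concrete: the sketch below (an illustration, with X as a list of lists) decodes such a 0/1–matrix into the tour it represents, and rejects matrices that do not encode a single cycle through all n points:

```python
def decode_tour(X):
    """Return the cycle encoded by X as a vertex list starting at 0,
    or None if X does not encode a single Hamiltonian cycle."""
    n = len(X)
    succ = {}
    for i in range(n):
        ones = [j for j in range(n) if X[i][j] == 1]
        if len(ones) != 1:          # each point needs exactly one successor
            return None
        succ[i] = ones[0]
    tour, cur = [0], succ[0]
    while cur != 0 and len(tour) <= n:
        tour.append(cur)
        cur = succ[cur]
    # Valid iff we returned to 0 after visiting all n points.
    return tour if len(tour) == n and cur == 0 else None
```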
Feasibility and Boundedness
An instance I ∈ I of an algorithmic problem is called feasible if FI ≠ ∅, otherwise infeasible.
An instance I ∈ I of a minimization (resp. maximization) problem is called bounded if the objective function is bounded over FI from below (resp. above); otherwise, it is called unbounded.
Note:
Boundedness of an instance I ∈ I is not identical with the normal, set–theoretic boundedness of FI.
Consider the important case that FI ⊆ Rn for some n, FI is closed, and the objective function is continuous on FI:
Basic calculus says for this case: boundedness of FI (i.e., FI is compact) implies the existence of a minimum and a maximum, in particular boundedness of I.
The interval FI = (0, 1] is obviously bounded, but min log x is unbounded over FI (note that FI is not closed, so the statement above does not apply).
Exact vs. Approximation vs. Heuristic
An algorithm is called exact if:
Feasibility version: It provably finds a feasible solution if there is one.
Optimization version: It provably finds an optimal solution ifthere is one.
An algorithm is called approximative if:
Feasibility version: It finds a solution that is provably not too far from feasibility according to some reasonable measure.
Optimization version: It finds a solution that is provably not too far from optimality according to some reasonable measure.
An algorithm is called heuristic if:
Feasibility version: It attempts to find a feasible or nearly feasible solution, but no quality guarantee is proved.
Optimization version: It attempts to find an optimal or nearly optimal solution, but no quality guarantee is proved.
But the theory of NP–completeness provides techniques for proving that a given problem is “just as hard” as a large number of other problems that are widely recognized as being difficult.
Armed with these techniques you might be able to prove that your problem is NP–complete.
Then you can march to your boss and announce:
“I can’t find an efficient algorithm, but neither can all these famouspeople.”
Motivation to Study NP–completeness
NP–completeness is a form of bad news: evidence that many important problems can’t be solved quickly.
Why should we care?
Knowing that a problem is hard lets you stop beating your head against a wall trying to solve it, and do something better:
Use a heuristic. If you can’t quickly solve the problem with a good worst case time, maybe you can come up with a method for solving a reasonable fraction of the common cases.
Solve the problem approximately instead of exactly. A lot of the time it is possible to come up with a provably fast algorithm that doesn’t solve the problem exactly but comes up with a solution you can prove is close to right.
Use an exponential time solution anyway. If you really have to solve the problem exactly, you can settle down to writing an exponential time algorithm and stop worrying about finding a polynomial time algorithm.
Choose a better abstraction. The NP–complete abstract problem you’re trying to solve presumably comes from ignoring some of the seemingly unimportant details of a more complicated real-world problem. Perhaps some of those details shouldn’t have been ignored, and they make the difference between what you can and can’t solve.
To study the efficiency of algorithms, we first need a formal notion of input size.
The input must be encoded or represented as a sequence of symbols over some fixed alphabet such as bits or characters.
Once we have decided how the input is represented as a sequence of symbols, we define the input size as the length of this sequence, that is, the number of symbols in it.
Note: Input size depends on the chosen encoding.
Example: integers
Input: the number 2006 has input size
4 using decimal representation,
11 using binary representation,
2006 using unary representation.
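The three sizes for the example above can be computed directly; a small sketch:

```python
def decimal_size(n):
    # Number of decimal digits.
    return len(str(n))

def binary_size(n):
    # Number of bits in the binary representation.
    return n.bit_length()

def unary_size(n):
    # Unary representation uses one symbol per unit.
    return n
```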
Encodings C1, C2 are polynomially equivalent (with respect to a problem class) iff there are polynomials p1, p2 : N → N such that for all instances I of the problem class we have
⟨I⟩C1 ≤ p1(⟨I⟩C2) and ⟨I⟩C2 ≤ p2(⟨I⟩C1),
where ⟨I⟩Ci denotes the length of I under encoding Ci.
Although this notion is hard to formalize, the following conditions capture much of it:
1 the encoding of an instance I should be concise and not “padded” with unnecessary information and symbols,
2 numbers occurring in I should be encoded in binary (or in any fixed base other than 1),
3 it should be decodable. The intent of “decodability” is that, given any particular component of a generic instance, one should be able to specify a polynomial time algorithm that is capable of extracting a description of that component from any encoded instance.
Let A be an algorithm which accepts inputs from a set I.
For the running time of an algorithm we count the elementary steps of A on input I ∈ I with respect to a model of computation and some encoding of the input:
TA : N → N, where
TA(⟨I⟩) = sum of the costs of all elementary steps of A on input I.
Elementary steps (examples): variable assignment, random access to a variable whose index is stored in another variable, conditional jumps, and simple arithmetic operations (addition, multiplication, division, comparison of numbers).
Note: the cost of an elementary step depends on the machine model. For example:
addition of two k-digit binary numbers in O(k), or
addition of two k-digit binary numbers in O(1).
Our general assumption: arithmetic operations require O(1) time (unit cost model).
Many theoretical machine models exist, most importantly
Each of these (and other possible) machines can be simulated by each other machine such that the running times for inputs of the same size differ from each other only by a polynomial factor and the necessary space consumption only by a constant factor.
“Differ by a polynomial factor” means: if Ti denotes the running time with respect to machine model i (i = 1, 2), then there exist polynomials p1, p2 such that
T1(⟨I⟩) ≤ p1(T2(⟨I⟩)) for all inputs I, and
T2(⟨I⟩) ≤ p2(T1(⟨I⟩)) for all inputs I.
Equivalence thesis: This equivalence holds for all “reasonable”models of computation.
An algorithm A is said to run in polynomial time (to have polynomial-time worst case complexity) for a problem class if there is an integer k such that for all instances I of this problem class
TA(⟨I⟩) = O(⟨I⟩^k)
and all numbers in intermediate computations can be stored with O(⟨I⟩^k) bits.
A problem class is polynomial-time solvable if there is a polynomial-time algorithm for this class.
Note: this definition is independent of the machine model due to the equivalence thesis.
Optimization problems can be reformulated as decision problems (that can be answered by YES or NO):
Given an instance I (represented by a set FI of feasible solutions and an objective function objI which we want to minimize) and an integer k, is there a feasible solution s ∈ FI with objI(s) ≤ k?
The decision problem is no harder than the original optimization problem.
This implies: Any negative result proved about the complexity of the decision version will apply to the optimization version as well.
P denotes the class of decision problems that can be solved by apolynomial-time algorithm.
For many problems, we do not know whether they are in P.
However, quite often we are able to check in polynomial time whether the YES-answer to a decision problem is correct or not (without worrying about how hard it might be to find the solution).
A decision problem belongs to the class NP if for every YES-instance I there is a short certificate C(I) which can be checked in polynomial time for validity.
More formally, there is an integer k and a (certificate-checking) algorithm A such that for every YES-instance I there exists a certificate C(I) of length ⟨C(I)⟩ = O(⟨I⟩^k) (the length is polynomial in ⟨I⟩) such that A with input I and C(I) can verify the YES-answer in at most O(⟨I⟩^k) steps.
An equivalent definition of the class NP works with non-deterministic models of computation. In such models, a program may “guess” (call an oracle) at certain steps, and must verify YES-instances in polynomial time.
“Guessing” corresponds to telling the certificate.
NP stands for “non-deterministic polynomial time”.
It does NOT mean non-polynomial time!
Lemma
P ⊆ NP.
Proof.
For any problem in P, we can choose the certificate to be empty.
Example: HAMILTON CIRCUIT:
Instance: An undirected graph G.
Question: Does G have a Hamilton circuit?
(Hamilton circuit = a simple cycle in G which visits each vertex.)
Lemma
HAMILTON CIRCUIT belongs to NP.
Proof.
For each YES-instance G we take any Hamilton circuit of G as a certificate. Checking whether an edge set is a Hamilton circuit can obviously be done in polynomial time.
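The certificate check can be made explicit. A sketch, assuming vertices 0..n−1, edges as 2-tuples, and the certificate given as the vertex sequence of the claimed circuit:

```python
def verify_hamilton_certificate(n, edges, cycle):
    """Polynomial-time check that `cycle` is a Hamilton circuit of the
    graph on vertices 0..n-1 with the given edge set."""
    # The circuit must visit every vertex exactly once.
    if len(cycle) != n or set(cycle) != set(range(n)):
        return False
    E = {frozenset(e) for e in edges}
    # Every consecutive pair (including the closing edge) must be an edge.
    return all(frozenset((cycle[i], cycle[(i + 1) % n])) in E
               for i in range(n))
```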
Next step: study the “hardest” problems in NP.
P2 is at least as hard as P1 if P1 is a “special case” of P2.
Definition
Let P1 and P2 be two decision problems. We say that P1 polynomially transforms (reduces) to P2 (we write P1 ∝ P2) if there is a function f mapping instances of P1 to instances of P2 such that
for each instance I of P1, we can compute f(I) in polynomial time (with respect to ⟨I⟩), and
I is a YES-instance of P1 if and only if f(I) is a YES-instance of P2.
The Satisfiability Problem (SAT)
Let X = {x1, x2, . . . , xn} be a set of Boolean variables.
A truth assignment for X is a function T : X → {true, false}. The negation of a variable x is denoted by x̄.
The elements of the set L := X ∪ {x̄ | x ∈ X}, i.e. the variables and their negations, are called literals. The truth assignment is extended to L in the obvious way, by setting T(x̄) := true if T(x) = false and vice versa.
A clause over X is a disjunction of literals, i.e. a logical or-combination of literals (denoted by +). It is satisfied by a truth assignment if and only if at least one of its literals is true.
A conjunction of clauses F = C1 · C2 · · · Cm, i.e. a logical and-combination of clauses (denoted by multiplication ·), is satisfiable if and only if there is a truth assignment which simultaneously satisfies all of its clauses Ci.
Such a formula F is a Boolean formula in conjunctive normal form.
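Evaluating a CNF formula under a truth assignment follows the definitions directly. A sketch, representing a literal as a (variable, sign) pair where sign False means the negated variable:

```python
def satisfies(clauses, assignment):
    """A CNF formula is satisfied iff every clause contains at least
    one literal made true by the assignment."""
    return all(any(assignment[var] == sign for var, sign in clause)
               for clause in clauses)
```

For example, F = (x + ȳ)(x̄ + y) is satisfied by T(x) = T(y) = true but not by T(x) = true, T(y) = false.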
Some graph terminology:
A clique is a set of pairwise adjacent vertices.
A node set S is called stable (independent) if no two nodes in S are connected by an edge.
STABLE SET (INDEPENDENT SET):
Instance: A graph G = (V, E) and an integer k.
Question: Is there a stable set of ≥ k vertices?
Theorem (Karp (1972))
STABLE SET is NP–complete.
Proof:
STABLE SET ∈ NP, since a stable set of size k is a certificate which we can verify in polynomial time.
Consider an instance I of SATISFIABILITY with clauses Z1, Z2, . . . , Zm, where Zi = yi1 + yi2 + · · · + yiki with yij ∈ {xij, x̄ij}.
We construct for I an instance f(I) of STABLE SET, i.e. a graph G and an integer k, such that
(1) I is satisfiable if and only if G has a stable set of size k, and
(2) the construction can be done in polynomial time.
Construction: for each clause Zi, we introduce a clique Ci of ki vertices according to the literals of this clause.
Vertices corresponding to different clauses are connected by an edge if and only if the literals contradict each other (i.e., one literal is the negation of the other).
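The construction can be sketched in code. This illustration labels the vertex for the j-th literal of clause i as the pair (i, j) (a representation chosen here, not prescribed by the proof) and returns the graph together with k = number of clauses:

```python
def sat_to_stable_set(clauses):
    """Karp-style reduction sketch: clique per clause, plus edges
    between contradicting literals of different clauses."""
    vertices = [(i, j) for i, cl in enumerate(clauses) for j in range(len(cl))]
    edges = set()
    for i, cl in enumerate(clauses):
        # Clique on the literal-vertices of clause i.
        for j in range(len(cl)):
            for j2 in range(j + 1, len(cl)):
                edges.add(((i, j), (i, j2)))
    # Edges between contradicting literals of different clauses.
    for (i, j) in vertices:
        for (i2, j2) in vertices:
            if i < i2:
                v1, s1 = clauses[i][j]
                v2, s2 = clauses[i2][j2]
                if v1 == v2 and s1 != s2:
                    edges.add(((i, j), (i2, j2)))
    return vertices, edges, len(clauses)
```

The construction clearly runs in time polynomial in the formula size.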
A vertex cover in an undirected graph G = (V, E) is a subset S ⊆ V of vertices such that every edge of G is incident to at least one vertex of S.
[Figure: a vertex cover, a stable set, and a clique]
Lemma
Let G = (V, E) be a graph and X ⊆ V. The following three statements are equivalent:
(1) X is a vertex cover in G.
(2) V \ X is a stable set in G.
(3) V \ X is a clique in the complement of G.
The definition of the class NP is not symmetric with respect to YES- and NO-instances.
For example, it is an open question whether the following problem belongs to NP:
Given a graph G, is it true that G is not Hamiltonian?
Definition
coNP is the class of decision problems for which (as in the definition of NP) a certificate-checking algorithm exists for the NO-instances (which runs in polynomial time).
A decision problem P0 is called coNP–complete if
(1) P0 ∈ coNP, and
(2) P1 ∝ P0 for all P1 ∈ coNP.
coNPC denotes the class of coNP–complete problems.
The complement co(P0) of a decision problem P0 has the same instances as P0, but the question with respect to co(P0) is the negation of the question for P0.
Theorem
A decision problem is NP–complete if and only if its complementis coNP–complete.
Of particular interest is the class NP ∩ coNP.
For problems in this class there are certificates that can be checked in polynomial time for YES- as well as for NO-instances.
Edmonds called such problems “problems with a goodcharacterization”.
Extension to Optimization Problems
Definition
A decision problem P0 polynomially reduces to the optimization problem P1 if there is for P0 a polynomial-time oracle algorithm A, that means,
(1) algorithm A has polynomial running time, and
(2) algorithm A may use polynomially many calls to an oracle which delivers an optimal solution to an instance I ∈ P1, where each oracle call has O(1) cost.
An optimization problem or decision problem P0 is called NP–hard if every problem P1 ∈ NP polynomially reduces to P0.
This means: NP–hard problems are at least as hard as the hardest problems in NP, but some may be harder than every problem in NP.
There are NP–hard problems which are not known to be in NP.
Example: Euclidean Steiner tree problem (Garey, Graham, Johnson 1979)
Traveling Salesman Problem (TSP)
TSP:
Instance: A complete graph Kn on n vertices, n ≥ 3, and distances c(e) ≥ 0 (rational numbers) for all edges.
Task: Find a Hamiltonian cycle C of minimum length ∑_{e∈E(C)} c(e).
Theorem
TSP is NP–hard.
Proof: Reduction of HAMILTON CIRCUIT to TSP
Given an instance of HAMILTON CIRCUIT G = (V, E) with n nodes, we define the following TSP instance:
Kn with c(e) := 1 if e ∈ E and c(e) := 2, otherwise.
G has a Hamiltonian cycle if and only if the shortest tour has length n.
Hence, a single call of an oracle for TSP suffices to solve HAMILTON CIRCUIT.
Since HAMILTON CIRCUIT is NP–complete, the theorem is proved. □
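The construction in this proof is mechanical. A sketch (representing the cost function as a dict keyed by unordered vertex pairs, which is an implementation choice):

```python
def hamilton_to_tsp(n, edges):
    """Build the TSP instance from the reduction: K_n with c(e) = 1
    for edges of G and c(e) = 2 otherwise. G is Hamiltonian iff the
    shortest tour has length exactly n."""
    E = {frozenset(e) for e in edges}
    return {frozenset((i, j)): (1 if frozenset((i, j)) in E else 2)
            for i in range(n) for j in range(i + 1, n)}
```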
A given neighborhood NI(·) for an instance I of an optimization problem induces a neighborhood graph GI = (VI, AI).
The vertex set VI corresponds to the set of feasible points FI .
There is an arc (x , y) ∈ AI if and only if y ∈ NI (x).
Note: In case of problems with potentially infinite solution spaces, these graphs may be of infinite size, and the number of arcs leaving/entering a node may also be infinite.
The neighborhood graph can be considered as an undirected graph if the neighborhood is symmetric, i.e. if y ∈ NI(x) ⇔ x ∈ NI(y) for all x, y ∈ VI.
Examples of Neighborhood Relations
Before we start with examples:
In the general definition from the previous slides, a neighborhood relation is an arbitrary graph on the solution space of an instance.
In combinatorial optimization problems, the number of arcs leaving or entering a given solution is typically tiny compared to the size of the solution space.
Since the solution space is usually huge, this does not mean that the number of arcs leaving/entering a node is small in absolute terms.
Fortunately, we usually do not need to construct (or even store) the whole neighborhood graph explicitly.
Typically, the existence of an arc (s1, s2) in AI is constituted by minor modifications, which transform s1 into s2.
→ In the following examples, we only formulate these minor modifications to specify AI.
In the simple neighborhood definition on the last slide, two arcs were exchanged.
Obvious generalization:
a fixed number k ≥ 2 of arcs is removed;
k appropriate arcs are introduced to re–connect the partial tours;
an appropriate selection of these partial tours is turned in order to make the re–connected subgraph an oriented tour.
This kind of neighborhood is called k–OPT in the literature.
Consequently, the simple neighborhood from the last slide is “2–OPT”.
The size of the k-opt neighborhood is Ω(nk) for the TSP on n points.
Therefore, in practice, usually only 2–OPT (or 3–OPT) is applied.
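For the symmetric case, a single 2–OPT step can be sketched directly on a tour given as a vertex list: remove two edges and reverse the segment between them to re-connect the partial tours. The helper `tour_length` and the list representation are illustration choices:

```python
def two_opt_move(tour, i, j):
    """Remove the edges (tour[i], tour[i+1]) and (tour[j], tour[j+1])
    and reverse the segment in between."""
    assert 0 <= i < j < len(tour)
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    # Total length of the closed tour under the distance matrix dist.
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
```

A local search then scans all O(n²) pairs (i, j), applies the move whenever it shortens the tour, and stops when no improving move exists.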
Example IV: Bipartition (cont’d)
Problem with such a neighborhood: The number of items in the selection cannot change by stepping from one feasible solution to a neighbored one.
→ GI is highly disconnected.
→ If the search happens to start in the “wrong” connected component, it has no chance to reach the good solutions.
Probably a better approach:
Two feasible solutions are neighbored
⇐⇒
one can be constructed from the other one by inserting, removing, or exchanging one item.
In general: It is desirable that the neighborhood graph is strongly connected (i.e. there is a directed path between any two vertices).
Neighborhood for Disjoint Paths?
Discussion:
This is not a good example of neighborhood structures. In fact, it was inserted into this list of examples to serve as a counter–example.
What’s wrong: It seems that there is no appropriate neighborhood structure.
Straightforward ideas for neighborhood structures: two feasible solutions are neighbored ⇐⇒
a subpath of one path is changed, or
subpaths of a few, mutually involved paths are exchanged, or
some edges change the path to which they belong.
Why not too promising:
It is very likely that no strict improvement step is possible: you may need many rearrangements of paths until you can increase the number of paths by one.
→ Chances are high that the local search is over after very few (maybe zero) steps.
Example VI: Partial Consideration of Constraints
Sometimes the number of side constraints is way too large to consider all constraints simultaneously.
In some modeling approaches, the number of side constraints may even be infinite.
→ Cf. Slides nos. 35 ff.
In cases like these, one can try to approximate the optimal solutions by (typically infeasible) “solutions”:
Every finite subset of the set of all side constraints constitutes a certain (potentially infeasible) “solution”.
Try to find one of these “solutions” S such that S is acceptably close to at least one of the actual optimal solutions.
Neighborhoods for Selections of Side Constraints
On finite selections of side constraints, one can easily define various neighborhood relations.
Simple example:
Two sets of side constraints are neighbored if one set is constructed from the other one by inserting, removing, or exchanging exactly one side constraint.
Example polynomial approximation: “exchanging” means that one xi is moved to another position inside [a . . . b].
A path (or cycle) p in G is called elementary (or simple) if each of its vertices appears only once.
An elementary path (or cycle) p in G is called alternating (with respect to M) if exactly every second edge of p belongs to M.
→ In other words, the edges of M appear on p in an alternating fashion.
More specifically: Let e1, e2, . . . , ek be the edges on p in the order in which they appear on p.
Either we have ei ∈ M for all odd i ∈ {1, . . . , k} and ei ∉ M for all even i ∈ {1, . . . , k},
or we have ei ∈ M for all even i ∈ {1, . . . , k} and ei ∉ M for all odd i ∈ {1, . . . , k}.
Neighborhood Relation for Matching
Lemma
This neighborhood relation for matchings is exact.
Proof:
For convenience, we will identify a path with its set of edges.
Let M1 and M2 be two matchings such that |M2| > |M1|.
We have to show: in this case there is an alternating path p for M1 such that |M1 △ p| > |M1|.
Clearly, at most one edge of each of M1 and M2 is incident to a given node.
→ Every node has degree at most two in the symmetric difference M1 △ M2.
→ M1 △ M2 decomposes into elementary paths and cycles, which are all alternating for both M1 and M2.
Since |M2| > |M1|, at least one of these paths, p say, must contain more edges from M2 than from M1.
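The key object in this proof, the symmetric difference of two matchings, is easy to compute and to sanity-check. A sketch (edges as 2-tuples; `max_degree` is a helper introduced here to verify the degree-at-most-two claim):

```python
def symmetric_difference(M1, M2):
    # Edges in exactly one of the two matchings (M1 triangle M2).
    A = {frozenset(e) for e in M1}
    B = {frozenset(e) for e in M2}
    return A ^ B

def max_degree(edge_set):
    # Maximum node degree in the given edge set.
    deg = {}
    for e in edge_set:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return max(deg.values(), default=0)
```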
The matching problem is a maximization problem, and the objective value obj(M) of a matching M is the number of edges in M.
Recall the loop in the local search scheme on Slide no. 95. Let M∗ = s∗ be the current solution, and M = s be a neighbored solution.
This means that M∗ △ M is an alternating path which has odd length and whose endnodes are both exposed (w.r.t. M∗!).
The latter paths are called M∗–augmenting paths.
Exactness of the neighborhood relation for matchings is therefore a reformulation of a famous theorem of Berge:
Theorem (Berge (1957))
Let G be a graph with some matching M. Then M is maximum ifand only if there is no M-augmenting path.
A start solution for the local search scheme on Slide 95 is easy to find: the empty matching.
At first glance, it also seems easy to find an augmenting path:
Find an unmatched node v ∈ V, that is, a node that is not incident to any edge of the current matching s∗.
Determine all nodes w ∈ V such that there is an alternating (v, w)–path.
If at least one of these nodes w is unmatched, the symmetric difference of the current matching and this (v, w)–path is a matching with one more edge.
Clearly, we cannot enumerate all possible alternating paths that start with v, because their number may be exponentially large (→ left as an exercise).
Each of the common efficient search strategies (depth–first, breadth–first, ...) determines a tree T, which clearly contains at most one (v, w)–path for each node w.
However, it can be seen (formal details omitted; see the picture below for an intuition) that, possibly, alternating paths to some nodes w (and thus the nodes themselves) may be missed.
Since we cannot determine a better neighbored solution in a reasonable way, we cannot apply the “pure” local–search scheme.
However, we can modify the loop on Slide no. 95:
We search for an alternating path by growing a set of alternating trees (an alternating forest).
If we detect a blossom, we shrink this blossom into a pseudonode and continue with the resulting graph and matching.
If we find an alternating path p connecting two unmatched nodes, we replace the current matching M by M △ p, expand pseudonodes on this path (recursively), and continue as in the regular local–search scheme.
Finally, if neither case applies, we have found a maximum matching in the shrunken graph (proof omitted).
At the very end, all shrinking operations are undone and the matching is extended to all blossom edges.
Example VIII: Minimum Cost Flows
Input:
a directed graph D = (V, A);
lower and upper capacity values 0 ≤ ℓ[a] ≤ u[a] ∈ R and a cost factor c[a] ∈ R for each arc a ∈ A;
a balance value b[v] ∈ R for each node v ∈ V.
Desired output: a flow value f [a] ∈ R for each arc a ∈ A such that
ℓ[a] ≤ f[a] ≤ u[a] for each arc a ∈ A (capacity constraints), and
for each node v ∈ V:
∑_{a out of v} f[a] − ∑_{a into v} f[a] = b[v] (balance constraints).
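Verifying a candidate flow against these constraints is straightforward. A sketch, using the convention (an assumption here) that outgoing minus incoming flow equals b[v]:

```python
def check_flow(arcs, lower, upper, balance, f):
    """arcs: list of (u, v) pairs; lower/upper/f: dict arc -> value;
    balance: dict node -> b[v]."""
    # Capacity constraints on every arc.
    if any(not (lower[a] <= f[a] <= upper[a]) for a in arcs):
        return False
    # Balance constraints: outgoing minus incoming flow equals b[v].
    net = {v: 0 for v in balance}
    for (u, v) in arcs:
        net[u] += f[(u, v)]
        net[v] -= f[(u, v)]
    return all(net[v] == balance[v] for v in balance)
```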
Neighborhood Relation for Min-Cost-Flows
Lemma
This neighborhood relation for flows is exact.
Proof: On the next few slides (until Slide no. 132).
Remark:
The application of the algorithmic scheme from Slide no. 95 to the min–cost flow problem with this neighborhood relation is called the negative–cycle canceling algorithm in the literature.
Exactness of the Min-Cost-Flow Neighborhood
Proof (of the exactness):
Suppose that f1 is not optimal.
→ To prove the claim, it then suffices to show that there is some negative cycle p that is augmenting w.r.t. f1, the lower bounds ℓ, and the upper bounds u.
Suppose that f2 is optimal.
→ The cost of f2 is strictly smaller than the cost of f1.
→ Among the cycles p1, . . . , pk that are guaranteed by the flow decomposition lemma, at least one must be negative.
Let pi denote this cycle and εi its multiplicity in the flow decomposition.
Since pi is augmenting, f1 + εi · pi is obviously feasible.
Concluding Remarks on Exact Neighborhood Relations
If a neighborhood relation is not exact, there is still some (heuristic!) hope that the solution from the algorithmic scheme on Slide no. 95 is not “too bad”.
However, the search will always be “trapped” in a local optimum “near” the start solution.
Such a local optimum may be very bad compared to the overall global optimum.
In the following, we will discuss a couple of heuristic techniques to let the search “escape” from local optima.
In principle, this means a “biased coin–flipping” experiment where the head and the tail of the coin may occur with different probabilities. On a computer, this amounts to applying a random number generator.
Random number generator: a deterministic number generator that simulates a non–deterministic choice of numbers.
In the simulated–annealing algorithm, the probability of “yes” is determined by the so–called temperature T > 0: for obj(s) ≥ obj(s∗), the probability of “yes” is
exp( (obj(s∗) − obj(s)) / T ).
Observation: since obj(s) ≥ obj(s∗) and T > 0, this is indeed a probabilistic decision, that is, the probability is a value in (0, 1].
It remains to specify:
how to define T;
how to define the termination condition.
→ Both come from the cooling schedule as defined below.
Cooling schedule:
A finite sequence T1 > T2 > T3 > · · · > Tk > 0 of temperature values is defined.
For i ∈ {1, . . . , k}, a positive integral sequence length ni is defined in addition.
Application of the cooling schedule:
For i = 1, 2, . . . , k, exactly ni iterations of the while–loop on Slide no. 138 are run with Ti being the temperature value.
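Putting the acceptance rule and the cooling schedule together gives the following minimization sketch. The function signature and the `schedule` representation as (Ti, ni) pairs are illustration choices, not a prescribed interface:

```python
import math
import random

def simulated_annealing(start, neighbor, obj, schedule, rng=random.random):
    """schedule: list of (T_i, n_i) pairs; run n_i iterations at
    temperature T_i. An uphill step to s is accepted with
    probability exp((obj(s*) - obj(s)) / T)."""
    s_star = start
    for T, n_iter in schedule:
        for _ in range(n_iter):
            s = neighbor(s_star)
            # Improvements are always accepted; deteriorations only
            # with the temperature-dependent probability.
            if (obj(s) <= obj(s_star)
                    or rng() < math.exp((obj(s_star) - obj(s)) / T)):
                s_star = s
    return s_star
```

With a very low temperature the scheme degenerates to pure local search, since the acceptance probability for uphill steps becomes negligible.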
Discussion of Simulated Annealing
Simulated Annealing is quite a popular method.
Why:
A first prototype is easy to implement.
Only little mathematical background is required from the programmer.
It has the potential to provide feasible solutions of high quality.
The name is “cool”!?
Problems:
Often, reasonable solutions can only be achieved at the cost of an enormous computational effort.
No quality guarantees at all.
Typically, a lot of experimental work is needed to adjust the parameters for a particular problem.
Background of Simulated Annealing
Annealing: In chemistry and chemical engineering the process of coolingheated material.
The annealing should not produce cracks and fissures in the material.
In physical terms:
Cracks and fissures mean that the remaining potential energyinside the material is high.The material always assumes a local minimum of potentialenergy when cooled down.
The warmer the material,
the higher the chances that cracks and fissures occur, but alsothe higher the chances that those cracks and fissures areclosed again.
−→ That the material escapes from a bad local minimum.
The formula used for the probabilistic decision has originally beeninvented to describe physical processes like cooling.
Neighborhood-Based Approaches 3.5 Feature-based local search
Feature-based Local Search
What is feature–based local search?
In typical optimization problems, the feasible solutions to an instance are formed by features.
More specifically:
For an instance, there is a finite ground set of features.
The feasible solutions to this instance are certain subsets of this ground set.
−→ Selections of features.
Concrete examples of features: −→ On the next few slides.
Remark:
Features ≡ dimensions: if the feasible solutions are elements of some space {0, 1}^n, the n dimensions may be interpreted as features.
Here we follow the terminology from the literature on local–search algorithms and speak of “features” rather than dimensions.
Concrete Examples of Features
TSP:
Recall the TSP from Slide no. 14.
There the feasible solutions to a TSP instance on n points were encoded as certain (n × n)–matrices X with 0/1–entries.
Semantics: X[i, j] = 1 means that point no. j immediately follows point no. i cyclically on the round tour encoded by X.
Then the features are the pairs (i, j) for i, j ∈ {1, . . . , n}.
Matching:
Recall the matching problem from Slide no. 44.
Here the edges of the input graph are the features.
Concrete Examples of Features (cont’d)
Coloring:
Input: an undirected graph G = (V, E).
Output: an assignment C : V −→ N of a positive integral number (“color”) to each node.
Feasible: if C[v] ≠ C[w] for all edges {v, w} ∈ E.
Objective: minimizing max{C[v] | v ∈ V}.
Features: pairs (v, n) such that v ∈ V and n ∈ N.
Remark:
In principle, the set of features is infinite in this example.
However, obviously, max{C[v] | v ∈ V} ≤ |V| for any optimal solution.
Consequently, the assumption that {1, . . . , |V|} is the (finite) ground set of colors does not reduce generality.
Useful Terminology
Consider an instance I of an optimization problem.
Again, let FI denote the set of all features of the instance I.
For a feature x ∈ FI, let C[x] denote the feature cost.
A feasible solution S to I can be identified with the set F(S) ⊆ FI of the features that make up S.
In optimization problems in which the cost of a solution is the sum of the costs of the selected features, the cost of solution S may then be rewritten as obj(S) = Σ_{x ∈ F(S)} C[x].
Guided Local Search
In principle, this is the general local–search scheme from Slide no. 95.
Crucial difference: something different happens whenever the search runs into a local optimum (not just termination).
Handling a local optimum:
The algorithm examines all features that make up the local optimum.
For each of these features a “utility of penalization” is determined.
One or more features with the highest “utility of penalization” are penalized.
The penalty is so large that the current solution is not a local optimum anymore.
Then the local search continues as usual.
What does “penalized” and “utility of penalization” mean? −→ On the very next slide.
Penalization
“Penalized”:
The feature cost of a feature is increased by some value (the penalty).
“Utility of penalization”:
The “utility of penalization” of a feature is an estimation of how promising it would be to penalize this feature.
Ideas for such an estimation:
If the original cost value of a feature is high, it might be promising to drive it out of the solution by penalizing it.
On the other hand, if a feature has often been penalized and is again in the current solution, it might not be too promising to penalize it yet another time.
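Both ideas can be folded into one utility value. A minimal sketch, assuming the classic formula util(x) = cost(x)/(1 + penalties(x)) from the guided-local-search literature; all names are illustrative:

```python
def penalization_step(solution_features, cost, penalties):
    """One penalization step: compute the utility of each feature of
    the local optimum and penalize the feature(s) of highest utility
    by incrementing their penalty counter."""
    util = {x: cost[x] / (1 + penalties[x]) for x in solution_features}
    top = max(util.values())
    winners = [x for x in util if util[x] == top]
    for x in winners:
        penalties[x] += 1
    return winners

def augmented_obj(solution_features, cost, penalties, lam):
    """Objective used by the inner local search: original feature
    costs plus lam times the accumulated penalties."""
    return sum(cost[x] + lam * penalties[x] for x in solution_features)
```

Note how a feature with a high original cost gets a high utility, while repeated penalization lowers it, matching both ideas above.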
Taboo Search
This is another variant of the general local–search scheme from Slide no. 95.
Fundamental difference:
Taboo search always moves on to the neighbor of minimal cost.
Unlike local search, it does so even in case the current solution is a local optimum.
−→ In such a case, the move step causes a deterioration.
In order to terminate the algorithm, an additional, external stopping criterion must be incorporated (e.g. termination after a certain number of steps).
Problem: After escaping from a local minimum by a neighborhood step, the algorithm is very likely to return to this local minimum very soon.
Potential consequence whenever this problem occurs: an infinite loop on a few feasible solutions.
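The scheme plus a short-term taboo memory can be sketched as follows. For simplicity, whole solutions are tabooed here; in practice one rather taboos move attributes such as removed arcs, as in the examples below. All function names are illustrative:

```python
from collections import deque

def taboo_search(start, neighbors, obj, max_steps=100, tenure=5):
    """Always move to the cheapest non-taboo neighbor, even if it is
    worse than the current solution; remember the last `tenure`
    visited solutions to avoid falling straight back."""
    current = best = start
    taboo = deque(maxlen=tenure)       # short-term memory
    for _ in range(max_steps):         # external stopping criterion
        cands = [n for n in neighbors(current) if n not in taboo]
        if not cands:
            break
        current = min(cands, key=obj)  # may be a deterioration
        taboo.append(current)
        if obj(current) < obj(best):
            best = current
    return best
```

The taboo memory is what lets the search pass through a local optimum instead of oscillating around it.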
Examples of Kernighan-Lin Approaches
TSP:
Consider the neighborhood structure visualized on Slide no. 99.
Whenever an arc is removed from the round tour, its re–insertion becomes a taboo.
Max–cut:
From Slide no. 152 recall the max–cut problem.
Various neighborhood structures could be defined based on inserting nodes into W and removing nodes from W.
In any such case, we could taboo the re–insertion of a removed node and the removal of an inserted node.
Kernighan-Lin: what’s in a name?
Actually,
these two guys never described the approach in full abstract generality,
but only presented concrete instances of this technique for two concrete problems: TSP and max–cut.
These two instances are commonly called the Kernighan-Lin algorithm and the Lin-Kernighan algorithm in the literature.
The term, “approaches of the Kernighan–Lin type,” is not common in the literature.
It is chosen in this lecture in honor of these two pioneers of heuristic algorithms.
Neighborhood-Based Approaches 3.7 Iterated Local Search
Iterated/Chained Local Search
Explanation:
Perturbation(s∗) modifies the current solution and delivers a feasible intermediate s.
LocalSearch(s) can be any algorithm which gets a feasible solution as its input and delivers a feasible solution as its output.
AcceptanceCriterion(s∗, s′) decides whether we accept s′ as our new solution or stay at the previous solution s∗.
Remarks:
The Perturbation should neither be too small nor too large:
If it is too small, one will often fall back to the previous local optimum.
If it is too large, then the intermediate solution will be almost a random solution. In this case, the algorithm will behave similarly to a random-restart type algorithm.
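A minimal sketch of this scheme; the three components are passed in as functions, and a simple “accept if not worse” criterion is assumed:

```python
import random

def iterated_local_search(start, local_search, perturb, obj,
                          rounds=20, seed=0):
    """Iterated/chained local search: perturb the incumbent s*,
    re-optimize with the inner local search, then accept or reject."""
    rng = random.Random(seed)
    s_star = local_search(start)
    for _ in range(rounds):
        s = perturb(s_star, rng)         # feasible intermediate solution
        s_prime = local_search(s)
        if obj(s_prime) <= obj(s_star):  # acceptance criterion
            s_star = s_prime
    return s_star
```

With this acceptance criterion the result is never worse than the first local optimum found.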
All variants of local search considered so far only apply one run of the search.
Problem:
Chances are high that the search will never leave a small subspace of the solution space.
The really good feasible solutions may be somewhere else in the search space.
Simplest imaginable idea:
Generate a set of feasible solutions (e.g. randomly).
Start a local search (or simulated annealing, taboo search, whatever) from each of them.
Deliver the best feasible solution seen by any of these searches.
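As a sketch (the generator and search routines are illustrative assumptions):

```python
import random

def multi_start(generate, search, obj, restarts=10, seed=0):
    """Run one independent search from each randomly generated start
    solution and deliver the best result seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        s = search(generate(rng))
        if best is None or obj(s) < obj(best):
            best = s
    return best
```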
Like “survival of the fittest” (but unlike biological evolution), the process is organized in rounds.
One round:
A certain number of members of the population are selected randomly, with a probability that is monotonically increasing in their cost values.
These members of the population are dropped.
Another number of members of the population are selected randomly, with a probability that is monotonically decreasing in their cost values.
These members of the population produce offspring.
Each member of the new population
is mutated randomly like in “survival of the fittest,”
however, not at all odds,
but only with a certain (typically very small) probability.
−→ How well does an individual that carries these genes perform?
In other words: letting the individual struggle for life is much like
evaluating an objective function for its abstract representation (genes) and
deciding upon its “survival” through a random decision with a probability that is monotonically increasing in the value of the objective function (cf. Slide no. 179).
The space of biologically possible individuals is certainly much larger than the number of possible genes.
−→ Each abstract representation corresponds to a feasible solution (but not necessarily vice versa).
Conceptually, genetic algorithms are very similar to evolution strategies:
A search proceeding in rounds.
A population (of genes!) is maintained.
In each round, some members of the population are killed with a probability that is monotonically decreasing in the fitness of the individual.
−→ Cf. Slide no. 178.
Main difference:
In evolution strategies, a new generation (“child generation”) is generated from selected members of the previous generation (“parent generation”) by means of (asexual) mutation.
In genetic algorithms, each member of the child generation is generated from two members of the parent generation by means of (sexual) recombination.
Example III of genes: graph coloring
From Slide no. 153 recall the definition of the graph coloring problem.
Also recall that, without loss of generality, the number of colors may be restricted to the number n of nodes of the graph.
Therefore, a feasible (or infeasible) solution may be encoded as a string of length n over the alphabet {1, . . . , n}.
Alternatively, a feasible (or infeasible) solution may be encoded in a binary fashion through an (n × n)–matrix X:
X[i, j] = 1 ⇐⇒ node no. i is assigned color no. j.
−→ Much like in the example of the TSP from the last slide.
Since the selection of the parents is based on their degrees of fitness, the offspring of a pair of parents should preferably resemble their parents.
−→ If the parents are fit, chances are high that the offspring are fit, too.
For example, in feature–based problems:
If a feature is selected in both parents, it should also be selected in the offspring.
If a feature is selected in neither parent, it should not be selected in the offspring, either.
Moreover, an offspring should not essentially inherit from one parent alone but from both parents.
−→ Should not be very similar to one parent and, simultaneously, very different from the other parent.
The result may be the abstract representative of an infeasible solution.
One possible strategy to overcome this problem:
Make all solutions feasible by dropping all side constraints.
As a surrogate, penalize the (degree of) deviation from feasibility.
Alternative ideas:
Re–define the genetic representation of solutions such that reasonable crossover strategies will produce (almost) feasible offspring from feasible parents.
−→ At least some constraints are satisfied.
Apply the crossover and repair the result afterwards.
Input: a finite set of jobs, J1, . . . , Jn,
a duration di for each job Ji, and
a selection S of pairs (Ji, Jj) such that the graph with node set {J1, . . . , Jn} and these selected pairs as directed arcs is acyclic.
Furthermore, some kind of resource constraints.
Output: an assignment of each job Ji to a start time ti ≥ 0.
Constraints: ti + di ≤ tj for each (Ji, Jj) ∈ S, and the resource constraints.
A large process or project is broken down into indivisible tasks Ji.
Certain tasks must be delayed until certain other tasks are finished (precedence constraints).
Resource constraints limit how many jobs can be scheduled simultaneously. For example, only two machines (processors) are available.
The total duration of the project or process is to be minimized.
Resource-constrained scheduling problems are typically very hard optimization problems (both theoretically and practically).
Example of “Re-define ... feasible offspring from feasible parents”
Idea: We relax (i.e., ignore) the resource constraints and penalize their violation in the objective function.
Goal: Feasibility with respect to precedence constraints is maintained.
Representation: For each job Ji, the genetic representation contains a real number xi.
Semantics: xi = min{ti − (tj + dj) | (Jj, Ji) ∈ S}.
xi represents the slack of job Ji, i.e., the amount of time Ji could be scheduled earlier without violating the precedence constraints.
Solutions which fulfill the precedence constraints correspond in a one-to-one fashion to the nonnegative genetic representations.
Crossover Strategies for this Representation
So assume that genes are represented by slack values xi.
For this space of genetic representations, any crossover strategy is appropriate that transforms nonnegative representations into nonnegative ones.
Examples: the offspring are constructed by
exchanging elements of the parents,
taking the element-wise minimum/maximum of each element of the parents,
taking (weighted) average values of the parents’ elements.
Note: Offspring constructed in one of these ways may violate resource constraints.
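All three example strategies preserve nonnegativity, as this sketch shows (illustrative names):

```python
def crossover_exchange(p1, p2, cut):
    """Exchange elements: take p1 up to position `cut`, p2 afterwards."""
    return p1[:cut] + p2[cut:]

def crossover_min(p1, p2):
    """Element-wise minimum of the two slack vectors."""
    return [min(a, b) for a, b in zip(p1, p2)]

def crossover_avg(p1, p2, w=0.5):
    """Weighted average; nonnegative parents yield a nonnegative offspring."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]
```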
As this example shows, it may be necessary to repair a position more than once.
Clearly, the repair loop only terminates when O1 becomes feasible.
Thus, it suffices to prove that the repair loop indeed terminates.
For an easier exposition, we will consider an auxiliary directed graph G = (V, A) with V = {1, . . . , n} and, for all i, j ∈ V: (i, j) ∈ A if, and only if, there is h ∈ {ℓ, . . . , r} such that P2[h] = i and P1[h] = j.
In the example from the last slide:
Each iteration of the repair loop replaces i by j for some (i, j) ∈ A.
Repairing a position more than once means proceeding along some path of G.
Clearly, this path starts with one of the nodes P1[1], . . . , P1[ℓ − 1], P1[r + 1], . . . , P1[n].
Therefore, the repair loop terminates unless it proceeds along a cycle of G.
However, for each node i ∈ V on a cycle of G, it is i ∈ {P1[ℓ], . . . , P1[r]} and i ∈ {P2[ℓ], . . . , P2[r]}.
Obviously, each node is entered by at most one arc.
In summary, a node i on a cycle of G cannot be reached from any of the nodes P1[1], . . . , P1[ℓ − 1], P1[r + 1], . . . , P1[n].
Like in two–point crossover, positions ℓ, r ∈ {1, . . . , n}, ℓ < r, are chosen according to some selection rule.
Then O1[i] := P1[i] for all i ∈ {ℓ, . . . , r}.
For all i ∈ {1, . . . , ℓ − 1} ∪ {r + 1, . . . , n}, the values O1[i] are defined such that the result O1 is indeed a permutation of {1, . . . , n}.
Contribution of P2:
The values O1[1], . . . , O1[ℓ − 1], O1[r + 1], . . . , O1[n] are defined such that their relative order is identical to their relative order in P2.
More specifically:
Let i1, i2 ∈ {1, . . . , ℓ − 1} ∪ {r + 1, . . . , n}.
Let j1, j2 ∈ {1, . . . , n} such that O1[i1] = P2[j1] and O1[i2] = P2[j2].
Identical relative order means: if i1 < i2, then j1 < j2.
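This construction can be sketched directly (0-based indices here, so the segment kept from P1 is p1[l..r] inclusive, whereas the slides use 1-based ℓ, r):

```python
def order_crossover(p1, p2, l, r):
    """Keep p1[l..r]; fill the remaining positions with the missing
    values in the relative order in which they appear in p2.
    The result is again a permutation."""
    middle = set(p1[l:r + 1])
    rest = [x for x in p2 if x not in middle]   # relative order of p2
    o1 = list(p1)
    k = 0
    for i in list(range(l)) + list(range(r + 1, len(p1))):
        o1[i] = rest[k]
        k += 1
    return o1
```

Because the fill values are taken from p2 in order, no repair loop is needed for this variant.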
Each round of a genetic algorithm comprises the following steps:
A certain number of pairs of parents is selected for reproduction.
The pairs are combined to form offspring.
With a small probability, such an offspring is then mutated like in evolution strategies.
Rationale of the additional mutation step:
Simulates the biological procedure more precisely.
Often seems to have a positive effect on the outcome of the genetic algorithm.
Neighborhood-Based Approaches 3.9 Local Search with Complex Side Constraints
Complex Side Constraints
All variants of the general local–search scheme depend on an appropriate neighborhood structure in some way or other.
Examples:
The fundamental local–search algorithm requires an easily enumerable neighborhood structure.
Simulated annealing requires a neighborhood structure in which random selection is easy.
Unfortunately, the side constraints often make the natural neighborhood definitions inappropriate.
−→ Example on the next slides.
Additional Complex Constraints in the TSP
The neighborhood sketched on Slide no. 99 is appropriate: any pair of arcs of the current tour induces a neighbored feasible solution.
However, the TSP does not seem to occur very often in its purist form in reality.
In fact, real–world variants of the TSP typically come with additional side constraints.
Typical example:
A list of pairs (i, j) is given as an additional input.
For each such pair (i, j), object no. i must be visited before object no. j (precedence constraints).
Consequence:
Removing two arbitrary arcs from a tour and re–connecting the tour like on Slide no. 99 may result in an infeasible solution.
Unless the list of pairs is very short, the probability of infeasibility is very high.
Problem Summary
The neighborhood relation may become very sparse (maybe even disconnected).
Consequences:
Due to the loose neighborhood connections, the search is likely to stay in a small subset of the search space (around the start solution).
Chances are high that the search quickly traps into a local optimum and cannot leave it anymore.
Additional problem for algorithms such as simulated annealing:
Such an algorithm may require many trials of neighbored, but infeasible, “solutions” until a feasible neighbored solution is found.
Typical approach: Relaxations
Some of the side constraints are relaxed, that is:
The selected side constraints are dropped.
Additional penalty terms in the cost function penalize violations of the dropped side constraints.
If the side constraints to be dropped are well selected, the natural neighborhood relations will again be appropriate.
Natural approach to the example (TSP with precedence constraints):
The precedence constraints are dropped.
The number of input pairs (i, j) such that no. j is visited before no. i is added to the cost function (possibly multiplied by some weighting factor).
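This relaxed objective can be sketched as follows; the tour is assumed to start at a fixed first city, so “visited before” refers to positions in the list, and all names are illustrative:

```python
def penalized_cost(tour, dist, prec, weight):
    """Relaxed TSP objective: tour length plus `weight` times the
    number of violated precedence pairs (i must precede j)."""
    pos = {city: k for k, city in enumerate(tour)}
    n = len(tour)
    length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
    violations = sum(1 for (i, j) in prec if pos[i] > pos[j])
    return length + weight * violations
```

Any 2-opt-style neighborhood now stays applicable, because an infeasible neighbor simply receives a worse objective value instead of being forbidden.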
Variable Weight Penalties
The rough penalty term is to be multiplied by a (nonnegative) penalty factor, which expresses the relative priority of the penalty compared to the original objective.
Clearly, this factor need not be constant throughout an application of any kind of local–search technique (simulated annealing, evolution strategies, ...).
Natural scheme:
In the beginning, the factor is very small or even zero.
−→ The side constraints are dropped, (more or less) without a substitute.
Throughout the procedure, the factor is increased from time to time.
At the end, the factor is very large.
−→ The problem becomes (approximately) a pure feasibility problem.
Heuristic Idea Behind Variable Weights
Chances are high that the objective function is rather “smooth” on the set of all feasible and infeasible solutions.
−→ It need not be equally smooth on the feasible solutions alone!
In this case, the good feasible solutions are probably found in the areas where the objective function is generally good on feasible and infeasible solutions.
Therefore, it might be a good idea
first to approach these areas quickly (disregarding feasibility) and
to enforce feasibility only later on.
It might even be a better idea to increase the force towards feasibility gradually throughout the search.
−→ Exactly the scheme formulated on the last slide.
General Problem with Relaxation
The feasible solutions to the original problem are among the very (very!) good solutions with respect to the new objective function, that is,
the original objective function
plus some penalty terms.
Experience seems to suggest that the various local–search algorithms presented here
indeed converge to the very good solutions,
however, often at a miserable convergence rate.
Heuristic consequence:
Chances are high that such a search procedure requires a lot of time until the very first feasible solution is seen.
However, if the algorithm indeed finds a feasible solution, it is probably quite good.
Decomposition of the Solution Space
In this section, we consider approaches for finding an optimal (or at least feasible) solution to an algorithmic problem which are based on partitions of the search space into smaller subsets.
The algorithmic steps within the search for an optimal solution induce a tree structure, a search tree.
At each internal node of the tree we “make a decision” on how to decompose the search space further.
The partition into subsets is usually induced by adding some constraint.
Examples: Let S be the set of solutions associated with node P.
Example 1: We branch on a binary variable xi ∈ {0, 1}: P has exactly two children Q0 and Q1. The solution set associated with Q0 is {s ∈ S | xi = 0}, the solution set associated with Q1 is {s ∈ S | xi = 1}.
Example 2: We branch on a continuous variable xi ∈ R: We may partition S into several sets, for instance into S1 = {s ∈ S | xi < 0}, S2 = {s ∈ S | xi = 0}, S3 = {s ∈ S | xi > 0}. Then node P has three children Q1, Q2 and Q3 with associated sets S1, S2 and S3.
From Slides 149 ff. recall that, in many algorithmic problems,
the feasible solutions may be identified with the subsets of a ground set of features,
which may often be regarded as the dimensions of the underlying ground set.
−→ Feature-based problem definitions and algorithms.
Each feature in the ground set naturally induces one decision: whether or not it shall be a member of the feasible solution.
Thus, every solution may be determined by a sequence of “yes” and “no” answers (one for each feature).
Decision Trees and Partition of the Solution Space
The options in a decision partition the solution space into disjoint subsets:
those solutions in which this feature is selected and
those solutions in which this feature is rejected.
The subset of solutions associated with a node of a decision tree is exactly the set of solutions which are compatible with all decisions along the unique path from the root to this node.
In other words: a node corresponds to the set of all solutions that
contain the features selected so far and
do not contain the features rejected so far.
A leaf then corresponds to a singleton, and represents exactly one element of the solution space.
For each examined node, we try to construct a certificate that there is no feasible (or optimal) solution among the leaves of the node’s subtree.
If we succeed, the subtree is excluded from the exploration as a whole.
The latter is called pruning.
Exploration “heuristically restricted”:
Deliberately skip parts of the tree (subtrees).
−→ The search for a feasible (not to mention optimal) solution may fail even if the solution space is non–empty.
STEP 1: Create a root node r representing the original problem.
Mark this node as unexplored (“not visited”).
STEP 2: WHILE there is an unexplored node DO
2a) select one unexplored node for examination, say node n, and mark this node as explored;
2b)
either solve the problem associated with n (determine infeasibility or find an optimal solution);
or decide to prune the whole subtree rooted at n;
or branch, that is, decompose the problem into subproblems and add corresponding unexplored nodes to the tree (determine the order of the subproblems in the tree);
How to select among the unexplored nodes?
−→ tree traversal strategies (to be discussed on the next slides)
Step 2b) offers many variants, for example:
Variant 1: First try to solve the problem associated with node n. If this attempt fails, then branch into subproblems.
Variant 2: Branch immediately into subproblems, unless the current node corresponds to a singleton solution.
Variant 3: Compute some additional information which helps you to decide whether you can prune the subtree. If you decide against pruning, then branch into subproblems.
How to branch? (to be discussed later)
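The two steps can be sketched as a generic skeleton. The `examine` callback stands for step 2b) and is an assumption: it returns an optional result plus a (possibly empty) list of child nodes:

```python
def explore(root, examine):
    """STEP 1 / STEP 2 skeleton: keep the set of unexplored nodes;
    repeatedly pick one (here LIFO, i.e. depth-first), let `examine`
    solve it, prune it (no result, no children), or branch."""
    unexplored = [root]
    results = []
    while unexplored:                      # STEP 2
        node = unexplored.pop()            # 2a) selection rule
        result, children = examine(node)   # 2b)
        if result is not None:
            results.append(result)
        unexplored.extend(children)        # new unexplored nodes
    return results
```

Swapping the `pop()` for a FIFO or priority-based selection yields the other traversal strategies discussed on the following slides.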
In the general algorithmic scheme, the search tree is built up dynamically, i.e., it grows step by step.
Let us now change the viewpoint and look at the whole tree which we get after completing all computations.
In which order did we visit the tree nodes?
Obviously, a tree node (except for the root) can only be visited when its immediate predecessor has been visited before.
−→ For each arc (v, w) of the decision tree, w enters the state “explored” (= “visited”) only after v.
Thus, a “reasonable” tree traversal order can be viewed as propagating a “frontier line” through the (final) tree:
In each step, one arc (v, w) with v already visited and w not yet visited is chosen.
The node w is visited.
The individual tree traversal strategies differ in the selection rule for the arc (v, w).
A node v is inserted in the stack when the search descends from v’s parent to v.
The next node to be visited is one of the children of the top element of the stack.
A node v is removed from the stack once all immediate children of v have been visited and the search ascends back from v to v’s parent.
So, at any time, the nodes in the stack form the (unique) path from the root to the current top element.
−→ If the next arc from v is always the leftmost one not yet processed, the nodes in the stack form a “frontier line” that passes from left to right through the tree.
Let (v1, w1), . . . , (vk, wk) be the arcs of the current frontier line at some stage of the tree traversal (vi = vj possible for i ≠ j; all nodes vi have been examined, all nodes wi are still unexplored).
For a node v, let h(v) be the height level of v in the decision tree.
−→ h(w) = h(v) + 1 for every arc (v, w) of the decision tree.
Choose an arc (vi, wi) such that h(vi) is minimal.
A node v is appended to the queue (added to the back) when the search descends from v’s parent to v.
The next node to be visited is one of the children of the first element of the queue.
A node v is removed from the queue when all immediate children of v have been visited.
−→ At any time, the nodes in the queue form a “frontier line” in the decision tree, which passes “horizontally” through the tree (the first few elements one height level deeper than the other elements).
A priority queue is a data structure which allows us to perform the following operations on a collection H of objects, each with an associated real number, called its key:
create-pq(H): create an empty priority queue H.
insert(H, x): insert the element x into H.
find-min(H): find and return an object x of minimum key in H.
delete-min(H): delete an object x of minimum key from H.
decrease-key(H, x, y): decrease the key of an object x in H to the new value y.
The priority queue may be implemented as a heap, for example.
It behaves much like a FIFO queue.
Difference: the top element is not the one that was inserted first but the “best” one according to some criterion that can be expressed as a numerical value for each element.
Here the numerical value of a node v is just g(v).
−→ At any time, the nodes in the queue form a “frontier line” in the decision tree, which does not follow any particular pattern (as opposed to breadth-first search).
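A best-first traversal with Python's heap-based priority queue (`heapq`) might look like this; the tree and the key function g are illustrative assumptions:

```python
import heapq

def best_first(root, children, g):
    """Visit nodes in order of their key g(v): the frontier is kept in
    a binary heap; the next node visited is always the unexplored node
    of minimum key."""
    counter = 0                        # tie-breaker: never compare nodes
    frontier = [(g(root), counter, root)]
    order = []
    while frontier:
        _, _, v = heapq.heappop(frontier)
        order.append(v)
        for w in children(v):
            counter += 1
            heapq.heappush(frontier, (g(w), counter, w))
    return order
```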
Advantages/Disadvantages of Pure Strategies
Depth first search:
May find a first feasible solution relatively soon.
Requires only small memory: the stack size is bounded by the maximal tree depth.
Best first search:
May help to find better feasible solutions faster.
Has a larger overhead per iteration.
Breadth first search:
Has huge memory requirements (eventually the whole tree has to be stored!).
Might be useful in a combined strategy: start with BFS, and then switch to best-first search. Why? In the beginning, the estimates for best-first search might not be meaningful enough to guide the search.
Solving a problem by a complete traversal of the decision tree (= systematically generating all possible solutions) is usually called a brute-force approach.
It is one of the most simple ways to solve a problem (it makes no or only little use of the problem structure),
but can be afforded only for small instances (since the decision tree usually has exponential size).
General idea:
Transform a given instance I1 of some problem class P into an instance I2 of the same problem class P such that
the size of I2 is strictly smaller than that of I1, and
if we know an optimal solution for I2, we can easily compute an optimal solution for I1.
Example: Cardinality Matching
Let G1 = (V, E) be an undirected graph in which we seek a maximum cardinality matching.
Transformation rule: Let v be a vertex of degree one in G1, and (v, w) be the incident edge. Delete the vertices v and w (and all incident edges) from G1 to obtain a graph G2.
If M2 is a maximum cardinality matching in G2, then M1 = M2 ∪ {(v, w)} is a maximum cardinality matching in G1.
Apply the transformation rule repeatedly (as long as possible).
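A minimal sketch of this repeated reduction (edges as unordered pairs given as tuples; all names illustrative):

```python
def reduce_degree_one(edges):
    """Repeatedly apply the rule: if v has degree one with incident
    edge (v, w), force (v, w) into the matching and delete v, w and
    all incident edges. Returns the forced edges and the reduced
    edge list."""
    edges = list(edges)
    forced = []
    while True:
        deg = {}
        for a, b in edges:
            deg[a] = deg.get(a, 0) + 1
            deg[b] = deg.get(b, 0) + 1
        pick = next(((a, b) for a, b in edges
                     if deg[a] == 1 or deg[b] == 1), None)
        if pick is None:
            return forced, edges
        v, w = pick
        forced.append(pick)
        edges = [(x, y) for x, y in edges
                 if v not in (x, y) and w not in (x, y)]
```

On a path, for example, the rule alone already produces a maximum matching.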
The input of the hitting set problem is a set system, which is also known as a hypergraph:
Definition
A hypergraph H is a pair (F, S) where F is a non-empty finite set and S is a family of subsets of F.
The elements of F are called vertices, the elements of S are called hyperedges.
In our concrete application, each station is a vertex, and each train route (given as a sequence of stations) represents a hyperedge.
Data Reduction Techniques in the Case Study
Simple reduction techniques apply:
If all trains that stop at station A also stop at station B, then A may be removed from the set of stations (and from the list of stops of each train).
If train A stops at all stations where train B stops, then A may be removed from the set of trains (and from the list of trains at every station).
These two techniques may be applied as often as possible, and every optimal solution to the reduced instance is still an optimal solution to the original instance.
Observation: If a station becomes isolated, but is not removed (that is, it is contained in a hyperedge with only one element), it belongs to every feasible solution to the reduced instance.
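A naive sketch of the two rules applied exhaustively (quadratic checks; `trains` maps a train name to its set of stops, and all names are illustrative):

```python
def reduce_instance(stations, trains):
    """Apply the two dominance rules until neither applies:
    remove station A if some other station B stops wherever A does;
    remove train A if its stop set contains another train's stop set."""
    stations = set(stations)
    trains = {t: set(s) for t, s in trains.items()}
    changed = True
    while changed:
        changed = False
        # Rule 1: station a is dominated by some other station b.
        for a in sorted(stations):
            containing = [s for s in trains.values() if a in s]
            if any(b != a and all(b in s for s in containing)
                   for b in stations):
                stations.discard(a)
                for s in trains.values():
                    s.discard(a)
                changed = True
                break
        if changed:
            continue
        # Rule 2: train ta stops everywhere some other train tb stops.
        for ta in sorted(trains):
            if any(tb != ta and trains[ta] >= trains[tb] for tb in trains):
                del trains[ta]
                changed = True
                break
    return stations, trains
```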
Data Reduction in Our Case Study
The data reduction maintains the optimal value.
If the reduction decomposes the hypergraph into connected components, we get an optimal solution to the entire instance by computing an optimal solution to each connected component and concatenating all of them.
If a connected component is an isolated node, this node is the optimal solution to this connected component.
The repeated application of the two reduction techniques has simplified the ICE instance to a set of isolated stations.
−→ Optimal solution found through reduction only!
The all-German-trains instance was simplified to a set of isolated stations and a few, very small connected components.
−→ For each connected component, the decision tree is small enough to be searched exhaustively.
Exactly this phenomenon occurred in all tested instances (taken from all over Europe).
A subtree of the decision tree is definitely useless...
in case of a pure feasibility problem:
if the subtree is infeasible (the solution space associated with the root of this subtree is empty),
or there is at least one feasible solution outside the union of this subtree and all subtrees cut off previously.
in case of an optimization problem:
if the subtree is infeasible (the solution space associated with the root of this subtree is empty),
or the subtree is unbounded (then the whole problem is unbounded and we can stop),
or there is at least one optimal solution (if existing!) outside the union of this subtree and all subtrees cut off previously,
or we already know a feasible solution which is better than the optimal solution within this subtree.
To cut off a subtree without losing correctness and optimality, we will compute some kind of evidence that this subtree is definitely useless.
The techniques in the following sections will basically differ by the very nature of the computation strategy.
Side remark: we will see that different traversal strategies are appropriate for the individual types of computation strategies.
Note: for NP-hard algorithmic problems, an efficient strategy cannot always determine whether a subtree is definitely useless or not.
−→ Otherwise, applying this strategy to the root of the decision tree would efficiently determine whether the instance is feasible.
Conservative Determination of Uselessness
So this means we cannot get perfectly accurate evidence within a reasonable amount of run time.
Alternatively, we will aim at a conservative strategy.
This means two outcomes are possible: “yes, definitely useless” or “don’t know”.
It goes without saying that the subtree must indeed be definitely useless whenever “yes, definitely useless” is the answer.
−→ We are on the “safe side” = conservative strategy.
The challenge is to design strategies such that “don’t know” is not too often the outcome in cases where the subtree is indeed definitely useless.
Fundamental Concept of Branch-and-Bound
Suppose we know
an upper bound U on the optimal cost value.
Note: any feasible solution s with cost value c(s) can serve as an upper bound.
Further suppose, for every node v of the decision tree, we can compute a lower bound ℓ(v) ∈ R ∪ {+∞} on the objective values of all feasible solutions in the subtree rooted at v.
If the solution set corresponding to v is empty, we set ℓ(v) := +∞.
The subtree rooted at v is useless
if ℓ(v) = +∞, because the whole subtree is infeasible (pruning by infeasibility);
if ℓ(v) > U, because no optimal solution may be in the subtree rooted at v (pruning by bound);
if ℓ(v) = c(s) for some feasible s (which we have found).
If s belongs to the subtree rooted at v, we have found an optimal solution for the whole subtree. If c(s) < U, we can update our upper bound and set U := c(s). Otherwise, the subtree may contain a solution with the same objective value, but no better solution.
In both cases, the subtree rooted at v needs no further examination (pruning by optimality).
Whenever the tree traversal encounters a leaf of the search tree (= feasible solution) s with c(s) < U, it is reasonable to replace U by c(s).
−→ This increases the chance of determining useless subtrees.
There is also a chance that the algorithm which computes a lower bound for a tree node delivers, as a side effect, a feasible solution s. Of course, if c(s) < U, we update our global upper bound, too.
If the tree traversal strategy is chosen appropriately, a bad upper bound U may soon be replaced by the cost of a better feasible solution s.
So, comparison to an abstract upper bound is restricted to the initial search steps. Afterwards, the lower bound ℓ(v) for a node v of the decision tree is compared to the cost value c(s′) of a solution s′. This upper bound becomes better and better as the search proceeds.
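The scheme above can be sketched in a few lines of Python. This is an illustrative branch-and-bound for a tiny TSP instance, not taken from the lecture; the function names, the very crude lower bound (cheapest outgoing edge per city), and the example distance matrix are our own assumptions.

```python
import math

def tsp_branch_and_bound(dist):
    """Tiny branch-and-bound for the TSP on a distance matrix.
    U (best_cost) is the cost of the best tour found so far; a subtree
    is pruned as soon as its lower bound reaches U (pruning by bound)."""
    n = len(dist)
    best_cost, best_tour = math.inf, None

    def cheapest_out(u):
        # cheapest edge leaving city u (used in the crude lower bound)
        return min(dist[u][w] for w in range(n) if w != u)

    def branch(path, visited, cost):
        nonlocal best_cost, best_tour
        if len(path) == n:                       # leaf = feasible tour
            total = cost + dist[path[-1]][path[0]]
            if total < best_cost:                # update the upper bound U
                best_cost, best_tour = total, path[:]
            return
        # lower bound l(v): partial cost plus, for every city whose
        # outgoing edge is not fixed yet, its cheapest outgoing edge
        lb = cost + cheapest_out(path[-1])
        lb += sum(cheapest_out(u) for u in range(n) if u not in visited)
        if lb >= best_cost:                      # pruning by bound
            return
        for v in range(n):
            if v not in visited:
                branch(path + [v], visited | {v}, cost + dist[path[-1]][v])

    branch([0], {0}, 0)
    return best_cost, best_tour
```

Any valid lower bound works here; a stronger one (e.g., the 1-tree bound discussed later) prunes more subtrees at the price of more work per node.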
Recall the interpretation of a decision-tree node v as a subset of the original set of all feasible solutions.
−→ The feasible solutions that obey all constraints corresponding to the tree arcs on the path from the root to v.
In other words: We need a lower bound ℓ(v) on the optimum objective value inside this subset.
General idea: We solve a relaxation of the instance at hand.
−→ The optimal objective value of the relaxed instance is a lower bound on the optimal objective value of the original instance.
Clearly, such a relaxation to compute lower bounds within a branch-and-bound approach only makes sense if the relaxed problem is significantly easier to solve than the original problem.
The general idea is to replace a difficult minimization problem by a simpler optimization problem whose optimal value is not larger than that of the original problem. To this end we may
(i) enlarge the set of feasible solutions or
(ii) replace the objective function by a function which has the same or a smaller value everywhere.
Consider an optimization problem that is formulated as a special case of INTEGER LINEAR PROGRAMMING (ILP).
Dropping the constraint that the solution be integral (and changing nothing else) yields the so-called LP-relaxation of this algorithmic problem, which is a linear programming problem (LP).
Since the LP-relaxation is indeed a relaxation, its objective value is a lower bound on the objective value of the original ILP.
LP is much more efficient to solve than ILP.
−→ See lectures on Linear Programming.
Whenever the relaxed problem is a combinatorial optimization problem, we call it a combinatorial relaxation.
Example: Symmetric traveling salesman problem (STSP)
Instance: an undirected graph G = (V, E) and edge weights c_e for e ∈ E.
Task: Find an undirected tour of minimum weight.
Idea for a relaxation: Drop the subtour elimination constraints. The relaxed problem becomes:

min Σ_{e∈E} c_e x_e
s.t. Σ_{e=(v,w)∈E} x_e = 2 for all v ∈ V
x_e ∈ {0, 1} for all e ∈ E

This problem is a so-called perfect 2-factor problem. (A perfect 2-factor of an undirected graph G = (V, E) is a subset M of E such that each vertex is incident with exactly two edges from M.)
Remark: A minimum cost perfect 2-factor can be found efficiently using matching techniques (similar to the blossom algorithm).
Another important combinatorial relaxation of the symmetric TSP is the so-called 1-tree relaxation.
A 1-tree on node set V = {v1, v2, . . . , vn} is a graph consisting of two edges adjacent to node v1, plus the edges of a spanning tree on nodes {v2, v3, . . . , vn}.
Clearly every tour is a 1-tree, and thus the value of a shortest 1-tree is a valid relaxation of the STSP.
The computation of a shortest 1-tree is easy: just find a minimum spanning tree on {2, 3, . . . , n} and add the two cheapest edges incident to vertex 1.
−→ We will discuss algorithms for finding minimum spanning trees in a more general framework later.
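The two-step recipe above is easy to code. Here is a minimal Python sketch (our own, with 0-indexed vertices, so vertex 0 plays the role of v1); it uses Prim's algorithm for the spanning-tree part.

```python
def one_tree_bound(dist):
    """Lower bound for the symmetric TSP via the 1-tree relaxation:
    a minimum spanning tree on vertices {1, ..., n-1} plus the two
    cheapest edges incident to vertex 0."""
    n = len(dist)
    # Prim's algorithm on vertices 1..n-1, started from vertex 1
    in_tree = {1}
    mst_weight = 0.0
    best = {v: dist[1][v] for v in range(2, n)}   # cheapest link into the tree
    while best:
        v = min(best, key=best.get)
        mst_weight += best.pop(v)
        in_tree.add(v)
        for w in best:
            if dist[v][w] < best[w]:
                best[w] = dist[v][w]
    # add the two cheapest edges incident to vertex 0
    e = sorted(dist[0][v] for v in range(1, n))
    return mst_weight + e[0] + e[1]
```

Since every tour is a 1-tree, the returned value never exceeds the optimal tour length.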
The next technique has been proven to be very effective in practice.
Suppose we consider the following optimization model (P):
z(P) := min c^T x
subject to Ax = b
x ∈ X

We have a vector x of decision variables, a linear objective function c^T x, and a set of explicit linear equalities Ax = b (say k equalities). Feasible solutions are further restricted to lie in a given constraint set X.
Idea: Drop the explicit linear equalities, but bring them into the objective function with associated Lagrangian multipliers λ = (λ1, λ2, . . . , λk).
For any vector λ of Lagrangian multipliers, the value
L(λ) := min{ c^T x + λ^T (Ax − b) | x ∈ X }
of the Lagrangian function is a lower bound on the objective value z(P) of the original optimization problem (P).
Proof.
Since Ax = b for every feasible solution to (P), we have for any vector λ of Lagrangian multipliers
min{ c^T x | Ax = b, x ∈ X } = min{ c^T x + λ^T (Ax − b) | Ax = b, x ∈ X }.
Since removing the constraints Ax = b from the second formulation cannot lead to an increase in the value of the objective function (the value might decrease), we have
z(P) = min{ c^T x + λ^T (Ax − b) | Ax = b, x ∈ X } ≥ min{ c^T x + λ^T (Ax − b) | x ∈ X } = L(λ). □
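The bound can be checked numerically on a toy instance. The sketch below is our own illustration (all data hypothetical): one relaxed equality, X = {0, 1}^n small enough to enumerate, so L(λ) is computed by brute force.

```python
from itertools import product

def lagrangian_bound(c, a, b, lam):
    """L(lam) = min over x in X of  c^T x + lam * (a^T x - b),
    for a single relaxed equality a^T x = b and X = {0,1}^n,
    enumerated explicitly (toy sizes only)."""
    return min(
        sum(ci * xi for ci, xi in zip(c, x))
        + lam * (sum(ai * xi for ai, xi in zip(a, x)) - b)
        for x in product((0, 1), repeat=len(c))
    )

# hypothetical data: minimize 3x1 + x2 + 2x3  s.t.  x1 + x2 + x3 = 2
c, a, b = [3, 1, 2], [1, 1, 1], 2
z = min(sum(ci * xi for ci, xi in zip(c, x))
        for x in product((0, 1), repeat=3) if sum(x) == b)   # z(P) = 3

# every multiplier yields a valid lower bound on z(P)
bounds = {lam: lagrangian_bound(c, a, b, lam) for lam in (-3, -2, -1, 0, 1)}
```

For this instance, λ = −2 even attains z(P) = 3; finding the multiplier with the best (largest) bound is the Lagrangian dual problem.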
Consider the following ILP-formulation of the symmetric TSP (given a graph G = (V, E), V = {1, . . . , n}, edge costs c_e):
Variables: x_e denotes whether edge e ∈ E is in the tour.
Use of Relaxations within Branch & Bound
We have seen different possibilities to obtain relaxations of optimizationproblems.
Methodological obstacle: The subset of the solution space corresponding to a node of the decision tree need not be the solution space of some instance of the problem.
Example: Consider the TSP as a feature-based problem.
Let v be a node of this decision tree on height h. For the first h features, a decision has been made whether to select or to reject each of them.
Let X denote the set of selected features and Y the set of rejected features.
Then v corresponds to the problem of finding an optimal round tour among all round tours that cover X and avoid Y.
This is not an instance of the pure TSP anymore!
The n–queens problem is a famous combinatorial puzzle. It will serve as our toy example to illustrate several search methods.
Given any integer n, the problem is to place n queens on an n × n chessboard so that no two queens threaten each other.
A queen threatens any other queen on the same row, column, or diagonal.
How can this problem be modeled?
Each queen must be in a different column. We introduce a variable r_i (with the domain 1 . . . n) for the queen in the i-th column indicating its row position.
A solution is feasible if and only if for all 1 ≤ i ≤ n and 1 ≤ j ≤ n with i ≠ j we have
r_i ≠ r_j (different rows) and |r_i − r_j| ≠ |i − j| (different diagonals).
There are three major drawbacks of the standard backtracking scheme:
Repeated failure for the same reason (thrashing).
Thrashing occurs because backtracking does not identify the real reason of the conflict. Thrashing can be avoided by backjumping, i.e., by a scheme in which backtracking is done directly to the variable that caused the failure.
Backtracking has to perform redundant work. Even if the conflicting values of the variables are identified, they are not remembered for immediate detection of the same conflict in a subsequent computation. Methods to resolve this problem are called backchecking or backmarking.
Backtracking detects the conflict too late, as it is not able to detect the conflict before it occurs. This can be avoided by applying consistency techniques to forward-check possible conflicts.
Consider the situation in the 8-queens problem where we have allocated the first five queens. No queen can be placed within column 6.
[Figure: an 8 × 8 board with queens placed in columns 1–5; each empty square of column 6 is labeled with the queens (1–5) that conflict with that position.]
Backtracking would backtrack to column 5 and find another row for this queen (row 8 is feasible). But then it is still impossible to place a queen in column 6.
Backjumping is more intelligent in finding the “real conflict”. The closest queen that can resolve the conflict is queen 4.
In general, backjumping goes back to the lowest level of the tree that has a conflict with each possible value for the current variable.
The simplest consistency technique is referred to as node consistency (NC).
Definition
The node representing a variable X in a constraint graph is node consistent if and only if for every value x in the current domain D_X of X, each unary constraint on X is satisfied.
A CSP is node consistent if and only if all variables are node consistent.
If the domain D_X of a variable X contains a value x that does not satisfy some unary constraint on X, this node inconsistency can simply be eliminated by removing such values from the domain D_X.
The next consistency technique considers binary constraints.
Definition
Let X, Y be two variables which occur in a binary constraint. We say the ordered pair (X, Y) is arc consistent if and only if for every value x in the current domain D_X of X there is some value y in the domain of Y such that X = x and Y = y is permitted by the binary constraint between X and Y.
A CSP is arc consistent if and only if every arc (X, Y) in the constraint graph is arc consistent.
Note: the concept of arc consistency is directional. If an arc (X, Y) is arc consistent, this does not imply that (Y, X) is also consistent.
An arc (X, Y) can be made consistent by simply deleting those values from the domain of X for which no value in the domain of Y exists such that the binary constraint between X and Y is satisfied.
To achieve overall arc consistency it is necessary to apply this reduction procedure repeatedly as long as the domain of any variable changes.
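This repeated revision loop can be sketched as follows (our own Python sketch; the data representation, with binary constraints given as predicates per directed arc, is an assumption):

```python
def make_arc_consistent(domains, constraints):
    """Repeatedly delete unsupported values until no domain changes
    (the simple iterate-until-fixpoint scheme described above).
    domains: dict var -> set of values (modified in place)
    constraints: dict (X, Y) -> predicate(x, y); both directions listed."""
    changed = True
    while changed:
        changed = False
        for (X, Y), ok in constraints.items():
            # revise arc (X, Y): every x needs a supporting y in Y's domain
            unsupported = {x for x in domains[X]
                           if not any(ok(x, y) for y in domains[Y])}
            if unsupported:
                domains[X] -= unsupported
                changed = True
    return domains
```

For the constraint X < Y with both domains {1, 2, 3}, the loop shrinks D_X to {1, 2} and D_Y to {2, 3}. More refined algorithms (e.g., AC-3) avoid re-examining arcs whose domains did not change.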
Consistency techniques are helpful to reduce the search space.
Let us now embed consistency techniques into the search algorithm.
Such schemes are usually called look-ahead strategies, and they are based on the idea of reducing the search space through constraint propagation.
Remark: backtracking can be seen as a combination of depth-first search and a fraction of arc consistency: at each node, we test arc consistency among the already instantiated variables, i.e., we check the validity of constraints considering the partial instantiation.
Forward checking is the easiest way to prevent future conflicts.
Instead of performing arc consistency between instantiated variables, it performs arc consistency between pairs of a not-yet-instantiated variable and an instantiated variable.
It maintains the invariant that for every uninstantiated variable there exists at least one value in its domain which is compatible with the values of instantiated variables.
When a value is assigned to the current variable, any value in the domain of a “future” variable which conflicts with this assignment is (temporarily) removed from the domain.
More precisely, values are removed for the whole subtree rooted at the current instantiation (but of course not for other branches of the search tree!).
Note: whenever a new variable is considered, all its remaining values are guaranteed to be consistent with the past variables.
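Forward checking for the n-queens model can be sketched like this (our own Python sketch, 0-indexed; a fresh domain list per branch plays the role of the temporary removal):

```python
def n_queens_fc(n):
    """n-queens with forward checking: after assigning column `col`,
    conflicting rows are removed from the domains of all future
    columns; a branch is abandoned as soon as a domain becomes empty."""
    count = 0
    def search(col, domains):
        nonlocal count
        if col == n:
            count += 1
            return
        for row in domains[col]:
            # prune the domains of the not-yet-instantiated columns
            pruned = [
                {r for r in domains[j] if r != row and abs(r - row) != j - col}
                for j in range(col + 1, n)
            ]
            if all(pruned):                    # no future domain wiped out
                search(col + 1, domains[:col + 1] + pruned)
    search(0, [set(range(n)) for _ in range(n)])
    return count
```

It finds the same solutions as plain backtracking, but dead branches are abandoned as soon as some future column has no feasible row left.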
The variable ordering can noticeably change the efficiency of the search.
What variable ordering should be chosen in general?
Similar to our discussion for branch-and-bound, there are several heuristics:
Rule 1: Prefer the variables with the smallest domain.
Rationale: To succeed, try first where you are most likely to fail. If failure is inevitable, then the sooner we discover it, the better.
Rule 2: In case of a tie, prefer the variable with more constraints to instantiated variables.
Rationale: Deal with the hard cases first: they can only get more difficult.
Dynamic programming is an exact optimization method which solves a problem by combining the solutions to subproblems.
A dynamic programming algorithm solves every subproblem just once and saves its answer in a table, thereby avoiding the work of recomputing the answer to subproblems which have already been solved in an earlier step.
In contrast, a divide-and-conquer algorithm would repeatedly solve common subproblems (does more work than necessary).
Example I: Matrix Chain Multiplication
Example: Compute the matrix product
A1A2A3
where A1 is a 10× 100, A2 a 100× 5, and A3 a 5× 50 matrix.
Assumption: We use a straightforward matrix multiplication algorithm which requires pqr scalar multiplications to multiply matrix A of dimension p × q with matrix B of dimension q × r.
A product of matrices is fully parenthesized if it is either
a single matrix or
the product of two fully parenthesized matrix products, surrounded by parentheses.
The matrix chain multiplication problem:
Input: Given n matrices A1, A2, . . . , An where matrix Ai has dimension p_{i−1} × p_i for i = 1, . . . , n.
Task: Fully parenthesize the product A1 · A2 · · · An in a way that minimizes the number of scalar multiplications.
Step 1: (structure of an optimal solution)
An optimal parenthesization of the product A1 · A2 · · · An splits the product between Ak and Ak+1 for some integer k in the range 1 ≤ k < n.
Key observation: The prefix subchain A1 · A2 · · · Ak within the optimal parenthesization of A1 · A2 · · · An must be an optimal parenthesization of A1 · A2 · · · Ak (a similar property holds for the suffix subchain).
Step 2: (recursive solution)
Let m[i, j] be the minimum number of scalar multiplications needed to compute the matrix product Ai · · · Aj. The following recursion holds:

m[i, j] = 0 if i = j
m[i, j] = min_{i ≤ k < j} { m[i, k] + m[k+1, j] + p_{i−1} p_k p_j } if i < j
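The recursion can be filled in bottom-up by increasing chain length. A Python sketch (our own; 1-based table indices as on the slide): for the introductory example with dimensions p = (10, 100, 5, 50), the orders differ drastically, since (A1 A2) A3 costs 10·100·5 + 10·5·50 = 7500 scalar multiplications while A1 (A2 A3) costs 100·5·50 + 10·100·50 = 75000.

```python
def matrix_chain_order(p):
    """Bottom-up DP for the recursion above; matrix A_i has dimension
    p[i-1] x p[i].  Returns m[1][n], the minimum number of scalar
    multiplications for the whole chain."""
    n = len(p) - 1                            # number of matrices
    m = [[0] * (n + 1) for _ in range(n + 1)] # m[i][i] = 0
    for length in range(2, n + 1):            # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                          for k in range(i, j))
    return m[1][n]
```

Recording the minimizing k in a second table s[i, j] allows the optimal parenthesization itself to be reconstructed, as in the slide's Step 4.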
Step 4: (construct the optimal solution)
The following recursive procedure computes the matrix chain product Ai · · · Aj given the matrices A = (A1, . . . , An), the s[i, j] table computed in Step 3, and the indices i and j.
Pairwise sequence alignment or inexact matching is the problem of comparing two sequences while allowing certain mismatches between them.
Motivation: Mutation in DNA is a natural evolutionary process. DNA replication errors cause substitutions, insertions and deletions of nucleotides. This can be seen as “editing” DNA strings over the alphabet Σ = {A, C, G, T}.
Similarity can be the clue
to common evolutionary origins, or
to common function.
Example: Input strings: s = GCATCAGC and t = CAATAAGGCG
Alignment of s and t:
G C A − T C A G − C −
− C A A T A A G G C G
We are now ready to state the fundamental problem of pairwise sequence alignment.
Problem (Pairwise global sequence alignment)
Input: Strings s = s0 s1 · · · s_{n−1} ∈ Σ^n and t = t0 t1 · · · t_{m−1} ∈ Σ^m, and a distance measure d based on a cost function c.
Output: An optimal alignment of s and t.
To solve this problem, we will use the concept of dynamic programming.
Needleman and Wunsch (1970) were the first to apply dynamic programming to this problem.
Dynamic Programming for Sequence Alignment
Define a table (matrix) D(·, ·) of dimension (n + 1) × (m + 1). The entry D(i, j) denotes the cost of an optimal alignment for the substrings s0 s1 · · · s_{i−1} and t0 t1 · · · t_{j−1}, where 0 ≤ i ≤ n and 0 ≤ j ≤ m.
The following recursion holds for 1 ≤ i ≤ n and 1 ≤ j ≤ m:

D(i, j) = min{ D(i − 1, j) + c(s_{i−1}, −),
               D(i − 1, j − 1) + c(s_{i−1}, t_{j−1}),
               D(i, j − 1) + c(−, t_{j−1}) }
Dynamic Programming and the Edit Graph
The recursion from the previous slide can be interpreted as a shortest path problem in a so-called edit graph G. This graph is a directed acyclic graph, and it is constructed as follows:
With each entry D(i, j) in the table D, we associate a vertex (i, j).
For each pair (i, j) with 1 ≤ i ≤ n and 1 ≤ j ≤ m, there are the following three arcs:
(i − 1, j) → (i, j) with length c(s_{i−1}, −),
(i − 1, j − 1) → (i, j) with length c(s_{i−1}, t_{j−1}),
(i, j − 1) → (i, j) with length c(−, t_{j−1}).
Finally, we add the arcs
(i − 1, 0) → (i, 0) with length c(s_{i−1}, −) for 1 ≤ i ≤ n, and
(0, j − 1) → (0, j) with length c(−, t_{j−1}) for 1 ≤ j ≤ m.
A shortest path from (0, 0) to (n, m) corresponds to an optimal alignment.
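Filling the table row by row computes exactly this shortest path. A minimal Python sketch (our own; the gap symbol '-' and the function name are assumptions):

```python
def align_cost(s, t, c):
    """Cost of an optimal global alignment of s and t, filling the
    table D row by row -- equivalently, a shortest (0,0)->(n,m) path
    in the edit graph.  c(a, b) is the cost of aligning symbol a with
    symbol b; the gap symbol is '-'."""
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                       # boundary arcs (column 0)
        D[i][0] = D[i - 1][0] + c(s[i - 1], '-')
    for j in range(1, m + 1):                       # boundary arcs (row 0)
        D[0][j] = D[0][j - 1] + c('-', t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + c(s[i - 1], '-'),       # deletion
                          D[i - 1][j - 1] + c(s[i - 1], t[j - 1]),  # (mis)match
                          D[i][j - 1] + c('-', t[j - 1]))       # insertion
    return D[n][m]
```

With unit costs (0 for a match, 1 otherwise) this is the classical edit distance; keeping back-pointers would recover the alignment itself.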
Rough answer: the simplest imaginable strategy for a heuristic restriction of the search space.
Explore exactly one branch of the decision tree. At each node, choose the option that looks “most promising” at this moment (without foresight = “greedily”).
Due to the second point, this strategy is called the greedy algorithm.
However, the term “greedy algorithm” is often used in a more restrictive way: for certain special cases of this general algorithmic scheme only.
Maximization/Minimization Problem for Independence Systems
MAXIMIZATION PROBLEM FOR INDEPENDENCE SYSTEMS
Instance: An independence system (E, F) and c : E → R.
Task: Find an X ∈ F such that c(X) := Σ_{e∈X} c(e) is maximum.
MINIMIZATION PROBLEM FOR INDEPENDENCE SYSTEMS
Instance: An independence system (E, F) and c : E → R.
Task: Find a basis B ∈ F such that c(B) := Σ_{e∈B} c(e) is minimum.
Remark: The set F is usually not given by an explicit list of its elements. We usually assume to have an oracle which, given a subset F ⊆ E, decides whether F ∈ F.
Examples of Optimization Problems for Independence Systems
Many combinatorial optimization problems can be formulated as maximization or minimization problems for independence systems:
Example 1: MAXIMUM WEIGHT STABLE SET PROBLEM
Given a graph G with vertex set V(G) and weights c : V → R, find a stable set X in G of maximum weight.
Here E = V(G) and F = {F ⊆ E : F is stable in G}.
Example 2: TSP
Given a complete undirected graph G and weights c : E(G) → R+, find a minimum weight Hamiltonian circuit in G.
Here E = E(G) and F = {F ⊆ E : F is a subset of a Hamiltonian circuit in G}.
Example 3: SHORTEST PATH PROBLEM
Given a digraph D = (V, A), c : A → R and s, t ∈ V such that t is reachable from s, find a shortest s-t–path in D with respect to c.
Here E = A and F = {F ⊆ E : F is a subset of an s-t–path}.
We consider the maximization problem for independence systems.
Input: An independence system (E, F), given by an independence oracle (i.e., an oracle which, given a set F ⊆ E, decides whether F ∈ F or not), weights c : E → R+.
Output: A set F ∈ F.
STEP 1: Sort E = {e1, e2, . . . , en} such that c(e1) ≥ c(e2) ≥ · · · ≥ c(en).
STEP 2: Set F := ∅.
STEP 3: FOR i = 1 TO n DO
IF F ∪ {ei} ∈ F THEN set F := F ∪ {ei}.
Remark: We do not have to consider negative weights since elements with negative weight never appear in an optimum solution.
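The three steps above can be sketched generically in Python (our own sketch; the oracle is passed as a predicate). As an illustration we use the graphic matroid, where independence means acyclicity, checked here with a tiny union-find:

```python
def best_in_greedy(elements, c, independent):
    """BEST-IN-GREEDY for the maximization problem: scan the elements
    by decreasing weight and keep one whenever the independence oracle
    accepts the enlarged set."""
    F = set()
    for e in sorted(elements, key=c, reverse=True):   # STEP 1
        if independent(F | {e}):                      # STEP 3 (oracle call)
            F = F | {e}
    return F

def is_forest(edges):
    """Independence oracle for the graphic matroid: the edge set is
    independent iff it is acyclic (union-find cycle check)."""
    parent = {}
    def find(v):
        while parent.get(v, v) != v:
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # edge closes a cycle
        parent[ru] = rv
    return True
```

On a triangle with edge weights 5, 4, 3, the algorithm keeps the two heaviest edges and rejects the third, returning the maximum-weight forest.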
Let us now consider the minimization problem for independence systems. The following greedy algorithm requires a more complicated oracle: given a set F ⊆ E, the oracle decides whether F contains a basis (basis superset oracle).
Input: An independence system (E, F), given by a basis superset oracle, weights c : E → R.
Output: A basis F of (E, F).
STEP 1: Sort E = {e1, e2, . . . , en} such that c(e1) ≥ c(e2) ≥ · · · ≥ c(en).
STEP 2: Set F := E.
STEP 3: FOR i = 1 TO n DO
IF F \ {ei} contains a basis THEN set F := F \ {ei}.
How Good is the Greedy Algorithm?
In general, the solution delivered by the greedy algorithm can be quite poor.
In Section 3, we have seen that the simple local-search strategy is guaranteed to deliver an optimal solution if the problem fulfills a certain structural property (exact neighborhood).
So an interesting question is whether the greedy algorithm provably delivers optimal solutions if some structural property is fulfilled.
The remainder of this section is devoted to a characterization of this for the case of independence systems. This leads us to so-called matroids.
An independence system (E, F) is a matroid if
(M3) If X, Y ∈ F and |X| > |Y|, then there is an x ∈ X \ Y with Y ∪ {x} ∈ F.
The name matroid points out that this structure is a generalization of matrices.
Example 1: Matric matroid
E is the set of columns of a matrix A over some field, and F := {F ⊆ E : the columns in F are linearly independent over that field}.
Example 2:
E is a finite set, k an integer and F := {F ⊆ E : |F| ≤ k}.
Example 3: Graphic matroid
E is the set of edges of some undirected graph G = (V, E) and F := {F ⊆ E : the subgraph (V, F) is a forest}.
Proof (that (M3) is fulfilled):
Let X, Y ∈ F and suppose Y ∪ {x} ∉ F for all x ∈ X \ Y. We have to show that |X| ≤ |Y|.
For each edge x = {v, w} ∈ X, v and w are in the same connected component of (V, Y) (by our assumption).
Hence each connected component of (V, X) is a subset of a connected component of (V, Y).
So the number p of connected components of the forest (V, X) is greater than or equal to the number q of connected components of (V, Y).
Since p = |V| − |X| and q = |V| − |Y|, this implies |X| ≤ |Y|. □
Matroids and the Greedy Algorithm
Theorem
Let (E ,F) be an independence system.
(E, F) is a matroid if and only if the BEST-IN-GREEDY algorithm finds an optimal solution for the maximization problem for (E, F, c) for every cost function c : E → R+.
Proof: “⇒”: Suppose first that (E, F) is a matroid.
Let F = {e1, . . . , ek} be the solution constructed by the greedy algorithm.
Suppose for a contradiction that there is an independent set F′ = {f1, . . . , fℓ} such that

c(F) = Σ_{i=1}^{k} c(ei) < Σ_{j=1}^{ℓ} c(fj) = c(F′).

Since c(·) ≥ 0, we may assume that F′ is a basis.
The output F of the greedy algorithm is also a basis.
By the theorem from Slide 325, |F| = |F′|, and thus k = ℓ.
“⇐”: We assume that the independence system is not a matroid and have to show that the greedy algorithm will fail to produce an optimal solution for at least one choice of c.
So suppose there are independent sets F1, F2 such that |F1| = p and |F2| = p + 1, but F1 ∪ {e} ∉ F for any e ∈ F2 \ F1.
We may assume that p ≥ 2.
We will construct a particular objective function c such that the greedy algorithm will fail to produce an optimal solution for c.
Let us consider the following cost function c on E:

c(e) = p + 2 if e ∈ F1,
c(e) = p + 1 if e ∈ F2 \ F1,
c(e) = 0 otherwise.

We observe that F1 is suboptimal, because
c(F2) ≥ (p + 1)² > p(p + 2) = c(F1).
The greedy algorithm, if applied to this instance, will start by picking all elements of F1.
Afterwards, it will not improve the total weight, because for all other elements either F1 ∪ {e} ∉ F, or otherwise c(e) = 0.
Hence, we have proved that the greedy algorithm fails to produce an optimal solution for at least one cost function if the independence system is not a matroid. □
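The construction in the proof can be replayed on a concrete toy instance. The independence system below is our own hypothetical example (not from the lecture), built so that (M3) fails, with costs chosen exactly as in the proof for p = 2:

```python
# Hypothetical independence system that is not a matroid:
# E = {a, b, x, y, z}; the independent sets are exactly the subsets
# of F1 = {a, b} and the subsets of F2 = {x, y, z}.  (M3) fails for
# X = F2, Y = F1: no element of F2 \ F1 can be added to F1.
F1, F2 = {'a', 'b'}, {'x', 'y', 'z'}
cost = {'a': 4, 'b': 4, 'x': 3, 'y': 3, 'z': 3}   # p+2 on F1, p+1 on F2 \ F1

def independent(S):
    return S <= F1 or S <= F2

# BEST-IN-GREEDY: scan elements by decreasing weight
greedy = set()
for e in sorted(cost, key=cost.get, reverse=True):
    if independent(greedy | {e}):
        greedy |= {e}

# greedy collects F1 (weight p(p+2) = 8) and then gets stuck,
# while the optimum F2 has weight (p+1)^2 = 9
```

The run matches the proof step by step: the heavy elements of F1 are picked first, and afterwards no element of F2 can be added.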
Application to Minimum Spanning Trees
Consequence: The BEST-IN-GREEDY applied to (E, F, c̄), with the transformed cost function c̄, solves the original MINIMIZATION PROBLEM for (E, F, c) to optimality.
Note: Working with c̄ instead of c only reverses the element order used within the BEST-IN-GREEDY algorithm.
Application:
We have seen that the minimum spanning tree problem on a graph G = (V, E) with edge costs c can be formulated as a minimization problem over an independence system: E is the edge set of an undirected graph and F is the set of forests in G.
We have verified that (E, F) is a matroid.
With the above remarks we conclude that the BEST-IN-GREEDY applied to the elements ordered by increasing weight yields an optimal solution.
This algorithm is well known as Kruskal's algorithm for MINIMUM SPANNING TREE.
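Spelled out, Kruskal's algorithm is the BEST-IN-GREEDY on the graphic matroid with edges sorted by increasing weight. A minimal Python sketch (our own; union-find serves as the independence oracle):

```python
def kruskal(n, edges):
    """Kruskal's algorithm for MINIMUM SPANNING TREE.
    edges: list of (weight, u, v) with vertices 0..n-1.
    Returns (total weight, list of tree edges)."""
    parent = list(range(n))
    def find(v):                     # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    total, tree = 0, []
    for w, u, v in sorted(edges):    # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # adding (u, v) keeps the set a forest
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree
```

The `ru != rv` test is exactly the matroid independence check: the edge is kept iff it does not close a cycle.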
Greedy Algorithm for Start Solutions
The greedy algorithm is often useful to generate a first start solution for local search.
Example: the cardinality matching problem.
Greedy matching:
1. Set M = ∅.
2. For each edge e ∈ E (in an arbitrary order) do:
If M ∪ {e} is a matching, insert e into M.
3. Deliver the final M.
Very often, the greedy matching is not much smaller than a maximum matching.
With a greedy matching at hand, we can apply local search with an exact neighborhood (i.e., search for augmenting paths) afterwards.
Such a heuristic may speed up the search for a maximum matching considerably (as the remaining number of iterations is relatively small).
Matching is but one example of a greedy start solution; we can use the greedy algorithm on any independence system to compute a (hopefully) good inclusion-maximal independent set.
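The greedy matching above fits in a few lines of Python (our own sketch; edges given as vertex pairs):

```python
def greedy_matching(edges):
    """Greedy matching: scan the edges in the given order and keep an
    edge whenever both of its endpoints are still unmatched."""
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched |= {u, v}
    return M
```

On the path 0-1-2-3, scanning the edges in path order yields a maximum matching of size 2, while starting with the middle edge (1, 2) yields only size 1; in general a greedy matching has at least half the size of a maximum matching, which is why it is a good starting point for augmenting-path local search.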
Let P be an optimization problem, I an instance of P.
OPT (I ) denotes the objective value of an optimal solution.
APP(I ) denotes the objective value delivered by an algorithm A.
Even if we cannot solve a problem to optimality, the ideal case would be to find a solution which is guaranteed to differ from the optimum only by a (small) constant:
Definition
A polynomial-time algorithm A for an optimization problem P is called an absolute approximation algorithm if there exists a constant k such that
|APP(I) − OPT(I)| ≤ k for all instances I of P.
Double-Tree-Algorithm:
STEP 1: Find a minimum spanning tree T in G with respect to c.
STEP 2: Create a multigraph T′ by using two copies of each edge of T.
STEP 3: Find an Eulerian walk in T′.
STEP 4: Transform this walk into a tour by taking shortcuts.
Analysis of the Double-Tree-Algorithm
Theorem
The Double-Tree-Algorithm is a factor 2 approximation algorithm for the METRIC TSP.
Proof:
The length c(E(T)) of a minimum spanning tree is certainly a lower bound for the length OPT of an optimal tour (since by deleting one edge from any tour we get a spanning tree).
Therefore, c(E(T′)) = 2 · c(E(T)) ≤ 2 · OPT.
In STEP 4, we transform an Eulerian walk of length c(E(T′)) into a tour.
The tour is defined by the order in which the vertices appear in this walk: we ignore all but the first occurrence of a vertex.
The triangle inequality implies that this tour is no longer than c(E(T′)). □
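The four steps condense nicely in code, because for a tree the Eulerian walk of the doubled edges with shortcuts is exactly a depth-first preorder traversal. A minimal Python sketch (our own; Prim's algorithm for STEP 1, preorder for STEPs 2–4):

```python
import math

def double_tree(dist):
    """Double-Tree heuristic for the metric TSP: build an MST with
    Prim's algorithm, then output its preorder traversal -- the
    Eulerian walk of the doubled tree with first-occurrence shortcuts."""
    n = len(dist)
    children = {v: [] for v in range(n)}
    best = {v: (dist[0][v], 0) for v in range(1, n)}   # (key, predecessor)
    while best:                                         # Prim from vertex 0
        v = min(best, key=lambda u: best[u][0])
        _, p = best.pop(v)
        children[p].append(v)                           # tree edge p - v
        for w in best:
            if dist[v][w] < best[w][0]:
                best[w] = (dist[v][w], v)
    tour, stack = [], [0]                               # preorder traversal
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

# four points in the plane (metric by construction)
pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
tour = double_tree(dist)
length = sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
```

By the analysis above, the resulting tour is at most twice the optimum on any metric instance; here it even finds the optimal square tour of length 4.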
No, not at all - but almost the end of this lecture!
Ideally, we would like to get arbitrarily close to the optimum solution (in polynomial time).
Thus, for any ε > 0, we would like to have an algorithm Aε which delivers a (1 + ε)-approximation in polynomial time in the input length of the given instance.
Such a family of algorithms Aε is called a polynomial-time approximation scheme (PTAS).
For many problems there is such a PTAS (but not for METRIC TSP).
The interested reader is referred to the books by Hochbaum (ed.) and Vazirani (confer Slide 10).