Optimization Algorithms
Karsten Weihe
Algorithmics, Technische Universität Darmstadt
http://www.algo.informatik.tu-darmstadt.de/
Winter Term 2012/2013
Copyright © 2012 by Matthias Müller-Hannemann and Karsten Weihe. All rights reserved.
© 2006 M. Müller-Hannemann & K. Weihe, Algorithmics, TU Darmstadt
Not a formal requirement, but will be an essential part of the exam!
Algorithmic problem:
Input: a finite set of rectangles, each given by its two edge lengths (aka width and height).
Output: a placement of all rectangles in the plane such that
the rectangles are placed openly disjoint, that is, the open interiors of any two rectangles do not intersect, and
the placement of each rectangle is axis-parallel, that is, the edges are parallel to the coordinate axes (rectangles may be turned by 90 degrees).
Objective: minimize the area of the bounding box, that is, the smallest axis-parallel rectangular area enclosing all input rectangles.
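The feasibility condition and the objective above can be sketched in a few lines of Python. This is a minimal illustration, assuming each placed rectangle is given as a tuple (x, y, width, height) of its lower-left corner and edge lengths (this representation is an assumption, not part of the problem statement):

```python
def overlap_open(r1, r2):
    # Open interiors intersect iff the rectangles overlap strictly in both axes.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1

def is_feasible(placement):
    # Feasible iff no two open interiors intersect (touching edges are allowed).
    rects = list(placement)
    return all(not overlap_open(rects[i], rects[j])
               for i in range(len(rects)) for j in range(i + 1, len(rects)))

def bounding_box_area(placement):
    # Area of the smallest axis-parallel rectangle enclosing all rectangles.
    xmin = min(x for x, y, w, h in placement)
    ymin = min(y for x, y, w, h in placement)
    xmax = max(x + w for x, y, w, h in placement)
    ymax = max(y + h for x, y, w, h in placement)
    return (xmax - xmin) * (ymax - ymin)
```

Two rectangles that merely share an edge count as openly disjoint, which the strict inequalities above capture.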
H.H. Hoos and T. Stützle: Stochastic Local Search, Morgan Kaufmann Publishers Inc, 2004.
J. Hromkovič: Algorithmics for Hard Problems. Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics (Texts in Theoretical Computer Science, An EATCS Series), Springer Verlag, 2001.
The total length of the tree is not the only relevant objective.
The pins are not completely exchangeable: one designated pin serves as a driver (“source”) which sends a signal to all other pins (“sinks”) in its set.
→ Look for a directed rooted Steiner tree (rooted at the source).
An important objective is (roughly) to minimize the maximal run time from each driver to all of its sinks.
This objective is much harder.
For certain stages of the VLSI design process, the objectives from Slide no. 22 are sensible alternatives:
much easier to handle mathematically,
probably (hopefully!) close enough to reality.
A communication network should not become disconnected if one server (node) or one connection (arc) breaks down due to hardware/software failures.
In terms of graph theory: Any two nodes A and B should be connected through the network by at least two paths which are
edge–disjoint in case only break–downs of edges are relevant;
node–disjoint except for A and B (a.k.a. internally node–disjoint) in case break–downs of servers shall also be taken into account.
Further realistic variants:
At least three, four... connecting, disjoint paths for any two nodes A and B.
Different numbers of required disjoint paths for pairs A, B with different priorities (e.g. important companies vs. people in the outback).
A set of rooms, each coming with the number of seats.
A set of time slots, supposed to be non-overlapping (to make things easier).
A set of courses, each filling exactly one time slot (to make things easier).
A set of students, each coming with a list of courses in which this student is registered.
Find: an assignment of rooms and time slots to courses such that
no two courses are assigned the same room and time slot,
the audience of a course is not larger than the capacity of the room assigned to this course, and
no student is registered for two courses at the same time.
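The three side constraints can be checked mechanically. A minimal sketch, assuming the (hypothetical) representation: `assignment` maps each course to a (room, slot) pair, `capacity` maps rooms to seat counts, and `registrations` maps students to their course lists:

```python
def check_timetable(assignment, capacity, registrations):
    # Constraint 1: no two courses share the same (room, slot) pair.
    if len(set(assignment.values())) < len(assignment):
        return False
    # Count the audience of each course from the registrations.
    audience = {}
    for courses in registrations.values():
        for c in courses:
            audience[c] = audience.get(c, 0) + 1
    # Constraint 2: each audience fits into the assigned room.
    for c, (room, slot) in assignment.items():
        if audience.get(c, 0) > capacity[room]:
            return False
    # Constraint 3: no student has two courses in the same time slot.
    for courses in registrations.values():
        slots = [assignment[c][1] for c in courses]
        if len(set(slots)) < len(slots):
            return False
    return True
```

This is only a feasibility checker; finding a feasible assignment is the hard part of the problem.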
Background: These points are probes which would ideally (i.e. without inaccuracies in the measured values) reveal an affine–linear relation between the two parameters (= lie on a straight line).
Desired output: the “best approximating straight line”.
Given two distinct points x, y ∈ Rn, a convex combination of them is any point of the form z = λx + (1−λ)y for λ ∈ R and 0 ≤ λ ≤ 1 (this is a strict convex combination if 0 < λ < 1).
A set S ⊆ Rn is convex if it contains all convex combinations of pairs of points x, y ∈ S.
Example 6: convex optimization problem
[Figure: graph of a convex function f over an interval [a, b], with points x, z, y]
A function f : S → R (where S ⊆ Rn is a convex set) is convex in S if for any two points x, y ∈ S and 0 ≤ λ ≤ 1 we have
λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y).
Examples of convex functions: linear, quadratic (with nonnegative leading coefficient), exponential.
Let S ⊆ Rn be a convex set, and f : Rn → R be a convex function. Then the problem of finding an x ∈ S that minimizes f(x) among all x ∈ S is called a convex minimization problem.
Example 8: Integer Linear Programs
Given c ∈ Rn, A ∈ R(m,n), b ∈ Rm, then
maximize cᵀx subject to Ax ≤ b, x ∈ Zn
is called an integer linear programming problem (ILP).
If a variable is restricted to be zero-one valued, it is called a binary variable.
If all variables are binary variables, the problem is called a binary linear programming problem.
If only a subset of the variables is required to be integer-valued, the corresponding problem is called a mixed integer linear programming problem (MILP).
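As an illustration of the binary case, the problem can be solved by exhaustive enumeration of all 0/1 vectors. This brute-force sketch is exponential in n and is shown only to make the definition concrete, not as a practical solver:

```python
from itertools import product

def solve_binary_lp(c, A, b):
    """Maximize c^T x subject to A x <= b, x in {0,1}^n, by enumeration."""
    n = len(c)
    best, best_x = None, None
    for x in product((0, 1), repeat=n):
        # Check all m constraints A x <= b.
        if all(sum(A[i][j] * x[j] for j in range(n)) <= b[i]
               for i in range(len(A))):
            val = sum(c[j] * x[j] for j in range(n))
            if best is None or val > best:
                best, best_x = val, x
    return best, best_x
```

For example, a knapsack instance (maximize 3x1 + 2x2 + 2x3 subject to 2x1 + x2 + x3 ≤ 2) is a binary linear program.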
Introduction 1.3 General Discussion of Algorithmic Problems
Construction and Optimization Problems
Algorithms for decision problems are typically constructive, which means they solve the corresponding construction problem as well.
Trivially, each construction problem may be viewed as an optimization problem: just define an objective that assigns the same value to each solution.
Moreover, many generic algorithms are in fact applicable to optimization problems only.
To apply such a generic algorithm to a construction problem, it has to be transformed into an optimization problem (later on, we will see general techniques for that).
→ For all of these reasons, we may focus on optimization problems in the following.
More Formal Specification of the Matching Problem
For an instance G = (V, E) of the matching problem, the elements of SG may be alternatively encoded as the set of all 0/1–vectors x defined on the index set E.
Interpretation: x[e] = 1 if and only if e ∈ M.
Side constraints: For x ∈ SG and all v ∈ V, it must hold that
∑_{e ∈ E : v ∈ e} x[e] ≤ 1.
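The side constraint (each node is covered by at most one chosen edge) is easy to verify. A minimal sketch, assuming edges are represented as 2-tuples of node identifiers and x as a dict mapping edges to 0/1:

```python
def is_matching_vector(edges, x):
    # For every node v, sum x[e] over all edges e incident to v must be <= 1.
    load = {}
    for e in edges:
        if x.get(e, 0) == 1:
            for v in e:
                load[v] = load.get(v, 0) + 1
    return all(cnt <= 1 for cnt in load.values())
```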
More Complex Example: the General TSP Revisited
I can be viewed as the set of all quadratic real–valued matrices D:
D[i , j ] = distance from point no. i to point no. j .
−→ Cf. Slide no. 14.
For an (n × n)–matrix I ∈ I, SI may then be the set of all quadratic 0/1–matrices X of size n:
X[i, j] = 1 ⇐⇒ j follows i immediately on the cycle corresponding to X.
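To make this encoding concrete: the sketch below (an illustration, with X as a list of lists) decodes such a 0/1–matrix into the tour it represents, and rejects matrices that do not encode a single cycle through all n points:

```python
def decode_tour(X):
    """Return the cycle encoded by X as a vertex list starting at 0,
    or None if X does not encode a single Hamiltonian cycle."""
    n = len(X)
    succ = {}
    for i in range(n):
        ones = [j for j in range(n) if X[i][j] == 1]
        if len(ones) != 1:          # each point needs exactly one successor
            return None
        succ[i] = ones[0]
    tour, cur = [0], succ[0]
    while cur != 0 and len(tour) <= n:
        tour.append(cur)
        cur = succ[cur]
    # Valid iff we returned to 0 after visiting all n points.
    return tour if len(tour) == n and cur == 0 else None
```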
Feasibility and Boundedness
An instance I ∈ I of an algorithmic problem is called feasible if FI ≠ ∅, otherwise infeasible.
An instance I ∈ I of a minimization (resp. maximization) problem is called bounded if the objective function is bounded over FI from below (resp. above); otherwise, it is called unbounded.
Note:
Boundedness of an instance I ∈ I is not identical with the normal, set–theoretic boundedness of FI.
Consider the important case that FI ⊆ Rn for some n, FI is closed, and the objective function is continuous on FI:
Basic calculus says for this case: boundedness of FI (i.e., FI is compact) implies the existence of a minimum and a maximum, in particular boundedness of I.
The interval FI = (0, 1] is obviously bounded, but min log x is unbounded over FI (note that FI is not closed, so the statement above does not apply).
Exact vs. Approximation vs. Heuristic
An algorithm is called exact if:
Feasibility version: It provably finds a feasible solution if there is one.
Optimization version: It provably finds an optimal solution ifthere is one.
An algorithm is called approximative if:
Feasibility version: It finds a solution that is provably not too far from feasibility according to some reasonable measure.
Optimization version: It finds a solution that is provably not too far from optimality according to some reasonable measure.
An algorithm is called heuristic if:
Feasibility version: It attempts to find a feasible or nearly feasible solution, but no quality guarantee is proved.
Optimization version: It attempts to find an optimal or nearly optimal solution, but no quality guarantee is proved.
But the theory of NP–completeness provides techniques for proving that a given problem is “just as hard” as a large number of other problems that are widely recognized as being difficult.
Armed with these techniques you might be able to prove that your problem is NP–complete.
Then you can march to your boss and announce:
“I can’t find an efficient algorithm, but neither can all these famouspeople.”
Motivation to Study NP–completeness
NP–completeness is a form of bad news: evidence that many important problems can’t be solved quickly.
Why should we care?
Knowing that a problem is hard lets you stop beating your head against a wall trying to solve it, and do something better:
Use a heuristic. If you can’t quickly solve the problem with a good worst case time, maybe you can come up with a method for solving a reasonable fraction of the common cases.
Solve the problem approximately instead of exactly. A lot of the time it is possible to come up with a provably fast algorithm that doesn’t solve the problem exactly but comes up with a solution you can prove is close to right.
Use an exponential time solution anyway. If you really have to solve the problem exactly, you can settle down to writing an exponential time algorithm and stop worrying about finding a polynomial time algorithm.
Choose a better abstraction. The NP–complete abstract problem you’re trying to solve presumably comes from ignoring some of the seemingly unimportant details of a more complicated real-world problem. Perhaps some of those details shouldn’t have been ignored, and they make the difference between what you can and can’t solve.
To study the efficiency of algorithms, we first need a formal notion of input size.
The input must be encoded or represented as a sequence of symbols over some fixed alphabet such as bits or characters.
Once we have decided how the input is represented as a sequence of symbols, we define the input size as the length of this sequence, that is, the number of symbols in it.
Note: Input size depends on the chosen encoding.
Example: integers
Input: the number 2006 has input size
4 using decimal representation,
11 using binary representation,
2006 using unary representation.
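The three sizes for the example above can be computed directly; a small sketch:

```python
def decimal_size(n):
    # Number of decimal digits.
    return len(str(n))

def binary_size(n):
    # Number of bits in the binary representation.
    return n.bit_length()

def unary_size(n):
    # Unary representation uses one symbol per unit.
    return n
```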
Encodings C1, C2 are polynomially equivalent (with respect to a problem class) iff there are polynomials p1, p2 : N → N such that for all instances I of the problem class we have
⟨I⟩C1 ≤ p1(⟨I⟩C2) and ⟨I⟩C2 ≤ p2(⟨I⟩C1),
where ⟨I⟩Ci denotes the length of I under encoding Ci.
Although this notion is hard to formalize, the following conditions capture much of it:
1 the encoding of an instance I should be concise and not “padded” with unnecessary information and symbols,
2 numbers occurring in I should be encoded in binary (or in any fixed base other than 1),
3 it should be decodable. The intent of “decodability” is that, given any particular component of a generic instance, one should be able to specify a polynomial time algorithm that is capable of extracting a description of that component from any encoded instance.
Let A be an algorithm which accepts inputs from a set I.
For the running time of an algorithm we count the elementary steps of A on input I ∈ I with respect to a model of computation and some encoding of the input:
TA : N → N, where
TA(⟨I⟩) = sum of the costs of all elementary steps of A on input I.
Elementary steps (examples): variable assignment, random access to a variable whose index is stored in another variable, conditional jumps, and simple arithmetic operations (addition, multiplication, division, comparison of numbers).
Note: the cost of an elementary step depends on the machine model. For example:
addition of two k-digit binary numbers in O(k), or
addition of two k-digit binary numbers in O(1).
Our general assumption: arithmetic operations require O(1) time (unit cost model).
Many theoretical machine models exist, most importantly
Each of these (and other possible) machines can be simulated by each other machine such that the running times for inputs of the same size differ from each other only by a polynomial factor and the necessary space consumption only by a constant factor.
“Differ by a polynomial factor” means: if Ti denotes the running time with respect to machine model i (i = 1, 2), then there exist polynomials p1, p2 such that
T1(⟨I⟩) ≤ p1(T2(⟨I⟩)) for all inputs I, and
T2(⟨I⟩) ≤ p2(T1(⟨I⟩)) for all inputs I.
Equivalence thesis: This equivalence holds for all “reasonable”models of computation.
An algorithm A is said to run in polynomial time (to have polynomial-time worst case complexity) for a problem class if there is an integer k such that for all instances I of this problem class
TA(⟨I⟩) = O(⟨I⟩^k)
and all numbers in intermediate computations can be stored with O(⟨I⟩^k) bits.
A problem class is polynomial-time solvable if there is a polynomial-time algorithm for this class.
Note: this definition is independent of the machine model due to the equivalence thesis.
Optimization problems can be reformulated as decision problems (that can be answered by YES or NO):
Given an instance I (represented by a set FI of feasible solutions and an objective function objI which we want to minimize) and an integer k, is there a feasible solution s ∈ FI with objI(s) ≤ k?
The decision problem is no harder than the original optimization problem.
This implies: Any negative result proved about the complexity of the decision version will apply to the optimization version as well.
P denotes the class of decision problems that can be solved by apolynomial-time algorithm.
For many problems, we do not know whether they are in P.
However, quite often we are able to check in polynomial time whether the YES-answer to a decision problem is correct or not (without worrying about how hard it might be to find the solution).
A decision problem belongs to the class NP if for every YES-instance I there is a short certificate C(I) which can be checked in polynomial time for validity.
More formally, there is an integer k and a (certificate-checking) algorithm A such that for every YES-instance I there exists a certificate C(I) of length ⟨C(I)⟩ = O(⟨I⟩^k) (the length is polynomial in ⟨I⟩) such that A with input I and C(I) can verify the YES-answer in at most O(⟨I⟩^k) steps.
An equivalent definition of the class NP works with non-deterministic models of computation. In such models, a program may “guess” (call an oracle) at certain steps, and must verify YES-instances in polynomial time.
“Guessing” corresponds to telling the certificate.
NP stands for “non-deterministic polynomial time”.
It does NOT mean non-polynomial time!
Lemma
P ⊆ NP.
Proof.
For any problem in P, we can choose the certificate to be empty.
Example: HAMILTON CIRCUIT:
Instance: An undirected graph G.
Question: Does G have a Hamilton circuit?
(Hamilton circuit = a simple cycle in G which visits each vertex.)
Lemma
HAMILTON CIRCUIT belongs to NP.
Proof.
For each YES-instance G we take any Hamilton circuit of G as a certificate. Checking whether an edge set is a Hamilton circuit can obviously be done in polynomial time.
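The certificate check can be made explicit. A sketch, assuming vertices 0..n−1, edges as 2-tuples, and the certificate given as the vertex sequence of the claimed circuit:

```python
def verify_hamilton_certificate(n, edges, cycle):
    """Polynomial-time check that `cycle` is a Hamilton circuit of the
    graph on vertices 0..n-1 with the given edge set."""
    # The circuit must visit every vertex exactly once.
    if len(cycle) != n or set(cycle) != set(range(n)):
        return False
    E = {frozenset(e) for e in edges}
    # Every consecutive pair (including the closing edge) must be an edge.
    return all(frozenset((cycle[i], cycle[(i + 1) % n])) in E
               for i in range(n))
```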
Next step: study the “hardest” problems in NP.
P2 is at least as hard as P1 if P1 is a “special case” of P2.
Definition
Let P1 and P2 be two decision problems. We say that P1 polynomially transforms (reduces) to P2 (we write P1 ∝ P2) if there is a function f mapping instances of P1 to instances of P2 such that
for each instance I of P1, we can compute f(I) in polynomial time (with respect to ⟨I⟩), and
I is a YES-instance of P1 if and only if f(I) is a YES-instance of P2.
The Satisfiability Problem (SAT)
Let X = {x1, x2, . . . , xn} be a set of Boolean variables.
A truth assignment for X is a function T : X → {true, false}. The negation of a variable x is denoted by x̄.
The elements of the set L := X ∪ {x̄ | x ∈ X}, i.e. the variables and their negations, are called literals. The truth assignment is extended to L in the obvious way, by setting T(x̄) := true if T(x) = false and vice versa.
A clause over X is a disjunction of literals, i.e. a logical or-combination of literals (denoted by +). It is satisfied by a truth assignment if and only if at least one of its literals is true.
A conjunction of clauses F = C1 · C2 · · · Cm, i.e. a logical and-combination of clauses (denoted by multiplication ·), is satisfiable if and only if there is a truth assignment which simultaneously satisfies all of its clauses Ci.
Such a formula F is a Boolean formula in conjunctive normal form.
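Evaluating a CNF formula under a truth assignment follows the definitions directly. A sketch, representing a literal as a (variable, sign) pair where sign False means the negated variable:

```python
def satisfies(clauses, assignment):
    """A CNF formula is satisfied iff every clause contains at least
    one literal made true by the assignment."""
    return all(any(assignment[var] == sign for var, sign in clause)
               for clause in clauses)
```

For example, F = (x + ȳ)(x̄ + y) is satisfied by T(x) = T(y) = true but not by T(x) = true, T(y) = false.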
Some graph terminology:
A clique is a set of pairwise adjacent vertices.
A node set S is called stable (independent) if no two nodes in S are connected by an edge.
STABLE SET (INDEPENDENT SET):
Instance: A graph G = (V, E) and an integer k.
Question: Is there a stable set of ≥ k vertices?
Theorem (Karp (1972))
STABLE SET is NP–complete.
Proof:
STABLE SET ∈ NP, since a stable set of size k is a certificate which we can verify in polynomial time.
Consider an instance I of SATISFIABILITY with clauses Z1, Z2, . . . , Zm, where Zi = yi1 + yi2 + · · · + yiki with yij ∈ {xij, x̄ij}.
We construct for I an instance f(I) of STABLE SET, i.e. a graph G and an integer k, such that
(1) I is satisfiable if and only if G has a stable set of size k, and
(2) the construction can be done in polynomial time.
Construction: for each clause Zi, we introduce a clique Ci of ki vertices according to the literals of this clause.
Vertices corresponding to different clauses are connected by an edge if and only if the literals contradict each other (i.e., one literal is the negation of the other).
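The construction can be sketched in code. This illustration labels the vertex for the j-th literal of clause i as the pair (i, j) (a representation chosen here, not prescribed by the proof) and returns the graph together with k = number of clauses:

```python
def sat_to_stable_set(clauses):
    """Karp-style reduction sketch: clique per clause, plus edges
    between contradicting literals of different clauses."""
    vertices = [(i, j) for i, cl in enumerate(clauses) for j in range(len(cl))]
    edges = set()
    for i, cl in enumerate(clauses):
        # Clique on the literal-vertices of clause i.
        for j in range(len(cl)):
            for j2 in range(j + 1, len(cl)):
                edges.add(((i, j), (i, j2)))
    # Edges between contradicting literals of different clauses.
    for (i, j) in vertices:
        for (i2, j2) in vertices:
            if i < i2:
                v1, s1 = clauses[i][j]
                v2, s2 = clauses[i2][j2]
                if v1 == v2 and s1 != s2:
                    edges.add(((i, j), (i2, j2)))
    return vertices, edges, len(clauses)
```

The construction clearly runs in time polynomial in the formula size.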
A vertex cover in an undirected graph G = (V, E) is a subset S ⊆ V of vertices such that every edge of G is incident to at least one vertex of S.
[Figure: a vertex cover, a stable set, and a clique]
Lemma
Let G = (V, E) be a graph and X ⊆ V. The following three statements are equivalent:
(1) X is a vertex cover in G.
(2) V \ X is a stable set in G.
(3) V \ X is a clique in the complement of G.
The definition of the class NP is not symmetric with respect to YES- and NO-instances.
For example, it is an open question whether the following problem belongs to NP:
Given a graph G, is it true that G is not Hamiltonian?
Definition
coNP is the class of decision problems for which (as in the definition of NP) a certificate-checking algorithm exists for the NO-instances (which runs in polynomial time).
A decision problem P0 is called coNP–complete if
(1) P0 ∈ coNP, and
(2) P1 ∝ P0 for all P1 ∈ coNP.
coNPC denotes the class of coNP–complete problems.
The complement co(P0) of a decision problem P0 has the same instances as P0, but the question with respect to co(P0) is the negation of the question for P0.
Theorem
A decision problem is NP–complete if and only if its complementis coNP–complete.
Of particular interest is the class NP ∩ coNP.
For problems in this class there are certificates that can be checked in polynomial time for YES- as well as for NO-instances.
Edmonds called such problems “problems with a goodcharacterization”.
Extension to Optimization Problems
Definition
A decision problem P0 polynomially reduces to the optimization problem P1 if there is for P0 a polynomial-time oracle algorithm A, that means,
(1) algorithm A has polynomial running time, and
(2) algorithm A may use polynomially many calls to an oracle which delivers an optimal solution to an instance I ∈ P1, where each oracle call has O(1) cost.
An optimization problem or decision problem P0 is called NP–hard if every problem P1 ∈ NP polynomially reduces to P0.
This means: NP–hard problems are at least as hard as the hardest problems in NP, but some may be harder than every problem in NP.
There are NP–hard problems which are not known to be in NP.
Example: Euclidean Steiner tree problem (Garey, Graham, Johnson 1979)
Traveling Salesman Problem (TSP)
TSP:
Instance: A complete graph Kn on n vertices, n ≥ 3, and distances c(e) ≥ 0 (rational numbers) for all edges.
Task: Find a Hamiltonian cycle C of minimum length ∑_{e∈E(C)} c(e).
Theorem
TSP is NP–hard.
Proof: Reduction of HAMILTON CIRCUIT to TSP
Given an instance of HAMILTON CIRCUIT G = (V, E) with n nodes, we define the following TSP instance:
Kn with c(e) := 1 if e ∈ E and c(e) := 2, otherwise.
G has a Hamiltonian cycle if and only if the shortest tour has length n.
Hence, a single call of an oracle for TSP suffices to solve HAMILTON CIRCUIT.
Since HAMILTON CIRCUIT is NP–complete, the theorem is proved. □
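The construction in this proof is mechanical. A sketch (representing the cost function as a dict keyed by unordered vertex pairs, which is an implementation choice):

```python
def hamilton_to_tsp(n, edges):
    """Build the TSP instance from the reduction: K_n with c(e) = 1
    for edges of G and c(e) = 2 otherwise. G is Hamiltonian iff the
    shortest tour has length exactly n."""
    E = {frozenset(e) for e in edges}
    return {frozenset((i, j)): (1 if frozenset((i, j)) in E else 2)
            for i in range(n) for j in range(i + 1, n)}
```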
A given neighborhood NI(·) for an instance I of an optimization problem induces a neighborhood graph GI = (VI, AI).
The vertex set VI corresponds to the set of feasible points FI .
There is an arc (x , y) ∈ AI if and only if y ∈ NI (x).
Note: In case of problems with potentially infinite solution spaces, these graphs may be of infinite size, and the number of arcs leaving/entering a node may also be infinite.
The neighborhood graph can be considered as an undirected graph if the neighborhood is symmetric, i.e. if y ∈ NI(x) ⇔ x ∈ NI(y) for all x, y ∈ VI.
Examples of Neighborhood Relations
Before we start with examples:
In the general definition from the previous slides, a neighborhood relation is an arbitrary graph on the solution space of an instance.
In combinatorial optimization problems, the number of arcs leaving or entering a given solution is typically tiny compared to the size of the solution space.
Since the solution space is usually huge, this does not mean that the number of arcs leaving/entering a node is small in absolute terms.
Fortunately, we usually do not need to construct (or even store) the whole neighborhood graph explicitly.
Typically, the existence of an arc (s1, s2) in AI is constituted by minor modifications, which transform s1 into s2.
→ In the following examples, we only formulate these minor modifications to specify AI.
In the simple neighborhood definition on the last slide, two arcs were exchanged.
Obvious generalization:
a fixed number k ≥ 2 of arcs is removed;
k appropriate arcs are introduced to re–connect the partial tours;
an appropriate selection of these partial tours is turned in order to make the re–connected subgraph an oriented tour.
This kind of neighborhood is called k–OPT in the literature.
Consequently, the simple neighborhood from the last slide is “2–OPT”.
The size of the k-opt neighborhood is Ω(nk) for the TSP on n points.
Therefore, in practice, usually only 2–OPT (or 3–OPT) is applied.
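For the symmetric case, a single 2–OPT step can be sketched directly on a tour given as a vertex list: remove two edges and reverse the segment between them to re-connect the partial tours. The helper `tour_length` and the list representation are illustration choices:

```python
def two_opt_move(tour, i, j):
    """Remove the edges (tour[i], tour[i+1]) and (tour[j], tour[j+1])
    and reverse the segment in between."""
    assert 0 <= i < j < len(tour)
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    # Total length of the closed tour under the distance matrix dist.
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
```

A local search then scans all O(n²) pairs (i, j), applies the move whenever it shortens the tour, and stops when no improving move exists.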
Example IV: Bipartition (cont’d)
Problem with such a neighborhood: The number of items in the selection cannot change by stepping from one feasible solution to a neighbored one.
→ GI is highly disconnected.
→ If the search happens to start in the “wrong” connected component, it has no chance to reach the good solutions.
Probably a better approach:
Two feasible solutions are neighbored
⇐⇒
one can be constructed from the other one by inserting, removing, or exchanging one item.
In general: It is desirable that the neighborhood graph is strongly connected (i.e. there is a directed path between any two vertices).
Neighborhood for Disjoint Paths?
Discussion:
This is not a good example of neighborhood structures. In fact, it was inserted into this list of examples to serve as a counter–example.
What’s wrong: It seems that there is no appropriate neighborhood structure.
Straightforward ideas for neighborhood structures: two feasible solutions are neighbored ⇐⇒
a subpath of one path is changed, or
subpaths of a few, mutually involved paths are exchanged, or
some edges change the path to which they belong.
Why not too promising:
It is very likely that no strict improvement step is possible: you may need many rearrangements of paths until you can increase the number of paths by one.
→ Chances are high that the local search is over after very few (maybe zero) steps.
Example VI: Partial Consideration of Constraints
Sometimes the number of side constraints is way too large to consider all constraints simultaneously.
In some modeling approaches, the number of side constraints may even be infinite.
→ Cf. Slides nos. 35 ff.
In cases like these, one can try to approximate the optimal solutions by (typically infeasible) “solutions”:
Every finite subset of the set of all side constraints constitutes a certain (potentially infeasible) “solution”.
Try to find one of these “solutions” S such that S is acceptably close to at least one of the actual optimal solutions.
Neighborhoods for Selections of Side Constraints
On finite selections of side constraints, one can easily define various neighborhood relations.
Simple example:
Two sets of side constraints are neighbored if one set is constructed from the other one by inserting, removing, or exchanging exactly one side constraint.
Example polynomial approximation: “exchanging” means that one xi is moved to another position inside [a . . . b].
A path (or cycle) p in G is called elementary (or simple) if each of its vertices appears only once.
An elementary path (or cycle) p in G is called alternating (with respect to M) if exactly every second edge of p belongs to M.
→ In other words, the edges of M appear on p in an alternating fashion.
More specifically: Let e1, e2, . . . , ek be the edges on p in the order in which they appear on p.
Either we have ei ∈ M for all odd i ∈ {1, . . . , k} and ei ∉ M for all even i ∈ {1, . . . , k},
or we have ei ∈ M for all even i ∈ {1, . . . , k} and ei ∉ M for all odd i ∈ {1, . . . , k}.
Neighborhood Relation for Matching
Lemma
This neighborhood relation for matchings is exact.
Proof:
For convenience, we will identify a path with its set of edges.
Let M1 and M2 be two matchings such that |M2| > |M1|.
We have to show: in this case there is an alternating path p for M1 such that |M1 △ p| > |M1|.
Clearly, at most one edge of each of M1 and M2 is incident to a given node.
→ Every node has degree at most two in the symmetric difference M1 △ M2.
→ M1 △ M2 decomposes into elementary paths and cycles, which are all alternating for both M1 and M2.
Since |M2| > |M1|, at least one of these paths, p say, must contain more edges from M2 than from M1.
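The key object in this proof, the symmetric difference of two matchings, is easy to compute and to sanity-check. A sketch (edges as 2-tuples; `max_degree` is a helper introduced here to verify the degree-at-most-two claim):

```python
def symmetric_difference(M1, M2):
    # Edges in exactly one of the two matchings (M1 triangle M2).
    A = {frozenset(e) for e in M1}
    B = {frozenset(e) for e in M2}
    return A ^ B

def max_degree(edge_set):
    # Maximum node degree in the given edge set.
    deg = {}
    for e in edge_set:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return max(deg.values(), default=0)
```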
The matching problem is a maximization problem, and the objective value obj(M) of a matching M is the number of edges in M.
Recall the loop in the local search scheme on Slide no. 95. Let M∗ = s∗ be the current solution, and M = s be a neighbored solution.
This means that M∗ △ M is an alternating path which has odd length and whose endnodes are both exposed (w.r.t. M∗!).
The latter paths are called M∗–augmenting paths.
Exactness of the neighborhood relation for matchings is therefore a reformulation of a famous theorem of Berge:
Theorem (Berge (1957))
Let G be a graph with some matching M. Then M is maximum ifand only if there is no M-augmenting path.
A start solution for the local search scheme on Slide 95 is easy to find: the empty matching.
At first glance, it also seems easy to find an augmenting path:
Find an unmatched node v ∈ V, that is, a node that is not incident to any edge of the current matching s∗.
Determine all nodes w ∈ V such that there is an alternating (v, w)–path.
If at least one of these nodes w is unmatched, the symmetric difference of the current matching and this (v, w)–path is a matching with one more edge.
Clearly, we cannot enumerate all possible alternating paths that start with v, because their number may be exponentially large (→ left as an exercise).
Each of the common efficient search strategies (depth–first, breadth–first, ...) determines a tree T, which clearly contains at most one (v, w)–path for each node w.
However, it can be seen (formal details omitted; see the picture below for an intuition) that, possibly, alternating paths to some nodes w (and thus the nodes themselves) may be missed.
Since we cannot determine a better neighbored solution in a reasonable way, we cannot apply the “pure” local–search scheme.
However, we can modify the loop on Slide no. 95:
We search for an alternating path by growing a set of alternating trees (an alternating forest).
If we detect a blossom, we shrink this blossom into a pseudonode and continue with the resulting graph and matching.
If we find an alternating path p connecting two unmatched nodes, we replace the current matching M by M △ p, expand pseudonodes on this path (recursively), and continue as in the regular local–search scheme.
Finally, if neither case applies, we have found a maximum matching in the shrunken graph (proof omitted).
At the very end, all shrinking operations are undone and the matching is extended to all blossom edges.
Example VIII: Minimum Cost Flows
Input:
a directed graph D = (V, A);
lower and upper capacity values 0 ≤ ℓ[a] ≤ u[a] ∈ R and a cost factor c[a] ∈ R for each arc a ∈ A;
a balance value b[v] ∈ R for each node v ∈ V.
Desired output: a flow value f [a] ∈ R for each arc a ∈ A such that
ℓ[a] ≤ f[a] ≤ u[a] for each arc a ∈ A (capacity constraints), and
for each node v ∈ V:
∑_{a out of v} f[a] − ∑_{a into v} f[a] = b[v] (balance constraints).
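Verifying a candidate flow against these constraints is straightforward. A sketch, using the convention (an assumption here) that outgoing minus incoming flow equals b[v]:

```python
def check_flow(arcs, lower, upper, balance, f):
    """arcs: list of (u, v) pairs; lower/upper/f: dict arc -> value;
    balance: dict node -> b[v]."""
    # Capacity constraints on every arc.
    if any(not (lower[a] <= f[a] <= upper[a]) for a in arcs):
        return False
    # Balance constraints: outgoing minus incoming flow equals b[v].
    net = {v: 0 for v in balance}
    for (u, v) in arcs:
        net[u] += f[(u, v)]
        net[v] -= f[(u, v)]
    return all(net[v] == balance[v] for v in balance)
```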
Neighborhood Relation for Min-Cost-Flows
Lemma
This neighborhood relation for flows is exact.
Proof: On the next few slides (until Slide no. 132).
Remark:
The application of the algorithmic scheme from Slide no. 95 to the min–cost flow problem with this neighborhood relation is called the negative–cycle canceling algorithm in the literature.
Exactness of the Min-Cost-Flow Neighborhood
Proof (of the exactness):
Suppose that f1 is not optimal.
→ To prove the claim, it then suffices to show that there is some negative cycle p that is augmenting w.r.t. f1, the lower bounds ℓ, and the upper bounds u.
Suppose that f2 is optimal.
→ The cost of f2 is strictly smaller than the cost of f1.
→ Among the cycles p1, . . . , pk that are guaranteed by the flow decomposition lemma, at least one must be negative.
Let pi denote this cycle and εi its multiplicity in the flow decomposition.
Since pi is augmenting, f1 + εi · pi is obviously feasible.
Concluding Remarks on Exact Neighborhood Relations
If a neighborhood relation is not exact, there is still some (heuristic!) hope that the solution from the algorithmic scheme on Slide no. 95 is not “too bad”.
However, the search will always be “trapped” in a local optimum “near” the start solution.
Such a local optimum may be very bad compared to the overall global optimum.
In the following, we will discuss a couple of heuristic techniques to let the search “escape” from local optima.
In principle, this means a “biased coin–flipping” experiment where the head and the tail of the coin may occur with different probabilities. On a computer, this amounts to applying a random number generator.
Random number generator: a deterministic number generator that simulates a non–deterministic choice of numbers.
In the simulated–annealing algorithm, the probability of “yes” is determined by the so–called temperature T > 0: for obj(s) ≥ obj(s∗), the probability of “yes” is
exp( (obj(s∗) − obj(s)) / T ).
Observation: since obj(s) ≥ obj(s∗) and T > 0, this is indeed a probabilistic decision, that is, the probability is a value in (0, 1].
It remains to specify:
how to define T;
how to define the termination condition.
→ Both come from the cooling schedule as defined below.
Cooling schedule:
A finite sequence T1 > T2 > T3 > · · · > Tk > 0 of temperature values is defined.
For i ∈ {1, . . . , k}, a positive integral sequence length ni is defined in addition.
Application of the cooling schedule:
For i = 1, 2, . . . , k, exactly ni iterations of the while–loop on Slide no. 138 are run with Ti being the temperature value.
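Putting the acceptance rule and the cooling schedule together gives the following minimization sketch. The function signature and the `schedule` representation as (Ti, ni) pairs are illustration choices, not a prescribed interface:

```python
import math
import random

def simulated_annealing(start, neighbor, obj, schedule, rng=random.random):
    """schedule: list of (T_i, n_i) pairs; run n_i iterations at
    temperature T_i. An uphill step to s is accepted with
    probability exp((obj(s*) - obj(s)) / T)."""
    s_star = start
    for T, n_iter in schedule:
        for _ in range(n_iter):
            s = neighbor(s_star)
            # Improvements are always accepted; deteriorations only
            # with the temperature-dependent probability.
            if (obj(s) <= obj(s_star)
                    or rng() < math.exp((obj(s_star) - obj(s)) / T)):
                s_star = s
    return s_star
```

With a very low temperature the scheme degenerates to pure local search, since the acceptance probability for uphill steps becomes negligible.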
Discussion of Simulated Annealing
Simulated Annealing is quite a popular method.
Why:
A first prototype is easy to implement.
Only little mathematical background is required from the programmer.
It has the potential to provide feasible solutions of high quality.
The name is “cool”!?
Problems:
Often, reasonable solutions can only be achieved at the cost of an enormous computational effort.
No quality guarantees at all.
Typically, a lot of experimental work is needed to adjust the parameters for a particular problem.
Background of Simulated Annealing
Annealing: In chemistry and chemical engineering the process of coolingheated material.
The annealing should not produce cracks and fissures in the material.
In physical terms:
Cracks and fissures mean that the remaining potential energyinside the material is high.The material always assumes a local minimum of potentialenergy when cooled down.
The warmer the material,
the higher the chances that cracks and fissures occur, but alsothe higher the chances that those cracks and fissures areclosed again.
−→ That the material escapes from a bad local minimum.
The formula used for the probabilistic decision has originally beeninvented to describe physical processes like cooling.
Neighborhood-Based Approaches 3.5 Feature-based local search
Feature-based Local Search
What is feature–based local search?
In typical optimization problems, the feasible solutions to an instance are formed by features.
More specifically:
For an instance, there is a finite ground set of features.
The feasible solutions to this instance are certain subsets of this ground set.
−→ Selections of features.
Concrete examples of features: −→ On the next few slides.
Remark:
Features ≡ dimensions: if the feasible solutions are elements of some space {0, 1}^n, the n dimensions may be interpreted as features.
Here we follow the terminology from the literature on local–search algorithms and speak of “features” rather than dimensions.
Concrete Examples of Features
TSP:
Recall the TSP from Slide no. 14.
There the feasible solutions to a TSP instance on n points were encoded as certain (n × n)–matrices X with 0/1–entries.
Semantics: X[i, j] = 1 means that point no. j immediately follows point no. i cyclically on the round tour encoded by X.
Then the features are the pairs (i, j) for i, j ∈ {1, . . . , n}.
Matching:
Recall the matching problem from Slide no. 44.
Here the edges of the input graph are the features.
Concrete Examples of Features (cont’d)
Coloring:
Input: an undirected graph G = (V, E).
Output: an assignment C : V −→ N of a positive integral number (“color”) to each node.
Feasible: if C[v] ≠ C[w] for all edges {v, w} ∈ E.
Objective: minimizing max{C[v] | v ∈ V}.
Features: pairs (v, n) such that v ∈ V and n ∈ N.
Remark:
In principle, the set of features is infinite in this example.
However, obviously, max{C[v] | v ∈ V} ≤ |V| for any optimal solution.
Consequently, the assumption that {1, . . . , |V|} is the (finite) ground set of colors does not reduce generality.
Useful Terminology
Consider an instance I of an optimization problem.
Again, let FI denote the set of all features of the instance I.
For a feature x ∈ FI, let C[x] denote the feature cost.
A feasible solution S to I can be identified with the set F(S) ⊆ FI of the features that make up S.
In optimization problems in which the cost of a solution is the sum of the costs of the selected features, the cost of solution S may then be rewritten as obj(S) = Σ_{x ∈ F(S)} C[x].
Guided Local Search
In principle, this is the general local–search scheme from Slide no. 95.
Crucial difference: something different happens whenever the search runs into a local optimum (not just termination).
Handling a local optimum:
The algorithm examines all features that make up the local optimum.
For each of these features a “utility of penalization” is determined.
One or more features with the highest “utility of penalization” are penalized.
The penalty is so large that the current solution is not a local optimum anymore.
Then the local search continues as usual.
What does “penalized” and “utility of penalization” mean? −→ On the very next slide.
Penalization
“Penalized”:
The feature cost of a feature is increased by some value (the penalty).
“Utility of penalization”:
The “utility of penalization” of a feature is an estimation of how promising it would be to penalize this feature.
Ideas for such an estimation:
If the original cost value of a feature is high, it might be promising to drive it out of the solution by penalizing it.
On the other hand, if a feature has often been penalized and is again in the current solution, it might not be too promising to penalize it yet another time.
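Both ideas can be folded into one utility value. A minimal sketch, assuming the classic formula util(x) = cost(x)/(1 + penalties(x)) from the guided-local-search literature; all names are illustrative:

```python
def penalization_step(solution_features, cost, penalties):
    """One penalization step: compute the utility of each feature of
    the local optimum and penalize the feature(s) of highest utility
    by incrementing their penalty counter."""
    util = {x: cost[x] / (1 + penalties[x]) for x in solution_features}
    top = max(util.values())
    winners = [x for x in util if util[x] == top]
    for x in winners:
        penalties[x] += 1
    return winners

def augmented_obj(solution_features, cost, penalties, lam):
    """Objective used by the inner local search: original feature
    costs plus lam times the accumulated penalties."""
    return sum(cost[x] + lam * penalties[x] for x in solution_features)
```

Note how a feature with a high original cost gets a high utility, while repeated penalization lowers it, matching both ideas above.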
Taboo Search
This is another variant of the general local–search scheme from Slide no. 95.
Fundamental difference:
Taboo search always moves on to the neighbor of minimal cost.
Unlike local search, it does so even in case the current solution is a local optimum.
−→ In such a case, the move step causes a deterioration.
In order to terminate the algorithm, an additional, external stopping criterion must be incorporated (e.g. termination after a certain number of steps).
Problem: After escaping from a local minimum by a neighborhood step, the algorithm is very likely to return to this local minimum very soon.
Potential consequence whenever this problem occurs: an infinite loop on a few feasible solutions.
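The scheme plus a short-term taboo memory can be sketched as follows. For simplicity, whole solutions are tabooed here; in practice one rather taboos move attributes such as removed arcs, as in the examples below. All function names are illustrative:

```python
from collections import deque

def taboo_search(start, neighbors, obj, max_steps=100, tenure=5):
    """Always move to the cheapest non-taboo neighbor, even if it is
    worse than the current solution; remember the last `tenure`
    visited solutions to avoid falling straight back."""
    current = best = start
    taboo = deque(maxlen=tenure)       # short-term memory
    for _ in range(max_steps):         # external stopping criterion
        cands = [n for n in neighbors(current) if n not in taboo]
        if not cands:
            break
        current = min(cands, key=obj)  # may be a deterioration
        taboo.append(current)
        if obj(current) < obj(best):
            best = current
    return best
```

The taboo memory is what lets the search pass through a local optimum instead of oscillating around it.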
Examples of Kernighan-Lin Approaches
TSP:
Consider the neighborhood structure visualized on Slide no. 99.
Whenever an arc is removed from the round tour, its re–insertion becomes a taboo.
Max–cut:
From Slide no. 152 recall the max–cut problem.
Various neighborhood structures could be defined based on inserting nodes into W and removing nodes from W.
In any such case, we could taboo the re–insertion of a removed node and the removal of an inserted node.
Kernighan-Lin: what’s in a name?
Actually,
these two guys never described the approach in full abstract generality,
but only presented concrete instances of this technique for two concrete problems: TSP and max–cut.
These two instances are commonly called the Kernighan-Lin algorithm and the Lin-Kernighan algorithm in the literature.
The term, “approaches of the Kernighan–Lin type,” is not common in the literature.
It is chosen in this lecture in honor of these two pioneers of heuristic algorithms.
Neighborhood-Based Approaches 3.7 Iterated Local Search
Iterated/Chained Local Search
Explanation:
Perturbation(s∗) modifies the current solution and delivers a feasible intermediate s.
LocalSearch(s) can be any algorithm which gets a feasible solution as its input and delivers a feasible solution as its output.
AcceptanceCriterion(s∗, s′) decides whether we accept s′ as our new solution or stay at the previous solution s∗.
Remarks:
The Perturbation should neither be too small nor too large:
If it is too small, one will often fall back to the previous local optimum.
If it is too large, then the intermediate solution will be almost a random solution. In this case, the algorithm will behave similarly to a random-restart type algorithm.
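A minimal sketch of this scheme; the three components are passed in as functions, and a simple “accept if not worse” criterion is assumed:

```python
import random

def iterated_local_search(start, local_search, perturb, obj,
                          rounds=20, seed=0):
    """Iterated/chained local search: perturb the incumbent s*,
    re-optimize with the inner local search, then accept or reject."""
    rng = random.Random(seed)
    s_star = local_search(start)
    for _ in range(rounds):
        s = perturb(s_star, rng)         # feasible intermediate solution
        s_prime = local_search(s)
        if obj(s_prime) <= obj(s_star):  # acceptance criterion
            s_star = s_prime
    return s_star
```

With this acceptance criterion the result is never worse than the first local optimum found.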
All variants of local search considered so far only apply one run of the search.
Problem:
Chances are high that the search will never leave a small subspace of the solution space.
The really good feasible solutions may be somewhere else in the search space.
Simplest imaginable idea:
Generate a set of feasible solutions (e.g. randomly).
Start a local search (or simulated annealing, taboo search, whatever) from each of them.
Deliver the best feasible solution seen by any of these searches.
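As a sketch (the generator and search routines are illustrative assumptions):

```python
import random

def multi_start(generate, search, obj, restarts=10, seed=0):
    """Run one independent search from each randomly generated start
    solution and deliver the best result seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        s = search(generate(rng))
        if best is None or obj(s) < obj(best):
            best = s
    return best
```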
Like “survival of the fittest” (but unlike biological evolution), the process is organized in rounds.
One round:
A certain number of members of the population are selected randomly, with a probability that is monotonically increasing in their cost values.
These members of the population are dropped.
Another number of members of the population are selected randomly, with a probability that is monotonically decreasing in their cost values.
These members of the population produce offspring.
Each member of the new population
is mutated randomly like in “survival of the fittest,”
however, not at all odds,
but only with a certain (typically very small) probability.
−→ How well does an individual that carries these genes perform?
In other words: letting the individual struggle for life is much like
evaluating an objective function for its abstract representation (genes) and
deciding upon its “survival” through a random decision with a probability that is monotonically increasing in the value of the objective function (cf. Slide no. 179).
The space of biologically possible individuals is certainly much larger than the number of possible genes.
−→ Each abstract representation corresponds to a feasible solution (but not necessarily vice versa).
Conceptually, genetic algorithms are very similar to evolution strategies:
A search proceeding in rounds.
A population (of genes!) is maintained.
In each round, some members of the population are killed with a probability that is monotonically decreasing in the fitness of the individual.
−→ Cf. Slide no. 178.
Main difference:
In evolution strategies, a new generation (“child generation”) is generated from selected members of the previous generation (“parent generation”) by means of (asexual) mutation.
In genetic algorithms, each member of the child generation is generated from two members of the parent generation by means of (sexual) recombination.
Example III of genes: graph coloring
From Slide no. 153 recall the definition of the graph coloring problem.
Also recall that, without loss of generality, the number of colors may be restricted to the number n of nodes of the graph.
Therefore, a feasible (or infeasible) solution may be encoded as a string of length n over the alphabet {1, . . . , n}.
Alternatively, a feasible (or infeasible) solution may be encoded in a binary fashion through an (n × n)–matrix X:
X[i, j] = 1 ⇐⇒ node no. i is assigned color no. j.
−→ Much like in the example of the TSP from the last slide.
Since the selection of the parents is based on their degrees of fitness, the offspring of a pair of parents should preferably resemble their parents.
−→ If the parents are fit, chances are high that the offspring are fit, too.
For example, in feature–based problems:
If a feature is selected in both parents, it should also be selected in the offspring.
If a feature is selected in neither parent, it should not be selected in the offspring, either.
Moreover, an offspring should not essentially inherit from one parent alone but from both parents.
−→ Should not be very similar to one parent and, simultaneously, very different from the other parent.
The result may be the abstract representative of an infeasible solution.
One possible strategy to overcome this problem:
Make all solutions feasible by dropping all side constraints.
As a surrogate, penalize the (degree of) deviation from feasibility.
Alternative ideas:
Re–define the genetic representation of solutions such that reasonable crossover strategies will produce (almost) feasible offspring from feasible parents.
−→ At least some constraints are satisfied.
Apply the crossover and repair the result afterwards.
Input: a finite set of jobs, J1, . . . , Jn,
a duration di for each job Ji, and
a selection S of pairs (Ji, Jj) such that the graph with node set {J1, . . . , Jn} and these selected pairs as directed arcs is acyclic.
Furthermore, some kind of resource constraints.
Output: an assignment of each job Ji to a start time ti ≥ 0.
Constraints: ti + di ≤ tj for each (Ji, Jj) ∈ S, and the resource constraints.
A large process or project is broken down into indivisible tasks Ji.
Certain tasks must be delayed until certain other tasks are finished (precedence constraints).
Resource constraints limit how many jobs can be scheduled simultaneously. For example, only two machines (processors) are available.
The total duration of the project or process is to be minimized.
Resource-constrained scheduling problems are typically very hard optimization problems (both theoretically and practically).
Example of “Re-define ... feasible offspring from feasible parents”
Idea: We relax (i.e., ignore) the resource constraints and penalize their violation in the objective function.
Goal: Feasibility with respect to precedence constraints is maintained.
Representation: For each job Ji, the genetic representation contains a real number xi.
Semantics: xi = min{ti − (tj + dj) | (Jj, Ji) ∈ S}.
xi represents the slack of job Ji, i.e., the amount of time Ji could be scheduled earlier without violating the precedence constraints.
Solutions which fulfill the precedence constraints correspond in a one-to-one fashion to the nonnegative genetic representations.
Crossover Strategies for this Representation
So assume that genes are represented by slack values xi.
For this space of genetic representations, any crossover strategy is appropriate that transforms nonnegative representations into nonnegative ones.
Examples: the offspring are constructed by
exchanging elements of the parents,
taking the element-wise minimum/maximum of each element of the parents,
taking (weighted) average values of the parents’ elements.
Note: Offspring constructed in one of these ways may violate resource constraints.
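All three example strategies preserve nonnegativity, as this sketch shows (illustrative names):

```python
def crossover_exchange(p1, p2, cut):
    """Exchange elements: take p1 up to position `cut`, p2 afterwards."""
    return p1[:cut] + p2[cut:]

def crossover_min(p1, p2):
    """Element-wise minimum of the two slack vectors."""
    return [min(a, b) for a, b in zip(p1, p2)]

def crossover_avg(p1, p2, w=0.5):
    """Weighted average; nonnegative parents yield a nonnegative offspring."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]
```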
As this example shows, it may be necessary to repair a position more than once.
Clearly, the repair loop only terminates when O1 becomes feasible.
Thus, it suffices to prove that the repair loop indeed terminates.
For an easier exposition, we will consider an auxiliary directed graph G = (V, A) with V = {1, . . . , n} and, for all i, j ∈ V: (i, j) ∈ A if, and only if, there is h ∈ {ℓ, . . . , r} such that P2[h] = i and P1[h] = j.
In the example from the last slide:
Each iteration of the repair loop replaces i by j for some (i, j) ∈ A.
Repairing a position more than once means proceeding along some path of G.
Clearly, this path starts with one of the nodes P1[1], . . . , P1[ℓ − 1], P1[r + 1], . . . , P1[n].
Therefore, the repair loop terminates unless it proceeds along a cycle of G.
However, for each node i ∈ V on a cycle of G, it is i ∈ {P1[ℓ], . . . , P1[r]} and i ∈ {P2[ℓ], . . . , P2[r]}.
Obviously, each node is entered by at most one arc.
In summary, a node i on a cycle of G cannot be reached from any of the nodes P1[1], . . . , P1[ℓ − 1], P1[r + 1], . . . , P1[n].
Like in two–point crossover, positions ℓ, r ∈ {1, . . . , n}, ℓ < r, are chosen according to some selection rule.
Then O1[i] := P1[i] for all i ∈ {ℓ, . . . , r}.
For all i ∈ {1, . . . , ℓ − 1} ∪ {r + 1, . . . , n}, the values O1[i] are defined such that the result O1 is indeed a permutation of {1, . . . , n}.
Contribution of P2:
The values O1[1], . . . , O1[ℓ − 1], O1[r + 1], . . . , O1[n] are defined such that their relative order is identical to their relative order in P2.
More specifically:
Let i1, i2 ∈ {1, . . . , ℓ − 1} ∪ {r + 1, . . . , n}.
Let j1, j2 ∈ {1, . . . , n} such that O1[i1] = P2[j1] and O1[i2] = P2[j2].
Identical relative order means: if i1 < i2, then j1 < j2.
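This construction can be sketched directly (0-based indices here, so the segment kept from P1 is p1[l..r] inclusive, whereas the slides use 1-based ℓ, r):

```python
def order_crossover(p1, p2, l, r):
    """Keep p1[l..r]; fill the remaining positions with the missing
    values in the relative order in which they appear in p2.
    The result is again a permutation."""
    middle = set(p1[l:r + 1])
    rest = [x for x in p2 if x not in middle]   # relative order of p2
    o1 = list(p1)
    k = 0
    for i in list(range(l)) + list(range(r + 1, len(p1))):
        o1[i] = rest[k]
        k += 1
    return o1
```

Because the fill values are taken from p2 in order, no repair loop is needed for this variant.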
Each round of a genetic algorithm comprises the following steps:
A certain number of pairs of parents is selected for reproduction.
The pairs are combined to form offspring.
With a small probability, such an offspring is then mutated like in evolution strategies.
Rationale of the additional mutation step:
Simulates the biological procedure more precisely.
Often seems to have a positive effect on the outcome of the genetic algorithm.
Neighborhood-Based Approaches 3.9 Local Search with Complex Side Constraints
Complex Side Constraints
All variants of the general local–search scheme depend on an appropriate neighborhood structure in some way or other.
Examples:
The fundamental local–search algorithm requires an easily enumerable neighborhood structure.
Simulated annealing requires a neighborhood structure in which random selection is easy.
Unfortunately, the side constraints often make the natural neighborhood definitions inappropriate.
−→ Example on the next slides.
Additional Complex Constraints in the TSP
The neighborhood sketched on Slide no. 99 is appropriate: any pair of arcs of the current tour induces a neighbored feasible solution.
However, the TSP does not seem to occur very often in its purist form in reality.
In fact, real–world variants of the TSP typically come with additional side constraints.
Typical example:
A list of pairs (i, j) is given as an additional input.
For each such pair (i, j), object no. i must be visited before object no. j (precedence constraints).
Consequence:
Removing two arbitrary arcs from a tour and re–connecting the tour like on Slide no. 99 may result in an infeasible solution.
Unless the list of pairs is very short, the probability of infeasibility is very high.
Problem Summary
The neighborhood relation may become very sparse (maybe even disconnected).
Consequences:
Due to the loose neighborhood connections, the search is likely to stay in a small subset of the search space (around the start solution).
Chances are high that the search quickly traps into a local optimum and cannot leave it anymore.
Additional problem for algorithms such as simulated annealing:
Such an algorithm may require many trials of neighbored, but infeasible, “solutions” until a feasible neighbored solution is found.
Typical approach: Relaxations
Some of the side constraints are relaxed, that is:
The selected side constraints are dropped.
Additional penalty terms in the cost function penalize violations of the dropped side constraints.
If the side constraints to be dropped are well selected, the natural neighborhood relations will again be appropriate.
Natural approach to the example (TSP with precedence constraints):
The precedence constraints are dropped.
The number of input pairs (i, j) such that no. j is visited before no. i is added to the cost function (possibly multiplied by some weighting factor).
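This relaxed objective can be sketched as follows; the tour is assumed to start at a fixed first city, so “visited before” refers to positions in the list, and all names are illustrative:

```python
def penalized_cost(tour, dist, prec, weight):
    """Relaxed TSP objective: tour length plus `weight` times the
    number of violated precedence pairs (i must precede j)."""
    pos = {city: k for k, city in enumerate(tour)}
    n = len(tour)
    length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
    violations = sum(1 for (i, j) in prec if pos[i] > pos[j])
    return length + weight * violations
```

Any 2-opt-style neighborhood now stays applicable, because an infeasible neighbor simply receives a worse objective value instead of being forbidden.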
Variable Weight Penalties
The rough penalty term is to be multiplied by a (nonnegative) penalty factor, which expresses the relative priority of the penalty compared to the original objective.
Clearly, this factor need not be constant throughout an application of any kind of local–search technique (simulated annealing, evolution strategies, ...).
Natural scheme:
In the beginning, the factor is very small or even zero.
−→ The side constraints are dropped, (more or less) without a substitute.
Throughout the procedure, the factor is increased from time to time.
At the end, the factor is very large.
−→ The problem becomes (approximately) a pure feasibility problem.
Heuristic Idea Behind Variable Weights
Chances are high that the objective function is rather “smooth” on the set of all feasible and infeasible solutions.
−→ It need not be equally smooth on the feasible solutions alone!
In this case, the good feasible solutions are probably found in the areas where the objective function is generally good on feasible and infeasible solutions.
Therefore, it might be a good idea
first to approach these areas quickly (disregarding feasibility) and
to enforce feasibility only later on.
It might even be a better idea to increase the force towards feasibility gradually throughout the search.
−→ Exactly the scheme formulated on the last slide.
General Problem with Relaxation
The feasible solutions to the original problem are among the very (very!) good solutions with respect to the new objective function, that is,
the original objective function
plus some penalty terms.
Experience seems to suggest that the various local–search algorithms presented here
indeed converge to the very good solutions,
however, often at a miserable convergence rate.
Heuristic consequence:
Chances are high that such a search procedure requires a lot of time until the very first feasible solution is seen.
However, if the algorithm indeed finds a feasible solution, it is probably quite good.
Decomposition of the Solution Space
In this section, we consider approaches for finding an optimal (or at least feasible) solution to an algorithmic problem which are based on partitions of the search space into smaller subsets.
The algorithmic steps within the search for an optimal solution induce a tree structure, a search tree.
At each internal node of the tree we “make a decision” on how to decompose the search space further.
The partition into subsets is usually induced by adding some constraint.
Examples: Let S be the set of solutions associated with node P.
Example 1: We branch on a binary variable xi ∈ {0, 1}: P has exactly two children Q0 and Q1. The solution set associated with Q0 is {s ∈ S | xi = 0}, the solution set associated with Q1 is {s ∈ S | xi = 1}.
Example 2: We branch on a continuous variable xi ∈ R: We may partition S into several sets, for instance into S1 = {s ∈ S | xi < 0}, S2 = {s ∈ S | xi = 0}, S3 = {s ∈ S | xi > 0}. Then node P has three children Q1, Q2 and Q3 with associated sets S1, S2 and S3.
From Slides 149 ff. recall that, in many algorithmic problems,
the feasible solutions may be identified with the subsets of a ground set of features,
which may often be regarded as the dimensions of the underlying ground set.
−→ Feature-based problem definitions and algorithms.
Each feature in the ground set naturally induces one decision: whether or not it shall be a member of the feasible solution.
Thus, every solution may be determined by a sequence of “yes” and “no” answers (one for each feature).
Decision Trees and Partition of the Solution Space
The options in a decision partition the solution space into disjoint subsets:
those solutions in which this feature is selected and
those solutions in which this feature is rejected.
The subset of solutions associated with a node of a decision tree is exactly the set of solutions which are compatible with all decisions along the unique path from the root to this node.
In other words: a node corresponds to the set of all solutions that
contain the features selected so far and
do not contain the features rejected so far.
A leaf then corresponds to a singleton, and represents exactly one element of the solution space.
For each examined node, we try to construct a certificate that there is no feasible (or optimal) solution among the leaves of the node’s subtree.
If we succeed, the subtree is excluded from the exploration as a whole.
The latter is called pruning.
Exploration “heuristically restricted”:
Deliberately skip parts of the tree (subtrees).
−→ The search for a feasible (not to mention optimal) solution may fail even if the solution space is non–empty.
STEP 1: Create a root node r representing the original problem.
Mark this node as unexplored (“not visited”).
STEP 2: WHILE there is an unexplored node DO
2a) select one unexplored node for examination, say node n, and mark this node as explored;
2b)
either solve the problem associated with n (determine infeasibility or find an optimal solution);
or decide to prune the whole subtree rooted at n;
or branch, that is, decompose the problem into subproblems and add corresponding unexplored nodes to the tree (determine the order of the subproblems in the tree);
How to select among the unexplored nodes?
−→ tree traversal strategies (to be discussed on the next slides)
Step 2b) offers many variants, for example:
Variant 1: First try to solve the problem associated with node n. If this attempt fails, then branch into subproblems.
Variant 2: Branch immediately into subproblems, unless the current node corresponds to a singleton solution.
Variant 3: Compute some additional information which helps you to decide whether you can prune the subtree. If you decide against pruning, then branch into subproblems.
How to branch? (to be discussed later)
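The two steps can be sketched as a generic skeleton. The `examine` callback stands for step 2b) and is an assumption: it returns an optional result plus a (possibly empty) list of child nodes:

```python
def explore(root, examine):
    """STEP 1 / STEP 2 skeleton: keep the set of unexplored nodes;
    repeatedly pick one (here LIFO, i.e. depth-first), let `examine`
    solve it, prune it (no result, no children), or branch."""
    unexplored = [root]
    results = []
    while unexplored:                      # STEP 2
        node = unexplored.pop()            # 2a) selection rule
        result, children = examine(node)   # 2b)
        if result is not None:
            results.append(result)
        unexplored.extend(children)        # new unexplored nodes
    return results
```

Swapping the `pop()` for a FIFO or priority-based selection yields the other traversal strategies discussed on the following slides.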
In the general algorithmic scheme, the search tree is built up dynamically, i.e., it grows step by step.
Let us now change the viewpoint and look at the whole tree which we get after completing all computations.
In which order did we visit the tree nodes?
Obviously, a tree node (except for the root) can only be visited when its immediate predecessor has been visited before.
−→ For each arc (v, w) of the decision tree, w enters the state “explored” (= “visited”) only after v.
Thus, a “reasonable” tree traversal order can be viewed as propagating a “frontier line” through the (final) tree:
In each step, one arc (v, w) with v already visited and w not yet visited is chosen.
The node w is visited.
The individual tree traversal strategies differ in the selection rule for the arc (v, w).
A node v is inserted in the stack when the search descends from v’s parent to v.
The next node to be visited is one of the children of the top element of the stack.
A node v is removed from the stack once all immediate children of v have been visited and the search ascends back from v to v’s parent.
So, at any time, the nodes in the stack form the (unique) path from the root to the current top element.
−→ If the next arc from v is always the leftmost one not yet processed, the nodes in the stack form a “frontier line” that passes from left to right through the tree.
Let (v1, w1), . . . , (vk, wk) be the arcs of the current frontier line at some stage of the tree traversal (vi = vj possible for i ≠ j; all nodes vi have been examined, all nodes wi are still unexplored).
For a node v, let h(v) be the height level of v in the decision tree.
−→ h(w) = h(v) + 1 for every arc (v, w) of the decision tree.
Choose an arc (vi, wi) such that h(vi) is minimal.
A node v is appended to the queue (added to the back) when the search descends from v’s parent to v.
The next node to be visited is one of the children of the first element of the queue.
A node v is removed from the queue when all immediate children of v have been visited.
−→ At any time, the nodes in the queue form a “frontier line” in the decision tree, which passes “horizontally” through the tree (the first few elements one height level deeper than the other elements).
A priority queue is a data structure which allows us to perform the following operations on a collection H of objects, each with an associated real number, called its key:
create-pq(H): create an empty priority queue H.
insert(H, x): insert the element x into H.
find-min(H): find and return an object x of minimum key in H.
delete-min(H): delete an object x of minimum key from H.
decrease-key(H, x, y): decrease the key of an object x in H to the new value y.
The priority queue may be implemented as a heap, for example.
It behaves much like a FIFO queue.
Difference: the top element is not the one that was inserted first but the “best” one according to some criterion that can be expressed as a numerical value for each element.
Here the numerical value of a node v is just g(v).
−→ At any time, the nodes in the queue form a “frontier line” in the decision tree, which does not follow any particular pattern (as opposed to breadth-first search).
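A best-first traversal with Python's heap-based priority queue (`heapq`) might look like this; the tree and the key function g are illustrative assumptions:

```python
import heapq

def best_first(root, children, g):
    """Visit nodes in order of their key g(v): the frontier is kept in
    a binary heap; the next node visited is always the unexplored node
    of minimum key."""
    counter = 0                        # tie-breaker: never compare nodes
    frontier = [(g(root), counter, root)]
    order = []
    while frontier:
        _, _, v = heapq.heappop(frontier)
        order.append(v)
        for w in children(v):
            counter += 1
            heapq.heappush(frontier, (g(w), counter, w))
    return order
```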
Advantages/Disadvantages of Pure Strategies
Depth first search:
May find a first feasible solution relatively soon.
Requires only small memory: the stack size is bounded by the maximal tree depth.
Best first search:
May help to find better feasible solutions faster.
Has a larger overhead per iteration.
Breadth first search:
Has huge memory requirements (eventually the whole tree has to be stored!).
Might be useful in a combined strategy: start with BFS, and then switch to best-first search. Why? In the beginning, the estimates for best-first search might not be meaningful enough to guide the search.
Solving a problem by a complete traversal of the decision tree (= systematically generating all possible solutions) is usually called a brute-force approach.
It is one of the most simple ways to solve a problem (it makes no or only little use of the problem structure),
but can be afforded only for small instances (since the decision tree usually has exponential size).
General idea:
Transform a given instance I1 of some problem class P into an instance I2 of the same problem class P such that
the size of I2 is strictly smaller than that of I1, and
if we know an optimal solution for I2, we can easily compute an optimal solution for I1.
Example: Cardinality Matching
Let G1 = (V, E) be an undirected graph in which we seek a maximum cardinality matching.
Transformation rule: Let v be a vertex of degree one in G1, and (v, w) be the incident edge. Delete the vertices v and w (and all incident edges) from G1 to obtain a graph G2.
If M2 is a maximum cardinality matching in G2, then M1 = M2 ∪ {(v, w)} is a maximum cardinality matching in G1.
Apply the transformation rule repeatedly (as long as possible).
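A minimal sketch of this repeated reduction (edges as unordered pairs given as tuples; all names illustrative):

```python
def reduce_degree_one(edges):
    """Repeatedly apply the rule: if v has degree one with incident
    edge (v, w), force (v, w) into the matching and delete v, w and
    all incident edges. Returns the forced edges and the reduced
    edge list."""
    edges = list(edges)
    forced = []
    while True:
        deg = {}
        for a, b in edges:
            deg[a] = deg.get(a, 0) + 1
            deg[b] = deg.get(b, 0) + 1
        pick = next(((a, b) for a, b in edges
                     if deg[a] == 1 or deg[b] == 1), None)
        if pick is None:
            return forced, edges
        v, w = pick
        forced.append(pick)
        edges = [(x, y) for x, y in edges
                 if v not in (x, y) and w not in (x, y)]
```

On a path, for example, the rule alone already produces a maximum matching.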
The input of the hitting set problem is a set system, which is also known as a hypergraph:
Definition
A hypergraph H is a pair (F, S) where F is a non-empty finite set and S is a family of subsets of F.
The elements of F are called vertices, the elements of S are called hyperedges.
In our concrete application, each station is a vertex, and each train route (given as a sequence of stations) represents a hyperedge.
Data Reduction Techniques in the Case Study
Simple reduction techniques apply:
If all trains that stop at station A also stop at station B, then A may be removed from the set of stations (and from the list of stops of each train).
If train A stops at all stations where train B stops, then A may be removed from the set of trains (and from the list of trains at every station).
These two techniques may be applied as often as possible, and every optimal solution to the reduced instance is still an optimal solution to the original instance.
Observation: If a station becomes isolated, but is not removed (that is, it is contained in a hyperedge with only one element), it belongs to every feasible solution to the reduced instance.
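A naive sketch of the two rules applied exhaustively (quadratic checks; `trains` maps a train name to its set of stops, and all names are illustrative):

```python
def reduce_instance(stations, trains):
    """Apply the two dominance rules until neither applies:
    remove station A if some other station B stops wherever A does;
    remove train A if its stop set contains another train's stop set."""
    stations = set(stations)
    trains = {t: set(s) for t, s in trains.items()}
    changed = True
    while changed:
        changed = False
        # Rule 1: station a is dominated by some other station b.
        for a in sorted(stations):
            containing = [s for s in trains.values() if a in s]
            if any(b != a and all(b in s for s in containing)
                   for b in stations):
                stations.discard(a)
                for s in trains.values():
                    s.discard(a)
                changed = True
                break
        if changed:
            continue
        # Rule 2: train ta stops everywhere some other train tb stops.
        for ta in sorted(trains):
            if any(tb != ta and trains[ta] >= trains[tb] for tb in trains):
                del trains[ta]
                changed = True
                break
    return stations, trains
```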
Data Reduction in Our Case Study
The data reduction maintains the optimal value.
If the reduction decomposes the hypergraph into connected components, we get an optimal solution to the entire instance by computing an optimal solution to each connected component and concatenating all of them.
If a connected component is an isolated node, this node is the optimal solution to this connected component.
The repeated application of the two reduction techniques has simplified the ICE instance to a set of isolated stations.
−→ Optimal solution found through reduction only!
The all-German-trains instance was simplified to a set of isolated stations and a few, very small connected components.
−→ For each connected component, the decision tree is small enough to be searched exhaustively.
Exactly this phenomenon occurred in all tested instances (taken from all over Europe).
A subtree of the decision tree is definitely useless...
in case of a pure feasibility problem:
if the subtree is infeasible (the solution space associated with the root of this subtree is empty),
or there is at least one feasible solution outside the union of this subtree and all subtrees cut off previously.
in case of an optimization problem:
if the subtree is infeasible (the solution space associated with the root of this subtree is empty),
or the subtree is unbounded (then the whole problem is unbounded and we can stop),
or there is at least one optimal solution (if existing!) outside the union of this subtree and all subtrees cut off previously,
or we already know a feasible solution which is better than the optimal solution within this subtree.
To cut off a subtree without losing correctness and optimality, we will compute some kind of evidence that this subtree is definitely useless.
The techniques in the following sections will basically differ by the very nature of the computation strategy.
Side remark: we will see that different traversal strategies are appropriate for the individual types of computation strategies.
Note: for NP-hard algorithmic problems, an efficient strategy cannot always determine whether a subtree is definitely useless or not.
−→ Otherwise, applying this strategy to the root of the decision tree would efficiently determine whether the instance is feasible.
Conservative Determination of Uselessness
So this means we cannot get perfectly accurate evidence within a reasonable amount of run time.
Alternatively, we will aim at a conservative strategy.
This means two outcomes are possible: “yes, definitely useless” or “don’t know”.
It goes without saying that the subtree must indeed be definitely useless whenever “yes, definitely useless” is the answer.
−→ We are on the “safe side” = conservative strategy.
The challenge is to design strategies such that “don’t know” is not too often the outcome in cases where the subtree is indeed definitely useless.
Fundamental Concept of Branch-and-Bound
Suppose we know
an upper bound U on the optimal cost value.
Note: any feasible solution s with cost value c(s) can serve as an upper bound.
Further suppose, for every node v of the decision tree, we can compute a lower bound ℓ(v) ∈ R ∪ {+∞} on the objective values of all feasible solutions in the subtree rooted at v.
If the solution set corresponding to v is empty, we set ℓ(v) := +∞.
The subtree rooted at v is useless
if ℓ(v) = +∞, because the whole subtree is infeasible (pruning by infeasibility);
if ℓ(v) > U, because no optimal solution may be in the subtree rooted at v (pruning by bound);
if ℓ(v) = c(s) for some feasible s (which we have found).
If s belongs to the subtree rooted at v, we have found an optimal solution for the whole subtree. If c(s) < U, we can update our upper bound and set U := c(s). Otherwise, the subtree may contain a solution with the same objective value, but no better solution.
In both cases, the subtree rooted at v needs no further examination (pruning by optimality).
Whenever the tree traversal encounters a leaf of the search tree (= feasible solution) s with c(s) < U, it is reasonable to replace U by c(s).
−→ This increases the chance of determining useless subtrees.
There is also a chance that the algorithm which computes a lower bound for a tree node delivers, as a side effect, a feasible solution s. Of course, if c(s) < U, we update our global upper bound, too.
If the tree traversal strategy is chosen appropriately, a bad upper bound U may soon be replaced by the cost of a better feasible solution s.
So, comparison to an abstract upper bound is restricted to the initial search steps. Afterwards, the lower bound ℓ(v) for a node v of the decision tree is compared to the cost value c(s′) of a solution s′. This upper bound becomes better and better as the search proceeds.
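The scheme above can be sketched in a few lines of Python. This is an illustrative branch-and-bound for a tiny TSP instance, not taken from the lecture; the function names, the very crude lower bound (cheapest outgoing edge per city), and the example distance matrix are our own assumptions.

```python
import math

def tsp_branch_and_bound(dist):
    """Tiny branch-and-bound for the TSP on a distance matrix.
    U (best_cost) is the cost of the best tour found so far; a subtree
    is pruned as soon as its lower bound reaches U (pruning by bound)."""
    n = len(dist)
    best_cost, best_tour = math.inf, None

    def cheapest_out(u):
        # cheapest edge leaving city u (used in the crude lower bound)
        return min(dist[u][w] for w in range(n) if w != u)

    def branch(path, visited, cost):
        nonlocal best_cost, best_tour
        if len(path) == n:                       # leaf = feasible tour
            total = cost + dist[path[-1]][path[0]]
            if total < best_cost:                # update the upper bound U
                best_cost, best_tour = total, path[:]
            return
        # lower bound l(v): partial cost plus, for every city whose
        # outgoing edge is not fixed yet, its cheapest outgoing edge
        lb = cost + cheapest_out(path[-1])
        lb += sum(cheapest_out(u) for u in range(n) if u not in visited)
        if lb >= best_cost:                      # pruning by bound
            return
        for v in range(n):
            if v not in visited:
                branch(path + [v], visited | {v}, cost + dist[path[-1]][v])

    branch([0], {0}, 0)
    return best_cost, best_tour
```

Any valid lower bound works here; a stronger one (e.g., the 1-tree bound discussed later) prunes more subtrees at the price of more work per node.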
Recall the interpretation of a decision-tree node v as a subset of the original set of all feasible solutions.
−→ The feasible solutions that obey all constraints corresponding to the tree arcs on the path from the root to v.
In other words: We need a lower bound ℓ(v) on the optimum objective value inside this subset.
General idea: We solve a relaxation of the instance at hand.
−→ The optimal objective value of the relaxed instance is a lower bound on the optimal objective value of the original instance.
Clearly, such a relaxation to compute lower bounds within a branch-and-bound approach only makes sense if the relaxed problem is significantly easier to solve than the original problem.
The general idea is to replace a difficult minimization problem by a simpler optimization problem whose optimal value is not larger than that of the original problem. To this end we may
(i) enlarge the set of feasible solutions or
(ii) replace the objective function by a function which has the same or a smaller value everywhere.
Consider an optimization problem that is formulated as a special case of INTEGER LINEAR PROGRAMMING (ILP).
Dropping the constraint that the solution be integral (and changing nothing else) yields the so-called LP-relaxation of this algorithmic problem, which is a linear programming problem (LP).
Since the LP-relaxation is indeed a relaxation, its objective value is a lower bound on the objective value of the original ILP.
LP is much more efficient to solve than ILP.
−→ See lectures on Linear Programming.
Whenever the relaxed problem is a combinatorial optimization problem, we call it a combinatorial relaxation.
Example: Symmetric traveling salesman problem (STSP)
Instance: an undirected graph G = (V, E) and edge weights c_e for e ∈ E.
Task: Find an undirected tour of minimum weight.
Idea for a relaxation: Drop the subtour elimination constraints. The relaxed problem becomes:

min Σ_{e∈E} c_e x_e
s.t. Σ_{e=(v,w)∈E} x_e = 2 for all v ∈ V
x_e ∈ {0, 1} for all e ∈ E

This problem is a so-called perfect 2-factor problem. (A perfect 2-factor of an undirected graph G = (V, E) is a subset M of E such that each vertex is incident with exactly two edges from M.)
Remark: A minimum cost perfect 2-factor can be found efficiently using matching techniques (similar to the blossom algorithm).
Another important combinatorial relaxation of the symmetric TSP is the so-called 1-tree relaxation.
A 1-tree on node set V = {v1, v2, . . . , vn} is a graph consisting of two edges adjacent to node v1, plus the edges of a spanning tree on nodes {v2, v3, . . . , vn}.
Clearly every tour is a 1-tree, and thus the value of a shortest 1-tree is a valid relaxation of the STSP.
The computation of a shortest 1-tree is easy: just find a minimum spanning tree on {2, 3, . . . , n} and add the two cheapest edges incident to vertex 1.
−→ We will discuss algorithms for finding minimum spanning trees in a more general framework later.
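The two-step recipe above is easy to code. Here is a minimal Python sketch (our own, with 0-indexed vertices, so vertex 0 plays the role of v1); it uses Prim's algorithm for the spanning-tree part.

```python
def one_tree_bound(dist):
    """Lower bound for the symmetric TSP via the 1-tree relaxation:
    a minimum spanning tree on vertices {1, ..., n-1} plus the two
    cheapest edges incident to vertex 0."""
    n = len(dist)
    # Prim's algorithm on vertices 1..n-1, started from vertex 1
    in_tree = {1}
    mst_weight = 0.0
    best = {v: dist[1][v] for v in range(2, n)}   # cheapest link into the tree
    while best:
        v = min(best, key=best.get)
        mst_weight += best.pop(v)
        in_tree.add(v)
        for w in best:
            if dist[v][w] < best[w]:
                best[w] = dist[v][w]
    # add the two cheapest edges incident to vertex 0
    e = sorted(dist[0][v] for v in range(1, n))
    return mst_weight + e[0] + e[1]
```

Since every tour is a 1-tree, the returned value never exceeds the optimal tour length.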
The next technique has been proven to be very effective in practice.
Suppose we consider the following optimization model (P):
z(P) := min c^T x
subject to Ax = b
x ∈ X

We have a vector x of decision variables, a linear objective function c^T x, and a set of explicit linear equalities Ax = b (say k equalities). Feasible solutions are further restricted to lie in a given constraint set X.
Idea: Drop the explicit linear equalities, but bring them into the objective function with associated Lagrangian multipliers λ = (λ1, λ2, . . . , λk).
For any vector λ of Lagrangian multipliers, the value
L(λ) := min{ c^T x + λ^T (Ax − b) | x ∈ X }
of the Lagrangian function is a lower bound on the objective value z(P) of the original optimization problem (P).
Proof.
Since Ax = b for every feasible solution to (P), we have for any vector λ of Lagrangian multipliers
min{ c^T x | Ax = b, x ∈ X } = min{ c^T x + λ^T (Ax − b) | Ax = b, x ∈ X }.
Since removing the constraints Ax = b from the second formulation cannot lead to an increase in the value of the objective function (the value might decrease), we have
z(P) = min{ c^T x + λ^T (Ax − b) | Ax = b, x ∈ X } ≥ min{ c^T x + λ^T (Ax − b) | x ∈ X } = L(λ). □
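The bound can be checked numerically on a toy instance. The sketch below is our own illustration (all data hypothetical): one relaxed equality, X = {0, 1}^n small enough to enumerate, so L(λ) is computed by brute force.

```python
from itertools import product

def lagrangian_bound(c, a, b, lam):
    """L(lam) = min over x in X of  c^T x + lam * (a^T x - b),
    for a single relaxed equality a^T x = b and X = {0,1}^n,
    enumerated explicitly (toy sizes only)."""
    return min(
        sum(ci * xi for ci, xi in zip(c, x))
        + lam * (sum(ai * xi for ai, xi in zip(a, x)) - b)
        for x in product((0, 1), repeat=len(c))
    )

# hypothetical data: minimize 3x1 + x2 + 2x3  s.t.  x1 + x2 + x3 = 2
c, a, b = [3, 1, 2], [1, 1, 1], 2
z = min(sum(ci * xi for ci, xi in zip(c, x))
        for x in product((0, 1), repeat=3) if sum(x) == b)   # z(P) = 3

# every multiplier yields a valid lower bound on z(P)
bounds = {lam: lagrangian_bound(c, a, b, lam) for lam in (-3, -2, -1, 0, 1)}
```

For this instance, λ = −2 even attains z(P) = 3; finding the multiplier with the best (largest) bound is the Lagrangian dual problem.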
Consider the following ILP-formulation of the symmetric TSP (given a graph G = (V, E), V = {1, . . . , n}, edge costs c_e):
Variables: x_e denotes whether edge e ∈ E is in the tour.
Use of Relaxations within Branch & Bound
We have seen different possibilities to obtain relaxations of optimizationproblems.
Methodological obstacle: The subset of the solution space corresponding to a node of the decision tree need not be the solution space of some instance of the problem.
Example: Consider the TSP as a feature-based problem.
Let v be a node of this decision tree on height h. For the first h features, a decision has been made whether to select or to reject each of them.
Let X denote the set of selected features and Y the set of rejected features.
Then v corresponds to the problem of finding an optimal round tour among all round tours that cover X and avoid Y.
This is not an instance of the pure TSP anymore!
The n–queens problem is a famous combinatorial puzzle. It will serve as our toy example to illustrate several search methods.
Given any integer n, the problem is to place n queens on an n × n chessboard so that no two queens threaten each other.
A queen threatens any other queen on the same row, column, or diagonal.
How can this problem be modeled?
Each queen must be in a different column. We introduce a variable r_i (with the domain 1 . . . n) for the queen in the i-th column indicating its row position.
A solution is feasible if and only if for all 1 ≤ i ≤ n and 1 ≤ j ≤ n with i ≠ j we have
r_i ≠ r_j (different rows) and |r_i − r_j| ≠ |i − j| (different diagonals).
There are three major drawbacks of the standard backtracking scheme:
Repeated failure for the same reason (thrashing).
Thrashing occurs because backtracking does not identify the real reason of the conflict. Thrashing can be avoided by backjumping, i.e., by a scheme in which backtracking is done directly to the variable that caused the failure.
Backtracking has to perform redundant work. Even if the conflicting values of the variables are identified, they are not remembered for immediate detection of the same conflict in a subsequent computation. Methods to resolve this problem are called backchecking or backmarking.
Backtracking detects the conflict too late, as it is not able to detect the conflict before it occurs. This can be avoided by applying consistency techniques to forward-check possible conflicts.
Consider the situation in the 8-queens problem where we have allocated the first five queens. No queen can be placed within column 6.
[Figure: an 8 × 8 board with queens placed in columns 1–5; each empty square of column 6 is labeled with the queens (1–5) that conflict with that position.]
Backtracking would backtrack to column 5 and find another row for this queen (row 8 is feasible). But then it is still impossible to place a queen in column 6.
Backjumping is more intelligent in finding the “real conflict”. The closest queen that can resolve the conflict is queen 4.
In general, backjumping goes back to the lowest level of the tree that has a conflict with each possible value for the current variable.
The simplest consistency technique is referred to as node consistency (NC).
Definition
The node representing a variable X in a constraint graph is node consistent if and only if for every value x in the current domain D_X of X, each unary constraint on X is satisfied.
A CSP is node consistent if and only if all variables are node consistent.
If the domain D_X of a variable X contains a value x that does not satisfy some unary constraint on X, this node inconsistency can simply be eliminated by removing such values from the domain D_X.
The next consistency technique considers binary constraints.
Definition
Let X, Y be two variables which occur in a binary constraint. We say the ordered pair (X, Y) is arc consistent if and only if for every value x in the current domain D_X of X there is some value y in the domain of Y such that X = x and Y = y is permitted by the binary constraint between X and Y.
A CSP is arc consistent if and only if every arc (X, Y) in the constraint graph is arc consistent.
Note: the concept of arc consistency is directional. If an arc (X, Y) is arc consistent, this does not imply that (Y, X) is also consistent.
An arc (X, Y) can be made consistent by simply deleting those values from the domain of X for which no value in the domain of Y exists such that the binary constraint between X and Y is satisfied.
To achieve overall arc consistency it is necessary to apply this reduction procedure repeatedly as long as the domain of any variable changes.
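This repeated revision loop can be sketched as follows (our own Python sketch; the data representation, with binary constraints given as predicates per directed arc, is an assumption):

```python
def make_arc_consistent(domains, constraints):
    """Repeatedly delete unsupported values until no domain changes
    (the simple iterate-until-fixpoint scheme described above).
    domains: dict var -> set of values (modified in place)
    constraints: dict (X, Y) -> predicate(x, y); both directions listed."""
    changed = True
    while changed:
        changed = False
        for (X, Y), ok in constraints.items():
            # revise arc (X, Y): every x needs a supporting y in Y's domain
            unsupported = {x for x in domains[X]
                           if not any(ok(x, y) for y in domains[Y])}
            if unsupported:
                domains[X] -= unsupported
                changed = True
    return domains
```

For the constraint X < Y with both domains {1, 2, 3}, the loop shrinks D_X to {1, 2} and D_Y to {2, 3}. More refined algorithms (e.g., AC-3) avoid re-examining arcs whose domains did not change.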
Consistency techniques are helpful to reduce the search space.
Let us now embed consistency techniques into the search algorithm.
Such schemes are usually called look-ahead strategies, and they are based on the idea of reducing the search space through constraint propagation.
Remark: backtracking can be seen as a combination of depth-first search and a fraction of arc consistency: at each node, we test arc consistency among the already instantiated variables, i.e., we check the validity of constraints considering the partial instantiation.
Forward checking is the easiest way to prevent future conflicts.
Instead of performing arc consistency between instantiated variables, it performs arc consistency between pairs of a not-yet-instantiated variable and an instantiated variable.
It maintains the invariant that for every uninstantiated variable there exists at least one value in its domain which is compatible with the values of instantiated variables.
When a value is assigned to the current variable, any value in the domain of a “future” variable which conflicts with this assignment is (temporarily) removed from the domain.
More precisely, values are removed for the whole subtree rooted at the current instantiation (but of course not for other branches of the search tree!).
Note: whenever a new variable is considered, all its remaining values are guaranteed to be consistent with the past variables.
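Forward checking for the n-queens model can be sketched like this (our own Python sketch, 0-indexed; a fresh domain list per branch plays the role of the temporary removal):

```python
def n_queens_fc(n):
    """n-queens with forward checking: after assigning column `col`,
    conflicting rows are removed from the domains of all future
    columns; a branch is abandoned as soon as a domain becomes empty."""
    count = 0
    def search(col, domains):
        nonlocal count
        if col == n:
            count += 1
            return
        for row in domains[col]:
            # prune the domains of the not-yet-instantiated columns
            pruned = [
                {r for r in domains[j] if r != row and abs(r - row) != j - col}
                for j in range(col + 1, n)
            ]
            if all(pruned):                    # no future domain wiped out
                search(col + 1, domains[:col + 1] + pruned)
    search(0, [set(range(n)) for _ in range(n)])
    return count
```

It finds the same solutions as plain backtracking, but dead branches are abandoned as soon as some future column has no feasible row left.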
The variable ordering can noticeably change the efficiency of the search.
What variable ordering should be chosen in general?
Similar to our discussion for branch-and-bound, there are several heuristics:
Rule 1: Prefer the variables with the smallest domain.
Rationale: To succeed, try first where you are most likely to fail. If failure is inevitable, then the sooner we discover it, the better.
Rule 2: In case of a tie, prefer the variable with more constraints to instantiated variables.
Rationale: Deal with the hard cases first: they can only get more difficult.
Dynamic programming is an exact optimization method which solves a problem by combining the solutions to subproblems.
A dynamic programming algorithm solves every subproblem just once and saves its answer in a table, thereby avoiding the work of recomputing the answer to subproblems which have already been solved in an earlier step.
In contrast, a divide-and-conquer algorithm would repeatedly solve common subproblems (does more work than necessary).
Example I: Matrix Chain Multiplication
Example: Compute the matrix product
A1A2A3
where A1 is a 10× 100, A2 a 100× 5, and A3 a 5× 50 matrix.
Assumption: We use a straightforward matrix multiplication algorithm which requires pqr scalar multiplications to multiply matrix A of dimension p × q with matrix B of dimension q × r.
A product of matrices is fully parenthesized if it is either
a single matrix or
the product of two fully parenthesized matrix products, surrounded by parentheses.
The matrix chain multiplication problem:
Input: Given n matrices A1, A2, . . . , An where matrix Ai has dimension p_{i−1} × p_i for i = 1, . . . , n.
Task: Fully parenthesize the product A1 · A2 · · · An in a way that minimizes the number of scalar multiplications.
Step 1: (structure of an optimal solution)
An optimal parenthesization of the product A1 · A2 · · · An splits the product between Ak and Ak+1 for some integer k in the range 1 ≤ k < n.
Key observation: The prefix subchain A1 · A2 · · · Ak within the optimal parenthesization of A1 · A2 · · · An must be an optimal parenthesization of A1 · A2 · · · Ak (a similar property holds for the suffix subchain).
Step 2: (recursive solution)
Let m[i, j] be the minimum number of scalar multiplications needed to compute the matrix product Ai · · · Aj. The following recursion holds:

m[i, j] = 0 if i = j
m[i, j] = min_{i ≤ k < j} { m[i, k] + m[k+1, j] + p_{i−1} p_k p_j } if i < j
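The recursion can be filled in bottom-up by increasing chain length. A Python sketch (our own; 1-based table indices as on the slide): for the introductory example with dimensions p = (10, 100, 5, 50), the orders differ drastically, since (A1 A2) A3 costs 10·100·5 + 10·5·50 = 7500 scalar multiplications while A1 (A2 A3) costs 100·5·50 + 10·100·50 = 75000.

```python
def matrix_chain_order(p):
    """Bottom-up DP for the recursion above; matrix A_i has dimension
    p[i-1] x p[i].  Returns m[1][n], the minimum number of scalar
    multiplications for the whole chain."""
    n = len(p) - 1                            # number of matrices
    m = [[0] * (n + 1) for _ in range(n + 1)] # m[i][i] = 0
    for length in range(2, n + 1):            # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                          for k in range(i, j))
    return m[1][n]
```

Recording the minimizing k in a second table s[i, j] allows the optimal parenthesization itself to be reconstructed, as in the slide's Step 4.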
Step 4: (construct the optimal solution)
The following recursive procedure computes the matrix chain product Ai · · · Aj given the matrices A = (A1, . . . , An), the s[i, j] table computed in Step 3, and the indices i and j.
Pairwise sequence alignment or inexact matching is the problem of comparing two sequences while allowing certain mismatches between them.
Motivation: Mutation in DNA is a natural evolutionary process. DNA replication errors cause substitutions, insertions and deletions of nucleotides. This can be seen as “editing” DNA strings over the alphabet Σ = {A, C, G, T}.
Similarity can be the clue
to common evolutionary origins, or
to common function.
Example: Input strings: s = GCATCAGC and t = CAATAAGGCG
Alignment of s and t:
G C A − T C A G − C −
− C A A T A A G G C G
We are now ready to state the fundamental problem of pairwise sequence alignment.
Problem (Pairwise global sequence alignment)
Input: Strings s = s0 s1 · · · s_{n−1} ∈ Σ^n and t = t0 t1 · · · t_{m−1} ∈ Σ^m, and a distance measure d based on a cost function c.
Output: An optimal alignment of s and t.
To solve this problem, we will use the concept of dynamic programming.
Needleman and Wunsch (1970) were the first to apply dynamic programming to this problem.
Dynamic Programming for Sequence Alignment
Define a table (matrix) D(·, ·) of dimension (n + 1) × (m + 1). The entry D(i, j) denotes the cost of an optimal alignment for the substrings s0 s1 · · · s_{i−1} and t0 t1 · · · t_{j−1}, where 0 ≤ i ≤ n and 0 ≤ j ≤ m.
The following recursion holds for 1 ≤ i ≤ n and 1 ≤ j ≤ m:

D(i, j) = min{ D(i − 1, j) + c(s_{i−1}, −),
               D(i − 1, j − 1) + c(s_{i−1}, t_{j−1}),
               D(i, j − 1) + c(−, t_{j−1}) }
Dynamic Programming and the Edit Graph
The recursion from the previous slide can be interpreted as a shortest path problem in a so-called edit graph G. This graph is a directed acyclic graph, and it is constructed as follows:
With each entry D(i, j) in the table D, we associate a vertex (i, j).
For each pair (i, j) with 1 ≤ i ≤ n and 1 ≤ j ≤ m, there are the following three arcs:
(i − 1, j) → (i, j) with length c(s_{i−1}, −),
(i − 1, j − 1) → (i, j) with length c(s_{i−1}, t_{j−1}),
(i, j − 1) → (i, j) with length c(−, t_{j−1}).
Finally, we add the arcs
(i − 1, 0) → (i, 0) with length c(s_{i−1}, −) for 1 ≤ i ≤ n, and
(0, j − 1) → (0, j) with length c(−, t_{j−1}) for 1 ≤ j ≤ m.
A shortest path from (0, 0) to (n, m) corresponds to an optimal alignment.
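Filling the table row by row computes exactly this shortest path. A minimal Python sketch (our own; the gap symbol '-' and the function name are assumptions):

```python
def align_cost(s, t, c):
    """Cost of an optimal global alignment of s and t, filling the
    table D row by row -- equivalently, a shortest (0,0)->(n,m) path
    in the edit graph.  c(a, b) is the cost of aligning symbol a with
    symbol b; the gap symbol is '-'."""
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                       # boundary arcs (column 0)
        D[i][0] = D[i - 1][0] + c(s[i - 1], '-')
    for j in range(1, m + 1):                       # boundary arcs (row 0)
        D[0][j] = D[0][j - 1] + c('-', t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + c(s[i - 1], '-'),       # deletion
                          D[i - 1][j - 1] + c(s[i - 1], t[j - 1]),  # (mis)match
                          D[i][j - 1] + c('-', t[j - 1]))       # insertion
    return D[n][m]
```

With unit costs (0 for a match, 1 otherwise) this is the classical edit distance; keeping back-pointers would recover the alignment itself.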
Rough answer: the simplest imaginable strategy for a heuristic restriction of the search space.
Explore exactly one branch of the decision tree. At each node, choose the option that looks “most promising” at this moment (without foresight = “greedily”).
Due to the second point, this strategy is called the greedy algorithm.
However, the term “greedy algorithm” is often used in a more restrictive way: for certain special cases of this general algorithmic scheme only.
Maximization/Minimization Problem for Independence Systems
MAXIMIZATION PROBLEM FOR INDEPENDENCE SYSTEMS
Instance: An independence system (E, F) and c : E → R.
Task: Find an X ∈ F such that c(X) := Σ_{e∈X} c(e) is maximum.
MINIMIZATION PROBLEM FOR INDEPENDENCE SYSTEMS
Instance: An independence system (E, F) and c : E → R.
Task: Find a basis B ∈ F such that c(B) := Σ_{e∈B} c(e) is minimum.
Remark: The set F is usually not given by an explicit list of its elements. We usually assume to have an oracle which, given a subset F ⊆ E, decides whether F ∈ F.
Examples of Optimization Problems for Independence Systems
Many combinatorial optimization problems can be formulated as maximization or minimization problems for independence systems:
Example 1: MAXIMUM WEIGHT STABLE SET PROBLEM
Given a graph G with vertex set V(G) and weights c : V → R, find a stable set X in G of maximum weight.
Here E = V(G) and F = {F ⊆ E : F is stable in G}.
Example 2: TSP
Given a complete undirected graph G and weights c : E(G) → R+, find a minimum weight Hamiltonian circuit in G.
Here E = E(G) and F = {F ⊆ E : F is a subset of a Hamiltonian circuit in G}.
Example 3: SHORTEST PATH PROBLEM
Given a digraph D = (V, A), c : A → R and s, t ∈ V such that t is reachable from s, find a shortest s-t–path in D with respect to c.
Here E = A and F = {F ⊆ E : F is a subset of an s-t–path}.
We consider the maximization problem for independence systems.
Input: An independence system (E, F), given by an independence oracle (i.e., an oracle which, given a set F ⊆ E, decides whether F ∈ F or not), weights c : E → R+.
Output: A set F ∈ F.
STEP 1: Sort E = {e1, e2, . . . , en} such that c(e1) ≥ c(e2) ≥ · · · ≥ c(en).
STEP 2: Set F := ∅.
STEP 3: FOR i = 1 TO n DO
IF F ∪ {ei} ∈ F THEN set F := F ∪ {ei}.
Remark: We do not have to consider negative weights since elements with negative weight never appear in an optimum solution.
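The three steps above can be sketched generically in Python (our own sketch; the oracle is passed as a predicate). As an illustration we use the graphic matroid, where independence means acyclicity, checked here with a tiny union-find:

```python
def best_in_greedy(elements, c, independent):
    """BEST-IN-GREEDY for the maximization problem: scan the elements
    by decreasing weight and keep one whenever the independence oracle
    accepts the enlarged set."""
    F = set()
    for e in sorted(elements, key=c, reverse=True):   # STEP 1
        if independent(F | {e}):                      # STEP 3 (oracle call)
            F = F | {e}
    return F

def is_forest(edges):
    """Independence oracle for the graphic matroid: the edge set is
    independent iff it is acyclic (union-find cycle check)."""
    parent = {}
    def find(v):
        while parent.get(v, v) != v:
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # edge closes a cycle
        parent[ru] = rv
    return True
```

On a triangle with edge weights 5, 4, 3, the algorithm keeps the two heaviest edges and rejects the third, returning the maximum-weight forest.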
Let us now consider the minimization problem for independence systems. The following greedy algorithm requires a more complicated oracle: given a set F ⊆ E, the oracle decides whether F contains a basis (basis superset oracle).
Input: An independence system (E, F), given by a basis superset oracle, weights c : E → R.
Output: A basis F of (E, F).
STEP 1: Sort E = {e1, e2, . . . , en} such that c(e1) ≥ c(e2) ≥ · · · ≥ c(en).
STEP 2: Set F := E.
STEP 3: FOR i = 1 TO n DO
IF F \ {ei} contains a basis THEN set F := F \ {ei}.
How Good is the Greedy Algorithm?
In general, the solution delivered by the greedy algorithm can be quite poor.
In Section 3, we have seen that the simple local-search strategy is guaranteed to deliver an optimal solution if the problem fulfills a certain structural property (exact neighborhood).
So an interesting question is whether the greedy algorithm provably delivers optimal solutions if some structural property is fulfilled.
The remainder of this section is devoted to a characterization of this for the case of independence systems. This leads us to so-called matroids.
An independence system (E, F) is a matroid if
(M3) If X, Y ∈ F and |X| > |Y|, then there is an x ∈ X \ Y with Y ∪ {x} ∈ F.
The name matroid points out that this structure is a generalization of matrices.
Example 1: Matric matroid
E is the set of columns of a matrix A over some field, and F := {F ⊆ E : the columns in F are linearly independent over that field}.
Example 2:
E is a finite set, k an integer and F := {F ⊆ E : |F| ≤ k}.
Example 3: Graphic matroid
E is the set of edges of some undirected graph G = (V, E) and F := {F ⊆ E : the subgraph (V, F) is a forest}.
Proof (that (M3) is fulfilled):
Let X, Y ∈ F and suppose Y ∪ {x} ∉ F for all x ∈ X \ Y. We have to show that |X| ≤ |Y|.
For each edge x = {v, w} ∈ X, v and w are in the same connected component of (V, Y) (by our assumption).
Hence each connected component of (V, X) is a subset of a connected component of (V, Y).
So the number p of connected components of the forest (V, X) is greater than or equal to the number q of connected components of (V, Y).
Since p = |V| − |X| and q = |V| − |Y|, this implies |X| ≤ |Y|. □
Matroids and the Greedy Algorithm
Theorem
Let (E ,F) be an independence system.
(E, F) is a matroid if and only if the BEST-IN-GREEDY algorithm finds an optimal solution for the maximization problem for (E, F, c) for every cost function c : E → R+.
Proof: “⇒”: Suppose first that (E, F) is a matroid.
Let F = {e1, . . . , ek} be the solution constructed by the greedy algorithm.
Suppose for a contradiction that there is an independent set F′ = {f1, . . . , fℓ} such that

c(F) = Σ_{i=1}^{k} c(ei) < Σ_{j=1}^{ℓ} c(fj) = c(F′).

Since c(·) ≥ 0, we may assume that F′ is a basis.
The output F of the greedy algorithm is also a basis.
By the theorem from Slide 325, |F| = |F′|, and thus k = ℓ.
“⇐”: We assume that the independence system is not a matroid and have to show that the greedy algorithm will fail to produce an optimal solution for at least one choice of c.
So suppose there are independent sets F1, F2 such that |F1| = p and |F2| = p + 1, but F1 ∪ {e} ∉ F for any e ∈ F2 \ F1.
We may assume that p ≥ 2.
We will construct a particular objective function c such that the greedy algorithm will fail to produce an optimal solution for c.
Let us consider the following cost function c on E:

c(e) = p + 2 if e ∈ F1,
c(e) = p + 1 if e ∈ F2 \ F1,
c(e) = 0 otherwise.

We observe that F1 is suboptimal, because
c(F2) ≥ (p + 1)² > p(p + 2) = c(F1).
The greedy algorithm, if applied to this instance, will start by picking all elements of F1.
Afterwards, it will not improve the total weight, because for all other elements either F1 ∪ {e} ∉ F, or otherwise c(e) = 0.
Hence, we have proved that the greedy algorithm fails to produce an optimal solution for at least one cost function if the independence system is not a matroid. □
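The construction in the proof can be replayed on a concrete toy instance. The independence system below is our own hypothetical example (not from the lecture), built so that (M3) fails, with costs chosen exactly as in the proof for p = 2:

```python
# Hypothetical independence system that is not a matroid:
# E = {a, b, x, y, z}; the independent sets are exactly the subsets
# of F1 = {a, b} and the subsets of F2 = {x, y, z}.  (M3) fails for
# X = F2, Y = F1: no element of F2 \ F1 can be added to F1.
F1, F2 = {'a', 'b'}, {'x', 'y', 'z'}
cost = {'a': 4, 'b': 4, 'x': 3, 'y': 3, 'z': 3}   # p+2 on F1, p+1 on F2 \ F1

def independent(S):
    return S <= F1 or S <= F2

# BEST-IN-GREEDY: scan elements by decreasing weight
greedy = set()
for e in sorted(cost, key=cost.get, reverse=True):
    if independent(greedy | {e}):
        greedy |= {e}

# greedy collects F1 (weight p(p+2) = 8) and then gets stuck,
# while the optimum F2 has weight (p+1)^2 = 9
```

The run matches the proof step by step: the heavy elements of F1 are picked first, and afterwards no element of F2 can be added.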
Application to Minimum Spanning Trees
Consequence: The BEST-IN-GREEDY applied to (E, F, c̄), with the transformed cost function c̄, solves the original MINIMIZATION PROBLEM for (E, F, c) to optimality.
Note: Working with c̄ instead of c only reverses the element order used within the BEST-IN-GREEDY algorithm.
Application:
We have seen that the minimum spanning tree problem on a graph G = (V, E) with edge costs c can be formulated as a minimization problem over an independence system: E is the edge set of an undirected graph and F is the set of forests in G.
We have verified that (E, F) is a matroid.
With the above remarks we conclude that the BEST-IN-GREEDY applied to the elements ordered by increasing weight yields an optimal solution.
This algorithm is well known as Kruskal's algorithm for MINIMUM SPANNING TREE.
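Spelled out, Kruskal's algorithm is the BEST-IN-GREEDY on the graphic matroid with edges sorted by increasing weight. A minimal Python sketch (our own; union-find serves as the independence oracle):

```python
def kruskal(n, edges):
    """Kruskal's algorithm for MINIMUM SPANNING TREE.
    edges: list of (weight, u, v) with vertices 0..n-1.
    Returns (total weight, list of tree edges)."""
    parent = list(range(n))
    def find(v):                     # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    total, tree = 0, []
    for w, u, v in sorted(edges):    # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # adding (u, v) keeps the set a forest
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree
```

The `ru != rv` test is exactly the matroid independence check: the edge is kept iff it does not close a cycle.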
Greedy Algorithm for Start Solutions
The greedy algorithm is often useful to generate a first start solution for local search.
Example: the cardinality matching problem.
Greedy matching:
1. Set M = ∅.
2. For each edge e ∈ E (in an arbitrary order) do:
If M ∪ {e} is a matching, insert e into M.
3. Deliver the final M.
Very often, the greedy matching is not much smaller than a maximum matching.
With a greedy matching at hand, we can apply local search with an exact neighborhood (i.e., search for augmenting paths) afterwards.
Such a heuristic may speed up the search for a maximum matching considerably (as the remaining number of iterations is relatively small).
Matching is but one example of a greedy start solution; we can use the greedy algorithm on any independence system to compute a (hopefully) good inclusion-maximal independent set.
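The greedy matching above fits in a few lines of Python (our own sketch; edges given as vertex pairs):

```python
def greedy_matching(edges):
    """Greedy matching: scan the edges in the given order and keep an
    edge whenever both of its endpoints are still unmatched."""
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched |= {u, v}
    return M
```

On the path 0-1-2-3, scanning the edges in path order yields a maximum matching of size 2, while starting with the middle edge (1, 2) yields only size 1; in general a greedy matching has at least half the size of a maximum matching, which is why it is a good starting point for augmenting-path local search.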
Let P be an optimization problem, I an instance of P.
OPT (I ) denotes the objective value of an optimal solution.
APP(I ) denotes the objective value delivered by an algorithm A.
Even if we cannot solve a problem to optimality, the ideal case would be to find a solution which is guaranteed to differ from the optimum only by a (small) constant:
Definition
A polynomial-time algorithm A for an optimization problem P is called an absolute approximation algorithm if there exists a constant k such that
|APP(I) − OPT(I)| ≤ k for all instances I of P.
Double-Tree-Algorithm:
STEP 1: Find a minimum spanning tree T in G with respect to c.
STEP 2: Create a multigraph T′ by using two copies of each edge of T.
STEP 3: Find an Eulerian walk in T′.
STEP 4: Transform this walk into a tour by taking shortcuts.
Analysis of the Double-Tree-Algorithm
Theorem
The Double-Tree-Algorithm is a factor 2 approximation algorithm for the METRIC TSP.
Proof:
The length c(E(T)) of a minimum spanning tree is certainly a lower bound for the length OPT of an optimal tour (since by deleting one edge from any tour we get a spanning tree).
Therefore, c(E(T′)) = 2 · c(E(T)) ≤ 2 · OPT.
In STEP 4, we transform an Eulerian walk of length c(E(T′)) into a tour.
The tour is defined by the order in which the vertices appear in this walk: we ignore all but the first occurrence of a vertex.
The triangle inequality implies that this tour is no longer than c(E(T′)). □
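The four steps condense nicely in code, because for a tree the Eulerian walk of the doubled edges with shortcuts is exactly a depth-first preorder traversal. A minimal Python sketch (our own; Prim's algorithm for STEP 1, preorder for STEPs 2–4):

```python
import math

def double_tree(dist):
    """Double-Tree heuristic for the metric TSP: build an MST with
    Prim's algorithm, then output its preorder traversal -- the
    Eulerian walk of the doubled tree with first-occurrence shortcuts."""
    n = len(dist)
    children = {v: [] for v in range(n)}
    best = {v: (dist[0][v], 0) for v in range(1, n)}   # (key, predecessor)
    while best:                                         # Prim from vertex 0
        v = min(best, key=lambda u: best[u][0])
        _, p = best.pop(v)
        children[p].append(v)                           # tree edge p - v
        for w in best:
            if dist[v][w] < best[w][0]:
                best[w] = (dist[v][w], v)
    tour, stack = [], [0]                               # preorder traversal
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

# four points in the plane (metric by construction)
pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
tour = double_tree(dist)
length = sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
```

By the analysis above, the resulting tour is at most twice the optimum on any metric instance; here it even finds the optimal square tour of length 4.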
No, not at all - but almost the end of this lecture!
Ideally, we would like to get arbitrarily close to the optimum solution (in polynomial time).
Thus, for any ε > 0, we would like to have an algorithm Aε which delivers a (1 + ε)-approximation in polynomial time in the input length of the given instance.
Such a family of algorithms Aε is called a polynomial-time approximation scheme (PTAS).
For many problems there is such a PTAS (but not for METRIC TSP).
The interested reader is referred to the books by Hochbaum (ed.) and Vazirani (confer Slide 10).