
    ADVANCED OPERATIONS RESEARCH

    Yves Crama

    HEC Management School, University of Liege

    January 2014


Contents

1 Introduction

2 Combinatorial optimization and computational complexity
  2.1 Examples
    2.1.1 The shortest path problem
    2.1.2 The Chinese postman problem
    2.1.3 The traveling salesman problem
    2.1.4 The 0-1 linear programming problem
    2.1.5 The graph equipartitioning problem
    2.1.6 The graph coloring problem
    2.1.7 Combinatorial optimization in practice
    2.1.8 Exercises
  2.2 A glimpse at computational complexity
    2.2.1 Computational performance criteria
    2.2.2 Problems and problem instances
    2.2.3 Easy and hard problems
    2.2.4 Exercises

3 Heuristics for combinatorial optimization problems
  3.1 Introduction
  3.2 Reformulation, rounding and decomposition
  3.3 List-processing heuristics
    3.3.1 Exercises
  3.4 Neighborhoods and neighbors
    3.4.1 Some examples
    3.4.2 Exercises
  3.5 Steepest descent
    3.5.1 Initialization
    3.5.2 Local minima
    3.5.3 Choice of neighborhood structure
    3.5.4 Selection of neighbor
    3.5.5 Fast computation of the objective function
    3.5.6 Flat objective functions
    3.5.7 Exercises
  3.6 Simulated annealing
    3.6.1 The simulated annealing metaheuristic
    3.6.2 Choice of the transition probabilities
    3.6.3 Stopping criteria
    3.6.4 Implementing the SA algorithm
    3.6.5 Variants of the SA
  3.7 Tabu search
    3.7.1 Introduction
    3.7.2 The algorithm
    3.7.3 Example
  3.8 Genetic algorithms
    3.8.1 Introduction
    3.8.2 Diversification via crossover
    3.8.3 A basic genetic algorithm
    3.8.4 Intensification and local search
    3.8.5 Encodings
    3.8.6 Implementing a genetic algorithm

4 Modeling languages for mathematical programming
  4.1 Introduction

5 Integer programming
  5.1 Introduction
    5.1.1 Integer programming models
    5.1.2 Exercises
  5.2 Branch-and-bound method
    5.2.1 Partitioning
    5.2.2 Evaluation
    5.2.3 Heuristic solution
    5.2.4 Tight formulations
    5.2.5 Some final comments

6 Neural networks
  6.1 Feedforward neural networks
  6.2 Neural networks as computing devices
  6.3 Neural networks as function approximation devices
  6.4 Unconstrained nonlinear optimization
    6.4.1 Minimization problems in one variable: introduction
    6.4.2 Equations in one variable
    6.4.3 Minimization problems in one variable: algorithms
    6.4.4 Multivariable minimization problems
  6.5 Application to NN design: the backpropagation algorithm
    6.5.1 Extensions of the delta rule
    6.5.2 Model validation
  6.6 Applications
  6.7 Notes on PROPAGATOR software
    6.7.1 Input files
    6.7.2 Menus
    6.7.3 Main window

7 Cases
  7.1 Container packing at Titanic Corp.
  7.2 Stacking boxes at Gizeh Inc.
  7.3 A high technology routing system for Meals-on-Wheels
  7.4 Operations scheduling in Hobbitland
  7.5 Setup optimization for the assembly of printed circuit boards
  7.6 A new product line for Legiacom

Bibliography


    Chapter 1

    Introduction

The aim of the course Advanced Operations Research is to present several perspectives on mathematical modeling and problem-solving strategies as they are used in operations research.

The course contains several independent parts, namely:

- general-purpose heuristic strategies for the solution of combinatorial optimization problems, such as simulated annealing, tabu search or genetic algorithms;
- learning of a modeling language, i.e., a computer language specially devoted to the formulation, the solution and the analysis of large-scale optimization models (linear or nonlinear programming problems);
- an introduction to mixed integer programming models and algorithms;
- other numerical methods, as time allows: neural networks, simulation, ...

These lecture notes are a preliminary draft of the material usually covered in the course. They concentrate mostly on combinatorial optimization heuristics, on mixed integer programming methods and on neural networks. Modeling languages are handled more superficially, as this topic is mostly illustrated through the development of numerical models in the computer lab.

The course assumes that the reader has had a first introduction to operations research and has some

    elementary knowledge of mathematical modeling, of mathematical programming and of graph theory.

    Special thanks are due to Jean-Philippe Peters who drafted the first version of these classroom notes.



Chapter 2

Combinatorial optimization and computational complexity: Basic notions

The generic combinatorial optimization (CO) problem is

minimize {F(x) | x ∈ X}   (2.1)

where X is a finite (or at least, discrete¹) set of feasible solutions and F is a real-valued objective function defined on X. Of course, if X is given in extension, i.e., by a complete explicit list of its elements, then solving (CO) is quite easy: it suffices to compute the value of F(x) for all elements x ∈ X and to retain the best element. But when X is defined implicitly rather than in extension, the problem may become much harder.

¹ Intuitively, a set is discrete if it does not contain any continuous subset.

    2.1 Examples

    2.1.1 The shortest path problem

Nowadays, lots of commercial software products allow you to select effortlessly the shortest possible route from your current location to a chosen destination (for example, from Liege to Torremolinos). The optimization problem that has to be solved whenever you address a query to the system can be modeled as follows.

There is a graph G = (V, E), where V is a finite set of elements called vertices and E is a collection of pairs of vertices called edges (think of V as a list of geographical locations and of E as a road network; see e.g. Figure 2.1 for a representation). Assume that every edge e ∈ E has a nonnegative length ℓ(e), and let s and t be two vertices of G. The shortest path problem is to find a path (a connected sequence of edges) through the graph that starts at s and ends at t, and which has the shortest possible total length. This is clearly a CO problem, where X is the (finite) set of all paths from s to t and F(x) is the total length of path x. Note that the cardinality of X can be of the same order of magnitude as 2^{|V|}, which is quite large as compared to the size of the graph.

Figure 2.1: A graph with 6 vertices and 8 edges

    2.1.2 The Chinese postman problem

This problem is similar to the shortest path problem, except that we consider here the additional constraint that every edge of G should be traversed exactly once by the path from s to t (the postman has to visit every street in his district). It is also usual to assume that s = t in this problem (the postman returns to the depot at the end of the day).

Besides its postal illustration, this model has applications in a variety of vehicle routing situations (garbage collection, snow plowing, street cleaning, etc.) and in the design of automatic drawing software.


    2.1.3 The traveling salesman problem

The traveling salesman problem (denoted TSP) is again similar to the shortest path problem, with the added requirement that every vertex should be visited exactly once by the path from s to t: the salesman must visit each and every customer (located in the cities in V) along the way. In the sequel, we shall always assume that G is a complete graph (i.e., it contains all possible edges) and that s = t. Thus, we speak of a traveling salesman tour rather than path. Then, X can simply be viewed as the set of all permutations of the elements of V, and |X| = |V|!. For instance, if |V| = 30, then |V|! is roughly 2.65 × 10³².

This famous combinatorial optimization problem has numerous applications, either in its pure form or as a subproblem of more complex models. It arises for instance in many production scheduling settings (sequencing of tasks on a single machine when the setup time between two successive tasks depends on the identity of these tasks, sequencing of drilling operations in metal sheets, sequencing of component placement operations for the assembly of printed circuit boards, etc.) and in various types of vehicle routing models (truck delivery problems, mail pickup, etc.).

    2.1.4 The 0-1 linear programming problem

    We can express the 0-1 LP problem as

min cx
subject to Ax ≥ b and x ∈ {0, 1}ⁿ

where c ∈ ℝⁿ, b ∈ ℝᵐ and A ∈ ℝ^{m×n} are the parameters (or numerical data) of the problem and x ∈ ℝⁿ is a vector of (unknown) decision variables. Note that, if we drop the constraint x ∈ {0, 1}ⁿ, then the problem is simply a linear programming problem which can be solved by a variety of efficient algorithms (e.g., the simplex method or an interior-point method). However, the requirement that x ∈ {0, 1}ⁿ leads to a (much harder) CO problem where X = { x ∈ {0, 1}ⁿ : Ax ≥ b }. The cardinality of this set, although finite, is potentially as large as 2ⁿ (when n = 30, this is approximately 10⁹).

The knapsack problem is the special case of 0-1 LP with only one inequality constraint:

max cx
subject to ax ≤ b and x ∈ {0, 1}ⁿ

where a, c ∈ ℝⁿ₊ and b ∈ ℝ. The usual interpretation of this problem is that the indices i = 1, 2, . . . , n denote n objects that a hiker may want to carry in her knapsack, cᵢ is the utility of object i, aᵢ is its weight and b is the maximum weight that the hiker is able to carry.
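To make the definition concrete, here is a minimal Python sketch (ours, not part of the syllabus) that solves a knapsack instance by complete enumeration of {0, 1}ⁿ; the data in the example are made up for illustration.

```python
from itertools import product

def knapsack_brute_force(c, a, b):
    """Enumerate all 2^n binary vectors and keep the best feasible one.
    Only practical for small n: the running time grows like 2^n."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(c)):
        if sum(ai * xi for ai, xi in zip(a, x)) <= b:   # feasibility: ax <= b
            val = sum(ci * xi for ci, xi in zip(c, x))
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Hypothetical data: 3 objects, capacity 5.
print(knapsack_brute_force(c=[3, 4, 5], a=[2, 3, 4], b=5))  # ((1, 1, 0), 7)
```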


Figure 2.2: A feasible coloring (vertices colored R, Y, G, B)

2.1.7 Combinatorial optimization in practice

Combinatorial optimization models arise in a broad variety of practical applications; see, for instance, Applegate, Bixby, Chvatal, and Cook (2006), Barnhart, Johnson, Nemhauser, Sigismondi, and Vance (1993), Bartholdi, Platzman, Collins, and Warden (1983), Bollapragada, Cheng, Phillips, Garbiras, Scholes, Gibbs, and Humphreville (2002), Crama, van de Klundert, and Spieksma (2002), Crama, Oerlemans, and Spieksma (1996), Jain, Johnson, and Safai (1996), Glover and Laguna (1997), Kohli and Krishnamurti (1987), Moonen and Spieksma (2003), Oliveira, Ferreira, and Vidal (1993), Tyagi and Bollapragada (2003), etc.

    2.1.8 Exercises.

    Exercise 1. Consider the Meals-on-Wheels case in Section 7.3. Explain the similarities that this problem

    shares with the traveling salesman problem, as well as the differences between the problems.

    2.2 A glimpse at computational complexity

In order to fully appreciate the field of combinatorial optimization, it is necessary to understand, at least at an intuitive level, some of the basic concepts of computational complexity. This part of theoretical computer science deals with fundamental, but extremely deep questions like: What tasks can be carried out by a computer? How much time does a given computational task require?

In this section, we attempt to introduce some elements of computational complexity, in a very informal and hand-waving way. We refer the interested reader to Tovey (2002) for a more formal tutorial, and to Papadimitriou and Steiglitz (1982) for a rigorous treatment of the topic.


    2.2.1 Computational performance criteria

What do we expect from a CO algorithm? An obvious answer is that the algorithm should always return an optimal solution of the problem. Is that the only game in town? Certainly not. We might also want it to be fast, or efficient. Combining these two expectations is the crucial issue. Indeed, the time required to solve a problem logically increases with the size of the problem, where the size can be measured by the amount of data needed to describe a particular instance of the problem.

Let us take a look at an example. Suppose that we want to solve a 0-1 linear programming problem involving n variables xⱼ ∈ {0, 1}, j = 1, . . . , n. We can certainly find an optimal solution by listing all possible vectors (x₁, x₂, . . . , xₙ), by checking for each of them whether it is feasible or not, by computing the value of the objective function for each such feasible solution, and by retaining the best solution found in the process. If we decide to go that way, then we must consider 2ⁿ vectors. For n = 50, that means 2⁵⁰ ≈ 10¹⁵ = 1,000,000,000,000,000 vectors! If our algorithm is able to enumerate one million (1,000,000) solutions per second, the whole procedure takes about 10⁹ seconds, or about 30 years. And for n = 60, the enumeration of the 2⁶⁰ solutions would take about 30,000 years!

Note that adding 10 variables to the problem increases the computing time by a multiplicative factor of 2¹⁰ ≈ 1,000. So, with n = 80 variables (a rather modest problem size), the same algorithm would run for 30 billion years, which is about twice the age of the universe. Not really efficient, by any practical standards...
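A quick back-of-the-envelope check of these figures in Python (assuming, as in the text, one million solutions examined per second):

```python
RATE = 10**6              # solutions enumerated per second (assumption)
YEAR = 3600 * 24 * 365    # seconds per year

for n in (50, 60, 80):
    print(f"n = {n}: about {2**n / RATE / YEAR:.0e} years")
# n = 50: about 4e+01 years; n = 60: about 4e+04 years; n = 80: about 4e+10 years
```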

Let us look at this issue from another vantage point. Consider the well-known Moore's law: Gordon Moore, co-founder of the chip giant Intel, predicted in 1965 that the number of transistors per square inch on integrated circuits would double every 18 months, starting from 1962, the year the integrated circuit was invented (see the original paper of Moore (1965) for more details). In other words, your PC processor works twice as fast every year and a half, meaning that its speed is multiplied by 100 in 10 years.² So, if you were able to enumerate 2ⁿ solutions in one hour in 1997, you could enumerate 100 × 2ⁿ solutions in one hour in 2007.



1. Matrix addition problem:
   Instance size: 2n².
   Algorithm: any naive addition algorithm.
   Running time: n² (additions). We denote this by O(n²), meaning that the running time grows at most like n².

2. Shortest path problem:
   Instance size: O(n²), where n = |V|.
   Algorithm 1: enumerate all possible paths between s and t.
   Running time of Algorithm 1: there could be exponentially many paths, so that t_{A1} = O(2ⁿ).
   Algorithm 2: Dijkstra's algorithm (see Nemhauser and Wolsey (1988)).
   Running time of Algorithm 2: O(n²) operations.

3. Traveling salesman problem:
   Instance size: O(n²), where n = |V|.
   Algorithm: enumerate all possible tours.
   Running time: O(n!).
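For concreteness, here is a compact O(n²) version of Dijkstra's algorithm in Python, working on a dense adjacency matrix (our own sketch; the names and the matrix representation are our assumptions, not the syllabus's):

```python
import math

def dijkstra(dist, s):
    """Shortest-path lengths from vertex s, for a graph given as an n x n
    matrix of nonnegative edge lengths (math.inf where there is no edge).
    The two nested O(n) loops give the O(n^2) running time quoted above."""
    n = len(dist)
    d = [math.inf] * n        # tentative distances from s
    d[s] = 0
    visited = [False] * n
    for _ in range(n):
        # settle the unvisited vertex with smallest tentative distance
        u = min((v for v in range(n) if not visited[v]), key=lambda v: d[v])
        visited[u] = True
        for v in range(n):
            if not visited[v] and d[u] + dist[u][v] < d[v]:
                d[v] = d[u] + dist[u][v]
    return d
```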

In view of these examples, we are led to the following concept: the complexity of an algorithm A for a problem P is the function

c_A(n) = max { t_A(I) | I is an instance of P with size s(I) = n }.   (2.2)

This is sometimes called the worst-case complexity of A: indeed, the definition focuses on the worst-case running time of A on an instance of size n, rather than on its average running time.

    2.2.3 Easy and hard problems

Figure 2.3 represents different types of complexity behaviors for algorithms.

Figure 2.3: (a) Linear: F(n) = an + b. (b) Exponential: F(n) = a · 2ⁿ

The algorithm A is polynomial if c_A(n) is a polynomial (or is bounded by a polynomial) in n, and exponential if c_A(n) grows faster than any polynomial function of n. Intuitively, we can probably accept the idea that a polynomial algorithm is more efficient than an exponential one.

For instance, the obvious algorithms for the addition or the multiplication of matrices are polynomial. So is the Gaussian elimination algorithm for the solution of systems of linear equations. On the other hand, the simplex method (or at least, some variants of it) for linear programming problems



is known to be exponential,³ while interior point methods are polynomial. This clearly illustrates the emphasis on the worst-case running time which was already underlined above: indeed, in an average sense, the simplex algorithm is an efficient method.

The complete enumeration approach for shortest path, Chinese postman or traveling salesman problems is exponential, since all these problems have an exponential number of feasible solutions. But polynomial algorithms exist for the shortest path problem and the Chinese postman problem.

For the traveling salesman problem or for 0-1 integer programming problems, by contrast, only exponential algorithms are known. In fact, it is widely suspected that there does not exist any polynomial algorithm for these problems. This is a typical feature of so-called NP-hard problems, which we define (very informally again) as follows (see Papadimitriou and Steiglitz (1982) for details).

Definition 2.2.1. A problem P is NP-hard if it is at least as difficult as the 0-1 linear programming problem, in the sense that any algorithm for P can be used to solve the 0-1 LP problem with a polynomial increase in running time.

The next claim has resisted all proof attempts (and there have been many) since the early 70s, but the vast majority of computer scientists and operations researchers believe that it holds true.

³ Klee and Minty (1972) provide instances I of the LP problem such that t_simplex(I) ≥ 2^{s(I)}.


Definition 2.2.2. A heuristic for an optimization problem P is an algorithm which is based on intuitively appealing principles, but which does not guarantee to provide an optimal solution of P.

So, when running on a particular CO problem, a heuristic could for instance

- return an optimal solution of the problem, or
- return a suboptimal solution, or
- return an infeasible solution, or
- fail to return any solution at all,

etc.

This very broad definition of a heuristic may seem rather surprising at first sight. It raises again the question of the criteria which can be applied to analyze the performance of a particular heuristic. We mention here two criteria which will be of particular concern in this course.

    Computational complexity

Generally speaking, we want heuristics to be fast, at least when compared with the exponential running times mentioned above. In fact, the main reason for giving up optimality is that we want the heuristic to compute a reasonably good solution quickly. Thus, the basic trade-off that we want to achieve reads

SOLUTION QUALITY vs. RUNNING TIME

    Quality of approximation

The solution returned by the heuristic should provide a good approximation of the optimal solution. To understand how to measure this, let x_H be the solution computed by heuristic H for a particular instance, and let x_opt be an optimal solution for this instance. Then,

E(x_H) = (F(x_H) − F(x_opt)) / F(x_opt) ≥ 0   (2.3)

provides a relative error measure: the closer it is to 0, the better the solution x_H.

In general, however, F(x_opt) is unknown. So, suppose now that we know how to compute a lower bound on F(x_opt), i.e., a number F̲ such that F̲ ≤ F(x_opt) (this is often much easier to compute than F(x_opt)). Define

Ē(x_H) = (F(x_H) − F̲) / F̲.   (2.4)

Then we have

E(x_H) = F(x_H)/F(x_opt) − 1 ≤ F(x_H)/F̲ − 1 = Ē(x_H)   (2.5)

which means that Ē(x_H) overestimates the relative error E(x_H). So, if Ē(x_H) is small, we can certainly be happy with the quality of the solution provided by H. (Note also that if the lower bound F̲ is reasonably close to F(x_opt), then Ē(x_H) actually provides a good estimate of the error.)

For example, consider the traveling salesman instance described by the (symmetric) distance matrix L, where ℓᵢⱼ represents the distance from i to j, i, j = 1, 2, . . . , 6:

L =
  0  4  7  2  6  3
  4  0  3  5  5  7
  7  3  0  2  6  5
  2  5  2  0  9  8
  6  5  6  9  0  5
  3  7  5  8  5  0

Assume now that a heuristic returns the tour x_H = (1, 2, 3, 4, 5, 6) (displayed in Figure 2.4).

Figure 2.4: A feasible tour

The total length of this tour is F(x_H) = 4 + 3 + 2 + 9 + 5 + 3 = 26. On the other hand, an obvious lower bound on the optimal tour length is given by the sum of the 6 shortest distances in L. Thus F̲ = 2 + 2 + 3 + 3 + 4 + 5 = 19, and, consequently, Ē(x_H) = (26 − 19)/19 ≈ 0.37. We can therefore conclude that x_H is at most 37% longer than the optimal tour.
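These numbers are easy to reproduce; a small Python check (our own illustration, using the matrix above):

```python
L = [  # symmetric distance matrix from the example above
    [0, 4, 7, 2, 6, 3],
    [4, 0, 3, 5, 5, 7],
    [7, 3, 0, 2, 6, 5],
    [2, 5, 2, 0, 9, 8],
    [6, 5, 6, 9, 0, 5],
    [3, 7, 5, 8, 5, 0],
]

def tour_length(tour, L):
    """Total length of a closed tour given with 1-based vertex labels."""
    n = len(tour)
    return sum(L[tour[i] - 1][tour[(i + 1) % n] - 1] for i in range(n))

F_H = tour_length((1, 2, 3, 4, 5, 6), L)                       # 26
edges = sorted(L[i][j] for i in range(6) for j in range(i + 1, 6))
F_low = sum(edges[:6])                                         # 19
print(F_H, F_low, (F_H - F_low) / F_low)                       # 26 19 0.368...
```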

In order to compute lower bounds for combinatorial optimization problems, a simple but powerful principle can often be used: when a constraint of a minimization problem P is relaxed (i.e., when the constraint is either removed or replaced by a weaker one), then the optimal value of the resulting relaxed problem provides a lower bound on the optimal value of P. This principle will be illustrated in the examples below.

    2.2.4 Exercises.

Exercise 1. Consider again the traveling salesman problem. For every vertex v ∈ V, select the shortest edge e_v incident to v. Show that Σ_{v∈V} ℓ(e_v) is a lower bound on the length of the optimal tour. Compute this lower bound for the numerical example in Section 2.2.3. Can you improve this lower bound by taking into account the two shortest edges incident to every vertex v? What bound do you obtain for the numerical example?

Exercise 2. Consider the following problem: you want to save n electronic files with respective sizes s₁, s₂, . . . , sₙ ≥ 0 on the smallest possible number of storage devices (say, floppy disks) with capacity C. This problem is known under the name of bin packing problem, and it is NP-hard. Can you compute a lower bound on its optimal value?

Exercise 3. Show that the optimal value of the linear programming problem

min cx subject to Ax ≥ b, 0 ≤ xⱼ ≤ 1 (j = 1, 2, . . . , n)

provides a lower bound on the optimal value of the 0-1 LP problem

min cx subject to Ax ≥ b, xⱼ ∈ {0, 1} (j = 1, 2, . . . , n).

Exercise 4. Show that the lower bounds obtained in Exercises 1-3 can all be viewed as optimal values of relaxations of the original problems.


Chapter 3

Heuristics for combinatorial optimization problems

    3.1 Introduction

    Even though there does not really exist any general theory of heuristics, certain common strategies

    can be identified in many successful heuristics. The aim of this chapter is to present such fundamental

    principles of heuristic algorithms for combinatorial optimization problems of the form

minimize {F(x) | x ∈ X}.   (CO)

In Sections 3.2 and 3.3 below, we successively describe a few simple ideas of this nature, namely reformulation, decomposition, rounding, and list-processing.

Then, we turn to more elaborate frameworks, or guidelines, which have been proposed to develop specific heuristics for a broad variety of optimization problems. These frameworks go by the name of metaheuristic schemes, or metaheuristics for short. Thus, metaheuristics can be viewed as recipes for the solution of (CO) problems.

We focus more particularly on so-called local search heuristics. Broadly speaking, local search heuristics rely on a common, rather natural and intuitive approach to find a good solution of (CO): starting from an initial solution, they move from solution to solution in the feasible region X, in an attempt (or hope) to locate a good solution along the way (see Figure 3.1, where N(x) represents the neighborhood of a current solution x). Most metaheuristics (like simulated annealing or tabu search) specifically generate local search heuristics. They constitute the main topic of this chapter.

Figure 3.1: Local search

Additional information on heuristics can be found for instance in Aarts and Lenstra (1997), Glover and Laguna (1997), Hoos and Stutzle (2005), Papadimitriou and Steiglitz (1982), Pirlot (1992), and many other sources.

    3.2 Reformulation, rounding and decomposition

Many heuristics rely on a few simple and natural ideas. One such idea is to replace the original hard problem (CO) by an easier, but closely related one, say (CO′). This can be accomplished, for instance, by changing the definition of the objective function, or by dropping some of the constraints of (CO). In the latter case, solving the simplified problem (CO′) usually produces an infeasible solution of (CO), and this solution needs to be somehow repaired in order to produce a feasible (but suboptimal) solution.

A specific, but extremely useful and common application of this idea is found in rounding algorithms for 0-1 linear programming problems of the form

min cx
subject to Ax ≥ b and x ∈ {0, 1}ⁿ.

We have already observed in Section 2.1.4 that, when we drop the constraint x ∈ {0, 1}ⁿ from this problem formulation, we obtain a linear programming problem which can be easily solved. Of course, the optimal solution of the LP model is typically fractional, and hence infeasible for the original problem. However, it is sometimes possible to round this optimal solution in such a way as to obtain feasible 0-1 solutions of the 0-1 LP problem.

    Rounding has been used in countless algorithms for 0-1 LP problems, be it in theoretical developments,

    in implementations of generic solvers, or in specific industrial applications; see for instance Bollapragada

    et al. (2002) for a recent illustration.
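To make the rounding idea concrete, here is a small sketch (ours, not the author's code) that solves the LP relaxation of a knapsack instance with scipy and rounds the fractional optimum down; rounding down preserves feasibility for a ≤-constraint with nonnegative coefficients. The instance is the one used again in Section 3.3.

```python
from scipy.optimize import linprog

c = [3, 10, 3, 7, 6]   # utilities
a = [2, 6, 5, 8, 3]    # weights
b = 16                 # capacity

# LP relaxation of max cx s.t. ax <= b, 0 <= x <= 1 (linprog minimizes).
res = linprog(c=[-ci for ci in c], A_ub=[a], b_ub=[b],
              bounds=[(0, 1)] * len(c))
# Round down: feasible since a >= 0; entries at 1 are kept (with tolerance).
x_rounded = [1 if xi >= 1 - 1e-6 else 0 for xi in res.x]
print(res.x, x_rounded)   # fractional optimum, then a feasible 0-1 solution
```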

    Another general idea for solving hard problems is to decompose them into a collection or a sequence

    of simpler subproblems. Then each subproblem can be solved either optimally or heuristically, and the

    solutions of the subproblems are patched together in order to provide a feasible solution of the original

    problem. Similar decomposition approaches are sometimes called divide and conquer strategies in the

    broader context of algorithmic design. Note, however, that they usually result in suboptimal solutions of

    the original CO problem.

Examples of the decomposition strategy are abundant in real-world settings. In a very broad sense, if we assume that the ultimate objective of management is to optimize the revenues, or the shareholders' profit, or the survivability of a firm, then the functional organization of the firm into marketing, production, and finance departments can be viewed as a way to decompose the global optimization issue into a number of subproblems, linked together by appropriate coordination mechanisms (e.g., strategic or business plans).

More specific examples can be found in classical production planning approaches, for instance in MRP techniques (Material Requirements Planning; see Crama (2002)). Here, for the sake of simplicity, the optimal lot size is usually determined independently for each component arising in a bill-of-materials. But in fact, the actual cost-minimization problem faced by the firm involves many interactions among these components: use of common production equipment, possibilities of joint orders from suppliers, etc. Therefore, the component-wise decomposition only provides a heuristic way of handling the global issue.

    Illustrations of decomposition approaches can also be found in the papers by Crama, van de Klundert,

    and Spieksma (2002) or by Tyagi and Bollapragada (2003) and in numerous other publications.


    3.3 List-processing heuristics

List-processing heuristics (also called greedy or myopic heuristics) can be viewed as a special type of local search heuristics, and are among the simplest of them (see Section 3.4 below). We do not want to try to characterize them very precisely here: let us simply say that they apply in particular to CO problems of the form

min (or max) F(S) subject to S ⊆ E, S ∈ I,   (IS)

where E is a finite set of elements and I is a collection of subsets of E.

The elements of E can be viewed as the decision variables of the problem. For instance, the knapsack problem (see Subsection 2.1.4) can be interpreted in this way: here, E is a set of objects, and S ∈ I if the subset of objects S fits in the knapsack.

Now, list-processing heuristics construct a feasible solution of (CO) in successive iterations, starting from the initial solution S = ∅ and adding elements to this solution, one by one, in the order prescribed by some prespecified priority list. They terminate as soon as the priority list has been exhausted. In particular, no effort is made to improve this solution in subsequent steps (which justifies the names myopic or greedy).

Thus, the list-processing metaheuristic can be sketched as in Figure 3.2 below.

1. Establish a priority list L of the elements of E.
2. Set S := ∅.
3. Repeat: if L is empty, then return S and stop; else
   consider the next element of L, say eᵢ, and remove eᵢ from L;
   if S ∪ {eᵢ} is feasible, i.e., if S ∪ {eᵢ} ∈ I, then set S := S ∪ {eᵢ}.

Figure 3.2: The list-processing metaheuristic
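In Python, the whole scheme fits in a few lines (a generic sketch under our own naming; `feasible` plays the role of membership in I):

```python
def list_processing(E, priority, feasible):
    """Generic list-processing (greedy) metaheuristic for problems of the
    form (IS): run through E in priority order and keep an element whenever
    the current solution stays feasible."""
    S = set()
    for e in sorted(E, key=priority):   # the priority list L
        if feasible(S | {e}):
            S = S | {e}
    return S
```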

Intuitively speaking, the choice of the list L should be dictated by the impact of each element of E on the objective function F: those variables with a smaller marginal cost (for a minimization problem) or a larger marginal contribution (for a maximization problem) should receive higher priority. But these general guidelines leave room for many possible implementations. Let us illustrate this discussion on a few examples.

Example: The knapsack problem. Consider the knapsack problem

max cx
subject to ax ≤ b and x ∈ {0, 1}ⁿ

where a, c ∈ ℝⁿ₊ and b ∈ ℝ. Various list-processing strategies can be proposed for this problem.

Strategy 1.
1. Sort the variables by nonincreasing utility value: if cᵢ > cⱼ, then xᵢ precedes xⱼ in L.
2. Set x := (0, 0, . . . , 0).
3. Run through L; increase the current variable to 1 if the resulting partial solution is feasible; otherwise leave it equal to 0.

Let us apply this strategy to the instance:

max 3x₁ + 10x₂ + 3x₃ + 7x₄ + 6x₅
subject to 2x₁ + 6x₂ + 5x₃ + 8x₄ + 3x₅ ≤ 16
xᵢ ∈ {0, 1} (i = 1, 2, . . . , 5).

For this instance, we successively obtain:

L = (x₂, x₄, x₅, x₁, x₃)
x := (0, 0, 0, 0, 0)
x₂ := 1; x := (0, 1, 0, 0, 0);
x₄ := 1; x := (0, 1, 0, 1, 0);
x₅ := 0; x := (0, 1, 0, 1, 0) (since (0, 1, 0, 1, 1) is not feasible!);
x₁ := 1; x := (1, 1, 0, 1, 0);
x₃ := 0; x := (1, 1, 0, 1, 0).

So, the algorithm returns the heuristic solution (1, 1, 0, 1, 0), with value 20.

An obvious shortcoming of Strategy 1 is that it does not take the value of the coefficients aⱼ into account when fixing the priority list. So, in the previous instance, variable x₄ is given higher priority than x₅ when in fact, for a comparable utility, x₅ adds much less weight to the knapsack than x₄. This observation leads to the next strategy.


Strategy 2.
1. Sort the variables by nonincreasing value of the ratios cᵢ/aᵢ: if cᵢ/aᵢ > cⱼ/aⱼ, then xᵢ precedes xⱼ in L.
2. Set x := (0, 0, . . . , 0).
3. Run through L; increase the current variable to 1 if the resulting partial solution is feasible; otherwise leave it equal to 0.

Going back to the numerical instance, we now obtain L = (x₅, x₂, x₁, x₄, x₃). The resulting heuristic solution is (1, 1, 1, 0, 1), with value 22.

Interestingly, it can be proved that this strategy is equivalent to the following one (which combines rounding with list-processing): solve the LP relaxation of the knapsack problem to obtain a fractional solution x*, then sort the variables by nonincreasing value of the components xᵢ* and continue as in Steps 2-3 of Strategy 2 (see e.g. Nemhauser and Wolsey (1988)).
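Both strategies are instances of the same loop with different priority lists; a short Python sketch (ours) reproduces the two solutions above:

```python
def greedy_knapsack(c, a, b, key):
    """List-processing heuristic for max cx s.t. ax <= b, x in {0,1}^n.
    `key(i)` determines the priority list (higher key = earlier in L)."""
    order = sorted(range(len(c)), key=key, reverse=True)
    x, weight = [0] * len(c), 0
    for i in order:
        if weight + a[i] <= b:          # keep the item if still feasible
            x[i], weight = 1, weight + a[i]
    return x, sum(ci * xi for ci, xi in zip(c, x))

c, a, b = [3, 10, 3, 7, 6], [2, 6, 5, 8, 3], 16
print(greedy_knapsack(c, a, b, key=lambda i: c[i]))         # ([1,1,0,1,0], 20)
print(greedy_knapsack(c, a, b, key=lambda i: c[i] / a[i]))  # ([1,1,1,0,1], 22)
```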

Example: The traveling salesman problem. The TSP can be viewed as a minimization problem of the form (IS), where E is the set of edges of the underlying graph, and S is in I if and only if S is a subset of edges which can be extended to a TSP tour. Assume now that the priority list L sorts the edges by nondecreasing length. The resulting greedy heuristic is known in the literature as the shortest edge heuristic.

Example: The maximum forest (MFT) problem. Let G = (V, E) be a non-oriented graph with a weight w(e) ≥ 0 on each edge e ∈ E. If S is any subset of edges of G, the weight of S is w(S) = Σ_{e∈S} w(e). A forest of G is a subset of edges of G which does not contain any cycle (i.e., closed path). The maximum forest problem asks for a forest of G of maximum weight.

The greedy (list-processing) algorithm for this problem is:

Greedy MFT
1. Sort the edges of G by nonincreasing weight: if w(eᵢ) > w(eⱼ), then eᵢ precedes eⱼ in L.
2. Set T := ∅.
3. Run through L; if T ∪ {eᵢ} is a forest (i.e., is cycle-free), then set T := T ∪ {eᵢ}.

Figure 3.3: A graph with 6 vertices and 8 edges

Let us look at the instance in Figure 3.3, with the following weights (we denote by w(1, 2) the weight of edge {1, 2}, etc.): w(1, 2) = 10, w(3, 5) = 8, w(1, 3) = 7, w(2, 3) = 7, w(5, 6) = 6, w(3, 6) = 6, w(2, 4) = 2, w(4, 5) = 2. Note that we have listed the weights by nonincreasing value. So, the Greedy algorithm successively produces:



T := ∅
T := {(1, 2)}
T := {(1, 2), (3, 5)}
T := {(1, 2), (3, 5), (1, 3)}
T := {(1, 2), (3, 5), (1, 3), (5, 6)}
T := {(1, 2), (3, 5), (1, 3), (5, 6), (2, 4)}.

The resulting forest has weight 33, and it is easy to check that this is an optimal solution for this instance (although there are several alternative optimal forests).
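The Greedy MFT algorithm is easy to implement with a union-find structure to test whether an edge would close a cycle (a Kruskal-style sketch of ours; ties between equal weights may be broken differently than in the text, but the optimal weight 33 is reproduced):

```python
def greedy_mft(n, weighted_edges):
    """Greedy MFT: scan edges by nonincreasing weight and keep an edge
    whenever it does not close a cycle, using union-find on vertices 1..n."""
    parent = list(range(n + 1))

    def find(v):                      # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    forest, total = [], 0
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: no cycle created
            parent[ru] = rv
            forest.append((u, v))
            total += w
    return forest, total

edges = [(10, 1, 2), (8, 3, 5), (7, 1, 3), (7, 2, 3),
         (6, 5, 6), (6, 3, 6), (2, 2, 4), (2, 4, 5)]
print(greedy_mft(6, edges))   # a maximum forest of weight 33
```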

Actually, Proposition 3.3.1 hereunder shows that the Greedy algorithm is not only a heuristic, but also an exact algorithm for the Maximum forest problem. Together with some of its far-reaching generalizations, this result plays a central role in combinatorial theory.

We first need a lemma. Recall that a tree is a connected forest.

Lemma 3.3.1. If G = (V, E) is a connected graph, then every maximal forest of G is a tree containing |V| − 1 edges. More generally, if G has c connected components Gᵢ = (Vᵢ, Eᵢ) (i = 1, 2, . . . , c), then every maximal forest of G is the union of c trees and contains Σᵢ₌₁ᶜ (|Vᵢ| − 1) edges.

Proof. We leave the proof to the reader. QED

Proposition 3.3.1. The Greedy algorithm delivers an optimal solution for every instance of the Maximum forest problem.


Proof. Let T = {e₁, e₂, . . . , eₜ} be the solution returned by the greedy algorithm and let S = {e′₁, e′₂, . . . , e′ₜ} be an optimal solution, where eᵢ precedes eᵢ₊₁ and e′ᵢ precedes e′ᵢ₊₁ in L. We want to show by induction that, for k = 1, 2, . . . , t, w(eₖ) ≥ w(e′ₖ), which will imply that w(T) ≥ w(S).

For k = 1, we have w(e₁) ≥ w(e′₁) by definition of the Greedy algorithm.

Consider now an index k > 1. Suppose that w(eᵢ) ≥ w(e′ᵢ) for 1 ≤ i < k and w(e′ₖ) > w(eₖ). Note that e′ₖ precedes eₖ in L.

Consider the edge-set R = {e ∈ E | w(e) ≥ w(e′ₖ)} and the forests F = {e₁, e₂, . . . , eₖ₋₁} and H = {e′₁, e′₂, . . . , e′ₖ}. We claim that F is a maximal forest in R, i.e., every edge of R \ F creates a cycle in F: indeed, if e ∈ R \ F, then w(e) ≥ w(e′ₖ) > w(eₖ), and the greedy algorithm should have chosen e rather than eₖ.

Since |F| = k − 1 < |H|, we conclude that the graph (V, R) contains two maximal forests of different cardinalities, contradicting Lemma 3.3.1. QED

Beyond their application to CO problems of the form (IS), list-processing algorithms can be extended to handle associated partitioning problems like

min m
subject to S₁ ∪ S₂ ∪ · · · ∪ Sₘ = E,   (PART)
Sᵢ ∈ I (i = 1, 2, . . . , m).

Thus, problem (PART) asks to partition E into a smallest number of sets in I. This problem can be attacked by solving a sequence of optimization subproblems over (IS), with F(S) = |S|: try first to determine a large set S₁ ∈ I, then remove from E all elements of S₁, repeat the process in order to determine S₂, and so on. If each step is solved by a list-processing algorithm, then the resulting procedure is also called a list-processing algorithm for (PART); see Exercises 3 and 4 hereunder, and the sketch below.

Additional examples of list-processing algorithms can be found, for instance, in Crama (2002).
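Schematically, the sequential approach to (PART) looks as follows (our sketch; `max_feasible_subset` stands for the heuristic used at each step, and is assumed to return a nonempty set in I):

```python
def greedy_partition(E, max_feasible_subset):
    """Sequential heuristic for (PART): repeatedly extract a large feasible
    subset from the remaining elements until E is exhausted."""
    remaining, parts = set(E), []
    while remaining:
        S = max_feasible_subset(remaining)
        assert S, "each subproblem must return a nonempty feasible set"
        parts.append(S)
        remaining -= S
    return parts
```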

    3.3.1 Exercises.

    Exercise 1. Apply the shortest edge heuristic to the TSP instance given in Section 2.2.3. Compare the

    length of this tour with the lower bounds computed in Section 2.2.4.

    Exercise 2. Prove Lemma 3.3.1.

Exercise 3. Let G = (V, E) be a graph. A subset of vertices S ⊆ V is called stable (or independent) in G if it contains no edges, that is, if the following condition holds: for all u, v ∈ S, {u, v} ∉ E. The maximum stable set problem consists in finding a stable set of maximum size in a given graph. Provide a greedy heuristic for this problem.

Exercise 4. Show that the graph coloring problem (Section 2.1.6) and the bin packing problem (Section 2.2.4) are partitioning problems of the form (PART). Develop a greedy heuristic for each of these problems.

    3.4 Neighborhoods and neighbors

In this and the following sections, we concentrate on local search procedures. A common feature of all local search procedures is that they exploit the neighborhood concept (see Figure 3.1).

Definition 3.4.1. A neighborhood structure for the set X is a collection of subsets N(x) ⊆ X, one for each x ∈ X. We call N(x) the neighborhood of solution x, and we say that every element of N(x) is a neighbor of x.

The neighborhood concept is naturally linked to the concept of local optimality.

Definition 3.4.2. A solution x* ∈ X is a local minimum of (CO) with respect to the neighborhood structure N if N(x*) does not contain any solution better than x*, i.e., if F(x*) ≤ F(x) for all x ∈ N(x*).

Note for further reference that this definition does not only depend on the problem at hand (i.e., on X and F) but also on the neighborhood structure which has been adopted. However, when the neighborhood structure is clear from the context and when no confusion can arise, it is common practice to omit the qualifier "with respect to the neighborhood structure N".

There are in general very many ways to define a neighborhood structure for a particular CO problem. Although all possible definitions are not necessarily equally good from the point of view of local search performance, it is often difficult to decide ex ante which ones will perform best. Some experimentation and some experience will usually be necessary in order to make the best choice of neighborhoods.

    3.4.1 Some examples

1. In the 0-1 linear programming problem, we have

X = {x : Ax ≥ b, xⱼ ∈ {0, 1}, j = 1, . . . , n}.

For a solution x ∈ X, we can for instance define


N₁(x) = {y ∈ X : x and y differ in at most one component}.

Note that, intuitively speaking, the number of components on which two binary vectors x and y differ provides a measure of the distance between x and y (sometimes called the Hamming distance between x and y). In some applications of local search, we may actually prefer to use the neighborhood structure

Nₖ(x) = {y ∈ X : x and y differ in at most k components}

where k may take any of the values 1, 2, 3, . . .

2. In the traveling salesman problem, a solution can be viewed as a permutation of the vertices. E.g., the permutation π = (2, 3, 6, 5, 1, 4) represents the tour which visits vertices 2, 3, 6, 5, 1 and 4 in that order. Note that every permutation of the vertices corresponds to a feasible tour. Then, a neighborhood structure could be, for instance:

N(π) = {π′ | permutation π′ results from permutation π by transposition of two vertices}.

With this definition, permutations (3, 2, 6, 5, 1, 4), (6, 3, 2, 5, 1, 4), (2, 1, 6, 5, 3, 4), (4, 3, 6, 5, 1, 2), etc., are neighbors of (2, 3, 6, 5, 1, 4).

An alternative, slightly more subtle neighborhood structure arises if we look at tours as lists of edges, rather than lists of vertices (this is of course conceptually equivalent, but experience shows that different representations of solutions may sometimes lead to very different algorithmic developments). Consider now a tour C as represented in Figure 3.4, where i, j, k, l are four distinct vertices and the edges {l, j} and {i, k} are not in the tour. Then, a neighbor C′ of C can be obtained by removing the edges {l, k} and {i, j} from C and by adding the edges {l, j} and {i, k} to it. The operation that transforms C into C′ is called a 2-exchange. The 2-exchange neighborhood structure is naturally defined by

N(C) = {C′ | C′ results from C by a 2-exchange}.
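On the list-of-vertices representation, a 2-exchange amounts to reversing a segment of the tour; a small sketch (our own indexing conventions):

```python
def two_exchange(tour, p, q):
    """2-exchange on a tour stored as a vertex list: remove the edges
    (tour[p], tour[p+1]) and (tour[q], tour[q+1]) and reconnect the tour
    by reversing the segment tour[p+1..q]."""
    return tour[: p + 1] + tour[p + 1 : q + 1][::-1] + tour[q + 1 :]

def two_exchange_neighborhood(tour):
    """All tours obtained from `tour` by a single 2-exchange (moves
    involving the closing edge are omitted for simplicity)."""
    n = len(tour)
    return [two_exchange(tour, p, q)
            for p in range(n - 3) for q in range(p + 2, n - 1)]
```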

3. With an instance G = (V, E) of the graph equipartitioning problem, we associate the feasible set of all equipartitions of V, i.e.,

X = {(V₁, V₂) : V₁ ∪ V₂ = V, V₁ ∩ V₂ = ∅, |V₁| = |V₂|}.

A possible neighborhood structure for this problem is defined as

N(V₁, V₂) = {(V₁′, V₂′) : V₁′ = V₁ ∪ {v} \ {u}, V₂′ = V₂ ∪ {u} \ {v} for some pair of nodes u ∈ V₁, v ∈ V₂}.


Figure 3.4: 2-exchange neighborhood concept for the traveling salesman problem

Figure 3.5: Neighborhood concept for the graph equipartitioning problem


We have imposed N(x) ⊆ X for all x. When X is small (i.e., for heavily constrained problems), it is sometimes difficult, or overly restrictive, to define neighborhoods that obey this condition. For instance, when partitioning a graph, it may be natural to consider the alternative neighborhood structure

N(V₁, V₂) = {(V₁′, V₂′) : V₁′ = V₁ ∪ {v}, V₂′ = V₂ \ {v} for some node v ∈ V₂}.

In this case, a problem occurs as the feasibility condition |V₁′| = |V₂′| does not hold, that is, (V₁′, V₂′) ∉ X. One way around this difficulty is to reformulate the original CO problem into an equivalent problem that admits more feasible solutions (i.e., to extend X) and to penalize all solutions that are not in X.

For example, for any partition (V₁, V₂) (not necessarily into equal parts) of the vertex set V, define e(V₁, V₂) to be the number of edges from V₁ to V₂. Then, the graph equipartitioning problem

minimize e(V₁, V₂)
subject to V₁ ∩ V₂ = ∅, V₁ ∪ V₂ = V, |V₁| = |V₂|

has the same optimal solutions as the following one:

minimize h(V₁, V₂) = e(V₁, V₂) + M(|V₁| − |V₂|)²
subject to V₁ ∩ V₂ = ∅, V₁ ∪ V₂ = V

where M is a very large number (the penalty). Such a reformulation enlarges the feasible set, hence allows the search to move more freely within this set, and makes it easier to find an initial feasible solution x ∈ X. (A similar reformulation is used in the big-M method of linear programming.)
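Evaluating the penalized objective is straightforward; a minimal sketch (ours, with M as an illustrative penalty value):

```python
def penalized_cut(V1, V2, edges, M=1000):
    """h(V1, V2) = e(V1, V2) + M * (|V1| - |V2|)**2: cut size plus a penalty
    for unbalanced partitions (M chosen large enough to dominate the cut)."""
    e = sum(1 for u, v in edges if (u in V1) != (v in V1))  # crossing edges
    return e + M * (len(V1) - len(V2)) ** 2

# Balanced partition of a 6-vertex graph: penalty term is 0, cut size is 3.
print(penalized_cut({1, 2, 3}, {4, 5, 6}, [(1, 2), (1, 4), (2, 5), (3, 6)]))
```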

    3.4.2 Exercises.

    Exercise 1. For each of the neighborhood structures defined in Section 3.4.1, estimate the size of the

    neighborhood of a feasible solution as a function of the size of the instance (number of variables, number

    of vertices, etc.).

Exercise 2. Consider problem (IS) in Section 3.3. Show that a list heuristic is obtained by applying the local search principle to (IS) with the following neighborhood structure:

N(T) = {S ∈ I | T ⊂ S and |S| = |T| + 1}

(i.e., S results from T by adding one element to it).


    3.5 Steepest descent

The steepest descent metaheuristic is one of the most natural local search heuristics: it simply recommends to keep moving from the current solution to the best solution in its neighborhood, until no further improvement can be found.

A more formal description of the algorithm is given in Figure 3.6. We assume here that a particular neighborhood structure has been selected. For k = 1, 2, . . ., we denote by xᵏ the current solution at iteration k. We denote by x* the best available solution and by F* the best available function value: that is, F* = F(x*).

1. Select x¹ ∈ X; set F* := F(x¹), x* := x¹ and k := 1.
2. Repeat:
   find a best solution x in N(xᵏ), i.e., F(x) = min{F(y) : y ∈ N(xᵏ)};
   if F(x) < F(xᵏ), then set xᵏ⁺¹ := x, F* := F(x), x* := x and k := k + 1;
   else return x*, F* and stop.

Figure 3.6: The steepest descent metaheuristic
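A direct transcription in Python (our sketch; `neighbors(x)` enumerates N(x) and is assumed to be nonempty):

```python
def steepest_descent(x1, F, neighbors):
    """Steepest descent: repeatedly move to the best neighbor of the current
    solution as long as it strictly improves F; return the local minimum."""
    x_star = x1
    while True:
        x = min(neighbors(x_star), key=F)   # best solution in N(x_star)
        if F(x) < F(x_star):
            x_star = x                      # improving move
        else:
            return x_star, F(x_star)        # local minimum reached
```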

    Note that steepest descent really is a metaheuristic, not an algorithm. In particular, it cannot be

    applied directly to any particular CO problem until the initialization procedure has been described or,

    more fundamentally, until the neighborhood structure has been specified for this problem.

    Note also that, when dealing with maximization (rather than minimization) problems, we speak of

    steepest ascent rather than steepest descent.

    We now proceed with a number of further comments on this framework.

    3.5.1 Initialization

How should we select x¹? Intuitively, it seems preferable to start from a good solution, such as a solution selected by a list-processing heuristic. Experiments show, however, that this is not necessarily the case, and that starting from a random solution may sometimes be a good idea. The influence of the initial solution may be reduced by executing the algorithm several times with different initial solutions.


    3.5.2 Local minima

By definition, steepest descent heuristics terminate with a local optimum of (CO), which is not necessarily a global optimum.

For example, consider the following instance of the knapsack problem:

max 2x₁ − 3x₂ + x₃ + 4x₄ − 2x₅
subject to 2x₁ − 3x₂ + 2x₃ + 3x₄ − x₅ ≤ 2
xᵢ ∈ {0, 1} for i = 1, 2, . . . , 5

and consider the neighborhood structure defined by N₁(x) = {y ∈ X : Σᵢ₌₁⁵ |xᵢ − yᵢ| ≤ 1}.

Suppose that the initial solution is x¹ = (0, 0, 0, 0, 0) and F* = 0. Then, steepest ascent sets x² = (1, 0, 0, 0, 0), F* = 2, and stops.

Suppose now that we start with another initial solution, say x¹ = (0, 1, 0, 0, 1) and F* = −5. Then, we successively get x² = (0, 1, 0, 1, 1), F* = −1, and next x³ = (0, 0, 0, 1, 1), F* = 2. The algorithm stops with the local maximum x* = x³.

So, in both cases, we have only found local maxima, whereas the global maximum is x̂ = (1, 1, 0, 1, 0), with F(x̂) = 3.

    3.5.3 Choice of neighborhood structure

A further observation (closely related to the previous one) is that, when N(x) is too small, the risk of missing the global optimum is high. But conversely, when N(x) is large, the heuristic may spend a lot of time exploring the neighborhood of the current solution in order to determine x. This is another manifestation of the quality vs. time trade-off already mentioned in Section 2.2.3.

This is illustrated (although caricaturally) by considering two extreme cases:

- if N(x) = {x} for all x ∈ X (a very small neighborhood, indeed), then the algorithm stops at the first iteration and simply returns the initial solution;
- at the other extreme, if N(x) = X for all x ∈ X, then x is the global optimum of the problem (which, of course, may be very hard to find).

More interestingly, this brief discussion points to the fact that the subproblem min{F(x) : x ∈ N(xᵏ)}, which is to be solved at every iteration of steepest descent, is fundamentally a problem of the same nature as (CO) itself, but over a restricted region of the search space. In many cases, this subproblem will be solved by exhaustive search, i.e., by complete enumeration of all solutions in N(xᵏ). This observation may guide the choice of an appropriate neighborhood structure.


    3.5.4 Selection of neighbor

Some variants of the algorithm do not completely explore the neighborhood of the current solution xᵏ in order to find x, but rather select, for instance, the first solution x such that F(x) < F(xᵏ) found during the exploration phase, or the best solution among the first ten candidates, etc. (This is akin to the partial pricing strategy used in certain implementations of the simplex method for linear programming.)

    3.5.5 Fast computation of the objective function

When exploring the neighborhood N(xk) of the current solution, it is sometimes possible to improve efficiency by avoiding to recompute F(x) from scratch for all x ∈ N(xk), and by making use of the information that is already available about the value of F(xk).

For example, assume (as in a knapsack problem) that F(xk) = Σ_{j=1}^{n} c_j x_j^k and that x ∈ N1(xk) differs from xk only in the 5th component. How should we compute F(x) in this case? Brute force computation of the expression F(x) = Σ_{j=1}^{n} c_j x_j requires n multiplications and n − 1 additions. By contrast, only 2 multiplications and 2 additions are required if we notice that F(x) = F(xk) − c_5 x_5^k + c_5 x_5. Similarly, if X = {x : Σ_{j=1}^{n} a_j x_j ≤ b}, then we can check whether x ∈ X by storing the value Σ_{j=1}^{n} a_j x_j^k and simply checking whether Σ_{j=1}^{n} a_j x_j^k − a_5 x_5^k + a_5 x_5 ≤ b.

Let us consider another example. For the traveling salesman problem, let C be a feasible tour (set of edges) with length L(C). After the 2-exchange displayed in Figure 3.7, we obtain a tour C′ with length

L(C′) = L(C) − dij − dkl + dik + djl,

and the computation of L(C′) only requires 4 additions if L(C) is available.
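Both updates can be sketched as constant-time functions (in Python; the names are ours):

    # Incremental evaluation of moves (a sketch).

    def flip_delta(c, a, x, i):
        # Changes in objective value and in constraint left-hand side
        # when bit i of x is flipped; O(1) instead of O(n).
        s = 1 - 2 * x[i]          # +1 if x[i] goes from 0 to 1, else -1
        return s * c[i], s * a[i]

    def two_exchange_delta(d, tour, p, q):
        # Change in tour length when edges (i,j) and (k,l) are replaced
        # by (i,k) and (j,l); i = tour[p], k = tour[q], and the two edges
        # are assumed distinct and non-adjacent.
        n = len(tour)
        i, j = tour[p], tour[(p + 1) % n]
        k, l = tour[q], tour[(q + 1) % n]
        return d[i][k] + d[j][l] - d[i][j] - d[k][l]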

    3.5.6 Flat objective functions

Consider the graph coloring (or chromatic number) problem. Here, X is the set of all feasible colorings and F(x) is the number of colors used in coloring x. We can define the neighborhood of a coloring x as consisting of all the colorings which can be obtained by changing the color of at most one vertex in x.

    Suppose for instance that Figure 3.8 represents an arbitrary coloring. The color of each vertex

    is indicated next to it, by a number between brackets (this is the coloring provided by the smallest

    available color heuristic when the vertices are explored in the order v1, v2, . . . , v10). In this case, no

    neighbor improves the initial solution. Intuitively, the objective function is flat in the neighborhood


Figure 3.7: 2-exchange (edges (i, j) and (k, l) of tour C are replaced by (i, k) and (j, l) to obtain tour C′)

Figure 3.8: A feasible coloring with 4 colors (v1(1), v2(2), v3(2), v4(1), v5(3), v6(2), v7(1), v8(3), v9(4), v10(4); the number in brackets is the color of the vertex)


    of the current solution (all neighbors have the same objective function value) and it is difficult to find

    descent directions (see Figure 3.9).

A possible remedy to this difficulty is to modify both F(x) and X in the definition of the problem! For instance, let us select a tentative number of colors C (for example, C = 3) and define

XC = { colorings of V using the colors {1, 2, . . . , C} },

where the colorings are not necessarily required to be feasible (cf. Section 2.1.6), and let

F̃(x) = number of monochromatic edges induced by coloring x.

So, in Figure 3.10, F̃(x) = 4.

Of course, the graph can be colored with C colors if and only if min{F̃(x) | x ∈ XC} = 0. In other words, the chromatic number of a graph is the smallest value of C for which min{F̃(x) | x ∈ XC} = 0. So, the original graph coloring problem can be transformed into a sequence of problems of the form (XC, F̃), for decreasing values of C.

Note now that the objective function F̃(x) is not flat, as opposed to F(x). For instance, changing the color of v4 from 1 to 3 yields F̃(x) = 2. Next, changing the color of v8 from 1 to 3 leads to F̃(x) = 0, meaning that the graph is feasibly colored with 3 colors.
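A sketch of F̃ and of the exploration of its neighborhood (in Python; the names are ours, and a coloring is stored as a dictionary from vertices to colors):

    # Number of monochromatic edges of a (possibly infeasible) coloring.
    def F_tilde(edges, color):
        return sum(1 for (u, v) in edges if color[u] == color[v])

    # Best single-vertex recoloring with the C available colors (a sketch).
    def best_recoloring(edges, color, C):
        best, move = F_tilde(edges, color), None
        for v in list(color):
            old = color[v]
            for c in range(1, C + 1):
                if c == old:
                    continue
                color[v] = c
                f = F_tilde(edges, color)
                if f < best:
                    best, move = f, (v, c)
            color[v] = old        # restore before trying the next vertex
        return best, move

Repeatedly applying the best recoloring until no move decreases F̃ is exactly steepest descent on (XC, F̃).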

3.5.7 Exercises.

Exercise 1. Explain why the simplex algorithm for linear programming can be called a steepest descent method.

Exercise 2. Show that changing the color of v9 from 1 to 3 in Figure 3.10 leads to a local optimum of F̃(x).

    3.6 Simulated annealing

    The major weakness of steepest descent algorithms is that they tend to stop too early, i.e. they get

    trapped in local optima of poor quality. How can we avoid this weakness?

    A possible solution is to run the algorithm repeatedly from multiple initial solutions. This multistart

    strategy may work well in some cases, but other, more complex approaches have proved to be much more

    powerful for large, difficult instances of CO problems.

    In this section, we want to explore the following ideas.


Figure 3.9: A flat objective function (F(x) takes the same value at xk and at all its neighbors x)

Figure 3.10: An infeasible coloring with 3 colors (v1(1), v2(2), v3(2), v4(1), v5(1), v6(2), v7(1), v8(1), v9(1), v10(3))


Idea # 1. In order to escape local minima, it may be useful to take steps which deteriorate the objective function, at least once in a while. One way to achieve this goal may be to replace xk by a neighbor xk+1 chosen randomly in N(xk). This idea has proved to be especially useful when combined with the next ingredient.

    Idea # 2. Select a good neighbor with higher probability than a bad one.

    Taken together, these two ideas result in the very popular simulated annealing algorithm. Various

    aspects of the implementation of SA algorithms are discussed at length, for instance, in two papers by

    Johnson, Aragon, McGeoch and Schevon (1989, 1991) or in Pirlot (1992). We only provide here some

    basic elements of information and we refer to these papers for additional details.

    3.6.1 The simulated annealing metaheuristic

The generic framework of the simulated annealing metaheuristic is shown in Figure 3.11. We suppose again that a particular neighborhood structure has been selected and we use the same notations x*, F*, xk as in the steepest descent heuristic. Moreover, we assume that for k = 1, 2, . . ., a number 0 < pk < 1 (called transition probability) has been defined; pk is used in procedure Toss to accept deteriorating moves.


1. Select x1 ∈ X, set F* := F(x1), x* := x1 and k := 1.

2. Repeat:

Choose x′ randomly in N(xk) (Propose a move).

If F(x′) < F(xk) then AcceptMove(x′) else Toss(xk, x′).

Evaluate the stopping conditions.

If Terminate = True then return x*, F* and stop, else continue.

Procedure AcceptMove(x′)

xk+1 := x′ (Accept the move). If F(x′) < F* then F* := F(x′), x* := x′.

Procedure Toss(xk, x′)

let xk+1 := x′ with probability equal to pk (Accept the move)

else, let xk+1 := xk (Reject the move).

Procedure Stopping conditions

if the stopping conditions are satisfied then Terminate := True

else k := k + 1 and Terminate := False.

Figure 3.11: The simulated annealing metaheuristic

In a typical implementation, the transition probabilities pk are derived from a temperature parameter T which decreases in the course of the iterations (see Figure 3.15); a frequent choice is a geometric cooling schedule whereby the temperature decreases by a constant factor α (the cooling factor) after a constant number L of iterations. The iterations performed at constant temperature constitute a plateau (see Figure 3.14).

    3.6.3 Stopping criteria

Note that, contrary to local search, simulated annealing may perform an infinite number of iterations if we do not impose some limitation on its running time. So, when should we terminate the process?

A common criterion is to stop when a large number of iterations has been performed without any improvement in the objective function and when the process seems to be stalling. One way to implement this idea requires to select two positive numbers, say ε2 and K2 (for example, ε2 = 2 and K2 = 5). The process then stops when no improvement of F* has been recorded and fewer than ε2 % of the proposed moves have been accepted during the last K2 temperature plateaus (see Figure 3.15).


Figure 3.12: Fixing p(k)

Figure 3.13: Fixing p(k) II

Figure 3.14: A geometric cooling schedule (the temperature Tk starts at T0 and is multiplied by the cooling factor α at the end of each plateau of L iterations: T0, αT0, α²T0, α³T0, . . .)


1. Select x1 ∈ X, set F* := F(x1), x* := x1, k := 1 and T := T0.

2. Repeat:

Choose x′ randomly in N(xk) (Propose a move).

If F(x′) < F(xk) then AcceptMove(x′) else Toss(xk, x′).

Evaluate the stopping conditions.

If Terminate = True then return x*, F* and stop, else continue.

Procedure AcceptMove(x′)

xk+1 := x′ (Accept the move).

if F(x′) < F* then F* := F(x′), x* := x′.

Procedure Toss(xk, x′)

compute ΔF := F(x′) − F(xk) and pk := e^{−ΔF/T} (transition probability)

draw a number u, randomly and uniformly distributed in [0, 1]

if u ≤ pk then xk+1 := x′ (Accept the move)

else xk+1 := xk (Reject the move).

Procedure Stopping conditions

if the number of iterations since the last decrease of temperature is less than L

then k := k + 1 and Terminate := False (Continue with the same plateau)

else

if no improvement of F* has been recorded and if fewer than ε2 % of the moves have been accepted during the last K2 temperature plateaus

then Terminate := True

else T := αT (decrease T), k := k + 1 and Terminate := False. (Proceed to the next plateau)

Figure 3.15: Implementing the simulated annealing metaheuristic
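A possible Python transcription of Figure 3.15 is sketched below; the function names are ours, neighbors(x) is assumed to return the list N(xk), and the parameters T0, alpha, L, eps2, K2 play the roles of T0, α, L, ε2 and K2 in the text.

    import math
    import random

    def simulated_annealing(x, F, neighbors, T0=10.0, alpha=0.9, L=100,
                            eps2=2.0, K2=5):
        # A sketch of the simulated annealing metaheuristic of Figure 3.15.
        x_best, F_best = x, F(x)
        T, stalled = T0, 0
        while True:
            improved, accepted = False, 0
            for _ in range(L):                    # one temperature plateau
                y = random.choice(neighbors(x))   # propose a move
                dF = F(y) - F(x)
                if dF < 0 or random.random() <= math.exp(-dF / T):
                    x, accepted = y, accepted + 1  # accept the move
                    if F(x) < F_best:
                        x_best, F_best, improved = x, F(x), True
            # stopping rule: no improvement of F* and fewer than eps2 %
            # of accepted moves during the last K2 plateaus
            if not improved and 100.0 * accepted / L < eps2:
                stalled += 1
                if stalled >= K2:
                    return x_best, F_best
            else:
                stalled = 0
            T *= alpha                            # geometric cooling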


A potential problem is that the choice x_1^1 = 1 is maybe very bad and, during each iteration, the probability of reversing this choice is only 1/n. To solve this, here is a possible remedy: at the beginning of steps 1, n + 1, 2n + 1, . . ., generate a random permutation of the indices 1, . . . , n; for example, (5, 3, 6, 2, 1, . . .). During the next n iterations, generate the neighbors obtained by modifying each coordinate in the order defined by the permutation. In other words (x_i^k denotes the i-th coordinate of xk):

step 2: x_5^1 becomes 1 − x_5^1 = x_5^2
step 3: x_3^2 becomes 1 − x_3^2 = x_3^3
step 4: x_6^3 becomes 1 − x_6^3 = x_6^4
step 5: x_2^4 becomes 1 − x_2^4 = x_2^5

or

(1, 0, 0, 1, 0, 1, . . .)
(1, 0, 0, 1, 1, 1, . . .)
(1, 0, 1, 1, 1, 1, . . .)
(1, 0, 1, 1, 1, 0, . . .)
(1, 1, 1, 1, 1, 0, . . .)

Thus, after n iterations, each coordinate has had the opportunity to be modified at least once (subject to the acceptance of the move).
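A sketch of this proposal scheme (in Python; the generator name is ours):

    import random

    def sweep(n):
        # Endless stream of coordinates to flip: a fresh random permutation
        # of 0..n-1 is generated every n iterations, so that each coordinate
        # is proposed exactly once per sweep.
        while True:
            perm = list(range(n))
            random.shuffle(perm)
            yield from perm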

    Approximate exponentiation

The computation time of e^{−ΔF/T} is quite high. A non-negligible speedup can be obtained if we replace this expression by its approximation 1 − ΔF/T (about 25 times faster for comparable quality; see Oliveira, Ferreira, and Vidal (1993) for details).
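In code, the acceptance test then becomes a single comparison (a sketch; note that for ΔF ≤ 0 the move is always accepted, since 1 − ΔF/T ≥ 1 ≥ u):

    # Accept a move with cost increase dF at temperature T, given a number u
    # drawn uniformly in [0,1]; 1 - dF/T replaces math.exp(-dF / T).
    def accept(dF, T, u):
        return u <= 1.0 - dF / T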

    Once again, we refer the reader to Aarts and Lenstra (1997), Johnson, Aragon, McGeoch and Schevon

    (1989, 1991), Pirlot (1992) and to other references in the bibliography for more information on simulated

    annealing algorithms.


    3.7 Tabu search

    (To be revised and completed...)

    3.7.1 Introduction

Idea: at each iteration, choose a neighbor x′ of xk that minimizes F(x) in N(xk).(1)

Consider the following example:

max x1 + 10x2 + 3x3 + 7x4 + 6x5

subject to 2x1 + 6x2 + 5x3 + 8x4 + 3x5 ≤ 16

xj ∈ {0, 1}, j = 1, . . . , 5

    The neighbors are solutions within a Hamming distance of 1. Let x0 = (0, 0, 0, 0, 0) be the initial

    solution. Thus we might have

    x0 = (0, 0, 0, 0, 0)

    x1 = (0, 1, 0, 0, 0)

    x2 = (0, 1, 0, 1, 0)

    x3 = (1, 1, 0, 1, 0)

    x4 = (0, 1, 0, 1, 0)

Here, x4 = x2, highlighting a danger of this method: the cycling problem. Now, suppose that coming back to the last explored solutions is forbidden. We could therefore have

    x4 = (1, 1, 0, 0, 0) (x2 is tabu)

    x5 = (1, 1, 0, 0, 1) (x3 is tabu)

    x6 = (1, 1, 1, 0, 1) (optimal solution)

    Note that the interested reader will find a generic description of this problem in Section 4.1 of Pirlot

    (1992).

    3.7.2 The algorithm

Initialization: select x1 ∈ X; F* := F(x1); x* := x1 and the tabu list TL := ∅.

Step k (with k = 1, 2, . . .):

Choose the best neighbor x′ of xk that is not tabu:

F(x′) = min{F(x) : x ∈ N(xk), x ∉ TL}.

(1) This is the steepest descent, mildest ascent method. See Hansen and Jaumard (1990) for more about this topic.
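The following Python sketch (the function and parameter names are ours) summarizes this scheme: at each step, move to the best neighbor that is not tabu, record it in a bounded tabu list, and allow a tabu move only when it improves on the best solution found so far (the aspiration criterion used in the coloring example below). Here the tabu list stores recent solutions; storing recent moves, as in that example, is a common alternative.

    def tabu_search(x0, F, neighbors, tabu_size=7, max_iter=100):
        # Best non-tabu neighbor at each step, even if it increases F
        # (steepest descent, mildest ascent); solutions must be hashable.
        x = x_best = x0
        tabu = [x0]                        # short-term memory
        for _ in range(max_iter):
            candidates = [y for y in neighbors(x)
                          if y not in tabu or F(y) < F(x_best)]  # aspiration
            if not candidates:
                break
            x = min(candidates, key=F)
            if F(x) < F(x_best):
                x_best = x
            tabu.append(x)
            if len(tabu) > tabu_size:
                tabu.pop(0)                # forget the oldest entry
        return x_best, F(x_best)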


Figure 3.18: Tabu list (illustrated on a 2-exchange involving vertices i, j, k, l)

x1: F(x1) = 5.

The best move is to change vertex (1) into V (or vertex (3) into V, or vertex (4) into V). That way, F(x2) = 3 and TL = {(1, B)}.

Then, the best move is to change vertex (3) into V. That way, F(x3) = 1 and TL = {(1, B), (3, B)}.

All the moves increase the F function. Choose for instance to change vertex (4) into V: F(x4) = 3 and TL = {(1, B), (3, B), (4, B)}.

The best move is to change vertex (1) into B, which is tabu, but we accept it since it satisfies the aspiration criterion. Thus, F(x5) = 1 and TL = {(1, B), (3, B), (4, B), (1, V)}.

. . .


Figure 3.19: Chromatic number - Tabu search (a coloring of a graph on 11 vertices with the colors B, V, R)


    3.8 Genetic algorithms

    3.8.1 Introduction

Steepest descent, simulated annealing and tabu search are designed to improve an initial solution by exploring solutions that are close to it. This approach is sometimes called an intensification strategy since it allows to intensify the search in the vicinity of a current solution.

A major drawback of such strategies is that they cannot easily reach areas that are very distant from the initial solution; that is, they cannot diversify the exploration of the feasible set X.

A possible remedy to this drawback is to apply the algorithm a large number of times from many different initial solutions (multistart, or sampling strategy). But here again, several problems occur: first, a large number of runs (say, 1000 or 10000) can still be quite small as compared to the size of the space to explore. Second, it is hard to ensure that the sample of initial solutions faithfully represents the set X.(2)

(2) This is a general problem with sampling methods.

Genetic algorithms (GA) offer a specific, quite powerful approach to the diversification issue (see e.g. Goldberg (1989)). In fact, they alternate diversification and intensification phases. At each iteration, they produce a population (i.e., a subset of solutions): at step k, the population is denoted X(k) = {x_1^{(k)}, x_2^{(k)}, . . . , x_N^{(k)}} ⊆ X.

    3.8.2 Diversification via crossover

    Consider a pair of solutions x and y (to be called parents) in the current population. We can combine

    these solutions to produce one or two new solutions (called children) u and v that share some features

    of both x and y. The operator that associates a child (or two children) to a pair of parents is called

    crossover.

    Intuitively (and just as in real life), the children obtained by crossover should look like their parents,

    but should also introduce some diversity in the current population.

Suppose for example that x and y are binary vectors:

x = (11010011)
y = (01100101)

A possible crossover operator produces the single child u, where ui = xi with probability 0.5 and ui = yi with probability 0.5. For our example, this operator could produce the child



u = (11100111).

Note that the second, fifth and eighth elements of u are predetermined since they are common to x and y. Another crossover method works by randomly choosing an index i, splitting x and y at coordinate i and exchanging the initial segments of x and y. For instance, with i = 4 in the previous example, we produce two children:

u = (1101|0101)
v = (0110|0011).

The crossover operators defined above are uniform operators, meaning: if z = (z1, . . . , zn) is a child of x = (x1, . . . , xn) and y = (y1, . . . , yn), then either zi = xi or zi = yi for each i. Note that nonuniform crossovers are also frequently used in the literature (see Mulhenbein (1997) for details).
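Both uniform operators can be sketched in a few lines of Python (the names are ours):

    import random

    def uniform_crossover(x, y):
        # each component of the child comes from one parent chosen at random
        return tuple(xi if random.random() < 0.5 else yi
                     for xi, yi in zip(x, y))

    def one_point_crossover(x, y, i):
        # split both parents at coordinate i and exchange the segments
        return x[:i] + y[i:], y[:i] + x[i:]

With x = (1,1,0,1,0,0,1,1), y = (0,1,1,0,0,1,0,1) and i = 4, one_point_crossover returns exactly the two children u and v displayed above.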

Ideally, the new individuals created by crossover should inherit desirable features from their parents: we would like to produce good children from good parents. This goal can be achieved by combining the following elements:

– When picking a pair of parents to mate, good parents should be selected with a higher probability than bad ones. For instance, x and y could be drawn in X(k) with probability equal to

Prob(x) = (Fmax − F(x)) / Σ_{j=1}^{N} [Fmax − F(xj)]     (3.1)

where Fmax = max{F(xj) : j = 1, . . . , N}. See Table 3.1 for an example, and the sketch following this list.

– Common features of the parents (those that are expected to be typical of good solutions) or, at least, some of those features, should be preserved when producing children (see later). As an example, consider the traveling salesman problem. If the salesman is to visit every European capital then, in a reasonable tour, Helsinki and Madrid will never be visited successively (neither will London and Athens). This feature should be preserved when crossing two reasonable parents.
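The selection rule of equation (3.1) can be sketched as follows (in Python; random.choices performs the weighted draw):

    import random

    def select_parent(population, F):
        # Prob(x) of equation (3.1): the worst individual gets weight 0,
        # better individuals get proportionally larger weights.
        values = [F(x) for x in population]
        Fmax = max(values)
        weights = [Fmax - v for v in values]
        if sum(weights) == 0:              # homogeneous population
            return random.choice(population)
        return random.choices(population, weights=weights, k=1)[0]

On the population of Table 3.1, the weights are 0, 3, 5, 5, which yields exactly the probabilities 0, 3/13, 5/13, 5/13.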

    3.8.3 A basic genetic algorithm

We are now ready to describe a primitive genetic algorithm for the combinatorial optimization problem min{F(x) : x ∈ X}. The algorithm depends on the choice of a crossover operator, and on the choice of a probability distribution Prob(·) defined on every finite subset of X. Let us assume that the following parameters have also been selected: N (the population size) and M (the number of children produced in each generation), with M ≤ N. Then, the basic genetic metaheuristic is presented in Figure 3.20.


X(k)    F(x)          Prob(x)
x1      F(x1) = 15    0
x2      F(x2) = 12    3/13
x3      F(x3) = 10    5/13
x4      F(x4) = 10    5/13

Table 3.1: Genetic algorithms: selecting good parents

1. Initialization: Select an initial population X(1) ⊆ X with |X(1)| = N, set F* := min{F(x) : x ∈ X(1)}, x* := argmin{F(x) : x ∈ X(1)} and k := 1.

2. Repeat:

Selection of parents: Create a new temporary population Y(k) = {y1, . . . , y2M}, drawn randomly (with replacement) from X(k) according to the distribution Prob(x).

Crossover: For j = 1, . . . , M, cross the pair of parents (y2j−1, y2j) to produce the set of children Z(k) = {z1, . . . , zM}.

Survival of the fittest: Draw randomly N − M elements from X(k) (with probability Prob(x)) and add them to Z(k) in order to create the next-generation population X(k+1) = {x_1^{(k+1)}, . . . , x_N^{(k+1)}}. (An alternative procedure would draw N elements from X(k) ∪ Z(k).)

Let x := argmin{F(x) | x ∈ X(k+1)}. If F(x) < F* then F* := F(x) and x* := x.

If the stopping criterion is satisfied then return x*, F* and stop, else let k := k + 1 and continue.

Figure 3.20: A simple genetic metaheuristic
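As a sketch (in Python, with our naming conventions), Figure 3.20 translates into the following skeleton; select and crossover stand for the building blocks discussed above, and a simple iteration limit replaces the stopping criterion.

    def genetic_algorithm(init_pop, F, select, crossover, M, max_gen=200):
        # A sketch of Figure 3.20; |init_pop| = N and M <= N.
        pop, N = list(init_pop), len(init_pop)
        x_best = min(pop, key=F)
        for _ in range(max_gen):
            parents = [select(pop, F) for _ in range(2 * M)]
            children = [crossover(parents[2 * j], parents[2 * j + 1])
                        for j in range(M)]
            survivors = [select(pop, F) for _ in range(N - M)]
            pop = children + survivors      # next generation, size N
            x_best = min(pop + [x_best], key=F)
        return x_best, F(x_best)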


Let us formulate some comments on this algorithm:

1. A mutation phase is sometimes added to this basic algorithm, for instance after the step Survival of the fittest. A mutation operator replaces each element of X(k+1), with some low probability ε, by a randomly selected neighbor of this element. In other words, with probability ε, each element is slightly perturbed. For example, (100101) could be replaced by its mutant (101101). The objective of this operation is to increase the amount of diversification in a population. However, many researchers consider nowadays that mutation does not significantly improve the performance of GAs.

2. Possible stopping criteria are, as usual: a limit on the total number of iterations, convergence of F*, a measure of the gap between F* and a lower bound on min F(x), etc. For GAs, another criterion is also commonly used. Let us define the fitness of population X(k) as the average value of F(x) over X(k), that is, the value

Qk = (1 / |X(k)|) Σ_{x ∈ X(k)} F(x).

Convergence of Qk toward a fixed value indicates that the population is increasingly homogeneous and that the procedure has reached a stationary state. Thus, if the difference |Qk+1 − Qk| is small for several successive iterations, then the algorithm can stop.
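As a sketch, this criterion only requires the average value Qk and a record of its recent history (the names are ours):

    def fitness(pop, F):
        # average objective value Q_k over the current population
        return sum(F(x) for x in pop) / len(pop)

    def stalled(Q_history, eps=1e-6, window=10):
        # True when Q_k has varied by less than eps over the last iterations
        recent = Q_history[-window:]
        return len(recent) == window and max(recent) - min(recent) < eps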

    In its primitive form, the genetic algorithm presented above is generally not a very efficient approach

    to the solution of hard combinatorial optimization problems. Before it becomes a practical method,

    some enhancements have to be added to this basic scheme. In the next subsections, we proceed with a

    discussion of such possible refinements.

    3.8.4 Intensification and local search

In the simple GA outlined above, the average quality (or fitness) of a population is driven up by a single factor in the course of the iterations, namely: the random bias introduced in the selection of parents and in the survival of the fittest step. However, by itself, this bias is generally insufficient to significantly improve a bad initial population.

    Moreover, in spite of everything we said earlier, solutions (children) arising from a crossover operation

    are frequently quite different from their parents and may turn out to be much worse.

    These observations lead to an improvement of the GA scheme which is conceptually simple, but

    very powerful in practice: it consists in introducing a local search (intensification) phase within the


diversification strategy of GA. This is simply done, for instance, by adding the following step right after the crossover step. (Some authors speak of memetic algorithms when this step is introduced in the basic GA scheme.)

Local improvement: For j = 1, 2, . . . , M, let z′j be the best solution produced by a local search algorithm (either greedy, or steepest descent, or SA, . . .) starting from zj as initial solution. Replace zj by z′j in Z(k).

    In picturesque terms, we could say that children must be raised before they can be incorporated in the

    population. More abstractly, with the above modification, we can view GA as performing a succession of

    multistart rounds, where each round is initialized from members of the current population.

Whatever the interpretation, interlacing the basic GA scheme with some form of local search seems to be a sine qua non condition for the efficiency of the procedure. Let us illustrate this on some examples.

Example: Knapsack problem. Consider the knapsack problem

max cx
subject to ax ≤ b and x ∈ {0, 1}^n

and the particular instance:

max 2x1 + 3x2 + 5x3 + x4 + 4x5
subject to 5x1 + 4x2 + 4x3 + 3x4 + 7x5 ≤ 14
xi ∈ {0, 1} for i = 1, . . . , 5.

We use the following crossover operator: if the parents are x and y, then the child z has zi = 1 when xi = yi = 1, and zi = 0 otherwise (i = 1, 2, . . . , n). (The child inherits an object only if both its parents own it.) So we obtain for instance:

x = 11010, value = 6
y = 10001, value = 6
z = 10000, value = 2

    Note that this crossover, even though it ensures feasibility of the children, will systematically produce

    children of lower quality than their parents.

Assume now that we apply a variant of the classical greedy algorithm during the improvement phase: first, we sort the indices (1, 2, . . . , n) into a priority list L, by nonincreasing ratios cj/aj. Then, without changing the components


of z that are already equal to 1, we run through L and fix the next variable to 1 as long as the knapsack constraint is not violated.

In our example, this procedure yields the priority list L = (3, 2, 5, 1, 4), and successively produces the solutions z = 10000 → 10100 → 11100 = z′; stop (with value = 10).

    An alternative interpretation of the previous approach is to consider the local optimization step as

    a feature of the crossover operator itself, rather than as an addition to it. (Even though both points of

    view are in a sense equivalent, it is sometimes interesting to look at them from different angles.)

To illustrate this idea, let us again consider the knapsack problem and consider a priority list L on {1, 2, . . . , n}. Then, we can define an optimizing crossover operator as follows: to compute the child z of x and y, we go through L and we let

zi = 1 if either xi = 1 or yi = 1, and if this results in a feasible solution;
zi = 0 otherwise.

(Another description of the same heuristic is: restrict the attention to those objects that have been selected at least once in either x or y, and apply the greedy heuristic to this subset of objects.)

For the above example, the list L = (3, 2, 5, 1, 4) leads to

x = 11010
y = 10001
z = 01011

The resulting solution z has value 8 (better than both its parents).
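A Python sketch of this optimizing crossover (the names are ours); applied to the instance above, it reproduces the child z = 01011:

    def optimizing_crossover(x, y, c, a, b):
        # Greedy pass through the priority list L (nonincreasing c[j]/a[j]):
        # an object may enter the child only if at least one parent owns it
        # and if the capacity b is not exceeded.
        L = sorted(range(len(c)), key=lambda j: c[j] / a[j], reverse=True)
        z, weight = [0] * len(c), 0
        for j in L:
            if (x[j] == 1 or y[j] == 1) and weight + a[j] <= b:
                z[j], weight = 1, weight + a[j]
        return z

    c, a, b = [2, 3, 5, 1, 4], [5, 4, 4, 3, 7], 14
    print(optimizing_crossover([1, 1, 0, 1, 0], [1, 0, 0, 0, 1], c, a, b))
    # -> [0, 1, 0, 1, 1], i.e., z = 01011 with value 8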

    Example: Traveling salesman problem.

The idea of considering the local optimization step as a feature of the crossover operator can similarly be applied to the traveling salesman problem, as explained for instance in Hoos and Stutzle (2005), Kolen and Pesch (1994), and Merz and Freisleben (2001).

Suppose that T and T′ are two distinct solutions of the traveling salesman problem, viewed as sets of edges. A child of T and T′ can be produced by keeping all edges that occur in both parent solutions, and by using a greedy procedure to complete the resulting partial solution T ∩ T′.

    Merz and Freisleben (2001) propose more specifically to apply the DPX crossover operator shown in

    Figure 3.21 (we skip some details). They show that variants of this crossover operator, when combined

    with effective local improvement steps, provide excellent solutions for the TSP.


DPX Crossover:

1. compute C := T ∩ T′; let P1, P2, . . . , Pk be the subpaths that make up C, and let uj, vj be the endpoints of subpath Pj for j = 1, 2, . . . , k;

2. while C is not a tour, repeat:

if C is a path containing all vertices, then add the missing edge that closes the tour; else,

choose randomly one of the endpoints uj;

choose the closest vertex to uj among all vertices w ∈ {u1, v1, u2, v2, . . . , uk, vk}, w ∉ {uj, vj}, such that the edge (uj, w) is not included in T ∪ T′;

add the edge (uj, w) to C;

3. return C;

Figure 3.21: DPX crossover for the TSP

Such adaptations of the basic genetic algorithm make it possible to enrich it with heuristics that have been developed specifically for the problem at hand. Indeed, whereas the special features of a problem are usually included quite naturally in a steepest descent or in a simulated annealing algorithm (via the neighborhood structure), this is not immediately true in the basic GA formulation displayed in Figure 3.20.

    A similar objective can sometimes be attained through a judicious encoding of the solutions. The