
    ADVANCED OPERATIONS RESEARCH

    Yves Crama

    HEC Management School, University of Liege

    January 2014


Contents

1 Introduction

2 Combinatorial optimization and computational complexity
  2.1 Examples
    2.1.1 The shortest path problem
    2.1.2 The Chinese postman problem
    2.1.3 The traveling salesman problem
    2.1.4 The 0-1 linear programming problem
    2.1.5 The graph equipartitioning problem
    2.1.6 The graph coloring problem
    2.1.7 Combinatorial optimization in practice
    2.1.8 Exercises
  2.2 A glimpse at computational complexity
    2.2.1 Computational performance criteria
    2.2.2 Problems and problem instances
    2.2.3 Easy and hard problems
    2.2.4 Exercises

3 Heuristics for combinatorial optimization problems
  3.1 Introduction
  3.2 Reformulation, rounding and decomposition
  3.3 List-processing heuristics
    3.3.1 Exercises
  3.4 Neighborhoods and neighbors
    3.4.1 Some examples
    3.4.2 Exercises
  3.5 Steepest descent
    3.5.1 Initialization
    3.5.2 Local minima
    3.5.3 Choice of neighborhood structure
    3.5.4 Selection of neighbor
    3.5.5 Fast computation of the objective function
    3.5.6 Flat objective functions
    3.5.7 Exercises
  3.6 Simulated annealing
    3.6.1 The simulated annealing metaheuristic
    3.6.2 Choice of the transition probabilities
    3.6.3 Stopping criteria
    3.6.4 Implementing the SA algorithm
    3.6.5 Variants of the SA
  3.7 Tabu search
    3.7.1 Introduction
    3.7.2 The algorithm
    3.7.3 Example
  3.8 Genetic algorithms
    3.8.1 Introduction
    3.8.2 Diversification via crossover
    3.8.3 A basic genetic algorithm
    3.8.4 Intensification and local search
    3.8.5 Encodings
    3.8.6 Implementing a genetic algorithm

4 Modeling languages for mathematical programming
  4.1 Introduction

5 Integer programming
  5.1 Introduction
    5.1.1 Integer programming models
    5.1.2 Exercises
  5.2 Branch-and-bound method
    5.2.1 Partitioning
    5.2.2 Evaluation
    5.2.3 Heuristic solution
    5.2.4 Tight formulations
    5.2.5 Some final comments

6 Neural networks
  6.1 Feedforward neural networks
  6.2 Neural networks as computing devices
  6.3 Neural networks as function approximation devices
  6.4 Unconstrained nonlinear optimization
    6.4.1 Minimization problems in one variable: introduction
    6.4.2 Equations in one variable
    6.4.3 Minimization problems in one variable: algorithms
    6.4.4 Multivariable minimization problems
  6.5 Application to NN design: the backpropagation algorithm
    6.5.1 Extensions of the delta rule
    6.5.2 Model validation
  6.6 Applications
  6.7 Notes on PROPAGATOR software
    6.7.1 Input files
    6.7.2 Menus
    6.7.3 Main window

7 Cases
  7.1 Container packing at Titanic Corp.
  7.2 Stacking boxes at Gizeh Inc.
  7.3 A high technology routing system for Meals-on-Wheels
  7.4 Operations scheduling in Hobbitland
  7.5 Setup optimization for the assembly of printed circuit boards
  7.6 A new product line for Legiacom

Bibliography


    Chapter 1

    Introduction

The aim of the course Advanced Operations Research is to present several perspectives on mathematical modeling and problem-solving strategies as they are used in operations research.

The course contains several independent parts, namely:

- general-purpose heuristic strategies for the solution of combinatorial optimization problems, such as simulated annealing, tabu search or genetic algorithms;
- learning of a modeling language, i.e., a computer language specially devoted to the formulation, the solution and the analysis of large-scale optimization models (linear or nonlinear programming problems);
- an introduction to mixed integer programming models and algorithms;
- other numerical methods, as time allows: neural networks, simulation, ...

These lecture notes are a preliminary draft of the material usually covered in the course. They concentrate mostly on combinatorial optimization heuristics, on mixed integer programming methods and on neural networks. Modeling languages are handled more superficially, as this topic is mostly illustrated through the development of numerical models in the computer lab.

The course assumes that the reader has had a first introduction to operations research and has some

    elementary knowledge of mathematical modeling, of mathematical programming and of graph theory.

    Special thanks are due to Jean-Philippe Peters who drafted the first version of these classroom notes.



Chapter 2

Combinatorial optimization and computational complexity: Basic notions

The generic combinatorial optimization (CO) problem is

minimize {F(x) | x ∈ X}   (2.1)

where X is a finite (or at least, discrete¹) set of feasible solutions and F is a real-valued objective function defined on X. Of course, if X is given in extension, i.e., by a complete explicit list of its elements, then solving (CO) is quite easy: it suffices to compute the value of F(x) for all elements x ∈ X and to retain the best element. But when X is defined implicitly rather than in extension, the problem may become much harder.

¹ Intuitively, a set is discrete if it does not contain any continuous subset.

    2.1 Examples

    2.1.1 The shortest path problem

Nowadays, lots of commercial software products allow you to select effortlessly the shortest possible route from your current location to a chosen destination (for example, from Liege to Torremolinos). The optimization problem that has to be solved whenever you address a query to the system can be modeled as follows.

There is a graph G = (V, E), where V is a finite set of elements called vertices and E is a collection of pairs of vertices called edges (think of V as a list of geographical locations and of E as a road network; see e.g. Figure 2.1 for a representation). Assume that every edge e ∈ E has a nonnegative length ℓ(e), and let s and t be two vertices of G. The shortest path problem is to find a path (a connected sequence of edges) through the graph that starts at s and ends at t, and which has the shortest possible total length. This is clearly a CO problem, where X is the (finite) set of all paths from s to t and F(x) is the total length of path x. Note that the cardinality of X can be of the same order of magnitude as 2^{|V|}, which is quite large as compared to the size of the graph.

Figure 2.1: A graph with 6 vertices and 8 edges

    2.1.2 The Chinese postman problem

This problem is similar to the shortest path problem, except that we consider here the additional constraint that every edge of G should be traversed exactly once by the path from s to t (the postman has to visit every street in his district). It is also usual to assume that s = t in this problem (the postman returns to the depot at the end of the day).

Besides its postal illustration, this model has applications in a variety of vehicle routing situations (garbage collection, snow plowing, street cleaning, etc.) and in the design of automatic drawing software.


    2.1.3 The traveling salesman problem

The traveling salesman problem (denoted TSP) is again similar to the shortest path problem, with the added requirement that every vertex should be visited exactly once by the path from s to t: the salesman must visit each and every customer (located in the cities in V) along the way. In the sequel, we shall always assume that G is a complete graph (i.e., it contains all possible edges) and that s = t. Thus, we speak of a traveling salesman tour rather than path. Then, X can simply be viewed as the set of all permutations of the elements of V, and |X| = |V|!. For instance, if |V| = 30, then |V|! is roughly 2.65 × 10³².

This famous combinatorial optimization problem has numerous applications, either in its pure form or as a subproblem of more complex models. It arises for instance in many production scheduling settings (sequencing of tasks on a single machine when the setup time between two successive tasks depends on the identity of these tasks, sequencing of drilling operations in metal sheets, sequencing of component placement operations for the assembly of printed circuit boards, etc.) and in various types of vehicle routing models (truck delivery problems, mail pickup, etc.).

    2.1.4 The 0-1 linear programming problem

    We can express the 0-1 LP problem as

min cx
subject to Ax ≥ b and x ∈ {0, 1}ⁿ

where c ∈ ℝⁿ, b ∈ ℝᵐ and A ∈ ℝ^{m×n} are the parameters (or numerical data) of the problem and x ∈ ℝⁿ is a vector of (unknown) decision variables. Note that, if we drop the constraint x ∈ {0, 1}ⁿ, then the problem is simply a linear programming problem which can be solved by a variety of efficient algorithms (e.g., the simplex method or an interior-point method). However, the requirement that x ∈ {0, 1}ⁿ leads to a (much harder) CO problem where X = { x ∈ {0, 1}ⁿ : Ax ≥ b }. The cardinality of this set, although finite, is potentially as large as 2ⁿ (when n = 30, this is approximately 10⁹).

The knapsack problem is the special case of 0-1 LP with only one inequality constraint:

max cx
subject to ax ≤ b and x ∈ {0, 1}ⁿ

where a, c ∈ ℝⁿ₊ and b ∈ ℝ. The usual interpretation of this problem is that the indices i = 1, 2, . . . , n denote n objects that a hiker may want to carry in her knapsack, cᵢ is the utility of object i, aᵢ is its weight and b is the maximum weight that the hiker is able to carry.
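To make the definition concrete, here is a minimal Python sketch (ours, not part of the syllabus) that solves a knapsack instance by complete enumeration of {0, 1}ⁿ; the data in the example are made up for illustration.

```python
from itertools import product

def knapsack_brute_force(c, a, b):
    """Enumerate all 2^n binary vectors and keep the best feasible one.
    Only practical for small n: the running time grows like 2^n."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(c)):
        if sum(ai * xi for ai, xi in zip(a, x)) <= b:   # feasibility: ax <= b
            val = sum(ci * xi for ci, xi in zip(c, x))
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Hypothetical data: 3 objects, capacity 5.
print(knapsack_brute_force(c=[3, 4, 5], a=[2, 3, 4], b=5))  # ((1, 1, 0), 7)
```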


Figure 2.2: A feasible coloring (vertices colored R, Y, G, B)

2.1.7 Combinatorial optimization in practice

Combinatorial optimization models arise in a broad variety of practical applications; see, for instance, Applegate, Bixby, Chvatal, and Cook (2006), Barnhart, Johnson, Nemhauser, Sigismondi, and Vance (1993), Bartholdi, Platzman, Collins, and Warden (1983), Bollapragada, Cheng, Phillips, Garbiras, Scholes, Gibbs, and Humphreville (2002), Crama, van de Klundert, and Spieksma (2002), Crama, Oerlemans, and Spieksma (1996), Jain, Johnson, and Safai (1996), Glover and Laguna (1997), Kohli and Krishnamurti (1987), Moonen and Spieksma (2003), Oliveira, Ferreira, and Vidal (1993), Tyagi and Bollapragada (2003), etc.

    2.1.8 Exercises.

    Exercise 1. Consider the Meals-on-Wheels case in Section 7.3. Explain the similarities that this problem

    shares with the traveling salesman problem, as well as the differences between the problems.

    2.2 A glimpse at computational complexity

In order to fully appreciate the field of combinatorial optimization, it is necessary to understand, at least at an intuitive level, some of the basic concepts of computational complexity. This part of theoretical computer science deals with fundamental, but extremely deep questions like: What tasks can be carried out by a computer? How much time does a given computational task require?

In this section, we attempt to introduce some elements of computational complexity, in a very informal and hand-waving way. We refer the interested reader to Tovey (2002) for a more formal tutorial, and to Papadimitriou and Steiglitz (1982) for a rigorous treatment of the topic.


    2.2.1 Computational performance criteria

What do we expect from a CO algorithm? An obvious answer is that the algorithm should always return an optimal solution of the problem. Is that the only game in town? Certainly not. We might also want it to be fast, or efficient. Combining these two expectations is the crucial issue. Indeed, the time required to solve a problem logically increases with the size of the problem, where the size can be measured by the amount of data needed to describe a particular instance of the problem.

Let us take a look at an example. Suppose that we want to solve a 0-1 linear programming problem involving n variables xⱼ ∈ {0, 1}, j = 1, . . . , n. We can certainly find an optimal solution by listing all possible vectors (x₁, x₂, . . . , xₙ), by checking for each of them whether it is feasible or not, by computing the value of the objective function for each such feasible solution, and by retaining the best solution found in the process. If we decide to go that way, then we must consider 2ⁿ vectors. For n = 50, that means 2⁵⁰ ≈ 10¹⁵ = 1,000,000,000,000,000 vectors! If our algorithm is able to enumerate one million (1,000,000) solutions per second, the whole procedure takes about 10⁹ seconds, or about 30 years. And for n = 60, the enumeration of the 2⁶⁰ solutions would take about 30,000 years!

Note that adding 10 variables to the problem increases the computing time by a multiplicative factor of 2¹⁰ ≈ 1,000. So, with n = 80 variables (a rather modest problem size), the same algorithm would run for 30 billion years, which is about twice the age of the universe. Not really efficient, by any practical standards...
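A quick back-of-the-envelope check of these figures in Python (assuming, as in the text, one million solutions examined per second):

```python
RATE = 10**6              # solutions enumerated per second (assumption)
YEAR = 3600 * 24 * 365    # seconds per year

for n in (50, 60, 80):
    print(f"n = {n}: about {2**n / RATE / YEAR:.0e} years")
# n = 50: about 4e+01 years; n = 60: about 4e+04 years; n = 80: about 4e+10 years
```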

Let us look at this issue from another vantage point. Consider the well-known Moore's law: Gordon Moore, co-founder of the chip giant Intel, predicted in 1965 that the number of transistors per square inch on integrated circuits would double every 18 months, starting from 1962, the year the integrated circuit was invented (see the original paper of Moore (1965) for more details). In other words, your PC processor works twice as fast every year and a half, meaning that its speed is multiplied by 100 in 10 years.² So, if you were able to enumerate 2ⁿ solutions in one hour in 1997, you could enumerate 100 × 2ⁿ solutions in one hour in 2007.



1. Matrix addition problem:
   Instance size: 2n².
   Algorithm: any naive addition algorithm.
   Running time: n² (additions). We denote this by O(n²), meaning that the running time grows at most like n².

2. Shortest path problem:
   Instance size: O(n²), where n = |V|.
   Algorithm 1: enumerate all possible paths between s and t.
   Running time of Algorithm 1: there could be exponentially many paths, so that t_{A1} = O(2ⁿ).
   Algorithm 2: Dijkstra's algorithm (see Nemhauser and Wolsey (1988)).
   Running time of Algorithm 2: O(n²) operations.

3. Traveling salesman problem:
   Instance size: O(n²), where n = |V|.
   Algorithm: enumerate all possible tours.
   Running time: O(n!).
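For concreteness, here is a compact O(n²) version of Dijkstra's algorithm in Python, working on a dense adjacency matrix (our own sketch; the names and the matrix representation are our assumptions, not the syllabus's):

```python
import math

def dijkstra(dist, s):
    """Shortest-path lengths from vertex s, for a graph given as an n x n
    matrix of nonnegative edge lengths (math.inf where there is no edge).
    The two nested O(n) loops give the O(n^2) running time quoted above."""
    n = len(dist)
    d = [math.inf] * n        # tentative distances from s
    d[s] = 0
    visited = [False] * n
    for _ in range(n):
        # settle the unvisited vertex with smallest tentative distance
        u = min((v for v in range(n) if not visited[v]), key=lambda v: d[v])
        visited[u] = True
        for v in range(n):
            if not visited[v] and d[u] + dist[u][v] < d[v]:
                d[v] = d[u] + dist[u][v]
    return d
```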

In view of these examples, we are led to the following concept: the complexity of an algorithm A for a problem P is the function

c_A(n) = max { t_A(I) | I is an instance of P with size s(I) = n }.   (2.2)

This is sometimes called the worst-case complexity of A: indeed, the definition focuses on the worst-case running time of A on an instance of size n, rather than on its average running time.

    2.2.3 Easy and hard problems

Figure 2.3 represents different types of complexity behaviors for algorithms.

Figure 2.3: (a) Linear: F(n) = an + b. (b) Exponential: F(n) = a · 2ⁿ

The algorithm A is polynomial if c_A(n) is a polynomial (or is bounded by a polynomial) in n, and exponential if c_A(n) grows faster than any polynomial function of n. Intuitively, we can probably accept the idea that a polynomial algorithm is more efficient than an exponential one.

For instance, the obvious algorithms for the addition or the multiplication of matrices are polynomial. So is the Gaussian elimination algorithm for the solution of systems of linear equations. On the other hand, the simplex method (or at least, some variants of it) for linear programming problems



is known to be exponential,³ while interior point methods are polynomial. This clearly illustrates the emphasis on the worst-case running time which was already underlined above: indeed, in an average sense, the simplex algorithm is an efficient method.

The complete enumeration approach for shortest path, Chinese postman or traveling salesman problems is exponential, since all these problems have an exponential number of feasible solutions. But polynomial algorithms exist for the shortest path problem and the Chinese postman problem.

For the traveling salesman problem or for 0-1 integer programming problems, by contrast, only exponential algorithms are known. In fact, it is widely suspected that there does not exist any polynomial algorithm for these problems. This is a typical feature of so-called NP-hard problems, which we define (very informally again) as follows (see Papadimitriou and Steiglitz (1982) for details).

Definition 2.2.1. A problem P is NP-hard if it is at least as difficult as the 0-1 linear programming problem, in the sense that any algorithm for P can be used to solve the 0-1 LP problem with a polynomial increase in running time.

The next claim has resisted all proof attempts (and there have been many) since the early 70s, but the vast majority of computer scientists and operations researchers believe that it holds true.

³ Klee and Minty (1972) provide instances I of the LP problem such that t_simplex(I) ≥ 2^{s(I)}.


Definition 2.2.2. A heuristic for an optimization problem P is an algorithm which is based on intuitively appealing principles, but which does not guarantee to provide an optimal solution of P.

So, when running on a particular CO problem, a heuristic could for instance

- return an optimal solution of the problem, or
- return a suboptimal solution, or
- return an infeasible solution, or
- fail to return any solution at all,

etc.

This very broad definition of a heuristic may seem rather surprising at first sight. It raises again the question of the criteria which can be applied to analyze the performance of a particular heuristic. We mention here two criteria which will be of particular concern in this course.

    Computational complexity

Generally speaking, we want heuristics to be fast, at least when compared with the exponential running times mentioned above. In fact, the main reason for giving up optimality is that we want the heuristic to compute a reasonably good solution quickly. Thus, the basic trade-off that we want to achieve reads

SOLUTION QUALITY vs. RUNNING TIME

    Quality of approximation

The solution returned by the heuristic should provide a good approximation of the optimal solution. To understand how to measure this, let x_H be the solution computed by heuristic H for a particular instance, and let x_opt be an optimal solution for this instance. Then,

E(x_H) = (F(x_H) − F(x_opt)) / F(x_opt) ≥ 0   (2.3)

provides a relative error measure: the closer it is to 0, the better the solution x_H.

In general, however, F(x_opt) is unknown. So, suppose now that we know how to compute a lower bound on F(x_opt), i.e., a number F̲ such that F̲ ≤ F(x_opt) (this is often much easier to compute than F(x_opt)). Define

Ē(x_H) = (F(x_H) − F̲) / F̲.   (2.4)

Then we have

E(x_H) = F(x_H)/F(x_opt) − 1 ≤ F(x_H)/F̲ − 1 = Ē(x_H)   (2.5)

which means that Ē(x_H) overestimates the relative error E(x_H). So, if Ē(x_H) is small, we can certainly be happy with the quality of the solution provided by H. (Note also that if the lower bound F̲ is reasonably close to F(x_opt), then Ē(x_H) actually provides a good estimate of the error.)

For example, consider the traveling salesman instance described by the (symmetric) distance matrix L, where ℓᵢⱼ represents the distance from i to j, i, j = 1, 2, . . . , 6:

L =
  0  4  7  2  6  3
  4  0  3  5  5  7
  7  3  0  2  6  5
  2  5  2  0  9  8
  6  5  6  9  0  5
  3  7  5  8  5  0

Assume now that a heuristic returns the tour x_H = (1, 2, 3, 4, 5, 6) (displayed in Figure 2.4).

Figure 2.4: A feasible tour

The total length of this tour is F(x_H) = 4 + 3 + 2 + 9 + 5 + 3 = 26. On the other hand, an obvious lower bound on the optimal tour length is given by the sum of the 6 shortest distances in L. Thus F̲ = 2 + 2 + 3 + 3 + 4 + 5 = 19, and, consequently, Ē(x_H) = (26 − 19)/19 ≈ 0.37. We can therefore conclude that x_H is at most 37% longer than the optimal tour.
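These numbers are easy to reproduce; a small Python check (our own illustration, using the matrix above):

```python
L = [  # symmetric distance matrix from the example above
    [0, 4, 7, 2, 6, 3],
    [4, 0, 3, 5, 5, 7],
    [7, 3, 0, 2, 6, 5],
    [2, 5, 2, 0, 9, 8],
    [6, 5, 6, 9, 0, 5],
    [3, 7, 5, 8, 5, 0],
]

def tour_length(tour, L):
    """Total length of a closed tour given with 1-based vertex labels."""
    n = len(tour)
    return sum(L[tour[i] - 1][tour[(i + 1) % n] - 1] for i in range(n))

F_H = tour_length((1, 2, 3, 4, 5, 6), L)                       # 26
edges = sorted(L[i][j] for i in range(6) for j in range(i + 1, 6))
F_low = sum(edges[:6])                                         # 19
print(F_H, F_low, (F_H - F_low) / F_low)                       # 26 19 0.368...
```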

In order to compute lower bounds for combinatorial optimization problems, a simple but powerful principle can often be used: when a constraint of a minimization problem P is relaxed (i.e., when the constraint is either removed or replaced by a weaker one), then the optimal value of the resulting relaxed problem provides a lower bound on the optimal value of P. This principle will be illustrated in the examples below.

    2.2.4 Exercises.

Exercise 1. Consider again the traveling salesman problem. For every vertex v ∈ V, select the shortest edge e_v incident to v. Show that Σ_{v∈V} ℓ(e_v) is a lower bound on the length of the optimal tour. Compute this lower bound for the numerical example in Section 2.2.3. Can you improve this lower bound by taking into account the two shortest edges incident to every vertex v? What bound do you obtain for the numerical example?

Exercise 2. Consider the following problem: you want to save n electronic files with respective sizes s₁, s₂, . . . , sₙ ≥ 0 on the smallest possible number of storage devices (say, floppy disks) with capacity C. This problem is known under the name of bin packing problem, and it is NP-hard. Can you compute a lower bound on its optimal value?

Exercise 3. Show that the optimal value of the linear programming problem

min cx subject to Ax ≥ b, 0 ≤ xⱼ ≤ 1 (j = 1, 2, . . . , n)

provides a lower bound on the optimal value of the 0-1 LP problem

min cx subject to Ax ≥ b, xⱼ ∈ {0, 1} (j = 1, 2, . . . , n).

Exercise 4. Show that the lower bounds obtained in Exercises 1-3 can all be viewed as optimal values of relaxations of the original problems.


Chapter 3

Heuristics for combinatorial optimization problems

    3.1 Introduction

    Even though there does not really exist any general theory of heuristics, certain common strategies

    can be identified in many successful heuristics. The aim of this chapter is to present such fundamental

    principles of heuristic algorithms for combinatorial optimization problems of the form

minimize {F(x) | x ∈ X}.   (CO)

In Sections 3.2 and 3.3 below, we successively describe a few simple ideas of this nature, namely reformulation, decomposition, rounding, and list-processing.

Then, we turn to more elaborate frameworks, or guidelines, which have been proposed to develop specific heuristics for a broad variety of optimization problems. These frameworks go by the name of metaheuristic schemes, or metaheuristics for short. Thus, metaheuristics can be viewed as recipes for the solution of (CO) problems.

We focus more particularly on so-called local search heuristics. Broadly speaking, local search heuristics rely on a common, rather natural and intuitive approach to find a good solution of (CO): starting from an initial solution, they move from solution to solution in the feasible region X, in an attempt (or hope) to locate a good solution along the way (see Figure 3.1, where N(x) represents the neighborhood of a current solution x). Most metaheuristics (like simulated annealing or tabu search) specifically generate local search heuristics. They constitute the main topic of this chapter.

Figure 3.1: Local search

Additional information on heuristics can be found for instance in Aarts and Lenstra (1997), Glover and Laguna (1997), Hoos and Stutzle (2005), Papadimitriou and Steiglitz (1982), Pirlot (1992), and many other sources.

    3.2 Reformulation, rounding and decomposition

Many heuristics rely on a few simple and natural ideas. One such idea is to replace the original hard problem (CO) by an easier, but closely related one, say (CO′). This can be accomplished, for instance, by changing the definition of the objective function, or by dropping some of the constraints of (CO). In the latter case, solving the simplified problem (CO′) usually produces an infeasible solution of (CO), and this solution needs to be somehow repaired in order to produce a feasible (but suboptimal) solution.

A specific, but extremely useful and common application of this idea is found in rounding algorithms for 0-1 linear programming problems of the form

min cx
subject to Ax ≥ b and x ∈ {0, 1}ⁿ.

We have already observed in Section 2.1.4 that, when we drop the constraint x ∈ {0, 1}ⁿ from this problem formulation, we obtain a linear programming problem which can be easily solved. Of course, the optimal solution of the LP model is typically fractional, and hence infeasible for the original problem. However, it is sometimes possible to round this optimal solution in such a way as to obtain feasible 0-1 solutions of the 0-1 LP problem.

    Rounding has been used in countless algorithms for 0-1 LP problems, be it in theoretical developments,

    in implementations of generic solvers, or in specific industrial applications; see for instance Bollapragada

    et al. (2002) for a recent illustration.
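To make the rounding idea concrete, here is a small sketch (ours, not the author's code) that solves the LP relaxation of a knapsack instance with scipy and rounds the fractional optimum down; rounding down preserves feasibility for a ≤-constraint with nonnegative coefficients. The instance is the one used again in Section 3.3.

```python
from scipy.optimize import linprog

c = [3, 10, 3, 7, 6]   # utilities
a = [2, 6, 5, 8, 3]    # weights
b = 16                 # capacity

# LP relaxation of max cx s.t. ax <= b, 0 <= x <= 1 (linprog minimizes).
res = linprog(c=[-ci for ci in c], A_ub=[a], b_ub=[b],
              bounds=[(0, 1)] * len(c))
# Round down: feasible since a >= 0; entries at 1 are kept (with tolerance).
x_rounded = [1 if xi >= 1 - 1e-6 else 0 for xi in res.x]
print(res.x, x_rounded)   # fractional optimum, then a feasible 0-1 solution
```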

    Another general idea for solving hard problems is to decompose them into a collection or a sequence

    of simpler subproblems. Then each subproblem can be solved either optimally or heuristically, and the

    solutions of the subproblems are patched together in order to provide a feasible solution of the original

    problem. Similar decomposition approaches are sometimes called divide and conquer strategies in the

    broader context of algorithmic design. Note, however, that they usually result in suboptimal solutions of

    the original CO problem.

Examples of the decomposition strategy are abundant in real-world settings. In a very broad sense, if we assume that the ultimate objective of management is to optimize the revenues, or the shareholders' profit, or the survivability of a firm, then the functional organization of the firm into marketing, production, and finance departments can be viewed as a way to decompose the global optimization issue into a number of subproblems, linked together by appropriate coordination mechanisms (e.g., strategic or business plans).

More specific examples can be found in classical production planning approaches, for instance in MRP techniques (Material Requirements Planning; see Crama (2002)). Here, for the sake of simplicity, the optimal lot size is usually determined independently for each component arising in a bill-of-materials. But in fact, the actual cost-minimization problem faced by the firm involves many interactions among these components: use of common production equipment, possibilities of joint orders from suppliers, etc. Therefore, the component-wise decomposition only provides a heuristic way of handling the global issue.

    Illustrations of decomposition approaches can also be found in the papers by Crama, van de Klundert,

    and Spieksma (2002) or by Tyagi and Bollapragada (2003) and in numerous other publications.


    3.3 List-processing heuristics

List-processing heuristics (also called greedy or myopic heuristics) can be viewed as a special type of local search heuristics, and are among the simplest of them (see Section 3.4 below). We do not want to try to characterize them very precisely here: let us simply say that they apply in particular to CO problems of the form

min (or max) F(S) subject to S ⊆ E, S ∈ I,   (IS)

where E is a finite set of elements and I is a collection of subsets of E.

The elements of E can be viewed as the decision variables of the problem. For instance, the knapsack problem (see Subsection 2.1.4) can be interpreted in this way: here, E is a set of objects, and S ∈ I if the subset of objects S fits in the knapsack.

Now, list-processing heuristics construct a feasible solution of (CO) in successive iterations, starting from the initial solution S = ∅ and adding elements to this solution, one by one, in the order prescribed by some prespecified priority list. They terminate as soon as the priority list has been exhausted. In particular, no effort is made to improve this solution in subsequent steps (which justifies the names myopic or greedy).

Thus, the list-processing metaheuristic can be sketched as in Figure 3.2 below.

1. Establish a priority list L of the elements of E.
2. Set S := ∅.
3. Repeat: if L is empty, then return S and stop; else
   consider the next element of L, say eᵢ, and remove eᵢ from L;
   if S ∪ {eᵢ} is feasible, i.e., if S ∪ {eᵢ} ∈ I, then set S := S ∪ {eᵢ}.

Figure 3.2: The list-processing metaheuristic
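In Python, the whole scheme fits in a few lines (a generic sketch under our own naming; `feasible` plays the role of membership in I):

```python
def list_processing(E, priority, feasible):
    """Generic list-processing (greedy) metaheuristic for problems of the
    form (IS): run through E in priority order and keep an element whenever
    the current solution stays feasible."""
    S = set()
    for e in sorted(E, key=priority):   # the priority list L
        if feasible(S | {e}):
            S = S | {e}
    return S
```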

Intuitively speaking, the choice of the list L should be dictated by the impact of each element of E on the objective function F: those variables with a smaller marginal cost (for a minimization problem) or a larger marginal contribution (for a maximization problem) should receive higher priority. But these general guidelines leave room for many possible implementations. Let us illustrate this discussion on a few examples.

Example: The knapsack problem. Consider the knapsack problem

max cx
subject to ax ≤ b and x ∈ {0, 1}ⁿ

where a, c ∈ ℝⁿ₊ and b ∈ ℝ. Various list-processing strategies can be proposed for this problem.

Strategy 1.
1. Sort the variables by nonincreasing utility value: if cᵢ > cⱼ, then xᵢ precedes xⱼ in L.
2. Set x := (0, 0, . . . , 0).
3. Run through L; increase the current variable to 1 if the resulting partial solution is feasible; otherwise leave it equal to 0.

Let us apply this strategy to the instance:

max 3x₁ + 10x₂ + 3x₃ + 7x₄ + 6x₅
subject to 2x₁ + 6x₂ + 5x₃ + 8x₄ + 3x₅ ≤ 16
xᵢ ∈ {0, 1} (i = 1, 2, . . . , 5).

For this instance, we successively obtain:

L = (x₂, x₄, x₅, x₁, x₃)
x := (0, 0, 0, 0, 0)
x₂ := 1; x := (0, 1, 0, 0, 0);
x₄ := 1; x := (0, 1, 0, 1, 0);
x₅ := 0; x := (0, 1, 0, 1, 0) (since (0, 1, 0, 1, 1) is not feasible!);
x₁ := 1; x := (1, 1, 0, 1, 0);
x₃ := 0; x := (1, 1, 0, 1, 0).

So, the algorithm returns the heuristic solution (1, 1, 0, 1, 0), with value 20.

An obvious shortcoming of Strategy 1 is that it does not take the value of the coefficients aⱼ into account when fixing the priority list. So, in the previous instance, variable x₄ is given higher priority than x₅ when in fact, for a comparable utility, x₅ adds much less weight to the knapsack than x₄. This observation leads to the next strategy.


Strategy 2.
1. Sort the variables by nonincreasing value of the ratios cᵢ/aᵢ: if cᵢ/aᵢ > cⱼ/aⱼ, then xᵢ precedes xⱼ in L.
2. Set x := (0, 0, . . . , 0).
3. Run through L; increase the current variable to 1 if the resulting partial solution is feasible; otherwise leave it equal to 0.

Going back to the numerical instance, we now obtain L = (x₅, x₂, x₁, x₄, x₃). The resulting heuristic solution is (1, 1, 1, 0, 1), with value 22.

Interestingly, it can be proved that this strategy is equivalent to the following one (which combines rounding with list-processing): solve the LP relaxation of the knapsack problem to obtain a fractional solution x*, then sort the variables by nonincreasing value of the components xᵢ* and continue as in Steps 2-3 of Strategy 2 (see e.g. Nemhauser and Wolsey (1988)).
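Both strategies are instances of the same loop with different priority lists; a short Python sketch (ours) reproduces the two solutions above:

```python
def greedy_knapsack(c, a, b, key):
    """List-processing heuristic for max cx s.t. ax <= b, x in {0,1}^n.
    `key(i)` determines the priority list (higher key = earlier in L)."""
    order = sorted(range(len(c)), key=key, reverse=True)
    x, weight = [0] * len(c), 0
    for i in order:
        if weight + a[i] <= b:          # keep the item if still feasible
            x[i], weight = 1, weight + a[i]
    return x, sum(ci * xi for ci, xi in zip(c, x))

c, a, b = [3, 10, 3, 7, 6], [2, 6, 5, 8, 3], 16
print(greedy_knapsack(c, a, b, key=lambda i: c[i]))         # ([1,1,0,1,0], 20)
print(greedy_knapsack(c, a, b, key=lambda i: c[i] / a[i]))  # ([1,1,1,0,1], 22)
```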

Example: The traveling salesman problem. The TSP can be viewed as a minimization problem of the form (IS), where E is the set of edges of the underlying graph, and S is in I if and only if S is a subset of edges which can be extended to a TSP tour. Assume now that the priority list L sorts the edges by nondecreasing length. The resulting greedy heuristic is known in the literature as the shortest edge heuristic.

Example: The maximum forest (MFT) problem. Let G = (V, E) be a non-oriented graph with a weight w(e) ≥ 0 on each edge e ∈ E. If S is any subset of edges of G, the weight of S is w(S) = Σ_{e∈S} w(e). A forest of G is a subset of edges of G which does not contain any cycle (i.e., closed path). The maximum forest problem asks for a forest of G of maximum weight.

The greedy (list-processing) algorithm for this problem is:

Greedy MFT
1. Sort the edges of G by nonincreasing weight: if w(eᵢ) > w(eⱼ), then eᵢ precedes eⱼ in L.
2. Set T := ∅.
3. Run through L; if T ∪ {eᵢ} is a forest (i.e., is cycle-free), then set T := T ∪ {eᵢ}.

Figure 3.3: A graph with 6 vertices and 8 edges

Let us look at the instance in Figure 3.3, with the following weights (we denote by w(1, 2) the weight of edge {1, 2}, etc.): w(1, 2) = 10, w(3, 5) = 8, w(1, 3) = 7, w(2, 3) = 7, w(5, 6) = 6, w(3, 6) = 6, w(2, 4) = 2, w(4, 5) = 2. Note that we have listed the weights by nonincreasing value. So, the Greedy algorithm successively produces:



T := ∅
T := {(1, 2)}
T := {(1, 2), (3, 5)}
T := {(1, 2), (3, 5), (1, 3)}
T := {(1, 2), (3, 5), (1, 3), (5, 6)}
T := {(1, 2), (3, 5), (1, 3), (5, 6), (2, 4)}.

The resulting forest has weight 33, and it is easy to check that this is an optimal solution for this instance (although there are several alternative optimal forests).
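The Greedy MFT algorithm is easy to implement with a union-find structure to test whether an edge would close a cycle (a Kruskal-style sketch of ours; ties between equal weights may be broken differently than in the text, but the optimal weight 33 is reproduced):

```python
def greedy_mft(n, weighted_edges):
    """Greedy MFT: scan edges by nonincreasing weight and keep an edge
    whenever it does not close a cycle, using union-find on vertices 1..n."""
    parent = list(range(n + 1))

    def find(v):                      # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    forest, total = [], 0
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: no cycle created
            parent[ru] = rv
            forest.append((u, v))
            total += w
    return forest, total

edges = [(10, 1, 2), (8, 3, 5), (7, 1, 3), (7, 2, 3),
         (6, 5, 6), (6, 3, 6), (2, 2, 4), (2, 4, 5)]
print(greedy_mft(6, edges))   # a maximum forest of weight 33
```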

Actually, Proposition 3.3.1 hereunder shows that the Greedy algorithm is not only a heuristic, but also an exact algorithm for the Maximum forest problem. Together with some of its far-reaching generalizations, this result plays a central role in combinatorial theory.

We first need a lemma. Recall that a tree is a connected forest.

Lemma 3.3.1. If G = (V, E) is a connected graph, then every maximal forest of G is a tree containing |V| − 1 edges. More generally, if G has c connected components Gᵢ = (Vᵢ, Eᵢ) (i = 1, 2, . . . , c), then every maximal forest of G is the union of c trees and contains Σᵢ₌₁ᶜ (|Vᵢ| − 1) edges.

Proof. We leave the proof to the reader. QED

Proposition 3.3.1. The Greedy algorithm delivers an optimal solution for every instance of the Maximum forest problem.


Proof. Let T = {e₁, e₂, . . . , eₜ} be the solution returned by the greedy algorithm and let S = {e′₁, e′₂, . . . , e′ₜ} be an optimal solution, where eᵢ precedes eᵢ₊₁ and e′ᵢ precedes e′ᵢ₊₁ in L. We want to show by induction that, for k = 1, 2, . . . , t, w(eₖ) ≥ w(e′ₖ), which will imply that w(T) ≥ w(S).

For k = 1, we have w(e₁) ≥ w(e′₁) by definition of the Greedy algorithm.

Consider now an index k > 1. Suppose that w(eᵢ) ≥ w(e′ᵢ) for 1 ≤ i < k and w(e′ₖ) > w(eₖ). Note that e′ₖ precedes eₖ in L.

Consider the edge-set R = {e ∈ E | w(e) ≥ w(e′ₖ)} and the forests F = {e₁, e₂, . . . , eₖ₋₁} and H = {e′₁, e′₂, . . . , e′ₖ}. We claim that F is a maximal forest in R, i.e., every edge of R \ F creates a cycle in F: indeed, if e ∈ R \ F, then w(e) ≥ w(e′ₖ) > w(eₖ), and the greedy algorithm should have chosen e rather than eₖ.

Since |F| = k − 1 < |H|, we conclude that the graph (V, R) contains two maximal forests of different cardinalities, contradicting Lemma 3.3.1. QED

Beyond their application to CO problems of the form (IS), list-processing algorithms can be extended to handle associated partitioning problems like

min m
subject to S₁ ∪ S₂ ∪ · · · ∪ Sₘ = E,   (PART)
Sᵢ ∈ I (i = 1, 2, . . . , m).

Thus, problem (PART) asks to partition E into a smallest number of sets in I. This problem can be attacked by solving a sequence of optimization subproblems over (IS), with F(S) = |S|: try first to determine a large set S₁ ∈ I, then remove from E all elements of S₁, repeat the process in order to determine S₂, and so on. If each step is solved by a list-processing algorithm, then the resulting procedure is also called a list-processing algorithm for (PART); see Exercises 3 and 4 hereunder, and the sketch below.

Additional examples of list-processing algorithms can be found, for instance, in Crama (2002).
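Schematically, the sequential approach to (PART) looks as follows (our sketch; `max_feasible_subset` stands for the heuristic used at each step, and is assumed to return a nonempty set in I):

```python
def greedy_partition(E, max_feasible_subset):
    """Sequential heuristic for (PART): repeatedly extract a large feasible
    subset from the remaining elements until E is exhausted."""
    remaining, parts = set(E), []
    while remaining:
        S = max_feasible_subset(remaining)
        assert S, "each subproblem must return a nonempty feasible set"
        parts.append(S)
        remaining -= S
    return parts
```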

    3.3.1 Exercises.

    Exercise 1. Apply the shortest edge heuristic to the TSP instance given in Section 2.2.3. Compare the

    length of this tour with the lower bounds computed in Section 2.2.4.

    Exercise 2. Prove Lemma 3.3.1.

Exercise 3. Let G = (V, E) be a graph. A subset of vertices S ⊆ V is called stable (or independent) in G if it contains no edges, that is, if the following condition holds: for all u, v ∈ S, {u, v} ∉ E. The maximum stable set problem consists in finding a stable set of maximum size in a given graph. Provide a greedy heuristic for this problem.

Exercise 4. Show that the graph coloring problem (Section 2.1.6) and the bin packing problem (Section 2.2.4) are partitioning problems of the form (PART). Develop a greedy heuristic for each of these problems.

    3.4 Neighborhoods and neighbors

In this and the following sections, we concentrate on local search procedures. A common feature of all local search procedures is that they exploit the neighborhood concept (see Figure 3.1).

Definition 3.4.1. A neighborhood structure for the set X is a collection of subsets N(x) ⊆ X, one for each x ∈ X. We call N(x) the neighborhood of solution x, and we say that every element of N(x) is a neighbor of x.

The neighborhood concept is naturally linked to the concept of local optimality.

Definition 3.4.2. A solution x* ∈ X is a local minimum of (CO) with respect to the neighborhood structure N if N(x*) does not contain any solution better than x*, i.e., if F(x*) ≤ F(x) for all x ∈ N(x*).

Note for further reference that this definition does not only depend on the problem at hand (i.e., on X and F) but also on the neighborhood structure which has been adopted. However, when the neighborhood structure is clear from the context and when no confusion can arise, it is common practice to omit the qualifier "with respect to the neighborhood structure N".

There are in general very many ways to define a neighborhood structure for a particular CO problem. Although all possible definitions are not necessarily equally good from the point of view of local search performance, it is often difficult to decide ex ante which ones will perform best. Some experimentation and some experience will usually be necessary in order to make the best choice of neighborhoods.

    3.4.1 Some examples

1. In the 0-1 linear programming problem, we have

X = {x : Ax ≥ b, xⱼ ∈ {0, 1}, j = 1, . . . , n}.

For a solution x ∈ X, we can for instance define


N₁(x) = {y ∈ X : x and y differ in at most one component}.

Note that, intuitively speaking, the number of components on which two binary vectors x and y differ provides a measure of the distance between x and y (sometimes called the Hamming distance between x and y). In some applications of local search, we may actually prefer to use the neighborhood structure

Nₖ(x) = {y ∈ X : x and y differ in at most k components}

where k may take any of the values 1, 2, 3, . . .

2. In the traveling salesman problem, a solution can be viewed as a permutation of the vertices. E.g., the permutation π = (2, 3, 6, 5, 1, 4) represents the tour which visits vertices 2, 3, 6, 5, 1 and 4 in that order. Note that every permutation of the vertices corresponds to a feasible tour. Then, a neighborhood structure could be, for instance:

N(π) = {π′ | permutation π′ results from permutation π by transposition of two vertices}.

With this definition, permutations (3, 2, 6, 5, 1, 4), (6, 3, 2, 5, 1, 4), (2, 1, 6, 5, 3, 4), (4, 3, 6, 5, 1, 2), etc., are neighbors of (2, 3, 6, 5, 1, 4).

An alternative, slightly more subtle neighborhood structure arises if we look at tours as lists of edges, rather than lists of vertices (this is of course conceptually equivalent, but experience shows that different representations of solutions may sometimes lead to very different algorithmic developments). Consider now a tour C as represented in Figure 3.4, where i, j, k, l are four distinct vertices and the edges {l, j} and {i, k} are not in the tour. Then, a neighbor C′ of C can be obtained by removing the edges {l, k} and {i, j} from C and by adding the edges {l, j} and {i, k} to it. The operation that transforms C into C′ is called a 2-exchange. The 2-exchange neighborhood structure is naturally defined by

N(C) = {C′ | C′ results from C by a 2-exchange}.
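On the list-of-vertices representation, a 2-exchange amounts to reversing a segment of the tour; a small sketch (our own indexing conventions):

```python
def two_exchange(tour, p, q):
    """2-exchange on a tour stored as a vertex list: remove the edges
    (tour[p], tour[p+1]) and (tour[q], tour[q+1]) and reconnect the tour
    by reversing the segment tour[p+1..q]."""
    return tour[: p + 1] + tour[p + 1 : q + 1][::-1] + tour[q + 1 :]

def two_exchange_neighborhood(tour):
    """All tours obtained from `tour` by a single 2-exchange (moves
    involving the closing edge are omitted for simplicity)."""
    n = len(tour)
    return [two_exchange(tour, p, q)
            for p in range(n - 3) for q in range(p + 2, n - 1)]
```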

3. With an instance G = (V, E) of the graph equipartitioning problem, we associate the feasible set of all equipartitions of V, i.e.,

X = {(V₁, V₂) : V₁ ∪ V₂ = V, V₁ ∩ V₂ = ∅, |V₁| = |V₂|}.

A possible neighborhood structure for this problem is defined as

N(V₁, V₂) = {(V₁′, V₂′) : V₁′ = V₁ ∪ {v} \ {u}, V₂′ = V₂ ∪ {u} \ {v} for some pair of nodes u ∈ V₁, v ∈ V₂}.


Figure 3.4: 2-exchange neighborhood concept for the traveling salesman problem

Figure 3.5: Neighborhood concept for the graph equipartitioning problem


We have imposed N(x) ⊆ X for all x. When X is small (i.e., for heavily constrained problems), it is sometimes difficult, or overly restrictive, to define neighborhoods that obey this condition. For instance, when partitioning a graph, it may be natural to consider the alternative neighborhood structure

N(V₁, V₂) = {(V₁′, V₂′) : V₁′ = V₁ ∪ {v}, V₂′ = V₂ \ {v} for some node v ∈ V₂}.

In this case, a problem occurs as the feasibility condition |V₁′| = |V₂′| does not hold, that is, (V₁′, V₂′) ∉ X. One way around this difficulty is to reformulate the original CO problem into an equivalent problem that admits more feasible solutions (i.e., to extend X) and to penalize all solutions that are not in X.

For example, for any partition (V₁, V₂) (not necessarily into equal parts) of the vertex set V, define e(V₁, V₂) to be the number of edges from V₁ to V₂. Then, the graph equipartitioning problem

minimize e(V₁, V₂)
subject to V₁ ∩ V₂ = ∅, V₁ ∪ V₂ = V, |V₁| = |V₂|

has the same optimal solutions as the following one:

minimize h(V₁, V₂) = e(V₁, V₂) + M(|V₁| − |V₂|)²
subject to V₁ ∩ V₂ = ∅, V₁ ∪ V₂ = V

where M is a very large number (the penalty). Such a reformulation enlarges the feasible set, hence allows the search to move more freely within this set, and makes it easier to find an initial feasible solution x ∈ X. (A similar reformulation is used in the big-M method of linear programming.)
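Evaluating the penalized objective is straightforward; a minimal sketch (ours, with M as an illustrative penalty value):

```python
def penalized_cut(V1, V2, edges, M=1000):
    """h(V1, V2) = e(V1, V2) + M * (|V1| - |V2|)**2: cut size plus a penalty
    for unbalanced partitions (M chosen large enough to dominate the cut)."""
    e = sum(1 for u, v in edges if (u in V1) != (v in V1))  # crossing edges
    return e + M * (len(V1) - len(V2)) ** 2

# Balanced partition of a 6-vertex graph: penalty term is 0, cut size is 3.
print(penalized_cut({1, 2, 3}, {4, 5, 6}, [(1, 2), (1, 4), (2, 5), (3, 6)]))
```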

    3.4.2 Exercises.

    Exercise 1. For each of the neighborhood structures defined in Section 3.4.1, estimate the size of the

    neighborhood of a feasible solution as a function of the size of the instance (number of variables, number

    of vertices, etc.).

Exercise 2. Consider problem (IS) in Section 3.3. Show that a list heuristic is obtained by applying the local search principle to (IS) with the following neighborhood structure:

N(T) = {S ∈ I | T ⊂ S and |S| = |T| + 1}

(i.e., S results from T by adding one element to it).


    3.5 Steepest descent

The steepest descent metaheuristic is one of the most natural local search heuristics: it simply recommends to keep moving from the current solution to the best solution in its neighborhood, until no further improvement can be found.

A more formal description of the algorithm is given in Figure 3.6. We assume here that a particular neighborhood structure has been selected. For k = 1, 2, . . ., we denote by xᵏ the current solution at iteration k. We denote by x* the best available solution and by F* the best available function value: that is, F* = F(x*).

1. Select x¹ ∈ X; set F* := F(x¹), x* := x¹ and k := 1.
2. Repeat:
   find a best solution x in N(xᵏ), i.e., F(x) = min{F(y) : y ∈ N(xᵏ)};
   if F(x) < F(xᵏ), then set xᵏ⁺¹ := x, F* := F(x), x* := x and k := k + 1;
   else return x*, F* and stop.

Figure 3.6: The steepest descent metaheuristic
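A direct transcription in Python (our sketch; `neighbors(x)` enumerates N(x) and is assumed to be nonempty):

```python
def steepest_descent(x1, F, neighbors):
    """Steepest descent: repeatedly move to the best neighbor of the current
    solution as long as it strictly improves F; return the local minimum."""
    x_star = x1
    while True:
        x = min(neighbors(x_star), key=F)   # best solution in N(x_star)
        if F(x) < F(x_star):
            x_star = x                      # improving move
        else:
            return x_star, F(x_star)        # local minimum reached
```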

    Note that steepest descent really is a metaheuristic, not an algorithm. In particular, it cannot be

    applied directly to any particular CO problem until the initialization procedure has been described or,

    more fundamentally, until the neighborhood structure has been specified for this problem.

    Note also that, when dealing with maximization (rather than minimization) problems, we speak of

    steepest ascent rather than steepest descent.

    We now proceed with a number of further comments on this framework.

    3.5.1 Initialization

How should we select x¹? Intuitively, it seems preferable to start from a good solution, such as a solution selected by a list-processing heuristic. Experiments show, however, that this is not necessarily the case, and that starting from a random solution may sometimes be a good idea. The influence of the initial solution may be reduced by executing the algorithm several times with different initial solutions.


    3.5.2 Local minima

By definition, steepest descent heuristics terminate with a local optimum of (CO), which is not necessarily a global optimum.

For example, consider the following instance of the knapsack problem:

max 2x₁ − 3x₂ + x₃ + 4x₄ − 2x₅
subject to 2x₁ − 3x₂ + 2x₃ + 3x₄ − x₅ ≤ 2
xᵢ ∈ {0, 1} for i = 1, 2, . . . , 5

and consider the neighborhood structure defined by N₁(x) = {y ∈ X : Σᵢ₌₁⁵ |xᵢ − yᵢ| ≤ 1}.

Suppose that the initial solution is x¹ = (0, 0, 0, 0, 0) and F* = 0. Then, steepest ascent sets x² = (1, 0, 0, 0, 0), F* = 2, and stops.

Suppose now that we start with another initial solution, say x¹ = (0, 1, 0, 0, 1) and F* = −5. Then, we successively get x² = (0, 1, 0, 1, 1), F* = −1, and next x³ = (0, 0, 0, 1, 1), F* = 2. The algorithm stops with the local maximum x* = x³.

So, in both cases, we have only found local maxima, whereas the global maximum is x̂ = (1, 1, 0, 1, 0), with F(x̂) = 3.

    3.5.3 Choice of neighborhood structure

A further observation (closely related to the previous one) is that, when N(x) is too small, the risk of missing the global optimum is high. But conversely, when N(x) is large, the heuristic may spend a lot of time exploring the neighborhood of the current solution in order to determine x. This is another manifestation of the quality vs. time trade-off already mentioned in Section 2.2.3.

This is illustrated (although caricaturally) by considering two extreme cases:

- if N(x) = {x} for all x ∈ X (a very small neighborhood, indeed), then the algorithm stops at the first iteration and simply returns the initial solution;
- at the other extreme, if N(x) = X for all x ∈ X, then x is the global optimum of the problem (which, of course, may be very hard to find).

More interestingly, this brief discussion points to the fact that the subproblem min{F(x) : x ∈ N(xᵏ)}, which is to be solved at every iteration of steepest descent, is fundamentally a problem of the same nature as (CO) itself, but over a restricted region of the search space. In many cases, this subproblem will be solved by exhaustive search, i.e., by complete enumeration of all solutions in N(xᵏ). This observation may guide the choice of an appropriate neighborhood structure.


    3.5.4 Selection of neighbor

Some variants of the algorithm do not completely explore the neighborhood of the current solution xᵏ in order to find x, but rather select, for instance, the first solution x such that F(x) < F(xᵏ) found during the exploration phase, or the best solution among the first ten candidates, etc. (This is akin to the partial pricing strategy used in certain implementations of the simplex method for linear programming.)

    3.5.5 Fast computation of the objective function

When exploring the neighborhood N(xk) of the current solution, it is sometimes possible to improve efficiency by avoiding to recompute F(x) from scratch for all x ∈ N(xk), and by making use of the information that is already available about the value of F(xk).

For example, assume (as in a knapsack problem) that F(xk) = Σ_{j=1}^{n} c_j x_j^k and that x ∈ N1(xk) differs from xk only in the 5th component. How should we compute F(x) in this case? Brute force computation of the expression F(x) = Σ_{j=1}^{n} c_j x_j requires n multiplications and n − 1 additions. By contrast, only 2 multiplications and 2 additions are required if we notice that F(x) = F(xk) − c_5 x_5^k + c_5 x_5. Similarly, if X = {x : Σ_{j=1}^{n} a_j x_j ≤ b}, then we can check whether x ∈ X by storing the value Σ_{j=1}^{n} a_j x_j^k and simply checking whether Σ_{j=1}^{n} a_j x_j^k − a_5 x_5^k + a_5 x_5 ≤ b.

Let us consider another example. For the traveling salesman problem, let C be a feasible tour (set of edges) with length L(C). After the 2-exchange displayed in Figure 3.7, we obtain a tour C′ with length

L(C′) = L(C) − dij − dkl + dik + djl,

and the computation of L(C′) only requires 4 additions if L(C) is available.
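Both updates can be sketched as constant-time functions (in Python; the names are ours):

    # Incremental evaluation of moves (a sketch).

    def flip_delta(c, a, x, i):
        # Changes in objective value and in constraint left-hand side
        # when bit i of x is flipped; O(1) instead of O(n).
        s = 1 - 2 * x[i]          # +1 if x[i] goes from 0 to 1, else -1
        return s * c[i], s * a[i]

    def two_exchange_delta(d, tour, p, q):
        # Change in tour length when edges (i,j) and (k,l) are replaced
        # by (i,k) and (j,l); i = tour[p], k = tour[q], and the two edges
        # are assumed distinct and non-adjacent.
        n = len(tour)
        i, j = tour[p], tour[(p + 1) % n]
        k, l = tour[q], tour[(q + 1) % n]
        return d[i][k] + d[j][l] - d[i][j] - d[k][l]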

    3.5.6 Flat objective functions

Consider the graph coloring (or chromatic number) problem. Here, X is the set of all feasible colorings and F(x) is the number of colors used in coloring x. We can define the neighborhood of a coloring x as consisting of all the colorings which can be obtained by changing the color of at most one vertex in x.

    Suppose for instance that Figure 3.8 represents an arbitrary coloring. The color of each vertex

    is indicated next to it, by a number between brackets (this is the coloring provided by the smallest

    available color heuristic when the vertices are explored in the order v1, v2, . . . , v10). In this case, no

    neighbor improves the initial solution. Intuitively, the objective function is flat in the neighborhood


Figure 3.7: 2-exchange (edges (i, j) and (k, l) of tour C are replaced by (i, k) and (j, l) to obtain tour C′)

Figure 3.8: A feasible coloring with 4 colors (v1(1), v2(2), v3(2), v4(1), v5(3), v6(2), v7(1), v8(3), v9(4), v10(4); the number in brackets is the color of the vertex)


    of the current solution (all neighbors have the same objective function value) and it is difficult to find

    descent directions (see Figure 3.9).

A possible remedy to this difficulty is to modify both F(x) and X in the definition of the problem! For instance, let us select a tentative number of colors C (for example, C = 3) and define

XC = { colorings of V using the colors {1, 2, . . . , C} },

where the colorings are not necessarily required to be feasible (cf. Section 2.1.6), and let

F̃(x) = number of monochromatic edges induced by coloring x.

So, in Figure 3.10, F̃(x) = 4.

Of course, the graph can be colored with C colors if and only if min{F̃(x) | x ∈ XC} = 0. In other words, the chromatic number of a graph is the smallest value of C for which min{F̃(x) | x ∈ XC} = 0. So, the original graph coloring problem can be transformed into a sequence of problems of the form (XC, F̃), for decreasing values of C.

Note now that the objective function F̃(x) is not flat, as opposed to F(x). For instance, changing the color of v4 from 1 to 3 yields F̃(x) = 2. Next, changing the color of v8 from 1 to 3 leads to F̃(x) = 0, meaning that the graph is feasibly colored with 3 colors.
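A sketch of F̃ and of the exploration of its neighborhood (in Python; the names are ours, and a coloring is stored as a dictionary from vertices to colors):

    # Number of monochromatic edges of a (possibly infeasible) coloring.
    def F_tilde(edges, color):
        return sum(1 for (u, v) in edges if color[u] == color[v])

    # Best single-vertex recoloring with the C available colors (a sketch).
    def best_recoloring(edges, color, C):
        best, move = F_tilde(edges, color), None
        for v in list(color):
            old = color[v]
            for c in range(1, C + 1):
                if c == old:
                    continue
                color[v] = c
                f = F_tilde(edges, color)
                if f < best:
                    best, move = f, (v, c)
            color[v] = old        # restore before trying the next vertex
        return best, move

Repeatedly applying the best recoloring until no move decreases F̃ is exactly steepest descent on (XC, F̃).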

3.5.7 Exercises.

Exercise 1. Explain why the simplex algorithm for linear programming can be called a steepest descent method.

Exercise 2. Show that changing the color of v9 from 1 to 3 in Figure 3.10 leads to a local optimum of F̃(x).

    3.6 Simulated annealing

    The major weakness of steepest descent algorithms is that they tend to stop too early, i.e. they get

    trapped in local optima of poor quality. How can we avoid this weakness?

    A possible solution is to run the algorithm repeatedly from multiple initial solutions. This multistart

    strategy may work well in some cases, but other, more complex approaches have proved to be much more

    powerful for large, difficult instances of CO problems.

    In this section, we want to explore the following ideas.


Figure 3.9: A flat objective function (F(x) takes the same value at xk and at all its neighbors x)

Figure 3.10: An infeasible coloring with 3 colors (v1(1), v2(2), v3(2), v4(1), v5(1), v6(2), v7(1), v8(1), v9(1), v10(3))


Idea # 1. In order to escape local minima, it may be useful to take steps which deteriorate the objective function, at least once in a while. One way to achieve this goal may be to replace xk by a neighbor xk+1 chosen randomly in N(xk). This idea has proved to be especially useful when combined with the next ingredient.

    Idea # 2. Select a good neighbor with higher probability than a bad one.

    Taken together, these two ideas result in the very popular simulated annealing algorithm. Various

    aspects of the implementation of SA algorithms are discussed at length, for instance, in two papers by

    Johnson, Aragon, McGeoch and Schevon (1989, 1991) or in Pirlot (1992). We only provide here some

    basic elements of information and we refer to these papers for additional details.

    3.6.1 The simulated annealing metaheuristic

The generic framework of the simulated annealing metaheuristic is shown in Figure 3.11. We suppose again that a particular neighborhood structure has been selected and we use the same notations x*, F*, xk as in the steepest descent heuristic. Moreover, we assume that for k = 1, 2, . . ., a number 0 < pk < 1 (called transition probability) has been defined; pk is used in procedure Toss to accept deteriorating moves.


1. Select x1 ∈ X, set F* := F(x1), x* := x1 and k := 1.

2. Repeat:

Choose x′ randomly in N(xk) (Propose a move).

If F(x′) < F(xk) then AcceptMove(x′) else Toss(xk, x′).

Evaluate the stopping conditions.

If Terminate = True then return x*, F* and stop, else continue.

Procedure AcceptMove(x′)

xk+1 := x′ (Accept the move). If F(x′) < F* then F* := F(x′), x* := x′.

Procedure Toss(xk, x′)

let xk+1 := x′ with probability equal to pk (Accept the move)

else, let xk+1 := xk (Reject the move).

Procedure Stopping conditions

if the stopping conditions are satisfied then Terminate := True

else k := k + 1 and Terminate := False.

Figure 3.11: The simulated annealing metaheuristic

In a typical implementation, the transition probabilities pk are derived from a temperature parameter T which decreases in the course of the iterations (see Figure 3.15); a frequent choice is a geometric cooling schedule whereby the temperature decreases by a constant factor α (the cooling factor) after a constant number L of iterations. The iterations performed at constant temperature constitute a plateau (see Figure 3.14).

    3.6.3 Stopping criteria

Note that, contrary to local search, simulated annealing may perform an infinite number of iterations if we do not impose some limitation on its running time. So, when should we terminate the process?

A common criterion is to stop when a large number of iterations has been performed without any improvement in the objective function and when the process seems to be stalling. One way to implement this idea requires to select two positive numbers, say ε2 and K2 (for example, ε2 = 2 and K2 = 5). The process then stops when no improvement of F* has been recorded and fewer than ε2 % of the proposed moves have been accepted during the last K2 temperature plateaus (see Figure 3.15).


Figure 3.12: Fixing p(k)

Figure 3.13: Fixing p(k) II

Figure 3.14: A geometric cooling schedule (the temperature Tk starts at T0 and is multiplied by the cooling factor α at the end of each plateau of L iterations: T0, αT0, α²T0, α³T0, . . .)


1. Select x1 ∈ X, set F* := F(x1), x* := x1, k := 1 and T := T0.

2. Repeat:

Choose x′ randomly in N(xk) (Propose a move).

If F(x′) < F(xk) then AcceptMove(x′) else Toss(xk, x′).

Evaluate the stopping conditions.

If Terminate = True then return x*, F* and stop, else continue.

Procedure AcceptMove(x′)

xk+1 := x′ (Accept the move).

if F(x′) < F* then F* := F(x′), x* := x′.

Procedure Toss(xk, x′)

compute ΔF := F(x′) − F(xk) and pk := e^{−ΔF/T} (transition probability)

draw a number u, randomly and uniformly distributed in [0, 1]

if u ≤ pk then xk+1 := x′ (Accept the move)

else xk+1 := xk (Reject the move).

Procedure Stopping conditions

if the number of iterations since the last decrease of temperature is less than L

then k := k + 1 and Terminate := False (Continue with the same plateau)

else

if no improvement of F* has been recorded and if fewer than ε2 % of the moves have been accepted during the last K2 temperature plateaus

then Terminate := True

else T := αT (decrease T), k := k + 1 and Terminate := False. (Proceed to the next plateau)

Figure 3.15: Implementing the simulated annealing metaheuristic
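A possible Python transcription of Figure 3.15 is sketched below; the function names are ours, neighbors(x) is assumed to return the list N(xk), and the parameters T0, alpha, L, eps2, K2 play the roles of T0, α, L, ε2 and K2 in the text.

    import math
    import random

    def simulated_annealing(x, F, neighbors, T0=10.0, alpha=0.9, L=100,
                            eps2=2.0, K2=5):
        # A sketch of the simulated annealing metaheuristic of Figure 3.15.
        x_best, F_best = x, F(x)
        T, stalled = T0, 0
        while True:
            improved, accepted = False, 0
            for _ in range(L):                    # one temperature plateau
                y = random.choice(neighbors(x))   # propose a move
                dF = F(y) - F(x)
                if dF < 0 or random.random() <= math.exp(-dF / T):
                    x, accepted = y, accepted + 1  # accept the move
                    if F(x) < F_best:
                        x_best, F_best, improved = x, F(x), True
            # stopping rule: no improvement of F* and fewer than eps2 %
            # of accepted moves during the last K2 plateaus
            if not improved and 100.0 * accepted / L < eps2:
                stalled += 1
                if stalled >= K2:
                    return x_best, F_best
            else:
                stalled = 0
            T *= alpha                            # geometric cooling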


A potential problem is that the choice x_1^1 = 1 is maybe very bad and, during each iteration, the probability of reversing this choice is only 1/n. To solve this, here is a possible remedy: at the beginning of steps 1, n + 1, 2n + 1, . . ., generate a random permutation of the indices 1, . . . , n; for example, (5, 3, 6, 2, 1, . . .). During the next n iterations, generate the neighbors obtained by modifying each coordinate in the order defined by the permutation. In other words (x_i^k denotes the i-th coordinate of xk):

step 2: x_5^1 becomes 1 − x_5^1 = x_5^2
step 3: x_3^2 becomes 1 − x_3^2 = x_3^3
step 4: x_6^3 becomes 1 − x_6^3 = x_6^4
step 5: x_2^4 becomes 1 − x_2^4 = x_2^5

or

(1, 0, 0, 1, 0, 1, . . .)
(1, 0, 0, 1, 1, 1, . . .)
(1, 0, 1, 1, 1, 1, . . .)
(1, 0, 1, 1, 1, 0, . . .)
(1, 1, 1, 1, 1, 0, . . .)

Thus, after n iterations, each coordinate has had the opportunity to be modified at least once (subject to the acceptance of the move).
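A sketch of this proposal scheme (in Python; the generator name is ours):

    import random

    def sweep(n):
        # Endless stream of coordinates to flip: a fresh random permutation
        # of 0..n-1 is generated every n iterations, so that each coordinate
        # is proposed exactly once per sweep.
        while True:
            perm = list(range(n))
            random.shuffle(perm)
            yield from perm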

    Approximate exponentiation

The computation time of e^{−ΔF/T} is quite high. A non-negligible speedup can be obtained if we replace this expression by its approximation 1 − ΔF/T (about 25 times faster for comparable quality; see Oliveira, Ferreira, and Vidal (1993) for details).
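In code, the acceptance test then becomes a single comparison (a sketch; note that for ΔF ≤ 0 the move is always accepted, since 1 − ΔF/T ≥ 1 ≥ u):

    # Accept a move with cost increase dF at temperature T, given a number u
    # drawn uniformly in [0,1]; 1 - dF/T replaces math.exp(-dF / T).
    def accept(dF, T, u):
        return u <= 1.0 - dF / T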

    Once again, we refer the reader to Aarts and Lenstra (1997), Johnson, Aragon, McGeoch and Schevon

    (1989, 1991), Pirlot (1992) and to other references in the bibliography for more information on simulated

    annealing algorithms.


    3.7 Tabu search

    (To be revised and completed...)

    3.7.1 Introduction

Idea: at each iteration, choose a neighbor x′ of xk that minimizes F(x) in N(xk).(1)

Consider the following example:

max x1 + 10x2 + 3x3 + 7x4 + 6x5

subject to 2x1 + 6x2 + 5x3 + 8x4 + 3x5 ≤ 16

xj ∈ {0, 1}, j = 1, . . . , 5

    The neighbors are solutions within a Hamming distance of 1. Let x0 = (0, 0, 0, 0, 0) be the initial

    solution. Thus we might have

    x0 = (0, 0, 0, 0, 0)

    x1 = (0, 1, 0, 0, 0)

    x2 = (0, 1, 0, 1, 0)

    x3 = (1, 1, 0, 1, 0)

    x4 = (0, 1, 0, 1, 0)

Here, x4 = x2, highlighting a danger of this method: the cycling problem. Now, suppose that coming back to the last explored solutions is forbidden. We could therefore have

    x4 = (1, 1, 0, 0, 0) (x2 is tabu)

    x5 = (1, 1, 0, 0, 1) (x3 is tabu)

    x6 = (1, 1, 1, 0, 1) (optimal solution)

    Note that the interested reader will find a generic description of this problem in Section 4.1 of Pirlot

    (1992).

    3.7.2 The algorithm

Initialization: select x1 ∈ X; F* := F(x1); x* := x1 and the tabu list TL := ∅.

Step k (with k = 1, 2, . . .):

Choose the best neighbor x′ of xk that is not tabu:

F(x′) = min{F(x) : x ∈ N(xk), x ∉ TL}.

(1) This is the steepest descent, mildest ascent method. See Hansen and Jaumard (1990) for more about this topic.
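The following Python sketch (the function and parameter names are ours) summarizes this scheme: at each step, move to the best neighbor that is not tabu, record it in a bounded tabu list, and allow a tabu move only when it improves on the best solution found so far (the aspiration criterion used in the coloring example below). Here the tabu list stores recent solutions; storing recent moves, as in that example, is a common alternative.

    def tabu_search(x0, F, neighbors, tabu_size=7, max_iter=100):
        # Best non-tabu neighbor at each step, even if it increases F
        # (steepest descent, mildest ascent); solutions must be hashable.
        x = x_best = x0
        tabu = [x0]                        # short-term memory
        for _ in range(max_iter):
            candidates = [y for y in neighbors(x)
                          if y not in tabu or F(y) < F(x_best)]  # aspiration
            if not candidates:
                break
            x = min(candidates, key=F)
            if F(x) < F(x_best):
                x_best = x
            tabu.append(x)
            if len(tabu) > tabu_size:
                tabu.pop(0)                # forget the oldest entry
        return x_best, F(x_best)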


Figure 3.18: Tabu list (illustrated on a 2-exchange involving vertices i, j, k, l)

x1: F(x1) = 5.

The best move is to change vertex (1) into V (or vertex (3) into V, or vertex (4) into V). That way, F(x2) = 3 and TL = {(1, B)}.

Then, the best move is to change vertex (3) into V. That way, F(x3) = 1 and TL = {(1, B), (3, B)}.

All the moves increase the F function. Choose for instance to change vertex (4) into V: F(x4) = 3 and TL = {(1, B), (3, B), (4, B)}.

The best move is to change vertex (1) into B, which is tabu, but we accept it since it satisfies the aspiration criterion. Thus, F(x5) = 1 and TL = {(1, B), (3, B), (4, B), (1, V)}.

. . .


Figure 3.19: Chromatic number - Tabu search (a coloring of a graph on 11 vertices with the colors B, V, R)


    3.8 Genetic algorithms

    3.8.1 Introduction

Steepest descent, simulated annealing and tabu search are designed to improve an initial solution by exploring solutions that are close to it. This approach is sometimes called an intensification strategy since it allows to intensify the search in the vicinity of a current solution.

A major drawback of such strategies is that they cannot easily reach areas that are very distant from the initial solution; that is, they cannot diversify the exploration of the feasible set X.

A possible remedy to this drawback is to apply the algorithm a large number of times from many different initial solutions (multistart, or sampling strategy). But here again, several problems occur: first, a large number of runs (say, 1000 or 10000) can still be quite small as compared to the size of the space to explore. Second, it is hard to ensure that the sample of initial solutions faithfully represents the set X.(2)

(2) This is a general problem with sampling methods.

Genetic algorithms (GA) offer a specific, quite powerful approach to the diversification issue (see e.g. Goldberg (1989)). In fact, they alternate diversification and intensification phases. At each iteration, they produce a population (i.e., a subset of solutions): at step k, the population is denoted X(k) = {x_1^{(k)}, x_2^{(k)}, . . . , x_N^{(k)}} ⊆ X.

    3.8.2 Diversification via crossover

    Consider a pair of solutions x and y (to be called parents) in the current population. We can combine

    these solutions to produce one or two new solutions (called children) u and v that share some features

    of both x and y. The operator that associates a child (or two children) to a pair of parents is called

    crossover.

    Intuitively (and just as in real life), the children obtained by crossover should look like their parents,

    but should also introduce some diversity in the current population.

Suppose for example that x and y are binary vectors:

x = (11010011)
y = (01100101)

A possible crossover operator produces the single child u, where ui = xi with probability 0.5 and ui = yi with probability 0.5. For our example, this operator could produce the child



u = (11100111).

Note that the second, fifth and eighth elements of u are predetermined since they are common to x and y. Another crossover method works by randomly choosing an index i, splitting x and y at coordinate i and exchanging the initial segments of x and y. For instance, with i = 4 in the previous example, we produce two children:

u = (1101|0101)
v = (0110|0011).

The crossover operators defined above are uniform operators, meaning: if z = (z1, . . . , zn) is a child of x = (x1, . . . , xn) and y = (y1, . . . , yn), then either zi = xi or zi = yi for each i. Note that nonuniform crossovers are also frequently used in the literature (see Mulhenbein (1997) for details).
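Both uniform operators can be sketched in a few lines of Python (the names are ours):

    import random

    def uniform_crossover(x, y):
        # each component of the child comes from one parent chosen at random
        return tuple(xi if random.random() < 0.5 else yi
                     for xi, yi in zip(x, y))

    def one_point_crossover(x, y, i):
        # split both parents at coordinate i and exchange the segments
        return x[:i] + y[i:], y[:i] + x[i:]

With x = (1,1,0,1,0,0,1,1), y = (0,1,1,0,0,1,0,1) and i = 4, one_point_crossover returns exactly the two children u and v displayed above.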

Ideally, the new individuals created by crossover should inherit desirable features from their parents: we would like to produce good children from good parents. This goal can be achieved by combining the following elements:

– When picking a pair of parents to mate, good parents should be selected with a higher probability than bad ones. For instance, x and y could be drawn in X(k) with probability equal to

Prob(x) = (Fmax − F(x)) / Σ_{j=1}^{N} [Fmax − F(xj)]     (3.1)

where Fmax = max{F(xj) : j = 1, . . . , N}. See Table 3.1 for an example, and the sketch following this list.

– Common features of the parents (those that are expected to be typical of good solutions) or, at least, some of those features, should be preserved when producing children (see later). As an example, consider the traveling salesman problem. If the salesman is to visit every European capital then, in a reasonable tour, Helsinki and Madrid will never be visited successively (neither will London and Athens). This feature should be preserved when crossing two reasonable parents.
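The selection rule of equation (3.1) can be sketched as follows (in Python; random.choices performs the weighted draw):

    import random

    def select_parent(population, F):
        # Prob(x) of equation (3.1): the worst individual gets weight 0,
        # better individuals get proportionally larger weights.
        values = [F(x) for x in population]
        Fmax = max(values)
        weights = [Fmax - v for v in values]
        if sum(weights) == 0:              # homogeneous population
            return random.choice(population)
        return random.choices(population, weights=weights, k=1)[0]

On the population of Table 3.1, the weights are 0, 3, 5, 5, which yields exactly the probabilities 0, 3/13, 5/13, 5/13.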

    3.8.3 A basic genetic algorithm

We are now ready to describe a primitive genetic algorithm for the combinatorial optimization problem min{F(x) : x ∈ X}. The algorithm depends on the choice of a crossover operator, and on the choice of a probability distribution Prob(·) defined on every finite subset of X. Let us assume that the following parameters have also been selected: N (the population size) and M (the number of children produced in each generation), with M ≤ N. Then, the basic genetic metaheuristic is presented in Figure 3.20.


X(k)    F(x)          Prob(x)
x1      F(x1) = 15    0
x2      F(x2) = 12    3/13
x3      F(x3) = 10    5/13
x4      F(x4) = 10    5/13

Table 3.1: Genetic algorithms: selecting good parents

1. Initialization: Select an initial population X(1) ⊆ X with |X(1)| = N, set F* := min{F(x) : x ∈ X(1)}, x* := argmin{F(x) : x ∈ X(1)} and k := 1.

2. Repeat:

Selection of parents: Create a new temporary population Y(k) = {y1, . . . , y2M}, drawn randomly (with replacement) from X(k) according to the distribution Prob(x).

Crossover: For j = 1, . . . , M, cross the pair of parents (y2j−1, y2j) to produce the set of children Z(k) = {z1, . . . , zM}.

Survival of the fittest: Draw randomly N − M elements from X(k) (with probability Prob(x)) and add them to Z(k) in order to create the next-generation population X(k+1) = {x_1^{(k+1)}, . . . , x_N^{(k+1)}}. (An alternative procedure would draw N elements from X(k) ∪ Z(k).)

Let x := argmin{F(x) | x ∈ X(k+1)}. If F(x) < F* then F* := F(x) and x* := x.

If the stopping criterion is satisfied then return x*, F* and stop, else let k := k + 1 and continue.

Figure 3.20: A simple genetic metaheuristic
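As a sketch (in Python, with our naming conventions), Figure 3.20 translates into the following skeleton; select and crossover stand for the building blocks discussed above, and a simple iteration limit replaces the stopping criterion.

    def genetic_algorithm(init_pop, F, select, crossover, M, max_gen=200):
        # A sketch of Figure 3.20; |init_pop| = N and M <= N.
        pop, N = list(init_pop), len(init_pop)
        x_best = min(pop, key=F)
        for _ in range(max_gen):
            parents = [select(pop, F) for _ in range(2 * M)]
            children = [crossover(parents[2 * j], parents[2 * j + 1])
                        for j in range(M)]
            survivors = [select(pop, F) for _ in range(N - M)]
            pop = children + survivors      # next generation, size N
            x_best = min(pop + [x_best], key=F)
        return x_best, F(x_best)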


Let us formulate some comments on this algorithm:

1. A mutation phase is sometimes added to this basic algorithm, for instance after the step Survival of the fittest. A mutation operator replaces each element of X(k+1), with some low probability ε, by a randomly selected neighbor of this element. In other words, with probability ε, each element is slightly perturbed. For example, (100101) could be replaced by its mutant (101101). The objective of this operation is to increase the amount of diversification in a population. However, many researchers consider nowadays that mutation does not significantly improve the performance of GAs.

2. Possible stopping criteria are, as usual: a limit on the total number of iterations, convergence of F*, a measure of the gap between F* and a lower bound on min F(x), etc. For GAs, another criterion is also commonly used. Let us define the fitness of population X(k) as the average value of F(x) over X(k), that is, the value

Qk = (1 / |X(k)|) Σ_{x ∈ X(k)} F(x).

Convergence of Qk toward a fixed value indicates that the population is increasingly homogeneous and that the procedure has reached a stationary state. Thus, if the difference |Qk+1 − Qk| is small for several successive iterations, then the algorithm can stop.
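As a sketch, this criterion only requires the average value Qk and a record of its recent history (the names are ours):

    def fitness(pop, F):
        # average objective value Q_k over the current population
        return sum(F(x) for x in pop) / len(pop)

    def stalled(Q_history, eps=1e-6, window=10):
        # True when Q_k has varied by less than eps over the last iterations
        recent = Q_history[-window:]
        return len(recent) == window and max(recent) - min(recent) < eps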

    In its primitive form, the genetic algorithm presented above is generally not a very efficient approach

    to the solution of hard combinatorial optimization problems. Before it becomes a practical method,

    some enhancements have to be added to this basic scheme. In the next subsections, we proceed with a

    discussion of such possible refinements.

    3.8.4 Intensification and local search

In the simple GA outlined above, the average quality (or fitness) of a population is driven up by a single factor in the course of the iterations, namely: the random bias introduced in the selection of parents and in the survival of the fittest step. However, by itself, this bias is generally insufficient to significantly improve a bad initial population.

    Moreover, in spite of everything we said earlier, solutions (children) arising from a crossover operation

    are frequently quite different from their parents and may turn out to be much worse.

    These observations lead to an improvement of the GA scheme which is conceptually simple, but

    very powerful in practice: it consists in introducing a local search (intensification) phase within the


diversification strategy of GA. This is simply done, for instance, by adding the following step right after the crossover step. (Some authors speak of memetic algorithms when this step is introduced in the basic GA scheme.)

Local improvement: For j = 1, 2, . . . , M, let z′j be the best solution produced by a local search algorithm (either greedy, or steepest descent, or SA, . . .) starting from zj as initial solution. Replace zj by z′j in Z(k).

    In picturesque terms, we could say that children must be raised before they can be incorporated in the

    population. More abstractly, with the above modification, we can view GA as performing a succession of

    multistart rounds, where each round is initialized from members of the current population.

Whatever the interpretation, interlacing the basic GA scheme with some form of local search seems to be a sine qua non condition for the efficiency of the procedure. Let us illustrate this on some examples.

Example: Knapsack problem. Consider the knapsack problem

max cx
subject to ax ≤ b and x ∈ {0, 1}^n

and the particular instance:

max 2x1 + 3x2 + 5x3 + x4 + 4x5
subject to 5x1 + 4x2 + 4x3 + 3x4 + 7x5 ≤ 14
xi ∈ {0, 1} for i = 1, . . . , 5.

We use the following crossover operator: if the parents are x and y, then the child z has zi = 1 when xi = yi = 1, and zi = 0 otherwise (i = 1, 2, . . . , n). (The child inherits an object only if both its parents own it.) So we obtain for instance:

x = 11010, value = 6
y = 10001, value = 6
z = 10000, value = 2

    Note that this crossover, even though it ensures feasibility of the children, will systematically produce

    children of lower quality than their parents.

Assume now that we apply a variant of the classical greedy algorithm during the improvement phase: first, we sort the indices (1, 2, . . . , n) into a priority list L, by nonincreasing ratios cj/aj. Then, without changing the components


of z that are already equal to 1, we run through L and fix the next variable to 1 as long as the knapsack constraint is not violated.

In our example, this procedure yields the priority list L = (3, 2, 5, 1, 4), and successively produces the solutions z = 10000 → 10100 → 11100 = z′; stop (with value = 10).

    An alternative interpretation of the previous approach is to consider the local optimization step as

    a feature of the crossover operator itself, rather than as an addition to it. (Even though both points of

    view are in a sense equivalent, it is sometimes interesting to look at them from different angles.)

To illustrate this idea, let us again consider the knapsack problem and consider a priority list L on {1, 2, . . . , n}. Then, we can define an optimizing crossover operator as follows: to compute the child z of x and y, we go through L and we let

zi = 1 if either xi = 1 or yi = 1, and if this results in a feasible solution;
zi = 0 otherwise.

(Another description of the same heuristic is: restrict the attention to those objects that have been selected at least once in either x or y, and apply the greedy heuristic to this subset of objects.)

For the above example, the list L = (3, 2, 5, 1, 4) leads to

x = 11010
y = 10001
z = 01011

The resulting solution z has value 8 (better than both its parents).
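A Python sketch of this optimizing crossover (the names are ours); applied to the instance above, it reproduces the child z = 01011:

    def optimizing_crossover(x, y, c, a, b):
        # Greedy pass through the priority list L (nonincreasing c[j]/a[j]):
        # an object may enter the child only if at least one parent owns it
        # and if the capacity b is not exceeded.
        L = sorted(range(len(c)), key=lambda j: c[j] / a[j], reverse=True)
        z, weight = [0] * len(c), 0
        for j in L:
            if (x[j] == 1 or y[j] == 1) and weight + a[j] <= b:
                z[j], weight = 1, weight + a[j]
        return z

    c, a, b = [2, 3, 5, 1, 4], [5, 4, 4, 3, 7], 14
    print(optimizing_crossover([1, 1, 0, 1, 0], [1, 0, 0, 0, 1], c, a, b))
    # -> [0, 1, 0, 1, 1], i.e., z = 01011 with value 8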

    Example: Traveling salesman problem.

The idea of considering the local optimization step as a feature of the crossover operator can similarly be applied to the traveling salesman problem, as explained for instance in Hoos and Stutzle (2005), Kolen and Pesch (1994), and Merz and Freisleben (2001).

Suppose that T and T′ are two distinct solutions of the traveling salesman problem, viewed as sets of edges. A child of T and T′ can be produced by keeping all edges that occur in both parent solutions, and by using a greedy procedure to complete the resulting partial solution T ∩ T′.

    Merz and Freisleben (2001) propose more specifically to apply the DPX crossover operator shown in

    Figure 3.21 (we skip some details). They show that variants of this crossover operator, when combined

    with effective local improvement steps, provide excellent solutions for the TSP.


DPX Crossover:

1. compute C := T ∩ T′; let P1, P2, . . . , Pk be the subpaths that make up C, and let uj, vj be the endpoints of subpath Pj for j = 1, 2, . . . , k;

2. while C is not a tour, repeat:

if C is a path containing all vertices, then add the missing edge that closes the tour; else,

choose randomly one of the endpoints uj;

choose the closest vertex to uj among all vertices w ∈ {u1, v1, u2, v2, . . . , uk, vk}, w ∉ {uj, vj}, such that the edge (uj, w) is not included in T ∪ T′;

add the edge (uj, w) to C;

3. return C;

Figure 3.21: DPX crossover for the TSP

Such adaptations of the basic genetic algorithm make it possible to enrich it with heuristics that have been developed specifically for the problem at hand. Indeed, whereas the special features of a problem are usually included quite naturally in a steepest descent or in a simulated annealing algorithm (via the neighborhood structure), this is not immediately true in the basic GA formulation displayed in Figure 3.20.

    A similar objective can sometimes be attained through a judicious encoding of the solutions. The