
Multi-GPU Island-Based Genetic Algorithm for Solving the Knapsack Problem

Jiri Jaros

1 Overview

Genetic algorithms (GAs) have become widely applied optimization tools since their development by Holland in 1975 [1]. One of the famous NP-hard problems successfully solved by GAs is the knapsack problem. However, millions of candidate solutions have to be created and evaluated for large problem instances, raising the execution time to hours or even days [2]. The latest GPUs are about 15 times faster than six-core Intel CPUs, which opens new possibilities for massive acceleration of GAs [3].

2 Multi-GPU Cluster Systems

The availability of multiple PCI-Express buses, even on very low-cost commodity computers, makes it possible to construct cluster nodes with multiple GPUs. Inter-node communication uses MPI over a high-speed network, while intra-node communication exploits CPU shared memory.

3 Multi-GPU Island-Based Genetic Algorithm

The population of the GA is distributed over multiple GPUs. Every GPU, controlled by a single MPI process, evolves one entire island. After a predefined number of generations, individuals migrate over a ring topology: each island sends its best local solution and an optional number of randomly selected individuals.
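The migration scheme described above can be sketched in plain Python as a single-process simulation. This is a minimal illustration, not the poster's CUDA/MPI code: the function names are hypothetical, the ring is assumed to be unidirectional, and the replacement policy (an immigrant replaces the current worst individual only if it is better) is an assumption, since the poster does not state how immigrants are incorporated.

```python
import random

def ring_neighbors(rank, n_islands):
    """Return (source, destination) ranks on an assumed unidirectional ring."""
    return (rank - 1) % n_islands, (rank + 1) % n_islands

def select_emigrants(population, fitness, n_random, rng):
    """Pick the best local individual plus n_random randomly chosen others."""
    best = max(range(len(population)), key=lambda i: fitness[i])
    others = [i for i in range(len(population)) if i != best]
    picked = [best] + rng.sample(others, min(n_random, len(others)))
    return [(population[i], fitness[i]) for i in picked]

def incorporate(population, fitness, immigrants):
    """Assumed policy: an immigrant replaces the current worst if it is better."""
    for individual, fit in immigrants:
        worst = min(range(len(population)), key=lambda i: fitness[i])
        if fit > fitness[worst]:
            population[worst], fitness[worst] = individual, fit

def migrate(islands, n_random, rng):
    """One migration step; islands is a list of (population, fitness) list pairs."""
    n = len(islands)
    # Collect all emigrants first so that every island sends pre-migration data.
    outgoing = [select_emigrants(pop, fit, n_random, rng) for pop, fit in islands]
    for rank, (pop, fit) in enumerate(islands):
        source, _ = ring_neighbors(rank, n)
        incorporate(pop, fit, outgoing[source])
```

In a real multi-GPU run, each island would live on its own GPU and the `outgoing` exchange would be replaced by a send to the destination rank and a receive from the source rank.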

4 Local GPU Island Implementation Details [4]

• 32 knapsack items packed into a single integer value

• Individuals processed by CUDA warps in multiple rounds

• Most CUDA block barriers removed

• Negligible thread divergence (< 0.5%)

• Blocks of knapsack data shared within the thread block

• Uniform crossover, bit-flip mutation, binary tournament
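The bit-packed encoding and the three genetic operators listed above can be illustrated with a short CPU-side Python sketch. It is only an assumption-laden model of the idea: the function names are hypothetical, item `i` is assumed to map to bit `i % 32` of word `i // 32`, and overweight solutions are assumed to receive a fitness of 0, because the poster does not describe the penalty scheme or the warp-level layout.

```python
import random

WORD_BITS = 32            # 32 knapsack items per integer word
MASK = 0xFFFFFFFF         # keeps Python ints within 32 bits

def fitness(chromosome, prices, weights, capacity):
    """Sum the prices of selected items; assumed penalty: overweight scores 0."""
    total_price = total_weight = 0
    for w, word in enumerate(chromosome):
        for bit in range(WORD_BITS):
            item = w * WORD_BITS + bit
            if item < len(prices) and (word >> bit) & 1:
                total_price += prices[item]
                total_weight += weights[item]
    return total_price if total_weight <= capacity else 0

def uniform_crossover(a, b, rng):
    """A random 32-bit mask takes each bit from parent a (1) or parent b (0)."""
    child = []
    for word_a, word_b in zip(a, b):
        mask = rng.getrandbits(WORD_BITS)
        child.append((word_a & mask) | (word_b & ~mask & MASK))
    return child

def bit_flip_mutation(chromosome, rate, rng):
    """Flip every bit independently with probability rate."""
    mutated = []
    for word in chromosome:
        flips = 0
        for bit in range(WORD_BITS):
            if rng.random() < rate:
                flips |= 1 << bit
        mutated.append(word ^ flips)
    return mutated

def binary_tournament(population, fitnesses, rng):
    """Draw two individuals at random and return the fitter one."""
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    return population[i] if fitnesses[i] >= fitnesses[j] else population[j]
```

Working on whole words lets crossover handle 32 genes with three bitwise operations, which is the main appeal of the packed representation.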

5 Experimental Results

• Highly optimized CPU implementation running on four 6-core Intel Xeon processors with a 40 Gb/s InfiniBand interconnect

• CUDA implementation running on 14 NVIDIA GTX 580 GPUs

Knapsack problem with 10,000 items

6 Conclusions

The proposed multi-GPU island-based GA allows the solution of large-scale instances of the knapsack problem.

The main benefits:

• Speedups of up to 35, 194, and 781 (14 GPUs vs. 24, 6, and 1 CPU cores, respectively)

• Overall performance of 5.67 TFLOPS (14 GPUs)

• Overall efficiency of 26%

The code will be released as open-source software (http://www.fit.vutbr.cz/~jarosjir).
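The performance and efficiency figures can be related with a few lines of arithmetic. The sketch below assumes that "efficiency" means achieved throughput divided by theoretical peak throughput. The poster does not define the term, so the derived peak values are only what that reading would imply, not reported results.

```python
def implied_peak_tflops(achieved_tflops, efficiency):
    """Peak throughput implied by achieved = efficiency * peak (assumed definition)."""
    return achieved_tflops / efficiency

ACHIEVED_TFLOPS = 5.67   # overall performance reported for 14 GPUs
EFFICIENCY = 0.26        # overall efficiency reported on the poster
N_GPUS = 14

per_gpu_achieved = ACHIEVED_TFLOPS / N_GPUS               # about 0.405 TFLOPS per GPU
peak_total = implied_peak_tflops(ACHIEVED_TFLOPS, EFFICIENCY)
per_gpu_peak = peak_total / N_GPUS                        # about 1.56 TFLOPS per GPU

print(f"achieved per GPU: {per_gpu_achieved:.3f} TFLOPS")
print(f"implied peak: {peak_total:.1f} TFLOPS total, {per_gpu_peak:.2f} TFLOPS per GPU")
```

Under this reading, each GPU sustains roughly 0.4 TFLOPS against an implied peak of roughly 1.56 TFLOPS.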

[1] J. H. Holland, “Adaptation in Natural and Artificial Systems”, Ann Arbor, no. 53. University of Michigan Press, 1975, p. 211

[2] Z. Michalewicz and J. Arabas, “Genetic algorithms for the 0/1 knapsack problem”, in Lecture Notes in Computer Science, 1994, vol. 869, pp. 134-143

[3] V. W. Lee et al., “Debunking the 100X GPU vs. CPU myth”, in Proceedings of the 37th Annual International Symposium on Computer Architecture - ISCA ’10, 2010, p. 451

[4] J. Jaros and P. Pospichal, “A Fair Comparison of Modern CPUs and GPUs Running the Genetic Algorithm under the Knapsack Benchmark”, in Applications of Evolutionary Computation, Heidelberg, DE, Springer, 2012, pp. 426-435

College of Engineering and Computer Science, Australian National University

This research has been partially supported by the research grant "Natural Computing on Unconventional Platforms", GP103/10/1517, Czech Science Foundation (2010-13).

Figure (flowchart): steps executed by each MPI process

• Evaluate the local GPU island

• Select emigrants in the local GPU island

• Transfer the emigrants from the GPU to the CPU and then over the network

• Receive immigrants from the network on the CPU and upload them to the GPU

• Incorporate the immigrants into the local GPU island

Figure: Solution quality. Fitness value (x1000, axis range 2820 to 2840) vs. individuals per local island (128, 256, 512, 1024, 2048); series: 1, 6, 7, 12, and 14 GPUs.

Figure: Total execution time. Execution time [s] (axis range 0 to 140) for 128, 256, 512, 1024, and 2048 individuals; series: 1, 6, 7, 12, and 14 GPUs.

Figure: Speedup on a single island reached by multicore CPU and GPU. Speedup vs. single-thread CPU (axis range 0 to 60) for 128, 256, 512, 1024, and 2048 individuals; series: 1xGPU, 2x6 CPU threads, 6 CPU threads.

Figure: GPU speedup vs. 4 6-core Xeons. Speedup (axis range 0 to 40) for 128, 256, 512, 1024, and 2048 individuals; series: 1, 6, 7, 12, and 14 GPUs.

Figure label: Interconnection Network
