
A Step-by-Step Extending Parallelism Approach for Enumeration of Combinatorial Objects

Hien Phan1, Ben Soh1, and Man Nguyen2

1 Department of Computer Science and Computer Engineering, La Trobe University, Australia

2 Faculty of Computer Science and Engineering, University of Technology, Ho Chi Minh City, Vietnam

Abstract. We present a general step-by-step extending approach to the parallel execution of enumeration of combinatorial objects (ECO). The methodology extends a well-known enumeration algorithm, OrderlyGeneration, so that all objects of size n + 1 can be generated concurrently from all objects of size n. To the best of our knowledge, this is the first attempt to introduce parallel computing into the OrderlyGeneration algorithm for the ECO problem. This general approach could potentially be applied to many different variants of the ECO problem in scientific computing. We have applied this strategy to enumerate orthogonal arrays (OA) of strength t, a typical kind of combinatorial object, using an implementation based on the MPI paradigm. Initial speedup results for this implementation have been analyzed and show that the proposed approach is efficient.

1 Introduction

Enumeration of combinatorial objects (ECO) plays an important role in combinatorial algorithms. Many scientific applications use results from typical instances of ECO, such as maximal clique enumeration (MCE), hexagonal system enumeration and enumeration of orthogonal arrays. For example, solutions of MCE-related problems are used to align 3-dimensional protein structures [CC05] and to find clusters of orthologous genes [PSK+07]. Hexagonal systems are an important topic in computational and theoretical chemistry [BCH03], whilst orthogonal arrays can be applied in Design of Experiments [LYPP03] and software testing [LM08].

Usually, we are interested in enumerating, or producing, precisely one representative from each isomorphism class. In many cases, the only available enumeration methods are based on exhaustive generation of the objects to be counted. Several general serial algorithms have been proposed for enumeration of combinatorial objects ([McK98], [Far78], [Rea79], [AF93]); we call them isomorph-free exhaustive generation algorithms.

One of the major characteristics of ECO is the huge computational effort it requires. Therefore, a parallel method for solving ECO could significantly reduce the execution time and generate new results using the power of high-performance computing systems.

C.-H. Hsu et al. (Eds.): ICA3PP 2010, Part I, LNCS 6081, pp. 463–475, 2010. © Springer-Verlag Berlin Heidelberg 2010

Motivated by the above objective, the general parallel approach presented in this paper efficiently decomposes the computation-intensive ECO problem. All objects of a new size are enumerated concurrently from all generated objects of the previous size, not from scratch. This saves much of the cost of regenerating objects of previous sizes, especially when the target size S is large. Moreover, the simple data-parallel strategy applied at each extending step can give good speedup and scalability.

The rest of this paper is organized as follows. Section 2 presents an overview of general algorithms for isomorph-free exhaustive generation. In Section 3, a step-by-step extending parallel approach for ECO is proposed. Since there are many different variants of the ECO problem, we choose a specific one and apply our general approach to it in experiments. This case study is discussed in Section 4, in which we apply the proposed approach to enumeration of orthogonal arrays of strength t, a specific kind of combinatorial object. Details of the implementation and some initial results are given in Section 5. Finally, Section 6 concludes the paper.

2 Overview and Related Work

2.1 Serial Algorithms for Isomorph-Free Exhaustive Generation

The objective of isomorph-free exhaustive generation of combinatorial objects is to generate one representative of each isomorphism class of those objects. For constructing objects, the most natural and widely used method is backtracking [Wal60]. Most methods proposed for isomorphism rejection, on the other hand, fall into two types: the OrderlyGeneration method, proposed by Read [Rea79] and Faradzev [Far78], and the canonical augmentation method, proposed by McKay [McK98]. Thorough discussions of these methods can be found in [KO06] and [MS08]. Note that there is also the so-called "method of homomorphisms" (Laue and others [GLMB96]), but it uses a more algebraic approach rather than the search-tree model.

The most common method is OrderlyGeneration, which was introduced independently in 1978 by Read [Rea79] and Faradzev [Far78]. Basically, it relies on the idea that every isomorphism class has a canonical representative, which is the object that needs to be generated. Usually, the canonical object is the object that is extremal in its isomorphism class (lexicographically largest or lexicographically smallest). The algorithm backtracks whenever a subobject is not canonical. The canonical labeling and the extensions of an object must be defined so that each canonically labeled object is the extension of exactly one canonical object.
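To make the OrderlyGeneration scheme concrete, the following Python sketch applies it to a toy class of objects, simple undirected graphs, rather than to any object from this paper. The choices are assumptions for illustration only: an object of size n is an n-vertex graph, an extension adds one vertex joined to any subset of the existing vertices, and the canonical form is the relabelling whose upper-triangle code (read column by column) is lexicographically largest. With this code, the first n − 1 vertices of a canonical graph again form a canonical graph, so each canonical graph is the extension of exactly one canonical graph, as OrderlyGeneration requires. The canonicity test is brute force over all n! relabellings and is only usable for tiny n; all function names are hypothetical.

```python
from itertools import permutations

def code(adj, perm):
    """Upper-triangle code of the graph relabelled by perm, read column by column.

    New vertex i is old vertex perm[i]; column j lists adjacencies to vertices 0..j-1,
    so the code of the first n-1 vertices is a prefix of the code of all n vertices.
    """
    n = len(perm)
    return tuple(adj[perm[i]][perm[j]] for j in range(1, n) for i in range(j))

def is_canonical(adj):
    """Canonical = largest code over all n! relabellings (brute force)."""
    n = len(adj)
    own = code(adj, tuple(range(n)))
    return all(code(adj, p) <= own for p in permutations(range(n)))

def extensions(adj):
    """Every way to add one new vertex adjacent to a subset of the old vertices."""
    n = len(adj)
    for mask in range(2 ** n):
        new_col = [(mask >> i) & 1 for i in range(n)]
        rows = [adj[i] + (new_col[i],) for i in range(n)]
        rows.append(tuple(new_col) + (0,))
        yield tuple(rows)

def orderly_generation(adj, target, out):
    """Extend the current canonical object; backtrack on non-canonical children."""
    if len(adj) == target:
        out.append(adj)
        return
    for child in extensions(adj):
        if is_canonical(child):
            orderly_generation(child, target, out)

def generate(target):
    out = []
    orderly_generation((), target, out)  # root call: empty graph, size 0
    return out

print([len(generate(n)) for n in range(1, 6)])  # [1, 2, 4, 11, 34]
```

The printed counts are the numbers of isomorphism classes of graphs on 1 to 5 vertices, one representative each. The root call with the empty object plays the role of the root parameters [] and n = 0 mentioned in Section 2.2.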

The second method is canonical augmentation, proposed by McKay [McK98], where generation is done via a canonical construction path instead of a canonical representation. In this method, objects of size k are generated from objects of size k − 1, and only canonical augmentations are accepted. Hence the canonicity test is replaced by a test of whether the augmentation from the smaller object is canonical. McKay's method is related to the reverse search method of Avis and Fukuda [AF93]. Both are based on defining a tree structure on a set of objects, with a function that decides the parent of each object. However, they differ in that Avis and Fukuda's method is not concerned with eliminating isomorphs, but simply repeated objects.

It is worth noting that these isomorph-free exhaustive generation algorithms follow the same level-wise pattern as the well-known association rule mining algorithm Apriori [AS94], which generates all objects of size k from all objects of size k − 1.

2.2 Two Issues of Concern

Before proposing our small-step parallel approach, we discuss the properties of the OrderlyGeneration algorithm further.

Note that the OrderlyGeneration algorithm generates from scratch when called with the root parameters [] and n = 0, and it finishes when it reaches the target size S. This characteristic raises two issues. First, suppose we want to generate the next level S + 1 after finishing generation at level S. In this case we must restart the procedure from scratch and regenerate all intermediate levels. Obviously, this wastes time and cost, since no result of the previous run is reused. Second, in practice, when S is large, the computation cost of even a single extending step (generating objects of size k from objects of size k − 1, with k ≤ S) can also be very large (see Section 5). In this case it is worth applying parallel computing to handle the high computation cost.

3 A Proposed Approach

These two issues motivate us to find a parallel method that decomposes the generation process into small, separate computation steps and reuses previously computed results when generating the next size. Fortunately, this can be done by exploiting a special characteristic of the OrderlyGeneration algorithm.

An important characteristic of the OrderlyGeneration algorithm is that a canonical object of size k is guaranteed to be an extension of exactly one canonical object of size k − 1. Consequently, all canonical objects of size k can be generated by extending all canonical objects of size k − 1. From a data-parallelism point of view, this is a very important characteristic that can be exploited to parallelize the generation. In particular, the OrderlyGeneration algorithm generates canonical objects in increasing size: it begins with the object of size 0 and generates, step by step, all canonical objects of size k from all canonical objects of size k − 1. The generation is repeated until all canonical objects of the target size S are reached. The search tree space of the OrderlyGeneration algorithm is shown in Fig. 1.
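The characteristic above can be written as a set identity. The notation is ours, not the paper's: $\mathcal{C}_k$ denotes the set of canonical objects of size k (with $\mathcal{C}_0$ containing only the empty object) and $\mathrm{Ext}(X)$ the set of one-step extensions of X. Because each canonical object of size k extends exactly one canonical object of size k − 1, the union is disjoint, so any partition of $\mathcal{C}_{k-1}$ among P processors yields disjoint partial results that can simply be concatenated, with no duplicate elimination between processors:

$$\mathcal{C}_k = \bigsqcup_{X \in \mathcal{C}_{k-1}} \{\, Y \in \mathrm{Ext}(X) : Y \text{ is canonical} \,\}, \qquad k = 1, \dots, S,$$

$$\mathcal{C}_{k-1} = \mathcal{P}_1 \sqcup \dots \sqcup \mathcal{P}_P \;\Longrightarrow\; \mathcal{C}_k = \bigsqcup_{i=1}^{P} \; \bigsqcup_{X \in \mathcal{P}_i} \{\, Y \in \mathrm{Ext}(X) : Y \text{ is canonical} \,\}.$$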

Fig. 1. Search tree space of the OrderlyGeneration algorithm

We propose a novel small-step parallel approach for ECO based on the above characteristic of the OrderlyGeneration algorithm. The main idea is to divide the generation into separate small steps. At each step, we generate concurrently all canonical objects of size k from all canonical objects of size k − 1. All canonical objects of size k are stored and reused in the next step. In particular, we propose the following general step-by-step parallel approach for enumeration of combinatorial objects:

1. At the initial stage, use the original serial OrderlyGeneration algorithm to generate all canonical objects of an initial size k0, which is chosen depending on the specific kind of combinatorial object.

2. Apply a data-parallel strategy to generate all canonical objects of size k from all canonical objects of size k − 1. All results are stored and reused in the next step.

3. Continue the step-by-step extending phases until all the canonical objects of the target size S are reached.
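The three steps can be sketched in Python as follows. This is a hypothetical illustration, not the authors' MPI code: `extend_one` stands for whatever serial routine returns the canonical children of one canonical object (in Section 4 this role is played by MCS on a single LMC matrix), the worker pool stands in for MPI processes, and the level k − 1 objects are split round-robin into static slices. `ThreadPoolExecutor` is used so the example runs anywhere; for real CPU speedup in Python one would pass `ProcessPoolExecutor`, which requires `extend_one` to be a module-level (picklable) function. The toy `extend_multiset`, which builds multisets as nondecreasing tuples, is an assumption chosen only because its canonical forms are easy to check.

```python
from concurrent.futures import ThreadPoolExecutor

def _expand(extend_one, parents):
    """Worker task: extend one slice of the stored level k-1 objects.

    No coordination between workers is needed, because every canonical object
    of size k has exactly one canonical parent of size k-1.
    """
    return [child for parent in parents for child in extend_one(parent)]

def step_by_step(initial_level, extend_one, k0, target, workers=4,
                 executor_cls=ThreadPoolExecutor):
    """Steps 2 and 3: extend level by level, storing every level for reuse."""
    levels = {k0: list(initial_level)}  # step 1 result, produced serially by the caller
    with executor_cls(max_workers=workers) as pool:
        for k in range(k0 + 1, target + 1):
            prev = levels[k - 1]
            # Static round-robin decomposition of the stored level k-1.
            slices = [prev[i::workers] for i in range(workers)]
            parts = pool.map(_expand, [extend_one] * workers, slices)
            levels[k] = [obj for part in parts for obj in part]
            if not levels[k]:  # nothing canonical at this size: stop early
                break
    return levels

def extend_multiset(parent, q=3):
    """Hypothetical toy object: multisets over {0, ..., q-1}, canonical form =
    nondecreasing tuple. Appending v >= last element keeps the child canonical,
    and each child has exactly one canonical parent (its prefix)."""
    start = parent[-1] if parent else 0
    return [parent + (v,) for v in range(start, q)]

if __name__ == "__main__":
    initial = [(v,) for v in range(3)]  # step 1: serial generation of size k0 = 1
    levels = step_by_step(initial, extend_multiset, k0=1, target=4)
    print({k: len(objs) for k, objs in levels.items()})
```

Run as a script, this should report 3, 6, 10 and 15 objects for sizes 1 to 4, the numbers of multisets of those sizes over three symbols. Because level k − 1 is kept in `levels`, a later call could continue to size 5 from the stored level 4 instead of restarting from scratch.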

The most important phase is the generation of objects of size k from all objects of size k − 1, in which canonical objects of size k are generated and stored concurrently. In Section 5, we discuss some methods that could be used for domain decomposition in this phase.

The proposed approach has several advantages. Most importantly, because canonical objects are reused, all objects of each level are generated exactly once. All objects of a new size are enumerated from all generated objects of the previous size, not from scratch. This saves much of the cost of regenerating objects of previous sizes, especially when S is large. Moreover, the data-parallel strategy gives efficient speedup in each extending step (see Section 5).


4 A Case Study: Enumeration of Orthogonal Arrays of Strength t

Since there are many different kinds of ECO problems, we choose a specific variant of ECO and apply our general approach to it in experiments. In this paper, we have applied the proposed approach to enumeration of orthogonal arrays (OA) of strength t, a special kind of combinatorial object. It is noteworthy that OAs have many applications in Design of Experiments [LYPP03] and software testing [LM08]. In this section, we first present the notation for OAs of strength t. We then present the MCS algorithm, an OrderlyGeneration algorithm for enumerating orthogonal arrays proposed by Schoen, Eendebak and Nguyen [SEN10]. Finally, we discuss how our approach is applied to enumeration of orthogonal arrays of strength t.

4.1 Notation for Orthogonal Arrays

We use the definition of an OA of strength t given in [Ngu08]. We call d finite sets Q1, Q2, ..., Qd factors, where d is a finite number.

The elements of a factor are called levels. The (full) factorial design with respect to these factors is the Cartesian product D = Q1 × ··· × Qd. A fractional design, or fraction, F of D is a subset consisting of elements of D (possibly with multiplicities). We denote by ri := |Qi| the number of levels of the ith factor. Let s1 > s2 > ··· > sm be the distinct factor sizes of F, and suppose that F has exactly ai factors with si levels. We call the partition

$$r_1 \cdot r_2 \cdots r_d = s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m}$$

the design type of F.

A subfraction of F is obtained by choosing a subset of the factors (columns) and removing the other factors. If a fraction is a multiple of a full design, i.e. it contains every possible row with the same multiplicity, we call it trivial. For a natural number t, a fraction F is called t-balanced if, for each choice of t factors, the corresponding subfraction is trivial. In other words, every possible combination of coordinate values from a set of t factors occurs equally often. A t-balanced fraction F is also called an OA of strength t. If F has N rows, we write

$$F = \mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m}; t).$$

For instance, the following array is an OA of strength 3 but not of strength 4,

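$\mathrm{OA}(16; 4 \cdot 2^3; 3)$:

$$F = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & 3 & 3 & 3 & 3 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1
\end{bmatrix}^{T}$$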

We say that three column vectors X, Y, Z are orthogonal if each possible tuple (x, y, z) appears in [X | Y | Z] with the same frequency. Thus an array has strength 3 if and only if every triple of its columns is orthogonal.
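The strength definition translates directly into a check over all t-subsets of columns. The sketch below is illustrative and assumes levels are coded 0, ..., ri − 1 and that the array is given as a list of row tuples; the helper name `is_t_balanced` is ours. The usage builds a small two-level example, not the array above: an 8-run array whose fourth column is the parity (XOR) of the first three, which is an OA(8; 2^4; 3) but not of strength 4.

```python
from collections import Counter
from itertools import combinations, product
from math import prod

def is_t_balanced(F, levels, t):
    """Return True if every t-column subfraction of F is trivial.

    F is a list of rows (tuples); levels[j] is the number of levels of factor j.
    A subfraction is trivial when every combination of its levels occurs and
    all combinations occur equally often.
    """
    for cols in combinations(range(len(levels)), t):
        counts = Counter(tuple(row[c] for c in cols) for row in F)
        cells = prod(levels[c] for c in cols)
        if len(counts) != cells or len(set(counts.values())) != 1:
            return False
    return True

# Example: 8 runs, four 2-level factors, last column = XOR of the first three.
oa8 = [(a, b, c, a ^ b ^ c) for a, b, c in product(range(2), repeat=3)]
print(is_t_balanced(oa8, [2] * 4, 3))  # True: every triple of columns is orthogonal
print(is_t_balanced(oa8, [2] * 4, 4))  # False: only 8 of the 16 combinations occur
```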


4.2 An OrderlyGeneration Algorithm for Enumerating Orthogonal Arrays

Recently, Schoen, Eendebak and Nguyen [SEN10] proposed an algorithm named Minimum Complete Set (MCS) for finding lexicographically least orthogonal arrays. With this algorithm, orthogonal arrays of several distinct design types have been generated and enumerated.

LMC matrix. Lexicographic comparison of two arbitrary orthogonal arrays with the same design type was first proposed in [Ngu05].

For two vectors u and v of length L, we say u is lexicographically less than v, written u < v, if there exists an index j ∈ {0, 1, ..., L − 1} such that u[i] = v[i] for all 1 ≤ i ≤ j and u[j + 1] < v[j + 1].

Let $F = [c_1, \dots, c_d]$ and $F' = [c'_1, \dots, c'_d]$ be any pair of fractions, where $c_i, c'_i$ are columns. We say F is column-lexicographically less than F′, written $F < F'$, if and only if there exists an index $j \in \{0, 1, \dots, d - 1\}$ such that $c_i = c'_i$ for all $1 \le i \le j$ and $c_{j+1} < c'_{j+1}$ lexicographically.

The smallest matrix of an isomorphism class with a given design type is called the lexicographically minimum in columns (LMC) matrix, and it is the unique representative of this isomorphism class. The concept of an LMC matrix is thus equivalent to the concept of a canonical object in the general OrderlyGeneration algorithm. In other words, an LMC matrix is a specific canonical object in the context of orthogonal arrays.
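On small arrays, a brute-force LMC test follows directly from these definitions. The sketch assumes the isomorphisms are row permutations, permutations of columns with the same number of levels, and relabellings of the levels within each column; this is our reading for illustration, not code from [SEN10], and it is exponential in the number of columns. Two facts keep it short: Python compares tuples lexicographically, so the tuple of columns orders matrices exactly as column-lexicographic order does, and for a fixed column arrangement the column-lexicographically smallest row permutation is obtained by sorting the rows. The names `col_key`, `lmc_form` and `is_lmc` are hypothetical.

```python
from itertools import permutations, product

def col_key(F):
    """Tuple of columns; tuple comparison on it is column-lexicographic order."""
    return tuple(zip(*F))

def lmc_form(F, levels):
    """Column-lexicographically smallest matrix isomorphic to F (brute force)."""
    best = None
    for cols in permutations(range(len(levels))):
        # Only permute columns that have the same number of levels.
        if any(levels[c] != levels[j] for j, c in enumerate(cols)):
            continue
        for relabel in product(*(permutations(range(levels[c])) for c in cols)):
            # Sorting the rows gives the smallest row permutation for this arrangement.
            G = sorted(tuple(relabel[j][row[c]] for j, c in enumerate(cols)) for row in F)
            if best is None or col_key(G) < col_key(best):
                best = G
    return best

def is_lmc(F, levels):
    """F is LMC exactly when it already equals the smallest member of its class."""
    return col_key(F) == col_key(lmc_form(F, levels))

oa8 = [(a, b, c, a ^ b ^ c) for a, b, c in product(range(2), repeat=3)]
print(is_lmc(oa8, [2] * 4))                  # True: rows already in LMC order
print(is_lmc(list(reversed(oa8)), [2] * 4))  # False: same class, not the smallest member
```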

Finding lexicographically least orthogonal arrays. The MCS backtracking algorithm constructs new orthogonal arrays and checks whether each newly generated orthogonal array is LMC. In particular, it generates and extends arrays column by column until it reaches the target column size S. The details of this algorithm in [SEN10] are quite involved, so we summarize the outline of the MCS algorithm below:

Input: an orthogonal array X = [x1, x2, ..., xn] and its number of columns n
if IsComplete(X) then
    process X
end if
if IsExtendible(X) then
    for all extensions X′ = [x1, x2, ..., xn, x′] of X do
        if IsNewOA(X′) then
            if IsLexLeast(X′) then
                MCS(X′, n + 1)
            end if
        end if
    end for
end if

Algorithm 1. MCS algorithm


In the outline above, X = [x1, x2, ..., xn] is an orthogonal array with n columns. The MCS algorithm uses backtracking to fill in the cells of the appended column. After a new column has been completely appended to create a new orthogonal array, the algorithm checks whether the new array is an LMC matrix. If not, it backtracks to search for another new orthogonal array. If so, it calls MCS recursively to continue appending new columns until it reaches the target column size S.
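Algorithm 1 can be turned into a small runnable sketch for symmetric designs OA(N; s^k; t). This is an assumption-laden illustration, not the MCS implementation of [SEN10]: it appends a whole candidate column at a time instead of filling cells with backtracking, it omits the IsExtendible pruning, `is_new_oa` checks that every min(t, k)-subset of columns is balanced, and `is_lex_least` is a brute-force LMC test (row sorting plus all column and level permutations), so it only works for very small N and k. The function names follow the pseudocode loosely and are otherwise our own.

```python
from collections import Counter
from itertools import combinations, permutations, product

def is_new_oa(F, s, t):
    """IsNewOA (sketch): every min(t, k)-subset of the k columns is balanced."""
    k = len(F[0])
    r = min(t, k)
    for cols in combinations(range(k), r):
        counts = Counter(tuple(row[c] for c in cols) for row in F)
        if len(counts) != s ** r or len(set(counts.values())) != 1:
            return False
    return True

def is_lex_least(F, s):
    """IsLexLeast (sketch): brute-force LMC test over row, column and level permutations."""
    k = len(F[0])
    key = tuple(zip(*F))
    for cols in permutations(range(k)):
        for relabel in product(permutations(range(s)), repeat=k):
            G = sorted(tuple(relabel[j][row[c]] for j, c in enumerate(cols)) for row in F)
            if tuple(zip(*G)) < key:
                return False  # a smaller isomorphic matrix exists
    return True

def mcs(F, s, t, target, out):
    """Recursive column-by-column extension in the spirit of Algorithm 1."""
    if len(F[0]) == target:  # IsComplete: process X
        out.append(F)
        return
    for col in product(range(s), repeat=len(F)):  # every candidate new column x'
        G = [row + (v,) for row, v in zip(F, col)]
        if is_new_oa(G, s, t) and is_lex_least(G, s):
            mcs(G, s, t, target, out)

def enumerate_lmc(N, s, t, target):
    """All LMC matrices OA(N; s^target; t), starting from the empty N x 0 array."""
    out = []
    mcs([() for _ in range(N)], s, t, target, out)
    return out

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, len(enumerate_lmc(4, 2, 2, k)))
```

For 4 runs, two levels and strength 2, this should report a single LMC matrix for 1, 2 and 3 columns and none for 4 columns, consistent with the fact that at most three two-level columns fit into an OA(4; 2^k; 2).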

4.3 Applying Our Proposed Approach to Enumeration of Orthogonal Arrays

Since enumeration of orthogonal arrays is a typical ECO problem and MCS is a specific OrderlyGeneration algorithm, our proposed parallel approach can be applied to this problem, with MCS as the original sequential algorithm. In particular, our method is applied to enumeration of orthogonal arrays of strength t as follows:

1. At the initial stage, use the original serial MCS algorithm to generate all LMC matrices $\mathrm{OA}(N; s_1^{a_{10}} \cdot s_2^{a_{20}} \cdots s_m^{a_{m0}}; t)$ of an initial column size $k_0 = a_{10} + a_{20} + \dots + a_{m0}$.

2. Apply a data-parallel strategy to generate all LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m + 1}; t)$ with column size $k = a_1 + a_2 + \dots + a_m + 1$ from all LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m}; t)$ with column size $k - 1 = a_1 + a_2 + \dots + a_m$. All results are stored and reused in the next step.

3. Continue the step-by-step extending phases until all the LMC matrices of the target column size S are reached, or until no LMC matrix is generated at some size $S_0 < S$.

5 Algorithm Design Details

The most important phase of our proposed approach is the extending phase, in which all LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m + 1}; t)$ are generated concurrently from all LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m}; t)$. The initial implementation of this phase is presented in this section.

5.1 Domain Decomposition

At the beginning, all LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m}; t)$ are stored in an input file. We therefore need a method for distributing the input matrices to all processes.

Naive method. The basic method for domain decomposition is to divide all input matrices equally among the processes. In particular, a single process, such as the process with rank 0, reads all input matrices from the input file and distributes the LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m}; t)$ evenly to all other processes. For each input, each process uses the MCS algorithm to generate LMC matrices of $\mathrm{OA}(N; s_1^{a_1} \cdot s_2^{a_2} \cdots s_m^{a_m + 1}; t)$. Note that no dynamic load-balancing scheme is used in this method: the inputs are simply distributed equally among all processes at the start.


Master-slave method. Another method uses a single process to balance the load dynamically across all other processes. In particular, the process with rank 0, called the master, reads all input matrices from the input file and then sends one input matrix at a time to each of the other processes, called workers, which generate new results from it. After finishing a single input, each worker requests a new input from the master, which accepts the request and sends another input. This continues until the master has no more inputs.

In our experiment, the Master-slave method gave better load balancing, as can be seen in Fig. 2:

Fig. 2. Execution time of each process (x-axis: process ID, 0 to 7; y-axis: execution time in seconds)
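The difference between the naive and Master-slave methods can be illustrated with a small scheduling simulation. This is a simplified model under stated assumptions, not a reproduction of the experiments: each input has a fixed processing cost, communication and I/O are ignored, the master does no generation work, the naive method gives each worker one contiguous block of equal size, and the Master-slave method gives the next input, in file order, to whichever worker becomes idle first. The costs in the usage line are made-up numbers chosen to show one expensive input among cheap ones, in the spirit of Fig. 3.

```python
import heapq
from math import ceil

def naive_makespan(costs, workers):
    """Static split: contiguous blocks of equal size, fixed at start time."""
    size = ceil(len(costs) / workers)
    return max(sum(costs[i:i + size]) for i in range(0, len(costs), size))

def master_slave_makespan(costs, workers):
    """Dynamic split: the master sends the next input to the first idle worker."""
    idle_at = [(0, w) for w in range(workers)]  # (time the worker becomes idle, worker id)
    heapq.heapify(idle_at)
    finish = 0
    for cost in costs:  # inputs are handed out one at a time, in file order
        t, w = heapq.heappop(idle_at)
        heapq.heappush(idle_at, (t + cost, w))
        finish = max(finish, t + cost)
    return finish

costs = [10, 1, 1, 1, 1, 1, 1, 1]  # hypothetical costs: one heavy input, seven light ones
print(naive_makespan(costs, 2), master_slave_makespan(costs, 2))  # 13 10
```

Even with dynamic assignment, the finishing time can never drop below the cost of the single most expensive input (10 here); this is the effect behind the reduced speedup at 16 processes discussed in Section 5.3.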

Random polling method. Besides using one process as a master for dynamic load balancing, other methods could also be used. One of them is the random polling method, which is considered one of the most efficient methods of requesting work when the underlying architecture of the computing system is not known [KGR94]. In random polling, after finishing all the input matrices assigned to it, an idle process requests more inputs from another, randomly chosen process. Random polling could therefore be a useful method for dynamic load balancing.

Work-stealing method. Besides master-slave and random polling, a popular method for dynamic load balancing is the work-stealing algorithm. The work-stealing method is simple. At each time step, each empty process sends a request to one other process, usually chosen at random. Each non-empty process that has received at least one such request selects one of them. Each empty process whose request has been accepted then "steals" tasks (matrices) from the other process. Since work stealing is a stable [BF01] and scalable [DLS+09] algorithm, it could also be a good choice for dynamic load balancing in our general approach.
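The work-stealing round described above can be simulated in a few lines. This is a toy model built on assumptions of ours: every task (matrix) costs one time step, each process executes tasks from the bottom of its own deque, a victim hands half of its remaining tasks from the top of its deque to one accepted thief, and requests go to uniformly random victims (the same mechanism random polling uses to find work). The function name `work_stealing_rounds` is hypothetical, and the sketch counts rounds and executed tasks only; it does not model MPI communication costs.

```python
import random
from collections import deque

def work_stealing_rounds(queues, rng):
    """Simulate round-based work stealing with unit-cost tasks.

    queues: one deque of task ids per process. Returns (rounds, tasks_done).
    """
    p = len(queues)
    rounds = done = 0
    while any(queues):
        rounds += 1
        # 1. Every non-empty process executes one task from the bottom of its deque.
        for q in queues:
            if q:
                q.pop()
                done += 1
        # 2. Every empty process sends a request to one other process chosen at random.
        requests = {}
        for i, q in enumerate(queues):
            if not q and p > 1:
                victim = rng.choice([j for j in range(p) if j != i])
                requests.setdefault(victim, []).append(i)
        # 3. Each requested victim accepts one request and gives half of its tasks,
        #    taken from the top of its deque (a victim with one task keeps it).
        sizes = [len(q) for q in queues]
        for victim, thieves in requests.items():
            if sizes[victim] >= 2:
                thief = rng.choice(thieves)
                for _ in range(sizes[victim] // 2):
                    queues[thief].append(queues[victim].popleft())
    return rounds, done

if __name__ == "__main__":
    queues = [deque(range(64))] + [deque() for _ in range(3)]  # all work starts on process 0
    print(work_stealing_rounds(queues, random.Random(1)))
```

Starting with all 64 tasks on one of four processes, the round count must lie between 16 (perfect balance) and 64 (no stealing at all); the simulation lets one see how close stealing gets to the lower bound.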

At present, the choice of a proper domain decomposition and dynamic load-balancing method depends on the specific ECO problem. In this initial experiment, we chose the Master-slave method for dynamic load balancing.

5.2 Processing Outputs

Recall that our approach generates all canonical objects of size k from all canonical objects of size k − 1, and all results are stored and reused in the next step. Hence, storing all generated outputs is an important concern. Ideally, all outputs generated by distinct processes would be stored concurrently in a single file. In this experiment, we chose a basic method: each time a new LMC matrix is generated, the process writes the result to the shared log file of the cluster system using the standard C print functions. However, on a parallel file system, MPI-IO [Mes97] could help further improve the performance of concurrent I/O tasks.

5.3 Some Initial Experiment Results

Our experiments were executed on the Hercules cluster of La Trobe University, Australia. A small experiment was run to simplify the evaluation. At the initial stage, the MCS algorithm is used to generate all 89 LMC matrices of $\mathrm{OA}(72; 3 \cdot 2^4; 3)$. These 89 LMC matrices are then used as inputs for enumerating all isomorphism classes of $\mathrm{OA}(72; 3 \cdot 2^5; 3)$, which yields 27730 LMC matrices of $\mathrm{OA}(72; 3 \cdot 2^5; 3)$. The number of processes was doubled repeatedly, from 2 up to 16. In each experiment, we collect the maximal and total execution times; the maximal execution times are recorded in Table 1:

Table 1. Execution time

Number of processes    Maximal execution time (minutes)
2 (1 + 1)              465.9
4 (1 + 3)              156.23
8 (1 + 7)              79.6
16 (1 + 15)            57.45

Note that in the experiment with two processes, there is actually only one worker doing the generation, whilst the master just sends the inputs to that worker one at a time. In the experiment with four processes, there are three workers and one master, and so on.

Since we use the master-slave method in our experiments, every parallel experiment needs at least two processes (one master and at least one worker). This is why we focus on the relative speedup of the algorithm, with two processes as the baseline. The relative speedup is defined as

$$\mathrm{Speedup}(p) = \frac{T(p)}{T(2p)},$$

where T(p) is the maximal execution time with p processes. The results are given in Table 2:


Table 2. Relative speedup

Number of processes    Relative speedup (ideal is 2)
4 (1 + 3)              2.98
8 (1 + 7)              1.97
16 (1 + 15)            1.38
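Table 2 can be recomputed from Table 1 with the formula Speedup(p) = T(p)/T(2p). The short Python sketch below does this and, as an addition of ours, also prints a worker-based reference value: since the master does not generate objects, going from p to 2p processes raises the number of workers from p − 1 to 2p − 1, so the ratio (2p − 1)/(p − 1) is arguably a fairer ideal than 2. This explains how the step from 2 to 4 processes can exceed 2: it triples the number of workers. The recomputed values agree with Table 2 to within about 0.01; the small differences presumably come from rounding in Table 1.

```python
# Maximal execution times from Table 1, in minutes, keyed by the number of processes.
times = {2: 465.9, 4: 156.23, 8: 79.6, 16: 57.45}

# Relative speedup T(p) / T(2p), reported against the larger process count 2p.
speedup = {2 * p: times[p] / times[2 * p] for p in times if 2 * p in times}

# Worker-based reference: with one master, p processes contain p - 1 workers.
ideal = {2 * p: (2 * p - 1) / (p - 1) for p in times if 2 * p in times}

for q in sorted(speedup):
    print(f"{q:>2} processes: speedup {speedup[q]:.2f}, worker-based ideal {ideal[q]:.2f}")
```

This prints speedups of 2.98, 1.96 and 1.39 against worker-based ideals of 3.00, 2.33 and 2.14, which makes the drop in efficiency at 16 processes easier to see.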

Using the 89 input matrices of $\mathrm{OA}(72; 3 \cdot 2^4; 3)$ in all experiments, the results show that the speedup scales well up to 8 processes. However, the speedup drops significantly when we double the number of processes to 16. This is because the exploring times of the inputs differ greatly. For example, the exploring time of the MCS function on each input is recorded in Fig. 3.

Fig. 3. Time for processing each input (x-axis: matrix ID; y-axis: time for processing each matrix, in seconds)

As Fig. 3 shows, exploring a simple input can take about 1 second, whereas exploring a more complex input matrix can take about 430 seconds. The execution time of a process is the sum of the exploring times of all input matrices assigned to it. With only 89 input matrices distributed over 16 processes, the total exploring time of all inputs of one process can be less than the exploring time of a single input of another process. This strongly affects the load balance of the algorithm. Indeed, the execution times of the 16 processes differ significantly, as recorded in Fig. 4.

Finally, it is worth analyzing the cost of each extending phase further. Using 89 input matrices of $\mathrm{OA}(72; 3 \cdot 2^4; 3)$ to generate 27730 LMC matrices of $\mathrm{OA}(72; 3 \cdot 2^5; 3)$ took about 80 minutes with 8 processes (see Table 1). Extrapolating linearly, using the 27730 LMC matrices of $\mathrm{OA}(72; 3 \cdot 2^5; 3)$ to generate all LMC matrices of $\mathrm{OA}(72; 3 \cdot 2^6; 3)$ with 8 processes could take about (27730/89) × 80 ≈ 25,000 minutes. This shows that it is worthwhile to apply parallel computing to reduce the time taken by each extending phase.
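The extrapolation can be spelled out, under the same assumption as in the text that the average cost per input matrix stays the same at the next size and that 8 processes are used:

$$\frac{27\,730}{89} \times 80\ \text{min} \approx 311.6 \times 80\ \text{min} \approx 24\,926\ \text{min} \approx 415\ \text{h} \approx 17.3\ \text{days}.$$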


Fig. 4. Execution Time of 16 processes (x-axis: process ID, 0 to 15; y-axis: execution time in seconds)

6 Conclusion and Future Work

The parallel approach presented in this paper is able to handle the computation-intensive nature of the ECO problem. It uses the original OrderlyGeneration method for enumerating combinatorial objects as the foundation for step-by-step extending generation.

The proposed approach has several advantages. Most importantly, because canonical objects are reused, all objects of each level are generated exactly once. All objects of a new size are enumerated from all generated objects of the previous size, not from scratch. This saves much of the cost of regenerating objects of previous sizes, especially when S is large. Moreover, from a data-parallelism point of view, reusing the previous objects to generate all objects of the next size gives a good opportunity to apply data-parallel techniques in each extending step.

The experiments in this work are at an initial stage, in which we use the Master-slave method for domain decomposition, because in this paper we only aim to show the potential usefulness of our proposed method for applying parallel computing to enumeration of combinatorial objects. We could explore more advanced domain decomposition techniques, such as random polling or work stealing, to improve load balancing. In addition, with a work-stealing or random polling algorithm for domain decomposition, the speedup analysis would give a much better understanding of the performance gains (and the communication overhead) if done against the sequential algorithm, i.e. absolute speedup instead of the relative speedup used in this initial experiment.

Moreover, based on the results of this initial work, we believe that the I/O time spent writing and reading results, plus the communication load on the master node, might be a dominant factor in the speed of the algorithm, especially when the search space is wide. Hence, I/O optimizations could be applied in the future, such as using MPI-IO to improve the performance of I/O tasks.

Finally, our approach could potentially be applied to many special kinds of ECO. For example, besides the MCS algorithm for enumeration of orthogonal arrays of strength t, there are specific OrderlyGeneration algorithms for other ECO problems, such as an OrderlyGeneration algorithm for classifying triple systems [KTR09]. Hence, we could apply our parallel method to the problem of classifying triple systems.

Acknowledgment

The research was carried out whilst the first author was supported by a La Trobe University Tuition Fee Remission, a Postgraduate Research Scholarship and an eResearch grant from La Trobe University. We would also like to thank the anonymous reviewers for their useful comments, which helped us improve this final paper.

References

[AF93] Avis, D., Fukuda, K.: Reverse search for enumeration. Discrete Applied Mathematics 65, 21–46 (1993)

[AS94] Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB 1994), pp. 487–499 (1994)

[BCH03] Brinkmann, G., Caporossi, G., Hansen, P.: A survey and new results on computer enumeration of polyhex and fusene hydrocarbons. Journal of Chemical Information and Computer Sciences 43(3), 842–851 (2003)

[BF01] Berenbrink, P., Friedetzky, T.: The natural work-stealing algorithm is stable. In: FOCS 2001: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, Washington, DC, USA, p. 178. IEEE Computer Society, Los Alamitos (2001)

[CC05] Chen, Y., Crippen, G.M.: A novel approach to structural alignment using realistic structural and environmental information. Protein Science 14(12), 2935–2946 (2005)

[DLS+09] Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: SC 2009: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–11. ACM, New York (2009)

[Far78] Faradzev, I.A.: Constructive enumeration of combinatorial objects. Problèmes Combinatoires et Théorie des Graphes, Colloque Internat. CNRS 260, 131–135 (1978)

[GLMB96] Grüner, T., Laue, R., Meringer, M., Bayreuth, U.: Algorithms for group actions: Homomorphism principle and orderly generation applied to graphs. In: DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 113–122. American Mathematical Society, Providence (1996)

[KGR94] Kumar, V., Grama, A.Y., Rao, V.N.: Scalable load balancing techniques for parallel computers. Journal of Parallel and Distributed Computing, 60–79 (1994)

[KO06] Kaski, P., Östergård, P.R.J.: Classification algorithms for codes and designs. Algorithms and Computation in Mathematics 15 (2006)

[KTR09] Khosrovshahi, G.B., Tayfeh-Rezaie, B.: Classification of simple 2-(11,3,3) designs. Discrete Mathematics 309(3), 515–520 (2009); International Workshop on Design Theory, Graph Theory, and Computational Methods, IPM Combinatorics II

[LM08] Lazic, L., Mastorakis, N.: Orthogonal array application for optimal combination of software defect detection techniques choices. W. Trans. on Comp. 7(8), 1319–1336 (2008)

[LYPP03] Lee, K.-H., Yi, J.-W., Park, J.-S., Park, G.-J.: An optimization algorithm using orthogonal arrays in discrete design space for structures. Finite Elements in Analysis and Design 40(1), 121–135 (2003)

[McK98] McKay, B.D.: Isomorph-free exhaustive generation. J. Algorithms 26(2), 306–324 (1998)

[Mes97] Message-Passing Interface Forum. MPI-2.0: Extensions to the Message-Passing Interface, ch. 9. MPI Forum (June 1997)

[MS08] Moura, L., Stojmenovic, I.: Backtracking and isomorph-free generation of polyhexes. In: Nayak, A., Stojmenovic, I. (eds.) Handbook of Applied Algorithms: Solving Scientific, Engineering, and Practical Problems, pp. 39–83. John Wiley & Sons, Chichester (2008)

[Ngu05] Nguyen, M.: Computer-algebraic methods for the construction of designs of experiments. Ph.D. Thesis, Technische Universiteit Eindhoven (2005)

[Ngu08] Nguyen, M.V.M.: Some new constructions of strength 3 mixed orthogonal arrays. Journal of Statistical Planning and Inference 138(1), 220–233 (2008)

[PSK+07] Park, B.-H., Samatova, N.F., Karpinets, T., Jallouk, A., Molony, S., Horton, S., Arcangeli, S.: Data-driven, data-intensive computing for modelling and analysis of biological networks: application to bioethanol production. Journal of Physics: Conference Series 78, 012061 (6p.) (2007)

[Rea79] Read, R.C.: Every one a winner. Ann. Discrete Math., 107–120 (1979)

[SEN10] Schoen, E.D., Eendebak, P.T., Nguyen, M.V.M.: Complete enumeration of pure-level and mixed-level orthogonal arrays. Journal of Combinatorial Designs 18(2), 123–140 (2010)

[Wal60] Walker, R.J.: An enumerative technique for a class of combinatorial problems. In: Proc. Sympos. Appl. Math., vol. 10. American Mathematical Society, Providence (1960)