
The Journal of Supercomputing, 37, 249–269, 2006. © 2006 Springer Science + Business Media, LLC. Manufactured in The Netherlands.

A New Genetic Algorithm for Loop Tiling

SAEED PARSA [email protected]

SHAHRIAR LOTFI shahriar lotfi@iust.ac.ir

Department of Computer Engineering, Iran University of Science and Technology, Narmak, 16844, Tehran, Iran

Abstract. Tiling is a well-known problem, especially in computational geometry and its related engineering branches. A tile is a set of points in Cartesian space. The goal is to partition the space of points into tiles with optimal dimensions and shapes such that a number of predefined semantic relations hold amongst the tiles. So far, this problem has been solved only in special cases with two or three dimensions. Determining the optimal tile is an NP-hard problem. By presenting a novel constraint genetic algorithm in this paper, we have been able to solve the tiling problem in Cartesian spaces with more than two dimensions for the loop parallelization problem.

Keywords: constraint genetic algorithm, optimal tiling, serial and parallel loops, parallel computers

1. Introduction

The problem of optimally partitioning a set of related points in Cartesian space in two or more dimensions is a known problem in the field of computational science. In this type of partitioning, the underlying Cartesian space is subdivided into geometrically identical shapes called tiles, preferably parallelepipeds. The problem is to find optimal sizes for the tiles such that each tile covers a predefined number of interrelated points, subject to inter-tile communication restrictions such as: (1) the chain of vectors connecting any two points in a tile resides in that tile, (2) the number of vectors connecting the tile to its neighboring tiles is minimal, and (3) any other constraints imposed by the user.

The tiling technique is of great importance in image compression, image processing, geographical image processing and other applications in those branches of computational science that deal with optimal partitioning of Cartesian spaces [1–3]. Tiling can also significantly improve data locality for many scientific programs [4], minimize memory access costs [4, 5] and improve the performance of numerical algorithms on hierarchical memories [6]. In some cases, tiling may be able to exploit invariant reuse carried by outer loops and continue to benefit from the spatial reuse carried by inner loops [7]. There is significant scope for temporal reuse of data in the loops that arise in this context [5]. Since the arrays involved are often too large to fit into the cache, loop tiling has significant potential for reducing cache misses. Loop tiling is used to enhance temporal cache locality and thus reduce memory access costs in compute-intensive loops such as matrix-matrix product routines [5].
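As an illustration of the cache-oriented use of tiling mentioned above, the following minimal sketch restructures a matrix-matrix product into rectangular tiles. The function names and the tile size are assumptions made for this example and do not come from the paper; in Python the block only illustrates the loop structure, not the actual cache benefit.

```python
def matmul_naive(a, b):
    """Reference product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile (tile size is an assumed parameter)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    # The three outer loops enumerate tiles; the three inner loops stay
    # inside one tile, so the same small blocks of a, b and c are reused.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

Both functions return the same matrix; only the order in which the iteration points are visited differs.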

In this paper, the aim of tiling is to granulate the iterations of nested loops such that each tile can be transferred to a separate processor for parallel execution with the other tiles. In the loop tiling problem, points indicate iteration numbers of nested loops, the relations indicate data dependencies between the iterations, and each dimension of the Cartesian space represents a loop within a nested loop. Two iterations of a nested loop are data dependent if one of them uses a value computed by the other.

The goal is to determine the optimal size and shape for tiles. To this end, some of the existing methods [8, 9] compute the number of points residing in tiles with predefined shapes and sizes, together with the degree of dependency between neighboring tiles. In some other methods [8–19], a restricted form of the tiling problem is solved in two dimensions. Tiling has also been treated as an optimization problem and solved using an ordinary evolutionary approach based on a genetic algorithm [11]. In that approach, a genetic algorithm is applied to estimate the optimal size of tiles with a special shape (usually a rectangle or a cube) with respect to a number of scheduling parameters, in two or three dimensions. In general, current approaches to the tiling problem restrict tiles to rectangles.

Determining an optimal size and shape for tiles is an optimization problem. Problems of this type can be solved using evolutionary algorithms [20–22]. In this paper a new approach to the tiling problem, based on a new genetic algorithm, is presented. Using our new constraint genetic algorithm we have been able to solve the tiling problem, where the shape of the tiles is a parallelepiped in Cartesian spaces with any number of dimensions, in a reasonable time.

The remaining sections of this paper are organized as follows. In Section 2, the problem is defined and the tiling problem is presented as a minimization problem. Next, in Section 3, a new approach to solving the tiling problem using our new constraint genetic algorithm is presented. The approach is evaluated in Section 4. Finally, in Section 5, the conclusions are described.

2. Tiling objectives

This section is concerned with the justification of the use of genetic approaches to solve the problem of tiling. The justification is carried out by elaborating the degree of complexity of tiling in Cartesian spaces of any dimensionality.

A tile in this paper is a set of loop iterations to be executed on a single processor. The aim is to partition loop iterations into tiles with minimal inter-dependency, where each of the iterations is represented as a point in Cartesian space. Each dimension of the Cartesian space represents a loop index. For instance, in Figure 1(a) there is a nested loop with two loop indices, j1 and j2. The corresponding iteration space is a two dimensional Cartesian space shown in Figure 1(b) [8]. In Figure 1(c) there is a nested loop with three loop indices, j1, j2 and j3. The corresponding iteration space is a three dimensional Cartesian space, shown in Figure 1(d). Here, it is assumed that the dependent iterations are already detected [23–26] and the iteration space is uniform [27]. We have already developed a constrained genetic algorithm to find dependent iterations of nested loops [28].

Figures 1(b) and 1(d) show the iteration spaces for the loops presented in Figures 1(a) and 1(c), respectively. Each point within the iteration spaces represents an iteration of the loops. In Figure 1, data dependencies between the loop iterations are shown by arrows. Since the iteration space of the loops is uniform, all the arrows are in the same direction. Each tile is shown as a square containing four loop iterations.


Figure 1. Nested loops and their loop iteration spaces.

In fact, the aim is to partition loop iterations into tiles such that each tile can be executed on a single processor with minimum execution time. Hence, an optimal size and shape for the tiles should be determined. Determining an optimal tile is difficult because, if we reduce the size of the tiles, the running time of a tile is reduced while at the same time the number of tiles, and so their communication overhead, is increased. In addition, a small tile size increases the unused part of each processor's local memory. Also, the tile size cannot exceed the size of the local memory of the parallel processors.

The main issue is to partition the Cartesian iteration space into tiles with optimal size and shape such that inter-tile dependency is minimal. The amount of dependency is computed as the number of dependencies between the iterations residing in the tile and the iterations in its neighboring tiles. The inter-tile dependencies should be minimized in order to reduce inter-process communication, because each tile of loop iterations will be executed on a different processor. This issue is referred to as the communication constraint. Hence, the shape and the size of the tile dimensions should be chosen in such a way that the communication constraint is satisfied. In addition, the tile size is limited by the size of the local memory of the parallel processors, because each tile should best fit the local memory of the processors. Also, the shape and size of a tile should be determined in such a way that the sequence of vectors connecting any two dependent loop iterations or points in the tile resides in that tile. This issue is referred to as the atomic constraint. Another constraint which affects the shape and size of a tile is to minimize the I/O time required to transfer the tile to the local memory of the parallel processors. In summary, there are four constraints affecting a tile's shape and size, as follows:

(1) Communication constraint: to minimize inter-tile dependencies,
(2) Computation constraint: to limit the tile size to the size of the local memories,
(3) Atomic constraint: to guarantee serial execution of the iterations residing in a tile,
(4) I/O constraint: to minimize the I/O cost of transferring tiles.

These constraints will be further described in the following section.

3. Tiling genetic algorithm

Tiling is an NP-hard problem [8–19, 29]. The problem can be considered as an optimization problem and can be best solved using evolutionary approaches [21, 22, 30]. In this section the details of our new genetic algorithm for solving the tiling problem in Cartesian spaces of any dimensionality are presented. The main body of our tiling genetic algorithm is shown in Figure 2.

In the first stage of this algorithm, an initial population of problem solutions is created. Each solution is called a chromosome. Considering the objective of the evolutionary process and the constraints imposed by the problem, the fitness of each chromosome in the population is evaluated. The fittest chromosomes are selected and transferred into an intermediate population. Then, the individuals in the intermediate population are recombined to produce a new population. This process is repeated for a predefined number of generations.

Figure 2. The main body of the tiling genetic algorithm.


During the recombination process, a mutation operator can be used to maintain genetic diversity. The fittest chromosome in a generation is moved to the next generation. The fittest chromosome over all generations is selected as the final solution. The genetic elements of the tiling algorithm shown in Figure 2 are further described in the following subsections.
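The stages described above can be sketched as a generic loop. This is only an outline under stated assumptions, not the algorithm of Figure 2: the function and parameter names are hypothetical, the operators are passed in as callables, and a simple truncation selection stands in for the selection method described in Section 3.4. Lower fitness is better, as in the rest of the paper.

```python
import random


def truncation_select(population, fitness, rng):
    """Stand-in selection (an assumption): keep the better half."""
    ranked = sorted(population, key=fitness)
    return ranked[: max(2, len(ranked) // 2)]


def genetic_algorithm(init, fitness, select, crossover, mutate,
                      pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Stage 1: create the initial population of chromosomes.
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Stage 2: the fittest chromosomes form the intermediate population.
        parents = select(population, fitness, rng)
        # Stage 3: recombine (and occasionally mutate) to build a new population.
        children = []
        while len(children) < pop_size - 1:
            first, second = rng.sample(parents, 2)
            child = crossover(first, second, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        # Elitism: the fittest chromosome moves to the next generation unchanged.
        children.append(min(population, key=fitness))
        population = children
    # With elitism the last population holds the best chromosome seen so far.
    return min(population, key=fitness)
```

Because the elite is copied unchanged, the best fitness in the population never gets worse from one generation to the next.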

3.1. Problem space

The problem space is considered as an n dimensional Cartesian space, where each dimension corresponds to a nested loop variable or index. Using a normalization procedure, the lower bounds of the nested loops are set to zero [31]. Thereby, the upper bounds of the loops specify the problem space. Each point in the problem space represents a specific iteration of the corresponding nested loops. For example, each iteration of a three level nested loop is represented as a point (i, j, k) within a three dimensional space. Each tile covers a certain number of points in this space. For instance, in Figure 1(b) the tiles are specified as rectangles (in general, parallelograms). In a three dimensional space the tiles are parallelepipeds. In this figure the data dependency between any two iterations is illustrated using a vector connecting the corresponding points. For instance, in Figure 1(b) the iteration points (1, 1) and (1, 0) are dependent on the iteration point (0, 0).
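To make the notion of a normalized iteration space concrete, the sketch below enumerates the points of a small space and the dependency arrows between them. The dependence vectors (1, 0) and (1, 1) are inferred from the example in the text, where points (1, 0) and (1, 1) depend on (0, 0); the function names and the use of loop extents (iteration counts) are assumptions of this example.

```python
from itertools import product


def iteration_space(extents):
    """All iteration points of a normalized nest: index i runs over 0..extents[i]-1."""
    return list(product(*(range(e) for e in extents)))


def dependence_edges(extents, dep_vectors):
    """Pairs (source, target) where target = source + d for a uniform vector d."""
    points = set(iteration_space(extents))
    edges = []
    for p in sorted(points):
        for d in dep_vectors:
            q = tuple(pi + di for pi, di in zip(p, d))
            if q in points:  # arrows leaving the iteration space are dropped
                edges.append((p, q))
    return edges


# Usage: a 2 x 2 iteration space with the two assumed dependence vectors.
edges = dependence_edges((2, 2), [(1, 0), (1, 1)])
```

Since the space is uniform, the same vectors apply at every point, which is why a single tile is representative of all tiles.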

3.2. Coding method

Tiles should be represented in a form that can be processed by the genetic algorithm. For this purpose, a tile is coded as a matrix where each column of the matrix defines the coordinates of an edge beginning at the leftmost corner of the parallelepiped shaped tile. For instance, the matrix P in Figure 3 represents a square shaped tile. Since the data dependency vectors between the iterations are uniform [27, 31–33], all tiles are identical and it is not necessary to study each of them. Hence, only the first tile, beginning at the origin of the Cartesian coordinates, is considered as a chromosome. The chromosome is represented by an $n \times n$ matrix $P = [p_i]$, where $p_i$ represents the $i$-th edge of the corresponding tile. A tile can also be represented by the matrix $H = P^{-1}$. We use Gaussian elimination to compute the inverse of the matrix [34].
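The two representations can be illustrated with a small Gauss-Jordan elimination. This is a sketch, not the authors' implementation: the function name is hypothetical and exact rational arithmetic from the standard library is used here only to keep the example free of rounding error.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination on [P | I]."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("tile matrix is singular (degenerate tile)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Usage: columns of P are the tile edges; a 2 x 2 square tile as an example.
P = [[2, 0],
     [0, 2]]
H = inverse(P)  # rows: [1/2, 0] and [0, 1/2]
```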

Figure 3. Two different matrix representation of a tile.


3.3. Initial population

The initial population of the chromosomes is created using a random number generator based on a normal distribution. To generate a chromosome, the coordinate of the end point of each edge, beginning at the origin, is defined randomly and saved in the corresponding column of the chromosome matrix. In order to avoid the creation of attenuated tiles, the normal distribution is used. The mean of the normal distribution is defined as a coefficient of the maximum length of the edges.
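A minimal sketch of such an initialization follows. The coefficient values, the clamping of coordinates to the range from zero to the maximum edge length, and all names are assumptions made for illustration; the paper does not state them.

```python
import random


def random_tile(n, max_edge, rng, mean_coeff=0.5, sigma_coeff=0.15):
    """Build an n x n tile matrix whose columns are randomly drawn edges."""
    mu = mean_coeff * max_edge      # mean as a coefficient of the maximum edge length
    sigma = sigma_coeff * max_edge  # spread (assumed value)
    columns = []
    for _ in range(n):
        # End point of one edge starting at the origin, clamped to [0, max_edge].
        edge = [min(max(rng.gauss(mu, sigma), 0.0), max_edge) for _ in range(n)]
        columns.append(edge)
    # Store each edge as a column of the chromosome matrix.
    return [[columns[c][r] for c in range(n)] for r in range(n)]


def initial_population(size, n, max_edge, seed=None):
    rng = random.Random(seed)
    return [random_tile(n, max_edge, rng) for _ in range(size)]
```

A randomly drawn matrix may occasionally be singular or infeasible; in a full algorithm such chromosomes would be handled by the penalty term of the fitness function.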

3.4. Selection, crossover and mutation operators

In our genetic tiling algorithm, the improved roulette wheel selection method SUS (Stochastic Universal Sampling) is used to select two parent chromosomes for the crossover operation [22]. The crossover operator combines the corresponding columns of the chromosome matrices using a uniform recombination operator as follows [20, 21]:

$$p_i^{\text{child}} = \lambda \, p_i^{\text{parent 1}} + (1 - \lambda) \, p_i^{\text{parent 2}}$$

where $p_i$ denotes column number $i$ of a chromosome. In the above relation, $0 \le \lambda < 1$ is selected randomly. The columns taking part in the crossover operation are also selected randomly. For instance, in Figure 4, columns number 2 and 3 are selected for the crossover operation. The other columns remain intact.
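The selection and crossover steps of this subsection can be sketched as follows. Several details are assumptions of this example rather than statements from the paper: the selection weights are taken as given (for a minimization problem they would have to be derived from the fitness values in some way), the untouched columns are copied from the first parent, and all function names are hypothetical.

```python
import random


def sus_select(population, weights, count, rng):
    """Stochastic Universal Sampling: `count` equally spaced pointers, one spin."""
    step = sum(weights) / count
    start = rng.random() * step
    chosen, index, cumulative = [], 0, weights[0]
    for k in range(count):
        pointer = start + k * step
        # Advance to the individual whose weight interval contains the pointer.
        while cumulative < pointer and index < len(weights) - 1:
            index += 1
            cumulative += weights[index]
        chosen.append(population[index])
    return chosen


def crossover(parent_a, parent_b, rng):
    """Blend randomly chosen columns: child_i = lam * a_i + (1 - lam) * b_i."""
    n = len(parent_a)
    lam = rng.random()  # 0 <= lam < 1
    columns = set(rng.sample(range(n), rng.randint(1, n)))
    child = [row[:] for row in parent_a]  # other columns stay as in parent_a
    for j in columns:
        for i in range(n):
            child[i][j] = lam * parent_a[i][j] + (1 - lam) * parent_b[i][j]
    return child, lam, columns
```

Returning `lam` and the chosen columns alongside the child is only a convenience for inspecting the result.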

The mutation operator randomly alters the shape of a selected tile. This is achieved by randomly selecting two rows in the matrix representation of the tile and then swapping the selected rows. With this approach the absolute value of the determinant of the mutated tile, which represents the tile size, is the same as before the mutation. Figure 5 is an example of the mutation operation.
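The size-preserving property of the row swap can be checked with a short sketch. Swapping two rows only flips the sign of the determinant, so its absolute value is unchanged. The function names and the sample tile are assumptions for this example.

```python
import random
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # each row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return result


def mutate(tile, rng):
    """Swap two randomly selected rows of the tile matrix."""
    i, j = rng.sample(range(len(tile)), 2)
    mutated = [row[:] for row in tile]
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


# Usage: the tile volume |det(P)| is unchanged by the mutation.
tile = [[2, 1, 0],
        [0, 3, 1],
        [1, 0, 2]]
mutated = mutate(tile, random.Random(3))
```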

3.5. Fitness function

Figure 4. The crossover operator.

Figure 5. The mutation operator.

The main objective is to find an optimal shape and size for the tiles to be executed on separate processors. To evaluate the fitness of a tile $x$, the objective of tiling and a certain number of constraints are considered [22]:

$$\mathrm{Fitness}(x) = \mathrm{Objective}(x) + \mathrm{Penalty}(x)$$

The objective of the genetic tiling algorithm is to find a tile with minimum fitness. The fitness of a tile is computed as the sum of two non-negative numbers: the objective value and the penalty value computed for the tile. In our proposed algorithm the penalty value computed for feasible tiles is zero.

3.5.1. Objective function. There are five sub-objectives, described in this section, concerning the shape and size of a tile, which are covered by the objective function as follows:

$$T_{cost} = p_{nn} C_{I/O} + V_{comp} \, t_{I/O} + \lceil V_{comm} / K \rceil \, C_{comm} + V_{comm} \, t_{comm} + (V_{comp} - M)^2$$

In the above relation, $C_{I/O}$ is the startup cost for an I/O read (write), $t_{I/O}$ is the cost of reading (writing) an element from (into) a file, $C_{comm}$ is the startup cost for communication, $t_{comm}$ is the cost of communicating an element, $V_{comm}$ is the communication cost and $K$ is the maximum message length of the machine.

The overall objective is to minimize the objective function $T_{cost}$. Hence, the five sub-objectives of $T_{cost}$ have to be minimized. The sub-objectives are further described below.

(a) The I/O cost of a tile is determined by the number of file accesses (I/O calls), $p_{nn} C_{I/O}$, required to read it from disk. Under a row-major layout assumption, the number of (sub-)row reads should be minimized in order to minimize the number of file accesses. This may not always be possible in practice, as minimizing I/O calls can lead to illegal tiles, or some communication requirements may impose lower bounds on some edges of the tile. In Figure 6, tile (1) and tile (2) have the same computation volume (8 data points); however, the I/O cost (number of sub-rows) of tile (2) is 4 while that of tile (1) is 2. Everything else being equal, tile (1) is a better choice than tile (2). Note that this constraint is unique to iteration space tiling [16].

(b) A second sub-objective is to minimize the time required to load a tile into the main memory. The loading time is determined by $V_{comp} \, t_{I/O}$, where $t_{I/O}$ is the cost of reading or writing an element from (into) a file and $V_{comp}$ is the number of iterations or index points contained in a tile. In fact, $V_{comp}$ expresses the respective computation cost of the tile, and is calculated by the expression [8, 9, 16]:

$$V_{comp}(H) = \frac{1}{|\det(H)|}$$

Figure 6. Different tile shapes.

(c) The third sub-objective is to minimize the total startup cost for communication between a tile and its neighbors. This cost is determined by multiplying the number of messages to be passed between the tiles, $\lceil V_{comm} / K \rceil$, by the startup cost for communication, $C_{comm}$. Here, $K$ is the maximum message length of the machine and $V_{comm}$ is the communication cost of a tile. $V_{comm}$ is proportional to the number of iteration points that need to send data to the neighboring tiles, or the sum of the dependency vectors cutting the tile's boundaries. The communication cost can be calculated by the following expression [8, 9]:

$$V_{comm}(H) = \frac{1}{|\det(H)|} \sum_{i=1}^{n} \sum_{k=1}^{n} \sum_{j=1}^{m} h_{i,k} \, d_{k,j}$$

In the above relation, $H$ is the inverse of the tile matrix $P$, $m$ is the number of dependency vectors, $n$ is the nesting level of the nested loops and $d_j$ represents the $j$-th dependency vector of the $n \times m$ matrix $D = [d_j]$. The matrix $D$ includes all the dependency vectors. In practice this formula computes and sums all possible products $h_i d_j$, which express the contribution of every dependency vector to the communication across every tile boundary surface.

(d) The fourth sub-objective is to reduce the total communication time between a tile and its neighbors. The communication time is determined by multiplying the communication cost, $V_{comm}$, by the cost of communicating an element, $t_{comm}$. The execution time of the iterations residing in a tile should be reduced to the minimum possible, because the iterations of a tile are expected to execute serially on the same processor. On the other hand, the smaller the size of a tile, the higher the communication cost between the tiles will be [16].


Figure 7. Penalty function for constraints.

(e) The fifth sub-objective is to fit the tiles in the local memory of the parallel processor on which the iterations covered by the tile are supposed to be executed. Hence, the number of iteration points residing in a tile, $V_{comp}$, should not exceed the size of the local memory of the parallel processors, $M$. In other words, the memory constraint can be defined as $M \ge V_{comp}$ [16].
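The five terms of the objective function can be evaluated for a concrete tile as in the sketch below. It follows the formulas for $T_{cost}$, $V_{comp}$ and $V_{comm}$ given above, but several points are assumptions of this example: $p_{nn}$ is read as the last diagonal entry of $P$, the cost parameters are arbitrary illustrative values, and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def inverse_and_det(p):
    """Gauss-Jordan elimination returning (P^-1, det(P)) as exact rationals."""
    n = len(p)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(p)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("tile matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def v_comp(p):
    """V_comp = 1 / |det(H)|, which equals |det(P)| because H = P^-1."""
    return abs(inverse_and_det(p)[1])


def v_comm(p, d):
    """V_comm = (1 / |det(H)|) * sum over i, k, j of h[i][k] * d[k][j]."""
    h, det_p = inverse_and_det(p)
    n, m = len(p), len(d[0])
    total = sum(h[i][k] * d[k][j]
                for i in range(n) for k in range(n) for j in range(m))
    return abs(det_p) * total


def t_cost(p, d, c_io, t_io, c_comm, t_comm, k_max, mem):
    vc, vm = v_comp(p), v_comm(p, d)
    p_nn = p[-1][-1]  # assumed reading of p_nn: last diagonal entry of P
    return (p_nn * c_io + vc * t_io + ceil(vm / k_max) * c_comm
            + vm * t_comm + (vc - mem) ** 2)


# Usage: a 2 x 2 square tile; the columns of D are the vectors (1, 0) and (1, 1).
P = [[2, 0], [0, 2]]
D = [[1, 1], [0, 1]]
cost = t_cost(P, D, c_io=1, t_io=1, c_comm=1, t_comm=1, k_max=4, mem=4)
```

For this tile the sketch gives a computation volume of 4 and a communication volume of 6.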

The penalty function is described in what follows. We have solved the constrained genetic problem as an ordinary one by using a penalty function. For this purpose, the fitness function is used in place of the goal function.

For an infeasible solution the quality is reduced by a coefficient of Left(x) or Right(x). Considering a normal distribution for the penalty values of infeasible solutions in a generation, the two functions Left and Right can be defined by the use of the normal distribution function N(x, μ, σ), as shown in Figure 7.

The more choices are available, the more restrictive one can be. Similarly, when the ratio of feasible chromosomes in a generation is high, larger penalties can be applied to less feasible chromosomes. This is achieved by increasing the skew or slope of the two functions Left and Right. The skew of Left and Right is automatically increased as the number of feasible solutions increases.

3.5.2. Penalty function. In finding optimal sizes for tiles we face three constraints: continuous execution, the number of points in each tile and the dimensions of the space. In this section, after describing these constraints, we present the fitness function as the final goal of the genetic algorithm.


Figure 8. Boundary constraint.

(a) The first constraint is concerned with the legality of tiles: there must be no mutual dependency between two tiles. To achieve this, the length and direction of the sides of a tile should be defined in such a way that the set of vectors connecting any two dependent loop iterations or points in the tile resides inside the tile. This issue is referred to as the atomic constraint.

Arbitrary clustering of data space points into tiles might result in effective dependency vector cycles between the tiles. Tile shapes that produce effective dependency vector cycles are called illegal tiles. The reason for illegality is that processing of tiles must be atomic, in the sense that a tile must take all the data it requires from outside before computation begins, and all the data required by other tiles should be available after computation terminates. Allowing effective dependency vector cycles among tiles violates this requirement. As an example, for the data space graph shown in Figure 6, tiles (1) and (2) are legal, while tile (3) is illegal [8, 9, 16]. Given a nested loop with dependence matrix $D$, for a tile $P$ to be legal, $HD \ge 0$ must hold, where $H = P^{-1}$. This ensures that tiles are atomic and that the initial execution order is preserved. Otherwise, any execution order of the tiles would result in a deadlock.

(b) The diameter of a tile should not exceed the boundaries of the iteration space. For instance, in Figure 8 there are two parallelogram shaped tiles. The tile labeled 2 exceeds the boundary of the rectangular iteration space, which starts at the origin of the Cartesian space. The dimension of the space is equal to the number of nested loops. An axis is assigned to each loop variable. The size of the axis is equal to the number of loop iterations.
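The two constraints above can be checked mechanically, as in the following sketch. The legality test implements the condition $HD \ge 0$ stated in (a). The boundary test is only one possible reading of (b), assumed here for illustration: every corner of the tile that starts at the origin must lie inside the rectangular iteration space. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def inverse(p):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(p)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(p)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("tile matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_legal(p, d):
    """Atomic constraint: every entry of H * D must be non-negative."""
    h = inverse(p)
    n, m = len(p), len(d[0])
    return all(sum(h[i][k] * d[k][j] for k in range(n)) >= 0
               for i in range(n) for j in range(m))


def fits_iteration_space(p, upper_bounds):
    """Boundary check (assumed reading): all tile corners stay inside the space."""
    n = len(p)
    for mask in product((0, 1), repeat=n):
        # A corner is the sum of a subset of the edge vectors (columns of P).
        corner = [sum(mask[c] * p[r][c] for c in range(n)) for r in range(n)]
        if any(x < 0 or x > bound for x, bound in zip(corner, upper_bounds)):
            return False
    return True


# Usage: the columns of D are the dependence vectors (1, 0) and (1, 1).
D = [[1, 1], [0, 1]]
square = [[2, 0], [0, 2]]   # edges (2, 0) and (0, 2)
skewed = [[2, 0], [2, 2]]   # edges (2, 2) and (0, 2)
```

Under these assumptions the square tile passes both checks, while the skewed tile violates $HD \ge 0$.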

4. Evaluation

Tiling is an NP-hard problem. As the number of tile dimensions increases, the time required to find an optimal tile increases exponentially. However, the time complexity of the proposed genetic tiling algorithm is $O(n^3)$ and the space complexity is $O(n^2)$, because: (1) the main data structures used in the proposed algorithm are $n \times n$ matrices, each representing an $n$ dimensional tile, and (2) the costly operations of the algorithm are computing the determinant and the inverse of the tile matrix, which are used to find the volume of computation and the volume of communication, respectively.


The size of a tile is determined by the size of the available local memory of each parallel processor. Moreover, the tiles are all identical. Therefore, the tiling algorithm is only concerned with finding a single optimal tile, and the size of the iteration space does not affect the execution time of the algorithm. For this reason, loops with a relatively small number of iterations are used in this section to evaluate the tiling algorithm.

For further evaluation, in the remaining parts of this section the result of our proposed algorithm is first compared with a 2-D tile generated for a two level nested loop presented in reference [9]. It is shown, in Section 4.1, that the proposed algorithm generates exactly the same optimal tile for the nested loop. To demonstrate the reliability of the proposed algorithm, it is shown that the number of feasible tiles gets closer to the population size as the number of generations increases. A unique capability of our proposed genetic algorithm is to create tiles with more than two dimensions. In Section 4.2, 3 to 6 dimensional tiles are created and it is shown that the resulting tiles are both reliable and stable.

The proposed genetic algorithm is implemented within a tiling environment in Delphi, on an Intel Pentium III (852 MHz, 128 MB of RAM) machine. The main interface of the environment is illustrated in Figure 9.

As shown in Figure 9, the main interface of our tiling environment is composed of three parts: (1) Tiling parameters (described in Section 3), (2) Genetic algorithm parameters, (3) Program output including the genetic populations and the final tile. In Figure 9, the final tile is a 4 dimensional parallelepiped, represented with 4 vectors p1 to p4. The vectors begin at the origin. Hence, only the coordinates of the end points of the vectors are shown.

Figure 9. The main interface of the program.

Figure 10. Two dimensional tiles.

4.1. Two dimensional tiles

In reference [9], it is argued that for a two level nested loop with a dependency vector D, shown in Figure 10, the optimal tiling solution is a parallelogram shaped tile as shown in part (2) of Figure 10. Here, the number of iteration points is 20 and the maximum number of iteration points in each dimension of the two dimensional iteration space is 8. The idea is to find a tile with the lowest volume of communication and a volume of computation which is the closest to 20. In Figure 10(2), the optimal tile is defined by a matrix P2.

Using the proposed genetic tiling algorithm, the same tile is produced for the 2-D tiling problem in about 70% of the runs. The volume of computation, volume of communication and fitness of the 30 tiles resulting from 30 runs of the algorithm on the 2-D loop tiling problem are shown in Figure 11(a). As shown in Figure 11(b), the matrix representation of tiles number 1, 2, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 18, 20, 22, 24, 25, 27, 28, 29 and 30 is exactly the same as the matrix P2 for the optimal tile in Figure 10. This experiment demonstrates the stability of the proposed genetic tiling algorithm. Also, in Figure 11(c) Student's t-test is applied to the volume of computation, the volume of communication and the fitness values resulting from the 30 runs of our genetic tiling algorithm. Student's t-test is one of the most commonly used techniques for testing a hypothesis on the basis of the difference between sample means. The t-test determines a probability that two populations are the same with respect to the variable tested. As shown in Figure 11(c), the resulting values are divided into two groups of equal size, A and B. The two groups are further combined. The probability values P, computed by comparing the combined values with the t-table distribution, indicate the stability of our proposed tiling algorithm.

Figure 12 demonstrates the reliability of the algorithm. In Figure 12(a), the convergence of the proposed genetic tiling algorithm is demonstrated by depicting the quality of the best feasible tile in each generation of the evolutionary process of finding the optimal tile. In Figure 12(b), the fluctuation of the number of feasible solutions is depicted for two different runs of the algorithm. The number of feasible solutions for the above mentioned problem approaches the genetic population size as the number of generations increases.

Figure 11. The stability of algorithm for 2-D loop tiling in different runs.

Figure 12. The reliability of algorithm for 2-D loop tiling in two runs.

Figure 13. Execution Time of the proposed algorithm for 2 to 6 dimensional tiles.

Figure 14. Stability of the proposed algorithms.

4.2. Tiles of 3 to 6 dimensions

As described at the beginning of Section 4, the time complexity of our proposed genetic tiling algorithm is O(n^3) and the space complexity is O(n^2). The experimental results for the generation of tiles of 2 to 6 dimensions, in Figure 13, show almost the same time complexity: as the number of dimensions increases, the time to find the first feasible solution and the time required to find the optimal tile increase polynomially. The execution times shown in Figure 13 are each averaged over about 30 runs of the algorithm.
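One way to check a polynomial growth claim of this kind against measured execution times is to fit a straight line to the times on a log-log scale, since t = c * n^k becomes log t = log c + k log n. The Python sketch below does this with an ordinary least-squares slope. The helper loglog_slope is hypothetical, and the times are generated synthetically from 0.5 * n^3 rather than taken from Figure 13.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size).

    For times that grow like c * n**k the slope is close to k.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


dimensions = [2, 3, 4, 5, 6]
# Synthetic execution times (NOT measured values): exact cubic growth.
synthetic_times = [0.5 * n ** 3 for n in dimensions]

order = loglog_slope(dimensions, synthetic_times)
print(f"estimated polynomial order: {order:.2f}")
```

With only five sizes between 2 and 6, such a fit gives a rough indication of the order at best, so it should be read alongside the analytical complexity rather than in place of it.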

The volume of computation, volume of communication and fitness of the 30 tiles obtained from 30 runs of the algorithm on the 3-, 4- and 5-dimensional loop tiling problems are shown in Figure 14. The results of Student’s t-test are also listed in Figure 14. This experiment demonstrates the stability of the proposed genetic tiling algorithm.

Figure 15. The number of feasible tiles for loop tiling in different dimensions.

Figure 15 demonstrates the reliability of the algorithm for the generation of 3- to 6-dimensional tiles. In Figure 15(a), the convergence of the proposed genetic tiling algorithm is demonstrated by depicting the quality of the best feasible tile in each generation of the evolutionary process of finding the optimal tile. In Figure 15(b), to demonstrate the reliability of the proposed algorithm, it is shown that the number of feasible tiles gets closer to the population size as the number of generations increases. As shown in Figure 15(b), the number of 3-D feasible solutions is very low for generations 0 to 25. As the number of dimensions of the tiles increases, it takes longer to find the first optimal tile. For instance, as shown in Figure 15(b), it takes longer for the algorithm to find the first feasible 6-D tile than to find the first 2- to 5-dimensional tiles.

5. Conclusions and future work

Tiling is an NP-hard problem. Nevertheless, previous works solve the problem using deterministic approaches. Even the award-winning work described in reference [16] solves the tiling problem in only 2 dimensions, using a deterministic approach called data space tiling. Using the constraints and objective functions defined in the data space tiling approach, it is shown that the proposed genetic tiling algorithm not only generates exactly the same two-dimensional tiles as the ones generated by the data space technique but also generates tiles of any parallelepiped shape and any number of dimensions greater than two.

Also, the proposed tiling approach enhances the cache performance because it reduces the number of cache misses by applying the I/O constraint.

The order of the proposed genetic algorithm is n^3, where n is the number of tile dimensions. The experimental results show a similar growth in execution time for generating tiles of 2 to 6 dimensions. The proposed algorithm is stable, and hence the tiles generated by repeated runs of the algorithm are largely the same (in about 70% of the runs for the 2-D problem). We are currently working on schemes to apply tiles to the automatic generation of parallel loops and to the scheduling of the iterations of the parallel loops.

References

1. H. I. Lislevand and H. O. Njaa. Connection between the need for compression, bandwidth and quality for multimedia transmissions in UMTS. Master Thesis, Agder University College, Faculty of Engineering and Science, 2004.

2. R. A. Dolin. A scalable distributed architecture for locating heterogeneous information sources. Ph.D. Thesis, University of California, Santa Barbara, 1998.

3. A. Almansa, S. Durand and B. Rouge. Measuring and improving image resolution by adaptation of the reciprocal cell, July 5, 2002.

4. G. Rivera and Ch. W. Tseng. Tiling optimizations for 3D scientific computations. In Proceedings of SC’00, Dallas, IEEE, pp. 1–23, November 2000.

5. D. Cociorva, J. W. Wilkins, C. Lam, G. Baumgartner, and J. Ramanujam. Loop optimization for a class of memory-constrained computations. ICS’01, Sorrento, Italy, ACM, pp. 103–113, 2001.

6. C. Leopold. Exploiting non-uniform reuse for cache optimization: A case study. SAC, Las Vegas, ACM, pp. 560–564, 2001.

7. K. McKinley, S. Carr, and Ch. W. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424–453, 1996.

8. G. Goumas, A. Sotiropoulos, and N. Koziris. Minimizing completion time for loop tiling with computation and communication overlapping. IEEE, pp. 1–10, 2001.

9. F. Rastello and Y. Robert. Automatic partitioning of parallel loops with parallelepiped-shaped tiles. IEEE Transactions on Parallel and Distributed Systems, 13(5):460–470, 2002.

10. K. Hogstedt, L. Carter, and J. Ferrante. On the parallel execution time of tiled loops. IEEE Transactions on Parallel and Distributed Systems, 14(3):307–320, 2003.

11. T. Kisuki, P. M. W. Knijnenburg, and M. F. P. O’Boyle. Combined selection of tile sizes and unroll factors using iterative compilation. IEEE, pp. 237–246, 2000.

12. P. R. Panda, H. Nakamura, N. D. Dutt, and A. Nicolau. Augmenting loop tiling with data alignment for improved cache performance. IEEE Transactions on Computers, 48(2):142–149, 1999.

13. V. Sarkar and N. Megiddo. An analytical model for loop tiling and its solution. IEEE, pp. 146–153, 2000.

14. S. R. Sathe and P. M. Nawghare. On optimal tiling of iteration spaces. IEEE, pp. 256–258, 2000.

15. N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. IEEE, pp. 1–14, 2000.

16. M. Kandemir, R. Bordawekar, A. Choudhary, and J. Ramanujam. A unified tiling approach for out-of-core computations, pp. 1–12, 1997.

17. Ch. Hsu and U. Kremer. A quantitative analysis of tile size selection algorithms. Journal of Supercomputing, pp. 1–28, 2000.

18. G. Pike and P. N. Hilfinger. Better tiling and array contraction for compiling scientific programs. IEEE, pp. 1–12, 2002.

19. P. Boulet, J. Dongarra, Y. Robert, and F. Vivien. Tiling for heterogeneous computing platforms, pp. 1–19, 1998.

20. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.

21. D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

22. M. Gen and R. Cheng. Genetic Algorithms and Engineering Design. John Wiley & Sons, 1997.

23. C. Eisenbeis and J. C. Sogno. A general algorithm for data dependence analysis. In International Conference on Supercomputing, Washington, July 19–23, pp. 1–28, 1992.

24. D. E. Maydan, J. L. Hennessy and M. S. Lam. Efficient and exact data dependence analysis. ACM SIGPLAN’91 Conference on Programming Language Design and Implementation, Toronto, Ontario, Canada, June 26–28, pp. 1–10, 1991.

25. A. J. C. Bik and H. A. G. Wijshoff. Implementation of Fourier-Motzkin elimination. Leiden University, Netherlands, pp. 1–10, 1994.

26. L. Song and K. M. Kavi. A technique for variable dependence driven loop peeling. In Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing, IEEE, pp. 1–6, 2002.

27. F. J. Miyandashti. Loop uniformization in shared-memory MIMD machine. Master Thesis, Iran University of Science and Technology, 1997 (Persian).

28. S. Parsa, Sh. Lotfi, O. Boushehrian, A. Avaani and Sh. Tasharrofi. On the use of genetic algorithms for parallelization of serial loops in super compilers for execution on super computers. In 9th Annual Computer Society of Iran Computer Conference, pp. 62–69, 2003 (Persian).

29. E. Hodzic and W. Shang. On time supernode shape. IEEE Transactions on Parallel and Distributed Systems, 13(12):1220–1233, 2002.

30. Z. Michalewicz. A survey of constraint handling techniques in evolutionary computation methods. In McDonnell et al., pp. 135–155, 1995.

31. H. Zima and B. Chapman. Super Compilers for Parallel and Vector Computers. Addison-Wesley, 1991.

32. A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhäuser, 2000.

33. R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers, 2001.

34. M. Saffari and K. Bahrampour. Matrices and Vectors. Schwartz, 1969 (Persian).