@HC 5KK70 Platform-based Design
Loop Transformations
• Motivation
• Loop level transformations catalogue
– Loop merging
– Loop interchange
– Loop unrolling
– Unroll-and-Jam
– Loop tiling
• Loop Transformation Theory and Dependence Analysis
Thanks go to the DTSE people from IMEC and to Dr. Peter Knijnenburg († 2007) from Leiden University for many of the slides.
• Any array access A[e1][e2] with affine index expressions e1 and e2 can be represented by an access matrix A and an offset vector a, so that the accessed element is
$A \cdot I + a$, where I is the iteration vector.
• This can be considered as a mapping from the iteration space into the storage space of the array (which is a trivial polyhedron).
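As an illustration, the reference B[i+1][2*j-i] in a doubly nested loop over (i, j) has access matrix rows (1, 0) and (-1, 2) and offset vector (1, 0). The minimal C sketch below evaluates $A \cdot I + a$ for a given iteration vector; the type `access_t`, the function `access_map` and the fixed 2-D sizes are assumptions made for illustration only.

```c
enum { DIMS = 2, DEPTH = 2 };   /* array dimensions, loop-nest depth */

/* Hypothetical representation of one array reference: index = A*I + a */
typedef struct {
    int A[DIMS][DEPTH];   /* access matrix */
    int a[DIMS];          /* offset vector */
} access_t;

/* Map iteration vector I to the array index vector idx = A*I + a. */
void access_map(const access_t *acc, const int I[DEPTH], int idx[DIMS]) {
    for (int r = 0; r < DIMS; r++) {
        idx[r] = acc->a[r];
        for (int c = 0; c < DEPTH; c++)
            idx[r] += acc->A[r][c] * I[c];
    }
}
```

With `access_t acc = {{{1, 0}, {-1, 2}}, {1, 0}};`, the iteration I = (3, 5) is mapped to the element B[4][7].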
Unimodular Matrices
• A unimodular matrix T is a square matrix with integer entries and determinant ±1.
• This means that T maps the integer lattice one-to-one onto itself, so it maps an object onto another object with exactly the same number of integer points in it.
• Its inverse $T^{-1}$ always exists and is unimodular as well.
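For 2×2 matrices the definition can be checked directly. In the minimal C sketch below (the type `mat2` and the function names are illustrative assumptions), the inverse is computed with the adjugate formula $T^{-1} = \operatorname{adj}(T)/\det(T)$; because $\det(T) = \pm 1$, dividing by the determinant is the same as multiplying by it, so every entry stays an integer.

```c
#include <stdbool.h>

typedef struct { int m[2][2]; } mat2;   /* row-major 2x2 integer matrix */

int det2(mat2 t) {
    return t.m[0][0] * t.m[1][1] - t.m[0][1] * t.m[1][0];
}

bool is_unimodular2(mat2 t) {
    int d = det2(t);
    return d == 1 || d == -1;
}

/* Inverse via the adjugate. For d = +1 or -1, dividing by d equals
 * multiplying by d, so the result is again an integer matrix.
 * Precondition: is_unimodular2(t). */
mat2 inverse2(mat2 t) {
    int d = det2(t);
    mat2 r = {{{  t.m[1][1] * d, -t.m[0][1] * d },
               { -t.m[1][0] * d,  t.m[0][0] * d }}};
    return r;
}
```

A scaling matrix such as diag(2, 1) has determinant 2 and is rejected by `is_unimodular2`: it would map the iteration space onto a set with gaps.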
Types of Unimodular Transformations
• Loop interchange
• Loop reversal
• Loop skewing with an arbitrary integer skew factor
• Since unimodular transformations are closed under multiplication, any combination is a unimodular transformation again.
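For a 2-deep nest (i, j) the three elementary transformations can be written as 2×2 matrices. The C sketch below (all names are illustrative assumptions) builds them and multiplies them. The determinant of a product is the product of the determinants, so any combination again has determinant ±1; note that interchange and reversal have determinant -1.

```c
typedef struct { int m[2][2]; } mat2;   /* row-major 2x2 integer matrix */

/* (i, j) -> (j, i): loop interchange, det = -1 */
static const mat2 INTERCHANGE = {{{0, 1}, {1, 0}}};
/* (i, j) -> (-i, j): reversal of the outer loop, det = -1 */
static const mat2 REVERSE_OUTER = {{{-1, 0}, {0, 1}}};

/* (i, j) -> (i, j + f*i): skew the inner loop by factor f, det = 1 */
mat2 skew(int f) {
    mat2 s = {{{1, 0}, {f, 1}}};
    return s;
}

/* Matrix product x*y: applying it to I means applying y first, then x. */
mat2 mul2(mat2 x, mat2 y) {
    mat2 r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = x.m[i][0] * y.m[0][j] + x.m[i][1] * y.m[1][j];
    return r;
}

int det2(mat2 t) {
    return t.m[0][0] * t.m[1][1] - t.m[0][1] * t.m[1][0];
}
```

For example, `mul2(INTERCHANGE, skew(1))` first skews and then interchanges, giving the matrix with rows (1, 1) and (1, 0), i.e. the wavefront order (i+j, i).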
Application
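• If the original loop bounds are written as $A I \le a$ and the new iteration vector is $I' = T I$, the transformed loop nest is given by $A T^{-1} I' \le a$.
• Likewise, any array access matrix A is transformed into $A T^{-1}$, since $A I + a = A T^{-1} I' + a$.
• The transformed loop nest needs to be normalized by means of Fourier-Motzkin elimination to ensure that the loop bounds are affine expressions in the more outer loop indices.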
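A worked instance, sketched in C under illustrative assumptions (N = 4, helper names invented here): take the nest for (i = 0; i < N; i++) for (j = 0; j < N; j++) and the unimodular T with rows (1, 1) and (1, 0), so the new indices are t = i + j (outer) and p = i (inner), and $T^{-1}$ gives i = p, j = t - p. Substituting into 0 ≤ i ≤ N-1 and 0 ≤ j ≤ N-1 yields max(0, t-N+1) ≤ p ≤ min(N-1, t); Fourier-Motzkin elimination of p from these bounds leaves 0 ≤ t ≤ 2N-2 for the outer loop. An access B[i][j] becomes B[p][t-p]. The code checks that every original iteration is visited exactly once.

```c
#include <string.h>

#define N 4

static int imax(int x, int y) { return x > y ? x : y; }
static int imin(int x, int y) { return x < y ? x : y; }

/* Transformed (wavefront) nest. visits[i][j] counts how often the original
 * iteration (i, j) is executed. */
void run_wavefront(int visits[N][N]) {
    memset(visits, 0, sizeof(int) * N * N);
    for (int t = 0; t <= 2 * N - 2; t++)                  /* bounds from FM */
        for (int p = imax(0, t - N + 1); p <= imin(N - 1, t); p++) {
            int i = p, j = t - p;                          /* I = T^{-1} I' */
            visits[i][j]++;              /* a body would access B[p][t - p] */
        }
}
```

Choosing t as the outer index is what makes the elimination necessary: with p innermost, its bounds depend on t, and the range of t itself only follows after p has been eliminated.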
Dependence Analysis
Consider the following statements:
S1: a = b + c;
S2: d = a + f;
S3: a = g + h;
• S1 → S2: true or flow dependence = RaW (read after write)
• S2 → S3: anti-dependence = WaR (write after read)
• S1 → S3: output dependence = WaW (write after write)
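The classification depends only on the kinds of the two accesses to the same location, taken in program order. A small C sketch (the enum and function names are illustrative assumptions):

```c
typedef enum { ACC_READ, ACC_WRITE } access_kind;
typedef enum { DEP_NONE, DEP_FLOW, DEP_ANTI, DEP_OUTPUT } dep_kind;

/* Classify the dependence from an earlier access (src) to a later access
 * (dst) of the same memory location. */
dep_kind classify(access_kind src, access_kind dst) {
    if (src == ACC_WRITE && dst == ACC_READ)  return DEP_FLOW;    /* RaW */
    if (src == ACC_READ  && dst == ACC_WRITE) return DEP_ANTI;    /* WaR */
    if (src == ACC_WRITE && dst == ACC_WRITE) return DEP_OUTPUT;  /* WaW */
    return DEP_NONE;  /* two reads do not constrain the execution order */
}
```

For the variable a above, S1 writes and S2 reads, so `classify(ACC_WRITE, ACC_READ)` yields `DEP_FLOW`.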
Dependences in Loops
• Consider the following loop
for(i=0; i<N; i++){
S1: a[i] = …;
S2: b[i] = a[i-1];}
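• Loop-carried dependence S1 → S2.
• We need to detect whether there exist iterations i and i’ in the loop space such that i = i’-1, i.e. S1 writes a[i] in iteration i and S2 reads the same element in iteration i’ = i+1.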
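For a concrete trip count the question can be answered by brute force. The C sketch below (the function name and the parameter n, standing for N, are illustrative assumptions) searches for a write iteration iw and a read iteration ir in [0, n) with iw = ir - 1. This is exact, but its cost grows with the square of the trip count, which motivates cheaper dependence tests.

```c
#include <stdbool.h>

/* Loop:  for (i = 0; i < n; i++) { S1: a[i] = ...; S2: b[i] = a[i-1]; }
 * S1 in iteration iw writes a[iw]; S2 in iteration ir reads a[ir - 1].
 * Returns true and the dependence distance if some pair touches the same
 * element. */
bool has_flow_dep(int n, int *distance) {
    for (int iw = 0; iw < n; iw++)
        for (int ir = 0; ir < n; ir++)
            if (iw == ir - 1) {
                *distance = ir - iw;
                return true;
            }
    return false;
}
```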
Definition of Dependence
• There exists a dependence if there are two statement instances that refer to the same memory location and (at least) one of them is a write.
• There should not be another write to that location between these two statement instances.
• In general, it is undecidable whether there exists a dependence.
Direction of Dependence
• If there is a flow dependence between two statements S1 and S2 in a loop, then S1 writes to a variable in an earlier iteration than S2 reads that variable.
• The iteration vector of the write is lexicographically less than the iteration vector of the read.
• $I \prec I'$ iff $i_1 = i'_1, \ldots, i_{k-1} = i'_{k-1}$ and $i_k < i'_k$ for some k.
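The lexicographic order translates directly into one pass over the vector entries. A C sketch (the function name is an illustrative assumption):

```c
#include <stdbool.h>

/* True iff I precedes J lexicographically: at the first position where the
 * vectors differ, I has the smaller entry. Equal vectors are not ordered. */
bool lex_less(const int *I, const int *J, int k) {
    for (int m = 0; m < k; m++) {
        if (I[m] < J[m]) return true;
        if (I[m] > J[m]) return false;
    }
    return false;
}
```

For example, (1, 2, 3) precedes (1, 3, 0): the first entries agree and 2 < 3, so the later entries do not matter.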
Direction Vectors
• A direction vector is a vector
(=,=,…,=,<,*,*,…,*), where * can denote =, < or >.
• Such a vector encodes one dependence or a collection of dependences.
• A loop transformation should result in a new direction vector for the dependence that is also lexicographically positive.
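A C sketch of the positivity check, assuming direction vectors are stored as character strings over '<', '=', '>' and '*' (this encoding and the function name are illustrations, not taken from the slides). A vector is accepted only if its leftmost non-'=' entry is '<'; a leading '*' or '>' is rejected because it may stand for a negative direction. As an additional assumption, an all-'=' vector (a dependence within a single iteration) is accepted as well.

```c
#include <stdbool.h>

bool dir_lex_positive(const char *dv, int k) {
    for (int m = 0; m < k; m++) {
        if (dv[m] == '=') continue;   /* same iteration of this loop */
        return dv[m] == '<';          /* leftmost non-'=' entry decides */
    }
    return true;                      /* all '=': loop-independent */
}
```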
Loop Interchange
• Interchanging two loops also interchanges the corresponding entries in a direction vector.
• Example: if the direction vector of a dependence is (<,>), then we may not interchange the loops, because the resulting direction would be (>,<), which is lexicographically negative.
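Checking interchange legality thus amounts to swapping two entries of every direction vector and testing the result. A C sketch, assuming direction vectors are stored as strings over '<', '=', '>' and '*', at most 16 loops, and illustrative function names:

```c
#include <stdbool.h>
#include <string.h>

#define MAXDEPTH 16

/* Leftmost non-'=' entry must be '<' (an all-'=' vector is accepted). */
static bool lex_positive(const char *dv, int k) {
    for (int m = 0; m < k; m++) {
        if (dv[m] == '=') continue;
        return dv[m] == '<';
    }
    return true;
}

/* Is interchanging loops p and q legal for all ndeps direction vectors of
 * length k (k <= MAXDEPTH)? */
bool interchange_legal(const char deps[][MAXDEPTH], int ndeps, int k,
                       int p, int q) {
    char v[MAXDEPTH];
    for (int d = 0; d < ndeps; d++) {
        memcpy(v, deps[d], (size_t)k);
        char tmp = v[p]; v[p] = v[q]; v[q] = tmp;   /* swap the entries */
        if (!lex_positive(v, k)) return false;
    }
    return true;
}
```

With the single dependence (<,>), the interchange is rejected; with (<,<) and (=,<), the swapped vectors (<,<) and (<,=) are both positive, so it is allowed.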
Affine Bounds and Indices
• We assume loop bounds and array index expressions are affine expressions:
$a_0 + a_1 i_1 + \dots + a_k i_k$
• Each loop bound for loop index $i_k$ is an affine expression over the previous loop indices $i_1$ to $i_{k-1}$.
• Each array index expression is an affine expression over all loop indices.
Non-Affine Expressions
• Index expressions like i*j cannot be handled by these dependence tests, so we must conservatively assume that a dependence exists.
• An important class of index expressions are indirections A[B[i]]. These occur frequently in scientific applications (sparse matrix computations).
• In embedded applications???
Linear Diophantine Equations
• A linear Diophantine equation is of the form
$\sum_j a_j x_j = c$, with integer coefficients $a_j$ and c, and integer unknowns $x_j$.
• The equation has a solution iff $\gcd(a_1,\ldots,a_n)$ divides c.
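The gcd criterion gives a cheap solvability check. A C sketch (function names are illustrative assumptions); as an extra edge case, when all coefficients are zero the equation is solvable only for c = 0.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Euclid's algorithm on absolute values; gcd(0, 0) = 0. */
int gcd(int x, int y) {
    x = abs(x);
    y = abs(y);
    while (y != 0) {
        int r = x % y;
        x = y;
        y = r;
    }
    return x;
}

/* Does sum_j a[j] * x[j] = c have an integer solution? */
bool diophantine_solvable(const int *a, int n, int c) {
    int g = 0;
    for (int j = 0; j < n; j++)
        g = gcd(g, a[j]);
    if (g == 0)
        return c == 0;
    return c % g == 0;
}
```

For example, 4x + 6y = 10 is solvable (x = y = 1), while 4x + 6y = 7 is not, since gcd(4, 6) = 2 does not divide 7.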
GCD Test for Dependence
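• Assume a single loop and two references A[a+b*i] and A[c+d*i].
• If there exists a dependence, then gcd(b,d) divides (c-a): some iterations i and i’ satisfy a + b*i = c + d*i’, i.e. the linear Diophantine equation b*i - d*i’ = c - a.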
• Note the direction of the implication!
• If gcd(b,d) does not divide (c-a) then there exists no dependence.
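The test itself is a single gcd computation. A C sketch with illustrative names: it returns false when a dependence is impossible and true when one may exist. The extra case b = d = 0 (two constant subscripts) is handled by comparing a and c directly.

```c
#include <stdbool.h>
#include <stdlib.h>

static int gcd(int x, int y) {
    x = abs(x);
    y = abs(y);
    while (y != 0) { int r = x % y; x = y; y = r; }
    return x;
}

/* GCD test for A[a + b*i] and A[c + d*i] in one loop: a dependence needs
 * a + b*i = c + d*i', i.e. b*i - d*i' = c - a. */
bool gcd_test_may_depend(int a, int b, int c, int d) {
    int g = gcd(b, d);
    if (g == 0)
        return a == c;
    return (c - a) % g == 0;
}
```

For instance, A[2*i] and A[2*i+1] can never touch the same element, because gcd(2, 2) = 2 does not divide 1.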
GCD Test (cont’d)
• However, if gcd(b,d) does divide (c-a) then there might exist a dependence.
• The test is not exact, since it does not take the loop bounds into account.
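• For example:
for(i=0; i<10; i++)
  A[i] = A[i+10] + 1;
• Here gcd(1,1) = 1 divides 10, so the GCD test reports a possible dependence, although the written elements A[0..9] and the read elements A[10..19] never overlap.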
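A brute-force check in C that does take the bounds into account (the function name and parameters are illustrative assumptions) confirms that no pair of iterations of this loop accesses the same element, although the GCD test cannot exclude one.

```c
#include <stdbool.h>

/* Exact check for a write A[a + b*i] and a read A[c + d*i] with
 * lo <= i < hi: does any pair of iterations access the same element? */
bool bounded_dep_exists(int a, int b, int c, int d, int lo, int hi) {
    for (int iw = lo; iw < hi; iw++)
        for (int ir = lo; ir < hi; ir++)
            if (a + b * iw == c + d * ir)
                return true;
    return false;
}
```

With the upper bound raised to 20, iteration 10 writes A[10], which iteration 0 reads, so a dependence appears.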
GCD Test (cont’d)
• Using the theorem on linear Diophantine equations, we can apply the test in arbitrary loop nests.
• We need one test for each direction vector.
• A vector (=,…,=,<,…) whose first k entries are = implies that the first k indices of source and sink are the same.
• See the book by Zima for details.
Other Dependence Tests
• There exist many dependence tests:
– Separability test
– GCD test
– Banerjee test
– Range test
– Fourier-Motzkin test
– Omega test
• Going down this list, exactness increases, but so does the cost.
Fourier-Motzkin Elimination
• Consider a collection of linear inequalities over the variables $i_1,\ldots,i_n$.
• Is this system consistent, i.e., does there exist a solution?
• FM-elimination can determine this.
FM-Elimination (cont’d)
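• First, create all pairs of a lower bound $L(i_1,\ldots,i_{n-1}) \le i_n$ and an upper bound $i_n \le U(i_1,\ldots,i_{n-1})$. Together they describe the solutions for $i_n$.
• Then create the new system
• $L(i_1,\ldots,i_{n-1}) \le U(i_1,\ldots,i_{n-1})$ for all such pairs,
• together with all original inequalities not involving $i_n$.
• This new system has one variable fewer, and we continue in this way.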
FM-Elimination (cont’d)
• After eliminating $i_1$, we end up with a collection of inequalities between constants, $c_1 \le c_1'$.
• The original system is consistent iff every such inequality holds.
• That is, none of them is a contradiction like $10 \le 3$.
• There may be exponentially many new inequalities generated!
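A compact C sketch of FM elimination over integer coefficients, assuming each inequality is stored as $\sum_j c_j i_j \le b$ and fixed limits on the number of variables and inequalities (`MAXV`, `MAXC` and all names are illustrative assumptions). Instead of dividing to isolate $i_n$, each pair of an upper and a lower bound is combined with positive multipliers that cancel $i_n$, which is equivalent. No coefficient normalization is done, so values can grow quickly. The test system asks the dependence question for A[i] = A[i+10] + 1 with 0 ≤ i, i’ ≤ 9: is there a write iteration i and a read iteration i’ with i = i’ + 10?

```c
#include <stdbool.h>
#include <string.h>

#define MAXV 4      /* max number of variables    */
#define MAXC 256    /* max number of inequalities */

/* One inequality: sum_j c[j] * x[j] <= b */
typedef struct { long c[MAXV]; long b; } ineq;

/* Eliminate variable k from sys[0..n-1] into out; returns the new count,
 * or -1 if more than MAXC inequalities would be produced. */
static int fm_eliminate(const ineq *sys, int n, int k, ineq *out) {
    int m = 0;
    for (int p = 0; p < n; p++) {
        if (sys[p].c[k] == 0) {                 /* does not involve x_k: keep */
            if (m == MAXC) return -1;
            out[m++] = sys[p];
        } else if (sys[p].c[k] > 0) {           /* upper bound on x_k */
            for (int q = 0; q < n; q++) {
                if (sys[q].c[k] >= 0) continue; /* partner: a lower bound */
                long mp = -sys[q].c[k], mq = sys[p].c[k];   /* both > 0 */
                if (m == MAXC) return -1;
                for (int j = 0; j < MAXV; j++)
                    out[m].c[j] = mp * sys[p].c[j] + mq * sys[q].c[j];
                out[m].b = mp * sys[p].b + mq * sys[q].b;
                m++;
            }
        }   /* lower bounds are only used as partners of upper bounds */
    }
    return m;
}

/* True iff the system over nv variables has a real solution. If a size
 * limit is hit, it conservatively answers true (assume a dependence). */
bool fm_consistent(const ineq *sys, int n, int nv) {
    ineq cur[MAXC], next[MAXC];
    if (n > MAXC) return true;
    memcpy(cur, sys, sizeof(ineq) * (size_t)n);
    for (int k = nv - 1; k >= 0; k--) {         /* eliminate x_n first */
        n = fm_eliminate(cur, n, k, next);
        if (n < 0) return true;
        memcpy(cur, next, sizeof(ineq) * (size_t)n);
    }
    for (int p = 0; p < n; p++)
        if (cur[p].b < 0) return false;         /* 0 <= b is violated */
    return true;
}
```

For the example, eliminating i’ produces i ≥ 10, which together with i ≤ 9 gives the contradiction 0 ≤ -1, so there is no dependence.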
Fourier-Motzkin Test
• Loop bounds plus array index expressions generate sets of inequalities, using new loop indices i’ for the sink of the dependence.
• Each direction vector generates the additional constraints
• $i_1 = i_1', \ldots, i_{k-1} = i_{k-1}', i_k < i_k'$
• If all these systems are inconsistent, then there exists no dependence.
• This test is not exact (a system may have real solutions but no integer ones), but it is almost exact.
N-Dimensional Arrays
• Test in each dimension separately.
• This can introduce another level of inaccuracy.
• Some tests (the FM and Omega tests) can handle all dimensions at the same time.
• Otherwise, you can linearize an array: Transform a logically N-dimensional array to its one-dimensional storage format.
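Row-major linearization, the layout C uses for multi-dimensional arrays, can be computed with Horner's scheme. A sketch with illustrative names: after linearization, a subscript pair [e1][e2] in an N×M array becomes the single affine subscript e1*M + e2, to which one-dimensional tests apply.

```c
/* Row-major offset of element idx[0..n-1] in an array with extents
 * dims[0..n-1]; dims[0] only bounds idx[0] and does not affect the offset. */
long linear_offset(const int *idx, const int *dims, int n) {
    long off = 0;
    for (int k = 0; k < n; k++)
        off = off * dims[k] + idx[k];
    return off;
}
```

For a 4×5×6 array, element [1][2][3] lies at offset (1·5 + 2)·6 + 3 = 45.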
Hierarchy of Tests
• Try cheap tests first, then more expensive ones:
  if (cheap test1 = NO)
  then print ‘NO’
  else if (test2 = NO)
  then print ‘NO’
  else if (test3 = NO)
  then print ‘NO’
  else
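A C sketch of such a cascade for the single-loop case A[a + b*i] versus A[c + d*i] with lo ≤ i < hi. The three stages and all names are assumptions chosen for illustration: the GCD test, a simple subscript-range overlap check (a simplified stand-in for bounds-based tests such as Banerjee's, not the Banerjee test itself), and finally exhaustive enumeration, which is exact but the most expensive.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef enum { DEP_NO, DEP_YES } dep_answer;

static int gcd(int x, int y) {
    x = abs(x);
    y = abs(y);
    while (y != 0) { int r = x % y; x = y; y = r; }
    return x;
}

/* Test 1 (cheap): GCD test, ignores the loop bounds. */
static bool gcd_says_no(int a, int b, int c, int d) {
    int g = gcd(b, d);
    return g == 0 ? a != c : (c - a) % g != 0;
}

/* Value range of the affine subscript e + f*i for lo <= i < hi. */
static void subscript_range(int e, int f, int lo, int hi, int *mn, int *mx) {
    int v1 = e + f * lo, v2 = e + f * (hi - 1);
    *mn = v1 < v2 ? v1 : v2;
    *mx = v1 < v2 ? v2 : v1;
}

/* Test 2 (cheap): disjoint subscript ranges mean no dependence. */
static bool range_says_no(int a, int b, int c, int d, int lo, int hi) {
    int mn1, mx1, mn2, mx2;
    subscript_range(a, b, lo, hi, &mn1, &mx1);
    subscript_range(c, d, lo, hi, &mn2, &mx2);
    return mx1 < mn2 || mx2 < mn1;
}

/* Test 3 (exact, expensive): enumerate all pairs of iterations. */
static bool exact_says_no(int a, int b, int c, int d, int lo, int hi) {
    for (int i = lo; i < hi; i++)
        for (int i2 = lo; i2 < hi; i2++)
            if (a + b * i == c + d * i2) return false;
    return true;
}

/* Requires lo < hi (a non-empty loop). */
dep_answer dependence(int a, int b, int c, int d, int lo, int hi) {
    if (gcd_says_no(a, b, c, d)) return DEP_NO;
    if (range_says_no(a, b, c, d, lo, hi)) return DEP_NO;
    if (exact_says_no(a, b, c, d, lo, hi)) return DEP_NO;
    return DEP_YES;
}
```

Because each stage can only answer NO or pass the question on, ordering the stages by cost means the expensive enumeration runs only for the reference pairs the cheap tests could not disprove.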
Practical Dependence Testing
• Cheap tests, like GCD and Banerjee tests, can disprove many dependences.
• Adding expensive tests only disproves a few more possible dependences.
• The compiler writer needs to trade off compilation time against the accuracy of dependence testing.
• For time-critical applications, expensive tests like the Omega test (exact!) can be used.