CR18: Advanced Compilers, L02: Dependence Analysis, Tomofumi Yuki


Today’s Agenda
  Legality of Loop Transformations
    Dependences
    Legality of loop parallelization
    Legality of loop permutation
  Dependence Tests
    How to find dependences?
    Conservative tests
    Exact methods
  Polyhedral Representations

Loop Parallelism
  “Simple” transformation
    for (i=0; i<N; i++) S;
    becomes
    forall (i=0; i<N; i++) S;
  Not so simple to reason about
    Legality
    Performance impacts
  More complicated cases
    Transform the loops to expose parallelism

Legality of Transformations
  First Rule of Compiler
    preserve original semantics
  Many complications
    loops
    parameters
    array accesses
    branches
    pointers
    random numbers

regular subset

Preserving Semantics
  Preserving the order of operations
    one “easy” way to ensure preservation
    dependence is a partial order
  Exceptions?

Dependences
  Express relations between statements
    flow (true) dependence, RAW:   a = ...    then   ... = a
    anti-dependence, WAR:          ... = a    then   a = ...
    output dependence, WAW:        a = ...    then   a = ...
    input dependence, RAR:         ... = a    then   ... = a

Flow vs Anti Dependence
  Why is flow the “true” dependence?
    Flow is value-based
    Anti is memory-based

  for i
    a[i] = ...
    ... = a[i]

  for i
    ... = a[i]
    a[i] = ...

  for i
    ... = a[i]
    b[i] = ...

Dependence Abstractions
  Distance Vector
    distance between write and read
    [i,j] + c
    e.g., [0,1]
  Direction Vector
    direction of the instance that uses the value
    one of <, >, ≤, ≥, =, *
    e.g., [0,<]
    less precise, but sometimes sufficient

Direction Vector Example 1
  for (i=0; i<N; i++)
    for (j=0; j<M; j++)
      A[i][j] = A[i][0] + B[i][j];

  [figure: iteration space with axes i and j]
  distance vectors: [0,1], [0,2], [0,3]
  direction vector: [0,<]

Direction Vector Example 2
  for (i=1; i<N; i++)
    for (j=0; j<M; j++)
      A[i][j] = A[i-1][j+1] + B[i][j];

  [figure: iteration space with axes i and j]
  distance vector: [1,-1]
  direction vector: [<,>]
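The distance vectors in the two examples can be cross-checked by brute force on small bounds. The sketch below is only an illustration, not a technique from the slides: it runs each loop nest in order, remembers which iteration last wrote every cell of A, and records reader minus writer for each value produced inside the nest. The names `DistSet`, `ReadAccess`, `collect_flow_distances`, and the bound `MAX_DIM` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DIM 8   /* hypothetical upper bound on N and M for this sketch */

/* seen[di][dj + MAX_DIM] is true when distance vector [di, dj] was observed. */
typedef struct {
    bool seen[MAX_DIM][2 * MAX_DIM + 1];
} DistSet;

/* Maps iteration (i, j) to the cell A[*row][*col] read by that iteration. */
typedef void (*ReadAccess)(int i, int j, int *row, int *col);

/* Example 1 reads A[i][0]; Example 2 reads A[i-1][j+1]. */
void read_example1(int i, int j, int *row, int *col) { (void)j; *row = i; *col = 0; }
void read_example2(int i, int j, int *row, int *col) { *row = i - 1; *col = j + 1; }

/* Runs "for i in [i_lo, n) for j in [0, m): A[i][j] = A[read(i,j)] + ..."
 * and records reader-minus-writer distances (flow dependences only). */
void collect_flow_distances(int i_lo, int n, int m, ReadAccess read, DistSet *out)
{
    int writer_i[MAX_DIM][MAX_DIM + 1], writer_j[MAX_DIM][MAX_DIM + 1];
    assert(0 <= i_lo && n <= MAX_DIM && m <= MAX_DIM);
    memset(out, 0, sizeof *out);
    for (int r = 0; r < MAX_DIM; r++)
        for (int c = 0; c <= MAX_DIM; c++)
            writer_i[r][c] = writer_j[r][c] = -1;   /* -1: not written yet */

    for (int i = i_lo; i < n; i++) {
        for (int j = 0; j < m; j++) {
            int row, col;
            read(i, j, &row, &col);                 /* the read happens first */
            if (row >= 0 && row < MAX_DIM && col >= 0 && col <= MAX_DIM
                && writer_i[row][col] >= 0) {
                int di = i - writer_i[row][col];
                int dj = j - writer_j[row][col];
                out->seen[di][dj + MAX_DIM] = true;
            }
            writer_i[i][j] = i;                     /* then A[i][j] is written */
            writer_j[i][j] = j;
        }
    }
}
```

Calling `collect_flow_distances(0, 3, 4, read_example1, &d)` corresponds to Example 1 with N=3 and M=4, and `collect_flow_distances(1, 4, 4, read_example2, &d)` to Example 2 with N=M=4.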

So what do these vectors do?
  Parallelism is clear
    same for direction vectors
  Loop-carried dependence
    a loop at depth d carries a dependence if at least one of the distance/direction vectors has its first non-zero entry at d
  Example sets of distance vectors
    [0,0,1] [0,1,0] [1,1,0]
    [0,0,1] [0,1,1] [0,1,-1]
    [1,0,0] [1,1,0] [1,-1,0]
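As a rough sketch of how the three vector sets above could be checked mechanically, the code below uses the usual convention that a dependence is carried by the loop at the first non-zero entry of its distance vector. A loop is then treated as parallel (with the outer loops kept sequential) when no vector is carried at its depth. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the 1-based depth of the first non-zero entry of a distance
 * vector, i.e. the loop that carries the dependence, or 0 when the
 * vector is all zeros (loop-independent dependence). */
int carrying_level(const int *dist, int depth)
{
    assert(depth > 0);
    for (int d = 0; d < depth; d++)
        if (dist[d] != 0)
            return d + 1;
    return 0;
}

/* `vecs` holds `count` vectors of length `depth`, stored one after another.
 * The loop at 1-based depth `level` is reported parallel when no vector
 * is carried at that level. */
bool loop_is_parallel(const int *vecs, int count, int depth, int level)
{
    for (int v = 0; v < count; v++)
        if (carrying_level(vecs + v * depth, depth) == level)
            return false;
    return true;
}
```

Under this convention no loop is parallel for the first set, only the outermost loop is parallel for the second, and the two inner loops are parallel for the third.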

Loop Carried Dependence
  Is any of the loops parallel?
  What are the distance vectors?

  for i
    for j
      A[j] = foo(A[j], A[j+1])

Legality of Loop Permutation
  Another application of distance vectors
  Which ones can you permute?

  for (i=1; i<N; i++)
    for (j=1; j<M; j++)
      A[i][j] = A[i-1][j-1] + B[i][j];
  vector: [1,1]

  for (i=1; i<N; i++)
    for (j=1; j<M; j++)
      A[i][j] = A[i][j-1] + B[i][j];
  vector: [0,1]

  for (i=1; i<N; i++)
    for (j=1; j<M; j++)
      A[i][j] = A[i+1][j-1] + B[i][j];
  vector: [1,-1]

  [figures: one iteration space per loop nest, with axes i and j]

  Fully permutable: [≤,...,≤]
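One common way to phrase the permutation check is that every distance vector must remain lexicographically non-negative after its entries are reordered in the same way as the loops. The sketch below follows that rule for the three vectors above; `lex_nonnegative`, `permutation_is_legal`, and `MAX_DEPTH` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8   /* hypothetical limit on loop depth for this sketch */

/* True when the first non-zero entry is positive, or all entries are zero. */
bool lex_nonnegative(const int *v, int depth)
{
    for (int d = 0; d < depth; d++) {
        if (v[d] > 0) return true;
        if (v[d] < 0) return false;
    }
    return true;
}

/* perm[k] is the original (0-based) depth of the loop placed at depth k.
 * `vecs` holds `count` distance vectors of length `depth`. */
bool permutation_is_legal(const int *vecs, int count, int depth, const int *perm)
{
    int permuted[MAX_DEPTH];
    assert(depth > 0 && depth <= MAX_DEPTH);
    for (int v = 0; v < count; v++) {
        for (int k = 0; k < depth; k++)
            permuted[k] = vecs[v * depth + perm[k]];
        if (!lex_nonnegative(permuted, depth))
            return false;   /* some dependence would run backwards */
    }
    return true;
}
```

With `perm = {1, 0}` (swap i and j), [1,1] and [0,1] stay legal, while [1,-1] becomes [-1,1] and is rejected.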

Legality of Loop Reversal
  Is this transformation legal?

  for (i=1; i<N; i++)
    for (j=M-1; j>0; j--)
      A[i][j] = A[i-1][j+1] + B[i][j];

  [1,-1] becomes [?,?]
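A hedged sketch of one way to answer the question: reversing a loop negates the corresponding entry of every distance vector, and the reversal is acceptable when all resulting vectors are still lexicographically non-negative. The helper names and `MAX_REV_DEPTH` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REV_DEPTH 8   /* hypothetical limit on loop depth */

/* Sign of the first non-zero entry: 1, -1, or 0 for the zero vector. */
static int first_nonzero_sign(const int *v, int depth)
{
    for (int d = 0; d < depth; d++)
        if (v[d] != 0)
            return v[d] > 0 ? 1 : -1;
    return 0;
}

/* Reverses the loop at 0-based depth `level` and checks that no distance
 * vector becomes lexicographically negative. */
bool reversal_is_legal(const int *vecs, int count, int depth, int level)
{
    int reversed[MAX_REV_DEPTH];
    assert(0 <= level && level < depth && depth <= MAX_REV_DEPTH);
    for (int v = 0; v < count; v++) {
        for (int d = 0; d < depth; d++)
            reversed[d] = vecs[v * depth + d];
        reversed[level] = -reversed[level];   /* effect of reversing the loop */
        if (first_nonzero_sign(reversed, depth) < 0)
            return false;
    }
    return true;
}
```

Under this rule, reversing the j loop turns [1,-1] into [1,1], which is accepted, whereas reversing the i loop would give [-1,-1] and be rejected.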

Today’s Agenda
  Legality of Loop Transformations
    Dependences
    Legality of loop parallelization
    Legality of loop permutation
  Dependence Tests
    How to find dependences?
    Conservative tests
    Exact methods

How to Find the Vectors
  Easy case
    for (i=1; i<N; i++)
      for (j=0; j<M; j++)
        A[i] = A[i] + B[i][j];
  Not too easy
    for (i=1; i<N; i++)
      for (j=0; j<M; j++)
        A[i] = A[2*i-j+3] + B[i][j];

How to Find the Vectors
  Really difficult
    for (i=1; i<N; i++)
      for (j=0; j<M; j++) {
        A[i*i+j*j-i*j] = A[i] + B[i][j];
        A[i*j*j-i*j*3] = A[i] + B[i][j];
      }
  No general solution
    polynomial case is undecidable
    can work for linear accesses
    wide range of precision even for the linear case

Dependence: Affine Case
  Given two accesses f(i,j) and g(x,y)
    the two accesses are in conflict if:
      same location: f(i,j) = g(x,y)
      at least one of them is a write
  Let f and g be affine
    a0 + a1*i + a2*j = b0 + b1*x + b2*y
  The last write to a conflicting location is the producer

It is just solving a linear system
  Theoretically it is not that “hard”
Two Directions
  Polyhedral: use PIP and get exact solution
  Others: less expensive solutions work in practice

Exact Method: Polyhedral Model
  Array Dataflow Analysis [Feautrier 1991]
  Given read and write statement instances r, w
  Find w as a function of r such that
    r and w are in conflict
    w happens-before r
    w is the most recent write
  when everything is affine
  Main Engine
    Parametric Integer Linear Programming

Exact Dependence Analysis
  Who produced the value read at A[j]?

  for (i=0; i<N; i++)
    for (j=0; j<M; j++)
      S: A[i] = A[j] + B[i][j];

  PIP problem (reader S<i,j>, candidate writer S<i’,j’>)
    0≤i,i’<N
    0≤j,j’<M
    i’=j
    (i’,j’)<<(i,j)
    obj: max i’*X+j’   (X large enough that this is the lexicographic maximum of (i’,j’))

  Result
    S<i,j> = if i>j          : S<j,M-1>;
             if i=j and i>0  : S<j,j-1>;
             if j>i or i=j=0 : A[j];

  Powerful but expensive
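The closed-form producer can be sanity-checked on small sizes by executing the loop nest and tracking the last writer of each cell. The sketch below is only a brute-force check, not how PIP works. It assumes the first case applies whenever i>j (including j=0), and the names `Producer`, `closed_form_producer`, `closed_form_matches_simulation`, and `MAX_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SIZE 16   /* hypothetical bound on N and M for this sketch */

typedef struct {
    bool from_input;  /* true when the value read is the initial A[j] */
    int i, j;         /* producing instance S<i,j> otherwise */
} Producer;

/* Closed form for the read of A[j] in S<i,j>: A[i] = A[j] + B[i][j]. */
Producer closed_form_producer(int i, int j, int m)
{
    Producer p = { true, -1, -1 };
    if (i > j) {                       /* row j fully executed: last write is S<j,M-1> */
        p.from_input = false; p.i = j; p.j = m - 1;
    } else if (i == j && i > 0) {      /* previous iteration of the same row */
        p.from_input = false; p.i = j; p.j = j - 1;
    }
    return p;                          /* otherwise A[j] is still an input */
}

/* Executes the loop nest in order and compares every read against the
 * closed form. Returns true when they agree everywhere. */
bool closed_form_matches_simulation(int n, int m)
{
    int last_i[MAX_SIZE], last_j[MAX_SIZE];
    assert(n > 0 && m > 0 && n <= MAX_SIZE && m <= MAX_SIZE);
    for (int c = 0; c < MAX_SIZE; c++)
        last_i[c] = last_j[c] = -1;    /* -1: cell not written yet */

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            Producer p = closed_form_producer(i, j, m);
            bool input = last_i[j] < 0;
            if (input != p.from_input)
                return false;
            if (!input && (last_i[j] != p.i || last_j[j] != p.j))
                return false;
            last_i[i] = i;             /* S<i,j> writes A[i] */
            last_j[i] = j;
        }
    }
    return true;
}
```

A call such as `closed_form_matches_simulation(4, 3)` covers the case N>M, and `closed_form_matches_simulation(3, 4)` the case N<M.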

ADA Example 1
  What is the PIP problem?

  for (i = 0; i<=N; i++)
    for (j = i; j<=M; j++)
      A[j] = foo(A[j], A[j+1])

ADA Example 2
  What is the PIP problem?

  for (i = 0; i<=N; i++)
    B[j] = foo(...);

  for (j = i; j<=M; j++)
    A[j] = bar(B[j], B[j-1]);

Digression: Multiple Statements
  Within a domain, the order of execution is given by lex. order
  What do you do when you have multiple statements?

2d+1 Notation
  A convention to encode statement ordering
    Known by many different names
    in the original ADA paper, it simply said to: “use the textual order”
  For a d-dimensional loop nest, use d+1 constant dimensions

  for i
    for j
      S1<i,j>;
    for j
      S2<i,j>;
      S3<i,j>;

  dom(S1) = {0,i,0,j,0| ...}
  dom(S2) = {0,i,1,j,0| ...}
  dom(S3) = {0,i,1,j,1| ...}
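To illustrate how the constant dimensions encode textual order, the sketch below builds the five-entry vectors for S1, S2, and S3 and compares them lexicographically. The helper names (`s1_time`, `s2_time`, `s3_time`, `lex_compare`) are hypothetical.

```c
#include <assert.h>

#define TIME_DIMS 5   /* 2d+1 with d = 2 */

/* Fills out[] with [c0, i, c1, j, c2]. */
static void make_time(int c0, int i, int c1, int j, int c2, int out[TIME_DIMS])
{
    out[0] = c0; out[1] = i; out[2] = c1; out[3] = j; out[4] = c2;
}

/* Vectors for the three statements of the example loop nest. */
void s1_time(int i, int j, int out[TIME_DIMS]) { make_time(0, i, 0, j, 0, out); }
void s2_time(int i, int j, int out[TIME_DIMS]) { make_time(0, i, 1, j, 0, out); }
void s3_time(int i, int j, int out[TIME_DIMS]) { make_time(0, i, 1, j, 1, out); }

/* Returns a negative value when a executes before b, a positive value
 * when a executes after b, and 0 when the two vectors are equal. */
int lex_compare(const int a[TIME_DIMS], const int b[TIME_DIMS])
{
    assert(a != 0 && b != 0);
    for (int d = 0; d < TIME_DIMS; d++) {
        if (a[d] < b[d]) return -1;
        if (a[d] > b[d]) return 1;
    }
    return 0;
}
```

For a fixed i, every S1 instance compares before every S2 and S3 instance, and S2<i,j> compares before S3<i,j>, which in turn compares before S2<i,j+1>.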

ADA Example 2
  What is the PIP problem?

  for (i = 0; i<=N; i++)
    B[j] = foo(...);

  for (j = i; j<=M; j++)
    A[j] = bar(B[j], B[j-1]);

ADA Example 3
  What is the PIP problem?

  for (t=0; t<=T; t++) {
    for (i=0; i<=N; i++)
      A[i] = foo(B[j]);
    for (j=0; j<=M; j++)
      B[j] = foo(A[i]);
  }

The Omega Test
  Another Variant of ADA
    William Pugh (1991)
    based on Fourier-Motzkin for integers
    Presburger Arithmetic
  Two slightly different branches
    one in US, the other in France
    we mostly talk about the French stuff, but similar evolution took place with Omega

So what is wrong?
  Can’t we just use this powerful method all the time?

Dependence Tests
  Same setting (conflicting memory accesses)
    f(i,j) = g(x,y)
  Let f and g be affine
    a0 + a1*i + a2*j = b0 + b1*x + b2*y
    linear Diophantine equation
    an integer solution exists if and only if gcd(a1,a2,b1,b2) divides b0-a0

GCD Test
  for (i=1; i<N; i++)
    for (j=0; j<M; j++)
      A[3*i] = A[6*i-3*j+2] + B[i][j];

    3i = 6x-3y+2
    3i-6x+3y = 2
    gcd(3,6,3) = 3 does not divide 2: no dependence

  for (i=1; i<N; i++)
    for (j=0; j<M; j++)
      A[2*i] = A[4*i-2*j+2] + B[i][j];

    2i = 4x-2y+2
    2i-4x+2y = 2
    gcd(2,4,2) = 2 divides 2: a dependence may exist
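A minimal sketch of the GCD test, assuming the equation has been brought to the form c1*x1 + ... + cn*xn = rhs with integer coefficients. It relies on the standard fact that such an equation has an integer solution exactly when the gcd of the coefficients divides rhs. The function name `gcd_test_may_depend` is hypothetical, and loop bounds are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Greatest common divisor of |a| and |b| (Euclid's algorithm). */
static int gcd(int a, int b)
{
    a = abs(a);
    b = abs(b);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns false when coeffs[0]*x0 + ... + coeffs[count-1]*x_{count-1} = rhs
 * has no integer solution (independence proven), true otherwise. */
bool gcd_test_may_depend(const int *coeffs, int count, int rhs)
{
    int g = 0;
    assert(count > 0);
    for (int k = 0; k < count; k++)
        g = gcd(g, coeffs[k]);
    if (g == 0)                 /* all coefficients are zero */
        return rhs == 0;
    return rhs % g == 0;
}
```

For the two loop nests above, the coefficient lists are {3, -6, 3} and {2, -4, 2}, both with rhs 2.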

GCD vs ADA
  ADA is clearly much more precise (exact)
  What can ADA say for the following?

  for (i=1; i<N; i++)
    for (j=0; j<i*i; j++)
      A[i] = foo(A[i])...

Why is GCD Test Inexact?
  When does GCD test give false positive?
  What happens when GCD=1?

  for (i=0; i<N; i++)
    for (j=N; j<M; j++)
      A[i] = A[j] + B[i][j];

  GCD test: i = j, a trivial solution exists
    but i < N ≤ j within the loop bounds, so there is no actual dependence
  Main problem
    the space is completely unconstrained

Exact vs Exact
  Array Dataflow Analysis
    “exact” dependence analysis
  GCD Test
    inexact dependence test
  Exact Dependence Tests
    no false positives/negatives
    does not necessarily give the producer

Banerjee Test [Banerjee 1976]
  Making it slightly better
  There may be a dependence if
    min(f(i,j)-g(x,y)) ≤ 0, and
    0 ≤ max(f(i,j)-g(x,y))

  for (i=0; i<N; i++)
    for (j=N; j<M; j++) {
      A[i] = A[j] + B[i][j];
    }

    min(i-j) = 0-(M-1) = 1-M
    max(i-j) = (N-1)-N = -1
    max < 0, so there is no dependence
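The bounds in this example can be computed mechanically when every variable has constant lower and upper bounds. The sketch below evaluates the minimum and maximum of an affine expression over such a box and applies the interval condition. It is a simplified illustration of the idea, not the full Banerjee test with direction vectors, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum and maximum of c0 + sum coeff[k]*x[k] over lo[k] <= x[k] <= hi[k]. */
void affine_bounds(int c0, const int *coeff, const int *lo, const int *hi,
                   int count, int *min_out, int *max_out)
{
    int mn = c0, mx = c0;
    for (int k = 0; k < count; k++) {
        assert(lo[k] <= hi[k]);
        if (coeff[k] >= 0) {
            mn += coeff[k] * lo[k];
            mx += coeff[k] * hi[k];
        } else {               /* a negative coefficient flips the bounds */
            mn += coeff[k] * hi[k];
            mx += coeff[k] * lo[k];
        }
    }
    *min_out = mn;
    *max_out = mx;
}

/* A dependence may exist only when 0 lies inside [min, max] of f - g. */
bool banerjee_may_depend(int c0, const int *coeff, const int *lo,
                         const int *hi, int count)
{
    int mn, mx;
    affine_bounds(c0, coeff, lo, hi, count, &mn, &mx);
    return mn <= 0 && 0 <= mx;
}
```

With N=10 and M=20, the expression i-j has coefficients {1, -1} and bounds 0≤i≤9, 10≤j≤19, which gives the interval [-19, -1] and therefore no dependence.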

Banerjee Test Intuition
  interval of 2 functions

Banerjee Test
  Exact or Inexact?
  Weakness?

What happens with 2D arrays?
  How to formulate?
    for (i=0; i<N; i++)
      A[i][j] = A[i+1][j+2];
    given write: A[i][j] and read: A[x+1][y+2]
  How to formulate?
    for (i=0; i<N; i++)
      A[i][i] = A[i+1][i+2];
    given write: A[i][i] and read: A[x+1][x+2]

Dimension-by-Dimension
  Simple extension
    also called subscript-by-subscript
  Given
    A[f1(ivec),f2(ivec),...,fn(ivec)]
    B[g1(jvec),g2(jvec),...,gn(jvec)]
  Check feasibility of:
    f1 = g1 or f2 = g2 or ... or fn = gn

Limitations of Dim-by-Dim
  Is there parallelism in this loop nest?

  for (i=0; i<N; i++)
    for (j=0; j<M; j++)
      A[i][j] = ...
      A[2*j][i] = ...

  “coupled” subscript
  We need to check for feasibility of:
    f1 = g1 ∧ f2 = g2 ∧ ... ∧ fn = gn

Lambda Test [Li et al. 1989]
  Multi-dimensional Banerjee
  Given
    A[f1(ivec),f2(ivec),...,fn(ivec)]
    B[g1(jvec),g2(jvec),...,gn(jvec)]
  Check
    [formula not recoverable from the extracted slide]

How to get Direction Vectors
  Pick a direction vector and then test it!
    only test vectors relevant to the legality testing
    for lex. negative vectors the test can return true, but that makes no sense
  What makes sense for the following?

Lambda Test
  Let’s try [=,<]
  [figure: axes (i,i’) and (j,j’) with lines ψ1 and ψ2]

Delta Test [Goff et al. 1991]
  Further extensions for multiple indices
  Pragmatic approach
    key observation: real programs are not that complicated when it comes to array accesses
  1st Step, classify array access (pairs)
    ZIV (Zero Index Variable) pair
    SIV (Single Index Variable) pair
    MIV (Multiple Index Variables) pair

Delta Test Classifications
  ZIV
    e.g., A[N], A[10], ...
    loop invariant
  SIV
    e.g., A[i], A[j], A[i+2], ...
    only one loop iterator
  MIV
    e.g., A[i+j], A[2*i-j], A[i*j], ...
    when two or more iterators are involved
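For affine subscripts, the classification of a subscript pair comes down to counting how many loop iterators appear with a non-zero coefficient in either subscript. The sketch below makes that concrete. It is an illustration only: non-affine subscripts such as A[i*j] cannot be represented by coefficient arrays, and the names `SubscriptClass` and `classify_pair` are hypothetical.

```c
#include <assert.h>

typedef enum { ZIV, SIV, MIV } SubscriptClass;

/* a[k] and b[k] are the coefficients of loop iterator k in the two affine
 * subscripts of a pair (constant terms do not affect the class). */
SubscriptClass classify_pair(const int *a, const int *b, int depth)
{
    int used = 0;
    assert(depth >= 0);
    for (int k = 0; k < depth; k++)
        if (a[k] != 0 || b[k] != 0)
            used++;                 /* iterator k occurs in the pair */
    if (used == 0) return ZIV;
    if (used == 1) return SIV;
    return MIV;
}
```

With iterators ordered (i, j), the pair A[i] and A[i+2] has coefficients {1,0} and {1,0} and is classified SIV, while A[i+j] and A[2*i-j] gives {1,1} and {2,-1} and is classified MIV.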

Array Access Patterns
  What do they look like in “real-life”?
    1D, 2D, 3D+ arrays
    coupled, separate
    ZIV, SIV, MIV

Delta Test Algorithm
  1. Classify accesses
  2. Solve the easy cases
     if separable ZIV/SIV proves independence, done
  3. Solve the harder cases
     BUT, some information from Step 2 is reused
     constraint intersection/propagation

Constraint Intersection
  It is sometimes easy to show that multiple constraints cannot be satisfied at the same time
  If you have coupled SIV accesses
    e.g., A[i,i] = A[i+1, i+2]
  By analyzing each dim separately, you get
    i’ = i+1 and j’ = i+2
  But you also know that the valid space is
    i’ = j’, i’ = j’ = i+c
  Intersecting everything gives the empty set

Constraint Propagation
  Like intersection, SIV gives partial information
    e.g., A[i,i+j] = A[i+1, i+j]
  i’ = i+1 is derived from the 1st dim
  you then substitute the info into the 2nd dim
    A[i’,j’] = A[i+1, i+1+j’]
  Reformulating the 2nd dim gives
    i+1+j’ = i+j
  which yields
    j’ = j-1

Putting it All Together
  Delta-Test aims to take advantage of various properties of how the code is written
    a collection of many small tricks
  It is probably closer to what is in actual compilers than the polyhedral model

transition

Back to Array Dataflow Result
  Another view: PRDG
    Polyhedral Reduced Dependence Graph
    reduced vs extended (recall L01)
  Node: Statement domain
  Edge: Dependence (domain + function)

  for (i=0; i<N; i++) {
    for (j=0; j<P; j++)
      S0: A[j] = B[j]+B[j+1];
    for (j=0; j<Q; j++)
      S1: B[j] = A[j];
  }

  Node S0: 0≤i<N, 0≤j<P
  Node S1: 0≤i<N, 0≤j<Q

Polyhedral Objects
  We will usually use ISL syntax
  Set
    [<params>] -> { [<indices>] : <constraints> }
    [N,M] -> { [i,j] : 0<=i<N and 0<=j<M }
  Relation
    [<params>] -> { [<in>] -> [<out>] : <constraints> }
    [N,M] -> { [i,j] -> [x,y] : x=i+1 }
  Function is a special case of relation
    I often use (<indices> → <expression>)

Additional Conventions
  You can name each tuple
    Following are NOT equivalent
      [N,M] -> { S0[i,j] : 0<=i<N and 0<=j<M }
      [N,M] -> { S1[i,j] : 0<=i<N and 0<=j<M }
  Index names DO NOT matter
    Following are equivalent
      [N,M] -> { [i,j] : 0<=i<N and 0<=j<M }
      [N,M] -> { [x,y] : 0<=x<N and 0<=y<M }
  Names of parameters DO matter

Set vs Relations
  They are not really different
    [N] -> { [i,j] -> [x,y] : i=x and j=y }
    [N] -> { [i,j,x,y] : i=x and j=y }
  Mostly for convenience when representing program information
    Ex1. Dependence
      S0[i,j] -> S1[i’,j’]
    Ex2. Array access
      S0[i,j] -> A[i]

Matrix Representation
  Polyhedral objects are often encoded as matrices
    Ax + b ≥ 0
      A: linear part (matrix)
      x: indices (symbolic vector)
      b: constant (constant vector)
    Pp + Ax + b ≥ 0 to explicitly separate params (p: parameter vector)
    Simply Ax + b for functions
  Algebraic properties of A are often used

Matrix Form Example
  { [i,j] : 0≤i<10 and 0≤j<i }
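The matrix from the original slide did not survive extraction, so the following is one possible encoding rather than the author's. Writing the four constraints as i ≥ 0, -i + 9 ≥ 0, j ≥ 0, and i - j - 1 ≥ 0 gives a 4x2 matrix A and a constant vector b. The row order, the names `A_example` and `b_example`, and the helper `satisfies` are all choices made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define ROWS 4
#define COLS 2

/* Rows encode: i >= 0, -i + 9 >= 0, j >= 0, i - j - 1 >= 0. */
static const int A_example[ROWS][COLS] = {
    {  1,  0 },
    { -1,  0 },
    {  0,  1 },
    {  1, -1 },
};
static const int b_example[ROWS] = { 0, 9, 0, -1 };

/* True when every row of A*x + b is non-negative. A is stored row-major. */
bool satisfies(const int *A, const int *b, int rows, int cols, const int *x)
{
    assert(rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++) {
        int value = b[r];
        for (int c = 0; c < cols; c++)
            value += A[r * cols + c] * x[c];
        if (value < 0)
            return false;
    }
    return true;
}

/* Membership test for { [i,j] : 0 <= i < 10 and 0 <= j < i }. */
bool in_example_set(int i, int j)
{
    const int x[COLS] = { i, j };
    return satisfies(&A_example[0][0], b_example, ROWS, COLS, x);
}
```

For instance, `in_example_set(3, 2)` holds, while `in_example_set(3, 3)` and `in_example_set(10, 0)` do not.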

Integer Set Library
  Tool for manipulating sets and relations
    mostly by Sven Verdoolaege
  Kind of does everything now
    manipulating set/relation
    scheduling
    code generation
    PIP
    counting integer points
    ...

ISL Demo
  Online interface
    http://www2.cs.kuleuven.be/cgi-bin/dtai/barvinok.cgi

PRDG Example (Dataflow Edge)

  for (i=0; i<N; i++) {
    for (j=0; j<P; j++)
      S0: A[j] = foo(A[j]);
    for (j=0; j<Q; j++)
      S1: A[j] = bar(A[j]);
  }

  Nodes: S0, S1
  Edges
    S0[i,j] → S0[i+1,j] : j≥Q
    S0[i,j] → S1[i,j]   : j<Q
    S1[i,j] → S0[i+1,j] : j<P
    S1[i,j] → S1[i+1,j] : j≥P

PRDG Example (Dep. Polyhedra)
  (same loop nest as the dataflow-edge example)

  Node S0: 0≤i<N, 0≤j<P
  Node S1: 0≤i<N, 0≤j<Q
  Edges (source (i,j), target (i’,j’))
    S0 → S0: i’=i+1, j=j’ : j≥Q
    S0 → S1: i’=i,   j=j’ : j<Q
    S1 → S0: i’=i+1, j=j’ : j<P
    S1 → S1: i’=i+1, j=j’ : j’≥P

Uniform vs Affine Dependence
  Uniform dependences
    constant offset: <i,j> → <i,j> + c
    can be described with distance vectors
  Affine dependences
    any affine function: <i,j> → A.[i j]+b
    uniform when A = I
  When do we need affine dependences?

PRDG + Expressions = ???
  PRDG is an abstraction of dependences
    what each statement does is lost
  You may want the expressions in your analysis
    typically when semantic properties are useful
  Polyhedral Equational Model

Alpha Language
  Equational Language
    or PRDG + Expressions
    or Systems of Affine Recurrence Equations
    or dynamic single assignment code
  Basic structure
    declaration of the domain of equations
    affine equations that define the computation performed at each iteration point

Alpha Example

  for (i=1; i<=N; i++) {
    S0: A[i,i] = foo();
    for (j=i+1; j<=M; j++)
      S1: A[i,j] = A[i,j-1] * A[i,i];
  }

  S0 : [N,M] -> { [i] : 1<=i<=N }
  S1 : [N,M] -> { [i,j] : 1<=i<=N and i<j<=M }

  S0[i] = foo();
  S1[i,j] = case
    { : j=i+1 } : S0[i] * S0[i];
    { : j>i+1 } : S1[i,j-1] * S0[i];
  esac;

Role of Alpha in this Course
  Polyhedral Equational Model is not popular within the already niche Polyhedral Model
  I know A LOT about it because my advisor is the main guy working on it
  It is a good IR to look at both dependences and expressions
    it is also suited for teaching some of the aspects
  I sometimes use it to replace PRDG
    but keep in mind that it is a different view of it

Next Time
  Transforming polyhedral representations
  Tiling
