CR18: Advanced Compilers L04: Scheduling Tomofumi Yuki 1.

Post on 17-Jan-2016


1

CR18: Advanced Compilers

L04: Scheduling

Tomofumi Yuki

2

Today’s Agenda

Revisiting legality with schedules
How to find schedules

3

Schedules

Recall that we had many “schedules”
  here, we use the one related to time
In general, a schedule is a function such that
  input: statement instance
  output: timestamp
  where instances mapped to the same timestamp “may happen in parallel”
We talk about static schedules in this class

4

Legality with Schedule

Causality Condition
Given a PRDG with nodes N and edges E
  src(e) = producer statement
  dst(e) = consumer statement
  DS = domain of statement node S
  De = domain of dependence e
Check:
  θdst(e)(y) > θsrc(e)(x) for all e in E and all <x,y> in De

5

Example (uniform case)

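Back to the legality check with vectors

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S:  A[i][j] = A[i-1][j+1] + B[i][j];

Distance vector: [1,-1]
[Figure: iteration space with axes i and j]
θs(i,j)=i
e: (i,j->i+1,j-1)
θs(i+1,j-1)>θs(i,j)
i+1>i, which always holds, so this schedule is legal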

6

Example (uniform case)

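Back to the legality check with vectors

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S:  A[i][j] = A[i-1][j+1] + B[i][j];

Distance vector: [1,-1]
[Figure: iteration space with axes i and j]
θs(i,j)=j
e: (i,j->i+1,j-1)
θs(i+1,j-1)>θs(i,j)
j-1>j, which never holds, so this schedule is illegal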

7

Example (uniform case)

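Back to the legality check with vectors

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S:  A[i][j] = A[i-1][j+1] + B[i][j];

Distance vector: [1,-1]
[Figure: iteration space with axes i and j]
θs(i,j)=i-j
e: (i,j->i+1,j-1)
θs(i+1,j-1)>θs(i,j)
i-j+2>i-j, which always holds, so this schedule is legal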
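The three checks on the uniform example can also be run by brute force. This is a minimal sketch, not part of the slides: the helper `is_legal` and the sizes N=6, M=6 are assumptions chosen only for illustration, and it enumerates instances instead of reasoning symbolically.

```python
from itertools import product

def is_legal(theta, dist, N, M):
    """Brute-force causality check for one uniform dependence.

    theta: schedule, maps (i, j) to a timestamp
    dist:  distance vector (di, dj); producer (i, j) feeds
           consumer (i + di, j + dj)
    """
    di, dj = dist
    for i, j in product(range(1, N), range(1, M)):
        ci, cj = i + di, j + dj
        # Skip pairs whose consumer falls outside the loop bounds.
        if not (1 <= ci < N and 1 <= cj < M):
            continue
        if not theta(ci, cj) > theta(i, j):
            return False
    return True

candidates = {
    "i": lambda i, j: i,
    "j": lambda i, j: j,
    "i-j": lambda i, j: i - j,
}
results = {name: is_legal(theta, (1, -1), N=6, M=6)
           for name, theta in candidates.items()}
print(results)
```

Enumeration only covers the chosen sizes; the symbolic check on the slides is what holds for all N and M.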

8

Example (affine case)

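Back to the legality check with vectors

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S:  A[i][j] = A[i][j-1] + A[i-1][M-j];

Dependences: [0,1] and [1,*]
[Figure: iteration space with axes i and j]
θs(i,j)=i+j
e: (i,j->i+1,M-j)
θs(i+1,M-j)>θs(i,j)
M+i-j+1>i+j
(M+1)/2>j, which holds only for part of the domain, so this schedule is not legal for all j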
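The affine case can be checked the same way. The sketch below is an illustration under assumed sizes (N=5, M=6) and a hypothetical helper `violations`; it lists the producer instances for which θs(i,j)=i+j fails on the edge e: (i,j->i+1,M-j).

```python
def violations(theta, N, M):
    """Producers (i, j) whose consumer (i+1, M-j) is not scheduled later."""
    bad = []
    for i in range(1, N - 1):      # consumer row i+1 must stay below N
        for j in range(1, M):      # M-j then stays within 1..M-1
            if not theta(i + 1, M - j) > theta(i, j):
                bad.append((i, j))
    return bad

N, M = 5, 6
bad = violations(lambda i, j: i + j, N, M)
bad_columns = sorted({j for _, j in bad})
print(bad_columns)
```

Every violating instance has j at or above (M+1)/2, which matches the condition derived on the slide.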

9

The Scheduling Problem

Find θs that satisfy the causality conditions
  i.e., no dependences are violated
Connection to loops
  you can complete the schedule to get the transformation for loops
Sometimes, the problem is formulated in terms of the transform instead of the schedule

10

Parallel Execution of DO Loops [Lamport 74]

One of the first papers on automatic parallelization
Hyper-plane method
Loops of the form

for I1 = l1 .. u1
  ...
    for In = ln .. un
      body

are rewritten as

for J1 = λ1 .. μ1
  ...
    for Jk = λk .. μk
      forall Jk+1 = λk+1 .. μk+1
        ...
          forall Jn = λn .. μn
            body

Scope of dependences: uniform + α

11

The Hyper-Plane Method

The main theorem (simplified)
  We are looking for a schedule θ such that the inner n-1 loops are parallel
  θ is restricted to linear functions: θ=a1I1+...+anIn
The key idea: given a distance vector c, we want θ(c)>0
  proof of existence for lex. positive c in the paper
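To see what θ(c)>0 buys, the sketch below runs a small loop nest sequentially and in hyperplane (wavefront) order and compares the results. It is an illustration only: the loop body A[i][j] = A[i-1][j] + A[i][j-1] (distance vectors [1,0] and [0,1]), the schedule θ(i,j)=i+j, and the helper names are assumptions, not taken from the paper.

```python
def run(order, n, m):
    """Execute A[i][j] = A[i-1][j] + A[i][j-1] in the given iteration order."""
    A = [[1] * m for _ in range(n)]
    for i, j in order:
        A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A

def sequential(n, m):
    return [(i, j) for i in range(1, n) for j in range(1, m)]

def wavefront(n, m, theta=lambda i, j: i + j):
    """Sort iterations by timestamp; iterations on one hyperplane tie."""
    # Ties are deliberately visited in reverse i order to show that the
    # order inside a hyperplane does not matter.
    return sorted(sequential(n, m), key=lambda p: (theta(*p), -p[0]))

same = run(sequential(5, 7), 5, 7) == run(wavefront(5, 7), 5, 7)
# A schedule with theta(c) < 0 runs consumers before producers.
broken = run(wavefront(5, 7, theta=lambda i, j: -(i + j)), 5, 7)
print(same, broken == run(sequential(5, 7), 5, 7))
```

With θ(c)>0 for every distance vector, each hyperplane θ=const only reads values produced on earlier hyperplanes, which is why the inner loops can be `forall`.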

12

The Hyper-Plane Method

Optimizing the schedule
What should be the objective function?
  In this paper, it is min(μ1-λ1)
  which is min(θ’(μ1-λ1)), with θ’(x)=|a1|x1+...+|an|xn

for I1 = l1 .. u1
  ...
    for In = ln .. un
      body

for J1 = λ1 .. μ1
  forall J2 = λ2 .. μ2
    ...
      forall Jn = λn .. μn
        body

13

Example 1

With distance vectors [1,0] and [0,1]
θ(i,j)=ai+bj
Constraints
  θ([1,0])>0 : a>0
  θ([0,1])>0 : b>0

Minimize aN+bM for 0≤i<N, 0≤j<M

[Figure: iteration space with axes i and j]

14

Example 2

With distance vectors [1,-1] and [0,1]
θ(i,j)=ai+bj
Constraints
  θ([1,-1])>0 : a>b
  θ([0,1])>0 : b>0

Minimize aN+bM for 0≤i<N, 0≤j<M

[Figure: iteration space with axes i and j]
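Both examples are small enough to solve by enumeration. The sketch below is an assumption-laden illustration: it searches integer coefficients in a small range (`bound`), and it takes the objective to be the span |a|(N-1)+|b|(M-1) of θ over the domain, following the θ’ of the previous slide. The function name `best_hyperplane` is hypothetical.

```python
from itertools import product

def best_hyperplane(distances, N, M, bound=4):
    """Smallest-span linear schedule theta(i,j) = a*i + b*j with theta(c) > 0.

    Searches integer a, b in [-bound, bound]. The span
    |a|*(N-1) + |b|*(M-1) is the assumed objective.
    """
    best = None
    for a, b in product(range(-bound, bound + 1), repeat=2):
        if all(a * ci + b * cj > 0 for ci, cj in distances):
            span = abs(a) * (N - 1) + abs(b) * (M - 1)
            if best is None or span < best[0]:
                best = (span, a, b)
    return best

example1 = best_hyperplane([(1, 0), (0, 1)], N=10, M=10)
example2 = best_hyperplane([(1, -1), (0, 1)], N=10, M=10)
print(example1, example2)
```

Under these assumptions the search picks θ(i,j)=i+j for Example 1 and θ(i,j)=2i+j for Example 2, the smallest integer solutions of a>0, b>0 and a>b, b>0 respectively.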

15

The General Plane Method

Generalizing the Hyper-Plane method
  when the dependences are no longer uniform
Given the iteration vector x,
  Hyper-Plane method is for array accesses: VAR[p(x)+c]
    where p is a permutation common to the entire body
  General-Plane method extends to: VAR[d(p(x)+c)]
    where d “drops” some number of dimensions

16

Final Words on this Paper

Very early paper, but it does
  dependence analysis
  scheduling
  loop transformation / code generation
Similar technique by Wolf & Lam for direction vectors (1991)

17

Farkas Scheduling [Feautrier 92]

Given a PRDG
  find a schedule θS for each statement S
  θS is restricted to affine functions
Affine form of Farkas Lemma
  given a non-empty domain D = {x : Ax+b≥0}
  an affine form ψ(x) is non-negative in D iff it can be written as a non-negative combination of the constraints:
    ψ(x) = λ0 + λ(Ax+b), with λ0≥0 and λ≥0
  λ0 and λ are the Farkas multipliers

18

Problem Formulation

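Given a PRDG with nodes N and edges E
Positivity: all schedules start at 0
  θS(x)≥0 for all S in N and all x in DS
Causality:
  θdst(e)(y)-θsrc(e)(x)-1≥0 for all e in E and all <x,y> in De
  x,y: source/destination instance when the dependence is active
  note: edge is producer to consumer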

19

Using Farkas Lemma

Given statements S1 and S2 with schedules θS1, θS2 and a dependence e (from S1 to S2)

We want to make sure θS2(y)>θS1(x) for all <x,y> in De

which is θS2(y)-θS1(x)-1≥0 in De

make it a single function to get ψe (x,y)≥0 in De

20

The Farkas Method

Build constraints on the schedule
  build ψe(x,y) for each e
  each ψ constrains the Farkas multipliers
  solve!
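A Farkas certificate can be verified by matching coefficients. The sketch below is illustrative only: the domain {[i,j] : 0≤j<i≤N}, the form ψ(i,j)=N-j-1, the multipliers, and the helpers `combine`, `evaluate`, and `nonneg_on_domain` are all assumptions chosen for the example. Affine forms are stored as dictionaries from variable name to coefficient, with "1" as the constant term.

```python
from itertools import product

def combine(lam0, lams, constraints):
    """Return the affine form lam0 + sum(lam_k * constraint_k)."""
    out = {"1": lam0}
    for lam, c in zip(lams, constraints):
        for var, coef in c.items():
            out[var] = out.get(var, 0) + lam * coef
    return {v: c for v, c in out.items() if c != 0}

def evaluate(form, env):
    return sum(coef * (1 if var == "1" else env[var])
               for var, coef in form.items())

# Each constraint is read as "form >= 0".
D = [
    {"j": 1},                     # j >= 0
    {"i": 1, "j": -1, "1": -1},   # i - j - 1 >= 0
    {"N": 1, "i": -1},            # N - i >= 0
]
psi = {"N": 1, "j": -1, "1": -1}  # psi(i,j) = N - j - 1

# Assumed multipliers: psi = 0 + 0*(j) + 1*(i-j-1) + 1*(N-i)
lam0, lams = 0, [0, 1, 1]
certified = (lam0 >= 0 and all(l >= 0 for l in lams)
             and combine(lam0, lams, D) == psi)

def nonneg_on_domain(form, constraints, max_n=6):
    """Brute-force cross-check on small values of N, i, j."""
    for N, i, j in product(range(max_n), repeat=3):
        env = {"i": i, "j": j, "N": N}
        if all(evaluate(c, env) >= 0 for c in constraints):
            if evaluate(form, env) < 0:
                return False
    return True

print(certified, nonneg_on_domain(psi, D))
```

In the scheduling problem the roles are reversed: the coefficients of ψ contain the unknown schedule coefficients, and matching them against the multipliers yields linear constraints that an LP solver handles.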

21

Example 1

Consider the following

for (i=0 .. N) {
  for (j=0 .. i-1)
S0:   x[i] = x[i] - L[i,j]*x[j];
S1: x[i] = x[i] / L[i,i];
}

DS0: {[i,j] : 0≤i≤N and 0≤j<i }
DS1: {[i] : 0≤i≤N}
e1: S0[i,j]->S0[i,j-1]
e2: S1[i]->S0[i,i-1]
direction is consumer to producer

22

Example 2

Consider the following

for (i=0 .. N) {
S0: x[i] = 0;
  for (j=0 .. N)
S1:   x[i] = x[i] + L[i,j]*b[j];
}

DS0: {[i] : 0≤i≤N}
DS1: {[i,j] : 0≤i,j≤N}
e1: S1[i,j]->S0[i] : j=0
e2: S1[i,j]->S1[i,j-1] : j>0

23

Example 3

Back to this example

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S:  A[i][j] = A[i][j-1] + A[i-1][M-j];

[Figure: iteration space with axes i and j]
θs=a1i+a2j+a0
e1:(i,j->i,j-1)
e2:(i,j->i-1,M-j)

24

Multi-Dimensional Scheduling

One-dimensional affine schedules are not sufficient
  linearization of lex. order is polynomial (if you have parameters)
So we want to find a set of θs for each statement
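A small illustration of the linearization remark, under assumed bounds 0≤i<4 and 0≤j<M: ordering by the two-dimensional timestamp (i,j) is the same as ordering by i*M+j, but i*M is a product of an index and a parameter, so the one-dimensional form is not affine. The name `linearize` and the concrete M are assumptions for the sketch.

```python
M = 5  # stands in for a symbolic parameter; fixed only to run the check
points = [(i, j) for i in range(4) for j in range(M)]

def linearize(i, j):
    # i*M multiplies an index by a parameter: polynomial, not affine.
    return i * M + j

by_lex = sorted(points)  # Python tuples compare lexicographically
by_time = sorted(points, key=lambda p: linearize(*p))
print(by_lex == by_time)
```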

25

Multi-Dimensional Farkas

Formulate the problem just like the 1D case
  each dependence adds constraints
But, we allow some to be not satisfied
  recall the causality condition, with δ = θdst(e)(y)-θsrc(e)(x)
  δ<0 : dependence violation
  δ=0 : weakly satisfied
  δ>0 : strongly satisfied

26

Greedy Algorithm

Given a PRDG with edges E
  1. formulate the problem for all edges in E
  2. weakly satisfy all of them
  3. strongly satisfy as many as possible
  4. add the obtained θ to the list
  5. remove strongly satisfied edges from E
  6. repeat until E is empty
The obtained list of θs is your schedule

27

Back to the Example

Back to this example

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S:  A[i][j] = A[i][j-1] + A[i-1][M-j];

[Figure: iteration space with axes i and j]
θs=a1i+a2j+a0
e1:(i,j->i,j-1)
e2:(i,j->i-1,M-j)
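The greedy algorithm can be tried on this example with a brute-force stand-in for the Farkas step. This is a sketch under loud assumptions: candidate schedules are enumerated over small integer coefficients (a1,a2), a0 is dropped because it cancels in θ(consumer)-θ(producer), and legality is only checked on a few sampled problem sizes rather than for all N and M. The names `edges`, `min_delta`, and `greedy` are hypothetical.

```python
from itertools import product

def edges(N, M):
    """(consumer, producer) pairs of the example for one problem size."""
    e1, e2 = [], []
    for i, j in product(range(1, N), range(1, M)):
        if j - 1 >= 1:
            e1.append(((i, j), (i, j - 1)))
        if i - 1 >= 1:
            e2.append(((i, j), (i - 1, M - j)))
    return {"e1": e1, "e2": e2}

def min_delta(a, pairs):
    """Smallest theta(consumer) - theta(producer) over the listed pairs."""
    theta = lambda p: a[0] * p[0] + a[1] * p[1]
    return min(theta(c) - theta(p) for c, p in pairs)

def greedy(sizes=((4, 4), (5, 7), (6, 9)), bound=2):
    # Merge instances from several sizes so that a candidate cannot
    # exploit one fixed value of M.
    remaining = {}
    for N, M in sizes:
        for name, pairs in edges(N, M).items():
            remaining.setdefault(name, []).extend(pairs)
    schedule = []
    while remaining:
        best = None  # (key, coefficients, strongly satisfied edges)
        for a in product(range(-bound, bound + 1), repeat=2):
            deltas = {n: min_delta(a, ps) for n, ps in remaining.items()}
            if min(deltas.values()) < 0:   # step 2: weakly satisfy all
                continue
            strong = [n for n, d in deltas.items() if d > 0]  # step 3
            # Prefer more strongly satisfied edges, then small coefficients.
            key = (len(strong), -(abs(a[0]) + abs(a[1])))
            if strong and (best is None or key > best[0]):
                best = (key, a, strong)
        if best is None:
            raise ValueError("no candidate strongly satisfies a remaining edge")
        schedule.append(best[1])           # step 4
        for n in best[2]:                  # step 5
            del remaining[n]
    return schedule                        # step 6: loop until E is empty

print(greedy())
```

Under these assumptions the first dimension is θ1=i, which strongly satisfies e2 and only weakly satisfies e1, and the second dimension θ2=j takes care of e1. No single (a1,a2) in the search range satisfies both edges strongly, which is the situation multi-dimensional schedules are meant for.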

28

The Vertex Method

Another method for scheduling
Uses the generator representation of polyhedra
  Constraint representation: intersection of half-spaces
  Generator representation: convex hull of vertices, rays, and lines

The Mapping of Linear Recurrence Equations on Regular Arrays, Patrice Quinton and Vincent Van Dongen, 1989

29

The Main Theorem

A schedule legal for the vertices + rays + lines is also legal for the entire polyhedron generated by them
  you can compute constraints on schedules
  no need to reason about a potentially infinite set of iterations
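For a bounded domain (vertices only, no rays or lines) the theorem can be illustrated numerically. The sketch below assumes the triangular domain {[i,j] : 0≤j<i≤N} with a fixed N=8 and a handful of made-up affine forms; it compares the sign test on the three vertices with the test on every integer point. All names are hypothetical.

```python
from itertools import product

def triangle_vertices(N):
    """Vertices of {[i,j] : 0 <= j < i <= N} for a fixed N >= 1."""
    return [(1, 0), (N, 0), (N, N - 1)]

def nonneg_at(psi, points):
    return all(psi(i, j) >= 0 for i, j in points)

def all_points(N):
    return [(i, j) for i, j in product(range(N + 1), repeat=2) if 0 <= j < i]

N = 8
forms = [
    lambda i, j: i - j - 1,       # non-negative at every vertex
    lambda i, j: N - i,           # non-negative at every vertex
    lambda i, j: 3 - j,           # negative at the vertex (N, N-1)
    lambda i, j: 2 * j - i + 1,   # negative at the vertex (N, 0)
]
agree = [nonneg_at(f, triangle_vertices(N)) == nonneg_at(f, all_points(N))
         for f in forms]
print(agree)
```

Because an affine function over a polytope attains its minimum at a vertex, three evaluations decide what would otherwise need a check per iteration; with parameters and unbounded domains, rays and lines play the same role.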

30

On the Optimality of Scheduling

Paper by Alain Darte and Frédéric Vivien
Survey of various methods for scheduling
  what is the dependence abstraction used?
  what can you say about optimality?
Optimality: does the method find all parallelism?
  how to define “all” parallelism?

31

Scheduling Algorithms

Allen and Kennedy [1987]
  targeting vector machines; dependence levels
Wolf and Lam [1991]
  Lamport-like; dependence vectors
Darte and Vivien [1996]
  Farkas-like; dependence polyhedra
Feautrier [1992]
  Farkas Algorithm; affine dependences
Lim and Lam [1997]

32

Allen and Kennedy (in short)

You have dependence levels only
  i.e., you know the dimension where the dependence is carried
Parallelizes the inner loops with no loop-carried dependence
  this paper introduced dependence levels
Also deals with loop fusion
  if the dependence is carried in some outer common loop, it can safely be fused
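The notion of a dependence level can be sketched as follows. This is a simplification, not the Allen and Kennedy algorithm itself: it assumes a single statement in a perfect nest, and it derives levels from distance vectors even though the algorithm only needs the levels. The function names are hypothetical.

```python
def dependence_level(distance):
    """1-based depth of the loop that carries the dependence.

    This is the position of the first non-zero entry. Returns None for an
    all-zero vector (loop-independent dependence).
    """
    for depth, d in enumerate(distance, start=1):
        if d != 0:
            return depth
    return None

def parallel_loops(distances, depth):
    """Loops (1-based) that carry no dependence and can be marked parallel."""
    carried = {dependence_level(d) for d in distances}
    return [k for k in range(1, depth + 1) if k not in carried]

only_outer = parallel_loops([(1, -1)], depth=2)
both = parallel_loops([(1, -1), (0, 1)], depth=2)
print(only_outer, both)
```

With the single distance vector [1,-1] only the outer loop carries a dependence, so the inner loop is parallel; adding [0,1] makes the inner loop carry one as well.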

33

Optimality of Allen and Kennedy

The dependence information is very limited
  dependence levels only
Then the parallelism found is actually optimal
  later proved by Darte and Vivien

34

Wolf and Lam (in short)

Input: direction vectors
Output: fully permutable loops
  what does this mean?
Context: unimodular transformations
Optimal parallelism extraction if
  you only know direction vectors
  perfectly nested loops

35

Optimality of Farkas Algorithm

Original paper had no claims
  later proved by Darte and Vivien
The Greedy algorithm is actually optimal!
With a few caveats
  affine schedules
  one schedule per statement

36

Index-Set Splitting

Piece-wise affine schedule
  or split a statement into multiple statements
  or split an equation into ...
Main Idea: using one schedule for the entire statement is (sometimes) not optimal

37

Example: Smashing

Periodic Boundaries
  can you tile?
[Figure: iteration space with axes i and j]

38

Example: Smashing

Periodic Boundaries
[Figure: two iteration spaces, each with axes i and j]

39

How Good is Optimal

What does Farkas scheduling bring?
