The Value Evolution Graph and its Applications to Automatic Parallelization
Silvius Rus, Dongmin Zhang, and Lawrence Rauchwerger

Dec 19, 2015

Page 1

The Value Evolution Graph and its Applications to Automatic Parallelization

Silvius Rus, Dongmin Zhang, and Lawrence Rauchwerger

Page 2

Motivating Example: Parallelization

Sample code:

    q = 0
    DO i = 1, 100
      q = q+1
      B(q) = 1
    ENDDO

Automatically parallelized:

    !$OMP PARALLEL DO
    DO i = 1, 100
      B(i) = 1
    ENDDO
    q = 100

Classic solution:

1. Induction variable substitution: q → f(i) = i

2. Dependence test: the loop is independent if the following system has no solution:

    1 ≤ i1 ≤ 100
    1 ≤ i2 ≤ 100
    i1 ≠ i2
    f(i1) = f(i2)
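The dependence test above asks whether two distinct iterations can reach the same element of B. The following is a minimal brute-force sketch in Python; the helper name `has_cross_iteration_conflict` is hypothetical, and exhaustive enumeration stands in for the symbolic test a compiler would actually use.

```python
from itertools import combinations

def has_cross_iteration_conflict(f, lo, hi):
    """Return True if f(i1) == f(i2) for some i1 != i2 in [lo, hi]."""
    return any(f(i1) == f(i2) for i1, i2 in combinations(range(lo, hi + 1), 2))

# After induction variable substitution, the subscript of B is f(i) = i.
conflict = has_cross_iteration_conflict(lambda i: i, 1, 100)
print(conflict)  # False: no two iterations write the same B element
```

A subscript such as `i // 2` would make the same check return True, since neighbouring iterations would then share an element.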

Page 3

Motivating Example: Parallelization

Sample code:

     1  old = p
     2  q = 0
     3  DO i = 1, old
     4    q = q+1
     5    B(q) = 1
     6    IF (A(q).GT.0)
     7      p = p+1
     8      A(p) = 0
     9    ENDIF
    10  ENDDO

• q is substituted with the closed form q = i
• p cannot be substituted with a closed form

After induction variable recognition/substitution:

     1  old = p
     3  DO i = 1, old
     5    B(i) = 1
     6    IF (A(i).GT.0)
     7      p = p+1
     8      A(p) = 0
     9    ENDIF
    10  ENDDO

• Array B is independent
• Array A is dependent (anti, flow, and output dependences)

Page 4

Motivating Example: Parallelization

After induction variable recognition/substitution:

     1  old = p
     3  DO i = 1, old
     6    IF (A(i).GT.0)
     7      p = p+1
     8      A(p) = 0
     9    ENDIF
    10  ENDDO

Dependences on A: anti, flow, output

Output dependence: p(8) for i ∈ [1:old] against p(8) for i ∈ [1:old], where p(8) is the value of p at line 8

There is no output dependence if p(8) is non-repeating

Page 5

Recurrence Properties

    old = p
    DO i = 1, old
      IF (A(i).GT.0)
        p = p+1
        A(p) = 0
      ENDIF
    ENDDO

Cross-iteration mutually independent if p is strictly increasing, or

    step(p|i=k, p|i=k+1) > 0, ∀ k ∈ [1:old]

Property used: STEP

Page 6

Recurrence Properties

    old = p
    DO i = 1, old
      IF (A(i).GT.0)
        p = p+1
        A(p) = 0
      ENDIF
    ENDDO

Independent if p and i belong to disjoint sets, or

    image(p | i ∈ [1:old]) ∩ image(i | i ∈ [1:old]) = ∅

Properties used: STEP, IMAGE
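The two properties can be observed on a concrete run of the loop. The sketch below simulates the loop in Python and checks both conditions for one input; the function name `run_loop` and the input values are made up for illustration, and checking a single execution is of course not the static proof that the analysis provides.

```python
def run_loop(A, p):
    """Simulate the sample loop; A is a dict used as a 1-based array."""
    old = p
    writes = []                      # subscripts used by the statement A(p) = 0
    for i in range(1, old + 1):
        if A.get(i, 0) > 0:
            p += 1
            A[p] = 0
            writes.append(p)
    return old, writes

A = {i: (1 if i % 2 else -1) for i in range(1, 11)}   # arbitrary input values
old, writes = run_loop(A, p=10)

# STEP: p strictly increases between consecutive writes, so no output dependence.
strictly_increasing = all(a < b for a, b in zip(writes, writes[1:]))
# IMAGE: written subscripts never fall in [1:old], the range read through i.
disjoint = set(writes).isdisjoint(range(1, old + 1))
```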

Page 7

A Simple Value Evolution Graph

Sample code:

    p = 0
    IF (cond)
      p = p+5
    ELSE
      p = p+7
    ENDIF
    IF (p>0)
      …
    ENDIF

Static Single Assignment form:

    p1 = 0
    IF (cond)
      p2 = p1+5
    ELSE
      p3 = p1+7
    ENDIF
    p4 = γ(p2, p3, cond)
    IF (p4>0)
      …
    ENDIF

Value Evolution Graph (p1 = 0):

    p1 → p2 : 5
    p1 → p3 : 7
    p2 → p4 : 0
    p3 → p4 : 0

Path through p2: p4 = p1 + 5 + 0, so p4 > p1 and p4 > 0

Path through p3: p4 = p1 + 7 + 0, so p4 > p1 and p4 > 0
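The path-by-path reasoning on this slide can be mimicked by enumerating paths in a small weighted graph. This is a minimal sketch only: the dictionary layout and the helper `path_sums` are assumptions made for the example, not the representation used by the authors.

```python
# Edges of the VEG for p1..p4: (source, target) -> additive weight.
edges = {("p1", "p2"): 5, ("p1", "p3"): 7, ("p2", "p4"): 0, ("p3", "p4"): 0}

def path_sums(src, dst):
    """Sum of edge weights along every path from src to dst (the graph is acyclic)."""
    if src == dst:
        return [0]
    sums = []
    for (u, v), w in edges.items():
        if u == src:
            sums.extend(w + s for s in path_sums(v, dst))
    return sums

p1 = 0
values = [p1 + s for s in path_sums("p1", "p4")]
p4_range = (min(values), max(values))
print(p4_range)  # (5, 7), so p4 > 0 on every path
```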

Page 8

The Value Evolution Graph

    old = p0
    DO i = 1, old
      p1 = μ(p0, p3)
      B(i) = 1
      IF (A(i).GT.0)
        p2 = p1+1
        A(p2) = 0
      ENDIF
      p3 = γ(p1, p2, A(i).GT.0)
    ENDDO
    p4 = η(p0, p1)

Page 9

Our Solution: The Value Evolution Graph

VEG:
• acyclic graph, GSA names as nodes
• one for each loop body/subprogram

    old = p0
    DO i = 1, old
      p1 = μ(p0, p3)
      B(i) = 1
      IF (A(i).GT.0)
        p2 = p1+1
        A(p2) = 0
      ENDIF
      p3 = γ(p1, p2, A(i).GT.0)
    ENDDO
    p4 = η(p0, p1)

VEG for the loop body:

    p1 → p2 : 1
    p2 → p3 : 0
    p1 → p3 : 0

Page 10

Our Solution: The Value Evolution Graph

VEG for the outer context: nodes old, p0, p1, p4; edge weights 0, 0, 0, and [0:old]

Page 11

Our Solution: The Value Evolution Graph

VEG:
• acyclic graph, GSA names as nodes
• one for each loop body/subprogram
• hierarchical relations among VEGs

[Figure: VEG for the loop body and VEG for the outer context, shown together]

Page 12

VEG Nodes

    p0 = 0
    DO i = 1, N
      p1 = μ(p0, p4)
      IF (A(i).GT.0)
        p2 = p1+1
      ELSE
        p3 = 0
      ENDIF
      p4 = γ(p2, p3, A(i).GT.0)
    ENDDO

VEG for the loop body:

    p1 → p2 : 1
    p2 → p4 : 0
    p3 → p4 : 0

Node types:
• Input (p3:0): result of assignment of loop invariant
• Back (p4): last value in one iteration
• μ (p1): merges the value from outside (p0) with the loop-back value
• Regular (p2): all others

Page 13

VEG Edges

    p1 = …
    IF (A(i).GT.0)
      p2 = p1+1
    ENDIF
    p3 = γ(p1, p2, A(i).GT.0)

Edges, labeled (weight, predicate):

    p1 → p2 : (+1, .TRUE.)
    p2 → p3 : (+0, A(i).GT.0)
    p1 → p3 : (+0, A(i).LE.0)

Page 14

VEG Distance

    p1 = …
    IF (A(i).GT.0)
      p2 = p1+1
    ENDIF
    p3 = γ(p1, p2, A(i).GT.0)

VEG:

    p1 → p2 : 1
    p2 → p3 : 0
    p1 → p3 : 0

distance(p1,p3) = [ ShortestPath(p1,p3) : LongestPath(p1,p3) ]

distance(p1,p3) = [0:1]
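Because the VEG is acyclic, the shortest and longest path weights can be computed with one memoized traversal. The sketch below is an illustration in Python; the adjacency-list layout and the function name `distance` mirror the slide's notation but are otherwise assumptions.

```python
from functools import lru_cache

# VEG for the loop body: node -> list of (successor, weight).
veg = {"p1": [("p2", 1), ("p3", 0)], "p2": [("p3", 0)], "p3": []}

def distance(src, dst):
    """Return (shortest, longest) path weight from src to dst, or None if unreachable."""
    @lru_cache(maxsize=None)
    def walk(node):
        if node == dst:
            return (0, 0)
        best = None
        for succ, w in veg[node]:
            sub = walk(succ)
            if sub is None:
                continue                      # dst is not reachable through succ
            lo, hi = sub[0] + w, sub[1] + w
            best = (lo, hi) if best is None else (min(best[0], lo), max(best[1], hi))
        return best
    return walk(src)

print(distance("p1", "p3"))  # (0, 1)
```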

Page 15

Recurrence Properties

    p0 = 0
    DO i = 1, N
      p1 = μ(p0, p3)
      IF (A(i).GT.0)
        p2 = p1+1
      ENDIF
      p3 = γ(p1, p2)
    ENDDO

VEG for the loop body (p1 is the μ-node, p3 is the back node):

    p1 → p2 : 1
    p2 → p3 : 0
    p1 → p3 : 0

    step(p2|i=k, p2|i=k+1) = distance(p2, p3) + distance(p1, p2)
                           = 0 + 1
                           = 1

Page 16

Recurrence Properties

    image(p2) for i ∈ [1:N] = initial value(p1) + step(p1|i=k, p1|i=k+1) * [0:N-1] + distance(p1, p2)
                            = 0 + [0:1]*[0:N-1] + 1
                            = [1:N]

Page 17

Recurrence Properties

    last value(p1) at i=N = initial value(p1) + step(p1|i=k, p1|i=k+1) * N
                          = 0 + [0:1]*N
                          = [0:N]
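The image and last-value formulas above are interval computations. A small Python sketch of that arithmetic follows; the helpers `add` and `mul` and the concrete trip count `N = 10` are assumptions chosen for illustration.

```python
def add(a, b):
    """Interval sum [a0+b0 : a1+b1]."""
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    """Interval product; takes the extremes over all endpoint products."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

N = 10                       # assumed trip count, for illustration only
initial_p1 = (0, 0)          # p0 = 0
step_p1 = (0, 1)             # p1 grows by 0 or 1 per iteration
dist_p1_p2 = (1, 1)          # distance(p1, p2)

image_p2 = add(add(initial_p1, mul(step_p1, (0, N - 1))), dist_p1_p2)
last_p1 = add(initial_p1, mul(step_p1, (N, N)))
print(image_p2, last_p1)     # (1, 10) (0, 10), i.e. [1:N] and [0:N]
```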

Page 18

Recurrence Properties

    old = p0
    q0 = 0
    DO i = 1, old
      q1 = μ(q0, q2)
      p1 = μ(p0, p3)
      q2 = q1+1
      B(q2) = 1
      IF (A(i).GT.0)
        p2 = p1+1
        A(p2) = 0
      ENDIF
      p3 = γ(p1, p2)
    ENDDO

q has a closed form; its VEG:

    q1 → q2 : 1

p has no closed form; its VEG:

    p1 → p2 : 1
    p2 → p3 : 0
    p1 → p3 : 0

step(q2, q2) = 1, so B(q2) is independent

step(p2, p2) = 1, so A(p2) is independent

Page 19

Logic Inference on the VEG

     1  f1 = 0
     2  IF (c1)
     3    f2 = 1
     4  ENDIF
     5  f3 = γ(f1,f2,c1)
     6  IF (c2)
     7    value = …
     8  ELSE
     9    f4 = f3+2
    10  ENDIF
    11  f5 = γ(f3,f4,c2)
    12  IF (f5.EQ.1)
    13    PRINT *, value
    14  ENDIF

VEG (f1 = 0, f2 = 1):

    f1 → f3 : 0
    f2 → f3 : 0
    f3 → f4 : 2
    f4 → f5 : 0
    f3 → f5 : (0, c2)

Question: (f5.EQ.1) ⇒ c2 ?

Extract range: f5.EQ.1 ⇒ f5 ∈ [1:1]

Trace range backwards:

    f5 ∈ [1:1]
      through f4: f4+0 ∈ [1:1], so f3+2 ∈ [1:1]
        f1+2 ∈ [1:1]: 2 ∈ [1:1] is false
        f2+2 ∈ [1:1]: 3 ∈ [1:1] is false
      through f3: f3+0 ∈ [1:1]
        f1 ∈ [1:1]: 0 ∈ [1:1] is false
        f2 ∈ [1:1]: 1 ∈ [1:1] is true

Only the path through the edge (0, c2) is feasible, so c2 holds: propagate value from line 7 to line 13
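The backward trace can be written as a short recursive search. The Python sketch below is one possible illustration, not the authors' algorithm: the `incoming` table and the function `feasible_paths` are hypothetical, and the "not c2" label on the f4 → f5 edge is inferred from the ELSE branch of the code rather than read off the slide.

```python
# Incoming VEG edges: node -> list of (predecessor, weight, gating condition or None).
incoming = {
    "f3": [("f1", 0, None), ("f2", 0, None)],
    "f4": [("f3", 2, None)],
    "f5": [("f4", 0, "not c2"), ("f3", 0, "c2")],   # "not c2" is an inferred label
}
constants = {"f1": 0, "f2": 1}

def feasible_paths(node, target, path=()):
    """Yield the gating conditions of each backward path on which node == target can hold."""
    if node in constants:
        if constants[node] == target:
            yield path
        return
    for pred, weight, cond in incoming[node]:
        yield from feasible_paths(pred, target - weight, path + ((cond,) if cond else ()))

paths = list(feasible_paths("f5", 1))
c2_implied = bool(paths) and all("c2" in p for p in paths)
print(paths, c2_implied)  # [('c2',)] True
```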

Page 20

VEG Pruning

    A(p1) = …
    f1 = 0
    IF (cond)
      f2 = 1
      p2 = p1+1
    ENDIF
    p3 = γ(p1, p2, cond)
    f3 = γ(f1, f2, cond)
    IF (f3.GT.0)
      p4 = p3-1
    ENDIF
    p5 = γ(p3, p4, f3.GT.0)
    IF (f3.EQ.1)
      … = A(p5)
    ENDIF

Is … = A(p5) covered by A(p1) = … ?

VEG for f (f1 = 0, f2 = 1):

    f1 → f3 : 0
    f2 → f3 : 0

VEG for p:

    p1 → p2 : 1
    p2 → p3 : 0
    p1 → p3 : 0
    p3 → p4 : -1
    p4 → p5 : 0
    p3 → p5 : 0

Before pruning: p5 ∈ [p1-1 : p1+1]

After GSA-path pruning [Tu, Padua, ICS95], using f3.EQ.1 ⇒ f3.GT.0: p5 ∈ [p1-1 : p1]

After VEG-based GSA-path pruning, using f3.EQ.1 ⇒ cond: p5 = p1
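The three ranges on this slide follow from which γ inputs remain selectable after each pruning step. The Python sketch below enumerates those choices directly; the tables `FIRST` and `SECOND` and the function `p5_range` are illustrative assumptions and do not reproduce the pruning algorithm itself.

```python
from itertools import product

FIRST = {"p1": 0, "p2": 1}     # p3 = γ(p1, p2, cond): contribution to p3 - p1
SECOND = {"p3": 0, "p4": -1}   # p5 = γ(p3, p4, f3.GT.0): contribution to p5 - p3

def p5_range(first_allowed, second_allowed):
    """Range of p5 - p1 over the γ choices that survive pruning."""
    sums = [FIRST[a] + SECOND[b] for a, b in product(first_allowed, second_allowed)]
    return (min(sums), max(sums))

before = p5_range(["p1", "p2"], ["p3", "p4"])     # no pruning
gsa_pruned = p5_range(["p1", "p2"], ["p4"])       # f3.EQ.1 implies f3.GT.0
veg_pruned = p5_range(["p2"], ["p4"])             # f3.EQ.1 also implies cond
print(before, gsa_pruned, veg_pruned)             # (-1, 1) (-1, 0) (0, 0)
```

The last range, (0, 0), corresponds to p5 = p1, which is what makes the read A(p5) covered by the write A(p1).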

Page 21

Automatic Parallelization Framework

[Diagram: boxes Memory Classification Analysis, Privatization Analysis, Dependence Analysis, and Generation of Parallel Code; stage labels DATAFLOW and PARALLELIZATION]

[Rus, Rauchwerger, Hoeflinger 2002]

Page 22

Memory Classification Analysis [Hoeflinger 1998]

• Memory reference set partition
• Provides array dataflow/dependence information
• Relies heavily on closed forms

Example:

    A(3) = A(1) + A(2)
    A(1) = A(3) + A(2)

    ReadOnly (A) = { 2 }
    WriteFirst (A) = { 3 }
    ReadWrite (A) = { 1 }
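The partition in the example can be reproduced from an ordered access trace. The Python sketch below assumes the following reading of the three classes, which matches the example but is not quoted from the cited work: ReadOnly locations are never written, WriteFirst locations are written before any read, and ReadWrite locations are read first and written later. The function name `classify` is hypothetical.

```python
def classify(accesses):
    """Classify locations from an ordered list of ('r' | 'w', index) accesses."""
    first, written = {}, set()
    for kind, idx in accesses:
        first.setdefault(idx, kind)           # remember the first access kind per location
        if kind == "w":
            written.add(idx)
    read_only = {i for i, k in first.items() if k == "r" and i not in written}
    write_first = {i for i, k in first.items() if k == "w"}
    read_write = {i for i, k in first.items() if k == "r" and i in written}
    return read_only, write_first, read_write

# A(3) = A(1) + A(2); A(1) = A(3) + A(2): reads precede the write in each statement.
trace = [("r", 1), ("r", 2), ("w", 3), ("r", 3), ("r", 2), ("w", 1)]
ro, wf, rw = classify(trace)
print(ro, wf, rw)  # {2} {3} {1}
```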

Page 23

Memory Reference Sequences

Stack push:

    DO i = 1, N
      p = 0
      DO j = 1, M
        IF (…)
          p = p+1
          A(p) = …
        ENDIF
      ENDDO
      DO j = 1, p
        … = A(j)
      ENDDO
    ENDDO

Is the inner loop independent? Yes, increasing in inner loop

Is A privatizable in the outer loop? Yes, contiguous write in inner loop

P3M / PP_do100

WF: pred_write [p : p + length_write]

Recurrence: pred_step { p = p + length_step }

Contiguous: pred_step ⇒ pred_write, length_step ≤ length_write

Increasing: pred_write ⇒ pred_step, length_step ≥ length_write

Consecutive: pred_step ⇔ pred_write, length_step = length_write

Page 24

Pushback Sequences

Conditional pushback (HYDRO2D WNFLE_do10):

    DO i = 1, N
      IF (C(i).EQ.1)
        A(p) = …
        p = p+1
      ENDIF
    ENDDO

Page 25

Pushback Sequences

Conditional pushback, stack lookup (TRACK FPTRAK_do300):

    old = p
    DO i = 1, N
      next = p+1
      same = 0
      A(next) = …
      DO j = 1, old
        IF (A(j).EQ.A(next))
          same = 1
        ENDIF
      ENDDO
      IF (same.EQ.0)
        p = next
      ENDIF
    ENDDO

Page 26

Pushback Sequences

Conditional pushback, stack lookup & update (TRACK / EXTEND_do400):

    old = p
    DO i = 1, N
      ifdata = p+1
      DO k = 1, M
        A(p+1) = …
        DO j = ifdata, p
          IF (A(1,j).EQ.A(1,p+1))
            A(2,j) = A(2,j)+A(2,p+1)
            same = 1
          ENDIF
        ENDDO
        IF (same.EQ.0)
          p = p+1
        ENDIF
      ENDDO
    ENDDO

Page 27

Pushback Sequences

Detection
– Consecutive WF

Parallelization
– Accumulation to private storage
– Simple copy-out to shared storage
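The parallelization scheme for the simplest case, the conditional pushback, can be sketched as follows. This Python example runs the chunks one after another to stay self-contained; the function names, the chunking policy, and the use of Python lists for the private and shared storage are all assumptions made for illustration.

```python
def sequential_pushback(C, values, A):
    """Reference loop: IF (C(i).EQ.1) push values(i) onto A."""
    out = list(A)
    for c, v in zip(C, values):
        if c == 1:
            out.append(v)
    return out

def chunked_pushback(C, values, A, workers=4):
    """Each worker accumulates into private storage; results are copied out in order."""
    n = len(C)
    bounds = [n * w // workers for w in range(workers + 1)]
    private = []
    for w in range(workers):                      # each chunk could run on its own thread
        lo, hi = bounds[w], bounds[w + 1]
        private.append([values[i] for i in range(lo, hi) if C[i] == 1])
    out = list(A)
    for chunk in private:                         # copy-out preserves iteration order
        out.extend(chunk)
    return out

C = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
values = list(range(10))
same_result = chunked_pushback(C, values, [99]) == sequential_pushback(C, values, [99])
```

Copying the private chunks out in chunk order is what keeps the final contents identical to the sequential run, since the pushed elements end up consecutive in the shared array.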

Page 28

Implementation in Polaris

[Diagram: the automatic parallelization framework (Memory Classification Analysis, Privatization Analysis, Dependence Analysis, Generation of Parallel Code; stage labels DATAFLOW and PARALLELIZATION) extended with VEG-based Analysis]

Page 29

Implementation in Polaris

Partially aggregated descriptors are fed to VEG-based analysis

Page 30

Implementation in Polaris

Contiguous sequences lead to more accurate dataflow information

Page 31

Implementation in Polaris

More storage dependences eliminated by privatization

Page 32

Implementation in Polaris

Closer value ranges and increasing sequences lead to fewer false dependences

Page 33

Implementation in Polaris

Efficient pushback sequence parallelization

Page 34

Experimental Results

Program | Loop | Seq% | Description
TRACK | EXTEND_do400 | 15-65 | Pushback & stack update, VEG pruning
TRACK | FPTRAK_do300 | 4-50 | Pushback & stack lookup, VEG inference
TRACK | GETDAT_do300 | 1-5 | Pushback & stack update
MDG | INTERF_do1000 | 92 | Disambiguated control using VEG inference
P3M | PP_do100 | 52 | Contiguous write covers subsequent read
P3M | SUBPP_do100 | 9 | Contiguous write covers subsequent read
BDNA | ACTFOR_do240 | 29 | Read covered by write using VEG ranges
MDLJDP2 | JLOOPB_do20 | 12 | Pushback
ADM | DKZMH_do60 | 6 | Contiguous write covers subsequent read
QCD | QQQLPS_do21 | <1 | Pushback
DYFESM | SETCOL_do1 | <1 | Pushback
HYDRO2D | WNFLE_do10 | <1 | Pushback

Seq% = Sequential Time (loop) / Sequential Time (whole application)

Page 35

Pushbacks in PERFECT

Benchmark Code | Pushback Loops | Pushback Arrays
BDNA | 2 | 3
DYFESM | 7 | 9
MDG | 1 | 1
QCD | 5 | 7
SPEC77 | 3 | 3
TRACK | 7 | 40

More in C and C++ codes!

Page 36

Related Work

Work | Gupta, Mukhopadhyay, Sinha, PACT'99 | Lin & Padua, CC'00 | Wu, Cohen, Hoeflinger, Padua, ICS'01-LCPC'01 | Our Framework
Problems Solved | Privatization, Data Dependence | Privatization, some Data Dependence | Data Dependence | Privatization, Data Dependence, Dataflow
Method | Memory Reference Analysis | Algorithm Recognition | Monotonic Evolution | Memory Reference Classification
Recurrence Model | Implicit | Implicit: DDG | Explicit: Evolution | Explicit: VEG
Multi-variable | Not Specified | No | No | Yes
Distance Ranges | Yes | No | Yes | Yes
Conditional Ranges | Range Extraction | No | No | Yes
Mem. Ref. Type | Generic | Single-indexed | Not Defined | Generic
Interprocedural | Yes | No | No | Yes
Pushback Parallelization | No | Yes (restrictive) | No | Yes (more general)

Page 37

Conclusions

[Diagram: Value Evolution Graph (range comparison, recurrences, logic inferences) feeding Memory Reference Analysis (array dataflow, privatization, dependence analysis, pushback parallelization)]

Page 38

Sample VEGs

[Figures: VEGs for EXTEND_do300 and EXTEND_do400]