The Value Evolution Graph and its Applications to Automatic Parallelization
Silvius Rus, Dongmin Zhang, and Lawrence Rauchwerger
Dec 19, 2015
Motivating Example: Parallelization
q = 0
DO i = 1, 100
  q = q+1
  B(q) = 1
ENDDO
Sample Code
!$OMP PARALLEL DO
DO i = 1, 100
  B(i) = 1
ENDDO
q = 100
Automatically Parallelized
Classic Solution
1. Induction Variable Substitution: q → f(i) = i
2. Dependence Test:
1 ≤ i1 ≤ 100
1 ≤ i2 ≤ 100
i1 ≠ i2
f(i1) = f(i2)
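The two classic steps can be sketched in Python. This is only an illustration under stated assumptions: the dependence test is done by brute-force enumeration rather than by a symbolic solver, and the helper names (`f`, `has_cross_iteration_dependence`, `run_sequential`, `run_substituted`) are hypothetical.

```python
def f(i):
    # Closed form taken from the slide: after substitution, q = f(i) = i.
    return i


def has_cross_iteration_dependence(index_fn, lo, hi):
    """Return True if two distinct iterations in [lo:hi] touch the same element."""
    seen = set()
    for i in range(lo, hi + 1):
        idx = index_fn(i)
        if idx in seen:
            return True
        seen.add(idx)
    return False


def run_sequential(n):
    # Original loop: q is carried from one iteration to the next.
    B = [0] * (n + 1)  # index 0 unused, to mimic 1-based arrays
    q = 0
    for i in range(1, n + 1):
        q = q + 1
        B[q] = 1
    return B, q


def run_substituted(n):
    # After substitution each iteration depends only on i, so order is irrelevant.
    B = [0] * (n + 1)
    for i in range(1, n + 1):
        B[f(i)] = 1
    return B, n  # last value of q is n
```

Since no pair i1 ≠ i2 satisfies f(i1) = f(i2), the enumeration reports no dependence and both loop versions produce the same B and q.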
Motivating Example: Parallelization
 1 old = p
 2 q = 0
 3 DO i = 1, old
 4   q = q+1
 5   B(q) = 1
 6   IF (A(q).GT.0)
 7     p = p+1
 8     A(p) = 0
 9   ENDIF
10 ENDDO
Sample Code
q is substituted with closed form q = i
p cannot be substituted with a closed form
 1 old = p
 2
 3 DO i = 1, old
 4
 5   B(i) = 1
 6   IF (A(i).GT.0)
 7     p = p+1
 8     A(p) = 0
 9   ENDIF
10 ENDDO
After Induction Variable Recognition/Substitution
Array B is independent
Array A is dependent: anti, flow, and output dependences
Motivating Example: Parallelization
 1 old = p
 2
 3 DO i = 1, old
 4
 5
 6   IF (A(i).GT.0)
 7     p = p+1
 8     A(p) = 0
 9   ENDIF
10 ENDDO
After Induction Variable Recognition/Substitution
Anti: p(8) vs. [1:old]
Flow: p(8) vs. [1:old]
Output: p(8) non-repeating
Recurrence Properties
 1 old = p
 3 DO i = 1, old
 6   IF (A(i).GT.0)
 7     p = p+1
 8     A(p) = 0
 9   ENDIF
10 ENDDO
Cross-iteration mutually independent if p strictly increasing, or
step(p|i=k, p|i=k+1) > 0, k ∈ [1:old]
STEP
Recurrence Properties
 1 old = p
 3 DO i = 1, old
 6   IF (A(i).GT.0)
 7     p = p+1
 8     A(p) = 0
 9   ENDIF
10 ENDDO
Independent if p and i belong to disjoint sets, or
image(p | i ∈ [1:old]) ∩ image(i | i ∈ [1:old]) = ∅
STEP
IMAGE
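The STEP and IMAGE conditions can be checked on a concrete run of the loop. The sketch below is a runtime illustration, not the static analysis from the slides. It assumes a sparse 1-based array stored in a dict, and it derives the image of p by hand: p starts at old and grows by at most 1 per iteration, so every written index lies in [old+1 : 2*old]. All function names are hypothetical.

```python
def simulate_writes(a_init, old):
    """Run the loop from the slide and return the indices written through A(p)."""
    A = dict(a_init)          # sparse 1-based array; missing entries read as 0
    p = old
    writes = []
    for i in range(1, old + 1):
        if A.get(i, 0) > 0:
            p = p + 1
            A[p] = 0
            writes.append(p)
    return writes


def image_of_p(old):
    # p is incremented before each write and at most once per iteration.
    return (old + 1, 2 * old)


def image_of_i(old):
    return (1, old)


def disjoint(r1, r2):
    """True if the closed ranges r1 and r2 do not intersect."""
    return r1[1] < r2[0] or r2[1] < r1[0]


def strictly_increasing(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))
```

For example, with old = 3 and A(1), A(3) positive, the writes go to A(4) and A(5): strictly increasing (no output dependence) and outside [1:3] (no flow or anti dependence with the reads of A(i)).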
A Simple Value Evolution Graph
Sample Code
p = 0
IF (cond)
  p = p+5
ELSE
  p = p+7
ENDIF
IF (p>0)
  …
ENDIF
Static Single Assignment Form
p1 = 0
IF (cond)
  p2 = p1+5
ELSE
  p3 = p1+7
ENDIF
p4 = γ(p2, p3, cond)
IF (p4>0)
  …
ENDIF
[Figure: VEG with nodes p1 (value 0), p2, p3, p4; edge p1→p2 with weight 5, edge p1→p3 with weight 7, edges p2→p4 and p3→p4 with weight 0]
Path p1 → p2 → p4: p4 = p1 + 5 + 0, so p4 > p1 and p4 > 0
Path p1 → p3 → p4: p4 = p1 + 7 + 0, so p4 > p1 and p4 > 0
The Value Evolution Graph
old = p0
DO i = 1, old
  p1 = μ(p0, p3)
  B(i) = 1
  IF (A(i).GT.0)
    p2 = p1+1
    A(p2) = 0
  ENDIF
  p3 = γ(p1, p2, A(i).GT.0)
ENDDO
p4 = η(p0, p1)
Our Solution: The Value Evolution Graph
VEG:
• acyclic graph, GSA names as nodes
• one for each loop body/subprogram
• hierarchical relations among VEGs
old = p0
DO i = 1, old
  p1 = μ(p0, p3)
  B(i) = 1
  IF (A(i).GT.0)
    p2 = p1+1
    A(p2) = 0
  ENDIF
  p3 = γ(p1, p2, A(i).GT.0)
ENDDO
p4 = η(p0, p1)
[Figure: VEG for the loop body, with nodes p1, p2, p3; edge p1→p2 with weight 1, edges p1→p3 and p2→p3 with weight 0]
[Figure: VEG for the outer context, with nodes old, p0, p1, p4; edge weights 0, 0, 0 and [0:old]]
VEG Nodes
p0 = 0
DO i = 1, N
  p1 = μ(p0, p4)
  IF (A(i).GT.0)
    p2 = p1+1
  ELSE
    p3 = 0
  ENDIF
  p4 = γ(p2, p3, A(i).GT.0)
ENDDO
[Figure: VEG of the loop with nodes p0, p1, p2, p3 (value 0) and p4, edge weights 0, 0 and 1, and each node labeled with its type]
Node types:
Input: result of assignment of loop invariant
Back: last value in one iteration
μ: merges value from outside with loop-back
Regular: all others
VEG Edges
p1 = …
IF (A(i).GT.0)
  p2 = p1+1
ENDIF
p3 = γ(p1, p2, A(i).GT.0)
[Figure: VEG with nodes p1, p2, p3; each edge is labeled (weight, condition)]
p1 → p2: (+1, .TRUE.)
p2 → p3: (+0, A(i).GT.0)
p1 → p3: (+0, A(i).LE.0)
VEG Distance
p1 = …
IF (A(i).GT.0)
  p2 = p1+1
ENDIF
p3 = γ(p1, p2, A(i).GT.0)
[Figure: VEG with nodes p1, p2, p3; edge p1→p2 with weight 1, edges p2→p3 and p1→p3 with weight 0]
distance(p1,p3) = [ ShortestPath(p1,p3) : LongestPath(p1,p3) ]
distance(p1,p3) = [0:1]
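A minimal sketch of the distance computation on an acyclic VEG follows. It assumes the graph is given as a dict from (source, target) pairs to integer weights, which rules out parallel edges, and it ignores edge conditions. The function name `distance` mirrors the slide, but the implementation is an illustration rather than the authors' algorithm.

```python
from functools import lru_cache


def distance(edges, src, dst):
    """Return (shortest, longest) path weight from src to dst in a DAG.

    edges maps (u, v) to an integer weight. Returns None if dst is
    unreachable from src.
    """
    succ = {}
    for (u, v), w in edges.items():
        succ.setdefault(u, []).append((v, w))

    @lru_cache(maxsize=None)
    def walk(node):
        if node == dst:
            return (0, 0)
        best = None
        for nxt, w in succ.get(node, []):
            sub = walk(nxt)
            if sub is None:
                continue
            cand = (w + sub[0], w + sub[1])
            if best is None:
                best = cand
            else:
                best = (min(best[0], cand[0]), max(best[1], cand[1]))
        return best

    return walk(src)


# Loop-body VEG from the slide: p2 = p1+1, p3 = γ(p1, p2, ...).
loop_body = {("p1", "p2"): 1, ("p2", "p3"): 0, ("p1", "p3"): 0}
# Simple VEG from the earlier slide: p2 = p1+5, p3 = p1+7, p4 = γ(p2, p3, cond).
simple = {("p1", "p2"): 5, ("p1", "p3"): 7, ("p2", "p4"): 0, ("p3", "p4"): 0}
```

On the loop-body graph, `distance(loop_body, "p1", "p3")` gives the range [0:1] stated above. On the simple graph, the range from p1 to p4 is [5:7], which is why p4 > p1 on every path.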
Recurrence Properties
p0 = 0
DO i = 1, N
  p1 = μ(p0, p3)
  IF (A(i).GT.0)
    p2 = p1+1
  ENDIF
  p3 = γ(p1, p2)
ENDDO
[Figure: VEG for the loop body, with μ-node p1, node p2 and back node p3; edge p1→p2 with weight 1, edges p1→p3 and p2→p3 with weight 0]
step(p2|i=k, p2|i=k+1) = distance(p2, p3) + distance(p1, p2) = 0 + 1 = 1
image(p2 | i ∈ [1:N]) = initial value(p1) + step(p1|i=k, p1|i=k+1) * [0:N-1] + distance(p1, p2) = 0 + [0:1]*[0:N-1] + 1 = [1:N]
last value(p1 | i = N) = initial value(p1) + step(p1|i=k, p1|i=k+1) * N = 0 + [0:1]*N = [0:N]
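The image and last-value formulas reduce to interval arithmetic. The sketch below assumes closed integer ranges represented as (low, high) tuples; the helper names are hypothetical, and the step and distance ranges are passed in as inputs instead of being derived from a graph.

```python
def add(a, b):
    """Sum of two closed ranges."""
    return (a[0] + b[0], a[1] + b[1])


def mul(a, b):
    """Product of two closed ranges (all endpoint combinations)."""
    prods = [x * y for x in a for y in b]
    return (min(prods), max(prods))


def recurrence_ranges(initial, step, dist_mu_to_use, n):
    """Return (image, last_value) for a recurrence over n iterations.

    initial         value of the mu-node on loop entry
    step            range of the per-iteration change of the mu-node
    dist_mu_to_use  range of distance(mu-node, node used as index)
    """
    start = (initial, initial)
    image = add(add(start, mul(step, (0, n - 1))), dist_mu_to_use)
    last_value = add(start, mul(step, (n, n)))
    return image, last_value
```

With the values from the slide (initial value 0, step [0:1], distance(p1, p2) = 1) and N = 10, this yields image [1:10] and last value [0:10], matching [1:N] and [0:N].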
Recurrence Properties
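old = p0
q0 = 0
DO i = 1, old
  q1 = μ(q0, q2)
  p1 = μ(p0, p3)
  q2 = q1+1
  B(q2) = 1
  IF (A(i).GT.0)
    p2 = p1+1
    A(p2) = 0
  ENDIF
  p3 = γ(p1, p2)
ENDDO
q: Closed Form
p: No Closed Form
[Figure: VEG for q with nodes q1, q2 and edge q1→q2 with weight 1; VEG for p with nodes p1, p2, p3, edge p1→p2 with weight 1, edges p1→p3 and p2→p3 with weight 0]
step(q2, q2) = 1 ⇒ B(q2) independent
step(p2, p2) = 1 ⇒ A(p2) independent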
Logic Inference on the VEG
 1 f1 = 0
 2 IF (c1)
 3   f2 = 1
 4 ENDIF
 5 f3 = γ(f1,f2,c1)
 6 IF (c2)
 7   value = …
 8 ELSE
 9   f4 = f3+2
10 ENDIF
11 f5 = γ(f3,f4,c2)
12 IF (f5.EQ.1)
13   PRINT *, value
14 ENDIF
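[Figure: VEG with nodes f1 (value 0), f2 (value 1), f3, f4, f5; edges f1→f3 and f2→f3 with weight 0, edge f3→f4 with weight 2, edge f4→f5 with weight 0, edge f3→f5 labeled (0, c2)]
Question: (f5.EQ.1) ⇒ c2?
Extract range: f5.EQ.1 ⇒ f5 ∈ [1:1]
Trace range backwards from f5 ∈ [1:1]:
Through f4: f4+0 ∈ [1:1], so f3+2 ∈ [1:1], so f1+2 ∈ [1:1] or f2+2 ∈ [1:1], that is 2 ∈ [1:1] or 3 ∈ [1:1]: both false
Through f3: f3+0 ∈ [1:1], so f1 ∈ [1:1] or f2 ∈ [1:1], that is 0 ∈ [1:1] (false) or 1 ∈ [1:1] (true)
Only the path through edge (0, c2) is feasible, so f5.EQ.1 implies c2.
Propagate value from 7 to 13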
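The backward range trace can be sketched as a recursive search over incoming VEG edges. This is an illustration under assumptions: the edge conditions other than (0, c2) are inferred from the code on the slide, not read off the figure, conditions are kept as plain strings, and `feasible_paths` is a hypothetical helper name.

```python
# incoming[node] lists (predecessor, weight, condition):
# node = predecessor + weight whenever the condition holds (None = always).
incoming = {
    "f3": [("f1", 0, "not c1"), ("f2", 0, "c1")],
    "f4": [("f3", 2, None)],
    "f5": [("f3", 0, "c2"), ("f4", 0, "not c2")],
}
constants = {"f1": 0, "f2": 1}


def feasible_paths(node, lo, hi, conds=()):
    """Yield the conditions along every path on which node can lie in [lo:hi]."""
    if node in constants:
        if lo <= constants[node] <= hi:
            yield conds
        return
    for pred, weight, cond in incoming[node]:
        extra = conds + ((cond,) if cond else ())
        # node = pred + weight, so pred must lie in [lo - weight : hi - weight]
        yield from feasible_paths(pred, lo - weight, hi - weight, extra)


paths = list(feasible_paths("f5", 1, 1))
```

Here `paths` contains a single entry, and it includes c2, so f5.EQ.1 can only hold when c2 held at line 6. That is what allows the definition of value at line 7 to be propagated to its use at line 13.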
VEG Pruning
A(p1) = …
f1 = 0
IF (cond)
  f2 = 1
  p2 = p1+1
ENDIF
p3 = γ(p1, p2, cond)
f3 = γ(f1, f2, cond)
IF (f3.GT.0)
  p4 = p3-1
ENDIF
p5 = γ(p3, p4, f3.GT.0)
IF (f3.EQ.1)
  … = A(p5)
ENDIF
Is … = A(p5) covered by A(p1) = … ?
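[Figure: VEG for f, with nodes f1 (value 0), f2 (value 1), f3; edges f1→f3 and f2→f3 with weight 0]
VEG before Pruning
[Figure: nodes p1, p2, p3, p4, p5; edge p1→p2 with weight 1, edges p1→p3 and p2→p3 with weight 0, edge p3→p4 with weight -1, edges p3→p5 and p4→p5 with weight 0]
p5 ∈ [p1-1 : p1+1]
After GSA-Path Pruning [Tu, Padua, ICS95]
f3.EQ.1 ⇒ f3.GT.0
[Figure: same graph without edge p3→p5]
p5 ∈ [p1-1 : p1]
After VEG-based GSA-Path Pruning
f3.EQ.1 ⇒ cond
[Figure: same graph without edges p3→p5 and p1→p3]
p5 = p1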
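The effect of pruning on the range of p5 can be reproduced by enumeration. The sketch below is a brute-force check, not the VEG algorithm itself: it measures p5 as an offset from p1, treats each γ as a free choice for the unpruned case, and evaluates the actual control flow for the pruned case. Function names are hypothetical.

```python
from itertools import product


def p5_offsets_unpruned():
    """Offsets p5 - p1 when every γ may pick either argument."""
    offsets = set()
    for take_p2, take_p4 in product([False, True], repeat=2):
        p3 = 1 if take_p2 else 0          # p3 = γ(p1, p2): p1 or p1+1
        p5 = p3 - 1 if take_p4 else p3    # p5 = γ(p3, p4): p3 or p3-1
        offsets.add(p5)
    return offsets


def p5_offsets_given_f3_eq_1():
    """Offsets p5 - p1 on executions where the guard f3.EQ.1 holds."""
    offsets = set()
    for cond in (False, True):
        f3 = 1 if cond else 0
        p3 = 1 if cond else 0
        p5 = p3 - 1 if f3 > 0 else p3
        if f3 == 1:
            offsets.add(p5)
    return offsets
```

The unpruned enumeration gives offsets {-1, 0, 1}, that is p5 ∈ [p1-1 : p1+1], while under the guard f3.EQ.1 only offset 0 remains, so the read A(p5) is covered by the write A(p1).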
Automatic Parallelization Framework
[Figure: framework diagram with boxes Privatization Analysis, Dependence Analysis, Memory Classification Analysis and Generation of Parallel Code, in layers labeled PARALLELIZATION and DATAFLOW]
[Rus, Rauchwerger, Hoeflinger 2002]
Memory Classification Analysis [Hoeflinger 1998]
Memory reference set partition
Provides array dataflow/dependence information
Relies heavily on closed forms
A(3) = A(1) + A(2)
A(1) = A(3) + A(2)
ReadOnly (A) = { 2 }
WriteFirst (A) = { 3 }
ReadWrite (A) = { 1 }
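The partition in this example can be reproduced with a small sketch. It assumes each statement is given as a pair (indices read, index written), with reads taking place before the write, and it handles only scalar indices, not the aggregated descriptors used by the real analysis. The function name `classify` is hypothetical.

```python
def classify(statements):
    """Partition referenced indices into ReadOnly, WriteFirst and ReadWrite.

    statements is a list of (reads, write) pairs in execution order.
    """
    first_access = {}
    written = set()
    for reads, write in statements:
        for r in reads:
            first_access.setdefault(r, "read")
        first_access.setdefault(write, "write")
        written.add(write)
    read_only = {x for x, a in first_access.items() if a == "read" and x not in written}
    write_first = {x for x, a in first_access.items() if a == "write"}
    read_write = {x for x, a in first_access.items() if a == "read" and x in written}
    return read_only, write_first, read_write


# A(3) = A(1) + A(2) followed by A(1) = A(3) + A(2)
example = [((1, 2), 3), ((3, 2), 1)]
```

`classify(example)` returns ({2}, {3}, {1}), the same ReadOnly, WriteFirst and ReadWrite sets listed above.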
Memory Reference Sequences
DO i = 1, N
  p = 0
  DO j = 1, M
    IF (…)
      p = p+1
      A(p) = …
    ENDIF
  ENDDO
  DO j = 1, p
    … = A(j)
  ENDDO
ENDDO
Stack push
Is the inner loop independent?
Yes, increasing in inner loop
Is A privatizable in the outer loop?
Yes, contiguous write in inner loop
P3M / PP_do100
WF: pred_write [p : p+length_write]
Recurrence: pred_step { p = p + length_step }
Contiguous: pred_step ⇒ pred_write, length_step ≤ length_write
Increasing: pred_write ⇒ pred_step, length_step ≥ length_write
Consecutive: pred_step ⇔ pred_write, length_step = length_write
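The two answers for the stack-push loop can be illustrated on a concrete run. The sketch below simulates one execution of the inner loop for a given list of branch outcomes (the elided condition is an input, since the slide does not specify it) and checks the written indices; it is a runtime illustration with hypothetical helper names, not the symbolic test.

```python
def inner_push_loop(branch_outcomes):
    """Mimic 'p = 0; DO j ...; IF (...) p = p+1; A(p) = ...' for one outer iteration.

    Returns the list of indices written through A(p) and the final p.
    """
    p = 0
    written = []
    for taken in branch_outcomes:
        if taken:
            p = p + 1
            written.append(p)
    return written, p


def is_increasing(indices):
    """No index is written twice, so the inner loop carries no output dependence."""
    return all(a < b for a, b in zip(indices, indices[1:]))


def covers_read_range(indices, p):
    """The writes fill [1:p] without gaps, so the reads A(1..p) are all covered."""
    return sorted(set(indices)) == list(range(1, p + 1))
```

For any sequence of branch outcomes the written indices are 1, 2, ..., p: increasing, which makes the inner loop independent, and contiguous from 1, which makes A privatizable in the outer loop because every read of A(j) in the second inner loop sees a value written in the same outer iteration.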
Pushback Sequences
Conditional Pushback (HYDRO2D WNFLE_do10)
DO i = 1, N
  IF (C(i).EQ.1)
    A(p) = …
    p = p+1
  ENDIF
ENDDO
Conditional Pushback, Stack lookup (TRACK FPTRAK_do300)
old = p
DO i = 1, N
  next = p+1
  same = 0
  A(next) = …
  DO j = 1, old
    IF (A(j).EQ.A(next))
      same = 1
    ENDIF
  ENDDO
  IF (same.EQ.0)
    p = next
  ENDIF
ENDDO
Conditional Pushback, Stack lookup & update (TRACK / EXTEND_do400)
old = p
DO i = 1, N
  ifdata = p+1
  DO k = 1, M
    A(p+1) = …
    DO j = ifdata, p
      IF (A(1,j).EQ.A(1,p+1))
        A(2,j) = A(2,j)+A(2,p+1)
        same = 1
      ENDIF
    ENDDO
    IF (same.EQ.0)
      p = p+1
    ENDIF
  ENDDO
ENDDO
Pushback Sequences
Detection
• Consecutive WF
Parallelization
• Accumulation to private storage
• Simple copy-out to shared storage
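The parallelization scheme for the simplest conditional pushback can be sketched as follows. This assumes a block distribution of iterations over a thread pool and a sequential copy-out, with the pushed values known in advance; these are illustrative choices, not details from the slides. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pushback_sequential(C, values):
    """Reference version: push values[i] whenever C[i] == 1."""
    A = []
    for c, v in zip(C, values):
        if c == 1:
            A.append(v)
    return A


def _accumulate_private(chunk):
    # Each worker pushes into its own private buffer.
    return [v for c, v in chunk if c == 1]


def pushback_parallel(C, values, workers=4):
    """Accumulate to private storage per chunk, then copy out in chunk order."""
    items = list(zip(C, values))
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[k:k + size] for k in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        privates = list(pool.map(_accumulate_private, chunks))  # order preserved
    # Copy-out: each private buffer lands at the offset given by the sizes before it.
    A = [None] * sum(len(buf) for buf in privates)
    offset = 0
    for buf in privates:
        A[offset:offset + len(buf)] = buf
        offset += len(buf)
    return A
```

Because the sequence is consecutive (every step of p corresponds to exactly one write), concatenating the private buffers in iteration order reproduces the sequential contents of A.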
Implementation in Polaris
[Figure: the automatic parallelization framework (Memory Classification Analysis, Privatization Analysis, Dependence Analysis, Generation of Parallel Code; layers DATAFLOW and PARALLELIZATION) extended with VEG-based Analysis]
• Partially aggregated descriptors are fed to VEG-based analysis
• Contiguous sequences lead to more accurate dataflow information
• More storage dependences eliminated by privatization
• Closer value ranges and increasing sequences lead to fewer false dependences
• Efficient pushback sequence parallelization
Experimental Results
Program | Loop | Seq% | Description
TRACK | EXTEND_do400 | 15-65 | Pushback & stack update, VEG pruning
TRACK | FPTRAK_do300 | 4-50 | Pushback & stack lookup, VEG inference
TRACK | GETDAT_do300 | 1-5 | Pushback & stack update
MDG | INTERF_do1000 | 92 | Disambiguated control using VEG inference
P3M | PP_do100 | 52 | Contiguous write covers subsequent read
P3M | SUBPP_do100 | 9 | Contiguous write covers subsequent read
BDNA | ACTFOR_do240 | 29 | Read covered by write using VEG ranges
MDLJDP2 | JLOOPB_do20 | 12 | Pushback
ADM | DKZMH_do60 | 6 | Contiguous write covers subsequent read
QCD | QQQLPS_do21 | <1 | Pushback
DYFESM | SETCOL_do1 | <1 | Pushback
HYDRO2D | WNFLE_do10 | <1 | Pushback
Seq% = Sequential Time (loop) / Sequential Time (whole application)
Pushbacks in PERFECT
Benchmark Code | Pushback Loops | Pushback Arrays
BDNA | 2 | 3
DYFESM | 7 | 9
MDG | 1 | 1
QCD | 5 | 7
SPEC77 | 3 | 3
TRACK | 7 | 40
More in C and C++ codes!
Related Work
Approach | Gupta, Mukhopadhyay, Sinha, PACT'99 | Lin & Padua, CC'00 | Wu, Cohen, Hoeflinger, Padua, ICS'01-LCPC'01 | Our Framework
Problems Solved | Privatization, Data Dependence | Privatization, some Data Dependence | Data Dependence | Privatization, Data Dependence, Dataflow
Method | Memory Reference Analysis | Algorithm Recognition | Monotonic Evolution | Memory Reference Classification
Recurrence Model | Implicit | Implicit: DDG | Explicit: Evolution | Explicit: VEG
Multi-variable | Not Specified | No | No | Yes
Distance Ranges | Yes | No | Yes | Yes
Conditional Ranges | Range Extraction | No | No | Yes
Mem. Ref. Type | Generic | Single-indexed | Not Defined | Generic
Interprocedural | Yes | No | No | Yes
Pushback Parallelization | No | Yes (restrictive) | No | Yes (more general)
Conclusions
Value Evolution Graph
Range Comparison
Recurrences
Logic Inferences
Memory Reference Analysis
Array Dataflow
Privatization
Dependence Analysis
Pushback Parallelization