Parallel Prefix and Data Parallel Operations
Motivation: basic parallel operations which occur repeatedly.
Let ⊕ be an associative operation:
(a1 ⊕ a2) ⊕ a3 = a1 ⊕ (a2 ⊕ a3)
How to compute (a1 ⊕ a2 ⊕ … ⊕ an) in parallel in O(log n) time?
Approach 1
Here [l:r] denotes the combination of a_l through a_r.

input:  a0     a1     a2     a3     a4     a5     a6     a7
d=1:    [0:0]  [0:1]  [1:2]  [2:3]  [3:4]  [4:5]  [5:6]  [6:7]
d=2:    [0:0]  [0:1]  [0:2]  [0:3]  [1:4]  [2:5]  [3:6]  [4:7]
d=4:    [0:0]  [0:1]  [0:2]  [0:3]  [0:4]  [0:5]  [0:6]  [0:7]

Assume that n = 2^k.
for i = 0 to k-1
  for j = 0 to n-1-2^i do in parallel
    x[j + 2^i] = x[j] + x[j + 2^i]
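The loop above can be simulated sequentially. The following is a minimal Python sketch, not a parallel implementation: the function name `prefix_scan` and the snapshot copy `old` are illustrative assumptions, with the copy standing in for the rule that all processors read the values of the previous step.

```python
from operator import add

def prefix_scan(values, op=add):
    """Inclusive scan by doubling; each pass of the while loop is one parallel step."""
    x = list(values)
    n = len(x)
    d = 1                      # d = 2^i
    while d < n:
        old = x[:]             # snapshot: every read sees the previous step
        for j in range(n - d): # conceptually "do in parallel"
            x[j + d] = op(old[j], old[j + d])
        d *= 2
    return x

print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

The number of passes is about log2(n), which matches the O(log n) step count when each pass runs in parallel.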
How to do on Tree Architecture?
for each node
  if there is a signal from left and right
    St <- Sl + Sr
  if there is a signal R, send R to both its children
  if the node is a leaf and there is a signal R, X <- X + R
(Figure: a tree node labeled with the signals Sl, Sr, St and R.)
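A small recursive simulation can make the signal flow concrete. This Python sketch rests on one assumption that the slide text does not state: each internal node also sends its left sum Sl down to its right child as the signal R. The names `tree_prefix`, `up` and `down` are hypothetical.

```python
def tree_prefix(values):
    """Simulate prefix sums on a binary tree whose leaves hold the values."""
    x = list(values)
    if not x:
        return x

    def down(lo, hi, r):
        # A signal R is forwarded to both children until it reaches the leaves.
        if hi - lo == 1:
            x[lo] = r + x[lo]          # leaf: X <- X + R
            return
        mid = (lo + hi) // 2
        down(lo, mid, r)
        down(mid, hi, r)

    def up(lo, hi):
        # Returns St for the subtree that covers leaves lo .. hi-1.
        if hi - lo == 1:
            return x[lo]
        mid = (lo + hi) // 2
        sl = up(lo, mid)
        sr = up(mid, hi)
        down(mid, hi, sl)              # assumption: Sl becomes R for the right subtree
        return sl + sr                 # St <- Sl + Sr

    up(0, len(x))
    return x

print(tree_prefix([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

Each leaf ends up adding the sums of all subtrees that lie to its left, which is its prefix.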
How to do on a Hypercube
A complete binary tree can be embedded into a hypercube.
Simpler solution: each node computes both its prefix and the total sum.
for i = 0 to k-1
  for j = 0 to n-1 do in parallel
    x[j] = x[j] + sum[j^(i)]   if the i-th bit of j is 1
    sum[j] = sum[j] + sum[j^(i)]
where j^(i) and j have the same binary representation except in their i-th bit: the i-th bit of j^(i) is the complement of the i-th bit of j. Both assignments read the value that sum[j^(i)] had before step i.
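The hypercube loop can be checked with a sequential simulation. This is a sketch under assumptions: `hypercube_prefix` and `total` are illustrative names, n must be a power of two, and the partner's sum is placed on the left when bit i of j is 1 so that non-commutative operations would also come out in order.

```python
def hypercube_prefix(values):
    """Return (prefix, total) per node after k = log2(n) exchange steps."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1
    x = list(values)           # running prefix at each node
    total = list(values)       # running sum of the node's current subcube
    for i in range(k):
        old = total[:]         # values exchanged along dimension i
        for j in range(n):     # conceptually "do in parallel"
            partner = j ^ (1 << i)
            if j & (1 << i):
                x[j] = old[partner] + x[j]
                total[j] = old[partner] + old[j]
            else:
                total[j] = old[j] + old[partner]
    return x, total

prefix, total = hypercube_prefix([1, 2, 3, 4, 5, 6, 7, 8])
print(prefix)  # [1, 3, 6, 10, 15, 21, 28, 36]
print(total)   # [36, 36, 36, 36, 36, 36, 36, 36]
```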
Prefix on Hypercube
for i = 0 to k-1
  for j = 0 to n-1 do in parallel
    x[j] = x[j] + sum[j^(i)]   if the i-th bit of j is 1
    sum[j] = sum[j] + sum[j^(i)]
(j^(i) is j with its i-th bit complemented.)

node        0      1      2      3      4      5      6      7
input       a0     a1     a2     a3     a4     a5     a6     a7
d=1  X      [0:0]  [0:1]  [2:2]  [2:3]  [4:4]  [4:5]  [6:6]  [6:7]
     SUM    [0:1]  [0:1]  [2:3]  [2:3]  [4:5]  [4:5]  [6:7]  [6:7]
d=2  X      [0:0]  [0:1]  [0:2]  [0:3]  [4:4]  [4:5]  [4:6]  [4:7]
     SUM    [0:3]  [0:3]  [0:3]  [0:3]  [4:7]  [4:7]  [4:7]  [4:7]
d=4  X      [0:0]  [0:1]  [0:2]  [0:3]  [0:4]  [0:5]  [0:6]  [0:7]
     SUM    [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]
Applications of Data Parallel Operations
Any associative operation.
Examples:
– min, max, add
– adding two binary numbers
– finite state automata
– radix sort
– segmented prefix sum
– routing
  • packing
  • unpacking
  • broadcast (copy-scan)
– solving recurrence equations
– straight line computation (parallel arithmetic evaluation)
Let ⊕ be any associative operation.
For the segmented operation of ⊕, define ⊕’ as follows, where |x marks an element that begins a new segment:

 ⊕’      b            |b
 a       a ⊕ b        |b
 |a      |(a ⊕ b)     |b

Then ⊕’ is associative, and we can compute the segmented operation in O(log n) time.
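The operator table can be written as a function on (flag, value) pairs, where the flag marks the start of a segment. This is a minimal sketch: `segmented` and `segmented_scan` are hypothetical names, and `itertools.accumulate` is used as a sequential stand-in for the parallel scan, which is valid because the derived operator is associative.

```python
from itertools import accumulate
from operator import add

def segmented(op):
    """Lift an associative op to pairs (starts_segment, value)."""
    def seg_op(left, right):
        (fa, a), (fb, b) = left, right
        if fb:                     # right operand starts a segment: restart from b
            return (True, b)
        return (fa, op(a, b))      # otherwise combine and keep the left flag
    return seg_op

def segmented_scan(values, flags, op=add):
    pairs = [(bool(f), v) for f, v in zip(flags, values)]
    return [v for _, v in accumulate(pairs, segmented(op))]

print(segmented_scan([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 0]))  # [1, 3, 3, 4, 9, 15]
```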
Enumerating
data = [5 6 3 1 8 3 7 5 9 2]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated = [0 x 1 2 x x 3 x 4 x]
packing
data = [5 6 3 1 8 3 7 5 9 2]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated = [0 x 1 2 x x 3 x 4 x]
packed data =[5 3 1 7 9 x x x x x]
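Enumerating is an exclusive prefix sum over the active flags, and packing uses the result as a destination address. The sketch below simulates both sequentially; the function names are illustrative, and `None` plays the role of the slide's `x`.

```python
def enumerate_active(active):
    """Exclusive prefix sum of the flags; None for inactive processors."""
    ranks, count = [], 0
    for flag in active:
        ranks.append(count if flag else None)
        count += 1 if flag else 0
    return ranks

def pack(data, active):
    """Move the data of active processors to the lowest addresses."""
    ranks = enumerate_active(active)
    packed = [None] * len(data)
    for i, rank in enumerate(ranks):   # each active processor writes independently
        if rank is not None:
            packed[rank] = data[i]
    return packed

data = [5, 6, 3, 1, 8, 3, 7, 5, 9, 2]
active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(pack(data, active))  # [5, 3, 1, 7, 9, None, None, None, None, None]
```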
Packing and Unpacking on Hypercube
Packing
• adjust bit 0
• adjust bit 1
• adjust bit 2
• ...
• adjust bit k-1
Unpacking
• adjust bit k-1
• adjust bit k-2
• ...
• adjust bit 1
• adjust bit 0
How about in the order of adjust bit 0, 1, ..., k-1 for packing?
Unpacking
Address 0 1 2 3 4 5 6 7 8 9
data = [6 2 3 5 9 x x x x x]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated = [0 x 1 2 x x 3 x 4 x]
destination = [0 2 3 6 8 x x x x x]
unpacked data = [6 x 2 3 x x 5 x 9 x]
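Unpacking is the inverse of packing: the r-th packed item goes to the address of the r-th active processor. A minimal sequential sketch, with `unpack` and `destination` as illustrative names and `None` standing for `x`:

```python
def unpack(packed, active):
    """Send packed[r] to the address of the r-th active processor."""
    destination = [i for i, flag in enumerate(active) if flag]
    result = [None] * len(active)
    for rank, address in enumerate(destination):
        result[address] = packed[rank]
    return result

packed = [6, 2, 3, 5, 9, None, None, None, None, None]
active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(unpack(packed, active))  # [6, None, 2, 3, None, None, 5, None, 9, None]
```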
Copy Scan (broadcast)
address 0 1 2 3 4 5 6 7 8 9
data = [ 6 2 3 5 9 4 1 7 8 10]
segmented bit = [ 1 0 1 1 0 0 1 0 1 0]
result = [ 6 6 3 5 5 5 1 1 8 8]
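Copy scan can be viewed as a segmented scan whose operation keeps the left operand. The sketch below assumes the first element starts a segment; `copy_scan` is a hypothetical name, and `itertools.accumulate` again stands in for the parallel scan.

```python
from itertools import accumulate

def copy_scan(data, segment_bits):
    """Broadcast the first value of each segment to the rest of that segment."""
    pairs = list(zip(segment_bits, data))

    def copy_op(left, right):
        # A right operand that starts a segment wins; otherwise keep the left one.
        return right if right[0] else left

    return [value for _, value in accumulate(pairs, copy_op)]

data = [6, 2, 3, 5, 9, 4, 1, 7, 8, 10]
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(copy_scan(data, bits))  # [6, 6, 3, 5, 5, 5, 1, 1, 8, 8]
```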
Radix Sort
for j = 0 to k-1    // x has k bits; bit 0 is the least significant
  for all i in [0 .. n-1] do in parallel {
    if j-th bit of x[i] is 0 {
      y[i] = enumerate
      c = count
    }
    if j-th bit of x[i] is 1
      y[i] <- enumerate + c
    x[y[i]] = x[i]
  }
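The enumerate-based radix sort can be sketched sequentially as follows. Two points are assumptions of this sketch rather than statements from the slide: the helper names, and the bit order, which runs from the least significant bit upward because a stable split of this kind only yields a sorted result in that order.

```python
def split_by_bit(x, j):
    """Stable split: keys with bit j = 0 first, then keys with bit j = 1."""
    is_zero = [((v >> j) & 1) == 0 for v in x]
    c = sum(is_zero)                   # count of keys whose bit j is 0
    y = [0] * len(x)
    rank0 = rank1 = 0                  # running "enumerate" values
    for i, zero in enumerate(is_zero):
        if zero:
            y[i] = rank0
            rank0 += 1
        else:
            y[i] = c + rank1
            rank1 += 1
    out = [None] * len(x)
    for i, v in enumerate(x):          # x[y[i]] = x[i], done for all i at once
        out[y[i]] = v
    return out

def radix_sort(x, k):
    """Sort non-negative integers that fit in k bits."""
    for j in range(k):
        x = split_by_bit(x, j)
    return x

print(radix_sort([5, 6, 3, 1, 8, 3, 7, 5, 9, 2], 4))  # [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```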
Radix sort another code
for j = 0 to k-1    // x has k bits; bit 0 is the least significant
  for all i in [0 .. n-1] do in parallel {
    pack left x[i] if j-th bit of x[i] is 0
    pack right x[i] if j-th bit of x[i] is 1
  }
Quick Sort
1. Pick a pivot p
2. Broadcast p
3. For all PE i, compare A[i] with p
   {
     if A[i] < p, pack left A[i] in the segment
     if A[i] >= p, pack right A[i] in the segment
   }
4. Mark the segment boundary
5. Each segment, quick sort recursively
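The five steps can be simulated round by round, with every current segment processed in the same round. This is a sketch under stated assumptions: the pivot is taken as the first element of each segment, and it is kept out of both halves so that every segment shrinks (the slide does not say how the pivot itself is placed). The name `segmented_quicksort` is illustrative.

```python
def segmented_quicksort(a):
    """Quicksort expressed as repeated pack-left / pack-right within segments."""
    a = list(a)
    segments = [(0, len(a))]               # half-open ranges [lo, hi)
    while segments:
        next_segments = []
        for lo, hi in segments:            # conceptually all segments in parallel
            if hi - lo < 2:
                continue
            p = a[lo]                      # steps 1-2: pick and broadcast the pivot
            rest = a[lo + 1:hi]
            left = [v for v in rest if v < p]     # step 3: pack left
            right = [v for v in rest if v >= p]   # step 3: pack right
            a[lo:hi] = left + [p] + right
            mid = lo + len(left)
            next_segments += [(lo, mid), (mid + 1, hi)]  # step 4: new boundaries
        segments = next_segments           # step 5: repeat on each segment
    return a

print(segmented_quicksort([5, 6, 3, 1, 8, 3, 7, 5, 9, 2]))  # [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```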
Solving Linear Recurrence Equations
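f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}

In matrix form:

[ f_n     ]   [ a_{n-1}  a_{n-2} ] [ f_{n-1} ]
[ f_{n-1} ] = [ 1        0       ] [ f_{n-2} ]

Matrix multiplication is associative, so all f_n follow from a prefix product of these 2x2 matrices.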
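The recurrence can be turned into a scan over 2x2 matrices. The sketch below is an illustration, not the slide's own code: `matmul2` and `solve_recurrence` are hypothetical names, the coefficient list `a` is assumed to hold a_0 .. a_{n_max-1}, and `itertools.accumulate` stands in for a parallel prefix product.

```python
from itertools import accumulate

def matmul2(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def solve_recurrence(a, f0, f1, n_max):
    """Return [f_0, ..., f_n_max] for f_n = a[n-1]*f_{n-1} + a[n-2]*f_{n-2}."""
    mats = [((a[n - 1], a[n - 2]), (1, 0)) for n in range(2, n_max + 1)]
    # Prefix products M_n * ... * M_2; newer matrices multiply from the left.
    prefix = accumulate(mats, lambda acc, m: matmul2(m, acc))
    f = [f0, f1]
    for (top_left, top_right), _ in prefix:
        f.append(top_left * f1 + top_right * f0)
    return f[:n_max + 1]

# With all coefficients equal to 1 this is the Fibonacci recurrence.
print(solve_recurrence([1] * 10, 0, 1, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```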
Pointer Jumping and Tree Computation
How to compute a prefix on a linked list?
If NEXT[i] != NIL then
  X[i] <- X[i] + X[NEXT[i]]
  NEXT[i] <- NEXT[NEXT[i]]

initial:        1   2   3   4   5   6   7
after step 1:   3   5   7   9  11  13   7
after step 2:  10  14  18  22  18  13   7
after step 3:  28  27  25  22  18  13   7

How can we obtain the order 1 3 6 10 15 21 28 instead?
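Pointer jumping can be simulated with array-based links. In this sketch `pointer_jump` is a hypothetical name, `None` stands for NIL, and copies of X and NEXT model the synchronous parallel step. Running the same routine on predecessor links instead of successor links is one possible answer to the question above, offered here as an assumption rather than as the slide's intended solution.

```python
def pointer_jump(x, nxt):
    """Each node accumulates the values of all nodes reachable through its links."""
    x, nxt = list(x), list(nxt)
    while any(p is not None for p in nxt):
        old_x, old_next = x[:], nxt[:]     # all nodes read the previous step
        for i in range(len(x)):            # conceptually "do in parallel"
            j = old_next[i]
            if j is not None:
                x[i] = old_x[i] + old_x[j]
                nxt[i] = old_next[j]
    return x

values = [1, 2, 3, 4, 5, 6, 7]
successors = [1, 2, 3, 4, 5, 6, None]
predecessors = [None, 0, 1, 2, 3, 4, 5]
print(pointer_jump(values, successors))    # [28, 27, 25, 22, 18, 13, 7]
print(pointer_jump(values, predecessors))  # [1, 3, 6, 10, 15, 21, 28]
```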
Application: Tree computation
Pre-order numbering
(Figure: tree with the labels "Each node" and "Leaf node", each with the value 1.)
Can be applied to in-order and post-order numbering, number of children, depth, etc. Also bi-components, etc.