Pipelined Computations
[Figure: pipeline stages P0, P1, P2, P3, P5 connected in series.]
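[Figure: a pipeline of five stages for summing an array. Stage i receives a partial sum on s_in, adds a[i], and passes the result on s_out, for a[0] through a[4].]
The loop
for(i=0 ; i<n ; i++) sum = sum + a[i];
can be unrolled so that each statement becomes one pipeline stage:
sum = sum + a[0];
sum = sum + a[1];
sum = sum + a[2];
sum = sum + a[3];
sum = sum + a[4];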
[Figure: a pipeline of filter stages f0 to f4, each with input f_in and output f_out. The signals leaving successive stages are labelled "signal without f0", "signal without f1", "signal without f2" and "signal without f3".]
Pipelining can provide increased speed in three types of computation:
1. More than one instance of the complete problem is to be executed.
2. A series of data items must be processed, each requiring multiple operations.
3. Information needed to start the next process can be passed forward before the current process has completed all its internal operations.
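[Figure: space-time diagram for a pipeline of six processes P0 to P5 handling instances 1, 2, 3, ... of the problem. The first p - 1 cycles fill the pipeline; the m instances then complete one per cycle.]
With p pipeline stages and m instances of the problem, execution takes m + p - 1 cycles in total, an average of (m + p - 1)/m cycles per instance.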
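The cycle count can be checked with a short C sketch. The function names are hypothetical, and the model assumes that every stage takes exactly one cycle and that a new instance enters the pipeline on every cycle.

```c
#include <assert.h>

/* Closed form: p - 1 cycles to fill the pipeline, then one instance
   completes per cycle for m cycles. */
int pipeline_cycles(int m, int p)
{
    assert(m > 0 && p > 0);
    return m + p - 1;
}

/* Same count obtained by stepping the pipeline. In cycle c (0-based),
   stage s works on instance c - s, if that instance exists. */
int simulate_cycles(int m, int p)
{
    assert(m > 0 && p > 0);
    int cycle = 0;
    int finished = 0;
    while (finished < m) {
        int instance_in_last_stage = cycle - (p - 1);
        if (instance_in_last_stage >= 0 && instance_in_last_stage < m)
            finished++;            /* the last stage completes an instance */
        cycle++;
    }
    return cycle;
}

/* Average number of cycles per instance. */
double average_cycles_per_instance(int m, int p)
{
    return (double)pipeline_cycles(m, p) / m;
}
```

For m = 10 instances on p = 6 stages both functions give 15 cycles, an average of 1.5 cycles per instance. As m grows the average approaches one cycle per instance.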
[Figure: alternative space-time diagram, with instances I0, I1, I2, I3, I4, ... on one axis and time on the other; each instance passes through P0 to P5 in turn.]
[Figure: a pipeline of ten processes P0 to P9 processing the data elements d0 to d9, with its space-time diagram. Filling the pipeline takes p - 1 cycles, followed by n cycles for the n data elements.]
[Figure: two timing diagrams for processes P0 to P5.]
[Figure: partitioning pipeline processes onto processors.]
Processor 0: P0 P1 P2 P3
Processor 1: P4 P5 P6 P7
Processor 2: P8 P9 P10 P11
[Figure: a host computer connected to the pipeline processors.]
An ideal interconnection structure is a line or ring structure.
Adding numbers
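[Figure: a pipeline P0 to P4 for adding numbers. The partial sum leaving the k-th stage is $\sum_{i=1}^{k} i$, for k = 1 to 5.]
Basic code for process Pi:
recv(&accumulation, Pi-1);
accumulation = accumulation + number;
send(&accumulation, Pi+1);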
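The first and last processes differ from the basic code.
P0:
send(&number, P1);
Pn-1:
recv(&accumulation, Pn-2);
accumulation = accumulation + number;
send(&accumulation, P0);
A single program for all processes (on P0, accumulation must already hold number before the send):
if(process > 0){
    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
}
if(process < n-1) send(&accumulation, Pi+1);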
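The flow of the running total through the stages can be sketched in plain C without any message-passing library. This is a sequential stand-in under stated assumptions, not a parallel implementation: the loop index plays the role of the process number, and the variable accumulation plays the role of the message passed from one process to the next. The names stage and pipeline_sum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One pipeline stage: take the incoming running total, add the number
   held locally, and return the value that would be sent onward. */
static int stage(int accumulation_in, int number)
{
    return accumulation_in + number;
}

/* Sequential stand-in for the pipeline. Stage i holds numbers[i]; the
   value of `accumulation` after iteration i is what Pi would send to
   P(i+1). Starting from zero lets stage 0 use the same code. */
int pipeline_sum(const int *numbers, size_t n)
{
    assert(numbers != NULL || n == 0);
    int accumulation = 0;
    for (size_t i = 0; i < n; i++)
        accumulation = stage(accumulation, numbers[i]);
    return accumulation;
}
```

With the five numbers 1, 2, 3, 4, 5 held in P0 to P4, the value leaving the last stage is 15.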
[Figures: the numbers dn-1, ..., d2, d1, d0 are fed one after another into the pipeline P0, P1, P2, ..., Pn-1, which produces the sum.]
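Analysis:
$t_{total} = (\text{time for one pipeline cycle}) \times (\text{number of cycles})$
$t_{total} = (t_{comp} + t_{comm})(m + p - 1)$
The average time per instance is $t_a = t_{total}/m$.
Single instance of the problem (m = 1, p = n):
$t_{comp} = 1$
$t_{comm} = 2(t_{startup} + t_{data})$
$t_{total} = (2(t_{startup} + t_{data}) + 1)n$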
Multiple instances of the problem:
$t_{total} = (2(t_{startup} + t_{data}) + 1)(m + n - 1)$
$t_a = t_{total}/m \approx 2(t_{startup} + t_{data}) + 1$ for large m
Data partitioning with multiple instances of the problem (each stage adds d numbers, so there are n/d stages):
$t_{comp} = d$
$t_{comm} = 2(t_{startup} + t_{data})$
$t_{total} = (2(t_{startup} + t_{data}) + d)(m + n/d - 1)$
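The timing expressions can be evaluated numerically with a small C sketch. The function names are hypothetical, times are in units of one addition, and d is assumed to divide n exactly. Setting d = 1 gives the unpartitioned pipeline, and setting m = 1 as well gives the single-instance case.

```c
#include <assert.h>

/* General pipeline time: (time per cycle) * (number of cycles). */
double pipeline_total_time(double t_comp, double t_comm, int m, int p)
{
    assert(m > 0 && p > 0);
    return (t_comp + t_comm) * (m + p - 1);
}

/* Adding n numbers for m instances, with d numbers per stage.
   Each cycle costs d additions plus one receive and one send. */
double adding_total_time(double t_startup, double t_data,
                         int n, int m, int d)
{
    assert(d > 0 && n > 0 && n % d == 0);
    double t_comp = (double)d;
    double t_comm = 2.0 * (t_startup + t_data);
    int p = n / d;                 /* number of pipeline stages */
    return pipeline_total_time(t_comp, t_comm, m, p);
}

/* Average time per instance, t_a = t_total / m. */
double adding_average_time(double t_startup, double t_data,
                           int n, int m, int d)
{
    return adding_total_time(t_startup, t_data, n, m, d) / m;
}
```

For example, with t_startup = t_data = 1, n = 8, one instance and d = 1, the model gives (4 + 1) * 8 = 40 time units. With d = 2 and m = 4 it gives (4 + 2) * (4 + 4 - 1) = 42.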
Sorting Numbers
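[Figure: insertion sort of the series 4, 3, 1, 2, 5 on a pipeline P0 to P4, shown over time steps 1 to 10. Each process keeps the largest number it has received and passes smaller numbers to the right; the pipeline ends holding 5, 4, 3, 2, 1 in P0 to P4.]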
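Basic code for process Pi:
recv(&number, Pi-1);
if(number > x){
    send(&x, Pi+1);
    x = number;
} else send(&number, Pi+1);
With n numbers, process Pi keeps one number and passes n - i - 1 numbers on to the processes on its right, so a simple loop can be used:
right_procno = n-i-1;        /* number of processes to the right */
recv(&x, Pi-1);
for(j=0 ; j<right_procno ; j++){
    recv(&number, Pi-1);
    if(number > x){
        send(&x, Pi+1);
        x = number;
    } else send(&number, Pi+1);
}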
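The compare-and-pass rule can be traced with a sequential C sketch. It is a simulation under stated assumptions rather than message-passing code: x[i] stands for the number held by process Pi, each input number travels through all occupied stages before the next one enters, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates the sorting pipeline. x must have room for n numbers;
   on return x[0..n-1] holds the input in descending order, i.e.
   x[i] is the number process Pi ends up holding. */
void pipeline_insertion_sort(const int *input, size_t n, int *x)
{
    assert(n == 0 || (input != NULL && x != NULL));
    size_t filled = 0;                  /* stages that already hold a number */
    for (size_t k = 0; k < n; k++) {
        int number = input[k];          /* number entering at P0 */
        for (size_t i = 0; i < filled; i++) {
            if (number > x[i]) {        /* keep the larger ...          */
                int smaller = x[i];
                x[i] = number;
                number = smaller;       /* ... and pass the smaller on  */
            }
        }
        x[filled++] = number;           /* first empty stage keeps it   */
    }
}
```

Feeding in 4, 3, 1, 2, 5 in that order leaves 5, 4, 3, 2, 1 in stages 0 to 4. The order in which numbers enter does not change the final contents, only the intermediate traffic.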
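[Figure: the sorting pipeline P0, P1, P2, ... fed with the series x0, x1, ..., xn-1. Each process compares the incoming number with the largest value it has seen so far (xmax), keeps the larger and passes the smaller numbers on.]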
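[Figure: a master process feeds the numbers dn-1, ..., d2, d1, d0 into the pipeline P0, P1, P2, ..., Pn-1.]
Sorting phase, process Pi:
right_procno = n-i-1;        /* number of processes to the right */
recv(&x, Pi-1);
for(j=0 ; j<right_procno ; j++){
    recv(&number, Pi-1);
    if(number > x){
        send(&x, Pi+1);
        x = number;
    } else send(&number, Pi+1);
}
Returning the sorted numbers, process Pi:
send(&x, Pi-1);              /* send the number held */
for(j=0 ; j<right_procno ; j++){
    recv(&number, Pi+1);     /* forward numbers from the right */
    send(&number, Pi-1);
}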
Analysis:
Sequential:
$t_s = (n-1) + (n-2) + \cdots + 2 + 1 = \frac{n(n-1)}{2}$
Parallel:
$t_{comp} = 1$
$t_{comm} = 2(t_{startup} + t_{data})$
$t_{total} = (t_{comp} + t_{comm})(2n - 1) = (1 + 2(t_{startup} + t_{data}))(2n - 1)$
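To compare the two expressions for concrete values, a minimal C sketch follows. The function names are hypothetical, and time is measured in units of one compare-and-exchange step, as in the analysis.

```c
#include <assert.h>

/* Sequential insertion sort in the worst case:
   (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 steps. */
long sequential_sort_steps(long n)
{
    assert(n >= 1);
    return n * (n - 1) / 2;
}

/* The same sum computed term by term, as a check on the closed form. */
long sequential_sort_steps_summed(long n)
{
    assert(n >= 1);
    long steps = 0;
    for (long k = 1; k <= n - 1; k++)
        steps += k;
    return steps;
}

/* Pipelined sort: 2n - 1 cycles, each costing one comparison plus
   one receive and one send. */
double pipeline_sort_time(long n, double t_startup, double t_data)
{
    assert(n >= 1);
    double t_comp = 1.0;
    double t_comm = 2.0 * (t_startup + t_data);
    return (t_comp + t_comm) * (double)(2 * n - 1);
}
```

For n = 5 the sequential count is 10 steps, while the pipeline needs 9 cycles. With t_startup = t_data = 1 each cycle costs 5 units, so the pipeline only wins once n is large enough for the n(n-1)/2 term to outgrow the communication cost.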
[Figure: timing diagram for P0 to P4 showing two phases, the sorting phase (labelled 2n - 1) followed by returning the sorted numbers (labelled n).]
Prime Number Generation
[Figure: the sieve applied to the series 2, 3, 4, ..., 20 in successive passes, each pass marking the multiples of the next prime as not prime.]
To find the primes up to n, it is only necessary to start at numbers up to $\sqrt{n}$.
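Analysis:
$t_s = \lfloor n/2 - 1 \rfloor + \lfloor n/3 - 1 \rfloor + \lfloor n/5 - 1 \rfloor + \cdots + \lfloor n/\sqrt{n} - 1 \rfloor$
Sequential code:
for(i=2 ; i<n ; i++)
    prime[i] = 1;                    /* assume every number is prime */
for(i=2 ; i<=sqrt_n ; i++)
    if(prime[i] == 1)
        for(j=i+i ; j<n ; j=j+i)     /* strike out the multiples of i */
            prime[j] = 0;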
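Parallel code, basic form for process Pi:
recv(&x, Pi-1);
recv(&number, Pi-1);
if((number%x) != 0) send(&number, Pi+1);
[Figure: the series x0, x1, ..., xn-1 enters a pipeline P0, P1, P2. P0 holds the 1st prime p1, P1 holds p2 and P2 holds p3; each process compares incoming numbers with its prime and passes on only those that are not multiples of it.]
Because each process receives a different quantity of numbers, a terminator message marks the end of the series:
recv(&x, Pi-1);
for(i=0 ; i<n ; i++){
    recv(&number, Pi-1);
    if(number == terminator) break;
    if((number%x) != 0) send(&number, Pi+1);
}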
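The filtering behaviour of the stages can be simulated sequentially in C. This is a sketch under stated assumptions, not message-passing code: primes[i] stands for the value x held by process Pi, a number that passes every occupied stage becomes the prime held by the next free stage, and the function name and the capacity parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Feeds 2, 3, ..., n through the simulated pipeline. Returns how many
   stages end up holding a prime; primes[0..count-1] are those primes
   in increasing order. capacity is the size of the primes array. */
size_t pipeline_primes(int n, int *primes, size_t capacity)
{
    assert(primes != NULL);
    size_t stages = 0;                     /* stages already holding a prime */
    for (int number = 2; number <= n; number++) {
        int passed_all = 1;
        for (size_t i = 0; i < stages; i++) {
            if (number % primes[i] == 0) { /* multiple of Pi's prime: dropped */
                passed_all = 0;
                break;
            }
        }
        if (passed_all) {                  /* first number a new stage receives */
            assert(stages < capacity);
            primes[stages++] = number;
        }
    }
    return stages;
}
```

For the series 2 to 20 the simulated pipeline ends with eight stages holding 2, 3, 5, 7, 11, 13, 17 and 19. Unlike the sequential sieve, this version uses one stage per prime found, including primes above $\sqrt{n}$.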
Solving a System of Linear Equations
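Upper-triangular form:
$$
\begin{aligned}
a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \cdots + a_{n-1,n-1}x_{n-1} &= b_{n-1} \\
&\ \ \vdots \\
a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 &= b_2 \\
a_{1,0}x_0 + a_{1,1}x_1 &= b_1 \\
a_{0,0}x_0 &= b_0
\end{aligned}
$$
The unknowns are found by substitution, starting with $x_0$:
$$
x_0 = \frac{b_0}{a_{0,0}}, \qquad
x_1 = \frac{b_1 - a_{1,0}x_0}{a_{1,1}}, \qquad
x_2 = \frac{b_2 - a_{2,0}x_0 - a_{2,1}x_1}{a_{2,2}}
$$
In general:
$$
x_i = \frac{b_i - \sum_{j=0}^{i-1} a_{i,j}x_j}{a_{i,i}}
$$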
Sequential code:
x[0] = b[0] / a[0][0];
for(i=1 ; i<n ; i++){
    sum = 0;
    for(j=0 ; j<i ; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i]-sum)/a[i][i];     /* computed once, after the sum is complete */
}
[Figure: a pipeline of four stages that compute x0, x1, x2 and x3. The first stage passes on x0; the second passes on x0 and x1; the third passes on x0, x1 and x2; the last outputs x0, x1, x2 and x3.]
Parallel code for process Pi, which first receives and forwards x[0] to x[i-1] and then computes x[i]:
for(j=0 ; j<i ; j++){
    recv(&x[j], Pi-1);
    send(&x[j], Pi+1);
}
sum = 0;
for(j=0 ; j<i ; j++)
    sum = sum + a[i][j]*x[j];
x[i] = (b[i]-sum)/a[i][i];
send(&x[i], Pi+1);
Parallel code: Pi
sum = 0;
for(j=0 ; j<i ; j++){
    recv(&x[j], Pi-1);
    send(&x[j], Pi+1);
    sum = sum + a[i][j]*x[j];
}
x[i]=(b[i]-sum)/a[i][i];
send(&x[i], Pi+1);
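The work done by each process can be sketched as an ordinary C function, with a driver loop standing in for the pipeline. This is a sequential simulation under stated assumptions: the matrix is stored row by row in a flat array, reading x_in[j] stands for receiving x[j] from the previous process, and the names solve_stage and pipeline_substitution are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Work of process Pi: accumulate a[i][j] * x[j] as each x[j] "arrives",
   then compute x[i]. row points at row i of the matrix. */
static double solve_stage(const double *row, double b_i,
                          const double *x_in, size_t i)
{
    double sum = 0.0;
    for (size_t j = 0; j < i; j++)
        sum += row[j] * x_in[j];    /* multiply/add for each received value */
    assert(row[i] != 0.0);          /* diagonal entry must be nonzero */
    return (b_i - sum) / row[i];    /* divide/subtract */
}

/* Driver: stage i runs after stages 0..i-1 have produced their values,
   which is the order in which the pipeline delivers them. a is an
   n-by-n matrix in row-major order. */
void pipeline_substitution(const double *a, const double *b,
                           double *x, size_t n)
{
    assert(a != NULL && b != NULL && x != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] = solve_stage(a + i * n, b[i], x, i);
}
```

For the system 2 x0 = 4, x0 + x1 = 5, x0 + 2 x1 + 4 x2 = 18 the stages produce x0 = 2, x1 = 3 and x2 = 2.5. In a real pipeline stage i could start its multiply/add operations as soon as x0 arrives, which is where the overlap in time comes from.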
[Figure: operations performed by each process, with time running downward.]
P0: divide, send(x0), end
P1: recv(x0), send(x0), multiply/add, divide/subtract, send(x1), end
P2: recv(x0), send(x0), multiply/add, recv(x1), send(x1), multiply/add, divide/subtract, send(x2), end
P3: recv(x0), send(x0), multiply/add, recv(x1), send(x1), multiply/add, recv(x2), send(x2), multiply/add, divide/subtract, send(x3), end
P4: recv(x0), send(x0), multiply/add, recv(x1), send(x1), multiply/add, recv(x2), send(x2), multiply/add, recv(x3), send(x3), multiply/add, divide/subtract, send(x4), end
[Figure: timing diagram for processes P0 to P5, marking where each process passes the first value onward and where it produces its final computed value.]