Chapter 3 Parallel and Pipelined Processing
Chapter 4 Retiming
*
Definitions
Retiming
Retiming is a mapping from a given DFG, G, to a retimed DFG, Gr, such
that the corresponding transfer functions of G and Gr differ by a
pure delay z^(-L).
Purposes
To reduce number of registers needed.
(C) 2004-2006 by Yu Hen Hu
*
Cut-set Retiming
Feed-forward cut-set:
Feed-back cut-set
Delay transfer theorem
Adding the same arbitrary non-negative number of delays to every edge of a
feed-forward cut-set of a DFG will not alter its output, except that the
output timing will be delayed.
Transferring the same number of delays from the edges in one direction
across a feed-back cut-set of a DFG to all edges in the opposite direction
across the same cut-set will not alter the output, only its
timing.
*
Feed-forward Cut-Set Retiming
y(n) = b0x(n) + b1x(n-1)
Select a cut set
Insert one delay on each edge in the cut set.
Retiming:
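The effect of the inserted delays can be checked with a short simulation. This is a minimal sketch, assuming the cut set is taken across the two multiplier-to-adder edges (the slide's figure is not available); the function names are illustrative only.

```python
def fir_direct(x, b0, b1):
    """y(n) = b0*x(n) + b1*x(n-1), with x(-1) = 0."""
    y, x_prev = [], 0.0
    for xn in x:
        y.append(b0 * xn + b1 * x_prev)
        x_prev = xn
    return y

def fir_cutset_pipelined(x, b0, b1):
    """Same filter with one register on each multiplier-to-adder edge."""
    y, x_prev = [], 0.0
    reg0 = reg1 = 0.0            # the two inserted cut-set registers
    for xn in x:
        y.append(reg0 + reg1)    # the adder reads last cycle's products
        reg0, reg1 = b0 * xn, b1 * x_prev
        x_prev = xn
    return y

x = [1.0, 2.0, 3.0, 4.0]
y_direct = fir_direct(x, 0.5, 0.25)
y_piped = fir_cutset_pipelined(x, 0.5, 0.25)
```

Under these assumptions the pipelined output is the direct output delayed by one sample, which is the "output timing will be delayed" part of the theorem.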
*
Feed-back Cut Set Retiming
y(n) = a·y(n-2) + x(n)
loop bound = (TM+TA)/2
clock cycle = TM+TA
Shift 1 delay to the other edge across a feed-back cut set
Filter remains unchanged.
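A small simulation can illustrate why the filter remains unchanged. This is a sketch, assuming the original loop carries both delays on the adder-to-multiplier edge and the retimed loop carries one delay on each side of the multiplier; the register names are hypothetical.

```python
def iir_original(x, a):
    """y(n) = a*y(n-2) + x(n); both loop delays sit on one edge."""
    y = []
    d1 = d2 = 0.0                # d1 = y(n-1), d2 = y(n-2)
    for xn in x:
        yn = a * d2 + xn
        y.append(yn)
        d1, d2 = yn, d1
    return y

def iir_retimed(x, a):
    """One delay moved across the cut set: adder -> D -> multiplier -> D -> adder."""
    y = []
    reg_y = 0.0                  # register after the adder, holds y(n-1)
    reg_m = 0.0                  # register after the multiplier, holds a*y(n-2)
    for xn in x:
        yn = reg_m + xn
        y.append(yn)
        reg_m, reg_y = a * reg_y, yn
    return y

x = [1.0, 0.0, 2.0, -1.0, 3.0, 0.5]
same_output = iir_original(x, 0.5) == iir_retimed(x, 0.5)
```

In the retimed structure the multiplier and the adder are separated by a register, so under this assumption the longest register-to-register path drops from TM+TA to max(TM, TA).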
*
Timing Diagram
Before retiming
After retiming
*
Feed-back Cut Set Retiming
y(n) = ay(n-1) + x(n)
*
Slowdown + Retiming
Start with
*
Example 3.2.1
Clock cycle time = 4
Clock cycle time = 2
*
Slow Down for Cut-Set Retiming
*
Node Retiming
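Transfer delays through a node in a DFG:
r(v) = # of delays transferred from outgoing edges to incoming
edges of node v
w(e) = # of delays on edge e
wr(e) = # of delays on edge e after retiming
Retiming equation, for an edge e from node u to node v:
wr(e) = w(e) + r(v) – r(u), subject to wr(e) ≥ 0
[Figure: node v before and after retiming, with edge delays D, 2D and 3D]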
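The retiming equation can be applied mechanically to an edge list. The sketch below is illustrative: the function name `retime`, the node labels and the small graph are hypothetical and are not taken from the slide's figure.

```python
def retime(edges, r):
    """Apply wr(e) = w(e) + r(v) - r(u) to every edge (u, v, w)."""
    retimed = []
    for u, v, w in edges:
        wr = w + r.get(v, 0) - r.get(u, 0)   # nodes missing from r have r = 0
        if wr < 0:
            raise ValueError(f"edge {u}->{v} would get {wr} delays")
        retimed.append((u, v, wr))
    return retimed

# Hypothetical node v with one incoming and two outgoing edges
edges = [("u", "v", 1), ("v", "a", 3), ("v", "b", 2)]
retimed_edges = retime(edges, {"v": 1})
```

With r(v) = 1, one delay leaves each outgoing edge of v and one delay is added to its incoming edge.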
*
Invariant Properties
Retiming does NOT change the total number of delays in each
cycle.
Retiming does not change the loop bounds or the iteration bound of the
DFG.
If a constant integer j is added to the retiming value of every node v in
a DFG G, the retimed graph Gr is not affected. That
is, the weights (# of delays) of the retimed graph remain the
same.
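These invariants are easy to check numerically. The sketch below assumes the four-node DFG whose edge delays appear on the "Retiming Example Revisited" slide, and uses one of the retiming solutions given there; the helper names are illustrative.

```python
# Edge delays w(e) keyed by (u, v); the two loops of the example DFG
edges = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
loops = [[1, 3, 2, 1], [1, 4, 2, 1]]

def retime(w, r):
    """wr(e) = w(e) + r(v) - r(u) for every edge (u, v)."""
    return {(u, v): d + r[v] - r[u] for (u, v), d in w.items()}

def loop_delays(w, loop):
    """Total number of delays along a closed path given as a node list."""
    return sum(w[(u, v)] for u, v in zip(loop, loop[1:]))

r = {1: 0, 2: 1, 3: 0, 4: 0}
shifted = {v: val + 5 for v, val in r.items()}   # add the constant j = 5

gr = retime(edges, r)
delays_before = [loop_delays(edges, lp) for lp in loops]
delays_after = [loop_delays(gr, lp) for lp in loops]
```

The r terms cancel around any closed path, which is why the per-loop delay totals match, and a common offset cancels on every edge.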
*
Node Retiming Examples
*
DFG Illustration of the Example
Before retiming:
Iteration bound T∞ = max{(1+2+1)/2, (1+2+1)/3} = 2
Critical path delay = 2+1 = 3 t.u.
After retiming:
Iteration bound T∞ = max{(1+2+1)/2, (1+2+1)/3} = 2
Critical path delay = max{2, 2, 1+1} = 2 t.u.
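The two figures of merit can be recomputed from the graph data. This is a sketch under assumptions: node computing times t(1) = t(2) = 1 and t(3) = t(4) = 2 are inferred from the loop sums above, edge delays before retiming are taken from the "Retiming Example Revisited" slide, and the retimed delays correspond to one of the solutions derived there.

```python
from functools import lru_cache

t = {1: 1, 2: 1, 3: 2, 4: 2}     # assumed node computing times (t.u.)
before = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
after = {(2, 1): 0, (1, 3): 1, (1, 4): 2, (3, 2): 1, (4, 2): 1}
loops = [[1, 3, 2, 1], [1, 4, 2, 1]]

def iteration_bound(w):
    """Maximum over loops of (loop computing time) / (loop delays)."""
    bounds = []
    for loop in loops:
        time = sum(t[v] for v in loop[:-1])
        delays = sum(w[(u, v)] for u, v in zip(loop, loop[1:]))
        bounds.append(time / delays)
    return max(bounds)

def critical_path(w):
    """Longest computing time along any path made of zero-delay edges.
    Assumes the zero-delay subgraph is acyclic."""
    succ = {v: [b for (a, b), d in w.items() if a == v and d == 0] for v in t}

    @lru_cache(maxsize=None)
    def longest(v):
        return t[v] + max((longest(s) for s in succ[v]), default=0)

    return max(longest(v) for v in t)
```

Under these assumptions the iteration bound is 2 for both graphs, while the critical path drops from 3 to 2 t.u.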
*
Retiming for Minimizing Clock Period
Note that retiming will NOT alter the iteration bound T∞.
The iteration bound is the theoretical minimum clock period to execute
the algorithm.
Let edge e connect node u to node v. If the node computing times satisfy
t(u) + t(v) > T∞ and e carries no delay, then the clock period T > T∞. For such an edge,
we require that
wr(e) = w(e) + r(v) – r(u) ≥ 1
To generalize, for any path p from v0 to vk whose computing time exceeds T∞, we have
wr(p) = w(p) + r(vk) – r(v0) ≥ 1
In other words, for any possible critical path p in the DFG whose computing time is
larger than T∞, we require wr(p) ≥ 1.
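The edge-level rule can be turned into a list of inequalities automatically. The sketch below handles single-edge paths only and assumes the example DFG with node computing times t(1) = t(2) = 1, t(3) = t(4) = 2 and a target clock period equal to the iteration bound of 2; the function name is hypothetical.

```python
def retiming_constraints(edges, t, t_target):
    """Return tuples (u, v, w, bound), each meaning w + r(v) - r(u) >= bound."""
    constraints = []
    for (u, v), w in edges.items():
        # An edge whose end nodes together exceed the target needs a delay
        bound = 1 if t[u] + t[v] > t_target else 0
        constraints.append((u, v, w, bound))
    return constraints

t = {1: 1, 2: 1, 3: 2, 4: 2}
edges = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
constraints = retiming_constraints(edges, t, 2)
```

For a general DFG, longer paths whose computing time exceeds the target would need the path form of the constraint as well; in this small example the edge constraints already cover them.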
*
Retiming Example Revisited
Use eq. wr(euv) = w(euv) + r(v) – r(u),
w(e21) + r(1) – r(2) = 1 + r(1) – r(2) ≥ 0
w(e13) + r(3) – r(1) = 1 + r(3) – r(1) ≥ 1
w(e14) + r(4) – r(1) = 2 + r(4) – r(1) ≥ 1
w(e32) + r(2) – r(3) = 0 + r(2) – r(3) ≥ 1
w(e42) + r(2) – r(4) = 0 + r(2) – r(4) ≥ 1
*
Solution continues
Since the retimed graph Gr remains the same if the same constant is added to all node retiming
values, we can set r(1) =
0.
The inequalities become
r(2) ≤ 1, r(3) ≥ 0, r(4) ≥ –1
r(2) – r(3) ≥ 1 or r(3) ≤ r(2) – 1
r(2) – r(4) ≥ 1 or r(2) ≥ r(4) + 1
Since
1 ≥ r(2) ≥ r(3) + 1 ≥ 1,
one must have r(2) = +1.
This implies r(3) ≤ 0. But we also have r(3) ≥ 0. Hence r(3) = 0.
These leave –1 ≤ r(4) ≤ 0.
Hence the two sets of solutions are:
r(1) = r(3) = 0, r(2) = +1, and r(4) = 0 or –1.
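The hand derivation can be cross-checked by brute force. This sketch fixes r(1) = 0 and searches a small integer range for the other retiming values; the range of -3 to 3 is an arbitrary assumption that is wide enough here because the inequalities bound every variable.

```python
from itertools import product

# (u, v, w, bound) means w + r(v) - r(u) >= bound
constraints = [
    (2, 1, 1, 0), (1, 3, 1, 1), (1, 4, 2, 1), (3, 2, 0, 1), (4, 2, 0, 1),
]

solutions = []
for r2, r3, r4 in product(range(-3, 4), repeat=3):
    r = {1: 0, 2: r2, 3: r3, 4: r4}
    if all(w + r[v] - r[u] >= b for u, v, w, b in constraints):
        solutions.append((r2, r3, r4))
```

Each tuple in `solutions` lists (r(2), r(3), r(4)) for a feasible retiming with r(1) = 0.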
*
Systematic Solutions
r(i) – r(j) ≤ k; 1 ≤ i, j ≤ N
Construct a constraint graph:
Map each r(i) to node i. Add a node N+1.
For each inequality
r(i) – r(j) ≤ k,
draw an edge e(j,i) from node j to node i with weight k.
Draw N edges e(N+1,i), i = 1, …, N, from node N+1 to node i, each with weight 0.
The system of inequalities has a solution if and only if the
constraint graph contains no negative cycles.
If a solution exists, one solution is r(i) = the minimum
path length from node N+1 to node i.
Shortest path algorithms: (Appendix A)
Bellman-Ford algorithm
Floyd-Warshall algorithm
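The construction of the constraint graph can be sketched in a few lines. The example uses the retiming inequalities of the four-node DFG rewritten in the form r(i) - r(j) ≤ k; for instance, 1 + r(1) - r(2) ≥ 0 becomes r(2) - r(1) ≤ 1. The function name and the tuple layout are assumptions made for illustration.

```python
def constraint_graph(inequalities, n):
    """inequalities: list of (i, j, k), each meaning r(i) - r(j) <= k.
    Returns directed edges (source, target, weight)."""
    edges = [(j, i, k) for i, j, k in inequalities]       # edge j -> i, weight k
    edges += [(n + 1, i, 0) for i in range(1, n + 1)]     # zero-weight edges from node N+1
    return edges

# Retiming example constraints in "<=" form
inequalities = [(2, 1, 1), (1, 3, 0), (1, 4, 1), (3, 2, -1), (4, 2, -1)]
graph = constraint_graph(inequalities, 4)
```

Any shortest-path routine can then be run from node N+1 on `graph` to obtain one feasible set of r(i).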
*
Bellman-Ford Algorithm
Find the shortest path from an arbitrarily chosen origin node U to each
node in a directed graph if no negative cycle exists.
Given a directed graph:
w(m,n): weight on the edge from node m to node n; w(m,n) = ∞ if there is no edge
from m to n
r(i,j): the length of the shortest path from node U to node i using at most j
edges (j-1 intermediate nodes).
r(i,1) = w(U,i),
r(i,j+1) = min{ r(i,j), min over k of [r(k,j) + w(k,i)] },
j = 1, 2, …, N-1
If max(r(:,N-1) – r(:,N)) > 0, then there is a negative cycle. Else,
r(i,N-1) gives the shortest path length from U to i.
In the example, this maximum is 1 > 0, hence there is at least one negative
cycle.
[Figure: example directed graph with nodes 1–4 and edge weights 1, 1, 2, –3, 1]
spbf.m
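For readers without the MATLAB script, the following is a sketch of the textbook in-place Bellman-Ford relaxation in Python. It is not a translation of spbf.m, and it sets the distance of the origin to 0 rather than using the r(i,1) = w(U,i) initialisation above. The two graphs are hypothetical: the edge directions of the slide's figure did not survive, so the weights are only arranged to give one graph with a negative cycle and one without.

```python
INF = float("inf")

def bellman_ford(n, edges, origin):
    """edges: list of (m, k, weight) for an edge m -> k; nodes are 1..n.
    Returns (dist, has_negative_cycle) for paths starting at origin."""
    dist = {v: INF for v in range(1, n + 1)}
    dist[origin] = 0
    for _ in range(n - 1):                    # N-1 relaxation passes
        for m, k, w in edges:
            if dist[m] + w < dist[k]:
                dist[k] = dist[m] + w
    # One extra pass: any further improvement means a negative cycle
    negative = any(dist[m] + w < dist[k] for m, k, w in edges)
    return dist, negative

# Hypothetical graphs: cycle 1 -> 3 -> 2 -> 1 has length -1 in the first, 0 in the second
with_neg = [(1, 3, -3), (3, 2, 1), (2, 1, 1), (1, 4, 1), (4, 3, 2)]
no_neg = [(1, 3, -3), (3, 2, 1), (2, 1, 2), (1, 4, 1), (4, 3, 2)]
```

The check only detects negative cycles that are reachable from the chosen origin; when a cycle is reported, the returned distances are not meaningful.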
*
Floyd-Warshall Algorithm
Find the shortest paths between all possible pairs of nodes in the graph,
provided no negative cycle exists.
Algorithm:
For k=1 to N
R(k+1)(u,v) = min{R(k)(u,:) + R(k)(:,v)}
If R(k)(u,u) < 0 for any k, u, then a negative cycle exists.
Else, R(N+1)(u,v) is the shortest-path length from u to v.
[Figure: example directed graph with nodes 1–4 and edge weights 2, 1, 2, –3, 1]
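A compact Python sketch of the algorithm follows. It uses the textbook triple-loop update R(u,v) = min{R(u,v), R(u,k) + R(k,v)}, which differs in form from the vectorized update written on the slide, and it initialises the diagonal to infinity so that R(u,u) ends up as the shortest cycle through u. The two graphs are hypothetical stand-ins for the figure.

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """Return (R, has_negative_cycle); R[u][v] is the shortest u -> v path length."""
    R = [[INF] * (n + 1) for _ in range(n + 1)]   # 1-based; row/column 0 unused
    for u, v, w in edges:
        R[u][v] = min(R[u][v], w)
    for k in range(1, n + 1):                     # allow node k as an intermediate
        for u in range(1, n + 1):
            for v in range(1, n + 1):
                if R[u][k] + R[k][v] < R[u][v]:
                    R[u][v] = R[u][k] + R[k][v]
    negative = any(R[u][u] < 0 for u in range(1, n + 1))
    return R, negative

# Hypothetical graphs: cycle 1 -> 3 -> 2 -> 1 has length 0 in the first, -1 in the second
no_neg = [(1, 3, -3), (3, 2, 1), (2, 1, 2), (1, 4, 1), (4, 3, 2)]
with_neg = [(1, 3, -3), (3, 2, 1), (2, 1, 1), (1, 4, 1), (4, 3, 2)]
R, has_cycle = floyd_warshall(4, no_neg)
```

Compared with a single-origin method, this returns all pairwise path lengths at once, at a cost that grows with the cube of the number of nodes.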
*
Retiming Example
[Figure: constraint graph for the retiming example, with nodes 1–5 and edge weights 1, 1, 0, 0, 0, 0, 0, –1, –1]
*
Retiming Example
Floyd-Warshall algorithm
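The retiming example can be solved end to end in a few lines. This sketch assumes the constraint graph obtained by rewriting each retiming inequality as r(i) - r(j) ≤ k (edge j -> i with weight k) plus zero-weight edges from the extra node 5; the variable names are illustrative.

```python
INF = float("inf")
N = 4

# Constraint graph edges (source, target, weight); node 5 is the added node N+1
edges = [(1, 2, 1), (3, 1, 0), (4, 1, 1), (2, 3, -1), (2, 4, -1),
         (5, 1, 0), (5, 2, 0), (5, 3, 0), (5, 4, 0)]

R = [[INF] * (N + 2) for _ in range(N + 2)]       # 1-based; index 0 unused
for u, v, w in edges:
    R[u][v] = w
for k in range(1, N + 2):
    for u in range(1, N + 2):
        for v in range(1, N + 2):
            R[u][v] = min(R[u][v], R[u][k] + R[k][v])

# r(i) is the shortest path length from node N+1 to node i
r = {i: R[N + 1][i] for i in range(1, N + 1)}

# Apply wr(e) = w(e) + r(v) - r(u) to the original DFG edge delays
dfg = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
retimed = {(u, v): w + r[v] - r[u] for (u, v), w in dfg.items()}
```

Adding 1 to every r(i) found this way gives r(1) = r(3) = r(4) = 0 and r(2) = 1, one of the two solutions obtained by hand, and the retimed edge delays are identical for both.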
*
Retiming to Reduce Registers
Register Sharing
When a node has multiple fan-out edges with different numbers of delays,
the registers can be shared so that only the registers of the branch with the max. # of
delays are needed.
Register reduction through node delay transfer from multiple input
edges to output edges (e.g. r(v) > 0).
Should be done only when the clock cycle constraint (if any) is not
violated.
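The saving from register sharing can be counted directly from the edge delays. The sketch below uses a hypothetical node "a" with three fan-out edges carrying 1, 3 and 7 delays; the function name and the graph are assumptions for illustration.

```python
from collections import defaultdict

def register_count(edges, shared=True):
    """Count registers for edges (u, v, w).
    With sharing, each node needs only the max delay over its fan-out edges."""
    fanout = defaultdict(list)
    for u, _, w in edges:
        fanout[u].append(w)
    combine = max if shared else sum
    return sum(combine(ws) for ws in fanout.values())

edges = [("a", "b", 1), ("a", "c", 3), ("a", "d", 7)]
without_sharing = register_count(edges, shared=False)
with_sharing = register_count(edges, shared=True)
```

The shorter branches tap the shared register chain at intermediate points, so the longest branch determines the count.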
*
Time Scaling (Slow Down)
Transforming each delay element (register) D to ND and reducing the
sample frequency N-fold will slow down the computation N
times.
During slow down, the processor clock cycle time remains unchanged.
Only the sampling cycle time is increased.
Provides opportunities for retiming and interleaving.
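The interleaving opportunity can be illustrated with a 2-slow version of a first-order filter. This is a sketch, assuming the filter y(n) = a·y(n-1) + x(n) and two independent input streams that share the slowed-down hardware on alternate clock cycles; the names are hypothetical.

```python
def first_order(x, a):
    """Reference filter y(n) = a*y(n-1) + x(n), with y(-1) = 0."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y

def n_slow(x, a, n):
    """Same structure with D replaced by nD: y(k) = a*y(k-n) + x(k)."""
    y = []
    for k, xk in enumerate(x):
        past = y[k - n] if k >= n else 0.0
        y.append(a * past + xk)
    return y

xa = [1.0, 2.0, 3.0]
xb = [10.0, 20.0, 30.0]
interleaved = [s for pair in zip(xa, xb) for s in pair]   # xa0, xb0, xa1, xb1, ...
out = n_slow(interleaved, 0.5, 2)
```

The even-indexed outputs reproduce the filter response to the first stream and the odd-indexed outputs the response to the second, so the extra delays in the loop are available for retiming without idling the hardware.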