DSP Design
Unfolding
Viktor Öwall, Dept. of Electrical and Information Technology, Lund University, Sweden - www.eit.lth.se
Unfolding
• Unfolding creates a program with more than one iteration, J = unfolding factor
• Unfolding is a structured way to achieve parallel processing
• Applications
  – Sample period reduction, reach T∞
  – Parallel processing
  – Bit-serial and digit-serial
• Unfolding = loop unrolling
  – assembly programming
  – compiler theory
Example: Loop unrolling + Software Pipelining
[Figure: schedule of operations (oper) over clock cycles (CC) for Iteration 1, Iteration 2, Iteration 3 and higher-order iterations]
GSM Speechcoder
• Org. C-code = 250k cc
• Mod. C-code = 90k cc
• Hand Opt. = 50k cc
Unfolding ≡ Parallel Processing
[Figure: DFG with nodes A (1) and B (1) in a loop with 2D, and its 2-unfolded DFG with nodes A0, B0, A1, B1 and one D per loop]
• Original DFG: 2 nodes & 2 edges, T∞ = (1+1)/2 = 1 ut
• 2-unfolded DFG: 4 nodes & 4 edges, T'∞ = 2 ut, so T∞ = 2/2 = 1 ut
  – Iterations 0, 2, 4, …: A0 B0 => A2 B2 => A4 B4 => …
  – Iterations 1, 3, 5, …: A1 B1 => A3 B3 => A5 B5 => …
• In a J-unfolded system each delay is J-slow: if the input to a delay element is x(kJ + m), the output is x((k-1)J + m) = x(kJ + m - J), i.e. the signal is delayed by J samples.
Definitions
⌊x⌋ is the floor of x, the largest integer ≤ x
⌈x⌉ is the ceiling of x, the smallest integer ≥ x
a % b is the remainder after dividing a by b
Algorithm for unfolding
• For each node U in the original DFG, draw the J nodes U_0, U_1, U_2, …, U_{J-1}
• For each edge U → V with w delays in the original DFG, draw the J edges U_i → V_{(i+w)%J} with ⌊(i+w)/J⌋ delays for i = 0, 1, …, J-1
Example: edge U → V with w = 37 delays (37D), J = 4
$$\left\lfloor \frac{i + w}{J} \right\rfloor = \left\lfloor \frac{i + 37}{4} \right\rfloor = \begin{cases} 9, & i = 0, 1, 2 \\ 10, & i = 3 \end{cases}$$
[Figure: 4-unfolded DFG with nodes U0 to U3 and V0 to V3; three edges carry 9D and one edge carries 10D]
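The two steps of the algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `unfold` and the representation of a DFG as a list of `(source, destination, delays)` tuples are assumptions made here, not something defined in the slides.

```python
from typing import List, Tuple

# Hypothetical edge representation: (source node, destination node, delays w)
Edge = Tuple[str, str, int]


def unfold(edges: List[Edge], J: int) -> List[Edge]:
    """Return the edges of the J-unfolded DFG."""
    if J < 1:
        raise ValueError("unfolding factor J must be at least 1")
    unfolded = []
    for src, dst, w in edges:
        for i in range(J):
            # Edge U_i -> V_((i + w) % J) with floor((i + w) / J) delays
            unfolded.append((f"{src}{i}", f"{dst}{(i + w) % J}", (i + w) // J))
    return unfolded


# The example from the slide: U -> V with 37 delays, J = 4
example = unfold([("U", "V", 37)], J=4)
for src, dst, delays in example:
    print(f"{src} -> {dst}: {delays}D")
```

For the slide's example this lists the edges U0 → V1, U1 → V2 and U2 → V3 with 9D each, and U3 → V0 with 10D.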
Properties of unfolding
• Unfolding preserves the number of delays in a DFG:
  ⌊w/J⌋ + ⌊(w+1)/J⌋ + … + ⌊(w + J - 1)/J⌋ = w
• Unfolding preserves precedence constraints
• J-unfolding of a loop with w_l delays in the original DFG leads to gcd(w_l, J) loops in the unfolded DFG. Each loop contains w_l/gcd(w_l, J) delays and J/gcd(w_l, J) copies of each node. (gcd = greatest common divisor)
• Unfolding a DFG with iteration bound T∞ results in a J-unfolded DFG with iteration bound JT∞.
[Figure: a loop through the nodes U, V and T with 12 delays and its 3-unfolded DFG with nodes U0 to U2, V0 to V2, T0 to T2; gcd(12, 3) = 3]
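The delay-preservation and loop-count properties are easy to check numerically. The sketch below uses hypothetical helper names (`unfolded_delays`, `unfolded_loops`); the edge weights 5, 6 and 1 are illustrative values that sum to the 12 loop delays of the gcd(12, 3) example.

```python
from math import gcd


def unfolded_delays(w: int, J: int) -> list:
    """Delays on the J edges that replace one edge with w delays."""
    return [(i + w) // J for i in range(J)]


def unfolded_loops(w_l: int, J: int) -> tuple:
    """(number of loops, delays per loop, copies of each node per loop)."""
    g = gcd(w_l, J)
    return g, w_l // g, J // g


# Each original edge keeps its total number of delays after unfolding
for w in (5, 6, 1):
    print(w, unfolded_delays(w, 3), sum(unfolded_delays(w, 3)))

# A loop with 12 delays, unfolded by 3
print(unfolded_loops(12, 3))  # (3, 4, 1)
```

With 12 loop delays and J = 3 the result is 3 loops, each with 4 delays and 1 copy of every node.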
Unfolding and Iteration Bound
Example: y(n) = a·y(n-9) + x(n), TA = 3, TM = 6
• Original DFG: one loop with 9D, T∞ = 9/9 = 1
• 2-unfolded DFG: gcd(9, 2) = 1, so 1 loop, T∞ = 18/9 = 2
  – y(2k) = a·y(2k-9) + x(2k), edge with 5D
  – y(2k+1) = a·y(2k-8) + x(2k+1), edge with 4D
• But we process 2 samples
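That the 2-unfolded system computes the same output can be checked with a small simulation. This sketch assumes the recursion y(n) = a·y(n-9) + x(n) read from the figure, zero initial conditions, and an even number of input samples; the function names are chosen here for illustration.

```python
def original(x, a):
    """y(n) = a*y(n-9) + x(n), with y(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        prev = y[n - 9] if n >= 9 else 0.0
        y.append(a * prev + x[n])
    return y


def unfolded_by_2(x, a):
    """One iteration k produces two outputs: y(2k) and y(2k+1)."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch expects an even number of samples")
    y = [0.0] * len(x)
    for k in range(len(x) // 2):
        n = 2 * k
        # y(2k) = a*y(2k-9) + x(2k); y(2k-9) = y(2(k-5)+1), i.e. 5 delays
        y[n] = a * (y[n - 9] if n >= 9 else 0.0) + x[n]
        # y(2k+1) = a*y(2k-8) + x(2k+1); y(2k-8) = y(2(k-4)), i.e. 4 delays
        y[n + 1] = a * (y[n - 8] if n >= 8 else 0.0) + x[n + 1]
    return y


x = [float(v) for v in range(1, 21)]
print(original(x, 0.5) == unfolded_by_2(x, 0.5))  # True
```

The unfolded loop has an iteration bound of 2 time units but delivers 2 samples per iteration, so the time per sample is unchanged.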
[Figure: a loop through the nodes A, B and C with 2 delays (D, D) and its 3-unfolded DFG (J = 3) with nodes A0 to A2, B0 to B2, C0 to C2; gcd(2, 3) = 1, gcd = greatest common divisor]
The Critical Path
• An edge with w < J gives (J-w) paths with zero delay and w paths with 1 delay. This can lead to an increased critical path!
• An edge with w >= J will not create a new critical path!
[Figure: DFG with nodes A, B, C and delays D, D, and its unfolded DFG with nodes A0 to A2, B0 to B2, C0 to C2]
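The path counts follow directly from the ⌊(i+w)/J⌋ rule of the unfolding algorithm. A minimal sketch, with the hypothetical helper name `path_counts`:

```python
def path_counts(w: int, J: int) -> dict:
    """Map 'number of delays' -> 'number of unfolded edges' for one edge."""
    counts = {}
    for i in range(J):
        d = (i + w) // J
        counts[d] = counts.get(d, 0) + 1
    return counts


print(path_counts(1, 3))  # w < J:  {0: 2, 1: 1}
print(path_counts(4, 3))  # w >= J: {1: 2, 2: 1}
```

For w = 1 and J = 3 there are J - w = 2 zero-delay edges, which are the ones that can lengthen the critical path; for w = 4 every unfolded edge keeps at least one delay.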
Sample Period Reduction
• Case 1 : A node in the DFG having computation time greater than T∞.
• Case 2 : Iteration bound is not an integer.
• Case 3 : Longest node computation is larger than the iteration bound T∞, and T∞ is not an integer
Sample Period Reduction: case 1
[Figure: DFG of the IIR-filter from Lab1 with input x(n), output y(n), coefficients b1 and b2, nodes P, Q, R, S, T, U with computation times in parentheses, and delay elements D and 2D]
Sample Period Reduction: case 1
The original DFG cannot have a sample period equal to the iteration bound because a node computation time is more than the iteration bound.
$$T_\infty = \max_{l \in L} \left\{ \frac{t_l}{w_l} \right\} = \max \left\{ \frac{6}{2}, \frac{6}{3} \right\} = 3$$
T∞ = 3 < 4, the maximum node computation time
[Figure: the DFG with nodes P, Q, R, S, T, U and delay elements D and 2D]
Sample Period Reduction: case 1
If the computation time of a node 'U', t_u, is greater than the iteration bound T∞, then ⌈t_u/T∞⌉-unfolding should be used.
t_u = 4 and T∞ = 3, so ⌈4/3⌉ = 2-unfolding
[Figure: 2-unfolded DFG with nodes P0, Q0, R0, S0, T0, U0 and P1, Q1, R1, S1, T1, U1; T∞ = 6, but two samples!]
Sample Period Reduction: case 2
The original DFG cannot have a sample period equal to the iteration bound because the iteration bound is not an integer.
$$T_\infty = \max_{l \in L} \left\{ \frac{t_l}{w_l} \right\} = \frac{4}{3}$$
[Figure: loop through the nodes S, T, U and V, each with computation time (1), with delay elements D]
If a critical loop bound is of the form t_l/w_l, where t_l and w_l are mutually co-prime, then w_l-unfolding should be used.
Here: unfolding by 3
Sample Period Reduction: case 2 (2)
[Figure: 3-unfolded DFG with nodes S0, S1, S2, T0, T1, T2, U0, U1, U2, V0, V1, V2, each with computation time (1), and delay elements D]
T∞ = 4 and 3 samples gives minimum sample period 4/3
Sample Period Reduction: case 3
The original DFG cannot have a sample period equal to the iteration bound because the longest node computation is larger than the iteration bound T∞, and T∞ is not an integer.
The minimum J that achieves the iteration bound is the minimum value of J such that JT∞ is an integer and is greater than or equal to the longest node computation time.
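The rule for the minimum J covers all three cases and can be written as a short search. The sketch below is an illustration only; the function name `min_unfolding_factor` is made up here, and the third call uses hypothetical values (T∞ = 4/3, longest node time 6) that do not come from the slides.

```python
from fractions import Fraction


def min_unfolding_factor(t_inf: Fraction, t_max: int) -> int:
    """Smallest J such that J*T_inf is an integer and J*T_inf >= t_max."""
    if t_inf <= 0:
        raise ValueError("iteration bound must be positive")
    J = 1
    while (J * t_inf).denominator != 1 or J * t_inf < t_max:
        J += 1
    return J


print(min_unfolding_factor(Fraction(3), 4))     # case 1 example: 2
print(min_unfolding_factor(Fraction(4, 3), 1))  # case 2 example: 3
print(min_unfolding_factor(Fraction(4, 3), 6))  # hypothetical case 3: 6
```

Using `Fraction` keeps the integer test exact, which a floating-point T∞ such as 1.3333 would not.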
Parallel processing can be performed by unfolding, chapter 3
[Figure: FIR filter with coefficients b0, b1, b2 unfolded by 2; inputs x(2k) and x(2k+1), delayed samples x(2k-1) and x(2k-2), outputs y(2k) and y(2k+1)]
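The 2-parallel FIR structure can be compared with the sequential filter in a few lines. This sketch assumes the 3-tap filter y(n) = b0·x(n) + b1·x(n-1) + b2·x(n-2) suggested by the figure labels, zero input for n < 0, and an even number of samples; all function names are illustrative.

```python
def sample(x, n):
    """x(n), with x(n) = 0 for n < 0."""
    return x[n] if n >= 0 else 0


def fir(x, b):
    """Sequential filter: one output per iteration."""
    return [b[0] * sample(x, n) + b[1] * sample(x, n - 1) + b[2] * sample(x, n - 2)
            for n in range(len(x))]


def fir_unfolded_by_2(x, b):
    """Unfolded filter: iteration k produces y(2k) and y(2k+1)."""
    y = []
    for k in range(len(x) // 2):
        n = 2 * k
        # y(2k) uses x(2k), x(2k-1), x(2k-2)
        y.append(b[0] * sample(x, n) + b[1] * sample(x, n - 1) + b[2] * sample(x, n - 2))
        # y(2k+1) uses x(2k+1), x(2k), x(2k-1)
        y.append(b[0] * sample(x, n + 1) + b[1] * sample(x, n) + b[2] * sample(x, n - 1))
    return y


x = list(range(1, 9))
b = (1, 2, 3)
print(fir(x, b) == fir_unfolded_by_2(x, b))  # True
```

The two output equations share the samples x(2k) and x(2k-1), which is why the unfolded structure needs only two delay elements.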
Another FIR-filter, J=3
Bit-Level Parallel Processing
Two 4-bit words a = a3 a2 a1 a0 and b = b3 b2 b1 b0
• Bit-parallel: all bits a0, a1, a2, a3 and b0, b1, b2, b3 are processed at the same time
• Bit-serial: one bit at a time, a3 a2 a1 a0 and b3 b2 b1 b0
• Digit-serial (digit-size = 2): two bits at a time, a2 a0 / a3 a1 and b2 b0 / b3 b1
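The three formats differ only in how many bits of a word are handled per clock cycle. A small sketch, assuming least-significant-digit-first transmission (an assumption made here, the slide does not state the order) and a hypothetical helper name `to_digits`:

```python
def to_digits(value: int, wordlength: int, digit_size: int) -> list:
    """Split a word into digits of digit_size bits, least significant first."""
    if wordlength % digit_size != 0:
        raise ValueError("wordlength must be a multiple of the digit size")
    mask = (1 << digit_size) - 1
    return [(value >> shift) & mask for shift in range(0, wordlength, digit_size)]


a = 0b1011
print(to_digits(a, 4, 1))  # bit-serial, 4 clock cycles: [1, 1, 0, 1]
print(to_digits(a, 4, 2))  # digit-serial, 2 clock cycles: [3, 2]
print(to_digits(a, 4, 4))  # bit-parallel, 1 clock cycle: [11]
```

The length of the returned list is the number of clock cycles per word, wordlength/digit-size.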
[Figure: adder structures with inputs a_i, b_i, sum outputs s_i, carries c_in and c_out, and delay elements Δ. Panels: Bit-Parallel (bits i, i+1, …, msb), Digit-Serial (bits i, i+1, i+2), Bit-Serial]
Bit-serial adder
Bit-serial can be seen as a time-multiplexed architecture; in this example one addition (i.e. 1 iteration) takes 4 cc.
[Figure: bit-serial adder with serial inputs a3 a2 a1 a0 and b3 b2 b1 b0, serial output s3 s2 s1 s0 and a delay element D. Switch for the carry signal: 0 at time instances 4l+0, the delayed carry at 4l+1,2,3]
How to unfold switches?
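The behaviour of the bit-serial adder and its carry switch can be simulated bit by bit. This is a behavioural sketch, not a hardware description; it assumes least-significant-bit-first streams, and the function names are chosen for illustration.

```python
def to_bits(value: int, W: int = 4) -> list:
    """Bits of a W-bit word, least significant bit first."""
    return [(value >> i) & 1 for i in range(W)]


def bit_serial_add(a_bits, b_bits, W: int = 4) -> list:
    """Add two bit streams, one bit per clock cycle."""
    s_bits = []
    carry = 0  # content of the carry delay element D
    for t, (a, b) in enumerate(zip(a_bits, b_bits)):
        # Switch: carry-in is 0 at time instances W*l + 0, else the stored carry
        c_in = 0 if t % W == 0 else carry
        total = a + b + c_in
        s_bits.append(total & 1)
        carry = total >> 1
    return s_bits


# Two back-to-back additions: 15 + 1 (overflows to 0) and 1 + 1
a_stream = to_bits(15) + to_bits(1)
b_stream = to_bits(1) + to_bits(1)
print(bit_serial_add(a_stream, b_stream))  # [0, 0, 0, 0, 0, 1, 0, 0]
```

Without the switch at 4l+0, the carry left over from the first word would leak into the second addition.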
Unfolding of Switches
• The following assumptions are made when unfolding an edge U → V containing a switch:
  – The wordlength W is a multiple of the unfolding factor J, i.e. W = W'J.
  – All edges into and out of the switch have no delays.
• With the above two assumptions an edge U → V can be unfolded as follows:
  – Write the switching instance as
    Wl + u = J(W'l + ⌊u/J⌋) + (u%J)
  – Draw an edge from the node U_{u%J} to the node V_{u%J}, which is switched at time instance (W'l + ⌊u/J⌋).
[Figure: edge U → V with a switch at time instance Wl + u]
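The rewriting of the switching instance is plain integer arithmetic. The sketch below applies it to a hypothetical case, a switch with wordlength W = 4 unfolded by J = 2; the function name `unfold_switch` and its return format are assumptions made for illustration.

```python
def unfold_switch(W: int, J: int, u: int) -> tuple:
    """Map switching instance W*l + u to (node index, W', offset).

    The unfolded edge U_idx -> V_idx is switched at time instance W'*l + offset.
    """
    if W % J != 0:
        raise ValueError("wordlength W must be a multiple of J")
    if not 0 <= u < W:
        raise ValueError("u must satisfy 0 <= u < W")
    return u % J, W // J, u // J


# Hypothetical example: W = 4, J = 2
for u in range(4):
    idx, w_prime, offset = unfold_switch(4, 2, u)
    print(f"4l+{u}: edge U{idx} -> V{idx}, switched at {w_prime}l+{offset}")
```

In this example the instance 4l+0 maps to the edge U0 → V0 switched at 2l+0, and the instances 4l+1, 4l+2, 4l+3 map to U1 → V1 at 2l+0, U0 → V0 at 2l+1 and U1 → V1 at 2l+1.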