
Digital Signal Processing with FPGA

Apr 28, 2023


Khang Minh

4/2/2013


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY, HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING, DEPARTMENT OF ELECTRONICS ENGINEERING

Ho Chi Minh City, 01/2013

DIGITAL SIGNAL PROCESSING WITH FPGA, Chapter 4: Retiming

Lecturer: Hoàng Trang

Email: [email protected], [email protected]

Thanks to: Mr. Hồ Trung Mỹ. Slides: from the textbook by Parhi


Hoàng Trang, Department of Electronics (BM Điện Tử), DSP-FPGA, chapter 4, 01/2013

Terminology

English – Vietnamese

• Pipelining: tạo đường ống

• Cutset: tập cắt

• Transposed SFG: SFG chuyển vị

• Data broadcast: truyền dữ liệu khắp nơi, phát tán dữ liệu

• Parallel processing: xử lý song song

• Block processing: xử lý khối

• Communication bound: giới hạn truyền thông, thời gian trễ truyền thông


Outline

• Retiming Introduction

• Preliminaries

– Quantitative Description

– Properties of Retiming

– Solving systems of inequalities

• Special Cases

– Cutset Retiming

– Pipelining

• Uses of Retiming

– Retiming for Clock Period Minimization

– Retiming for Register Minimization


4.1 INTRODUCTION

• Retiming is a transformation technique used to

change the locations of delay elements in a circuit

without affecting the input/output characteristics of

the circuit.

• For example, consider the IIR filters in Fig. 4.1(a) and (b). Although these filters have delays at different locations, they have the same input/output characteristics. The two filters can be derived from one another using retiming.


Example:

The filter in Fig. 4.1(a) is described by: [equation]

The filter in Fig. 4.1(b) is described by: [equation]


Applications of Retiming

• Retiming has many applications in synchronous circuit

design. These applications include

– reducing the clock period of the circuit,

– reducing the number of registers in the circuit,

– reducing the power consumption of the circuit, and

– logic synthesis



Applications of Retiming (cont’d)

• Retiming can be used to increase the clock rate of a circuit by

reducing the computation time of the critical path.

• For example:

– The critical path of the filter in Fig. 4.1(a) has computation time TM + TA = 3 u.t. (with TM = 2 u.t. and TA = 1 u.t.), so this filter cannot be clocked with a clock period of less than 3 u.t.

– The critical path of the retimed filter in Fig. 4.1(b) has computation time TA + TA = 2 u.t., so this filter can be clocked with a clock period of 2 u.t.

– By retiming the filter in Fig. 4.1(a) to obtain the filter in Fig. 4.1(b), the clock period has been reduced from 3 u.t. to 2 u.t., or by 33%.

• Retiming can be used to decrease the number of registers in a circuit. The filter in Fig. 4.1(a) uses 4 registers, while the filter in Fig. 4.1(b) uses 5 registers; retiming in the reverse direction, from Fig. 4.1(b) to Fig. 4.1(a), therefore saves one register.

• Since retiming can affect both the clock period and the number of registers, it is sometimes desirable to take both of these parameters into account.


Example:


Retiming

• Retiming is a generalization of pipelining.

• Pipelining is equivalent to introducing many delays at the input, followed by retiming.


4.2 DEFINITIONS AND PROPERTIES

4.2.1 Quantitative Description of Retiming


• Retiming maps a circuit G to a retimed circuit Gr.

• A retiming solution is characterized by a value r(V) for each node V in the graph.

– Let w(e) denote the weight of edge e in graph G, and wr(e) the weight of edge e in graph Gr.

– The weight of an edge e from U to V (U → V) in the retimed graph is computed from its weight in the original graph using

wr(e) = w(e) + r(V) − r(U)

• A retiming solution is feasible if wr(e) ≥ 0 for all edges e.
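The retiming equation and the feasibility test translate directly into a few lines of code. The following is a minimal Python sketch, assuming a DFG stored as a list of (U, V, w(e)) tuples and retiming values stored in a dict; the function names and the three-node example graph are hypothetical.

```python
def retime_edges(edges, r):
    """Apply w_r(e) = w(e) + r(V) - r(U) to every edge e = (U, V, w(e))."""
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]


def is_feasible(edges, r):
    """A retiming is feasible when no retimed edge has a negative delay count."""
    return all(wr >= 0 for (_, _, wr) in retime_edges(edges, r))


# Hypothetical 3-node loop A -> B -> C -> A with 0, 1 and 2 delays.
edges = [("A", "B", 0), ("B", "C", 1), ("C", "A", 2)]
print(retime_edges(edges, {"A": 0, "B": 1, "C": 1}))  # one delay on every edge
print(is_feasible(edges, {"A": 0, "B": -1, "C": 0}))  # False: A -> B would get -1
```

Only differences r(V) − r(U) enter the computation, which is why the absolute values of r do not matter.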

Node Retiming

• Transfer of delays through a node in a DFG:

• r(v) = # of delays transferred from the outgoing edges to the incoming edges of node v

• w(e) = # of delays on edge e

• wr(e) = # of delays on edge e after retiming

• Retiming equation, for an edge e from u to v:

$$w_r(e) = w(e) + r(v) - r(u)$$

subject to wr(e) ≥ 0.

[Figure: node retiming of a node v with r(v) = 2]

• Let p be a path from v0 to vk, v0 →(e0) v1 →(e1) … →(e(k−1)) vk. Then

$$w_r(p) = \sum_{i=0}^{k-1} w_r(e_i) = \sum_{i=0}^{k-1} \big( w(e_i) + r(v_{i+1}) - r(v_i) \big) = w(p) + r(v_k) - r(v_0)$$


Invariant Properties

1. Retiming does NOT change the total number

of delays for each cycle.

2. Retiming does not change the loop bounds or the iteration bound of the DFG.

3. If a constant integer j is added to the retiming value of every node v in a DFG G, the retimed graph Gr is not affected. That is, the weights (# of delays) of the retimed graph remain the same.
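Properties 1 and 3 can be checked numerically on a small graph. This is a minimal sketch, assuming at most one edge per ordered node pair (so that a cycle can be written as a node sequence); the graph, the retiming values and the function names are hypothetical.

```python
def retime_edges(edges, r):
    """w_r(e) = w(e) + r(V) - r(U) for each edge (U, V, w(e))."""
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]


def cycle_delays(edges, cycle):
    """Total delays on a cycle given as [v0, v1, ..., v0]; assumes one edge per node pair."""
    weight = {(u, v): w for (u, v, w) in edges}
    return sum(weight[(a, b)] for a, b in zip(cycle, cycle[1:]))


edges = [("A", "B", 0), ("B", "C", 1), ("C", "A", 2), ("B", "A", 1)]
r = {"A": 0, "B": 1, "C": 1}
retimed = retime_edges(edges, r)

for cycle in (["A", "B", "C", "A"], ["A", "B", "A"]):
    # Property 1: the r terms telescope to zero around a cycle.
    print(cycle, cycle_delays(edges, cycle), cycle_delays(retimed, cycle))

# Property 3: shifting every r(v) by the same constant j gives the same retimed graph.
j = 5
shifted = {v: rv + j for v, rv in r.items()}
print(retime_edges(edges, shifted) == retimed)  # True
```

Property 2 follows from property 1: each loop bound is a loop's computation time divided by its delay count, and neither quantity changes.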


Example:


DFG Illustration of the Example

Original DFG: T∞ = max{(1+2+1)/2, (1+2+1)/3} = 2 u.t.; critical path delay = 2 + 1 = 3 u.t.

Retimed DFG: T∞ = max{(1+2+1)/2, (1+2+1)/3} = 2 u.t.; critical path delay = max{2, 2, 1+1} = 2 u.t.
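The iteration bound on this slide is the maximum, over all loops, of the loop computation time divided by the loop delay count. A small sketch using exact fractions, with the two loops taken from the example (computation time 1 + 2 + 1 with 2 and 3 delays); the function name is hypothetical.

```python
from fractions import Fraction


def iteration_bound(loops):
    """T_inf = max over loops of t_l / w_l, with each loop given as (t_l, w_l)."""
    return max(Fraction(t, w) for (t, w) in loops)


loops = [(1 + 2 + 1, 2), (1 + 2 + 1, 3)]
print(iteration_bound(loops))  # 2
```

Retiming keeps the delay count of every loop, so the same loop list, and hence the same T∞, applies before and after retiming; only the critical path changes (from 3 u.t. to 2 u.t.).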


4.2.2 Properties of Retiming

• The weight of a path p from node v0 to node vk is the number of delays between those nodes:

$$w(p) = \sum_{i=0}^{k-1} w(e_i)$$

• The computation time of a path p from node v0 to node vk is the sum of the computation times (adders, etc.) of each of its nodes:

$$t(p) = \sum_{i=0}^{k} t(v_i)$$

• Properties:

– Retiming does not change the number of delays in a cycle.

– Retiming does not alter the iteration bound of a DFG.

– Adding a constant value j to the retiming value of each node does not change the mapping from G to Gr.


4.3 Solving Systems of Inequalities

• Shortest path algorithms (Appendix A of Parhi's book):

– Bellman-Ford

– Floyd-Warshall

• Given a set of M inequalities in N variables, where each inequality has the form ri − rj ≤ k for integer values of k, one of the shortest path algorithms can be used to determine whether a solution exists and, if so, to find one.

• Procedure:

– 1) Draw the constraint graph:

a) Draw a node i for each of the N variables ri, i = 1, …, N.

b) Draw an additional node N+1.

c) For each inequality ri − rj ≤ k, draw an edge j → i from node j to node i with length k.

d) For each node i, i = 1, 2, …, N, draw an edge N+1 → i from node N+1 to node i with length 0.

– 2) Solve using a shortest path algorithm:

a) The system of inequalities has a solution if and only if the constraint graph contains no negative cycles.

b) If a solution exists, one solution is ri = the minimum path length from node N+1 to node i.
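The two-step procedure can be sketched in Python, here using Bellman-Ford as the shortest path algorithm. This is a minimal sketch, assuming each inequality ri − rj ≤ k is supplied as a tuple (i, j, k) with 1-based variable indices; the function name is hypothetical. The usage line feeds it the five inequalities of the retiming example from these slides.

```python
def solve_difference_constraints(n, inequalities):
    """Solve r_i - r_j <= k for r_1..r_n; return {i: r_i}, or None if infeasible."""
    source = n + 1
    # Step 1: constraint graph. r_i - r_j <= k becomes edge j -> i with length k,
    # and node N+1 gets a length-0 edge to every node i.
    edges = [(j, i, k) for (i, j, k) in inequalities]
    edges += [(source, i, 0) for i in range(1, n + 1)]

    # Step 2: Bellman-Ford from node N+1 (n + 1 nodes, so n relaxation rounds).
    dist = {v: float("inf") for v in range(1, n + 2)}
    dist[source] = 0
    for _ in range(n):
        for (u, v, length) in edges:
            if dist[u] + length < dist[v]:
                dist[v] = dist[u] + length
    # Any further improvement means the constraint graph has a negative cycle.
    if any(dist[u] + length < dist[v] for (u, v, length) in edges):
        return None
    return {i: dist[i] for i in range(1, n + 1)}


example = [(2, 1, 1), (1, 3, 0), (1, 4, 1), (3, 2, -1), (4, 2, -1)]
print(solve_difference_constraints(4, example))  # {1: -1, 2: 0, 3: -1, 4: -1}
```

Any shortest path algorithm that detects negative cycles could replace Bellman-Ford in step 2.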


Bellman-Ford Algorithm

Find the shortest path from an arbitrarily chosen origin node U to each node in a directed graph, if no negative cycle exists.

Given a directed graph:

w(m,n): weight on the edge from node m to node n; w(m,n) = ∞ if there is no edge from m to n.

r(i,j): the length of the shortest path from node U to node i using at most j edges.

r(i,1) = w(U,i),

r(i,j+1) = min over k of {r(k,j) + w(k,i)}, j = 1, 2, …, N−1

If max(r(:,N−1) − r(:,N)) > 0, then there is a negative cycle. Otherwise, r(i,N−1) gives the shortest path length from U to i. In the example, max(r(:,N−1) − r(:,N)) = 1 > 0, hence there is at least one negative cycle.

[Figure: example directed graph with nodes 1 to 4 (including an edge of weight −3), its weight matrix W, and the table r]
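The recurrence above can be written directly as a table of columns r(:,1), …, r(:,N). This is a sketch, assuming 0-based node indices, a weight matrix with zeros on the diagonal and float('inf') for missing edges; as a safeguard, r(i,j) itself is kept as a candidate in the minimum. The function name and the example matrix are hypothetical.

```python
INF = float("inf")


def bellman_ford_table(W, U):
    """
    W[m][n]: weight of edge m -> n (INF if absent, 0 on the diagonal); U: origin node.
    Returns (r(:, N-1), has_negative_cycle), where r(i, j) is the shortest length of a
    path from U to i with at most j edges.
    """
    N = len(W)
    cols = [[W[U][i] for i in range(N)]]            # r(:,1) = w(U,:)
    for _ in range(N - 1):                          # r(:,2) ... r(:,N)
        prev = cols[-1]
        cols.append([min([prev[i]] + [prev[k] + W[k][i] for k in range(N)])
                     for i in range(N)])
    # Equivalent to the test max(r(:,N-1) - r(:,N)) > 0, written with comparisons
    # so that unreachable nodes (inf - inf) cause no trouble.
    has_negative_cycle = any(a > b for a, b in zip(cols[N - 2], cols[N - 1]))
    return cols[N - 2], has_negative_cycle


W = [[0, 1, INF], [INF, 0, -3], [INF, 1, 0]]  # hypothetical: cycle 1 -> 2 -> 1 has length -2
print(bellman_ford_table(W, 0))               # ([0, 1, -2], True)
```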


Floyd-Warshall Algorithm

Find the shortest paths between all possible pairs of nodes in the graph, provided no negative cycle exists.

Algorithm:

Initialization: R(1) = W;

For k = 1 to N

R(k+1)(u,v) = min over all nodes w of {R(k)(u,w) + R(k)(w,v)}

If R(k)(u,u) < 0 for any k and u, then a negative cycle exists. Otherwise, R(N+1)(u,v) is the shortest path length from u to v.

[Figure: example directed graph with nodes 1 to 4 (including an edge of weight −3), its weight matrix W, and the iterates R(2) to R(5)]
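For comparison, below is a sketch of the classic Floyd-Warshall formulation, which at step k allows only node k as a new intermediate node, R(k+1)(u,v) = min{R(k)(u,v), R(k)(u,k) + R(k)(k,v)}; the recurrence on the slide instead minimizes over every intermediate node at each step. Both give the same shortest path lengths when there is no negative cycle. The sketch assumes 0-based nodes, zeros on the diagonal of W, and float('inf') for missing edges; the example matrix is hypothetical.

```python
INF = float("inf")


def floyd_warshall(W):
    """All-pairs shortest path lengths; returns (R, has_negative_cycle)."""
    N = len(W)
    R = [row[:] for row in W]                      # R(1) = W
    for k in range(N):                             # now allow node k as an intermediate
        for u in range(N):
            for v in range(N):
                if R[u][k] + R[k][v] < R[u][v]:
                    R[u][v] = R[u][k] + R[k][v]
    # A negative diagonal entry means some node lies on a negative cycle.
    has_negative_cycle = any(R[u][u] < 0 for u in range(N))
    return R, has_negative_cycle


R, negative = floyd_warshall([[0, 4, 1], [INF, 0, INF], [INF, 2, 0]])
print(R)         # [[0, 3, 1], [inf, 0, inf], [inf, 2, 0]]
print(negative)  # False
```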

Retiming Example – Bellman-Ford Algorithm

• For the retiming example:

– r(2) – r(1) ≤ 1

– r(1) – r(3) ≤ 0

– r(1) – r(4) ≤ 1

– r(3) – r(2) ≤ –1

– r(4) – r(2) ≤ –1

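• Bellman-Ford algorithm for shortest path, with node 5 as the origin:

[Figure: constraint graph with nodes 1 to 5; edges 1→2 (1), 3→1 (0), 4→1 (1), 2→3 (−1), 2→4 (−1), and 5→i (0) for i = 1, …, 4]

$$W = \begin{bmatrix} 0 & 1 & \infty & \infty & \infty \\ \infty & 0 & -1 & -1 & \infty \\ 0 & \infty & 0 & \infty & \infty \\ 1 & \infty & \infty & 0 & \infty \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad R = \begin{bmatrix} 0 & 0 & -1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & -1 & -1 \\ 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Column j of R is r(:,j). The last column gives the retiming solution r(1) = −1, r(2) = 0, r(3) = −1, r(4) = −1.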


Retiming Example – Floyd-Warshall algorithm

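• Floyd-Warshall algorithm:

$$W = R^{(1)} = \begin{bmatrix} 0 & 1 & \infty & \infty & \infty \\ \infty & 0 & -1 & -1 & \infty \\ 0 & \infty & 0 & \infty & \infty \\ 1 & \infty & \infty & 0 & \infty \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad R^{(2)} = \begin{bmatrix} 0 & 1 & 0 & 0 & \infty \\ -1 & 0 & -1 & -1 & \infty \\ 0 & 1 & 0 & \infty & \infty \\ 1 & 2 & \infty & 0 & \infty \\ 0 & 0 & -1 & -1 & 0 \end{bmatrix}$$

$$R^{(3)} = R^{(4)} = R^{(5)} = R^{(6)} = \begin{bmatrix} 0 & 1 & 0 & 0 & \infty \\ -1 & 0 & -1 & -1 & \infty \\ 0 & 1 & 0 & 0 & \infty \\ 1 & 2 & 1 & 0 & \infty \\ -1 & 0 & -1 & -1 & 0 \end{bmatrix}$$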
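The slide's recurrence, R(k+1)(u,v) = min over w of {R(k)(u,w) + R(k)(w,v)}, is a min-plus product of R(k) with itself, so it can be run directly on the retiming example. A minimal sketch, assuming the constraint-graph weight matrix W built from the five inequalities of this example (edge j → i of length k for ri − rj ≤ k, plus length-0 edges from node 5), with nodes 1 to 5 stored at indices 0 to 4; the function name is hypothetical.

```python
INF = float("inf")


def min_plus_square(R):
    """One step of the slide's recurrence: R'(u,v) = min over w of R(u,w) + R(w,v)."""
    N = len(R)
    return [[min(R[u][w] + R[w][v] for w in range(N)) for v in range(N)]
            for u in range(N)]


# Constraint graph of the retiming example; index 4 is the extra node 5.
W = [
    [0, 1, INF, INF, INF],
    [INF, 0, -1, -1, INF],
    [0, INF, 0, INF, INF],
    [1, INF, INF, 0, INF],
    [0, 0, 0, 0, 0],
]

R = [W]                                    # R[0] is R(1) = W
for _ in range(5):                         # N = 5 steps give R(2) ... R(6)
    R.append(min_plus_square(R[-1]))

final = R[-1]
print(final[4][:4])                        # shortest paths from node 5: [-1, 0, -1, -1]
```

The printed row is the same retiming solution that the Bellman-Ford run produces, and no diagonal entry of the final matrix is negative.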


4.4 RETIMING TECHNIQUES

• This section considers some techniques used for

retiming:

– First, two special cases of retiming, namely cutset retiming and pipelining, are considered.

– Then, two algorithms are considered: retiming to minimize the clock period, and retiming to minimize the number of registers required to implement the circuit.


4.4.1 Cutset Retiming and Pipelining

Cutset Retiming


Single Node Subgraph Cutset Retiming


Pipelining


Fig. 4.6 (a) The unretimed DFG with a cutset shown as a dashed line. (b) The two graphs G1 and G2 formed by removing the edges in the cutset. (c) The graph obtained by cutset retiming with k = 2.
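Cutset retiming adds k delays to each edge from G1 to G2 and removes k delays from each edge from G2 to G1; this is the special case of the retiming equation with r = k for every node in G2 and r = 0 for every node in G1. A minimal sketch, assuming edges stored as (U, V, w(e)) tuples; the function name and the 4-node loop are hypothetical.

```python
def cutset_retime(edges, G2, k):
    """
    Cutset retiming of edges (U, V, w(e)) with G2 a set of nodes (G1 = all others):
    equivalent to r(V) = k for V in G2 and r(U) = 0 for U in G1.
    """
    def r(node):
        return k if node in G2 else 0

    retimed = [(u, v, w + r(v) - r(u)) for (u, v, w) in edges]
    # Edges from G2 to G1 lose k delays, so each must have had at least k.
    if any(w < 0 for (_, _, w) in retimed):
        raise ValueError("infeasible: an edge from G2 to G1 has fewer than k delays")
    return retimed


# Hypothetical loop 1 -> 2 -> 3 -> 4 -> 1; the cutset separates {1, 2} from {3, 4}.
edges = [(1, 2, 0), (2, 3, 0), (3, 4, 0), (4, 1, 2)]
print(cutset_retime(edges, G2={3, 4}, k=1))  # [(1, 2, 0), (2, 3, 1), (3, 4, 0), (4, 1, 1)]
```

Edges inside G1 or inside G2 are untouched, because both end nodes receive the same retiming value.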


Lattice Filter


N-Slow Down

• Cutset retiming is often used in combination with slow-down.

• The procedure is to first replace each delay in the DFG with N delays to create an N-slow version of the DFG, and then to perform cutset retiming on the N-slow DFG.


Time Scaling (Slow Down)

• Transforming each delay element (register) D into ND and reducing the sample frequency by a factor of N slows down the computation N times.

• During slow-down, the processor clock cycle time remains unchanged; only the sampling period increases.

• This provides opportunities for retiming and interleaving.

[Figure: the original DFG (adder, multipliers, delay D) and its 2-slow version (delay 2D); in the 2-slow version, successive samples x(1), x(2), x(3) and y(1), y(2), y(3) are separated by idle slots]
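The effect of slow-down can be simulated on a simple recursion. The sketch below assumes a hypothetical first-order IIR filter y(n) = a·y(n−1) + x(n); replacing its feedback delay D by ND gives y(m) = a·y(m−N) + x(m), which computes N independent filters whose samples are interleaved. With N = 2, the even-indexed outputs reproduce the original filter, and the idle slots can either stay empty (zero input) or carry a second stream. The function names are hypothetical.

```python
def slowed_iir(x, a, N=1):
    """y(m) = a*y(m-N) + x(m): a first-order IIR whose feedback delay D became N*D."""
    y = []
    for m, xm in enumerate(x):
        feedback = y[m - N] if m >= N else 0.0
        y.append(a * feedback + xm)
    return y


def interleave(*streams):
    """Merge equal-length streams sample by sample: s1[0], s2[0], s1[1], s2[1], ..."""
    return [sample for group in zip(*streams) for sample in group]


x = [1.0, 2.0, 3.0]
z = [4.0, 0.0, 1.0]
y = slowed_iir(x, 0.5)                         # original 1-slow filter
y2 = slowed_iir(interleave(x, z), 0.5, N=2)    # 2-slow filter, two streams
print(y)         # [1.0, 2.5, 4.25]
print(y2[0::2])  # same as y
print(y2[1::2])  # same as slowed_iir(z, 0.5)
```

The 2-slow version needs two clock cycles per sample of each stream, which matches the slide: the clock is unchanged, and only the sampling period grows.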


Retiming of N-Slow Down with Cutset Retiming

4.4.2 Retiming for Clock Period Minimization

• In previous lectures, we learned how to calculate the iteration bound of a DFG.

– The iteration bound is a lower bound on the clock period of a recursive DFG.

• Retiming for clock period minimization is the tool used to make the clock period of a recursive DFG equal to the iteration bound, or as close to it as possible.


Retiming for Clock Period Minimization cont’d

• The minimum feasible clock period is the computation time of the critical path, which is the path with the longest computation time among all paths with no delays. This minimum clock period is denoted Φ(G).

• We want to find a retiming solution r0 such that Φ(Gr0) ≤ Φ(Gr) for any other retiming solution r. In other words, we want the retiming solution with the minimum clock period.

• Nomenclature:

– W(U,V) = minimum number of registers on any path from node U to node V

– D(U,V) = maximum computation time among all paths from U to V with weight W(U,V)

$$\Phi(G) = \max\{t(p) : w(p) = 0\}$$

$$W(U,V) = \min_p \{w(p) : U \overset{p}{\rightsquigarrow} V\}$$

$$D(U,V) = \max_p \{t(p) : U \overset{p}{\rightsquigarrow} V \text{ and } w(p) = W(U,V)\}$$
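Φ(G) can be computed as a longest-path problem over the zero-delay edges only. A minimal sketch, assuming node computation times in a dict, edges as (U, V, w(e)) tuples, and an acyclic zero-delay subgraph (a loop without delays could not be computed); the function name and the 3-node graph are hypothetical.

```python
from functools import lru_cache


def critical_path(t, edges):
    """Phi(G) = max{t(p) : w(p) = 0}, the longest zero-delay path measured in node time."""
    succ = {v: [] for v in t}
    for (u, v, w) in edges:
        if w == 0:                      # only delay-free edges chain combinationally
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # t(u) plus the best zero-delay continuation starting at u (0 if none)
        return t[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in t)


t = {0: 1, 1: 2, 2: 1}
print(critical_path(t, [(0, 1, 0), (1, 2, 0), (2, 0, 2)]))  # 4: path 0 -> 1 -> 2
print(critical_path(t, [(0, 1, 1), (1, 2, 1), (2, 0, 0)]))  # 2
```

The second edge list is the same loop with its delays redistributed; the loop still carries 2 delays, so the iteration bound (1 + 2 + 1)/2 = 2 is unchanged while Φ drops from 4 to 2.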

Algorithm for Retiming for Clock Period Minimization

• First construct W(U,V) and D(U,V):

– 1) Let M = tmax·n, where tmax is the maximum computation time of the nodes in G and n is the number of nodes in G.

– 2) Form a new graph G' that is the same as G except that the edge weights are replaced by w'(e) = M·w(e) − t(U) for every edge e: U → V.

– 3) Solve the all-pairs shortest path problem on G' (using Floyd-Warshall, for example). Let S'UV be the shortest path length from U to V.

– 4) If U ≠ V, then W(U,V) = ⌈S'UV/M⌉ and D(U,V) = M·W(U,V) − S'UV + t(V). If U = V, then W(U,V) = 0 and D(U,V) = t(U). Here ⌈·⌉ is the ceiling function.

• Use W(U,V) and D(U,V) to determine whether there is a retiming solution that achieves a desired clock period c.

– This desired clock period is usually set equal to the iteration bound of the circuit.
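Steps 1 to 4 can be sketched as follows. This assumes integer computation times, 0-based node indices and edges as (U, V, w(e)) tuples, and it uses Floyd-Warshall for step 3; the function name and the 3-node loop in the usage example are hypothetical. The results can be checked by hand: for instance, the only path from node 1 to node 0 is 1 → 2 → 0, with 2 delays and computation time 2 + 1 + 1 = 4.

```python
INF = float("inf")


def compute_W_D(t, edges):
    """Return the W and D matrices of a DFG (None where V is unreachable from U)."""
    n = len(t)
    M = max(t) * n                                  # step 1: M = tmax * n
    S = [[INF] * n for _ in range(n)]
    for u in range(n):
        S[u][u] = 0
    for (u, v, w) in edges:                         # step 2: w'(e) = M*w(e) - t(U)
        S[u][v] = min(S[u][v], M * w - t[u])
    for k in range(n):                              # step 3: all-pairs shortest paths
        for u in range(n):
            for v in range(n):
                if S[u][k] + S[k][v] < S[u][v]:
                    S[u][v] = S[u][k] + S[k][v]
    W = [[None] * n for _ in range(n)]
    D = [[None] * n for _ in range(n)]
    for u in range(n):                              # step 4
        for v in range(n):
            if u == v:
                W[u][v], D[u][v] = 0, t[u]
            elif S[u][v] < INF:
                W[u][v] = -(-S[u][v] // M)          # ceil(S'UV / M) in integer arithmetic
                D[u][v] = M * W[u][v] - S[u][v] + t[v]
    return W, D


# Hypothetical loop 0 -> 1 -> 2 -> 0 with t = (1, 2, 1) and 0, 0, 2 delays.
W, D = compute_W_D([1, 2, 1], [(0, 1, 0), (1, 2, 0), (2, 0, 2)])
print(W)  # [[0, 0, 0], [2, 0, 0], [2, 2, 0]]
print(D)  # [[1, 3, 4], [4, 2, 3], [2, 4, 1]]
```

Scaling delays by M makes the path weight dominate the shortest path comparison, while the −t(U) terms break ties in favor of the path with the largest computation time.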


Algorithm for Retiming for Clock Period Minimization (cont'd)

– Given a desired clock period c, there is a feasible retiming solution r such that Φ(Gr) ≤ c if the following constraints hold:

• CONSTRAINT 1 (feasibility): r(U) − r(V) ≤ w(e) for every edge e: U → V of G.

– This forces the number of delays on each edge of the retimed graph to be nonnegative.

• CONSTRAINT 2 (critical path): r(U) − r(V) ≤ W(U,V) − 1 for all vertices U, V in G such that D(U,V) > c.

– This enforces Φ(Gr) ≤ c.

• Thus, to find a solution:

1) Pick a value of c (usually equal to the iteration bound).

2) Create a set of inequalities based on the feasibility constraint.

3) Create a set of inequalities based on the critical path constraint.

4) Combine these (using the most restrictive bound where they overlap) and create a constraint graph.

5) Check feasibility using a shortest path algorithm (e.g., Floyd-Warshall) and find the retiming values.
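Steps 1 to 5 can be sketched as follows, here using Bellman-Ford from an extra node as the shortest path algorithm instead of Floyd-Warshall. This assumes 0-based nodes, edges as (U, V, w(e)) tuples, and W and D already computed; the function name is hypothetical, and the W and D below belong to a hypothetical 3-node loop with t = (1, 2, 1) and 0, 0, 2 delays on the edges 0 → 1, 1 → 2, 2 → 0, whose iteration bound is (1 + 2 + 1)/2 = 2.

```python
INF = float("inf")


def retime_for_clock_period(n, edges, W, D, c):
    """Return retiming values r[0..n-1] with Phi(Gr) <= c, or None if none exist."""
    bound = {}                                   # (U, V) -> bound on r(U) - r(V)

    def add(u, v, b):                            # step 4: keep the most restrictive bound
        bound[(u, v)] = min(b, bound.get((u, v), INF))

    for (u, v, w) in edges:                      # step 2: CONSTRAINT 1 (feasibility)
        add(u, v, w)
    for u in range(n):                           # step 3: CONSTRAINT 2 (critical path)
        for v in range(n):
            if D[u][v] is not None and D[u][v] > c:
                add(u, v, W[u][v] - 1)

    # Constraint graph: r(U) - r(V) <= b becomes edge V -> U of length b,
    # plus a length-0 edge from the extra node n to every node.
    graph = [(v, u, b) for (u, v), b in bound.items()]
    graph += [(n, u, 0) for u in range(n)]

    dist = [INF] * n + [0]                       # step 5: Bellman-Ford from node n
    for _ in range(n):
        for (a, b, length) in graph:
            if dist[a] + length < dist[b]:
                dist[b] = dist[a] + length
    if any(dist[a] + length < dist[b] for (a, b, length) in graph):
        return None                              # negative cycle: clock period c is infeasible
    return dist[:n]


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
W = [[0, 0, 0], [2, 0, 0], [2, 2, 0]]
D = [[1, 3, 4], [4, 2, 3], [2, 4, 1]]
r = retime_for_clock_period(3, edges, W, D, c=2)
print(r)                                                 # [-2, -1, 0]
print([(u, v, w + r[v] - r[u]) for (u, v, w) in edges])  # [(0, 1, 1), (1, 2, 1), (2, 0, 0)]
print(retime_for_clock_period(3, edges, W, D, c=1))      # None
```

In the retimed loop, the longest zero-delay path is 2 → 0 (or node 1 alone), with computation time 2 = c. A request for c = 1 fails because node 1 alone already takes 2 u.t.: its D(1,1) > c produces the self-constraint r(1) − r(1) ≤ −1, a negative cycle.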


Retiming to Reduce Registers

• Register sharing

When a node has multiple fan-out edges with different numbers of delays, the registers can be shared, so that only the registers required by the branch with the maximum number of delays are needed.

• Register reduction through node delay transfer from multiple input edges to output edges (e.g., r(v) < 0)

• This should be done only when the clock period constraint (if any) is not violated.

[Figure: delay reduction, with delays D on the input edges of a node moved to its output edge]

4.4.3 Retiming for Register Minimization

[Figure: register sharing. (a) Usage: 1 + 3 + 7 = 11 registers; (b) usage: 1 + 2 + 4 = 7 registers]
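The register counts in the figure follow from a simple rule: without sharing, each fan-out branch needs its own delays, so the counts add up; with sharing, a single delay chain as long as the largest branch is tapped by every branch. A small sketch with hypothetical function names:

```python
def registers_without_sharing(branch_delays):
    """Each fan-out branch keeps its own registers."""
    return sum(branch_delays)


def registers_with_sharing(branch_delays):
    """One shared chain, as long as the branch with the most delays, feeds every branch."""
    return max(branch_delays, default=0)


def shared_chain_segments(branch_delays):
    """Registers added between successive taps of the shared chain."""
    taps = sorted(set(branch_delays))
    return [b - a for a, b in zip([0] + taps, taps)]


branches = [1, 3, 7]
print(registers_without_sharing(branches))  # 11
print(registers_with_sharing(branches))     # 7
print(shared_chain_segments(branches))      # [1, 2, 4]
```

The segments [1, 2, 4] are the terms of the "1 + 2 + 4 = 7" count in panel (b).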


Retiming for General DFG


Example:


Other Applications of Retiming

• Retiming for folding (Chapter 6)

• Retiming for power reduction (Chapter 17)

• Retiming for logic synthesis (beyond the scope of this class)

• Multi-rate/multi-dimensional retiming (Denk/Parhi, Trans. VLSI, Dec. 98, Jun. 99)


End of Chapter 4