ECE 565 High-Level Synthesis—An Introduction Shantanu Dutt ECE Dept., UIC

Dec 21, 2015


Mitchell Harris
Page 1

ECE 565 High-Level Synthesis: An Introduction

Shantanu Dutt

ECE Dept., UIC

Page 2

HLS Flow

• Code/Algorithm → Architecture (interconnected functional units (FUs) and memory units (MUs), connected via muxes, demuxes, tristate buffers, buses, and dedicated interconnects)

Classically, these 3 stages were performed sequentially, but they are now performed together, which leads to better optimization.

Page 3

HLS Flow (contd)

Page 4

HLS Flow (contd)

Allocation: Simple counting of FUs after the above 2 stages

(Binding)

Page 5

Simple HLS Examples

[Figure not recovered in the transcript; only a "+" operator label survives.]

Page 6

Simple HLS Examples (contd)

2) Mapping to h/w w/ constraints: use only 1 multiplier (X) and 1 adder (+), w/ X delay of 2 cc’s and + delay of 1 cc

i) Non-overlapped pipelined scheduling

(a) Scheduling
[Figure: schedule chart over cc’s 1 to 6. X row: c1(1), c1(2). + row: c2(1), c3(1), c2(2), c3(2).]

(b) Arch. Synthesis
[Figure: datapath with registers a, b, c, d, x, y, z (load signals lda, ldb, ldc, ldd, ldx, ldy, ldz), FUs X and +, two muxes (select signals mux1, mux2; inputs I0, I1) and one demux (select signal demux; outputs O0, O1).]

(c) Controller FSM Synthesis
[Figure: controller FSM, entered on Reset, with states labeled cc 3i, cc 3(i+1) and cc 3(i+2).]
• Register transfers: [x ← a X b](c1), [y ← c+d](c2), [z ← x+y](c3)
• Control-signal groups shown on the FSM: (lda=1, ldb=1, ldc=1, ldd=1), (mux1=0, mux2=0, demux=0, ldy=1), (ldx=1), (mux1=1, mux2=1, demux=1, ldz=1)
• lda = 1 → reg. “a” loaded

Note: A register is loaded at the +ve/-ve edge (in a +ve/-ve edge-triggered system) of the cc after the one in which its load signal is asserted.

Note: Unspecified control signals have either an inactive value or, if no such concept exists for the control signal, the don’t-care value.
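The non-overlapped controller can be mimicked in software. The following is a minimal sketch, not the slide's exact FSM: the assignment of control words to the three states, the mux input order (I0 = c, d and I1 = x, y), and the names `CONTROL`, `step` and `run_non_overlapped` are assumptions inferred from the register-transfer labels. The 2-cc multiplier is modeled simply as a result that is latched by ldx in the second state.

```python
# Assumed control word per FSM state (signals not listed are inactive).
CONTROL = [
    {"mux1": 0, "mux2": 0, "demux": 0, "ldy": 1},  # y <- c + d
    {"ldx": 1},                                     # x <- a X b (2nd multiplier cc)
    {"mux1": 1, "mux2": 1, "demux": 1, "ldz": 1},  # z <- x + y
]

def step(regs, sig):
    """Apply one control word; return the register values after the clock edge."""
    nxt = dict(regs)
    in1 = regs["x"] if sig.get("mux1", 0) else regs["c"]  # mux1: I0 = c, I1 = x (assumed)
    in2 = regs["y"] if sig.get("mux2", 0) else regs["d"]  # mux2: I0 = d, I1 = y (assumed)
    total = in1 + in2                                      # the single adder
    if sig.get("ldy") and sig.get("demux", 0) == 0:
        nxt["y"] = total                                   # demux output O0 feeds y
    if sig.get("ldz") and sig.get("demux", 0) == 1:
        nxt["z"] = total                                   # demux output O1 feeds z
    if sig.get("ldx"):
        nxt["x"] = regs["a"] * regs["b"]                   # the single multiplier
    return nxt

def run_non_overlapped(inputs):
    """inputs: list of (a, b, c, d). Returns (list of z values, total cc's)."""
    regs = dict(a=0, b=0, c=0, d=0, x=0, y=0, z=0)
    outputs, cycles = [], 0
    for a, b, c, d in inputs:
        regs.update(a=a, b=b, c=c, d=d)  # input loads (lda..ldd) modeled as already done
        for sig in CONTROL:
            regs = step(regs, sig)
            cycles += 1
        outputs.append(regs["z"])
    return outputs, cycles

print(run_non_overlapped([(2, 3, 4, 5), (1, 1, 1, 1)]))  # ([15, 3], 6)
```

Each iteration costs exactly 3 cc’s here because the next multiplication does not start until z has been produced.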

Page 7

Simple HLS Examples (contd)

2) Mapping to h/w w/ constraints: use only 1 multiplier (X) and 1 adder (+) (cont’d)

ii) Overlapped pipelined scheduling

(a) Scheduling
[Figure: schedule chart over cc’s 1 to 6. X row: c1(1), c1(2). + row: c2(1), c3(1), c2(2), c3(2).]

(b) Arch. Synthesis
[Figure: datapath with registers a, b, c, d, x, y, z (load signals lda, ldb, ldc, ldd, ldx, ldy, ldz), FUs X and +, two muxes (select signals mux1, mux2; inputs I0, I1) and one demux (select signal demux).]

(c) Controller FSM Synthesis
[Figure: controller FSM, entered on Reset, with states labeled cc 3i and cc 3(i+1).]
• Register transfers: [y ← c+d, x ← a X b](c1, c2), [z ← x+y](c3)
• Control-signal groups shown on the FSM: (lda=1, ldb=1, mux1=0, mux2=0, demux=0, ldy=1, ldx=1), (ldc=1, ldd=1, mux1=1, mux2=1, demux=1, ldz=1)

• For 4 iterations, the overlapped schedule takes 9 cc’s versus 12 cc’s for the non-overlapped schedule.
• Overlapped sched.: Time for n iterations = 2n+1; Throughput = n/(2n+1) ~ 0.5 outputs/cc
• Non-overlapped sched.: Time for n iterations = 3n; Throughput = n/3n ~ 0.33 outputs/cc
• Going from ~0.33 to ~0.5 outputs/cc is a ~50% throughput improvement with the overlapped schedule (equivalently, ~33% fewer cc’s per output).
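The cycle counts and throughputs quoted for the two schedules can be checked numerically. This is a small sketch that uses only the formulas stated on the slide (3n and 2n+1 cc’s); the function names are illustrative.

```python
def non_overlapped_cycles(n):
    """Time for n iterations with the non-overlapped schedule: 3n cc's."""
    return 3 * n

def overlapped_cycles(n):
    """Time for n iterations with the overlapped schedule: 2n + 1 cc's."""
    return 2 * n + 1

def throughput(n, cycles):
    """Outputs produced per clock cycle."""
    return n / cycles

def throughput_gain(n):
    """Ratio of overlapped to non-overlapped throughput for n iterations."""
    return throughput(n, overlapped_cycles(n)) / throughput(n, non_overlapped_cycles(n))

for n in (4, 100, 10**6):
    print(n, overlapped_cycles(n), non_overlapped_cycles(n), round(throughput_gain(n), 3))
```

The gain ratio is 3n/(2n+1), which approaches 1.5 as n grows; for the 4-iteration case it is 12/9, about 1.33.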

Page 8

Simple HLS Examples (contd)

• Some DFG control operation nodes:
  Distributor: one data input (in), a Condition (T/F) input, and two outputs (out1, out2), labeled T and F
  Selector: two data inputs (in1, in2), labeled T and F, a Condition (T/F) input, and one output (out)

• Conditional code: If (a > b) then c ← a-b; Else c ← b-a;

• Possible DFGs corresponding to the above conditional code: [figure not recovered]
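Since the DFG figures for the conditional code did not survive, here is a minimal behavioral sketch of the two control nodes and of two plausible DFG shapes. The mapping of T to out1/in1 and F to out2/in2 is read off the label layout and is an assumption, as are all function names; `None` stands for an output that carries no value.

```python
def distributor(cond, value):
    """Route `value` to out1 when cond is True (T), else to out2 (F)."""
    return (value, None) if cond else (None, value)

def selector(cond, in1, in2):
    """Pass in1 to the output when cond is True (T), else in2 (F)."""
    return in1 if cond else in2

def dfg_select_only(a, b):
    """DFG shape 1: compute both differences, then select one (two subtractions)."""
    cond = a > b
    return selector(cond, a - b, b - a)

def dfg_distribute_then_select(a, b):
    """DFG shape 2: steer the operands so that only one subtraction fires."""
    cond = a > b
    a_t, a_f = distributor(cond, a)
    b_t, b_f = distributor(cond, b)
    c_t = a_t - b_t if cond else None      # T branch: c <- a - b
    c_f = b_f - a_f if not cond else None  # F branch: c <- b - a
    return selector(cond, c_t, c_f)

print(dfg_select_only(7, 3), dfg_distribute_then_select(3, 7))  # 4 4
```

The first shape costs more operations but has no control dependence before the subtractions; the second activates only the branch that is needed.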

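Page 9

Simple HLS Examples (contd)

• Iterative code: while (a > b) a ← a-b;

[Figure: DFG for the loop, with nodes sel, >, dist and -; inputs a and b; T/F branches on the sel and dist nodes; an annotation “Initialized to F”; output “final a”.]

(a) Scheduling (using only 1 adder/sub)
[Figure: scheduling & binding of c1 and c2 on the single + unit over the cc’s.]

(b) Arch. Synthesis
[Figure: datapath with registers a, b, r1 and “final a” (load signals lda, ldb, ldr1, ldfina), a mux, a demux, and one adder with cin = 1.]
• b’+1 = 2’s compl. representation of -b, so the adder computes a + b’ + 1 = a - b
• s xor ovfl = 1: result is -ve; = 0: result is +ve (this signal goes to the FSM)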
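The adder annotations (b’ + 1 with cin = 1, and sign xor overflow as the negative flag) can be exercised with a bit-level sketch. The 8-bit word width, the helper names, the decision to test a > b as "b - a is negative", and the restriction to b > 0 (so the loop terminates) are all assumptions made for illustration; the slide does not state them.

```python
WIDTH = 8                      # assumed word width
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

def encode(v):
    """Two's-complement bit pattern of a small signed integer."""
    return v & MASK

def decode(p):
    """Signed value of a WIDTH-bit two's-complement pattern."""
    return p - (1 << WIDTH) if p & SIGN else p

def sub_with_flags(a, b):
    """Compute a - b as a + b' + 1 on bit patterns; return (result, negative)."""
    result = (a + (~b & MASK) + 1) & MASK      # b' is the bitwise complement, cin = 1
    s = 1 if result & SIGN else 0              # sign bit of the result
    sa, sb = bool(a & SIGN), bool(b & SIGN)
    # Subtraction overflows when the operand signs differ and the result's sign
    # differs from a's sign.
    ovfl = 1 if (sa != sb and bool(s) != sa) else 0
    return result, s ^ ovfl                    # s xor ovfl = 1 means a - b < 0

def run_loop(a, b):
    """while (a > b) a <- a - b, using the one adder/sub for both steps (b > 0 assumed)."""
    a_p, b_p = encode(a), encode(b)
    while sub_with_flags(b_p, a_p)[1]:         # b - a < 0  <=>  a > b
        a_p, _ = sub_with_flags(a_p, b_p)      # a <- a - b
    return decode(a_p)

print(run_loop(17, 5))  # 2
```

Using s xor ovfl rather than the raw sign bit keeps the comparison correct even when the subtraction overflows, for example 100 - (-100) in 8 bits.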

Page 10

Delay Nodes in DFGs

A delay node is generally implemented as a register; a delay node thus becomes a state variable.
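To illustrate a delay node acting as a state variable, the sketch below uses a hypothetical DFG, y[n] = x[n] + x[n-1], which is not taken from the slides. The class name `DelayNode` and the initial value of 0 are assumptions.

```python
class DelayNode:
    """A one-sample delay implemented as a register (a state variable)."""

    def __init__(self, initial=0):
        self.state = initial

    def read(self):
        """Value stored on the previous clock edge."""
        return self.state

    def clock(self, value):
        """Load a new value at the clock edge."""
        self.state = value

def run_dfg(samples):
    """Evaluate y[n] = x[n] + x[n-1] with a single delay node holding x[n-1]."""
    delay = DelayNode(initial=0)
    outputs = []
    for x in samples:
        outputs.append(x + delay.read())  # adder reads the current input and the register
        delay.clock(x)                    # register captures x for the next iteration
    return outputs

print(run_dfg([1, 2, 3]))  # [1, 3, 5]
```

The register's contents are the only information carried from one iteration to the next, which is why the delay node becomes part of the design's state.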

Page 11

Delay Nodes in DFGs (contd)

[Figure: transformation in the DFG, and its mapping to the architecture (a register).]

Page 12

Detailed HLS Example

Page 13

Detailed HLS Example (contd)

[Figure annotation: different paths (i/p → o/p) in the DFG]

(a) Scheduling w/ one X (2 cc’s) & one + (1 cc); goal: min. latency

(b) Reg. alloc. for o/p of operations

(c) Arch. synthesis: the synthesized architecture

[Figure annotation: “For WAR constraint”]

Note: It is not clear how register allocation has been done. It is sub-optimal (4 non-primary i/p regs. needed).

Scheduling heuristic: Among the available opers, schedule on the available FUs those whose delay to the o/p is the highest, breaking ties in favor of an oper u whose “sibling” o/ps (o/ps going to the same children as u’s) that are available, or will be available by u’s earliest finish, would have the largest lifetime at that point.
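The primary rule of the heuristic (highest delay to the output first) is a form of list scheduling and can be sketched as follows. The DFG below is hypothetical, since the slide's DFG is not recoverable, and the lifetime-based tie-breaker is deliberately left out: ties are broken by operation name here. Resource limits follow the slide: one X with a 2-cc delay and one + with a 1-cc delay.

```python
DELAY = {"*": 2, "+": 1}   # X takes 2 cc's, + takes 1 cc

def delay_to_output(dfg):
    """Longest delay from the start of each op to the end of any path to an output.

    dfg maps op name -> (op type, list of predecessor op names).
    """
    succs = {n: [] for n in dfg}
    for n, (_, preds) in dfg.items():
        for p in preds:
            succs[p].append(n)
    memo = {}
    def visit(n):
        if n not in memo:
            memo[n] = DELAY[dfg[n][0]] + max((visit(s) for s in succs[n]), default=0)
        return memo[n]
    for n in dfg:
        visit(n)
    return memo

def list_schedule(dfg):
    """Schedule with one FU per op type; returns (start cc, finish cc) dictionaries."""
    prio = delay_to_output(dfg)
    start, finish = {}, {}
    fu_free = {"*": 1, "+": 1}              # first cc in which each FU is free
    cc = 1
    while len(start) < len(dfg):
        for op_type in fu_free:
            if fu_free[op_type] > cc:
                continue                    # FU still busy in this cc
            ready = [n for n, (t, preds) in dfg.items()
                     if t == op_type and n not in start
                     and all(p in finish and finish[p] < cc for p in preds)]
            if ready:
                best = min(ready, key=lambda n: (-prio[n], n))  # highest priority first
                start[best] = cc
                finish[best] = cc + DELAY[op_type] - 1
                fu_free[op_type] = finish[best] + 1
        cc += 1
    return start, finish

# Hypothetical DFG: s3 = (m1 + s1) + m2, where m1, m2 are products and s1 is a sum.
EXAMPLE = {
    "m1": ("*", []), "m2": ("*", []), "s1": ("+", []),
    "s2": ("+", ["m1", "s1"]), "s3": ("+", ["s2", "m2"]),
}
print(list_schedule(EXAMPLE))
```

On this example m1 is multiplied first because its delay to the output (4 cc’s) exceeds m2's (3 cc’s), and the overall latency is 5 cc’s.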

Page 14

Detailed HLS Example (contd)

Page 15

Detailed HLS Example: Register Allocation

Page 16

Detailed HLS Example: Register Allocation (contd)

[Figure: register allocation for the example; surviving labels: d0, “3 non-primary i/p regs. needed”.]

• In the conflict graph (one per FU), there is an edge between 2 variable nodes if their lifetimes overlap (indicating that different registers need to be allocated to them).
• Graph coloring (using the min. # of colors to color the nodes s.t. connected node pairs have different colors) is NP-hard in general.
• The above type of conflict graph is called an interval graph (derived from the 1-dimensional intervals of the lifetimes).
• Min. graph coloring can be solved optimally in linear time for interval graphs, once the intervals are sorted by left edge (using the left-edge algorithm that we will see later for channel routing).
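The interval-graph coloring described above can be sketched as a greedy pass over the lifetimes in left-edge (start-time) order. This is a simple first-fit variant written for clarity rather than for linear running time, and the lifetimes in the example are hypothetical, not those of the slide's DFG. Lifetimes are treated as half-open intervals [start, end), so a register freed at cc t can be reused by a variable born at cc t; that convention is an assumption.

```python
def left_edge(lifetimes):
    """Assign registers to variables.

    lifetimes maps variable name -> (start, end), a half-open interval.
    Returns a dict mapping variable name -> register index.
    """
    order = sorted(lifetimes, key=lambda v: lifetimes[v])  # sort by left edge
    reg_free_at = []     # reg_free_at[r] = end of the last lifetime placed in register r
    assignment = {}
    for v in order:
        start, end = lifetimes[v]
        for r, free_at in enumerate(reg_free_at):
            if free_at <= start:             # register r is free again: reuse it
                reg_free_at[r] = end
                assignment[v] = r
                break
        else:                                # every register is still occupied
            reg_free_at.append(end)
            assignment[v] = len(reg_free_at) - 1
    return assignment

# Hypothetical lifetimes (in cc's) for five operation outputs.
LIFETIMES = {"A": (1, 3), "B": (2, 5), "C": (3, 6), "D": (5, 7), "E": (6, 8)}
print(left_edge(LIFETIMES))  # {'A': 0, 'B': 1, 'C': 0, 'D': 1, 'E': 0}
```

Two registers suffice here, which matches the maximum number of lifetimes that overlap at any point, the lower bound for any valid allocation.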

Scheduling heuristic: Among the available opers, schedule on the available FUs those whose delay to the o/p is the highest, breaking ties in favor of an oper u whose “sibling” o/ps (o/ps going to the same children as u’s) that are available, or will be available by u’s earliest finish, would have the largest lifetime at that point.

Page 17

Detailed HLS Example: Register Allocation (contd)

[Figure: register allocation for the example; surviving labels: d0, “3 non-primary i/p regs. needed”.]

Scheduling heuristic: Among the available opers, schedule on the available FUs those whose delay to the o/p is the highest, breaking ties arbitrarily. Result: B’s lifetime increases, but D’s (dep. of B) decreases similarly, so the heuristic should be based on more global information.