
Aug 05, 2020


dariahiddleston

High-Level Synthesis

Zebo Peng, IDA, LiTH

1. Basic definition

2. A typical HLS process

3. Scheduling techniques

4. Allocation and binding techniques

5. Advanced issues


Introduction

Definition: HLS automatically generates register-transfer level (RTL) designs from behavioral specifications.

• Input:

- The behavioral specification.

- Design constraints (cost, performance, power consumption, pin count, testability, etc.).

- An optimization function.

- A module library representing the available components at RTL.

• Output:

- RTL implementation structure (netlist).

- Controller (usually captured as a symbolic FSM).

- Other attributes, such as geometrical information.

• Goal: to generate an RTL design that implements the specified behavior while satisfying the design constraints and optimizing the given cost function.


A Typical HLS Process

1. Behavioral specification:

• Which language to use?

Procedural languages

Functional languages

Graphical notations

• Explicit parallelism?

PROCEDURE Test;
VAR
    A, B, C, D, E, F, G: integer;
BEGIN
    Read(A, B, C, D, E);
    F := E * (A + B);
    G := (A + B) * (C + D);
    ...
END;

Input behavioral specification

2. Dataflow analysis:

• Parallelism extraction.

• Eliminating high-level language constructs.

• Loop unrolling.

• Program transformation.

• Common subexpression detection.

[Figure: Dataflow description: a DFG with inputs A, B, C, D, E, two additions (+), and two multiplications (*) producing F and G.]


A Typical HLS Process (Cont’d)

3. Operation scheduling:

• Performance/cost trade-offs.

• Performance measure.

• Clocking strategy.

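[Figure: Scheduled dataflow description: the same DFG (inputs A to E, two additions, two multiplications, outputs F and G) with its operations assigned to control steps.]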

4. Data-path allocation:

• Operator selection.

• Register/memory allocation.

• Interconnection generation.

• Hardware minimization.

[Figure: Partial data-path: registers R1, R2, R3, an adder (+), a multiplier (*), a multiplexor M, and inputs A<0:7> and B<0:7>.]


A Typical HLS Process (Cont’d)

5. Control allocation:

• Selection of control style (PLA, micro-code, random logic, etc.).

• Controller generation.

[Figure: RTL structure with controller description: registers R1, R2, R3, an adder (+), a multiplexor M, and inputs A<0:7> and B<0:7>.]

Controller description:

S1: M1=1, Load R1 next S2;
S2: Load R2 next S3;
S3: Add, Load R3 next S4;
S4: M1=0, Load R1 next...


A Typical HLS Process (Cont’d)

6. Module binding and controller implementation:

• Selection of physical modules.

• Specification of module parameters and constraints.

• Controller implementation.

[Figure: Final results: registers R1, R2, R3, an adder module (Adder 8), a multiplexor M1, and inputs A<0:7> and B<0:7>.]

Controller ROM:

0000 : 11000000 0001
0001 : 00100000 0010
0010 : 00011000 0011
0011 : 01000000 0100
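The controller ROM can be read as rows of the form "address : control word next-address". Matching the rows against states S1 to S4 of the previous controller description suggests that the first five bits drive M1, Load R1, Load R2, Add, and Load R3. That bit assignment and the signal names below are inferred assumptions, not stated in the slides. A minimal Python sketch that follows the next-address field and decodes each word:

```python
# ROM rows copied from the slide: address -> (control word, next address).
ROM = {
    "0000": ("11000000", "0001"),
    "0001": ("00100000", "0010"),
    "0010": ("00011000", "0011"),
    "0011": ("01000000", "0100"),
}

# Assumed meaning of each bit position (inferred by matching rows to S1-S4);
# the last three bits are not set by any row shown here.
SIGNALS = ["M1", "LoadR1", "LoadR2", "Add", "LoadR3", "unused5", "unused6", "unused7"]


def decode(word):
    """Return the names of the control signals that are 1 in a control word."""
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]


def run(rom, start="0000"):
    """Follow next-address fields until reaching an address outside this excerpt."""
    trace, addr = [], start
    while addr in rom:
        word, nxt = rom[addr]
        trace.append((addr, decode(word)))
        addr = nxt
    return trace, addr


trace, stop = run(ROM)
for addr, active in trace:
    print(addr, active)
print("continues at", stop)
```

Under this reading, row 0011 sets only Load R1 with M1 at 0, which agrees with "S4: M1=0, Load R1"; address 0100 is not part of the excerpt.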


The Basic Issues

• Scheduling: assignment of each operation to a time slot corresponding to a clock cycle or time interval.

• Resource Allocation: selection of the types of hardware components, and the number of each type, to be included in the final implementation.

• Module Binding: assignment of operations to the allocated hardware components.

• Controller Synthesis: design of the control style and clocking scheme.

• Compilation: the input specification language must be compiled into the internal representation.

• Parallelism Extraction: extraction of the inherent parallelism of the original specification, usually done with data-flow analysis techniques.

• Operation Decomposition: implementation of complex operations in the behavioral specification.

• ...


The Scheduling Problem

• Resource-constrained (RC) scheduling:

- Given a set O of operations with a partial ordering that determines the precedence relations, a set K of functional unit types, a type function τ: O → K that maps the operations to functional unit types, and a resource constraint m_k for each functional unit type k.

- Find an (optimal) schedule for the set of operations that obeys the partial ordering and uses only the available functional units.
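The RC problem statement can be made concrete as a feasibility check. The sketch below encodes the example program on this slide as a DFG after splitting it into two-input operations; the operation names n1 to n10 and the temporaries t1 to t4 are hypothetical. It uses the type function τ and the constraint of one adder and one multiplier, and tests whether a schedule obeys both the precedence relations and the resource limits.

```python
from collections import Counter

# Example program split into two-input operations (names are hypothetical).
OPS = {
    "n1": ("+", []),             # a  = i1 + i2
    "n2": ("-", ["n1"]),         # t1 = a - i3
    "n3": ("*", ["n2"]),         # o1 = t1 * 3
    "n4": ("+", []),             # t2 = i4 + i5
    "n5": ("+", ["n4"]),         # o2 = t2 + i6
    "n6": ("*", []),             # d  = i7 * i8
    "n7": ("+", ["n6"]),         # t3 = d + i9
    "n8": ("+", ["n7"]),         # g  = t3 + i10
    "n9": ("*", []),             # t4 = i11 * 7
    "n10": ("*", ["n9", "n8"]),  # o3 = t4 * g
}
TAU = {"+": "adder", "-": "adder", "*": "multiplier"}  # type function
LIMITS = {"adder": 1, "multiplier": 1}                 # resource constraints m_k


def is_feasible(schedule):
    """Check a schedule (operation -> control step) against both constraints."""
    # Precedence: every predecessor must be in an earlier control step.
    for op, (_, preds) in OPS.items():
        if any(schedule[p] >= schedule[op] for p in preds):
            return False
    # Resources: count operations per (unit type, control step).
    usage = Counter((TAU[OPS[op][0]], step) for op, step in schedule.items())
    return all(count <= LIMITS[unit] for (unit, _), count in usage.items())


serial = {op: i + 1 for i, op in enumerate(OPS)}  # one operation per control step
print(is_feasible(serial))
```

The fully serial schedule is feasible but needs ten control steps; the scheduling techniques that follow look for shorter schedules that still pass this check.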

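Ex.:

a := i1 + i2;
o1 := (a - i3) * 3;
o2 := i4 + i5 + i6;
d := i7 * i8;
g := d + i9 + i10;
o3 := i11 * 7 * g;

[Figure: data-flow graph of the example, with five additions, one subtraction, and four multiplications.]

Resource constraints: 1 adder, 1 multiplier.

τ: +, - → Adder; * → Multiplier

• Time-constrained (TC) scheduling:

- Given O, the partial ordering, K, and τ as above, together with a time constraint (the maximum number of control steps), find a schedule that meets the time constraint while minimizing the number of functional units required.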


RC Scheduling Techniques

• ASAP: As soon as possible

- Sort the operations topologically according to their data/control flow;

- Schedule operations in the sorted order by placing them in the earliest possible control step.

[Figure: (a) Sorted DFG of the example, with operations numbered 1 to 10; (b) ASAP schedule of the same operations over control steps.]
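A minimal sketch of the two ASAP steps in Python, assuming the example is split into operations n1 to n10 (hypothetical names) under the constraint of one adder and one multiplier. Because ASAP is listed here as an RC technique, the sketch places each operation in the earliest control step where its predecessors have finished and a unit of its type is still free. The sorted order comes from a simple topological sort that takes ready operations in listing order, which is an arbitrary choice; a different order can give a different schedule, so the result need not match the figure.

```python
# Example DFG: operation -> (operator, predecessors). Names are hypothetical.
OPS = {
    "n1": ("+", []), "n2": ("-", ["n1"]), "n3": ("*", ["n2"]),
    "n4": ("+", []), "n5": ("+", ["n4"]),
    "n6": ("*", []), "n7": ("+", ["n6"]), "n8": ("+", ["n7"]),
    "n9": ("*", []), "n10": ("*", ["n9", "n8"]),
}
UNIT = {"+": "adder", "-": "adder", "*": "multiplier"}
LIMIT = {"adder": 1, "multiplier": 1}


def topological_order(ops):
    """Repeatedly take, in listing order, operations whose predecessors are done (assumes a DAG)."""
    order, done = [], set()
    while len(order) < len(ops):
        for op, (_, preds) in ops.items():
            if op not in done and all(p in done for p in preds):
                order.append(op)
                done.add(op)
    return order


def asap(ops):
    step, used = {}, {}
    for op in topological_order(ops):
        kind, preds = ops[op]
        unit = UNIT[kind]
        s = max((step[p] + 1 for p in preds), default=1)  # earliest data-ready step
        while used.get((unit, s), 0) >= LIMIT[unit]:       # skip steps whose units are busy
            s += 1
        step[op] = s
        used[(unit, s)] = used.get((unit, s), 0) + 1
    return step


schedule = asap(OPS)
print(schedule, "steps:", max(schedule.values()))
```

With this order the sketch needs seven control steps; the chain n6, n7, n8, n10 is delayed because earlier-listed additions occupy the single adder first.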


RC Scheduling Techniques (Cont’d)

• ALAP: As late as possible

- Sort the operations topologically according to their data/control flow;

- Schedule operations in the reversed order by placing them in the latest possible control step.

[Figure: (a) Sorted DFG, with operations numbered 1 to 10; (b) ALAP schedule.]
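A matching ALAP sketch, under the same assumptions (hypothetical operations n1 to n10, one adder, one multiplier). Since the total number of control steps is not known in advance, the sketch counts steps backwards from the end (1 = last step), places each operation as late as its successors and the free units allow, and finally renumbers the steps from 1; this backward counting is an implementation choice, not something the slide prescribes.

```python
# Example DFG: operation -> (operator, predecessors). Names are hypothetical.
OPS = {
    "n1": ("+", []), "n2": ("-", ["n1"]), "n3": ("*", ["n2"]),
    "n4": ("+", []), "n5": ("+", ["n4"]),
    "n6": ("*", []), "n7": ("+", ["n6"]), "n8": ("+", ["n7"]),
    "n9": ("*", []), "n10": ("*", ["n9", "n8"]),
}
UNIT = {"+": "adder", "-": "adder", "*": "multiplier"}
LIMIT = {"adder": 1, "multiplier": 1}


def alap(ops):
    succs = {op: [] for op in ops}
    for op, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(op)
    rev, used = {}, {}  # rev[op] == 1 means the last control step
    for op in reversed(list(ops)):  # OPS is listed in topological order
        unit = UNIT[ops[op][0]]
        r = max((rev[s] + 1 for s in succs[op]), default=1)  # latest step before all successors
        while used.get((unit, r), 0) >= LIMIT[unit]:         # step already full: move earlier
            r += 1
        rev[op] = r
        used[(unit, r)] = used.get((unit, r), 0) + 1
    length = max(rev.values())
    return {op: length - r + 1 for op, r in rev.items()}  # renumber from step 1


schedule = alap(OPS)
print(schedule, "steps:", max(schedule.values()))
```

On this example the reversed pass finishes in six control steps, one fewer than the forward ASAP pass with the same listing order, because the long multiply-add chain is placed first.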


RC Scheduling Techniques (Cont’d)

• List Scheduling

- For each control step, the operations that are available to be scheduled are kept in a list;

- The list is ordered by some priority function:

1. The length of the path from the operation to the end of the block;

2. Mobility: the number of control steps from the earliest to the latest feasible control step.

- The operations on the list are scheduled one by one if the resources they need are free; otherwise they are deferred to the next control step.

[Figure: (a) DFG, with operations numbered 1 to 10; (b) List schedule.]
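A minimal list-scheduling sketch in Python using priority function 1 (path length to the end of the block, counted in operations). As before, the operation names n1 to n10 and the constraint of one adder and one multiplier are assumptions about how the example is encoded; ties are broken by listing order, which is an arbitrary choice.

```python
# Example DFG: operation -> (operator, predecessors). Names are hypothetical.
OPS = {
    "n1": ("+", []), "n2": ("-", ["n1"]), "n3": ("*", ["n2"]),
    "n4": ("+", []), "n5": ("+", ["n4"]),
    "n6": ("*", []), "n7": ("+", ["n6"]), "n8": ("+", ["n7"]),
    "n9": ("*", []), "n10": ("*", ["n9", "n8"]),
}
UNIT = {"+": "adder", "-": "adder", "*": "multiplier"}
LIMIT = {"adder": 1, "multiplier": 1}


def path_lengths(ops):
    """Number of operations on the longest path from each op to the end of the block."""
    succs = {op: [] for op in ops}
    for op, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(op)
    memo = {}

    def length(op):
        if op not in memo:
            memo[op] = 1 + max((length(s) for s in succs[op]), default=0)
        return memo[op]

    return {op: length(op) for op in ops}


def list_schedule(ops):
    prio = path_lengths(ops)
    step_of, step = {}, 0
    while len(step_of) < len(ops):
        step += 1
        # Ready list: unscheduled ops whose predecessors finished in earlier steps.
        ready = [op for op, (_, preds) in ops.items()
                 if op not in step_of and all(step_of.get(p, step) < step for p in preds)]
        ready.sort(key=lambda op: -prio[op])  # stable sort keeps listing order on ties
        free = dict(LIMIT)
        for op in ready:
            unit = UNIT[ops[op][0]]
            if free[unit] > 0:  # resource free: schedule in this step
                free[unit] -= 1
                step_of[op] = step
            # otherwise the operation is deferred to the next control step
    return step_of


schedule = list_schedule(OPS)
print(schedule, "steps:", max(schedule.values()))
```

Prioritizing the longest path lets the critical chain n6, n7, n8, n10 claim units early; using mobility as the priority instead would only require swapping the sort key.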


TC Scheduling Techniques

• Force-Directed Scheduling: the basic idea is to balance the concurrency of operations.

- ASAP and ALAP schedules are calculated to derive the time frames for all operations.

- For each type of operation, a distribution graph is built to denote the possible control steps for each operation. If an operation can be done in any of k steps, then 1/k is added to each of these k steps.

- The algorithm tries to balance the distribution graph by calculating the force of each operation-to-control-step assignment and selecting the assignment with the smallest force:

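$$\mathrm{Force}\bigl(\sigma(o_i) = s_j\bigr) = DG(s_j) - \frac{1}{\Delta T(o_i)} \sum_{s=\sigma_{ASAP}(o_i)}^{\sigma_{ALAP}(o_i)} DG(s)$$

where DG is the distribution graph for the type of o_i, and ΔT(o_i) is the number of control steps in the time frame of o_i, from σ_ASAP(o_i) to σ_ALAP(o_i).

An example:

[Figure: an example DFG with three additions (a1, a2, a3) and two multiplications over control steps 1 to 3, shown with its ASAP and ALAP schedules, the range of each addition, and the resulting distribution graph.]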
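A minimal force-directed sketch in Python that uses exactly the self-force defined above. It reuses a hypothetical encoding of the scheduling example (operations n1 to n10) and assumes a time constraint of four control steps, the length of the example's critical path. Each iteration recomputes the time frames around the operations fixed so far, rebuilds the distribution graphs with exact fractions, and commits the single (operation, step) pair with the smallest force.

```python
from fractions import Fraction

# Example DFG: operation -> (operator, predecessors). Names are hypothetical.
OPS = {
    "n1": ("+", []), "n2": ("-", ["n1"]), "n3": ("*", ["n2"]),
    "n4": ("+", []), "n5": ("+", ["n4"]),
    "n6": ("*", []), "n7": ("+", ["n6"]), "n8": ("+", ["n7"]),
    "n9": ("*", []), "n10": ("*", ["n9", "n8"]),
}
UNIT = {"+": "adder", "-": "adder", "*": "multiplier"}
LATENCY = 4  # assumed time constraint: the critical-path length of the example


def time_frames(ops, fixed, latency):
    """ASAP/ALAP frames that respect the operations already fixed."""
    succs = {op: [] for op in ops}
    for op, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(op)
    asap, alap = {}, {}
    for op in ops:  # OPS is listed in topological order
        asap[op] = fixed.get(op, max((asap[p] + 1 for p in ops[op][1]), default=1))
    for op in reversed(list(ops)):
        alap[op] = fixed.get(op, min((alap[s] - 1 for s in succs[op]), default=latency))
    return asap, alap


def force_directed(ops, latency):
    fixed = {}
    while len(fixed) < len(ops):
        asap, alap = time_frames(ops, fixed, latency)
        # Distribution graph per unit type: an op with k feasible steps adds 1/k to each.
        dg = {}
        for op, (kind, _) in ops.items():
            k = alap[op] - asap[op] + 1
            for s in range(asap[op], alap[op] + 1):
                dg[(UNIT[kind], s)] = dg.get((UNIT[kind], s), 0) + Fraction(1, k)
        # Self-force of every remaining (operation, step) pair; keep the smallest.
        best = None
        for op, (kind, _) in ops.items():
            if op in fixed:
                continue
            frame = range(asap[op], alap[op] + 1)
            mean = sum(dg[(UNIT[kind], s)] for s in frame) / len(frame)
            for s in frame:
                candidate = (dg[(UNIT[kind], s)] - mean, op, s)
                if best is None or candidate < best:
                    best = candidate
        _, op, s = best
        fixed[op] = s  # commit the assignment; frames are recomputed next round
    return fixed


print(force_directed(OPS, LATENCY))
```

On this example the sketch meets the four-step constraint with at most two additions and one multiplication per step. It uses only the self-force shown above; full force-directed scheduling also adds the forces an assignment induces on predecessors and successors.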


Classification of Scheduling Approaches

• Constructive scheduling: one operation is assigned to one control step at a time, and this process is iterated from control step (operation) to control step (operation).

- ASAP

- ALAP

- List scheduling

• Global scheduling: all control steps and all operations are considered simultaneously when operations are assigned to control steps.

- Force-directed scheduling

- Neural net scheduling

- Integer Linear Programming algorithms

• Transformational scheduling: starting from an initial schedule, a final schedule is obtained by successive transformations.


Advanced Scheduling Issues

• Control construct consideration.

- conditional branches

- loops

• Chaining and multicycling.

• Scheduling with local timing constraints.

[Figure: (a) No chaining or multicycling; (b) Two chained additions; (c) A multicycle multiplication. The panels show additions (+) and a multiplication (*) annotated with timings of 50 ns, 100 ns, and 200 ns.]


Allocation and Binding

• Allocation (unit selection): determination of the type and number of resources required:

- Number and types of functional units

- Number and types of storage elements

- Number and types of busses

• Binding: assignment to resource instances:

- Operations to functional unit instances

- Values to be stored to instances of storage elements

- Data transfers to bus instances

• Optimization goal

- Minimize the total cost of functional units, registers, bus drivers, and multiplexors

- Minimize the total interconnection length

- Satisfy a constraint on the critical path delay

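[Figure: a binding example over control steps s1 and s2 with additions o1 to o4 on values a to h, with functional-unit groupings labelled "+1, +3" and "+2, +4" and register groupings labelled "a", "b,e,g", "c,f,h", and "d".]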


Approaches to Allocation/Binding

• Constructive: start with an empty data path and add functional, storage, and interconnect units as necessary.

- Greedy algorithms: perform allocation for one control step at a time.

- Rule-based: used to select the type and number of functional units, especially prior to scheduling.

• Graph-theoretical formulations: sub-tasks are mapped into well-defined problems in graph theory.

- Clique partitioning.

- Left-edge algorithm.

- Graph coloring.

• Transformational allocation

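[Figure: a DFG scheduled over control steps 1 to 3 with additions a1 to a4 and multiplications m1, m2, and the allocated data path: one adder shared by a1, a3, a4, one adder for a2, one multiplier shared by m1, m2, and registers.]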


Clique Partitioning

• Let G = (V, E) be an undirected graph with a set V of vertices and a set E of edges.

• A clique is a set of vertices that form a complete subgraph of G.

• The problem of partitioning a graph into a minimal number of cliques such that each vertex belongs to exactly one clique is called clique partitioning.

• Formulation of functional unit allocation as a clique partitioning problem:

- Each vertex represents an operation.

- An edge connects two vertices iff:

1. the two operations are scheduled into different control steps, and

2. there exists a functional unit that is capable of carrying out both operations.

[Figure: the scheduled DFG (a1 to a4, m1, m2 over control steps 1 to 3) and its compatibility graph, in which one clique is highlighted.]
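The two edge conditions above can be sketched directly in Python. The schedule and module library below are hypothetical (a small set of operations named in the style of the figure, not the figure's actual step assignment): an edge is added only when two operations sit in different control steps and some unit type in the library can execute both operators.

```python
from itertools import combinations

# Hypothetical schedule: operation -> (operator, control step).
SCHEDULE = {
    "a1": ("+", 1), "m1": ("*", 1),
    "a2": ("+", 2), "m2": ("*", 2),
    "a3": ("+", 3), "a4": ("+", 3),
}
# Hypothetical module library: unit type -> operators it can execute.
LIBRARY = {"adder": {"+", "-"}, "multiplier": {"*"}}


def compatibility_edges(schedule, library):
    edges = set()
    for (u, (op_u, s_u)), (v, (op_v, s_v)) in combinations(sorted(schedule.items()), 2):
        different_steps = s_u != s_v                                   # condition 1
        shared_unit = any(op_u in ops and op_v in ops for ops in library.values())  # condition 2
        if different_steps and shared_unit:
            edges.add((u, v))
    return edges


print(sorted(compatibility_edges(SCHEDULE, LIBRARY)))
```

Here a3 and a4 get no edge because they share control step 3, and no addition is connected to a multiplication because no unit in this library does both. Each clique of the resulting graph can then be bound to one functional unit.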


Clique Partitioning (Cont’d)

• Formulation of storage allocation as a clique partitioning problem:

- Each value that needs to be stored is mapped to a vertex.

- Two vertices are connected iff the lifetimes of the two values do not intersect.

☞ The clique partitioning problem is NP-complete.

☞ Efficient heuristics have been developed; e.g., Tseng used a polynomial-time algorithm that generates very good results.
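A minimal Python sketch of the storage formulation, with hypothetical lifetimes. It assumes each lifetime is a half-open interval [birth, death), so a value that dies in the same control step in which another is born can share its register; that convention is an assumption, not something stated on the slide.

```python
from itertools import combinations


def storage_compatibility(lifetimes):
    """lifetimes: value -> (birth, death); edge iff the two lifetimes do not intersect."""
    edges = set()
    for (u, (b_u, d_u)), (v, (b_v, d_v)) in combinations(sorted(lifetimes.items()), 2):
        if b_u >= d_v or b_v >= d_u:  # [birth, death) intervals do not overlap
            edges.add((u, v))
    return edges


# Hypothetical lifetimes in control steps.
LIFETIMES = {"a": (1, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5), "e": (4, 6)}
print(sorted(storage_compatibility(LIFETIMES)))
```

In this example {a, c, e} and {b, d} are cliques, so two registers suffice, which is also the maximum number of values alive at the same time.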


Tseng’s Algorithm

• A super-graph is derived from the original graph.

• Find two connected super-nodes such that they have the maximum number of common neighbors.

• Merge the two nodes and repeat until no more mergers can be carried out.

[Figure: panels (a) to (e) showing Tseng's algorithm on a graph with vertices v1 to v5 and edges e1,3, e1,4, e2,3, e2,5, e3,4, e4,5.]

Common neighbors per edge, initially: e'1,3: 1; e'1,4: 1; e'2,3: 0; e'2,5: 0; e'3,4: 1; e'4,5: 0. Merging v1 and v3 gives super-node s13.

After the first merger: e'13,4: 0; e'2,5: 0; e'4,5: 0. Merging s13 and v4 gives s134.

After the second merger: e'2,5: 0. Merging v2 and v5 gives s25.

Cliques: S134 = (v1, v3, v4), S25 = (v2, v5)
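A minimal Python sketch of the merging loop, run on the five-vertex example above. Two details are assumptions: a merged super-node keeps only the neighbors that both of its parts shared (so every super-node remains a clique), and ties in the number of common neighbors are broken by taking the lexicographically smallest pair, which happens to reproduce the merge order shown.

```python
def tseng_cliques(vertices, edges):
    # Super-nodes are sorted tuples of original vertices; start with one per vertex.
    adj = {(v,): set() for v in vertices}
    for u, v in edges:
        adj[(u,)].add((v,))
        adj[(v,)].add((u,))
    while True:
        # Rank every edge by its number of common neighbours (ties: smallest pair).
        candidates = [(-len(adj[a] & adj[b]), a, b)
                      for a in adj for b in adj[a] if a < b]
        if not candidates:
            return sorted(adj)
        _, a, b = min(candidates)
        common = adj[a] & adj[b]  # neighbours the merged super-node keeps
        for n in adj.pop(a):
            adj[n].discard(a)
        for n in adj.pop(b):
            adj[n].discard(b)
        merged = tuple(sorted(a + b))
        adj[merged] = common
        for n in common:
            adj[n].add(merged)


EDGES = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (4, 5)]
print(tseng_cliques([1, 2, 3, 4, 5], EDGES))  # [(1, 3, 4), (2, 5)]
```

Each pass scans all edges, so the loop runs in polynomial time, in line with the slide's remark that Tseng's heuristic avoids the NP-complete exact problem.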


Left-Edge (LE) Algorithm

• The LE algorithm is used in channel routing to minimize the number of tracks used to connect points.

• The register allocation problem can be solved by the LE algorithm by mapping the birth time of a value to the left edge, and the death time of the value to the right edge, of a wire.

[Figure: Variable life-times: the scheduled DFG of the example (operations 1 to 10) beside the lifetimes of the inputs i1 to i11, the constants '3' and '7', and the values a, t1, t2, d, t3, g, t4, o1, o2, o3.]


Left-Edge (LE) Algorithm (Cont’d)

• The algorithm works as follows:

- The values are sorted in increasing order of their birth time.

- The first value is assigned to the first register.

- The list is then scanned for the next value whose birth time is larger than or equal to the death time of the previous value.

- This value is assigned to the current register.

- The list is scanned until no more values can share the same register. A new register is then introduced, and the procedure is repeated on the remaining values.

[Figure: a) sorted list of variables; b) assignment of variables into registers.]
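The steps above translate almost line by line into Python. The lifetimes below are hypothetical, and ties in birth time are broken by value name, which is an arbitrary choice; as in the steps, a value may start in the same control step in which the previous value in the register dies.

```python
def left_edge(lifetimes):
    """lifetimes: value -> (birth, death); returns one list of values per register."""
    remaining = sorted(lifetimes, key=lambda v: (lifetimes[v][0], v))  # by birth time
    registers = []
    while remaining:
        current, last_death, leftover = [], None, []
        for v in remaining:
            birth, death = lifetimes[v]
            if last_death is None or birth >= last_death:
                current.append(v)   # fits after the previous value in this register
                last_death = death
            else:
                leftover.append(v)  # left for a later register
        registers.append(current)
        remaining = leftover        # introduce a new register for the rest
    return registers


# Hypothetical lifetimes in control steps.
LIFETIMES = {"a": (1, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5), "e": (4, 6)}
print(left_edge(LIFETIMES))  # [['a', 'c', 'e'], ['b', 'd']]
```

Two registers are used, matching the largest number of values alive in any single control step in this example.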


Left-Edge (LE) Algorithm (Cont’d)

• The algorithm is guaranteed to allocate the minimum number of registers, but it has two disadvantages:

- Not every lifetime table can be interpreted as a set of intersecting intervals on a line, e.g. in the presence of loops or conditional branches.

- The assignment is neither unique nor necessarily optimal (in terms of the minimal number of multiplexors, for example).


Advanced Issues of HLS

• Target architecture consideration, e.g. pipelining.

• General library organization.

- Component hierarchy.

- Many-to-many mapping between operations and physical components.

- Multiple technology components.

• Domain-specific synthesis strategies.

- Control-dominated applications.

- Timing-driven optimization.

• Re-use of previous designs.

• Synthesis with commercially available sub-systems, IP-based synthesis.

• HLS with testability consideration.

• HLS with power consideration.


Advanced Issues of HLS (Cont’d)

• Target architecture consideration.

- Multiplexed data-path:

Operations are mapped to combinational units.

Storage is provided by distributed registers.

Interconnect is established by multiplexors/nets.

A single system clock controls all registers.

- Bidirectional bus architecture:

Functional units have storage capabilities.

Register files are often supported.

Interconnect hardware comprises bidirectional busses, multiplexors, drivers, and nets.

- Pipelined data paths.
