More Scheduling
Resource Sharing
ECE 5775 High-Level Digital Design Automation
Fall 2018
▸ Lab 3 is released (due Friday 10/5)
  – 10 FPGA boards available
  – Go through the CORDIC tutorial ASAP
Announcements
▸ More SDC scheduling
▸ Resource sharing overview
  – Sub-problems: functional unit, register, and connectivity binding problems
  – Key concepts: compatibility and conflict graphs
Outline
▸ ILP for time-constrained scheduling
  minimize c^T y
  subject to
    x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} - y_1 ≤ 0
    x_{6,2} + x_{7,2} + x_{8,2} - y_1 ≤ 0
    x_{7,3} + x_{8,3} - y_1 ≤ 0
    x_{5,4} + x_{9,4} + x_{11,4} - y_2 ≤ 0
    …
Review: ILP Formulation for TCS
What is the y vector?
[Figure: two scheduled DFGs with operations v1-v11 (multiplies, adds, subtracts, and a comparison) placed in control steps 1-4]
Review: SDC-Based Scheduling
▸ Target cycle time: 5ns
▸ Delay estimates
  – Add (+): 1ns
  – Load (ld): 3ns
  – Store (st): 1ns
▸ s_i: schedule variable for operation i
[Figure: DFG in which loads v1 and v2 feed add v3, load v0 and v3 feed add v4, and v4 feeds store v5]
▸ Dependence constraints
  – <v0, v4>: s0 - s4 ≤ 0
  – <v1, v3>: s1 - s3 ≤ 0
  – <v2, v3>: s2 - s3 ≤ 0
  – <v3, v4>: s3 - s4 ≤ 0
  – <v4, v5>: s4 - s5 ≤ 0
▸ Cycle time constraints
  – v2 → v5: s2 - s5 ≤ -1
  – v1 → v5: s1 - s5 ≤ -1
▸ A linear programming formulation based on a system of integer difference constraints (SDC)
▸ Difference constraints can be conveniently represented using a constraint graph
  – Each vertex represents a variable and each weighted edge corresponds to a difference constraint
  – Infeasibility is detected by the presence of a negative cycle (found by solving a single-source shortest-path problem)
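The feasibility check described above can be sketched with a Bellman-Ford style relaxation. This is a minimal illustration, not the course's implementation: the function name `solve_sdc` and the `(u, v, c)` tuple encoding of `s_u - s_v <= c` are assumptions made for this example. The constraint values are the ones listed on the SDC slides.

```python
def solve_sdc(num_vars, constraints):
    """Solve difference constraints s_u - s_v <= c given as (u, v, c) tuples.

    Returns a list of non-negative integer start times, or None when the
    constraint graph contains a negative cycle (infeasible system).
    """
    # Starting every distance at 0 is equivalent to adding a virtual source
    # with a zero-weight edge to every vertex.
    dist = [0] * num_vars
    for _ in range(num_vars):
        changed = False
        for u, v, c in constraints:
            # Constraint s_u - s_v <= c is an edge v -> u of weight c.
            if dist[v] + c < dist[u]:
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            break
    else:
        return None  # still relaxing after num_vars passes: negative cycle
    base = min(dist)
    return [d - base for d in dist]  # shift so the earliest step is 0


# Dependence and cycle-time constraints from the SDC review slide.
TIMING = [(0, 4, 0), (1, 3, 0), (2, 3, 0), (3, 4, 0), (4, 5, 0),
          (2, 5, -1), (1, 5, -1)]
schedule = solve_sdc(6, TIMING)

# s2 - s4 <= -1 together with s4 - s2 <= 0 forms a negative cycle.
infeasible = solve_sdc(6, TIMING + [(2, 4, -1), (4, 2, 0)])
print(schedule, infeasible)
```

The shortest-path distances themselves form a solution; the final shift only makes the start times non-negative.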
SDC Constraint Graph
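[Figure: constraint graph with vertices s0-s5 and one edge of weight 0 or -1 per constraint]
  s0 - s4 ≤ 0
  s1 - s3 ≤ 0
  s2 - s3 ≤ 0
  s3 - s4 ≤ 0
  s4 - s5 ≤ 0
  s2 - s4 ≤ -1
  s1 - s4 ≤ -1
  s4 - s2 ≤ 0
Adding s2 - s4 ≤ -1 and s4 - s2 ≤ 0 gives 0 ≤ -1: the two edges form a negative cycle, so the system is infeasible.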
▸ Resource constraints cannot be represented exactly in integer difference form
Handling Resource Constraints (NP-Hard in General)
▸ Resource constraints → heuristic partial orderings
  – v0 → v2: s0 - s2 ≤ -1 (3-cycle latency)
  OR
  – v1 → v0: s1 - s0 ≤ -1 and v2 → v0: s2 - s0 ≤ -1 (2-cycle latency)
[Figure: the same DFG with loads v0-v2 (3ns each), adds v3 and v4 (1ns each), and store v5 (1ns)]
▸ Resource constraint
  – Two read ports
[J. Cong & Z. Zhang, DAC, 2006] [Z. Zhang & B. Liu, ICCAD, 2013]
Exact and Practically Scalable Scheduling with SDC and SAT (SDS)
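[Figure: SDS flow, with a SAT solver and an SDC solver exchanging information]
▸ SAT handles the resource constraints
  – Conflict-based search
  – ~1M variables, >1M clauses
▸ SDC handles the timing constraints
  – Graph-based feasibility checking in polynomial time
  – s0 - s4 ≤ 0, s1 - s3 ≤ 0, s2 - s3 ≤ 0, s3 - s4 ≤ 0, s4 - s5 ≤ 0, s2 - s5 ≤ -1, s1 - s5 ≤ -1
▸ SAT → SDC: partial orderings, passed as difference constraints
▸ SDC → SAT: infeasibility, returned as conflict clauses (conflict-driven learning)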
[S. Dai, G. Liu, and Z. Zhang, FPGA 2018]
▸ Given a Boolean function F(x1, x2, …, xn), find an assignment to the xi's that makes F evaluate to 1
  – If such an assignment exists, F is satisfiable
  – Otherwise, F is unsatisfiable
▸ Example: (x + y + z)(x' + y' + z)(x' + y' + z')
  – A satisfying assignment: x=1, y=0, z=1
▸ The first problem proven NP-complete (Cook-Levin theorem)
▸ Numerous practical applications
  – Hardware/software verification (e.g., equivalence checking, model checking)
  – Artificial intelligence (e.g., planning, automated reasoning)
  – Automated theorem proving
  – Combinatorial design
  – …
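To make the definition concrete, the example formula can be checked by exhaustive enumeration. This is only a sketch for three variables; real SAT solvers do not enumerate assignments. The clause encoding as `(variable, polarity)` pairs and the helper names are assumptions made for this example.

```python
from itertools import product

# (x + y + z)(x' + y' + z)(x' + y' + z'); polarity False means negated.
CLAUSES = [
    [("x", True), ("y", True), ("z", True)],
    [("x", False), ("y", False), ("z", True)],
    [("x", False), ("y", False), ("z", False)],
]


def satisfies(assignment, clauses):
    """A CNF formula holds when every clause has at least one true literal."""
    return all(
        any(assignment[var] == polarity for var, polarity in clause)
        for clause in clauses
    )


def all_models(variables, clauses):
    """Enumerate all 2^n assignments and keep the satisfying ones."""
    models = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, clauses):
            models.append(assignment)
    return models


MODELS = all_models(["x", "y", "z"], CLAUSES)
print(len(MODELS), {"x": True, "y": False, "z": True} in MODELS)
```

Enumeration costs 2^n evaluations, which is why it is unusable beyond toy sizes.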
Boolean Satisfiability Problem (SAT)
▸ SAT solvers have made significant progress in scalability
  – From toy problems with 100-200 variables (early 90s)
  – To industrial applications with 1M+ variables, 5M+ constraints (2010s)
▸ Modern SAT solvers typically employ a backtracking-based search algorithm in which conflict-driven clause learning is key to efficiency
Scalability of SAT Solvers
[source: A. Sabharwal, Modern SAT Solvers: Key Advances and Applications, 2011]
R_uv: whether operation u shares the same resource with operation v
O_u→v: whether operation u is scheduled earlier than operation v
Ordering constraints: operations sharing the same resource must be scheduled apart
  R_01 → (O_0→1 ∨ O_1→0),  ~(O_0→1 ∧ O_1→0)
  R_02 → (O_0→2 ∨ O_2→0),  ~(O_0→2 ∧ O_2→0)
  R_12 → (O_1→2 ∨ O_2→1),  ~(O_1→2 ∧ O_2→1)
Note: R_01 → (O_0→1 ∨ O_1→0) means R_01 implies (O_0→1 ∨ O_1→0)
Encoding Resource Constraints in SAT
SAT clauses for ordering three load operations
[Figure: DFG with loads v0, v1, v2, adds v3, v4, and store v5]
Two read ports available
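The ordering constraints for the three loads can be written out as CNF clauses. The sketch below assumes a simple `(name, polarity)` literal encoding and the variable names `R02`, `O0>2`, and so on; these are illustrative choices, not the encoding used by the SDS tool.

```python
from itertools import combinations


def ordering_clauses(u, v):
    """CNF clauses for one pair of operations that may share a resource."""
    r, o_uv, o_vu = f"R{u}{v}", f"O{u}>{v}", f"O{v}>{u}"
    return [
        # R_uv -> (O_u->v or O_v->u), rewritten as (~R_uv or O_u->v or O_v->u)
        [(r, False), (o_uv, True), (o_vu, True)],
        # ~(O_u->v and O_v->u), rewritten as (~O_u->v or ~O_v->u)
        [(o_uv, False), (o_vu, False)],
    ]


def holds(assignment, clauses):
    """Evaluate CNF clauses; variables missing from assignment are False."""
    return all(
        any(assignment.get(name, False) == polarity for name, polarity in clause)
        for clause in clauses
    )


LOADS = [0, 1, 2]
CLAUSES = [c for u, v in combinations(LOADS, 2) for c in ordering_clauses(u, v)]

shared_unordered = {"R02": True}               # v0 and v2 share, no order chosen
shared_ordered = {"R02": True, "O2>0": True}   # v0 and v2 share, v2 goes first
print(holds(shared_unordered, CLAUSES), holds(shared_ordered, CLAUSES))
```

Each pair contributes two clauses, so the three loads produce six clauses in total.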
Conflict-Driven Learning
▸ Is a 2-cycle schedule feasible?
  – SAT proposes partial orderings, which are passed to SDC as difference constraints
  – SDC reports a conflict: negative cycle, sum = -1
  – The infeasibility is returned to SAT as conflict clauses
  – What SAT learns from SDC: any ordering with operation 0 before operation 2 should no longer be attempted
[Figure: constraint graph over s0-s5 with the proposed ordering edges]
Conflict-Driven Learning
▸ Is a 2-cycle schedule feasible?
  – Propose
  – Conflict: negative cycle, sum = -2
[Figure: constraint graph over s0-s5 with the proposed ordering edges]
Conflict-Driven Learning
▸ Is a 2-cycle schedule feasible?
  – Propose
  – No conflict: no negative cycle
  – Feasible! Returns schedule.
[Figure: constraint graph over s0-s5 with the proposed ordering edges]
▸ Generate short conflicts
  – Shorter conflict → more pruning → faster convergence
  – Negative cycle = irreducibly inconsistent set of constraints
    • Keeps conflicts short
    • Becomes consistent if any constraint is removed from the set
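One way to see what an irreducibly inconsistent set looks like is to shrink an infeasible constraint set with a deletion filter. This is a swapped-in technique chosen for brevity; the SDS work extracts the negative cycle directly from the graph. The `LATENCY` constraints, which bound the store to at most one step after each load, are an assumed encoding of the 2-cycle target, and `PROPOSAL` is the "v0 before v2" ordering.

```python
def is_feasible(num_vars, constraints):
    """Bellman-Ford check on constraints s_u - s_v <= c given as (u, v, c)."""
    dist = [0] * num_vars
    for _ in range(num_vars):
        changed = False
        for u, v, c in constraints:
            if dist[v] + c < dist[u]:
                dist[u], changed = dist[v] + c, True
        if not changed:
            return True
    return False  # still relaxing: negative cycle


def shrink_conflict(num_vars, constraints):
    """Drop every constraint whose removal keeps the system infeasible."""
    core = list(constraints)
    for c in list(constraints):
        trial = [k for k in core if k != c]
        if not is_feasible(num_vars, trial):
            core = trial
    return core


TIMING = [(0, 4, 0), (1, 3, 0), (2, 3, 0), (3, 4, 0), (4, 5, 0),
          (2, 5, -1), (1, 5, -1)]
LATENCY = [(5, 0, 1), (5, 1, 1), (5, 2, 1)]  # assumed 2-cycle latency bound
PROPOSAL = [(0, 2, -1)]                      # v0 scheduled before v2

core = shrink_conflict(6, TIMING + LATENCY + PROPOSAL)
alternative_ok = is_feasible(6, TIMING + LATENCY + [(1, 0, -1), (2, 0, -1)])
print(core, sum(c for _, _, c in core), alternative_ok)
```

Under these assumptions the surviving constraints form a single cycle through s0, s2, and s5, and removing any one of them makes the system feasible again.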
Fast Conflict-Driven Learning
▸ Combining SDC and SAT with conflict-driven learning enables fast yet exact resource-constrained scheduling
  – Up to 1000X faster than ILP
▸ Broader applications
  – Not specific to HLS
  – Applies to constrained scheduling problems in other fields
Take-Away Points on SDS Scheduling
Recap: High-Level Synthesis Flow
High-level programming languages (C/C++, SystemC, Matlab, ...)
  → Compilation, transformations
  → Control data flow graph (CDFG)
  → Allocation, scheduling, binding
  → Finite state machines with datapath
  → RTL generation
Example source:
  if (condition) {
    …
  } else {
    t1 = a + b;
    t2 = c * d;
    t3 = e + f;
    t4 = t1 * t2;
    z = t4 - t3;
  }
[Figure: CDFG with basic blocks BB1-BB4 and T/F branches, and a 3-cycle schedule over states S0-S2]
Resource Sharing and Binding
▸ Resource sharing: share resources to minimize cost in terms of resource usage, area, or power
  – Typically carried out by binding in high-level synthesis
  – Other subtasks such as allocation and scheduling greatly affect the resource sharing opportunities
▸ Binding: maps operations, variables, and/or data transfers to the available resources
  – After scheduling: decides resource usage and detailed architecture (focus of this lecture)
  – Before scheduling: affects both area and delay
  – Simultaneous scheduling and binding: better results but more expensive
▸ Functional unit (FU) binding
  – Primary objective is to minimize the number of FUs
  – Considers connection cost
▸ Register binding
  – Primary objective is to minimize the number of registers
  – Considers connection cost
▸ Connectivity binding
  – Minimize connections by exploiting the commutative property of some operations / FUs
  – NP-hard
Binding Sub-problems
Sharing Conditions
▸ Functional units (registers) are shared by operations (variables) of the same type whose lifetimes do not overlap
  – Lifetime: [birth-time, death-time)
    • Operation: the whole execution time (if unpipelined)
    • Variable: from the time the variable is defined to the time it is last used
Operation Binding
[Figure: scheduled DFG with operations op1-op6 (two multiplies, three adds, one subtract) on inputs a-g over control steps 1-3, separated by clock edges]

Binding 1
  Functional Unit   Operations
  Mul1              op1, op3
  AddSub1           op2, op4
  AddSub2           op5, op6

Binding 2
  Functional Unit   Operations
  Mul1              op1, op3
  AddSub1           op2, op4, op6
  AddSub2           op5
Register Binding
[Figure: the scheduled DFG on inputs a-g over control steps 1-3, with values that cross a clock edge highlighted]
Lifetime crossing a clock edge: register implied
Variable Lifetime Analysis
Variable lifetimes
  v1: [1, 2)
  v2: [2, 3)
  v3: [3, 4)
Variables v1, v2, and v3 can share the same register
[Figure: scheduled DFG on inputs a-g over control steps 1-4, with variables v1, v2, and v3 marked]
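The sharing condition reduces to an overlap test on half-open intervals. A minimal sketch, using the lifetimes of v1, v2, and v3 from this slide; the function names and the extra variable `v4` are hypothetical and only serve the illustration.

```python
def overlaps(a, b):
    """Half-open lifetimes [birth, death) overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def can_share(lifetimes):
    """True when no two lifetimes in the group overlap, so one register suffices."""
    names = list(lifetimes)
    return all(
        not overlaps(lifetimes[p], lifetimes[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    )


LIFETIMES = {"v1": (1, 2), "v2": (2, 3), "v3": (3, 4)}
print(can_share(LIFETIMES))

# Hypothetical variable v4 that is live across steps 1 and 2.
print(can_share({**LIFETIMES, "v4": (1, 3)}))
```

Because the intervals are half-open, a variable that dies at step 2 does not conflict with one born at step 2.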
▸ Operation/variable compatibility
  – Same type, non-overlapping lifetimes
▸ Compatibility graph
  – Vertices: operations/variables
  – Edges: compatibility relation
▸ Conflict graph: complement of the compatibility graph
Compatibility and Conflict Graphs
[Figure: a scheduled DFG of operations a, b, c, d (all of the same type), its compatibility graph, and its conflict graph]
Note: The graphs for variables/registers can be constructed in a similar way
Clique Cover Number and Chromatic Number
▸ Compatibility graph
  – Partition the graph into a minimum number of cliques (clique cover number)
    • A clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge
▸ Conflict graph
  – Color the vertices with a minimum number of colors (chromatic number), where adjacent vertices cannot use the same color
[Figure: a scheduled DFG of operations a, b, c, d (all of the same type), a clique partitioning of its compatibility graph, and a coloring of its conflict graph]
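The two graph views can be illustrated on a small example. The control steps in `STEP` are an assumption (a and b in step 1, c in step 2, d in step 3, all single-cycle and of the same type), since the slide figure is not recoverable. Greedy coloring is a heuristic swapped in for brevity; it does not guarantee the minimum number of colors in general.

```python
from itertools import combinations

# Assumed control steps for four same-type, single-cycle operations.
STEP = {"a": 1, "b": 1, "c": 2, "d": 3}


def build_graphs(step):
    """Return (compatibility edges, conflict edges) as sets of sorted pairs."""
    compat, conflict = set(), set()
    for p, q in combinations(sorted(step), 2):
        # Same step means overlapping lifetimes, hence a conflict.
        (conflict if step[p] == step[q] else compat).add((p, q))
    return compat, conflict


def greedy_coloring(vertices, conflict):
    """Give each vertex the smallest color unused by its conflict neighbors."""
    color = {}
    for v in vertices:
        used = {color[w] for w in color if (v, w) in conflict or (w, v) in conflict}
        color[v] = next(k for k in range(len(vertices)) if k not in used)
    return color


COMPAT, CONFLICT = build_graphs(STEP)
COLOR = greedy_coloring(sorted(STEP), CONFLICT)
print(CONFLICT, COLOR)
```

Each color class is a clique in the compatibility graph, so the operations with the same color can be bound to one functional unit.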
▸ Next lecture: More Binding, Pipelining
Before Next Class
▸ These slides contain/adapt materials developed by
  – Prof. Deming Chen (UIUC)
  – Prof. Jason Cong (UCLA)
Acknowledgements