ENGG3190 Logic Synthesis: High Level Synthesis. Winter 2014. S. Areibi, School of Engineering, University of Guelph
Dec 13, 2015
Outline
• Synthesis & Abstraction Levels of Design
• High Level Synthesis
  – Definition
  – Motivation
• Main Preprocessing Steps
  – Parsing & Analysis
  – Optimization
  – Intermediate Forms (DFG/CFG)
• Main Processing Steps
  – Allocation
  – Scheduling
  – Binding
Synthesis of Digital Circuits
[Figure: course map. High Level Synthesis produces RTL, Logic Synthesis produces a netlist (gates & wires), and Physical Synthesis follows.]
Logic Synthesis topics:
• Computational Boolean Algebra
• Two Level & Multi Level Logic Synthesis
• Sequential Logic Synthesis
• Data Structures: PCN, BDD’s & SAT Algorithms
Abstraction levels

Level            Behavior               Structure
Specification    System specification
System           Algorithms             CPU’s, MEM’s, BUS’s
Register (RTL)   Register transfers     REG’s, ALU’s, MUX’s
Logic            Boolean expressions    Gates, flip-flops
Circuit          Transfer functions     Transistors

Synthesis steps, from top to bottom: System, High-level, Logic, Physical.
High-Level Synthesis
High-level Code → High-Level Synthesis → Custom Circuit
• Input: could be C, C++, Java, Perl, Python, SystemC, ImpulseC, VHDL, Verilog, etc.
• Output: usually an RT-level VHDL description, but could be as low level as a bit file.
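High-Level Synthesis
    acc = 0;
    for (i=0; i < 128; i++)
        acc += a[i];
[Figure: circuit synthesized from the loop above. The datapath holds registers acc, i and addr, a comparator (i < 128), adders for i + 1, the address increment and acc + a[i], and 2x1 multiplexers that select the initial values (0, &a). A controller sequences the datapath. Signals: In from memory, Memory address, Memory Read, acc, Done.]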
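High Level Synthesis
[Figure: inputs and outputs of high level synthesis.]
• Inputs
  – Algorithm:
        WHILE G < K LOOP
            F := E*(A+B);
            G := (A+B)*(C+D);
        END LOOP;
  – Constraints: area; time (clock period, number of clock steps); power
  – Library of components (+, -, *, <)
• Outputs
  – Datapath: +, * and < operators with signals A, B, C, D, E, X, Y, F, G, K
  – Controller: PLA and latches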
Goal of High Level Synthesis
From: behavioral specification at ‘System Level’ (algorithms)
To: structural implementation at ‘Register Transfer Level’ of data path (ALU’s, REG’s, MUX’s) and controller
• Optimize area, performance, power, …
• Abide by constraints
• Generally restricted to a single process
• Generally the data path is optimized; the controller is a by-product
Architectural versus Logic Synthesis
• Transform behavioral into structural view.
• Behavioral-level synthesis:
  – Architectural abstraction level (Algorithm).
  – Determine macroscopic structure (RTL).
  – Example of synthesis: major building blocks.
• Logic-level synthesis:
  – Logic abstraction level (Boolean Equations).
  – Determine microscopic structure (Gates).
  – Example of synthesis: logic gate interconnection.
Level of automation must go up. Why?
Architectural-level synthesis motivation
1. Raise input abstraction level.
   o Reduce specification of details (hides them).
   o Extend designer base.
   o Self-documenting design specifications.
   o Ease modifications and extensions (portability).
2. Reduce design time.
   o Rely on synthesis compilers to produce RTL.
   o Faster to verify our designs at the architectural level.
3. Explore and optimize macroscopic structure:
   o Series/parallel execution of operations.
   o Reduce power consumption (better algorithms).
Design Space Exploration
[Figure: delay versus area plot with three candidate design points: Arch I, Arch II and Arch III.]
We consider here totally different architectures.
Design Space Exploration: Multi Objective Optimization
[Figure: area versus latency plots for different cycle-times, with the Area Max and Latency Max bounds marked.]
Stages of architectural-level synthesis
1. Translate HDL or C/C++ models into: Data Flow Graphs, Sequencing Graphs.
2. Behavioral-level optimization: optimize abstract models independently from the implementation parameters.
3. Architectural synthesis and optimization:
   • Create macroscopic structure: data-path and control-unit.
   • Consider area and delay information of the implementation (on the global level).
HLS Flow
High-level Code → Syntactic Analysis → Intermediate Representation → Optimization → Scheduling/Resource Allocation → Binding/Resource Sharing → Controller + Datapath
• Syntactic Analysis: converts code to intermediate representation; allows all following steps to use a language-independent format.
• Scheduling/Resource Allocation: determines when each operation will execute, and the resources used.
• Binding/Resource Sharing: maps operations onto physical resources.
The flow is split into a front-end (starting with syntactic analysis) and a back-end (ending with binding).
Analysis and Parsing
Syntactic Analysis
• Definition: analysis of code to verify syntactic correctness
  – Converts code into intermediate representation
• 2 steps
  – 1) Lexical analysis (lexing)
  – 2) Parsing
High-level Code → Syntactic Analysis (Lexical Analysis → Parsing) → Intermediate Representation
Lexical Analysis
• Lexical analysis (lexing)
  – Breaks code into a series of defined tokens
  – Token: defined language construct
Input:
    x = 0;
    if (y < z)
        x = 1;
Output of lexical analysis:
    ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN, ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1), SEMICOLON
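The lexing step above can be sketched in a few lines of Python. This is only an illustration, not the lexer of any particular HLS tool: the token names follow the slide's output, while the regular expressions and the function name `lex` are assumptions made for this example.

```python
import re

# Hypothetical token table for the toy language on the slide.
# Order matters: "if" must be tried before ID, and "!=" before "=".
TOKEN_SPEC = [
    ("IF",        r"if\b"),
    ("ID",        r"[A-Za-z_]\w*"),
    ("INT",       r"\d+"),
    ("NE",        r"!="),
    ("LT",        r"<"),
    ("ASSIGN",    r"="),
    ("LPAREN",    r"\("),
    ("RPAREN",    r"\)"),
    ("SEMICOLON", r";"),
    ("SKIP",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(code):
    """Break source text into a list of token strings such as 'ID(x)' or 'ASSIGN'."""
    tokens = []
    pos = 0
    while pos < len(code):
        match = MASTER.match(code, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {code[pos]!r} at offset {pos}")
        kind = match.lastgroup
        if kind in ("ID", "INT"):
            tokens.append(f"{kind}({match.group()})")   # tokens that carry a value
        elif kind != "SKIP":                            # whitespace is dropped
            tokens.append(kind)
        pos = match.end()
    return tokens

print(", ".join(lex("x = 0;if (y < z) x = 1;")))
```

A table of regular expressions like this is essentially what a "lex" specification encodes; the generated C code does the matching much more efficiently.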
Lexing Tools
• Define tokens using regular expressions; the tool outputs C code that lexes the input
  – Common tool is “lex”
    /* braces and parentheses */
    "[" { YYPRINT; return LBRACE; }
    "]" { YYPRINT; return RBRACE; }
    "," { YYPRINT; return COMMA; }
    ";" { YYPRINT; return SEMICOLON; }
    "!" { YYPRINT; return EXCLAMATION; }
    "{" { YYPRINT; return LBRACKET; }
    "}" { YYPRINT; return RBRACKET; }
    "-" { YYPRINT; return MINUS; }
    /* integers */
    [0-9]+ { yylval.intVal = atoi( yytext ); return INT; }
Parsing
• Performs analysis on token sequence to determine correct grammatical structure
  – Languages defined by context-free grammar
Grammar:
    Program = Exp
    Exp     = Stmt SEMICOLON |
              IF LPAREN Cond RPAREN Exp |
              Exp Exp
    Cond    = ID Comp ID
    Comp    = LT | NE
    Stmt    = ID ASSIGN INT
Correct programs:
    x = 0;
    x = 0; y = 1;
    if (a < b) x = 10;
    if (var1 != var2) x = 10;
    x = 0; if (y < z) x = 1;
    x = 0; if (y < z) x = 1; y = 5; t = 1;
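Parsing
Grammar:
    Program = Exp
    Exp     = S SEMICOLON |
              IF LPAREN Cond RPAREN Exp |
              Exp Exp
    Cond    = ID Comp ID
    Comp    = LT | NE
    S       = ID ASSIGN INT
Incorrect programs:
    x = y;                 (right-hand side must be an INT)
    x = 3 + 5;             (the grammar has no addition)
    x = 5;;;;              (empty statements cannot be derived)
    if (x+5 > y) x = 2;    (a condition is ID Comp ID, and Comp is only LT or NE)
    x = 5                  (missing SEMICOLON)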
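As a rough illustration of how this grammar accepts or rejects programs, here is a small recursive-descent recognizer in Python. It is a sketch under stated assumptions, not what yacc would generate: the helper names (`token_kinds`, `Parser`, `is_valid`) are invented for this example, and the left-recursive rule `Exp = Exp Exp` is handled as "one or more Exp in sequence", which accepts the same programs.

```python
import re

# Hypothetical tokenizer: returns token kinds only (values are not needed to check syntax).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<IF>if\b)|(?P<ID>[A-Za-z_]\w*)|(?P<INT>\d+)|(?P<NE>!=)|(?P<LT><)"
    r"|(?P<ASSIGN>=)|(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<SEMICOLON>;))"
)

def token_kinds(code):
    kinds, pos = [], 0
    code = code.rstrip()
    while pos < len(code):
        match = TOKEN_RE.match(code, pos)
        if match is None:
            raise SyntaxError(f"cannot lex input at offset {pos}")
        kinds.append(match.lastgroup)
        pos = match.end()
    return kinds

class Parser:
    def __init__(self, kinds):
        self.kinds = kinds
        self.pos = 0

    def peek(self):
        return self.kinds[self.pos] if self.pos < len(self.kinds) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.pos += 1

    def program(self):
        # Program = Exp, where "Exp Exp" means one or more Exp in a row.
        self.exp()
        while self.peek() is not None:
            self.exp()

    def exp(self):
        if self.peek() == "IF":          # IF LPAREN Cond RPAREN Exp
            self.expect("IF")
            self.expect("LPAREN")
            self.cond()
            self.expect("RPAREN")
            self.exp()
        else:                            # Stmt SEMICOLON
            self.stmt()
            self.expect("SEMICOLON")

    def cond(self):                      # Cond = ID Comp ID, Comp = LT | NE
        self.expect("ID")
        if self.peek() not in ("LT", "NE"):
            raise SyntaxError(f"expected LT or NE, found {self.peek()}")
        self.pos += 1
        self.expect("ID")

    def stmt(self):                      # Stmt = ID ASSIGN INT
        self.expect("ID")
        self.expect("ASSIGN")
        self.expect("INT")

def is_valid(code):
    try:
        Parser(token_kinds(code)).program()
        return True
    except SyntaxError:
        return False
```

With these definitions, `is_valid("if (a < b) x = 10;")` is true, while `is_valid("x = 5")` and `is_valid("x = y;")` are false. A real parser would also build the syntax tree rather than just answer yes or no.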
Parsing Tools
• Define grammar in special language
  – Automatically creates parser based on grammar
  – Popular tool is “yacc” (yet-another-compiler-compiler)
    program: functions { $$ = $1; }
        ;
    functions: function { $$ = $1; }
        | functions function { $$ = $1; }
        ;
    function: HEXNUMBER LABEL COLON code { $$ = $2; }
        ;
Overview of Hardware Synthesis
• Lexical analysis converts the program text file into strings of tokens. Tokens can be specified by regular expressions. In the UNIX world, the “lex” tools are popular for this.
• The syntax of a programming language is specified by a grammar. (A grammar defines the order and types of tokens.) Syntax analysis organizes streams of tokens into an abstract syntax tree.
• Semantic analysis analyzes the semantics, or meanings, of the program. It generates a symbol table and checks for uniqueness of symbols and information about them.
• A further step determines the order and organization of operations.
Intermediate Representations: DFG/CDFG
Intermediate Representation
• Parser converts tokens to intermediate representation
  – Usually, an abstract syntax tree
    x = 0;
    if (y < z)
        x = 1;
    d = 6;
[Figure: abstract syntax tree for the code above. The root has three children: assign(x, 0); if, with children cond(y < z) and assign(x, 1); and assign(d, 6).]
Intermediate Representation
• Why use intermediate representation?
  – Easier to analyze/optimize than source code
  – Theoretically can be used for all languages
    • Makes synthesis back end language independent
[Figure: C code, Java and Perl each pass through their own Syntactic Analysis into a common Intermediate Representation, which feeds a single Back End.]
Back end: scheduling, resource allocation and binding are independent of the source language; sometimes optimizations too.
Intermediate Representation
• Different types
  – Abstract Syntax Tree
  – Data Flow Graph (DFG)
  – Sequencing Graph (DFG with Source and Sink)
  – Control Flow Graph (CFG)
  – Control/Data Flow Graph (CDFG)
• CDFG combines control flow graph (CFG) and data flow graph (DFG)
[Figure: a small Data Flow Graph (two multiplications feeding an addition) next to a Control Flow Graph.]
Data Flow Graph
• Definition
  – A directed graph that shows the data dependencies between a number of functions
  – G = (V,E)
    • Nodes (V): each node has input/output data ports
    • Arcs (E): connections between the output ports and input ports
  – Semantics
    • A node fires when its input data are ready
    • It consumes data from its input ports and produces data on its output ports
    • There may be many nodes that are ready to fire at a given time
Data Flow Graphs
• DFG
  – Represents data dependencies between operations
    x = a+b;
    y = c*d;
    z = x - y;
[Figure: DFG for the code above. a and b feed +, producing x; c and d feed *, producing y; x and y feed -, producing z.]
[Figure: DFG computing the two roots X1 and X2 of a quadratic, x = (-b ± sqrt(b^2 - 4ac)) / (2a), from inputs a, b, c and constants 2, 4 and -1. Legend: multiplication, constant, square root, division.]
Nodes of a DFG can be any operators, including very complex operators.
Data flow graph construction
original code:
    x = a + b;
    y = a * c;
    z = x + d;
    x = y - d;
    x = x + c;
[Figure: DFG drawn from the original code, with inputs a, b, c, d and outputs y, z and x; the name x labels three different values.]
single-assignment form:
    x1 = a + b;
    y = a * c;
    z = x1 + d;
    x2 = y - d;
    x3 = x2 + c;
[Figure: DFG of the single-assignment form. a, b → + → x1; a, c → * → y; x1, d → + → z; y, d → - → x2; x2, c → + → x3.]
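The renaming into single-assignment form, and the extraction of DFG edges from it, can be sketched as follows. This is a minimal illustration for straight-line code only; the tuple format `(target, op, left, right)` and the function names are assumptions made for this example, and every variable is assumed to be either an input or defined before it is used.

```python
def to_single_assignment(stmts):
    """Rename variables that are assigned more than once (x -> x1, x2, ...)."""
    writes = {}
    for target, _, _, _ in stmts:
        writes[target] = writes.get(target, 0) + 1

    version = {}   # variable -> number of assignments seen so far
    current = {}   # variable -> name that holds its latest value
    renamed = []
    for target, op, left, right in stmts:
        # Operands read the latest version of each variable.
        left = current.get(left, left)
        right = current.get(right, right)
        if writes[target] > 1:
            version[target] = version.get(target, 0) + 1
            new_name = f"{target}{version[target]}"
        else:
            new_name = target
        current[target] = new_name
        renamed.append((new_name, op, left, right))
    return renamed

def dfg_edges(ssa):
    """Edges (producer, consumer) between operations, one node per assigned name."""
    defined = {target for target, _, _, _ in ssa}
    return [(src, target)
            for target, _, left, right in ssa
            for src in (left, right)
            if src in defined]

code = [("x", "+", "a", "b"), ("y", "*", "a", "c"), ("z", "+", "x", "d"),
        ("x", "-", "y", "d"), ("x", "+", "x", "c")]
ssa = to_single_assignment(code)
for target, op, left, right in ssa:
    print(f"{target} = {left} {op} {right};")
print(dfg_edges(ssa))
```

Once every value has a unique name, each edge of the graph has exactly one producer, which is what makes the DFG unambiguous.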
Sequencing Graph
• Add source and sink nodes (NOP) to the DFG
• Required for some scheduling algorithms
[Figure: a Data Flow Graph (DFG) of *, + and < operations, and the same graph as a Sequencing Graph with a NOP source node above it and a NOP sink node below it.]
Control Flow Graphs
• CFG
  – Represents control flow dependencies of basic blocks
  – A basic block is a section of code that always executes from beginning to end
    • I.e. no jumps into or out of the block
    acc = 0;
    for (i=0; i < 128; i++)
        acc += a[i];
[Figure: CFG of the loop. Block "acc=0, i=0" leads to the test "if (i < 128)"; when the test holds, control goes to block "acc += a[i]; i++" and back to the test; otherwise it goes to Done.]
Control/Data Flow Graph
• CDFG combines CFG and DFG
  – Maintains DFG for each node of CFG
    acc = 0;
    for (i=0; i < 128; i++)
        acc += a[i];
[Figure: the CFG of the loop with a DFG attached to each block. Block "acc=0; i=0;" assigns the constants 0 to acc and i; the loop body computes acc + a[i] → acc and i + 1 → i; the test "if (i < 128)" leads either to the loop body or to Done.]
Control/Data Flow Graph
• Definition
  – A directed graph that represents the control dependencies among the functions (branch, fall-through)
  – G = (V,E)
    • Nodes (V): encapsulated DFG, or decision
    • Arcs (E): flow of control
• Very similar to FSMD
  – Operation rectangles (instructions) can be very complicated
  – Diamonds for predicates can be very complicated and require many clock pulses to complete
CDFG Example
    fun0();
    if (cond1) fun1();
    else fun2();
    fun3();
    switch(test1) {
        case 1: fun4(); break;
        case 2: fun5(); break;
        case 3: fun6(); break;
    }
    fun7();
[Figure: CDFG of the code. fun0 → decision cond1 (Y: fun1, N: fun2) → fun3 → decision test1 (fun4, fun5 or fun6) → fun7.]
Summary
• Data Flow Graph (DFG)
  – Models data dependencies.
  – Does not require that nodes be fired in a particular order.
  – Models operations in the functional model; no conditionals.
  – Allocation and Mapping
  – Scheduling: ASAP, ALAP, List-based scheduling
• Control/Data Flow Graph
  – Represents control dependencies
High-Level Synthesis: Optimization
Synthesis Optimizations
• After creating CDFG, high-level synthesis optimizes the graph
• Goals
  – Reduce area
  – Improve latency
  – Increase parallelism
  – Reduce power/energy
• 2 types
  – Data flow optimizations
  – Control flow optimizations
Behavioral Optimization
Data Flow Optimizations
• Tree-height reduction
  – Generally made possible from commutativity, associativity, and distributivity
[Figure: two before/after pairs of expression trees over the inputs a, b, c, d. In the first pair a chain of three additions becomes a balanced tree of additions; in the second pair a tree that mixes + and * is rebalanced.]
Tree-height reduction using commutativity and associativity
    x = (a + (b * c)) + d   →   x = (a + d) + (b * c)
No change in resources.
Tree-height reduction using distributivity
    x = a * (b * c * d + e)   →   x = (a * b) * (c * d) + (a * e)
Increase in resources required.
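To make the effect on tree height concrete, the sketch below builds an expression tree for a sum as a left-to-right chain and as a balanced tree, then compares heights and operation counts. The tuple representation `(op, left, right)` and the function names are assumptions for this illustration; rebalancing is only valid when the operator is associative.

```python
def chain(op, operands):
    """Left-to-right chain: ((a op b) op c) op d."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = (op, tree, operand)
    return tree

def balanced(op, operands):
    """Balanced tree over the same operands (requires an associative op)."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (op, balanced(op, operands[:mid]), balanced(op, operands[mid:]))

def height(tree):
    """Number of operations on the longest path from the root to a leaf."""
    if isinstance(tree, str):
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))

def count_ops(tree):
    """Total number of operations in the tree."""
    if isinstance(tree, str):
        return 0
    _, left, right = tree
    return 1 + count_ops(left) + count_ops(right)

operands = ["a", "b", "c", "d"]
for build in (chain, balanced):
    tree = build("+", operands)
    print(build.__name__, tree, "height:", height(tree), "ops:", count_ops(tree))
```

For four operands the chain has height 3 and the balanced tree height 2, with three additions in both cases, which matches the "no change in resources" remark for the associative case.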
Data Flow Optimizations
• Operator Strength Reduction
  – Replacing an expensive (“strong”) operation with a faster one
  – Common example: replacing multiply/divide with shift
    b[i] = a[i] * 8;   →   b[i] = a[i] << 3;
    a = b * 5;         →   c = b << 2; a = b + c;
    a = b * 13;        →   c = b << 2; d = b << 3; a = c + d + b;
  In each case, 1 multiplication becomes 0 multiplications.
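The shift amounts in these examples come straight from the binary representation of the constant (5 = 4 + 1, 13 = 8 + 4 + 1). A minimal sketch, assuming a positive integer constant; the function names are made up for this illustration:

```python
def shift_add_terms(constant):
    """Shift amounts k such that constant == sum(1 << k for k in result)."""
    if constant <= 0:
        raise ValueError("this sketch only handles positive constants")
    return [k for k in range(constant.bit_length()) if (constant >> k) & 1]

def multiply_by_constant(b, constant):
    """Compute b * constant using only shifts and additions."""
    return sum(b << k for k in shift_add_terms(constant))

print(shift_add_terms(8))    # one set bit: a single shift
print(shift_add_terms(5))    # two set bits: one shift plus b itself
print(shift_add_terms(13))   # three set bits: two shifts plus b itself
print(multiply_by_constant(7, 13))
```

Each set bit costs one shifter output and one adder input, so the transformation pays off mainly for constants with few set bits.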
Data Flow Optimizations
• Constant propagation
  – Statically evaluate expressions with constants
    x = 0; y = x * 15; z = y + 10;   →   x = 0; y = 0; z = 10;
Examples of propagation
• First transformation type: constant propagation
  – Before: a = 0, b = a + 1, c = 2 * b
  – After:  a = 0, b = 1, c = 2
• Second transformation type: variable propagation
  – Before: a = x, b = a + 1, c = 2 * a
  – After:  a = x, b = x + 1, c = 2 * x
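Constant propagation on straight-line code can be sketched as a single forward pass that remembers which variables currently hold known constants. The expression encoding (an `int`, a variable name, or a tuple `(op, left, right)`) and the function names are assumptions for this example; the sketch covers only the first transformation type, not variable propagation.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr, known):
    """Replace known variables by their constants and evaluate where possible."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return known.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, known), fold(right, known)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)      # both operands constant: evaluate statically
    return (op, left, right)

def propagate_constants(stmts):
    """stmts: list of (target, expr) in program order. Returns simplified statements."""
    known = {}
    result = []
    for target, expr in stmts:
        value = fold(expr, known)
        if isinstance(value, int):
            known[target] = value
        else:
            known.pop(target, None)      # target no longer holds a known constant
        result.append((target, value))
    return result

print(propagate_constants([("x", 0), ("y", ("*", "x", 15)), ("z", ("+", "y", 10))]))
```

Applied to the slide's first example this yields x = 0, y = 0, z = 10; statements that depend on an unknown input are left as expressions.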
Data Flow Optimizations
• Function Specialization
  – Create specialized code for common inputs
    • Treat common inputs as constants
    • If inputs are not known statically, must include an if statement for each call to the specialized function
Original:
    int f (int x) {
        int y = x * 15;
        return y + 10;
    }
    for (i=0; i < 1000; i++)
        f(0);
    …
Treat the frequent input as a constant:
    int f_opt () {
        return 10;
    }
    for (i=0; i < 1000; i++)
        f_opt();
    …
The general version is kept alongside the specialized one:
    int f (int x) {
        int y = x * 15;
        return y + 10;
    }
Data Flow Optimizations
• Common sub-expression elimination
  – If an expression appears more than once, repetitions can be replaced
Before:
    a = x + y;
    . . .
    b = c * 25 + x + y;
After:
    a = x + y;
    . . .
    b = c * 25 + a;
x + y is already determined, provided that x, y and a are not reassigned in between.
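A simple way to find such repetitions in straight-line code is to keep a table of "available" expressions and the variable that currently holds each one. The sketch below works on three-address statements `(target, op, left, right)` whose operands are all strings; this format, the `"copy"` pseudo-operation and the function name are assumptions made for the example.

```python
def eliminate_common_subexpressions(stmts):
    """Replace a recomputed expression by a copy of the variable that already holds it."""
    available = {}   # (op, left, right) -> variable currently holding that value
    result = []
    for target, op, left, right in stmts:
        key = (op, left, right)
        if op in ("+", "*"):
            # Commutative operators: a + b and b + a are the same expression.
            key = (op, *sorted((left, right)))
        holder = available.get(key)
        if holder is not None:
            result.append((target, "copy", holder, None))
        else:
            result.append((target, op, left, right))
        # Writing target invalidates expressions that read it or are stored in it.
        available = {k: v for k, v in available.items()
                     if target not in k[1:] and v != target}
        if holder is None and target not in (left, right):
            available[key] = target
    return result

# b = c * 25 + x + y, split into three-address form with temporaries t1 and t2.
code = [("a", "+", "x", "y"), ("t1", "*", "c", "25"),
        ("t2", "+", "x", "y"), ("b", "+", "t1", "t2")]
for stmt in eliminate_common_subexpressions(code):
    print(stmt)
```

Here the second `x + y` becomes a copy of `a`, so only one adder computes that sum. The invalidation step is what enforces the condition that the operands are not reassigned in between.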
Data Flow Optimizations
• Dead code elimination
  – Remove code that is never executed
    • Such code may look pointless, but it often comes from constant propagation or function specialization
    int f (int x) {
        if (x > 0)
            a = b * 15;
        else
            a = b / 4;
        return a;
    }
    int f_opt () {
        a = b * 15;
        return a;
    }
The specialized version for x > 0 does not need the else branch: it is “dead code”.
Data Flow Optimizations
• Code motion (hoisting/sinking)
  – Avoid repeated computation
Before:
    for (i=0; i < 100; i++) {
        z = x + y;
        b[i] = a[i] + z;
    }
After:
    z = x + y;
    for (i=0; i < 100; i++) {
        b[i] = a[i] + z;
    }
Control Flow Optimizations
• Loop Unrolling
  – Replicate body of loop
    • May increase parallelism
Before:
    for (i=0; i < 128; i++)
        a[i] = b[i] + c[i];
After:
    for (i=0; i < 128; i+=2) {
        a[i] = b[i] + c[i];
        a[i+1] = b[i+1] + c[i+1];
    }
Control Flow Optimizations
• Function Inlining
  – Replace function call with body of function
    • Common for both SW and HW
    • SW: eliminates function call instructions
    • HW: eliminates unnecessary control states
Before:
    for (i=0; i < 128; i++)
        a[i] = f( b[i], c[i] );
    . . . .
    int f (int a, int b) {
        return a + b * 15;
    }
After:
    for (i=0; i < 128; i++)
        a[i] = b[i] + c[i] * 15;
Control Flow Optimizations
• Conditional Expansion
  – Replace if with logic expression
    • Execute if/else bodies in parallel
Before:
    y = ab
    if (a) x = b+d
    else x = bd
After:
    y = ab
    x = a(b+d) + a’bd
Can be further optimized to:
    y = ab
    x = y + d(a+b)
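Since a, b and d are single-bit values here, the three forms can be checked against each other exhaustively. The following sketch does that over all eight input combinations; the function names are illustrative only.

```python
from itertools import product

def branching(a, b, d):
    """Original form: if (a) x = b + d else x = bd, with + as OR and juxtaposition as AND."""
    return (b or d) if a else (b and d)

def expanded(a, b, d):
    """Conditional expansion: x = a(b+d) + a'bd."""
    return (a and (b or d)) or ((not a) and b and d)

def optimized(a, b, d):
    """Further optimized form that reuses y = ab: x = y + d(a+b)."""
    y = a and b
    return y or (d and (a or b))

def all_equivalent():
    """True if the three forms agree for every combination of a, b, d."""
    return all(
        bool(branching(a, b, d)) == bool(expanded(a, b, d)) == bool(optimized(a, b, d))
        for a, b, d in product([False, True], repeat=3)
    )

print(all_equivalent())
```

The last form works because ab + ad + bd already covers the a’bd term, and it shares the product ab with the computation of y.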
High-level Synthesis: Summary
Main Steps
High-level Code → Syntactic Analysis → Intermediate Representation → Optimization → Scheduling/Resource Allocation → Binding/Resource Sharing → Controller + Datapath
• Syntactic Analysis: converts code to intermediate representation; allows all following steps to use a language-independent format.
• Scheduling/Resource Allocation: determines when each operation will execute, and the resources used.
• Binding/Resource Sharing: maps operations onto physical resources.
The flow is split into a front-end (starting with syntactic analysis) and a back-end (ending with binding).
Scheduling, Allocation and Assignment
[Figure: a data flow graph of +, - and >> operations with delay elements (D), annotated with the three decisions below.]
• Allocation: how much? (2 adders, 1 shifter, 24 registers)
• Assignment: where? (Shifter 1)
• Schedule: when? (Time Slot 4)
Techniques are well understood and mature.
Synthesis in the Temporal Domain
• ASAP schedule
• Here we use the sequencing graph
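ASAP (as soon as possible) scheduling places every operation in the earliest step allowed by its predecessors. The sketch below is one common way to compute it, by walking the graph in topological order; the dictionary format, the default delay of one step per operation and the function name are assumptions for this example, and every node must appear as a key of `preds`.

```python
from collections import deque

def asap_schedule(preds, delay=None):
    """preds: node -> list of predecessor nodes. Returns node -> earliest start step (1-based)."""
    delay = delay or {}
    succs = {node: [] for node in preds}
    pending = {node: len(ps) for node, ps in preds.items()}
    for node, ps in preds.items():
        for p in ps:
            succs[p].append(node)

    start = {node: 1 for node in preds}
    ready = deque(node for node, count in pending.items() if count == 0)
    scheduled = 0
    while ready:
        node = ready.popleft()
        scheduled += 1
        finish = start[node] + delay.get(node, 1)   # first step after this node completes
        for succ in succs[node]:
            start[succ] = max(start[succ], finish)
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    if scheduled != len(preds):
        raise ValueError("dependency cycle: not a valid sequencing graph")
    return start

# Operations of the single-assignment example: x1 = a+b, y = a*c, z = x1+d, x2 = y-d, x3 = x2+c.
preds = {"x1": [], "y": [], "z": ["x1"], "x2": ["y"], "x3": ["x2"]}
print(asap_schedule(preds))
```

With unit delays the example needs three steps: x1 and y in step 1, z and x2 in step 2, x3 in step 3. ASAP ignores resource limits, so it gives a lower bound on latency rather than a final schedule.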
Synthesis in the Spatial Domain
[Figure: operations bound to resources: first, second, third and fourth multiplier; first and second ALU.]
• Solution
  – Four multipliers
  – Two ALUs
  – Four cycles
Main Steps
• Front-end (lexing/parsing) converts code into intermediate representation
  – We looked at CDFG
• Scheduling assigns a start time for each operation in DFG
  – CFG node start times defined by control dependencies
  – Resource allocation determined by schedule
• Binding maps scheduled operations onto physical resources
  – Determines how resources are shared
• Big picture:
  – Scheduled/bound DFG can be translated into a datapath
  – CFG can be translated to a controller
  – => High-level synthesis can create a custom circuit for any CDFG!
Example
• Optimize this
    x = 0;
    y = a + b;
    if (x < 15)
        z = a + b - c;
    else
        z = x + 12;
    output = z * 12;