DSE-14-E14 EMBEDDED HARDWARE SYSTEMS DESIGN LECTURE 4 – MODELING WITH SYSTEMC Summer Semester – 2019 Lecturer: Prof. Dr. Akash Kumar – Dr. Tuan D. A. Nguyen
© Akash Kumar
Previous Lecture
System Specification
Specification requirements:
• Functionality
• Timing
• Performance
• External interface to other systems
• Power consumption
• Manufacturing cost
• ...
[Figure: mapping flow. The application, the architecture and the constraints feed the mapping step; the mapping results drive HW synthesis (HW), SW synthesis (SW) and IF synthesis.]
Zynq-7000 SoC Architecture
https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf
Zynq UltraScale SoC Architecture
https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
Hardware Modeling: Abstraction Levels and Synthesis Flow with HDL
[Figure: language models mapped to abstract models, spanning the behavioral view and the structural view]
• HDL → compilation → operations & dependencies (control/data-flow graph)
• Architectural synthesis/optimization
• HDL → compilation → FSMs and logic functions (state tables & logic networks)
• Logic synthesis/optimization
• HDL → translation → interconnected logic blocks (logic networks)
Abstract Model
• Structure
• Logic network
• State diagram
• CDFG – Control/Data Flow Graph
Hardware Compiler vs. Software Compiler
• Software compiler: front end (lex, parse) → intermediate form (optimization) → back end (codegen) → machine code
• Hardware compiler: front end (lex, parse) → intermediate form (behavioral optimization) → back end (a-synthesis, l-synthesis, P&R) → mask layout / FPGA bitstream
Control-flow-based Transformation

Model expansion
Before:
    x = a + b;
    y = a * b;
    z = foo (x, y);
    foo (p, q) {
        t = q - p;
        return t;
    }
After:
    x = a + b;
    y = a * b;
    z = y - x;

Conditional expansion
Before:
    y = ab;
    if (a)
        x = b + d;
    else
        x = bd;
After:
    y = ab;
    x = y + d (a + b);
Derivation:
    x = a(b + d) + a'bd
      = a(b + d + bd) + a'bd
      = ab + ad + abd + a'bd
      = ab + ad + bd
      = ab + d (a + b)
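The conditional expansion can be sanity-checked by brute force, since there are only eight input combinations. The following is a minimal sketch in plain C++; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Original form: the condition a selects between (b OR d) and (b AND d).
bool x_conditional(bool a, bool b, bool d) {
    if (a) {
        return b || d;   // x = b + d
    }
    return b && d;       // x = bd
}

// Expanded form from the derivation: y = ab, x = y + d(a + b).
bool x_expanded(bool a, bool b, bool d) {
    bool y = a && b;
    return y || (d && (a || b));
}

// Compares both forms over all 8 combinations of a, b and d.
bool forms_are_equivalent() {
    for (int bits = 0; bits < 8; ++bits) {
        bool a = (bits & 4) != 0;
        bool b = (bits & 2) != 0;
        bool d = (bits & 1) != 0;
        if (x_conditional(a, b, d) != x_expanded(a, b, d)) {
            return false;
        }
    }
    return true;
}
```

Calling `forms_are_equivalent()` returns true when the branch-free expression matches the original conditional for every input, which is the property the expansion relies on.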
FPGA Design Flow
Design
• HDL
• Block design
Simulate
• Testbench
Generate bitstream
• 1-button click
Basic Structure of a Testbench
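[Figure: a driver applies the stimulus to the design under test (black box) and to a model; the expected output of the model is compared (==?) with the actual output of the design under test.]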
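The testbench structure above can be sketched in plain C++ without any simulation library. This is only an illustration under stated assumptions: the half adder used as the design under test, the arithmetic reference model, and all names (`Stimulus`, `Response`, `run_testbench`, ...) are hypothetical and not taken from the lecture code.

```cpp
#include <cassert>
#include <vector>

struct Stimulus { bool a, b; };
struct Response { bool sum, carry; };

// Design under test (treated as a black box): a gate-level half adder.
Response dut_half_adder(const Stimulus& s) {
    return Response{ s.a != s.b, s.a && s.b };
}

// Reference model: computes the same result arithmetically.
Response reference_model(const Stimulus& s) {
    int total = int(s.a) + int(s.b);
    return Response{ (total & 1) != 0, (total & 2) != 0 };
}

// All four input combinations of a two-input design.
std::vector<Stimulus> all_half_adder_inputs() {
    return { {false, false}, {false, true}, {true, false}, {true, true} };
}

// Driver and comparator: applies every stimulus to both the DUT and the
// model, and returns the number of mismatches between actual and expected.
int run_testbench(const std::vector<Stimulus>& stimuli) {
    int mismatches = 0;
    for (const Stimulus& s : stimuli) {
        Response actual = dut_half_adder(s);
        Response expected = reference_model(s);
        if (actual.sum != expected.sum || actual.carry != expected.carry) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero from `run_testbench(all_half_adder_inputs())` means the design agreed with the model on every stimulus. Keeping the model independent of the design (arithmetic versus gates) is what makes the comparison meaningful.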
Simulation
• Event-driven
• Each simulation time step (ns, ps) is decomposed into a large number of ∆ (delta) time slots
• The dependencies (input, output, propagation delay, etc.) between components are resolved sequentially based on ∆
• Simulation speed depends on the level of abstraction
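The role of the ∆ slots can be illustrated with a two-phase signal: writes made during the evaluate phase become visible only after the update phase. The sketch below is plain C++ and deliberately simplified; it is not the SystemC kernel, and `Signal` and `swap_in_one_delta` are hypothetical names.

```cpp
#include <cassert>
#include <utility>

// A signal with a current value and a pending next value.
struct Signal {
    int current;
    int next;
    explicit Signal(int initial) : current(initial), next(initial) {}
    int read() const { return current; }       // always returns the old value
    void write(int value) { next = value; }    // takes effect at update()
    void update() { current = next; }          // end of the delta slot
};

// Two concurrent processes, "p takes q" and "q takes p", run in the same
// delta slot. Because both read the old values, the signals swap cleanly
// regardless of the order in which the processes are evaluated.
std::pair<int, int> swap_in_one_delta(int p0, int q0) {
    Signal p(p0), q(q0);
    // Evaluate phase
    p.write(q.read());
    q.write(p.read());   // still sees the old value of p
    // Update phase
    p.update();
    q.update();
    return std::make_pair(p.read(), q.read());
}
```

With ordinary variables the second assignment would read the already overwritten value; separating evaluation from update is what lets a sequential program mimic concurrent hardware.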
Agenda
1. Guest Presentation
2. SystemC Overview
3. Combinational and Sequential Circuit Modelling
4. Use case: Network-on-Chip
What is SystemC
• SystemC is a modeling platform
  • A C++ class library that adds hardware modeling constructs
  • A simulation kernel
  • Support for different levels of abstraction: untimed functional model, behavioral-level model, register-transfer-level model
• The SystemC class library provides constructs for modeling system architecture that are missing in standard C++
  • Concurrency
  • Timed events
  • Data types
• SystemC models provide the hardware and software development teams with an executable specification of the system.
Why SystemC
Benefits of using a SystemC executable specification:
• Avoids inconsistency and errors, and ensures the completeness of the specification
• Uses the same language to describe both software algorithms and hardware architectures
• Validates system functionality before implementation begins
• Helps create early performance models of the system and validate system performance
• The test bench used to test the executable specification can be reused to test the implementation of the specification
Current System Design Methodology
[Figure: C/C++ system-level model → analysis → results; manual conversion → VHDL/Verilog → simulation → synthesis → rest of process]
Problems:
• Manual conversion from C to HDL creates errors
• Disconnect between system model and HDL model
• Multiple system tests
SystemC Design Methodology
[Figure: SystemC model → simulation → refinement or behavior synthesis → logic synthesis → rest of process]
Benefits:
• Allows gradual refinement or behavior synthesis
• System model and synthesis models are written in a single language
• Test benches can be reused at different abstraction levels
SystemC Design Flow
1. Write source files and test benches for the system.
2. Compile the code and link it with the SystemC class library.
3. Execute the compiled binary and check the simulation results.
4. Return to step 1 if the performance is not satisfactory.
SystemC Books
1. A SystemC Primer, by Jayaram Bhasker, ISBN 0965039129
2. SystemC: From the Ground Up, by David C. Black et al., ISBN 1402079885
3. System Design with SystemC, by Thorsten Grötker et al., ISBN 1402070721
4. The C++ Programming Language, by Bjarne Stroustrup, ISBN 0-201-88954-4
Module, Ports, Processes and Signals
[Figure: an sc_module with sc_in, sc_out and sc_inout ports; inside it, sc_method processes and a child sc_module are connected through sc_signal.]
Example – Adder
https://www.electronics-tutorials.ws/combination/comb_7.html
Example – Half-Adder in SystemC and VHDL
//File: half_adder.h
#include "systemc.h"
SC_MODULE(half_adder) {
    sc_in<bool> a, b;            // input ports
    sc_out<bool> sum, carry;     // output ports
    void proc_half_adder();
    SC_CTOR(half_adder) {
        SC_METHOD(proc_half_adder);  // re-evaluated whenever a or b changes
        sensitive << a << b;
    }
};
//File: half_adder.cpp
#include "half_adder.h"
void half_adder::proc_half_adder() {
    sum = a ^ b;      // sum is the XOR of the inputs
    carry = a & b;    // carry is the AND of the inputs
}
Module – Basic Block
VHDL
    architecture arch_name
        variable and signal declaration
        computation block
    end architecture
SystemC
    SC_MODULE (module_name) {
        input/output declaration
        internal variable
        constructor (computation block)
    };
Input/Output Declaration
VHDL
• Input: var1 : in std_logic; …
• Output: var2 : out std_logic; …
• Types: std_logic, std_logic_vector
SystemC
• Input: sc_in<type> var1, …;
• Output: sc_out<type> var2, …;
• Types:
  • C++ primitive types: int, float, char, ...
  • Hardware types: sc_int, sc_uint, ...
  • User-defined types
Computation Block
VHDL
• Event trigger: process (a, b, c)
• Edge trigger: if clk'event and clk = '1'
SystemC
    SC_CTOR (module_name) {            // C++ constructor
        SC_METHOD (function_name);     // computation function name
        sensitive << a << b << c;      // sensitivity list
        …
    }
Describing Hierarchy
//File: full_adder.h
#include "half_adder.h"
SC_MODULE (full_adder) {
    sc_in<bool> a, b, carry_in;
    sc_out<bool> sum, carry_out;
    sc_signal<bool> s1, c1, c2;    // internal wires between the half adders and the OR
    half_adder ha1, ha2;           // child modules, named in the constructor
    void proc_or();                // carry_out = c1 | c2
    SC_CTOR(full_adder) : ha1("ha1"), ha2("ha2") {
        ha1.a(a);                  // connection by name
        ha1.b(b);
        ha1.sum(s1);
        ha1.carry(c1);
        ha2(s1, carry_in, sum, c2);  // connection by position
        SC_METHOD (proc_or);
        sensitive << c1 << c2;
    }
};
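The hierarchy (two half adders plus an OR on the two carries) can be checked with an untimed functional model in plain C++. This is a sketch, not the lecture's SystemC code; the struct and function names are hypothetical.

```cpp
#include <cassert>

struct HalfAdderOut { bool sum, carry; };
struct FullAdderOut { bool sum, carry_out; };

// sum = a XOR b, carry = a AND b
HalfAdderOut half_adder(bool a, bool b) {
    return HalfAdderOut{ a != b, a && b };
}

// Mirrors the module hierarchy: ha1 adds a and b, ha2 adds the
// intermediate sum and carry_in, and the carries are ORed together.
FullAdderOut full_adder(bool a, bool b, bool carry_in) {
    HalfAdderOut ha1 = half_adder(a, b);               // produces s1, c1
    HalfAdderOut ha2 = half_adder(ha1.sum, carry_in);  // produces sum, c2
    return FullAdderOut{ ha2.sum, ha1.carry || ha2.carry };
}

// Checks all 8 input combinations against ordinary integer addition.
bool matches_binary_addition() {
    for (int bits = 0; bits < 8; ++bits) {
        bool a = (bits & 4) != 0;
        bool b = (bits & 2) != 0;
        bool cin = (bits & 1) != 0;
        FullAdderOut r = full_adder(a, b, cin);
        int total = int(a) + int(b) + int(cin);
        if (int(r.sum) + 2 * int(r.carry_out) != total) {
            return false;
        }
    }
    return true;
}
```

A function like `matches_binary_addition()` can serve as the reference model when the hierarchical SystemC module is later simulated.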
Main – Top Module for Testbench
[Figure: Pattern_Gen drives a, b and c_in of Full_Adder; Monitor observes these inputs together with the sum and carry outputs.]
//File: full_adder_testbench.cpp
#include "full_adder.h"
#include "pattern_gen.h"
#include "monitor.h"
int sc_main(int argc, char* argv[]) {
    sc_signal<bool> t_a, t_b, t_cin, t_sum, t_cout;
    full_adder f1("Fulladder");
    // connect using positional association
    f1 << t_a << t_b << t_cin << t_sum << t_cout;
    pattern_gen* pg_ptr = new pattern_gen("Generation");
    // connect using named association
    pg_ptr->d_a(t_a);
    pg_ptr->d_b(t_b);
    pg_ptr->d_cin(t_cin);
    monitor mo1("Monitor");
    mo1 << t_a << t_b << t_cin << t_sum << t_cout;
    sc_start(100, SC_NS);   // run the simulation for 100 ns
    return 0;
}
SC_MODULE – Combinational and Sequential Logic
SC_MODULE (module_name) {
    // Declarations of ports: input, output and inout
    // Declarations of signals used in inter-process communication
    // Process method declarations
    // Other (non-process) methods
    // Child module instantiation pointer declarations
    // Data variable declarations
    SC_CTOR (module_name) {
        // Child module instantiations and interconnections
        SC_METHOD (process_method_name);
        // Sensitivity list for process
        SC_THREAD (process_thread_name);
        // Sensitivity list for process
    }
};
Combinational Logic: Single Process
SC_MODULE (mac) {
    sc_in<int> a, b, c;
    sc_out<int> sum;
    void proc_mac() { sum = a * b + c; }   // multiply-accumulate
    SC_CTOR(mac) {
        SC_METHOD(proc_mac);
        sensitive << a << b << c;          // re-evaluate on any input change
    }
};
Combinational Logic: Multiple Processes
SC_MODULE (mult_procs) {
    sc_in<bool> source;
    sc_out<bool> drain;
    sc_signal<bool> connect1, connect2;    // wires between the processes
    void mult_procs_1() { connect1 = !source; }
    void mult_procs_2() { connect2 = !connect1; }
    void mult_procs_3() { drain = !connect2; }
    SC_CTOR (mult_procs) {
        SC_METHOD (mult_procs_1);
        sensitive << source;
        SC_METHOD (mult_procs_2);
        sensitive << connect1;
        SC_METHOD (mult_procs_3);
        sensitive << connect2;
    }
};
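A chain of three inverting processes needs several ∆ slots to settle after the source changes, even though no simulation time passes. The sketch below models this in plain C++ under simplifying assumptions: every process is re-evaluated in every delta slot (a real kernel runs only the processes sensitive to a changed signal), and the names `ChainResult` and `settle_inverter_chain` are hypothetical.

```cpp
#include <cassert>

struct ChainResult {
    bool drain;        // settled output value
    int delta_cycles;  // delta slots in which at least one signal changed
};

// Starts from a state that is stable for old_source, applies new_source,
// and repeats evaluate/update until no signal changes any more.
ChainResult settle_inverter_chain(bool old_source, bool new_source) {
    bool connect1 = !old_source;
    bool connect2 = !connect1;
    bool drain = !connect2;
    bool source = new_source;
    int deltas = 0;
    for (;;) {
        // Evaluate phase: each process reads the current (old) values.
        bool next1 = !source;
        bool next2 = !connect1;
        bool next3 = !connect2;
        bool changed = (next1 != connect1) || (next2 != connect2) || (next3 != drain);
        if (!changed) {
            break;   // stable: simulation time may advance
        }
        // Update phase: new values become visible for the next delta slot.
        connect1 = next1;
        connect2 = next2;
        drain = next3;
        ++deltas;
    }
    return ChainResult{ drain, deltas };
}
```

In this model a toggle of `source` takes three delta slots to reach `drain`, one per process in the chain, while an unchanged source takes none.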
Sequential Logic
SC_MODULE (count4) {
    sc_in_clk clk;          // sc_in_clk is equivalent to sc_in<bool>
    sc_in<bool> rst;
    sc_out<int> cout;
    int curValue;
    void proc_mac() {
        curValue = rst ? 0 : curValue + 1;
        cout = curValue;
    }
    SC_CTOR(count4) {
        SC_METHOD(proc_mac);
        sensitive << clk.pos();                   // sync reset
        // sensitive << clk.pos() << rst.pos();   // alternative: async reset
    }
};
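The clocked behavior of the counter (act only on a rising clock edge, with a synchronous reset) can be mimicked in plain C++. This is a minimal sketch rather than SystemC code; `Count4Model`, `on_clk` and `count_after_cycles` are hypothetical names, and like the module above the model does not wrap the count.

```cpp
#include <cassert>

struct Count4Model {
    bool prev_clk = false;
    int cur_value = 0;
    int cout = 0;

    // Called whenever clk may have changed; only a rising edge has an effect.
    void on_clk(bool clk, bool rst) {
        bool rising = clk && !prev_clk;
        prev_clk = clk;
        if (!rising) {
            return;
        }
        cur_value = rst ? 0 : cur_value + 1;   // synchronous reset
        cout = cur_value;
    }
};

// Applies n full clock periods; rst is asserted only during the period with
// index reset_at_cycle (pass -1 for no reset). Returns the counter output.
int count_after_cycles(int n, int reset_at_cycle) {
    Count4Model m;
    for (int i = 0; i < n; ++i) {
        bool rst = (i == reset_at_cycle);
        m.on_clk(true, rst);    // rising edge
        m.on_clk(false, rst);   // falling edge, ignored by the model
    }
    return m.cout;
}
```

For example, `count_after_cycles(5, 2)` counts two edges, is reset on the third, and counts two more, ending at 2.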
Network-on-Chip
A network on a chip or network-on-chip (NoC) is a network-based communications subsystem on an integrated circuit ("microchip"), most typically between modules in a system on a chip (SoC).
https://en.wikipedia.org/wiki/Network_on_a_chip
[Figure: 3×3 mesh of routers R00–R22, each attached to a network interface NI00–NI22]
Types of NoC
• Circuit-switched (CSw) vs. packet-switched
  • Better predictability in real-time embedded systems [1,2]
• TDM vs. SDM
  • Lower data latency, less complex switch and cheaper network interfaces/switches [3,4]
[1] Goossens, K. & Hansson, A., 2010. The aethereal network on chip after ten years: Goals, evolution, lessons, and future. In DAC, 2010 47th ACM/IEEE. pp. 306–311
[2] Liu, S., Jantsch, A. & Lu, Z., 2013. Analysis and evaluation of circuit switched NoC and packet switched NoC. In DSD, 2013 Euromicro Conference on. pp. 21–28
[3] Sparsø, J., Kasapaki, E. & Schoeberl, M., 2013. An area-efficient network interface for a TDM-based Network-on-Chip. In DATE 2013. pp. 1044–1047
[4] Lusala, A. K. & Legat, J., 2010. A Hybrid Router Combining SDM-Based Circuit Switching with Packet Switching for On-chip Networks. In ReConFig 2010. pp. 340–345
TDM CSw NoC - Centralized
[Figure: 3×3 mesh of switches SW00–SW22 with network interfaces NI00–NI22, together with a centralized controller and a processor]
TDM CSw NoC - Decentralized
[Figure: 3×3 mesh of switches SSW00–SSW22 with network interfaces NI00–NI22]
TDM CSw NoC: Centralized vs. Decentralized

Features                           Centralized NoC [1,2,3]   Decentralized NoC [4,5]
Have global view of the network    √                         ×
Change routing algorithm           √                         ×
Allocate multi-slot path           √                         ×
Set up multicast domain            √                         ×
Reconfiguration latency            ×                         √
Scalability                        ×                         √

[1] Goossens, K., Dielissen, J. & Radulescu, A., 2005. Æthereal network on chip: concepts, architectures, and implementations. Design & Test of Computers, IEEE
[2] Hansson, A., Subburaman, M. & Goossens, K., 2009. aelite: A flit-synchronous network on chip with composable and predictable services. In DATE 2009
[3] Stefan, R.A., Molnos, A. & Goossens, K., 2014. dAElite: A TDM NoC supporting qos, multicast, and fast connection set-up. IEEE Transactions on Computer
[4] Liu, S., Jantsch, A. & Lu, Z., 2014. Parallel probe based dynamic connection setup in TDM NoCs. In DATE 2014
[5] Lusala, A.K. & Legat, J.-D., 2012. A SDM-TDM-Based Circuit-Switched Router for On-Chip Networks. ACM TRETS
Distributed Control Plane
[Figure: 4×4 mesh of switches SW00–SW33 with network interfaces NI00–NI33, together with a centralized controller and a processor]
Distributed Control Plane
[Figure: the 4×4 mesh of switches SW00–SW33 and network interfaces NI00–NI33 grouped into four 2×2 regions with sub-controllers Sub-Ctrl 00, Sub-Ctrl 01, Sub-Ctrl 10 and Sub-Ctrl 11; a control agent and a processor; synchronization labeled Strict Sync. and Out-of-Order Sync.]
Sub-Controller Synchronization
[Figure: two panels, Strict Synchronization and Out-of-Order Synchronization]
What to Simulate?
TDM CSw NoC
• Data plane: predictable in terms of throughput and latency once the connection is established
• Control plane: dependent on the current state of the NoC and the communication patterns between IPs
→ Control plane
What to measure?
Distributed Control Plane vs. Centralized Control Plane
• Speedup in establishing connection
• Overhead in data injection latency
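One plausible way to turn these two metrics into numbers is sketched below. The definitions are assumptions made for illustration (speedup as a latency ratio, overhead as a cycle difference); the slides do not state the exact formulas, and the function names are hypothetical.

```cpp
#include <cassert>

// Assumed definition: how many times faster the distributed control plane
// establishes a connection compared with the centralized one.
double setup_speedup(double centralized_cycles, double distributed_cycles) {
    return centralized_cycles / distributed_cycles;
}

// Assumed definition: extra clock cycles the distributed control plane adds
// to the data injection latency relative to the centralized one.
int injection_overhead(int centralized_cycles, int distributed_cycles) {
    return distributed_cycles - centralized_cycles;
}
```

With hypothetical latencies of 120 and 40 clock cycles, `setup_speedup(120.0, 40.0)` evaluates to 3.0; a positive `injection_overhead` means the distributed scheme injects data later than the centralized one.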
Test scenarios
• Number of SubControllers
• The communication patterns
  • Locality
• The communication between SubControllers
  • No Out-of-Order
  • Out-of-Order
• The Controllers have buffers to store the requests
  • Buffer size
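The scenario dimensions above can be combined into a parameter sweep. The following plain C++ sketch enumerates every combination; the `Scenario` struct, the function names and the concrete values in `example_sweep_size` (in particular the locality steps and the buffer size) are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Scenario {
    std::string sub_controller_grid;  // arrangement of SubControllers, e.g. "2x2"
    bool out_of_order;                // communication between SubControllers
    int locality_percent;             // communication pattern
    int buffer_size;                  // request buffer size in the controllers
};

// Builds the cross product of all parameter values.
std::vector<Scenario> enumerate_scenarios(const std::vector<std::string>& grids,
                                          const std::vector<int>& localities,
                                          const std::vector<int>& buffer_sizes) {
    std::vector<Scenario> scenarios;
    for (const std::string& grid : grids) {
        for (bool ooo : {false, true}) {
            for (int locality : localities) {
                for (int buffer : buffer_sizes) {
                    scenarios.push_back(Scenario{grid, ooo, locality, buffer});
                }
            }
        }
    }
    return scenarios;
}

// Example sweep with illustrative values: 4 grids x 2 modes x 3 localities x 1 buffer size.
std::size_t example_sweep_size() {
    return enumerate_scenarios({"1x2", "2x2", "2x4", "4x4"}, {0, 50, 100}, {4}).size();
}
```

Each generated `Scenario` would then parameterize one simulation run, so the sweep size shows directly how quickly the number of runs grows with each added dimension.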
Structure
[Figure: block diagram of the simulation model: Main, global signals, a request generator, a centralized controller, a control agent (decentralized) with four SubCtrl blocks, and Finish signals]
Main
Generate Instruction
Central Controller
SubController
Compile SystemC
Testcases with Different Parameters
Speedup
[Figure: speedup (axis 0 to 8) versus locality (0% to 100%) for the 1x2, 2x2, 2x4 and 4x4 configurations, each with and without out-of-order (OOO) synchronization]
Connection Activation Latency
[Figure: bar chart of connection activation latency in clock cycles at 6, 8, 10 and 12 hops for dAElite [1], Par. Probe [2], XNoC - Cent. and XNoC - Dist.; data labels in extraction order: 81, 94, 119, 133; 208, 240, 272, 304; 136, 152, 168, 184; 37, 49, 62, 74]
[1] Stefan, R.A., Molnos, A. & Goossens, K., 2014. dAElite: A TDM NoC supporting qos, multicast, and fast connection set-up. IEEE Transactions on Computer
[2] Liu, S., Jantsch, A. & Lu, Z., 2014. Parallel probe based dynamic connection setup in TDM NoCs. In DATE 2014
Data Injection Latency
[Figure: bar chart of data injection latency in clock cycles at 6, 8, 10 and 12 hops for dAElite [1], Par. Probe [2], XNoC - Cent. and XNoC - Dist.; data labels in extraction order: 89, 103, 127, 141; 34, 38, 42, 46; 10, 10, 10, 10; 13, 14, 16, 17]
[1] Stefan, R.A., Molnos, A. & Goossens, K., 2014. dAElite: A TDM NoC supporting qos, multicast, and fast connection set-up. IEEE Transactions on Computer
[2] Liu, S., Jantsch, A. & Lu, Z., 2014. Parallel probe based dynamic connection setup in TDM NoCs. In DATE 2014
Summary
• Design evaluation can be made at different levels depending on the metrics: quality of results, performance, etc.
• SystemC provides a quick framework for early evaluation of system specifications
• SystemC is not much different from C++
• Planning a testbench: what to simulate, what/how to measure the desired metrics, test scenarios
Thank You!