8/6/2019 W13L18- Real Time System Design -1
CSE-350: FPGA Based System Design
Course Instructor: Ms. Saba Zia
Week 13: Real Time System Design
Agenda
Review of Register Transfer Level (RTL)
How to architect speed with:
High throughput
Low latency
Timing: adding register layers, parallel structures, flattening logic structures
Ref: Advanced FPGA Design: Architecture, Implementation, and Optimization by Steve Kilts, Chapter 1
Register Transfer Level (RTL)
[Figure: Input Register -> Combinatorial Cloud -> Output Register. The combinatorial delay is the delay through the cloud between the two registers.]
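The register-to-register structure can be illustrated with a small sketch. This is only an illustration under assumptions: the module name, port names, widths, and the choice of an adder as the "cloud" are hypothetical and do not come from the slides.

```verilog
// Minimal RTL sketch: input registers -> combinatorial cloud -> output register.
// All names and widths here are hypothetical.
module rtl_stage (
  input  wire       clk,
  input  wire [7:0] a,
  input  wire [7:0] b,
  output reg  [8:0] out
);
  reg  [7:0] a_r, b_r;      // input registers
  wire [8:0] sum;           // output of the combinatorial cloud

  // Combinatorial cloud: its delay must fit within one clock period.
  assign sum = a_r + b_r;

  always @(posedge clk) begin
    a_r <= a;               // capture the inputs
    b_r <= b;
    out <= sum;             // output register captures the cloud's result
  end
endmodule
```

The maximum clock frequency is set by the slowest such register-to-register path, which is why the rest of the lecture focuses on shortening or splitting the cloud.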
How to Architect Speed in a Simple Design
always @ (posedge clk)
begin
out
Single-Cycle Implementation
How to Architect Speed in a Simple Design
always @ (posedge clk)
begin
case(n)
0: out1
Multi-Cycle Implementation (Iterative Process)
How to Architect Speed in a Simple Design
always @ (posedge clk)
begin
out1
Pipelined Implementation
Latency = 40 + 40 + 40 + 40 = 160 ns
Operating Frequency = 1 / 40 ns = 25 MHz
Output Frequency = 1 / 40 ns = 25 MHz
New inputs are processed on every clock
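A four-stage pipeline with these numbers might look like the sketch below. It assumes that each 40 ns stage is one adder and that the design sums five 8-bit operands; the slide's figure is not reproduced here, so the operation, the module name, and all signal names are hypothetical.

```verilog
// Hypothetical 4-stage pipelined adder chain: out = a + b + c + d + e.
// One adder per stage, so the clock period only has to cover one adder delay.
module pipe_add4 (
  input  wire        clk,
  input  wire [7:0]  a, b, c, d, e,
  output reg  [10:0] out
);
  reg [8:0]  s1;            // stage 1 result: a + b
  reg [9:0]  s2;            // stage 2 result: s1 + c
  reg [10:0] s3;            // stage 3 result: s2 + d
  reg [7:0]  c1, d1, e1;    // unused operands are delayed so they stay
  reg [7:0]  d2, e2;        // aligned with the partial sum they belong to
  reg [7:0]  e3;

  always @(posedge clk) begin
    // stage 1
    s1 <= a + b;
    c1 <= c;  d1 <= d;  e1 <= e;
    // stage 2
    s2 <= s1 + c1;
    d2 <= d1; e2 <= e1;
    // stage 3
    s3 <= s2 + d2;
    e3 <= e2;
    // stage 4
    out <= s3 + e3;
  end
endmodule
```

With a 40 ns clock, a given set of inputs reaches `out` four clocks (160 ns) later, while a new set of inputs can be accepted on every clock.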
Architecting Speed
Important Parameters for Speed Analysis of Logic Design
High throughput: architectures that maximize the number of bits per second the design can process
Low latency: architectures that minimize the delay from the input of a module to its output
Timing: optimizations that reduce the combinatorial delay of the critical path
High Throughput Through Pipelining
A high-throughput design maximizes the number of bits processed per second, so it is concerned with the steady-state data rate and not with the time any specific piece of data takes to propagate through the design (latency).
Pipelining the design increases its throughput.
With pipelining, new data can begin processing before prior data has finished.
Example: Power of a Number
Software Algorithm (Iterative Approach)
Execution in Microprocessor
Xpower = 1;
for (i = 0; i < 3; i++)
    Xpower = X * Xpower;
Verilog Code (Iterative Implementation)
Power of a Number
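The slide's code listing did not survive extraction. As a stand-in, here is a minimal sketch of an iterative cube (X^3) circuit that reuses a single multiplier, mirroring the software loop. It is not the slide's listing: the module and signal names are hypothetical, `start` is assumed to be a single-clock pulse, `x` is assumed to stay stable until `finished` is asserted, and the result is truncated to 8 bits.

```verilog
// Hypothetical iterative X^3: one multiplier reused over several clocks.
module power3_iter (
  input  wire       clk,
  input  wire       start,      // assumed to be a single-clock pulse
  input  wire [7:0] x,          // assumed stable until finished = 1
  output reg  [7:0] xpower,     // result, truncated to 8 bits
  output wire       finished
);
  reg [1:0] ncount;             // multiplications still to perform

  assign finished = (ncount == 2'd0);

  always @(posedge clk) begin
    if (start) begin
      xpower <= x;              // X^1
      ncount <= 2'd2;           // two multiplications remain
    end else if (!finished) begin
      xpower <= xpower * x;     // the single shared multiplier
      ncount <= ncount - 2'd1;
    end
  end
endmodule
```

A new `x` cannot be accepted until the current computation finishes, which is what limits the throughput of this style.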
Verilog Code (Iterative Implementation)
Power of a Number - Analysis
Throughput = number of bits processed / cycles required to get the next result = 8 / 3, or about 2.7 bits per clock cycle
Latency = cycles required to get the first output = 3 clocks
Timing = delay in the critical path of the combinational logic = one multiplier delay
Verilog Code (Pipelined Implementation)
Power of a Number
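The slide's code listing did not survive extraction. A pipelined version of the same computation could be sketched as follows, unrolling the loop into three register stages with two multipliers. It is not the slide's listing: the module and signal names are hypothetical, and the result is again truncated to 8 bits.

```verilog
// Hypothetical pipelined X^3: the loop is unrolled into three stages.
module power3_pipe (
  input  wire       clk,
  input  wire [7:0] x,
  output reg  [7:0] xpower      // result, truncated to 8 bits
);
  reg [7:0] x1, x2;             // x delayed alongside the partial results
  reg [7:0] xpower1, xpower2;   // partial results: X^1 and X^2

  always @(posedge clk) begin
    // stage 1: capture the input
    x1      <= x;
    xpower1 <= x;
    // stage 2: X^2
    x2      <= x1;
    xpower2 <= xpower1 * x1;
    // stage 3: X^3
    xpower  <= xpower2 * x2;
  end
endmodule
```

Each stage works on a different input value at the same time, so a new `x` can be applied on every clock at the cost of a second multiplier and extra registers.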
Verilog Code (Pipelined Implementation)
Power of a Number - Analysis
Throughput = 8 / 1 or 8 bits per clock cycle
Latency = 3 clocks
Timing = One multiplier delay in critical path
Design Schematic Comparison
Sequential Multiplier Algorithm (Dry Run)
Multiplicand B = 10111

Step                         C  A      Q      P
Multiplier in Q              0  00000  10011  101
Q[0] = 1; add B                 10111
First Partial Product        0  10111  10011  100
Shift Right CAQ              0  01011  11001  100
Q[0] = 1; add B                 10111
Second Partial Product       1  00010  11001  011
Shift Right CAQ              0  10001  01100  011
Q[0] = 0; shift right CAQ    0  01000  10110  010
Q[0] = 0; shift right CAQ    0  00100  01011  001
Q[0] = 1; add B                 10111
Fifth Partial Product        0  11011  01011  000
Shift Right CAQ              0  01101  10101  000
Final Product in AQ             01101  10101
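As a quick sanity check on the dry run, the operands and the final contents of AQ can be converted to decimal:

```latex
\begin{align*}
B  &= 10111_2 = 23, \qquad Q = 10011_2 = 19,\\
AQ &= 01101\,10101_2 = 256 + 128 + 32 + 16 + 4 + 1 = 437 = 23 \times 19.
\end{align*}
```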
Sequential Multiplier ASM Chart
[ASM chart: states S_idle (output Ready), S_add (output Decr_P), and S_shift (output Shift_regs); decision boxes on start, Q[0], and Zero; conditional outputs Load_regs and Add_regs.]
Sequential Multiplier Verilog Coding
Sequential Multiplier Verilog Code
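The slide's code listing did not survive extraction. The following is a minimal sketch of an add-and-shift multiplier that uses the register names (C, A, Q, B, P) and state names (S_idle, S_add, S_shift) from the dry run and ASM chart. It is not the slide's listing: the module name, port names, parameters, reset style, and the exact state transitions are assumptions. The default width of 5 bits matches the dry-run example.

```verilog
// Hypothetical sequential (add-and-shift) multiplier sketch.
module seq_mult #(
  parameter W  = 5,                  // operand width (5 matches the dry run)
  parameter PW = 3                   // width of the bit counter P
) (
  input  wire           clk,
  input  wire           rst_n,       // active-low asynchronous reset (assumed)
  input  wire           start,
  input  wire [W-1:0]   multiplicand,
  input  wire [W-1:0]   multiplier,
  output wire [2*W-1:0] product,     // valid when ready returns to 1
  output wire           ready
);
  localparam S_idle = 2'd0, S_add = 2'd1, S_shift = 2'd2;

  reg [1:0]    state;
  reg          C;                    // carry out of the adder
  reg [W-1:0]  A, B, Q;              // accumulator, multiplicand, multiplier
  reg [PW-1:0] P;                    // remaining multiplier bits

  wire zero = (P == 0);
  assign ready   = (state == S_idle);
  assign product = {A, Q};

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state <= S_idle;
      C <= 1'b0; A <= 0; B <= 0; Q <= 0; P <= 0;
    end else begin
      case (state)
        S_idle: if (start) begin     // Load_regs
          C <= 1'b0;
          A <= 0;
          B <= multiplicand;
          Q <= multiplier;
          P <= W;
          state <= S_add;
        end
        S_add: begin
          P <= P - 1;                // Decr_P
          if (Q[0])
            {C, A} <= A + B;         // Add_regs
          state <= S_shift;
        end
        S_shift: begin
          {C, A, Q} <= {C, A, Q} >> 1;   // Shift_regs
          state <= zero ? S_idle : S_add;
        end
        default: state <= S_idle;
      endcase
    end
  end
endmodule
```

With W = 5, multiplicand = 10111 and multiplier = 10011, the register contents step through the same values as the dry run and finish with AQ = 01101 10101.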
Simulation
ASM for Pipelined Multiplier (Simplified)
[ASM chart: states Idle, Processing, Producing Results, Stop Processing, and Finish; decision boxes on start, count_s == 8, count_f == 8, and stop.]
8-bit Pipelined Multiplier DATAPATH
[Datapath figure: a chain of alternating add (+) and shift-right (>>) stages. The multiplicand is carried through registers B, B_01, ..., B_08 and the partial product through registers CAQ, CAQ_01, ..., CAQ_08. Control signals: Load_reg, Add_reg, Add_reg_02, Add_reg_04, Add_reg_06, Shift_reg.]
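One way to sketch a pipelined add-and-shift datapath is shown below. It is a simplification and not the slide's design: it assumes 8-bit operands, merges each add and the following shift into a single stage, and leaves out the controller entirely (no start, stop, or status signals). The module name, port names, and the helper function `add_shift` are hypothetical.

```verilog
// Hypothetical 8x8 pipelined add-and-shift multiplier (datapath only).
// Stage registers: b[i] carries the multiplicand, caq[i] carries {C, A, Q}.
module pipe_mult8 (
  input  wire        clk,
  input  wire [7:0]  multiplicand,
  input  wire [7:0]  multiplier,
  output wire [15:0] product          // valid 9 clocks after the operands
);
  reg [7:0]  b   [0:8];
  reg [16:0] caq [0:8];
  integer i;

  // One multiplier bit: add B to A if Q[0] is set, then shift CAQ right.
  // Because the shift is merged in, the stored C bit is always 0; it is
  // kept only so the register layout matches the CAQ naming.
  function [16:0] add_shift;
    input [16:0] caq_in;
    input [7:0]  b_in;
    reg   [8:0]  sum;                 // 9 bits: carry plus new A
    begin
      sum       = caq_in[15:8] + (caq_in[0] ? b_in : 8'd0);
      add_shift = {1'b0, sum, caq_in[7:1]};
    end
  endfunction

  always @(posedge clk) begin
    // stage 0: load the operands
    b[0]   <= multiplicand;
    caq[0] <= {9'd0, multiplier};
    // stages 1..8: one multiplier bit per stage
    for (i = 0; i < 8; i = i + 1) begin
      b[i+1]   <= b[i];
      caq[i+1] <= add_shift(caq[i], b[i]);
    end
  end

  assign product = caq[8][15:0];      // final A and Q
endmodule
```

Because every stage holds a different operand pair, a new multiplication can start on every clock. A complete design would still need a controller to signal when valid products appear at the output.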
8-bit Pipelined Multiplier
[Block diagram: CONTROL and DATAPATH units. Signals shown: Start, Stop, clk, rst_n, Multiplier, Multiplicand, Product, Load_reg, Shift_reg, Add_reg, Add_reg_02, Add_reg_04, Add_reg_06, Q0, Q2, Q4, Q6, Mult_finished, Prod_produced.]
Lab Task
Implement an 8-bit pipelined multiplier, show the simulation, and use it in your projects.