W13L18- Real Time System Design -1

8/6/2019 W13L18- Real Time System Design -1


CSE-350: FPGA Based System Design

Course Instructor: Ms. Saba Zia

Week 13: Real Time System Design


Agenda

Review of Register Transfer Level (RTL)

How to architect speed with

    High throughput

    Low latency

    Timing

        Adding register layers

        Parallel structures

        Flatten logic structures

Ref: Advanced FPGA based System Design by Steve Kilts, Chapter 1



Register Transfer Level (RTL)

[Figure: an input register feeds a combinatorial cloud, which feeds an output register. The combinatorial delay is the delay through the cloud between the two registers.]
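The register-to-register structure above can be sketched in Verilog as follows. This is a minimal illustration only: the module name, port names, widths, and the choice of an adder as the combinatorial cloud are all assumptions, not taken from the slides.

```verilog
// Hypothetical example: names, widths, and the adder are illustrative only.
module rtl_stage (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [8:0] sum_q
);
    reg [7:0] a_q, b_q;              // input registers

    // Combinatorial cloud: its propagation delay must fit in one clock period.
    wire [8:0] sum_d = a_q + b_q;

    always @(posedge clk) begin
        a_q   <= a;                  // capture the inputs
        b_q   <= b;
        sum_q <= sum_d;              // output register captures the cloud's result
    end
endmodule
```

The clock period has to be at least as long as the delay of `sum_d`, which is why reducing the combinatorial delay raises the achievable clock frequency.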



How to Architect Speed in a Simple Design

always @ (posedge clk)

begin

out



Single-Cycle Implementation




begin

case(n)

0: out1



Multi-Cycle Implementation (Iterative Process)




begin

out1



Pipelined Implementation

Latency = 40 + 40 + 40 + 40 = 160 ns

Operating Frequency = 1/40 ns = 25 MHz

Output Frequency = 1/40 ns = 25 MHz

New inputs are processed on every clock
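The figures above follow from a stage delay of 40 ns and four pipeline stages. As a worked check, with the last line added only for comparison and resting on the assumption that the same four 40 ns operations are chained in a single combinatorial path:

```latex
\begin{align*}
t_{\text{stage}} &= 40\ \text{ns}, \qquad N = 4 \\
\text{Latency} &= N \cdot t_{\text{stage}} = 4 \times 40\ \text{ns} = 160\ \text{ns} \\
f_{\text{pipelined}} &= \frac{1}{t_{\text{stage}}} = \frac{1}{40\ \text{ns}} = 25\ \text{MHz} \\
f_{\text{unpipelined}} &= \frac{1}{N \cdot t_{\text{stage}}} = \frac{1}{160\ \text{ns}} = 6.25\ \text{MHz}
\quad \text{(assumed comparison case)}
\end{align*}
```

The latency is the same 160 ns either way. Pipelining changes how often a new result appears, not how long one result takes.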


Architecting Speed

Important Parameters for Speed Analysis of Logic Design

High throughput architectures for maximizing the number of bits per second that can be processed by the design

Low latency architectures for minimizing the delay from the input of a module to the output

Timing optimizations to reduce the combinatorial delay of the critical path


High Throughput Through Pipelining

A high-throughput design maximizes the number of bits processed per second, so it is concerned only with the steady-state data rate and NOT with the time a specific piece of data takes to propagate through the design (latency)

Pipelining the design increases its throughput

Through pipelining, new data can begin processing before prior data has finished


Example: Power of a Number

Software Algorithm (Iterative Approach)

Execution in Microprocessor

Xpower = 1;
for (i = 0; i < 3; i++)
    Xpower = X * Xpower;


Verilog Code (Iterative Implementation)

Power of a Number
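The slide's listing is not preserved in this copy. A minimal sketch of an iterative cube (X^3) circuit that is consistent with the analysis in this lecture (one multiplier, three clocks per result) might look like the following. The module name, port names, and the single-clock `start` pulse are assumptions.

```verilog
// Sketch only: names and the start protocol are assumptions, not the slide's listing.
module power3_iter (
    input  wire       clk,
    input  wire       start,         // assumed to be a one-clock pulse
    input  wire [7:0] x,
    output reg  [7:0] xpower,        // low 8 bits of the running power
    output wire       finished
);
    reg [1:0] ncount;                // multiplications still to perform

    assign finished = (ncount == 2'd0);

    always @(posedge clk) begin
        if (start) begin
            xpower <= x;             // X^1
            ncount <= 2'd2;          // two multiplications remain
        end else if (!finished) begin
            xpower <= xpower * x;    // the single multiplier is reused each clock
            ncount <= ncount - 2'd1;
        end
    end
endmodule
```

A new value of `x` cannot be accepted until `finished` is high again, so one 8-bit result is produced every three clocks. `ncount` is unknown until the first `start` pulse, so `finished` is only meaningful after that.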



Power of a Number - Analysis (Iterative Implementation)

Throughput = Number of bits processed / cycles required to get next result

Latency = Cycles required to get first output

Timing = Delay in critical path of combinational logic

For the iterative implementation:

Throughput = 8/3, or about 2.7 bits per clock cycle

Latency = 3 clocks

Timing = One multiplier delay in critical path



Verilog Code (Pipelined Implementation)

Power of a Number
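The slide's listing is not preserved in this copy. A minimal sketch of a three-stage pipelined cube (X^3) circuit, consistent with the analysis that follows (a new 8-bit result every clock, one multiplier delay per stage), could be written as below. The module and signal names are assumptions.

```verilog
// Sketch only: names are assumptions, not the slide's listing.
module power3_pipe (
    input  wire       clk,
    input  wire [7:0] x,
    output reg  [7:0] xpower         // low 8 bits of x^3, three clocks after x
);
    reg [7:0] x1, x2;                // delayed copies of x, one per stage
    reg [7:0] xpower1, xpower2;      // partial powers

    always @(posedge clk) begin
        // Stage 1: register the input
        x1      <= x;
        xpower1 <= x;
        // Stage 2: x^2
        x2      <= x1;
        xpower2 <= xpower1 * x1;
        // Stage 3: x^3
        xpower  <= xpower2 * x2;
    end
endmodule
```

The design trades area for throughput: it uses two multipliers and extra registers, but the critical path between any two registers still contains only one multiplier.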



Power of a Number - Analysis (Pipelined Implementation)

Throughput = 8/1 = 8 bits per clock cycle

Latency = 3 clocks

Timing = One multiplier delay in critical path
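Putting the two analyses side by side, using the lecture's definition of throughput as bits processed per cycles required to get the next result:

```latex
\begin{align*}
\text{Throughput}_{\text{iterative}} &= \frac{8\ \text{bits}}{3\ \text{cycles}} \approx 2.67\ \text{bits/cycle} \\
\text{Throughput}_{\text{pipelined}} &= \frac{8\ \text{bits}}{1\ \text{cycle}} = 8\ \text{bits/cycle} \\
\frac{\text{Throughput}_{\text{pipelined}}}{\text{Throughput}_{\text{iterative}}} &= \frac{8}{8/3} = 3
\end{align*}
```

Latency (3 clocks) and timing (one multiplier delay) are the same in both cases, so the threefold gain comes entirely from accepting a new input on every clock.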


Design Schematic Comparison


Sequential Multiplier Algorithm (Dry Run)

Multiplicand B = 10111

                             C   A      Q      P
Multiplier in Q              0   00000  10011  101
Q[0] = 1; add B                  10111
First Partial Product        0   10111         100
Shift Right CAQ              0   01011  11001
Q[0] = 1; add B                  10111
Second Partial Product       1   00010         011
Shift Right CAQ              0   10001  01100
Q[0] = 0; shift right CAQ    0   01000  10110  010
Q[0] = 0; shift right CAQ    0   00100  01011  001
Q[0] = 1; add B                  10111
Fifth Partial Product        0   11011         000
Shift Right CAQ              0   01101  10101
Final Product in AQ              01101  10101
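As a check on the dry run, the operands and the final contents of AQ can be converted to decimal:

```latex
\begin{align*}
B &= 10111_2 = 16 + 4 + 2 + 1 = 23 \\
Q &= 10011_2 = 16 + 2 + 1 = 19 \\
B \times Q &= 23 \times 19 = 437 \\
AQ &= 0110110101_2 = 256 + 128 + 32 + 16 + 4 + 1 = 437
\end{align*}
```

The counter P starts at 101 (5, the number of multiplier bits) and reaches 000 on the fifth add step, after which one last shift leaves the product in AQ.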



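Sequential Multiplier ASM Chart

[Figure: ASM chart with state boxes S_idle (output Ready), S_add (output Decr_P), and S_shift (output Shift_regs); decision boxes on start, Q[0], and Zero; conditional outputs Load_regs and Add_regs.]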



Sequential Multiplier Verilog Coding
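The slides' listing is not preserved in this copy. The following is a minimal sketch of a shift-and-add multiplier that uses the state names and control actions from the ASM chart (S_idle, S_add, S_shift, Load_regs, Decr_P, Add_regs, Shift_regs, Ready, Zero). The port names, the active-low reset, the 5-bit operand width (chosen to match the dry run), and merging control and datapath into one module are assumptions.

```verilog
// Sketch only: ports, reset style, and the 5-bit width are assumptions.
module seq_multiplier (
    input  wire       clk,
    input  wire       rst_n,         // active-low asynchronous reset (assumed)
    input  wire       start,
    input  wire [4:0] multiplicand,
    input  wire [4:0] multiplier,
    output wire [9:0] product,       // valid when Ready returns high
    output wire       Ready
);
    localparam S_idle = 2'd0, S_add = 2'd1, S_shift = 2'd2;

    reg [1:0] state;
    reg       C;                     // carry out of the adder
    reg [4:0] A, B, Q;               // accumulator, multiplicand, multiplier
    reg [2:0] P;                     // counts the multiplier bits left to process

    wire Zero = (P == 3'd0);

    assign product = {A, Q};
    assign Ready   = (state == S_idle);

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state <= S_idle;
            C <= 1'b0; A <= 5'd0; B <= 5'd0; Q <= 5'd0; P <= 3'd0;
        end else begin
            case (state)
                S_idle: if (start) begin          // Load_regs
                    C <= 1'b0;
                    A <= 5'd0;
                    B <= multiplicand;
                    Q <= multiplier;
                    P <= 3'd5;
                    state <= S_add;
                end
                S_add: begin
                    P <= P - 3'd1;                // Decr_P
                    if (Q[0]) {C, A} <= A + B;    // Add_regs
                    state <= S_shift;
                end
                S_shift: begin
                    {C, A, Q} <= {C, A, Q} >> 1;  // Shift_regs
                    state <= Zero ? S_idle : S_add;
                end
                default: state <= S_idle;
            endcase
        end
    end
endmodule
```

Because P is decremented in S_add, the Zero test in S_shift already sees the updated count, so the loop runs exactly five add/shift pairs before returning to S_idle.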


Simulation


ASM for Pipelined Multiplier (Simplified)

[Figure: simplified ASM chart with states Idle, Processing, Producing Results, Stop Processing, and Finish; decision boxes on start, count_s == 8, count_f == 8, and stop, each with 0 and 1 branches.]


8-bit Pipelined Multiplier DATAPATH

[Figure: pipelined datapath. The multiplicand passes through registers B, B_01, B_02, ... B_08 and the partial product through CAQ, CAQ_01, CAQ_02, ... CAQ_08, with alternating add (+) and shift-right (>>) blocks between stages. Control inputs: Load_reg, Add_reg, Add_reg_02, Add_reg_04, Add_reg_06, Shift_reg.]
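One possible reading of the datapath is four add stages alternating with four shift stages, with the multiplicand and the CAQ word carried forward from register to register. The sketch below follows that reading under loud assumptions: 4-bit operands giving an 8-bit product, a free-running pipeline with no control unit (no Load_reg, Add_reg, or Shift_reg enables), and register arrays `B[i]` and `CAQ[i]` standing in for B_01 ... B_08 and CAQ_01 ... CAQ_08. It is a starting point, not the lab solution.

```verilog
// Sketch only: operand width, free-running operation, and array naming are assumptions.
module pipe_multiplier (
    input  wire       clk,
    input  wire [3:0] multiplicand,
    input  wire [3:0] multiplier,
    output wire [7:0] product        // valid once the pipeline has filled
);
    reg [3:0] B   [0:8];             // multiplicand copy travelling with each stage
    reg [8:0] CAQ [0:8];             // {C, A[3:0], Q[3:0]} for each stage
    integer i;

    always @(posedge clk) begin
        // Load stage: A and C start at zero, Q holds the multiplier.
        B[0]   <= multiplicand;
        CAQ[0] <= {5'b0, multiplier};

        for (i = 1; i <= 8; i = i + 1) begin
            B[i] <= B[i-1];
            if ((i % 2) == 1) begin
                // Odd stages add B into A when the current multiplier bit is 1.
                if (CAQ[i-1][0])
                    CAQ[i] <= {({1'b0, CAQ[i-1][7:4]} + {1'b0, B[i-1]}),
                               CAQ[i-1][3:0]};
                else
                    CAQ[i] <= CAQ[i-1];
            end else begin
                // Even stages shift C, A, and Q right by one bit.
                CAQ[i] <= CAQ[i-1] >> 1;
            end
        end
    end

    assign product = CAQ[8][7:0];
endmodule
```

Each add stage tests bit 0 of the previous CAQ register, which corresponds to the status bits a control unit would need if the adds were gated by separate enable signals.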


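8-bit Pipelined Multiplier

[Figure: block diagram of a CONTROL unit and a DATAPATH. External signals: Start, Stop, clk, rst_n, Multiplier, Multiplicand, Product, Mult_finished, Prod_produced. Signals between the two blocks: Load_reg, Shift_reg, Add_reg, Add_reg_02, Add_reg_04, Add_reg_06, and the status bits Q0, Q2, Q4, Q6.]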


Lab Task

Implement an 8-bit pipelined multiplier, show the simulation, and use it in your projects.
