Verilog, Pipelined Processors CPSC 321 Andreas Klappenecker

Dec 19, 2015



Today’s Menu

• Verilog
• Pipelined Processor


Recall: n-bit Ripple Carry Adder

module ripple(cin, X, Y, S, cout);
  parameter n = 4;          // word width
  input cin;                // carry in
  input [n-1:0] X, Y;       // operands
  output [n-1:0] S;         // sum
  output cout;              // carry out
  reg [n-1:0] S;
  reg [n:0] C;              // C[k] is the carry into bit k
  reg cout;
  integer k;

  always @(X or Y or cin)
  begin
    C[0] = cin;
    for (k = 0; k <= n-1; k = k+1)
    begin
      S[k] = X[k]^Y[k]^C[k];                   // sum bit
      C[k+1] = (X[k] & Y[k])
             | (C[k] & X[k]) | (C[k] & Y[k]);  // carry out of bit k
    end
    cout = C[n];
  end
endmodule
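The same adder can also be described structurally. The sketch below is not from the slides. It assumes Verilog-2001 (ANSI port lists and generate loops), and the names full_adder, ripple_gen and tb_ripple_gen are hypothetical. Each full adder implements the loop body above. The testbench compares all 512 input combinations for n = 4 against the built-in + operator.

```verilog
// One-bit full adder: the same equations as the loop body of ripple.
module full_adder(input a, input b, input ci, output s, output co);
  assign s  = a ^ b ^ ci;
  assign co = (a & b) | (ci & a) | (ci & b);
endmodule

// n-bit ripple carry adder built from n full adders (hypothetical name).
module ripple_gen #(parameter n = 4)
  (input cin, input [n-1:0] X, Y, output [n-1:0] S, output cout);
  wire [n:0] C;                 // C[k] is the carry into bit k
  assign C[0] = cin;
  genvar k;
  generate
    for (k = 0; k < n; k = k + 1) begin : stage
      full_adder fa(.a(X[k]), .b(Y[k]), .ci(C[k]), .s(S[k]), .co(C[k+1]));
    end
  endgenerate
  assign cout = C[n];
endmodule

// Exhaustive check for n = 4 against the + operator.
module tb_ripple_gen;
  reg cin;
  reg [3:0] X, Y;
  wire [3:0] S;
  wire cout;
  integer i, errors;
  ripple_gen #(4) dut(.cin(cin), .X(X), .Y(Y), .S(S), .cout(cout));
  initial begin
    errors = 0;
    for (i = 0; i < 512; i = i + 1) begin
      {cin, X, Y} = i;          // low 9 bits of i cover every input combination
      #1;
      // both sides are evaluated at 5 bits, so the carry out is compared too
      if ({cout, S} !== X + Y + cin) errors = errors + 1;
    end
    $display("errors = %0d", errors);   // expected: errors = 0
  end
endmodule
```

The generate loop instantiates one full_adder per bit, so the carry chain appears explicitly in the netlist. The always block in ripple computes the same chain sequentially during simulation.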


Recall: ‘=’ versus ‘<=’

initial begin
  a=1; b=2; c=3; x=4;
  #5 a = b+c;    // wait 5 units, grab b,c,
                 // compute a=b+c=2+3
  d = a;         // d = 5 = b+c at time t=5.
  x <= #6 b+c;   // grab b+c now at t=5, don't stop,
                 // assign x=5 at t=11.
  b <= #2 a;     // grab a at t=5
                 // (end of last blocking statement).
                 // Deliver b=5 at t=7.
                 // previous x is unaffected by change of b.
end


Recall: ‘=’ versus ‘<=’

initial begin
  a=1; b=2; c=3; x=4;
  #5 a = b+c;
  d = a;           // time t=5
  x <= #6 b+c;     // assign x=5 at time t=11
  b <= #2 a;       // assign b=5 at time t=7
  y <= #1 b + c;   // grab b+c at t=5, don't stop,
                   // assign y=5 at t=6.
  #3 z = b + c;    // grab b+c at t=8 (5+3); b has been 5 since t=7,
                   // so assign z=5+3=8 at t=8.
  w <= x;          // assign w=4 at t=8 (x only becomes 5 at t=11)
                   // (timing starts at the last blocking assignment)
end


Confused?

a = b + c;    // blocking assignment

a <= b + c;   // non-blocking assignment

#2            // delay by 2 time units

Blocking assignment with delay? Probably wrong!

Non-blocking assignment without delay? Bad idea!
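Here is a small sketch of why the distinction matters in clocked logic. The module and register names are hypothetical. Swapping two registers works with non-blocking assignments, because both right-hand sides are sampled at the clock edge before either register is updated. With blocking assignments, the first statement overwrites a value the second still needs. The non-blocking pair keeps a unit delay, in line with the advice above.

```verilog
module swap_demo;
  reg clk;
  reg [3:0] p, q;             // swapped with non-blocking assignments
  reg [3:0] r, s;             // "swapped" with blocking assignments

  always @(posedge clk) begin
    p <= #1 q;                // both right-hand sides are sampled at the edge,
    q <= #1 p;                // so the two registers really exchange values
  end

  always @(posedge clk) begin
    r = s;                    // r is overwritten first ...
    s = r;                    // ... so s just receives the new value of r
  end

  initial begin
    clk = 0;
    p = 1; q = 2;
    r = 1; s = 2;
    #5 clk = 1;                                       // one rising edge at t=5
    #3 $display("non-blocking: p=%0d q=%0d", p, q);  // p=2 q=1
    $display("blocking:     r=%0d s=%0d", r, s);     // r=2 s=2
  end
endmodule
```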


Address Register

`define REG_DELAY 1

module add_reg(clk, reset, addr, reg_addr);
  input clk, reset;
  input [15:0] addr;
  output [15:0] reg_addr;
  reg [15:0] reg_addr;

  always @(posedge clk)
    if (reset)
      reg_addr <= #(`REG_DELAY) 16'h0000;   // synchronous reset
    else
      reg_addr <= #(`REG_DELAY) addr;       // register the input address
endmodule


Concurrency Example

module concurrency_example;
  initial begin
    #1 $display("Block 1 stmt 1");
    $display("Block 1 stmt 2");
    #2 $display("Block 1 stmt 3");
  end

  initial begin
    $display("Block 2 stmt 1");
    #2 $display("Block 2 stmt 2");
    #2 $display("Block 2 stmt 3");
  end
endmodule

Output:
Block 2 stmt 1
Block 1 stmt 1
Block 1 stmt 2
Block 2 stmt 2
Block 1 stmt 3
Block 2 stmt 3


Concurrency: fork and join

module concurrency_example;
  initial fork
    #1 $display("Block 1 stmt 1");
    $display("Block 1 stmt 2");
    #2 $display("Block 1 stmt 3");
  join

  initial fork
    $display("Block 2 stmt 1");
    #2 $display("Block 2 stmt 2");
    #2 $display("Block 2 stmt 3");
  join
endmodule

Output:
Block 1 stmt 2
Block 2 stmt 1
Block 1 stmt 1
Block 1 stmt 3
Block 2 stmt 2
Block 2 stmt 3

(Statements scheduled for the same time step, such as the three at t=2, may appear in a different order on another simulator.)


Begin-End vs. Fork-Join

• In begin-end blocks, the statements are sequential and the delays are additive.
• In fork-join blocks, the statements are concurrent and the delays are independent.

Both constructs can be used to build compound statements. Nesting a begin-end block directly inside another begin-end block is not useful; neither is nesting a fork-join block inside another fork-join block.
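Because nesting the same construct adds nothing, the interesting case is mixing the two. The following is a minimal sketch; the module, block and variable names are hypothetical. It places two begin-end branches inside one fork-join. Each branch is sequential internally, so its delays add up, while the two branches run concurrently. The join waits for the slower branch.

```verilog
module mixed_blocks;
  time a_done, b_done, join_done;
  initial begin
    fork
      begin : branch_a                          // sequential: delays add up
        #1 $display("t=%0t A step 1", $time);
        #2 $display("t=%0t A step 2", $time);   // 1 + 2 = 3
        a_done = $time;
      end
      begin : branch_b                          // runs concurrently with branch_a
        #2 $display("t=%0t B step 1", $time);
        #2 $display("t=%0t B step 2", $time);   // 2 + 2 = 4
        b_done = $time;
      end
    join                                        // waits for both branches
    join_done = $time;                          // 4, the slower branch
    $display("join completed at t=%0t", join_done);
  end
endmodule
```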


Displaying Results

a = 4'b0011;

$display("The value of a is %b", a);

The value of a is 0011

$display("The value of a is %0b", a);

The value of a is 11

If you use $display to print a value that is changing during this time step, you might get either the new or the old value; use $strobe to get the new value.
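Here is a minimal sketch of the difference; the module name strobe_demo is hypothetical. A non-blocking update scheduled in the current time step is not yet visible to $display, which runs immediately. It is visible to $strobe, which prints at the end of the time step.

```verilog
module strobe_demo;
  reg [3:0] a;
  initial begin
    a = 4'b0011;
    #1 a <= 4'b0101;                 // non-blocking: takes effect later in this time step
    $display("display: a = %b", a);  // runs immediately, prints 0011
    $strobe("strobe:  a = %b", a);   // runs at the end of the time step, prints 0101
  end
endmodule
```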


Displaying Results

• Standard display tasks
  • $display, $write, $strobe, $monitor

• Writing to a file instead of stdout
  • $fdisplay, $fwrite, $fstrobe, $fmonitor

• Format specifiers
  • %b, %0b, %d, %0d, %h, %0h, %c, %s, …


Display Example

module f1;
  integer f;
  initial begin
    f = $fopen("myFile");            // open (create) the file myFile
    $fdisplay(f, "Hello, bla bla");  // write one line to it
    $fclose(f);                      // flush and close the file
  end
endmodule


Finite State Automata


Moore Machines

The output of a Moore machine depends only on the current state. Output logic and next state logic are sometimes merged.

[Figure: Moore machine block diagram with input, next state logic, present state register, and output logic]


Mealy Machines

The output of a Mealy machine depends on the current state and the input.

[Figure: Mealy machine block diagram with input, next state logic, present state register, and output logic]
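The following is a minimal Mealy-style sketch, not taken from the slides; edge_mealy and its ports are hypothetical names. It is a rising-edge detector whose state register remembers the previous input, while the output combines that state with the current input. Because of the direct path from input to output, pulse rises as soon as din rises, without waiting for a clock edge.

```verilog
// Hypothetical Mealy-style rising-edge detector (Verilog-2001 port list).
module edge_mealy(input clock, input reset, input din, output pulse);
  reg prev;                        // state: value of din at the last clock edge

  always @(posedge clock)          // state register + next state logic
    if (reset) prev <= #1 1'b0;
    else       prev <= #1 din;     // unit delay, as with REG_DELAY on the slides

  assign pulse = din & ~prev;      // output logic reads the state AND the input
endmodule
```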


State Machine Modeling

reg = state register, nsl = next state logic, ol = output logic

• Model reg separate, nsl separate, ol separate
  • 3 always blocks (one clocked, two combinational); easy to maintain.

• Combine reg and nsl, keep ol separate
  • The state register and the next state logic are strongly correlated; it is usually more efficient to combine these two.

• Combine nsl and ol, keep reg separate
  • Messy! Don't do that!

• Combine everything into one always block
  • Can only be used for a Moore state machine. Why?

• Combine reg and ol into one always block
  • Can only be used for a Mealy state machine.


Example: Automatic Food Cooker


Moore Machine Example

Automatic food cooker
• Has a supply of food
• Can load food into the heater when requested
• Cooker unloads the food when cooking is done


Automated Cooker

Outputs from the machine
• load = signal that sends food into the cooker
• heat = signal that turns on the heater
• unload = signal that removes food from the cooker
• beep = signal that alerts that food is done


Automated Cooker

Inputs
• clock
• start = start the load, cook, unload cycle
• temp_ok = temperature sensor detecting when preheating is done
• done = signal from timer when cooking is done
• quiet = keeps the cooker silent when set (it beeps only when quiet = 0)


Cooker

module cooker(
  clock, start, temp_ok, done, quiet, load, heat, unload, beep
);
  input clock, start, temp_ok, done, quiet;
  output load, heat, unload, beep;
  reg load, heat, unload, beep;
  reg [2:0] state, next_state;


Defining States

`define IDLE    3'b000
`define PREHEAT 3'b001
`define LOAD    3'b010
`define COOK    3'b011
`define EMPTY   3'b100

You can refer to these states as `IDLE, `PREHEAT, etc. Symbolic names are a good idea!


State Register Block

`define REG_DELAY 1

always @(posedge clock)
  state <= #(`REG_DELAY) next_state;


Next State Logic

always @(state or start or temp_ok or done)
  // whenever there is a change in state or input
begin
  next_state = state;     // default: stay in the current state (no latch)
  case (state)
    `IDLE:    if (start)   next_state = `PREHEAT;
    `PREHEAT: if (temp_ok) next_state = `LOAD;
    `LOAD:                 next_state = `COOK;
    `COOK:    if (done)    next_state = `EMPTY;
    `EMPTY:                next_state = `IDLE;
    default:               next_state = `IDLE;
  endcase
end


Output Logic

always @(state or quiet)  // beep also depends on the input quiet
begin
  if (state == `LOAD) load = 1; else load = 0;
  if (state == `EMPTY) unload = 1; else unload = 0;
  if (state == `EMPTY && quiet == 0) beep = 1;
  else beep = 0;
  if (state == `PREHEAT ||
      state == `LOAD ||
      state == `COOK) heat = 1;
  else heat = 0;
end


`define IDLE    3'b000
`define PREHEAT 3'b001
`define LOAD    3'b010
`define COOK    3'b011
`define EMPTY   3'b100

module cooker(clock,...);

always @(state or start or temp_ok or done)
begin
  next_state = state;     // default: stay in the current state
  case (state)
    `IDLE:    if (start)   next_state = `PREHEAT;
    `PREHEAT: if (temp_ok) next_state = `LOAD;
    `LOAD:                 next_state = `COOK;
    `COOK:    if (done)    next_state = `EMPTY;
    `EMPTY:                next_state = `IDLE;
    default:               next_state = `IDLE;
  endcase
end

`define REG_DELAY 1

always @(posedge clock)
  state <= #(`REG_DELAY) next_state;

always @(state or quiet)  // beep also depends on the input quiet
begin
  if (state == `LOAD) load = 1; else load = 0;
  if (state == `EMPTY) unload = 1; else unload = 0;
  if (state == `EMPTY && quiet == 0) beep = 1;
  else beep = 0;
  if (state == `PREHEAT ||
      state == `LOAD ||
      state == `COOK) heat = 1;
  else heat = 0;
end

endmodule
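The second style from the State Machine Modeling slide combines the state register and the next state logic in one clocked always block and keeps the output logic separate. Applied to the cooker, it could look roughly like the following sketch. This is not the author's code. The module name cooker_combined and the synchronous reset input are assumptions, and reset was added so that the machine starts in a known state.

```verilog
`define IDLE    3'b000
`define PREHEAT 3'b001
`define LOAD    3'b010
`define COOK    3'b011
`define EMPTY   3'b100
`define REG_DELAY 1

// Hypothetical cooker variant: register and next state logic combined.
// The reset input is an assumption; the cooker on the slides has none.
module cooker_combined(clock, reset, start, temp_ok, done, quiet,
                       load, heat, unload, beep);
  input clock, reset, start, temp_ok, done, quiet;
  output load, heat, unload, beep;
  reg load, heat, unload, beep;
  reg [2:0] state;

  // One clocked block updates the state directly; no next_state variable.
  always @(posedge clock)
    if (reset)
      state <= #(`REG_DELAY) `IDLE;
    else
      case (state)
        `IDLE:    if (start)   state <= #(`REG_DELAY) `PREHEAT;
        `PREHEAT: if (temp_ok) state <= #(`REG_DELAY) `LOAD;
        `LOAD:                 state <= #(`REG_DELAY) `COOK;
        `COOK:    if (done)    state <= #(`REG_DELAY) `EMPTY;
        `EMPTY:                state <= #(`REG_DELAY) `IDLE;
        default:               state <= #(`REG_DELAY) `IDLE;
      endcase

  // Output logic stays a separate combinational block.
  always @(state or quiet)
  begin
    load   = (state == `LOAD);
    unload = (state == `EMPTY);
    beep   = (state == `EMPTY) && !quiet;
    heat   = (state == `PREHEAT) || (state == `LOAD) || (state == `COOK);
  end
endmodule
```

Because the state register assigns the new state itself, the "stay in the current state" case needs no code: a clocked register simply keeps its value when no branch assigns it.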


Pipelined Processor


Basic Idea


Time Required for Load Word

• Assume that a lw instruction needs
  • 2 ns for instruction fetch
  • 1 ns for register read
  • 2 ns for ALU operation
  • 2 ns for data access
  • 1 ns for register write

• Total time = 8 ns


Non-Pipelined vs. Pipelined Execution


Question

What is the average speed-up for pipelined versus non-pipelined execution in the case of load word instructions?

Average speed-up is 4-fold!


Reason

Assuming ideal conditions (perfectly balanced stages):

$$\text{time between instructions}_{\text{pipelined}} = \frac{\text{time between instructions}_{\text{non-pipelined}}}{\text{number of pipe stages}}$$
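The following is a worked check of the 4-fold figure. It uses the stage times from the Time Required for Load Word slide and assumes a five-stage pipeline whose clock period is set by the slowest stage (2 ns). For $k$ consecutive lw instructions, the first finishes after 5 cycles and each later one finishes one cycle after its predecessor:

$$
\begin{aligned}
T_{\text{non-pipelined}}(k) &= 8k\ \text{ns} \\
T_{\text{pipelined}}(k) &= 2\,(k+4)\ \text{ns} \\
\text{speed-up}(k) &= \frac{8k}{2(k+4)} = \frac{4k}{k+4} \;\longrightarrow\; 4 \quad (k \to \infty)
\end{aligned}
$$

For example, $k = 3$ gives 24 ns versus 14 ns, a speed-up of only about 1.7. The 4-fold figure is the long-run average. It is 4 rather than 5, the number of stages, because the stages are not balanced. The ideal formula would predict $8/5 = 1.6$ ns between instructions, but the clock cannot be shorter than the 2 ns stages.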


MIPS Appreciation Day

• All MIPS instructions have the same length
  • => simplifies the pipeline design
  • fetch in the first stage and decode in the second stage

• Compare with 80x86
  • Instructions vary from 1 byte to 15 bytes
  • Pipelining is much more challenging


Obstacles to Pipelining

• Structural Hazards
  • the hardware cannot support the combination of instructions in the same clock cycle

• Control Hazards
  • need to make a decision based on the results of one instruction while another is still executing

• Data Hazards
  • an instruction depends on the results of an instruction still in the pipeline


Structural Hazards

• Laundry examples
  • if you have a washer-dryer combination instead of a separate washer and dryer, …
  • separate washer and dryer, but your roommate is busy doing something else and does not put the clothes away [sic!]

• Computer architecture
  • competition in accessing hardware resources, e.g., accessing memory at the same time


Control Hazards

Control hazards arise from the need to make a decision based on the results of an instruction in the pipeline.
• Branches: What is the next instruction?
• How can we resolve the problem?
  • Stall the pipeline until the computations are done
  • or predict the result
  • delayed decision


Stall on Branch

• Assume that all branch computations are done in stage 2

• Delay by one cycle to wait for the result


Branch Prediction

• Predict the branch result
  • For example, always predict that the branch is not taken (e.g., reasonable for while loops)
  • if the choice is correct, then the pipeline runs at full speed
  • if the choice is incorrect, then the pipeline stalls


Branch Prediction


Delayed Branch


Data Hazards

• A data hazard results if an instruction depends on the result of a previous instruction
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3 // $s0 to be determined

• These dependencies happen too often to be avoided completely

• Use forwarding to get missing data from internal resources once available


Forwarding

add $s0, $t0, $t1

sub $t2, $s0, $t3
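Below is a combinational sketch of how forwarding might select the first ALU operand. It assumes the usual five-stage MIPS pipeline with ID/EX, EX/MEM and MEM/WB pipeline registers. The module and signal names are hypothetical. The 2-bit encoding (10 = forward from EX/MEM, 01 = forward from MEM/WB) follows a common textbook convention rather than anything on these slides. A second, identical unit would handle the second operand (rt).

```verilog
// Hypothetical forwarding unit for the ALU's first operand (rs).
module forward_a(
  input  [4:0] idex_rs,          // source register of the instruction in EX
  input  [4:0] exmem_rd,         // destination of the instruction in MEM
  input        exmem_regwrite,
  input  [4:0] memwb_rd,         // destination of the instruction in WB
  input        memwb_regwrite,
  output reg [1:0] forward       // 00: register file, 10: EX/MEM, 01: MEM/WB
);
  always @(idex_rs or exmem_rd or exmem_regwrite or memwb_rd or memwb_regwrite)
    if (exmem_regwrite && exmem_rd != 0 && exmem_rd == idex_rs)
      forward = 2'b10;           // the most recent result takes priority
    else if (memwb_regwrite && memwb_rd != 0 && memwb_rd == idex_rs)
      forward = 2'b01;
    else
      forward = 2'b00;           // no hazard: read the register file
endmodule
```

Consider the add/sub pair above. When sub is in EX, add is in MEM, so exmem_rd = idex_rs = 16 ($s0), and forward = 2'b10 selects the sum from the EX/MEM register instead of the stale register-file value. Register $zero (number 0) is excluded because writes to it are discarded.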


Single Cycle Datapath


Pipelined Version