15-447 Computer Architecture, Fall 2008 © September 24, 2008. Nael Abu-Ghazaleh, naelag@cmu.edu, www.qatar.cmu.edu/~msakr/15447-f08/. CS-447 – Computer Architecture, Lecture 12: Multiple Cycle Datapath
Page 1: 15-447 Computer Architecture, Fall 2008 ©

September 24, 2008

Nael Abu-Ghazaleh, naelag@cmu.edu

www.qatar.cmu.edu/~msakr/15447-f08/

CS-447 – Computer Architecture

Lecture 12: Multiple Cycle Datapath

Page 2: Implementation vs. Performance

Performance of a processor is determined by:

• Instruction count of a program

• CPI

• Clock cycle time (clock rate)

The compiler & the ISA determine the instruction count.

The implementation of the processor determines the CPI and the clock cycle time.
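The three factors above multiply together to give execution time. A minimal sketch of that relationship follows; the function name `execution_time_ns` and the sample numbers are illustrative assumptions, not taken from the lecture.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical program: 100 instructions, CPI of 1, 8 ns clock cycle.
single_cycle_ns = execution_time_ns(100, 1, 8)

# Halving the cycle time only helps if CPI does not double with it.
multi_cycle_ns = execution_time_ns(100, 2, 4)
```

The compiler and ISA influence only the first argument; the implementation choices discussed in this lecture trade the second against the third.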

Page 3: Possible Execution Steps of Any Instruction

° Instruction Fetch

° Instruction Decode and Register Fetch

° Execution of the Memory Reference Instruction

° Execution of Arithmetic-Logical operations

° Branch Instruction

° Jump Instruction

Page 4: Instruction Processing

° Five steps:

• Instruction fetch (IF)

• Instruction decode and operand fetch (ID)

• ALU/execute (EX)

• Memory access (MEM), not required by every instruction

• Write-back (WB)

[Figure: datapath overview showing the PC, instruction memory, registers, ALU, and data memory, annotated with the IF, ID, EX, MEM, and WB steps]

Page 5: Single Cycle Implementation

[Figure: single-cycle datapath with the PC, instruction memory, register file, sign extend (16 to 32 bits), shift left 2, two adders, ALU (with Zero output), data memory, and multiplexers. Control signals shown: RegWrite, MemRead, MemWrite, PCSrc, ALUSrc, MemtoReg, and a 3-bit ALU operation.]

Page 6: Multiple ALUs and Memory Units

[Figure: the single-cycle datapath again, showing its separate instruction and data memories and its two adders alongside the ALU]

Page 7: Single Cycle Datapath

Page 8: What’s Wrong with Single Cycle?

° All instructions run at the speed of the slowest instruction.

° Adding a long instruction can hurt performance

• What if you wanted to include multiply?

° You cannot reuse any parts of the processor

• We have 3 different adders: one to calculate PC+4, one to calculate PC+4+offset, and the ALU

° No profit in making the common case fast

• Since every instruction runs at the slowest instruction speed

- This is particularly important for loads as we will see later

Page 9: What’s Wrong with Single Cycle?

1 ns – Register read/write time

2 ns – ALU/adder

2 ns – memory access

0 ns – MUX, PC access, sign extend, ROM

Terms in each sum, in order: get instruction, read registers, ALU operation, memory access, write register (steps an instruction skips are omitted)

add: 2ns + 1ns + 2ns + 1ns = 6 ns

beq: 2ns + 1ns + 2ns = 5 ns

sw: 2ns + 1ns + 2ns + 2ns = 7 ns

lw: 2ns + 1ns + 2ns + 2ns + 1ns = 8 ns
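The per-instruction sums above can be reproduced by adding up the latency of each unit an instruction passes through. This is a small sketch using the slide's latencies; the dictionary keys and the function name are illustrative choices.

```python
# Unit latencies in ns, taken from the slide.
LATENCY_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Units each instruction passes through, in order (key names are illustrative).
PATHS = {
    "add": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
    "sw":  ["fetch", "reg_read", "alu", "mem"],
    "lw":  ["fetch", "reg_read", "alu", "mem", "reg_write"],
}

def instruction_latency_ns(name):
    """Sum the latencies of the units one instruction uses."""
    return sum(LATENCY_NS[unit] for unit in PATHS[name])

latencies = {name: instruction_latency_ns(name) for name in PATHS}

# A single-cycle clock must be long enough for the slowest instruction.
single_cycle_clock_ns = max(latencies.values())
```

The load sets the clock period because it is the only instruction that uses all five units.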

Page 10: Computing Execution Time

Assume: 100 instructions executed

25% of instructions are loads,

10% of instructions are stores,

45% of instructions are adds, and

20% of instructions are branches.

Single-cycle execution:

100 * 8ns = 800 ns

Optimal execution:

25*8ns + 10*7ns + 45*6ns + 20*5ns = 640 ns
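The two totals can be checked with a few lines of arithmetic. The sketch below uses the instruction mix and per-instruction times from the slides; the variable names are illustrative.

```python
# Instruction counts out of 100, and per-instruction times in ns, from the slides.
COUNTS = {"lw": 25, "sw": 10, "add": 45, "beq": 20}
TIME_NS = {"lw": 8, "sw": 7, "add": 6, "beq": 5}

total_instructions = sum(COUNTS.values())

# Single cycle: every instruction pays for the slowest one.
single_cycle_ns = total_instructions * max(TIME_NS.values())

# Optimal: every instruction takes only the time it needs.
optimal_ns = sum(COUNTS[name] * TIME_NS[name] for name in COUNTS)

speedup = single_cycle_ns / optimal_ns
```

With this mix the optimal schedule is 1.25 times faster than the single-cycle one, which is the gap a better implementation can try to recover.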

Page 11: Single Cycle Problems

° A sequence of instructions:

1. LW (IF, ID, EX, MEM, WB)

2. SW (IF, ID, EX, MEM)

3. etc

[Timing diagram, Single Cycle Implementation: against the clock (Clk), the load fills Cycle 1; the store uses part of Cycle 2 and the rest of that cycle is waste.]

• what if we had a more complicated instruction like floating point?

• wasteful of area

Page 12: Multiple Cycle Solution

• use a “smaller” cycle time

• have different instructions take different numbers of cycles

• a “multicycle” datapath:

[Figure: multicycle datapath overview with the PC, a single memory for instructions and data, the instruction register, the memory data register, the register file, the A and B registers, the ALU, and ALUOut]

Page 13: Multicycle Approach

° We will be reusing functional units

• ALU used to compute address and to increment PC

• Memory used for instruction and data

° We’ll use a finite state machine for control

[Figure: multicycle datapath overview with the PC, a single memory for instructions and data, the instruction register, the memory data register, the register file, the A and B registers, the ALU, and ALUOut]

Page 14: The Five Stages of an Instruction

° IF: Instruction Fetch and Update PC

° ID: Instruction Decode and Registers Fetch

° Ex: Execute R-type; calculate memory address

° Mem: Read/write the data from/to the Data Memory

° WB: Write the result data into the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

IF ID Ex Mem WB

Page 15: Multicycle Implementation

° Break up the instructions into steps, each step takes a cycle

• balance the amount of work to be done

• restrict each cycle to use only one major functional unit

° At the end of a cycle

• store values for use in later cycles (easiest thing to do)

• introduce additional “internal” registers

[Figure: multicycle datapath with the PC, a single memory (Address, Write data, MemData), the instruction register with fields Instruction[25–21], Instruction[20–16], Instruction[15–11], and Instruction[15–0], the memory data register, the register file, the A and B registers, sign extend (16 to 32 bits), shift left 2, the constant 4, the ALU (Zero and ALU result outputs), ALUOut, and the multiplexers that select among them]

Page 16: The Five Stages of a Load Instruction

° IF: Instruction Fetch and Update PC

° ID: Instruction Decode and Registers Fetch

° Ex: Execute R-type; calculate memory address

° Mem: Read/write the data from/to the Data Memory

° WB: Write the result data into the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

lw: IF ID Ex Mem WB

Page 17: Multiple Cycle Implementation

° Break the instruction execution into Clock Cycles

• Different instructions require a different number of clock cycles

• Clock cycle is limited by the slowest stage

• Instruction latency is not reduced (time from the start of an instruction to its completion)

lw: Cycle 1 IFetch, Cycle 2 Dec, Cycle 3 Exec, Cycle 4 Mem, Cycle 5 WB

sw: Cycle 6 IFetch, Cycle 7 Dec, Cycle 8 Exec, then Mem

Page 18: Single Cycle vs. Multiple Cycle

[Timing diagram, Single Cycle Implementation: against Clk, the load fills Cycle 1; the store uses part of Cycle 2 and the remainder is waste.]

[Timing diagram, Multiple Cycle Implementation: against Clk over Cycles 1 to 10, lw runs IFetch, Dec, Exec, Mem, WB; sw follows with IFetch, Dec, Exec, Mem; an R-type instruction then begins with IFetch.]

Page 19: Multicycle Implementation

° Break up the instructions into steps, each step takes a cycle

• balance the amount of work to be done

• restrict each cycle to use only one major functional unit

° At the end of a cycle

• store values for use in later cycles (easiest thing to do)

• introduce additional “internal” registers

[Figure: the multicycle datapath, repeated: PC, single memory, instruction register, memory data register, register file, A and B registers, sign extend, shift left 2, ALU, ALUOut, and multiplexers]

Page 20: Instructions from ISA perspective

° Consider each instruction from perspective of ISA.

° Example:

• The add instruction changes a register.

• Register specified by bits 15:11 of instruction.

• Instruction specified by the PC.

• New value is the sum (“op”) of two registers.

• Registers specified by bits 25:21 and 20:16 of the instruction

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

• In order to accomplish this we must break up the instruction (kind of like introducing variables when programming).

Page 21: Breaking down an instruction

° ISA definition of arithmetic:

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

° Could break down to:

• IR <= Memory[PC]

• A <= Reg[IR[25:21]]

• B <= Reg[IR[20:16]]

• ALUOut <= A op B

• Reg[IR[15:11]] <= ALUOut

° We forgot an important part of the definition of arithmetic!

• PC <= PC + 4
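The breakdown can be tried out in software by treating each register transfer as one assignment. The sketch below is an illustration only: it fixes "op" to addition, models memory as a dictionary keyed by byte address, and writes the result to the register named by bits 15:11, as the ISA definition specifies. The helper names `bits` and `run_add` are hypothetical.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def run_add(memory, reg, pc):
    """Execute one R-type add as a sequence of register transfers."""
    ir = memory[pc]                   # IR <= Memory[PC]
    pc = pc + 4                       # PC <= PC + 4
    a = reg[bits(ir, 25, 21)]         # A <= Reg[IR[25:21]]
    b = reg[bits(ir, 20, 16)]         # B <= Reg[IR[20:16]]
    alu_out = (a + b) & 0xFFFFFFFF    # ALUOut <= A op B, with op fixed to +
    reg[bits(ir, 15, 11)] = alu_out   # Reg[IR[15:11]] <= ALUOut
    return pc

# add $3, $1, $2 encoded by hand: rs=1, rt=2, rd=3, funct=0x20.
memory = {0: (1 << 21) | (2 << 16) | (3 << 11) | 0x20}
reg = [0] * 32
reg[1], reg[2] = 7, 5
next_pc = run_add(memory, reg, 0)
```

The local variables `ir`, `a`, `b`, and `alu_out` play the role of the internal registers, which is the "introducing variables" analogy from the previous slide.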

Page 22: Idea behind multicycle approach

° We define each instruction from the ISA perspective (do this!)

° Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)

° Introduce new registers as needed (e.g., A, B, ALUOut, MDR)

° Finally try and pack as much work into each step as possible (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution)

Page 23: Five Execution Steps

° Instruction Fetch

° Instruction Decode and Register Fetch

° Execution, Memory Address Computation, or Branch Completion

° Memory Access or R-type instruction completion

° Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
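Because instructions now take between 3 and 5 cycles, the average CPI depends on the instruction mix. The sketch below reuses the mix from the execution-time slide and assumes the cycle counts implied by the five steps: branches finish in step 3, R-type instructions and stores in step 4, loads in step 5.

```python
# Cycles per instruction class, as implied by the five execution steps
# (an assumption spelled out here, not a table from the lecture).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# Instruction mix per 100 instructions, from the execution-time slide.
COUNTS = {"lw": 25, "sw": 10, "add": 45, "beq": 20}

total_cycles = sum(COUNTS[name] * CYCLES[name] for name in COUNTS)
average_cpi = total_cycles / sum(COUNTS.values())
```

Multiplying `total_cycles` by the multicycle clock period gives the execution time; the lecture does not state that period, so it is left out here.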

Page 24: Step 1: Instruction Fetch

° Use PC to get instruction and put it in the Instruction Register.

° Increment the PC by 4 and put the result back in the PC.

° Can be described succinctly using RTL "Register-Transfer Language"

IR <= Memory[PC];

PC <= PC + 4;

Can we figure out the values of the control signals?

What is the advantage of updating the PC now?

Page 25: Step 2: Instruction Decode and Register Fetch

° Read registers rs and rt in case we need them

° Compute the branch address in case the instruction is a branch

° RTL:

A <= Reg[IR[25:21]];

B <= Reg[IR[20:16]];

ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

° We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
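The sign extension and shift in the branch-address computation are easy to get wrong, so a small sketch may help. It assumes the PC has already been incremented in step 1; the helper names `sign_extend16` and `decode_step` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode_step(ir, reg, pc):
    """Step 2: read rs and rt, and speculatively compute the branch target.

    pc is the already-incremented PC left behind by step 1.
    """
    a = reg[(ir >> 21) & 0x1F]                               # A <= Reg[IR[25:21]]
    b = reg[(ir >> 16) & 0x1F]                               # B <= Reg[IR[20:16]]
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF   # branch target
    return a, b, alu_out

# beq $1, $2, -3 (opcode 4) fetched from address 0x100, so pc is 0x104 here.
ir = (4 << 26) | (1 << 21) | (2 << 16) | (-3 & 0xFFFF)
reg = [0] * 32
reg[1], reg[2] = 9, 9
a, b, target = decode_step(ir, reg, 0x104)
```

The target is computed before the control knows whether the instruction is a branch; if it is not, the value in ALUOut is simply overwritten in step 3.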

Page 26: Step 3 (instruction dependent)

° ALU is performing one of three functions, based on instruction type

° Memory Reference:

ALUOut <= A + sign-extend(IR[15:0]);

° R-type:

ALUOut <= A op B;

° Branch:

if (A==B) PC <= ALUOut;

Page 27: Step 4 (R-type or memory-access)

° Loads and stores access memory

MDR <= Memory[ALUOut];

or

Memory[ALUOut] <= B;

° R-type instructions finish

Reg[IR[15:11]] <= ALUOut;

Page 28: Write-back step

° Reg[IR[20:16]] <= MDR;

Which instruction needs this?
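Putting the five steps together, a toy interpreter can show which instructions stop at which step. This is a sketch under stated assumptions, not the lecture's implementation: memory is a dictionary keyed by byte address, R-type is limited to add, and the function name `execute` is hypothetical. The opcodes are the standard MIPS values.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

# Standard MIPS opcodes for the instruction classes in the lecture.
R_TYPE, LW, SW, BEQ = 0x00, 0x23, 0x2B, 0x04

def execute(memory, reg, pc):
    """Run one instruction step by step; return (new_pc, cycles_used)."""
    # Step 1: instruction fetch.
    ir = memory[pc]
    pc += 4
    # Step 2: decode, register fetch, speculative branch target.
    opcode = ir >> 26
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = pc + (sign_extend16(ir) << 2)
    # Step 3: execution, address computation, or branch completion.
    if opcode == BEQ:
        if a == b:
            pc = alu_out
        return pc, 3
    if opcode == R_TYPE:
        alu_out = (a + b) & 0xFFFFFFFF      # only "add" is modeled here
        reg[(ir >> 11) & 0x1F] = alu_out    # Step 4: R-type completion
        return pc, 4
    if opcode not in (LW, SW):
        raise ValueError("opcode not modeled in this sketch")
    alu_out = a + sign_extend16(ir)
    # Step 4: memory access.
    if opcode == SW:
        memory[alu_out] = b
        return pc, 4
    mdr = memory[alu_out]
    # Step 5: write-back, needed only by loads.
    reg[(ir >> 16) & 0x1F] = mdr
    return pc, 5

# lw $2, 8($1) at address 0; the data word 42 sits at address 0x48.
memory = {0: (LW << 26) | (1 << 21) | (2 << 16) | 8, 0x48: 42}
reg = [0] * 32
reg[1] = 0x40
pc, cycles = execute(memory, reg, 0)
```

Only the load reaches the final assignment, which answers the question on the slide.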

Page 29: Summary

Page 30: Multiple Cycle Implementation

[Figure: complete multicycle datapath with its control unit. The control unit takes Op[5–0] from Instruction[31–26] and drives PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst. The ALU control takes Instruction[5–0]. The jump address [31–0] is formed from PC[31–28] and Instruction[25–0] shifted left 2 (26 bits to 28 bits). The remaining elements are the PC, memory, instruction register, memory data register, register file, A and B registers, sign extend, shift left 2, ALU, ALUOut, and multiplexers.]

Page 31: Review: finite state machines

° Finite state machines:

• a set of states and

• next state function (determined by current state and the input)

• output function (determined by current state and possibly input)

• We’ll use a Moore machine (output based only on current state)

[Figure: a clocked state register holds the current state; the next-state function takes the current state and the inputs and produces the next state; the output function produces the outputs]

Page 32: Implementing the Control

° Value of control signals is dependent upon:

• what instruction is being executed

• which step is being performed

° Use the information we’ve accumulated to specify a finite state machine

• specify the finite state machine graphically, or

• use microprogramming

° Implementation can be derived from specification

Page 33: Graphical Specification of FSM

State 0 (Start), Instruction fetch: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00

State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00

State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

State 3, Memory access (load): MemRead, IorD = 1

State 4, Memory read completion step: RegDst = 0, RegWrite, MemtoReg = 1

State 5, Memory access (store): MemWrite, IorD = 1

State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10

State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0

State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01

State 9, Jump completion: PCWrite, PCSource = 10
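A Moore-style controller of this kind can be written down as two tables: outputs per state and a next-state function. The sketch below is a reconstruction, not the lecture's own specification. The outputs follow the ten states of the figure, with state 4 using RegDst = 0 and MemtoReg = 1 as required by Reg[IR[20:16]] <= MDR. The transitions are an assumption derived from the five execution steps, since the edge labels did not survive in the transcript, and the instruction-class names ("lw", "sw", "rtype", "beq", "j") are illustrative.

```python
# Control signals asserted in each state (Moore machine: outputs depend only on state).
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Assumed dispatch out of the decode state, by instruction class.
DISPATCH = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "j": 9}

def next_state(state, op):
    """Next-state function: depends on the current state and the instruction class."""
    if state == 0:
        return 1
    if state == 1:
        return DISPATCH[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, and 9 finish the instruction

def trace(op):
    """List the states visited while executing one instruction of class op."""
    states, state = [0], next_state(0, op)
    while state != 0:
        states.append(state)
        state = next_state(state, op)
    return states

load_states = trace("lw")
```

The length of each trace matches the 3 to 5 cycles per instruction noted earlier, and `OUTPUTS[s]` gives the signals asserted during each cycle of the trace.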

Page 34: Finite State Machine for Control

° Implementation:

[Figure: a control logic block with two groups of inputs, the opcode bits Op5–Op0 from the instruction register opcode field and the current-state bits S3–S0 from the state register. Its outputs are the datapath control signals (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) and the next-state bits NS3–NS0, which feed the state register.]