15-447 Computer Architecture Fall 2008 © September 24, 2008 Nael Abu-Ghazaleh [email protected] www.qatar.cmu.edu/~msakr/15447-f08/ CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
Dec 19, 2015
Implementation vs. Performance
Performance of a processor is determined by
• Instruction count of a program
• CPI
• Clock cycle time (clock rate)
The compiler & the ISA determine the instruction count.
The implementation of the processor determines the CPI and the clock cycle time.
Possible Execution Steps of Any Instruction
° Instruction Fetch
° Instruction Decode and Register Fetch
° Execution of the Memory Reference Instruction
° Execution of Arithmetic-Logical operations
° Branch Instruction
° Jump Instruction
Instruction Processing
° Five steps:
• Instruction fetch (IF)
• Instruction decode and operand fetch (ID)
• ALU/execute (EX)
• Memory access (MEM), not required by every instruction
• Write-back (WB)
[Figure: datapath overview with PC, instruction memory, registers, ALU, and data memory, annotated with the IF, ID, EX, MEM, and WB steps]
Single Cycle Implementation
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extend (16 to 32 bits), shift left 2, two adders, ALU, and data memory; control signals RegWrite, MemRead, MemWrite, PCSrc, ALUSrc, MemtoReg, and a 3-bit ALU operation]
Multiple ALUs and Memory Units
[Figure: the single-cycle datapath again, which uses two adders in addition to the ALU and separate instruction and data memories]
Single Cycle Datapath
What’s Wrong with Single Cycle?
° All instructions run at the speed of the slowest instruction.
° Adding a long instruction can hurt performance
• What if you wanted to include multiply?
° You cannot reuse any parts of the processor
• We have 3 different adders: one for PC+4, one for PC+4+offset, and the ALU
° No profit in making the common case fast
• Since every instruction runs at the speed of the slowest instruction
- This is particularly important for loads, as we will see later
What’s Wrong with Single Cycle?
1 ns – Register read/write time
2 ns – ALU/adder
2 ns – memory access
0 ns – MUX, PC access, sign extend, ROM

Instr  Get instr  Read reg  ALU operation  Mem   Write reg  Total
add    2 ns       1 ns      2 ns           -     1 ns       6 ns
beq    2 ns       1 ns      2 ns           -     -          5 ns
sw     2 ns       1 ns      2 ns           2 ns  -          7 ns
lw     2 ns       1 ns      2 ns           2 ns  1 ns       8 ns
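The per-instruction totals can be recomputed from the component delays. The following is a minimal sketch; the unit names and the `PATHS` table are illustrative assumptions that encode which units each instruction uses, as listed on this slide.

```python
# Component delays from the slide, in nanoseconds.
DELAY_NS = {"mem": 2, "reg": 1, "alu": 2}

# Units each instruction class passes through, in order (illustrative names).
PATHS = {
    "add": ["mem", "reg", "alu", "reg"],         # fetch, read regs, ALU, write reg
    "beq": ["mem", "reg", "alu"],                # fetch, read regs, compare
    "sw":  ["mem", "reg", "alu", "mem"],         # fetch, read regs, address, store
    "lw":  ["mem", "reg", "alu", "mem", "reg"],  # fetch, read regs, address, load, write reg
}

def latency_ns(instr):
    """Sum the delays of every unit on the instruction's path."""
    return sum(DELAY_NS[unit] for unit in PATHS[instr])

latencies = {name: latency_ns(name) for name in PATHS}

# A single-cycle clock period must cover the slowest instruction.
clock_period_ns = max(latencies.values())
print(latencies, clock_period_ns)
```

The load sets the clock period, so every other instruction pays for time it does not use.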
Computing Execution Time
Assume: 100 instructions executed
25% of instructions are loads,
10% of instructions are stores,
45% of instructions are adds, and
20% of instructions are branches.
Single-cycle execution:
100 * 8ns = 800 ns
Optimal execution:
25*8ns + 10*7ns + 45*6ns + 20*5ns = 640 ns
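The two totals can be checked with a short calculation. This sketch uses only the instruction mix and latencies stated on the slides; the variable names are illustrative.

```python
# Instruction mix (counts out of 100) and per-instruction latencies in ns.
MIX = {"lw": 25, "sw": 10, "add": 45, "beq": 20}
LATENCY_NS = {"lw": 8, "sw": 7, "add": 6, "beq": 5}

total_instructions = sum(MIX.values())

# Single cycle: every instruction pays for the slowest one.
single_cycle_ns = total_instructions * max(LATENCY_NS.values())

# "Optimal": each instruction takes only as long as it needs.
optimal_ns = sum(count * LATENCY_NS[name] for name, count in MIX.items())

# How much faster the optimal case is than the single-cycle case.
speedup = single_cycle_ns / optimal_ns
print(single_cycle_ns, optimal_ns, speedup)
```

With this mix the single-cycle design is 1.25 times slower than the optimal case.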
Single Cycle Problems
° A sequence of instructions:
1. LW (IF, ID, EX, MEM, WB)
2. SW (IF, ID, EX, MEM)
3. etc
[Timing diagram, single cycle implementation: Load fills Cycle 1; Store finishes early in Cycle 2 and the rest of that cycle is wasted]
• What if we had a more complicated instruction, like floating point?
• Wasteful of area
Multiple Cycle Solution
• use a “smaller” cycle time
• have different instructions take different numbers of cycles
• a “multicycle” datapath:
[Figure: multicycle datapath with PC, a single memory for instructions and data, instruction register, memory data register, register file, A and B registers, ALU, and ALUOut]
Multicycle Approach
° We will be reusing functional units
• ALU used to compute address and to increment PC
• Memory used for instruction and data
° We’ll use a finite state machine for control
[Figure: multicycle datapath, as on the previous slide]
The Five Stages of an Instruction
° IF: Instruction Fetch and Update PC
° ID: Instruction Decode and Registers Fetch
° Ex: Execute R-type; calculate memory address
° Mem: Read/write the data from/to the Data Memory
° WB: Write the result data into the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
IF ID Ex Mem WB
Multicycle Implementation
° Break up the instructions into steps, each step takes a cycle
• balance the amount of work to be done
• restrict each cycle to use only one major functional unit
° At the end of a cycle
• store values for use in later cycles (easiest thing to do)
• introduce additional “internal” registers
[Figure: multicycle datapath with PC, shared memory, instruction register (fields Instruction[25–21], Instruction[20–16], Instruction[15–11], Instruction[15–0]), memory data register, register file, A and B registers, sign extend (16 to 32 bits), shift left 2, multiplexors on the memory address, register write, and ALU inputs, ALU, and ALUOut]
The Five Stages of Load Instruction
° IF: Instruction Fetch and Update PC
° ID: Instruction Decode and Registers Fetch
° Ex: Execute R-type; calculate memory address
° Mem: Read/write the data from/to the Data Memory
° WB: Write the result data into the register file
      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw    IF       ID       Ex       Mem      WB
Multiple Cycle Implementation
° Break the instruction execution into clock cycles
• Different instructions require a different number of clock cycles
• Clock cycle is limited by the slowest stage
• Instruction latency (time from the start of an instruction to its completion) is not reduced
[Timing diagram: lw occupies Cycles 1–5 (IFetch, Dec, Exec, Mem, WB); sw begins in Cycle 6 (IFetch, Dec, Exec, Mem)]
Single Cycle vs. Multiple Cycle
[Timing diagram, single cycle implementation: Load fills Cycle 1; Store finishes early in Cycle 2 and the rest of that cycle is wasted]
[Timing diagram, multiple cycle implementation: lw takes Cycles 1–5 (IFetch, Dec, Exec, Mem, WB), sw takes Cycles 6–9 (IFetch, Dec, Exec, Mem), and an R-type instruction begins its IFetch in Cycle 10]
Instructions from ISA perspective
° Consider each instruction from perspective of ISA.
° Example:
• The add instruction changes a register.
• Register specified by bits 15:11 of instruction.
• Instruction specified by the PC.
• New value is the sum (“op”) of two registers.
• Registers specified by bits 25:21 and 20:16 of the instruction
Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
• In order to accomplish this we must break up the instruction (kind of like introducing variables when programming).
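The bit ranges in the ISA definition can be read straight out of a 32-bit instruction word. The sketch below is illustrative: the helper `bits` is a hypothetical name, and the example word is the standard MIPS encoding of add $t0, $t1, $t2.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

# add $t0, $t1, $t2: opcode 0, rs = 9 ($t1), rt = 10 ($t2), rd = 8 ($t0), funct 0x20
instr = 0x012A4020

rs = bits(instr, 25, 21)   # first source register
rt = bits(instr, 20, 16)   # second source register
rd = bits(instr, 15, 11)   # destination register for R-type
print(rs, rt, rd)
```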
Breaking down an instruction
° ISA definition of arithmetic:
Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
° Could break down to:
• IR <= Memory[PC]
• A <= Reg[IR[25:21]]
• B <= Reg[IR[20:16]]
• ALUOut <= A op B
• Reg[IR[15:11]] <= ALUOut
° We forgot an important part of the definition of arithmetic!
• PC <= PC + 4
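The breakdown maps naturally onto a few assignments, one per intermediate register. This is a minimal sketch assuming a dictionary for memory and a list for the register file (both hypothetical modelling choices); "op" is fixed to addition, and the destination is the rd field (bits 15:11), as in the ISA definition.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def run_add(memory, reg, pc):
    """Execute one R-type add using the intermediate registers from the slide."""
    ir = memory[pc]                  # IR <= Memory[PC]
    pc = pc + 4                      # PC <= PC + 4
    a = reg[bits(ir, 25, 21)]        # A <= Reg[IR[25:21]]
    b = reg[bits(ir, 20, 16)]        # B <= Reg[IR[20:16]]
    alu_out = (a + b) & 0xFFFFFFFF   # ALUOut <= A op B (op is add here)
    reg[bits(ir, 15, 11)] = alu_out  # Reg[IR[15:11]] <= ALUOut
    return pc

memory = {0: 0x012A4020}             # add $t0, $t1, $t2 at address 0
reg = [0] * 32
reg[9], reg[10] = 5, 7               # $t1 = 5, $t2 = 7
next_pc = run_add(memory, reg, 0)
print(reg[8], next_pc)
```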
Idea behind multicycle approach
° We define each instruction from the ISA perspective (do this!)
° Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)
° Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)
° Finally, try to pack as much work as possible into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify the solution)
Five Execution Steps
° Instruction Fetch
° Instruction Decode and Register Fetch
° Execution, Memory Address Computation, or Branch Completion
° Memory Access or R-type instruction completion
° Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
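The 3 to 5 cycle range gives an average CPI once an instruction mix is fixed. The sketch below reuses the earlier mix (25% loads, 10% stores, 45% adds, 20% branches) and the cycle counts implied by the steps (lw 5, sw 4, R-type 4, beq 3). The 2 ns cycle time is an assumption, taken as the slowest single unit in the earlier delay figures.

```python
# Cycles per instruction class, from the five execution steps.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# Instruction mix out of 100 instructions, from the earlier example.
MIX = {"lw": 25, "sw": 10, "add": 45, "beq": 20}

# Assumed cycle time: the slowest functional unit (memory or ALU) at 2 ns.
CYCLE_TIME_NS = 2

total_cycles = sum(MIX[name] * CYCLES[name] for name in MIX)
cpi = total_cycles / sum(MIX.values())
exec_time_ns = total_cycles * CYCLE_TIME_NS
print(total_cycles, cpi, exec_time_ns)
```

Under these assumed numbers the multicycle time lands close to the 800 ns single-cycle figure, so the outcome depends heavily on how evenly the work is balanced across steps and on the instruction mix.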
Step 1: Instruction Fetch
° Use PC to get instruction and put it in the Instruction Register.
° Increment the PC by 4 and put the result back in the PC.
° Can be described succinctly using RTL "Register-Transfer Language"
IR <= Memory[PC];
PC <= PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
Step 2: Instruction Decode and Register Fetch
° Read registers rs and rt in case we need them
° Compute the branch address in case the instruction is a branch
° RTL:
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
° We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
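The branch-address computation in this step can be sketched as follows. The function names are hypothetical. Note that PC was already incremented in step 1, so the PC in this RTL is the address of the next instruction.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int (negative if bit 15 is set)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc_plus_4, ir):
    """ALUOut <= PC + (sign-extend(IR[15:0]) << 2), wrapped to 32 bits."""
    offset = sign_extend16(ir & 0xFFFF) << 2
    return (pc_plus_4 + offset) & 0xFFFFFFFF

# beq $t0, $t1 with immediate -1 (0xFFFF) and with immediate +3.
backward = branch_target(0x104, 0x1109FFFF)
forward = branch_target(0x104, 0x11090003)
print(hex(backward), hex(forward))
```

The result is computed for every instruction and simply ignored unless step 3 finds that the instruction is a taken branch.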
Step 3 (instruction dependent)
° ALU is performing one of three functions, based on instruction type
° Memory Reference:
ALUOut <= A + sign-extend(IR[15:0]);
° R-type:
ALUOut <= A op B;
° Branch:
if (A==B) PC <= ALUOut;
Step 4 (R-type or memory-access)
° Loads and stores access memory
MDR <= Memory[ALUOut];
or
Memory[ALUOut] <= B;
° R-type instructions finish
Reg[IR[15:11]] <= ALUOut;
Write-back step
° Reg[IR[20:16]] <= MDR;
Which instruction needs this?
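The five steps can be strung together into a small interpreter that also reports how many cycles each instruction uses. This is a rough sketch, not the lecture's implementation: the `state` dictionary layout, the encoding helpers, and the restriction of R-type to addition are all assumptions made for illustration.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sext16(value):
    """Sign-extend a 16-bit value."""
    return value - 0x10000 if value & 0x8000 else value

# Opcodes for the small MIPS subset modelled here.
R_TYPE, BEQ, LW, SW = 0x00, 0x04, 0x23, 0x2B

def i_type(op, rs, rt, imm):
    """Encode an I-format instruction (hypothetical helper)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def r_type(rs, rt, rd, funct):
    """Encode an R-format instruction (hypothetical helper)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def execute(state):
    """Run one instruction step by step; return the number of cycles used."""
    reg, mem = state["reg"], state["mem"]
    ir = mem[state["pc"]]                      # Step 1: IR <= Memory[PC]
    state["pc"] += 4                           #         PC <= PC + 4
    a = reg[bits(ir, 25, 21)]                  # Step 2: A <= Reg[IR[25:21]]
    b = reg[bits(ir, 20, 16)]                  #         B <= Reg[IR[20:16]]
    alu_out = state["pc"] + (sext16(bits(ir, 15, 0)) << 2)
    op = bits(ir, 31, 26)
    if op == BEQ:                              # Step 3: branch completion
        if a == b:
            state["pc"] = alu_out
        return 3
    if op == R_TYPE:
        alu_out = (a + b) & 0xFFFFFFFF         # Step 3: only add is modelled
        reg[bits(ir, 15, 11)] = alu_out        # Step 4: R-type completion
        return 4
    if op not in (LW, SW):
        raise ValueError(f"unsupported opcode {op:#x}")
    alu_out = a + sext16(bits(ir, 15, 0))      # Step 3: memory address
    if op == SW:
        mem[alu_out] = b                       # Step 4: Memory[ALUOut] <= B
        return 4
    mdr = mem[alu_out]                         # Step 4: MDR <= Memory[ALUOut]
    reg[bits(ir, 20, 16)] = mdr                # Step 5: Reg[IR[20:16]] <= MDR
    return 5

state = {
    "pc": 0,
    "reg": [0] * 32,
    "mem": {
        0: i_type(LW, 0, 9, 0x100),    # lw  $t1, 0x100($zero)
        4: r_type(9, 9, 8, 0x20),      # add $t0, $t1, $t1
        8: i_type(SW, 0, 8, 0x104),    # sw  $t0, 0x104($zero)
        12: i_type(BEQ, 8, 8, -4),     # beq $t0, $t0, back to address 0
        0x100: 21,                     # data word
    },
}
cycles = [execute(state) for _ in range(4)]
print(cycles, state["reg"][8], state["mem"][0x104], state["pc"])
```

Only the load reaches step 5, which answers the question on the slide.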
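Summary:
Step 1 (all instructions): IR <= Memory[PC]; PC <= PC + 4;
Step 2 (all instructions): A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
Step 3: R-type: ALUOut <= A op B; memory reference: ALUOut <= A + sign-extend(IR[15:0]); branch: if (A==B) PC <= ALUOut;
Step 4: R-type: Reg[IR[15:11]] <= ALUOut; load: MDR <= Memory[ALUOut]; store: Memory[ALUOut] <= B;
Step 5: load: Reg[IR[20:16]] <= MDR;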
Multiple Cycle Implementation
[Figure: complete multicycle datapath with control. The control unit takes Op[5–0] (Instruction[31–26]) as input and drives PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst. The jump address[31–0] is formed from PC[31–28] and Instruction[25–0] shifted left 2 (26 to 28 bits). The ALU control takes Instruction[5–0].]
Review: finite state machines
° Finite state machines:
• a set of states and
• next state function (determined by current state and the input)
• output function (determined by current state and possibly input)
• We’ll use a Moore machine (output based only on current state)
[Figure: the next-state function takes the inputs and the current state and produces the next state, which is latched on the clock; the output function produces the outputs]
Implementing the Control
° Value of control signals is dependent upon:
• what instruction is being executed
• which step is being performed
° Use the information we’ve accumulated to specify a finite state machine
• specify the finite state machine graphically, or
• use microprogramming
° Implementation can be derived from specification
Graphical Specification of FSM
State 0 (Start), Instruction fetch: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
State 3, Memory access (load): MemRead, IorD = 1
State 4, Memory read completion step: RegDst = 0, RegWrite, MemtoReg = 1
State 5, Memory access (store): MemWrite, IorD = 1
State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
State 9, Jump completion: PCWrite, PCSource = 10
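The state diagram can be written down as a Moore machine: one table of outputs per state plus a next-state function of the current state and the opcode. This is a sketch under assumptions. The transitions follow the usual textbook form of this diagram, because the arc labels did not survive in the slide text. State 4 is given RegDst = 0 and MemtoReg = 1, which is what Reg[IR[20:16]] <= MDR requires.

```python
# Opcodes for the MIPS subset covered by the state diagram.
R_TYPE, J, BEQ, LW, SW = 0x00, 0x02, 0x04, 0x23, 0x2B

# Moore outputs: control signals set in each state (unlisted signals are 0).
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

def next_state(state, opcode):
    """Next-state function: depends on the current state and the opcode."""
    if state == 0:
        return 1
    if state == 1:
        return {LW: 2, SW: 2, R_TYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return 3 if opcode == LW else 5
    # States 3 and 6 have one successor; all other states return to fetch.
    return {3: 4, 6: 7}.get(state, 0)

def trace(opcode):
    """Return the list of states visited while executing one instruction."""
    state, visited = 0, []
    while True:
        visited.append(state)
        state = next_state(state, opcode)
        if state == 0:
            return visited

for name, opcode in [("lw", LW), ("sw", SW), ("R-type", R_TYPE), ("beq", BEQ), ("j", J)]:
    print(name, trace(opcode))
```

The length of each trace is the cycle count for that instruction class, matching the 3 to 5 cycle range.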
Finite State Machine for Control
° Implementation:
[Figure: the control logic takes as inputs the instruction register opcode field (Op5–Op0) and the current state from the state register (S3–S0); it produces the next state (NS3–NS0) and the outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst]