CS141-L4-1 Tarun Soni, Summer ‘03
Multi Cycle CPU
Previously: built a Single Cycle CPU.
Today:
• Exceptions
• Multi-cycle CPU; Microprogramming
CS141-L4-2 Tarun Soni, Summer ‘03
Mid-term Review Discussion Session
Peterson Hall 104
Tue: 2-3 pm
Tue: 3-4 pm
[Figure: chart with one data series ("Series1"); horizontal axis 1 to 55, vertical axis 0 to 50]
CS141-L4-3 Tarun Soni, Summer ‘03
The Story so far:
• Instruction Set Architectures
• Performance issues
• 2s complement, Addition, Subtraction
• Multiplication, Division, Floating Point numbers
• ALUs
• Single Cycle CPU
• Exceptions
• Multicycle CPU: datapath; control
• Microprogramming
CS141-L4-4 Tarun Soni, Summer ‘03
Alternative Architectures
• Design alternative:
– provide more powerful operations
– goal is to reduce number of instructions executed
– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
– virtually all new instruction sets since 1982 have been RISC
– VAX: minimize code size, make assembly language easy
– instructions from 1 to 54 bytes long!
• We’ll look at Pentium, UltraSparc and JVM
CS141-L4-5 Tarun Soni, Summer ‘03
Pentium
CS141-L4-6 Tarun Soni, Summer ‘03
Java VM
• Most instr one byte
– ADD
– POP
• One byte arg
– ILOAD IND8
– BIPUSH CON8
• Two byte arg
– SIPUSH CON16
– IF_ICMPEQ OFFSET16
• Type = int, signed int etc.
Multi Cycle CPU; Mid-term Review; Discussion Session (cseweb.ucsd.edu/~tsoni/cse141/L5.pdf)
• Example: memory is used twice, at different times
– Average memory accesses per instruction = 1 + Flw + Fsw ≈ 1.3
– If CPI is 4.8, imem utilization = 1/4.8, dmem utilization = 0.3/4.8
• We could reduce HW without hurting performance
– extra control
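The utilization figures above can be checked with a few lines of Python. This is only a sketch: the function name and the split of the 0.3 data accesses into 0.2 loads and 0.1 stores are assumptions for illustration; the slide gives only the sum.

```python
def memory_utilization(f_lw, f_sw, cpi):
    """Return (imem, dmem) utilization as fractions of all cycles.

    Every instruction is fetched once, so instruction memory is busy
    1 cycle out of CPI. Data memory is busy only for loads and stores.
    """
    imem = 1.0 / cpi
    dmem = (f_lw + f_sw) / cpi
    return imem, dmem


# Assumed mix: 20% loads and 10% stores (only the 0.3 total is from the slide).
F_LW, F_SW, CPI = 0.2, 0.1, 4.8

accesses_per_inst = 1 + F_LW + F_SW
imem, dmem = memory_utilization(F_LW, F_SW, CPI)

print(f"memory accesses per instruction: {accesses_per_inst:.1f}")
print(f"imem utilization: {imem:.1%}")
print(f"dmem utilization: {dmem:.1%}")
```

Both memories sit idle most of the time, which is the argument for sharing one memory between instructions and data.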
Instruction fetch:
    IR <– Mem[PC]
Decode / register fetch:
    A <– R[rs]; B <– R[rt]
R-type:
    S <– A + B
    R[rd] <– S; PC <– PC+4
LW:
    S <– A + SX
    M <– Mem[S]
    R[rt] <– M; PC <– PC+4
ORi:
    S <– A or ZX
    R[rt] <– S; PC <– PC+4
SW:
    S <– A + SX
    Mem[S] <– B; PC <– PC+4
BEQ:
    not taken: PC <– PC+4
    taken: PC <– PC+SX
CS141-L4-38 Tarun Soni, Summer ‘03
Multicycle CPU: Sharing Functional Units
[Figure: multicycle datapath with shared functional units: PC, Memory (Address, Instruction or data, Data), Instruction register, Memory data register, Registers (three Register # inputs, Data), A, B, ALU, ALUOut]
Step name, with the action for R-type instructions, memory-reference instructions, branches and jumps:

Instruction fetch (all instructions)
    IR = Memory[PC]
    PC = PC + 4
Instruction decode / register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execution, address computation, branch/jump completion
    R-type:           ALUOut = A op B
    Memory-reference: ALUOut = A + sign-extend(IR[15-0])
    Branch:           if (A == B) then PC = ALUOut
    Jump:             PC = PC[31-28] || (IR[25-0] << 2)
Memory access or R-type completion
    R-type:           Reg[IR[15-11]] = ALUOut
    Load:             MDR = Memory[ALUOut]
    Store:            Memory[ALUOut] = B
Memory read completion
    Load:             Reg[IR[20-16]] = MDR
Reuse:
• ALU
• Memory
Need more
• Muxing
• Control
Single ALU, Common data and instruction memory datapath
CS141-L4-39 Tarun Soni, Summer ‘03
Since we reuse logic (e.g. ALU), we need to store results between states
Need extra registers when:
– signal is computed in one clock cycle and used in another, AND
– the inputs to the combinational circuit can change before the signal is written into a state element.
Multicycle CPU: Adding State Elements
CS141-L4-40 Tarun Soni, Summer ‘03
[Figure: single-cycle datapath for reference: PC, Instruction memory, Registers, Sign extend (16 to 32), Shift left 2, two adders, ALU (ALU result, Zero), Data memory and three muxes, with control signals RegWrite, MemRead, MemWrite, PCSrc, ALUSrc, MemtoReg and the 3-bit ALU operation]
Multicycle CPU: Adding State Elements
CS141-L4-41 Tarun Soni, Summer ‘03
[Figure: full multicycle datapath with its control unit. State elements: PC, Memory (Address, MemData, Write data), Instruction register, Memory data register, Registers, A, B, ALUOut. Control inputs: Op [5–0] from Instruction [31-26]; ALU control also uses Instruction [5–0]. Control outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. Jump address [31-0] is formed from PC [31-28] and Instruction [25–0] shifted left 2 (26 to 28 bits).]
Multicycle CPU: The Full Multi-Cycle Implementation
Instruction decode / register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execution, address computation, branch/jump completion
    R-type:           ALUOut = A op B
    Memory-reference: ALUOut = A + sign-extend(IR[15-0])
    Branch:           if (A == B) then PC = ALUOut
    Jump:             PC = PC[31-28] || (IR[25-0] << 2)
Memory access or R-type completion
    R-type:           Reg[IR[15-11]] = ALUOut
    Load:             MDR = Memory[ALUOut]
    Store:            Memory[ALUOut] = B
Memory read completion
    Load:             Reg[IR[20-16]] = MDR
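One way to read the step table is as a cycle count per instruction class: every instruction spends one cycle in fetch and one in decode, then only as many further cycles as its class needs. The Python sketch below makes that explicit. The step labels, the function names and the sample instruction sequence are all hypothetical, chosen only for illustration.

```python
# Steps each instruction class passes through, one clock cycle per step.
# The labels are informal names for the rows of the step table.
STEPS = {
    "rtype":  ["fetch", "decode", "execute", "rtype_completion"],
    "load":   ["fetch", "decode", "address", "memory_read", "write_back"],
    "store":  ["fetch", "decode", "address", "memory_write"],
    "branch": ["fetch", "decode", "branch_completion"],
    "jump":   ["fetch", "decode", "jump_completion"],
}


def cycles(kind):
    """Clock cycles needed by one instruction of the given class."""
    return len(STEPS[kind])


def total_cycles(program):
    """Total cycles for a sequence of instruction classes."""
    return sum(cycles(kind) for kind in program)


def average_cpi(mix):
    """CPI for an instruction mix given as {class: fraction}."""
    return sum(fraction * cycles(kind) for kind, fraction in mix.items())


# Hypothetical sequence: two loads, an R-type, a store and a branch.
program = ["load", "load", "rtype", "store", "branch"]
print(total_cycles(program))  # 5 + 5 + 4 + 4 + 3 = 21
```

The same table also gives the CPI of a workload once the instruction mix is known, via `average_cpi`.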
CS141-L4-56 Tarun Soni, Summer ‘03
Multicycle CPU: Mid-term alert !!
• How many cycles will it take to execute this code?
Consider the FSM in case of 100s of instructions !!!
• FSMs get unmanageable quickly as they grow.
– hard to specify
– hard to manipulate
– error prone
– hard to visualize
• The state diagrams that define the controller for an instruction set processor are highly structured
• Use this structure to construct a simple “microsequencer”
• Control reduces to programming this very simple device
– microprogramming
CS141-L4-71 Tarun Soni, Summer ‘03
Microprogramming
[Figure: microsequencer. Labels: Opcode, State Reg, Inputs, Outputs, Control Logic (PLA or ROM), Multicycle Datapath, Address Select Logic, Adder (+1)]
Types of “branching”
• Set state to 0
• Dispatch (state 1)
• Use incremented state number
Common case: State += 1;
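The three kinds of "branching" amount to a small next-state function. The sketch below is an illustration, not the lecture's implementation: the sequencing names (`seq`, `dispatch`, `fetch`) are borrowed from the Sequencing field used later in the deck, and the dispatch ROM contents are hypothetical state numbers.

```python
FETCH_STATE = 0

# Hypothetical dispatch ROM: opcode class -> first state of that
# instruction's microcode. The state numbers are made up for illustration.
DISPATCH_ROM = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "jmp": 9}


def next_state(state, sequencing, opcode=None, dispatch_rom=DISPATCH_ROM):
    """Address select logic of a simple microsequencer."""
    if sequencing == "fetch":      # set state to 0
        return FETCH_STATE
    if sequencing == "dispatch":   # look the opcode up in the dispatch ROM
        return dispatch_rom[opcode]
    if sequencing == "seq":        # common case: state += 1
        return state + 1
    raise ValueError(f"unknown sequencing value: {sequencing!r}")


# Fetch (0) -> decode (1) -> dispatch on opcode -> ... -> back to fetch.
s = next_state(0, "seq")
s = next_state(s, "dispatch", opcode="beq")
s = next_state(s, "fetch")
```

Because most states simply fall through to the next one, only the dispatch ROM and a small selector are needed in place of a full next-state table.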
Microprogramming: A particular strategy for implementing the control unit of a processor by "programming" at the level of register transfer operations
Microarchitecture: Logical structure and functional capabilities of the hardware as seen by the microprogrammer
Historical Note:
IBM 360 Series first to distinguish between architecture & organization. Same instruction set across wide range of implementations, each with different cost/performance
– control field for each control point in the machine
Microinstruction fields: µseq | µaddr | A-mux | B-mux | bus enables | register enables
[Figure: horizontal microcode. Labels: Opcode, Conditions, Control Logic / Store (PLA, ROM), microinstruction, Instruction Decode, Control Points, Datapath]
Depending on bus organization, many potential control combinations are simply wrong, i.e., they imply transfers that can never happen at the same time.
Idea: encode fields to save ROM space
Example: mem_to_reg and ALU_to_reg should never happen simultaneously => encode in a single bit which is decoded, rather than two separate bits
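A minimal sketch of that encoding idea, with hypothetical function names: two control lines that are never asserted together are stored as one bit and recovered by a local decoder. The helper `encoded_width` generalizes the saving to any number of mutually exclusive signals.

```python
def decode_reg_write_source(bit):
    """Expand one stored bit into two mutually exclusive control lines."""
    return {"mem_to_reg": bit == 1, "alu_to_reg": bit == 0}


def encoded_width(n_exclusive_signals):
    """Bits needed to encode n signals of which exactly one is active."""
    return max(1, (n_exclusive_signals - 1).bit_length())


lines = decode_reg_write_source(1)
print(lines)             # mem_to_reg asserted, alu_to_reg deasserted
print(encoded_width(2))  # 1 bit instead of 2
print(encoded_width(5))  # 3 bits instead of 5
```

The cost is the extra decoder between the control store and the datapath, which is the trade that vertical microcode makes.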
CS141-L4-74 Tarun Soni, Summer ‘03
Vertical Microinstructions
° “Vertical” Microcode
– encoded control fields with local decode
[Figure: vertical microinstruction. Labels: src and dst fields, each with a local decoder (DEC); other control fields; next states; inputs; MUX]
Some of these may have nothing to do with registers!
CS141-L4-75 Tarun Soni, Summer ‘03
Design Microinstruction Sets
1) Start with list of control signals
2) Group signals together that make sense (vs. random): called “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
– Use computers to design computers
5) To minimize the width, encode operations that will never be used at the same time
CS141-L4-76 Tarun Soni, Summer ‘03
Microinstructions
Start with list of control signals, grouped into fields

Single Bit Control
Signal name    Effect when deasserted         Effect when asserted
ALUSelA        1st ALU operand = PC           1st ALU operand = Reg[rs]
RegWrite       None                           Reg. is written
MemtoReg       Reg. write data input = ALU    Reg. write data input = memory
RegDst         Reg. dest. no. = rt            Reg. dest. no. = rd
TargetWrite    None                           Target reg. = ALU
MemRead        None                           Memory at address is read
MemWrite       None                           Memory at address is written
IorD           Memory address = PC            Memory address = ALU
IRWrite        None                           IR = Memory
PCWrite        None                           PC = PCSource
PCWriteCond    None                           IF ALUzero then PC = PCSource

Multiple Bit Control
Signal name    Value    Effect
ALUOp          00       ALU adds
               01       ALU subtracts
               10       ALU does function code
               11       ALU does logical OR
ALUSelB        000      2nd ALU input = Reg[rt]
               001      2nd ALU input = 4
               010      2nd ALU input = sign extended IR[15-0]
               011      2nd ALU input = sign extended, shift left 2 IR[15-0]
               100      2nd ALU input = zero extended IR[15-0]
PCSource       00       PC = ALU
               01       PC = Target
               10       PC = (PC+4)[31-28] : IR[25–0] << 2
CS141-L4-77 Tarun Soni, Summer ‘03
Microinstructions
Field Name         Width (wide)   Width (narrow)   Control Signals Set
ALU Control        4              2                ALUOp
SRC1               2              1                ALUSelA
SRC2               5              3                ALUSelB
ALU Destination    6              4                RegWrite, MemtoReg, RegDst, TargetWr.
Memory             4              3                MemRead, MemWrite, IorD
Memory Register    1              1                IRWrite
PCWrite Control    5              4                PCWrite, PCWriteCond, PCSource
Sequencing         3              2                AddrCtl
Total width        30             20               bits
CS141-L4-78 Tarun Soni, Summer ‘03
Microinstructions: MIPS field name and values

Field Name         Values for Field   Function of Field with Specific Value
ALU                Add                ALU adds
                   Subt.              ALU subtracts
                   Func code          ALU does function code
                   Or                 ALU does logical OR
SRC1               PC                 1st ALU input = PC
                   rs                 1st ALU input = Reg[rs]
SRC2               4                  2nd ALU input = 4
                   Extend             2nd ALU input = sign ext. IR[15-0]
                   Extend0            2nd ALU input = zero ext. IR[15-0]
                   Extshft            2nd ALU input = sign ex., sl IR[15-0]
                   rt                 2nd ALU input = Reg[rt]
ALU destination    Target             Target = ALUout
                   rd                 Reg[rd] = ALUout
Memory             Read PC            Read memory using PC
                   Read ALU           Read memory using ALU output
                   Write ALU          Write memory using ALU output
Memory register    IR                 IR = Mem
                   Write rt           Reg[rt] = Mem
                   Read rt            Mem = Reg[rt]
PC write           ALU                PC = ALU output
                   Target-cond.       IF ALU Zero then PC = Target
                   jump addr.         PC = PCSource
Sequencing         Seq                Go to sequential µinstruction
                   Fetch              Go to the first microinstruction
                   Dispatch           Dispatch using ROM.
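To show how the symbolic field values relate to the control signals, here is a sketch of a tiny "microassembler" in Python. The mapping from each field value to signal settings is an assumed reading of the two tables (symbolic legend and signal definitions), not something the slides state line by line; the `assemble` function and the sample `fetch` microinstruction are likewise illustrative. The Sequencing field is left out because the slides do not give the AddrCtl encoding.

```python
# Symbolic field value -> control signal settings (assumed reading of the tables).
FIELD_SIGNALS = {
    "ALU": {
        "Add": {"ALUOp": 0b00},
        "Subt.": {"ALUOp": 0b01},
        "Func code": {"ALUOp": 0b10},
        "Or": {"ALUOp": 0b11},
    },
    "SRC1": {"PC": {"ALUSelA": 0}, "rs": {"ALUSelA": 1}},
    "SRC2": {
        "rt": {"ALUSelB": 0b000},
        "4": {"ALUSelB": 0b001},
        "Extend": {"ALUSelB": 0b010},
        "Extshft": {"ALUSelB": 0b011},
        "Extend0": {"ALUSelB": 0b100},
    },
    "ALU destination": {
        "Target": {"TargetWrite": 1},
        "rd": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
    },
    "Memory": {
        "Read PC": {"MemRead": 1, "IorD": 0},
        "Read ALU": {"MemRead": 1, "IorD": 1},
        "Write ALU": {"MemWrite": 1, "IorD": 1},
    },
    "Memory register": {
        "IR": {"IRWrite": 1},
        "Write rt": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1},
        "Read rt": {},  # store data comes straight from the register read
    },
    "PC write": {
        "ALU": {"PCWrite": 1, "PCSource": 0b00},
        "Target-cond.": {"PCWriteCond": 1, "PCSource": 0b01},
        "jump addr.": {"PCWrite": 1, "PCSource": 0b10},
    },
}


def assemble(microinstruction):
    """Translate {field: symbolic value} into {control signal: value}."""
    signals = {}
    for field, value in microinstruction.items():
        signals.update(FIELD_SIGNALS[field][value])
    return signals


# Illustrative fetch microinstruction: read memory at PC into IR, PC = PC + 4.
fetch = {
    "ALU": "Add", "SRC1": "PC", "SRC2": "4",
    "Memory": "Read PC", "Memory register": "IR", "PC write": "ALU",
}
print(assemble(fetch))
```

Fields that are left out of a microinstruction contribute no signals, which corresponds to leaving those control lines deasserted.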
CS141-L4-79 Tarun Soni, Summer ‘03
Microinstructions: The datapath again
[Figure: the multicycle datapath again: PC, Memory (Address, MemData, Write data), Instruction register, Memory data register, Registers, Sign extend (16 to 32), Shift left 2, A, B, ALU (Zero, ALU result), ALU control, ALUOut, with control signals IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp]
Field Name         Values for Field   Function of Field with Specific Value
SRC1               PC                 1st ALU input = PC
                   rs                 1st ALU input = Reg[rs]
SRC2               4                  2nd ALU input = 4
                   Extend             2nd ALU input = sign ext. IR[15-0]
                   Extend0            2nd ALU input = zero ext. IR[15-0]
                   Extshft            2nd ALU input = sign ex., sl IR[15-0]
                   rt                 2nd ALU input = Reg[rt]
ALU destination    Target             Target = ALUout
                   rd                 Reg[rd] = ALUout
CS141-L4-80 Tarun Soni, Summer ‘03
Microinstructions: Pros-Cons
• Specification Advantages:
– Easy to design and write
– Design architecture and microcode in parallel
• Implementation (off-chip ROM) Advantages
– Easy to change since values are in memory
– Can emulate other architectures and instruction sets
– Can make use of internal registers
• Implementation Disadvantages, SLOWER now that:
– Control is implemented on same chip as processor
– ROM is no longer faster than RAM
– No need to go back and make changes
CS141-L4-81 Tarun Soni, Summer ‘03
CPU Control: Methodology
                            Hardwired path                 Microprogrammed path
Initial representation      Finite state diagram           Microprogram
Sequencing control          Explicit next state function   Microprogram counter + dispatch ROMS
Logic representation        Logic equations                Truth tables
Implementation technique    Programmable logic array       Read only memory
CS141-L4-82 Tarun Soni, Summer ‘03
Microprogramming: the last word ?
Summary: Microprogramming one inspiration for RISC
• If simple instructions could execute at a very high clock rate…
• If you could even write compilers to produce microinstructions…
• If most programs use simple instructions and addressing modes…
• If microcode is kept in RAM instead of ROM so as to fix bugs …
• If same memory used for control memory could be used instead as cache for “macroinstructions”…
• Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1)
CS141-L4-83 Tarun Soni, Summer ‘03
Exceptions
Supporting exceptions in our FSM
Start -> state 0
State 0 (Instruction Fetch): MemRead, ALUSelA = 0, IorD = 0, IRWrite, ALUSelB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 1 (Instruction Decode / Register Fetch): ALUSelA = 0, ALUSelB = 11, ALUOp = 00, TargetWrite
From state 1:
• Opcode = LW or SW: Memory Inst FSM
• Opcode = R-type: R-type Inst FSM
• Opcode = BEQ: Branch Inst FSM
• Opcode = JMP: Jump Inst FSM
• Opcode = anything else: to state 10
CS141-L4-84 Tarun Soni, Summer ‘03
Exceptions
Supporting exceptions in our FSM
R-type instructions
• From state 1: ALUSelA = 1, ALUSelB = 00, ALUOp = 10
• Then: ALUSelA = 1, RegDst = 1, RegWrite, MemtoReg = 0, ALUSelB = 10, ALUOp = 10
• Normally: to state 0
• On overflow: to state 11
CS141-L4-85 Tarun Soni, Summer ‘03
Exceptions
Supporting exceptions in our FSM
State 10 (illegal instruction): IntCause = 0, CauseWrite
State 11 (arithmetic overflow): IntCause = 1, CauseWrite
States 12 and 13: ALUSelA = 0, ALUSelB = 01, ALUOp = 01, EPCWrite; PCWrite, PCSource = 11; then to state 0 (fetch)
[Figure: datapath additions for exceptions: EPC and Cause registers next to PC, a sub4 block, InterruptHandlerAddress as a PCSource input, and control signals PCWrite, EPCWrite, CauseWrite, IntCause, PCSource]
• Write Cause into register
• Write PC into EPC
• Load Exception Handler address to PC
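The three steps can be sketched as a small Python function operating on a dictionary of registers. This is an illustration under stated assumptions: the handler address is a placeholder value, the cause codes 10 and 12 are the ones used in the RTL slide later in this lecture, and the function and constant names are hypothetical. EPC receives PC - 4 because the PC was already incremented during fetch.

```python
CAUSE_UNDEFINED_INSTRUCTION = 10  # "RI" in the later RTL slide
CAUSE_OVERFLOW = 12               # "Ovf" in the later RTL slide

# Placeholder handler address, assumed only for this example.
EXCEPTION_HANDLER_ADDR = 0xC0000000


def take_exception(cpu, cause):
    """Apply the three exception steps to a register dictionary."""
    cpu["Cause"] = cause                # 1) write cause into register
    cpu["EPC"] = cpu["PC"] - 4          # 2) save address of the faulting instruction
    cpu["PC"] = EXCEPTION_HANDLER_ADDR  # 3) continue at the handler
    return cpu


# PC has already advanced past the faulting instruction at 0x100.
cpu = {"PC": 0x104, "EPC": 0, "Cause": 0}
take_exception(cpu, CAUSE_OVERFLOW)
print(hex(cpu["EPC"]), hex(cpu["PC"]), cpu["Cause"])
```

In the FSM these steps are spread over the extra exception states instead of happening in one call.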
CS141-L4-86 Tarun Soni, Summer ‘03
Exceptions
Instruction fetch:
    IR <= MEM[PC]; PC <= PC + 4
Decode / register fetch:
    A <= R[rs]; B <= R[rt]
R-type:
    S <= A fun B
    R[rd] <= S
    overflow (additional condition from datapath): EPC <= PC - 4; PC <= exp_addr; cause <= 12 (Ovf)
ORi:
    S <= A op ZX
    R[rt] <= S
LW:
    S <= A + SX
    M <= MEM[S]
    R[rt] <= M
SW:
    S <= A + SX
    MEM[S] <= B
BEQ:
    S <= A - B
    Equal: PC <= PC + SX || 00
    ~Equal: no change to PC
other (undefined instruction):
    EPC <= PC - 4; PC <= exp_addr; cause <= 10 (RI)
State labels shown in the figure: 0010, 0011
CS141-L4-87 Tarun Soni, Summer ‘03
Summary
• Multicycle CPUs can be faster than single-cycle CPUs: each instruction takes only the steps it needs, and the clock period is set by the longest step rather than by the longest instruction.
• Control is harder.
• Microprogramming can (conceptually) simplify CPU control generation.
• A microprogram is a small program inside the CPU that executes the individual instructions of the “real” program.
• Exception handling is difficult in the CPU, because the interactions between the executing instructions and the interrupt are complex and unpredictable.
CS141-L4-88 Tarun Soni, Summer ‘03
Mid-Term Review
• Technology trends: Design for the future
• Instruction Set Architectures: types of ISAs: Addressing modes, length of instruction etc.
• MIPS instruction format: basic classes of instructions
• Registers and load store architectures
• Data types, operands, memory organization/addressing
• Basic MIPS instructions: Arithmetic, logical, data transfer, branching, jumps
• Issues in jump/branching distance and immediate addressing modes
• Stacks and frames
• E.g., swap(), leaf_procedure(), nested_procedure()
• Performance: Relative (e.g., Boeing), Metrics, Benchmarking, SPEC marks
• CPU time = Instruction Count x Cycles/Instruction x Seconds/Cycle
• Amdahl’s law: Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
• Arithmetic: 2s complement
• Basic digital logic, 1-bit adder, full adder, 32-bit adder/subtractor
• ALU: adder+mux+special conditions
• Delays in combinational logic, clocking
• Ripple carry vs. Carry look ahead adders
• Basics of Booth arithmetic
• Floating point representation
• Floating point operations (+,-,*,/)
• Guard, round and sticky bits
• Single cycle CPU
• Building blocks: Register files, memory etc.
• Storage units, clocking methodology
• PC arithmetic
• Instruction fetch
• Datapath on various operations: Load, Store, Branch, R-type, I-type
• Control: basic control signals for the MIPS subset
• Distributed control: Main control + ALU control
• PLA implementation
• Timing diagrams
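As a quick refresher for the two performance formulas in the review list, here is a short worked sketch in Python. The function names and every number in the example are made up for illustration.

```python
def cpu_time(instruction_count, cpi, seconds_per_cycle):
    """CPU time = instruction count x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * seconds_per_cycle


def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's law: only the affected part of the time shrinks."""
    return unaffected + affected / improvement


# Made-up example: 1 million instructions, CPI 4, 10 ns cycle.
t = cpu_time(1_000_000, 4, 10e-9)
print(f"CPU time: {t:.3f} s")

# Made-up example: 100 s total, 80 s of it sped up 5x.
before = 100.0
after = time_after_improvement(unaffected=20.0, affected=80.0, improvement=5.0)
print(f"after: {after} s, overall speedup: {before / after:.2f}x")
```

Even a 5x improvement to 80% of the run gives well under 5x overall, because the unaffected 20 s remains.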
CS141-L4-90 Tarun Soni, Summer ‘03
Mid-Term Review
• Multi-cycle CPU
• Datapath: registers/stages: Ifetch, A, B, Execute, Store etc.
• Various instructions through the datapath
• Control: Sharing functional units
• Finite state machine perspective for control: FSM for MIPS
• Implementation styles: ROM, PLA
• Microprogramming: Horizontal, vertical, relationship to RISC
• Exceptions: change in FSM, internal, external; need to save state.