Chapter 4: Instruction Set Architectures
CS140 Computer Organization
These slides are derived from those of Null & Lobur + the work of others. Considerable material from previous years gleaned from Patterson & Hennessy.
Mar 29, 2015
Chapter 4: ISA 2
Chapter 4 Objectives & Introduction
Objectives:
• Learn the components common to every modern computer system.
• Be able to explain how each component contributes to program execution.
• Understand an ISA and how it relates to a real architecture.
• Know how the program assembly process works.
Introduction:
• Chapter 1 presented a general overview of computer systems.
• Chapter 2 discussed how data is formatted, stored and manipulated.
• Chapter 3 described the fundamentals of digital circuits.
• With this, we can understand how computer components work, and how they fit together to create useful computer systems.
Chapter 4: ISA 3
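Computer Components
[Figure: block diagram of the Processor, Chipset, Memory, PCI Controller, PCI Bus, Video Controller and Clock, linked by the FSB; parts of the diagram are labeled "Chapters 4 & 5", "Chapter 6" and "Chapter 7" to show where each is covered.]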
Chapter 4: ISA 4
4.2 CPU Basics
• The computer’s CPU fetches, decodes, and executes program instructions.
• The two principal parts of the CPU are the datapath and the control unit.
– The datapath consists of an arithmetic-logic unit and storage units (registers) that are interconnected by a data bus that is also connected to main memory.
– Various CPU components perform sequenced operations according to signals provided by its control unit.
– The control unit determines which actions to carry out according to the values in a program counter register and a status register.
It’s like plumbing: the datapath is the pipes, the control unit is the faucets and valves.
Chapter 4: ISA 5
4.3 The Bus
PARALLEL BUS
• The CPU shares data with other system components by way of a data bus.
– A bus is a set of wires that simultaneously convey a single bit along each line.
• Two types of buses are commonly found in computer systems: point-to-point, and multipoint buses.
This is a point-to-point bus configuration:
SERIAL BUS
• Deferred to later when we discuss I/O
Chapter 4: ISA 6
4.3 The Bus
• Buses have data lines, control lines, and address lines.
• The data lines convey bits from one device to another.
• Control lines determine the direction of data flow, and when each device can access the bus.
• Address lines determine the location of the source or destination of the data.
Chapter 4: ISA 7
4.3 The Bus
• A multipoint bus is shown below.
• A multipoint bus is a shared resource. Access to it is controlled through protocols, which are built into the hardware and handled by the control lines.
Chapter 4: ISA 8
4.3 The Bus
• In a master-slave configuration, where more than one device can be the bus master, concurrent bus master requests must be arbitrated.
• Four categories of bus arbitration are:
– Distributed using self-detection: Devices decide which gets the bus among themselves.
– Distributed using collision-detection: Any device can try to use the bus. If its data collides with the data of another device, it tries again.
– Daisy chain: Permissions are passed from the highest-priority device to the lowest.
– Centralized parallel: Each device is directly connected to an arbitration circuit.
Chapter 4: ISA 9
4.4 Clocks
• Every computer contains at least one clock that synchronizes the activities of its components.
• A fixed number of clock cycles are required to carry out each data movement or computational operation.
• The clock frequency, measured in megahertz or gigahertz, determines the speed with which all operations are carried out.
• Clock cycle time is the reciprocal of clock frequency.
• Typical Intel and typical PIC clocks look like this:
– A 2 GHz clock has a cycle time of 0.5 nanoseconds.
– An 8 MHz clock has a cycle time of 0.125 microseconds.
• A single master clock supplies the multiple frequencies used by various parts of the system.
Chapter 4: ISA 10
4.4 Clocks
• Clock speed should not be confused with CPU performance.
• The CPU time required to run a program is given by the general performance equation:
CPU time = seconds / program = (instructions / program) × (average cycles / instruction) × (seconds / cycle)
We can improve CPU performance when we
– reduce the number of instructions in a program,
– reduce the number of cycles per instruction, or
– reduce the number of nanoseconds per clock cycle.
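The clock arithmetic on these two slides can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the program size and cycles-per-instruction figure in the last call are hypothetical numbers, not measurements.

```python
def cycle_time_seconds(frequency_hz):
    """Clock cycle time is the reciprocal of clock frequency."""
    return 1.0 / frequency_hz


def cpu_time_seconds(instructions, avg_cycles_per_instruction, frequency_hz):
    """CPU time = instructions x average cycles/instruction x seconds/cycle."""
    return instructions * avg_cycles_per_instruction * cycle_time_seconds(frequency_hz)


print(cycle_time_seconds(2e9))   # 2 GHz clock, in seconds
print(cycle_time_seconds(8e6))   # 8 MHz clock, in seconds

# Hypothetical program: 1,000,000 instructions at 2 cycles each on a 2 GHz clock.
print(cpu_time_seconds(1_000_000, 2.0, 2e9))
```

Halving any one of the three factors halves the CPU time, which is why clock speed alone does not determine performance.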
Chapter 4: ISA 11
4.4 Clocks
Digression: Sequential Logic, Clocking
• Combinational circuits: no memory
• Output depends only on the inputs
• Sequential circuits: have memory
• How to ensure memory element is updated neither too soon, nor too late?
• Recall hardware circuits
• Flip-flop register is the writable memory element
• Gate propagation delay means result takes time to stabilize; delay varies with inputs
• Must wait until result is stable before we can write that output register to the next stage; otherwise garbage results.
• How to be certain ALU output is stable?
• Solution: let the inputs chatter and stabilize, THEN apply the clock.
Chapter 4: ISA 12
4.4 Clocks
Digression: Sequential Logic, Clocking
• Clock: free-running signal with fixed cycle time (clock period)
° Clock determines when to write memory element
• level-triggered: store while the clock is high (or low)
• edge-triggered: store only on a clock edge
° We will use negative (falling) edge-triggered methodology
[Figure: clock waveform alternating between high (1) and low (0), with the period, a rising edge and a falling edge marked]
Chapter 4: ISA 13
4.4 Clocks
Role of Clock in Processors
• single-cycle machine: does everything in one clock cycle
• instruction execution = up to 5 steps
• must complete 5th step before cycle ends
[Figure: timing diagram showing the clock signal with its rising and falling edges, instruction execution steps 1 to 5, the point at which the datapath is stable, and the point at which the register(s) are written]
Chapter 4: ISA 14
4.5 The Input/Output Subsystem
• A computer communicates with the outside world through its input/output (I/O) subsystem.
• I/O devices connect to the CPU through various interfaces.
• I/O can be memory-mapped, where the I/O device behaves like main memory from the CPU’s point of view.
• Or I/O can be instruction-based, where the CPU has a specialized I/O instruction set.
Chapter 4: ISA 15
4.6 Memory Organization
• Computer memory is a linear array of addressable storage cells that are similar to registers.
• Memory can be byte-addressable (most common), or word-addressable, where a word consists of two or more bytes.
• Memory is constructed of RAM chips, often referred to in terms of length × width.
• If the addressable unit of the machine is 8 bits, then a 4M × 8 RAM chip gives us 4M 8-bit memory locations (4 megabytes).
Chapter 4: ISA 16
4.6 Memory Organization
• How does the computer access the memory location that corresponds to a particular address?
• We see that 4 megabytes = 2^22 bytes.
• The memory locations for this memory are numbered 0 through 2^22 − 1.
• Thus, the memory bus of this system requires at least 22 address lines.
– The address lines “count” from 0 to 2^22 − 1 in binary. Each line is either “on” or “off”; together they indicate the location of the desired memory element.
Power 2 ^ Power
0 1
1 2
2 4
3 8
4 16
5 32
6 64
7 128
8 256
9 512
10 1,024
11 2,048
12 4,096
13 8,192
14 16,384
15 32,768
16 65,536
17 131,072
18 262,144
19 524,288
20 1,048,576
21 2,097,152
22 4,194,304
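The table lookup above (4,194,304 locations needs 22 lines) can be written as a small Python function. This is a sketch with a made-up function name; it assumes every location gets its own address, starting from 0.

```python
def address_lines_needed(num_locations):
    """Smallest number of address lines that can count from 0 to num_locations - 1."""
    if num_locations < 1:
        raise ValueError("need at least one memory location")
    # The highest address is num_locations - 1; its bit length is the line count.
    return (num_locations - 1).bit_length()


print(address_lines_needed(4 * 2**20))   # the 4M x 8 chip from the slide
```

A memory with even one location more than 2^22 would need a 23rd address line.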
Chapter 4: ISA 17
4.6 Memory Organization
• Physical memory usually consists of more than one RAM chip.
• Access is more efficient when memory is organized into banks of chips with the addresses interleaved across the chips.
• With low-order interleaving, the low-order bits of the address specify which memory bank contains the address of interest.
[Figure: Low-Order Interleaving, showing how byte addresses are distributed across the memory banks]
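Low-order interleaving can be sketched in Python as a split of the address into two bit fields. The function name and the choice of 4 banks in the example are assumptions made for illustration; the slide does not say how many banks the figure uses.

```python
def locate(address, num_banks):
    """Return (bank, offset within bank) under low-order interleaving."""
    if num_banks < 1 or num_banks & (num_banks - 1):
        raise ValueError("number of banks must be a power of two")
    bank_bits = num_banks.bit_length() - 1
    bank = address & (num_banks - 1)   # low-order bits pick the bank
    offset = address >> bank_bits      # remaining bits pick the cell in that bank
    return bank, offset


# Consecutive addresses rotate through the banks: 0, 1, 2, 3, 0, 1, 2, 3
print([locate(a, 4)[0] for a in range(8)])
```

Because neighbouring addresses land in different banks, a run of sequential accesses can be spread across all the chips.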
Chapter 4: ISA 18
4.7 Interrupts
• The normal execution of a program is altered when an event of higher priority occurs. The CPU is alerted to such an event through an interrupt.
• Interrupts can be triggered by I/O requests, arithmetic errors (such as division by zero), or when an invalid instruction is encountered.
• For general-purpose systems, it is common to disable all interrupts during the time in which an interrupt is being processed.
– Typically, this is achieved by setting a bit in the flags register.
– Interrupts that are ignored in this case are called maskable.
• Nonmaskable interrupts are high-priority interrupts that cannot be ignored. (“The CPU is on fire” is nonmaskable.)
• Each interrupt is associated with a procedure that directs the actions of the CPU when an interrupt occurs.
• In Chapter 7 we’ll look at interrupts in more detail.
Chapter 4: ISA 19
• Interrupt processing involves adding another step to the fetch-decode-execute cycle as shown below.
The next slide shows a flowchart of “Process the interrupt.”
4.7 Interrupts
Chapter 4: ISA 20
4.7 Interrupts
Chapter 4: ISA 21
4.9 Instruction Processing
1. Detour looking at Instruction Sets
– How are instructions laid out so they can be simply decoded?
– Going from PIC to MIPS
2. Detour looking at Hardware
– Multiplexers, registers, and ALUs
3. The datapath
– Looking at each stage
– The datapath and some sample instructions.
Chapter 4: ISA 22
4.9 Instruction Processing
Detour looking at Instruction Sets
These are the 35 instructions available on the PIC processors we’ve used.
There are four types of instructions and the formats for these instructions are defined here.
• Byte-oriented instructions have a 00 here.
• Bit-oriented instructions have a 01 here.
• Goto and Call have a 10.
• Literal instructions have a 11.
Chapter 4: ISA 23
Detour looking at Instruction Sets
4.9 Instruction Processing
There are four types of instructions and the formats for these instructions are defined here.
PIC is a RISC (Reduced Instruction Set Computer). A relatively simple set of rules can decipher these instructions.
Conversely, Intel is a CISC (Complex Instruction Set Computer). The wiring needed to decipher the thousands of Intel instructions is overwhelming.
Chapter 4: ISA 24
4.9 Instruction Processing
Detour looking at Instruction Sets
The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats:
– R-type: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
– I-type: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
– J-type: op (bits 31-26, 6 bits) | target address (25-0, 26 bits)
• The different fields are:
– op: operation of the instruction
– rs, rt, rd: the source and destination register specifiers
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction
Hey, these are the same flavors as the PIC instructions.
MIPS is a RISC. It has about 75 instructions.
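Because every field sits at a fixed bit position, decoding is just shifting and masking. The Python sketch below pulls out the fields of all three formats at once; the helper names are made up for illustration, and the sample word 0x34020004 is the "ori $2, $0, 4" encoding that appears later in this deck.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a word; bit 0 is the least significant."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op": bits(word, 31, 26),
        "rs": bits(word, 25, 21),
        "rt": bits(word, 20, 16),
        "rd": bits(word, 15, 11),        # R-type only
        "shamt": bits(word, 10, 6),      # R-type only
        "funct": bits(word, 5, 0),       # R-type only
        "immediate": bits(word, 15, 0),  # I-type only
        "target": bits(word, 25, 0),     # J-type only
    }


fields = decode(0x34020004)   # ori $2, $0, 4
print(fields["op"], fields["rs"], fields["rt"], fields["immediate"])
```

The hardware does the same thing with wires: the op field decides which of the other fields are meaningful for a given instruction.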
Chapter 4: ISA 25
4.9 Instruction Processing
Detour looking at Instruction Sets
The MIPS-lite Subset for Today
• ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
– addu rd, rs, rt
– subu rd, rs, rt
• OR Immediate (I-type: op | rs | rt | immediate)
– ori rt, rs, imm16
• LOAD / STORE Word (I-type: op | rs | rt | immediate)
– lw rt, rs, imm16
– sw rt, rs, imm16
• BRANCH (I-type: op | rs | rt | immediate)
– beq rs, rt, imm16
MIPS-Lite has 6 instructions.
Chapter 4: ISA 26
The MIPS-lite Subset for Today
Detour looking at Instruction Sets
4.9 Instruction Processing
Things to Note:
1. Since each instruction is 32 bits (4 bytes), instruction addresses are multiples of 4.
2. These instructions access up to 3 registers.
3. The ori can use a 16-bit immediate.
4. Each register needs 5 bits to specify it. How many registers does this machine have?
Chapter 4: ISA 27
The MIPS-lite Subset for Today
Detour looking at Instruction Sets
4.9 Instruction Processing
Chapter 4: ISA 29
4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
D-Latches
° C = 0, no change of state:
• Q(t + Δt) = Q(t)
° C = 1, change is allowed:
• Q(t + Δt) = D(t)
• No indeterminate output
• D-latch based on SR-latch with NAND gates and control input C
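The two latch rules can be modelled in Python as a tiny state machine. This is a behavioural sketch only (the class name is made up for illustration); it ignores the NAND-gate internals and the propagation delay.

```python
class DLatch:
    """Behavioural model of a D-latch with control input C."""

    def __init__(self, q=0):
        self.q = q

    def update(self, c, d):
        # C = 1: change is allowed, Q follows D.
        # C = 0: no change of state, Q keeps its old value.
        if c == 1:
            self.q = d
        return self.q


latch = DLatch()
print(latch.update(0, 1))   # C = 0, so D is ignored
print(latch.update(1, 1))   # C = 1, so Q takes the value of D
print(latch.update(0, 0))   # C = 0 again, Q holds
```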
Chapter 4: ISA 30
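4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
[Figure: three combinational blocks. Adder: 32-bit inputs A and B plus CarryIn, outputs 32-bit Sum and Carry. MUX: 32-bit inputs A and B plus Select, 32-bit output Y. ALU: 32-bit inputs A and B plus OP, 32-bit output Result.]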
Chapter 4: ISA 31
4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA (number) selects the register to put on busA (data)
– RB (number) selects the register to put on busB (data)
– RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• RA or RB valid ⇒ busA or busB valid after “access time.”
[Figure: register file of 32 32-bit registers with 5-bit select inputs RW, RA and RB, 32-bit input busW, 32-bit outputs busA and busB, plus Write Enable and Clk]
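The register file's behaviour can be sketched in Python as follows. The class and method names are made up for illustration, and one simplification is assumed: the sketch treats all 32 registers alike, so it does not model any register being hardwired to zero.

```python
class RegisterFile:
    """32 registers of 32 bits: two read ports (busA, busB), one write port (busW)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Reads are combinational: no clock involved.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # Writes happen only on a clock edge, and only when Write Enable is 1.
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=1)
rf.clock_edge(rw=3, bus_w=7, write_enable=0)   # ignored: Write Enable is 0
print(rf.read(3, 0))
```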
Chapter 4: ISA 32
4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
Multiplexer
[Figure: multiplexer with input bits 0 through 31, a selector, and a single output]
The 5 selector wires can choose one of the 32 input voltages and send it to the output.
Decoder
[Figure: decoder with a single input, a selector, and outputs 0 through 31]
The 5 selector wires choose which of the 32 outputs will get the input voltage.
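The two blocks are mirror images of each other, which a short Python sketch makes easy to see. The function names are made up for illustration, and the 5 selector wires are represented as a single integer from 0 to 31.

```python
def mux32(inputs, select):
    """32-to-1 multiplexer: pass the selected input through to the output."""
    if len(inputs) != 32 or not 0 <= select < 32:
        raise ValueError("need 32 inputs and a 5-bit selector")
    return inputs[select]


def decoder32(value, select):
    """1-to-32 decoder as drawn on the slide: route the input to the selected output."""
    if not 0 <= select < 32:
        raise ValueError("selector must fit in 5 bits")
    return [value if i == select else 0 for i in range(32)]


print(mux32(list(range(100, 132)), 5))   # input number 5 reaches the output
print(decoder32(1, 5).index(1))          # output number 5 receives the input
```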
Chapter 4: ISA 33
4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
Multiplexer
[Figure: side view and end view of a multiplexer with input words Word 0 through Word 31, each with Bit 0 through Bit 31, a selector, and one output word]
Now, each of the 32 inputs has 32 bits. There are 32 x 32 bits in and 1 x 32 bits out.
This multiplexer is equivalent to 32 of those on the previous page.
Each of these input words COULD be a register!
Chapter 4: ISA 34
4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
Multiplexer
[Figure: the 32-word multiplexer (input words 0 through 31, selector, output) beside the register file (32 32-bit registers; RW, RA, RB; busW, busA, busB; Write Enable; Clk)]
So this register file is just the multiplexer shown here.
Chapter 4: ISA 35
4.9 Instruction Processing
Detour looking at Hardware
Basic Building Blocks
Storage Element: Idealized Memory
• Memory (idealized)
– One input bus: Data In
– One output bus: Data Out
• Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• Address valid ⇒ Data Out valid after “access time.”
[Figure: idealized memory block with Address, 32-bit Data In, 32-bit Data Out, Write Enable and Clk]
Chapter 4: ISA 37
4.9 Instruction Processing
• The fetch-decode-execute-store cycle is the series of steps that a computer carries out when it runs a program.
• We first have to fetch an instruction from memory, and place it into the Instruction Register (IR).
• Once in the IR, it is decoded to determine what needs to be done next.
• If an immediate operand is involved in the operation, it is retrieved and prepared for execution.
• With everything in place, the instruction is executed.
• If a result is to be stored in memory, that’s done next.
• If a result is placed in a register, that’s the last stage.
The next slide shows a flowchart of this process.
Chapter 4: ISA 38
4.9 Instruction Processing
Generic Steps: Datapath
[Figure: five-stage datapath. 1. Instruction Fetch (PC, +4 adder, instruction memory); 2. Decode/Register Read (registers selected by rs, rt, rd; imm); 3. Execute (ALU); 4. Memory (data memory); 5. Reg. Write]
Chapter 4: ISA 39
4.9 Instruction Processing
Stages of the Datapath (1/6)
Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient
Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath
• Smaller stages are easier to design
• Easy to optimize (change) one stage without touching the others
[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]
Chapter 4: ISA 40
Stages of the Datapath (2/6)
There is a wide variety of MIPS instructions: so what general steps do they have in common?
Stage 1: instruction fetch
No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so + 4)
4.9 Instruction Processing
[Figure: the five-stage datapath]
Chapter 4: ISA 41
Stages of the Datapath (3/6)
Stage 2: Instruction Decode
• upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
• first, read the Opcode to determine instruction type and field lengths
• second, read in data from all necessary registers
– for add, read two registers
– for addi, read one register
– for jal, no reads necessary
4.9 Instruction Processing
[Figure: the five-stage datapath]
Chapter 4: ISA 42
Stages of the Datapath (4/6)
° Stage 3: ALU (Arithmetic-Logic Unit)
• the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
• what about loads and stores?
– lw $t0, 40($t1)
– the address we are accessing in memory = the value in $t1 + the value 40
– so we do this addition in this stage
4.9 Instruction Processing
[Figure: the five-stage datapath]
Chapter 4: ISA 43
Stages of the Datapath (5/6)
° Stage 4: Memory Access
• actually only the load and store instructions do anything during this stage; the others remain idle
• since these instructions have a unique step, we need this extra stage to account for them
• as a result of the cache system, this stage is expected to be just as fast (on average) as the others
4.9 Instruction Processing
[Figure: the five-stage datapath]
Chapter 4: ISA 44
Stages of the Datapath (6/6)
° Stage 5: Register Write
• most instructions write the result of some computation into a register
• examples: arithmetic, logical, shifts, loads, slt
• what about stores, branches, jumps?
– don’t write anything into a register at the end
– these remain idle during this fifth stage
4.9 Instruction Processing
[Figure: the five-stage datapath]
Chapter 4: ISA 45
4.9 Instruction Processing
Generic Steps: Datapath
[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]
Chapter 4: ISA 47
4.9 Instruction Processing
Datapath Walkthrough #1 - add
add $r3, $r1, $r2 # r3 = r1+r2
Stage 1: fetch this instruction, incr. PC;
Stage 2: decode to find it’s an add, read registers $r1 and $r2;
Stage 3: add the two values retrieved in Stage 2;
Stage 4: idle (nothing to write to memory);
Stage 5: write result of Stage 3 into register $r3;
Chapter 4: ISA 48
4.9 Instruction Processing
Datapath Walkthrough #1
add $r3, $r1, $r2 # r3 = r1+r2
[Figure: the add instruction traced through the datapath: the register file reads registers 1 and 2, the ALU computes reg[1]+reg[2], and the result is written to register 3]
Chapter 4: ISA 49
4.9 Instruction Processing
Datapath Walkthroughs #2 - sw
sw $r3, 17($r1)
Stage 1: fetch this instruction, inc. PC
Stage 2: decode to find it’s a sw, then read registers $r1 and $r3
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3
Stage 5: go idle (nothing to write into a register)
Chapter 4: ISA 50
4.9 Instruction Processing
Datapath Walkthroughs #2
sw $r3, 17($r1)
[Figure: the sw instruction traced through the datapath: the register file reads registers 1 and 3, the ALU computes reg[1]+17 from reg[1] and the immediate 17, and data memory performs MEM[r1+17] <= r3 using reg[3]]
Chapter 4: ISA 51
4.9 Instruction Processing
Datapath Walkthroughs #3 - lw
lw $r3, 17($r1)
Stage 1: fetch this instruction, inc. PC
Stage 2: decode to find it’s a lw, then read register $r1
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: read value from memory address computed in Stage 3
Stage 5: write value found in Stage 4 into register $r3
NOTE: This is the one instruction that requires all 5 stages
Chapter 4: ISA 52
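4.9 Instruction Processing
Datapath Walkthroughs #3
lw $r3, 17($r1)
[Figure: the lw instruction traced through the datapath: the register file reads register 1, the ALU computes reg[1]+17 from reg[1] and the immediate 17, data memory supplies MEM[r1+17], and the value is written to register 3]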
Chapter 4: ISA 53
Datapath Summary
° The datapath is based on the data transfers required to perform instructions
° A controller causes the right transfers to happen
[Figure: the five-stage datapath with a Controller added; the Controller takes the opcode and funct fields of the instruction as inputs]
Chapter 4: ISA 54
4.13 Decoding & Control
1. An overview of how control works from instructions to electronics
– How are instructions laid out so they can be simply decoded?
– Going from PIC to MIPS
2. Control operations
– Samples of load, store and branch
3. The fetch unit in detail using “add” as example.
4. Operation of controls using or immediate, store, branch
Chapter 4: ISA 55
4.13 Decoding
Mapping Code onto the DataPath: How does it all work?
We Start With Code
##############################################################
# This is code that mimics the following C program.
# main( )
# {
#     printf( "Hello World\n" );
# }
##############################################################
        .text
        .globl main
main:
        la      $a0, hello          # address of the string to print
        ori     $v0, $0, 4          # li $v0, 4
        add     $t0, $t1, $t2
        syscall
        jr      $ra
        .data
hello:
        .asciiz "Hello World\n"
Chapter 4: ISA 56
From that code we get a 32-bit Equivalent Binary Representation.
Address        Hex Op-code   Mnemonic
[0x00400020]   0x3c021001    lui $2, 4097      ; 12: la $a0, hello
[0x00400024]   0x34440000    ori $4, $2, 0     ;
[0x00400028]   0x34020004    ori $2, $0, 4     ; 14: ori $v0, $0, 4
[0x0040002c]   0x012a4020    add $8, $9, $10   ; 14: add $t0,$t1,$t2
[0x00400030]   0x0000000c    syscall           ; 15: syscall
[0x00400034]   0x03e00008    jr $31            ; 16: jr $ra
Breaking down the add instruction, 0x012a4020:
Hex:       0    1    2    a    4    0    2    0
Binary:    0000 0001 0010 1010 0100 0000 0010 0000
Fields:    op=000000  rs=01001  rt=01010  rd=01000  shamt=00000  funct=100000
Decimal:   op=0       rs=9      rt=10     rd=8      shamt=0      funct=32
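Going the other way, the six field values can be packed back into the 32-bit word. The Python sketch below (the function name is made up for illustration) rebuilds the add and jr encodings from the listing above.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


word = encode_r(op=0, rs=9, rt=10, rd=8, shamt=0, funct=32)   # add $8, $9, $10
print(f"0x{word:08x}")   # 0x012a4020

word = encode_r(op=0, rs=31, rt=0, rd=0, shamt=0, funct=8)    # jr $31
print(f"0x{word:08x}")   # 0x03e00008
```

This is the job the assembler does for every line of the program.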
4.13 Decoding
Chapter 4: ISA 57
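4.13 Decoding
What the hardware looks like.
[Figure: Registers R0 through R31 feed two "MUX To ALU" blocks, each with a 5-wire Select, which drive ALU inputs A and B; the ALU takes a 6-wire Op input and produces Out plus the ovf and c outputs; a "MUX From ALU" block routes Out back to the registers]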
Chapter 4: ISA 58
4.13 Decoding
What the hardware looks like.
[Figure: the same arrangement: Registers R0 through R31, two "MUX To ALU" blocks, the ALU with inputs A and B, Op and Out, and the "MUX From ALU" block]
Chapter 4: ISA 59
4.13 Decoding
The hardware reads each of those fields.
000000  01001  01010  01000  00000  100000
0       9      10     8      0      32
[Figure: the instruction fields above routed to the register multiplexers ("MUX To ALU", "MUX From ALU") and to the ALU, which has inputs A and B and output Out]
Chapter 4: ISA 60
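4.13 Decoding & Control
An Overview of the Implementation
[Figure: overview of the implementation. The PC supplies the Instruction Address to an Ideal Instruction Memory, which outputs the Instruction; its Rd, Rs and Rt fields (5 bits each) select Rw, Ra and Rb in the register file of 32 32-bit registers; 32-bit buses A and B feed the ALU; an Ideal Data Memory has Data Address, Data In and Data Out; Next Address logic updates the PC. Control sends Control Signals to the Datapath and receives Conditions back. The PC, register file and data memory are clocked by Clk.]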
Chapter 4: ISA 62
4.13 Decoding & Control
Control Samples
Overview of the Instruction Fetch Unit
• The common operations
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC ← PC + 4
• Branch and Jump: PC ← “something else”
[Figure: instruction fetch unit: the PC (clocked by Clk) supplies the Address to Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic updates the PC]
Chapter 4: ISA 63
4.13 Decoding & Control
Control Samples
Add & Subtract
• R[rd] ← R[rs] op R[rt]; Example: addu rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction
R-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
[Figure: register file (32 32-bit registers; Rw, Ra, Rb driven by Rd, Rs, Rt; RegWr; Clk) sends busA and busB to the ALU (ALUctr), whose 32-bit Result returns on busW]
Chapter 4: ISA 64
4.13 Decoding & Control
Control Samples
Logical Operations With Immediate
• R[rt] ← R[rs] op ZeroExt[imm16]
I-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
ZeroExt[imm16]: bits 31-16 are sixteen zeros; bits 15-0 are the 16-bit immediate.
[Figure: the add/subtract datapath extended with two multiplexers: one controlled by RegDst selects Rt or Rd as the write register Rw (the destination is now Rt, not Rd), and one controlled by ALUSrc selects busB or ZeroExt(imm16) as the second ALU input]
Chapter 4: ISA 65
4.13 Decoding & Control
Control Samples
Load Operations
• R[rt] ← Mem[R[rs] + SignExt[imm16]]; Example: lw rt, rs, imm16
I-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
[Figure: datapath for load: the Extender (ExtOp) and the ALUSrc mux feed the ALU (ALUctr), whose output is the address (Adr) for Data Memory (Data In, WrEn driven by MemWr, Clk); a mux controlled by W_Src selects what is returned on busW; the RegDst mux selects Rt or Rd for Rw]
Chapter 4: ISA 66
4.13 Decoding & Control
Control Samples
Store Operations
• Mem[R[rs] + SignExt[imm16]] ← R[rt]; Example: sw rt, rs, imm16
I-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
[Figure: datapath for store: register file, RegDst mux, Extender (ExtOp), ALUSrc mux, ALU (ALUctr), Data Memory (Adr, Data In, WrEn driven by MemWr, Clk) and the W_Src mux]
Chapter 4: ISA 67
4.13 Decoding & Control
Control Samples
The Branch Instruction
• beq rs, rt, imm16
– mem[PC]: Fetch the instruction from memory
– Equal ← R[rs] == R[rt]: Calculate the branch condition
– if (Equal): Calculate the next instruction’s address
• PC ← PC + 4 + ( SignExt(imm16) × 4 )
– else
• PC ← PC + 4
I-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
Chapter 4: ISA 68
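4.13 Decoding & Control
Control Samples
Datapath for Branch Operations
• beq rs, rt, imm16: Datapath generates condition (equal)
I-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
[Figure: branch datapath: register file outputs busA and busB feed an "Equal?" comparator that produces Cond; on the fetch side the PC (Clk) drives the Inst Address, one adder computes PC + 4, a PC Ext block extends imm16 into a second adder, and a mux controlled by nPC_sel chooses the next PC]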
Chapter 4: ISA 69
4.13 Decoding & Control
Summary: A Single Cycle Datapath
• Rs, Rt, Rd and Imm16 are hardwired into the datapath from the Fetch Unit
• We have everything except control signals (the underlined pieces)
[Figure: the complete single cycle datapath. The fetch unit (PC, two adders, PC Ext, mux controlled by nPC_sel, Inst Memory) produces the 32-bit Instruction, from which Rs, Rt, Rd and Imm16 are taken. These drive the register file (RegDst mux, RegWr, busA, busB, busW), the Extender (ExtOp), the ALUSrc mux, the ALU (ALUctr, Equal), Data Memory (MemWr) and the MemtoReg mux.]
Chapter 4: ISA 70
4.13 Decoding & Control
Summary: Meaning of the Control Signals
• ExtOp: “zero”, “sign”
• ALUsrc: 0 ⇒ regB; 1 ⇒ immed
• ALUctr: “add”, “sub”, “or”
° MemWr: 1 ⇒ write memory
° MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem
° RegDst: 0 ⇒ “rt”; 1 ⇒ “rd”
° RegWr: 1 ⇒ write register
[Figure: the single cycle datapath with these control signals marked]
Chapter 4: ISA 72
4.13 Decoding & Control
The add Instruction
• add rd, rs, rt
– mem[PC]: Fetch the instruction from memory
– R[rd] ← R[rs] + R[rt]: The actual operation
– PC ← PC + 4: Calculate next instruction address
R-type format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
Chapter 4: ISA 73
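4.13 Decoding & Control
The Fetch Unit
• nPC_sel:
– 0 ⇒ PC ← PC + 4
– 1 ⇒ PC ← PC + 4 + SignExt(Im16) || 00
[Figure: fetch unit: the PC (Clk) addresses Inst Memory (Adr); one adder computes PC + 4; PC Ext extends imm16, with 00 appended, into a second adder; a mux controlled by nPC_sel chooses the next PC]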
Chapter 4: ISA 74
4.13 Decoding & Control
Fetch Unit at Beginning (and end) of add
• Fetch the instruction from Instruction memory: Instruction ← mem[PC] (This is the same for all instructions)
• But wait until we get to branch instructions!!
[Figure: fetch unit: the PC (Clk) addresses Inst Memory (Adr), which outputs the 32-bit Instruction; two adders, the constant 4, imm16 with 00 appended, and a mux controlled by nPC_sel produce the next PC]
Chapter 4: ISA 76
4.13 Decoding & Control
The Single Cycle Datapath during Or Immediate
R[rt] ← R[rs] or ZeroExt(Imm16)
I-type format: op | rs | rt | immediate
Control signals to fill in: nPC_sel = ?, RegDst = ?, RegWr = ?, ExtOp = ?, ALUSrc = ?, ALUctr = ?, MemWr = ?, MemtoReg = ?
[Figure: the single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory and the RegDst, ALUSrc and MemtoReg muxes) with the control signal values left blank]
Chapter 4: ISA 77
4.13 Decoding & Control
The Single Cycle Datapath during Or Immediate
R[rt] ← R[rs] or ZeroExt(Imm16)
I-type format: op | rs | rt | immediate
Control signals: nPC_sel = +4, RegDst = 0, RegWr = 1, ExtOp = 0, ALUSrc = 1, ALUctr = Or, MemWr = 0, MemtoReg = 0
[Figure: the single cycle datapath with the control signal values above filled in]
Chapter 4: ISA 78
4.13 Decoding & Control
The Single Cycle Datapath during Store
Data Memory {R[rs] + SignExt[imm16]} ← R[rt]
I-type format: op | rs | rt | immediate
Control signals to fill in: nPC_sel = ?, RegDst = ?, RegWr = ?, ExtOp = ?, ALUSrc = ?, ALUctr = ?, MemWr = ?, MemtoReg = ?
[Figure: the single cycle datapath with the control signal values left blank]
Chapter 4: ISA 79
4.13 Decoding & Control
The Single Cycle Datapath during Store
Data Memory {R[rs] + SignExt[imm16]} ← R[rt]
I-type format: op | rs | rt | immediate
Control signals: nPC_sel = +4, RegDst = x, RegWr = 0, ExtOp = 1, ALUSrc = 1, ALUctr = Add, MemWr = 1, MemtoReg = x
[Figure: the single cycle datapath with the control signal values above filled in]
The Single Cycle Datapath during Branch
if (R[rs] – R[rt] == 0) then Zero ← 1; else Zero ← 0
Instruction format: op <31:26>, rs <25:21>, rt <20:16>, immediate <15:0>
Control signals: nPC_sel = “Br”, RegDst = x, RegWr = 0, ExtOp = x, ALUSrc = 0, ALUctr = Sub, MemWr = 0, MemtoReg = x
[Figure: single-cycle datapath (Instruction Fetch Unit, 32 × 32-bit register file, Extender, ALU, Data Memory, and multiplexers) annotated with the control settings listed above.]
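The comparison step of the branch can be sketched as a 32-bit subtract that also reports the Zero flag. The function name alu_sub is hypothetical; it stands in for the ALU with ALUctr = Sub and ALUSrc = 0, so both operands come from the register file.

```python
def alu_sub(bus_a, bus_b):
    """Return (result, zero) for a 32-bit subtract, as with ALUctr = Sub."""
    result = (bus_a - bus_b) & 0xFFFFFFFF
    zero = 1 if result == 0 else 0
    return result, zero


regs = [0] * 32
regs[1], regs[2], regs[3] = 7, 7, 9

_, zero_equal = alu_sub(regs[1], regs[2])   # R[rs] == R[rt]
_, zero_differ = alu_sub(regs[1], regs[3])  # R[rs] != R[rt]
print(zero_equal, zero_differ)  # 1 0
```

Only the Zero output matters for the branch; the subtraction result itself is discarded, since RegWr = 0 and MemWr = 0.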
4.13 Decoding & Control
The instruction fetch unit at the end of branch
if (Zero == 1) then PC = PC + 4 + SignExt(imm16) × 4; else PC = PC + 4
Instruction format: op <31:26>, rs <25:21>, rt <20:16>, immediate <15:0>
° What is the encoding of nPC_sel?
• Direct MUX select?
• Branch / not branch
° Let’s choose the second option

nPC_sel   Zero?   MUX
0         x       0
1         0       0
1         1       1

[Figure: instruction fetch unit. The PC addresses the instruction memory; two adders compute PC + 4 and the branch target from imm16; a mux controlled by nPC_sel and Zero selects the next PC.]
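The next-PC selection can be sketched as follows, assuming the "branch / not branch" encoding in which the mux select is simply nPC_sel AND Zero. The function names are hypothetical, and the sketch models only the arithmetic, not the timing of the fetch unit.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, imm16, npc_sel, zero):
    """Compute the next PC from the current PC and the branch controls."""
    mux_select = npc_sel & zero            # (0,x)->0, (1,0)->0, (1,1)->1
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if mux_select:
        # Branch taken: PC + 4 + SignExt(imm16) * 4
        return (pc_plus_4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return pc_plus_4                       # Fall through to the next instruction


print(hex(next_pc(0x1000, 3, npc_sel=1, zero=1)))  # 0x1010
print(hex(next_pc(0x1000, 3, npc_sel=1, zero=0)))  # 0x1004
print(hex(next_pc(0x1000, 3, npc_sel=0, zero=1)))  # 0x1004
```

Multiplying the offset by 4 reflects that instructions are word-aligned, so the immediate counts instructions rather than bytes.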
4.13 Decoding & Control
Summary: A Single Cycle Processor
[Figure: the complete single-cycle datapath (Instruction Fetch Unit, 32 × 32-bit register file, Extender, ALU, Data Memory, and multiplexers) together with its control logic.]
• Main Control takes the 6-bit op field, Instr<31:26>, and generates nPC_sel, RegDst, ALUSrc, the remaining datapath control signals, and the 3-bit ALUop.
• ALU Control takes ALUop and the 6-bit func field, Instr<5:0>, and generates the 3-bit ALUctr.
• Instr<15:0> supplies imm16 to the Extender and the Instruction Fetch Unit.
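One way to picture the main control is as a lookup from the opcode field to a set of control values. The Python sketch below is a simplification: it covers only or immediate, store, and branch, with the control settings given for them in this section, and it folds the separate ALU Control stage into the table by listing ALUctr directly. The table layout and names are hypothetical; the opcodes are the standard MIPS encodings for ori, sw, and beq.

```python
X = None  # don't-care, written "x" on the slides

CONTROL_TABLE = {
    0x0D: {"name": "ori", "nPC_sel": "+4", "RegDst": 0, "RegWr": 1, "ExtOp": 0,
           "ALUSrc": 1, "ALUctr": "Or", "MemWr": 0, "MemtoReg": 0},
    0x2B: {"name": "sw", "nPC_sel": "+4", "RegDst": X, "RegWr": 0, "ExtOp": 1,
           "ALUSrc": 1, "ALUctr": "Add", "MemWr": 1, "MemtoReg": X},
    0x04: {"name": "beq", "nPC_sel": "Br", "RegDst": X, "RegWr": 0, "ExtOp": X,
           "ALUSrc": 0, "ALUctr": "Sub", "MemWr": 0, "MemtoReg": X},
}


def main_control(instr):
    """Look up the control signals for a 32-bit instruction word."""
    op = (instr >> 26) & 0x3F   # Instr<31:26>
    return CONTROL_TABLE[op]    # raises KeyError for opcodes not modeled here


store_word = 0x2B << 26         # only the opcode bits matter for the lookup
signals = main_control(store_word)
print(signals["name"], signals["MemWr"], signals["RegWr"])  # sw 1 0
```

In hardware the same mapping is combinational logic, so the signals are available as soon as the instruction has been fetched.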
4.13 Decoding & Control
Drawback of this Single Cycle Processor
• Long cycle time:
– Cycle time must be long enough for the load instruction:
  PC’s Clock-to-Q +
  Instruction Memory Access Time +
  Register File Access Time +
  ALU Delay (address calculation) +
  Data Memory Access Time +
  Register File Setup Time +
  Clock Skew
• Cycle time for load is much longer than needed for all other instructions
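To make the sum concrete, the sketch below adds up the load path using made-up delays in picoseconds. Every number is a hypothetical placeholder chosen for illustration, not a measurement of any real processor, and the comparison path (an instruction that skips data memory) is likewise an assumption.

```python
# Hypothetical component delays in picoseconds (illustrative values only).
LOAD_PATH_PS = [
    ("PC clock-to-Q", 50),
    ("instruction memory access", 2000),
    ("register file access", 1000),
    ("ALU delay (address calculation)", 1500),
    ("data memory access", 2000),
    ("register file setup", 50),
    ("clock skew", 100),
]


def cycle_time_ps(path):
    """Sum the delays along a path through the datapath."""
    return sum(delay for _, delay in path)


load_cycle = cycle_time_ps(LOAD_PATH_PS)

# Assumption: an instruction that never touches data memory follows the
# same path minus the data memory access.
no_memory_path = [(n, d) for n, d in LOAD_PATH_PS if n != "data memory access"]
short_cycle = cycle_time_ps(no_memory_path)

print(load_cycle, short_cycle)  # 6700 4700
```

With these placeholder values, every instruction would pay the 6700 ps cycle even when 4700 ps would have been enough for it.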
4.14 Real World Architectures
• We will look at an Intel architecture, which is a CISC machine, and MIPS, which is a RISC machine.
– CISC is an acronym for complex instruction set computer.
– RISC stands for reduced instruction set computer.
4.14 Real World Architectures
• The classic Intel architecture, the 8086, was introduced in 1978. It is a CISC architecture.
• IBM adopted the architecture for its famed PC, which was released in 1981 and used the 8088 variant.
• The 8086 operated on 16-bit data words and supported 20-bit memory addresses.
• To lower costs, Intel introduced the 8088, which kept the 16-bit internal design but used an 8-bit external data bus. Like the 8086, it used 20-bit memory addresses.
What was the largest memory that the 8086 could address?
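The question can be answered with a one-line calculation, assuming each address names one byte (the 8086 is byte-addressable). The variable names are only illustrative.

```python
ADDRESS_BITS = 20                      # width of an 8086 memory address

addressable_bytes = 2 ** ADDRESS_BITS  # one byte per distinct address
print(addressable_bytes)               # 1048576
print(addressable_bytes // 2 ** 20)    # 1, that is, 1 MB
```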
4.14 Real World Architectures
• The 8086 had four 16-bit general-purpose registers that could be accessed by the half-word, that is, as two 8-bit halves.
• It also had a flags register, an instruction register, and a stack accessed through the values in two other registers, the base pointer and the stack pointer.
• The 8086 had no built-in floating-point processing.
• In 1980, Intel released the 8087 numeric coprocessor, but few users elected to install it because of its cost.
4.14 Real World Architectures
• In 1985, Intel introduced the 32-bit 80386.
• It also had no built-in floating-point unit.
• The 80486, introduced in 1989, was an 80386 that had built-in floating-point processing and cache memory.
• The 80386 and 80486 offered backward compatibility with the 8086 and 8088.
• Software written for the smaller-word systems was directed to use the lower 16 bits of the 32-bit registers.
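The register overlap can be illustrated with bit masks. In the sketch below, the names eax, ax, ah, and al follow the usual x86 register naming, while the helper functions and the sample value are hypothetical.

```python
def low16(reg32):
    """Return the lower 16 bits of a 32-bit register value."""
    return reg32 & 0xFFFF


def split_halves(reg16):
    """Return the (high, low) 8-bit halves of a 16-bit register value."""
    return (reg16 >> 8) & 0xFF, reg16 & 0xFF


eax = 0x12345678            # a 32-bit register value on the 80386
ax = low16(eax)             # what 16-bit software sees
ah, al = split_halves(ax)   # the two 8-bit halves of the 16-bit register
print(hex(ax), hex(ah), hex(al))  # 0x5678 0x56 0x78
```

The upper 16 bits of the wider register are simply ignored by the older code.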
4.14 Real World Architectures
• Currently, Intel’s most advanced 32-bit microprocessor is the Pentium 4.
• It can run as fast as 3.8 GHz. This clock rate is nearly 800 times the 4.77 MHz clock of the original 8088-based IBM PC.
• Speed enhancing features include multilevel cache and instruction pipelining.
• Intel, along with many others, is marrying many of the ideas of RISC architectures with microprocessors that are largely CISC.
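The "nearly 800 times" figure is a plain ratio of the two clock rates quoted above, as this short check shows. The variable names are only illustrative, and clock rate alone does not measure overall performance.

```python
pentium4_hz = 3.8e9    # 3.8 GHz
early_pc_hz = 4.77e6   # 4.77 MHz

ratio = pentium4_hz / early_pc_hz
print(round(ratio))    # 797
```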
4.14 Real World Architectures
• The MIPS family of CPUs has been one of the most successful in its class.
• In 1986 the first MIPS CPU was announced.
• It had a 32-bit word size and could address 4 GB of memory.
• Over the years, MIPS processors have been used in general-purpose computers as well as in game consoles.
• The MIPS architecture now offers 32- and 64-bit versions.
4.14 Real World Architectures
• MIPS was one of the first RISC microprocessors.
• The original MIPS architecture had only 55 different instructions, compared with the 8086, which had over 100.
• MIPS was designed with performance in mind: It is a load/store architecture, meaning that only the load and store instructions can access memory.
• The large number of registers in the MIPS architecture keeps bus traffic to a minimum.
How does this design affect performance?
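One way to explore the question is to count memory accesses in a toy load/store machine. The sketch below is not MIPS; it is a hypothetical three-instruction interpreter in which only lw and sw touch memory, so work done on values already in registers costs no bus traffic.

```python
def run(program, regs, memory):
    """Execute a toy load/store program and return the memory access count."""
    accesses = 0
    for op, *args in program:
        if op == "lw":                 # lw rt, address: memory -> register
            rt, address = args
            regs[rt] = memory[address]
            accesses += 1
        elif op == "sw":               # sw rt, address: register -> memory
            rt, address = args
            memory[address] = regs[rt]
            accesses += 1
        elif op == "add":              # add rd, rs, rt: registers only
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return accesses


regs = [0] * 32
memory = {0: 5, 4: 7, 8: 0}
program = [
    ("lw", 1, 0),      # R1 = 5
    ("lw", 2, 4),      # R2 = 7
    ("add", 3, 1, 2),  # R3 = 12
    ("add", 3, 3, 1),  # R3 = 17, reusing R1 without another load
    ("sw", 3, 8),      # memory[8] = 17
]
print(run(program, regs, memory), memory[8])  # 3 17
```

Five instructions execute, but only three of them access memory; the more values that stay in registers, the smaller that share becomes.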
Chapter 4 Conclusion
• The major components of a computer system are its control unit, registers, memory, ALU, and data path.
• A built-in clock keeps everything synchronized.
• Control units can be microprogrammed or hardwired.
• Hardwired control units give better performance, while microprogrammed units are more adaptable to changes.
• Computers run programs through iterative fetch-decode-execute cycles.
• Computers can run programs that are in machine language.
• The Intel architecture is an example of a CISC architecture; MIPS is an example of a RISC architecture.