
1Chapter 4: ISA

Chapter 4: Instruction Set Architectures

CS140 Computer Organization

These slides are derived from those of Null & Lobur + the work of others.

Considerable material from previous years gleaned from Patterson & Hennessy.

Chapter 4 Objectives & Introduction

Objective:
• Learn the components common to every modern computer system.
• Be able to explain how each component contributes to program execution.
• Understand an ISA and how it relates to a real architecture.
• Know how the program assembly process works.

Introduction:
• Chapter 1 presented a general overview of computer systems.
• Chapter 2 discussed how data is formatted, stored and manipulated.
• Chapter 3 described the fundamentals of digital circuits.
• With this, we can understand how computer components work, and how they fit together to create useful computer systems.

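Computer Components

[Figure: block diagram of the computer's components. Labels: Processor, Chipset, Memory, FSB, PCI Controller, PCI Bus, Video Controller, Clock. The diagram is annotated with the chapters that cover each part: Chapters 4 & 5, Chapter 6, and Chapter 7.]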

4.2 CPU Basics

• The computer’s CPU fetches, decodes, and executes program instructions.

• The two principal parts of the CPU are the datapath and the control unit.

– The datapath consists of an arithmetic-logic unit and storage units (registers) that are interconnected by a data bus that is also connected to main memory.

– Various CPU components perform sequenced operations according to signals provided by its control unit.

– The control unit determines which actions to carry out according to the values in a program counter register and a status register.

It’s like plumbing: the datapath is the pipes, and the control unit is the faucets and valves.

4.3 The Bus

PARALLEL BUS
• The CPU shares data with other system components by way of a data bus.
– A bus is a set of wires that simultaneously convey a single bit along each line.
• Two types of buses are commonly found in computer systems: point-to-point and multipoint buses.

[Figure: a point-to-point bus configuration.]

SERIAL BUS
• Deferred until later, when we discuss I/O.

• Buses have data lines, control lines, and address lines.

• The data lines convey bits from one device to another.

• Control lines determine the direction of data flow, and when each device can access the bus.

• Address lines determine the location of the source or destination of the data.

[Figure: a multipoint bus.]

• Because a multipoint bus is a shared resource, access to it is controlled through protocols, which are built into the hardware and handled by the control lines.

• In a master-slave configuration, where more than one device can be the bus master, concurrent bus master requests must be arbitrated.

• Four categories of bus arbitration are:
– Daisy chain: Permissions are passed from the highest-priority device to the lowest.
– Centralized parallel: Each device is directly connected to an arbitration circuit.
– Distributed using self-detection: Devices decide among themselves which one gets the bus.
– Distributed using collision-detection: Any device can try to use the bus. If its data collides with the data of another device, it tries again.
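Daisy-chain arbitration can be pictured as a grant signal that travels from the highest-priority device down the chain until some requesting device keeps it. The sketch below is only an illustration of that idea; the function name `daisy_chain_grant` and the list-of-booleans interface are assumptions made for this example, not part of any real bus standard.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that receives the bus grant, or None.

    `requests` holds one boolean per device, ordered from the
    highest-priority device (index 0) to the lowest.
    """
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            # This device keeps the grant instead of passing it down the chain.
            return position
    return None  # nobody asked for the bus


# Devices 1 and 2 both request the bus; device 1 sits earlier in the chain.
winner = daisy_chain_grant([False, True, True])
print("bus granted to device", winner)
```

A known weakness of this scheme shows up directly in the loop: a device late in the chain only wins when every device ahead of it is idle.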

4.4 Clocks

• Every computer contains at least one clock that synchronizes the activities of its components.
• A fixed number of clock cycles is required to carry out each data movement or computational operation.
• The clock frequency, measured in megahertz or gigahertz, determines the speed with which all operations are carried out.
• Clock cycle time is the reciprocal of clock frequency.
• Typical Intel and typical PIC clocks look like this:
– A 2 GHz clock has a cycle time of 0.5 nanoseconds.
– An 8 MHz clock has a cycle time of 0.125 microseconds.
• One master clock is used to derive the multiple frequencies needed by various parts of the system.
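The reciprocal relationship between frequency and cycle time is easy to check numerically. This is a minimal sketch; the helper name `cycle_time_seconds` is an assumption for illustration.

```python
def cycle_time_seconds(frequency_hz):
    """Clock cycle time is the reciprocal of the clock frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# The two clocks mentioned on the slide.
for label, hz in [("2 GHz", 2e9), ("8 MHz", 8e6)]:
    nanoseconds = cycle_time_seconds(hz) * 1e9
    print(f"{label}: {nanoseconds:.3f} ns per cycle")
```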

• Clock speed should not be confused with CPU performance.

• The CPU time required to run a program is given by the general performance equation:

$$\text{CPU time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{average cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$

• We can improve CPU performance when we
– reduce the number of instructions in a program,
– reduce the number of cycles per instruction, or
– reduce the number of nanoseconds per clock cycle.
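The three ways of improving performance each correspond to one factor of the performance equation. The sketch below multiplies the factors for a made-up program; the instruction count, CPI values, and clock rates are hypothetical numbers chosen only to show the effect.

```python
def cpu_time_seconds(instruction_count, avg_cycles_per_instruction, clock_hz):
    """CPU time = instructions x (cycles / instruction) x (seconds / cycle)."""
    seconds_per_cycle = 1.0 / clock_hz
    return instruction_count * avg_cycles_per_instruction * seconds_per_cycle


# Hypothetical program: one million instructions at 2 cycles each on a 2 GHz clock.
baseline = cpu_time_seconds(1_000_000, 2.0, 2e9)
fewer_cycles = cpu_time_seconds(1_000_000, 1.0, 2e9)   # halve the cycles per instruction
faster_clock = cpu_time_seconds(1_000_000, 2.0, 4e9)   # halve the clock cycle time
print(baseline, fewer_cycles, faster_clock)
```

In this example halving the CPI and doubling the clock rate give the same CPU time, which is why clock speed alone says little about performance.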

Digression: Sequential Logic, Clocking

• Combinational circuits: no memory
– Output depends only on the inputs
• Sequential circuits: have memory
– How to ensure memory element is updated neither too soon, nor too late?
• Recall hardware circuits
– Flip/flop register is the writable memory element
– Gate propagation delay means result takes time to stabilize; delay varies with inputs
– Must wait until the result is stable before we can write that output to the register feeding the next stage; otherwise, garbage results.
• How to be certain ALU output is stable?
– Solution: let the inputs chatter and stabilize, THEN apply the clock.

• Clock: free-running signal with fixed cycle time (clock period)

° Clock determines when to write memory element

• level-triggered: store while the clock is high (or low)

• edge-triggered: store only on a clock edge

° We will use negative (falling) edge-triggered methodology

[Figure: clock waveform marking one period, the rising edge, the falling edge, and the high (1) and low (0) levels.]

Role of Clock in Processors

• single-cycle machine: does everything in one clock cycle
• instruction execution = up to 5 steps
• must complete 5th step before cycle ends

[Figure: timing diagram of one clock cycle. Labels: clock signal; rising clock edge; falling clock edge; instruction execution (step 1 / step 2 / step 3 / step 4 / step 5); datapath stable; register(s) written.]


4.5 The Input/Output Subsystem

• A computer communicates with the outside world through its input/output (I/O) subsystem.

• I/O devices connect to the CPU through various interfaces.

• I/O can be memory-mapped, where the I/O device behaves like main memory from the CPU’s point of view.

• Or I/O can be instruction-based, where the CPU has a specialized I/O instruction set.

4.6 Memory Organization

• Computer memory is a linear array of addressable storage cells that are similar to registers.
• Memory can be byte-addressable (most common), or word-addressable, where a word consists of two or more bytes.
• Memory is constructed of RAM chips, often referred to in terms of length × width.
• If the addressable unit of the machine is 8 bits, then a 4M × 8 RAM chip gives us 4M (2^22) 8-bit memory locations, that is, 4 megabytes.

• How does the computer access a memory location that corresponds to a particular address?

• We see that 4 megabytes = 2^22 bytes.

• The memory locations for this memory are numbered 0 through 2^22 - 1.

• Thus, the memory bus of this system requires at least 22 address lines.
– The address lines “count” from 0 to 2^22 - 1 in binary. Each line is either “on” or “off”, and together the lines indicate the location of the desired memory element.

Power   2 ^ Power
0       1
1       2
2       4
3       8
4       16
5       32
6       64
7       128
8       256
9       512
10      1,024
11      2,048
12      4,096
13      8,192
14      16,384
15      32,768
16      65,536
17      131,072
18      262,144
19      524,288
20      1,048,576
21      2,097,152
22      4,194,304
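The number of address lines can be computed rather than looked up in the table. The sketch below is one way to do it; the function name `address_lines_needed` is an assumption for illustration.

```python
def address_lines_needed(num_locations):
    """Smallest number of address lines that can name `num_locations` cells."""
    if num_locations < 1:
        raise ValueError("need at least one memory location")
    # Addresses run from 0 to num_locations - 1; count the bits of the largest one.
    return (num_locations - 1).bit_length()


four_meg = 4 * 2**20          # a 4M x 8 chip has 4M addressable locations
print(address_lines_needed(four_meg))
```

The same function applies to any memory size, for example a 20-line address bus for 2^20 locations.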

• Physical memory usually consists of more than one RAM chip.

• Access is more efficient when memory is organized into banks of chips with the addresses interleaved across the chips.

• With low-order interleaving, the low-order bits of the address specify which memory bank contains the address of interest.

[Figure: Low-Order Interleaving (byte addresses).]
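Low-order interleaving amounts to splitting each address into a bank number (the low bits) and an offset within that bank (the remaining bits). A minimal sketch follows; the function name `locate` and the choice of 8 banks are assumptions for illustration.

```python
def locate(address, num_banks):
    """Split an address into (bank, offset) under low-order interleaving.

    `num_banks` is assumed to be a power of two, so the modulo below
    is the same as taking the low-order address bits.
    """
    bank = address % num_banks      # low-order bits choose the bank
    offset = address // num_banks   # remaining bits choose the cell in that bank
    return bank, offset


# With 8 banks, consecutive addresses land in consecutive banks.
for address in range(10):
    print(address, locate(address, 8))
```

Because neighboring addresses fall in different banks, a run of sequential accesses can keep several banks busy at once.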

4.7 Interrupts

• The normal execution of a program is altered when an event of higher priority occurs. The CPU is alerted to such an event through an interrupt.
• Interrupts can be triggered by I/O requests, arithmetic errors (such as division by zero), or when an invalid instruction is encountered.
• For general-purpose systems, it is common to disable all interrupts during the time in which an interrupt is being processed.
– Typically, this is achieved by setting a bit in the flags register.
– Interrupts that are ignored in this case are called maskable.
• Nonmaskable interrupts are high-priority interrupts that cannot be ignored. (“The CPU is on fire” is nonmaskable.)
• Each interrupt is associated with a procedure that directs the actions of the CPU when an interrupt occurs.
• In Chapter 7 we’ll look at interrupts in more detail.

• Interrupt processing involves adding another step to the fetch-decode-execute cycle.

[Figure: the fetch-decode-execute cycle with the added interrupt step.]

The next slide shows a flowchart of “Process the interrupt.”

[Figure: flowchart of “Process the interrupt.”]

4.9 Instruction Processing

1. Detour looking at Instruction Sets
– How are instructions laid out so they can be simply decoded?
– Going from PIC to MIPS

2. Detour looking at Hardware
– Multiplexers, registers, and ALUs

3. The datapath
– Looking at each stage
– The datapath and some sample instructions.

Detour looking at Instruction Sets

These are the 35 instructions available on the PIC processors we’ve used.

There are four types of instructions, and the formats for these instructions are defined here.

[Figure: PIC instruction formats, with the opcode bits that distinguish the four types marked.]

• Byte-oriented instructions have a 00 here.
• Bit-oriented instructions have a 01 here.
• Goto and Call have a 10.
• Literal instructions have a 11.

PIC is a RISC (Reduced Instruction Set Computer.) A relatively simple set of rules can decipher these instructions.

Conversely, Intel is a CISC (Complex Instruction Set Computer.) The wiring needed to decipher the thousands of Intel instructions is overwhelming.

The MIPS Instruction Formats

• All MIPS instructions are 32 bits long. The three instruction formats:

– R-type
  bits:   31-26   25-21   20-16   15-11   10-6    5-0
  field:  op      rs      rt      rd      shamt   funct
  width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

– I-type
  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  width:  6 bits  5 bits  5 bits  16 bits

– J-type
  bits:   31-26   25-0
  field:  op      target address
  width:  6 bits  26 bits

• The different fields are:
– op: operation of the instruction
– rs, rt, rd: the source and destination register specifiers
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction

Hey, these are the same flavors as the PIC instructions.

MIPS is a RISC. It has about 75 instructions.

The MIPS-lite Subset for Today

• ADD and SUB (R-type: op, rs, rt, rd, shamt, funct)
– addu rd, rs, rt
– subu rd, rs, rt

• OR Immediate (I-type: op, rs, rt, immediate)
– ori rt, rs, imm16

• LOAD / STORE Word (I-type: op, rs, rt, immediate)
– lw rt, rs, imm16
– sw rt, rs, imm16

• BRANCH (I-type: op, rs, rt, immediate)
– beq rs, rt, imm16

MIPS-lite has 6 instructions.

The MIPS-lite Subset for Today

Things to Note:
1. Since each instruction is 32 bits (4 bytes), instruction addresses are multiples of 4.
2. These instructions access up to 3 registers.
3. The ori can use a 16-bit immediate.
4. Each register needs 5 bits to specify it. How many registers does this machine have?


Basic Building Blocks (Detour looking at Hardware)

D-Latches

• D-latch based on SR-latch with NAND gates and control input C

° C = 0, no change of state:
• Q(t + Δt) = Q(t)

° C = 1, change is allowed:
• Q(t + Δt) = D(t)

• No indeterminate output state
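The two D-latch rules (hold when C = 0, follow D when C = 1) can be modeled at the behavioral level in a few lines. This is a sketch of the behavior only, not of the NAND-gate circuit; the class name `DLatch` and its `update` method are assumptions for illustration.

```python
class DLatch:
    """Behavioral model of a D-latch with control input C."""

    def __init__(self, q=0):
        self.q = q  # stored state Q

    def update(self, c, d):
        """Apply inputs C and D and return the resulting Q."""
        if c == 1:
            self.q = d   # C = 1: change is allowed, Q follows D
        # C = 0: no change of state, Q keeps its old value
        return self.q


latch = DLatch()
print(latch.update(c=1, d=1))  # latch is transparent, stores 1
print(latch.update(c=0, d=0))  # latch is closed, D is ignored
```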

Basic Building Blocks (Detour looking at Hardware)

[Figure: three 32-bit building blocks. Adder: inputs A, B, and CarryIn; outputs Sum and Carry. MUX: inputs A and B, a Select line, output Y. ALU: inputs A and B, an OP control input, output Result.]

Storage Element: Register File

• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW

• Register is selected by:
– RA (number) selects the register to put on busA (data)
– RB (number) selects the register to put on busB (data)
– RW (number) selects the register to be written via busW (data) when Write Enable is 1

• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• RA or RB valid ⇒ busA or busB valid after “access time.”

[Figure: register file of 32 32-bit registers with 5-bit select inputs RW, RA, and RB; 32-bit buses busW, busA, and busB; and Write Enable and Clk inputs.]
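The register file's two behaviors, combinational reads and clocked writes gated by Write Enable, can be sketched as a small class. The class and method names here are assumptions for illustration, and the sketch deliberately ignores access time and the MIPS convention that register 0 always reads as zero.

```python
class RegisterFile:
    """32 registers of 32 bits: two read ports (busA, busB), one write port (busW)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: RA and RB select what appears on busA and busB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On the clock edge, write busW into register RW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF  # keep the value to 32 bits


rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=1)
rf.clock_edge(rw=3, bus_w=7, write_enable=0)   # ignored: Write Enable is 0
print(rf.read(3, 0))
```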



Multiplexer

The 5 selector wires choose one of the 32 input voltages and send it to the output.

[Figure: multiplexer with input bits 0 through 31, a selector, and one output.]

Decoder

The 5 selector wires choose which of the 32 outputs will get the input voltage.

[Figure: decoder with one input, a selector, and outputs 0 through 31.]

Multiplexer

Now, each of the 32 inputs has 32 bits. There are 32 x 32 bits in and 1 x 32 bits out.

This multiplexer is equivalent to 32 of those on the previous page.

Each of these input words COULD be a register!

[Figure: side view and end view of a word-wide multiplexer with input words 0 through 31, each with bits 0 through 31, a selector, and one output word.]

Multiplexer

[Figure: the word-wide multiplexer (input words 0 through 31, selector, output) shown beside the register file (32 32-bit registers; RW, RA, RB; busW, busA, busB; Write Enable; Clk).]

So this register file is just the multiplexer shown here.

Storage Element: Idealized Memory

• Memory (idealized)
– One input bus: Data In
– One output bus: Data Out

• Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus

• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• Address valid ⇒ Data Out valid after “access time.”

[Figure: idealized memory with Address, 32-bit Data In, 32-bit Data Out, Write Enable, and Clk.]


• The fetch-decode-execute-store cycle is the series of steps that a computer carries out when it runs a program.

• We first have to fetch an instruction from memory, and place it into the Instruction Register (IR).

• Once in the IR, it is decoded to determine what needs to be done next.

• If an immediate operand is involved in the operation, it is retrieved and prepared for execution.

• With everything in place, the instruction is executed.

• If a result is to be stored in memory, that’s done next.

• If a result is placed in a register, that’s the last stage.

The next slide shows a flowchart of this process.

Generic Steps: Datapath

[Figure: the five-stage datapath. Components: PC, +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, data memory. Stages: 1. Instruction Fetch, 2. Decode / Register Read, 3. Execute, 4. Memory, 5. Register Write.]

Stages of the Datapath (1/6)

Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient.

Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath.
– Smaller stages are easier to design.
– Easy to optimize (change) one stage without touching the others.


There is a wide variety of MIPS instructions, so what general steps do they have in common?

Stage 1: Instruction Fetch
– No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy).
– Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction; memory is byte-addressed and each instruction is 4 bytes, hence + 4).


Stage 2: Instruction Decode
– Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data).
– First, read the opcode to determine instruction type and field lengths.
– Second, read in data from all necessary registers:
  - for add, read two registers
  - for addi, read one register
  - for jal, no reads necessary


Stage 3: ALU (Arithmetic-Logic Unit)
– The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt).
– What about loads and stores?
  - lw $t0, 40($t1)
  - the address we are accessing in memory = the value in $t1 + the value 40
  - so we do this addition in this stage


Stage 4: Memory Access
– Actually, only the load and store instructions do anything during this stage; the others remain idle.
– Since these instructions have a unique step, we need this extra stage to account for them.
– As a result of the cache system, this stage is expected to be just as fast (on average) as the others.


Stage 5: Register Write
– Most instructions write the result of some computation into a register.
– Examples: arithmetic, logical, shifts, loads, slt
– What about stores, branches, jumps?
  - they don’t write anything into a register at the end
  - these remain idle during this fifth stage

Generic Steps: Datapath

[Figure: the five-stage datapath (PC, +4, instruction memory, registers, imm, ALU, data memory) with its stages: Instruction Fetch, Decode / Register Read, Execute, Memory, Register Write.]

Datapath Walkthrough #1 - add

add $r3, $r1, $r2 # r3 = r1+r2

Stage 1: fetch this instruction, incr. PC;

Stage 2: decode to find it’s an add, read registers $r1 and $r2;

Stage 3: add the two values retrieved in Stage 2;

Stage 4: idle (nothing to write to memory);

Stage 5: write result of Stage 3 into register $r3;


[Figure: datapath walkthrough for add $r3, $r1, $r2 (r3 = r1+r2). Registers 1 and 2 are read as reg[1] and reg[2], the ALU computes reg[1]+reg[2], and the result is written to register 3.]

Datapath Walkthroughs #2 - sw

sw $r3, 17($r1)

Stage 1: fetch this instruction, inc. PC

Stage 2: decode to find it’s a sw, then read registers $r1 and $r3

Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)

Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3

Stage 5: go idle (nothing to write into a register)


[Figure: datapath walkthrough for sw $r3, 17($r1). Registers 1 and 3 are read as reg[1] and reg[3], the ALU computes reg[1]+17, and the store performs MEM[r1+17] <= r3; no register is written.]

Datapath Walkthroughs #3 - lw

NOTE: This is the one instruction that requires all 5 stages.

lw $r3, 17($r1)

Stage 1: fetch this instruction, inc. PC

Stage 2: decode to find it’s a lw, then read register $r1

Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)

Stage 4: read value from memory address computed in Stage 3

Stage 5: write value found in Stage 4 into register $r3
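The three walkthroughs can be replayed with a toy interpreter that records which stages do useful work for each instruction. This is a sketch under simplifying assumptions: instructions are Python tuples rather than encoded words, memory is a dictionary, alignment is ignored (the slides use offset 17), and the names `execute`, `regs`, and `mem` are made up for this example.

```python
def execute(instr, regs, mem):
    """Run one instruction and return the names of the stages that did work."""
    op = instr[0]
    stages = ["fetch", "decode"]             # stages 1 and 2 apply to every instruction
    if op == "add":                           # ("add", rd, rs, rt)
        _, rd, rs, rt = instr
        result = regs[rs] + regs[rt]          # stage 3: ALU adds the two registers
        regs[rd] = result                     # stage 5: register write (stage 4 idle)
        stages += ["alu", "reg_write"]
    elif op == "sw":                          # ("sw", rt, offset, rs)
        _, rt, offset, rs = instr
        address = regs[rs] + offset           # stage 3: ALU computes the address
        mem[address] = regs[rt]               # stage 4: memory write (stage 5 idle)
        stages += ["alu", "memory"]
    elif op == "lw":                          # ("lw", rt, offset, rs)
        _, rt, offset, rs = instr
        address = regs[rs] + offset           # stage 3: ALU computes the address
        value = mem[address]                  # stage 4: memory read
        regs[rt] = value                      # stage 5: register write
        stages += ["alu", "memory", "reg_write"]
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return stages


regs = [0] * 32
regs[1], regs[2] = 100, 5
mem = {}
print(execute(("add", 3, 1, 2), regs, mem))   # add $r3, $r1, $r2
print(execute(("sw", 3, 17, 1), regs, mem))   # sw  $r3, 17($r1)
print(execute(("lw", 3, 17, 1), regs, mem))   # lw  $r3, 17($r1)
```

Only the lw case appends all three of the later stages, matching the note that load is the one instruction that uses all five.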


[Figure: datapath walkthrough for lw $r3, 17($r1). Register 1 is read as reg[1], the ALU computes reg[1]+17, MEM[r1+17] is read, and the value is written to register 3.]

Datapath Summary

° The datapath is based on the data transfers required to perform instructions.

° A controller causes the right transfers to happen.

[Figure: the five-stage datapath (PC, +4, instruction memory, registers, imm, ALU, data memory) with a Controller driven by the opcode and funct fields.]

4.13 Decoding & Control

1. An overview of how control works, from instructions to electronics
– How are instructions laid out so they can be simply decoded?

2. Control operations
– Samples of load, store and branch

3. The fetch unit in detail, using “add” as an example.

4. Operation of controls using or immediate, store, branch

Mapping Code onto the DataPath: How does it all work?

We Start With Code

    ##############################################################
    # This is code that mimics the following C program.
    # main( )
    # {
    #     printf( "Hello World\n" );
    # }
    ##############################################################

            .text
            .globl main
    main:
            la   $a0, hello        # load the address of the string
            ori  $v0, $0, 4        # li $v0, 4
            add  $t0, $t1, $t2
            syscall
            jr   $ra

            .data
    hello:
            .asciiz "Hello World\n"

From that code we get a 32-bit Equivalent Binary Representation.

Address       Hex Op-code  Mnemonic
[0x00400020]  0x3c021001   lui $2, 4097      ; 12: la $a0, hello
[0x00400024]  0x34440000   ori $4, $2, 0     ;
[0x00400028]  0x34020004   ori $2, $0, 4     ; 14: ori $v0, $0, 4
[0x0040002c]  0x012a4020   add $8, $9, $10   ; 14: add $t0,$t1,$t2
[0x00400030]  0x0000000c   syscall           ; 15: syscall
[0x00400034]  0x03e00008   jr $31            ; 16: jr $ra

Decoding the add instruction, 0x012a4020:

hex:      0    1    2    a    4    0    2    0
binary:   0000 0001 0010 1010 0100 0000 0010 0000
fields:   000000 01001 01010 01000 00000 100000
decimal:  0      9     10    8     0     32
          (op, rs, rt, rd, shamt, funct)
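The hand decoding above can be reproduced with shifts and masks over the R-type field boundaries. This is a small sketch; the function name `decode_r_type` is an assumption, and the input word is the add instruction from the listing.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31-26, 6 bits
        "rs":    (word >> 21) & 0x1F,   # bits 25-21, 5 bits
        "rt":    (word >> 16) & 0x1F,   # bits 20-16, 5 bits
        "rd":    (word >> 11) & 0x1F,   # bits 15-11, 5 bits
        "shamt": (word >> 6) & 0x1F,    # bits 10-6,  5 bits
        "funct": word & 0x3F,           # bits 5-0,   6 bits
    }


fields = decode_r_type(0x012A4020)      # add $8, $9, $10
print(fields)
```

The same masks recover rs = 31 and funct = 8 from the jr $31 word in the listing.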

What the hardware looks like.

[Figure: the registers (R0 ... R8 ... R31), two “MUX To ALU” selectors, the ALU (inputs A and B, Op, Out), and a “MUX From ALU”. Labels note 5 select wires for a MUX and 6 wires for Op.]

The hardware reads each of those fields.

[Figure: the same hardware annotated with the fields of the add instruction: 000000 01001 01010 01000 00000 100000 (0, 9, 10, 8, 0, 32).]

An Overview of the Implementation

[Figure: overview of the implementation. Datapath: PC, Next Address logic, ideal instruction memory (Instruction Address in, Instruction out), a register file of 32 32-bit registers (Rw, Ra, Rb fed by the 5-bit Rd, Rs, Rt fields), the ALU (inputs A and B), and ideal data memory (Data Address, Data In, Data Out), clocked by Clk. Control: exchanges Control Signals and Conditions with the datapath.]

Overview of the Instruction Fetch Unit

• The common operations
– Fetch the instruction: mem[PC]
– Update the program counter:
• Sequential code: PC ← PC + 4
• Branch and jump: PC ← “something else”

[Figure: fetch unit. The PC (clocked by Clk) supplies the Address to Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC.]

Add & Subtract

• R[rd] ← R[rs] op R[rt]; Example: addu rd, rs, rt
– Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction

[Figure: R-type format (op, rs, rt, rd, shamt, funct) above the register file and ALU. Rs, Rt, and Rd drive Ra, Rb, and Rw; busA and busB feed the ALU; the 32-bit Result returns on busW; control signals are ALUctr and RegWr.]

Logical Operations With Immediate

• R[rt] ← R[rs] op ZeroExt[imm16]

[Figure: I-type format (op, rs, rt, 16-bit immediate). The immediate is zero-extended to 32 bits: bits 31-16 are sixteen 0s and bits 15-0 hold the immediate. Datapath additions: a RegDst mux that picks Rt or Rd as the write register, a ZeroExt unit (16 to 32 bits), and an ALUSrc mux that picks busB or the extended immediate as the second ALU input.]
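Zero extension (used by ori) and sign extension (used later by lw, sw, and beq) differ only in how the upper 16 bits are filled. The sketch below shows both as 32-bit bit patterns; the function names are assumptions for illustration.

```python
def zero_extend16(imm16):
    """Widen a 16-bit immediate to 32 bits by filling the upper half with 0s."""
    return imm16 & 0xFFFF


def sign_extend16(imm16):
    """Widen a 16-bit immediate to 32 bits by copying bit 15 into the upper half."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


# ori-style use: OR a register value with the zero-extended immediate.
register_value = 0x12340000
print(hex(register_value | zero_extend16(0xFFFF)))
print(hex(sign_extend16(0x8001)), hex(zero_extend16(0x8001)))
```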


Load Operations

• R[rt] ← Mem[R[rs] + SignExt[imm16]]; Example: lw rt, rs, imm16

[Figure: I-type format (op, rs, rt, 16-bit immediate) above the datapath. Additions for load: an Extender controlled by ExtOp, a Data Memory (Adr, Data In, WrEn driven by MemWr, clocked by Clk), and a W_Src mux that selects the ALU result or the memory output for busW.]

Store Operations

• Mem[R[rs] + SignExt[imm16]] ← R[rt]; Example: sw rt, rs, imm16

[Figure: I-type format (op, rs, rt, immediate) above the same datapath. busB is wired to the Data In port of Data Memory, and MemWr enables the write. Control signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, W_Src.]

The Branch Instruction

• beq rs, rt, imm16
– mem[PC]: Fetch the instruction from memory
– Equal ← R[rs] == R[rt]: Calculate the branch condition
– if (Equal): Calculate the next instruction’s address
• PC ← PC + 4 + ( SignExt(imm16) × 4 )
– else
• PC ← PC + 4

(I-type format: op, rs, rt, immediate)

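Datapath for Branch Operations

• beq rs, rt, imm16: the datapath generates the condition (Equal)

[Figure: fetch unit and register file for branches. The register file reads Rs and Rt onto busA and busB, and a comparator produces Equal (Cond). In the fetch unit, one adder computes PC + 4, a second adder adds the extended imm16 (PC Ext, with 00 appended), and a mux controlled by nPC_sel picks the next PC, which becomes the Inst Address.]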

Summary: A Single Cycle Datapath

• Rs, Rt, Rd and Imm16 are hardwired into the datapath from the Fetch Unit

• We have everything except control signals (the underlined pieces)

[Figure: the complete single-cycle datapath. Instruction<31:0> is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Control signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; the datapath returns Equal.]

Summary: Meaning of the Control Signals

• ExtOp: “zero”, “sign”
• ALUsrc: 0 ⇒ regB; 1 ⇒ immed
• ALUctr: “add”, “sub”, “or”
• MemWr: 1 ⇒ write memory
• MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem
• RegDst: 0 ⇒ “rt”; 1 ⇒ “rd”
• RegWr: 1 ⇒ write register

[Figure: the single-cycle datapath with these control signals marked.]

The add Instruction

• add rd, rs, rt
– mem[PC]: Fetch the instruction from memory
– R[rd] ← R[rs] + R[rt]: The actual operation
– PC ← PC + 4: Calculate the next instruction’s address

(R-type format: op, rs, rt, rd, shamt, funct)

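The Fetch Unit

• nPC_sel:
– 0 ⇒ PC ← PC + 4
– 1 ⇒ PC ← PC + 4 + (SignExt(Imm16) || 00)

[Figure: fetch unit. The PC (clocked by Clk) addresses Inst Memory. One adder computes PC + 4; a second adder adds the extended imm16 (PC Ext, with 00 appended); a mux controlled by nPC_sel selects the next PC.]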

Fetch Unit at Beginning (and end) of add

• Fetch the instruction from Instruction memory: Instruction ← mem[PC] (This is the same for all instructions)

• But wait until we get to branch instructions!!

[Figure: fetch unit (PC, Inst Memory, two adders, PC Ext, mux selected by nPC_sel) producing Instruction<31:0>.]

The Single Cycle Datapath during Or Immediate

R[rt] ← R[rs] or ZeroExt(Imm16)

[Figure: the single-cycle datapath with the control signal values left blank: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]

The Single Cycle Datapath during Or Immediate

R[rt] ← R[rs] or ZeroExt(Imm16)

Control settings: nPC_sel = +4, RegDst = 0, RegWr = 1, ExtOp = 0, ALUSrc = 1, ALUctr = Or, MemWr = 0, MemtoReg = 0

[Figure: the single-cycle datapath annotated with these values.]

The Single Cycle Datapath during Store

Data Memory {R[rs] + SignExt[imm16]} ← R[rt]

[Figure: the single-cycle datapath with the control signal values left blank: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]


The Single Cycle Datapath during Store

Data Memory {R[rs] + SignExt[imm16]} ← R[rt]

Control settings: nPC_sel = +4, RegDst = x, RegWr = 0, ExtOp = 1, ALUSrc = 1, ALUctr = Add, MemWr = 1, MemtoReg = x (x = don’t care)

[Figure: the single-cycle datapath annotated with these values.]

The Single Cycle Datapath during Branch

if (R[rs] - R[rt] == 0) then Zero ← 1; else Zero ← 0

Control settings: nPC_sel = “Br”, RegDst = x, RegWr = 0, ExtOp = x, ALUSrc = 0, ALUctr = Sub, MemWr = 0, MemtoReg = x (x = don’t care)

[Figure: the single-cycle datapath annotated with these values.]
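The control settings shown on the slides for ori, sw, and beq can be collected into one lookup table, which is roughly what the main control unit implements in logic. The dictionary layout, the `X` marker for don't-care values, and the helper `state_written` are assumptions for illustration; the signal values themselves are the ones given on the slides.

```python
X = None  # "don't care" (shown as x on the slides)

# Control signal values per instruction, as listed on the slides.
CONTROL = {
    "ori": dict(nPC_sel="+4", RegDst=0, RegWr=1, ExtOp=0, ALUSrc=1, ALUctr="or",  MemWr=0, MemtoReg=0),
    "sw":  dict(nPC_sel="+4", RegDst=X, RegWr=0, ExtOp=1, ALUSrc=1, ALUctr="add", MemWr=1, MemtoReg=X),
    "beq": dict(nPC_sel="br", RegDst=X, RegWr=0, ExtOp=X, ALUSrc=0, ALUctr="sub", MemWr=0, MemtoReg=X),
}


def state_written(mnemonic):
    """List the storage elements (besides the PC) that an instruction updates."""
    signals = CONTROL[mnemonic]
    written = []
    if signals["RegWr"] == 1:
        written.append("register file")
    if signals["MemWr"] == 1:
        written.append("data memory")
    return written


for name in CONTROL:
    print(name, state_written(name))
```

The don't-care entries line up with the write enables: when RegWr is 0, neither RegDst nor MemtoReg can affect the machine state.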


The instruction fetch unit at end of branch

if (Zero == 1) then PC = PC + 4 + SignExt(imm16) × 4; else PC = PC + 4

° What is the encoding of nPC_sel?
• Direct MUX select?
• Branch / not branch

° Let’s choose the second option:

nPC_sel   zero?   MUX
0         x       0
1         0       0
1         1       1

[Figure: fetch unit (PC, Inst Memory, two adders, extended imm16 with 00 appended) in which nPC_sel and Zero together drive the mux that selects the next PC.]
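Under the branch / not-branch encoding, the mux select in the table is simply nPC_sel AND Zero. The sketch below computes the next PC that way; the function names and the example addresses are assumptions for illustration.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc, imm16, npc_sel, zero):
    """Next PC for the fetch unit with nPC_sel encoded as branch / not branch."""
    mux_select = npc_sel & zero            # the truth table: 1 only when both are 1
    if mux_select:
        offset = sign_extend16(imm16) * 4  # same as appending 00 to the extended immediate
        return (pc + 4 + offset) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF


pc = 0x00400020
print(hex(next_pc(pc, 3, npc_sel=1, zero=1)))   # branch taken
print(hex(next_pc(pc, 3, npc_sel=1, zero=0)))   # branch not taken
print(hex(next_pc(pc, 3, npc_sel=0, zero=1)))   # not a branch instruction
```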


Summary: A Single Cycle Processor

[Figure: the single-cycle datapath together with its control. Main Control decodes op (Instr<31:26>, 6 bits) and produces nPC_sel, RegDst, ALUSrc, and the other control signals, along with a 3-bit ALUop. ALU Control combines ALUop with func (Instr<5:0>, 6 bits) to produce the 3-bit ALUctr. Imm16 comes from Instr<15:0>.]

Drawback of this Single Cycle Processor

• Long cycle time:
– Cycle time must be long enough for the load instruction:
    PC’s Clock-to-Q +
    Instruction Memory Access Time +
    Register File Access Time +
    ALU Delay (address calculation) +
    Data Memory Access Time +
    Register File Setup Time +
    Clock Skew

• Cycle time for load is much longer than needed for all other instructions
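The cycle-time requirement is just the sum of the delays along the load path. The sketch below adds them up using made-up delays in picoseconds; every number is a hypothetical placeholder chosen only to show how the data memory access stretches the cycle, and the dictionary keys are labels invented for this example.

```python
# Hypothetical component delays in picoseconds (not measured values).
LOAD_PATH_PS = {
    "PC clock-to-Q": 50,
    "instruction memory access": 400,
    "register file access": 150,
    "ALU delay (address calculation)": 250,
    "data memory access": 400,
    "register file setup": 50,
    "clock skew": 50,
}


def cycle_time_ps(path):
    """Minimum clock period for a path: the sum of its delays."""
    return sum(path.values())


# A register-to-register instruction skips the data memory access.
alu_path = {name: delay for name, delay in LOAD_PATH_PS.items()
            if name != "data memory access"}

print("load needs", cycle_time_ps(LOAD_PATH_PS), "ps")
print("ALU instruction needs", cycle_time_ps(alu_path), "ps")
```

In a single-cycle design every instruction pays the longer figure, since the clock period is fixed by the slowest instruction.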


4.14 Real World Architectures

• We will look at an Intel architecture, which is a CISC machine, and MIPS, which is a RISC machine.
– CISC is an acronym for complex instruction set computer.
– RISC stands for reduced instruction set computer.


• The classic Intel architecture, the 8086, was introduced in 1978. It is a CISC architecture.

• The 8086 operated on 16-bit data words and supported 20-bit memory addresses.

• Later, to lower costs, the 8088 was introduced. It kept the 8086’s 16-bit internals and 20-bit memory addresses but used an 8-bit external data bus.

• IBM adopted the 8088 for its famed PC, which was released in 1981.

What was the largest memory that the 8086 could address?

4.14 Real World Architectures

• The 8086 had four 16-bit general-purpose registers that could be accessed by the half-word.

• It also had a flags register, an instruction register, and a stack accessed through the values in two other registers, the base pointer and the stack pointer.

• The 8086 had no built in floating-point processing.

• In 1980, Intel released the 8087 numeric coprocessor, but few users elected to install them because of their cost.



• In 1985, Intel introduced the 32-bit 80386.

• It also had no built-in floating-point unit.

• The 80486, introduced in 1989, was an 80386 that had built-in floating-point processing and cache memory.

• The 80386 and 80486 offered downward compatibility with the 8086 and 8088.

• Software written for the smaller word systems was directed to use the lower 16 bits of the 32-bit registers.

• Currently, Intel’s most advanced 32-bit microprocessor is the Pentium 4.

• It can run as fast as 3.8 GHz. This clock rate is nearly 800 times the 4.77 MHz at which the original IBM PC ran its 8088.

• Speed enhancing features include multilevel cache and instruction pipelining.

• Intel, along with many others, is marrying many of the ideas of RISC architectures with microprocessors that are largely CISC.



• The MIPS family of CPUs has been one of the most successful in its class.

• In 1986 the first MIPS CPU was announced.

• It had a 32-bit word size and could address 4GB of memory.

• Over the years, MIPS processors have been used in general purpose computers as well as in games.

• The MIPS architecture now offers 32- and 64-bit versions.



• MIPS was one of the first RISC microprocessors.

• The original MIPS architecture had only 55 different instructions, as compared with the 8086 which had over 100.

• MIPS was designed with performance in mind: It is a load/store architecture, meaning that only the load and store instructions can access memory.

• The large number of registers in the MIPS architecture keeps bus traffic to a minimum.

How does this design affect performance?

Chapter 4 Conclusion

• The major components of a computer system are its control unit, registers, memory, ALU, and datapath.

• A built-in clock keeps everything synchronized.

• Control units can be microprogrammed or hardwired.

• Hardwired control units give better performance, while microprogrammed units are more adaptable to changes.

• Computers run programs through iterative fetch-decode-execute cycles.

• Computers can run programs that are in machine language.

• The Intel architecture is an example of a CISC architecture; MIPS is an example of a RISC architecture.
