
Mar 29, 2015


Cooper Moris

Chapter 4: Instruction Set Architectures

CS140 Computer Organization

These slides are derived from those of Null & Lobur + the work of others.

Considerable material from previous years gleaned from Patterson & Hennessy.


Chapter 4 Objectives & Introduction

Objective:

• Learn the components common to every modern computer system.

• Be able to explain how each component contributes to program execution.

• Understand an ISA and how it relates to a real architecture.

• Know how the program assembly process works.

Introduction:

• Chapter 1 presented a general overview of computer systems.

• Chapter 2 discussed how data is formatted, stored, and manipulated.

• Chapter 3 described the fundamentals of digital circuits.

• With this, we can understand how computer components work, and how they fit together to create useful computer systems.


Computer Components

[Figure: block diagram of computer components. Labels: Processor, Clock, Chipset, Memory, FSB, Video Controller, PCI Controller, PCI Bus. Annotations: Chapters 4 & 5, Chapter 6, Chapter 7]


4.2 CPU Basics

• The computer’s CPU fetches, decodes, and executes program instructions.

• The two principal parts of the CPU are the datapath and the control unit.

– The datapath consists of an arithmetic-logic unit and storage units (registers) that are interconnected by a data bus that is also connected to main memory.

– Various CPU components perform sequenced operations according to signals provided by its control unit.

– The control unit determines which actions to carry out according to the values in a program counter register and a status register.

It’s like plumbing: the datapath is the pipes, and the control unit is the faucets and valves.


4.3 The Bus

PARALLEL BUS

• The CPU shares data with other system components by way of a data bus.

– A bus is a set of wires that simultaneously convey a single bit along each line.

• Two types of buses are commonly found in computer systems: point-to-point and multipoint buses.

This is a point-to-point bus configuration:

[Figure: point-to-point bus configuration]

SERIAL BUS

• Deferred to later when we discuss I/O.

4.3 The Bus

• Buses have data lines, control lines, and address lines.

• The data lines convey bits from one device to another.

• Control lines determine the direction of data flow and when each device can access the bus.

• Address lines determine the location of the source or destination of the data.

4.3 The Bus

• A multipoint bus is shown below.

[Figure: multipoint bus configuration]

• A multipoint bus is a shared resource, so access to it is controlled through protocols, which are built into the hardware and handled by the control lines.

4.3 The Bus

• In a master-slave configuration, where more than one device can be the bus master, concurrent bus master requests must be arbitrated.

• Four categories of bus arbitration are:

– Daisy chain: Permissions are passed from the highest-priority device to the lowest.

– Centralized parallel: Each device is directly connected to an arbitration circuit.

– Distributed using self-detection: Devices decide which gets the bus among themselves.

– Distributed using collision-detection: Any device can try to use the bus. If its data collides with the data of another device, it tries again.
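As a rough illustration of the daisy-chain category, the sketch below grants the bus to the highest-priority requester. The function name and the convention that device 0 has the highest priority are assumptions made for this example; they are not from the slides.

```python
from typing import List, Optional


def daisy_chain_grant(requests: List[bool]) -> Optional[int]:
    """Return the index of the device that gets the bus, or None.

    Assumption for this sketch: device 0 sits first in the chain and has
    the highest priority. The grant is passed down the chain until it
    reaches a device that is requesting the bus.
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device  # this device keeps the grant; lower ones never see it
    return None  # nobody asked for the bus


# Devices 1 and 2 both request; device 1 is closer to the head of the chain.
print(daisy_chain_grant([False, True, True]))
```

Note that a low-priority device can wait indefinitely if devices ahead of it keep requesting the bus.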


4.4 Clocks

• Every computer contains at least one clock that synchronizes the activities of its components.

• A fixed number of clock cycles are required to carry out each data movement or computational operation.

• The clock frequency, measured in megahertz or gigahertz, determines the speed with which all operations are carried out.

• Clock cycle time is the reciprocal of clock frequency.

• Typical Intel and typical PIC clocks look like this:

– A 2 GHz clock has a cycle time of 0.5 nanoseconds.

– An 8 MHz clock has a cycle time of 0.125 microseconds.

• Multiple frequencies, derived from one master clock, are used for various parts of the system.
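The reciprocal relationship can be checked with a few lines of Python. This is only a sketch; the function name is made up for the example, and the two frequencies are the ones quoted on this slide.

```python
def cycle_time_seconds(frequency_hz: float) -> float:
    """Clock cycle time is the reciprocal of the clock frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


intel_ns = cycle_time_seconds(2e9) * 1e9  # 2 GHz clock, expressed in nanoseconds
pic_us = cycle_time_seconds(8e6) * 1e6    # 8 MHz clock, expressed in microseconds

print(f"2 GHz -> {intel_ns:.3f} ns")  # 2 GHz -> 0.500 ns
print(f"8 MHz -> {pic_us:.3f} us")    # 8 MHz -> 0.125 us
```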

4.4 Clocks

• Clock speed should not be confused with CPU performance.

• The CPU time required to run a program is given by the general performance equation:

CPU time = seconds / program = (instructions / program) × (average cycles / instruction) × (seconds / cycle)

We can improve CPU performance when we

– reduce the number of instructions in a program,

– reduce the number of cycles per instruction, or

– reduce the number of nanoseconds per clock cycle.
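To see how the three factors trade off, here is a small sketch of the performance equation as a product of instruction count, average cycles per instruction, and cycle time. The program size and CPI values are invented for illustration; only the 0.5 ns cycle time comes from the 2 GHz example earlier.

```python
def cpu_time_seconds(instructions: int, avg_cpi: float, cycle_time_s: float) -> float:
    """CPU time = instructions x average cycles per instruction x seconds per cycle."""
    return instructions * avg_cpi * cycle_time_s


# Hypothetical program: one million instructions, CPI of 2, on a 2 GHz clock.
baseline = cpu_time_seconds(1_000_000, 2.0, 0.5e-9)

# Halving the CPI halves the run time even though the clock is unchanged.
better_cpi = cpu_time_seconds(1_000_000, 1.0, 0.5e-9)

print(f"baseline:   {baseline * 1e3:.3f} ms")
print(f"better CPI: {better_cpi * 1e3:.3f} ms")
```

This is why a faster clock alone does not guarantee a faster program: the other two factors can move in the opposite direction.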

4.4 Clocks

Digression: Sequential Logic, Clocking

• Combinational circuits: no memory

• Output depends only on the inputs

• Sequential circuits: have memory

• How to ensure memory element is updated neither too soon, nor too late?

• Recall hardware circuits

• Flip/flop register is the writable memory element

• Gate propagation delay means result takes time to stabilize; delay varies with inputs

• Must wait until the result is stable before we can write that output register to the next stage; otherwise we get garbage results.

• How to be certain ALU output is stable?

• Solution: let the inputs chatter and stabilize – THEN apply the clock.

4.4 Clocks

Digression: Sequential Logic, Clocking

• Clock: free running signal with fixed cycle time (clock period)

[Figure: clock waveform showing the period, the rising edge, the falling edge, and the high (1) and low (0) levels]

° Clock determines when to write memory element

• level-triggered - store while the clock is high (or low)

• edge-triggered - store only on clock edge

° We will use negative (falling) edge-triggered methodology

4.4 Clocks

Role of Clock in Processors

• single-cycle machine: does everything in one clock cycle

• instruction execution = up to 5 steps

• must complete 5th step before cycle ends

[Figure: timing diagram of one clock cycle. Labels: clock signal, rising clock edge, falling clock edge, instruction execution (step 1/step 2/step 3/step 4/step 5), datapath stable, register(s) written]


4.5 The Input/Output Subsystem

• A computer communicates with the outside world through its input/output (I/O) subsystem.

• I/O devices connect to the CPU through various interfaces.

• I/O can be memory-mapped, where the I/O device behaves like main memory from the CPU’s point of view.

• Or I/O can be instruction-based, where the CPU has a specialized I/O instruction set.


4.6 Memory Organization

• Computer memory is a linear array of addressable storage cells that are similar to registers.

• Memory can be byte-addressable (most common), or word-addressable, where a word consists of two or more bytes.

• Memory is constructed of RAM chips, often referred to in terms of length × width.

• If the addressable unit of the machine is 8 bits, then a 4M × 8 RAM chip gives us 4 megabytes of 8-bit memory locations.


4.6 Memory Organization

• How does the computer access a memory location that corresponds to a particular address?

• We see that 4 megabytes = 2^22 bytes.

• The memory locations for this memory are numbered 0 through 2^22 - 1.

• Thus, the memory bus of this system requires at least 22 address lines.

– The address lines “count” from 0 to 2^22 - 1 in binary. Each line is either “on” or “off” indicating the location of the desired memory element.

Power    2 ^ Power
0        1
1        2
2        4
3        8
4        16
5        32
6        64
7        128
8        256
9        512
10       1,024
11       2,048
12       4,096
13       8,192
14       16,384
15       32,768
16       65,536
17       131,072
18       262,144
19       524,288
20       1,048,576
21       2,097,152
22       4,194,304
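The number of address lines can also be computed instead of read from the table. A minimal sketch, assuming one address per addressable location; the function name is invented for this example.

```python
def address_lines_needed(locations: int) -> int:
    """Smallest number of address lines that can number `locations` cells.

    The cells are numbered 0 through locations - 1, so we need enough
    bits to write the largest address, locations - 1, in binary.
    """
    if locations < 1:
        raise ValueError("need at least one memory location")
    return (locations - 1).bit_length()


four_megabytes = 4 * 1024 * 1024  # 2^22 byte-sized locations
print(address_lines_needed(four_megabytes))  # 22
```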

4.6 Memory Organization

• Physical memory usually consists of more than one RAM chip.

• Access is more efficient when memory is organized into banks of chips with the addresses interleaved across the chips.

• With low-order interleaving, the low order bits of the address specify which memory bank contains the address of interest.

[Figure: Low-Order Interleaving, showing byte addresses spread across the memory banks]
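Low-order interleaving can be sketched as splitting an address into a bank number (the low-order bits) and an offset within that bank (the remaining bits). The bank count of 8 and the function name are assumptions for this example, not values from the slide.

```python
from typing import Tuple


def low_order_interleave(address: int, num_banks: int) -> Tuple[int, int]:
    """Split an address into (bank, offset within bank).

    Assumes num_banks is a power of two, so that 'address mod num_banks'
    is exactly the low-order bits of the address.
    """
    if num_banks < 1 or num_banks & (num_banks - 1):
        raise ValueError("num_banks must be a power of two")
    bank = address % num_banks     # low-order bits pick the bank
    offset = address // num_banks  # remaining bits pick the cell inside it
    return bank, offset


# Consecutive addresses land in consecutive banks.
for addr in range(10):
    print(addr, low_order_interleave(addr, 8))
```

Because neighbouring addresses sit in different banks, a sequential access pattern can keep several banks busy at once.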


4.7 Interrupts

• The normal execution of a program is altered when an event of higher priority occurs. The CPU is alerted to such an event through an interrupt.

• Interrupts can be triggered by I/O requests, arithmetic errors (such as division by zero), or when an invalid instruction is encountered.

• For general-purpose systems, it is common to disable all interrupts during the time in which an interrupt is being processed.

– Typically, this is achieved by setting a bit in the flags register.

– Interrupts that are ignored in this case are called maskable.

• Nonmaskable interrupts are high-priority interrupts that cannot be ignored. (“The CPU is on fire” is nonmaskable.)

• Each interrupt is associated with a procedure that directs the actions of the CPU when an interrupt occurs.

• In Chapter 7 we’ll look at interrupts in more detail.

4.7 Interrupts

• Interrupt processing involves adding another step to the fetch-decode-execute cycle as shown below.

[Figure: fetch-decode-execute cycle with an added interrupt-processing step]

The next slide shows a flowchart of “Process the interrupt.”

4.7 Interrupts

[Figure: flowchart of “Process the interrupt”]


4.9 Instruction Processing

1. Detour looking at Instruction Sets

– How are instructions laid out so they can be simply decoded?

– Going from PIC to MIPS

2. Detour looking at Hardware

– Multiplexers, registers, and ALUs

3. The datapath

– Looking at each stage

– The datapath and some sample instructions.

4.9 Instruction Processing

Detour looking at Instruction Sets

These are the 35 instructions available on the PIC processors we’ve used.

[Figure: table of the 35 PIC instructions]

There are four types of instructions and the formats for these instructions are defined here.

[Figure: PIC instruction formats, with the opcode bits discussed below marked]

• Byte oriented instructions have a 00 here.

• Bit oriented instructions have a 01 here.

• Goto and Call have a 10.

• Literal instructions have a 11.

4.9 Instruction Processing

Detour looking at Instruction Sets

There are four types of instructions and the formats for these instructions are defined here.

[Figure: PIC instruction formats]

PIC is a RISC (Reduced Instruction Set Computer). A relatively simple set of rules can decipher these instructions.

Conversely, Intel is a CISC (Complex Instruction Set Computer). The wiring needed to decipher the thousands of Intel instructions is overwhelming.

4.9 Instruction Processing

Detour looking at Instruction Sets

The MIPS Instruction Formats

• All MIPS instructions are 32 bits long. The three instruction formats:

– R-type

    Bits:   31-26    25-21    20-16    15-11    10-6     5-0
    Field:  op       rs       rt       rd       shamt    funct
    Width:  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

– I-type

    Bits:   31-26    25-21    20-16    15-0
    Field:  op       rs       rt       immediate
    Width:  6 bits   5 bits   5 bits   16 bits

– J-type

    Bits:   31-26    25-0
    Field:  op       target address
    Width:  6 bits   26 bits

• The different fields are:

– op: operation of the instruction

– rs, rt, rd: the source and destination register specifiers

– shamt: shift amount

– funct: selects the variant of the operation in the “op” field

– address / immediate: address offset or immediate value

– target address: target address of the jump instruction

Hey – these are the same flavors as the PIC instructions.

MIPS is a RISC. It has about 75 instructions.
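The three layouts can be expressed as shift-and-OR packing. The sketch below is illustrative only: the helper names are made up, and the two sample words are the `add $8, $9, $10` and `ori $2, $0, 4` encodings that appear in the listing later in these slides.

```python
def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """R-type: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op: int, rs: int, rt: int, imm16: int) -> int:
    """I-type: op(6) | rs(5) | rt(5) | immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


def encode_j(op: int, target26: int) -> int:
    """J-type: op(6) | target address(26)."""
    return (op << 26) | (target26 & 0x03FFFFFF)


# add $8, $9, $10  -> op 0, funct 32 (0x20)
print(hex(encode_r(0, 9, 10, 8, 0, 32)))  # 0x12a4020
# ori $2, $0, 4    -> op 13 (0x0d)
print(hex(encode_i(0x0D, 0, 2, 4)))       # 0x34020004
```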

4.9 Instruction Processing

Detour looking at Instruction Sets

The MIPS-lite Subset for Today

• ADD and SUB (R-type: op 6 bits | rs 5 bits | rt 5 bits | rd 5 bits | shamt 5 bits | funct 6 bits)

– addu rd, rs, rt

– subu rd, rs, rt

• OR Immediate (I-type: op 6 bits | rs 5 bits | rt 5 bits | immediate 16 bits)

– ori rt, rs, imm16

• LOAD / STORE Word (I-type: op 6 bits | rs 5 bits | rt 5 bits | immediate 16 bits)

– lw rt, rs, imm16

– sw rt, rs, imm16

• BRANCH (I-type: op 6 bits | rs 5 bits | rt 5 bits | immediate 16 bits)

– beq rs, rt, imm16

MIPS-lite has 6 instructions.

4.9 Instruction Processing

Detour looking at Instruction Sets

The MIPS-lite Subset for Today

Things to Note:

1. Since each instruction is 32 bits (4 bytes), instruction addresses are multiples of 4.

2. These instructions access up to 3 registers.

3. The ori can use a 16 bit immediate.

4. Each register needs 5 bits to specify it – how many registers does this machine have?


4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

D-Latches

• D-latch based on SR-Latch with NAND Gates and control input C

° C = 0, no change of state;

• Q(t + Δt) = Q(t)

° C = 1, change is allowed;

• Q(t + Δt) = D(t)

• No indeterminate output
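The two D-latch rules can be mimicked with a tiny behavioral model. This is a sketch of the behavior only (it does not model the NAND gates or propagation delay), and the class and method names are invented for the example.

```python
class DLatch:
    """Behavioral model of a D-latch with control input C."""

    def __init__(self, q: int = 0) -> None:
        self.q = q  # stored state Q

    def step(self, c: int, d: int) -> int:
        """Apply inputs C and D and return the new Q.

        C = 0: no change of state, Q keeps its old value.
        C = 1: change is allowed, Q follows D.
        """
        if c == 1:
            self.q = d
        return self.q


latch = DLatch()
print(latch.step(c=1, d=1))  # latch is open, so Q becomes 1
print(latch.step(c=0, d=0))  # latch is closed, so Q stays 1
print(latch.step(c=1, d=0))  # latch is open again, so Q becomes 0
```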

4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

[Figure: three 32-bit building blocks.
Adder: inputs A, B, and CarryIn; outputs Sum and Carry.
MUX: inputs A and B, control input Select; output Y.
ALU: inputs A and B, control input OP; output Result.]
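The three blocks can be modeled as plain functions on 32-bit values. This is a sketch under assumptions: the choice that Select = 0 picks input A, and the operation names accepted by `alu`, are made up for the example and are not specified on the slide.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def adder(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """Return (Sum, Carry) for 32-bit inputs A and B plus CarryIn."""
    total = (a & MASK32) + (b & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32


def mux(select: int, a: int, b: int) -> int:
    """Two-input MUX. Assumed convention: Select = 0 gives A, Select = 1 gives B."""
    return b if select else a


def alu(op: str, a: int, b: int) -> int:
    """Tiny ALU supporting the operations MIPS-lite needs (assumed op names)."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "or":
        return (a | b) & MASK32
    raise ValueError(f"unsupported ALU operation: {op}")


print(adder(0xFFFFFFFF, 1))   # the sum wraps to 0 and the carry out is 1
print(mux(0, 5, 7), mux(1, 5, 7))
print(hex(alu("sub", 0, 1)))  # wraps around to 0xffffffff
```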

4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

Storage Element: Register File

• Register File consists of 32 registers:

– Two 32-bit output busses: busA and busB

– One 32-bit input bus: busW

• Register is selected by:

– RA (number) selects the register to put on busA (data)

– RB (number) selects the register to put on busB (data)

– RW (number) selects the register to be written via busW (data) when Write Enable is 1

• Clock input (CLK)

– The CLK input is a factor ONLY during write operation

– During read operation, behaves as a combinational logic block:

• RA or RB valid => busA or busB valid after “access time.”

[Figure: register file of 32 32-bit registers. Inputs: Write Enable, Clk, busW (32 bits), and RW, RA, RB (5 bits each). Outputs: busA and busB (32 bits each)]
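A behavioral sketch of this register file follows. Reads are modeled as combinational (no clock needed) and writes happen only on a clock edge with Write Enable set. The class and method names are assumptions for the example, and the sketch does not model MIPS register $0 being hardwired to zero.

```python
from typing import Tuple


class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> Tuple[int, int]:
        """Combinational read: return (busA, busB) for register numbers RA, RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: bool) -> None:
        """On the clock edge, write busW into register RW if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=True)
rf.clock_edge(rw=3, bus_w=7, write_enable=False)  # ignored: Write Enable is 0
print(rf.read(ra=3, rb=0))
```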

4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

Multiplexer

[Figure: multiplexer with input bits 0 through 31, a selector, and one output]

The 5 selector wires can choose one of the 32 input voltages and send it to the output.

Decoder

[Figure: decoder with one input, a selector, and outputs 0 through 31]

The 5 selector wires choose which of the 32 outputs will get the input voltage.
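The two blocks are mirror images of each other, which a short sketch makes visible. The function names are invented for the example, and the 5 selector wires are represented as a single integer from 0 to 31.

```python
from typing import List


def mux_32_to_1(inputs: List[int], selector: int) -> int:
    """Send the selected one of 32 inputs to the single output."""
    if len(inputs) != 32 or not 0 <= selector < 32:
        raise ValueError("need 32 inputs and a 5-bit selector")
    return inputs[selector]


def decoder_1_to_32(value: int, selector: int) -> List[int]:
    """Route the single input to the selected one of 32 outputs; the rest stay 0."""
    if not 0 <= selector < 32:
        raise ValueError("selector must fit in 5 bits")
    outputs = [0] * 32
    outputs[selector] = value
    return outputs


inputs = [i % 2 for i in range(32)]  # alternating 0/1 input bits
print(mux_32_to_1(inputs, 5))        # input 5 is a 1
print(decoder_1_to_32(1, 5)[:8])     # only output 5 carries the input
```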

4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

Multiplexer

[Figure: side view and end view of a multiplexer with input words 0 through 31, each with bits 0 through 31, a selector, and one output word]

Now, each of the 32 inputs has 32 bits. There are 32 x 32 bits in and 1 x 32 bits out.

This multiplexer is equivalent to 32 of those on the previous page.

Each of these input words COULD be a register!

4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

Multiplexer

[Figure: multiplexer with input words 0 through 31, a selector, and one output, shown next to the register file block (32 32-bit registers; inputs Write Enable, Clk, busW, RW, RA, RB; outputs busA and busB)]

So this register file is just the multiplexer shown here.

4.9 Instruction Processing

Detour looking at Hardware

Basic Building Blocks

Storage Element: Idealized Memory

• Memory (idealized)

– One input bus: Data In

– One output bus: Data Out

• Memory word is selected by:

– Address selects the word to put on Data Out

– Write Enable = 1: address selects the memory word to be written via the Data In bus

• Clock input (CLK)

– The CLK input is a factor ONLY during write operation

– During read operation, behaves as a combinational logic block:

• Address valid => Data Out valid after “access time.”

[Figure: idealized memory block. Inputs: Clk, Write Enable, Address, Data In (32 bits). Output: Data Out (32 bits)]



4.9 Instruction Processing

• The fetch-decode-execute-store cycle is the series of steps that a computer carries out when it runs a program.

• We first have to fetch an instruction from memory, and place it into the Instruction Register (IR).

• Once in the IR, it is decoded to determine what needs to be done next.

• If an immediate operand is involved in the operation, it is retrieved and prepared for execution.

• With everything in place, the instruction is executed.

• If a result is to be stored in memory, that’s done next.

• If a result is placed in a register, that’s the last stage.

The next slide shows a flowchart of this process.

4.9 Instruction Processing

Generic Steps: Datapath

[Figure: five-stage datapath. Components: PC, +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, data memory. Stages: 1. Instruction Fetch; 2. Decode/Register Read; 3. Execute; 4. Memory; 5. Reg. Write]

4.9 Instruction Processing

Stages of the Datapath (1/6)

Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient.

Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath.

Smaller stages are easier to design.

Easy to optimize (change) one stage without touching the others.

[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]

4.9 Instruction Processing

Stages of the Datapath (2/6)

There is a wide variety of MIPS instructions, so what general steps do they have in common?

Stage 1: Instruction Fetch

No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy).

Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction; memory is byte addressed, so + 4).

[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]

4.9 Instruction Processing

Stages of the Datapath (3/6)

Stage 2: Instruction Decode

Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data).

First, read the opcode to determine instruction type and field lengths.

Second, read in data from all necessary registers:

- for add, read two registers

- for addi, read one register

- for jal, no reads necessary

[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]

4.9 Instruction Processing

Stages of the Datapath (4/6)

° Stage 3: ALU (Arithmetic-Logic Unit)

The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt).

What about loads and stores?

- lw $t0, 40($t1)

- the address we are accessing in memory = the value in $t1 + the value 40

- so we do this addition in this stage

[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]

4.9 Instruction Processing

Stages of the Datapath (5/6)

° Stage 4: Memory Access

Actually, only the load and store instructions do anything during this stage; the others remain idle.

Since these instructions have a unique step, we need this extra stage to account for them.

As a result of the cache system, this stage is expected to be just as fast (on average) as the others.

[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]

4.9 Instruction Processing

Stages of the Datapath (6/6)

° Stage 5: Register Write

Most instructions write the result of some computation into a register.

Examples: arithmetic, logical, shifts, loads, slt.

What about stores, branches, jumps?

- they don’t write anything into a register at the end

- these remain idle during this fifth stage

[Figure: the five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write)]


4.9 Instruction Processing

Datapath Walkthrough #1 - add

add $r3, $r1, $r2 # r3 = r1+r2

Stage 1: fetch this instruction, incr. PC;

Stage 2: decode to find it’s an add, read registers $r1 and $r2;

Stage 3: add the two values retrieved in Stage 2;

Stage 4: idle (nothing to write to memory);

Stage 5: write result of Stage 3 into register $r3;

4.9 Instruction Processing

Datapath Walkthrough #1

add $r3, $r1, $r2 # r3 = r1+r2

[Figure: the datapath annotated for add r3, r1, r2. Register numbers 1 and 2 are read, giving reg[1] and reg[2]; the ALU output is reg[1]+reg[2]; register number 3 is the destination]

4.9 Instruction Processing

Datapath Walkthroughs #2 - sw

sw $r3, 17($r1)

Stage 1: fetch this instruction, inc. PC

Stage 2: decode to find it’s a sw, then read registers $r1 and $r3

Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)

Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3

Stage 5: go idle (nothing to write into a register)

4.9 Instruction Processing

Datapath Walkthroughs #2

sw $r3, 17($r1)

[Figure: the datapath annotated for SW r3, 17(r1). Register numbers 3 and 1 are read (the remaining register number is marked x); the immediate is 17; the ALU output is reg[1]+17; data memory receives reg[3], so MEM[r1+17] <= r3]

4.9 Instruction Processing

Datapath Walkthroughs #3 – lw

lw $r3, 17($r1)

Stage 1: fetch this instruction, inc. PC

Stage 2: decode to find it’s a lw, then read register $r1

Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)

Stage 4: read value from memory address computed in Stage 3

Stage 5: write value found in Stage 4 into register $r3

NOTE: This is the one instruction that requires all 5 stages.
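The three walkthroughs can be traced in a small simulation. This is a sketch under stated assumptions: instructions are given as already-decoded tuples rather than 32-bit words, memory is a dictionary keyed by address, and word alignment is not checked (the slides use offset 17, which real MIPS would reject for a word access). The function name and tuple layout are invented for the example.

```python
from typing import Dict, List, Tuple


def run_instruction(instr: Tuple, regs: List[int], mem: Dict[int, int], pc: int) -> int:
    """Step one instruction through the five stages and return the next PC."""
    # Stage 1: the instruction has been fetched; increment the PC.
    next_pc = pc + 4
    op = instr[0]

    if op == "add":                       # add rd, rs, rt
        _, rd, rs, rt = instr
        a, b = regs[rs], regs[rt]         # Stage 2: read two registers
        result = (a + b) & 0xFFFFFFFF     # Stage 3: ALU adds
        #                                   Stage 4: idle
        regs[rd] = result                 # Stage 5: register write
    elif op == "sw":                      # sw rt, imm(rs)
        _, rt, imm, rs = instr
        base, value = regs[rs], regs[rt]  # Stage 2: read two registers
        address = base + imm              # Stage 3: ALU computes the address
        mem[address] = value              # Stage 4: memory write
        #                                   Stage 5: idle
    elif op == "lw":                      # lw rt, imm(rs)
        _, rt, imm, rs = instr
        base = regs[rs]                   # Stage 2: read one register
        address = base + imm              # Stage 3: ALU computes the address
        loaded = mem.get(address, 0)      # Stage 4: memory read
        regs[rt] = loaded                 # Stage 5: register write
    else:
        raise ValueError(f"instruction not in this sketch: {op}")
    return next_pc


regs, mem, pc = [0] * 32, {}, 0
regs[1], regs[2] = 100, 23
pc = run_instruction(("add", 3, 1, 2), regs, mem, pc)  # r3 = r1 + r2
pc = run_instruction(("sw", 3, 17, 1), regs, mem, pc)  # MEM[r1+17] = r3
regs[3] = 0                                            # clobber r3 to show the reload
pc = run_instruction(("lw", 3, 17, 1), regs, mem, pc)  # r3 = MEM[r1+17]
print(regs[3], mem, pc)
```

Only the `lw` branch does real work in all five stages, which matches the note above.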

4.9 Instruction Processing

Datapath Walkthroughs #3

lw $r3, 17($r1)

[Figure: the datapath annotated for LW r3, 17(r1). Register number 1 is read, giving reg[1] (the remaining read register number is marked x); the immediate is 17; the ALU output is reg[1]+17; data memory supplies MEM[r1+17]; register number 3 is the destination]

Datapath Summary

° The datapath is based on the data transfers required to perform instructions

° A controller causes the right transfers to happen

[Figure: the datapath (PC, +4 adder, instruction memory, registers rs/rt/rd, imm, ALU, data memory) with a Controller block whose inputs are opcode and funct]


4.13 Decoding & Control

1. An overview of how control works from instructions to electronics– How are instructions laid out so they can be

simply decoded?– Going from PIC to MIPS

2. Control operations– Samples of load, store and branch

3. The fetch unit in detail using “add” as example.

4. Operation of controls using or immediate, store, branch

Mapping Code onto the DataPath: How does it all work?

We Start With Code

##############################################################
# This is code that mimics the following C program.
# main( )
# {
#     printf( "Hello World\n" );
# }
##############################################################

        .text
        .globl main
main:
        la   $a0, hello
        ori  $v0, $0, 4        # li $v0, 4
        add  $t0, $t1, $t2
        syscall
        jr   $ra

        .data
hello:
        .asciiz "Hello World\n"

4.13 Decoding

From that code we get a 32-bit equivalent binary representation.

Address        Hex          Op-code Mnemonic
[0x00400020]   0x3c021001   lui $2, 4097      ; 12: la $a0, hello
[0x00400024]   0x34440000   ori $4, $2, 0     ;
[0x00400028]   0x34020004   ori $2, $0, 4     ; 14: ori $v0, $0, 4
[0x0040002c]   0x012a4020   add $8, $9, $10   ; 14: add $t0,$t1,$t2
[0x00400030]   0x0000000c   syscall           ; 15: syscall
[0x00400034]   0x03e00008   jr $31            ; 16: jr $ra

The add instruction, 0x012a4020, broken into its fields:

Hex:      0    1    2    a    4    0    2    0
Binary:   0000 0001 0010 1010 0100 0000 0010 0000
Fields:   000000 01001 01010 01000 00000 100000
Decimal:  0      9     10    8     0     32
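The field split shown above can be reproduced with shifts and masks. The following is a small sketch; the function name decode_r_type is hypothetical, while the field positions follow the standard MIPS R-type layout (op, rs, rt, rd, shamt, funct).

```python
def decode_r_type(word):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31-26
        "rs":    (word >> 21) & 0x1F,   # bits 25-21
        "rt":    (word >> 16) & 0x1F,   # bits 20-16
        "rd":    (word >> 11) & 0x1F,   # bits 15-11
        "shamt": (word >> 6) & 0x1F,    # bits 10-6
        "funct": word & 0x3F,           # bits 5-0
    }


# The add $8, $9, $10 word from the listing.
fields = decode_r_type(0x012A4020)
```

For this word the fields come out as op 0, rs 9, rt 10, rd 8, shamt 0 and funct 32, matching the decimal row above.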

4.13 Decoding

What the hardware looks like.

[Figure: register file (R0 ... R8 ... R31) feeding two "MUX To ALU" multiplexers, which drive ALU inputs A and B; the ALU output Out returns to the registers through "MUX From ALU". Other labels on the diagram: Op, Select, 5 wires, 6 wires, ovfc.]

4.13 Decoding

What the hardware looks like.

[Figure: the same register file (R0 ... R8 ... R31), "MUX To ALU" and "MUX From ALU" multiplexers, and ALU (inputs A and B, output Out, control Op).]

4.13 Decoding

The hardware reads each of those fields.

000000 01001 01010 01000 00000 100000
0      9     10    8     0     32

[Figure: register file (R0 ... R8 ... R31), "MUX To ALU" and "MUX From ALU" multiplexers, and ALU (inputs A and B, output Out), annotated with the fields of the add instruction.]

4.13 Decoding

An Overview of the Implementation

[Figure: two blocks, Control and Datapath. The Datapath contains the PC, Next Address logic, an ideal Instruction Memory (Instruction Address in, Instruction out), 32 32-bit registers with 5-bit selects Rw, Ra, Rb (fed by Rd, Rs, Rt), an ALU with 32-bit inputs A and B, and an ideal Data Memory (Data Address, Data In, Data Out), all clocked by Clk. Control receives the Instruction and Conditions from the Datapath and sends it Control Signals.]

4.13 Decoding & Control


Overview of the Instruction Fetch Unit

• The common operations
  – Fetch the instruction: mem[PC]
  – Update the program counter:
    • Sequential code: PC ← PC + 4
    • Branch and jump: PC ← “something else”

[Figure: the PC (clocked by Clk) supplies the Address to Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the new PC value.]

4.13 Decoding & Control Control Samples

Add & Subtract

R[rd] ← R[rs] op R[rt]; Example: addu rd, rs, rt
– Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction

  Bits:   31-26   25-21   20-16   15-11   10-6    5-0
  Field:  op      rs      rt      rd      shamt   funct
  Width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

[Figure: 32 32-bit registers with 5-bit selects Rw, Ra, Rb (from Rd, Rs, Rt). The 32-bit busA and busB feed the ALU, which is controlled by ALUctr; the 32-bit Result returns on busW and is written on Clk when RegWr is set.]
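As a rough illustration of the transfer R[rd] ← R[rs] op R[rt], the sketch below models the register file as a Python list and the ALU as a small function. The names alu and exec_r_type are hypothetical; the bus names in the comments follow the figure, and results wrap to 32 bits as addu does.

```python
MASK32 = 0xFFFFFFFF


def alu(alu_ctr, a, b):
    """Tiny ALU model; results wrap around to 32 bits."""
    if alu_ctr == "add":
        return (a + b) & MASK32
    if alu_ctr == "sub":
        return (a - b) & MASK32
    if alu_ctr == "or":
        return (a | b) & MASK32
    raise ValueError(f"unsupported ALUctr: {alu_ctr}")


def exec_r_type(regs, rs, rt, rd, alu_ctr, reg_wr=1):
    """R[rd] <- R[rs] op R[rt], following busA/busB/busW in the figure."""
    bus_a = regs[rs]                  # Ra = rs
    bus_b = regs[rt]                  # Rb = rt
    bus_w = alu(alu_ctr, bus_a, bus_b)
    if reg_wr and rd != 0:            # register 0 is hardwired to zero in MIPS
        regs[rd] = bus_w              # Rw = rd
    return bus_w


regs = [0] * 32
regs[9], regs[10] = 7, 5              # example operand values
exec_r_type(regs, rs=9, rt=10, rd=8, alu_ctr="add")
```

With RegWr cleared the ALU still computes a result, but the register file is left unchanged, which is how the store and branch cases later in the section avoid writing a register.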

4.13 Decoding & Control Control Samples

Logical Operations With Immediate

• R[rt] ← R[rs] op ZeroExt[imm16]

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate
  Width:  6 bits  5 bits  5 bits  16 bits

The slide marks bit 11 of the format with "rd?" and the register write select with "Rt?": this format has no rd field, so the destination register is rt.

ZeroExt[imm16]:

  Bits:   31-16                             15-0
  Value:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   immediate
  Width:  16 bits                           16 bits

[Figure: 32 32-bit registers with 5-bit selects Rw, Ra, Rb. A mux controlled by RegDst chooses Rt or Rd as Rw; Rs drives Ra. busA feeds the ALU directly; a mux controlled by ALUSrc chooses between busB and ZeroExt(imm16), widened from 16 to 32 bits, as the second ALU input. The 32-bit Result returns on busW; ALUctr, RegWr and Clk as before.]

4.13 Decoding & Control Control Samples

Load Operations

• R[rt] ← Mem[R[rs] + SignExt[imm16]]; Example: lw rt, rs, imm16

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate
  Width:  6 bits  5 bits  5 bits  16 bits

[Figure: the immediate datapath extended with a data memory. The Extender (controlled by ExtOp) widens imm16 from 16 to 32 bits; the ALUSrc mux selects busB or the extended immediate; the ALU output is the address Adr of the Data Memory (Data In, WrEn driven by MemWr, Clk). A mux controlled by W_Src chooses the ALU result or the memory output for busW. The RegDst mux chooses Rt or Rd as Rw. Slide annotations: "rd" at bit 11, "??", "Rt?".]
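The difference between ZeroExt (used by the logical immediates) and SignExt (used by loads and stores) is easy to see on a 16-bit value with its top bit set. A minimal sketch follows; the function names are hypothetical and values are kept as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def zero_ext(imm16):
    """Upper 16 bits are filled with zeros."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """Upper 16 bits are filled with copies of bit 15."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


def load_address(rs_value, imm16):
    """Effective address R[rs] + SignExt[imm16], wrapped to 32 bits."""
    return (rs_value + sign_ext(imm16)) & MASK32
```

For example, the immediate 0xFFFC zero-extends to 0x0000FFFC but sign-extends to 0xFFFFFFFC (that is, -4), so load_address(1000, 0xFFFC) gives 996 while load_address(1000, 17) gives 1017.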

4.13 Decoding & Control Control Samples

Store Operations

• Mem[R[rs] + SignExt[imm16]] ← R[rt]; Example: sw rt, rs, imm16

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate
  Width:  6 bits  5 bits  5 bits  16 bits

[Figure: the load datapath, with busB (the value of Rt) also wired to the Data In port of the Data Memory. The Extender (ExtOp) widens imm16 from 16 to 32 bits; the ALUSrc mux selects the extended immediate; the ALU output is the memory address Adr; MemWr drives WrEn. RegDst, RegWr, W_Src, ALUctr and Clk as before.]

4.13 Decoding & Control Control Samples

The Branch Instruction

• beq rs, rt, imm16
  – mem[PC]                    Fetch the instruction from memory
  – Equal ← R[rs] == R[rt]     Calculate the branch condition
  – if (Equal)                 Calculate the next instruction’s address
      • PC ← PC + 4 + ( SignExt(imm16) × 4 )
  – else
      • PC ← PC + 4

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate
  Width:  6 bits  5 bits  5 bits  16 bits
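The next-address rule for beq can be written out directly. This is a sketch only; next_pc_beq and sign_ext16 are hypothetical names, and the immediate is treated as a signed count of instructions (words) relative to PC + 4.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_value, rt_value, imm16):
    """PC <- PC + 4 + SignExt(imm16) * 4 if R[rs] == R[rt], else PC + 4."""
    equal = rs_value == rt_value
    if equal:
        return (pc + 4 + sign_ext16(imm16) * 4) & MASK32
    return (pc + 4) & MASK32
```

With pc = 0x00400020 and imm16 = 3, a taken branch lands on 0x00400030; an immediate of 0xFFFF (that is, -1) branches back to the beq itself.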

4.13 Decoding & Control Control Samples

Datapath for Branch Operations

• beq rs, rt, imm16        Datapath generates condition (equal)

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate
  Width:  6 bits  5 bits  5 bits  16 bits

[Figure: on the fetch side, the PC (clocked by Clk) feeds one adder that adds 4 and a second adder that adds the extended imm16 (PC Ext, with "00" appended); a mux controlled by nPC_sel picks the next Inst Address. On the register side, the 32 32-bit registers read Rs and Rt onto busA and busB, and an "Equal?" comparator produces the condition Cond.]

4.13 Decoding & Control Control Samples

Summary: A Single Cycle Datapath

• Rs, Rt, Rd and Imm16 are hardwired into the datapath from the Fetch Unit

• We have everything except control signals (the underlined pieces)

[Figure: the complete single cycle datapath. Fetch unit: PC, Clk, two adders (+4 and PC Ext of imm16 with "00" appended), a mux controlled by nPC_sel, and the Inst Memory addressed by Adr. Instruction<31:0> is split into Rs <21:25>, Rt <16:20>, Rd <11:15> and Imm16 <0:15>. Execution side: 32 32-bit registers (Rw, Ra, Rb; RegDst mux choosing Rt or Rd; RegWr), busA and busB, Extender (ExtOp, 16 to 32 bits), ALUSrc mux, ALU (ALUctr) with an Equal output, Data Memory (Adr, Data In, WrEn from MemWr), and a MemtoReg mux driving busW.]

4.13 Decoding & Control

Summary: Meaning of the Control Signals

• ExtOp: “zero”, “sign”

• ALUsrc: 0 → regB; 1 → immed

• ALUctr: “add”, “sub”, “or”

° MemWr: 1 → write memory

° MemtoReg: 0 → ALU; 1 → Mem

° RegDst: 0 → “rt”; 1 → “rd”

° RegWr: 1 → write register

[Figure: the execution side of the single cycle datapath with these signals labeled: RegDst mux (Rt or Rd), RegWr, Extender with ExtOp, ALUSrc mux, ALU with ALUctr and Equal output, Data Memory with MemWr on WrEn, and the MemtoReg mux driving busW.]

4.13 Decoding & Control


The add Instruction

• add rd, rs, rt

    mem[PC]                  Fetch the instruction from memory

    R[rd] ← R[rs] + R[rt]    The actual operation

    PC ← PC + 4              Calculate the next instruction's address

  Bits:   31-26   25-21   20-16   15-11   10-6    5-0
  Field:  op      rs      rt      rd      shamt   funct
  Width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

4.13 Decoding & Control

The Fetch Unit

• nPC_sel: 0 → PC ← PC + 4
           1 → PC ← PC + 4 + SignExt(Imm16) || 00

[Figure: the PC (clocked by Clk) addresses the Inst Memory (Adr). One adder computes PC + 4; a second adder adds the extended imm16 (PC Ext) with "00" appended. A mux controlled by nPC_sel selects which result is loaded into the PC.]

4.13 Decoding & Control

Fetch Unit at Beginning (and End) of add

• Fetch the instruction from instruction memory: Instruction ← mem[PC] (this is the same for all instructions)

• But wait until we get to branch instructions!!

[Figure: the fetch unit as before (PC, Clk, Inst Memory addressed by Adr, two adders, imm16 with "00" appended, mux inputs 0 and 1 selected by nPC_sel), with the Instruction<31:0> output shown.]

4.13 Decoding & Control


The Single Cycle Datapath during Or Immediate

R[rt] ← R[rs] or ZeroExt(Imm16)

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

Control signal values to be filled in: nPC_sel, RegDst, RegWr, ALUSrc, ExtOp, ALUctr, MemWr, MemtoReg.

[Figure: the single cycle datapath. The Instruction Fetch Unit (Clk, Zero and nPC_sel inputs) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15> and Imm16 <0:15>. These feed the 32 32-bit registers (Rw, Ra, Rb; RegDst mux), the Extender (ExtOp), the ALUSrc mux, the ALU (ALUctr, Zero output), the Data Memory (Adr, Data In, WrEn from MemWr) and the MemtoReg mux driving busW.]

4.13 Decoding & Control

The Single Cycle Datapath during Or Immediate

R[rt] ← R[rs] or ZeroExt(Imm16)

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

Control signal values:

  nPC_sel  = +4
  RegDst   = 0
  RegWr    = 1
  ALUSrc   = 1
  ExtOp    = 0
  ALUctr   = Or
  MemWr    = 0
  MemtoReg = 0

[Figure: the single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU, Data Memory and muxes) with the values above applied.]

4.13 Decoding & Control

The Single Cycle Datapath during Store

Data Memory{R[rs] + SignExt[imm16]} ← R[rt]

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

Control signal values to be filled in: nPC_sel, RegDst, RegWr, ALUSrc, ExtOp, ALUctr, MemWr, MemtoReg.

[Figure: the single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU, Data Memory and muxes), with Instruction<31:0> split into Rs <21:25>, Rt <16:20>, Rd <11:15> and Imm16 <0:15>.]

4.13 Decoding & Control

The Single Cycle Datapath during Store

Data Memory{R[rs] + SignExt[imm16]} ← R[rt]

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

Control signal values (x = don't care):

  nPC_sel  = +4
  RegDst   = x
  RegWr    = 0
  ALUSrc   = 1
  ExtOp    = 1
  ALUctr   = Add
  MemWr    = 1
  MemtoReg = x

[Figure: the single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU, Data Memory and muxes) with the values above applied.]

4.13 Decoding & Control

The Single Cycle Datapath during Branch

if (R[rs] – R[rt] == 0) then Zero ← 1; else Zero ← 0

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

Control signal values (x = don't care):

  nPC_sel  = “Br”
  RegDst   = x
  RegWr    = 0
  ALUSrc   = 0
  ExtOp    = x
  ALUctr   = Sub
  MemWr    = 0
  MemtoReg = x

[Figure: the single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU, Data Memory and muxes) with the values above applied; the ALU's Zero output feeds the Instruction Fetch Unit.]

4.13 Decoding & Control

The instruction fetch unit at end of branch

if (Zero == 1) then PC = PC + 4 + SignExt(imm16) × 4; else PC = PC + 4

  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

° What is the encoding of nPC_sel?
  • Direct MUX select?
  • Branch / not branch

° Let’s choose the second option

  nPC_sel  zero?  MUX
  0        x      0
  1        0      0
  1        1      1

[Figure: the fetch unit (PC, Clk, Inst Memory addressed by Adr, two adders, imm16 with "00" appended, mux inputs 0 and 1), now with both nPC_sel and Zero determining the mux select; output Instruction<31:0>.]
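Under the "branch / not branch" encoding, the truth table above reduces to a single AND gate: the mux takes the branch target only when nPC_sel and Zero are both 1. A one-function sketch (pc_mux_select is a hypothetical name):

```python
def pc_mux_select(npc_sel, zero):
    """Return 1 to select the branch-target adder, 0 to select PC + 4."""
    return npc_sel & zero


# Enumerate every (nPC_sel, zero) combination with the resulting select.
table = [(n, z, pc_mux_select(n, z)) for n in (0, 1) for z in (0, 1)]
```

The two rows with nPC_sel = 0 correspond to the single "0 x 0" row of the slide's table, since Zero is a don't-care when the instruction is not a branch.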

4.13 Decoding & Control

Summary: A Single Cycle Processor

[Figure: the single cycle datapath together with its control. Main Control takes the 6-bit op field (Instr<31:26>) and produces nPC_sel, RegDst, ALUSrc and the other control signals, plus a 3-bit ALUop. ALU Control takes the 6-bit func field (Instr<5:0>) and ALUop and produces the 3-bit ALUctr. Instr<15:0> supplies imm16. The datapath is as before: Instruction Fetch Unit (Clk, Zero, nPC_sel) producing Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15> and Imm16 <0:15>; 32 32-bit registers (Rw, Ra, Rb; RegDst mux; RegWr); Extender (ExtOp); ALUSrc mux; ALU (ALUctr, Zero); Data Memory (Adr, Data In, WrEn from MemWr); MemtoReg mux driving busW.]
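The control values worked out on the preceding slides for or immediate, store and branch can be collected into a lookup keyed by opcode. This is a simplified sketch: it folds the slide's two-level Main Control / ALU Control split into one table that yields ALUctr directly, the names Control, MAIN_CONTROL and main_control are hypothetical, None stands for the slides' "x", and the opcode numbers are the standard MIPS encodings for ori, sw and beq.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    nPC_sel: int               # 0 = PC + 4, 1 = "Br"
    RegDst: Optional[int]      # 0 = rt, 1 = rd, None = don't care ("x")
    RegWr: int                 # 1 = write register
    ALUSrc: int                # 0 = regB, 1 = immed
    ExtOp: Optional[int]       # 0 = zero, 1 = sign, None = don't care
    ALUctr: str                # "add", "sub" or "or"
    MemWr: int                 # 1 = write memory
    MemtoReg: Optional[int]    # 0 = ALU, 1 = Mem, None = don't care


MAIN_CONTROL = {
    0x0D: Control(nPC_sel=0, RegDst=0, RegWr=1, ALUSrc=1, ExtOp=0,
                  ALUctr="or", MemWr=0, MemtoReg=0),          # ori
    0x2B: Control(nPC_sel=0, RegDst=None, RegWr=0, ALUSrc=1, ExtOp=1,
                  ALUctr="add", MemWr=1, MemtoReg=None),      # sw
    0x04: Control(nPC_sel=1, RegDst=None, RegWr=0, ALUSrc=0, ExtOp=None,
                  ALUctr="sub", MemWr=0, MemtoReg=None),      # beq
}


def main_control(instr_word):
    """Look up the control signals from the op field, Instr<31:26>."""
    op = (instr_word >> 26) & 0x3F
    return MAIN_CONTROL[op]


# The ori $2, $0, 4 word from the earlier listing.
ori_signals = main_control(0x34020004)
```

Only the three instructions whose signal values appear on the slides are included; other opcodes would raise a KeyError in this sketch.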

4.13 Decoding & Control

Drawback of this Single Cycle Processor

• Long cycle time:
  – Cycle time must be long enough for the load instruction:
      PC’s Clock-to-Q +
      Instruction Memory Access Time +
      Register File Access Time +
      ALU Delay (address calculation) +
      Data Memory Access Time +
      Register File Setup Time +
      Clock Skew

• Cycle time for load is much longer than needed for all other instructions
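To make the sum concrete, here is the same calculation with made-up delays. Every number below is a hypothetical value in picoseconds chosen only for illustration; the assumption that an R-type instruction skips the data memory access is also part of the sketch, not something stated on the slide.

```python
# Hypothetical component delays in picoseconds (illustrative values only).
DELAYS_PS = {
    "pc_clock_to_q": 50,
    "instruction_memory_access": 400,
    "register_file_access": 200,
    "alu_delay": 300,
    "data_memory_access": 400,
    "register_file_setup": 50,
    "clock_skew": 50,
}

# The load path uses every component, so it sets the minimum cycle time.
load_path_ps = sum(DELAYS_PS.values())

# Assumed R-type path: the same components minus the data memory access.
r_type_path_ps = load_path_ps - DELAYS_PS["data_memory_access"]

# In a single cycle design every instruction takes one full cycle, so the
# clock must be at least as slow as the slowest path.
min_cycle_ps = max(load_path_ps, r_type_path_ps)
```

With these numbers the load path is 1450 ps and the assumed R-type path is 1050 ps, so every R-type instruction would spend 400 ps of each cycle idle.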

4.13 Decoding & Control

4.14 Real World Architectures

• We will look at an Intel architecture, which is a CISC machine, and MIPS, which is a RISC machine.
  – CISC is an acronym for complex instruction set computer.
  – RISC stands for reduced instruction set computer.


4.14 Real World Architectures

• The classic Intel architecture, the 8086, was introduced in 1978. It is a CISC architecture.

• It was adopted by IBM for its famed PC, which was released in 1981.

• The 8086 operated on 16-bit data words and supported 20-bit memory addresses.

• Later, to lower costs, the 8-bit 8088 was introduced. Like the 8086, it used 20-bit memory addresses.

What was the largest memory that the 8086 could address?
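The question can be answered with one power of two. The sketch below assumes byte-addressable memory, so n address bits reach 2^n bytes; addressable_bytes is a hypothetical helper name.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with this many bits."""
    return 2 ** address_bits


# 20-bit addresses, as on the 8086 and 8088.
bytes_20_bit = addressable_bytes(20)
megabytes_20_bit = bytes_20_bit // (1024 * 1024)
```

Twenty address bits give 1,048,576 addresses, that is, 1 MB. The same helper with 32 bits gives the 4 GB figure usually quoted for 32-bit addressing.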

4.14 Real World Architectures

• The 8086 had four 16-bit general-purpose registers that could be accessed by the half-word.

• It also had a flags register, an instruction register, and a stack accessed through the values in two other registers, the base pointer and the stack pointer.

• The 8086 had no built-in floating-point processing.

• In 1980, Intel released the 8087 numeric coprocessor, but few users elected to install it because of its cost.


4.14 Real World Architectures

• In 1985, Intel introduced the 32-bit 80386.

• It also had no built-in floating-point unit.

• The 80486, introduced in 1989, was an 80386 that had built-in floating-point processing and cache memory.

• The 80386 and 80486 offered downward compatibility with the 8086 and 8088.

• Software written for the smaller word systems was directed to use the lower 16 bits of the 32-bit registers.


4.14 Real World Architectures

• Currently, Intel’s most advanced 32-bit microprocessor is the Pentium 4.

• It can run as fast as 3.8 GHz. This clock rate is nearly 800 times faster than the 4.77 MHz of the 8086.

• Speed enhancing features include multilevel cache and instruction pipelining.

• Intel, along with many others, is marrying many of the ideas of RISC architectures with microprocessors that are largely CISC.
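The "nearly 800 times" figure is a straightforward ratio of the two clock rates quoted above. A quick check (variable names are only for this example):

```python
PENTIUM_4_HZ = 3_800_000_000    # 3.8 GHz, as quoted above
ORIGINAL_HZ = 4_770_000         # 4.77 MHz, as quoted above

clock_ratio = PENTIUM_4_HZ / ORIGINAL_HZ
```

The ratio comes out between 796 and 797. It compares clock rates only; the work done per cycle also changed, so it is not a direct measure of how much faster programs run.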


4.14 Real World Architectures

• The MIPS family of CPUs has been one of the most successful in its class.

• In 1986 the first MIPS CPU was announced.

• It had a 32-bit word size and could address 4GB of memory.

• Over the years, MIPS processors have been used in general purpose computers as well as in games.

• The MIPS architecture now offers 32- and 64-bit versions.


4.14 Real World Architectures

• MIPS was one of the first RISC microprocessors.

• The original MIPS architecture had only 55 different instructions, as compared with the 8086 which had over 100.

• MIPS was designed with performance in mind: It is a load/store architecture, meaning that only the load and store instructions can access memory.

• The large number of registers in the MIPS architecture keeps bus traffic to a minimum.

How does this design affect performance?

Chapter 4 Conclusion

• The major components of a computer system are its control unit, registers, memory, ALU, and data path.

• A built-in clock keeps everything synchronized.

• Control units can be microprogrammed or hardwired.

• Hardwired control units give better performance, while microprogrammed units are more adaptable to changes.

• Computers run programs through iterative fetch-decode-execute cycles.

• Computers can run programs that are in machine language.

• The Intel architecture is an example of a CISC architecture; MIPS is an example of a RISC architecture.