Instruction Sets and Pipelining Cover basics of instruction set types and fundamental ideas of pipelining Later in the course we will go into more depth.

Instruction Sets and Pipelining

Cover basics of instruction set types and fundamental ideas of pipelining

Later in the course we will go into more depth about implementation and more advanced topics

Outline

Instruction set characteristics Review of the CPU and actions taken to

fetch, decode, and execute a sample instruction

Review of the fundamental idea of pipelining Fundamentals of RISC architectures

Outline - continued

Pipelining with RISC Five-stage pipeline for a RISC processor Hazards and pipeline stalls CPI definition and calculation

Structural hazards Data hazards

Instruction Set Architecture Classes

Consider the next diagram (figure 2.1) and the table figure 2.2 on page 93 to examine the differences between the four classes of instruction sets.

Historically stack and accumulator style were popular architectures, after 1980 most have used load-store (the right-most in both diagrams).

General-purpose Register Computers

GPR computers have two advantages: First, registers are faster than memory Second, registers are a more efficient way for

compilers than other forms of internal storage Registers can be used to hold variables

This reduces memory traffic vice the alternative of having memory hold variables

DSP Comment

DSPs can differ. The remark in the first line of page 94 is worth noting: because of the dominance of hand-optimized code in the DSP community the DSPs may have many special-purpose registers and only a few general-purpose registers.

Two Major Instruction Set Characteristics

ALU instructions have two or three operands How many of the operands may be memory

addresses in ALU instructions Consider carefully the two tables on page 94

and 95. Especially note the advantages and disadvantages.

The CPU

Review the next two slides that depict a simple CPU and an associated slide that shows one instruction broken down into the different machine actions

linesData

Addresslines

busMemory

Carry-in

ALU

PC

MAR

MDR

Y

Z

Add

XOR

Sub

bus

IR

TEMP

R0

controlALU

lines

Control signals

R n 1-

Instruction

decoder and

Internal processor

control logic

A B

Figure 7.1. Single-bus organization of the datapath inside a processor.

MUXSelect

Constant 4

Remarks

The MAR and MDR components can operate independently from the ALU and IR decoder since they deal with memory.

The instruction or IR decoding section of the graph is quite independent from the rest of the diagram (it doesn’t require the bus for example).

Remarks - continued

In effect three parts of this diagram operate somewhat independently: the ALU, the IR decoder and the MAR/MDR which sends data to and from the memory.

Pipelining builds on this independence: different instructions can be using the IR decode, the ALU and the memory transfer functions somewhat independently.

Step Action

1 PCout , MAR in , Read, Select4,Add, Zin

2 Zout , PC in , Y in , WMFC

3 MDRout , IR in

4 R3out , MAR in , Read

5 R1out , Y in , WMFC

6 MDRout , SelectY,Add, Zin

7 Zout , R1in , End

Figure7.6. Control sequenceforexecutionof theinstructionAdd (R3),R1.

Remarks

The first three steps mainly fetch the next instruction (1 PCout through 3 IRin).

Steps 4-6 are mainly involved in executing the actual adding part of the instruction (6 … Add)

Step 7 is involved in writing the output of the Add instruction

The decode part of the operation is more implicit.

Pipelining Review

This is from the CS 4014 course and represents a four-stage pipeline vice the five-state RISC pipeline that we will study later.

The difference is that this diagram doesn’t have a mem stage – it being combined in other stages

Pipelining Review

Formally pipelining is an implementation technique takes advantage of the parallelism that exists among the actions needed to execute an instruction.

Informally pipelining is like an assembly line.

F4I4

F1

F2

F3

I1

I2

I3

D1

D2

D3

D4

E1

E2

E3

E4

W1

W2

W3

W4

Instruction

Figure 8.2. A 4-stage pipeline.

Clock cycle 1 2 3 4 5 6 7

(a) Instruction execution divided into four steps

F : Fetchinstruction

D : Decodeinstructionand fetchoperands

E: Executeoperation

W : Writeresults

Interstage buffers

(b) Hardware organization

B1 B2 B3

Time

Remarks

Refer to the earlier CPU diagram and the example of a sample instruction and the actual CPU operations.

Note that the fetch, decode, execute and write parts of instructions could (in principle) be separated and make use of independent parts of the CPU.

Appendix A

A1 and A2 are the only sections that we will study for now.

The time required to move an instruction one step down the pipeline is a processor cycle.

If the stages are perfectly balanced then the average time per instruction in a pipelined processor is:

time per instruction / number of stages

Remark

Depending on how you look at it the reduction due to pipelining can be viewed either as decreasing the CPI or as decreasing the clock cycle time or both.

RISC Instruction Set Implementation in a Pipeline

5 stages IF – instruction fetch ID – instruction decode Ex – execution Mem – memory access WB – write-back (result to register)

Look over figure A.1 on page A-7 and the next diagram.

Pipeline = series of data paths shifted in time

IM – instruction memory, DM – data memory

Remarks

Note usage of pipeline registers discussed on page A-9 and the fact that they are edge-triggered so values change instantaneously on a clock edge.

Performance Issues in Pipelining

Go through the example on pages A-10, 11 Stalls and performance are handled with

calculations like those found on page A-12 Although there are different ways of defining

performance and recalling our in-class discussion last week: note the second paragraph remark: The ideal CPI on a pipelined processor is almost

always 1

Pipeline Hazards

Structural – resource conflicts when the hardware cannot support all combinations of instructions

Data – instruction depends on the result of a previous instruction (time dependent – can’t read the result until it is computed and stored by another instruction)

Control – caused by branchs

Hazards cause Stalls

The pipeline must be held up temporarily or stalled when a hazard occurs.

As a result, no new instructions are fetched during the stall (study why in the last paragraph of page A-11).

A processor with only one memory port will generate a conflict. The load instruction needs the memory port in CC 4 at the same time as instruction 3 needs the memory port.

Remark on figure A.4

Look over figure A.5 on page A-15 where the conflict is settled by introducing a stall for instruction i+3.

Data Hazards

The pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequential processing.

Example

Example: two sequential statements: R1 = R2 + R3;

R5 = R4 – R1; Now consider the pipelining of these two

statements, see page A-17 and figure A.5 (also the next diagram). In pipelining R1’s value from the first instruction may not be updated before it is needed in the second instruction.

Branch/Control Hazards

Look over the examples in figures A.11 through A.14.

Instruction Sets and Pipelining Cover basics of instruction set types and fundamental ideas of pipelining Later in the course we will go into more depth.

Documents

major instruction

classes of instruction

memory addresses

stage pipeline vice

memory traffic vice

diagram figure

generalpurpose registers

diagram doesnt