Instruction Sets and Pipelining Cover basics of instruction set types and fundamental ideas of pipelining Later in the course we will go into more depth about implementation and more advanced topics
Jan 11, 2016
Instruction Sets and Pipelining
Cover basics of instruction set types and fundamental ideas of pipelining
Later in the course we will go into more depth about implementation and more advanced topics
Outline
Instruction set characteristics Review of the CPU and actions taken to
fetch, decode, and execute a sample instruction
Review of the fundamental idea of pipelining Fundamentals of RISC architectures
Outline - continued
Pipelining with RISC Five-stage pipeline for a RISC processor Hazards and pipeline stalls CPI definition and calculation
Structural hazards Data hazards
Instruction Set Architecture Classes
Consider the next diagram (figure 2.1) and the table figure 2.2 on page 93 to examine the differences between the four classes of instruction sets.
Historically stack and accumulator style were popular architectures, after 1980 most have used load-store (the right-most in both diagrams).
General-purpose Register Computers
GPR computers have two advantages: First, registers are faster than memory Second, registers are a more efficient way for
compilers than other forms of internal storage Registers can be used to hold variables
This reduces memory traffic vice the alternative of having memory hold variables
DSP Comment
DSPs can differ. The remark in the first line of page 94 is worth noting: because of the dominance of hand-optimized code in the DSP community the DSPs may have many special-purpose registers and only a few general-purpose registers.
Two Major Instruction Set Characteristics
ALU instructions have two or three operands How many of the operands may be memory
addresses in ALU instructions Consider carefully the two tables on page 94
and 95. Especially note the advantages and disadvantages.
The CPU
Review the next two slides that depict a simple CPU and an associated slide that shows one instruction broken down into the different machine actions
linesData
Addresslines
busMemory
Carry-in
ALU
PC
MAR
MDR
Y
Z
Add
XOR
Sub
bus
IR
TEMP
R0
controlALU
lines
Control signals
R n 1-
Instruction
decoder and
Internal processor
control logic
A B
Figure 7.1. Single-bus organization of the datapath inside a processor.
MUXSelect
Constant 4
Remarks
The MAR and MDR components can operate independently from the ALU and IR decoder since they deal with memory.
The instruction or IR decoding section of the graph is quite independent from the rest of the diagram (it doesn’t require the bus for example).
Remarks - continued
In effect three parts of this diagram operate somewhat independently: the ALU, the IR decoder and the MAR/MDR which sends data to and from the memory.
Pipelining builds on this independence: different instructions can be using the IR decode, the ALU and the memory transfer functions somewhat independently.
Step Action
1 PCout , MAR in , Read, Select4,Add, Zin
2 Zout , PC in , Y in , WMFC
3 MDRout , IR in
4 R3out , MAR in , Read
5 R1out , Y in , WMFC
6 MDRout , SelectY,Add, Zin
7 Zout , R1in , End
Figure7.6. Control sequenceforexecutionof theinstructionAdd (R3),R1.
Remarks
The first three steps mainly fetch the next instruction (1 PCout through 3 IRin).
Steps 4-6 are mainly involved in executing the actual adding part of the instruction (6 … Add)
Step 7 is involved in writing the output of the Add instruction
The decode part of the operation is more implicit.
Pipelining Review
This is from the CS 4014 course and represents a four-stage pipeline vice the five-state RISC pipeline that we will study later.
The difference is that this diagram doesn’t have a mem stage – it being combined in other stages
Pipelining Review
Formally pipelining is an implementation technique takes advantage of the parallelism that exists among the actions needed to execute an instruction.
Informally pipelining is like an assembly line.
F4I4
F1
F2
F3
I1
I2
I3
D1
D2
D3
D4
E1
E2
E3
E4
W1
W2
W3
W4
Instruction
Figure 8.2. A 4-stage pipeline.
Clock cycle 1 2 3 4 5 6 7
(a) Instruction execution divided into four steps
F : Fetchinstruction
D : Decodeinstructionand fetchoperands
E: Executeoperation
W : Writeresults
Interstage buffers
(b) Hardware organization
B1 B2 B3
Time
Remarks
Refer to the earlier CPU diagram and the example of a sample instruction and the actual CPU operations.
Note that the fetch, decode, execute and write parts of instructions could (in principle) be separated and make use of independent parts of the CPU.
Appendix A
A1 and A2 are the only sections that we will study for now.
The time required to move an instruction one step down the pipeline is a processor cycle.
If the stages are perfectly balanced then the average time per instruction in a pipelined processor is:
time per instruction / number of stages
Remark
Depending on how you look at it the reduction due to pipelining can be viewed either as decreasing the CPI or as decreasing the clock cycle time or both.
RISC Instruction Set Implementation in a Pipeline
5 stages IF – instruction fetch ID – instruction decode Ex – execution Mem – memory access WB – write-back (result to register)
Look over figure A.1 on page A-7 and the next diagram.
Pipeline = series of data paths shifted in time
IM – instruction memory, DM – data memory
Remarks
Note usage of pipeline registers discussed on page A-9 and the fact that they are edge-triggered so values change instantaneously on a clock edge.
Performance Issues in Pipelining
Go through the example on pages A-10, 11 Stalls and performance are handled with
calculations like those found on page A-12 Although there are different ways of defining
performance and recalling our in-class discussion last week: note the second paragraph remark: The ideal CPI on a pipelined processor is almost
always 1
Pipeline Hazards
Structural – resource conflicts when the hardware cannot support all combinations of instructions
Data – instruction depends on the result of a previous instruction (time dependent – can’t read the result until it is computed and stored by another instruction)
Control – caused by branchs
Hazards cause Stalls
The pipeline must be held up temporarily or stalled when a hazard occurs.
As a result, no new instructions are fetched during the stall (study why in the last paragraph of page A-11).
A processor with only one memory port will generate a conflict. The load instruction needs the memory port in CC 4 at the same time as instruction 3 needs the memory port.
Remark on figure A.4
Look over figure A.5 on page A-15 where the conflict is settled by introducing a stall for instruction i+3.
Data Hazards
The pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequential processing.
Example
Example: two sequential statements: R1 = R2 + R3;
R5 = R4 – R1; Now consider the pipelining of these two
statements, see page A-17 and figure A.5 (also the next diagram). In pipelining R1’s value from the first instruction may not be updated before it is needed in the second instruction.
Branch/Control Hazards
Look over the examples in figures A.11 through A.14.