Chapter 13 Reduced Instruction Set Computers (RISC) Pipelining.

Pipelining Review

Pipelining:

— Break instruction cycle into n phases (one stage per phase)

– e.g. Fetch, Decode, ReadOPs, Execute1, Execute2, WriteBack

— Fetch a new instruction each phase

— Maximum speed gain is n

— Hazards reduce the ability to achieve a gain of n

– Types of Hazards

+ Resource

o Hazard occurs when an instruction needs a resource being used by another instruction

+ Data

o RAW (hazard if read can occur before write has finished)

o WAR (hazard if write can occur before read is finished)

o WAW (hazard if writes occur in the unintended order)

+ Control

o Hazard occurs when a wrong fetch decision at a branch results in an extra instruction fetch and a pipeline flush

— Stalling can always “fix” a hazard
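The claim that the maximum gain is n, and that stalls erode it, can be checked with a little arithmetic. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle; the function names and the `stall_cycles` parameter are illustrative and do not come from the slides.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal pipeline with no stalls."""
    # The first instruction takes num_stages cycles to drain through;
    # every later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int, stall_cycles: int = 0) -> float:
    """Speedup over an unpipelined machine (assumption: one cycle per stage)."""
    unpipelined = num_instructions * num_stages
    pipelined = pipeline_cycles(num_instructions, num_stages) + stall_cycles
    return unpipelined / pipelined


if __name__ == "__main__":
    stages = 6  # Fetch, Decode, ReadOPs, Execute1, Execute2, WriteBack
    for count in (1, 10, 1000):
        print(count, round(speedup(count, stages), 2))
    # Adding stall cycles (hazards) pulls the gain further below n.
    print(round(speedup(1000, stages, stall_cycles=200), 2))
```

With a long instruction stream the speedup approaches, but never reaches, the number of stages; every stall cycle moves it further away.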

Data Hazards

• Read after Write (RAW) – true dependency

— A Hazard occurs if the Read occurs before the Write is complete

– e.g. Reg 1 ← Reg 1 + Reg 2 {write occurs after execution}

Reg 3 ← Reg 1 – Reg 3 {read occurs before execution}

• Write after Read (WAR) – anti-dependency

— A Hazard occurs if the Write occurs before the Read happens

– e.g. Reg ← M(ptr) {2 memory accesses – long read} {M(ptr) & M(pc) are same loc}

M(pc) ← Reg {1 memory access – short write}

• Write after Write (WAW) – output dependency

— A Hazard occurs if the two Writes occur in the reverse of the intended order

– e.g. Reg A ← M(PTR) {2 memory accesses – long write}

Reg A ← Reg B {0 memory accesses – short write}
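The three dependency types can be detected mechanically by comparing what two instructions read and write. The following is a minimal sketch; the `Instr` record and the `data_dependencies` function are hypothetical names introduced for illustration. It only reports dependencies: whether one actually becomes a hazard depends on the timing of the particular pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: names of locations read and written."""
    text: str
    reads: frozenset
    writes: frozenset


def data_dependencies(first: Instr, second: Instr) -> set:
    """Dependencies of `second` on an earlier instruction `first`."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")  # second reads what first writes (true dependency)
    if first.reads & second.writes:
        found.add("WAR")  # second overwrites what first reads (anti-dependency)
    if first.writes & second.writes:
        found.add("WAW")  # both write the same location (output dependency)
    return found


if __name__ == "__main__":
    add = Instr("Reg1 <- Reg1 + Reg2", frozenset({"Reg1", "Reg2"}), frozenset({"Reg1"}))
    sub = Instr("Reg3 <- Reg1 - Reg3", frozenset({"Reg1", "Reg3"}), frozenset({"Reg3"}))
    print(data_dependencies(add, sub))

    load = Instr("RegA <- M(PTR)", frozenset({"PTR"}), frozenset({"RegA"}))
    move = Instr("RegA <- RegB", frozenset({"RegB"}), frozenset({"RegA"}))
    print(data_dependencies(load, move))
```

The first pair mirrors the RAW example above and the second pair mirrors the WAW example.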

Control Hazard

Control Hazards occur when a wrong fetch decision results in a new instruction fetch and the pipeline being flushed

Solutions include:

— Multiple Pipeline streams

— Prefetching the branch target

— Using a Loop Buffer

— Branch Prediction

— Delayed Branch

— Reordering of Instructions

— Multiple Copies of Registers (backups)

Recall Key Features of RISC

—Limited and simple instruction set

—Memory access instructions limited to memory <-> registers

—Operations are register to register

—Large number of general purpose registers (and use of compiler technology to optimize register use)

—Emphasis on optimising the instruction pipeline (& memory management)

—Hardwired for speed (no microcode)

Supporting Pipelining with Registers

• Software contribution

— Require compiler to allocate registers

– Allocate based on most used variables in a given time

+ Requires sophisticated program analysis

• Hardware contribution

— Have more registers

– Thus more variables will be in registers

Register uses

• Store local scalar variables in registers

— Reduces memory accesses

• Every procedure (function) call changes locality (typically lots of procedure calls are encountered)

— Parameters must be passed

— Partial context switch

— Results must be returned

— Variables from calling program must be restored

— Partial context switch

• Store global variables in registers?

Using “Register Windows”

Observations:

• Typically only a few local variables and passed parameters

• Typically limited range of depth of calls

Implications:

If we partition the register set:

• We can use multiple small sets of registers (one set per context)

• Let Calls switch to a new set of registers

• Let Returns switch back to the previously used set of registers

Using “Register Windows”

• Partition register set into:

— Parameter registers (Passed Parameters)

— Local registers (includes local variables)

— Temporary registers (Passing Parameters)

• Then:

— Temporary registers from one set overlap parameter registers from the next

• And:

— This provides parameter passing without moving data (just move one pointer)
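One way to see why the overlap passes parameters without copying is to map each window's logical registers onto a single physical register file. This is a sketch under stated assumptions: the layout (parameters, then locals, then temporaries), the sizes, and the `physical_index` function are all hypothetical, chosen only to illustrate the idea.

```python
def physical_index(window: int, logical: int, num_params: int,
                   num_locals: int, num_windows: int) -> int:
    """Map (window, logical register) to a physical register number.

    Assumed layout inside a window: parameter registers, then local
    registers, then temporary registers (as many temporaries as parameters).
    """
    window_size = num_params + num_locals + num_params
    if not 0 <= logical < window_size:
        raise ValueError("logical register outside the window")
    # Consecutive windows are offset by less than a full window, so the
    # temporaries of one window land on the parameters of the next.
    stride = num_params + num_locals
    total = stride * num_windows  # the register file is circular
    return (window * stride + logical) % total


if __name__ == "__main__":
    P, L, W = 6, 10, 8  # illustrative sizes only
    caller_temp0 = physical_index(2, P + L + 0, P, L, W)
    callee_param0 = physical_index(3, 0, P, L, W)
    print(caller_temp0, callee_param0)  # same physical register
```

A call therefore only has to advance the window number; the value the caller wrote into its temporary register is already sitting in the callee's parameter register.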

Overlapping “Register Windows”

Picture of Calls & Returns:

Circular Buffer diagram of Overlapping “Register Windows”

Operation of Circular Buffer

• When a call is made, a current window pointer is moved to show the currently active register window

• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory

• A saved window pointer indicates where the next saved window should be restored
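The behaviour of the circular buffer on deep call chains can be sketched with a toy model that tracks only which windows are resident, not their contents. The `RegisterWindows` class and its fields are hypothetical; it also assumes every window is usable, whereas a real design may hold one window in reserve.

```python
class RegisterWindows:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, num_windows: int):
        self.num_windows = num_windows
        self.cwp = 0        # current window pointer (index in the circular buffer)
        self.depth = 0      # call nesting depth of the active procedure
        self.resident = 1   # number of windows currently held in registers
        self.saved = []     # nesting depths of windows saved to memory, oldest first

    def call(self) -> bool:
        """Move to a new window; return True if the oldest window was saved."""
        spilled = False
        if self.resident == self.num_windows:
            # All windows in use: save the one furthest back in the call nesting.
            self.saved.append(self.depth - self.resident + 1)
            self.resident -= 1
            spilled = True
        self.depth += 1
        self.cwp = (self.cwp + 1) % self.num_windows
        self.resident += 1
        return spilled

    def ret(self) -> bool:
        """Move back one window; return True if it had to be restored."""
        if self.depth == 0:
            raise RuntimeError("return without a matching call")
        self.depth -= 1
        self.cwp = (self.cwp - 1) % self.num_windows
        self.resident -= 1
        if self.resident == 0:
            # The caller's window was saved earlier: restore the most recent one.
            self.saved.pop()
            self.resident = 1
            return True
        return False


if __name__ == "__main__":
    windows = RegisterWindows(num_windows=4)
    print([windows.call() for _ in range(5)])
    print([windows.ret() for _ in range(5)])
```

With four windows, the first three calls are free and only the fourth and fifth force a save, which matches the observation that call depth usually stays within a limited range.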

Global Variables

How should we accommodate Global Variables?

• Allocate by the compiler to memory ?

• Have a static set of registers for global variables ?

• Put them in cache ?

Registers v Cache – which is better?

Large Register File                            | Cache
All local scalars                              | Recently-used local scalars
Individual variables                           | Blocks of memory
Compiler-assigned global variables             | Recently-used global variables
Save/Restore based on procedure nesting depth  | Save/Restore based on cache replacement algorithm
Register addressing                            | Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization

Basis:

• Assume a relatively small number of registers (16-32)

• Optimizing register use is left to the compiler

• HLL programs have no explicit references to registers

Then:

• Assign a symbolic, or virtual, register to each candidate variable

• Map (unlimited) symbolic registers to (limited) real registers

• Symbolic registers that are not used at the same time can share real registers

• If you run out of real registers some variables will use memory
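Deciding which symbolic registers are "not used at the same time" comes down to comparing their live ranges. The sketch below assumes each symbolic register is live over one contiguous span of instruction numbers; the function name and the sample ranges are made up for illustration.

```python
from itertools import combinations


def interference_edges(live_ranges: dict) -> set:
    """Pairs of symbolic registers whose live ranges overlap.

    live_ranges maps a symbolic register name to an inclusive
    (first_use, last_use) pair of instruction numbers (an assumption;
    real live ranges need not be contiguous).
    """
    edges = set()
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(live_ranges.items()), 2
    ):
        if a_start <= b_end and b_start <= a_end:
            edges.add((a, b))  # live at the same time: cannot share a real register
    return edges


if __name__ == "__main__":
    ranges = {"A": (0, 3), "B": (2, 5), "C": (4, 7), "D": (6, 8)}
    print(sorted(interference_edges(ranges)))
```

In this sample, A and C never overlap, so they could share one real register; the same holds for B and D.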

Graph Coloring Algorithm for Register Assignment

Given:

• A graph of nodes and edges

• Nodes represent symbolic registers

• Two symbolic registers that are used in the same program fragment are joined by an edge

Then:

• Assign a color to each node

• Adjacent nodes (connected by an edge) must have different colors

• Assign a minimum number of colors

And then:

• Try to color the graph with n colors, where n is the number of real registers

• Nodes that cannot be colored must be placed in memory
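The steps above can be sketched with a simple greedy coloring. Note the substitution: finding a true minimum coloring is hard in general, so this sketch uses a highest-degree-first heuristic instead, and the `color_registers` function and the sample graph are hypothetical.

```python
def color_registers(interference: dict, num_registers: int):
    """Greedy coloring of an interference graph.

    interference maps each symbolic register to the set of symbolic
    registers it shares an edge with (assumed symmetric).
    Returns (assignment, spilled): real register numbers for the colored
    nodes, and the nodes that must live in memory.
    """
    assignment = {}
    spilled = []
    # Heuristic: handle the most constrained (highest-degree) nodes first.
    order = sorted(interference, key=lambda node: (-len(interference[node]), node))
    for node in order:
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [reg for reg in range(num_registers) if reg not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)  # no color left: place this variable in memory
    return assignment, spilled


if __name__ == "__main__":
    graph = {
        "A": {"B", "C"},
        "B": {"A", "C"},
        "C": {"A", "B", "D"},
        "D": {"C", "E"},
        "E": {"D"},
    }
    print(color_registers(graph, 3))
    print(color_registers(graph, 2))
```

With three real registers every node in the sample graph gets a color; with two, the triangle A-B-C cannot be colored and one of its nodes is placed in memory.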

Graph Coloring Algorithm Example

RISC Features Again

• Key features

— Large number of general purpose registers (and use of compiler technology to optimize register use)

— Limited and simple instruction set

— Memory access instructions – memory <-> registers

— Operations are register to register

— Emphasis on optimising the instruction pipeline & memory management

— Hardwired for speed (no microcode)

Memory to Memory vs Register to Register Operations

(RISC operations are Register to Register only)

Actually these numbers are bits, not bytes

RISC Pipelining Basics

• Define two phases of execution for register based instructions

— I: Instruction fetch

— E: Execute

– ALU operation with register input and output

• For load and store there will be three

— I: Instruction fetch

— E: Execute

– Calculate memory address

— D: Memory

– Register to memory or memory to register operation

Effects of RISC Pipelining

(Allows 2 memory accesses per stage)

(E1 register read, E2 execute & register write. Particularly beneficial if E phase is long)

(2 stage since E and D are effectively one stage)

Optimization of RISC Pipelining

• Delayed branch

— Uses a branch that does not take effect until after the following instruction has executed

— The position of that following instruction is the delay slot
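A compiler can often fill the delay slot with useful work instead of a no-op. The sketch below shows one simple policy: move the instruction just before the branch into the slot when the branch does not read anything that instruction writes, and otherwise insert a NOOP. The `Instr` record and `fill_delay_slots` are hypothetical, and the sketch assumes the moved instruction is not itself a branch target.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record for illustration."""
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    is_branch: bool = False


NOP = Instr("NOOP")


def fill_delay_slots(program):
    """Return a new instruction list with one delay slot after each branch."""
    out = []
    last_is_slot = False  # True when out[-1] already occupies a delay slot
    for instr in program:
        if not instr.is_branch:
            out.append(instr)
            last_is_slot = False
            continue
        prev = out[-1] if out else None
        movable = (
            prev is not None
            and not last_is_slot
            and not (prev.writes & instr.reads)  # branch must not depend on it
        )
        if movable:
            out.pop()
            out.extend([instr, prev])  # previous instruction now fills the slot
        else:
            out.extend([instr, NOP])   # nothing safe to move: waste the slot
        last_is_slot = True
    return out


if __name__ == "__main__":
    program = [
        Instr("LOAD X, rA", writes=frozenset({"rA"})),
        Instr("ADD 1, rA", reads=frozenset({"rA"}), writes=frozenset({"rA"})),
        Instr("JUMP L", is_branch=True),
        Instr("STORE rA, Z", reads=frozenset({"rA"})),
    ]
    for instr in fill_delay_slots(program):
        print(instr.text)
```

Because the unconditional jump does not depend on the ADD, the two are swapped and no cycle is wasted; a conditional branch that tests rA would instead be followed by a NOOP.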

Normal vs Delayed Branch

(Text diagram is wrong)
