LECTURE 3: THE PROCESSOR

Abridged version of Patterson & Hennessy (2017): Ch. 4

1


Introduction
CPU performance factors
- Instruction count: determined by ISA and compiler
- CPI and cycle time: determined by CPU hardware
We will examine two RISC-V implementations
- A simplified version
- A more realistic pipelined version
Simple subset, shows most aspects
- Memory reference: ld, sd
- Arithmetic/logical: add, sub, and, or
- Control transfer: beq
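The instruction-count, CPI and cycle-time factors above combine in the classic performance equation: CPU time = instruction count × CPI × clock cycle time. A quick numeric sketch in Python (the figures are made up purely for illustration, they are not from the lecture):

# CPU time = instruction count x CPI x clock cycle time (illustrative numbers only)
instruction_count = 1_000_000      # set by ISA and compiler
cpi = 1.5                          # average cycles per instruction (hardware)
cycle_time_s = 800e-12             # 800 ps clock period (hardware)
cpu_time_s = instruction_count * cpi * cycle_time_s
print(f"CPU time = {cpu_time_s * 1e3:.2f} ms")   # 1.20 ms with these numbers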

2


Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class:
  - Use ALU to calculate: arithmetic result, memory address for load/store, or branch comparison
  - Access data memory for load/store
  - PC ← target address or PC + 4
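A minimal sketch of that per-instruction flow for the small subset above, written as Python pseudocode (my own illustration; the pre-decoded tuple format and the step function are assumptions, not the lecture's datapath):

def step(pc, instr_mem, regs, data_mem):
    # Fetch and decode: instructions are pre-decoded here as
    # (op, rd, rs1, rs2, imm) tuples keyed by PC.
    op, rd, rs1, rs2, imm = instr_mem[pc]
    a, b = regs[rs1], regs[rs2]                   # register read
    if op in ("add", "sub", "and", "or"):         # ALU: arithmetic result
        regs[rd] = {"add": a + b, "sub": a - b,
                    "and": a & b, "or": a | b}[op]
    elif op == "ld":                              # ALU: memory address, then load
        regs[rd] = data_mem[a + imm]
    elif op == "sd":                              # ALU: memory address, then store
        data_mem[a + imm] = b
    elif op == "beq":                             # ALU: branch comparison
        return pc + imm if a == b else pc + 4     # PC <- target or PC + 4
    return pc + 4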

3


Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
- Longest delay determines clock period

4


Full Datapath

5


The Main Control Unit
Control signals derived from the instruction

6


Datapath With Control

7


R-Type Instruction

8


Load Instruction

9


BEQ Instruction

10


Performance Issues
- Longest delay determines clock period
- Critical path: the load instruction
  instruction memory → register file → ALU → data memory → register file
- Not feasible to vary the period for different instructions
  - Violates the design principle: making the common case fast
- We will improve performance by pipelining
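To make the critical-path point concrete, here is a small check using the stage delays from the textbook's running example (200 ps for each memory access and for the ALU, 100 ps for a register-file read or write; these specific numbers are my addition, not stated on this slide):

# The load instruction exercises every element, so it sets the clock period.
delays_ps = {"instr_mem": 200, "reg_read": 100, "alu": 200,
             "data_mem": 200, "reg_write": 100}
load_path = ("instr_mem", "reg_read", "alu", "data_mem", "reg_write")
clock_period_ps = sum(delays_ps[stage] for stage in load_path)
print(clock_period_ps)   # 800 ps; every instruction is stuck with this period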

11


RISC-V Pipeline
Five stages, one step per stage

1. IF: Instruction fetch from memory

2. ID: Instruction decode & register read

3. EX: Execute operation or calculate address

4. MEM: Access memory operand

5. WB: Write result back to register

12


Pipeline Performance
Single-cycle (Tc = 800 ps)

Pipelined (Tc = 200 ps)

13


Multi-Cycle Pipeline Diagram
Form showing resource usage

14


Multi-Cycle Pipeline Diagram
Traditional form

15


Pipeline Speedup
- If all stages are balanced (i.e., all take the same time):
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup is due to increased throughput
  - Latency (time for each instruction) does not decrease
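A quick numeric check of the relationship above, reusing the 800 ps and 200 ps figures from the Pipeline Performance slide (the arithmetic is mine):

t_between_nonpipelined_ps = 800
num_stages = 5
ideal_ps = t_between_nonpipelined_ps / num_stages   # 160 ps if stages were balanced
actual_ps = 200                                     # limited by the slowest stage
print(ideal_ps, actual_ps)
# Throughput speedup here is 800/200 = 4x rather than 5x because the stages are
# not balanced; each instruction's latency is now 5 * 200 = 1000 ps, not less.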

16


Pipeline Summary
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards: structure, data, control
- Instruction set design affects the complexity of the pipeline implementation

The BIG Picture

17


Single-Cycle Pipeline Diagram
State of the pipeline in a given cycle

18


Pipelined Control

19


Pipelining and ISA Design
RISC-V ISA designed for pipelining
- All instructions are 32 bits
  - Easier to fetch and decode in one cycle
  - c.f. x86: 1- to 17-byte instructions
- Few and regular instruction formats
  - Can decode and read registers in one step
- Load/store addressing
  - Can calculate the address in the 3rd stage, access memory in the 4th stage

20


Hazards
Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazards
  - Need to wait for a previous instruction to complete its data read/write
- Control hazards
  - Deciding on a control action depends on a previous instruction

21


Structure Hazards
Conflict for use of a resource
In a RISC-V pipeline with a single memory
- Load/store requires data access
- Instruction fetch would have to stall for that cycle
  - Would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction/data memories
- Or separate instruction/data caches
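A tiny sketch of why a single shared memory causes trouble: walk a short instruction sequence through the five stages and flag any cycle in which two instructions need the one memory port (the program and the simulation are my own illustration):

program = ["ld", "add", "sub", "and"]          # one instruction issued per cycle
for cycle in range(1, len(program) + 5):
    users = []
    for i, op in enumerate(program):
        stage = cycle - i                      # 1=IF, 2=ID, 3=EX, 4=MEM, 5=WB
        if stage == 1:
            users.append(f"{op} (IF)")
        if stage == 4 and op in ("ld", "sd"):
            users.append(f"{op} (MEM)")
    if len(users) > 1:                         # both want the single memory
        print(f"cycle {cycle}: conflict between {users} -> bubble needed")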

22


Data Hazards
An instruction depends on completion of a data access by a previous instruction:
  add x19, x0, x1
  sub x2, x19, x3

23


Code Scheduling to Avoid Stalls
Reorder code to avoid use of a load result in the next instruction
C code for a = b + e; c = b + f;

Original order (13 cycles, two load-use stalls):
  ld   x1, 0(x0)
  ld   x2, 8(x0)
  (stall)
  add  x3, x1, x2
  sd   x3, 24(x0)
  ld   x4, 16(x0)
  (stall)
  add  x5, x1, x4
  sd   x5, 32(x0)

Reordered (11 cycles, no stalls):
  ld   x1, 0(x0)
  ld   x2, 8(x0)
  ld   x4, 16(x0)
  add  x3, x1, x2
  sd   x3, 24(x0)
  add  x5, x1, x4
  sd   x5, 32(x0)
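The cycle counts can be reproduced with a small model: on a five-stage pipeline with forwarding, total cycles = instructions + 4 (to fill the pipeline) + one stall per load whose result is used by the very next instruction. The counter and the tuple encoding of the code are my own sketch, not from the slides:

def cycles(seq):
    # seq entries are (op, destination, source registers...)
    stalls = 0
    for prev, cur in zip(seq, seq[1:]):
        if prev[0] == "ld" and prev[1] in cur[2:]:   # load-use in the next slot
            stalls += 1
    return len(seq) + 4 + stalls

original = [("ld", "x1", "x0"), ("ld", "x2", "x0"),
            ("add", "x3", "x1", "x2"), ("sd", None, "x3", "x0"),
            ("ld", "x4", "x0"), ("add", "x5", "x1", "x4"),
            ("sd", None, "x5", "x0")]
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]
print(cycles(original), cycles(reordered))   # 13 11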

24


Data Hazards in ALU Instructions
Consider this sequence:
  sub x2,  x1, x3
  and x12, x2, x5
  or  x13, x6, x2
  add x14, x2, x2
  sd  x15, 100(x2)
We can resolve hazards with forwarding
How do we detect when to forward?

25


Forwarding (aka Bypassing)
Use a result when it is computed
- Don’t wait for it to be stored in a register
- Requires extra connections in the datapath
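The detection side of forwarding compares the destination registers held in the EX/MEM and MEM/WB pipeline registers against the source registers of the instruction currently in EX. A sketch for the first ALU operand (the namedtuple wrappers and the function name are mine; the comparisons follow the textbook's forwarding-unit conditions):

from collections import namedtuple

ExMem = namedtuple("ExMem", "reg_write rd")
MemWb = namedtuple("MemWb", "reg_write rd")
IdEx = namedtuple("IdEx", "rs1 rs2")

def forward_a(ex_mem, mem_wb, id_ex):
    # Select the source for the ALU's first operand (rs1).
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == id_ex.rs1:
        return "EX/MEM"    # EX hazard: use the result computed last cycle
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == id_ex.rs1:
        return "MEM/WB"    # MEM hazard: use the result from two instructions back
    return "ID/EX"         # no hazard: use the value read from the register file

# sub x2, x1, x3 followed immediately by and x12, x2, x5:
print(forward_a(ExMem(True, 2), MemWb(False, 0), IdEx(rs1=2, rs2=5)))   # EX/MEM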

26


Dependencies & Forwarding

27


Datapath with Forwarding

28


Load-Use Data Hazard
Can’t always avoid stalls by forwarding
- If the value is not computed when it is needed
- Can’t forward backward in time!

29


Load-Use Hazard Detection
Check when the using instruction is decoded in the ID stage
- ALU operand register numbers in the ID stage are given by IF/ID.RegisterRs1, IF/ID.RegisterRs2
- Load-use hazard when
  ID/EX.MemRead and
  ((ID/EX.RegisterRd = IF/ID.RegisterRs1) or (ID/EX.RegisterRd = IF/ID.RegisterRs2))
- If detected, stall and insert a bubble
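In code form the check is just a couple of comparisons made while the using instruction sits in ID (a sketch with the slide's field names flattened into plain arguments):

def load_use_hazard(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    # True when the instruction in EX is a load whose destination register
    # is a source of the instruction currently being decoded.
    return id_ex_mem_read and id_ex_rd in (if_id_rs1, if_id_rs2)

# ld x2, 8(x0) in EX while add x3, x1, x2 is in ID:
print(load_use_hazard(True, 2, 1, 2))   # True -> stall and insert a bubble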

30


How to Stall the Pipeline
- Force control values in the ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- Prevent update of PC and the IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
- 1-cycle stall allows MEM to read data for ld
  - Can subsequently forward to the EX stage

31


Load-Use Data Hazard

Stall inserted here

32


Datapath with Hazard Detection

33


Stalls and Performance
- Stalls reduce performance
  - But are required to get correct results
- The compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure

The BIG Picture

34


Control Hazards
Branch determines flow of control
- Fetching the next instruction depends on the branch outcome
- Pipeline can’t always fetch the correct instruction
  - Still working on the ID stage of the branch
In the RISC-V pipeline
- Need to compare registers and compute the target early in the pipeline
- Add hardware to do it in the ID stage

35


Stall on Branch
Wait until the branch outcome is determined before fetching the next instruction

36


Branch Hazards
If the branch outcome is determined in MEM, the instructions fetched after the branch must be flushed (set control values to 0)

37


Reducing Branch Delay
Move the hardware that determines the outcome to the ID stage
- Target address adder
- Register comparator

Example: branch taken
  36: sub x10, x4, x8
  40: beq x1, x3, 16    // PC-relative branch to 40+16*2 = 72
  44: and x12, x2, x5
  48: or  x13, x2, x6
  52: add x14, x4, x2
  56: sub x15, x6, x7
  ...
  72: ld  x4, 50(x7)
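The added ID-stage hardware amounts to one adder and one equality comparator. A sketch of the decision it makes (the function and signal names are mine, not the lecture's; byte_offset is assumed to be the already-scaled PC-relative displacement):

def resolve_beq_in_id(pc, byte_offset, regs, rs1, rs2):
    taken = regs[rs1] == regs[rs2]          # register comparator
    target = pc + byte_offset               # target address adder
    next_pc = target if taken else pc + 4
    flush_fetched = taken                   # squash the instruction fetched after the branch
    return next_pc, flush_fetched

# Branch at 40 jumping to 72 when x1 == x3, as in the example above:
regs = {1: 7, 3: 7}
print(resolve_beq_in_id(40, 32, regs, 1, 3))   # (72, True)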

38


Example: Branch Taken

39


Example: Branch Taken

40


Branch Prediction
- Longer pipelines can’t readily determine the branch outcome early
  - Stall penalty becomes unacceptable
- Predict the outcome of the branch
  - Only stall if the prediction is wrong
- In the RISC-V pipeline
  - Can predict branches not taken
  - Fetch the instruction after the branch, with no delay

41


1-Bit Predictor: Shortcoming
Inner loop branches are mispredicted twice!

outer: …
       …
inner: …
       …
       beq …, …, inner
       …
       beq …, …, outer

- Mispredict as taken on the last iteration of the inner loop
- Then mispredict as not taken on the first iteration of the inner loop next time around

42


2-Bit Predictor
Only change the prediction on two successive mispredictions
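A sketch of a per-branch 2-bit saturating counter, indexed by branch address the way a branch history table is (the names and the initial "weakly not taken" state are my assumptions):

counters = {}                                  # branch PC -> 2-bit counter (0..3)

def predict_taken(pc):
    return counters.get(pc, 1) >= 2            # 0,1: not taken   2,3: taken

def update(pc, taken):
    c = counters.get(pc, 1)
    counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

# Inner-loop branch: taken 9 times, then falls through; run the loop twice.
mispredicts = 0
for _ in range(2):
    for taken in [True] * 9 + [False]:
        mispredicts += predict_taken(0x40) != taken
        update(0x40, taken)
print(mispredicts)   # 3 here; a 1-bit predictor started the same way mispredicts 4 times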

43


More-Realistic Branch Prediction
Static branch prediction
- Based on typical branch behavior
- Example: loop and if-statement branches
  - Predict backward branches taken
  - Predict forward branches not taken
Dynamic branch prediction
- Hardware measures actual branch behavior
  - e.g., records recent history of each branch
- Assume future behavior will continue the trend
  - When wrong, stall while re-fetching, and update history

44


Dynamic Branch Prediction
- In deeper and superscalar pipelines, the branch penalty is more significant
- Use dynamic prediction
  - Branch prediction buffer (aka branch history table)
  - Indexed by recent branch instruction addresses
  - Stores outcome (taken/not taken)
- To execute a branch
  - Check the table, expect the same outcome
  - Start fetching from fall-through or target
  - If wrong, flush the pipeline and flip the prediction

45


Exceptions and Interrupts
“Unexpected” events requiring a change in flow of control
- Different ISAs use the terms differently
Exception
- Arises within the CPU
  - e.g., undefined opcode, syscall, …
Interrupt
- From an external I/O controller
Dealing with them without sacrificing performance is hard

46


Handling Exceptions
- Save PC of the offending (or interrupted) instruction
  - In RISC-V: Supervisor Exception Program Counter (SEPC)
- Save an indication of the problem
  - In RISC-V: Supervisor Exception Cause Register (SCAUSE)
  - 64 bits, but most bits unused
  - Exception code field: 2 for undefined opcode, 12 for hardware malfunction, …
- Jump to handler
  - Assume at 0000 0000 1C09 0000 (hex)
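The entry sequence can be summarised in a few lines (this models in Python what the hardware does; the handler address and cause codes are the ones quoted on the slide):

HANDLER_BASE = 0x0000_0000_1C09_0000   # assumed fixed handler address (from the slide)
UNDEFINED_OPCODE = 2
HARDWARE_MALFUNCTION = 12

def take_exception(pc, cause, csr):
    csr["SEPC"] = pc            # save PC of the offending (or interrupted) instruction
    csr["SCAUSE"] = cause       # record why the exception was raised
    return HANDLER_BASE         # hardware then fetches from the handler

csr = {}
next_pc = take_exception(pc=0x1000, cause=UNDEFINED_OPCODE, csr=csr)
print(hex(next_pc), csr)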

47


Fallacies
Pipelining is easy (!)
- The basic idea is easy
- The devil is in the details
  - e.g., detecting data hazards
Pipelining is independent of technology
- So why haven’t we always done pipelining?
- More transistors make more advanced techniques feasible
- Pipeline-related ISA design needs to take account of technology trends
  - e.g., predicated instructions

48


Pitfalls
Poor ISA design can make pipelining harder
- e.g., complex instruction sets (VAX, IA-32)
  - Significant overhead to make pipelining work
  - IA-32 micro-op approach
- e.g., complex addressing modes
  - Register update side effects, memory indirection
- e.g., delayed branches
  - Advanced pipelines have long delay slots

49


Concluding Remarks
- ISA influences the design of datapath and control
- Datapath and control influence the design of the ISA
- Pipelining improves instruction throughput using parallelism
  - More instructions completed per second
  - Latency for each instruction not reduced
- Hazards: structural, data, control

50