Transcript
Page 1: Accelerating Performance

Faculty of Computer Science

CMPUT 229 © 2006

Accelerating Performance

The RISC Revolution

Page 2: Accelerating Performance


CISC → RISC

CISC: Complex Instruction Set Architecture

– Complex decoders

– Lots of circuitry

– Some complex instructions may never be used

RISC: Reduced Instruction Set Architecture

– Better use of silicon real estate

– Regular instruction set

Page 3: Accelerating Performance

Clements, pp. 328

Instruction Usage

Fairclough* divided instructions into eight groups:

– Data movement

– Program modification (branch, call, return)

– Arithmetic

– Compare

– Logical

– Shift

– Bit manipulation

– Input/output and miscellaneous

* Fairclough, D. A., “A Unique Microprocessor Instruction Set,” IEEE Micro, May 1982, pp. 8-18.

Page 4: Accelerating Performance


Constants, parameters, and local storage

Tanenbaum* reported that:

• 56% of all constant values are in the -15 to +15 range

• 98% of all constant values are in the -511 to +511 range

• Thus a 5-bit immediate field covers more than half of the literals

Other researchers showed that

• 95% of subroutines require 12 words or less for parameter passing and local storage

• Thus providing this space in the processor reduces processor-memory bus traffic.

* Tanenbaum, Andrew S., “Implications of Structured Programming for Machine Architecture,” Communications of the ACM, Vol. 21, N. 3, March 1978, pp. 237-246

Clements, pp. 329
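As a rough check of the numbers above, here is a minimal sketch (not from the slides) of how one might test whether a literal fits in an n-bit signed immediate field; the sample constants are made up for illustration. A 5-bit two's-complement field holds -16..+15, so it covers the -15..+15 range that accounts for 56% of constants, and a 10-bit field covers -511..+511.

#include <stdio.h>

/* Illustrative check: does value v fit in an n-bit two's-complement
   immediate field? */
static int fits_in_immediate(int v, int n) {
    int min = -(1 << (n - 1));      /* smallest n-bit signed value */
    int max =  (1 << (n - 1)) - 1;  /* largest  n-bit signed value */
    return v >= min && v <= max;
}

int main(void) {
    int samples[] = { 0, 1, -15, 15, 100, -511, 511, 4096 };  /* made-up literals */
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("%5d fits in 5 bits: %d, in 10 bits: %d\n",
               samples[i],
               fits_in_immediate(samples[i], 5),
               fits_in_immediate(samples[i], 10));
    return 0;
}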

Page 5: Accelerating Performance


RISC Characteristics

Enough registers to reduce memory traffic

Instructions operate on three registers

Efficient parameter passing and branching

Don’t implement infrequent (complex) instructions

Aim to execute one instruction per cycle

Fixed instruction length

Clements, pp. 329

Page 6: Accelerating Performance


Register Windows

A window is the set of registers visible to the current subroutine.

A Window Pointer (WP) register indicates the currently active window.

In the Berkeley RISC each window has 32 registers.

A call to a subroutine in the Berkeley RISC uses the instruction:

CALLR Rd, address

The current value of the PC is written into register Rd of the new window.

Clements, pp. 330

Page 7: Accelerating Performance


Berkeley RISC Register Window

Register Name    Register Type

R0 to R9         Global registers common to all windows
R10 to R15       Used to receive parameters from the parent and to pass parameters back to the parent
R16 to R25       Accessed exclusively by the current subroutine
R26 to R31       Used to pass parameters to and from its own child

Clements, pp. 332
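The following is a rough sketch, not Berkeley RISC's actual implementation, of how a window pointer could map the 32 visible registers onto a larger physical register file so that the parent's R26-R31 overlap the child's R10-R15, and of what CALLR does. The window geometry (overlap of 6, file size, register choices) is an assumption for illustration.

#include <stdio.h>

#define N_GLOBAL   10          /* R0..R9 shared by every window            */
#define N_WINDOWED 22          /* R10..R31, remapped on every call         */
#define OVERLAP     6          /* parent's R26..R31 == child's R10..R15    */
#define STEP       (N_WINDOWED - OVERLAP)   /* physical registers per call */

static int phys[512];          /* physical register file (size illustrative) */
static int wp = 0;             /* window pointer: index of current window    */
static int pc = 0x1000;        /* fake program counter for the example       */

/* Map a visible register number (0..31) to a physical register index. */
static int physical(int reg) {
    if (reg < N_GLOBAL) return reg;                 /* globals            */
    return N_GLOBAL + wp * STEP + (reg - N_GLOBAL); /* windowed registers */
}

/* Very rough model of CALLR Rd, address: advance the window pointer,
   then save the return PC into Rd of the *new* window.               */
static void callr(int rd, int address) {
    wp++;                       /* new window; overflow handling omitted   */
    phys[physical(rd)] = pc;    /* return address saved in the new window  */
    pc = address;
}

int main(void) {
    phys[physical(26)] = 42;    /* parent writes an outgoing parameter     */
    callr(25, 0x2000);          /* call; R25 of the new window gets old PC */
    /* The child sees the parent's R26 as its own incoming R10. */
    printf("child R10 = %d (expected 42)\n", phys[physical(10)]);
    printf("child R25 = 0x%x (return address)\n", phys[physical(25)]);
    return 0;
}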

Page 8: Accelerating Performance


Berkeley RISC Register Window

Clements, pp. 333

Page 9: Accelerating Performance


RISC Pipeline

Clements, pp. 335

Page 10: Accelerating Performance


Instruction Overlapping in a RISC Pipeline

Clements, pp. 336
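Since the overlap figure itself is not reproduced here, this small sketch prints the kind of timing diagram the slide refers to, assuming a generic four-stage fetch / decode / execute / store pipeline (the exact stage names and count in Clements' figure may differ). Once the pipeline is full, one instruction completes every cycle.

#include <stdio.h>

/* Instruction i enters stage s in cycle i + s + 1. */
int main(void) {
    const char *stage[] = { "F", "D", "E", "S" };   /* fetch/decode/execute/store */
    const int n_stages = 4, n_instr = 5;

    printf("cycle:        ");
    for (int c = 0; c < n_instr + n_stages - 1; c++) printf("%3d", c + 1);
    printf("\n");

    for (int i = 0; i < n_instr; i++) {
        printf("instruction %d:", i + 1);
        for (int c = 0; c < n_instr + n_stages - 1; c++) {
            int s = c - i;                    /* stage occupied this cycle */
            printf("%3s", (s >= 0 && s < n_stages) ? stage[s] : ".");
        }
        printf("\n");
    }
    return 0;
}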

Page 11: Accelerating Performance


Instruction Overlapping in a RISC Pipeline

Clements, pp. 336

Page 12: Accelerating Performance


Pipeline Hazards

Hazards cause a stall in the pipeline

Branch instructions

• We don’t know which instruction to execute next

Data dependencies

• We don’t know the value of an operand

Page 13: Accelerating Performance


A Bubble in the Pipeline

Clements, pp. 337

Page 14: Accelerating Performance


Delayed Branch

Clements, pp. 338

Page 15: Accelerating Performance


Data Dependency

ADD R1, R2, R3    [R1] ← [R2] + [R3]

ADD R5, R2, R4    [R5] ← [R2] + [R4]

ADD R6, R7, R5    [R6] ← [R7] + [R5]

ADD R2, R2, R4    [R2] ← [R2] + [R4]

Clements, pp. 338
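A small sketch (illustrative, not from Clements) that scans the four ADDs above and reports read-after-write dependences between adjacent instructions; it is such a dependence (the third ADD reading R5 written by the second) that forces a bubble unless forwarding is used.

#include <stdio.h>

/* Each instruction is dest = src1 + src2, mirroring the ADDs on the slide. */
struct instr { int dest, src1, src2; };

int main(void) {
    struct instr prog[] = {
        { 1, 2, 3 },   /* ADD R1, R2, R3 */
        { 5, 2, 4 },   /* ADD R5, R2, R4 */
        { 6, 7, 5 },   /* ADD R6, R7, R5 */
        { 2, 2, 4 },   /* ADD R2, R2, R4 */
    };
    int n = sizeof prog / sizeof prog[0];

    /* Report read-after-write hazards between consecutive instructions:
       instruction i+1 reads a register that instruction i has not yet
       written back, so a simple pipeline must stall (or forward). */
    for (int i = 0; i + 1 < n; i++) {
        if (prog[i + 1].src1 == prog[i].dest || prog[i + 1].src2 == prog[i].dest)
            printf("RAW hazard: instruction %d reads R%d written by instruction %d\n",
                   i + 2, prog[i].dest, i + 1);
    }
    return 0;
}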

Page 16: Accelerating Performance


Data Dependency

ADD R1, R2, R3    [R1] ← [R2] + [R3]

ADD R5, R2, R4    [R5] ← [R2] + [R4]

ADD R6, R7, R5    [R6] ← [R7] + [R5]

ADD R2, R2, R4    [R2] ← [R2] + [R4]

Clements, pp. 338

Page 17: Accelerating Performance


Bubble Because of Data Dependency

Clements, pp. 338

Page 18: Accelerating Performance


Internal Forwarding

Clements, pp. 339

Page 19: Accelerating Performance


A Probabilistic Model for Branch Penalty

Assumptions:

• Non-branch instructions execute in one cycle

• pb: probability that an instruction is a branch

• pt: probability that a branch instruction is taken

• b: additional cycles required if the branch is taken

• There is no penalty if a branch is not taken

• Tave: average time to execute an instruction

Clements, pp. 339

Tave = (1 - pb)·NonBranchTime + pb·BranchTime

BranchTime = pt·TimeTaken + (1 - pt)·TimeNotTaken

= pt·(1 + b) + (1 - pt)·1

= pt + pt·b + 1 - pt = pt·b + 1

Tave = (1 - pb)·1 + pb·(pt·b + 1)

Tave = 1 - pb + pb·pt·b + pb

Tave = 1 + pb·pt·b
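As a quick sanity check of the result, here is a sketch that plugs in some assumed values; pb, pt and b below are made up for illustration and are not taken from Clements.

#include <stdio.h>

int main(void) {
    double pb = 0.20;   /* assumed: 20% of instructions are branches     */
    double pt = 0.60;   /* assumed: 60% of branches are taken            */
    double b  = 2.0;    /* assumed: extra cycles when a branch is taken  */

    /* Tave = 1 + pb * pt * b, from the derivation above. */
    double tave = 1.0 + pb * pt * b;
    printf("Tave = %.2f cycles per instruction\n", tave);  /* 1.24 here */
    return 0;
}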

Page 20: Accelerating Performance


Branch Prediction

Idea: Guess which way a branch will go and start fetching instructions from the right place.

pb: probability that an instruction is a branch

pt: probability that the branch is taken

pc: probability that the prediction is correct

a, b, c, d: penalties (in cycles) for each of the four taken/prediction cases

Page 21: Accelerating Performance


Average Branch Penalty

The average branch penalty is given by

Cave = a·(pt·pc) +

Page 22: Accelerating Performance


Average Branch Penalty

The average branch penalty is given by

Cave = a·(pt·pc) + b·(1 - pt)·(1 - pc)

Page 23: Accelerating Performance


Average Branch Penalty

The average branch penalty is given by

Cave = a·(pt·pc) + b·(1 - pt)·(1 - pc) + c·pt·(1 - pc)

Page 24: Accelerating Performance


Average Branch Penalty

The average branch penalty is given by

Cave = a·(pt·pc) + b·(1 - pt)·(1 - pc) + c·pt·(1 - pc) + d·(1 - pt)·pc
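A companion sketch that evaluates the expression above. The per-case penalties a, b, c, d and the probabilities are illustrative assumptions only, and the case labels in the comments are simply what each probability product in the formula implies.

#include <stdio.h>

int main(void) {
    double pt = 0.60;   /* assumed probability the branch is taken        */
    double pc = 0.85;   /* assumed probability the prediction is correct  */
    double a = 0.0;     /* taken,     predicted correctly                 */
    double b = 3.0;     /* not taken, predicted incorrectly               */
    double c = 3.0;     /* taken,     predicted incorrectly               */
    double d = 0.0;     /* not taken, predicted correctly                 */

    /* Cave = a*(pt*pc) + b*(1-pt)*(1-pc) + c*pt*(1-pc) + d*(1-pt)*pc     */
    double cave = a * pt * pc
                + b * (1 - pt) * (1 - pc)
                + c * pt * (1 - pc)
                + d * (1 - pt) * pc;
    printf("Cave = %.3f extra cycles per branch\n", cave);
    return 0;
}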

Page 25: Accelerating Performance


Approaches to Branch Prediction

Static Branch Prediction:

– A given branch is predicted to be either always taken or never taken

Dynamic Branch Prediction:

– Use the past behavior of the program to predict a branch

– The processor maintains a branch prediction table

• Single-bit predictors ==> accuracy of 80%

• Five-bit predictors ==> accuracy of 98%
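A minimal sketch of the single-bit scheme mentioned above: one bit per branch records whether it was taken last time, and the prediction is that it will do the same again. The table indexing, the branch address, and the example outcome pattern (a loop taken 9 times out of 10) are assumptions for illustration; on this pattern the predictor mispredicts twice per loop execution, giving the roughly 80% accuracy quoted on the slide.

#include <stdio.h>

#define TABLE_SIZE 256

/* One bit per entry: 1 = predict taken, 0 = predict not taken. */
static unsigned char prediction_table[TABLE_SIZE];

static int  predict(unsigned pc)            { return prediction_table[pc % TABLE_SIZE]; }
static void update(unsigned pc, int taken)  { prediction_table[pc % TABLE_SIZE] = (unsigned char)taken; }

int main(void) {
    unsigned pc = 0x400100;                     /* made-up branch address */
    int outcomes[10] = { 1,1,1,1,1,1,1,1,1,0 }; /* taken 9 times, then not taken */
    int correct = 0, total = 0;

    for (int rep = 0; rep < 100; rep++)
        for (int i = 0; i < 10; i++) {
            int taken = outcomes[i];
            correct += (predict(pc) == taken);  /* score the prediction */
            update(pc, taken);                  /* remember last outcome */
            total++;
        }
    printf("accuracy: %d/%d = %.0f%%\n", correct, total, 100.0 * correct / total);
    return 0;
}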