RECAP B649 Parallel Architectures and Programming
RECAP - Indiana University Bloomington · RECAP B649 Parallel Architectures and Programming. B629: Practical Compiling for Modern Machines RECAP 2. ... Write Invalidate Cache Coherence

Jul 19, 2020

dariahiddleston
RECAP B649

Parallel Architectures and Programming

B629: Practical Compiling for Modern Machines

RECAP

Recap

• ILP
• Exploiting ILP
• Dynamic scheduling
• Thread-level Parallelism
• Memory Hierarchy
• Other topics through student presentations
• Virtual Machines

ILP: Pipelining

Pipelining: Adding Latches

Pipelining: Adding Forwarding

Pipelining: Adding Branch Delay Slots

Extending the Basic Pipeline

Exploiting ILP Through Compiler Techniques

• Loop unrolling
• Making use of branch delay slots
• Static branch prediction
• Loop fusion
• Unroll and jam
• ...
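As a rough illustration of the first bullet, the sketch below unrolls a loop by a factor of four by hand. Python is used only for readability; a compiler would apply the same transformation to machine-level loops. The function names and the unroll factor are illustrative assumptions, not taken from the slides.

```python
def saxpy_rolled(a, x, y):
    """Reference loop: one element per iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_unrolled4(a, x, y):
    """Same computation, unrolled by 4 (assumed factor) with a cleanup loop."""
    n = len(x)
    out = [0.0] * n
    main = n - n % 4  # largest multiple of 4 that fits in n
    for i in range(0, main, 4):
        # Four independent statements per iteration: fewer branches and
        # more work for a scheduler to overlap.
        out[i] = a * x[i] + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]
    for i in range(main, n):
        # Cleanup loop handles the leftover 0 to 3 elements.
        out[i] = a * x[i] + y[i]
    return out
```

The unrolled body exposes four independent operations between loop branches, which is what gives a static scheduler room to hide latencies.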

Dynamic Branch Prediction

[Figure: address bits are used to index the Branch Prediction Buffer (BPB); entries shown as 1 / 0]

Two-bit Branch Predictor
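The slide's state diagram did not survive extraction, so here is a minimal sketch of the usual two-bit saturating-counter scheme. The table size, the initial counter value, and the 4-byte instruction alignment used for indexing are assumptions made for this example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Counter values 0,1 predict not taken; 2,3 predict taken.
        # Starting at 1 (weakly not taken) is an arbitrary assumption.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # Assumes 4-byte instructions, so the two lowest bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Feed a sequence of actual outcomes for one branch; count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a loop branch that is taken many times and then falls through once, the counter only drops from 3 to 2 at the loop exit, so the next execution of the loop is still predicted taken on its first iteration.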

General n-bit Correlating Branch Predictors

[Figure: address bits combined with an m-bit global shift register index the Branch Prediction Buffer (BPB); each entry is n bits]

Use Branch Target Buffers (BTBs) for caching branch targets
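The figure labels above (address bits, an m-bit global shift register, n-bit entries) can be sketched as follows. This is one plausible way to combine the index bits, assumed for illustration; real designs differ in how address and history bits are merged. Branch target buffers are not modeled here.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history select among n-bit counters."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.m = m
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)      # counter >= threshold means "taken"
        self.addr_mask = (1 << index_bits) - 1
        self.history = 0                   # global shift register, m bits
        self.table = [0] * (1 << (index_bits + m))

    def _index(self, pc):
        # Assumed indexing: concatenate low PC bits with the global history.
        addr = (pc >> 2) & self.addr_mask
        return (addr << self.m) | self.history

    def predict(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)
```

With history in the index, a branch that strictly alternates between taken and not taken becomes predictable once the counters warm up, which a history-free table of two-bit counters cannot achieve.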

Dynamic Scheduling: Tomasulo’s Approach

Tomasulo’s Approach: Observations

• RAW hazards handled by waiting for operands
• WAR and WAW hazards handled by register renaming
  ★ only WAR and WAW hazards between instructions currently in the pipeline are handled; is this a problem?
  ★ larger number of hidden names reduces name dependences
• CDB implements forwarding
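The renaming bullet can be illustrated with a small sketch. It is a simplification: in Tomasulo's scheme the hidden names are reservation-station tags, while here they are just fresh strings `t0`, `t1`, ... (a naming choice made for this example), and each instruction is reduced to a hypothetical `(dest, src1, src2)` tuple.

```python
from itertools import count


def rename(instructions):
    """Give every destination a fresh hidden name.

    instructions: list of (dest, src1, src2) architectural register names.
    Returns the renamed list in the same order.
    """
    fresh = count()
    latest = {}   # architectural register -> most recent hidden name
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the latest producer, so true (RAW) dependences remain.
        s1 = latest.get(src1, src1)
        s2 = latest.get(src2, src2)
        # Every write gets a new name, which removes WAR and WAW conflicts.
        d = f"t{next(fresh)}"
        latest[dest] = d
        renamed.append((d, s1, s2))
    return renamed
```

For example, `rename([("F0","F2","F4"), ("F6","F0","F8"), ("F8","F10","F14"), ("F6","F10","F8")])` leaves the second instruction reading the original `F8` while the third writes a new name, so the WAR hazard on `F8` and the WAW hazard on `F6` both disappear.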

Tomasulo’s Approach + Speculation

Fields in ROB
1. Instruction type
2. Destination
3. Value
4. Ready
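The four fields listed above map naturally onto a small record type. The sketch below pairs that record with an in-order commit loop; the class and method names, and the instruction-type strings, are assumptions made for this example rather than anything the slides specify.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    instr_type: str              # e.g. "alu", "load", "store", "branch" (assumed labels)
    destination: str             # register name (or memory address for stores)
    value: Optional[int] = None
    ready: bool = False


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # head of the deque is the oldest instruction

    def issue(self, instr_type, destination):
        entry = ROBEntry(instr_type, destination)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        # Execution may finish out of order; only the entry is updated here.
        entry.value = value
        entry.ready = True

    def commit(self, registers):
        """Retire ready entries from the head, strictly in program order."""
        committed = []
        while self.entries and self.entries[0].ready:
            e = self.entries.popleft()
            if e.instr_type in ("alu", "load"):
                registers[e.destination] = e.value   # architectural state changes only here
            committed.append(e)
        return committed
```

Because architectural registers are written only at commit, a younger instruction that finishes early cannot change visible state until every older instruction has retired.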

Observations on Speculation

• Speculation enables precise exception handling
  ★ defer exception handling until instruction ready to commit
• Branches are critical to performance
  ★ prediction accuracy
  ★ latency of misprediction detection
  ★ misprediction recovery time
• Must avoid hazards through memory
  ★ WAR and WAW already taken care of (how?)
  ★ for RAW
    ✴ don’t allow load to proceed if an active ROB entry has Destination field matching with A field of load
    ✴ maintain program order for effective address computation (why?)
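The RAW check on loads can be sketched as a scan over older ROB entries. The dictionary layout and the function name are hypothetical, and the sketch assumes that only store entries carry a memory address in their Destination field and that all older stores already have their addresses computed (which is what in-order effective address computation provides).

```python
def load_may_proceed(rob_entries, load_position, load_address):
    """Return True if the load can read memory without a RAW hazard.

    rob_entries: list of dicts in program order, each with keys
                 "type" and "destination" (assumed layout).
    load_position: index of the load within rob_entries.
    load_address: the load's effective address (its A field).
    """
    for entry in rob_entries[:load_position]:
        # An older, uncommitted store to the same address must go first.
        if entry["type"] == "store" and entry["destination"] == load_address:
            return False
    return True
```

Only entries older than the load are examined; a store that follows the load in program order cannot be the source of a RAW hazard for it.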

Multiple Issue Processor Types

Dyn. Scheduling+Multiple Issue+Speculation

• Design parameters
  ★ two-way issue (two instruction issues per cycle)
  ★ pipelined and separate integer and FP functional units
  ★ dynamic scheduling, but not out-of-order issue
  ★ speculative execution
• Task per issue: assign reservation station and update pipeline control tables (i.e., control signals)
• Two possible techniques
  ★ do the task in half a clock cycle
  ★ build wider logic to issue any pair of instructions together
• Modern processors use both (4 or more way superscalar)

Shared-Memory Multiprocessors

Distributed-Memory Multiprocessors

Other Ways to Categorize Parallel Programming

Parallel Programs
• data vs task parallel
• client-server vs p-to-p / master-slave vs symm.
• shared mem vs msg passing
• tightly vs loosely coupled
• threads vs producer-consumer
• coarse vs fine grained
• SPMD vs MPMD
• recursive vs iterative
• synch. vs asynch.

Write Invalidate Cache Coherence Protocol for Write-Back Caches
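The slide's state diagram is not in the transcript, so the following is a minimal sketch of a three-state (Modified / Shared / Invalid) write-invalidate protocol with write-back caches, tracking a single memory block. The class name, the state letters, and the single-block simplification are assumptions for illustration.

```python
class SnoopingBus:
    """Write-invalidate, write-back MSI sketch for one memory block."""

    def __init__(self, num_caches, memory_value=0):
        self.state = ["I"] * num_caches   # per-cache state: "M", "S" or "I"
        self.data = [None] * num_caches   # per-cache copy of the block
        self.memory = memory_value

    def _write_back_owner(self, new_state):
        # A cache holding the block Modified must write it back before
        # another cache may use it; it then moves to new_state.
        for c, s in enumerate(self.state):
            if s == "M":
                self.memory = self.data[c]
                self.state[c] = new_state
                if new_state == "I":
                    self.data[c] = None

    def read(self, cpu):
        if self.state[cpu] == "I":        # read miss goes on the bus
            self._write_back_owner("S")
            self.data[cpu] = self.memory
            self.state[cpu] = "S"
        return self.data[cpu]

    def write(self, cpu, value):
        if self.state[cpu] != "M":        # write miss or upgrade from Shared
            self._write_back_owner("I")
            for c in range(len(self.state)):
                if c != cpu:              # invalidate every other copy
                    self.state[c] = "I"
                    self.data[c] = None
        self.state[cpu] = "M"
        self.data[cpu] = value            # memory is NOT updated (write-back)
```

After a write, memory holds a stale value until another cache misses on the block and forces the owner to write it back, which is the defining behavior of a write-back cache under this protocol.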

Distributed Memory+Directories

Directory-Based Cache Coherence
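As a hedged sketch of what a directory tracks, the code below keeps a state (uncached, shared, or exclusive) and a sharer set per block, and returns the messages a home node might send on a miss. The message names and the class interface are assumptions made for this example; the slides' own diagram is not in the transcript.

```python
class Directory:
    """Per-block directory state plus the set of nodes holding a copy."""

    def __init__(self):
        self.state = {}     # block -> "uncached" | "shared" | "exclusive"
        self.sharers = {}   # block -> set of node ids

    def read_miss(self, block, node):
        """Return the (message, target) pairs sent for a read miss."""
        msgs = []
        st = self.state.get(block, "uncached")
        sharers = self.sharers.setdefault(block, set())
        if st == "exclusive":
            owner = next(iter(sharers))
            msgs.append(("fetch", owner))      # owner supplies the latest data
        msgs.append(("data_reply", node))
        sharers.add(node)
        self.state[block] = "shared"
        return msgs

    def write_miss(self, block, node):
        """Return the (message, target) pairs sent for a write miss."""
        msgs = []
        st = self.state.get(block, "uncached")
        sharers = self.sharers.setdefault(block, set())
        if st == "shared":
            msgs += [("invalidate", s) for s in sorted(sharers) if s != node]
        elif st == "exclusive":
            owner = next(iter(sharers))
            msgs.append(("fetch_invalidate", owner))
        msgs.append(("data_reply", node))
        self.sharers[block] = {node}           # requester becomes sole owner
        self.state[block] = "exclusive"
        return msgs
```

Unlike a snooping bus, invalidations here go only to the nodes recorded as sharers, which is what lets the scheme scale on a distributed-memory machine.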

Other Topics

• x86 assembly programming
• VLIW / EPIC
• Vector processors
• Embedded systems
• Scientific applications
• GPUs and GPGPUs
• CUDA and OpenCL
• Interconnection networks
• Virtualization

WHAT’S NEXT?

Future

• Continued importance of parallel programming
  ★ challenge: how to program multiprocessors
  ★ role of programming languages and compilers
• Convergence or specialization?
  ★ “standardization” of general purpose architecture
  ★ migration of “special-purpose” CPUs for general use

Landscape of Parallel Computing Research: A View from Berkeley
