Multiprocessor Architecture Basics Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Ahmed Khademzadeh Azad University of Mashhad [email protected]

Dec 30, 2015


Ross Paul

Multiprocessor Architecture Basics

Companion slides for The Art of Multiprocessor Programming

by Maurice Herlihy & Nir Shavit

Ahmed Khademzadeh Azad University of Mashhad

[email protected]


Multiprocessor Architecture

• Abstract models are (mostly) OK to understand algorithm correctness and progress

• To understand how concurrent algorithms actually perform

• You need to understand something about multiprocessor architectures


Pieces

• Processors
• Threads
• Interconnect
• Memory
• Caches


Processors

• Cycle:
  – Fetch and execute one instruction
• Cycle times change
  – 1980: 10 million cycles/sec
  – 2005: 3,000 million cycles/sec


Computer Architecture

• Measure time in cycles
  – Absolute cycle times change
• Memory access: ~100s of cycles
  – Changes slowly
  – Mostly gets worse


Threads

• Execution of a sequential program
• Software, not hardware
• A processor can run a thread
• Put it aside
  – Thread does I/O
  – Thread runs out of time
• Run another thread


Interconnect

• Bus
  – Like a tiny Ethernet
  – Broadcast medium
  – Connects
    • Processors to memory
    • Processors to processors
• Network
  – Tiny LAN
  – Mostly used on large machines

[Figure labels: SMP, memory]


Interconnect

• Interconnect is a finite resource
• Processors can be delayed if others are consuming too much
• Avoid algorithms that use too much bandwidth


Analogy

• You work in an office
• When you leave for lunch, someone else takes over your office.
• If you don’t take a break, a security guard shows up and escorts you to the cafeteria.
• When you return, you may get a different office


Processor and Memory are Far Apart

[Figure: processor and memory, joined by the interconnect]

Reading from Memory

[Figure sequence: the processor sends an address, waits (zzz…), and then receives the value.]

Writing to Memory

[Figure sequence: the processor sends an address and a value, waits (zzz…), and then receives an ack.]

Cache: Reading from Memory

[Figure sequence: the processor's read of an address goes through the cache.]

Cache Hit

[Figure sequence: the cache is asked for the data (“?”) and answers “Yes!”]

Cache Miss

[Figure sequence: the cache is asked for an address (“?”) and answers “No…”; the remaining frames show only the cache.]


Local Spinning

• With caches, spinning becomes practical

• First time
  – Load flag bit into cache
• As long as it doesn’t change
  – Hit in cache (no interconnect used)
• When it changes
  – One-time cost
  – See cache coherence below
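A minimal sketch of local spinning, assuming a test-and-test-and-set style lock. The names `LocalSpinLock` and `LocalSpinDemo` are hypothetical and not taken from the slides. The inner loop only reads the flag, so while the flag is unchanged the reads are expected to hit in the local cache; the atomic update is attempted only once the flag looks free.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical lock used only to illustrate local spinning.
class LocalSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Spin on a plain read: as long as the flag does not change,
            // this is expected to be served from the local cache.
            while (held.get()) {
                Thread.onSpinWait(); // spin hint, available since Java 9
            }
            // The flag looked free: pay the one-time cost of an atomic update.
            if (held.compareAndSet(false, true)) {
                return;
            }
        }
    }

    void unlock() {
        held.set(false);
    }
}

class LocalSpinDemo {
    private static int counter;

    // Starts `threads` workers that each add `perThread` to a shared counter
    // under the lock, then returns the final count.
    static int run(int threads, int perThread) {
        counter = 0;
        LocalSpinLock lock = new LocalSpinLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter;
    }
}
```

Whether the spinning reads really stay off the interconnect depends on the hardware and its coherence protocol; the code only expresses the access pattern.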


Granularity

• Caches operate at a larger granularity than a word

• Cache line: fixed-size block containing the address
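As a small illustration of granularity, the sketch below maps byte addresses to cache lines. The 64-byte line size and the class name `CacheLineMath` are assumptions for the example; real line sizes vary by processor and cache level.

```java
// Hypothetical helper; LINE_SIZE is an assumed value, not a property of any particular machine.
class CacheLineMath {
    static final int LINE_SIZE = 64;

    // Index of the fixed-size block (cache line) that contains the byte address.
    static long lineOf(long address) {
        return address / LINE_SIZE;
    }

    // True when both addresses fall inside the same cache line.
    static boolean sameLine(long a, long b) {
        return lineOf(a) == lineOf(b);
    }
}
```

Under this assumption, addresses 0 and 63 share a line, while 63 and 64 do not.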


Locality

• If you use an address now, you will probably use it again soon
  – Fetch from cache, not memory
• If you use an address now, you will probably use a nearby address soon
  – In the same cache line


Hit Ratio

• Proportion of requests that hit in the cache

• Measure of effectiveness of caching mechanism

• Depends on locality of application
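To see why the hit ratio matters, here is a rough cost model. It assumes each access costs either a fixed number of cycles on a hit or a fixed number on a miss; the class and method names are hypothetical, and the cycle counts in the usage note are only the order-of-magnitude figures quoted in these slides.

```java
// Hypothetical cost model: a simplification, not a measurement.
class HitRatioModel {
    // Proportion of requests that hit in the cache.
    static double hitRatio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    // Expected cycles per access when a hit costs hitCycles and a miss costs missCycles.
    static double averageCycles(double hitRatio, double hitCycles, double missCycles) {
        return hitRatio * hitCycles + (1.0 - hitRatio) * missCycles;
    }
}
```

For example, with 90 hits out of 100 requests, 2 cycles per hit, and 100 cycles per miss, the model gives about 11.8 cycles per access, so the misses dominate the cost even though they are rare.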

L1 and L2 Caches

• L1: small & fast, 1 or 2 cycles, ~16 byte line
• L2: larger and slower, 10s of cycles, ~1K line size


When a Cache Becomes Full…

• Need to make room for new entry
• By evicting an existing entry
• Need a replacement policy
  – Usually some kind of least recently used heuristic
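A software analogue of a least-recently-used replacement policy can be sketched with `java.util.LinkedHashMap` in access order. Hardware caches typically use cheaper approximations of LRU, so this only illustrates the policy itself; the class name `LruCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently used entry once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // third argument: order entries by access, not insertion
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting keys 1 and 2, reading key 1, and then inserting key 3 evicts key 2, because key 1 was used more recently.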


Fully Associative Cache

• Any line can be anywhere in the cache
  – Advantage: can replace any line
  – Disadvantage: hard to find lines


Direct Mapped Cache

• Every address has exactly 1 slot
  – Advantage: easy to find a line
  – Disadvantage: must replace fixed line


K-way Set Associative Cache

• Each slot holds k lines
  – Advantage: pretty easy to find a line
  – Advantage: some choice in replacing line
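One common way to pick the slot for an address is to take the line number modulo the number of slots. The sketch below assumes that scheme; the class name, parameters, and sizes are hypothetical. A direct-mapped cache is the case where each slot holds one line, and a fully associative cache is the case with a single slot.

```java
// Hypothetical address-to-slot mapping, assuming a simple modulo scheme.
class SetAssociativeMapping {
    // Slot (set) that the address maps to.
    static int slotIndex(long address, int lineSize, int numSlots) {
        long line = address / lineSize;
        return (int) (line % numSlots);
    }

    // Tag stored with the line so it can be told apart from other lines in the same slot.
    static long tag(long address, int lineSize, int numSlots) {
        long line = address / lineSize;
        return line / numSlots;
    }
}
```

With 64-byte lines and 4 slots, addresses 0 and 256 both map to slot 0 with different tags. In a direct-mapped cache they would evict each other; with k = 2 both lines can stay cached.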


Contention

• Alice and Bob are both writing research papers on aardvarks.

• Alice has encyclopedia vol AA-AC
• Bob asks library for it
  – Library asks Alice to return it
  – Alice returns it & rerequests it
  – Library asks Bob to return it…


Contention

• Good to avoid memory contention.
• Idle processors
• Consumes interconnect bandwidth


Contention

• Alice is still writing a research paper on aardvarks.

• Carol is writing a tourist guide to the German city of Aachen

• No conflict?
  – Library deals with volumes, not articles
  – Both require same encyclopedia volume


False Sharing

• Two processors may conflict over disjoint addresses

• If those addresses lie on the same cache line


False Sharing

• Large cache line size
  – Increases locality
  – But also increases likelihood of false sharing
• Sometimes need to “scatter” data to avoid this problem
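One way to "scatter" data is to leave unused space between values that different threads update. The sketch below assumes 64-byte cache lines and 8-byte longs, so it uses only every eighth element of an array; the class name `ScatteredCounters` and the stride are assumptions, and the JVM makes no promise about cache-line layout.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical per-thread counters spaced out to reduce the chance of false sharing.
class ScatteredCounters {
    // Assumption: 64-byte lines and 8-byte longs, so 8 longs fit in one line.
    static final int STRIDE = 8;

    private final AtomicLongArray slots;

    ScatteredCounters(int counters) {
        slots = new AtomicLongArray(counters * STRIDE);
    }

    // Only every STRIDE-th element is used; the rest is padding.
    void increment(int counter) {
        slots.incrementAndGet(counter * STRIDE);
    }

    long get(int counter) {
        return slots.get(counter * STRIDE);
    }
}
```

Two used elements are always 64 bytes apart, so under the 64-byte assumption they cannot lie in the same line. The cost is memory: seven of every eight elements are never touched.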


Cache Coherence

• Processor A and B both cache address x

• A writes to x
  – Updates cache
• How does B find out?
• Many cache coherence protocols in literature

MESI

• Modified
  – Have modified cached data, must write back to memory
• Exclusive
  – Not modified, I have only copy
• Shared
  – Not modified, may be cached elsewhere
• Invalid
  – Cache contents not meaningful
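The four states can be read as a small state machine for one cache line in one cache. The sketch below is a simplified model with hypothetical method names; it ignores the bus messages and the write-back itself, and real protocols differ in detail.

```java
// Simplified MESI state machine for a single line in a single cache.
enum Mesi {
    MODIFIED, EXCLUSIVE, SHARED, INVALID;

    // This processor reads the line. A miss loads it as SHARED if another
    // cache already holds it, otherwise as EXCLUSIVE.
    Mesi onLocalRead(boolean cachedElsewhere) {
        if (this == INVALID) {
            return cachedElsewhere ? SHARED : EXCLUSIVE;
        }
        return this;
    }

    // This processor writes the line (after other copies have been invalidated).
    Mesi onLocalWrite() {
        return MODIFIED;
    }

    // Another processor reads the line. A MODIFIED copy must be written back
    // first; any valid copy then becomes SHARED.
    Mesi onRemoteRead() {
        return this == INVALID ? INVALID : SHARED;
    }

    // Another processor writes the line, so this copy is no longer meaningful.
    Mesi onRemoteWrite() {
        return INVALID;
    }
}
```

For example, a first load takes a line from INVALID to EXCLUSIVE, a load by a second processor leaves both copies SHARED, and a write by the second processor makes its copy MODIFIED and the first copy INVALID.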

Processor Issues Load Request

[Figure: caches and memory on a shared bus; one processor puts “load x” on the bus.]

Memory Responds

[Figure: memory answers “Got it!” and supplies the data; the requesting cache is marked E.]

Processor Issues Load Request

[Figure: another processor puts “Load x” on the bus; the cache that holds the data is marked E.]

Other Processor Responds

[Figure: the cache that holds the data answers “Got it” and supplies it; that cache goes from E to S and the requesting cache is marked S.]

Modify Cached Data

[Figure: both caches hold the data, marked S.]

Write-Through Cache

[Figure: a processor puts “Write x!” on the bus; both caches are marked S.]

Write-Through Caches

• Immediately broadcast changes
• Good
  – Memory, caches always agree
  – More read hits, maybe
• Bad
  – Bus traffic on all writes
  – Most writes to unshared data
  – For example, loop indexes …

[Slide callout: “show stoppers”]


Write-Back Caches

• Accumulate changes in cache
• Write back when line evicted
  – Need the cache for something else
  – Another processor wants it
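The difference in bus traffic between the two write policies can be seen in a toy model of a single cached line. Everything here is hypothetical and heavily simplified: the model only counts writes that reach the bus and ignores coherence.

```java
// Hypothetical one-line cache that only counts bus writes.
class OneLineCache {
    private final boolean writeThrough;
    private boolean dirty = false;
    private long busWrites = 0;

    OneLineCache(boolean writeThrough) {
        this.writeThrough = writeThrough;
    }

    // A store to the cached line.
    void write() {
        if (writeThrough) {
            busWrites++;   // every write is broadcast immediately
        } else {
            dirty = true;  // the change accumulates in the cache
        }
    }

    // The line is evicted; a write-back cache flushes it only if it changed.
    void evict() {
        if (dirty) {
            busWrites++;
            dirty = false;
        }
    }

    long busWrites() {
        return busWrites;
    }

    // Bus writes caused by `iterations` updates of a loop index, followed by one eviction.
    static long loopIndexTraffic(boolean writeThrough, int iterations) {
        OneLineCache cache = new OneLineCache(writeThrough);
        for (int i = 0; i < iterations; i++) {
            cache.write();
        }
        cache.evict();
        return cache.busWrites();
    }
}
```

In this model, 1000 updates of a loop index cost 1000 bus writes with write-through and a single bus write with write-back.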

Invalidate

[Figure: a processor puts “Invalidate x” on the bus; its own cache goes from S to M and the other cache goes from S to I.]


Multicore Architectures

• The university president
  – Alarmed by fall in productivity
• Puts Alice, Bob, and Carol in same corridor
  – Private desks
  – Shared bookcase
• Contention costs go way down

Old-School Multiprocessor

[Figure: caches connected to memory by a bus.]

Multicore Architecture

[Figure: caches, bus, and memory.]


Multicore

• Private L1 caches
• Shared L2 caches
• Communication between same-chip processors now very fast
• Different-chip processors still not so fast


NUMA Architectures

• Alice and Bob transfer to NUMA State University

• No centralized library
• Each office basement holds part of the library


Distributed Shared-Memory Architectures

• Alice’s basement has volumes that start with A
  – Aardvark papers are convenient: run downstairs
  – Zebra papers are inconvenient: run across campus


SMP vs NUMA

[Figure labels: SMP, memory, NUMA]

• SMP: symmetric multiprocessor
• NUMA: non-uniform memory access
• CC-NUMA: cache-coherent …


Recall: Real Memory is Relaxed

• Remember the flag principle?
  – Alice and Bob’s flag variables are initially false

• Alice writes true to her flag and reads Bob’s

• Bob writes true to his flag and reads Alice’s

• One must see the other’s flag true


Not Necessarily So

• Sometimes the compiler reorders memory operations

• Can improve
  – cache performance
  – interconnect use
• But can cause unexpected concurrent interactions


Write Buffers

[Figure label: address]

• Absorbing
• Batching


Volatile

• In Java, if a variable is declared volatile, reads and writes of volatile variables won’t be reordered with one another

• Expensive, so use it only when needed
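The flag principle can be tried out with two volatile flags. In the sketch below each thread raises its own flag and then reads the other's; because both flags are volatile, the Java memory model does not allow both threads to read false. The class and field names are hypothetical.

```java
// Hypothetical demonstration of the flag principle with volatile flags.
class FlagPrinciple {
    static volatile boolean aliceFlag;
    static volatile boolean bobFlag;

    // Written by the worker threads, read by the caller after join().
    static boolean aliceSawBob;
    static boolean bobSawAlice;

    // Runs one trial and reports whether at least one thread saw the other's flag raised.
    static boolean trial() {
        aliceFlag = false;
        bobFlag = false;
        Thread alice = new Thread(() -> {
            aliceFlag = true;       // raise own flag first
            aliceSawBob = bobFlag;  // then look at the other flag
        });
        Thread bob = new Thread(() -> {
            bobFlag = true;
            bobSawAlice = aliceFlag;
        });
        alice.start();
        bob.start();
        try {
            alice.join();
            bob.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return aliceSawBob || bobSawAlice;
    }
}
```

If `volatile` is removed from the two flags, the compiler and hardware are free to reorder each thread's write and read, and a trial may then return false, although seeing that in practice depends on the machine and the JIT.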


This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

• You are free:
  – to Share: to copy, distribute and transmit the work
  – to Remix: to adapt the work
• Under the following conditions:
  – Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
  – http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.