
Implementation and Performance of Munin (Distributed Shared Memory System)

Dongying Li

Department of Electrical and Computer Engineering

University of Toronto

(Original Authors: J. B. Carter, et al.)

ECE 1147, Parallel Computation

Oct. 30, 2006

Distributed Shared Memory

• Shared address space spanning the processors of a distributed memory multiprocessor

[Figure: proc1, proc2, and proc3 each see the shared variable X = 0 in the common address space]

Distributed Shared Memory

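[Figure: processors proc0, proc1, proc2, ..., procN, each with a local memory mem0, mem1, mem2, ..., memN, connected by a network; the local memories together present one shared memory]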

Distributed Shared Memory

• Challenges
  – Good performance, comparable to shared memory programs
  – No significant deviation from the shared memory coding model
  – Low communication and message passing overheads

Munin System

• Characteristic features
  – Software release consistency
  – Multiple consistency protocols

• Deviations from the shared memory model
  – Shared variables are annotated with their access pattern
  – All synchronization is visible to the system

Contents

• Basic concepts
  – Shared object
  – Software release consistency
  – Multiple consistency protocols

• Software implementation
  – Prototype overview
  – Execution process
  – Advanced programming features
  – Data object directory and delayed update queue
  – Synchronization

• Performance
• Overview of other DSM systems
• Conclusion

Basic Concepts

Shared Object

[Figure: shared variables x and y laid out across three 8-kilobyte pages]

Software Release Consistency

• Sequential consistency
  – All processors observe the same order of operations
  – The order must correspond to some serial order
  – The only ordering constraint is that the reads/writes of each processor (e.g. P1) appear in that processor's program order; there is no restriction on the relative ordering between processors

• Synchronous read/write
  – Writes must be propagated before moving on to the next operation

Software consistency

• Problems
  – Message passing overhead
  – False sharing

[Figure: access timeline with the operations w(x), r(y), r(y), r(x), w(x), w(x)]
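False sharing can be illustrated with simple page arithmetic. The sketch below is only an illustration: it assumes the consistency unit is one 8 KB page, and `PAGE_SIZE`, `page_of`, and `falsely_shared` are hypothetical names, not part of Munin.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed consistency unit: one 8 KB page (hypothetical constant). */
#define PAGE_SIZE 8192u

/* Page number holding the byte at the given offset in the shared segment. */
static size_t page_of(size_t offset)
{
    return offset / PAGE_SIZE;
}

/* Two distinct variables are falsely shared when they sit on the same page:
 * a write to one then causes consistency traffic for users of the other,
 * even though the two variables are logically unrelated. */
static int falsely_shared(size_t off_x, size_t off_y)
{
    assert(off_x != off_y);
    return page_of(off_x) == page_of(off_y);
}
```

Under these assumptions, variables at offsets 0 and 4096 share a page, while variables at offsets 0 and 8192 do not.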

Weak Consistency

• Data modifications are propagated only at synchronization.
• Works fine if the program is properly synchronized through system primitives.

[Figure: access timeline with the operations w(x), r(y), r(y), r(x), synch, w(x), w(x)]

Weak Consistency

[Figure: access timeline with the operations w(x), w(x), r(y), r(y), r(x), synch]

Software Release Consistency

• Special weak consistency protocol

• Reduction of message passing overhead

• Two categories of shared variable operations
  – Ordinary access
    • Read
    • Write
  – Synchronization access (lock, semaphore, barrier)
    • Acquire
    • Release

Software Release Consistency

• Before an ordinary access (read or write) is allowed to perform, all previous acquires must have performed

• Before a release is allowed to perform, all previous ordinary accesses must have performed

• Before an acquire is allowed to perform, all previous releases must have performed

• Before a release is allowed to perform, all previous acquires must have performed

• In short, the results of writes made before a release must be propagated before the next processor acquires the released lock

Eager Release Consistency

• Writes are propagated at release

Lazy Release Consistency

• Writes are propagated at acquire

Multiple Consistency Protocols

• No single consistency protocol is suitable for every parallel program

• Shared variables are accessed in different ways within a single program

• A variable's access pattern can change during execution

• Multiple protocols allow access-pattern-oriented tuning for different shared variables

Multiple Consistency Protocols

• High-level sharing pattern annotation
  – Specified in the shared variable declaration
  – Combinations of low-level protocol parameters

• Low-level protocol parameter
  – Specified in the shared variable directory
  – Controls a specific aspect of the protocol

Protocol Parameters

• I: invalidate or update?

• R: Replicas allowed?

• D: Delayed operation allowed?

• FO: Having fixed owner?

• M: Multiple writers allowed?

• S: Stable access pattern?

• FL: Flushing changes to owner?

• W: Writable? (write protected?)

Sharing annotations

• Read-only
  – Simplest pattern: once initialized, no further writes
  – Suitable for constants etc.

• Migratory
  – Only one thread can access the object at any one time
  – Suitable for variables accessed only inside a critical section

• Write-shared
  – Can be written concurrently by multiple threads
  – Different threads update different words of the variable

• Producer-consumer
  – Written by only one thread and read by others
  – Replicate and update the object rather than invalidate it

Sharing annotations

• Example: producer-consumer

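/* grid and temp are assumed to be n x n; only interior points are updated */
for (t = 0; t < timesteps; t++) {    /* some number of timesteps/iterations */
    for (i = 1; i < n - 1; i++)
        for (j = 1; j < n - 1; j++)
            /* new value is the average of the four neighbours */
            temp[i][j] = 0.25 *
                (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);

    /* copy the new values back before the next iteration */
    for (i = 1; i < n - 1; i++)
        for (j = 1; j < n - 1; j++)
            grid[i][j] = temp[i][j];
}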

Sharing annotations

• Reduction
  – Accessed through fetch-and-operate (read, write, then release)
  – Example: min(), a++

• Result
  – Phase 1: multiple writers allowed
  – Phase 2: one thread (the one collecting the result) accesses exclusively

• Conventional
  – Conventional invalidation-based protocol for shared variables

Sharing annotations

                     Protocol parameters
Sharing annotation   I   R   D   FO  M   S   FL  W
Read-only            N   Y   -   -   -   -   -   N
Migratory            Y   N   -   N   N   -   N   Y
Write-shared         N   Y   Y   N   Y   N   N   Y
Producer-consumer    N   Y   Y   N   Y   Y   N   Y
Reduction            N   Y   N   Y   N   -   N   Y
Result               N   Y   Y   Y   Y   -   Y   Y
Conventional         Y   Y   N   N   N   -   N   Y
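The mapping from sharing annotations to protocol parameters could be held in a small lookup table. The following is a sketch under assumptions, not Munin's actual data layout: the type names, the lower-case annotation strings, and `lookup_protocol` are hypothetical; only the Y / N / - values come from the table.

```c
#include <assert.h>
#include <string.h>

/* Value of one protocol parameter: N, Y, or '-' (does not apply). */
typedef enum { P_NO, P_YES, P_NA } param_t;

/* One row of the table; field names follow the column headers. */
typedef struct {
    const char *annotation;
    param_t I, R, D, FO, M, S, FL, W;
} protocol_row;

static const protocol_row protocol_table[] = {
    {"read-only",         P_NO,  P_YES, P_NA,  P_NA,  P_NA,  P_NA,  P_NA,  P_NO},
    {"migratory",         P_YES, P_NO,  P_NA,  P_NO,  P_NO,  P_NA,  P_NO,  P_YES},
    {"write-shared",      P_NO,  P_YES, P_YES, P_NO,  P_YES, P_NO,  P_NO,  P_YES},
    {"producer-consumer", P_NO,  P_YES, P_YES, P_NO,  P_YES, P_YES, P_NO,  P_YES},
    {"reduction",         P_NO,  P_YES, P_NO,  P_YES, P_NO,  P_NA,  P_NO,  P_YES},
    {"result",            P_NO,  P_YES, P_YES, P_YES, P_YES, P_NA,  P_YES, P_YES},
    {"conventional",      P_YES, P_YES, P_NO,  P_NO,  P_NO,  P_NA,  P_NO,  P_YES},
};

/* Return the row for an annotation name, or NULL if it is unknown. */
static const protocol_row *lookup_protocol(const char *annotation)
{
    size_t n = sizeof protocol_table / sizeof protocol_table[0];
    assert(annotation != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(protocol_table[i].annotation, annotation) == 0)
            return &protocol_table[i];
    return NULL;
}
```

For example, `lookup_protocol("migratory")->I` is `P_YES` (invalidate) and `lookup_protocol("write-shared")->M` is `P_YES` (multiple writers allowed).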

Software Implementation

Prototype Overview

• A simple preprocessor that converts the annotations into a suitable format

• A linker that creates the shared memory segment

• Library routines linked into the program

• Operating system support for fault handling and page table manipulation

Execution Process

• Compiling

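[Figure: compilation flow. The sharing annotations pass through the Munin preprocessor, which produces an auxiliary file; the linker uses it to build the shared data segment and the shared data description table]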

Execution Process

• Initialization

[Figure: initialization across processors P1, P2, ..., Pn. Labels: Munin root thread, User_init(), user root thread; code copy and data segment for each of two further processors, each with a Munin worker thread]

Execution Process

• Synchronization

[Figure: synchronization across processors P1, P2, ..., Pn. Labels: Munin root thread, Munin worker thread, user thread, synchronization operation]

Advanced Programming Features

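• AssociateDataAndSynch(): associate data with a synchronization object

[Figure: two message timelines (msg). One shows acq(m), r(x), r(x), rel(m); the other shows acq(m), r(x), rel(m); both follow a w(x)]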

Advanced Programming Features

• PhaseChange()
  – Change the producer-consumer relationship
  – Example: adaptive mesh SOR

• ChangeAnnotation()
  – Change the access pattern during execution

• Invalidate()

• Flush()

• SingleObject()

• PreAcquire()

Data Object Directory

• Start address and size
• Protocol parameters
• Object state (valid, writable, invalid)
• Copyset (which remote processors have copies)
• Synchq (corresponding synchronization object)
• Probable owner
• Home node
• Access control semaphore
• Links
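The directory fields listed above map naturally onto a C record. This is a minimal sketch, not Munin's actual declaration: all field types, the 32-node limit on the copyset bitmask, and the helpers `copyset_add` and `copyset_has` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OBJ_INVALID, OBJ_VALID, OBJ_WRITABLE } obj_state;

/* One data object directory entry (hypothetical layout). */
typedef struct dir_entry {
    void     *start;           /* start address of the object */
    size_t    size;            /* size in bytes */
    unsigned  protocol;        /* protocol parameter bits */
    obj_state state;           /* valid, writable, or invalid */
    uint32_t  copyset;         /* bit i set: node i holds a copy (assumes <= 32 nodes) */
    int       synchq;          /* id of the associated synchronization object, -1 if none */
    int       probable_owner;  /* best guess at the current owner node */
    int       home_node;       /* node where the object was created */
    int       access_sem;      /* stand-in for the access control semaphore */
    struct dir_entry *next;    /* link to the next entry */
} dir_entry;

/* Record that a remote node now holds a copy of the object. */
static void copyset_add(dir_entry *e, int node)
{
    assert(node >= 0 && node < 32);
    e->copyset |= (uint32_t)1 << node;
}

/* Non-zero if the given node is known to hold a copy. */
static int copyset_has(const dir_entry *e, int node)
{
    assert(node >= 0 && node < 32);
    return (e->copyset >> node) & 1u;
}
```

A bitmask keeps the copyset compact and makes "send an update to every holder" a simple loop over set bits; a real system with more nodes would need a wider representation.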

Delayed Update Queue

[Figure: sequence acq(m), w(x), w(y), rel(m); the delayed update queue holds x after w(x), then x and y after w(y)]
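The behaviour in the figure, where writes between acq(m) and rel(m) are queued and sent out together at the release, can be sketched as follows. This is an illustrative model only: `duq_t`, `duq_note_write`, `duq_flush_on_release`, the fixed capacity, and the `send` callback are hypothetical and stand in for Munin's real queue and network layer.

```c
#include <assert.h>
#include <stddef.h>

#define DUQ_MAX 16  /* assumed fixed capacity, for the sketch only */

typedef struct {
    int    objects[DUQ_MAX];  /* ids of modified objects awaiting propagation */
    size_t count;
} duq_t;

/* Record a write; an object that is already queued is not queued twice. */
static void duq_note_write(duq_t *q, int obj)
{
    for (size_t i = 0; i < q->count; i++)
        if (q->objects[i] == obj)
            return;
    assert(q->count < DUQ_MAX);
    q->objects[q->count++] = obj;
}

/* At release: hand every queued object to 'send' (if given), then empty
 * the queue. Returns the number of objects that were flushed. */
static size_t duq_flush_on_release(duq_t *q, void (*send)(int obj))
{
    size_t flushed = q->count;
    for (size_t i = 0; i < q->count; i++)
        if (send != NULL)
            send(q->objects[i]);
    q->count = 0;
    return flushed;
}
```

With this model, repeated writes to x inside one critical section cost a single update at rel(m) instead of one message per write.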

Multiple Writer Handling
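The slides present this step only as figures. A common way to support several concurrent writers to one object, and the approach usually attributed to Munin, is to keep an unmodified copy (a twin) when the first write occurs and to send only the changed words (a diff) at release. The sketch below illustrates that idea under simplifying assumptions: objects are arrays of `int`, and `diff_entry`, `make_diff`, and `apply_diff` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One changed word: its index in the object and its new value. */
typedef struct {
    size_t index;
    int    value;
} diff_entry;

/* Compare the working copy against its twin and record every changed word.
 * 'out' must have room for nwords entries. Returns the number of changes. */
static size_t make_diff(const int *twin, const int *copy, size_t nwords,
                        diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (twin[i] != copy[i]) {
            out[n].index = i;
            out[n].value = copy[i];
            n++;
        }
    }
    return n;
}

/* Merge a diff into another copy of the object. */
static void apply_diff(int *target, size_t nwords,
                       const diff_entry *d, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        assert(d[k].index < nwords);
        target[d[k].index] = d[k].value;
    }
}
```

If two writers modify different words of the same object, applying both diffs to a third copy merges their updates without either overwriting the other, which matches the write-shared pattern of threads updating different words.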

Synchronization

• Queue based synchronization

• Request – reply – lock forward mechanism

• AcquireLock(), Unlock(), WaitAtBarrier()
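A queue-based lock with forwarding can be modelled in a few lines. This single-process sketch only illustrates the bookkeeping: `dlock`, `dlock_acquire`, and `dlock_unlock` are hypothetical stand-ins, not the signatures of AcquireLock() or Unlock(), and the message exchange between nodes is reduced to return values.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8     /* assumed queue capacity */
#define NO_NODE     (-1)  /* marks a free lock */

typedef struct {
    int    holder;                 /* node currently holding the lock */
    int    waiters[MAX_WAITERS];   /* FIFO ring of waiting nodes */
    size_t head, count;
} dlock;

static void dlock_init(dlock *l)
{
    l->holder = NO_NODE;
    l->head = 0;
    l->count = 0;
}

/* Request: returns 1 if the lock is granted at once (the reply),
 * 0 if the requester is queued behind the current holder. */
static int dlock_acquire(dlock *l, int node)
{
    if (l->holder == NO_NODE) {
        l->holder = node;
        return 1;
    }
    assert(l->count < MAX_WAITERS);
    l->waiters[(l->head + l->count) % MAX_WAITERS] = node;
    l->count++;
    return 0;
}

/* Unlock: forward the lock to the first waiter, if any.
 * Returns the new holder, or NO_NODE when the lock becomes free. */
static int dlock_unlock(dlock *l, int node)
{
    assert(l->holder == node);
    if (l->count == 0) {
        l->holder = NO_NODE;
        return NO_NODE;
    }
    l->holder = l->waiters[l->head];
    l->head = (l->head + 1) % MAX_WAITERS;
    l->count--;
    return l->holder;
}
```

Forwarding the lock directly to the next waiter avoids a round trip through a central manager on every hand-off.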

Performance

Matrix Multiply

[Figure: two bar charts for 2, 4, 8, and 16 processors. Left: DM vs. Munin (axis 0 to 400). Right: Diff % between the two (axis 0 to 10)]

Matrix Multiply Optimized

[Figure: two bar charts for 2, 4, 8, and 16 processors. Left: DM vs. Munin (axis 0 to 400). Right: Diff % between the two (axis 0 to 1.5)]

SOR

[Figure: two bar charts for 2, 4, 8, and 16 processors. Left: DM vs. Munin (axis 0 to 70). Right: Diff % between the two (axis 0 to 10)]

Effect of Multiple Protocols

Protocol        Matrix Multiply   SOR
Multiple        72.41             27.64
Write-shared    75.59             64.48
Conventional    75.85             67.64
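The benefit of multiple protocols can be quantified directly from the table. The sketch below assumes the entries are execution times, so that lower is better (the slide does not state units); `percent_reduction` is a hypothetical helper.

```c
#include <assert.h>

/* Percentage by which 'multi' is lower than 'single'. */
static double percent_reduction(double single, double multi)
{
    assert(single > 0.0);
    return 100.0 * (single - multi) / single;
}
```

Relative to the conventional protocol alone, `percent_reduction(67.64, 27.64)` gives roughly 59% for SOR, while `percent_reduction(75.85, 72.41)` gives roughly 4.5% for matrix multiply, so SOR gains far more from protocol selection.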

Overview of Other DSM Systems

• Clouds: per-segment (object) based consistency protocol
• Mirage: per-page based
• Orca: reliable ordered broadcast protocol
• Amber: user responsible for the data distribution among processors
• Linda: shared variables in tuple space; atomic operations: insertion, removal, reading
• Midway: uses entry consistency (weaker than release consistency)
• DASH: hardware DSM

Conclusion

• Objective: an efficient DSM system with a programming model close to shared memory programming and a small message passing overhead

• Special features: multiple consistency protocols, software release consistency

• Implementation: synchronization realized by the Munin root thread and Munin worker threads

Thank you