Hierarchical Back-Off (HBO) Locks for Non-Uniform Communication Architectures
Zoran Radovic and Erik Hagersten, {zoran.radovic, erik.hagersten}@it.uu.se
Uppsala University, Department of Information Technology, Uppsala Architecture Research Team [UART]
HPCA-9, Ninth International Symposium on High Performance Computer Architecture, Anaheim, California, February 8-12, 2003

Dec 19, 2015

Page 1: Hierarchical Back-Off (HBO) Locks for Non-Uniform Communication Architectures

Zoran Radovic and Erik Hagersten, {zoran.radovic, erik.hagersten}@it.uu.se

Uppsala University, Department of Information Technology, Uppsala Architecture Research Team [UART]

HPCA-9, Ninth International Symposium on High Performance Computer Architecture, Anaheim, California, February 8-12, 2003

Page 2: Synchronization Basics

Locks are used to protect the shared critical section data

Common software-based solutions:

Simple spin-locks
• TATAS (’84)
• TATAS_EXP (’90)

Queue-based locks
• MCS (’91)
• CLH (’93)

Example:

A := 0
BARRIER

LOCK(L)
A := A + 1
UNLOCK(L)

LOCK(L)
B := A + 5
UNLOCK(L)
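As an illustration of the simple spin-lock family listed on this slide, here is a minimal sketch of a test-and-test-and-set lock with exponential backoff (TATAS_EXP) using C11 atomics. All names (tatas_lock, tatas_exp_acquire, spin_delay) and the backoff constants are assumptions made for this sketch, not taken from the slides. The demo function runs the slide's two critical sections one after the other in a single thread, so B ends up as 6; with two concurrent threads the order is not fixed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative backoff bounds (assumed values, not from the slides). */
enum { BACKOFF_BASE = 16, BACKOFF_CAP = 1024 };

typedef struct { atomic_int held; } tatas_lock;   /* 0 = free, 1 = held */

/* Busy-wait for roughly n loop iterations. */
static void spin_delay(int n) {
    for (volatile int i = 0; i < n; i++) { }
}

static void tatas_exp_acquire(tatas_lock *l) {
    int backoff = BACKOFF_BASE;
    for (;;) {
        /* "Test": spin with plain loads while the lock looks held. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) { }
        /* "Test-and-set": one atomic exchange once it looks free. */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;
        /* Lost the race: back off, doubling the delay up to a cap. */
        spin_delay(backoff);
        if (backoff < BACKOFF_CAP) backoff *= 2;
    }
}

static void tatas_release(tatas_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* The slide's example, executed sequentially by one thread. */
static int shared_A, shared_B;
static tatas_lock L = { 0 };

static void slide_example(void) {
    shared_A = 0;                      /* A := 0 (before the BARRIER) */

    tatas_exp_acquire(&L);
    shared_A = shared_A + 1;           /* A := A + 1 */
    tatas_release(&L);

    tatas_exp_acquire(&L);
    shared_B = shared_A + 5;           /* B := A + 5 */
    tatas_release(&L);
}
```

Every waiter spins on the same lock word, which is why the cost of a critical section grows with contention for this family of locks.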

Page 3: Raytrace Speedup

[Figure: Raytrace speedup (0 to 9) vs. number of processors (0 to 28) on a 2-node Sun WildFire (WF), 14 processors per node; curves: TATAS, MCS]

Page 4: Vasaloppet, “Contention Problem in Sweden”

Traditional cross-country ski race, 55 miles …

51.6533 miles to go … CS

Page 5: Spin Locks Under Contention

[Figure: critical section (CS) cost vs. amount of contention; curves: spin locks, spin locks w/ backoff]

IF (more contention) THEN less efficient CS …

“The more important the slower it runs…”

Page 6: Queue-based Locks

[Figure: CS cost vs. amount of contention; curves: spin locks, spin locks w/ backoff, queue-based locks]

IF (more contention) THEN constant CS cost …

Page 7: This Talk

[Figure: CS cost vs. amount of contention; curves: queue-based locks, spin locks, spin locks w/ backoff, HBO locks]

IF (more contention) THEN more efficient CS …

“The more important the faster it runs…”

Page 8: Raytrace Speedup

[Figure: Raytrace speedup (0 to 9) vs. number of processors (0 to 28) on a 2-node Sun WildFire (WF), 14 processors per node; curves: TATAS, MCS, HBO Locks]

Page 9: Outline

• Background & Motivation
• NUMA vs. NUCA Architectures
• Hierarchical Back-Off (HBO) Locks
  • HBO
  • HBO_GT
  • HBO_GT with starvation detection/avoidance
• Performance Results
• Conclusions

Page 10: Non-Uniform Memory Architecture (NUMA)

Many NUMA optimizations have been proposed
• Page migration: speeds up accesses to “private” data
• Page replication: speeds up reads to “shared” data

They do not help communication… E.g., synchronization

[Figure: two nodes connected by a switch; each node has processors P1, P2, P3, … Pn with caches ($) and a memory. Access time ratio: 1 within a node, 2 – 10 across nodes]

Page 11: Non-Uniform Communication Architecture (NUCA)

A “new” property of NUMAs… NUCA

NUCA examples (NUCA ratios):
• 1992: Stanford DASH (~ 4.5)
• 1996: Sequent NUMA-Q (~ 10)
• 1999: Sun WildFire (~ 6)
• 2000: Compaq DS-320 (~ 3.5)
• Future: CMP, SMT (~ 10)

[Figure: two nodes connected by a switch; each node has processors P1, P2, P3, … Pn with caches ($) and a memory. NUCA ratio: 1 within a node, 2 – 10 across nodes]

NUCA optimizations are getting important for future architectures!

Page 12: Our Goals

Design scalable spin locks that exploit NUCAs
• Create communication affinity: keep the lock in the neighborhood [Mr. Rogers, 1968]
  • Speeds up lock handover
  • Lowers the access cost to critical section (CS) data
• Reduce remote “probing” traffic
• Portable and scalable to many NUCA nodes

Page 13: The HBO Lock (the simplest HBO)

What do we need?
• node_id
• Compare&swap (CAS) atomic operation: CAS(Lock_address, FREE, node_id)

lock-acquire:
• If the lock-value is in the state FREE: the node_id is CAS-ed into the lock location
• Else: 2 cases (for 2 levels of non-uniformity):
  • The lock is “local”: TATAS_EXP with small backoff
  • The lock is “remote”: TATAS_EXP with large backoff

Simple but fairly effective…

Creates Communication Affinity
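The lock-acquire rule on this slide can be sketched in C11 as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the FREE encoding (-1), the backoff constants, and all function names (hbo_cas, hbo_acquire, hbo_release) are hypothetical, and how a thread learns its node_id is platform-specific and left to the caller. The sketch also keeps one backoff counter per case and never resets it, which a tuned version would likely refine.

```c
#include <assert.h>
#include <stdatomic.h>

#define HBO_FREE (-1)   /* assumed encoding; node ids are >= 0 */

/* Assumed backoff bounds: small for a local holder, large for a remote one. */
enum { LOCAL_BASE = 16,   LOCAL_CAP = 256,
       REMOTE_BASE = 256, REMOTE_CAP = 4096 };

typedef struct { atomic_int owner_node; } hbo_lock;

static void hbo_init(hbo_lock *l) { atomic_init(&l->owner_node, HBO_FREE); }

static void hbo_delay(int n) {
    for (volatile int i = 0; i < n; i++) { }
}

/* CAS(Lock_address, FREE, node_id): returns the value observed in the lock.
 * A return value of HBO_FREE means the caller now holds the lock. */
static int hbo_cas(hbo_lock *l, int node_id) {
    int expected = HBO_FREE;
    atomic_compare_exchange_strong_explicit(&l->owner_node, &expected, node_id,
                                            memory_order_acquire,
                                            memory_order_relaxed);
    return expected;
}

static void hbo_acquire(hbo_lock *l, int my_node) {
    int local_b = LOCAL_BASE, remote_b = REMOTE_BASE;
    for (;;) {
        /* Test first, and only CAS when the lock looks free (TATAS style). */
        int seen = atomic_load_explicit(&l->owner_node, memory_order_relaxed);
        if (seen == HBO_FREE) {
            seen = hbo_cas(l, my_node);
            if (seen == HBO_FREE) return;          /* acquired */
        }
        if (seen == my_node) {
            /* Holder is in our node: retry soon to keep the lock local. */
            hbo_delay(local_b);
            if (local_b < LOCAL_CAP) local_b *= 2;
        } else {
            /* Holder is in another node: back off much longer. */
            hbo_delay(remote_b);
            if (remote_b < REMOTE_CAP) remote_b *= 2;
        }
    }
}

static void hbo_release(hbo_lock *l) {
    atomic_store_explicit(&l->owner_node, HBO_FREE, memory_order_release);
}
```

Because waiters in the holder's node retry sooner than waiters elsewhere, the next holder is more likely to be a neighbor, which is the communication affinity the slide refers to.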

Page 14: The HBO_GT Lock (GT = Global Throttling)

[Figure: animation frame showing two NUCA nodes, Node 2 and Node 5, each with four processors (P) with caches ($) and a node memory. Lock1, Lock2 and Lock3 are lock words that hold either FREE or the node_id of the node holding the lock (2 in this frame, the remote_node_id as seen from the other node); the holder is in its CS. Each node has a my_is_spinning word holding either 0x00000000 or addr(Lock1). Labels: local spinning; remote spinning (w/ exp. backoff); probing... (with CAS); read a node-local flag...]

Page 15: The HBO_GT Lock (GT = Global Throttling)

A couple of nanoseconds later …

Page 16: The HBO_GT Lock (GT = Global Throttling)

[Figure: the same two-node diagram (Node 2 and Node 5, four processors each) a moment later. The lock word now holds 5, the remote_node_id as seen from the other node, and the holder is in its CS. Each node has a my_is_spinning word holding either 0x00000000 or addr(Lock1). Labels: local spinning; remote spinning (w/ exp. backoff); probing... (with CAS); read a node-local flag...]
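The diagram labels suggest the following reading of global throttling: a thread that finds the lock held by a remote node publishes the lock's address in a node-local is_spinning word, and its node-mates read that node-local flag and wait instead of also probing the remote lock with CAS. The C11 sketch below follows that reading. It is an assumption-laden illustration, not the authors' code: the array name, GT_FREE encoding, constants and function names are hypothetical, and the starvation detection/avoidance mentioned in the outline is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define GT_FREE (-1)    /* assumed encoding; node ids are >= 0 */
enum { GT_MAX_NODES = 8,
       GT_LOCAL_BASE = 16,   GT_LOCAL_CAP = 256,
       GT_REMOTE_BASE = 256, GT_REMOTE_CAP = 4096 };

typedef struct { atomic_int owner_node; } gt_lock;

/* One word per node: address of the lock that a thread of this node is
 * currently probing remotely, or 0 if there is none. */
static _Atomic uintptr_t is_spinning[GT_MAX_NODES];

static void gt_init(gt_lock *l) { atomic_init(&l->owner_node, GT_FREE); }
static void gt_delay(int n) { for (volatile int i = 0; i < n; i++) { } }

/* Returns the observed lock value; GT_FREE means the caller got the lock. */
static int gt_cas(gt_lock *l, int node_id) {
    int expected = GT_FREE;
    atomic_compare_exchange_strong_explicit(&l->owner_node, &expected, node_id,
                                            memory_order_acquire,
                                            memory_order_relaxed);
    return expected;
}

static void gt_acquire(gt_lock *l, int my_node) {
    int local_b = GT_LOCAL_BASE, remote_b = GT_REMOTE_BASE;
    for (;;) {
        /* Throttle: read the node-local flag and wait while a node-mate
         * is already probing this lock remotely. */
        while (atomic_load_explicit(&is_spinning[my_node],
                                    memory_order_relaxed) == (uintptr_t)l) { }
        int seen = gt_cas(l, my_node);
        if (seen == GT_FREE) return;
        if (seen == my_node) {                 /* local holder: small backoff */
            gt_delay(local_b);
            if (local_b < GT_LOCAL_CAP) local_b *= 2;
            continue;
        }
        /* Remote holder: become this node's prober and hold node-mates back. */
        atomic_store(&is_spinning[my_node], (uintptr_t)l);
        do {
            gt_delay(remote_b);
            if (remote_b < GT_REMOTE_CAP) remote_b *= 2;
            seen = gt_cas(l, my_node);
        } while (seen != GT_FREE && seen != my_node);
        atomic_store(&is_spinning[my_node], 0);   /* let node-mates resume */
        if (seen == GT_FREE) return;
    }
}

static void gt_release(gt_lock *l) {
    atomic_store_explicit(&l->owner_node, GT_FREE, memory_order_release);
}
```

Mutual exclusion rests only on the CAS; the is_spinning word is a hint that reduces remote probing traffic, so a stale or overwritten hint affects performance but not correctness in this sketch.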

Page 17: Our NUCA: Sun WildFire

[Figure: 2-node Sun WildFire (WF); each node has processors P1, P2, P3, … P14 with caches ($) and a memory, and the nodes are connected by a switch. NUCA ratio: 1 within a node, 6 across nodes]

Page 18: Traditional Microbenchmark

For each thread:

for (i = 0; i < iterations; i++) {
    LOCK(L);
    /* null/small Critical Section */
    UNLOCK(L);
}

Page 19: NUCA-performance, Traditional microbenchmark

[Figure, left: time [microseconds] (0 to 60) vs. number of processors (0 to 28) on Sun WildFire (WF); curves: TATAS, MCS, HBO_GT]

[Figure, right: node handoffs [%] (0 to 100) vs. number of processors (0 to 28); curves: TATAS, MCS, HBO_GT]
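The "node handoffs [%]" axis can be made concrete with a small calculation. The sketch below assumes that a node handoff means the lock passing from a holder in one node to a holder in a different node, and that the percentage is taken over all handovers in a recorded trace; the function name and the trace format are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* holder_node[i] is the node id of the i-th thread to hold the lock.
 * Returns the share of handovers that crossed a node boundary, in percent. */
static double node_handoff_percent(const int *holder_node, size_t n) {
    if (n < 2) return 0.0;                 /* no handover happened */
    size_t handoffs = 0;
    for (size_t i = 1; i < n; i++) {
        if (holder_node[i] != holder_node[i - 1]) handoffs++;
    }
    return 100.0 * (double)handoffs / (double)(n - 1);
}
```

For a trace of holders in nodes 2, 2, 5, 5, 2 there are four handovers, two of which change node, so the function returns 50.0.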

Page 20: New Microbenchmark

for (i = 0; i < iterations; i++) {
    LOCK(L);
    delay(critical_work); // CS
    UNLOCK(L);
    static_delay();
    random_delay();
}

• More realistic node handoffs for queue-locks
• Constant number of processors
• Control the “amount of contention”
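One way the per-thread loop of the new microbenchmark could be filled in is sketched below. Everything beyond the loop structure is an assumption: the stand-in test-and-set lock, the busy-wait delay, the bench_params fields, and the small linear congruential generator used for the random delay are all hypothetical choices for this sketch. Thread creation and timing are omitted; each benchmark thread would call run_thread on the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in lock so the sketch is self-contained (any lock could be used). */
typedef struct { atomic_flag held; } spin_lock;

static void LOCK(spin_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) { }
}
static void UNLOCK(spin_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Busy-wait for roughly n loop iterations. */
static void delay(unsigned n) {
    for (volatile unsigned i = 0; i < n; i++) { }
}

typedef struct {
    unsigned iterations;     /* loop count per thread */
    unsigned critical_work;  /* delay inside the critical section */
    unsigned static_work;    /* fixed delay outside the critical section */
    unsigned random_max;     /* upper bound of the random delay */
    uint32_t seed;           /* per-thread random state */
} bench_params;

/* Small LCG, used only to vary the delay between acquisitions. */
static uint32_t next_random(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

static void run_thread(spin_lock *l, unsigned long *shared_counter,
                       bench_params p) {
    for (unsigned i = 0; i < p.iterations; i++) {
        LOCK(l);
        delay(p.critical_work);              /* CS */
        (*shared_counter)++;                 /* shared data protected by l */
        UNLOCK(l);
        delay(p.static_work);                /* static_delay() */
        delay(next_random(&p.seed) % (p.random_max + 1u)); /* random_delay() */
    }
}
```

Raising critical_work relative to the two outside delays increases the share of time spent holding the lock, which is how this loop controls the amount of contention with a constant number of processors.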

Page 21: Performance Results (new microbenchmark, 2-node Sun WildFire, 28 CPUs)

[Figure, left: time [seconds] (3 to 12) vs. critical_work (0 to 2000); curves: TATAS, MCS, HBO_GT]

[Figure, right: node handoffs [%] (0 to 60) vs. critical_work (0 to 2000)]

Fairness?

Page 22: Fairness Study (new microbenchmark, 2-node Sun WildFire, 28 CPUs)

[Figure: number of finished processors (0 to 28) vs. time [seconds] (0 to 15); curves: TATAS, MCS, HBO_GT]

Page 23: Application Performance, Raytrace Speedup

[Figure: Raytrace speedup (0 to 8) vs. number of processors (0 to 28) on Sun WildFire (WF); curves: TATAS, MCS]

Page 24: Application Performance, Raytrace Speedup

[Figure: Raytrace speedup (0 to 8) vs. number of processors (0 to 28) on Sun WildFire (WF); curves: TATAS, MCS, HBO, HBO_GT]

Page 25: HBO Locks Under Contention

[Figure: CS cost vs. amount of contention; curves: queue-based locks, spin locks, spin locks w/ backoff, HBO locks]

Page 26: Total Traffic: Raytrace

[Figure: bar chart of total traffic for Raytrace (scale 0.0 to 1.4) for TATAS, TATAS_EXP, MCS and HBO_GT, split into local transactions and global transactions; annotations: 1.11x and 1.45x]

Page 27: Application Performance, 28-processor runs

[Figure: normalized speedup (0.0 to 2.5) for Barnes, Cholesky, FMM, Radiosity, Raytrace, Volrend, Water-Nsq and the average; bars: TATAS, TATAS_EXP, MCS, HBO_GT]

Page 28: Conclusions

• First-come, first-served is not desirable for NUCAs
• The HBO lock exploits NUCAs by
  • creating locality through CS affinity (stable lock)
  • reducing traffic compared with the test&set locks
• HBO performs better under contention
• Traffic is significantly reduced
• Applications with contended locks scale better with HBO locks on NUCAs

Starvation detection/avoidance in the paper…

Page 29: UART’s Home Page

http://www.it.uu.se/research/group/uart

Supported by Sun Microsystems, Inc., and the Parallel and Scientific Computing Institute (PSCI)