RH Locks
Uppsala University, Information Technology
Department of Computer Systems, Uppsala Architecture Research Team [UART]
RH Lock: A Scalable Hierarchical Spin Lock
Zoran Radovic and Erik Hagersten {zoranr, eh}@it.uu.se
2nd ANNUAL WORKSHOP ON MEMORY PERFORMANCE ISSUES (WMPI 2002)
May 25, 2002, Anchorage, Alaska
WMPI 2002, Alaska Uppsala Architecture Research Team (UART) RH Locks
Our NUCA: Sun WildFire
Two Sun E6000 nodes connected through a hardware-coherent interface with a raw bandwidth of 800 MB/s in each direction
16 UltraSPARC II (250 MHz) CPUs per node
8 GB memory
NUCA ratio: 6 (remote vs. local access latency)
Performance on our NUCA
[Figure: two plots vs. number of processors (0 to 32) for TATAS, TATAS_EXP, MCS and CLH: Time/Processors [seconds], and Node-handoffs [%].]
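The node-handoff metric on this slide can be made concrete with a small helper. This is a minimal sketch, assuming "node handoff" means the lock passing to a thread on a different node than its previous holder; the trace format (one node ID per successive lock holder) and the function name `node_handoff_pct` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of lock handovers whose next holder runs on a different
   node than the previous holder. nodes[i] is the node ID of the i-th
   lock holder in acquisition order (hypothetical trace format). */
double node_handoff_pct(const int *nodes, size_t n) {
    assert(nodes != NULL || n == 0);
    if (n < 2)
        return 0.0;                      /* no handovers yet */
    size_t remote = 0;
    for (size_t i = 1; i < n; i++)
        if (nodes[i] != nodes[i - 1])    /* lock crossed a node boundary */
            remote++;
    return 100.0 * (double)remote / (double)(n - 1);
}
```

For the trace {0, 1, 0, 1, 0, 1}, where holders alternate strictly between the two nodes (as a first-come, first-served queue lock would order them if arrivals interleave), every handover crosses nodes and the result is 100%. For {0, 0, 0, 1, 1, 1}, only one of five handovers is remote, giving 20%.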
Our Goals
Demonstrate that the first-come, first-served nature of queue-based locks is unwanted on NUCAs, using a new microbenchmark with “more realistic” behavior and a study of real applications
Design a scalable spin lock that exploits NUCAs by creating controlled unfairness (a stable lock) and by reducing traffic compared with test&set locks
Outline
History & Background
NUMA vs. NUCA
Experimentation Environment
The RH Lock
Performance Results
Application Performance
Conclusions
Key Ideas Behind RH Lock
Minimizing global traffic at lock handover: only one thread per node will try to acquire a “remote” lock
Maximizing node locality on NUCAs: hand the lock over to a neighbor in the same node. This also creates locality for the critical section (CS) data, which is especially good for large CSs and high contention
RH lock in a nutshell: a double TATAS_EXP, with one node-local lock and one “global” lock
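Since the RH lock is built from two TATAS_EXP-style locks, it helps to see a single one first. This is a minimal sketch of a test-and-test&set lock with exponential backoff using C11 atomics; the names (`tatas_lock`, `tatas_exp_acquire`, and so on) and the backoff constants (16 initial delay iterations, cap of 4096) are illustrative assumptions, not the authors' values.

```c
#include <assert.h>
#include <stdatomic.h>

/* Single TATAS_EXP lock: 0 = free, 1 = held (encoding assumed). */
typedef struct { atomic_int held; } tatas_lock;

static void tatas_init(tatas_lock *l) { atomic_init(&l->held, 0); }

/* Busy-wait delay used for backoff; volatile keeps the loop from being optimized away. */
static void tatas_delay(unsigned n) { for (volatile unsigned i = 0; i < n; i++) { } }

void tatas_exp_acquire(tatas_lock *l) {
    unsigned backoff = 16;                   /* initial delay (assumed) */
    for (;;) {
        /* "test": spin with plain loads, which hit in the local cache */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* "test&set": a single atomic swap to try to grab the lock */
        if (!atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
            return;
        tatas_delay(backoff);                /* lost the race: back off */
        if (backoff < 4096)                  /* cap (assumed) */
            backoff *= 2;
    }
}

/* Non-blocking attempt: returns 1 if the lock was acquired. */
int tatas_try_acquire(tatas_lock *l) {
    return !atomic_exchange_explicit(&l->held, 1, memory_order_acquire);
}

void tatas_exp_release(tatas_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Spinning on plain loads keeps waiters in their own caches until the lock word changes, and the backoff spreads out the burst of swaps that follows a release. The RH lock keeps one such word per node, so this burst stays inside a node.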
The RH Lock Algorithm
[Diagram: Cabinet 1 (P1 to P16, each with a cache $; Lock1 in its memory) and Cabinet 2 (P17 to P32, each with a cache $; Lock2 in its memory). Initially Lock1 = FREE and Lock2 = REMOTE.]
Acquire:
  SWAP(my_TID, Lock)
  If (FREE or L_FREE): You’ve got it!
  else:
    TATAS(my_TID, Lock) until FREE or L_FREE
    if “REMOTE”: Spin remotely: CAS(FREE, REMOTE) until FREE (w/ exp backoff)
Release:
  CAS(my_TID, FREE) else L_FREE
IF (more contention) THEN more efficient CS
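The acquire and release rules on this slide can be sketched with C11 atomics as follows. This is a minimal sketch under several assumptions: the two nodes are simulated by an array of two `atomic_int` lock words (on WildFire each word would live in its own cabinet's memory), the special values are encoded as negative integers so that any value >= 0 is a TID, and the backoff constants are illustrative. The names `rh_lock`, `rh_acquire`, `rh_release` and `rh_spin_remote` are hypothetical, and the Fair_factor mechanism from the results slides is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>

/* Special lock-word values; any value >= 0 is a thread ID (encoding assumed). */
enum { RH_FREE = -1, RH_L_FREE = -2, RH_REMOTE = -3 };
#define RH_NODES 2

/* One lock word per node (Lock1, Lock2 on the slide). */
typedef struct { atomic_int word[RH_NODES]; } rh_lock;

static void rh_init(rh_lock *l) {
    atomic_init(&l->word[0], RH_FREE);    /* node 0 starts out owning the lock */
    atomic_init(&l->word[1], RH_REMOTE);  /* node 1 must fetch it remotely */
}

static void rh_delay(unsigned n) { for (volatile unsigned i = 0; i < n; i++) { } }

/* Spin remotely: CAS(FREE -> REMOTE) on the other node's word, with exp backoff. */
static void rh_spin_remote(rh_lock *l, int remote) {
    unsigned backoff = 16;                        /* assumed constants */
    for (;;) {
        int expected = RH_FREE;
        if (atomic_load_explicit(&l->word[remote], memory_order_relaxed) == RH_FREE &&
            atomic_compare_exchange_strong(&l->word[remote], &expected, RH_REMOTE))
            return;                               /* lock migrated to our node */
        rh_delay(backoff);
        if (backoff < 4096)
            backoff *= 2;
    }
}

void rh_acquire(rh_lock *l, int node, int tid) {
    assert(tid >= 0 && node >= 0 && node < RH_NODES);
    for (;;) {
        int old = atomic_exchange(&l->word[node], tid);  /* SWAP(my_TID, Lock) */
        if (old == RH_FREE || old == RH_L_FREE)
            return;                                      /* you've got it */
        if (old == RH_REMOTE) {                          /* first in node to ask */
            rh_spin_remote(l, 1 - node);                 /* the other node (2 nodes) */
            return;
        }
        /* old is a TID: a neighbor holds or is fetching the lock.
           Spin locally until the word is no longer a TID, then swap again. */
        while (atomic_load_explicit(&l->word[node], memory_order_relaxed) >= 0)
            ;
    }
}

void rh_release(rh_lock *l, int node, int tid) {
    int expected = tid;
    /* CAS(my_TID, FREE): no neighbor arrived, so the lock becomes free in this node. */
    if (!atomic_compare_exchange_strong(&l->word[node], &expected, RH_FREE))
        atomic_store(&l->word[node], RH_L_FREE);  /* else hand over to a neighbor */
}
```

Only the thread that swaps out REMOTE reaches `rh_spin_remote`; its neighbors spin on their own node's word, which is how global traffic stays limited to one thread per node. A failed release CAS means a neighbor has swapped in its TID, so the lock is handed over locally with L_FREE instead of being released to the other node.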
Performance Results: Traditional microbenchmark, 2-node Sun WildFire
[Figure: two plots vs. number of processors (0 to 32): Node-handoffs [%] for TATAS, TATAS_EXP, MCS, CLH and RH with Fair_factor = 1, 50 and 100; Time/Processors [seconds] for TATAS, TATAS_EXP, MCS, CLH and RH.]