Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian R. Murphy, Bratin Saha, Tatiana Shpeisman

Mar 26, 2015



Compiler and Runtime Support for Efficient Software Transactional Memory

Vijay Menon

Programming Systems Lab

Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian R. Murphy, Bratin Saha, Tatiana Shpeisman


Motivation

Locks are hard to get right
• Programmability vs scalability

Transactional memory is an appealing alternative
• Simpler programming model
• Stronger guarantees
  – Atomicity, Consistency, Isolation
  – Deadlock avoidance
• Closer to programmer intent
• Scalable implementations

Questions
• How to lower TM overheads, particularly in software?
• How to balance granularity / scalability?


Our System

Java Software Transactional Memory (STM) System
– Pure software implementation (McRT-STM, PPoPP ’06)
– Language extensions in Java (Polyglot)
– Integrated with JVM & JIT (ORP & StarJIT)

Novel Features
– Rich transactional language constructs in Java
– Efficient, first class nested transactions
– Complete GC support
– RISC-like STM API / IR
– Compiler optimizations
– Per-type word and object level conflict detection


Transactional Java → Java

Transactional Java

    atomic {
        S;
    }

Other Language Constructs
• Built on prior research
  – retry (STM Haskell, …)
  – orelse (STM Haskell)
  – tryatomic (Fortress)
  – when (X10, …)

Standard Java + STM API

    while(true) {
        TxnHandle th = txnStart();
        try {
            S';
            break;
        } finally {
            if(!txnCommit(th))
                continue;
        }
    }
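As a rough, self-contained illustration of the retry loop above, the sketch below swaps the real runtime for a toy. `ToyStm`, its `shared` field, and the `injectedConflicts` counter are hypothetical; only the names `txnStart`, `txnCommit`, and `TxnHandle` come from the slide. A failed commit rolls back the in-place update, and the `continue` inside `finally` overrides the pending `break`, so the body runs again.

```java
// Toy stand-in for the STM runtime (hypothetical; not the McRT-STM API).
final class ToyStm {
    static int shared = 0;             // the single "transactional" location in this toy
    static int injectedConflicts = 0;  // how many upcoming commits should fail

    // Handle that remembers the value to restore if the transaction aborts.
    static final class TxnHandle {
        final int undoValue;
        TxnHandle(int undoValue) { this.undoValue = undoValue; }
    }

    static TxnHandle txnStart() {
        return new TxnHandle(shared);
    }

    // Returns false (after rolling back) while conflicts are being injected.
    static boolean txnCommit(TxnHandle th) {
        if (injectedConflicts > 0) {
            injectedConflicts--;
            shared = th.undoValue;
            return false;
        }
        return true;
    }

    // Same shape as the lowered code on the slide; returns the number of attempts.
    static int incrementAtomically() {
        int attempts = 0;
        while (true) {
            TxnHandle th = txnStart();
            try {
                attempts++;
                shared = shared + 1;   // plays the role of S'
                break;
            } finally {
                if (!txnCommit(th))
                    continue;          // discards the pending break and retries
            }
        }
        return attempts;
    }
}
```

With two injected conflicts, the body executes three times and the increment is applied once, because each failed attempt is undone before the retry.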


Tight integration with JVM & JIT

StarJIT & ORP

• On-demand cloning of methods (Harris ’03)

• Identifies transactional regions in Java+STM code

• Inserts read/write barriers in transactional code

• Maps STM API to first class opcodes in StarJIT IR (STIR)

Good compiler representation → greater optimization opportunities


Representing Read/Write Barriers

    atomic {
        a.x = t1
        a.y = t2
        if(a.z == 0) {
            a.x = 0
            a.z = t3
        }
    }

Lowered to read/write barriers:

    stmWr(&a.x, t1)
    stmWr(&a.y, t2)
    if(stmRd(&a.z) == 0) {
        stmWr(&a.x, 0)
        stmWr(&a.z, t3)
    }

Traditional barriers hide redundant locking/logging


An STM IR for Optimization

Redundancies exposed:

    atomic {
        a.x = t1
        a.y = t2
        if(a.z == 0) {
            a.x = 0
            a.z = t3
        }
    }

Lowered to STM IR operations:

    txnOpenForWrite(a)
    txnLogObjectInt(&a.x, a)
    a.x = t1
    txnOpenForWrite(a)
    txnLogObjectInt(&a.y, a)
    a.y = t2
    txnOpenForRead(a)
    if(a.z == 0) {
        txnOpenForWrite(a)
        txnLogObjectInt(&a.x, a)
        a.x = 0
        txnOpenForWrite(a)
        txnLogObjectInt(&a.z, a)
        a.z = t3
    }


Optimized Code

    atomic {
        a.x = t1
        a.y = t2
        if(a.z == 0) {
            a.x = 0
            a.z = t3
        }
    }

Optimized STM IR:

    txnOpenForWrite(a)
    txnLogObjectInt(&a.x, a)
    a.x = t1
    txnLogObjectInt(&a.y, a)
    a.y = t2
    if(a.z == 0) {
        a.x = 0
        txnLogObjectInt(&a.z, a)
        a.z = t3
    }

Fewer & cheaper STM operations
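One way to picture how the redundant operations disappear is a small pass over a straight-line list of IR operations. The sketch below is only an illustration: the class `RedundantStmOpElimination`, the string encoding of operations, and the straight-line simplification are assumptions, not the StarJIT implementation. Treating this example as straight-line is sound only because every earlier operation dominates the branch body; general control flow would need a dominance-based analysis such as CSE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical straight-line cleanup of STM IR operations written as strings.
final class RedundantStmOpElimination {
    static List<String> eliminate(List<String> ops) {
        Set<String> openForWrite = new HashSet<>();
        Set<String> openForRead = new HashSet<>();
        Set<String> logged = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String op : ops) {
            if (op.startsWith("txnOpenForWrite(")) {
                // A second open-for-write of the same object is redundant.
                if (!openForWrite.add(argsOf(op))) continue;
            } else if (op.startsWith("txnOpenForRead(")) {
                // Open-for-read is covered by an earlier open for read or write.
                String obj = argsOf(op);
                if (openForWrite.contains(obj) || !openForRead.add(obj)) continue;
            } else if (op.startsWith("txnLogObjectInt(")) {
                // The undo log only needs the first old value of each location.
                if (!logged.add(argsOf(op))) continue;
            }
            out.add(op);
        }
        return out;
    }

    private static String argsOf(String op) {
        return op.substring(op.indexOf('(') + 1, op.lastIndexOf(')'));
    }

    // The unoptimized operation sequence for the example, with the branch flattened.
    static List<String> slideExample() {
        return eliminate(Arrays.asList(
            "txnOpenForWrite(a)", "txnLogObjectInt(&a.x, a)", "a.x = t1",
            "txnOpenForWrite(a)", "txnLogObjectInt(&a.y, a)", "a.y = t2",
            "txnOpenForRead(a)",
            "txnOpenForWrite(a)", "txnLogObjectInt(&a.x, a)", "a.x = 0",
            "txnOpenForWrite(a)", "txnLogObjectInt(&a.z, a)", "a.z = t3"));
    }
}
```

On this input the pass keeps one `txnOpenForWrite(a)`, one log per field, and the four stores, which matches the optimized code shown above apart from the flattened `if`.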


Compiler Optimizations for Transactions

Standard optimizations
• CSE, dead-code elimination, …
• Careful IR representation exposes opportunities and enables optimizations with almost no modifications
• Subtle in presence of nesting

STM-specific optimizations
• Immutable field / class detection & barrier removal (vtable/String)
• Transaction-local object detection & barrier removal
• Partial inlining of STM fast paths to eliminate call overhead


McRT-STM

PPoPP 2006 (Saha et al.)
• C / C++ STM
• Pessimistic Writes:
  – strict two-phase locking
  – update in place
  – undo on abort
• Optimistic Reads:
  – versioning
  – validation before commit
• Benefits
  – Fast memory accesses (no buffering / object wrapping)
  – Minimal copying (no cloning for large objects)
  – Compatible with existing types & libraries

Similar STMs: Ennals (FastSTM), Harris et al. (PLDI ’06)
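The combination of pessimistic writes and optimistic reads can be sketched in a few lines. The following is a single-threaded toy under stated assumptions: `ToyTxn`, `Cell`, and all field names are hypothetical, and real McRT-STM uses atomic operations and contention management that are omitted here. Writes lock the cell, save the old value in an undo log, and update in place; reads only record the version they saw, and commit validates those versions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Single-threaded toy; every name here is hypothetical.
final class ToyTxn {
    static final class Cell {
        int value;
        long version = 1;   // bumped on every committed write
        ToyTxn owner;       // non-null while a transaction holds the write lock
    }

    private static final class ReadEntry {
        final Cell cell; final long seen;
        ReadEntry(Cell cell, long seen) { this.cell = cell; this.seen = seen; }
    }

    private static final class UndoEntry {
        final Cell cell; final int oldValue;
        UndoEntry(Cell cell, int oldValue) { this.cell = cell; this.oldValue = oldValue; }
    }

    private final List<ReadEntry> readLog = new ArrayList<>();
    private final Deque<UndoEntry> undoLog = new ArrayDeque<>();
    private final Set<Cell> writeLocks = new LinkedHashSet<>();

    // Optimistic read: take no lock, just remember the version observed.
    int read(Cell c) {
        if (c.owner != this) {
            // A cell locked by another transaction gets an impossible version,
            // so validation at commit is guaranteed to fail.
            long seen = (c.owner == null) ? c.version : -1L;
            readLog.add(new ReadEntry(c, seen));
        }
        return c.value;
    }

    // Pessimistic write: lock, log the old value, update in place.
    boolean write(Cell c, int newValue) {
        if (c.owner != null && c.owner != this) return false; // conflict
        c.owner = this;
        writeLocks.add(c);
        undoLog.push(new UndoEntry(c, c.value));
        c.value = newValue;
        return true;
    }

    // Validate the read set, then publish new versions and release the locks.
    boolean commit() {
        for (ReadEntry r : readLog) {
            boolean lockedByOther = r.cell.owner != null && r.cell.owner != this;
            if (lockedByOther || r.cell.version != r.seen) {
                abort();
                return false;
            }
        }
        for (Cell c : writeLocks) { c.version++; c.owner = null; }
        reset();
        return true;
    }

    // Undo in reverse order, then release the locks.
    void abort() {
        while (!undoLog.isEmpty()) {
            UndoEntry u = undoLog.pop();
            u.cell.value = u.oldValue;
        }
        for (Cell c : writeLocks) c.owner = null;
        reset();
    }

    private void reset() { readLog.clear(); undoLog.clear(); writeLocks.clear(); }
}
```

Because updates happen in place, a committed transaction copies nothing at commit time; the cost moves to the abort path, which replays the undo log.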


STM Data Structures

Per-thread:

• Transaction Descriptor
  – Per-thread info for version validation, acquired locks, rollback
  – Maintained in Read / Write / Undo logs

• Transaction Memento
  – Checkpoint of logs for nesting / partial rollback
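A memento can be thought of as a saved position in the logs. The sketch below shows this for an undo log only; `NestedUndoLog` and its methods are hypothetical names, and the real descriptor also checkpoints the read and write logs. Rolling back a nested transaction unwinds the log to the saved position and leaves the enclosing transaction's updates intact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical undo log over an int array, with checkpoints for nesting.
final class NestedUndoLog {
    private final int[] memory;
    private final List<int[]> undo = new ArrayList<>(); // entries are {index, oldValue}

    NestedUndoLog(int[] memory) { this.memory = memory; }

    // Update in place after recording the old value.
    void write(int index, int value) {
        undo.add(new int[] { index, memory[index] });
        memory[index] = value;
    }

    // A memento is just the current log length.
    int memento() { return undo.size(); }

    // Partial rollback: undo entries newer than the memento, newest first.
    void rollbackTo(int memento) {
        for (int i = undo.size() - 1; i >= memento; i--) {
            int[] entry = undo.remove(i);
            memory[entry[0]] = entry[1];
        }
    }
}
```

Taking a memento costs a single integer read here, which hints at why nested transactions can be cheap to start.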

Per-data:

• Transaction Record
  – Pointer-sized field guarding a set of shared data
  – Transactional state of data
    • Shared: Version number (odd)
    • Exclusive: Owner’s transaction descriptor (even / aligned)
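The odd/even encoding lets one word serve as both version and lock. A minimal sketch follows, assuming the record is modeled as an `AtomicLong` and the descriptor is represented by an even number standing in for an aligned address; `TxnRecordWord` and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helpers for a transaction record stored in one word.
final class TxnRecordWord {
    // Odd word: the data is shared and the word is a version number.
    static boolean isVersion(long word) { return (word & 1L) == 1L; }

    // Even word: the data is exclusively owned; the word identifies the owner.
    static boolean isOwned(long word) { return (word & 1L) == 0L; }

    // Versions advance by two so they stay odd.
    static long nextVersion(long version) { return version + 2; }

    // Try to move the record from a known version to "owned by descriptor".
    static boolean tryAcquire(AtomicLong record, long expectedVersion, long descriptor) {
        if (!isVersion(expectedVersion) || !isOwned(descriptor)) {
            throw new IllegalArgumentException("version must be odd, descriptor even");
        }
        return record.compareAndSet(expectedVersion, descriptor);
    }

    // Release after commit by installing the next version.
    static void release(AtomicLong record, long previousVersion) {
        record.set(nextVersion(previousVersion));
    }
}
```

A reader can test the low bit to tell whether it saw a version it may log or an owner it must wait for, without any extra state.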


Mapping Data to Transaction Record

Every data item has an associated transaction record

Object granularity
• Transaction record embedded in object

    class Foo { int x; int y; }

    [Object layout: vtbl, TxR, x, y]

Word granularity
• Object words hash into table of TxRs (TxR1, TxR2, TxR3, …, TxRn)
• Hash is f(obj.hash, offset)

    class Foo { int x; int y; }

    [Object layout: vtbl, hash, x, y]
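The slide leaves the function f unspecified. Purely as an illustration, the sketch below picks one possible f: the table size, the mixing steps, and the names `TxnRecordTable` and `wordLevelIndex` are all assumptions. It maps an object hash and a field offset to a slot in a fixed-size table of transaction records.

```java
// Hypothetical word-granularity mapping; the real f is not given in the slides.
final class TxnRecordTable {
    static final int SIZE = 1 << 10;   // assumed power-of-two table size

    // Combine the object's hash with the field offset and fold into the table.
    static int wordLevelIndex(int objHash, int fieldOffset) {
        int h = objHash * 31 + fieldOffset;
        h ^= (h >>> 16);               // mix high bits into the low bits
        return h & (SIZE - 1);
    }
}
```

Different fields of one object usually land in different slots, but unrelated words can still collide in the table, so word-level detection reduces false conflicts rather than eliminating them.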


Granularity of Conflict Detection

Object-level
• Cheaper operation
• Exposes CSE opportunities
• Lower overhead on 1P

Word-level
• Reduces false sharing
• Better scalability

Mix & Match
• Per type basis
• E.g., word-level for arrays, object-level for non-arrays

    // Thread 1
    a.x = …
    a.y = …

    // Thread 2
    … = … a.z …
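The two-thread example above can be checked mechanically by comparing the transaction records each thread touches under either policy. This is a sketch with hypothetical names (`GranularityDemo`, string keys standing in for records), and it ignores hash collisions in the word-level table.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: a transaction record is identified by a string key.
final class GranularityDemo {
    // Object-level: one record per object. Word-level: one record per field.
    static String recordKey(String obj, String field, boolean wordLevel) {
        return wordLevel ? obj + "." + field : obj;
    }

    // Two transactions conflict if a written record is also read by the other.
    static boolean conflicts(Set<String> written, Set<String> read) {
        return !Collections.disjoint(written, read);
    }

    // Thread 1 writes a.x and a.y; thread 2 reads a.z.
    static boolean slideExampleConflicts(boolean wordLevel) {
        Set<String> thread1Writes = new HashSet<>();
        thread1Writes.add(recordKey("a", "x", wordLevel));
        thread1Writes.add(recordKey("a", "y", wordLevel));
        Set<String> thread2Reads = new HashSet<>();
        thread2Reads.add(recordKey("a", "z", wordLevel));
        return conflicts(thread1Writes, thread2Reads);
    }
}
```

Under object-level detection the two threads conflict even though they touch disjoint fields; under word-level detection they do not.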


Experiments

16-way 2.2 GHz Xeon with 16 GB shared memory
• L1: 8 KB, L2: 512 KB, L3: 2 MB, L4: 64 MB (per four)

Workloads
• Hashtable, Binary tree, OO7 (OODBMS)
  – Mix of gets, in-place updates, insertions, and removals
• Object-level conflict detection by default
  – Word / mixed where beneficial


Effectiveness of Compiler Optimizations

1P overheads over thread-unsafe baseline

Prior STMs typically incur ~2x on 1P

With compiler optimizations:
- < 40% over no concurrency control
- < 30% over synchronization

[Figure: % overhead on 1P (0% to 90%) for HashMap and TreeMap. Series: Synchronized, No STM Opt, +Base STM Opt, +Immutability, +Txn Local, +Fast Path Inlining]


Scalability: Java HashMap Shootout

Unsafe (java.util.HashMap)
• Thread-unsafe w/o Concurrency Control

Synchronized
• Coarse-grain synchronization via SynchronizedMap wrapper

Concurrent (java.util.concurrent.ConcurrentHashMap)
• Multi-year effort: JSR 166 → Java 5
• Optimized for concurrent gets (no locking)
• For updates, divides bucket array into 16 segments (size / locking)

Atomic
• Transactional version via “AtomicMap” wrapper

Atomic Prime
• Transactional version with minor hand optimization
• Tracks size per segment à la ConcurrentHashMap

Execution
• 10,000,000 operations / 200,000 elements
• Defaults: load factor, threshold, concurrency level
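The "size per segment" hand optimization in Atomic Prime can be pictured as replacing one shared counter with an array of counters. The sketch below is an assumption about its general shape, not the authors' code: `SegmentedSize`, `segmentFor`, and the modulo mapping are hypothetical, and in the transactional version these plain updates would sit inside atomic blocks.

```java
// Hypothetical per-segment size tracking; not thread-safe on its own.
final class SegmentedSize {
    private final int[] counts;

    SegmentedSize(int segments) { counts = new int[segments]; }

    // Map a key's hash to a segment (assumed mapping).
    private int segmentFor(int hash) {
        return (hash & 0x7fffffff) % counts.length;
    }

    void onInsert(int hash) { counts[segmentFor(hash)]++; }

    void onRemove(int hash) { counts[segmentFor(hash)]--; }

    // The total size is the sum over all segments.
    int size() {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }
}
```

Inserts and removes that hash to different segments then update different array words, so with word-level conflict detection on arrays they no longer contend on a single size field.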


Scalability: 100% Gets

Atomic wrapper is competitive with ConcurrentHashMap

Effect of compiler optimizations scales

[Figure: speedup over 1P Unsafe (0 to 16) vs. # of processors (0 to 16). Series: Unsafe, Synchronized, Concurrent, Atomic (No Opt), Atomic]


Scalability: 20% Gets / 80% Updates

ConcurrentHashMap thrashes on 16 segments

Atomic still scales

[Figure: speedup over 1P Unsafe (0 to 16) vs. # of processors (0 to 16). Series: Synchronized, Concurrent, Atomic (No Opt), Atomic]


20% Inserts and Removes

Atomic conflicts on entire bucket array
- The array is an object

[Figure: speedup over 1P Unsafe (0 to 3) vs. # of processors (0 to 16). Series: Synchronized, Concurrent, Atomic]


20% Inserts and Removes: Word-Level

We still conflict on the single size field in java.util.HashMap

[Figure: speedup over 1P Unsafe (0 to 3) vs. # of processors (0 to 16). Series: Synchronized, Concurrent, Object Atomic, Word Atomic]


20% Inserts and Removes: Atomic Prime

Atomic Prime tracks size per segment, lowering the bottleneck

No degradation, modest performance gain

[Figure: speedup over 1P Unsafe (0 to 3) vs. # of processors (0 to 16). Series: Synchronized, Concurrent, Object Atomic, Word Atomic, Word Atomic Prime]


20% Inserts and Removes: Mixed-Level

Mixed-level preserves wins & reduces overheads
- word-level for arrays
- object-level for non-arrays

[Figure: speedup over 1P Unsafe (0 to 3) vs. # of processors (0 to 16). Series: Synchronized, Concurrent, Object Atomic, Word Atomic, Word Atomic Prime, Mixed Atomic Prime]


Key Takeaways

Optimistic reads + pessimistic writes is a nice sweet spot

Compiler optimizations significantly reduce STM overhead
- 20-40% over thread-unsafe
- 10-30% over synchronized

Simple atomic wrappers are sometimes good enough

Minor modifications give performance competitive with complex fine-grain synchronization

Word-level conflict detection is crucial for large arrays

Mixed conflict detection provides the best of both


Novel Contributions

Rich transactional language constructs in Java

Efficient, first class nested transactions

Complete GC support

RISC-like STM API

Compiler optimizations

Per-type word and object level conflict detection
