Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.

Post on 31-Dec-2015


Reduced Hardware NOrec: A Safe and Scalable

Hybrid Transactional Memory

Alexander Matveev, Nir Shavit

MIT

Good: Hardware Transactional Memory (HTM)

HTM may always fail due to:
1. L1 cache capacity
2. Interrupt
3. Unsupported instruction

Bad: The HTM is “best-effort”

To ensure progress, we need a software fallback

Thread 1:

1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. HTM Commit

Thread 2:

1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. HTM Commit

No conflict – HTMs commit concurrently

A Possible Solution is: Lock Elision

Each thread: 1. Lock, 2. Unlock

Thread 1 and Thread 2 (each):

1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. ... CONFLICT ... HTM Restart

Wait for Lock

Thread 3:

1. HTM Start
2. Read lock and check it is free
3. ... code ... FAIL ... HTM Restart

Then, in software:

1. Acquire Lock
2. ... code ...
3. Release Lock

No concurrency between hardware and software

A Possible Solution is: Lock Elision

• Good:
– Simple: No need to instrument reads and writes

• Bad:
– Serial fallback: A software fallback grabs the global lock and aborts all hardware transactions

A Possible Solution is: Lock Elision
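The lock elision flow can be traced with a small sketch. The hardware transaction is simulated here: `attempt_commits` is a hypothetical callback standing in for a best-effort HTM attempt that may fail for any reason, and `ElidedLock`, `run_elided`, and `max_attempts` are illustrative names rather than any real API. The counters are plain integers, so this is intended for single-threaded tracing only.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Illustrative names only; none of this is a real HTM API.
struct ElidedLock {
    std::atomic<bool> held{false};  // the global fallback lock
    int hardware_commits = 0;       // demo counters (not thread-safe)
    int fallback_runs = 0;
};

// attempt_commits(i) models whether the i-th hardware attempt survives to
// commit. A real best-effort HTM gives no such guarantee for any attempt.
inline void run_elided(ElidedLock& lock,
                       const std::function<bool(int)>& attempt_commits,
                       const std::function<void()>& critical_section,
                       int max_attempts = 3) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        // 1. HTM Start (simulated).
        // 2. Read lock and check it is free: a held lock means a software
        //    fallback is running, so this hardware attempt must not proceed.
        if (lock.held.load(std::memory_order_acquire)) continue;
        // Simulated abort: real hardware rolls back partial effects, so the
        // body is modeled as not having run at all.
        if (!attempt_commits(attempt)) continue;
        critical_section();       // 3. ... code ...
        ++lock.hardware_commits;  // 4. HTM Commit
        return;
    }
    // Serial fallback: take the lock for real. On real hardware this write
    // conflicts with every transaction that read the lock word in step 2.
    bool expected = false;
    while (!lock.held.compare_exchange_weak(expected, true,
                                            std::memory_order_acquire)) {
        expected = false;
    }
    critical_section();
    ++lock.fallback_runs;
    lock.held.store(false, std::memory_order_release);
}
```

Calling `run_elided` with a callback that always returns `true` takes the hardware path; a callback that always returns `false` exhausts the retry budget and ends in the serial fallback.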

Thread 1 and Thread 2 (each):

1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. ... more code ...

Thread 3:

1. HTM Start
2. Read lock and check it is free
3. ... code ... FAIL ... HTM Restart

Then, in software:

1. STM Start
2. ... code ...
3. ... more code ...

STM and HTM execute concurrently

Another Approach is: Hybrid Transactional Memory

• Good:
– Hardware-Software Concurrency

• Bad:
– Complex:
1. Hard to coordinate hardware and software
2. Hard to apply to code due to instrumentation

Another Approach is: Hybrid Transactional Memory

Our focus

GCC C/C++ TM helps here a lot

• 2006: First Hybrid TM [Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum]
– Key Idea: Use per-location metadata version-locks to coordinate hardware and software

• Bad:
– Hardware is slow: on each read/write it must read the version-lock and execute a branch condition check

Hybrid TM History
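The per-access cost of the version-lock design can be sketched as follows. The layout is an assumption made for illustration (lowest bit of the metadata word is the lock bit, the remaining bits are the version); `VersionedCell`, `instrumented_read`, and `software_write` are hypothetical names, not taken from the 2006 system.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: one metadata word per data location.
// Bit 0 set = locked by a software writer; upper bits = version number.
struct VersionedCell {
    std::atomic<std::uint64_t> version_lock{0};
    std::atomic<int> value{0};
};

// The instrumented read a hardware transaction would have to run for every
// access: an extra metadata load plus a branch before the data load.
// Returns false where a real hardware transaction would abort.
inline bool instrumented_read(const VersionedCell& cell, int& out) {
    const std::uint64_t meta = cell.version_lock.load();
    if (meta & 1u) return false;  // the branch condition check
    out = cell.value.load();
    return true;
}

// Software writer: lock the location, write, then unlock with a new version.
inline void software_write(VersionedCell& cell, int new_value) {
    std::uint64_t meta = cell.version_lock.load();
    while ((meta & 1u) ||
           !cell.version_lock.compare_exchange_weak(meta, meta | 1u)) {
        meta = cell.version_lock.load();
    }
    cell.value.store(new_value);
    cell.version_lock.store(meta + 2);  // clears the lock bit, bumps version
}
```

Every hardware read pays for the metadata load and the branch even when no software transaction is running, which is the slowdown the slide refers to.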

• 2007: Phased TM [Lev, Moir, Nussbaum]
– Key Idea: Use HTM mode or STM mode, but not HTM and STM at the same time

• Bad:
– Expensive to switch modes: a single fallback must stop all hardware

Hybrid TM History
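A minimal sketch of the mode switch, assuming the phase is derived from a count of active software transactions. `PhaseController` and its members are hypothetical names; the actual Phased TM design has more modes and is not reproduced here.

```cpp
#include <atomic>
#include <cassert>

// Assumed scheme: hardware may run only while no software transaction is
// active. In a real system each hardware transaction would read
// software_active inside the transaction, so a fallback's increment
// conflicts with (and aborts) every running hardware transaction.
struct PhaseController {
    std::atomic<int> software_active{0};

    bool hardware_mode() const { return software_active.load() == 0; }
    void enter_software() { software_active.fetch_add(1); }
    void exit_software() { software_active.fetch_sub(1); }
};
```

A single call to `enter_software` flips the whole system out of hardware mode, and hardware mode resumes only after the last software transaction exits.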

• 2011: Hybrid NOrec (state-of-the-art) [Dalessandro, Carouge, White, Lev, Moir, Scott, Spear]
– Key Idea: No metadata + global clock for coordination

Hybrid TM History

• Good:
– No metadata: Efficient for low concurrency

• Bad:
– Limited scalability: too many aborts due to global clock updates
  • A software write must abort all hardware
  • A hardware write must abort all software

Hybrid NOrec

Hybrid NOrec

Slow-Path: Software

Read X (pure)

Lock clock

ABORT

X = 4

Fast-Path: Hardware

Unlock clock

Read clock Read clock

Read clock

Read X

Read clock

RESTART

Update clock

Read X (verify clock)

Read X: check clock => changed => restart/revalidate
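The slow-path rule "Read X: check clock => changed => restart/revalidate" can be sketched in software alone. This follows the general NOrec idea of a single global clock plus value-based revalidation; the names `Clock`, `SoftwareTx`, and `committed_write` are hypothetical, and the writer is simplified to a single unbuffered store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct Clock {
    std::atomic<std::uint64_t> value{0};  // even = free, odd = locked
};

class SoftwareTx {
public:
    explicit SoftwareTx(Clock& clock) : clock_(clock), snapshot_(wait_even()) {}

    // Reads loc; returns false if the transaction must restart.
    bool read(const std::atomic<int>& loc, int& out) {
        out = loc.load();
        while (clock_.value.load() != snapshot_) {  // clock changed
            if (!revalidate()) return false;        // an earlier read is stale
            out = loc.load();                       // re-read under new snapshot
        }
        reads_.emplace_back(&loc, out);
        return true;
    }

private:
    std::uint64_t wait_even() const {
        std::uint64_t t;
        do { t = clock_.value.load(); } while (t & 1u);
        return t;
    }

    // Value-based validation: every logged read must still hold its value.
    bool revalidate() {
        for (;;) {
            const std::uint64_t t = wait_even();
            for (const auto& entry : reads_) {
                if (entry.first->load() != entry.second) return false;
            }
            if (clock_.value.load() == t) { snapshot_ = t; return true; }
        }
    }

    Clock& clock_;
    std::uint64_t snapshot_;
    std::vector<std::pair<const std::atomic<int>*, int>> reads_;
};

// Simplified writer commit: Lock clock, write, Unlock clock (clock advances).
inline void committed_write(Clock& clock, std::atomic<int>& loc, int v) {
    std::uint64_t t = clock.value.load();
    while ((t & 1u) || !clock.value.compare_exchange_weak(t, t + 1)) {
        t = clock.value.load();
    }
    loc.store(v);
    clock.value.store(t + 2);
}
```

Any committed write, even to an unrelated location, advances the clock and forces every in-flight software reader through `revalidate`, which is the scalability limit listed under "Bad" for Hybrid NOrec.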

• 2011: Hybrid NOrec 2 [Riegel, Marlier, Nowack, Felber, Fetzer]
– Key Idea: Use non-speculative reads inside HTM to verify the global clock and avoid unnecessary aborts

• Bad:
– HTM of Intel and IBM has no support for non-speculative reads

A Possible Solution

• 2014: Invyswell Hybrid [Calciu, Gottschlich, Shpeisman, Pokam, Herlihy]
– Key Idea: Allow unsafe concurrency between hardware and software, and use the HTM sandboxing to detect and handle errors

A Recent Approach

Invyswell

Slow-Path: Software

Read X (NEW)

Lock clock

X = 4 (NEW)

Read Y (OLD)

Func(X, Y): Unsafe

Hopes HTM aborts

Y = 8 (NEW)

Unlock clock

Update clock

Fast-Path: Hardware

NO ABORT

FUTURE

• Good:
– Far fewer aborts than Hybrid NOrec

• Bad:
– Unfortunately, HTM sandboxing may miss errors, so a corrupted transaction may commit and crash the system
– This problem was shown in a recent work: “Pitfalls of Lazy Subscription” by [Dice, Harris, Kogan, Lev, Moir]

Invyswell

• 2015: RH NOrec [Matveev, Shavit]
– Key Idea: Use a “mixed” fallback path that uses both software and short hardware transactions

Our New Approach

RH NOrec

Slow-Path: Software

Read X (NEW)

Lock clock

X = 4 (NEW)

Read Y (OLD)

Func(X, Y): Unsafe

Hopes HTM aborts

Y = 8 (NEW)

Unlock clock

Update clock

Fast-Path: Hardware

X = 4 (HIDDEN)

Y = 8 (HIDDEN)

HTM

X and Y both OLD or both NEW – not a mix

Read X (OLD)

Read Y (OLD)

Func(X, Y) Safe!

Writes are speculative (invisible)

Mixed Slow-Path

• Key Point 1: Execute software writes in a short hardware transaction
– No need to abort hardware transactions
– Full safety

• In practice this works well
– Due to the 80:20 rule: a typical operation has 80% reads and 20% writes

RH NOrec

• Key Point 2: Execute a maximal amount of initial software reads in a read-only hardware transaction
– Allows deferring the global clock read, which significantly reduces software restarts/revalidations

RH NOrec
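The effect of running the slow-path writes in a short hardware transaction can be traced with a single-threaded model. Everything here is an illustrative assumption: `ModelHtm` only imitates the "writes stay hidden until commit" property with a write buffer, the step order (read clock, lock clock, HTM start, writes, HTM commit, unlock clock) is one reading of the slide diagrams, and the HTM prefix is reduced to the deferred clock read.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Single-threaded stand-in for a short hardware transaction: writes are
// buffered and reach memory only at commit(). Real HTM provides this
// atomically in hardware; this class does not.
class ModelHtm {
public:
    void write(int& location, int value) {
        buffer_.emplace_back(&location, value);
    }
    void commit() {
        for (const auto& w : buffer_) *w.first = w.second;
        buffer_.clear();
    }
private:
    std::vector<std::pair<int*, int>> buffer_;
};

struct Shared {
    std::uint64_t clock = 0;  // even = free, odd = locked
    int x = 0;
    int y = 0;
};

// What a concurrent fast-path reader could observe at a given instant.
struct Observation {
    int x;
    int y;
};

inline std::vector<Observation> mixed_slow_path_write(Shared& s,
                                                      int new_x, int new_y) {
    std::vector<Observation> seen;
    // HTM prefix (reduced): the clock is read last, before the prefix commits.
    const std::uint64_t snapshot = s.clock;
    // Software reads that verify the clock against snapshot would run here.

    // HTM postfix.
    s.clock = snapshot + 1;      // Lock clock
    ModelHtm postfix;            // HTM start
    postfix.write(s.x, new_x);   // X = new (HIDDEN)
    seen.push_back({s.x, s.y});  // memory still holds old X and old Y
    postfix.write(s.y, new_y);   // Y = new (HIDDEN)
    seen.push_back({s.x, s.y});  // still old X and old Y
    postfix.commit();            // HTM commit: X and Y published together
    seen.push_back({s.x, s.y});  // new X and new Y
    s.clock = snapshot + 2;      // Unlock clock
    return seen;
}
```

In this model every observation shows X and Y both old or both new, never a mix, which is the property the slide attributes to running the slow-path writes inside a hardware transaction.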

HTM start

…reads/writes…

Update clock

HTM commit

Fast-Path: Hardware Mixed Path

Read clock

RESTART

Read some X: check clock => changed => restart/revalidate

… reads in software …(verifies clock)

HTM start

…reads/writes…

Update clock

HTM commit

HTM start

…reads in HTM… (pure/direct)

Read clock

HTM commit

HTM Prefix

Fast-Path: Hardware Mixed Path

NO ABORT

NO ABORT

HTM start

…reads/writes…

Update clock

HTM commit

HTM start

…reads in HTM… (pure/direct)

Read clock

HTM commit

HTM Prefix

…reads in software…

HTM start

HTM commit

HTM Postfix

Lock clock

…writes in HTM…

Unlock clock

HTM start

Update clock

HTM commit

NO ABORT

NO ABORT

…reads/writes…

Throughput on 8-core Intel (GCC C/C++)

[Figure: six throughput plots, x-axis ticks 1 to 16.]

Red-Black Tree (10K), 10% mutations. Series: Lock Elision, RH-NORec, TL2, HY-NORec

Red-Black Tree (10K), 40% mutations. Series: Lock Elision, RH-NORec, TL2, HY-NORec

Vacation Database (STAMP - Low). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec

Intruder Detection (STAMP). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec

Genome Sequencing (STAMP). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec

SSCA2 (STAMP). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec

• RH NOrec: a new Hybrid TM that is safe and scalable

• Key Idea: Use a “mixed” fallback path that uses two short hardware transactions:
1. HTM Prefix: Executes a maximal amount of initial reads – defers the global clock read
2. HTM Postfix: Executes the software writes – preserves safety and allows hardware-software concurrency

Conclusion

Thank You
