More on Thread Level Speculation Anthony Gitter Dafna Shahaf Or Sheffet

Dec 19, 2015

Page 1: More on Thread Level Speculation Anthony Gitter Dafna Shahaf Or Sheffet.

More on Thread Level Speculation

Anthony Gitter, Dafna Shahaf, Or Sheffet

Page 2:

Thread Level Speculation (TLS)

A technique for automatic parallelization.
• Run threads in parallel, but in a speculative state.
• Check for violations.
• Commit upon successful completion.
• Squash when detecting a violation.
  – Propagate the squash onwards.
  – Re-run the thread.
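The run / check / commit / squash cycle can be sketched in software. This is only an illustration under stated assumptions, not how hardware TLS is built: threads are simulated one after another, violations are checked lazily when the oldest thread commits, and the names `SpecThread` and `run_tls` are made up for this example. Speculative threads never forward values to each other in this model, so a squash only has to re-run the violated thread.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SpecThread:
    """One speculative thread with a private write buffer (hypothetical model)."""
    body: Callable
    reads: set = field(default_factory=set)
    writes: dict = field(default_factory=dict)

    def run(self, memory):
        """(Re-)execute the body speculatively; memory itself is not modified."""
        self.reads.clear()
        self.writes.clear()

        def load(addr):
            if addr in self.writes:          # reading our own buffered store
                return self.writes[addr]
            self.reads.add(addr)             # exposed read: may be violated
            return memory[addr]

        def store(addr, value):
            self.writes[addr] = value        # buffered until commit

        self.body(load, store)


def run_tls(memory, bodies):
    """Run bodies speculatively, commit in program order, return squash count."""
    threads = [SpecThread(b) for b in bodies]
    for t in threads:                        # speculative phase
        t.run(memory)
    squashes = 0
    for i, t in enumerate(threads):
        memory.update(t.writes)              # commit the oldest thread
        for later in threads[i + 1:]:
            if not later.reads.isdisjoint(t.writes):   # violation detected
                later.run(memory)            # squash and re-run
                squashes += 1
    return squashes


def add_to_sum(i):
    # Every iteration reads and writes "sum": a chain of true dependences.
    return lambda load, store: store("sum", load("sum") + i)


def write_own_slot(i):
    # Every iteration touches only its own address: no dependences.
    return lambda load, store: store(("slot", i), i)


dependent = {"sum": 0}
dependent_squashes = run_tls(dependent, [add_to_sum(i) for i in (1, 2, 3, 4)])

independent = {}
independent_squashes = run_tls(independent, [write_own_slot(i) for i in range(4)])
```

The dependent loop still ends with the sequential result, but only after repeated squashes; the independent loop commits without any.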

Page 3:

Thread Level Speculation Example

Page 4:

Mechanism of TLS

1. Managing speculative state.
2. Disambiguation: checking addresses for violating dependencies
   – Eager vs. Lazy
3. Upon commit
   – Broadcast (Everybody? Relevant?)
   – Invalidate/update of other threads
   – Leave speculative state
4. Upon squash
   – Broadcast
   – Invalidate changes for this thread
   – Re-run

At hardware level. Involves the cache.

Simple. Fast.
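The eager vs. lazy choice in step 2 is about when a violating dependence is noticed. The sketch below is an assumption-laden toy, not a hardware description: a trace is a list of `(thread, op, addr)` events in time order, lower thread ids are logically earlier, and the function name `detect` is invented here. It does not model the squash itself, only the moment of detection.

```python
def detect(trace, eager):
    """Return (time, violated_thread) pairs for a trace of (thread, op, addr)."""
    reads, writes, found = {}, {}, []
    for time, (tid, op, addr) in enumerate(trace):
        if op == "load":
            reads.setdefault(tid, set()).add(addr)
        elif op == "store":
            writes.setdefault(tid, set()).add(addr)
            if eager:   # eager: check every store as it happens
                found += [(time, r) for r in sorted(reads)
                          if r > tid and addr in reads[r]]
        elif op == "commit" and not eager:   # lazy: check once, at commit
            written = writes.get(tid, set())
            found += [(time, r) for r in sorted(reads)
                      if r > tid and reads[r] & written]
    return found


trace = [
    (2, "load", "a"),     # thread 2 reads a too early
    (1, "store", "a"),    # thread 1, logically earlier, writes a
    (1, "store", "b"),
    (1, "commit", None),
]
eager_hits = detect(trace, eager=True)
lazy_hits = detect(trace, eager=False)
```

Both modes flag thread 2, but the eager check fires at the offending store while the lazy check waits for the commit.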

Page 5:

Scenarios

• Thread attributes:
  – Length
  – Memory accesses
  – Dependences

Length   Accesses   Depend.
?        ?          Many      Serial
?        ?          0         Easily parallel
Short    Many       Few       TLS costly
Short    Few        Few       TLS works
Long     Few        Few       TLS costly

Page 6:

When is TLS Too Costly?

• “Too much data” scenario
  – Thread touches too many addresses.
• “Too much time” scenario
  – Execution involves many instructions (e.g. database transactions).

Bulk Disambiguation of Speculative Threads in Multiprocessors. Ceze, Tuck, Cascaval, Torrellas.

Tolerating Dependences Between Large Speculative Threads Via Sub-Threads. Colohan, Ailamaki, Steffan, Mowry.

Page 7:

Too Many Addresses – Solution 1

Each thread maintains a bitwise mask of the cache.
• Flip bit on when touching an address.
• Upon completion, check addresses you and others touched. (Lazy)
• Commit / Squash: send mask.
• Invalidating/replacing/changing address state in cache: use mask.

All bitwise operations. Very simple!
Infeasible for size reasons (won’t scale).
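A minimal sketch of the exact-mask idea, using a Python integer as the bit vector. The class name `AccessMask` is invented for this example, and the size estimate at the end assumes a 32-bit byte-addressed space tracked at 4-byte word granularity; neither number comes from the slides.

```python
class AccessMask:
    """One bit per address; bit i is set once address i is touched."""

    def __init__(self):
        self.bits = 0

    def touch(self, addr):
        self.bits |= 1 << addr

    def overlaps(self, other):
        # A single AND answers "did we touch a common address?"
        return (self.bits & other.bits) != 0

    def addresses(self):
        return [i for i in range(self.bits.bit_length()) if (self.bits >> i) & 1]


mine, theirs = AccessMask(), AccessMask()
for addr in (3, 17, 42):
    mine.touch(addr)
for addr in (8, 42):
    theirs.touch(addr)

conflict = mine.overlaps(theirs)      # both touched address 42
shared = AccessMask()
shared.bits = mine.bits & theirs.bits

# Size of one exact mask under the assumptions stated above.
words = 2**32 // 4
mask_bytes = words // 8
```

Every operation is a single bitwise step, but under these assumptions one mask per thread already takes 128 MiB, which is the scaling problem the slide points at.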

Page 8:

Solution: Hash!

Introducing Bulk: hardware that hashes the address space into a signature (~2k in size).

[Figure: each address in the address space hashes to a bit pattern; the signature is the bitwise OR of the patterns.]

  0 1 0 1 0 0 0 0 1
  0 0 1 1 0 0 1 0 0
  -----------------   Bitwise OR
  0 1 1 1 0 0 1 0 1   Signature

Upon completion, send signature!

Upon receiving, pull back to a superset of possible addresses.
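The hash-then-OR step and the "pull back to a superset" step can be sketched as follows. The hash used here is an assumption chosen for illustration (two 10-bit address fields, each setting one bit in its own 1024-bit segment, for 2048 bits in total); it is not claimed to be the exact function the Bulk hardware uses, and `encode`, `signature`, `may_contain` and `pull_back` are hypothetical names.

```python
FIELD_BITS = 10                 # assumed: each field indexes 2**10 bits
NUM_FIELDS = 2                  # assumed: two fields -> 2048-bit signature
FIELD_SIZE = 1 << FIELD_BITS


def encode(addr):
    """Hash one address to a signature with one bit set per field."""
    sig = 0
    for f in range(NUM_FIELDS):
        chunk = (addr >> (f * FIELD_BITS)) & (FIELD_SIZE - 1)
        sig |= 1 << (f * FIELD_SIZE + chunk)
    return sig


def signature(addrs):
    """Signature of a set of addresses: the bitwise OR of their encodings."""
    sig = 0
    for addr in addrs:
        sig |= encode(addr)
    return sig


def may_contain(sig, addr):
    """True if addr could be in the set (never a false negative)."""
    enc = encode(addr)
    return (sig & enc) == enc


def pull_back(sig, candidates):
    """Expand a received signature to a superset of the original addresses."""
    return {a for a in candidates if may_contain(sig, a)}


touched = {0x0401, 0x0802}
sig = signature(touched)
recovered = pull_back(sig, range(4096))
```

In this example `recovered` holds the two real addresses plus two aliases (0x0402 and 0x0801), which is exactly the superset behaviour the slide describes.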

Page 9:

Bulk Features:

• Separate Reading / Writing signatures.
• Committing: sending signature.
• Invalidating: pulling back signature into a superset.
• Granularity is on word level (not cache line), since we map addresses.

Caveat: We might see violations even if there weren't any!
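The caveat is easy to reproduce with a toy signature. The sketch below assumes a deliberately tiny 16-bit signature with a modulo hash, and assumes that a receiving thread compares the committed write signature against both its own read and write signatures; `sig_of` and `must_squash` are names invented for this example.

```python
SIG_BITS = 16   # deliberately tiny so that aliasing is easy to see


def sig_of(addrs):
    """Hash word addresses into a SIG_BITS-wide signature (toy hash: modulo)."""
    sig = 0
    for addr in addrs:
        sig |= 1 << (addr % SIG_BITS)
    return sig


def must_squash(committed_w, receiver_r, receiver_w):
    """Receiver's check when another thread's write signature arrives."""
    return (committed_w & receiver_r) != 0 or (committed_w & receiver_w) != 0


committer_writes = {3}
receiver_reads = {19}      # 19 % 16 == 3: aliases with address 3
receiver_writes = {40}     # 40 % 16 == 8: no overlap

signature_says_squash = must_squash(
    sig_of(committer_writes), sig_of(receiver_reads), sig_of(receiver_writes))
truly_conflicting = bool(committer_writes & (receiver_reads | receiver_writes))
```

The signatures report a violation although the exact address sets are disjoint: a false positive that costs a squash but never breaks correctness.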

Page 10:

Bulk Performance

Page 11:

Bulk Performance

Fraction of False Positives as a function of Signature Length
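As a rough intuition for why longer signatures alias less, the sketch below uses the standard estimate for a bit vector with a single uniform hash: after inserting n addresses into m bits, an address outside the set hits an already-set bit with probability 1 − (1 − 1/m)^n. This is not the measurement plotted on the slide, and the choice of 64 addresses is an arbitrary assumption.

```python
def alias_probability(sig_bits, n_addrs):
    """Chance that an address outside the set hits an already-set bit."""
    return 1.0 - (1.0 - 1.0 / sig_bits) ** n_addrs


N_ADDRS = 64   # assumed number of distinct addresses in the signature
rates = {bits: alias_probability(bits, N_ADDRS)
         for bits in (512, 1024, 2048, 4096)}

for bits, rate in rates.items():
    print(f"{bits:5d} bits: {rate:.3f}")
```

Under this model, doubling the signature length roughly halves the aliasing rate once the signature is sparse.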

Page 12:

When is TLS Too Costly?

• “Too much data” scenario
  – Thread touches too many addresses.
• “Too much time” scenario
  – Execution involves many instructions (e.g. database transactions).

Bulk Disambiguation of Speculative Threads in Multiprocessors. Ceze, Tuck, Cascaval, Torrellas.

Tolerating Dependences Between Large Speculative Threads Via Sub-Threads. Colohan, Ailamaki, Steffan, Mowry.

Page 13:

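Handling Long Threads (Attempt 1)

Q: Does eliminating a data dependence help?

[Figure: timeline labelled "Parallel" with two threads. Labels: *p=, *q= (stores); =*p, =*q (loads); R2. The load =*p is marked "Violation!" and appears a second time after it. Image courtesy Chris Colohan.]

Upon violation, we re-execute a long thread.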

Page 14:

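Handling Long Threads (Attempt 1)

[Figure: the "Parallel" timeline from the previous slide next to a second timeline labelled "Eliminate *p Dep.", in which the load =*q is marked "Violation!" against the store *q= and appears a second time. Image courtesy Chris Colohan.]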

Page 15:

Handling Long Threads (Attempt 2): Sub-Threads

• Sub-threads are checkpoints during thread execution
• No longer “all or nothing”
• Must be lightweight
• Help with primary and secondary violations

[Figure: timeline with the store *q=, a load =*q marked "Violation!", and the load =*q repeated. Image courtesy Chris Colohan.]
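The benefit of checkpoints can be put in numbers with a small model. Everything here is an assumption made for illustration: time is counted in instructions, sub-threads start at a fixed interval, and a violation rewinds the thread to the latest checkpoint at or before the violated load. The function names are hypothetical.

```python
def rewind_point(load_time, checkpoint_interval=None):
    """Instruction count the thread restarts from after a violation."""
    if checkpoint_interval is None:          # no sub-threads: all or nothing
        return 0
    return (load_time // checkpoint_interval) * checkpoint_interval


def wasted_work(load_time, detect_time, checkpoint_interval=None):
    """Instructions that must be re-executed when the violation is detected."""
    return detect_time - rewind_point(load_time, checkpoint_interval)


# A load at instruction 7,300 is found to be violated at instruction 9,000.
without_subthreads = wasted_work(7_300, 9_000)
with_subthreads = wasted_work(7_300, 9_000, checkpoint_interval=1_000)
```

Without sub-threads all 9,000 instructions are redone; with a checkpoint every 1,000 instructions only the 2,000 executed since the checkpoint at 7,000 are lost.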

Page 16:

Sub-thread Implementation

• Assume CMP with shared L2
• L1 is unaware of sub-threads
  – Speculatively modified bit per cache line
• L2 performs eager violation detection
  – 2 additional bits per cache line per sub-thread
  – Replication to track different sub-thread contexts
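A sketch of how two bits per line per sub-thread could support eager detection in the L2. This is a simplified model built on assumptions: the two bits are taken to be a load bit and a store bit, thread indices follow program order, and a store by an earlier thread violates any later thread whose load bit is set, rewinding it to the earliest sub-thread that loaded the line. `L2Line` is a made-up name, and the model ignores loads that are covered by the thread's own earlier store.

```python
class L2Line:
    """Per-line speculative metadata: a load bit and a store bit per sub-thread."""

    def __init__(self, n_threads, n_subthreads):
        self.load_bit = [[False] * n_subthreads for _ in range(n_threads)]
        self.store_bit = [[False] * n_subthreads for _ in range(n_threads)]

    def load(self, thread, sub):
        self.load_bit[thread][sub] = True

    def store(self, thread, sub):
        """Record a store; return (later_thread, sub_thread_to_rewind_to) pairs."""
        self.store_bit[thread][sub] = True
        violated = []
        for later in range(thread + 1, len(self.load_bit)):
            subs = [s for s, bit in enumerate(self.load_bit[later]) if bit]
            if subs:
                violated.append((later, min(subs)))
        return violated

    def metadata_bits(self):
        # 2 bits (load + store) for every sub-thread context of every thread
        return 2 * sum(len(row) for row in self.load_bit)


line = L2Line(n_threads=2, n_subthreads=4)
line.load(thread=1, sub=2)                  # later thread reads in sub-thread 2
violations = line.store(thread=0, sub=0)    # earlier thread then writes the line
```

Here thread 1 only rewinds to the start of its sub-thread 2 instead of restarting, at a cost of 16 metadata bits for this line.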

Page 17:

Sub-thread Evaluation

[Figure: stacked bar chart of Time (normalized), y-axis from 0 to 1.2, broken down into Idle CPU, Failed, Cache Miss and Busy. Benchmarks: New Order, New Order 150, Delivery, Delivery Outer, Stock Level, Payment, Order Status. Each benchmark has three bars: N = no sub-threads, S = with sub-threads, L = limit, ignoring violations. Image courtesy Chris Colohan.]

Page 18:

Summary

• Thread attributes:
  – Length
  – Memory accesses
  – Dependences

Length   Accesses   Depend.
?        ?          Many      Serial
?        ?          0         Easily parallel
Short    Few        Few       TLS works
Short    Many       Few       TLS costly: BULK
Long     Few        Few       TLS costly: Sub-Threads
Long     Many       Few       Hopeless??

Page 19:

Open Questions

• Long threads that also touch many addresses.
  – Bulk on top of sub-threads?
• Combining lazy/eager evaluations

Thank you!

Page 20:

Backup Slides

Page 21:

Buffering Large Threads

Instructions so far: store X, 0x00

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01). Values shown: X. Tags shown: S1. Callout: "Store and load bit per thread".]

Slide courtesy Chris Colohan

Page 22:

Buffering Large Threads

Instructions so far: store X, 0x00; store A, 0x01

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01). Values shown: X, A. Tags shown: S1, S1.]

Slide courtesy Chris Colohan

Page 23:

Buffering Large Threads

Instructions so far: store X, 0x00; store A, 0x01; load 0x00

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01). Values shown: X, A. Tags shown: S1, S1, L2.]

Slide courtesy Chris Colohan

Page 24:

Buffering Large Threads

Instructions so far: store X, 0x00; store A, 0x01; load 0x00; store Y, 0x00

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01). Values shown: X, A, Y. Tags shown: X with S1 and L2; A with S1; Y with S2 and L2. Callout: "Replicate line – one version per thread".]

Slide courtesy Chris Colohan
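The replication step can be sketched as a cache that keeps one speculative version of a line per thread. This is a toy model under stated assumptions: the thread numbering (thread 1 issues the stores of X and A, thread 2 issues the load and the store of Y) is a reading of the S1/S2/L2 tags, a load returns the newest version written by the same or a logically earlier thread, and `VersionedL2` is an invented name.

```python
class VersionedL2:
    """Shared cache holding one speculative version of a line per thread."""

    def __init__(self):
        self.committed = {}   # addr -> committed value
        self.versions = {}    # addr -> {thread id: speculative value}

    def store(self, thread, addr, value):
        # A store by a new thread replicates the line instead of overwriting it.
        self.versions.setdefault(addr, {})[thread] = value

    def load(self, thread, addr):
        # Newest version from this thread or a logically earlier one.
        candidates = [t for t in self.versions.get(addr, {}) if t <= thread]
        if candidates:
            return self.versions[addr][max(candidates)]
        return self.committed.get(addr)


l2 = VersionedL2()
l2.store(1, 0x00, "X")
l2.store(1, 0x01, "A")
seen_by_2 = l2.load(2, 0x00)    # thread 2 sees thread 1's speculative X
l2.store(2, 0x00, "Y")          # line 0x00 now has two versions
```

After the last store, thread 1 still reads X at 0x00 while thread 2 reads its own Y.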

Page 25:

Buffering Large Threads

Instructions so far: store X, 0x00; store A, 0x01; load 0x00; store Y, 0x00; load 0x01

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01). Values shown: X, A, Y. Tags shown: S1; S2 L2; S1 L2.]

Slide courtesy Chris Colohan

Page 26:

Buffering Large Threads

Instructions so far: store X, 0x00; store A, 0x01; load 0x00; store Y, 0x00; load 0x01; store B, 0x01

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01). Values shown: X, A, Y, B. Tags shown: S1; S2 L2; S1 L2.]

Slide courtesy Chris Colohan

Page 27:

Sub-thread Support

Instructions so far: store X, 0x00; store A, 0x01; load 0x00; store Y, 0x00; load 0x01; store B, 0x01

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01), with braces marking sub-threads a and b. Values shown: X, A, Y, B. Tags shown: S1; S1 L2; S2 L2. Callouts: "Divide into two sub-threads", "Only roll back violated sub-thread".]

Slide courtesy Chris Colohan

Page 28:

Copyright 2006 Chris Colohan

Sub-thread Support

Instructions so far: store X, 0x00; store A, 0x01; load 0x00; store Y, 0x00; load 0x01; store B, 0x01

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01), with braces marking sub-threads a and b. Values shown: X, A, Y, B. Tags shown: S1a; S2a L2a; L2b. Callout: "Store and load bit per sub-thread".]

Slide courtesy Chris Colohan

Page 29:

Sub-thread Support

Instructions so far: store X, 0x00; store A, 0x01; load 0x00; store Y, 0x00; load 0x01; store B, 0x01

[Figure: cache-state diagram (two L1$, one L2$; lines 0x00 and 0x01), with braces marking sub-threads a and b. Values shown: X, A, Y, B. Tags shown: S1a; S1b; S2a L2a; L2b.]

Slide courtesy Chris Colohan

Page 30:

Sub-thread Evaluation

• Evaluate using large database transactions
• Parallelize the loops
• Can we place an upper bound on the possible speedup?