RTR: 1 Byte/Kilo-Instruction Race Recording
Min Xu, Rastislav Bodik, Mark D. Hill

Apr 01, 2015

Page 1

RTR: 1 Byte/Kilo-Instruction Race Recording

Min Xu, Rastislav Bodik, Mark D. Hill

Page 2

Why Do You Need a Recorder?

% gcc sim.c
% a.out
Segmentation fault
%

% gdb a.out
gdb> run
Program received SIGSEGV.
In get() at hash.c:45
45 a = bucket->d;

% gcc para-sim.c
% a.out
Segmentation fault
%

% gdb a.out
gdb> run
Program exited normally.
gdb>

% gcc para-sim.c
% a.out
Segmentation fault
Race recorded in “log”
%

% gdb a.out log
gdb> run
Program received SIGSEGV.
In get() at para-hash.c:67
67 a = bucket->d;

Page 3

Ideally …

% gcc para-sim.c
% a.out
Segmentation fault
Race recorded in “log”
%

% gdb a.out log
gdb> run
Program received SIGSEGV.
In get() at para-hash.c:67
67 a = bucket->d;

Long recording: small log
Low runtime overhead
Low cost

Applicability:
Programs – data races
Systems – non-SC

Page 4

Better and Better Recorders

Compared on: low cost, low overhead, small log size, and applicability (data race, non-SC)

InstRply ’87: Y Y
B. & G. ’91: Y
Netzer ’93: Y
Déjà Vu ’98: Y
RecPlay ’00
JaRec ’04
FDR ’03
BugNet ’05: Y
ReEnact ’03: Y
CORD ’06: Y
Strata ’06: Y
RTR ’06: Y Y

Page 5

A New Recorder

• Hardware Acceleration [ISCA’03]
• 1 Byte/Kilo-Instruction [ASPLOS’06]
• Less Hardware [ASPLOS’06]
• SC & TSO [ASPLOS’06]

Result: one more step toward practical race recording

This talk covers only RTR
• Regulated Transitive Reduction algorithm

Page 6

Outline

• Race Recording
• RTR Algorithm: compress the log during recording; replay more “regularly”
• Results with Commercial Workloads
• Conclusion

Page 7

Technically, what’s race recording?

Page 8

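Race Recording

Thread I: X = 1; X++; print(X)
Thread J: X = X*5

Original run: J's X = X*5 executes between I's X = 1 and X++, so X = 6.
Replay without a log: J's X = X*5 may execute after X++, so X = 10.
Recording: the log captures the cross-thread ordering, so replay again gives X = 6.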
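The X example can be replayed mechanically. The Python sketch below is a toy model, not the authors' recorder: `run`, `replay`, and the log format (a map from an instruction to the instructions that must run before it) are hypothetical names introduced here for illustration. It executes the two interleavings from the slide, then replays from a two-entry log that pins J's multiply between I's first two instructions.

```python
def run(schedule):
    """Execute (thread, op) pairs in order on a shared X; return the printed value."""
    x, printed = 0, None
    for _thread, op in schedule:
        if op == "X = 1":
            x = 1
        elif op == "X++":
            x += 1
        elif op == "X = X*5":
            x *= 5
        elif op == "print(X)":
            printed = x
    return printed


def replay(threads, log):
    """Greedy replayer: a thread may issue its next instruction only after every
    logged predecessor of that instruction has executed.
    `log` maps (thread, index) -> list of (thread, index) that must run first."""
    pos = {t: 0 for t in threads}
    done, schedule = set(), []
    while any(pos[t] < len(ops) for t, ops in threads.items()):
        for t in sorted(threads):
            i = pos[t]
            if i < len(threads[t]) and all(p in done for p in log.get((t, i), [])):
                schedule.append((t, threads[t][i]))
                done.add((t, i))
                pos[t] += 1
                break
        else:
            raise RuntimeError("no thread can make progress")
    return schedule


threads = {"I": ["X = 1", "X++", "print(X)"], "J": ["X = X*5"]}

# Original run: J's multiply lands between I's "X = 1" and "X++".
original = [("I", "X = 1"), ("J", "X = X*5"), ("I", "X++"), ("I", "print(X)")]
# An unconstrained replay may schedule the multiply after the increment.
unlogged = [("I", "X = 1"), ("I", "X++"), ("J", "X = X*5"), ("I", "print(X)")]
# Hypothetical log of the two cross-thread orderings (0-based indices).
log = {("J", 0): [("I", 0)], ("I", 1): [("J", 0)]}

print(run(original))              # 6
print(run(unlogged))              # 10
print(run(replay(threads, log)))  # 6
```

The replayer only enforces the logged orderings; everything else is left to whatever order it happens to pick, which is the freedom a race recorder tries to keep while still reproducing the outcome.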

Page 9

Terminology and Assumptions

Goal: reproduce the same conflicts with minimum log data

Figure: Threads I and J during recording and, via the log, during replay. Instructions shown: ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D. Conflicts are drawn in red, dependences in black.
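Two accesses from different threads conflict when they touch the same address and at least one is a store. The sketch below makes that definition concrete. It assumes one reading of the garbled figure (Thread I = ld A, st B, st C, ld D, sub, ld B; Thread J = add, st C, ld B, st A, st C, st D) and an assumed recorded interleaving `ORDER`; both are reconstructions chosen because they reproduce the logs on the later slides, not data from the talk. `access` and `conflicts` are hypothetical helpers.

```python
from itertools import product

# Assumed reading of the figure: six instructions per thread, 1-indexed.
THREAD_I = ["ld A", "st B", "st C", "ld D", "sub", "ld B"]
THREAD_J = ["add", "st C", "ld B", "st A", "st C", "st D"]

# Assumed global order of the recorded run, as (thread, index) pairs.
ORDER = [("I", 1), ("I", 2), ("J", 1), ("J", 2), ("I", 3), ("J", 3),
         ("I", 4), ("J", 4), ("J", 5), ("J", 6), ("I", 5), ("I", 6)]


def access(instr):
    """Return (is_store, address) for ld/st, or None for ALU ops like add/sub."""
    op, *rest = instr.split()
    if op in ("ld", "st"):
        return op == "st", rest[0]
    return None


def conflicts(threads, order):
    """Cross-thread pairs on the same address with at least one store,
    oriented from the earlier to the later access in the recorded order."""
    when = {step: n for n, step in enumerate(order)}
    found = []
    (a, ops_a), (b, ops_b) = threads.items()
    for (i, x), (j, y) in product(enumerate(ops_a, 1), enumerate(ops_b, 1)):
        ax, ay = access(x), access(y)
        if ax and ay and ax[1] == ay[1] and (ax[0] or ay[0]):
            first, second = sorted([(a, i), (b, j)], key=when.__getitem__)
            found.append((first, second))
    return found


for src, dst in conflicts({"I": THREAD_I, "J": THREAD_J}, ORDER):
    print(f"{src[0]}{src[1]} -> {dst[0]}{dst[1]}")
# I1 -> J4, I2 -> J3, J2 -> I3, I3 -> J5, I4 -> J6 (one per line)
```

Note that I6 (ld B) and J3 (ld B) do not conflict: both are loads.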

Page 10

Regulated Transitive Reduction (RTR)

Page 11

Log All Conflicts

Figure: replay of Threads I and J, instructions numbered 1–6 in each thread (ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D), with every conflict logged.

Dependence log (16 bytes per dependence):
Log J: 23 14 35 46
Log I: 23
Log size: 5 × 16 = 80 bytes (10 integers)

But there are too many conflicts.
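A sketch of how the two logs and their size follow from the five conflicts. It assumes that an entry such as 23 in Log J means "instruction 3 of Thread J must wait for instruction 2 of Thread I", that each dependence is stored in the log of the waiting thread, and that each integer takes 8 bytes (consistent with 10 integers = 80 bytes). `build_logs` and `log_size` are hypothetical helpers.

```python
INT_BYTES = 8  # 10 integers = 80 bytes on the slide

# The five conflicts, as (src_thread, src_idx, dst_thread, dst_idx).
CONFLICTS = [("I", 2, "J", 3), ("I", 1, "J", 4), ("I", 3, "J", 5),
             ("I", 4, "J", 6), ("J", 2, "I", 3)]


def build_logs(deps):
    """Store each dependence in the log of the thread that must wait (the target)."""
    logs = {}
    for _src_t, src_i, dst_t, dst_i in deps:
        logs.setdefault(dst_t, []).append((src_i, dst_i))
    return logs


def log_size(logs):
    """Two integers per dependence; returns (integers, bytes)."""
    integers = sum(2 * len(entries) for entries in logs.values())
    return integers, integers * INT_BYTES


logs = build_logs(CONFLICTS)
for thread in ("J", "I"):
    print(f"Log {thread}:", " ".join(f"{s}{d}" for s, d in logs[thread]))
print(log_size(logs))
# Log J: 23 14 35 46
# Log I: 23
# (10, 80)
```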

Page 12

Netzer’s Transitive Reduction (TR)

Figure: the same replay of Threads I and J (instructions 1–6 in each thread), with only the TR-reduced dependences logged.

TR-reduced log:
Log J: 23 35 46
Log I: 23
Log size: 64 bytes (8 integers)

Dependence 14 is dropped because 23 plus program order already implies it.

How can the log size be reduced further?
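The reduction on this example can be checked with an offline graph test: a conflict is dropped when some other path through program order and the remaining conflicts already orders its two endpoints. This is a simplified sketch for illustration; Netzer's algorithm works online during recording. It assumes the reading that entry 23 in Log J means "J3 waits for I2", 8-byte integers, and hypothetical helpers `reachable` and `transitive_reduction`.

```python
from collections import defaultdict

N = 6          # instructions per thread in the example
INT_BYTES = 8  # 8 integers = 64 bytes on the slide

# Conflicts from the full log, as ((src_thread, idx), (dst_thread, idx)).
CONFLICTS = [(("I", 2), ("J", 3)), (("I", 1), ("J", 4)), (("I", 3), ("J", 5)),
             (("I", 4), ("J", 6)), (("J", 2), ("I", 3))]
# Program order: each instruction happens before the next one in its thread.
PROGRAM_ORDER = [((t, k), (t, k + 1)) for t in "IJ" for k in range(1, N)]


def reachable(edges, start, goal):
    """Iterative depth-first search over happens-before edges."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def transitive_reduction(conflicts):
    """Keep only conflicts that no other path of program-order and conflict edges implies."""
    kept = []
    for dep in conflicts:
        others = PROGRAM_ORDER + [e for e in conflicts if e != dep]
        if not reachable(others, *dep):
            kept.append(dep)
    return kept


kept = transitive_reduction(CONFLICTS)
for (s, i), (d, j) in kept:
    print(f"{s}{i} -> {d}{j}")
print("bytes:", 2 * len(kept) * INT_BYTES)
# I2 -> J3, I3 -> J5, I4 -> J6, J2 -> I3, then "bytes: 64"
```

Only I1 -> J4 disappears: the path I1 -> I2 -> J3 -> J4 already orders it.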

Page 13

The Intuition of the RTR Algorithm

After reduction:
• From I to J: vectors
• From J to I: vectors

Vectors “regulate” replay.

Page 14

Stricter Dependences to Aid Vectorization

Figure: replay of Threads I and J (instructions 1–6 in each thread); some reduced dependences are replaced by stricter ones.

New reduced log:
Log J: 23 45
Log I: 23
Log size: 48 bytes (6 integers)

The stricter dependence 45 replaces 35 and 46: through program order it implies both.
Stricter dependences: fewer dependences to log.
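A sketch that checks the claim on this slide: the three-entry log, with the stricter I4 -> J5 in place of I3 -> J5 and I4 -> J6, still orders all five recorded conflicts. It assumes that a log entry such as 45 in Log J means "J5 waits for I4", that dependences are written as ((thread, index), (thread, index)) pairs, and that each integer takes 8 bytes; `reachable` and `preserves_all` are hypothetical helpers.

```python
from collections import defaultdict

N = 6          # instructions per thread in the example
INT_BYTES = 8  # 6 integers = 48 bytes on the slide

PROGRAM_ORDER = [((t, k), (t, k + 1)) for t in "IJ" for k in range(1, N)]

# All five recorded conflicts that replay must reproduce (src -> dst).
CONFLICTS = [(("I", 2), ("J", 3)), (("I", 1), ("J", 4)), (("I", 3), ("J", 5)),
             (("I", 4), ("J", 6)), (("J", 2), ("I", 3))]
# New reduced log: the stricter I4 -> J5 stands in for I3 -> J5 and I4 -> J6.
RTR_LOG = [(("I", 2), ("J", 3)), (("I", 4), ("J", 5)), (("J", 2), ("I", 3))]


def reachable(edges, start, goal):
    """Iterative depth-first search over happens-before edges."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def preserves_all(log, conflicts):
    """True if program order plus the log orders every recorded conflict."""
    edges = PROGRAM_ORDER + log
    return all(reachable(edges, src, dst) for src, dst in conflicts)


print(preserves_all(RTR_LOG, CONFLICTS))                   # True
print(preserves_all([RTR_LOG[0], RTR_LOG[2]], CONFLICTS))  # False: I3 -> J5 is lost
print(2 * len(RTR_LOG) * INT_BYTES)                        # 48
```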

Page 15

Compress Vectorized Dependencies

Figure: the same replay of Threads I and J (instructions 1–6 in each thread), with dependences grouped into vectors.

Vectorized log:
Log J: x = 3, 5; ∆ = 1
Log I: x = 3; ∆ = 1
Log size: 40 bytes (5 integers)

TR → RTR: fewer dependences + fewer bytes per dependence
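One possible reading of the vectorized notation: "x = 3, 5; ∆ = 1" in Log J stands for the dependences I(x − ∆) → J(x), so a whole group sharing one offset ∆ costs one integer per x plus one integer for ∆. The sketch below groups the new reduced log (Log J: 23 45, Log I: 23) this way and reproduces the slide's 5 integers, or 40 bytes at 8 bytes per integer. The helpers `vectorize` and `vector_ints` are hypothetical, and the paper's actual encoding may differ.

```python
from collections import defaultdict

INT_BYTES = 8  # 5 integers = 40 bytes on the slide

# New reduced log, per waiting thread, as (source index, target index) pairs.
REDUCED = {"J": [(2, 3), (4, 5)], "I": [(2, 3)]}


def vectorize(entries):
    """Group dependences by offset delta = target - source.

    Returns {delta: [target indices]}; each group is one vector."""
    groups = defaultdict(list)
    for src, dst in entries:
        groups[dst - src].append(dst)
    return dict(groups)


def vector_ints(groups):
    """One integer per target index x, plus one integer per delta."""
    return sum(len(xs) + 1 for xs in groups.values())


vec = {thread: vectorize(entries) for thread, entries in REDUCED.items()}
total_ints = sum(vector_ints(g) for g in vec.values())
print(vec)                                 # {'J': {1: [3, 5]}, 'I': {1: [3]}}
print(total_ints, total_ints * INT_BYTES)  # 5 40
```

The gain comes from the stricter dependences: both J-side entries now share ∆ = 1, so they collapse into a single vector.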

Page 16

Deadlock Avoidance of RTR

Figure: recording of Threads I and J (instructions 1–6 in each thread).

Replay cycle: i:4 → j:1 → j:2 → i:3 → i:4

Limit the strict dependences (see paper).
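A replay log is usable only if program order plus the logged dependences form an acyclic graph; otherwise each thread waits on the other forever. The sketch below checks this with Kahn's topological sort. The slide does not say which over-strict dependence closes the cycle i:4 → j:1 → j:2 → i:3 → i:4, so here we hypothetically add I4 -> J1 to a log that already contains J2 -> I3. The names `graph`, `has_cycle`, `SAFE`, and `TOO_STRICT` are illustrative.

```python
from collections import defaultdict, deque

N = 6  # instructions per thread in the example

# Program order inside each thread: (t, k) happens before (t, k + 1).
PROGRAM_ORDER = [((t, k), (t, k + 1)) for t in "IJ" for k in range(1, N)]

# A cycle-free log, read as src -> dst pairs (I2 -> J3, I4 -> J5, J2 -> I3).
SAFE = [(("I", 2), ("J", 3)), (("I", 4), ("J", 5)), (("J", 2), ("I", 3))]
# Hypothetical over-strict dependence I4 -> J1 added to the same log.
TOO_STRICT = SAFE + [(("I", 4), ("J", 1))]


def graph(log):
    """Happens-before edges the replayer must honor."""
    return PROGRAM_ORDER + log


def has_cycle(edges):
    """Kahn's algorithm: the graph is acyclic iff every node is eventually dequeued."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = deque(n for n in nodes if indeg[n] == 0)
    done = 0
    while ready:
        node = ready.popleft()
        done += 1
        for nxt in succ[node]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return done != len(nodes)


print(has_cycle(graph(SAFE)))        # False: replay can proceed
print(has_cycle(graph(TOO_STRICT)))  # True: i:4 -> j:1 -> j:2 -> i:3 -> i:4
```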

Page 17

Results with Commercial Workloads

Page 18

Full-system Simulation Method

Commercial server hardware
• GEMS: http://www.cs.wisc.edu/gems
• Full-system (OS + application) executions
• 4-core CMP (sequentially consistent)
• 1-way in-order issue, 2 GHz
• 64 KB I/D L1, 4 MB L2, 64-byte lines, MOSI directory

Commercial server software
• Apache – static web serving
• SpecJBB – middleware
• OLTP – TPC-C-like
• Zeus – static web serving

Figure: 4-core CMP, cores C0–C3 sharing an L2.

Page 19

Log Size: 1 byte/KI

Figure: log size per benchmark (Apache, JBB, OLTP, Zeus, average), shown in byte/core/KI (axis 0–2.0) and in KB/core/s (axis 0–200).

Less buffer, longer recording, smaller logs

Page 20

RTR vs. Netzer’s TR

Figure: log size of TR vs. RTR (axis 0–100) for Apache, JBB, OLTP, Zeus, and the average.

• 28% smaller log
• TR was “optimal”

Page 21

Why Does RTR Work Well?

• Instructions execute at similar speeds
• Dependences are often “vectorizable”

Page 22

A New Recorder

• Hardware Acceleration [ISCA’03]
• 1 Byte/Kilo-Instruction [ASPLOS’06]
• Less Hardware [ASPLOS’06]
• SC & TSO [ASPLOS’06]

Result: one more step toward practical race recording

“Less hardware” & “TSO” are not covered in this talk
• Equally important
• More details in the paper

Page 23

Conclusion

Race recording: counter nondeterminism

RTR: 1 byte/kilo-instruction
• Based on Netzer’s transitive reduction
• Creates stricter dependences
• Vectorizes dependences to compress the log
• Avoids overly strict dependences, hence no deadlock

Future work
• Support snooping, SMT, and a replayer