UW-Madison Computer Sciences Multifacet Group © 2011 Karma: Scalable Deterministic Record-Replay Arkaprava Basu Jayaram Bobba Mark D. Hill Work done at University of Wisconsin-Madison

Dec 14, 2015


Daniella Towles

UW-Madison Computer Sciences Multifacet Group © 2011

Karma: Scalable Deterministic Record-Replay

Arkaprava Basu

Jayaram Bobba

Mark D. Hill

Work done at University of Wisconsin-Madison


Executive summary

• Applications of deterministic record-replay
  – Debugging
  – Fault tolerance
  – Security
• Existing hardware record-replayer
  – Fast record but
  – Slow replay or
  – Requires major hardware changes
• Karma: Faster Replay with nearly-conventional h/w
  – Extends Rerun
  – Records more parallelism


Outline

• Background & Motivation
• Rerun Overview
• Karma Insights
• Karma Implementation
• Evaluation
• Conclusion


Deterministic Record-Replay

• Multi-threaded execution is non-deterministic
• Deterministic record-replay to reincarnate past execution
• Record:
  – Record selective events in a log
• Replay:
  – Use the log to reincarnate past execution
• Key Challenge: Memory races


Record-Replay Motivation

• Debugging
  – Ensures bugs faithfully reappear (no heisenbugs)
• Fault-Tolerance
  – Enable hot backup for primary server to shadow primary & take over on failure
• Security
  – Real-time intrusion detection & attack analysis

Replay speed matters


Previous work

• Record Dependence
  – Wisconsin Flight Data Recorder [ISCA’03, etc.]: Too much state
  – UCSD Strata [ASPLOS’06]: Log size grows rapidly with #cores
• Record Independence
  – UIUC DeLorean [ISCA’08]: Non-conventional BulkSC H/W
  – Wisconsin Rerun [ISCA’08]: Sequential replay
  – Intel MRR [MICRO’09]: Only for snoop-based systems
  – Timetraveler [ISCA’10]: Extends Rerun to lower log size
• Our Goal
  – Retain Rerun’s near-conventional hardware
  – Enable faster replay



Rerun’s Recording

• Most code executes without races
  – Use race-free regions for ordering
• Episodes: independent execution regions
  – Defined per thread

[Figure: load/store sequences (LD A, ST B, ST C, LD F, ...) of threads T0, T1, and T2, grouped into episodes.]

Partially adopted from ISCA’08 talk
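
As a rough illustration of how race-free regions become episodes, the sketch below ends a thread's current episode whenever another thread's access conflicts with what that episode has read or written. It is a software model under stated assumptions, not Rerun's hardware: exact Python sets stand in for per-core address filters, a single globally ordered trace stands in for coherence traffic, and the names `Episode`, `conflicts`, and `split_into_episodes` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    thread: int
    refs: int = 0                               # memory references in this episode
    reads: set = field(default_factory=set)     # addresses read so far
    writes: set = field(default_factory=set)    # addresses written so far


def conflicts(ep, op, addr):
    """A remote store conflicts with anything the episode touched;
    a remote load conflicts only with the episode's writes."""
    if op == "ST":
        return addr in ep.reads or addr in ep.writes
    return addr in ep.writes


def split_into_episodes(trace, num_threads):
    """trace: list of (thread, op, addr) in global order, op is 'LD' or 'ST'.
    Returns finished episodes in the order they ended."""
    current = [Episode(t) for t in range(num_threads)]
    finished = []
    for thread, op, addr in trace:
        # End the episode of every *other* thread that this access races with.
        for other in range(num_threads):
            ep = current[other]
            if other != thread and ep.refs > 0 and conflicts(ep, op, addr):
                finished.append(ep)
                current[other] = Episode(other)
        # Then account for the access in the issuing thread's own episode.
        ep = current[thread]
        ep.refs += 1
        (ep.writes if op == "ST" else ep.reads).add(addr)
    # Episodes still open at the end of the trace are closed in thread order.
    finished.extend(ep for ep in current if ep.refs > 0)
    return finished


trace = [(0, "LD", "A"), (0, "ST", "B"), (1, "ST", "E"), (1, "LD", "B"), (0, "ST", "C")]
for ep in split_into_episodes(trace, 2):
    print(ep.thread, ep.refs)
```

In this toy trace, T1's load of B conflicts with T0's earlier store to B, so T0's first episode ends after two references; everything else runs conflict-free.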


Rerun’s Recording (Contd.)

• Capturing causality:
  – Timestamp via Lamport scalar clock [Lamport ‘78]
• Replay in timestamp order
  – Episodes with same timestamp can be replayed in parallel

[Figure: episodes of T0, T1, and T2 labeled with Lamport timestamps (22, 23, 43, 44, 45, 60, 61, 62).]
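
The scalar-clock idea above can be sketched in a few lines: each episode gets a timestamp one larger than the largest timestamp among the episodes it must follow, and replay proceeds timestamp by timestamp. This is a minimal model, assuming the dependences between episodes are already known; the function names and the episode labels (`a0`, `b0`, ...) are made up for illustration.

```python
from collections import defaultdict


def lamport_timestamps(episodes, deps):
    """episodes: list of (name, thread), ordered so that every episode
    appears after the episodes it depends on.
    deps: dict name -> names of episodes on other threads it must follow.
    Returns dict name -> scalar timestamp."""
    ts = {}
    last_on_thread = {}
    for name, thread in episodes:
        before = [ts[p] for p in deps.get(name, [])]
        if thread in last_on_thread:
            before.append(ts[last_on_thread[thread]])   # program order
        ts[name] = max(before, default=0) + 1
        last_on_thread[thread] = name
    return ts


def replay_waves(ts):
    """Group episodes by timestamp. Waves are replayed one after another;
    episodes inside one wave may be replayed in parallel."""
    waves = defaultdict(list)
    for name, t in ts.items():
        waves[t].append(name)
    return [sorted(waves[t]) for t in sorted(waves)]


episodes = [("a0", 0), ("b0", 1), ("a1", 0), ("b1", 1)]
deps = {"b0": ["a0"]}            # b0 on T1 must follow a0 on T0
ts = lamport_timestamps(episodes, deps)
print(replay_waves(ts))          # [['a0'], ['a1', 'b0'], ['b1']]
```

Note that a scalar timestamp orders every pair of episodes with different timestamps, including pairs that never interacted during recording.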


Rerun’s Replay

[Figure: Rerun replays the episodes of T0, T1, and T2 in timestamp order (TS=22, 43, 44, 45, 60, 61).]



Karma’s Insight 1:

• Capture order with DAG (not scalar clock)

Recording: DAG captured with episode predecessor & successor sets

[Figure: the episodes of T0, T1, and T2 (labeled 22, 23, 43, 44, 45, 60, 61, 62) with their ordering captured as a DAG.]


Karma’s Insight 1:

[Figure: replay schedules of T0, T1, and T2 under Rerun's Replay and under Karma's Replay.]


Karma’s Insight 1: (Contd.)

• Naïve approach: DAG arcs point to episodes
  – Episode represented by integers
  – Too much log size overhead!
• Our approach: DAG arcs point to cores
  – Recording: Only one “active” episode per core
  – Replay: Send wakeup message(s) to core(s) of successor episode(s)
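
The wakeup scheme above can be modeled as follows: each core walks its own log, waits for one wakeup from every predecessor core named in the entry, replays the episode, then wakes every successor core. This is a simplified simulation, assuming zero-latency wakeups that are consumed in the order they were sent; the log layout `(duration, pred_cores, succ_cores)` and both function names are hypothetical. A wave-by-wave timestamp replay is included only as a simplified baseline for comparison.

```python
from collections import deque


def karma_replay(logs):
    """logs[c] is core c's log: a list of (duration, pred_cores, succ_cores).
    Returns, per core, the finish time of each replayed episode."""
    n = len(logs)
    # inbox[dst][src] holds arrival times of wakeups sent from src to dst.
    inbox = [[deque() for _ in range(n)] for _ in range(n)]
    next_ep = [0] * n
    clock = [0] * n
    finish = [[] for _ in range(n)]
    progress = True
    while progress:
        progress = False
        for c in range(n):
            while next_ep[c] < len(logs[c]):
                duration, preds, succs = logs[c][next_ep[c]]
                if any(not inbox[c][p] for p in preds):
                    break                      # still waiting for a wakeup
                start = max([clock[c]] + [inbox[c][p].popleft() for p in preds])
                clock[c] = start + duration
                finish[c].append(clock[c])
                for s in succs:                # wake the successor cores
                    inbox[s][c].append(clock[c])
                next_ep[c] += 1
                progress = True
    if any(next_ep[c] < len(logs[c]) for c in range(n)):
        raise ValueError("a core waits for a wakeup that never arrives")
    return finish


def timestamp_replay_time(episodes):
    """Baseline model: episodes is a list of (timestamp, duration); waves run
    back to back and each wave lasts as long as its slowest episode."""
    slowest = {}
    for ts, duration in episodes:
        slowest[ts] = max(slowest.get(ts, 0), duration)
    return sum(slowest.values())


logs = [
    [(10, [], [1])],                          # core 0: one long episode, wakes core 1
    [(1, [0], [])],                           # core 1: waits for core 0
    [(1, [], []), (1, [], []), (10, [], [])], # core 2: independent work
]
finish = karma_replay(logs)
print(max(max(f) for f in finish))                                         # 12
print(timestamp_replay_time([(1, 10), (2, 1), (1, 1), (2, 1), (3, 10)]))   # 21
```

In this toy log, core 2 never interacts with the other cores, so DAG-ordered replay lets it run ahead, while the timestamp baseline makes its third episode wait for unrelated episodes with smaller timestamps.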


Karma’s Insight 1:

[Figure: episodes of T0, T1, and T2 with one log entry called out: 84 0|0|1 0|0|1]

Anatomy of a log entry
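
The slide does not label the three fields of the example entry `84 0|0|1 0|0|1`. The sketch below assumes they are a reference count followed by one predecessor bit and one successor bit per core (T0|T1|T2), which matches the idea of arcs pointing to cores. The field order, the 16-bit width for the count, and all function names are assumptions for illustration only.

```python
def encode_entry(refs, pred_cores, succ_cores):
    """Pack the predecessor and successor core sets into bit vectors
    (bit c set means core c is in the set)."""
    pred = sum(1 << c for c in set(pred_cores))
    succ = sum(1 << c for c in set(succ_cores))
    return refs, pred, succ


def format_entry(entry, num_cores):
    """Render an entry in the slide's style: count, then one bit per core."""
    refs, pred, succ = entry

    def bits(vector):
        return "|".join(str((vector >> c) & 1) for c in range(num_cores))

    return f"{refs} {bits(pred)} {bits(succ)}"


def entry_bits(num_cores, refs_bits=16):
    """Size of one entry under the assumed layout: a fixed-width count
    plus two bit vectors with one bit per core."""
    return refs_bits + 2 * num_cores


entry = encode_entry(84, pred_cores=[2], succ_cores=[2])
print(format_entry(entry, 3))    # 84 0|0|1 0|0|1
print(entry_bits(16))            # 48
```

Under this layout the arc information costs two bits per core per entry, independent of how many episodes have been recorded.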


Karma’s Insight 2:

• Not necessary to end the episode on every conflict
  – As long as the episodes can be ordered during replay

[Figure: the load/store sequences of T0, T1, and T2 from the Rerun recording example, regrouped into episodes.]



Karma Hardware

[Figure: base system with Core 0 to Core 15, L1 caches, coherence controller, L2 banks 0 to 15, interconnect, and DRAM.]

• Karma state:
  – Address Filter (FLT)
  – Reference (REFS)
  – Predecessor (PRED)
  – Successor (SUCC)
  – Timestamp (TS)
• Total State: 148 bytes/core



Evaluation:

• Were we able to speed up the replay?

[Figure: four bar charts (Apache, Jbb, Oltp, Zeus), each comparing Base, Rerun Replay, and Karma Replay. X-axis: number of cores-L2 cache size (4core-4MB, 8core-8MB, 16core-16MB). Y-axis: speedup normalized to "Base" of the corresponding configuration (0 to 1.2).]

On average, ~4X improvement in replay speed over Rerun


Evaluation

• Did we blow up the log size?

[Figure: Karma log size normalized to Rerun's log size (y-axis, 0 to 1.4) versus maximum allowable episode size (x-axis: 128, 256, 512, 1024, 2048, 4096, 8192, Unbounded) for Apache, Zeus, Oltp, and Jbb.]

On average, Karma does not increase the log size; instead, it reduces it by as much as 40% as we allow larger episodes.


Conclusion

• Applications of deterministic replay
  – Debugging
  – Fault tolerance
  – Security
• Existing hardware record-replayer
  – Slow replay or
  – Requires major hardware changes
• Karma: Faster Replay with nearly-conventional h/w
  – Extends Rerun
  – Uses DAG instead of scalar clock
  – Extends episodes past conflicts
• Widen Application + Lower Cost → More Attractive


Questions?