Eidetic Systems David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, Peter Chen University of Michigan

Eidetic Systems

Jan 02, 2016

Transcript
Page 1: Eidetic Systems

Eidetic Systems

David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, Peter Chen

University of Michigan

Page 2: Eidetic Systems

What is an Eidetic System?

Eidetic – Having “Perfect memory” or “Total Recall”

Eidetic System – A system which can recall and trace through the lineage of any past computation

David Devecsery

Page 3: Eidetic Systems

Motivation - Heartbleed

• Was Heartbleed exploited?
• What data was leaked?

Page 4: Eidetic Systems

Motivation - Heartbleed

[Figure labels: Heartbleed Message; Leaked Data]

• Was Heartbleed exploited? - Yes
• What data was leaked?

Page 5: Eidetic Systems

Motivation - Heartbleed

[Figure labels: Heartbleed Message; Leaked Data; Leaked Database Rows]

• Was Heartbleed exploited? - Yes
• What data was leaked?

Page 6: Eidetic Systems

Motivation – Wrong Reference

[Figure label: Bad Citation]

• How did I get the wrong citation?

Page 7: Eidetic Systems

Motivation – Wrong Reference

• How did I get the wrong citation?

Page 8: Eidetic Systems

Motivation – Wrong Reference

• How did I get the wrong citation?

Page 9: Eidetic Systems

Motivation – Wrong Reference

• How did I get the wrong citation?
• What else did this affect?

Page 10: Eidetic Systems

Motivation

• How did I get the wrong citation?
• What else did this affect?

Page 11: Eidetic Systems

Arnold

• First practical eidetic computer system
• Efficiently records & recalls all user-space computation
  • Process register/memory state
  • Inter-process communication
• Handles lineage queries
  • What data was affected?
  • What states and outputs were affected?
• Targeted towards desktop/workstation use
• Reasonable overheads
  • Record 4 years of data on $150 commodity HD
  • Under 8% performance overhead on most benchmarks

Page 12: Eidetic Systems

Overview

• Introduction
• Motivation
• How Arnold remembers all state
• How Arnold supports lineage queries
• Conclusion

Page 13: Eidetic Systems

Remembering State

• Requirements:
  • Store years of state on a single disk
    • Memory/register space within a process
    • Inter-process communication
    • File state
  • Recall any state in reasonable time
• Solution:
  • Deterministic record & replay
    • “Process group” based replay
    • “Process graph” to track inter-process lineage
  • Log compression
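The idea behind deterministic record & replay can be illustrated with a toy sketch. This is not Arnold's implementation (Arnold records at the system level); the `Recorder`, `Replayer`, and `computation` names are hypothetical and only show the principle: log the nondeterministic inputs once, then re-run the otherwise deterministic computation against the log to recover any past state.

```python
import random
import time


class Recorder:
    """Runs a computation live and logs every nondeterministic input."""

    def __init__(self):
        self.log = []

    def nondet(self, produce):
        value = produce()        # ask the real world
        self.log.append(value)   # remember the answer for later replay
        return value


class Replayer:
    """Re-runs the same computation, feeding logged values back in order."""

    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, produce):
        return next(self._values)  # ignore the real world, use the log


def computation(env):
    # Hypothetical program: its only nondeterminism is a clock read
    # and a random number; everything else is deterministic.
    started = env.nondet(time.time)
    roll = env.nondet(lambda: random.randint(1, 6))
    return f"rolled {roll} at {started:.3f}"


recorder = Recorder()
original = computation(recorder)
replayed = computation(Replayer(recorder.log))
assert replayed == original  # replay reproduces the recorded run
```

Only the two logged values need to be stored; every intermediate state of `computation` can be regenerated on demand by replaying.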

Page 14: Eidetic Systems

Recording Granularity

• What granularity is best to record our system?

[Figure labels: processes 1 and 2; Pipe; Read; External Inputs]

Page 15: Eidetic Systems

Recording Granularity

• Whole system recording
  ✓ Low space overhead
  × Costly to replay

[Figure labels: processes 1 and 2; Pipe; Read; External Inputs]

Page 16: Eidetic Systems

Recording Granularity

• Process level recording
  ✓ Efficient to replay
  × Uses extra disk space
  × No inter-process tracking

[Figure labels: processes 1 and 2; Pipe; Read; External Inputs]

Page 17: Eidetic Systems

Recording Granularity

• Process group recording
  ✓ Efficient to replay
  ✓ Reasonable disk space
  × No inter-process tracking

[Figure labels: processes 1 and 2; Pipe; Read; External Inputs]

Page 18: Eidetic Systems

Implementation – Process Graph

[Figure labels: Record Log; process groups 1 and 2; Pipe; Read; IPC Read]

Page 19: Eidetic Systems

Implementation – Process Graph

[Figure labels: Record Log; process groups 1 and 2; Pipe; Read; IPC Read]

Page 20: Eidetic Systems

Recording

• Process group recording + process graph
  ✓ Efficient to replay
  ✓ Reasonable disk space
  ✓ Inter-process tracking

[Figure labels: processes 1 and 2; Pipe; Read; External Inputs]
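A process graph of this kind can be pictured as a directed graph whose edges are IPC reads between process groups. The sketch below is a simplified assumption, not Arnold's data structure: the `ProcessGraph` class, its method names, and the group names are hypothetical, and it ignores timing (a real query would only follow reads that happened before the state of interest).

```python
from collections import deque


class ProcessGraph:
    """Toy inter-process lineage graph keyed by process group name."""

    def __init__(self):
        self._writers = {}  # reader group -> set of writer groups

    def record_ipc_read(self, reader, writer):
        # Called whenever `reader` consumes data produced by `writer`.
        self._writers.setdefault(reader, set()).add(writer)

    def upstream(self, group):
        """All groups whose output may have reached `group`."""
        seen, queue = set(), deque([group])
        while queue:
            current = queue.popleft()
            for writer in self._writers.get(current, ()):
                if writer not in seen:
                    seen.add(writer)
                    queue.append(writer)
        return seen


graph = ProcessGraph()
graph.record_ipc_read(reader="editor", writer="browser")
graph.record_ipc_read(reader="latex", writer="editor")
ancestors = graph.upstream("latex")  # groups to replay for a reverse query
```

A reverse lineage query would then replay only the groups in `ancestors`, rather than the whole system.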

Page 21: Eidetic Systems

Space Optimizations

[Figure: bar chart; y-axis "Log Compression vs Baseline" (0 to 1.2); bars: Baseline, +Model-Based Compression, +Deduplicated File Cache, +X Server Compression, +Semi-Deterministic Time, +Gzip]

Page 22: Eidetic Systems

Space Optimizations

[Figure: bar chart; y-axis "Log Compression vs Baseline" (0 to 1.2); bars: Baseline, +Model-Based Compression, +Deduplicated File Cache, +X Server Compression, +Semi-Deterministic Time, +Gzip; annotation: 411:1 Ratio]

Page 23: Eidetic Systems

Space Optimizations

[Figure: bar chart; y-axis "Log Compression vs Baseline" (0 to 1.2); bars: Baseline, +Model-Based Compression, +Deduplicated File Cache, +X Server Compression, +Semi-Deterministic Time, +Gzip, Only Gzip; annotations: 411:1 Ratio, 6:1 Ratio]

Page 24: Eidetic Systems

Space Optimizations

[Figure: bar chart; y-axis "Log Compression vs Baseline" (0 to 1.2); bars: Baseline, +Model-Based Compression, +Deduplicated File Cache, +X Server Compression, +Semi-Deterministic Time, +Gzip, Only Gzip; annotations: 411:1 Ratio, 6:1 Ratio]

4 years of data on a $150 4TB commodity HD
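As a back-of-the-envelope check, the 4-year figure implies an average log growth budget. The calculation below assumes decimal terabytes and ignores leap days and filesystem overhead; it is derived from the slide's numbers, not a measurement.

```python
DISK_BYTES = 4 * 10**12   # 4 TB, decimal terabytes as drives are sold
DAYS = 4 * 365            # 4 years, ignoring leap days

bytes_per_day = DISK_BYTES / DAYS
bytes_per_second = bytes_per_day / 86_400

print(f"{bytes_per_day / 10**9:.2f} GB/day")      # 2.74 GB/day
print(f"{bytes_per_second / 10**3:.1f} kB/s")     # 31.7 kB/s
```

In other words, the compressed logs would need to grow by under roughly 2.74 GB per day on average to fit.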

Page 25: Eidetic Systems

Model-Based Compression

• Formulate a model of a typical execution
• Only record deviations from that model

ret_val = sys_read (fd, buffer, count);
[Annotation: ret_val and count are usually equal]

• Idea: Partial determinism
  • Encourage the program to conform to the model
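A minimal sketch of the idea, using the read example: the model predicts that a read returns exactly the number of bytes requested, so only the calls that break that prediction need a log entry. The function names and the dictionary log format are hypothetical, not Arnold's on-disk format.

```python
def compress_read_results(calls):
    """calls: list of (count, ret_val) pairs. Keep only model deviations."""
    return {i: ret for i, (count, ret) in enumerate(calls) if ret != count}


def decompress_read_results(counts, deviations):
    """Rebuild every return value from the model plus logged deviations."""
    return [deviations.get(i, count) for i, count in enumerate(counts)]


# Four reads; only the third returns fewer bytes than requested.
calls = [(4096, 4096), (4096, 4096), (4096, 1200), (512, 512)]
deviations = compress_read_results(calls)

# During replay the program re-issues the same requests, so the
# `count` arguments are available without being logged.
counts = [count for count, _ in calls]
restored = decompress_read_results(counts, deviations)
```

Here four return values shrink to a single logged entry, and the more closely a program follows the model, the less is recorded.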

Page 26: Eidetic Systems

Semi-Deterministic Time

• Frequent time queries are non-deterministic
• Use partially deterministic clock
  • Real time clock & deterministic clock
  • Bound deviation

if (|deterministic_clock – real_time_clock| > threshold) {
    adjust deterministic_clock
    record deviation
}
return deterministic_clock
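The clock logic can be sketched as runnable code under a few assumptions. The slide does not say how the deterministic clock advances between queries; here it advances by a fixed `tick` per query, which is a hypothetical choice, as are the class name and the integer-microsecond units. A deviation is logged only when the deterministic value drifts from real time by more than the threshold.

```python
class SemiDeterministicClock:
    """Toy clock: deterministic unless it drifts too far from real time."""

    def __init__(self, real_clock, start, threshold, tick):
        self._real_clock = real_clock  # callable returning real time (us)
        self._value = start            # deterministic clock value (us)
        self._threshold = threshold    # maximum tolerated drift (us)
        self._tick = tick              # assumed advance per query (us)
        self._queries = 0
        self.deviations = []           # (query index, corrected value)

    def now(self):
        self._value += self._tick
        real = self._real_clock()
        if abs(self._value - real) > self._threshold:
            self._value = real         # adjust deterministic clock
            self.deviations.append((self._queries, real))  # record deviation
        self._queries += 1
        return self._value


# Simulated real-time readings: a long pause occurs before the third query.
real_times = iter([1_000, 2_000, 500_000, 501_000])
clock = SemiDeterministicClock(
    real_clock=lambda: next(real_times), start=0, threshold=10_000, tick=1_000
)
readings = [clock.now() for _ in range(4)]
```

Of the four queries, only the one after the pause produces a log entry; the others are reproducible from the deterministic rule alone.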

Page 27: Eidetic Systems

Performance Evaluation

[Figure: bar chart of Normalized Runtime (y-axis 0.9 to 1.2), Baseline vs. Arnold, for benchmarks: kernel copy, cvs checkout, make, latex, apache, gedit, facebook, spreadsheet]

Page 28: Eidetic Systems

Overview

• Introduction
• Motivation
• How Arnold remembers all state
• How Arnold supports lineage queries
• Conclusion

Page 29: Eidetic Systems

Querying Lineage

• Two types of queries:
  • Reverse: Where did this data come from?
  • Forward: What did this data affect?
• How does Arnold support these queries?
  • User specifies initial state
  • Trace the lineage of the computation
    • Intra-process tracking
    • Inter-process tracking

Page 30: Eidetic Systems

Intra-Process Lineage

• Use taint tracking for intra-process causality
  • Run retroactively, on recorded execution
  • Parallelizable
• Arnold supports several notions of causality:

[Figure: spectrum of causality notions, left to right: Copy Only, Data Flow, Data+Index Flow, Control Flow]
Recall: from "May miss relations" (left) to "Misses few relations" (right)
Precision: from "Strong input/output relation" (left) to "Weak input/output relation" (right)
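The difference between the first three notions of causality can be shown on a toy instruction trace. This is an illustrative sketch, not Arnold's taint tracker: the three operation kinds (`copy`, `compute`, `lookup`), the policy names, and the variable names are all hypothetical, and control flow is left out.

```python
def propagate(trace, inputs, policy):
    """Return {location: set of inputs it depends on} under one policy."""
    taint = {name: {name} for name in inputs}
    for op, dst, *srcs in trace:
        if op == "copy":
            used = srcs                       # every policy follows copies
        elif op == "compute":
            used = srcs if policy != "copy" else []
        elif op == "lookup":                  # dst = value selected by index
            value, index = srcs
            used = [value, index] if policy == "data+index" else [value]
        else:
            raise ValueError(f"unknown op {op!r}")
        # An assignment replaces whatever taint dst carried before.
        taint[dst] = set().union(*(taint.get(s, set()) for s in used))
    return taint


trace = [
    ("copy", "buf", "file_a"),            # buf = file_a
    ("compute", "sum", "buf", "file_b"),  # sum = f(buf, file_b)
    ("lookup", "out", "table", "sum"),    # out = table[sum]
]
inputs = ["file_a", "file_b", "table"]
results = {
    policy: propagate(trace, inputs, policy)
    for policy in ("copy", "data", "data+index")
}
```

Under `copy` and `data`, `out` depends only on `table`; under `data+index` it also depends on `file_a` and `file_b`, because they determined which table entry was read. Weaker notions find more relations at the cost of precision.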

Page 31: Eidetic Systems

Intra-Process Lineage

Which linkage tool should Arnold use?

[Figure labels: Inputs; Program]

Page 32: Eidetic Systems

Intra-Process Lineage

[Figure: causality spectrum (Copy, Data Flow, Data+Index Flow) plotted against Recall and Precision]

Page 33: Eidetic Systems

Intra-Process Lineage

[Figure: causality spectrum (Copy, Data Flow, Data+Index Flow) plotted against Recall and Precision]

Page 34: Eidetic Systems

Intra-Process Lineage

[Figure: causality spectrum (Copy, Data Flow, Data+Index Flow) plotted against Recall and Precision]

Page 35: Eidetic Systems

Intra-Process Lineage

[Figure: causality spectrum (Copy, Data Flow, Data+Index Flow) plotted against Recall and Precision]

Page 36: Eidetic Systems

Intra-Process Lineage

[Figure: causality spectrum (Copy, Data Flow, Data+Index Flow) plotted against Recall and Precision]

Page 37: Eidetic Systems

Intra-Process Lineage

[Figure: causality spectrum plotted against Recall and Precision; label shown: Data Flow]

Arnold selects the most precise tool with at least one result
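The selection rule stated on this slide can be sketched as a simple fallback loop. The function name, the tool list format, and the stub tools below are hypothetical stand-ins; only the rule itself (try tools from most to least precise, stop at the first non-empty result) comes from the slide.

```python
def most_precise_result(tools, query):
    """tools: (name, run) pairs ordered from most to least precise."""
    for name, run in tools:
        results = run(query)
        if results:              # first tool that finds anything wins
            return name, results
    return None, set()


# Stub tools standing in for real taint analyses over a recorded run.
tools = [
    ("copy", lambda q: set()),                          # finds nothing
    ("data flow", lambda q: {"refs.bib"}),              # finds one source
    ("data+index flow", lambda q: {"refs.bib", "libc.so"}),
]
chosen, sources = most_precise_result(tools, "wrong citation")
```

In this example the copy tool returns nothing, so the data flow result is used, and the noisier data+index result is never consulted.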

Page 38: Eidetic Systems

Inter-Process Lineage

• Two notions of inter-process linkage
  • Process graph
    • Tracks lineage through inter-process communication
    • Precise
    • Captures group to group communication
  • Human linkage
    • Handles relations between user inputs and outputs
    • Infers linkages based on data content and time
    • Imprecise – may have false negatives and false positives
    • Can capture linkages the process graph can miss

Page 39: Eidetic Systems

Evaluation – Wrong Reference

[Figure labels: Data + Index, Data, Copy, Copy, Data, Human Linkage]

• Few false positives (font files, latex sty files, libc.so, libXt.so)
• No false negatives

Record Time   Replay Time   Replay + Pin Time   Query Time
96.1s         2.2s          70.0s               209.5s

Page 40: Eidetic Systems

Evaluation – Heartbleed

[Figure labels: Data + Index, Data + Index, Data + Index]

• No false positives or negatives

Record Time   Replay Time   Replay + Pin Time   Query Time
230.3s        0.4s          139.5s              235.1s

Page 41: Eidetic Systems

Conclusion

• Eidetic Systems are powerful tools
  • Complete vision into past computation
  • Answer powerful queries about state’s lineage
• Arnold – First practical Eidetic System
  • Low runtime overhead
  • 4 years of computation on a commodity HD
  • Supports powerful lineage queries
• Code is released
  https://github.com/endplay/omniplay