Automatic Detection and Repair of Errors in Data Structures Brian Demsky Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology
Dec 20, 2015

Automatic Detection and Repair of Errors in Data Structures

Brian Demsky, Martin Rinard

Laboratory for Computer Science, Massachusetts Institute of Technology

Motivation

[Figure: Broken Data Structure, with objects holding fields F = 20, G = 5; F = 20, G = 10; I = 5; J = 2]

Errors
• Missing elements
• Inappropriate sharing
• Dangling references
• Out of bounds array indices
• Inconsistent values

Goal

[Figure: Broken Data Structure → Repair Algorithm → Consistent Data Structure]

Goal

[Figure: Broken Data Structure → Repair Algorithm → Consistent Data Structure, with Consistency Properties From Developer as input to the Repair Algorithm]

What Does Repair Algorithm Produce?

• Data structure that
  • Satisfies consistency properties, and
  • Heuristically close to broken data structure
• Not necessarily the same data structure as (hypothetical) correct program would produce
• But enough to keep program going

Precursors

• Data structure repair has historically appeared in systems with extreme reliability goals
  • 5ESS switch – hand coded audit routines
  • IBM MVS operating system – hand coded failure recovery routines
• Key component of these systems

Where Is This Likely To Be Useful?

• Not for transient errors in systems with slack – you can just reboot
  • Must be willing to lose volatile state
  • Must be willing to wait for system to come back up
• Permanent data structures
  • File systems
  • Application files (Word, PowerPoint, …)
• Autonomous systems
• Critical systems

Architecture

[Figure: Broken Bits → Broken Abstract Model → Repaired Abstract Model → Repaired Bits, labelled with Model Definition & Translation, Internal Consistency Properties, and External Consistency Properties]

Architecture Rationale

Why go through abstract model?

• Simple, uniform structure
  • Sets of objects
  • Relations between objects
• Simplifies both
  • Expression of consistency properties
  • Repair algorithm
• Enables system to support full range of efficient, heavily encoded data structures

File System Example

[Figure: Directory Entries "abst" and "intro" with firstBlock values 0 and 2; Disk Blocks with nextBlock values 1, -5, 1, -1]

struct Entry {
  byte name[Length];
  int firstBlock;
}

struct Block {
  int nextBlock;
  byte data[BlockSize];
}

struct Disk {
  Entry dir[NumEntries];
  Block block[NumBlocks];
}

Disk D;

Model Definition

• Sets of objects
    set blocks of integer : partition used | free;
• Relations between objects – values of object fields, referencing relationships between objects
    relation next : used, used;

[Figure: set blocks partitioned into used and free; relation next from used to used]

Model Translation

Bits translated to sets and relations in abstract model using statements of the form:

Quantifiers, Condition ⇒ Inclusion Constraint

for i in 0..NumEntries, 0 ≤ D.dir[i].firstBlock and D.dir[i].firstBlock < NumBlocks ⇒ D.dir[i].firstBlock in used

for b in used, 0 ≤ D.block[b].nextBlock and D.block[b].nextBlock < NumBlocks ⇒ <b, D.block[b].nextBlock> in next

for <b,n> in next, true ⇒ n in used

for b in 0..NumBlocks, not (b in used) ⇒ b in free
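The four translation rules can be read as a small fixpoint computation. The Python sketch below is only an illustration of that reading, not the authors' implementation: the function name `build_model`, the list-based encoding of the disk, and the concrete block values (one assumed reading of the example file system) are all hypothetical.

```python
def build_model(first_block, next_block):
    """Translate concrete 'bits' into the sets used/free and the relation next.

    first_block: firstBlock value of each directory entry (assumed encoding)
    next_block:  nextBlock value of each disk block (assumed encoding)
    """
    num_blocks = len(next_block)
    used, nxt = set(), set()

    # Rule 1: every in-range firstBlock of a directory entry is a used block.
    for fb in first_block:
        if 0 <= fb < num_blocks:
            used.add(fb)

    # Rules 2 and 3 feed each other, so iterate until nothing changes.
    changed = True
    while changed:
        changed = False
        for b in list(used):
            n = next_block[b]
            if 0 <= n < num_blocks and (b, n) not in nxt:
                nxt.add((b, n))      # Rule 2: <b, nextBlock> in next
                changed = True
        for _, n in list(nxt):
            if n not in used:
                used.add(n)          # Rule 3: targets of next are used
                changed = True

    # Rule 4: every block that is not used is free.
    free = set(range(num_blocks)) - used
    return used, free, nxt


# Assumed example: two directory entries, four disk blocks.
used, free, nxt = build_model(first_block=[0, 2], next_block=[1, -5, 1, -1])
print(sorted(used), sorted(free), sorted(nxt))
```

Out-of-range values such as -5 simply produce no pair in next, so corrupted fields never enter the model.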

Model in Example

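[Figure: abstract model of the example file system: set blocks containing 0, 1, 2, 3, partitioned into used and free, with two next pairs; shown next to the Directory Entries (abst, intro) and Disk Blocks of the example]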

Internal Consistency Properties

Quantifiers, Body

• Body is first-order property of basic propositions
  • Inequality constraints on values of numeric fields
    • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Presence of required number of objects
    • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Topology of region surrounding each object
    • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
    • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Inclusion constraints: V in S, V1 in V2.R, <V1,V2> in R
• Example: for b in used, size(next.b) ≤ 1

Internal Consistency Violations

Evaluate consistency properties, find violations

for b in used, size(next.b) ≤ 1 is false for b = 1

[Figure: abstract model of the example; block 1 is the target of two next pairs]
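As a rough illustration of how such a property could be evaluated, the Python sketch below checks "for b in used, size(next.b) ≤ 1" over a model held as plain sets. The function name and the set-of-pairs encoding are assumptions made for this example; next.b is read as the set of blocks whose next pair points at b.

```python
def violations_at_most_one_predecessor(used, nxt):
    """Return the blocks b in used for which size(next.b) <= 1 is false."""
    bad = []
    for b in sorted(used):
        preds = {a for (a, t) in nxt if t == b}   # next.b: inverse image of b
        if len(preds) > 1:
            bad.append(b)                          # binding b for the violation
    return bad


# Model assumed for the example: blocks 0 and 2 both point at block 1.
print(violations_at_most_one_predecessor({0, 1, 2}, {(0, 1), (2, 1)}))
```

Each returned block is a binding for the quantified variable, which is what the repair step needs as input.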

Repairing Violations of Internal Consistency Properties

• Violation provides binding for quantified variables

• Convert Body to disjunctive normal form
    (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
    p1 … pn, q1 … qm are basic propositions
• Choose a conjunction to satisfy
• Repair violated basic propositions in conjunction
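The steps above can be sketched in a few lines of Python. Everything here is hypothetical: the `Prop` class, the dictionary model, and in particular the rule for choosing a conjunction (fewest violated propositions), which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Prop:
    """A basic proposition with a check and a repair action (both hypothetical)."""
    name: str
    holds: Callable[[dict], bool]
    repair: Callable[[dict], None]


def repair_body(dnf: List[List[Prop]], model: dict) -> List[str]:
    """Satisfy a body given in DNF: a list of conjunctions of basic propositions."""
    # Assumed heuristic: pick the conjunction with the fewest violated propositions.
    chosen = min(dnf, key=lambda conj: sum(not p.holds(model) for p in conj))
    repaired = []
    for p in chosen:
        if not p.holds(model):
            p.repair(model)          # repair only what is actually violated
            repaired.append(p.name)
    return repaired


# Toy body: (x = 0 and y = 0) or (x = 5)
x_is_0 = Prop("x=0", lambda m: m["x"] == 0, lambda m: m.update(x=0))
y_is_0 = Prop("y=0", lambda m: m["y"] == 0, lambda m: m.update(y=0))
x_is_5 = Prop("x=5", lambda m: m["x"] == 5, lambda m: m.update(x=5))

model = {"x": 3, "y": 7}
print(repair_body([[x_is_0, y_is_0], [x_is_5]], model), model)
```

Choosing the conjunction that needs the fewest repairs is one way to stay heuristically close to the broken structure; other selection policies would fit the same skeleton.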

Repairing Violations of Basic Propositions

• Inequality constraints on values of numeric fields
  • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Compute value of expression, assign field
• Presence of required number of objects
  • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Remove or insert objects from/to set
• Topology of region surrounding each object
  • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
  • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Remove or insert pairs from/to relation
• Inclusion constraints: V in S, V1 in V2.R, <V1,V2> in R
  • Add object or pair to set or relation

Repair in Example

for b in used, size(next.b) ≤ 1 is false for b = 1

Must repair size(next.1) ≤ 1

Can remove either <0,1> or <2,1> from next

[Figure: abstract model before repair; both <0,1> and <2,1> are in next]

Repair in Example

for b in used, size(next.b) ≤ 1 is false for b = 1

Must repair size(next.1) ≤ 1

Can remove either <0,1> or <2,1> from next

[Figure: abstract model after repair; one of the two next pairs has been removed]
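A minimal Python sketch of this repair follows. The policy for deciding which pair survives (an optional `keep` argument, otherwise the lowest-numbered predecessor) is an assumption for illustration; the slides only say that either pair may be removed.

```python
def repair_at_most_one_predecessor(nxt, b, keep=None):
    """Remove pairs <a, b> from next until size(next.b) <= 1."""
    preds = sorted(a for (a, t) in nxt if t == b)
    if len(preds) <= 1:
        return set(nxt)              # nothing to repair
    # Assumed policy: keep the requested predecessor, else the lowest-numbered one.
    survivor = keep if keep in preds else preds[0]
    return {(a, t) for (a, t) in nxt if t != b or a == survivor}


print(repair_at_most_one_predecessor({(0, 1), (2, 1)}, b=1))          # drops <2,1>
print(repair_at_most_one_predecessor({(0, 1), (2, 1)}, b=1, keep=2))  # drops <0,1>
```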

Acyclic Repair Dependences

• Questions
  • Isn’t it possible for the repair of one constraint to invalidate another constraint?
  • What about infinite repair loops?
  • What about unsatisfiable specifications?
• Answer
  • We require specifications to have no cyclic repair dependences between constraints
  • So all repair sequences terminate
  • Repair can fail only because of resource limitations
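One way to picture the acyclicity requirement is as a check on a dependence graph. The Python sketch below assumes a hypothetical encoding in which each constraint maps to the constraints its repair may invalidate, and uses a standard depth-first search to look for a cycle; how the actual system derives these dependences is not described in the slides.

```python
def has_cycle(deps):
    """deps maps a constraint name to the constraints its repair may invalidate."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current path, finished
    color = {c: WHITE for c in deps}

    def visit(c):
        color[c] = GREY
        for d in deps.get(c, ()):
            if color.get(d, WHITE) == GREY:
                return True          # back edge: a repair loop is possible
            if color.get(d, WHITE) == WHITE and visit(d):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in list(deps))


# Hypothetical specifications: the first is acceptable, the second is rejected.
print(has_cycle({"A": ["B"], "B": ["C"], "C": []}))
print(has_cycle({"A": ["B"], "B": ["A"]}))
```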

External Consistency Constraints

Quantifiers, Condition ⇒ Body

• Body of form V = E, V.F = E, V.F[I] = E
• Example
    for b in free, true ⇒ D.block[b].nextBlock = -2
    for <i,j> in next, true ⇒ D.block[i].nextBlock = j
    for b in used, size(b.next) = 0 ⇒ D.block[b].nextBlock = -1
• Repair simply performs assignments
• Translates model repairs to bit repairs
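The three example constraints amount to a handful of assignments. The Python sketch below applies them to a list of nextBlock values; the function name, the list encoding, and the sample values are assumptions for illustration only.

```python
def write_back(used, free, nxt, next_block):
    """Apply the three example constraints as plain assignments to the bits."""
    out = list(next_block)
    for b in free:
        out[b] = -2                      # free blocks are marked -2
    for (i, j) in nxt:
        out[i] = j                       # <i, j> in next: nextBlock points at j
    for b in used:
        if not any(a == b for (a, _) in nxt):
            out[b] = -1                  # size(b.next) = 0: end of chain
    return out


# Assumed repaired model: only <0,1> remains in next, block 3 is free.
print(write_back({0, 1, 2}, {3}, {(0, 1)}, [1, -5, 1, -1]))
```

Because the constraints only assign fields, the translation back to bits needs no search.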

Repair in Example

[Figure: Inconsistent File System (Directory Entries abst, intro with firstBlock 0, 2; Disk Blocks with nextBlock 1, -5, 1, -1) and Repaired File System (same Directory Entries; Disk Blocks with nextBlock 1, -1, -1, -2)]

What About Corrupted Pointers?

• Sets may contain pointers to structs
• System only allows valid structs in model
  • struct must be completely in valid memory
  • one struct may be nested inside another struct (but must agree on memory format)

[Figure: Valid Memory and Invalid Memory regions; Valid Structs lie entirely inside valid memory, the Invalid Struct does not]
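The "completely in valid memory" rule can be sketched as a simple range check. In the Python sketch below, valid memory is assumed to be given as a list of half-open address ranges and a struct is required to fit inside a single range; the function names and addresses are hypothetical, and the nesting rule is not modelled.

```python
def struct_is_valid(addr, size, valid_regions):
    """True if bytes [addr, addr + size) lie inside a single valid region."""
    if size <= 0:
        return False
    return any(lo <= addr and addr + size <= hi for lo, hi in valid_regions)


def model_pointers(pointers, size, valid_regions):
    """Keep only the pointers that may enter the abstract model."""
    return [p for p in pointers if struct_is_valid(p, size, valid_regions)]


regions = [(0x1000, 0x2000)]             # hypothetical valid range
candidates = [0x1000, 0x1FF8, 0x1FFC, 0x3000]
print([hex(p) for p in model_pointers(candidates, size=8, valid_regions=regions)])
```

A pointer whose struct would straddle the end of the region (0x1FFC here) is rejected just like one that points outside it.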

When to Test for Consistency and Repair

• Persistent data structures
  • Repair can be independent activity, or
  • Repair when data written out or read in
• Volatile data structures in running program
  • Under programmer control
  • Transaction-based approach
    • Identify transaction start and end
    • Repair at start, end, or both
  • Failure-based approach
    • Wait until program fails
    • Repair and restart from latest safe point
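The transaction-based approach could be wrapped up as follows. This Python sketch is hypothetical: `repaired_transaction` and the `check_and_repair` callback are names invented for the example, and the slides do not say how transaction boundaries are marked in practice.

```python
from contextlib import contextmanager


@contextmanager
def repaired_transaction(state, check_and_repair, at_start=True, at_end=True):
    """Run check_and_repair on state at transaction start, end, or both."""
    if at_start:
        check_and_repair(state)
    try:
        yield state
    finally:
        if at_end:
            check_and_repair(state)      # also runs if the body raises


def clamp_counter(state):
    """Toy consistency property: the counter must not be negative."""
    state["n"] = max(state["n"], 0)


state = {"n": -1}
with repaired_transaction(state, clamp_counter) as s:
    s["n"] -= 5                          # a buggy update inside the transaction
print(state)
```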

Experience

• We acquired three benchmarks
  • Simplified Linux file system
  • Freeciv interactive game
  • Microsoft Word files
• We developed specifications for all three
  • Less than a week of development time
  • Most of time spent figuring out Freeciv
• Each benchmark has
  • Workload
  • Fault insertion methodology
• Ran benchmarks with and without repair

Simplified Linux File System

[Figure: disk layout showing superblock, group block, inode bitmap block, block bitmap block, inode block (inode, inode, …), directory block, and disk blocks]

Some Consistency Properties
• inode bitmap consistent with inode usage
• block bitmap consistent with block usage
• directory entries refer to valid inodes
• files contain valid blocks only
• files do not share blocks

Results

• Workload – write and verify several files
• Fault insertion – crash file system
  • Inode and block bitmap errors
  • Partially initialized directory and inode entries
• Without repair
  • Incorrect file contents because of inode and disk block sharing
• With repair
  • Bitmaps repaired preventing illegal sharing, correct file contents

Freeciv

[Figure: Terrain Grid of tiles labelled O = Ocean, P = Plain, M = Mountain (rows as extracted: PO MM, OO MP, PO MM, PP MP) and City Structures with loc: 3,0 and loc: 2,3]

Consistency Properties
• Tiles have valid terrain values
• Cities are not in the ocean
• Each city has exactly one reference from city location grid
• City locations are consistent in
  • City structures and
  • tile grid

Results

• Workload – Freeciv software plays against itself
• Fault insertion – randomly corrupt terrain values
• Without repair – program fails (seg fault)
• With repair
  • Game runs just fine
  • But game plays out differently because of the different terrain values

Microsoft Word Files

• Files consist of a sequence of streams
• Streams stored using FAT-based data structure

[Figure: Header, FAT block, and Disk Blocks; FAT entries as extracted: -1, -1, -2, 1]

Consistency Properties

• The FAT blocks exist
• FAT contains valid values only
  • -1 – terminates FAT streams
  • -2 – indicates free blocks
  • Valid disk block index – next block in stream
• FAT streams properly terminated
• Free blocks properly marked
• Streams contain valid blocks only
• Streams do not share blocks
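Two of these properties, valid FAT values and no shared blocks, are easy to check mechanically. The Python sketch below assumes the FAT is available as a flat list of integers, which simplifies the real file format; the function name and messages are invented for the example.

```python
def fat_violations(fat):
    """Report invalid FAT values and blocks that share a successor."""
    problems = []
    n = len(fat)
    owner = {}                           # successor block -> block pointing at it
    for b, v in enumerate(fat):
        if v in (-1, -2):
            continue                     # end of stream or free block
        if not 0 <= v < n:
            problems.append(f"block {b}: invalid FAT value {v}")
        elif v in owner:
            problems.append(f"blocks {owner[v]} and {b} share successor {v}")
        else:
            owner[v] = b
    return problems


print(fat_violations([1, -1, 1, -2]))    # two blocks point at block 1
print(fat_violations([1, -1, -1, -2]))   # consistent
```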

Results

• Workload – several Microsoft Word files
• Fault insertion – scramble FAT
• Without repair
  • If blocks containing the FAT were incorrectly marked as free, Word successfully loads file
  • Otherwise, “The document name or path is not valid”
• With repair
  • Word loads all files

Related Work

• Hand-coded repair
  • Lucent 5ESS switch
  • IBM MVS operating system
• Transactions
  • Identify actions that leave system consistent
  • If action fails, roll back to consistent state
• Checkpoint and recovery
  • Reboot system from scratch
  • Logging for roll-forward
• Self-stabilizing algorithms

Conclusion

• Data structure repair interesting way to (potentially) improve reliability

• Specification-based approach promises to make technique more widely applicable

• Moving towards more robust, probabilistic, continuous concept of system behavior