DynaMine: Finding Common Error Patterns by Mining Software Revision Histories Benjamin Livshits Stanford University Thomas Zimmermann Saarland University

Mar 27, 2015

Page 1:

DynaMine: Finding Common Error Patterns by Mining Software Revision Histories

Benjamin Livshits, Stanford University

Thomas Zimmermann, Saarland University

Page 2:

A Box Full of Nails

A lot of promise, potential, excitement

Not that many success stories
Not sure what to apply it to
Let’s try this particularly exciting idea

Miners looking at their tools
Promises, promises…

Interesting usage patterns found by CVS mining
Interesting error patterns found by CVS mining

Page 3:

My Background

Tools for bug detection
  Analysis: pointer analysis, etc.
  Mostly static, some dynamic

Applications: Security
  Buffer overruns
  Format string violations
  SQL injections
  Cross-site scripting
  HTTP response splitting
  Data lifetimes

J2EE patterns
  Bad session stores
  Lapsed listeners

Eclipse patterns
  Missing calls to dispose
  Not calling super
  Forgetting to deregister listeners

Page 4:

Classification of Error Patterns

Error Pattern Iceberg

Generic patterns (the usual suspects):
  NULL dereferences
  Buffer overruns
  Double-deletes
  Locking errors/threads

App-specific patterns, particular to a system or a set of APIs:
  Bugs in Linux code
  Bugs in J2EE servlets
  Device drivers

Page 5:

Classification of Error Patterns

Generic patterns (the usual suspects):
  NULL dereferences
  Buffer overruns
  Double-deletes
  Locking errors/threads

App-specific patterns, particular to a system or a set of APIs

Intuition:
  Many other application-specific patterns exist
  Much of the application-specific stuff remains a gray area so far

Goal: let’s figure out what the patterns are

Does anybody know any good error patterns specific to WinAmp plugins? There are hundreds of WinAmp plugins out there.

Page 6:

Motivation: Matching Method Pairs

Start small: matching method pairs
  Only two methods
  A very simple state machine
  Calls must match perfectly, order matters

Very common; our inspiration:
  System calls: fopen/fclose, lock/unlock, …
  GUI operations: addNotify/removeNotify, addListener/removeListener, createWidget/destroyWidget, …

Want to find more of the same
And, if we are lucky, more interesting patterns

Page 7:

DynaMine: Our Insight

Our problem:
  Want to find patterns whose violation causes errors
  Want to find patterns for program understanding

Our technique:
  Look at revision histories
  Use data mining techniques to find methods that are often added at the same time

Crucial observation: things that are frequently checked in together often form a pattern

Page 8:

DynaMine: Our Insight (continued)

Now we know the potential patterns

“Profile” the patterns:
  Run the application
  For each pattern, count
    hits: number of times the pattern is followed
    misses: number of times the pattern is violated

Based on these statistics, classify the patterns:
  Usage patterns: almost always hold
  Error patterns: violated a large number of times, but still hold most of the time
  Unlikely patterns: not validated enough times
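
The classification step could be sketched roughly as below. The slide gives no numeric cut-offs, so MIN_HITS, USAGE_MAX_MISS_RATIO and ERROR_MAX_MISS_RATIO are made-up values for illustration, and the class and method names are hypothetical rather than DynaMine's own.

```java
/** The three outcomes named on the slide. */
enum PatternKind { USAGE, ERROR, UNLIKELY }

final class PatternClassifier {
    // Hypothetical cut-offs chosen for illustration; the slides give no numbers.
    static final int MIN_HITS = 5;
    static final double USAGE_MAX_MISS_RATIO = 0.10;
    static final double ERROR_MAX_MISS_RATIO = 0.50;

    /** Classifies one pattern from its runtime hit and miss counts. */
    static PatternKind classify(int hits, int misses) {
        if (hits < MIN_HITS) {
            return PatternKind.UNLIKELY;   // not validated enough times
        }
        double missRatio = (double) misses / (hits + misses);
        if (missRatio <= USAGE_MAX_MISS_RATIO) {
            return PatternKind.USAGE;      // almost always holds
        }
        if (missRatio < ERROR_MAX_MISS_RATIO) {
            return PatternKind.ERROR;      // often violated, but holds most of the time
        }
        return PatternKind.UNLIKELY;       // violated at least as often as followed
    }

    public static void main(String[] args) {
        System.out.println(classify(100, 2));  // USAGE
        System.out.println(classify(60, 40));  // ERROR
        System.out.println(classify(3, 0));    // UNLIKELY
    }
}
```

With other cut-offs the same counts may land in a different class; only the three-way split itself comes from the slide.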

Page 9:

Architecture of DynaMine

[Diagram: processing pipeline]
  Revision history mining: mine CVS histories, extract patterns, sort and filter
  Dynamic analysis: instrument relevant method calls, run the application, post-process
  Classification: usage patterns, error patterns, unlikely patterns
  Reporting: report patterns, report bugs

Page 10:

Mining approach

Page 11:

Mining Basics

Rely on co-change
Simplification: look at method calls only
Look for interesting patterns in the way methods are called

Example: sequence of revisions to files Foo.java, Bar.java, Baz.java, Qux.java

  Foo.java 1.12: o1.addListener, o1.removeListener
  Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println
  Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
  Qux.java 1.41: o4.addListener, System.out.println
  Qux.java 1.42: o4.removeListener

Page 12:

Mining Matching Method Calls

Use our observation: methods that are frequently added simultaneously often represent a usage pattern

For instance: … addListener(…); … removeListener(…); …

Page 13:

Data Mining Summary

We consider method calls added in each check-in

We want to find patterns of method calls
  Too many potential patterns to consider
  Want to filter and rank them
  Use support and confidence for that

Support and confidence of each pattern
  Standard metrics used in data mining
  Support reflects how many times each pair appears
  Confidence reflects how strongly a particular pair is correlated
  Refer to the paper for details
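
As a rough illustration of the two metrics, the sketch below uses the textbook association-rule definitions: support counts the check-ins that add both calls, and confidence is that count divided by the number of check-ins adding the first call. The paper's exact definitions may differ. PairMetrics and its methods are hypothetical names, and the data is the five-revision example from the Mining Basics slide with receivers dropped.

```java
import java.util.*;

final class PairMetrics {
    /** Number of check-ins that add calls to both a and b. */
    static int support(List<Set<String>> checkins, String a, String b) {
        int n = 0;
        for (Set<String> added : checkins) {
            if (added.contains(a) && added.contains(b)) n++;
        }
        return n;
    }

    /** Fraction of the check-ins adding a that also add b (0 if a is never added). */
    static double confidence(List<Set<String>> checkins, String a, String b) {
        int withA = 0;
        for (Set<String> added : checkins) {
            if (added.contains(a)) withA++;
        }
        return withA == 0 ? 0.0 : (double) support(checkins, a, b) / withA;
    }

    /** Method names added in the five example revisions (receivers dropped). */
    static List<Set<String>> slideExample() {
        return Arrays.asList(
            set("addListener", "removeListener"),                                 // Foo.java 1.12
            set("addListener", "removeListener", "println"),                      // Bar.java 1.47
            set("addListener", "removeListener", "iterator", "hasNext", "next"),  // Baz.java 1.23
            set("addListener", "println"),                                        // Qux.java 1.41
            set("removeListener"));                                               // Qux.java 1.42
    }

    private static Set<String> set(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        List<Set<String>> checkins = slideExample();
        System.out.println(support(checkins, "addListener", "removeListener"));     // 3
        System.out.println(confidence(checkins, "addListener", "removeListener"));  // 0.75
    }
}
```

On this tiny example println followed by addListener also reaches confidence 1.0 with support 2, which hints at why support and confidence alone leave too many candidate patterns.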

Page 14:

Improvements Over the Traditional Approach

Default data mining approach doesn’t quite work
  Filters based on confidence and support
  Still too many potential patterns!

1. Filtering: consider only patterns with the same initial subsequence as potential patterns

2. Ranking: use one-line “fixes” to find likely error patterns

Page 15:

Matching Initial Call Sequences

Candidate pairs per revision, before and after filtering:

  Foo.java 1.12: o1.addListener, o1.removeListener (1 pair before, 1 pair after)
  Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println (3 pairs before, 1 pair after)
  Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next (10 pairs before, 2 pairs after)
  Qux.java 1.41: o4.addListener, System.out.println (1 pair before, 0 pairs after)
  Qux.java 1.42: o4.removeListener (0 pairs before, 0 pairs after)
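
The pair counts on this slide can be reproduced with a small sketch, under the assumption that "same initial subsequence" means "same receiver expression", i.e. everything before the last dot. That reading, and the names InitialSequenceFilter, receiver, allPairs and filteredPairs, are assumptions made here, not the tool's actual code.

```java
import java.util.*;

final class InitialSequenceFilter {
    /** Receiver expression of a call: everything before the last '.'. */
    static String receiver(String call) {
        int dot = call.lastIndexOf('.');
        return dot < 0 ? "" : call.substring(0, dot);
    }

    /** All unordered pairs of calls added in one revision: n choose 2. */
    static int allPairs(List<String> calls) {
        int n = calls.size();
        return n * (n - 1) / 2;
    }

    /** Pairs that survive the filter: both calls share the same receiver. */
    static int filteredPairs(List<String> calls) {
        int count = 0;
        for (int i = 0; i < calls.size(); i++) {
            for (int j = i + 1; j < calls.size(); j++) {
                if (receiver(calls.get(i)).equals(receiver(calls.get(j)))) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> revisions = Arrays.asList(
            Arrays.asList("o1.addListener", "o1.removeListener"),
            Arrays.asList("o2.addListener", "o2.removeListener", "System.out.println"),
            Arrays.asList("o3.addListener", "o3.removeListener",
                          "list.iterator", "iter.hasNext", "iter.next"),
            Arrays.asList("o4.addListener", "System.out.println"),
            Arrays.asList("o4.removeListener"));
        for (List<String> calls : revisions) {
            System.out.println(allPairs(calls) + " -> " + filteredPairs(calls));
        }
    }
}
```

For the Baz.java revision, the two surviving pairs are o3.addListener/o3.removeListener and iter.hasNext/iter.next.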

Page 16:

Using Fixes to Rank Patterns

Look for one-call additions, which likely indicate fixes
Rank patterns with such methods higher

  Foo.java 1.12: o1.addListener, o1.removeListener
  Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println
  Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
  Qux.java 1.41: o4.addListener, System.out.println
  Qux.java 1.42: o4.removeListener (this is a fix!)

Move patterns containing removeListener up
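
A minimal sketch of this ranking idea, assuming a check-in that adds exactly one call counts as a fix and that any candidate pair mentioning such a method moves ahead of the rest. FixRanking, fixMethods and rank are hypothetical names. The real ranking presumably also weighs support and confidence, which this sketch ignores.

```java
import java.util.*;

final class FixRanking {
    /** Methods added alone in some check-in; treated here as likely fixes. */
    static Set<String> fixMethods(List<List<String>> checkins) {
        Set<String> fixes = new HashSet<>();
        for (List<String> added : checkins) {
            if (added.size() == 1) fixes.add(added.get(0));
        }
        return fixes;
    }

    /** Moves pairs that mention a fix method to the front; other order is kept. */
    static List<String[]> rank(List<String[]> pairs, Set<String> fixes) {
        List<String[]> ranked = new ArrayList<>(pairs);
        // List.sort is stable, so pairs with equal keys keep their relative order.
        ranked.sort(Comparator.comparingInt(
            (String[] p) -> (fixes.contains(p[0]) || fixes.contains(p[1])) ? 0 : 1));
        return ranked;
    }

    public static void main(String[] args) {
        List<List<String>> checkins = Arrays.asList(
            Arrays.asList("addListener", "removeListener"),
            Arrays.asList("addListener", "println"),
            Arrays.asList("removeListener"));          // one-call addition
        List<String[]> pairs = Arrays.asList(
            new String[] {"hasNext", "next"},
            new String[] {"addListener", "removeListener"});
        for (String[] p : rank(pairs, fixMethods(checkins))) {
            System.out.println(p[0] + " / " + p[1]);
        }
    }
}
```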

Page 17:

Applications under Study

Apply these ideas to the revision histories of Eclipse and jEdit

Very large open-source projects

Many people working on both, all over the planet
  122 on Eclipse
  92 on jEdit

Many check-ins
  Eclipse: 2,837,854
  jEdit: 144,495

Long histories
  Eclipse since 2001
  jEdit since 2000

Page 18:

Some patterns

(as promised)

Page 19:

Categories of Patterns

Method calls during execution:
  Care about the methods
  Care about the order
  Care about the parameters/return values

Here are some common cases:
  Matching method pairs
  State machines
  More complex patterns

Page 20:

Some Interesting Method Pairs (1)

  kEventControlActivate / kEventControlDeactivate
  addDebugEventListener / removeDebugEventListener
  beginRule / endRule
  suspend / resume
  NewPtr / DisposePtr
  addListener / removeListener
  register / deregister
  addElementChangedListener / removeElementChangedListener
  addResourceChangeListener / removeResourceChangeListener
  addPropertyChangeListener / removePropertyChangeListener
  createPropertyList / reapPropertyList
  preReplaceChild / postReplaceChild
  addWidget / removeWidget
  stopMeasuring / commitMeasurements
  blockSignal / unblockSignal
  HLock / HUnlock
  OpenEvent / fireOpen

Page 21:

Some Interesting Method Pairs (2)

addWidget / removeWidget
Register/unregister the current widget with the parent display object for subsequent event forwarding

Page 22:

Some Interesting Method Pairs (3)

addPropertyChangeListener / removePropertyChangeListener
Add/remove listener for a particular kind of GUI events

Page 23:

Some Interesting Method Pairs (4)

HLock / HUnlock
Use OS native locking mechanism for resources such as icons, etc.

Page 24:

State Machines

Order captured by a state machine
  Must be followed precisely: omitting or repeating a method call is a sign of error
  Simplest formalism for describing the object life-cycle

Matching method pairs are a specific case

Very common in C
  Consider OS code
Less common in Java, but…

Page 25:

State Machines (1)

o.enterAlignment [o.redoAlignment] o.exitAlignment

Part of org.eclipse.jdt.internal.formatter.Scribe, responsible for pretty-printing of code
enterAlignment/exitAlignment pairs must match
redoAlignment is invoked in exception cases

Page 26:

State Machines (2)

o.beginCompoundEdit() (o.insert(...) | o.remove(...))+ o.endCompoundEdit()

Compound edits within jEdit: can be undone at once
beginCompoundEdit/endCompoundEdit act as brackets
Other operations in between
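
Since this pattern is a regular expression over method names, one way to check a recorded call sequence against it is with java.util.regex. This is a sketch only: it assumes the trace has already been reduced to the calls made on a single object and restricted to the four methods of the pattern. CompoundEditChecker and followsPattern are hypothetical names.

```java
import java.util.*;
import java.util.regex.Pattern;

final class CompoundEditChecker {
    // begin, then one or more insert/remove calls, then end; names joined by spaces.
    private static final Pattern LIFE_CYCLE =
        Pattern.compile("beginCompoundEdit( (insert|remove))+ endCompoundEdit");

    /** True if the whole call sequence on one object follows the pattern. */
    static boolean followsPattern(List<String> calls) {
        return LIFE_CYCLE.matcher(String.join(" ", calls)).matches();
    }

    public static void main(String[] args) {
        System.out.println(followsPattern(Arrays.asList(
            "beginCompoundEdit", "insert", "remove", "endCompoundEdit")));  // true
        System.out.println(followsPattern(Arrays.asList(
            "beginCompoundEdit", "insert")));                               // false
    }
}
```

A sequence with no insert or remove between the brackets is rejected as well, because the slide's pattern uses "+" rather than "*".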

Page 27:

State Machines (3)

OS.PmMemCreateMC [OS.PmMemStart OS.PmMemFlush OS.PmMemStop] OS.PmMemReleaseMC

Memory context manipulation (like memory pools)
Wrappers around underlying OS functionality
The middle part of the pattern is optional

Page 28:

More Complex Stuff (1)

    try {
        monitor.beginTask(null, Policy.totalWork);
        int depth = -1;
        try {
            workspace.prepareOperation(null, monitor);
            workspace.beginOperation(true);
            depth = workspace.getWorkManager().beginUnprotected();
            return runInWorkspace(Policy.subMonitorFor(monitor, Policy.opWork,
                    SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
        } catch (OperationCanceledException e) {
            workspace.getWorkManager().operationCanceled();
            return Status.CANCEL_STATUS;
        } finally {
            if (depth >= 0)
                workspace.getWorkManager().endUnprotected(depth);
            workspace.endOperation(null, false,
                    Policy.subMonitorFor(monitor, Policy.endOpWork));
        }
    } catch (CoreException e) {
        return e.getStatus();
    } finally {
        monitor.done();
    }


Page 31:

Grammar for Workspace Transactions

Requires human intelligence
Requires a lot of it
Is actually an excellent pattern: haven’t seen runtime violations

S → O
O → w.prepareOperation() w.beginOperation() U w.endOperation()
U → w.getWorkManager().beginUnprotected() S [w.getWorkManager().operationCanceled()] w.getWorkManager().endUnprotected()

Page 32:

Dynamic checking

Page 33:

Dynamically Check the Patterns

Home-grown bytecode instrumentor
  Get a list of matching patterns
  Instrument calls to any of the methods to dump parameters

Post-processing of the output
  Process a stream of events
  Find and count matches and mismatches

Example event stream: … o.register(d) … o.deregister(d) … o.deregister(d)
Possible verdicts: matched, mismatched, ???
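
One plausible way to post-process such an event stream is sketched below. It assumes each event carries the method name plus identifiers for the receiver and the argument, and that a deregister with no pending register on the same receiver and argument counts as a mismatch. PairMatcher and its members are hypothetical names; the slide does not say how the actual post-processor resolves the "???" case.

```java
import java.util.*;

final class PairMatcher {
    int matched;      // deregister calls that closed a pending register
    int mismatched;   // deregister calls with nothing pending
    private final Map<String, Integer> open = new HashMap<>();

    /** Feeds one event from the instrumented run into the matcher. */
    void onCall(String method, String receiver, String arg) {
        String key = receiver + "/" + arg;
        if (method.equals("register")) {
            open.merge(key, 1, Integer::sum);
        } else if (method.equals("deregister")) {
            int pending = open.getOrDefault(key, 0);
            if (pending > 0) {
                open.put(key, pending - 1);
                matched++;
            } else {
                mismatched++;
            }
        }
    }

    /** Registrations still open at the end of the run. */
    int unclosed() {
        int n = 0;
        for (int pending : open.values()) n += pending;
        return n;
    }

    public static void main(String[] args) {
        PairMatcher m = new PairMatcher();
        m.onCall("register", "o", "d");
        m.onCall("deregister", "o", "d");
        m.onCall("deregister", "o", "d");   // second deregister has no partner
        System.out.println(m.matched + " matched, " + m.mismatched + " mismatched");
    }
}
```

Under these assumptions the slide's stream yields one match and one mismatch. Whether registrations left open at the end of a run should also count as misses is a separate policy choice, exposed here as unclosed().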

Page 34:

Experiments

Page 35:

Experimental Setup

Applied to Eclipse and jEdit
  3,600,000 lines of Java code combined
  Included many plugins

Times:
  6 days to fetch and process CVS histories
  30 minutes to compute the patterns
  An hour to instrument
  15 minutes to run
  And we are done!

Page 36:

Experimental Summary

Pattern classification: 56 patterns total
  13 are usage patterns
  8 are error patterns
  11 are unlikely patterns
  24 were not hit at runtime

Error patterns
  Resulted in a total of 264 dynamically confirmed pattern violations

Page 37:

Summary

Knowing code patterns is important

We explored using software histories:
  Co-change often indicates patterns
  Use previous fixes (one-line changes) to drive error patterns

Found interesting patterns:
  Matching method pairs
  State machines
  More complex stuff

Confirmed valid patterns
Found pattern violations at runtime

We have a paper in FSE 2005