DynaMine: Finding Common Error Patterns by Mining Software Revision Histories

Benjamin Livshits, Stanford University
Thomas Zimmermann, Saarland University

A Box Full of Nails

A lot of promise, potential, excitement
Not that many success stories
Not sure what to apply it to
Let’s try this particularly exciting idea

Miners looking at their tools

Promises, promises…

Interesting usage patterns found by CVS mining

Interesting error patterns found by CVS mining

My Background

Tools for bug detection
  Analysis: pointer analysis, etc.
  Mostly static, some dynamic

Applications: Security
  Buffer overruns
  Format string violations
  SQL injections
  Cross-site scripting
  HTTP response splitting
  Data lifetimes

J2EE patterns
  Bad session stores
  Lapsed listeners

Eclipse patterns
  Missing calls to dispose
  Not calling super
  Forgetting to deregister listeners

Classification of Error Patterns

Generic patterns -- the usual suspects
  NULL dereferences
  Buffer overruns
  Double-deletes
  Locking errors/threads

App-specific patterns particular to a system or a set of APIs
  Bugs in Linux code
  Bugs in J2EE servlets
  Device drivers

Error Pattern Iceberg

NULL dereferences
Buffer overruns
Double-deletes
Locks/threads

Classification of Error Patterns

App-specific patterns particular to a system or a set of APIs

Intuition:
  Many other application-specific patterns exist
  Much of the application-specific stuff remains a gray area so far

Goal: Let’s figure out what the patterns are

Generic patterns -- the usual suspects
  NULL dereferences
  Buffer overruns
  Double-deletes
  Locking errors/threads

Does anybody know any good error patterns specific to WinAmp plugins?
There are hundreds of WinAmp plugins out there

Motivation: Matching Method Pairs

Start small: Matching method pairs
  Only two methods
  A very simple state machine
  Calls must match perfectly, order matters

Very common; our inspiration is:
  System calls
    fopen/fclose
    lock/unlock
    …
  GUI operations
    addNotify/removeNotify
    addListener/removeListener
    createWidget/destroyWidget
    …

Want to find more of the same
And, if we are lucky, more interesting patterns

DynaMine: Our Insight

Our problem:
  Want to find patterns whose violation causes errors
  Want to find patterns for program understanding

Our technique:
  Look at revision histories
  Use data mining techniques to find methods that are often added at the same time

Crucial observation:
  Things that are frequently checked in together often form a pattern

DynaMine: Our Insight (continued)

Now we know the potential patterns

“Profile” the patterns
  Run the application
  See how many times each pattern
    hits: number of times a pattern is followed
    misses: number of times a pattern is violated

Based on these statistics, classify the patterns
  Usage patterns: almost always hold
  Error patterns: violated a large number of times, but still hold most of the time
  Unlikely patterns: not validated enough times
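
The classification step above can be sketched in a few lines of Java. This is only an illustration: the slides do not state the cut-offs that separate the three classes, so the constants `USAGE_MAX_MISS_RATIO`, `ERROR_MAX_MISS_RATIO`, and `MIN_OBSERVATIONS`, as well as the class and method names, are assumptions made up for this sketch.

```java
/** Sketch: classify a mined pattern from its runtime hit and miss counts. */
final class PatternClassifier {
    enum Kind { USAGE, ERROR, UNLIKELY, NOT_HIT }

    // Hypothetical cut-offs; the slides do not give the actual thresholds.
    static final double USAGE_MAX_MISS_RATIO = 0.10; // "almost always hold"
    static final double ERROR_MAX_MISS_RATIO = 0.50; // "still hold most of the time"
    static final int MIN_OBSERVATIONS = 10;          // "validated enough times"

    static Kind classify(int hits, int misses) {
        int total = hits + misses;
        if (total == 0) return Kind.NOT_HIT;            // never exercised at runtime
        if (total < MIN_OBSERVATIONS) return Kind.UNLIKELY;
        double missRatio = (double) misses / total;
        if (missRatio <= USAGE_MAX_MISS_RATIO) return Kind.USAGE;
        if (missRatio < ERROR_MAX_MISS_RATIO) return Kind.ERROR;
        return Kind.UNLIKELY;                           // violated at least half the time
    }

    public static void main(String[] args) {
        System.out.println(classify(980, 20));   // few violations
        System.out.println(classify(700, 300));  // many violations, but mostly followed
        System.out.println(classify(100, 400));  // mostly violated
    }
}
```

With these assumed thresholds, a pattern is reported as an error pattern only when it is exercised often enough and still followed in the majority of cases.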

Architecture of DynaMine

Revision history mining: mine CVS histories -> patterns -> sort and filter
Dynamic analysis: instrument relevant method calls -> run the application -> post-process into usage patterns, error patterns, unlikely patterns
Reporting: report patterns, report bugs

Mining approach

Mining Basics

Rely on co-change
Simplification: look at method calls only
Look for interesting patterns in the way methods are called

Example: Sequence of revisions
  Files Foo.java, Bar.java, Baz.java, Qux.java

Calls added in each revision:
  Foo.java 1.12: o1.addListener, o1.removeListener
  Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println
  Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
  Qux.java 1.41: o4.addListener, System.out.println
  Qux.java 1.42: o4.removeListener

Mining Matching Method Calls

Use our observation:
  Methods that are frequently added simultaneously often represent a usage pattern

For instance:
  … addListener(…); … removeListener(…); …

Data Mining Summary

We consider method calls added in each check-in
We want to find patterns of method calls
  Too many potential patterns to consider
  Want to filter and rank them
  Use support and confidence for that

Support and confidence of each pattern
  Standard metrics used in data mining
  Support reflects how many times each pair appears
  Confidence reflects how strongly a particular pair is correlated
  Refer to the paper for details
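
As a rough illustration of the two metrics, the sketch below computes them for the addListener/removeListener example used in this talk. It assumes the textbook association-rule definitions (support as the number of check-ins adding both methods, confidence of a => b as support divided by the number of check-ins adding a); the paper's exact formulation may differ, and the class and method names are invented for this example.

```java
import java.util.List;
import java.util.Set;

/** Sketch: support and confidence of method pairs over check-ins. */
final class SupportConfidence {
    /** Method names added in each check-in of the running example. */
    static final List<Set<String>> CHECK_INS = List.of(
        Set.of("addListener", "removeListener"),                                // Foo.java 1.12
        Set.of("addListener", "removeListener", "println"),                     // Bar.java 1.47
        Set.of("addListener", "removeListener", "iterator", "hasNext", "next"), // Baz.java 1.23
        Set.of("addListener", "println"),                                       // Qux.java 1.41
        Set.of("removeListener"));                                              // Qux.java 1.42

    /** Support: number of check-ins that add both a and b. */
    static long support(String a, String b) {
        return CHECK_INS.stream().filter(t -> t.contains(a) && t.contains(b)).count();
    }

    /** Confidence of a => b: fraction of check-ins adding a that also add b. */
    static double confidence(String a, String b) {
        long withA = CHECK_INS.stream().filter(t -> t.contains(a)).count();
        return withA == 0 ? 0.0 : (double) support(a, b) / withA;
    }

    public static void main(String[] args) {
        System.out.println(support("addListener", "removeListener"));    // prints 3
        System.out.println(confidence("addListener", "removeListener")); // prints 0.75
    }
}
```

Here addListener is added in four check-ins and removeListener accompanies it in three of them, which is where the 0.75 comes from.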

Improvements Over the Traditional Approach

Default data mining approach doesn’t quite work
  Filters based on confidence and support
  Still too many potential patterns!

1. Filtering: Consider only patterns with the same initial subsequence as potential patterns

2. Ranking: Use one-line “fixes” to find likely error patterns

Matching Initial Call Sequences

Candidate pairs per check-in:
  Foo.java 1.12: o1.addListener, o1.removeListener -- 1 pair
  Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println -- 3 pairs, 1 pair after filtering
  Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next -- 10 pairs, 2 pairs after filtering
  Qux.java 1.41: o4.addListener, System.out.println -- 1 pair, 0 pairs after filtering
  Qux.java 1.42: o4.removeListener -- 0 pairs
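
The pair counts on this slide can be reproduced with a small sketch. It assumes that the "initial call sequence" of a call is simply everything before its last dot (so `iter.hasNext` and `iter.next` share `iter`), which is a simplification chosen for this example; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: count candidate pairs with and without the initial-sequence filter. */
final class InitialSequenceFilter {
    /** Everything before the last '.' is treated as the initial call sequence. */
    static String prefix(String call) {
        int dot = call.lastIndexOf('.');
        return dot < 0 ? "" : call.substring(0, dot);
    }

    /** All unordered pairs among n added calls: n * (n - 1) / 2. */
    static int allPairs(List<String> calls) {
        int n = calls.size();
        return n * (n - 1) / 2;
    }

    /** Only pairs whose two calls share the same initial sequence. */
    static int filteredPairs(List<String> calls) {
        Map<String, Integer> groupSizes = new HashMap<>();
        for (String call : calls) {
            groupSizes.merge(prefix(call), 1, Integer::sum);
        }
        int pairs = 0;
        for (int k : groupSizes.values()) {
            pairs += k * (k - 1) / 2;
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> baz = List.of("o3.addListener", "o3.removeListener",
                                   "list.iterator", "iter.hasNext", "iter.next");
        System.out.println(allPairs(baz));      // prints 10
        System.out.println(filteredPairs(baz)); // prints 2
    }
}
```

For Baz.java 1.23 the filter keeps only the o3 pair and the iter pair, cutting ten candidates down to two.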

Using Fixes to Rank Patterns

Look for one-call additions which likely indicate fixes
Rank patterns with such methods higher

Calls added in each revision:
  Foo.java 1.12: o1.addListener, o1.removeListener
  Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println
  Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
  Qux.java 1.41: o4.addListener, System.out.println
  Qux.java 1.42: o4.removeListener

Qux.java 1.42 adds only o4.removeListener: this is a fix!
Move patterns containing removeListener up
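
One possible way to implement this ranking heuristic is sketched below. It treats every check-in that adds exactly one call as a likely fix and moves candidate pairs mentioning such a method to the front, breaking ties by support. The `Pair` type, the tie-breaking rule, and the support counts in `main` are all assumptions made up for this illustration, not values from the talk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: rank candidate pairs using one-call check-ins as evidence of fixes. */
final class FixRanking {
    /** A candidate pattern: two method names plus a support count. */
    static final class Pair {
        final String first;
        final String second;
        final int support;
        Pair(String first, String second, int support) {
            this.first = first;
            this.second = second;
            this.support = support;
        }
    }

    /** Methods added by check-ins that add exactly one call (likely fixes). */
    static Set<String> fixMethods(List<List<String>> checkIns) {
        Set<String> fixes = new HashSet<>();
        for (List<String> added : checkIns) {
            if (added.size() == 1) fixes.add(added.get(0));
        }
        return fixes;
    }

    /** Pairs mentioning a fix method come first; ties go to higher support. */
    static List<Pair> rank(List<Pair> candidates, Set<String> fixes) {
        List<Pair> ranked = new ArrayList<>(candidates);
        Comparator<Pair> fixesFirst = Comparator.comparingInt(
            (Pair p) -> fixes.contains(p.first) || fixes.contains(p.second) ? 0 : 1);
        Comparator<Pair> bySupportDesc =
            Comparator.comparingInt((Pair p) -> p.support).reversed();
        ranked.sort(fixesFirst.thenComparing(bySupportDesc));
        return ranked;
    }

    public static void main(String[] args) {
        Set<String> fixes = fixMethods(List.of(
            List.of("addListener", "println"),   // Qux.java 1.41
            List.of("removeListener")));         // Qux.java 1.42: a one-call fix
        // Support counts below are hypothetical.
        List<Pair> ranked = rank(List.of(
            new Pair("hasNext", "next", 40),
            new Pair("addListener", "removeListener", 12),
            new Pair("addListener", "println", 25)), fixes);
        for (Pair p : ranked) {
            System.out.println(p.first + "/" + p.second);
        }
    }
}
```

In this run the addListener/removeListener pair is listed first even though its made-up support is the lowest, because removeListener was added on its own in Qux.java 1.42.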

Applications under Study

Apply these ideas to the revision history of Eclipse and jEdit

Very large open-source projects

Many developers working on both, all over the planet
  122 on Eclipse
  92 on jEdit

Many check-ins
  Eclipse: 2,837,854
  jEdit: 144,495

Long histories
  Eclipse since 2001
  jEdit since 2000

Some patterns

(as promised)

Categories of Patterns

Method calls during execution:
  Care about the methods
  Care about the order
  Care about the parameters/return values

Here are some common cases:
  Matching method pairs
  State machines
  More complex patterns

Some Interesting Method Pairs (1)

kEventControlActivate kEventControlDeactivate

addDebugEventListener removeDebugEventListener

beginRule endRule

suspend resume

NewPtr DisposePtr

addListener removeListener

register deregister

addElementChangedListener removeElementChangedListener

addResourceChangeListener removeResourceChangeListener

addPropertyChangeListener removePropertyChangeListener

createPropertyList reapPropertyList

preReplaceChild postReplaceChild

addWidget removeWidget

stopMeasuring commitMeasurements

blockSignal unblockSignal

HLock HUnlock

OpenEvent fireOpen

Some Interesting Method Pairs (2)

addWidget removeWidget

Register/unregister the current widget with the parent display object for subsequent event forwarding

Some Interesting Method Pairs (3)

addPropertyChangeListener removePropertyChangeListener

Add/remove listener for a particular kind of GUI events

Some Interesting Method Pairs (4)

HLock HUnlock

Use OS native locking mechanism for resources such as icons, etc.

State Machines

Order captured by a state machine
  Must be followed precisely: omitting or repeating a method call is a sign of error
  Simplest formalism for describing the object life-cycle

Matching method pairs are a specific case

Very common in C
  Consider OS code
Less common in Java, but…

State Machines (1)

o.enterAlignment [o.redoAlignment] o.exitAlignment

Part of the org.eclipse.jdt.internal.formatter.Scribe package responsible for pretty-printing of code

enterAlignment/exitAlignment pairs must match

redoAlignment is invoked in exception cases

State Machines (2)

o.beginCompoundEdit() (o.insert(...) | o.remove(...))+ o.endCompoundEdit()

Compound edits within jEdit: can be undone at once

beginCompoundEdit/endCompoundEdit act as brackets

Other operations in between
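
A state-machine pattern like this one can be checked against a recorded call sequence with an ordinary regular expression. The sketch below is one way to do that and is not taken from DynaMine itself: it assumes the calls made on a single object have already been collected in order, encodes each method name as one letter, and (as a further assumption) allows the bracketed sequence to repeat.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: check a per-object call trace against the compound-edit pattern. */
final class CompoundEditChecker {
    // One letter per instrumented method; this encoding is an assumption of the sketch.
    private static char encode(String method) {
        switch (method) {
            case "beginCompoundEdit": return 'B';
            case "insert":            return 'I';
            case "remove":            return 'R';
            case "endCompoundEdit":   return 'E';
            default: throw new IllegalArgumentException("unexpected call: " + method);
        }
    }

    // beginCompoundEdit (insert | remove)+ endCompoundEdit, repeated any number of times.
    private static final Pattern PROTOCOL = Pattern.compile("(B[IR]+E)*");

    static boolean followsProtocol(List<String> callsOnOneObject) {
        StringBuilder trace = new StringBuilder();
        for (String call : callsOnOneObject) {
            trace.append(encode(call));
        }
        return PROTOCOL.matcher(trace).matches();
    }

    public static void main(String[] args) {
        System.out.println(followsProtocol(
            List.of("beginCompoundEdit", "insert", "remove", "endCompoundEdit"))); // prints true
        System.out.println(followsProtocol(
            List.of("beginCompoundEdit", "insert")));                              // prints false
    }
}
```

The second trace is rejected because the edit is opened but never closed, which is exactly the kind of omission the state machine is meant to catch.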

State Machines (3)

OS.PmMemCreateMC [OS.PmMemStart OS.PmMemFlush OS.PmMemStop] OS.PmMemReleaseMC

Memory context manipulation (like memory pools)

Wrappers around underlying OS functionality

The middle part of the pattern is optional

More Complex Stuff (1)

try {
    monitor.beginTask(null, Policy.totalWork);
    int depth = -1;
    try {
        workspace.prepareOperation(null, monitor);
        workspace.beginOperation(true);
        depth = workspace.getWorkManager().beginUnprotected();
        return runInWorkspace(Policy.subMonitorFor(monitor, Policy.opWork,
                SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
    } catch (OperationCanceledException e) {
        workspace.getWorkManager().operationCanceled();
        return Status.CANCEL_STATUS;
    } finally {
        if (depth >= 0)
            workspace.getWorkManager().endUnprotected(depth);
        workspace.endOperation(null, false,
                Policy.subMonitorFor(monitor, Policy.endOpWork));
    }
} catch (CoreException e) {
    return e.getStatus();
} finally {
    monitor.done();
}

Grammar for Workspace Transactions

Requires human intelligence
Requires a lot of it
Is actually an excellent pattern: haven’t seen runtime violations

S → O*

O → w.prepareOperation()
    w.beginOperation()
    U*
    w.endOperation()

U → w.getWorkManager().beginUnprotected()
    S
    [w.getWorkManager().operationCanceled()]
    w.getWorkManager().endUnprotected()

Dynamic checking

Dynamically Check the Patterns

Home-grown bytecode instrumentor
  Get a list of matching patterns
  Instrument calls to any of the methods to dump parameters

Post-processing of the output
  Process a stream of events
  Find and count matches and mismatches

… o.register(d) … o.deregister(d) … o.deregister(d)

matched

mismatched

???
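
The post-processing step can be pictured as a single pass over the logged events. The following sketch counts matches and mismatches for the register/deregister pair; it is not the actual DynaMine post-processor. The `Event` type, the use of integer identities for receiver and argument, and the decision to count a never-deregistered register as a mismatch are all assumptions of this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: count matched and mismatched register/deregister calls in a trace. */
final class PairMatcher {
    /** One logged call: method name plus identities of receiver and argument. */
    static final class Event {
        final String method;
        final int receiverId;
        final int argId;
        Event(String method, int receiverId, int argId) {
            this.method = method;
            this.receiverId = receiverId;
            this.argId = argId;
        }
    }

    /** Returns {matches, mismatches} for the whole trace. */
    static int[] count(List<Event> trace) {
        // Outstanding register calls, keyed by (receiver, argument).
        Map<String, Integer> open = new HashMap<>();
        int matches = 0;
        int mismatches = 0;
        for (Event e : trace) {
            String key = e.receiverId + ":" + e.argId;
            if (e.method.equals("register")) {
                open.merge(key, 1, Integer::sum);
            } else if (e.method.equals("deregister")) {
                int pending = open.getOrDefault(key, 0);
                if (pending > 0) {
                    open.put(key, pending - 1);
                    matches++;
                } else {
                    mismatches++; // deregister without a preceding register
                }
            }
        }
        for (int pending : open.values()) {
            mismatches += pending; // registered but never deregistered
        }
        return new int[] {matches, mismatches};
    }

    public static void main(String[] args) {
        // o.register(d) ... o.deregister(d) ... o.deregister(d)
        int[] result = count(List.of(
            new Event("register", 1, 7),
            new Event("deregister", 1, 7),
            new Event("deregister", 1, 7)));
        System.out.println(result[0] + " matched, " + result[1] + " mismatched");
    }
}
```

On the slide's event stream this yields one match (the first register/deregister) and one mismatch (the second deregister).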

Experiments

Experimental Setup

Applied to Eclipse and jEdit
  3,600,000 lines of Java code combined
  Included many plugins

Times:
  6 days to fetch and process CVS histories
  30 minutes to compute the patterns
  An hour to instrument
  15 minutes to run
  And we are done!

Experimental Summary

Pattern classification: 56 patterns total
  13 are usage patterns
  8 are error patterns
  11 are unlikely patterns
  24 were not hit at runtime

Error patterns
  Resulted in a total of 264 dynamically confirmed pattern violations

Summary

Knowing code patterns is important

We explored using software histories:
  Co-change often indicates patterns
  Use previous fixes (one-line changes) to drive error patterns

Found interesting patterns:
  Matching method pairs
  State machines
  More complex stuff

Confirmed valid patterns
Found pattern violations at runtime
We have a paper in FSE 2005
