ICPC 2008, Marc Eaddy
Motivation
□ >50% of maintenance time spent trying to understand the program
□ Where are the features, reqs, etc. in the code?
□ What is this code for?
□ Why is it hard to understand and change the program?
What is a “concern?”
□ Feature, requirement, design pattern, coding idiom, etc.
□ Raison d'être for code
□ Every line of code exists to satisfy some concern
Anything that affects the implementation of a program
Concern location problem
[Figure: Concerns mapped to Program Elements]
□ Concern–code relationship undocumented
□ Reverse engineer the relationship
□ (but, which one?)
Concern–code relationship hard to obtain
Software pruning
□ Remove code that supports certain features, reqs, etc.
□ Reduce program’s footprint
□ Support different platforms
□ Simplify program
Prune dependency rule [ACOM’07]
□ Code is prune dependent on concern if
□ Pruning the concern requires removing or altering the code
□ Must alter code that depends on removed code
□ Prevent compile errors
□ Eliminate “dead code”
□ Easy to determine/approximate
Automated concern location
□ Experts mine clues in code, docs, etc.
□ Existing techniques use 1 or 2 experts only
□ Our solution: Cerberus
1. Information retrieval
2. Execution tracing
3. Prune dependency analysis
Concern–code relationship predicted by an “expert”
IR-based concern location
□ i.e., Google for code
□ Program entities are documents
□ Requirements are queries
[Figure: the requirement “Array.join” used as a query against source code entities such as js_join()]
Vector space model [Salton]
□ Parse code and reqs doc to extract term vectors
  □ NativeArray.js_join() method → “native,” “array,” “join”
  □ “Array.join” requirement → “array,” “join”
□ Our contributions
  □ Expand abbreviations
    □ numconns → number, connections, numberconnections
  □ Index fields
□ Weight terms (tf · idf)
  □ Term frequency (tf)
  □ Inverse document frequency (idf)
□ Similarity = cosine of the angle between document and query vectors
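The vector space model above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the Cerberus implementation: the class name `VsmSketch`, the tokenizer, and the smoothed idf variant `log(1 + N/df)` are choices made here for brevity, and abbreviation expansion and field indexing are left out. Unlike the slide, this tokenizer also keeps the `js` prefix as a term.

```java
import java.util.*;

class VsmSketch {
    // Split an identifier or phrase into lowercase terms (hypothetical tokenizer).
    static List<String> terms(String text) {
        // Put a space at lower-to-upper camelCase boundaries, then split on non-letters.
        String spaced = text.replaceAll("([a-z])([A-Z])", "$1 $2");
        List<String> out = new ArrayList<>();
        for (String t : spaced.split("[^A-Za-z]+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase());
        }
        return out;
    }

    // Raw term frequency of one document or query.
    static Map<String, Double> tf(List<String> terms) {
        Map<String, Double> counts = new HashMap<>();
        for (String t : terms) counts.merge(t, 1.0, Double::sum);
        return counts;
    }

    // Inverse document frequency over the corpus; log(1 + N/df) is an assumed variant.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs) {
            for (String t : new HashSet<>(d)) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            out.put(e.getKey(), Math.log(1.0 + (double) docs.size() / e.getValue()));
        }
        return out;
    }

    // Cosine of the angle between the tf-idf vectors built from a and b.
    static double cosine(Map<String, Double> a, Map<String, Double> b, Map<String, Double> weights) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double idfW = weights.getOrDefault(e.getKey(), 0.0);
            double wa = e.getValue() * idfW;
            na += wa * wa;
            Double tfB = b.get(e.getKey());
            if (tfB != null) dot += wa * tfB * idfW;
        }
        for (Map.Entry<String, Double> e : b.entrySet()) {
            double wb = e.getValue() * weights.getOrDefault(e.getKey(), 0.0);
            nb += wb * wb;
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Score one program entity (document) against one requirement (query).
    static double score(String query, String doc, List<String> corpus) {
        List<List<String>> docs = new ArrayList<>();
        for (String d : corpus) docs.add(terms(d));
        return cosine(tf(terms(query)), tf(terms(doc)), idf(docs));
    }

    public static void main(String[] args) {
        // js_sort and js_concat are made-up neighbours of the slide's js_join example.
        List<String> corpus = List.of(
            "NativeArray.js_join", "NativeArray.js_sort", "NativeString.js_concat");
        for (String d : corpus) {
            System.out.printf("%-24s %.3f%n", d, score("Array.join", d, corpus));
        }
    }
}
```

Ranking the entities by this score for each requirement gives the IR expert's prediction. Rare terms such as “join” carry more weight than terms shared by many entities such as “native.”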
Tracing-based concern location
□ Observe elements activated when concern is exercised
□ Unit tests for each concern
□ e.g., find elements uniquely activated by a concern
Unit Test for “Array.join”
var a = new Array(1, 2);
if (a.join(',') == "1,2") {
    print("Test passed");
}
else {
    print("Test failed");
}
[Figure: call graph of the elements activated by the test, including js_construct and js_join]
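The “uniquely activated” heuristic mentioned above can be sketched as a set difference over per-concern traces. This is a minimal illustration, not the tracing expert used in Cerberus: the class name `TraceSketch`, the map-of-sets trace representation, and the second concern (“Array.sort” with `js_sort`) are assumptions introduced here.

```java
import java.util.*;

class TraceSketch {
    // Elements activated by the given concern's tests and by no other concern's tests.
    static Set<String> uniquelyActivated(String concern, Map<String, Set<String>> traces) {
        Set<String> unique = new TreeSet<>(traces.getOrDefault(concern, Set.of()));
        for (Map.Entry<String, Set<String>> e : traces.entrySet()) {
            if (!e.getKey().equals(concern)) unique.removeAll(e.getValue());
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> traces = new HashMap<>();
        // Elements observed while running the "Array.join" unit test from the slide.
        traces.put("Array.join", Set.of("js_construct", "js_join"));
        // Hypothetical second concern, added only to give the difference something to remove.
        traces.put("Array.sort", Set.of("js_construct", "js_sort"));
        System.out.println(uniquelyActivated("Array.join", traces));
    }
}
```

Shared setup code such as `js_construct` is filtered out because every array test activates it, which leaves the elements specific to the concern.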
Prune dependency analysis
□ Infer relevant elements based on structural relationship to relevant element e (seed)
□ Assumes we already have some seeds
□ Prune dependency analysis
□ Determines prune dependency rule using program analysis
□ Find references to e
□ Find superclasses and subclasses of e
PDA example
Source Code
interface A {
    public void foo();
}
public class B implements A {
    public void foo() { ... }
    public void bar() { ... }
}
public class C {
    public static void main() {
        B b = new B();
        b.bar();
    }
}
Program Dependency Graph
[Figure: nodes C, A, B, main, foo, foo, bar; edges labeled contains, refs, calls, inherits]
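One inference step of prune dependency analysis on this example might look like the sketch below. It is an illustration under assumptions, not the Cerberus implementation: the class name `PdaSketch`, the `Edge` type, and the edge list (read off the source code above) are introduced here. It covers only the two rules on the slides, references to the seed and its direct superclasses and subclasses, and ignores containment and transitive propagation.

```java
import java.util.*;

class PdaSketch {
    // One directed, labeled edge of the dependency graph, e.g. C.main --calls--> B.bar.
    static final class Edge {
        final String from, kind, to;
        Edge(String from, String kind, String to) {
            this.from = from;
            this.kind = kind;
            this.to = to;
        }
    }

    // Edges as read from the A/B/C source code; this reading is an assumption.
    static List<Edge> exampleGraph() {
        return List.of(
            new Edge("C", "contains", "C.main"),
            new Edge("A", "contains", "A.foo"),
            new Edge("B", "contains", "B.foo"),
            new Edge("B", "contains", "B.bar"),
            new Edge("C.main", "refs", "B"),
            new Edge("C.main", "calls", "B.bar"),
            new Edge("B", "inherits", "A"));
    }

    // Elements inferred relevant from a seed: whatever refers to or calls the seed,
    // plus the seed's direct superclasses and subclasses.
    static Set<String> infer(String seed, List<Edge> graph) {
        Set<String> out = new TreeSet<>();
        for (Edge e : graph) {
            boolean reference = e.kind.equals("refs") || e.kind.equals("calls");
            if (reference && e.to.equals(seed)) out.add(e.from);
            if (e.kind.equals("inherits")) {
                if (e.from.equals(seed)) out.add(e.to);   // superclass of the seed
                if (e.to.equals(seed)) out.add(e.from);   // subclass of the seed
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(infer("B", exampleGraph()));
        System.out.println(infer("B.bar", exampleGraph()));
    }
}
```

With `B.bar` as the seed, the sketch flags `C.main`, because removing `bar` would leave the call `b.bar()` uncompilable. That is the “prevent compile errors” part of the prune dependency rule.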
Future work
□ Improve PDA
  □ Reimplemented using Soot and Polyglot
  □ Generalize using prune dependency predicates
  □ Improve precision using points-to analysis
  □ Improve accuracy using
    □ Dominator heuristic
    □ Variable liveness analysis
□ Improve accuracy of Cerberus
  □ Combine experts using matrix linear regression
Cerberus contributions
□ Effectively combined 3 concern location techniques
□ PDA boosts performance of other techniques