Pitfalls in Aspect Mining

Prof. Kim Mens, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium, [email protected]
Dr. Jens Krinke, King's College London, United Kingdom, [email protected]
Dr. Andy Kellens, Vrije Universiteit Brussel, Belgium, [email protected]

WCRE 2008, 15th Working Conference on Reverse Engineering, October 15th – 18th, 2008, Antwerp, Belgium
Presentation of paper on "pitfalls in aspect mining" at the Working Conference on Reverse Engineering (WCRE), Antwerp, Belgium, 2008.
Poor precision or recall occurs at different levels of granularity
Example
In order to perform this evaluation, we use grok to process the sets of clone classes of both clone detectors separately. For each of the concerns we consider, we try to find an ordered selection of clone classes that does a good job at 'covering' the region of code defined by the concern in question. A source code line of a concern is covered by a clone class if it is included in one of the clones (code fragments) of the clone class.

For each concern, we then proceed as follows: for all of the clone classes in the set, we calculate which concern lines are covered by each clone class. The clone class that covers the most lines of the concern is selected, and the concern lines that are covered will no longer be considered during the remainder of the algorithm. Subsequently, the algorithm will select the clone class that covers the most of the remaining concern lines, and so on, until no more concern lines are covered by any clone class. If multiple clone classes cover an equal number of concern lines, we select the clone class that contains the smallest number of non-concern lines. As with lines belonging to a concern, non-concern lines are also considered at most once.
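The greedy selection procedure above can be sketched as follows. This is a hypothetical Python sketch, not the paper's implementation (which used grok); all names are illustrative assumptions.

```python
def select_clone_classes(clone_classes, concern_lines):
    """Greedily select clone classes that cover the most remaining concern lines.

    clone_classes: dict mapping a clone-class id to the set of source lines
                   covered by its clones.
    concern_lines: set of source lines annotated as belonging to the concern.
    """
    concern = set(concern_lines)
    remaining = set(concern)      # concern lines not yet covered
    non_concern_seen = set()      # non-concern lines are counted at most once
    selection = []
    while True:
        best = None
        for cid, lines in clone_classes.items():
            if cid in selection:
                continue
            gain = len(lines & remaining)
            # Tie-break: prefer the class adding fewer new non-concern lines.
            noise = len((lines - concern) - non_concern_seen)
            if gain > 0 and (best is None or (gain, -noise) > best[:2]):
                best = (gain, -noise, cid)
        if best is None:          # no class covers any remaining concern line
            break
        cid = best[2]
        selection.append(cid)
        remaining -= clone_classes[cid]
        non_concern_seen |= clone_classes[cid] - concern
    return selection
```

The loop terminates as soon as no unselected clone class covers a remaining concern line, matching the stopping condition described above.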
6. Obtained Results
Our primary goal is finding the code belonging to a certain concern. Therefore, in our algorithm to select the clone classes (see Section 5), we favor coverage and sacrifice precision (defined below). Arguably, other goals require different criteria to rank the clone classes. For example, in order to identify opportunities for (automatic) refactoring, precision would be the primary issue. We plan to explore these possibilities in the future.

In order to evaluate to what extent the clone detectors meet our goal, we investigate the level of concern coverage met by the clone classes. Concern coverage is the fraction of a concern's source code lines that are covered by the first n selected clone classes. Using the selection algorithm described in Section 5, we obtain the results displayed in Figure 2(a) and Figure 2(b) for Bauhaus' ccdiml and CCFinder, respectively.

Additionally, we evaluate the precision obtained by the first n selected clone classes. Precision is defined as follows:

    precision(n) = concernLines(n) / totalLines(n),

where n indicates the first n selected clone classes, concernLines equals the number of concern code lines covered by the first n selected clone classes, and likewise totalLines equals the total number of lines covered by the first n selected clone classes. Figure 2(c) and Figure 2(d) show the precision obtained by the first n selected clone classes for Bauhaus' ccdiml and CCFinder, respectively.
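The two metrics can be computed directly from the selection order. A minimal sketch, assuming the same illustrative data structures as above (function and variable names are not from the paper):

```python
def coverage_and_precision(selected, clone_classes, concern_lines, n):
    """Concern coverage and precision of the first n selected clone classes.

    selected:      ordered list of clone-class ids (output of the selection).
    clone_classes: dict mapping a clone-class id to its set of covered lines.
    concern_lines: set of lines annotated as belonging to the concern.
    """
    covered = set()
    for cid in selected[:n]:
        covered |= clone_classes[cid]          # totalLines(n) = len(covered)
    concern_covered = covered & concern_lines  # concernLines(n)
    coverage = len(concern_covered) / len(concern_lines)
    precision = len(concern_covered) / len(covered) if covered else 1.0
    return coverage, precision
```

As n grows, `covered` only gains lines, so coverage grows monotonically, while precision typically drops as later clone classes add more non-concern lines.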
Observe that as the number of clone classes considered increases, the coverage displays a monotonic growth, whereas the precision tends to decrease. The highest coverage is less than 100% in all cases: the remaining percentage corresponds to concern code that is coded in such a unique way that it does not occur in any clone class. For example, Figure 2(a) and Figure 2(b) show that 5% of the memory error handling code is not part of any clone class.

We are primarily interested in achieving sufficient coverage without losing too much precision. Therefore, we will focus on the number of clone classes needed to cover most of a concern, where we will consider 80% to be a sufficient coverage level.
6.1. Memory Error Handling

Using 9 clone classes is enough to sufficiently cover the memory error handling concern for both Bauhaus' ccdiml and CCFinder, resulting in 69% and 52% precision, respectively.

We observe that CCFinder yields a clone class that already covers 45% of the concern code. This particular clone class contains 96 clones which are 6 lines in length. Figure 3 shows an example clone from this class. While the lines marked with 'M' belong to the memory handling concern, only the lines marked with 'C' are included in the clones. CCFinder allows clones to start and end with little regard to syntactic units. In contrast, Bauhaus' ccdiml does not allow this, due to its AST-based clone detection algorithm.
M C  if (r != OK)
M C  {
M C      ERXA_LOG(r, 0, ("PLXAmem_malloc failure."));
M C
M C      ERXA_LOG(VSXA_MEMORY_ERR, r,
M C               ("%s: failed to allocated %d bytes.",
M                 func_name, toread));
M
M        r = VSXA_MEMORY_ERR;
M    }
Furthermore, this clone class does not cover memory error handling code exclusively. In Figure 2(d), note that the precision obtained for the first clone class is roughly 82%. Through inspection of the code, we found that some of the clones do not cover memory error handling code at all, but code that is similar at the syntactical level, yet semantically different.
6.2. Parameter Checking

Our results show that the parameter checking concern is found very well by both clone detectors: using 7 clone classes of Bauhaus' ccdiml is sufficient to cover 80% of the concern, while for CCFinder we can suffice with 4 clone classes.
• 3 clone detection techniques
• 5 known aspects
• 16 KLOC C code
• Aspects manually annotated by programmer
• Precision and recall compared to manual annotations

Even for this "ideal" case, precision is still relatively poor.
partially funded by the Interuniversity Attraction Poles Programme - Belgian State, Belgian Science Policy.
References
1. Elisa Baniassad and Siobhan Clarke. Theme: An approach for aspect-oriented analysis and design. In Proc. Int'l Conf. Software Engineering (ICSE), pages 158–167, Washington, DC, USA, 2004. IEEE Computer Society Press.
2. Elisa Baniassad, Paul C. Clements, Joao Araujo, Ana Moreira, Awais Rashid, and Bedir Tekinerdogan. Discovering early aspects. IEEE Software, 23(1):61–70, January-February 2006.
3. Len Bass, Mark Klein, and Linda Northrop. Identifying aspects using architectural reasoning. Position paper presented at Early Aspects 2004: Aspect-Oriented Requirements Engineering and Architecture Design, Workshop of the 3rd Int'l Conf. Aspect-Oriented Software Development (AOSD), 2004.
4. Magiel Bruntink, Arie van Deursen, Remco van Engelen, and Tom Tourwe. An evaluation of clone detection techniques for identifying crosscutting concerns. In Proc. Int'l Conf. Software Maintenance (ICSM), pages 200–209. IEEE Computer Society, 2004.
5. Magiel Bruntink, Arie van Deursen, Remco van Engelen, and Tom Tourwe. On the use of clone detection for identifying crosscutting concern code. IEEE Trans. Software Engineering, 31(10):804–818, 2005.
6. M. Ceccato, M. Marin, K. Mens, L. Moonen, P. Tonella, and T. Tourwe. Applying and combining three different aspect mining techniques. Software Quality Journal, 14(3):209–231, September 2006.
7. A. Kellens, K. Mens, and P. Tonella. A survey of automated code-level aspect mining techniques. Trans. AOSD, 2007. To be published.
8. Awais Rashid, Peter Sawyer, Ana M. D. Moreira, and Joao Araujo. Early aspects: A model for aspect-oriented requirements engineering. In Joint Int'l Conf. Requirements Engineering (RE), pages 199–202. IEEE Computer Society Press, 2002.
9. Bedir Tekinerdogan and Mehmet Aksit. Deriving design aspects from canonical models. In S. Demeyer and J. Bosch, editors, Workshop Reader of the 12th European Conf. Object-Oriented Programming (ECOOP), Lecture Notes in Computer Science, pages 410–413. Springer-Verlag, 1998.
10. Charles Zhang and Hans-Arno Jacobsen. Efficiently mining crosscutting concerns through random walks. In AOSD '07: Proc. of the 6th Int'l Conf. Aspect-Oriented Software Development, pages 226–238, New York, NY, USA, 2007. ACM Press.
technique is relatively low. While this low precision is not a problem per se, it does imply that aspect mining techniques tend to return a lot of false positives, which can be detrimental to their scalability and ease-of-use. Especially for techniques that return a large number of results, this lack of precision can be problematic, since it may require an important amount of user involvement to separate the false positives from the relevant aspect candidates.

Note that precision can be considered at several levels of granularity. At the level of crosscutting sorts: if we look for all aspects or concerns of a given kind, how many false positives do we find that do not belong to that kind? At the level of individual aspects or concerns: do we find some things that are not really aspects or concerns? At the level of joinpoints: for a given aspect candidate or seed we detected, are the code fragments we find as belonging to that concern really a part of that aspect?
Example. Bruntink et al. [4, 5] evaluated the suitability of clone detection techniques for automatically identifying crosscutting concern code. They considered 16,406 lines of code belonging to a large industrial software system and five known crosscutting concerns that appeared in that code: memory handling, null pointer checking, range checking, exception handling and tracing. Before applying their clone detection techniques to mine for the code fragments (lines of code) belonging to each of those concerns, they asked the developer of this code to manually mark, for each line of code, to what concern(s) it belonged. Next, they applied three different clone detection techniques to the code: an AST-based, a token-based and a PDG-based one. In order to evaluate how well each of the three techniques succeeded in finding the code that implemented the five crosscutting concerns, the results of each of the clone detection techniques were compared to the manually marked occurrences of the different crosscutting concerns, and precision and recall were calculated against those. Table 1 shows the average precision of the three clone detection techniques for each of the five concerns considered.
As can be seen from the table, the results of this experiment were rather disparate. For the null pointer checking concern, all clone detectors identified the concern code at near-perfect precision. For most of the other concerns, none of the clone detectors achieved satisfying precision.

Table 1. Average precision of each technique for each of the five concerns

Concern                  AST   Token  PDG
Memory handling          .65   .63    .81
Null pointer checking    .99   .97    .80
Range checking           .71   .59    .42
Exception handling       .38   .36    .35
Tracing                  .62   .57    .68
Related problems. Poor precision has a negative impact on scalability (3.5). There is also a subtle trade-off between recall (3.2) and precision: often better precision can be reached at the cost of lower recall, and vice versa.
3.2 Poor recall
Description. Recall is the proportion of relevant aspect candidates that were discovered, out of all aspect candidates present in the source code. In other words, recall gives an idea of how many false negatives remain in the code and thus how well the technique covers the entire code analysed. As for precision, recall can be considered at several levels of granularity. At the level of crosscutting sorts: if we look for all aspects or concerns of a given kind, do we find all concerns of that kind which exist in the code? At the level of individual aspects or concerns: do we find all aspects and concerns that are present in the code? At the level of joinpoints: do we find the full extent of the aspect or concern, or does the technique fail to discover some code fragments pertaining to the aspect?

A problem with calculating recall is that typically, in a program under analysis, it is not known what the relevant aspects and joinpoints are, except in an ideal case like the validation experiment of Bruntink et al. (see above), where the concerns are known in advance and where a programmer took the time to mark each line of code with the concern(s) it belongs to. A second problem is that most techniques will look for certain symptoms of aspects only and thus are bound to miss occurrences of aspects that exhibit different symptoms.
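Given such a manually marked oracle, joinpoint-level recall is a simple ratio. A minimal sketch (the function and argument names are illustrative, not from the paper):

```python
def recall(reported_lines, oracle_lines):
    """Fraction of manually marked concern lines found by a mining technique.

    reported_lines: lines the technique reports as belonging to the concern.
    oracle_lines:   lines the developer manually marked for that concern.
    """
    oracle = set(oracle_lines)
    if not oracle:
        return 1.0  # nothing to find: vacuously perfect recall
    return len(set(reported_lines) & oracle) / len(oracle)
```

Without such an oracle, this ratio cannot be computed, which is exactly the first problem noted above.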
Subjectivity and scalability

• Subjectivity in interpretation of results
  • Filters, threshold values and blacklists configured by users
  • Ambiguity in interpretation of what is a valid aspect candidate
    • "if it is part of the core functionality, it is not an aspect"
    • e.g. "Moving Figures" in JHotDraw
• Scalability can be problematic due to user involvement
  • often many results to be validated / refined by user
  • looking for false positives / completing the aspect seeds
Evaluate, compare and combine

• Empirical validation
  • no common benchmark
  • subjectivity in interpretation
  • results at different levels of detail and granularity
• Comparability
  • how to compare the quality of mining techniques?
• Composability
  • how to combine the results of different mining techniques?
Causes of the problems

• Inappropriate techniques
  • Too general-purpose
  • Too strong assumptions
  • Too optimistic approaches
  • Scattering versus tangling
  • Lack of use of semantic information
• Imprecise definition of what is an aspect
• Inadequate representation of results
Aspect mining problems and causes

[Table relating each problem (poor precision, poor recall, subjectivity, scalability, empirical validation, comparability, composability) to its contributing causes: inappropriate techniques (too general-purpose, too strong assumptions, too optimistic approaches, no attention to tangling, lack of use of semantic information), imprecise definition of what is an aspect, and inadequate representation of results.]
What can we learn from this table?

• Most causes negatively affect either precision, recall, or both
• Poor precision negatively affects scalability: more user involvement
• Only one of the causes seems specific to aspects
• Three of the causes account for most problems
How to improve? (1)

• Provide a more rigorous definition of aspect
• Dedicated mining techniques may be more successful than general-purpose 'one size fits all' aspect mining techniques
• Rely on semantics rather than on code structure
  • need for a stable semantic foundation
• Desired quality depends on purpose of mining
  • what is it that you want to do with the mined information?
  • initial understanding vs. migration towards aspects
How to improve? (2)

• Leave room for variability
• Look for counter-evidence
• Look for symptoms of tangling
• Choose an adequate and uniform way of presenting the results
  • enough detail, but not too much
• Combine results of different techniques
• Provide a common framework to compare and evaluate mining techniques
Conclusion

• Most encountered pitfalls are not specific to "aspect mining"
  • relevant to any discovery / reverse engineering process
  • especially present in aspect mining due to the relative immaturity of the domain
  • potential for cross-fertilisation?
• A word of warning
  • If you want to use aspect mining, don't apply the tools blindly
  • If you want to research aspect mining, there are still many research opportunities, but also a high risk of failure