
Scalability Solutions for Program Comprehension

Through Dynamic Analysis

Andy Zaidman

Promotor: prof. dr. Serge Demeyer

Thesis submitted to obtain the degree of

Doctor of Science


Acknowledgements

First and foremost, the support of my family has been instrumental. I could not have done it without every one of you. A very special thank you goes to Wendy, my wife and best friend, for always being there and supporting me.

Over the years prof. Demeyer has given me carte blanche when it comes to doing my research, yet was always there to provide me with precious advice and the necessary support at times when there were more questions than answers. Serge, thank you very much!

Every day, Filip Van Rysselberghe, my office mate, was there to cheer me up, provide answers to unanswerable questions and... provide music. For all other questions, Bart Du Bois, Hans Stenten, Niels Van Eetvelde, Pieter Van Gorp, Hans Schippers, Bart Van Rompaey, Matthias Rieger and Olaf Muliawan were always ready to provide their opinions and insights. Many other people from the department were always in for a chat when research was going slow and helped me keep my spirits high. Thank you all for providing such a pleasant atmosphere.

I had the opportunity to enjoy a number of collaborations with, amongst others, Bram Adams, Toon Calders, Kris De Schutter, Orla Greevy, Wahab Hamou-Lhadj and Kim Mens. To them I say: keep up the good work!

A sincere word of gratitude goes out to the members of my doctoral jury, who, through their extensive reviews, have helped this dissertation become significantly better. Thank you Chris Blondia, Serge Demeyer, Theo D'Hondt, Stephane Ducasse, Dirk Janssens, Jan Paredaens and Tarja Systa.

To all my friends who kept spurring me on: I hope I can one day repay you guys.

And finally to everyone who over the years has played a role — small or big — in this dissertation and the work leading up to it: I will not forget and you will not be forgotten...


Abstract

Dynamic analysis has long been a subject of study in the context of (compiler) optimization, program comprehension, test coverage, etc. Ever since, the scale of the event trace has been an important issue. This scalability issue finds its limits on the computational front, where the time and/or space complexity of algorithms becomes too large to be handled by a computer, but also on the cognitive front, where the results presented to the user become too large to be easily understood.

This research focuses on delivering a number of dynamic-analysis-based program comprehension solutions that help software engineers focus on the software system during their initial program exploration and comprehension phases.

The key concepts we use in our techniques are frequency of execution and runtime coupling. Both techniques deliver a solution which can help the software engineer bring focus into his or her comprehension process by annotating parts of the trace that contain similarities or by bringing out the key concepts (classes) of a system. To validate our techniques we used a number of open-source software systems, as well as an industrial legacy application.


Contents

I Introduction

1 Introduction
1.1 Context
1.2 The modalities of change
1.3 Program comprehension
1.4 Lack of documentation
1.5 Dynamic analysis
1.6 Hypothesis
1.7 Solution space
1.7.1 Run-time coupling based heuristic
1.7.2 Frequency based heuristic
1.7.3 Research contributions
1.8 Academic context
1.9 Organization of this dissertation

2 Program comprehension
2.1 What is program comprehension?
2.2 Program understanding as a prerequisite
2.3 Program comprehension models
2.3.1 Top-down program comprehension models
2.3.2 Bottom-up program comprehension models
2.3.3 Integrated model
2.4 On the influence of comprehension tools

3 Dynamic Analysis
3.1 Definition
3.2 Why dynamic analysis?
3.2.1 Goal oriented strategy
3.2.2 Polymorphism
3.3 Modalities of dynamic analysis
3.3.1 Example execution trace
3.3.2 Trace extraction technologies
3.3.3 Implementation
3.4 The observer effect
3.5 Threats to using dynamic analysis
3.6 Strengths and weaknesses

II Coupling based solutions for program comprehension

4 Coupling & program comprehension
4.1 Introduction
4.2 Coupling
4.3 Dynamic coupling
4.3.1 Introduction
4.3.2 Classification of dynamic coupling measures
4.3.3 Dynamic coupling for program comprehension
4.4 Research question
4.5 Research plan
4.6 Validation and evaluation
4.7 Practical application

5 Webmining
5.1 Indirect coupling
5.1.1 Context and definition
5.1.2 Relevance in program comprehension context
5.2 The HITS webmining algorithm
5.2.1 Introduction
5.2.2 HITS algorithm
5.2.3 Example
5.3 Practical application

6 Experiment
6.1 Experimental setup
6.1.1 Case studies
6.1.2 Execution scenarios
6.1.3 Program comprehension baseline
6.1.4 Validation
6.1.5 Research plan
6.1.6 Threats to validity
6.2 Apache Ant
6.2.1 Introduction
6.2.2 Architectural overview
6.2.3 Execution scenario
6.2.4 Discussion of results
6.3 Jakarta JMeter
6.3.1 Introduction
6.3.2 Architectural overview
6.3.3 Execution scenario
6.3.4 Discussion of results
6.4 Discussion
6.4.1 Experimental observations
6.5 Observations with regard to the research question

7 Static coupling
7.1 Introduction & motivation
7.2 A static coupling metrics framework
7.3 Expressing IC CC′ statically
7.4 Results
7.4.1 Ant
7.4.2 JMeter
7.5 Discussion
7.5.1 Practical implications
7.5.2 Comparing static and dynamic results
7.5.3 Conclusion

III Frequency based solutions for program comprehension

8 Frequency Spectrum Analysis
8.1 Introduction
8.1.1 Motivation
8.1.2 Research questions
8.1.3 Solution space
8.1.4 Formal background
8.2 Approach
8.3 Experimental setup
8.3.1 Hypothesis
8.3.2 The experiment itself
8.3.3 Case studies
8.4 Results
8.4.1 Jakarta Tomcat 4.1.18
8.4.2 Fujaba 4.0
8.5 Discussion
8.5.1 Connection with hypothesis
8.5.2 Connection with the research questions
8.5.3 Open questions

IV Industrial experiences

9 Industrial case studies
9.1 Motivation
9.2 Industrial partner
9.3 Experimental setup
9.3.1 Mechanism to collect run-time data
9.3.2 Execution scenario
9.3.3 Details of the system under study
9.4 Results
9.4.1 Experimental setup of the validation phase
9.4.2 Webmining
9.4.3 Frequency analysis
9.5 Pitfalls
9.5.1 Adapting the build process
9.5.2 Legacy issues
9.5.3 Scalability issues
9.6 Discussion

V Concluding parts

10 Related Work
10.1 Dynamic analysis
10.2 Visualization
10.3 Industrial experiences

11 Conclusion
11.1 Conclusion
11.1.1 Dynamic coupling
11.1.2 Relative frequency of execution
11.2 Opportunities for future research

VI Appendices

A HITS webmining
A.1 Introduction
A.2 Setup and proof

B Frequency analysis results for TDFS


List of Figures

3.1 Example execution trace
5.1 Indirect coupling example.
5.2 Indirect coupling example.
5.3 Example graph
6.1 Simplified class diagram of Apache Ant.
7.1 Piece of Java code to help explain metrics.
8.1 Frequency annotation example.
8.2 Example of identical execution frequency.
8.3 Frequency pattern.
8.4 Tomcat with dissimilarity measure using window size 2
8.5 Tomcat with dissimilarity measure using window size 5
8.6 Tomcat with dissimilarity measure using window size 10
8.7 Tomcat with dissimilarity measure using window size 20
8.8 Example of two execution traces with possible polymorphism
8.9 Blowup of the interval [80000, 100000] of Figure 8.7 to show frequency patterns
8.10 Fujaba with dissimilarity measure using window size 20
8.11 Fujaba scenario with a high degree of repetition
8.12 Duploc output of part of the trace (event interval 44 000 to 54 000).
9.1 Three frequency clusters from the TDFS application
9.2 Original makefile.
9.3 Adapted makefile.
9.4 Original esql makefile.
9.5 Adapted esql makefile.
10.1 Simple interaction diagram (a) and its corresponding execution pattern (b) [De Pauw et al., 1998]

Page 15: Scalability Solutions for Program Comprehension Through ... · Scalability Solutions for Program Comprehension Through Dynamic Analysis Andy Zaidman Promotor: prof. dr. Serge Demeyer

List of Tables

2.1 Tasks and activities requiring code understanding.
3.1 Strengths and weaknesses of dynamic analysis
4.1 Dynamic coupling classification.
4.2 Dynamic coupling measures [Arisholm et al., 2004].
5.1 Example of the iterative nature of the HITS algorithm. Tuples have the form (H,A).
6.1 Ant metric data overview.
6.2 JMeter metric data overview.
6.3 Strengths and weaknesses of the proposed coupling-based techniques.
7.1 Ant metric data overview.
7.2 JMeter metric data overview.
7.3 Comparison of the strengths and weaknesses of the static and the dynamic webmining approach.
8.1 Comparison of total tracing versus filtered tracing.
9.1 System passport
9.2 Results of the webmining technique
9.3 Overview of the time-effort of the analyses.


Part I

Introduction



Chapter 1

Introduction

In the beginning the Universe was created. This has made a lot of people very angry and has been widely regarded as a bad move.

—Douglas Adams

Greenfield software development is fun! Building software systems from scratch allows you to be your creative self. You and your team are able to define a whole range of parameters, whether it be the global architecture of the system, some fancy design quirks, the choice of technologies, etc. Sometimes, even the programming language to be used can be your favorite one.

However, in a world where everything changes at a seemingly continually increasing rate, software has to keep up with the changing environment in order to stay useful and keep fulfilling the expectations one has of it. When software that is already in place needs to change, the software development team is confronted with a whole new set of challenges when compared to the greenfield software development situation.

One of these challenges — and probably the most time-consuming one — is trying to understand the existing piece of software, a discipline which is termed "program comprehension". Typically, this process is aided by the available documentation; however, this documentation is often either non-existent or outdated. This dissertation is about providing solutions to be used in this situation, specifically during the initial phases of the program comprehension process. The solutions we present are based upon dynamic analysis, or the analysis of data gathered during the execution of a software system.


1.1 Context

Legacy software is software that is still very much useful to an organization – quite often even indispensable – but whose evolution has become too great a burden [Bennett, 1995, Demeyer et al., 2003]. Legacy software is omnipresent: think of the large software systems that were designed and first put to use in the 1960s or 1970s; these software systems are nowadays often the backbone of large multinational corporations. For banks, healthcare institutions, etc., these systems are vital for their daily operations. As such, failure of these software systems is not an option, and that is why these trusted "oldtimers" are still cared for every day. Furthermore, they are still being evolved to keep up with current and future business requirements.

We propose to use the definition of Brodie and Stonebraker [Brodie and Stonebraker, 1995], which gives an apt description of a legacy system:

“Any information system that significantly resists modification and evolution to meet new and constantly changing business requirements.”

Note that this definition implies that age is no criterion when considering whether a system is a legacy system [Demeyer et al., 2003].

As an example from the world of banking, we still see data formats in use today that were defined decades ago. Access to that data is provided through special applications or modules. These have now become legacy systems, but if any of them were to fail, a downtime of a day or two could mean bankruptcy for the company in question.

To make things worse, evolving a system can exacerbate the legacy problem. To paraphrase Lehman, an evolving system increases its complexity, unless work is done to reduce it [Lehman and Belady, 1985]. This increase in complexity is further enlarged when the original developers, experienced maintainers or up-to-date documentation are not available [Sneed, 2005, Brodie and Stonebraker, 1995, Moise and Wong, 2003, de Oca and Carver, 1998, Demeyer et al., 2003]. A number of solutions to cope with evolution have been proposed in the field of software reengineering [Chikofsky and Cross II, 1990, Bennett, 1995, Sneed, 1996].

1.2 The modalities of change

When applying countermeasures to stabilize or reduce complexity, the software engineer would ideally like to have a deep insight into the application when starting his/her reengineering (or better still, refactoring) operation [Sneed, 2004, de Oca and Carver, 1998, Lehman, 1998]. Yet this understanding is often found lacking as, over time, legacy applications become something of magical black boxes. For one, there is the "if it ain't broke, don't fix it" attitude which often gets in the way. Secondly, there is the problem of out-of-sync documentation, which hinders program comprehension for maintainers and new developers alike [Chikofsky and Cross II, 1990, Moise and Wong, 2003].

Nevertheless, this insight is certainly necessary to be able to apply these countermeasures reliably, economically and promptly. Going beyond applying countermeasures to stabilize or reduce complexity is the need to integrate software systems that were originally not conceived to work together. The 1990s were characterized by an increase in spending on information technology, partly due to the so-called "dot com bubble". During this period, the problem of integrating standalone applications became known under the flag of Enterprise Application Integration or EAI [Linthicum, 1999]. During the early 2000s, the community shifted its focus towards more loosely connected components, and the problem of integrating software systems became known as building up a Service Oriented Architecture or SOA [Gold et al., 2004].

As such, apart from a status-quo scenario, in which the business has to adapt to the software that resists change, a number of scenarios are frequently seen [Bennett, 1995]:

1. Rewrite the application from scratch, from the legacy environment, to the desired one, using a new set of requirements.

2. Reverse engineer the application and perform a rewrite of the application from scratch, from the legacy environment, to the desired one.

3. Refactor the application. One can refactor the old application, without migrating it, so that change requests can be efficiently implemented; or refactor it to migrate it to a different platform.

4. Often, in an attempt to limit the costs, the old application is "wrapped" and becomes a component in, or a service for, a new software system. In this scenario, the software still delivers its useful functionality, with the flexibility of a new environment. This works fine, and the fact that the old software is still present is slowly forgotten. This leads to a phenomenon which can be called the black-box syndrome: the old application, now a component or service in the new system, is trusted for what it does, but nobody knows – or wants to know – what goes on internally.

5. A last possibility is a mix of the previous options, in which the old application is seriously changed before being set up as a component or service in the new environment.

Intuitively, it is clear that for scenarios 2 through 5, a certain level of insight into the existing application is necessary before reengineering can safely begin. This is where the discipline of program comprehension comes in.

1.3 Program comprehension

When programming a piece of software, the programmer has to build a mental bridge that connects the code he/she is writing and the program behavior he/she is trying to accomplish [Renieris and Reiss, 1999]. Conversely, when a programmer is trying to gain an understanding of a system, he/she is actually trying to get the reverse mapping: from the functionality that is present in the system to the code that is performing that very functionality.

Depending on the source, literature suggests that between 30 and 60% of a software engineer's time is spent in the program comprehension phase, where one has to study existing code, documentation and other design artifacts in order to get an adequate understanding of a software system [Spinellis, 2003, Wilde, 1994, Corbi, 1990]. "Adequate" here means the level at which the programmer feels comfortable that his change operation(s) will not harm the system's architecture, design or functionality.

The manner in which a programmer builds up his understanding of a software system varies greatly. It is mostly dependent on the individual, but can be influenced by the magnitude of the system under study, the level of understanding needed for the task at hand, the programming language, the familiarity with the system under study, previous experiences in the domain, etc. [Lakhotia, 1993, von Mayrhauser and Vans, 1995]. While in theory it is necessary to understand the entire system before making any changes, in practice it is essential to use a goal-oriented or as-needed strategy: you want to get an understanding of the part of the system that you are specifically interested in with regard to the task at hand. Furthermore, due to economic constraints this should happen both quickly and thoroughly [Lukoit et al., 2000].

Realizing that program comprehension is such an important part of every software engineer's life, we wonder how we can provide stimuli to make the comprehension process more efficient.

1.4 Lack of documentation

When focusing on delivering a program comprehension solution, we make the assumption that the program comprehension process happens without adequate documentation being available. A number of factors make us believe that for many software systems this assumption is more often than not a reality:

• A lot of the knowledge of an application is not written down in documentation, but is present in the heads of the developers. The information technology (IT) business, being a sector with a high degree of personnel volatility, is certainly at risk of losing a lot of this implicit knowledge about its software systems when personnel leave the project or leave the company altogether [Chikofsky and Cross II, 1990].

• As we already mentioned, software systems have to evolve. This evolution can cause a drift away from the original architecture [Mens, 2000]. Furthermore, keeping the documentation synchronized with those evolutionary changes does not always happen in a structured way [Chikofsky and Cross II, 1990, Moise and Wong, 2003].

• From our own experiences with industry we also know that the software system's documentation is often not up to date. More on this can be found in Chapter 9.

1.5 Dynamic analysis

Dynamic analysis is the study of running software with the aim of extracting properties of the software system. Typically, software is run according to an execution scenario and run-time information is stored in a so-called execution trace. Opposite to this dynamic approach stands the concept of static analysis, which extracts software system properties from artifacts such as source code, documentation, architectural diagrams or design information. Within this research, dynamic analysis is the basic means by which we want to stimulate the program comprehension process.
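The notions of execution scenario and execution trace can be made concrete with a small sketch. This is not the tracing infrastructure used in this dissertation (which targets Java and C systems); it is a minimal, hypothetical Python illustration in which the interpreter's `sys.settrace` hook records one trace event per function call while an execution scenario runs.

```python
import sys

trace = []  # the execution trace: one entry per function call

def tracer(frame, event, arg):
    # the interpreter invokes this hook for each event; we keep 'call' events
    if event == "call":
        trace.append(frame.f_code.co_name)
    return tracer

def helper():
    return 42

def scenario():
    # an "execution scenario": exercise the functionality under study
    for _ in range(3):
        helper()

sys.settrace(tracer)   # start tracing
scenario()
sys.settrace(None)     # stop tracing

print(trace)  # ['scenario', 'helper', 'helper', 'helper']
```

Even this toy trace hints at the scalability problem: one scenario step multiplies into many trace events, and realistic scenarios produce traces with millions of them.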

Dynamic analysis has long been a subject of study in the context of (compiler) optimization, test coverage, etc. It has also been extensively researched for program comprehension purposes, sometimes in a pure dynamic analysis context, sometimes in a mixed static-dynamic context [Ball, 1999, Eisenbarth et al., 2001, Richner, 2002, Systa, 2000a, Sayyad-Shirabad et al., 1997, El-Ramly et al., 2002, Jerding and Rugaber, 1997, Gschwind et al., 2003, Greevy and Ducasse, 2005, Hamou-Lhadj et al., 2005, Hamou-Lhadj et al., 2004]. Although results have been encouraging, the problem of scalability has been recognized as an important stumbling point [Larus, 1993, Smith and Korel, 2000]. In the context of using dynamic analysis for program comprehension purposes, this problem of scalability has three major components, namely:


• A computational component, where the scalability of the underlying algorithm of the program comprehension tool is important in order to make sure that the results (1) can be computed and (2) can be computed in a reasonable amount of time [Larus, 1993].

• A visual component, where the resultset has to be scalable to make it easy for the user to interpret the results [Jerding and Rugaber, 1997, Jerding and Stasko, 1998, Jerding et al., 1997].

• A cognitive component, where the resultset presented to the end user should be of an acceptable size so that an information overload of the end user can be avoided [Zayour and Lethbridge, 2001].

As such, within the research presented in this dissertation, we will put an emphasis on the scalability of the techniques we propose. This scalability effort centers on developing scalable algorithms and providing concise resultsets.

1.6 Hypothesis

Run-time coupling measures and relative frequency of execution are two axes in the run-time information space that can help us build heuristics for program comprehension purposes. Furthermore, these two axes allow us to build in scalability, both at the level of the algorithm and at the level of the resultset.

• Run-time coupling measures allow us to identify must-be-understood classes in the software system.

• Frequency of execution allows us to identify regions of the trace that are highly repetitive.

1.7 Solution space

As we have already mentioned, there is a clear emphasis on scalability in the techniques that we have developed. To be more precise, we have defined two heuristic techniques that allow a huge event trace to be reduced to a more abstract representation that is presented to the user. We will now briefly introduce these two heuristics.


1.7.1 Run-time coupling based heuristic

The basic idea behind using coupling is that structural dependencies between modules of a system can indicate modules that are interesting for initial program comprehension [Robillard, 2005]. As a measure we use run-time export coupling, which (provided we have a well-covering execution scenario) gives us all actual dependencies that occur at run time. Modules which exhibit a high level of export coupling request other modules to do work for them (delegation) and often contain important control structures.

Coupling measures, however, are typically defined between two classes or modules, whereas we want to take the complete structural topology of the application into consideration. To overcome this strict binary relation between modules, we add a transitive measurement for reasoning over the topology. We use webmining techniques for this [Zaidman et al., 2005]. Webmining, a branch of datamining research, analyzes the topological structure of the web, trying to identify important web pages based solely on their hyperlink structure. By interpreting call graphs as web graphs, we port this technique so that we are able to retrieve important classes. An important class is a class that needs to be understood early on in the program comprehension process in order to understand other classes and the interactions between these classes.
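As an illustration, the transitive ranking step can be sketched with a small HITS-style computation over a call graph. This is a hypothetical sketch, not the tool used in this dissertation; the call-graph encoding and the `hubScores` helper are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HitsRanking {

    // Iteratively compute HITS hub scores over a call graph given as
    // "caller class -> set of callee classes". High hub scores flag classes
    // that (transitively) delegate work to many well-connected classes.
    public static Map<String, Double> hubScores(Map<String, Set<String>> calls,
                                                int iterations) {
        Set<String> nodes = new HashSet<>(calls.keySet());
        for (Set<String> callees : calls.values()) nodes.addAll(callees);

        Map<String, Double> hub = new HashMap<>();
        for (String n : nodes) hub.put(n, 1.0);

        for (int i = 0; i < iterations; i++) {
            // Authority update: sum of the hub scores of all callers.
            Map<String, Double> auth = new HashMap<>();
            for (String n : nodes) auth.put(n, 0.0);
            for (Map.Entry<String, Set<String>> e : calls.entrySet())
                for (String callee : e.getValue())
                    auth.merge(callee, hub.get(e.getKey()), Double::sum);

            // Hub update: sum of the authority scores of all callees.
            Map<String, Double> newHub = new HashMap<>();
            for (String n : nodes) newHub.put(n, 0.0);
            for (Map.Entry<String, Set<String>> e : calls.entrySet())
                for (String callee : e.getValue())
                    newHub.merge(e.getKey(), auth.get(callee), Double::sum);

            normalize(newHub);
            hub = newHub;
        }
        return hub;
    }

    private static void normalize(Map<String, Double> scores) {
        double sum = 0.0;
        for (double v : scores.values()) sum += v * v;
        final double norm = Math.sqrt(sum);
        if (norm > 0) scores.replaceAll((k, v) -> v / norm);
    }
}
```

Classes that delegate work to many other well-connected classes accumulate high hub scores, which matches the intuition that they are good starting points for comprehension.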

The resultset obtained from this heuristic is a list of all the classes/modules whose procedures were executed during the scenario. These classes/modules are ranked from important to irrelevant for the early program comprehension phases. To validate our approach we used two open source case studies, namely Apache Ant 1.6.1 and Jakarta JMeter 2.0.1. The actual validation was done by comparing the results obtained to extensive design documentation that was publicly available.

Furthermore, we applied this heuristic to an industrial legacy C system. In contrast to the open source case studies, where we had to rely on documentation available on the internet, we were now able to validate the results we obtained with the original developers and current maintainers of the application. The results of this industrial experiment confirm the value of this technique.

We expand upon the aforementioned techniques in Part II of this dissertation. In Part IV we report on our industrial experiences with this approach.

1.7.2 Frequency based heuristic

Thomas Ball [Ball, 1999] introduced the concept of “Frequency Spectrum Analysis”, a way to correlate procedures, functions and/or methods through their relative calling frequency. The idea is based on the observation that


a relatively small number of methods/procedures is responsible for a huge event trace. As such, a lot of repeated calling of procedures happens during the execution of the program. By trying to correlate these frequencies, we can learn something about (1) the size of the inputset, (2) the size of the outputset and, most interesting for us, (3) calling relationships between methods/procedures.
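The counting step behind such an analysis can be sketched as follows. This is an illustrative simplification, assuming the trace is a flat list of method names; grouping methods by identical call frequency is one simple way to surface candidate calling relationships:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencyClusters {

    // Count how often each method occurs in the trace and group together
    // methods that share the exact same frequency of execution.
    public static Map<Long, List<String>> clusters(List<String> trace) {
        Map<String, Long> frequency = new HashMap<>();
        for (String method : trace) frequency.merge(method, 1L, Long::sum);

        Map<Long, List<String>> byFrequency = new TreeMap<>();
        for (Map.Entry<String, Long> e : frequency.entrySet())
            byFrequency.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                       .add(e.getKey());
        return byFrequency;
    }
}
```

Methods that always appear with the same (or proportionally related) frequencies are candidates for being part of the same calling relationship, e.g. a loop body and the helper it invokes once per iteration.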

We build further upon this idea by proposing a visualization of the trace that allows for visual detection of parts of the event trace that show tightly collaborating methods. We applied this technique to two open source case studies, namely Apache Tomcat 4.1.18 and Fujaba 4.0.

The visualization we propose resembles a “heartbeat” as seen on an ECG or electrocardiogram and should be interpreted in a similar way. For regions in the trace where tightly collaborating methods are executed, the visualization shows a very regular pattern, like an ECG of a heart that is “at rest”. Regions in the trace where the collaboration between methods is less tight are visualized much more erratically. This distinction can help software engineers concentrate on those parts of the trace that they are particularly interested in.
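One crude proxy for such a signal, offered purely as a simplified stand-in for the dissertation's actual visualization, is the number of distinct methods per fixed-size window of the trace (the flat-list trace representation is an assumption of this sketch):

```java
import java.util.HashSet;
import java.util.List;

public class TraceHeartbeat {

    // For each fixed-size window of the trace (a flat list of method names),
    // count the number of distinct methods. Tightly collaborating, repetitive
    // regions produce low, stable values; loosely related activity produces
    // higher, more erratic values.
    public static int[] signal(List<String> trace, int window) {
        int windows = (trace.size() + window - 1) / window;
        int[] out = new int[windows];
        for (int i = 0; i < windows; i++) {
            int from = i * window;
            int to = Math.min(from + window, trace.size());
            out[i] = new HashSet<>(trace.subList(from, to)).size();
        }
        return out;
    }
}
```

Plotting such a per-window value over the length of the trace yields the regular-versus-erratic distinction described above.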

We expand upon this visualization in Part III of this dissertation and show a variation of this approach that we used in the industrial application that is discussed in Part IV.

1.7.3 Research contributions

The major research contributions of this thesis are:

• A technique to identify key classes during early program comprehension phases [Zaidman et al., 2005, Zaidman et al., 2006b].

• A technique to visualize execution traces and identify similar parts in the execution [Zaidman and Demeyer, 2004].

• A large-scale industrial case study to evaluate the scalability of the aforementioned techniques [Zaidman et al., 2006a].

1.8 Academic context

The research presented in this dissertation has been performed within the Lab On ReEngineering (LORE) research group, part of the University of Antwerp. In a broader sense, this research has been carried out in the context of the ARRIBA project. ARRIBA is short for Architectural Resources for the Restructuring and Integration of Business Applications and its aim is


to provide a methodology and tools to support the integration of business applications that have not necessarily been designed to coexist¹.

The ARRIBA project team consists of researchers from the Free University of Brussels (Vrije Universiteit Brussel, VUB), Ghent University and the University of Antwerp. Furthermore, a number of companies are involved in ARRIBA. These industrial partners are (1) responsible for checking whether the research that is carried out by the academic partners is relevant in an industrial context and (2) able to provide case studies to the academic partners in order to validate the research prototypes. Although the group of companies has changed during the duration of the ARRIBA project (2002 – 2006), the following companies form the backbone of the industrial committee:

• Inno.com: an ICT expertise center dedicated to advising and assisting its clients and partners in coping with their most challenging technology and business issues (www.inno.com).

• Banksys: manages the Belgian network for debit card transactions (www.banksys.be).

• Anubex: an expert in application modernisation through software conversion and application migration (www.anubex.com).

• Christelijke Mutualiteit: a Belgian social security provider (www.cm.be).

• KAVA: a non-profit organization grouping over a thousand Flemish pharmacists (www.kava.be).

• KBC: a banking and insurance company (www.kbc.be).

• Toyota Motor Europe: the European branch of the Toyota Motor company (www.toyota.be).

The ARRIBA project is sponsored by the IWT Flanders² within the 2002 call of the GBOU program.

1.9 Organization of this dissertation

The organization of this dissertation is as follows.

¹ For more information about this project, please visit: http://arriba.vub.ac.be
² Institute for the Promotion of Innovation by Science and Technology in Flanders. For more information, see: http://www.iwt.be


Chapter 2 gives a view on program understanding and introduces a number of theories and models pertaining to program comprehension. We also position our research within the existing program comprehension frameworks. Chapter 3 discusses the advantages and disadvantages of dynamic analysis. We provide an overview of techniques that enable dynamic analysis and discuss the ones that we have used during our research in some more detail.

Part II of this dissertation deals with a program comprehension solution that is based on coupling. In Chapter 4 we introduce some of the concepts concerning coupling and we relate coupling to program comprehension. Chapter 4 also introduces the dynamic coupling framework we use for our research, while in Chapter 5 we explain why it is important to also take indirect coupling into account for the purpose of program comprehension. Here we also explain how webmining can help us take this indirect coupling into account. Chapter 6 introduces the experimental setup we use to validate our hypothesis and presents the results we have obtained from applying our techniques to a set of open source case studies. Chapter 7 then describes a control experiment in which we compare the results we have obtained through dynamic analysis with results from a similar experiment carried out with static analysis.

Part III deals with frequency analysis. Chapter 8 describes our experiences with retrieving clues that can speed up the program comprehension process by taking into account the relative frequency of execution of methods or procedures.

Part IV deals with our industrial experiences regarding the techniques we have developed. As such, Chapter 9 showcases our experiences with applying both the coupling based and the frequency based program comprehension solutions in an industrial legacy C context. We present the results we have obtained and also discuss a number of typical pitfalls that occur when applying dynamic analysis in a legacy context.

Chapter 10 then gives an overview of related work and Chapter 11 concludes this dissertation with a discussion of our contributions and some pointers to future work.


Chapter 2

Program comprehension

All truths are easy to understand once they are discovered; the point is to discover them.

— Galileo Galilei

This chapter tries to capture what program comprehension is. We provide a definition and determine in which circumstances a software engineer needs to understand a program. Furthermore, we discuss a number of program comprehension models and the factors that can influence the choice of program comprehension model.

2.1 What is program comprehension?

When a person starts to build up an understanding of a previously unknown software system or a portion thereof, he/she must create an informal, human oriented expression of computational intent. The creation of this expression happens through a process of analysis, experimentation, guessing and puzzle-like assembly [Biggerstaff et al., 1993].

When it comes to a definition of what program comprehension means, we adhere to the definition introduced by Biggerstaff et al. [Biggerstaff et al., 1993]:

“A person understands a program when able to explain the program, its structure, its behavior, its effects on its operation context, and its relationships to its application domain in terms that are qualitatively different from the tokens used to construct the source code of the program.”


As such, from this definition we learn that the program comprehension process is closely related to the concept assignment problem: the problem of discovering individual human oriented concepts and assigning them to their implementation oriented counterparts for a given program [Biggerstaff et al., 1993]. From this point of view, it becomes clear that the comprehension process is a highly individual one, where results can vary from person to person while still amounting to the same understanding of the software system.

In this chapter we will first try to explain why program understanding is necessary in the field of reengineering as a whole (Section 2.2), after which we will discuss a number of program comprehension theories in Section 2.3.

2.2 Program understanding as a prerequisite

Program understanding is a necessary prerequisite to many software engineering activities. Von Mayrhauser and Vans have compiled software maintenance scenarios in which program comprehension is a necessary prerequisite [von Mayrhauser and Vans, 1995]. Table 2.1 provides an overview of these maintenance activities.

From Table 2.1 it becomes clear that most day-to-day software maintenance activities require a certain level of insight into the application to be maintained. Since almost all software evolution activities require an understanding of the software system, the link between software evolution and program understanding is very clear.

Furthermore, knowing that most reengineering operations require a program comprehension phase and that up to 60% of a software engineer's time can be spent in this phase (see Section 1.3) [Spinellis, 2003, Wilde, 1994, Corbi, 1990], improving the efficiency of this phase can mean a significant overall efficiency gain.

2.3 Program comprehension models

In the introduction we have already mentioned that program understanding is a highly individual activity. A number of factors influence how a software engineer goes about his/her program understanding process, i.e. which strategy he/she will follow. Some of these (sometimes very subjective) factors are [von Mayrhauser and Vans, 1995]:

• the level of experience of the software engineer
• the level of familiarity with the problem domain
• the level of familiarity with the solution space
• the complexity of the software system's structure
• the amount of time available

Maintenance tasks   Activities
-----------------   ----------
Adaptive            Understand system
                    Define adaptation requirements
                    Develop preliminary and detailed adaptation design
                    Code changes
                    Debug
                    Regression tests
Perfective          Understand system
                    Diagnosis and requirements definition for improvements
                    Develop preliminary and detailed perfective design
                    Code changes/additions
                    Debug
                    Regression tests
Corrective          Understand system
                    Generate/evaluate hypotheses concerning problem
                    Repair code
                    Regression tests
Reuse               Understand problem, find solution based on close fit
                    with reusable components
                    Locate components
                    Integrate components
Code leverage       Understand problem, find solution based on
                    predefined components
                    Reconfigure solution to increase likelihood of
                    using predefined components
                    Obtain and modify predefined components
                    Integrate modified components

Table 2.1: Tasks and activities requiring code understanding.

Studies that lie on the border between psychology and computer science have shown that many strategies exist for the program comprehension process. These strategies can roughly be divided into three models of program comprehension, namely: the top-down model, the bottom-up model, or a mix of the previous two, the so-called integrated model [von Mayrhauser and Vans, 1995]. The next few sections will discuss each of these models.

2.3.1 Top-down program comprehension models

Top-down understanding typically applies when the code, problem domain and/or solution space are familiar to the software engineer [von Mayrhauser


and Vans, 1995]. This stems from the idea that when a software engineer has already mastered code that performed the same or similar tasks, the structure of the code will have parallels. These similarities in code structure are easier to recognize in a top-down understanding process.

When a software engineer follows a top-down program understanding strategy, he/she usually already has one or more hypotheses about the structure of the system. These hypotheses can come from previous experiences with software in the same domain, using similar technologies, etc., or from beacons in the software's code, documentation or other artifacts. In program comprehension terminology a beacon is a cue that indexes into knowledge, e.g. triggering a memory of a previously seen construct and associating it with the current solution. In software engineering terminology a good example of such a beacon is a design pattern [Gamma et al., 1995], e.g. an MVC (Model View Controller) pattern, which would give an indication as to how the GUI layer is structured.

When using this top-down program comprehension strategy, a mental model is built throughout the process that successively refines hypotheses and auxiliary hypotheses about the software system. Hypotheses are iteratively refined, passing through several levels, until they can be matched to specific code in the program (or a related document, e.g. a configuration file) [von Mayrhauser and Vans, 1995].

2.3.2 Bottom-up program comprehension models

When the code and/or problem domain are not familiar to the software engineer, bottom-up understanding is frequently chosen. This section describes the models that are used in this situation.

Program model Pennington [Pennington, 1987] found that when code is completely new to programmers, the first mental representation they build is a control-flow program abstraction called the program model. This representation, built from the bottom up via beacons, identifies elementary blocks of code in the program. The program model is created via the chunking of microstructures into macrostructures and via cross-referencing. Chunking is about creating larger entities from small blocks to reason with, while cross-referencing relates these larger entities to a higher level of abstraction. As an example, all the classes that work together to create a linked list can be chunked together, while actually designating the result as a “linked list” and understanding its purpose (i.e. being a container structure) cross-references it to a higher level of abstraction.


Situation model A second model that Pennington identified is the situation model [Pennington, 1987]. This model also operates in a bottom-up fashion and creates a dataflow/functional abstraction. The application of this model requires knowledge of the real-world domains that are present in the software system. An example of this type of bottom-up comprehension is relating clothesInventory = clothesInventory - itemsSold to “reducing the inventory of clothes by the number of items sold”. Again, lower order situation knowledge can be chunked into higher order situation knowledge. The situation model is complete once the program goal is reached.

2.3.3 Integrated model

An integrated model for code comprehension involves the top-down strategy, the bottom-up strategy (both the program and the situation model) and a knowledge base. The knowledge base, which typically is the human mind, stores (1) any new information that is obtained directly from the application of either of the two program comprehension strategies or (2) information that is inferred.

Intuitively, one would expect that in practice the integrated model is most commonly used when trying to understand large-scale systems. The reasoning behind this is that certain parts of the code may be familiar to the software engineer because of previous experiences, while other parts of the code may be completely new. Experiments by Von Mayrhauser confirm this intuition [von Mayrhauser and Vans, 1994].

2.4 On the influence of comprehension tools

Storey et al. describe an experiment in which they study the behavior of 30 participants using program comprehension tools [Storey et al., 2000]. More precisely, they observe the factors that influence a participant's choice of program comprehension strategy.

Their conclusion was that, ideally, the tools supporting the program understanding process should facilitate the programmer's preferred strategy or strategies, rather than enforce the use of a fixed strategy [Storey et al., 2000]. When an approach missed features needed to optimally use a strategy, participants often switched to another strategy, hindering the comprehension process. Being able to seamlessly switch between strategies was seen as a bonus.

Based on these observations, we will pay close attention to ensuring that the techniques we propose do not force the user into a specific program comprehension strategy.


Chapter 3

Dynamic Analysis

It requires a very unusual mind to undertake the analysis of the obvious.

— Alfred North Whitehead

Choosing a basic means to reach a goal usually implies that this means contributes some interesting properties toward that goal. The means we propose is dynamic analysis. As such, we advocate its use in the light of object-oriented software, where polymorphism and late binding make a program hard to understand statically. Another benefit of using dynamic analysis is that it enables an as-needed strategy during program comprehension. We also provide an overview of a number of technologies that enable dynamic analysis.

3.1 Definition

In software engineering, dynamic analysis is the investigation of the properties of a software system using information gathered from the system as it runs.

We propose to use the above definition of dynamic analysis. It purposely remains quite vague so as not to put a bias on the type of dynamic information that is collected or the kind of dynamic analysis that is executed. In other words, it remains sufficiently broad so that it can be used for program comprehension purposes, to collect design or performance metrics, etc.


Opposite to dynamic analysis stands the concept of static analysis, which collects its information from artifacts such as the source code, design documents, configuration files, etc. in order to investigate the system's properties.

Again, we remain quite vague about which properties we want to investigate, as in the most general sense these can, for example, be structural, behavioral, quality or performance oriented.

Enabling dynamic analysis in a software (re)engineering context requires the generation of an execution trace of the software system under study; the execution trace is the structure in which the gathered information is stored. To obtain this execution trace, one needs to execute the software system according to a well-defined execution scenario, i.e. an instance of one or several use cases [Jacobson, 1995].

3.2 Why dynamic analysis?

The choice of dynamic analysis for this research is inspired by two factors: firstly, dynamic analysis allows a very goal-oriented approach, meaning that we only have to analyze those parts of the application that we are really interested in; secondly, dynamic analysis is much more precise at handling polymorphism, which is abundantly present in modern object-oriented software systems.

Within this section, we will briefly discuss both factors.

3.2.1 Goal oriented strategy

Dynamic analysis allows one to follow a very goal-oriented (or as-needed) strategy when dealing with unknown software systems. When only end-user knowledge of the system is available, it becomes relatively easy to exercise only those scenarios from the use cases that pertain directly to the knowledge that one wants to gather. This results in a smaller, more to-the-point execution trace and, as a consequence, it can also lead to better analysis results.

Example. Suppose one wants to build up knowledge of how a word processor like Microsoft Word functions internally when changing the properties of a selected piece of text. When using dynamic analysis for this, one could execute only those use cases that directly involve the selection of text and the subsequent property change of that text, e.g. putting the text in bold. If one were to use a less goal-oriented strategy, e.g. a very broadly defined execution scenario or static analysis, one would need to understand


a lot more of the application before knowing exactly which parts are related to the functionality one is trying to understand.

3.2.2 Polymorphism

Polymorphism is the ability of objects belonging to different classes of the same class hierarchy to respond to calls of methods with the same name in a type-specific way, i.e. with different behavior. Furthermore, the programmer does not have to know the exact class of the object in advance, so the class and its resulting behavior can be decided at run time. This gives rise to the notion of late binding: deciding at run time which behavior will be executed for a certain object.

The mechanism of polymorphism allows programmers to write programs more efficiently. Furthermore, it should also allow software to evolve more easily. For program comprehension purposes, however, polymorphism can complicate matters, as it becomes difficult to grasp the precise behavior of the application without witnessing the software system in action. This is because one possibly polymorphic call is a variation point that can give rise to a great number of different behaviors (the number of possible behaviors is equal to the number of classes present in the class hierarchy below the statically defined class type, plus one). To illustrate this, we know of a class hierarchy in Ant, a Java build tool, where the class Task has more than 100 child classes, each portraying a specific command-line task that can possibly be executed.
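The variation point can be illustrated with a minimal Java sketch. The task classes below are hypothetical stand-ins loosely echoing Ant's Task hierarchy, not Ant's actual classes:

```java
public class LateBindingDemo {

    // Hypothetical mini-hierarchy; illustrative only.
    public static abstract class Task {
        public abstract String execute();
    }

    public static class CopyTask extends Task {
        public String execute() { return "copying files"; }
    }

    public static class JavacTask extends Task {
        public String execute() { return "compiling sources"; }
    }

    // Statically, this call site only knows the type Task: which execute()
    // body runs is decided at run time (late binding). A static analysis must
    // consider every subclass; a trace shows only the behaviors actually
    // exercised by the scenario.
    public static String run(Task task) {
        return task.execute();
    }
}
```

Each concrete subclass passed to `run` yields different behavior through the same statically typed call.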

In contrast, when looking at a software system with the help of dynamic analysis, the view obtained from the software is precise with regard to the execution scenario. The behavior that is called upon is specific to the functionality exercised and, as such, the number of possible variations is reduced to the point that all remaining variations are actually executed (and not merely theoretically possible).

3.3 Modalities of dynamic analysis

In this section we give an example of an execution trace and briefly discuss a number of technologies that enable the extraction of an execution trace from a running software system. Furthermore, we discuss some details of the implementations we used during our experiments.


3.3.1 Example execution trace

Figure 3.1 shows an example of a small piece of trace obtained from running JHotDraw¹, a small paint-like application written in Java.

CALL org.jhotdraw.application.DrawApplication::<clinit> ( ()V )
EXIT org.jhotdraw.application.DrawApplication::<clinit> ( ()V )
CALL org.jhotdraw.samples.javadraw.JavaDrawApp::<clinit> ( ()V )
EXIT org.jhotdraw.samples.javadraw.JavaDrawApp::<clinit> ( ()V )
CALL org.jhotdraw.samples.javadraw.JavaDrawApp::main ( ([String;)V )
CALL org.jhotdraw.samples.javadraw.JavaDrawApp::<init> ( ()V )
CALL org.jhotdraw.contrib.MDI_DrawApplication::<init> ( (String;)V )
CALL org.jhotdraw.application.DrawApplication::<init> ( (String;)V )
...

This fragment of an execution trace contains all non-library methods that are invoked during a typical run of JHotDraw. To be more precise, each call to and return from a method is recorded, which allows us to retrieve all calling relations and the nesting depth of calls. Consider, for example, the last entry in the execution trace above: we record the originating package (e.g. org.jhotdraw.application), the class name (e.g. DrawApplication), the method name (e.g. <init>, which stands for the constructor) and its parameters and return type (e.g. parameter String and return type void).

Figure 3.1: Example execution trace
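Recovering the parts of such a trace entry mechanically is straightforward. The following parser is an illustrative sketch: the line format is taken from Figure 3.1, but the `TraceLineParser` helper itself is ours, not part of the dissertation's tooling:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceLineParser {

    // Matches lines of the form:
    //   CALL org.jhotdraw.samples.javadraw.JavaDrawApp::main ( ([String;)V )
    private static final Pattern LINE =
            Pattern.compile("(CALL|EXIT) (\\S+)::(\\S+) \\( (\\S+) \\)");

    // Returns {event, package, class, method, signature}, or null on no match.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;
        String qualified = m.group(2);
        int dot = qualified.lastIndexOf('.');
        String pkg = dot >= 0 ? qualified.substring(0, dot) : "";
        String cls = qualified.substring(dot + 1);
        return new String[] { m.group(1), pkg, cls, m.group(3), m.group(4) };
    }
}
```

From a stream of such parsed entries, calling relations and nesting depth can be reconstructed by pairing CALL and EXIT events.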

3.3.2 Trace extraction technologies

Profiler or debugger based tracing. A profiler is typically used to investigate the performance or memory requirements of a software system. A debugger, on the other hand, is frequently used to step through a software system at a very fine-grained level in order to uncover the reasons for unanticipated behavior.

Typically, the profiler and debugger infrastructures of virtual machines or other environments (sometimes even the operating system itself) send out events at certain stages of the execution. One can then write a plugin for the virtual machine or the environment in order to capture these events and act upon them, e.g. store them in an execution trace. Typical events that can be caught with a profiler or debugger are the invocation of a method/procedure, the return from a method/procedure, access to variables, fields, etc.

1For more information, see: http://www.jhotdraw.org/


Aspect-oriented based tracing. Aspect-oriented programming (AOP) introduces a new program entity, called an aspect [Kiczales et al., 1997]. An aspect can be used to isolate a so-called cross-cutting concern, i.e. a concern that is present in many classes or modules and does not strictly belong to any of the classes or modules concerned. The code that is responsible for such a concern can be captured in the advice part of the aspect, while the pointcut part of the aspect specifies where to insert that particular piece of code.

As such, AOP allows one to insert a piece of code at the beginning or at the end of a method/procedure. This makes it possible to write a so-called tracing aspect: an aspect that generates an entry in the execution trace every time a method call or a method return takes place.

AST rewriting based tracing. When parsing the source code of a software system, alterations can be made to the abstract syntax tree (AST) before outputting the AST again as normal source code. To our knowledge, no standard approach exists for this kind of AST rewriting. An example of such an approach is the work of Akers [Akers, 2005]. Note that some AOP implementations work in a very similar way, with the aspect weaver built on top of an AST rewriting mechanism [Zaidman et al., 2006a].

Method wrapper based tracing. Method wrappers allow one to intercept and augment the behavior of existing methods by wrapping new behavior around them [Brant et al., 1998, Greevy and Ducasse, 2005]. In the present context, this new behavior can be tracing functionality.
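In Java, one way to build such a wrapper without modifying the original code is a dynamic proxy that records a CALL/EXIT pair around every invocation before delegating to the wrapped object. The sketch below is purely illustrative; the Figure interface and Rectangle class are hypothetical stand-ins, not code from the thesis or from JHotDraw:

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class TracingWrapper {
    // A hypothetical application interface, loosely echoing JHotDraw's figures.
    interface Figure { void draw(); }
    static class Rectangle implements Figure { public void draw() { /* paint */ } }

    // Wraps 'target' behind interface 'iface'; every call is recorded before
    // and after delegating, yielding CALL/EXIT pairs like those in Figure 3.1.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, List<String> trace) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface },
            (proxy, method, args) -> {
                String name = target.getClass().getName() + "::" + method.getName();
                trace.add("CALL " + name);
                try {
                    return method.invoke(target, args);
                } finally {
                    trace.add("EXIT " + name);
                }
            });
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Figure f = wrap(new Rectangle(), Figure.class, trace);
        f.draw();
        trace.forEach(System.out::println);
    }
}
```

The wrapped object is used exactly like the original one; only interface methods are intercepted, which is one of the limitations of this particular wrapping mechanism.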

Ad-hoc based tracing. The mechanisms mentioned above all have a structured way of going about the tracing operation. Sometimes, however, when only a very limited number of points of interest exists within a software system, manual or script-based instrumentation can be a (short-term) solution.

3.3.3 Implementation

For the experiments described in subsequent chapters we used two different trace extraction technologies: a profiler-based solution for the Java experiments and an aspect-based solution for the industrial experiment we carried out in a legacy C context. We now give a brief overview of the technologies we used for extracting execution traces from these Java and C systems.


Java For our Java experiments we chose to use the Java Virtual Machine Profiler Interface, or JVMPI for short2. This interface allows one to write a plugin in C/C++ that connects with the Java Virtual Machine (JVM). The JVMPI sends out events at moments such as a method entry, a method exit, an activation of the garbage collector, etc. The plugin can then be programmed to capture these events and handle them. In our case, we programmed a plugin that captured each method entry and exit and wrote this information to a trace file, similar to the one shown in Figure 3.1.

The JVMPI was introduced with the release of Java 1.1 and has always been labeled an experimental technology, which, in a sense, it has remained up until its successor, the JVMTI3, was introduced with the release of Java 1.5.

A definite drawback of the JVMPI is the fact that it becomes unstable when used in combination with the HotSpot technology introduced in Java 1.3. HotSpot JVMs use just-in-time compilation to improve performance. This instability manifested itself through events that were never emitted by the JVMPI and are thus missing from the resulting trace. To overcome this problem we explicitly switched off the HotSpot technology when performing tracing operations. This resulted in lower performance, but also in a more stable virtual machine and thus in better quality traces.

C For our experiments with software written in C, we made use of an aspect-oriented solution. We used Aspicere, a tool built by members of the ARRIBA team at Ghent University, to instrument the application under study and generate an execution trace. Chapter 9 gives more details about our choice for Aspicere.

3.4 The observer effect

In many scientific disciplines, the observer effect refers to changes that the act of observing has on the phenomenon being observed. A classic example comes from quantum physics, where the observation of an electron will change its path because the observing light or radiation carries enough energy to disturb the electron under study. A similar effect has been reported in the social sciences, where people will start

2More information on this technology can be found at: http://java.sun.com/developer/technicalArticles/Programming/jvmpitransition/

3Java Virtual Machine Tool Interface


to behave differently when being observed. This effect is called the Hawthorne effect.

In the field of software engineering, a similar effect has been observed, namely the probe effect [Andrews, 1998]. In the context of using dynamic analysis, this effect can manifest itself in different ways:

• Because the software system being traced is less responsive when executing it according to the pre-defined execution scenario, the user is likely to respond to this unresponsiveness by clicking on a button multiple times. As such, the actually executed scenario can diverge from the pre-defined execution scenario.

• A second, possibly more serious, threat is the influence of the tracing operation on thread interactions that happen within the program being traced.

As such, it is advisable to generate as little overhead as possible when extracting the trace from a running software system, in order to minimize the observer effect. A first step towards minimizing the overhead is post-mortem analysis of the trace, i.e. analyzing the trace after the program (and its accompanying tracing operation) has finished, as opposed to online analysis.

3.5 Threats to using dynamic analysis

When performing dynamic analysis, one wants to generate a high-quality execution trace of the executed scenario. High-quality meaning that the trace we obtain is an actual reflection of what happened during the execution of the software. A number of situations, however, are typically problematic when performing dynamic analysis. In this section we briefly discuss these and indicate which precautions we have taken to minimize their effects.

• In many software systems, many threads interact with each other to realize the functionality of the system. These threads can interact in parallel (when more than one processor is available) or in sequence (when only one processor is available). Simply storing all the actions of all threads in one execution trace can lead to a confusing image, because the execution trace would suggest that two methods were executed sequentially, whereas they were actually executed by two completely different threads. To overcome this situation, a separate execution trace should be made for each thread that is active during the execution of the scenario.

• More and more systems make use of classloading functionality or reflection mechanisms to load classes dynamically. When using a profiler or debugger based dynamic analysis solution, this often leads to a situation where the resulting execution trace contains entries from the classloader or reflection mechanism whenever calls to methods from the loaded class are made. Currently, we have not taken any countermeasures to prevent this from happening. A possible consequence is that some method calls are not correctly recorded, meaning that the execution trace contains an entry for a call to the reflection mechanism or the classloader, but not for the actual method that is executed through these mechanisms. To the best of our knowledge, the execution scenarios we used for our experiments did not contain any of these situations.
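The per-thread splitting of the trace suggested above can be sketched as follows, assuming a hypothetical raw format in which each entry is tagged with the name of the thread that produced it (the format and names are our own illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThreadSplitter {
    // Splits a single interleaved trace into one trace per thread.
    // Each raw entry is assumed to look like "thread-name\tCALL pkg.Class::method".
    static Map<String, List<String>> splitByThread(List<String> rawTrace) {
        Map<String, List<String>> perThread = new LinkedHashMap<>();
        for (String entry : rawTrace) {
            int tab = entry.indexOf('\t');
            String thread = entry.substring(0, tab);
            String event = entry.substring(tab + 1);
            perThread.computeIfAbsent(thread, t -> new ArrayList<>()).add(event);
        }
        return perThread;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "main\tCALL A::run", "worker\tCALL B::load",
            "main\tEXIT A::run", "worker\tEXIT B::load");
        System.out.println(splitByThread(raw));
    }
}
```

Each resulting per-thread trace then has properly matched CALL/EXIT pairs, so nesting depth can be computed per thread without interleaving artifacts.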

3.6 Strengths and weaknesses

Table 3.1 gives an overview of the strengths and weaknesses of using dynamic analysis for program comprehension purposes.

                      Strength    Weakness
  Polymorphism           √
  Goal-oriented          √
  Overhead                           √
  Observer effect                    √

Table 3.1: Strengths and weaknesses of dynamic analysis


Part II

Coupling based solutions for program comprehension


Chapter 4

How coupling and program comprehension interact

To manage a system effectively, you might focus on the interactions of the parts rather than their behavior taken separately.

—Russell L. Ackoff

It has been observed that software engineers who are trying to become familiar with a software system follow the structural dependencies present in that system to navigate through it. Coupling is a direct consequence of these structural relationships. This chapter describes how runtime coupling measures provide an indication of the classes that need to be understood early on in the program comprehension process.

4.1 Introduction

Program comprehension is an inherently human activity; as such, intuition and a dose of luck are essential ingredients to complete this mission successfully. Recent empirical studies have shown that when effective developers have to identify the high-level concepts associated with the task at hand in the source code, they have a tendency to follow structural dependencies [Robillard et al., 2004]. Novice developers, however, working on an unfamiliar system, may easily get stuck in irrelevant code and fail to notice important program functionality, leading to low-quality software modifications [Robillard et al., 2004] or non-optimal time-management.

29


Within this research track it is our goal to provide the end-user with a number of starting points, from which they can start following structural dependencies in order to familiarize themselves with the system under study. The basic means by which we identify these starting points is coupling.

4.2 Coupling

Coupling was introduced by Constantine et al. in the 1960s as a heuristic for designing modules [Yourdon and Constantine, 1979]. Constantine's original definition of coupling is "a measure of the strength of interconnection between modules". As this definition is rather informal, we will use Wand's definition to describe the basic concept of coupling [Wand and Weber, 1990, Chidamber and Kemerer, 1994].

Two things are coupled if and only if at least one of them "acts upon" the other. X is said to act upon Y if the history of Y is affected by X, where history is defined as the chronologically ordered states that a thing traverses in time.

Software systems are typically composed of several software entities, be they modules, classes, components, aspects, etc. These entities work together to reach their goal(s), and the collaborations that exist between these entities give rise to the notion of coupling. From this observation comes Stevens' definition of coupling: "the measure of strength of association established by a connection from one module to another" [Stevens et al., 1974].

In the light of this definition, a higher level of coupling means that the modules concerned are more inter-related and, as such, more difficult to understand, change, reuse and correct. From this empirical observation stems the basic principle of pursuing low coupling levels within a software system [Selby and Basili, 1991]. Intuitively, however, coupling will always exist within software systems, as modules or classes need to work together to reach their goals and ultimately deliver the desired end-user functionality [Lethbridge and Anquetil, 1998]. This observation, together with the definition postulated by Wand [Wand and Weber, 1990], means that we can categorize classes that have a relatively high degree of coupling as influential: influential, because they have a certain amount of control over what the application is doing and how it is doing it.

In her research on design flaws, Tahvildari uses a similar concept, called key classes [Tahvildari, 2003]:


“These key classes are described as the classes that implement the key concepts of a system. Usually, these most important concepts of a system are implemented by very few key classes, which can be characterized by a number of properties. These classes which we called key classes manage a large amount of other classes or use them in order to implement their functionality. The key classes are tightly coupled with other parts of the system. Additionally, they tend to be rather complex, since they implement much of the legacy system’s functionality.”

4.3 Dynamic coupling

4.3.1 Introduction

Coupling measures have been a subject of research for some time now, e.g. in the context of quality measurements. These measures have mostly been determined statically, i.e. based upon structural properties of the source code (or models thereof). However, with the widespread use of object-oriented programming languages, these static coupling measures lose precision as more intensive use of inheritance and dynamic binding occurs. Another factor that can perturb the measurements is the presence of dead code, which can be difficult to detect statically in the presence of polymorphism.

This has led us to look at dynamic coupling measures, a branch of software engineering research that has only recently started to develop [Arisholm et al., 2004]. We propose the following working definition for dynamic coupling measures:

Dynamic coupling measures are defined based upon an analysis of interactions of runtime objects. We say that two objects are dynamically coupled when one object acts upon the other. Object x is said to act upon object y when there is evidence in the execution trace of a calling relationship between objects x and y, originating from x. Furthermore, two classes are dynamically coupled if there is at least one instance of each class for which it holds that they are dynamically coupled.

The basic framework we use when considering dynamic coupling measureswas first introduced by Arisholm et al. [Arisholm et al., 2004].


4.3.2 Classification of dynamic coupling measures

Dynamic coupling can be measured in different ways. Each of the measures can be justified, depending on the application context in which such measures are to be used [Arisholm et al., 2004]. Table 4.1 is based on the variations that Arisholm et al. have defined; each of the variations is also discussed in this section.

Entity    Granularity (Aggregation Level)    Scope (Include/Exclude)    Direction

Object    Object                             Library objects            Import/Export
          Class                              Framework objects
          (set of) Scenario(s)               Exceptional use cases
          (set of) Use case(s)               ...
          System

Class     Class                              Library classes            Import/Export
          Inheritance Hierarchy              Framework classes
          (set of) Subsystem(s)              ...
          System

Table 4.1: Dynamic coupling classification.

1. Entity of measurement. Since dynamic coupling is calculated from dynamic data stored in the event trace, we can calculate coupling at the object-level or at the class-level.

2. Granularity. Orthogonal to the entity of measurement, dynamic coupling measures can be aggregated at different levels of granularity. With respect to dynamic object coupling, measurement can be performed at the object level, but can also be aggregated at the class level, i.e. the dynamic coupling of all instances of a class is aggregated. Different kinds of aggregations can be made depending on the entity of measurement. Aggregations that can be made include: at the (sub)system, inheritance hierarchy, use case or scenario level.

3. Scope. Another variation lies in the classes we want to consider when calculating the metric(s). For example, instances of library or framework classes can sometimes be of no special interest and as such can be excluded.

4. Direction (import or export). Consider two classes c and d being coupled by the invocation of a method m2 of d in a method m1 of class c. This relationship can be described as a client-server relationship between the classes: the client class c uses (imports services), the server class d is being used (exports services). This situation gives rise to the concepts of import and export coupling.


4.3.3 Dynamic coupling for program comprehension

Based on the classification schema presented in Section 4.3.2, we now discuss which properties we expect from a coupling metric in order for it to be useful for program comprehension purposes. Based on these properties, we select the dynamic coupling metrics that suit our intentions best.

1. At a cognitive level, a software engineer trying to get a first impression of a piece of software will probably try to comprehend the software at the class-level, as classes are the concepts he/she can recognize in the source code, the documentation and the application domain.

2. As such, we advocate either the use of classes as the level of granularity or a further aggregation up to the (sub)component (or, in other terms, package) level.

3. With regard to the scope, we discard all classes foreign to the actual project (e.g. libraries), as these have no direct influence on the program comprehension process. Furthermore, choosing an execution scenario of the software that involves the features that you are interested in from a program comprehension point of view is essential.

4. In Section 4.2 we already stated that we are looking for classes that have a prominent role within the system's architecture. We expect these classes to give orders to other classes, i.e. tell them what to do and what to give in return. As such, we expect these classes to request the services of (many) other classes, which, in terms of the direction of coupling, translates to import coupling. On the other hand, we expect classes with strong export coupling to be classes that provide services to other classes.

Arisholm et al. defined twelve dynamic coupling metrics; two of these adhere to the criteria we set out, namely working at the class-level and measuring import coupling [Arisholm et al., 2004]. We now discuss these two metrics.

1. Distinct method invocations. This measure counts the number of distinct methods invoked by each method in each object. This information is then aggregated for all the objects of each class. Arisholm et al. call this metric IC_CM (Import Coupling, Class level, Distinct Methods). Calls to methods of the same object or class (cohesion) are excluded.

2. Distinct classes. This measure counts the number of distinct server classes that a method in a given object uses. That information is then aggregated for all the objects of each class. Arisholm et al. call this metric IC_CC (Import Coupling, Class level, Distinct Classes). Calls to methods of the same object or class (cohesion) are excluded.

Consider the formal definitions of IC_CC and IC_CM in Table 4.2.


C       Set of classes in the system.
M       Set of methods in the system.
RMC     RMC ⊆ M × C. Refers to methods being defined in classes.
IV      IV ⊆ M × C × M × C. The set of possible method invocations.

IC_CM(c1)  = #{(m1, c1, m2, c2) | (∃ (m1, c1), (m2, c2) ∈ RMC) ∧ c1 ≠ c2 ∧ (m1, c1, m2, c2) ∈ IV}
IC_CC(c1)  = #{(m1, c1, c2)     | (∃ (m1, c1), (m2, c2) ∈ RMC) ∧ c1 ≠ c2 ∧ (m1, c1, m2, c2) ∈ IV}
IC_CC′(c1) = #{(m2, c1, c2)     | (∃ (m1, c1), (m2, c2) ∈ RMC) ∧ c1 ≠ c2 ∧ (m1, c1, m2, c2) ∈ IV}

Table 4.2: Dynamic coupling measures [Arisholm et al., 2004].

Now reconsider the IC_CC metric. When we are looking for a metric that points to classes that import a lot of services from other classes, we see that IC_CC has a limited range. IC_CC counts the number of (m1, c1, c2) triples. Because the first component of this triple is m1, the maximum metric value is the product of the number of methods defined in c1 and the number of classes c1 interacts with. Because the number of methods defined in c1 plays a vital role in the calculation of this metric, it can become a limiting factor. Furthermore, the metric does not give a true reflection of how many other classes, and in particular how many methods in other classes, are used.

Therefore, we made a variation on the IC_CC metric, called IC_CC′. This variation does not count the number of calling methods, but the number of called methods. In other words, triples of the form (m2, c1, c2) are counted.

A formal definition of IC_CC′ can be found in Table 4.2.

Example Consider a class c1 with a single method that calls 4 distinct methods m1, ..., m4 in a class c2 and 2 methods m5 and m6 in a class c3. Calculating IC_CC and IC_CC′ for c1 yields 2 and 6, respectively. This indicates that IC_CC is targeted more towards finding the number of class-collaborations, while IC_CC′ retrieves the number of method-collaborations.
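The difference between the two metrics can be illustrated with a small computation over recorded invocation tuples. The sketch below is our own illustration (with hypothetical class and method names) that reproduces the values of the example above:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CouplingMetrics {
    // An invocation (m1, c1, m2, c2): method m1 of class c1 calls method m2 of class c2.
    record Invocation(String m1, String c1, String m2, String c2) {}

    // IC_CC(c): number of distinct (m1, c, c2) triples with c != c2.
    static int icCC(List<Invocation> trace, String c) {
        Set<List<String>> triples = new HashSet<>();
        for (Invocation iv : trace)
            if (iv.c1().equals(c) && !iv.c2().equals(c))
                triples.add(List.of(iv.m1(), iv.c1(), iv.c2()));
        return triples.size();
    }

    // IC_CC'(c): number of distinct (m2, c, c2) triples with c != c2.
    static int icCCPrime(List<Invocation> trace, String c) {
        Set<List<String>> triples = new HashSet<>();
        for (Invocation iv : trace)
            if (iv.c1().equals(c) && !iv.c2().equals(c))
                triples.add(List.of(iv.m2(), iv.c1(), iv.c2()));
        return triples.size();
    }

    public static void main(String[] args) {
        // The example from the text: one method in c1 calls m1..m4 in c2 and m5, m6 in c3.
        List<Invocation> trace = List.of(
            new Invocation("m", "c1", "m1", "c2"),
            new Invocation("m", "c1", "m2", "c2"),
            new Invocation("m", "c1", "m3", "c2"),
            new Invocation("m", "c1", "m4", "c2"),
            new Invocation("m", "c1", "m5", "c3"),
            new Invocation("m", "c1", "m6", "c3"));
        System.out.println(icCC(trace, "c1"));      // 2
        System.out.println(icCCPrime(trace, "c1")); // 6
    }
}
```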

In our experiment (see Chapter 6) we will make a thorough comparison of the effectiveness of the three aforementioned metrics.


4.4 Research question

The central research question of this research track is whether there is a clear link between influential classes and the classes that need to be understood during initial program understanding. These need-to-be-understood classes will be designated important, as their comprehension is needed in order to understand other classes and interactions within the software system.

A more abstract description of the research question we are trying to solve is whether it is possible to identify these important classes based solely on the topological structure of the application. In this context, the topological structure is instantiated by the coupling relationships between classes.

A number of subsidiary questions for this research goal are:

1. Which type of coupling measurement is best at mapping the influential modules or classes onto the important modules or classes?

2. Is the simple measurement of a binary coupling relation sufficient to retrieve important classes, or do we need to add a measure that takes indirect coupling into account?

4.5 Research plan

Over the course of this research track we have developed three heuristic techniques to identify important classes (or modules) in a system.

1. Dynamic coupling measures as indicators of classes requesting a significant amount of actions to be performed for them (see previous sections).

2. Webmining algorithms that allow us to take indirect coupling into consideration (Chapter 5).

3. Static coupling measures as an alternative to their dynamic counterparts (Chapter 7).

4.6 Validation and evaluation

Each of the three techniques will be evaluated intrinsically against two case studies. The evaluation is done according to three evaluation criteria, namely:

1. The recall of the resultset, or in other words, the technique's retrieval power.

2. The precision of the resultset, or in other words, the technique's retrieval quality.

3. The effort it takes to perform the complete analysis, from start to finish.


Validation of each of the techniques is done in the same way, with recall and precision being the deciding factors. For each type of analysis, we also perform an effort analysis which, although secondary to the primary criteria of recall and precision, can be a deciding factor when it comes to determining the return on time-investment.

4.7 Practical application

In general, we can describe the process to be followed by a software (re)engineer using our heuristics as follows:

1. Define the execution scenario.
2. Trace the application according to the chosen execution scenario.
3. Determine the most important classes using one of the proposed heuristics.
4. Interpret the results.


Chapter 5

Webmining

Keep on the lookout for novel ideas that others have used successfully. Your idea has to be original only in its adaptation to the problem you're working on.

—Thomas Edison

Webmining, a form of datamining, is a mining technique that uses only the topological structure of a graph to determine which nodes are important within that graph. We rely on these webmining techniques to add the notion of indirect coupling to our previously built-up theory on dynamic coupling and program comprehension.

5.1 Indirect coupling

5.1.1 Context and definition

Up until now we have talked about direct coupling. Direct coupling is a relationship between two entities. However, when considering large-scale software systems it is far from inconceivable that more than two entities influence each other. Reconsider the coupling definition from Wand (see Section 4.2) and let X, Y and Z be three entities where (X, Y) and (Y, Z), respectively, are directly coupled, i.e. X acts upon Y and Y acts upon Z. Intuitively, it is easy to understand that X may also (indirectly) act upon Z. Consider the example code in Figure 5.1. In this example, where (X, Y) and (Y, Z) are directly coupled, it is clear that X can act upon Z through the parameter that is passed. In terms of object orientation

37


and polymorphism, it is furthermore possible that not only parameter values,but also a parameter’s dynamic type can be of influence.

class X {
    Y y = new Y();
    void doitX(int param) {
        ...
        y.doitY(param);
        ...
    }
}

class Y {
    Z z = new Z();
    void doitY(int param) {
        ...
        z.doitZ(param);
        ...
    }
}

class Z {
    void doitZ(int param) {
        ...
    }
}

Figure 5.1: Indirect coupling example.

Based upon this observation, we investigate the notion of indirect coupling [Yang et al., 2005]. Briand et al. use the following definition [Briand et al., 1999]:

Direct coupling describes a relation on a set of elements (e.g. a relation "invokes" on the set of all methods of the system, or a relation "uses" on the set of all classes of the system). To account for indirect coupling, we need only use the transitive closure of that relation.
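Under this definition, indirect coupling can be computed as the transitive closure of the direct-coupling relation, e.g. with Warshall's algorithm. The sketch below is illustrative, not the thesis' implementation:

```java
public class TransitiveClosure {
    // direct[i][j] == true iff entity i directly acts upon entity j.
    // Returns the transitive closure: i (in)directly acts upon j.
    static boolean[][] closure(boolean[][] direct) {
        int n = direct.length;
        boolean[][] c = new boolean[n][n];
        for (int i = 0; i < n; i++) c[i] = direct[i].clone();
        // Warshall's algorithm: allow intermediate entity k on the path i -> j.
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (c[i][k] && c[k][j]) c[i][j] = true;
        return c;
    }

    public static void main(String[] args) {
        // X acts upon Y, Y acts upon Z (the situation of Figure 5.1).
        boolean[][] direct = {
            { false, true, false },   // X -> Y
            { false, false, true },   // Y -> Z
            { false, false, false } };
        boolean[][] indirect = closure(direct);
        System.out.println("X indirectly acts upon Z: " + indirect[0][2]); // true
    }
}
```

Note that the plain closure only records that an indirect relation may exist; the webmining approach discussed next additionally weighs how strong such indirect relations are.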

5.1.2 Relevance in program comprehension context

Consider Figure 5.2, in which part of a system is visualized. The nodes in this graph represent classes, the edges indicate calling relationships. Furthermore, each edge is annotated with a weight, indicating the strength of the (coupling) relationship. It becomes immediately clear that the class Task is (strongly import-)coupled to three other classes. By the same observation, the class Main is weakly coupled (to Task).

From a program comprehension point of view, the class Main can still be ofinterest, because it can be influential to what Task does (e.g. the parameters


passed or the dynamic type of the parameters can have an influence). Assuch, by adding the concept of indirect coupling, Main will now benefit fromthe strong level of coupling exhibited by Task.

[Figure: a call graph in which Main calls Task with edge weight 1, and Task calls Element, Dependency and Thread with edge weights 5, 7 and 3.]

Figure 5.2: Indirect coupling example.

We will employ the iterative-recursive nature of the HITS1 webmining algorithm to express the concept of indirect coupling towards our goal of program comprehension.

5.2 The HITS webmining algorithm

5.2.1 Introduction

Webmining, a branch of datamining research, deals with analyzing the structure of the internet (or, to be more specific, the web) [Brin and Page, 1998, Gibson et al., 1998, Kleinberg, 1999]. Typically, webmining algorithms see the internet as a large graph, where each node represents a webpage and each edge represents a hyperlink between two webpages. Using this graph as input, the algorithm allows us to identify so-called hubs and authorities [Kleinberg, 1999]. Intuitively, hubs are pages that refer to other pages containing information, rather than being informative themselves; standard examples include web directories and lists of personal pages. A page is called an authority, on the other hand, if it contains useful information and is referenced by others (e.g. web pages containing definitions, personal information, ...).

Software systems can also be represented by graphs, where classes are nodes and calling relationships between classes are edges (e.g. see Figure

1Hypertext-Induced Topic Search


5.2). Furthermore, there is a "natural" extension of the concepts of hubs and authorities in the context of object-oriented software systems. Classes that exhibit a large amount of import coupling call upon a number of other classes that do the groundwork for them. In order to control these assisting classes, they often contain important control structures. As such, they have a considerable amount of influence on the data and control flow within the application. Conceptually, the classes that have a high level of import coupling are similar to the hubs in web graphs.

Export coupling, on the other hand, is often a sign of very specific functionality that is frequently reused throughout the system. Because of their specificity, such classes are conceptually akin to the authorities in web-graphs.

Because of this conceptual similarity, we found it worthwhile to try and reach our goal of identifying important classes in a system through the HITS webmining algorithm [Kleinberg, 1999].

In the context of this experiment, the calling relationships between the classes are determined dynamically.

5.2.2 HITS algorithm

The HITS algorithm works as follows. Every node i gets two numbers assigned to it; ai denotes the authority of the node, while hi denotes the hubiness. Let i → j denote that there is a link from node i to node j. The recursive relation between authority and hubiness is captured by the following formulas.

hi = Σ_{i→j} aj        (5.1)

aj = Σ_{i→j} hi        (5.2)

The HITS algorithm starts with initializing all h's and a's to 1. In a number of iterations, the values are updated for all nodes, using the previous iteration's values as input for the current iteration. Within each iteration, the h and a values for each node are updated according to the formulas (5.1) and (5.2). If after each update the values are normalized, this process converges to stable sets of authority and hub weights. Proof of the convergence criterion can be found in Appendix A or in [Kleinberg, 1999].

It is also possible to add weights to the edges in the graph. Adding weights to the graph can be interesting to capture the fact that some edges are more important than others. This extension only requires a small modification to the update rules. Let w[i, j] be the weight of the edge from node i to node j. The update rules become:

hi = Σ_{i→j} w[i, j] · aj        (5.3)

aj = Σ_{i→j} w[i, j] · hi        (5.4)
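The update and normalization steps described above can be sketched as follows. This is a minimal illustration, not the tooling used in this dissertation; the graph representation (a list of weighted edges) is an assumption:

```python
from math import sqrt

def hits(edges, iterations=100):
    """Weighted HITS. edges is a list of (i, j, w) tuples, one per link
    i -> j with weight w. Returns (hub, authority) score dictionaries."""
    nodes = {n for i, j, _ in edges for n in (i, j)}
    hub = dict.fromkeys(nodes, 1.0)   # all h's start at 1
    auth = dict.fromkeys(nodes, 1.0)  # all a's start at 1
    for _ in range(iterations):
        new_hub = dict.fromkeys(nodes, 0.0)
        new_auth = dict.fromkeys(nodes, 0.0)
        for i, j, w in edges:
            new_hub[i] += w * auth[j]   # update rule (5.3)
            new_auth[j] += w * hub[i]   # update rule (5.4)
        # normalize after each update so the scores converge
        hn = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        an = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        hub = {n: v / hn for n, v in new_hub.items()}
        auth = {n: v / an for n, v in new_auth.items()}
    return hub, auth
```

For a compacted call graph, the nodes would be classes and w[i, j] the calling relationship strength; high hub scores then point at classes with high import coupling.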

5.2.3 Example

[Figure: a directed graph over the nodes 1, 2, 3, 4 and 5]

Figure 5.3: Example graph

Consider the example graph of Figure 5.3. Table 5.1 shows three iteration steps of the hub and authority scores (represented by tuples (H, A)) for each of the five nodes in the example graph. Even after only 3 iteration steps, it becomes clear that 2 and 3 will be good authorities, as can be seen from their high A scores in Table 5.1. Looking at the H values, 4 and 5 will be good hubs, while 1 will be a less good one.

Nodes:          1      2      3      4      5
Iteration 1   (1,1)  (1,1)  (1,1)  (1,1)  (1,1)
Iteration 2   (2,0)  (1,3)  (0,3)  (2,1)  (2,0)
Iteration 3   (4,0)  (3,6)  (0,5)  (6,2)  (6,0)
Iteration 4    ...    ...    ...    ...    ...

Table 5.1: Example of the iterative nature of the HITS algorithm. Tuples have the form (H, A).
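The iterations of Table 5.1 can be replayed in a few lines of Python. The edge set below is an assumption: it is not legible in this rendering of Figure 5.3, but it is an edge set consistent with the scores in the table. Note that, as in the table, no normalization is applied here:

```python
edges = [(1, 2), (1, 4), (2, 3), (4, 2), (4, 3), (5, 2), (5, 3)]
nodes = range(1, 6)
h = {n: 1 for n in nodes}  # iteration 1: all scores start at 1
a = {n: 1 for n in nodes}
for _ in range(2):  # two update steps -> iterations 2 and 3 of Table 5.1
    # both comprehensions read the previous iteration's h and a
    h, a = (
        {n: sum(a[j] for i, j in edges if i == n) for n in nodes},
        {n: sum(h[i] for i, j in edges if j == n) for n in nodes},
    )
print([(h[n], a[n]) for n in nodes])
# -> [(4, 0), (3, 6), (0, 5), (6, 2), (6, 0)], matching iteration 3
```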


5.3 Practical application

In order to apply the HITS webmining algorithm, we need the conceptual model of a graph. This graph, which we call the compacted call graph [Zaidman et al., 2005], is built up as follows:

• The classes in a system form the nodes, while the calling relationships between classes are indicated by edges.

• The strength of each calling relationship from class A to class B is determined by the number of elements in the set2:

{(mB, A, B) | ∃(mA, A), (mB, B) ∈ RMC ∧ A ≠ B ∧ (mA, A, mB, B) ∈ IV}

Each edge is annotated with the calling relationship strength (see Figure 5.3).

• The HITS webmining algorithm can now be applied on the graph.

On a side note, there is a clear equivalence relationship between building up the compacted call graph and calculating the IC CC′ metric.

IC CC′(A) = Σ_{i→j} w[i, j]        (5.5)

where i is the node that represents class A in the compacted call graph and j ranges over the classes to which instances of A send messages.

2For information about the symbols used, please consult Table 4.2 in the previous chapter.
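Building up the compacted call graph, and its equivalence with IC CC′, can be sketched as follows. The trace representation (a sequence of (mA, A, mB, B) call events) and all class and method names are illustrative assumptions:

```python
from collections import defaultdict

def compacted_call_graph(trace):
    """Build the weighted compacted call graph from a dynamic trace.

    trace: iterable of call events (m_a, a, m_b, b), meaning method m_a of
    class a invokes method m_b of class b. Per the set definition above,
    the weight of edge a -> b is the number of distinct (m_b, a, b) triples."""
    triples = {(m_b, a, b) for m_a, a, m_b, b in trace if a != b}
    weights = defaultdict(int)
    for m_b, a, b in triples:
        weights[(a, b)] += 1
    return dict(weights)

def ic_cc_prime(weights, cls):
    """Equation (5.5): IC_CC'(cls) is the sum of cls's outgoing edge weights."""
    return sum(w for (a, _), w in weights.items() if a == cls)
```

Repeated invocations of the same target method add nothing to the weight, and self-calls are ignored, mirroring the A ≠ B condition in the set definition.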


Chapter 6

Experiment

A thinker sees his own actions as experiments and questions — as attempts to find out something. Success and failure are for him answers above all.

—Friedrich Nietzsche

In the previous chapter we set out the theory behind our analysis that retrieves the key classes that need to be understood early on in the program comprehension process. In this chapter we use two open source software projects as case studies to compare the solutions that we have proposed and to determine how well the technique actually performs.

6.1 Experimental setup

6.1.1 Case studies

We selected two open-source software projects as case studies for the full duration of this research track. When selecting these case studies, we were specifically looking for two properties that would make the software projects particularly well-suited for our program comprehension experiments:

• Their public nature ensures the repeatability of these or similar experiments within the research community.

• The presence of extensive design documentation is very useful for validating program comprehension experiments. Furthermore, the fact that this extensive design documentation is freely available is a further bonus with respect to the guarantee of repeatability.


Ultimately, we chose Apache Ant 1.6.1 and Jakarta JMeter 2.0.1 because they adhere best to the criteria we set out. Although a number of open-source projects would adhere to the above properties, the specific choice for these two projects is also motivated by the fact that both software systems are completely different kinds of applications: Ant is a command-line batch application, while JMeter features a highly interactive graphical user interface.

6.1.2 Execution scenarios

The choice of execution scenario is very important and can influence the resultset. On the other hand, a well-chosen execution scenario can also be an advantage when reverse engineering large software systems: a strict execution scenario that only executes the use cases that the reverse engineer is interested in can help in reducing the resultset. As such, it enables a goal-oriented approach. Within the context of this experiment, the execution scenario is thus a double-edged sword that can help bring precision, but can also make the results less reliable.

The precise execution scenarios which we used for each of the case studies will be discussed in Sections 6.2.3 and 6.3.3.

6.1.3 Program comprehension baseline

The presence of extensive design documentation made it possible to define a baseline for our program comprehension research. This baseline is the set of classes that are marked by the original developers and/or current maintainers as need-to-be-understood before any (re)engineering operation can take place on the project. This baseline however remains an approximation, because it is based on the experienced developer's point of view, and not on the experience of a novice maintainer who is trying to understand the software system.

As such, this baseline enables us to do an intrinsic evaluation of the heuristics. Intrinsic, meaning that we use the developers' and maintainers' opinion to compare with the results we have obtained. Opposed to this intrinsic evaluation stands an extrinsic evaluation, where we would empirically evaluate the effectiveness of the proposed program comprehension techniques [Hamou-Lhadj, 2005a]. At this moment, we regard this extrinsic evaluation as future work.

Applying one of the heuristics we have presented in the previous chapters results in a list of classes ranked by their relative importance as assessed by the heuristic. By default, we only present the 15% most highly ranked classes; the reasoning behind this is as follows:

• From the documentation of both Apache Ant and Jakarta JMeter we have learned that about 10% of the classes of the systems need to be understood before any meaningful change operation can take place. As we are working with a heuristical technique, we took a 5% margin.

• For cognitive reasons, the size of the data presented to the users should be kept to a minimum, so as not to overload the user with information. As such, the resultset should be kept as minimal as possible.

• Empirically, we found that lowering the threshold to the top 20% of classes did not result in an increase in recall. To be more precise, we did not notice any classes mentioned in the documentation showing up in the interval [15%, 20%] [Zaidman et al., 2005].
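Selecting the cut-off can be sketched as a trivial helper; the score dictionary and function name are assumptions:

```python
from math import ceil

def top_ranked(scores, fraction=0.15):
    """Return the top `fraction` of classes, ranked by descending
    heuristic score (by default the top 15%)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:ceil(fraction * len(ranked))]
```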

6.1.4 Validation

As a validation we propose to use the concepts of recall and precision. Each resultset we obtain from applying one of the proposed heuristics will be compared to the baseline that we defined.

Recall is the technique's ability to retrieve all items that are contained in the baseline, while precision measures which proportion of the retrieved items in the resultset is actually relevant. Recall and precision are defined as follows:

Recall (%) = A / (A + B) × 100        (6.1)

Precision (%) = A / (A + C) × 100        (6.2)

A: relevant, retrieved items
B: relevant, non-retrieved items
C: irrelevant, retrieved items
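With the baseline and a resultset represented as sets, equations (6.1) and (6.2) amount to the following minimal sketch:

```python
def recall_precision(baseline, resultset):
    """Recall and precision of a resultset against a baseline,
    per equations (6.1) and (6.2)."""
    a = len(baseline & resultset)   # relevant, retrieved
    b = len(baseline - resultset)   # relevant, not retrieved
    c = len(resultset - baseline)   # irrelevant, retrieved
    recall = 100.0 * a / (a + b) if a + b else 0.0
    precision = 100.0 * a / (a + c) if a + c else 0.0
    return recall, precision
```

For instance, with A = 9, B = 1 and C = 6 this yields the 90% recall and 60% precision reported later for the webmining heuristic on Ant.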

6.1.5 Research plan

In Sections 6.2 and 6.3 we will compare and discuss the results we have obtained from the 4 dynamic approaches to identifying important classes we have introduced in the previous chapters. To be more precise, we will compare IC CM, IC CC, IC CC′ and IC CC′ combined with the webmining approach to the program comprehension baseline we have obtained. In this comparison, recall and precision play a major role and are the deciding factors as to which of the approaches delivers the best results.

Chapter 7 then describes a control experiment in which we compare the dynamic approach that delivered the best results with a number of static variants of our dynamic approach. For this control experiment, we will not only focus on the two primary criteria of recall and precision, but we add a third, although secondary, criterion, namely round-trip-time, i.e. the time needed to perform a complete analysis.

6.1.6 Threats to validity

We identified a number of potential threats to validity:

• In the case of Apache Ant, the design documentation we used1 dates from 2003. Although no major overhauls of the architecture have been reported since then, the fact that the source code and the technical documentation are not perfectly synchronized can be a threat to the validation principle we propose. Other than the fact that one class that was mentioned in the documentation is no longer part of the Ant distribution, there have been no consequences with regard to our experimental setup.

• Comparing static and dynamic analysis poses some threats to the validity of our experimental setup. When considering the 15% most highly ranked classes, the size of this 15% resultset varies according to the size of the inputset, namely the number of classes. In the case of the static process, the size of the inputset equals the total number of defined classes, while in the dynamic process, this equals the number of classes that participate in the execution scenario(s). In most cases, the number of classes participating in an execution scenario will be lower than the total number of classes present in a system. As such, the primary criterion on which to compare the resultsets should be recall, because precision will drop automatically when considering the often larger resultsets of static analysis.

6.2 Apache Ant

6.2.1 Introduction

Apache Ant 1.6.12 is a well-known build tool, mainly used in Java environments. It is a command-line tool, has no GUI and is single-threaded. It has a relatively small footprint, but it does use a lot of external libraries (e.g. the Xerces XML library) and is user-extensible. Ant relies heavily on XML, as the proprietary build files are written entirely in XML.

1http://codefeed.com/tutorial/ant config.html
2For more information, see: http://ant.apache.org


Even though Ant is open-source, it is used both in open-source and industrial settings. Furthermore, it has been integrated in numerous (Java) Integrated Development Environments (IDEs) (e.g. Eclipse, IntelliJ IDEA, ...). A number of extensions to the basic Ant distribution have been written (e.g. GUIs) and there has even been a complete port to the .NET environment (called NAnt).

The source-file distribution of Apache Ant 1.6.1 contains 1216 Java classes. Only 403 of these classes (around 83 KLOC) are Ant-specific, as most of the classes in the distribution belong to general purpose libraries or frameworks, such as Apache ORO (for regular expressions) or Apache Xerces (XML parser).

6.2.2 Architectural overview

With the help of the freely available design documentation3, we will discuss the role that the five classes considered important by the architects play in the execution of a build.xml file:

1. Project: Ant starts in the Main class and immediately creates a Project instance. With the help of subsidiary objects, the Project instance parses the build.xml file. The XML file contains targets and elements.

2. Target: this class acts as a placeholder for the targets specified in the build.xml file. Once parsing finishes, the build model consists of a project, containing multiple targets (at least one, which is the implicit target for top-level events).

3. UnknownElement: all the elements that get parsed are temporarily stored in instances of UnknownElement. During parsing, the UnknownElement objects are stored in a tree-like datastructure in the Target to which they belong. When the parsing phase is over and all dependencies have been determined, the makeObject() method of UnknownElement gets called, which instantiates the right kind of object for the data that was kept in the placeholder UnknownElement object.

4. RuntimeConfigurable: each UnknownElement has a corresponding RuntimeConfigurable, that contains the element's configuration information. The RuntimeConfigurable objects are also stored in trees in the Target object they belong to.

5. Task is the superclass of UnknownElement and is also the base class for all types of tasks that are created by calling the makeObject() method of UnknownElement.

3The design documentation of Ant can be found at: http://codefeed.com/tutorial/ant config.html

We tried to record the relationship between those 5 classes in Figure 6.1. Besides these 5 key classes, the design documentation also mentions five

[Figure: class diagram relating Project, Target, Task, RuntimeConfigurable and UnknownElement]

Figure 6.1: Simplified class diagram of Apache Ant.

other important (helper) classes:

• IntrospectionHelper

• ProjectHelper2

• ProjectHelperImpl

• ElementHandler

• Main

6.2.3 Execution scenario

We chose to let Ant build itself as the execution scenario of choice for our experiment. This scenario involved 127 classes. At first sight this may seem rather low, considering that Ant is built from 403 classes in total. This can be explained by the fact that the Ant architecture contains some very broad (and sometimes deep) inheritance hierarchies. For example, the number of direct subclasses of the class Task is 104. Each of these 104 classes stands for a typical command line task, such as mkdir, cvs, ... As typical execution scenarios do not contain all of these commands (some are even conflicting, e.g. a different versioning system or a different platform), the execution scenario containing 127 classes covers all basic functionality of the Ant system.


The two main reasons why we chose this particular execution scenario are:

• It offers a good balance of features that get exercised; furthermore, it contains all typical build commands, including those for copying files into different directories, generating jar (archive) files, etc.

• Every source file distribution of Ant contains this specific execution scenario, through the build.xml file that is included in the distribution.

6.2.4 Discussion of results

We will now discuss the results we have obtained from applying each of the techniques to the Apache Ant case study. Table 6.1 gives an overview of the aforementioned results.

Columns: 1: IC CM, 2: IC CC, 3: IC CC′, 4: IC CC′ + webmining, 5: Ant docs.

Class                  Columns 1-5
Project                ✓ ✓ ✓
UnknownElement         ✓ ✓ ✓ ✓ ✓
Task                   ✓ ✓ ✓ ✓ ✓
Main                   ✓ ✓
IntrospectionHelper    ✓ ✓ ✓ ✓
ProjectHelper          ✓ ✓ ✓ ✓
RuntimeConfigurable    ✓ ✓ ✓ ✓ ✓
Target                 ✓ ✓ ✓ ✓ ✓
ElementHandler         ✓ ✓ ✓
TaskContainer          N/A

→ recall (%)           40   70   70   90   -
→ precision (%)        27   47   47   60   -

Table 6.1: Ant metric data overview.

The IC CM metric for a class c1, which counts quadruples of the form (m1, c1, m2, c2), exhibits the lowest recall of all dynamic analysis solutions: 40%. The IC CM metric counts distinct method invocations originating from the same source (m1, c1) combination. As such, a class c1 using low-level functionality from c2 in each of its methods mi will get a high metric value. This causes noise in the resultset, because we are actually looking for classes that use other (high-level) classes. This explains its relatively low recall when compared to the baseline.

The IC CC and IC CC′ metrics, which count (m1, c1, c2) and (m2, c1, c2) respectively, exhibit a similar recall of 70%. Although at this point we would have expected IC CC′ to perform considerably better, there is no noticeable difference with regard to the recall. Our expectation of a better performance from IC CC′ stems from the fact that, just as is the case for IC CM, IC CC focuses on counting the originating class/method pair, while IC CC′ shifts focus towards the target class/method pair.

When applying the HITS webmining algorithm on the IC CC′ metric results, we see that we get a recall of 90%. This increase in recall happens because indirect coupling is taken into account when applying the HITS webmining algorithm on the coupling data.

With regard to precision, it is clear that the webmining algorithm allows us to greatly improve precision and bring it to a level of 60%, which, in our opinion, is satisfactory for a heuristic. Satisfactory, but nothing more than that, because it still means that 40% of the program comprehension "pointers" returned to the user are potentially of lesser value.

Trade-off analysis

Based on the results we have obtained from the Apache Ant case study, this is our analysis:

• Running Ant according to the execution scenario takes 23 seconds without collecting trace-information. When we collect a trace from running Ant according to the same execution scenario, this now takes slightly under 1 hour4. The execution generates a trace of roughly 2 GB of data.

• Processing this amount of data and calculating the IC CM, IC CC and IC CC′ metrics took 45 minutes (the three metrics were calculated in parallel; calculating only one of these at a time lowers the time needed by only a fraction).

• Applying the HITS webmining algorithm on the metric data takes less than 30 seconds.

When considering the return on time-investment, we are mainly looking at the round-trip-time, i.e. the time needed to perform the full analysis, from loading the project into the environment till having the results presented. From starting the reverse engineering process till having the results at one's disposal takes roughly 105 minutes, which is partly due to the very slow trace-collection phase. Although we expect to be able to improve these round-trip-times, because of the prototype state of our tools, we firmly believe that the order of magnitude of the round-trip-time is set.

4Experiment conducted on an AMD Athlon 800 with 512 MB of memory running Fedora Core 3 Linux.

Summing up, we can say that we are very much satisfied with the level of recall that the dynamic analysis approach gives us. Furthermore, precision is also good at a level of 60%; however, the round-trip-time should be seen as a major detractor.

6.3 Jakarta JMeter

6.3.1 Introduction

Jakarta JMeter 2.0.15 is a Java application designed to test web applications. It allows one to verify the application (functionally), but it also allows one to perform load-testing (e.g. to measure performance or stability of the software system). It is frequently used to test web applications, but it can also handle SQL queries through JDBC. Furthermore, due to its architecture, plugins can be written for other (network) protocols. Results of performance measuring can be presented in a variety of graphs, while results of the functional testing are simple text files with output similar to output from regression tests.

JMeter is a tool which relies on a feature-rich GUI, uses threads abundantly and relies mostly on the functionality provided by the Java standard API (e.g. for network-related functionality)6.

The source-file distribution of Jakarta JMeter 2.0.1 consists of around 700 classes, while the core JMeter application is built up from 490 classes (23 KLOC).

6.3.2 Architectural overview

What follows is a brief description of the inner workings of JMeter.

The TestPlanGUI is the component of the user-interface that lets the end user add and customize tests. Each added test resides in a JMeterGUIComponent class. When the user has finished creating his or her TestPlan, the information from the JMeterGUIComponents is extracted and put into TestElement classes.

5For more information, see: http://jakarta.apache.org/jmeter/
6The design documentation can be found on the Wiki pages of the Jakarta JMeter project: http://wiki.apache.org/jakarta-jmeter


These TestElement classes are stored in a tree-like datastructure: the JMeterTreeModel. This datastructure is then passed on to the JMeterEngine which, with the help of the TestCompiler, creates JMeterThread(s) for each individual test. These JMeterThreads are grouped into logical ThreadGroups. Furthermore, for each test a TestListener is created: these catch the results of the threads carrying out the actual tests.

As such, we have identified nine key classes from the JMeter documentation. The design documentation also mentions a number of important helper classes, being:

• AbstractAction

• PreCompiler

• Sampler

• SampleResult

• TestPlanGui

6.3.3 Execution scenario

The execution scenario for this case study consists of testing an HTTP (HyperText Transfer Protocol) connection to Amazon.com, a well-known online shop. More precisely, we configured JMeter to test the aforementioned connection 100 times and visualize the results in a simple graph. Running this scenario took 82 seconds. The scenario is representative of JMeter, because many of the possible variation points in the execution scenario lie in (1) the usage of a different protocol (e.g. FTP) or (2) the output format of the data (e.g. a different type of graph or plain text). Also of importance to note here is that these 100 connections are initiated by a number of different threads, in order to simulate concurrent access to the Amazon web application. This entails that this particular case study is an example of a multi-threaded application.

6.3.4 Discussion of results

This section presents a discussion of the results from the Jakarta JMeter case study. Table 6.2 provides an overview of these results.

The IC CM metric clearly lags behind the other dynamic metrics proposed, with a recall of 14% and a precision of 10%. The explanation for this relatively bad result is identical to the reasoning given for the Ant case study.

In contrast with our previous case study, there is a notable difference between the most tightly coupled classes as reported by IC CC versus IC CC′. Although not immediately visible from Table 6.2, this phenomenon is related to the feature-rich graphical user interface (GUI). Although there is


Columns: 1: IC CM, 2: IC CC, 3: IC CC′, 4: IC CC′ + webmining, 5: JMeter docs.

Class                  Columns 1-5
AbstractAction         ✓ ✓ ✓ ✓
JMeterEngine           ✓ ✓ ✓
JMeterTreeModel        ✓ ✓
JMeterThread           ✓ ✓ ✓
JMeterGuiComponent     ✓ ✓ ✓ ✓
PreCompiler            ✓ ✓
Sampler                ✓ ✓ ✓ ✓
SampleResult           ✓ ✓ ✓
TestCompiler           ✓ ✓ ✓
TestElement            ✓ ✓ ✓
TestListener           ✓ ✓ ✓
TestPlan               ✓ ✓ ✓
TestPlanGui            ✓ ✓ ✓
ThreadGroup            ✓ ✓

→ recall (%)           14   21   71   93   -
→ precision (%)        10   14   48   62   -

Table 6.2: JMeter metric data overview.

evidence of an attempt at a model-view-controller (MVC) pattern implementation [Gamma et al., 1995] (both from the source code and from the design documents), there still is a high degree of coupling from the view to the model in the MVC scheme. Furthermore, a high degree of coupling exists within the GUI layer. Because certain classes in the GUI layer of JMeter can be catalogued as god classes (many methods, large methods), the IC CC metric falsely registers these classes as important, due to the high method count of these classes. IC CC′ however does not suffer from this, because its measure is not dependent on the number of methods defined within the class.

With regard to the heuristic where we applied webmining on top of the IC CC′ metric, the results are fairly convincing, with recall attaining 93% while still offering a level of precision of 62%. So again, taking indirect coupling into account makes sure that the important classes can be retrieved.


Trade-off analysis

Based on the results and the effort it took to generate the resultset, we made the following analysis:

• Running JMeter without collecting trace information takes 82 seconds. The overhead introduced when recording all necessary run-time data makes the same execution scenario last around 45 minutes. The execution generates traces of roughly 600 MB of data. Notice the difference with the Ant case study, where we collected 2 GB during a similar 45-minute execution period (when tracing). This difference can mainly be attributed to the fact that JMeter relies heavily on library functions, which are excluded from the trace. This exclusion process, however, also comes at an additional cost, because for each call made, an exclusion filter needs to be consulted before deciding whether to output a call to the tracefile or not.

• Processing this amount of data and calculating the IC CM, IC CC and IC CC′ metrics took slightly under 30 minutes.

• Applying the HITS webmining algorithm on the metric data takes around 30 seconds.

Here we see a very similar situation to the one we encountered during the Ant case study. Results are very much satisfactory, but the round-trip-time is worrisome when one wants to gain a quick overview of the subject application.
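The per-call exclusion filter mentioned above can be sketched as a package-prefix check. The prefix list is an assumption; the text only states that library code (such as the Java standard API) is filtered out of the trace:

```python
# Hypothetical exclusion list; only the exclusion of library code
# (e.g. the Java standard API) is stated in the text.
EXCLUDED_PREFIXES = ("java.", "javax.", "sun.")

def should_trace(callee_class):
    """Consulted for every call event: the call is written to the trace
    file only when the callee does not belong to an excluded package."""
    return not callee_class.startswith(EXCLUDED_PREFIXES)
```

This check runs once per call event, which is why the filtering itself adds measurable overhead to the trace collection.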

6.4 Discussion

6.4.1 Experimental observations

Table 6.3 gives an overview of the experimental setup we performed. The columns show the two criteria according to which we weigh the quality and the effectiveness of the 4 variations of the heuristic we proposed. The observations we have made during the two case studies are synthesized with the help of a scale ranging from −− (for bad conformance to a certain criterion) to ++ (for good conformance). A dot (·) means neither positive nor negative.

Recall. From Table 6.3 it becomes immediately clear that applying the HITS webmining technique on the dynamic IC CC′ measure delivers the best recall results. Looking back at Tables 6.1 and 6.2, we see that this technique is able to recall 90 and 93 percent of the classes defined in the baseline. The plain IC CC′ metric, which does not take into account indirect coupling, comes in as second best with recall percentages of around 70% in both case studies. IC CM has a lower level of recall (50% or lower), while IC CC slots in somewhere between IC CC′ and IC CM (for the Ant case study, IC CC performs level with IC CC′).

                        Recall   Precision
IC CM                     −         −
IC CC                     −         −
IC CC′                    +         ·
IC CC′ + webmining        ++        +

Table 6.3: Strengths and weaknesses of the proposed coupling-based techniques.

Precision. When it comes to precision, IC CC′ combined with the HITS webmining approach comes out best with a precision of 60%. No other technique is able to reach a level of precision above 50%.

Overall. Looking at the two primary criteria, recall and precision, the approach consisting of the combination of the IC CC′ metric and the HITS webmining algorithm delivers the best results. However, the round-trip-time needed to perform a complete analysis remains a serious detractor.

6.5 Observations with regard to the research question

To sum up, we were trying to answer the following question: "is there a clear link between influential classes and the classes that need to be understood during initial program understanding?". We can answer this question affirmatively. We have based ourselves on two open source case studies for which we had a program understanding baseline available. Singling out the combination of IC CC′ and the HITS webmining algorithm, we have observed that this heuristic is able to retrieve around 90% (lower bound) of the important classes, while maintaining a level of precision of around 60% (lower bound).

With regard to the subsidiary questions, "which metric to use" and "whether or not to take into account indirect coupling", we can add that the dynamic IC CC′ metric performs best when taking into account indirect coupling (through the HITS webmining algorithm).


As such, we are able to provide the end-user with a tool that can help him/her gain an overview of the application and, foremost, a number of starting points from where to start his/her further program understanding reconnaissance.


Chapter 7

Static coupling

That which is static and repetitive is boring. That which is dynamic and random is confusing. In between lies art.

—John A. Locke

In this dissertation we have mainly talked about dynamic or runtime coupling up until now. Now that we have also obtained the results of our case studies, we wonder whether a similar approach, albeit performed statically, can match or even surpass the results we have obtained from performing the webmining analysis with dynamically obtained coupling data. In this chapter we first define static coupling measures that are close to the one that we used for our dynamic-analysis-based experiment and then compare the results we have obtained.

7.1 Introduction & motivation

Calculating dynamic coupling metrics and the consequent application of the webmining technique is characterized by a number of constraints:

• The need for a good execution scenario.
• The availability of a tracing mechanism.
• Scalability issues (resulting trace file, overhead from the tracing mechanism, ...).

These constraints apply to the techniques that we discussed in Chapters 4 & 5. In order to verify whether we could overcome some of these constraints by working with static analysis instead of dynamic analysis, we


have undertaken a control experiment. In this experiment we apply webmining techniques on a static topological structure of the application and verify whether we can get a similar level of recall and precision as we found for the dynamic approach (see Chapter 6), all the while obtaining a significantly better round-trip-time.

The setup of this experiment is to compare the candidate of choice from our previous experiments, namely the combination of the IC CC′ metric with the HITS webmining technique, with a similar technique that uses static information. Furthermore, because we wanted to make the comparison as objective as possible, we defined static coupling metrics that are as close as possible to the IC CC′ metric we used in Chapters 4 & 5.

7.2 A static coupling metrics framework

The framework from Arisholm [Arisholm et al., 2004] does not have to make a distinction between static and polymorphic calls due to the dynamic nature of its measurements. We add notational constructs from the unified framework for (static) object-oriented metrics from Briand et al. [Briand et al., 1999] to the definitions that we previously used from Arisholm. That way, we can still use the basic notation from Arisholm that we have used in the previous chapters. For that purpose, some helpful definitions are:

Definition 1 Methods of a Class.
For each class c ∈ C, let M(c) be the set of methods of class c.

Definition 2 Declared and Implemented Methods.
For each class c ∈ C, let:
• MD(c) ⊆ M(c) be the set of methods declared in c, i.e., methods that c inherits but does not override, or virtual methods of c.
• MI(c) ⊆ M(c) be the set of methods implemented in c, i.e., methods that c inherits but overrides, or nonvirtual noninherited methods of c.

Definition 3 M(C). The Set of all Methods.
M(C) = ∪c∈C M(c)

Definition 4 SIM(m). The Set of Statically Invoked Methods of m.
Let c ∈ C, m ∈ MI(c), and m′ ∈ M(C). Then m′ ∈ SIM(m) ⇔ ∃d ∈ C such that m′ ∈ M(d) and the body of m has a method invocation where m′ is invoked for an object of static type class d.

Definition 5 NSI(m,m′). The Number of Static Invocations of m′ by m.
Let c ∈ C, m ∈ MI(c), and m′ ∈ SIM(m). NSI(m,m′) is the number of method invocations in m where m′ is invoked for an object of static type class d and m′ ∈ M(d).


1  public void foo() {
2      BaseClass base = new BaseClass();
3      base.doSomething();
4      // some other functionality
5      base.doSomething();
6  }

Figure 7.1: Piece of Java code to help explain metrics.

Definition 6 PIM(m). The Set of Polymorphically Invoked Methods of m.
Let c ∈ C, m ∈ MI(c), and m′ ∈ M(C). Then m′ ∈ PIM(m) ⇔ ∃d ∈ C such that m′ ∈ M(d) and the body of m has a method invocation where m′ may, because of polymorphism and dynamic binding, be invoked for an object of dynamic type d.

Definition 7 NPI(m,m′). The Number of Polymorphic Invocations of m′ by m.
Let c ∈ C, m ∈ MI(c), and m′ ∈ PIM(m). NPI(m,m′) is the number of method invocations in m where m′ can be invoked for an object of dynamic type class d and m′ ∈ M(d).

7.3 Expressing IC CC′ statically

With these added notational constructs, we are now able to write down four static coupling measures that closely resemble the measurements that were defined in Section 4.3.3.

The fact that one dynamic metric, IC CC′, is translated into 4 static metrics can be explained by the fact that the static environment offers some degrees of choice when calculating the metrics. Consider the Java code snippet in Figure 7.1:

• The choice between static calls and polymorphic calls. In other words, when considering Figure 7.1, do we only count the reference to BaseClass or also to all subclasses of BaseClass?
• Do we count duplicate calls for the same (origin, target) pairs? When considering Figure 7.1, do we count the base.doSomething() call once or twice (lines 3 and 5)?

For the purpose of our research we have defined 4 metrics that vary over the characteristics described above.

Definition SM SO Static Metric, Static calls, count every Occurrence of a call only once.

SM SO(c1, c2) = #{(m2, c2, c1) | ∃ (m1, c1), (m2, c2) ∈ RMC
                  ∧ c1 ≠ c2 ∧ (m1, c1, m2, c2) ∈ IV
                  ∧ m2 ∈ SIM(m1)}


Definition SM SW Static Metric, Static calls, count every occurrence of a call (Weighted).

SM SW(c1, c2) = identical to SM SO(c1, c2), but { } should be interpreted as a bag or multiset.

Definition SM PO Static Metric, Polymorphic calls, count every Occurrence of a call only once.

SM PO(c1, c2) = #{(m2, c2, c1) | ∃ (m1, c1), (m2, c2) ∈ RMC
                  ∧ c1 ≠ c2 ∧ (m1, c1, m2, c2) ∈ IV
                  ∧ m2 ∈ PIM(m1)}

Definition SM PW Static Metric, Polymorphic calls, count every occurrence of a call (Weighted).

SM PW(c1, c2) = identical to SM PO(c1, c2), but { } should be interpreted as a bag or multiset.
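The difference between the “O” and “W” variants is purely one of counting semantics: a set versus a bag of invocation sites. A minimal sketch of the two counting regimes for a single (c1, c2) pair (the string encoding of call sites is our own illustration, not the actual metrics engine):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustrative sketch of the "once" (set) versus "weighted" (bag) counting
// used by the SM_*O and SM_*W variants, for a fixed pair of classes (c1, c2).
// Each string stands for one invocation "callerMethod->calleeMethod".
public class CouplingCount {
    // SM_*O: every distinct invoked method pair is counted only once (set semantics).
    public static int countOnce(List<String> invocations) {
        return new HashSet<>(invocations).size();
    }

    // SM_*W: every occurrence of a call is counted (bag/multiset semantics).
    public static int countWeighted(List<String> invocations) {
        return invocations.size();
    }

    public static void main(String[] args) {
        // Mirrors Figure 7.1: foo() calls base.doSomething() twice (lines 3 and 5).
        List<String> calls = Arrays.asList("foo->doSomething", "foo->doSomething");
        System.out.println(countOnce(calls));     // 1: the duplicate call collapses
        System.out.println(countWeighted(calls)); // 2: both occurrences are counted
    }
}
```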

To calculate these metrics, we used the JDT2MDR Eclipse plugin developed by Bart Du Bois, a fellow member of the LORE research group [Zaidman et al., 2006b]. JDT2MDR transforms a Java project to a graph representation closely resembling the metamodel employed by Briand et al. [Briand et al., 1999], thereby enabling the calculation of the coupling and cohesion measures formalized in their paper.

7.4 Results

This section will give an overview of the results we have obtained from applying the static approach to finding the most important classes in our two case studies, Apache Ant and Jakarta JMeter. We compare the results we have obtained with (1) the best result obtained from the dynamic approach, namely the combination of IC CC′ and webmining, and (2) the baseline obtained from the documentation of these open source projects.

Besides recall and precision, the criteria we used for determining the best dynamic approach, we will also keep a close eye on the round-trip-time of the static approach, as this is a factor where we expect the static approach to be able to significantly outperform the dynamic approach.

7.4.1 Ant

Based on the results shown in Table 7.1, two categories are formed, namely the category of metrics that takes polymorphism into account (SM P*) and


Class                 1: IC CC′    2: SM PO     3: SM PW     4: SM SO     5: SM SW     6: Ant
                      + webmining  + webmining  + webmining  + webmining  + webmining  docs
Project                    √            √            √            √            √         √
UnknownElement             √            √            √            √            √         √
Task                       √           79           81          119          120         √
Main                       √            √            √            √            √         √
IntrospectionHelper        √            √            √          116          105         √
ProjectHelper              √           97           99           90          190         √
RuntimeConfigurable        √            √            √           63           63         √
Target                     √           89           93          100          100         √
ElementHandler             √          192          198          125          125         √
TaskContainer             N/A         398          403          381          383         √
→ recall (%)               90           50           50           30           30        -
→ precision (%)            60            8            8            5            5        -

Table 7.1: Ant metric data overview (√ = class retrieved within the top 15% of the ranking; a number denotes the rank of a class falling outside the top 15%).

the category that does not take polymorphism into account (SM S*). The former category exhibits a recall level of 50%, while the latter recalls 30%. Although interesting from the point of view that polymorphism does indeed play an important role when considering program comprehension, from a practical perspective these results are disappointing when compared to the results obtained with the dynamic approach. The observation regarding polymorphism can be explained by the fact that (1) sometimes a base class is abstract or (2) the base class is not always (or should we say mostly not) the most important class in the hierarchy. The second variation point for the static metrics, namely whether to count an occurrence of a particular call only once or to count every occurrence of a call (weighted), does not seem to make any difference in our specific context (small variations exist, but these do not influence the resultset).

The fact that precision for the 4 static metrics in columns 2, ..., 5 is much lower (8% or less) than what we experienced with the dynamic approach can be explained by the size of the input sets: the input set for the static experiment contained 403 classes, while for the dynamic experiment this was only 127 classes. When using our rule-of-thumb of presenting the 15% highest ranked classes in the final resultset, we end up with 60 and 15 classes respectively.


A further point to be made regarding this rule-of-thumb is that, when looking at the ranking of classes that fall outside the top 15%, lowering the bar to 20% would not have resulted in a (significant) gain in recall, while precision would drop further. We can also add that, by raising the bar to 10%, recall would fall by 10%.

Considering the round-trip-time, we measured that the prototype (static) metrics engine took one hour to calculate the metrics for Ant. Applying the HITS algorithm takes less than one minute.
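The HITS step is cheap because it merely iterates two score vectors (hubs and authorities) over the coupling graph until they stabilize. A minimal, unweighted sketch of the algorithm (our own illustration; the actual experiments run it on the coupling data described above):

```java
import java.util.Arrays;

// Minimal HITS iteration on an adjacency matrix: adj[i][j] = 1 iff class i calls class j.
public class Hits {
    // Returns { hubScores, authorityScores } after the given number of iterations.
    public static double[][] run(int[][] adj, int iterations) {
        int n = adj.length;
        double[] hub = new double[n], auth = new double[n];
        Arrays.fill(hub, 1.0);
        Arrays.fill(auth, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] newAuth = new double[n], newHub = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (adj[i][j] == 1) {
                        newAuth[j] += hub[i]; // authority: being called by good hubs
                        newHub[i] += auth[j]; // hub: calling good authorities
                    }
            normalize(newAuth);
            normalize(newHub);
            auth = newAuth;
            hub = newHub;
        }
        return new double[][] { hub, auth };
    }

    private static void normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm > 0) for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    public static void main(String[] args) {
        // Class 0 calls 1 and 2; class 1 calls 2: class 2 should come out as the top authority.
        int[][] adj = { { 0, 1, 1 }, { 0, 0, 1 }, { 0, 0, 0 } };
        double[][] scores = run(adj, 20);
        System.out.println("hubs        = " + Arrays.toString(scores[0]));
        System.out.println("authorities = " + Arrays.toString(scores[1]));
    }
}
```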

7.4.2 JMeter

Class                 1: IC CC′    2: SM PO     3: SM PW     4: SM SO     5: SM SW     6: JMeter
                      + webmining  + webmining  + webmining  + webmining  + webmining  docs
AbstractAction             √          275          275          336          336        √
JMeterEngine               √            √            √          484          484        √
JMeterTreeModel            √            √            √          150          150        √
JMeterThread               √            √            √          147          147        √
JMeterGuiComponent                      √            √          475          475        √
PreCompiler                √          362          362          293          293        √
Sampler                    √          457          478          454          454        √
SampleResult               √          119          119          209          209        √
TestCompiler               √            √            √          145          145        √
TestElement                √            √            √          451          451        √
TestListener               √          450          443          449          449        √
TestPlan                   √          113          113          234          234        √
TestPlanGui                √           93           93            √            √        √
ThreadGroup                √          140          140          157          157        √
→ recall (%)               93           43           43            7            7       -
→ precision (%)            62            8            8          1.4          1.4       -

Table 7.2: JMeter metric data overview (√ = class retrieved within the top 15% of the ranking; a number denotes the rank of a class falling outside the top 15%).

Similar to what we saw in the Ant case study, two groups can be identified within the JMeter resultset presented in Table 7.2, namely one group consisting of SM PO and SM PW, and one group formed by SM SO and SM SW. Within these two groups, recall and precision are identical, although minimal differences exist when looking at the ranking of some classes.


In contrast with the previous case study, Ant, these differences are much more pronounced. It is our opinion that this is probably due to the fact that most method calls happen only once in each unique method, as opposed to multiple occurrences of a method call in a unique method, where the weighted approach (of SM PW and SM SW) would make the difference more pronounced.

Also to be noted is the sizeable dissimilarity between the results obtained while only taking static calls into account versus also taking polymorphic calls into account. As Table 7.2 shows, the SM P* metrics have a recall of 43%, while the SM S* metrics only recall 7%.

Of interest to note is the fact that, when looking at the ranking of the classes outside the top 15%, it is clear that lowering the bar to the 20% highest ranked classes would not improve recall.

As far as the round-trip-time is concerned, the metrics engine took almost one and a half hours to calculate the metrics for JMeter. This is a considerable increase from what we saw with Ant. This increase can be attributed to the fact that JMeter (1) has a larger codebase and (2) uses more libraries, which also need to be parsed. Applying the HITS algorithm takes slightly over one minute.

7.5 Discussion

7.5.1 Practical implications

In Section 7.1 we talked about three drawbacks of the dynamic webmining approach. Now, after having performed a similar experiment in a static way, we will discuss each of these drawbacks and see whether they are strictly inherent to the dynamic approach we introduced:

1. The necessity of a good execution scenario.
When performing static analysis, having an execution scenario is no issue. However, access to the source code remains a prerequisite. For completeness' sake, we do add that reverse engineering (and the subsequent extraction of coupling metrics) from binaries is sometimes possible. Nevertheless, having access to the source code is a criterion which often has a much more limited impact than having a good execution scenario. As such, static analysis is to be favored here.

2. The availability of a tracing mechanism.
Although a tracing mechanism is no longer an issue, having a metrics engine remains a necessity. To implement such an engine, either open source tools need to be available or a parser needs to be constructed. Because a similar precondition exists for both processes, neither of the


two approaches has a clear advantage here.

3. Scalability issues.
In terms of scalability the dynamic process is plagued by the possibly huge size of the trace file. This has consequences on multiple levels:

• The I/O overhead on the traced application (e.g. for Ant: execution of 23 seconds without tracing versus just under one hour with tracing).
• The size of the trace (2 GB in the case of Ant).
• The time it takes to calculate the IC CC′ metric and perform the HITS webmining algorithm on this 2 GB of data. In the case of Ant this takes around 45 minutes.

We were already aware of the below-par round-trip-times of the dynamic approach. However, when comparing these times with the static approach, we observe that our prototype metrics engine took one hour to calculate the metrics for Ant and slightly over one hour for JMeter. Applying the HITS algorithm takes less than one minute, so the total round-trip-time is around one hour for both projects. While these times are not so different from those of the dynamic process, the dynamic process still needs the tracing step, which means that the round-trip-time for the dynamic process is significantly larger; in the case of Ant it takes around two hours.

7.5.2 Comparing static and dynamic results

In Chapter 6 we saw that the IC CC′ metric combined with the webmining solution provides a level of recall of at least 90%, while safeguarding a level of precision of around 60%. When we look at the results of the static coupling metrics that we introduced in this chapter, we see that we are able to reach a maximum level of recall of 50%, while the level of precision drops to 8% or less. This observation makes it quite obvious that the dynamic approach is the solution of choice when only considering the recall and precision results.

7.5.3 Conclusion

Table 7.3 provides an overview of the strengths and weaknesses of both the static and the dynamic approach. Although we see that the static approaches (the SM * metrics) perform better on round-trip-time, they fall short when considering their recall and precision characteristics. As such, for early program comprehension purposes, the dynamic approach is the best choice, even though its round-trip-time performance is a severe drawback.


                        Recall    Precision    Time
IC CC′ + webmining        ++          +         −−
SM PO + webmining          ·         −−         +/−
SM PW + webmining          ·         −−         +/−
SM SO + webmining         −−         −−         +/−
SM SW + webmining         −−         −−         +/−

Synthesis of the observations from the results obtained during the experiments. We use a scale ranging from −− (for bad conformance to a certain criterion) to ++ (for good conformance). A dot (·) means neither positive nor negative, and +/− signifies that the results are too case-related to draw any significant conclusion.

Table 7.3: Comparison of the strengths and weaknesses of the static and the dynamic webmining approach.


Part III

Frequency based solutions for program comprehension


Chapter 8

Frequency Spectrum Analysis

Machines take me by surprise with great frequency.

—Alan Turing

In this chapter we look at a technique to ease the navigation of large event traces or of a visualization of such an event trace, e.g. a UML sequence diagram. The technique uses the relative execution frequency of methods or procedures within the execution of a software system to generate a visualization that we call a “heartbeat” visualization, because it resembles the visualization that is typical of an electrocardiogram or ECG. With the help of the visualization it then becomes possible to navigate through the trace and identify regions in the trace where similar or identical functionality is performed.

8.1 Introduction

8.1.1 Motivation

When it comes to dynamic analysis, one of the most accessible types of information is the execution frequency of entities within a software system. This particular axis within the run-time information-space of software systems is commonly used in several software engineering disciplines:

• For optimization purposes the software engineer can detect frequently called (and perhaps time-intensive) entities within a software system. These particular entities can then be subjected to a closer look in order to bring about optimizations within the code of that particular entity, because the biggest gain in performance can be obtained from optimizing these frequently called entities.
• Several virtual machine platforms employ similar schemes to detect which classes or which methods to optimize. This optimization happens mainly through the inlining [1] of frequently called virtual methods or the just-in-time compilation of methods or complete classes. An example of this is the “Hot Spot” technology found in Sun's recent Java Virtual Machine implementations [2].

Even though frequency analysis has been in use within the software engineering community for some time, it was never directly used for program comprehension purposes. This changed when Thomas Ball introduced the concept of “Frequency Spectrum Analysis” (FSA) [Ball, 1999], a way to correlate procedures, functions and/or methods through their relative calling frequency. This correlation can e.g. happen on the basis of input data, where observations are made as to how many input-values a program receives and how many times certain procedures or methods are called internally. The same can be done for output, or one can look at relative frequencies of execution of methods or procedures that are shielded from everything that has to do with input/output.

8.1.2 Research questions

In this research track, we are looking for ways to exploit the relative execution frequency specifically for program comprehension purposes. The central research questions we have with regard to this research track are:

1. Can we use the relative execution frequency to distinguish tightly collaborating methods or procedures in a trace?

2. Can we make a visual representation of the execution trace that is at the same time scalable and allows us to identify these tightly collaborating entities?

3. Is it possible to use this visualization to help the end user navigate through the trace and let him/her skip parts of the trace that are similar or identical? This question can be subdivided into whether the visualization allows us to discern:

• the repetitive calling of end-user functionality (e.g. the repetition of a use case), i.e. on the macro-level;
• the repetitive calling of lower-level building blocks that are present in the application, i.e. on the micro-level.

[1] Inlining is a compiler optimization which “expands” a function call site into the actual implementation of the function which is called, rather than each call transferring control to a common piece of code. This reduces the overhead associated with the function call, which is especially important for small and frequently called functions.
[2] For more information about this technology, see: http://java.sun.com/products/hotspot/

8.1.3 Solution space

Conceptually, in object-oriented software systems, classes (or their instantiations — objects) work together to reach a certain goal, i.e. perform a certain function as specified by e.g. a use-case scenario. This collaboration is expressed through the exchange of messages between classes. This message-interaction typically occurs according to a certain interaction protocol. As such, this interaction protocol gives rise to a relationship between the two messages and the classes to which the methods belong. This relationship is also expressed through the relative execution frequency of the messages involved. It is based on this execution frequency that we will try to uncover the interaction-protocol-induced relationships. Furthermore, we have seen that even though the number of classes involved in an execution scenario is finite and often very limited as well, these interactions nevertheless give rise to sizeable execution traces. Intuitively, this huge size can be explained by repetitive interactions between multiple instances of classes, which furthermore strengthens the idea that execution frequency can be used to uncover interaction protocols.

Visualizations of traces, e.g. through UML Interaction Diagrams, make the trace readable, but not (cognitively) scalable. A typical example of a visualization tool is IBM's Jinsight [De Pauw et al., 2001]. To ensure cognitive scalability, we ideally want to guide the end-user quickly and easily through the possibly huge execution trace (or its visualization) with the help of a heuristic [Jahnke and Walenstein, 2000], the end-user being a software engineer trying to familiarize himself/herself with a previously unknown software system. This heuristic, based on the relative execution frequency of methods, can help provide a program comprehension solution that helps the end-user navigate through the execution trace by marking highly repetitive regions in the trace. These regions can be inspected and the identical or similar regions can then be quickly discarded.

We explicitly mention that we are working towards building a heuristic, because in the face of huge execution traces thoroughness comes at a cost and the question of scalability inevitably arises [Larus, 1993, Smith and Korel, 2000]. Furthermore, when considering traditional dynamic analysis purposes such as program optimization, soundness plays a crucial role in developing a technique to guarantee behavior preservation [Mock, 2003]. Dynamic analysis for program understanding relaxes the problem considerably, because we can afford non-optimal precision.


The actual solution we propose is evolutionary with regard to the concepts presented by Ball in [Ball, 1999]. Building upon his concept of “Frequency Spectrum Analysis”, we propose a scalable visualization of an execution trace. This visualization can best be described as a heartbeat visualization of the system, similar to the visual result of an ECG [3].

In order to try to answer the research questions within this research track, we use two open source case studies, namely Fujaba and Apache Tomcat.

8.1.4 Formal background

In a more formal way, we can say that we are actually looking for evidence of the concepts of dominance and post-dominance, borrowed from the slicing community [Tilley et al., 2005]:

We say that an instruction x dominates an instruction y if every trace prefix which ends with y also contains an instruction x. In other words, an instruction x dominates an instruction y if and only if the only way to make sure that y gets executed means that x has already been executed. x post-dominates y if every trace postfix which begins with y also contains x. Or one can say that x post-dominates y if every execution of y indicates that x will also be executed in a relatively short period of time.
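Over a finite trace, these definitions reduce to simple scans: x dominates y if an x occurs before every occurrence of y, and x post-dominates y if an x occurs after every occurrence of y. A small sketch under that reading (illustrative only, not taken from the dissertation's tooling; event names are made up):

```java
import java.util.Arrays;
import java.util.List;

// Trace-based dominance checks: a trace is a chronological list of executed events.
public class Dominance {
    // x dominates y: every prefix of the trace that ends with y also contains x.
    public static boolean dominates(List<String> trace, String x, String y) {
        boolean seenX = false;
        for (String event : trace) {
            if (event.equals(x)) seenX = true;
            if (event.equals(y) && !seenX) return false; // a y occurred before any x
        }
        return true;
    }

    // x post-dominates y: every postfix of the trace that begins with y also contains x.
    public static boolean postDominates(List<String> trace, String x, String y) {
        boolean seenX = false;
        for (int i = trace.size() - 1; i >= 0; i--) { // scan backwards
            if (trace.get(i).equals(x)) seenX = true;
            if (trace.get(i).equals(y) && !seenX) return false; // a y with no later x
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList("init", "parse", "validate", "parse", "close");
        System.out.println(dominates(trace, "init", "parse"));      // true
        System.out.println(dominates(trace, "validate", "parse"));  // false: first parse precedes validate
        System.out.println(postDominates(trace, "close", "parse")); // true
    }
}
```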

8.2 Approach

The approach we follow when applying the heuristic and analyzing its results is defined as a seven-step process. This section expands on each of these steps.

Step 1: Define an execution scenario Being aware that even small software systems that are run for only a few seconds can be responsible for generating sizeable execution traces, limiting the events recorded in the execution trace is a first step towards scalability. Defining a strict execution scenario, that only exercises those use case scenarios that are of interest to the program comprehension assignment or reverse engineering context, is certainly advisable. Moreover, defining a strict execution scenario helps to adhere to the goal-oriented strategy we mentioned in Section 3.2.1.

[3] Electrocardiogram: the tracing made by an electrocardiograph, an instrument for recording the changes of electrical potential occurring during the heartbeat, used especially in diagnosing abnormalities of heart action (source: Merriam-Webster dictionary).


Step 2: Define a filter A second possibility to limit the size of the execution trace is the up-front exclusion of events that lie outside our zone of interest. Good examples of such a situation are method calls to parts of the system that we are not interested in, e.g. library calls. Table 8.1 shows the results of a normal tracing operation and of a tracing operation which filters out all method calls belonging to classes from the Java API [4] (Java 2 Standard Edition, release 1.4.1). This filtering operation leads to a significant reduction of the trace data, as we are able to reduce the total number of events to between 7 and 15% of the original trace.

                        Jakarta Tomcat 4.1.18     Fujaba 4
Execution time                   48s                 70s
(without tracing)
Classes (total)               13 258              15 630
Events                     6 582 356          12 522 380
Unique events                  4 925             858 505
Classes (filtered)             3 482               4 253
Events                     1 076 173             772 872
Unique events                  2 359              95 073

Table 8.1: Comparison of total tracing versus filtered tracing.
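Such a filter can be as simple as a package-prefix test applied to each event before it is written to the trace. A hypothetical sketch (the prefix list is our own choice for illustration, not the filter actually used in the experiments):

```java
// Hypothetical up-front trace filter: drop events whose class belongs to the
// Java API, so that only application-level method calls are recorded.
public class TraceFilter {
    private static final String[] EXCLUDED_PREFIXES = { "java.", "javax.", "sun." };

    // Returns true iff an event on this class should be written to the trace.
    public static boolean record(String className) {
        for (String prefix : EXCLUDED_PREFIXES)
            if (className.startsWith(prefix)) return false; // library call: filtered out
        return true;                                        // application call: keep
    }

    public static void main(String[] args) {
        System.out.println(record("java.util.ArrayList"));         // false
        System.out.println(record("org.apache.tomcat.Connector")); // true
    }
}
```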

Step 3: Trace according to the scenario using the filter This step consists of running the program with an online tracing mechanism according to the previously defined execution scenario and with the tracing filter in place. The result of this step is a file which contains a chronological list of all method calls which were executed during the scenario.

Step 4: Frequency Analysis In this step, we run over the trace and create a map which contains, for every unique method found in the trace, the number of times it has been called. We decided to perform this step post-mortem, i.e. after the tracing operation itself (Step 3), instead of online. The reason behind this is that, by doing so, we take another measure to minimize the impact of the tracing operation on the running program, which, indeed, is already impacted by generating the trace (e.g. through the I/O cost).
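This frequency analysis is a single post-mortem pass that counts occurrences per unique method signature. A minimal sketch (the method names follow the XMLParser example used later in Figure 8.1; the trace encoding is illustrative):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Step 4 sketch: one post-mortem pass over the trace, counting how many
// times each unique method signature occurs.
public class FrequencyAnalysis {
    public static Map<String, Integer> countFrequencies(List<String> trace) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String event : trace)
            freq.merge(event, 1, Integer::sum); // start at 1, then increment
        return freq;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
            "XMLParser.init()", "XMLParser.parseString(String)",
            "XMLParser.parseString(String)", "XMLParser.close()");
        System.out.println(countFrequencies(trace));
        // {XMLParser.init()=1, XMLParser.parseString(String)=2, XMLParser.close()=1}
    }
}
```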

Step 5: Frequency Annotation We walk over the original trace once more and annotate each event with the frequency we retrieve from the map that we have created in Step 4. The result is a chronological list of executed methods, with an added first column which represents the frequency of execution of the method listed in column two. Remark that the values found in the first column represent the total number of times a method is executed during the scenario. It is important to use the total frequency of execution because we want to distinguish methods working together based on their relative frequencies. An example can be found in Figure 8.1.

[4] The Java API is a standard library that contains functionality for dealing with strings, inter-process communication, containers, ...

...
543   XMLParser.init()
978   XMLParser.parseString(String)
1243  XMLParser.closingTagFound()
1243  XMLParser.validXMLElement()
543   XMLParser.close()
...
543   XMLParser.init()
978   XMLParser.parseString(String)
1243  XMLParser.closingTagFound()
1243  XMLParser.validXMLElement()
543   XMLParser.close()
...

Figure 8.1: Frequency annotation example.

Please also remark from the example trace that we explicitly omit object identifiers (OIDs) and parameter values, because we are looking to make an abstraction and, as such, we are not interested in specific instances of interaction protocols.

Step 6: Dissimilarity Measure Using the annotated trace, we sample the frequencies of a sequence of method calls, resulting in a characteristic dissimilarity measure for that sequence of events. Conceptually this characteristic dissimilarity measure can be compared with a fingerprint, hence its name: frequency fingerprint.

The sampling mechanism uses a sliding window to walk over the annotated trace. While going over the trace, we let the window fill up; once the window size is reached, we apply the dissimilarity measure to the frequencies of the events in the window and then discard the contents of the window. We repeat this process until the end of the trace is reached.


...       ...
f_{i-1}   event_{i-1}
f_{i}     event_{i}      \
f_{i+1}   event_{i+1}    |
f_{i+2}   event_{i+2}    |-- apply dissimilarity measure
f_{i+3}   event_{i+3}    |
f_{i+4}   event_{i+4}    /
f_{i+5}   event_{i+5}
...       ...

We illustrate the process with a window size of 5. The dissimilarity measure is applied to the frequencies of the events that lie in the interval [f_i, f_{i+4}]; after that, i is incremented by 5, the window size, and the process is repeated. The implementation thus uses simple consecutive blocks as windows.
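The consecutive-block windowing can be sketched as follows. We assume the annotated trace is a list of (frequency, event) pairs as in Step 5, and that a trailing partial window is simply discarded; both are our own assumptions.

```python
def frequency_windows(annotated_trace, w):
    """Split the annotated trace into consecutive, non-overlapping windows
    of size w, keeping only the frequency column."""
    freqs = [f for f, _event in annotated_trace]
    return [freqs[i:i + w] for i in range(0, len(freqs) - w + 1, w)]

# Hypothetical annotated trace: one XMLParser-like window, one
# constant-frequency window.
annotated = [(543, "a"), (978, "b"), (1243, "c"), (1243, "d"), (543, "e"),
             (312, "f"), (312, "g"), (312, "h"), (312, "i"), (312, "j")]
frequency_windows(annotated, 5)
# -> [[543, 978, 1243, 1243, 543], [312, 312, 312, 312, 312]]
```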

In our experiment, we have taken the most commonly used distance metric, namely the Euclidean distance [Fraley and Raftery, 1998, Kaufman and Rousseeuw, 1990], as a dissimilarity measure to characterize how "related" the method calls within one window are.

Euclidean distance:

    d = Σ_{j=1}^{w−1} √((f_{j−1} − f_j)²)

with w the window size and f_j the frequency of the j-th event in the current window on the trace. Since √(x²) = |x|, each term is simply the absolute difference between two consecutive frequencies in the window.
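A direct transcription of this formula, under the reduction to absolute differences noted above:

```python
import math

def dissimilarity(window):
    """d = sum over j = 1 .. w-1 of sqrt((f_{j-1} - f_j)^2),
    for one window of frequencies."""
    return sum(math.sqrt((window[j - 1] - window[j]) ** 2)
               for j in range(1, len(window)))

dissimilarity([312, 312, 312, 312, 312])  # identical frequencies -> 0.0
dissimilarity([10, 12, 12, 9, 10])        # 2 + 0 + 3 + 1 -> 6.0
```

A window of identical frequencies always yields 0, which is exactly the near-zero signal the heuristic looks for.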

Step 7: Analysis When the previous steps have been executed, we are in a position to analyze the dissimilarity measure and the trace, looking for clues that point to interesting regions in the trace. To make this analysis step easier, we use a very simple visualization that plots the dissimilarity measure on the Y-axis for consecutive windows (X-axis). As such, the X-axis can be interpreted as "time".
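The data behind such a plot is just one dissimilarity value per consecutive window. A sketch, combining the previous steps (the input frequencies are hypothetical):

```python
import math

def dissimilarity(window):
    # Sum of absolute differences between consecutive frequencies.
    return sum(math.sqrt((window[j - 1] - window[j]) ** 2)
               for j in range(1, len(window)))

def dissimilarity_series(freqs, w):
    """One Y-value per consecutive window; the window index serves as the
    X-axis and can be read as 'time'."""
    return [dissimilarity(freqs[i:i + w])
            for i in range(0, len(freqs) - w + 1, w)]

dissimilarity_series([312] * 5 + [543, 978, 1243, 1243, 543], 5)
# -> [0.0, 1400.0]
```

Plotting this series against the window index yields exactly the charts shown in Figures 8.4 through 8.7.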

• On the one hand, we look for regions in the trace where the frequency of execution is (almost) identical. Inspection of the traces from our case studies has taught us that these regions are often relatively small, mostly in the neighborhood of ten to thirty method calls, after which the frequency of execution changes, before settling again at an almost identical level. Regions in the trace where the frequency of execution is identical, and the resulting dissimilarity is low to near-zero, are evidence of a frequently applied interaction protocol. The case studies have shown us a typical example of this, namely a wrapper construction, where an old component was wrapped. All communication from the application to that one (older) component


happened through the methods available in the wrapper. We show another example in Figure 8.2.

...
312  CommunicationChannel.init(String channelType)
312  CommunicationChannel.setOptions(ChannelOptions options)
312  CommunicationChannel.send(String message)
312  CommunicationChannel.receive()
312  CommunicationChannel.close()
                                        dissimilarity value: 0
...

This annotated trace fragment shows a frequently occurring interaction protocol used for inter-process communication purposes. All methods participating in the interaction protocol are executed the same number of times. When using a window size of 5, this results in a dissimilarity of 0.

Figure 8.2: Example of identical execution frequency.

• On the other hand, we look for recurring patterns in the dissimilarity index. Sometimes, methods that work tightly together do not have a similar execution frequency. A typical cause is variation points that exist within the code. These variation points can be introduced through typical conditional constructs or through the use of polymorphism. Nevertheless, because these methods are frequently executed together, a regular pattern appears, a so-called frequency pattern. We have extended the example of Figure 8.1 in Figure 8.3 to show such a frequency pattern.

These two types of regions in the trace that carry our interest are called clusters.
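As an illustration of how the first type of cluster could be located automatically, the sketch below flags maximal runs of consecutive windows whose dissimilarity stays below a threshold. The threshold value is an assumption of ours, not something the chapter prescribes.

```python
def near_zero_regions(series, threshold=1.0):
    """Return (first, last) window indices of maximal runs where the
    dissimilarity stays at or below the (hypothetical) threshold."""
    regions, start = [], None
    for i, d in enumerate(series):
        if d <= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(series) - 1))
    return regions

near_zero_regions([0.0, 0.5, 900.0, 1400.0, 0.2])
# -> [(0, 1), (4, 4)]
```

Detecting the second type of cluster, recurring frequency patterns, requires comparing window signatures over time and is left to the visual analysis described above.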

8.3 Experimental setup

8.3.1 Hypothesis

Having explained the inner workings of the heuristic, we are now ready to formulate our four-part hypothesis.

1. The majority of the found clusters will in fact be frequency patterns. Frequency patterns are mostly the result of using polymorphism, and because polymorphism is abundantly present in object-oriented software, we expect this type of cluster to be numerous.


...
 543  XMLParser.init()
 978  XMLParser.parseString(String)
1243  XMLParser.closingTagFound()
1243  XMLParser.validXMLElement()
 543  XMLParser.close()
                                        dissimilarity value: 1243
...
 543  XMLParser.init()
 978  XMLParser.parseString(String)
1243  XMLParser.closingTagFound()
1243  XMLParser.validXMLElement()
 543  XMLParser.close()
                                        dissimilarity value: 1243
...

Here we illustrate the concept of a frequency pattern, where a number of methods are frequently executed in the same order, without being related through an identical frequency of execution. In this example we use a window size of 5 and calculate the dissimilarity value for each window.

Figure 8.3: Frequency pattern.

2. Enlarging the window size introduces noise in the frequency signatures, because sequences of methods which logically form a whole are perhaps smaller than the window size. This can lead to false negatives.

3. Shrinking the window size introduces noise in the results, because when frequency signatures become very small, everything becomes a frequency pattern. This can lead to false positives.

4. Regions in which a certain action is repeated become easily discernible: if at a point in time x a certain functionality is activated and at another point in time y the same functionality is activated, this will be visible in the dissimilarity values.

8.3.2 The experiment itself

We provide empirical data on, and anecdotal evidence about, the clusters found in the event traces of the two case studies we used. We consider these results preliminary, because (1) the results have only been compared manually with the traces, and (2) the validation of the results has only been done for two cases. We want to verify the results more thoroughly in another experiment, which would allow us to visualize the clusters in parallel to browsing the traces in order to do a more thorough validation.


8.3.3 Case studies

We use two well-known open-source Java programs in our experiments:

1. For the representative of a non-graphical, server-like program we chose Jakarta Tomcat of the Apache Software Foundation5. Tomcat's origins lie with Sun Microsystems, but it was donated to the Apache open-source community in 1999. Since then, the application has seen some major new releases and has been widely accepted as the reference implementation for the Java Servlet and Java Server Pages (JSP) technologies. Furthermore, it is commonly used in industrial settings in tandem with the Apache HTTP server.

2. On the other hand, we have chosen Fujaba6, an open-source UML tool with Java reverse-engineering capabilities. Due to its intensive use of the Java Swing API, it is an excellent representative of applications with a heavy GUI. The project originates from the University of Paderborn and has been developed by multiple students. It is frequently used as a research vehicle in the domain of UML modeling and Model Driven Engineering.

We performed three experiments. We briefly introduce each of them here to clarify why we performed it.

1. The first experiment, performed on Jakarta Tomcat, was executed in order to validate our hypothesis about window sizes. Starting from the same event trace, we used different window sizes when applying our algorithm.

2. The second experiment recreates the first one, but this time for our other case study, namely Fujaba.

3. The third experiment focuses on a slightly different aspect. We wanted to know how a very specific usage scenario would be projected onto the dissimilarity graph. Therefore, we defined a usage scenario with a small number of repetitive actions in it and looked at the results of our heuristic. This experiment specifically zooms in on our third research question (see Section 8.1.2): whether it is possible to spot repetition at the macro-level.

As a final note, we wish to add that for all three experiments we made use of the filtering technique that eliminates method calls to classes from the Java API; see also Table 8.1.

5 More information can be found at: http://tomcat.apache.org/. In 2005 Tomcat became a project on its own and left the Jakarta umbrella. It now belongs directly to the Apache set of tools and applications.

6 Fujaba stands for "From UML to Java and Back Again"; more information on this project can be found at: http://www.uni-paderborn.de/cs/fujaba/


8.4 Results

8.4.1 Jakarta Tomcat 4.1.18

Experiment 1

As we pointed out in the previous subsection, this experiment was set up to show the results of varying the window size in our heuristic. We discuss the results of the experiment by looking at Figures 8.4 through 8.7. These figures represent the dissimilarity value of a group of methods, the current window, at a certain point in time during the execution of the program. As such, the X-axis can be interpreted as time; the Y-axis is then the dissimilarity value.

For the purpose of detecting the frequency patterns we talked about earlier, we zoomed in on an interval of the chart in Figure 8.7. The result is shown in Figure 8.9.

When comparing the results of our first experiment with the hypotheses we introduced in the previous section, where does this leave us?

1. From Figures 8.4 through 8.7 it is clear that regions where the dissimilarity is near-zero are rather limited; in this trace we can only detect a handful of them. Frequency patterns, however, are much more frequent: just look at Figure 8.9, where between index 86000 and 99000 on the X-axis there is a clear repetition in the dissimilarity measure.

2. Increasing the window size does not seem to have an influence on the regions with near-zero dissimilarity. This is mainly due to the fact that the execution sequences in these regions remain constant for some time, i.e., the execution pattern is longer than the (large) window size. Experimenting with window sizes in the neighborhood of 100, however, does show that noise is introduced. This is true both for the regions with near-zero dissimilarity and for the frequency patterns. On the other hand, frequency patterns are more easily discernible with slightly larger window sizes: in Figures 8.6 and 8.7, for example, they are much easier to spot than in Figures 8.4 and 8.5.

Before moving on to our second experiment, we first turn our attention to the specifics of the frequency patterns already mentioned. Some intervals show a recurring pattern in the dissimilarity measure. We took Figure 8.7 and blew up the interval [80000, 100000] on the X-axis. The result is shown in Figure 8.9.

[Chart omitted: dissimilarity value (Y-axis, ×10^5) plotted against time, expressed in # events (X-axis).]

As the chart shows, until the 150 000th X-value the dissimilarity measure (Y-axis) remains low. After that there is a small period where the dissimilarity is near-zero. An interval where the dissimilarity is low points to a high repetition of method invocations (either identical method invocations or method invocations related through their frequency of invocation). The most common instance of this kind of repetition is, for example, the traversal of a linked list.

Figure 8.4: Tomcat with dissimilarity measure using window size 2

Frequency patterns are even more interesting than the regions that have a near-zero dissimilarity value. Why? Because (1) these frequency patterns are much more common, and (2) given the polymorphic nature of object-oriented software, it is much more realistic to find clusters in which not every event is executed the same number of times over and over again. This can be explained by the late-binding mechanism, in which the exact method invocation depends on the type of data to be processed. We illustrate this with an example; consider Figure 8.8.

In the example from Figure 8.8, after event_a and event_b have been executed, polymorphism leaves a choice between, for example, events c, d or events x, y.

Suppose f_a = f_b = f_e and f_a ≠ f_c, f_a ≠ f_d. Neither execution sequence 1 nor execution sequence 2 would then yield a zero dissimilarity value. The chance that f_c = f_x and f_d = f_y is rather slim. That is why both execution sequences give rise to a unique frequency signature: unique, because when f_c ≠ f_x or f_d ≠ f_y they will certainly generate different values


[Chart omitted: dissimilarity value (Y-axis, ×10^5) plotted against time, expressed in # events (X-axis).]

Low or almost-zero values for the dissimilarity measure are still clearly visible when using a window size of 5 events. No extra places with a low dissimilarity value have appeared, so there are no false positives to report. The false negatives did not materialize either: no regions where the dissimilarity value is near-zero have disappeared with regard to Figure 8.4.

Figure 8.5: Tomcat with dissimilarity measure using window size 5

for the dissimilarity measure.
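This argument can be made concrete with assumed frequencies for the events of Figure 8.8; the numbers below are purely illustrative and not taken from the case studies.

```python
import math

def dissimilarity(window):
    # Sum of absolute differences between consecutive frequencies.
    return sum(math.sqrt((window[j - 1] - window[j]) ** 2)
               for j in range(1, len(window)))

# f_a = f_b = f_e, but the variation points differ: f_c != f_x, f_d != f_y.
seq1 = [500, 500, 120, 120, 500]  # events a, b, c, d, e
seq2 = [500, 500, 380, 380, 500]  # events a, b, x, y, e

d1, d2 = dissimilarity(seq1), dissimilarity(seq2)
# Both values are nonzero (380 + 380 = 760 and 120 + 120 = 240) and they
# differ: each execution sequence leaves its own frequency signature.
```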

Recording these frequency patterns as clusters when they are present multiple times in the event trace is a good idea, because they have some interesting properties:

• They often tend to repeat themselves in the same locality.

• Manual inspection of the trace taught us that frequency patterns are much more realistic: they are not concentrated around a small number of methods and are constituted of a variety of method invocations, often originating from many different classes. As such, these clusters are much more representative of large-scale object-oriented systems.


[Chart omitted: dissimilarity value (Y-axis, ×10^5) plotted against time, expressed in # events (X-axis).]

When doubling the window size to 10, there is still no indication of false negatives. Intervals with low dissimilarity are still easily discernible.

Figure 8.6: Tomcat with dissimilarity measure using window size 10

8.4.2 Fujaba 4.0

For this case study we opted to perform two separate experiments. One experiment is a repeat of the Tomcat experiment, but this time on Fujaba. The second is an experiment in which a scenario with some repetitive actions is observed.

Fujaba experiment 1

For this experiment, we do not show the results for all window sizes as we did for the Tomcat case study, but go straight to the largest window size, namely 20. In short, the conclusions from the Tomcat case remain valid: medium to large window sizes remain the most interesting for distinguishing the frequency patterns.

When looking at Figure 8.10, what immediately stands out is the oscillation of the dissimilarity measure in the interval [1, 35000]. From manual inspection, we learn that this behavior stems from the animated "splash


[Chart omitted: dissimilarity value (Y-axis, ×10^5) plotted against time, expressed in # events (X-axis).]

We again doubled the window size and have no indication of false negatives.

Figure 8.7: Tomcat with dissimilarity measure using window size 20

execution sequence 1      execution sequence 2
event_a                   event_a
event_b                   event_b
event_c                   event_x
event_d                   event_y
event_e                   event_e

Figure 8.8: Example of two execution traces with possible polymorphism

screen"7 from Fujaba. From index 35 000 onwards, we begin executing the scenario. This scenario consists of drawing a simple class hierarchy. Intuitively, it is logical to assume that drawing a number of classes also invokes a sequence of methods the same number of times. This is exactly what Figure 8.10 shows when you look at the interval [35000, 45000].

7 A splash screen is an introduction screen for a program that is starting up. In the case of Fujaba it is animated and has text scrolling over it. Graphically it is quite heavy, which may explain the heavily oscillating behavior of the dissimilarity measure.

[Chart omitted: dissimilarity value (Y-axis, ×10^5) plotted against time, expressed in # events (X-axis), restricted to the interval [80000, 100000].]

Between time-index 87000 and 98000 there is a clear pattern of repetition, which in the middle of that interval is slightly altered. Considering that this pattern ranges over around 10000 events, further investigation is warranted. Close inspection taught us that this frequency pattern is the traversal of a linked list. The slight alteration in the middle can be explained by polymorphism: not all elements in the linked list have the same dynamic type, which causes a slight distortion at this point.

Figure 8.9: Blowup of the interval [80000, 100000] of Figure 8.7 to show frequency patterns

Although this experiment is not a good example of the near-zero dissimilarity measure, it supports the frequency-pattern theory. The regular pattern that is visible after X-index 35000 is a good example of this.

Fujaba experiment 2

Remaining with Fujaba, we conducted a second experiment. We defined a specific usage scenario with a highly repetitive nature. This scenario can be described as follows: after starting the program, we defined a class hierarchy. The hierarchy consisted of one abstract base class and several child classes, which themselves also had a number of child classes. The total hierarchy consisted of 8 classes with a maximum nesting depth of 3.

Intuitively, we expect the visualization of the dissimilarity metric to show an 8-time repetition. Figure 8.11 shows that this is indeed the


[Chart omitted: dissimilarity value (Y-axis, ×10^4) plotted against samples (X-axis, ×10^4).]

The most interesting interval is [35000, 45000]. Here we clearly see a four-time repetition pattern: first there are two repetitions, then there is a sudden drop in the dissimilarity, characterized by the thin white line in the visualization, before there is again a two-time repetition, identical to the first.

Figure 8.10: Fujaba with dissimilarity measure using window size 20

case. The graph shows 9 peaks in the dissimilarity value. Although these are interesting, we are more interested in the 8 intervening "valleys" (or depressions). The reason that these 8 regions are valleys and not peaks is that the methods working together to draw such a class are closely related through their frequencies, which generates lower dissimilarity values.

These 8 valleys point to the functionality that is activated for drawing the class that is added to the hierarchy. Note, however, how the valleys become more stretched as we add more classes to the hierarchy. Inspection of the trace showed that this is due to the layout algorithm, which needs more actions to perform the (re)layout operation because of the higher number of objects that have to be placed.

Instead of showing listings from the actual trace to demonstrate the repetitive nature of the actions that can be seen around the X-axis interval [44 000,


[Chart omitted: dissimilarity value (Y-axis, ×10^5) plotted against samples (X-axis, ×10^4); the 8 "valleys" are annotated on the graph.]

This graph shows the dissimilarity evolution of a Fujaba scenario with a high degree of repetition. The executed scenario consisted of drawing a class hierarchy of 8 classes. The 8 corresponding "valleys" are annotated on the graph. Note that the valleys become somewhat larger towards the end; this can be attributed to the fact that the layout algorithm has to be called more times as more objects are placed on the drawing canvas.

Figure 8.11: Fujaba scenario with a high degree of repetition

54 000], we decided to use techniques for the detection of duplicated code. This allows us to show that the valleys in Figure 8.11 contain a lot of repetition in the executed methods. This evidences only the repetitive nature of method invocations when performing a specific functionality. The second aspect, namely that methods working together to achieve a common goal have the same (or related) method invocation frequency, became clear after manual inspection of the annotated trace (see also Section 8.2, Step 4).

The duplicate-code detection tool we used is called Duploc [Ducasse et al., 1999]. This tool visualizes code duplication as a dotplot. The visualization should be seen as a matrix, where both the X-axis and the Y-axis represent lines in the file. Every time an identical line is found, a black dot is placed. So, when comparing a file which contains absolutely no duplication with itself, all the dots on the main diagonal, and only those, will be marked. However, when duplication is


present in the file, other dots will also be marked. For example, when the i-th line is identical to the j-th line, the dot with coordinates (i, j) will be marked black. Duploc extends this basic principle with what is called a mural view, which makes it possible to scale the dotplot principle so that a small matrix of dots (e.g. 4) is replaced by one dot in the mural view. The color intensity of the dot in the mural view is determined by the number of dots in the matrix that are marked. As such, the intensity can range from white (no duplication), through shades of grey, to black, when all 4 dots in the matrix indicate duplication.
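The dotplot and mural principles can be sketched as follows; this is our own toy reconstruction of the idea, not Duploc's actual code.

```python
def dotplot(lines):
    """Compare a sequence of trace lines with itself: a 1 at (i, j) means
    line i equals line j, so the main diagonal is always marked."""
    return [[1 if a == b else 0 for b in lines] for a in lines]

def mural(dots, block=2):
    """Mural view: each block x block area collapses into one cell whose
    value (0 .. block*block) determines the grey intensity."""
    n = len(dots)
    return [[sum(dots[i + di][j + dj]
                 for di in range(block) for dj in range(block)
                 if i + di < n and j + dj < n)
             for j in range(0, n, block)]
            for i in range(0, n, block)]

# Hypothetical trace fragment with a repeated a(), b() sequence.
trace = ["a()", "b()", "a()", "b()"]
dots = dotplot(trace)
# Off-diagonal dots such as (0, 2) reveal the repetition.
```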

The result of applying Duploc is shown in the mural view of Figure 8.12. Two interesting properties of this figure are:

1. (short) lines that run parallel to the main diagonal. These point to (quite lengthy) duplication.

2. recurring patterns in the lower-right quadrant of the figure. The very similar shapes that can be spotted there also point to a lot of repetition in the execution trace.

Moreover, when we compare this with the findings from Figure 8.11, we find that the regions which are white in Figure 8.12 are the regions which come out as "peaks" in Figure 8.11. White regions point to no duplication. This evidences the fact that the methods which are performed during the peaks can in fact be seen as glue code. This is in accordance with our earlier findings from the dissimilarity value: regions with a high degree of repetition (and/or methods that work together) show a relatively low dissimilarity value.

8.5 Discussion

By analyzing the charts we have presented in this chapter, combined with the evidence we found in the execution traces and our knowledge of the internals of the case studies themselves, we have made the following observations:

1. Regions with a near-zero dissimilarity value are easy to spot, even with a window size that is quite large. This means that we can easily use a big window size, thus reducing the amount of data, and still find sequences of events that logically form a whole.

2. Frequency patterns are much more common than the first type of cluster. How common they are exactly is difficult to state at the moment. We presume that the size of the program, i.e. the number of classes and methods, plays a crucial role. Programs in which certain actions are performed frequently also form better candidates for detecting frequency patterns; both Tomcat and Fujaba fall into this category. From our experiences with the two case studies presented here, our prediction is that, of the full event trace, some 70% of the events can be catalogued as belonging to a detected cluster. This number sounds reasonable, but is nevertheless perhaps not optimal. A full 100%, however, can in our opinion never be reached, because of the necessary "glue code" between components of a large software system.

Figure 8.12 shows a mural view of the trace in the interval 44 000 till 54 000. This mural view is produced by Duploc [Ducasse et al., 1999], a tool for detecting duplicated code. In short, this technique plots a point every time a duplicate line in the event trace is found. Logically, the diagonal (from top left to bottom right) always contains such a dot. It becomes more interesting when other lines and/or patterns are visible: these point to actual duplication.

Figure 8.12: Duploc output of part of the trace (event interval 44 000 to 54 000).

3. The experiment in which we used a scenario with a highly repetitive nature taught us that it is quite easy to spot functionality when using our heuristic. A group of methods working together to reach a common goal leaves behind a very characteristic frequency pattern.

8.5.1 Connection with hypothesis

Now that we have the results from our experiments, we want to see how they relate to the hypothesis we set out in Section 8.3.1. We reprise our four-part hypothesis and discuss how it matches and diverges from the results we have obtained.

1. The majority of the found clusters will in fact be frequency patterns. The evidence from our two case studies does indeed indicate that frequency patterns are more numerous than the regions with a near-zero dissimilarity value.

2. Enlarging the window size introduces noise in the frequency signatures. A large window size makes the analysis step more efficient, because there is less data for the end user to go through. In general, frequency patterns are also easier to distinguish, but with a window size that is too large, frequency patterns can sometimes disappear from the visualization.

3. Shrinking the window size introduces noise in the results. When the window size is set too small, patterns appear in the visualization that are not really there when browsing the actual trace. As such, very small window sizes (e.g. window size 2) should be avoided.

4. Regions in which a certain action is repeated become easily discernible. As evidenced by Figures 8.9 and 8.11, this is true for the repetition of internal functions and use-case scenarios, respectively.

8.5.2 Connection with the research questions

In Section 8.1.2 we set out three research questions that we hoped to answer within this research track. We recapitulate these research questions before turning to the actual discussion:

1. Can we use the relative execution frequency to distinguish tightly collaborating methods or procedures in a trace?

2. Can we make a visual representation of the execution trace that is at the same time scalable and allows us to identify these tightly collaborating entities?

3. Is it possible to use this visualization to help the end user navigate through the trace and let him/her skip parts of the trace that are similar or identical?

We believe that the visualization we have presented makes it possible to distinguish tightly collaborating entities. The best evidence that we have for


this claim is Figure 8.9, which visualizes an operation being performed on a self-implemented linked list. The highly repetitive nature of the heartbeat visualization typically points to tightly collaborating entities of execution. Figure 8.11, on the other hand, is interesting because the heartbeat visualization is characterized by 8 "valleys". These 8 valleys correspond to the 8-time repetition of a specific use case. The fact that the repetition of a use-case scenario is visible as a valley in the heartbeat visualization indicates that the entities participating in that use case have very similar execution frequencies. Similar frequencies lead to low(er) dissimilarity measures, which in turn are visualized as valleys.

Furthermore, the example of the linked list indicates that it is possible to identify repetition within a trace at the micro-level, while the 8-valley example shows that the repetition of end-user functionality can be distinguished in the visualization at the macro-level.

8.5.3 Open questions

After performing our case studies, some open questions remain.

We have not established an ideal window size, as it proved to be related to the size and structure of the program. More research can be spent on determining a window size that is acceptable for a wide range of programs.

A second open question concerns the dissimilarity measure used. Although the Euclidean distance is the most commonly used distance metric, it is perhaps not the best one for our type of experiment [Fraley and Raftery, 1998, Kaufman and Rousseeuw, 1990]. Future experiments with different distance metrics should bring clarity here.


Part IV

Industrial experiences


Chapter 9

Industrial case studies

The outcome of any serious research can only be to make two questions grow where only one grew before.

—Thorstein Veblen

Talking about how important scalability is when performing dynamic-analysis-based techniques for program comprehension does not mean much without actually demonstrating it on a large-scale case study. In this chapter we report on such a case study: both the coupling-based and the frequency-based approach that we introduced earlier are now tried on a large-scale industrial application. Besides presenting the results of these techniques, we also report on some common pitfalls that occur when working in a legacy environment, and more specifically on some difficulties in enabling dynamic analysis in such an environment.

9.1 Motivation

As we already mentioned in Chapter 1, this research was carried out within the ARRIBA research project. This generic research project has a user committee that is populated by industrial partners to ensure the industrial applicability of the research done by the academic partners. As such, we had the opportunity to validate our research within an industrial legacy environment.

In particular, the webmining heuristic and the frequency spectrum analysis that we introduced in Chapters 5 and 8, respectively, could now be validated in



an industrial context. Both techniques were initially fine-tuned and validated using open-source case studies. Using open-source case studies for our initial experiments allowed us to (1) ensure repeatability of the experiments for the scientific community and (2) prepare this industrial experiment without burdening the industrial partners in our research project too much during the development of the heuristics. Now, however, we could validate our techniques in an industrial setting.

When considering this opportunity we established 4 goals for this research track, namely:

1. We want to show the industrial relevance of the research conducted.

2. We want to validate whether the techniques that were developed in the context of object-oriented software would still function correctly in a procedural context.

3. Due to the sheer size of industrial applications, we want to ensure the scalability of the proposed techniques.

4. We want to perform a validation with real-life developers, instead of with documentation left behind by the developers. This allows for a more interactive approach and also for feedback loops that lead back into the research and development of these techniques.

This chapter will report on our findings with regard to the 4 goals we set out.

This work has been carried out in collaboration with Bram Adams and Kris De Schutter from the University of Ghent, Belgium. Both Bram and Kris are also active in the ARRIBA research project.

9.2 Industrial partner

The industrial partner that we cooperated with in the context of this research experiment is Koninklijke Apothekersvereniging Van Antwerpen (KAVA)1. Kava is a non-profit organization that groups over a thousand Flemish pharmacists. While originally set up to safeguard the interests of the pharmaceutical profession, Kava has evolved into a service-oriented provider offering a variety of services to their pharmacist members. Amongst these services is a tarification service; tarification is determining the price a patient pays for his/her medication based on his/her medical insurance situation. Once the price to be paid has been established through tarification, the patient pays the pharmacist the share of the price that is not covered by the insurance, after which the pharmacist makes a claim for the other share from the insurance institution through Kava. As such they act as a financial and administrative

1 http://www.kava.be/ (In English: The Royal Pharmacists Association of Antwerp)


go-between between the pharmacists and the national healthcare insurance institutions.

Kava was among the first in its industry to realize the need to automate this complex (tarification) process, and they have taken it on themselves to deliver this service to their members. Some 10 years ago, they developed a suite of applications written in non-ANSI C for this purpose. This suite carries the name ICA, an acronym for the Dutch Informatica Centrum Apotheek, which can be translated into “pharmacy information processing center”.

Due to successive changes in healthcare regulation, but also due to technology changes, the IT department at Kava is very much aware that refactoring and reengineering applications is an almost constant necessity.

Furthermore, during their recent migration from UnixWare to Linux they needed to make their application suite ANSI-C compliant. Over the course of this migration effort, it was noted that the documentation of the applications was outdated. This provided us with the perfect opportunity to undertake our experiments.

9.3 Experimental setup

Applying dynamic analysis entails the collection of run-time data. When collecting this data in a new environment, a number of technical or process-related choices need to be made. This section explains some of the choices we had to make during the experiment.

9.3.1 Mechanism to collect run-time data

Introduction to aspect-oriented programming

Aspect-orientation (AO) is a relatively new paradigm, grown from the limitations of object orientation (OO) [Kiczales et al., 1997], and a fortiori those of older paradigms. It tries to alleviate the problem of the “tyranny of the dominant decomposition” by proposing a solution to deal with crosscutting concerns, i.e. concerns which cannot be cleanly modularized by adhering to traditional object-oriented design principles. The proposed solution consists of the introduction of a dedicated module, called an aspect. More formally, aspects allow us to select by quantification (through pointcuts) which events in the flow of a program (join points) interest us, and what we would have happen at those points (advice). Hence we can ‘describe’ what some concern means to an application and have the aspect weaver match the pointcuts to the join points and insert the advice at the appropriate place(s).


Thus far, we have only mentioned OO environments, and that is also the direction AOP research was heading until recently. Nevertheless, it is important to recognize that crosscutting concerns have been in existence for many years without adequate solutions. This situation precedes the advent of object orientation and as such, deploying AOP solutions in legacy environments seems a good idea. This was the basic premise of the work carried out by Kris De Schutter and Bram Adams from the University of Ghent, who, in the frame of the ARRIBA project, developed Cobble [Lammel and De Schutter, 2005] and Aspicere2 [Zaidman et al., 2006a], AOP frameworks for Cobol and C respectively.

Why AOP?

Generating a trace in an industrial legacy environment is far from trivial. For our experiments, several constraints were in place:

C1 The semantics of the original applications should remain intact.

C2 We do not want to go into the original source code before applying our tools. I.e. the tools should be applicable to the source code “as is”. Otherwise, we would require knowledge of what is in the sources, and this is exactly what we are trying to recover.

C3 The tools should be deployable in other environments (operating systems, platforms, compilers, ...), so that performing other case studies or making the tools readily available to a wider audience should be no problem.

C4 The existing build hierarchy should remain in place, with only minimal alterations. To refactor the build system, considerable knowledge of its current internals is needed, but again this is lacking.

AOP offers some interesting solutions to these constraints. Furthermore, because Aspicere, the AOP solution we used, was built to work in legacy environments, it offers additional solutions that help overcome the constraints we previously set out.

1. Constraint C1 can be overcome by carefully writing the advice body of the tracing aspect, so that one can be assured that the original semantics of the target application remain unaltered. In our particular case a tracing aspect, which outputs information when entering and exiting a procedure, was needed. This advice preserves the original semantics.

2. The base program on which the aspect-oriented solution is applied is unaware of any changes. The AOP pointcut construct allows one to quantify

2 Aspicere is freely available from http://users.ugent.be/~badams/aspicere/


where to insert blocks of advice code. This obliviousness guarantees the satisfaction of constraint C2.

3. Aspicere is built as a preprocessor. Because the aspect weaver acts before the actual C compiler, the result of applying Aspicere on a source file is a new source file, ready to be compiled by the platform-specific C compiler. This approach ensures constraint C3.

4. Considering the choice of a preprocessor architecture, constraint C4 can be dealt with in two ways:

• Build an ad-hoc tool that scans the makefiles for calls to the compiler and adds a call to Aspicere, just before the call to the compiler. This can be seen as a precursor to an aspect weaver “avant-la-lettre” for makefiles.

• Redirect all calls to the compiler, e.g. gcc, to a custom-built script that first calls Aspicere and then does the actual call to gcc. This solution is presented in [Akers, 2005].

As we will see later on, the makefiles are characterized by a very heterogeneous structure, with calls to a variety of different compilers and tools. That is why we opted for the solution of building a simple ad-hoc tool that parses the makefiles and adds calls to Aspicere.

Tracing aspect

To collect the trace for this case study, we used two aspects: the one depicted below and a variant in which ReturnType is void.

ReturnType around tracing (ReturnType, FileStr) on (Jp):
    call(Jp, "^(?!.*printf$|.*scanf$).*$")
    && type(Jp, ReturnType) && !str_matches("void", ReturnType)
    && logfile(FileName) && stringify(FileName, FileStr)
{
    FILE* fp = fopen(FileStr, "a");
    ReturnType i;
    fprintf(fp, "before ( %s in %s ) \n",
            Jp->functionName, Jp->fileName); /* call sequence */
    fflush(fp);
    i = proceed(); /* continue normal control flow */
    fprintf(fp, "after ( %s in %s ) \n",
            Jp->functionName, Jp->fileName); /* return sequence */
    fclose(fp);
    return i;
}


9.3.2 Execution scenario

Finding an appropriate execution scenario to perform a dynamic analysis solution is quite often not straightforward. Having a number of developers readily available to help with this choice is of course of great benefit. Therefore, we went along with the proposal of the developers to trace the so-called TDFS3 application. The developers often use this application as a final check to see whether adaptations in the system do not have any unforeseen consequences. As such, it should be considered as a functional application, with a real-world purpose delivering the results intended, but also as a form of regression test.

The TDFS application produces a digital and detailed invoice of all prescriptions for the healthcare insurance institutions. This is the end stage of a monthly control and tarification process, and it also acts as a control procedure, as the results are matched against the aggregate data that is collected earlier in the process.

9.3.3 Details of the system under study

Table 9.1 provides some facts about the application.

Name                     “ICA”
Number of C modules      407
LOC                      453 000 (non-comment, non-blank)
Build process            GNU make, hierarchy consisting of 269 individual makefiles
Current build platform   Linux: vanilla Slackware 10.0
Status                   in use for > 10 years

Table 9.1: System passport

9.4 Results

This section will cover the results we have obtained from applying frequency spectrum analysis and webmining on the trace we have obtained from running the TDFS application according to the execution scenario that was provided to us by the developers.

3 TDFS is an acronym for the Dutch Tariferings Dienst Factuur (en) Statistiek (spoor). Freely translated this would be “Tarification Service for Invoices and Statistics” in English.


9.4.1 Experimental setup of the validation phase

For the particular application we considered, TDFS, two developers were available at Kava. From now on we will call them D1 and D2. Both have a thorough knowledge of the structure and the inner workings of this particular application.

Before we discussed our findings with the developers, we interviewed them separately. During this interview we used a schema where we asked three questions about the 15 modules belonging to the TDFS application:

1. Which module is the most essential?
2. Which module tends to contain the most bugs?
3. Which module is the hardest to debug?

We noted their answers and also asked if there were any particular reasons why they believed a certain module to be important, hard to debug or likely to contain bugs. This questionnaire was particularly useful to validate the results we had obtained from the webmining approach.

We then presented the results we had obtained, technique by technique, to each of the two developers separately and wrote down their reactions, questions and/or suggestions. Afterwards, during a short session we discussed the results with both developers and highlighted similarities and differences in their answers and/or reactions.

During the final stage of our experiment there was a feedback loop back to the Kava development team, in which we discussed a number of constructs that could be removed from the code in order to make future maintenance easier.

9.4.2 Webmining

Resultset

Table 9.2 lists the results of applying the webmining heuristic to the Kava case study. The modules (1st column) are ranked according to their hubiness value (2nd column). Due to the normalization, all hubiness values lie in the range [0, 1].

Module             Hubiness value
e_tdfs_mut1.c      0.814941
tdfs_mut1_form.c   0.45397
tdfs_bord.c        0.397726
tdfs_mut2.c        0.164278
tools.c            0.164278
io.c               0.12548
csrout.c           0.0321257
tarpargeg.c        0
csroutines.c       0
UW_strncpy.c       0
td.ec              0
cache.c            0
decfties.c         0
weglf.c            0
get_request.c      0

Table 9.2: Results of the webmining technique

Some important facts that can be derived from Table 9.2 are:

• the heuristic clearly makes module e_tdfs_mut1.c stand out;

• only 7 out of the 15 modules have a value greater than zero. Modules with a hubiness value of zero do not call other modules. As such, import coupling for these modules is non-existent4, while export coupling levels are moderate to high;

• the 4 modules that are specific to the TDFS application show up in the 4 highest-ranked places.

Discussion with developers

D1 mentioned e_tdfs_mut1.c and tdfs_mut2.c as being the most essential modules for the TDFS application. io.c and cache.c are also important from a technical point of view, but are certainly not specific to the TDFS application, as they are used by many other applications of the system. D1 was actually surprised that cache.c was not catalogued as being more important. csrout.c and csroutines.c are difficult to debug, but the developers have only once had to change some details in these files in a time period of 10 years.

D2 clearly ranks the e_tdfs_mut1.c module as the most important and most complicated module: it contains most of the business logic. tdfs_mut2.c makes a summary of the operations carried out by e_tdfs_mut1.c and checks the results generated by e_tdfs_mut1.c. tdfs_mut1_form.c is mainly responsible for building up an interface for the end-user, while tdfs_bord.c is concerned with formatting the output.

4 Import coupling measured within the full ICA project. Import coupling could exist with regard to external libraries.


Discussion

As such, the opinions of D1 and D2 are indeed very similar. D1 ranks e_tdfs_mut1.c and tdfs_mut2.c as most important; D2 points to e_tdfs_mut1.c as the most important module.

The resultset of our own technique (see Table 9.2) clearly ranks e_tdfs_mut1.c as the most important module in the system. Furthermore, all modules that are specific to this application appear at the top of the ranking.

Drawbacks – threats to validity

From the resultset of this case study, we noted two drawbacks:

• Classes or modules that are containers, i.e. data structures with a number of operations defined on them, are often ranked very low by our heuristic. This can be explained by the fact that these modules are often self-contained, i.e. they do not rely on other classes or modules to do their work. As a consequence, these classes often have a high level of export coupling and a low level of import coupling. The webmining algorithm reacts to this by attributing a low hubiness value to these classes and, as such, a low ranking amongst other (non-container type) classes. These properties explain why cache.c – a caching data structure – which was expected to rank higher according to D1, is ranked quite low.

• This particular case actually also serves as a counterexample. Our heuristic places e_tdfs_mut1.c, tdfs_mut1_form.c, tdfs_bord.c and tdfs_mut2.c at the top of the ranking. It is exactly those four modules that are specific to the TDFS application, so a simple analysis of naming conventions would have sufficed in this particular case.

9.4.3 Frequency analysis

Due to the huge size of the event trace (90 GB ≈ 4.86 × 10^8 procedure calls), the visualization we presented in Chapter 8 did not scale up to this huge amount of data. Therefore, we opted for a slightly different solution. We still use frequency of execution as the underlying model, but summarize the results before visualizing.

A fragment of the result is shown in Figure 9.1; the full resultset can be found in Appendix B. Figure 9.1 depicts three “frequency clusters”. Each cluster shows the total execution frequency and the procedures that fall into this frequency interval. Different kinds of boxes can be perceived, to indicate the level of cohesion within a frequency cluster: a box with a full line (Figure 9.1.c) indicates that ≤ 50% of the procedures in the cluster come from the same module, a dashed line (Figure 9.1.a) indicates total cohesion, as all procedures belong to the same module, while a dotted line (Figure 9.1.b) indicates a level of cohesion within the frequency cluster of between 50 and 100%.

(a) 100% cohesion: 29986 – io::InitMyData, io::isopen
(b) > 50% cohesion: 28580 – e_tdfs_mut1::ReadCache, cache::Init_Periode, cache::memcpy
(c) ≤ 50% cohesion: 6093357 – tdfs_mut2::UW_atoi, UW_strncpy::atoi

Figure 9.1: Three frequency clusters from the TDFS application

In total, 237 unique procedures were executed during the scenario. Of these, 160 could be clustered into 25 frequency clusters (these can be found in Appendix B). In other words, 67.5% of the procedures could be catalogued in clusters. When considering the cohesion of each of these frequency clusters, we have the following distribution: two of these clusters had a full line, i.e. they did not show cohesion; 12 had a dashed line, meaning that all procedures within the frequency cluster originated from a single module; while the 11 others had a dotted line, also indicating a strong level of cohesion.

This technique provides an easy way to find procedures that share common goals, because they are related through their frequency of execution. Furthermore, it makes it easy to audit the system when it comes to cohesion.

Discussion with the developers

D1 immediately remarked that one of the two frequency clusters with a full line, i.e. a cluster with a limited degree of cohesion, was actually a wrapper construction they had hastily put together when performing the migration from UnixWare to Linux.

The clusters found did not surprise the developers either.

Discussion

For our particular case study, 48% of the clusters were found to be fully cohesive. These fully cohesive clusters account for 20% of the procedures. 44% were found to be strongly cohesive; these clusters contain 49%


of the total number of procedures. The largest non-cohesive cluster had a frequency of execution of 1, consisting mainly of procedures with initialization functionality. The other non-cohesive cluster was the one that caught D1's attention for containing wrapper functionality.

As such, we can conclude that the system is actually well structured, as most clusters were cohesive and these account for almost 70% of all procedures.

9.5 Pitfalls

This section describes some unexpected experiences we had while performing our dynamic analyses in the legacy context described in Section 9.2. Some of these experiences seem to be closely related to the usage of AOP for collecting our traces, but we have strong indications that other trace-collection mechanisms, e.g. the AST rewriting techniques presented by Akers [Akers, 2005], suffer from similar problems when applied in similar conditions.

For each experience, we also describe how we coped with it.

9.5.1 Adapting the build process

The Kava application uses make to automate the build process. Historically, all 269 makefiles were hand-written by several developers, not always using the same coding conventions. During a recent migration operation from UnixWare to Linux, a significant number of makefiles was automatically generated with the help of automake5. Although a sizeable portion of the 269 makefiles are now generated by automake and thus have a standardized structure, a number of makefiles still have a very heterogeneous structure, a typical situation in (legacy) systems.

We built a primitive tool, which parses the makefiles and makes the necessary adaptations so that our AOP solution Aspicere is applied on each source-code file before this file is compiled. A typical before-and-after example of the necessary makefile modifications is shown in Figures 9.2 and 9.3. However, due to the heterogeneous structure of a portion of the makefiles, we were not able to completely automate the process, so a number of makefile constructions had to be manually adapted.

The adaptations to be made become more difficult when, e.g., Informix esql preprocessing needs to be done (see Figures 9.4 and 9.5).

5 Automake is a tool that automatically generates makefiles starting from configuration files. Each generated makefile complies with the GNU Makefile standards and coding style. See http://sources.redhat.com/automake/.


$(CC) -c -o file.o file.c

Figure 9.2: Original makefile.

$(CC) -E -o tempfile.c file.c
cp tempfile.c file.c
aspicere -i file.c -o file.c \
    -aspects aspects.lst
$(CC) -c -o file.o file.c

Figure 9.3: Adapted makefile.

.ec.o:
        $(ESQL) -c $*.ec
        rm -f $*.c

Figure 9.4: Original esql makefile.

.ec.o:
        $(ESQL) -e $*.ec
        chmod 777 *
        cp `ectoc.sh $*.ec` $*.ec
        $(ESQL) -nup $*.ec $(C_INCLUDE)
        chmod 777 *
        cp `ectoicp.sh $*.ec` $*.ec
        aspicere -verbose -i $*.ec -o \
            `ectoc.sh $*.ec` \
            -aspects aspects.lst
        $(CC) -c `ectoc.sh $*.ec`
        rm -f $*.c

Figure 9.5: Adapted esql makefile.

Our tool takes only a few seconds to go over the 269 makefiles and make the necessary alterations. Detecting where exactly our tool failed through makefile code inspections took several hours, and even some build cycles were lost because of remaining errors in the makefiles.

This tool, however primitive, can be seen as an AOP solution for makefiles “avant-la-lettre”. We used simple string matching to detect places where the compiler was called and inserted an extra call to Aspicere before the actual call to the compiler.

9.5.2 Legacy issues

... impacting quality

Even though Kava recently migrated from UnixWare to Linux, some remains of the non-ANSI implementation are still visible in the system. In non-ANSI C, function declarations with an empty argument list are allowed; the actual declaration of the arguments is postponed to the corresponding function definitions. As is the case with ellipsis-carrying functions, discovery of the proper argument types must happen from their calling context. Because this type inferencing is rather complex, it is not yet fully integrated in Aspicere.


Instead of ignoring the whole base program, we chose to “skip” (as yet) unsupported join points, introducing some errors in our measurements. To be more precise, we advised 367 files, of which 125 contained skipped join points (one third). Of the 57015 discovered join points, only 2362 were filtered out, or a minor 4 percent. This is likely due to the fact that in a particular file lots of invocations of the same function were skipped during weaving, because it was called multiple times with the same or similar variables. This was confirmed by several random screenings of the code. These screenings also showed that there is no immediate threat to the validity of this particular experiment (as the skipped join points were not located in files that belonged to the TDFS package). Nevertheless, similar situations in other cases could impact the validity of the resultset.

... impacting performance

Another fact to note is that we constantly opened, flushed and closed the tracefile, certainly a non-optimal solution from a performance point of view. Normally, Aspicere's weaver transforms aspects into plain compilation modules and advice into ordinary functions of those modules. So, we could get hold of a static file pointer and use it throughout the whole program. However, this would have meant that we had to revise the whole make hierarchy to link these unique modules in. Instead, we added a “legacy” mode to our weaver in which advice is transformed into functions of the modules of the advised base program itself. This way, the make architecture remains untouched, but we lose the power of static variables and functions.

9.5.3 Scalability issues

Compilation

A typical compile cycle of the original application, consisting of 407 C modules (453 KLOC in total), takes around 15 minutes6. With the introduction of the AOP solution into the build process, the compile cycle now looks like:

1. Preprocess
2. Weave with Aspicere
3. Compile
4. Link

While the original compile cycle for the whole system took 15 minutes, the new cycle lasts around 17 hours. The reason for this substantial increase in time can be attributed to several factors, one of which may be the time

6 Timed on a Pentium IV, 2.8 GHz running Slackware 10.0


needed by the inference engine for matching up advice and join points. There is also evidence that a lot of backtracking takes place, but the currently used Prolog engine [Denti et al., 2001] does not process this in an optimal way.

Running the program

Not only the compilation was influenced by our aspect-weaving process, but also the running of the application itself. The scenario we used (see Section 9.3.2) normally runs in about 1.5 hours. With our tracing advice added, it took 7 hours due to the frequent file I/O.

Tracefile volume

The size of the logfile also proved problematic. The total size is around 90 GB; however, the Linux 2.4 kernel Kava is using was not compiled with large-file support. We also hesitated to enable this afterwards because of the numerous libraries used throughout the various applications and the fear of nasty pointer arithmetic waiting to grab us. As a consequence, only files up to 2 GB could be produced, so we had to make sure that we split up the logfiles into smaller files. Furthermore, we compressed these smaller logfiles to conserve some diskspace.

Once compressed with gzip, the 90 GB of data was reduced to approximately 620 MB. 90 GB of trace data stands for approximately 9.72 × 10^8 events (calls and exits), which means that there are approximately 4.86 × 10^8 procedure calls.

Effort analysis

Table 9.3 gives an overview of the time-effort of performing each of the analyses. As you can see, even a trouble-free run (i.e. no manual adaptation of makefiles necessary) would take at least 29 hours when performing one analysis, and slightly under 40 hours when performing all analyses consecutively. Of course, some speed-ups can be obtained from running the two analyses in parallel.

9.6 Discussion

This chapter reports on a research track where we applied our recently developed techniques in an industrial legacy C context. This section presents a discussion on the goals of this research track and highlights strengths and weaknesses of the approach we have taken.


Task                   Time               Previously
Makefile adaptations   10 s               –
Compilation            17 h 38 min        15 min
Running                7 h                1 h 30 min
Frequency analysis     5 h                –
Webmining              10 h               –
Total                  39 h 38 min 10 s   1 h 45 min

Table 9.3: Overview of the time-effort of the analyses.

Goals

In Section 9.1 we set out 4 goals for this research track. During this discussion, we will again focus on each of these goals and see whether we achieved what we set out to do.

• Industrial relevance of the research conducted. Our experiments do not give conclusive evidence on whether there is strong industrial relevance for knowing which classes or modules are essential during early stages of program comprehension. The developers at Kava did point out, however, that such information is probably useful when instructing a new co-worker who is unfamiliar with the project.

• Validation of techniques in a procedural programming context. This third case study for both the frequency analysis and the webmining technique confirms the general tendencies found in the previous case studies, even though it is the first case study in a procedural programming language context.

• Scalability of the proposed techniques. Scalability is often seen as the major stumbling block when performing dynamic analysis [Larus, 1993]. We have taken special care during the design and development of our techniques to make them scalable. With regard to this aspect, we discuss the frequency analysis and the webmining techniques separately:

  – During the case study the webmining technique scaled more than adequately to the challenge of providing a resultset for an industrial medium-scale legacy application. In absolute terms, the 10 hour wait before the results are available is long, but we also have to consider that just reading the 90 GB of data from file takes a long time, even without performing any computation. Furthermore, we are aware of a number of possible optimizations to the algorithm, but at this point these remain untested.

  – With regard to the frequency analysis technique, we were disappointed to see that even though the basic frequency analysis technique works fine, generating the visualization proved unsuccessful due to the immense size of the trace. The scalability problem we encountered has two distinct facets: on the one hand, the trace could not be visualized in its entirety due to memory problems, while on the other hand, even if we could have visualized it, the resulting visualization would have been overwhelmingly large, which would have negatively impacted the cognitive scalability.

• Validation of the resultset with real-life developers. Having several developers (two in our case) cooperate implies that several opinions exist regarding the importance of certain classes or modules in a system. In this case study, however, the general direction was clear and there were no major discrepancies between the views of the two developers. Furthermore, the modules they pointed out as being most important were the same modules that our webmining technique ranked at the top.

Results

From the resultsets we obtained from our dynamic analysis experiments, we can conclude that:

• The webmining approach results in a ranking of modules according to their importance from a program comprehension point of view. Interviews with the developers fully confirm the results that our heuristic delivered. The only false negative we could note was a container class that the developers deemed important, but that was judged as being unimportant by our technique. This is due to the low to non-existent level of import coupling for this particular module.

• The frequency analysis approach allowed us to easily audit the system's internal structure. We found that most of the modules are (strongly) cohesive, which indicates that the structure is well balanced and reuse is a definite possibility. The developers agreed with our views and told us that many modules are frequently reused.

Technical limitations

As a vehicle to perform our dynamic analysis, we used Aspicere, which allowed us to use the clean, non-intrusive, yet powerful mechanism of Aspect Orientation to trace the entire application.

A clear downside of our approach is the effort it takes to perform the entire analysis. If no problems are encountered, the entire analysis we described takes around 39 hours, for a system that should be considered medium-scale. As such, we acknowledge that we should improve the efficiency of our tools.


Part V

Concluding parts



Chapter 10

Related Work

Programmers have become part historian, part detective, and partclairvoyant.

— Thomas A. Corbi

“Program understanding: Challenge for the 90s” is the title of a paper published in 1990 by Thomas Corbi in the IBM Systems Journal [Corbi, 1990]. In it he reminds us that a significant gain in efficiency can be attained when the program comprehension process is stimulated. No wonder, then, that over the last few years program comprehension has gained much attention and has been — and still is — an active area of research. In this chapter, we discuss some of these past and current research efforts.

10.1 Dynamic analysis

Dynamic analysis techniques come in many forms, and they usually have slightly different goals. Some techniques focus on retrieving features from execution traces; others aim at clustering a static representation of a software system with the help of dynamic information. In this section we discuss a variety of dynamic analysis based techniques, almost all of which share a common theme: helping the user to better understand the software system by presenting the user with an acceptable amount of information.

Greevy Greevy is working on a solution whereby the features of a software system can be correlated to classes and vice versa. To do this, she uses feature-traces, which are execution traces that result from executing a very specific feature (or a very small set of features). When a number of these feature-traces are available, she is able to classify classes as being responsible for only one feature, a set of features, or all features available in the system. Vice versa, she also catalogs features that demand services from one method or class, a number of methods or classes, or all methods or classes in the system [Greevy and Ducasse, 2005].

Hamou-Lhadj Hamou-Lhadj has proposed several solutions to overcome the scalability issues surrounding dynamic analysis. One of the solutions he has been working on is to automate the selection of which classes (or other entities) to include in the execution trace and the subsequent analysis. Where in our experiments we explicitly did not trace any classes that are part of the standard library, the solution provided by Hamou-Lhadj would automate this up to a certain point. The basic idea is to detect those classes and entities in the software system that can be classified as utility components and subsequently remove them from the analysis process. The basic means by which these utility components are detected is a fan-in analysis [Hamou-Lhadj et al., 2005].
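The fan-in idea can be sketched in a few lines of Python. This is an illustrative reading of the technique, not Hamou-Lhadj's actual implementation; the function name and the call-pair input format are our own:

```python
def fan_in(calls):
    """Fan-in per callee: the number of distinct callers.

    `calls` is a sequence of (caller, callee) pairs; entities with a high
    fan-in are candidate utility components to filter out of the trace.
    """
    callers = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)
    return {callee: len(cs) for callee, cs in callers.items()}
```

An entity such as a logging routine, called from many distinct places, would top this ranking and could then be excluded from subsequent analysis.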

Another solution Hamou-Lhadj has presented is trace summarization. He describes how a number of concepts that are also used when summarizing natural-language text can be helpful when trying to summarize execution traces, e.g., by extracting important methods based on naming conventions [Hamou-Lhadj and Lethbridge, 2006].

Furthermore, Hamou-Lhadj also advocates the use of a meta-model to store dynamic runtime information from object-oriented systems, termed the Common Trace Format or CTF [Hamou-Lhadj, 2005b, Hamou-Lhadj and Lethbridge, 2004].

Mancoridis et al Based on the clustering tool Bunch, which was developed by Mancoridis et al [Mancoridis et al., 1999], Gargiulo and Mancoridis developed Gadget, a tool to cluster the entities of a software system based on dynamically obtained data [Gargiulo and Mancoridis, 2001]. The goal of using Gadget is to make the often complex structure of software systems more explicit and easier to understand.

Gadget builds up a dynamic dependency graph, a graph in which classes are represented as nodes and calling relationships as edges. These calling relationships are extracted from the obtained execution trace. To this graph they then apply Bunch, which delivers a clustering of the original graph. This approach is very similar to what our webmining approach does with the compacted call graph (see Chapter 5).
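The dynamic dependency graph that both approaches start from can be sketched as follows; the representation (a dictionary of weighted edges built from flat call-event pairs) is our own simplification:

```python
def dynamic_dependency_graph(trace):
    """Class-level dynamic dependency graph from an execution trace.

    `trace` is a sequence of (caller_class, callee_class) call events;
    nodes are classes, and each edge carries the number of observed calls.
    """
    edges = {}
    for caller, callee in trace:
        if caller != callee:  # ignore self-calls
            edges[(caller, callee)] = edges.get((caller, callee), 0) + 1
    return edges
```

A clustering tool such as Bunch, or a ranking algorithm such as HITS, can then be run on this weighted graph.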

Because of the similarities between our own approach and that of Gadget, we did an initial experiment to see whether the clusters identified by Bunch had counterparts in the resultset of our webmining approach. To our surprise, there was no clear match between the two resultsets; as such, we see the further analysis of these two techniques as an important direction for future research.

Richner et al Richner's and Ducasse's approach is based on storing both statically and dynamically obtained information from a software system in a logic database [Richner, 2002]. First, static and dynamic facts of an object-oriented application are modeled as logic facts, after which queries can be formulated to obtain information about the system. As a case study they use HotDraw, implemented in Smalltalk [Richner and Ducasse, 1999].

In order to overcome scalability problems, they advocate an iterative use of the technique. This means that, having obtained a (high-level) view of the software through queries, the results of this view are used to restrict the tracing operation to the parts of the software one is trying to focus on. This allows for a refinement of the views obtained from using the tool.

The Collaboration Browser tool that they describe is explicitly targeted at recovering collaborations between classes, without having to rely on visualization techniques [Richner and Ducasse, 2002]. Its focus is on understanding the system in the small, rather than understanding the system as a whole. The underlying model, built around dynamically gathered information, is queried using pattern matching criteria in order to find classes and interactions of interest.

Systa To overcome the scalability issues of analyzing large execution traces through variations of Jacobson interaction diagrams [Jacobson, 1995], Systa uses the SCED environment to synthesize state diagrams from interaction diagrams [Systa, 2000b, Systa, 2000a]. State diagrams, which are a variation on UML statechart diagrams, make it possible to observe the total behavior of an object, while interaction diagrams focus more on sequential interactions between several objects.

Another research path Systa follows is the combination of static and dynamic information [Systa, 1999]. One of the observations made is that when combining static and dynamic information, one has to choose very early on which of these two sources of information will be the base layer and which approach will be used to augment this base layer. The experiment described deals with Fujaba, which is reverse engineered with the help of the Rigi static reverse engineering environment [Wong et al., 1995] and is augmented with dynamic information [Systa, 1999].

10.2 Visualization

Using dynamic analysis for program comprehension purposes means that you have to work your way around the often sizeable sets of dynamic information that get collected during a program run. A possible solution to overcome the size of these sets of information is a well thought-out visualization. This section describes some of the most common visualization-oriented research ideas.

De Pauw et al De Pauw et al are known for their work on IBM's Jinsight, a tool for exploring a program's run-time behavior visually [De Pauw et al., 2001]. Jinsight is a research prototype that first emerged from IBM's T.J. Watson Research Center in 1998. Since then a number of its features have been adopted in the Hyades plugin for the Eclipse Java IDE. In 2005 this plugin was absorbed into the Eclipse Test & Performance Tools Platform (TPTP).

One of the main program comprehension applications of Jinsight (and its derivatives) is the generation of Jacobson interaction diagrams [Jacobson, 1995], similar to UML's sequence diagrams. Even though this visualization is much more scalable than previous solutions for visualizing execution traces, there is still room for improvement. A more scalable visualization is proposed by De Pauw with the concept of the execution pattern notation [De Pauw et al., 1998]. See Figure 10.1 for an example.

Other possible uses of Jinsight and its derivatives are: following the behavior of multi-threaded object-oriented programs, detecting memory leaks, detecting hotspots, etc.

Jerding et al Jerding et al have developed a tool called ISVis (Interactive Scenario Visualizer) [Jerding et al., 1997]. One of its possible uses is to help alleviate the architecture localization problem, i.e., the problem of finding the exact location in a system's architecture where a specific enhancement can be inserted into the system [Jerding and Rugaber, 1997].

Figure 10.1: Simple interaction diagram (a) and its corresponding execution pattern (b) [De Pauw et al., 1998]

ISVis generates views of execution traces that are similar to Jacobson interaction diagrams [Jacobson, 1995]. The tool environment, however, makes it possible to create a more compact visualization, e.g. by grouping classes together in package-like structures, by removing utility classes, etc. Furthermore, it makes it possible to visually identify similar (sub)scenarios in execution traces, and it has limited capabilities to recognize these similar scenarios automatically through pattern matching. Another feature that helps improve scalability is a mural view that portrays global overviews of scenarios [Jerding and Stasko, 1998]. For completeness' sake, we mention that the approach of ISVis is actually a hybrid approach, wherein static and dynamic analysis are combined.

Ducasse et al Ducasse, Lanza and Bertuli describe how they use polymetric views, as used in the CodeCrawler tool [Lanza, 2003], to visualize a condensed set of run-time information [Ducasse et al., 2004]. Using a condensed set of information means that there is no need to keep and analyze the complete trace; rather, their approach is based on collecting measurements during the execution, such as the number of invocations, the number of object creations, the number of used classes/methods, etc.

With these run-time measurements, they are able to provide insight into a system in a relatively lightweight manner. They present their results in three different polymetric views, namely:

• Instance usage view: shows which classes are instantiated and used during the system's execution.

• Communication interaction view: shows the (strength of) communication between classes of a system during its execution.

• Creation interaction view: shows the number of instances a class creates and the number of instances each class has.

Reiss and Renieris Reiss and Renieris describe several techniques to encode program executions [Reiss and Renieris, 2001]. Their main concern is to offer a way to compact the trace. They use basic mechanisms such as run-length encoding and grammar-based encoding to shorten the trace.
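As a generic illustration of run-length encoding (not Reiss and Renieris's actual encoder), consecutive repeats of an event can be collapsed into (event, count) pairs:

```python
def run_length_encode(events):
    """Collapse consecutive repeated events into (event, count) pairs."""
    encoded = []
    for e in events:
        if encoded and encoded[-1][0] == e:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([e, 1])  # start a new run
    return [tuple(run) for run in encoded]
```

Highly repetitive traces, such as those produced by tight loops, compress particularly well under this scheme.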

Another approach they discuss is interval compaction. For this approach, they break the execution trace into a small set of intervals (for example 1024 events) and then perform a simple analysis within each interval to highlight what the system is doing at that point. Although they remain quite vague about the inner workings of their algorithm, the resulting visualization and the ideas behind it bear a resemblance to our own heartbeat visualization, which we use in combination with our frequency spectrum analysis.

Walker et al Walker et al describe a visualization that has a temporal component to it [Walker et al., 2000] [Walker et al., 1998]. The visualization consists of a temporally-ordered series of pictures, so-called cells, each detailing information about a corresponding point in time in the execution of the system being analyzed.

10.3 Industrial experiences

Wong et al Wong et al describe their experiences with re-documenting industrial legacy applications with the help of their Rigi static reverse engineering environment [Wong et al., 1995]. They have applied Rigi to COBOL, C and PL/AS1 systems. The PL/AS experiment described in [Wong et al., 1995] exhibits a close resemblance to our own experiments, as the goals and setting were very similar: a large scale industrial legacy application with 2M LOC and 1300 compilation units (here written in a proprietary language, not ANSI C). Because of the large scale of the application, they also focused on delivering scalable reverse engineering techniques. One of the most significant lessons they learned from their experiments is that in-the-large design documents describing the architecture of the software system's current state can be very beneficial for building up understanding of a software system and maintaining it. Furthermore, they have followed a path similar to ours when it comes to validating their approach, namely by involving the developers and maintainers and checking whether their mental models concur with the information retrieved. Another similarity with our own experiences is the effort it takes to perform their analysis, although we must remark here that the computing power available in 1995 is likely to be different from that available 10 years later.

1Programming Language/Advanced Systems (IBM).


Chapter 11

Conclusion

I hope you become comfortable with the use of logic without being deceivedinto concluding that logic will inevitably lead you to the correct conclusion.

—Neil Armstrong

This chapter presents our conclusions with regard to the heuristics we have developed and the experiments we have undertaken. Furthermore, it provides a number of possible directions for future research.

11.1 Conclusion

In our hypothesis (see Chapter 1) we state that within the run-time information space two axes, namely dynamic coupling and relative frequency of execution, are good candidates for developing heuristics for program comprehension purposes. We now discuss our experiences for each of these two axes separately.

11.1.1 Dynamic coupling

The heuristic that uses dynamic coupling measures makes it possible to identify the most need-to-be-understood classes in a system. Detecting these classes very early on in the program comprehension process allows the end user to direct his/her attention towards these classes and start exploring the software system from there.

We experimented with a number of different dynamic coupling metrics and also compared direct and indirect coupling solutions. To simulate this indirect coupling, we used the HITS webmining algorithm. Our experiments have shown that taking indirect coupling into account delivers the best results.

Using the publicly available, extensive documentation of two open source case studies, we have performed an intrinsic evaluation of this approach. The validation has taught us that we are able to recall 90% of the classes marked as need-to-be-understood by the developers, while maintaining a precision of 60%. These results are satisfactory, although in an ideal situation we would have liked an even higher level of precision.

We have also applied this technique in an industrial legacy C environment, where the approach again delivered good results, in the sense that the modules the developers designated as being important were ranked very high in the resultset of our approach.

With regard to scalability, our main point of focus, we have a somewhat mixed picture. Our approach makes it possible to process huge (e.g. 90 GB) event traces, but of course this takes time (in our industrial case study 10 hours). We believe that our approach can still be optimized, but we have to be realistic: processing gigabytes of event traces will always take time, as will the collection of the execution trace. On the cognitive scalability front, we are very much pleased that our resultset is concise, while still being relatively precise.

As a control experiment to see whether the effort of using dynamic information is indeed beneficial, we have experimented with applying the same basic technique to statically collected coupling data. While a slight improvement in round-trip time could be noted, we were also confronted with a drop in recall from 90% dynamically to 50% statically. Precision fell similarly, from 60% to 8%. This clearly indicates that using dynamic analysis, with its goal-oriented strategy, pays dividends when used for program comprehension purposes.
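The recall and precision figures quoted here follow the usual information-retrieval definitions; a minimal sketch (with hypothetical class names in the usage note below) is:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a reference set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, a heuristic that retrieves five classes, of which three were marked need-to-be-understood by the developers out of three such classes in total, scores a precision of 0.6 and a recall of 1.0.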

11.1.2 Relative frequency of execution

The heartbeat visualization, obtained by building a heuristic around the concept of relative frequency of execution, has enabled us to make an abstract visualization of the execution of a software system.

On a macro-level scale our visualization makes it possible to identify parts of a trace where the same — or similar — functionality is executed. As an example, we have drawn a simple class hierarchy in Fujaba — one of our case studies — consisting of 8 classes. The resulting heartbeat visualization clearly contains 8 valleys at the points in time where these 8 classes are drawn. Knowing that one of these valleys in the visualization is conceptually linked with the execution of the particular functionality, the end user can focus on studying the execution trace of only one of the applications of that functionality (instead of focusing on all 8).

On a micro-level scale, on the other hand, we have been able to distinguish the traversal of a self-implemented linked list in the heartbeat visualization. The complete traversal of the linked list in our example requires around 10 000 method exchanges, which, thanks to the visualization, can now be quickly skipped because of the high degree of similarity.

As such, both on a macro and on a micro scale, the visualization makes it possible to discern the repetitive calling of specific functionality, thereby allowing the user to quickly go over these similar regions in the execution trace (or the resulting interaction diagram visualization).

With regard to scalability, the open source case studies we performed have shown that the technique is fairly scalable. In the industrial case study, however, where we needed to visualize 90 GB of trace data, we were unable to visualize the trace in its entirety. We did, however, recover the basic underlying mechanism to produce the frequency clusters visualization. This has allowed us to make a quick assessment of the industrial application's structure.
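The core of the frequency analysis can be sketched as follows. This is a simplified reconstruction (the function names and the flat event-list input format are our own): methods executed equally often are grouped together, since such methods are likely to collaborate:

```python
from collections import Counter

def relative_frequencies(trace):
    """Relative frequency of execution for each method in a call-event trace."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {method: count / total for method, count in counts.items()}

def frequency_clusters(trace):
    """Group methods by identical call counts: one cluster per frequency."""
    clusters = {}
    for method, count in Counter(trace).items():
        clusters.setdefault(count, []).append(method)
    return clusters
```

On a real trace, the per-interval variant of these counts is what produces the peaks and valleys of the heartbeat visualization.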

11.2 Opportunities for future research

Aspect based slicing We see a clear opportunity for future research in a concept that we call “aspect based slicing”. Based on our research for identifying the important classes in a system, we want to go one step further by also identifying the key collaborations among these important classes and the collaborations that these important classes have with other tightly-related classes.

To accomplish this, we are thinking of using aspect orientation and more specifically the cflow pointcut, which would allow us to obtain a very selective trace of all methods that belong to the important classes and their immediate collaborators.

Static analysis and hybrid approaches Another path that we want to pursue in the future is to try and improve the effectiveness of our current approaches by also taking static information into account. This would lead to a hybrid approach, where the dynamic analysis results are augmented with static information.


Bunch As we have already indicated in Chapter 10, a thorough comparison of our approach and that of the Bunch clustering tool is also a viable research direction.


Part VI

Appendices



Appendix A

HITS webmining

A.1 Introduction

The HITS webmining algorithm we introduced in Chapter 5 is said to be convergent [Kleinberg, 1999]. This property of convergence implies that the algorithm will find a stable set of hub and authority nodes in a graph in a limited number of iterations. This appendix shows the proof of this convergence criterion, taken from Kleinberg [Kleinberg, 1999].

A.2 Setup and proof

Consider the following setting, taken directly from the domain of webmining.

Consider a collection V of hyperlinked pages as a directed graph G = (V, E): the nodes correspond to the pages and a directed edge (p, q) ∈ E indicates the presence of a link from p to q. Each page p is associated with a nonnegative authority weight x<p> and a nonnegative hub weight y<p>. We view pages with larger x-values and y-values as being “better” authorities and hubs, respectively.

We add an invariant that the weights of each type are normalized so their squares sum to 1:

∑_{p∈S} (x<p>)² = 1 ;  ∑_{p∈S} (y<p>)² = 1

The mutually reinforcing relationship between hubs and authorities is defined with the help of two operations on the weights; these operations are denoted by J and O. Given weights {x<p>}, {y<p>}, the J operation updates the x-weights as follows:

x<p> ← ∑_{q:(q,p)∈E} y<q>    (A.1)

The O operation, which updates the y-values, is defined as follows:

y<p> ← ∑_{q:(p,q)∈E} x<q>    (A.2)

Now, to find the desired equilibrium values for the weights, one can apply the J and O operations in an alternating fashion and see whether a fixed point is reached. Indeed, we can now state a version of our basic algorithm. We represent the set of weights {x<p>} as a vector x with a coordinate for each node in the graph G; analogously, we represent the set of weights {y<p>} as a vector y.

Iterate(G, k)
  G: a collection of n linked pages
  k: a natural number
  Let z denote the vector (1, 1, 1, ..., 1) ∈ Rn.
  Set x0 := z.
  Set y0 := z.
  For i = 1, 2, ..., k
    Apply the J operation to (x_{i-1}, y_{i-1}), obtaining new x-weights x'i.
    Apply the O operation to (x'i, y_{i-1}), obtaining new y-weights y'i.
    Normalize x'i, obtaining xi.
    Normalize y'i, obtaining yi.
  End
  Return (xk, yk).
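The Iterate procedure above translates almost directly into Python; the following sketch represents the graph as a list of directed (p, q) edges, which is our own choice of input format:

```python
import math

def hits(edges, k=50):
    """Iterative HITS: returns (authority, hub) weight dictionaries."""
    nodes = {n for edge in edges for n in edge}
    x = {n: 1.0 for n in nodes}  # authority weights, x0 = z
    y = {n: 1.0 for n in nodes}  # hub weights, y0 = z
    for _ in range(k):
        # J operation: x<p> <- sum of hub weights of nodes pointing to p
        x = {p: sum(y[q] for (q, r) in edges if r == p) for p in nodes}
        # O operation: y<p> <- sum of authority weights of nodes p points to
        y = {p: sum(x[q] for (r, q) in edges if r == p) for p in nodes}
        # normalize so the squares of each weight type sum to 1
        nx = math.sqrt(sum(v * v for v in x.values())) or 1.0
        ny = math.sqrt(sum(v * v for v in y.values())) or 1.0
        x = {p: v / nx for p, v in x.items()}
        y = {p: v / ny for p, v in y.items()}
    return x, y
```

On a graph where pages a and b both link to page c, the procedure converges to c as the sole authority, with a and b as equally good hubs.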

To address the issue of how best to choose k, the number of iterations, we first show that as one applies Iterate with arbitrarily large values of k, the sequences of vectors {xk} and {yk} converge to fixed points x∗ and y∗.

Let M be a symmetric n× n matrix. An eigenvalue of M is a number λwith the property that, for some vector ω, we have Mω = λω. The set of allsuch ω is a subspace of Rn, which we refer to as the eigenspace associatedwith λ; the dimension of this space will be referred to as the multiplicity ofλ. It is a standard fact that M has at most n distinct eigenvalues, each ofthem a real number, and the sum of their multiplicities is exactly n. Wewill denote these eigenvalues by λ1(M), λ2(M), . . . , λn(M), indexed in orderof decreasing absolute values, and with each eigenvalue listed a number of


times equal to its multiplicity. For each distinct eigenvalue, we choose an orthonormal basis of its eigenspace; considering the vectors in all these bases, we obtain a set of eigenvectors ω1(M), ω2(M), ..., ωn(M) that we can index in such a way that ωi(M) belongs to the eigenspace of λi(M).

For the sake of simplicity, we will make the following technical assumption about all the matrices we deal with:

|λ1(M)| > |λ2(M)| (A.3)

When this assumption holds, we refer to ω1(M) as the principal eigenvector, and all other ωi(M) as nonprincipal eigenvectors. When the assumption does not hold, the analysis becomes less clean, but it is not affected in any substantial way.

We now prove that the Iterate procedure converges as k increases arbitrarily.

Theorem A.2.1. The sequences x1, x2, x3, ... and y1, y2, y3, ... converge (to limits x∗ and y∗, respectively).

Proof. Let G = (V, E), with V = {p1, p2, ..., pn}, and let A denote the adjacency matrix of the graph G; the (i, j)th entry of A is equal to 1 if (pi, pj) is an edge of G, and is equal to 0 otherwise. One easily verifies that the J and O operations can be written x ← A^T y and y ← Ax, respectively. Thus, x_k is the unit vector in the direction of A^T (AA^T)^{k−1} z, and y_k is the unit vector in the direction of (AA^T)^k z.

Now, a standard result of linear algebra (see Kleinberg [Kleinberg, 1999]) states that if M is a symmetric n × n matrix, and v is a vector not orthogonal to the principal eigenvector ω1(M), then the unit vector in the direction of M^k v converges to ω1(M) as k increases without bound. Also (as a corollary), if M has only nonnegative entries, then the principal eigenvector of M has only nonnegative entries.
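This standard fact is easy to check numerically. The sketch below (the random symmetric matrix, iteration count, and tolerance are our choices, not part of the text) compares power iteration against a direct eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((4, 4))
M = B + B.T                      # a symmetric matrix with nonnegative entries
v = np.ones(4)                   # not orthogonal to the principal eigenvector
for _ in range(500):             # unit vector in the direction of M^k v
    v = M @ v
    v /= np.linalg.norm(v)

# principal eigenvector omega_1(M) via a direct eigendecomposition
w, V = np.linalg.eigh(M)
principal = V[:, np.argmax(np.abs(w))]

# the two directions agree up to sign
err = min(np.linalg.norm(v - principal), np.linalg.norm(v + principal))
```

Because M has nonnegative entries, the corollary also predicts that principal is sign-consistent (all entries nonnegative up to a global sign flip).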

Consequently, z is not orthogonal to ω1(AA^T), and hence the sequence {y_k} converges to a limit y∗. Similarly, one can show that if λ1(A^T A) ≠ 0 (as dictated by the assumption A.3), then A^T z is not orthogonal to ω1(A^T A). It follows that the sequence {x_k} converges to a limit x∗.

The proof of Theorem A.2.1 yields the following additional result (in theabove notation).

Theorem A.2.2. (SUBJECT TO ASSUMPTION A.3). x∗ is the principal eigenvector of A^T A, and y∗ is the principal eigenvector of AA^T.


In our experiments, we find that the convergence of Iterate is quite rapid; one essentially always finds that k = 20 is sufficient for the c largest coordinates in each vector to become stable, for values of c in the range that we use. Of course, Theorem A.2.2 shows that one can use any eigenvector algorithm to compute the fixed points x∗ and y∗; we have stuck to the above exposition in terms of the Iterate procedure for two reasons. First, it emphasizes the underlying motivation for our approach in terms of the reinforcing J and O operations. Second, one does not have to run the above process of iterated J/O operations to convergence; one can compute weights {x⟨p⟩} and {y⟨p⟩} by starting from any initial vectors x0 and y0, and performing a fixed bounded number of J and O operations.
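As a sketch of Theorem A.2.2, the fixed points can be computed directly with any eigenvector routine and compared against k = 20 rounds of the J/O process; the three-page example graph below is ours, chosen only for illustration:

```python
import numpy as np

# adjacency matrix of a small example graph:
# edges (p1,p2), (p1,p3), (p2,p3)
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

# x*: principal eigenvector of A^T A (authority weights)
wx, Vx = np.linalg.eigh(A.T @ A)
x_star = np.abs(Vx[:, np.argmax(wx)])

# y*: principal eigenvector of A A^T (hub weights)
wy, Vy = np.linalg.eigh(A @ A.T)
y_star = np.abs(Vy[:, np.argmax(wy)])

# k = 20 rounds of the J/O process reach the same fixed points
x, y = np.ones(3), np.ones(3)
for _ in range(20):
    x = A.T @ y                  # J operation
    y = A @ x                    # O operation
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)
```

Taking the absolute value of the eigh result fixes the arbitrary global sign of the returned eigenvector; this is safe here because the principal eigenvector of a nonnegative symmetric matrix has nonnegative entries.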


Appendix B

Frequency analysis results for TDFS

Frequency    Methods (Class::method)

20544829     UW_strncpy::strlen, UW_strncpy::strncpy
6093357      tdfs_mut2::UW_atoi, UW_strncpy::atoi
903149       tdfs_mut2::strncmp, tdfs_mut2::bereken_modulus, tdfs_mut2::fmod
28580        e_tdfs_mut1::ReadCache, cache::Init_Periode, cache::memcpy
29986        io::InitMyData, io::isopen
13961        e_tdfs_mut1::E_Berek_Remgeld_Specialiteit, cache::ConverteerMutualiteitscode, cache::ConverteerPatientencategorie
13259        cache::fd_MyData, cache::isread
11952        e_tdfs_mut1::CreateFak, e_tdfs_mut1::ReadDemut1, e_tdfs_mut1::CatApoMut
1272         e_tdfs_mut1::ReadFirstFakRec, e_tdfs_mut1::RewindTempFak
650          tdfs_mut1_form::sqli_curs_locate, tdfs_mut1_form::sqli_slct
642          csrout::field_count, csrout::form_fields
640          tdfs_mut1_form::system, tdfs_mut1_form::write_form, tdfs_mut1_form::sqli_curs_fetch, csrout::newwin, csrout::keypad
639          tdfs_mut1_form::start_curses, csrout::initscr, csrout::nonl, csrout::raw, csrout::noecho, csrout::wclear, tdfs_mut1_form::wrefresh, tdfs_mut1_form::write_msg, csrout::qiflush, csrout::wborder, csrout::wmove, csrout::waddnstr, csrout::wrefresh, csrout::delwin
637          e_tdfs_mut1::isclose, e_tdfs_mut1::Close, io::isrewcurr
2881         weglf::fgets, weglf::feof, weglf::fputs
87           tdfs_mut2::System, tdfs_mut2::system
80           tdfs_mut2::NegativeCodedStrToInt, tdfs_mut2::strlen, tdfs_mut2::CloseRemoveMut
13           csrout.c::new_field, csrout.c::set_field_back, csrout.c::set_field_fore, csrout.c::set_field_pad, csrout.c::set_field_just
8            tdfs_mut2::Write90Rec, tdfs_mut2::CreateDestin, tdfs_mut2::Write10Rec, tdfs_mut2::GetDate, tdfs_mut2::time, tdfs_mut2::localtime_r, tdfs_mut2::malloc, tdfs_mut2::strftime, csrout::field_opts, csrout::set_field_opts, tdfs_mut1_form::get_request, get_request::nodelay, get_request::wgetch, get_request::TranslateKey, get_request::FormMacros
6            csrout::set_fieldtype_arg, csrout::set_field_type, csroutines::waddnstr, csrout::atoi, csrout::strcpy
5            tdfs_mut2::isopen, tdfs_mut2::isstart, tdfs_mut2::isread
4            tdfs_mut2::ReadIndcijfers, tdfs_mut2::cisam_maak_indcijfers_key_1, tdfs_mut2::ldlong, csroutines::cntrwaddstr, csroutines::strlen
2            tdfs_mut1_form::sqli_curs_close, tdfs_mut1_form::sqli_prep, tdfs_mut1_form::sqli_curs_decl_dynm, tdfs_mut1_form::sqli_curs_open, csroutines::wborder, csroutines::cs_hline, csroutines::wattr_on, csroutines::wattr_off, csroutines::read_form
1            too big


Bibliography

[Akers, 2005] Akers, R. L. (2005). Using build process intervention to accommodate dynamic instrumentation of complex systems. In Proceedings of the 1st International Workshop on Program Comprehension through Dynamic Analysis (PCODA'05). Technical Report 2005-12, Department of Mathematics & Computer Science, University of Antwerp.

[Andrews, 1998] Andrews, J. (1998). Testing using log file analysis: tools, methods, and issues. In Proceedings of the 13th International Conference on Automated Software Engineering (ASE'98), page 157. IEEE Computer Society.

[Arisholm et al., 2004] Arisholm, E., Briand, L., and Foyen, A. (2004). Dynamic coupling measurement for object-oriented software. IEEE Transactions on Software Engineering, 30(8):491–506.

[Ball, 1999] Ball, T. (1999). The concept of dynamic analysis. In ESEC/FSE-7: Proceedings of the 7th European Software Engineering Conference held jointly with the 7th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 216–234. Springer-Verlag.

[Bennett, 1995] Bennett, K. (1995). Legacy systems: Coping with success. IEEE Software, 12(1):19–23.

[Biggerstaff et al., 1993] Biggerstaff, T. J., Mitbander, B. G., and Webster, D. (1993). The concept assignment problem in program understanding. In Proceedings of the 15th International Conference on Software Engineering (ICSE '93), pages 482–498. IEEE Computer Society.

[Brant et al., 1998] Brant, J., Foote, B., Johnson, R. E., and Roberts, D. (1998). Wrappers to the rescue. In Proceedings ECOOP '98, volume 1445 of LNCS, pages 396–417. Springer-Verlag.


[Briand et al., 1999] Briand, L. C., Daly, J. W., and Wust, J. K. (1999). A unified framework for coupling measurement in object-oriented systems. IEEE Transactions on Software Engineering, 25(1):91–121.

[Brin and Page, 1998] Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107–117.

[Brodie and Stonebraker, 1995] Brodie, M. and Stonebraker, M. (1995). Migrating Legacy Systems: Gateways, Interfaces & The Incremental Approach. Morgan Kaufmann.

[Chidamber and Kemerer, 1994] Chidamber, S. R. and Kemerer, C. F. (1994). A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6):476–493.

[Chikofsky and Cross II, 1990] Chikofsky, E. J. and Cross II, J. H. (1990). Reverse engineering and design recovery: A taxonomy. IEEE Software, 7(1):13–17.

[Corbi, 1990] Corbi, T. A. (1990). Program understanding: Challenge for the 90s. IBM Systems Journal, 28(2):294–306.

[de Oca and Carver, 1998] de Oca, C. M. and Carver, D. L. (1998). Identification of data cohesive subsystems using data mining techniques. In Proceedings of the 14th International Conference on Software Maintenance (ICSM'98), pages 16–23. IEEE Computer Society.

[De Pauw et al., 2001] De Pauw, W., Jensen, E., Mitchell, N., Sevitsky, G., Vlissides, J., and Yang, J. (2001). Visualizing the execution of Java programs. In Diehl, S., editor, Software Visualization: International Seminar, Dagstuhl Castle, Germany, May 20-25, 2001, volume 2269/2002 of Lecture Notes in Computer Science, page 151. Springer.

[De Pauw et al., 1998] De Pauw, W., Lorenz, D., Vlissides, J., and Wegman, M. (1998). Execution patterns in object-oriented visualization. In Proceedings of the 4th USENIX Conference on Object-Oriented Technologies and Systems (COOTS).

[Demeyer et al., 2003] Demeyer, S., Ducasse, S., and Nierstrasz, O. (2003). Object-Oriented Reengineering Patterns. Morgan Kaufmann.

[Denti et al., 2001] Denti, E., Omicini, A., and Ricci, A. (2001). tuProlog: A light-weight Prolog for Internet applications and infrastructures. In Practical Aspects of Declarative Languages, volume 1990 of LNCS, pages 184–198. Springer-Verlag.

[Ducasse et al., 2004] Ducasse, S., Lanza, M., and Bertuli, R. (2004). High-level polymetric views of condensed run-time information. In Proceedings of the 8th European Conference on Software Maintenance and Reengineering (CSMR 2004), pages 309–318. IEEE Computer Society.

[Ducasse et al., 1999] Ducasse, S., Rieger, M., and Demeyer, S. (1999). A language independent approach for detecting duplicated code. In Yang, H. and White, L., editors, Proceedings of the 15th International Conference on Software Maintenance (ICSM'99), pages 109–118. IEEE Computer Society.

[Eisenbarth et al., 2001] Eisenbarth, T., Koschke, R., and Simon, D. (2001). Aiding program comprehension by static and dynamic feature analysis. In 17th International Conference on Software Maintenance (ICSM'01), pages 602–611. IEEE Computer Society.

[El-Ramly et al., 2002] El-Ramly, M., Stroulia, E., and Sorenson, P. (2002). From run-time behavior to usage scenarios: an interaction-pattern mining approach. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 315–324. ACM Press.

[Fraley and Raftery, 1998] Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8):578–588.

[Gamma et al., 1995] Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison–Wesley.

[Gargiulo and Mancoridis, 2001] Gargiulo, J. and Mancoridis, S. (2001). Gadget: A tool for extracting the dynamic structure of Java programs. In Proceedings of the Thirteenth International Conference on Software Engineering & Knowledge Engineering (SEKE'01), pages 244–251.

[Gibson et al., 1998] Gibson, D., Kleinberg, J. M., and Raghavan, P. (1998). Inferring web communities from link topology. In UK Conference on Hypertext, pages 225–234.

[Gold et al., 2004] Gold, N., Knight, C., Mohan, A., and Munro, M. (2004). Understanding service-oriented software. IEEE Software, 21(2):71–77.


[Greevy and Ducasse, 2005] Greevy, O. and Ducasse, S. (2005). Correlating features and code using a compact two-sided trace analysis approach. In Proceedings of the 9th European Conference on Software Maintenance and Reengineering (CSMR 2005), pages 314–323. IEEE Computer Society.

[Gschwind et al., 2003] Gschwind, T., Oberleitner, J., and Pinzger, M. (2003). Using run-time data for program comprehension. In Proceedings of the 11th IEEE International Workshop on Program Comprehension (IWPC'03), pages 245–250. IEEE Computer Society.

[Hamou-Lhadj, 2005a] Hamou-Lhadj, A. (2005a). The concept of trace summarization. In Proceedings of the 1st International Workshop on Program Comprehension through Dynamic Analysis, pages 43–47. Technical Report 2005-12, Department of Mathematics & Computer Science, University of Antwerp.

[Hamou-Lhadj, 2005b] Hamou-Lhadj, A. (2005b). Techniques to Simplify the Analysis of Execution Traces for Program Comprehension. PhD thesis, University of Ottawa, Canada.

[Hamou-Lhadj et al., 2005] Hamou-Lhadj, A., Braun, E., Amyot, D., and Lethbridge, T. (2005). Recovering behavioral design models from execution traces. In Proceedings of the 9th European Conference on Software Maintenance and Reengineering (CSMR'05), pages 112–121. IEEE Computer Society.

[Hamou-Lhadj and Lethbridge, 2004] Hamou-Lhadj, A. and Lethbridge, T. (2004). A metamodel for dynamic information generated from object-oriented systems. Electr. Notes Theor. Comput. Sci., 94:59–69.

[Hamou-Lhadj and Lethbridge, 2006] Hamou-Lhadj, A. and Lethbridge, T. (2006). Summarizing the content of large traces to facilitate the understanding of the behaviour of a software system. In Proceedings of the 14th International Conference on Program Comprehension (ICPC'06), pages 181–190. IEEE Computer Society.

[Hamou-Lhadj et al., 2004] Hamou-Lhadj, A., Lethbridge, T. C., and Fu, L. (2004). Challenges and requirements for an effective trace exploration tool. In Proceedings of the 12th International Workshop on Program Comprehension (IWPC'04), pages 70–78. IEEE Computer Society.

[Jacobson, 1995] Jacobson, I. (1995). Object-Oriented Software Engineering: a Use Case driven Approach. Addison–Wesley.


[Jahnke and Walenstein, 2000] Jahnke, J. H. and Walenstein, A. (2000). Reverse engineering tools as media for imperfect knowledge. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE'00), pages 22–31. IEEE Computer Society.

[Jerding and Rugaber, 1997] Jerding, D. and Rugaber, S. (1997). Using visualization for architectural localization and extraction. In Proceedings of the Fourth Working Conference on Reverse Engineering (WCRE'97), page 56. IEEE Computer Society.

[Jerding and Stasko, 1998] Jerding, D. and Stasko, J. T. (1998). The information mural: A technique for displaying and navigating large information spaces. IEEE Transactions on Visualization and Computer Graphics, 4(3):257–271.

[Jerding et al., 1997] Jerding, D. F., Stasko, J. T., and Ball, T. (1997). Visualizing interactions in program executions. In Proceedings of the 19th International Conference on Software Engineering (ICSE'97), pages 360–370, New York, NY, USA. ACM Press.

[Kaufman and Rousseeuw, 1990] Kaufman, L. and Rousseeuw, P. (1990). Finding Groups in Data. Wiley-Interscience.

[Kiczales et al., 1997] Kiczales, G., Lamping, J., Menhdhekar, A., Maeda, C., Lopes, C., Loingtier, J.-M., and Irwin, J. (1997). Aspect-oriented programming. In Proceedings European Conference on Object-Oriented Programming, volume 1241 of LNCS, pages 220–242. Springer-Verlag.

[Kleinberg, 1999] Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.

[Lakhotia, 1993] Lakhotia, A. (1993). Understanding someone else's code: Analysis of experiences. Journal of Systems and Software, 23(3):269–275.

[Lammel and De Schutter, 2005] Lammel, R. and De Schutter, K. (2005). What does Aspect-Oriented Programming mean to Cobol? In AOSD '05: Proceedings of the 4th international conference on Aspect-oriented software development, pages 99–110, New York, NY, USA. ACM Press.

[Lanza, 2003] Lanza, M. (2003). Object-Oriented Reverse Engineering — Coarse-grained, Fine-Grained, and Evolutionary Software Visualization. PhD thesis, University of Berne.


[Larus, 1993] Larus, J. R. (1993). Efficient program tracing. IEEE Computer, 26(5):52–61.

[Lehman, 1998] Lehman, M. (1998). Software's future: Managing evolution. IEEE Software, 15(1):40–44.

[Lehman and Belady, 1985] Lehman, M. and Belady, L. (1985). Program Evolution: Processes of Software Change. Academic Press Professional, Inc., San Diego, CA, USA.

[Lethbridge and Anquetil, 1998] Lethbridge, T. C. and Anquetil, N. (1998). Experiments with coupling and cohesion metrics in a large system. Working paper, School of Information Technology and Engineering; also see http://www.site.uottawa.ca/tcl/papers/metrics/ExpWithCouplingCohesion.html.

[Linthicum, 1999] Linthicum, D. S. (1999). Enterprise Application Integration. Addison-Wesley.

[Lukoit et al., 2000] Lukoit, K., Wilde, N., Stowell, S., and Hennessey, T. (2000). Tracegraph: Immediate visual location of software features. In Proceedings of the 16th International Conference on Software Maintenance (ICSM'00), pages 33–39. IEEE Computer Society.

[Mancoridis et al., 1999] Mancoridis, S., Mitchell, B. S., Chen, Y.-F., and Gansner, E. R. (1999). Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings of the 15th International Conference on Software Maintenance (ICSM'99), page 50. IEEE Computer Society.

[Mens, 2000] Mens, K. (2000). Automating architectural conformance checking by means of logic meta programming. PhD thesis, Vrije Universiteit Brussel.

[Mock, 2003] Mock, M. (2003). Dynamic analysis from the bottom up. In ICSE 2003 Workshop on Dynamic Analysis (WODA'03).

[Moise and Wong, 2003] Moise, D. L. and Wong, K. (2003). An industrial experience in reverse engineering. In Proceedings of the 10th Working Conference on Reverse Engineering (WCRE'03), pages 275–284. IEEE Computer Society.

[Pennington, 1987] Pennington, N. (1987). Stimulus structures and mental representations in expert comprehension of computer programs. Cognitive Psychology, 19:295–341.


[Reiss and Renieris, 2001] Reiss, S. P. and Renieris, M. (2001). Encoding program executions. In Proceedings of the 23rd International Conference on Software Engineering (ICSE'01), pages 221–230. IEEE Computer Society.

[Renieris and Reiss, 1999] Renieris, M. and Reiss, S. P. (1999). ALMOST: Exploring program traces. In Proc. 1999 Workshop on New Paradigms in Information Visualization and Manipulation, pages 70–77. http://citeseer.nj.nec.com/renieris99almost.html.

[Richner, 2002] Richner, T. (2002). Recovering Behavioral Design Views: a Query-Based Approach. PhD thesis, University of Berne.

[Richner and Ducasse, 1999] Richner, T. and Ducasse, S. (1999). Recovering high-level views of object-oriented applications from static and dynamic information. In Proceedings of the 15th International Conference on Software Maintenance (ICSM'99), pages 13–22. IEEE Computer Society.

[Richner and Ducasse, 2002] Richner, T. and Ducasse, S. (2002). Using dynamic information for the iterative recovery of collaborations and roles. In Proceedings of the 18th International Conference on Software Maintenance (ICSM'02), pages 34–43. IEEE Computer Society.

[Robillard, 2005] Robillard, M. P. (2005). Automatic generation of suggestions for program investigation. SIGSOFT Software Engineering Notes, 30(5):11–20.

[Robillard et al., 2004] Robillard, M. P., Coelho, W., and Murphy, G. C. (2004). How effective developers investigate source code: an exploratory study. IEEE Transactions on Software Engineering, 30(12):889–903.

[Sayyad-Shirabad et al., 1997] Sayyad-Shirabad, J., Lethbridge, T. C., and Lyon, S. (1997). A little knowledge can go a long way towards program understanding. In Proceedings of the 5th International Workshop on Program Comprehension (IWPC'97), pages 111–117. IEEE Computer Society.

[Selby and Basili, 1991] Selby, R. W. and Basili, V. R. (1991). Analyzing error-prone system structure. IEEE Transactions on Software Engineering, 17(2):141–152.

[Smith and Korel, 2000] Smith, R. and Korel, B. (2000). Slicing event traces of large software systems. In Automated and Algorithmic Debugging.


[Sneed, 1996] Sneed, H. (1996). Encapsulating legacy software for use in client/server systems. In Proceedings of the 3rd Working Conference on Reverse Engineering (WCRE '96), pages 104–119. IEEE Computer Society.

[Sneed, 2004] Sneed, H. (2004). Program comprehension for the purpose of testing. In Proceedings of the 12th International Workshop on Program Comprehension (IWPC'04), pages 162–171. IEEE Computer Society.

[Sneed, 2005] Sneed, H. (2005). An incremental approach to system replacement and integration. In Proceedings of the Ninth European Conference on Software Maintenance and Reengineering (CSMR'05), pages 196–206. IEEE Computer Society.

[Spinellis, 2003] Spinellis, D. (2003). Code Reading: The Open Source Perspective. Addison-Wesley.

[Stevens et al., 1974] Stevens, W., Meyers, G., and Constantine, L. (1974). Structured design. IBM Systems Journal, 13(2):115–139.

[Storey et al., 2000] Storey, M.-A. D., Wong, K., and Muller, H. A. (2000). How do program understanding tools affect how programmers understand programs? Science of Computer Programming, 36(2–3):183–207.

[Systa, 1999] Systa, T. (1999). On the relationships between static and dynamic models in reverse engineering Java software. In Proceedings of the Sixth Working Conference on Reverse Engineering (WCRE'99), pages 304–313. IEEE Computer Society.

[Systa, 2000a] Systa, T. (2000a). Static and Dynamic Reverse Engineering Techniques for Java Software Systems. PhD thesis, University of Tampere.

[Systa, 2000b] Systa, T. (2000b). Understanding the behavior of Java programs. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE'00), pages 214–223. IEEE Computer Society.

[Tahvildari, 2003] Tahvildari, L. (2003). Quality-Driven Object-Oriented Re-engineering Framework. PhD thesis, Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada.

[Tilley et al., 2005] Tilley, T., Cole, R., Becker, P., and Eklund, P. W. (2005). A survey of formal concept analysis support for software engineering activities. In Stumme, G., editor, Formal Concept Analysis, volume 3626 of LNCS, pages 250–271. Springer.


[von Mayrhauser and Vans, 1994] von Mayrhauser, A. and Vans, A. M. (1994). Comprehension processes during large scale maintenance. In Proceedings of the 16th International Conference on Software Engineering (ICSE'94), pages 39–48, Los Alamitos, CA, USA. IEEE Computer Society.

[von Mayrhauser and Vans, 1995] von Mayrhauser, A. and Vans, A. M. (1995). Program comprehension during software maintenance and evolution. IEEE Computer, 28(8):44–55.

[Walker et al., 1998] Walker, R. J., Murphy, G. C., Freeman-Benson, B., Wright, D., Swanson, D., and Isaak, J. (1998). Visualizing dynamic software system information through high-level models. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'98), volume 33 of ACM SIGPLAN Notices, pages 271–283. ACM.

[Walker et al., 2000] Walker, R. J., Murphy, G. C., Steinbok, J., and Robillard, M. P. (2000). Efficient mapping of software system traces to architectural views. In Proceedings of CASCON, number TR-2000-09, pages 31–40. http://citeseer.nj.nec.com/walker00efficient.html.

[Wand and Weber, 1990] Wand, Y. and Weber, R. (1990). An ontological model of an information system. IEEE Transactions on Software Engineering, 16(11):1282–1292.

[Wilde, 1994] Wilde, N. (1994). Faster reuse and maintenance using software reconnaissance. Technical Report SERC-TR-75F, Software Engineering Research Center, CSE-301, University of Florida, CIS Department, Gainesville, FL.

[Wong et al., 1995] Wong, K., Tilley, S. R., Muller, H. A., and Storey, M.-A. D. (1995). Structural redocumentation: A case study. IEEE Software, 12(1):46–54.

[Yang et al., 2005] Yang, H. Y., Tempero, E., and Berrigan, R. (2005). Detecting indirect coupling. In Proceedings of the Australian Software Engineering Conference (ASWEC'05), pages 212–221. IEEE Computer Society.

[Yourdon and Constantine, 1979] Yourdon, E. and Constantine, L. L. (1979). Structured Design: Fundamentals of a Discipline of Computer Program and System Design. Prentice Hall.


[Zaidman et al., 2006a] Zaidman, A., Adams, B., De Schutter, K., Demeyer, S., Hoffman, G., and De Ruyck, B. (2006a). Regaining lost knowledge through dynamic analysis and Aspect Orientation - an industrial experience report. In Proceedings of the 10th Conference on Software Maintenance and Reengineering (CSMR'06), pages 89–98. IEEE Computer Society.

[Zaidman et al., 2005] Zaidman, A., Calders, T., Demeyer, S., and Paredaens, J. (2005). Applying webmining techniques to execution traces to support the program comprehension process. In Proceedings of the 9th European Conference on Software Maintenance and Reengineering (CSMR'05), pages 134–142. IEEE Computer Society.

[Zaidman and Demeyer, 2004] Zaidman, A. and Demeyer, S. (2004). Managing trace data volume through a heuristical clustering process based on event execution frequency. In Proceedings of the 8th European Conference on Software Maintenance and Reengineering (CSMR'04), pages 329–338. IEEE Computer Society.

[Zaidman et al., 2006b] Zaidman, A., Du Bois, B., and Demeyer, S. (2006b). How webmining and coupling metrics can improve early program comprehension. In Proceedings of the 14th International Conference on Program Comprehension (ICPC'06), pages 74–78. IEEE Computer Society.

[Zayour and Lethbridge, 2001] Zayour, I. and Lethbridge, T. C. (2001). Adoption of reverse engineering tools: a cognitive perspective and methodology. In Proceedings of the 9th International Workshop on Program Comprehension, pages 245–255. IEEE Computer Society.


Publications

Conference publications (listed chronologically)

• Andy Zaidman and Serge Demeyer. Managing trace data volume through a heuristical clustering process based on event execution frequency. Proceedings of the 8th European Conference on Software Maintenance and Reengineering (CSMR 2004), pages 329-338, IEEE Computer Society, 2004.

• Andy Zaidman, Toon Calders, Serge Demeyer and Jan Paredaens. Applying Webmining Techniques to Execution Traces to Support the Program Comprehension Process. Proceedings of the 9th European Conference on Software Maintenance and Reengineering (CSMR 2005), pages 134-142, IEEE Computer Society, 2005.

• Orla Greevy, Abdelwahab Hamou-Lhadj and Andy Zaidman. Workshop on Program Comprehension through Dynamic Analysis. Proceedings of the 12th Working Conference on Reverse Engineering (WCRE 2005), pages 232-232, IEEE Computer Society, 2005.

• Andy Zaidman. Scalability Solutions for Program Comprehension through Dynamic Analysis. Proceedings of the 10th European Conference on Software Maintenance and Reengineering (CSMR 2006), pages 327-330, IEEE Computer Society, 2006.

• Andy Zaidman, Bram Adams, Kris De Schutter, Serge Demeyer, Ghislain Hoffman and Bernard De Ruyck. Regaining Lost Knowledge through Dynamic Analysis and Aspect Orientation - An Industrial Experience Report. Proceedings of the 10th European Conference on Software Maintenance and Reengineering (CSMR 2006), pages 91-102, IEEE Computer Society, 2006.

• Andy Zaidman, Bart Du Bois and Serge Demeyer. How Webmining and Coupling Metrics Can Improve Early Program Comprehension. Proceedings of the 14th International Conference on Program Comprehension (ICPC 2006), pages 74-78, IEEE Computer Society, 2006.

• Andy Zaidman, Orla Greevy and Abdelwahab Hamou-Lhadj. Workshop on Program Comprehension through Dynamic Analysis. Accepted for publication in the proceedings of the 13th Working Conference on Reverse Engineering (WCRE 2006), IEEE Computer Society, 2006.

Currently submitted work

• Bram Adams, Kris De Schutter, Andy Zaidman, Serge Demeyer and Herman Tromp. Aspect-Enabled Dynamic Analyses for Reverse Engineering Legacy Environments – An Industrial Experience Report. Submitted to a special CSMR issue of the Journal of Systems and Software (JSS) by Elsevier, as an extension to the CSMR 2006 paper.