Top Banner
Points-to analysis for program understanding R. Fiutem a, * , P. Tonella a , G. Antoniol a , E. Merlo b,1 a IRST-Istituto per la Ricerca Scientifica e Tecnologica, Povo, Trento, Italy b Department of Electrical and Computer Engineering, Ecole Polytechnique, C.P. 6079, Succ. Centre Ville, Montreal, Que., Canada Received 1 June 1997; accepted 1 April 1998 Abstract Program understanding activities are more dicult for programs written in languages (such as C) that heavily make use of pointers for data structure manipulation, because the programmer needs to build a mental model of the memory use and of the pointers to its locations. Pointers also pose additional problems to the tools supporting program understanding, since they introduce additional dependences that have to be accounted for. This paper extends the flow insensitive context insensitive points-to analysis (PTA) algorithm proposed by Steensgaard, to cover arbitrary combinations of pointer dereferences, array subscripts and field se- lections. It exhibits interesting properties, among which scalability resulting from the low complexity and good performances is one. The results of the analysis are valuable by themselves, as their graphical display represents the points-to links between locations. They are also integrated with other program understanding techniques like, e.g., call graph construction, slicing, plan recognition and architectural recovery. The use of this algorithm in the framework of the program understanding environment CANTO is discussed. Ó 1999 Elsevier Science Inc. All rights reserved. 1. Introduction When programmers approach the task of maintain- ing a software system by altering existing code to correct errors, add new functionalities or upgrade a system, often their first activity is to build a mental model of the system being studied. Maintenance requires this pre- liminary activity in order to understand the organization of a system at dierent levels of abstractions by identi- fying sub-systems, their functionalities, their interac- tions and mapping them into the dierent software artifacts. Unfortunately maintainers are rarely the de- velopers, documentation is often scarce and people who originally wrote the system may no longer be available. Hence, often the only source of documentation about the program is the program itself. Dierent approaches and tools have been proposed to help people in building conceptual models of existing systems, either at the architectural level or at the code level. Ciao [10] and GASE [23] are two environments in which the capability of representing evolving systems, tracing architectural and code evolution, plays an im- portant role. Recently, some works that address the problem of high-level design recovery have been pre- sented [9,21], integrating a framework architectural style representation and a library of recognizers to extract architectural information from source code. Other tools, focused on code analysis, allow program decomposition with slicing: the user can extract functionalities and vi- sualize the impact of maintenance changes on code [11,19,22]. To be eective, a fundamental requirement for a program understanding or maintenance tool, conceived for a given language, is its ability to successfully deal with the full spectrum of the programming language characteristics. However, many widely used modern programming languages, like C/C++, Pascal, Ada, etc. contain features (e.g., structures, pointers and function pointers) which augment their expressivity and oer greater implementation flexibility to software develop- ers, but, in turn, complicate program understanding and maintenance activities. Often dynamic or static locations are accessed through complex chains of dereferences, field selections and array subscripts, which are dicult to map to the actual referenced memory. Furthermore, function pointers may be employed to dynamically select the called function. In such a case the call graph produced by many commonly available tools contains highly inaccurate or even incorrect information, if function pointers are not properly handled [31]. The Journal of Systems and Software 44 (1999) 213–227 * Corresponding author. 1 E-mail: [email protected] 0164-1212/99/$ – see front matter Ó 1999 Elsevier Science Inc. All rights reserved. PII: S 0 1 6 4 - 1 2 1 2 ( 9 8 ) 1 0 0 5 8 - 4
15

Points-to analysis for program understanding -

Feb 17, 2023

Download

Documents

Khang Minh
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Points-to analysis for program understanding - <SDML>

Points-to analysis for program understanding

R. Fiutem a,*, P. Tonella a, G. Antoniol a, E. Merlo b,1

a IRST-Istituto per la Ricerca Scienti®ca e Tecnologica, Povo, Trento, Italyb Department of Electrical and Computer Engineering, Ecole Polytechnique, C.P. 6079, Succ. Centre Ville, Montreal, Que., Canada

Received 1 June 1997; accepted 1 April 1998

Abstract

Program understanding activities are more di�cult for programs written in languages (such as C) that heavily make use of

pointers for data structure manipulation, because the programmer needs to build a mental model of the memory use and of the

pointers to its locations. Pointers also pose additional problems to the tools supporting program understanding, since they introduce

additional dependences that have to be accounted for. This paper extends the ¯ow insensitive context insensitive points-to analysis

(PTA) algorithm proposed by Steensgaard, to cover arbitrary combinations of pointer dereferences, array subscripts and ®eld se-

lections. It exhibits interesting properties, among which scalability resulting from the low complexity and good performances is one.

The results of the analysis are valuable by themselves, as their graphical display represents the points-to links between locations.

They are also integrated with other program understanding techniques like, e.g., call graph construction, slicing, plan recognition

and architectural recovery. The use of this algorithm in the framework of the program understanding environment CANTO is

discussed. Ó 1999 Elsevier Science Inc. All rights reserved.

1. Introduction

When programmers approach the task of maintain-ing a software system by altering existing code to correcterrors, add new functionalities or upgrade a system,often their ®rst activity is to build a mental model of thesystem being studied. Maintenance requires this pre-liminary activity in order to understand the organizationof a system at di�erent levels of abstractions by identi-fying sub-systems, their functionalities, their interac-tions and mapping them into the di�erent softwareartifacts. Unfortunately maintainers are rarely the de-velopers, documentation is often scarce and people whooriginally wrote the system may no longer be available.Hence, often the only source of documentation aboutthe program is the program itself.

Di�erent approaches and tools have been proposedto help people in building conceptual models of existingsystems, either at the architectural level or at the codelevel. Ciao [10] and GASE [23] are two environments inwhich the capability of representing evolving systems,tracing architectural and code evolution, plays an im-portant role. Recently, some works that address the

problem of high-level design recovery have been pre-sented [9,21], integrating a framework architectural stylerepresentation and a library of recognizers to extractarchitectural information from source code. Other tools,focused on code analysis, allow program decompositionwith slicing: the user can extract functionalities and vi-sualize the impact of maintenance changes on code[11,19,22].

To be e�ective, a fundamental requirement for aprogram understanding or maintenance tool, conceivedfor a given language, is its ability to successfully dealwith the full spectrum of the programming languagecharacteristics. However, many widely used modernprogramming languages, like C/C++, Pascal, Ada, etc.contain features (e.g., structures, pointers and functionpointers) which augment their expressivity and o�ergreater implementation ¯exibility to software develop-ers, but, in turn, complicate program understanding andmaintenance activities. Often dynamic or static locationsare accessed through complex chains of dereferences,®eld selections and array subscripts, which are di�cultto map to the actual referenced memory. Furthermore,function pointers may be employed to dynamicallyselect the called function. In such a case the call graphproduced by many commonly available tools containshighly inaccurate or even incorrect information,if function pointers are not properly handled [31].

The Journal of Systems and Software 44 (1999) 213±227

* Corresponding author.1 E-mail: [email protected]

0164-1212/99/$ ± see front matter Ó 1999 Elsevier Science Inc. All rights reserved.

PII: S 0 1 6 4 - 1 2 1 2 ( 9 8 ) 1 0 0 5 8 - 4

Page 2: Points-to analysis for program understanding - <SDML>

For example, in the ®nd program included in the GNU®ndutils version 4.1, the heavy use of functionpointers makes the extracted call graph incomplete andnot very useful, if points-to results are not taken intoaccount.

Pointer analysis algorithms address these problems,providing the programmer with a representation of thereferences between memory locations. The result is auseful view in itself, but is also a preliminary require-ment for successive computations often used in programunderstanding. Such algorithms have traditionally beendeveloped in the framework of optimizing compilers andprogram parallelization and vectorization [14,24,25].Recently they have been investigated in the context ofsoftware maintenance, particularly in problems relatedto slicing [26,40].

Di�erent ¯ow and context sensitive algorithms havebeen developed to obtain accurate results, at the expenseof time and memory complexity [14,33]. The importanceof controlling the time complexity has been investigatedin [5], where a solution based on a ®ne grained contextsensitivity speci®cation has been proposed. The e�ect ofpointer variables and aliasing in symbolic execution isdiscussed in [12], in the context of the combined Cgraph.

In this paper we investigated a very e�cient contextand ¯ow insensitive points-to analysis (PTA) algorithmwhich extends the approach of [37,38]. Extensions dealwith the analysis of arbitrary access paths, i.e., arbitrarycombinations of pointer dereferences, array subscriptsand ®eld selections, which are encountered in expres-sions involving structures or objects. Through this al-gorithm it is possible to build a conservativeapproximation of the set of locations pointed-to by ev-ery other pointer location. Such relations between sets oflocations can be presented to the programmer as aStorage Shape Graph (SSG) [8], allowing the informa-tion to be summarized in a graphical form.

The discussed algorithm is a fundamental componentof the comprehensive system understanding environ-ment called Code and Architecture aNalysis TOol(CANTO) [4] which integrates multiple approaches,views and perspectives, at di�erent levels of abstractions:from ®ne grained information at the code level to themore abstract views at the architectural level. Theseapproaches, developed by the authors and presentedindependently in [16,40,41], are implemented in CAN-TO. CANTO is composed of several sub-systems: afront end to analyze C code, an architectural recoveryenvironment, a ¯ow analysis environment, an environ-ment for graph displaying and a customized editor. Theuser, in a closed loop, can analyze a system, navigatethrough di�erent views by means of a graphical userinterface, generate queries and new views, add and re-move components, sub-systems and code, to accomplishmaintenance tasks.

This paper is organized as follows: Section 2 discussesrelated work in PTA and in its application to programunderstanding. Section 3 presents the basic points-toalgorithm, followed by the proposed extensions to dealwith arbitrary access paths. In Section 4, results of thealgorithm's application on a test suite of public domainC programs are reported. Finally, conclusions are drawnin Section 5. Our study shows that, using the describedalgorithm to analyze pointers, many program under-standing techniques can be extended to large size sys-tems written in modern and common programminglanguages such as C.

2. Related work

In the Section 2.1 an overview of the main PTA al-gorithms is given. A detailed comparison with theclosest work is also made. Section 2.2 is devoted to therelated work in the ®eld of program understanding. Itprovides motivations for having PTA integrated withother program understanding techniques.

2.1. Points-to analysis

Many ¯ow sensitive and context sensitive points-to oralias analysis were investigated, e.g., [14,25,44]. The al-gorithm proposed in [14] has an exponential time com-plexity (in theory and in practice), as it is based on anexplicit invocation graph. The algorithm presented in[44] uses partial transfer functions to summarize alreadyanalyzed functions, in a framework similar to [14]. Itspractical complexity is likely to be polynomial instead ofexponential. A polynomial algorithm for pointer anal-ysis in the framework of alias computation is describedin [25].

The complexity and the performance of the abovementioned algorithms, which are all context and ¯owsensitive, prevent their use for large programs. On thecontrary ¯ow insensitive PTA algorithms (e.g.,[3,35,37,38]) o�er low complexity and good practicalperformance that make them scalable to large softwaresystems. The ¯ow insensitive algorithm presented in [3]has a cubic complexity and performs well even on largeprograms. It is considered (see [35]) an upper bound forthe precision of ¯ow insensitive algorithms, since norestriction on the SSG node fan out is imposed. Thealgorithm in [37] forces the SSG nodes to have a unitaryfan out. As a consequence the complexity of the algo-rithm can be lowered to almost linear, at the expense ofthe accuracy of the results. In fact the targets of apointer are merged and, when such targets are them-selves pointers, their points-to relation contains everycombination of pointers and ®nal locations, generatingpossibly spurious relations. In [35] the fan out of SSGnodes, referred to as the number of categories, becomes

214 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 3: Points-to analysis for program understanding - <SDML>

a tunable parameter, so that the two algorithms in [3,41]are obtained as extreme cases. Intermediate values ofsuch parameter give intermediate levels of precision andperformance.

In the next section, an extension of [37] which handlesarbitrary access paths, i.e., arbitrary combinations ofdereference, array subscript and direct or indirect ®eldselection, is presented. In [38] Steensgaard proposes anextension of his basic algorithm [37] to handle structure®elds. Our extensions to [37], developed independentlyand in parallel, exhibit several di�erences. In [38] thenotion of access path is not introduced, and only thesimple cases are described, not their arbitrary combi-nations. Nevertheless, this is a minor di�erence, becauseit would be possible to apply the syntax directed de®-nitions proposed in Section 3.2 in the framework of [38],to cover any arbitrary access path. The main di�erencebetween the two approaches is in the representation ofstructure ®elds. While in [38] ®elds are distinguished bytheir o�set from the beginning of the structure whichcontains them, in this paper ®elds are accessed by name.The approach by [38] is robust with respect to incon-sistent accesses to structure ®elds, based on the size ofthe structure components. In fact the ®eld being refer-enced is obtained by a map which considers the size ofthe o�set. On the contrary in this paper inconsistentaccesses are not handled and the only way to access a®eld is by its name. As a consequence the algorithmproposed in this paper is more accurate in some cases,under the assumption of consistent use of the structures.Consider for example the code in Fig. 1, which sum-marizes the salient features of the example discussedthroughout the paper [38].

Since ®elds c and g of structures s1 and s3 have thesame o�set in the representation of the two structuresand since s1 and s3 are on the same SSG node, loca-tions s1.c and s3.g are not distinguishable by o�set.For this reason they are on the same SSG node in Fig. 1.Consequently some spurious points-to relations aregenerated, namely between s1.c and f1 and betweens3.g and i1, by [38]. If structure ®elds are distin-guished by name, instead of by o�set, the spurious re-lations are not reported and only the two points-to

relations between s1.c and i1 and between s3.g andf1 result. Accessing structure ®elds inconsistently gen-erates non portable code that highly depends on themachine representation of types. For this reason it isseldom used by programmers. In fact, no one of the testcases discussed in Section 4 contains inconsistent ac-cesses based on component size.

In addition this paper discusses a problem that is notconsidered in [38]: given the result of the PTA, its typicaluse is in determining the locations accessed through anaccess path in the code. Such access path may be used inan expression or even to call a function. In Section 3.2 aset of syntax directed de®nitions is presented to deter-mine the locations accessed through an arbitrary accesspath, given the points-to relation.

2.2. Applications to program understanding

Though valuable by itself, the SSG is essential tounderstanding programs written in modern languageswhich allow pointers, pointers to function and poly-morphism. Several analyses, including call graph con-struction, program slicing, plan recognition andarchitecture recovery require both function pointers andvariable locations resolution through the PTA.

The call graph is one of the most popular, useful andwell known structures extracted by software visualiza-tion and program understanding tools [31]. Whenpointers to functions are used, even this fundamentalprogram representation may not be available or mayresult quite imprecise.

In object-oriented languages like C++, the features ofdynamic binding and polymorphism further complicateprogram understanding. The use of polymorphism im-pacts on the resulting call graph, which at run-time maylook quite di�erent from the one computed at compile-time. PTA techniques can be used to understand whichmethods may actually be called in a polymorphic call.An application of PTA to C++ polymorphism resolu-tion is presented in [40]. In [20] a parameterized frame-work is proposed to describe well-known call graphconstruction algorithms for object-oriented languages.A generalized algorithm is derived from it,based on

Fig. 1. Example of C code for which structure ®elds are distinguished respectively by o�set or by name in the two associated SSGs.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 215

Page 4: Points-to analysis for program understanding - <SDML>

monotonic re®nement of unsound graphs and on non-monotonic improvement of unprecise graphs.

Program slicing [18,42] is based on the knowledge ofdata and control dependences. Interprocedural datadependences require that the call graph has been com-puted in order to properly propagate ¯ow informationinto procedure call points. Hence data dependencescannot be correctly computed if the call graph is notavailable. Data dependences can also be incorrect if datastructure pointers are not taken into account. In theexample of Fig. 2, there is a data dependence betweenthe statement computing the value of z and the incre-ment (*q)++ and decrement (*q))) inside functionf. Such dependences are obtained if q is known topossibly point to the variables x and y, which are usedto compute the value of z. By knowing them, the slice atthe last statement of main on variable z can be correctlycomputed, resulting in the statements that are high-lighted in Fig. 2.

Modi®cations to the slicing algorithms that accountfor the use of pointers in a program are discussed in[1,2,15,29]. A survey is presented in [39]. The three PTAalgorithms proposed in [3,35,37] are considered in [36] toevaluate and compare their direct e�ects on points-tosets and their transitive e�ects on di�erent data ¯owanalyses, among which GMOD analysis, live analysisand slicing. The results suggest that the precision of thedata ¯ow analyses is a�ected by the chosen points-toalgorithm by a degree that depends on the particulardata ¯ow problem. The GMOD analysis exhibits a hightransitive impact, while the size of the slices is onlyslightly in¯uenced by the points-to algorithm used.

Architecture recovery techniques often make use ofthe call graph to identify components within softwaresystems. For example, dominance tree analysis [13] ap-plied to the call graph allows us to extract candidatesubtrees representing reusable sub-systems. The clus-terization techniques adopted by Rigi [45], are alsobased on the knowledge of the call graph.

Other architecture recovery techniques may bene®tfrom pointer analysis. Abstract data type identi®cationtechniques [6,7,28] group together, routines based ontheir signature and their accesses to global variables. Ifaccesses are made through pointers, some of them maybe missed.

Classical plan recognition techniques [30,34,43] havebeen usually presented applied to languages withoutpointers, such as Cobol [30] and Lisp [34,43]. Plans [30]consists of components and constraints, and constraintsmay involve ¯ow dependences. For languages like C, inpresence of pointers, points-to results are needed toverify data dependences involving locations accessedthrough pointers.

Architecture recovery techniques may also use plansto represent the programming stereotypes, i.e. the ar-chitectural clich�es, used by programmers to build ar-chitectural structures [41].

An approach to architectural analysis based on ¯owinformation has been investigated in [16,17,41], whereseveral architectural recognizers have been de®ned andapplied to several case studies (among them, the Mosaicprogram). Pointer analysis plays a fundamental role inallowing ¯ow analysis and consequently architecturalanalysis on modern programming languages. In [41]several plans were de®ned to represent architecturalclich�es and applied in the architectural recovery of theMosaic case study. One of these plans is represented inFig. 3 and may be used to identify which tasks areforked by a program.

If, for example, this plan is applied to the codefragment in Fig. 4, it is not possible to satisfy the data-dep constraint without points-to information. In factthis is a typical example of clich�e delocalization [27],in which two portions of code in di�erent functions

Fig. 2. Slice computed at last statement of main on variable z. Fig. 3. Clich�es used by the spawns recognizer.

216 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 5: Points-to analysis for program understanding - <SDML>

implement a common functionality. In such cases PTAallows to identify the locations actually referenced in anaccess path. In this example, pidp points-to pid, andtherefore the statement *pidp�fork( ) results in ade®nition of pid, which is used in a test in the main

function. Therefore a data dependence between theassign and the test components holds, and the clich�eis matched.

3. The points-to analysis algorithm

The PTA purpose is to determine the points-to setof every location, i.e., the set of locations which maybe pointed to by a given location. PTA provides theprogrammer with useful information on the usageof both static and dynamic memory, and is alsoessential for further analyses, like, e.g., datadependences.

3.1. The basic algorithm

PTA can be performed adopting the non-standard setof types de®ned in [37,38], and denoted as s-types. s-types are not related to concrete types: their function isto model the points-to relations between pointers andmemory locations. As types may be recursive, typevariables are introduced to describe mutually referenc-ing objects. In the following, type variables are denotedas si where i is an integer identifying the type. If a lo-cation contains actual data (i.e., it does not referenceother locations), its type is de®ned as si � ref �?�. If, onthe contrary, it contains a location address, it is indi-cated as si � ref �sj�.

After the PTA, each location is associated with atype variable, i.e. si � ref �sj� or si � ref �?�. Thenotation si � ref �sj� implies that all locations of type sj

can be pointed to by each location of type si. Thepoints-to set of a location x is thus de®ned by thefollowing relations:

x :si � ref�sj�points-to(x)�fy j y: sjg

The points-to set of locations associated to type si � ref

�?� is empty.The points-to relation between locations is conve-

niently represented by the SSG [8,37], where each nodeis a s-type (and is labeled with all locations of that type),and an edge exists between si and sj i� si � ref �sj�.Fig. 5 contains a C code fragment, the s-types of everylocation and the resulting SSG. Variable p has type s1 �ref �s2�, while variables s and r have type s2 � ref �?�.Thus, the resulting points-to set of variable p contains sand r. The corresponding SSG graphically shows suchrelations.

At each step of PTA, one code instruction is con-sidered, and for each pointer a�ecting statement thecorresponding s-types are updated. Instructions involv-ing pointers may be divided into two classes, dependingon whether the & operator is used or not to take theaddress of a location. The corresponding update actionsare de®ned in Fig. 6. Instruction (1) of Fig. 6 is an as-signment between two access paths. Types referencedby left and right hand sides are merged according to the

Fig. 5. Example of C code fragment for which the s-types of all lo-

cations are shown together with the resulting SSG.

Fig. 6. The two basic actions to be performed when a pointer a�ecting

statement is met.

Fig. 4. C code fragment generating a new task.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 217

Page 6: Points-to analysis for program understanding - <SDML>

c-join (conditional join) procedure. In the secondkind of assignment the left-hand side referenced type ismerged into the right-hand side type according to thejoin procedure [37], which uni®es the two types into onesingle type, and recursively applies itself to the refer-enced types. c-join di�ers from join in that uni®cationis postponed, when the right-hand side is a reference tobottom, until it becomes a reference to a type variable.Types whose uni®cation is delayed are inserted in apending list associated with the bottom type.

Since the proposed PTA is context insensitive, onlythe points-to relations involving the passed parametershave to be taken into consideration, to deal with theinterprocedural aspects of the analysis. Parameters arehandled by applying the rules of Fig. 6, where each ac-tual parameter of a function invocation is assigned tothe corresponding formal parameter of the called func-tion. In the example of Fig. 7, function f is called twicefrom main. In the ®rst call the actual parameter is p, ismatched with the formal parameter q, so that a c-joinis invoked, according to rule (1) of Fig. 6, with q as a1,and p as a2. In the second call q matches &y, and thusa2 becomes y and rule (2) is applied. Fig. 7 also depictsthe resulting SSG.

3.2. Handling arbitrary access paths

The proposed ¯ow insensitive PTA method, whichhas been extensively discussed in [40], extends to [37]cover arbitrary combinations of dereference, subscript,direct and indirect component selection operators, alongwith the use of function pointers. Since ¯ow insensitivityand context insensitivity are preserved in our extensions,the language control structures and the call sequences

do not a�ect the analysis and therefore are not takeninto consideration.

The presented approach integrates the notion of ac-cess paths [32] and that of non-standard set of s-types[37] with a new type inference scheme, which has beende®ned to be compatible with the low cost PTA pre-sented in [37].

An access path is a unary expression described by thegrammar of Fig. 8. It contains an arbitrary sequence ofdereferences and accesses to structure ®elds. An accesspath represents a ®xed location if it does not containdereference operators (*, -> and [EXPR]) [32]. Other-wise it represents a variable location. Variable locations,depending on the possible values of the involvedpointers, may be associated with di�erent ®xed locationsduring the execution. A dynamic allocation is repre-sented as a ®xed location object heap_n created foreach line, n, where a memory allocation system call, e.g.,malloc, calloc, realloc, is encountered: heapidenti®cation by memory allocation site. Array compo-nents are ®xed location objects denoted as id[ ] (whilea variable location leading to an array component hasthe form id[EXPR]), representing a generic componentof the array.

The extensions of PTA to arbitrary access paths in-volve the computation of appropriate s-types in pres-ence of structure ®eld accesses, pointer dereferences andarray subscripts. The s-type of an access path A is rep-resented as an attribute (A.type), which can be deter-mined using the syntax directed de®nitions of Fig. 9.Explicit and implicit pointer dereferencing and arrayelement accesses perform a deref operation on the s-typeattribute. The deref function simply returns the refer-enced type of a given type, i.e., if si � ref �sj�, deref�si�returns sj. Creating and assigning a dynamic location toan access path, produces the construction of the relation(ref) between the access path s-type and the heap s-typecorresponding to the allocation site. All the other se-mantic rules propagate the s-types of structure ®elds.

Each type variable has a (possibly empty) set of ®eldattributes, used to model the ®elds of the structures. If astructure has a ®eld id which is accessed from a variableof type si, a ®eld attribute id is added to si:

si:id : sk � ref�sj�: �1�As a result sk is the s-type of the ®eld id of all locationsof type si. Typing constraints related to the id ®eld are

Fig. 7. To deal with interprocedurality, the actual parameter p in the

®rst call to f is matched with the formal parameter q. In the second

call &y is matched with q.

Fig. 8. Grammar of the access paths for the C language. Dereference

operators are: *, )> and [EXPR], EXPR is an arbitrary C expression.

218 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 7: Points-to analysis for program understanding - <SDML>

introduced through the application of join or c-jointo sk, the type of si. id.

Fig. 10 contains the declaration of structure A andvariables a, c, four instructions to be analyzed (the or-der may not be that of execution, since the algorithm is¯ow insensitive), and the resulting typing environment.The analysis of statements s1 and s2 introduces twopoints-to relations, expressed by the types inferred forvariables p and q, respectively s4 � ref �s1� and s5 � ref

�s2�. Such relations can be computed by simply applyingrule (6) of Fig. 7, because both left and right-hand sideare simple access paths and then performing the joinoperation as indicated by rule (2) in Fig. 5. The nextstatement needs handling a more complex access path,for which the syntax directed de®nitions of Fig. 7 areexploited. In fact the s-type of the left-hand side is ob-tained by applying rule (5), which ®rst dereferences thes-type of p, s4, resulting in s1. Then ®eld x is appendedand the type of s1.x, s6, is obtained. The type of theright-hand side is s5 from the application of (6). Thederef of these two types undergoes conditional join,according to rule (1) of Fig. 5, so that s6 � ref�?� be-comes s6 � ref�s2�. When the statement s3 is processed

during the PTA (step 3), it is possible that variable c hasnot yet been encountered in the analysis, because theorder in which statements are considered is in generaldi�erent from the one in which they are executed.Therefore it is not possible to apply typing constraints toits ®eld (i.e., to c.x). The constraint is applied to thetype referenced by variable p, i.e., s1, whose x ®eld ismade to point to s2. When the last statement is analyzed(step 4), variable c will inherit the typing constraint ofthe ®eld attribute x, which is associated with its newtype s1. Fig. 11 shows the resulting SSG.

After computing the s-typing environment for thegiven program, it is possible to determine the set of ®xedlocations which may be accessed through an access path,stored as the name attribute obtained by the syntaxdirected de®nitions of Fig. 12. Each time a dereference isinvolved in the access path, the name attribute is set tothe pointed-to ®xed locations. A structure componentselection results in appending the ®eld identi®er to cur-rent ®xed locations. In the example of it is possible tocompute the set of locations updated by the statement(*q)++ in function f, by means of the rules of Fig. 10.Rule (6) produces the initial set fqg, which becomesfx;yg by applying rules (2) and (7).

The result of the PTA can be used to determine theset of functions that can be invoked by a functionpointer. In general, a call through a function pointerhas the function name replaced by an access path. Theset of ®xed locations reachable through the access pathgives the names of the invocable functions, and there-fore the syntax directed de®nitions of Fig. 10 can beused again.

Fig. 13 is an example of code exploiting functionpointers. The typing environment that can be deter-mined by applying the rules in Figs. 9 and 6 is shown inFig. 14. The type of s, s7, has a ®eld f and in turn thetype of s7.f, s8, has a ®eld x that is the function pointer

Fig. 10. Example comprising the declaration of struct A, variables a,

c and four statements. The resulting typing environment after ana-

lysing s3 (step 3) and s4 (step 4) is shown. Fig. 11. Storage Shape Graph of the example in Fig. 10.

Fig. 9. Syntax directed de®nitions to compute the type attribute of an access path. The type of a ®eld of a type is denoted as: type.id.type.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 219

Page 8: Points-to analysis for program understanding - <SDML>

of the structure type A. The type of s8.x references thetype of the two functions, f and g.

In the example of Fig. 13, it is possible to determinethe set of invocable functions at the last statement ofmain by applying the syntax directed de®nitions ofFig. 12 to the access path (*a[10]->f.x). The result isthat functions f and g are possibly invoked through the®eld x of ®eld f of structure s. In fact, rule (6) is ®rstapplied, giving a name attribute equal to fag. Then rule(3) transforms it into fa� �g, and rule (5) into fs:fg.Rule (4) produces fs:f:xg and ®nally rules (2) and (7)give the result, ff;gg. The SSG of this example is rep-resented in Fig. 15. It can be noticed that the applicationof the rules in Fig. 12 corresponds to a proper visit ofsome of the SSG nodes. In fact rules (6) and (3) lead tothe node labeled a [ ], associated to the variable locationa[10]. Next steps correspond to handling a[10]->f,

i.e., a dereference leading to node s, and a ®eld selectionthat appends f to s and gives the node labeled s.f.

After that, the x ®eld is appended and node s.f.x isreached. Its target node, labeled f, g is the result of the®nal dereference, obtained through the star (*) operator.

The results of the PTA can be improved by consid-ering only the statements which belong to functionsreachable from the main. All other statements are deadcode for the considered application, and therefore theycan be ignored during the PTA. In presence of functionpointers it is not possible to know which functions arereachable from the main until the points-to results areavailable. Since the order in which statements areconsidered does not a�ect the ®nal result, it is possibleto use the initial approximation of the call graph,without function pointer resolution, to determine a ®rstset of functions the statements of which have to beanalyzed. After applying the proposed algorithm tothese statements, function pointers can be partiallysolved, giving a second approximation of the calledfunctions. If functions not previously considered areadded to the set of functions reachable from the main,their statements have to be analyzed as well. The pro-cess ends when no new function is added to the set offunctions reachable from the main, due to an un-changed function pointer resolution. It has to be no-ticed that when new statements are added for PTA, it isnot necessary to reanalyze previous statements and theanalysis can simply go on with the new statements,because the analysis is not sensitive to the order of theanalyzed statements.

Fig. 13. Last statement of main calls either f or g according to the

points-to set of the across path (*a[10]->f.x).

Fig. 14. Typing environment for the example in Fig. 13.

Fig. 15. Storage Shape Graph of the example in Fig. 13.

Fig. 12. Syntax directed de®nitions to compute the name attribute of an across path, which is the set of ®xed locations reachable through the access

path.

220 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 9: Points-to analysis for program understanding - <SDML>

3.3. Algorithm properties

The proposed algorithm has several interestingproperties. It is a safe algorithm: the points-to sets areconservative approximations. This means that if apoints-to relation holds, it is surely included in thepoints-to set reported by the approximate analysis.Spurious relations may be generated by the algorithm,but in general, this is a drawback of every static analysiswhich tries to statically approximate the dynamic be-havior of a program. For example, in Fig. 7 the edgefrom p to x, y introduces a spurious points-to relationbetween p and y.

As the algorithm is both ¯ow and context insensitive,it does not require any knowledge about the call graph,while other ¯ow and context sensitive approaches re-quire the construction of the call graph, or its abstrac-tion, during the analysis. However, in ¯ow sensitiveanalyses and in the presence of function pointers thecall graph depends on the results of the ongoing anal-ysis. Therefore while ¯ow sensitive algorithms iterat-ively use the available points-to information for callgraph construction and vice versa, the proposed ap-proach is able to separate completely the PTA fromfunction pointers resolution since no call graph con-struction is required.

Recursive data structures and recursive function callsare inherently handled. Recursive data structures areproperly modeled by type variables, the ®elds of whichreference the type variable itself. As the analysis iscontext insensitive, the call graph is not taken intoconsideration, nor is the presence of cycles in it.Therefore functions may be recursive or indirectly re-cursive.

Being ¯ow and context insensitive, the algorithm is alow cost one in terms of space and time complexity. Thealmost linear complexity O�Sa�S; S�� presented in [37] ispreserved in practice, where S is a size measure thatgives the number of ®xed locations in the program and athe very slowly increasing inverse Ackermann's function(a detailed discussion can be found in [38,40]).

4. Experimental setup and results

The described points-to algorithm is implemented inFLow ANalysis Tool (FLANT) which, in turn, is inte-grated in CANTO, our environment for program un-derstanding and maintenance [4]. The organization ofFLANT is shown in Fig. 16. The source code is parsedby the front end, with the pre-processor macro valuesspeci®ed in the con®guration ®le. The front end buildsan internal representation of the program in terms ofAbstract Syntax Trees (ASTs), and saves it in a languageindependent format, so that it is possible to analyze codewritten in di�erent imperative languages. For C code,

the intermediate representation is generated by a Re®ne 2

based front end program.The analyzer of FLANT performs several analyses on

the AST format, among which reaching de®nitions anddata dependences (in three variants), dominators, post-dominators, control dependences, slicing, and the pre-sented PTA. A report ®le can be generated, where theresults of an analysis are presented as plain text. Alter-natively, the analyzer interacts with an editor (EMACSor tkedit 3 in our case) to visualize (as in Figs. 23 and 2)the results on demand, while editing the program. Fur-thermore, whenever the results can be depicted bygraphs, like, e.g., the SSG for the PTA, it is possible toinvoke a graph viewer. PTA results are also useful forother successive analyses: the points-to information canbe integrated for data dependences computation or callgraph construction. This is done by a module whichtransforms a variable location into the set of ®xed lo-cations reachable through it. Such a module, that is partof the FLANT analyzer, implements the syntax directedde®nitions described in Fig. 12, and basically transformsan input AST into an augmented one (AST+), accordingto the scheme in Fig. 17. In the new AST+, dereferencechains are resolved: access paths used inside expressionsor used to call a function (in presence of functionpointers) are replaced by the pointed-to ®xed locations.

Table 1 reports some features of the programs usedto test the points-to computation. Their size ranges from

Fig. 16. Organisation of FLANT, the Flow ANalysis Tool.

2 Re®ne and Re®ne/C are trademarks of Reasoning Systems Inc.3 Tkedit is an X11 oriented Editor entirely written in tcl-tk by Rainer

Kowallik at Deutsches Elektronen-Synchrotron Institute for High

Energy Physics.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 221

Page 10: Points-to analysis for program understanding - <SDML>

33 LOC to 16474 LOC (26±10 684 uncommented LOC,ULOC), while the number of procedures is between 1and 201. The total number of nodes in the AST of eachprogram is also given. The test suite contains real publicdomain applications, many of which are common sys-tem utilities. They are good representatives of the ex-pressive capabilities of the C language, like, e.g.,

pointers (and pointers to functions), structures, recur-sion and dynamic allocation.

Table 2 gives the computation times for the PTA.The ®fth column reports the number of access pathsinvolved in program statements. Some of these arevariable locations, in that they contain one or moredereference operators (*, -> and [EXPR]). These pathsare expanded by the integration module into the set of®xed locations reachable through them. The number ofthe resulting ®xed locations is given in the last column(with the ratio between the two in brackets). Compu-tation times are generally small, and 6.8 s accounts forthe analysis of the whole test suite. On average, 37.0% ofthe access paths are variable locations, to be solved bymeans of the PTA results. Each variable location may bemapped into one or more reachable ®xed locations,according to the cardinality of the points-to sets. Thistransformation produces an average 1.99 increase factorin the number of locations: on average about 1/3 of thelocations is made of variable locations, and each vari-able location leads to about two ®xed locations.

As an example, Fig. 18 depicts the SSG for the t

program. Dynamic memory locations have the formheap_nn_mm, where nn is a number that identi®es the®le and mm is the line number of the statement per-forming the allocation. Multiple allocations due tomultiple execution of line mm in ®le nn are representedby the single location heap_nn_mm. Input values arestored in the dynamic locations pointed to by data, anddenoted as heap_1_60, since allocation is performed in®le 1 (testt.c) at line 60. Data are then divided intotwo subsets and are copied inside two arrays of dynamiclocations for further processing. The two representativesof these locations are heap_2_70, heap_2_71 (®le 2is t_mayer.c). Dynamic location heap_2_113 iscreated for subsequent computation. All pointers (y, x,dbl, gi, fn, ®, imag, real, fz) used to access thesedata have an edge in the SSG directed to the nodecontaining these three dynamic locations.

The presence of a ®eld of a structure which points tothe structure itself allows us to identify recursive data

Table 1

Some features of the programs in the test suite: Lines of code (LOC),

AST nodes and procedures

Program LOC ULOC Nodes Procedures

sizeof 33 26 14 1

d2j 92 71 55 4

div 161 109 150 3

crc 428 182 102 6

xref 492 336 342 7

�t 621 532 438 8

yes 1796 942 416 10

whoami 2004 1049 418 10

gdbm 7025 3665 2357 54

®nd 16 474 10 684 12 043 201

Fig. 17. Organisation of the integration module, performing variable

location resolution.

Table 2

PTA computation times are in the fourth column. Access paths and variable locations are given then (the percentage of variable locations is in

brackets). Final locations is the number of locations pointed to by variable locations (the ratio between the two is in brackets)

Program LOC ULOC Time Access paths Variable locations Final locations

sizeof 33 26 0.0 28 9 (32.1%) 9 (1.00)

d2j 92 71 0.0 94 16 (17.0%) 16 (1.00)

div 161 109 0.1 201 94 (46.7%) 96 (1.02)

crc 428 182 0.0 96 20 (20.8%) 20 (1.00)

xref 492 336 0.1 567 270 (47.6%) 291 (1.08)

�t 621 532 0.1 949 165 (17.3%) 425 (2.58)

yes 1796 942 0.1 528 122 (23.1%) 143 (1.17)

whoami 2004 1049 0.1 538 125 (23.2%) 144 (1.15)

gdbm 7025 3665 1.1 3579 1422 (39.7%) 2064 (1.45)

®nd 16 474 10 684 5.2 13 248 5109 (38.5%) 11 442 (2.24)

Total 29 126 17 596 6.8 19 828 7352 (37.0%) 14 650 (1.99)

222 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 11: Points-to analysis for program understanding - <SDML>

structures (e.g., trees, lists, graphs). In the ®nd testprogram many examples of recursive data structureswere found. Two of them can be seen in Fig. 19, where aportion of the SSG for this program has been depicted.Dynamic locations heap_13_195, heap_13_155

contain a ®eld, next, which appears to point to thelocations themselves. Therefore, as suggested by thename of the ®eld, they are likely to implement a list.Locations heap_5_344, heap_6_63 contain three®elds, pred_right, pred_left and pred_next,

pointing to themselves. In this case the identi®ers sug-gest that the data structure is a binary tree, accessiblealso as a linked list. The SSG examination allows one tounderstand which pointers are used to access a givendata structure in a program. For example, eval_treereferences the tree data structure heap_5_344,heap_6_63 (®le 5 is tree.c, 6 is util.c and 13 isidcache.c). The fan-in of a node in the SSG is anindicator of the number of ways of access available toreference a location. A high fan-in usually implies adi�culty in controlling the portions of code which canmodify the content of the location associated to thenode, because many di�erent dereferences can be used toread or write it.

The ®nd program also heavily makes use of functionpointers. Fig. 20 shows a portion of the call graph forthe ®nd program. The dotted call graph edges corres-pond to calls through function pointers, and thus theycould not be identi®ed without the PTA. Solid linesdepict the usual direct calls of functions. In the ®ndexample the total number of calls through functionpointers is 21, while normal calls are 280. The 21dynamically dispatched calls result in 570 possiblyinvoked functions, so that the original call graphis totally changed when these possible calls areintroduced.

4.1. Understanding and modifying a real system withCANTO

CANTO was used in a maintenance task involvingthe modi®cation of existing code to adapt it to theslightly di�erent needs of the new users [4]. Theunderstanding phase frequently required accessingthe SSG of the program to obtain information aboutthe use of the pointers in the source code underanalysis.

The maintenance task was related to a previous re-search project at IRST, in which the need for trans-mitting images from a mobile camera was addressed. Wedecided to try to use the H261 software tool 4 for thispurpose. H261 is composed of two programs, an en-coder and a decoder, communicating via TCP socketsand working under Unix and the X Window System.The programs are written in C and the total programsize is about 6500 lines of code. Table 3 shows the linesof code (LOC) and number of functions for each source®le of the H261 software.

Some modi®cations were required in order touse H261 in our environment. In fact, from the en-coder side, there were constraints on the disk space

Fig. 18. Storage Shape Graph of the t program.

4 The H261 program was developed by Thierry Turletti at INRIA,

Sophia Antipolis, France. It is based on H261, a widely di�used image

compression standard for teleconferencing, de®ned by CCITT.Fig. 19. Portion of the Storage Shape Graph of the ®nd program.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 223

Page 12: Points-to analysis for program understanding - <SDML>

and computational power available (the mobile plat-form computer had to perform many other complextasks). Moreover, the interface with the image acqui-sition hardware had to be adapted to our hardware.The graphical user interface of the encoder programwas not needed, because the mobile platform had nographical display, and thus it had to be eliminatedand incorporated into the decoder program's user in-terface. We decided to eliminate the part of the coderelative to reading and writing images to and from®les, from both the encoder and the decoder pro-grams. This was done in order to simplify the archi-tecture of the two programs. From the decoder side,the major modi®cations were on the user interface,

that should provide means to remotely control thecamera on the mobile platform and include additionalfunctionalities (e.g., pausing the image data ¯ow,switching among di�erent cameras, adjusting cameraparameters).

Before modifying the code, we wanted to understandthe overall organization of the application and to isolatethe computation responsible for data encoding/decodingin order to reuse it. During this phase the results of thePTA were extremely useful both as a prerequisite for theother analyses and as a valuable view by itself. A typicalexample of access to the SSG view and usage of itscontent during the comprehension process is representedby the reconstruction of the computation that is done ondata by the Decoder and is described in the following.

To trace read data in the Decoder code, i.e., to seewhere the input image is stored and how it is manipu-lated, the starting points were the read(s, . . .) calls,performing input from socket. Only one such call wasfound in the Decoder, inside the ReceiveBuer func-tion:

if((lread�read(sock,(char*)buf,T_READ_MAX))<� 0)

The SSG of the Decoder program (see Fig. 21) showsthat variable buf of function ReceiveBuer points tothe ®rst dimension components of the arraychaine_codee. Therefore, data are read from socketand put in chaine_codee[. . .].

Fig. 20. Portion of the Call Graph of the ®nd program. Dotted lines represent calls through function pointers.

Table 3

Lines of code (LOC) and number of functions for each C source ®le of

the H261 software

File LOC Functions

h261_encode.c 698 23

coder.c 1393 7

�ct.c 328 2

ifct.c 256 1

h261_decode.c 213 2

decoder.c 983 7

display.c 170 2

decoder_control.c 347 8

Headers 2033 )

Total 6421 52

224 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 13: Points-to analysis for program understanding - <SDML>

At the end of the computation, the image will bedisplayed on a window. The hook for this operation isthe X call to XPutImage(. . .):

ximage)>data� (char*)structim;

XPutImage(dpy,win,gc,xim-age,0,0,0,0,L_col,L_lig);

Data displayed by this X call are stored in thelocations pointed to by structim. They resultfrom the SSG (see Fig. 22) as the ®rst dimensioncomponents of the im_pix array declared in themain.

Now we know that data are read in thechaine_codee array and are displayed from theim_pix array: the manipulation of the ®rst array,leading to ®ll the second one is the next thing tosearch. Slicing can be used to identify the computationthat produces the im_pix array. In particular, it ismore useful to ask for one slice step at a time, andtherefore increase the understanding of the computa-tion step by step. Fig. 23 shows one of these steps.The slicing criterion is ®le display.c, line 156, accesspath *structim. Actually, an access path with adereference is used, instead of a simple variable, be-cause we wanted to slice on the content of the data

structure pointed to by structim (i.e., im_pix

components), and not on the pointer variablestructim itself. This is a typical query when dealingwith pointers: usually a maintainer is interested in thestatements that a�ect the contents of a pointer, ratherthan in those that de®ne the pointer itself (i.e. allocatememory to the pointer or assign another pointer to it).Arbitrary chains involving dereferences may berequested by the programmer when dealing withcomplex dynamic data structures, so a program un-derstanding tool must support such kind of queriesand thus must incorporate a PTA.

The result of slicing is that the im_pix content iscomputed in the decode function, by using the contentof the tab_Y, tab_Cb, tab_Cr arrays. In turn, sucharrays are computed by the ifct_8x8 function, in theifct.c module, by using the content of the blockbidimensional array. The ®nal result is that block is®lled with the values decoded by the decode_8x8

function, which scans and decodes the data read in thechaine_codee array. Thus, the reconstructed ma-nipulation includes:· reading data from socket into chaine_codee,· decoding data into block,· computing the inverse Fourier transform intotab_Y, tab_Cb, tab_Cr,

· converting to pixels into im_pix.The reusable functions that have been identi®ed in

this way are depicted in grey in Fig. 24. Their identi®-cation was possible by using techniques like slicing anddata dependences which are based on points-to results,and by directly accessing the SSG, each time a compu-tation involved pointers.Fig. 22. Portion of the storage shape graph of the Decoder.

Fig. 23. One step of slice computation, started at line 156, ®le

display.c, on the access path *structim.

Fig. 21. Portion of the storage shape graph of the Decoder.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 225

Page 14: Points-to analysis for program understanding - <SDML>

5. Conclusion and future work

We have described the extensions of Steensgaard's¯ow and context insensitive PTA algorithm, in order todeal with arbitrary access paths. A graph representationof the results is produced, known as the SSG. The al-gorithm is integrated in FLANT, our FLow ANalysisTool, conceived and realized to be modular and lan-guage independent. The resulting SSG is displayed tothe programmer for use during understanding activities.Points-to results are also integrated with slice compu-tation and architectural recovery.

The PTA has been performed on a test suite. In un-derstanding most of these programs, the analysis may beuseful: without a PTA, it can be di�cult to predict thee�ects of a change, or to understand the implementedfunctionality, because many dependences are due topointers usage, both in expressions and in function calls.The results suggest that about 1/3 of the access pathsinvolve pointer dereferences, and thus require points-toinformation to be solved. Each of them is on averagemapped into about two reachable locations, represent-ing the actual referenced storage. Computation times forthe PTA are generally low, and the whole test suite (29KLOC) can be analyzed in 6.8 s.

PTA was also used during a real maintenance task inwhich the views computed by CANTO were exploited.We have described a typical situation in which lookingat the SSG provided helpful suggestions on the sourcecode under analysis. It was inspected each time a com-putation involved data structures with pointers (arrays,dynamic memory, etc.), to see which locations are ac-tually referenced through some dereference chain. All ofCANTO views can be computed only if the PTA hasbeen preliminarly performed.

Our future work will be devoted to experimenting ourtools on larger industrial size software. We also plan tocompare other ¯ow and context sensitive pointer anal-ysis techniques with ours. In particular we are interestedin quantitatively estimating the trade o� between accu-racy and computation requirements. We are alsoworking on code slicing and architectural analysis, forwhich points-to results are a prerequisite.

Acknowledgements

We would like to thank Laurie Hendren (McGillUniversity), Mary Lou So�a (University of Pittsburgh),and Barbara Ryder (Rutgers University) for the helpfuldiscussions and comments about pointer analysis andabout a preliminary version of this study presented atthe International Conference on Software Engineering,1997.

References

[1] H. Agrawal, R.A. DeMillo, E.H. Spa�ord, Dynamic slicing in the

presence of unconstrained pointers, Proceedings of the ACM

Fourth Symposium on Testing, Analysis and Veri®cation

(TAV4), 1991, pp. 60±73.

[2] H. Agrawal, Towards Automatic Debugging of Computer

Programs, Ph.D Thesis, Purdue University, 1991.

[3] L.O. Andersen, Program Analysis and Specialization for the C

Programming Language, Ph.D Thesis, DIKU, University of

Copenhagen, 1994.

[4] G. Antoniol, R. Fiutem, G. Lutteri, P. Tonella, S. Zanfei, E.

Merlo, Program understanding and maintenance with the CAN-

TO environment, Proceedings of the International Conference on

Software Maintenance, IEEE Computer Society Press, Silver

Spring, MD, 1997, pp. 72±81.

[5] D.C. Atkinson, W.G. Griswold, The design of whole-program

analysis tools, Proceedings of the International Conference on

Software Engineering, 1996, pp. 16±27.

[6] G. Canfora, A. Cimitile, M. Munro, C.J. Taylor, Extracting

abstract data type from C programs: A case study, Proceedings of

the International Conference on Software Maintenance, 1993, pp.

200±209.

[7] G. Canfora, A. Cimitile, M. Tortorella, M. Munro, A precise

method for identifying reusable abstract data type in C code,

Proceedings of the International Conference on Software Main-

tenance, 1994, pp. 404±413.

[8] D.R. Chase, M. Wegman, F.K. Zadeck, Analysis of pointers and

structures, Proceedings of the SIGPLAN'90 Conference on

Programming Language Design and Implementation, 1990, pp.

296±310.

[9] M.P. Chase, D.R. Harris, S.N. Roberts, A.S. Yeh, Analysis

and presentation of recovered software architectures, Proceed-

ings of the Working Conference on Reverse Engineering,

IEEE Computer Society Press, Silver Spring, MD, 1996, pp.

153±162.

[10] Y.R. Chen, G.S. Flowler, E. Koutso®os, R.S. Wallach, Ciao: A

graphical navigator for software document repositories, Proceed-

ings of the International Conference on Software Maintenance,

IEEE Computer Society Press, Silver Spring, MD, 1995, pp. 66±

75.

[11] A. Cimitile, A. De Lucia, M. Munro, Identifying reusable

functions using speci®cation driven program slicing: A case

study, Proceedings of the International Conference on Software

Maintenance, IEEE Computer Society Press, Silver Spring, MD,

1995, pp. 124±133.

[12] A. Cimitile, A. De Lucia, M. Munro, Qualifying reusable

functions using symbolic execution, Proceedings of the Working

Conference on Reverse Engineering, IEEE Computer Society

Press, Silver Spring, MD, 1995, pp. 178±187.

[13] A. Cimitile, G. Visaggio, Software salvaging and the call

dominance tree, Journal of Systems and Software 28 (1995)

117±127.

Fig. 24. Reusable functions responsible for the computation on the

image are grayed.

226 R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227

Page 15: Points-to analysis for program understanding - <SDML>

[14] M. Emami, R. Ghiya, L.J. Hendren, Context-sensitive interpro-

cedural points-to analysis in the presence of function pointers,

Proceedings of the ACM SIGPLAN'94 Conference on Program-

ming Language Design and Implementation, June 20±24, Orlan-

do, FL, 1994, pp. 242±256.

[15] M. Ernst, Practical ®ne-grained static slicing of optimized code,

Technical report MSR-TR-94-14, Microsoft Research, Red-

mond, WA, 1994.

[16] R. Fiutem, E. Merlo, G. Antoniol, P. Tonella, Understanding the

architecture of software systems, Proceedings of the Fourth

Workshop on Program Comprehension, IEEE Computer Society

Press, Silver Spring, MD, 1996, pp. 187±196.

[17] R. Fiutem, P. Tonella, G. Antoniol, E. Merlo, A cliche' based

environment to support architectural reverse engineering, Pro-

ceedings of the International Conference on Software Mainte-

nance, IEEE Computer Society Press, Silver Spring, MD, 1996,

pp. 319±328.

[18] K.B. Gallagher, J.R. Lyle, Using program slicing in software

maintenance, IEEE Transactions on Software Engineering 17 (8)

(1991) 751±761.

[19] K.B. Gallagher, Visual impact analysis, Proceedings of the

International Conference on Software Maintenance, IEEE Com-

puter Society Press, Silver Spring, MD, 1996, pp. 52±58.

[20] D. Grove, G. DeFouw, J. Dean, C. Chambers, Call graph

construction in object-oriented languages, Proceedings of OOP-

SLA'97, 1997, pp. 108±124.

[21] D.R. Harris, H.B. Reubenstein, A.S. Yeh, Recognizers for

extracting architectural features from source code, in: Proceed-

ings of the Second Working Conference on Reverse Engineering,

Toronto, 1995, pp. 252±261.

[22] M.J. Harrold, L. Larsen, J. Lloyd, D. Nedved, M. Page, G.

Rothermel, M. Singh, M. Smith, Aristotle: A system for

development of program analysis based tools, Proceedings of

the 33rd Annual ACM Southeast Conference, 1995, pp. 110±119.

[23] R. Holt, J.Y. Pak, GASE: Visualizing software evolution-in-the-

large, Proceedings of the Working Conference on Reverse

Engineering, IEEE Computer Society Press, Silver Spring, MD,

1996, pp. 163±166.

[24] S. Horwitz, P. Pfei�er, T. Reps, Dependence analysis for pointer

variables, SIGPLAN Notices 24 (7) (1989) 28±40.

[25] W. Landi, B.G. Ryder, A safe approximate algorithm for

interprocedural pointer aliasing, Proceedings of the ACM SIG-

PLAN'92 Conference on Programming Language Design and

Implementation, 1992, pp. 235±248.

[26] L. Larsen, M.J. Harrold, Slicing object-oriented software, Pro-

ceedings of the International Conference on Software Engineer-

ing, 1996, pp. 495±505.

[27] S. Letowsky, E. Soloway, Delocalized plans and program

comprehension, IEEE Software 3 (3) (1986) 41±49.

[28] S.S. Liu, N. Wilde, Identifying objects in a conventional

procedural language: An example of data design recovery,

Proceedings of the Conference on Software Maintenance, 1990,

pp. 266±271.

[29] J.R. Lyle, D. Binkley, Program slicing in the presence of pointers,

Proceedings of the Third Software Engineering Research Forum,

Orlando, FL, 1993, pp. 255±260.

[30] V. Kozaczynski, J.Q. Ning, A. Engberts, Program concept

recognition and transformation, IEEE Transactions in Software

Enginnering 18 (12) (1992) 1065±1075.

[31] G.C. Murphy, D. Notkin, E.S.-C. Lan, An empirical study of

static call graph extractors, Proceedings of the International

Conference on Software Engineering, 1996, pp. 90±99.

[32] H.D. Pande, Compile Time Analysis of C and C++ Systems,

Ph.D Thesis, Rutgers University, New Brunswick, NJ, 1996.

[33] H.D. Pande, W.A. Landi, B.G. Ryder, Interprocedural def-use

associations for C systems with single level pointers, IEEE

Transactions on Software Engineering 20 (5) (1994) 385±403.

[34] C. Rich, L. Wills, Recognizing a program's design: A graph

parsing approach, IEEE Software 7(1) (1990) 82±89.

[35] M. Shapiro, S. Horwitz, Fast and accurate ¯ow-insensitive

points-to analysis, Proceedings of ACM POPL'97, Symposium

on Principles of Programming Languages, 1997, pp. 1±14.

[36] M. Shapiro, S. Horwitz, The e�ects of the precision of pointer

analysis, Fourth International Symposium on Static Analysis

(SAS'97) 1997.

[37] B. Steensgaard, Points-to analysis in almost linear time, Pro-

ceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on

Principles of Programming Languages, St. Petersburg Beach, FL,

1996.

[38] B. Steensgaard, Points-to analysis by type inference of programs

with structures and unions, Proceedings of the International

Conference on Compiler Construction, Link�oping, Sweden, 1996.

[39] F. Tip, A survey of program slicing techniques, Journal of

Programming Languages 3 (3) (1995) 121±189.

[40] P. Tonella, G. Antoniol, R. Fiutem, E. Merlo, Flow insensitive

C++ pointers and polymorphism analysis and its application to

slicing, Proceedings of the International Conference on Software

Engineering, 1997, pp. 433±443.

[41] P. Tonella, R. Fiutem, G. Antoniol, E. Merlo, Augmenting

pattern-based architectural recovery with ¯ow analysis: Mosaic ±

A case study, Proceedings of the Third Working Conference on

Reverse Engineering, 1996, pp. 198±207.

[42] M. Weiser, Program slicing, IEEE Transactions on Software

Engineering 10 (4) (1984) 352±357.

[43] L. Wills, Automated Program Recognition by Graph Parsing,

Ph.D Dissertation, MIT, 1992.

[44] R.P. Wilson, M.S. Lam, E�cient context-sensitive pointer

analysis for C programs, Proceedings of the ACM SIGPLAN'95

Conference on Programming Language Design and Implemen-

tation, La Jolla, CA, 1995, pp. 1±12.

[45] K. Wong, S.R. Tilley, H.A. Muller, M.D. Storey, Structural

redocumentation: A case study, IEEE Software 12 (1995) 46±54.

R. Fiutem et al. / The Journal of Systems and Software 44 (1999) 213±227 227