Top Banner
Not Only Statements: The Role of Textual Analysis in Software Quality Rocco Oliveto [email protected] University of Molise 2nd Workshop on Mining Unstructured Data October 17th, 2012 - Kingston, Canada
72

Not Only Statements: The Role of Textual Analysis in Software Quality

Jul 25, 2015

Download

Science

Rocco Oliveto
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Not Only Statements: The Role of Textual Analysis in Software Quality

Not Only Statements: The Role of Textual Analysis in Software Quality

Rocco Oliveto [email protected] University of Molise

2nd Workshop on Mining Unstructured DataOctober 17th, 2012 - Kingston, Canada

Page 2: Not Only Statements: The Role of Textual Analysis in Software Quality
Page 3: Not Only Statements: The Role of Textual Analysis in Software Quality
Page 4: Not Only Statements: The Role of Textual Analysis in Software Quality
Page 5: Not Only Statements: The Role of Textual Analysis in Software Quality
Page 6: Not Only Statements: The Role of Textual Analysis in Software Quality
Page 7: Not Only Statements: The Role of Textual Analysis in Software Quality

Textual analysis is...

...the process of deriving high-quality information from text

Page 8: Not Only Statements: The Role of Textual Analysis in Software Quality

Text is Software Too

Alexander DekhtyarDept. Computer ScienceUniversity of [email protected]

Jane Hu↵man HayesDept. Computer ScienceUniversity of [email protected]

Tim MenziesDept. Computer Science,Portland State University,

[email protected]

Abstract

Software compiles and therefore is characterized by aparseable grammar. Natural language text rarely conformsto prescriptive grammars and therefore is much harder toparse. Mining parseable structures is easier than miningless structured entities. Therefore, most work on miningrepositories focuses on software, not natural language text.Here, we report experiments with mining natural languagetext (requirements documents) suggesting that: (a) miningnatural language is not too di�cult, so (b) software repos-itories should routinely be augmented with all the naturallanguage text used to develop that software.

1 Introduction

“I have seen the future of software engineering, and itis......Text?”

Much of the work done in the past has focused on themining of software repositories that contain structured, eas-ily parseable artifacts. Even when non-structured artifactsexisted (or portions of structured artifacts that were non-structured), researchers ignored them. These items tendedto be ”exclusions from consideration” in research papers.

We argue that these non-structured artifacts are richin semantic information that cannot be extracted fromthe nice-to-parse syntactic structures such as source code.Much useful information can be obtained by treating textas software, or at least, as part of the software repository,and by developing techniques for its e�cient mining.

To date, we have found that information retrieval (IR)methods can be used to support the processing of textualsoftware artifacts. Specifically, these methods can be usedto facilitate the tracing of software artifacts to each other(such as tracing design elements to requirements). We havefound that we can generate candidate links in an automatedfashion faster than humans; we can retrieve more true linksthan humans; and we can allow the analyst to participatein the process in a limited way and realize vast results im-provements [10,11].

In this paper, we discuss:

• The kinds of text seen in software;

• Problems with using non-textual methods;

• The importance of early life cycle artifacts;

• The mining of software repositories with an emphasison natural language text; and

• Results from work that we have performed thus far onmining of textual artifacts.

2 Text in Software Engineering

Textual artifacts associated with software can roughlybe partitioned into two large categories:

1. Text produced during the initial development and thenmaintained, such as requirements, design specifica-tions, user manuals and comments in the code;

2. Text produced after the software is fielded, such asproblem reports, reviews, messages posted to on-linesoftware user group forums, modification requests, etc.

Both categories of artifacts can help us analyze softwareitself, although di↵erent approaches may be employed. Inthis paper, we discuss how lifecycle development documentscan be used to mine traceability information for Indepen-dent Validation & Verification (IV&V) analysts and howartifacts (e.g., textual interface requirements) can be usedto study and predict software faults.

3 If not text..

One way to assess our proposal would be to assess whatcan be learned from alternative representations. In the soft-ware verification world, reasoning about two represenationsare common: formal models and static code measures.

A formal model has two parts: a system model and aproperties model. The system model describes how the pro-gram can change the values of variables while the propertiesmodel describes global invariants that must be maintainedwhen the system executes. Often, a temporal logic1 is used

1Temporal logic is classical logic augmented with some tem-

poral operators such as ⇤X (always X is true); ⌃X (eventually

X is true); �X (X is true at the next time point); XS

Y (X is

true until Y is true).

Non-structured artifacts are rich in semantic information that

cannot be extracted from the nice-to-parse syntactic

structures such as source code

...TA in SE...

Page 9: Not Only Statements: The Role of Textual Analysis in Software Quality

traceability recovery (Antoniol et al. TSE 2002, Marcus and Maletic ICSE 2003) change impact analysis (Canfora et al. Metrics 2005) feature location (Poshyvanyk et al. TSE 2007) program comprehension (Haiduc et al. ICSE 2010, Hindle et al. MSR 2011) bug localization (Lo et al. ICSE 2012) clone detection (Marcus et al ASE 2001) ...

Textual Analysis Applications

Page 10: Not Only Statements: The Role of Textual Analysis in Software Quality

Why Textual Analysis for Software Quality

Page 11: Not Only Statements: The Role of Textual Analysis in Software Quality

Whyfor

lightweight (as it does not require parsing)

provide complementary information to what

traditional code analysis could provide

Page 12: Not Only Statements: The Role of Textual Analysis in Software Quality

Textual analysis for software quality

Page 13: Not Only Statements: The Role of Textual Analysis in Software Quality

...process overview...source code

entitysource code

entitysource code

entitytext

normalizationidentifier

normalization

term weighting

application of NLP/IR

new knwoledge

new knwoledge

new knwoledge

Page 14: Not Only Statements: The Role of Textual Analysis in Software Quality

Textual Analysis to...

...measure class cohesion

Given a class1. compute the textual similarity between all the

pairs of methods2. compute the average texual similary (value

between 0 and 1)3. the higher the similarity the higher the

cohesion

A. Marcus, D. Poshyvanyk, R. Ferenc: Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented Systems. IEEE Transanctions Software Engineering. 34(2): 287-300 (2008)

Page 15: Not Only Statements: The Role of Textual Analysis in Software Quality

Textual Analysis to...

...measure class coupling

Given two classes A and B1. compute the textual similarity between all

unordered pairs of methods from class A and class B

2. compute the average texual similary (value between 0 and 1)

3. the higher the similarity the higher the coupling

D. Poshyvanyk, A. Marcus, R. Ferenc, T. Gyimóthy: Using information retrieval based coupling measures for impact analysis. Empirical Software Engineering 14(1): 5-32 (2009)

Page 16: Not Only Statements: The Role of Textual Analysis in Software Quality

Yet another metric?PC1 PC2 PC3 PC4 PC5 PC6

Proportion 29,6 20,9 10,1 10 17 8,5

Cumulative 29,6 50,5 60,6 70,7 87,7 96,2

C3 -0,06 -0,03 -0,01 0,99 -0,04 0

LCOM1 0,92 0 0,05 -0,03 0,31 -0,01

LCOM2 0,91 -0,01 0,04 -0,02 0,33 0

LCOM3 0,6 -0,12 0,05 -0,04 0,73 -0,13

LCOM4 0,2 -0,19 0 -0,03 0,93 -0,1

LCOM5 0,08 0,03 0,99 -0,01 0,01 -0,04

ICH 0,91 0,05 0,06 -0,05 -0,06 -0,14

TCC -0,02 0,93 -0,03 0 -0,11 0,28

LCC 0,04 0,96 0,07 -0,05 -0,13 0,09

Coh -0,11 0,47 -0,06 0,01 -0,17 0,84

Page 17: Not Only Statements: The Role of Textual Analysis in Software Quality

Yet another metric?PC1 PC2 PC3 PC4 PC5 PC6

Proportion 29,6 20,9 10,1 10 17 8,5

Cumulative 29,6 50,5 60,6 70,7 87,7 96,2

C3 -0,06 -0,03 -0,01 0,99 -0,04 0

LCOM1 0,92 0 0,05 -0,03 0,31 -0,01

LCOM2 0,91 -0,01 0,04 -0,02 0,33 0

LCOM3 0,6 -0,12 0,05 -0,04 0,73 -0,13

LCOM4 0,2 -0,19 0 -0,03 0,93 -0,1

LCOM5 0,08 0,03 0,99 -0,01 0,01 -0,04

ICH 0,91 0,05 0,06 -0,05 -0,06 -0,14

TCC -0,02 0,93 -0,03 0 -0,11 0,28

LCC 0,04 0,96 0,07 -0,05 -0,13 0,09

Coh -0,11 0,47 -0,06 0,01 -0,17 0,84

Page 18: Not Only Statements: The Role of Textual Analysis in Software Quality

So what?

Page 19: Not Only Statements: The Role of Textual Analysis in Software Quality

Improvedefect prediction

Page 20: Not Only Statements: The Role of Textual Analysis in Software Quality

...some numbers...Metrics Precision Correctness R2 value

LCOM1 61,9 74,39 0,1

LCOM3 62,59 70,55 0,1

LCOM2 62,05 75,93 0,1

LCOM4 59,75 66,36 0,07

C3 62,05 61,35 0,07

ICH 60,92 73,52 0,06

Coh 61,21 59,33 0,03

LCOM5 56,56 54,48 0,03

Page 21: Not Only Statements: The Role of Textual Analysis in Software Quality

...some numbers...Metrics Precision Correctness R2 value

C3+LCOM3 66,2 68,47 0,16

C3+LCOM1 65,23 68,23 0,15

C3+LCOM2 64,88 67,54 0,15

C3+LCOM4 64,98 66,2 0,14

C3+ICH 63,71 64,74 0,12

LCOM4+ICH 63,32 72,87 0,11

LCOM3+ICH 63,46 72,61 0,11

LCOM1+LCOM3 63,27 74,16 0,11

Page 22: Not Only Statements: The Role of Textual Analysis in Software Quality

...some numbers...Metrics Precision Correctness R2 value

C3+LCOM3 66,2 68,47 0,16

C3+LCOM1 65,23 68,23 0,15

C3+LCOM2 64,88 67,54 0,15

C3+LCOM4 64,98 66,2 0,14

C3+ICH 63,71 64,74 0,12

LCOM4+ICH 63,32 72,87 0,11

LCOM3+ICH 63,46 72,61 0,11

LCOM1+LCOM3 63,27 74,16 0,11

The use of C3 improves the

prediction accuracy of models

based only on structural

metrics

Page 23: Not Only Statements: The Role of Textual Analysis in Software Quality

But also refactoring...

Page 24: Not Only Statements: The Role of Textual Analysis in Software Quality

Class Cmethod-by-methodmatrix construction

m1m2 ........ mnm1m2........mn

SSM CIM CSMStructural Similaritybetween Methods

Call-based Interaction between Methods

Conceptual Similaritybetween Methods

n methods

...the approach...

G. Bavota, A. De Lucia, A. Marcus, R. Oliveto: A two-step technique for extract class refactoring. ASE 2010: 151-154G. Bavota, A. De Lucia, R. Oliveto: Identifying Extract Class refactoring opportunities using structural and semantic cohesion measures.

Journal of Systems and Software 84(3): 397-414 (2011)

Page 25: Not Only Statements: The Role of Textual Analysis in Software Quality

public class UserManagement {//String representing the table user in the databaseprivate static final String TABLE_USER = "user";//String representing the table teaching in the databaseprivate static final String TABLE_TEACHING = "teaching";/* Insert a new user in TABLE_USER */public void insertUser(User pUser){

boolean check = checkMandatoryFieldsUser(pUser);...String sql = "INSERT INTO " + UserManagement.TABLE_USER + " ... ";...

}/* Update an existing user in TABLE_USER */public void updateUser(User pUser){

boolean check = checkMandatoryFieldsUser(pUser);...String sql = "UPDATE " + UserManagement.TABLE_USER + " ... ";...

} /* Delete an existing user in TABLE_USER */

public void deleteUser(User pUser){...String sql = "DELETE FROM " + UserManagement.TABLE_USER + " ... ";...

} /* Verify if in TABLE_USER exists the user pUser */

public void existsUser(User pUser){...String sql = "SELECT FROM " + UserManagement.TABLE_USER + " ... ";...

}/* Check the mandatory fields in pUser */public boolean checkMandatoryFieldsUser(User pUser){

...}/* Insert a new teaching in TABLE_TEACHING */public void insertTeaching(Teaching pTeaching){

boolean check = checkMandatoryFieldsTeaching(pTeaching);...String sql = "INSERT INTO " + UserManagement.TABLE_TEACHING + " ... ";...

}/* Update an existing teaching in TABLE_TEACHING */public void updateTeaching(Teaching pTeaching){

boolean check = checkMandatoryFieldsTeaching(pTeaching);...String sql = "UPDATE " + UserManagement.TABLE_TEACHING + " ... ";...

} /* Delete an existing teaching in TABLE_USER */

public void deleteTeaching(Teaching pTeaching){...String sql = "DELETE FROM " + UserManagement.TABLE_TEACHING + " ... ";...

}/* Check the mandatory fields in pTeaching */public boolean checkMandatoryFieldsTeaching(Teaching pTeaching){

...}

}

0 0 0 10.5 00 0.5000 000 0100

0 00 0 0.5100 0

00

00

0.5

0

000

000

00

0

0

00

10 00 000 10.5 00.5 0

00 00 1 00 00 1 00

0.500 0 0100.50001

CDM similarity

SSM similarity

CSM similarity

IU UU IT UT CTIU

UU

DUEUCUIT

method-by-method matrix

wCDM = 0.2

wSSM = 0.5

wCSM = 0.3

IU = insertUser - UU = updateUser - DU = deleteUser - EU = existsUser - CU = checkMandatoryFieldsUserIT = insertTeaching - UT = updateTeaching - DU = deleteTeaching - CT = checkMandatoryFieldsTeaching

DU EU CU DT

UT

DTCT

0 0 0 10 00 0000 100 0110

0 10 0 0110 0

00

000

0

100

000

00

0

1

00

10 00 000 10 00 0

01 11 1 01 01 1 01

011 1 01001111

IU UU IT UT CTIU

UU

DUEUCUIT

DU EU CU DT

UT

DTCT

0 0 0 10.5 0.20 0.30.100 0.300.1 0.210.40

0.1 0.30.1 0 0.510.50 0

00

00.10.5

0

0.400

00

0.100

0.1

0.5

0.10

10 00.2 000.1 10.2 00.1 0.1

0.10.3 0.30.5 1 00.3 01 0.10.5

0.10.30.7 0.4 010.20.20.50.50.71

IU UU IT UT CTIU

UU

DUEUCUIT

DU EU CU DT

UT

DTCT

0 0 0 10.3 0.10 0.3000 0.600 0.110.60

0 0.60 0 0.310.70 0

00

00

0.3

0

0.600

000

00

0

0.7

00

10 00.1 000 10.2 00.1 0

00.6 0.60.7 1 00.6 00.6 1 00.7

0.10.60.7 0.6 010.10.20.70.70.71

IU UU IT UT CTIU

UU

DUEUCUIT

DU EU CU DT

UT

DTCT

Page 26: Not Only Statements: The Role of Textual Analysis in Software Quality

DU

UU

CU

IU0.6

0.7

Candidate Chain C1

Candidate Chain C2

Trivial Chain T1

UUIU DU

Candidate Class C1

DTIT UT CT

Candidate Class C2

EU

Method-by-method Relationships before Filtering Method-by-method Relationships after Filtering Proposed Refactoring

0.7

EU

0.7

0.2

IT

0.1

0.60.1

0.6

UT

DT

CT

0.7

0.6

0.3

0.6

0.3

0.1

DU

UU

CU

IU0.6

0.7

0.7

EU

0.7

IT0.6

0.6

UT

DT

CT

0.7

0.6

0.3

0.6

0.3

CU

method-by-method matrix after transitive closureproposed refactoring

...the approach...

Page 27: Not Only Statements: The Role of Textual Analysis in Software Quality

DU

UU

CU

IU0.6

0.7

Candidate Chain C1

Candidate Chain C2

Trivial Chain T1

UUIU DU

Candidate Class C1

DTIT UT CT

Candidate Class C2

EU

Method-by-method Relationships before Filtering Method-by-method Relationships after Filtering Proposed Refactoring

0.7

EU

0.7

0.2

IT

0.1

0.60.1

0.6

UT

DT

CT

0.7

0.6

0.3

0.6

0.3

0.1

DU

UU

CU

IU0.6

0.7

0.7

EU

0.7

IT0.6

0.6

UT

DT

CT

0.7

0.6

0.3

0.6

0.3

CU

method-by-method matrix after transitive closureproposed refactoring

...the approach...

Conceptual cohesion plays a crucial role

Refactoring operations make

sense for developers

Page 28: Not Only Statements: The Role of Textual Analysis in Software Quality

The developer point of view...

Do measures reflect the quality perceived by developers?

Page 29: Not Only Statements: The Role of Textual Analysis in Software Quality

...the study...How does class coupling align

with developers’ perception of coupling?

Four types of source of informationstructural dynamic semantic historical

The study involved 90 subjects G. Bavota, B. Dit, R. Oliveto, M. Di Penta, D. Poshynanyk, A. De Lucia. An Empirical Study on the Developers'

Perception of Software Coupling. Submitted to ICSE 2013.

Page 30: Not Only Statements: The Role of Textual Analysis in Software Quality

...take away...

Coupling cannot be captured and measured using only structural information, such as method calls

Different sourceS of information are neededSemantic coupling seems to reflect the developers’ mental

model when identifying interaction between entitiesSemantic coupling is able to capture “latent coupling

relationships” incapsulated in identifiers and comments

Page 31: Not Only Statements: The Role of Textual Analysis in Software Quality

Inconsistentcy between code and comments...Not only quality measure...

Page 32: Not Only Statements: The Role of Textual Analysis in Software Quality

Inconsistency between code and comments...

Page 33: Not Only Statements: The Role of Textual Analysis in Software Quality

...the study...QALP Score: the similarity between a module’s comment and its code

Used to evaluate the quality of source code but it can be also used to predict faults

0.0

0.2

0.4

0.6

0.8

1.0

0 2 4 6 8 10 12 14

QAL

P Sc

ore

Defect Count

MozillaMP

Figure 2. Maximum QALP score per defectcount for both programs.

QALP score, denoted as qs, LoC, SLoC, and their inter-actions. From this model statistically insignificant terms(those having a p-value > 0.05) are removed. Dishearten-ingly, the QALP score is one of the terms removed. The lastmodel (during the elimination) to include the QALP scoreis

defects = 0.68 − 0.15qs

(1)+ 7.3 × 10−3 LoC

− 6.8 × 10−4 SLoC− 1.0 × 10−6 LoC × SLoC

The negative coefficient for qs in this model would be goodnews except that the p-value for the QALP score of 0.829 isquite high; thus, QALP score is not significant in predictingthe value of defects.The final model, which includes only LoC, SLoC, and

their interaction, is

defects = 0.61 + 7.2 × 10−3 LoC

(2)− 6.1 × 10−4 SLoC

− 1.0 × 10−6 LoC × SLoC

Its coefficient of determination, R2 = 0.156, is quite low(the model’s p-value is < 0.0001). Thus, for Mozilla nei-ther the QALP score nor the size measures prove to be gooddefect predictors.An inspection of code for Mozilla generated three rele-

vant insights as to why the QALP score proved an ineffec-tive measure for predicting faults. First, many of the classeshad few, if any reported defects and, second, there are veryfew comments. A complete absence of comments producesa QALP score of zero. When these zeros occur in classeswith a wide range of faults, they produce statistical ‘noise’,which interferes with the ability of the statistical techniquesto find correlations.

Second, many of the comments that did exist were eitherused to make up for a lack of self documenting code or wereoutward looking. In the first case, the code uses identifiersthat are not easily understood out of context and commentsare required to explain the code. In the second case, com-ments are intended for users of the code and thus ignore theinternal functionality of the code. In both cases, the codeand comments have few words in common, which leads to alow QALP score. For example, the code snippet in Figure 3shows an example of both types of comments. This snippetdetermines whether there is anything other than whitespacecontained in the variable mText. Unfortunately, it is notclear from the called function name, IsDataInBuffer, thatit is simply a whitespace test. The first comment informsthe reader of this; thus, the comment is compensating for alack of self-documentation. The second comment is an out-ward looking comment, reflecting on the implications thatthe local code segment may have on the system as a whole.The third insight follows from Mozilla being open

source, which means that it includes many different codingpractices and styles [10]. This is evident when inspectingsamples of the code where identifier naming and generalcommenting are not done in a systematic fashion.

4.2 MP

The statistical analysis ofMP begins with the same setof explanatory variables as that of Mozilla. As seen in Fig-ure 2, the maximumQALP scores forMP show a less pro-nounced downward-right trend than those ofMozilla. How-ever, the final model (after removal of statistically insignifi-cant terms) is substantially different. This model, given be-low in Equation 3, includes several interaction terms, mean-ing that no direct correlation between the explanatory vari-ables and the response variable can be given.

defects = −1.83 + qs(−2.4 + 0.53 LoC − 0.92 SLoC)

(3)+ 0.056 LoC

− 0.058 SLoC

The model’s coefficient of determination (R2 = 0.614, p-value < 0.0001) indicates that it explains just over 61% ofthe variance in the number of defects.The model indicates that the QALP score plays a role in

the prediction of faults. Of prime interest is the coefficient

// Don’t bother if there’s nothing but whitespace.// XXX This could cause problems...if (! IsDataInBuffer(mText, mTextLength))break;

Figure 3. Mozilla-1.6 Example Code Snippet

D. Binkley, H. Feild, D. Lawrie, and M. Pighin, “Software fault prediction using language processing,” in Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques, 2007, pp. 99–110.

Page 34: Not Only Statements: The Role of Textual Analysis in Software Quality

Inconsistent naming...path? Is it a relative path or an absolute path?

And what about if it is used as both relative and absolute?

Page 35: Not Only Statements: The Role of Textual Analysis in Software Quality

...the study...Term entropy: the physical dispersion of terms in a program. The higher the entropy, the more scattered across the program the terms

Context coverage: the conceptual dispersion of terms. The higher their context coverage, the more unrelated the methods using them

The use of identical terms in different contexts may increase the risk of faults

V. Arnaoudova, L. M. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol: Physical and conceptual identifier dispersion: Measures and relation to fault proneness. ICSM 2010: 1-5

Page 36: Not Only Statements: The Role of Textual Analysis in Software Quality

...take away...

Term entropy and context coverage only partially correlate with size

The number of high entropy and high context coverage terms contained in a method or attribute helps to explain

the probability of it being faultyIf a Rhino (ArgoUML) method contains an identifier with a term having high entropy and high context its probability of

being faulty is six (two) times highersee also

S. Lemma Abebe, V. Arnaoudova, P. Tonella, G. Antoniol and Y.-G. Guéhéneuc. Can Lexicon Bad Smells improve fault prediction? WCRE 2013

Page 37: Not Only Statements: The Role of Textual Analysis in Software Quality

Challenges...

Page 38: Not Only Statements: The Role of Textual Analysis in Software Quality

Source code vocabulary...

Page 39: Not Only Statements: The Role of Textual Analysis in Software Quality

How to induce developers to use

meaningful identifiers?

Page 40: Not Only Statements: The Role of Textual Analysis in Software Quality

Reverse engineering, used with evolving software development

technologies, will provide significant incremental

enhancements to our productivity

Page 41: Not Only Statements: The Role of Textual Analysis in Software Quality

Reverse engineering, used evolving software development

technologiessignificant incremental

enhancements to our productivity

Continuous

Textual Analysis

Page 42: Not Only Statements: The Role of Textual Analysis in Software Quality

COCONUT...

1. The Administrator activates the add member function in the terminal of the system and correctly enters his login and password identifying him as an Administrator.

2. The system responds by presenting a form to the Administrator on a terminal screen. The form includes the first and last name, the address, and contact information (phone, email and fax) of the customer, as well as the fidelity index. The fidelity index can be: New Member, Silver Member, and Gold Member. After 50 rentals the member is considered as Silver Member, while after 150 rentals the member becomes a Gold Member. The system also displays the membership fee to be paid.

3. The Administrator fills the form and then confirms all the requested form information is correct.

addmember.txt

Page 43: Not Only Statements: The Role of Textual Analysis in Software Quality

COCONUT...

Page 44: Not Only Statements: The Role of Textual Analysis in Software Quality

COCONUT...

Page 45: Not Only Statements: The Role of Textual Analysis in Software Quality

COCONUT...

1. The Administrator activates the add member function in the terminal of the system and correctly enters his login and password identifying him as an Administrator.

2. The system responds by presenting a form to the Administrator on a terminal screen. The form includes the first and last name, the address, and contact information (phone, email and fax) of the customer, as well as the fidelity index. The fidelity index can be: New Member, Silver Member, and Gold Member. After 50 rentals the member is considered as Silver Member, while after 150 rentals the member is a Gold Member. The system also displays the membership fee to be paid.

3. The Administrator fills the form and then confirms all the requested form information is correct.

addmember.txt

Page 46: Not Only Statements: The Role of Textual Analysis in Software Quality

What about if traceability links are not available?

Page 47: Not Only Statements: The Role of Textual Analysis in Software Quality

Query assessment...

Page 48: Not Only Statements: The Role of Textual Analysis in Software Quality

IR engine

2 3

Textual Query

INPUT

INPUT

OUTPUT

Source Code

Class C1Class C1Class C1Class C1

Relevant Classes

CONCEPTLOCATION

Page 49: Not Only Statements: The Role of Textual Analysis in Software Quality

IR engine

Textual Query

INPUT

INPUT

OUTPUT

Source Code

QUERYASSESSMENT

Query Quality

Page 50: Not Only Statements: The Role of Textual Analysis in Software Quality

Good Query Bad Query

Page 51: Not Only Statements: The Role of Textual Analysis in Software Quality

Good Query Bad Query

# Method Class Score

1 insertUser ManagerUser 0.99

2 deleteUser ManagerUser 0.95

3 assignUser ManagerRole 0.88

4 util Utility 0.84

5 getUsers ManagerUser 0.79

Page 52: Not Only Statements: The Role of Textual Analysis in Software Quality

Good Query Bad Query

# Method Class Score

1 insertUser ManagerUser 0.99

2 deleteUser ManagerUser 0.95

3 assignUser ManagerRole 0.88

4 util Utility 0.84

5 getUsers ManagerUser 0.79

Useful results on top of the list

Page 53: Not Only Statements: The Role of Textual Analysis in Software Quality

Good Query Bad Query

# Method Class Score

1 insertUser ManagerUser 0.99

2 deleteUser ManagerUser 0.95

3 assignUser ManagerRole 0.88

4 util Utility 0.84

5 getUsers ManagerUser 0.79

# Method Class Score

1 util Utility 0.93

2 dbConnect ManagerDb 0.90

3 insertUser ManagerUser 0.86

4 networking Utility 0.76

5 loadRs ManagerDb 0.73

False positives ontop of the list

Useful results ontop of the list

Page 54: Not Only Statements: The Role of Textual Analysis in Software Quality

How to use query assessment for improving code vocabulary?

Page 55: Not Only Statements: The Role of Textual Analysis in Software Quality

IR engine

Textual Query

INPUT

INPUT

OUTPUT

Source Code

Query Quality

Page 56: Not Only Statements: The Role of Textual Analysis in Software Quality

IR engine

Source Code

INPUT

INPUT

OUTPUT

Documents

Code Quality

Page 57: Not Only Statements: The Role of Textual Analysis in Software Quality

What about comments?

Page 58: Not Only Statements: The Role of Textual Analysis in Software Quality

Automatic generation...

Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori L. Pollock, K. Vijay-Shanker: Towards automatically generating summary comments for Java methods. ASE 2010: 43-52

Page 59: Not Only Statements: The Role of Textual Analysis in Software Quality

Source code pre-processing...

Page 60: Not Only Statements: The Role of Textual Analysis in Software Quality

...problems...

how to remove the noise in source code?

which elements should be indexed?

identifier splitting and expansion

task-based pre-processing

Page 61: Not Only Statements: The Role of Textual Analysis in Software Quality

NLP/IR techniques...

Page 62: Not Only Statements: The Role of Textual Analysis in Software Quality

...problems...how to set the parameters of some

technqiues (e.g., LSI)?

do we need customized versions of NLP/IR techniques?

are the different techniques equivalent?

task-specific techniques?

Page 63: Not Only Statements: The Role of Textual Analysis in Software Quality

New horizons...

Page 64: Not Only Statements: The Role of Textual Analysis in Software Quality

Linguistic antipatterns...

Common practices, from linguistic aspect, in the source code that decrease the quality of the software (Arnaoudova WCRE 2010)

Page 65: Not Only Statements: The Role of Textual Analysis in Software Quality

Linguistic

Common practices, from linguistic aspect, in the source code that decrease the quality of the software (Arnaoudova WCRE 2010)

How to define linguistic antipatterns?

How to identify them?

Which is the impact of linguistic antipatterns

on software development and maintenance?

How to prevent linguistic antipatterns?

Page 66: Not Only Statements: The Role of Textual Analysis in Software Quality

00 00 00 000 0

01 10 11 1 11 1 10 0 0 0 0

00 01 1 1

0

Software testing...

Page 67: Not Only Statements: The Role of Textual Analysis in Software Quality

00 00 00 000 0

01 10 11 1 11 1 10 0 0 0 0

00 01 1 1

0

Software

Can textual analysis be used during

test case selection?

Can textual analysis be used to improve

search-based test case generation?

Can textual analysis be used to capture

testing complexity of source code?

Page 68: Not Only Statements: The Role of Textual Analysis in Software Quality

Empirical studies...

Page 69: Not Only Statements: The Role of Textual Analysis in Software Quality

Empirical

When and why does textual analysis complement

traditional source code analysis techniques?

Studies with users are needed?

Page 70: Not Only Statements: The Role of Textual Analysis in Software Quality

Conclusion...

Page 71: Not Only Statements: The Role of Textual Analysis in Software Quality
Page 72: Not Only Statements: The Role of Textual Analysis in Software Quality