Understanding the Rationale for Updating a Function’s Comment Haroon Malik, Istehad Chowdhury, Hsiao-Ming Tsou, Zhen Ming Jiang, Ahmed E. Hassan School of Computing, Queen’s University, Canada

Jan 13, 2017

Page 1: Understanding the Rationale for Updating a Function's Comment

Understanding the Rationale for Updating a Function’s Comment

Haroon Malik, Istehad Chowdhury, Hsiao-Ming Tsou, Zhen Ming Jiang, Ahmed E. Hassan
School of Computing, Queen’s University, Canada

Page 2: Understanding the Rationale for Updating a Function's Comment

Documentation is vital for the successful evolution of a software system

Page 3: Understanding the Rationale for Updating a Function's Comment

Why understand the rationale for updating a comment?

Page 4: Understanding the Rationale for Updating a Function's Comment

Because…

• Reduce effort to understand code
• Reduce maintenance cost
• Prevent bugs
• Increase reliability

Page 5: Understanding the Rationale for Updating a Function's Comment

Likelihood of updating a comment

Function 1.

    function incrementValue($val) {
        return ($val++);
    }

Function 2.

    function processInput($val) {
        //loop 11 times.
        for ($i = 0; $i < 10; $i++) {
            // loop executes for the upper bound of J
            for ($j = 0; $j < 10; $j++) {
                $val = ($val | $i) << 2;
                $val = $val & $j << 2;
            }
        }
        return $val;
    }

Which one?

Page 6: Understanding the Rationale for Updating a Function's Comment


It Depends!!

Page 7: Understanding the Rationale for Updating a Function's Comment

Study Dimensions

• Modified function characteristics (8 attributes)
  – Long vs. short functions
  – Long vs. short function names
  – Well-documented functions
  – Complex vs. simple functions (# of control statements)

• Change characteristics (8 attributes)
  – Complex vs. simple change
  – Large vs. small change

• Time and code ownership characteristics (9 attributes)
  – Do habits change over time? Weekdays vs. weekends
  – Same developer that changed it last time

Page 8: Understanding the Rationale for Updating a Function's Comment

Comment update?

Yes? No?

Modeled as a classification problem

Page 9: Understanding the Rationale for Updating a Function's Comment

Measuring Performance

                  Classified as
    True class    YES    NO
    YES           a      b
    NO            c      d

We measure overall misclassification rate = (b+c)/(a+b+c+d)
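The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function name `misclassification_rate` is hypothetical, and the counts in the usage line are made up rather than taken from the study.

```python
def misclassification_rate(a: int, b: int, c: int, d: int) -> float:
    """Share of changes whose predicted class differs from the true class.

    a: true YES classified as YES    b: true YES classified as NO
    c: true NO classified as YES     d: true NO classified as NO
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # b and c are the off-diagonal (wrongly classified) cells.
    return (b + c) / total


# Made-up counts for illustration only (not results from the study).
print(misclassification_rate(a=40, b=10, c=10, d=40))
```

Only the off-diagonal cells b and c count as errors, so the rate treats a missed comment update and a falsely predicted one as equally costly.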

Page 10: Understanding the Rationale for Updating a Function's Comment

Need

• Explainable model
• Resistant to noise
• Handles correlated attributes
• Minimal configuration

Page 11: Understanding the Rationale for Updating a Function's Comment

Random Forests

[Diagram: project comment update history → data set]

Page 12: Understanding the Rationale for Updating a Function's Comment

Random Forests

[Diagram: project comment update history → data set → random samples → random trees (votes: Yes, No, No, No) → vote → prediction]
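The sample, trees, vote, prediction pipeline on this slide can be sketched as follows. This is a simplified stand-in, not the setup used in the study: each "tree" is a one-split decision stump rather than a full tree, the two attributes (number of changed dependencies, number of comment lines) and all rows in `history` are hypothetical, and the names `vote`, `train_stump`, `train_forest` and `predict` are illustrative.

```python
import random
from collections import Counter


def vote(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(sample, rng):
    """Fit a one-split tree on one randomly chosen attribute of the sample."""
    attr = rng.randrange(len(sample[0][0]))
    overall = vote([label for _, label in sample])
    best = None
    # Try each observed value as a threshold and keep the one with fewest errors.
    for threshold in sorted({x[attr] for x, _ in sample}):
        low = [label for x, label in sample if x[attr] <= threshold]
        high = [label for x, label in sample if x[attr] > threshold]
        low_label = vote(low)
        high_label = vote(high) if high else overall
        errors = (sum(l != low_label for l in low)
                  + sum(l != high_label for l in high))
        if best is None or errors < best[0]:
            best = (errors, threshold, low_label, high_label)
    _, threshold, low_label, high_label = best
    return lambda x: low_label if x[attr] <= threshold else high_label


def train_forest(rows, n_trees, seed=0):
    """Train each tree on its own random sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(train_stump(sample, rng))
    return forest


def predict(forest, x):
    """Every tree casts one vote; the majority label is the prediction."""
    return vote([tree(x) for tree in forest])


# Hypothetical rows: ((changed dependencies, comment lines), comment updated?)
history = [((8, 12), "Yes"), ((9, 15), "Yes"), ((7, 10), "Yes"),
           ((1, 2), "No"), ((0, 1), "No"), ((2, 3), "No")]
forest = train_forest(history, n_trees=25, seed=1)
print(predict(forest, (9, 14)))
print(vote(["Yes", "No", "No", "No"]))  # the four tree votes shown on the slide
```

Because each tree sees a different random sample and a randomly chosen attribute, a few noisy rows tend to sway only some of the trees, which is one reason the voting step suits the "resistant to noise" requirement.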

Page 13: Understanding the Rationale for Updating a Function's Comment

Finding Top Attributes

• Sensitivity analysis for a particular attribute
• Randomly change the value in all samples
• Re-classify and compare performance
  – Drop in performance is relative to the importance of the attribute
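A minimal sketch of this sensitivity analysis follows. It assumes that "randomly change the value" is done by shuffling one attribute's values across samples, which is one common way to do it and not necessarily the one used in the study. The classifier `classify`, the two attributes and the rows are all hypothetical.

```python
import random


def error_rate(classify, rows):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(classify(x) != label for x, label in rows)
    return wrong / len(rows)


def importance(classify, rows, attr, seed=0):
    """Increase in error rate after shuffling one attribute across all rows."""
    rng = random.Random(seed)
    column = [x[attr] for x, _ in rows]
    rng.shuffle(column)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [(x[:attr] + (value,) + x[attr + 1:], label)
                for (x, label), value in zip(rows, column)]
    return error_rate(classify, permuted) - error_rate(classify, rows)


def classify(x):
    """Hypothetical classifier that looks only at attribute 0."""
    return "Yes" if x[0] >= 3 else "No"


# Hypothetical rows: ((changed dependencies, function name length), updated?)
rows = [((5, 12), "Yes"), ((4, 30), "Yes"), ((6, 9), "Yes"),
        ((0, 8), "No"), ((1, 25), "No"), ((2, 14), "No")]

print(importance(classify, rows, attr=0))  # attribute the classifier relies on
print(importance(classify, rows, attr=1))  # attribute the classifier ignores
```

An attribute the classifier never consults produces no change in error when shuffled, while shuffling an attribute it relies on can only leave the error unchanged or raise it here, since the baseline error is zero.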

Page 14: Understanding the Rationale for Updating a Function's Comment

Case Study

• Used 4 open source projects with over 39 years of development:
  • PostgreSQL, FreeBSD, Gcluster and GCC

• Conducted 5 experiments
  • 1 for each dimension
  • 1 for all attributes of each project
  • 1 for total combined attributes of all projects

Page 15: Understanding the Rationale for Updating a Function's Comment

Exp. #1 Characteristics of changed function

• Intuition
  – Modifications to complex functions are trickier and more likely to introduce integration bugs

• Findings
  – Likelihood of comment update is higher in functions
    • With a large number of comments
    • That are complex

Page 16: Understanding the Rationale for Updating a Function's Comment

Exp. #2 Characteristics of the change

• Intuition
  – More extensive and complex changes will increase the probability that a comment will get updated

• Findings
  – Likelihood of comment update is higher for changes
    • That are bug fixes
    • With a large number of changed dependencies
    • Which increase the complexity of a function (control statements)

Page 17: Understanding the Rationale for Updating a Function's Comment

Exp. #3 Change time and code-ownership

• Intuition
  – To see if time has any impact on a developer's tendency to update a comment
  – To highlight the relation between a function and its developer

• Findings
  – Likelihood of comment update
    • Depends on the weekday: developers are reluctant to update comments on certain weekdays
    • Does not depend on the developer: non-creators of a function update its comments too

Page 18: Understanding the Rationale for Updating a Function's Comment

Exp. #4 All attributes

• Intuition
  – To find the general trend across all attributes instead of a specific trend per dimension

• Findings
  – The top attributes are consistent across projects
  – The top attributes are from the changed function and change characteristics dimensions
    • Number of changed dependencies
    • Percentage of changed dependencies
    • Total number of comments

Page 19: Understanding the Rationale for Updating a Function's Comment

Exp. #5 All Projects

• Intuition
  – Determine the most influential attributes across all projects

• Added an extra attribute “Project Name”

• Findings
  – Project name did not bubble up as an important attribute

Page 20: Understanding the Rationale for Updating a Function's Comment

How well did we do?

Page 21: Understanding the Rationale for Updating a Function's Comment

Numbers Speak

• Classifier performance improves when data from all projects is combined. Overall misclassification rate ~ 20%

Page 22: Understanding the Rationale for Updating a Function's Comment

Conclusion

Page 23: Understanding the Rationale for Updating a Function's Comment

Random Forests

[Diagram: n sets of random cases are drawn from the training set; a classification algorithm is run on each to build n classifiers; the classifiers label the test set, producing n labels (L1, L2, L3, …, Ln) that are combined into Lvote.]