
The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation

Apr 15, 2017

Page 1

The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation

Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, Harald Gall

Page 2

Why?

@Test
public void test0() throws Throwable {
    Option option0 = new Option("aaabbb", true, "aaabbb");
    Option option1 = new Option("aaabbb", true, "aaabbb");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

@Test
public void test1() throws Throwable {
    Option option0 = new Option("aaabbb", true, "aaabbb");
    Option option1 = new Option("aaabbb", true, "aaabbb");
    option0.setLongOpt("adafv");
    option1.setLongOpt("adafv");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

Class name: Option.java
Library: Apache Commons CLI

Page 6

Why? (generated tests test0 and test1 for Option.java, Apache Commons CLI, as on page 2)

Q1: What are the main differences?

Q2: Do they cover different parts of the code?

Q3: Are these assertions correct? (candidate assertions)

Page 7

Test Code Comprehension

Generated Tests: test0 and test1 (as on page 2)

Production Code:

public class Options implements Serializable {
    private static final long serialVersionUID = 1L;

    /** a map of the options with the character key */
    private Map shortOpts = new HashMap();

    /** a map of the options with the long key */
    private Map longOpts = new HashMap();

    /** a map of the required options */
    private List requiredOpts = new ArrayList();

    /** a map of the option groups */
    ...

Earl T. Barr et al., “The Oracle Problem in Software Testing: A Survey”. IEEE Transactions on Software Engineering, 2015.

Page 8

Are Generated Tests Helpful?

G. Fraser et al., “Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study”, TOSEM 2015.

Generated tests do not lead to the detection of more faults.

[Chart: labels "Testing", "Comprehension", and "Testing time"; scale marks at 0%, 75%, and 100%]

Page 9

Our Solution

Test Case

Page 10

TestDescriber

[Diagram labels: Option.java, Test Suite Generation, Test Coverage Analysis (Cobertura), Summary Generation]

@Test
public void testProva() throws Throwable {
    Option option0 = new Option("aaa", true, "aaa");
    Option option1 = new Option("aaa", true, "aaa");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

@Test
public void testProva2() throws Throwable {
    Option option0 = new Option("aaa", true, "aaa");
    Option option1 = new Option("aaa", true, "aaa");
    option0.setLongOpt("adafv");
    option1.setLongOpt("adafv");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

Page 11

Summary Generator

Software Word Usage Model (SWUM): deriving <actions>, <themes>, and <secondary arguments> from class, method, attribute, and variable identifiers

E. Hill et al., “Automatically Capturing Source Code Context of NL-Queries for Software Maintenance and Reuse”. ICSE 2009.

Page 12

Summary Generator

SWUM in TestDescriber:

Covered Code:

public class Option {

    public Option(String opt, String longOpt, boolean hasArg, String descr)
            throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;

        if (hasArg) {
            this.numberOfArgs = 1;
        }

        this.description = descr;
    }

    ...
}

Page 13

Summary Generator

SWUM in TestDescriber:

1) Select the covered statements

Covered Code:

public class Option {

    public Option(String opt, String longOpt, boolean hasArg, String descr)
            throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;

        if (hasArg) { // FALSE
            this.numberOfArgs = 1;
        }

        this.description = descr;
    }

    ...
}

Page 14

Summary Generator

SWUM in TestDescriber:

1) Select the covered statements

2) Filter out Java keywords, etc.

Covered Code:

public class Option {

    public Option(String opt, String longOpt, boolean hasArg, String descr)
            throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this opt = opt;
        this longOpt = longOpt;

        if (hasArg) { false }

        this description = descr;
    }

    ...
}

Page 15

Summary Generator

SWUM in TestDescriber:

1) Select the covered statements

2) Filter out Java keywords, etc.

3) Identifier splitting (camel case)

Covered Code:

public class Option {

    public Option(String opt, String long Opt, boolean has Arg, String descr)
            throws IllegalArgumentException {
        Option Validator.validate Option(opt);
        this opt = opt;
        this long Opt = long Opt;

        if (has Arg) { false }

        this description = descr;
    }

    ...
}

Page 16

Summary Generator

SWUM in TestDescriber:

1) Select the covered statements

2) Filter out Java keywords, etc.

3) Identifier splitting (camel case)

4) Abbreviation expansion (using external vocabularies)

Covered Code:

public class Option {

    public Option(String option, String long Option, boolean has Argument, String description)
            throws IllegalArgumentException {
        Option Validator.validate Option(option);
        this option = option;
        this long Option = long Option;

        if (has Argument) { false }

        this description = description;
    }

    ...
}

Page 17

Summary Generator

SWUM in TestDescriber:

1) Select the covered statements

2) Filter out Java keywords, etc.

3) Identifier splitting (camel case)

4) Abbreviation expansion (using external vocabularies)

5) Part-of-speech tagger

<actions> = verbs
<themes> = nouns / subjects
<secondary arguments> = nouns / objects, adjectives, etc.

Covered Code:

public class Option {
    Option(String option, String long Option, boolean has Argument, String description)
            throws IllegalArgumentException
    Option Validator.validate Option(option);
    this option = option;
    this long Option = long Option;
    if (has Argument) { false }
    this description = description;
}

[Figure: each term of the covered code is annotated with a part-of-speech tag: NOUN, VERB, ADJ, or CON]
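Steps 2 to 4 above can be illustrated with a small, self-contained Java sketch. It is not TestDescriber's implementation: the class name `IdentifierPreprocessor`, the keyword set, and the three-entry abbreviation dictionary are hypothetical stand-ins for the external vocabularies mentioned on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdentifierPreprocessor {

    // Hypothetical subset of Java keywords to drop (step 2).
    static final Set<String> KEYWORDS =
            Set.of("public", "this", "if", "throws", "boolean", "new");

    // Hypothetical dictionary standing in for an external vocabulary (step 4).
    static final Map<String, String> ABBREVIATIONS =
            Map.of("opt", "option", "arg", "argument", "descr", "description");

    /** Step 3: split a camel-case identifier into lower-case words. */
    static List<String> splitCamelCase(String identifier) {
        List<String> words = new ArrayList<>();
        // Split before every upper-case letter that follows a lower-case letter or digit.
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            words.add(part.toLowerCase());
        }
        return words;
    }

    /** Step 4: expand a word if the dictionary knows it, otherwise keep it unchanged. */
    static String expand(String word) {
        return ABBREVIATIONS.getOrDefault(word, word);
    }

    /** Steps 2 to 4 applied to a list of identifier tokens. */
    static List<String> preprocess(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (KEYWORDS.contains(token)) {
                continue; // step 2: keywords carry no domain vocabulary
            }
            for (String word : splitCamelCase(token)) {
                result.add(expand(word));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [long, option, has, argument, description]
        System.out.println(preprocess(List.of("this", "longOpt", "hasArg", "descr")));
    }
}
```

The output mirrors the transformation between pages 14 and 16: `longOpt` becomes "long option" and `hasArg` becomes "has argument". Part-of-speech tagging (step 5) needs a natural-language tagger and is left out of the sketch.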

Page 18

Summary Generator

Parsed Code: [Figure: the tagged covered code from page 17, each term labelled NOUN, VERB, ADJ, or CON]

Natural Language Sentences:

The test case instantiates an "Option" with:
- option equal to “...”
- long option equal to “...”
- it has no argument
- description equal to “…”

An option validator validates it

The test exercises the following condition:
- "Option" has no argument
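The step from tagged terms to sentences can be pictured as filling a sentence template. The Java sketch below is an illustration only, under the assumption that the action is fixed to "instantiates", the theme is the class name, and the secondary arguments are the already expanded parameter names. The class `SummaryTemplate`, its method, and the argument values are hypothetical, and the sketch handles only "equal to" arguments, not the "it has no argument" case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SummaryTemplate {

    /**
     * Fills a fixed sentence template: the action is "instantiates",
     * the theme is the class name, and the secondary arguments are
     * the expanded parameter names paired with their values.
     */
    static String describeInstantiation(String theme, Map<String, String> arguments) {
        StringBuilder sentence = new StringBuilder();
        sentence.append("The test case instantiates an \"")
                .append(theme)
                .append("\" with:\n");
        for (Map.Entry<String, String> argument : arguments.entrySet()) {
            sentence.append("- ")
                    .append(argument.getKey())
                    .append(" equal to \"")
                    .append(argument.getValue())
                    .append("\"\n");
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        // Hypothetical argument values; LinkedHashMap keeps the declaration order.
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("option", "a");
        arguments.put("long option", "all");
        arguments.put("description", "show all");
        System.out.print(describeInstantiation("Option", arguments));
    }
}
```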

Page 20

Summarisation Levels: Class Level, Method Level, Statement Level, Branch Level

Do Test Summaries Improve Test Readability?

Do Test Summaries Help Developers?

Page 21

Case Study

Bug Fixing Tasks Involving 30 Developers

Page 23

Context

Object: two Java classes from Apache Commons Primitives and Math4J that have been used in previous studies on search-based software testing [Fraser et al., TOSEM 2015]

Classes: ArrayIntList.java, Rational.java

Subjects: 30 participants (23 researchers and 7 developers)

Page 24

Study Procedure

Page 28

Bug Fixing Tasks

[Diagram: Group 1 and Group 2, each assigned ArrayIntList.java and Rational.java; two "Comments" labels attributed to TestDescriber]

Page 29

Bug Fixing Tasks

Experiment conducted offline via a survey platform

Each participant received the experiment package consisting of:
1. A pre-test questionnaire
2. Instructions and materials to perform the experiment
3. A post-test questionnaire

We did not reveal the goal of the study

45 minutes for each task

Page 30

RQ1: How do test case summaries impact the number of bugs fixed by developers?

Page 34

RQ1: How do test case summaries impact the number of bugs fixed by developers?

Participants WITHOUT TestDescriber summaries fixed 40% of the injected bugs. None of them was able to fix all bugs.

Participants WITH TestDescriber summaries fixed 60%-80% of the injected bugs. 31% of them fixed all the bugs.

With summaries, participants fixed up to twice as many bugs (+50% to +100%) in the same time window (45 minutes).

The differences are statistically significant (Wilcoxon test, p-value < 0.05). The A12 effect size is always LARGE.
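The A12 statistic mentioned above is the Vargha-Delaney effect size: the probability that a value drawn from one group is larger than a value drawn from the other, with ties counted as half. The Java sketch below shows the calculation on made-up numbers. The class name `VarghaDelaney` and both sample arrays are hypothetical and are not the study's data.

```java
public class VarghaDelaney {

    /**
     * Vargha-Delaney A12: fraction of (treatment, control) pairs in which the
     * treatment value is larger, counting ties as half a win.
     */
    static double a12(double[] treatment, double[] control) {
        double wins = 0.0;
        for (double t : treatment) {
            for (double c : control) {
                if (t > c) {
                    wins += 1.0;
                } else if (t == c) {
                    wins += 0.5;
                }
            }
        }
        return wins / (treatment.length * control.length);
    }

    public static void main(String[] args) {
        // Hypothetical numbers of bugs fixed per participant (not the study's data).
        double[] withSummaries = {3, 4, 2, 4};
        double[] withoutSummaries = {1, 2, 2, 1};
        // Prints: 0.9375
        System.out.println(a12(withSummaries, withoutSummaries));
    }
}
```

A value of 0.5 means neither group tends to score higher. By a commonly used convention, values of roughly 0.71 and above are read as a large effect.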

Page 36

RQ1: How do test case summaries impact the number of bugs fixed by developers?

Results are not influenced by developers’ experience:

(i) the number of bugs fixed is not significantly influenced by programming experience;

(ii) there is no significant interaction between programming experience and the presence of test case summaries.

Summary: Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.

Page 37

RQ2: How do test case summaries influence the way developers change test cases, in terms of structural and mutation coverage?

Page 40

RQ2: How do test case summaries influence the way developers change test cases, in terms of structural and mutation coverage?

[Chart: results for ArrayIntList.java and Rational.java; annotation "10%"]

Only for Rational.java does the mutation score improve (+10%) when tests are enriched with summaries.

Summary: Test case summaries do not influence how developers manage the test cases in terms of structural coverage.
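For reference, the mutation score is conventionally the fraction of generated mutants that the test suite detects (kills). The slides do not say how the score was computed, so the formula below is the textbook definition rather than the study's exact procedure; some variants exclude equivalent mutants from the denominator.

```latex
\[
\text{mutation score} = \frac{\text{number of killed mutants}}{\text{total number of generated mutants}}
\]
```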

Page 44

Test Case Summaries and Comprehension

[Chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, on a five-level scale (Very Low, Low, Medium, High, Very High); values as extracted: 4%, 6%, 14%, 33%, 14%, 6%, 32%, 9%, 36%, 45%]

WITHOUT summaries:

(i) Only 15% of participants considered the test cases “easy to understand”.

(iii) 40% of participants considered the test cases incomprehensible.

WITH summaries:

(i) 46% of participants considered the test cases “easy to understand”.

(iii) Only 18% of participants considered the test cases incomprehensible.

Summary: Test summaries statistically improve the comprehensibility of automatically generated test cases according to human judgments.

Page 45

Quality of TestDescriber's Summaries

Content adequacy (chart values as extracted: 13%, 37%, 50%)
- Is not missing any information
- Missing some information
- Missing some very important information

Conciseness (chart values as extracted: 10%, 52%, 38%)
- Has no unnecessary information
- Has some unnecessary information
- Has a lot of unnecessary information

Expressiveness (chart values as extracted: 30%, 70%)
- Is easy to read and understand
- Is somewhat readable and understandable
- Is hard to read and understand

Page 48

Conclusion

1) Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.

2) Test case summaries do not influence how developers manage the test cases in terms of structural coverage.

3) Test summaries statistically improve the comprehensibility of automatically generated test cases according to human judgments.

Panichella et al., “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016.