Mutation Analysis vs. Code Coverage in Automated Assessment of Students' Testing Skills
Kalle Aaltonen, Petri Ihantola and Otto Seppälä (SPLASH – ETS '10)
Aalto University, Finland
Jan 25, 2015
What Do We Do?
• Believe in testing
• Provide programming assignments
  – for hundreds of students per course
  – where students are asked to submit:
    • their implementation
    • unit tests covering their own implementation
  – use Web-Cat for automated assessment
• Grade = our tests passing (%) * student's tests passing (%) * line or branch coverage of student's tests
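That multiplicative grading formula can be sketched as follows (a minimal illustration with hypothetical names, not Web-Cat's actual implementation):

```java
// Sketch of the multiplicative grade described above.
// All three factors are fractions in [0, 1], so weakness in any
// single factor pulls the whole grade down.
public class Grader {
    static double grade(double instructorTestsPassing,
                        double studentTestsPassing,
                        double coverage) {
        return instructorTestsPassing * studentTestsPassing * coverage;
    }

    public static void main(String[] args) {
        // All instructor tests pass, 90% of own tests pass, 80% line coverage.
        System.out.printf(java.util.Locale.ROOT, "%.2f%n",
                          grade(1.0, 0.9, 0.8)); // prints 0.72
    }
}
```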
How Students Test
Three different tests with the same code coverage:

// Test 1: executes the code but asserts nothing about it
assertTrue(1 < 2);
fibonacci(6);

// Test 2: asserts a property almost any output satisfies
assertTrue(fibonacci(6) >= 0);

// Test 3: asserts the exact expected value
assertEquals(8, fibonacci(6));
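The difference only shows up when the implementation is wrong. With a hypothetical off-by-one Fibonacci (one loop iteration too many), all three tests still execute the same lines, but only the exact assertion catches the bug:

```java
// Hypothetical buggy Fibonacci: one loop iteration too many,
// so fibBuggy(6) returns 13 instead of 8.
public class CoverageDemo {
    static int fibBuggy(int n) {
        int curr = 1, prev = 0;
        for (int i = 0; i <= n; i++) {   // bug: should be i < n
            int temp = curr;
            curr = curr + prev;
            prev = temp;
        }
        return prev;
    }

    public static void main(String[] args) {
        int result = fibBuggy(6);        // same lines covered by every test below
        System.out.println(1 < 2);       // assertTrue(1 < 2): passes regardless
        System.out.println(result >= 0); // assertTrue(fib(6) >= 0): still passes
        System.out.println(result == 8); // assertEquals(8, fib(6)): fails, bug found
    }
}
```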
Mutation Analysis
• Create variations automatically from the original program
• Simulate bugs
• A good test will catch many of these mutants
  – assuming these mutants are really different from the original
• We hope this provides better feedback/grading
• We used a byte-code level mutation analysis tool called Javalanche
Mutation Analysis
int fib(int n) {
    int curr = 1, prev = 0;
    for (int i = 0; i <= n; i++) {
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
Mutation Analysis: Examples of Mutants
int fib(int n) {
    int curr = 1, prev = 0;
    for (int i = 0; i < n; i++) {   // loop condition mutated
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
int fib(int n) {
    int curr = 0, prev = 1;
    for (int i = 0; i < n; i++) {
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
int fib(int n) {
    int curr = 1, prev = 0;
    for (int i = 1; i <= n; i++) {   // loop start mutated
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
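A mutation score can then be sketched as the fraction of mutants on which at least one test fails ("killed"). The snippet below is a source-level illustration only, with hypothetical names; Javalanche does this at the bytecode level:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Source-level illustration of mutation scoring (hypothetical names;
// Javalanche actually mutates JVM bytecode, not source).
public class MutationScore {
    // Reference implementation: fib(6) == 8.
    static int fib(int n) {
        int curr = 1, prev = 0;
        for (int i = 0; i < n; i++) { int t = curr; curr = curr + prev; prev = t; }
        return prev;
    }

    // Mutant: initial values swapped, as in one of the examples above.
    static int fibSwappedInit(int n) {
        int curr = 0, prev = 1;
        for (int i = 0; i < n; i++) { int t = curr; curr = curr + prev; prev = t; }
        return prev;
    }

    // A mutant is "killed" when the test suite fails on it; here the
    // suite is just the single strong assertion assertEquals(8, fib(6)).
    static boolean killed(IntUnaryOperator variant) {
        return variant.applyAsInt(6) != 8;
    }

    static double score(List<IntUnaryOperator> mutants) {
        long k = mutants.stream().filter(MutationScore::killed).count();
        return (double) k / mutants.size();
    }

    public static void main(String[] args) {
        System.out.println(killed(MutationScore::fib));            // false: correct code survives
        System.out.println(killed(MutationScore::fibSwappedInit)); // true: mutant caught
    }
}
```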
Some Results
[Two scatter plots: mutation score (y-axis, 0–1) against code coverage (x-axis, 0–1) for student test suites.]
• Data: BST, Hashing, Disjoint Sets assignments
• Most students get full points from the coverage
• Mutation scores more widely distributed
About the Validity of the Results
[Bar chart comparing four test suites on a 40–100 % scale:]
Best Suite: mutation score 98.0 %
Random Suite 1: mutation score 85.4 %
Random Suite 2: mutation score 72.0 %
Worst Suite: mutation score 54.8 %
Conclusions
• Can be used to pick out suspicious solutions
  – high code coverage but low mutation score
• Reduces the importance of unit tests written by the teacher
  – also able to ensure that unspecified features are tested (i.e. specified)
• Immediate feedback
  – when compared to running all tests against each solution
• Complex parts of the code get more attention
• Able to give feedback from teacher's own tests
• Should be combined with other test adequacy metrics
Future Directions
• Evaluate in practice
  – the data we analyzed is from a course where traditional coverage was used to provide feedback on tests
• Testability – Test Adequacy – Correctness
• Use source code mutants directly as feedback
Thank You!
Questions, comments?
Graphics:
Vte.Moncho, http://www.flickr.com/photos/maniacpictures/
Don Solo, http://www.flickr.com/photos/donsolo/
licensed under the Creative Commons license