Mutation Analysis vs. Code Coverage in Automated Assessment of Students' Testing Skills
Kalle Aaltonen, Petri Ihantola and Otto Seppälä (SPLASH – ETS '10)
Aalto University, Finland
Jan 25, 2015
What Do We Do?
• Believe in testing
• Provide programming assignments
  – for hundreds of students per course
  – where students are asked to submit:
    • their implementation
    • unit tests covering their own implementation
  – use Web-Cat for automated assessment
• Grade = our tests passing (%) * student's tests passing (%) * line or branch coverage of student's tests
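That multiplicative grading formula can be sketched as follows (a minimal illustration with hypothetical names, not Web-Cat's actual implementation):

```java
// Sketch of the multiplicative grade described above.
// All three factors are fractions in [0, 1], so weakness in any
// single factor pulls the whole grade down.
public class Grader {
    static double grade(double instructorTestsPassing,
                        double studentTestsPassing,
                        double coverage) {
        return instructorTestsPassing * studentTestsPassing * coverage;
    }

    public static void main(String[] args) {
        // All instructor tests pass, 90% of own tests pass, 80% line coverage.
        System.out.printf(java.util.Locale.ROOT, "%.2f%n",
                          grade(1.0, 0.9, 0.8)); // prints 0.72
    }
}
```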
How Students Test
Three different tests with the same code coverage:

// Test 1: executes the code but asserts nothing about it
assertTrue(1 < 2);
fibonacci(6);

// Test 2: asserts a property almost any output satisfies
assertTrue(fibonacci(6) >= 0);

// Test 3: asserts the exact expected value
assertEquals(8, fibonacci(6));
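The difference only shows up when the implementation is wrong. With a hypothetical off-by-one Fibonacci (one loop iteration too many), all three tests still execute the same lines, but only the exact assertion catches the bug:

```java
// Hypothetical buggy Fibonacci: one loop iteration too many,
// so fibBuggy(6) returns 13 instead of 8.
public class CoverageDemo {
    static int fibBuggy(int n) {
        int curr = 1, prev = 0;
        for (int i = 0; i <= n; i++) {   // bug: should be i < n
            int temp = curr;
            curr = curr + prev;
            prev = temp;
        }
        return prev;
    }

    public static void main(String[] args) {
        int result = fibBuggy(6);        // same lines covered by every test below
        System.out.println(1 < 2);       // assertTrue(1 < 2): passes regardless
        System.out.println(result >= 0); // assertTrue(fib(6) >= 0): still passes
        System.out.println(result == 8); // assertEquals(8, fib(6)): fails, bug found
    }
}
```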
Mutation Analysis
• Create variations automatically from the original program
• Simulate bugs
• A good test will catch many of these mutants
  – assuming these mutants are really different from the original
• We hope this provides better feedback/grading
• We used a byte-code level mutation analysis tool called Javalanche
Mutation Analysis
int fib(int n) {
    int curr = 1, prev = 0;
    for (int i = 0; i <= n; i++) {
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
Mutation Analysis: Examples of Mutants
int fib(int n) {
    int curr = 1, prev = 0;
    for (int i = 0; i < n; i++) {   // loop condition mutated
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
int fib(int n) {
    int curr = 0, prev = 1;
    for (int i = 0; i < n; i++) {
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
int fib(int n) {
    int curr = 1, prev = 0;
    for (int i = 1; i <= n; i++) {   // loop start mutated
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
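A mutation score can then be sketched as the fraction of mutants on which at least one test fails ("killed"). The snippet below is a source-level illustration only, with hypothetical names; Javalanche does this at the bytecode level:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Source-level illustration of mutation scoring (hypothetical names;
// Javalanche actually mutates JVM bytecode, not source).
public class MutationScore {
    // Reference implementation: fib(6) == 8.
    static int fib(int n) {
        int curr = 1, prev = 0;
        for (int i = 0; i < n; i++) { int t = curr; curr = curr + prev; prev = t; }
        return prev;
    }

    // Mutant: initial values swapped, as in one of the examples above.
    static int fibSwappedInit(int n) {
        int curr = 0, prev = 1;
        for (int i = 0; i < n; i++) { int t = curr; curr = curr + prev; prev = t; }
        return prev;
    }

    // A mutant is "killed" when the test suite fails on it; here the
    // suite is just the single strong assertion assertEquals(8, fib(6)).
    static boolean killed(IntUnaryOperator variant) {
        return variant.applyAsInt(6) != 8;
    }

    static double score(List<IntUnaryOperator> mutants) {
        long k = mutants.stream().filter(MutationScore::killed).count();
        return (double) k / mutants.size();
    }

    public static void main(String[] args) {
        System.out.println(killed(MutationScore::fib));            // false: correct code survives
        System.out.println(killed(MutationScore::fibSwappedInit)); // true: mutant caught
    }
}
```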
Some Results
[Two scatter plots: mutation score (y-axis, 0–1) against code coverage (x-axis, 0–1) for student test suites.]
• Data: BST, Hashing, Disjoint Sets assignments
• Most students get full points from the coverage
• Mutation scores more widely distributed
About the Validity of the Results
[Bar chart comparing four test suites on a 40–100 % scale:]
Best Suite: mutation score 98.0 %
Random Suite 1: mutation score 85.4 %
Random Suite 2: mutation score 72.0 %
Worst Suite: mutation score 54.8 %
Conclusions
• Can be used to pick out suspicious solutions
  – high code coverage but low mutation score
• Reduces the importance of unit tests written by the teacher
  – also able to ensure that unspecified features are tested (i.e. specified)
• Immediate feedback
  – when compared to running all tests against each solution
• Complex parts of the code get more attention
• Able to give feedback from teacher's own tests
• Should be combined with other test adequacy metrics
Future Directions
• Evaluate in practice
  – the data we analyzed is from a course where traditional coverage was used to provide feedback on tests
• Testability – Test Adequacy – Correctness
• Use source code mutants directly as feedback
Thank You!
Questions, comments?
Graphics:
Vte.Moncho, http://www.flickr.com/photos/maniacpictures/
Don Solo, http://www.flickr.com/photos/donsolo/
licensed under the Creative Commons license