The relationship between test code and production code
Mauricio Aniche
University of São Paulo (USP), IME-USP
[email protected]
May 22, 2015
Who Am I?
PhD student at the University of São Paulo
  Master's thesis defended in April 2012
Software developer
  Consultancy for companies such as VeriFone and Sony
  Nowadays: Caelum
Open source
  Restfulie.NET
First Test-Driven Development book in Brazilian Portuguese (in my unbiased opinion, the best TDD book ever!)
Unit Tests and Code Quality
Great synergy between a testable class and a well-designed class (Feathers, 2007)
Writing unit tests can become complex as the interactions between software components grow out of control (McGregor, 2001)
Agile practitioners state that unit tests are a way to validate and improve class design (Beck et al., 2001)
What Am I Going to Say?
A little bit about my master's thesis.
The very first step of my PhD.
The tool I am working on.
1st part: TDD and Class Design
Does the practice of TDD influence the quality of class design?
Mixed-methods study with ~20 experienced developers from industry
  33% have 6 to 10 years of experience
  6 different companies in 3 different cities
Developers were asked to implement a set of problems, with and without TDD.
Exercises dealt with coupling, cohesion, and encapsulation problems.
Quantitative Analysis
264 production classes
  831 methods / 2520 lines of code
73 test classes
  225 methods / 1832 lines of code
Wilcoxon test to compare the two groups.
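To make the comparison step concrete, here is a minimal sketch of the Wilcoxon rank-sum statistic for two groups of metric values. It is an illustration only: the slides do not say which Wilcoxon variant the study used, so the rank-sum (independent samples) form is an assumption, and the class name, method names, and sample values are all made up. The sketch stops at the statistic; a p-value would still have to come from a statistics package or a table.

```java
import java.util.Locale;

// Hypothetical helper, not the study's actual analysis code.
class WilcoxonRankSum {

    /** Sum of the midranks of sample a within the pooled data (statistic W). */
    static double rankSum(double[] a, double[] b) {
        double[] pooled = new double[a.length + b.length];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);

        double w = 0.0;
        for (double value : a) {
            int less = 0;
            int equal = 0; // includes the value itself
            for (double other : pooled) {
                if (other < value) less++;
                else if (other == value) equal++;
            }
            // Tied values share the average of the ranks they occupy.
            w += less + (equal + 1) / 2.0;
        }
        return w;
    }

    /** Mann-Whitney U for sample a, derived from W. */
    static double mannWhitneyU(double[] a, double[] b) {
        int n1 = a.length;
        return rankSum(a, b) - n1 * (n1 + 1) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up metric values for classes written with and without TDD.
        double[] withTdd = {2, 3, 3, 5};
        double[] withoutTdd = {3, 4, 6, 7};
        System.out.println(String.format(Locale.ROOT, "W = %.1f, U = %.1f",
                rankSum(withTdd, withoutTdd), mannWhitneyU(withTdd, withoutTdd)));
    }
}
```

A small U (or a U close to n1 * n2) suggests that one group tends to have lower values than the other; values near n1 * n2 / 2 suggest no difference.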
Show me the p-value!
Quantitative Analysis
Filtering by their experience in TDD
  No statistical significance.

Specialists’ Opinion
Two different specialists reviewed all generated code, without knowing whether it was produced with or without TDD.
They rated “class design”, “testability”, and “simplicity” on a Likert scale from 1 to 5.
No difference in their evaluations.
Qualitative Analysis
Interviews with ~10 developers.
All of them said that “TDD does not guide you to a better class design by itself; the experience in OO and class design makes such a difference”.
Some patterns emerged.
Patterns of Feedback
In my PhD
My idea is to check whether the presence of those patterns in a unit test really implies bad production code.
Mining software repositories (MSR) techniques.
Open source repositories for exploratory purposes and industry repositories for the final study.
2nd part: Unit Tests and Asserts
Every unit test contains three parts
  Set up the scenario
  Invoke the behavior under test
  Validate the expected output
Assert instructions
  assertEquals(expected, calculated);
  assertTrue(), assertFalse(), and so on
No limit on the number of asserts per test
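A little piece of code

    // Imports assume JUnit 4.
    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class InvoiceTest {
        @Test
        public void shouldCalculateTaxes() {
            // (i) setting up the scenario
            Invoice inv = new Invoice(5000.0);

            // (ii) invoking the behavior
            double tax = inv.calculateTaxes();

            // (iii) validating the output (doubles are compared with a tolerance)
            assertEquals(5000 * 0.06, tax, 0.0001);
        }
    }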
Why would…
… a test contain more than one assert?
Is it a smell of bad code/design?
Research Design
We selected 22 projects
  19 from the Apache Software Foundation (ASF)
  3 from a Brazilian consultancy
Data extraction from all projects
  Code metrics
Statistical test
Qualitative analysis
Data Extraction
Test code
  Number of asserts per test
  Production method being tested
Production code
  Cyclomatic complexity (McCabe, 1976)
  Number of method invocations (Li and Henry, 1993)
  Lines of code
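As a rough illustration of two of the production-code metrics listed above, the sketch below approximates cyclomatic complexity as one plus the number of decision points found in a method body, and counts non-blank lines of code. It is a text-based approximation under stated assumptions, not the extraction code used in the study: the class and method names are hypothetical, comments and string literals are not stripped, and a `?` in a generic wildcard would be miscounted as a ternary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based approximation of two code metrics.
class RoughMetrics {

    // Branching keywords plus short-circuit operators and the ternary operator.
    private static final Pattern DECISION_POINT =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    /** Cyclomatic complexity estimated as 1 + number of decision points. */
    static int cyclomaticComplexity(String methodBody) {
        Matcher matcher = DECISION_POINT.matcher(methodBody);
        int decisions = 0;
        while (matcher.find()) {
            decisions++;
        }
        return 1 + decisions;
    }

    /** Number of non-blank lines in the given source text. */
    static int linesOfCode(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String body = "if (a > 0 && b > 0) {\n    return a;\n}\n\nreturn b;";
        System.out.println("CC  = " + cyclomaticComplexity(body));
        System.out.println("LOC = " + linesOfCode(body));
    }
}
```

A real tool would compute these metrics on a parsed syntax tree rather than on raw text, which avoids the miscounts mentioned above.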
Heuristic to Extract the Production Method
    class InvoiceTest {
        @Test
        public void shouldCalculateTaxes() {
            // (i) setting up the scenario
            Invoice inv = new Invoice(5000.0);

            // (ii) invoking the behavior
            double tax = inv.calculateTaxes();

            // (iii) validating the output (doubles are compared with a tolerance)
            assertEquals(5000 * 0.06, tax, 0.0001);
        }
    }

    class Invoice {
        public double calculateTaxes() {
            // something…
        }
    }
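The slide presents the heuristic only through this example, so the rule sketched below is a guess at its spirit rather than the one actually used: strip the "Test" suffix from the test class name to get the production class, then collect the methods invoked on variables declared with that type. The class name `ProductionMethodFinder`, its method, and the regex-based matching are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a name-based heuristic; not the study's implementation.
class ProductionMethodFinder {

    /** Returns "Class.method" entries for calls made on instances of the class under test. */
    static Set<String> find(String testClassName, String testMethodSource) {
        Set<String> methods = new LinkedHashSet<>();
        if (!testClassName.endsWith("Test")) {
            return methods; // naming convention not followed, nothing to infer
        }
        String productionClass =
                testClassName.substring(0, testClassName.length() - "Test".length());

        // Local variables declared as, e.g., "Invoice inv = ..."
        Pattern declaration =
                Pattern.compile("\\b" + Pattern.quote(productionClass) + "\\s+(\\w+)\\s*=");
        Matcher declared = declaration.matcher(testMethodSource);
        while (declared.find()) {
            String variable = declared.group(1);
            // Method calls on that variable, e.g. "inv.calculateTaxes("
            Pattern call =
                    Pattern.compile("\\b" + Pattern.quote(variable) + "\\.(\\w+)\\s*\\(");
            Matcher invoked = call.matcher(testMethodSource);
            while (invoked.find()) {
                methods.add(productionClass + "." + invoked.group(1));
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        String source = "Invoice inv = new Invoice(5000.0);\n"
                + "double tax = inv.calculateTaxes();\n"
                + "assertEquals(5000 * 0.06, tax, 0.0001);";
        System.out.println(find("InvoiceTest", source));
    }
}
```

Text matching like this misses fields, inherited fixtures, and chained calls; it is only meant to show how a test method can be linked to the production method it exercises.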
Asserts Distribution in Selected Projects
Results of the Test
Why more than 1 assert?
130 tests randomly selected
Qualitative analysis:
  More than one assert for the same object (40.4%)
  Different inputs to the same method (38.9%)
  List/Array (9.9%)
  Others (6.8%)
  Extra assert to check if object is not null (3.8%)
“Asserted Objects”
We coined the term “asserted objects”
It counts not the number of asserts in a unit test, but the number of distinct object instances being asserted
  assertEquals(10, obj.getA());
  assertEquals(20, obj.getB());
Counts as 1 “asserted object”
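One way to picture the count is the sketch below, which takes the assert statements of a test, looks at the last argument of each call, and collects the distinct variable names those arguments start with. This is only an approximation written for illustration: it assumes the value being checked is the last argument (as in the two-argument `assertEquals` above), it identifies objects by variable name rather than by instance, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the tool used in the study.
class AssertedObjects {

    // Matches e.g. "assertEquals(10, obj.getA());" and captures the argument list.
    private static final Pattern ASSERT_CALL =
            Pattern.compile("\\bassert\\w*\\s*\\((.*)\\)\\s*;");
    // Leading identifier of an expression, e.g. "obj" in "obj.getA()".
    private static final Pattern ROOT_IDENTIFIER =
            Pattern.compile("^\\s*([A-Za-z_]\\w*)");

    /** Distinct variables whose state is checked by the given assert statements. */
    static Set<String> find(List<String> statements) {
        Set<String> objects = new LinkedHashSet<>();
        for (String statement : statements) {
            Matcher call = ASSERT_CALL.matcher(statement);
            if (!call.find()) {
                continue; // not an assert statement
            }
            Matcher root = ROOT_IDENTIFIER.matcher(lastTopLevelArgument(call.group(1)));
            if (root.find()) {
                objects.add(root.group(1));
            }
        }
        return objects;
    }

    /** Text after the last comma that is not nested inside parentheses. */
    static String lastTopLevelArgument(String arguments) {
        int depth = 0;
        int start = 0;
        for (int i = 0; i < arguments.length(); i++) {
            char c = arguments.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) start = i + 1;
        }
        return arguments.substring(start);
    }

    public static void main(String[] args) {
        List<String> asserts = Arrays.asList(
                "assertEquals(10, obj.getA());",
                "assertEquals(20, obj.getB());");
        System.out.println(find(asserts).size() + " asserted object(s): " + find(asserts));
    }
}
```

With the two asserts from the slide, both calls resolve to the same variable, so the test has two asserts but a single asserted object.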
Distribution of Asserted Objects
Results of the Test
Findings
Counting the number of asserts in a unit test does not give valuable feedback about code quality
But counting the number of asserted objects may provide useful information
However, the difference between both groups was not “big”
A possible explanation:
  Methods with higher CC, more lines of code, and more method invocations contain many different paths, and developers prefer to cover all of them in a single unit test rather than splitting them into several
My current problem
How to statistically identify whether test code is a “unit test” or an “integration/system test”?
3rd Step: Metric Miner
Started as a command-line tool to calculate code metrics in Git repositories.
  As you can guess, I needed that for my master's.
An undergraduate student ported my tool to a web-based system.
  Much more interesting!
What does it do?
Tool that facilitates MSR studies.
Already contains the entire Apache repository cloned.
A researcher can write a new metric and just plug it into the system.
Later on, they can execute an SQL query and extract data.
They can also run a statistical test on two existing data sets.
Pros and Cons
You do not need to spend your own computing resources.
  The power of cloud computing (thanks, Locaweb!)
Still slow.
  We need to parallelize the metric execution.
  Go for Google’s BigQuery (~300GB of data).
Contact Information
Mauricio Aniche
  [email protected] / @mauricioaniche
TDD no Mundo Real
  http://www.tddnomundoreal.com.br
Software Engineering & Collaborative Systems Research Lab (LAPESSC)
  http://lapessc.ime.usp.br/