Improving the Integration Process of Large Software Systems
Yujuan Jiang, Bram Adams
MCIS, Polytechnique Montreal, Canada
Friday, 18 April, 14
Integration & its Challenges
external library
host project
platform: Java 7
dependency: Java 8
I do hold out hope that Google does come around and works to fix their codebase to get it merged upstream to stop the huge blockage that they have now caused in a large number of embedded Linux hardware companies […] But I need the help of the Google developers to make it happen, without them, nothing can change.
http://www.kroah.com/log/linux/android-kernel-problems.html
Greg Kroah-Hartman
Our Approach
Understand how integration works.
Analyze why integration fails.
Propose solutions to integration issues.
Case study: Linux Kernel
[Figure: Linux kernel patch flow. Labels: contributors; mailing lists (linux-usb, linux-scsi, lkml); subsystem maintainers; maintainer Linus Torvalds; release linux 3.5; phases: Reviewing, Integration, Staging.]
Few Patches Get In! Long Integration Time!

[Figure: accepted/rejected patches per year, 2005 to 2012 (y-axis labeled "percentage of patches", ticks 0 to 120000), annotated with the percentage accepted and rejected by Linus.]

year    % accepted by Linus    % rejected by Linus
2005    28.63                  71.37
2006    28.70                  71.30
2007    27.03                  72.97
2008    32.83                  67.17
2009    32.79                  67.21
2010    33.87                  66.13
2011    33.55                  66.45
2012    30.74                  69.26

33% of patches make it!

[Figure: percentage of accepted patches of each year, 2005 to 2012 (y-axis: 0 to 80), broken down by integration time: instantly, within_hour, within_day, within_week, within_month, within_quarter, within_half_year, within_year, took_ages.]

Integration requires 1 to 6 months!
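A minimal sketch of how the two charts on this slide could be reproduced from raw data. It averages the yearly acceptance percentages shown on the slide and assigns an integration delay to one of the nine time buckets. The bucket boundaries (for example, 30 days for a month) and the names `ACCEPTED_PCT`, `BUCKETS`, `bucket`, and `mean_acceptance` are assumptions for illustration; the slide only gives the bucket labels.

```python
from datetime import timedelta

# Yearly "% accepted by Linus" values as shown on the slide (2005-2012).
ACCEPTED_PCT = {2005: 28.63, 2006: 28.70, 2007: 27.03, 2008: 32.83,
                2009: 32.79, 2010: 33.87, 2011: 33.55, 2012: 30.74}

# Assumed upper bound of each bucket; only the labels come from the slide.
BUCKETS = [
    ("instantly", timedelta(0)),
    ("within_hour", timedelta(hours=1)),
    ("within_day", timedelta(days=1)),
    ("within_week", timedelta(weeks=1)),
    ("within_month", timedelta(days=30)),
    ("within_quarter", timedelta(days=91)),
    ("within_half_year", timedelta(days=182)),
    ("within_year", timedelta(days=365)),
]


def bucket(delay):
    """Return the label of the first bucket whose upper bound covers the delay."""
    for label, upper in BUCKETS:
        if delay <= upper:
            return label
    return "took_ages"


def mean_acceptance(pcts):
    """Unweighted mean of the yearly percentages (not weighted by patch count)."""
    return sum(pcts.values()) / len(pcts)


print(round(mean_acceptance(ACCEPTED_PCT), 2))
print(bucket(timedelta(days=45)))
```

Because the mean is unweighted, it only approximates the overall acceptance rate; weighting each year by its number of patches would require the raw counts, which the slide does not list.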
reviewing history
Tracking Evolution Process of the Patch
It takes 25% of patches more than 4 weeks to be reviewed!
Table 4: Average value of characteristics in different types of threads (including rejected patches).

metric name                      MM        MS        SS
Review
  thr_volume                  3.838     6.051     3.533
  nr_reviews                  1.046     1.936     1.165
  review_time (day)           1.932     3.022     2.271
  response_time (day)         0.871     1.131     1.030
  first_response_time (day)   0.801     1.215     1.010
Patch
  size                       81.660   146.100    25.430
  spread                      2.398     3.811     1.016
  spread_subsys               1.387     1.750     1.003
Other
  acceptance                 46.05%    43.97%    23.26%
  bug-fix                    49.94%    36.72%    31.10%

Table 5: Time duration (#days) of the super-threads of type MM.

            time duration    # of patch versions
Min.                0                  2.000
1st Qu.             2.687              2.000
Median             10.061              2.000
Mean               21.341              3.172
3rd Qu.            32.923              3.000
Max.              107.524            108.000
of MM). Table 5 shows that most of the super-threads consist of only a few patch versions (mean value is 3.172), but can last a long time (mean value is 21.341 days). More than 25% of the patches have a reviewing time of more than 4.5 weeks, which is longer than considered by researchers thus far [16].
Qb) What kind of patches undergo multiple patch versions?
Patches evolving across multiple versions are larger and affect more files than those with a single version. The patches from threads of type MM (especially) and MS have higher values for "size", "spread" and "spread_subsys", as shown in Table 4. This indicates that the patches undergoing multiple versions tend to be larger and more complex, and hence need more attention before being integrated. Surprisingly, such invasive patches seem to feature especially in single threads, rather than multiple ones. As we will see below, this means that they are still integrated relatively quickly, whereas the patches that need more versions tend to be slightly less invasive. A Kruskal-Wallis test with post-hoc tests verified the significant difference.
Qc) What kind of patches undergo multiple threads?
Kernel developers use multiple threads if too much time has passed since the previous patch version. We compared the time distribution of the interval between two successive threads to that of two successive patch versions within one thread. The result is shown in the boxplots of Figure 10. We can see that the time interval between threads is much longer than that between patch versions. This seems to confirm the intuition that people typically start a new thread when too much time has passed since the last review or version of a patch, whereas they would continue the same thread otherwise. A t-test with the null hypothesis "no difference between the average of both time distributions" yielded a p-value < 2.2e-16, which confirms that the differences are statistically significant.
Figure 10: Boxplot of average time interval (#days) between two successive patch versions/threads of super-threads.
[Boxplot categories: within thread, between 2 successive threads, between 2 successive patch versions; axis: # of days.]
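The comparison of time intervals described above can be sketched as follows. The excerpt does not say which t-test variant was used, so this example assumes Welch's unequal-variance form and computes only the test statistic and approximate degrees of freedom (a p-value would need a t-distribution CDF, for instance from a statistics library). The submission days and the names `successive_intervals` and `welch_t` are made up for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def successive_intervals(days):
    """Gaps (in days) between consecutive submission times."""
    ordered = sorted(days)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def welch_t(xs, ys):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(xs) / len(xs)  # squared standard error of mean(xs)
    vy = variance(ys) / len(ys)  # squared standard error of mean(ys)
    t = (mean(xs) - mean(ys)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df


# Hypothetical submission days, for illustration only.
version_days = [0, 2, 3, 7, 8]   # patch versions inside one thread
thread_days = [0, 30, 75, 140]   # first messages of successive threads

version_gaps = successive_intervals(version_days)  # [2, 1, 4, 1]
thread_gaps = successive_intervals(thread_days)    # [30, 45, 65]

t_stat, dof = welch_t(thread_gaps, version_gaps)
print(t_stat, dof)
```

A positive statistic here means the gaps between threads are longer on average than the gaps between patch versions, which is the direction reported in the text.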
Threads of type MM especially consist of bug-fixes. Out of all MM threads, 49.94% are bug-fixes, compared to 36.72% for MS and 31.10% for SS threads. Pairwise Mann-Whitney tests show that MM has no significant difference with MS, but that MM and MS are significantly different from SS.
On the one hand, this finding seems surprising, since one would expect bug-fixes to be smaller and hence require less discussion. On the other hand, bugs might be risky to fix, and hence require care and thorough reviews.
Qd) Do reviewers lose interest in multi-version patches?
Patches of MM and MS threads involve more discussion than SS threads. We compared the number of reviews discussing a patch for the three different types of threads. The value of "thr_volume" of MM and MS patches is higher than for SS, which means that if a patch needs to undergo one or more additional versions, reviewers seem to discuss it more and provide more constructive comments to help improve it until it is accepted. The amount of reviewing hence does not suffer from having multiple versions of a patch. A Kruskal-Wallis test with post-hoc tests showed that the three groups are different from each other.
Patches of type MM receive fewer reviews. SS and MS patches receive the most reviews (nr_reviews), and hence take more review time as well. Hence, it seems like patches evolving across multiple versions attract fewer reviews. A Kruskal-Wallis test with post-hoc tests showed that the difference is significant. One possible explanation could be that MM patches received the majority of their reviews early on, with later patch versions receiving more focused reviewing.
Reviewers are more eager to review MM threads. Although we did not find a statistically significant difference between the values of metrics "response_time" and "first_response_time" for SS and MS, we found that MM threads take significantly less time before the first review. Reviewers seem to consider such patches as having a higher priority than other types of super-threads.
Qe) Do multi-version patches have a lower chance of accep-
Figure out the Integration Black-box!
[Figure: Linux kernel patch flow. Labels: contributors; mailing lists (linux-usb, linux-scsi, lkml); subsystem maintainers; maintainer Linus Torvalds; release linux 3.5; phases: Reviewing, Integration, Staging.]
Propose Solution to Improve Integration
Questions facing the integrator:
Does it have external dependencies?
How much effort do I need to invest?
Is this integration worth the effort?
What will this integration change?
Will it cause further risk?
...
ISOMO: Integration of Software cOst MOdel
ISOMO: Quantify the Cost of Integration
Merge Cost
Update Cost
Maintenance Cost
Removal Cost
ISOMO
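A minimal sketch of how the four cost components named on this slide could be held together in code. The slide does not state how ISOMO combines them, so the plain sum in `total`, the person-hour unit, the example numbers, and the class name `IntegrationCost` are all assumptions for illustration rather than the model's actual definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationCost:
    """The four cost components listed on the slide (assumed unit: person-hours)."""
    merge: float
    update: float
    maintenance: float
    removal: float

    def total(self) -> float:
        # Assumption: components are simply added; the slide gives no formula.
        return self.merge + self.update + self.maintenance + self.removal


# Hypothetical estimate for integrating one external library.
candidate = IntegrationCost(merge=8.0, update=3.0, maintenance=12.0, removal=2.0)
print(candidate.total())
```

Keeping the components separate, rather than storing only a total, lets an integrator see which phase dominates the estimate before deciding whether the integration is worth the effort.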