
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)

Nov 29, 2014


Tom Mens

Presentation of research goals and ongoing research in the joint ARC project "ECOS: Ecological Studies of Open Source Software Ecosystems", presented by Tom Mens (UMONS) during the projects track of the CSMR-WCRE 2014 Software Evolution Week. Collaborators: Philippe Grosjean and Maelick Claes.
Transcript
Page 1: ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)

ECOS: Ecological Studies of Open Source Software Ecosystems

• Tom Mens, Maelick Claes
  – Software Engineering Lab
• Philippe Grosjean
  – Numerical Ecology Lab

informatique.umons.ac.be/genlog/projects/ecos

Page 2

5 February 2014, CSMR-WCRE Software Evolution Week, Antwerp, Belgium

About ECOS

• “Action de Recherche Concertée” (Concerted Research Action) of the University of Mons
  – Interdisciplinary project
• Combines research in biology (ecology) and computing science (empirical software engineering)
  – COMPLEXYS Research Institute
  – Oct 2012 to Sep 2017
  – 500K EUR funding
• Related EU project:


Page 3

High-level project goal

• Improve understanding of, and support for, open source software ecosystems
  – Draw inspiration from biological evolution, ecology and natural ecosystems
• Determine main factors of success and failure of OSS projects within their ecosystem
  – Provide better techniques and mechanisms to predict and improve survivability of OSS projects and resilience of their ecosystems
  – Provide guidelines and evolution dashboards to support software communities


Page 4

Software ecosystem definition

Business-oriented view
• “a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.” (Jansen et al. 2009)

Examples

• Eclipse
• Android and iOS app stores

Page 5

Software ecosystem definition

Development-centric view
• “a collection of software products that have some given degree of symbiotic relationships.”
  – Messerschmitt & Szyperski: Software ecosystem: Understanding an indispensable technology and industry. MIT Press, 2003.
• “a collection of software projects that are developed and evolve together in the same environment.”
  – M. Lungu: Towards reverse engineering software ecosystems. Int’l Conf. Software Maintenance, 2008, pp. 428–431.

Examples

• Gnome, KDE
• Debian, Ubuntu
• R’s CRAN
• Apache

Page 6

Main Research Questions

• Which control mechanisms driving natural ecosystems can be used to explain dynamics of software ecosystems?
• Which mechanisms and measures can we borrow from ecology to explain and predict how software projects evolve?


Page 7

Terminology: biological ecosystem

Definitions

• Ecology: the scientific study of the interactions that determine the distribution and abundance of organisms
• Ecosystem: the physical and biological components of an environment considered in relation to each other as a unit
  – combines all living organisms (plants, animals, micro-organisms) and physical components (light, water, soil, rocks, minerals)

Example: coral reefs

• High biodiversity: polyps, sea anemones, fish, mollusks, sponges, algae

Page 8

Comparison

Page 9

Ecological theories of evolution of species

• Jean-Baptiste Lamarck (1744–1829)
  – animal organs and behaviour can change according to the way they are used
  – those characteristics can be transmitted from one generation to the next to reach a greater level of perfection
  – Example: giraffes’ necks have become longer while trying to reach the upper leaves of a tree
• Charles Darwin (1809–1882)
  – all species of life have descended over time from common ancestors
  – this branching pattern resulted from natural selection
  – evolution history is represented by a phylogenetic tree
  – Example: 13 types of Galapagos finches, same habits and characteristics, but different beaks

Page 10

Ecological theories of evolution of species

Hologenome theory
• The unit of natural selection is the holobiont: the organism together with its associated microbial communities, which live together in symbiosis.
• The holobiont can adapt to changing environmental conditions far more rapidly than by genetic mutation and selection alone.
• Darwinism emphasises competition (survival of the fittest); hologenome theory also includes cooperation (through symbiosis).

In software evolution: hologenome theory may be closer to what one observes in open source projects, where cooperation plays a more important role.

Page 11

Ecological theories of evolution of species

Reticulate evolution
• Evolution history is represented as a graph structure. Two or more evolutionary lineages can be recombined at some level:
  – hybrid speciation (2 lineages recombine to create a new one)
  – horizontal gene transfer (genes are transferred across species)

In software evolution: distributed VCS like Git promote reticulate evolution through fork and merge (but few projects actually merge).

See Robles et al. A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes. OSS Conference 2012, Best Paper Award.
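The difference between a phylogenetic tree and reticulate evolution can be made concrete on a version history. The minimal Python sketch below uses hypothetical commit identifiers rather than a real Git repository: each commit maps to its list of parent commits, in the spirit of a Git history where a merge commit has several parents. A history stays a tree as long as every commit has at most one parent, and becomes reticulate as soon as a merge recombines two lineages.

```python
from typing import Dict, List

# Hypothetical history: each commit id maps to the list of its parent commits.
History = Dict[str, List[str]]

def merge_commits(history: History) -> List[str]:
    """Return the commits that recombine two or more lineages (merges)."""
    return sorted(c for c, parents in history.items() if len(parents) > 1)

def is_reticulate(history: History) -> bool:
    """A history is reticulate (a graph, not a tree) if any commit has 2+ parents."""
    return bool(merge_commits(history))

# A fork: 'b' and 'c' both branch off from 'a' (still a tree).
forked: History = {"a": [], "b": ["a"], "c": ["a"]}
# The same fork, later merged back together in 'd' (now a graph).
merged: History = {**forked, "d": ["b", "c"]}

print(is_reticulate(forked), is_reticulate(merged), merge_commits(merged))
```

A fork alone only adds a branch to the tree; it is the merge that creates the graph structure, which is why the slide notes that few projects actually merge.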

Page 12

Evolution history: software

Page 13

Trophic web (food chain) in natural ecosystems

Page 14

Trophic web in software ecosystems

• Producer-consumer relation

Onion model:
• Users
• Peripheral developers
• Core developers

TOP-DOWN: change requests & bug reports
BOTTOM-UP: changes in core projects and architecture

Page 15

Core architecture, or why developers are polyps

Coral reef ecosystem
• Scleractinian coral polyps are responsible for creating the coral reef structure
• This coral reef is required for the other species of the ecosystem to thrive.

Software ecosystem
• Core developers are responsible for creating the core software architecture
• Based on this core architecture, other developers and third parties can create other projects, services, and so on.

Page 16

Software ecosystem dynamics

Predator-prey relationship (Lotka-Volterra 1925/1926)
• Predators (hunting animals) feed upon their prey (attacked animals)
• Can be described by a dynamic model with mutually dependent parametric differential equations

Analogies in software maintenance
• Debuggers are predators, software defects are prey
  – Calzolari et al. “Maintenance and testing effort modeled by linear and nonlinear dynamic systems,” Information and Software Technology, 2001
• Developers are predators, the information they seek is prey
  – Lawrance et al. Scents in programs: Does information foraging theory apply to program maintenance? VL/HCC 2007
• Dual (socio-technical) view:
  – Developers are predators, the projects they work on are prey
  – Projects are predators that feed upon the cognitive resources of their developers
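The classical Lotka-Volterra model couples a prey population x and a predator population y through dx/dt = αx − βxy and dy/dt = δxy − γy. The Python sketch below integrates these equations with a fixed-step fourth-order Runge-Kutta scheme. The parameter values and initial populations are arbitrary illustrations, not values fitted to any software data; reading x as open defects and y as active debuggers is only the analogy from the slide. As a sanity check, it tracks the quantity δx − γ ln x + βy − α ln y, which the exact solution conserves.

```python
import math
from typing import List, Tuple

Params = Tuple[float, float, float, float]  # (alpha, beta, delta, gamma)

def derivatives(x: float, y: float, p: Params) -> Tuple[float, float]:
    """Lotka-Volterra right-hand side for prey x and predators y."""
    alpha, beta, delta, gamma = p
    return alpha * x - beta * x * y, delta * x * y - gamma * y

def simulate(x0: float, y0: float, p: Params,
             dt: float = 0.01, steps: int = 2000) -> List[Tuple[float, float]]:
    """Integrate the system with a fixed-step classical Runge-Kutta (RK4) scheme."""
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        k1x, k1y = derivatives(x, y, p)
        k2x, k2y = derivatives(x + dt / 2 * k1x, y + dt / 2 * k1y, p)
        k3x, k3y = derivatives(x + dt / 2 * k2x, y + dt / 2 * k2y, p)
        k4x, k4y = derivatives(x + dt * k3x, y + dt * k3y, p)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += dt / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        trajectory.append((x, y))
    return trajectory

def invariant(x: float, y: float, p: Params) -> float:
    """Quantity conserved by the exact solution: delta*x - gamma*ln x + beta*y - alpha*ln y."""
    alpha, beta, delta, gamma = p
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)

# Hypothetical parameters and initial populations, chosen only for illustration.
params: Params = (1.0, 0.1, 0.075, 1.5)
traj = simulate(x0=10.0, y0=5.0, p=params)
peak_prey = max(x for x, _ in traj)
drift = abs(invariant(*traj[-1], params) - invariant(*traj[0], params))
print(f"peak prey: {peak_prey:.1f}, invariant drift: {drift:.2e}")
```

The two populations oscillate around the equilibrium (γ/δ, α/β) instead of settling down, which is the kind of coupled dynamics the analogies above suggest between defects and debugging effort.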

Page 17

Desirable ecosystem characteristics

Biodiversity measures the degree of variation of species within a given ecosystem
• Maximum diversity if all species have the same number of individuals
• Low diversity if a particular species dominates the others
• Many different metrics: Shannon entropy, Simpson index, evenness, …
• Posnett et al. used a similar notion to measure developer activity focus and module activity focus
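The three families of biodiversity metrics named above can be computed from a list of abundance counts. The Python sketch below uses the textbook forms, assuming natural logarithms: Shannon entropy H = −Σ pᵢ ln pᵢ, the Gini-Simpson variant of the Simpson index 1 − Σ pᵢ² (other Simpson variants exist), and Pielou evenness J = H / ln S. The counts are hypothetical. The same functions could be applied to a developer's commits per module, but that is only an illustration of the intuition; it is not Posnett et al.’s DAF/MAF, which are built on cross-entropy and Kullback-Leibler divergence.

```python
import math
from typing import List, Sequence

def proportions(counts: Sequence[int]) -> List[float]:
    """Relative abundances p_i (zero counts ignored; assumes a positive total)."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(counts: Sequence[int]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in proportions(counts))

def gini_simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance that two random individuals differ."""
    return 1.0 - sum(p * p for p in proportions(counts))

def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou evenness J = H / ln(S), with S the number of observed species."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return 0.0  # undefined for a single species; reported as 0 here by convention
    return shannon(counts) / math.log(s)

# Hypothetical counts: four species (or four modules a developer commits to).
even = [25, 25, 25, 25]     # no species dominates
dominated = [97, 1, 1, 1]   # one species dominates

for name, counts in [("even", even), ("dominated", dominated)]:
    print(name, round(shannon(counts), 3), round(gini_simpson(counts), 3),
          round(pielou_evenness(counts), 3))
```

All three values are highest for the even community and drop sharply when one species dominates, matching the two bullets on maximum and low diversity.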

Dual Ecological Measures of Focus in Software Development

Daryl Posnett†, Raissa D’Souza∗, Premkumar Devanbu† and Vladimir Filkov†
†∗University of California Davis, USA

†{dpposnett,ptdevanbu,vfilkov}@ucdavis.edu

Abstract: Work practices vary among software developers. Some are highly focused on a few artifacts; others make wide-ranging contributions. Similarly, some artifacts are mostly authored, or “owned”, by one or few developers; others have very wide ownership. Focus and ownership are related but different phenomena, both with strong effect on software quality. Prior studies have mostly targeted ownership; the measures of ownership used have generally been based on either simple counts, information-theoretic views of ownership, or social-network views of contribution patterns. We argue for a more general conceptual view that unifies developer focus and artifact ownership. We analogize the developer-artifact contribution network to a predator-prey food web, and draw upon ideas from ecology to produce a novel, and conceptually unified view of measuring focus and ownership. These measures relate to both cross-entropy and Kullback-Leibler divergence, and simultaneously provide two normalized measures of focus from both the developer and artifact perspectives. We argue that these measures are theoretically well-founded, and yield novel predictive, conceptual, and actionable value in software projects. We find that more focused developers introduce fewer defects than defocused developers. In contrast, files that receive narrowly focused activity are more likely to contain defects than other files.

I. INTRODUCTION

Developers are the lifeblood of open source software, OSS, and their contributions are vital for OSS to thrive. Rather than being assigned tasks by management, OSS developers are generally free to choose the style, focus, and breadth of their contributions. Some might be quite focused, working on one specific subsystem; others may contribute to many different subsystems. A device driver expert, for example, may contribute very specialized knowledge to an open source project, focusing on only a few files or packages. His contributions to a small subset of modules¹ may be his only contribution during his tenure with the project. In contrast, a project leader may work on a variety of different tasks touching many modules within a project. While OSS developers are free to choose their contribution styles, such choices are not inconsequential, especially to the central issue of software quality.

A dominant theme emerging from previous work in this area is module ownership [1], [2], [3]. Low ownership of a module, i.e., too many contributors, can adversely impact code quality. There is, however, an entirely different perspective, developer’s attention focus, which is relatively unexplored. Human attention and cognition are finite resources [4]. When different tasks are simultaneously engaged, they can compete for mental resources and task performance can suffer [5]. A developer engaged in many different tasks carries a greater cognitive burden than a more focused developer. Interestingly, the developer and module perspectives are, conceptually, symmetric, dualistic views of focus. From a module’s perspective, strong ownership indicates a strong focused contribution. We refer to this as module activity focus, or MAF, a measure of how focused the activities are on a module. Symmetrically, we refer to the developer’s attention focus, or DAF, a measure of how focused the activities are of a particular developer.

¹We use modules to mean either packages or files, depending on the context.

A surprising, but natural analogy for MAF and DAF is predator-prey food webs from ecology. In a sense, modules are predators that “feed upon” the cognitive resources of developers. As the number of developers contributing to a module increases, the diversity of cognitive resources upon which the module “feeds” also increases; likewise, a developer is a “prey” whose limited cognitive resources are spread over the modules that “prey” upon her.

Ecosystem diversity is of great interest to ecologists. Williams and Martinez call the roles complexity and diversity play “[o]ne of the most important and least settled questions in ecology.” [6] This diversity has two symmetric perspectives, both from a prey’s perspective, and a predator’s perspective. Ecologists have developed sophisticated symmetric measures of predator-prey relationships, drawing upon ideas such as entropy and Kullback-Leibler divergence, that simultaneously capture both perspectives. We adapt these measures for software engineering projects into the metrics MAF and DAF.

In this work, we employ the methodology presented by El Emam to validate our measures [7]. In particular, we show that the DAF and MAF measures succeed in distinguishing important cases that extant measures don’t capture. We make the following contributions:

• We adapt terminology and motivation from ecology, based on bipartite graphs;
• We incorporate and generalize previous results on developer and artifact diversity;
• We provide easy to compute measures of focus, MAF and DAF, normalized to facilitate comparison within and across projects;
• We show these measures more precisely capture outcomes relevant to software researchers and practitioners.

This novel analysis simultaneously considers focus both from the artifact perspective and the author perspective. Researchers can use our MAF and DAF metrics to more

978-1-4673-3074-9/13/$31.00 © 2013 IEEE

ICSE 2013, San Francisco, CA, USA

Page 18

Desirable ecosystem characteristics

• Stability
  – the capacity to maintain an equilibrium over longer periods of time
• Resistance
  – the ability to withstand environmental changes without too much disturbance of its biological communities
• Resilience
  – the ability to return to an equilibrium after a disturbance

Goal: Use these and related measures to study maintainability and survivability of software projects within their ecosystem


Page 19

Ongoing Research: 2 case studies

• CRAN (Comprehensive R Archive Network)
  – Characteristics: 15 years, > 5000 packages, > 2500 contributors, different OS flavours (Linux, Windows, MacOS, Solaris), superlinear package growth
  – Goal
    • Study package dependencies and maintainability (number of errors and time to fix) and their effect on package survivability
    • See our CSMR-WCRE 2014 ERA paper “On the maintainability of CRAN packages”
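One simple way to check a claim of superlinear growth is to fit a power law n(t) ≈ a·t^b by ordinary least squares on log-transformed data and see whether the exponent b exceeds 1. This is our illustration of a generic technique, not necessarily the method behind the CRAN observation above. The series below is synthetic, generated from a known law; it is not CRAN data.

```python
import math
from typing import Sequence, Tuple

def fit_power_law(t: Sequence[float], n: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log n = log a + b * log t; returns (a, b).

    All t and n values must be strictly positive (logarithms are taken).
    """
    xs = [math.log(v) for v in t]
    ys = [math.log(v) for v in n]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope in log-log space = growth exponent
    a = math.exp(my - b * mx)     # intercept mapped back to the original scale
    return a, b

# Synthetic series generated from n = 20 * t**1.5 (hypothetical, not real CRAN data).
years = [1, 2, 4, 8, 15]
packages = [20 * y ** 1.5 for y in years]
a, b = fit_power_law(years, packages)
print(f"a = {a:.1f}, b = {b:.2f}, superlinear: {b > 1}")
```

On real data the fit would be noisy, so the estimated exponent would need a confidence interval, and an exponential model is a plausible competing explanation worth comparing.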

Page 20

Ongoing Research: 2 case studies

• GNOME
  – Characteristics: 16 years, > 1400 projects, > 5800 contributors, > 1.3M commits, > 12M file touches
  – Goals
    1. Combine different ecosystem measures into a predictive model of project survivability
    2. Study migration patterns of contributors and their effect on project survivability

Page 21

Ongoing Research: GNOME case study 1

Combine different ecosystem measures into a predictive model of project survivability
– Replicate and generalise the empirical study by Uzma Raja


Defining and Evaluating a Measure of Open Source Project Survivability

Uzma Raja, Member, IEEE Computer Society, and Marietta J. Tretter

Abstract: In this paper, we define and validate a new multidimensional measure of Open Source Software (OSS) project survivability, called Project Viability. Project viability has three dimensions: vigor, resilience, and organization. We define each of these dimensions and formulate an index called the Viability Index (VI) to combine all three dimensions. Archival data of projects hosted at SourceForge.net are used for the empirical validation of the measure. An Analysis Sample (n = 136) is used to assign weights to each dimension of project viability and to determine a suitable cut-off point for VI. Cross-validation of the measure is performed on a hold-out Validation Sample (n = 96). We demonstrate that project viability is a robust and valid measure of OSS project survivability that can be used to predict the failure or survival of an OSS project accurately. It is a tangible measure that can be used by organizations to compare various OSS projects and to make informed decisions regarding investment in the OSS domain.

Index Terms: Evaluation framework, external validity, open source software, project evaluation, software measurement, software survivability.


1 INTRODUCTION

Open Source Software (OSS) projects are developed and distributed for free, with full access to the project source code. Recently there has been a significant increase in the use of these projects. Some OSS projects have earned themselves a high reputation and corporate sponsorships. Large corporations (e.g., IBM, SUN microsystems) are becoming involved with the OSS movement in various capacities. Projections indicate that the corporate interest in OSS projects will grow stronger in the future [1] and these projects will see integration in enterprise architecture [2]. This increased use of OSS projects creates the need for better project evaluation measures.

Traditionally, software projects are evaluated by conformance to budget, schedule, and user requirements [3], [4], [5], [6], [7], [8]. These measures, however, are difficult to map to OSS projects, which are developed through a network of volunteer participants, with no defined budget, schedule, or customer. Although there is a surge in the investment in OSS projects [1], research indicates that a large number of OSS projects fail [9], [10]. Some have questioned the operational reliability and quality of OSS projects [11]. Since there are no contractual or legal bindings for providing OSS updates or maintenance services, businesses investing human or financial capital on adoption of OSS projects need the ability to evaluate whether the project will continue to exist or not [12]. Development teams need to measure project survivability to control and improve performance. Individual and corporate users need a measure of project survivability to compare the available OSS projects before making decisions regarding project adoption.

In this paper, we define and validate a new multidimensional measure of OSS project survivability, called Project Viability. OSS projects provide access to their development archives, thereby providing a unique opportunity to conduct empirical research [13] and develop reliable measures [14], [15]. In the following sections, we define, formulate, and validate project viability. Section 2 provides a brief overview of the existing empirical research in OSS and the background of project survivability. Section 3 defines the dimensions of project viability and formulates an index to measure it. Section 4 discusses the empirical evaluation framework and validates the new measure using OSS project data. Discussion of the results is presented in Section 5 and conclusions are given in Section 6.

2 BACKGROUND

A large number of OSS projects are available for use. However, the failure rate of these projects is high [9]. The evaluation of OSS projects is different than Commercial Software Systems (CSS) [16]. The adopters of OSS projects need a mechanism to compare the chances of failure or survival of the available projects. This would allow better decisions regarding corporate resource investment.

A range of measures has been used in prior research to evaluate OSS projects. Godfrey and Tu [17] examined the evolution of the Linux kernel and its growth pattern in one of the first empirical studies in the OSS domain. They used the Source Lines of Code (SLOC) to compare the growth pattern of Linux to CSS projects and found evidence that OSS growth rates are significantly high compared to CSS projects. Paulson et al. [18] compared OSS and CSS projects using a diverse sample of OSS projects and found no

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 38, NO. 1, JANUARY/FEBRUARY 2012 163

U. Raja is with the Department of Information Systems, Statistics and Management Science, The University of Alabama, Box #870226, 300 Campus Drive, Tuscaloosa, AL 35487.

M.J. Tretter is with the Department of Information and Operations Management, Texas A&M University, Mail Stop #310D, Wehner Building, College Station, TX 77840.

Manuscript received 30 Oct. 2009; revised 14 June 2010; accepted 21 Aug. 2010; published online 1 Apr. 2011. Recommended for acceptance by R. Jeffery. Digital Object Identifier no. 10.1109/TSE.2011.39.

0098-5589/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society

Page 22

Ongoing Research: GNOME case study 2

Study migration patterns of contributors and their effect on project survivability

Page 23

Ongoing Research: GNOME case study 2

[Figure panels: Evolution, GIMP, GTK+; x-axis: Time (1997-2013); y-axis: Joiners]

28 Tom Mens, Maelick Claes, Philippe Grosjean and Alexander Serebrenik

project that were not active in this project during the preceding 6-month period, but that were involved in some activity in other GNOME projects instead. Global joiners are incoming coders in the considered project that were not active in any of the GNOME projects during the preceding period. A similar definition holds for the local and global leavers. Formally, the metrics are defined as follows. Let p be a GNOME project, t a 6-month activity period (and t − 1 the previous period), c a coder, Gnome the set of GNOME’s code projects, and isDev(c, t, p) a predicate which is true if and only if c made a code commit in p during t:

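$$\mathit{localLeavers}(p,t) = \{c \mid \mathit{isDev}(c,t-1,p) \land \lnot\mathit{isDev}(c,t,p) \land \exists p_2\,(p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t,p_2))\}$$

$$\mathit{globalLeavers}(p,t) = \{c \mid \mathit{isDev}(c,t-1,p) \land \forall p_2\,(p_2 \in \mathit{Gnome} \Rightarrow \lnot\mathit{isDev}(c,t,p_2))\}$$

$$\mathit{localJoiners}(p,t) = \{c \mid \mathit{isDev}(c,t,p) \land \lnot\mathit{isDev}(c,t-1,p) \land \exists p_2\,(p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t-1,p_2))\}$$

$$\mathit{globalJoiners}(p,t) = \{c \mid \mathit{isDev}(c,t,p) \land \forall p_2\,(p_2 \in \mathit{Gnome} \Rightarrow \lnot\mathit{isDev}(c,t-1,p_2))\}$$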
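The four set definitions translate directly into set operations. The Python sketch below assumes a hypothetical in-memory representation, since the excerpt does not prescribe one: `activity[t][p]` is the set of coders who committed to project p during 6-month period t, and Gnome is taken to be the set of projects present in that mapping. The coder names and projects in the example are invented.

```python
from typing import Dict, Set

# Hypothetical representation: activity[t][p] is the set of coders who made a
# code commit in project p during 6-month period t (t is an integer index).
Activity = Dict[int, Dict[str, Set[str]]]

def devs(activity: Activity, t: int, p: str) -> Set[str]:
    """All coders c for which isDev(c, t, p) holds."""
    return activity.get(t, {}).get(p, set())

def active_in_gnome(activity: Activity, t: int) -> Set[str]:
    """Coders with a commit in at least one GNOME project during period t."""
    coders: Set[str] = set()
    for project_coders in activity.get(t, {}).values():
        coders |= project_coders
    return coders

def local_leavers(activity: Activity, p: str, t: int) -> Set[str]:
    left = devs(activity, t - 1, p) - devs(activity, t, p)
    return left & active_in_gnome(activity, t)        # still active elsewhere in GNOME

def global_leavers(activity: Activity, p: str, t: int) -> Set[str]:
    return devs(activity, t - 1, p) - active_in_gnome(activity, t)

def local_joiners(activity: Activity, p: str, t: int) -> Set[str]:
    joined = devs(activity, t, p) - devs(activity, t - 1, p)
    return joined & active_in_gnome(activity, t - 1)  # came from another GNOME project

def global_joiners(activity: Activity, p: str, t: int) -> Set[str]:
    return devs(activity, t, p) - active_in_gnome(activity, t - 1)

activity: Activity = {
    0: {"gtk+": {"alice", "bob"}, "gimp": {"carol"}},
    1: {"gtk+": {"alice", "carol", "dave"}, "gimp": {"bob"}},
}
for name, f in [("localLeavers", local_leavers), ("globalLeavers", global_leavers),
                ("localJoiners", local_joiners), ("globalJoiners", global_joiners)]:
    print(name, sorted(f(activity, "gtk+", 1)))
```

In this toy history, bob moves from gtk+ to gimp (a local leaver of gtk+), carol arrives in gtk+ from gimp (a local joiner), and dave is new to GNOME altogether (a global joiner). The universal quantifier in the global definitions becomes a set difference with everyone active anywhere in GNOME.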

[Figure panels: evolution, gtk+, gimp; x-axis: Time (1997-2013); y-axis: Joiners]

Fig. 1.11 Historical evolution (timeline) of the number of local (black solid) and global (red dashed) joiners (y-axis) for three GNOME projects.

We did not find any general trend; the patterns of intake and loss of coders are highly project-specific. Figure 1.11 illustrates the evolution of the number of local and global joiners for some of the more important GNOME projects (the figures for leavers are very similar). For some projects (e.g., evolution) we do not observe a big difference between the number of local and global joiners, respectively. These projects seem to attract new developers both from within and outside of GNOME. Other projects, like gimp, attract most of their incoming developers from outside GNOME. A third category of projects attracts most of its incoming developers from other GNOME projects. This is the case for gtk+, glib and libgnome, which can be considered as belonging to the core of GNOME. This observation seems to suggest that libraries, toolkits and auxiliary projects attract more inside developers, while projects that are well-known to the outside world (such as GIMP, a popular

Timeline (6-month intervals) of joiners to Gnome projects

- Black = local joiners from other Gnome projects
- Red = global joiners from outside of Gnome
- Blue = stayers


Page 24

Migration in software ecosystems: Gnome case study

[Figure panels: Evolution, GIMP, GTK+; x-axis: Time (1997-2013); y-axis: Leavers]

- Black = local joiners from other Gnome projects
- Red = global joiners from outside of Gnome
- Blue = stayers


Timeline (6-month intervals) of leavers from Gnome projects

Page 25

Some references

To appear in 2013 in Springer’s Empirical Software Engineering journal – manuscript No.(will be inserted by the editor)

On the variation and specialisation of workload – Acase study of the Gnome ecosystem community

Bogdan Vasilescu · Alexander Serebrenik ·Mathieu Goeminne · Tom Mens

DOI: 10.1007/s10664-013-9244-1

Abstract Most empirical studies of open source software repositories focus on theanalysis of isolated projects, or restrict themselves to the study of the relation-ships between technical artifacts. In contrast, we have carried out a case study thatfocuses on the actual contributors to software ecosystems, being collections of soft-ware projects that are maintained by the same community. To this aim, we defineda new series of workload and involvement metrics, as well as a novel approach—eT-graphs—for reporting the results of comparing multiple distributions. We usedthese techniques to statistically study how workload and involvement of ecosys-tem contributors varies across projects and across activity types, and we exploredto which extent projects and contributors specialise in particular activity types.Using Gnome as a case study we observed that, next to coding, the activities of lo-calization, development documentation and building are prevalent throughout theecosystem. We also observed notable di↵erences between frequent and occasionalcontributors in terms of the activity types they are involved in and the numberof projects they contribute to. Occasional contributors and contributors that areinvolved in many di↵erent projects tend to be more involved in the localization ac-tivity, while frequent contributors tend to be more involved in the coding activityin a limited number of projects.

Keywords: open source · software ecosystem · metrics · developer community · case study

B. Vasilescu and A. Serebrenik, MDSE, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, The Netherlands. Tel.: +31-40-2473595, Fax: +31-40-2475404, E-mail: {b.n.vasilescu | a.serebrenik}@tue.nl

M. Goeminne and T. Mens, COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgium. Tel.: +32-65-373453, Fax: +32-65-373459, E-mail: {mathieu.goeminne | tom.mens}@umons.ac.be

UMONS, Faculté des Sciences

Département d’Informatique

Understanding the Evolution of Socio-technical Aspects in Open Source Ecosystems: An Empirical Analysis of GNOME

Mathieu Goeminne

A dissertation submitted in fulfillment of the requirements of the degree of Docteur en Sciences

Advisor: Dr. Tom Mens, Université de Mons, Belgium

Jury:

Dr. Xavier Blanc, Université de Bordeaux 1, France

Dr. Véronique Bruyère, Université de Mons, Belgium

Dr. Jesus M. Gonzalez-Barahona, Universidad Rey Juan Carlos, Spain

Dr. Tom Mens, Université de Mons, Belgium

Dr. Alexander Serebrenik, Technische Universiteit Eindhoven, The Netherlands

Dr. Jef Wijsen, Université de Mons, Belgium

June 2013

A historical dataset for GNOME contributors

Mathieu Goeminne, Maelick Claes and Tom Mens

Software Engineering Lab, COMPLEXYS research institute, UMONS, Belgium

Abstract: We present a dataset of the open source software ecosystem GNOME from a social point of view. We have collected historical data about the contributors to all GNOME projects stored on git.gnome.org, taking into account the problem of identity matching, and associating different activity types with the contributors. This type of information is very useful to complement the traditional, source-code-related information one can obtain by mining and analyzing the actual source code. The dataset can be obtained at https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge.
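The abstract mentions the problem of identity matching: the same person may appear in commit logs under several name and e-mail combinations. As a minimal sketch, assuming a deliberately simple heuristic that is not the procedure actually used to build this dataset, aliases can be merged with a union-find structure whenever they share a case-insensitive e-mail address or a normalised full name. All function names below are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest used to merge aliases of the same contributor."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to keep trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def normalise_name(name):
    # Case-fold and collapse whitespace: "Jane  Doe" and "jane doe" compare equal.
    return " ".join(name.lower().split())


def match_identities(authors):
    """Map each (name, email) pair to an integer identity id.

    Hypothetical heuristic: two pairs denote the same person if they share a
    lower-cased e-mail address or a normalised full name (transitively).
    """
    authors = list(authors)  # iterated twice below
    uf = UnionFind()
    for name, email in authors:
        alias = ("alias", name, email)
        uf.union(alias, ("email", email.lower()))
        uf.union(alias, ("name", normalise_name(name)))

    ids, result = {}, {}
    for name, email in authors:
        root = uf.find(("alias", name, email))
        # Number identities in order of first appearance.
        result[(name, email)] = ids.setdefault(root, len(ids))
    return result
```

Merging on full names alone can wrongly join distinct people who share a common name, so a real pipeline would likely need stricter rules or manual review.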

I. INTRODUCTION

The historical and empirical study of open source software (OSS) ecosystems is a relatively recent but fast-growing research domain. An important characteristic of such ecosystems, at least according to our definition [15], is the fact that they are made up of a set of software projects sharing a community of users and contributors. A well-known example is GNOME. Its constituent software projects are designed to work together in order to constitute a complete software desktop environment. The GNOME projects are developed by a developer community that is spread across the world. We have observed that it is not uncommon for a contributor to be actively involved in many projects at a time [16]. In addition to this, the type of activity a contributor is involved in may differ from one person to another. For example, a very important activity involves internationalization (localization and translation), which is globally managed via the web application Damned Lies¹ for all GNOME translation teams.

Many tools and datasets have been proposed to analyse a software project’s history, but few are available at the level of the ecosystem because of the additional level of difficulty involved. It does not suffice to simply consider the union of all project histories belonging to the same ecosystem. Because some projects may have contributors in common, and some contributors may be involved in different projects over time, this information needs to be explicitly represented at the ecosystem level. The same is true for the types of activity of an ecosystem’s contributor, and how this varies over time and over the different projects he is involved in.
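The paragraph above argues that an ecosystem-level view must record which contributors are shared between projects, which a plain union of per-project histories does not expose directly. A minimal sketch, assuming commit records have already been reduced to (project, contributor id) pairs with identities merged; the function names are hypothetical and the project names in any example data are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def contributor_projects(commits):
    """Map each contributor id to the set of projects they committed to.

    `commits` is an iterable of (project, contributor_id) pairs whose
    identities are assumed to be merged already.
    """
    projects_of = defaultdict(set)
    for project, contributor in commits:
        projects_of[contributor].add(project)
    return dict(projects_of)


def shared_contributors(projects_of):
    """Count, for each unordered pair of projects, the contributors active in both."""
    shared = defaultdict(int)
    for projects in projects_of.values():
        # Sorting makes each pair key canonical: ("a", "b"), never ("b", "a").
        for a, b in combinations(sorted(projects), 2):
            shared[(a, b)] += 1
    return dict(shared)
```

Enumerating project pairs per contributor grows quadratically with the number of projects a person touches; that is fine for a sketch, though a full ecosystem analysis might prefer a sparse-matrix formulation.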

¹ http://l10n.gnome.org

In this paper, we present the process we have used to create a dataset containing the historical information related to contributors to the GNOME ecosystem. Our database and the tools and scripts used to create it can be found on a dedicated Bitbucket repository².

In contrast to many other datasets, we do not focus on source code, since a significant number of files committed to GNOME’s project repositories do not even contain code (e.g., image files, web pages, documentation, localization and many more). This type of information is often ignored in MSR research, while it is very relevant to understand which types of activities contributors are involved in. For GNOME we observed, for example, that a significant fraction of the community is working on internationalization instead of code [16].
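Since many committed files are not code, one way to attribute activity types is to classify each file path. The sketch below is a hypothetical heuristic based on file extension and directory name; the rule tables and category labels are assumptions for illustration, not the classification actually used for this dataset.

```python
import posixpath
from collections import Counter

# Hypothetical rule tables; a real classifier would need many more patterns.
LOCALIZATION_EXTS = {".po", ".pot"}
DOC_DIRS = {"doc", "docs", "help"}
EXTENSION_ACTIVITY = {
    ".c": "coding", ".h": "coding", ".py": "coding", ".js": "coding", ".vala": "coding",
    ".png": "image", ".svg": "image", ".jpg": "image",
    ".html": "web", ".css": "web",
}


def activity_type(path):
    """Map one committed file path to a coarse activity type (first rule wins)."""
    directory, filename = posixpath.split(path.lower())
    ext = posixpath.splitext(filename)[1]
    if ext in LOCALIZATION_EXTS:
        return "localization"
    if filename.startswith(("makefile", "configure")) or filename in ("meson.build", "cmakelists.txt"):
        return "build"
    if DOC_DIRS & set(directory.split("/")):
        return "documentation"
    return EXTENSION_ACTIVITY.get(ext, "unknown")


def activity_profile(paths):
    """Count activity types over the files touched by one contributor or commit."""
    return Counter(activity_type(p) for p in paths)
```

Localization is checked first so that translated help files (e.g. a `.po` file under `help/`) count as localization rather than documentation; this ordering is a design choice of the sketch.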

II. MOTIVATION

An important motivation for creating a historical dataset for analysing contributors to the GNOME ecosystem came from the many OSS repository mining studies that have used GNOME as a case study [2], [13]. In 2009 and 2010, GNOME was part of the MSR Mining Challenge, which led to many contributions [1], [5], [8], [9], [11], [12], [14].

Of specific interest, in the context of software ecosystem research, are the social interactions in the community of contributors. Following a holistic approach, [7] estimated effort and studied developer co-operation and co-ordination in GNOME, based on the version control repositories and mailing lists. Similarly, [4] developed an advanced measure of individual developer contribution based on the source code repository, mailing lists and bug tracking systems, and applied the measure to a number of GNOME projects. [6] studied six GNOME projects in order to understand how contributors join, socialize and develop within GNOME. [10] studied relations between the GNOME contributors by means of social network analysis.

In our own previous work [15], [16] we used the dataset presented in this article to statistically analyse the specialization of workload and involvement of GNOME contributors across projects and activity types, and we

² https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge

@ MSR 2013

Page 26: ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)


References


Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.), 2014, XXIII, 404 p., Springer, ISBN 978-3-642-45398-4

Chapter 10: Studying Evolving Software Ecosystems based on Ecological Models

Tom Mens, Maelick Claes, Philippe Grosjean and Alexander Serebrenik

Research on software evolution is very active, but evolutionary principles, models and theories that properly explain why and how software systems evolve over time are still lacking. Similarly, more empirical research is needed to understand how different software projects co-exist and co-evolve, and how contributors collaborate within their encompassing software ecosystem.

In this chapter, we explore the differences and analogies between natural ecosystems and biological evolution on the one hand, and software ecosystems and software evolution on the other hand. The aim is to learn from research in ecology to advance the understanding of evolving software ecosystems. Ultimately, we wish to use such knowledge to derive diagnostic tools aiming to analyse and optimise the fitness of software projects in their environment, and to help software project communities manage their projects better.

Tom Mens, Maelick Claes and Philippe Grosjean, COMPLEXYS Research Institute, University of Mons, Belgium

Alexander Serebrenik, Eindhoven University of Technology, The Netherlands. This work has been partially supported by F.R.S-F.N.R.S. research grant BSS-2012/V 6/5/015 (for the author’s stay at the Université de Mons) and by ARC research project AUWB-12/17-UMONS-3, “Ecological Studies of Open Source Software Ecosystems”, financed by the Ministère de la Communauté française - Direction générale de l’Enseignement non obligatoire et de la Recherche scientifique, Belgium.


Page 27: ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)


Interested in joining?

• Open PhD position available

• 6- to 12-month postdoc visits welcomed
