Software Evolution anno 2014: directions and challenges Alexander Serebrenik @aserebrenik
May 26, 2015
2008
Time for a new book!
2014
2008 vs. 2014
From systems to ecosystems
Business-oriented view
“a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.”
with thanks to International Data Corporation (IDC)
Development-centric view
a collection of software projects that are developed and evolve together in the same environment
with thanks to Bram Adams
Socio-technical view
a community of persons (end-users, developers, debuggers, …) contributing to a collection of projects
Technical
Scientific
Practical
Legal and ethical
Technical challenges
• eliminate non-names
• eliminate specific quirks
• group “similar” names
  – first/last name
  – textual similarity
  – latent semantic analysis
• (correct groups manually)
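The merging steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop list `NON_NAMES`, the 0.8 similarity threshold, and the helper names are hypothetical, and textual similarity is approximated with the standard library's `difflib` rather than latent semantic analysis.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stop list of aliases that are not person names.
NON_NAMES = {"root", "admin", "unknown", "nobody"}


def normalize(alias):
    """Lowercase, drop an e-mail domain, and replace non-letters with spaces."""
    local = alias.split("@")[0]
    cleaned = re.sub(r"[^a-z]+", " ", local.lower())
    return " ".join(cleaned.split())


def similar(a, b, threshold=0.8):
    """Match on swapped first/last name, else on textual similarity."""
    if sorted(a.split()) == sorted(b.split()):
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


def group_aliases(aliases, threshold=0.8):
    """Greedily put each alias into the first group holding a similar name."""
    groups = []
    for alias in aliases:
        name = normalize(alias)
        if not name or name in NON_NAMES:
            continue  # eliminate non-names
        for group in groups:
            if any(similar(name, normalize(m), threshold) for m in group):
                group.append(alias)
                break
        else:
            groups.append([alias])
    return groups


groups = group_aliases(
    ["John Smith", "smith, john", "john.smith@example.org", "root", "Alice Jones"]
)
```

The groups produced this way would still need the manual correction pass from the last bullet, since a fixed threshold both over-merges and under-merges.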
Technical challenges
Structured data (2008)
Unstructured data (2014)
Scientific challenges
Raw data   Processed data set   Tools & scripts   #MSR papers 2004-2009
Y          Y                    Y                 2
Y          Y                    N                 2
Y          P                    Y                 1
Y          P                    P                 2
Y          P                    N                 2
Y          N                    Y                 16
Y          N                    P                 19
Y          N                    N                 64
P          N                    Y                 1
P          N                    N                 2
N          Y                    N                 2
N          P                    N                 1
N          N                    Y                 7
N          N                    P                 2
N          N                    N                 31
N/A        N/A                  N/A               17
We share raw data but rarely share tools – reinventing the wheel anybody?
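The claim can be checked by tallying the table. The sketch below assumes the column order raw data, processed data set, tools & scripts, and counts only rows marked "Y" as shared; both readings are assumptions about the slide, not something it states.

```python
# Rows of the table: (raw data, processed data set, tools & scripts, #papers).
ROWS = [
    ("Y", "Y", "Y", 2), ("Y", "Y", "N", 2), ("Y", "P", "Y", 1),
    ("Y", "P", "P", 2), ("Y", "P", "N", 2), ("Y", "N", "Y", 16),
    ("Y", "N", "P", 19), ("Y", "N", "N", 64), ("P", "N", "Y", 1),
    ("P", "N", "N", 2), ("N", "Y", "N", 2), ("N", "P", "N", 1),
    ("N", "N", "Y", 7), ("N", "N", "P", 2), ("N", "N", "N", 31),
    ("N/A", "N/A", "N/A", 17),
]

RAW, PROCESSED, TOOLS = 0, 1, 2  # column indices (assumed order)


def papers_sharing(rows, column):
    """Number of papers whose entry in the given column is 'Y'."""
    return sum(row[3] for row in rows if row[column] == "Y")


total = sum(row[3] for row in ROWS)
raw_shared = papers_sharing(ROWS, RAW)
tools_shared = papers_sharing(ROWS, TOOLS)
```

Under these assumptions, 108 of 171 papers share raw data while 27 share their tools and scripts.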
Practical challenges
• How can we share our big data with other researchers?
• Different formats, different tools, storage problems, …
• How can we make our research results useful to practitioners and development communities?
• How can we build tools and dashboards that integrate our findings?
Legal and ethical challenges
(especially for survey data)
http://www.intracto.com/blog/online-privacy-belangrijk
k-anonymity
l-diversity
t-closeness
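As a rough illustration of the first notion: a data set is k-anonymous when every combination of quasi-identifier values occurs in at least k records. The sketch below checks that property on made-up survey records; the field names and values are hypothetical, and l-diversity and t-closeness are not covered.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each quasi-identifier combination appears in at least k records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in counts.values())


# Hypothetical survey responses with generalized age and country.
survey = [
    {"age": "20-29", "country": "NL", "answer": "yes"},
    {"age": "20-29", "country": "NL", "answer": "no"},
    {"age": "30-39", "country": "BE", "answer": "yes"},
    {"age": "30-39", "country": "BE", "answer": "yes"},
]

two_anonymous = is_k_anonymous(survey, ["age", "country"], 2)
three_anonymous = is_k_anonymous(survey, ["age", "country"], 3)
```

Here every (age, country) pair occurs twice, so the data is 2-anonymous but not 3-anonymous.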
2008 vs. 2014
From “traditional” to “non-traditional” artifacts:
What is software?
http://ctms.engin.umich.edu/CTMS/index.php?example=Introduction&section=SimulinkModeling
Maintainability??? Evolution???
BumbleBee: a refactoring tool for spreadsheets
with thanks to Felienne Hermans
http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse.m2m.atl.doc%2Fguide%2Fconcepts%2FModel-Transformation.html
• describe evolutionary steps
• relate to changes of other artifacts
• describe prevalence in practice
• support automation
New kind of verification
artifacts
2008
2009
2012
2013
2008 vs. 2014
From technical to socio-technical perspective:
Who are these people?
What do they do?
> 90% in WordPress & Drupal
> 95% in FLOSS surveys
> 87% in GNOME
> 70% in software-related jobs (NSF)
MEN
FLOSS 2013
Europe, US, CA, AU
Brazil/Argentina
How can we reliably and efficiently identify gender, age, location?
Technical challenges
?
Name + Location = Gender
Lonzo ⇒ Alonzo
w35l3y ⇒ wesley
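A minimal sketch of the idea: normalize the alias (undo leetspeak, expand diminutives) and then look the first name up per country, because the same name can indicate a different gender in different places. All three tables below are tiny hypothetical stand-ins for real name lists, and `infer_gender` is an illustrative name, not the API of any existing tool.

```python
# Hypothetical leetspeak substitutions: digit -> letter.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

# Hypothetical diminutive -> full first name.
DIMINUTIVES = {"lonzo": "alonzo"}

# Hypothetical (first name, country code) -> gender.
GENDER_BY_NAME_AND_COUNTRY = {
    ("alonzo", "US"): "M",
    ("wesley", "US"): "M",
    ("andrea", "IT"): "M",
    ("andrea", "DE"): "F",
}


def normalize_name(alias):
    """Lowercase, undo leetspeak, and expand a known diminutive."""
    name = alias.lower().translate(LEET)
    return DIMINUTIVES.get(name, name)


def infer_gender(alias, country):
    """Return 'M', 'F', or '?' when the name/country pair is unknown."""
    return GENDER_BY_NAME_AND_COUNTRY.get((normalize_name(alias), country), "?")
```

For example, `infer_gender("w35l3y", "US")` resolves through `wesley`, while `infer_gender("Andrea", "IT")` and `infer_gender("Andrea", "DE")` give different answers in this toy table.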
<title>Ben Kamens</title>…<h1>We’re willing to be embarrassed about what we <em>haven’t</em> done…</h1>
Heuristics: title + first h1
Ben Kamens We’re willing to be embarrassed about what we haven’t done…
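The "title + first h1" heuristic can be sketched with Python's standard `html.parser`. The class and function names are hypothetical, and the sketch only produces the text that would then be handed to a named-entity tagger; it does not call one.

```python
from html.parser import HTMLParser


class TitleAndFirstH1(HTMLParser):
    """Collect the text of <title> and of the first <h1>, ignoring inner tags."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.h1_parts = []
        self._in_title = False
        self._in_h1 = False
        self._h1_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "h1" and not self._h1_done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True  # later <h1> elements are skipped

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_h1:
            self.h1_parts.append(data)


def title_plus_first_h1(html):
    """Return the page title followed by the text of the first <h1>."""
    parser = TitleAndFirstH1()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    heading = " ".join("".join(parser.h1_parts).split())
    return f"{title} {heading}".strip()
```

Nested markup such as `<em>` inside the heading is dropped, so only plain text reaches the tagger.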
<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…
Stanford Named Entity Tagger
Quality of gender resolution: Survey
Self-identification   As inferred         Total
                      M     F     ?
M                     60    3     43      106
F                     2     5     4       11
Self-identification   As inferred         Total
                      M     F     ?
M                     90    3     13      106
F                     2     9     0       11
+ avatars, other social media sites (manually)
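The two tables can be summarized as the share of respondents whose inferred gender matches their self-identification. The sketch assumes the first table is the automatic result and the second the result after the manual step; that reading and the function names are assumptions.

```python
# Rows: self-identified gender; columns: inferred gender ('?' = unresolved).
AUTOMATIC = {
    "M": {"M": 60, "F": 3, "?": 43},
    "F": {"M": 2, "F": 5, "?": 4},
}
WITH_MANUAL_STEP = {
    "M": {"M": 90, "F": 3, "?": 13},
    "F": {"M": 2, "F": 9, "?": 0},
}


def total_respondents(matrix):
    return sum(sum(row.values()) for row in matrix.values())


def share_correct(matrix):
    """Fraction of respondents whose inferred gender equals the self-reported one."""
    correct = sum(matrix[gender][gender] for gender in matrix)
    return correct / total_respondents(matrix)


def share_unresolved(matrix):
    """Fraction of respondents for whom no gender could be inferred."""
    unresolved = sum(row["?"] for row in matrix.values())
    return unresolved / total_respondents(matrix)
```

With these numbers the share of correct inferences rises from 65/117 to 99/117 and the unresolved share drops from 47/117 to 13/117.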
.cpp .po
.jpg
/test/
/library/ .doc
makefile .sql .conf
Occasional contributors
Frequent contributors
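The file patterns above suggest mapping each touched file to an activity type. A minimal sketch follows; the activity labels (code, localization, and so on) and the rule ordering are assumptions added for illustration, since the slide lists only the patterns.

```python
import posixpath

# Hypothetical activity labels for the patterns shown on the slide.
DIRECTORY_RULES = [("/test/", "test"), ("/library/", "library")]
BASENAME_RULES = {"makefile": "build"}
EXTENSION_RULES = {
    ".cpp": "code",
    ".po": "localization",
    ".jpg": "image",
    ".doc": "documentation",
    ".sql": "database",
    ".conf": "config",
}


def classify(path):
    """Map a repository path to an activity type, or 'unknown'."""
    lower = "/" + path.lower().lstrip("/")
    for fragment, activity in DIRECTORY_RULES:  # directory rules win
        if fragment in lower:
            return activity
    name = posixpath.basename(lower)
    if name in BASENAME_RULES:
        return BASENAME_RULES[name]
    extension = posixpath.splitext(name)[1]
    return EXTENSION_RULES.get(extension, "unknown")


def activity_profile(paths):
    """Count touched files per activity type for one contributor."""
    profile = {}
    for path in paths:
        activity = classify(path)
        profile[activity] = profile.get(activity, 0) + 1
    return profile
```

Comparing such profiles between occasional and frequent contributors is one way to approach the question of what people do.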
How can we reliably and efficiently identify human activities?
Technical challenges