HAL Id: tel-00528121
https://pastel.archives-ouvertes.fr/tel-00528121
Submitted on 21 Oct 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Tapping into the source: corporate involvement in open source software
Jan Eilhard

To cite this version: Jan Eilhard. Tapping into the source: corporate involvement in open source software. Economics and Finance. École Nationale Supérieure des Mines de Paris, 2010. English. NNT : 2010ENMP0066. tel-00528121
Doctoral school no. 396: Economics, organization, society
Doctorat ParisTech
THESIS
submitted to obtain the degree of doctor awarded by
l’École nationale supérieure des mines de Paris
Specialty: "Economics and finance"
presented and publicly defended by
Jan EILHARD
on 14 May 2010
Tapping Into the Source: Corporate involvement in open source software
Thesis supervisor: François LEVEQUE
Thesis co-supervisor: Yann MENIERE
Jury
Mr. Marc BOURREAU, Professor, SES, Paris Telecom - Paristech, Referee
Mr. Eric BROUSSEAU, Professor, Economix, Université de Paris X, Examiner
Mr. Eric STROBL, Professor, CECO, Ecole Polytechnique, Examiner
Mr. Mikko VALIMAKI, Professor, Helsinki University of Technology, Referee
Mr. François LEVEQUE, Professor, CERNA, Ecole des Mines - Paristech, Thesis supervisor
To reduce clutter, we drop the subscripts for projects and time periods in the further
presentation when the interpretation is unambiguous. We want to analyze the productivity
of the three different groups of contributors. Without any prior constraints on
the function $f(\cdot)$, we estimate the coefficients of a translog production function for
all open source projects. The translog function is an attractive flexible specification. It
has both linear and quadratic terms, accommodates more than two factor inputs,
and can be approximated by a second-order Taylor series (Christensen and Greene,
1976). Applied to our three labor inputs, the translog production function can be written
in logarithmic form as:
$$\ln Y = \beta_0 + \sum_{i} \beta_i \ln L_i + \frac{1}{2} \sum_{i} \sum_{j} \beta_{ij} \ln L_i \ln L_j + \gamma S \qquad (2.2)$$

where $L_i$ is the respective number of developers for academic, corporate or private
developer groups, $i, j \in \{a, c, p\}$, and $S$ is a spillover proxy which we will define later
in the chapter.
Observe that the translog function can be transformed into a standard Cobb-Douglas
function by imposing zero-value coefficients on the second-order terms. This would imply
that the estimated elasticity of output with respect to each input is constant by
assumption. For the sake of comparison, we also estimate this constrained specification.
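The nesting of the two specifications can be illustrated with a minimal sketch. The function names, the group labels `a`, `c`, `p` and the coefficient dictionaries below are hypothetical and only serve the illustration. The sketch requires strictly positive developer counts, since the text does not state how groups with zero developers are treated. Coefficients are attached to the six distinct second-order terms, so any factor of 1/2 is absorbed in them.

```python
import math
from itertools import combinations_with_replacement

GROUPS = ("a", "c", "p")  # academic, corporate, private (illustrative labels)


def translog_terms(developers):
    """Return the first- and second-order log terms of a translog function.

    `developers` maps each group label to a strictly positive developer count.
    The treatment of zero counts is an open assumption and is not handled here.
    """
    logs = {g: math.log(developers[g]) for g in GROUPS}
    first_order = {f"ln_{g}": logs[g] for g in GROUPS}
    # Squared terms (g == h) and interaction terms (g != h): six terms in total.
    second_order = {
        f"ln_{g}*ln_{h}": logs[g] * logs[h]
        for g, h in combinations_with_replacement(GROUPS, 2)
    }
    return first_order, second_order


def log_output(beta0, beta, beta2, developers):
    """Evaluate ln Y; an empty `beta2` gives the Cobb-Douglas special case."""
    first, second = translog_terms(developers)
    value = beta0 + sum(beta[name] * term for name, term in first.items())
    value += sum(beta2.get(name, 0.0) * term for name, term in second.items())
    return value
```

Passing an empty dictionary for `beta2` sets all second-order coefficients to zero, which is exactly the restriction that turns the translog form into the Cobb-Douglas form.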
pastel-00527647, version 1 - 19 Oct 2010
Chapter 2. A Look Inside the Forge: Developer productivity and spillovers in SourceForge projects
Besides its flexibility, another advantage of the translog function lies in the interpretation
of the second-order terms. The sign of the coefficients for the squared logarithms of
labor inputs indicates increasing or decreasing elasticity of output. We can therefore
use it as an indicator of increasing or decreasing returns with respect to each category
of programmers. Moreover, positive or negative coefficients for the interaction terms,
$\ln L_i \ln L_j$, make it possible to draw conclusions on the effect of interactions between
different categories of developers.
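The role of the second-order terms can be made explicit with a short derivation. It is a sketch under assumptions: the notation ($Y$ for output, $L_i$ for the number of developers in group $i$, $\beta$ for coefficients) is ours, and we assume a translog form with a factor of 1/2 on the double sum and symmetric coefficients $\beta_{ij} = \beta_{ji}$.

```latex
% Assumed form: \ln Y = \beta_0 + \sum_i \beta_i \ln L_i
%               + \tfrac{1}{2} \sum_i \sum_j \beta_{ij} \ln L_i \ln L_j + \dots
\begin{align*}
  \eta_i \equiv \frac{\partial \ln Y}{\partial \ln L_i}
    &= \beta_i + \sum_{j} \beta_{ij} \ln L_j ,
    \qquad i, j \in \{a, c, p\}, \\
  \frac{\partial \eta_i}{\partial \ln L_i} &= \beta_{ii} ,
  \qquad
  \frac{\partial \eta_i}{\partial \ln L_j} = \beta_{ij} \quad (j \neq i).
\end{align*}
```

Under these assumptions, a positive $\beta_{ii}$ means that the elasticity of output with respect to group $i$ rises with the size of that group, and a negative $\beta_{ij}$ means that it falls as group $j$ grows. Setting all $\beta_{ij} = 0$ gives the constant elasticities of the Cobb-Douglas case.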
Estimating a production function creates a simultaneity problem (Marschak and Andrews,
1944; Olley and Pakes, 1996). In the traditional setting, firms choose the factors of
production according to profit-maximizing, first-order conditions. Because they take
firm-specific productivity differences into account, more productive firms use more factors
of production, rendering it more difficult to establish a causal link between production
factors and output. The control variables are a first way to capture differences in productivity
between projects. These variables include dummies for changes in the development stage
of the projects and their topic of application, and several channels of knowledge spillovers
from other projects. We moreover address the simultaneity problem by using a project-level
fixed-effect estimation method. This approach eliminates major misspecifications
that are transmitted to the factor decisions and are constant over time (Griliches and
Mairesse, 1998). It does, however, require the assumption that unanticipated elements of the
error term at period $t$ do not affect factor decisions at later periods.
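The intuition behind the fixed-effect approach can be sketched with a linear within transformation. This is an illustration only: the chapter estimates a Poisson panel model, and the function name and data layout below are hypothetical. Subtracting each project's own time average removes any component that is constant over time for that project.

```python
from collections import defaultdict


def within_transform(rows):
    """Subtract each project's time average from its observations.

    `rows` is a list of (project_id, value) pairs, one pair per period.
    Any project-specific constant is removed by the demeaning.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for project, value in rows:
        totals[project] += value
        counts[project] += 1
    return [
        (project, value - totals[project] / counts[project])
        for project, value in rows
    ]
```

For example, two projects that share the same period-to-period variation but differ by a constant level yield identical demeaned series, so the level difference no longer affects the estimates.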
2.4 Data
2.4.1 Data Source
We use a balanced panel of 10,553 open source projects tracked on SourceForge from
February 2005 until June 2007 (T = 28 months). SourceForge is an internet platform
for open source projects. It helps new projects attract developers and users. SourceForge
provides the necessary tools for managing a software project, such as user fora, bandwidth
for downloads and a version control system to keep track of contributions. SourceForge
is the largest internet repository, with almost 300,000 projects; similar, but considerably
smaller, platforms are GoogleCode and Kenai.
SourceForge, as an incubator, includes a large number of small or inactive projects.
Since creating new projects is costless, many projects may be started only to be
abandoned after a short while. For this reason, Howison and Crowston (2004)
caution against using SourceForge projects to draw conclusions about open source projects
in general. In this regard our approach is especially valuable, because we only look at
projects which have already released a version of their software, thus avoiding inactive
projects altogether.
Open source projects regularly release newer versions of their software. These file releases
fix programming bugs, add new features or improve performance. Larger projects
generally announce their releases in advance; smaller projects may post updates
intermittently. There are various forms in which users can obtain these releases: compressed
source code, binary installers or text files. To have comparable observations, we look only
at compressed files, and specifically at gzip files, as these were the most common type of
compressed file (30%) in the sample. This allows us to look at projects across different
operating systems, because binary installers are often operating-system specific.
In addition to information on software projects, we obtained developer backgrounds
using their email addresses. These email addresses were related to the corresponding
websites and sorted into three categories: academic, private and corporate. We sampled ten
websites to obtain keywords for each category and counted the occurrences of each word.
We established a ranking and were thus able to attribute each email address to one of the
categories, for example ’hotmail.com’ to the private category, ’ibm.com’ to the corporate
and ’ensmp.fr’ to the academic one. In all, we checked approximately 15,000 websites. We
then retained projects for which all developer backgrounds were known, so that we could
account for all registered developers in a project, and which actually released a file update
during the observed time period. This left 10,553 projects in our sample. Lerner
et al. (2006) use a similar method in their study. Their analysis relies on top-level
domain names, i.e. ’.com’ for corporate developers or ’.edu’ for academic ones. This has
clear disadvantages compared to our method, because top-level domain names, especially
’.com’, conflate private email accounts, such as ’yahoo.com’, with corporate ones.
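The difference between the two classification rules can be sketched as follows. The lookup table contains only the example domains mentioned in the text, and the function names are hypothetical. The actual classification relied on keywords collected from the corresponding websites, which this sketch does not reproduce.

```python
# Illustrative lookup built only from the example domains given in the text.
DOMAIN_CATEGORY = {
    "hotmail.com": "private",
    "yahoo.com": "private",
    "ibm.com": "corporate",
    "ensmp.fr": "academic",
}

# Top-level-domain rule of the kind described for Lerner et al. (2006).
TLD_CATEGORY = {"com": "corporate", "edu": "academic"}


def classify_by_domain(email):
    """Return the category of the full email domain, or None if unknown."""
    domain = email.rsplit("@", 1)[-1].lower()
    return DOMAIN_CATEGORY.get(domain)


def classify_by_tld(email):
    """Return the category implied by the top-level domain alone."""
    tld = email.rsplit(".", 1)[-1].lower()
    return TLD_CATEGORY.get(tld)


def all_backgrounds_known(emails):
    """True if every developer email of a project can be classified."""
    return all(classify_by_domain(e) is not None for e in emails)
```

With these rules, an address such as `dev@yahoo.com` is classified as private by the full-domain lookup but as corporate by the top-level-domain rule, which is the confusion described above.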
Our method has two drawbacks. On the one hand, we might see a type I error: we
overestimate the number of corporate developers when contributors use their work email for
private purposes. They subscribe to SourceForge using their office email and thus pass as
corporate developers, even though their company does not directly promote open source
development. On the other hand, we might see a type II error: contributors might use
their private email addresses for subscription, although their firms do support open source
projects. There is no direct way to mitigate these issues. One possibility is to argue by
deduction: finding statistically significant differences between corporate, private and
academic developers would support the validity of our data collection method.
2.4.2 Descriptives
SourceForge offers 222 different topics for open source software. After regrouping these
categories into broader application fields, we obtain a more manageable 19 topics. Figure
2.1 shows the number of projects per topic in our data sample. We immediately see that
a large share of projects focus on four topics: software development, internet, communications
and system administration. The figure also shows the tremendous variety in topics that
we find on SourceForge. There are projects on religion - for instance an application to
calculate the Islamic prayer times - as well as scientific software and games - incidentally,
approximately 200 versions of Tetris are available on SourceForge.
Figure 2.1: Projects per Topic

The number of developers is a possible indicator of project size. Figure 2.2 shows
that our sample contains an overwhelmingly large number of small projects. We see that
almost 80% of the observed projects have only one registered developer over the entire
time period. This raises the question of whether these projects have a significantly different
mode of production than projects with more developers. To test the impact of this possible
bias, we will estimate the production function with a sub-sample of the projects with at
least two developers (n’ = 2,406). A more detailed look at the developer categories reveals
that on average each project has at least one private developer and every fifth project has
a corporate one. Also, every fifth project has an academic contributor (Table 2.1).
Projects release new updates irregularly. Figure 2.3 depicts the distribution of projects
by their maximal number of file releases within the 28 months. We see that the majority of
projects have only one release over the entire period and that there are few projects which
are very productive, having more than ten releases.
Figure 2.2: Maximal Number of Developers per Project

Table 2.1: Summary Statistics

We measure spillovers with the size of the code commons, i.e. the size of the available
knowledge base. We compute the code commons available to project $n$ as the sum of
the number of bytes of the new releases at time $t$, $S_{nt} = \sum_{m \neq n} \text{bytes}_{mt}$,
for all other projects $m$, where $m \neq n$, in a particular topic $k$. We consider topic
to be a meaningful boundary of the code commons, because it encompasses software
with similar objectives, similar problems and, likely, similar solutions. Following the
same reasoning, we also calculate the available code commons with respect to topics and
programming languages. Table 2.1 shows that the average size of the code
commons is considerable. Projects have on average 81 terabytes of - compressed - source
code available to copy and reuse. The refined knowledge base of projects with the same topic
and the same programming language still encompasses almost 10 terabytes of source code,
which is about the size of the printed collection of the U.S. Library of Congress.
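The construction of the code commons can be sketched as follows. The record layout and the function name are hypothetical, and the sketch assumes that each project belongs to a single topic. For every project and period with at least one release, it sums the bytes released in that period by all other projects of the same topic.

```python
from collections import defaultdict


def code_commons(releases):
    """Bytes released by other projects in the same topic and period.

    `releases` is a list of dicts with keys 'project', 'topic', 'period'
    and 'bytes'. The result maps (project, period) to the code commons
    available to that project in that period.
    """
    topic_total = defaultdict(int)  # bytes per (topic, period)
    own_total = defaultdict(int)    # bytes per (project, topic, period)
    for r in releases:
        topic_total[(r["topic"], r["period"])] += r["bytes"]
        own_total[(r["project"], r["topic"], r["period"])] += r["bytes"]
    # Subtract the project's own releases from the topic total.
    return {
        (project, period): topic_total[(topic, period)] - own
        for (project, topic, period), own in own_total.items()
    }
```

A refinement by programming language would follow the same pattern, with the grouping key extended from the topic to the pair of topic and language.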
SourceForge provides a way to measure the development status of each project. Project
administrators indicate in which of the seven different stages the project is: planning,
pre-alpha, alpha, beta, stable, mature or inactive. Although the choice of one stage over
another is based on subjective criteria, it gives us a general idea of the progress and
stability of the project. Table 2.2 shows the frequencies of projects within each development
stage. The mean gives the percentage share of each category within our sample. We see
that around 60% of the projects in our sample are in beta or later stages. This provides
further evidence that we are indeed dealing with actively developed projects.

Figure 2.3: Number of Releases

Table 2.2: Development Stages
2.5 Discussion
Table 2.5 shows the results of the regressions. Models 1 and 2 show the estimates for
the full sample (n = 10,553). We progressively render the functional form more flexible,
beginning with a classic Cobb-Douglas production function (model 1) and moving to the
complete translog specification (model 2). The same translog specification is estimated
in models 9 and 10, respectively for the sample of projects with at least two developers
(n’ = 2,406) and for projects in beta or later development stages (n” = 7,018). All
regressions are fixed-effects models. A Hausman test shows that random-effects estimates
are inconsistent, refuting the assumption that unobservable project characteristics are
uncorrelated with past, present or future values of the regressors. Considering the nature
of the dependent variable, the count of file releases, a Poisson panel regression
is a natural choice. The absence of zeros in the dependent variable prevents the
use of negative binomial regressions to test for overdispersion. We argue instead by visual
inspection (see Table 2.1) that the assumption for a Poisson regression is met, namely
that $\mathrm{E}(\text{releases}) = \mathrm{Var}(\text{releases})$.

Table 2.3: Likelihood Ratio Test
We check the validity of our specification with a likelihood ratio test. Comparing the
complete translog function (model 2) with the Cobb-Douglas specification (model 1), we
test the null hypothesis that the restricted model represents the data as adequately as
the translog function. Table 2.3 shows the chi-squared statistics and the respective
p-values in parentheses. The null hypothesis is rejected, suggesting that
model 2 is the better specification for our data.
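The mechanics of such a test can be sketched with the standard library alone. The function names are hypothetical, and the log-likelihood values used in the usage note are invented for illustration; they are not taken from Table 2.3. The sketch assumes that the Cobb-Douglas model restricts the six second-order terms (three squared logs and three interactions) to zero, so the statistic is compared with a chi-squared distribution with six degrees of freedom. For an even number of degrees of freedom the upper-tail probability has a closed form.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable for even `df`.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper only handles positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


def likelihood_ratio_test(loglik_restricted, loglik_unrestricted, n_restrictions):
    """Return the likelihood ratio statistic and its p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf_even_df(stat, n_restrictions)
```

For instance, `likelihood_ratio_test(-1010.0, -1000.0, 6)` uses made-up log-likelihoods and yields a statistic of 20 with a p-value below 1%, which would lead to rejecting the restricted model.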
2.5.1 Developer Productivities
As can be seen in Table 2.5, we find significant coefficients for each category of developer
in the Cobb-Douglas specification (model 1). Recall that by construction this specification
implies constant elasticity of output with respect to each input. The estimated
input coefficients measure these elasticities. They suggest a slightly lower elasticity for
corporate programmers than for private and academic ones. However, this difference is
not statistically significant: the equality of the coefficients cannot be rejected (p-value
= 0.85).
Model 2 makes it possible to go further in the comparison. The translog function
relaxes the assumption of constant elasticity by taking into account additional input
variables, namely the squared logs and the crossed logs of developers. This flexible
specification also makes it possible to disentangle the different effects captured in each
Cobb-Douglas coefficient. Although the simple logs of developers remain significant in model 2,
they have lower values than in model 1, and the estimated coefficient is now highest
for corporate developers. Moreover, significant and positive coefficients for squared logs
show that the elasticity of output with respect to each category of developer is in fact
increasing. In line with simple logs, these coefficients are higher for corporate developers than
for the other two categories. As shown in Table 2.5, the crossed effects are significant
only when corporate developers are involved, and then negative in each case. In other
words, we find empirical evidence of a negative effect of interactions between corporate
developers and other categories on the productivity of open source projects.

Table 2.4: Computed Marginal Products
In sum, the productivity of corporate developers is subject to conflicting effects. On
the one hand, coefficients of simple logs and squared logs suggest that corporate developers
are more productive than other categories. On the other hand, corporate developers also
have a negative effect on production when they are associated with other categories.
Making comparisons between categories of developers requires taking into account
these conflicting effects. As a first step in this direction, we calculate the marginal product
of each developer group for each project in model 2. Table 2.4 presents the average and
the median computed marginal products for each developer group as well as the respective
t-statistics for equality of means. We see that the average corporate marginal product is
higher than the marginal product for the other two groups, and that the difference
is statistically significant at the 1% level.
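How a marginal product follows from translog coefficients can be sketched as below. This is an illustration under assumptions, not the procedure behind Table 2.4: the function names, the group labels and the coefficient containers are hypothetical, second-order coefficients are assumed symmetric, and developer counts must be strictly positive. The sketch uses the identity that the marginal product equals the output elasticity times output divided by the input.

```python
import math


def elasticity(group, beta, beta2, developers):
    """Output elasticity of `group`: beta_i + sum_j beta_ij * ln L_j.

    `beta2` is keyed by frozenset of group labels, so that the
    coefficient for (i, j) and (j, i) is the same entry.
    """
    return beta[group] + sum(
        beta2[frozenset((group, other))] * math.log(count)
        for other, count in developers.items()
    )


def marginal_product(group, beta, beta2, developers, output):
    """Marginal product = elasticity * output / number of developers."""
    return elasticity(group, beta, beta2, developers) * output / developers[group]
```

Averaging or taking the median of such project-level values over the sample would give group-level figures of the kind compared in the text.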
Our results suggest that corporate developers are on average more productive than
private or academic contributors. Table 2.4 shows that an additional corporate developer is
on average 42% more productive than a private contributor and approximately 78% more
productive than an additional academic developer. The median marginal product
is less sensitive to outliers and may give a more accurate picture of the productivities in
our data sample. Considering the medians, the gap with private developers is even more
pronounced, while the gap with academic developers narrows slightly. Here, an additional
corporate contributor raises output by 54% more than an additional private developer and
by 72% more than an additional academic one.
2.5.2 Scale effects
Looking at the average marginal product gives only a partial picture, for this leaves out
diminishing or rising marginal productivity. To address this limitation, we consider now
the relationship between productivity and the scale of open source projects.
The simple Cobb-Douglas specification in model 1 makes it possible to derive clear
conclusions on returns to scale. The sum of the estimated coefficients for each category
of programmers is equal to 1.9813 – much above 1 – which suggests that open source
production is subject to increasing returns to scale. The quadratic and interaction terms
make it more difficult to identify returns to scale in the translog model. To address this
problem, we simulate five scenarios based on model 2: Projects with (i) all corporate,
(ii) all private, (iii) all academic, (iv) cooperation in which the number of contributors is
equally divided between the developer groups and (v) cooperation where the number of
developers is weighted among academic, private and corporate contributors with respect
to their sample frequencies. Figure 2.4 presents these simulations. In the first three
scenarios the project’s output is clearly a convex function of the number of developers
within the range of developers we observe. In other words, each new contributor increases
the average developer productivity.
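The Cobb-Douglas reading of returns to scale can be sketched in a few lines. The function names are hypothetical. The sketch relies on the standard property that, in a Cobb-Douglas function, multiplying every input by the same factor multiplies output by that factor raised to the sum of the input elasticities.

```python
def output_multiplier(sum_of_elasticities, input_multiplier):
    """Factor by which Cobb-Douglas output grows when every input
    is multiplied by `input_multiplier`."""
    return input_multiplier ** sum_of_elasticities


def returns_to_scale(sum_of_elasticities, tolerance=1e-9):
    """Classify returns to scale from the sum of the input elasticities."""
    if sum_of_elasticities > 1 + tolerance:
        return "increasing"
    if sum_of_elasticities < 1 - tolerance:
        return "decreasing"
    return "constant"
```

With the sum of 1.9813 reported for model 1, `returns_to_scale(1.9813)` gives "increasing", and doubling all inputs multiplies output by slightly less than four. The translog scenarios require simulation precisely because the elasticities then vary with the input mix.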
Returns to scale are increasing for projects that have developers with the same
background. The question, though, is how well the ’all-with-one-background’ scenarios
correspond to our sample. These three scenarios are clearly polar cases and we need to
look at what goes on in between these three extreme examples. We therefore calculate
two more simulations in which we assume that developers are added from each developer
group.
In Figure 2.4b, we can see that the curves for the fourth and fifth scenarios are slightly
concave. We add a 45° line to make the concavity easier to see. For scenario 4 (equal
distribution), each new contributor is less productive than the previous one. Average
productivity decreases as more developers are added to a project. The decreasing returns
to scale are less obvious in the fifth scenario (weighted distribution). Nonetheless, as more
developers are added to a project, average productivity decreases slightly, as can be seen
by comparing it with the 45° line.
In the last two scenarios, open source projects exhibit slightly decreasing returns
to scale. Our results depend on the functional form of the production function and
on the assumed entry to open source projects of developers with different backgrounds.
The heterogeneity of development communities determines whether we are more likely
to observe open source development resembling scenarios 1-3 or scenarios 4 or 5. To
be sure, all five scenarios are extreme cases that do not completely represent the actual
productivity in open source development. These scenarios give us an indication of the
average dynamics inside open source development projects.
Figure 2.5 illustrates that these scenarios form the upper and lower bounds
within which our model predicts the returns to scale. The upper bound is the
“one-single-background” scenario for corporate developers and the lower bound is scenario 4 in which
we add contributors from all three backgrounds equally. Scenario 4 is the lower bound,
because the negative interaction effects between groups are largest in this scenario. The
area in between these bounds shows the potential returns to scale for any distribution of
contributor groups in a development community. Looking at the area between the two
bounds, we can infer that there is a considerable potential for increasing returns to scale
in open source projects.
Figure 2.4: Simulations 1
(a) Same Background
(b) Cooperation

Figure 2.5: Simulations 2

A possible factor driving these results could be a qualitative difference in nature
between the releases observed in nascent and more mature projects. Developing the first
version of a software product may indeed take more time than subsequent improvements.
We have therefore estimated the same translog specification respectively for the sample
of projects with at least two developers (n’ = 2,406) and for projects in beta or later
development stages (n” = 7,018). As can be seen in Table 2.5 (models 9 and 10), the
results do not differ substantially from model 2. Another possible explanation lies in the
way we measure the developers’ contribution. The increasing returns to scale we observe
might reflect the fact that developers tend to devote more time to large mature projects
– which we cannot control for. Still, our results indicate that open source production is
hardly subject to decreasing returns. Despite a possible bias in time measurement, this
sheds light on the intrinsic efficiency of this software production model. The combination
of modular design and voluntary contributions seems to be an efficient way to divide labor,
even for large projects.
2.5.3 Spillovers
In Tables 2.7 and 2.8, we present several models assessing the spillover effects flowing
between projects through various channels. They show that the coefficients for spillovers
are positive and significant. The size of the code commons has a differentiated effect
on production depending on the developer group, topic, programming language and
development stage of the project.
Model 2 includes a single spillover variable denoting the number of terabytes of source
code available in other projects of the same topic. We find a positive and significant,
albeit small, coefficient for spillovers. Calculating the impact of the average input factors
and spillovers on production, we find that spillovers account for 2.5%, whereas developers
account for 97.5% of the entire effect (Table 2.6).
We need to address the size of the spillover effect on overall productivity. Why is the
spillover effect so small? On the one hand, this may be due to the difference between
codified information and tacit knowledge. In contrast to codified information, tacit
knowledge can only be learnt through face-to-face interaction or experience. Tacit knowledge
spillovers may therefore be less commensurable with our definition of code commons and
are left out of our estimation. This measurement error can lead us to underestimate the
effect of knowledge spillovers. On the other hand, it is also possible that the effect of
knowledge spillovers on productivity is small compared to other factor inputs. As we said
before, we conflate the influence of labor and capital inputs in our estimation. Compared
to these, the overall gains from knowledge spillovers might be minute. In the end, soft-
ware development may come down to having a powerful computer and spending a lot of
manpower on a project.
Despite the small size of the spillover effect, it is worthwhile to look more closely at
the channels through which spillovers flow. Looking at the interaction terms of developer
groups and spillovers (model 3), we find that the coefficients are significant for corporate
and academic contributors. The results indicate that adding another terabyte of code
commons increases the productivity of these two developer groups, but not that of the
private one.
It thus seems that academic and corporate contributors copy and reuse more than
private developers and hence benefit more from the available code commons. There is no
significant difference between the effects for academic and corporate contributors (Chi² =
1.02). According to the adage that “good programmers know what to write. Great ones
know what to rewrite (and to reuse)” (Raymond, 2001), it would appear that academic
and corporate contributors are better open source developers than private ones. Of course,
there is another way to read these results. One could argue that corporate and academic
contributors are more pragmatic and goal-oriented than private ones. They may not
write software for fun or to learn. Using available source code in other projects helps
them achieve their objectives more quickly. This pragmatism might cause the larger
spillover effect for academic and corporate developers.
Furthermore, model 4 looks at the impact of the license type on spillovers. We find
that the coefficient for same license type is positive, but not significant. By contrast, the
coefficient for different license type is positive and weakly significant. Sharing the same
license type does not appear to promote knowledge spillovers; however, the size of the
knowledge base matters for projects from a different license family. Leaving the differences
in statistical significance aside - indeed, at 10%, the latter coefficient is only weakly
significant - this finding might suggest that, once source code is made public, contributors
care little about the license type and whether the license is restrictive or not. They may
simply use the available source code regardless of the constraints on license proliferation.
Hazarding a guess, we might say that there is a commingling of different license types at
the source code level.
Models 5 and 8 show that the further advanced a project is in its development, the more
it benefits from the available source code in other projects. Spillovers are significant only
in projects with Production and Mature status. A Wald test shows there is a significant
difference in spillovers between projects in Beta and Production stages (Chi² = 38.72).
It thus appears that spillovers are not a critical resource to start new projects, but rather
to improve and extend more mature projects. This suggests that spillovers could relate
to bug fixing or generic functionalities rather than the core of new software. Since they
mainly benefit the developers of mature projects, this result also provides an interesting
explanation for the non-decreasing returns to scale we observe in SourceForge projects.
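A Wald test for the equality of two coefficients can be sketched as follows. The function name and the numerical inputs in the usage note are hypothetical, and the sketch is the textbook single-restriction form, not necessarily the exact computation behind the reported statistics. With one restriction, the statistic follows a chi-squared distribution with one degree of freedom, whose upper-tail probability can be written with the complementary error function.

```python
import math


def wald_equal_coefficients(b1, b2, var1, var2, cov12=0.0):
    """Wald statistic and p-value for the null hypothesis b1 == b2.

    `var1`, `var2` and `cov12` are the estimated variances and the
    covariance of the two coefficient estimates.
    """
    variance_of_difference = var1 + var2 - 2.0 * cov12
    stat = (b1 - b2) ** 2 / variance_of_difference
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

As a reading aid, a statistic of about 3.84 corresponds to a p-value of 5% under this distribution, so a value such as the reported 38.72 lies far in the rejection region.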
We fail to find a significant difference between spillovers from projects with the same
topic and programming language and spillovers from projects with different programming
languages in model 4. Separating the proxies further, however, shows that spillovers for
projects with the same programming language are significantly stronger than for projects
with different languages (model 6). A Wald test reveals that there is a significant differ-
ence between spillovers of same versus different programming languages in production-
stage projects (Chi² = 4.45) and mature ones (Chi² = 7.6). Moreover, we find significant
spillover effects from projects with the same programming language, but different topics
in models 5 and 6. In model 5, the difference between the spillover effects is signifi-
cant for same topic-different programming language spillovers and different topic-same
programming language.
2.6 Conclusion
The purpose of this paper was to study empirically the production of open source software.
Using a panel of 10,553 projects registered on SourceForge over a period of 28 months
(February 2005 until May 2007), we have estimated a production function relating the
number of file releases to the number of corporate, private and academic contributors.
We have considered two possible specifications of the production function, namely a Cobb-
Douglas function and a more flexible Translog specification. This approach made it
possible to highlight various interesting results, concerning the productivity of corporate
and other developers, the effects of their interactions within open source projects, returns
to scale driven by labor division and the existence of spillovers between projects.
Our first findings concern the developers’ productivity. We find empirical evidence
that corporate developers are generally more productive than voluntary ones. However,
this result must be weighed against the negative effect of interactions between the two
categories of developers. In other words, although corporate developers are more efficient
individually, they seem less efficient in cooperating with other categories of developers
within a given project, which suggests possible conflicts or at least coordination failures
due to vested interests and/or different ways of approaching their work.
Our estimations suggest that open source projects may not be subject to decreasing
returns to scale. This result should be taken with caution due to a lack of data on the time
developers spend on each project. Still, it stands in striking contrast to the decreasing
returns that are observed in more traditional software production (Brooks, 1978). This
suggests that the peculiar organization of production in open source projects – based
on a combination of user-driven innovation, voluntary contribution and modular design –
enables an efficient division of labor between programmers. Such organizational efficiency
may explain why large open source projects such as Linux or Apache are particularly fierce
competitors for proprietary software.
Finally, we find evidence for positive spillovers between projects in the SourceForge
repository. We test different possible channels, and show that spillovers mainly flow
between projects with the same topic and, to a lesser extent, between projects with the
same programming language. Projects involving corporate and academic developers are
more likely to benefit from spillovers. Project development stage also matters: spillovers
only seem to benefit mature projects, thereby increasing further the total productivity
of large projects. Surprisingly, the license type neither favors nor hinders the flow of
spillovers between projects.
Table 2.5: Main Results
Table 2.6: Impact of Average Input Factors
Table 2.7: Results for Spillovers 1
Table 2.8: Results for Spillovers 2
Chapter 3
Loose Contracts, Tight Control:
Corporate contributions in
SourceForge projects
3.1 Introduction
Companies increasingly contribute to the development of open source software. A recent
study shows that 70% of the development work for the Linux kernel has been done by
corporate developers (Kroah-Hartman et al., 2009). To be sure, this is not an isolated
phenomenon. Firms are responsible for a considerable share of the development work in
all kinds of open source projects. Lakhani and Wolf (2005) find that around 40% of open
source developers are paid to contribute to projects hosted on the SourceForge website,
while Ghosh et al. (2002) observe that 54% of their surveyed open source developers are
paid.
But what leads firms to contribute to open source software? The academic literature
proposes several answers to this question (Wichmann, 2002b; Henkel, 2006). Firms contribute
to open source software to promote standardization. Contribution thus becomes
a collaborative standardization process: each firm contributes to have its specifications
included in the final standard. In a similar vein, companies might send programmers to
open source projects to ensure compatibility with their products. Research also shows
that firms contribute to gain reputation within a development community. Lastly, firms
contribute for strategic reasons, for example to undermine a dominant incumbent.
In this chapter, we propose an approach that complements the standards and compatibility
reasoning. After all, the development community creates and improves software
voluntarily. So why doesn’t the firm simply request that a standard be included in a
piece of software or that an application be made compatible with a product? We argue that
the firm contributes in order to mitigate the risks of the open source license. By using the
open source software, the firm signs an incomplete contract, the open source license. The
lack of fixed feature specifications, delivery dates or development priorities renders the
software license incomplete and creates contractual hazards for the company. Therefore, the
firm imposes an additional governance structure on its relation with open source communities
by dedicating developers to the software project.
We use a two-fold empirical method to shed light on this issue. Our first empirical
analysis looks at 2,643 open source projects extracted from SourceForge to establish a
relation between contractual incompleteness and corporate participation. We find quantitative
evidence for the notion that firms face contractual incompleteness when dealing with
open source communities. Our results suggest that projects are indeed more likely to
have high corporate participation when they show high degrees of contractual hazards.
In particular, the longer it takes for projects to treat requests and the more variable
the treatment is, the higher the share of corporate developers in the project. Supporting
the incomplete contracts hypothesis further, we find that as the frequency of the treated
submissions increases, corporate open source participation decreases. This is in line with
the transaction cost literature, which asserts that the rate of repeated transactions can
mitigate the problem of incomplete contracts.
An exploratory survey provides additional insights on the attitudes towards firms
within different development communities. From May until July 2009, we conducted
a short email survey among 7,914 project maintainers who coordinate the development
work in open source communities and obtained 60 valid responses. Their comments consistently
mention the unwillingness of the community to pay special attention to corporate
needs. This qualitative evidence suggests that there can be merit in applying the theory of
transaction costs and incomplete contracts to corporate open source software. By participating,
firms can affect the structure of the transaction with the development community
and can more closely control their exposure to contractual incompleteness.
Our analysis is unique in that we provide a theoretical underpinning to the growing
body of empirical evidence on corporate open source participation. We complement
previous literature in this field with tools and insights of New Institutional Economics.
In a broader context, this chapter links the literature of technology outsourcing to the
phenomenon of corporate open source software.
Additionally, we present a novel approach to measuring incomplete contracts. The
chapter has four parts. The next section presents the theory of incomplete contracts
and the relation with corporate open source. Next, we introduce the data used for our
preliminary results. Then, we establish an econometric model for the SourceForge data.
Lastly, we discuss the results for both the regressions and the survey.
3.2 Background
3.2.1 Incomplete Contracts
We use a simplified setup to represent corporate open source software. In this setup, we
have a firm and a development community that provides an open source application. The
firm needs a computer application to perform a certain task. If the firm decides to use
the open source application, a transaction takes place between the firm and the development
community. The application offers a given set of services in terms of features. The
development community in turn adds new features to the application and improves its
existing functionality.
Unlike other commercial transactions, the firm does not sign a legal contract for the
open source application. The terms of trade between firm and application are laid out in
the open source license. If the firm needs additional features or improved functionality,
e.g. to accommodate new hardware, it has to interact with the development community.
The interactions between firm and community can take the form of bug reports, support
or feature requests, but can also be posts on mailing lists, submissions to fora or direct
contacts with developers.
A bug report for instance is a short message to a bug tracking system that contains
information about what in the application does not work and with which computer hard-
ware and operating system the malfunction has been experienced. Here is an actual bug
report from the JBoss project on SourceForge.net:
”When I attempt to execute ’build all’ on the latest source, using my w2k machine,
I get the following error.
BUILD FAILED java.lang.OutOfMemoryError
I have set ANT_OPTS=-Xmx640m.
Even the ’build clobber’ command requires 60 MB of memory. Does this make sense?
Isn’t clobber just deleting a bunch of directories?
My latest attempt to ’build all’ topped out at 90 MB.
Could someone please run this [program] through a profiler? There’s probably a
memory leak in there somewhere. Given the nature of the task, it’s hard for me
to believe that it really requires 90+ MB.
Are there any workarounds for this?”1
Messages like the one above are then posted on the web site of the development community,
where other users and developers can respond to them.
These messages are requests to subcontract development work to the community. Even
though the open source license is not a formal contract (Laurent, 2004), it defines the
terms under which interactions between the firm and the development community take
place. The license sets only the framework for mutual collaboration. This means that it
defines the terms of modification and redistribution, but waives all warranty claims and
responsibility for errors in the software (Rosen, 2004). In this regard, the open source
license is indeed an incomplete contract (McGovan, 2001). It only defines a minimal set
of conditions for the transaction.
1 Here is a rough translation for the less technology-savvy. The person who submitted the bug report
apparently has a problem with creating the application from the source code with the given option of
’build all’. He uses the Microsoft Windows 2000 operating system and receives a message from the
program that he does not have enough main memory. After explaining his problem, he asks the
development community to run a program analysis tool, the so-called profiler, on the source code and
suggests that there is a programming glitch in the source code somewhere.
Contracts are incomplete when they do not encompass all possible contingencies of a
transaction (Williamson, 1967). The aim of a contract is to outline the terms of trade;
the conditions under which a transaction takes place. Especially for complex contracts
these conditions cannot include all possible future circumstances. If an unforeseen cir-
cumstance occurs, the incentives for the trading parties may change and they may deviate
from the contract after signing it. This opportunistic behavior creates uncertainty about
the contract ex ante. Why draw up a contract which the other party will not honor
anyway? Knowing about the potential opportunism, trading partners might not engage
in transactions in the first place.
To overcome contractual incompleteness and thus to enable the transaction, trading
partners establish additional governance arrangements. These arrangements can range
from full integration (Grossman and Hart, 1986), long-term contracts (Joskow, 1985),
partial ownership agreements (Hennart, 1988), to off-setting relationship-specific invest-
ments (Heide and John, 1988). The firm thus imposes control over its transactions with
a combination of contract and governance structures. Economic theory asserts that the
trading partners will find the best arrangement to reduce the risk of opportunism (She-
lanski and Klein, 1995). This means that all transactions we can observe already have
sufficient governance structures in place.
When the firm decides to use an open source application, it also faces contractual
incompleteness. As mentioned before, development communities supply new features,
bug fixes and support services. All interactions between the development community and
the firm are voluntary, however. Therefore, the firm does not know whether and when
the development community treats its submissions and requests. The company cannot
impose any conditions on the open source developers about the priorities of its requests.
The firm can reduce the incompleteness of the contract by setting up additional governance
arrangements. Contributing to the development process of the open source community
can be such an arrangement. Assigning a paid programmer to develop in an open
source project is similar to partial vertical integration. The programmer participates in
the development effort of an open source community and establishes a degree of control
over the transaction between the firm and the community. He becomes the interlocutor
for the firm inside the development community. In environments that change quickly
and hence render the incompleteness of the contract more important, firms may need to
impose more control over the development process and thus assign more developers to
a project. Corporate contribution is a flexible way of vertically integrating open source
software development. Doing so permits the firm to reduce the contractual incomplete-
ness.
In New Institutional Economics, the attributes of the transaction play an important
role in determining the incompleteness of the contract and create contractual hazards
(Williamson, 1991). These attributes can comprise measurement difficulties and the fre-
quency of repeated transactions, but also relationship-specific assets or weaknesses in the
institutional environment. These contractual hazards can also be found in corporate open
source software.
Measurement difficulties and the frequency of repeated transactions are key elements
in contractual hazards (Shelanski and Klein, 1995; Williamson, 1996). Difficulties in
measuring the outcome of a transaction can lead to a principal-agent problem. When the
outcome of a transaction cannot be fully ascertained, one trading partner may have an
incentive to shirk the contract. In turn, the other partner may be unsure about the quality
of the traded good and refrain from the transaction entirely. The frequency at which
transactions take place plays an important role in determining contractual incompleteness.
Few transactions over long periods of time increase the risk of uncertainty and the problem
of incomplete contracts is more pronounced. The possibility of changing circumstances is
high and, as unforeseen contingencies occur, contracts adjust only slowly. Moreover, with
little repeated trading, partners do not face a reputational effect in contracting. One-off
deviance cannot be punished easily and uncertainty about a transaction increases.
Relationship-specific investments (Klein et al., 1978; Joskow, 1985) can render the
incompleteness of a contract more prevalent. When a firm has to invest in relationship-specific
assets prior to a transaction, unforeseen contingencies or even a failure of the
transaction are costly for the investing party. Asset specificity occurs when there is little
salvage value to an investment outside the transaction. This creates a holdup problem for
the partner who paid for the asset: due to his investment, he has a weak bargaining
position in case the contract is renegotiated. Again, knowing this beforehand, the partner
might not be willing to invest in the relationship-specific asset in the first place and the
transaction fails.
Embedded systems using open source software are such specific assets. These systems
are electronic devices for specific purposes or built as fixed hardware-software bundles,
such as GPS receivers, medical devices or cellphones. The cost to adapt the software to the
firm’s hardware and to meet the functional requirements can be significant. The specificity
of the embedded system arises through the modifications to the open source software as
well as the design of the electronic hardware. Corroborating our contention, embedded
systems are prominent examples of corporate open source participation (Henkel, 2006).
One well-documented example of an embedded system is the Maemo software platform
and Nokia’s smartphones. Maemo is an operating system for cellphones which is based on
open source software components (Jaaksi, 2006). Even when using open source software, Nokia
still incurs costs to modify the software and adapt the hardware. As Jaaksi (2009), Nokia’s
vice-president of open source operations, points out ”[i]n addition of getting the most
significant parts of the code from community projects, the Nokia Maemo team has a huge
job to develop, finalize, optimize, fine tune, test, and integrae [sic] the devices into ready
packages.” These investments in the embedded platform are made prior to marketing the
actual smartphones. Future transactions between Nokia and open source applications run
the danger of a holdup due to the specificity of the Maemo platform and the handheld
devices. To reduce this risk, Nokia shares not only its modifications, but also participates
in the general development of the open source applications.
Lastly, weak institutional environments can reinforce the incompleteness of contracts.
The strength of the intellectual property right regime is important in the transaction
between the firm and the open source application. In case the firm already has
additional governance arrangements in place, for example a corporate developer, the type
of open source license influences the contractual incompleteness. The corporate devel-
oper works in the development community and contributes source code to the applica-
tion. These contributions are investments in the open source application. The firm wants
to appropriate its investment and protect its intellectual property in the open source
application (Dahlander and Magnusson, 2005; Fosturi et al., 2008).
The appropriation of intellectual property in open source software is difficult. The
type of the open source license plays an important role. Some types of open source licenses
allow the use of the source code in proprietary software and thus facilitate appropriation,
others do not. Scholars call the former less-restrictive, or academic licenses, and the
latter restrictive, or strong copyleft licenses. Depending on the type of open source
license under which the application is released, the firm may need to establish more governance
arrangements, perhaps vertically integrating the open source application and obtaining
the entire copyright over the application’s source code.
This gives the firm another way to impose control over its open source transactions,
apart from corporate contribution. The firm can retain the copyright of the source code
and oblige other contributors to transfer the copyright of the source code to the firm.
This method gives the firm legal leverage to enforce its control over the development
community. It is a common practice for corporate-led open source applications. MySQL,
Mozilla and Microsoft are typical examples of this practice. We note that the success of
this method depends crucially on the willingness of the open source community to accept
these constraints and trust the company in its open source commitment (Shah, 2006).
3.3 Data
We use a twofold approach to the analysis of corporate open source software in light of
incomplete contract theory. First, we estimate a reduced form model with a large data
set. Our regression relates the share of corporate developers to proxies for contractual
hazards and incompleteness. Second, we review the responses of 60 project maintainers
in an exploratory survey. The responses and comments that we obtained in this survey
shed some light on the interaction between development community and firm. We find
that the two analyses give converging pieces of evidence on the validity of incomplete
contract theory in corporate open source software.
3.3.1 Quantitative Data
We use data of development communities, or so-called projects, on SourceForge. Source-
Forge (SF) is an Internet platform that acts as an intermediary between project initiators,
developers and users of open source software. Subscription to SF is free and only needed
to participate in any phase of project development, i.e. for fixing a bug, contributing
code or authorizing another release. SF offers the necessary infrastructure to maintain
developer communities, the bandwidth for downloads as well as facilities to manage mail-
ing lists and search facilities for users. We track 2,643 projects over a time period of 28
months from February 2005 until June 2007 (N = 74,004). Table 3.1 presents a list of all
variables.
We retrace the possible factors that affect a firm’s perception of contractual incom-
pleteness and lead it to partially integrate the development of an open source application.
According to our hypothesis, a firm would choose to contribute if the expected probabil-
ity of having its request fixed by the community is low. To approximate this probability,
we use characteristics of the development community about the lengths of time taken to
treat submissions and the number of treated submissions.2
2 On a technical note, the uncertainty disappears if a firm knows the exact probability distribution.
So, we have to assume some form of imperfect information on the part of the company. To mimic this,
we use variables that are observable by the firm, but that do not assume a given probability distribution
about the treatment of submissions. This prevents us from using more sophisticated econometric methods
to measure transactional uncertainty.
Table 3.1: Summary Statistics (N=74,004)
We use the percentage of corporate developers with respect to all contributors in a
development community as a measure of firms’ involvement in a project. This corporate
participation rate (cprate), see Table 3.1, lies well below the findings of other studies on
corporate contributions (Ghosh et al., 2002; Lakhani and Wolf, 2005; Hammond et al.,
2009). However, previous empirical research on SF notes that a large share of projects
are abandoned or have only one developer (Howison and Crowston, 2004). Bearing the
strong heterogeneity of SF projects in mind, it is also not surprising that participation
rates are distributed very unevenly. Few projects attract a lot of corporate attention,
whereas a large share has no paid developer at all.
Our premise is that the measures for contractual incompleteness and contractual hazards
influence the firm’s choice to contribute to the open source community and thus to
establish additional governance. By contrast, we assume that corporate contribution does
not affect our measures of contractual hazard in return. This means that we can observe
our measure of contractual incompleteness, even though there is corporate participation.
This might seem paradoxical because one would normally expect contractual incompleteness
to diminish when the firm establishes additional governance. However, we believe
that the firm deals only with the corporate developer within the open source community.
As mentioned before, the corporate developer acts as an interlocutor between
firm and community.
Our proxies for contractual incompleteness and hazards comprise all requests to the
development community, not only the corporate ones. Therefore, the firm may perceive
a mitigation of contractual incompleteness, while we observe no reduction in our proxies.
Put differently, those corporate contributors we observe in the development community
specifically work on the requests of the firm and do not affect the overall functioning of
the development community.
We look at three types of submissions. In our data sample, users or developers can
submit a bug report, file a support request or ask for a particular feature. These
three categories differ in terms of the overall submissions received, of the development
effort needed and also of the value to the development community. We measure the elapsed
time between submission and treatment as well as the number of treated submissions.
Both variables are proxies for the probability of having one’s submission treated. The
first variable considers more closely the time dimension of a transaction. Timeliness is
important for corporate users. We can extend the duration variable a little further. We
calculate the average duration of a submission as well as its standard deviation. These
variables indicate how long a submission typically waits before it is treated and how much
that waiting time varies. For
the other measure, we compute the number of treated submissions per developer for each
open source community. The number of treated submissions indicates the frequency of
transactions taking place for each community and using the per-developer average allows
us to compare communities of different sizes.
The proxies for contractual incompleteness are (i) the time it takes a community to re-
solve a submission and (ii) the average number of submissions per developer each commu-
nity treats. Submissions in SF, such as bug reports (BR), feature requests (FR) or support
requests (SR), can come from developers or users. For each of the three main categories,
we calculate the mean (dur_XXmeanmth) and standard deviation (dur_XXsdmth) of
the duration of solving the submission. The duration goes from the date of submission to
the date of reporting it fixed. We censor the data at the last month in case the submission
has not been fixed during the observed time period. Our data set includes submissions
from before our observation period. In fact, 80% of them were submitted before February
2005, the first month we track data on project properties. We can see that the average
lifespans of bug reports, feature requests or support requests are around 30 months.
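The construction of the duration proxies can be illustrated with a minimal sketch. It assumes durations are counted in whole calendar months, that the sample standard deviation is used, and that open submissions are censored at the last observed month; the function names, the example dates and the censoring date are illustrative and are not taken from the data set.

```python
from datetime import date
from statistics import mean, stdev


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (the day of the month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def duration_stats(submissions, last_month):
    """Return (mean, standard deviation) of submission durations in months.

    submissions: list of (submitted, fixed) date pairs; fixed is None when the
                 submission was still open at the end of the observation period.
    last_month:  censoring date applied to submissions that were never fixed.
    """
    durations = [
        months_between(submitted, fixed if fixed is not None else last_month)
        for submitted, fixed in submissions
    ]
    average = mean(durations)
    # The sample standard deviation needs at least two observations.
    spread = stdev(durations) if len(durations) > 1 else 0.0
    return average, spread


# Made-up bug reports for one hypothetical project.
bug_reports = [
    (date(2004, 1, 15), date(2004, 7, 1)),  # fixed after 6 months
    (date(2005, 3, 1), None),               # still open, so censored
]
dur_BRmeanmth, dur_BRsdmth = duration_stats(bug_reports, last_month=date(2007, 5, 1))
print(dur_BRmeanmth, dur_BRsdmth)
```

Because censored submissions are assigned the time elapsed up to the last observed month, the resulting mean is a lower bound on the true time to resolution for projects with many open submissions.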
The other variables we use to measure contractual hazards are the number of treated
feature requests per developer (r_FRfixed), the number of closed support requests per
developer (r_SRfixed) and the number of fixed bug reports per developer (r_BRfixed). In
contrast to the lifespan of a submission, we here look at the frequency of the transactions
between users and community. These variables capture the average resolution rate per
developer in each community. The standard deviation shows that the average resolution
rates vary a lot across development communities.
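The per-developer resolution rates can be sketched in the same spirit. The function name and the input layout are hypothetical; only the variable names r_BRfixed, r_FRfixed and r_SRfixed come from the text, and the counts are made up.

```python
def resolution_rates(treated, n_developers):
    """Treated submissions per developer, keyed like the chapter's variable names.

    treated:      dict mapping a category code ("BR", "FR", "SR") to the number of
                  treated submissions of that category in one community.
    n_developers: number of developers in that community.
    """
    if n_developers <= 0:
        raise ValueError("a community needs at least one developer")
    return {f"r_{code}fixed": count / n_developers for code, count in treated.items()}


# Two made-up communities of different sizes with the same per-developer activity.
small = resolution_rates({"BR": 12, "FR": 3, "SR": 6}, n_developers=4)
large = resolution_rates({"BR": 120, "FR": 30, "SR": 60}, n_developers=40)
print(small)
print(small == large)
```

Dividing by the number of developers is what makes the two communities comparable: the larger one treats ten times as many submissions, yet both show the same rate per developer.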
SF projects are grouped into seven development stages: planning, pre-alpha, alpha,
beta, production, mature or inactive. These are indicators about a project’s life-cycle
and represent the stability and number of features of a project. The pivotal stage is
generally beta status in which, in contrast to earlier stages, the software runs smoothly
and encounters relatively few crashes. The project head chooses the development stage.
There is one particularity with SF projects as there also exists an inactive development
stage, in which development has basically ceased. Our sample contains a considerable
number of projects in beta or later development phases (71%). We also find that the
share of inactive projects (1%) is smaller than in other studies on SF projects (Howison
and Crowston, 2004).
Like previous research (Perens, 1999; Lerner and Tirole, 2002), we distinguish open
source licenses into three categories: academic, weak copyleft and strong copyleft licenses.
These categories follow the licenses’ restrictions on using derivative work in proprietary
software and the obligations they place on licensees to distribute modified software under
the same license. Along these two axes, academic licenses allow the combination of
derivative work with proprietary applications and do not require the same license for any
modified version. Weak copyleft licenses, in turn, allow the combination of modified software
with proprietary applications, but oblige licensees to publish derivative work under the same
license. Strong copyleft licenses restrict the combination of derivative work with proprietary
applications and require the continuation of the same license for any work that is based on
the original source code. Of the projects in our sample, 69% run under strong copyleft, 17%
under academic and 10% under weak copyleft licenses. The category “Other licenses”
encompasses all projects that are released under an older, now defunct, license.
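The two axes of this classification can be written down as a small lookup, shown here as a sketch. The class, field and function names are hypothetical, and the fallback label "other" for a combination the text does not describe is an assumption made only to keep the function total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseCategory:
    name: str
    # May derivative work be combined with proprietary applications?
    proprietary_combination_allowed: bool
    # Must modified versions be published under the same license?
    same_license_required: bool


# The three categories as described in the text.
CATEGORIES = (
    LicenseCategory("academic", True, False),
    LicenseCategory("weak copyleft", True, True),
    LicenseCategory("strong copyleft", False, True),
)


def categorize(proprietary_combination_allowed: bool, same_license_required: bool) -> str:
    """Return the category name matching the two license properties."""
    for category in CATEGORIES:
        if (category.proprietary_combination_allowed == proprietary_combination_allowed
                and category.same_license_required == same_license_required):
            return category.name
    return "other"  # assumption: combinations not covered by the three categories


print(categorize(True, False))
print(categorize(True, True))
print(categorize(False, True))
```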
We also have categories for the audiences the project is intended for and the operating
system for which it is developed. SF lists 19 intended audiences among which are a variety
of general fields, such as science, government or end-user. We adopt a categorization used
by Lerner and Tirole (2002): end-users, developers and system administrators. We find
that 53% of the projects are intended for developers, 33% for end-users and 12% for
system administrators. This slightly skewed distribution towards software development
can be expected from a user-driven development model. The distribution of operating
systems is more interesting. Here the most popular system is middleware (32%), followed
by Posix (23%), embedded systems (15%) and Microsoft-based operating systems (13%).
Middleware encompasses all projects that run independently of the operating system.
It acts as an interpreter between the application and the operating system. The Posix
category includes all Unix varieties, such as Linux, AIX or Solaris.
There are two main sources of measurement error within the data. First, there is
a sample selection bias in the SF data towards small and medium-sized projects. Previous
studies (Howison and Crowston, 2004; Lerner and Tirole, 2005; David and Rullani, 2006)
show that SF contains a large number of small or inactive projects. Without a comparable
data set of projects hosted outside a repository, we cannot verify whether our results hold
for OSS in general. Another possible consequence is that we underestimate project activity
in the actual SF project population. The second source of measurement error is more
important for this study. We likely face type I and type II errors on corporate affiliations
that are due to our data collection method. The potential type I error is that corporate
developers might use their work email addresses for private purposes. In that case we find
a corporate affiliation even though there is in fact no coordinated firm strategy on OSS.
The potential type II error is that corporate developers could use their private email
addresses for work-related projects. Without interviewing each developer, there is no
possibility of knowing how strong these two effects are.
Table 3.2 repeats the predictions based on the hypotheses we established with regard
to corporate open source software. We expect that the variables for the treatment duration
have a positive impact on the expected participation rate. The longer and the
more variable the time span between submission and fixing becomes, the stronger
Table 3.2: Predictions
is the contractual uncertainty and the more likely it is that a firm assigns a developer
to the community. Along the same lines, the higher the number of treated requests per
developer, the higher the frequency of transactions and the smaller the contractual
hazard.
3.3.1.1 Econometric Model
We want to regress the share of corporate developers among all developers on a set
of project characteristics and measures of contractual incompleteness. The corporate
participation rate is bounded between 0 and 1 with a positive probability of being at each
limit. Ordinary least squares has the downside that it cannot ensure that the predicted
values lie within these bounds, similarly to the well-known linear probability model.
Tobit estimation, on the other hand, takes these bounds into account, but treats them as
truncations and thus does not include them in the calculation.
Papke and Wooldridge (1996) propose an estimation method that takes into account
the special nature of the explained variable and proves to be robust to misspecification.
They estimate the conditional mean of the explained value with the nonlinear function
$G(\cdot)$, which satisfies $0 < G(z) < 1$, and compute the parameters with a quasi-maximum
likelihood estimation. In most applications $G(\cdot)$ is a cumulative distribution function,
such as a logit, a probit or a loglog function.³ The conditional mean of the corporate
participation rate on the explanatory variables is:
$$E(y \mid x) = G(x\beta) \tag{3.1}$$
We then obtain the estimates with a Bernoulli log-likelihood function:
$$\ell(\beta) = y \log[G(x\beta)] + (1 - y) \log[1 - G(x\beta)] \tag{3.2}$$
Setting a functional form for $G(\cdot)$, we derive the influence of each explanatory variable
and the predicted values for corporate participation rates. Following Papke and
Wooldridge (1996), we use the logit function for $G(\cdot)$. Since it is a nonlinear function,
the effects of changes in one explanatory variable on the expected value of corporate
participation have to be simulated for specific values of all explanatory variables, $x_0$:
$$\frac{\partial E(y \mid x)}{\partial x_j}\bigg|_{x = x_0} = \beta_j \, G(x_0\beta)\,[1 - G(x_0\beta)] \tag{3.3}$$
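To make the evaluation of a marginal effect at given values of the explanatory variables concrete, the sketch below computes the logit marginal effect analytically and checks it against a finite difference. The coefficient vector `beta` and the evaluation point `x0` are hypothetical numbers chosen for illustration; they are not estimates from our data.

```python
import math


def logistic(z):
    """Logit link G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(beta, x0, j):
    """Marginal effect of regressor j at x0: beta_j * G(x0*beta) * (1 - G(x0*beta))."""
    index = sum(b * v for b, v in zip(beta, x0))
    g = logistic(index)
    return beta[j] * g * (1.0 - g)


# Hypothetical coefficients: intercept and one regressor (illustration only).
beta = [-3.0, 0.5]
x0 = [1.0, 4.0]  # constant term and the value at which the effect is evaluated

analytic = marginal_effect(beta, x0, 1)

# Central finite difference of the conditional mean as a cross-check.
h = 1e-6
numeric = (logistic(beta[0] + beta[1] * (x0[1] + h))
           - logistic(beta[0] + beta[1] * (x0[1] - h))) / (2 * h)

# With a logit link the marginal effect is largest where the index x0*beta is zero.
peak = marginal_effect(beta, [1.0, 6.0], 1)

print(analytic, numeric, peak)
```

Because the effect depends on the index at the evaluation point, the same coefficient yields different marginal effects at different values of the regressors, which is why the values of all other variables have to be fixed before plotting.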
Note that we estimate the coefficients as a cross section instead of a panel and adjust
the standard errors for clustered data, in this case by project. Two properties of the
data sample justify this method. First, the data vary only slightly over time. For example,
overall participation rates change for only 55 of the 74,004 observations. Running
regressions on the actual cross sections for each time period shows that there is no
significant difference between the two approaches (see table C.1). Second, we have to adjust
standard errors for heteroskedasticity (Papke and Wooldridge, 1996) and can do so using
the clustered robust standard errors, which in this case are similar (see table C.1).
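As a minimal sketch of the estimation step, the quasi-maximum likelihood problem behind equations 3.1 and 3.2 can be solved with a few Newton-Raphson iterations. The example assumes a logit link, an intercept plus a single regressor, and a small made-up data set; the names `fit_fractional_logit`, `x` and `y` are illustrative and not part of our data or code, and the clustered standard errors discussed above are left out.

```python
import math


def logistic(z):
    """Logit link G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, iterations=50):
    """Maximize the Bernoulli quasi-log-likelihood for y in [0, 1] by Newton-Raphson.

    Returns the intercept b0 and the slope b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            # Score: sum of (y - G) times the regressors (constant and x).
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian: sum of G(1 - G) times the outer product of the regressors.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = score.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < 1e-12:
            break
    return b0, b1


# Made-up data for illustration: a regressor and shares that include both bounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
y = [0.0, 0.0, 0.1, 0.05, 0.25, 0.2, 0.5, 1.0]

b0, b1 = fit_fractional_logit(x, y)
fitted = [logistic(b0 + b1 * xi) for xi in x]
print(b0, b1)
print(fitted)
```

The fitted values stay strictly inside the unit interval even though some observed shares equal 0 or 1, which is the property that motivates this estimator over ordinary least squares.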
³ An added benefit of this estimation method is that it can be easily done using standard econometric software.
3.3.2 Exploratory Survey
Between June and July 2009, we conducted an email survey of 7,914 project maintainers
and received 60 valid responses.⁴ Although the response rate of 0.76% is quite low
compared to Henkel (2006), it is in line with research using a similar methodology (Haaland
et al., 2009). The list of project maintainers comes from the SourceForge database at
Notre Dame University (Gao et al., 2007). We contacted them in batches of 50 over a
period of several days and followed up on unanswered emails after a few weeks.
The survey contains four questions that aim at the treatment of user requests by
the development community. The questions touch upon the treatment of generic user
requests, the treatment of corporate requests as well as the use of a ranking system to
signal priorities or personal communication to motivate voluntary contributions. We focus
on user requests because we believe they are good indicators for the interaction between
development community and firm. Note that the survey does not look at the transaction
between the open source application and the firm, but rather at the attitude within
the development community towards corporate requests. Our contention is that the less
interested the development community is in fixing corporate requests, the more heavily
contractual incompleteness weighs on the firm and the more incentive it has to contribute
to the development.
Figure 3.1 shows the responses to question 1. Multiple answers are possible so that
the overall number of responses does not necessarily coincide with the number of respondents.
Project maintainers identify difficulty and low interest as the main reasons for slow
responses to user requests. One respondent states that “[a]s soon as a project reaches a
certain size (typically attained by big user-oriented applications like a word processor for
instance), it becomes very hard to recruit developers. I believe the reason is simply that
in the case of volunteer collaboration, you can only devote parts of your free time to the
project.”
⁴ Appendix B shows the entire email message with the survey.
Figure 3.2 depicts the responses to question 2. Project administrators believe that
little interest from voluntary contributors is the prevalent reason for corporate requests
to take long. The lack of interest may be linked to the specificity of corporate requests.
One administrator believes that “[...] businesses often want a much more complete and
personally tailored solution, as opposed to developers who can handle much of the work
themselves after you give them a small amount of assistance.” Along the same lines, another
respondent states “[...] requested changes sometimes enhance the project itself very
little, and instead are geared towards meeting the requesting business’s goals instead.”
It appears that the firm faces considerable contractual hazard in trading with the open
source application and in turn dealing with the development community. The disinterest
of the development community in corporate requests renders repeated subcontracting of
corporate development work unlikely or difficult. To reduce the contractual hazard, the
firm needs to establish additional governance arrangements.
Figure 3.3 shows the responses to question 4.⁵ To accelerate the treatment of requests,
project maintainers rely frequently on their own effort and only occasionally on others.
This confirms a property of the development community that Shah (2006) has pointed
out before: the project maintainer has limited executive authority in his project. He
cannot command voluntary contributors to work on particular requests. If they do not
volunteer, he often has to work on the request himself. This finding has an important
implication for the firm engaging with the development community. To establish
governance, it is not enough for the firm to win over the project administrator or, in
a more extreme case, to replace him with a paid developer. The firm has to take the
development community into account to keep the external development going and create
additional governance arrangements.
⁵ We skip question 3 because it overlaps with question 4 and adds little additional insight for our analysis at hand. The responses can be seen in figure B.1 in Appendix B.
The discussion of our survey shows that (i) corporate requests may attract less interest
from the development community and (ii) the project administrator has little influence
on the behavior of other contributors. The first point helps to interpret the results of the
Figure 3.1: Do you think requests in general take long, because they...
Figure 3.2: Do you think requests by firms take long, because they...
Figure 3.3: To accelerate requests do you...
econometric analysis in the following section, when we look at the mean treatment time
of different types of requests and its variability. The second point shows that the attitude
of the open source developers towards the firm is an important factor in the transaction
between the firm and the open source application.
3.4 Discussion
Table 3.3 presents the estimates for the regression of corporate participation rates on
the transaction uncertainty variables and the control variables. We add control variables
to the equation stepwise. None of the added control variables in columns 2 to 4
is significant. The joint hypothesis tests on the control variables for status, audience and
operating systems are all statistically insignificant. For model 4, the respective $\chi^2$ values
(p-values) are 4.14 (0.66), 3.65 (0.16) and 2.21 (0.53). Also, the RESET tests give no
indication that the functional form is misspecified, e.g. that the population model includes
squared terms of mean duration. In fact, the simplest model is the preferred one. In the
following analysis, we will focus on model 1 and use its parameter values to compute the
marginal effects.
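The reported p-values can be cross-checked from the test statistics with the upper tail of the chi-squared distribution. The sketch below does this with the standard library only. The degrees of freedom (6, 2 and 3) are an assumption: they are not stated in the text and are chosen here because they reproduce the reported p-values. The function name `chi2_sf` is illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# (label, reported statistic, ASSUMED degrees of freedom, reported p-value)
tests = [
    ("status", 4.14, 6, 0.66),
    ("audience", 3.65, 2, 0.16),
    ("operating system", 2.21, 3, 0.53),
]

for label, stat, df, reported in tests:
    p = chi2_sf(stat, df)
    print(label, round(p, 2), "reported:", reported)
```

All three p-values lie far above conventional significance levels, which is consistent with dropping the control variables in favor of the simplest model.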
We see that the duration variability for treating feature requests (dur_FRsdmth)
and the respective mean duration (dur_FRmeanmth) are indeed statistically significant and
positive. This suggests that the more time it takes to fix a request, or the more variable the
treatment of requests is, the higher is the expected corporate participation rate. Also,
the average number of feature requests fixed per developer (r_FRfixed) has a strongly
significant negative effect on corporate participation. Increasing the number of treated
feature requests per developer thus translates into lower expected corporate participation
rates. These results lend support to the notion that firms contribute to reduce the
incompleteness of their open source transaction.
Combining these results with the findings of our exploratory survey allows us to draw
a more complete picture. To wit, our respondents state that requests in general take
long because there are programming difficulties or because there is little interest by the
development community. Project maintainers also contend that corporate requests more
often encounter little interest than requests in general. This leads us to believe that
the firm experiences strong contractual incompleteness and contractual hazard due to
disinterest when there is a long mean duration of requests and larger variability.
The results for r_BRfixed and r_SRfixed in Table 3.3 are counter-intuitive. The average
number of bugs fixed per developer (r_BRfixed) has a positive effect on the expected
participation rate. The average number of support requests per developer (r_SRfixed),
although not significant at the 5% level, points in the same direction. We are likely to
conflate two effects here. On the one hand, there is an uncertainty effect: the longer it
takes or the fewer requests are resolved, the higher is the contractual hazard that one’s
own submissions will not be treated. On the other hand, there is a performance effect.
Better performing projects are more likely to fix bugs and address feature requests than
poorly performing ones and could therefore also attract more attention from firms. If all
three variables, r_FRfixed, r_BRfixed and r_SRfixed, capture the same effect, this can
distort the results. The variables are then likely correlated, and including all
three can bias their coefficients. The problem becomes clear when we attempt
to interpret the results with respect to our predictions. Does a positive coefficient for the
average number of fixed bugs mean, for instance, that we do not find supporting evidence
for our transaction cost hypothesis, or does it simply capture the performance effect?
One possible solution is to make use of the omitted variable bias. Assuming that the
three variables represent the same effect and are thus correlated, we can run
several regressions, each including all variables except for one set of uncertainty measures.
We then observe whether the coefficients of the remaining variables change significantly.
If they do, we know that the three variables represent the same underlying effect and are
correlated in some form. The change in the remaining coefficients shows that these take up
part of the effect of the omitted variables. In contrast, if the coefficients do not change
signs, it lends support to the notion that they in fact measure two separate effects.
Hence, if the performance effect is prevalent among both the omitted variables and the
remaining variables, the estimates of the remaining variables will change signs. If the
different variables measure two separate effects, they do not change.
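The comparison step of this check can be sketched as follows: given the coefficients of the complete model and of each reduced model, list the retained variables whose sign differs from the complete model. The coefficient values below are hypothetical and serve only to illustrate the mechanics; they are not taken from our tables, and the function name `sign_flips` is illustrative.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_flips(full_model, reduced_models):
    """For each reduced model, list retained variables whose sign differs from the full model."""
    flips = {}
    for label, coefs in reduced_models.items():
        flips[label] = [
            name for name, value in coefs.items()
            if sign(value) != sign(full_model[name])
        ]
    return flips


# HYPOTHETICAL coefficients for illustration only (not estimates from the thesis).
full_model = {"r_FRfixed": -0.40, "r_BRfixed": 0.20, "r_SRfixed": 0.10}
reduced_models = {
    "without r_FRfixed": {"r_BRfixed": 0.15, "r_SRfixed": 0.12},
    "without r_BRfixed": {"r_FRfixed": -0.35, "r_SRfixed": 0.08},
    "without r_SRfixed": {"r_FRfixed": -0.38, "r_BRfixed": 0.22},
}

flips = sign_flips(full_model, reduced_models)
print(flips)
```

An empty list for every reduced model corresponds to the case in which the variables measure separate effects; a non-empty list would point to a shared underlying effect.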
Table 3.4 shows the estimates for the complete model (column 1) and the six regressions
with different sets of uncertainty measures omitted. We see that, although the
duration variables for feature requests vary, they remain positive across all steps (columns
3 - 7). The duration variables for support requests and bug reports fluctuate around zero
for all iterations. This supports the finding that they are statistically insignificant, with
the exception of the mean treatment duration of support requests in column 5. More
interesting for our purposes are the coefficients for average treated requests per developer.
The coefficients for the average number of fixed feature requests remain negative in all
columns. The coefficients for the average number of fixed bug reports are positive across
all columns and the coefficients for r_SRfixed also remain relatively stable.
What can we take away from the discussion of the last two paragraphs and the
analysis in table 3.4? Assuming the potential direction of the omitted variable bias, we are
able to show that the signs of the parameters for contractual incompleteness do not change
as we leave out different variables in the regressions. This lends support to
our assertion that (i) dur_FRsdmth and dur_FRmeanmth capture part of the contractual
incompleteness, (ii) r_FRfixed can be interpreted as a measure of transactional frequency,
or more appropriately of subcontracting frequency, and (iii) r_BRfixed can be seen as a
performance measure for the development community.
As we include more control variables, the coefficients stay stable. In all, we find sup-
porting evidence for the hypothesis that less-restrictive licenses induce higher corporate
participation. The coefficients for the intended audiences suggest that expected corpo-
rate participation rates are higher for projects directed towards developers compared to
projects intended for end-users, but the estimate is insignificant at the 5% level. Lastly,
we find no significant effect of operating systems on corporate participation rates.
Figure 3.4 presents the marginal effects for the variables dur_FRsdmth and dur_FRmeanmth
Table 3.3: Regression on Corporate Participation
Table 3.4: Control Regressions on Corporate Participation
Figure 3.4: Marginal Effects 1
Figure 3.5: Marginal Effects 2
on the expected corporate participation rate for a project under an academic license. We
have fixed all other variables at either their mean values or, for statistically insignificant
coefficients, at zero. For dur_FRsdmth the effect reaches a maximum at 33 months,
after which the variability of time to treat feature requests has a positive, but diminishing,
impact on the expected corporate participation rate. The marginal effect of
dur_FRmeanmth reaches a maximum at 53 months, after which increases in the average
time to treat feature requests raise the expected participation rate to a lesser extent.
Figure 3.5 shows the marginal effect of the average number of fixed feature requests per
developer, r_FRfixed. It decreases continuously and approaches zero.
Endogeneity is an issue in our analysis. It occurs because corporate participation likely
affects the average number of treated feature requests per developer, the average duration
for each request and its variability. This causes a bias in the estimated coefficients.
The standard way to treat endogeneity is to instrument the variables. Unfortunately,
we are unable to appropriately instrument all variables. We therefore caution that the
coefficients are likely biased. The direction of the endogeneity bias depends on the impact
of corporate participation on the treatment of requests. If communities with many
corporate developers respond more slowly to submissions and hence increase mean
duration and variability, we will obtain an upward bias. If corporate developers
positively affect the treatment of submissions, we have a downward bias and obtain
estimates that are too negative compared to the population coefficients. Lamastra (2009)
explores the influence of corporate developers on the quality of the development of SF
projects. She shows that, when corporate developers enter SF communities, code development
increases and non-development activities decrease in terms of quality. This may indicate
that high corporate participation rates positively affect the treatment of feature requests
and bug reports and negatively influence that of support requests. We might observe
too small an effect for the treatment of feature requests and bug reports and too large an
effect on support requests. Considering Table 3.3, these findings confirm our results and
corroborate our interpretation.
3.5 Concluding Remarks
The question of how to control open source projects is of paramount importance for
corporate strategy in open source software. We attempt to shed light on some aspects
of this issue. With regard to contribution, we find evidence that firms opt for projects
with higher measures of contractual incompleteness. This indicates that firms establish
additional governance arrangements in the form of dedicated corporate contributors in
open source communities. This type of vertical integration mitigates part of the risk of
transacting with open source applications under an incomplete contract.
There are ample points of entry for future research on corporate contribution, control
and open source software. First, the analysis of corporate control of open source software
could be enhanced by adding firm characteristics and refining the measures for project
control. Second, the strategic behavior of firms within open source projects merits further
consideration. Do firms cooperate in common projects or do they drive other companies
out of projects? Third, the interaction between private and corporate developers raises
interesting questions on the development of individual incentives to contribute.
On a different note, we show that corporate open source software can be understood
with the tools provided by the theory of incomplete contracts and transaction costs. This
might open a way to investigate corporate provision of open source software in terms of
technology outsourcing and open innovation. Indeed, the growing involvement of firms
in open source software may make it possible for social scientists to shed light on these
concepts and gain more insights into the obstacles of external development and production
of technology and innovation.
Conclusion
My thesis addresses three aspects of corporate involvement in open source software: the
private provision of a public good, open innovation as well as technology outsourcing.
We broach the first issue by way of a literature survey that focuses on the commercial
viability of the provision of a public good. Accordingly, the first essay discusses the
research on corporate use and provision of open source software. In the second essay,
we turn to open innovation and look at the interaction between corporate and voluntary
contributors in the production of an open source application. In particular, we investigate
the potential productivity effects of this interaction as well as of knowledge spillovers
which are often evoked as a great boon of open innovation. The third essay considers
technology outsourcing in open source software and its governance issues. We contend that
corporate contribution in open source software can be understood in terms of incomplete
contracts and transaction cost.
What can we learn from open source software on the private provision of public goods?
Despite its public good properties, open source software is amenable to corporate objec-
tives. We present business models and ways of corporate involvement that are economi-
cally viable and engage volunteers at the same time. Moreover, we show what can happen
to firms that directly compete with collectively-provided public goods.
Our insights can be useful for the analysis of sectors in which companies struggle
with voluntarily-provided digital content, for instance newspapers, encyclopedias or tele-
vision channels. News articles, editorials or encyclopedic entries are pieces of information
that can be produced by a large number of individuals and redistributed easily at little
cost. Prominent examples are Wikipedia, the blog aggregator Global Voices, the social
networking site Twitter or the user-driven news site Reddit. These sites attract several
million viewers a day and thus pose a threat to traditional media outlets. Although our
examples are all cases with a global scale, our discussion works also, and perhaps even
better, on a smaller scale between local media and local bloggers for instance.
The discussion about proprietary and open source software competition delineates
several possible strategies for traditional publishing houses. We will mention only two
of them here. They could invest more to obtain high quality content, to hire better
journalists or to provide a more diverse set of news, therefore differentiating themselves
from the volunteer-driven alternatives. This will ultimately be futile as it leads to an
arms race between traditional media and online competitors that the traditional media
cannot win. Volunteers will always be able to modify - some might say imitate - and to
redistribute the news or editorials. Therefore the investment will not create a significant
competitive advantage and the publishing house spends money for nothing.
Another strategy is to embrace the user-driven production method. By allowing readers
and volunteers to contribute to the production of editorials, newspaper articles or
content in general, traditional media companies may be able to kill two birds with one
stone: they reduce production costs without decreasing the value to the consumer or the
advertiser. Outsourcing part of the provision of content to volunteers
allows traditional media houses to focus on other competitive advantages, for instance
providing editorials, signalling reliable news or improving the incentives to contribute.
An example of a successful implementation is Google Knol. It is a free encyclopedia that
pays contributors for submitting articles based on the number of views and sells online
advertising.
In a digital network, a key aspect is the enforcement and protection of intellectual
property. We have seen that in the case of open source software, the firm has several
possibilities to deal with this issue. Publishers and commercial content providers, on the
other hand, struggle to find similar means of protection.
Besides the provision of a public good, corporate involvement in open source software
allows us to explore open innovation and its productivity benefits. Much has been written
about the productivity gains in research made possible by including external actors.
Scholars often identify economies of scope and knowledge spillovers as the sources of
these productivity gains. We attempt to shed light on this using the case of corporate
involvement in open source software.
There is scarce empirical evidence about the productivity effect of rising corporate
participation in open source development. As the number of corporate contributors in-
side software projects rises, the need to understand their effect on open source production
becomes important. There are good reasons to believe that corporate participation may
not improve productivity. Firms do not necessarily share the same objectives as voluntary
contributors on the direction of the development process. Conflicts of interest
might hamper software production and may ultimately lead to the forking of an open
source project. Moreover, the work processes of corporate developers may not be readily
applicable to the open source production model. Large companies have long delays in
decision-making which might decrease a project’s productivity as programming tasks are
left undone in the meantime.
We find that there are indeed negative productivity effects due to the interaction of
corporate developers with voluntary contributors. These negative effects might be due to
the aforementioned friction and conflicts of interest. However, knowledge spillovers can
mitigate parts of the loss in productivity and the predominance of corporate or voluntary
contributors in a community might reduce the impact of this friction.
Can we generalize these findings to other cases of open innovation? Our results show
that open innovation is not a panacea to boost research and technology production per
se. To be successful, the firm needs to be aware of the caveats. Motivating the volunteers
and aligning the objectives of the community with those of the firm is crucial. Even
though this might seem obvious, cases of failed open source engagements show that firms
do not always heed this advice. Moreover, the firm needs to learn to let go. To reap the
benefits of its open source engagement, the firm might want to meet the development
community halfway by contributing to the development. In the course of the open source
engagement, the firm might re-evaluate its intellectual property.
In addition, corporate involvement in open source software might trigger a re-appraisal
of the competitive advantages of a firm. Companies might see that the competitive edge
is not the source code of a software, but the human component of software production.
Open source software may augment the value of human capital and commoditize the
actual piece of software. In this regard, open source software is a true process innovation
with profound implications for ICT labor markets.
Related to the issue of productivity is the inter-firm collaboration inside open source
communities. Even though it is widely acknowledged that firms collaborate in the de-
velopment of open source software, researchers have yet to provide insights into how
this collaboration works and what effect it has on software creation. In this sense, open
source projects are similar to R&D joint ventures. Important questions in this respect
are to what extent firms cooperate to establish common standards and how firms can
appropriate their investment in the software development.
In the last essay, we touch upon the issue of contracting research and technology.
The interaction between the firm and external suppliers of technology raises the question
of governance and transactional uncertainty. In a world with an unknown future and
limited resources, how can the firm ensure the fulfillment of its contract by the supplier?
We believe that corporate open source software is an ideal case in point for the analysis
of this question.
We argue that the firm establishes additional governance arrangements to ensure the
completion of the transaction with the open source community. These arrangements can
take different forms, for instance the acquisition of the ownership rights for the source
code, source code hijacking or corporate contributions. Dedicating a developer to work
on an open source project is a kind of partial vertical integration. The firm can scale the
integration, i.e. dedicate more developers, depending on the perceived uncertainty of the
transaction. Using this approach, we can provide an explanation that complements the
literature on corporate contributions to open source software.
Our analysis of incomplete contracts is useful for the evaluation of corporate partici-
pation in projects of crowd-sourced production. Crowd-sourcing is a concept that takes
user-driven innovation and modularity in production and applies them to sectors outside
the software industry. Its initiators hope to carry over the productivity benefits seen in
open source software. Crowd-sourcing is currently gaining momentum in diverse fields,
such as research, political activism or Do-It-Yourself tinkering. Prominent examples are
NASA’s Clickworkers, the UK Conservatives’ “Make IT Better” program and Bug Labs’
The Bug.
One advantage of including external developers in production is that the firm can
reduce its own production costs. Can the firm outsource enough to
the crowd to make the production worthwhile? This is, above all, a question of creating
sufficient incentives for external agents to contribute. These developers need to find some
personal benefit or altruistic meaning in the production of the good. That said, there is
a second part to this question: how does the firm design the user contract so that the
firm is not obliged to contribute too much?
The discussion of incomplete contracts and corporate contributions in open source
sheds light on the governance structure between the firm and the development community.
Contractual hazards oblige the firm to contribute to the production of an open source
application. For crowd-sourced production, this means that the firm faces a double-edged
sword. On the one hand, it can create a very restrictive contract to reduce contractual
hazards with fixed specifications, delivery deadlines or strict conditions of use. However,
this may scare away external contributors and reduce the productivity benefits of crowd-
sourcing. On the other hand, the firm can offer a contract similar to open source licenses,
in which it delineates only the rudimentary rules of collaboration. Although this type of
contract promotes contributions, it may oblige the firm to become heavily involved in
order to control production.
We are interested in the commercial viability of corporate crowd-sourced engagements
and the design of possible user contracts. How can the firm design a contract that entices
users to contribute, while creating revenue for the firm? We do not have an answer
to these questions, but we believe that our discussion of incomplete contracts and
corporate contributions is a first step toward understanding this fascinating problem.
Appendix A
Source Code of Email Tracer Application
Listing A.1: Python Source Code for JET Application (jet.py)
#########################################
##
## JET (Jan's Email Tracer)
## version 3
##
##
#########################################
__author__ = 'Jan Eilhard'
__version__ = 'Version 3.0'
__date__ = 'Date: 24/11/2007'
__copyright__ = 'Copyright (c) 2007 Jan Eilhard'
__license__ = 'Open Software License'
import auxfunctions
import sys
from sys import argv
import os


def getConfig():
    """ Ask questions on project name, configuration files &
    proxy settings
    """
    configuration = {}
    print "Welcome to the configuration of JET."
    print " What is the name of the project? ",
    try:
        configuration["project"] = sys.stdin.readline().strip()
        print "Enter the filename which contains your email addresses: ",
        filename = sys.stdin.readline().strip()
        # Keep the filename only if checkTxt accepts it, otherwise store None
        configuration["emails"] = (auxfunctions.checkTxt(str(filename))
                                   and os.path.join(filename) or None)
        print "Enter the names of the search categories (single words only): ",
        configuration["search"] = {}.fromkeys(
            sys.stdin.readline().strip().split(" "))
        # One file of search terms per category
        for cat in configuration["search"].keys():
            print "Please enter the filename containing search terms for %s: " % cat
            filename = sys.stdin.readline().strip()
            configuration["search"][cat] = (auxfunctions.checkTxt(str(filename))
                                            and os.path.join(filename) or None)
        print "Do you have special proxy settings (Y/N)? ",
        configuration["proxy"] = ((sys.stdin.readline().strip() == "Y")
                                  and raw_input("Please enter your proxy settings: ")
                                  or None)
        # Show the collected settings for confirmation
        print "\n".join(["%s = %s" % (k, v) for k, v in configuration.items()])
        print "Is this configuration correct (Y/N)? ",
        if sys.stdin.readline().strip() != "Y":
            # Restart the dialogue and hand back the new configuration
            return getConfig()
        auxfunctions.createDir(str(configuration["project"]))
        print "Do you want to start the Internet search right now (Y/N)? ",
        # Return the configuration to start now, otherwise save it for later
        return ((sys.stdin.readline().strip() == "Y") and configuration
                or auxfunctions.saveFile(configuration, "config"))
    except Exception:
        print "An error has occurred."
def startInterface(argv=None):
""" Dispatch either JET directly or run configuration