
Qize Le
School of Mechanical and Materials Engineering,
Washington State University,
Pullman, WA 99164

Zhenghui Sha
School of Mechanical Engineering,
Purdue University,
West Lafayette, IN 47907

Jitesh H. Panchal¹
Assistant Professor
School of Mechanical Engineering,
Purdue University,
West Lafayette, IN 47907
e-mail: [email protected]

A Generative Network Model for Product Evolution

Modeling the structure and evolution of products is important from the standpoint of improving quality and maintainability. With the increasing popularity of open-source processes for developing both software and physical systems, there is a need to develop computational models of product evolution in such dynamic product development scenarios. Existing studies on the evolution of products involve modeling products as networks, taking snapshots of the structure at different time steps, and comparing the structural characteristics. Such approaches are limited because they do not capture the underlying dynamics through which products evolve. In this paper, we take a step toward addressing this gap by presenting a generative network model for product evolution. The generative model is based on different mechanisms through which networks evolve: addition and removal of nodes, and addition and removal of links. The model links local network observations to global network structures. It is utilized for modeling and analyzing the evolution of a software product (Drupal) and a physical product (RepRap) developed by open-source processes. For the software product, the generated networks are compared with the actual product structures using various network measures including average degree, density, clustering coefficients, average shortest path, propagation cost, clustered cost, and degree distributions. For the physical product, the product evolution is analyzed in terms of the proposed mechanisms. The proposed model has three general applications: longitudinal studies of a product's evolution, cross-sectional studies of evolution of different products, and predictive analyses. [DOI: 10.1115/1.4025856]

Keywords: product structure, evolution, complex networks, degree-based models, modularity

1 Introduction

The structure of a product is an indicator of its complexity and hence impacts its quality and maintainability. Studies have shown that the product structure also has an impact on the organizational structure [1,2]. Due to its importance, the analysis of product structure has received significant attention within the engineering design literature. One of the prominent ways of modeling product structure is through dependency modeling techniques, where a product is modeled as a network of components linked through dependency relationships. The network can be represented in matrix form using the design structure matrix [3,4]. The matrix can be analyzed to answer a number of questions such as:

(a) How modular or complex is the product?
(b) How does modularity affect system performance?
(c) Is this product more modular than other products?
(d) How does the product structure affect complexity?
(e) How does a change in a particular module propagate to other modules within the product?

Answering these questions can lead to knowledge for better designs and more efficient product development processes. The recent special issue on dependency modeling techniques in the Journal of Engineering Design [5] highlights some of the latest developments in this area. Existing efforts on analyzing product evolution are primarily focused on hierarchical product development, where product architecture is determined early in the design process and the product structures do not change significantly during the process. In contrast, emerging mass-collaborative product development processes [6,7] such as open-source product development involve significant changes in the product structure as new requirements are continuously proposed, new modules are created, and new interfaces are designed. Exponential growth in the size of products has been observed in some cases [8]. In such dynamic scenarios, the analysis of the structural changes in the products with time provides important information about the rate of evolution and effectiveness of the product development process.

Our goal is to develop dynamic network-based models for products that undergo such drastic evolution during their development. The work is motivated by (a) the uniqueness of the emerging product development processes, and (b) the unprecedented access to product evolution data. Open-source processes represent a fundamentally different way in which participants organize and coordinate activities to develop products, as compared to traditional top-down processes [9]. While open-source software development has been well known for the past two decades, the approach is increasingly being adopted for hardware development also [10]. Examples of open hardware products include RepRap [11] and Arduino [12]. Due to the increasing interest in using open-source principles for physical products, there is a need to understand how open-source products evolve. Additionally, open-source processes are carried out on the Internet. The product structure and its evolution are well documented and openly available, making them ideal for analysis. There exist well-developed platforms such as SourceForge [13] and GitHub for open-source software development, which enable detailed modeling of product evolution. As open-source hardware matures, it is expected that similar platforms will capture information for physical products also.

Existing approaches for studying product evolution involve taking snapshots of the structure at different time steps and comparing the structural characteristics. These structural characteristics range from the extent of coupling and cohesion to various metrics for modularity and complexity [2]. Such approaches are suitable for comparing the global characteristics of the product networks but are not effective for capturing the underlying dynamics through which products evolve. For example, comparisons of the snapshots do not provide information about how the local (module-level) observations result in the evolution of the product networks. Further, comparison of structural measures of snapshots only provides information about the specific versions of the products being analyzed. It does not provide the capabilities to perform predictive or "what-if" analyses to understand the impacts of future design modifications. To facilitate such analyses, models that capture the evolutionary dynamics of networks based on local observations on addition and deletion of nodes and links are required. These models are bottom-up in nature and are referred to as generative models. Our review of the literature reveals that there is a lack of generative models of open-source product development (see Sec. 2.1). Generative models that embody the underlying mechanisms of network growth are important because they can help in understanding the reasons for increasing complexity of products, and in identifying specific ways to maintain it, to increase modularity, and to reduce product complexity.

¹Corresponding author.
Contributed by the Design Engineering Division of ASME for publication in the JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING. Manuscript received October 9, 2012; final manuscript received October 23, 2013; published online January 10, 2014. Editor: Bahram Ravani.

Journal of Computing and Information Science in Engineering, MARCH 2014, Vol. 14 / 011003-1. Copyright © 2014 by ASME

Downloaded From: http://computingengineering.asmedigitalcollection.asme.org/ on 01/12/2014 Terms of Use: http://asme.org/terms

The goal in this paper is to present a generative model for the evolution of products. The model is inspired by existing models in the network science literature, reviewed in Sec. 2.2. The model embodies two categories of mechanisms through which networks evolve: (1) addition or deletion of nodes and (2) addition or deletion of links. These categories are divided into six mechanisms that describe how nodes are added (or removed) and linked with each other. These mechanisms provide information about the following questions: How many new nodes are added at a certain time step? How many existing nodes are removed? For given existing nodes, what are the probabilities of creation of links with new nodes, and with other existing nodes? What are the probabilities of removal of existing nodes? For new nodes added, what are the probabilities of linking with existing nodes and other new nodes? The evolution of the product networks is modeled using these mechanisms. Depending on the level at which the product is analyzed, the nodes can refer to different aspects of a product (e.g., modules, files, functions, or classes in a software product). The links refer to dependencies between nodes (e.g., class dependencies and function calls).

We apply these mechanisms to model the evolution of an open-source software product, Drupal [14]. The results indicate that the model generates dynamic networks whose evolutionary characteristics are close to those of the original product structure. To illustrate the generality of the approach, we also apply the model to RepRap, an open-source hardware product. The results indicate that such a bottom-up approach can be utilized for modeling evolutionary product structures in open-source processes. The paper is organized as follows. In Sec. 2, we discuss the existing literature on product evolution in the open-source domain and generative models of networks. A discussion of the mechanisms of network evolution and their application to Drupal is presented in Sec. 3. The network-level properties of the Drupal product structure are evaluated in Sec. 4. The proposed generative model and its application to Drupal are presented in Sec. 5. Application to RepRap is presented in Sec. 6. Finally, closing comments are presented in Sec. 7.

2 Review of Relevant Literature

The relevant literature includes two aspects: analysis of product evolution in the open-source domain (discussed in Sec. 2.1), and models of network generation (discussed in Sec. 2.2).

2.1 Existing Studies on the Product Structure in Open-Source Domain. Since open-source hardware is still in its infancy, existing literature in the open-source domain is mainly focused on software development. Crowston and co-authors [15] recently published a survey article highlighting the diverse research efforts on the open-source domain. In the survey, they categorize the literature into inputs (member characteristics, project characteristics, and technology use), processes (software development practices, social practices, and firm involvement practices), emergent states (social states and task related states), and outputs (software implementation, team performance, and evolution). This paper fits into their outputs (evolution) category. Some of the earliest models of software evolution were in the form of differential equations [16] built using general principles such as Lehman's laws [17].

The software structure is modeled by the technical dependencies, which can be identified using two approaches [18]. The first approach involves extracting relational information between entities such as statements, functions, files, or modules. The relationships between the entities can be data-related dependencies [18], functional dependencies [19,20], or syntactic dependencies [21]. The second approach involves identifying dependencies by examining how modification requests affect the source code [22]. Examples include analysis of code decay based on modification requests [23], and analysis of modifications involving files that tend to change together [24].

Based on the goals, the literature can be categorized into three groups. The first group of studies focuses on the evolution of software architecture in a single product. The studies in this group are primarily focused on the modularity and complexity of software architecture, such as determining how the size increases [25], modularity changes [26], and complexity increases [27] with software evolution. MacCormack et al. [28] and LaMantia et al. [29] analyze the impact of software modularity on evolutionary characteristics. Different metrics are proposed to quantify modularity and structural complexity of software. These metrics include coupling and cohesion [30], and propagation cost and clustered cost [2]. Modularity is an important characteristic because it affects evolvability, changeability, maintainability [31], and customizability. Modularity is measured in terms of complex network metrics such as path length and clustering coefficient [32].

The second group of studies is focused on comparing the structures of different open-source products to identify differences and common patterns across different projects. Mockus et al. [33] compare the development process of Apache and Mozilla. MacCormack et al. [2] analyze the structures of Linux and Mozilla and compare them using propagation cost and clustered cost metrics. The authors show that the modularity of Mozilla was initially less than that of Linux, but increased after the redesign efforts. Valverde and Sole [34] analyze 80 OSS projects to determine commonalities. The authors discover that the product structures had hierarchical small-world and scale-free characteristics. Further, the clustering coefficients (C) of the projects are significantly larger than their random counterparts. Valverde and Sole [34] mainly focus on discovering topologies and characteristics of product structures based on complex network analysis.

The third group of studies focuses on comparing the structures of open-source and proprietary software, where the main goal is to identify the commonalities and differences between software products developed using fundamentally different techniques and organizational (community) settings. Raymond [35] and O'Reilly [36] claim that open-source software is more "modular" than proprietary software. On the other hand, Torvalds [37] suggests that modularity is a required property for the success of open-source software (OSS) development. MacCormack et al. [2] compare open-source software products with proprietary software.

In summary, various case studies have been performed to analyze the evolution of software products. Different versions of software are compared using metrics from complex network analysis and metrics for modularity and complexity. Differential equations have been developed to model the growth of commercial software products. However, such general equations only account for evolution in terms of size. Since dependencies between modules are not taken into account, the models do not capture the evolution of structural complexity (which depends on how the entities are linked). Currently, there is a lack of network-based generative models that capture the evolution of open-source products and the corresponding changes in modularity and complexity. Such network generation models should capture the growth not only in terms of size but also in terms of structural complexity. To address this gap, we present a network generation model based on local network observations. Such models have been used in the complex network literature to model networks with different topologies. To develop an appropriate model, a literature review of existing network generation models is performed in Sec. 2.2.

2.2 Existing Studies on the Network Generation Models. Network-generation models can be classified into two types: structure-based models and evolution-based models. Structure-based models generate networks based on the underlying structural characteristics, while evolution-based models generate networks based on assumed evolutionary dynamics.

2.2.1 Structure-Based Models. The exponential random graph model, also referred to as the P* model, is a widely used structure-based model [38]. In P* models, it is assumed that the network is generated by some statistical process and the observed network is one realization from a set of possible networks with similar characteristics (e.g., number of actors). The probability of realizing a specific network is given by [38]

$$\Pr(Y = y) = \frac{1}{\kappa} \exp\left\{ \sum_{A} \eta_A \, g_A(y) \right\} \qquad (1)$$

where $\Pr(Y = y)$ represents the probability that a network $y$ emerges, $\kappa$ is a normalizing parameter which ensures that the probabilities form a proper distribution, $A$ ranges over the set of substructure configurations, $g_A(y)$ is the network statistic corresponding to configuration $A$, and $\eta_A$ is the parameter associated with that configuration. Based on the observed network, the parameters $\eta_A$ are calculated using methods such as pseudo-likelihood estimation [39] and Markov chain Monte Carlo (MCMC) maximum likelihood estimation [40]. The P* model is built by formulating assumptions about substructure configurations and then validating these assumptions using the resulting parameters $\eta_A$. Simple substructure configurations include reciprocity, two-star, three-star, and triangle.
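To make Eq. (1) concrete, the following is a minimal sketch that enumerates every undirected graph on three nodes and normalizes the exponential weights. The choice of statistics (edge count and triangle count), the parameter values, and all function names are illustrative assumptions and are not taken from the paper; brute-force enumeration is only feasible for very small networks, which is why estimation methods such as MCMC are used in practice.

```python
from itertools import combinations, product
from math import exp


def triangle_count(edges, nodes):
    """Number of node triples whose three connecting edges are all present."""
    present = set(edges)
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if {(a, b), (a, c), (b, c)} <= present
    )


def ergm_probabilities(nodes, eta_edge, eta_triangle):
    """Evaluate Eq. (1) for every undirected graph on `nodes`.

    Assumed statistics g_A(y): number of edges and number of triangles.
    Returns a dict mapping each graph (a tuple of edges) to its probability.
    """
    dyads = list(combinations(nodes, 2))  # all possible undirected edges
    weights = {}
    for mask in product([0, 1], repeat=len(dyads)):
        edges = tuple(d for d, keep in zip(dyads, mask) if keep)
        score = eta_edge * len(edges) + eta_triangle * triangle_count(edges, nodes)
        weights[edges] = exp(score)
    kappa = sum(weights.values())  # normalizing constant
    return {graph: w / kappa for graph, w in weights.items()}


# Hypothetical parameter values: edges are penalized, triangles are rewarded.
probs = ergm_probabilities([0, 1, 2], eta_edge=-1.0, eta_triangle=2.0)
for graph, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(graph, round(p, 4))
```

With both parameters set to zero, every graph receives the same probability, which is a quick sanity check on the normalization.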

2.2.2 Evolution-Based Models. In evolution-based models, an initial network is chosen to represent the early stage of a real network. New nodes and links are gradually added (and removed) to simulate the growth of the network [41]. The process of linking of new nodes can be driven by different local properties of connecting nodes. The most popular local property used for linking is a node's degree. The degree of a node is the number of other nodes connected to it. An example of evolution-based models using a degree-based linking mechanism is the Barabasi-Albert model of scale-free graphs [42]. In this model, the assumption is that new nodes entering the network attach to existing nodes with a probability proportional to their degree. Hence, the existing nodes with more connections have a greater probability of linking to new nodes. This is also referred to as preferential attachment. A number of researchers have proposed variations to the Barabasi-Albert model to account for different characteristics of real-world networks. For example, linear preferential attachment with initial attractiveness [43] is proposed to simulate real networks with power-law distributions with arbitrary exponents. Nonlinear preferential attachment models have also been proposed [44], where the probability of attachment depends on $k^{\alpha}$, where $k$ is the degree of the node and $\alpha$ is an arbitrary parameter.
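The attachment rule described above can be sketched in a few lines. This is an illustrative growth loop, not the model developed in this paper: the seed network (a small clique), the number of links per new node, and the function name `grow_network` are assumptions. Setting `alpha=1.0` gives linear preferential attachment; other values give the nonlinear variant in which the attachment weight is $k^{\alpha}$.

```python
import random


def grow_network(n_nodes, links_per_node=2, alpha=1.0, seed=0):
    """Grow an undirected network by degree-based attachment.

    Each new node links to `links_per_node` distinct existing nodes, chosen
    with probability proportional to degree ** alpha.
    Returns (edges, degree), where degree maps node -> degree.
    """
    rng = random.Random(seed)
    m = links_per_node
    # Assumed seed network: a clique on m + 1 nodes, so every degree is positive.
    degree = {i: m for i in range(m + 1)}
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}

    for new in range(m + 1, n_nodes):
        candidates = list(degree)
        weights = [degree[v] ** alpha for v in candidates]
        targets = set()
        while len(targets) < m:  # resample until m distinct targets are found
            targets.add(rng.choices(candidates, weights=weights)[0])
        degree[new] = 0
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
    return edges, degree


edges, degree = grow_network(50, links_per_node=2, alpha=1.0, seed=1)
print("nodes:", len(degree), "links:", len(edges), "max degree:", max(degree.values()))
```

Because early nodes accumulate links over more steps and attract new links in proportion to their degree, a few high-degree hubs typically emerge in such runs.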

In this paper, we choose to model the evolution of product structures using degree-based evolution models rather than structural models for two reasons. First, the evolution-based models relate the local dynamic behaviors to the global network structures, whereas structural models capture the relationships between local structures and global structures. Hence, evolution-based models are a natural choice. Second, some studies suggest that degree-based evolutionary models can be more effective in modeling the global network structures than the structure-based models [45]. Hence, we focus on the degree-based models in the remainder of this paper. The node-level mechanisms based on the degrees of nodes are discussed in Sec. 3.

3 Node-Level Mechanisms for Product Evolution

The premise in the proposed model is that six network evolution mechanisms at the node level determine the evolution of product structures. Additionally, it is assumed that the mechanisms are only dependent on the degrees of nodes. The nodes of a software product network can refer to functions, classes, files, or features. The nodes in a physical product network can refer to parts, parameters within parts, or sub-assemblies. The function-call level of abstraction is considered in this paper for software products, and the part level of abstraction is considered for physical products. We discuss the mechanisms in Sec. 3.1 and illustrate them through application to the Drupal product network in Sec. 3.2.

3.1 Mechanisms for Modeling the Evolution of Open-Source Products. The six mechanisms through which networks evolve are illustrated in Fig. 1, and discussed next.

(a) Addition of new nodes: The primary growth mechanism of a network is the addition of new nodes. This mechanism corresponds to the addition of new modules, functions, classes, parts, etc., to address new requirements, specifications, and features. The trends in the number of additional nodes can be determined by comparing consecutive versions of the product structure.

(b) Removal of existing nodes: Existing nodes may be removed from a product network because existing features may no longer be needed or are replaced by new features. The number of existing nodes removed from a product can also be determined by comparing consecutive versions.

(c) Linking of new nodes with existing nodes: After new nodes are added, these nodes can be linked to existing nodes by new interfaces (function calls in the case of call graphs). In our degree-based model, we assume that the probability that a new node links to an existing node is a function of the degree of the existing node: $P(A_{e,n}) = F_1(K_e)$, where $A_{e,n}$ represents the attachment between an existing node and a new node, and $K_e$ represents the degree of the existing node. To determine the relationship, we compare two consecutive versions of the product structure network and calculate the average number of interfaces created between existing nodes with a given degree and new nodes.

(d) Linking of new nodes with each other: Since new nodes do not have any initial links, we assume that the new nodes first link with existing nodes and then link with new nodes. After the new nodes link with the existing ones, the degree of a new node is referred to as the "initial degree". This initial degree is used to determine the probability of creation of links between two new nodes. The probability of a new node being linked with other new nodes is modeled as $P(A_{n1,n2}) = F_2(K_{n,i})$, where $A_{n1,n2}$ represents the attachment between two new nodes (n1 and n2), and $K_{n,i}$ represents the "initial degree" of the new nodes.

(e) Linking of existing nodes with each other: New links can also be added between existing nodes. This corresponds to the addition of a new function call between two existing functions. The probability that an existing node is attached to other existing nodes is a function of its degree: $P(A_{e1,e2}) = F_3(K_e)$, where $A_{e1,e2}$ represents the attachment between existing nodes, and $K_e$ represents the degree of the existing nodes.

(f) Removal of existing links: Existing links can be removed in new product versions for two reasons. First, existing nodes may be removed. In this case, the existing links associated with these nodes are also removed. Second, the links between two existing nodes may no longer be used. Hence, the existing links are removed. The probability of removal of existing links between existing nodes is calculated by comparing consecutive versions of the code. The probability function can be represented as $P(R_{e1,e2}) = F_4(K_e)$, where $R_{e1,e2}$ represents the removal of links between existing nodes, and $K_e$ represents the degrees of existing nodes.

Fig. 1 Six mechanisms for the evolution of product networks

The existing degree-based models discussed in Sec. 2.2 present specific functional forms for the probability functions. For example, the Barabasi-Albert model [46] assigns a linear probability function for linking between new nodes and existing nodes. However, we do not pre-assign any functional form for the probability functions $F_1$ to $F_4$. These functions are determined and estimated based on the observed data.
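As an illustration of estimating such a function from data, the sketch below computes, for mechanism (c), the average number of links that existing nodes of a given degree receive from new nodes between two consecutive versions, and then fits a straight line in log-log space to obtain an exponent. The edge-list representation, the restriction to nodes that survive into the new version, the least-squares fit, and all names are assumptions made for this example; the quantity computed is an average link count, not a normalized probability.

```python
from collections import defaultdict
from math import log


def nodes_of(edges):
    """All nodes that appear in at least one edge."""
    return {v for edge in edges for v in edge}


def degrees(edges):
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def avg_new_links_by_degree(old_edges, new_edges):
    """Average number of links from new nodes, grouped by old-version degree."""
    old_nodes = nodes_of(old_edges)
    new_nodes = nodes_of(new_edges)
    added = new_nodes - old_nodes
    survivors = old_nodes & new_nodes  # assumption: ignore removed nodes
    old_deg = degrees(old_edges)

    links_to_new = defaultdict(int)
    for u, v in new_edges:
        if u in survivors and v in added:
            links_to_new[u] += 1
        elif v in survivors and u in added:
            links_to_new[v] += 1

    totals, counts = defaultdict(int), defaultdict(int)
    for node in survivors:
        k = old_deg[node]
        totals[k] += links_to_new[node]
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in counts}


def loglog_slope(points):
    """Least-squares slope of log(y) against log(k), skipping zero values."""
    pairs = [(log(k), log(y)) for k, y in points.items() if k > 0 and y > 0]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    den = sum((x - mean_x) ** 2 for x, _ in pairs)
    return num / den


# Toy data (hypothetical): node "a" has degree 3, nodes "b", "c", "d" have degree 1.
old_edges = {("a", "b"), ("a", "c"), ("a", "d")}
new_edges = old_edges | {("a", "x"), ("a", "y"), ("a", "z"), ("b", "x")}
averages = avg_new_links_by_degree(old_edges, new_edges)
slope = loglog_slope(averages)
print(averages, slope)
```

An exponent close to 1 in such a fit would be consistent with linear preferential attachment, while the analogous computation for the other mechanisms only changes which pairs of nodes are counted.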

3.2 Utilizing the Node-Level Mechanisms for Drupal Product Network. Drupal [14] is an open-source content-management system, which is used for the creation of community-based websites. In this paper, we analyze major versions of Drupal core (2.0 through 5.0). Drupal is well developed, with over 7000 community-contributed add-ons, known as contrib modules. The project also attracts more than 1000 developers. Drupal is selected because of its maturity and the availability of code for different versions. As mentioned earlier, function-level dependencies in the call graph are used to model the product structure. Function calls represent one of the ways of modeling the structure of software. Call graphs have been widely used to model software architecture [18,47,48]. If function B is called within function A, then an interface is created between functions A and B.

In the first step, raw data about the product structure are extracted from major versions of the source code. The raw data consist of all the functions in the source code and the corresponding function calls. The data are used to derive the relationships among the functions. The second step is to model the product structure as a complex network in which functions are nodes and function calls are links. A documentation generator tool, Doxygen [49], is used to create the call graph. Having generated the networks for different versions of the code, consecutive versions of the network are compared to extract quantitative information about the node-level mechanisms.
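The comparison of consecutive versions can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the function name `count_mechanisms`, the representation of a version as a set of function names plus a set of (caller, callee) pairs, and the toy data are all assumptions introduced here.

```python
def count_mechanisms(old_nodes, old_edges, new_nodes, new_edges):
    """Count the six node-level mechanisms between two versions.

    Nodes are hashable ids; edges are (caller, callee) tuples.
    Hypothetical helper, not taken from the paper.
    """
    old_nodes, new_nodes = set(old_nodes), set(new_nodes)
    old_edges, new_edges = set(old_edges), set(new_edges)

    added = new_nodes - old_nodes        # (a) nodes added
    removed = old_nodes - new_nodes      # (b) nodes removed
    created = new_edges - old_edges      # links that appear in the new version

    # A created link is classified by how many of its endpoints are new nodes.
    new_existing = sum(1 for u, v in created if (u in added) != (v in added))
    among_new = sum(1 for u, v in created if u in added and v in added)
    among_existing = sum(1 for u, v in created
                         if u not in added and v not in added)

    return {
        "a_nodes_added": len(added),
        "b_nodes_removed": len(removed),
        "c_links_new_existing": new_existing,
        "d_links_among_new": among_new,
        "e_links_among_existing": among_existing,
        # (f) counts every link that disappears, whether or not an endpoint was removed
        "f_links_removed": len(old_edges - new_edges),
    }


# Toy example (made-up function names, not Drupal data).
v_old_nodes = {"f", "g", "h"}
v_old_edges = {("f", "g"), ("g", "h")}
v_new_nodes = {"f", "g", "x", "y"}
v_new_edges = {("f", "g"), ("x", "f"), ("x", "y"), ("g", "f")}
counts = count_mechanisms(v_old_nodes, v_old_edges, v_new_nodes, v_new_edges)
```

Treating links as directed pairs is a design choice of this sketch; an undirected analysis would normalize each pair (for example with `frozenset`) before comparing.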

Mechanisms (a) and (b): The data corresponding to mechanisms (a) and (b) for the different evolutionary steps in the Drupal network are displayed in Table 1. On comparing the number of new nodes added, the number of existing nodes removed, and the total number of nodes, it is observed that about half of the existing nodes are removed. The number of newly added nodes is close to the total number of nodes in the previous version. This demonstrates significant evolution of the product: new features are added, outdated features are removed, features that are useful but not efficient are replaced, and bugs are found and corrected. LaMantia et al. [29] propose the change ratio, which measures the number of new classes added and existing classes removed relative to the total number of classes, and they also detect a high change ratio for the Tomcat-main product.

Table 1 Data associated with node-level mechanisms for Drupal network (V2 → V5)

Mechanism                                   V2 → V3   V3 → V4   V4 → V5
(a) Nodes added                                 294       356       780
(b) Nodes removed                               154       139       439
(c) Links between new and existing nodes        558       987      1451
(d) Links among new nodes                       530       350      1286
(e) Links among existing nodes                   51       140       151
(f) Links removed                               492       744      1790

Fig. 2 Probabilities of creation of links between new and existing nodes (mechanism (c))

Mechanism (c): Mechanism (c) is the probability that existing nodes are linked to new nodes. The probabilities with which new nodes link with existing nodes are plotted against the degrees of existing nodes on a log–log scale in Fig. 2. The probability functions are determined by fitting linear functions in the log–log plots. As shown in the figure, the exponents in the probability functions are 1.0705, 0.9270, and 0.9828 for versions V2 → V3, V3 → V4, and V4 → V5, respectively. The closeness of these exponents to 1 is an indication of preferential attachment in the evolution of the Drupal network. Preferential attachment is a mechanism that has been shown to result in scale-free characteristics in a variety of complex networks. Valverde and Sole [34] observed the scale-free nature of software architectures in a large number of open-source software products.
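Fitting a linear function on a log–log plot, as described for mechanism (c), amounts to estimating the exponent of a power law $p(k) = c\,k^{\gamma}$ by least squares on $\log p$ versus $\log k$. The following is a minimal sketch under that reading; the function name `fit_power_law` and the synthetic data are assumptions made here, and the paper does not state which fitting routine was actually used.

```python
import math


def fit_power_law(degrees, probabilities):
    """Least-squares fit of log10(p) = log10(c) + gamma * log10(k).

    Returns (gamma, c). Points with k <= 0 or p <= 0 are skipped because
    they cannot be placed on a log-log plot.
    """
    pts = [(math.log10(k), math.log10(p))
           for k, p in zip(degrees, probabilities) if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    gamma = sxy / sxx                       # slope of the fitted line
    c = 10 ** (mean_y - gamma * mean_x)     # intercept, converted back from log10
    return gamma, c


# Synthetic data that follow an exactly linear attachment rule, p = 0.002 * k.
example_degrees = [1, 2, 4, 8, 16]
example_probs = [0.002 * k for k in example_degrees]
gamma, c = fit_power_law(example_degrees, example_probs)
```

An exponent close to 1 recovered from real data would correspond to the near-linear preferential attachment reported above.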

Mechanism (d): The probabilities with which new nodes are attached to each other are plotted in Fig. 3. As mentioned above, the degrees plotted in the figure are the initial degrees obtained after linking the new nodes to existing nodes. From the probability plots, it is observed that new nodes with a high initial degree have a higher probability of linking to another new node than those with a low initial degree. We observe that exponential functions provide a good approximation of the relationship between the initial degrees and the probabilities. The parameters of the exponential functions are shown in Fig. 3.

Mechanism (e): The probabilities of creation of links between two existing nodes are shown in Fig. 4. The existing degrees of nodes are plotted on the x-axis. Exponential functions are fit to the data. The number of interfaces created between two existing nodes is listed in Table 1. The number of links created among existing nodes increases significantly from the V2 → V3 step (51) to the V3 → V4 step (140), and only slightly from the V3 → V4 step to the V4 → V5 step (151).

Mechanism (f): The probabilities of removal of existing links between nodes, as functions of the degrees of the nodes, are shown in Fig. 5. It is observed that linear trends in these log–log plots (indicating a power-law relationship) can be used to describe the probability functions.

4 Network-Level Analysis of the Evolution of Drupal

4.1 Network Measures. After modeling the product structure as a complex network, the evolutionary characteristics of the corresponding network are explored. Complex network analysis metrics [50] are employed to quantify the evolutionary characteristics of complex product structures. The metrics used in this paper are average degree, degree distribution, density, clustering coefficient, average shortest path, propagation cost, and clustered cost. The first five metrics are extensively used by the network science community to characterize the topologies of complex networks. Propagation cost and clustered cost are used by the OSS community to characterize the complexity of software products.

Average degree is the average number of nearest neighbors of vertices [43]. It is chosen because it represents the average number of other nodes connected to a node, and indicates the complexity of the product [51]. According to existing studies [52,53], the design complexity of a product, which indicates the redesign work in the product development processes, can be quantified as $D(A) = \frac{1}{n}\sum_{i=1}^{n} Z_i$, where $D(A)$ is the design complexity, $n$ is the number of components, and $Z_i$ is the number of connections (degree) of component $i$. Design complexity is related to the average degree in the complex network representation. Manufacturing complexity [53] is also related to the average degree and can be expressed as $M(A) = n + \frac{1}{n}\sum_{i=1}^{n} Z_i$.
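As a small worked illustration of the two expressions above, the sketch below computes $D(A)$ and $M(A)$ from an undirected link list. The helper names and the four-component toy assembly are assumptions introduced here for illustration only.

```python
def component_degrees(n, edges):
    """Degree Z_i of each component i in 0..n-1 for an undirected link list."""
    z = [0] * n
    for i, j in edges:
        z[i] += 1
        z[j] += 1
    return z


def design_complexity(z):
    """D(A) = (1/n) * sum(Z_i), i.e. the average degree."""
    return sum(z) / len(z)


def manufacturing_complexity(z):
    """M(A) = n + (1/n) * sum(Z_i)."""
    return len(z) + design_complexity(z)


# Toy assembly with four components and four connections (made-up data).
toy_links = [(0, 1), (0, 2), (1, 2), (2, 3)]
z = component_degrees(4, toy_links)        # degrees [2, 2, 3, 1]
d_a = design_complexity(z)                 # 8 / 4
m_a = manufacturing_complexity(z)          # 4 + 8 / 4
```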

Fig. 3 Probabilities of new nodes linking with each other (mechanism (d))

Fig. 4 Probabilities of existing nodes linking with each other (mechanism (e))

Fig. 5 Probabilities of removal of existing links (mechanism (f))


The degree distribution, P(k), is defined as the fraction of nodes in the network with degree k [54]. The degree distribution is important because it indicates the topology of a product structure network.
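The definition of P(k) translates directly into code. The sketch below is one possible reading for an undirected link list; `degree_distribution` is a name chosen here, and nodes that appear in no link are not counted, which is an assumption of the sketch.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected link list.

    Only nodes that appear in at least one link are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}


# Star with one hub (node 0) and four leaves: four nodes of degree 1, one of degree 4.
p_k = degree_distribution([(0, 1), (0, 2), (0, 3), (0, 4)])
```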

Density of a network is the average proportion of links incident with nodes in a network [50]. The density of a complex network can be expressed as the ratio of the number of links in a network to the maximum possible number of links. Density is an alternative way to represent the complexity of a system. According to Marczyk and Deshpande [55], higher network density implies that higher complexity can be reached.
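The ratio described above can be written as a one-line function. The sketch assumes a simple graph without self-loops; the `directed` flag is a hypothetical parameter added here because the text does not say whether the maximum number of links is counted for ordered or unordered node pairs.

```python
def density(n_nodes, n_links, directed=False):
    """Ratio of existing links to the maximum possible number of links."""
    max_links = n_nodes * (n_nodes - 1)    # ordered pairs of distinct nodes
    if not directed:
        max_links //= 2                    # unordered pairs
    return n_links / max_links


undirected_density = density(4, 3)                   # 3 of 6 possible links
directed_density = density(4, 3, directed=True)      # 3 of 12 possible links
```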

Clustering coefficient is the probability that two nearest neighbors of a vertex are also the nearest neighbors of one another [54]. Clustering coefficient indicates possible “cliques” with high connections inside and low connections outside. Prior research highlights that a high clustering coefficient is observed in various open-source software projects when compared to their random network counterparts [34]. A high clustering coefficient means the emergence of cliques in the product structure network. The emergence of cliques reduces rework in the development processes because the interactions among cliques are lower. The reduction of rework enables the system to be decoupled into sub-systems for development.
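One common way to turn this definition into a number is the average local clustering coefficient, sketched below. Whether the paper uses this average or a global (triangle-based) variant is not stated, so the choice, like the name `average_clustering`, is an assumption of this example.

```python
from itertools import combinations


def average_clustering(adj):
    """Average local clustering coefficient.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Nodes with fewer than two neighbours contribute 0.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the neighbour pairs that are themselves linked.
        linked_pairs = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * linked_pairs / (k * (k - 1))
    return total / len(adj)


# Triangle a-b-c with one pendant node d attached to c.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = average_clustering(toy_adj)   # (1 + 1 + 1/3 + 0) / 4
```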

Average shortest path is the average of the shortest path lengths that link two vertices in a network [54]. In the product structure network, average shortest path indicates the efficiency of information exchange between two arbitrary nodes. The average shortest path is related to change complexity [53], which describes the likelihood of a change propagating between two components in a product. The value of change complexity is inversely related to the average shortest path [56]. The change complexity can be expressed as

$$C(A) = \frac{n(n-1)}{\sum_{i=1}^{n} \sum_{j=i+1}^{n} d_{ij}}, \tag{2}$$

where $d_{ij}$ is the shortest path between nodes $i$ and $j$.

Propagation cost, proposed by MacCormack et al. [2], is a measure of the degree of coupling in a complex system. The metric quantifies the average percentage of other nodes directly or indirectly affected by a change to a node within a network. The metric is based on the concept of visibility of a node in a network, which is the number of other nodes it is directly or indirectly (i.e., through intermediate nodes) connected to. It is calculated as the average “fan-out visibility” of nodes [2].
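The visibility idea behind propagation cost can be sketched with a breadth-first search over the directed dependency graph. The normalization used below (reachable other nodes divided by n − 1, averaged over all nodes) follows the "average percentage of other nodes" wording above and is an assumption; the original definition in Ref. [2] may normalize differently, for example by counting a node as visible to itself.

```python
from collections import deque


def reachable(adj, start):
    """Set of nodes reachable from start via directed links, excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    seen.discard(start)
    return seen


def propagation_cost(adj):
    """Average fan-out visibility as a fraction of the other n - 1 nodes."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    n = len(nodes)
    total_visibility = sum(len(reachable(adj, u)) for u in nodes)
    return total_visibility / (n * (n - 1))


# Toy call graph: a -> b -> c and d -> c (made-up data).
toy_calls = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
pc = propagation_cost(toy_calls)   # visibilities 2, 1, 0, 1 over 4 * 3 pairs
```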

Clustered cost [2] is another measure of degree of coupling. In contrast to propagation cost, where each dependency is assumed to incur the same cost, the assumption in clustered cost is that the dependencies within a cluster incur a lower cost than the dependencies across clusters. In order to calculate clustered cost, the network is first clustered, and then weights are assigned to the dependencies depending on the location of the nodes within different clusters. In this paper, we use the Girvan-Newman [57] clustering algorithm to assign nodes into clusters. MacCormack et al. [2] identify a set of nodes, called the vertical bus, consisting of nodes connected to a large number of other nodes. If a given node $i$ is connected to a node $j$ in the vertical bus, the dependency cost is a binary variable $d_{ij}$. If two nodes $i$ and $j$ are within a cluster, the dependency cost is given by $d_{ij} \times n^{k}$, where $n$ is the size of the cluster and $k$ is a user-defined parameter (set to 2 in Ref. [2]). For links between nodes across different clusters, the dependency cost is $d_{ij} \times N^{k}$, where $N$ is the size of the complete network.
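The three cost rules can be combined into a short function once a cluster assignment is available. The sketch below reads the weights as cluster size (or network size) raised to the power k and takes the clustering as given rather than running Girvan-Newman; the function name, the argument layout, and the toy data are assumptions made here.

```python
from collections import Counter


def clustered_cost(edges, cluster_of, vertical_bus=frozenset(), k=2):
    """Sum of dependency costs over all existing links (d_ij = 1).

    edges        : iterable of (i, j) dependencies
    cluster_of   : dict mapping every node to a cluster label
    vertical_bus : set of highly connected nodes treated as a shared bus
    k            : user-defined exponent
    """
    cluster_size = Counter(cluster_of.values())
    network_size = len(cluster_of)
    cost = 0
    for i, j in edges:
        if j in vertical_bus:
            cost += 1                                   # link to the bus: cost d_ij
        elif cluster_of[i] == cluster_of[j]:
            cost += cluster_size[cluster_of[i]] ** k    # within a cluster: n^k
        else:
            cost += network_size ** k                   # across clusters: N^k
    return cost


# Toy network: cluster 0 = {a, b, c}, cluster 1 = {d, e}, with e on the vertical bus.
toy_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
toy_deps = [("a", "b"), ("c", "d"), ("d", "e")]
cost = clustered_cost(toy_deps, toy_clusters, vertical_bus={"e"})   # 9 + 25 + 1
```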

4.2 Analysis of Drupal Product Structure. Using the source code, we identify how the characteristics of the product structure change over time. The degree distribution plots for versions 2–5 are displayed in Fig. 6. The degree distributions are plotted on a log–log scale. It is observed that the general forms of the degree distributions for all the versions are similar and are close to that of a scale-free graph. The degree distribution plots indicate that the majority of nodes have fewer than 4 interfaces. Beyond a degree of 4, the degree distribution exhibits a power-law trend, indicating a scale-free network topology. Such a scale-free graph property has been found to be a common pattern across many different software applications. Hyland-Wood et al. [58] show that the degree distribution of Kowari follows a linear trend when the degree is larger than 4, while it displays a homogeneous trend when the degree is smaller than 4. LaBelle and Wallingford [59] show a similar trend in the out-degree distribution of the Debian product structure. The nodes with more than 20 interfaces are “hubs” in the product structure network. These hubs are analogous to the nodes within a vertical bus, as defined by MacCormack et al. [2].

Other network characteristics for the four versions of Drupal are listed in Table 2. From the number of nodes and interfaces, it is clear that the Drupal project has been constantly growing at a fast pace. Plotting the number of nodes and interfaces (see Fig. 7) reveals that the number of nodes scales linearly with the number of interfaces, indicating a sparse graph. Valverde and Sole observe a similar trend and conclude that the network grows such that, on average, new nodes attach to an almost constant number of existing nodes [34]. However, such a conclusion ignores the fact that a large portion of the nodes and links are also removed from the network. In Sec. 5.2, we investigate this in detail.

Fig. 6 Degree distribution of Drupal product structure

Table 2 Characteristics of different versions of the Drupal product structure

                         V2        V3        V4        V5
No. of nodes             248       412       635       1018
No. of interfaces        661       1304      2041      3139
Average degree           4.4981    6.122     6.123     6.081
Average density          0.019     0.014     0.009     0.005
Clustering coeff.        0.11      0.099     0.107     0.100
Avg. shortest path       2.895     2.901     2.965     3.084
Propagation cost         0.0106    0.0086    0.0055    0.0035
Clustered cost           92239     271051    676840    1815612

The average degree of nodes shows two stages in the evolution of the product structure network. The first stage is from version 2 to version 3. In this stage, the average degree significantly increases. The second stage is from version 3 to version 5, when the average degree does not change significantly. The average density reduces linearly over time. The decreasing trend of the average density is also discovered by MacCormack et al. [2] for the structure of Mozilla. The clustering coefficient remains constant and is close to 0.1 for versions 2–5. In Table 3, the clustering coefficients are compared with random graphs consisting of the same number of nodes and edges. The clustering coefficients of product structure networks are about an order of magnitude larger than those of the corresponding random graphs. Besides, the clustering coefficients of product structure networks are independent of network size, while in the corresponding random graphs, the clustering coefficients are linearly decreasing. This is consistent with the conclusion drawn by Valverde and Sole [34]. In their studies, high and size-independent clustering coefficient values are observed in OSS projects.

As shown in Table 2, the propagation cost decreases from 0.0106 to 0.0035 for versions 2–5. As stated earlier, the propagation cost measures the average visibility of the nodes in the network. It measures the average number of nodes affected by a change in a node. A decrease in visibility indicates a decrease in the extent of coupling within the network from versions 2 to 5. Propagation cost is based on the assumption that the costs associated with all links are equal. On the other hand, clustered cost is based on the assumption that links within a cluster have lower costs than the links across clusters. In other words, the assumption is that it is easier to address change propagation within a single module than changes across different modules. The clustered cost for Drupal increases monotonically from 92,239 to 1,815,612 for versions 2–5. While the decrease in propagation cost indicates a reduction in the extent of overall coupling in the software over time, the increase in clustered cost implies that more links are created across clusters than within clusters. Similar trends for propagation and clustered costs have also been observed by MacCormack et al. [2] for the Mozilla project after significant redesign in 1998.

5 Generative Model for the Evolution of Product Structure

5.1 Modeling Process. A computational model is built to simulate the effect of mechanisms at the module level on the evolution of Drupal product structure. Figure 8 outlines the execution of the model. The data for mechanisms (a) and (b) are used from Table 1. For mechanisms (c)–(f), the numbers of interfaces created or removed are based on the functions listed in Figs. 2–5. Although the data are collected only for the major versions of the product (V2 → V5), the network is generated gradually through the addition and removal of individual nodes and links. The mechanisms can be used to represent continuous growth of the network for a given version (e.g., V5.1.1 → V5.1.2, etc.). Information about major versions is required for the model because these represent major milestones in the project, and significant changes to the product structure happen at these milestones.

Three alternatives for the initial product structure network are chosen: (a) the product structure network from the first version considered (e.g., version 2 in Drupal), (b) a random network with the same number of nodes and links as the product structure network of version 2, and (c) a scale-free network with the same number of nodes and links as the product structure network of version 2. The reason for selecting three types of initial product structure networks is to determine whether the type of initial network also affects the evolutionary characteristics of the product structure network. The random network is used as a baseline. In the existing studies of network evolution, random networks are extensively used to represent initial network topologies. The scale-free network is used because existing studies (e.g., Ref. [34]) reveal that many real-world networks (including OSS) have the scale-free property. Three time periods are simulated for Drupal: from version 2 to version 3, from version 3 to version 4, and from version 4 to version 5. In each period, the node-level mechanisms discussed in Sec. 3.1 are simulated based on the probability functions discussed in Sec. 3.2.
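To make the simulation of one period more concrete, the following is a heavily simplified sketch of how the six mechanisms could be applied to a network. It is not the authors' implementation (the flow in Fig. 8 is not reproduced here). The names `evolve`, `weighted_pick`, and `add_link`, the undirected adjacency-set representation with integer node ids, the uniform choice of removed nodes, the order of the steps, and the weight function `pref` are all assumptions made for illustration.

```python
import random


def weighted_pick(rng, pool, adj, weight_fn, exclude=None):
    """Pick one node from pool with probability proportional to weight_fn(degree)."""
    pool = [v for v in pool if v != exclude]
    weights = [weight_fn(len(adj[v])) for v in pool]
    return rng.choices(pool, weights=weights, k=1)[0]


def add_link(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)


def evolve(adj, counts, f1, f2, f3, f4, seed=0):
    """Apply one evolutionary period to an undirected network.

    counts holds the mechanism totals under keys 'a'..'f'; here 'f' counts only
    links removed between surviving nodes. f1..f4 map a degree to a positive
    weight. Repeated picks of the same pair are ignored, so the link counts
    act as upper bounds in this sketch.
    """
    rng = random.Random(seed)
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    next_id = max(adj) + 1

    # (b) remove existing nodes; uniform choice is an assumption of this sketch
    for v in rng.sample(sorted(adj), counts["b"]):
        for u in adj.pop(v):
            adj[u].discard(v)
    existing = sorted(adj)

    # (f) remove links between surviving nodes, one endpoint weighted by f4
    for _ in range(counts["f"]):
        linked = [v for v in existing if adj[v]]
        if not linked:
            break
        u = weighted_pick(rng, linked, adj, f4)
        v = rng.choice(sorted(adj[u]))
        adj[u].discard(v)
        adj[v].discard(u)

    # (a) add new nodes
    new = list(range(next_id, next_id + counts["a"]))
    for v in new:
        adj[v] = set()

    # (c) link new nodes to existing nodes, existing endpoint weighted by f1
    for _ in range(counts["c"]):
        add_link(adj, rng.choice(new), weighted_pick(rng, existing, adj, f1))

    # (d) link new nodes with each other, weighted by f2 of their current degree
    for _ in range(counts["d"]):
        u = weighted_pick(rng, new, adj, f2)
        add_link(adj, u, weighted_pick(rng, new, adj, f2, exclude=u))

    # (e) link existing nodes with each other, weighted by f3
    for _ in range(counts["e"]):
        u = weighted_pick(rng, existing, adj, f3)
        add_link(adj, u, weighted_pick(rng, existing, adj, f3, exclude=u))

    return adj


# Toy run on a six-node ring with made-up mechanism counts.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
toy_counts = {"a": 3, "b": 1, "c": 4, "d": 2, "e": 1, "f": 1}
pref = lambda k: k + 1.0    # hypothetical near-linear weight, never zero
grown = evolve(ring, toy_counts, pref, pref, pref, pref, seed=42)
```

In a fuller model, `f1` to `f4` would be replaced by the fitted probability functions for each period, and the initial network would be one of the three alternatives described above.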

5.2 Results from the Execution of the Model. The structures of the networks generated using three types of initial networks are compared with the product structure networks from version 2 to version 5. Figure 9 displays a comparison between the characteristics of Drupal product structure and the generated networks over time. An important observation is that the structures of the networks generated from all three types of initial networks converge to the structure of Drupal as the network evolution takes place. From the figure, it is observed that the average degree of the network generated using the initial version 2 network matches the Drupal project. Initial scale-free and random networks have different average degrees compared to the initial Drupal version 2 network. However, with the evolutionary process, the average degrees of the generated networks converge to that of the Drupal network. The average densities of the three models are similar to that of the Drupal product (because the density depends on the numbers of nodes and links only).

The clustering coefficients of the models with the initial scale-free network and the initial version 2 network are close to that of the Drupal product. The model with the initial random network has a small clustering coefficient at the beginning, which is characteristic of a random network. However, the clustering coefficient significantly increases and converges to that of the Drupal product over time. The convergence is not obvious because at version 4, the differences between the models and the Drupal project are larger compared to versions 3 and 5. In Fig. 9, we also observe that the model with the initial random network has a large average shortest path compared to the Drupal project, but it evolves and becomes closer to Drupal over time. The propagation costs and clustered costs of the Drupal network are close to those of the generated networks for all the versions.

A comparison of degree distribution among models with three types of initial networks and Drupal is provided in Fig. 10. At the beginning, the differences among different initial networks are significant. The initial random network displays a Poisson distribution, while the initial scale-free network displays a power-law distribution. Both of them are different from the initial version 2 network. With the evolutionary processes, the degree distributions of the three models converge to the degree distribution of the Drupal project. The comparison between degree distributions is carried out using Pearson's chi-square test. The P-values for the test are provided in Table 4.
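For readers unfamiliar with the test, the Pearson chi-square statistic for two degree distributions can be computed as sketched below. Which distribution plays the role of "observed" and which "expected", and how sparse degree bins are handled, is not specified in the text, so the choices here (and the name `chi_square_statistic`) are assumptions.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic over degree bins.

    observed, expected: dicts mapping degree -> node count. Expected counts are
    rescaled to the observed total. Bins with zero expected count are skipped;
    a real analysis would usually merge sparse bins instead.
    Returns (statistic, degrees_of_freedom).
    """
    total_obs = sum(observed.values())
    total_exp = sum(expected.values())
    stat = 0.0
    used_bins = 0
    for k in sorted(set(observed) | set(expected)):
        e = expected.get(k, 0) * total_obs / total_exp
        if e == 0:
            continue
        o = observed.get(k, 0)
        stat += (o - e) ** 2 / e
        used_bins += 1
    return stat, used_bins - 1


# Made-up counts of nodes with degree 1, 2, and 3.
model_counts = {1: 50, 2: 30, 3: 20}
product_counts = {1: 40, 2: 40, 3: 20}
stat, dof = chi_square_statistic(model_counts, product_counts)
```

A P-value can then be obtained from the chi-square survival function, for example with `scipy.stats.chi2.sf(stat, dof)` if SciPy is available.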

Table 3 Comparison of clustering coefficient between product structure network and corresponding random graphs

                              V2       V3       V4       V5
Product structure network     0.110    0.099    0.107    0.093
Random graph                  0.022    0.015    0.007    0.004

Fig. 7 Relationship between the number of functions and interfaces

From the execution of the models, we conclude that:

(1) When the initial version 2 network is used in the model, the evolutionary characteristics, including average degree, average density, clustering coefficient, average shortest path, propagation cost, clustered cost, and degree distribution, are close to those of the Drupal product over time with the use of the proposed mechanisms.

(2) When the initial scale-free network or random network is applied in the model, the evolutionary characteristics are different at the beginning due to the differences in topologies. However, by executing the model with the proposed mechanisms, the structures of the networks from the models converge to that of the Drupal product.

These results indicate that the mechanisms can potentially be used to model the evolution of product structures in Drupal. In the case of Drupal, we found that even when the initial product structure is different, if the same mechanisms are applied, the evolution of product structures converges to the same structure over time. This indicates the robustness of the node-level mechanisms in modeling the product evolution.

6 Application to RepRap 3D Printer: An Open-Source Hardware Product

In this section, we utilize the proposed approach to analyze the structure and evolution of the RepRap 3D printer [11], which is developed through open-source principles. The RepRap project has been under development since 2005. RepRap has been chosen in this paper for three reasons. First, it is one of the most developed and widely used open hardware projects. Second, the evolution of the entire project is well documented online [11]. Third, the detailed design documents, such as computer-aided design (CAD)/computer-aided manufacturing (CAM) files, are openly available for download. Three major versions of RepRap (Mendel, Huxley, and Prusa) have been used to analyze the evolution. CAD models for these three versions were downloaded from Refs. [11,60]. The product network structure is extracted from the CAD/CAM files by modeling parts as nodes and their physical connections as links.

Compared to the open-source software product analyzed in Sec. 5.2, the hardware product has a smaller number of nodes and fewer physical connections. On average, RepRap consists of 133 nodes and 205 links. In contrast, the Drupal network has more than 500 nodes and 1500 edges. The RepRap network includes main sub-assemblies such as the x-axis system, y-axis system, and z-axis system. Each system consists of more than 30 parts, such as clamps, vertices, sides, bearings, and plates. Within each sub-assembly, a core component such as the x-bar serves as a key node and has physical connections with many other components. The number of nodes added in each version is listed in Table 5.

6.1 Analysis of Node-Level Mechanisms for RepRap. The six node-level mechanisms are used to analyze the product evolution from Mendel to Huxley, and from Huxley to Prusa. Table 5 shows the basic data associated with the node-level mechanisms.

Fig. 9 Comparison of evolutionary characteristics at product level between Drupal product and the models. (i) The product, (ii) model with initial version 2 network, (iii) model with initial random network, (iv) model with initial scale-free network

Fig. 8 The execution of the model based on mechanisms


Fig. 10 Comparison of degree distribution between Drupal product and the models

Table 4 P-values from the chi-square test on degree distributions

                        V2         V3        V4        V5
Initial V2 network      -          0.6394    0.1298    0.1777
Scale-free network      <0.0001    0.0574    0.3430    0.3872
Random network          <0.0001    0.0023    0.0048    0.6616

(a) Addition of new nodes corresponds to the addition of new parts. In RepRap, new parts are added when new features and specifications are needed, and when existing parts are redesigned with new features. For example, from Mendel to Huxley, new parts are added to reinforce the base of the printer and to increase its stability for printing larger parts. A comparison between Mendel and Huxley is shown in Fig. 11, which shows new parts, including base bottom, base top, and base reinforcement clamps, added to increase the stability of the machine.

(b) Removal of existing nodes corresponds to the removal of parts from the assembly. In RepRap, existing parts are removed when existing features are no longer needed, or when existing parts are replaced by redesigned parts. From Huxley to Prusa, parts are combined to make them multifunctional. The redesign of the x-axis system in Prusa significantly reduced the number of parts. In Huxley, the x-axis system consists of 23 parts, including clamps, spacers, bars, belts, bases, carriages, and motor. In Prusa, however, the x-axis system consists of only 8 nodes. The unique carriage part in Prusa is redesigned by combining the functions of clamping, connection, and load bearing into one.

(c) Linking of new nodes with existing nodes: The new parts can be connected to existing parts, creating links between new parts and existing parts. From Mendel to Huxley, 26 existing parts have new links with 78 new parts. Of the 78 new parts, 22 are connected to two existing x-bars. By analyzing the Mendel product network, it is found that the nodes corresponding to these two x-bars have the highest degree: each x-bar has a degree of 13. In addition, 10 out of the 26 existing nodes have only one new link each. For these 10 existing nodes, the average degree is 3.1, which is lower than the degrees of the x-bars. Figure 12 shows the probabilities of the new links established between new and existing nodes as a function of degree. It is observed that the new nodes have a higher probability of connecting to existing nodes with higher degree. We do not perform regression of the probability function due to the small sample size.

(d) Linking of new nodes with each other: New nodes may also be connected with each other to create new functionalities. From Huxley to Prusa, the 2 new nodes with an initial degree of 4 have, on average, 3 new connections with other new nodes. In contrast, the 27 new nodes with an initial degree of 1 have, on average, 1.2 new connections with other new nodes. Figure 12 shows the probabilities of new links established between new nodes.

(e) Linking of existing nodes with each other: With the product evolution, new links are also added between existing nodes. For this mechanism, we do not plot the probability-degree chart since only a few links were created between existing nodes. We observe that from Mendel to Huxley, two new links are established between existing nodes. These two new links are between y-bar clamps and y-bars in order to reinforce the y-bars during the printing process. From Huxley to Prusa, no new links are established between existing nodes.

(f) Removal of existing links: Existing links can be removed because of the removal of existing nodes, or because a link between two existing nodes is removed. Figure 12 shows the probabilities of existing links being removed from the network. The results indicate that the probability that a link is removed has a linear relationship with the degree of the node to which this link attaches: the higher the degree of a node, the more likely its links will be removed. This is because, in RepRap, existing links are removed mainly as a result of the removal of existing nodes. From Mendel to Huxley, all of the removed links are due to the removal of existing nodes. From Huxley to Prusa, only one link is removed between two existing nodes, while the other links are removed due to the removal of existing nodes. The reason behind this is that in a physical product, it is difficult to simplify the existing design by simply removing links between nodes. Instead, in order to simplify the existing designs, the parts also need to be redesigned.

6.2 Analysis of Network Structure and Its Evolution. Table 6 displays the characteristics of RepRap product structures over time for three major versions. The network metrics show three general trends from Mendel to Prusa. First, there is a reduction in the average degree of the network, which shows a decrease in the number of links between parts. This is mainly a result of the integration and combination of parts. Since several parts are redesigned into one part that has multiple functionalities, links that are associated with the old parts are removed as well. Second, the average clustering coefficient decreases. The average clustering coefficient is proportional to the number of triangles in the network. The decrease in clustering coefficient indicates a decrease in the degree of coupling in design. For example, to hold the z-bar in Mendel, two z-bar clamps are designed and connected by fastening bolts. This results in a physical connection between any two parts of the assembly, causing a triangle connection in the network topology. Since there are many such structures in Mendel, the average clustering coefficient is high at 0.217. In contrast, the redesign of parts for multiple functionalities in Huxley and Prusa reduces the triangle connectivity, and thus the average clustering coefficient decreases. Third, there is a decrease in average shortest path. The decrease in this metric also indicates the reduction and simplification of parts and connectivity across the different versions of the product.

Figure 13 displays the degree distribution for the three major versions. The degree distributions indicate a similarity in network structure to the open-source software, as shown in Fig. 6. Most of the parts have a similar number of connections, and there are only a few parts (e.g., y-chassis in Mendel, base sheet bottom in Huxley, and x-carriage in Prusa) with many connections. These parts are either a chassis working as the central support framework or a pivot working as a bridge to connect other parts.

In summary, the evolution of an open-source hardware product is analyzed by using the proposed network-based framework, which contains six node-level mechanisms. This case study not only shows that the proposed approach can provide insights about the product structure and patterns in product development but also shows the generality of the proposed framework. As open-source hardware matures and products become more complex, regression techniques can be utilized to obtain the probability models for each mechanism so that the evolution of open-source hardware products can be modeled and predicted.

7 Closing Comments

The product structure plays an important role in product development. In the open-source domain, the product structure affects not only the efficiency of product development [7] but also the community structure [6,61]. Hence, it is important to understand product structure and evolution in open-source processes. To facilitate that understanding, we present a generative model of the evolution of open-source software products. The model captures the underlying dynamics of evolution of open-source software products. The uniqueness of the proposed model for open-source software evolution is that the dynamics are modeled in terms of module-level (i.e., local) observations such as addition and deletion of nodes and links. It is shown that applying the mechanisms is potentially a robust way to model the product structure over time because differences in initial product structures do not have a significant effect on the final product structure. Such an evolutionary model based on local network observations can help in identifying not only the extent of increase in complexity over time but also the mechanisms through which the complexity increases. There are three general applications of the model presented in this paper: (1) longitudinal studies of a product’s evolution, (2) cross-sectional studies of the evolution of different products, and (3) predictive analyses.

Fig. 11 Product evolution from Mendel to Prusa

Table 5 Data related to node-level mechanisms in RepRap

| Mechanism | Mendel → Huxley | Huxley → Prusa |
|---|---|---|
| (a) nodes added | 88 | 65 |
| (b) nodes removed | 65 | 127 |
| (c) links between new & existing nodes | 78 | 61 |
| (d) links among new nodes | 93 | 54 |
| (e) links among existing nodes | 2 | 0 |
| (f) links removed | 137 | 224 |

011003-10 / Vol. 14, MARCH 2014 Transactions of the ASME

Longitudinal studies: The six mechanisms discussed in Sec. 3.1 provide different insights into how the products grow in size and complexity. Mechanisms (a) and (b) describe the number of nodes added and removed. Hence, they provide an indication of the growth rate and the extent of change in the products. Mechanisms (c)–(f) describe the network evolution in terms of linking of new and existing nodes. Hence, they indicate a change in structural complexity of products. Different probability functions result in different topologies of the overall network. For example, a simple probability function where the probability of linking is proportional to the nodes’ degrees results in a power-law degree distribution with $\gamma = 3$. Nonlinear attachment functions result in more complex degree distributions. Through these relationships between the node-level behaviors and the network’s global characteristics, the mechanisms can provide information about how specific design changes during a particular version affect the overall product structure.
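The link between a node-level attachment rule and the global degree distribution can be explored with a small simulation. The sketch below grows a network by linear, degree-proportional attachment in the style of Ref. [46]; it is not the six-mechanism model of this paper, only the simplest special case (node addition plus links from new to existing nodes). The function names, the network size, and the seed are assumptions chosen for illustration.

```python
import random
from collections import Counter


def grow_preferential(n_nodes, links_per_node=2, seed=0):
    """Grow an undirected network in which each new node links to existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    m = links_per_node
    # Start from a small fully connected core of (m + 1) nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per link end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:  # m distinct existing nodes
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def degree_counts(edges):
    """Return a Counter mapping each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = grow_preferential(2000, links_per_node=2, seed=1)
degrees = degree_counts(edges)
distribution = Counter(degrees.values())  # degree -> number of nodes

print("nodes:", len(degrees), "links:", len(edges))
print("min degree:", min(degrees.values()), "max degree:", max(degrees.values()))
```

Replacing the uniform draw from `stubs` with a draw weighted by a nonlinear function of degree would be one way to experiment with the nonlinear attachment functions mentioned above.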

Cross-sectional studies: The goal of this paper is not to compare the evolution of different software applications, but such a comparison is a potential application of the model proposed in this paper. The differences in the evolution of different products can be explained in terms of the differences between the corresponding node-level mechanisms. Different values of parameters for the node-level mechanisms indicate different evolutionary characteristics.

Predictive analyses: The third potential application of the model is that it provides the ability to perform what-if analyses by predicting the product structures that could emerge based on certain design decisions. For example, if we assume that the product will evolve in the same manner as it has in the past, we can extrapolate the parameters associated with the node-level mechanisms. The extrapolated parameters can be used in the model to predict the evolution of the product. Specifically, the probability functions can be summarized as $P = aK^{b}$ or $P = ae^{bK}$, where $a$ and $b$ are coefficients and $K$ is the degree. In the prediction process, the fitted curves can be used to predict the coefficients $a$ and $b$ for the mechanisms (e)–(f) based on the existing coefficients. Once the predicted coefficients $a$ and $b$ are obtained, the predicted product structure for the next version can be determined. Additionally, knowledge about specific design changes to be carried out in future versions can also be used in the node-level mechanisms to predict the global characteristics of future versions of the software.
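One possible way to carry out this fitting and extrapolation is sketched below in Python. The two functional forms are those stated above; everything else is an assumption made for illustration: the least-squares fit in log space, the linear trend used to extrapolate the coefficients across versions, and all numerical values, which are synthetic and not taken from the Drupal or RepRap data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_law(degrees, probs):
    """Fit P = a * K**b by regressing log P on log K; returns (a, b)."""
    intercept, slope = fit_line([math.log(k) for k in degrees],
                                [math.log(p) for p in probs])
    return math.exp(intercept), slope


def fit_exponential(degrees, probs):
    """Fit P = a * exp(b * K) by regressing log P on K; returns (a, b)."""
    intercept, slope = fit_line(list(degrees), [math.log(p) for p in probs])
    return math.exp(intercept), slope


def extrapolate(values):
    """Predict the next entry of a per-version coefficient series,
    assuming (hypothetically) a linear trend over the version index."""
    intercept, slope = fit_line(list(range(len(values))), values)
    return intercept + slope * len(values)


# Synthetic observations generated from P = 0.01 * K**1.2 (hypothetical).
degrees = [1, 2, 4, 8, 16]
probs = [0.01 * k ** 1.2 for k in degrees]
a, b = fit_power_law(degrees, probs)

# Hypothetical coefficients fitted for three past versions,
# extrapolated to the next version.
a_next = extrapolate([0.010, 0.012, 0.014])
b_next = extrapolate([1.20, 1.15, 1.10])

print(a, b, a_next, b_next)
```

The extrapolated pair (`a_next`, `b_next`) would then define the probability function used to generate the next version of the network. Other trend models for the coefficients could be substituted in `extrapolate` without changing the rest of the sketch.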

There are significant opportunities for further research in this direction. First, function-call graphs are used here to model the structure of software. The approach can be applied in the future to other levels of granularity (files, classes, modules, etc.). Second, in order to understand the network-level impact of the underlying mechanisms, a comprehensive analysis of the effect of the specific functional forms of the probability functions on the network topology is needed. Third, existing research points to the commonalities between the network topologies of software. This model can be utilized in the future to go further and explore commonalities in evolutionary patterns in terms of the module-level mechanisms. Finally, the focus in this paper is on studying the evolution of product structures only. Further work is needed to model the co-evolution of products and communities of participants [62]. Such analysis is important to validate the mirroring hypothesis [1], according to which product structures and community structures mirror each other.

Acknowledgment

We gratefully acknowledge the financial support from the National Science Foundation through the CAREER grant number 1265622.

References

[1] MacCormack, A., Rusnak, J., and Baldwin, C., 2012, “Exploring the Duality Between Product and Organizational Architectures: A Test of the ‘Mirroring’ Hypothesis,” Research Policy, 41(8), pp. 1309–1324.

[2] MacCormack, A., Rusnak, J., and Baldwin, C., 2006, “Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code,” Manage. Sci., 52(7), pp. 1015–1030.

[3] Steward, D. V., 1981, “The Design Structure System: A Method for Managing the Design of Complex Systems,” IEEE Trans. Eng. Manage., 78(3), pp. 71–74.

[4] Browning, T. R., 2001, “Applying the Design Structure Matrix to System Decomposition and Integration Problems: A Review and New Directions,” IEEE Trans. Eng. Manage., 48(3), pp. 292–306.

Fig. 12 Probability distributions of mechanisms

Table 6 Evolutionary characteristics of RepRap network

| Network measure | Mendel | Huxley | Prusa |
|---|---|---|---|
| Average degree | 3.064 | 3.086 | 2.804 |
| Average density | 0.022 | 0.019 | 0.029 |
| Clustering coefficient | 0.217 | 0.127 | 0.034 |
| Average shortest path | 7.587 | 7.574 | 6.105 |
| Propagation cost | 0.0112 | 0.0095 | 0.0143 |
| Clustered cost | 8010 | 10256 | 4124 |

Fig. 13 Degree distributions of the three versions of RepRap

Journal of Computing and Information Science in Engineering MARCH 2014, Vol. 14 / 011003-11


[5] Wynn, D. C., Kreimeyer, M., Clarkson, P. J., and Lindemann, U., 2012, “Dependency Modelling in Complex System Design,” J. Eng. Design, 23(10–11), pp. 715–718.

[6] Panchal, J., 2009, “Agent-based Modeling of Mass Collaborative Product Development Processes,” ASME J. Comput. Inf. Sci. Eng., 9(3), p. 031007.

[7] Le, Q., and Panchal, J. H., 2011, “Modeling the Effect of Product Architecture on Mass Collaborative Processes: An Agent-based Approach,” ASME J. Comput. Inf. Sci. Eng., 11(1), p. 011003.

[8] Huang, H., Le, Q., and Panchal, J. H., 2011, “Analysis of the Structure and Evolution of an Open-Source Community,” ASME J. Comput. Inf. Sci. Eng., 11(3), p. 031008.

[9] Weber, S., 2004, The Success of Open Source, Harvard University Press, Cambridge, MA.

[10] Pearce, J. M., 2012, “Building Research Equipment With Free, Open-Source Hardware,” Science, 337(6100), pp. 1303–1304.

[11] RepRap, 2013, “Build a RepRap.” Available at: http://reprap.org/wiki/Main_Page

[12] Oxer, J., and Blemings, H., 2009, Practical Arduino: Cool Projects for Open Source Hardware, APress, New York, NY.

[13] SourceForge, 2009, “SourceForge.net: Open Source Software.” Available at: http://sourceforge.net/

[14] Drupal, 2011, “Drupal: Community Plumbing.” Available at: https://drupal.org

[15] Crowston, K., Wei, K., Howison, J., and Wiggins, A., 2008, “Free/Libre Open-Source Software Development: What We Know and What We Do Not Know,” ACM Comput. Surv., 44(2), pp. 7:1–7:35.

[16] Turski, W. M., 1996, “Reference Model for Smooth Growth of Software Systems,” IEEE Trans. Software Eng., 22(8), pp. 599–600. Available at: http://dl.acm.org/citation.cfm?id=235686

[17] Lehman, M. M., and Belady, L. A., eds., 1985, Program Evolution: Processes of Software Change, Academic Press Professional, Inc., San Diego, CA.

[18] Cataldo, M., Herbsleb, J. D., and Carley, K. M., 2008, “Socio-Technical Congruence: A Framework for Assessing the Impact of Technical and Work Dependencies on Software Development Productivity,” Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM’08, ACM, pp. 2–11.

[19] Hutchens, D. H., and Basili, V. R., 1985, “System Structure Analysis: Clustering With Data Bindings,” IEEE Trans. Software Eng., 11(8), pp. 749–757.

[20] Selby, R. W., and Basili, V. R., 1991, “Analyzing Error-Prone System Structure,” IEEE Trans. Software Eng., 17(2), pp. 141–152.

[21] Murphy, G. C., Notkin, D., Griswold, W. G., and Lan, E., 1998, “An Empirical Study of Call Graph Extractors,” ACM Trans. Softw. Eng. Methodol., 7(2), pp. 158–191.

[22] Gall, H., Hajek, K., and Jazayeri, M., 1998, “Detection of Logical Coupling Based on Product Release History,” Proceedings International Conference on Software Maintenance, pp. 190–198. Available at: http://dl.acm.org/citation.cfm?id=853338

[23] Eick, S. G., Todd, L. G., Alan, F. K., Marron, J. S., and Mockus, A., 1999, “Does Code Decay? Assessing the Evidence From Change Management Data,” IEEE Trans. Software Eng., 27(1), pp. 1–12.

[24] Cataldo, M., Wagstrom, P. A., Herbsleb, J. D., and Carley, K. M., 2006, “Identification of Coordination Requirements: Implications for the Design of Collaboration and Awareness Tools,” Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, CSCW’06, ACM, pp. 353–362.

[25] Godfrey, M., and Tu, Q., 2001, “Growth, Evolution, and Structural Change in Open Source Software,” Proceedings of the 4th International Workshop on Principles of Software Evolution, IWPSE’01, ACM, pp. 103–106.

[26] Milev, R., Muegge, S., and Weiss, M., 2009, “Design Evolution of an Open Source Project Using an Improved Modularity Metric,” Open Source Ecosystems: Diverse Communities Interacting, Vol. 299, C. Boldyreff, K. Crowston, B. Lundell, and A. Wasserman, eds., IFIP Advances in Information and Communication Technology, Springer, Berlin, Heidelberg, pp. 20–33.

[27] Sosa, M. E., Browning, T., and Mihm, J., 2007, “Studying the Dynamics of the Architecture of Software Products,” ASME 2007 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, No. DETC2007-34761.

[28] MacCormack, A., Rusnak, J., and Baldwin, C., 2008, “The Impact of Component Modularity on Design Evolution: Evidence From the Software Industry,” Harvard University, Technical Report 08-038.

[29] LaMantia, M., Cai, Y., MacCormack, A., and Rusnak, J., 2008, “Analyzing the Evolution of Large-Scale Software Systems Using Design Structure Matrices and Design Rule Theory: Two Exploratory Cases,” Seventh Working IEEE/IFIP Conference on Software Architecture, WICSA 2008, pp. 83–92.

[30] Pfleeger, S. L., and Atlee, J. M., 2009, Software Engineering: Theory and Practice, Prentice Hall, Upper Saddle River, NJ.

[31] Huynh, S., and Cai, Y., 2007, “An Evolutionary Approach to Software Modularity Analysis,” Proceedings of the First International Workshop on Assessment of Contemporary Modularization Techniques, ACoM’07, IEEE Computer Society, pp. 1–6.

[32] Wen, H., D’Souza, R. M., Saul, Z. M., and Filkov, V., 2009, Evolution of Apache Open Source Software, Springer, New York, pp. 199–216.

[33] Mockus, A., Fielding, R. T., and Herbsleb, J. D., 2002, “Two Case Studies of Open Source Software Development: Apache and Mozilla,” ACM Trans. Softw. Eng. Methodol., 11(3), pp. 309–346.

[34] Valverde, S., and Sole, R., 2003, “Hierarchical Small-Worlds in Software Architecture,” Santa Fe Inst. Working Paper SFI/03-07-044.

[35] Raymond, E., 2001, “The Cathedral and the Bazaar,” Knowl., Technol. Policy, 12(3), pp. 23–49.

[36] O’Reilly, T., 1999, “Lessons From Open-source Software Development,” Commun. ACM, 42(4), pp. 33–37.

[37] Torvalds, L., 1999, “The Linux Edge,” Commun. ACM, 42(4), pp. 38–39.

[38] Robins, G., Pattison, P., Kalish, Y., and Lusher, D., 2007, “An Introduction to Exponential Random Graph (p*) Models for Social Networks,” Soc. Networks, 29(2), pp. 173–191.

[39] Liang, G., and Yu, B., 2003, “Maximum Pseudo Likelihood Estimation in Network Tomography,” IEEE Trans. Acoust., Speech, Signal Process., 31(8), pp. 2043–2053.

[40] Snijders, T., 2002, “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models,” J. Soc. Struct., 3(2), pp. 1–40.

[41] Duke, C. B., Hopcroft, J. E., Arkin, A. P., Armstrong, R. E., Barabási, A. L., Brachman, R. J., Broome, N. L., Davis, S., De Millo, R. A., Hilsman, W. J., Leland, W. E., Malone, T. W., Murray, R. M., Pellicci, J., Silver, P. A., and Van Riper, P. K., 2007, Network Science: Report from the Committee on Network Science for Future Army Applications, The National Academies Press, Washington, D.C.

[42] Barabasi, A. L., Albert, R., and Jeong, H., 2000, “Scale-Free Characteristics of Random Networks: The Topology of the World-Wide Web,” Phys. A: Statist. Mech. Appl., 281(1–4), pp. 69–77.

[43] Dorogovtsev, S. N., and Mendes, J. F. F., 2002, “Evolution of Networks,” Adv. Phys., 51(4), pp. 1079–1187.

[44] Krapivsky, P. L., Redner, S., and Leyvraz, F., 2000, “Connectivity of Growing Random Networks,” Phys. Rev. Lett., 85(21), pp. 4629–4632.

[45] Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W., 2002, “Network Topology Generators: Degree-Based Versus Structural,” SIGCOMM Comput. Commun. Rev., 32(4), pp. 147–159.

[46] Barabasi, A.-L., and Albert, R., 1999, “Emergence of Scaling in Random Networks,” Science, 286(5439), pp. 509–512.

[47] Banker, R. D., and Slaughter, S. A., 2000, “The Moderating Effect of Structure on Volatility and Complexity in Software Enhancement,” Information Systems Research, 11(3), pp. 219–240.

[48] Rusovan, S., Lawford, M., and Parnas, D., 2004, Open Source Software Development: Future or Fad? Perspectives on Free and Open Source Software, MIT, Cambridge, MA.

[49] Doxygen, 2011, “Generate Documentation From Source Code,” http://www.doxygen.org

[50] Wasserman, S., and Faust, K., 1994, Social Network Analysis: Methods and Applications, Cambridge University, Cambridge.

[51] Henry, S., and Kafura, D., 1981, “Software Structure Metrics Based on Information Flow,” IEEE Trans. Software Eng., SE-7(5), pp. 510–518.

[52] Sosa, M. E., Eppinger, S. D., and Rowles, C. M., 2007, “A Network Approach to Define Modularity of Components in Complex Products,” J. Mech. Des., 129(11), pp. 1118–1129.

[53] Wyatt, D. F., Wynn, D. C., and Jarrett, J. P., 2010, “Supporting Product Architecture Design Using Computational Design Synthesis With Network Structure Constraints,” Res. Eng. Des., 23(1), pp. 17–52.

[54] Newman, M. E. J., 2003, “The Structure and Function of Complex Networks,” SIAM Rev., 45(2), pp. 167–256.

[55] Marczyk, J., and Deshpande, B., 2008, Measuring and Tracking Complexity in Science, Unifying Themes in Complex Systems, A. Minai, D. Braha, and B-Y. Yaneer, eds., Springer Berlin Heidelberg, pp. 27–33.

[56] Giffin, M., deWeck, O., Buonova, G., Keller, R., Eckert, C., and Clarkson, P., 2009, “Change Propagation Analysis in Complex Technical Systems,” ASME J. Mech. Des., 131(8), p. 081001.

[57] Girvan, M., and Newman, M. E., 2002, “Community Structure in Social and Biological Networks,” Proc. Natl. Acad. Sci. U.S.A., 99(12), pp. 7821–7826.

[58] Hyland-Wood, D., Carrington, D., and Kaplan, S., 2005, “Scale-Free Nature of Java Software Package, Class and Method Collaboration Graphs,” The 5th International Symposium on Empirical Software Engineering.

[59] LaBelle, N., and Wallingford, E., 2004, “Inter-Package Dependency Networks in Open-Source Software.” Available at: http://arxiv.org/abs/cs/0411096

[60] RepRap, 2013, “Prusa Mendel Solidworks 2007 Assembly,” http://reprap.org/wiki/FilePrusa_Mendel_Solidworks_2007_Assembly.zip

[61] Sosa, M. E., Eppinger, S. D., and Rowles, C. M., 2004, “The Misalignment of Product Architecture and Organizational Structure in Complex Product Development,” Manage. Sci., 50(12), pp. 1674–1689.

[62] Sosa, M. E., 2008, “A Structured Approach to Predicting and Managing Technical Interactions in Software Development,” Res. Eng. Des., 19(1), pp. 47–70.
