Modularity in Design: Formal Modeling and Automated Analysis
A Dissertation
Presented to
the faculty of the School of Engineering and Applied Science
University of Virginia
In Partial Fulfillment
of the requirements for the Degree
Doctor of Philosophy
Computer Science
by
Yuanfang Cai
August 2006
© Copyright August 2006
Yuanfang Cai
All rights reserved
Approvals
This dissertation is submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Computer Science
Yuanfang Cai
Approved:
Kevin J. Sullivan (Advisor)
Mary Lou E. Soffa
William G. Griswold
John C. Knight (Chair)
Jack W. Davidson
Accepted by the School of Engineering and Applied Science:
James H. Aylor (Dean)
August 2006
Abstract
People have long recognized that evolvability, achieved most fundamentally by appropriate modu-
larity in design, can have enormous technical, organizational and economic value. However, achiev-
ing appropriate modularity in design, e.g., by refactoring, can incur significant costs: both direct
and in tradeoffs against other properties, such as performance and time to market [14]. Reason-
ing about the evolvability properties and economic implications of design structures is critical to
high-consequence decision-making, but it remains difficult, in part due to the lack of formal theo-
ries linking design structures to evolvability and economic properties, and of automated techniques
facilitating value-based decision-making.
One key impediment is the lack of analyzable high-level design representations that both convey
design architectures and enable designers to reason precisely about their modularity properties and
economics. This dissertation contributes such a formal and analyzable representation. It supports
formal design modeling and enables automation of a number of evolvability and economic-related
analyses.
Baldwin and Clark previously contributed an influential but informal theory of modularity in de-
sign, centered on a design representation called the design structure matrix (DSM), and employing
a real-options-based model of the economic value of modularity. We found the imprecise nature of
DSMs limiting in several respects. Our framework captures and clarifies the essence of Baldwin
and Clark’s theory in the particular framework of first-order logical constraint networks. It enables
automatic derivation of DSMs with rigorous semantics and a number of other architecture analysis
techniques.
We model both design decisions and relevant external conditions using an augmented form of
constraint networks (ACNs). To support design impact analysis, we derive an intermediate, state-
machine-based design space model from an ACN, which we call a design automaton (DA). To
support traditional design coupling structure analysis, we derive a pair-wise dependence relation
(PWDR) from a DA, based on which we can then derive a DSM, and apply Baldwin and Clark’s
theory (among others).
To address scalability issues in constraint solving and solution enumeration that a DA requires,
we create a method to decompose a large ACN model into a number of smaller ones, solve each of
them separately, and integrate the results on demand. To address the problem that an ACN model
is not sufficient to model and analyze complex design decisions with crosscutting and hierarchical
structural implications, we extend the ACN model into the complex augmented constraint network
(CACN), which formally represents a family of ACNs.
Our ultimate goal is to enable economically effective software architectural decision-making
based on sound theory and useful tools. This dissertation takes an important step toward this goal
by providing a formal, analyzable design modeling framework. Our thesis is as follows:
• This framework formally accounts for the key concepts of Baldwin and Clark’s modularity
theory as well as Parnas’s earlier information hiding design criterion.
• This framework enables the derivation of pair-wise dependence relations from ACNs, and
consequently, the derivation of DSMs with precise semantics.
• This framework enables automation of a range of formal architectural analysis methods re-
lated to evolution and economic value.
• This framework generalizes to provide an account of both object-oriented and newer aspect-
oriented notions of modularity in a unified, declarative framework.
In support of this thesis, we present evidence in two forms: (1) formal modeling and automated
analysis of case studies, supported by our prototype tool, Simon; (2) a complete formalization
of our framework, together with formalizations of the key notions of existing theories in the
setting of our framework.
Dedication
I dedicate my dissertation to my husband, Bo Zhang, for his unconditional love and for his
encouragement that kept me from giving up when I found it hard to continue. I would never have
been able to reach this point without his constant support.
I also dedicate this dissertation to my loving family and family-in-law back in China. I thank my
father, Wentong Cai, and my mother, Jinfeng Wang, for caring about who I am much more than what
I do, always telling me to balance work with my personal life. I thank my mother-in-law, Shuqin
Guo, and my father-in-law, Yimin Zhang, for treating me like their own daughter, always showing
me their understanding and caring. I thank my elder brother, Yuanming Cai, for encouraging me
to embark on this challenging and fruitful journey. I thank my sister-in-law, Dongmei Sun, and my
lovely nephew, Zijian Cai, for always believing in me more than I believed in myself.
Acknowledgments
I would like to express my sincere gratitude and appreciation to my advisor, Professor Kevin Sul-
livan, for providing me with the unique opportunity to work in the research area of software en-
gineering and software economics, for his expert guidance and mentorship, and for his invaluable
advice that pushed me to think formally and broadly. He has taught me innumerable lessons and
insights on the workings of academic research in general. His technical and editorial advice was
essential to the completion of this dissertation. I appreciate the high standards he has held me to.
After six years, I am stronger and more independent, not only in my research ability, but also in my
general personality.
I would not have been able to survive without the enormous help offered by my dear friend
Elisabeth Strunk. She has been there for me whenever I was in need of help: she spent many hours
reading and marking my paper draft word by word, helping me express and organize my thoughts,
while she was not even a co-author. There was a time when she had her own pending deadlines,
but still spared her time to comment on my paper, making more than one pass to be sure that I had
something presentable. When I had a hard time, she told me that she would help me make it, and
she did. I cannot thank her enough for the countless favors she has offered me over the past six
years.
I thank Professor John Knight for his constant and critical support. Especially, both my husband
and I are full of gratitude to him for helping me look for a job close to my husband. I thank Professor
Mary Lou Soffa for hugging and comforting me like my mom, and for her always being there to
support and to help. I thank Professor Anita Jones for her kind mentoring about how to be a
professional woman. I thank Professor Jack Davidson for his service and comments as part of my
committee. I have been blessed with these great and generous people in our department. They are
like my family, in whom I found courage and the strength to persevere.
I thank Professor William Griswold from the University of California, San Diego for his great
collaboration and encouragement. I thank Professor Carliss Baldwin from the Harvard Business
School for her enormous influence on my research. I thank Professor Tao Xie from North Carolina
State University and Dr. Cordell Green from Kestrel Institute for their friendship, help, and
mentoring.
I thank all my friends who came to every one of my talks and gave me warm support, including
Billy Greenwell, Tony Aiello, Xiang Yin, Michael Spiegel, and Patrick Graydon. I thank all my
friends around me. Especially, I thank Yuanyuan Song for driving me around after I lost my car,
for sharing my burden of tool implementation when I was totally overwhelmed, and for always
reminding me whenever I felt frustrated and asked why life could be so hard: “Thinking of
people starving in Africa and suffering in Iraq, you are not qualified to complain!”
Yes, I have been very, very lucky to have all these great, generous, and broad-minded people
around me. This small section is far from enough to thank them all.
Contents
1 Introduction 1
1.1 Modularity in Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Current Design Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Formal Modeling and Automated Analysis . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Model Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Model Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Prototype Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Background 14
2.1 Modularity in Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Software Economics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Prevailing Design Representations and Analysis . . . . . . . . . . . . . . . . . . . 17
2.4 Emerging Approach to Economics of Modularity . . . . . . . . . . . . . . . . . . 20
3 Overview of Core Modeling and Analysis Approaches 26
3.1 Core Model Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Augmented Constraint Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Operational Design Space Evolution Model . . . . . . . . . . . . . . . . . . . . . 32
3.4 Pair-wise Dependence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5 Connections to Evolvability and Economic Analysis . . . . . . . . . . . . . . . . . 37
3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Modeling and Analysis of a Benchmark Design 40
4.1 Key Word In Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 ACN KWIC Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Quantitative Changeability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Design Structure Matrix Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5 Net Option Value Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5 Model Decomposition and Result Integration 62
5.1 ACN Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Integrating Analysis Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3 Observations and Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Model Extension and Structural Design Impact Analysis 76
6.1 Figure Editor Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Set-Valued Design Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3 Crosscutting Design Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4 Nested Design Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.5 Parameterizing CACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6 Structural Design Impact Analysis Overview . . . . . . . . . . . . . . . . . . . . 88
6.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7 Formalization 97
7.1 Formalizing the Core Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.2 Formalizing Previous Theories of Modularity . . . . . . . . . . . . . . . . . . . . 105
7.3 The Divide-and-Conquer Approach and Its Correctness . . . . . . . . . . . . . . . 108
7.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8 Simon: The Tool 121
8.1 Interactive Formal Design Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2 Constraint Solving and DA, PWDR Generation . . . . . . . . . . . . . . . . . . . 132
8.3 Automated Design Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9 The Generalizability of the Approach 140
9.1 A Web Application—Winery Locator . . . . . . . . . . . . . . . . . . . . . . . . 142
9.2 HyperCast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.3 Galileo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
10 Evaluation of this Research 165
10.1 Thesis and Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
10.2 Novelty and Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
10.3 Limitations and Remaining Problems . . . . . . . . . . . . . . . . . . . . . . . . 168
10.4 Challenges and Open Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
10.5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11 Conclusion 171
Bibliography 173
List of Figures
2.1 OO Observer Pattern DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Core Models and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Matrix Constraint Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Matrix ACN model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 The Matrix Design Automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Partial Matrix Design Automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6 Matrix DSM Generated by Simon . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1 KWIC Sequential Design Architecture [57] . . . . . . . . . . . . . . . . . . . . . 42
4.2 KWIC Information Hiding Design Architecture [57] . . . . . . . . . . . . . . . . 43
4.3 KWIC Sequential Design Constraint Network . . . . . . . . . . . . . . . . . . . . 45
4.4 KWIC Information Hiding Design Constraint Network . . . . . . . . . . . . . . . 46
4.5 Simon Clustering GUI for the KWIC IH Design . . . . . . . . . . . . . . . . . . . 50
4.6 Tool Snapshot: KWIC SD Design Impact Analysis Input . . . . . . . . . . . . . . 52
4.7 Tool Snapshot: KWIC SD Design Impact Analysis Output . . . . . . . . . . . . . 53
4.8 Partial Non-deterministic Finite Automaton for SD and IH design . . . . . . . . . 54
4.9 KWIC SD Derived DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.10 KWIC IH Derived DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.11 NOV Computation for Manual KWIC SD . . . . . . . . . . . . . . . . . . . . . . 57
4.12 NOV Computation for Manual KWIC IH . . . . . . . . . . . . . . . . . . . . . . 58
4.13 NOV Computation for Derived KWIC SD . . . . . . . . . . . . . . . . . . . . . . 60
4.14 NOV Computation for Derived KWIC IH . . . . . . . . . . . . . . . . . . . . . . 61
5.1 Partial KWIC Information Hiding ACN model . . . . . . . . . . . . . . . . . . . . 63
5.2 Conjunctive Normal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3 Partial KWIC CNF graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 KWIC Condensation Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5 The First sub-ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.6 The Second sub-ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.7 Partial DA for the Linestorage sub-ACN . . . . . . . . . . . . . . . . . . . . . . . 69
5.8 Partial DA for the CircularShift sub-ACN . . . . . . . . . . . . . . . . . . . . . . 70
5.9 KWIC SD Modularized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.10 A SD sub-ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 OO Observer Pattern UML Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Figure Editor CACN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.3 Complex Augmented Constraint Network . . . . . . . . . . . . . . . . . . . . . . 84
6.4 The Constraint Network in an ACN Generated by Design Alternative FE1 . . . . . 88
6.5 The Constraint Network in an ACN Generated by Design Alternative FE2 . . . . . 89
6.6 The DSM of FE OO: OO Figure Editor Design . . . . . . . . . . . . . . . . . . . 90
6.7 The DSM of FE AO: AO Figure Editor Design . . . . . . . . . . . . . . . . . . . 91
6.8 DIA: Notification Policy Change Impacts . . . . . . . . . . . . . . . . . . . . . . 91
6.9 The DSM of FEOO Role: Screen takes the subject role . . . . . . . . . . . . . . . . 92
6.10 The DSM of FEOO Position: Positions are observed in OO design . . . . . . . . . . 93
6.11 The DSM of FEAO Position: Positions are observed in AO design . . . . . . . . . . 94
7.1 The Brute-Force and Divide-and-Conquer DA Derivation . . . . . . . . . . . . . . 112
7.2 The Brute-Force and Divide-and-Conquer PWDR Derivation . . . . . . . . . . . . 117
8.1 Core Models and Analysis in Simon . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2 Simon: Constraint Network Construction . . . . . . . . . . . . . . . . . . . . . . 125
8.3 Simon: Dominance Relation Construction . . . . . . . . . . . . . . . . . . . . . . 126
8.4 Simon: Cluster Set Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.5 ACN Language Productions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.6 Simon: Complex Augmented Constraint Network . . . . . . . . . . . . . . . . . . 128
8.7 Simon: Design Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.8 Parameterize a CACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.9 Automatically Generated ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.10 CACN Language Productions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.11 Simon: Solve Constraint Network . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.12 Simon: Decompose a Large Constraint Network . . . . . . . . . . . . . . . . . . . 134
8.13 Simon: Sub-ACNs are Solved . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.14 Simon: Design Automaton and Pair-wise Dependence Relation Generation . . . . 136
8.15 Design Impact Analysis: Select an Original Design . . . . . . . . . . . . . . . . . 137
8.16 Design Impact Analysis: Specify a Change . . . . . . . . . . . . . . . . . . . . . 138
8.17 Design Impact Analysis: Evolution Paths . . . . . . . . . . . . . . . . . . . . . . 138
8.18 Design Structure Matrix Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.19 Net Option Value Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.1 WineryLocator OO Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
9.2 WineryLocator Aspect-Oriented Design . . . . . . . . . . . . . . . . . . . . . . . 148
9.3 Derived WineryLocator Design Rule DSMs . . . . . . . . . . . . . . . . . . . . . 149
9.4 Collapsed WineryLocator Design Rule Design . . . . . . . . . . . . . . . . . . . . 150
9.5 HyperCast OO Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.6 HyperCast OO Derived DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.7 HyperCast OO Manually-Constructed DSM [63] . . . . . . . . . . . . . . . . . . 156
9.8 HyperCast AO DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.9 Galileo Design Rules CACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.10 Galileo: Design Rules and New Features . . . . . . . . . . . . . . . . . . . . . . . 160
9.11 Galileo Error Handling Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.12 Design Structures using Different Error Handling Options . . . . . . . . . . . . . 163
9.13 Add New Views based on Different Error Handling Options . . . . . . . . . . . . 163
List of Tables
3.1 Matrix Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 The Variables of IH sub-ACNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 The Variables of SD sub-ACNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.1 Performance for WineryLocator OO Model . . . . . . . . . . . . . . . . . . . . . 146
9.2 Performance for WineryLocator DR Model . . . . . . . . . . . . . . . . . . . . . 147
9.3 Performance for WineryLocator AO Model . . . . . . . . . . . . . . . . . . . . . 147
9.4 Performance for HyperCast OO Model . . . . . . . . . . . . . . . . . . . . . . . . 154
9.5 Performance for HyperCast Obliviousness Model . . . . . . . . . . . . . . . . . . 154
9.6 Performance for HyperCast DR Model . . . . . . . . . . . . . . . . . . . . . . . . 155
Chapter 1
Introduction
1.1 Modularity in Design
People have long recognized that evolvability, achieved most fundamentally by appropriate modu-
larity in design, can have enormous technical, organizational and ultimately economic value. How-
ever, activities to achieve appropriate modularity in design, such as refactoring, are not free, but can
incur both direct costs and indirect costs in the form of tradeoffs against other key properties, such
as performance and time to market [14]. Reasoning about the evolvability properties and economic
implications of design structures is critical to high-consequence design decision-making. How-
ever, such reasoning remains difficult, in part because we remain without a formal framework for
modeling and analysis of problems in this domain.
The challenge is that we lack both formal theories linking design structures to their evolvability
and economic properties, and automated techniques facilitating such economics-oriented
decision-making. One key impediment is the lack of analyzable design representations that not only convey
conceptual designs, but also enable designers to reason about the structure of coupling relations on
high-level design decisions and their economic implications. This dissertation contributes such an
analyzable representation and a number of automated evolvability and economic-related analysis
techniques.
We were motivated to conduct this research in part by a conversation with two practicing soft-
ware engineers, who described a dilemma they faced at work. The engineers worked for a small
company that earned revenues by delivering to a paying customer a stream of enhancements to a
software tool. The engineers who did the work were responsible both for estimating the time (and
thus cost) to make each enhancement and for implementing the selected enhancements. They were
quite good at estimating, but dissatisfied with the system design, believing that it significantly
complicated the implementation of each new feature, and thus increased its cost and time. They had
proposed to management that the tool be restructured. However, management, concerned about
disrupting the flow of enhancements, and thus of the revenues on which the company depended, and
having no clear model of the expected benefits from restructuring, declined. A key problem was
that the engineers had neither the training nor the tools to analyze the situation quantitatively or to
frame it in the economic terms that might have been compelling to business decision-makers. As a
result, the engineers were dissatisfied, and the company incurred a possibly significant unnecessary
cost.
The problem that this company faced is one that organizations everywhere grapple with. Should
we make a costly investment in design, in this case, in restructuring a design? The problem we set
out to address is that, as a discipline, we still largely lack the testable scientific models needed
to analyze design decisions such as this one in terms that make sense both technically and eco-
nomically. Without testable and validated analysis methods and tools, it will remain hard for engi-
neers and managers to reliably make such costly investment decisions in software design. Parnas’s
well-known changeability analysis has strong economic implications, and his information hiding
principle [54], which aims to isolate design decisions that are anticipated to change, has remained
influential for decades. However, rigorous, quantitative, and automated approaches to applying
such reasoning remain largely absent.
Reasoning about evolvability and economic properties rigorously first requires an analyzable
design representation based on which the designer can express, analyze, and compare different
choices in terms of their respective economic impacts. Source code dependence structures, such
as call graphs, have been used as proxies of higher-level design coupling structures to facilitate
modular structure analysis. However, designers often need to answer important questions before
committing to implementation decisions: how best to accommodate changes in designs or in external
conditions, whether to invest in costly restructuring of complex systems, how best to modularize
designs, how to align architecture and business strategy [71], etc. Nor is it yet clear that source code
structure is a sufficiently reliable proxy for more abstract coupling structures. Most prevailing de-
sign level representations are not designed for this purpose. We identify a number of problems that
make current design representations unsuitable for value-oriented design analyses, and present a
formal modeling framework to address these problems.
1.2 Current Design Representations
A design representation for value-oriented analyses should support not only conceptual design de-
scriptions, but also the reasoning about the structure of coupling relations on high-level design
decisions and their economic implications, precisely, rigorously, and, ideally, automatically. Un-
fortunately, most prevailing design modeling techniques, such as the unified modeling language
(UML) [15] and architecture description languages (ADLs) [35], are not designed for this purpose
and are ill-suited to serve as the proper media. The obstacles include the following:
First, some design decisions or external conditions, such as hardware conditions, security
requirements, and user profiles, are not part of the program but could affect software evolution or impact
other design decisions. Parnas’s changeability analysis in his seminal paper explicitly considers
several external conditions that drive design evolution, such as the memory size and input file size,
some dimensions in which decisions are likely to change, such as “how to store data in mem-
ory” [54], and different choices within each dimension. Most prevailing design representations are
not designed to capture varieties of such dimensions, making it difficult to analyze the impact of
changes in these dimensions.
Second, a decision at one level often alters a design structure by introducing new variables and
constraints at lower levels. Examples include the choice of design patterns, or the decision to add a
new feature. The structure of a design is thus not flat and fixed but is, in general, contingent on prior
decisions and recursive in structure. State-of-the-art design modeling approaches do not adequately
represent this aspect of real design structures. Consequently, it is difficult to analyze the structural
consequences of, and tradeoffs involved in making such high level design decisions.
Third, the effects of design decisions are frequently not local but crosscutting. For example, in
an observer pattern design, all the subjects have to respect the agreed update protocol [33]. Pre-
vailing design modeling techniques do not adequately represent design decisions with crosscutting
effects, making it difficult to analyze their impacts.
Baldwin and Clark’s design rule theory [7] uses a matrix-based design model, called a design
structure matrix (DSM) [62, 25, 7], to represent design coupling structures, and they then employ a
model to statistically account for the economic implications of the modular structures of computer
system designs. The key idea in their work is that modules create valuable options, and their
model predicts the economic value of these options. A DSM model represents design decisions and
external conditions in a general way: they uniformly appear as design variables labeling the columns
and rows of a matrix. Marked cells in the matrix represent the pair-wise dependence relation among
design variables. The DSM is simple but powerful in that it captures key notions of design modularity,
as Baldwin and Clark point out [7]. Chapter 2 briefly introduces DSM modeling and Baldwin and
Clark’s economic analysis.
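To make the DSM concept concrete, the following Python sketch represents a DSM as a boolean matrix built from a pair-wise dependence relation. This is purely illustrative: the variable names are invented for this example, and real DSM tools (including the Simon tool described later) of course do far more.

```python
# Illustrative sketch: a DSM as a boolean matrix over design variables,
# with marked cells recording the pair-wise dependence relation.
# Variable names here are hypothetical, not from the dissertation's case studies.
variables = ["data_structure", "algorithm", "io_format"]
depends = {("algorithm", "data_structure"), ("data_structure", "algorithm")}

index = {v: i for i, v in enumerate(variables)}
n = len(variables)
dsm = [[False] * n for _ in range(n)]
for a, b in depends:
    dsm[index[a]][index[b]] = True  # row a depends on column b

# Render the matrix: 'x' marks a dependence, '.' marks the diagonal.
print(" " * 16 + " ".join(v[:4].ljust(4) for v in variables))
for v in variables:
    row = dsm[index[v]]
    cells = ["." if v == u else ("x" if row[index[u]] else " ") for u in variables]
    print(v.ljust(16) + " ".join(c.ljust(4) for c in cells))
```

The mutual marks between `data_structure` and `algorithm` mirror the kind of bidirectional coupling that motivates design rules to break such cycles.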
Previous work of Sullivan, Griswold, and the author of this dissertation [64] applied Baldwin
and Clark’s modeling and analysis techniques to software design, showing that a DSM can visu-
ally represent the criterion of information hiding modularity and support quantitative modularity
analysis. In Lopes’s paper [49] and our more recent work [63], Baldwin and Clark’s modeling
and analysis techniques were used to compare aspect-oriented designs, verifying that one design is
better than another both visually and quantitatively.
However, these manual modeling and statistical analysis techniques have severe limitations for
rigorous evolvability analyses. First, as with the UML and most ADLs, a DSM does not explicitly
represent the choices available within each dimension, such as the choice of observer pattern
versus mediator pattern [33]. Consequently, analyzing the impact of changes in such decisions in
general remains difficult. Second, a DSM does not represent the multiple ways in which a change
in one design decision can be accommodated by changes in other decisions. In such cases, a DSM
becomes ambiguous and insufficient to support analyses that answer important questions such
as: “which is the best available compensation in terms of cost?” In addition, due to its informal
and ambiguous nature, we have found that building such DSM models is error-prone and time-
consuming.
1.3 Formal Modeling and Automated Analysis
To address these problems, this dissertation presents a formal design modeling framework, con-
tributing three core models representing decision-making phenomena from different perspectives.
These models connect conceptual software designs with a number of evolvability and economic
analyses.
1.3.1 Formal Models
As the basis for rigorous design analysis, we employ a formal model called a Constraint Network
(CN) to model design dimensions and external conditions in a general way. In a CN, variables
represent dimensions in which design decisions are made, values represent design decisions, and
logical constraints model required relations. For example, the following CN models the choices of a
matrix data structure, the choices of the algorithm, and one of their relations as two scalar variables
and one logical expression:
1: scalar matrix_data_structure:(array, list);
2: scalar matrix_algorithm:(array, list);
3: matrix_algorithm=array <=> matrix_data_structure=array;
The scalar variables matrix_data_structure and matrix_algorithm, in lines 1 and 2, represent
the dimensions of data structure and algorithm. Their domains follow within the parentheses, mod-
eling the choices within each dimension. For example, Line 1 models that the choices for the data
structure dimension are array and list. A constraint network models the interdependence relation
among design variables and environment conditions as a set of logical constraints. For example,
line 3 states that the choice of an array algorithm is valid if and only if the selected data structure is
array-based.
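To make these semantics concrete, the tiny CN above can be sketched in a few lines of Python. This is our illustrative encoding, not part of the dissertation's tooling: each variable carries a finite domain, each constraint is a predicate over assignments, and the design space is the set of assignments satisfying every constraint.

```python
from itertools import product

variables = {
    "matrix_data_structure": ("array", "list"),
    "matrix_algorithm": ("array", "list"),
}

# Line 3 of the CN: the array algorithm is valid iff the data structure is array-based.
constraints = [
    lambda a: (a["matrix_algorithm"] == "array")
              == (a["matrix_data_structure"] == "array"),
]

def solutions(variables, constraints):
    """Enumerate every assignment of domain values that satisfies all constraints."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            yield assignment

states = list(solutions(variables, constraints))
```

Enumerating the four candidate assignments leaves exactly two consistent design states: both decisions `array`, or both `list`.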
A purely logic-based design description is not sufficient for design modularity analysis. For
example, the dominance relation among design decisions plays an important role in Baldwin and
Clark’s modularity analysis, but is not part of a constraint network. We address this problem by
augmenting a pure constraint network with additional data structures, and call it an Augmented
Constraint Network (ACN).
To link a conceptual design with its evolvability and economic properties, we derive a design
evolution model that represents the change dynamics within that part of a design space defined by
an ACN. We call this model a design automaton (DA). The states of a DA represent design states
(assignments of decisions in each of the dimensions) that satisfy all of the constraints. Transitions
model changes in design driven and labeled by changes to individual design decisions, where the
destination state for a given starting state and change differs from the starting state in a way that is
minimally sufficient to restore consistency. A DA captures all of the possible ways in which any
change to any decision in any state of a design can be compensated for by changes to minimal
subsets of other decisions. A DA enables quantitative changeability analysis. For example, given a
changing decision or condition, this framework computes how many ways there are to accommodate
this change, and how many decisions should be reconsidered in each way.
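A minimal sketch of this DA derivation, reusing the two-variable CN from Section 1.3.1 (the encoding and function names are ours, purely for illustration): states are the consistent assignments, and a transition for a changed decision leads to the consistent states that restore consistency with the fewest other reconsidered decisions.

```python
from itertools import product

variables = {"matrix_data_structure": ("array", "list"),
             "matrix_algorithm": ("array", "list")}
constraints = [lambda a: (a["matrix_algorithm"] == "array")
                         == (a["matrix_data_structure"] == "array")]

def consistent_states():
    """All assignments satisfying every constraint (the DA's states)."""
    names = list(variables)
    return [dict(zip(names, vs))
            for vs in product(*(variables[n] for n in names))
            if all(c(dict(zip(names, vs))) for c in constraints)]

def transitions(src, var, value):
    """Minimal compensating destinations for setting `var` to `value` in state `src`."""
    candidates = [s for s in consistent_states() if s[var] == value]
    def cost(s):  # number of *other* decisions that had to be reconsidered
        return sum(1 for n in s if n != var and s[n] != src[n])
    best = min(cost(s) for s in candidates)
    return [s for s in candidates if cost(s) == best]

start = {"matrix_data_structure": "array", "matrix_algorithm": "array"}
# Switching the data structure to a list forces the algorithm to follow.
dests = transitions(start, "matrix_data_structure", "list")
```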
A pair-wise dependence relation (PWDR) among design elements appears to be a useful model
underlying many influential design representations. In box-and-arrow style representations, such
as ADLs, the UML, call graphs, and Reflexion Models [53], the arrows model different kinds of
pair-wise dependence relations among boxes, such as function calls, inheritance, and system I/O.
The pair-wise dependence relation among design decisions is the core data structure underlying
Baldwin and Clark’s theory. The DA model provides a precise definition of what it means for one variable to depend on
another: we define two design variables to be pair-wise dependent if, for some design state, there
is some change to the first variable for which the second must change in at least one of the minimal
compensating state changes.
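Under this definition, the PWDR can be computed mechanically from the DA. The following sketch (again our own illustrative code over the two-variable example, not the dissertation's tool) records a pair (a, b) whenever, in some state, some change to a forces b to change in a minimal compensation:

```python
from itertools import product

variables = {"matrix_data_structure": ("array", "list"),
             "matrix_algorithm": ("array", "list")}

def ok(a):  # the single constraint of the example CN
    return (a["matrix_algorithm"] == "array") == (a["matrix_data_structure"] == "array")

names = list(variables)
states = [dict(zip(names, vs)) for vs in product(*(variables[n] for n in names))
          if ok(dict(zip(names, vs)))]

def minimal_compensations(src, var, value):
    """Consistent states with var=value that change the fewest other decisions."""
    cands = [s for s in states if s[var] == value]
    cost = lambda s: sum(1 for n in s if n != var and s[n] != src[n])
    best = min(cost(s) for s in cands)
    return [s for s in cands if cost(s) == best]

def pwdr():
    deps = set()
    for src in states:
        for var in names:
            for value in variables[var]:
                if value == src[var]:
                    continue  # only genuine changes
                for dest in minimal_compensations(src, var, value):
                    for other in names:
                        if other != var and dest[other] != src[other]:
                            deps.add((var, other))  # `other` depends on `var`
    return deps
```

In this symmetric example, each variable depends on the other, so the derived PWDR contains both ordered pairs.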
1.3.2 Automated Analysis
This representation formalizes the key concepts of Parnas’s and Baldwin and Clark’s modularity
theories, such as design spaces, design dimensions, and design rules, and enables the automation of
their analysis techniques. (1) Parnas’s changeability analysis can be formalized rigorously in a DA
model. The question of finding all the ways to compensate for an anticipated sequence of individual
changes can be formulated as a mapping from a DA, an assignment modeling the current design,
and a sequence of variable-value pairs that model changes, to a set of sequences of consistent
design states modeling the feasible evolution paths for the given sequence of changes. To solve
this problem, we find the paths that start from the initial design and go along the edges labeled
with specified changes. Each path represents one way to compensate for the given changes. The
destination states are the possible new design states accommodating the given sequence of changes.
(2) Parnas’s information hiding principle can be formalized as a mechanically checkable predicate
based on an ACN and the derived PWDR model, stating that a PWDR derived from an ACN should
not have any pair with a first element in an environment module, and the second in a design rule
module, formalizing the previous observation of Sullivan et al. obtained from DSM models [64]. (3)
The PWDR model can be used to populate a DSM model that has proven utility in other engineering
realms. As a result, in principle, analysis techniques available in other engineering realms, such
as project scheduling, can be applied to software design. (4) Since DSM modeling is at the heart
of Baldwin and Clark’s theory and analysis, this framework supports the automation of their Net
Option Value (NOV) analysis for software designs that can be expressed in terms of our modeling
framework.
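Analysis (1) above, finding all feasible evolution paths for a sequence of changes, can be sketched as a simple search over the DA. The code below is a hypothetical miniature of what Simon automates, again using the two-variable example CN:

```python
from itertools import product

variables = {"matrix_data_structure": ("array", "list"),
             "matrix_algorithm": ("array", "list")}
ok = lambda a: (a["matrix_algorithm"] == "array") == (a["matrix_data_structure"] == "array")
names = list(variables)
states = [dict(zip(names, vs)) for vs in product(*(variables[n] for n in names))
          if ok(dict(zip(names, vs)))]

def step(src, var, value):
    """One DA transition: minimal compensating destinations for one change."""
    cands = [s for s in states if s[var] == value]
    cost = lambda s: sum(1 for n in s if n != var and s[n] != src[n])
    best = min(cost(s) for s in cands)
    return [s for s in cands if cost(s) == best]

def evolution_paths(start, changes):
    """All state sequences realizing the given sequence of individual changes."""
    paths = [[start]]
    for var, value in changes:
        paths = [p + [dest] for p in paths for dest in step(p[-1], var, value)]
    return paths

start = {"matrix_data_structure": "array", "matrix_algorithm": "array"}
paths = evolution_paths(start, [("matrix_data_structure", "list"),
                                ("matrix_algorithm", "array")])
```

Each returned path is one way to compensate for the change sequence; its last element is a possible new design state.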
1.4 Model Decomposition
As with many formal analysis techniques, such as model checking, the difficulty of constraint satis-
faction limits the size of models that can be analyzed in practice. Our DA model requires an explicit
representation of the entire space of satisfying solutions, but the number of the solutions increases
exponentially with the number of variables involved. For example, Parnas’s KWIC information
hiding design ACN with 20 variables has 34,907 solutions, which are the input to our DA derivation
program. Because Alloy [41], which we use as a SAT solver, was not designed to perform exhaustive
analysis, obtaining the analysis results takes hours in total. To address this problem, we create a
method to decompose a large ACN model into a number of smaller sub-ACNs, solve each sub-ACN
individually, and integrate the analysis results. The integrated results are equal to the results
obtained by analyzing the full ACN model. This approach splits the whole KWIC information hiding
ACN into 6 sub-ACNs, having 6, 6, 4, 5, 7, and 5 variables respectively. Our supporting tool, Simon,
now invokes multiple SAT solvers and DA processors separately to deal with these much smaller
models, and integrates the results on the order of seconds.
1.5 Model Extension
Although the ACN, DA, and PWDR models have the potential to enable automated evolvability and
economic analyses, as a conceptual design description model, an ACN is not sufficient to capture
several complex design decision-making phenomena that people encounter frequently. First, as
Baldwin and Clark point out [7], some design dimensions are “called into being” by other decisions.
For example, a decision to add a new feature brings into being a number of new dimensions specific
to that feature. Scalar-valued design variables are not sufficient to model these decisions and their
impacts. Second, it is not uncommon that a decision brings into being not only new dimensions but
also new constraints among these new dimensions, or constraints between new dimensions and
existing dimensions. For example, a choice of design pattern not only brings new dimensions
that are specific to the pattern, but also imposes pattern-specific constraints on new and existing
design elements. Third, design decisions can crosscut each other. For example, an observer pattern
requires that “all the objects taking the subject role should implement the prevailing notification
policy.” When a new object is added to the system as a subject, as part of the impact analysis, the
designer should be aware of the notification policy in use, and of other constraints imposed by the
choice of an observer pattern.
We extend the ACN model into a complex augmented constraint network (CACN) to support
the modeling and analysis of these complex design decisions. A CACN, in essence, represents a
family of ACNs. Extending the definition of an ACN, a CACN uses set-valued variables to model
dimensions in which a decision brings into being a set of new dimensions, uses values with subspaces to
model decisions with recursive sub-design structures, and uses logical quantifications to model the
crosscutting effects among decisions. For example, a CACN models the choices of patterns using
a subspace variable: subspace d_pattern: (observer, mediator), each value bringing up a
new design space represented by a recursive CACN. The following quantified expression:
∀ object : observer role • object = orig ⇒ update protocol = orig
models the crosscutting constraint that each object taking an observer role has to observe the agreed
update protocol. object = orig means keeping the object as currently designed; update protocol =
orig means that the update policy is as originally agreed. orig is short for “original”, modeling a
current choice or decision.
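The quantified constraint can be read operationally as a check over every object bound to the observer role. A small Python rendering of that check follows (our illustration only; the object names and dictionary encoding are hypothetical):

```python
def crosscut_holds(assignment, observer_role):
    """∀ object : observer_role • object = orig ⇒ update_protocol = orig"""
    return all(assignment[obj] != "orig" or assignment["update_protocol"] == "orig"
               for obj in observer_role)

# Keeping every observer-role object as designed is consistent with the
# originally agreed update protocol...
design = {"point": "orig", "line": "orig", "update_protocol": "orig"}

# ...but changing the protocol while a subject stays as designed violates
# the crosscutting constraint.
changed = dict(design, update_protocol="changed")
```

Evaluating `crosscut_holds` on `design` succeeds, while on `changed` it fails, mirroring the impact analysis a designer must perform when the notification policy changes.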
Changing these complex design decisions incurs structural impacts. To analyze these impacts,
the designer instantiates a CACN model into a set of simpler design models represented by ACNs.
As a result, a range of analyses developed for ACNs can be applied—for assessing modularity,
evolvability, economic and other design properties.
1.6 Prototype Tool
We developed a tool called Simon to support formal design modeling and to automate the associated
evolvability and economic analyses. Using Simon, the user can build ACNs and CACNs
through interactive GUIs, derive design coupling structures and present them using DSMs, apply
Baldwin and Clark’s net option value (NOV) analysis [7], and analyze design change impacts.
Using Simon, we have evaluated our framework against a number of case studies.
1.7 Evaluation
The ultimate goal of this research is to enable software designers to make value-oriented design
decisions in a rational way, facilitated by automatic tools. The purpose of this dissertation is to
provide a formal analyzable design modeling framework, one important step towards this goal.
This dissertation claims and evaluates the following thesis:
• This framework provides a formal account of the key concepts of important but informal
modularity theories. (1) It formalizes Baldwin and Clark’s key notions of design dimension,
design decision, design decision dependence, and design space. (2) It formally accounts for
Parnas’s concept of information hiding modularity as a mechanically checkable predicate.
• This framework enables the derivation of design coupling structures in the form of pair-wise
relations on design decisions, and thus also the derivation of DSMs from ACNs. The benefit
is that the approach enables designers to reason about modularity in design architecture using
both the methods of Baldwin and Clark (but in terms of an abstract and formally precise
representation), as well as new kinds of analysis.
• This framework automates basic evolvability analyses such as design impact analysis. Given
a sequence of changing decisions or conditions, this framework computes how many ways there
are to accommodate these changes, and how many decisions should be reconsidered in
each way.
• Our model of modularity in design is general. In particular, it can account for both traditional
object-oriented notions of modularity and newer aspect-oriented notions within a unified,
declarative framework.
The evaluation of this framework partially involves the formal account of the key concepts
of important but informal modularity theories: (1) the formalization of Baldwin and Clark’s key
notions of design dimension, design decision, design decision dependence, design space, and design
rule; (2) the formal account of the extensions to Baldwin and Clark’s approach made by Sullivan
et al.; and (3) the formal account of Parnas’s concept of information hiding modularity. Chapter 7
presents these formalized concepts.
We also evaluate the ability of this framework to connect conceptual designs with important
evolvability and economic analyses, in particular, to Parnas’s changeability analysis, coupling struc-
ture analysis supported by design structure matrices, and Baldwin and Clark’s net option value
(NOV) analysis. Our basic evaluation strategy in this aspect is to model software designs in which
people have analyzed problems that have strong economic implications, automate these analyses
using Simon, and compare the results with the previous qualitative analysis results. The purpose
is to evaluate: (1) the expressiveness of the framework; (2) the ability to automate these analyses;
and (3) the accuracy of the analyses. Our case studies include both canonical designs and designs
of several real systems.
The canonical designs include the famous software engineering benchmark, the Key Word in
Context (KWIC) system from Parnas’s seminal information hiding paper [54], and the widely used
Figure Editor (FE) example [43, 37, 33]. These two designs represent very different design styles:
KWIC represents functional and object oriented designs; the Figure Editor design manifests broader
contemporary approaches using design patterns and aspect-oriented programming. The commonly-
used representation approaches for these two designs are different: researchers have frequently
represented KWIC architectures in a standardized way [57], while the UML models for different
variations of the FE example appear in a large number of recent publications. Modeling them
uniformly evaluates the expressiveness and abstraction ability of this framework.
The analyses people illustrate using these designs have some similarities in essence: Parnas
uses KWIC to analyze design changeability; Hannemann and Kiczales [38] use FE to compare
their aspect oriented (AO) design patterns with the object oriented (OO) design patterns in terms
of their ability to accommodate envisioned changes. Analyzing well-known problems in published
work in a quantitative and automated way demonstrates the potential utility of this framework.
Our case studies also include the following real designs:
1. A web application developed and studied by Lopes et al. [49] (WineryLocator). In their pa-
per, Lopes et al. use Baldwin and Clark’s modeling and analysis technique to quantitatively
compare different designs. To evaluate the expressiveness of our framework and the correct-
ness of the analysis results, we represent these designs as ACNs according to their design
descriptions, generate DSMs, and compare them with their manually constructed DSMs. We found
ambiguities and problematic issues in their published manual models and analysis.
2. A peer-to-peer networking system, HyperCast [48, 47], developed by network researchers
at the University of Virginia and studied by Sullivan et al. [63]. Similar to the WineryLoca-
tor paper, the authors compared different designs using manual models. Remodeling these
designs into our framework and analyzing them automatically reveals important issues in the
manual models.
3. The Galileo dynamic fault tree analysis tool, developed at the University of Virginia for
production use at NASA [66,65,24]. The Galileo designers once faced a situation when they
had to make a decision about how to restructure part of the system. They reached a decision
based on discussions and arguments, rather than rigorous analysis. A retrospective analysis
of this historical scenario using Simon shows how the designers might have been able to
compare different decisions and to justify their decision rigorously.
The errors revealed in manually constructed DSMs imply potential errors in the subsequent
quantitative analyses based on these models. These case studies provide support for the claimed
modeling and analysis ability of our framework, revealing the power of formal models and automated
analysis. The ultimate utility of this framework and the tool will continue to be evaluated in
realistic settings, as part of our future work.
1.8 Overview
Chapter 2 presents the background of this work. Chapter 3 illustrates the full picture of our core
models and associated analyses using a small example. Chapter 4 uses the famous software engi-
neering benchmark, the Key Word in Context (KWIC) system, to illustrate the problems with exist-
ing design representation approaches and how our modeling framework addresses these problems.
This chapter shows how Simon automates our framework and related well-known economic-related
analyses, reveals errors and weaknesses in published work, and rationally justifies design decisions
previously made in intuitive and qualitative ways. Chapter 5 presents our divide-and-conquer ap-
proach to addressing the scalability issue. Chapter 6 presents our extended CACN model using
the widely used Figure Editor design as a running example, and shows how a CACN model sup-
ports structural design impact analysis. Chapter 7 presents the key formalizations behind Simon.
Chapter 8 presents how Simon implements our framework. Chapter 9 further evaluates our frame-
work by modeling and analyzing the three real software designs, demonstrating its potential utility.
Chapter 10 evaluates this work as a whole. Chapter 11 concludes.
Chapter 2
Background
A great deal of research has been done on evolvability and modularity in software design and more
recently on the economics of design. This chapter presents the most important and relevant prior
work in this area, and explains both its strengths and where it falls short in relation to the goals we
have set forth.
2.1 Modularity in Design
People have recognized for decades that the structure of the coupling relation on design decisions
is a key factor influencing the evolvability and economic properties of a design [2, 5, 58, 61, 16].
Christopher Alexander [2] defines design as “the process of inventing things which display new phys-
ical order, organization, form, in response to function...”, discusses the process by which a form is
adapted to the context that has called it into being, and shows that such an adaptive process will be
successful only if it proceeds piece by piece instead of all at once, that is, by creating subsystems
of the adaptive process.
Software designers seek to structure software systems into modules (subsystems), to better ac-
commodate expected changes (adapt to context changes), to have parts that can be developed and
evolved without further coordination, and to ease the understanding of complex designs through ab-
straction of details hidden within modules. Constantine et al. [61] emphasize the need for designers
to manage the coupling between modules (subsystems) and cohesiveness within them: modules
with high cohesion and low coupling imply desirable properties of software including robustness,
reliability, reusability, and understandability.
In the study of OS/360 and other large systems, Belady and Lehman observed the rising cost of
change caused by decaying structure due to the accumulation of unanticipated changes, explicitly
connecting the changeability of design structures with their economic impacts [13]. Parnas’s infor-
mation hiding design criterion dictates that designers decompose systems into modules in order to
hide (and thus to decouple) decisions that are difficult or likely to change. Recent work, such as
object-oriented and component-based software development, follows these ideas, taking objects or
components as modules with the assumption that they hide the decisions that are likely to change.
The limitations of these dominant methods have been recognized and challenged, for example, by
aspect-oriented programming researchers.
These important theories, guidelines, and principles remain intuitive and heuristic, and progress
in programming languages has not solved decision-making problems in design, such as the refactoring
story we introduced in the previous chapter. Among the reasons is that we lack both scientific
theories to rigorously account for the economic implications of design structures, and automated
techniques facilitating value-oriented decision-making. Although Parnas’s well-known changeabil-
ity analysis has strong and explicitly noted economic implications, we still remain without a scien-
tifically rigorous formulation of the idea or a quantitative, automatable approach to applying it in
design modeling and analysis.
2.2 Software Economics
Researchers have recently explored the possibility of importing rigorous analysis methods from other
engineering and economic realms into software engineering. Sullivan was among the first in the soft-
ware engineering community [67, 44] to suggest that work from the financial economics community
on real options [3, 23] might provide a link from technical notions of modularity in design, phased
project structures (such as the spiral model [26]), and strategic timing of software design commit-
ments to economic measures of goodness. Withey [70] applied a related analysis to reason about
the flexibility value of software product line architectures. Favaro [28] developed an options-based
approach to investment analysis for software reuse infrastructures.
Carliss Baldwin and Kim Clark at the Harvard Business School developed a similar idea, pub-
lished in their book, Design Rules: The Power of Modularity [7]. Their goal was a plausible sci-
entific hypothesis accounting for observed large-scale changes in the structure of the computer
industry over several decades, from a set of vertically integrated companies to a set of highly mod-
ular clusters: companies organized around particular components of the computer (CPU, operating
system, motherboard, etc.). Their idea is that modularity in design creates economic value in the
form of real options. These are options to invest in multiple R&D experiments within modules
rather than at the whole-system level, and then to select the best of resulting outcomes. Multiple
companies within a sector are, in essence, exploring the space of possible designs by conducting
such experiments. System integrators (such as Dell in the PC sector) serve to select
the best available outcomes at any given point in time. Baldwin and Clark presented a novel real
options valuation model, and argued that the pursuit of the economic value that they had modeled
could account for the large-scale transformation of the industry. Companies saw value within mod-
ules and organized system designs and themselves accordingly. Baldwin and Clark adopted and
adapted the design structure matrix (DSM) as a design representation, framed the notion of a de-
sign rule as a special design decision that serves to split a design into a set of independent modules,
and built an options valuation model for designs expressed in terms of DSMs and design rules.
The work of Sullivan and his colleagues, now including Baldwin, provides the backdrop for the
work presented in this dissertation. In particular, the notion that real options can provide models
to aid decision-makers in the design of software and software-intensive systems remains a com-
pelling but still largely unproven hypothesis. Several hurdles stand in the way of the development,
validation and application of these ideas to actual software and system design. First, the statisti-
cal models of uncertainty underlying risky design experiments remain unvalidated. The stochastic
process models behind most work on real options are especially questionable for technical reasons
beyond the scope of this work. Baldwin and Clark’s new approach to options pricing is based
on extremal order statistics rather than on stochastic processes. However, validation of the new
model remains ongoing work. Second, while the notion of design rules seems to provide a power-
ful new way to think about information hiding modularity in a general sense, the design structure
matrix representation, in terms of which the concept was first developed, appears to be inadequate
to fully support a rigorously precise theory adequate either to the needs of software engineering or
to underpin a scientifically precise and testable theory of design and economic value. (Nor, as we
discuss further below, do our traditional software design representations appear adequate.) Third,
the exploratory and experimental application of these ideas in software engineering research and
application remains at an early stage. We discuss the state of the art in this area in Section 2.4.
In this dissertation we primarily address the second problem: we lack suitable software de-
sign representations to support a scientifically rigorous theory of the economic value of modularity
based on concepts of real options. The next section explains why prevailing box-and-arrow and
component-and-connector representations, as typified by the class diagrams of the unified modeling
language (UML) [15], some architecture description languages (ADL) [35], etc., are not sufficient
to bridge the gap between design structure and rigorous economic reasoning.
2.3 Prevailing Design Representations and Analysis
In most commonly used box-and-arrow style design representations, a box represents the element
to model, and a line connecting two boxes represents a relation between those two elements. In an
ADL model, boxes represent components such as modules, and arrows model various relations
between these modules, such as function calls. In a class diagram of a UML model, boxes represent
classes and lines model their relations. Changeability is one of the problems that software
designers confront frequently, and one that has strong economic implications. However, rigorous and
automated changeability analysis at the design level has not been available. Parnas’s analysis is
descriptive and qualitative; Hannemann and Kiczales compared the changeability of aspect-oriented
implementations of design patterns with object-oriented implementations based on the actual code.
Although these prevailing representations model program structures effectively, they are not
designed to support rigorous design changeability analysis. Designers could neither observe the
impact of changing decisions, nor could they measure these impacts in a quantitative way. Au-
tomating the analysis is even more difficult. Traditional impact analysis research focuses on change
issues at the program level, as summarized in [4]. We are interested in the counterpart at design
level, and identify the following missing elements that are critical to economic-related analysis:
First, the environment conditions and important design decisions that influence software evo-
lution are not modeled. Parnas’s changeability analysis is based on changes in environment condi-
tions, such as external constraints on core size and input size, the dimensions that prevailing design
representations are not designed to model.
Second, important design dimensions and possible choices within these dimensions are not
modeled. For example, in an observer pattern [33], choosing different update policies influences
the elements in the pattern: the push policy requires the subjects send all their data regardless
of what the observers need; the pull policy relies on the observers to request the needed data.
Neither the UML nor ADLs are designed to model and analyze such choices. Czarnecki et al.’s feature
model [21] aims to model and analyze feature variations, but not design decisions in a broader
sense, such as refactoring options, design patterns, and aspects. Design space modeling has also
been studied by Bosch [59], Lane [45], and Feather [29] for product line design, design generation
and optimization. While these design notations can ease communication among designers and help
to guide system implementers, they are not designed to account explicitly and rigorously for the
connections between design structure and economic value, or to help designers make value-oriented
decisions.
Third, these representations do not adequately model the constraints relating such decisions. Arrows
with legends have limited ability to express complex constraints, such as logical disjunction,
implication, and transitive relations. In essence, they lack the rigorous semantics needed for
quantitative analysis.
These environment conditions, design dimensions, possible choices, and the constraints among them
are fundamental to Parnas’s changeability analysis, but they are not representable in prevailing
representation styles, let alone amenable to rigorous and automated analysis. Consequently, the
success of such analysis depends on the designers’ experience.
Fourth, a decision at one level, such as the decision to apply a design pattern or to add a new
feature, often alters a design structure by introducing new variables and constraints. As shown
in Hannemann and Kiczales’s paper [38], the same pattern can be implemented in either an AO or an
OO paradigm. The choices in the pattern and paradigm dimensions have significant consequences
in that each choice calls into being a different design subspace that introduces both new dimen-
sions and constraints that are potentially scoped over other variables in the design. The structure
of a design space is thus not fixed but, in general, is contingent on prior decisions and recursive
in structure, a phenomenon that the state-of-the-art design modeling approaches do not adequately
represent. Consequently, it is difficult to analyze the structural and economic consequences of
making such high level design decisions.
Finally, the effects of design decisions are frequently not local but crosscutting. For example,
all the subjects and observers involved in an application of the observer design pattern [33] have
to respect the agreed notification policy, push or pull. Prevailing design modeling techniques do
not adequately represent design decisions with crosscutting effects. Consequently, it is difficult
to have a clear picture of the structural and economic consequences of making or changing such
crosscutting design decisions.
Jackson [40] uses Alloy for object modeling with the goal of being able to check structural
properties of object models specified using the Alloy relational logic. Garlan et al. [34, 1] used Z
to formalize architectural styles in order to prove mainly behavioral properties of systems in these
styles. Batory [9] uses formal models of software design spaces for systems that vary in component
implementations, aiming to support system generation and reuse.
We have found that Baldwin and Clark’s design rule theory [7] contains a set of concepts, models,
and analysis techniques that can shed light on important software engineering phenomena. The next
section briefly introduces the key notions and analysis models of this emerging approach, how they
relate to software engineering, and what challenges remain.
2.4 Emerging Approach to Economics of Modularity
This section introduces key notions in Baldwin and Clark’s theory that researchers have attempted
to apply to software engineering. Section 2.4.1 introduces design structure matrices (DSMs), a model
that has been used in other engineering realms and is at the heart of Baldwin and Clark’s theory of
modularity. Section 2.4.2 introduces Baldwin and Clark’s economic analysis model, called net option
value, that statistically computes the value of modularity, and its application in comparing software
designs. Section 2.4.3 explains the remaining challenges.
2.4.1 Design Structure Matrices
In work spanning several communities, including engineering systems design [25], design eco-
nomics [7], and software engineering and languages [64, 63, 49], researchers have been developing
and using explicit design space representations to support a range of novel and potentially useful
architectural analysis and decision-support techniques, including techniques that link design struc-
ture to economic value and business strategy. Much of this work has revolved around the design
structure matrix (DSM) as a representation. DSM modeling originated with the work of Steward
dating to the 1960s [62], and has been further developed and applied in the design, analysis and
management of many large-scale engineering systems by Eppinger [25] and others. DSMs are the
primary representations at the heart of Baldwin and Clark’s developing theory of the economics of
modularity [7].
DSMs present in a graphical form the pair-wise dependence structure of designs and of their cor-
responding development and evolution process. Figure 2.1 shows the DSM model for the FE design
using the OO observer pattern. The rows and columns of a DSM are labeled with design variables,
representing dimensions for which the designers must make design decisions. For example, we rep-
resent the need for a notification policy decision with a design variable policy_notify. The ab-
stract interfaces Subject and Observer are modeled by variables adt_subject and adt_observer
respectively. A marked cell indicates that the decision of the dimension on the row depends on the
decision of the dimension on the column. In Figure 2.1, the cell in row 6, column 1, indicates that
      0 1 2 3 4 5 6 7 8
0: color_policy_observing  .
1: policy_notify           .
2: policy_update           .
3: d_mapping               .  x
4: adt_observer            .  x
5: adt_subject             x x x .
6: point_elements          x x x x x .
7: line_elements           x x x x x .
8: screen_elements         x x .
Figure 2.1: OO Observer Pattern DSM
how the Point should be designed depends on the notification policy in use.
Some design decisions dominate others. Baldwin and Clark define a design
decision as a design rule [7] if it is made before, and respected by, subordinate design decisions;
is deemed stable; and can decouple otherwise coupled decisions. The decision on an abstract interface
dimension can be seen as a design rule dominating other implementation decisions. The fact that
environment conditions are outside of the designer's control is another source of dominance.
For example, the observing policy defining the states and transitions of interest, color or position,
could be a changeable part of the system specification not decided by the designer. We call variables
modeling such environment conditions environment variables. A DSM models the existence of a
dominance relation by asymmetric dependences. In Figure 2.1, the DSM models that the decision
on the notification policy dominates the decision on the implementation of the Point element by the
lack of the symmetric mark in the cell of row 1 and column 6.
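The dominance reading of asymmetric marks can be sketched directly in code. The following is our own illustration, not part of the dissertation's tooling; the mark set is a hypothetical fragment suggested by Figure 2.1, not the full DSM:

```python
def asymmetric_pairs(marks):
    """Given DSM marks as (row, column) pairs, meaning the row decision
    depends on the column decision, return the asymmetric dependences:
    pairs (a, b) where a depends on b but b does not depend on a.
    Each such pair signals that b dominates a."""
    return {(a, b) for (a, b) in marks if (b, a) not in marks}

# Hypothetical fragment: point_elements (row 6) depends on policy_notify
# (column 1), with no symmetric mark in row 1, column 6; the mutual
# marks between adt_subject and point_elements are invented for contrast.
marks = {("point_elements", "policy_notify"),
         ("point_elements", "adt_subject"),
         ("adt_subject", "point_elements")}
```

Applied to this fragment, `asymmetric_pairs(marks)` reports only `("point_elements", "policy_notify")`: the notification policy dominates the Point element, while the invented mutual dependence is filtered out as symmetric.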
Sullivan et al. [64] showed that DSM modeling in the style of Baldwin and Clark should be ex-
tended with an explicit notion of environment variables, and that, so extended, it could account for
and help designers to visualize Parnas’s information hiding criterion. They also presented early ev-
idence of the potential for such modeling to provide an account of the economic (net options) value
of modularity in software design. The more recent work of Sullivan et al. built on such analysis to
provide a critique of the notion of obliviousness in aspect-oriented software design, a comparative
economic-based analysis of alternative approaches to the use of aspect-oriented mechanisms, and a
notion of explicit interfaces for aspect-oriented design [63] along with a practical method for using
crosscutting interfaces (XPIs) in software design [37]. We seek to formalize and support automated
analysis of design questions of the kind analyzed informally and manually, using DSMs, in our
earlier work.
2.4.2 Net Option Value Analysis
To quantitatively characterize the modularity of design in engineering realms, such as computer
system design, Baldwin and Clark propose a net option value model to statistically account for
modular design phenomena [7]. Sullivan et al. [64, 63] and Lopes [49] have previously used Baldwin
and Clark's net option value analysis to quantitatively compare software designs modeled by
DSMs. This section briefly introduces the main ideas.
Suppose that a product has a market value of S0 because of its visible functionalities or properties.
The NOV model estimates the additional value added by the modular structure of its hidden
design. The idea is that modularity creates options to replace existing modules with better ones
that produce higher value, for example, because of improved speed or quality. Modularity thus
creates a portfolio of valuable real options, one per module.
This model states that splitting a design into m modules increases its base value S0 by a fraction
obtained by summing the net option values (NOV i) of the resulting options. NOV is the expected
payoff of exercising a search and substitute option optimally, accounting for both the benefits and
cost of exercising options. This model depends on a number of simplifying assumptions: for example,
it does not take into account the cost of attaining modularity, and it assumes that multiple experiments
are performed on the same module and that the values these experiments generate are normally distributed.
On the other hand, it does capture key phenomena in design. In this model, the value of a product
with m modules, and thus m embedded options, is calculated as:
V = S0 + NOV_1 + NOV_2 + ... + NOV_m, where
NOV_i = max_{k_i} { σ_i n_i^{1/2} Q(k_i) − C_i(n_i) k_i − Z_i }
For module i, σ_i n_i^{1/2} Q(k_i) is the expected benefit to be gained by accepting the best positive-valued
candidate generated by k_i independent experiments. C_i(n_i) k_i is the cost of running k_i experiments,
as a function C_i of the module complexity n_i. Z_i = Σ_{j ∈ sees_i} c n_j is the cost of changing the modules
that depend on (see) module i. The max picks the number of experiments k_i that maximizes the gain for module i.
The most important parameter for NOV analysis is technical potential, σ. The complexity, n,
and visibility cost, Z, by contrast, are derived from a given design model. Technical potential is
the expected variance on the rate of return on an investment in producing variants of a module im-
plementation. On the assumption that the prevailing implementation of a module is adequate, the
expected variance in the results of independent experiments is proportional to changes in require-
ments that drive the evolution of the module’s specification. Complexity can be measured as the
size of the artifact as a proportion of the overall system, using the number of design variables, lines
of code, etc. The visibility cost measures the cost incurred by dependences between modules.
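To make the calculation concrete, here is a minimal Python sketch of the NOV maximization. This is our own illustration, not the dissertation's Simon tool: Q(k) is estimated by Monte Carlo, and the experiment cost is assumed to be linear in complexity, C_i(n_i) = c·n_i, which is one common simplification.

```python
import math
import random

def q(k, trials=20000, seed=0):
    """Monte Carlo estimate of Q(k): the expected value of the best of k
    draws from a standard normal distribution, floored at zero (a
    negative-valued best candidate is simply not accepted)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(k))
        total += max(best, 0.0)
    return total / trials

def nov(sigma, n, z, c, max_k=10):
    """NOV_i = max over k of sigma * n**0.5 * Q(k) - C(n)*k - Z, with the
    assumed linear cost C(n) = c*n.  sigma: technical potential;
    n: module complexity; z: visibility cost from dependent modules."""
    best = 0.0  # not exercising the option at all is always available
    for k in range(1, max_k + 1):
        best = max(best, sigma * math.sqrt(n) * q(k) - c * n * k - z)
    return best
```

Under this sketch, a module with high technical potential and low visibility cost yields a positive NOV, while a module that many others depend on (large z) may yield zero, meaning its substitution option is not worth exercising.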
As Sullivan et al. pointed out [64, 63], the calculated values are not yet validated economic
projections but can only be interpreted as potentially valid indicators at present. It remains an open
challenge to justify precise estimates for real options in software design. However, as a back-of-the-envelope
model, it provides ballpark figures and useful insights. Sullivan et al. first applied this
model to KWIC designs [64], and more recently to a peer-to-peer networking protocol, HyperCast [63].
Lopes et al. applied this model to a web-service application, WineryLocator [49]. All of these works
use the NOV model as a comparative evaluation method to quantitatively compare alternative designs.
2.4.3 Challenges
DSM modeling is at the core of Baldwin and Clark's theory and analysis methods.
Although DSM modeling is powerful enough to reveal design coupling structure, make
the information hiding criterion precise, and support statistical analysis, it falls short in
the following respects:
First, we have found that building DSMs representing conceptual design structures can be error-
prone and time-consuming. Our recent work [18] has revealed errors in published DSMs. Many
of the errors are due to the difficulty of seeing transitive relations among dependencies; and others,
due to the lack of any precise definition of dependence. Recent work on matrix design represen-
tations constructs dependence structures from source code: MacCormack et al. [50] have modeled
the architectures of Mozilla and Linux using DSMs; Sangal et al. [55] have applied a commercial
product called Lattix [46] to analyze the architecture of Haystack, an information retrieval system
that has evolved over several years. In these efforts, the authors use code-level structures and de-
pendences as proxies for conceptual design structures. However, as we have pointed out, designers
frequently face design questions before coding. The number of errors we found even in small design-level
DSM models suggests that manually constructed DSMs are not an adequate pre-coding design representation.
Second, a DSM only represents design dimensions, but not concrete choices within each di-
mension or the semantics of the constraints that relate decisions across dimensions. For example,
possible choices for the notification policy could be either push or pull, each having different con-
sequences. Similar to ADL and UML, DSMs do not explicitly express these choices, nor do they
support the analysis of their consequences.
Third, there are usually multiple ways to accommodate a change, but a DSM model does not
reveal each of them explicitly. Rather, a DSM reflects the union of possible ways in which a
given change might be accommodated. Indeed, in the presence of multiple compensations, the very
meaning of a dependence mark in a DSM becomes unclear: does a mark mean "must change," "is
subject to change in some scenario," or "could be changed but does not have to be"?
In his dissertation [71], Woodward identifies the lack of support for the representation of nested
design spaces as a key shortcoming in DSM modeling. In work that aims to develop a theory of the
relationship between design structure and business strategy, he proposes to address this problem by
representing substitutable alternatives as values in inheritance hierarchies. Our work differs from
his in several ways. We provide precise semantics for dependences through our logical constraints,
and support the modeling of both complex subspaces and crosscutting constraints.
As a result, DSM modeling, despite its utility, has significant weaknesses in terms of
supporting logically precise design analysis or a rigorously formal theory of coupling in design. One
contribution of our formal work, as discussed in the following chapters, is to provide a precise
formulation of the notion of pair-wise dependence between design decisions, which is at the heart
of our method for computing DSMs having unambiguous semantics.
In summary of this section, the problem, in a nutshell, is that we continue to lack abstract
design representations that allow us to model or to reason adequately about the technical and eco-
nomic implications of dependences among complex software design decisions and relevant external
conditions.
This dissertation presents an analyzable formal design modeling framework addressing the
identified problems. The framework consists of a design representation approach and a number
of tool-supported analyses. The design representation approach models design spaces with design
dimensions, external conditions, recursive structures, and crosscutting constraints, and formally
accounts for Parnas's information hiding modularity and the key notions in Baldwin and Clark's
theory. The analysis techniques include (but are not limited to): automatic DSM derivation, linking
software designs to existing engineering tools; quantitative changeability analysis, making precise
Parnas's well-known analysis previously done qualitatively; and NOV computation.
Chapter 3
Overview of Core Modeling and Analysis Approaches
This chapter presents an informal and intuitive overview of our framework using a small example
to illustrate a full picture of the core models and analysis techniques. Figure 3.1 shows the relations
among the core models of our framework, and the automated analyses it enables.
3.1 Core Model Overview
The rounded boxes in Figure 3.1 represent the three core models representing decision-making
phenomena from different perspectives:
1. The augmented constraint network (ACN) consists of a constraint network modeling di-
mensions in which design decisions are made and constraints on decisions across these di-
mensions, and two additional data structures to formally account for the dominance relation
among design decisions, and the multiple ways a system can be clustered into modules. These
augmentations originate from corresponding notions in Baldwin and Clark’s DSM modeling.
These notions have played important roles in Baldwin and Clark’s DSM-based modularity
analysis, and we found that formalizing these notions and combining them with a constraint
network provides additional analysis power.
2. The design automaton (DA) is an operational, state-machine model that represents the dy-
namics of design variations driven by changes in design decisions. The DA model of an
Figure 3.1: Core Models and Analysis
ACN is derived from the constraint network and the dominance relation of that ACN. The
DA model enables automated design impact analysis: given an original design and a sequence
of envisioned changes, a DA model reveals the different ways to accommodate those changes.
3. The pair-wise dependence relation (PWDR) represents a summary pair-wise coupling rela-
tion on design dimensions. A PWDR summarizes the dependence relation modeled by a DA,
and is derived from the DA. Our prototype tool, Simon, can generate a DSM using the derived
PWDR to populate the matrix, and using a selected cluster to arrange the order of variables.
DSM modeling is at the heart of Baldwin and Clark's modularity reasoning, and has proven
utility for engineering and economic analyses. In this way, our framework, in principle, connects
conceptual designs modeled using ACNs with existing analyses developed in other realms.
Irwin et al. used a matrix example to illustrate how their aspect-oriented model addresses the
problems of object-oriented models [27]. The rest of this chapter models this small matrix design to
provide a full picture of ACNs, DAs, and PWDRs, and explains their connections to existing analyses.
Chapter 4 shows that our framework can uniformly account for aspect-oriented and
object-oriented modularity, which appear to be distinct.
3.2 Augmented Constraint Network
For a matrix class, the best choice for its underlying concrete data structure depends on how the
client uses the matrix [27]. An array could be the best choice for a dense matrix, and a linked
list could be the best for a sparse matrix. The algorithms that implement the class methods must
correspond to the selected data structure. In our modeling and analysis approaches, we consider
both design dimensions (data structure and algorithm) and the environment condition (the client's
demand characteristics).
Following a long line of work in design theory and design automation, we took finite-domain
constraint networks (CNs) [51, 32] as the core of our design space representation. Logical con-
straints are a natural, powerful and already well understood notation for representing design di-
mensions. However, CNs do not model a number of important design decision-making issues that
are indispensable for our analysis. In particular, CNs do not readily model the dominance relation
among design decisions, and that there are multiple ways a system can be clustered into modules.
These are the key notions in Baldwin and Clark’s DSM modeling on which their modularity theory
is based. We augment a CN with additional data structures to formally model these notions, and
call it an augmented constraint network (ACN), which enables the automatic derivation of DSMs.
3.2.1 Constraint Network
Figure 3.2 shows the small matrix design modeled using a constraint network. A constraint network
consists of a set of variables, each having a domain comprising a set of values, and the constraints
among these variables and values [51]. To model a conceptual design, each variable models a
1: scalar matrix: (dense, sparse);
2: scalar ds: (list_ds, array_ds, other);
3: scalar alg: (array_alg, list_alg, other);
4: ds = array_ds => matrix = dense;
5: ds = list_ds => matrix = sparse;
6: alg = array_alg => ds = array_ds;
7: alg = list_alg => ds = list_ds;
Figure 3.2: Matrix Constraint Network
design or relevant environmental dimension; the domain of a variable models possible choices
within each dimension; a domain comprises a set of values, each representing a possible decision
or an environmental condition.
In Figure 3.2, the scalar variables, ds and alg in lines 2 and 3, represent the dimensions of data
structure and algorithm; the matrix variable in line 1 represents the client demand. Their domains
follow within the parentheses. We use other as a value in many domains to model unelaborated
other possibilities. In Figure 3.2, for example, Line 2 models that the choices for the data structure
dimension are array, list, and other unelaborated choices.
A constraint network models the interdependence relation among design variables and environ-
ment conditions as a set of logical constraints. Line 4 in Figure 3.2, ds = array ds ⇒ matrix =
dense, states that the choice of an array is valid only if the client needs dense matrices. Logically,
the binding of the assuming variable implies the assumed binding. This might seem counterintu-
itive, but there could be other data structure choices that are also consistent with density, and we
do not want to model an overly constrained design in which array is the only choice. Thus the
implication arrows point in the opposite direction from what one might initially expect.
The variables, domains, and constraints constitute a finite-domain constraint network (FDCN),
the core of our framework.
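For illustration, the FDCN of Figure 3.2 can be encoded directly and its solution set enumerated by brute force. This is our own sketch, not the dissertation's tooling; the names other_ds and other_alg spell out the other values of lines 2 and 3:

```python
from itertools import product

# Domains from lines 1-3 of Figure 3.2.
domains = {
    "matrix": ["dense", "sparse"],
    "ds":     ["list_ds", "array_ds", "other_ds"],
    "alg":    ["array_alg", "list_alg", "other_alg"],
}

def consistent(v):
    """Constraints from lines 4-7 of Figure 3.2, as implications
    (an implication a => b holds when a is false or b is true)."""
    return ((v["ds"] != "array_ds" or v["matrix"] == "dense") and
            (v["ds"] != "list_ds" or v["matrix"] == "sparse") and
            (v["alg"] != "array_alg" or v["ds"] == "array_ds") and
            (v["alg"] != "list_alg" or v["ds"] == "list_ds"))

def design_space():
    """All valuations that satisfy every constraint: the valid designs."""
    names = list(domains)
    return [dict(zip(names, vs))
            for vs in product(*domains.values())
            if consistent(dict(zip(names, vs)))]
```

Of the 18 candidate valuations, exactly six survive the constraints, matching the valid designs listed later in Table 3.1.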
3.2.2 Augmentation 1: Dominance Relation
However, constraint networks alone are not sufficient to model important design phenomena. Bald-
win and Clark’s theory proposes an important concept called design rules (DRs), and explains how
DRs decouple dependences among design decisions [7], which is essential to achieving a modular
structure. Sullivan et al. [64, 63], Griswold et al. [37], and Lopes et al. [49] have applied Baldwin
and Clark’s reasoning to software designs. The essence of the design rule concept is asymmetric
dominance. An abstract interface is an instance of a design rule in software design: the designer
of an interface may prevail upon the implementer to conform to the interface specification, while
the implementer does not have the right to change the interface. Environment conditions are often
outside of the designers' sphere of influence, which represents another instance of asymmetric
dominance. The client demand condition matrix is such an example.
To address the problem that CNs do not readily model the asymmetric dominance relation, we
augment a CN model with a binary relation, dominance. (x,y) ∈ dominance indicates that, due to
policy or lack of control, changes in x cannot be compensated for by changes in y (even if changes
in y can be accommodated by changes in x). In the matrix example, assuming that the client’s need
dominates and that the design decisions must adapt accordingly, the matrix dominance relation thus
includes the following two pairs: (ds, matrix) and (alg, matrix).
3.2.3 Augmentation 2: Clustering
The module is another essential concept in software design. Parnas's information hiding principle proposes
a criterion for decomposing a system into modules; Baldwin and Clark's modularity theory
uses a concept called proto-modules to denote a state where variables are aggregated as clusters,
but there are still dependences among these clusters. Their theory then explains how design rules
work by breaking the dependences among proto-modules to generate true modules—modules that
only depend on design rules, but not on “hidden” variables in other modules.
These theories center around operations on modules, a concept that a constraint network does
not lend itself to modeling. To address this problem, we also augment a CN model with an addi-
tional structure, cluster, to express the a priori clustering of subsets of variables into proto-modules.
The same design can have different clustering methods, reflecting different stakeholders’ views of
the design. The design decisions within a class, for instance, are usually considered together. On
the other hand, parts of a class may collaborate with parts from other classes to implement a feature,
which can also be viewed as a module. As a result, a CN can be associated with multiple clusters, that
is, a cluster set.
Figure 3.3: Matrix ACN model
3.2.4 Core Design Description Model Summary
We call a constraint network augmented with a dominance relation and a cluster set an augmented
constraint network (ACN). Figure 3.3 shows the matrix ACN developed using our tool, Simon. Si-
mon allows the user to input the dominance relation through a grid GUI, as shown in Figure 3.3 (B).
Figure 3.3 (C) shows that there are two clusters in the matrix cluster set: Cluster1 and Cluster2; the
selected cluster Cluster2 contains two modules: environment and design; the environment module
contains the matrix variable; the design module contains the other two variables.
An ACN description of a conceptual design is general and abstract, and can be used at any level
of detail, from high-level specifications and architectural decisions to extremely detailed ones. It
captures the notion of design as a decision-making problem under constraints, and spans both design
variables and environment variables (which are not formally different from design variables). Many
concerns can naturally be represented as variables: security policy, choices of function or class
names and signatures, choices of design patterns to use, etc. These concerns play important roles
in software evolution, but prevailing design modeling methods are not designed to model them.
In addition to a dominance relation and a cluster set, other non-logical data structures could
be added to the core model. For example, Baldwin and Clark’s net option value (NOV) model
requires additional parameters, such as the technical potential of each module (cluster). Simon
supports NOV computation by providing a GUI in which the user can input estimated parameters
for a derived DSM, and compute its NOV value automatically. While the FDCN concept is limited,
we view it as a reasonable starting point for a formal account of coupling in design, viewed as a
decision-making activity.
A logic-based design description alone is not sufficient for reasoning about design evolvability and
economic properties. Section 3.3 introduces a derived design evolution model that represents the
change dynamics within a design space, based on the propagation of changes through the constraint
network of an ACN. Section 3.4 introduces a derived pair-wise dependence model. Section 3.5
explains how these models connect to existing analysis techniques.
3.3 Operational Design Space Evolution Model
From an ACN model, we derive an evolution model called a design automaton (DA). The state set
of a DA is the design space implied by an ACN; the transitions of a DA model the design variation
constrained by the ACN. Figure 3.4 shows the full picture of the matrix DA.
3.3.1 Design Spaces
An ACN model gives rise to several notions that lead to the concept of a design space. The binding of
a value to a variable models a design decision or an environmental condition. An assignment is a
set of bindings, modeling a set of given decisions or environment conditions. For example, {matrix
= dense, ds = array_ds} is an assignment. A valuation is an assignment involving all the variables
in the ACN: {matrix = dense, ds = array_ds, alg = array_alg} is a valuation.
A valuation of the variables of an ACN, that is, a binding of values to variables, satisfies a
constraint if and only if its projection onto the variables of that constraint is consistent with at least
Figure 3.4: The Matrix Design Automaton
Table 3.1: Matrix Design Space
S0: matrix = sparse, ds = other_ds, alg = other_alg
S1: matrix = dense, ds = other_ds, alg = other_alg
S2: matrix = sparse, ds = list_ds, alg = other_alg
S3: matrix = sparse, ds = list_ds, alg = list_alg
S4: matrix = dense, ds = array_ds, alg = other_alg
S5: matrix = dense, ds = array_ds, alg = array_alg
one permitted assignment of that constraint. For example, the valuation {matrix = dense, ds =
array_ds, alg = array_alg} satisfies the constraint ds = array_ds ⇒ matrix = dense because one of
that constraint's permitted assignments, {matrix = dense, ds = array_ds}, is a subset of the valuation. A valid
design is a solution to the constraint network, that is, a valuation that satisfies all the constraints
defined in the ACN. All the valid designs constitute a design space as modeled by a given ACN.¹
Table 3.1 presents all the valid designs within the matrix design space. The designs are numbered
and constitute the state set of the matrix DA. Figure 3.4 illustrates these ideas.
3.3.2 Change Dynamics
Changing the value of one design decision can produce a valuation that violates one or more con-
straints. For example, if we start with the design S5 in Table 3.1, {(matrix = dense), (ds = array_ds),
(alg = array_alg)}, and change the data structure, ds, to list_ds, the resulting state violates a constraint,
producing an invalid design state. If such an invalidating change to a given decision is
forced, then, in general, the values of some subset of other variables will have to change in order to
restore the design to a consistent design state. In this case, both ds and alg have to be changed.
Figure 3.5 depicts the part of the matrix DA in which all changes originate from design S5, illustrating
three key properties of a DA:
1. We require that each transition in a DA be minimal. That is, each destination state differs only
minimally from the source state, in the sense that no constituent change could be undone
while still preserving consistency. In Figure 3.5, starting with S5, if ds is changed to other_ds,
¹In general, there are many possible dimensions for a given set of requirements outside of the space modeled by an ACN. Baldwin and Clark use the term design space to refer to the larger space of all possibilities. In this sense, an ACN is an explicit representation of a subspace of interest.
Figure 3.5: Partial Matrix Design Automaton
then there are at least two designs that can accommodate this change: S0: {matrix = sparse,
ds = other_ds, alg = other_alg} and S1: {matrix = dense, ds = other_ds, alg = other_alg}.
Changing alg to other_alg in both S0 and S1 is indispensable, but changing matrix to sparse
in state S0 is not. We consider the transition from S5 to S1, labeled with {ds = other_ds}, as
minimal, while the dotted-arrow transition from S5 to S0 with the same label is invalid. As a
result, each transition in a DA models a minimal design perturbation.
2. A DA is nondeterministic. In general, there are multiple ways to accommodate a change. In
Figure 3.5, starting from state S5, {(matrix = dense), (ds = array_ds), (alg = array_alg)},
changing the client preference to sparse makes the design inconsistent. Making a set of
minimal changes to other variables to restore consistency leads to states S0, S2, or S3.
3. No transition in a DA may violate the dominance relation. If (x,y) ∈ dominance, then among
all the possible ways to restore consistency in the face of a change to x, those involving y are
excluded. For the matrix example, because (ds, matrix) ∈ dominance, the transition starting
from S5, triggered by changing ds to list_ds, and leading to the client change in S3 (the dotted
arrow labeled ds = list_ds) is precluded.
In summary, a DA captures all of the possible ways in which any change to any decision in any
state of a design can be compensated for by changes to minimal subsets of other decisions.
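The three properties above can be captured in a brute-force sketch. This is our own illustration; it restates the Figure 3.2 encoding so the fragment is self-contained, and spells the other values as other_ds/other_alg following Table 3.1:

```python
from itertools import product

# Domains and constraints of Figure 3.2.
domains = {
    "matrix": ["dense", "sparse"],
    "ds":     ["list_ds", "array_ds", "other_ds"],
    "alg":    ["array_alg", "list_alg", "other_alg"],
}
# (x, y) in dominance: a change to x may not be compensated by changing y.
dominance = {("ds", "matrix"), ("alg", "matrix")}

def consistent(v):
    return ((v["ds"] != "array_ds" or v["matrix"] == "dense") and
            (v["ds"] != "list_ds" or v["matrix"] == "sparse") and
            (v["alg"] != "array_alg" or v["ds"] == "array_ds") and
            (v["alg"] != "list_alg" or v["ds"] == "list_ds"))

def design_space():
    names = list(domains)
    return [dict(zip(names, vs)) for vs in product(*domains.values())
            if consistent(dict(zip(names, vs)))]

def transitions(state, var, value):
    """Minimal, dominance-respecting destination states after forcing
    state[var] = value (properties 1-3 above)."""
    forbidden = {y for (x, y) in dominance if x == var}
    cands = []
    for t in design_space():
        if t[var] != value:
            continue
        changed = {n for n in t if t[n] != state[n]} - {var}
        if changed & forbidden:
            continue  # property 3: the dominance relation is respected
        cands.append((t, changed))
    # property 1: discard any destination whose perturbation strictly
    # contains another candidate's perturbation
    return [t for t, ch in cands if not any(c < ch for _, c in cands)]
```

From S5, `transitions(S5, "matrix", "sparse")` yields the three states S0, S2, and S3 (property 2), while `transitions(S5, "ds", "list_ds")` yields nothing at all, since every repair would have to change the dominating matrix variable (property 3).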
3.4 Pair-wise Dependence Relations
Pair-wise Dependence Relations (PWDRs) underlie many influential design representations. In
box-and-arrow style representations, such as ADL, UML, Call Graph, and Reflexion Models [53],
the arrows model different kinds of pair-wise dependence relations among boxes, such as function
calls, inheritance, and system I/O. DSMs, a special case of PWDR on design decisions, are the core
data structure underlying Baldwin and Clark’s theory.
Based on the DA model, we contribute a precise definition of what it means for one variable
to depend on another, enabling the automated derivation of PWDRs from DAs. Intuitively, for some
consistent design state s in a DA, if there is some change to a variable, x, such that the value of another
variable, y, is changed in some minimally perturbed destination state s′ of the DA, we say that
y depends on x. We define the
coupling structure of a design ACN as the pair-wise dependence relation (PWDR) over all of its
variables: if y depends on x, then (x,y) ∈ PWDR.
We have shown that if the original design is S5: {(matrix = dense), (ds = array_ds), (alg =
array_alg)} and the envisioned change in the client is (matrix = sparse), there are three new designs
in its DA accommodating this change:
S0: {(matrix = sparse), (ds = other_ds), (alg = other_alg)},
S2: {(matrix = sparse), (ds = list_ds), (alg = other_alg)}, or
S3: {(matrix = sparse), (ds = list_ds), (alg = list_alg)}.
Comparing the original design S5 with any of these new designs, we observe that both ds and alg
are involved in the minimal perturbations caused by the change to matrix. That is, ds depends on
matrix and alg depends on matrix. A similar analysis concludes that ds and alg depend on each other.
As a result, the matrix PWDR is the following set:
{(matrix,ds),(matrix,alg),(ds,alg),(alg,ds)}.
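This derivation can be mechanized. The sketch below is our own illustration; it restates the matrix encoding and the minimal-perturbation transition computation so that it is self-contained, then exercises every change in every consistent state and collects the induced pairs:

```python
from itertools import product

domains = {
    "matrix": ["dense", "sparse"],
    "ds":     ["list_ds", "array_ds", "other_ds"],
    "alg":    ["array_alg", "list_alg", "other_alg"],
}
dominance = {("ds", "matrix"), ("alg", "matrix")}

def consistent(v):
    return ((v["ds"] != "array_ds" or v["matrix"] == "dense") and
            (v["ds"] != "list_ds" or v["matrix"] == "sparse") and
            (v["alg"] != "array_alg" or v["ds"] == "array_ds") and
            (v["alg"] != "list_alg" or v["ds"] == "list_ds"))

def design_space():
    names = list(domains)
    return [dict(zip(names, vs)) for vs in product(*domains.values())
            if consistent(dict(zip(names, vs)))]

def transitions(state, var, value):
    """Minimal, dominance-respecting destinations after forcing
    state[var] = value."""
    forbidden = {y for (x, y) in dominance if x == var}
    cands = []
    for t in design_space():
        if t[var] != value:
            continue
        changed = {n for n in t if t[n] != state[n]} - {var}
        if changed & forbidden:
            continue
        cands.append((t, changed))
    return [t for t, ch in cands if not any(c < ch for _, c in cands)]

def pwdr():
    """(x, y) is in the PWDR iff, in some consistent state, a change to x
    forces a change to y in some minimally perturbed destination."""
    dep = set()
    for s in design_space():
        for x in domains:
            for v in domains[x]:
                if v == s[x]:
                    continue
                for t in transitions(s, x, v):
                    dep |= {(x, y) for y in t if y != x and t[y] != s[y]}
    return dep
```

Running this sketch reproduces the four-pair matrix PWDR; note the absence of (ds, matrix) and (alg, matrix), which the dominance relation forbids.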
3.5 Connections to Evolvability and Economic Analysis
The three core models, ACNs, DAs, and PWDRs, connect to (but are not limited to) the following
well-known evolvability and economic analyses, providing the foundation for automating these
analyses with tools.
1. Parnas’s changeability analysis. Given a software design, what are all the ways to compensate
for an anticipated sequence of individual changes? The question can be formulated as a map-
ping from a DA, an assignment modeling the current design, and a sequence of variable-value
pairs that model a sequence of changes to individual design decisions, to a set of sequences
of consistent design states modeling the feasible evolution paths for the given sequence of
changes. To compute this set of paths, we find the paths that start from the initial design and
go along the edges labeled with specified changes. Each path represents one way to com-
pensate for the given changes. The destination states are the new designs accommodating
the given changes. Chapter 7 formalizes these models, this problem, and its solution. Simon
automates this analysis based on the formalization. Chapter 4 shows how Simon automates
Parnas’s changeability analysis on KWIC precisely.
2. Parnas's information hiding criterion. Sullivan et al. [64] previously observed that in an information hiding design, the design rules are invariant with respect to changes in the environment and that such changes should be accommodated by changes to hidden (subordinate) design variables. After clustering the variables representing external conditions
into an environment module, and clustering all the design rules into a design rule module, we
are able to formalize this principle as a predicate stating that a PWDR derived from an ACN
should not have any pair with a first element in the environment module, and the second in
the design rule module. This influential principle thus becomes a formal and mechanically
checkable criterion.
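This predicate is simple enough to sketch directly. In the following illustration, the module memberships and the example PWDR pairs are hypothetical; only the shape of the check reflects the criterion described above:

```python
# Information hiding criterion as a mechanically checkable predicate:
# no PWDR pair may lead from an environment variable to a design rule.
def satisfies_ih_criterion(pwdr, environment, design_rules):
    """True iff no environment change can ripple into a design rule."""
    return not any(x in environment and y in design_rules
                   for (x, y) in pwdr)

# Illustrative module assignments (not the dissertation's full clustering).
env = {"envr_input_size", "envr_alph_policy"}
rules = {"linestorage_ADT", "alph_ADT"}

good = {("envr_input_size", "linestorage_ds")}   # change hits a hidden variable
bad = good | {("envr_alph_policy", "alph_ADT")}  # change hits a design rule

print(satisfies_ih_criterion(good, env, rules))  # True
print(satisfies_ih_criterion(bad, env, rules))   # False
```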
3. Design structure matrix analysis. A DSM, as introduced in Chapter 2, can be seen as com-
posed of a PWDR and an a priori clustering of variables. This framework provides rigorous
semantics for DSMs, and enables their automated generation from precise logical models.
Figure 3.6: Matrix DSM Generated by Simon
A PWDR derived from an ACN can be used to populate a DSM, and a cluster of the ACN
can be used to express the order in which the rows and columns are presented. Figure 3.6
is the DSM that Simon generates from the matrix ACN model. Chapter 4 shows how the
derived DSMs reveal both errors in published models and an issue overlooked by Baldwin
and Clark’s theory.
There are many existing analytical techniques and tools developed around DSMs in other
engineering realms, such as DeMAID/GA [42]. These tools analyze design architectures for
project scheduling, cyclic dependence detection, and so forth. In principle, our framework
connects conceptual designs expressed as ACNs with these existing engineering analysis
techniques, but this dissertation does not go further in this dimension.
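The population step can be sketched as follows; the ordering convention (rows depend on columns) and the tiny example PWDR are assumptions for illustration, not Simon's actual layout:

```python
# A minimal sketch of DSM population: the clustering fixes the row/column
# order, and each PWDR pair (x, y) marks the cell at row y, column x,
# i.e., row y depends on column x. The axis convention is an assumption.
def build_dsm(pwdr, clusters):
    order = [v for cluster in clusters for v in cluster]
    idx = {v: i for i, v in enumerate(order)}
    n = len(order)
    dsm = [["." if i == j else " " for j in range(n)] for i in range(n)]
    for (x, y) in pwdr:
        dsm[idx[y]][idx[x]] = "x"  # y depends on x
    return order, dsm

# Illustrative input: the matrix example's PWDR, environment cluster first.
clusters = [["matrix"], ["ds", "alg"]]
pwdr = {("matrix", "ds"), ("matrix", "alg"), ("ds", "alg"), ("alg", "ds")}
order, dsm = build_dsm(pwdr, clusters)
for name, row in zip(order, dsm):
    print(f"{name:8} {' '.join(row)}")
```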
4. Net option value computation. Chapter 2 introduced Baldwin and Clark’s net option value
model based on DSMs, and its application in software engineering. Simon supports the
association of NOV parameters with automatically derived DSMs, and computes NOV values
automatically.
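In the spirit of that computation, the option value of experimenting on a module can be sketched with Monte Carlo estimation: running k independent redesigns and keeping the best outcome (or the status quo, worth 0) is worth E[max(0, X1, ..., Xk)] with each Xj drawn from N(0, sigma² · n), minus experimentation and visibility costs. The parameter names and the linear cost model below are illustrative assumptions, not Simon's implementation:

```python
import random

# Hedged Monte Carlo sketch of the option value of module experimentation.
def module_nov(sigma, n, cost_per_experiment, visibility_cost,
               max_k=8, trials=20000, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    best = 0.0  # the option is never worth less than doing nothing
    for k in range(1, max_k + 1):
        gains = 0.0
        for _ in range(trials):
            draws = [rng.gauss(0.0, sigma * n ** 0.5) for _ in range(k)]
            gains += max(0.0, max(draws))
        value = gains / trials - cost_per_experiment * k - visibility_cost
        best = max(best, value)
    return best

# A module with higher technical potential supports more valuable experiments.
low = module_nov(sigma=0.1, n=1.0, cost_per_experiment=0.05, visibility_cost=0.0)
high = module_nov(sigma=1.0, n=1.0, cost_per_experiment=0.05, visibility_cost=0.0)
print(low < high)  # True
```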
3.6 Chapter Summary
In summary, this chapter has introduced the full picture of the three core models of our frame-
work using a small Matrix example: (1) the augmented constraint network (ACN) is a declara-
tive, constraint-based design model that describes dimensions in which design decisions must be
made and constraints on decisions across these dimensions; (2) the design automaton (DA) is an
operational model that represents the dynamics of design variations driven by changes in design
decisions; and (3) the pair-wise dependence relation (PWDR) represents a summary pair-wise cou-
pling relation on design dimensions. Of these three models, the ACN is primary, while the DA and
PWDR models are derived from a given ACN. The DA and PWDR models link conceptual designs
with existing evolvability and economic analysis techniques: the DA model enables quantitative changeability analysis, and the PWDR model enables the derivation of DSMs, which have proven
utility for engineering and economic analyses.
Chapter 4
Modeling and Analysis of a Benchmark Design
Parnas's KWIC (Key Word in Context) index system is a well-established benchmark for assessing concepts in software design. Sullivan et al. [64] presented an informal study
in which they tested the applicability of Baldwin and Clark’s theory to, and its potential value for,
software architectural design. Their experimental method was to test the theory’s ability to ac-
count for Parnas’s notion of, and his conclusions concerning, information hiding as a criterion for
modularizing designs. They thus conducted what amounted to a replication study of Parnas’s Key
Word in Context (KWIC) examples. They developed DSM models of Parnas's KWIC designs, derived option values from the DSMs using Baldwin and Clark's net option value models,
compared the results against Parnas’s conclusions, and found that the theory of Baldwin and Clark
made predictions consistent with the conclusions that Parnas had previously reached: the informa-
tion hiding criterion can add significant economic value to designs. This chapter presents the formal
replication of their informal study to evaluate the following claims:
1. Our framework provides a formal basis for (1) Baldwin and Clark's key notions of design dimension, design decision, design decision dependence, and design space, and (2) Sullivan et al.'s formulation of Parnas's notion of information hiding (as invariance of design rules with respect to changes in environment parameters), in a rigorously precise form that is checkable by automated tools. We evaluate this claim by constructing the formal models of
KWIC systems.
2. Our framework enables evolvability analysis of precise and abstract representations of design
architectures. To evaluate this claim, we formally model the five possible changes Parnas postulates as decision problems, and then use the Simon design impact analysis GUI to reveal the differences between the two designs with respect to each change. We find that Parnas's analysis results are confirmed quantitatively.
3. Our framework enables the automatic derivation of DSMs. Sullivan et al. previously con-
structed the DSMs for KWIC designs, based on which they applied Baldwin and Clark’s net
option value analysis and revealed Parnas’s information hiding criterion visually. We derive
DSMs using Simon for each design and find that the derived DSMs reveal the same informa-
tion hiding observation.
4. Our framework clarifies the notion of pair-wise dependence and makes the derived models more reliable. We found that manual DSM construction and NOV computation took considerable effort and still left many ambiguities in the models. We compare the derived DSMs with Sullivan et al.'s manually constructed DSMs, and the comparison reveals several errors and ambiguities in the published manual models, showing the power of formal models and automated analysis.
5. Our framework automates Baldwin and Clark’s net option value analysis. Since there are
inconsistencies between the manually-constructed and the automatically derived DSMs, are
the NOV values we calculated based on the manual models still valid? We found that although
the comparative results are still valid, the NOV value for each model does change.
6. Our framework provides a more reliable basis for Baldwin and Clark’s modularity in design
analysis. Our experiment reveals an issue in Baldwin and Clark's net option value computation when it is based on manually constructed DSMs.
Section 4.1 introduces the Key Word in Context system using its standardized model [57].
Section 4.2 introduces in detail how the two KWIC designs are modeled by ACNs. Section 4.3
through Section 4.5 present the analysis results.
Figure 4.1: KWIC Sequential Design Architecture [57]
4.1 Key Word In Context
In his seminal paper [54], Parnas describes the KWIC (Key Word in Context) index system as
follows:
“The KWIC index system accepts an ordered set of lines, each line is an ordered set of words,
and each word is an ordered set of characters. Any line may be “circularly shifted” by repeatedly
removing the first word and appending it at the end of the line. The KWIC index system outputs a
listing of all circular shifts of all lines in alphabetical order.”
Shaw et al. have standardized the architectural representations of the two KWIC designs Parnas presents [57]. This box-and-arrow style representation, as shown in Figures 4.1 and 4.2, models functions, shared data structures, and I/O media as blocks, and models direct memory access, function calls, and system I/O as arrows. According to these figures, in the first, sequential design (SD), modules correspond to steps in the sequential transformation of inputs to outputs. The SD design decomposes the system according to the four basic functions performed (Input, Circular Shift, Alphabetizing, and Output) plus a Master Control (main) module.
In the second information hiding (IH) design, modules decouple design decisions deemed com-
plex or likely to change. Figure 4.2 represents the following modules as boxes:
Figure 4.2: KWIC Information Hiding Design Architecture [57]
• The Linestorage module holds all characters from all words and lines.
• The Input module reads the data from a file and stores it in the Linestorage module.
• The Circularshift module produces circular shifts of lines and stores them in the Linestorage module.
• The Alphabetizing module sorts circular shifts alphabetically.
• The Output module prints the sorted shifts.
• The Master control module controls the sequence of method invocations in other modules.
In contrast to the sequential design, in the IH design the data is not shared directly among the computational components. Instead, the IH design uses abstract data type (ADT) interfaces to decouple key design
decisions involving data structure and algorithm choices so that they can be changed without unduly
expensive ripple effects. For example, the Linestorage module provides the public interface that
allows other modules to set a character in a particular word in a particular line, read a specific
character, read, set or delete a particular word in a specific line, read a whole line at once, etc.
4.2 ACN KWIC Models
This section explains how we used Simon to model Parnas's KWIC designs as ACN models by identifying variables and values, constraints, and dominance relations from Parnas's prose [54], and by clustering these variables in different ways.
4.2.1 Variables and Values
Figure 4.3 shows our KWIC SD constraint network model. In this design Parnas views each in-
terface as providing two parts: an exported data structure and a function signature to be invoked
by the Master Control module. Given choices for these parameters, programmers produce function
implementations. As a result, we modeled the choices of function signature, data structure, and
implementation as design variables. For example, the Input module is modeled by three variables:
input_sig, input_ds and input_impl. As shown in Figure 4.3, variables ending with “_sig”
model the function signatures. The choices of implementations are modeled by the variables end-
ing with “_impl”. The choices of data structures are modeled by the variables ending with “_ds”.
Parnas assumes original designs in each case and analyzes the impact of changes.
We use orig (short for original) to represent the currently selected design decision in a given dimension. There are many cases in which designers do not need to think in terms of choices from a small, finite domain: for example, the implementation of a class. However, once the designer decides how to implement a class, a decision has been made implicitly, and there are always new ways to implement the class, reflecting new decisions. As a result, we use {orig, other} as the default domain for design dimensions without simple discrete choices. For example, input_sig has domain {orig, other}.
Figure 4.4 shows the constraint network for the IH design. A new module, Line Storage, is present.
Its data structure variable linestorage_ds replaces the input_ds of the sequential design. The
IH Input module has no separate data structure. In the IH design, each module is also equipped
with an abstract data type interface, modeled by variables ending with “_ADT”. We model module
implementations and data structures in the same way.
1: envr_input_format:{orig,other};
2: envr_input_size:{small,medium,large};
3: envr_core_size:{small,large};
4: envr_alph_policy:{once,partial,search};
5: input_sig:{orig,other};
6: circ_sig:{orig,other};
7: alph_sig:{orig,other};
8: output_sig:{orig,other};
9: master_sig:{orig,other};
10: input_ds:{other,core4,disk,core0};
11: circ_ds:{index,copy,other};
12: alph_ds:{orig,other};
13: output_ds:{orig,other};
14: input_impl:{orig,other};
15: circ_impl:{orig,other};
16: alph_impl:{orig,other};
17: output_impl:{orig,other};
18: master_impl:{orig,other};
19: input_impl = orig => input_sig = orig && input_ds = core4;
20: circ_impl = orig => circ_sig = orig;
21: alph_impl = orig => alph_sig = orig;
22: output_impl = orig => output_sig = orig;
23: master_impl = orig => master_sig = orig;
24: circ_impl = orig => circ_ds = index;
25: alph_impl = orig => alph_ds = orig;
26: output_impl = orig => output_ds = orig;
27: master_impl = orig => input_sig = orig;
28: alph_impl = orig => circ_ds = index;
29: alph_ds = orig => circ_ds = index;
30: alph_impl = orig => input_ds = core4;
31: circ_impl = orig => input_ds = core4;
32: circ_ds = index => input_ds = core4;
33: circ_ds = copy => input_ds = core4;
34: output_impl = orig => input_ds = core4;
35: output_impl = orig => alph_ds = orig;
36: alph_ds = orig => input_ds = core4;
37: input_ds = core4 => envr_input_size = medium || envr_input_size = small;
38: input_ds = core0 => envr_input_size = small && envr_core_size = large;
39: input_ds = disk => envr_input_size = large;
40: circ_ds = copy => envr_input_size = small || envr_core_size = large;
41: input_impl = orig => envr_input_format = orig;
42: alph_impl = orig => envr_alph_policy = once;
43: master_impl = orig => circ_sig = orig;
44: master_impl = orig => alph_sig = orig;
45: master_impl = orig => output_sig = orig;
46: alph_ds = orig => envr_alph_policy = once;
Figure 4.3: KWIC Sequential Design Constraint Network
1: envr_input_format:{orig,other};
2: envr_input_size:{small,medium,large};
3: envr_core_size:{small,large};
4: envr_alph_policy:{once,partial,search};
5: input_ADT:{orig,other};
6: linestorage_ADT:{orig,other};
7: circ_ADT:{orig,other};
8: alph_ADT:{orig,other};
9: output_ADT:{orig,other};
10: master_ADT:{orig,other};
11: linestorage_ds:{core0,core4,disk,other};
12: circ_ds:{copy,index,other};
13: alph_ds:{orig,other};
14: output_ds:{orig,other};
15: linestorage_impl:{orig,other};
16: input_impl:{orig,other};
17: circ_impl:{orig,other};
18: alph_impl:{orig,other};
19: output_impl:{orig,other};
20: master_impl:{orig,other};
21: linestorage_impl = orig => linestorage_ADT = orig && linestorage_ds = core4;
22: input_impl = orig => input_ADT = orig;
23: circ_impl = orig => circ_ADT = orig && circ_ds = index;
24: alph_impl = orig => alph_ADT = orig && alph_ds = orig;
25: output_impl = orig => output_ADT = orig && output_ds = orig;
26: master_impl = orig => master_ADT = orig && linestorage_ADT = orig && input_ADT = orig && circ_ADT = orig && alph_ADT = orig && output_ADT = orig;
27: alph_impl = orig => circ_ADT = orig && linestorage_ADT = orig;
28: circ_impl = orig => linestorage_ADT = orig;
29: input_impl = orig => linestorage_ADT = orig;
30: output_impl = orig => linestorage_ADT = orig && alph_ADT = orig;
31: linestorage_ds = core4 => envr_input_size = medium || envr_input_size = small;
32: linestorage_ds = core0 => envr_input_size = small && envr_core_size = large;
33: linestorage_ds = disk => envr_input_size = large;
34: circ_ds = copy => envr_input_size = small || envr_core_size = large;
35: alph_ds = orig => envr_alph_policy = once;
36: input_impl = orig => envr_input_format = orig;
37: alph_impl = orig => envr_alph_policy = once;
Figure 4.4: KWIC Information Hiding Design Constraint Network
Next we identify and model several critical dimensions in Parnas’s analysis. The sentence: “this
module reads the data lines from the input medium and stores them in core for processing by the
remaining modules. The characters are packed four to a word. . . ” implies a possible choice for the
input data structure dimension (modeled by input_ds in the SD design, and by linestorage_ds
in the IH design): a choice to pack four to a word. Similarly, the sentences: “[i]n cases where we
are working with small amounts of data it may prove undesirable to pack the characters; time will
be saved by a character per word layout. In other cases we may pack, but in different formats.”
and “for large jobs it may prove inconvenient or impractical to keep all the lines in core. . . .” imply
two other choices for the input data structure dimension: unpacked or disk storage. We model these
choices as a domain shared by input_ds in the SD ACN and linestorage_ds in the IH ACN:
{core4, core0, disk, other}.
These sentences also imply an important environment condition, input size, and its possible values: small (fits packed in a small memory or unpacked in a large memory), medium (fits
in either memory if packed), or large (too big even for a large memory). In both Figure 4.3 and
Figure 4.4, this dimension is modeled as envr_input_size:{small, medium, large}. Simi-
larly, “Again, for a small index or a large core, writing them out may be the preferable approach.”,
implies a variable envr_core_size:{small, large}.
According to Parnas's statements on the circular shift module: ". . . it prepares an index. . . it leaves its output in core. . . ," we identify a choice in the circ_ds dimension: index. From the
sentence: “for a small index or a large core, writing them out [copying] may be . . . preferable
[to indexing]. . . ”, we have another value for variable circ_ds: copy. As a result, the circ_ds
variable has a domain: {index, copy, other}.
4.2.2 Constraints
We represent the relationships among these design decisions as logical constraints, expressing the
conditions under which various decisions are valid.
In the SD design, function implementations make assumptions about both the function signatures and the relevant data structures. For example, the circular shift function implementation (circ_impl)
has to know the circular shift function signature (circ_sig) and how the circular shift data
(circ_ds) is arranged in core. According to Parnas, in the original design circ_ds = index.
Lines 20 and 24 in Figure 4.3 model these constraints. To implement this function, it also has to
know the data structure of the Input module. In the current design, the characters are packed four to
a word, which is modeled as input_ds = core4. The constraint is modeled in Figure 4.3 line 32.
In the IH design, a module only knows the ADTs of other modules. For example, the circular
shift implementation (circ_impl) now assumes the linestorage_ADT, but not the line storage
data structure, as shown in Figure 4.4 line 28. As another example, lines 31, 32, and 33 in Figure 4.4 model that the effort to store data on disk is worthwhile only for large inputs; the choice to store data unpacked works only for small inputs and large memories; and the choice to pack data makes sense only for small and medium input sizes.
These environment conditions, design dimensions, possible choices within each dimension, and
their internal constraints are fundamental to Parnas’s changeability analysis. However, prevailing
box-and-arrow style representations, such as the ADL figures, are not designed to model them, nor
to enable rigorous and automated analyses.
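Such implication constraints are nevertheless mechanically checkable. The following sketch encodes only the three input_ds constraints (lines 37 through 39 of the SD network in Figure 4.3) and enumerates the consistent assignments by brute force; the encoding style is an illustration, not Simon's internal representation:

```python
from itertools import product

# Domains for the three variables involved in SD constraints 37-39.
domains = {
    "envr_input_size": ["small", "medium", "large"],
    "envr_core_size": ["small", "large"],
    "input_ds": ["core4", "core0", "disk", "other"],
}
constraints = [
    # input_ds = core4 => envr_input_size = medium || envr_input_size = small
    lambda s: s["input_ds"] != "core4"
              or s["envr_input_size"] in ("medium", "small"),
    # input_ds = core0 => envr_input_size = small && envr_core_size = large
    lambda s: s["input_ds"] != "core0"
              or (s["envr_input_size"] == "small"
                  and s["envr_core_size"] == "large"),
    # input_ds = disk => envr_input_size = large
    lambda s: s["input_ds"] != "disk" or s["envr_input_size"] == "large",
]

names = list(domains)
consistent = [dict(zip(names, v)) for v in product(*domains.values())
              if all(c(dict(zip(names, v))) for c in constraints)]

# Every consistent state that stores data on disk has a large input:
print(all(s["envr_input_size"] == "large"
          for s in consistent if s["input_ds"] == "disk"))  # True
```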
4.2.3 Dominance Relation
In the SD design, Parnas noted: "All of the interfaces between the four modules must be specified before work could begin..." This sentence implies that the choices of function signatures and data structures dominate the other design variables. Consequently, the SD dominance relation includes pairs like (input_impl, input_sig), (input_impl, input_ds), etc. Similarly, in the IH case, the choices of ADT interface definitions dominate the other decisions, and pairs like (linestorage_ds, linestorage_ADT) are thus included in the IH dominance relation.
The interfaces and data structures in the SD ACN, and the ADT interfaces in the IH ACN, are design rules. In both designs, we assume that the environmental conditions are beyond the designers' control. Accordingly, (linestorage_ds, envr_input_size),
(linestorage_ds, envr_core_size), etc., are included in the dominance relations of both
ACNs.
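One way to represent and use such a relation can be sketched as follows; the helper function is my illustration, though the pair convention (subordinate, dominator) follows the text above:

```python
# A pair (subordinate, dominator) records that the dominator may not be
# perturbed in response to a change of the subordinate. The pairs below
# follow the SD/IH models named in the text.
dominance = {
    ("input_impl", "input_sig"), ("input_impl", "input_ds"),
    ("linestorage_ds", "linestorage_ADT"),
    ("linestorage_ds", "envr_input_size"),
    ("linestorage_ds", "envr_core_size"),
}

def frozen_in_response_to(changed_var, dominance):
    """Variables that may not change when changed_var is perturbed."""
    return {dom for (sub, dom) in dominance if sub == changed_var}

print(sorted(frozen_in_response_to("linestorage_ds", dominance)))
# → ['envr_core_size', 'envr_input_size', 'linestorage_ADT']
```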
4.2.4 Cluster Set
There are multiple ways to cluster a design. Figure 4.5 shows the Simon clustering GUI supporting
different views of the same KWIC design. For purposes such as task assignment, we want to group
all variables involved in a particular function into a single module. For example, we could group the
envr_alph_policy, alph_ADT, alph_ds and alph_impl into a module, as shown in Figure 4.5
(b).
In the earlier work of Sullivan et al. [64], the authors observed that for a design to be truly
an information hiding modularization, the design rules should be invariant under changes in en-
vironment variables. To evaluate these two designs against this criterion, we want to cluster the
environment parameters, design rules and subordinate variables respectively into proto-modules. In
this case, for example, we group the envr_alph_policy, envr_input_size, envr_core_size,
and envr_input_format into an environment module, as shown in Figure 4.5 (c).
So far, we have modeled all the dimensions necessary for a number of analyses.
4.3 Quantitative Changeability Analysis
Parnas presents a comparative analysis of the changeability of the two designs based on their ability
to accommodate the following possible changes:
1. "The input format changes", which implies that there could be input format choices other than the current one. Accordingly, we model the domain of envr_input_format as {orig, new}. In the original design, envr_input_format = orig. The change is modeled as envr_input_format = new.
2. “The input size becomes so large that not all lines can be put in core”. We model this change
as envr_input_size = large.
3. “The input size gets so small that a word could be unpacked”, modeled as
envr_input_size = small.
(a) No Clustering
(b) Task Assignment View
(c) Design Rule View
Figure 4.5: Simon Clustering GUI for the KWIC IH Design
4. “The alphabetizing policy is changed to partial or search”,
modeled by envr_alph_policy = partial and envr_alph_policy = search. In the
original design envr_alph_policy = once.
Parnas’s informal comparative analysis can be formulated as follows: given an original design,
and given changes in environment (input size, core size, etc.), what are the feasible new designs that
accommodate the given changes? In particular, how many dimensions (variables) have to change
to get to these new design states? Figures 4.6 and 4.7 are snapshots of the Simon design impact analysis GUI. Figure 4.6 shows the input GUI, in which an initial SD design is selected and a change is specified: envr_input_size changing from medium to large. Figure 4.7 shows the
output GUI, in which the upper list shows the evolution paths, the middle list shows the differences
between the original design and the selected destination design, and the lower list shows the selected
new design.
We summarize all the changes and their impacts on both designs, as computed by Simon, in Figure 4.8. The numbers in the circles represent the design states of the DAs. The double circles
are the start states. Figure 4.8 shows part of the SD and IH DAs with states S18 and S1034 as the
respective start states. S18 corresponds to the original sequential design, and S1034 corresponds to
the original IH design. The design state numbers are automatically generated by Simon. Transitions
are labeled with changes shown in the table below.
The tables associated with the end states show what other variables are changed in the desti-
nation states. For example, in the SD DA, changing the input size to large (the transition labeled
C2) leads state S18 to state S555 or S865. In both of them, seven other variables are changed to
compensate for the driving change.
The numbers in the last two columns of the lower table summarize the number of other variables
that are affected by the changes in each design. The results confirm in a fully formal way that the
IH design involves fewer redesign requirements under changes. For example, when the input size
gets large, in the SD design, 7 dimensions have to be touched, while for the IH design, only 2.
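The impact numbers above can be reproduced in miniature. The following self-contained sketch poses the changeability question on a toy model in the spirit of the SD analysis (a large input forcing the packed-in-core choice and its dependents to change); the model is my illustration, not the dissertation's full KWIC ACN:

```python
from itertools import product

# A toy model: the input size, the input data structure, and two
# implementations that assume the packed-in-core layout.
domains = {
    "envr_input_size": ["medium", "large"],
    "input_ds": ["core4", "disk"],
    "circ_impl": ["orig", "other"],
    "alph_impl": ["orig", "other"],
}
constraints = [
    lambda s: s["input_ds"] != "core4" or s["envr_input_size"] == "medium",
    lambda s: s["input_ds"] != "disk" or s["envr_input_size"] == "large",
    lambda s: s["circ_impl"] != "orig" or s["input_ds"] == "core4",
    lambda s: s["alph_impl"] != "orig" or s["input_ds"] == "core4",
]

names = list(domains)
consistent = [dict(zip(names, v)) for v in product(*domains.values())
              if all(c(dict(zip(names, v))) for c in constraints)]

def impact(start, var, value):
    """Minimal number of other variables changed to accommodate the change."""
    cands = [t for t in consistent if t[var] == value]
    return min(sum(t[y] != start[y] for y in names if y != var)
               for t in cands)

start = {"envr_input_size": "medium", "input_ds": "core4",
         "circ_impl": "orig", "alph_impl": "orig"}
print(impact(start, "envr_input_size", "large"))  # 3
```

In this toy model, growing the input forces the data structure and both dependent implementations to change, so the impact count is 3.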
So far, we have quantitatively confirmed Parnas’s qualitative analysis. The number of modules
that have to change is obviously a simple proxy for cost, but it is essentially the measure Parnas
Figure 4.6: Tool Snapshot: KWIC SD Design Impact Analysis Input
Figure 4.7: Tool Snapshot: KWIC SD Design Impact Analysis Output
[The state-machine diagrams in this figure are not recoverable from this extraction; the change-impact table reads:]

Change                          SD   IH
C1  envr_input_format = new      1    1
C2  envr_input_size = large      7    2
C3  envr_input_size = small      0    0
C4  envr_alph_policy = partial   3    2
C5  envr_alph_policy = search    3    2

Figure 4.8: Partial Non-deterministic Finite Automaton for SD and IH design
used in his paper. Moreover, we expect that by associating each variable with an economic value, this model can be extended with a richer cost-of-change model to estimate the economic cost of each evolution step.
4.4 Design Structure Matrix Derivation
After generating the DA and the PWDR by clicking the “Solve” menu item in Simon, the user is
able to derive DSMs by providing additional clustering data. We compare the DSMs that Simon
generates from our KWIC ACN models with manual results we presented in previous work [64].
We generated DSMs through Simon using the clustering method seen in Figure 4.5 (c).
To ease the comparison, we copied and pasted the DSM generated from Simon into Excel
and marked the differences from the published manual models. Figures 4.9 and 4.10 present the
SD and IH DSMs generated by Simon and presented in Excel. In these DSMs, all the cells with
dark backgrounds and white foregrounds represent discrepancies between derived DSMs and those
developed by hand and presented in the earlier work of Sullivan et al. [64]. A blank dark cell means
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1:envr_input_format .
2:envr_input_size . x
3:envr_core_size x .
4:envr_alph_policy .
5:input_fun_sig .
6:circ_fun_sig .
7:alph_fun_sig .
8:output_fun_sig .
9:master_fun_sig .
10:input_ds x x . x x
11:circ_ds x x x . x
12:alph_ds x x x x .
13:output_ds .
14:input_fun_impl x x x x .
15:circ_fun_impl x x x x .
16:alph_fun_impl x x x x x x .
17:output_fun_impl x x x x x x x .
18:master_fun_impl x x x x x .
Figure 4.9: KWIC SD Derived DSM
that there was an erroneous mark in the manual version. A dark cell with an “x” in it means that
the dependence was missed in the manual version. In each DSM, variables 1–4 are environment
variables. The next run of variables is the design rule variables. The final run models the remaining
open design choices.
By comparison, we are able to answer the validation questions for DSM derivation. First of all,
our computed DSMs are largely consistent with the earlier results, validating the modeling and anal-
ysis concept. They reveal exactly the same key observations: the design rules, load-bearing walls of
an information hiding design, should be invariant with respect to changes in the environment, and
such changes should be accommodated merely by changes to hidden (subordinate) design variables
within independent modules.
There are differences, however, which we now address. First, the computed DSMs reveal subtle errors in the manually produced DSMs, supporting our intuition that logic modeling and automated analysis are more reliable than manual modeling and analysis. In the derived IH DSM, cells (17, 7) and (19, 8) reveal dependences missing from the manual model. The derived DSM also lacks several dependences that were erroneously present in the manual version. An extra variable, input_ds, which was redundant with linestorage_ds, was removed. Finally, the environment variables envr_core_size and envr_input_size are now shown as dependent, in that a change
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1:envr_input_format .
2:envr_input_size . x
3:envr_core_size x .
4:envr_alph_policy .
5:line_storage_adt .
6:input_adt .
7:circ_adt .
8:alph_adt .
9:output_adt .
10:master_adt .
11:line_storage_ds x x . x
12:line_storage_impl x x x .
13:input_impl x x x .
14:circ_ds x x . x
15:circ_impl x x x .
16:alph_ds x . x
17:alph_impl x x x x .
18:output_format . x
19:output_impl x x x .
20:master_impl x x x x x x .
Figure 4.10: KWIC IH Derived DSM
in one can be compensated for by a change to the other.
The second class of differences between Simon’s output and the manual calculation consists of
important ripple effects in the computed DSMs that are not shown in the manual version. For exam-
ple, the manually-constructed design (SD) DSM had no dependence between output_fun_impl
and circ_ds. The derived DSM revealed this dependence owing to two constraints in its ACN
model:
output_fun_impl = orig => alph_ds = orig
alph_ds = orig => circ_ds = index
Parnas's paper confirms the presence of this dependence, and thus the correctness of the formal model and derived DSM. Even in such a small example, manual DSM construction is error-prone. Automated tool support is critical for the correct modeling and analysis of complex design constraint networks.
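The ripple effect itself can be illustrated by chaining the variables in the two constraints quoted above; note that this graph-reachability framing is only an illustration of the effect, since Simon derives dependences from minimal perturbations of the DA rather than from syntactic chaining:

```python
# Transitive chaining: output_fun_impl constrains alph_ds, which in turn
# constrains circ_ds, so a change to circ_ds can ripple back to the
# output implementation choice.
def reaches(edges, src, dst):
    """Depth-first reachability over constraint-antecedent edges."""
    seen, stack = set(), [src]
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(b for (a, b) in edges if a == v)
    return False

# (antecedent variable, consequent variable) for the two SD constraints:
edges = {("output_fun_impl", "alph_ds"), ("alph_ds", "circ_ds")}
print(reaches(edges, "output_fun_impl", "circ_ds"))  # True
```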
4.5 Net Option Value Computation
Sullivan et al. [64] computed the NOV values for the manually constructed DSMs. Since the derived DSMs differ from the manual versions, we redo the experiment to see whether the results are consistent with the previously published work.
In that work, Sullivan et al. calculated the NOV value for each design using the formula introduced in Chapter 2: the SD design has a system NOV of 0.26, and the IH design 1.56, predicting that the IH design provides six times more value in the form of modularity than the SD design. In other words, the value of the SD design increases to 1.26 and that of the IH design to 2.56, suggesting that the IH version of the system was roughly twice as valuable as the SD version.
In Simon, we can repeat the result exactly by first clustering the DSMs in the same way as the manual ones and assigning the modules the same parameters as before. Figure 4.11 and Figure 4.12 are Simon snapshots repeating the previous experiments. The upper-right tables in Figures 4.11 and 4.12 show our assumptions about the technical potential, complexity, and visibility cost of the modules in the SD and IH designs.
Figure 4.11: NOV Computation for Manual KWIC SD
Figure 4.12: NOV Computation for Manual KWIC IH
Since the derived DSMs that Simon works on are quite different from the manual ones, this
repetition assumes: (1) the new design uses the same environment parameters we used for the
manual DSMs; (2) the coupling relations among hidden modules are the same; (3) each module
has the same parameters. We now analyze if these assumptions are still valid in the newly derived
DSMs.
First, in the previous work [64], we hypothesized the possible forces driving change that Parnas might have considered, or that appear to be implied in his analysis, and categorized them into three environment variables: computer configuration (e.g., device capacity, speed); corpus properties (e.g., input size, language, such as Japanese); and user profile (e.g., computer-savvy or not, interactive or offline), as shown in Figures 8, 9, and 10 of that paper [64]. The environment variables we used in the ACN
modeling are a direct translation of the possible changes that Parnas mentions in prose. Both environment models are valid, so the first assumption holds.
Second, as the DSMs in Figures 4.9 and 4.10 show, apart from the environment section, the main differences between the manual and derived DSMs concentrate in the dependences between design rules and hidden modules, and neither the manual nor the derived DSMs contain dependences among hidden modules. So the second assumption is valid.
The third assumption is problematic, though: (1) in both ACN models, we separate the interface of the Master Control module from its implementation, which influences the complexity count; for example, the derived SD DSM now has one more design variable than the manual SD model. (2) In the manual IH model, input_ds was redundant with linestorage_ds, and it is removed in the derived IH DSM shown in Figure 4.10. This difference influences the complexity and technical potential estimates of the Input module in the derived IH DSM: the complexity changes from 0.125 to 0.0625, and the technical potential drops from 2.5 to 1.6. We show the updated NOV computations for both designs in Figures 4.13 and 4.14.
Comparing the new NOV computation in Figure 4.13 with the old one in Figure 4.11, and Figure 4.14 with Figure 4.12, each pair shares the same technical potential (assuming the same environments) but differs in its complexity estimates. According to the derived DSMs, the system NOV is now 0.29 for the SD design and 1.30 for the IH design. Focusing just on modularity, the model still predicts that the IH design provides about 4.5 times more value in the form of modularity than the SD design, so our comparative result remains valid.
During this exercise, we found an issue in Baldwin and Clark's NOV formula introduced in Chapter 2: when they calculate the visibility cost of module i of size n using the term Z_i = Σ_{j sees i} c·n, it is not clear whether ripple effects should be counted. That is, if j sees i, and k sees j, it is not clear whether the cost of changing k should be counted.
Baldwin and Clark's DSMs are not intended to show ripple effects, so they must compute higher-order DSMs to account for transitive dependences. In our derived DSMs, all the variables affected through transitive relations appear in the same column as the changing variable. As a result, their formula can be applied unambiguously to our derived DSMs.
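The ambiguity can be made concrete with a small sketch (hypothetical module sizes and cost coefficient; the `sees` relation and the two readings of the Z_i term are our own encoding):

```python
# Illustrative sketch: the two readings of the visibility-cost term Z_i
# differ when "sees" is taken directly vs. transitively.  In a derived
# DSM the transitive effects are already recorded in the columns, so
# only the direct reading is needed.

def visibility_cost(sees, i, size, c, transitive):
    viewers = set(sees.get(i, ()))
    if transitive:  # also count k where k sees j, j sees i, and so on
        frontier = list(viewers)
        while frontier:
            j = frontier.pop()
            for k in sees.get(j, ()):
                if k not in viewers:
                    viewers.add(k)
                    frontier.append(k)
    return sum(c * size[j] for j in viewers)

sees = {"i": ["j"], "j": ["k"]}        # j sees i; k sees j
size = {"j": 2, "k": 3}                # hypothetical module sizes
print(visibility_cost(sees, "i", size, c=1.0, transitive=False))  # 2.0
print(visibility_cost(sees, "i", size, c=1.0, transitive=True))   # 5.0
```

The gap between the two results (2.0 vs. 5.0) is exactly the cost of the ripple through k that the original formula leaves unspecified.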
Figure 4.13: NOV Computation for Derived KWIC SD
4.6 Chapter Summary
This chapter evaluated a number of claims about our framework against a software engineering benchmark, Parnas's KWIC designs. The data and analysis support the following hypotheses: the framework is expressive enough to formally account for key notions of Parnas's and Baldwin and Clark's theories; it enables the derivation of design coupling structures as pair-wise relations on design decisions and their presentation as DSMs; and it automates both Parnas's changeability analysis and Baldwin and Clark's net option value analysis. The automated changeability analysis quantitatively verified Parnas's qualitative results. The comparison of the derived DSMs with the manual models revealed errors in published work, showing the power of formal models and automated analysis. The automated NOV calculation enables sensitivity analysis of the model, comparing results obtained under different parameter choices.
Figure 4.14: NOV Computation for Derived KWIC IH
Chapter 5
Model Decomposition and Result Integration
As with many formal analysis techniques, such as model checking, the difficulty of constraint sat-
isfaction limits the size of models that can be analyzed in practice. As the reader may have noticed,
the DA model is even more demanding: it requires an explicit representation of the entire space of satisfying solutions. The number of solutions can grow exponentially with the number of variables involved, so it is impractical to represent DAs explicitly when the state space is very large. This chapter addresses the problem caused by the need to represent the complete state space using a divide-and-conquer approach: we exploit the non-trivial dominance relation in an ACN model to split an ACN at natural breaking points into a number of smaller sub-ACNs, solve each sub-ACN separately, and integrate partial results, but only as needed, to produce the desired answer more efficiently. The performance gain comes from two sources: (1) the SAT solver now deals with much smaller models; (2) we no longer need to generate the full DA. We claim that this approach has the potential to dramatically reduce analysis time, at least for the problems we studied. This chapter supports this claim by comparing the analysis time for the KWIC designs with and without decomposition. The experiment demonstrates a dramatic performance improvement: without decomposition, it takes hours to generate DSMs from the KWIC IH ACN; with decomposition, we obtain the same results within a minute.
In this chapter, we use the KWIC information hiding (IH) ACN as a running example, showing how to decompose it into a set of smaller sub-ACNs, solve each of them individually,
1: envr_input_size: {small, medium, large};
2: envr_core_size: {small, large};
3: linestorage_ADT: {orig, other};
4: linestorage_ds: {core0, core4, disk, other};
5: linestorage_impl: {orig, other};
6: circ_ADT: {orig, other};
7: circ_ds: {copy, index, other};
8: circ_impl: {orig, other};
9: linestorage_impl = orig => linestorage_ADT = orig && linestorage_ds = core4;
10: linestorage_ds = core4 => envr_input_size = medium || envr_input_size = small;
11: linestorage_ds = core0 => envr_input_size = small && envr_core_size = large;
12: linestorage_ds = disk => envr_input_size = large;
13: circ_ds = copy => envr_input_size = small || envr_core_size = large;
14: circ_impl = orig => circ_ADT = orig && circ_ds = index && linestorage_ADT = orig;
Figure 5.1: Partial KWIC Information Hiding ACN model
and integrate the results. Chapter 7 formalizes the decomposed models and proves the accuracy of
the integrated results. Chapter 9 presents additional evidence in this dimension.
5.1 ACN Splitting
To provide a full picture of how this approach works, we consider part of the KWIC information
hiding model involving 8 design variables and the constraints among them as an example ACN.
Figure 5.1 presents the constraint network. One of the ACN elements, the cluster set, plays no role in the splitting approach, so we ignore it in this chapter. The dominance relation of the example dictates the following: (1) variables prefixed with "envr_" (environment variables) should not be influenced by any other variables; (2) variables ending with "_ADT" (design rules) should not be influenced by variables ending with "_ds" or "_impl".
Our splitting approach takes the following steps:
1. Construct a graph depicting how the variables are syntactically connected. To construct such a graph, we first translate the constraints of the ACN into conjunctive normal form (CNF). For example, the constraints shown in Figure 5.1 translate into the CNF shown in Figure 5.2.
(¬linestorage_impl = orig ∨ linestorage_ADT = orig) ∧
(¬linestorage_impl = orig ∨ linestorage_ds = core4) ∧
(¬linestorage_ds = core4 ∨ envr_input_size = medium ∨ envr_input_size = small) ∧
(¬linestorage_ds = core0 ∨ envr_input_size = small) ∧
(¬linestorage_ds = core0 ∨ envr_core_size = large) ∧
(¬linestorage_ds = disk ∨ envr_input_size = large) ∧
(¬circ_ds = copy ∨ envr_input_size = small ∨ envr_core_size = large) ∧
(¬circ_impl = orig ∨ circ_ADT = orig) ∧
(¬circ_impl = orig ∨ circ_ds = index) ∧
(¬circ_impl = orig ∨ linestorage_ADT = orig)
Figure 5.2: Conjunctive Normal Form
Without loss of generality, we assume that each variable in the ACN is involved in at least one
clause. We then model each clause as a complete directed subgraph: each variable is a node, each node connects to every other node, and the variables' values are ignored. As a result, the whole CNF transforms into a directed graph G_cnf = <V, E>, where V is the variable set of the ACN. We assume that this graph is at least weakly connected; otherwise, we consider each weakly connected subgraph separately.
This graph models the most conservative dependence relation among variables: if two vari-
ables appear in the same clause, they depend on each other syntactically. As a result, for the
partial KWIC example shown in Figure 5.1, every variable connects with every other variable
directly or indirectly, as shown in Figure 5.3.
2. Remove edges according to the dominance relation. If a variable pair (vi, vj) ∈ dominance, remove the edge <vi, vj> from G_cnf. We call the resulting graph G = <V, E>. If G is not weakly connected, consider each subgraph separately. In Figure 5.3, the dotted lines labeled X are the edges to be excluded.
3. Construct the condensation graph. We use M. Sharir's algorithm [6] to find the strongly connected components of G and construct its condensation graph G* = <V*, E*>. Figure 5.4 shows the condensation graph of Figure 5.3, in which V* = {V0, V1, V2, V3, V4}. Each node of G* represents a strongly connected component of G, comprising a set of variables. G* is a directed acyclic graph (DAG) [6], inducing a partial order on V*.
[Graph figure; dotted edges labeled X mark the edges excluded by the dominance relation]
Figure 5.3: Partial KWIC CNF graph
[Condensation graph with nodes V0–V4]
Figure 5.4: KWIC Condensation Graph
4. Construct sub-ACNs. The number of sub-ACNs equals the number of minimal elements of G*. Figure 5.4 has two minimal elements, V3 and V4, so we construct two sub-ACNs in the following way:
(a) Construct the variable set. For each minimal element, the variable set of the corresponding sub-ACN is the union of all the nodes of G* (each being a set of variables) that lie on chains ending with that element. According to Figure 5.4, the variable set of the first sub-ACN is the union of V0, V1, and V3: {envr_input_size, envr_core_size, linestorage_ADT, linestorage_ds, linestorage_impl}. The union of V0, V1, V2, and V4 is the variable set of the second sub-ACN: {envr_input_size, envr_core_size, linestorage_ADT, circ_ADT, circ_ds, circ_impl}.
(b) Construct the constraint set for each sub-ACN. If the variable set of a sub-ACN contains all the participating variables of a CNF clause, we put the clause into the constraint set of that sub-ACN. As a result, we obtain the two constraint networks for these two sub-ACNs, shown in Figure 5.5 and Figure 5.6. It is possible that some clauses do not belong to any sub-ACN. In this case, we consider the graph G_left made from the complete graphs of these clauses. Each connected component of G_left forms a new sub-ACN: the corresponding clauses form its constraint set, and all the variables involved in these clauses form its variable set.
(c) Construct the dominance relation for each sub-ACN. From the dominance relation of the whole ACN, each sub-ACN selects the subset that involves only its own variables.
We have observed that in practice this method tends to group variables into cohesive, sparsely
overlapping sets that correspond to key features of the design [19]. For example, Figure 5.5 shows
an ACN corresponding to the line storage function; Figure 5.6 shows an ACN corresponding to the
circular shift function.
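The four-step splitting procedure above can be sketched in Python. This is an illustrative reconstruction on a pared-down fragment of the Figure 5.1 model (envr_core_size omitted); the clause, edge-direction, and dominance encodings are our own conventions, not Simon's:

```python
# Illustrative sketch of ACN splitting.  A clause is a set of variables;
# an edge a -> b means "a may influence b"; a dominance pair (a, b) means
# "a must not be influenced by b", which removes the edge b -> a.

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: returns (component-id map, component list)."""
    visited, order = set(), []
    for s in nodes:                      # first pass: record finish order
        if s in visited:
            continue
        stack = [(s, False)]
        while stack:
            v, done = stack.pop()
            if done:
                order.append(v)
            elif v not in visited:
                visited.add(v)
                stack.append((v, True))
                stack.extend((w, False) for w in edges[v] if w not in visited)
    rev = {v: set() for v in nodes}      # second pass: reversed graph
    for v in nodes:
        for w in edges[v]:
            rev[w].add(v)
    comp, comps = {}, []
    for v in reversed(order):
        if v in comp:
            continue
        members, stack = [], [v]
        comp[v] = len(comps)
        while stack:
            u = stack.pop()
            members.append(u)
            for w in rev[u]:
                if w not in comp:
                    comp[w] = len(comps)
                    stack.append(w)
        comps.append(frozenset(members))
    return comp, comps

def sub_acn_variables(clauses, dominance):
    nodes = sorted(set().union(*clauses))
    edges = {v: set() for v in nodes}
    for clause in clauses:               # step 1/2: clause graph minus dominated edges
        for a in clause:
            for b in clause:
                if a != b and (b, a) not in dominance:
                    edges[a].add(b)
    comp, comps = strongly_connected_components(nodes, edges)
    out = {i: set() for i in range(len(comps))}   # step 3: condensation DAG
    for v in nodes:
        for w in edges[v]:
            if comp[v] != comp[w]:
                out[comp[v]].add(comp[w])
    parents = {i: set() for i in range(len(comps))}
    for i, succs in out.items():
        for j in succs:
            parents[j].add(i)
    subs = []                            # step 4: one sub-ACN per minimal element
    for sink in (i for i in range(len(comps)) if not out[i]):
        keep, stack = {sink}, [sink]
        while stack:
            i = stack.pop()
            for p in parents[i]:
                if p not in keep:
                    keep.add(p)
                    stack.append(p)
        subs.append(set().union(*(comps[i] for i in keep)))
    return subs

clauses = [
    {"linestorage_impl", "linestorage_ADT"}, {"linestorage_impl", "linestorage_ds"},
    {"linestorage_ds", "envr_input_size"},   {"circ_impl", "circ_ADT"},
    {"circ_impl", "circ_ds"},                {"circ_impl", "linestorage_ADT"},
    {"circ_ds", "envr_input_size"},
]
dominance = {
    ("envr_input_size", "linestorage_ds"), ("envr_input_size", "circ_ds"),
    ("linestorage_ADT", "linestorage_impl"), ("linestorage_ADT", "circ_impl"),
    ("circ_ADT", "circ_impl"),
}
for sub in sub_acn_variables(clauses, dominance):
    print(sorted(sub))   # one variable set per sub-ACN
```

On this fragment the two sinks are the {linestorage_ds, linestorage_impl} and {circ_ds, circ_impl} components, reproducing the line-storage and circular-shift variable sets.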
This approach splits the whole KWIC information hiding ACN into 6 sub-ACNs, having 6, 6,
4, 5, 7, and 5 variables respectively. Instead of taking hours to solve a constraint network with 20
envr_input_size: {small, medium, large};
envr_core_size: {small, large};
linestorage_ADT: {orig, other};
linestorage_impl: {orig, other};
linestorage_ds: {core0, core4, disk, other};
(!linestorage_impl = orig || linestorage_ADT = orig) &&
(!linestorage_impl = orig || linestorage_ds = core4) &&
(!linestorage_ds = core4 || envr_input_size = medium || envr_input_size = small) &&
(!linestorage_ds = core0 || envr_input_size = small) &&
(!linestorage_ds = core0 || envr_core_size = large) &&
(!linestorage_ds = disk || envr_input_size = large)
Figure 5.5: The First sub-ACN
envr_input_size: {small, medium, large};
envr_core_size: {small, large};
linestorage_ADT: {orig, other};
circ_impl: {orig, other};
circ_ADT: {orig, other};
circ_ds: {copy, index, other};
(!circ_ds = copy || envr_input_size = small || envr_core_size = large) &&
(!circ_impl = orig || circ_ADT = orig) &&
(!circ_impl = orig || circ_ds = index) &&
(!circ_impl = orig || linestorage_ADT = orig)
Figure 5.6: The Second sub-ACN
variables, Simon now invokes Alloy and its underlying SAT solvers to solve these much smaller
models separately, which takes only seconds. Similarly, the DA and PWDR models for each sub-
ACN can be generated individually and quickly.
As we have already explained, the design automaton (DA) is the key model enabling both the
derivation of the PWDR model and design impact analysis. However, integrating these sub-DAs
into a full DA would again be costly. Fortunately, for some of the analyses that we are interested in,
it is not necessary to depend on an integrated full DA. Instead, analyses can be done on each sub-
ACN, and the results can be integrated into the solution to the problem modeled using the whole
ACN. In particular, we can compute design structure matrices and analyze design change impact in
this way.
5.2 Integrating Analysis Results
After decomposing an ACN into a number of sub-ACNs, we need to generate the sub-DAs of
sub-ACNs for the purpose of analysis. However, a sub-ACN generally has both a smaller set of variables and a weaker set of constraints than the full ACN from which it was derived, and so can have solutions that are inconsistent not only with those of the full ACN but also with those of other sub-ACNs. In order to generate sub-DAs, we first compute the consistent solution set of each sub-ACN, by which we mean the subset of its solutions that are consistent with the solutions of the other sub-ACNs in the given decomposition of the full ACN. After that, we generate the consistent sub-DAs of these sub-ACNs. Chapter 7 formalizes these ideas. In this
chapter, we use sub-DAs to stand for consistent sub-DAs. After sub-DAs are generated, both design
impact analysis and DSM derivation can be done using these sub-models, and their results can be
integrated into full solutions.
5.2.1 Integrating Design Impact Analysis Results
We consider the basic design impact analysis question: given an original design, what are all the
ways to compensate for a design decision change (or an environment condition change)? Instead of
[Partial DA diagram: states L0, L1, L2, and L3 with change-labeled transitions]
Figure 5.7: Partial DA for the Linestorage sub-ACN
modeling the original design as a solution to the ACN, deriving the DA of the ACN, and finding the solution, as introduced in Section 3, this section presents a method for finding sub-solutions from each sub-ACN and sub-DA and integrating them into a full solution. We take the ACN with the constraint
network shown in Figure 5.1 as an example, and suppose the original design is as follows:
1: envr_input_size = medium
2: envr_core_size = small
3: linestorage_ADT = orig
4: linestorage_ds = core4
5: linestorage_impl = orig
6: circ_ADT = orig
7: circ_ds = index
8: circ_impl = orig
We have shown that this ACN can be split into two sub-ACNs (their CNs are shown in Figure 5.5
and Figure 5.6). For the designated starting design, we first find the corresponding start states in each sub-DA: in this example, state L0 in Figure 5.7 and state C0 in Figure 5.8. We call L0 and C0 compatible states because their shared variables have the same values.
Given a changing variable, we distinguish the following two cases:
[Partial DA diagram: states C0 and C1 with change-labeled transitions]
Figure 5.8: Partial DA for the CircularShift sub-ACN
If the changing variable is local, that is, no other sub-ACN involves this variable, then the design impact analysis can be done locally using the sub-DA. The variable linestorage_ds of the sub-ACN shown in Figure 5.5 is local and has only local impact: changing its value to other leads state L0 to L1, as shown in Figure 5.7.
If the changing variable is shared among sub-ACNs, e.g., envr_input_size appears in both Figure 5.7 and Figure 5.8, we integrate the results as follows:
1. Find the destination states in each sub-DA labeled with this change: in Figure 5.7, changing envr_input_size to large reaches L2 and L3; in Figure 5.8, the same change leads to C1.
2. Compute the cross product of these two sets of states, that is, {L2, L3} × {C1}. In this procedure, a state in one sub-DA is unioned with a state in another sub-DA only when the two states are compatible. As a result, L2 ∪ C1 and L3 ∪ C1 are the two destination designs that we are looking for.
In other words, if the full DA for the full ACN had been generated, the design impact analysis for
the same original design under the same change would have led to the design states that are identical
to the integrated destination states. Chapter 7 formally proves that a DA, identical to the DA directly
derived from the original ACN, can be composed from the sub-DAs, and that the integrated design
impact analysis is valid.
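The two integration steps can be sketched as follows (a hypothetical helper; the state contents are illustrative rather than copied from the DA figures):

```python
# Hypothetical sketch of result integration.  A state is a dict mapping
# variables to values; two states are compatible when their shared
# variables agree, and each compatible pair is merged (unioned) into one
# full destination design.

def compatible(s, t):
    return all(t[v] == x for v, x in s.items() if v in t)

def integrate(dests_a, dests_b):
    return [{**a, **b}                  # union of the two sub-states
            for a in dests_a for b in dests_b if compatible(a, b)]

# Illustrative destination states after changing envr_input_size to large:
L2 = {"envr_input_size": "large", "linestorage_ADT": "orig",
      "linestorage_ds": "disk", "linestorage_impl": "other"}
L3 = {"envr_input_size": "large", "linestorage_ADT": "orig",
      "linestorage_ds": "other", "linestorage_impl": "other"}
C1 = {"envr_input_size": "large", "linestorage_ADT": "orig",
      "circ_ds": "index", "circ_impl": "orig"}

full_designs = integrate([L2, L3], [C1])   # {L2, L3} x {C1}
print(len(full_designs))                   # 2 full destination designs
```

Incompatible pairs are simply dropped by the filter, so the cross product never produces a design that violates the shared-variable agreement.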
5.2.2 Integrating Coupling Structures
It is possible that a sub-DA solution satisfies its own constraints, but makes the full constraint
network inconsistent. We call such a solution of a sub-DA an incompatible solution. In order to
compose the pair-wise dependence relation (PWDR) of the full ACN from the derived sub-PWDRs,
we first need to remove these incompatible states from each sub-DA. After that, we derive sub-
PWDRs from sub-DAs and compute the union of the sub-PWDRs to get the PWDR for the full
ACN.
This method does not reduce the complexity of deriving a PWDR from a constraint network,
which is NP-complete. In fact, the operation of removing incompatible states has exponential
complexity. The essence of our method is to strike a balance between two extremes: at one extreme, the large ACN is solved as a whole; at the other, each clause of a CNF expression can be treated as an individual ACN and solved independently. However, to integrate the sub-PWDRs derived from such per-clause sub-ACNs, comparing each solution of each sub-ACN with every solution of every other sub-ACN would again be time-consuming.
Our method decomposes an ACN so that each sub-ACN needs to be compared only with the other sub-ACNs that share variables with it. For example, the two KWIC sub-ACNs we present share three variables. The comparison and the removal of incompatible states are executed along with the DA generation procedure. Many methods have been explored to decompose and cluster a constraint network; as we discuss in Section 5.4, our method is orthogonal to them, and combining them may further improve the performance.
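The removal of incompatible solutions and the union of sub-PWDRs can be sketched with toy data (our formulation, not Simon's implementation):

```python
# Sketch: a solution of one sub-ACN survives only if at least one
# solution of a neighboring sub-ACN agrees with it on their shared
# variables.  The full PWDR is then the union of the sub-PWDRs.

def consistent_solutions(sols_a, sols_b):
    shared = set(sols_a[0]) & set(sols_b[0])
    def proj(s):                         # projection onto shared variables
        return tuple(sorted((v, s[v]) for v in shared))
    views_b = {proj(s) for s in sols_b}
    return [s for s in sols_a if proj(s) in views_b]

sols_a = [{"x": 0, "y": 0}, {"x": 1, "y": 1}]
sols_b = [{"x": 0, "z": 5}]                     # only x = 0 is realizable
print(consistent_solutions(sols_a, sols_b))     # [{'x': 0, 'y': 0}]

# union of sub-PWDRs (pair-wise dependence relations)
pwdr_a = {("x", "y")}
pwdr_b = {("x", "z")}
print(pwdr_a | pwdr_b)
```

Because each sub-ACN is only projected onto the variables it shares with its neighbors, the comparison cost stays proportional to the overlap rather than to the full solution space.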
Figure 5.9 presents two snapshots of Simon in which the sub-DSMs are derived from the sub-ACNs shown in Figure 5.5 and Figure 5.6. These two sub-DSMs are parts of the full DSM shown in Figure 4.10 of Chapter 4. We have used Simon to compose full DSMs for both the information hiding and sequential designs; they are exactly the same as the DSMs we generated from the full ACNs, as presented in Chapter 4. Chapter 7 proves that the integrated PWDR is identical to the PWDR derived directly from the original ACN.
Figure 5.9: KWIC SD Modularized
5.3 Observations and Performance
We decompose both the SD and IH ACNs introduced in Chapter 4 and compare the resulting sub-models. Simon decomposes the IH ACN into 6 sub-ACNs, having 6, 6, 4, 5, 7, and 5 variables, as summarized in Table 5.1. The SD ACN is decomposed into 5 sub-ACNs, having 9, 8, 8, 9, and 6 variables, as summarized in Table 5.2.
We observe that the information hiding design yields one more sub-ACN, and that most information hiding sub-ACNs are smaller than the sequential-design sub-ACNs. From the IH sub-models, it is easy to tell that each sub-model corresponds to a main function. We have shown that the CN in Figure 5.5 corresponds to the line storage function and that the CN in Figure 5.6 corresponds to the circular shift function. The other four sub-ACNs correspond to the input, alphabetizing, output, and master control functions, respectively. After decomposing the SD ACN, however, it is hard to tell immediately what function each sub-ACN models. Figure 5.10 shows one of the SD sub-DSMs. We make similar observations for the other decomposed SD sub-ACNs.
We now compare the performance of design impact analysis and DSM derivation from a full ACN with that from decomposed sub-ACNs. Without decomposition, for the KWIC SD model with 18 variables, it took Alloy about an hour on a Pentium 1.5 GHz PC with 512 MB RAM to find the 12,018 solutions, and then 11 minutes to compute the DA and DSM. For the IH model with 20 variables,
Table 5.1: The Variables of IH sub-ACNs

sub-ACN  Size  Variables                                                      CN (sec)  DA (sec)
1        6     alph_impl, alph_ADT, alph_ds, circ_ADT, linestorage_ADT,       14        < 1
               envr_alph_policy
2        6     circ_impl, circ_ADT, circ_ds, linestorage_ADT,                 18        < 1
               envr_input_size, envr_core_size
3        4     input_impl, input_ADT, linestorage_ADT, envr_input_format      6         < 1
4        5     linestorage_impl, linestorage_ADT, linestorage_ds,             9         < 1
               envr_input_size, envr_core_size
5        7     master_impl, master_ADT, linestorage_ADT, input_ADT,           20        < 1
               circ_ADT, alph_ADT, output_ADT
6        5     output_impl, output_ADT, output_ds, linestorage_ADT, alph_ADT  8         < 1

Table 5.2: The Variables of SD sub-ACNs

sub-ACN  Size  Variables                                                      CN (sec)  DA (sec)
1        9     envr_input_size, envr_core_size, envr_input_format, input_sig, 6         < 1
               input_ds, input_impl, alph_ds, envr_alph_policy, circ_ds
2        8     envr_input_size, circ_sig, circ_ds, envr_core_size,            22        < 1
               circ_impl, envr_alph_policy, alph_ds, input_ds
3        8     envr_input_size, alph_sig, circ_ds, envr_core_size,            17        < 1
               alph_impl, envr_alph_policy, alph_ds, input_ds
4        9     envr_input_size, circ_ds, alph_ds, envr_core_size, input_ds,   10        < 1
               output_sig, envr_alph_policy, output_ds, output_impl
5        6     master_impl, master_sig, input_sig, circ_sig, alph_sig,        68        < 1
               output_sig
Figure 5.10: A SD sub-ACN
Alloy took about three hours to find the 34,907 solutions, and the DA and DSM computation took another 2 hours and 13 minutes. After decomposition, all the DAs and DSMs are generated in about one minute.
Since Alloy is not designed for the purpose we are using it for, the original inefficiency is partly due to this mismatch. However, our performance improvement does not depend on this fact: after decomposition, each sub-ACN is still solved by Alloy. The performance gain comes from the fact that Simon now invokes multiple solver runs, each on a much smaller model, and then integrates the results quickly.
5.4 Related Work
In the constraint network realm, various ways have been developed to decompose and cluster constraint problems. Major decomposition methods include conjunctive decomposition [31], disjunctive decomposition [31], tree clustering [22], etc. These methods exploit the structure of constraint graphs to decompose a CN and compose the results. Our work differs in two ways. First, our method is not based on traditional constraint graphs: the edges in our CNF graph do not represent concrete logical relations, and the nodes do not represent variable-value pairs. Second, we use the dominance relation to cut edges from the CNF graph. The dominance relation models a hierarchical structure determined by the software architecture that is not captured by logical relations among design dimensions. By contrast, most constraint decomposition methods work on inconsistent variable-value pairs, such as Choueiry's work [20].
Our modeling of human design activity using the dominance relation and clustering distinguishes our work from most pure constraint-solving techniques. In addition, the purpose of those methods is to find optimal designs; ours, in contrast, is dependence and evolvability analysis. On the other hand, our technique is orthogonal to these constraint-solving methods: they can be used to improve the performance of solving a full ACN, or applied to each sub-ACN after decomposition.
There are many bottom-up clustering approaches that automatically discover clusters from source code, such as the work of Belady and Evangelisti [12], Hutchens and Basili [39], Schwanke [56], and Mancoridis [52]. In contrast to this work, our method does not require the existence of source code; instead, a system is decomposed based on an abstract design model.
Feature-oriented design analysis treats each feature as a whole. FODA tools, such as AHEAD [10, 9], aggregate all the base code related to a feature into a module; which part of the source code belongs to which feature is determined manually. Our approach works at a higher level, decomposing a system automatically according to its underlying logical structure.
Aspect-oriented programming promises to localize concerns, under the assumption that crosscutting concerns can be captured using built-in pointcut designators. However, not all concerns are syntactically related, for example, a feature that crosscuts hardware, database, and algorithm choices. Our work makes no such syntax-based assumptions.
5.5 Chapter Summary
In summary, this chapter addressed the inefficiency problem caused by the brute-force technique. We presented our approach, which makes use of the non-trivial dominance relation in an ACN model, splitting an ACN at natural breaking points into a number of smaller sub-ACNs, solving each sub-ACN individually, and integrating partial results to produce the desired answer more efficiently. We evaluated, against the canonical KWIC designs, the hypothesis that this approach has the potential to dramatically reduce analysis time, at least for problems with tractable state-space sizes. The experiment demonstrates a dramatic performance improvement, justifying the potential utility of this approach.
Chapter 6
Model Extension and Structural Design Impact Analysis
Previous chapters have shown how this framework connects conceptual designs modeled by ACNs to existing evolvability and economic analyses. However, as a conceptual design description model, a flat ACN, that is, a model involving a fixed number of dimensions with a fixed number of choices in each dimension, is known to be insufficient to capture key aspects of real software design problems.
First, an ACN model has only scalar-valued variables. Fred Brooks, among others, has recognized that such a simple, traditional design space model is inadequate to capture the complexity of many real-world software design problems [17]. As Baldwin and Clark point out [7], some design dimensions are "called into being" by other decisions. Scalar variables are not sufficient to model these dimensions and their impacts. Second, it is not uncommon that a decision brings up not only new dimensions but also new constraints among those dimensions, or between new and existing dimensions. For example, the choice of a design pattern not only introduces new dimensions specific to the pattern but also imposes pattern-specific constraints on new and existing design dimensions. Third, design decisions can crosscut each other. For
example, we need a more expressive model to represent such decisions as “all the objects taking
the subject role should implement the prevailing policy.” When a new object is added to the system
as a subject, as part of the impact analysis, the designer should be aware of the notification policy
in use, and of other constraints imposed by the choice of observer pattern. These complex design
decisions have structural impacts. Capturing their existence explicitly is necessary for analysis of
their impacts.
This chapter contributes a richer model, which we call the complex augmented constraint network
(CACN), to address these inadequacies. Our approach is to model these complex decisions
using set values and subspace values, and model crosscutting constraints using universally quan-
tified logical expressions. To analyze the impact of these decisions, the user first parameterizes
the extended model into simpler models represented by ACNs. After that, the user can compare
the resulting designs using the analysis techniques developed on ACNs. We illustrate the extended
model using Hannemann and Kiczales’s Figure Editor (FE) [38], which has been used as a repre-
sentative example in a large number of publications to demonstrate various problems, especially the
problems related to aspect-oriented programming [43, 37, 33].
We claim that (1) our framework is expressive enough to capture representative design decisions
exemplified by the FE design, such as the choice of a design pattern or the choice of pattern
implementation paradigm (AO or OO); (2) our framework is general enough to account for both
aspect-oriented and object-oriented modularity in uniform, declarative terms; and (3) our framework
automates Hannemann and Kiczales's analysis precisely. We evaluate the first two claims by modeling
both the OO and AO designs the authors described in their paper; we evaluate the last claim by com-
paring our automatically generated results with the authors’ qualitative analysis. Our experiment
provides positive evidence to support these claims.
Section 6.1 introduces the FE example. Section 6.2 through Section 6.4 present our extended
model. Section 6.5 explains how to parameterize a CACN into a number of ACNs. Section 6.6
shows how to analyze the impacts of high level design decisions that incur structural changes, which
we call Structural Design Impact Analysis (SDIA).
6.1 Figure Editor Example
Figure 6.1 shows the Figure Editor (FE) design modeled using the Unified Modeling Language
(UML), a de facto industry standard. The Figure Editor is a tool for editing drawings comprising
Figure 6.1: OO Observer Pattern UML Class Diagram
points and lines (figure elements), where a screen displays each figure element, always reflecting
the figure elements’ current states [43]. Figure 6.1 presents a UML class diagram model of one
possible design based on the observer pattern [33]. The Subject class serves as an abstract interface
for the concrete subjects: Point and Line. The Observer class provides an abstract interface for the
concrete observer: Screen.
Gamma et al. [33] mentioned and analyzed several important design dimensions and the dif-
ferent consequences of making different decisions in these dimensions. For example, notification
policy is a design dimension in which various design decisions can be made. Possible choices for
the notification policy include a pull or a push model [33]:
At one extreme, which we call the push model, the subject sends observers detailed information
about the change, whether they want it or not. At the other extreme is the pull model: the subject
sends nothing but the most minimal notification, and observers ask for details explicitly thereafter.
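The push/pull distinction can be made concrete with a small sketch. The following Python fragment is purely illustrative; the class and method names are hypothetical and are not taken from the Figure Editor implementation:

```python
# Illustrative sketch of the two notification extremes from Gamma et al.:
# push sends the changed state along with the notification; pull sends only
# a minimal signal, after which the observer queries the subject itself.
class Subject:
    def __init__(self):
        self.observers = []
        self.color = "black"

    def set_color(self, color, push):
        self.color = color
        for obs in self.observers:
            if push:
                obs.update(color=color)    # push: detailed information sent
            else:
                obs.update(subject=self)   # pull: observer asks explicitly

class Screen:
    def __init__(self):
        self.last_seen = None

    def update(self, color=None, subject=None):
        # Under pull, the observer retrieves the state it cares about itself.
        self.last_seen = color if color is not None else subject.color

screen = Screen()
point = Subject()
point.observers.append(screen)
point.set_color("red", push=False)   # pull-style notification
```

Either policy keeps the screen consistent with the figure elements; what varies is which side carries knowledge of what changed, which is exactly the kind of coupling question this chapter's analyses make explicit.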
Another design dimension concerns the choice of the data structure used for the mapping from
subjects to observers, such as a hash table. Third, the update policy could be complex enough that
a change manager might be needed.
Hannemann and Kiczales [38] identified additional but implicit design dimensions, such as
the role assignment, which demands decisions about which objects are observers and which are
subjects. They compared their aspect-oriented (AO) design of the observer pattern with the object-
oriented (OO) observer pattern in terms of the changeability in these dimensions. Their analysis
focused on answering evolvability questions such as the following: what are the consequences if
the client changes the role assignment, requiring the Screen to be both a subject and an observer?
Which parts should be changed? In the current design, the subject color is the only state of interest
under observation. What if the observing policy changed so that the positions of the figure elements
should also be observed? The authors analyzed these problems descriptively and showed the code
implementing these choices as evidence for their analysis. However, designers frequently face
such questions before coding.
We observed in UML models problems similar to those we observed with architectural
description methods: while UML notations better represent object-oriented program structures,
they do not provide effective ways to represent such design decisions as the choice of role assign-
ment, choice of mapping data structure, choice of observing policy, choice of pattern, or choice
of paradigm. These choices have profound impacts on the design coupling structures that strongly
influence crucial design quality attributes, such as evolvability, the best way to accommodate given
changes, and the economic value of flexible design architecture. In addition, the FE example exem-
plifies two additional problems:
First, a decision at one level often alters a design space structure by introducing new variables
and constraints. For example, the interaction between the Screen and the figure elements can also be
designed with other patterns, e.g., a mediator pattern, which in turn can use either an AO or an OO
paradigm [38]. The choices in the pattern and paradigm dimensions have significant consequences
in that each choice calls into being a different design subspace that introduces both new dimensions
and constraints that are potentially scoped over other variables in the design. The structure of a
design space is thus not fixed but is, in general, contingent on prior decisions and recursive in struc-
ture. State-of-the-art design modeling approaches do not adequately represent this phenomenon.
Consequently, it is difficult to analyze the structural and economic consequences of making such
high-level design decisions.
Second, the effects of design decisions are frequently not local but crosscutting. For example, all
the subjects have to respect the agreed notification policy, push or pull. Prevailing design modeling
techniques do not adequately represent design decisions with crosscutting effects. Consequently,
it is difficult to have a clear picture of the structural and economic consequences of making or
changing such crosscutting design decisions.
Three new elements extend the ACN model: (1) Set-Valued Design Variables modeling di-
mensions in which each choice is a set of other dimensions; (2) Quantified constraints modeling
crosscutting relations among decisions; (3) Hierarchical Design Variables modeling dimensions in
which each choice is a sub-design with new dimensions and constraints. We call this extended
formal model the Complex Augmented Constraint Network (CACN). Figure 6.2 shows the Figure
Editor CACN model. The next section explains the extended model.
6.2 Set-Valued Design Variables
In many cases, the choice in one dimension can be a set of other variables, each itself potentially
designating a complex dimension in the design space. We model such design dimensions as set-
valued design variables (SDVs), and each choice (value) as a named set.
In the Figure Editor example, the decision about what elements a figure editor system should
contain is a set of new design dimensions, such as {Point, Line, Screen}. We use the variable
elements to model this dimension. Each decision in this dimension brings into being a set of new
design dimensions, in this case, point, line, and screen, each modeled as a scalar variable with
the same domain: (orig, other).
Line 1 in Figure 6.2 demonstrates how Simon models this SDV in its internal language:
set elements(orig, other): (v1{point, line, screen}, other). The shared domain
is defined after the SDV name. Each set value has a name, followed by a set of elements within a
pair of curly brackets. This line says that v1 is a decision value that is a set with three elements:
point, line, and screen. As with other variable definitions, we use other as a value to represent
unelaborated possibilities.
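The mechanics of an SDV can be mirrored in ordinary code. The following Python sketch is hypothetical, not Simon's internal representation: it models an SDV as a shared member domain plus named set values, and expands one choice into the scalar member variables it brings into being, using the member-name plus variable-name convention visible in the generated ACN of Figure 6.4:

```python
# Hypothetical sketch of a set-valued design variable (SDV); not Simon's
# actual data structure. A shared domain applies to every member, and each
# named value is a set of member dimensions.
class SetValuedVariable:
    def __init__(self, name, domain, values):
        self.name = name        # e.g. "elements"
        self.domain = domain    # shared member domain, e.g. ("orig", "other")
        self.values = values    # named sets, e.g. {"v1": {"point", "line", "screen"}}

    def expand(self, value_name):
        """Scalar member variables brought into being by choosing one value."""
        return {f"{member}_{self.name}": self.domain
                for member in self.values[value_name]}

# Line 1 of Figure 6.2: set elements(orig, other):(v1{point, line, screen}, other)
elements = SetValuedVariable(
    "elements", ("orig", "other"),
    {"v1": {"point", "line", "screen"}, "other": set()})
```

Choosing v1 yields the scalar variables point_elements, line_elements, and screen_elements, matching the generated ACN shown in Figure 6.4.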
For another example, Hannemann and Kiczales point out an observing policy dimension, which
1:  set elements(orig, other): (v1{point, line, screen}, other);
2:  set subject_role(*elements): (v1{point, line}, v2{point, line, screen}, other);
3:  set observer_role(*elements): (v1{screen}, other);
4:  set policy_observing(orig, other): (v1{color}, v2{color, position}, other);
5:  scalar policy_notify: (push, pull);
6:  scalar policy_update: (orig, other);
7:  scalar d_mapping: (hashtable, other);
8:  subspace d_paradigm: (OO, AO);
9:  d_paradigm_OO [
10:   scalar adt_observer: (orig, other);
11:   scalar adt_subject: (orig, other);
12:   adt_subject = orig => d_mapping = orig && adt_observer = orig
        && policy_notify = push;
13:   ~observer_role = orig => adt_observer = orig && policy_update = orig;
14:   ~subject_role = orig => adt_subject = orig && ~policy_observing = orig;
15: ];
16: d_paradigm_AO [
17:   scalar abstract_protocol_interface: (orig, other);
18:   scalar abstract_protocol_impl: (orig, other);
19:   set concrete_protocol(orig, other): %policy_observing;
20:   %policy: policy_observing, con: concrete_protocol% | con = orig => policy = orig;
21:   abstract_protocol_impl = orig => abstract_protocol_interface = orig
        && d_mapping = hashtable && policy_notify = push;
22:   ~concrete_protocol = orig => abstract_protocol_interface = orig
        && ~subject_role = orig && ~observer_role = orig && policy_update = orig;
];
Figure 6.2: Figure Editor CACN Model
we model as policy_observing, and one of its variations: in one design, only the colors of figure
elements are observed. In another design, their positions are observed too. Line 4 in Figure 6.2
models this variable. Each value in the domain of policy_observing models one possibility.
This is not the only way that a decision can bring a new dimension into being. What we call the
one-to-one correspondence relation among design dimensions is another: in a networking system,
each supported protocol should have a corresponding
processing module; in a fault tree analysis tool, each type of node should have a corresponding
shape to denote it visually; etc.
In the aspect-oriented Figure Editor design introduced by Hannemann and Kiczales [38], each
dimension to be observed (modeled by policy_observing), color or position, incurs a corresponding
concrete aspect protocol, concrete_protocol. We use "%" to denote that the choice
of concrete_protocol is brought into being by this bijective mapping: each dimension to be observed is
handled by a concrete protocol, as shown in Line 20 of Figure 6.2.
In Hannemann and Kiczales’s paper [38], the initial FE design assigns the subject role to Point
and Line, and the observer role to Screen. They mention a variation that a screen can also take
the subject role. We notice that the decisions in the subject role and observer role dimensions do
not bring new design dimensions into being. Instead, they refer to existing design dimensions in order to
impose constraints on them. We still model such a dimension as an SDV, but specify the referenced
dimension in the parentheses following the SDV name: set subject_role(*elements). The *
before the referred variable name tells Simon that this SDV does not bring new dimensions, but
refers to existing design dimensions defined in elements. Lines 2 and 3 in Figure 6.2 model the
subject role and observer role dimensions, their original state, and the variation.
6.3 Crosscutting Design Dimensions
In general, a decision in one dimension can have complex interactions with decisions in other di-
mensions: either taking them as assumptions or constraining them. The constraints imposed by
design decisions can be pervasive: they can be system-wide and crosscutting. We have found uni-
versal quantification to be useful in capturing the crosscutting phenomena among design decisions.
For example, in the OO observer pattern FE design, the abstract subject interface influences all the
concrete subjects, and the observing policy specifying which states should be observed influences
all the subjects. These constraints can be logically modeled as:
∀subject : subject_role • subject = orig ⇒ (adt_subject = orig ∧ (∀policy : policy_observing • policy = orig))
Line 14 in Figure 6.2 demonstrates how Simon models this constraint, in which "~" is shorthand
for universal quantification. The notation in which we present our examples is not a fully developed
relational logic, but rather is the result of our introducing mechanisms as the need has arisen.
6.4 Nested Design Subspaces
The structure of a design could be recursive in the sense that a decision in one dimension can intro-
duce new dimensions in which design decisions have to be made, as well as new constraints. The
set-valued variables in CACNs represent such design dimensions, but in a simpler form: each
dimension within a set has the same domain, and no constraints are specified. A design decision
can introduce more complex subspaces with new dimensions, each having different domains and
constraints affecting decisions both within and outside of the set of variables of the new subspace.
We capture the recursive nature of a design by the notion that a value can carry a subspace,
which we call a subspace value. We model a design dimension with subspace values as a hier-
archical design variable (HDV), and we model each subspace recursively as a CACN. For the FE
example, the programming paradigm choices for the observer pattern can be modeled as an HDV:
subspace d_paradigm: (OO, AO).
Line 9 through Line 15 in Figure 6.2 model the OO subspace; Line 16 through Line 22
model the AO subspace. We observe that the OO subspace introduces two abstract interfaces:
adt_observer and adt_subject. These subspaces also introduce new constraints among deci-
sions. Line 21 in Figure 6.2 shows that, in the AO design, the abstract protocol implementation
makes assumptions about the mapping data structure decision and the notification policy. For the
detailed AO observer design, please refer to Hannemann and Kiczales’s paper [38].
Figure 6.3: Complex Augmented Constraint Network
6.5 Parameterizing a CACN
Changing high-level design decisions, such as the values of SDVs and HDVs, incurs structural impacts
that we are interested in analyzing. For example, if an SDV takes a new value in which a new dimension
is added, say the FE now takes Circle as a new element, the impact analysis should
be able to identify the constraints that the Circle variable has to respect. If an HDV modeling
design patterns changes from one pattern to another, the impact analysis should be able to compare
the different coupling structures caused by choosing different patterns, and determine which pattern
better accommodates anticipated changes.
This section explains how our CACN model supports these structural design impact analyses
(SDIA). The basic idea is to parameterize a CACN into a set of design alternatives, instantiate each
design alternative into a flat ACN, and compare the structural differences among the resulting ACNs.
6.5.1 Design Alternatives
Each value of a variable in an ACN or CACN defines an alternative choice of that dimension. For
an SDV or HDV value, such a choice is a substructure. A design of an ACN or CACN can be seen
as the combination of these alternatives. Parameterizing an ACN or CACN involves selecting one
alternative from each dimension, that is, binding a value to each variable, and combining them to
form a design, which we call a design alternative. Our structural design impact analysis begins by
parameterizing a CACN into a set of design alternatives.
For the purpose of illustration, we model the FE CACN as an and-or tree depicted in Figure 6.3.
We put all the SDVs and HDVs with specified value alternatives as the entries of the root AND node.
We also consider all other variables and constraints as a Basic sub-design, making it the first entry
of the AND node. Each SDV or HDV heads an OR node whose entries are its values, each
considered as an alternative subspace. The FE CACN thus can be formalized as:
FE = Basic ∧
(elements = v1 ∨ elements = other) ∧
(observer_role = v1 ∨ observer_role = other) ∧
(subject_role = v1 ∨ subject_role = v2 ∨ subject_role = other) ∧
(policy_observing = v1 ∨ policy_observing = v2 ∨ policy_observing = other) ∧
(d_paradigm = OO ∨ d_paradigm = AO)
We rewrite the above formula in disjunctive normal form (DNF), assign a name to each
clause, and use each clause as an alternative design. There are 72 alternative designs in the design
space defined by the FE CACN: FE = FE0 ∨ FE1 ∨ ... ∨ FE71. We define:
FE1 = Basic ∧ elements = v1 ∧ observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v1 ∧ d_paradigm = OO
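The enumeration behind the DNF amounts to a cross product over the per-dimension choices. The Python fragment below is a hypothetical illustration, not Simon's code; the dimension names and value labels follow Figure 6.2, and the shared Basic sub-design is omitted because it appears in every alternative:

```python
from itertools import product

# Top-level choices of the FE CACN (domains per Figure 6.2).
dimensions = {
    "elements":         ["v1", "other"],
    "observer_role":    ["v1", "other"],
    "subject_role":     ["v1", "v2", "other"],
    "policy_observing": ["v1", "v2", "other"],
    "d_paradigm":       ["OO", "AO"],
}

names = list(dimensions)
alternatives = [dict(zip(names, combo))
                for combo in product(*dimensions.values())]
print(len(alternatives))  # 72 design alternatives: FE0 .. FE71
```

FE1, for instance, corresponds to the binding of v1 to elements, observer_role, subject_role, and policy_observing, with d_paradigm = OO.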
Simon allows the user to designate a value for each SDV and HDV, and automatically translates
the specified CACN design alternative into a new simple ACN.
6.5.2 Instantiating ACNs
Figure 6.4 shows the ACN instantiated from design alternative FE1. Instantiating a CACN design
alternative into a flat ACN involves the following steps:
1. Replace each SDV with a set of scalar variables.
For example, for elements = v1{point, line, screen}, Simon generates a set of new
scalar variables shown in Line 1 to Line 3 in Figure 6.4. The name of each new scalar
variable is the combination of the variable name and the value element name; the do-
main of each scalar variable is copied from the original CACN definition. Similarly,
policy_observing = v1{color} becomes Line 7 in Figure 6.4. For SDVs that only re-
fer to other variables, such as subject_role and observer_role, Simon just internally
stores the variables they refer to, and processes the constraints imposed as shown in step 3.
For SDVs defined by a one-to-one correspondence relation, such as:
set concrete_protocol(orig, other): %policy_observing;
According to the decision policy_observing = v1{color} in FE1, Simon implicitly adds
a new value, v1{color}, into the domain of concrete_protocol, and stores the bijective
mapping internally. After that, Simon generates the following new variable:
scalar color_concrete_protocol:(orig, other);
2. Replace each hierarchical design variable with the sub-structure associated with the desig-
nated value. For example, d_paradigm = OO in FE1 is replaced with the OO subspace shown
in box 2, Figure 6.3, which introduces two new variables: adt_observer and adt_subject,
as shown in Line 8 and Line 9 in Figure 6.4. If an HDV value defines a structure with new
SDVs or HDVs, we just need to repeat these steps recursively.
3. Remove universal quantifiers. Once all the SDVs are replaced with scalar variables, universal
quantifications can be replaced with quantifier-free constraints. For example, we have shown that Line
14 in Figure 6.2 models the following constraint: ∀subject : subject_role • subject = orig ⇒
(adt_subject = orig ∧ (∀policy : policy_observing • policy = orig))
Simon first translates it into a normal form:
(¬∃subject : subject_role • subject = orig) ∨ (adt_subject = orig ∧ (∀policy :
policy_observing • policy = orig))
Suppose the value of subject_role has been designated to v1{point, line}, and
policy_observing = v1{color},
Simon replaces the quantified expression
(¬∃subject : subject_role • subject = orig)
with a conjunctive expression:
point_subject_role ≠ orig ∧ line_subject_role ≠ orig
Since subject_role refers to variables defined in elements, Simon replaces
point_subject_role with point_elements, a variable existing in the generated ACN.
1:  scalar point_elements:(orig,other);
2:  scalar line_elements:(orig,other);
3:  scalar screen_elements:(orig,other);
4:  scalar policy_notify:(push,pull);
5:  scalar policy_update:(orig,other);
6:  scalar d_mapping:(orig,other);
7:  scalar color_policy_observing:(orig,other);
8:  scalar adt_observer:(orig,other);
9:  scalar adt_subject:(orig,other);
10: (point_elements != orig && line_elements != orig) ||
      (adt_subject = orig && color_policy_observing = orig);
11: screen_elements != orig || (adt_observer = orig && policy_update = orig);
12: adt_subject = orig => d_mapping = orig && adt_observer = orig && policy_notify = push;
Figure 6.4: The Constraint Network in an ACN Generated by Design Alternative FE1
Similarly, (∀policy : policy_observing • policy = orig) is replaced with
color_policy_observing = orig. As a result, the above quantified expression is translated into
Line 10 in Figure 6.4.
After binding a value for each SDV and HDV, Simon generates a plain ACN with only scalar
variables and non-quantified constraints.
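Step 3 can be sketched in a few lines. The following Python fragment is a hypothetical illustration of the quantifier elimination, not Simon's implementation: once the SDV that a quantifier ranges over is bound to a concrete set of members, the universal collapses into a finite conjunction over the corresponding scalar variables:

```python
# Hypothetical sketch of quantifier elimination over a bound SDV: expand
# "forall x in members . body(x)" into an explicit conjunction, naming each
# member's scalar variable member_suffix (e.g. point_elements).
def expand_forall(members, var_suffix, body_template):
    return " && ".join(body_template.format(var=f"{m}_{var_suffix}")
                       for m in members)

# With subject_role bound to v1{point, line}, the negated existential
# "no subject is orig" becomes an explicit conjunction:
clause = expand_forall(["point", "line"], "elements", "{var} != orig")
print(clause)  # point_elements != orig && line_elements != orig
```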
6.6 Structural Design Impact Analysis Overview
In their paper [38], Hannemann and Kiczales compared the use of OO and AO techniques to
implement the observer pattern. In this section, we formulate this analysis as a decision problem
defined on a CACN: if d_paradigm changes from OO to AO, what is the impact on the design
coupling structure? Subsection 6.6.1 analyzes this problem. Subsection 6.6.2 analyzes the impact
of changing the role assignment so that the Screen is both a subject and an observer, a problem
Hannemann and Kiczales analyzed descriptively in their paper. Subsection 6.6.3 analyzes another
problem they mentioned: What if the observing policy changed so that the positions of the figure
elements should also be observed in addition to the colors? Hannemann and Kiczales analyzed
1:  color_policy_observing:{orig,other};
2:  policy_notify:{push,pull};
3:  policy_update:{orig,other};
4:  d_mapping:{orig,other};
5:  abstract_protocol_impl:{orig,other};
6:  point_elements:{orig,other};
7:  line_elements:{orig,other};
8:  screen_elements:{orig,other};
9:  color_concrete_protocol:{orig,other};
10: abstract_protocol_interface:{orig,other};
11: color_concrete_protocol = orig => color_policy_observing = orig;
12: color_concrete_protocol = orig => abstract_protocol_interface = orig
      && point_elements = orig && line_elements = orig
      && screen_elements = orig && policy_update = orig;
13: abstract_protocol_impl = orig => abstract_protocol_interface = orig
      && d_mapping = orig && policy_notify = push;
Figure 6.5: The Constraint Network in an ACN Generated by Design Alternative FE2
these problems descriptively and showed the code implementing these choices as the evidence of
their analysis. This section shows the automation of their analyses at the design level.
6.6.1 OO Pattern Versus AO Pattern
To analyze the impact of changing a structural decision on a hierarchical or set-valued design di-
mension, the user needs to designate two design alternatives for the changed and original decisions.
For example, we define FEOO = FE1, specifying the original OO design, for which Simon generates
a DSM as shown in Figure 6.6. Now the decision on d_paradigm changes from OO to AO, leading
to a new design alternative, FEAO:
FEAO = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v1 ∧ d_paradigm = AO
In order to compare the change impact, we use Simon to generate a new ACN for FEAO, shown
in Figure 6.5, and derive the DSMs of FEOO and FEAO, as shown in Figure 6.6 and Figure 6.7. We
observe that the AO design has fewer dependence marks. In the AO design, the decisions on the
Figure 6.6: The DSM of FEOO: OO Figure Editor Design
notification and update policies no longer influence the concrete subjects, such as the Point and
Line implementations. Instead, only the abstract and concrete protocols depend on these policies,
indicating the localization of crosscutting decisions.
We can also analyze the respective consequences of changing common design decisions for two
design alternatives, for example, the consequences of changing the notification policy from push
to pull in both AO and OO designs. Figure 6.8 shows two Simon DIA snapshots, comparing the
change impacts in both designs. Simon shows that in the OO design, three variables other than
policy_notify have to be revisited, while in the AO design, only one other variable should be
revisited. Counting the number of variables affected by a change in a decision is clearly insufficient
to determine the cost of change. However, identifying what must change is a critical step, and we
hypothesize that our analysis can be combined with traditional methods of cost estimation for changes
in individual design decisions to support economic reasoning.
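The counts above can be reproduced by simple reachability over the dependence relation read off the constraint networks. The sketch below is a hypothetical over-approximation written for illustration, not Simon's DIA algorithm, and the edge sets paraphrase the constraints of Figures 6.4 and 6.5:

```python
from collections import defaultdict

def impacted(depends_on, changed):
    """Variables that may need revisiting when `changed` changes: everything
    that can reach `changed` through the dependence edges y -> x, where y's
    constraint mentions x."""
    reverse = defaultdict(set)
    for y, xs in depends_on.items():
        for x in xs:
            reverse[x].add(y)
    seen, frontier = set(), {changed}
    while frontier:
        frontier = {y for v in frontier for y in reverse[v]} - seen
        seen |= frontier
    return seen

oo_deps = {  # paraphrased from the FE_OO constraint network (Figure 6.4)
    "adt_subject": {"d_mapping", "adt_observer", "policy_notify"},
    "point_elements": {"adt_subject", "color_policy_observing"},
    "line_elements": {"adt_subject", "color_policy_observing"},
    "screen_elements": {"adt_observer", "policy_update"},
}
ao_deps = {  # paraphrased from the FE_AO constraint network (Figure 6.5)
    "abstract_protocol_impl": {"abstract_protocol_interface",
                               "d_mapping", "policy_notify"},
}
print(len(impacted(oo_deps, "policy_notify")))  # 3 variables to revisit
print(len(impacted(ao_deps, "policy_notify")))  # 1 variable to revisit
```

Simon's DIA computes impacts from the ACN change semantics rather than raw reachability, so this sketch only illustrates the transitive nature of the question.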
6.6.2 Change Role Assignment
This section shows how Simon reveals the different consequences of changing the value of
subject_role from v1{point, line} to v2{point, line, screen} in the OO and AO
designs, respectively. We define:
Figure 6.7: The DSM of FEAO: AO Figure Editor Design
Figure 6.8: DIA: Notification Policy Change Impacts
Figure 6.9: The DSM of FEOO_Role: Screen takes the subject role
FEOO_Role = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v2 ∧
policy_observing = v1 ∧ d_paradigm = OO
and: FEAO_Role = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v2 ∧
policy_observing = v1 ∧ d_paradigm = AO
We first observe the impact of changing this decision on the AO design by comparing DSMs.
We find that the FEAO_Role DSM is identical to the DSM of FEAO, shown in Figure 6.7.
This means that the AO design localizes this change completely, without incurring any additional
dependencies or new dimensions.
The impact on the OO design is different. Figure 6.9 shows the DSM of FEOO_Role. Compared
with the DSM of FEOO shown in Figure 6.6, we observe that screen_elements now depends
on color_policy_observing, policy_notify, d_mapping and adt_subject. The coupling
structure changed because of this new decision.
Figure 6.10: The DSM of FEOO_Position: Positions are observed in the OO design
6.6.3 Change Observing Target
This section shows how Simon reveals the different consequences of changing the value of
policy_observing from v1{color} to v2{color, position} in the OO and AO designs,
respectively. We define:
FEOO_Position = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v2 ∧ d_paradigm = OO
and: FEAO_Position = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v2 ∧ d_paradigm = AO
Figure 6.10 shows the DSM of FEOO_Position. Compared with the DSM of FEOO shown in
Figure 6.6, we observe that a new variable, position_policy_observing, is added, and that it
influences the existing variables point_elements and line_elements.
Figure 6.11 shows the DSM of FEAO_Position. Compared with the DSM of FEAO shown in
Figure 6.7, we observe that two new variables are added: position_policy_observing and
position_concrete_protocol. The newly added dependences do not affect the existing structure
Figure 6.11: The DSM of FEAO_Position: Positions are observed in the AO design
(boxed within the darker lines).
In summary, these DSMs confirm Hannemann and Kiczales’s qualitative analyses, revealing
how these high level decisions impact the design coupling structure visually and precisely.
6.7 Related Work
Software design space, product family, and variability modeling has been widely studied by
Bosch [59], Lane [45], and others for product line design, design optimization, and other
purposes. Using logic to model features is not new.
Batory and O’Malley’s work specifies the relations among features using constraints [9, 8, 10].
Czarnecki et al.’s feature model [21] uses feature diagrams to represent the design variations. Sim-
ilar to our and-or-tree representation, they represent features as a hierarchical structure, and derive
a concrete configuration by selecting and cloning options. Feature models often need to contain
additional constraints mainly to express which options have to be co-existent, and which options
are exclusive from each other. Our approach is more general in that we not only model and analyze
features, which are one kind of design decisions, but also broader decisions such as refactoring
options, design patterns, and aspects. The impact of these decisions is beyond the inclusion or ex-
clusion of feature options. Our purpose differs in that we aim to analyze the modular structures in
design architectures, and their economic implications, while they aim to analyze feature properties.
On the other hand, we suspect that our approach is general enough to analyze the problems in their
domains.
Automatic program variation has also been widely studied, for example in Batory's work on generic
programming [11] and Goguen's work on parameterized programming [36]. While their purpose
is to synthesize complex software systems from libraries of reusable components, our purpose is to
rigorously support modularity analysis and decision-making.
Similar to our design space modeling, Lane [45] models the structure of software systems as
design spaces by identifying the key functional choices, and classifying the alternatives available
for each choice. Their rules, similar to our constraints, are formulated to relate choices
within a design space. Their purpose is to automatically select an optimal design. Our approach
is more general in that we model broader decision-making phenomena and their impacts, such as a
decision that brings up a subspace, the dominance relations among decision decisions, and different
modularization approaches. Our logical constraints are different and more expressive than their
semi-formal rules (guidelines), and our approach enables the comparison of designs in terms of
their modular structures.
Jackson [40] uses Alloy for object modeling with the goal of being able to check structural prop-
erties of object models specified using the Alloy relational logic. Alloy is thus related to our work
in several ways: it uses logic to model designs, and it supports formal analysis of specifications. In
fact, our Simon tool uses Alloy internally to analyze ACNs [18]. Unlike our work, however, Al-
loy has mainly been developed and used to specify formal properties and check complex relational
object structures, whereas our work aims to enable the specification of design decision spaces and
the analysis of properties such as evolvability under changes to given decisions and the net option
value of modularity.
Traditional impact analysis research focuses on change issues at the program level, as summarized
in [4]. Advantages of our approach include a precise semantics of dependence, and the ability
to reason about the ripple effects of changes in high-level design decisions. We have provided a
precise notion of impact analysis for logical design models.
Aspect-Oriented Software Development (AOSD) researchers [43, 68] have recognized the limitations
imposed by traditional OO design and contributed language constructs addressing multidimensional
and cross-cutting concerns. In addition, as Filman pointed out [30], quantification is a key
feature of AOP. However, the logical representation of crosscutting structures at the design level has
not been fully developed in earlier work, and is a key target of the current work.
6.8 Chapter Summary
Using the representative Figure Element example, this chapter presents an extended ACN model to
address the problem that ACN modeling is not adequate to capture complex design decisions and
to analyze their structural impacts. We evaluated and provided evidence in support of the claims
that our framework is able to capture representative complex design decisions, uniformly account
for aspect-oriented and object-oriented modularity, and automate the analyses of problems people
previously analyzed qualitatively.
Chapter 7
Formalization
Parnas’s information hiding criterion has been influential for decades [54]. Baldwin and Clark’s de-
sign rule theory [7] has shed additional light on the value of design modularity. Sullivan et al. [64]
showed that Baldwin and Clark’s model can be extended with environment parameters to account
for Parnas’s information hiding criterion. However, these theories remain informal, and conse-
quently unnecessarily hard to understand and hard to apply with rigor and precision. This chapter
addresses these problems by contributing a formalization of our framework that accounts for the
key notions within these theories, and that enables automation of the corresponding analysis tech-
niques. In Chapter 5, we introduced our divide-and-conquer approach to addressing the scalability
issue. In this chapter we formalize this approach and prove its correctness.
We claim that our framework is sufficient to formally account for: (1) Parnas’s criterion of
information hiding modularity; (2) Parnas’s approach to analyzing the changeability of a design;
and (3) Baldwin and Clark’s concepts of design dimensions, design decisions, design spaces, de-
sign rules, and design dependences, which, in turn, are the rigorous foundation for Baldwin and
Clark’s net option value analysis. We evaluate these claims by showing that all these concepts and
approaches can be formally defined in terms of augmented constraint networks, design automata,
and pair-wise dependence relations. We also claim that the divide-and-conquer approach produces
the same analysis results as the brute-force approach. In this chapter, we prove this claim formally.
We present our formalization using the Z (pronounced Zed) specification language [60].
Section 7.1 formalizes our core models. Section 7.2 presents how previous theories can be
formalized within the setting of our framework. Section 7.3 presents the formalization of our
divide-and-conquer approach and the proof that this approach is correct.
7.1 Formalizing the Core Models
This section formalizes the notions of finite-domain constraint network (FDCN), augmented con-
straint network (ACN), design automaton (DA), pair-wise dependence relation (PWDR), and the
derivation of DAs and PWDRs from ACNs.
7.1.1 Finite Domain Constraint Network
Because we intend to formalize our ideas using the Z language, we need a formalization of finite-
domain constraint networks (FDCNs) expressed in Z. We have thus developed a formalization of
Tsang’s quasi-formal model of constraint networks [69] in Z.
We first abstractly specify variables and values as given sets.
[Variable,Value]
The domains of variables are specified as a relation between variables and values.
Domains
domain : Variable↔ Value
In a valid assignment, each variable takes values from its domain. The following schema
states that an Assignment has a domain, and the function bindings maps variables to their values.
Line 7.1.1 ensures that the value assigned to a variable must respect its domain.
Assignment
Domains
bindings : Variable 7→ Value
∀var : dom bindings • bindings(var) ∈ domain(| {var} |) (7.1.1)
In a FDCN, a constraint is modeled as a set of permitted assignments to the variables to which
it applies, as formalized in the following schema. For the matrix example, if the domain of matrix is
{dense, sparse}, and the domain of ds is {array_ds, other_ds}, then the constraint ds = array_ds ⇒
matrix = dense is modeled as the following permitted assignments: {{ds = array_ds, matrix =
dense}, {ds = other_ds, matrix = dense}, {ds = other_ds, matrix = sparse}}.
Constraint
Domains
AssignmentsAllowed : FAssignment
∀allowed : AssignmentsAllowed • allowed.domain = domain
The following schema specifies the notion of FDCN, which consists of a finite set of variables,
their domains, and a set of constraints. Line 7.1.2 ensures that the domain is defined over the FDCN
variable set. Line 7.1.3 ensures that the constraints constrain the FDCN variable set. The function
solutions maps a FDCN to a set of assignments that are the solutions of the constraint network.
Line 7.1.4 ensures that each solution is an assignment, and its domain is the whole variable set of the
constraint network; Line 7.1.5 ensures that for any constraint, there exists a permitted assignment
that is the subset of (consistent with) the given solution.
ConstraintNetwork
VariableSet : FVariable
Domains
ConstraintSet : FConstraint
solutions : FAssignment
dom domain = VariableSet (7.1.2)
∀constraint : ConstraintSet • constraint.domain⊆ domain (7.1.3)
solutions = {solution : Assignment | solution.domain = domain ∧ (7.1.4)
(∀constraint : ConstraintSet • (∃allowed : constraint.AssignmentsAllowed •
(allowed.bindings⊆ solution.bindings)))} (7.1.5)
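The solution set defined by Lines 7.1.4 and 7.1.5 can be computed by brute-force enumeration. The following Python sketch (the names `solutions`, `domains`, and `permitted` are illustrative, not taken from our tool) enumerates the solutions of the matrix example, with the constraint given extensionally as its set of permitted partial assignments:

```python
from itertools import product

def solutions(domains, constraints):
    """Enumerate total assignments (7.1.4) that extend some permitted
    partial assignment of every constraint (7.1.5)."""
    variables = sorted(domains)
    result = []
    for values in product(*(sorted(domains[v]) for v in variables)):
        assignment = dict(zip(variables, values))
        if all(any(all(assignment[var] == val for var, val in allowed.items())
                   for allowed in permitted)
               for permitted in constraints):
            result.append(assignment)
    return result

# The matrix example: the constraint ds = array_ds => matrix = dense,
# modeled as the three permitted assignments listed in the text.
domains = {"matrix": ["dense", "sparse"], "ds": ["array_ds", "other_ds"]}
permitted = [{"ds": "array_ds", "matrix": "dense"},
             {"ds": "other_ds", "matrix": "dense"},
             {"ds": "other_ds", "matrix": "sparse"}]
sols = solutions(domains, [permitted])
# The three permitted assignments are exactly the solutions here.
```

As in the Z specification, the enumeration is exponential in the number of variables, which motivates the divide-and-conquer approach of Section 7.3.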
7.1.2 Augmented Constraint Network
As introduced in Chapter 3, we augment a constraint network with a dominance relation and a clus-
ter set, and call it an augmented constraint network. Since the cluster set component is only used
when the user needs to generate a DSM, we elide it in our core formalization. Thus, we formally
specify an ACN as a constraint network and a dominance relation, as the AugmentedConstraintNet-
work schema states.
AugmentedConstraintNetwork
cn : ConstraintNetwork
dominance : Variable↔ Variable
dom dominance = cn.VariableSet (7.1.6)
ran dominance = cn.VariableSet (7.1.7)
We model a dominance relation as a binary relation with the semantics that: if (x,y) ∈
dominance, changes to x cannot force changes to y. Constraints 7.1.6 and 7.1.7 state that the domi-
nance relation is defined on the variable set of the constraint network.
7.1.3 Design Automaton
The following schema specifies our Design Automaton (DA) model, and the constraints state the
following:
• (7.1.8) A DA consists of a set of states. Each state is an assignment.
• (7.1.9) The alphabet of a DA is a set of variable-value pairs.
• (7.1.10) In a DA, a transition models a change from an original design, leading to destination
states accommodating that change. In general, there are several ways to compensate for a
change, so a DA is nondeterministic. Accordingly, the transition function maps an assignment
(modeling a start design state) and a variable-value pair (modeling a change), to a set of new
assignments (the destination states).
• (7.1.11) The multiple destination states caused by the same change correspond to the differ-
ent ways to compensate for that change, each involving changes to a set of variables. The
function singleChangeSet summarizes the impact of one change: it maps a change (modeled
by a variable-value pair) starting from an original state (an assignment), to a group of variable
sets, each of which models a set of changed variables involved in one of the multiple ways to
accommodate this change ( 7.1.13).
• (7.1.12) The function allChangeSet specifies the impact of each variable by collecting all the
variable sets it can cause to change, starting from any design state, as specified in Line 7.1.14.
DesignAutomaton
states : FAssignment (7.1.8)
alphabet : Variable↔ Value (7.1.9)
transition : (Assignment× (Variable×Value))→ FAssignment (7.1.10)
singleChangeSet : (Assignment× (Variable×Value)) 7→ (F(FVariable)) (7.1.11)
allChangeSet : Variable 7→ (F(FVariable)) (7.1.12)
dom(dom transition) = states
alphabet = (ran(dom transition))
(ran transition) = states
∀start : Assignment; var : Variable; value : Value • (7.1.13)
(start ∈ states ∧ (var,value) ∈ alphabet)⇒
singleChangeSet(start,(var,value)) = {varset : FVariable |
(∃end : transition(start,(var,value)) •
(varset = dom(end.bindings\ start.bindings)))
}
∀var : Variable • allChangeSet(var) = ⋃{changedVarSet : F(FVariable) | (7.1.14)
(∃sol : states; value : alphabet(| {var} |) •
changedVarSet = singleChangeSet(sol,(var,value)))
}
The function deriveDA specifies the derivation of a DA from a given ACN, making use of the
function computeDA.
deriveDA : AugmentedConstraintNetwork → DesignAutomaton
∀acn : AugmentedConstraintNetwork; da : DesignAutomaton •
deriveDA(acn) = computeDA(acn.cn.solutions,acn.cn.domain,acn.dominance)
computeDA : FAssignment× (Variable↔ Value)
×(Variable↔ Variable)→ DesignAutomaton
∀solutions : FAssignment; domain : (Variable↔ Value);
dominance : Variable↔ Variable; da : DesignAutomaton •
computeDA(solutions,domain,dominance) = da⇒
da.states = solutions ∧ da.alphabet = domain ∧ (7.1.15)
(∀start : da.states; change : da.alphabet; endstates : Fda.states •
da.transition(start,change) = endstates⇒ (∀end : endstates •
(change ∈ end.bindings) ∧ (7.1.16)
(∀sub : F(Variable×Value) • sub⊂ (end.bindings\ start.bindings)⇒
replace(start,sub) /∈ solutions) ∧ (7.1.17)
(∀ forced : dom(end.bindings\ start.bindings) • (7.1.18)
forced /∈ (dominance(| {first change} |)))))
The function computeDA maps a set of solutions, a domain, and a dominance relation to a DA.
It does so by essentially finding the minimal transitions that respect the dominance relation.
• (7.1.15) The states of the DA are the given solutions, and the DA alphabet is the given domain.
• (7.1.16) All destination states accommodate the specified change. That is, the set of bindings
of each end state includes the change.
• (7.1.17) Each transition between states is minimal. That is, changing any proper subset of
the differences between the start state and the destination state will not lead to a valid solution.
The helper function, replace, assigns different values to a set of variables in an assignment,
and returns a new assignment. end.bindings \ start.bindings models the differences
between the start state and the destination state; sub is a proper subset of the difference;
replace(start,sub) leads the start design to a new assignment that differs by sub.
• (7.1.18) The dominance relation must not be violated. If (x, y) ∈ dominance, then among
all the possible ways to restore consistency in the face of a change to x, those involving y are
excluded.
replace : (Assignment× (F(Variable×Value)))→ Assignment
∀ from, to : Assignment; changes : F(Variable×Value) •
replace(from,changes) = to⇔ to.bindings\ from.bindings = changes
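As a concrete reading of computeDA, the Python sketch below (illustrative names; a simplification that reads minimality (7.1.17) as "no other valid destination changes a strict subset of the same variables") computes the destination states of one transition:

```python
def diff(start, end):
    """The changed bindings, end.bindings \\ start.bindings."""
    return {v: x for v, x in end.items() if start.get(v) != x}

def transitions(solutions, dominance, start, change):
    """Destination states for `change` from `start`, per 7.1.16-7.1.18."""
    var, val = change
    shielded = {y for (x, y) in dominance if x == var}
    candidates = []
    for end in solutions:
        if end.get(var) != val:
            continue                      # must accommodate the change (7.1.16)
        d = set(diff(start, end))
        if d & shielded:
            continue                      # dominance respected (7.1.18)
        candidates.append((end, d))
    # minimality (7.1.17): drop any candidate whose changed-variable set
    # strictly contains another candidate's.
    return [end for end, d in candidates
            if not any(d2 < d for _, d2 in candidates)]

sols = [{"ds": "array_ds", "matrix": "dense"},
        {"ds": "other_ds", "matrix": "dense"},
        {"ds": "other_ds", "matrix": "sparse"}]
start = {"ds": "array_ds", "matrix": "dense"}
# Changing ds to other_ds: the minimal compensation keeps matrix = dense.
ends = transitions(sols, set(), start, ("ds", "other_ds"))
```

With the dominance pair (matrix, ds), the same sketch correctly reports that no destination exists for a change to matrix, since every compensation would have to touch the shielded variable ds.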
7.1.4 Pair-wise Dependence Relation
The following schema specifies a pair-wise dependence relation (PWDR) among design variables:
PairWiseDependenceRelation
pairs : Variable↔ Variable
The function derivePWDR specifies the derivation of a pair-wise dependence relation (PWDR)
from an ACN. In essence, a PWDR is computed from a DA (7.1.19), specified by the function
computePWDR. Constraint 7.1.20 states that a pair (x, y) belongs to the PWDR if and only if y is
impacted by x.
derivePWDR : AugmentedConstraintNetwork → PairWiseDependenceRelation
∀acn : AugmentedConstraintNetwork; pwdr : PairWiseDependenceRelation •
derivePWDR(acn) = pwdr ⇒
pwdr = computePWDR(deriveDA(acn)) (7.1.19)
computePWDR : DesignAutomaton→ PairWiseDependenceRelation
∀da : DesignAutomaton; pwdr : PairWiseDependenceRelation •
computePWDR(da) = pwdr ⇒
(∀pair : pwdr.pairs • (∃vargroup : da.allChangeSet(first pair) • (7.1.20)
(second pair) ∈ vargroup))
We have thus developed a formal definition of what it means for two design variables to be
coupled: if y is involved in some minimal compensation for some change to x, we say that y
depends on x. The DA encodes complete coupling information; a PWDR summarizes (but
loses) information in the DA.
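The summarization in 7.1.20 can be sketched directly over a transition table. In the fragment below (illustrative names; the tiny DA is hand-written for the matrix example rather than derived), states are tuples of (variable, value) pairs so they can serve as dictionary keys; note that self-pairs (x, x) appear because the changed variable itself is in dom(end \ start), matching the specification:

```python
def pwdr(transition):
    """(x, y) is in the relation iff y changes in some transition
    triggered by a change to x (7.1.20)."""
    pairs = set()
    for (start, (var, _val)), ends in transition.items():
        before = dict(start)
        for end in ends:
            for y, v in end:
                if before.get(y) != v:
                    pairs.add((var, y))
    return pairs

s_dense = (("ds", "array_ds"), ("matrix", "dense"))
s_sparse = (("ds", "other_ds"), ("matrix", "sparse"))
s_other = (("ds", "other_ds"), ("matrix", "dense"))
table = {
    (s_dense, ("matrix", "sparse")): [s_sparse],  # forces ds to change too
    (s_dense, ("ds", "other_ds")): [s_other],     # matrix can stay dense
}
deps = pwdr(table)
```

Here ds depends on matrix but not vice versa: the minimal compensation for changing ds leaves matrix alone.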
7.2 Formalizing Previous Theories of Modularity
In this section, we show how our models formally account for previous modularity theories.
7.2.1 Parnas’s Theory
Sullivan et al. [64] present a novel characterization of the nature of Parnas’s information hiding
modularity as invariance of design rules with respect to changes in environment variables in a
Baldwin-and-Clark-style DSM. In this subsection, we formalize this notion by partitioning the vari-
ables of an ACN into environment, design rule, and hidden variables, and then by stating that no
design rule variable is affected by any change in any environment variable. To our knowledge, this
work is the first to present a formal model of what it means for a design architecture to exhibit
information hiding modularity.
InformationHidingModularity
acn : AugmentedConstraintNetwork
Environment : FVariable
DesignRules : FVariable
HiddenVariables : FVariable
〈Environment,DesignRules,HiddenVariables〉 partition acn.cn.VariableSet (7.2.1)
∀pair : (derivePWDR(acn)).pairs •
¬ ((first pair) ∈ Environment ∧ (second pair) ∈ DesignRules) (7.2.2)
The schema InformationHidingModularity states that, after partitioning the variable set of an
ACN into Environment, DesignRules, and HiddenVariables (7.2.1), if the ACN models an
information-hiding-modularized design, its derived PWDR should contain no pair whose first
element is an environment variable and whose second element is a design rule (7.2.2).
Given a current design, what are all the ways to compensate for a sequence of given decision
changes? Parnas's changeability analysis, which finds the ripple effects of a change, can be
recovered from the answer to this more general question by comparing the feasible new designs
with the original design. The answer could also be used, for example, to find the most
cost-effective way to accommodate a change. We specify this problem and its solution in the
DesignImpactAnalysis schema.
The function impact maps an original design and a sequence of changes to a set of evolution
paths, each comprising a sequence of designs accommodating the changes. The last states of
these paths are the new designs that the original one could reach. Constraint 7.2.3 states that the
start design is the first state in the evolution path. Constraint 7.2.4 states that each transition step
consumes a change, leading to a set of new designs preserving that change.
DesignImpactAnalysis
acn : AugmentedConstraintNetwork
impact : (Assignment× (seq(Variable×Value))) 7→ (F(seqAssignment))
∀start : acn.cn.solutions; changes : seq(Variable×Value) •
∀n : 1 . .#changes • changes(n) ∈ (deriveDA(acn)).alphabet ⇒
impact(start,changes) = {path : seqAssignment | start = path(0) ∧ (7.2.3)
(∀n : 1 . .#path • path(n) ∈
(deriveDA(acn)).transition(path(n−1),changes(n))) (7.2.4)
}
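Operationally, impact folds the change sequence through the DA's transition function. A small Python sketch (illustrative names; states encoded as tuples of (variable, value) pairs so they can key the transition table):

```python
def impact(transition, start, changes):
    """All evolution paths obtained by consuming the changes in order
    (7.2.3, 7.2.4); each path begins at the start design."""
    paths = [[start]]
    for change in changes:
        paths = [path + [end]
                 for path in paths
                 for end in transition.get((path[-1], change), [])]
    return paths

s0 = (("ds", "array_ds"), ("matrix", "dense"))
s1 = (("ds", "other_ds"), ("matrix", "sparse"))
table = {
    (s0, ("matrix", "sparse")): [s1],
    (s1, ("ds", "array_ds")): [s0],
}
# Two changes in sequence: matrix becomes sparse, then ds reverts.
paths = impact(table, s0, [("matrix", "sparse"), ("ds", "array_ds")])
```

The last state of each returned path is one of the new designs the original design could reach, as described above.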
7.2.2 Baldwin and Clark’s Theory
Given these formalizations, we found that the following key concepts in Baldwin and Clark’s theory
are formally accounted for in our framework:
• Design dimensions: Variable.
• Design decisions: Value.
• Design spaces: the solutions of a CN, that is, the state set of a DA 1.
• Design dependences: the PWDR model.
• Design rules (in part): the dominance relation models a property of a design rule 2.
The derivation of a DSM from a PWDR becomes straightforward. The cluster set element of an
ACN consists of a set of clusterings. A clustering represents the organization of variables into an
1 Technically speaking, the design space for a problem is the entire set of possible designs that address that problem. A DSM or ACN generally models only a sub-space. When we say that we formalize the notion of design space, we mean that we have formalized an approach to modeling such sub-spaces.
2 According to Baldwin and Clark, design rules dominate other design decisions, decouple otherwise dependent decisions, and remain stable. This dissertation does not intend to formalize the stability property.
ordered tree, giving both a hierarchical clustering of variables and a linear ordering on the clusters
and variables within them. We elide the formal specification of clusterings in this dissertation. To
derive a DSM, we first compute the PWDR to populate a matrix, and then select a clustering to
order the columns and rows of the matrix, and to group them into a hierarchy of proto-modules.
Different DSMs can be derived from a given PWDR using different clusterings.
Given a derived DSM, the NOV of the structure is computed by (1) assigning a value of the
technical potential parameter in Baldwin and Clark's model to each of the top-level modules, (2)
deriving values expressing the complexity of each module by structural analysis of the DSM (or
by other means), (3) reading the "sees" relation directly from the DSM, and (4) plugging all these
values into Baldwin and Clark's NOV formula. All of these steps are automated in our tool, Simon.
The bottom line is that our framework provides a rigorous semantics for the marks in a DSM and a
way to compute the net option value of modularity for formally precise, abstract models of software
design architectures.
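Step (4) can be sketched as follows. We use one common reading of Baldwin and Clark's per-module formula, NOV_i = σ_i √n_i Q(k_i) − C_i k_i − Z_i, where Q(k) is the expected best nonnegative outcome of k standard-normal "experiments"; the parameterization and the toy numbers here are illustrative only, and Baldwin and Clark [7] should be consulted for the exact model:

```python
import math
import random

def q(k, trials=20000, seed=0):
    """Monte Carlo estimate of Q(k): the expected value of the best of k
    standard-normal draws, floored at zero (keep the base design if every
    experiment comes out negative)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(0.0, max(rng.gauss(0.0, 1.0) for _ in range(k)))
    return total / trials

def nov(sigma, n, k, experiment_cost, visibility_cost):
    """Net option value of one module under the sketched formula."""
    return sigma * math.sqrt(n) * q(k) - experiment_cost * k - visibility_cost

# Hypothetical module: technical potential 1.0, complexity of 6 parameters,
# 3 redesign experiments at cost 0.2 each, visibility cost 0.1.
value = nov(1.0, 6, 3, 0.2, 0.1)
```

In Simon, the complexity and visibility inputs come from the DSM itself, as steps (2) and (3) describe; the sketch takes them as plain arguments.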
7.3 The Divide-and-Conquer Approach and Its Correctness
Generating a design automaton for an ACN requires explicit enumeration of all the solutions, the
number of which grows exponentially with the number of variables involved. To address the scala-
bility issue, in Chapter 5 we presented a divide-and-conquer approach to decomposing a large ACN
model into a set of smaller sub-ACNs, solving each sub-ACN separately, analyzing these smaller
models, and composing the results. We showed that it is not necessary to compose a full DA from
sub-DAs. Instead, design impact analysis can be done in a divide-and-conquer manner, and a full
DSM can be derived from a full PWDR composed from the sub-PWDRs.
In this section we present the formalization of these approaches, and prove that our divide-
and-conquer approaches to impact analysis and PWDR derivation, respectively, are correct. Our
divide-and-conquer approaches to PWDR derivation and impact analysis involve the derivation of
sub-DAs from sub-ACNs, and the composition of results from these partial intermediate models.
Section 7.3.1 first formalizes ACN decomposition. Section 7.3.2 then presents our divide-and-
conquer approach to PWDR derivation and proves its correctness. Section 7.3.3 finally presents
and proves our divide-and-conquer approach to impact analysis.
7.3.1 ACN Decomposition
This section formalizes the key notions of our approach to ACN decomposition. The function de-
compose specifies the constraints between a full ACN and its decomposed sub-ACNs. The approach
we described in Chapter 5 decomposes a large ACN into a number of sub-ACNs that conform to
this specification.
decompose : AugmentedConstraintNetwork → FAugmentedConstraintNetwork
∀ full : AugmentedConstraintNetwork; sub : FAugmentedConstraintNetwork •
decompose(full) = sub⇒
full.cn.VariableSet = {var : Variable | ∃subacn : sub • (7.3.1)
var ∈ subacn.cn.VariableSet} ∧
full.cn.domain = {var : Variable; val : Value | ∃subacn : sub • (7.3.2)
(var,val) ∈ subacn.cn.domain} ∧
full.cn.ConstraintSet = {subcons : Constraint | ∃subacn : sub • (7.3.3)
subcons ∈ subacn.cn.ConstraintSet} ∧
(∀subacn : sub • subacn.dominance = {x,y : Variable | (x,y) ∈ full.dominance ∧ (7.3.4)
x ∈ subacn.cn.VariableSet ∧ y ∈ subacn.cn.VariableSet})
• (7.3.1) The union of the variable set of each sub-ACN equals the variable set of the full ACN.
• (7.3.2) The union of the domain of each sub-ACN equals the domain of the full ACN.
• (7.3.3) The union of the constraint set of each sub-ACN equals the constraint set of the full
ACN. We assume that all constraints are expressed in conjunctive normal form (CNF).
• (7.3.4) The dominance relation of a sub-ACN is a subset of the full ACN dominance relation
that involves the variables of the sub-ACN.
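These four conditions can be checked mechanically. A Python sketch, with ACNs as plain dictionaries (the field names are illustrative, and the constraint labels stand for CNF clauses):

```python
def valid_decomposition(full, subs):
    """Check conditions 7.3.1-7.3.4 relating a full ACN to its sub-ACNs."""
    vars_ok = full["vars"] == set().union(*(s["vars"] for s in subs))          # 7.3.1
    domain_ok = full["domain"] == set().union(*(s["domain"] for s in subs))    # 7.3.2
    cons_ok = full["constraints"] == set().union(
        *(s["constraints"] for s in subs))                                     # 7.3.3
    dominance_ok = all(                                                        # 7.3.4
        s["dominance"] == {(x, y) for (x, y) in full["dominance"]
                           if x in s["vars"] and y in s["vars"]}
        for s in subs)
    return vars_ok and domain_ok and cons_ok and dominance_ok

full = {"vars": {"a", "b", "c"},
        "domain": {("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0), ("c", 1)},
        "constraints": {"a_implies_b", "b_implies_c"},   # CNF clause labels
        "dominance": {("a", "c")}}
sub1 = {"vars": {"a", "b"}, "domain": {("a", 0), ("a", 1), ("b", 0), ("b", 1)},
        "constraints": {"a_implies_b"}, "dominance": set()}
sub2 = {"vars": {"b", "c"}, "domain": {("b", 0), ("b", 1), ("c", 0), ("c", 1)},
        "constraints": {"b_implies_c"}, "dominance": set()}
ok = valid_decomposition(full, [sub1, sub2])
# A sub-ACN with an extra dominance pair violates 7.3.4.
bad = valid_decomposition(full, [dict(sub1, dominance={("a", "b")}), sub2])
```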
In the following sections, we will need two additional concepts. The first is the notion of what
we call the consistent solution set of a sub-ACN, by which we mean the subset of solutions of a
sub-ACN in a given decomposition of a full ACN that are consistent with the solutions of the other
sub-ACNs in that decomposition. The basic idea is that a sub-ACN generally has both a smaller
set of variables and a weaker set of constraints than the large ACN from which it was derived, and
so can have solutions that are inconsistent not only with those of the full ACN but also with those
of the other sub-ACNs.
consistentSolutions : AugmentedConstraintNetwork×AugmentedConstraintNetwork
→ FAssignment
∀ full : AugmentedConstraintNetwork; sub : AugmentedConstraintNetwork;
solset : FAssignment •
sub ∈ decompose(full) ∧ solset = {sol : sub.cn.solutions |
(∀subSolutions : sub.cn.solutions • (7.3.5)
∀subacn j : decompose(full)\{sub} •
∃compatibleSolution : subacn j.cn.solutions •
(∀sharedvar : (sub.cn.VariableSet∩ subacn j.cn.VariableSet) •
subSolutions.bindings(sharedvar) =
compatibleSolution.bindings(sharedvar)
)
)}
Function consistentSolutions specifies the computation of the consistent solutions of a sub-ACN
within the set of sub-ACNs decomposed from a full ACN. Constraint 7.3.5 states that for any
solution, subSolutions, of a sub-ACN, sub, any other sub-ACN sharing its variables has at least one
solution in which the shared variables have the same assignment. Any individual sub-ACN may
have a solution that satisfies its own constraints but makes the full constraint network inconsistent.
In other words, for a solution, sol_a, of a sub-ACN, acn_a, if there exists another sub-ACN,
acn_b, sharing its variables, but the sol_a assignment of these shared variables is not allowed in
acn_b, we call sol_a an "incompatible" solution. Constraint 7.3.5 ensures that all such incompatible
solutions are excluded.
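The filter in 7.3.5 is easy to state operationally. A Python sketch with solutions as dictionaries (names illustrative; we rely on each sub-ACN's solutions all binding the same variable set, as the ConstraintNetwork schema guarantees):

```python
def consistent_solutions(sub_sols, other_subs):
    """Keep a solution of one sub-ACN only if every other sub-ACN has a
    solution agreeing with it on all shared variables (7.3.5)."""
    kept = []
    for sol in sub_sols:
        ok = True
        for sols_j in other_subs:
            if not sols_j:
                ok = False   # a sub-ACN with no solutions is compatible with nothing
                break
            shared = set(sol) & set(sols_j[0])
            if not any(all(s[v] == sol[v] for v in shared) for s in sols_j):
                ok = False
                break
        if ok:
            kept.append(sol)
    return kept

# Sub-ACN A over {x, y}; sub-ACN B over {y, z} only allows y = 0, so the
# A-solution with y = 1 is "incompatible" and is filtered out.
a_sols = [{"x": 0, "y": 0}, {"x": 1, "y": 1}]
b_sols = [{"y": 0, "z": 0}, {"y": 0, "z": 1}]
kept = consistent_solutions(a_sols, [b_sols])
```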
The second concept is what we call a consistent sub-DA, by which we mean the DA
computed from the consistent solution subset specified by consistentSolutions. In preceding
chapters, we have been imprecise about the details of the divide-and-conquer approach. In partic-
ular, when we talked about decomposing an ACN into sub-ACNs, and computing corresponding
sub-DAs, what we meant was computing consistent sub-DAs. In this chapter, we make these ideas
precise. As we will see, a more rigorous statement of our approach is that we decompose an ACN
into sub-ACNs, compute consistent sub-DAs for each sub-ACN, and synthesize our results from
these consistent sub-DAs. Function deriveConsistentSubDA specifies the consistent sub-DAs com-
putation.
deriveConsistentSubDA : AugmentedConstraintNetwork×AugmentedConstraintNetwork
→ DesignAutomaton
∀ full : AugmentedConstraintNetwork; sub : AugmentedConstraintNetwork •
deriveConsistentSubDA(full,sub) =
computeDA(consistentSolutions(full,sub),sub.cn.domain,sub.dominance)
7.3.2 Divide-and-Conquer Pair-Wise Dependence Relation Derivation
This section presents our divide-and-conquer approach to PWDR derivation and proves its correct-
ness. We prove that a PWDR derived by the divide-and-conquer approach is equal to the PWDR
derived by brute-force means directly from an ACN. The proof requires an important lemma: a
DA composed from the consistent sub-DAs derived from the sub-ACNs is equal to the DA that
would be obtained by brute-force derivation from the given ACN. Subsection 7.3.2.1 proves this
lemma. Subsection 7.3.2.2 uses this lemma to prove the equality between the composed PWDR,
PWDR_composed, and the directly derived PWDR, PWDR_direct.
7.3.2.1 Lemma: The Equality between the Composed DA and Directly Derived DA
In this subsection we show that the consistent sub-DAs corresponding to decomposed sub-ACNs
can be composed to produce a DA, DA_composed, and that DA_composed is equal to the DA that
corresponds directly to the given ACN, DA_direct. Figure 7.1 illustrates this idea as a commutative diagram.
We prove that this diagram commutes by first formalizing a function, deriveDA_divide_conquer,
that follows the "Divide-and-Conquer" path through the diagram, and then by proving that it
produces the same DA as obtained by following the "Brute-Force" path.
[Figure: commutative diagram]
Figure 7.1: The Brute-Force and Divide-and-Conquer DA Derivation
The following Z definition formalizes our divide-and-conquer function:
• (7.3.6) For each state of the composed DA, each sub-ACN has a consistent solution (that is,
a state of its consistent sub-DA) that is a subset of this state. In other words, each consistent
sub-DA has a state that is compatible with the composed DA state.
• (7.3.7) The alphabet of the composed DA is the union of the alphabets of the consistent
sub-DAs.
• (7.3.8) In order to specify the destination states reached by a transition comprising a start
state, start, and a change, change, we first specify several auxiliary sets:
deriveDA_divide_conquer : AugmentedConstraintNetwork → DesignAutomaton
∀acn : AugmentedConstraintNetwork; DA_composed : DesignAutomaton;
subacnset : FAugmentedConstraintNetwork •
subacnset = decompose(acn) ∧ DA_composed = deriveDA_divide_conquer(acn)⇒
(DA_composed.states = {state : Assignment | ∀subacn : subacnset • (7.3.6)
(∃sub : consistentSolutions(acn,subacn) • sub.bindings⊆ state.bindings)}) ∧
(DA_composed.alphabet = {var : Variable; val : Value | ∃subacn : subacnset • (7.3.7)
(var,val) ∈ (deriveConsistentSubDA(acn,subacn)).alphabet}) ∧
(DA_composed.transition = {start : DA_composed.states; (7.3.8)
change : DA_composed.alphabet; composedendset : FAssignment |
(∃affectedacnset : FAugmentedConstraintNetwork; affectedvarset : FVariable;
unchangedbindings : Assignment; composedaffectedset : FAssignment •
affectedacnset = {aacn : subacnset | first change ∈ aacn.cn.VariableSet} ∧ (7.3.9)
affectedvarset = {affectedvar : Variable | (∃acn : affectedacnset • (7.3.10)
affectedvar ∈ acn.cn.VariableSet)} ∧
unchangedbindings.bindings⊆ start.bindings ∧ (7.3.11)
(∀pair : unchangedbindings.bindings • (first pair) /∈ affectedvarset) ∧
composedaffectedset = {affected : Assignment | (7.3.12)
domaffected.bindings = affectedvarset ∧ (∀aacn : affectedacnset •
(∃substart : (deriveConsistentSubDA(acn,aacn)).states •
substart.bindings⊆ start.bindings ∧
(∃subend : (deriveConsistentSubDA(acn,aacn)).transition(substart,change) •
subend.bindings⊆ affected.bindings)))} ∧
composedendset = {end : DA_composed.states | (7.3.13)
(∃affected : composedaffectedset •
end.bindings = affected.bindings∪unchangedbindings.bindings)}
) • ((start,(change)),composedendset)})
a. (7.3.9) affectedacnset is a set of sub-ACNs that are affected by the specified change.
b. (7.3.10) affectedvarset is a set of variables that are involved in the affected sub-ACNs.
c. (7.3.11) unchangedbindings is a set of bindings that are not affected by the specified change,
because the variables they involve do not belong to any affected sub-ACN.
d. (7.3.12) composedaffectedset specifies the cross product of the destination-state sets reached
in each affected consistent sub-DA under the specified change, starting from the corresponding
substart.
Given all these auxiliary sets, composedendset (7.3.13) specifies the destination states reached
from the given start state and change: each is the union of the unchangedbindings with an element
of composedaffectedset.
Now we prove that the design automaton DA_direct directly derived from the original ACN is
equal to the composed DA, which we call DA_composed.
(1) We first prove that the state set of DA_direct is equal to the state set of DA_composed. That is,
each state S in DA_direct is also in DA_composed, and vice versa.
Theorem 1: For a given ACN, SuperACN, any state S in DA_direct = deriveDA(SuperACN) is
also in DA_composed = deriveDA_divide_conquer(SuperACN).
Proof of Theorem 1: Because S is a solution of SuperACN, S satisfies all the constraints of
SuperACN. Let SubACNSet be the set of sub-ACNs obtained by decompose(SuperACN), and let
SubACN be an arbitrary element of SubACNSet. We claim that S must be consistent with the
constraints of SubACN and of all other elements of SubACNSet. The reason is that each such sub-
ACN has a non-trivial projection of S as a solution, SubS. The projection is non-trivial because the
variable set of the sub-ACN is a non-empty subset of the variable set of the SuperACN. Moreover,
because SubS is a projection of a solution to the SuperACN, SubS must be consistent with all other
sub-ACNs, which is to say that it must be a consistent solution of the sub-ACN. By constraint 7.3.6
in the specification of deriveDA_divide_conquer, S is therefore a state in DA_composed. The reason is
that S is the union of a set of sub-states that satisfy the criteria in 7.3.6.
Theorem 2: For a given ACN, SuperACN, any state S in
DA_composed = deriveDA_divide_conquer(SuperACN) is also in DA_direct = deriveDA(SuperACN).
Proof of Theorem 2: Each state of DA_composed is the union of a set of consistent solutions of
sub-ACNs. The sub-ACNs collectively embody all of the constraints of the ACN, and the union of
consistent solutions is, by definition, consistent with the constraints of SuperACN.
Given Theorem 1 and Theorem 2, we conclude that the state set of DA_direct is equal to the state
set of DA_composed.
(2) Next we prove that the transition relation of DA_direct is equal to that of DA_composed.
Theorem 3: Given an ACN, SuperACN, and its DAdirect = deriveDA(SuperACN), any transition
T from state start to state end on assignment v = val in DAdirect is also a transition between the
corresponding states start and end in DAcomposed = deriveDA divide conquer(SuperACN). The
proof of this theorem depends on the following lemma:
Lemma 1: For any consistent sub-DA, subDAi, involving the change v = val, there exists a
transition t : ((substarti,(v,val)),subendi) in subDAi, where substarti is the projection of start, and
subendi is the projection of end.
Proof of Lemma 1: We prove this lemma by contradiction. Suppose the transition t is not
present in subDAi. For this to be the case, the transition t must not be minimal (because
the other conditions for a transition are satisfied). If t is not minimal, then there must be another
state in the sub-DA, which we call subend′i, where (subend′i \ substarti) ⊂ (subendi \ substarti).
But in this case, subend′i is a consistent state, which means that there is a state of DAdirect,
end′, of which subend′i is a projection, and there has to be a transition from start to end′ such that
(end′ \ start) ⊂ (end \ start), which violates the initial assumption that T is minimal.
The theorem follows from the combination of the lemma and our definition of the set of end
states of the composed DA. In particular, we need to show that end is one of the end states in the
composed DA. Given the lemma that for all the m consistent sub-DAs involving the change v = val,
each consistent subDAi, i = 1..m has a transition t : ((substarti,(v,val)),subendi) in subDAi, where
substarti is the projection of start, and subendi is the projection of end, we let composedaffectedset
be the union of all the subendi, i = 1..m. Since composedaffectedset ⊆ end, we can see that end is
one of the destination states reached by the given start state and the change in DAcomposed, according
to constraints 7.3.12 and 7.3.13 in the specification of deriveDA divide conquer. In summary, the
transition T exists in DAcomposed, and Theorem 3 is proved.
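The minimality condition that these proofs rely on can be sketched as a filter over candidate destination states (a hypothetical helper, not Simon's implementation): among candidates, keep only those whose set of changed variables is minimal under strict set inclusion.

```python
def changed(start, end):
    """Variables whose bindings differ between two states (dicts)."""
    return {v for v in start if start[v] != end[v]}

def minimal_destinations(start, candidates):
    """Keep only the candidates whose changed-variable set is minimal
    under strict set inclusion, mirroring the minimality condition on
    transitions of a design automaton."""
    return [e for e in candidates
            if not any(changed(start, e2) < changed(start, e)
                       for e2 in candidates)]

# Hypothetical example: the second candidate changes a strict superset
# of the variables changed by the first, so it is not minimal.
start = {"a": 0, "b": 0, "c": 0}
ends = [{"a": 1, "b": 0, "c": 0},   # changes {a}
        {"a": 1, "b": 1, "c": 0}]   # changes {a, b}: filtered out
assert minimal_destinations(start, ends) == [{"a": 1, "b": 0, "c": 0}]
```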
Theorem 4: Given an ACN, SuperACN, and its
DAcomposed = deriveDA divide conquer(SuperACN), any transition T from state start to state end
on assignment v = val in DAcomposed is also a transition between the corresponding states start
and end in DAdirect = deriveDA(SuperACN). In order to prove this theorem, we need to prove the
following lemmas:
Lemma 2: In the state end, the change v = val is accommodated.
Proof of Lemma 2: In the specification of transitions in deriveDA divide conquer, constraint
7.3.12 specifies the cross product of the end states in these affected consistent sub-DAs:
composedaffectedset = subendset1 × subendset2 × ...× subendsetm. Because the change v = val
is accommodated by every subendi, i = 1..m, in every affected ∈ composedaffectedset, v must
be bound to val. Constraint 7.3.13 specifies the union of affected and the unchanged bindings,
unchangedbindings, and obtains the set of full destination states: composedendset = {end :
Assignment | ∃affected : composedaffectedset • end = affected ∪ unchangedbindings}. Because
end ∈ composedendset, end must have accommodated the change v = val.
Lemma 3: The transition between start and end is minimal.
Proof of Lemma 3: Since there exists affected ∈ composedaffectedset such that end = affected∪
unchangedbindings, we just need to prove that affected is minimal. That is, there does not exist
an affected′ and end′ = affected′ ∪ unchangedbindings such that (end′.bindings \ start.bindings) ⊂
(end.bindings \ start.bindings). We prove this by contradiction. Suppose that affected′ exists,
and can be decomposed into the corresponding end states of each of the affected consistent sub-DAs,
subend′i, i = 1...m. We similarly decompose end and start into the corresponding end and start
states of each of the affected consistent sub-DAs, subendi, i = 1...m and substarti, i = 1...m. Then there
must exist a consistent sub-DA subDAi in which (subend′i \ substarti) ⊂ (subendi \ substarti). This
means that the transition from substarti to subendi labeled with v = val is not minimal, and should
not exist in subDAi. This is a contradiction. Thus, subend′i does not exist, and the transition between
start and end labeled with v = val is minimal.
Lemma 4: The dominance relation is respected.
Proof of Lemma 4: For any pair (v,w) in the dominance relation of DAdirect, if both v and
w belong to the same sub-ACN, then their dominance relation must have been respected by the
corresponding consistent sub-DAs; if v and w do not appear together in any sub-ACN, then in
the composed transitions of DAcomposed, changes to v must not cause changes to w. Otherwise,
DAcomposed could not be minimal.
Given these lemmas and the definition of a design automaton, start and end are connected by
v = val in DAdirect, and Theorem 4 is proved. Given Theorem 3 and Theorem 4, we conclude that
the transitions of DAdirect are equal to the transitions of DAcomposed.
Combining (1) and (2), we conclude that DAdirect equals DAcomposed.
7.3.2.2 Divide-and-Conquer PWDR Derivation
We have already formalized the relationship between an ACN and its DA; the decomposition of an
ACN into sub-ACNs; and a divide-and-conquer approach to DA composition. In this subsection
we show that the sub-PWDRs derived from the consistent sub-DAs corresponding to the sub-ACNs
can be composed to produce a PWDR, and that this composed PWDR, PWDRcomposed, is equal to
the directly derived PWDR, PWDRdirect. Figure 7.2 illustrates this idea as a commutative diagram.
Figure 7.2: The Brute-Force and Divide-and-Conquer PWDR Derivation
We prove that this diagram commutes by first formalizing a function,
derivePWDR divide conquer, that follows the “Divide-and-Conquer” path through the diagram, and then
by proving that it produces the same PWDR as obtained by following the “Brute-Force” path.
The following Z definition formalizes our divide-and-conquer function, which simply specifies
the composed PWDR as the union of the sub-PWDRs computed from the consistent sub-DAs.
derivePWDR divide conquer :
AugmentedConstraintNetwork → PairWiseDependenceRelation
∀acn : AugmentedConstraintNetwork; subacnset : FAugmentedConstraintNetwork;
pwdr : PairWiseDependenceRelation • subacnset = decompose(acn) ∧
derivePWDR divide conquer(acn) = pwdr ⇒
pwdr.pairs = {first : Variable; second : Variable |
∃subacn : subacnset • (first,second) ∈
(computePWDR(deriveConsistentSubDA(acn,subacn))).pairs
}
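Operationally, this specification is just a set union over the sub-PWDRs. A minimal sketch, with hypothetical pair sets standing in for the output of computePWDR:

```python
def compose_pwdr(sub_pwdrs):
    """Union the dependence pairs computed from each consistent sub-DA,
    as specified by derivePWDR_divide_conquer."""
    composed = set()
    for pairs in sub_pwdrs:
        composed |= pairs
    return composed

# Hypothetical sub-PWDRs that share one pair:
sub1 = {("a", "b"), ("b", "a")}
sub2 = {("b", "c"), ("b", "a")}
assert compose_pwdr([sub1, sub2]) == {("a", "b"), ("b", "a"), ("b", "c")}
```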
Having proved the equality between DAdirect and DAcomposed, we can easily prove the correct-
ness of the composed PWDR. We prove that if a pair (x,y) is in PWDRdirect, it must also be in
PWDRcomposed, and vice versa.
On one hand, if a pair (x,y) is in PWDRdirect, we know that there exists a transition in DAdirect
labeled with an assignment of x, and y is involved in accommodating this assignment. Since we
have proved that DAdirect equals DAcomposed, such a transition must also exist in DAcomposed. In
other words, such a transition must exist in one of the consistent sub-DAs, and consequently in the
computed sub-PWDR. Accordingly, (x,y) must also be in PWDRcomposed.
On the other hand, if a pair (x,y) is in the PWDRcomposed, we know that there exists a transition
in one of the consistent sub-DAs that is labeled with an assignment of x, and y is involved in
accommodating this assignment. Since DAcomposed is composed from sub-DAs, and is equal to
DAdirect, such a transition must exist in DAdirect. Accordingly, (x,y) must also be in PWDRdirect.
The proof is complete.
7.3.3 Divide-and-Conquer Design Impact Analysis
Given an original state, start, and a sequence of changes, the purpose of design impact analysis
(DIA) is to find all the evolution paths that start from start and go along the edges of the DA labeled
with these changes. The final states of these paths accommodate the changes, and the evolution
paths represent the different ways the changes can be accommodated.
Our divide-and-conquer design impact analysis involves the following steps:
(1) For the original state start, we identify a substart in each consistent sub-DA that is a subset
of start, and consider the first variable-value pair, change, of the sequence of changes.
(2) Suppose that m out of the n consistent sub-DAs involve change. For the variables that
are not involved in any of these m affected consistent sub-DAs, we keep their value assignments,
which we call unchangedbindings.
(3) For each of the m affected consistent sub-DAs, we find the set of destination states:
subendseti, i = 1...m, that the change leads to.
(4) Then we compute the cross product of the end states in these affected consistent sub-
DAs: composedaffectedset = subendset1 × subendset2 × ...× subendsetm. For each affected ∈
composedaffectedset, we then compute the union of affected and the unchanged bindings,
unchangedbindings, and obtain the set of full destination states: composedendset = {composedend :
Assignment | ∃affected : composedaffectedset • composedend = affected ∪ unchangedbindings}.
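The composition in steps (2)–(4) can be sketched as follows (a hypothetical sketch, not Simon's code; states are dicts of variable bindings, and the affected sub-states are assumed to agree on any shared variables, per the consistency requirement):

```python
from itertools import product

def compose_end_states(subendsets, unchangedbindings):
    """Cross the destination-state sets of the affected sub-DAs and union
    each combination with the unchanged bindings (steps (2)-(4))."""
    composed = []
    for combo in product(*subendsets):
        end = dict(unchangedbindings)   # bindings untouched by the change
        for sub_end in combo:
            end.update(sub_end)         # bindings from each affected sub-DA
        composed.append(end)
    return composed

# Two affected sub-DAs and one unchanged variable c; the second sub-DA
# can accommodate the change in two ways, so there are two end states.
subendset1 = [{"a": 1}]
subendset2 = [{"b": 0}, {"b": 1}]
ends = compose_end_states([subendset1, subendset2], {"c": 0})
assert ends == [{"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0}]
```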
The correctness of our divide-and-conquer DIA approach depends on the following theorem:
Theorem 5: The composedendset equals the destination state set that would have been reached
by the given start and change in DAdirect. The proof of this theorem depends on the following
lemmas:
Lemma 5: The composedendset comprises all the destination states that the same start state and
the same change would have reached in DAdirect. Proof of Lemma 5: By construction, composedendset
takes into account every combination of the ways in which the affected sub-DAs can accommodate
the change, so no destination state is omitted, and this condition is satisfied.
Lemma 6: For each state end ∈ composedendset, the transition between start and end is minimal.
Proof of Lemma 6: Since end is the union of unchangedbindings and an element, affected,
of composedaffectedset, we just need to prove that affected is minimal. That is, there does not exist
an affected′ and end′ = affected′ ∪ unchangedbindings such that (end′.bindings \ start.bindings) ⊂
(end.bindings \ start.bindings). We prove this by contradiction. Suppose that affected′ exists,
and can be decomposed into the corresponding end states of each of the affected consistent sub-DAs,
subend′i, i = 1...m. We similarly decompose end and start into the corresponding end and start
states of each of the affected consistent sub-DAs, subendi, i = 1...m and substarti, i = 1...m. Then there
must exist a consistent sub-DA, subDAi, in which (subend′i \ substarti) ⊂ (subendi \ substarti). This
means that the transition from substarti to subendi labeled with v = val is not minimal, and should
not exist in subDAi. This is a contradiction. Thus, subend′i does not exist, and the transition between
start and end labeled with v = val is minimal.
In summary, Theorem 5 is proved and this step finds the set of destination states that would
have been reached by the given start and change in DAdirect.
(5) We now consider each composedend as a new start state, take the sequence of changes
with the first change removed as the new sequence of changes, and go back to step (1).
After all the changes are consumed, we obtain evolution paths equal to the paths that would
otherwise be obtained using the brute-force design impact analysis on DAdirect.
7.4 Chapter Summary
To address the problem that important modularity theories remain informal and imprecise, and
thus insufficient to provide the basis for tool-supported automation, this chapter contributes the formalization
of our framework. We claim that our framework is able to account for the key notions of the following
important but informal theories: (1) Baldwin and Clark’s key concepts of design dimensions,
design decisions, design spaces, design rules, and design dependences; (2) Parnas’s information
hiding modularity; and (3) Parnas’s changeability analysis. We have supported these claims by
showing that all these concepts and problems can be formally defined in the context of our models.
We have also formalized our divide-and-conquer approaches and proved their correctness.
Chapter 8
Simon: The Tool
Previous chapters have presented the modeling and analysis techniques of our framework. We claim
that our modeling framework can be supported by tools, that the analyses can be automated with
reasonable performance at least for small but representative design models, and that the tool helps
to validate the modeling approach. In order to test these hypotheses, we developed a prototype
tool, called Simon,¹ that implements our framework. We used Simon to automatically analyze the
problems people previously analyzed qualitatively or manually. The results Simon produced either
confirm previous results or reveal errors in them precisely and quantitatively. These experiments
using Simon constitute the evidence we have developed to date in support of our claims that our
framework is valid. The implementation of Simon also enables the dissemination of our results
independent of our modeling techniques, providing the first step towards a practical tool for design
modeling and analysis.
Simon supports formal design modeling through interactive graphical user interfaces (GUIs),
and automates design impact analysis, design structure matrix derivation, and net option value
calculation. These modeling and analysis techniques are based on the formalizations introduced in
Chapter 7. All the results and screen snapshots presented in this dissertation are based on
experiments using Simon. Chapter 4 presents the performance data for analyzing Parnas’s KWIC example,
and Chapter 9 presents the performance data for analyzing the ACN models of a web application and a
¹Our tool is named after Herbert A. Simon, the pioneer of decision-making theories and the father of artificial intelligence.
peer-to-peer networking system. The other two designs presented in this dissertation, the figure
editor example and a fault tree analysis tool, are small enough that Simon automates their analyses
instantaneously. The evidence in support of our claims has several parts: (1) The fact that all the
representative designs presented in this dissertation have been modeled as ACNs or CACNs using
Simon provides evidence that our modeling technique can be supported by tools; (2) Simon
computes all the analyses with acceptable response time; and (3) Simon produces analysis results
that validate our framework.
This chapter introduces the graphical user interfaces (GUIs) and modeling languages of Simon
to show how the framework introduced in this dissertation can be supported by a tool; that is,
how ACN and CACN models can be built and analyzed using Simon. At present, Simon is
still a research prototype implemented for proof of concept, for performance testing, and for model
validation purposes.
Section 8.1 describes how to construct ACNs and CACNs using Simon. We introduce ACN
modeling first because ACNs are the basic models for automated analyses. After that, we describe
how to build extended CACN models using Simon and how to reduce a CACN into ACNs. This
section also presents the grammar of the ACN and CACN modeling language notation. We have
shown ACN and CACN models expressed using these languages throughout the dissertation, such
as Figure 4.3 in Chapter 4. Simon provides GUIs for the user to construct the expressions of
these languages easily. At present, the language is sufficient to model the designs presented in this
dissertation, but its syntax is somewhat arbitrary and subject to change. Further development
of the Simon language is part of our future work. Section 8.2 describes how Simon solves constraint
networks and generates design automata and pair-wise dependence relations. Section 8.3 describes
how to use the Simon GUI to automate design evolvability and economic-related analyses, using
either the brute-force or divide-and-conquer approaches.
8.1 Interactive Formal Design Modeling
The GUIs of Simon are built using C#, and a Simon project can be saved as a set of files. This
section introduces Simon GUIs for ACN and CACN construction, as well as the textual modeling
language notations for both models.
8.1.1 ACN Modeling
Figure 8.1: Core Models and Analysis in Simon
8.1.1.1 Overview
Figure 8.1 shows how Simon supports the core models introduced in Chapter 3. As we have
explained, an ACN model consists of three elements: a constraint network, a dominance relation,
and a cluster set. Simon enables the user to input these elements through different tab pages of a
tab control, as shown in Figures 8.2, 8.3, and 8.4. Given an ACN, Simon first solves the constraint
network and stores all the solutions in a .sol file. After that, Simon takes the solutions and the
dominance relation as input, generates a design automaton and a pair-wise dependence relation, and
stores them in a .da file and a .dep file. At this point, the user can analyze design impacts using
the GUIs (Figures 8.15, 8.16, and 8.17), derive design structure matrices (Figure 8.18), and compute net
option values (Figure 8.19).
8.1.1.2 Graphical User Interfaces
Figure 8.2 shows the constraint network tab page. The left list box shows all the design variables,
and the right list box shows all the constraints. To add new variables, the user selects Edit−>Add
Variable from the main menu. A variable input GUI then appears, in which the user can input
or edit a scalar variable and its domain. Newly added variables are displayed in the GUI shown
in Figure 8.2. The domain of a selected scalar variable is shown in the lower left box of the tab page.
To add new constraints, the user selects Edit−>Add Constraint from the main menu. A
constraint input GUI then appears, in which the user can input or edit a constraint as a logical expression.
Newly added constraints are displayed in the GUI shown in Figure 8.2. The syntax of constraint
expressions is defined in the next subsection.
Figure 8.3 shows the dominance relation tab page, in which the user can construct the dominance
relation through a grid control. Checking a cell dictates that the variable on the row cannot
influence the variable on the column.
Figure 8.4 shows the cluster set tab page, in which the user can create, delete, or edit a cluster
by moving variables around and aggregating variables into modules. Newly added clusters will be
shown in the upper left cluster set box of the form.
The cluster set boxes on the upper left of these GUIs display the existing clusterings. The selected
cluster boxes display the selected cluster. Selecting a different clustering reorders the variables
displayed in the tab control.
Figure 8.2: Simon: Constraint Network Construction
8.1.1.3 Constraint Network Modeling Language
After saving a project using the main menu, Simon saves the constraint network as an internal
language file. The dominance relation is saved as a plain text file, and the cluster set is saved as an
XML file. Opening a Simon project loads these files and displays their contents in these GUIs. The
language productions are shown in Figure 8.5. These productions are presented using the grammar
notation of ANTLR (ANother Tool for Language Recognition),² a language tool that provides a framework
for constructing recognizers, compilers, and translators from grammatical descriptions. Simon uses
ANTLR as a component to generate the parsers and lexers of its internal languages.
• The ds production defines that a design space consists of a number of expressions:
ds: "DesignSpace" modelName LCURLY (dsExprs ";")* RCURLY
²http://www.antlr.org/
Figure 8.3: Simon: Dominance Relation Construction
• The dsExprs production defines that an expression can be either a variable expression or a
logical expression: dsExprs: dsVar | predicate
• The dsVar production defines that a variable has a name and a domain:
dsVar: varName ":" enumType
• The enumType production defines that a domain consists of a set of values:
enumType: (LCURLY valueName("," valueName)* RCURLY)
• The bindingDecl production defines the syntax for specifying that a variable is, or is not,
bound to a given value:
bindingDecl : varName ("!=" | "=") valueName
• The primitive production defines that a primitive logical expression can be either a binding
or a more complex predicate:
primitive : bindingDecl | (LPAREN predicate RPAREN)
Figure 8.4: Simon: Cluster Set Construction
ds: "DesignSpace" modelName LCURLY (dsExprs ";")* RCURLY;
dsExprs: dsVar | predicate;
dsVar: varName ":" enumType;
enumType: (LCURLY valueName ("," valueName)* RCURLY);
bindingDecl: varName ("!=" | "=") valueName;
primitive: bindingDecl | (LPAREN predicate RPAREN);
relationalDecl: primitive (("&&" | "||") primitive)*;
predicate: (relationalDecl (("=>" | "<=>") relationalDecl)*);
Figure 8.5: ACN Language Productions
• The relationalDecl and predicate define the logical expression syntax, in which && and
|| have higher priority than => and <=>:
relationalDecl : primitive (("&&" | "||") primitive)*
predicate: (relationalDecl (("=>" | "<=>" ) relationalDecl )*)
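As a worked example of this notation, the following hypothetical model conforms to the productions above (variable and value names are invented for illustration; it is not one of the dissertation's case studies). It declares two variables, each with a two-value domain, and one constraint built from bindings, an implication, and a conjunction:

DesignSpace example {
    parser : {table_driven, recursive};
    lexer : {generated, hand_written};
    (parser = table_driven) => (lexer = generated) && (lexer != hand_written);
}

The parentheses around the bindings are not strictly required, since && and || bind more tightly than =>, but they make the grouping explicit.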
8.1.2 CACN Modeling
As introduced in Chapter 6, a CACN model extends the ACN model by adding set-valued variables,
subspace-valued variables, and universally quantified logical expressions. We extend the GUIs and
language of Simon to accommodate these changes in modeling. Simon also provides a GUI in
which the user can specify a value for each set-valued variable and subspace-valued variable, and
translate the parameterized CACN into ACNs.
Through the main menu, the user can open or construct an extended CACN model as introduced
in Chapter 6. Figure 8.6 shows the GUI of CACN modeling. (Yuanyuan Song in our research group
implemented the GUIs of CACN modeling, as well as the translation from a CACN model into ACN
models.) The upper left tree view in Figure 8.6 shows the CACN design space, in which variables are
shown as tree nodes. A design variable with subspaces has children, each modeling a subspace. The
Variables and Predicate list boxes show all the variables and predicates within the parent space or a
selected subspace. The Variable Detail block shows the details of a selected scalar or set-valued variable.
Figure 8.6: Simon: Complex Augmented Constraint Network
To construct or edit a CACN model, the user clicks the Edit button, and then adds or edits
variables using the GUI shown in Figure 8.7. After editing the variables, clicking the Add Constraint
button reveals the constraint editing GUI, in which the user can further input logical expressions,
including universally quantified expressions. The next subsection introduces the syntax productions.
If a newly added or edited variable has subspaces, Simon recursively presents the GUI shown
in Figure 8.6 so that the user can construct the subspaces.
Figure 8.7: Simon: Design Variables
From the GUI shown in Figure 8.6, clicking the To ACN button brings up a new GUI in which
the user can specify a value for each set-valued variable and subspace-valued variable, as shown in
Figure 8.8. After that, clicking the Finish button causes Simon to translate the parameterized
CACN into an ACN, as explained in Chapter 6. All the CACN forms are then closed, and the user can
work on the newly generated ACN, as shown in Figure 8.9.
Figure 8.8: Parameterize a CACN
8.1.3 CACN Modeling Language
After saving a CACN project, the CACN model is saved as a CACN language file, which will be
loaded when the user opens the project later. Figure 8.10 shows the CACN notation productions.
The CACN language is in fact a superset of the ACN language, and we can use the
CACN language and GUIs to model an ACN design. As a proof of concept, however, Simon
at present provides two separate sets of GUIs and language syntax for these two models.
Following are several extended productions in the CACN modeling language:
• The dsExprs production now includes subspace declarations and quantified logical
constraints:
dsExprs: varDecl | predicate | subSpace | quanpredicate
• The varDecl production now includes set-valued variables and subspace
variables: varDecl: scalarVar | setVar | subspaceVar
• The subspace variable declaration syntax is defined by subspaceVar production:
subspaceVar: "subspace" varName ":"
(LPAREN valueName ("," valueName)* RPAREN)
• The set-valued variable declaration syntax is defined by setVar:
setVar: "set" varName (setRef |
((LPAREN valueName ("," valueName)* RPAREN)":") (setMatch | setNew))
Figure 8.9: Automatically Generated ACN
• The setRef production defines set-valued variables that refer to other variables specified
after *: setRef: (LPAREN "*" varName RPAREN)
• The setMatch production defines set-valued variables that are brought into being by another
variable or variable set after %:
setMatch: ("%"varName ("*" "%" varName)*)
• The setNew production defines variables that could be either scalar variables or set variables
that bring in new design dimensions, as defined in the production valueSet:
setNew: ("set" | (LPAREN valueSet ("," valueSet)* RPAREN))
• The subSpace production defines a subspace:
subSpace: IDENT LBRACK (dsExprs ";")* RBRACK
• The matching and quanpredicate productions define constraints between variables
with a one-to-one correspondence relation:
ds: "DesignSpace" IDENT LBRACK (dsExprs ";")* RBRACK;
dsExprs: varDecl | predicate | subSpace | quanpredicate;
varDecl: scalarVar | setVar | subspaceVar;
scalarVar: "scalar" varName ":" (LPAREN valueName ("," valueName)* RPAREN);
subspaceVar: "subspace" varName ":" (LPAREN valueName ("," valueName)* RPAREN);
setVar: "set" varName (setRef | ((LPAREN valueName ("," valueName)* RPAREN) ":") (setMatch | setNew));
setRef: (LPAREN "*" varName RPAREN) ( | ":" (LPAREN valueSet ("," valueSet)* RPAREN));
setMatch: ("%" varName ("*" "%" varName)*);
setNew: ("set" | (LPAREN valueSet ("," valueSet)* RPAREN));
valueSet: (valueName setDecl | valueName);
setDecl: "{" valueName ("," valueName)* "}";
matching: "%" (varName ":" varName) ("," (varName ":" varName))* "%";
quanpredicate: matching ( | "|" predicate ("," predicate)*);
predicate: (relationalDecl (("=>"^ | "<=>"^) relationalDecl)*);
subSpace: IDENT LBRACK (dsExprs ";")* RBRACK;
bindingDecl: bindingSingle | bindingSet;
bindingSingle: varName ("!="^ | "="^) valueName;
bindingSet: "~" varName ("!="^ | "="^) valueName;
primitive: bindingDecl | (LPAREN predicate RPAREN) | setMember;
setMember: varName "in" varName;
relationalDecl: primitive (("&&"^ | "||"^) primitive)*;
Figure 8.10: CACN Language Productions
matching: "%" (varName ":" varName) ("," (varName ":" varName))* "%"
quanpredicate: matching ( | "|" predicate ("," predicate)*)
• The bindingSet production defines a universal quantification, dictating that all the variables
within a set are bound to the same value:
bindingSet : "~" varName ("!="^ | "="^) valueName
• Since a CACN model supports set-valued variables, we add a primitive expression,
setMember, to denote the membership relation:
setMember: varName "in" varName
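To illustrate the extended syntax, the following hypothetical fragment conforms to the productions above (all names are invented; it checks only syntactic conformance, and its meaning would follow the CACN semantics given in Chapter 6). It declares a scalar variable, a set-valued variable introducing a new design dimension, a subspace, and a set-membership constraint:

DesignSpace cacn_example [
    scalar protocol : (http, ftp);
    set handlers (h1, h2) : set;
    impls [
        scalar strategy : (basic, cached);
    ];
    h1 in handlers;
]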
8.2 Constraint Solving and DA, PWDR Generation
After an ACN model is built or automatically generated from a parameterized CACN, the user
can analyze design impacts, derive DSMs, or compute net option values. In order to do these
analyses, the user should first let Simon solve the constraint network, and then generate the design
automaton and pair-wise dependence relation as internal data structures, using the menus shown
in Figure 8.11. The user can either solve the whole constraint network and generate
the full DA and PWDR (by clicking the menu items Solve Whole CN and Generate Whole DA),
or decompose the ACN into a number of sub-ACNs by clicking the Decompose menu, and then
solve each sub-ACN and generate sub-DAs and sub-PWDRs (by clicking the Solve Modular CN
and Generate Modular DA menu items).
Figure 8.11: Simon: Solve Constraint Network
Figure 8.12 shows a scenario in which the KWIC ACN is decomposed into six sub-ACNs; a
prompt is shown after the decomposition completes. Internally, each sub-ACN is saved as a
separate Simon project that the user can open and analyze separately.
Simon uses Alloy internally as a SAT-based constraint solver. Since Alloy is a Java program, we wrote a small
helper program in Java to invoke the Alloy SAT solver through the Alloy APIs and translate Alloy’s
output into a text file (a .sol file) in the format Simon requires. Given an ACN or a sub-ACN,
Simon first takes the constraint network part and translates it into an Alloy specification. Simon then
invokes the helper program as a separate thread, which reports back when constraint
solving is complete. Figure 8.13 shows a scenario in which all the decomposed sub-ACNs have been solved.
Figure 8.12: Simon: Decompose a Large Constraint Network
For performance reasons, we wrote a DA and PWDR generation program in C. This program
takes a solution file and a dominance relation, generates a DA and a PWDR, and stores them as plain
text files (a .da file and a .dep file) in the format Simon requires. After the user clicks the Generate
Whole DA or Generate Modular DA menu items, Simon invokes this program as a separate thread
to process the whole ACN, or invokes multiple threads to generate the sub-DA and sub-PWDR for
each sub-ACN separately.
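The per-sub-ACN fan-out can be sketched with worker threads. This is a hypothetical sketch, not Simon's C#/C implementation: the generate callable stands in for the external DA/PWDR program, and all names are invented.

```python
import threading

def generate_all(sub_acns, generate):
    """Launch one worker per sub-ACN, mirroring how Simon invokes the
    DA/PWDR generator for each sub-ACN separately, then wait for all."""
    results = {}
    lock = threading.Lock()

    def worker(name, acn):
        out = generate(acn)          # stands in for the external program
        with lock:
            results[name] = out      # collect each sub-result safely

    threads = [threading.Thread(target=worker, args=item)
               for item in sub_acns.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# A stub generator standing in for the external DA/PWDR program:
subs = {"sub1": [1, 2], "sub2": [3]}
out = generate_all(subs, generate=lambda acn: len(acn))
assert out == {"sub1": 2, "sub2": 1}
```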
8.3 Automated Design Analysis
After the constraint network is solved and the internal data structures, DAs and PWDRs, are generated,
the user can perform a number of analyses. Figure 8.14 shows two analysis menu items: design impact
analysis and design structure matrix generation. The net option value calculation is based on DSMs,
and its GUI is accessible through a DSM GUI.
8.3.1 Design Impact Analysis
As introduced in Chapter 3, the inputs of design impact analysis are an original design and
a sequence of changes, and the output is a set of evolution paths. Figure 8.15 shows the
Simon GUI in which the user can specify a design by selecting a value for each variable. After that,
clicking the Verify button tests whether the specified design is valid.
Figure 8.13: Simon: Sub-ACNs are Solved
After verifying a valid design and clicking the Select button, the user can specify a change by
selecting another value for a changing variable using the GUI shown in Figure 8.16. All the changed
variables are shown in the lower list view as a sequence.
After specifying an original design and a sequence of changes, the use can click the analysis
menu item to analyze design impact and get the output in a GUI as shown in Figure 8.17. The
upper block shows two evolution paths. Clicking the corresponding radio button shows the selected
design in the evolution path in the lower box. The middle list view shows the differences between
the original design and the selected design in the evolution path.
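The analysis this GUI drives can be pictured as a search over valid designs. The sketch below is a minimal, hypothetical illustration, not Simon's implementation: a two-variable network with an invented "changed" value, where one variable of a valid design is deliberately perturbed and a breadth-first search, one variable revision per step, finds the nearest valid designs; each shortest route is one evolution path.

```python
from collections import deque

# Hypothetical two-variable ACN: a_impl = orig => a_interface = orig.
VARIABLES = ["a_interface", "a_impl"]
DOMAIN = ["orig", "changed"]

def valid(design):
    # The implementation may keep its original value only while the
    # interface it assumes is also original.
    return design["a_impl"] != "orig" or design["a_interface"] == "orig"

def evolution_paths(original, change_var, new_value):
    """Apply one deliberate change, then find all shortest sequences of
    further revisions that restore a valid design."""
    start = dict(original, **{change_var: new_value})
    if valid(start):
        return [[start]]
    queue, paths, best = deque([[start]]), [], None
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # keep only the shortest repair sequences
        current = path[-1]
        for var in VARIABLES:
            if var == change_var:
                continue  # the deliberate change is not undone
            for value in DOMAIN:
                if value == current[var]:
                    continue
                nxt = dict(current, **{var: value})
                if valid(nxt):
                    paths.append(path + [nxt])
                    best = len(path)
                else:
                    queue.append(path + [nxt])
    return paths

original = {"a_interface": "orig", "a_impl": "orig"}
# Changing the interface invalidates the implementation's assumption,
# so the single shortest evolution path also revises a_impl.
paths = evolution_paths(original, "a_interface", "changed")
print(paths[0][-1])  # {'a_interface': 'changed', 'a_impl': 'changed'}
```

The GUI's upper block corresponds to the list of paths returned here; selecting a path shows its intermediate designs.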
8.3.2 Design Structure Matrix Derivation
Clicking the Design Structure Matrix menu will generate a DSM, as shown in Figure 8.18. A DSM
is derived from a PWDR and a selected clustering from the clustering set. Selecting a different
clustering method will reorder the DSM.
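The derivation itself is mechanical once the PWDR is in hand. A sketch with invented variable names, treating a PWDR as a set of ordered pairs; the clustering only fixes the row and column order, which is why choosing another clustering merely reorders the DSM:

```python
def derive_dsm(pwdr, clustering):
    """Render a DSM as text: '.' on the diagonal, 'x' where the row
    variable depends on the column variable. Rows and columns follow
    the flattened cluster order."""
    order = [v for cluster in clustering for v in cluster]
    width = max(len(v) for v in order)
    lines = []
    for i, row in enumerate(order):
        cells = [
            "." if col == row else ("x" if (row, col) in pwdr else " ")
            for col in order
        ]
        lines.append(f"{i}:{row:<{width}} " + " ".join(cells))
    return "\n".join(lines)

# Hypothetical fragment: an implementation depends on two interfaces.
pwdr = {("b_impl", "a_interface"), ("b_impl", "b_interface")}
clustering = [["a_interface", "b_interface"], ["b_impl"]]
print(derive_dsm(pwdr, clustering))
```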
8.3.3 Net Option Value Computation
Clicking the NOV menu item in the DSM GUI reveals the net option value calculation GUI, as
shown in Figure 8.19. The user can input additional NOV related parameters, such as the estimated
Figure 8.14: Simon: Design Automaton and Pair-wise Dependence Relation Generation
technical potential, for each module using the control on the upper left. This GUI displays different
module parameters according to the selected clusterings. The user can experiment with a new
modularization method and compute its NOV by first constructing a new clustering using the GUI
shown in Figure 8.4, and then computing the corresponding NOV. The upper right block summarizes
the parameters of all the modules. The lower right grid shows the automatically computed
NOV value for each module. The final system NOV is shown at the bottom.
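Concretely, Baldwin and Clark value a module of n variables with technical potential sigma as sigma * sqrt(n) * Q(k), minus experimentation costs, where Q(k) is the expected payoff of keeping the best of k independent experiments. The sketch below estimates Q(k) by Monte Carlo under the standard-normal assumption; the parameter names and the omission of visibility costs are our simplifications for illustration, not Simon's actual computation.

```python
import random

def q(k, trials=100_000, seed=0):
    """Monte Carlo estimate of Q(k): the expected value of the best of
    k standard-normal draws, counted only when positive (a worthless
    experiment is simply not accepted)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(k))
        total += max(best, 0.0)
    return total / trials

def module_nov(sigma, n, k, experiment_cost):
    """One module's net option value: the benefit of keeping the best
    of k experiments on a module of n variables, minus the cost of
    running them (visibility costs omitted in this sketch)."""
    return sigma * (n ** 0.5) * q(k) - experiment_cost * k

# Option value grows with the number of experiments, but with
# diminishing returns against the linear experimentation cost.
for k in (1, 2, 4):
    print(k, module_nov(sigma=1.0, n=4, k=k, experiment_cost=0.1))
```

Summing such per-module values, as the lower right grid does, yields the system NOV shown at the bottom of the GUI.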
8.4 Chapter Summary
In summary, this chapter introduced Simon, our prototype tool, which implements the framework
introduced in this dissertation. All five representative designs presented in this dissertation
have been constructed as ACNs or CACNs using Simon, and Simon automates all the analyses within a
reasonable amount of time. This provides evidence supporting our claims that the formal modeling
techniques of our framework can be supported by tools, that the analysis techniques of our framework
can be automated with reasonable performance, and that Simon can help us evaluate our modeling
framework as a novel artifact.
Simon is currently a proof of concept. We use Alloy internally and C# for
GUI development. Since Alloy uses SAT solvers (mchaff or zchaff) that are written in C, Simon
Figure 8.15: Design Impact Analysis: Select an Original Design
is actually taking the following route: C# -> Java -> C -> Java -> C#. The transitions in
between pass inputs and outputs by writing and loading text files, which is unnecessarily complex
and time-consuming. Bypassing Alloy and using a SAT solver directly is part of our future work.
Figure 8.16: Design Impact Analysis: Specify a Change
Figure 8.17: Design Impact Analysis: Evolution Paths
Figure 8.18: Design Structure Matrix Derivation
Figure 8.19: Net Option Value Calculation
Chapter 9
The Generalizability of the Approach
Our long-term goal is to develop a modeling and analysis theory and tools of value to practicing
software architects and decision-makers. In particular, we hope that developers will someday be
able to use such tools to help make major design decisions: high-consequence design decisions that
have not yet been made. Reaching this goal is beyond the scope of the work presented in this
dissertation. Nevertheless, it is important that we make and defend a critical claim: that our modeling and
analysis approach generalizes beyond the set of small models, such as KWIC, which we used as
experimental subjects in developing and refining the approach itself. Is the approach applicable to
systems and to modeling and analysis experiments that are (a) beyond those used in developing the
approach, (b) beyond those developed by the authors of the approach, and (c) to the modeling and
analysis of design models of real systems? We claim that our approach generalizes along these dimensions.
In support of this claim, we present evidence in the form of three additional case studies.
Section 9.1 presents a replication study, using our approach, of the modeling and economic
analysis of a web-based e-commerce application developed and studied by Lopes et al. [49]. The
application was called WineryLocator. Lopes et al. employed Baldwin and Clark’s modeling and
analysis technique to quantitatively compare a range of possible designs for this system, including
both object- and aspect-oriented designs. We represent these designs as ACNs according to
their design descriptions, generate DSM models, and compare them with the models that they developed
by hand. Our results confirmed their results in general. However, as with the replication study
of our own KWIC example, we found a number of errors in their published models and results.
Both outcomes—that our overall results are similar to theirs and that ours also revealed errors in
earlier, informal reasoning—tend to confirm our claim that our approach is sufficient to reproduce,
but formally and with higher precision, published design studies of systems developed by other
researchers.
Section 9.2 presents the modeling and analysis of a peer-to-peer networking system, HyperCast
[48, 47], developed by network researchers at the University of Virginia and studied by
Sullivan et al. [63]. As in the WineryLocator paper, the authors compared different designs
using manual models. Remodeling these designs into our framework and analyzing them
automatically reveals important issues missed in the manual models.
These sections demonstrate how to construct formal models of these designs, how a large
model is decomposed into small sub-models so that results can be obtained quickly, and how the
derived DSMs compare with the published manual models. The comparison reveals errors and
problematic issues in the manual models that the authors used to compute NOV values, which
implies potential problems in their quantitative results. These experiments demonstrate the benefits
of automating design analysis based on precise models.
Although the authors of these papers applied the same NOV formula, they estimated its
parameters in dramatically different, ad hoc ways. In fact, applying the NOV model to software
design involves revising and extending the model itself, which is part of our ongoing research. As
a result, this dissertation does not go further in evaluating or comparing the NOV experiments and
results for these two designs.
Section 9.3 presents the modeling and analysis of the Galileo dynamic fault tree analysis tool,
developed at the University of Virginia for production use at NASA [66, 65, 24]. The Galileo
designers once faced a situation in which they had to decide how to restructure part of the
system. They reached a decision based on discussion and argument rather than rigorous analysis.
By modeling and analyzing this historical scenario using Simon, the designers are now able to compare
the candidate decisions comprehensively and to justify their decision rationally.
9.1 A Web Application—Winery Locator
In their paper [49], Lopes et al. studied a web application called WineryLocator. The authors used
DSMs to model and compare object-oriented and aspect-oriented designs for WineryLocator. Their
purpose, analogous to that of Sullivan et al., was to model the value of modularity [64], in this case,
with a focus on the benefits of aspect-oriented modularity.
WineryLocator is designed to locate wineries in California. A user can input either an
approximate or exact address, from which the application determines the exact address to use as the
valid starting point. After that, the user can select preferences for the wineries. Given a starting
point and the preferences, the application generates a route for a tour consisting of all the
wineries that match the preferences. The application outputs the set of stops on the route, a
navigable map, and, upon user request, the driving directions.
In this section, we first briefly present how these designs are modeled, and then introduce how
large ACNs are split into a number of smaller ones that are solved individually, and present the
integrated results. Finally, we examine the discrepancies between our derived DSMs and the manual
models in their paper. The differences reveal several ambiguities and problematic issues in their
imprecise modeling method and manually built DSMs.
9.1.1 ACN Modeling
We first construct the ACN model from the application description. To locate wineries and
get directions using the application, the user first inputs either an approximate or exact address,
from which the application determines the valid starting point; this is the function called
startWineryFind. After that, the user can select preferences for the wineries, the searchWinery
function. Given a starting point and the preferences, the application generates a route for a tour
consisting of all the wineries that match the preferences, the tour function. The application also
presents the driving directions upon user requests, the directions function.
The application depends on MapPoint as its main address and routing service, and the authors
developed a local service, WineryFind, to find wineries matching criteria. The interfaces provided by these
services are called MapPointDesignRules and WineryFindDesignRules. In order to authenticate
MapPoint service requests, each authentication function has to implement a Java Servlet interface,
HttpSessionListener, and uses ApacheAXIS to insert authentication parameters into requests to MapPoint.
There are three supporting functions: AddressLocator gets addresses from MapPoint,
RouteMapHandler gets routes from MapPoint, and WineryFinder gets wineries from the local service.
Since AddressLocator and RouteMapHandler access the MapPoint service, they have to be authenticated.
This is done through two new classes, AuthAddressLocator and AuthRouteMapHandler, which inherit
from AddressLocator and RouteMapHandler respectively; the authentication is taken care of by these
subclasses. All the web service function calls have to be logged by WebServiceLogger.
9.1.1.1 Object-Oriented Design
To model this system as an ACN, we append "_interface" and "_impl" to each function
name to represent the fact that, in an OO design, each function leads to an interface and an
implementation. For example, AddressLocator_interface and AddressLocator_impl are the two
variables representing the AddressLocator function. We call this design WineryLocator OO.
Figure 9.1 lists all the constraints among these variables, which include the following categories:
• The relations among service functions, as shown from Line 1 to Line 5.
• Supporting functions depend on the availability of the relevant services, as shown from Line 6 to
Line 21.
• Implementations depend on interfaces, as shown from Line 22 to Line 30.
• User functions depend on supporting function interfaces and other user function interfaces
they use, as shown from Line 31 to Line 43.
• Object-oriented constraints; for example, AuthAddressLocator inherits from AddressLocator,
as shown in Lines 44 and 45.
1: HttpSessionBindingListener = orig => Servlet = orig;
2: MapPointDesignRules = orig => ApacheAXIS = orig;
3: WineryFindDesignRules = orig => WineryFind = orig;
4: MapPointDesignRules = orig => MapPoint = orig;
5: WineryFindDesignRules = orig => ApacheAXIS = orig;
6: startWineryFind_impl = orig => MapPointDesignRules = orig;
7: tour_impl = orig => MapPointDesignRules = orig;
8: directions_impl = orig => MapPointDesignRules = orig;
9: directions_impl = orig => Servlet = orig;
10: searchWinery_impl = orig => MapPointDesignRules = orig;
11: searchWinery_impl = orig => WineryFindDesignRules = orig;
12: startWineryFind_impl = orig => Servlet = orig;
13: tour_impl = orig => Servlet = orig;
14: searchWinery_impl = orig => Servlet = orig;
15: AuthAddressLocator_impl = orig => ApacheAXIS = orig;
16: AddressLocator_impl = orig => MapPointDesignRules = orig;
17: RouteMapHandler_impl = orig => MapPointDesignRules = orig;
18: WineryFinder_impl = orig => WineryFindDesignRules = orig;
19: AuthRouteMapHandler_impl = orig => ApacheAXIS = orig;
20: AuthRouteMapHandler_impl = orig => HttpSessionBindingListener = orig;
21: AuthAddressLocator_impl = orig => HttpSessionBindingListener = orig;
22: AddressLocator_impl = orig => AddressLocator_interface = orig;
23: RouteMapHandler_impl = orig => RouteMapHandler_interface = orig;
24: AuthRouteMapHandler_impl = orig => AuthRouteMapHandler_interface = orig;
25: AuthAddressLocator_impl = orig => AuthAddressLocator_interface = orig;
26: searchWinery_impl = orig => searchWinery_interface = orig;
27: tour_impl = orig => tour_interface = orig;
28: directions_impl = orig => directions_interface = orig;
29: WineryFinder_impl = orig => WineryFinder_interface = orig;
30: WebServicesLogger_impl = orig => WebServicesLogger_interface = orig;
31: searchWinery_impl = orig => WineryFinder_interface = orig;
32: searchWinery_impl = orig => startWineryFind_interface = orig;
33: searchWinery_impl = orig => tour_interface = orig;
34: startWineryFind_impl = orig => AuthAddressLocator_interface = orig;
35: startWineryFind_impl = orig => searchWinery_interface = orig;
36: startWineryFind_impl = orig => startWineryFind_interface = orig;
37: tour_impl = orig => AuthRouteMapHandler_interface = orig;
38: tour_impl = orig => startWineryFind_interface = orig;
39: tour_impl = orig => directions_interface = orig;
40: directions_impl = orig => startWineryFind_interface = orig;
41: WineryFinder_impl = orig => WebServicesLogger_interface = orig;
42: RouteMapHandler_impl = orig => WebServicesLogger_interface = orig;
43: AddressLocator_impl = orig => WebServicesLogger_interface = orig;
44: AuthAddressLocator_interface = orig => AddressLocator_interface = orig;
45: AuthRouteMapHandler_interface = orig => RouteMapHandler_interface = orig;
Figure 9.1: WineryLocator OO Constraints
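Each constraint in Figure 9.1 is a logical implication over the states of two variables. For a small fragment, the solution space Simon asks Alloy for can even be enumerated by brute force. A minimal sketch, using lines 4 and 16 of the figure and a binary orig/other domain (a simplification of the real domains):

```python
from itertools import product

VARIABLES = ["MapPointDesignRules", "MapPoint", "AddressLocator_impl"]

# Constraints of the form "antecedent = orig => consequent = orig",
# mirroring lines 4 and 16 of Figure 9.1.
CONSTRAINTS = [
    ("MapPointDesignRules", "MapPoint"),
    ("AddressLocator_impl", "MapPointDesignRules"),
]

def satisfies(assignment):
    # Every implication must hold: if the antecedent variable is in
    # its original state, so must the consequent variable be.
    return all(
        assignment[b] == "orig"
        for a, b in CONSTRAINTS
        if assignment[a] == "orig"
    )

# The valid assignments form the solution space from which the design
# automaton and the pairwise dependence relation are later derived.
solutions = [
    dict(zip(VARIABLES, states))
    for states in product(["orig", "other"], repeat=len(VARIABLES))
    if satisfies(dict(zip(VARIABLES, states)))
]
print(len(solutions))  # 4 of the 8 assignments satisfy both implications
```

For the full 27-variable network this enumeration is infeasible, which is why Simon relies on SAT solving and the decomposition discussed below.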
9.1.1.2 OO Design with Design Rules
In order to improve the WineryLocator OO design, the authors introduced five interfaces to decouple
the effects of MapPoint as much as possible. Following their names for these interfaces, we model
them as the following design variables:
• startAddress_Address: the starting location the user provides and selects.
• matches_Address: the data structure storing the set of matched addresses.
• WinerySearchOption: the data structure storing the preferences.
• Tour: tour representation.
• MapOptions: standard map options.
We call this design WineryLocator DR. The constraints are adjusted so that some design
variables now assume these design rules.
9.1.1.3 Aspect-Oriented Design
The authors then presented an AO design in which the logging and authentication functions are
implemented using aspects. Modeling the AO design is similar to modeling the OO design. The aspects
are also design dimensions that can be modeled by design variables: aop_Authentication and
aop_WebServiceLogging.
The constraints in the AO design change slightly: the aspect variables now assume the
implementations of other functions, and those functions no longer need to be aware of the two
crosscutting concerns. In essence, modeling the AO and OO designs is similar. In both ACN models,
logging and authentication are just design variables. The two designs differ in how the variables
are constrained.
For the dominance relation in all three designs, we assume that nothing can affect the
third-party services and interfaces, and that implementations cannot affect the specified design
rules and interfaces. In the AO design, the authentication and logging aspects should not affect the
functions they advise.
Table 9.1: Performance for WineryLocator OO Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        5            4                      < 1
   2        6            6                      < 1
   3        6            5                      < 1
   4        6            8                      < 1
   5        5            4                      < 1
   6       11          110                      < 1
   7        8           20                      < 1
   8        9           28                      < 1
   9        2            2                      < 1
  10        6            7                      < 1
9.1.2 Modular Analysis Results
Without decomposition, it took Simon a whole day to solve these constraint networks, and the DA
generation took a couple of days. In short, we were not able to generate DSM models within a
reasonable amount of time without decomposition.
The WineryLocator OO ACN with 27 variables is decomposed into 10 sub-ACNs. Table 9.1
lists the number of variables (Size), the constraint solving time, and the DA generation time for
each sub-ACN. Since Simon invokes multiple solvers in parallel, the constraint solving bottleneck
depends on the largest sub-ACN to solve. In this case, the 6th sub-ACN with 11 variables takes
about 2 minutes to find all its solutions. After that, DAs and DSMs are generated within a second.
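The effect of solving the sub-ACNs in parallel is easy to picture: with one worker per sub-ACN, wall-clock time tracks the largest sub-ACN rather than the sum over all of them. A sketch in which a sleep is a hypothetical stand-in for SAT solving, with costs loosely keyed to the sub-ACN sizes of Table 9.1:

```python
import concurrent.futures
import time

# Sub-ACN sizes from Table 9.1; sleep time is a hypothetical stand-in
# for constraint solving, scaled down so the run stays fast.
SUB_ACN_SIZES = [5, 6, 6, 6, 5, 11, 8, 9, 2, 6]

def solve_sub_acn(size):
    time.sleep(size * 0.01)  # placeholder for invoking a SAT solver
    return size

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=len(SUB_ACN_SIZES)) as pool:
    sizes = list(pool.map(solve_sub_acn, SUB_ACN_SIZES))
parallel_wall = time.perf_counter() - start
sequential_wall = sum(s * 0.01 for s in SUB_ACN_SIZES)

# With one worker per sub-ACN, wall-clock time is governed by the
# largest sub-ACN (11 here), not by the sum over all sub-ACNs.
print(parallel_wall < sequential_wall)  # True
```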
After applying the decomposition approach introduced in Chapter 5, the WineryLocator DR
ACN with 32 variables is also decomposed into 10 sub-ACNs. Table 9.2 shows the performance.
In this case, the largest sub-ACN took about 1 minute to solve.
The WineryLocator AO ACN with 29 variables is decomposed into 9 sub-ACNs. Table 9.3
shows the performance. In this case, the largest sub-ACN took about 40 seconds to solve.
Comparing Table 9.2 with Table 9.3, we observe that the aspect design has one fewer sub-ACN,
its largest sub-ACN is smaller than that of the design rule design, and its performance is
a little better.
Table 9.2: Performance for WineryLocator DR Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        8           20                      < 1
   2        6            7                      < 1
   3        6            6                      < 1
   4        5            6                      < 1
   5        8           20                      < 1
   6        8           36                        2
   7        8           27                      < 1
   8        9           51                        2
   9        2            3                      < 1
  10        8           21                      < 1
Table 9.3: Performance for WineryLocator AO Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        7           11                      < 1
   2        6            8                      < 1
   3        7            5                      < 1
   4        5            5                      < 1
   5        7           11                      < 1
   6        7           18                      < 1
   7        7           28                      < 1
   8        8           40                        2
   9        7           19                      < 1
[DSM over 29 variables: MapPoint, WineryFind, ApacheAXIS, Servlet, HttpSessionBindingListener, MapPointDesignRules, WineryFindDesignRules, startAddress_Address, matchesAddress, WinerySearchOption, Tour, MapOperation, the _interface/_impl pairs for AddressLocator, WineryFinder, RouteMapHandler, startWineryFind, searchWinery, tour, and directions, plus WebServicesLogger_impl, aspect_Logging, and aspect_Authentication; 'x' marks pairwise dependences.]
Figure 9.2: WineryLocator Aspect-Oriented Design
9.1.3 Comparative Results
In order to make their manual DSMs and our derived DSMs comparable, we cluster our derived DSMs so
that the variables appear in the same order and with the same names as in their manual DSMs. For
example, we cluster tour_sig and tour_impl together as a module named tour, mapping it to the
variable with the same name in their DSMs.
Figure 9.2 shows the AO design DSM generated and clustered by Simon. The cells with black
backgrounds mark dependences that are missing from their manual DSMs. Figure 9.3 shows the DSM for
the DR design generated by Simon. Figure 9.4 shows a clustered DR DSM for comparison with
their manual DSMs. Tracing the differences exposed several interesting issues, demonstrating the
advantages of our formal model and automated tool.
First, our derived DSMs reveal many indirect dependences not shown in their manual ones.
For example, they chose MapPoint as their major library, which influences many other decisions.
[DSM over 33 variables: MapPoint, ApacheAXIS, WineryFind, Servlet, HttpSessionBindingListener, MapPointDesignRules, WineryFindDesignRules, startAddress_Address, matchesAddress, WinerySearchOption, Tour, MapOperation, the _sig/implementation pairs for WebServicesLogger, AddressLocator, AuthAddressLocator, WineryFinder, RouteMapHandler, AuthRouteMapHandler, startWineryFind, searchWinery, tour, and directions, plus web_xml; 'x' marks pairwise dependences.]
Figure 9.3: Derived WineryLocator Design Rule DSMs
However, in their DSMs, only one module depends on it. Although a higher-order matrix might
reveal indirect dependences, such dependences are not accounted for in Baldwin and Clark's
value model, which they use [7]. By contrast, the derived DSMs yield defensible estimates of
the total impact of given local design changes.
Second, the dependence definition in the manual DSM modeling is ambiguous, making the
manual DSMs hard to understand. We take three design dimensions, startWineryFind,
AddressLocator, and AuthAddressLocator, as examples. The first is a function making use of the service
provided by the second to locate addresses. The third inherits from the second and extends it with
authentication functions. While our derived DSMs show that the first depends on the other two, their
manual DSMs indicate only a dependence of startWineryFind on AuthAddressLocator, but not on
AddressLocator, despite the fact that AddressLocator interface changes affect the startWineryFind
function directly.
[Collapsed DSM over 23 modules: MapPoint, WineryFind, ApacheAXIS, Servlet, HttpSessionBindingListener, MapPointDesignRules, WineryFindDesignRules, startAddress_Address, matchesAddress, WinerySearchOption, Tour, MapOperation, WebServicesLogger, AddressLocator, AuthAddressLocator, WineryFinder, RouteMapHandler, AuthRouteMapHandler, startWineryFind, searchWinery, tour, directions, and web_xml; 'x' marks pairwise dependences.]
Figure 9.4: Collapsed WineryLocator Design Rule Design
We understood the reason for this discrepancy after discussing it with the authors to learn exactly
how the system is implemented. The dependence between startWineryFind and AuthAddressLocator
is one of usage: the former is a JSP page using the latter as a JavaBean. Since startWineryFind
does not refer to AddressLocator directly, the authors did not mark them as dependent. The usage
and inheritance relations are different, so using transitive operations to find this missing dependence
does not seem appropriate. By contrast, our framework provides an exact semantics of dependence:
a change in one design decision causes revisitation and revision of the other. Using this definition,
the missing dependences are discovered directly.
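This semantics can be made operational. The sketch below is a hypothetical two-variable illustration of the idea, not Simon's algorithm: y depends on x if some change to x, made from some valid design, cannot be accommodated without also revising y. The `frozen` set is a crude stand-in for the dominance relation: design rules may not be revised in response to other changes.

```python
from itertools import product

DOMAIN = ["orig", "other"]
VARIABLES = ["interface", "impl"]

def valid(d):
    # Hypothetical constraint: impl = orig => interface = orig.
    return d["impl"] != "orig" or d["interface"] == "orig"

SOLUTIONS = [
    dict(zip(VARIABLES, states))
    for states in product(DOMAIN, repeat=len(VARIABLES))
    if valid(dict(zip(VARIABLES, states)))
]

def depends(y, x, frozen=("interface",)):
    """y depends on x if, from some valid design, some change to x is
    repairable only by also revising y. Frozen variables (a stand-in
    for the dominance relation) keep their values unless they are the
    changed variable themselves."""
    for sol in SOLUTIONS:
        for new_x in DOMAIN:
            if new_x == sol[x]:
                continue
            repairs = [
                s for s in SOLUTIONS
                if s[x] == new_x
                and all(s[f] == sol[f] for f in frozen if f != x)
            ]
            if repairs and all(s[y] != sol[y] for s in repairs):
                return True
    return False

print(depends("impl", "interface"))  # True: an interface change forces impl revision
print(depends("interface", "impl"))  # False: impl changes never force the design rule
```

The asymmetry in the output is exactly the asymmetry a PWDR records, and it is what distinguishes this semantics from a purely syntactic or transitive reading of the dependence marks.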
Finally, the authors modeled third-party services, such as MapPoint, as environment parameters.
However, we understand environment conditions as those that are likely to change and drive
software evolution. For example, the user interface could be either web-based or a GUI application
based on Java Swing. They mentioned this as a possible change in the paper, but did not model and
analyze it. These discrepancies imply potential problems in their later quantitative analysis, but we
do not pursue this dimension further.
9.2 HyperCast
HyperCast [48, 47] is an independently developed project. It is a scalable, self-organizing overlay
system developed using Java. Viewing overlay sockets as nodes, HyperCast integrates these nodes
into ad hoc networks, and provides network services, including naming, reliable transport, and
network management. An overlay socket supports peer-to-peer and multicast communication within
these networks. We used HyperCast as a subject in our work applying the design rule concepts of
Baldwin and Clark to meet the need for a new kind of crosscutting interface to decouple aspects
from the code they advise [63].
We developed two methods to modularize the scattered logging code in the original object-
oriented (OO) design. One method is to obliviously add logging aspects, as aspect programs often
do. Another method is to insert interfaces that carry design rules to decouple aspects from the code
they advise. We used DSMs to represent these three designs, and to quantitatively evaluate the
resulting design structures, following the methods of Baldwin and Clark as adapted to software by
Sullivan et al. [64] and Lopes et al. [49].
However, we had to spend a great deal of time producing and correcting DSM models for
this study. In order to ease the DSM-based coupling structure analysis and enable design impact
analysis, we modeled the three designs as ACNs and tried to use Simon to analyze them as a whole.
However, we were not able to get any results within a reasonable amount of time, until we improved
Simon with the decomposition capability.
9.2.1 ACN Modeling
The HyperCast design has the following main dimensions: Socket—the overlay socket API,
Protocol—the available protocols, Monitor—the network management capability, Service—the set
of services, Adapter—a layer virtualizing the underlying networks, Event Handling, and Logging.
Each dimension leads to several design variables. For example, the Adapter dimension is
modeled by a specification adapter_spec, an interface adapter_interface, and an implementation
adapter_impl. Event Handling and Logging are crosscutting concerns. The events include a
protocol event and a service event. Logging has the following aspects: info logging, exception
logging, and non-exception logging. In the aspect-oriented (AO) designs, we add the prefix "ao_"
to these logging variables. The domain of each variable contains its available choices. We assume
that each dimension has some other unelaborated choices.
Figure 9.5 shows the constraint network modeling the OO design. There are three types of
constraints in this system. First, implementations depend on interfaces (Lines 1 to 6). Second,
an implementation must fulfill the corresponding specification or policies (Lines 7 to 17).
Third, the dimensions make use of each other (Lines 18 to 42).
We view specifications as environment variables and interfaces as design rules. We assume that
nothing could affect environment variables, and implementations cannot influence design rules. The
dominance relation is generated accordingly. For example, (service_impl, service_spec) is a
member of the dominance relation.
The three HyperCast designs are modeled using three ACNs. The ACN modeling the original
OO design (OO ACN) has 29 variables; the ACN modeling the oblivious AO design (oblivious
ACN) has 25 variables; and the ACN modeling the DR AO design (DR ACN) has 33 variables.
9.2.2 Modular Analysis Results
Table 9.4 shows the decomposed OO sub-ACNs and their performance. Each sub-ACN concentrates
on one major task: the protocol sub-ACN has 13 variables, the adapter sub-ACN 8, the
service sub-ACN 13, the socket sub-ACN 8, the event sub-ACN 5, and the monitor sub-ACN 11.
Each sub-ACN also has several logging-related variables and constraints.
The oblivious ACN is decomposed into 5 sub-ACNs, as shown in Table 9.5. The dimensions
related to one kind of logging are aggregated together. For example, the sub-ACN with 9 variables
includes the service and protocol dimensions where information logging is requested, as well as the
1: protocol_impl = orig => protocol_interface = orig;
2: service_impl = orig => service_interface = orig;
3: socket_impl = orig => socket_interface = orig;
4: monitor_impl = orig => monitor_interface = orig;
5: adapter_impl = orig => adapter_interface = orig;
6: event_impl = orig => event_interface = orig;
7: protocol_impl = orig => protocol_spec = orig;
8: service_impl = orig => service_spec = orig;
9: socket_impl = orig => socket_spec = orig;
10: monitor_impl = orig => monitor_spec = orig;
11: adapter_impl = orig => adapter_spec = orig;
12: event_impl = orig => event_spec = orig;
13: exception_logging = orig => exception_logging_policy = orig;
14: non_exception_logging = orig => non_exception_logging_policy = orig;
15: protocol_event = orig => protocol_event_policy = orig;
16: service_event = orig => service_event_policy = orig;
17: info_logging = orig => info_logging_policy = orig;
18: protocol_impl = orig => socket_interface = orig;
19: protocol_impl = orig => event_interface = orig;
20: protocol_impl = orig => protocol_event = orig;
21: protocol_impl = orig => info_logging = orig;
22: service_impl = orig => socket_interface = orig;
23: service_impl = orig => event_interface = orig;
24: service_impl = orig => service_event = orig;
25: service_impl = orig => info_logging = orig;
26: monitor_impl = orig => protocol_interface = orig;
27: monitor_impl = orig => service_interface = orig;
28: monitor_impl = orig => adapter_interface = orig;
29: monitor_impl = orig => socket_interface = orig;
30: adapter_impl = orig => socket_interface = orig;
31: event_impl = orig => protocol_interface = orig;
32: event_impl = orig => service_interface = orig;
33: protocol_impl = orig => exception_logging = orig;
34: protocol_impl = orig => non_exception_logging = orig;
35: service_impl = orig => exception_logging = orig;
36: service_impl = orig => non_exception_logging = orig;
37: socket_impl = orig => exception_logging = orig;
38: socket_impl = orig => non_exception_logging = orig;
39: monitor_impl = orig => exception_logging = orig;
40: monitor_impl = orig => non_exception_logging = orig;
41: adapter_impl = orig => exception_logging = orig;
42: adapter_impl = orig => non_exception_logging = orig;
Figure 9.5: HyperCast OO Constraints
Table 9.4: Performance for HyperCast OO Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        8           21                      < 1
   2        5            6                      < 1
   3       11          152                        3
   4       13          340                        4
   5       13          339                        4
   6        7           10                      < 1
Table 9.5: Performance for HyperCast Obliviousness Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1       17         1106                       40
   2        9           22                      < 1
   3       17         1107                       58
   4        6            6                        3
   5        6            6                      < 1
information logging function itself. The sub-ACNs with 17 variables are the slowest, indicating a
suboptimal aggregation. Taking a closer look at these large sub-ACNs, we found that many
variables are replicated in both, due to the "oblivious" exception logging aspects: these aspects
actually depend on the implementations of many other functions, so our decomposition
algorithm aggregates them with all the functions they advise.
The DR ACN is decomposed into 10 sub-ACNs, with 7, 3, 7, 4, 6, 3, 15, 11, 8, and 6 variables
respectively. Each sub-ACN addresses one of ten dimensions: adapter, protocol, service,
monitor, socket, info logging, exception logging, non-exception logging, protocol event, and
service event. We observe that although the DR ACN has more variables, the key design rules we
added let us decompose the system at a finer granularity, and its DSM is derived the fastest.
In this design, the aspects depend only on the design rules, not on the implementations of other
functions. As a result, the large sub-ACNs of the oblivious AO design disappear.
Table 9.6: Performance for HyperCast DR Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        7            8                        3
   2        3            2                      < 1
   3        7           18                      < 1
   4        4            3                      < 1
   5        6            9                        2
   6        3            2                      < 1
   7       15          332                        3
   8       11           78                        2
   9        8           13                        1
  10        6            6                      < 1
9.2.3 Comparative Results
Figures 9.6 and 9.8 show the DSMs Simon generated, by decomposition and combination, for the original OO design and the oblivious AO design. We do not present the design-rule DSM here because it is exactly the same as the DSM we produced manually in [63]. We now address how the derived OO and oblivious AO DSMs differ from their manual versions.
Compare Figure 9.6 with Figure 9.7, the manual OO DSM from our previous work [63]. The difference is that we previously replicated variables with crosscutting effects, such as info_logging, across several rows, as though they were different variables. This is error-prone: if the dependences of such a variable change, the user has to remember everywhere the variable is scattered. The derived DSM instead shows the crosscutting effects as off-block dependences.
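The off-block view can be computed mechanically. The following sketch (hypothetical names and a simplified stand-in, not Simon's implementation) flags any dependence whose endpoints fall in different DSM blocks as a crosscutting mark:

```python
def off_block(depends, block_of):
    """Return dependences whose endpoints lie in different DSM blocks,
    i.e., crosscutting marks that fall off the diagonal blocks."""
    return sorted((a, b) for a, b in depends
                  if block_of[a] != block_of[b])

# Toy model of Figure 9.6's pattern: implementations depend on the shared
# info_logging variable, which sits in its own logging block.
deps = [("protocol_impl", "info_logging"),
        ("service_impl", "info_logging"),
        ("protocol_impl", "protocol_spec")]
blocks = {"protocol_impl": "protocol", "protocol_spec": "protocol",
          "service_impl": "service", "info_logging": "logging"}
print(off_block(deps, blocks))
# The two dependences on info_logging cross block boundaries.
```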
Comparing the manual and derived DSMs for the oblivious design shows that in the manual version, the logging aspects do not depend on the functional specifications. However, as the derived DSM shows, these dependences should exist: implementation changes are very likely caused by changes in the specification, and implementation changes influence the aspects.
[Figure: 28x28 derived DSM. Variables 1-17 are the specifications, policies, and OO interfaces (protocol_spec, service_spec, socket_spec, monitor_spec, adapter_spec, event_spec, the event and logging policies, and the six interfaces); variables 18-28 are the implementations (protocol_impl, service_impl, socket_impl, monitor_impl, adapter_impl, event_impl, protocol_event, service_event, info_logging, exception_logging, non_exception_logging). Labeled regions: OO Interfaces; Functional Implementations; Main Functions; Logging Functions.]
Figure 9.6: HyperCast OO Derived DSM
[Figure: 37x37 manually constructed DSM from [63], in which the crosscutting logging implementations (info_logging_impl, exception_logging_impl, non_exception_logging_impl) are replicated as separate rows under each main function's implementation.]
Figure 9.7: HyperCast OO Manually-Constructed DSM [63]
[Figure: 25x25 derived DSM for the oblivious AO design. Variables 1-15 are the specifications, policies, and OO interfaces; variables 16-20 are the OO implementations; variables 21-25 are the aspects (ao_protocol_event, ao_service_event, ao_info_logging, ao_exception_logging, ao_non_exception_logging). The two logging aspects depend on nearly all other variables. Labeled regions: OO Interfaces; OO Implementations; Aspects; Main Functions; Logging Functions.]
Figure 9.8: HyperCast AO DSM
9.3 Galileo
We developed a design model of the Galileo dynamic fault tree analysis tool [66, 65, 24]. Galileo
has about 35,000 lines of C++ code, excluding library and generated code. We designed and implemented this system, but its design was independent of the work presented in this dissertation. It has
evolved continually over eight years, with hundreds of files and many student developers. During
maintenance and feature enhancement, we found several problems. First, institutional memory of important but implicit crosscutting decisions was lost over time: during feature enhancement, the developers of new features were not aware of these constraints, and invalid designs were proposed. Second, it was hard to justify competing design refactoring proposals.
We modeled these historical situations as CACNs using Simon, and found that our analysis
would allow the designer to see constraints that have to be respected when making a change, and
that Simon provides quantitative analysis results that are consistent with our earlier refactoring
decisions. This section presents two models and shows the analysis enabled by Simon.
1: set spec_elements(orig, other): (v1{andGate, orGate, pandGate}, other);
2: scalar dr_core: (no_mfc, other);
3: scalar dr_visio: (constant_time, other);
4: set module_core(orig, other): %spec_elements;
5: set module_word(orig, other): %spec_elements;
6: set module_visio(orig, other): %spec_elements;
7: %spec:spec_elements, core:module_core, word:module_word, visio:module_visio% |
       core = orig => spec = orig,
       word = orig => spec = orig && core = orig,
       visio = orig => spec = orig && core = orig;
8: ~module_core = orig => dr_core = no_mfc;
9: ~module_visio = orig => dr_visio = constant_time;
Figure 9.9: Galileo Design Rules CACN
9.3.1 Model Design Rules and Features
Figure 9.9 shows a CACN modeling a number of features and design rules of Galileo. A fault tree is
composed of a number of elements, such as Gates and Events. These elements and their properties
were specified in requirements and specification documents. We model such specifications as an SDV, spec_elements, shown in Line 1. We model only three types of Gates, for purposes of illustration.
Each element required by the user brings into being a corresponding module containing the core data structures representing it and a set of operations working on it. The core modules, modeled by an SDV module_core, should be independent of the other modules, since they are supposed to be portable. Each fault tree element should have a textual Word representation and a graphic Visio representation. Similarly, we use the SDVs module_word and module_visio to model the view modules containing both the representations and the operations on the respective views. Lines 4 through 6 model these dimensions and their correspondence relations with spec_elements.
During the maintenance stage, we recovered some design decisions that the chief architect made
at the beginning of the project. The new developers were not aware of these decisions and tended to
violate them in implementing new features. For example, the core fault tree data structure shouldn’t
assume the presence of the Microsoft Foundation Classes (MFC), yet the implementation uses CString and MFC message boxes for error prompts.
Another constraint that was violated was that the visual operations of a fault tree should not
require a linear-time or costlier traversal. This constraint is needed, given Visio’s performance
degradation when traversing large drawings, to ensure that users receive interactive response times
even when editing large fault trees. The design rule dictates that no function may incur more than a
constant-time query of a Visio depiction.
Failure to respect these constraints incurred costs during maintenance. When we tried to separate the core data structures to implement fault tree analysis as a web service on a Unix-based platform, we had to spend considerable time rewriting the MFC-dependent parts. To implement an enhanced error reporting capability, two new maintainers planned to implement a function to analyze the entire graphical depiction of a fault tree presented by Microsoft Visio. Such a function would have violated the constant-time design rule.
We realized that these design rules are important, but they were represented only in the mind of the chief architect. The developers who had once understood them had all left (graduated). Without a modeling and analysis approach in which such constraints can be revealed automatically, there is a significantly higher risk that such constraints will be forgotten over time and violated again.
In a framework such as ours, the constraint could be represented as a logical expression quantified
over an explicitly modeled set of the algorithms that manipulate the given representation. We
use module_visio here for simplicity. These design rules are modeled as two design variables,
dr_core and dr_visio, in Lines 2 and 3 of Figure 9.9. Their prevailing effects are modeled as the universally quantified constraints shown in Lines 8 and 9, which dictate that every core module must respect the core design rule and every visual module must respect the constant-time rule.
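Read operationally, the quantified form is shorthand for one implication per member of the set design variable. Here is a sketch of one plausible reading of that expansion (the generated constraint strings follow the member_dimension naming seen in the generated ACN, e.g. ccgGate_module_core; the expansion function itself is hypothetical, not Simon's code):

```python
def expand_universal(set_var, members, consequent):
    """Expand '~set_var = orig => consequent' into one implication
    per member of the set design variable."""
    return [f"{m}_{set_var} = orig => {consequent}" for m in members]

gates = ["andGate", "orGate", "pandGate"]
for c in expand_universal("module_core", gates, "dr_core = no_mfc"):
    print(c)
# andGate_module_core = orig => dr_core = no_mfc
# orGate_module_core = orig => dr_core = no_mfc
# pandGate_module_core = orig => dr_core = no_mfc
```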
9.3.2 When a New Feature Is Added
During the feature enhancement stage, we proposed several approaches to implement a common
cause gate (CCG) feature that involves, among other things, new operations on the visual depiction
(A) Galileo with Design Rules (B) Galileo with the New CCG Feature
Figure 9.10: Galileo: Design Rules and New Features
of a fault tree. The chief architect brought up the constant-time constraint during our discussion.
As a result, one of the mechanisms proposed by the feature designers, who were not aware of this
decision before, was found to be unusable despite its other merits, because it had to traverse the
entire visual representation.
Now we use Simon to analyze the impact of adding the CCG feature. We first make
spec_elements=v1{andGate, orGate, pandGate} and let Simon generate an ACN and derive
its DSM, as shown in Figure 9.10 (A).
After that, we change the original CACN by adding a new value,
v2{andGate, orGate, pandGate, ccgGate},
to the domain of the variable spec_elements, modeling the fact that a new gate is required in
version 2. Then we make spec_elements=v2, and let Simon generate the new ACN and its DSM,
as shown in Figure 9.10 (B).
Comparing the two ACNs and DSMs, we observe that adding the CCG feature requires
adding a CCG core module, a CCG word module, and a CCG visio module. The DSM shows
that the CCG gate core design, ccgGate_module_core, should respect the CCG specification,
ccgGate_spec_elements, as well as the design rule dr_core. Simon provides a view so that the
user can see these dependences clearly. Similarly, it shows that the CCG Visio designer should
respect the Visio design rules. Design representations and analysis of the kind we propose here,
as supported by Simon, both record and highlight the constraints that have to be respected when
1: set views(orig, other): (v1{word, visio}, other);
2: set errors(orig, other): (v1{syntax, semantics}, other);
3: scalar MarkSequence: (orig, other);
4: subspace ErrorHandling: (option1, option2, option3, option4);
5: ErrorHandling_option1 [~views=orig => ~errors=orig && MarkSequence=orig;];
6: ErrorHandling_option2 [~errors=orig => ~views=orig && MarkSequence=orig;];
7: ErrorHandling_option3 [
8:     set markers(orig, other): %errors * %views;
9:     ~markers=orig => MarkSequence=orig;
10:    ~errors=orig => ~markers=orig && ~views=orig;
11: ];
12: ErrorHandling_option4 [
13:    scalar ErrorToMark: (orig, other);
14:    set markers(orig, other): %errors * %views;
15:    ~markers=orig => MarkSequence=orig;
16:    ErrorToMark=orig => ~errors=orig && ~markers=orig && ~views=orig;
17: ]
Figure 9.11: Galileo Error Handling Refactoring
making such a change.
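The variable-set expansion triggered by switching from v1 to v2 can be mimicked with a small sketch (a hypothetical generator; Simon's actual generation works over the full CACN and its constraints). Diffing the variable sets for the two values exposes exactly what the new feature adds:

```python
def generate_vars(members, dimensions):
    """One design variable per (member, dimension), following the
    member_dimension naming seen in the generated ACN."""
    return {f"{m}_{d}" for m in members for d in dimensions}

dims = ["spec_elements", "module_core", "module_word", "module_visio"]
v1 = generate_vars(["andGate", "orGate", "pandGate"], dims)
v2 = generate_vars(["andGate", "orGate", "pandGate", "ccgGate"], dims)
print(sorted(v2 - v1))
# ['ccgGate_module_core', 'ccgGate_module_visio',
#  'ccgGate_module_word', 'ccgGate_spec_elements']
```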
9.3.3 Error Handling Options
It was suggested that error handling be refactored to be more consistent and descriptive. Two dimensions were involved in error handling: support for multiple error types (e.g., syntax errors and semantics errors) and support for multiple views (Word97, Visio5). Depending on the type of error, the error handling module should mark the views where the error occurs, jump to the error point, give messages, and clear the marks once the error is corrected. We call these actions a marking sequence, modeled by MarkSequence.
Four refactoring mechanisms were proposed in relation to the addition of sophisticated error
handling to Galileo. We faced the problem of choosing the best one. We modeled this decision
with an HDV called ErrorHandling. Each option is modeled as a subspace value, as depicted in
Figure 9.11.
The first option requires that each error object knows in which view an error happens, and
implements the marking sequence. The second option is symmetric to the first one, requiring that
each view knows what type of error happened, and that it then implements the marking sequence.
Prototypes were built for these options, and the designers realized that the marking sequence was complex and followed the same pattern in each case (crosscutting), which made the code hard to understand. As a result, these options were abandoned despite their straightforwardness. We then designed a marker class, modeled by markers, to take responsibility for implementing a marking sequence for each combination of error and view types.
At this point, the third and fourth designs, both with a marker class, were proposed, and we attempted to determine which was better. The major difference between them is which component decides which error happened in which view and invokes the corresponding marker object. The third option requires each error object to take this responsibility; the fourth demands a new class, ErrorToMark, to do the job, as shown in Line 16 of Figure 9.11.
9.3.4 Select the Best Refactoring Mechanism
To make a rational decision, we first compare the different coupling structures these options would
incur, and then envision a possible change to see the different consequences.
9.3.4.1 Inspect Different Coupling Structures
Figure 9.12 shows the DSMs Simon generates for each of the four error handling decisions. By
comparison, we can tell that options 1 and 2 are simpler in the sense that they involve fewer design
dimensions. However, their DSMs show many crosscutting dependences. Option 3 expands the
design space into more dimensions, while retaining many crosscutting dependences. Option 4
appears to be the best in terms of its coupling structure: although it has one more dimension than
option 3, it has the fewest dependences.
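The structural comparison reduces to counting marks in each DSM. A toy sketch with invented dependence sets (the real counts come from the generated DSMs of Figure 9.12; the names here are only illustrative):

```python
def score(option_deps):
    """Fewer pairwise dependences indicate looser coupling."""
    return len(option_deps)

# Invented, drastically simplified dependence sets for two of the options.
options = {
    "option1": {("word_view", "syntax_error"), ("word_view", "semantics_error"),
                ("visio_view", "syntax_error"), ("visio_view", "semantics_error")},
    "option4": {("ErrorToMark", "markers"), ("ErrorToMark", "views"),
                ("ErrorToMark", "errors")},
}
best = min(options, key=lambda o: score(options[o]))
print(best, {o: score(d) for o, d in options.items()})
```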
9.3.4.2 Change Impact
Now we model another possible feature change to analyze change impact. One envisioned change is to add new views to the system, for example, an Excel view and an XML view, modeled by adding a new value, v2{word, visio, excel, xml}, to the domain of the variable views, and specifying its value as v2.
(A) Error Handling Option 1 (B) Error Handling Option 2
(C) Error Handling Option 3 (D) Error Handling Option 4
Figure 9.12: Design Structures using Different Error Handling Options
(A) Error Handling Option 3 with New Views (B) Error Handling Option 4 with New Views
Figure 9.13: Add New Views based on Different Error Handling Options
After the ACNs and DSMs with new features are generated for each error handling option, we
compare them with the ones without the new features. We observe that although the design spaces expand similarly in all cases, the design using option 4 shows the smallest increase in dependences. Reflected in the implementation, the major difference is that when new views are added, options 1, 2, and 3 require changes in multiple places. For example, according to Figure 9.13, in the design using option 3, syntax_errors and semantics_errors would both have to change. The option 4 design requires only one additional change, to ErrorToMark. This analysis quantitatively validates the choice we actually made: the fourth option.
9.4 Chapter Summary
In summary, this chapter has evaluated the hypothesis that our modeling and analysis approach generalizes beyond a set of small models: it is applicable to systems, and to modeling and analysis experiments, that are (a) beyond those used in developing the approach, (b) beyond those developed by the authors of the approach, and (c) drawn from real systems. We have modeled three real designs that others had analyzed, and automated those analyses. Our experiments support the claim that the approach generalizes along these dimensions: our framework is expressive enough to capture a variety of design phenomena uniformly and to analyze these problems automatically, confirming previous results or revealing errors in them precisely and quantitatively. These experiments and results constitute an important first step toward justifying a next, more expensive step: studies of the utility of the approach and supporting tools in a real design setting.
Chapter 10
Evaluation of this Research
In this chapter, we first summarize how well the thesis of this dissertation is supported by the evidence and analyses presented. We then evaluate the novelty and potential of the proposed approach, as well as its shortcomings and remaining problems. Finally, we evaluate this work in terms of its potential to lead to significant results in the future.
10.1 Thesis and Evidence
The ultimate goal of this research is to enable software designers to make value-oriented design
decisions in a rational way, facilitated by automatic tools. The purpose of this dissertation is to
provide a formal analyzable design modeling framework, as one fundamental step towards this
goal. This dissertation claims and evaluates the following thesis:
• This framework provides a formal account of the key concepts of important but informal
modularity theories. (1) It formalizes Baldwin and Clark’s key notions of design dimension,
design decision, design decision dependence, and design space. (2) It formally accounts for
Parnas’s concept of information hiding modularity as a mechanically checkable predicate.
• This framework enables the derivation of design coupling structures in the form of pair-wise
relations on design decisions, and thus also the derivation of DSMs from ACNs. The benefit
is that the approach enables designers to reason about modularity in design architecture using
both the methods of Baldwin and Clark (but in terms of an abstract and formally precise
representation), as well as new kinds of analysis.
• This framework automates basic evolvability analyses such as design impact analysis. Given a sequence of changing decisions or conditions, the framework computes how many ways there are to accommodate these changes, and how many decisions must be reconsidered in each way.
• Our model of modularity in design is general. In particular, it can account for both traditional
object-oriented notions of modularity and newer aspect-oriented notions within a unified,
declarative framework.
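As an illustration of this style of impact analysis, the following brute-force sketch (a toy stand-in over a tiny constraint network, not the DA construction itself) pins one changed decision, enumerates the assignments that restore consistency, and reports which other decisions each solution revisits:

```python
from itertools import product

def impact(domains, constraints, changed, new_value, baseline):
    """Enumerate consistent assignments with `changed` pinned to its new
    value; report, per solution, which other decisions must be revisited."""
    names = list(domains)
    ways = []
    for combo in product(*domains.values()):
        asg = dict(zip(names, combo))
        if asg[changed] != new_value:
            continue
        if all(c(asg) for c in constraints):
            revisited = sorted(v for v in names
                               if v != changed and asg[v] != baseline[v])
            ways.append(revisited)
    return ways

# Toy network: an implementation and a caller must both match the interface.
domains = {"interface": ["v1", "v2"], "impl": ["v1", "v2"], "caller": ["v1", "v2"]}
constraints = [lambda a: a["impl"] == a["interface"],
               lambda a: a["caller"] == a["interface"]]
baseline = {"interface": "v1", "impl": "v1", "caller": "v1"}
print(impact(domains, constraints, "interface", "v2", baseline))
# One way to restore consistency; it revisits both impl and caller.
```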
Chapter 7 has evaluated the first element of our thesis. In that chapter, we formally accounted for the important concepts of Baldwin and Clark's theory within the setting of our core models, formally defined the semantics of information hiding modularity, and formally defined the design impact analysis problem and its solution.
Our evaluation strategy for the analysis part of the thesis includes three parts: (1) we formally
model software designs for which people have analyzed problems that have strong economic im-
plications, (2) automate these analyses using Simon, and (3) compare the results with the previous
qualitative analysis results.
We first evaluated the thesis against two canonical designs. Chapter 4 has presented the mod-
eling and analysis of the famous software engineering benchmark, Key Word in Context (KWIC).
Chapter 6 has presented the modeling and analysis of the widely used Figure Editor (FE) exam-
ple [43, 37, 33]. These designs are widely used in a large number of publications, representing dra-
matically different design paradigms: KWIC represents functional and object-oriented designs; the
Figure Editor design manifests broader contemporary design phenomena, such as design patterns
and aspect-oriented programming. Our experiments have shown that our framework is expressive
enough to model these design phenomena uniformly. We used Simon to automatically analyze the
problems people previously analyzed qualitatively or manually. Our analysis results either confirm
previous results or reveal errors in them precisely and quantitatively.
We also evaluated the generalizability of our framework in terms of three real designs in
Chapter 9: the modeling and analysis of a web application developed and studied by Lopes et
al. [49] (WineryLocator); the modeling and analysis of a peer-to-peer networking system, Hyper-
Cast [48, 47], developed by network researchers at the University of Virginia and studied by
Sullivan et al. [63]; and the modeling and analysis of the Galileo dynamic fault tree analysis tool,
developed at the University of Virginia for production use at NASA [66, 65, 24]. In the first two
designs, the authors use Baldwin and Clark’s modeling and analysis technique to quantitatively
compare different designs based on manually constructed DSM models. We represent these designs as ACNs according to their design descriptions, generate DSM models, and compare them with the manual models. The comparisons reveal ambiguities and problems in the manual models that the authors used to compute NOV values, implying potential problems in their quantitative results. The Galileo designers once faced a situation in which they had to make a decision
about how to restructure part of the system. They reached a decision based on discussions and argu-
ments, rather than rigorous analysis. Modeling and analyzing this historical scenario using Simon
suggests that the designers might have been able to compare different decisions comprehensively
and to justify their decision rationally, had they had the benefits of a tool such as Simon.
In summary, we have achieved the goals set forth, and the evidence and analyses presented support our thesis.
10.2 Novelty and Potential
Our contributions appear novel in several dimensions. First, the formal account of Parnas’s influen-
tial but informal information hiding principle enables the rigorous and automatic application of his
analysis. Second, by formalizing key notions of Baldwin and Clark’s design rule theory, we put a
new emphasis on the design of design spaces and their underlying coupling structures, as opposed
to the design of individual points in design space, the focus of most current design methods. Third,
our formalization of dependence in constraint networks appears to be novel. Fourth, the DA model
captures the complex ways in which changes can be accommodated in real systems. Finally, the
provision of a formal basis for dependence markings in DSMs, in principle, imports design analysis
techniques [25, 62] developed around DSMs into software design.
Our work has potential in several areas. First, it has potential to support a formal abstract
theory of modularity in design, and, eventually, to contribute to a value-based theory of architectural
design. Second, it has the potential to connect software design with existing economic models and
analysis, such as Baldwin and Clark’s modular operators, to provide a scientific basis of value-
oriented decision-making. Third, the work has the potential to help designers estimate the cost and
benefits of high consequence decisions in practice, such as the decision to refactor or to add a new
feature.
10.3 Limitations and Remaining Problems
However, this work is still at an early stage: many issues remain open, and the approach has limitations and shortcomings.
First, the sizes of design spaces are, in general, exponential in the number of variables. Although our decomposition approach has alleviated the problems we encountered in our case studies, the scale of the designs we have studied is still relatively small. We have not yet had a chance to evaluate the whole approach on large-scale, complex designs.
Second, we have not yet had a chance to collaborate with practitioners to empirically assess both the scalability issue and how difficult our framework is to use.
Third, the language Simon currently uses is not yet a mature logical language. We seek to evaluate and develop the language and the tool as we investigate the modeling and analysis of real problems in practice.
Finally, Simon uses Alloy as its underlying constraint solver, which incurs the unnecessary overhead of translating our models into Alloy specifications. In addition, Alloy is not designed for the purpose for which we use it, which accounts for part of the performance problem. Employing a mature SAT solver directly is among our plans.
10.4 Challenges and Open Questions
We identify the following challenges and open questions regarding the ultimate application of this framework in practice.
First, modeling with ACNs or CACNs requires abstraction, and deciding what to model is not always easy. For example, HyperCast includes hundreds of files and tens of thousands of lines of code, which we modeled using about 30 variables. We found such modeling difficult at first, but easier as we gained experience.
Second, we are currently using finite-domain constraint networks as the basis of our modeling and analysis techniques. Some cases may require more complex constraint models, such as linear or quadratic equations. We may extend our model in the future to address such requirements.
Finally, our framework is based on the perspective that design is a decision-making process, and both ACNs and CACNs model decisions and the relations among them. This decision-based model is dramatically different from traditional program-based design models, such as UML. This discrepancy might present difficulty in the application of our framework.
10.5 Future Work
We find this framework general enough to connect with various stages of software development, and our future work will explore, develop, and extend it for description, prediction, and prescription.
10.5.1 Between Design and Value
We are still in need of models that can scientifically account for the economic value of important
design structures and activities, and provide the basis for economic-oriented decision making. We
are currently collaborating with Carliss Baldwin from the Harvard Business School to explore the
relation between design refactoring and the value variation caused by this activity.
In addition, extending design impact analysis to support cost modeling would allow one to find
the least expensive way to accommodate a given sequence of changes in a design. As introduced in
Chapter 1, this work is motivated by a question from an industry practitioner: “Given the necessity
to keep our feature delivery velocity, is it worthwhile investing in refactoring, as my engineers
suggested?” Our framework proposes a solution to such a problem, which has the following key
elements: (1) developing CACN models at a suitably high level of abstraction; (2) formulating
an expected evolutionary scenario as a sequence of changes, or perhaps as a stochastic process
generating change requests; (3) measuring the cost of change in both cases; (4) accounting for the
switching cost to get from the current to the proposed new design. We plan to further evaluate and
develop this idea.
10.5.2 Between Design and Code
Under the pressure of project deadlines, projects often sacrifice design architecture and plunge into implementation prematurely. The problem is not that the design stage is ignored. It is that
current design modeling and analysis techniques do not support fast and automated design evolution
modeling and analysis. Software evolution should start with comprehensive consideration of the
costs and benefits based on current and proposed design representations. These analyses should
enable designers to select an optimal way to evolve the project, quickly and automatically. On the
other hand, legacy code presents difficulties in many companies. Recovering designs from source
code is important to preserve previous investments. Part of our future work is to explore approaches to extracting logical design models from source code, treating the extracted model as a subset of the full design, and combining it with high-level design models to form a full picture.
We see potential utility for this framework in many dimensions, and we look forward to continuing the exploration.
Chapter 11
Conclusion
To address the problem that current design representations are not sufficient to enable designers
to reason about design structures and their economic properties, this dissertation contributes an
analyzable design modeling framework that supports formal design modeling, formalizes important
design concepts and approaches, and automates a number of economic-related analyses.
This framework consists of: a design description model, the augmented constraint network (ACN), which models design decisions and external conditions in a general way; an intermediate operational design space model derived from an ACN, the design automaton (DA), which connects a conceptual design with its economic-related properties; and a pair-wise dependence relation (PWDR), derived from a DA, which supports design coupling structure analysis.
This framework provides a formal account of the key concepts of important but informal mod-
ularity theories: (1) it formalizes Baldwin and Clark’s key notions of design dimension, design
decision, design decision dependence, design space, and design rule. (2) It formally accounts for
Parnas’s concept of information hiding modularity, formalizing this principle as a predicate me-
chanically checkable by tools.
The supporting tool, Simon, enables the following analyses for conceptual designs: (1) Par-
nas’s changeability analysis can be done automatically and quantitatively; (2) design structure ma-
trices (DSMs) can be derived automatically from conceptual software designs; and (3) Baldwin and
Clark’s net option value analysis based on DSM modeling can be calculated automatically.
Scalability is a common issue for formal models that depend on constraint solving, including ours. We created a method that decomposes a large ACN into a number of smaller ones to mitigate the scalability problems a large ACN encounters, and we have observed dramatic performance improvements. To model and analyze complex design decisions with structural impacts, we extended the ACN model into a complex augmented constraint network (CACN) that supports structural design impact analysis.
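One decomposition idea can be sketched minimally, as a simplification of the dissertation's method: when the constraint graph of an ACN falls into disconnected components (variables linked only through shared constraints), each component can be solved as a separate, smaller network, and the cost of brute-force enumeration drops from the product of all domain sizes to the sum over components. The union-find partitioning and the example names below are illustrative assumptions.

```python
def decompose(variables, constraint_scopes):
    """Partition a constraint network into independent sub-networks:
    two variables belong to the same sub-network when a chain of shared
    constraints links them (connected components via union-find)."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for scope in constraint_scopes:
        first = scope[0]
        for v in scope[1:]:
            parent[find(v)] = find(first)

    groups = {}
    for v in variables:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

# Two constraints over disjoint variable sets yield two sub-networks,
# each solvable independently of the other.
parts = decompose(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
# [["a", "b"], ["c", "d"]]
```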
This framework has been evaluated against both canonical software design benchmarks and real software designs. The evaluation shows that (1) the framework is expressive enough to capture a variety of design decision-making phenomena, including object-oriented design, aspect-oriented design, and design patterns; and (2) it can automate the analysis of a number of economics-oriented problems that were previously analyzed manually or qualitatively. The automated results either confirm the earlier findings or reveal errors in them, demonstrating the power of formal models and automated analyses.
In summary, we have contributed a general framework that formally accounts for the key concepts of important but informal modularity theories, enables the automation of basic evolvability analyses such as design impact analysis, and enables the derivation of design coupling structures in the form of pair-wise relations on design decisions, and thus the derivation of DSMs from ACNs. We also contribute a prototype tool, Simon, that supports these modeling and analysis techniques. Simon validates the concepts developed in the framework, showing that it is possible to use this tool-supported framework to model software designs and to analyze evolvability and economics-related properties with reasonable performance.
The ultimate goal of this research is to enable software designers to make value-oriented design decisions in a rational way, aided by automatic tools. Reaching that goal involves future work in several directions, including further exploration of applicable economic models, further development of the tool, and connection of this framework to other development stages, such as specification and source code analysis.
Bibliography
[1] Gregory D. Abowd, Robert Allen, and David Garlan. Formalizing style to understand descrip-
tions of software architecture. ACM Transactions on Software Engineering and Methodology,
4(4):319–64, October 1995.
[2] Christopher W. Alexander. Notes on the Synthesis of Form. Harvard University Press, 1970.
[3] Martha Amram and Nalin Kulatilaka. Real Options: Managing Strategic Investment in an
Uncertain World. Oxford University Press, USA, Dec 1998.
[4] Robert Arnold and Shawn Bohner. Software Change Impact Analysis. Wiley-IEEE Computer Society Press, first edition, 1996.
[5] W.R. Ashby. Design for a Brain. John Wiley and Sons, 1952.
[6] Sara Baase and Allen Van Gelder. Computer Algorithms: Introduction to Design and Analysis
(3rd Edition). Addison Wesley, 3rd edition, Nov 1999.
[7] Carliss Y. Baldwin and Kim B. Clark. Design Rules, Vol. 1: The Power of Modularity. The
MIT Press, 2000.
[8] Don Batory and Bart J. Geraci. Composition validation and subjectivity in GenVoca generators. IEEE Transactions on Software Engineering, 23(2):67–82, February 1997.
[9] Don Batory and Sean O’Malley. The design and implementation of hierarchical software
systems with reusable components. ACM Transactions on Software Engineering and Method-
ology, 1(4):355–398, 1992.
[10] Don Batory, Jacob Neal Sarvela, and Axel Rauschmayer. Scaling step-wise refinement. In ICSE '03: Proceedings of the 25th International Conference on Software Engineering, pages 187–197, Washington, DC, USA, 2003. IEEE Computer Society.
[11] Don Batory, Vivek Singhal, Jeff Thomas, Sankar Dasari, Bart Geraci, and Marty Sirkin. The GenVoca model of software-system generators. IEEE Software, 11(5):89–94, September 1994.
[12] L. A. Belady and C. J. Evangelisti. System partitioning and its measure. Journal of Systems and Software, 1981.
[13] L. A. Belady and M. M. Lehman. A model of large program development. IBM Systems
Journal, 15(3):225–252, March 1976.
[14] Barry W. Boehm and Kevin J. Sullivan. Software economics: a roadmap. In Proceedings of
the conference on The future of Software engineering, pages 319–343. ACM Press, 2000.
[15] Grady Booch, James Rumbaugh, and Ivar Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, Reading, Massachusetts, 1999.
[16] Fred Brooks. No silver bullet: Essence and accidents of software engineering. IEEE Com-
puter, 20(4):10–19, April 1987.
[17] Fred Brooks. Is there a design of design? In Science of Design: Software-Intensive Systems,
Workshop Program, National Science Foundation, Computer and Information Science and
Engineering Directorate, Charlottesville, Virginia, November 2003.
[18] Yuanfang Cai and Kevin Sullivan. Simon: A tool for logical design space modeling and
analysis. In 20th IEEE/ACM International Conference on Automated Software Engineering,
Long Beach, California, USA, Nov 2005.
[19] Yuanfang Cai and Kevin Sullivan. Modularity analysis of logical design models. In 21st IEEE/ACM International Conference on Automated Software Engineering, Tokyo, Japan, September 2006.
[20] B. Y. Choueiry and G. Noubir. A disjunctive decomposition scheme for discrete constraint satisfaction problems using complete no-good sets. Technical report, Knowledge Systems Laboratory, 1998.
[21] Krzysztof Czarnecki and Ulrich Eisenecker. Generative Programming: Methods, Tools, and Applications. Addison-Wesley Professional, 1st edition, June 2000.
[22] R. Dechter and J. Pearl. Tree clustering for constraint networks. Artificial Intelligence, 38:353–366, 1989.
[23] Avinash K. Dixit and Robert S. Pindyck. Investment under Uncertainty. Princeton University
Press, USA, Jan 1998.
[24] Joanne Bechta Dugan, Kevin J. Sullivan, and David Coppit. Developing a high-quality soft-
ware tool for fault tree analysis. In Proceedings of the International Symposium on Software
Reliability Engineering, pages 222–31, Boca Raton, Florida, 1–4 November 1999. IEEE.
[25] Steven D. Eppinger. Model-based approaches to managing concurrent engineering. Journal
of Engineering Design, 2(4):283–290, 1991.
[26] Barry W. Boehm et al. Software Cost Estimation with Cocomo II. Prentice Hall PTR, 1st edition, 2000.
[27] John Irwin et al. Aspect-oriented programming of sparse matrix code. In Proceedings of the International Conference on Scientific Computing in Object-Oriented Parallel Environments (ISCOPE), volume 1343 of LNCS, Marina del Rey, CA, 1997. Springer-Verlag.
[28] John M. Favaro, Kenneth R. Favaro, and Paul F. Favaro. Value based software reuse investment. Annals of Software Engineering, 5:5–52, May 1998.
[29] Martin S. Feather. Risk reduction using ddp (defect detection and prevention): Software
support and software applications. In RE, page 288, 2001.
[30] R. Filman and D. Friedman. Aspect-oriented programming is quantification and obliviousness. In Workshop on Advanced Separation of Concerns, OOPSLA, 2000.
[31] E. Freuder and P. D. Hubbe. A disjunctive decomposition control schema for constraint satis-
faction. In Principles and Practice of Constraint Programming, 1st International Workshop,
PPCP’93, Newport, Rhode Island, 1993.
[32] Eugene C. Freuder. Partial Constraint Satisfaction. In Proceedings of the Eleventh Interna-
tional Joint Conference on Artificial Intelligence, IJCAI-89, Detroit, Michigan, USA, pages
278–283, 1989.
[33] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, November 2000.
[34] David Garlan and David Notkin. Formalizing design spaces: Implicit invocation mecha-
nisms. In Proceedings of the 4th International Symposium of VDM Europe on Formal Soft-
ware Development-Volume I, pages 31–44. Springer-Verlag, 1991.
[35] David Garlan and Mary Shaw. An introduction to software architecture. In V. Ambriola and
G. Tortora, editors, Advances in Software Engineering and Knowledge Engineering, volume 1,
pages 1–40. World Scientific Publishing Company, 1993. Large-scale architecture patterns:
pipes and filters, layering, black-board systems.
[36] Joseph A. Goguen. Reusing and interconnecting software components. IEEE Computer, 19(2):16–28, February 1986.
[37] William G. Griswold, Kevin Sullivan, Yuanyuan Song, Nishit Tewari, Macneil Shonle, Yuanfang Cai, and Hridesh Rajan. Modular software design with crosscutting interfaces. IEEE Software, Special Issue on Aspect-Oriented Programming, January/February 2006.
[38] J. Hannemann and G. Kiczales. Design pattern implementation in Java and AspectJ. In Proceedings of the 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2002.
[39] D. Hutchens and V. R. Basili. System structure analysis: Clustering with data bindings. IEEE Transactions on Software Engineering, 11(8):749–757, August 1985.
[40] Daniel Jackson. Micromodels of software: Lightweight modeling and analysis with Alloy. February 2002.
[41] Daniel Jackson and Kevin Sullivan. COM revisited: Tool assisted modelling and analysis of
software structures. In Proceedings of the Eighth ACM SIGSOFT Symposium on the Founda-
tions of Software Engineering, pages 149–58, San Diego, CA, 6–10 November 2000.
[42] James L. Rogers. DeMAID/GA: An enhanced design manager's aid for intelligent decomposition. In Proceedings of the 6th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Seattle, WA, September 1996.
[43] Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G.
Griswold. An overview of AspectJ. Lecture Notes in Computer Science, 2072:327–355,
2001.
[44] K. J. Sullivan, P. Chalasani, S. Jha, and V. Sazawal. Software design as an investment activity: A real options perspective. In Real Options and Business Strategy: Applications to Decision Making. 1999.
[45] Thomas G. Lane. Studying software architecture through design spaces and rules. Technical
Report CMU/SEI-90-TR-18, CMU, 1990.
[46] Lattix. A commercial product. http://www.lattix.com/.
[47] J. Liebeherr, M. Nahas, and W. Si. Application-layer multicasting with delaunay triangulation
overlays. IEEE Journal on Selected Areas in Communications, 20(8), Oct 2002.
[48] Jorg Liebeherr and Tyler K. Beam. Hypercast: A protocol for maintaining multicast group
members in a logical hypercube topology. In Networked Group Communication, pages 72–89,
1999.
[49] Cristina Videira Lopes and Sushil Krishna Bajracharya. An analysis of modularity in aspect
oriented design. In AOSD ’05, pages 15–26, New York, NY, USA, 2005. ACM Press.
[50] Alan MacCormack, John Rusnak, and Carliss Baldwin. Exploring the structure of complex
software designs: An empirical study of open source and proprietary code. Harvard Business
School Working Paper Number 05-016.
[51] Alan Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99–118, 1977.
[52] S. Mancoridis, B. Mitchell, C. Rorres, Y. Chen, and E. Gansner. Using automatic clustering to produce high-level system organizations of source code. In Proceedings of the 6th International Workshop on Program Comprehension, 1998.
[53] G. Murphy and D. Notkin. Software reflexion models: Bridging the gap between source and
high-level models. In Proceedings of the Third Symposium on the Foundations of Software
Engineering (FSE3), pages 18–28, New York, NY, October 1995. ACM.
[54] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Communica-
tions of the ACM, 15(12):1053–8, December 1972.
[55] Neeraj Sangal, Ev Jordan, Vineet Sinha, and Daniel Jackson. Using dependency models to manage complex software architecture. In OOPSLA, 2005.
[56] R. Schwanke. An intelligent tool for re-engineering software modularity. In Proceedings of the 13th International Conference on Software Engineering, 1991.
[57] M. Shaw. Candidate model problems in software architecture, 1994.
[58] Herbert A. Simon. The Sciences of the Artificial. The MIT Press, third edition, 1996.
[59] M. Sinnema, S. Deelstra, J. Nijhuis, and J. Bosch. COVAMOF: A framework for modeling variability in software product families. In Proceedings of SPLC 2004, volume 3154, pages 197–213, August 2004.
[60] Mike Spivey. The fuzz manual. URL: http://spivey.oriel.ox.ac.uk/˜mike/fuzz/.
[61] W. P. Stevens, G. J. Myers, and L. L. Constantine. Structured design. IBM Systems Journal,
13(2):115–39, 1974.
[62] Donald V. Steward. The design structure system: A method for managing the design of
complex systems. IEEE Transactions on Engineering Management, 28(3):71–84, 1981.
[63] Kevin Sullivan, William Griswold, Yuanyuan Song, Yuanfang Cai, et al. Information hiding interfaces for aspect-oriented design. In ESEC/FSE '05, September 2005.
[64] Kevin Sullivan, William G. Griswold, Yuanfang Cai, and Ben Hallen. The structure and
value of modularity in software design. SIGSOFT Software Engineering Notes, 26(5):99–
108, September 2001.
[65] Kevin J. Sullivan, Joanne Bechta Dugan, and David Coppit. The Galileo fault tree analysis
tool. In Proceedings of the 29th Annual International Symposium on Fault-Tolerant Comput-
ing, pages 232–5, Madison, Wisconsin, 15–18 June 1999. IEEE.
[66] Kevin J. Sullivan, Joanne Bechta Dugan, John Knight, et al. Galileo: An advanced fault tree
analysis tool, 1997. URL: http://www.cs.virginia.edu/˜ftree/index.html.
[67] Kevin J. Sullivan, Ira J. Kalet, and David Notkin. Software design: The options approach. In 2nd International Software Architecture Workshop, Joint Proceedings of the SIGSOFT '96 Workshops, pages 15–18, San Francisco, CA, October 1996.
[68] Peri L. Tarr, Harold Ossher, William H. Harrison, and Stanley M. Sutton Jr. N degrees of separation: Multi-dimensional separation of concerns. In Proceedings of the 21st International Conference on Software Engineering (ICSE '99), pages 107–119, 1999.
[69] Edward Tsang. Foundations of Constraint Satisfaction. Academic Pr., London and San Diego,
1993.
[70] J Withey. Investment analysis of software assets for product lines. Technical Report
CMU/SEI-96-TR-10, Carnegie Mellon University, 1996.
[71] C. Jason Woodard. Architectural Strategy and Design Evolution in Complex Engineered Systems. PhD thesis, Harvard University and Singapore Management University, May 2006 (forthcoming).