Modularity in Design: Formal Modeling and Automated Analysis
A Dissertation
Presented to
the faculty of the School of Engineering and Applied Science
University of Virginia
In Partial Fulfillment
of the requirements for the Degree
Doctor of Philosophy
Computer Science
by
Yuanfang Cai
August 2006
© Copyright August 2006
Yuanfang Cai
All rights reserved
Approvals
This dissertation is submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Computer Science
Yuanfang Cai
Approved:
Kevin J. Sullivan (Advisor)
Mary Lou E. Soffa
William G. Griswold
John C. Knight (Chair)
Jack W. Davidson
Accepted by the School of Engineering and Applied Science:
James H. Aylor (Dean)
August 2006
Abstract
People have long recognized that evolvability, achieved most fundamentally by appropriate modu-
larity in design, can have enormous technical, organizational and economic value. However, achiev-
ing appropriate modularity in design, e.g., by refactoring, can incur significant costs: both direct
and in tradeoffs against other properties, such as performance and time to market [14]. Reason-
ing about the evolvability properties and economic implications of design structures is critical to
high-consequence decision-making, but it remains difficult, in part due to the lack of formal theo-
ries linking design structures to evolvability and economic properties, and of automated techniques
facilitating value-based decision-making.
One key impediment is the lack of analyzable high-level design representations that both convey
design architectures and enable designers to reason precisely about their modularity properties and
economics. This dissertation contributes such a formal and analyzable representation. It supports
formal design modeling and enables automation of a number of evolvability and economic-related
analyses.
Baldwin and Clark previously contributed an influential but informal theory of modularity in de-
sign, centered on a design representation called the design structure matrix (DSM), and employing
a real-options-based model of the economic value of modularity. We found the imprecise nature of
DSMs limiting in several respects. Our framework captures and clarifies the essence of Baldwin
and Clark’s theory in the particular framework of first-order logical constraint networks. It enables
automatic derivation of DSMs with rigorous semantics and a number of other architecture analysis
techniques.
We model both design decisions and relevant external conditions using an augmented form of
constraint networks (ACNs). To support design impact analysis, we derive an intermediate, state-
machine-based design space model from an ACN, which we call a design automaton (DA). To
support traditional design coupling structure analysis, we derive a pair-wise dependence relation
(PWDR) from a DA, based on which we can then derive a DSM, and apply Baldwin and Clark’s
theory (among others).
To address scalability issues in constraint solving and solution enumeration that a DA requires,
we create a method to decompose a large ACN model into a number of smaller ones, solve each of
them separately, and integrate the results on demand. To address the problem that an ACN model
is not sufficient to model and analyze complex design decisions with crosscutting and hierarchical
structural implications, we extend the ACN model into the complex augmented constraint network
(CACN), which formally represents a family of ACNs.
Our ultimate goal is to enable economically effective software architectural decision-making
based on sound theory and useful tools. This dissertation takes an important step toward this goal
by providing a formal, analyzable design modeling framework. Our thesis is as follows:
• This framework formally accounts for the key concepts of Baldwin and Clark’s modularity
theory as well as Parnas’s earlier information hiding design criterion.
• This framework enables the derivation of pair-wise dependence relations from ACNs, and
consequently, the derivation of DSMs with precise semantics.
• This framework enables automation of a range of formal architectural analysis methods re-
lated to evolution and economic value.
• This framework generalizes to provide an account of both object-oriented and newer aspect-
oriented notions of modularity in a unified, declarative framework.
In support of this thesis, we present evidence in two forms: (1) formal modeling and automated
analysis of case studies, supported by our prototype tool, Simon; (2) a complete formalization
of our framework, together with formalizations of the key notions of existing theories in the
setting of our framework.
Dedication
I dedicate my dissertation to my husband, Bo Zhang, for his unconditional love and for his
encouragement that kept me from giving up when I found it hard to continue. I would never have
been able to reach this point without his constant support.
I also dedicate this dissertation to my loving family and family-in-law back in China. I thank my
father, Wentong Cai, and my mother, Jinfeng Wang, for caring about who I am much more than what
I do, always telling me to balance work with my personal life. I thank my mother-in-law, Shuqin
Guo, and my father-in-law, Yimin Zhang, for treating me like their own daughter, always showing
me their understanding and caring. I thank my elder brother, Yuanming Cai, for encouraging me
to embark on this challenging and fruitful journey. I thank my sister-in-law, Dongmei Sun, and my
lovely nephew, Zijian Cai, for always believing in me more than I believed in myself.
Acknowledgments
I would like to express my sincere gratitude and appreciation to my advisor, Professor Kevin Sul-
livan, for providing me with the unique opportunity to work in the research area of software en-
gineering and software economics, for his expert guidance and mentorship, and for his invaluable
advice that pushed me to think formally and broadly. He has taught me innumerable lessons and
insights on the workings of academic research in general. His technical and editorial advice was
essential to the completion of this dissertation. I appreciate the high standards he has held me to.
After six years, I am stronger and more independent, not only in my research ability, but also in my
general personality.
I would not have been able to survive without the enormous help offered by my dear friend
Elisabeth Strunk. She has been there for me whenever I was in need of help: she spent many hours
reading and marking my paper draft word by word, helping me express and organize my thoughts,
while she was not even a co-author. There was a time when she had her own pending deadlines,
but still spared her time to comment on my paper, making more than one pass to be sure that I had
something presentable. When I had a hard time, she told me that she would help me make it, and
she did. I cannot thank her enough for the countless favors she has offered me over the past six
years.
I thank Professor John Knight for his constant and critical support. Especially, both my husband
and I are full of gratitude to him for helping me look for a job close to my husband. I thank Professor
Mary Lou Soffa for hugging and comforting me like my mom, and for her always being there to
support and to help. I thank Professor Anita Jones for her kind mentoring about how to be a
professional woman. I thank Professor Jack Davidson for his service and comments as part of my
committee. I have been blessed with these great and generous people in our department. They are
like my family, in whom I found courage and the strength to persevere.
I thank Professor William Griswold from the University of California, San Diego for his great
collaboration and encouragement. I thank Professor Carliss Baldwin from the Harvard Business
School for her enormous influence on my research. I thank Professor Tao Xie from North Carolina
State University and Dr. Cordell Green from Kestrel Institute for their friendship, help, and
mentoring.
I thank all my friends who came to every one of my talks and gave me warm support, including
Billy Greenwell, Tony Aiello, Xiang Yin, Michael Spiegel, and Patrick Graydon. I thank all my
friends around me. Especially, I thank Yuanyuan Song for driving me around after I lost my car,
for sharing my burden of tool implementation when I was totally overwhelmed, and for always
reminding me whenever I felt frustrated and asked why life could be so hard: “Thinking of
people starving in Africa and suffering in Iraq, you are not qualified to complain!”
Yes, I have been very, very lucky to have all these great, generous, and broad-minded people
around me. This small section is far from enough to thank them all.
Contents
1 Introduction 1
1.1 Modularity in Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Current Design Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Formal Modeling and Automated Analysis . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Model Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Model Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Prototype Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Background 14
2.1 Modularity in Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Software Economics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Prevailing Design Representations and Analysis . . . . . . . . . . . . . . . . . . . 17
2.4 Emerging Approach to Economics of Modularity . . . . . . . . . . . . . . . . . . 20
3 Overview of Core Modeling and Analysis Approaches 26
3.1 Core Model Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Augmented Constraint Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Operational Design Space Evolution Model . . . . . . . . . . . . . . . . . . . . . 32
3.4 Pair-wise Dependence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5 Connections to Evolvability and Economic Analysis . . . . . . . . . . . . . . . . . 37
3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Modeling and Analysis of a Benchmark Design 40
4.1 Key Word In Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 ACN KWIC Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Quantitative Changeability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Design Structure Matrix Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5 Net Option Value Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5 Model Decomposition and Result Integration 62
5.1 ACN Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Integrating Analysis Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3 Observations and Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Model Extension and Structural Design Impact Analysis 76
6.1 Figure Editor Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Set-Valued Design Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3 Crosscutting Design Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4 Nested Design Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.5 Parameterizing CACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6 Structural Design Impact Analysis Overview . . . . . . . . . . . . . . . . . . . . 88
6.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7 Formalization 97
7.1 Formalizing the Core Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.2 Formalizing Previous Theories of Modularity . . . . . . . . . . . . . . . . . . . . 105
7.3 The Divide-and-Conquer Approach and Its Correctness . . . . . . . . . . . . . . . 108
7.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8 Simon: The Tool 121
8.1 Interactive Formal Design Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2 Constraint Solving and DA, PWDR Generation . . . . . . . . . . . . . . . . . . . 132
8.3 Automated Design Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9 The Generalizability of the Approach 140
9.1 A Web Application—Winery Locator . . . . . . . . . . . . . . . . . . . . . . . . 142
9.2 HyperCast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.3 Galileo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
10 Evaluation of this Research 165
10.1 Thesis and Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
10.2 Novelty and Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
10.3 Limitations and Remaining Problems . . . . . . . . . . . . . . . . . . . . . . . . 168
10.4 Challenges and Open Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
10.5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11 Conclusion 171
Bibliography 173
List of Figures
2.1 OO Observer Pattern DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Core Models and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Matrix Constraint Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Matrix ACN model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 The Matrix Design Automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Partial Matrix Design Automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6 Matrix DSM Generated by Simon . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1 KWIC Sequential Design Architecture [57] . . . . . . . . . . . . . . . . . . . . . 42
4.2 KWIC Information Hiding Design Architecture [57] . . . . . . . . . . . . . . . . 43
4.3 KWIC Sequential Design Constraint Network . . . . . . . . . . . . . . . . . . . . 45
4.4 KWIC Information Hiding Design Constraint Network . . . . . . . . . . . . . . . 46
4.5 Simon Clustering GUI for the KWIC IH Design . . . . . . . . . . . . . . . . . . . 50
4.6 Tool Snapshot: KWIC SD Design Impact Analysis Input . . . . . . . . . . . . . . 52
4.7 Tool Snapshot: KWIC SD Design Impact Analysis Output . . . . . . . . . . . . . 53
4.8 Partial Non-deterministic Finite Automaton for SD and IH design . . . . . . . . . 54
4.9 KWIC SD Derived DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.10 KWIC IH Derived DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.11 NOV Computation for Manual KWIC SD . . . . . . . . . . . . . . . . . . . . . . 57
4.12 NOV Computation for Manual KWIC IH . . . . . . . . . . . . . . . . . . . . . . 58
4.13 NOV Computation for Derived KWIC SD . . . . . . . . . . . . . . . . . . . . . . 60
4.14 NOV Computation for Derived KWIC IH . . . . . . . . . . . . . . . . . . . . . . 61
5.1 Partial KWIC Information Hiding ACN model . . . . . . . . . . . . . . . . . . . . 63
5.2 Conjunctive Normal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3 Partial KWIC CNF graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 KWIC Condensation Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5 The First sub-ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.6 The Second sub-ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.7 Partial DA for the Linestorage sub-ACN . . . . . . . . . . . . . . . . . . . . . . . 69
5.8 Partial DA for the CircularShift sub-ACN . . . . . . . . . . . . . . . . . . . . . . 70
5.9 KWIC SD Modularized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.10 A SD sub-ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 OO Observer Pattern UML Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Figure Editor CACN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.3 Complex Augmented Constraint Network . . . . . . . . . . . . . . . . . . . . . . 84
6.4 The Constraint Network in an ACN Generated by Design Alternative FE1 . . . . . 88
6.5 The Constraint Network in an ACN Generated by Design Alternative FE2 . . . . . 89
6.6 The DSM of FE OO: OO Figure Editor Design . . . . . . . . . . . . . . . . . . . 90
6.7 The DSM of FE AO: AO Figure Editor Design . . . . . . . . . . . . . . . . . . . 91
6.8 DIA: Notification Policy Change Impacts . . . . . . . . . . . . . . . . . . . . . . 91
6.9 The DSM of FEOO Role: Screen takes the subject role . . . . . . . . . . . . . . . . 92
6.10 The DSM of FEOO Position: Positions are observed in OO design . . . . . . . . . . 93
6.11 The DSM of FEAO Position: Positions are observed in AO design . . . . . . . . . . 94
7.1 The Brute-Force and Divide-and-Conquer DA Derivation . . . . . . . . . . . . . . 112
7.2 The Brute-Force and Divide-and-Conquer PWDR Derivation . . . . . . . . . . . . 117
8.1 Core Models and Analysis in Simon . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2 Simon: Constraint Network Construction . . . . . . . . . . . . . . . . . . . . . . 125
8.3 Simon: Dominance Relation Construction . . . . . . . . . . . . . . . . . . . . . . 126
8.4 Simon: Cluster Set Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.5 ACN Language Productions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.6 Simon: Complex Augmented Constraint Network . . . . . . . . . . . . . . . . . . 128
8.7 Simon: Design Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.8 Parameterize a CACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.9 Automatically Generated ACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.10 CACN Language Productions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.11 Simon: Solve Constraint Network . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.12 Simon: Decompose a Large Constraint Network . . . . . . . . . . . . . . . . . . . 134
8.13 Simon: Sub-ACNs are Solved . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.14 Simon: Design Automaton and Pair-wise Dependence Relation Generation . . . . 136
8.15 Design Impact Analysis: Select an Original Design . . . . . . . . . . . . . . . . . 137
8.16 Design Impact Analysis: Specify a Change . . . . . . . . . . . . . . . . . . . . . 138
8.17 Design Impact Analysis: Evolution Paths . . . . . . . . . . . . . . . . . . . . . . 138
8.18 Design Structure Matrix Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.19 Net Option Value Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.1 WineryLocator OO Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
9.2 WineryLocator Aspect-Oriented Design . . . . . . . . . . . . . . . . . . . . . . . 148
9.3 Derived WineryLocator Design Rule DSMs . . . . . . . . . . . . . . . . . . . . . 149
9.4 Collapsed WineryLocator Design Rule Design . . . . . . . . . . . . . . . . . . . . 150
9.5 HyperCast OO Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.6 HyperCast OO Derived DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.7 HyperCast OO Manually-Constructed DSM [63] . . . . . . . . . . . . . . . . . . 156
9.8 HyperCast AO DSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.9 Galileo Design Rules CACN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.10 Galileo: Design Rules and New Features . . . . . . . . . . . . . . . . . . . . . . . 160
9.11 Galileo Error Handling Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.12 Design Structures using Different Error Handling Options . . . . . . . . . . . . . 163
9.13 Add New Views based on Different Error Handling Options . . . . . . . . . . . . 163
List of Tables
3.1 Matrix Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 The Variables of IH sub-ACNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 The Variables of SD sub-ACNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.1 Performance for WineryLocator OO Model . . . . . . . . . . . . . . . . . . . . . 146
9.2 Performance for WineryLocator DR Model . . . . . . . . . . . . . . . . . . . . . 147
9.3 Performance for WineryLocator AO Model . . . . . . . . . . . . . . . . . . . . . 147
9.4 Performance for HyperCast OO Model . . . . . . . . . . . . . . . . . . . . . . . . 154
9.5 Performance for HyperCast Obliviousness Model . . . . . . . . . . . . . . . . . . 154
9.6 Performance for HyperCast DR Model . . . . . . . . . . . . . . . . . . . . . . . . 155
Chapter 1
Introduction
1.1 Modularity in Design
People have long recognized that evolvability, achieved most fundamentally by appropriate modu-
larity in design, can have enormous technical, organizational and ultimately economic value. How-
ever, activities to achieve appropriate modularity in design, such as refactoring, are not free, but can
incur both direct costs and indirect costs in the form of tradeoffs against other key properties, such
as performance and time to market [14]. Reasoning about the evolvability properties and economic
implications of design structures is critical to high-consequence design decision-making. How-
ever, such reasoning remains difficult, in part because we remain without a formal framework for
modeling and analysis of problems in this domain.
The challenge is that we lack both formal theories linking design structures to their evolvability
and economic properties, and automated techniques facilitating such economics-oriented
decision-making. One key impediment is the lack of analyzable design representations that not only convey
conceptual designs, but also enable designers to reason about the structure of coupling relations on
high-level design decisions and their economic implications. This dissertation contributes such an
analyzable representation and a number of automated evolvability and economic-related analysis
techniques.
We were motivated to conduct this research in part by a conversation with two practicing soft-
ware engineers, who described a dilemma they faced at work. The engineers worked for a small
company that earned revenues by delivering to a paying customer a stream of enhancements to a
software tool. The engineers who did the work were responsible both for estimating the time (and
thus cost) to make each enhancement and for implementing the selected enhancements. They were
quite good at estimating, but dissatisfied with the system design, believing that it significantly
complicated the implementation of each new feature, and thus increased its cost and time. They had
proposed to management that the tool be restructured. However, management, concerned about
disrupting the flow of enhancements, and thus of the revenues on which the company depended, and
having no clear model of the expected benefits from restructuring, declined. A key problem was
that the engineers had neither the training nor the tools to analyze the situation quantitatively or to
frame it in the economic terms that might have been compelling to business decision-makers. As a
result, the engineers were dissatisfied, and the company incurred a possibly significant unnecessary
cost.
The problem that this company faced is one that organizations everywhere grapple with. Should
we make a costly investment in design, in this case, in restructuring a design? The problem we set
out to address is that, as a discipline, we still largely lack the testable scientific models needed
to analyze design decisions such as this one in terms that make sense both technically and eco-
nomically. Without testable and validated analysis methods and tools, it will remain hard for engi-
neers and managers to reliably make such costly investment decisions in software design. Parnas’s
well-known changeability analysis has strong economic implications, and his information hiding
principle [54], which aims to isolate design decisions that are anticipated to change, has remained
influential for decades. However, rigorous, quantitative, and automated approaches to applying
such reasoning remain largely absent.
Reasoning about evolvability and economic properties rigorously first requires an analyzable
design representation based on which the designer can express, analyze, and compare different
choices in terms of their respective economic impacts. Source code dependence structures, such
as call graphs, have been used as proxies of higher-level design coupling structures to facilitate
modular structure analysis. However, designers often need to answer important questions before
committing to implementation decisions: how best to accommodate changes in designs or in external
conditions, whether to invest in costly restructuring of complex systems, how best to modularize
designs, how to align architecture and business strategy [71], etc. Nor is it yet clear that source code
structure is a sufficiently reliable proxy for more abstract coupling structures. Most prevailing de-
sign level representations are not designed for this purpose. We identify a number of problems that
make current design representations unsuitable for value-oriented design analyses, and present a
formal modeling framework to address these problems.
1.2 Current Design Representations
A design representation for value-oriented analyses should support not only conceptual design de-
scriptions, but also the reasoning about the structure of coupling relations on high-level design
decisions and their economic implications, precisely, rigorously, and, ideally, automatically. Un-
fortunately, most prevailing design modeling techniques, such as the unified modeling language
(UML) [15] and architecture description languages (ADLs) [35], are not designed for this purpose
and are ill-suited to serve as the proper media. The obstacles include the following:
First, some design decisions or external conditions, such as hardware conditions, security
requirements, and user profiles, are not part of the program but could affect software evolution or impact
other design decisions. Parnas’s changeability analysis in his seminal paper explicitly considers
several external conditions that drive design evolution, such as the memory size and input file size,
some dimensions in which decisions are likely to change, such as “how to store data in mem-
ory” [54], and different choices within each dimension. Most prevailing design representations are
not designed to capture varieties of such dimensions, making it difficult to analyze the impact of
changes in these dimensions.
Second, a decision at one level often alters a design structure by introducing new variables and
constraints at lower levels. Examples include the choice of design patterns, or the decision to add a
new feature. The structure of a design is thus not flat and fixed but is, in general, contingent on prior
decisions and recursive in structure. State-of-the-art design modeling approaches do not adequately
represent this aspect of real design structures. Consequently, it is difficult to analyze the structural
consequences of, and tradeoffs involved in making such high level design decisions.
Third, the effects of design decisions are frequently not local but crosscutting. For example, in
an observer pattern design, all the subjects have to respect the agreed update protocol [33]. Pre-
vailing design modeling techniques do not adequately represent design decisions with crosscutting
effects, making it difficult to analyze their impacts.
Baldwin and Clark’s design rule theory [7] uses a matrix-based design model, called a design
structure matrix (DSM) [62, 25, 7], to represent design coupling structures, and they then employ a
model to statistically account for the economic implications of the modular structures of computer
system designs. The key idea in their work is that modules create valuable options, and their
model predicts the economic value of these options. A DSM model represents design decisions and
external conditions in a general way: they uniformly appear as design variables labeling the columns
and rows of a matrix. Marked cells in the matrix represent the pair-wise dependence relation among
design variables. The DSM is simple but powerful in that it captures key notions of design modularity,
as Baldwin and Clark point out [7]. Chapter 2 briefly introduces DSM modeling and Baldwin and
Clark’s economic analysis.
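To make the DSM concept concrete, the following Python sketch represents a DSM as a boolean matrix built from a pair-wise dependence relation. This is purely illustrative: the variable names are invented for this example, and real DSM tools (including the Simon tool described later) of course do far more.

```python
# Illustrative sketch: a DSM as a boolean matrix over design variables,
# with marked cells recording the pair-wise dependence relation.
# Variable names here are hypothetical, not from the dissertation's case studies.
variables = ["data_structure", "algorithm", "io_format"]
depends = {("algorithm", "data_structure"), ("data_structure", "algorithm")}

index = {v: i for i, v in enumerate(variables)}
n = len(variables)
dsm = [[False] * n for _ in range(n)]
for a, b in depends:
    dsm[index[a]][index[b]] = True  # row a depends on column b

# Render the matrix: 'x' marks a dependence, '.' marks the diagonal.
print(" " * 16 + " ".join(v[:4].ljust(4) for v in variables))
for v in variables:
    row = dsm[index[v]]
    cells = ["." if v == u else ("x" if row[index[u]] else " ") for u in variables]
    print(v.ljust(16) + " ".join(c.ljust(4) for c in cells))
```

The mutual marks between `data_structure` and `algorithm` mirror the kind of bidirectional coupling that motivates design rules to break such cycles.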
Previous work of Sullivan, Griswold, and the author of this dissertation [64] applied Baldwin
and Clark’s modeling and analysis techniques to software design, showing that a DSM can visu-
ally represent the criterion of information hiding modularity and support quantitative modularity
analysis. In Lopes’s paper [49] and our more recent work [63], Baldwin and Clark’s modeling
and analysis techniques were used to compare aspect-oriented designs, verifying that one design is
better than another both visually and quantitatively.
However, these manual modeling and statistical analysis techniques have severe limitations for
rigorous evolvability analyses. First, as with the UML and most ADLs, a DSM does not explicitly
represent the choices available within each dimension, such as the choice of observer pattern
versus mediator pattern [33]. Consequently, analyzing the impact of changes in such decisions in
general remains difficult. Second, a DSM does not represent the multiple ways in which a change
in one design decision can be accommodated by changes in other decisions. In such cases, a DSM
becomes ambiguous and insufficient to support analyses that answer important questions such
as: “which is the best available compensation in terms of cost?” In addition, due to its informal
and ambiguous nature, we have found that building such DSM models is error-prone and time-
consuming.
1.3 Formal Modeling and Automated Analysis
To address these problems, this dissertation presents a formal design modeling framework, con-
tributing three core models representing decision-making phenomena from different perspectives.
These models connect conceptual software designs with a number of evolvability and economic
analyses.
1.3.1 Formal Models
As the basis for rigorous design analysis, we employ a formal model called a Constraint Network
(CN) to model design dimensions and external conditions in a general way. In a CN, variables
represent dimensions in which design decisions are made, values represent design decisions, and
logical constraints model required relations. For example, the following CN models the choices of a
matrix data structure, the choices of the algorithm, and one of their relations as two scalar variables
and one logical expression:
1: scalar matrix_data_structure:(array, list);
2: scalar matrix_algorithm:(array, list);
3: matrix_algorithm=array <=> matrix_data_structure=array;
The scalar variables matrix_data_structure and matrix_algorithm, in lines 1 and 2, represent
the dimensions of data structure and algorithm. Their domains follow within the parentheses, mod-
eling the choices within each dimension. For example, Line 1 models that the choices for the data
structure dimension are array and list. A constraint network models the interdependence relation
among design variables and environment conditions as a set of logical constraints. For example,
line 3 states that the choice of an array algorithm is valid if and only if the selected data structure is
array-based.
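To make these semantics concrete, the tiny CN above can be sketched in a few lines of Python. This is our illustrative encoding, not part of the dissertation's tooling: each variable carries a finite domain, each constraint is a predicate over assignments, and the design space is the set of assignments satisfying every constraint.

```python
from itertools import product

variables = {
    "matrix_data_structure": ("array", "list"),
    "matrix_algorithm": ("array", "list"),
}

# Line 3 of the CN: the array algorithm is valid iff the data structure is array-based.
constraints = [
    lambda a: (a["matrix_algorithm"] == "array")
              == (a["matrix_data_structure"] == "array"),
]

def solutions(variables, constraints):
    """Enumerate every assignment of domain values that satisfies all constraints."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            yield assignment

states = list(solutions(variables, constraints))
```

Enumerating the four candidate assignments leaves exactly two consistent design states: both decisions `array`, or both `list`.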
A purely logic-based design description is not sufficient for design modularity analysis. For
example, the dominance relation among design decisions plays an important role in Baldwin and
Clark’s modularity analysis, but is not part of a constraint network. We address this problem by
augmenting a pure constraint network with additional data structures, and call it an Augmented
Constraint Network (ACN).
To link a conceptual design with its evolvability and economic properties, we derive a design
evolution model that represents the change dynamics within that part of a design space defined by
an ACN. We call this model a design automaton (DA). The states of a DA represent design states
(assignments of decisions in each of the dimensions) that satisfy all of the constraints. Transitions
model changes in design driven and labeled by changes to individual design decisions, where the
destination state for a given starting state and change differs from the starting state in a way that is
minimally sufficient to restore consistency. A DA captures all of the possible ways in which any
change to any decision in any state of a design can be compensated for by changes to minimal
subsets of other decisions. A DA enables quantitative changeability analysis. For example, given a
changing decision or condition, this framework computes how many ways there are to accommodate
this change, and how many decisions should be reconsidered in each way.
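A minimal sketch of this DA derivation, reusing the two-variable CN from Section 1.3.1 (the encoding and function names are ours, purely for illustration): states are the consistent assignments, and a transition for a changed decision leads to the consistent states that restore consistency with the fewest other reconsidered decisions.

```python
from itertools import product

variables = {"matrix_data_structure": ("array", "list"),
             "matrix_algorithm": ("array", "list")}
constraints = [lambda a: (a["matrix_algorithm"] == "array")
                         == (a["matrix_data_structure"] == "array")]

def consistent_states():
    """All assignments satisfying every constraint (the DA's states)."""
    names = list(variables)
    return [dict(zip(names, vs))
            for vs in product(*(variables[n] for n in names))
            if all(c(dict(zip(names, vs))) for c in constraints)]

def transitions(src, var, value):
    """Minimal compensating destinations for setting `var` to `value` in state `src`."""
    candidates = [s for s in consistent_states() if s[var] == value]
    def cost(s):  # number of *other* decisions that had to be reconsidered
        return sum(1 for n in s if n != var and s[n] != src[n])
    best = min(cost(s) for s in candidates)
    return [s for s in candidates if cost(s) == best]

start = {"matrix_data_structure": "array", "matrix_algorithm": "array"}
# Switching the data structure to a list forces the algorithm to follow.
dests = transitions(start, "matrix_data_structure", "list")
```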
A pair-wise dependence relation (PWDR) among design elements appears to be a useful model
underlying many influential design representations. In box-and-arrow style representations, such
as ADLs, the UML, call graphs, and Reflexion Models [53], the arrows model different kinds of
pair-wise dependence relations among boxes, such as function calls, inheritance, and system I/O.
The pair-wise dependence relation among design decisions is the core data structure underlying
Baldwin and Clark’s theory. The DA model provides a precise definition of what it means for one variable to depend on
another: we define two design variables to be pair-wise dependent if, for some design state, there
is some change to the first variable for which the second must change in at least one of the minimal
compensating state changes.
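Under this definition, the PWDR can be computed mechanically from the DA. The following sketch (again our own illustrative code over the two-variable example, not the dissertation's tool) records a pair (a, b) whenever, in some state, some change to a forces b to change in a minimal compensation:

```python
from itertools import product

variables = {"matrix_data_structure": ("array", "list"),
             "matrix_algorithm": ("array", "list")}

def ok(a):  # the single constraint of the example CN
    return (a["matrix_algorithm"] == "array") == (a["matrix_data_structure"] == "array")

names = list(variables)
states = [dict(zip(names, vs)) for vs in product(*(variables[n] for n in names))
          if ok(dict(zip(names, vs)))]

def minimal_compensations(src, var, value):
    """Consistent states with var=value that change the fewest other decisions."""
    cands = [s for s in states if s[var] == value]
    cost = lambda s: sum(1 for n in s if n != var and s[n] != src[n])
    best = min(cost(s) for s in cands)
    return [s for s in cands if cost(s) == best]

def pwdr():
    deps = set()
    for src in states:
        for var in names:
            for value in variables[var]:
                if value == src[var]:
                    continue  # only genuine changes
                for dest in minimal_compensations(src, var, value):
                    for other in names:
                        if other != var and dest[other] != src[other]:
                            deps.add((var, other))  # `other` depends on `var`
    return deps
```

In this symmetric example, each variable depends on the other, so the derived PWDR contains both ordered pairs.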
1.3.2 Automated Analysis
This representation formalizes the key concepts of Parnas’s and Baldwin and Clark’s modularity
theories, such as design spaces, design dimensions, and design rules, and enables the automation of
their analysis techniques. (1) Parnas’s changeability analysis can be formalized rigorously in a DA
model. The question of finding all the ways to compensate for an anticipated sequence of individual
changes can be formulated as a mapping from a DA, an assignment modeling the current design,
and a sequence of variable-value pairs that model changes, to a set of sequences of consistent
design states modeling the feasible evolution paths for the given sequence of changes. To solve
this problem, we find the paths that start from the initial design and go along the edges labeled
with specified changes. Each path represents one way to compensate for the given changes. The
destination states are the possible new design states accommodating the given sequence of changes.
(2) Parnas’s information hiding principle can be formalized as a mechanically checkable predicate
based on an ACN and the derived PWDR model, stating that a PWDR derived from an ACN should
not have any pair with a first element in an environment module, and the second in a design rule
module, formalizing the previous observation of Sullivan et al. obtained from DSM models [64]. (3)
The PWDR model can be used to populate a DSM model that has proven utility in other engineering
realms. As a result, in principle, analysis techniques available in other engineering realms, such
as project scheduling, can be applied to software design. (4) Since DSM modeling is at the heart
of Baldwin and Clark’s theory and analysis, this framework supports the automation of their Net
Option Value (NOV) analysis for software designs that can be expressed in terms of our modeling
framework.
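Analysis (1) above, finding all feasible evolution paths for a sequence of changes, can be sketched as a simple search over the DA. The code below is a hypothetical miniature of what Simon automates, again using the two-variable example CN:

```python
from itertools import product

variables = {"matrix_data_structure": ("array", "list"),
             "matrix_algorithm": ("array", "list")}
ok = lambda a: (a["matrix_algorithm"] == "array") == (a["matrix_data_structure"] == "array")
names = list(variables)
states = [dict(zip(names, vs)) for vs in product(*(variables[n] for n in names))
          if ok(dict(zip(names, vs)))]

def step(src, var, value):
    """One DA transition: minimal compensating destinations for one change."""
    cands = [s for s in states if s[var] == value]
    cost = lambda s: sum(1 for n in s if n != var and s[n] != src[n])
    best = min(cost(s) for s in cands)
    return [s for s in cands if cost(s) == best]

def evolution_paths(start, changes):
    """All state sequences realizing the given sequence of individual changes."""
    paths = [[start]]
    for var, value in changes:
        paths = [p + [dest] for p in paths for dest in step(p[-1], var, value)]
    return paths

start = {"matrix_data_structure": "array", "matrix_algorithm": "array"}
paths = evolution_paths(start, [("matrix_data_structure", "list"),
                                ("matrix_algorithm", "array")])
```

Each returned path is one way to compensate for the change sequence; its last element is a possible new design state.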
1.4 Model Decomposition
As with many formal analysis techniques, such as model checking, the difficulty of constraint satis-
faction limits the size of models that can be analyzed in practice. Our DA model requires an explicit
representation of the entire space of satisfying solutions, but the number of the solutions increases
exponentially with the number of variables involved. For example, Parnas’s KWIC information
hiding design ACN with 20 variables has 34,907 solutions, which are the input to our DA derivation
program. Because Alloy [41], which we use as a SAT solver, was not designed to perform exhaustive
analysis, obtaining the analysis results takes hours in total. To address this problem, we create a
method to decompose a large ACN model into a number of smaller sub-ACNs, solve each sub-ACN
individually, and integrate the analysis results. The integrated results are equal to the results
obtained by analyzing the full ACN model. This approach splits the whole KWIC information hiding
ACN into 6 sub-ACNs, having 6, 6, 4, 5, 7, and 5 variables respectively. Our supporting tool, Simon,
now invokes multiple SAT solvers and DA processors separately to deal with these much smaller
models, and integrates the results on the order of seconds.
1.5 Model Extension
Although the ACN, DA, and PWDR models have the potential to enable automated evolvability and
economic analyses, as a conceptual design description model, an ACN is not sufficient to capture
several complex design decision-making phenomena that people encounter frequently. First, as
Baldwin and Clark point out [7], some design dimensions are “called into being” by other decisions.
For example, a decision to add a new feature brings into being a number of new dimensions specific
to that feature. Scalar-valued design variables are not sufficient to model these decisions and their
impacts. Second, it is not uncommon that a decision brings into being not only new dimensions but
also new constraints among these new dimensions, or constraints between new dimensions and
existing dimensions. For example, a choice of design pattern not only brings new dimensions
that are specific to the pattern, but also imposes pattern-specific constraints on new and existing
design elements. Third, design decisions can crosscut each other. For example, an observer pattern
requires that “all the objects taking the subject role should implement the prevailing notification
policy.” When a new object is added to the system as a subject, as part of the impact analysis, the
designer should be aware of the notification policy in use, and of other constraints imposed by the
choice of an observer pattern.
We extend the ACN model into a complex augmented constraint network (CACN) to support
the modeling and analysis of these complex design decisions. A CACN, in essence, represents a
family of ACNs. Extending the definition of an ACN, a CACN uses set-valued variables to model
dimensions in which a decision brings into being a set of new dimensions, uses values with subspaces to
model decisions with recursive sub-design structures, and uses logical quantifications to model the
crosscutting effects among decisions. For example, a CACN models the choices of patterns using
a subspace variable: subspace d_pattern: (observer, mediator), each value bringing up a
new design space represented by a recursive CACN. The following quantified expression:
∀ object : observer role • object = orig ⇒ update protocol = orig
models the crosscutting constraint that each object taking an observer role has to observe the agreed
update protocol. object = orig means keeping the object as currently designed; update protocol =
orig means that the update policy is as originally agreed. orig is short for “original”, modeling a
current choice or decision.
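The quantified constraint can be read operationally as a check over every object bound to the observer role. A small Python rendering of that check follows (our illustration only; the object names and dictionary encoding are hypothetical):

```python
def crosscut_holds(assignment, observer_role):
    """∀ object : observer_role • object = orig ⇒ update_protocol = orig"""
    return all(assignment[obj] != "orig" or assignment["update_protocol"] == "orig"
               for obj in observer_role)

# Keeping every observer-role object as designed is consistent with the
# originally agreed update protocol...
design = {"point": "orig", "line": "orig", "update_protocol": "orig"}

# ...but changing the protocol while a subject stays as designed violates
# the crosscutting constraint.
changed = dict(design, update_protocol="changed")
```

Evaluating `crosscut_holds` on `design` succeeds, while on `changed` it fails, mirroring the impact analysis a designer must perform when the notification policy changes.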
Changing these complex design decisions incurs structural impacts. To analyze these impacts,
the designer instantiates a CACN model into a set of simpler design models represented by ACNs.
As a result, a range of analyses developed for ACNs can be applied—for assessing modularity,
evolvability, economic and other design properties.
1.6 Prototype Tool
We developed a tool called Simon to support formal design modeling and to automate the associated
evolvability and economic analyses. Using Simon, the user can build ACNs and CACNs
through interactive GUIs, derive design coupling structures and present them using DSMs, apply
Baldwin and Clark’s net option value (NOV) analysis [7], and analyze design change impacts.
Using Simon, we have evaluated our framework against a number of case studies.
1.7 Evaluation
The ultimate goal of this research is to enable software designers to make value-oriented design
decisions in a rational way, facilitated by automatic tools. The purpose of this dissertation is to
provide a formal analyzable design modeling framework, one important step towards this goal.
This dissertation claims and evaluates the following thesis:
• This framework provides a formal account of the key concepts of important but informal
modularity theories. (1) It formalizes Baldwin and Clark’s key notions of design dimension,
design decision, design decision dependence, and design space. (2) It formally accounts for
Parnas’s concept of information hiding modularity as a mechanically checkable predicate.
• This framework enables the derivation of design coupling structures in the form of pair-wise
relations on design decisions, and thus also the derivation of DSMs from ACNs. The benefit
is that the approach enables designers to reason about modularity in design architecture using
both the methods of Baldwin and Clark (but in terms of an abstract and formally precise
representation), as well as new kinds of analysis.
• This framework automates basic evolvability analyses such as design impact analysis. Given
a sequence of changing decisions or conditions, this framework computes how many ways there
are to accommodate these changes, and how many decisions should be reconsidered in
each way.
• Our model of modularity in design is general. In particular, it can account for both traditional
object-oriented notions of modularity and newer aspect-oriented notions within a unified,
declarative framework.
The evaluation of this framework partially involves the formal account of the key concepts
of important but informal modularity theories: (1) the formalization of Baldwin and Clark’s key
notions of design dimension, design decision, design decision dependence, design space, and design
rule; (2) the formal account of the extensions to Baldwin and Clark’s approach made by Sullivan
et al.; and (3) the formal account of Parnas’s concept of information hiding modularity. Chapter 7
presents these formalized concepts.
We also evaluate the ability of this framework to connect conceptual designs with important
evolvability and economic analyses, in particular, to Parnas’s changeability analysis, coupling struc-
ture analysis supported by design structure matrices, and Baldwin and Clark’s net option value
(NOV) analysis. Our basic evaluation strategy in this aspect is to model software designs in which
people have analyzed problems that have strong economic implications, automate these analyses
using Simon, and compare the results with the previous qualitative analysis results. The purpose
is to evaluate: (1) the expressiveness of the framework; (2) the ability to automate these analyses;
and (3) the accuracy of the analyses. Our case studies include both canonical designs and designs
of several real systems.
The canonical designs include the famous software engineering benchmark, the Key Word in
Context (KWIC) system from Parnas’s seminal information hiding paper [54], and the widely used
Figure Editor (FE) example [43, 37, 33]. These two designs represent very different design styles:
KWIC represents functional and object oriented designs; the Figure Editor design manifests broader
contemporary approaches using design patterns and aspect-oriented programming. The commonly-
used representation approaches for these two designs are different: researchers have frequently
represented KWIC architectures in a standardized way [57], while the UML models for different
variations of the FE example appear in a large number of recent publications. Modeling them
uniformly evaluates the expressiveness and abstraction ability of this framework.
The analyses people illustrate using these designs have some similarities in essence: Parnas
uses KWIC to analyze design changeability; Hannemann and Kiczales [38] use FE to compare
their aspect oriented (AO) design patterns with the object oriented (OO) design patterns in terms
of their ability to accommodate envisioned changes. Analyzing well-known problems in published
work in a quantitative and automated way demonstrates the potential utility of this framework.
Our case studies also include the following real designs:
1. A web application developed and studied by Lopes et al. [49] (WineryLocator). In their pa-
per, Lopes et al. use Baldwin and Clark’s modeling and analysis technique to quantitatively
compare different designs. To evaluate the expressiveness of our framework and the correct-
ness of the analysis results, we represent these designs as ACNs according to their design
descriptions, generate DSMs, and compare them with their manually constructed DSMs. We found
ambiguities and problematic issues in their published manual models and analysis.
2. A peer-to-peer networking system, HyperCast [48, 47], developed by network researchers
at the University of Virginia and studied by Sullivan et al. [63]. Similar to the WineryLoca-
tor paper, the authors compared different designs using manual models. Remodeling these
designs into our framework and analyzing them automatically reveals important issues in the
manual models.
3. The Galileo dynamic fault tree analysis tool, developed at the University of Virginia for
production use at NASA [66,65,24]. The Galileo designers once faced a situation when they
had to make a decision about how to restructure part of the system. They reached a decision
based on discussions and arguments, rather than rigorous analysis. A retrospective analysis
of this historical scenario using Simon shows how the designers might have been able to
compare different decisions and to justify their decision rigorously.
The errors revealed in manually constructed DSMs imply potential errors in the subsequent
quantitative analyses based on these models. These case studies provide support for the claimed
modeling and analysis ability of our framework, revealing the power of formal models and automated
analysis. The ultimate utility of this framework and the tool will continue to be evaluated in
realistic settings, as part of our future work.
1.8 Overview
Chapter 2 presents the background of this work. Chapter 3 illustrates the full picture of our core
models and associated analyses using a small example. Chapter 4 uses the famous software engi-
neering benchmark, the Key Word in Context (KWIC) system, to illustrate the problems with exist-
ing design representation approaches and how our modeling framework addresses these problems.
This chapter shows how Simon automates our framework and related well-known economic-related
analyses, reveals errors and weaknesses in published work, and rationally justifies design decisions
previously made in intuitive and qualitative ways. Chapter 5 presents our divide-and-conquer ap-
proach to addressing the scalability issue. Chapter 6 presents our extended CACN model using
the widely used Figure Editor design as a running example, and shows how a CACN model sup-
ports structural design impact analysis. Chapter 7 presents the key formalizations behind Simon.
Chapter 8 presents how Simon implements our framework. Chapter 9 further evaluates our frame-
work by modeling and analyzing the three real software designs, demonstrating its potential utility.
Chapter 10 evaluates this work as a whole. Chapter 11 concludes.
Chapter 2
Background
A great deal of research has been done on evolvability and modularity in software design and more
recently on the economics of design. This chapter presents the most important and relevant prior
work in this area, and explains both its strengths and where it falls short in relation to the goals we
have set forth.
2.1 Modularity in Design
People have recognized for decades that the structure of the coupling relation on design decisions
is a key factor influencing the evolvability and economic properties of a design [2, 5, 58, 61, 16].
Christopher Alexander [2] defines design as “the process of inventing things which display new phys-
ical order, organization, form, in response to function...”, discusses the process by which a form is
adapted to the context that has called it into being, and shows that such an adaptive process will be
successful only if it proceeds piece by piece instead of all at once, that is, by creating subsystems
of the adaptive process.
Software designers seek to structure software systems into modules (subsystems), to better ac-
commodate expected changes (adapt to context changes), to have parts that can be developed and
evolved without further coordination, and to ease the understanding of complex designs through ab-
straction of details hidden within modules. Constantine et al. [61] emphasize the need for designers
to manage the coupling between modules (subsystems) and cohesiveness within them: modules
with high cohesion and low coupling imply desirable properties of software including robustness,
reliability, reusability, and understandability.
In the study of OS/360 and other large systems, Belady and Lehman observed the rising cost of
change caused by decaying structure due to the accumulation of unanticipated changes, explicitly
connecting the changeability of design structures with their economic impacts [13]. Parnas’s infor-
mation hiding design criterion dictates that designers decompose systems into modules in order to
hide (and thus to decouple) decisions that are difficult or likely to change. Recent work, such as
object-oriented and component-based software development, follows these ideas, taking objects or
components as modules with the assumption that they hide the decisions that are likely to change.
The limitations of these dominant methods have been recognized and challenged, for example, by
aspect-oriented programming researchers.
These important theories, guidelines, and principles remain intuitive and heuristic, and progress
in programming languages has not solved decision-making problems in design, such as the refactoring
story we introduced in the previous chapter. Among the reasons is that we lack both scientific
theories to rigorously account for the economic implications of design structures, and automated
techniques facilitating value-oriented decision-making. Although Parnas’s well-known changeabil-
ity analysis has strong and explicitly noted economic implications, we still remain without a scien-
tifically rigorous formulation of the idea or a quantitative, automatable approach to applying it in
design modeling and analysis.
2.2 Software Economics
Researchers have recently explored the possibility of importing rigorous analysis methods from other
engineering and economic realms into software engineering. Sullivan was among the first in the soft-
ware engineering community [67, 44] to suggest that work from the financial economics community
on real options [3, 23] might provide a link from technical notions of modularity in design, phased
project structures (such as the spiral model [26]), and strategic timing of software design commit-
ments to economic measures of goodness. Withey [70] applied a related analysis to reason about
the flexibility value of software product line architectures. Favaro [28] developed an options-based
approach to investment analysis for software reuse infrastructures.
Carliss Baldwin and Kim Clark at the Harvard Business School developed a similar idea, pub-
lished in their book, Design Rules: The Power of Modularity [7]. Their goal was a plausible sci-
entific hypothesis accounting for observed large-scale changes in the structure of the computer
industry over several decades, from a set of vertically integrated companies to a set of highly mod-
ular clusters: companies organized around particular components of the computer (CPU, operating
system, motherboard, etc.). Their idea is that modularity in design creates economic value in the
form of real options. These are options to invest in multiple R&D experiments within modules
rather than at the whole-system level, and then to select the best of resulting outcomes. Multiple
companies within a sector are, in essence, exploring the space of possible designs by conducting
such experiments. System integrators (such as Dell in the PC sector) serve to select
the best available outcomes at any given point in time. Baldwin and Clark presented a novel real
options valuation model, and argued that the pursuit of the economic value that they had modeled
could account for the large-scale transformation of the industry. Companies saw value within mod-
ules and organized system designs and themselves accordingly. Baldwin and Clark adopted and
adapted the design structure matrix (DSM) as a design representation, framed the notion of a de-
sign rule as a special design decision that serves to split a design into a set of independent modules,
and built an options valuation model for designs expressed in terms of DSMs and design rules.
The work of Sullivan and his colleagues, now including Baldwin, provides the backdrop for the
work presented in this dissertation. In particular, the notion that real options can provide models
to aid decision-makers in the design of software and software-intensive systems remains a com-
pelling but still largely unproven hypothesis. Several hurdles stand in the way of the development,
validation and application of these ideas to actual software and system design. First, the statisti-
cal models of uncertainty underlying risky design experiments remain unvalidated. The stochastic
process models behind most work on real options are especially questionable for technical reasons
beyond the scope of this work. Baldwin and Clark’s new approach to options pricing is based
on extremal order statistics rather than on stochastic processes. However, validation of the new
model remains ongoing work. Second, while the notion of design rules seems to provide a power-
ful new way to think about information hiding modularity in a general sense, the design structure
matrix representation, in terms of which the concept was first developed, appears to be inadequate
to fully support a rigorously precise theory adequate either to the needs of software engineering or
to underpin a scientifically precise and testable theory of design and economic value. (Nor, as we
discuss further below, do our traditional software design representations appear adequate.) Third,
the exploratory and experimental application of these ideas in software engineering research and
application remains at an early stage. We discuss the state of the art in this area in Section 2.4.
In this dissertation we primarily address the second problem: we lack suitable software de-
sign representations to support a scientifically rigorous theory of the economic value of modularity
based on concepts of real options. The next section explains why prevailing box-and-arrow and
component-and-connector representations, as typified by the class diagrams of the unified modeling
language (UML) [15], some architecture description languages (ADL) [35], etc., are not sufficient
to bridge the gap between design structure and rigorous economic reasoning.
2.3 Prevailing Design Representations and Analysis
In most commonly used box-and-arrow style design representations, a box represents the element
to model, and a line connecting two boxes represents a relation between those two elements. In an
ADL model, boxes represent components such as modules, and arrows model various relations
between these modules, such as function calls. In a class diagram of a UML model, boxes represent
classes and lines model their relations. Changeability is one of the problems that software
designers confront frequently, and one that has strong economic implications. However, rigorous and
automated changeability analysis at the design level has not been available. Parnas’s analysis is
descriptive and qualitative; Hannemann and Kiczales compared the changeability of aspect-oriented
implementations of design patterns with object-oriented implementations based on the actual code.
Although these prevailing representations model program structures effectively, they are not
designed to support rigorous design changeability analysis. Designers could neither observe the
impact of changing decisions, nor could they measure these impacts in a quantitative way. Au-
tomating the analysis is even more difficult. Traditional impact analysis research focuses on change
issues at the program level, as summarized in [4]. We are interested in the counterpart at design
level, and identify the following missing elements that are critical to economic-related analysis:
First, the environment conditions and important design decisions that influence software evo-
lution are not modeled. Parnas’s changeability analysis is based on changes in environment condi-
tions, such as external constraints on core size and input size, the dimensions that prevailing design
representations are not designed to model.
Second, important design dimensions and possible choices within these dimensions are not
modeled. For example, in an observer pattern [33], choosing different update policies influences
the elements in the pattern: the push policy requires the subjects send all their data regardless
of what the observers need; the pull policy relies on the observers to request the needed data.
Neither the UML nor ADLs are designed to model and analyze such choices. Czarnecki et al.’s feature
model [21] aims to model and analyze feature variations, but not design decisions in a broader
sense, such as refactoring options, design patterns, and aspects. Design space modeling has also
been studied by Bosch [59], Lane [45], and Feather [29] for product line design, design generation
and optimization. While these design notations can ease communication among designers and help
to guide system implementers, they are not designed to account explicitly and rigorously for the
connections between design structure and economic value, or to help designers make value-oriented
decisions.
Third, these representations do not adequately model the constraints relating such decisions. Arrows
with legends have limited ability to express complex constraints, such as logical disjunction,
implication, and transitive relations. In essence, they lack the rigorous semantics needed for
quantitative analysis.
These environment conditions, design dimensions, possible choices, and the constraints among them
are fundamental to Parnas’s changeability analysis, but they are not representable in prevailing
representation styles, let alone amenable to rigorous and automated analysis. Consequently, the
success of such analysis depends on the designers’ experience.
Fourth, a decision at one level, such as the decision to apply a design pattern or to add a new
feature, often alters a design structure by introducing new variables and constraints. As shown
in Hannemann and Kiczales’s paper [38], the same pattern can be implemented in either an AO or an
OO paradigm. The choices in the pattern and paradigm dimensions have significant consequences
in that each choice calls into being a different design subspace that introduces both new dimen-
sions and constraints that are potentially scoped over other variables in the design. The structure
of a design space is thus not fixed but, in general, is contingent on prior decisions and recursive
in structure, a phenomenon that the state-of-the-art design modeling approaches do not adequately
represent. Consequently, it is difficult to analyze the structural and economic consequences of
making such high level design decisions.
Finally, the effects of design decisions are frequently not local but crosscutting. For example,
all the subjects and observers involved in an application of the observer design pattern [33] have
to respect the agreed notification policy, push or pull. Prevailing design modeling techniques do
not adequately represent design decisions with crosscutting effects. Consequently, it is difficult
to have a clear picture of the structural and economic consequences of making or changing such
crosscutting design decisions.
Jackson [40] uses Alloy for object modeling with the goal of being able to check structural
properties of object models specified using the Alloy relational logic. Garlan et al. [34, 1] used Z
to formalize architectural styles in order to prove mainly behavioral properties of systems in these
styles. Batory [9] uses formal models of software design spaces for systems that vary in component
implementations, aiming to support system generation and reuse.
We have found that Baldwin and Clark’s design rule theory [7] contains a set of concepts, models,
and analysis techniques that can shed light on important software engineering phenomena. The next
section briefly introduces the key notions and analysis models of this emerging approach, how they
relate to software engineering, and what challenges remain.
2.4 Emerging Approach to Economics of Modularity
This section introduces key notions in Baldwin and Clark’s theory that researchers have attempted
to apply to software engineering. Section 2.4.1 introduces design structure matrices (DSMs), a model
that has been used in other engineering realms and is at the heart of Baldwin and Clark’s theory of
modularity. Section 2.4.2 introduces Baldwin and Clark’s economic analysis model, called net option
value, that statistically computes the value of modularity, and its application in comparing software
designs. Section 2.4.3 explains the remaining challenges.
2.4.1 Design Structure Matrices
In work spanning several communities, including engineering systems design [25], design eco-
nomics [7], and software engineering and languages [64, 63, 49], researchers have been developing
and using explicit design space representations to support a range of novel and potentially useful
architectural analysis and decision-support techniques, including techniques that link design struc-
ture to economic value and business strategy. Much of this work has revolved around the design
structure matrix (DSM) as a representation. DSM modeling originated with the work of Steward
dating to the 1960s [62], and has been further developed and applied in the design, analysis and
management of many large-scale engineering systems by Eppinger [25] and others. DSMs are the
primary representations at the heart of Baldwin and Clark’s developing theory of the economics of
modularity [7].
DSMs present in a graphical form the pair-wise dependence structure of designs and of their cor-
responding development and evolution process. Figure 2.1 shows the DSM model for the FE design
using the OO observer pattern. The rows and columns of a DSM are labeled with design variables,
representing dimensions for which the designers must make design decisions. For example, we rep-
resent the need for a notification policy decision with a design variable policy_notify. The ab-
stract interfaces Subject and Observer are modeled by variables adt_subject and adt_observer
respectively. A marked cell indicates that the decision of the dimension on the row depends on the
decision of the dimension on the column. In Figure 2.1, the cell in row 6, column 1, indicates that
      0 1 2 3 4 5 6 7 8
0: color_policy_observing  .
1: policy_notify           .
2: policy_update           .
3: d_mapping               .  x
4: adt_observer            .  x
5: adt_subject             x x x .
6: point_elements          x x x x x .
7: line_elements           x x x x x .
8: screen_elements         x x .
Figure 2.1: OO Observer Pattern DSM
how the Point should be designed depends on the notification policy in use.
Some design decisions dominate others. Baldwin and Clark define a design
decision as a design rule [7] if it is made before, and respected by, subordinate design decisions;
is deemed stable; and can decouple otherwise coupled decisions. The decision on an abstract interface
dimension can be seen as a design rule dominating other implementation decisions. The fact that
environment conditions are outside of the designer's control is another source of dominance.
For example, the observing policy defining the states and transitions of interest, color or position,
could be a changeable part of the system specification not decided by the designer. We call variables
modeling such environment conditions environment variables. A DSM models the existence of a
dominance relation by asymmetric dependences. In Figure 2.1, the DSM models that the decision
on the notification policy dominates the decision on the implementation of the Point element by the
lack of the symmetric mark in the cell of row 1 and column 6.
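The dominance reading of asymmetric marks can be sketched directly in code. The following is our own illustration, not part of the dissertation's tooling; the mark set is a hypothetical fragment suggested by Figure 2.1, not the full DSM:

```python
def asymmetric_pairs(marks):
    """Given DSM marks as (row, column) pairs, meaning the row decision
    depends on the column decision, return the asymmetric dependences:
    pairs (a, b) where a depends on b but b does not depend on a.
    Each such pair signals that b dominates a."""
    return {(a, b) for (a, b) in marks if (b, a) not in marks}

# Hypothetical fragment: point_elements (row 6) depends on policy_notify
# (column 1), with no symmetric mark in row 1, column 6; the mutual
# marks between adt_subject and point_elements are invented for contrast.
marks = {("point_elements", "policy_notify"),
         ("point_elements", "adt_subject"),
         ("adt_subject", "point_elements")}
```

Applied to this fragment, `asymmetric_pairs(marks)` reports only `("point_elements", "policy_notify")`: the notification policy dominates the Point element, while the invented mutual dependence is filtered out as symmetric.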
Sullivan et al. [64] showed that DSM modeling in the style of Baldwin and Clark should be ex-
tended with an explicit notion of environment variables, and that, so extended, it could account for
and help designers to visualize Parnas’s information hiding criterion. They also presented early ev-
idence of the potential for such modeling to provide an account of the economic (net options) value
of modularity in software design. The more recent work of Sullivan et al. built on such analysis to
provide a critique of the notion of obliviousness in aspect-oriented software design, a comparative
economic-based analysis of alternative approaches to the use of aspect-oriented mechanisms, and a
notion of explicit interfaces for aspect-oriented design [63] along with a practical method for using
crosscutting interfaces (XPIs) in software design [37]. We seek to formalize and support automated
analysis of design questions of the kind analyzed informally and manually, using DSMs, in our
earlier work.
2.4.2 Net Option Value Analysis
To quantitatively characterize the modularity of design in engineering realms, such as computer
system design, Baldwin and Clark propose a net option value model to statistically account for
modular design phenomena [7]. Sullivan et al. [64, 63] and Lopes [49] have previously used Baldwin
and Clark's net option value analysis to quantitatively compare software designs modeled by
DSMs. This section briefly introduces the main ideas.
Suppose that a product has a market value of S0 because of its visible functionalities or properties.
The NOV model estimates the additional value added by the modular structure of its hidden
design. The idea is that modularity creates options to replace existing modules with better ones
that produce higher value, for example, because of improved speed or quality. Modularity thus
creates a portfolio of valuable real options, one per module.
This model states that splitting a design into m modules increases its base value S0 by a fraction
obtained by summing the net option values (NOV i) of the resulting options. NOV is the expected
payoff of exercising a search and substitute option optimally, accounting for both the benefits and
cost of exercising options. This model depends on a number of simplifying assumptions: for example,
it does not take into account the cost of attaining modularity, and it assumes that multiple experiments
are performed on the same module and that the values these experiments generate are normally distributed.
On the other hand, it does capture key phenomena in design. In this model, the value of a product
with m modules, and thus m embedded options, is calculated as:
V = S0 + NOV_1 + NOV_2 + ... + NOV_m, where
NOV_i = max_{k_i} { σ_i n_i^{1/2} Q(k_i) − C_i(n_i) k_i − Z_i }
For module i, σ_i n_i^{1/2} Q(k_i) is the expected benefit to be gained by accepting the best positive-valued
candidate generated by k_i independent experiments. C_i(n_i) k_i is the cost of running k_i experiments,
as a function C_i of the module complexity n_i. Z_i = Σ_{j ∈ sees_i} c n_j is the cost of changing the modules
that depend on (see) module i. The max picks the number of experiments k_i that maximizes the gain for module i.
The most important parameter for NOV analysis is technical potential, σ. The complexity, n,
and visibility cost, Z, by contrast, are derived from a given design model. Technical potential is
the expected variance on the rate of return on an investment in producing variants of a module im-
plementation. On the assumption that the prevailing implementation of a module is adequate, the
expected variance in the results of independent experiments is proportional to changes in require-
ments that drive the evolution of the module’s specification. Complexity can be measured as the
size of the artifact as a proportion of the overall system, using the number of design variables, lines
of code, etc. The visibility cost measures the cost incurred by dependences between modules.
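To make the calculation concrete, here is a minimal Python sketch of the NOV maximization. This is our own illustration, not the dissertation's Simon tool: Q(k) is estimated by Monte Carlo, and the experiment cost is assumed to be linear in complexity, C_i(n_i) = c·n_i, which is one common simplification.

```python
import math
import random

def q(k, trials=20000, seed=0):
    """Monte Carlo estimate of Q(k): the expected value of the best of k
    draws from a standard normal distribution, floored at zero (a
    negative-valued best candidate is simply not accepted)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(k))
        total += max(best, 0.0)
    return total / trials

def nov(sigma, n, z, c, max_k=10):
    """NOV_i = max over k of sigma * n**0.5 * Q(k) - C(n)*k - Z, with the
    assumed linear cost C(n) = c*n.  sigma: technical potential;
    n: module complexity; z: visibility cost from dependent modules."""
    best = 0.0  # not exercising the option at all is always available
    for k in range(1, max_k + 1):
        best = max(best, sigma * math.sqrt(n) * q(k) - c * n * k - z)
    return best
```

Under this sketch, a module with high technical potential and low visibility cost yields a positive NOV, while a module that many others depend on (large z) may yield zero, meaning its substitution option is not worth exercising.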
As Sullivan et al. pointed out [64, 63], the calculated values are not yet validated economic
projections but can only be interpreted as potentially valid indicators at present. It remains an open
challenge to justify precise estimates for real options in software design. However, as a back-of-the-envelope
model, it provides ballpark figures and useful insights. Sullivan et al. first applied this
model to KWIC designs [64], and more recently to a peer-to-peer networking protocol, HyperCast [63].
Lopes et al. applied this model to a web-service application, WineryLocator [49]. All of these works
use the NOV model as a comparative evaluation method to quantitatively compare alternative designs.
2.4.3 Challenges
DSM modeling is at the core of Baldwin and Clark's theory and analysis methods.
Although DSM modeling is powerful enough to reveal design coupling structure, make
the information hiding criterion precise, and support statistical analysis, it falls short in
the following respects:
First, we have found that building DSMs representing conceptual design structures can be error-
prone and time-consuming. Our recent work [18] has revealed errors in published DSMs. Many
of the errors are due to the difficulty of seeing transitive relations among dependencies; and others,
due to the lack of any precise definition of dependence. Recent work on matrix design represen-
tations constructs dependence structures from source code: MacCormack et al. [50] have modeled
the architectures of Mozilla and Linux using DSMs; Sangal et al. [55] have applied a commercial
product called Lattix [46] to analyze the architecture of Haystack, an information retrieval system
that has evolved over several years. In these efforts, the authors use code-level structures and de-
pendences as proxies for conceptual design structures. However, as we have pointed out, designers
frequently face design questions before coding. The number of errors we found even in small design-level
DSM models suggests that manually constructed DSMs are not an adequate pre-coding design representation.
Second, a DSM only represents design dimensions, but not concrete choices within each di-
mension or the semantics of the constraints that relate decisions across dimensions. For example,
possible choices for the notification policy could be either push or pull, each having different con-
sequences. Similar to ADL and UML, DSMs do not explicitly express these choices, nor do they
support the analysis of their consequences.
Third, there are usually multiple ways to accommodate a change, but a DSM model does not
reveal each of them explicitly. Rather, a DSM reflects the union of possible ways in which a
given change might be accommodated. Indeed, in the presence of multiple compensations, the very
meaning of a dependence mark in a DSM becomes unclear: does a mark mean "must change," "is
subject to change in some scenario," or "could be changed but does not have to be"?
In his dissertation [71], Woodward identifies the lack of support for the representation of nested
design spaces as a key shortcoming in DSM modeling. In work that aims to develop a theory of the
relationship between design structure and business strategy, he proposes to address this problem by
representing substitutable alternatives as values in inheritance hierarchies. Our work differs from
his in several ways. We provide precise semantics for dependences through our logical constraints,
and support the modeling of both complex subspaces and crosscutting constraints.
As a result, DSM modeling, despite its utility, has significant weaknesses in terms of
supporting logically precise design analysis or a rigorously formal theory of coupling in design. One
contribution of our formal work, as discussed in the following chapters, is to provide a precise
formulation of the notion of pair-wise dependence between design decisions, which is at the heart
of our method for computing DSMs having unambiguous semantics.
In summary of this section, the problem, in a nutshell, is that we continue to lack abstract
design representations that allow us to model or to reason adequately about the technical and eco-
nomic implications of dependences among complex software design decisions and relevant external
conditions.
This dissertation presents an analyzable formal design modeling framework addressing the
identified problems. The framework consists of a design representation approach and a number
of tool-supported analyses. The design representation approach models design spaces with design
dimensions, external conditions, recursive structures, and crosscutting constraints, and formally
accounts for Parnas's information hiding modularity and the key notions in Baldwin and Clark's
theory. The analysis techniques include (but are not limited to): automatic DSM derivation, linking
software designs to existing engineering tools; quantitative changeability analysis, making precise
Parnas's well-known analysis previously done qualitatively; and NOV computation.
Chapter 3
Overview of Core Modeling and Analysis Approaches
This chapter presents an informal and intuitive overview of our framework using a small example
to illustrate a full picture of the core models and analysis techniques. Figure 3.1 shows the relations
among the core models of our framework, and the automated analyses it enables.
3.1 Core Model Overview
The rounded boxes in Figure 3.1 represent the three core models representing decision-making
phenomena from different perspectives:
1. The augmented constraint network (ACN) consists of a constraint network modeling di-
mensions in which design decisions are made and constraints on decisions across these di-
mensions, and two additional data structures to formally account for the dominance relation
among design decisions, and the multiple ways a system can be clustered into modules. These
augmentations originate from corresponding notions in Baldwin and Clark’s DSM modeling.
These notions have played important roles in Baldwin and Clark’s DSM-based modularity
analysis, and we found that formalizing these notions and combining them with a constraint
network provides additional analysis power.
2. The design automaton (DA) is an operational, state-machine model that represents the dy-
namics of design variations driven by changes in design decisions. The DA model of an
Figure 3.1: Core Models and Analysis
ACN is derived from the constraint network and the dominance relation of that ACN. The
DA model enables automated design impact analysis: given an original design and a sequence
of envisioned changes, a DA model reveals the different ways to accommodate those changes.
3. The pair-wise dependence relation (PWDR) represents a summary pair-wise coupling rela-
tion on design dimensions. A PWDR summarizes the dependence relation modeled by a DA,
and is derived from the DA. Our prototype tool, Simon, can generate a DSM using the derived
PWDR to populate the matrix, and using a selected cluster to arrange the order of variables.
DSM modeling is at the heart of Baldwin and Clark's modularity reasoning, and has proven
utility for engineering and economic analyses. In this way, our framework, in principle, connects
conceptual designs modeled using ACNs with existing analyses developed in other realms.
Irwin et al. used a matrix example to illustrate how their aspect-oriented model addresses the
problems of object-oriented models [27]. The rest of this chapter models this small matrix design to
provide a full picture of ACNs, DAs, and PWDRs, and explains their connections to existing analyses.
Chapter 4 shows that our framework can uniformly account for aspect-oriented and
object-oriented modularity, which appear to be distinct.
3.2 Augmented Constraint Network
For a matrix class, the best choice for its underlying concrete data structure depends on how the
client uses the matrix [27]. An array could be the best choice for a dense matrix, and a linked
list could be the best for a sparse matrix. The algorithms that implement the class methods must
correspond to the selected data structure. In our modeling and analysis approaches, we consider
both design dimensions (data structure and algorithm) and the environment condition (the client's
demand characteristics).
Following a long line of work in design theory and design automation, we took finite-domain
constraint networks (CNs) [51, 32] as the core of our design space representation. Logical con-
straints are a natural, powerful and already well understood notation for representing design di-
mensions. However, CNs do not model a number of important design decision-making issues that
are indispensable for our analysis. In particular, CNs do not readily model the dominance relation
among design decisions, and that there are multiple ways a system can be clustered into modules.
These are the key notions in Baldwin and Clark’s DSM modeling on which their modularity theory
is based. We augment a CN with additional data structures to formally model these notions, and
call it an augmented constraint network (ACN), which enables the automatic derivation of DSMs.
3.2.1 Constraint Network
Figure 3.2 shows the small matrix design modeled using a constraint network. A constraint network
consists of a set of variables, each having a domain comprising a set of values, and the constraints
among these variables and values [51]. To model a conceptual design, each variable models a
1: scalar matrix: (dense, sparse);
2: scalar ds: (list_ds, array_ds, other);
3: scalar alg: (array_alg, list_alg, other);
4: ds = array_ds => matrix = dense;
5: ds = list_ds => matrix = sparse;
6: alg = array_alg => ds = array_ds;
7: alg = list_alg => ds = list_ds;
Figure 3.2: Matrix Constraint Network
design or relevant environmental dimension; the domain of a variable models possible choices
within each dimension; a domain comprises a set of values, each representing a possible decision
or an environmental condition.
In Figure 3.2, the scalar variables, ds and alg in lines 2 and 3, represent the dimensions of data
structure and algorithm; the matrix variable in line 1 represents the client demand. Their domains
follow within the parentheses. We use other as a value in many domains to model unelaborated
other possibilities. In Figure 3.2, for example, Line 2 models that the choices for the data structure
dimension are array, list, and other unelaborated choices.
A constraint network models the interdependence relation among design variables and environ-
ment conditions as a set of logical constraints. Line 4 in Figure 3.2, ds = array ds ⇒ matrix =
dense, states that the choice of an array is valid only if the client needs dense matrices. Logically,
the binding of the assuming variable implies the assumed binding. This might seem counterintu-
itive, but there could be other data structure choices that are also consistent with density, and we
do not want to model an overly constrained design in which array is the only choice. Thus the
implication arrows point in the opposite direction from what one might initially expect.
The variables, domains, and constraints constitute a finite-domain constraint network (FDCN),
the core of our framework.
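For illustration, the FDCN of Figure 3.2 can be encoded directly and its solution set enumerated by brute force. This is our own sketch, not the dissertation's tooling; the names other_ds and other_alg spell out the other values of lines 2 and 3:

```python
from itertools import product

# Domains from lines 1-3 of Figure 3.2.
domains = {
    "matrix": ["dense", "sparse"],
    "ds":     ["list_ds", "array_ds", "other_ds"],
    "alg":    ["array_alg", "list_alg", "other_alg"],
}

def consistent(v):
    """Constraints from lines 4-7 of Figure 3.2, as implications
    (an implication a => b holds when a is false or b is true)."""
    return ((v["ds"] != "array_ds" or v["matrix"] == "dense") and
            (v["ds"] != "list_ds" or v["matrix"] == "sparse") and
            (v["alg"] != "array_alg" or v["ds"] == "array_ds") and
            (v["alg"] != "list_alg" or v["ds"] == "list_ds"))

def design_space():
    """All valuations that satisfy every constraint: the valid designs."""
    names = list(domains)
    return [dict(zip(names, vs))
            for vs in product(*domains.values())
            if consistent(dict(zip(names, vs)))]
```

Of the 18 candidate valuations, exactly six survive the constraints, matching the valid designs listed later in Table 3.1.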
3.2.2 Augmentation 1: Dominance Relation
However, constraint networks alone are not sufficient to model important design phenomena. Bald-
win and Clark’s theory proposes an important concept called design rules (DRs), and explains how
DRs decouple dependences among design decisions [7], which is essential to achieving a modular
structure. Sullivan et al. [64, 63], Griswold et al. [37], and Lopes et al. [49] have applied Baldwin
and Clark’s reasoning to software designs. The essence of the design rule concept is asymmetric
dominance. An abstract interface is an instance of a design rule in software design: the designer
of an interface may prevail upon the implementer to conform to the interface specification, while
the implementer does not have the right to change the interface. Environment conditions are often
outside of the designers' sphere of influence, which represents another instance of asymmetric
dominance. The client demand condition matrix is such an example.
To address the problem that CNs do not readily model the asymmetric dominance relation, we
augment a CN model with a binary relation, dominance. (x,y) ∈ dominance indicates that, due to
policy or lack of control, changes in x cannot be compensated for by changes in y (even if changes
in y can be accommodated by changes in x). In the matrix example, assuming that the client’s need
dominates and that the design decisions must adapt accordingly, the matrix dominance relation thus
includes the following two pairs: (ds, matrix) and (alg, matrix).
3.2.3 Augmentation 2: Clustering
The module is another essential concept in software design. Parnas's information hiding principle proposes
a criterion for decomposing a system into modules; Baldwin and Clark's modularity theory
uses a concept called proto-modules to denote a state where variables are aggregated as clusters,
but there are still dependences among these clusters. Their theory then explains how design rules
work by breaking the dependences among proto-modules to generate true modules—modules that
only depend on design rules, but not on “hidden” variables in other modules.
These theories center around operations on modules, a concept that a constraint network does
not lend itself to modeling. To address this problem, we also augment a CN model with an addi-
tional structure, cluster, to express the a priori clustering of subsets of variables into proto-modules.
The same design can have different clustering methods, reflecting different stakeholders’ views of
the design. The design decisions within a class, for instance, are usually considered together. On
the other hand, parts of a class may collaborate with parts from other classes to implement a feature,
which can also be viewed as a module. As a result, a CN can be associated with multiple clusters, that
is, a cluster set.
Figure 3.3: Matrix ACN model
3.2.4 Core Design Description Model Summary
We call a constraint network augmented with a dominance relation and a cluster set an augmented
constraint network (ACN). Figure 3.3 shows the matrix ACN developed using our tool, Simon. Si-
mon allows the user to input the dominance relation through a grid GUI, as shown in Figure 3.3 (B).
Figure 3.3 (C) shows that there are two clusters in the matrix cluster set: Cluster1 and Cluster2; the
selected cluster Cluster2 contains two modules: environment and design; the environment module
contains the matrix variable; the design module contains the other two variables.
An ACN description of a conceptual design is general and abstract, and can be used at any level
of detail, from high-level specifications and architectural decisions to extremely detailed ones. It
captures the notion of design as a decision-making problem under constraints, and spans both design
variables and environment variables (which are not formally different from design variables). Many
concerns can naturally be represented as variables: security policy, choices of function or class
names and signatures, choices of design patterns to use, etc. These concerns play important roles
in software evolution, but prevailing design modeling methods are not designed to model them.
In addition to a dominance relation and a cluster set, other non-logical data structures could
be added to the core model. For example, Baldwin and Clark’s net option value (NOV) model
requires additional parameters, such as the technical potential of each module (cluster). Simon
supports NOV computation by providing a GUI in which the user can input estimated parameters
for a derived DSM, and compute its NOV value automatically. While the FDCN concept is limited,
we view it as a reasonable starting point for a formal account of coupling in design, viewed as a
decision-making activity.
A logic-based design description alone is not sufficient for reasoning about design evolvability and
economic properties. Section 3.3 introduces a derived design evolution model that represents the
change dynamics within a design space, based on the propagation of changes through the constraint
network of an ACN. Section 3.4 introduces a derived pair-wise dependence model. Section 3.5
explains how these models connect to existing analysis techniques.
3.3 Operational Design Space Evolution Model
From an ACN model, we derive an evolution model called a design automaton (DA). The state set
of a DA is the design space implied by an ACN; the transitions of a DA model the design variation
constrained by the ACN. Figure 3.4 shows the full picture of the matrix DA.
3.3.1 Design Spaces
An ACN model gives rise to several notions that lead to the concept of a design space. The binding of
a value to a variable models a design decision or an environmental condition. An assignment is a
set of bindings, modeling a set of given decisions or environment conditions. For example, {matrix
= dense, ds = array_ds} is an assignment. A valuation is an assignment involving all the variables
in the ACN: {matrix = dense, ds = array_ds, alg = array_alg} is a valuation.
A valuation of the variables of an ACN, that is, a binding of values to variables, satisfies a
constraint if and only if its projection onto the variables of that constraint is consistent with at least
Figure 3.4: The Matrix Design Automaton
Table 3.1: Matrix Design Space
S0: matrix = sparse, ds = other_ds, alg = other_alg
S1: matrix = dense, ds = other_ds, alg = other_alg
S2: matrix = sparse, ds = list_ds, alg = other_alg
S3: matrix = sparse, ds = list_ds, alg = list_alg
S4: matrix = dense, ds = array_ds, alg = other_alg
S5: matrix = dense, ds = array_ds, alg = array_alg
one permitted assignment of that constraint. For example, the valuation {matrix = dense, ds =
array_ds, alg = array_alg} satisfies the constraint ds = array_ds ⇒ matrix = dense because one of
that constraint's permitted assignments, {matrix = dense, ds = array_ds}, is a subset of the valuation. A valid
design is a solution to the constraint network, that is, a valuation that satisfies all the constraints
defined in the ACN. All the valid designs constitute a design space as modeled by a given ACN.¹
Table 3.1 presents all the valid designs within the matrix design space. The designs are numbered
and constitute the state set of the matrix DA. Figure 3.4 illustrates these ideas.
3.3.2 Change Dynamics
Changing the value of one design decision can produce a valuation that violates one or more con-
straints. For example, if we start with the design S5 in Table 3.1, {(matrix = dense), (ds = array_ds),
(alg = array_alg)}, and change the data structure, ds, to list_ds, the resulting state violates a constraint,
producing an invalid design state. If such an invalidating change to a given decision is
forced, then, in general, the values of some subset of other variables will have to change in order to
restore the design to a consistent design state. In this case, both ds and alg have to be changed.
Figure 3.5 depicts the part of the matrix DA in which all changes originate from design S5, illustrating
three key properties of a DA:
1. We require that each transition in a DA be minimal. That is, each destination state differs only
minimally from the source state, in the sense that no constituent change could be undone
while still preserving consistency. In Figure 3.5, starting with S5, if ds is changed to other_ds,
¹In general, there are many possible dimensions for a given set of requirements outside of the space modeled by an ACN. Baldwin and Clark use the term design space to refer to the larger space of all possibilities. In this sense, an ACN is an explicit representation of a subspace of interest.
Figure 3.5: Partial Matrix Design Automaton
then there are at least two designs that can accommodate this change: S0: {matrix = sparse,
ds = other_ds, alg = other_alg} and S1: {matrix = dense, ds = other_ds, alg = other_alg}.
Changing alg to other_alg in both S0 and S1 is indispensable, but changing matrix to sparse
in state S0 is not. We consider the transition from S5 to S1, labeled with {ds = other_ds}, as
minimal, while the dotted-arrow transition from S5 to S0 with the same label is invalid. As a
result, each transition in a DA models a minimal design perturbation.
2. A DA is nondeterministic. In general, there are multiple ways to accommodate a change. In
Figure 3.5, starting from state S5, {(matrix = dense), (ds = array_ds), (alg = array_alg)},
changing the client preference to sparse makes the design inconsistent. Making a set of
minimal changes to other variables to restore consistency leads to states S0, S2, or S3.
3. No transition in a DA may violate the dominance relation. If (x,y) ∈ dominance, then among
all the possible ways to restore consistency in the face of a change to x, those involving y are
excluded. For the matrix example, because (ds, matrix) ∈ dominance, the transition starting
from S5, triggered by changing ds to list_ds, and leading to the client change in S3 (the dotted
arrow labeled ds = list_ds) is precluded.
In summary, a DA captures all of the possible ways in which any change to any decision in any
state of a design can be compensated for by changes to minimal subsets of other decisions.
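The three properties above can be captured in a brute-force sketch. This is our own illustration; it restates the Figure 3.2 encoding so the fragment is self-contained, and spells the other values as other_ds/other_alg following Table 3.1:

```python
from itertools import product

# Domains and constraints of Figure 3.2.
domains = {
    "matrix": ["dense", "sparse"],
    "ds":     ["list_ds", "array_ds", "other_ds"],
    "alg":    ["array_alg", "list_alg", "other_alg"],
}
# (x, y) in dominance: a change to x may not be compensated by changing y.
dominance = {("ds", "matrix"), ("alg", "matrix")}

def consistent(v):
    return ((v["ds"] != "array_ds" or v["matrix"] == "dense") and
            (v["ds"] != "list_ds" or v["matrix"] == "sparse") and
            (v["alg"] != "array_alg" or v["ds"] == "array_ds") and
            (v["alg"] != "list_alg" or v["ds"] == "list_ds"))

def design_space():
    names = list(domains)
    return [dict(zip(names, vs)) for vs in product(*domains.values())
            if consistent(dict(zip(names, vs)))]

def transitions(state, var, value):
    """Minimal, dominance-respecting destination states after forcing
    state[var] = value (properties 1-3 above)."""
    forbidden = {y for (x, y) in dominance if x == var}
    cands = []
    for t in design_space():
        if t[var] != value:
            continue
        changed = {n for n in t if t[n] != state[n]} - {var}
        if changed & forbidden:
            continue  # property 3: the dominance relation is respected
        cands.append((t, changed))
    # property 1: discard any destination whose perturbation strictly
    # contains another candidate's perturbation
    return [t for t, ch in cands if not any(c < ch for _, c in cands)]
```

From S5, `transitions(S5, "matrix", "sparse")` yields the three states S0, S2, and S3 (property 2), while `transitions(S5, "ds", "list_ds")` yields nothing at all, since every repair would have to change the dominating matrix variable (property 3).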
3.4 Pair-wise Dependence Relations
Pair-wise Dependence Relations (PWDRs) underlie many influential design representations. In
box-and-arrow style representations, such as ADL, UML, Call Graph, and Reflexion Models [53],
the arrows model different kinds of pair-wise dependence relations among boxes, such as function
calls, inheritance, and system I/O. DSMs, a special case of PWDR on design decisions, are the core
data structure underlying Baldwin and Clark’s theory.
Based on the DA model, we contribute a precise definition of what it means for one variable
to depend on another, enabling the automated derivation of PWDRs from DAs. Intuitively, for some
consistent design state s in a DA, if there is some change to a variable, x, such that the value of another
variable, y, is changed in some minimally perturbed destination state s′ of the DA, we say that
y depends on x. We define the
coupling structure of a design ACN as the pair-wise dependence relation (PWDR) over all of its
variables: if y depends on x, then (x,y) ∈ PWDR.
We have shown that if the original design is S5: {(matrix = dense), (ds = array_ds), (alg =
array_alg)} and the envisioned change in the client is (matrix = sparse), there are three new designs
in its DA accommodating this change:
S0: {(matrix = sparse), (ds = other_ds), (alg = other_alg)},
S2: {(matrix = sparse), (ds = list_ds), (alg = other_alg)}, or
S3: {(matrix = sparse), (ds = list_ds), (alg = list_alg)}.
Comparing the original design S5 with any of these new designs, we observe that both ds and alg
are involved in the minimal perturbations caused by the change to matrix. That is, ds depends on
matrix and alg depends on matrix. A similar analysis concludes that ds and alg depend on each other.
As a result, the matrix PWDR is the following set:
{(matrix,ds),(matrix,alg),(ds,alg),(alg,ds)}.
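This derivation can be mechanized. The sketch below is our own illustration; it restates the matrix encoding and the minimal-perturbation transition computation so that it is self-contained, then exercises every change in every consistent state and collects the induced pairs:

```python
from itertools import product

domains = {
    "matrix": ["dense", "sparse"],
    "ds":     ["list_ds", "array_ds", "other_ds"],
    "alg":    ["array_alg", "list_alg", "other_alg"],
}
dominance = {("ds", "matrix"), ("alg", "matrix")}

def consistent(v):
    return ((v["ds"] != "array_ds" or v["matrix"] == "dense") and
            (v["ds"] != "list_ds" or v["matrix"] == "sparse") and
            (v["alg"] != "array_alg" or v["ds"] == "array_ds") and
            (v["alg"] != "list_alg" or v["ds"] == "list_ds"))

def design_space():
    names = list(domains)
    return [dict(zip(names, vs)) for vs in product(*domains.values())
            if consistent(dict(zip(names, vs)))]

def transitions(state, var, value):
    """Minimal, dominance-respecting destinations after forcing
    state[var] = value."""
    forbidden = {y for (x, y) in dominance if x == var}
    cands = []
    for t in design_space():
        if t[var] != value:
            continue
        changed = {n for n in t if t[n] != state[n]} - {var}
        if changed & forbidden:
            continue
        cands.append((t, changed))
    return [t for t, ch in cands if not any(c < ch for _, c in cands)]

def pwdr():
    """(x, y) is in the PWDR iff, in some consistent state, a change to x
    forces a change to y in some minimally perturbed destination."""
    dep = set()
    for s in design_space():
        for x in domains:
            for v in domains[x]:
                if v == s[x]:
                    continue
                for t in transitions(s, x, v):
                    dep |= {(x, y) for y in t if y != x and t[y] != s[y]}
    return dep
```

Running this sketch reproduces the four-pair matrix PWDR; note the absence of (ds, matrix) and (alg, matrix), which the dominance relation forbids.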
3.5 Connections to Evolvability and Economic Analysis
The three core models, ACNs, DAs, and PWDRs, connect to (but are not limited to) the following
well-known evolvability and economic analyses, providing the foundation for automating these
analyses with tools.
1. Parnas’s changeability analysis. Given a software design, what are all the ways to compensate
for an anticipated sequence of individual changes? The question can be formulated as a map-
ping from a DA, an assignment modeling the current design, and a sequence of variable-value
pairs that model a sequence of changes to individual design decisions, to a set of sequences
of consistent design states modeling the feasible evolution paths for the given sequence of
changes. To compute this set of paths, we find the paths that start from the initial design and
go along the edges labeled with specified changes. Each path represents one way to com-
pensate for the given changes. The destination states are the new designs accommodating
the given changes. Chapter 7 formalizes these models, this problem, and its solution. Simon
automates this analysis based on the formalization. Chapter 4 shows how Simon automates
Parnas’s changeability analysis on KWIC precisely.
2. Parnas's information hiding criterion. Sullivan et al. [64] previously observed that in an information hiding design, the design rules are invariant with respect to changes in the environment and that such changes should be accommodated by changes to hidden (subordinate) design variables. After clustering the variables representing external conditions
into an environment module, and clustering all the design rules into a design rule module, we
are able to formalize this principle as a predicate stating that a PWDR derived from an ACN
should not have any pair with a first element in the environment module, and the second in
the design rule module. This influential principle thus becomes a formal and mechanically
checkable criterion.
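This predicate is simple enough to sketch directly. In the following illustration, the module memberships and the example PWDR pairs are hypothetical; only the shape of the check reflects the criterion described above:

```python
# Information hiding criterion as a mechanically checkable predicate:
# no PWDR pair may lead from an environment variable to a design rule.
def satisfies_ih_criterion(pwdr, environment, design_rules):
    """True iff no environment change can ripple into a design rule."""
    return not any(x in environment and y in design_rules
                   for (x, y) in pwdr)

# Illustrative module assignments (not the dissertation's full clustering).
env = {"envr_input_size", "envr_alph_policy"}
rules = {"linestorage_ADT", "alph_ADT"}

good = {("envr_input_size", "linestorage_ds")}   # change hits a hidden variable
bad = good | {("envr_alph_policy", "alph_ADT")}  # change hits a design rule

print(satisfies_ih_criterion(good, env, rules))  # True
print(satisfies_ih_criterion(bad, env, rules))   # False
```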
3. Design structure matrix analysis. A DSM, as introduced in Chapter 2, can be seen as com-
posed of a PWDR and an a priori clustering of variables. This framework provides rigorous
semantics for DSMs, and enables their automated generation from precise logical models.
Figure 3.6: Matrix DSM Generated by Simon
A PWDR derived from an ACN can be used to populate a DSM, and a cluster of the ACN
can be used to express the order in which the rows and columns are presented. Figure 3.6
is the DSM that Simon generates from the matrix ACN model. Chapter 4 shows how the
derived DSMs reveal both errors in published models and an issue overlooked by Baldwin
and Clark’s theory.
There are many existing analytical techniques and tools developed around DSMs in other
engineering realms, such as DeMAID/GA [42]. These tools analyze design architectures for
project scheduling, cyclic dependence detection, and so forth. In principle, our framework
connects conceptual designs expressed as ACNs with these existing engineering analysis
techniques, but this dissertation does not go further in this dimension.
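The population step can be sketched as follows; the ordering convention (rows depend on columns) and the tiny example PWDR are assumptions for illustration, not Simon's actual layout:

```python
# A minimal sketch of DSM population: the clustering fixes the row/column
# order, and each PWDR pair (x, y) marks the cell at row y, column x,
# i.e., row y depends on column x. The axis convention is an assumption.
def build_dsm(pwdr, clusters):
    order = [v for cluster in clusters for v in cluster]
    idx = {v: i for i, v in enumerate(order)}
    n = len(order)
    dsm = [["." if i == j else " " for j in range(n)] for i in range(n)]
    for (x, y) in pwdr:
        dsm[idx[y]][idx[x]] = "x"  # y depends on x
    return order, dsm

# Illustrative input: the matrix example's PWDR, environment cluster first.
clusters = [["matrix"], ["ds", "alg"]]
pwdr = {("matrix", "ds"), ("matrix", "alg"), ("ds", "alg"), ("alg", "ds")}
order, dsm = build_dsm(pwdr, clusters)
for name, row in zip(order, dsm):
    print(f"{name:8} {' '.join(row)}")
```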
4. Net option value computation. Chapter 2 introduced Baldwin and Clark’s net option value
model based on DSMs, and its application in software engineering. Simon supports the
association of NOV parameters with automatically derived DSMs, and computes NOV values
automatically.
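In the spirit of that computation, the option value of experimenting on a module can be sketched with Monte Carlo estimation: running k independent redesigns and keeping the best outcome (or the status quo, worth 0) is worth E[max(0, X1, ..., Xk)] with each Xj drawn from N(0, sigma² · n), minus experimentation and visibility costs. The parameter names and the linear cost model below are illustrative assumptions, not Simon's implementation:

```python
import random

# Hedged Monte Carlo sketch of the option value of module experimentation.
def module_nov(sigma, n, cost_per_experiment, visibility_cost,
               max_k=8, trials=20000, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    best = 0.0  # the option is never worth less than doing nothing
    for k in range(1, max_k + 1):
        gains = 0.0
        for _ in range(trials):
            draws = [rng.gauss(0.0, sigma * n ** 0.5) for _ in range(k)]
            gains += max(0.0, max(draws))
        value = gains / trials - cost_per_experiment * k - visibility_cost
        best = max(best, value)
    return best

# A module with higher technical potential supports more valuable experiments.
low = module_nov(sigma=0.1, n=1.0, cost_per_experiment=0.05, visibility_cost=0.0)
high = module_nov(sigma=1.0, n=1.0, cost_per_experiment=0.05, visibility_cost=0.0)
print(low < high)  # True
```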
3.6 Chapter Summary
In summary, this chapter has introduced the full picture of the three core models of our frame-
work using a small Matrix example: (1) the augmented constraint network (ACN) is a declara-
tive, constraint-based design model that describes dimensions in which design decisions must be
made and constraints on decisions across these dimensions; (2) the design automaton (DA) is an
operational model that represents the dynamics of design variations driven by changes in design
decisions; and (3) the pair-wise dependence relation (PWDR) represents a summary pair-wise cou-
pling relation on design dimensions. Of these three models, the ACN is primary, while the DA and
PWDR models are derived from a given ACN. The DA and PWDR models link conceptual designs
with existing evolvability and economic analysis techniques: the DA model enables quantitative changeability analysis, and the PWDR model enables the derivation of DSMs, which have proven
utility for engineering and economic analyses.
Chapter 4
Modeling and Analysis of a Benchmark Design
Parnas's KWIC (Key Word in Context) index system is a well-established benchmark for assessing concepts in software design. Sullivan et al. [64] presented an informal study
in which they tested the applicability of Baldwin and Clark’s theory to, and its potential value for,
software architectural design. Their experimental method was to test the theory’s ability to ac-
count for Parnas’s notion of, and his conclusions concerning, information hiding as a criterion for
modularizing designs. They thus conducted what amounted to a replication study of Parnas’s Key
Word in Context (KWIC) examples. They developed DSM models of Parnas's KWIC designs, derived option values from the DSMs using Baldwin and Clark's net option value models,
compared the results against Parnas’s conclusions, and found that the theory of Baldwin and Clark
made predictions consistent with the conclusions that Parnas had previously reached: the informa-
tion hiding criterion can add significant economic value to designs. This chapter presents the formal
replication of their informal study to evaluate the following claims:
1. Our framework provides a formal basis for (1) Baldwin and Clark's key notions of design dimension, design decision, design decision dependence, and design space, and (2) Sullivan et al.'s formulation of Parnas's notion of information hiding (as invariance of design rules with respect to changes in environment parameters), in a rigorously precise form that is checkable by automated tools. We evaluate this claim by constructing the formal models of
KWIC systems.
2. Our framework enables evolvability analysis of precise and abstract representations of design
architectures. To evaluate this claim, we formally model the five possible changes Parnas postulates as decision problems, and then use the Simon design impact analysis GUI to reveal the differences between the two designs with respect to each change. We find that Parnas's analysis results are confirmed quantitatively.
3. Our framework enables the automatic derivation of DSMs. Sullivan et al. previously con-
structed the DSMs for KWIC designs, based on which they applied Baldwin and Clark’s net
option value analysis and revealed Parnas’s information hiding criterion visually. We derive
DSMs using Simon for each design and find that the derived DSMs reveal the same informa-
tion hiding observation.
4. Our framework clarifies the notion of pair-wise dependence and makes the derived models more reliable. We found that manual DSM construction and NOV computation took considerable effort and still left many ambiguities in the models. We compare the derived DSMs with Sullivan et al.'s manually constructed DSMs, and the comparison reveals several errors and ambiguities in the published manual models, showing the power of formal models and automated analysis.
5. Our framework automates Baldwin and Clark’s net option value analysis. Since there are
inconsistencies between the manually-constructed and the automatically derived DSMs, are
the NOV values we calculated based on the manual models still valid? We found that although
the comparative results are still valid, the NOV value for each model does change.
6. Our framework provides a more reliable basis for Baldwin and Clark’s modularity in design
analysis. Our experiment reveals an issue in Baldwin and Clark's net option value computation when it is based on manually constructed DSMs.
Section 4.1 introduces the Key Word in Context system using its standardized model [57].
Section 4.2 introduces in detail how the two KWIC designs are modeled by ACNs. Section 4.3
through Section 4.5 present the analysis results.
Figure 4.1: KWIC Sequential Design Architecture [57]
4.1 Key Word In Context
In his seminal paper [54], Parnas describes the KWIC (Key Word in Context) index system as
follows:
“The KWIC index system accepts an ordered set of lines, each line is an ordered set of words,
and each word is an ordered set of characters. Any line may be “circularly shifted” by repeatedly
removing the first word and appending it at the end of the line. The KWIC index system outputs a
listing of all circular shifts of all lines in alphabetical order.”
Shaw et al. have standardized the architectural representations of the two KWIC designs Parnas presents [57]. This box-and-arrow style representation, as shown in Figures 4.1 and 4.2, models functions, shared data structures, and I/O media as blocks, and models direct memory access, function calls, and system I/O as arrows. According to these figures, in the first, sequential design (SD), modules correspond to steps in the sequential transformation of inputs to outputs. The SD design decomposes the system according to the four basic functions performed (Input, Circular Shift, Alphabetizing, and Output) plus a Master Control (main) module.
In the second information hiding (IH) design, modules decouple design decisions deemed com-
plex or likely to change. Figure 4.2 represents the following modules as boxes:
Figure 4.2: KWIC Information Hiding Design Architecture [57]
• The Linestorage module holds all characters from all words and lines.
• The Input module reads the data from a file and stores it in the Linestorage module.
• The Circularshift module produces circular shifts of lines and stores them in the Linestorage module.
• The Alphabetizing module sorts circular shifts alphabetically.
• The Output module prints the sorted shifts.
• The Master control module controls the sequence of method invocations in other modules.
In contrast to the sequential design, in the IH design the data is not shared directly among the computational components. Instead, the IH design uses abstract data type (ADT) interfaces to decouple key design
decisions involving data structure and algorithm choices so that they can be changed without unduly
expensive ripple effects. For example, the Linestorage module provides the public interface that
allows other modules to set a character in a particular word in a particular line, read a specific
character, read, set or delete a particular word in a specific line, read a whole line at once, etc.
4.2 ACN KWIC Models
This section explains how we used Simon to model Parnas's KWIC designs as ACN models by identifying variables and values, constraints, and dominance relations from Parnas's prose [54], and by clustering these variables in different ways.
4.2.1 Variables and Values
Figure 4.3 shows our KWIC SD constraint network model. In this design Parnas views each in-
terface as providing two parts: an exported data structure and a function signature to be invoked
by the Master Control module. Given choices for these parameters, programmers produce function
implementations. As a result, we modeled the choices of function signature, data structure, and
implementation as design variables. For example, the Input module is modeled by three variables:
input_sig, input_ds and input_impl. As shown in Figure 4.3, variables ending with “_sig”
model the function signatures. The choices of implementations are modeled by the variables end-
ing with “_impl”. The choices of data structures are modeled by the variables ending with “_ds”.
Parnas assumes original designs in each case and analyzes the impact of changes.
We use orig (short for original) to represent the currently selected design decision in a given dimension. There are many cases in which designers do not need to think in terms of choices from a small, finite domain: for example, the implementation of a class. However, once the designer decides how to implement a class, a decision has been made implicitly, and there are always new ways to implement the class, reflecting new decisions. As a result, we use {orig, other} as the default domain for design dimensions without simple discrete choices. For example, input_sig has domain {orig, other}.
Figure 4.4 shows the constraint network for the IH design. A new module, Line Storage, is present.
Its data structure variable linestorage_ds replaces the input_ds of the sequential design. The
IH Input module has no separate data structure. In the IH design, each module is also equipped
with an abstract data type interface, modeled by variables ending with “_ADT”. We model module
implementations and data structures in the same way.
1: envr_input_format:{orig,other};
2: envr_input_size:{small,medium,large};
3: envr_core_size:{small,large};
4: envr_alph_policy:{once,partial,search};
5: input_sig:{orig,other};
6: circ_sig:{orig,other};
7: alph_sig:{orig,other};
8: output_sig:{orig,other};
9: master_sig:{orig,other};
10: input_ds:{other,core4,disk,core0};
11: circ_ds:{index,copy,other};
12: alph_ds:{orig,other};
13: output_ds:{orig,other};
14: input_impl:{orig,other};
15: circ_impl:{orig,other};
16: alph_impl:{orig,other};
17: output_impl:{orig,other};
18: master_impl:{orig,other};
19: input_impl = orig => input_sig = orig && input_ds = core4;
20: circ_impl = orig => circ_sig = orig;
21: alph_impl = orig => alph_sig = orig;
22: output_impl = orig => output_sig = orig;
23: master_impl = orig => master_sig = orig;
24: circ_impl = orig => circ_ds = index;
25: alph_impl = orig => alph_ds = orig;
26: output_impl = orig => output_ds = orig;
27: master_impl = orig => input_sig = orig;
28: alph_impl = orig => circ_ds = index;
29: alph_ds = orig => circ_ds = index;
30: alph_impl = orig => input_ds = core4;
31: circ_impl = orig => input_ds = core4;
32: circ_ds = index => input_ds = core4;
33: circ_ds = copy => input_ds = core4;
34: output_impl = orig => input_ds = core4;
35: output_impl = orig => alph_ds = orig;
36: alph_ds = orig => input_ds = core4;
37: input_ds = core4 => envr_input_size = medium || envr_input_size = small;
38: input_ds = core0 => envr_input_size = small && envr_core_size = large;
39: input_ds = disk => envr_input_size = large;
40: circ_ds = copy => envr_input_size = small || envr_core_size = large;
41: input_impl = orig => envr_input_format = orig;
42: alph_impl = orig => envr_alph_policy = once;
43: master_impl = orig => circ_sig = orig;
44: master_impl = orig => alph_sig = orig;
45: master_impl = orig => output_sig = orig;
46: alph_ds = orig => envr_alph_policy = once;
Figure 4.3: KWIC Sequential Design Constraint Network
1: envr_input_format:{orig,other};
2: envr_input_size:{small,medium,large};
3: envr_core_size:{small,large};
4: envr_alph_policy:{once,partial,search};
5: input_ADT:{orig,other};
6: linestorage_ADT:{orig,other};
7: circ_ADT:{orig,other};
8: alph_ADT:{orig,other};
9: output_ADT:{orig,other};
10: master_ADT:{orig,other};
11: linestorage_ds:{core0,core4,disk,other};
12: circ_ds:{copy,index,other};
13: alph_ds:{orig,other};
14: output_ds:{orig,other};
15: linestorage_impl:{orig,other};
16: input_impl:{orig,other};
17: circ_impl:{orig,other};
18: alph_impl:{orig,other};
19: output_impl:{orig,other};
20: master_impl:{orig,other};
21: linestorage_impl = orig => linestorage_ADT = orig && linestorage_ds = core4;
22: input_impl = orig => input_ADT = orig;
23: circ_impl = orig => circ_ADT = orig && circ_ds = index;
24: alph_impl = orig => alph_ADT = orig && alph_ds = orig;
25: output_impl = orig => output_ADT = orig && output_ds = orig;
26: master_impl = orig => master_ADT = orig && linestorage_ADT = orig && input_ADT = orig && circ_ADT = orig && alph_ADT = orig && output_ADT = orig;
27: alph_impl = orig => circ_ADT = orig && linestorage_ADT = orig;
28: circ_impl = orig => linestorage_ADT = orig;
29: input_impl = orig => linestorage_ADT = orig;
30: output_impl = orig => linestorage_ADT = orig && alph_ADT = orig;
31: linestorage_ds = core4 => envr_input_size = medium || envr_input_size = small;
32: linestorage_ds = core0 => envr_input_size = small && envr_core_size = large;
33: linestorage_ds = disk => envr_input_size = large;
34: circ_ds = copy => envr_input_size = small || envr_core_size = large;
35: alph_ds = orig => envr_alph_policy = once;
36: input_impl = orig => envr_input_format = orig;
37: alph_impl = orig => envr_alph_policy = once;
Figure 4.4: KWIC Information Hiding Design Constraint Network
Next we identify and model several critical dimensions in Parnas’s analysis. The sentence: “this
module reads the data lines from the input medium and stores them in core for processing by the
remaining modules. The characters are packed four to a word. . . ” implies a possible choice for the
input data structure dimension (modeled by input_ds in the SD design, and by linestorage_ds
in the IH design): a choice to pack four to a word. Similarly, the sentences: “[i]n cases where we
are working with small amounts of data it may prove undesirable to pack the characters; time will
be saved by a character per word layout. In other cases we may pack, but in different formats.”
and “for large jobs it may prove inconvenient or impractical to keep all the lines in core. . . .” imply
two other choices for the input data structure dimension: unpacked or disk storage. We model these
choices as a domain shared by input_ds in the SD ACN and linestorage_ds in the IH ACN:
{core4, core0, disk, other}.
These sentences also imply an important environment condition, input size, and its possible values: small (fits packed in a small memory or unpacked in a large memory), medium (fits
in either memory if packed), or large (too big even for a large memory). In both Figure 4.3 and
Figure 4.4, this dimension is modeled as envr_input_size:{small, medium, large}. Simi-
larly, “Again, for a small index or a large core, writing them out may be the preferable approach.”,
implies a variable envr_core_size:{small, large}.
According to Parnas's statements on the circular shift module: ". . . it prepares an index. . . it leaves its output in core. . . ," we identify a choice in the circ_ds dimension: index. From the
sentence: “for a small index or a large core, writing them out [copying] may be . . . preferable
[to indexing]. . . ”, we have another value for variable circ_ds: copy. As a result, the circ_ds
variable has a domain: {index, copy, other}.
4.2.2 Constraints
We represent the relationships among these design decisions as logical constraints, expressing the
conditions under which various decisions are valid.
In the SD design, function implementations make assumptions about both the function signatures and the relevant data structures. For example, the circular shift function implementation (circ_impl)
has to know the circular shift function signature (circ_sig) and how the circular shift data
(circ_ds) is arranged in core. According to Parnas, in the original design circ_ds = index.
Lines 20 and 24 in Figure 4.3 model these constraints. To implement this function, it also has to
know the data structure of the Input module. In the current design, the characters are packed four to
a word, which is modeled as input_ds = core4. The constraint is modeled in Figure 4.3 line 32.
In the IH design, a module only knows the ADTs of other modules. For example, the circular
shift implementation (circ_impl) now assumes the linestorage_ADT, but not the line storage
data structure, as shown in Figure 4.4 line 28. As another example, lines 31, 32, and 33 in Figure 4.4 model that the effort to store data on disk is worthwhile only for large inputs; the choice to store data unpacked works only for small inputs and large memories; and the choice to pack data makes sense only for small and medium input sizes.
These environment conditions, design dimensions, possible choices within each dimension, and
their internal constraints are fundamental to Parnas’s changeability analysis. However, prevailing
box-and-arrow style representations, such as the ADL figures, are not designed to model them, nor
to enable rigorous and automated analyses.
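Such implication constraints are nevertheless mechanically checkable. The following sketch encodes only the three input_ds constraints (lines 37 through 39 of the SD network in Figure 4.3) and enumerates the consistent assignments by brute force; the encoding style is an illustration, not Simon's internal representation:

```python
from itertools import product

# Domains for the three variables involved in SD constraints 37-39.
domains = {
    "envr_input_size": ["small", "medium", "large"],
    "envr_core_size": ["small", "large"],
    "input_ds": ["core4", "core0", "disk", "other"],
}
constraints = [
    # input_ds = core4 => envr_input_size = medium || envr_input_size = small
    lambda s: s["input_ds"] != "core4"
              or s["envr_input_size"] in ("medium", "small"),
    # input_ds = core0 => envr_input_size = small && envr_core_size = large
    lambda s: s["input_ds"] != "core0"
              or (s["envr_input_size"] == "small"
                  and s["envr_core_size"] == "large"),
    # input_ds = disk => envr_input_size = large
    lambda s: s["input_ds"] != "disk" or s["envr_input_size"] == "large",
]

names = list(domains)
consistent = [dict(zip(names, v)) for v in product(*domains.values())
              if all(c(dict(zip(names, v))) for c in constraints)]

# Every consistent state that stores data on disk has a large input:
print(all(s["envr_input_size"] == "large"
          for s in consistent if s["input_ds"] == "disk"))  # True
```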
4.2.3 Dominance Relation
In the SD design, Parnas noted: "All of the interfaces between the four modules must be specified before work could begin..." This sentence implies that the choices of function signatures and data structures dominate the other design variables. Consequently, the SD dominance relation includes pairs like (input_impl, input_sig), (input_impl, input_ds), etc. Similarly, in the IH case, the choices of ADT interface definitions dominate the other decisions, and pairs like (linestorage_ds, linestorage_ADT) are thus included in the IH dominance relation.
The interfaces and data structures in the SD ACN, and the ADT interfaces in the IH ACN, are design rules. In both designs, we assume that the environmental conditions are beyond the designers' control. Accordingly, (linestorage_ds, envr_input_size),
(linestorage_ds, envr_core_size), etc., are included in the dominance relations of both
ACNs.
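One way to represent and use such a relation can be sketched as follows; the helper function is my illustration, though the pair convention (subordinate, dominator) follows the text above:

```python
# A pair (subordinate, dominator) records that the dominator may not be
# perturbed in response to a change of the subordinate. The pairs below
# follow the SD/IH models named in the text.
dominance = {
    ("input_impl", "input_sig"), ("input_impl", "input_ds"),
    ("linestorage_ds", "linestorage_ADT"),
    ("linestorage_ds", "envr_input_size"),
    ("linestorage_ds", "envr_core_size"),
}

def frozen_in_response_to(changed_var, dominance):
    """Variables that may not change when changed_var is perturbed."""
    return {dom for (sub, dom) in dominance if sub == changed_var}

print(sorted(frozen_in_response_to("linestorage_ds", dominance)))
# → ['envr_core_size', 'envr_input_size', 'linestorage_ADT']
```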
4.2.4 Cluster Set
There are multiple ways to cluster a design. Figure 4.5 shows the Simon clustering GUI supporting
different views of the same KWIC design. For purposes such as task assignment, we want to group
all variables involved in a particular function into a single module. For example, we could group the
envr_alph_policy, alph_ADT, alph_ds and alph_impl into a module, as shown in Figure 4.5
(b).
In the earlier work of Sullivan et al. [64], the authors observed that for a design to be truly
an information hiding modularization, the design rules should be invariant under changes in en-
vironment variables. To evaluate these two designs against this criterion, we want to cluster the
environment parameters, design rules and subordinate variables respectively into proto-modules. In
this case, for example, we group the envr_alph_policy, envr_input_size, envr_core_size,
and envr_input_format into an environment module, as shown in Figure 4.5 (c).
So far, we have modeled all the dimensions necessary for a number of analyses.
4.3 Quantitative Changeability Analysis
Parnas presents a comparative analysis of the changeability of the two designs based on their ability
to accommodate the following possible changes:
1. "The input format changes", which implies that there could be input format choices other than the current one. Accordingly, we model the domain of envr_input_format as {orig, new}. In the original design, envr_input_format = orig. The change is modeled as envr_input_format = new.
2. “The input size becomes so large that not all lines can be put in core”. We model this change
as envr_input_size = large.
3. “The input size gets so small that a word could be unpacked”, modeled as
envr_input_size = small.
(a) No Clustering
(b) Task Assignment View
(c) Design Rule View
Figure 4.5: Simon Clustering GUI for the KWIC IH Design
4. “The alphabetizing policy is changed to partial or search”,
modeled by envr_alph_policy = partial and envr_alph_policy = search. In the
original design envr_alph_policy = once.
Parnas’s informal comparative analysis can be formulated as follows: given an original design,
and given changes in environment (input size, core size, etc.), what are the feasible new designs that
accommodate the given changes? In particular, how many dimensions (variables) have to change
to get to these new design states? Figures 4.6 and 4.7 are snapshots of the Simon design impact analysis GUI. Figure 4.6 shows the input GUI, in which an initial SD design is selected and a change is specified: envr_input_size changing from medium to large. Figure 4.7 shows the
output GUI, in which the upper list shows the evolution paths, the middle list shows the differences
between the original design and the selected destination design, and the lower list shows the selected
new design.
We summarize all the changes and their impacts on both designs, as computed by Simon, in Figure 4.8. The numbers in the circles represent the design states of the DAs. The double circles
are the start states. Figure 4.8 shows part of the SD and IH DAs with states S18 and S1034 as the
respective start states. S18 corresponds to the original sequential design, and S1034 corresponds to
the original IH design. The design state numbers are automatically generated by Simon. Transitions
are labeled with changes shown in the table below.
The tables associated with the end states show what other variables are changed in the desti-
nation states. For example, in the SD DA, changing the input size to large (the transition labeled
C2) leads state S18 to state S555 or S865. In both of them, seven other variables are changed to
compensate for the driving change.
The numbers in the last two columns of the lower table summarize the number of other variables
that are affected by the changes in each design. The results confirm in a fully formal way that the
IH design involves fewer redesign requirements under changes. For example, when the input size
gets large, in the SD design, 7 dimensions have to be touched, while for the IH design, only 2.
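The impact numbers above can be reproduced in miniature. The following self-contained sketch poses the changeability question on a toy model in the spirit of the SD analysis (a large input forcing the packed-in-core choice and its dependents to change); the model is my illustration, not the dissertation's full KWIC ACN:

```python
from itertools import product

# A toy model: the input size, the input data structure, and two
# implementations that assume the packed-in-core layout.
domains = {
    "envr_input_size": ["medium", "large"],
    "input_ds": ["core4", "disk"],
    "circ_impl": ["orig", "other"],
    "alph_impl": ["orig", "other"],
}
constraints = [
    lambda s: s["input_ds"] != "core4" or s["envr_input_size"] == "medium",
    lambda s: s["input_ds"] != "disk" or s["envr_input_size"] == "large",
    lambda s: s["circ_impl"] != "orig" or s["input_ds"] == "core4",
    lambda s: s["alph_impl"] != "orig" or s["input_ds"] == "core4",
]

names = list(domains)
consistent = [dict(zip(names, v)) for v in product(*domains.values())
              if all(c(dict(zip(names, v))) for c in constraints)]

def impact(start, var, value):
    """Minimal number of other variables changed to accommodate the change."""
    cands = [t for t in consistent if t[var] == value]
    return min(sum(t[y] != start[y] for y in names if y != var)
               for t in cands)

start = {"envr_input_size": "medium", "input_ds": "core4",
         "circ_impl": "orig", "alph_impl": "orig"}
print(impact(start, "envr_input_size", "large"))  # 3
```

In this toy model, growing the input forces the data structure and both dependent implementations to change, so the impact count is 3.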
So far, we have quantitatively confirmed Parnas’s qualitative analysis. The number of modules
that have to change is obviously a simple proxy for cost, but it is essentially the measure Parnas
Figure 4.6: Tool Snapshot: KWIC SD Design Impact Analysis Input
Figure 4.7: Tool Snapshot: KWIC SD Design Impact Analysis Output
[The state-machine diagrams in this figure are not recoverable from this extraction; the change-impact table reads:]

Change                          SD   IH
C1  envr_input_format = new      1    1
C2  envr_input_size = large      7    2
C3  envr_input_size = small      0    0
C4  envr_alph_policy = partial   3    2
C5  envr_alph_policy = search    3    2

Figure 4.8: Partial Non-deterministic Finite Automaton for SD and IH design
used in his paper. Moreover, we expect that by associating each variable with an economic value, this model can be extended with a richer cost-of-change model to estimate the economic cost of each evolution step.
4.4 Design Structure Matrix Derivation
After generating the DA and the PWDR by clicking the “Solve” menu item in Simon, the user is
able to derive DSMs by providing additional clustering data. We compare the DSMs that Simon
generates from our KWIC ACN models with manual results we presented in previous work [64].
We generated DSMs through Simon using the clustering method seen in Figure 4.5 (c).
To ease the comparison, we copied and pasted the DSM generated from Simon into Excel
and marked the differences from the published manual models. Figures 4.9 and 4.10 present the
SD and IH DSMs generated by Simon and presented in Excel. In these DSMs, all the cells with
dark backgrounds and white foregrounds represent discrepancies between derived DSMs and those
developed by hand and presented in the earlier work of Sullivan et al. [64]. A blank dark cell means
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1:envr_input_format .
2:envr_input_size . x
3:envr_core_size x .
4:envr_alph_policy .
5:input_fun_sig .
6:circ_fun_sig .
7:alph_fun_sig .
8:output_fun_sig .
9:master_fun_sig .
10:input_ds x x . x x
11:circ_ds x x x . x
12:alph_ds x x x x .
13:output_ds .
14:input_fun_impl x x x x .
15:circ_fun_impl x x x x .
16:alph_fun_impl x x x x x x .
17:output_fun_impl x x x x x x x .
18:master_fun_impl x x x x x .
Figure 4.9: KWIC SD Derived DSM
that there was an erroneous mark in the manual version. A dark cell with an “x” in it means that
the dependence was missed in the manual version. In each DSM, variables 1–4 are environment
variables. The next run of variables is the design rule variables. The final run models the remaining
open design choices.
By comparison, we are able to answer the validation questions for DSM derivation. First of all,
our computed DSMs are largely consistent with the earlier results, validating the modeling and anal-
ysis concept. They reveal exactly the same key observations: the design rules, load-bearing walls of
an information hiding design, should be invariant with respect to changes in the environment, and
such changes should be accommodated merely by changes to hidden (subordinate) design variables
within independent modules.
There are differences, however, which we now address. First, the computed DSMs reveal subtle errors in the manually produced DSMs, supporting our intuition that logic modeling and automated analysis are more reliable than manual modeling and analysis. In the derived IH DSM, cells (17, 7) and (19, 8) reveal dependences missing from the manual model. The derived DSM also lacks several dependences that were erroneously present in the manual version. An extra variable, input_ds, which was redundant with linestorage_ds, was removed. Finally, the environment variables envr_core_size and envr_input_size are now shown as dependent, in that a change
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1:envr_input_format .
2:envr_input_size . x
3:envr_core_size x .
4:envr_alph_policy .
5:line_storage_adt .
6:input_adt .
7:circ_adt .
8:alph_adt .
9:output_adt .
10:master_adt .
11:line_storage_ds x x . x
12:line_storage_impl x x x .
13:input_impl x x x .
14:circ_ds x x . x
15:circ_impl x x x .
16:alph_ds x . x
17:alph_impl x x x x .
18:output_format . x
19:output_impl x x x .
20:master_impl x x x x x x .
Figure 4.10: KWIC IH Derived DSM
in one can be compensated for by a change to the other.
The second class of differences between Simon’s output and the manual calculation consists of
important ripple effects in the computed DSMs that are not shown in the manual version. For exam-
ple, the manually-constructed design (SD) DSM had no dependence between output_fun_impl
and circ_ds. The derived DSM revealed this dependence owing to two constraints in its ACN
model:
output_fun_impl = orig => alph_ds = orig
alph_ds = orig => circ_ds = index
Parnas's paper confirms the presence of this dependence, and thus the correctness of the formal model and derived DSM. Even in such a small example, manual DSM construction is error-prone. Automated tool support is critical for the correct modeling and analysis of complex design constraint networks.
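The ripple effect itself can be illustrated by chaining the variables in the two constraints quoted above; note that this graph-reachability framing is only an illustration of the effect, since Simon derives dependences from minimal perturbations of the DA rather than from syntactic chaining:

```python
# Transitive chaining: output_fun_impl constrains alph_ds, which in turn
# constrains circ_ds, so a change to circ_ds can ripple back to the
# output implementation choice.
def reaches(edges, src, dst):
    """Depth-first reachability over constraint-antecedent edges."""
    seen, stack = set(), [src]
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(b for (a, b) in edges if a == v)
    return False

# (antecedent variable, consequent variable) for the two SD constraints:
edges = {("output_fun_impl", "alph_ds"), ("alph_ds", "circ_ds")}
print(reaches(edges, "output_fun_impl", "circ_ds"))  # True
```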
4.5 Net Option Value Computation
Sullivan et al. [64] computed the NOV values for the manually constructed DSMs. Since the derived DSMs differ from the manual versions, we redo the experiment to see whether the results are consistent with the previously published work.
In that work, Sullivan et al. calculated the NOV value for each design using the formula introduced in Chapter 2: the SD design has a system NOV of 0.26, and the IH design 1.56, predicting that the IH design provides six times more value in the form of modularity than the SD design. In other words, the value of the SD design increases to 1.26 and that of the IH design to 2.56, suggesting that the IH version of the system was roughly twice as valuable as the SD version.
In Simon, we can repeat the result exactly by first clustering the DSMs in the same way as the manual ones and assigning the modules the same parameters as before. Figure 4.11 and Figure 4.12 are Simon snapshots repeating the previous experiments. The upper-right tables in Figures 4.11 and 4.12 show our assumptions about the technical potential, complexity, and visibility cost of the modules in the SD and IH designs.
Figure 4.11: NOV Computation for Manual KWIC SD
Figure 4.12: NOV Computation for Manual KWIC IH
Since the derived DSMs that Simon works on are quite different from the manual ones, this
repetition assumes: (1) the new design uses the same environment parameters we used for the
manual DSMs; (2) the coupling relations among hidden modules are the same; (3) each module
has the same parameters. We now analyze if these assumptions are still valid in the newly derived
DSMs.
First, in the previous work [64], we hypothesized the possible forces driving change that Parnas might have considered, or that appear to be implied in his analysis, and categorized them into three environment variables: computer configuration (e.g., device capacity, speed); corpus properties (e.g., input size, language, such as Japanese); and user profile (e.g., computer-savvy or not, interactive or offline), as shown in Figures 8, 9, and 10 of that paper [64]. The environment variables we used in the ACN
modeling are a direct translation of the possible changes that Parnas mentions in prose. Both environment models are valid, so the first assumption holds.
Second, as the DSMs in Figures 4.9 and 4.10 show, apart from the environment section, the main differences between the manual and derived DSMs concentrate in the dependences between design rules and hidden modules, and neither the manual nor the derived DSMs contain dependences among hidden modules. So the second assumption is valid.
The third assumption is problematic, though: (1) in both ACN models, we separate the interface of the Master Control module from its implementation, which influences the complexity count; for example, the derived SD DSM now has one more design variable than the manual SD model. (2) In the manual IH model, input_ds was redundant with linestorage_ds, and it is removed in the derived IH DSM shown in Figure 4.10. This difference influences the complexity and technical potential estimates of the Input module in the derived IH DSM: the complexity changes from 0.125 to 0.0625, and the technical potential drops from 2.5 to 1.6. We show the updated NOV computations for both designs in Figures 4.13 and 4.14.
Comparing the new NOV computation in Figure 4.13 with the old one in Figure 4.11, and Figure 4.14 with Figure 4.12, each pair shares the same technical potential (assuming the same environments) but differs in its complexity estimates. According to the derived DSMs, the system NOV is now 0.29 for the SD design and 1.30 for the IH design. Focusing just on modularity, the model still predicts that the IH design provides about 4.5 times more value in the form of modularity than the SD design, so our comparative result remains valid.
During this exercise, we found an issue in Baldwin and Clark's NOV formula introduced in Chapter 2: when they calculate the visibility cost of module i of size n using the term Z_i = Σ_{j sees i} c·n, it is not clear whether ripple effects should be counted. That is, if j sees i, and k sees j, it is not clear whether the cost of changing k should be counted.
Baldwin and Clark's DSMs are not intended to show ripple effects, so they must compute higher-order DSMs to account for transitive dependences. In our derived DSMs, all the variables affected through transitive relations appear in the same column as the changing variable. As a result, their formula can be applied unambiguously to our derived DSMs.
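The ambiguity can be made concrete with a small sketch (hypothetical module sizes and cost coefficient; the `sees` relation and the two readings of the Z_i term are our own encoding):

```python
# Illustrative sketch: the two readings of the visibility-cost term Z_i
# differ when "sees" is taken directly vs. transitively.  In a derived
# DSM the transitive effects are already recorded in the columns, so
# only the direct reading is needed.

def visibility_cost(sees, i, size, c, transitive):
    viewers = set(sees.get(i, ()))
    if transitive:  # also count k where k sees j, j sees i, and so on
        frontier = list(viewers)
        while frontier:
            j = frontier.pop()
            for k in sees.get(j, ()):
                if k not in viewers:
                    viewers.add(k)
                    frontier.append(k)
    return sum(c * size[j] for j in viewers)

sees = {"i": ["j"], "j": ["k"]}        # j sees i; k sees j
size = {"j": 2, "k": 3}                # hypothetical module sizes
print(visibility_cost(sees, "i", size, c=1.0, transitive=False))  # 2.0
print(visibility_cost(sees, "i", size, c=1.0, transitive=True))   # 5.0
```

The gap between the two results (2.0 vs. 5.0) is exactly the cost of the ripple through k that the original formula leaves unspecified.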
Figure 4.13: NOV Computation for Derived KWIC SD
4.6 Chapter Summary
This chapter evaluated a number of claims about our framework against a software engineering benchmark, Parnas's KWIC designs. The data and analysis support the following hypotheses: the framework is expressive enough to formally account for key notions of Parnas's and Baldwin and Clark's theories; it enables the derivation of design coupling structures as pair-wise relations on design decisions and their presentation as DSMs; and it automates both Parnas's changeability analysis and Baldwin and Clark's net option value analysis. The automated changeability analysis quantitatively verified Parnas's qualitative results. The comparison of the derived DSMs with the manual models revealed errors in published work, showing the power of formal models and automated analysis. The automated NOV calculation enables sensitivity analysis of the model, comparing results obtained under different parameter choices.
Figure 4.14: NOV Computation for Derived KWIC IH
Chapter 5
Model Decomposition and Result Integration
As with many formal analysis techniques, such as model checking, the difficulty of constraint sat-
isfaction limits the size of models that can be analyzed in practice. As the reader may have noticed,
the DA model is even more demanding: it requires an explicit representation of the entire space of satisfying solutions. The number of solutions can grow exponentially with the number of variables involved, so it is impractical to represent DAs explicitly when the state space is very large. This chapter addresses the problem caused by the need to represent the complete state space using a divide-and-conquer approach: we exploit the non-trivial dominance relation in an ACN model to split an ACN at natural breaking points into a number of smaller sub-ACNs, solve each sub-ACN separately, and integrate partial results, but only as needed, to produce the desired answer more efficiently. The performance gain comes from two sources: (1) the SAT solver now deals with much smaller models; (2) we no longer need to generate the full DA. We claim that this approach has the potential to dramatically reduce analysis time, at least for the problems we studied. This chapter supports this claim by comparing the analysis time for the KWIC designs with and without decomposition. The experiment demonstrates a dramatic performance improvement: without decomposition, it takes hours to generate DSMs from the KWIC IH ACN; with decomposition, we obtain the same results within a minute.
In this chapter, we use the KWIC information hiding (IH) ACN as a running example, showing how to decompose it into a set of smaller sub-ACNs, solve each of them individually,
1: envr_input_size: {small, medium, large};
2: envr_core_size: {small, large};
3: linestorage_ADT: {orig, other};
4: linestorage_ds: {core0, core4, disk, other};
5: linestorage_impl: {orig, other};
6: circ_ADT: {orig, other};
7: circ_ds: {copy, index, other};
8: circ_impl: {orig, other};
9: linestorage_impl = orig => linestorage_ADT = orig && linestorage_ds = core4;
10: linestorage_ds = core4 => envr_input_size = medium || envr_input_size = small;
11: linestorage_ds = core0 => envr_input_size = small && envr_core_size = large;
12: linestorage_ds = disk => envr_input_size = large;
13: circ_ds = copy => envr_input_size = small || envr_core_size = large;
14: circ_impl = orig => circ_ADT = orig && circ_ds = index && linestorage_ADT = orig;
Figure 5.1: Partial KWIC Information Hiding ACN model
and integrate the results. Chapter 7 formalizes the decomposed models and proves the accuracy of
the integrated results. Chapter 9 presents additional evidence in this dimension.
5.1 ACN Splitting
To provide a full picture of how this approach works, we consider part of the KWIC information
hiding model involving 8 design variables and the constraints among them as an example ACN.
Figure 5.1 presents the constraint network. One of the ACN elements, the cluster set, plays no role in the splitting approach, so we ignore it in this chapter. The dominance relation of the example dictates the following: (1) variables prefixed with "envr_" (environment variables) should not be influenced by any other variables; (2) variables ending with "_ADT" (design rules) should not be influenced by variables ending with "_ds" or "_impl".
Our splitting approach takes the following steps:
1. Construct a graph depicting how the variables are syntactically connected. To construct such a graph, we first translate the constraints of the ACN into conjunctive normal form (CNF). For example, the constraints shown in Figure 5.1 translate into the CNF shown in Figure 5.2.
(¬linestorage_impl = orig ∨ linestorage_ADT = orig) ∧
(¬linestorage_impl = orig ∨ linestorage_ds = core4) ∧
(¬linestorage_ds = core4 ∨ envr_input_size = medium ∨ envr_input_size = small) ∧
(¬linestorage_ds = core0 ∨ envr_input_size = small) ∧
(¬linestorage_ds = core0 ∨ envr_core_size = large) ∧
(¬linestorage_ds = disk ∨ envr_input_size = large) ∧
(¬circ_ds = copy ∨ envr_input_size = small ∨ envr_core_size = large) ∧
(¬circ_impl = orig ∨ circ_ADT = orig) ∧
(¬circ_impl = orig ∨ circ_ds = index) ∧
(¬circ_impl = orig ∨ linestorage_ADT = orig)
Figure 5.2: Conjunctive Normal Form
Without loss of generality, we assume that each variable in the ACN is involved in at least one
clause. We then model each clause as a complete directed subgraph: each variable is a node, each node connects to every other node, and the variables' values are ignored. As a result, the whole CNF transforms into a directed graph G_cnf = <V, E>, where V is the variable set of the ACN. We assume that this graph is at least weakly connected; otherwise, we consider each weakly connected subgraph separately.
This graph models the most conservative dependence relation among variables: if two vari-
ables appear in the same clause, they depend on each other syntactically. As a result, for the
partial KWIC example shown in Figure 5.1, every variable connects with every other variable
directly or indirectly, as shown in Figure 5.3.
2. Remove edges according to the dominance relation. If a variable pair (vi, vj) ∈ dominance, remove the edge <vi, vj> from G_cnf. We call the resulting graph G = <V, E>. If G is not weakly connected, consider each subgraph separately. In Figure 5.3, the dotted lines labeled X are the edges to be excluded.
3. Construct the condensation graph. We use M. Sharir's algorithm [6] to find the strongly connected components of G and construct its condensation graph G* = <V*, E*>. Figure 5.4 shows the condensation graph of Figure 5.3, in which V* = {V0, V1, V2, V3, V4}. Each node of G* represents a strongly connected component of G, comprising a set of variables. G* is a directed acyclic graph (DAG) [6], inducing a partial order on V*.
[Graph figure; dotted edges labeled X mark the edges excluded by the dominance relation]
Figure 5.3: Partial KWIC CNF graph
[Condensation graph with nodes V0–V4]
Figure 5.4: KWIC Condensation Graph
4. Construct sub-ACNs. The number of sub-ACNs equals the number of minimal elements of G*. Figure 5.4 has two minimal elements, V3 and V4, so we construct two sub-ACNs in the following way:
(a) Construct the variable set. For each minimal element, the variable set of the corresponding sub-ACN is the union of all the nodes of G* (each being a set of variables) that lie on chains ending with that element. According to Figure 5.4, the variable set of the first sub-ACN is the union of V0, V1, and V3: {envr_input_size, envr_core_size, linestorage_ADT, linestorage_ds, linestorage_impl}. The union of V0, V1, V2, and V4 is the variable set of the second sub-ACN: {envr_input_size, envr_core_size, linestorage_ADT, circ_ADT, circ_ds, circ_impl}.
(b) Construct the constraint set for each sub-ACN. If the variable set of a sub-ACN contains all the participating variables of a CNF clause, we put the clause into the constraint set of that sub-ACN. As a result, we obtain the two constraint networks for these two sub-ACNs, shown in Figure 5.5 and Figure 5.6. It is possible that some clauses do not belong to any sub-ACN. In this case, we consider the graph G_left made from the complete graphs of these clauses. Each connected component of G_left forms a new sub-ACN: the corresponding clauses form its constraint set, and all the variables involved in these clauses form its variable set.
(c) Construct the dominance relation for each sub-ACN. From the dominance relation of the whole ACN, each sub-ACN selects the subset that involves only its own variables.
We have observed that in practice this method tends to group variables into cohesive, sparsely
overlapping sets that correspond to key features of the design [19]. For example, Figure 5.5 shows
an ACN corresponding to the line storage function; Figure 5.6 shows an ACN corresponding to the
circular shift function.
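The four-step splitting procedure above can be sketched in Python. This is an illustrative reconstruction on a pared-down fragment of the Figure 5.1 model (envr_core_size omitted); the clause, edge-direction, and dominance encodings are our own conventions, not Simon's:

```python
# Illustrative sketch of ACN splitting.  A clause is a set of variables;
# an edge a -> b means "a may influence b"; a dominance pair (a, b) means
# "a must not be influenced by b", which removes the edge b -> a.

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: returns (component-id map, component list)."""
    visited, order = set(), []
    for s in nodes:                      # first pass: record finish order
        if s in visited:
            continue
        stack = [(s, False)]
        while stack:
            v, done = stack.pop()
            if done:
                order.append(v)
            elif v not in visited:
                visited.add(v)
                stack.append((v, True))
                stack.extend((w, False) for w in edges[v] if w not in visited)
    rev = {v: set() for v in nodes}      # second pass: reversed graph
    for v in nodes:
        for w in edges[v]:
            rev[w].add(v)
    comp, comps = {}, []
    for v in reversed(order):
        if v in comp:
            continue
        members, stack = [], [v]
        comp[v] = len(comps)
        while stack:
            u = stack.pop()
            members.append(u)
            for w in rev[u]:
                if w not in comp:
                    comp[w] = len(comps)
                    stack.append(w)
        comps.append(frozenset(members))
    return comp, comps

def sub_acn_variables(clauses, dominance):
    nodes = sorted(set().union(*clauses))
    edges = {v: set() for v in nodes}
    for clause in clauses:               # step 1/2: clause graph minus dominated edges
        for a in clause:
            for b in clause:
                if a != b and (b, a) not in dominance:
                    edges[a].add(b)
    comp, comps = strongly_connected_components(nodes, edges)
    out = {i: set() for i in range(len(comps))}   # step 3: condensation DAG
    for v in nodes:
        for w in edges[v]:
            if comp[v] != comp[w]:
                out[comp[v]].add(comp[w])
    parents = {i: set() for i in range(len(comps))}
    for i, succs in out.items():
        for j in succs:
            parents[j].add(i)
    subs = []                            # step 4: one sub-ACN per minimal element
    for sink in (i for i in range(len(comps)) if not out[i]):
        keep, stack = {sink}, [sink]
        while stack:
            i = stack.pop()
            for p in parents[i]:
                if p not in keep:
                    keep.add(p)
                    stack.append(p)
        subs.append(set().union(*(comps[i] for i in keep)))
    return subs

clauses = [
    {"linestorage_impl", "linestorage_ADT"}, {"linestorage_impl", "linestorage_ds"},
    {"linestorage_ds", "envr_input_size"},   {"circ_impl", "circ_ADT"},
    {"circ_impl", "circ_ds"},                {"circ_impl", "linestorage_ADT"},
    {"circ_ds", "envr_input_size"},
]
dominance = {
    ("envr_input_size", "linestorage_ds"), ("envr_input_size", "circ_ds"),
    ("linestorage_ADT", "linestorage_impl"), ("linestorage_ADT", "circ_impl"),
    ("circ_ADT", "circ_impl"),
}
for sub in sub_acn_variables(clauses, dominance):
    print(sorted(sub))   # one variable set per sub-ACN
```

On this fragment the two sinks are the {linestorage_ds, linestorage_impl} and {circ_ds, circ_impl} components, reproducing the line-storage and circular-shift variable sets.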
This approach splits the whole KWIC information hiding ACN into 6 sub-ACNs, having 6, 6,
4, 5, 7, and 5 variables respectively. Instead of taking hours to solve a constraint network with 20
envr_input_size: {small, medium, large};
envr_core_size: {small, large};
linestorage_ADT: {orig, other};
linestorage_impl: {orig, other};
linestorage_ds: {core0, core4, disk, other};
(!linestorage_impl = orig || linestorage_ADT = orig) &&
(!linestorage_impl = orig || linestorage_ds = core4) &&
(!linestorage_ds = core4 || envr_input_size = medium || envr_input_size = small) &&
(!linestorage_ds = core0 || envr_input_size = small) &&
(!linestorage_ds = core0 || envr_core_size = large) &&
(!linestorage_ds = disk || envr_input_size = large)
Figure 5.5: The First sub-ACN
envr_input_size: {small, medium, large};
envr_core_size: {small, large};
linestorage_ADT: {orig, other};
circ_impl: {orig, other};
circ_ADT: {orig, other};
circ_ds: {copy, index, other};
(!circ_ds = copy || envr_input_size = small || envr_core_size = large) &&
(!circ_impl = orig || circ_ADT = orig) &&
(!circ_impl = orig || circ_ds = index) &&
(!circ_impl = orig || linestorage_ADT = orig)
Figure 5.6: The Second sub-ACN
variables, Simon now invokes Alloy and its underlying SAT solvers to solve these much smaller
models separately, which takes only seconds. Similarly, the DA and PWDR models for each sub-
ACN can be generated individually and quickly.
As we have already explained, the design automaton (DA) is the key model enabling both the
derivation of the PWDR model and design impact analysis. However, integrating these sub-DAs
into a full DA would again be costly. Fortunately, for some of the analyses that we are interested in,
it is not necessary to depend on an integrated full DA. Instead, analyses can be done on each sub-
ACN, and the results can be integrated into the solution to the problem modeled using the whole
ACN. In particular, we can compute design structure matrices and analyze design change impact in
this way.
5.2 Integrating Analysis Results
After decomposing an ACN into a number of sub-ACNs, we need to generate the sub-DAs of
sub-ACNs for the purpose of analysis. However, a sub-ACN generally has both a smaller set of variables and a weaker set of constraints than the full ACN from which it was derived, and so can have solutions that are inconsistent not only with those of the full ACN but also with those of other sub-ACNs. In order to generate sub-DAs, we first compute the consistent solution set of each sub-ACN, by which we mean the subset of its solutions that are consistent with the solutions of the other sub-ACNs in the given decomposition of the full ACN. After that, we generate the consistent sub-DAs of these sub-ACNs. Chapter 7 formalizes these ideas. In this
chapter, we use sub-DAs to stand for consistent sub-DAs. After sub-DAs are generated, both design
impact analysis and DSM derivation can be done using these sub-models, and their results can be
integrated into full solutions.
5.2.1 Integrating Design Impact Analysis Results
We consider the basic design impact analysis question: given an original design, what are all the
ways to compensate for a design decision change (or an environment condition change)? Instead of
[Partial DA diagram: states L0, L1, L2, and L3 with change-labeled transitions]
Figure 5.7: Partial DA for the Linestorage sub-ACN
modeling the original design as a solution to the ACN, deriving the DA of the ACN, and finding the solution, as introduced in Section 3, this section presents a method for finding sub-solutions from each sub-ACN and sub-DA and integrating them into a full solution. We take the ACN with the constraint
network shown in Figure 5.1 as an example, and suppose the original design is as follows:
1: envr_input_size = medium
2: envr_core_size = small
3: linestorage_ADT = orig
4: linestorage_ds = core4
5: linestorage_impl = orig
6: circ_ADT = orig
7: circ_ds = index
8: circ_impl = orig
We have shown that this ACN can be split into two sub-ACNs (their CNs are shown in Figure 5.5
and Figure 5.6). For the designated starting design, we first find the corresponding start states in each sub-DA: in this example, state L0 in Figure 5.7 and state C0 in Figure 5.8. We call L0 and C0 compatible states because their shared variables have the same values.
Given a changing variable, we distinguish the following two cases:
[Partial DA diagram: states C0 and C1 with change-labeled transitions]
Figure 5.8: Partial DA for the CircularShift sub-ACN
If the changing variable is local, that is, no other sub-ACN involves this variable, then the design impact analysis can be done locally using the sub-DA. The variable linestorage_ds of the sub-ACN shown in Figure 5.5 is local and has only local impact: changing its value to other leads state L0 to L1, as shown in Figure 5.7.
If the changing variable is shared among sub-ACNs, e.g., envr_input_size appears in both Figure 5.7 and Figure 5.8, we integrate the results as follows:
1. Find the destination states in each sub-DA labeled with this change: in Figure 5.7, changing envr_input_size to large reaches L2 and L3; in Figure 5.8, the same change leads to C1.
2. Compute the cross product of these two sets of states, that is, {L2, L3} × {C1}. In this procedure, a state in one sub-DA is unioned with a state in another sub-DA only when the two states are compatible. As a result, L2 ∪ C1 and L3 ∪ C1 are the two destination designs that we are looking for.
In other words, if the full DA for the full ACN had been generated, the design impact analysis for
the same original design under the same change would have led to the design states that are identical
to the integrated destination states. Chapter 7 formally proves that a DA, identical to the DA directly
derived from the original ACN, can be composed from the sub-DAs, and that the integrated design
impact analysis is valid.
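The two integration steps can be sketched as follows (a hypothetical helper; the state contents are illustrative rather than copied from the DA figures):

```python
# Hypothetical sketch of result integration.  A state is a dict mapping
# variables to values; two states are compatible when their shared
# variables agree, and each compatible pair is merged (unioned) into one
# full destination design.

def compatible(s, t):
    return all(t[v] == x for v, x in s.items() if v in t)

def integrate(dests_a, dests_b):
    return [{**a, **b}                  # union of the two sub-states
            for a in dests_a for b in dests_b if compatible(a, b)]

# Illustrative destination states after changing envr_input_size to large:
L2 = {"envr_input_size": "large", "linestorage_ADT": "orig",
      "linestorage_ds": "disk", "linestorage_impl": "other"}
L3 = {"envr_input_size": "large", "linestorage_ADT": "orig",
      "linestorage_ds": "other", "linestorage_impl": "other"}
C1 = {"envr_input_size": "large", "linestorage_ADT": "orig",
      "circ_ds": "index", "circ_impl": "orig"}

full_designs = integrate([L2, L3], [C1])   # {L2, L3} x {C1}
print(len(full_designs))                   # 2 full destination designs
```

Incompatible pairs are simply dropped by the filter, so the cross product never produces a design that violates the shared-variable agreement.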
5.2.2 Integrating Coupling Structures
It is possible that a sub-DA solution satisfies its own constraints, but makes the full constraint
network inconsistent. We call such a solution of a sub-DA an incompatible solution. In order to
compose the pair-wise dependence relation (PWDR) of the full ACN from the derived sub-PWDRs,
we first need to remove these incompatible states from each sub-DA. After that, we derive sub-
PWDRs from sub-DAs and compute the union of the sub-PWDRs to get the PWDR for the full
ACN.
This method does not reduce the complexity of deriving a PWDR from a constraint network,
which is NP-complete. In fact, the operation of removing incompatible states has exponential
complexity. The essence of our method is to strike a balance between two extremes: at one extreme, the large ACN is solved as a whole; at the other, each clause of a CNF expression can be treated as an individual ACN and solved independently. However, to integrate the sub-PWDRs derived from such per-clause sub-ACNs, comparing each solution of each sub-ACN with every solution of every other sub-ACN would again be time-consuming.
Our method decomposes an ACN so that each sub-ACN needs to be compared only with the other sub-ACNs that share variables with it. For example, the two KWIC sub-ACNs we present share three variables. The comparison and the removal of incompatible states are executed along with the DA generation procedure. Many methods have been explored to decompose and cluster a constraint network; as we discuss in Section 5.4, our method is orthogonal to them, and combining them may further improve the performance.
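The removal of incompatible solutions and the union of sub-PWDRs can be sketched with toy data (our formulation, not Simon's implementation):

```python
# Sketch: a solution of one sub-ACN survives only if at least one
# solution of a neighboring sub-ACN agrees with it on their shared
# variables.  The full PWDR is then the union of the sub-PWDRs.

def consistent_solutions(sols_a, sols_b):
    shared = set(sols_a[0]) & set(sols_b[0])
    def proj(s):                         # projection onto shared variables
        return tuple(sorted((v, s[v]) for v in shared))
    views_b = {proj(s) for s in sols_b}
    return [s for s in sols_a if proj(s) in views_b]

sols_a = [{"x": 0, "y": 0}, {"x": 1, "y": 1}]
sols_b = [{"x": 0, "z": 5}]                     # only x = 0 is realizable
print(consistent_solutions(sols_a, sols_b))     # [{'x': 0, 'y': 0}]

# union of sub-PWDRs (pair-wise dependence relations)
pwdr_a = {("x", "y")}
pwdr_b = {("x", "z")}
print(pwdr_a | pwdr_b)
```

Because each sub-ACN is only projected onto the variables it shares with its neighbors, the comparison cost stays proportional to the overlap rather than to the full solution space.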
Figure 5.9 presents two snapshots of Simon in which the sub-DSMs are derived from the sub-ACNs shown in Figure 5.5 and Figure 5.6. These two sub-DSMs are parts of the full DSM shown in Figure 4.10 of Chapter 4. We have used Simon to compose full DSMs for both the information hiding and sequential designs; they are exactly the same as the DSMs we generated from the full ACNs, as presented in Chapter 4. Chapter 7 proves that the integrated PWDR is identical to the PWDR derived directly from the original ACN.
Figure 5.9: KWIC SD Modularized
5.3 Observations and Performance
We decompose both the SD and IH ACNs introduced in Chapter 4 and compare the resulting sub-models. Simon decomposes the IH ACN into 6 sub-ACNs, having 6, 6, 4, 5, 7, and 5 variables, as summarized in Table 5.1. The SD ACN is decomposed into 5 sub-ACNs, having 9, 8, 8, 9, and 6 variables, as summarized in Table 5.2.
We observe that the information hiding design yields one more sub-ACN, and that most information hiding sub-ACNs are smaller than the sequential-design sub-ACNs. From the IH sub-models, it is easy to tell that each sub-model corresponds to a main function. We have shown that the CN in Figure 5.5 corresponds to the line storage function and that the CN in Figure 5.6 corresponds to the circular shift function. The other four sub-ACNs correspond to the input, alphabetizing, output, and master control functions, respectively. After decomposing the SD ACN, however, it is hard to tell immediately what function each sub-ACN models. Figure 5.10 shows one of the SD sub-DSMs. We make similar observations for the other decomposed SD sub-ACNs.
We now compare the performance of design impact analysis and DSM derivation from a full ACN with that from decomposed sub-ACNs. Without decomposition, for the KWIC SD model with 18 variables, it took Alloy about an hour on a Pentium 1.5 GHz PC with 512 MB RAM to find the 12,018 solutions, and then 11 minutes to compute the DA and DSM. For the IH model with 20 variables,
Table 5.1: The Variables of IH sub-ACNs

sub-ACN  Size  Variables                                                      CN (sec)  DA (sec)
1        6     alph_impl, alph_ADT, alph_ds, circ_ADT, linestorage_ADT,       14        < 1
               envr_alph_policy
2        6     circ_impl, circ_ADT, circ_ds, linestorage_ADT,                 18        < 1
               envr_input_size, envr_core_size
3        4     input_impl, input_ADT, linestorage_ADT, envr_input_format      6         < 1
4        5     linestorage_impl, linestorage_ADT, linestorage_ds,             9         < 1
               envr_input_size, envr_core_size
5        7     master_impl, master_ADT, linestorage_ADT, input_ADT,           20        < 1
               circ_ADT, alph_ADT, output_ADT
6        5     output_impl, output_ADT, output_ds, linestorage_ADT, alph_ADT  8         < 1

Table 5.2: The Variables of SD sub-ACNs

sub-ACN  Size  Variables                                                      CN (sec)  DA (sec)
1        9     envr_input_size, envr_core_size, envr_input_format, input_sig, 6         < 1
               input_ds, input_impl, alph_ds, envr_alph_policy, circ_ds
2        8     envr_input_size, circ_sig, circ_ds, envr_core_size,            22        < 1
               circ_impl, envr_alph_policy, alph_ds, input_ds
3        8     envr_input_size, alph_sig, circ_ds, envr_core_size,            17        < 1
               alph_impl, envr_alph_policy, alph_ds, input_ds
4        9     envr_input_size, circ_ds, alph_ds, envr_core_size, input_ds,   10        < 1
               output_sig, envr_alph_policy, output_ds, output_impl
5        6     master_impl, master_sig, input_sig, circ_sig, alph_sig,        68        < 1
               output_sig
Figure 5.10: A SD sub-ACN
Alloy took about three hours to find the 34,907 solutions, and the DA and DSM computation took another 2 hours and 13 minutes. After decomposition, all the DAs and DSMs are generated in about one minute.
Since Alloy is not designed for the purpose we are using it for, the original inefficiency is partly due to this mismatch. However, our performance improvement does not depend on this fact: after decomposition, each sub-ACN is still solved by Alloy. The performance gain comes from the fact that Simon now invokes multiple solver runs, each on a much smaller model, and then integrates the results quickly.
5.4 Related Work
In the constraint network realm, various ways have been developed to decompose and cluster constraint problems. Major decomposition methods include conjunctive decomposition [31], disjunctive decomposition [31], tree clustering [22], etc. These methods exploit the structure of constraint graphs to decompose a CN and compose the results. Our work differs in two ways. First, our method is not based on traditional constraint graphs: the edges in our CNF graph do not represent concrete logical relations, and the nodes do not represent variable-value pairs. Second, we use the dominance relation to cut edges from the CNF graph. The dominance relation models a hierarchical structure determined by the software architecture that is not captured by logical relations among design dimensions. By contrast, most constraint decomposition methods work on inconsistent variable-value pairs, such as Choueiry's work [20].
Our modeling of human design activity using the dominance relation and clustering distinguishes our work from most pure constraint-solving techniques. In addition, the purpose of those methods is to find optimal designs; ours, in contrast, is dependence and evolvability analysis. On the other hand, our technique is orthogonal to these constraint-solving methods: they can be used to improve the performance of solving a full ACN, or applied to each sub-ACN after decomposition.
There are many bottom-up clustering approaches that automatically discover clusters from source code, such as the work of Belady and Evangelisti [12], Hutchens and Basili [39], Schwanke [56], and Mancoridis [52]. In contrast to this work, our method does not require the existence of source code; instead, a system is decomposed based on an abstract design model.
Feature-oriented design analysis treats each feature as a whole. FODA tools, such as AHEAD [10, 9], aggregate all the base code related to a feature into a module; which part of the source code belongs to which feature is determined manually. Our approach works at a higher level, decomposing a system automatically according to its underlying logical structure.
Aspect-oriented programming promises to localize concerns, under the assumption that crosscutting concerns can be captured using built-in pointcut designators. However, not all concerns are syntactically related, for example, a feature that crosscuts hardware, database, and algorithm choices. Our work makes no such syntax-based assumptions.
5.5 Chapter Summary
In summary, this chapter addressed the inefficiency problem caused by the brute-force technique. We presented our approach, which makes use of the non-trivial dominance relation in an ACN model, splitting an ACN at natural breaking points into a number of smaller sub-ACNs, solving each sub-ACN individually, and integrating partial results to produce the desired answer more efficiently. We evaluated, against the canonical KWIC designs, the hypothesis that this approach has the potential to dramatically reduce analysis time, at least for problems with tractable state-space sizes. The experiment demonstrates a dramatic performance improvement, justifying the potential utility of this approach.
Chapter 6
Model Extension and Structural Design Impact Analysis
Previous chapters have shown how this framework connects conceptual designs modeled by ACNs to existing evolvability and economic analyses. However, as a conceptual design description model, a flat ACN, that is, a model involving a fixed number of dimensions with a fixed number of choices in each dimension, is known to be insufficient to capture key aspects of real software design problems.
First, an ACN model has only scalar-valued variables. Fred Brooks, among others, has recognized that such a simple, traditional design space model is inadequate to capture the complexity of many real-world software design problems [17]. As Baldwin and Clark point out [7], some design dimensions are "called into being" by other decisions. Scalar variables are not sufficient to model these dimensions and their impacts. Second, it is not uncommon that a decision brings up not only new dimensions but also new constraints among those dimensions, or between new and existing dimensions. For example, the choice of a design pattern not only introduces new dimensions specific to the pattern but also imposes pattern-specific constraints on new and existing design dimensions. Third, design decisions can crosscut each other. For
example, we need a more expressive model to represent such decisions as “all the objects taking
the subject role should implement the prevailing policy.” When a new object is added to the system
as a subject, as part of the impact analysis, the designer should be aware of the notification policy
in use, and of other constraints imposed by the choice of observer pattern. These complex design
decisions have structural impacts. Capturing their existence explicitly is necessary for analysis of
their impacts.
This chapter contributes a richer model, which we call the complex augmented constraint network
(CACN), to address these inadequacies. Our approach is to model these complex decisions
using set values and subspace values, and model crosscutting constraints using universally quan-
tified logical expressions. To analyze the impact of these decisions, the user first parameterizes
the extended model into simpler models represented by ACNs. After that, the user can compare
the resulting designs using the analysis techniques developed on ACNs. We illustrate the extended
model using Hannemann and Kiczales’s Figure Editor (FE) [38], which has been used as a repre-
sentative example in a large number of publications to demonstrate various problems, especially the
problems related to aspect-oriented programming [43, 37, 33].
We claim that (1) our framework is expressive enough to capture representative design decisions
exemplified by the FE design, such as the choice of a design pattern or the choice of pattern
implementation paradigm (AO or OO); (2) our framework is general enough to account for both
aspect-oriented and object-oriented modularity in uniform, declarative terms; and (3) our framework
automates Hannemann and Kiczales's analysis precisely. We evaluate the first two claims by modeling
both the OO and AO designs the authors described in their paper; we evaluate the last claim by com-
paring our automatically generated results with the authors’ qualitative analysis. Our experiment
provides positive evidence to support these claims.
Section 6.1 introduces the FE example. Section 6.2 through Section 6.4 present our extended
model. Section 6.5 explains how to parameterize a CACN into a number of ACNs. Section 6.6
shows how to analyze the impacts of high level design decisions that incur structural changes, which
we call Structural Design Impact Analysis (SDIA).
6.1 Figure Editor Example
Figure 6.1 shows the Figure Editor (FE) design modeled using the Unified Modeling Language
(UML), a de facto industry standard. The Figure Editor is a tool for editing drawings comprising
Figure 6.1: OO Observer Pattern UML Class Diagram
points and lines (figure elements), where a screen displays each figure element, always reflecting
the figure elements’ current states [43]. Figure 6.1 presents a UML class diagram model of one
possible design based on the observer pattern [33]. The Subject class serves as an abstract interface
for the concrete subjects: Point and Line. The Observer class provides an abstract interface for the
concrete observer: Screen.
Gamma et al. [33] mentioned and analyzed several important design dimensions and the dif-
ferent consequences of making different decisions in these dimensions. For example, notification
policy is a design dimension in which various design decisions can be made. Possible choices for
the notification policy include a pull or a push model [33]:
At one extreme, which we call the push model, the subject sends observers detailed information
about the change, whether they want it or not. At the other extreme is the pull model: the subject
sends nothing but the most minimal notification, and observers ask for details explicitly thereafter.
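The push/pull distinction can be made concrete with a small sketch. The following Python fragment is purely illustrative; the class and method names are hypothetical and are not taken from the Figure Editor implementation:

```python
# Illustrative sketch of the two notification extremes from Gamma et al.:
# push sends the changed state along with the notification; pull sends only
# a minimal signal, after which the observer queries the subject itself.
class Subject:
    def __init__(self):
        self.observers = []
        self.color = "black"

    def set_color(self, color, push):
        self.color = color
        for obs in self.observers:
            if push:
                obs.update(color=color)    # push: detailed information sent
            else:
                obs.update(subject=self)   # pull: observer asks explicitly

class Screen:
    def __init__(self):
        self.last_seen = None

    def update(self, color=None, subject=None):
        # Under pull, the observer retrieves the state it cares about itself.
        self.last_seen = color if color is not None else subject.color

screen = Screen()
point = Subject()
point.observers.append(screen)
point.set_color("red", push=False)   # pull-style notification
```

Either policy keeps the screen consistent with the figure elements; what varies is which side carries knowledge of what changed, which is exactly the kind of coupling question this chapter's analyses make explicit.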
Another design dimension concerns the choice of the data structure used for the mapping from
subjects to observers, such as a hash table. Third, the update policy could be complex enough that
a change manager might be needed.
Hannemann and Kiczales [38] identified additional but implicit design dimensions, such as
the role assignment, which demands decisions about which objects are observers and which are
subjects. They compared their aspect-oriented (AO) design of the observer pattern with the object-
oriented (OO) observer pattern in terms of the changeability in these dimensions. Their analysis
focused on answering evolvability questions such as the following: what are the consequences if
the client changes the role assignment, requiring the Screen to be both a subject and an observer?
Which parts should be changed? In the current design, the subject color is the only state of interest
under observation. What if the observing policy changed so that the positions of the figure elements
should also be observed? The authors analyzed these problems descriptively and showed the code
implementing these choices as evidence for their analysis. However, designers frequently face
such questions before coding.
We observed in UML models problems similar to those we observed with architectural
description methods: while UML notations better represent object-oriented program structures,
they do not provide effective ways to represent such design decisions as the choice of role assign-
ment, choice of mapping data structure, choice of observing policy, choice of pattern, or choice
of paradigm. These choices have profound impacts on the design coupling structures that strongly
influence crucial design quality attributes, such as evolvability, the best way to accommodate given
changes, and the economic value of flexible design architecture. In addition, the FE example exem-
plifies two additional problems:
First, a decision at one level often alters a design space structure by introducing new variables
and constraints. For example, the interaction between the Screen and the figure elements can also be
designed with other patterns, e.g., a mediator pattern, which in turn can use either an AO or an OO
paradigm [38]. The choices in the pattern and paradigm dimensions have significant consequences
in that each choice calls into being a different design subspace that introduces both new dimensions
and constraints that are potentially scoped over other variables in the design. The structure of a
design space is thus not fixed but is, in general, contingent on prior decisions and recursive in struc-
ture. State-of-the-art design modeling approaches do not adequately represent this phenomenon.
Consequently, it is difficult to analyze the structural and economic consequences of making such
high-level design decisions.
Second, the effects of design decisions are frequently not local but crosscutting. For example, all
the subjects have to respect the agreed notification policy, push or pull. Prevailing design modeling
techniques do not adequately represent design decisions with crosscutting effects. Consequently,
it is difficult to have a clear picture of the structural and economic consequences of making or
changing such crosscutting design decisions.
Three new elements extend the ACN model: (1) Set-Valued Design Variables modeling di-
mensions in which each choice is a set of other dimensions; (2) Quantified constraints modeling
crosscutting relations among decisions; (3) Hierarchical Design Variables modeling dimensions in
which each choice is a sub-design with new dimensions and constraints. We call this extended
formal model the Complex Augmented Constraint Network (CACN). Figure 6.2 shows the Figure
Editor CACN model. The next section explains the extended model.
6.2 Set-Valued Design Variables
In many cases, the choice in one dimension can be a set of other variables, each itself potentially
designating a complex dimension in the design space. We model such design dimensions as set-
valued design variables (SDVs), and each choice (value) as a named set.
In the Figure Editor example, the decision about what elements a figure editor system should
contain is a set of new design dimensions, such as {Point, Line, Screen}. We use the variable
elements to model this dimension. Each decision in this dimension brings into being a set of new
design dimensions, in this case, point, line, and screen, each modeled as a scalar variable with
the same domain: (orig, other).
Line 1 in Figure 6.2 demonstrates how Simon models this SDV in its internal language:
set elements(orig, other): (v1{point, line, screen}, other). The shared domain
is defined after the SDV name. Each set value has a name, followed by a set of elements within a
pair of curly brackets. This line says that v1 is a decision value that is a set with three elements:
point, line, and screen. As with other variable definitions, we use other as a value to represent
unelaborated possibilities.
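The mechanics of an SDV can be mirrored in ordinary code. The following Python sketch is hypothetical, not Simon's internal representation: it models an SDV as a shared member domain plus named set values, and expands one choice into the scalar member variables it brings into being, using the member-name plus variable-name convention visible in the generated ACN of Figure 6.4:

```python
# Hypothetical sketch of a set-valued design variable (SDV); not Simon's
# actual data structure. A shared domain applies to every member, and each
# named value is a set of member dimensions.
class SetValuedVariable:
    def __init__(self, name, domain, values):
        self.name = name        # e.g. "elements"
        self.domain = domain    # shared member domain, e.g. ("orig", "other")
        self.values = values    # named sets, e.g. {"v1": {"point", "line", "screen"}}

    def expand(self, value_name):
        """Scalar member variables brought into being by choosing one value."""
        return {f"{member}_{self.name}": self.domain
                for member in self.values[value_name]}

# Line 1 of Figure 6.2: set elements(orig, other):(v1{point, line, screen}, other)
elements = SetValuedVariable(
    "elements", ("orig", "other"),
    {"v1": {"point", "line", "screen"}, "other": set()})
```

Choosing v1 yields the scalar variables point_elements, line_elements, and screen_elements, matching the generated ACN shown in Figure 6.4.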
For another example, Hannemann and Kiczales point out an observing policy dimension, which
1:  set elements(orig, other): (v1{point, line, screen}, other);
2:  set subject_role(*elements): (v1{point, line}, v2{point, line, screen}, other);
3:  set observer_role(*elements): (v1{screen}, other);
4:  set policy_observing(orig, other): (v1{color}, v2{color, position}, other);
5:  scalar policy_notify: (push, pull);
6:  scalar policy_update: (orig, other);
7:  scalar d_mapping: (hashtable, other);
8:  subspace d_paradigm: (OO, AO);
9:  d_paradigm_OO [
10:   scalar adt_observer: (orig, other);
11:   scalar adt_subject: (orig, other);
12:   adt_subject = orig => d_mapping = orig && adt_observer = orig
        && policy_notify = push;
13:   ~observer_role = orig => adt_observer = orig && policy_update = orig;
14:   ~subject_role = orig => adt_subject = orig && ~policy_observing = orig;
15: ];
16: d_paradigm_AO [
17:   scalar abstract_protocol_interface: (orig, other);
18:   scalar abstract_protocol_impl: (orig, other);
19:   set concrete_protocol(orig, other): %policy_observing;
20:   %policy: policy_observing, con: concrete_protocol% | con = orig => policy = orig;
21:   abstract_protocol_impl = orig => abstract_protocol_interface = orig
        && d_mapping = hashtable && policy_notify = push;
22:   ~concrete_protocol = orig => abstract_protocol_interface = orig
        && ~subject_role = orig && ~observer_role = orig && policy_update = orig;
];
Figure 6.2: Figure Editor CACN Model
we model as policy_observing, and one of its variations: in one design, only the colors of figure
elements are observed. In another design, their positions are observed too. Line 4 in Figure 6.2
models this variable. Each value in the domain of policy_observing models one possibility.
This is not the only way that a decision can bring a new dimension into being. What we call the
one-to-one correspondence relation among design dimensions is another: in a networking system,
each supported protocol should have a corresponding
processing module; in a fault tree analysis tool, each type of node should have a corresponding
shape to denote it visually; etc.
In the aspect-oriented Figure Editor design introduced by Hannemann and Kiczales [38], each
dimension to be observed (modeled by policy_observing), color or position, incurs a corresponding
concrete aspect protocol, concrete_protocol. We use "%" to denote that the choice
of concrete_protocol is brought into being by this bijective mapping: each dimension to be observed is
handled by a concrete protocol, as shown in Line 20 of Figure 6.2.
In Hannemann and Kiczales’s paper [38], the initial FE design assigns the subject role to Point
and Line, and the observer role to Screen. They mention a variation that a screen can also take
the subject role. We notice that the decisions in the subject role and observer role dimensions do
not bring new design dimensions into being. Instead, they refer to existing design dimensions in order to
impose constraints on them. We still model such a dimension as an SDV, but specify the referenced
dimension in the parentheses following the SDV name: set subject_role(*elements). The *
before the referred variable name tells Simon that this SDV does not bring new dimensions, but
refers to existing design dimensions defined in elements. Lines 2 and 3 in Figure 6.2 model the
subject role and observer role dimensions, their original state, and the variation.
6.3 Crosscutting Design Dimensions
In general, a decision in one dimension can have complex interactions with decisions in other di-
mensions: either taking them as assumptions or constraining them. The constraints imposed by
design decisions can be pervasive: they can be system-wide and crosscutting. We have found uni-
versal quantification to be useful in capturing the crosscutting phenomena among design decisions.
For example, in the OO observer pattern FE design, the abstract subject interface influences all the
concrete subjects, and the observing policy specifying which states should be observed influences
all the subjects. These constraints can be logically modeled as:
∀subject : subject_role • subject = orig ⇒ (adt_subject = orig ∧ (∀policy : policy_observing • policy = orig))
Line 14 in Figure 6.2 demonstrates how Simon models this constraint, in which "~" is shorthand
for universal quantification. The notation in which we present our examples is not a fully developed
relational logic, but rather is the result of our introducing mechanisms as the need has arisen.
6.4 Nested Design Subspaces
The structure of a design could be recursive in the sense that a decision in one dimension can intro-
duce new dimensions in which design decisions have to be made, as well as new constraints. The
set-valued variables in CACNs represent such design dimensions, but in a simpler form: each
dimension within a set has the same domain, and no constraints are specified. A design decision
can introduce more complex subspaces with new dimensions, each having different domains and
constraints affecting decisions both within and outside of the set of variables of the new subspace.
We capture the recursive nature of a design by the notion that a value can carry a subspace,
which we call a subspace value. We model a design dimension with subspace values as a hier-
archical design variable (HDV), and we model each subspace recursively as a CACN. For the FE
example, the programming paradigm choices for the observer pattern can be modeled as an HDV:
subspace d_paradigm: (OO, AO).
Line 9 through Line 15 in Figure 6.2 model the OO subspace; Line 16 through Line 22
model the AO subspace. We observe that the OO subspace introduces two abstract interfaces:
adt_observer and adt_subject. These subspaces also introduce new constraints among deci-
sions. Line 21 in Figure 6.2 shows that, in the AO design, the abstract protocol implementation
makes assumptions about the mapping data structure decision and the notification policy. For the
detailed AO observer design, please refer to Hannemann and Kiczales’s paper [38].
Figure 6.3: Complex Augmented Constraint Network
6.5 Parameterizing a CACN
Changing high-level design decisions, such as the values of SDVs and HDVs, incurs structural impacts
that we are interested in analyzing. For example, if an SDV takes a new value in which a new dimension
is added, say the FE now takes Circle as a new element, the impact analysis should
be able to identify the constraints that the Circle variable has to respect. If an HDV modeling
design patterns changes from one pattern to another, the impact analysis should be able to compare
the different coupling structures caused by choosing different patterns, and determine which pattern
better accommodates anticipated changes.
This section explains how our CACN model supports these structural design impact analyses
(SDIA). The basic idea is to parameterize a CACN into a set of design alternatives, instantiate each
design alternative into a flat ACN, and compare the structural differences among the resulting ACNs.
6.5.1 Design Alternatives
Each value of a variable in an ACN or CACN defines an alternative choice of that dimension. For
an SDV or HDV value, such a choice is a substructure. A design of an ACN or CACN can be seen
as the combination of these alternatives. Parameterizing an ACN or CACN involves selecting one
alternative from each dimension, that is, binding a value to each variable, and combining them to
form a design, which we call a design alternative. Our structural design impact analysis begins by
parameterizing a CACN into a set of design alternatives.
For the purpose of illustration, we model the FE CACN as an and-or tree depicted in Figure 6.3.
We put all the SDVs and HDVs with specified value alternatives as the entries of the root AND node.
We also consider all other variables and constraints as a Basic sub-design, making it the first entry
of the AND node. Each SDV or HDV heads an OR node whose entries are its values, each
considered as an alternative subspace. The FE CACN thus can be formalized as:
FE = Basic ∧
(elements = v1 ∨ elements = other) ∧
(observer_role = v1 ∨ observer_role = other) ∧
(subject_role = v1 ∨ subject_role = v2 ∨ subject_role = other) ∧
(policy_observing = v1 ∨ policy_observing = v2 ∨ policy_observing = other) ∧
(d_paradigm = OO ∨ d_paradigm = AO)
We rewrite the above formula in disjunctive normal form (DNF), assign a name to each
clause, and use each clause as an alternative design. There are 72 alternative designs in the design
space defined by the FE CACN: FE = FE0 ∨ FE1 ∨ ... ∨ FE71. We define:
FE1 = Basic ∧ elements = v1 ∧ observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v1 ∧ d_paradigm = OO
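The enumeration behind the DNF amounts to a cross product over the per-dimension choices. The Python fragment below is a hypothetical illustration, not Simon's code; the dimension names and value labels follow Figure 6.2, and the shared Basic sub-design is omitted because it appears in every alternative:

```python
from itertools import product

# Top-level choices of the FE CACN (domains per Figure 6.2).
dimensions = {
    "elements":         ["v1", "other"],
    "observer_role":    ["v1", "other"],
    "subject_role":     ["v1", "v2", "other"],
    "policy_observing": ["v1", "v2", "other"],
    "d_paradigm":       ["OO", "AO"],
}

names = list(dimensions)
alternatives = [dict(zip(names, combo))
                for combo in product(*dimensions.values())]
print(len(alternatives))  # 72 design alternatives: FE0 .. FE71
```

FE1, for instance, corresponds to the binding of v1 to elements, observer_role, subject_role, and policy_observing, with d_paradigm = OO.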
Simon allows the user to designate a value for each SDV and HDV, and automatically translates
the specified CACN design alternative into a new simple ACN.
6.5.2 Instantiating ACNs
Figure 6.4 shows the ACN instantiated from design alternative FE1. Instantiating a CACN design
alternative into a flat ACN involves the following steps:
1. Replace each SDV with a set of scalar variables.
For example, for elements = v1{point, line, screen}, Simon generates a set of new
scalar variables shown in Line 1 to Line 3 in Figure 6.4. The name of each new scalar
variable is the combination of the variable name and the value element name; the do-
main of each scalar variable is copied from the original CACN definition. Similarly,
policy_observing = v1{color} becomes Line 7 in Figure 6.4. For SDVs that only re-
fer to other variables, such as subject_role and observer_role, Simon just internally
stores the variables they refer to, and processes the constraints imposed as shown in step 3.
For SDVs defined by a one-to-one correspondence relation, such as:
set concrete_protocol(orig, other): %policy_observing;
According to the decision policy_observing = v1{color} in FE1, Simon implicitly adds
a new value, v1{color}, into the domain of concrete_protocol, and stores the bijective
mapping internally. After that, Simon generates the following new variable:
scalar color_concrete_protocol:(orig, other);
2. Replace each hierarchical design variable with the sub-structure associated with the desig-
nated value. For example, d_paradigm = OO in FE1 is replaced with the OO subspace shown
in box 2, Figure 6.3, which introduces two new variables: adt_observer and adt_subject,
as shown in Line 8 and Line 9 in Figure 6.4. If an HDV value defines a structure with new
SDVs or HDVs, we just need to repeat these steps recursively.
3. Remove universal quantifiers. Once all the SDVs are replaced with scalar variables, universal
quantifications can be replaced with quantifier-free constraints. For example, we have shown that Line
14 in Figure 6.2 models the following constraint: ∀subject : subject_role • subject = orig ⇒
(adt_subject = orig ∧ (∀policy : policy_observing • policy = orig))
Simon first translates it into a normal form:
(¬∃subject : subject_role • subject = orig) ∨ (adt_subject = orig ∧ (∀policy :
policy_observing • policy = orig))
Suppose the value of subject_role has been designated to v1{point, line}, and
policy_observing = v1{color},
Simon replaces the quantified expression
(¬∃subject : subject_role • subject = orig)
with a conjunctive expression:
point_subject_role ≠ orig ∧ line_subject_role ≠ orig
Since subject_role refers to variables defined in elements, Simon replaces
point_subject_role with point_elements, a variable existing in the generated ACN.
1:  scalar point_elements:(orig,other);
2:  scalar line_elements:(orig,other);
3:  scalar screen_elements:(orig,other);
4:  scalar policy_notify:(push,pull);
5:  scalar policy_update:(orig,other);
6:  scalar d_mapping:(orig,other);
7:  scalar color_policy_observing:(orig,other);
8:  scalar adt_observer:(orig,other);
9:  scalar adt_subject:(orig,other);
10: (point_elements != orig && line_elements != orig) ||
      (adt_subject = orig && color_policy_observing = orig);
11: screen_elements != orig || (adt_observer = orig && policy_update = orig);
12: adt_subject = orig => d_mapping = orig && adt_observer = orig && policy_notify = push;
Figure 6.4: The Constraint Network in an ACN Generated by Design Alternative FE1
Similarly, (∀policy : policy_observing • policy = orig) is replaced with
color_policy_observing = orig. As a result, the above quantified expression is translated into
Line 10 in Figure 6.4.
After binding a value for each SDV and HDV, Simon generates a plain ACN with only scalar
variables and non-quantified constraints.
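Step 3 can be sketched in a few lines. The following Python fragment is a hypothetical illustration of the quantifier elimination, not Simon's implementation: once the SDV that a quantifier ranges over is bound to a concrete set of members, the universal collapses into a finite conjunction over the corresponding scalar variables:

```python
# Hypothetical sketch of quantifier elimination over a bound SDV: expand
# "forall x in members . body(x)" into an explicit conjunction, naming each
# member's scalar variable member_suffix (e.g. point_elements).
def expand_forall(members, var_suffix, body_template):
    return " && ".join(body_template.format(var=f"{m}_{var_suffix}")
                       for m in members)

# With subject_role bound to v1{point, line}, the negated existential
# "no subject is orig" becomes an explicit conjunction:
clause = expand_forall(["point", "line"], "elements", "{var} != orig")
print(clause)  # point_elements != orig && line_elements != orig
```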
6.6 Structural Design Impact Analysis Overview
In their paper [38], Hannemann and Kiczales compared the use of OO and AO techniques to
implement the observer pattern. In this section, we formulate this analysis as a decision problem
defined on a CACN: if d_paradigm changes from OO to AO, what is the impact on the design
coupling structure? Subsection 6.6.1 analyzes this problem. Subsection 6.6.2 analyzes the impact
of changing the role assignment so that the Screen is both a subject and an observer, a problem
Hannemann and Kiczales analyzed descriptively in their paper. Subsection 6.6.3 analyzes another
problem they mentioned: What if the observing policy changed so that the positions of the figure
elements should also be observed in addition to the colors? Hannemann and Kiczales analyzed
1:  color_policy_observing:{orig,other};
2:  policy_notify:{push,pull};
3:  policy_update:{orig,other};
4:  d_mapping:{orig,other};
5:  abstract_protocol_impl:{orig,other};
6:  point_elements:{orig,other};
7:  line_elements:{orig,other};
8:  screen_elements:{orig,other};
9:  color_concrete_protocol:{orig,other};
10: abstract_protocol_interface:{orig,other};
11: color_concrete_protocol = orig => color_policy_observing = orig;
12: color_concrete_protocol = orig => abstract_protocol_interface = orig
      && point_elements = orig && line_elements = orig
      && screen_elements = orig && policy_update = orig;
13: abstract_protocol_impl = orig => abstract_protocol_interface = orig
      && d_mapping = orig && policy_notify = push;
Figure 6.5: The Constraint Network in an ACN Generated by Design Alternative FE2
these problems descriptively and showed the code implementing these choices as the evidence of
their analysis. This section shows the automation of their analyses at the design level.
6.6.1 OO Pattern Versus AO Pattern
To analyze the impact of changing a structural decision on a hierarchical or set-valued design di-
mension, the user needs to designate two design alternatives for the changed and original decisions.
For example, we define FEOO = FE1, specifying the original OO design, for which Simon generates
a DSM as shown in Figure 6.6. Now the decision on d_paradigm changes from OO to AO, leading
to a new design alternative, FEAO:
FEAO = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v1 ∧ d_paradigm = AO
In order to compare the change impact, we use Simon to generate a new ACN for FEAO, shown
in Figure 6.5, and derive the DSMs of FEOO and FEAO, as shown in Figure 6.6 and Figure 6.7. We
observe that the AO design has fewer dependence marks. In the AO design, the decisions on the
Figure 6.6: The DSM of FEOO: OO Figure Editor Design
notification and update policies no longer influence the concrete subjects, such as the Point and
Line implementations. Instead, only the abstract and concrete protocols depend on these policies,
indicating the localization of crosscutting decisions.
We can also analyze the respective consequences of changing common design decisions for two
design alternatives, for example, the consequences of changing the notification policy from push
to pull in both AO and OO designs. Figure 6.8 shows two Simon DIA snapshots, comparing the
change impacts in both designs. Simon shows that in the OO design, three variables other than
policy_notify have to be revisited, while in the AO design, only one other variable should be
revisited. Counting the number of variables affected by a change in a decision is clearly insufficient
to determine the cost of change. However, identifying what must change is a critical step, and we
hypothesize that our analysis can be combined with traditional methods of cost estimation for changes
in individual design decisions to support economic reasoning.
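The counts above can be reproduced by simple reachability over the dependence relation read off the constraint networks. The sketch below is a hypothetical over-approximation written for illustration, not Simon's DIA algorithm, and the edge sets paraphrase the constraints of Figures 6.4 and 6.5:

```python
from collections import defaultdict

def impacted(depends_on, changed):
    """Variables that may need revisiting when `changed` changes: everything
    that can reach `changed` through the dependence edges y -> x, where y's
    constraint mentions x."""
    reverse = defaultdict(set)
    for y, xs in depends_on.items():
        for x in xs:
            reverse[x].add(y)
    seen, frontier = set(), {changed}
    while frontier:
        frontier = {y for v in frontier for y in reverse[v]} - seen
        seen |= frontier
    return seen

oo_deps = {  # paraphrased from the FE_OO constraint network (Figure 6.4)
    "adt_subject": {"d_mapping", "adt_observer", "policy_notify"},
    "point_elements": {"adt_subject", "color_policy_observing"},
    "line_elements": {"adt_subject", "color_policy_observing"},
    "screen_elements": {"adt_observer", "policy_update"},
}
ao_deps = {  # paraphrased from the FE_AO constraint network (Figure 6.5)
    "abstract_protocol_impl": {"abstract_protocol_interface",
                               "d_mapping", "policy_notify"},
}
print(len(impacted(oo_deps, "policy_notify")))  # 3 variables to revisit
print(len(impacted(ao_deps, "policy_notify")))  # 1 variable to revisit
```

Simon's DIA computes impacts from the ACN change semantics rather than raw reachability, so this sketch only illustrates the transitive nature of the question.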
6.6.2 Change Role Assignment
This section shows how Simon reveals the different consequences of changing the value of
subject_role from v1{point, line} to v2{point, line, screen} in the OO and AO
designs, respectively. We define:
Figure 6.7: The DSM of FEAO: AO Figure Editor Design
Figure 6.8: DIA: Notification Policy Change Impacts
Figure 6.9: The DSM of FEOO_Role: Screen takes the subject role
FEOO_Role = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v2 ∧
policy_observing = v1 ∧ d_paradigm = OO
and: FEAO_Role = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v2 ∧
policy_observing = v1 ∧ d_paradigm = AO
We first observe the impact of changing this decision on the AO design by comparing DSMs.
We find that the FEAO_Role DSM is identical to the DSM of FEAO, shown in Figure 6.7.
This means that the AO design localizes this change completely, without incurring any additional
dependencies or new dimensions.
The impact on the OO design is different. Figure 6.9 shows the DSM of FEOO_Role. Compared
with the DSM of FEOO shown in Figure 6.6, we observe that screen_elements now depends
on color_policy_observing, policy_notify, d_mapping and adt_subject. The coupling
structure changed because of this new decision.
Figure 6.10: The DSM of FEOO_Position: Positions are observed in the OO design
6.6.3 Change Observing Target
This section shows how Simon reveals the different consequences of changing the value of
policy_observing from v1{color} to v2{color, position} in the OO and AO designs,
respectively. We define:
FEOO_Position = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v2 ∧ d_paradigm = OO
and: FEAO_Position = Basic ∧ elements = v1 ∧
observer_role = v1 ∧ subject_role = v1 ∧
policy_observing = v2 ∧ d_paradigm = AO
Figure 6.10 shows the DSM of FEOO_Position. Compared with the DSM of FEOO shown in
Figure 6.6, we observe that a new variable, position_policy_observing, is added, and that it
influences the existing variables point_elements and line_elements.
Figure 6.11 shows the DSM of FEAO_Position. Compared with the DSM of FEAO shown in
Figure 6.7, we observe that two new variables are added: position_policy_observing and
position_concrete_protocol. The newly added dependences do not affect the existing structure
Figure 6.11: The DSM of FEAO_Position: Positions are observed in the AO design
(boxed within the darker lines).
In summary, these DSMs confirm Hannemann and Kiczales’s qualitative analyses, revealing
how these high level decisions impact the design coupling structure visually and precisely.
6.7 Related Work
Software design space, product family, and variability modeling has been widely studied by
Bosch [59], Lane [45], and others for product line design, design optimization, and other
purposes. Using logic to model features is not new.
Batory and O’Malley’s work specifies the relations among features using constraints [9, 8, 10].
Czarnecki et al.’s feature model [21] uses feature diagrams to represent the design variations. Sim-
ilar to our and-or-tree representation, they represent features as a hierarchical structure, and derive
a concrete configuration by selecting and cloning options. Feature models often need to contain
additional constraints mainly to express which options have to be co-existent, and which options
are exclusive from each other. Our approach is more general in that we not only model and analyze
features, which are one kind of design decisions, but also broader decisions such as refactoring
options, design patterns, and aspects. The impact of these decisions is beyond the inclusion or ex-
clusion of feature options. Our purpose differs in that we aim to analyze the modular structures in
design architectures, and their economic implications, while they aim to analyze feature properties.
On the other hand, we suspect that our approach is general enough to analyze the problems in their
domains.
Automatic program variation has also been widely studied, for example in Batory's work on generic
programming [11] and Goguen's work on parameterized programming [36]. While their purpose
is to synthesize complex software systems from libraries of reusable components, our purpose is to
rigorously support modularity analysis and decision-making.
Similar to our design space modeling, Lane [45] models the structure of software systems as
design spaces by identifying the key functional choices, and classifying the alternatives available
for each choice. Their rules, similar to our constraints, are formulated to relate choices
within a design space. Their purpose is to automatically select an optimal design. Our approach
is more general in that we model broader decision-making phenomena and their impacts, such as a
decision that brings up a subspace, the dominance relations among decision decisions, and different
modularization approaches. Our logical constraints are different and more expressive than their
semi-formal rules (guidelines), and our approach enables the comparison of designs in terms of
their modular structures.
Jackson [40] uses Alloy for object modeling with the goal of being able to check structural prop-
erties of object models specified using the Alloy relational logic. Alloy is thus related to our work
in several ways: it uses logic to model designs, and it supports formal analysis of specifications. In
fact, our Simon tool uses Alloy internally to analyze ACNs [18]. Unlike our work, however, Al-
loy has mainly been developed and used to specify formal properties and check complex relational
object structures, whereas our work aims to enable the specification of design decision spaces and
the analysis of properties such as evolvability under changes to given decisions and the net option
value of modularity.
Traditional impact analysis research focuses on change issues at the program level, as summarized
in [4]. Advantages of our approach include a precise semantics of dependence, and the ability
to reason about the ripple effects of changes in high-level design decisions. We have provided a
precise notion of impact analysis for logical design models.
Aspect-Oriented Software Development (AOSD) researchers [43, 68] have recognized the limitations
imposed by traditional OO design and contributed language constructs addressing multidimensional
and cross-cutting concerns. In addition, as Filman pointed out [30], quantification is a key
feature of AOP. However, the logical representation of crosscutting structures at the design level has
not been fully developed in earlier work, and is a key target of the current work.
6.8 Chapter Summary
Using the representative Figure Element example, this chapter presents an extended ACN model to
address the problem that ACN modeling is not adequate to capture complex design decisions and
to analyze their structural impacts. We evaluated and provided evidence in support of the claims
that our framework is able to capture representative complex design decisions, uniformly account
for aspect-oriented and object-oriented modularity, and automate the analyses of problems people
previously analyzed qualitatively.
Chapter 7
Formalization
Parnas’s information hiding criterion has been influential for decades [54]. Baldwin and Clark’s de-
sign rule theory [7] has shed additional light on the value of design modularity. Sullivan et al. [64]
showed that Baldwin and Clark’s model can be extended with environment parameters to account
for Parnas’s information hiding criterion. However, these theories remain informal, and conse-
quently unnecessarily hard to understand and hard to apply with rigor and precision. This chapter
addresses these problems by contributing a formalization of our framework that accounts for the
key notions within these theories, and that enables automation of the corresponding analysis tech-
niques. In Chapter 5, we introduced our divide-and-conquer approach to addressing the scalability
issue. In this chapter we formalize this approach and prove its correctness.
We claim that our framework is sufficient to formally account for: (1) Parnas’s criterion of
information hiding modularity; (2) Parnas’s approach to analyzing the changeability of a design;
and (3) Baldwin and Clark’s concepts of design dimensions, design decisions, design spaces, de-
sign rules, and design dependences, which, in turn, are the rigorous foundation for Baldwin and
Clark’s net option value analysis. We evaluate these claims by showing that all these concepts and
approaches can be formally defined in terms of augmented constraint networks, design automata,
and pair-wise dependence relations. We also claim that the divide-and-conquer approach produces
the same analysis results as the brute-force approach. In this chapter, we prove this claim formally.
We present our formalization using the Z (pronounced Zed) specification language [60].
Section 7.1 formalizes our core models. Section 7.2 presents how previous theories can be
formalized within the setting of our framework. Section 7.3 presents the formalization of our
divide-and-conquer approach and the proof that this approach is correct.
7.1 Formalizing the Core Models
This section formalizes the notions of finite-domain constraint network (FDCN), augmented con-
straint network (ACN), design automaton (DA), pair-wise dependence relation (PWDR), and the
derivation of DAs and PWDRs from ACNs.
7.1.1 Finite Domain Constraint Network
Because we intend to formalize our ideas using the Z language, we need a formalization of finite-
domain constraint networks (FDCNs) expressed in Z. We have thus developed a formalization of
Tsang’s quasi-formal model of constraint networks [69] in Z.
We first abstractly specify variables and values as given sets.
[Variable,Value]
The domains of variables are specified as a relation between variables and values.
Domains
domain : Variable↔ Value
In a valid assignment, each variable takes values from its domain. The following schema
states that an Assignment has a domain, and the function bindings maps variables to their values.
Line 7.1.1 ensures that the value assigned to a variable must respect its domain.
Assignment
Domains
bindings : Variable 7→ Value
∀var : dom bindings • bindings(var) ∈ domain(| {var} |) (7.1.1)
In a FDCN, a constraint is modeled as a set of permitted assignments to the variables to which
it applies, as formalized in the following schema. For the matrix example, if the domain of matrix is
{dense, sparse}, and the domain of ds is {array_ds, other_ds}, then the constraint ds = array_ds ⇒
matrix = dense is modeled as the following permitted assignments: {{ds = array_ds, matrix =
dense}, {ds = other_ds, matrix = dense}, {ds = other_ds, matrix = sparse}}.
Constraint
Domains
AssignmentsAllowed : FAssignment
∀allowed : AssignmentsAllowed • allowed.domain = domain
The following schema specifies the notion of FDCN, which consists of a finite set of variables,
their domains, and a set of constraints. Line 7.1.2 ensures that the domain is defined over the FDCN
variable set. Line 7.1.3 ensures that the constraints constrain the FDCN variable set. The function
solutions maps a FDCN to a set of assignments that are the solutions of the constraint network.
Line 7.1.4 ensures that each solution is an assignment, and its domain is the whole variable set of the
constraint network; Line 7.1.5 ensures that for any constraint, there exists a permitted assignment
that is the subset of (consistent with) the given solution.
ConstraintNetwork
VariableSet : FVariable
Domains
ConstraintSet : FConstraint
solutions : FAssignment
dom domain = VariableSet (7.1.2)
∀constraint : ConstraintSet • constraint.domain⊆ domain (7.1.3)
solutions = {solution : Assignment | solution.domain = domain ∧ (7.1.4)
(∀constraint : ConstraintSet • (∃allowed : constraint.AssignmentsAllowed •
(allowed.bindings⊆ solution.bindings)))} (7.1.5)
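The solution set defined by Lines 7.1.4 and 7.1.5 can be computed by brute-force enumeration. The following Python sketch (the names `solutions`, `domains`, and `permitted` are illustrative, not taken from our tool) enumerates the solutions of the matrix example, with the constraint given extensionally as its set of permitted partial assignments:

```python
from itertools import product

def solutions(domains, constraints):
    """Enumerate total assignments (7.1.4) that extend some permitted
    partial assignment of every constraint (7.1.5)."""
    variables = sorted(domains)
    result = []
    for values in product(*(sorted(domains[v]) for v in variables)):
        assignment = dict(zip(variables, values))
        if all(any(all(assignment[var] == val for var, val in allowed.items())
                   for allowed in permitted)
               for permitted in constraints):
            result.append(assignment)
    return result

# The matrix example: the constraint ds = array_ds => matrix = dense,
# modeled as the three permitted assignments listed in the text.
domains = {"matrix": ["dense", "sparse"], "ds": ["array_ds", "other_ds"]}
permitted = [{"ds": "array_ds", "matrix": "dense"},
             {"ds": "other_ds", "matrix": "dense"},
             {"ds": "other_ds", "matrix": "sparse"}]
sols = solutions(domains, [permitted])
# The three permitted assignments are exactly the solutions here.
```

As in the Z specification, the enumeration is exponential in the number of variables, which motivates the divide-and-conquer approach of Section 7.3.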
7.1.2 Augmented Constraint Network
As introduced in Chapter 3, we augment a constraint network with a dominance relation and a clus-
ter set, and call it an augmented constraint network. Since the cluster set component is only used
when the user needs to generate a DSM, we elide it in our core formalization. Thus, we formally
specify an ACN as a constraint network and a dominance relation, as the AugmentedConstraintNet-
work schema states.
AugmentedConstraintNetwork
cn : ConstraintNetwork
dominance : Variable↔ Variable
dom dominance = cn.VariableSet (7.1.6)
ran dominance = cn.VariableSet (7.1.7)
We model a dominance relation as a binary relation with the semantics that: if (x,y) ∈
dominance, changes to x cannot force changes to y. Constraints 7.1.6 and 7.1.7 state that the domi-
nance relation is defined on the variable set of the constraint network.
7.1.3 Design Automaton
The following schema specifies our Design Automaton (DA) model, and the constraints state the
following:
• (7.1.8) A DA consists of a set of states. Each state is an assignment.
• (7.1.9) The alphabet of a DA is a set of variable-value pairs.
• (7.1.10) In a DA, a transition models a change from an original design, leading to destination
states accommodating that change. In general, there are several ways to compensate for a
change, so a DA is nondeterministic. Accordingly, the transition function maps an assignment
(modeling a start design state) and a variable-value pair (modeling a change), to a set of new
assignments (the destination states).
• (7.1.11) The multiple destination states caused by the same change correspond to the differ-
ent ways to compensate for that change, each involving changes to a set of variables. The
function singleChangeSet summarizes the impact of one change: it maps a change (modeled
by a variable-value pair) starting from an original state (an assignment), to a group of variable
sets, each of which models a set of changed variables involved in one of the multiple ways to
accommodate this change ( 7.1.13).
• (7.1.12) The function allChangeSet specifies the impact of each variable by collecting all the
variable sets it can cause to change, starting from any design state, as specified in Line 7.1.14.
DesignAutomaton
states : FAssignment (7.1.8)
alphabet : Variable↔ Value (7.1.9)
transition : (Assignment× (Variable×Value))→ FAssignment (7.1.10)
singleChangeSet : (Assignment× (Variable×Value)) 7→ (F(FVariable)) (7.1.11)
allChangeSet : Variable 7→ (F(FVariable)) (7.1.12)
dom(dom transition) = states
alphabet = (ran(dom transition))
(ran transition) = states
∀start : Assignment; var : Variable; value : Value • (7.1.13)
(start ∈ states ∧ (var,value) ∈ alphabet)⇒
singleChangeSet(start,(var,value)) = {varset : FVariable |
(∃end : transition(start,(var,value)) •
(varset = dom(end.bindings\ start.bindings)))
}
∀var : Variable • allChangeSet(var) = ⋃{changedVarSet : F(FVariable) | (7.1.14)
(∃sol : states; value : alphabet(| {var} |) •
changedVarSet = singleChangeSet(sol,(var,value)))
}
The function deriveDA specifies the derivation of a DA from a given ACN, making use of the
function computeDA.
deriveDA : AugmentedConstraintNetwork → DesignAutomaton
∀acn : AugmentedConstraintNetwork; da : DesignAutomaton •
deriveDA(acn) = computeDA(acn.cn.solutions,acn.cn.domain,acn.dominance)
computeDA : FAssignment× (Variable↔ Value)
×(Variable↔ Variable)→ DesignAutomaton
∀solutions : FAssignment; domain : (Variable↔ Value);
dominance : Variable↔ Variable; da : DesignAutomaton •
computeDA(solutions,domain,dominance) = da⇒
da.states = solutions ∧ da.alphabet = domain ∧ (7.1.15)
(∀start : da.states; change : da.alphabet; endstates : Fda.states •
da.transition(start,change) = endstates⇒ (∀end : endstates •
(change ∈ end.bindings) ∧ (7.1.16)
(∀sub : F(Variable×Value) • sub⊂ (end.bindings\ start.bindings)⇒
replace(start,sub) /∈ solutions) ∧ (7.1.17)
(∀ forced : dom(end.bindings\ start.bindings) • (7.1.18)
forced /∈ (dominance(| {first change} |)))))
The function computeDA maps a set of solutions, a domain, and a dominance relation to a DA.
It does so by essentially finding the minimal transitions that respect the dominance relation.
• (7.1.15) The states of the DA are the given solutions, and the DA alphabet is the given domain.
• (7.1.16) All destination states accommodate the specified change. That is, the set of bindings
of each end state includes the change.
• (7.1.17) Each transition between states is minimal. That is, changing any proper subset of
the differences between the start state and the destination state will not lead to a valid solution.
The helper function, replace, assigns different values to a set of variables in an assignment,
and returns a new assignment. end.bindings \ start.bindings models the differences
between the start state and the destination state; sub is a proper subset of the difference;
replace(start,sub) leads the start design to a new assignment that differs by sub.
• (7.1.18) The dominance relation must not be violated. If (x, y) ∈ dominance, then among
all the possible ways to restore consistency in the face of a change to x, those involving y are
excluded.
replace : (Assignment× (F(Variable×Value)))→ Assignment
∀ from, to : Assignment; changes : F(Variable×Value) •
replace(from,changes) = to⇔ to.bindings\ from.bindings = changes
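As a concrete reading of computeDA, the Python sketch below (illustrative names; a simplification that reads minimality (7.1.17) as "no other valid destination changes a strict subset of the same variables") computes the destination states of one transition:

```python
def diff(start, end):
    """The changed bindings, end.bindings \\ start.bindings."""
    return {v: x for v, x in end.items() if start.get(v) != x}

def transitions(solutions, dominance, start, change):
    """Destination states for `change` from `start`, per 7.1.16-7.1.18."""
    var, val = change
    shielded = {y for (x, y) in dominance if x == var}
    candidates = []
    for end in solutions:
        if end.get(var) != val:
            continue                      # must accommodate the change (7.1.16)
        d = set(diff(start, end))
        if d & shielded:
            continue                      # dominance respected (7.1.18)
        candidates.append((end, d))
    # minimality (7.1.17): drop any candidate whose changed-variable set
    # strictly contains another candidate's.
    return [end for end, d in candidates
            if not any(d2 < d for _, d2 in candidates)]

sols = [{"ds": "array_ds", "matrix": "dense"},
        {"ds": "other_ds", "matrix": "dense"},
        {"ds": "other_ds", "matrix": "sparse"}]
start = {"ds": "array_ds", "matrix": "dense"}
# Changing ds to other_ds: the minimal compensation keeps matrix = dense.
ends = transitions(sols, set(), start, ("ds", "other_ds"))
```

With the dominance pair (matrix, ds), the same sketch correctly reports that no destination exists for a change to matrix, since every compensation would have to touch the shielded variable ds.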
7.1.4 Pair-wise Dependence Relation
The following schema specifies a pair-wise dependence relation (PWDR) among design variables:
PairWiseDependenceRelation
pairs : Variable↔ Variable
The function derivePWDR specifies the derivation of a pair-wise dependence relation (PWDR)
from an ACN. In essence, a PWDR is computed from a DA (7.1.19), specified by the function
computePWDR. Constraint 7.1.20 states that a pair (x, y) belongs to the PWDR if and only if y is
impacted by x.
derivePWDR : AugmentedConstraintNetwork → PairWiseDependenceRelation
∀acn : AugmentedConstraintNetwork; pwdr : PairWiseDependenceRelation •
derivePWDR(acn) = pwdr ⇒
pwdr = computePWDR(deriveDA(acn)) (7.1.19)
computePWDR : DesignAutomaton→ PairWiseDependenceRelation
∀da : DesignAutomaton; pwdr : PairWiseDependenceRelation •
computePWDR(da) = pwdr ⇒
(∀pair : pwdr.pairs • (∃vargroup : da.allChangeSet(first pair) • (7.1.20)
(second pair) ∈ vargroup))
We have thus developed a formal definition of what it means for two design variables to be
coupled: if y is involved in some minimal compensation for some change to x, we say that y
depends on x. The DA encodes complete coupling information; a PWDR summarizes (but
loses) information in the DA.
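The summarization in 7.1.20 can be sketched directly over a transition table. In the fragment below (illustrative names; the tiny DA is hand-written for the matrix example rather than derived), states are tuples of (variable, value) pairs so they can serve as dictionary keys; note that self-pairs (x, x) appear because the changed variable itself is in dom(end \ start), matching the specification:

```python
def pwdr(transition):
    """(x, y) is in the relation iff y changes in some transition
    triggered by a change to x (7.1.20)."""
    pairs = set()
    for (start, (var, _val)), ends in transition.items():
        before = dict(start)
        for end in ends:
            for y, v in end:
                if before.get(y) != v:
                    pairs.add((var, y))
    return pairs

s_dense = (("ds", "array_ds"), ("matrix", "dense"))
s_sparse = (("ds", "other_ds"), ("matrix", "sparse"))
s_other = (("ds", "other_ds"), ("matrix", "dense"))
table = {
    (s_dense, ("matrix", "sparse")): [s_sparse],  # forces ds to change too
    (s_dense, ("ds", "other_ds")): [s_other],     # matrix can stay dense
}
deps = pwdr(table)
```

Here ds depends on matrix but not vice versa: the minimal compensation for changing ds leaves matrix alone.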
7.2 Formalizing Previous Theories of Modularity
In this section, we show how our models formally account for previous modularity theories.
7.2.1 Parnas’s Theory
Sullivan et al. [64] present a novel characterization of the nature of Parnas’s information hiding
modularity as invariance of design rules with respect to changes in environment variables in a
Baldwin-and-Clark-style DSM. In this subsection, we formalize this notion by partitioning the vari-
ables of an ACN into environment, design rule, and hidden variables, and then by stating that no
design rule variable is affected by any change in any environment variable. To our knowledge, this
work is the first to present a formal model of what it means for a design architecture to exhibit
information hiding modularity.
InformationHidingModularity
acn : AugmentedConstraintNetwork
Environment : FVariable
DesignRules : FVariable
HiddenVariables : FVariable
〈Environment,DesignRules,HiddenVariables〉 partition acn.cn.VariableSet (7.2.1)
∀pair : (derivePWDR(acn)).pairs •
¬ ((first pair) ∈ Environment ∧ (second pair) ∈ DesignRules) (7.2.2)
The schema InformationHidingModularity states that, after partitioning the variable set of an
ACN into Environment, DesignRules, and HiddenVariables (7.2.1), if the ACN models an
information-hiding-modularized design, its derived PWDR should contain no pair whose first
element is an environment variable and whose second element is a design rule (7.2.2).
Given a current design, what are all the ways to compensate for a sequence of given decision
changes? Parnas's changeability analysis, which finds the ripple effects of a change, can be
recovered from the answer to this more general question by comparing the feasible new designs
with the original design. The answer could also be used, for example, to find the most
cost-effective way to accommodate a change. We specify this problem and its solution in the
DesignImpactAnalysis schema.
The function impact maps an original design and a sequence of changes to a set of evolution
paths, each comprising a sequence of designs accommodating the changes. The last states of
these paths are the new designs that the original one could reach. Constraint 7.2.3 states that the
start design is the first state in the evolution path. Constraint 7.2.4 states that each transition step
consumes a change, leading to a set of new designs preserving that change.
DesignImpactAnalysis
acn : AugmentedConstraintNetwork
impact : (Assignment× (seq(Variable×Value))) 7→ (F(seqAssignment))
∀start : acn.cn.solutions; changes : seq(Variable×Value) •
∀n : 1 . .#changes • changes(n) ∈ (deriveDA(acn)).alphabet ⇒
impact(start,changes) = {path : seqAssignment | start = path(0) ∧ (7.2.3)
(∀n : 1 . .#path • path(n) ∈
(deriveDA(acn)).transition(path(n−1),changes(n))) (7.2.4)
}
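Operationally, impact folds the change sequence through the DA's transition function. A small Python sketch (illustrative names; states encoded as tuples of (variable, value) pairs so they can key the transition table):

```python
def impact(transition, start, changes):
    """All evolution paths obtained by consuming the changes in order
    (7.2.3, 7.2.4); each path begins at the start design."""
    paths = [[start]]
    for change in changes:
        paths = [path + [end]
                 for path in paths
                 for end in transition.get((path[-1], change), [])]
    return paths

s0 = (("ds", "array_ds"), ("matrix", "dense"))
s1 = (("ds", "other_ds"), ("matrix", "sparse"))
table = {
    (s0, ("matrix", "sparse")): [s1],
    (s1, ("ds", "array_ds")): [s0],
}
# Two changes in sequence: matrix becomes sparse, then ds reverts.
paths = impact(table, s0, [("matrix", "sparse"), ("ds", "array_ds")])
```

The last state of each returned path is one of the new designs the original design could reach, as described above.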
7.2.2 Baldwin and Clark’s Theory
Given these formalizations, we found that the following key concepts in Baldwin and Clark’s theory
are formally accounted for in our framework:
• Design dimensions: Variable.
• Design decisions: Value.
• Design spaces: the solutions of a CN, that is, the state set of a DA 1.
• Design dependences: the PWDR model.
• Design rules (in part): the dominance relation models a property of a design rule 2.
The derivation of a DSM from a PWDR becomes straightforward. The cluster set element of an
ACN consists of a set of clusterings. A clustering represents the organization of variables into an
1 Technically speaking, the design space for a problem is the entire set of possible designs that address that problem. A DSM or ACN generally models only a sub-space. When we say that we formalize the notion of design space, we mean that we have formalized an approach to modeling such sub-spaces.
2 According to Baldwin and Clark, design rules dominate other design decisions, decouple otherwise dependent decisions, and remain stable. This dissertation does not intend to formalize the stability property.
ordered tree, giving both a hierarchical clustering of variables and a linear ordering on the clusters
and variables within them. We elide the formal specification of clusterings in this dissertation. To
derive a DSM, we first compute the PWDR to populate a matrix, and then select a clustering to
order the columns and rows of the matrix, and to group them into a hierarchy of proto-modules.
Different DSMs can be derived from a given PWDR using different clusterings.
Given a derived DSM, the NOV of the structure is computed by (1) assigning a value of the
technical potential parameter in Baldwin and Clark's model to each of the top-level modules, (2)
deriving values expressing the complexity of each module by structural analysis of the DSM (or
by other means), (3) reading the "sees" relation directly from the DSM, and (4) plugging all these
values into Baldwin and Clark's NOV formula. All of these steps are automated in our tool, Simon.
The bottom line is that our framework provides a rigorous semantics for the marks in a DSM and a
way to compute the net option value of modularity for formally precise, abstract models of software
design architectures.
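Step (4) can be sketched as follows. We use one common reading of Baldwin and Clark's per-module formula, NOV_i = σ_i √n_i Q(k_i) − C_i k_i − Z_i, where Q(k) is the expected best nonnegative outcome of k standard-normal "experiments"; the parameterization and the toy numbers here are illustrative only, and Baldwin and Clark [7] should be consulted for the exact model:

```python
import math
import random

def q(k, trials=20000, seed=0):
    """Monte Carlo estimate of Q(k): the expected value of the best of k
    standard-normal draws, floored at zero (keep the base design if every
    experiment comes out negative)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(0.0, max(rng.gauss(0.0, 1.0) for _ in range(k)))
    return total / trials

def nov(sigma, n, k, experiment_cost, visibility_cost):
    """Net option value of one module under the sketched formula."""
    return sigma * math.sqrt(n) * q(k) - experiment_cost * k - visibility_cost

# Hypothetical module: technical potential 1.0, complexity of 6 parameters,
# 3 redesign experiments at cost 0.2 each, visibility cost 0.1.
value = nov(1.0, 6, 3, 0.2, 0.1)
```

In Simon, the complexity and visibility inputs come from the DSM itself, as steps (2) and (3) describe; the sketch takes them as plain arguments.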
7.3 The Divide-and-Conquer Approach and Its Correctness
Generating a design automaton for an ACN requires explicit enumeration of all the solutions, the
number of which grows exponentially with the number of variables involved. To address the scala-
bility issue, in Chapter 5 we presented a divide-and-conquer approach to decomposing a large ACN
model into a set of smaller sub-ACNs, solving each sub-ACN separately, analyzing these smaller
models, and composing the results. We showed that it is not necessary to compose a full DA from
sub-DAs. Instead, design impact analysis can be done in a divide-and-conquer manner, and a full
DSM can be derived from a full PWDR composed from the sub-PWDRs.
In this section we present the formalization of these approaches, and prove that our divide-
and-conquer approaches to impact analysis and PWDR derivation, respectively, are correct. Our
divide-and-conquer approaches to PWDR derivation and impact analysis involve the derivation of
sub-DAs from sub-ACNs, and the composition of results from these partial intermediate models.
Section 7.3.1 first formalizes ACN decomposition. Section 7.3.2 then presents our divide-and-
conquer approach to PWDR derivation and proves its correctness. Section 7.3.3 finally presents
and proves our divide-and-conquer approach to impact analysis.
7.3.1 ACN Decomposition
This section formalizes the key notions of our approach to ACN decomposition. The function de-
compose specifies the constraints between a full ACN and its decomposed sub-ACNs. The approach
we described in Chapter 5 decomposes a large ACN into a number of sub-ACNs that conform to
this specification.
decompose : AugmentedConstraintNetwork → FAugmentedConstraintNetwork
∀ full : AugmentedConstraintNetwork; sub : FAugmentedConstraintNetwork •
decompose(full) = sub⇒
full.cn.VariableSet = {var : Variable | ∃subacn : sub • (7.3.1)
var ∈ subacn.cn.VariableSet} ∧
full.cn.domain = {var : Variable; val : Value | ∃subacn : sub • (7.3.2)
(var,val) ∈ subacn.cn.domain} ∧
full.cn.ConstraintSet = {subcons : Constraint | ∃subacn : sub • (7.3.3)
subcons ∈ subacn.cn.ConstraintSet} ∧
(∀subacn : sub • subacn.dominance = {x,y : Variable | (x,y) ∈ full.dominance ∧ (7.3.4)
x ∈ subacn.cn.VariableSet ∧ y ∈ subacn.cn.VariableSet})
• (7.3.1) The union of the variable set of each sub-ACN equals the variable set of the full ACN.
• (7.3.2) The union of the domain of each sub-ACN equals the domain of the full ACN.
• (7.3.3) The union of the constraint set of each sub-ACN equals the constraint set of the full
ACN. We assume that all constraints are expressed in conjunctive normal form (CNF).
• (7.3.4) The dominance relation of a sub-ACN is a subset of the full ACN dominance relation
that involves the variables of the sub-ACN.
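These four conditions can be checked mechanically. A Python sketch, with ACNs as plain dictionaries (the field names are illustrative, and the constraint labels stand for CNF clauses):

```python
def valid_decomposition(full, subs):
    """Check conditions 7.3.1-7.3.4 relating a full ACN to its sub-ACNs."""
    vars_ok = full["vars"] == set().union(*(s["vars"] for s in subs))          # 7.3.1
    domain_ok = full["domain"] == set().union(*(s["domain"] for s in subs))    # 7.3.2
    cons_ok = full["constraints"] == set().union(
        *(s["constraints"] for s in subs))                                     # 7.3.3
    dominance_ok = all(                                                        # 7.3.4
        s["dominance"] == {(x, y) for (x, y) in full["dominance"]
                           if x in s["vars"] and y in s["vars"]}
        for s in subs)
    return vars_ok and domain_ok and cons_ok and dominance_ok

full = {"vars": {"a", "b", "c"},
        "domain": {("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0), ("c", 1)},
        "constraints": {"a_implies_b", "b_implies_c"},   # CNF clause labels
        "dominance": {("a", "c")}}
sub1 = {"vars": {"a", "b"}, "domain": {("a", 0), ("a", 1), ("b", 0), ("b", 1)},
        "constraints": {"a_implies_b"}, "dominance": set()}
sub2 = {"vars": {"b", "c"}, "domain": {("b", 0), ("b", 1), ("c", 0), ("c", 1)},
        "constraints": {"b_implies_c"}, "dominance": set()}
ok = valid_decomposition(full, [sub1, sub2])
# A sub-ACN with an extra dominance pair violates 7.3.4.
bad = valid_decomposition(full, [dict(sub1, dominance={("a", "b")}), sub2])
```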
In the following sections, we will need two additional concepts. The first is the notion of what
we call the consistent solution set of a sub-ACN, by which we mean the subset of solutions of a
sub-ACN in a given decomposition of a full ACN that are consistent with the solutions of the other
sub-ACNs in that decomposition. The basic idea is that a sub-ACN generally has both a smaller
set of variables and a weaker set of constraints than the large ACN from which it was derived, and
so can have solutions that are inconsistent not only with those of the full ACN but also with those
of the other sub-ACNs.
consistentSolutions : AugmentedConstraintNetwork×AugmentedConstraintNetwork
→ FAssignment
∀ full : AugmentedConstraintNetwork; sub : AugmentedConstraintNetwork;
solset : FAssignment •
sub ∈ decompose(full) ∧ solset = {sol : sub.cn.solutions |
(∀subSolutions : sub.cn.solutions • (7.3.5)
∀subacn j : decompose(full)\{sub} •
∃compatibleSolution : subacn j.cn.solutions •
(∀sharedvar : (sub.cn.VariableSet∩ subacn j.cn.VariableSet) •
subSolutions.bindings(sharedvar) =
compatibleSolution.bindings(sharedvar)
)
)}
Function consistentSolutions specifies the computation of the consistent solutions of a sub-ACN
within the set of sub-ACNs decomposed from a full ACN. Constraint 7.3.5 states that for any
solution, subSolutions, of a sub-ACN, sub, any other sub-ACN sharing its variables has at least one
solution in which the shared variables have the same assignment. Any individual sub-ACN may
have a solution that satisfies its own constraints but makes the full constraint network inconsistent.
In other words, for a solution, sol_a, of a sub-ACN, acn_a, if there exists another sub-ACN,
acn_b, sharing its variables, but the sol_a assignment of these shared variables is not allowed in
acn_b, we call sol_a an "incompatible" solution. Constraint 7.3.5 ensures that all such incompatible
solutions are excluded.
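The filter in 7.3.5 is easy to state operationally. A Python sketch with solutions as dictionaries (names illustrative; we rely on each sub-ACN's solutions all binding the same variable set, as the ConstraintNetwork schema guarantees):

```python
def consistent_solutions(sub_sols, other_subs):
    """Keep a solution of one sub-ACN only if every other sub-ACN has a
    solution agreeing with it on all shared variables (7.3.5)."""
    kept = []
    for sol in sub_sols:
        ok = True
        for sols_j in other_subs:
            if not sols_j:
                ok = False   # a sub-ACN with no solutions is compatible with nothing
                break
            shared = set(sol) & set(sols_j[0])
            if not any(all(s[v] == sol[v] for v in shared) for s in sols_j):
                ok = False
                break
        if ok:
            kept.append(sol)
    return kept

# Sub-ACN A over {x, y}; sub-ACN B over {y, z} only allows y = 0, so the
# A-solution with y = 1 is "incompatible" and is filtered out.
a_sols = [{"x": 0, "y": 0}, {"x": 1, "y": 1}]
b_sols = [{"y": 0, "z": 0}, {"y": 0, "z": 1}]
kept = consistent_solutions(a_sols, [b_sols])
```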
The second concept is what we call a consistent sub-DA, by which we mean the DA
computed from the consistent solution subset specified by consistentSolutions. In preceding
chapters, we have been imprecise about the details of the divide-and-conquer approach. In partic-
ular, when we talked about decomposing an ACN into sub-ACNs, and computing corresponding
sub-DAs, what we meant was computing consistent sub-DAs. In this chapter, we make these ideas
precise. As we will see, a more rigorous statement of our approach is that we decompose an ACN
into sub-ACNs, compute consistent sub-DAs for each sub-ACN, and synthesize our results from
these consistent sub-DAs. Function deriveConsistentSubDA specifies the consistent sub-DAs com-
putation.
deriveConsistentSubDA : AugmentedConstraintNetwork×AugmentedConstraintNetwork
→ DesignAutomaton
∀ full : AugmentedConstraintNetwork; sub : AugmentedConstraintNetwork •
deriveConsistentSubDA(full,sub) =
computeDA(consistentSolutions(full,sub),sub.cn.domain,sub.dominance)
7.3.2 Divide-and-Conquer Pair-Wise Dependence Relation Derivation
This section presents our divide-and-conquer approach to PWDR derivation and proves its correct-
ness. We prove that a PWDR derived by the divide-and-conquer approach is equal to the PWDR
derived by brute-force means directly from an ACN. The proof requires an important lemma: a
DA composed from the consistent sub-DAs derived from the sub-ACNs is equal to the DA that
would be obtained by brute-force derivation from the given ACN. Subsection 7.3.2.1 proves this
lemma. Subsection 7.3.2.2 uses this lemma to prove the equality between the composed PWDR,
PWDR_composed, and the directly derived PWDR, PWDR_direct.
7.3.2.1 Lemma: The Equality between the Composed DA and Directly Derived DA
In this subsection we show that the consistent sub-DAs corresponding to decomposed sub-ACNs
can be composed to produce a DA, DA_composed, and that DA_composed is equal to the DA that
corresponds directly to the given ACN, DA_direct. Figure 7.1 illustrates this idea as a commutative diagram.
We prove that this diagram commutes by first formalizing a function, deriveDA_divide_conquer,
that follows the "Divide-and-Conquer" path through the diagram, and then by proving that it
produces the same DA as obtained by following the "Brute-Force" path.
[Figure: commutative diagram]
Figure 7.1: The Brute-Force and Divide-and-Conquer DA Derivation
The following Z definition formalizes our divide-and-conquer function:
• (7.3.6) For each state of the composed DA, each sub-ACN has a consistent solution (that is,
a state of its consistent sub-DA) that is a subset of this state. In other words, each consistent
sub-DA has a state that is compatible with the composed DA state.
• (7.3.7) The alphabet of the composed DA is the union of the alphabets of the consistent
sub-DAs.
• (7.3.8) In order to specify the destination states reached by a transition comprising a start
state, start, and a change, change, we first specify several auxiliary sets:
deriveDA_divide_conquer : AugmentedConstraintNetwork → DesignAutomaton
∀acn : AugmentedConstraintNetwork; DA_composed : DesignAutomaton;
subacnset : FAugmentedConstraintNetwork •
subacnset = decompose(acn) ∧ DA_composed = deriveDA_divide_conquer(acn)⇒
(DA_composed.states = {state : Assignment | ∀subacn : subacnset • (7.3.6)
(∃sub : consistentSolutions(acn,subacn) • sub.bindings⊆ state.bindings)}) ∧
(DA_composed.alphabet = {var : Variable; val : Value | ∃subacn : subacnset • (7.3.7)
(var,val) ∈ (deriveConsistentSubDA(acn,subacn)).alphabet}) ∧
(DA_composed.transition = {start : DA_composed.states; (7.3.8)
change : DA_composed.alphabet; composedendset : FAssignment |
(∃affectedacnset : FAugmentedConstraintNetwork; affectedvarset : FVariable;
unchangedbindings : Assignment; composedaffectedset : FAssignment •
affectedacnset = {aacn : subacnset | first change ∈ aacn.cn.VariableSet} ∧ (7.3.9)
affectedvarset = {affectedvar : Variable | (∃acn : affectedacnset • (7.3.10)
affectedvar ∈ acn.cn.VariableSet)} ∧
unchangedbindings.bindings⊆ start.bindings ∧ (7.3.11)
(∀pair : unchangedbindings.bindings • (first pair) /∈ affectedvarset) ∧
composedaffectedset = {affected : Assignment | (7.3.12)
domaffected.bindings = affectedvarset ∧ (∀aacn : affectedacnset •
(∃substart : (deriveConsistentSubDA(acn,aacn)).states •
substart.bindings⊆ start.bindings ∧
(∃subend : (deriveConsistentSubDA(acn,aacn)).transition(substart,change) •
subend.bindings⊆ affected.bindings)))} ∧
composedendset = {end : DA_composed.states | (7.3.13)
(∃affected : composedaffectedset •
end.bindings = affected.bindings∪unchangedbindings.bindings)}
) • ((start,(change)),composedendset)})
a. (7.3.9) affectedacnset is a set of sub-ACNs that are affected by the specified change.
b. (7.3.10) affectedvarset is a set of variables that are involved in the affected sub-ACNs.
c. (7.3.11) unchangedbindings is a set of bindings that are not affected by the specified change,
because the variables they involve do not belong to any affected sub-ACN.
d. (7.3.12) composedaffectedset specifies the cross product of the destination-state sets reached
in each affected consistent sub-DA under the specified change, starting from the corresponding
substart.
Given all these auxiliary sets, composedendset (7.3.13) specifies the destination states reached
from the given start state and change: each is the union of the unchangedbindings with an element
of composedaffectedset.
Now we prove that the design automaton DA_direct directly derived from the original ACN is
equal to the composed DA, which we call DA_composed.
(1) We first prove that the state set of DA_direct is equal to the state set of DA_composed. That is,
each state S in DA_direct is also in DA_composed, and vice versa.
Theorem 1: For a given ACN, SuperACN, any state S in DA_direct = deriveDA(SuperACN) is
also in DA_composed = deriveDA_divide_conquer(SuperACN).
Proof of Theorem 1: Because S is a solution of SuperACN, S satisfies all the constraints of
SuperACN. Let SubACNSet be the set of sub-ACNs obtained by decompose(SuperACN), and let
SubACN be an arbitrary element of SubACNSet. We claim that S must be consistent with the
constraints of SubACN and of all other elements of SubACNSet. The reason is that each such sub-
ACN has a non-trivial projection of S as a solution, SubS. The projection is non-trivial because the
variable set of the sub-ACN is a non-empty subset of the variable set of the SuperACN. Moreover,
because SubS is a projection of a solution to the SuperACN, SubS must be consistent with all other
sub-ACNs, which is to say that it must be a consistent solution of the sub-ACN. By constraint 7.3.6
in the specification of deriveDA_divide_conquer, S is therefore a state in DA_composed. The reason is
that S is the union of a set of sub-states that satisfy the criteria in 7.3.6.
Theorem 2: For a given ACN, SuperACN, any state S in
DA_composed = deriveDA_divide_conquer(SuperACN) is also in DA_direct = deriveDA(SuperACN).
Proof of Theorem 2: Each state of DA_composed is the union of a set of consistent solutions of
sub-ACNs. The sub-ACNs collectively embody all of the constraints of the ACN, and the union of
consistent solutions is, by definition, consistent with the constraints of SuperACN.
Given Theorem 1 and Theorem 2, we conclude that the state set of DA_direct is equal to the state
set of DA_composed.
(2) Next we prove that the transition relation of DA_direct is equal to that of DA_composed.
Theorem 3: Given an ACN, SuperACN, and its DAdirect = deriveDA(SuperACN), any transition
T from state start to state end on assignment v = val in DAdirect is also a transition between the
corresponding states start and end in DAcomposed = deriveDA divide conquer(SuperACN). The
proof of this theorem depends on the following lemma:
Lemma 1: For any consistent sub-DA, subDAi, involving the change v = val, there exists a
transition t : ((substarti,(v,val)),subendi) in subDAi, where substarti is the projection of start, and
subendi is the projection of end.
Proof of Lemma 1: We prove this lemma by contradiction. Suppose the transition t is not
present in subDAi. For this to be the case, the transition t must not be minimal (because
the other conditions for a transition are satisfied). If t is not minimal, then there must be another
state in the sub-DA, which we call subend′i, where (subend′i \ substarti) ⊂ (subendi \ substarti).
But in this case, subend′i is a consistent state, which means that there is a state of DAdirect,
end′, of which subend′i is a projection, and there has to be a transition from start to end′ such that
(end′ \ start) ⊂ (end \ start), which violates the initial assumption that T is minimal.
The theorem follows from the combination of the lemma and our definition of the set of end
states of the composed DA. In particular, we need to show that end is one of the end states in the
composed DA. Given the lemma that for all the m consistent sub-DAs involving the change v = val,
each consistent subDAi, i = 1..m has a transition t : ((substarti,(v,val)),subendi) in subDAi, where
substarti is the projection of start, and subendi is the projection of end, we let composedaffectedset
be the union of all the subendi, i = 1..m. Since composedaffectedset ⊆ end, we can see that end is
one of the destination states reached by the given start state and the change in DAcomposed, according
to constraints 7.3.12 and 7.3.13 in the specification of deriveDA divide conquer. In summary, the
transition T exists in DAcomposed, and Theorem 3 is proved.
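The minimality condition that these proofs rely on can be sketched as a filter over candidate destination states (a hypothetical helper, not Simon's implementation): among candidates, keep only those whose set of changed variables is minimal under strict set inclusion.

```python
def changed(start, end):
    """Variables whose bindings differ between two states (dicts)."""
    return {v for v in start if start[v] != end[v]}

def minimal_destinations(start, candidates):
    """Keep only the candidates whose changed-variable set is minimal
    under strict set inclusion, mirroring the minimality condition on
    transitions of a design automaton."""
    return [e for e in candidates
            if not any(changed(start, e2) < changed(start, e)
                       for e2 in candidates)]

# Hypothetical example: the second candidate changes a strict superset
# of the variables changed by the first, so it is not minimal.
start = {"a": 0, "b": 0, "c": 0}
ends = [{"a": 1, "b": 0, "c": 0},   # changes {a}
        {"a": 1, "b": 1, "c": 0}]   # changes {a, b}: filtered out
assert minimal_destinations(start, ends) == [{"a": 1, "b": 0, "c": 0}]
```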
Theorem 4: Given an ACN, SuperACN, and its
DAcomposed = deriveDA divide conquer(SuperACN), any transition T from state start to state end
on assignment v = val in DAcomposed is also a transition between the corresponding states start
and end in DAdirect = deriveDA(SuperACN). In order to prove this theorem, we need to prove the
following lemmas:
Lemma 2: In the state end, the change v = val is accommodated.
Proof of Lemma 2: In the specification of transitions in deriveDA divide conquer, constraint
7.3.12 specifies the cross product of the end states in these affected consistent sub-DAs:
composedaffectedset = subendset1 × subendset2 × ...× subendsetm. Because the change v = val
is accommodated by every subendi, i = 1..m, in every affected ∈ composedaffectedset, v must
be bound to val. Constraint 7.3.13 specifies the union of affected and the unchanged bindings,
unchangedbindings, and obtains the set of full destination states: composedendset = {end :
Assignment | ∃affected : composedaffectedset • end = affected ∪ unchangedbindings}. Because
end ∈ composedendset, end must have accommodated the change v = val.
Lemma 3: The transition between start and end is minimal.
Proof of Lemma 3: Since there exists affected ∈ composedaffectedset such that end = affected∪
unchangedbindings, we just need to prove that affected is minimal. That is, there does not exist
an affected′ and end′ = affected′ ∪ unchangedbindings such that (end′.bindings \ start.bindings) ⊂
(end.bindings \ start.bindings). We prove this by contradiction. Suppose that affected′ exists,
and can be decomposed into the corresponding end states of each of the affected consistent sub-DAs,
subend′i, i = 1...m. We similarly decompose end and start into the corresponding end and start
states of each of the affected consistent sub-DAs, subendi, i = 1...m and substarti, i = 1...m. Then there
must exist a consistent sub-DA subDAi in which (subend′i \ substarti) ⊂ (subendi \ substarti). This
means that the transition from substarti to subendi labeled with v = val is not minimal, and should
not exist in subDAi. This is a contradiction. Thus, subend′i does not exist, and the transition between
start and end labeled with v = val is minimal.
Lemma 4: The dominance relation is respected.
Proof of Lemma 4: For any pair (v,w) in the dominance relation of DAdirect, if both v and
w belong to the same sub-ACN, then their dominance relation must have been respected by the
corresponding consistent sub-DAs; if v and w do not appear together in any sub-ACN, then in
the composed transitions of DAcomposed, changes to v must not cause changes to w. Otherwise,
DAcomposed could not be minimal.
Given these lemmas and the definition of a design automaton, start and end are connected by
v = val in DAdirect, and Theorem 4 is proved. Given Theorem 3 and Theorem 4, we conclude that
the transitions of DAdirect are equal to the transitions of DAcomposed.
Combining (1) and (2), we conclude that DAdirect equals DAcomposed.
7.3.2.2 Divide-and-Conquer PWDR Derivation
We have already formalized the relationship between an ACN and its DA; the decomposition of an
ACN into sub-ACNs; and a divide-and-conquer approach to DA composition. In this subsection
we show that the sub-PWDRs derived from the consistent sub-DAs corresponding to the sub-ACNs
can be composed to produce a PWDR, and that this composed PWDR, PWDRcomposed, is equal to
the directly derived PWDR, PWDRdirect. Figure 7.2 illustrates this idea as a commutative diagram.
Figure 7.2: The Brute-Force and Divide-and-Conquer PWDR Derivation
We prove that this diagram commutes by first formalizing a function,
derivePWDR divide conquer, that follows the “Divide-and-Conquer” path through the diagram, and then
by proving that it produces the same PWDR as obtained by following the “Brute-Force” path.
The following Z definition formalizes our divide-and-conquer function, which simply specifies
the composed PWDR as the union of the sub-PWDRs computed from the consistent sub-DAs.
derivePWDR divide conquer :
AugmentedConstraintNetwork → PairWiseDependenceRelation
∀acn : AugmentedConstraintNetwork; subacnset : FAugmentedConstraintNetwork;
pwdr : PairWiseDependenceRelation • subacnset = decompose(acn) ∧
derivePWDR divide conquer(acn) = pwdr ⇒
pwdr.pairs = {first : Variable; second : Variable |
∃subacn : subacnset • (first,second) ∈
(computePWDR(deriveConsistentSubDA(acn,subacn))).pairs
}
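Operationally, this specification is just a set union over the sub-PWDRs. A minimal sketch, with hypothetical pair sets standing in for the output of computePWDR:

```python
def compose_pwdr(sub_pwdrs):
    """Union the dependence pairs computed from each consistent sub-DA,
    as specified by derivePWDR_divide_conquer."""
    composed = set()
    for pairs in sub_pwdrs:
        composed |= pairs
    return composed

# Hypothetical sub-PWDRs that share one pair:
sub1 = {("a", "b"), ("b", "a")}
sub2 = {("b", "c"), ("b", "a")}
assert compose_pwdr([sub1, sub2]) == {("a", "b"), ("b", "a"), ("b", "c")}
```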
Having proved the equality between DAdirect and DAcomposed, we can easily prove the correct-
ness of the composed PWDR. We prove that if a pair (x,y) is in PWDRdirect, it must also be in
PWDRcomposed, and vice versa.
On one hand, if a pair (x,y) is in PWDRdirect, we know that there exists a transition in DAdirect
labeled with an assignment of x, and y is involved in accommodating this assignment. Since we
have proved that DAdirect equals DAcomposed, such a transition must also exist in DAcomposed. In
other words, such a transition must exist in one of the consistent sub-DAs, and consequently in the
computed sub-PWDR. Accordingly, (x,y) must also be in PWDRcomposed.
On the other hand, if a pair (x,y) is in the PWDRcomposed, we know that there exists a transition
in one of the consistent sub-DAs that is labeled with an assignment of x, and y is involved in
accommodating this assignment. Since DAcomposed is composed from sub-DAs, and is equal to
DAdirect, such a transition must exist in DAdirect. Accordingly, (x,y) must also be in PWDRdirect.
The proof is complete.
7.3.3 Divide-and-Conquer Design Impact Analysis
Given an original state, start, and a sequence of changes, the purpose of design impact analysis
(DIA) is to find all the evolution paths that start from start and go along the edges of the DA labeled
with these changes. The final states of these paths accommodate the changes, and the evolution
paths represent the different ways the changes can be accommodated.
Our divide-and-conquer design impact analysis involves the following steps:
(1) For the original state start, we identify a substart in each consistent sub-DA that is a subset
of start, and consider the first variable-value pair, change, of the sequence of changes.
(2) Suppose that m out of the n consistent sub-DAs involve change. For the variables that
are not involved in any of these m affected consistent sub-DAs, we keep their value assignments,
which we call unchangedbindings.
(3) For each of the m affected consistent sub-DAs, we find the set of destination states:
subendseti, i = 1...m, that the change leads to.
(4) Then we compute the cross product of the end states in these affected consistent sub-
DAs: composedaffectedset = subendset1 × subendset2 × ...× subendsetm. For each affected ∈
composedaffectedset, we then compute the union of affected and the unchanged bindings,
unchangedbindings, and obtain the set of full destination states: composedendset = {composedend :
Assignment | ∃affected : composedaffectedset • composedend = affected ∪ unchangedbindings}.
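The composition in steps (2)–(4) can be sketched as follows (a hypothetical sketch, not Simon's code; states are dicts of variable bindings, and the affected sub-states are assumed to agree on any shared variables, per the consistency requirement):

```python
from itertools import product

def compose_end_states(subendsets, unchangedbindings):
    """Cross the destination-state sets of the affected sub-DAs and union
    each combination with the unchanged bindings (steps (2)-(4))."""
    composed = []
    for combo in product(*subendsets):
        end = dict(unchangedbindings)   # bindings untouched by the change
        for sub_end in combo:
            end.update(sub_end)         # bindings from each affected sub-DA
        composed.append(end)
    return composed

# Two affected sub-DAs and one unchanged variable c; the second sub-DA
# can accommodate the change in two ways, so there are two end states.
subendset1 = [{"a": 1}]
subendset2 = [{"b": 0}, {"b": 1}]
ends = compose_end_states([subendset1, subendset2], {"c": 0})
assert ends == [{"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0}]
```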
The correctness of our divide-and-conquer DIA approach depends on the following theorem:
Theorem 5: The composedendset equals the destination state set that would have been reached
by the given start and change in DAdirect. The proof of this theorem depends on the following
lemmas:
Lemma 5: The composedendset comprises all the destination states that the same start state and
the same change would have reached in DAdirect. Proof of Lemma 5: By construction, composedendset
takes into account every combination of the ways in which the affected sub-DAs can accommodate
the change, so no destination state is omitted, and this condition is satisfied.
Lemma 6: For each state end ∈ composedendset, the transition between start and end is minimal.
Proof of Lemma 6: Since end is the union of unchangedbindings and an element, affected,
of composedaffectedset, we just need to prove that affected is minimal. That is, there does not exist
an affected′ and end′ = affected′ ∪ unchangedbindings such that (end′.bindings \ start.bindings) ⊂
(end.bindings \ start.bindings). We prove this by contradiction. Suppose that affected′ exists,
and can be decomposed into the corresponding end states of each of the affected consistent sub-DAs,
subend′i, i = 1...m. We similarly decompose end and start into the corresponding end and start
states of each of the affected consistent sub-DAs, subendi, i = 1...m and substarti, i = 1...m. Then there
must exist a consistent sub-DA, subDAi, in which (subend′i \ substarti) ⊂ (subendi \ substarti). This
means that the transition from substarti to subendi labeled with v = val is not minimal, and should
not exist in subDAi. This is a contradiction. Thus, subend′i does not exist, and the transition between
start and end labeled with v = val is minimal.
In summary, Theorem 5 is proved and this step finds the set of destination states that would
have been reached by the given start and change in DAdirect.
(5) We now consider each composedend as a new start state, take the sequence of changes
with the first change removed as the new sequence of changes, and go back to step (1).
After all the changes are consumed, we obtain evolution paths equal to the paths that would
otherwise be obtained using the brute-force design impact analysis on DAdirect.
7.4 Chapter Summary
To address the problem that important modularity theories remain informal and imprecise, and
thus insufficient to provide the basis for tool-supported automation, this chapter contributes the formalization
of our framework. We claim that our framework is able to account for the key notions of the following
important but informal theories: (1) Baldwin and Clark’s key concepts of design dimensions,
design decisions, design spaces, design rules, and design dependences; (2) Parnas’s information
hiding modularity; and (3) Parnas’s changeability analysis. We have supported these claims by
showing that all these concepts and problems can be formally defined in the context of our models.
We have also formalized our divide-and-conquer approaches and proved their correctness.
Chapter 8
Simon: The Tool
Previous chapters have presented the modeling and analysis techniques of our framework. We claim
that our modeling framework can be supported by tools, that the analyses can be automated with
reasonable performance at least for small but representative design models, and that the tool helps
to validate the modeling approach. In order to test these hypotheses, we developed a prototype
tool, called Simon,¹ that implements our framework. We used Simon to automatically analyze the
problems people previously analyzed qualitatively or manually. The results Simon produced either
confirm previous results or reveal errors in them precisely and quantitatively. These experiments
using Simon constitute the evidence we have developed to date in support of our claims that our
framework is valid. The implementation of Simon also enables the dissemination of our results
independent of our modeling techniques, providing the first step towards a practical tool for design
modeling and analysis.
Simon supports formal design modeling through interactive graphical user interfaces (GUIs),
and automates design impact analysis, design structure matrix derivation, and net option value
calculation. These modeling and analysis techniques are based on the formalizations introduced in
Chapter 7. All the results and screen snapshots presented in this dissertation are based on
experiments using Simon. Chapter 4 presents the performance data for analyzing Parnas’s KWIC example,
and Chapter 9 presents the performance data for analyzing the ACN models of a web application and a
¹Our tool is named after Herbert A. Simon, the pioneer of decision-making theories and the father of artificial intelligence.
peer-to-peer networking system. The other two designs presented in this dissertation, the figure
editor example and a fault tree analysis tool, are small enough that Simon automates their analyses
instantaneously. The evidence in support of our claims has several parts: (1) The fact that all the
representative designs presented in this dissertation have been modeled as ACNs or CACNs using
Simon provides evidence that our modeling technique can be supported by tools; (2) Simon
computes all the analyses with acceptable response time; and (3) Simon produces analysis results
that validate our framework.
This chapter introduces the graphical user interfaces (GUIs) and modeling languages of Simon
to show how the framework introduced in this dissertation can be supported by a tool; that is,
how ACN and CACN models can be built and analyzed using Simon. At present, Simon is
still a research prototype implemented for proof of concept, for performance testing, and for model
validation purposes.
Section 8.1 describes how to construct ACNs and CACNs using Simon. We introduce ACN
modeling first because ACNs are the basic models for automated analyses. After that, we describe
how to build extended CACN models using Simon and how to reduce a CACN into ACNs. This
section also presents the grammar of the ACN and CACN modeling language notation. We have
shown ACN and CACN models expressed using these languages throughout the dissertation, such
as Figure 4.3 in Chapter 4. Simon provides GUIs for the user to construct the expressions of
these languages easily. At present, the language is sufficient to model the designs presented in this
dissertation, but its syntax is somewhat arbitrary and subject to change. Further development
of the Simon language is part of our future work. Section 8.2 describes how Simon solves constraint
networks and generates design automata and pair-wise dependence relations. Section 8.3 describes
how to use the Simon GUI to automate design evolvability and economic-related analyses, using
either the brute-force or divide-and-conquer approaches.
8.1 Interactive Formal Design Modeling
The GUIs of Simon are built using C#, and a Simon project can be saved as a set of files. This
section introduces Simon GUIs for ACN and CACN construction, as well as the textual modeling
language notations for both models.
8.1.1 ACN Modeling
Figure 8.1: Core Models and Analysis in Simon
8.1.1.1 Overview
Figure 8.1 shows how Simon supports the core models introduced in Chapter 3. As we have
explained, an ACN model consists of three elements: a constraint network, a dominance relation,
and a cluster set. Simon enables the user to input these elements through different tab pages of a
tab control, as shown in Figures 8.2, 8.3, and 8.4. Given an ACN, Simon first solves the constraint
network and stores all the solutions in a .sol file. After that, Simon takes the solutions and the
dominance relation as input, generates a design automaton and a pair-wise dependence relation, and
stores them in a .da file and a .dep file. At this point, the user can analyze design impacts using
the GUIs (Figures 8.15, 8.16, and 8.17), derive design structure matrices (Figure 8.18), and compute net
option values (Figure 8.19).
8.1.1.2 Graphical User Interfaces
Figure 8.2 shows the constraint network tab page. The left list box shows all the design variables,
and the right list box shows all the constraints. To add new variables, the user selects Edit−>Add
Variable from the main menu. A variable input GUI then appears, in which the user can input
or edit a scalar variable and its domain. Newly added variables are displayed in the GUI shown
in Figure 8.2. The domain of a selected scalar variable is shown in the lower left box of the tab page.
To add new constraints, the user selects Edit−>Add Constraint from the main menu. A
constraint input GUI then appears, in which the user can input or edit a constraint as a logical expression.
Newly added constraints are displayed in the GUI shown in Figure 8.2. The syntax of constraint
expressions is defined in the next subsection.
Figure 8.3 shows the dominance relation tab page, in which the user can construct the dominance
relation through a grid control. Checking a cell dictates that the variable on the row cannot
influence the variable on the column.
Figure 8.4 shows the cluster set tab page, in which the user can create, delete, or edit a cluster
by moving variables around and aggregating variables into modules. Newly added clusters will be
shown in the upper left cluster set box of the form.
The cluster set boxes on the upper left of these GUIs display the existing clusterings. The selected
cluster boxes display the selected cluster. Selecting a different clustering reorders the variables
displayed in the tab control.
Figure 8.2: Simon: Constraint Network Construction
8.1.1.3 Constraint Network Modeling Language
After saving a project using the main menu, Simon saves the constraint network as an internal
language file. The dominance relation is saved as a plain text file, and the cluster set is saved as an
XML file. Opening a Simon project loads these files and displays their contents in these GUIs. The
language productions are shown in Figure 8.5. These productions are presented using the grammar
notation of ANTLR (ANother Tool for Language Recognition),² a language tool that provides a framework
for constructing recognizers, compilers, and translators from grammatical descriptions. Simon uses
ANTLR as a component to generate the parsers and lexers of its internal languages.
• The ds production defines that a design space consists of a number of expressions:
ds: "DesignSpace" modelName LCURLY (dsExprs ";")* RCURLY
²http://www.antlr.org/
Figure 8.3: Simon: Dominance Relation Construction
• The dsExprs production defines that an expression can be either a variable expression or a
logical expression: dsExprs: dsVar | predicate
• The dsVar production defines that a variable has a name and a domain:
dsVar: varName ":" enumType
• The enumType production defines that a domain consists of a set of values:
enumType: (LCURLY valueName("," valueName)* RCURLY)
• The bindingDecl production defines the syntax for specifying that a variable is, or is not,
bound to a given value:
bindingDecl : varName ("!=" | "=") valueName
• The primitive production defines that a primitive logical expression can be either a binding
or a more complex predicate:
primitive : bindingDecl | (LPAREN predicate RPAREN)
Figure 8.4: Simon: Cluster Set Construction
ds: "DesignSpace" modelName LCURLY (dsExprs ";")* RCURLY;
dsExprs: dsVar | predicate;
dsVar: varName ":" enumType;
enumType: (LCURLY valueName ("," valueName)* RCURLY);
bindingDecl: varName ("!=" | "=") valueName;
primitive: bindingDecl | (LPAREN predicate RPAREN);
relationalDecl: primitive (("&&" | "||") primitive)*;
predicate: (relationalDecl (("=>" | "<=>") relationalDecl)*);
Figure 8.5: ACN Language Productions
• The relationalDecl and predicate define the logical expression syntax, in which && and
|| have higher priority than => and <=>:
relationalDecl : primitive (("&&" | "||") primitive)*
predicate: (relationalDecl (("=>" | "<=>" ) relationalDecl )*)
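As a worked example of this notation, the following hypothetical model conforms to the productions above (variable and value names are invented for illustration; it is not one of the dissertation's case studies). It declares two variables, each with a two-value domain, and one constraint built from bindings, an implication, and a conjunction:

DesignSpace example {
    parser : {table_driven, recursive};
    lexer : {generated, hand_written};
    (parser = table_driven) => (lexer = generated) && (lexer != hand_written);
}

The parentheses around the bindings are not strictly required, since && and || bind more tightly than =>, but they make the grouping explicit.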
8.1.2 CACN Modeling
As introduced in Chapter 6, a CACN model extends the ACN model by adding set-valued variables,
subspace-valued variables, and universally quantified logical expressions. We extend the GUIs and
language of Simon to accommodate these changes in modeling. Simon also provides a GUI in
which the user can specify a value for each set-valued variable and subspace-valued variable, and
translate the parameterized CACN into ACNs.
Through the main menu, the user can open or construct an extended CACN model as introduced
in Chapter 6. Figure 8.6 shows the GUI of CACN modeling. (Yuanyuan Song in our research group
implemented the GUIs of CACN modeling, as well as the translation from a CACN model into ACN
models.) The upper left tree view in Figure 8.6 shows the CACN design space, in which variables are
shown as tree nodes. A design variable with subspaces has children, each modeling a subspace. The
Variables and Predicate list boxes show all the variables and predicates within the parent space or a
selected subspace. The Variable Detail block shows the details of a selected scalar or set-valued variable.
Figure 8.6: Simon: Complex Augmented Constraint Network
To construct or edit a CACN model, the user clicks the Edit button, and then adds or edits
variables using the GUI shown in Figure 8.7. After editing the variables, clicking the Add Constraint
button reveals the constraint editing GUI, in which the user can further input logical expressions,
including universally quantified expressions. The next subsection introduces the syntax productions.
If a newly added or edited variable has subspaces, Simon recursively presents the GUI shown
in Figure 8.6 so that the user can construct the subspaces.
Figure 8.7: Simon: Design Variables
From the GUI shown in Figure 8.6, clicking the To ACN button brings up a new GUI in which
the user can specify a value for each set-valued variable and subspace-valued variable, as shown in
Figure 8.8. After that, clicking the Finish button causes Simon to translate the parameterized
CACN into an ACN, as explained in Chapter 6. All the CACN forms are then closed, and the user can
work on the newly generated ACN, as shown in Figure 8.9.
Figure 8.8: Parameterize a CACN
8.1.3 CACN Modeling Language
After saving a CACN project, the CACN model is saved as a CACN language file, which will be
loaded when the user opens the project later. Figure 8.10 shows the CACN notation productions.
The CACN language is in fact a superset of the ACN language, and we can use the
CACN language and GUIs to model an ACN design. As a proof of concept, however, Simon
at present provides two separate sets of GUIs and language syntax for these two models.
Following are several extended productions in the CACN modeling language:
• The dsExprs production now includes subspace declarations and quantified logical
constraints:
dsExprs: varDecl | predicate | subSpace | quanpredicate
• The varDecl production now includes set-valued variables and subspace
variables: varDecl: scalarVar | setVar | subspaceVar
• The subspace variable declaration syntax is defined by subspaceVar production:
subspaceVar: "subspace" varName ":"
(LPAREN valueName ("," valueName)* RPAREN)
• The set-valued variable declaration syntax is defined by setVar:
setVar: "set" varName (setRef |
((LPAREN valueName ("," valueName)* RPAREN)":") (setMatch | setNew))
Figure 8.9: Automatically Generated ACN
• The setRef production defines set-valued variables that refer to other variables specified
after *: setRef: (LPAREN "*" varName RPAREN)
• The setMatch production defines set-valued variables that are brought into being by another
variable or variable set after %:
setMatch: ("%"varName ("*" "%" varName)*)
• The setNew production defines variables that could be either scalar variables or set variables
that bring in new design dimensions, as defined in the production valueSet:
setNew: ("set" | (LPAREN valueSet ("," valueSet)* RPAREN))
• The subSpace production defines a subspace:
subSpace: IDENT LBRACK (dsExprs ";")* RBRACK
• The matching and quanpredicate productions define constraints between variables
with a one-to-one correspondence relation:
ds: "DesignSpace" IDENT LBRACK (dsExprs ";")* RBRACK;
dsExprs: varDecl | predicate | subSpace | quanpredicate;
varDecl: scalarVar | setVar | subspaceVar;
scalarVar: "scalar" varName ":" (LPAREN valueName ("," valueName)* RPAREN);
subspaceVar: "subspace" varName ":" (LPAREN valueName ("," valueName)* RPAREN);
setVar: "set" varName (setRef | ((LPAREN valueName ("," valueName)* RPAREN) ":") (setMatch | setNew));
setRef: (LPAREN "*" varName RPAREN) ( | ":" (LPAREN valueSet ("," valueSet)* RPAREN));
setMatch: ("%" varName ("*" "%" varName)*);
setNew: ("set" | (LPAREN valueSet ("," valueSet)* RPAREN));
valueSet: (valueName setDecl | valueName);
setDecl: "{" valueName ("," valueName)* "}";
matching: "%" (varName ":" varName) ("," (varName ":" varName))* "%";
quanpredicate: matching ( | "|" predicate ("," predicate)*);
predicate: (relationalDecl (("=>"^ | "<=>"^) relationalDecl)*);
subSpace: IDENT LBRACK (dsExprs ";")* RBRACK;
bindingDecl: bindingSingle | bindingSet;
bindingSingle: varName ("!="^ | "="^) valueName;
bindingSet: "~" varName ("!="^ | "="^) valueName;
primitive: bindingDecl | (LPAREN predicate RPAREN) | setMember;
setMember: varName "in" varName;
relationalDecl: primitive (("&&"^ | "||"^) primitive)*;
Figure 8.10: CACN Language Productions
matching: "%" (varName ":" varName) ("," (varName ":" varName))* "%"
quanpredicate: matching ( | "|" predicate ("," predicate)*)
• The bindingSet production defines a universal quantification, dictating that all the variables
within a set are bound to the same value:
bindingSet : "~" varName ("!="^ | "="^) valueName
• Since a CACN model supports set-valued variables, we add a primitive expression,
setMember, to denote the membership relation:
setMember: varName "in" varName
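To illustrate the extended syntax, the following hypothetical fragment conforms to the productions above (all names are invented; it checks only syntactic conformance, and its meaning would follow the CACN semantics given in Chapter 6). It declares a scalar variable, a set-valued variable introducing a new design dimension, a subspace, and a set-membership constraint:

DesignSpace cacn_example [
    scalar protocol : (http, ftp);
    set handlers (h1, h2) : set;
    impls [
        scalar strategy : (basic, cached);
    ];
    h1 in handlers;
]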
8.2 Constraint Solving and DA, PWDR Generation
After an ACN model is built or automatically generated from a parameterized CACN, the user
can analyze design impacts, derive DSMs, or compute net option values. In order to do these
analyses, the user should first let Simon solve the constraint network, and then generate the design
automaton and pair-wise dependence relation as internal data structures, using the menus shown
in Figure 8.11. The user can either solve the whole constraint network and generate
the full DA and PWDR (by clicking the menu items Solve Whole CN and Generate Whole DA),
or decompose the ACN into a number of sub-ACNs by clicking the Decompose menu, and then
solve each sub-ACN and generate sub-DAs and sub-PWDRs (by clicking the Solve Modular CN
and Generate Modular DA menu items).
Figure 8.11: Simon: Solve Constraint Network
Figure 8.12 shows a scenario in which the KWIC ACN is decomposed into six sub-ACNs; a
prompt is shown after the decomposition completes. Internally, each sub-ACN is saved as a
separate Simon project that the user can open and analyze separately.
Simon uses Alloy internally as a SAT-based constraint solver. Since Alloy is a Java program, we wrote a small
helper program in Java to invoke the Alloy SAT solver through the Alloy APIs and translate Alloy’s
output into a text file (a .sol file) in the format Simon requires. Given an ACN or a sub-ACN,
Simon first takes the constraint network part and translates it into an Alloy specification. Simon then
invokes the helper program as a separate thread, which reports back when constraint
solving is complete. Figure 8.13 shows a scenario in which all the decomposed sub-ACNs have been solved.
Figure 8.12: Simon: Decompose a Large Constraint Network
For performance reasons, we wrote a DA and PWDR generation program in C. This program
takes a solution file and a dominance relation, generates a DA and a PWDR, and stores them as plain
text files (a .da file and a .dep file) in the format Simon requires. After the user clicks the Generate
Whole DA or Generate Modular DA menu items, Simon invokes this program as a separate thread
to process the whole ACN, or invokes multiple threads to generate the sub-DA and sub-PWDR for
each sub-ACN separately.
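The per-sub-ACN fan-out can be sketched with worker threads. This is a hypothetical sketch, not Simon's C#/C implementation: the generate callable stands in for the external DA/PWDR program, and all names are invented.

```python
import threading

def generate_all(sub_acns, generate):
    """Launch one worker per sub-ACN, mirroring how Simon invokes the
    DA/PWDR generator for each sub-ACN separately, then wait for all."""
    results = {}
    lock = threading.Lock()

    def worker(name, acn):
        out = generate(acn)          # stands in for the external program
        with lock:
            results[name] = out      # collect each sub-result safely

    threads = [threading.Thread(target=worker, args=item)
               for item in sub_acns.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# A stub generator standing in for the external DA/PWDR program:
subs = {"sub1": [1, 2], "sub2": [3]}
out = generate_all(subs, generate=lambda acn: len(acn))
assert out == {"sub1": 2, "sub2": 1}
```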
8.3 Automated Design Analysis
After the constraint network is solved and the internal data structures, DAs and PWDRs, are generated,
the user can perform a number of analyses. Figure 8.14 shows two analysis menu items: design impact
analysis and design structure matrix generation. The net option value calculation is based on DSMs,
and its GUI is accessible through a DSM GUI.
8.3.1 Design Impact Analysis
As introduced in Chapter 3, the inputs of design impact analysis are an original design and
a sequence of changes, and the output is a set of evolution paths. Figure 8.15 shows the
Simon GUI in which the user can specify a design by selecting a value for each variable. After that,
clicking the Verify button tests whether the specified design is valid.
Figure 8.13: Simon: Sub-ACNs are Solved
After verifying a valid design and clicking the Select button, the user can specify a change by
selecting another value for a changing variable using the GUI shown in Figure 8.16. All the changed
variables are shown in the lower list view as a sequence.
After specifying an original design and a sequence of changes, the use can click the analysis
menu item to analyze design impact and get the output in a GUI as shown in Figure 8.17. The
upper block shows two evolution paths. Clicking the corresponding radio button shows the selected
design in the evolution path in the lower box. The middle list view shows the differences between
the original design and the selected design in the evolution path.
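The analysis this GUI drives can be pictured as a search over valid designs. The sketch below is a minimal, hypothetical illustration, not Simon's implementation: a two-variable network with an invented "changed" value, where one variable of a valid design is deliberately perturbed and a breadth-first search, one variable revision per step, finds the nearest valid designs; each shortest route is one evolution path.

```python
from collections import deque

# Hypothetical two-variable ACN: a_impl = orig => a_interface = orig.
VARIABLES = ["a_interface", "a_impl"]
DOMAIN = ["orig", "changed"]

def valid(design):
    # The implementation may keep its original value only while the
    # interface it assumes is also original.
    return design["a_impl"] != "orig" or design["a_interface"] == "orig"

def evolution_paths(original, change_var, new_value):
    """Apply one deliberate change, then find all shortest sequences of
    further revisions that restore a valid design."""
    start = dict(original, **{change_var: new_value})
    if valid(start):
        return [[start]]
    queue, paths, best = deque([[start]]), [], None
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # keep only the shortest repair sequences
        current = path[-1]
        for var in VARIABLES:
            if var == change_var:
                continue  # the deliberate change is not undone
            for value in DOMAIN:
                if value == current[var]:
                    continue
                nxt = dict(current, **{var: value})
                if valid(nxt):
                    paths.append(path + [nxt])
                    best = len(path)
                else:
                    queue.append(path + [nxt])
    return paths

original = {"a_interface": "orig", "a_impl": "orig"}
# Changing the interface invalidates the implementation's assumption,
# so the single shortest evolution path also revises a_impl.
paths = evolution_paths(original, "a_interface", "changed")
print(paths[0][-1])  # {'a_interface': 'changed', 'a_impl': 'changed'}
```

The GUI's upper block corresponds to the list of paths returned here; selecting a path shows its intermediate designs.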
8.3.2 Design Structure Matrix Derivation
Clicking the Design Structure Matrix menu will generate a DSM, as shown in Figure 8.18. A DSM
is derived from a PWDR and a selected clustering from the clustering set. Selecting a different
clustering method will reorder the DSM.
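The derivation itself is mechanical once the PWDR is in hand. A sketch with invented variable names, treating a PWDR as a set of ordered pairs; the clustering only fixes the row and column order, which is why choosing another clustering merely reorders the DSM:

```python
def derive_dsm(pwdr, clustering):
    """Render a DSM as text: '.' on the diagonal, 'x' where the row
    variable depends on the column variable. Rows and columns follow
    the flattened cluster order."""
    order = [v for cluster in clustering for v in cluster]
    width = max(len(v) for v in order)
    lines = []
    for i, row in enumerate(order):
        cells = [
            "." if col == row else ("x" if (row, col) in pwdr else " ")
            for col in order
        ]
        lines.append(f"{i}:{row:<{width}} " + " ".join(cells))
    return "\n".join(lines)

# Hypothetical fragment: an implementation depends on two interfaces.
pwdr = {("b_impl", "a_interface"), ("b_impl", "b_interface")}
clustering = [["a_interface", "b_interface"], ["b_impl"]]
print(derive_dsm(pwdr, clustering))
```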
8.3.3 Net Option Value Computation
Clicking the NOV menu item in the DSM GUI reveals the net option value calculation GUI, as
shown in Figure 8.19. The user can input additional NOV related parameters, such as the estimated
Figure 8.14: Simon: Design Automaton and Pair-wise Dependence Relation Generation
technical potential, for each module using the control on the upper left. This GUI displays different
module parameters according to the selected clusterings. The user can experiment with a new
modularization method and compute its NOV by first constructing a new clustering using the GUI
shown in Figure 8.4, and then computing the corresponding NOV. The upper right block summarizes
the parameters of all the modules. The lower right grid shows the automatically computed
NOV value for each module. The final system NOV is shown at the bottom.
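Concretely, Baldwin and Clark value a module of n variables with technical potential sigma as sigma * sqrt(n) * Q(k), minus experimentation costs, where Q(k) is the expected payoff of keeping the best of k independent experiments. The sketch below estimates Q(k) by Monte Carlo under the standard-normal assumption; the parameter names and the omission of visibility costs are our simplifications for illustration, not Simon's actual computation.

```python
import random

def q(k, trials=100_000, seed=0):
    """Monte Carlo estimate of Q(k): the expected value of the best of
    k standard-normal draws, counted only when positive (a worthless
    experiment is simply not accepted)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(k))
        total += max(best, 0.0)
    return total / trials

def module_nov(sigma, n, k, experiment_cost):
    """One module's net option value: the benefit of keeping the best
    of k experiments on a module of n variables, minus the cost of
    running them (visibility costs omitted in this sketch)."""
    return sigma * (n ** 0.5) * q(k) - experiment_cost * k

# Option value grows with the number of experiments, but with
# diminishing returns against the linear experimentation cost.
for k in (1, 2, 4):
    print(k, module_nov(sigma=1.0, n=4, k=k, experiment_cost=0.1))
```

Summing such per-module values, as the lower right grid does, yields the system NOV shown at the bottom of the GUI.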
8.4 Chapter Summary
In summary, this chapter introduced Simon, our prototype tool, which implements the framework
introduced in this dissertation. All five representative designs presented in this dissertation
have been constructed as ACNs or CACNs using Simon, and Simon automates all the analyses within a
reasonable amount of time. This provides evidence supporting our claims that the formal modeling
techniques of our framework can be supported by tools, that the analysis techniques of our framework
can be automated with reasonable performance, and that Simon can help us evaluate our modeling
framework as a novel artifact.
Simon is currently a proof of concept. We use Alloy internally and C# for
GUI development. Since Alloy uses SAT solvers (mchaff or zchaff) that are written in C, Simon
Figure 8.15: Design Impact Analysis: Select an Original Design
is actually taking the following route: C# -> Java -> C -> Java -> C#. The transitions in
between pass inputs and outputs by writing and loading text files, which is unnecessarily complex
and time-consuming. Bypassing Alloy and using a SAT solver directly is part of our future work.
Figure 8.16: Design Impact Analysis: Specify a Change
Figure 8.17: Design Impact Analysis: Evolution Paths
Figure 8.18: Design Structure Matrix Derivation
Figure 8.19: Net Option Value Calculation
Chapter 9
The Generalizability of the Approach
Our long-term goal is to develop a modeling and analysis theory and tools of value to practicing
software architects and decision-makers. In particular, we hope that developers will someday be
able to use such tools to help make major design decisions: high-consequence design decisions that
have not yet been made. Reaching this goal is beyond the scope of the work presented in this
dissertation. Nevertheless, it is important that we make and defend a critical claim: that our modeling and
analysis approach generalizes beyond the set of small models, such as KWIC, which we used as
experimental subjects in developing and refining the approach itself. Is the approach applicable to
systems and to modeling and analysis experiments that are (a) beyond those used in developing the
approach, (b) beyond those developed by the authors of the approach, and (c) to the modeling and
analysis of design models of real systems? We claim that our approach generalizes along these dimensions.
In support of this claim, we present evidence in the form of three additional case studies.
Section 9.1 presents a replication study, using our approach, of the modeling and economic
analysis of a web-based e-commerce application developed and studied by Lopes et al. [49]. The
application was called WineryLocator. Lopes et al. employed Baldwin and Clark’s modeling and
analysis technique to quantitatively compare a range of possible designs for this system, including
both object- and aspect-oriented designs. We represent these designs as ACNs according to
their design descriptions, generate DSM models, and compare them with the models that they developed
by hand. Our results confirmed their results in general. However, as with the replication study
of our own KWIC example, we found a number of errors in their published models and results.
Both outcomes—that our overall results are similar to theirs and that ours also revealed errors in
earlier, informal reasoning—tend to confirm our claim that our approach is sufficient to reproduce,
but formally and with higher precision, published design studies of systems developed by other
researchers.
Section 9.2 presents the modeling and analysis of a peer-to-peer networking system, HyperCast
[48, 47], developed by network researchers at the University of Virginia and studied by
Sullivan et al. [63]. As in the WineryLocator paper, the authors compared different designs
using manual models. Remodeling these designs into our framework and analyzing them
automatically reveals important issues missed in the manual models.
These sections demonstrate how to construct formal models of these designs, how a large
model is decomposed into small sub-models so that results can be obtained quickly, and how the
derived DSMs compare with the published manual models. The comparison reveals errors and
problematic issues in the manual models that the authors used to compute NOV values, which
implies potential problems in their quantitative results. These experiments demonstrate the benefits
of automating design analysis based on precise models.
Although the authors of these papers applied the same NOV formula, they estimated its
parameters in dramatically different, ad hoc ways. In fact, applying the NOV model to software
design involves revising and extending the model itself, which is part of our ongoing research. As
a result, this dissertation does not go further in evaluating or comparing the NOV experiments and
results for these two designs.
Section 9.3 presents the modeling and analysis of the Galileo dynamic fault tree analysis tool,
developed at the University of Virginia for production use at NASA [66, 65, 24]. The Galileo
designers once faced a situation in which they had to decide how to restructure part of the
system. They reached a decision based on discussion and argument rather than rigorous analysis.
By modeling and analyzing this historical scenario using Simon, the designers are now able to compare
the candidate decisions comprehensively and to justify their decision rationally.
9.1 A Web Application—Winery Locator
In their paper [49], Lopes et al. studied a web application called WineryLocator. The authors used
DSMs to model and compare object-oriented and aspect-oriented designs for WineryLocator. Their
purpose, analogous to that of Sullivan et al., was to model the value of modularity [64], in this case,
with a focus on the benefits of aspect-oriented modularity.
WineryLocator is designed to locate wineries in California. A user can input either an
approximate or exact address, from which the application determines the exact address to use as the
valid starting point. After that, the user can select preferences for the wineries. Given a starting
point and the preferences, the application generates a route for a tour consisting of all the
wineries that match the preferences. The application outputs the set of stops on the route, a
navigable map, and, upon user request, the driving directions.
In this section, we first briefly present how these designs are modeled, and then introduce how
large ACNs are split into a number of smaller ones that are solved individually, and present the
integrated results. Finally, we examine the discrepancies between our derived DSMs and the manual
models in their paper. The differences reveal several ambiguities and problematic issues in their
imprecise modeling method and manually built DSMs.
9.1.1 ACN Modeling
We first construct the ACN model from the application description. To locate wineries and
get directions using the application, the user first inputs either an approximate or exact address,
from which the application determines the valid starting point; this is the function called
startWineryFind. After that, the user can select preferences for the wineries, the searchWinery
function. Given a starting point and the preferences, the application generates a route for a tour
consisting of all the wineries that match the preferences, the tour function. The application also
presents the driving directions upon user requests, the directions function.
The application depends on MapPoint as its main address and routing service, and the authors
developed a local service, WineryFind, to find wineries matching criteria. The interfaces provided by these
services are called MapPointDesignRules and WineryFindDesignRules. In order to authenticate
MapPoint service requests, each authentication function has to implement a Java Servlet interface,
HttpSessionListener, and uses ApacheAXIS to insert authentication parameters into requests to MapPoint.
There are three supporting functions: AddressLocator gets addresses from MapPoint,
RouteMapHandler gets routes from MapPoint, and WineryFinder gets wineries from the local service.
Since AddressLocator and RouteMapHandler access the MapPoint service, they have to be authenticated.
This is done through two new classes, AuthAddressLocator and AuthRouteMapHandler, which inherit
from AddressLocator and RouteMapHandler respectively; the authentication is taken care of by these
subclasses. All the web service function calls have to be logged by WebServiceLogger.
9.1.1.1 Object-Oriented Design
To model this system as an ACN, we append "_interface" and "_impl" to each function
name to represent the fact that, in an OO design, each function leads to an interface and an
implementation. For example, AddressLocator_interface and AddressLocator_impl are the two
variables representing the AddressLocator function. We call this design WineryLocator OO.
Figure 9.1 lists all the constraints among these variables, which include the following categories:
• The relations among service functions, as shown from Line 1 to Line 5.
• Supporting functions depend on the availability of the relevant services, as shown from Line 6 to
Line 21.
• Implementations depend on interfaces, as shown from Line 22 to Line 30.
• User functions depend on supporting function interfaces and other user function interfaces
they use, as shown from Line 31 to Line 43.
• Object-oriented constraints; for example, AuthAddressLocator inherits from AddressLocator,
as shown in Lines 44 and 45.
1: HttpSessionBindingListener = orig => Servlet = orig;
2: MapPointDesignRules = orig => ApacheAXIS = orig;
3: WineryFindDesignRules = orig => WineryFind = orig;
4: MapPointDesignRules = orig => MapPoint = orig;
5: WineryFindDesignRules = orig => ApacheAXIS = orig;
6: startWineryFind_impl = orig => MapPointDesignRules = orig;
7: tour_impl = orig => MapPointDesignRules = orig;
8: directions_impl = orig => MapPointDesignRules = orig;
9: directions_impl = orig => Servlet = orig;
10: searchWinery_impl = orig => MapPointDesignRules = orig;
11: searchWinery_impl = orig => WineryFindDesignRules = orig;
12: startWineryFind_impl = orig => Servlet = orig;
13: tour_impl = orig => Servlet = orig;
14: searchWinery_impl = orig => Servlet = orig;
15: AuthAddressLocator_impl = orig => ApacheAXIS = orig;
16: AddressLocator_impl = orig => MapPointDesignRules = orig;
17: RouteMapHandler_impl = orig => MapPointDesignRules = orig;
18: WineryFinder_impl = orig => WineryFindDesignRules = orig;
19: AuthRouteMapHandler_impl = orig => ApacheAXIS = orig;
20: AuthRouteMapHandler_impl = orig => HttpSessionBindingListener = orig;
21: AuthAddressLocator_impl = orig => HttpSessionBindingListener = orig;
22: AddressLocator_impl = orig => AddressLocator_interface = orig;
23: RouteMapHandler_impl = orig => RouteMapHandler_interface = orig;
24: AuthRouteMapHandler_impl = orig => AuthRouteMapHandler_interface = orig;
25: AuthAddressLocator_impl = orig => AuthAddressLocator_interface = orig;
26: searchWinery_impl = orig => searchWinery_interface = orig;
27: tour_impl = orig => tour_interface = orig;
28: directions_impl = orig => directions_interface = orig;
29: WineryFinder_impl = orig => WineryFinder_interface = orig;
30: WebServicesLogger_impl = orig => WebServicesLogger_interface = orig;
31: searchWinery_impl = orig => WineryFinder_interface = orig;
32: searchWinery_impl = orig => startWineryFind_interface = orig;
33: searchWinery_impl = orig => tour_interface = orig;
34: startWineryFind_impl = orig => AuthAddressLocator_interface = orig;
35: startWineryFind_impl = orig => searchWinery_interface = orig;
36: startWineryFind_impl = orig => startWineryFind_interface = orig;
37: tour_impl = orig => AuthRouteMapHandler_interface = orig;
38: tour_impl = orig => startWineryFind_interface = orig;
39: tour_impl = orig => directions_interface = orig;
40: directions_impl = orig => startWineryFind_interface = orig;
41: WineryFinder_impl = orig => WebServicesLogger_interface = orig;
42: RouteMapHandler_impl = orig => WebServicesLogger_interface = orig;
43: AddressLocator_impl = orig => WebServicesLogger_interface = orig;
44: AuthAddressLocator_interface = orig => AddressLocator_interface = orig;
45: AuthRouteMapHandler_interface = orig => RouteMapHandler_interface = orig;
Figure 9.1: WineryLocator OO Constraints
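Each constraint in Figure 9.1 is a logical implication over the states of two variables. For a small fragment, the solution space Simon asks Alloy for can even be enumerated by brute force. A minimal sketch, using lines 4 and 16 of the figure and a binary orig/other domain (a simplification of the real domains):

```python
from itertools import product

VARIABLES = ["MapPointDesignRules", "MapPoint", "AddressLocator_impl"]

# Constraints of the form "antecedent = orig => consequent = orig",
# mirroring lines 4 and 16 of Figure 9.1.
CONSTRAINTS = [
    ("MapPointDesignRules", "MapPoint"),
    ("AddressLocator_impl", "MapPointDesignRules"),
]

def satisfies(assignment):
    # Every implication must hold: if the antecedent variable is in
    # its original state, so must the consequent variable be.
    return all(
        assignment[b] == "orig"
        for a, b in CONSTRAINTS
        if assignment[a] == "orig"
    )

# The valid assignments form the solution space from which the design
# automaton and the pairwise dependence relation are later derived.
solutions = [
    dict(zip(VARIABLES, states))
    for states in product(["orig", "other"], repeat=len(VARIABLES))
    if satisfies(dict(zip(VARIABLES, states)))
]
print(len(solutions))  # 4 of the 8 assignments satisfy both implications
```

For the full 27-variable network this enumeration is infeasible, which is why Simon relies on SAT solving and the decomposition discussed below.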
9.1.1.2 OO Design with Design Rules
In order to improve the WineryLocator OO design, the authors introduced five interfaces to decouple
the effects of MapPoint as much as possible. Following their names for these interfaces, we model
them as the following design variables:
• startAddress_Address: the starting location the user provides and selects.
• matches_Address: the data structure storing the set of matched addresses.
• WinerySearchOption: the data structure storing the preferences.
• Tour: tour representation.
• MapOptions: standard map options.
We call this design WineryLocator DR. The constraints are adjusted so that some design
variables now assume these design rules.
9.1.1.3 Aspect-Oriented Design
The authors then presented an AO design in which the logging and authentication functions are
implemented using aspects. Modeling the AO design is similar to modeling the OO design. The aspects
are also design dimensions that can be modeled by design variables: aop_Authentication and
aop_WebServiceLogging.
The constraints in the AO design change slightly: the aspect variables now assume the
implementations of other functions, and those functions no longer need to be aware of the two
crosscutting concerns. In essence, modeling the AO and OO designs is similar. In both ACN models,
logging and authentication are just design variables. The two designs differ in how the variables
are constrained.
For the dominance relation in all three designs, we assume that nothing can affect the
third-party services and interfaces, and that implementations cannot affect the specified design
rules and interfaces. In the AO design, the authentication and logging aspects should not affect the
functions they advise.
Table 9.1: Performance for WineryLocator OO Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        5            4                      < 1
   2        6            6                      < 1
   3        6            5                      < 1
   4        6            8                      < 1
   5        5            4                      < 1
   6       11          110                      < 1
   7        8           20                      < 1
   8        9           28                      < 1
   9        2            2                      < 1
  10        6            7                      < 1
9.1.2 Modular Analysis Results
Without decomposition, it took Simon a whole day to solve these constraint networks, and the DA
generation took a couple of days. In short, we were not able to generate DSM models within a
reasonable amount of time without decomposition.
The WineryLocator OO ACN with 27 variables is decomposed into 10 sub-ACNs. Table 9.1
lists the number of variables (Size), the constraint solving time, and the DA generation time for
each sub-ACN. Since Simon invokes multiple solvers in parallel, the constraint solving bottleneck
depends on the largest sub-ACN to solve. In this case, the 6th sub-ACN with 11 variables takes
about 2 minutes to find all its solutions. After that, DAs and DSMs are generated within a second.
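The effect of solving the sub-ACNs in parallel is easy to picture: with one worker per sub-ACN, wall-clock time tracks the largest sub-ACN rather than the sum over all of them. A sketch in which a sleep is a hypothetical stand-in for SAT solving, with costs loosely keyed to the sub-ACN sizes of Table 9.1:

```python
import concurrent.futures
import time

# Sub-ACN sizes from Table 9.1; sleep time is a hypothetical stand-in
# for constraint solving, scaled down so the run stays fast.
SUB_ACN_SIZES = [5, 6, 6, 6, 5, 11, 8, 9, 2, 6]

def solve_sub_acn(size):
    time.sleep(size * 0.01)  # placeholder for invoking a SAT solver
    return size

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=len(SUB_ACN_SIZES)) as pool:
    sizes = list(pool.map(solve_sub_acn, SUB_ACN_SIZES))
parallel_wall = time.perf_counter() - start
sequential_wall = sum(s * 0.01 for s in SUB_ACN_SIZES)

# With one worker per sub-ACN, wall-clock time is governed by the
# largest sub-ACN (11 here), not by the sum over all sub-ACNs.
print(parallel_wall < sequential_wall)  # True
```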
After applying the decomposition approach introduced in Chapter 5, the WineryLocator DR
ACN with 32 variables is also decomposed into 10 sub-ACNs. Table 9.2 shows the performance.
In this case, the largest sub-ACN took about 1 minute to solve.
The WineryLocator AO ACN with 29 variables is decomposed into 9 sub-ACNs. Table 9.3
shows the performance. In this case, the largest sub-ACN took about 40 seconds to solve.
Comparing Table 9.2 with Table 9.3, we observe that the aspect design has one fewer sub-ACN,
its largest sub-ACN is smaller than that of the design rule design, and its performance is
a little better.
Table 9.2: Performance for WineryLocator DR Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        8           20                      < 1
   2        6            7                      < 1
   3        6            6                      < 1
   4        5            6                      < 1
   5        8           20                      < 1
   6        8           36                        2
   7        8           27                      < 1
   8        9           51                        2
   9        2            3                      < 1
  10        8           21                      < 1
Table 9.3: Performance for WineryLocator AO Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        7           11                      < 1
   2        6            8                      < 1
   3        7            5                      < 1
   4        5            5                      < 1
   5        7           11                      < 1
   6        7           18                      < 1
   7        7           28                      < 1
   8        8           40                        2
   9        7           19                      < 1
[DSM over 29 variables: MapPoint, WineryFind, ApacheAXIS, Servlet, HttpSessionBindingListener, MapPointDesignRules, WineryFindDesignRules, startAddress_Address, matchesAddress, WinerySearchOption, Tour, MapOperation, the _interface/_impl pairs for AddressLocator, WineryFinder, RouteMapHandler, startWineryFind, searchWinery, tour, and directions, plus WebServicesLogger_impl, aspect_Logging, and aspect_Authentication; 'x' marks pairwise dependences.]
Figure 9.2: WineryLocator Aspect-Oriented Design
9.1.3 Comparative Results
In order to make their manual DSMs and our derived DSMs comparable, we cluster our derived DSMs so
that the variables appear in the same order and with the same names as in their manual DSMs. For
example, we cluster tour_sig and tour_impl together as a module named tour, mapping it to the
variable with the same name in their DSMs.
Figure 9.2 shows the AO design DSM generated and clustered by Simon. The cells with black
backgrounds mark dependences that are missing from their manual DSMs. Figure 9.3 shows the DSM for
the DR design generated by Simon. Figure 9.4 shows a clustered DR DSM for comparison with
their manual DSMs. Tracing the differences exposed several interesting issues, demonstrating the
advantages of our formal model and automated tool.
First, our derived DSMs reveal many indirect dependences not shown in their manual ones.
For example, they chose MapPoint as their major library, which influences many other decisions.
[DSM over 33 variables: MapPoint, ApacheAXIS, WineryFind, Servlet, HttpSessionBindingListener, MapPointDesignRules, WineryFindDesignRules, startAddress_Address, matchesAddress, WinerySearchOption, Tour, MapOperation, the _sig/implementation pairs for WebServicesLogger, AddressLocator, AuthAddressLocator, WineryFinder, RouteMapHandler, AuthRouteMapHandler, startWineryFind, searchWinery, tour, and directions, plus web_xml; 'x' marks pairwise dependences.]
Figure 9.3: Derived WineryLocator Design Rule DSMs
However, in their DSMs, only one module depends on it. Although a higher-order matrix might
reveal indirect dependences, such dependences are not accounted for in Baldwin and Clark's
value model, which they use [7]. By contrast, the derived DSMs yield defensible estimates of
the total impact of given local design changes.
Second, the dependence definition in the manual DSM modeling is ambiguous, making the
manual DSMs hard to understand. We take three design dimensions, startWineryFind,
AddressLocator, and AuthAddressLocator, as examples. The first is a function making use of the service
provided by the second to locate addresses. The third inherits from the second and extends it with
authentication functions. While our derived DSMs show that the first depends on the other two, their
manual DSMs indicate only a dependence of startWineryFind on AuthAddressLocator, but not on
AddressLocator, despite the fact that AddressLocator interface changes affect the startWineryFind
function directly.
[Collapsed DSM over 23 modules: MapPoint, WineryFind, ApacheAXIS, Servlet, HttpSessionBindingListener, MapPointDesignRules, WineryFindDesignRules, startAddress_Address, matchesAddress, WinerySearchOption, Tour, MapOperation, WebServicesLogger, AddressLocator, AuthAddressLocator, WineryFinder, RouteMapHandler, AuthRouteMapHandler, startWineryFind, searchWinery, tour, directions, and web_xml; 'x' marks pairwise dependences.]
Figure 9.4: Collapsed WineryLocator Design Rule Design
We understood the reason for this discrepancy after discussing it with the authors to learn exactly
how the system is implemented. The dependence between startWineryFind and AuthAddressLocator
is one of usage: the former is a JSP page using the latter as a JavaBean. Since startWineryFind
does not refer to AddressLocator directly, the authors did not mark them as dependent. The usage
and inheritance relations are different, so using transitive operations to find this missing dependence
does not seem appropriate. By contrast, our framework provides an exact semantics of dependence:
a change in one design decision causes revisitation and revision of the other. Using this definition,
the missing dependences are discovered directly.
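This semantics can be made operational. The sketch below is a hypothetical two-variable illustration of the idea, not Simon's algorithm: y depends on x if some change to x, made from some valid design, cannot be accommodated without also revising y. The `frozen` set is a crude stand-in for the dominance relation: design rules may not be revised in response to other changes.

```python
from itertools import product

DOMAIN = ["orig", "other"]
VARIABLES = ["interface", "impl"]

def valid(d):
    # Hypothetical constraint: impl = orig => interface = orig.
    return d["impl"] != "orig" or d["interface"] == "orig"

SOLUTIONS = [
    dict(zip(VARIABLES, states))
    for states in product(DOMAIN, repeat=len(VARIABLES))
    if valid(dict(zip(VARIABLES, states)))
]

def depends(y, x, frozen=("interface",)):
    """y depends on x if, from some valid design, some change to x is
    repairable only by also revising y. Frozen variables (a stand-in
    for the dominance relation) keep their values unless they are the
    changed variable themselves."""
    for sol in SOLUTIONS:
        for new_x in DOMAIN:
            if new_x == sol[x]:
                continue
            repairs = [
                s for s in SOLUTIONS
                if s[x] == new_x
                and all(s[f] == sol[f] for f in frozen if f != x)
            ]
            if repairs and all(s[y] != sol[y] for s in repairs):
                return True
    return False

print(depends("impl", "interface"))  # True: an interface change forces impl revision
print(depends("interface", "impl"))  # False: impl changes never force the design rule
```

The asymmetry in the output is exactly the asymmetry a PWDR records, and it is what distinguishes this semantics from a purely syntactic or transitive reading of the dependence marks.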
Finally, the authors modeled third-party services, such as MapPoint, as environment parameters.
However, we understand environment conditions as those that are likely to change and drive
software evolution. For example, the user interface could be either web-based or a GUI application
based on Java Swing. They mentioned this as a possible change in the paper, but did not model and
analyze it. These discrepancies imply potential problems in their later quantitative analysis, but we
do not pursue this dimension further.
9.2 HyperCast
HyperCast [48, 47] is an independently developed project. It is a scalable, self-organizing overlay
system developed using Java. Viewing overlay sockets as nodes, HyperCast integrates these nodes
into ad hoc networks, and provides network services, including naming, reliable transport, and
network management. An overlay socket supports peer-to-peer and multicast communication within
these networks. We used HyperCast as a subject in our work applying the design rule concepts of
Baldwin and Clark to meet the need for a new kind of crosscutting interface to decouple aspects
from the code they advise [63].
We developed two methods to modularize the scattered logging code in the original object-
oriented (OO) design. One method is to obliviously add logging aspects, as aspect programs often
do. Another method is to insert interfaces that carry design rules to decouple aspects from the code
they advise. We used DSMs to represent these three designs, and to quantitatively evaluate the
resulting design structures, following the methods of Baldwin and Clark as adapted to software by
Sullivan et al. [64] and Lopes et al. [49].
However, we had to spend a great deal of time producing and correcting DSM models for
this study. In order to ease the DSM-based coupling structure analysis and enable design impact
analysis, we modeled the three designs as ACNs and tried to use Simon to analyze them as a whole.
However, we were not able to get any results within a reasonable amount of time, until we improved
Simon with the decomposition capability.
9.2.1 ACN Modeling
The HyperCast design has the following main dimensions: Socket—the overlay socket API,
Protocol—the available protocols, Monitor—the network management capability, Service—the set
of services, Adapter—a layer virtualizing the underlying networks, Event Handling, and Logging.
Each dimension leads to several design variables. For example, the Adapter dimension is
modeled by a specification adapter_spec, an interface adapter_interface, and an implementation
adapter_impl. Event Handling and Logging are crosscutting concerns. The events include a
protocol event and a service event. Logging has the following aspects: info logging, exception
logging, and non-exception logging. In the aspect-oriented (AO) designs, we add the prefix "ao_"
to these logging variables. The domain of each variable contains its available choices. We assume
that each dimension has some other unelaborated choices.
Figure 9.5 shows the constraint network modeling the OO design. There are three types of
constraints in this system. First, implementations depend on interfaces (Lines 1 to 6). Second,
an implementation must fulfill the corresponding specification or policies (Lines 7 to 17).
Third, the dimensions make use of each other (Lines 18 to 42).
We view specifications as environment variables and interfaces as design rules. We assume that
nothing could affect environment variables, and implementations cannot influence design rules. The
dominance relation is generated accordingly. For example, (service_impl, service_spec) is a
member of the dominance relation.
The three HyperCast designs are modeled using three ACNs. The ACN modeling the original
OO design (OO ACN) has 29 variables; the ACN modeling the oblivious AO design (oblivious
ACN) has 25 variables; and the ACN modeling the DR AO design (DR ACN) has 33 variables.
9.2.2 Modular Analysis Results
Table 9.4 shows the decomposed OO sub-ACNs and their performance. Each sub-ACN concentrates
on one major task: the protocol sub-ACN has 13 variables, the adapter sub-ACN 8, the
service sub-ACN 13, the socket sub-ACN 8, the event sub-ACN 5, and the monitor sub-ACN 11.
Each sub-ACN also has several logging-related variables and constraints.
The oblivious ACN is decomposed into 5 sub-ACNs, as shown in Table 9.5. The dimensions
related to one kind of logging are aggregated together. For example, the sub-ACN with 9 variables
includes the service and protocol dimensions where information logging is requested, as well as the
1: protocol_impl = orig => protocol_interface = orig;
2: service_impl = orig => service_interface = orig;
3: socket_impl = orig => socket_interface = orig;
4: monitor_impl = orig => monitor_interface = orig;
5: adapter_impl = orig => adapter_interface = orig;
6: event_impl = orig => event_interface = orig;
7: protocol_impl = orig => protocol_spec = orig;
8: service_impl = orig => service_spec = orig;
9: socket_impl = orig => socket_spec = orig;
10: monitor_impl = orig => monitor_spec = orig;
11: adapter_impl = orig => adapter_spec = orig;
12: event_impl = orig => event_spec = orig;
13: exception_logging = orig => exception_logging_policy = orig;
14: non_exception_logging = orig => non_exception_logging_policy = orig;
15: protocol_event = orig => protocol_event_policy = orig;
16: service_event = orig => service_event_policy = orig;
17: info_logging = orig => info_logging_policy = orig;
18: protocol_impl = orig => socket_interface = orig;
19: protocol_impl = orig => event_interface = orig;
20: protocol_impl = orig => protocol_event = orig;
21: protocol_impl = orig => info_logging = orig;
22: service_impl = orig => socket_interface = orig;
23: service_impl = orig => event_interface = orig;
24: service_impl = orig => service_event = orig;
25: service_impl = orig => info_logging = orig;
26: monitor_impl = orig => protocol_interface = orig;
27: monitor_impl = orig => service_interface = orig;
28: monitor_impl = orig => adapter_interface = orig;
29: monitor_impl = orig => socket_interface = orig;
30: adapter_impl = orig => socket_interface = orig;
31: event_impl = orig => protocol_interface = orig;
32: event_impl = orig => service_interface = orig;
33: protocol_impl = orig => exception_logging = orig;
34: protocol_impl = orig => non_exception_logging = orig;
35: service_impl = orig => exception_logging = orig;
36: service_impl = orig => non_exception_logging = orig;
37: socket_impl = orig => exception_logging = orig;
38: socket_impl = orig => non_exception_logging = orig;
39: monitor_impl = orig => exception_logging = orig;
40: monitor_impl = orig => non_exception_logging = orig;
41: adapter_impl = orig => exception_logging = orig;
42: adapter_impl = orig => non_exception_logging = orig;
Figure 9.5: HyperCast OO Constraints
Table 9.4: Performance for HyperCast OO Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        8           21                      < 1
   2        5            6                      < 1
   3       11          152                        3
   4       13          340                        4
   5       13          339                        4
   6        7           10                      < 1
Table 9.5: Performance for HyperCast Obliviousness Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1       17         1106                       40
   2        9           22                      < 1
   3       17         1107                       58
   4        6            6                        3
   5        6            6                      < 1
information logging function itself. The sub-ACNs with 17 variables are the slowest, indicating a
suboptimal aggregation. Taking a closer look at these large sub-ACNs, we found that many
variables are replicated in both, due to the "oblivious" exception logging aspects: these aspects
actually depend on the implementations of many other functions, so our decomposition
algorithm aggregates them with all the functions they advise.
The DR ACN is decomposed into 10 sub-ACNs, with 7, 3, 7, 4, 6, 3, 15, 11, 8, and 6 variables
respectively. Each sub-ACN addresses one of ten dimensions: adapter, protocol, service,
monitor, socket, info logging, exception logging, non-exception logging, protocol event, and
service event. We observe that although the DR ACN has more variables, the key design rules we
added let us decompose the system at a finer granularity, and its DSM is derived the fastest.
In this design, the aspects depend only on the design rules, not on the implementations of other
functions. As a result, the large sub-ACNs of the oblivious AO design disappear.
Table 9.6: Performance for HyperCast DR Model

sub-ACN   Size   CN Solving (seconds)   DA Generation (seconds)
   1        7            8                        3
   2        3            2                      < 1
   3        7           18                      < 1
   4        4            3                      < 1
   5        6            9                        2
   6        3            2                      < 1
   7       15          332                        3
   8       11           78                        2
   9        8           13                        1
  10        6            6                      < 1
9.2.3 Comparative Results
Figures 9.6 and 9.8 show the DSMs Simon generated, by decomposition and combination, for the original OO design and the oblivious AO design. We do not present the design-rule DSM here because it is exactly the same as the DSM we produced manually in [63]. We now address how the derived OO and oblivious AO DSMs differ from their manual versions.
Compare Figure 9.6 with Figure 9.7, the manual OO DSM from our previous work [63]. The difference is that we previously replicated variables with crosscutting effects, such as info_logging, across several rows, as though they were different variables. This is error-prone: if the dependences of such a variable change, the user has to remember everywhere the variable is scattered. The derived DSM instead shows the crosscutting effects as off-block dependences.
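The off-block view can be computed mechanically. The following sketch (hypothetical names and a simplified stand-in, not Simon's implementation) flags any dependence whose endpoints fall in different DSM blocks as a crosscutting mark:

```python
def off_block(depends, block_of):
    """Return dependences whose endpoints lie in different DSM blocks,
    i.e., crosscutting marks that fall off the diagonal blocks."""
    return sorted((a, b) for a, b in depends
                  if block_of[a] != block_of[b])

# Toy model of Figure 9.6's pattern: implementations depend on the shared
# info_logging variable, which sits in its own logging block.
deps = [("protocol_impl", "info_logging"),
        ("service_impl", "info_logging"),
        ("protocol_impl", "protocol_spec")]
blocks = {"protocol_impl": "protocol", "protocol_spec": "protocol",
          "service_impl": "service", "info_logging": "logging"}
print(off_block(deps, blocks))
# The two dependences on info_logging cross block boundaries.
```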
Comparing the manual and derived DSMs for the oblivious design shows that in the manual version, the logging aspects do not depend on the functional specifications. However, as the derived DSM shows, these dependences should exist: implementation changes are very likely caused by changes in the specification, and implementation changes influence the aspects.
[Figure: 28x28 derived DSM. Variables 1-17 are the specifications, policies, and OO interfaces (protocol_spec, service_spec, socket_spec, monitor_spec, adapter_spec, event_spec, the event and logging policies, and the six interfaces); variables 18-28 are the implementations (protocol_impl, service_impl, socket_impl, monitor_impl, adapter_impl, event_impl, protocol_event, service_event, info_logging, exception_logging, non_exception_logging). Labeled regions: OO Interfaces; Functional Implementations; Main Functions; Logging Functions.]
Figure 9.6: HyperCast OO Derived DSM
[Figure: 37x37 manually constructed DSM from [63], in which the crosscutting logging implementations (info_logging_impl, exception_logging_impl, non_exception_logging_impl) are replicated as separate rows under each main function's implementation.]
Figure 9.7: HyperCast OO Manually-Constructed DSM [63]
[Figure: 25x25 derived DSM for the oblivious AO design. Variables 1-15 are the specifications, policies, and OO interfaces; variables 16-20 are the OO implementations; variables 21-25 are the aspects (ao_protocol_event, ao_service_event, ao_info_logging, ao_exception_logging, ao_non_exception_logging). The two logging aspects depend on nearly all other variables. Labeled regions: OO Interfaces; OO Implementations; Aspects; Main Functions; Logging Functions.]
Figure 9.8: HyperCast AO DSM
9.3 Galileo
We developed a design model of the Galileo dynamic fault tree analysis tool [66, 65, 24]. Galileo
has about 35,000 lines of C++ code, excluding library and generated code. We designed and implemented this system, but its design was independent of the work presented in this dissertation. It has
evolved continually over eight years, with hundreds of files and many student developers. During
maintenance and feature enhancement, we found several problems. First, institutional memory of important but implicit crosscutting decisions was lost over time: during feature enhancement, the developers of new features were not aware of these constraints, and invalid designs were proposed. Second, it was hard to justify competing design refactoring proposals.
We modeled these historical situations as CACNs using Simon, and found that our analysis
would allow the designer to see constraints that have to be respected when making a change, and
that Simon provides quantitative analysis results that are consistent with our earlier refactoring
decisions. This section presents two models and shows the analysis enabled by Simon.
1: set spec_elements(orig, other): (v1{andGate, orGate, pandGate}, other);
2: scalar dr_core: (no_mfc, other);
3: scalar dr_visio: (constant_time, other);
4: set module_core(orig, other): %spec_elements;
5: set module_word(orig, other): %spec_elements;
6: set module_visio(orig, other): %spec_elements;
7: %spec:spec_elements, core:module_core, word:module_word, visio:module_visio% |
       core = orig => spec = orig,
       word = orig => spec = orig && core = orig,
       visio = orig => spec = orig && core = orig;
8: ~module_core = orig => dr_core = no_mfc;
9: ~module_visio = orig => dr_visio = constant_time;
Figure 9.9: Galileo Design Rules CACN
9.3.1 Model Design Rules and Features
Figure 9.9 shows a CACN modeling a number of features and design rules of Galileo. A fault tree is
composed of a number of elements, such as Gates and Events. These elements and their properties
were specified in requirements and specification documents. We model such specifications as an SDV, spec_elements, shown in Line 1. We model only three types of Gates, for purposes of illustration.
Each element required by the user brings into being a corresponding module containing the core data structures representing it and a set of operations working on it. The core modules, modeled by an SDV module_core, should be independent of the other modules, since they are supposed to be portable. Each fault tree element should have a textual Word representation and a graphic Visio representation. Similarly, we use the SDVs module_word and module_visio to model the view modules containing both the representations and the operations on the respective views. Lines 4 through 6 model these dimensions and their correspondence relations with spec_elements.
During the maintenance stage, we recovered some design decisions that the chief architect made
at the beginning of the project. The new developers were not aware of these decisions and tended to
violate them in implementing new features. For example, the core fault tree data structure shouldn’t
assume the presence of the Microsoft Foundation Classes (MFC), yet the implementation uses CString and MFC message boxes for error prompts.
Another constraint that was violated was that the visual operations of a fault tree should not
require a linear-time or costlier traversal. This constraint is needed, given Visio’s performance
degradation when traversing large drawings, to ensure that users receive interactive response times
even when editing large fault trees. The design rule dictates that no function may incur more than a
constant-time query of a Visio depiction.
Failure to respect these constraints incurred costs during maintenance. When we tried to separate the core data structures to implement fault tree analysis as a web service on a Unix-based platform, we had to spend considerable time rewriting the MFC-dependent parts. To implement an enhanced error reporting capability, two new maintainers planned to implement a function to analyze the entire graphical depiction of a fault tree presented by Microsoft Visio. Such a function would have violated the constant-time design rule.
We realized that these design rules are important, but they were represented only in the mind of the chief architect. The developers who had once understood them had all left (graduated). Without a modeling and analysis approach in which such constraints can be revealed automatically, there is a significantly higher risk that such constraints will be forgotten over time and violated again.
In a framework such as ours, the constraint could be represented as a logical expression quantified
over an explicitly modeled set of the algorithms that manipulate the given representation. We
use module_visio here for simplicity. These design rules are modeled as two design variables,
dr_core and dr_visio, in Lines 2 and 3 of Figure 9.9. Their prevailing effects are modeled as the universally quantified constraints shown in Lines 8 and 9, which dictate that every core module must respect the core design rule and every visual module must respect the constant-time rule.
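Read operationally, the quantified form is shorthand for one implication per member of the set design variable. Here is a sketch of one plausible reading of that expansion (the generated constraint strings follow the member_dimension naming seen in the generated ACN, e.g. ccgGate_module_core; the expansion function itself is hypothetical, not Simon's code):

```python
def expand_universal(set_var, members, consequent):
    """Expand '~set_var = orig => consequent' into one implication
    per member of the set design variable."""
    return [f"{m}_{set_var} = orig => {consequent}" for m in members]

gates = ["andGate", "orGate", "pandGate"]
for c in expand_universal("module_core", gates, "dr_core = no_mfc"):
    print(c)
# andGate_module_core = orig => dr_core = no_mfc
# orGate_module_core = orig => dr_core = no_mfc
# pandGate_module_core = orig => dr_core = no_mfc
```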
9.3.2 When a New Feature Is Added
During the feature enhancement stage, we proposed several approaches to implement a common
cause gate (CCG) feature that involves, among other things, new operations on the visual depiction
(A) Galileo with Design Rules (B) Galileo with the New CCG Feature
Figure 9.10: Galileo: Design Rules and New Features
of a fault tree. The chief architect brought up the constant-time constraint during our discussion.
As a result, one of the mechanisms proposed by the feature designers, who were not aware of this
decision before, was found to be unusable despite its other merits, because it had to traverse the
entire visual representation.
Now we use Simon to analyze the impact of adding the CCG feature. We first make
spec_elements=v1{andGate, orGate, pandGate} and let Simon generate an ACN and derive
its DSM, as shown in Figure 9.10 (A).
After that, we change the original CACN by adding a new value,
v2{andGate, orGate, pandGate, ccgGate},
to the domain of the variable spec_elements, modeling the fact that a new gate is required in
version 2. Then we make spec_elements=v2, and let Simon generate the new ACN and its DSM,
as shown in Figure 9.10 (B).
Comparing the two ACNs and DSMs, we observe that adding the CCG feature requires
adding a CCG core module, a CCG word module, and a CCG visio module. The DSM shows
that the CCG gate core design, ccgGate_module_core, should respect the CCG specification,
ccgGate_spec_elements, as well as the design rule dr_core. Simon provides a view so that the
user can see these dependences clearly. Similarly, it shows that the CCG Visio designer should
respect the Visio design rules. Design representations and analysis of the kind we propose here,
as supported by Simon, both record and highlight the constraints that have to be respected when
1: set views(orig, other): (v1{word, visio}, other);
2: set errors(orig, other): (v1{syntax, semantics}, other);
3: scalar MarkSequence: (orig, other);
4: subspace ErrorHandling: (option1, option2, option3, option4);
5: ErrorHandling_option1 [~views=orig => ~errors=orig && MarkSequence=orig;];
6: ErrorHandling_option2 [~errors=orig => ~views=orig && MarkSequence=orig;];
7: ErrorHandling_option3 [
8:     set markers(orig, other): %errors * %views;
9:     ~markers=orig => MarkSequence=orig;
10:    ~errors=orig => ~markers=orig && ~views=orig;
11: ];
12: ErrorHandling_option4 [
13:    scalar ErrorToMark: (orig, other);
14:    set markers(orig, other): %errors * %views;
15:    ~markers=orig => MarkSequence=orig;
16:    ErrorToMark=orig => ~errors=orig && ~markers=orig && ~views=orig;
17: ]
Figure 9.11: Galileo Error Handling Refactoring
making such a change.
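The variable-set expansion triggered by switching from v1 to v2 can be mimicked with a small sketch (a hypothetical generator; Simon's actual generation works over the full CACN and its constraints). Diffing the variable sets for the two values exposes exactly what the new feature adds:

```python
def generate_vars(members, dimensions):
    """One design variable per (member, dimension), following the
    member_dimension naming seen in the generated ACN."""
    return {f"{m}_{d}" for m in members for d in dimensions}

dims = ["spec_elements", "module_core", "module_word", "module_visio"]
v1 = generate_vars(["andGate", "orGate", "pandGate"], dims)
v2 = generate_vars(["andGate", "orGate", "pandGate", "ccgGate"], dims)
print(sorted(v2 - v1))
# ['ccgGate_module_core', 'ccgGate_module_visio',
#  'ccgGate_module_word', 'ccgGate_spec_elements']
```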
9.3.3 Error Handling Options
It was suggested that error handling be refactored to be more consistent and descriptive. Two dimensions were involved in error handling: support for multiple error types (e.g., syntax errors and semantics errors) and support for multiple views (Word97, Visio5). Depending on the type of error, the error handling module should mark the views where the error occurs, jump to the error point, give messages, and clear the marks once the error is corrected. We call these actions a marking sequence, modeled by MarkSequence.
Four refactoring mechanisms were proposed in relation to the addition of sophisticated error
handling to Galileo. We faced the problem of choosing the best one. We modeled this decision
with an HDV called ErrorHandling. Each option is modeled as a subspace value, as depicted in
Figure 9.11.
The first option requires that each error object knows in which view an error happens, and
implements the marking sequence. The second option is symmetric to the first one, requiring that
each view knows what type of error happened, and that it then implements the marking sequence.
Prototypes were built for these options, and the designers realized that the marking sequence was complex and followed the same pattern in each case (crosscutting), which made the code hard to understand. As a result, these options were abandoned despite their straightforwardness. We then designed a marker class, modeled by markers, to take responsibility for implementing a marking sequence for each combination of error and view types.
At this point, the third and fourth designs, both with a marker class, were proposed, and we attempted to determine which was better. The major difference between them is which component decides which error happened in which view and invokes the corresponding marker object. The third option requires each error object to take this responsibility; the fourth demands a new class, ErrorToMark, to do the job, as shown in Line 16 of Figure 9.11.
9.3.4 Select the Best Refactoring Mechanism
To make a rational decision, we first compare the different coupling structures these options would
incur, and then envision a possible change to see the different consequences.
9.3.4.1 Inspect Different Coupling Structures
Figure 9.12 shows the DSMs Simon generates for each of the four error handling decisions. By
comparison, we can tell that options 1 and 2 are simpler in the sense that they involve fewer design
dimensions. However, their DSMs show many crosscutting dependences. Option 3 expands the
design space into more dimensions, while retaining many crosscutting dependences. Option 4
appears to be the best in terms of its coupling structure: although it has one more dimension than
option 3, it has the fewest dependences.
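The structural comparison reduces to counting marks in each DSM. A toy sketch with invented dependence sets (the real counts come from the generated DSMs of Figure 9.12; the names here are only illustrative):

```python
def score(option_deps):
    """Fewer pairwise dependences indicate looser coupling."""
    return len(option_deps)

# Invented, drastically simplified dependence sets for two of the options.
options = {
    "option1": {("word_view", "syntax_error"), ("word_view", "semantics_error"),
                ("visio_view", "syntax_error"), ("visio_view", "semantics_error")},
    "option4": {("ErrorToMark", "markers"), ("ErrorToMark", "views"),
                ("ErrorToMark", "errors")},
}
best = min(options, key=lambda o: score(options[o]))
print(best, {o: score(d) for o, d in options.items()})
```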
9.3.4.2 Change Impact
Now we model another possible feature change to analyze change impact. One envisioned change is to add new views to the system, for example, an Excel view and an XML view, modeled by adding a new value, v2{word, visio, excel, xml}, to the domain of the variable views, and specifying its value as v2.
(A) Error Handling Option 1 (B) Error Handling Option 2
(C) Error Handling Option 3 (D) Error Handling Option 4
Figure 9.12: Design Structures using Different Error Handling Options
(A) Error Handling Option 3 with New Views (B) Error Handling Option 4 with New Views
Figure 9.13: Add New Views based on Different Error Handling Options
After the ACNs and DSMs with new features are generated for each error handling option, we
compare them with the ones without the new features. We observe that although the design spaces expand similarly in all cases, the design using option 4 shows the smallest increase in dependences. Reflected in the implementation, the major difference is that when new views are added, options 1, 2, and 3 require changes in multiple places. For example, according to Figure 9.13, in the design using option 3, syntax_errors and semantics_errors would both have to change. The option 4 design requires only one additional change, to ErrorToMark. This analysis quantitatively validates the choice we actually made: the fourth option.
9.4 Chapter Summary
In summary, this chapter has evaluated the hypothesis that our modeling and analysis approach generalizes beyond a set of small models: it is applicable to systems, and to modeling and analysis experiments, that are (a) beyond those used in developing the approach, (b) beyond those developed by the authors of the approach, and (c) drawn from real systems. We have modeled three real designs that others had analyzed, and automated those analyses. Our experiments support the claim that the approach generalizes along these dimensions: our framework is expressive enough to capture a variety of design phenomena uniformly and to analyze these problems automatically, confirming previous results or revealing errors in them precisely and quantitatively. These experiments and results constitute an important first step toward justifying a next, more expensive step: studies of the utility of the approach and supporting tools in a real design setting.
Chapter 10
Evaluation of this Research
In this chapter, we first summarize how well the thesis of this dissertation is supported by the evidence and analyses presented. We then evaluate the novelty and potential of the proposed approach, as well as its shortcomings and remaining problems. Finally, we evaluate this work in terms of its potential to lead to significant results in the future.
10.1 Thesis and Evidence
The ultimate goal of this research is to enable software designers to make value-oriented design
decisions in a rational way, facilitated by automatic tools. The purpose of this dissertation is to
provide a formal analyzable design modeling framework, as one fundamental step towards this
goal. This dissertation claims and evaluates the following thesis:
• This framework provides a formal account of the key concepts of important but informal
modularity theories. (1) It formalizes Baldwin and Clark’s key notions of design dimension,
design decision, design decision dependence, and design space. (2) It formally accounts for
Parnas’s concept of information hiding modularity as a mechanically checkable predicate.
• This framework enables the derivation of design coupling structures in the form of pair-wise
relations on design decisions, and thus also the derivation of DSMs from ACNs. The benefit
is that the approach enables designers to reason about modularity in design architecture using
both the methods of Baldwin and Clark (but in terms of an abstract and formally precise
representation), as well as new kinds of analysis.
• This framework automates basic evolvability analyses such as design impact analysis. Given a sequence of changing decisions or conditions, the framework computes how many ways there are to accommodate these changes, and how many decisions must be reconsidered in each way.
• Our model of modularity in design is general. In particular, it can account for both traditional
object-oriented notions of modularity and newer aspect-oriented notions within a unified,
declarative framework.
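As an illustration of this style of impact analysis, the following brute-force sketch (a toy stand-in over a tiny constraint network, not the DA construction itself) pins one changed decision, enumerates the assignments that restore consistency, and reports which other decisions each solution revisits:

```python
from itertools import product

def impact(domains, constraints, changed, new_value, baseline):
    """Enumerate consistent assignments with `changed` pinned to its new
    value; report, per solution, which other decisions must be revisited."""
    names = list(domains)
    ways = []
    for combo in product(*domains.values()):
        asg = dict(zip(names, combo))
        if asg[changed] != new_value:
            continue
        if all(c(asg) for c in constraints):
            revisited = sorted(v for v in names
                               if v != changed and asg[v] != baseline[v])
            ways.append(revisited)
    return ways

# Toy network: an implementation and a caller must both match the interface.
domains = {"interface": ["v1", "v2"], "impl": ["v1", "v2"], "caller": ["v1", "v2"]}
constraints = [lambda a: a["impl"] == a["interface"],
               lambda a: a["caller"] == a["interface"]]
baseline = {"interface": "v1", "impl": "v1", "caller": "v1"}
print(impact(domains, constraints, "interface", "v2", baseline))
# One way to restore consistency; it revisits both impl and caller.
```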
Chapter 7 has evaluated the first element of our thesis. In that chapter, we formally accounted for the important concepts of Baldwin and Clark's theory within the setting of our core models, formally defined the semantics of information hiding modularity, and formally defined the design impact analysis problem and its solution.
Our evaluation strategy for the analysis part of the thesis includes three parts: (1) we formally
model software designs for which people have analyzed problems that have strong economic im-
plications, (2) automate these analyses using Simon, and (3) compare the results with the previous
qualitative analysis results.
We first evaluated the thesis against two canonical designs. Chapter 4 has presented the mod-
eling and analysis of the famous software engineering benchmark, Key Word in Context (KWIC).
Chapter 6 has presented the modeling and analysis of the widely used Figure Editor (FE) exam-
ple [43, 37, 33]. These designs are widely used in a large number of publications, representing dra-
matically different design paradigms: KWIC represents functional and object-oriented designs; the
Figure Editor design manifests broader contemporary design phenomena, such as design patterns
and aspect-oriented programming. Our experiments have shown that our framework is expressive
enough to model these design phenomena uniformly. We used Simon to automatically analyze the
problems people previously analyzed qualitatively or manually. Our analysis results either confirm
previous results or reveal errors in them precisely and quantitatively.
We also evaluated the generalizability of our framework in terms of three real designs in
Chapter 9: the modeling and analysis of a web application developed and studied by Lopes et
al. [49] (WineryLocator); the modeling and analysis of a peer-to-peer networking system, Hyper-
Cast [48, 47], developed by network researchers at the University of Virginia and studied by
Sullivan et al. [63]; and the modeling and analysis of the Galileo dynamic fault tree analysis tool,
developed at the University of Virginia for production use at NASA [66, 65, 24]. In the first two
designs, the authors use Baldwin and Clark’s modeling and analysis technique to quantitatively
compare different designs based on manually constructed DSM models. We represent these designs as ACNs according to their design descriptions, generate DSM models, and compare them with the manual models. The comparisons reveal ambiguities and problems in the manual models that the authors used to compute NOV values, implying potential problems in their quantitative results. The Galileo designers once faced a situation in which they had to make a decision
about how to restructure part of the system. They reached a decision based on discussions and argu-
ments, rather than rigorous analysis. Modeling and analyzing this historical scenario using Simon
suggests that the designers might have been able to compare different decisions comprehensively
and to justify their decision rationally, had they had the benefits of a tool such as Simon.
In summary, we have achieved the goals set forth, and the evidence and analyses presented support our thesis.
10.2 Novelty and Potential
Our contributions appear novel in several dimensions. First, the formal account of Parnas’s influen-
tial but informal information hiding principle enables the rigorous and automatic application of his
analysis. Second, by formalizing key notions of Baldwin and Clark’s design rule theory, we put a
new emphasis on the design of design spaces and their underlying coupling structures, as opposed
to the design of individual points in design space, the focus of most current design methods. Third,
our formalization of dependence in constraint networks appears to be novel. Fourth, the DA model
captures the complex ways in which changes can be accommodated in real systems. Finally, the
provision of a formal basis for dependence markings in DSMs, in principle, imports design analysis
techniques [25, 62] developed around DSMs into software design.
Our work has potential in several areas. First, it has potential to support a formal abstract
theory of modularity in design, and, eventually, to contribute to a value-based theory of architectural
design. Second, it has the potential to connect software design with existing economic models and
analysis, such as Baldwin and Clark’s modular operators, to provide a scientific basis of value-
oriented decision-making. Third, the work has the potential to help designers estimate the cost and
benefits of high consequence decisions in practice, such as the decision to refactor or to add a new
feature.
10.3 Limitations and Remaining Problems
However, this work is still at an early stage: many issues remain open, and the approach has limitations and shortcomings.
First, the sizes of design spaces are, in general, exponential in the number of variables. Although our decomposition approach has alleviated the problems we encountered in our case studies, the scale of the designs we have studied is still relatively small. We have not yet had a chance to evaluate the whole approach on large-scale, complex designs.
Second, we have not yet had a chance to collaborate with practitioners to empirically assess both the scalability issue and how difficult our framework is to use.
Third, the language Simon currently uses is not yet a mature logical language. We seek to evaluate and develop the language and the tool as we investigate the modeling and analysis of real problems in practice.
Finally, Simon uses Alloy as its underlying constraint solver, which incurs the unnecessary overhead of translating our models into Alloy specifications. In addition, Alloy is not designed for the purpose for which we use it, which accounts for part of the performance problem. Employing a mature SAT solver directly is among our plans.
10.4 Challenges and Open Questions
We identify the following challenges and open questions regarding the ultimate application of this framework in practice.
First, modeling with ACNs or CACNs requires abstraction, and deciding what to model is not always easy. For example, HyperCast includes hundreds of files and tens of thousands of lines of code, which we modeled using about 30 variables. We found such modeling difficult at first, but easier as we gained experience.
Second, we are currently using finite-domain constraint networks as the basis of our modeling and analysis techniques. Some cases may require more complex constraint models, such as linear or quadratic equations. We may extend our model in the future to address such requirements.
Finally, our framework is based on the perspective that design is a decision-making process, and both ACNs and CACNs model decisions and the relations among them. This decision-based model is dramatically different from traditional program-based design models, such as UML. This discrepancy might present difficulty in the application of our framework.
10.5 Future Work
We find this framework general enough to connect with various stages of software development, and our future work will explore, develop, and extend it for description, prediction, and prescription.
10.5.1 Between Design and Value
We are still in need of models that can scientifically account for the economic value of important
design structures and activities, and provide the basis for economic-oriented decision making. We
are currently collaborating with Carliss Baldwin from the Harvard Business School to explore the
relation between design refactoring and the value variation caused by this activity.
In addition, extending design impact analysis to support cost modeling would allow one to find
the least expensive way to accommodate a given sequence of changes in a design. As introduced in
Chapter 1, this work is motivated by a question from an industry practitioner: “Given the necessity
to keep our feature delivery velocity, is it worthwhile investing in refactoring, as my engineers
suggested?” Our framework proposes a solution to such a problem, which has the following key
elements: (1) developing CACN models at a suitably high level of abstraction; (2) formulating
an expected evolutionary scenario as a sequence of changes, or perhaps as a stochastic process
generating change requests; (3) measuring the cost of change in both cases; (4) accounting for the
switching cost to get from the current to the proposed new design. We plan to further evaluate and
develop this idea.
10.5.2 Between Design and Code
Under the pressure of project deadlines, projects often sacrifice design architecture and plunge into implementation prematurely. The problem is not that the design stage is ignored. It is that
current design modeling and analysis techniques do not support fast and automated design evolution
modeling and analysis. Software evolution should start with comprehensive consideration of the
costs and benefits based on current and proposed design representations. These analyses should
enable designers to select an optimal way to evolve the project, quickly and automatically. On the
other hand, legacy code presents difficulties in many companies. Recovering designs from source
code is important to preserve previous investments. Part of our future work is to explore approaches to extracting logical design models from source code, treating the extracted model as a subset of the full design, and combining it with high-level design models to form a full picture.
We see potential utility for this framework in many dimensions, and we look forward to continuing the exploration.
Chapter 11
Conclusion
To address the problem that current design representations are not sufficient to enable designers
to reason about design structures and their economic properties, this dissertation contributes an
analyzable design modeling framework that supports formal design modeling, formalizes important
design concepts and approaches, and automates a number of economic-related analyses.
This framework consists of: a design description model, the augmented constraint network (ACN), which models design decisions and external conditions in a general way; an intermediate operational design space model derived from an ACN, the design automaton (DA), which connects a conceptual design with its economic-related properties; and a pair-wise dependence relation (PWDR), derived from a DA, which supports design coupling structure analysis.
This framework provides a formal account of the key concepts of important but informal mod-
ularity theories: (1) it formalizes Baldwin and Clark’s key notions of design dimension, design
decision, design decision dependence, design space, and design rule. (2) It formally accounts for
Parnas’s concept of information hiding modularity, formalizing this principle as a predicate me-
chanically checkable by tools.
The supporting tool, Simon, enables the following analyses for conceptual designs: (1) Par-
nas’s changeability analysis can be done automatically and quantitatively; (2) design structure ma-
trices (DSMs) can be derived automatically from conceptual software designs; and (3) Baldwin and
Clark’s net option value analysis based on DSM modeling can be calculated automatically.
Scalability is a common issue for formal models that depend on constraint solving, including ours. We created a method that decomposes a large ACN into a number of smaller ones to mitigate the scalability problems a large ACN encounters, and we have observed dramatic performance improvements. To model and analyze complex design decisions with structural impacts, we extended the ACN model into a complex augmented constraint network (CACN) that supports structural design impact analysis.
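One decomposition idea can be sketched minimally, as a simplification of the dissertation's method: when the constraint graph of an ACN falls into disconnected components (variables linked only through shared constraints), each component can be solved as a separate, smaller network, and the cost of brute-force enumeration drops from the product of all domain sizes to the sum over components. The union-find partitioning and the example names below are illustrative assumptions.

```python
def decompose(variables, constraint_scopes):
    """Partition a constraint network into independent sub-networks:
    two variables belong to the same sub-network when a chain of shared
    constraints links them (connected components via union-find)."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for scope in constraint_scopes:
        first = scope[0]
        for v in scope[1:]:
            parent[find(v)] = find(first)

    groups = {}
    for v in variables:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

# Two constraints over disjoint variable sets yield two sub-networks,
# each solvable independently of the other.
parts = decompose(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
# [["a", "b"], ["c", "d"]]
```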
This framework has been evaluated against both canonical software design benchmarks and real software designs. The evaluation shows that (1) the framework is expressive enough to capture a variety of design decision-making phenomena, including object-oriented design, aspect-oriented design, and design patterns; and (2) it can automate the analysis of a number of economics-oriented problems that were previously analyzed manually or qualitatively. The automated results either confirm the earlier findings or reveal errors in them, demonstrating the power of formal models and automated analyses.
In summary, we have contributed a general framework that formally accounts for the key concepts of important but informal modularity theories, enables the automation of basic evolvability analyses such as design impact analysis, and enables the derivation of design coupling structures in the form of pair-wise relations on design decisions, and thus the derivation of DSMs from ACNs. We also contribute a prototype tool, Simon, that supports these modeling and analysis techniques. Simon validates the concepts developed in the framework, showing that it is possible to use this tool-supported framework to model software designs and to analyze evolvability and economics-related properties with reasonable performance.
The ultimate goal of this research is to enable software designers to make value-oriented design decisions in a rational way, aided by automatic tools. Reaching that goal involves future work in several directions, including further exploration of applicable economic models, further development of the tool, and connection of this framework to other development stages, such as specification and source code analysis.
Bibliography
[1] Gregory D. Abowd, Robert Allen, and David Garlan. Formalizing style to understand descrip-
tions of software architecture. ACM Transactions on Software Engineering and Methodology,
4(4):319–64, October 1995.
[2] Christopher W. Alexander. Notes on the Synthesis of Form. Harvard University Press, 1970.
[3] Martha Amram and Nalin Kulatilaka. Real Options: Managing Strategic Investment in an
Uncertain World. Oxford University Press, USA, Dec 1998.
[4] Robert Arnold and Shawn Bohner. Software Change Impact Analysis. Wiley-IEEE Computer Society Press, first edition, 1996.
[5] W.R. Ashby. Design for a Brain. John Wiley and Sons, 1952.
[6] Sara Baase and Allen Van Gelder. Computer Algorithms: Introduction to Design and Analysis
(3rd Edition). Addison Wesley, 3rd edition, Nov 1999.
[7] Carliss Y. Baldwin and Kim B. Clark. Design Rules, Vol. 1: The Power of Modularity. The
MIT Press, 2000.
[8] Don Batory and Bart J. Geraci. Composition validation and subjectivity in GenVoca generators. IEEE Transactions on Software Engineering, 23(2):67–82, February 1997.
[9] Don Batory and Sean O’Malley. The design and implementation of hierarchical software
systems with reusable components. ACM Transactions on Software Engineering and Method-
ology, 1(4):355–398, 1992.
[10] Don Batory, Jacob Neal Sarvela, and Axel Rauschmayer. Scaling step-wise refinement. In ICSE '03: Proceedings of the 25th International Conference on Software Engineering, pages 187–197, Washington, DC, USA, 2003. IEEE Computer Society.
[11] Don Batory, Vivek Singhal, Jeff Thomas, Sankar Dasari, Bart Geraci, and Marty Sirkin. The GenVoca model of software-system generators. IEEE Software, 11(5):89–94, September 1994.
[12] L. A. Belady and C. J. Evangelisti. System partitioning and its measure. Journal of Systems and Software, 1981.
[13] L. A. Belady and M. M. Lehman. A model of large program development. IBM Systems
Journal, 15(3):225–252, March 1976.
[14] Barry W. Boehm and Kevin J. Sullivan. Software economics: a roadmap. In Proceedings of
the conference on The future of Software engineering, pages 319–343. ACM Press, 2000.
[15] Grady Booch, James Rumbaugh, and Ivar Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, Reading, Massachusetts, 1999.
[16] Fred Brooks. No silver bullet: Essence and accidents of software engineering. IEEE Com-
puter, 20(4):10–19, April 1987.
[17] Fred Brooks. Is there a design of design? In Science of Design: Software-Intensive Systems,
Workshop Program, National Science Foundation, Computer and Information Science and
Engineering Directorate, Charlottesville, Virginia, November 2003.
[18] Yuanfang Cai and Kevin Sullivan. Simon: A tool for logical design space modeling and
analysis. In 20th IEEE/ACM International Conference on Automated Software Engineering,
Long Beach, California, USA, Nov 2005.
[19] Yuanfang Cai and Kevin Sullivan. Modularity analysis of logical design models. In 21st IEEE/ACM International Conference on Automated Software Engineering, Tokyo, Japan, September 2006.
[20] B. Y. Choueiry and G. Noubir. A disjunctive decomposition scheme for discrete constraint satisfaction problems using complete no-good sets. Technical report, Knowledge Systems Laboratory, 1998.
[21] Krzysztof Czarnecki and Ulrich Eisenecker. Generative Programming: Methods, Tools, and Applications. Addison-Wesley Professional, 1st edition, June 2000.
[22] R. Dechter and J. Pearl. Tree clustering for constraint networks. Artificial Intelligence, 38:353–366, 1989.
[23] Avinash K. Dixit and Robert S. Pindyck. Investment under Uncertainty. Princeton University
Press, USA, Jan 1998.
[24] Joanne Bechta Dugan, Kevin J. Sullivan, and David Coppit. Developing a high-quality soft-
ware tool for fault tree analysis. In Proceedings of the International Symposium on Software
Reliability Engineering, pages 222–31, Boca Raton, Florida, 1–4 November 1999. IEEE.
[25] Steven D. Eppinger. Model-based approaches to managing concurrent engineering. Journal
of Engineering Design, 2(4):283–290, 1991.
[26] Barry W. Boehm et al. Software Cost Estimation with Cocomo II. Prentice Hall PTR, 1st edition, 2000.
[27] John Irwin et al. Aspect-oriented programming of sparse matrix code. In Proceedings of the International Conference on Scientific Computing in Object-Oriented Parallel Environments (ISCOPE), volume 1343 of LNCS, Marina del Rey, CA, 1997. Springer-Verlag.
[28] John M. Favaro, Kenneth R. Favaro, and Paul F. Favaro. Value based software reuse investment. Annals of Software Engineering, 5:5–52, May 1998.
[29] Martin S. Feather. Risk reduction using ddp (defect detection and prevention): Software
support and software applications. In RE, page 288, 2001.
[30] R. Filman and D. Friedman. Aspect-oriented programming is quantification and obliviousness. In Workshop on Advanced Separation of Concerns, OOPSLA, 2000.
[31] E. Freuder and P. D. Hubbe. A disjunctive decomposition control schema for constraint satis-
faction. In Principles and Practice of Constraint Programming, 1st International Workshop,
PPCP’93, Newport, Rhode Island, 1993.
[32] Eugene C. Freuder. Partial Constraint Satisfaction. In Proceedings of the Eleventh Interna-
tional Joint Conference on Artificial Intelligence, IJCAI-89, Detroit, Michigan, USA, pages
278–283, 1989.
[33] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, November 2000.
[34] David Garlan and David Notkin. Formalizing design spaces: Implicit invocation mecha-
nisms. In Proceedings of the 4th International Symposium of VDM Europe on Formal Soft-
ware Development-Volume I, pages 31–44. Springer-Verlag, 1991.
[35] David Garlan and Mary Shaw. An introduction to software architecture. In V. Ambriola and
G. Tortora, editors, Advances in Software Engineering and Knowledge Engineering, volume 1,
pages 1–40. World Scientific Publishing Company, 1993. Large-scale architecture patterns:
pipes and filters, layering, black-board systems.
[36] Joseph A. Goguen. Reusing and interconnecting software components. IEEE Computer, 19(2):16–28, February 1986.
[37] William G. Griswold, Kevin Sullivan, Yuanyuan Song, Nishit Tewari, Macneil Shonle, Yuanfang Cai, and Hridesh Rajan. Modular software design with crosscutting interfaces. IEEE Software, Special Issue on Aspect-Oriented Programming, January/February 2006.
[38] J. Hannemann and G. Kiczales. Design pattern implementation in Java and AspectJ. In Proceedings of the 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2002.
[39] D. Hutchens and V. R. Basili. System structure analysis: Clustering with data bindings. IEEE Transactions on Software Engineering, 11(8):749–757, August 1985.
[40] Daniel Jackson. Micromodels of software: Lightweight modeling and analysis with Alloy. February 2002.
[41] Daniel Jackson and Kevin Sullivan. COM revisited: Tool assisted modelling and analysis of
software structures. In Proceedings of the Eighth ACM SIGSOFT Symposium on the Founda-
tions of Software Engineering, pages 149–58, San Diego, CA, 6–10 November 2000.
[42] James L. Rogers. DeMAID/GA: An enhanced design manager's aid for intelligent decomposition. In Proceedings of the 6th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Seattle, WA, September 1996.
[43] Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G.
Griswold. An overview of AspectJ. Lecture Notes in Computer Science, 2072:327–355,
2001.
[44] K. J. Sullivan, P. Chalasani, S. Jha, and V. Sazawal. Software design as an investment activity: A real options perspective. In Real Options and Business Strategy: Applications to Decision Making. 1999.
[45] Thomas G. Lane. Studying software architecture through design spaces and rules. Technical
Report CMU/SEI-90-TR-18, CMU, 1990.
[46] Lattix. A commercial product. http://www.lattix.com/.
[47] J. Liebeherr, M. Nahas, and W. Si. Application-layer multicasting with delaunay triangulation
overlays. IEEE Journal on Selected Areas in Communications, 20(8), Oct 2002.
[48] Jorg Liebeherr and Tyler K. Beam. Hypercast: A protocol for maintaining multicast group
members in a logical hypercube topology. In Networked Group Communication, pages 72–89,
1999.
[49] Cristina Videira Lopes and Sushil Krishna Bajracharya. An analysis of modularity in aspect
oriented design. In AOSD ’05, pages 15–26, New York, NY, USA, 2005. ACM Press.
[50] Alan MacCormack, John Rusnak, and Carliss Baldwin. Exploring the structure of complex
software designs: An empirical study of open source and proprietary code. Harvard Business
School Working Paper Number 05-016.
[51] Alan Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99–118, 1977.
[52] S. Mancoridis, B. Mitchell, C. Rorres, Y. Chen, and E. Gansner. Using automatic clustering to produce high-level system organizations of source code. In Proceedings of the 6th International Workshop on Program Comprehension, 1998.
[53] G. Murphy and D. Notkin. Software reflexion models: Bridging the gap between source and
high-level models. In Proceedings of the Third Symposium on the Foundations of Software
Engineering (FSE3), pages 18–28, New York, NY, October 1995. ACM.
[54] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Communica-
tions of the ACM, 15(12):1053–8, December 1972.
[55] Neeraj Sangal, Ev Jordan, Vineet Sinha, and Daniel Jackson. Using dependency models to manage complex software architecture. In OOPSLA, 2005.
[56] R. Schwanke. An intelligent tool for re-engineering software modularity. In Proceedings of the 13th International Conference on Software Engineering, 1991.
[57] M. Shaw. Candidate model problems in software architecture, 1994.
[58] Herbert A. Simon. The Sciences of the Artificial. The MIT Press, third edition, 1996.
[59] M. Sinnema, S. Deelstra, J. Nijhuis, and J. Bosch. COVAMOF: A framework for modeling variability in software product families. In Proceedings of SPLC 2004, volume 3154, pages 197–213, August 2004.
[60] Mike Spivey. The fuzz manual. URL: http://spivey.oriel.ox.ac.uk/˜mike/fuzz/.
[61] W. P. Stevens, G. J. Myers, and L. L. Constantine. Structured design. IBM Systems Journal,
13(2):115–39, 1974.
[62] Donald V. Steward. The design structure system: A method for managing the design of
complex systems. IEEE Transactions on Engineering Management, 28(3):71–84, 1981.
[63] Kevin Sullivan, William Griswold, Yuanyuan Song, Yuanfang Cai, et al. Information hiding interfaces for aspect-oriented design. In ESEC/FSE '05, September 2005.
[64] Kevin Sullivan, William G. Griswold, Yuanfang Cai, and Ben Hallen. The structure and
value of modularity in software design. SIGSOFT Software Engineering Notes, 26(5):99–
108, September 2001.
[65] Kevin J. Sullivan, Joanne Bechta Dugan, and David Coppit. The Galileo fault tree analysis
tool. In Proceedings of the 29th Annual International Symposium on Fault-Tolerant Comput-
ing, pages 232–5, Madison, Wisconsin, 15–18 June 1999. IEEE.
[66] Kevin J. Sullivan, Joanne Bechta Dugan, John Knight, et al. Galileo: An advanced fault tree
analysis tool, 1997. URL: http://www.cs.virginia.edu/˜ftree/index.html.
[67] Kevin J. Sullivan, Ira J. Kalet, and David Notkin. Software design: The options approach. In 2nd International Software Architecture Workshop, Joint Proceedings of the SIGSOFT '96 Workshops, pages 15–18, San Francisco, CA, October 1996.
[68] Peri L. Tarr, Harold Ossher, William H. Harrison, and Stanley M. Sutton Jr. N degrees of separation: Multi-dimensional separation of concerns. In Proceedings of the 21st International Conference on Software Engineering (ICSE '99), pages 107–119, 1999.
[69] Edward Tsang. Foundations of Constraint Satisfaction. Academic Pr., London and San Diego,
1993.
[70] J Withey. Investment analysis of software assets for product lines. Technical Report
CMU/SEI-96-TR-10, Carnegie Mellon University, 1996.
[71] C. Jason Woodard. Architectural Strategy and Design Evolution in Complex Engineered Systems. PhD thesis, Harvard University and Singapore Management University, May 2006 (forthcoming).