Dependable software

Bertrand Meyer, ETH Zurich

ABSTRACT

Achieving software reliability takes many complementary techniques, directed at the process or at the products. This survey summarizes some of the most fruitful ideas.

1 OVERVIEW

Everyone who uses software or relies on devices or processes that use software — in other words, everyone — has a natural interest in guarantees that programs will perform properly. The following pages provide a review of techniques to improve software quality.

There are many subcultures of software quality research, often seemingly sealed off from each other; mentioning process-based approaches such as CMMI to programming language technologists, or tests to people working on proofs, can be as incongruous as bringing up Balanchine among baseball fans. This survey disregards such established cultural fences and instead attempts to include as many as possible of the relevant areas, on the assumption that producing good software is hard enough that “every little bit counts” [60]. As a result we will encounter techniques of very diverse kinds.

A note of warning to the reader seeking objectivity: I have not shied away from including references — easy to spot — to my own work, with the expectation (if a justification is needed) that it makes the result more lively than a cold inspection limited to other people’s products and publications.

2 SCOPE AND TERMINOLOGY

The first task is to define some of the fundamental terms. Even the first word of this article’s title, determined by the Hasler Foundation’s “Dependable Information and Communication Systems” project, requires clarification.

Reliability and dependability

In the software engineering literature the more familiar term is not “dependable” but “reliable”, as in “software reliability”. A check through general-purpose and technical dictionaries confirms that the two have similar definitions and are usually translated identically into foreign languages.
Cite as follows: Bertrand Meyer, Dependable Software, to appear in Dependable Systems: Software, Computing, Networks, eds. Jürg Kohlas, Bertrand Meyer, André Schiper, Lecture Notes in Computer Science, Springer-Verlag, 2006.


There does exist a definition of “dependability” [1] from the eponymous IFIP Working Group 10.4 [39] that treats reliability as only one among dependability attributes, along with availability, safety, confidentiality, integrity and maintainability. While possibly applicable to a computing system as a whole, this classification does not seem right for the software part, as some attributes such as availability are not properties of the software per se, others such as confidentiality are included in reliability (through one of its components, security), and the remaining ones such as maintainability are of dubious meaning for software, being better covered by other quality factors such as extendibility and reusability [57].

As a consequence of these observations the present survey interprets dependability as meaning the same thing, for software, as reliability.

Defining reliability

The term “software reliability” itself lacks a universally accepted definition. One could argue for taking it to cover all external quality factors such as ease of use, efficiency and extendibility, and even internal quality factors such as modularity. (The distinction, detailed in [57], is that external factors are the properties, immediate or long-term, that affect companies and people purchasing and using the software, whereas internal factors are perceptible only to software developers although in the end they determine the attainment of external factors.)

It is reasonable to retain a more restricted view in which reliability only covers three external factors: correctness, robustness and security. This doesn’t imply that others are irrelevant; for example even the most correct, robust and secure system can hardly be considered dependable if in practice it takes ages to react to inputs, an efficiency problem. The same goes for ease of use: many software disasters on record happened with systems that implemented the right functions but made them available through error-prone user interfaces. The reasons for limiting ourselves to the three factors listed are, first, that including all others would turn this discussion into a survey of essentially the whole of software engineering (see [33]); second, that the techniques to achieve these three factors, although already very diverse, have a certain kindred spirit, not shared by those for enhancing efficiency (like performance optimization techniques), ease of use (like ergonomic design) and other external and internal factors.


Correctness, robustness, security

For the three factors retained, we may rely on the following definitions:

• Correctness is a system’s ability to perform according to its specification in cases of use within that specification.

• Robustness is a system’s ability to prevent damage in cases of erroneous use outside of its specification.

• Security is a system’s ability to prevent damage in cases of hostile use outside of its specification.

They correspond to levels of increasing departure from the specification. The specification of any realistic system makes assumptions, explicit or implicit, about the conditions of its use: a C compiler’s specification doesn’t define a generated program if the input is payroll data, any more than a payroll program defines a pay check if the input is a C program; and a building’s access control software specification cannot define what happens if the building has burned. By nature, the requirements defined by robustness and security are different from those of correctness: outside of the specification, we can no longer talk of performing according to that specification, but only seek the more modest goal of preventing damage; note that this implies the ability to detect attempts at erroneous or hostile use.
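These definitions can be sketched as executable checks in a design-by-contract style; the function and all names below are invented for illustration, not taken from the survey:

```python
# A sketch of correctness vs robustness as executable checks.

def sqrt_floor(n: int) -> int:
    """Specification: for n >= 0, return the largest r with r*r <= n."""
    assert n >= 0, "precondition: use within the specification"
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Correctness: within the specification, the result obeys it.
    assert r * r <= n < (r + 1) * (r + 1)
    return r

def robust_sqrt_floor(raw) -> int:
    """Robustness: erroneous use outside the specification is detected
    and damage is prevented, rather than 'performing to the spec'."""
    if not isinstance(raw, int) or isinstance(raw, bool) or raw < 0:
        raise ValueError(f"sqrt_floor is not defined for {raw!r}")
    return sqrt_floor(raw)
```

Security would add checks against hostile rather than merely erroneous use, for example validation of inputs that cross a trust boundary; the detection step is the same.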

Security deserves a special mention as in recent years it has assumed a highly visible place in software concerns. This is a phenomenon to be both lamented, as it signals the end of a golden age of software development when we could concentrate on devising the best possible functionality without too much concern about the world’s nastiness, and at the same time taken to advantage, since it has finally brought home to corporations the seriousness of software quality issues, a result that decades of hectoring by advocates of modern software engineering practices had failed to achieve. One of the most visible signs of this phenomenon is Bill Gates’s edict famously halting all development in February of 2001 in favor of code reviews for hunting down security flaws. Many of these flaws, such as the most obnoxious, buffer overflow, are simply the result of poor software engineering practices. Even if focusing on security means looking at the symptom rather than the cause, fixing security implies taking a coherent look at software tools and techniques and requires, in the end, ensuring reliability as a whole.

Product and process

Any comprehensive discussion of software issues must consider two complementary aspects: product and process.

The products are the software elements whose reliability we are trying to assess; the process includes the mechanisms and procedures whereby people and their organizations build these products.


The products of software

The products themselves are diverse. In the end the most important one, for which we may assess correctness, robustness and security, is code. But even that simple term covers several kinds of product: source code as programmers see it, machine code as the computer executes it, and any intermediate versions as exist on modern platforms, such as the bytecode of virtual machines.

Beyond code, we should consider many other products, which in their own ways are all “software”: requirements, specifications, design diagrams and other design documents, test data (but also test plans), user documentation, teaching aids…

To realize why it is important in the search for quality to pay attention to products other than code, it suffices to consider the results of numerous studies, some already decades old [10], showing the steep progression of the cost of correcting an error the later it is identified in the lifecycle.

Deficiencies

In trying to ascertain the reliability of a software product or process we must often, like a detective or a fire prevention engineer, adopt a negative mindset and look for sources of violation of reliability properties. The accepted terminology here distinguishes three levels:

• A failure is a malfunction of the software. Note that this term does not directly apply to products other than executable code.

• A fault is a departure of the software product from the properties it should have satisfied. A failure always comes from a fault, although not necessarily a fault in the code: it could be in the specification, in the documentation, or in a non-software product such as the hardware on which the system runs.

• An error is a wrong human decision made during the construction of the system. “Wrong” is a subjective term, but for this discussion it’s clear what it means: a decision is wrong if it can lead to a fault (which can in turn cause failures).
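The chain from error to fault to failure can be made concrete with a small invented example: the developer’s wrong assumption is the error, the unguarded code it produces is the fault, and the run-time malfunction is the failure.

```python
# Error: the developer decided (wrongly) that an order list is never empty.
# Fault: that decision left the code below without a guard for [].
def average_order_value(orders):
    return sum(orders) / len(orders)

# Failure: the malfunction appears only when the faulty code is executed
# on an input that exercises the wrong assumption.
print(average_order_value([10, 20, 30]))   # 20.0
try:
    average_order_value([])
except ZeroDivisionError:
    print("failure: division by zero on the empty-list path")
```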

In a discussion limited to software reliability, all faults and hence all failures result from errors, since software is an intellectual product not subject to the slings and arrows of the physical world.

The more familiar term for error is “bug”. The upper crust of the software engineering literature shuns it for its animist connotations. “Error” has the benefit of admitting that our mistakes don’t creep into our software: we insert them ourselves. In practice, as may be expected, everyone says “bug”.


Verification and validation

Even with subjectivity removed from the definition of error, the definitions for the other two levels above remain relative: what constitutes a malfunction (for the definition of failures) or a departure from desirable properties (for faults) can only be assessed with respect to some description of the expected characteristics.

While such reference descriptions exist for some categories of software product (an element of code is relative to a design, the design is relative to a specification, the specification is relative to an analysis of the requirements), the chain always stops somewhere; for example one cannot in the end certify that the requirements have no faults, as this would mean assessing them against some higher-level description, and would only push the problem further to assessing the value of the description itself. Turtles all the way up.

Even in the absence of another reference (another turtle) against which to assess a particular product, we can often obtain some evaluation of its quality by performing internal checks. For example:

• A program that does not initialize one of its variables along a particular path is suspicious, independently of any of its properties vis-à-vis the fulfillment of its specification.

• A poorly written user manual may not explicitly violate the prescriptions of another project document, but is problematic all the same.
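The first internal check can be seen in a minimal invented example; the flaw is suspicious from the program text alone, which is why a static analyzer can flag it without running the program:

```python
def classify(x):
    if x > 0:
        label = "positive"
    elif x < 0:
        label = "negative"
    return label   # fault: 'label' is never assigned when x == 0

print(classify(3))    # "positive"
try:
    classify(0)       # the uninitialized path, observed here at run time
except UnboundLocalError:
    print("uninitialized 'label' on the x == 0 path")
```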

This observation leads to distinguishing two complementary kinds of reliability assessment, verification and validation, often combined in the abbreviation “V&V”:

• Verification is internal assessment of the consistency of the product, considered just by itself. The last two examples illustrated properties that are subject to verification: for code; for documentation. Type checking is another example.

• Validation is relative assessment of a product vis-à-vis another that defines some of the properties that it should satisfy: code against design, design against specification, specification against requirements, documentation against standards, observed practices against company rules, delivery dates against project milestones, observed defect rates against defined goals, test suites against coverage metrics.

A popular version of this distinction [10] is that verification is about ascertaining that the product is “doing things right” and validation that it is “doing the right thing”. It only applies to code, however, since a specification, a project plan or a test plan do not “do” anything.
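A tiny invented illustration of the two kinds of check: the first assertion is verification-style, a consistency property of the code considered by itself; the second is validation-style, comparing behavior to an externally stated expectation:

```python
def fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32

# Verification-style check: internal consistency of the product by itself
# (here, the computed value matches the declared return type), analogous
# to what a type checker establishes from the program text.
assert isinstance(fahrenheit(0.0), float)

# Validation-style check: the product against another document that states
# expected properties (here, the specification fact that water boils
# at 212 degrees Fahrenheit).
assert fahrenheit(100.0) == 212.0
```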


    3 CLASSIFYING APPROACHES

    One of the reasons for the diversity of approaches to software quality is themultiplicity of problems they address. The following table shows a list ofcriteria, essentially orthogonal, for classifying them.

The first distinction is cultural almost as much as it is technical. With a priori techniques the emphasis is methodological: telling development teams to apply certain rules to produce a better product. With a posteriori techniques, the goal is to examine a proposed software product or process element for possible deficiencies, with the aim of correcting them. While it is natural to state that the two are complementary rather than contradictory (a defense often used by proponents of a posteriori approaches such as testing when criticized for accepting software technology as it is rather than helping to improve it), they correspond to different views of the software world, one hopeful of prevention and the other willing to settle down for cure.

The second distinction corresponds to the two dimensions of software engineering cited above: are we working on the products, or on the processes leading to them?

Some approaches are of a methodological nature and just require applying some practices; we may call them manual, in contrast with techniques that are tool-supported and hence at least partially automated.

Criteria for classifying approaches to software reliability

A priori (build)              vs  A posteriori (assess and correct)
Process                       vs  Product
Manual                        vs  Tool-supported
Technology-neutral            vs  Technology-specific
Product- and phase-neutral    vs  Product- or phase-specific
Static (uses software text)   vs  Dynamic (requires execution)
Informal                      vs  Mathematical
Complete (guarantee)          vs  Partial (some progress)
Free                          vs  Commercial


    An idea can be applicable regardless of technology choices; for exampleprocess-based techniques such as CMMI, discussed below, explicitly stayaway from prescribing specific technologies. At the other extreme, certaintechniques may be applicable only if you accept a certain programminglanguage, specification method, tool or other technology choice. We may talkof technology-neutral and technology-specific approaches; this is more aspectrum of possibilities than a black-and-white distinction, since manyapproaches assume a certain class of technologies such as object-orienteddevelopment encompassing many variants.

Some techniques apply to a specific product or phase of the lifecycle: specification (a specification language), implementation (a static analyzer of code)… They are product-specific, or phase-specific. Others, such as configuration management tools, apply to many or all product kinds; they are product-neutral. “Product” is used here to denote one of the types of outcome of the software construction process.

For techniques directed at program quality, an important division exists between dynamic approaches such as testing, which rely on executing the program, and purely static ones, such as static analysis and program proofs, which only need to analyze the program text. Here too some nuances exist: a simulation technique requires execution and hence can be classified as dynamic even though the execution doesn’t use the normal run-time environment; model-checking is classified as static even though in some respect it is close to testing.

Some methods are based on mathematical techniques; this is obviously the case with program proofs and formal specification in general. Many are more informal.

A technique intended to assess quality properties can give you a complete guarantee that they are satisfied, or more commonly some partial reassurance to this effect.

The final distinction is economic: between techniques in the public domain, usable for free in the ordinary sense of the term, and commercial ones.


    4 PROCESS-BASED APPROACHES

    We start with the least technical approaches, emphasizing managementprocedures and organizational techniques.

    Lifecycle models

One of the defining acts of software engineering was the recognition of the separate activities involved, in the form of lifecycle models that prescribe a certain order of tasks (see the figure below). The initial model is the so-called waterfall [11], still used as a reference for discussions of the software process although no longer recommended for literal application. Variants include:

• The V model, which retains the sequential approach of the waterfall but divides the process into two parts, the branches of the V; activities along the first branch are for development, those in the second branch are for verification and validation, each applied to the results of one of the steps along the first branch.

• The Spiral model [11], which focuses on reducing risk in project management, in particular the risk caused by the all-or-nothing attitude of the waterfall approach. The spiral model suggests isolating subsets of the system’s functionality that are small enough to be implemented quickly, and when they have been implemented taking advantage of the experience to proceed to other parts of the system. The idea is connected with the notion of rapid prototyping.

• The Rational Unified Process, distinguishing four phases, inception, elaboration, construction and transition, with a spiral-like iterative style of development and a set of recommended best practices such as configuration management.

• The Cluster model [51] [57], emphasizing a different form of incrementality (building a system by layers, from the most fundamental to the most user-oriented) and a seamless process treating successive activities, from analysis to design, implementation and maintenance, as a continuum. This model also introduces, as part of the individual lifecycle of every cluster, a generalization step to prepare for future reuse of some of the developed elements.

The figure shows pictorial representations of some of these models.


(Figure: lifecycle models, illustrated: Waterfall, V-shaped, Spiral (from [11]), Cluster.)


Whatever their effect on how people actually develop software, the contribution of lifecycle models has been a classification and definition of the activities involved in software development, even when these activities are not executed as phases in the precise order mandated by, for example, the waterfall model. Software quality benefits in particular from:

• A distinction between requirements, the recording of user requirements, and specification, their translation into a systematic form suitable for software development, where rigor and precision are essential.

• Recognition of the importance of Verification and Validation tasks.

• Recognition of post-delivery activities such as maintenance, although they still do not occupy a visible enough place. Many software troubles result from evolutions posterior to the initial release.

• In the Cluster model, the presence, for each cluster, of the generalization task to prepare for reuse.

• Also in the Cluster model, the use of a seamless and reversible approach which unifies the methods, tools, techniques and notations that help throughout the software process, rather than exaggerating the gaps between activities. (The textbook counter-example here is the use of UML for analysis and design [56].)

• The growing emphasis on incrementality in the development process, even if this concept is understood differently in, for example, the spiral, cluster and RUP models.

Organizational standards

Another process-related set of developments has had a major effect, largely beneficial, on some segments of the industry. In the early 1990s the US Department of Defense, concerned with the need to assess its suppliers’ software capabilities and to establish consistent standards, entrusted the Software Engineering Institute with the task of developing a Capability Maturity Model, whose current incarnation, CMMI [74] (the “I” is for Integration), provides a collection of standards applicable to various disciplines, rather than a single model for software. Largely independently, the International Standards Organization has produced a set of software-oriented variants of its 9000-series quality standards, which share a number of properties with CMMI. The present discussion is based on CMMI.


Beyond its original target community, CMM and CMMI have been the catalyst for one of the major phenomena of the IT industry starting in the mid-nineties: the development of offshore software production, especially in India [63]. CMMI qualification provides suppliers of outsourcing development services with quality standards and the associated possibility of independent certification, without which customers would not have known how to trust distant, initially unknown contractors.

CMMI is (in the earlier classification) product-neutral, phase-neutral and technology-neutral. In its application to software it is intended only to determine how well an organization controls its development process by defining and documenting it, recording and assessing how it is applied in practice, and working to improve it. It doesn’t prescribe what the process should be, only how much you are on top of it. You could presumably be developing in PL/I on IBM 370 and get CMMI qualification.

CMMI assesses both the capability level of individual process areas (such as software) in an organization, and the maturity of an organization as a whole. It distinguishes five levels of increasing maturity:

• Performed: projects happen and results get produced, but there is little control and no reproducibility; the process is essentially reactive.

• Managed: processes are clearly defined for individual projects, but not for the organization as a whole. They remain largely reactive.

• Defined: proactive process defined for the organization.

• Quantitatively managed: the control mechanisms do not limit themselves to qualitative techniques, but add well-defined numerical measurements.

• Optimizing: the mechanisms for controlling processes are sufficiently well established that the focus can shift to improving the organization and its processes.

Through their emphasis on the process and its repeatability, CMMI and ISO standards help improve the quality of software development. One may expect such improvements of the process to have a positive effect on the resulting products as well; but they are only part of the solution. After a software error (one module of the software was expecting measures in the metric system, another was providing them in English units) was identified as the cause of the failure of the NASA Mars Orbiter Vehicle mission [82], an engineer from the project noted that the organization was heavily into ISO and other process standards. Process models and process-focused practices are not a substitute for using the best technological solutions. Tailored versions of CMMI that would not shy away from integrating specific technologies such as object technology could be extremely useful. In the meantime, the technology-neutral requirements of CMMI can be applied by organizations to get a better hold on their software processes.


    Extreme programming

The Extreme Programming movement [6] is a reaction against precisely the kinds of lifecycle models and process-oriented approaches just reviewed. XP (as it is also called) emphasizes instead the primacy of code. Some of the principal ideas include:

• Short release cycles to get frequent feedback.

• Pair programming (two people at a keyboard and terminal).

• Test-driven development.

• A general distrust of specification and design: testing is the preferred guide of development.

• Emphasis on programmers’ welfare.

Some of these practices are clearly beneficial to quality but were developed prior to XP, in particular short release cycles (Microsoft’s “daily build” as described in 1995 by Cusumano and Selby [19], see also [54]) and the use of frequent testing as part of development (see e.g. “quality first” [55]). Those really specific to XP are of limited interest (while sometimes a good practice, pair programming cannot be imposed indiscriminately, both because it doesn’t work for some people and because those who find it useful may not find it useful all the time) or, in the case of tests viewed as a replacement for specifications, downright detrimental. See [75] and [64] for critiques of the approach.
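Test-driven development, mentioned in the list above, can be sketched as follows (the example and names are invented): the tests are written first, fail, and then drive just enough implementation to pass.

```python
import unittest

def leap_year(y: int) -> bool:
    # Written after, and driven by, the tests below.
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

class LeapYearTests(unittest.TestCase):
    # In TDD these assertions exist before leap_year does, and their
    # initial failure is the expected starting point of the cycle.
    def test_ordinary_years(self):
        self.assertTrue(leap_year(2024))
        self.assertFalse(leap_year(2023))

    def test_century_years(self):
        self.assertFalse(leap_year(1900))
        self.assertTrue(leap_year(2000))

if __name__ == "__main__":
    unittest.main()
```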

Code inspections

A long-established quality practice is the inspection, also known as review: a session designed to examine a certain software element with the aim of finding flaws. The most common form is code inspection, but the process can be applied to any kind of software engineering product. Rules include:

• Small meeting: at most 8 people or so, including the developer of the element under review.

• The elements under review and any supporting documents must be circulated in advance; the participants should have read them and identified possible criticisms before the meeting. The allotted time should be bounded, for example 2 or 3 hours.


• The meeting must have a moderator to guide discussions and a secretary to record results.

• The moderator should not be the developer’s manager. The intent is to evaluate products, not people.

• The sole goal is to identify deficiencies and confirm that they are indeed deficiencies; correction is not part of the process and should not be attempted during the meeting.

Code inspections can help avoid errors, but to assess their usefulness one must compare the costs with those of running automated tools that can catch some of the same problems without human intervention; static analyzers, discussed below, are an example.

Some companies have institutionalized the rule that no developer may check in code (integrate it into the repository for a current or future product) without approval by one other developer, a limited form of code inspection that has a clearly beneficial effect by forcing the original developer to convince at least one other team member of the suitability of the contribution.

Open-source processes

A generalization of the idea of code inspection is the frequent assertion, by members of the open-source community, that the open-source process dramatically improves quality by enabling many people to take a critical look at the software text; some have gone so far as to state that “given enough eyes, all bugs are shallow” [73].

As with many of the other techniques reviewed, we may see in this idea a beneficial contribution, but not a panacea. John Viega gives [78] the example of a widely used security program in which “in the past two years, several very subtle buffer overflow problems have been found… Almost all had been in the code for years, even though it had been examined many times by both hackers and security auditors… One tool was able to identify one of the problems as potentially exploitable, but researchers examined the code thoroughly and came to the conclusion that there was no way the problem could be exploited.” (The last observation is anecdotal evidence for the above observation that tools such as static analyzers are potentially superior to human analysis.)

While there is no evidence that open-source software as a whole is better (or worse) than commercial software, and no absolute rule should be expected, if only because of the wide variety of products and processes on both sides, it is clear that more eyes potentially see more bugs.


    Requirements engineering

In areas such as embedded systems, many serious software failures have been traced [45] to inadequate requirements rather than to deficiencies introduced in later phases. Systematic techniques for requirements analysis are available [76] [40] to improve this critical task of collecting customer wishes and translating them into a form that can serve as a basis for a software project.

    Design patterns

A process-related advance that has had a strong beneficial effect on software development is the emergence of design patterns [32]. A pattern is an architectural scheme that has been recognized as fruitful through frequent use in applications, and for which a precise description exists according to a standard format. Patterns provide a common vocabulary to developers, hence simplifying design discussions, and enable them to benefit from the collective wisdom of their predecessors.

A (minority) view of patterns [62] [65] understands them as a first step towards the technique discussed next, reusable components. Patterns, in this interpretation, suffer from the limitation that each developer must manually insert the corresponding solutions into the architecture of every applicable system. If instead it is possible to turn the pattern into a reusable component, developers can directly reuse the corresponding solution through an API (Application Programming Interface). The observation here is that it is better to reuse than to redo. Investigations [65] suggest that with the help of appropriate programming language constructs up to two thirds of common design patterns can be thus componentized.
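The componentization idea can be sketched with the Observer pattern (all names below are invented): the pattern is packaged once as a reusable class, and client systems use its API instead of re-implementing the collaboration.

```python
# A reusable Observer component: clients subscribe handlers and publish
# events through the API rather than re-coding the pattern each time.
from typing import Any, Callable, List

class Event:
    """Reusable publish/subscribe (Observer) component."""
    def __init__(self) -> None:
        self._handlers: List[Callable[..., None]] = []

    def subscribe(self, handler: Callable[..., None]) -> None:
        self._handlers.append(handler)

    def publish(self, *args: Any) -> None:
        for handler in self._handlers:
            handler(*args)

# Client code: no pattern boilerplate, only calls to the component's API.
log: List[str] = []
temperature_changed = Event()
temperature_changed.subscribe(lambda t: log.append(f"now {t} degrees"))
temperature_changed.publish(21)
print(log)   # ['now 21 degrees']
```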

    Trusted components

    Quality improvement techniques,product, are only as good as themagnitude of the necessary educamajor short-term improvements,have not had the benefit of a forms had a strong beneficial effect on softwareof design patterns [32]. A pattern is anrecognized as fruitful through frequent useprecise description exists according to a

    a common vocabulary to developers, henced enable them to benefit from the collective

    s [62] [65] understands them as a first stepext, reusable components. Patterns, in thisitation that each developer must manuallys into the architecture of every applicableturn the pattern into a reusable component,e corresponding solution through an APIobservation here is that it is better to reusesuggest that with the help of appropriateup to two thirds of common design patterns

    whether they emphasize the process or their actual application by programmers. Thetion effort is enough to temper any hope ofespecially given that many programmers

    al computer science education to start with.
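To make the pattern-versus-component distinction concrete, here is a sketch of the Observer (publish-subscribe) pattern packaged once as a reusable component, so that client systems call an API rather than re-implement the scheme; all names here are illustrative, not taken from the article or any cited library:

```python
# The Observer pattern as a reusable component: write it once, reuse through
# its API instead of hand-coding the pattern in every application.

class Event:
    """A reusable publish-subscribe component."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, action):
        # 'action' is any callable taking the published value.
        self._subscribers.append(action)

    def publish(self, value):
        for action in self._subscribers:
            action(value)

# Client code reuses the component rather than redoing the pattern:
log = []
price_changed = Event()
price_changed.subscribe(lambda p: log.append(f"display: {p}"))
price_changed.subscribe(lambda p: log.append(f"audit: {p}"))
price_changed.publish(42)
```

Two clients subscribe once and both are notified of each publication; the pattern's machinery lives entirely inside the component.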


Another practical impediment to continued quality improvement comes from market forces. The short-term commercial interest of a company is generally to release software that is good enough [83]: software that has barely passed the threshold under which the market would reject it because of bad quality; not excellent software. The extra time and expense to go from the first to the second stage may mean, for the company, losing the market to a less scrupulous competitor, and possibly going out of business. For the industry as a whole, software quality has indeed improved regularly over time but tends to peak below the optimum.

An approach that can overcome these obstacles is increased reliance on reusable components, providing pre-built solutions to problems that arise in many different applications, either regardless of the technical domain (general-purpose component libraries) or in particular fields (specialized libraries). Components have already changed the nature of software development by providing conveniently packaged implementations, accessible through abstract interfaces, of common aspects such as graphical user interfaces, database manipulation, basic numerical algorithms, fundamental data structures and others, thereby elevating the level at which programmers write their applications. When the components themselves are of good quality, such reuse has highly beneficial effects since developers can direct their efforts to the quality of the application-specific part of their programs.

Examining more closely the relationship of components to quality actually highlights two separate effects: it is comforting to know that the quality of a system will benefit from the quality of its components; but we must note that reuse magnifies the bad as well as the good: imperfections can be even more damaging in components than in one-of-a-kind developments, since they affect every application that relies on a component.

The notion of trusted component [58] [61] follows from this analysis: one of the most pressing and promising tasks for improving software quality is the industrial production of reusable components equipped with a guarantee of quality. Producing such trusted components may involve most of the techniques discussed elsewhere in this article. For some of the more difficult ones, such as program proving, application to components may be the best way to justify the cost and effort and recoup the investment thanks to the scaling effect of component reuse: once a component has reached the level of quality at which it can really be trusted, it will benefit every application that relies on it.


    5 TOOLS AND ENVIRONMENTS

Transitioning now to product-oriented solutions, we examine some of the progress in tools available to software developers, to the extent that it is relevant for software quality.

    Configuration management

Configuration management is both a practice (for the software developer) and a service (from the supporting tools), so it could in principle be classified under process as well as under product. It belongs more properly to the latter category since it is tools that make configuration management realistic; applied as a pure organizational practice without good tool support, it quickly becomes tedious and ceases being applied.

Configuration management may be defined as the systematic collecting and registering of project elements, including in particular the ability to:

• Register a new version of any project element.

• Retrieve any previously registered version of any project element.

• Register dependencies, both between project elements and between registered versions of project elements (e.g. A relies on B, and version 10 of A requires version 7, 8 or 9 of B).

• Construct composite products from their constituents, for example build an executable version of a program from its modules, or reconstruct earlier versions, in accordance with registered dependencies.

A significant number of software disasters on record followed from configuration management errors, typically due to reintroducing an obsolete version of a module when compiling a new release of a program, or using an obsolete version of some data file. Excuses no longer exist for such errors, as acceptable configuration management tools, both commercial and open-source, are widely available. These tools, while still far from what one could hope for, have made configuration management one of the most important practices of modern software development.

Source code is not the only beneficiary of configuration management. Any product that evolves, has dependencies on other elements and may need restoring to an earlier state should be considered for inclusion in the configuration management repository. Besides code this may include project plans, specification and design documents, user manuals, training documents such as PowerPoint slides, test data files.
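The operations listed above (register, retrieve, dependency registration, consistency of a build) can be sketched as a toy repository; this is purely illustrative and models no particular tool:

```python
# Toy sketch of the configuration-management operations described above.

class Repository:
    def __init__(self):
        self.versions = {}      # element name -> {version: content}
        self.dependencies = {}  # (element, version) -> {needed element: allowed versions}

    def register(self, element, version, content):
        self.versions.setdefault(element, {})[version] = content

    def retrieve(self, element, version):
        # Any previously registered version remains retrievable.
        return self.versions[element][version]

    def register_dependency(self, element, version, needed, allowed_versions):
        # e.g. version 10 of A requires version 7, 8 or 9 of B.
        self.dependencies.setdefault((element, version), {})[needed] = set(allowed_versions)

    def consistent(self, chosen):
        # Check a proposed build configuration {element: version}
        # against all registered dependencies.
        for (elem, ver), needs in self.dependencies.items():
            if chosen.get(elem) == ver:
                for needed, allowed in needs.items():
                    if chosen.get(needed) not in allowed:
                        return False
        return True

repo = Repository()
repo.register("B", 7, "b-source-v7")
repo.register("A", 10, "a-source-v10")
repo.register_dependency("A", 10, "B", [7, 8, 9])
assert repo.consistent({"A": 10, "B": 7})        # allowed combination
assert not repo.consistent({"A": 10, "B": 6})    # obsolete B: rejected
```

A real tool adds storage, branching, merging and build orchestration on top of exactly this kind of version-and-dependency bookkeeping.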


    Metrics and models

If we believe Lord Kelvin's (approximate) maxim that all serious study is quantitative, then software and software development should be susceptible to measurement, tempered of course by Einstein's equally famous quote that not everything measurable is worth measuring. A few software properties, process or product, are at the same time measurable, worth measuring and relevant to software reliability.

On the process side, cost in its various dimensions is a prime concern. While it is important to record costs, if only for CMMI-style traceability, what most project managers want at a particular time is a model to estimate the cost of a future project or of the remainder of a current project. Such models do exist and can be useful, at least if the development process is stable and the project is comparable to previous ones: then by estimating a number of project parameters and relying on historical data for comparison one can predict costs (essentially, person-months) within reasonable average accuracy. A well-known cost model, for which free and commercial tools are available, is COCOMO II [12].

During the development of a system, faults will be reported. In principle they shouldn't be comparable to the faults of a material product, since software is an intellectual product and doesn't erode, wear out or collapse under attack from the weather. In practice, however, statistical analysis shows that faults in large projects can follow patterns that resemble those of hardware systems and are susceptible to similar statistical prediction techniques. That such patterns can exist is in fact consistent with intuition: if the tests on the last five builds of a product under development have each uncovered one hundred new bugs, it is unlikely that the next iteration will have zero bugs, or a thousand. Software reliability engineering [69][46] elaborates on these ideas to develop models for assessing and predicting failures, faults and errors. As with cost models, a requirement for meaningful predictions is the ability to rely on historical data for calibration. Reliability models are not widely known, but could help software projects understand, predict and manage anomalies better.

More generally, numerous metrics have been proposed to provide quantitative assessments of software properties. Measures of complexity, for example, include: source lines of code (SLOC), the most primitive, but useful all the same; function points [25], which count the number of elementary mechanisms implemented by the software; measures of the complexity of the control graph, such as cyclomatic complexity [48][49]; and measures specifically adapted to object-oriented software [35][59]. The EiffelStudio environment [30] makes it possible to compute many metrics applied to a project under development, including measures regarding the use of contracts (section 8), and to compare them with values on record. While not necessarily meaningful in isolation, such measures are a useful control tool for the manager; they are in line with the CMMI's insistence that an organization can only reach the higher levels of process maturity (4 and 5) by moving from the qualitative to the quantitative, and should be part of the data collected for such an effort.
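Of the complexity measures just listed, cyclomatic complexity is the easiest to make precise: for a connected control-flow graph with E edges and N nodes it is V(G) = E - N + 2. A minimal illustrative computation:

```python
# Cyclomatic complexity of a connected control-flow graph: V(G) = E - N + 2.
# Illustrative computation, not a real metrics tool.

def cyclomatic_complexity(edges):
    nodes = {n for edge in edges for n in edge}
    return len(edges) - len(nodes) + 2

# Control graph of a routine containing a single 'if' (diamond shape):
#   entry -> test; test -> then; test -> else; then -> join; else -> join; join -> exit
edges = [("entry", "test"), ("test", "then"), ("test", "else"),
         ("then", "join"), ("else", "join"), ("join", "exit")]
assert cyclomatic_complexity(edges) == 2   # one decision point gives complexity 2
```

Straight-line code (one path) has complexity 1; each additional decision point adds one, which is why the measure is often read as the number of independent paths to cover in testing.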

    Static analyzers

Static analyzers are another important category of tools, increasingly integrated in development environments, whose purpose is to examine the software text for deficiencies. They lie somewhere between type checkers (themselves integrated in compilers) and full program provers, and will be studied below (page 26) after the discussion of proofs.

Integrated development environments

Beyond individual tools the evolution of software development has led to the widespread use of integrated tool suites known as IDEs for Integrated (originally: Interactive) Development Environments. Among the best known are Microsoft's Visual Studio [66] and IBM's Eclipse [27]; EiffelStudio [30] is another example. These environments, equipped with increasingly sophisticated graphical user interfaces, provide under a single roof a whole battery of mechanisms to write software (editors), manage its evolution (configuration management), compile it (compilers, interpreters, optimizers), examine it effectively (browsers), run it and elucidate the sources of faults (debuggers, testers), analyze it for possible inconsistencies and errors (static analysis), generate code from design and analysis diagrams or the other way around (diagramming, Computer-Aided Software Engineering or CASE, reverse engineering), change architecture in a safe way through tool-controlled transformations (refactoring), perform measurements as noted above (metric tools), and other tasks.

This is one of the most active areas in software engineering; programmers, for whom IDEs are the basic daily tools, are directly interested in their quality, so that open-source projects such as Eclipse and EiffelStudio benefit from active community participation. The effect of these advanced frameworks on software reliability, while diffuse, is undeniable, as their increasing cleverness supports quality in several ways: finding bugs through static and dynamic techniques; avoiding new bugs through mechanisms such as refactoring; generating some of the code without manual intervention; and, more generally, providing a level of comfort that frees programmers from distractions and lets them apply their best skills to the hardest issues of software construction.


    6 PROGRAMMING LANGUAGES

The evolution of programming languages plays its part in the search for more reliable software. High-level languages contribute both positively, by providing higher levels of expression through advanced constructs, freeing the programmer (in the same spirit as modern IDEs) from mundane, repetitive or irrelevant tasks, and negatively, by ruling out certain potentially unsafe constructs and, as a result, eradicating entire classes of bugs at the source.

The realization that programming language constructs could exert a major influence on software quality, both through what they offer and what they forbid, dates back to structured programming [22] [20] which, in the early seventies, led to rejecting the goto as a control structure in favor of more expressive constructs: sequence, conditional, loop, recursion. The next major step was object-oriented programming, introducing a full new set of abstractions, in particular the notion of class, providing decomposition based on object types rather than individual operations, and techniques of inheritance and genericity.

In both cases the benefit comes largely from being able to reason less operationally about software. A software text represents many possible executions, so many in fact that it is hard to understand the program, and hence to get it right, by thinking in terms of what happens at execution [22]. Both structured and object-oriented techniques make it possible to limit such operational thinking and instead understand the abstract properties of future run-time behaviors by applying the usual rules of logical reasoning.

In drawing the list of programming languages' most important contributions to quality, we must indeed put at the top all the mechanisms that have to do with structure. With ever larger programs addressing ever more ambitious goals, the production and maintenance of reliable software requires safe and powerful modular decomposition facilities. Particularly noteworthy are:

• As pointed out, the class mechanism, which provides a general basis for stable modules with a clear role in the overall architecture.

• Techniques for information hiding, which protect modules against the details of other modules, and permit independent evolution of the various parts of a system.

• Inheritance, allowing the classification and systematic organization of classes into structured collections, especially with multiple inheritance.

• Genericity, allowing the construction of type-parameterized modules.
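Two of the mechanisms above, information hiding and genericity, can be sketched in Python, which supports type-parameterized classes through its `typing` module (the leading underscore is the conventional marker for a hidden representation); the example is a sketch, not from the article:

```python
# A type-parameterized module (genericity) with a hidden representation
# (information hiding): Stack[T] can be instantiated as Stack[int],
# Stack[str], and so on.

from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []   # hidden representation, not part of the API

    def push(self, x: T) -> None:
        self._items.append(x)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items

s: Stack[int] = Stack()
s.push(1)
s.push(2)
assert s.pop() == 2 and not s.is_empty()
```

Clients depend only on push/pop/is_empty; the list inside can later be replaced (say, by a linked structure) without touching any client, which is precisely the independent-evolution benefit claimed for information hiding.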


Another benefit of modern languages is static typing, which requires programmers to declare types for all the variables and other entities in their programs, then takes advantage of this information to detect possible inconsistencies in their use and reject programs, at compilation time, until all types fit. Static typing is particularly interesting in object-oriented languages since inheritance supports a flexible type system in which types can be compatible even if they are not identical, as long as one describes a specialization of the other.
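The compatibility-through-specialization idea can be illustrated with type annotations checked by an external tool such as mypy (the class names below are my own, chosen for illustration):

```python
# Inheritance-based type compatibility: a SavingsAccount is acceptable
# wherever an Account is expected, because it specializes Account.
# A static checker (e.g. mypy) verifies the annotations before execution.

class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def deposit(self, amount: int) -> None:
        self.balance += amount

class SavingsAccount(Account):          # specialization of Account
    def add_interest(self) -> None:
        self.balance += self.balance // 100

def credit(a: Account, amount: int) -> None:
    a.deposit(amount)

s = SavingsAccount(1000)
credit(s, 50)            # accepted: compatible, though not identical, types
assert s.balance == 1050
# credit("hello", 50)    # a static type checker rejects this: str is not an Account
```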

Another key advance is garbage collection, which frees programmers from having to worry about the details of memory management and removes an entire class of errors, such as attempts to access a previously freed memory cell, which can otherwise be particularly hard to detect and to correct, in particular because the resulting failures are often intermittent rather than deterministic. Strictly speaking, garbage collection is a property of the language implementation, but it is the language definition that makes it possible, as with modern object-oriented languages, or not, as in languages such as C that permit arbitrary pointer arithmetic and type conversions.

Exception handling, as present in modern programming languages, helps improve software robustness by allowing developers to include recovery code for run-time faults that would otherwise be fatal, such as arithmetic overflow or running out of memory.

A mechanism that is equally far-reaching in its abstraction benefits is the closure, delegate or agent [62]. Such constructs wrap operations in objects that can then be passed around anonymously across modules of a system, making it possible to treat routines as first-class values. They drastically simplify certain kinds of software such as numerical applications, GUI programming and other event-driven (or publish-subscribe) schemes.

The application of programming language techniques to improving software quality is limited by the continued reliance of significant parts of the software industry on older languages. In particular:

• Operating systems and low-level system-related software tend to be written in C, which retains its attractions for such applications in spite of widely known deficiencies, such as the possibility of buffer overflow.

• The embedded and mission-critical community sometimes prefers to use low-level languages, including assembly, for fear of the risks potentially introduced by compilers and other supporting tools.

The Verifying Compiler Grand Challenge [38] [77] is an attempt to support, even with such programming languages, the development of tools that will guarantee, during the process of compiling and thanks to techniques described in the following sections, the reliability of the programs they process.
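The closure/agent mechanism discussed above is easy to sketch in Python, where routines are first-class values; the integration routine below is an illustrative example of the simplification it brings to numerical code:

```python
# Routines as first-class values, in the style of closures/agents:
# the function to integrate is passed around like any other value.

def integrate(f, low, high, steps=10000):
    """Approximate the integral of f over [low, high] by the midpoint rule."""
    width = (high - low) / steps
    return sum(f(low + (i + 0.5) * width) for i in range(steps)) * width

# Any callable, including an anonymous one, can be handed to the routine:
assert abs(integrate(lambda x: x * x, 0.0, 1.0) - 1.0 / 3.0) < 1e-6
```

The same idea underlies event-driven designs: a GUI button stores the agents to call on a click, instead of hard-wiring the reaction into the button's code.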


7 STATIC VERIFICATION TECHNIQUES

Static techniques work solely from the analysis of the software text: unlike dynamic techniques such as tests they do not require any execution to verify software or report errors.

    Proofs

Perhaps the principal difference between mathematics and engineering is that only mathematics allows providing absolute guarantees. Given the proper axioms, I can assert with total confidence that two plus two equals four. But if I want to drive to Berne the best assurance I can get that my car will not break down is a probability. I know it's higher than if I just drive it to the suburbs, and lower than if my goal were Prague, Alma-Ata, Peking or Bombay; I can make it higher by buying a new, better car; but it will never be one. Even with the highest attention to quality and maintenance, physical products will occasionally fail.

Under appropriate assumptions, a program is like a mathematical proposition rather than a material device: any general property of the program, stating that all executions of the program will achieve a certain goal, or that at least one possible execution will, is either true or false, and whether it is true or not is entirely determined by the text of the program, at least if we assume correct functioning of the hardware and of other software elements needed to carry out program execution (compiler, run-time system, operating system). Another way of expressing this observation is that a programming language is similar to a mathematical theory, in which certain propositions are true and others false, as determined by the axioms and inference rules.

In principle, then, it should be possible to prove or disprove properties of programs, in particular correctness, robustness and security properties, using the same rigorous techniques as in the proofs of any mathematical theorem. This assumes overcoming a number of technical difficulties:

• Programming languages are generally not defined as mathematical theories but through natural-language documents possessing a varying degree of precision. To make formal reasoning possible requires describing them in mathematical form; this is known as providing a mathematical semantics (or formal semantics) to a programming language and is a huge task, especially when it comes to modeling advanced mechanisms such as exception handling and concurrency, as well as the details of computer arithmetic, since the computer's view of integers and reals strays from their standard mathematical properties.


• The theorems to be proved involve specific properties of programs, such as the value of a certain variable not exceeding a certain threshold at a certain state of the execution. Any proof process requires the ability to express such properties; this means extending the programming language with boolean-valued expressions, called assertions. Common languages other than Eiffel do not include an assertion mechanism; this means that programmers will have to resort to special extensions such as JML for Java [43] (see also Spec#, an extension of the C# language [5]) and annotate programs with the appropriate assertions. Some tools such as Daikon help in this process by extracting tentative assertions from the program itself [31].

• In practice the software's actual operation depends, as noted, on those of a supporting hardware and software environment; proofs of the software must be complemented by guarantees about that environment.

• Not all properties lend themselves to easy enunciation. In particular, non-functional properties such as performance (response time, bandwidth, memory occupation) are hard to model.

• More generally, a proof is only as useful as the program properties being proven. What is being proved is not the perfection of the program in any absolute sense, nor even its quality, but only that it satisfies the assertions stated. It is never possible to know that all properties of interest have been included. This is not just a theoretical problem: security attacks often take advantage of auxiliary aspects of the program's behavior, which its design and verification did not take into account.

• Even if the language, the context and the properties of interest are fully specified semantically and the properties relevant, the proof process remains a challenge. It cannot in any case be performed manually, since even the proof of a few properties of a moderately sized program quickly reaches into the thousands of proof steps. Fully automated proofs are, on the other hand, generally not possible. Despite considerable advances in computer-assisted proof technology (for programs as well as other applications) significant proofs still require considerable user interaction and expert knowledge.

Of course the effort may well be worthwhile, especially in two cases: life-critical systems in transportation and defense, to which, indeed, much proof work has been directed; and reusable components, for which the effort is justified, as explained in the discussion of Trusted Components above, by the scaling-up effect of reuse.
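Where full proofs are out of reach, assertion annotations of the kind just described can at least be checked at run time, execution by execution; a minimal sketch in Python (the names are illustrative and do not come from JML, Spec# or any cited tool):

```python
# Run-time checking of a precondition and postcondition: each execution is
# verified, in contrast with a proof, which covers all executions at once.

def decrement(counter):
    assert counter > 0                    # precondition ('require')
    old_counter = counter                 # snapshot, playing the role of 'old'
    counter = counter - 1                 # body ('do')
    assert counter == old_counter - 1     # postcondition ('ensure'), clause 1
    assert counter >= 0                   # postcondition, clause 2
    return counter

assert decrement(3) == 2
```

Calling `decrement(0)` raises an AssertionError: the violated precondition is detected at the call, not silently propagated as a wrong result.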


Here are some of the basic ideas about how proofs work. A typical program element to prove would be, in Eiffel notation:

decrement
        -- Decrease counter by one.
    require
        counter > 0
    do
        counter := counter - 1
    ensure
        counter = old counter - 1
        counter >= 0
    end

This has a program body, the do clause, and two assertions: a precondition introduced by require and a postcondition introduced by ensure and consisting of two subclauses implicitly connected by an and. Assertions are essentially boolean expressions of the language, with the possibility, in a postcondition, of using the old notation to refer to values on entry: here the first subclause of the postcondition states that the value of counter will have been decreased by one after execution of the do clause.

Program proofs deal with such annotated programs, also called contracted programs (see section 8 below). The annotations remind us that proofs and other software quality assurance techniques can never give us absolute guarantees of quality: we can never say that a program is correct; we can only assess it, whether through rigorous techniques like proofs or using more partial ones such as those reviewed next, relatively to explicitly stated properties, expressed here through assertions integrated in the program text.

From a programmer's viewpoint the above extract is simply the text of a routine to be executed, with some extra annotations, the precondition and postcondition, expressing properties to be satisfied before and after. But for proof purposes this text is a theorem, asserting that whenever the body (the do clause with its assignment instruction) is executed with the precondition satisfied, it will terminate in such a way that the postcondition is satisfied.

This theorem appears to hold trivially; but even before addressing the concern noted above that computer integers are not quite the same as mathematical integers, proving it requires the proper mathematical framework. The basic rule of axiomatic semantics (or Hoare semantics [37]) covering such cases is the assignment axiom, which for any variable x and expression e states that the following holds:

require Q (e) do x := e ensure Q (x)

where Q (x) is an assertion which may depend on x; then Q (e) is the same assertion with every mention of x replaced by e, except for occurrences of old x, which must be replaced by x.

This very general axiom captures the properties of assignment (in the absence of side effects in the evaluation of e); its remarkable feature is that it is applicable even if the source expression e contains occurrences of the target variable x, as in the example (where x is counter).

We may indeed apply the axiom to prove the example's correctness. Let Q1 (x) be x = old x - 1, corresponding to the first subclause of the postcondition, and Q2 (x) be x >= 0. Applying the rule to Q1 (counter), we replace counter by counter - 1 and old counter by counter; this gives counter - 1 = counter - 1, which trivially holds. Applying now the same transformations to Q2 (counter), we get counter - 1 >= 0, which is equivalent to the precondition counter > 0. This proves the correctness of our little assertion-equipped example.

From there the theory moves on to more complex constructions. An inference rule states that if you have proved

require P do Instruction_1 ensure Q

and

require Q do Instruction_2 ensure R

(note the postcondition of the first part matching the precondition of the second part) you are entitled to deduce

require P do Instruction_1 ; Instruction_2 ensure R

and so on for more instructions. A rule in the same style enables you to deduce properties of if c then I1 else I2 end from properties of I1 and I2. More advanced is the case of loops: to prove the properties of

from
    Initialization
until
    Exit
loop
    Body
end


you need, in this general approach, to introduce a new assertion called the loop invariant and an integer expression called the loop variant. The invariant is a weakened form of the desired postcondition, which serves as an approximation of the final goal; for example if the goal is to compute the maximum of a set of values, the invariant will be "Result is the maximum of the values processed so far". The advantage of the invariant is that it is possible both to:

• Ensure the invariant through initialization (the from clause in the above notation); in the example the invariant will be trivially true if we start with just one value and set Result to that value.

• Preserve the invariant through one iteration of the loop body (the loop clause); in the example it suffices to extend the set of processed values by one element v and execute if v > Result then Result := v end.

If indeed a loop possesses such an invariant and its execution terminates, then on exit the invariant will still hold (since it was ensured by the initialization and preserved by all the loop iterations), together with the Exit condition. The combination of these two assertions gives the postcondition of the loop. Seen the other way around, if we started from a desired postcondition and weakened it to get an invariant, we will obtain a correct program. In the example, if the exit condition states that we have processed all values of interest, combining this property with the invariant "Result is the maximum of the values processed so far" tells us that Result is the maximum of all values.

Such reasoning is only interesting if the loop execution actually terminates; this is where the loop variant comes in. It is an integer expression which must have a non-negative value after the Initialization and decrease, while remaining non-negative, whenever the Body is executed with the Exit condition not satisfied. The existence of such an expression is enough to guarantee termination since a non-negative integer value cannot decrease forever. In the example a variant is N - i, where N is the total number of values being considered for the maximum (the proof assumes a finite set) and i the number of values processed.

Axioms and inference rules similarly exist for other constructs of programming languages, becoming, as noted, more intricate as one moves on to more advanced mechanisms.
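The maximum computation just discussed can be written with its invariant and variant made explicit as run-time checks; the asserts verify, on each execution, the properties that the proof would establish once and for all (an illustrative sketch, not a proof):

```python
# Maximum of a non-empty sequence, with the loop invariant and the loop
# variant (N - i) checked at run time.

def maximum(values):
    assert len(values) > 0            # the proof assumes a finite, non-empty set
    result = values[0]                # Initialization establishes the invariant
    i = 1
    while i != len(values):           # Exit condition: i = len(values)
        # Invariant: result is the maximum of the values processed so far
        assert result == max(values[:i])
        variant_before = len(values) - i      # variant N - i
        if values[i] > result:
            result = values[i]        # Body preserves the invariant
        i += 1
        assert len(values) - i < variant_before   # variant strictly decreases
        assert len(values) - i >= 0               # while staying non-negative
    # On exit, Exit condition + invariant give the postcondition:
    assert result == max(values)
    return result

assert maximum([3, 1, 4, 1, 5]) == 5
```

The variant checks encode exactly the termination argument of the text: a non-negative integer that decreases at every iteration cannot do so forever.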


For concurrent, reactive and real-time systems, boolean assertions of the kind illustrated above may not be sufficient; it is often convenient to rely on properties of temporal logic [47], which, given a set of successive observations of a program's execution, can express, for a boolean property Q:

• forever Q: from now on, Q will always hold.

• eventually Q: at some point in the future (where the future includes now), Q will hold.

• P until Q: Q will hold at some point in the future, and until then P will hold.

Regardless of the kind of programs and properties being targeted, there are two approaches to producing program proofs. The analytic method takes programs as they exist, then, after equipping them with assertions, either manually or with some automated aid as noted above, attempts the proof. The constructive method [24] [2] [68] integrates the proof process in the software construction process, often using successive refinements to go from specification to implementation through a sequence of transformations, each proved to preserve correctness, and integrating more practical constraints at every step.

Proof technology has had some notable successes, including in industrial systems (and in hardware design), but until recently has remained beyond the reach of most software projects.

Static analysis

If hoping for a proof covering all the correctness, reliability and security properties of potential interest is often too ambitious, the problem becomes more approachable if we settle for a subset of these properties, a subset that may be very partial but very interesting. For example being able to determine that no buffer overflow can ever arise in a certain program, in other words to provide a firm guarantee, through analysis of the program text, that every index used at run time to access an item in an array or a character in a string will be within the defined bounds, is of great practical value since this rules out a whole class of security attacks.

Static analysis is the tool-supported analysis of software texts for the purpose of assessing specific quality properties. Being static, it requires no execution and hence can in principle be applied to software products other than code. Proofs are a special case, the most far-reaching, but other static analysis techniques are available.
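Returning to the temporal-logic operators listed earlier: over a finite trace of successive observations they can be evaluated directly, keeping in mind that a finite trace only approximates the infinite-execution semantics of [47]; a minimal sketch with illustrative names:

```python
# Evaluating the three temporal operators over a FINITE trace of states.

def forever(q, trace):
    """'forever Q': Q holds in every observed state from now on."""
    return all(q(s) for s in trace)

def eventually(q, trace):
    """'eventually Q': Q holds in some state (the future includes now)."""
    return any(q(s) for s in trace)

def until(p, q, trace):
    """'P until Q': Q holds at some point, and P holds in all states before."""
    for s in trace:
        if q(s):
            return True
        if not p(s):
            return False
    return False

trace = [{"ready": False}, {"ready": False}, {"ready": True}]
assert eventually(lambda s: s["ready"], trace)
assert not forever(lambda s: s["ready"], trace)
assert until(lambda s: not s["ready"], lambda s: s["ready"], trace)
```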


At the other extreme, a well-established form of elementary static analysis is type checking, which benefits programs written in a statically typed programming language. Type checking, usually performed by the compiler rather than by a separate tool, ascertains the type consistency of assignments, routine calls and expressions, and rejects any program that contains a type incompatibility.

More generally, techniques usually characterized as static analysis lie somewhere between such basic compiler checks and full program proofs. Violations that can typically be detected by static analysis include:

• Variables that, on some control paths, would be accessed before being initialized (in languages such as C that do not guarantee initialization).

• Improper array and string access (buffer overflow).

• Memory properties: attempts to access a freed location, double freeing, memory leaks.

• Pointer management (again in low-level languages such as C): attempts to follow void or otherwise invalid pointers.

• Concurrency control: deadlocks, data races.

• Miscellaneous: certain cases of arithmetic overflow or underflow, changes to supposedly constant strings.

Static analysis tools such as PREfix [72] have been regularly applied for several years to new versions of the Windows code base and have avoided many potential errors.

One of the issues of static analysis is the occurrence of false alarms: inconsistency reports that, on inspection, do not reveal any actual error. This was the weak point of older static analyzers, such as the widely known Lint tool which complements the type checking of C compilers: for a large program they can easily swamp their users under thousands of messages, most of them spurious, but requiring a manual walkthrough to sort out the good from the bad. (In the search for errors, of course, the good is what otherwise would be considered the bad: evidence of wrongdoing.) Progress in static analysis has been successful in considerably reducing the occurrence of false alarms.

The popularity of static analysis is growing; the current trend is to extend the reach of static analysis tools ever further towards program proofs. Two examples are:


• Techniques of abstract interpretation [18] with the supporting ASTRÉE tool [9], which has been used to prove the absence of run-time errors in the primary flight control software, written in C, for the Airbus A340 fly-by-wire system.

• ESC-Java [21] and, more recently, the Boogie analyzer [4] make program proving less obtrusive by incrementally extending the kind of diagnostics with which programmers are familiar, for example type errors, to more advanced checks such as the impossibility to guarantee that an invariant is preserved.
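The flavor of such checks can be conveyed with a toy analyzer. The sketch below (Python, with invented names, and deliberately ignoring control flow, which real tools must track across every path) flags variables that are read before any assignment in a straight-line script:

```python
import ast

def read_before_write(source: str) -> list[str]:
    """Toy static analysis: flag variables read before being assigned
    in a straight-line script. A real analyzer would examine every
    control path; this sketch only scans statements in textual order."""
    assigned: set[str] = set()
    flagged: list[str] = []
    for stmt in ast.parse(source).body:
        # Collect names read and names written in this statement.
        loads = [n.id for n in ast.walk(stmt)
                 if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)]
        stores = [n.id for n in ast.walk(stmt)
                  if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)]
        # A read counts as suspect only if nothing has assigned the name yet.
        for name in loads:
            if name not in assigned and name not in flagged:
                flagged.append(name)
        assigned.update(stores)
    return flagged

# 'y' is read on the right-hand side before its own assignment.
print(read_before_write("x = 1\nz = x + y\ny = 2"))  # ['y']
```

Even this caricature shows why false alarms arise: without path analysis, a read guarded by a runtime test would be flagged just the same.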

    Model checking

The model checking approach to verification [36] [17] [3] is static, like proofs and static analysis, but provides a natural link to the dynamic techniques (testing) studied below. The inherent limitation of tests is that they can never be exhaustive; for any significant system (in fact, even for toy examples) the number of possible cases skyrockets into the combinatorial stratosphere, where the orders of magnitude invite lyrical comparisons with the number of particles in the universe.

The useful measure is the number of possible states of a program. The notion of state was implicit in the earlier discussion of assertions. A state is simply a snapshot of the program execution, as could be observed, if we stop that execution, by looking up the contents of the program's memory, or more realistically by using the debugger to examine the values of the program's variables. Indeed it is the combination of all the variables' values that determines the state. With every 64-bit integer variable potentially having 2^64 values, it is not surprising that the estimates quickly go galactic.

Model checking attempts exhaustive analysis of program states anyway by performing predicate abstraction. The idea is to simplify the program by replacing all expressions by boolean expressions (predicates), with only two possible values, so that the size of the state space decreases dramatically; it will still be large, but the power of modern computers, together with smart algorithms, can make its exploration tractable. Then to determine that a desired property holds, for example a security property such as the absence of buffer overflows, or a timing property such as the absence of deadlock, it suffices to evaluate the corresponding assertion in all of the abstract states and, if a violation of that assertion (or counter-example) is found, to check that it also arises in the original program.
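To make the notions of state space and exhaustive exploration concrete, here is a minimal sketch (Python; the toy transition system and function names are invented for illustration) that enumerates every reachable state of a small system and evaluates an assertion in each one, exactly the brute-force scheme that abstraction aims to make tractable for real programs:

```python
from collections import deque

def reachable_states(initial, successors):
    """Breadth-first enumeration of every state reachable from `initial`.
    `successors(state)` returns the states one transition away."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Toy system: two processes, each cycling through phases 0, 1, 2;
# one transition advances either process by one phase.
def step(state):
    a, b = state
    return [((a + 1) % 3, b), (a, (b + 1) % 3)]

states = reachable_states((0, 0), step)
print(len(states))  # 9: all nine (a, b) combinations are reachable
# The safety property is checked by evaluating an assertion in every state.
assert all(0 <= a <= 2 and 0 <= b <= 2 for (a, b) in states)
```

With two 64-bit variables instead of two three-valued phases, the same loop would face 2^128 states, which is why abstraction is indispensable.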


For example, predicate abstraction will reduce a conditional instruction if a > b then ... to if p then ..., where p is a boolean. This immediately cuts down the number of cases from 2^128 to 2. The drawback is that the resulting program is only a caricature of the original; it loses the relation of p to other predicates involving a and b. But it has an interesting property: if the original violates the assertion, then the abstracted version also does. So the next task is to look for any such violation in the abstracted version. This may be possible through exhaustive examination of its reduced state space, and if so is guaranteed to find any violation in the original program, but even so is not the end of the story, since the reverse proposition does not hold: a counter-example in the abstracted program does not necessarily signal a counter-example in the original. It could result from the artificial merging of several cases, for example if it occurs on a path, impossible in an execution of the original program, obtained by selecting both p and q as true, where q is the abstraction of b > a + 1. Then examining the state space of the abstracted program will either:

• Not find any violations, in which case it proves there was none in the original program.

• Report violations, each of which might be an error in the original or simply a false alarm generated by the abstraction process.

So the remaining task, if counter-examples have been found, is to ascertain whether they arise in the original. This involves defining the path predicate that leads to each counter-example, expressing it in terms of the original program variables (that is to say, removing the predicate abstraction, giving, in the example, a > b and b > a + 1) and determining if any combination of values for the program variables can satisfy the predicate: if such a combination, or variable assignment, exists, then the counter-example is a real one; if not, as in the case given, it is spurious.

This problem of predicate satisfiability is computationally hard; finding efficient algorithms is one of the central areas of research in model checking.

The focus on counter-examples gives model checking a practical advantage over traditional proof techniques. Unless a software element was built with verification in mind (through a constructive method as defined above), the first attempt to verify it will often fail. With proofs, this failure doesn't tell us the source of the problem and could actually signal a limitation of the proof procedure rather than an error in the program. With model checking, you get a counter-example which directly shows what's wrong.

Model checking has captured considerable attention in recent years, first in hardware design and then in reactive and real-time systems, for which the assertions of interest are often expressed in temporal logic.
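The final satisfiability check on the example's path predicate can be sketched as follows (Python; real model checkers delegate this step to SAT/SMT solvers rather than enumerating values, and the small search range is an assumption of the sketch):

```python
from itertools import product

def satisfiable(path_predicate, lo=-8, hi=8):
    """Decide whether some assignment to (a, b) in a small integer range
    satisfies the path predicate recovered from an abstract
    counter-example. Brute force stands in for a SAT/SMT solver here."""
    return any(path_predicate(a, b)
               for a, b in product(range(lo, hi + 1), repeat=2))

# p abstracts a > b, q abstracts b > a + 1 (the example from the text).
# A counter-example that selects both p and q as true is spurious:
print(satisfiable(lambda a, b: a > b and b > a + 1))  # False: no assignment
# whereas a path requiring only p corresponds to real concrete states:
print(satisfiable(lambda a, b: a > b))                # True
```

The first predicate is unsatisfiable because a > b and b > a + 1 together would require b > b + 1, so the abstract counter-example has no concrete witness.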


    8 DESIGN BY CONTRACT

The goal of developing software to support full proofs of correctness properties is, as noted, desirable but still unrealistic for most projects. Even a short brush with program proving methods suggests, however, that more rigor can be highly beneficial to software quality. The techniques of Design by Contract go in this direction and deliver part of the corresponding benefits without requiring the full formality of proof-directed development.

The discussion of proofs introduced Eiffel notations such as

  require assertion    -- A routine precondition
  ensure assertion     -- A routine postcondition

associated with individual routines. They are examples of contract elements, which specify abstract semantic properties of program constructs. Contracts apply in particular to:

• Individual routines: precondition, stating the condition under which a routine is applicable; postcondition, stating what condition it will guarantee in return when it terminates.

• In object-oriented programming, classes: class invariant, stating consistency conditions that must hold whenever an object is in a stable state. For example, the invariant for a paragraph class in a text processing system may state that the total length of letters and spaces is equal to the paragraph width. Every routine that can modify an instance of the class may assume the class invariant on entry (in addition to its precondition) and must restore it on exit (in addition to ensuring its postcondition).

• Loops: invariant and (integer) variant as discussed above.

• Individual instructions: assert or check constructs.

The discipline of Design by Contract [53] [57] [67] gives a central role to these mechanisms in software development. It views the overall process of building a system as defining a multitude of relationships between client and supplier modules, each specified through a contract in the same manner as relationships between companies in the commercial world.

The benefits of such a method, if carried out systematically, extend throughout the lifecycle, supporting the goal of seamlessness discussed earlier:

• Contracts can be used to express requirements and specifications in a precise yet understandable way, preferable to pure bubbles-and-arrows notations, although of course they can be displayed graphically too.


• The method is also a powerful guide to design and implementation, helping developers to understand better the precise reason and context for every module they produce, and as a consequence to get the module right.

• Contracts serve as a documentation mechanism: the contract view of a class, which discards implementation-dependent elements but retains externally relevant elements, in particular preconditions, postconditions and class invariants, often provides just the right form of documentation for software elements, especially reusable components: precise enough thanks to the contracts; abstract enough thanks to the removal of implementation properties; extracted from the program text, and hence having a better chance of being up to date (at least one major software disaster was traced [41] to a software element whose specification had changed, unbeknownst to the developers who reused it); cheap to produce, since this form of documentation can be generated by tools from the source text, rather than written separately; and multi-purpose, since the output can be tuned to any appropriate format such as HTML. Eiffel environments such as EiffelStudio produce such views [30], which serve as the basic form of software documentation.

• Contracts are also useful for managers to understand the software at a high level of abstraction, and as a tool to control maintenance.

• In object-oriented programming, contracts provide a framework for the proper use of inheritance, by allowing developers to specify the semantic framework within which routines may be further refined in descendant classes. This is connected with the preceding comment about management, since a consequence is to allow a manager to check that refinements to a design are consistent with its original intent, which may have been defined by the top designers in the organization and expressed in the form of contracts.

• Most visibly, contracts are a testing and debugging mechanism. Since an execution that violates an assertion always signals a bug, turning on contract monitoring during development provides a remarkable technique for identifying bugs. This idea is pursued further by some of the tools cited in the discussion of testing below.

Design by Contract mechanisms are integrated in the design of the Eiffel language [52] [28] and a key part of the practice of the associated method. Dozens of contract extensions have been proposed for other programming languages (as well as UML [80]), including many designs such as JML [43] for Java and the Spec# extension of C# [5].
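The paragraph-invariant example above can be rendered in executable form. The following sketch emulates Eiffel's require, ensure and class invariant with Python assertions; the class and method names are illustrative, not taken from any cited library:

```python
class Paragraph:
    """Class invariant from the example in the text: the stored text,
    padded with spaces, must always fill `width` exactly."""

    def __init__(self, width: int):
        assert width > 0                      # require: precondition
        self.width = width
        self.text = " " * width
        self._invariant()

    def _invariant(self):
        # Invariant: total length of letters and spaces equals the width.
        assert len(self.text) == self.width

    def put(self, word: str):
        assert len(word) <= self.width        # require: precondition
        self._invariant()                     # invariant assumed on entry
        self.text = word.ljust(self.width)    # pad with spaces to the width
        assert self.text.startswith(word)     # ensure: postcondition
        self._invariant()                     # invariant restored on exit

p = Paragraph(10)
p.put("hello")
print(repr(p.text))  # 'hello     '
```

With such monitoring turned on during development, every assertion violation is an immediate bug report, which is the behavior the testing tools discussed below exploit.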


    9 TESTING

Testing [70] [8] is the most widely used form of program verification, and still for many teams essentially the only one. In academic circles testing has long suffered from a famous comment [23] that (because of the astronomical number of possible states) testing can show the presence of bugs, but never their absence. In retrospect it's hard to find a rational explanation for why this comment ever deterred anyone from appreciating the importance of tests, since it in no way disproves the usefulness of testing: finding bugs is a very important task of software development. All it indicates is that we should understand that finding bugs is indeed the sole purpose of testing, and not delude ourselves that test results directly reflect the level of quality of a product under development.

Components of a test

Successful testing relies on a test plan: a strategy, expressed in a document, describing choices for the tasks of the testing process. These tasks include:

• Determining which parts to test.

• Finding the appropriate input values to exercise.

• Determining the expected properties of the results (known as oracles). Input values and the associated oracles together make up test cases, the collection of which constitutes a test suite.

• Instrumenting the software to run the tests (rather than perform its normal operation, or in addition to it); this is known as building a test harness, which may involve test drivers to solicit specific parts to be tested, and stubs to stand for parts of the system that will not be tested but need a placeholder when other parts call them.

• Running the software on the selected inputs.

• Comparing the outputs and behavior to the oracles.

• Recording the test data (test cases, oracles, outputs) for future re-testing of the system, in particular regression testing, the task of verifying that previously corrected errors have not reappeared.

In addition there will be a phase of correction of the errors uncovered by the tests, but in line with the above observations this is not part of testing in the strict sense.
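The tasks above fit together as in this minimal harness sketch (Python; the unit under test and the oracles are invented for illustration). Each test case pairs input values with an oracle; a crash is recorded rather than propagated, and the resulting record can feed later regression runs:

```python
def run_suite(unit_under_test, test_suite):
    """Minimal harness: run each (inputs, oracle) test case, compare the
    behavior to the oracle, and record outcomes for regression testing."""
    record = []
    for inputs, oracle in test_suite:
        try:
            result = unit_under_test(*inputs)
            verdict = "pass" if oracle(result) else "fail"
        except Exception as exc:   # a crashing case is recorded, not fatal
            result, verdict = exc, "crash"
        record.append((inputs, result, verdict))
    return record

def integer_sqrt(n):              # unit under test, deliberately fragile
    return int(n ** 0.5)

suite = [
    ((9,),  lambda r: r == 3),                       # exact-value oracle
    ((10,), lambda r: r * r <= 10 < (r + 1) ** 2),   # property oracle
    ((-1,), lambda r: isinstance(r, int)),           # crashes: complex root
]
print([v for _, _, v in run_suite(integer_sqrt, suite)])
# ['pass', 'pass', 'crash']
```

Isolating each case in its own try block is a small-scale version of the multi-process discipline needed for industrial regression testing, where one crashing test must not take the whole run down.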


    Kinds of test

One may classify tests with respect to their scope (this was used in the earlier description of the V model of the lifecycle):

• A unit test covers a module of the software.

• An integration test covers a complete cluster or subsystem.

• A system test covers the complete delivery.

• User Acceptance Testing involves the participation of the recipients of the system (in addition to the developers, responsible for the preceding variants) to determine whether they are satisfied with the delivery.

• Business Confidence Testing is further testing with the users, in conditions as close as possible to the real operating environment.

An orthogonal classification addresses what is being tested:

• Functional testing: whether the system fulfills the functions defined in the specification.

• Performance testing: its use of resources.

• Stress testing: its behavior under extreme conditions, such as heavy user load.

Yet another dimension is intent: testing can be fault-directed, to find deficiencies, but also (despite the above warnings) conformance-directed, to estimate satisfaction of desired properties, or acceptance testing, for users to decide whether to approve the product. Regression testing, as noted, re-runs tests corresponding to previously identified errors; surprisingly to the layman, errors have a knack for surging back into the software, sometimes repeatedly, long after they were thought corrected.

The testing technique, in particular the construction of test suites, can be:

• Black-box: based on knowledge of the system's specification only.

• White-box: based on knowledge of the code, which makes it possible for example to try to exercise as much of that code as possible.

Observing the state of the art in software testing suggests that four issues are critical: managing the test process; estimating the quality of test suites; devising oracles; and the toughest: generating test cases automatically.


Managing the testing process

Test management has been made easier through the appearance of testing frameworks such as JUnit [42] and Gobo Eiffel Test [7], which record test harnesses to allow running the tests automatically. This removes a considerable part of the burden of testing and is important for regression testing.

An example of a framework for regression testing of a compiler, incorporating every bug ever found since 1991, is EiffelWeasel [29]. Such automated testing requires a solid multi-process infrastructure, to ensure for example that if a test run causes a crash, the testing process doesn't also crash but records the problem and moves on to the next test.

Estimating test quality

Being able to estimate the quality of a test suite is essential in particular to know when to stop testing. The techniques are different for white-box and black-box testing.

With white-box testing it is possible to define various levels of coverage, each assuming the preceding ones: instruction coverage, ensuring that through the execution of the selected test cases every instruction is executed at least once; branch coverage, where every boolean condition tests at least once to true and once to false; condition coverage, where this is also the case for boolean sub-expressions; path coverage, for which every path has been taken; loop coverage, where each loop body has been executed at least n times for a set n.

Another technique for measuring test suite quality in white-box approaches is mutation testing [79]. Starting with a program that passes its test suite, this consists of making modifications similar, if possible, to the kind of errors that programmers would make to the program, and running the tests again. If a mutant program still passes the tests, this indicates (once you have made sure the mutant is not equivalent to the original, in other words, the changes are meaningful) that the tests were not sufficient. Mutation testing is an active area of research [71]; one of the challenges is to use appropriate mutation operators, to ensure diversity of the mutants.

With black-box testing the previous techniques are not available since they assume access to the source code to set up the test plan. It is possible to define notions of specification coverage to estimate whether the tests have exercised the various cases listed in the specification; if contracts are present, this will mean analyzing the various cases listed in the preconditions. Partition testing [81] is the general name for techniques (black- or white-box) that split the input domain into representative subsets, with the implication that any test suite must cover all the subsets.
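Mutation testing can be sketched in a few lines (Python; the unit, suite and mutants are invented). Note that the two surviving mutants below are in fact equivalent to the original on integers, illustrating why survivors must be inspected before blaming the test suite:

```python
def passes_suite(f):
    """A tiny test suite for an absolute-value function."""
    return f(3) == 3 and f(0) == 0 and f(-4) == 4

def original(x):
    return x if x >= 0 else -x

# Mutants: small, programmer-style slips applied to the original.
mutants = [
    lambda x: x if x > 0 else -x,    # '>=' mutated to '>'  (equivalent!)
    lambda x: x if x >= 0 else x,    # negation dropped
    lambda x: x if x >= 1 else -x,   # boundary shifted     (equivalent!)
]

assert passes_suite(original)        # the original passes its own suite
killed = [not passes_suite(m) for m in mutants]
print(sum(killed), "of", len(mutants), "mutants killed")  # 1 of 3
```

Only the dropped negation is caught; the other two mutants compute the same function on every integer (at x = 0, negating zero changes nothing), so their survival says nothing against the suite. Choosing mutation operators that avoid such equivalent mutants is exactly the research challenge mentioned above.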


Defining oracles

An oracle, allowing interpretation of testing results, provides a decision criterion for accepting or rejecting the result of a test. The preparation of oracles can be as much work as the rest of the test plan. The best solution that can be recommended is to rely on contracts: any functional property of a software system (with the possible exception of some user-interface properties for which human assessment may be required) can be expressed as a routine postcondition or a class invariant.

These assertions can be included in the test harness, but it is of course best, as noted in the discussion of Design by Contract, to make them an integral part of the software to be tested as it is developed; they will then provide the other benefits cited, such as aid to design and built-in documentation, and will facilitate regression testing.

Test case generation

The last of the four critical issues listed, test case generation, is probably the toughest; automatic generation in particular. Even though we can't ever get close to exhaustive testing, we want the test process to cover as many cases as possible, and especially to make sure they are representative of the various potential program executions, as can be assessed in white-box testing by coverage measures and mutation, but needs to be sought in any form of testing.

For any realistic program, manually prepared tests will never cover enough cases; in addition, they are tedious to prepare. Hence the work on automatic test case generation, which tries to produce as many representative test cases as possible, typically working from specifications only (black-box). Two tools in this area are Korat for JML [13] and AutoTest for Eiffel [15] (which draws on the advantage that, contracts being native to Eiffel, existing Eiffel software is typically equipped with large numbers of assertions, so that AutoTest can be run on software as is, and indeed has already uncovered a significant number of problems in existing programs and libraries).

Manual tests, which benefit from human insight, remain indispensable. The two kinds are complementary: manual tests are good at depth, automatically generated tests at breadth. In particular, any run that ever uncovered a bug, whether through manual or automatic techniques, should become part of the regression test suite. AutoTest integrates manual tests and regression tests within the automatic test case generation and execution framework [44].

Automatic test case generation needs a strategy for selecting inputs. Contrary to intuition, random testing [34], which selects test data randomly from the input domain, can be an effective strategy if tuned to ensure a reasonably even distribution over that domain, a policy known as adaptive random testing [14] which has so far been applied to integers and other simple values (for which a clear notion of distance exists, so that even distribution is immediately meaningful). Recent work [16] extends the idea to object-oriented programming by defining a notion of object distance.
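For integers, the adaptive policy can be sketched as follows (Python; the function names and the distance-based selection over a candidate pool are illustrative assumptions, and a contract-style postcondition serves as the oracle):

```python
import random

def adaptive_random_inputs(n, lo=-1000, hi=1000, pool_size=10, seed=0):
    """Adaptive random testing for integers: from each pool of random
    candidates, keep the one farthest (by absolute distance) from all
    inputs already used, spreading tests evenly over the domain."""
    rng = random.Random(seed)
    used = [rng.randint(lo, hi)]
    while len(used) < n:
        pool = [rng.randint(lo, hi) for _ in range(pool_size)]
        # Farthest-from-used candidate wins, evening out the distribution.
        best = max(pool, key=lambda c: min(abs(c - u) for u in used))
        used.append(best)
    return used

def integer_square(x):            # unit under test
    return x * x

# The postcondition acts as the oracle for every generated input.
inputs = adaptive_random_inputs(20)
for x in inputs:
    assert integer_square(x) >= 0
print(len(inputs), "inputs generated and checked")
```

The `min(abs(c - u) ...)` term is the notion of distance that makes "even distribution" meaningful for integers; extending the scheme to objects requires defining a comparable distance between object structures.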


    10 CONCLUSION

This survey has taken a broad sweep across many techniques that all have something to contribute to the aim of software reliability. While it has stayed away from the gloomy picture of the state of the industry which seems to be de rigueur in discussions of this topic, and is not justified given the considerable amount of quality-enhancing ideas, techniques and tools that are available today and the considerable amount of good work currently in progress, it cannot fail to note as a conclusion that the industry could do much more to take advantage of all these efforts and results.

There is not enough of a reliability culture in the software world; too often, the order of concerns is cost, then deadlines, then quality. It is time to reassess priorities.

Acknowledgments

The material in this chapter derives in part from the slides for an ETH industry course on Testing and Software Quality Assurance prepared with the help of Ilinca Ciupa, Andreas Leitner and Bernd Schoeller. The discussion of CMMI benefited from the work of Peter Kolb in the preparation of another ETH course, Software Engineering for Outsourced and Offshored Development. Bernd Schoeller and Ilinca Ciupa provided important comments on the draft.

Design by Contract is a trademark of Eiffel Software.

The context for this survey was provided by the Hasler Foundation's grant for our SCOOP work in the DICS project. We are very grateful for the opportunities that the grant and the project have provided, in particular for the experience gained in the two DICS workshops in 2004 and 2005.

REFERENCES

Note: All URLs listed were active in April 2006.

[1] Algirdas Avizienis, Jean-Claude Laprie and Brian Randell: Fundamental Concepts of Dependability, in Proceedings of Third Information Survivability Workshop, October 2000, pages 7-12, available among other places at citeseer.ist.psu.edu/article/avizienis01fundamental.html.

[2] Ralph Back: A Calculus of Refinements for Program Derivations, in Acta Informatica, vol. 25, 1988, pages 593-624, available at crest.cs.abo.fi/publications/public/1988/ACalculusOfRefinementsForProgramDerivationsA.pdf.


[3] Thomas Ball and Sriram K. Rajamani: Automatically Validating Temporal Safety Properties of Interfaces, in SPIN 2001, Proceedings of Workshop on Model Checking of Software, Lecture Notes in Computer Science 2057, Springer-Verlag, May 2001, pages 103-122, available at tinyurl.com/qrm9m.

[4] Mike Barnett, Robert DeLine, Manuel Fähndrich, K. Rustan M. Leino and Wolfram Schulte: Verification of Object-Oriented Programs with Invariants, in Journal of Object Technology, vol. 3, no. 6, Special issue: ECOOP 2003 workshop on Formal Techniques for Java-like Programs, June 2004, pages 27-56, available at www.jot.fm/issues/issue_2004_06/article2.

[5] Mike Barnett, K. Rustan M. Leino and Wolfram Schulte: The Spec# Programming System: An Overview, in CASSIS 2004: Construction and Analysis of Safe, Secure, Interoperable Smart Devices, Lecture Notes in Computer Science 3362, Springer-Verlag, 2004, available at research.microsoft.com/specsharp/papers/krml136.pdf; see also other Spec# papers at research.microsoft.com/specsharp/.

[6] Kent Beck and Cynthia Andres: Extreme Programming Explained: Embrace Change, 2nd edition, Addison-Wesley, 2004.

[7] Éric Bezault: Gobo Eiffel Test, online documentation at www.gobosoft.com/eiffel/gobo/getest/index.html.

[8] Robert Binder: Testing Object-Oriented Systems: Models, Patterns, and Tools, Addison-Wesley, 1999.

[9] Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux and Xavier Rival: ASTRÉE: A Static Analyzer for Large Safety-Critical Software, in Applied Deductive Verification, Dagstuhl Seminar 3451, November 2003, available at www.di.ens.fr/~cousot/COUSOTtalks/Dagstuhl-3451-2003.shtml. See also the ASTRÉE page at www.astree.ens.fr.

[10] Barry W. Boehm: Software Engineering Economics, Prentice Hall, 1981.

[11] Barry W. Boehm: A Spiral Model of Software Development and Enhancement, in Computer (IEEE), vol. 21, no. 5, May 1988, pages 61-72.

[12] Barry W. Boehm et al.: Software Cost Estimation with COCOMO II, Prentice Hall, 2000.

[13] Chandrasekhar Boyapati, Sarfraz Khurshid and Darko Marinov: Korat: Automated Testing Based on Java Predicates, in Proceedings of the 2002 International Symposium on Software Testing and Analysis (ISSTA), Rome, July 22-24, 2002, available at tinyurl.com/qwwd3.


[14] T.Y. Chen, H. Leung and I.K. Mak: Adaptive Random Testing, in Advances in Computer Science - ASIAN 2004: Higher-Level Decision Making, 9th Asian Computing Science Conference, ed. Michael J. Maher, Lecture Notes in Computer Science 3321, Springer-Verlag, 2004, available at tinyurl.com/lpxn5.

[15] Ilinca Ciupa and Andreas Leitner: Automated Testing Based on Design by Contract, in Proceedings of Net.ObjectDays 2005, 6th Annual Conference on Object-Oriented and Internet-Based Technologies, Concepts and Applications for a Networked World, 2005, pages 545-557, available at se.ethz.ch/people/ciupa/papers/soqua05.pdf. See also the AutoTest page at se.ethz.ch/research/autotest.

[16] Ilinca Ciupa, Andreas Leitner et al.: Object Distance and its Application to Adaptive Random Testing of Object-Oriented Programs, submitted for publication, available at …publications/testing/object_distan…

[17] Edmund M. Clarke Jr., Orna Grumberg and Doron Peled: Model Checking, MIT Press, 1999.

[18] Patrick Cousot: Verification by Abstract Interpretation, in International Symposium on Verification Theory and Practice Honoring Zohar Manna's 64th Birthday, ed. Nachum Dershowitz, Lecture Notes in Computer Science 2772, Springer-Verlag, 2003, pages 243-

[19] Mic