
Chapter 1: Overview

Oct 19, 2021


dariahiddleston

SENG 521 Software Reliability & Software Quality

Chapter 1: Overview

Department of Electrical & Computer Engineering, University of Calgary

B.H. Far ([email protected])

[email protected] 1

http://www.enel.ucalgary.ca/People/far/Lectures/SENG521/

Contents
Shorter version: How to avoid these?

Contents
Longer version:
– What is this course about?
– What factors affect software quality?
– What is software reliability?
– What is software reliability engineering?
– What is the software reliability engineering process?

What Affects Quality?

What Affects Software Quality?
Time: Meeting the project deadline. Reaching the market at the right time.
Cost: Meeting the anticipated project costs.
Quality (reliability): Working fine for the designated period on the designated system.
[Figure: Quality, Cost and Time, together with People and Technology]

Terminology & Scope
Dependability: the ability of a system to deliver service that can justifiably be trusted.
Threats: Failures, Faults, Errors
Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
Means: Fault prevention, Fault tolerance, Fault removal, Fault forecasting
Models: Reliability Block Diagram, Fault Tree model, Reliability Graph


Software Reliability
Concept: Reliability, Availability, Failure rate, MTTF, Failure density
Model: Single Failure Model, Multiple Failure Model, Reliability Growth Model, etc.
Process: SRE Process

Software Quality
Standards & Models: ISO 9126
Assessment techniques: Pre-release, Post-release
Size & Effort: Predict size, Predict effort, Predict time

At The End …
– What is software quality? What affects software quality?
– What is software reliability engineering (SRE)? Why is SRE important? How does it affect software quality?
– What are the main factors that affect the reliability of software?
– How can one determine what the size of the software will be?
– How can one determine how much it will cost to develop the software?
– How can one determine how often the software will fail?
– How can one determine the current quality of the software under development?
– How can one determine whether the software is reliable enough to be released?
– Can SRE methodology be applied to current ways of software development, such as object-oriented, component-based and agile development?
– What are the challenges and difficulties of applying SRE?

Question to Ask: Do I really need to take this course?
The answer depends on you! Take this course if you want to avoid these in your career as a software designer, tester and quality controller:

[Figure: Bug Fix]

Moral
– Industry-oriented course
– Novelty of problems; multiplicity of solutions
– Attend lectures: there is more to the lectures than to the notes
– Sit close-up: some of the material is hard to see from the back
– Feel free to ask questions.

Section 1

From Software Quality to Software Reliability Engineering

What is Quality?
Quality, popular view:
– Something “good” but not quantifiable
– Something luxurious and classy
Quality, professional view:
– Conformance to requirements (Crosby, 1979)
– Fitness for use (Juran, 1970)

Quality: Various Views
– Aesthetic View
– Developer View
– Customer View

What is Software Quality?
Conformance to requirements:
– The requirements are clearly stated and the product must conform to them.
– Any deviation from the requirements is regarded as a defect.
– A good quality product contains fewer defects.
Fitness for use:
– Fit to user expectations: meet the user’s needs.
– A good quality product provides better user satisfaction.
Both: a dependable computing system.
[Figure: What is specified? / What does the SW do? / What does the user need?]

Definition: Software Quality
ISO 8402 definition of QUALITY: “The totality of features and characteristics of a product or a service that bear on its ability to satisfy stated or implied needs.”
Reliability and Maintainability are two major components of Quality.

Quality Model: ISO 9126
Characteristics: Attributes
1. Functionality: Suitability, Interoperability, Accuracy, Compliance, Security
2. Reliability: Maturity, Recoverability, Fault tolerance, Crash frequency
3. Usability: Understandability, Learnability, Operability
4. Efficiency: Time behaviour, Resource behaviour
5. Maintainability: Analyzability, Stability, Changeability, Testability
6. Portability: Adaptability, Installability, Conformance, Replaceability

Quality Model – Structure
[Figure: SW Quality is refined into Quality Factors 1…n (user oriented); each factor is refined into Quality Criteria 1…m (software oriented), which are assessed through Measures]

Example: Attribute Expansion
Design by measurable objectives: incremental design is evaluated to check whether the goal for each increment was achieved.
Quality objective: Availability
– Measure: % of planned system uptime (Worst: 95%, Best: 99%)
Quality objective: User friendliness
– Measure: Days on the job to learn a task supplied by the new system (Worst: 7 days, Best: 1 day)
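The per-increment check can be sketched in Python. This is an illustration only, not course material: the `Objective` class and its `evaluate` method are hypothetical names, and reading the Worst/Best bounds as a three-way verdict (meets best, acceptable, fails worst) is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Objective:
    """A measurable quality objective with worst-acceptable and best (target) levels."""
    name: str
    worst: float
    best: float

    def evaluate(self, measured: float) -> str:
        # Direction is inferred from the bounds: uptime % is higher-is-better,
        # days-to-learn is lower-is-better.
        higher_is_better = self.best > self.worst

        def at_least(value: float, bound: float) -> bool:
            return value >= bound if higher_is_better else value <= bound

        if at_least(measured, self.best):
            return "meets best"
        if at_least(measured, self.worst):
            return "acceptable"
        return "fails worst"

availability = Objective("% of planned system uptime", worst=95.0, best=99.0)
learnability = Objective("days on job to learn task", worst=7.0, best=1.0)

# Hypothetical measurements for one increment
print(availability.evaluate(97.5))  # acceptable
print(learnability.evaluate(9.0))   # fails worst
```

Inferring the direction from the bounds lets one class serve both example objectives without a separate flag.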

Quality vs. Project Costs
[Figure: cost distribution for a typical software project: Product Design, Programming, Integration and test]
What is wrong with this picture?

Total Cost Distribution
Maintenance is responsible for more than 60% of the total cost of a typical software project.
[Figure: total cost distribution: Product Design, Programming, Integration and test, Maintenance]
Questions:
1) How to build quality into a system?
2) How to assess the quality of a system?
Developing better quality systems will contribute to lowering maintenance costs.

1) How to Build Quality into a System?
Developing better quality systems requires:
– Establishing Quality Assurance (QA) programs
– Establishing a Software Reliability Engineering (SRE) process

2) How to Assess Quality of a System?
Relevant to both pre-release and post-release:
– Pre-release: SRE, certification, standards (ISO 9001)
– Post-release: evaluation, validation, RAM
[Figure: Quality Assessment]

How Do We Assess Quality?
– Ad-hoc (trial and error) approach!
– Systematic approach

Pre-release Quality
Methods:
– Software inspection and testing
– SRE
– Certification
– Standards: ISO 9001, 9126, 25000
Facts:
• About 20% of software projects are canceled (missed schedules, etc.).
• About 84% of software projects are incomplete when released (they need patches, etc.).
• The costs of almost all software projects exceed the initial estimates (cost overrun).

Fatal Software Examples
Fatal software-related incidents [Gage & McCormick 2004]:
Date | Casualties | Detail
2003 | 3 | Software failure contributes to power outage across North-eastern U.S. and Canada.
2001 | 5 | Panamanian cancer patients die following overdoses of radiation, determined by the use of faulty software.
2000 | 4 | Crash of Marine Corps Osprey tilt-rotor aircraft, partially blamed on software anomaly.
1997 | 225 | Radar that could have prevented Korean jet crash hobbled by software problem.
1995 | 159 | American Airlines jet, descending into Cali, Colombia, crashes into a mountain. A cause was that the software presented insufficient and conflicting information to the pilots, who got lost.
1991 | 28 | Software problem prevents Patriot missile battery from picking up SCUD missile, which hits US Army barracks in Saudi Arabia.

Cost of a Defect …
[Figure: fault origin, fault detection and cost per fault across the phases Requirements, Design, Coding, Functional Test, System Test, Field Use]
Fault origin: 50%, 40% and 10% of faults, introduced across the Requirements, Design and Coding phases.
Fault detection, from Requirements to Field Use: 3%, 5%, 7%, 25%, 50%, 10%
Cost per fault, from Requirements to Field Use: 1 KDM, 1 KDM, 1 KDM, 6 KDM, 12 KDM, 20 KDM
1 KDM = 1,000 Deutsche Marks. Source: CMU Software Engineering Institute.
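A rough expected cost per fault comes from weighting each phase's cost by the share of faults detected in that phase. This sketch assumes that the detection shares (3%, 5%, 7%, 25%, 50%, 10%) and the costs per fault (1, 1, 1, 6, 12, 20 KDM) line up phase by phase from Requirements to Field Use, as read from the figure. The variable names are illustrative.

```python
# Phase-by-phase values as read from the "Cost of a Defect" figure (assumed alignment)
phases = ["Requirements", "Design", "Coding", "Functional Test", "System Test", "Field Use"]
detected_share = [0.03, 0.05, 0.07, 0.25, 0.50, 0.10]   # fraction of faults found in each phase
cost_per_fault_kdm = [1, 1, 1, 6, 12, 20]                # cost to fix one fault found there

# Expected cost of one fault = sum over phases of P(found in phase) * cost in that phase
expected_kdm = sum(p * c for p, c in zip(detected_share, cost_per_fault_kdm))
print(f"Expected cost per fault: {expected_kdm:.2f} KDM")   # 9.65 KDM

# Portion of that expected cost incurred in the test phases and in the field
late_phases = ("Functional Test", "System Test", "Field Use")
late_kdm = sum(p * c for ph, p, c in zip(phases, detected_share, cost_per_fault_kdm)
               if ph in late_phases)
print(f"Late-phase share of cost: {late_kdm / expected_kdm:.0%}")   # 98%
```

Under this reading, almost all of the expected fault-fixing cost falls on faults found after coding. This is the economic argument for detecting faults earlier.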

A Central Question
In spite of having many development methodologies, the central questions are:
1. Can we remove all bugs before release?
2. How often will the software fail?

Two Extremes
– Craftsman SE: fast, cheap, buggy
– Cleanroom SE: slow, expensive, zero defect
Is there a middle solution?
YES! Using the Software Reliability Engineering (SRE) process.
[Figure: Craftsman Software Development and Cleanroom Software Development as the two extremes]

Can We Remove All Bugs?
Size [function points] | Failure potential [development] | Failure removal rate | Failure density [at release]
1 | 1.85 | 95% | 0.09
10 | 2.45 | 92% | 0.20
100 | 3.68 | 90% | 0.37
1000 | 5.00 | 85% | 0.75
10000 | 7.60 | 78% | 1.67
100000 | 9.55 | 75% | 2.39
Average | 5.02 | 86% | 0.91
Defect potential and density are expressed in terms of defects per function point.
The answer is usually NO!
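The table's last column matches the relation density at release = defect potential × (1 − removal rate). In other words, whatever the removal activities miss gets delivered. A short Python check recomputes the density from the other two columns (values copied from the table) and scales it by size to estimate the delivered defect count. The function name `delivered_density` is illustrative.

```python
# (size in function points, defect potential per FP, removal rate) from the table
rows = [
    (1, 1.85, 0.95),
    (10, 2.45, 0.92),
    (100, 3.68, 0.90),
    (1000, 5.00, 0.85),
    (10000, 7.60, 0.78),
    (100000, 9.55, 0.75),
]

def delivered_density(potential: float, removal_rate: float) -> float:
    """Defects per FP still present at release: the fraction not removed."""
    return potential * (1.0 - removal_rate)

for size, potential, removal in rows:
    density = delivered_density(potential, removal)
    total = density * size   # expected number of delivered defects
    print(f"{size:>7} FP: {density:.2f} defects/FP, about {total:,.0f} defects at release")
```

Larger systems have both a higher defect potential and a lower removal rate, so the delivered density rises with size.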


What Can We Learn from Failures?
[Figure: Time Between Failure vs. ith Failure; x-axis: ith failure, y-axis: failure time (hours, 0 to 1000)]
Does this plot make any sense to you?

How to Handle Defects?
The table below gives the time between failures for a software system:
Failure no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Time since last failure (hours) | 6 | 4 | 8 | 5 | 6 | 9 | 11 | 14 | 16 | 19
What can we learn from this data?
– System reliability?
– Approximate number of bugs in the system?
– Approximate time to remove the remaining bugs?

What to Learn from Data?
The inverses of the inter-failure times are the failure intensity (= failures per unit of time) data points:
Failure no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Time since last failure (hours) | 6 | 4 | 8 | 5 | 6 | 9 | 11 | 14 | 16 | 19
Failure intensity (failures/hour) | 0.166 | 0.25 | 0.125 | 0.20 | 0.166 | 0.111 | 0.09 | 0.071 | 0.062 | 0.053
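The failure intensity row is the reciprocal of each inter-failure time. The Python sketch below recomputes that row. It also computes the cumulative test time at each failure, which serves as the horizontal axis when intensity is plotted against time rather than against failure number.

```python
from itertools import accumulate

# Inter-failure times in hours, copied from the table
inter_failure_hours = [6, 4, 8, 5, 6, 9, 11, 14, 16, 19]

# Failure intensity data point for each interval = 1 / (time since last failure)
intensity = [1.0 / t for t in inter_failure_hours]

# Cumulative operating time at which each failure occurred
cumulative = list(accumulate(inter_failure_hours))

for n, (lam, t) in enumerate(zip(intensity, cumulative), start=1):
    print(f"failure {n:2d}: intensity {lam:.3f}/h at t = {t} h")
```

Rounding to three decimals prints 0.167 for 1/6, where the table truncates it to 0.166. The last cumulative value is 98 hours of testing so far.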

What to Learn from Data?
Mean time to failure (MTTF), the average inter-failure time:
MTTF = (6+4+8+5+6+9+11+14+16+19)/10 = 9.8 hours
System reliability for 1 hour of operation, assuming a constant failure rate:
$$R(t) = e^{-t/\mathrm{MTTF}}, \qquad R(1) = e^{-1/9.8} \approx 0.90299$$
Fitting a straight line to the graph in (a) (failure intensity vs. failure number) would show an x-intercept of about 15. Using this as an estimate of the total number of original failures, we estimate that there are still five bugs in the software.
Fitting a straight line to the graph in (b) (failure intensity vs. cumulative time) would give an x-intercept near 160 hours. Since 98 hours of testing have elapsed, this means approximately 62 more hours of testing to remove all bugs.
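The slide's arithmetic can be reproduced in Python. The sketch assumes a constant failure rate (exponential model) for R(t). It takes the two x-intercepts (about 15 failures and about 160 hours) as given from the slide's fitted lines rather than refitting them, because a different line-fitting method would give different intercepts.

```python
import math

inter_failure_hours = [6, 4, 8, 5, 6, 9, 11, 14, 16, 19]

mttf = sum(inter_failure_hours) / len(inter_failure_hours)   # 9.8 hours

def reliability(t_hours: float, mttf_hours: float) -> float:
    """R(t) = exp(-t / MTTF); valid only under a constant failure rate."""
    return math.exp(-t_hours / mttf_hours)

print(f"MTTF = {mttf:.1f} h, R(1 h) = {reliability(1.0, mttf):.5f}")   # 9.8 h, 0.90299

# x-intercepts taken from the slide's straight-line fits (not recomputed here)
total_failures_estimate = 15     # intercept on the failure-number axis
total_test_time_estimate = 160   # intercept on the cumulative-time axis, hours

failures_so_far = len(inter_failure_hours)   # 10
time_so_far = sum(inter_failure_hours)       # 98 hours
print("remaining bugs ~", total_failures_estimate - failures_so_far)          # 5
print("additional test time ~", total_test_time_estimate - time_so_far, "h")  # 62
```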

A Typical Problem: Question
Failure intensity (failure rate) of a system is usually expressed using the FIT (Failure-In-Time) unit, which is 1 failure per 10^9 device hours.
The failure intensity of an electric pump system used for pumping crude oil in Northern Alberta’s oil field is constant at 10,000 FITs, and 100 such pumps are operational.
If, for continuous operation, all failed units are to be replaced immediately, what shall be the minimum inventory size of pumps for one year of operation?

A Typical Problem: Answer
Pump’s failure rate and Mean-Time-To-Failure (MTTF):
λ = 10,000 FITs = 10,000 / 10^9 per hour = 1×10^-5 failures per hour, i.e., 1 failure per 100,000 hours (MTTF = 100,000 hours).
The 12-month reliability (1 year = 8,760 hours) is:
R(8,760 hours) = exp{-8,760/100,000} = 0.916, and the “unreliability” is F(8,760) = 1 - 0.916 = 0.084.
Therefore, about 8.4% of the 100 pumps (8.4 pumps) are expected to fail, so a minimum of 9 pumps should be in stock for the first year.
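The same calculation in Python, following the slide's steps: FIT to failure rate, then one-year reliability and unreliability, then expected failures rounded up to whole pumps.

```python
import math

FIT = 1e-9                        # 1 FIT = 1 failure per 10^9 device-hours
failure_rate = 10_000 * FIT       # failures per pump-hour: 1e-5
mttf_hours = 1 / failure_rate     # 100,000 hours
hours_per_year = 8_760
pumps_in_service = 100

reliability_1y = math.exp(-hours_per_year / mttf_hours)   # R(8760 h), about 0.916
unreliability_1y = 1 - reliability_1y                      # F(8760 h), about 0.084

expected_failures = pumps_in_service * unreliability_1y    # about 8.4 pumps
spares_needed = math.ceil(expected_failures)               # round up: 9 pumps
print(f"R = {reliability_1y:.3f}, F = {unreliability_1y:.3f}, spares = {spares_needed}")
```

If the replacement pumps are also assumed to fail at the same constant rate, the expected number of failures over the year is 100 × λ × 8,760 ≈ 8.76. This figure also rounds up to 9 spares.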


Another Typical Problem
The unit manufacturing cost of a software product is $50. The company decides to offer one year of free updates to its customers. Suppose that the failure intensity of the product at release time is λ = 0.01 failures/month. What should be the unit cost of the product including warranty services?
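One possible approach mirrors the pump example. It rests on two assumptions that the slide does not state: the failure intensity stays constant over the 12-month warranty, and each unit that fails during that year costs one extra unit manufacturing cost ($50) to service. Under those assumptions, the unit price is the manufacturing cost plus the expected warranty cost.

```python
import math

unit_cost = 50.0            # $ per unit
failure_intensity = 0.01    # failures per month at release
warranty_months = 12

# Probability that a unit fails at least once during the warranty (constant rate)
p_fail = 1 - math.exp(-failure_intensity * warranty_months)   # about 0.113

# Assumption: each failing unit costs one more unit_cost to service
price_with_warranty = unit_cost * (1 + p_fail)
print(f"P(fail within warranty) = {p_fail:.3f}, unit cost = ${price_with_warranty:.2f}")
```

With a different warranty cost model, for example a fixed cost per update shipped, only the last formula changes.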

Terminology
Dependability:
– The ability of a system to deliver service that can justifiably be trusted.
– The ability of a system to avoid failures that are more frequent or more severe, and outage durations that are longer, than is acceptable to the users.
Threats: Failures, Faults, Errors
Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
Means: Fault prevention, Fault tolerance, Fault removal, Fault forecasting
Models: Reliability Block Diagram, Fault Tree model, Reliability Graph

Definition: Service
The service delivered by a system is its behaviour as it is perceived by its users; a user is another system (physical, human) that interacts with the former at the service interface.
The function of a system is what the system is intended to do, and is described by the functional specification. Correct service is delivered when the service implements the system function.
The delivery of incorrect service is a system outage. A transition from incorrect service to correct service is service restoration.

Dependability: Threats
Error (causes) → Fault (causes) → Failure
An error is a human action that results in software containing a fault.
A fault (bug) is a cause for either a failure of the program or an internal error (e.g., an incorrect state, incorrect timing). It must be detected and removed.
Among the three, only failure is observable.
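A tiny, made-up Python example illustrates the chain. The hypothetical specification says a score of 50 or more passes. The programmer's error (typing `>` instead of `>=`) leaves a fault in the hypothetical function `is_passing`. That fault becomes an observable failure only when an input actually exercises it.

```python
def is_passing(score: int) -> bool:
    """Specification (hypothetical): a score of 50 or more passes."""
    # Fault: a human error ('>' written instead of '>=') left a boundary bug here
    return score > 50

# Most inputs never activate the fault, so no failure is observed
for score in (0, 30, 49, 51, 75, 100):
    assert is_passing(score) == (score >= 50)

# Only the boundary input exposes the fault as a failure (deviation from the spec)
print(is_passing(50), "expected", 50 >= 50)   # False expected True
```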

Definition: Failure
Failure:
– A system failure is an event that occurs when the delivered service deviates from correct service. A failure is thus a transition from correct service to incorrect service, i.e., to not implementing the system function.
– Any departure of system behavior in execution from user needs. A failure is caused by a fault, and the cause of a fault is usually a human error.
Failure Mode: The manner in which a fault occurs, i.e., the way in which the element faults.
Failure Effect: The consequence(s) of a failure mode on an operation, function, or status of a system/process/activity/environment; the undesirable outcome of a fault of a system element in a particular mode.
[Figure label: associated Risk]

Failure Intensity & Density
Failure Intensity (failure rate): the rate at which failures happen, i.e., the number of failures per natural or time unit. Failure intensity is a way of expressing system reliability, e.g., 5 failures per hour; 2 failures per 1000 transactions. For system end users.
Failure Density: failures per KLOC (or per FP) of developed code, e.g., 1 failure per KLOC, 0.2 failures per FP, etc. For system developers.
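Both metrics divide a failure count by a denominator. The difference is the denominator: exposure (time or natural units) for intensity, code size for density. In the minimal sketch below, the function names and input numbers are hypothetical, chosen only to reproduce the examples in the text.

```python
def failure_intensity(failures: int, exposure: float) -> float:
    """Failures per unit of exposure (hours, transactions, ...): the end-user view."""
    return failures / exposure

def failure_density(failures: int, size: float) -> float:
    """Failures per unit of code size (KLOC or FP): the developer view."""
    return failures / size

# Hypothetical inputs chosen to reproduce the examples above
print(failure_intensity(10, 2.0), "failures per hour")              # 5.0
print(failure_intensity(2, 1.0), "failures per 1000 transactions")  # 2.0 (exposure in thousands)
print(failure_density(20, 20.0), "failures per KLOC")               # 1.0
print(failure_density(30, 150.0), "failures per FP")                # 0.2
```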


Example: Failure Density
In a software system, measuring the number of failures leads to the identification of 5 modules. However, measuring failures per KLOC (failure density) leads to the identification of only one module.
Example from Fenton’s Book

Failure Density vs. Inspection Effort
Is failure density a good metric for software quality?
Finding and fixing more bugs does not necessarily imply better software quality, because the fault injection rate and the effort to fix faults may differ.
Failure density \ Inspection effort | Higher | Lower
Higher | Good / Not bad | Worst case
Lower | Best case | Unsure

Definition: Fault
Fault: A fault is a cause for either a failure of the program or an internal error (e.g., an incorrect state, incorrect timing).
– A fault must be detected and then removed.
– A fault can be removed without execution (e.g., code inspection, design review).
– Fault removal through execution depends on the occurrence of the associated “failure”.
– Occurrence depends on the length of execution time and the operational profile.
Defect: refers to either a fault (cause) or a failure (effect).
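A simple probability model makes concrete why fault removal through execution depends on execution length and on the operational profile. The model is an assumption for illustration: each run independently selects the operation containing the fault with probability p (taken from the operational profile). After n runs, the chance that the fault has produced at least one failure is 1 − (1 − p)^n. Function names and the two p values are hypothetical.

```python
import math

def exposure_probability(p_operation: float, runs: int) -> float:
    """P(at least one failure in `runs` independent runs) when the faulty
    operation is selected with probability p_operation on each run."""
    return 1.0 - (1.0 - p_operation) ** runs

def runs_needed(p_operation: float, confidence: float) -> int:
    """Smallest number of runs giving at least `confidence` chance of exposure."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_operation))

# Hypothetical profile values: a frequently used operation vs. a rarely used one
for p in (0.10, 0.001):
    print(f"p = {p}: P(exposed in 100 runs) = {exposure_probability(p, 100):.3f}, "
          f"runs for 95% = {runs_needed(p, 0.95)}")
```

A fault in a rarely used operation can survive thousands of test runs. This is why testing driven by the operational profile finds first the faults that users would hit most often.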

Definition: Error
Error has two meanings:
– A discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition.
– A human action that results in software containing a fault.
Human errors are the hardest to detect.

Dependability: Attributes /1
– Availability: readiness for correct service
– Reliability: continuity of correct service
– Safety: absence of catastrophic consequences on the users and the environment
– Confidentiality: absence of unauthorized disclosure of information
– Integrity: absence of improper system state alterations
– Maintainability: ability to undergo repairs and modifications

Dependability: Attributes /2
Dependability attributes may be emphasized to a greater or lesser extent depending on the application: availability is always required, whereas reliability, confidentiality and safety may or may not be required.
Other dependability attributes can be defined as combinations or specializations of the six basic attributes.
Example: Security is the concurrent existence of availability for authorized users only, confidentiality, and integrity, with “improper” taken as meaning “unauthorized”.

Definition: Availability
Availability: a measure of the delivery of correct service with respect to the alternation of correct and incorrect service.
$$\text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}}$$
$$\text{Availability} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} = \frac{\mathrm{MTTF}}{\mathrm{MTBF}}$$
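Both availability formulas translate directly into code. The sketch takes MTBF = MTTF + MTTR, consistent with the second formula. The MTTF and MTTR figures in the example call are hypothetical.

```python
def availability_from_times(uptime: float, downtime: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    return uptime / (uptime + downtime)

def availability_from_means(mttf: float, mttr: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    mtbf = mttf + mttr
    return mttf / mtbf

# Hypothetical figures: a failure every 500 h on average, 4 h to restore service
print(f"{availability_from_means(500.0, 4.0):.4f}")   # 0.9921
```

Availability improves either by failing less often (larger MTTF, a reliability concern) or by restoring service faster (smaller MTTR, a maintainability concern).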

Definition: Reliability /1
Reliability is a measure of the continuous delivery of correct service.
– Reliability is the probability that a system or a capability of a system functions without failure for a “specified time” or “number of natural units” in a specified environment (Musa, et al.), given that the system was functioning properly at the beginning of the time period.
– Probability of failure-free operation for a specified time in a specified environment for a given purpose (Sommerville).
A recent survey of software consumers revealed that reliability was the most important quality attribute of application software.

Definition: Reliability /2
Three key points:
– Reliability depends on how the software is used. Therefore a model of usage is required.
– Reliability can be improved over time if certain bugs are fixed (reliability growth). Therefore a trend model (aggregation or regression) is needed.
– Failures may happen at random times. Therefore a probabilistic model of failure is needed.

Definition: Safety
Safety: absence of catastrophic consequences on the users and the environment.
Safety is an extension of reliability: safety is reliability with respect to catastrophic failures.
When the state of correct service and the states of incorrect service due to non-catastrophic failure are grouped into a safe state (in the sense of being free from catastrophic damage, not from danger), safety is a measure of continuous safeness, or equivalently, of the time to catastrophic failure.

Definition: Confidentiality

Confidentiality: absence of unauthorized disclosure of information

Definition: Integrity

Integrity: absence of improper system state alterations

Definition: Maintainability
Maintainability: ability to undergo repairs and modifications.
Maintainability is a measure of the time to service restoration since the last failure occurrence, or equivalently, a measure of the continuous delivery of incorrect service.

Dependability: Means
– Fault prevention: how to prevent the occurrence or introduction of faults
– Fault tolerance: how to deliver correct service in the presence of faults
– Fault removal: how to reduce the number or severity of faults
– Fault forecasting: how to estimate the present number, the future incidence, and the likely consequences of faults

Definition: Fault Prevention
To avoid fault occurrences by construction.
– Fault prevention is attained by quality control techniques employed during the design and manufacturing of software.
– Fault prevention intends to prevent operational physical faults.
– Example techniques: design review, modularization, consistency checking, structured programming, etc.

Fault Prevention
Activities:
– Requirement review
– Design review
– Clear code
– Establishing standards (ISO 9000-3, etc.)
– Using CASE tools with built-in check mechanisms
All these activities are included in ISO 9000-3: Guidelines for application of ISO 9001 to the development, supply and maintenance of software (1991).

Definition: Fault Tolerance
A fault-tolerant computing system is capable of providing specified services in the presence of a bounded number of failures.
– Use of techniques to enable continued delivery of service during system operation.
– It is generally implemented by error detection and subsequent system recovery.
– Based on the principle of: act during operation, while defined during specification and design.

Fault Tolerance Process
1. Detection: Identify faults and their causes (errors).
2. Assessment: Assess the extent to which the system state has been damaged or corrupted.
3. Recovery: Remain operational or regain operational status.
4. Fault treatment and continued service: Locate and repair the fault to prevent another occurrence.
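The four steps can be sketched with a recovery-block style technique, one of several ways to implement fault tolerance; the slide does not prescribe it. An acceptance test performs detection. Rolling back to a checkpoint and trying an alternate variant performs recovery. A log of rejected results supports later fault treatment. Assessment is simplified to treating everything a failed variant touched as corrupted. All names are hypothetical, and the primary variant contains a deliberately injected fault.

```python
import copy
import math
from typing import Callable, List, Optional

Variant = Callable[[dict], float]

def run_with_recovery(state: dict, variants: List[Variant],
                      acceptance_test: Callable[[dict, float], bool],
                      fault_log: List[str]) -> Optional[float]:
    """Try each variant in turn until one produces a result that passes the test."""
    checkpoint = copy.deepcopy(state)               # saved state for rollback
    for variant in variants:
        try:
            result = variant(state)
            if acceptance_test(state, result):       # 1. detection: result checked
                return result
            fault_log.append(f"{variant.__name__}: rejected {result!r}")
        except Exception as exc:                     # 1. detection: raised error
            fault_log.append(f"{variant.__name__}: raised {exc!r}")
        # 2./3. assessment and recovery, simplified: treat everything the variant
        # touched as corrupted and roll back to the checkpoint
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    return None   # no variant succeeded: the failure becomes visible to the caller

# Hypothetical variants: the primary contains an injected sign fault
def primary_sqrt(state: dict) -> float:
    state["calls"] += 1
    return -math.sqrt(state["x"])

def backup_sqrt(state: dict) -> float:
    state["calls"] += 1
    return math.sqrt(state["x"])

def accept(state: dict, r: float) -> bool:
    return r >= 0 and math.isclose(r * r, state["x"])

log: List[str] = []                  # 4. fault treatment: evidence kept for later repair
state = {"x": 2.0, "calls": 0}
print(run_with_recovery(state, [primary_sqrt, backup_sqrt], accept, log))
print(log)
```

The rollback means the backup variant sees the state as it was before the faulty primary ran (`calls` ends at 1, not 2).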


Definition: Fault Removal /1
Fault removal is performed both during the development phase and during the operational life of a system.
Fault removal during the development phase of a system life-cycle consists of three steps:
– verification
– diagnosis
– correction
Verification is the process of checking whether the system adheres to given properties, called the verification conditions. If it does not, the other two steps follow: diagnosing the faults that prevented the verification conditions from being fulfilled, and then performing the necessary corrections.

Definition: Fault Removal /2
After correction, the verification process should be repeated in order to check that fault removal had no undesired consequences; the verification performed at this stage is usually called non-regression verification.
Checking the specification is usually referred to as validation.
Uncovering specification faults can happen at any stage of the development, either during the specification phase itself, or during subsequent phases when evidence is found that the system will not implement its function, or that the implementation cannot be achieved in a cost-effective way.

Definition: Fault Forecasting
Fault forecasting is conducted by performing an evaluation of the system behaviour with respect to fault occurrence or activation.
Evaluation has two aspects:
– qualitative, or ordinal, evaluation, which aims to identify, classify and rank the failure modes, or the event combinations (component failures or environmental conditions) that would lead to system failures;
– quantitative, or probabilistic, evaluation, which aims to evaluate in terms of probabilities the extent to which some of the attributes of dependability are satisfied; those attributes are then viewed as measures of dependability.
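As one example of quantitative evaluation, take the reliability block diagram from the Models list. For independent blocks, a series structure works only if every block works, so R = ∏ Rᵢ. A parallel (redundant) structure fails only if every block fails, so R = 1 − ∏ (1 − Rᵢ). The sketch below encodes these two rules. The block reliabilities are hypothetical, and independence of blocks is an assumption the rules depend on.

```python
from math import prod

def series(reliabilities):
    """Series structure: the system works only if every block works."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Parallel (redundant) structure: the system fails only if every block fails."""
    return 1 - prod(1 - r for r in reliabilities)

# Hypothetical blocks with reliability 0.9 each, assumed independent
print(round(series([0.9, 0.9]), 4))     # 0.81
print(round(parallel([0.9, 0.9]), 4))   # 0.99
# Structures compose: a 0.95 block in series with a redundant pair
print(round(series([0.95, parallel([0.9, 0.9])]), 4))
```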

SRE: Process /1
There are 5 steps in the SRE process (for each system to test):
1. Define necessary reliability
2. Develop operational profiles
3. Prepare for test
4. Execute test
5. Apply failure data to guide decisions
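Two of these steps can be sketched in a few lines. The operational profile (step 2) gives each operation's occurrence probability, which can be used to split a test budget. Defining the necessary reliability (step 1) yields a failure intensity objective (FIO). The measured failure intensity (FI) from test is then compared against it (step 5). The profile, the numbers, and the simple "FI ≤ FIO" release rule are all hypothetical simplifications, not the full SRE procedure.

```python
from typing import Dict

def allocate_tests(profile: Dict[str, float], total_tests: int) -> Dict[str, int]:
    """Split a test budget across operations in proportion to their occurrence
    probabilities (the operational profile). Uses simple rounding, so the counts
    may not sum exactly to total_tests."""
    return {op: round(p * total_tests) for op, p in profile.items()}

def ready_for_release(failures: int, exec_hours: float, fio_per_hour: float) -> bool:
    """Compare the measured failure intensity FI with the objective FIO."""
    fi = failures / exec_hours
    return fi <= fio_per_hour

# Hypothetical operational profile and failure intensity objective
profile = {"query": 0.6, "update": 0.3, "report": 0.1}
print(allocate_tests(profile, 200))   # {'query': 120, 'update': 60, 'report': 20}
print(ready_for_release(failures=3, exec_hours=50.0, fio_per_hour=0.1))   # True
```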

SRE: Process /2

[Figure: Modified version of the SRE Process]

Ref: Musa’s book 2nd Ed

Conclusions
Software Reliability Engineering (SRE) can offer metrics and measures to help elevate a software development organization to the upper levels of software development maturity.
However, in practice, effective implementation of SRE is a non-trivial task!