SENG 521 Software Reliability & Software Quality
Chapter 1: Overview
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG521/
Contents
Shorter version: How to avoid these?
Longer version:
  What is this course about?
  What factors affect software quality?
  What is software reliability?
  What is software reliability engineering?
  What is software reliability engineering process?
What Affects Software Quality?
Time: Meeting the project deadline. Reaching the market at the right time.
Cost: Meeting the anticipated project costs.
Quality (reliability): Working fine for the designated period on the designated system.
[Figure: Cost, Time and Quality, supported by People and Technology]
Terminology & Scope
Dependability: The ability of a system to deliver service that can justifiably be trusted.
  Threats: Failures, Faults, Errors
  Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
  Means: Fault prevention, Fault tolerance, Fault removal, Fault forecasting
  Models: Reliability Block Diagram, Fault Tree model, Reliability Graph
ISO 9126
Pre-release: Predict size, Predict effort, Predict time
Post-release: Predict time
At The End …
What is software quality? What affects software quality?
What is software reliability engineering (SRE)? Why is SRE important? How does it affect software quality?
What are the main factors that affect the reliability of software?
How can one determine what the size of the software will be?
How can one determine how much it will cost to develop the software?
How can one determine how often the software will fail?
How can one determine the current quality of the software under development?
How can one determine whether the software is reliable enough to be released?
Can SRE methodology be applied to the current ways of software development, like object-oriented, component-based and agile development?
What are the challenges and difficulties of applying SRE?
What is Software Quality?
Conformance to requirement
  The requirements are clearly stated and the product must conform to them
  Any deviation from the requirements is regarded as a defect
  A good quality product contains fewer defects
Fitness for use
  Fit to user expectations: meet user’s needs
  A good quality product provides better user satisfaction
Both: Dependable computing system
[Figure: What is specified? What does the SW do? What does the user need?]
Definition: Software Quality
ISO 8402 definition of QUALITY: The totality of features and characteristics of a product or a service that bear on its ability to satisfy stated or implied needs
Reliability and Maintainability are two major components of Quality
Example: Attribute Expansion
Design by measurable objectives: Incremental design is evaluated to check whether the goal for each increment was achieved.
[Figure: a quality objective expanded into Availability (% of planned system uptime) and User friendliness (days on job to ...)]
Fatal software related incidents [Gage & McCormick 2004]
Date | Casualties | Detail
2003 | 3 | Software failure contributes to power outage across North-eastern U.S. and Canada.
2001 | 5 | Panamanian cancer patients die following overdoses of radiation, determined by the use of faulty software.
2000 | 4 | Crash of Marine Corps Osprey tilt-rotor aircraft, partially blamed on software anomaly.
1997 | 225 | Radar that could have prevented Korean jet crash hobbled by software problem.
1995 | 159 | American Airlines jet, descending into Cali, Colombia, crashes into a mountain. A cause was that the software presented insufficient and conflicting information to the pilots, who got lost.
1991 | 28 | Software problem prevents Patriot missile battery from picking up SCUD missile, which hits US Army barracks in Saudi Arabia.
What to Learn from Data?
The inverses of the inter-failure times are the failure intensity (= failures per unit of time) data points
What to Learn from Data?
Mean time to failure (MTTF), the inverse of the average failure rate:
MTTF = (6+4+8+5+6+9+11+14+16+19)/10 = 9.8 hours
System reliability for 1 hour of operation:
$$R(t) = e^{-\lambda t} = e^{-t/\mathrm{MTTF}}, \qquad R(1) = e^{-1/9.8} \approx 0.90299$$
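The arithmetic above can be reproduced with a short Python sketch. It assumes a constant failure rate equal to 1/MTTF, as the exponential formula on the slide does; the names `inter_failure_times` and `reliability` are illustrative and not part of the course material.

```python
import math

# Inter-failure times in hours, taken from the MTTF example above.
inter_failure_times = [6, 4, 8, 5, 6, 9, 11, 14, 16, 19]

# The inverse of each inter-failure time is one failure-intensity
# data point (failures per hour).
failure_intensities = [1.0 / t for t in inter_failure_times]

# MTTF is the arithmetic mean of the inter-failure times.
mttf = sum(inter_failure_times) / len(inter_failure_times)


def reliability(t_hours, mttf_hours):
    """Probability of failure-free operation for t_hours.

    Assumption: constant failure rate lambda = 1 / MTTF.
    """
    return math.exp(-t_hours / mttf_hours)


r_one_hour = reliability(1.0, mttf)
print(f"MTTF = {mttf:.1f} hours")
print(f"R(1 hour) = {r_one_hour:.5f}")
```

The same `reliability` function can be evaluated for other mission times, for example `reliability(8.0, mttf)` for an eight-hour shift.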
Fitting a straight line to the graph in (a) would show an x-intercept of about 15. Using this as an estimate of the total number of original failures, we estimate that there are still five bugs in the software.
Fitting a straight line to the graph in (b) would give an x-intercept near 160. This would give an additional testing time of approximately 62 units to remove all bugs.
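The straight-line extrapolation can be sketched in Python as follows. This is a minimal ordinary least-squares fit, which is an assumption: the slides do not say how the line was fitted, and a least-squares fit to the raw data need not reproduce the intercepts of about 15 and 160 quoted above. The data below are synthetic and noise-free, chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def x_intercept(xs, ys):
    """Value of x at which the fitted line reaches zero failure intensity."""
    slope, intercept = fit_line(xs, ys)
    return -intercept / slope


# Hypothetical data: failure intensity falling linearly with the
# cumulative number of failures found so far.
failures_so_far = list(range(1, 11))
intensity = [0.3 - 0.02 * n for n in failures_so_far]

total_failures = x_intercept(failures_so_far, intensity)
remaining = total_failures - failures_so_far[-1]
print(f"Estimated total failures: {total_failures:.1f}")
print(f"Estimated remaining failures: {remaining:.1f}")
```

Replacing the x-axis with cumulative test time instead of cumulative failures gives the corresponding estimate of the additional testing time.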
A Typical Problem: Question
Failure intensity (failure rate) of a system is usually expressed using the FIT (Failure-In-Time) unit, which is 1 failure per 10**9 device hours.
Failure intensity of an electric pump system used for pumping crude oil in Northern Alberta’s oil field is constant and is 10,000 FITs, and 100 such pumps are operational.
If for continuous operation all failed units are to be replaced immediately, what shall be the minimum inventory size of pumps for one year of operation?
A Typical Problem: Answer
Pump’s Mean-Time-To-Failure (MTTF):
λ = 10,000 FITs = 10,000 / 10**9 failures per hour = 1×10**-5 failures per hour
MTTF = 1/λ = 100,000 hours (1 failure per 100,000 hours)
The 12-month reliability is (1 year = 8,760 hours):
R(8,760 hours) = exp{-8,760/100,000} = 0.916
and the “unreliability” is F(8,760) = 1 - 0.916 = 0.084
Therefore, the inventory size is 8.4% of the 100 pumps, so a minimum of 9 pumps should be in stock in the first year.
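The pump calculation can be checked with a few lines of Python. The sketch follows the slide's approach (inventory = number of units times the one-year unreliability, rounded up) and assumes a constant failure intensity; the function name `pumps_to_stock` is illustrative only.

```python
import math

FIT_DEVICE_HOURS = 1e9  # 1 FIT = 1 failure per 10**9 device hours


def pumps_to_stock(fits, units, hours):
    """Return (reliability, unreliability, spare units to stock).

    Assumption: constant failure intensity, so R(t) = exp(-lambda * t).
    """
    rate_per_hour = fits / FIT_DEVICE_HOURS
    rel = math.exp(-rate_per_hour * hours)
    unrel = 1.0 - rel
    spares = math.ceil(units * unrel)
    return rel, unrel, spares


rel, unrel, spares = pumps_to_stock(fits=10_000, units=100, hours=8_760)
print(f"R(1 year) = {rel:.3f}, F(1 year) = {unrel:.3f}, spares = {spares}")
```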
Unit manufacturing cost of a software product is $50. The company decides to offer one year of free updates to its customers. Suppose that the failure intensity of the product at the release time is λ = 0.01 failures/month.
What should be the unit cost of the product including warranty services?
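One possible way to approach this question, by analogy with the pump problem, is sketched below. The cost model is an assumption that the question itself does not state: the failure intensity stays constant over the 12 months, and each unit that fails during the warranty period costs one additional unit manufacturing cost to service.

```python
import math


def unit_cost_with_warranty(unit_cost, failures_per_month, months):
    """Unit cost plus expected warranty cost per unit sold.

    Assumptions (not given in the question): constant failure intensity,
    and a failed unit is serviced once at a cost equal to unit_cost.
    """
    rel = math.exp(-failures_per_month * months)
    unrel = 1.0 - rel
    return unit_cost * (1.0 + unrel)


cost = unit_cost_with_warranty(unit_cost=50.0, failures_per_month=0.01, months=12)
print(f"Unit cost including warranty: ${cost:.2f}")
```

Under a different cost model (for example, a fixed service fee per failure instead of a full replacement), only the last line of the function would change.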
Definition: Service
The service delivered by a system is its behaviour as it is perceived by its users; a user is another system (physical, human) that interacts with the former at the service interface.
The function of a system is what the system is intended to do, and is described by the functional specification.
Correct service is delivered when the service implements the system function.
The delivery of incorrect service is a system outage.
A transition from incorrect service to correct service is service restoration.
An error is a human action that results in software containing a fault.
A fault (bug) is a cause for either a failure of the program or an internal error (e.g., an incorrect state, incorrect timing). It must be detected and removed.
Definition: Failure
Failure:
  A system failure is an event that occurs when the delivered service deviates from correct service. A failure is thus a transition from correct service to incorrect service, i.e., to not implementing the system function.
  Any departure of system behavior in execution from user needs. A failure is caused by a fault and the cause of a fault is usually a human error.
Failure Mode:
  The manner in which a fault occurs, i.e., the way in which the element faults.
Failure Effect:
  The consequence(s) of a failure mode on an operation, function, status of a system/process/activity/environment. The undesirable outcome of a fault of a system element in a particular mode.
Failure Intensity & Density
Failure Intensity (failure rate): the rate at which failures are happening, i.e., number of failures per natural or time unit. Failure intensity is a way of expressing system reliability, e.g., 5 failures per hour; 2 failures per 1000 transactions. For system end users.
Failure Density: failures per KLOC (or per FP) of developed code, e.g., 1 failure per KLOC, 0.2 failure per FP.
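The two measures differ only in the denominator: intensity divides by exposure (time or natural units), density divides by code size. A minimal Python sketch, with illustrative function names and figures taken from the examples above:

```python
def failure_intensity(failures, exposure):
    """Failures per unit of exposure (hours, transactions, ...)."""
    return failures / exposure


def failure_density(failures, size):
    """Failures per unit of code size (KLOC or function points)."""
    return failures / size


per_hour = failure_intensity(failures=5, exposure=1)            # 5 failures in 1 hour
per_transaction = failure_intensity(failures=2, exposure=1000)  # 2 per 1000 transactions
per_kloc = failure_density(failures=1, size=1)                  # 1 failure per KLOC

print(per_hour, per_transaction, per_kloc)
```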
Failure Density vs. Inspection Effort
Is failure density a good metric for software quality?
Finding and fixing more bugs does not necessarily imply better software quality, because the fault injection rate and the effort to fix the faults may differ.
Definition: Fault
Fault: A fault is a cause for either a failure of the program or an internal error (e.g., an incorrect state, incorrect timing)
  A fault must be detected and then removed
  A fault can be removed without execution (e.g., code inspection, design review)
  Fault removal due to execution depends on the occurrence of the associated “failure”.
  Occurrence depends on length of execution time and operational profile.
Defect: refers to either fault (cause) or failure (effect)
Dependability: Attributes /2
Dependability attributes may be emphasized to a greater or lesser extent depending on the application: availability is always required, whereas reliability, confidentiality, safety may or may not be required.
Other dependability attributes can be defined as combinations or specializations of the six basic attributes.
Example: Security is the concurrent existence of
  Availability for authorized users only;
  Confidentiality; and
  Integrity with improper taken as meaning unauthorized.
Definition: Reliability /1
Reliability is a measure of the continuous delivery of correct service
  Reliability is the probability that a system or a capability of a system functions without failure for a “specified time” or “number of natural units” in a specified environment (Musa, et al.), given that the system was functioning properly at the beginning of the time period
  Probability of failure-free operation for a specified time in a specified environment for a given purpose (Sommerville)
A recent survey of software consumers revealed that reliability was the most important quality attribute of the application software
Definition: Reliability /2
Three key points:
Reliability depends on how the software is used
  Therefore a model of usage is required
Reliability can be improved over time if certain bugs are fixed (reliability growth)
  Therefore a trend model (aggregation or regression) is needed
Failures may happen at random times
  Therefore a probabilistic model of failure is needed
Definition: Safety
Safety: absence of catastrophic consequences on the users and the environment
Safety is an extension of reliability: safety is reliability with respect to catastrophic failures.
When the state of correct service and the states of incorrect service due to non-catastrophic failure are grouped into a safe state (in the sense of being free from catastrophic damage, not from danger), safety is a measure of continuous safeness, or equivalently, of the time to catastrophic failure.
Maintainability: ability to undergo repairs and modifications
Maintainability is a measure of the time to service restoration since the last failure occurrence, or equivalently, a measure of the continuous delivery of incorrect service.
Dependability: Means
Fault prevention: how to prevent the occurrence or introduction of faults
Fault tolerance: how to deliver correct service in the presence of faults
Fault removal: how to reduce the number or severity of faults
Fault forecasting: how to estimate the present number, the future incidence, and the likely consequences of faults
Clear code
Establishing standards (ISO 9000-3, etc.)
Using CASE tools with built-in check mechanisms
All these activities are included in ISO 9000-3: Guidelines for application of ISO 9001 to the development, supply and maintenance of software
Definition: Fault Tolerance
A fault-tolerant computing system is capable of providing specified services in the presence of a bounded number of failures
Use of techniques to enable continued delivery of service during system operation
It is generally implemented by error detection and subsequent system recovery
Based on the principle of: Act during operation while ...
Definition: Fault Removal /1
Fault removal is performed both during the development phase, and during the operational life of a system.
Fault removal during the development phase of a system life-cycle consists of three steps: verification, diagnosis, correction
Verification is the process of checking whether the system adheres to given properties, called the verification conditions. If it does not, the other two steps follow: diagnosing the faults that prevented the verification conditions from being fulfilled, and then performing the necessary corrections.
Definition: Fault Removal /2
After correction, the verification process should be repeated in order to check that fault removal had no undesired consequences; the verification performed at this stage is usually called non-regression verification.
Checking the specification is usually referred to as validation.
Uncovering specification faults can happen at any stage of the development, either during the specification phase itself, or during subsequent phases when evidence is found that the system will not implement its function, or that the implementation cannot be achieved in a cost-effective way.
Definition: Fault Forecasting
Fault forecasting is conducted by performing an evaluation of the system behaviour with respect to fault occurrence or activation
Evaluation has two aspects:
  qualitative, or ordinal, evaluation, which aims to identify, classify, rank the failure modes, or the event combinations (component failures or environmental conditions) that would lead to system failures
  quantitative, or probabilistic, evaluation, which aims to evaluate in terms of probabilities the extent to which some of the attributes of dependability are satisfied; those attributes are then viewed as measures of dependability
Software Reliability Engineering (SRE) can offer metrics and measures to help elevate a software development organization to the upper levels of software development maturity.
However, in practice effective implementation of SRE is a non-trivial task!