SENG 521 Software Reliability & Software Quality
Chapter 5: Overview of Software Reliability Engineering
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG521/
Reliability Theory
Reliability theory developed apart from the mainstream of probability and statistics, and was used primarily as a tool to help nineteenth century maritime and life insurance companies compute profitable rates to charge their customers.
Even today, the terms “failure rate” and “hazard rate” are often used interchangeably.
Probability of survival of merchandise after one MTTF is $R = e^{-1} \approx 0.37$.
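The 0.37 survival figure follows from the exponential reliability function $R(t) = e^{-t/\mathrm{MTTF}}$, which applies when the failure rate is constant. A minimal Python sketch under that assumed constant-rate model (the function name `reliability` and the sample numbers are illustrative, not from the lecture):

```python
import math

def reliability(t: float, mttf: float) -> float:
    """Survival probability at time t, assuming a constant failure rate."""
    if mttf <= 0:
        raise ValueError("mttf must be positive")
    return math.exp(-t / mttf)

# Surviving exactly one MTTF (hypothetical MTTF of 1000 hours): R = e^-1
print(round(reliability(1000.0, 1000.0), 2))  # 0.37
```

The result does not depend on the MTTF value chosen, only on the ratio t/MTTF being 1.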
Software vs. Hardware
Software reliability doesn’t decrease with time, i.e., software doesn’t wear out.
Hardware faults are mostly physical faults, e.g., fatigue.
Software faults are mostly design faults, which are harder to measure, model, and detect.
Software vs. Hardware
Hardware failure can be “fixed” by replacing a faulty component with an identical one, therefore there is no reliability growth.
Software problems can be “fixed” by changing the code so that the failure does not happen again, therefore reliability growth is present.
Software does not go through a production phase the same way as hardware does.
Conclusion: hardware reliability models may not be used identically for software.
Engineering of “reliability” in software products.
Reliability Engineering’s goal: developing software to reach the market
  With “minimum” development time
  With “minimum” development cost
  With “maximum” reliability
  With “minimum” expertise needed
  With “minimum” available technology
Software quality means getting the right balance among development cost, development time, people, technology and reliability.
[Figure: Minimum & Maximum versus Optimum; SRE balances Cost, Time, People, Technology, Reliability.]
Pick quantitative representations for the 5 factors (cost, time, people, technology and reliability) and measure them!
What is SRE? /1
Software Reliability Engineering (SRE) is a multi-faceted discipline covering the software product lifecycle.
It involves both technical and management activities in three basic areas:
  Software development and maintenance
  Measurement and analysis of reliability data
  Feedback of reliability information into the software lifecycle activities.
What is SRE? /2
SRE is a practice for quantitatively planning and guiding software development and test, with emphasis on reliability and availability.
SRE simultaneously does three things:
  It ensures that product reliability and availability meet user needs.
  It delivers the product to market faster.
  It increases productivity, lowering product life-cycle cost.
In applying SRE, one can vary relative emphasis placed on these three factors.
SRE: Necessary Reliability
Define what “failure” means for the software product.
Choose a common measure for all failure intensities, either failures per some natural unit or failures per hour.
Set the total system failure intensity objective (FIO) for the software/hardware system.
Compute a developed software FIO by subtracting the total of the FIOs of all hardware and acquired software components from the system FIO.
Use the developed software FIOs to track the reliability growth during system test (later on).
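The subtraction step can be sketched in a few lines of Python. This is an illustration only: the function name and all numbers are hypothetical, and every intensity is assumed to be expressed in the same unit (here, failures per 1000 hours).

```python
def developed_software_fio(system_fio: float, acquired_fios: list[float]) -> float:
    """System FIO minus the FIOs of hardware and acquired software components."""
    remaining = system_fio - sum(acquired_fios)
    if remaining <= 0:
        # The acquired components alone already use up the system objective.
        raise ValueError("no failure intensity budget left for developed software")
    return remaining

# Hypothetical values, all in failures per 1000 hours
system_fio = 100.0
acquired = [10.0, 5.0, 25.0]  # e.g. hardware, operating system, third-party library
print(developed_software_fio(system_fio, acquired))  # 60.0
```

A non-positive remainder signals that either the system FIO or the choice of acquired components has to be revisited before a developed software objective can be set.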
Failure Intensity Objective (FIO)
Failure intensity (λ) is defined as failures per natural unit (or per unit of time), e.g., 3 alarms per 100 hours of operation, 5 failures per 1000 transactions, etc.
Failure intensity of a cascade (serial) system is the sum of the failure intensities of all of the components of the system.
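As a small worked example, the two definitions above can be written as Python helpers. The function names and the component values are hypothetical; the only rule taken from the slide is that serial intensities add, which presumes all components use the same unit.

```python
def failure_intensity(failures: int, units: float) -> float:
    """Failures per natural unit (or per unit of time)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return failures / units

def serial_failure_intensity(component_intensities: list[float]) -> float:
    """Failure intensity of a cascade (serial) system: sum over its components."""
    return sum(component_intensities)

print(failure_intensity(3, 100))    # 0.03 alarms per hour of operation
print(failure_intensity(5, 1000))   # 0.005 failures per transaction

# Hypothetical serial system, components in failures per 1000 hours
print(serial_failure_intensity([2.0, 0.5, 1.5]))  # 4.0
```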
SRE: Operation
An operation is a major system logical task, which returns control to the system when complete.
An operation is an input event that affects the course of behavior of the software.
Example: operations for a Web proxy server
  Connect internal users to external Web
  Email internal users to external users
  Email external users to internal users
  DNS request by internal users
  Etc.
SRE: Operational Mode
An operational mode is a distinct pattern of system use and/or set of environmental conditions that may need separate testing because it is likely to stimulate different failures.
Example:
  Time (time of year, day of week, time of day)
  Different user types (customer or user)
  User experience (novice or expert)
The same operation may appear in different operational modes with different probabilities.
SRE: Operational Profile
An operational profile is a complete set of operations with their probabilities of occurrence (during the operational use of the software).
An operational profile is a description of the distribution of input events
that is expected to occur in actual software operation. The operational profile of the software reflects how it will be used in
SRE: System Operational Profile
A system operational profile must be developed for all of the system's important operational modes.
There are four principal steps in developing an operational profile:
  1. Identify the operation initiators (i.e., user types, external systems, and the system itself)
  2. List the operations invoked by each initiator
  3. Determine the occurrence rates
  4. Determine the occurrence probabilities by dividing the occurrence
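The last two steps can be illustrated with a short Python sketch. It assumes that each occurrence probability is the operation's occurrence rate divided by the total occurrence rate of all operations, which is an assumption here because the slide text is cut off. The operation names loosely follow the Web proxy example and the rates are invented.

```python
def operational_profile(occurrence_rates: dict[str, float]) -> dict[str, float]:
    """Turn occurrence rates into probabilities by normalizing with the total rate.

    Assumption: probability = operation rate / sum of all operation rates.
    """
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    return {op: rate / total for op, rate in occurrence_rates.items()}

# Hypothetical occurrence rates (operations per hour) for a Web proxy server
rates = {
    "connect_to_web": 600,
    "email_internal_to_external": 200,
    "email_external_to_internal": 150,
    "dns_request": 50,
}
for operation, probability in operational_profile(rates).items():
    print(f"{operation}: {probability:.2f}")
```

Keeping one such table per operational mode makes it easy to see that the same operation can carry different probabilities in different modes.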
Types of Test
Certification Test: Accept or reject (binary decision) an acquired component for a given target failure intensity.
Feature (Unit) Test: A single execution of an operation with interaction between operations minimized.
Load Test: Testing with field use data and accounting for interactions.
Regression Test: Feature tests after every build involving significant change, i.e., check whether a bug fix worked.
Plot each new failure as it occurs on a reliability demonstration chart.
Accept or reject software (operations) using the reliability demonstration chart.
Track reliability growth as faults are removed.
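One common way to build such a chart is a sequential probability ratio test, in which each failure is plotted by its number and its normalized time (failure time multiplied by the failure intensity objective). The sketch below assumes that construction; the slides do not give the chart's boundaries, so the risk levels `alpha` (supplier risk), `beta` (consumer risk) and the discrimination ratio `gamma`, together with their default values, are assumptions.

```python
import math

def demonstration_decision(n: int, normalized_time: float,
                           alpha: float = 0.1, beta: float = 0.1,
                           gamma: float = 2.0) -> str:
    """Classify the n-th failure as 'accept', 'reject' or 'continue'.

    normalized_time = failure time * failure intensity objective.
    alpha, beta and gamma are assumed settings, not values from the lecture.
    """
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    a = math.log(beta / (1 - alpha))
    b = math.log((1 - beta) / alpha)
    accept_at = (a - n * math.log(gamma)) / (1 - gamma)  # upper boundary in time
    reject_at = (b - n * math.log(gamma)) / (1 - gamma)  # lower boundary in time
    if normalized_time >= accept_at:
        return "accept"
    if normalized_time <= reject_at:
        return "reject"
    return "continue"

print(demonstration_decision(1, 3.0))  # accept
print(demonstration_decision(2, 1.0))  # continue
print(demonstration_decision(4, 0.5))  # reject
```

Failures that arrive late relative to the objective push the plot toward the accept region, while many early failures push it toward reject; anything in between means testing continues.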
Collect Field Data
SRE for the software product lifecycle.
Collect field data to use in succeeding releases, either using automatic reporting routines or manual collection, using a random sample of field sites.
Collect data on failure intensity and on customer satisfaction and use this information in setting the failure intensity objective for the next release.
Measure operational profiles in the field and use this information to correct the operational profiles we estimated.
Collect information to refine the process of choosing reliability strategies in future projects.
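Correcting an estimated operational profile with field measurements can be sketched as follows. The function names, the tolerance, and all numbers are hypothetical; the sketch simply derives a measured profile from field counts and reports the operations whose probability moved by more than the tolerance.

```python
def measured_profile(field_counts: dict[str, int]) -> dict[str, float]:
    """Occurrence probabilities derived from operation counts observed in the field."""
    total = sum(field_counts.values())
    if total <= 0:
        raise ValueError("no field data collected")
    return {op: count / total for op, count in field_counts.items()}

def profile_drift(estimated: dict[str, float], measured: dict[str, float],
                  tolerance: float = 0.05) -> dict[str, float]:
    """Operations whose measured probability differs from the estimate by more than tolerance."""
    drift = {}
    for op in sorted(set(estimated) | set(measured)):
        delta = measured.get(op, 0.0) - estimated.get(op, 0.0)
        if abs(delta) > tolerance:
            drift[op] = round(delta, 6)
    return drift

# Hypothetical estimate made before release and counts collected afterwards
estimated = {"web": 0.6, "email": 0.3, "dns": 0.1}
field = measured_profile({"web": 7000, "email": 2000, "dns": 1000})
print(profile_drift(estimated, field))  # {'email': -0.1, 'web': 0.1}
```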
However …
Practical implementation of an effective SRE program is a non-trivial task.
Mechanisms for collection and analysis of data on software product and process quality must be in place.
Fault identification and elimination techniques must be in place.
Other organizational abilities such as the use of reviews and inspections, reliability-based testing and software process improvement are also necessary for effective SRE.