SENG 521 Software Reliability & Software Quality
Chapter 8: System Reliability
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG521
Source PDF: people.ucalgary.ca/~far/Lectures/SENG521/PDF/SENG521-08.pdf
Dependability Models
Dependability models capture conditions that make a system fail in terms of the structural relationships between the system components.
A system usually consists of components. Each component consists of sub-components.
Components may have:
  Different reliability
  Different dependencies among each other
System reliability is a function of the reliabilities of the (sub-)components and of the relationships between the components.
Reliability Block Diagram (RBD)
A Reliability Block Diagram (RBD) is a graphical representation of how the components of a system are connected from a reliability point of view.
Reliability of the system is derived in terms of the reliabilities of its individual components.
The most common configurations of an RBD are the series and parallel configurations. A system is usually composed of combinations of serial and parallel configurations.
RBD analysis is essential for determining reliability, availability and down time of the system.
In a serial system configuration, the elements must all work for the system to work, and the system fails if one of the components fails. The overall reliability of a serial system is lower than the reliability of its individual components.
In a parallel configuration, the components are considered to be redundant, and the system ceases to work only if all the parallel components fail. The overall reliability of a parallel system is higher than the reliability of its individual components.
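The contrast between the two configurations can be checked numerically. The following is a minimal sketch, assuming the components fail independently; the function names and the three component reliabilities of 0.9 are hypothetical and chosen only for illustration.

```python
from math import prod


def series_reliability(reliabilities):
    """Serial configuration: every component must work."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel configuration: the system fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical component reliabilities over a common interval.
components = [0.9, 0.9, 0.9]

r_series = series_reliability(components)
r_parallel = parallel_reliability(components)

print(f"series:   {r_series:.3f}")    # lower than any single component
print(f"parallel: {r_parallel:.3f}")  # higher than any single component
```

With these values the serial arrangement falls below 0.9 and the parallel arrangement rises above it, which is the behaviour described in the text.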
1. Define boundary of the system for analysis
2. Break system into functional components
3. Determine serial-parallel combinations
4. Represent each component as a separate block in the diagram
5. Draw lines connecting the blocks in a logical order for mission success
Combining Reliabilities
Serial system reliability can be calculated from component reliabilities, if the components fail independently of each other.
For serial systems:
$$R = \prod_{k=1}^{Q_p} R_k \quad \text{and} \quad \lambda = \sum_{k=1}^{Q_p} \lambda_k$$
  $Q_p$: number of components
  $R_k$: component reliability
  $\lambda_k$: component failure rate
Component reliabilities ($R_k$) must be expressed with respect to a common interval.
A serial system never has higher reliability than any of its components (because $R_k \le 1$).
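The product rule for reliabilities and the sum rule for failure rates agree when each component has a constant failure rate, since then $R_k(t) = e^{-\lambda_k t}$. The sketch below illustrates this under that assumption; the rates, the interval `t`, and the function names are hypothetical.

```python
from math import exp, prod


def series_failure_rate(rates):
    """Failure rate of a serial system with independent, constant-rate components."""
    return sum(rates)


def reliability(rate, t):
    """Reliability over an interval t for a constant failure rate (exponential model)."""
    return exp(-rate * t)


# Hypothetical component failure rates (per hour) and a common interval (hours).
rates = [0.01, 0.02, 0.005]
t = 10.0

r_components = [reliability(lam, t) for lam in rates]
r_system_product = prod(r_components)                       # product of R_k
r_system_rate = reliability(series_failure_rate(rates), t)  # exp(-sum(lambda_k) * t)

print(f"product of component reliabilities: {r_system_product:.6f}")
print(f"from summed failure rate:           {r_system_rate:.6f}")
```

Both routes give the same system reliability, and that value is no larger than the smallest component reliability.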
Example 1
In reliability prediction for an assembled system we usually use a “bottom-up” approach: estimate the failure rate for each subsystem and then combine the failure rates for the entire assembly. The following figure illustrates a system in which the subsystems A, B, C, and D are in a serial configuration. Each subsystem is composed of several parts which are connected in series or in parallel as shown.
The failure rates of serial parts 1, 2, and 3 are 0.1, 0.3, and 0.5 per hour, respectively. Determine the failure rate and MTTF for subsystem A.
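One way to work subsystem A, assuming the three parts fail independently with constant failure rates: the serial failure rates add, and MTTF is the reciprocal of the combined rate. The variable names are illustrative only.

```python
# Failure rates of serial parts 1, 2, and 3 (per hour), from the example.
rates_a = [0.1, 0.3, 0.5]

# Serial parts with constant, independent failure rates: the rates add.
lambda_a = sum(rates_a)

# For a constant failure rate, MTTF is the reciprocal of the rate.
mttf_a = 1.0 / lambda_a

print(f"failure rate of A: {lambda_a:.2f} per hour")  # 0.90 per hour
print(f"MTTF of A:         {mttf_a:.2f} hours")       # 1.11 hours
```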
Example 1 (cont’d)
The failure rates of parallel parts 4, 5, and 6 in subsystem B are 0.2, 0.4, and 0.25 per hour, respectively. Determine the failure rate and MTTF for subsystem B.
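A sketch of one way to work subsystem B, assuming active redundancy with independent, constant-rate parts. The reliability is then $R_B(t) = 1 - \prod_i (1 - e^{-\lambda_i t})$, and integrating it term by term gives the MTTF by inclusion-exclusion. A parallel group does not have a constant failure rate, so the single "failure rate" reported here is taken as $1/\mathrm{MTTF}$, which is an assumption about what the exercise intends. The function name is hypothetical.

```python
from itertools import combinations


def parallel_mttf(rates):
    """MTTF of an active-parallel group of independent, constant-rate parts.

    Integrates R(t) = 1 - prod(1 - exp(-rate_i * t)) from 0 to infinity,
    which expands into an inclusion-exclusion sum over subsets of parts.
    """
    total = 0.0
    for size in range(1, len(rates) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(rates, size):
            total += sign / sum(subset)
    return total


# Failure rates of parallel parts 4, 5, and 6 (per hour), from the example.
rates_b = [0.2, 0.4, 0.25]

mttf_b = parallel_mttf(rates_b)
lambda_b = 1.0 / mttf_b  # equivalent constant rate (assumption, see lead-in)

print(f"MTTF of B:                    {mttf_b:.2f} hours")      # 7.25 hours
print(f"equivalent failure rate of B: {lambda_b:.3f} per hour")  # 0.138 per hour
```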
Example 1 (cont’d)
The failure rate of each parallel part 7 is constant at 0.2 per hour. Determine the failure rate and MTTF for subsystem C, in which at least 2 out of 3 items must be working.
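A sketch of one way to work subsystem C, assuming three identical, independent items with a constant failure rate $\lambda = 0.2$ per hour. For a 2-out-of-3 system, $R_C(t) = 3e^{-2\lambda t} - 2e^{-3\lambda t}$, and in general the MTTF of a k-out-of-n group of identical exponential parts is $\frac{1}{\lambda}\sum_{i=k}^{n} \frac{1}{i}$. As with any redundant group the failure rate is not constant, so the value reported is $1/\mathrm{MTTF}$, which is an assumption. The function name is hypothetical.

```python
def k_out_of_n_mttf(k, n, rate):
    """MTTF when at least k of n identical, independent, constant-rate parts must work."""
    return sum(1.0 / (i * rate) for i in range(k, n + 1))


# Subsystem C: at least 2 of 3 items must work, each with rate 0.2 per hour.
mttf_c = k_out_of_n_mttf(2, 3, 0.2)
lambda_c = 1.0 / mttf_c  # equivalent constant rate (assumption, see lead-in)

print(f"MTTF of C:                    {mttf_c:.2f} hours")      # 4.17 hours
print(f"equivalent failure rate of C: {lambda_c:.2f} per hour")  # 0.24 per hour
```

Note that the 2-out-of-3 MTTF (about 4.17 hours) is shorter than the MTTF of a single item (5 hours): this kind of redundancy improves reliability over short missions rather than mean life.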
Example 2
The fastening of two mechanical parts in an automobile brake system should be reliable. It is done by means of two flanges which are pressed together with 4 bolt and nut pairs, E1 to E4, placed 90 degrees to each other.
[Figure: flange with bolt and nut pairs E1, E2, E3, E4]
Experience shows that the fastening holds when at least one set of opposite bolt and nut pairs works, i.e., (E1 and E2) or (E3 and E4).
b) Compute the reliability of the fixation described in (a) if all bolt and nut pairs have a reliability of R = 0.90 for 50K miles of operation.
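Part (b) can be worked from the stated success condition, (E1 and E2) or (E3 and E4): two serial branches of two pairs each, placed in parallel. The sketch below assumes the four bolt and nut pairs fail independently; the variable names are illustrative.

```python
# Reliability of each bolt and nut pair for 50K miles, from the example.
r = 0.90

# Each branch (E1 and E2, or E3 and E4) is serial: both pairs must hold.
r_branch = r * r

# The two branches are in parallel: the fixation fails only if both branches fail.
r_fixation = 1.0 - (1.0 - r_branch) ** 2

print(f"branch reliability:   {r_branch:.4f}")    # 0.8100
print(f"fixation reliability: {r_fixation:.4f}")  # 0.9639
```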
RBD: When and How
When to use RBD?
  When the reliability of a complex system must be calculated and the reliability-wise weaknesses of the system must be identified.
How to use RBD?
  Draw the RBD (sometimes not that simple!).
  Calculate the system reliability using the RBD.
  Perform calculations such as availability and downtime.
There are a number of automated tools, integrated with other methods such as Fault Tree Analysis (FTA), to generate the diagram and to analyze it.
RBD: Benefits & Limitations
RBD is the simplest way of visualizing the reliability of complex systems.
The benefits of the RBD are:
  Establishes reliability goals
  Evaluates component failure impact on overall system safety
  Provides a basis for “what if” analysis
  Allocates component reliability by calculating system MTBF
  Provides cost savings in large system trouble-shooting
  Estimates system reliability
  Analyzes various system configurations in trade-off studies
  Identifies potential design problems
  Determines system sensitivity to component failures
Disadvantages are that some complex constructs, such as standby, branching and load sharing, cannot be clearly represented.
Hazard Analysis
“A common mistake in engineering, is to put too much confidence in software. There seems to be a feeling among non-software professionals that software will not or cannot fail, which leads to complacency and over reliance on computer functions.”
Leveson, N.G., Safeware: System Safety and Computers, Addison-Wesley, 1995.
Hazard Analysis
Goal
  Identify events that may eventually lead to accidents
  Determine impact on system
Techniques
  FMEA: Failure Modes and Effects Analysis
  FMECA: Failure Modes, Effects and Criticality Analysis
  ETA: Event Tree Analysis
  FTA: Fault Tree Analysis
  HAZOP: HAZard and OPerability studies
FMEA
FMEA is a technique to identify and prioritize how systems fail, and to identify the effects of failure.
FMEA is used when:
  Designing products or processes, to identify and avoid failure-prone designs.
  Investigating why existing systems have failed, to identify possible causes.
  Investigating possible solutions, to help select one with an acceptable risk.
  Planning actions, in order to identify risks in the plan and hence identify countermeasures.
Identification: How (i.e., in what ways) can this element fail (failure modes)?
Ramification: What will happen to the system and its environment if this element does fail in each of the ways available to it (failure effects)?
Prevention: What needs to be done to prevent or mitigate the problem?
P = Probability (chance) of occurrence
S = Seriousness (impact) of failure
D = Likelihood that the defect will reach the customer
R = Risk priority measure (P x S x D)
1 = very low or none
2 = low or minor
3 = moderate or significant
4 = high
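The risk priority measure can be tabulated and sorted so that the highest-risk failure modes are addressed first. This is a minimal sketch using the 1 to 4 scale above; the `FailureMode` class and the three example failure modes with their scores are hypothetical, not taken from the lecture.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    p: int  # probability (chance) of occurrence, 1..4
    s: int  # seriousness (impact) of failure, 1..4
    d: int  # likelihood that the defect reaches the customer, 1..4

    def risk_priority(self) -> int:
        """R = P x S x D, with each factor on the 1..4 scale."""
        for score in (self.p, self.s, self.d):
            if not 1 <= score <= 4:
                raise ValueError(f"score {score} is outside the 1..4 scale")
        return self.p * self.s * self.d


# Hypothetical failure modes, for illustration only.
modes = [
    FailureMode("sensor reading lost", p=2, s=4, d=3),
    FailureMode("log file full", p=3, s=1, d=1),
    FailureMode("timeout too short", p=3, s=3, d=2),
]

# Highest risk priority first.
ranked = sorted(modes, key=lambda m: m.risk_priority(), reverse=True)
for mode in ranked:
    print(f"{mode.risk_priority():3d}  {mode.name}")
```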
FMEA cannot be done until design has proceeded to the point that system elements have been selected at the level the analysis is to explore. In a software system, this means after the software architecture is finalized (requirement volatility kills FMEA!).
FMEA can be done anytime in the system lifetime, from initial design onward.
FMEA: Summary
Method: from known or predicted failure modes of components, determine possible effects on system
Good for hazard identification early in development, by considering possible failures of system functions:
  loss of function (omission failure)
  function performed incorrectly
  function performed when not required (commission failure)
No good for concurrent multiple failures
No good when requirements change
Fault Tree Analysis (FTA)
Fault tree analysis is a graphical representation of the major faults or critical failures associated with a product, the causes for the faults, and potential countermeasures. FTA helps identify areas of concern for new system design or for improvement of existing systems. It also helps identify corrective actions to correct or mitigate problems.
FTA can also be defined as a graphic “model” of the pathways within a system that can lead to a foreseeable, undesirable event. The pathways interconnect contributory events and conditions, using standard logic symbols.
Fault tree analysis is useful both in designing new systems, products or services and in dealing with identified problems in existing products/services. As part of process improvement, it can be used to help identify root causes of trouble and to design remedies and countermeasures.
How to Use FTA?
1. Select a component for analysis. Draw a box at the top of the diagram and list the component inside.
2. Identify critical failures or “faults” related to the component. Using Failure Mode and Effect Analysis is a good way to identify faults during quality planning. For quality improvement, faults may be identified through Brainstorming or as the output of Cause and Effect Analysis.
3. Identify causes for each fault. List all applicable causes for faults in ovals below the fault. Connect the ovals to the appropriate fault box.
4. Work toward a root cause. Continue identifying causes for each fault until you reach a root or controllable cause.
5. Identify countermeasures for each root cause. Use Brainstorming or a modified version of Force Field Analysis to develop actions to counteract the root cause of each critical failure. Create boxes for each countermeasure, draw the boxes below the appropriate root cause, and link the countermeasure and cause.
Steps in FTA
FMEA (Failure Modes and Effects Analysis) determines the failure modes that are likely to cause failure events. Then it determines what single or multiple point failures could produce those top level events. FMEA asks the question, “What can go wrong?” even if the product meets specification. In order to perform FTA one first needs to perform FMEA.
FTA: Summary
Method: trace faults stepwise back through system design to possible causes
  a tree with a top event at the root
  logic gates at branches, linking each event with its “immediate” causes
  initiating faults at leaves (eventually)
Good for tracing system hazards to component failures, and allocating safety requirements
Good for systems with multiple failures
Good for checking completeness of safety requirements
Can be difficult, time-consuming, hard to maintain
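The tree structure in the summary (top event at the root, logic gates at branches, initiating faults at leaves) can also be evaluated quantitatively. The sketch below is one simple way to do that, assuming the basic events are independent and each appears only once in the tree; the pump and power events and their probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical initiating faults (leaves) and their probabilities.
p_power_loss = 0.01
p_primary_pump_fails = 0.05
p_backup_pump_fails = 0.10

# Intermediate event: both pumps fail (AND gate).
p_no_pumping = and_gate([p_primary_pump_fails, p_backup_pump_fails])

# Top event: power is lost OR both pumps fail.
p_top = or_gate([p_power_loss, p_no_pumping])

print(f"P(both pumps fail): {p_no_pumping:.5f}")  # 0.00500
print(f"P(top event):       {p_top:.5f}")         # 0.01495
```

When a basic event feeds more than one gate, the gate outputs are no longer independent and this bottom-up multiplication is not valid; minimal cut set methods are normally used in that case.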