SENG 637: Dependability, Reliability & Testing of Software Systems
SRE Deployment (Chapter 10)
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG637/
Contents
- Quality in requirements phase
- Quality in design & implementation, testing & release phases
- Software Quality Assurance (SQA) and Software Reliability Engineering (SRE)
- Quality, test and data plans
- Roles and responsibilities
- Sample quality and test plan
- Defect reporting procedure
- Best practices of SRE
Quality Challenges
- Modern software systems are required to meet several quality attributes such as: modifiability, performance, security, interoperability, portability, reliability, etc.
- Questions for any particular system:
  - What precisely do these quality attributes mean?
  - Can a system be analyzed to determine desired qualities?
  - How soon can such an analysis occur?
  - How do you know if the design is suitable without …
References
- Software Architecture Technology Initiative of the SEI: http://www.sei.cmu.edu/architecture/
- ATAM: Method for Architecture Evaluation (2000), Rick Kazman, Mark Klein, Paul Clements, Technical Report, CMU/SEI-2000-TR-004.
- CBAM: Making Architecture Design Decisions: An Economic Approach (2002), Rick Kazman, Jai Asundi, Mark Klein, Technical Report, CMU/SEI-2002-TR-035.
- Software Architecture in Practice, 2nd ed., Len Bass, Paul Clements, Rick Kazman, Addison-Wesley, 2003.
- Evaluating Software Architectures: Methods and Case Studies, Paul Clements, Rick Kazman, Mark Klein, …
What is Reliable Software?
- Reliable software products are those that run correctly and consistently, have fewer remaining defects, handle abnormal situations properly, and need less installation effort.
- The remaining defects should not affect the normal behaviour and use of the software; they will not do anything destructive to the system and its hardware or software environment, and will rarely be evident to the users.
- Developing reliable software requires:
  - Establishing a Software Quality System (SQS) and Software … technology

(John W. Horch: Practical Guide to Software Quality Management)
Software Quality Assurance (SQA)
- Software Quality Assurance (SQA) is a planned and systematic approach to ensure that both the software process and the software product conform to the established standards, processes, and procedures.
- The goals of SQA are to improve software quality by monitoring both the software and the development process to ensure full compliance with the established standards and procedures.
- Steps to establish an SQA program:
  - Get top management's agreement on its goal and support.
  - Identify SQA issues, write the SQA plan, establish standards and SQA functions, implement the SQA plan, and evaluate the SQA program.

(John W. Horch: Practical Guide to Software Quality Management)
SRE: Who is Involved?
- Senior management
- Test coordinator (manager)
- Data coordinator (manager)
- Customer or user
SRE: Management Concerns
- Perception and specification of a customer's real needs.
- Translation of the specification into a conforming design.
- Maintaining conformity throughout the development processes.
- Product and sub-product demonstrations which provide convincing indications that the product and project have met their requirements.
- Ensuring that the tests and demonstrations are designed and controlled so as to be both achievable and manageable.
Test Coordinator (Manager)
- The test coordinator is expected to ensure that every specific statement of intent in the product requirement, specification and design is matched by a well-designed (cost-effective, convincing, self-reporting, etc.) test, measurement or demonstration.
Data Coordinator (Manager)
- The data coordinator ensures that the physical and administrative structures for data collection exist and are documented in the quality plan, receives and validates the data during development, and, through analysis and communication, ensures that the meaning of the information is known to all, in time, for effective application.
Quality Plans /1
- The most promising mechanisms for gaining and improving predictability and controllability of software qualities are the quality plan and its subsidiary documents, including test plans and data (measurement) plans.
- The creation of the quality plan can be instrumental in raising project effectiveness and in preventing expensive and time-consuming misunderstandings during the project, and at release/acceptance time.

[Figure: the quality plan with its subsidiary test plan and data plan]
Quality Plan /2
- The quality plan and quality record provide guidelines for carrying out and controlling the following:
  - Requirement and specification management
  - Development processes
  - Documentation management
  - Design evaluation
  - Product testing
  - Data collection and interpretation
  - SRE-related …
Quality Plan /3
- Quality planning should be done at the very earliest point in a project, preferably before a final decision is made on feasibility, and before a software development contract is signed.
- The quality plan should be devised and agreed between all the concerned parties: senior management, software development management (both administrative and technical), the software development team, customers, and any involved general support functions such as resource management and …
Data (Measurement) Plan
- The data (measurement) plan prescribes:
  - What should be measured and recorded during a project;
  - How it should be checked and collated;
  - How it should be interpreted and applied.
- Data may be collected in several ways, within the specific project and beyond it.
- Ideally, there should be a higher level of data collection and application into which project data is fed.
Test Plan /1
- The purpose of the test plan is to ensure that all testing activities (including those used for controlling the process of development, and for indicating the progress of the project) are expected, are manageable, and are managed.
- Test plans are created as a subsection of, or as an associated document of, the quality plan.
- Test plans become progressively more detailed and expanded during a project.
- Each test plan defines its own objectives and scope, and the means and methods by which the objectives are expected to be met.
Test Plan /2
- For the software product, the test plan is usually restricted by the scope of the test: certification, feature and load test.
- The plan predicts the resources and means required to reach the required levels of assurance about the end products, and the scheduling of all testing, measuring and demonstration activities.
- Tests, measurements and demonstrations are used to establish that the software product satisfies the requirements document, and that each process during development is carried out correctly and results in acceptable outcomes.
Sample SQS Plan (cont’d) /2
4 Documentation
  4.1 Purpose
  4.2 Minimum Documentation
    4.2.1 Software Requirements Specification
    4.2.2 Software Design Description
    4.2.3 Software Verification and Validation Plan
    4.2.4 Software Verification and Validation Report
    4.2.5 User Documentation
    4.2.6 Configuration Management Plan
  4.3 Other Documentation
Sample SQS Plan (cont’d) /4
6 Review and Audits
  6.1 Purpose
  6.2 Minimum Requirements
    6.2.1 Software Requirements Review
    6.2.2 Preliminary Design Review
    6.2.3 Critical Design Review
    6.2.4 Software Verification and Validation Review
    6.2.5 Functional Audit
    6.2.6 Physical Audit
    6.2.7 In-process Reviews
    6.2.8 Managerial Reviews
    6.2.9 Configuration Management Plan Review
    6.2.10 Postmortem Review
Sample SQS Plan (cont’d) /5
7 Test
8 Problem Reporting and Corrective Action
  8.1 Practices and Procedures
  8.2 Organizational Responsibilities
9 Tools, Techniques, and Methodologies
10 Code Control
11 Media Control
12 Supplier Control
13 Records Collection, Maintenance, and Retention
14 Training
15 Risk Management
Sample Test Plan (cont’d) /4
7 Item Pass/Fail Criteria
8 Suspension Criteria and Resumption Requirements
Practice of SRE /1
- The practice of SRE provides the software engineer or manager the means to predict, estimate, and measure the rate of failure occurrences in software.
- Using SRE in the context of Software Engineering, one can (hopefully!):
  - Analyze, manage, and improve the reliability of software products.
  - Balance customer needs for competitive price, timely delivery, and a reliable product.
  - Determine when the software is good enough to release to customers, minimizing the risks of releasing software with serious problems.
  - Avoid excessive time to market due to overtesting.
Implementing SRE /4
- Post-delivery and maintenance:
  - Project post-release staff needs
  - Monitor field reliability vs. objectives
  - Track customer satisfaction with reliability
  - Time new feature introduction by monitoring reliability
  - Guide product and process improvement with reliability measures
Activity 1: Define failure from the customer's perspective
- Group identified failures into severity classes from the customer's perspective
- Usually 3-4 classes are sufficient
Activity 2: Identify customer reliability needs
- What is the level of reliability that the customer needs?
- Who are the rival companies, what are the rival products, and what is their reliability?
Activity 3: Determine operational profile
- Based on the tasks performed and the environmental factors
- Components, hardware and other systems: determine which systems and components are involved and how they affect the overall system reliability
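An operational profile is, in essence, a set of operations with field-occurrence probabilities. The sketch below (operation names and probabilities are hypothetical, not from the course material) shows how such a profile can drive both test-budget allocation and random test selection:

```python
import random

# Hypothetical operational profile: each operation's probability of
# occurrence in the field (probabilities must sum to 1).
profile = {
    "process_sample":  0.40,
    "generate_report": 0.25,
    "update_setpoint": 0.20,
    "operator_login":  0.10,
    "system_restart":  0.05,
}

def allocate_tests(profile, total_tests):
    """Allocate a test budget to operations in proportion to their
    field occurrence probabilities (rounding down; any remainder
    goes to the most frequent operation)."""
    alloc = {op: int(p * total_tests) for op, p in profile.items()}
    alloc[max(profile, key=profile.get)] += total_tests - sum(alloc.values())
    return alloc

def next_operation(profile, rng=random):
    """Randomly select the next operation to test, weighted by the profile."""
    ops, probs = zip(*profile.items())
    return rng.choices(ops, weights=probs, k=1)[0]

print(allocate_tests(profile, 200))
# → {'process_sample': 80, 'generate_report': 50, 'update_setpoint': 40,
#    'operator_login': 20, 'system_restart': 10}
```

Weighting test selection this way is what makes reliability measured in test predictive of reliability in the field.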
Activity 7: Engineer to meet reliability objectives
- Plan using fault tolerance, fault removal and fault avoidance

Activity 8: Focus resources based on operational profile
- The operational profile guides the designer to focus on features that are supposed to be more critical
- Develop more critical functions first and in more detail
- … software, hardware and other systems
- Certification test using reliability demonstration chart
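A reliability demonstration chart is a sequential probability ratio test: as failures occur, the cumulative normalized test time (failure-intensity objective × test time) is plotted against accept/reject boundaries. A minimal sketch, assuming the common defaults of discrimination ratio 2 and 10% consumer/supplier risks:

```python
import math

def demo_chart_boundaries(n, gamma=2.0, alpha=0.1, beta=0.1):
    """Reject/accept boundaries after n observed failures, in
    normalized units tau = failure-intensity objective x test time."""
    A = (1 - beta) / alpha   # likelihood-ratio threshold for "reject"
    B = beta / (1 - alpha)   # likelihood-ratio threshold for "accept"
    reject_tau = (n * math.log(gamma) - math.log(A)) / (gamma - 1)
    accept_tau = (n * math.log(gamma) - math.log(B)) / (gamma - 1)
    return reject_tau, accept_tau

def classify(n, tau, **kw):
    """Decide whether testing can stop at (n failures, normalized time tau)."""
    reject_tau, accept_tau = demo_chart_boundaries(n, **kw)
    if tau >= accept_tau:
        return "accept"      # reliability objective demonstrated
    if tau <= reject_tau:
        return "reject"      # objective not met; fix and retest
    return "continue"        # not enough evidence yet; keep testing

print(classify(0, 2.3))  # → accept: a long failure-free period
print(classify(5, 1.0))  # → reject: failures arriving too fast
```

The chart simply plots these two lines; each failure is a point, and the decision is read off from which region the point lands in.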
Activity 10: Manage fault introduction and propagation
- Practicing a development methodology; constructing a modular system; employing reuse; conducting inspection and review; controlling change
System Test Phase
Activity 11: Determine operational profile used for testing
- Decide upon critical operations
- Decide upon need for multiplicity of operational profiles
Activity 12: Conduct reliability growth testing
Activity 13: Track testing progress and certify that reliability objectives are met
- Conduct feature test, regression test, and performance and …
- Check accuracy of test: time and coverage
- Plan for changes in test strategies and methods
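Tracking reliability growth is commonly done with Musa's basic execution-time model, in which failure intensity decays linearly with failures experienced. The sketch below uses the model's standard formulas with made-up parameter values (not course data) to estimate how much more testing is needed to reach an objective:

```python
import math

def additional_failures(lam0, nu0, lam_present, lam_target):
    """Expected additional failures to move from the present failure
    intensity to the target, under the basic execution-time model
    lambda(mu) = lam0 * (1 - mu / nu0)."""
    return (nu0 / lam0) * (lam_present - lam_target)

def additional_exec_time(lam0, nu0, lam_present, lam_target):
    """Expected additional execution time to reach the target intensity."""
    return (nu0 / lam0) * math.log(lam_present / lam_target)

# Illustrative numbers: 100 total expected failures (nu0), initial
# intensity 20 failures/CPU-hr (lam0), currently at 4, objective 2.
print(additional_failures(20, 100, 4, 2))   # → 10.0 more failures
print(additional_exec_time(20, 100, 4, 2))  # → ~3.47 more CPU-hr
```

Plotting estimated intensity against the objective as testing proceeds is exactly the "track testing progress" step above.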
Activity 15: Certify that reliability objectives and release criteria are met
- Check accuracy of data collection
- Check whether the test operational profile reflects the field operational profile
- Check that the customer's definition of failure matches with what …
Post Delivery Phase /2
Activity 19: Time new feature introduction by monitoring reliability
- New features bring new defects. Add new features desired by the customers if they can be managed without sacrificing the reliability of the whole system.

Activity 20: Guide product and process improvement with reliability measures
- Root-cause analysis for the faults
- Why the fault was not detected earlier in the development phase, and what should be done to reduce the probability of introducing similar faults
Existing vs. New Projects
- There is no essential difference between new and existing projects in applying SRE for the first time. However, determining the failure intensity objective and the operational profile is easier for existing projects.
- Most of the SRE activities will require only small updates after they have been completed once; e.g., the operational profile should only be updated for the new operations added (remember the interaction factor).
- After SRE has been applied to one release, less effort is needed for succeeding releases; e.g., new test cases should be added to the existing ones.
Short-Cycle Projects
- Small projects or releases, or those with short development cycles, may require a modified set of SRE activities to keep costs low or activity durations short.
- Reduction in cost and time can be obtained by limiting the number of elements in the operational profile and by accepting less precision.
- Example: setting one operational mode and performing certification test rather than reliability growth test.
Cost Concerns
- There may be a training cost when starting to apply SRE.
- The principal cost in applying SRE is determining the operational profile.
- Another cost is associated with processing and analyzing failure data during reliability growth test.
- As most projects have multiple releases, the SRE cost drops sharply after the initial release.
Practice Variation
- Defining an operational profile based on "customer modeling".
- Automatic test case generation based on the frequency of use reflected in the operational profile.
- Employing "cleanroom" development techniques together with feature and certification testing.
- Automatic tracking of reliability growth.
Conclusions
- Practical implementation of an effective SRE program is a non-trivial task.
- Mechanisms for collection and analysis of data on software product and process quality must be in place.
- Fault identification and elimination techniques must be in place.
- Other organizational abilities, such as the use of reviews and inspections, reliability-based testing, and software process improvement, are also necessary for effective SRE.
- A quality-oriented mindset and training are necessary!
RAM Analysis
- A collection of numerical analysis techniques that quantifies the reliability, availability and maintainability of a complex system.
- RAM analysis helps us answer questions related to dependability (i.e., reliability, safety, availability and maintainability) of the system.
RAM: Advantages & Uses
Can be used to understand:
- Operation of the system: system reliability versus through-put rate requirements
- Safety of the system: identifiable failure modes which present an unacceptable consequence to facility workers or the public
- Improvements that can have substantial impacts on system performance: recommendations for improving the safety and reliability of equipment/processes
RAM: Data Requirements
- Failure data
- Maintenance data
- Reliability and availability data from recognized industry standards (MTTF, MTBF & MTTR)
- Data collection requires:
  - Engineering experience and judgment
  - Interviews with engineering and maintenance personnel at the system site
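The standard relations between these three measures can be sketched as follows (the 2000 h / 8 h figures are hypothetical, chosen only to illustrate the arithmetic):

```python
def mtbf(mttf, mttr):
    """Mean time between failures: one full fail-and-repair cycle."""
    return mttf + mttr

def steady_state_availability(mttf, mttr):
    """Long-run fraction of time a repairable system is up."""
    return mttf / (mttf + mttr)

# Hypothetical controller unit: fails on average every 2000 h of
# operation, takes 8 h to repair.
A = steady_state_availability(2000, 8)
print(f"MTBF = {mtbf(2000, 8)} h, availability = {A:.4f}")
# → MTBF = 2008 h, availability = 0.9960
```

These formulas are why both failure data (MTTF) and maintenance data (MTTR) are needed: availability cannot be computed from either alone.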
Background
- The City of Calgary invested $100 million in the 1994 expansion of the Bonnybrook Wastewater Treatment Plant (WTP) to serve Calgary's growing population, which was 767,000 in 1996.
- This expansion increased the plant capacity by 25% to 500,000 cubic metres per day, while incorporating state-of-the-art treatment technologies.
- This study was performed in order to provide the City with an assessment of the quality of the Distributed Control Systems (DCS) of the Bonnybrook WTP, to be used as a guide for the next WTP plant at Pine Creek.
Background (cont’d)
- The City's WTP DCS is real-time, mission-critical, dependable, safe and secure.
- However, the current quality measures for the City of Calgary's WTP DCS are unknown.
- To successfully improve safety and reliability for the next generation of WTP, to be built at Pine Creek, a study of the current level of reliability and safety of the existing Bonnybrook WTP plant was prudent.
Assumptions & Hints
a) Deal with both hardware (mechanical and electrical) and software failures
b) Only deal with "failures", not mandatory preventative maintenance or minor repairs where no components are replaced
c) Components whose function is to wear and/or fail after a certain period of time (e.g., batteries), and regularly replaced items, are not included in the analysis
d) Probes, gauges, or transmitters whose purpose is to provide information to the user are not included
Why RAM?
Reasons for conducting RAM analysis for Bonnybrook WWTP:
- Better understand the system (system configuration)
- Better understand the impact of failures/faults of components on the system
- Establish groundwork for Reliability-Availability-Maintainability measurement
- Study the method of data collection and fault/maintenance record keeping
- Design and develop a tool to perform what-if scenarios
RAM: Current Scenario
Current scenario at Bonnybrook WWTP:
- Reliability of components, and of the system as a whole, is not measured
- An established method to measure system reliability needs to be put in place
RAM: FTA
- Fault tree analysis is a graphical representation of the major (critical) failures associated with a product, the causes for the faults, and potential countermeasures.
Analysis: Inside a DCU
- Contains serial and parallel subsystems
- Configuration affects total system reliability
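How configuration affects total reliability can be sketched with the standard series/parallel formulas for independent components. The component reliabilities below are hypothetical, not measured Bonnybrook data:

```python
from math import prod

def r_series(reliabilities):
    """A series chain works only if every component works."""
    return prod(reliabilities)

def r_parallel(reliabilities):
    """A parallel (redundant) group fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical DCU: two redundant I/O cards (each R = 0.95) in series
# with a CPU card (R = 0.99) and a power supply (R = 0.98).
io_group = r_parallel([0.95, 0.95])          # 0.9975
system   = r_series([io_group, 0.99, 0.98])
print(round(system, 4))                       # → 0.9678
```

Note the effect of the layout: the same two 0.95 cards placed in series instead would give 0.9025, dragging the system well below any single component — which is exactly why the serial-vs-parallel question recurs in the analysis.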
What we know:
- The exact layout of the DCS (inside-out)

What we would like to see:
- More relevant failure data
- More maintenance data
- Actual reliability of the current system
- Failure modes and their effects (FMEA)
- Cost and impact of "minor" failures on the overall system
- Change in cost/reliability with a change in configuration
- Is the system serial or parallel? Are the components inside each DCU serial or parallel?
- Can a change in layout change the performance?
- Is the current system/configuration fit to be used in the next projects?
From an engineering point of view
Current:
- Can understand what the system looks like inside-out
- Can use the current system as a benchmark for future systems' performance
- Can change components and see their effects on reliability
In future:
- Can be used to pinpoint single points of failure
- Can be used to effectively plan redundancy and refrain from "over-engineering" and overspending (spending can be made at the right place to complement reliability and availability)
- Can help perform what-if scenario evaluation

From a management point of view
- Can help the planning and design of future projects and plants
- Can help perform cost-value analysis on maintenance vs. replacement
- Can help make better decisions on system/subsystem/component purchase based on reliability data and impact on performance
- Can help compare systems/subsystems/components from several vendors
- Can be used to plan the procedures that need to be in place for data collection, maintenance, as well as analysis purposes
What Was Accomplished
- Analyzed data to identify the proper distribution that models the failure data
- Performed goodness-of-fit and bias tests (using reliability demonstration charts, fault tree analysis, etc.) to validate the distribution fit
- Estimated current system reliability
- Based on these:
  - A reliability calculation chart to perform what-if analysis for various units of the system was developed
  - A list of recommendations for reliability improvements of the WTP's DCS was produced
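The first two steps — fitting a distribution to failure data and checking goodness of fit — can be sketched as follows. The inter-failure times are invented for illustration, and the 1.36/√n critical value is only a rough guide (it is optimistic when the rate is estimated from the same data; a Lilliefors-style correction would be stricter):

```python
import math

def fit_exponential(times):
    """Maximum-likelihood fit of an exponential distribution to
    inter-failure times: rate = 1 / mean."""
    return len(times) / sum(times)

def ks_statistic(times, rate):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    data and the fitted exponential CDF."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

# Illustrative inter-failure times in hours (made up for the sketch).
times = [12, 5, 30, 22, 8, 17, 41, 3, 26, 14]
rate = fit_exponential(times)
D = ks_statistic(times, rate)
crit = 1.36 / math.sqrt(len(times))   # rough 5% critical value
print(f"rate={rate:.3f}/h, D={D:.3f}, fit {'ok' if D < crit else 'rejected'}")
```

The same recipe applies with other candidate distributions (Weibull, lognormal); only the fitted CDF inside `ks_statistic` changes.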