SENG 637: Dependability, Reliability & Testing of Software Systems
SRE Deployment (Chapter 10)
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG637/
Contents
- Quality in requirements phase
- Quality in design & implementation, testing & release phases
- Software Quality Assurance (SQA) and Software Reliability Engineering (SRE)
- Quality, test and data plans
- Roles and responsibilities
- Sample quality and test plan
- Defect reporting procedure
- Best practices of SRE
Quality Challenges
- Modern software systems are required to meet several quality attributes such as: modifiability, performance, security, interoperability, portability, reliability, etc.
- Questions for any particular system:
  - What precisely do these quality attributes mean?
  - Can a system be analyzed to determine desired qualities?
  - How soon can such an analysis occur?
  - How do you know if the design is suitable without …
References
- Software Architecture Technology Initiative of the SEI: http://www.sei.cmu.edu/architecture/
- ATAM: Method for Architecture Evaluation (2000), Rick Kazman, Mark Klein, Paul Clements, Technical Report, CMU/SEI-2000-TR-004.
- CBAM: Making Architecture Design Decisions: An Economic Approach (2002), Rick Kazman, Jai Asundi, Mark Klein, Technical Report, CMU/SEI-2002-TR-035.
- Software Architecture in Practice, 2nd ed., Len Bass, Paul Clements, Rick Kazman, Addison-Wesley, 2003.
- Evaluating Software Architectures: Methods and Case Studies, Paul Clements, Rick Kazman, Mark Klein, …
What is Reliable Software?
- Reliable software products are those that run correctly and consistently, have fewer remaining defects, handle abnormal situations properly, and need less installation effort.
- The remaining defects should not affect the normal behaviour and use of the software; they will not do anything destructive to the system and its hardware or software environment, and will rarely be evident to the users.
- Developing reliable software requires:
  - Establishing a Software Quality System (SQS) and Software … technology

(John W. Horch: Practical Guide to Software Quality Management)
Software Quality Assurance (SQA)
- Software Quality Assurance (SQA) is a planned and systematic approach to ensure that both the software process and the software product conform to the established standards, processes, and procedures.
- The goals of SQA are to improve software quality by monitoring both the software and the development process to ensure full compliance with the established standards and procedures.
- Steps to establish an SQA program:
  - Get top management's agreement on its goal and support.
  - Identify SQA issues, write the SQA plan, establish standards and SQA functions, implement the SQA plan, and evaluate the SQA program.

(John W. Horch: Practical Guide to Software Quality Management)
SRE: Who is Involved?
- Senior management
- Test coordinator (manager)
- Data coordinator (manager)
- Customer or user
SRE: Management Concerns
- Perception and specification of a customer's real needs.
- Translation of the specification into a conforming design.
- Maintaining conformity throughout the development processes.
- Product and sub-product demonstrations which provide convincing indications that the product and project have met their requirements.
- Ensuring that the tests and demonstrations are designed and controlled so as to be both achievable and manageable.
Test Coordinator (Manager)
- The test coordinator is expected to ensure that every specific statement of intent in the product requirement, specification and design is matched by a well-designed (cost-effective, convincing, self-reporting, etc.) test, measurement or demonstration.
Data Coordinator (Manager)
- The data coordinator ensures that the physical and administrative structures for data collection exist and are documented in the quality plan, receives and validates the data during development, and, through analysis and communication, ensures that the meaning of the information is known to all, in time, for effective application.
Quality Plans /1
- The most promising mechanisms for gaining and improving predictability and controllability of software qualities are the quality plan and its subsidiary documents, including test plans and data (measurement) plans.
- The creation of the quality plan can be instrumental in raising project effectiveness and in preventing expensive and time-consuming misunderstandings during the project, and at release/acceptance time.

[Figure: the quality plan with its subsidiary test plan and data plan]
Quality Plan /2
- The quality plan and quality record provide guidelines for carrying out and controlling the following:
  - Requirement and specification management
  - Development processes
  - Documentation management
  - Design evaluation
  - Product testing
  - Data collection and interpretation
  - SRE-related …
Quality Plan /3
- Quality planning should be done at the very earliest point in a project, preferably before a final decision is made on feasibility, and before a software development contract is signed.
- The quality plan should be devised and agreed between all the concerned parties: senior management, software development management (both administrative and technical), the software development team, customers, and any involved general support functions such as resource management and …
Data (Measurement) Plan
- The data (measurement) plan prescribes:
  - What should be measured and recorded during a project;
  - How it should be checked and collated;
  - How it should be interpreted and applied.
- Data may be collected in several ways, within the specific project and beyond it.
- Ideally, there should be a higher level of data collection and application into which project data is fed.
Test Plan /1
- The purpose of the test plan is to ensure that all testing activities (including those used for controlling the process of development, and for indicating the progress of the project) are expected, are manageable, and are managed.
- Test plans are created as a subsection of, or as an associated document of, the quality plan.
- Test plans become progressively more detailed and expanded during a project.
- Each test plan defines its own objectives and scope, and the means and methods by which the objectives are expected to be met.
Test Plan /2
- For the software product, the test plan is usually restricted by the scope of the test: certification, feature and load test.
- The plan predicts the resources and means required to reach the required levels of assurance about the end products, and the scheduling of all testing, measuring and demonstration activities.
- Tests, measurements and demonstrations are used to establish that the software product satisfies the requirements document, and that each process during development is carried out correctly and results in acceptable outcomes.
Sample SQS Plan (cont’d) /2
4 Documentation
  4.1 Purpose
  4.2 Minimum Documentation
    4.2.1 Software Requirements Specification
    4.2.2 Software Design Description
    4.2.3 Software Verification and Validation Plan
    4.2.4 Software Verification and Validation Report
    4.2.5 User Documentation
    4.2.6 Configuration Management Plan
  4.3 Other Documentation
Sample SQS Plan (cont’d) /4
6 Review and Audits
  6.1 Purpose
  6.2 Minimum Requirements
    6.2.1 Software Requirements Review
    6.2.2 Preliminary Design Review
    6.2.3 Critical Design Review
    6.2.4 Software Verification and Validation Review
    6.2.5 Functional Audit
    6.2.6 Physical Audit
    6.2.7 In-process Reviews
    6.2.8 Managerial Reviews
    6.2.9 Configuration Management Plan Review
    6.2.10 Postmortem Review
Sample SQS Plan (cont’d) /5
7 Test
8 Problem Reporting and Corrective Action
  8.1 Practices and Procedures
  8.2 Organizational Responsibilities
9 Tools, Techniques, and Methodologies
10 Code Control
11 Media Control
12 Supplier Control
13 Records Collection, Maintenance, and Retention
14 Training
15 Risk Management
Sample Test Plan (cont’d) /4
7 Item Pass/Fail Criteria
8 Suspension Criteria and Resumption Requirements
Practice of SRE /1
- The practice of SRE provides the software engineer or manager the means to predict, estimate, and measure the rate of failure occurrences in software.
- Using SRE in the context of Software Engineering, one can (hopefully!):
  - Analyze, manage, and improve the reliability of software products.
  - Balance customer needs for competitive price, timely delivery, and a reliable product.
  - Determine when the software is good enough to release to customers, minimizing the risks of releasing software with serious problems.
  - Avoid excessive time to market due to overtesting.
Implementing SRE /4
- Post-delivery and maintenance:
  - Project post-release staff needs
  - Monitor field reliability vs. objectives
  - Track customer satisfaction with reliability
  - Time new feature introduction by monitoring reliability
  - Guide product and process improvement with reliability measures
Activity 1: Define failure from the customer's perspective
- Group identified failures into severity classes from the customer's perspective
- Usually 3-4 classes are sufficient
Activity 2: Identify customer reliability needs
- What is the level of reliability that the customer needs?
- Who are the rival companies, what are the rival products, and what is their reliability?
Activity 3: Determine operational profile
- Based on the tasks performed and the environmental factors
- Components, hardware and other systems: determine which systems and components are involved and how they affect the overall system reliability
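An operational profile is, in essence, a set of operations with field-occurrence probabilities. The sketch below (operation names and probabilities are hypothetical, not from the course material) shows how such a profile can drive both test-budget allocation and random test selection:

```python
import random

# Hypothetical operational profile: each operation's probability of
# occurrence in the field (probabilities must sum to 1).
profile = {
    "process_sample":  0.40,
    "generate_report": 0.25,
    "update_setpoint": 0.20,
    "operator_login":  0.10,
    "system_restart":  0.05,
}

def allocate_tests(profile, total_tests):
    """Allocate a test budget to operations in proportion to their
    field occurrence probabilities (rounding down; any remainder
    goes to the most frequent operation)."""
    alloc = {op: int(p * total_tests) for op, p in profile.items()}
    alloc[max(profile, key=profile.get)] += total_tests - sum(alloc.values())
    return alloc

def next_operation(profile, rng=random):
    """Randomly select the next operation to test, weighted by the profile."""
    ops, probs = zip(*profile.items())
    return rng.choices(ops, weights=probs, k=1)[0]

print(allocate_tests(profile, 200))
# → {'process_sample': 80, 'generate_report': 50, 'update_setpoint': 40,
#    'operator_login': 20, 'system_restart': 10}
```

Weighting test selection this way is what makes reliability measured in test predictive of reliability in the field.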
Activity 7: Engineer to meet reliability objectives
- Plan using fault tolerance, fault removal and fault avoidance

Activity 8: Focus resources based on operational profile
- The operational profile guides the designer to focus on features that are supposed to be more critical
- Develop more critical functions first and in more detail
- … software, hardware and other systems
- Certification test using reliability demonstration chart
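A reliability demonstration chart is a sequential probability ratio test: as failures occur, the cumulative normalized test time (failure-intensity objective × test time) is plotted against accept/reject boundaries. A minimal sketch, assuming the common defaults of discrimination ratio 2 and 10% consumer/supplier risks:

```python
import math

def demo_chart_boundaries(n, gamma=2.0, alpha=0.1, beta=0.1):
    """Reject/accept boundaries after n observed failures, in
    normalized units tau = failure-intensity objective x test time."""
    A = (1 - beta) / alpha   # likelihood-ratio threshold for "reject"
    B = beta / (1 - alpha)   # likelihood-ratio threshold for "accept"
    reject_tau = (n * math.log(gamma) - math.log(A)) / (gamma - 1)
    accept_tau = (n * math.log(gamma) - math.log(B)) / (gamma - 1)
    return reject_tau, accept_tau

def classify(n, tau, **kw):
    """Decide whether testing can stop at (n failures, normalized time tau)."""
    reject_tau, accept_tau = demo_chart_boundaries(n, **kw)
    if tau >= accept_tau:
        return "accept"      # reliability objective demonstrated
    if tau <= reject_tau:
        return "reject"      # objective not met; fix and retest
    return "continue"        # not enough evidence yet; keep testing

print(classify(0, 2.3))  # → accept: a long failure-free period
print(classify(5, 1.0))  # → reject: failures arriving too fast
```

The chart simply plots these two lines; each failure is a point, and the decision is read off from which region the point lands in.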
Activity 10: Manage fault introduction and propagation
- Practicing a development methodology; constructing a modular system; employing reuse; conducting inspection and review; controlling change
System Test Phase
Activity 11: Determine operational profile used for testing
- Decide upon critical operations
- Decide upon need for multiplicity of operational profiles
Activity 12: Conduct reliability growth testing
Activity 13: Track testing progress and certify that reliability objectives are met
- Conduct feature test, regression test, and performance and …
- Check accuracy of test: time and coverage
- Plan for changes in test strategies and methods
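Tracking reliability growth is commonly done with Musa's basic execution-time model, in which failure intensity decays linearly with failures experienced. The sketch below uses the model's standard formulas with made-up parameter values (not course data) to estimate how much more testing is needed to reach an objective:

```python
import math

def additional_failures(lam0, nu0, lam_present, lam_target):
    """Expected additional failures to move from the present failure
    intensity to the target, under the basic execution-time model
    lambda(mu) = lam0 * (1 - mu / nu0)."""
    return (nu0 / lam0) * (lam_present - lam_target)

def additional_exec_time(lam0, nu0, lam_present, lam_target):
    """Expected additional execution time to reach the target intensity."""
    return (nu0 / lam0) * math.log(lam_present / lam_target)

# Illustrative numbers: 100 total expected failures (nu0), initial
# intensity 20 failures/CPU-hr (lam0), currently at 4, objective 2.
print(additional_failures(20, 100, 4, 2))   # → 10.0 more failures
print(additional_exec_time(20, 100, 4, 2))  # → ~3.47 more CPU-hr
```

Plotting estimated intensity against the objective as testing proceeds is exactly the "track testing progress" step above.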
Activity 15: Certify that reliability objectives and release criteria are met
- Check accuracy of data collection
- Check whether the test operational profile reflects the field operational profile
- Check that the customer's definition of failure matches with what …
Post Delivery Phase /2
Activity 19: Time new feature introduction by monitoring reliability
- New features bring new defects. Add new features desired by the customers if they can be managed without sacrificing the reliability of the whole system.

Activity 20: Guide product and process improvement with reliability measures
- Root-cause analysis for the faults
- Why the fault was not detected earlier in the development phase, and what should be done to reduce the probability of introducing similar faults
Existing vs. New Projects
- There is no essential difference between new and existing projects in applying SRE for the first time. However, determining the failure intensity objective and the operational profile is easier for existing projects.
- Most of the SRE activities will require only small updates after they have been completed once; e.g., the operational profile should only be updated for the new operations added (remember the interaction factor).
- After SRE has been applied to one release, less effort is needed for succeeding releases; e.g., new test cases should be added to the existing ones.
Short-Cycle Projects
- Small projects or releases, or those with short development cycles, may require a modified set of SRE activities to keep costs low or activity durations short.
- Reduction in cost and time can be obtained by limiting the number of elements in the operational profile and by accepting less precision.
- Example: setting one operational mode and performing certification test rather than reliability growth test.
Cost Concerns
- There may be a training cost when starting to apply SRE.
- The principal cost in applying SRE is determining the operational profile.
- Another cost is associated with processing and analyzing failure data during reliability growth test.
- As most projects have multiple releases, the SRE cost drops sharply after the initial release.
Practice Variation
- Defining an operational profile based on "customer modeling".
- Automatic test case generation based on the frequency of use reflected in the operational profile.
- Employing "cleanroom" development techniques together with feature and certification testing.
- Automatic tracking of reliability growth.
Conclusions
- Practical implementation of an effective SRE program is a non-trivial task.
- Mechanisms for collection and analysis of data on software product and process quality must be in place.
- Fault identification and elimination techniques must be in place.
- Other organizational abilities, such as the use of reviews and inspections, reliability-based testing, and software process improvement, are also necessary for effective SRE.
- A quality-oriented mindset and training are necessary!
RAM Analysis
- A collection of numerical analysis techniques that quantifies the reliability, availability and maintainability of a complex system.
- RAM analysis helps us answer questions related to dependability (i.e., reliability, safety, availability and maintainability) of the system.
RAM: Advantages & Uses
Can be used to understand:
- Operation of the system: system reliability versus through-put rate requirements
- Safety of the system: identifiable failure modes which present an unacceptable consequence to facility workers or the public
- Improvements that can have substantial impacts on system performance: recommendations for improving the safety and reliability of equipment/processes
RAM: Data Requirements
- Failure data
- Maintenance data
- Reliability and availability data from recognized industry standards (MTTF, MTBF & MTTR)
- Data collection requires:
  - Engineering experience and judgment
  - Interviews with engineering and maintenance personnel at the system site
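The standard relations between these three measures can be sketched as follows (the 2000 h / 8 h figures are hypothetical, chosen only to illustrate the arithmetic):

```python
def mtbf(mttf, mttr):
    """Mean time between failures: one full fail-and-repair cycle."""
    return mttf + mttr

def steady_state_availability(mttf, mttr):
    """Long-run fraction of time a repairable system is up."""
    return mttf / (mttf + mttr)

# Hypothetical controller unit: fails on average every 2000 h of
# operation, takes 8 h to repair.
A = steady_state_availability(2000, 8)
print(f"MTBF = {mtbf(2000, 8)} h, availability = {A:.4f}")
# → MTBF = 2008 h, availability = 0.9960
```

These formulas are why both failure data (MTTF) and maintenance data (MTTR) are needed: availability cannot be computed from either alone.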
Background
- The City of Calgary invested $100 million in the 1994 expansion of the Bonnybrook Wastewater Treatment Plant (WTP) to serve Calgary's growing population, which was 767,000 in 1996.
- This expansion increased the plant capacity by 25% to 500,000 cubic metres per day, while incorporating state-of-the-art treatment technologies.
- This study was performed in order to provide the City with an assessment of the quality of the Distributed Control Systems (DCS) of the Bonnybrook WTP, to be used as a guide for the next WTP plant at Pine Creek.
Background (cont’d)
- The City's WTP DCS is real-time, mission-critical, dependable, safe and secure.
- However, the current quality measures for the City of Calgary's WTP DCS are unknown.
- To successfully improve safety and reliability for the next generation of WTP, to be built at Pine Creek, a study of the current level of reliability and safety of the existing Bonnybrook WTP plant was prudent.
Assumptions & Hints
a) Deal with both hardware (mechanical and electrical) and software failures
b) Only deal with "failures", not mandatory preventative maintenance or minor repairs where no components are replaced
c) Components whose function is to wear and/or fail after a certain period of time (e.g., batteries), and regularly replaced items, are not included in the analysis
d) Probes, gauges, or transmitters whose purpose is to provide information to the user are not included
Why RAM?
Reasons for conducting RAM analysis for Bonnybrook WWTP:
- Better understand the system (system configuration)
- Better understand the impact of failures/faults of components on the system
- Establish groundwork for Reliability-Availability-Maintainability measurement
- Study the method of data collection and fault/maintenance record keeping
- Design and develop a tool to perform what-if scenarios
RAM: Current Scenario
Current scenario at Bonnybrook WWTP:
- Reliability of components, and of the system as a whole, is not measured
- An established method to measure system reliability needs to be put in place
RAM: FTA
- Fault tree analysis is a graphical representation of the major (critical) failures associated with a product, the causes for the faults, and potential countermeasures.
Analysis: Inside a DCU
- Contains serial and parallel subsystems
- Configuration affects total system reliability
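How configuration affects total reliability can be sketched with the standard series/parallel formulas for independent components. The component reliabilities below are hypothetical, not measured Bonnybrook data:

```python
from math import prod

def r_series(reliabilities):
    """A series chain works only if every component works."""
    return prod(reliabilities)

def r_parallel(reliabilities):
    """A parallel (redundant) group fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical DCU: two redundant I/O cards (each R = 0.95) in series
# with a CPU card (R = 0.99) and a power supply (R = 0.98).
io_group = r_parallel([0.95, 0.95])          # 0.9975
system   = r_series([io_group, 0.99, 0.98])
print(round(system, 4))                       # → 0.9678
```

Note the effect of the layout: the same two 0.95 cards placed in series instead would give 0.9025, dragging the system well below any single component — which is exactly why the serial-vs-parallel question recurs in the analysis.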
What we know:
- The exact layout of the DCS (inside-out)

What we would like to see:
- More relevant failure data
- More maintenance data
- Actual reliability of the current system
- Failure modes and their effects (FMEA)
- Cost and impact of "minor" failures on the overall system
- Change in cost/reliability with a change in configuration
- Is the system serial or parallel? Are the components inside each DCU serial or parallel?
- Can a change in layout change the performance?
- Is the current system/configuration fit to be used in the next projects?
From an engineering point of view
Current:
- Can understand what the system looks like inside-out
- Can use the current system as a benchmark for future systems' performance
- Can change components and see their effects on reliability
In future:
- Can be used to pinpoint single points of failure
- Can be used to effectively plan redundancy and refrain from "over-engineering" and overspending (spending can be made at the right place to complement reliability and availability)
- Can help perform what-if scenario evaluation

From a management point of view
- Can help the planning and design of future projects and plants
- Can help perform cost-value analysis on maintenance vs. replacement
- Can help make better decisions on system/subsystem/component purchase based on reliability data and impact on performance
- Can help compare systems/subsystems/components from several vendors
- Can be used to plan the procedures that need to be in place for data collection, maintenance, as well as analysis purposes
What Was Accomplished
- Analyzed data to identify the proper distribution that models the failure data
- Performed goodness-of-fit and bias tests (using reliability demonstration charts, fault tree analysis, etc.) to validate the distribution fit
- Estimated current system reliability
- Based on these:
  - A reliability calculation chart to perform what-if analysis for various units of the system was developed
  - A list of recommendations for reliability improvements of the WTP's DCS was produced
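The first two steps — fitting a distribution to failure data and checking goodness of fit — can be sketched as follows. The inter-failure times are invented for illustration, and the 1.36/√n critical value is only a rough guide (it is optimistic when the rate is estimated from the same data; a Lilliefors-style correction would be stricter):

```python
import math

def fit_exponential(times):
    """Maximum-likelihood fit of an exponential distribution to
    inter-failure times: rate = 1 / mean."""
    return len(times) / sum(times)

def ks_statistic(times, rate):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    data and the fitted exponential CDF."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

# Illustrative inter-failure times in hours (made up for the sketch).
times = [12, 5, 30, 22, 8, 17, 41, 3, 26, 14]
rate = fit_exponential(times)
D = ks_statistic(times, rate)
crit = 1.36 / math.sqrt(len(times))   # rough 5% critical value
print(f"rate={rate:.3f}/h, D={D:.3f}, fit {'ok' if D < crit else 'rejected'}")
```

The same recipe applies with other candidate distributions (Weibull, lognormal); only the fitted CDF inside `ks_statistic` changes.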