
1

Software Reliability Engineering: A Roadmap

Michael R. Lyu

Dept. of Computer Science & Engineering, The Chinese University of Hong Kong

Future of Software Engineering, ICSE 2007

Minneapolis, Minnesota, May 24, 2007

2

Introduction

Software reliability is the probability of failure-free operation with respect to execution time and environment.

Software reliability engineering (SRE) is the quantitative study of the operational behavior of software-based systems with respect to user requirements concerning reliability.

SRE has been adopted as a standard or best current practice by more than 50 companies.

Creditable software reliability techniques are still urgently needed.

3

Historical SRE Techniques: Fault Lifecycle

Fault prevention: to avoid, by construction, fault occurrences.

Fault removal: to detect, by verification and validation, the existence of faults and eliminate them.

Fault tolerance: to provide, by redundancy and diversity, service complying with the specification in spite of manifested faults.

Fault/failure forecasting: to estimate, by statistical modeling, the presence of faults and occurrence of failures.

4

Fault Lifecycle Technique

[Figure: the fault lifecycle techniques (fault prevention, fault removal, fault tolerance, and fault/failure forecasting) mapped onto the fault manifestation and modeling process, with reliability as the target attribute.]

5

Fault Lifecycle Technique

[Figure: the same fault lifecycle techniques (fault prevention, fault removal, fault tolerance, and fault/failure forecasting) mapped onto the fault manifestation and modeling process, now targeting reliability, availability, safety, and security.]

6

Software Reliability Modeling

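[Figure: failure rate versus execution (testing) time, marking the present failure rate, the failure-rate objective, and the additional testing time needed to reach the objective.]

$R = e^{-\lambda t}$, where $\lambda$ is the failure rate and $t$ the execution time.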
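The plot relates a failure-rate objective to the additional testing time still required. A minimal sketch of both quantities, assuming a constant failure rate for $R = e^{-\lambda t}$ and assuming Musa's basic execution time model for the additional-time calculation (the slide does not name a model); all numbers are hypothetical.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R = exp(-lambda * t): probability of failure-free operation over
    execution time t, assuming a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def additional_test_time(lambda_0: float, nu_0: float,
                         lambda_present: float, lambda_objective: float) -> float:
    """Additional execution time to drive failure intensity from
    lambda_present down to lambda_objective.

    ASSUMPTION: Musa's basic execution time model,
    lambda(tau) = lambda_0 * exp(-lambda_0 * tau / nu_0), where lambda_0 is
    the initial failure intensity and nu_0 the total expected failures.
    """
    if lambda_objective >= lambda_present:
        return 0.0  # objective already met, no more testing needed
    return (nu_0 / lambda_0) * math.log(lambda_present / lambda_objective)


# Hypothetical inputs, for illustration only.
print(round(reliability(0.01, 10.0), 4))                      # 0.9048
print(round(additional_test_time(10.0, 100.0, 1.0, 0.1), 2))  # 23.03
```

Under the basic model the extra time grows with the logarithm of the ratio between present and objective failure intensity, which is why the last order-of-magnitude improvements are the most expensive.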

7

Current SRE Process Overview

8

Current Trends and Problems

The theoretical foundation of software reliability comes from hardware reliability techniques.

Software failures do not happen independently.

Software failures seldom repeat in exactly the same or a predictable pattern.

Failure mode and effect analysis (FMEA) for software is still controversial and incomplete.

There is currently a need for a creditable end-to-end software reliability paradigm that can be directly linked to reliability prediction from the very beginning.

9

Future Direction 1: Reliability-Centric Software Architectures

The product view: achieve a failure-resilient software architecture

- Fault prevention

- Fault tolerance

The process view: explore component-based software engineering

- Component identification, construction, protection, integration, and interaction

- Reliability modeling based on software structure
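"Reliability modeling based on software structure" can be illustrated with an architecture-based model. The sketch below follows the style of Cheung's Markov model, which the slide does not name: each component succeeds with probability `rel[i]`, control moves between components with probabilities `trans[i][j]`, and `exit_prob[i]` is the chance execution ends after component `i`. The function name and parameters are hypothetical, and the linear system is solved by simple fixed-point iteration rather than matrix inversion.

```python
from typing import List


def system_reliability(rel: List[float], trans: List[List[float]],
                       exit_prob: List[float], tol: float = 1e-12,
                       max_iter: int = 100_000) -> float:
    """Probability that a run starting at component 0 completes without
    failure, in the style of Cheung's architecture-based Markov model.

    rel[i]       : probability component i executes without failure
    trans[i][j]  : probability control passes from component i to j
    exit_prob[i] : probability execution terminates after component i
    (each row of trans plus exit_prob[i] should sum to 1)
    """
    n = len(rel)
    x = [0.0] * n  # x[i]: success probability when starting at component i
    for _ in range(max_iter):
        # x_i = R_i * (P(exit after i) + sum_j P(i -> j) * x_j)
        new_x = [
            rel[i] * (exit_prob[i] + sum(trans[i][j] * x[j] for j in range(n)))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x[0]
        x = new_x
    return x[0]


# Hypothetical pipeline of three components, each 99% reliable.
print(system_reliability([0.99, 0.99, 0.99],
                         [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
                         [0, 0, 1]))
```

For a straight pipeline the result reduces to the product of component reliabilities; loops in `trans` make frequently revisited components weigh more heavily on the system figure.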

10

Future Direction 2: Design for Reliability Achievement

Fault confinement

Fault detection

Diagnosis

Reconfiguration

Recovery

Restart

Repair

Reintegration

[Figure: flow diagram of the fault-handling stages, from fault confinement and fault detection (online and offline) through failover, diagnosis, reconfiguration, recovery, restart, and repair to reintegration.]
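One way to read the stage list is as an ordered fault-handling pipeline. The sketch below assumes the stages run strictly in the listed order and stop at the first stage whose action fails; it omits the failover and online/offline detection branches shown in the figure. `Stage` and `handle_fault` are hypothetical names.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Stage(Enum):
    # Definition order matches the slide's stage list.
    CONFINEMENT = 1
    DETECTION = 2
    DIAGNOSIS = 3
    RECONFIGURATION = 4
    RECOVERY = 5
    RESTART = 6
    REPAIR = 7
    REINTEGRATION = 8


def handle_fault(actions: Dict[Stage, Callable[[], bool]]) -> Optional[Stage]:
    """Run each stage's action in order (stages without an action succeed
    trivially). Return the first stage whose action fails, or None once the
    component has been reintegrated."""
    for stage in Stage:  # Enum iteration follows definition order
        action = actions.get(stage, lambda: True)
        if not action():
            return stage
    return None


# Hypothetical run in which diagnosis cannot isolate the fault.
print(handle_fault({Stage.DIAGNOSIS: lambda: False}))
```

Returning the failing stage lets a caller escalate, for example to manual repair, instead of silently reintegrating a component that was never diagnosed.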

12

Future Direction 3: Testing for Reliability Assessment

Establish the link between software testing and reliability

Study the effect of code coverage on fault coverage

Evaluate the impact of various testing metrics on reliability

Assess competing testing schemes quantitatively
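Relating code coverage to fault coverage needs both measured over the same test set. A minimal sketch, assuming block coverage is the fraction of code blocks executed by at least one test and fault coverage is the fraction of known faults exposed by at least one test; `coverage_ratio` and the data are hypothetical.

```python
from typing import Iterable, Set


def coverage_ratio(covered_per_test: Iterable[Set[str]], universe: Set[str]) -> float:
    """Fraction of `universe` hit by at least one test.

    With code blocks as the universe this is block coverage; with a
    program's known faults as the universe (and, per test, the faults that
    test exposes) it is fault coverage.
    """
    if not universe:
        raise ValueError("universe must be non-empty")
    covered: Set[str] = set().union(*covered_per_test) & universe
    return len(covered) / len(universe)


# Hypothetical test set: two tests, four blocks, two known faults.
blocks = {"b1", "b2", "b3", "b4"}
faults = {"f1", "f2"}
print(coverage_ratio([{"b1", "b2"}, {"b2", "b3"}], blocks))  # 0.75
print(coverage_ratio([set(), {"f2"}], faults))               # 0.5
```

Computing both numbers per test subset gives the (coverage, fault coverage) pairs that the correlation studies below regress against each other.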

13

Positive vs. negative evidence for coverage-based software testing

Positive evidence (source: finding)

Frankl (1988), Horgan (1994), Weyuker (1988): High code coverage brings high software reliability and a low failure rate.

Chen (1992): A correlation between code coverage and software reliability is observed.

Wong (1994): The correlation between test effectiveness and block coverage is higher than that between test effectiveness and the size of the test set.

Frate (1995): An increase in reliability comes with an increase in at least one code coverage measure.

Cai (2005): Code coverage contributes a noticeable amount to fault coverage.

Negative evidence (source: finding)

Briand (2000): The testing results on published data did not support a causal dependency between code coverage and defect coverage.

14

RSDIMU test cases description

[Figure: RSDIMU test cases grouped into regions I through VI.]

15

The correlation: various test regions

[Figure: linear model fit in the various test case regions, and the linear regression relationship between block coverage and fault coverage over the whole test set.]

16

The correlation: normal operational testing vs. exceptional testing

Normal operational testing: very weak correlation

Exceptional testing: strong correlation

Testing profile (size)          R-square
Whole test set (1200)           0.781
Normal testing (827)            0.045
Exceptional testing (373)       0.944
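The R-square values in the table come from a linear regression of fault coverage on block coverage. A minimal sketch of that computation using ordinary least squares with an intercept; `linear_fit` is a hypothetical helper, and the data in the usage lines are made up, not the RSDIMU measurements.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical (block coverage, fault coverage) pairs.
block_cov = [0.48, 0.55, 0.63, 0.71, 0.80]
fault_cov = [0.20, 0.31, 0.38, 0.52, 0.60]
print(linear_fit(block_cov, fault_cov))
```

For a simple regression with an intercept, R-square equals the squared correlation coefficient, so a value near 0 (as for normal testing) means block coverage explains almost none of the variation in fault coverage.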

17

The correlation: normal operational testing vs. exceptional testing

Normal testing: small coverage range (48%-52%)

Exceptional testing: two main clusters

[Figure: fault coverage plots for normal testing and for exceptional testing.]

18

The Spectrum in Software Testing and Reliability

[Figure: a spectrum running from software reliability growth models (time-based) to coverage-based analysis, with the new model in between.]

• A new model is needed to combine execution time and testing coverage.

Time-based models               Coverage-based testing
- user oriented                 - tester oriented
- more physical meaning         - less physical meaning
- abundant models               - lack of models
- easy data collection          - hard data collection
- less relevance to testing     - more relevance to testing

19

A New Coverage-Based Reliability Model

$\lambda(t,c)$: joint failure intensity function

$\lambda_1(t)$: failure intensity function with respect to time

$\lambda_2(c)$: failure intensity function with respect to coverage

$\alpha_1, \gamma_1, \alpha_2, \gamma_2$: parameters, with the constraint $\alpha_1 + \alpha_2 = 1$

[Figure: the model equation, annotated to mark the joint failure intensity function, the failure intensity function with time, the failure intensity function with coverage, and the dependency factors.]
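The model equation itself did not survive extraction, so its exact form, including the role of $\gamma_1$ and $\gamma_2$, is not reproduced here. As a hedged illustration only, the sketch assumes the simplest reading of the constraint: a weighted sum in which $\alpha_1$ and $\alpha_2 = 1 - \alpha_1$ act as dependency factors. $\lambda_1$ and $\lambda_2$ are passed in as callables; the exponential forms in the usage lines are hypothetical placeholders, not the published model.

```python
import math
from typing import Callable


def joint_intensity(t: float, c: float,
                    lam1: Callable[[float], float],
                    lam2: Callable[[float], float],
                    alpha1: float) -> float:
    """ASSUMED form: lambda(t, c) = alpha1 * lam1(t) + alpha2 * lam2(c),
    with alpha2 = 1 - alpha1 so that alpha1 + alpha2 = 1."""
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1  # enforces the constraint alpha1 + alpha2 = 1
    return alpha1 * lam1(t) + alpha2 * lam2(c)


# Hypothetical component intensities: both decay, one with testing time,
# the other with coverage c in [0, 1].
lam_time = lambda t: 2.0 * math.exp(-0.1 * t)
lam_cov = lambda c: 2.0 * math.exp(-5.0 * c)
print(joint_intensity(10.0, 0.6, lam_time, lam_cov, alpha1=0.3))
```

Setting `alpha1` to 1 or 0 recovers a purely time-based or purely coverage-based intensity, which is one way to see how such a model spans the spectrum on the previous slide.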

20

Estimation Accuracy

21

Future Direction 4: Metrics for Reliability Prediction

New models (e.g., Bayesian belief networks, BBN) to explore rich software metrics

Data mining approaches

Machine learning techniques

Bridging the gap of the one-way function: feedback to building reliable software

Continuous industrial data collection efforts: demonstration of cost-effectiveness
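To make the BBN idea concrete, the smallest possible network has one hidden node (is the module fault-prone?) and one observed metric node (is its complexity high?). This sketch assumes that two-node structure and computes the posterior with Bayes' rule; the node names and probabilities are hypothetical, and real BBNs for reliability prediction use many more metric nodes.

```python
def posterior_fault_prone(prior: float, p_high_given_faulty: float,
                          p_high_given_clean: float,
                          observed_high: bool = True) -> float:
    """P(fault-prone | complexity evidence) in a hypothetical two-node
    network FaultProne -> HighComplexity, via Bayes' rule."""
    if observed_high:
        like_faulty, like_clean = p_high_given_faulty, p_high_given_clean
    else:
        like_faulty, like_clean = 1.0 - p_high_given_faulty, 1.0 - p_high_given_clean
    joint_faulty = like_faulty * prior
    evidence = joint_faulty + like_clean * (1.0 - prior)  # P(observation)
    return joint_faulty / evidence


# Hypothetical: 20% of modules are fault-prone; high complexity is seen in
# 80% of fault-prone modules and 10% of clean ones.
print(round(posterior_fault_prone(0.2, 0.8, 0.1), 3))         # 0.667
print(round(posterior_fault_prone(0.2, 0.8, 0.1, False), 3))  # 0.053
```

Observing a single metric moves the estimate from 20% to about 67%, which illustrates why metrics-driven models can feed back into where testing effort is spent.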

22

Future Direction 5: Reliability for Emerging Software Applications

“The Internet changes everything”

On-demand customizable software

Service-oriented architecture, composition, and integration

Customization by middleware: from metadata to metacode

A common infrastructure delivers reliability to all customers

23

A Paradigm for Reliable Web Service

[Figure: a replication manager, containing a Web service selection algorithm and a WatchDog, works with a UDDI registry and WSDL. Three replicated Web services, each running on IIS with its own application and database, serve a client, which has its own application and database and connects through a port.]

1. Create Web services.

2. Select the primary Web service (PWS).

3. Register.

4. Look up.

5. Get the WSDL.

6. Invoke the Web service.

7. Keep checking the availability of the PWS.

8. If the PWS fails, reselect the PWS.

9. Update the WSDL.
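Steps 2, 7, and 8 (select a primary, watch it, reselect on failure) can be sketched without any real Web service stack. This assumes each replica exposes a boolean availability probe and that the selection algorithm simply picks the first available replica; the slide does not specify the selection algorithm, and `ReplicationManager` and its methods are hypothetical. Registration, UDDI lookup, and the WSDL update of step 9 are left out.

```python
from typing import Callable, Dict, Optional


class ReplicationManager:
    """Hypothetical sketch: choose a primary Web service (PWS), let a
    watchdog check it, and fail over to another replica when it fails."""

    def __init__(self, health_checks: Dict[str, Callable[[], bool]]):
        # replica name -> availability probe (insertion order = preference)
        self.health_checks = health_checks
        self.primary: Optional[str] = None

    def select_primary(self) -> Optional[str]:
        # Step 2 / step 8: assumed selection rule, first replica that is up.
        for name, is_up in self.health_checks.items():
            if is_up():
                self.primary = name
                return name
        self.primary = None  # no replica available
        return None

    def watchdog_tick(self) -> Optional[str]:
        # Step 7: check the PWS; step 8: reselect if it has failed.
        if self.primary is None or not self.health_checks[self.primary]():
            return self.select_primary()
        return self.primary


# Hypothetical replicas whose status we toggle by hand.
status = {"ws1": True, "ws2": True, "ws3": True}
manager = ReplicationManager({k: (lambda k=k: status[k]) for k in status})
print(manager.select_primary())  # ws1
status["ws1"] = False
print(manager.watchdog_tick())   # ws2
```

In a deployment, the name returned by `watchdog_tick` after a failover is what step 9 would publish by updating the WSDL, so clients resolve to the new primary.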

24

Conclusions

Software reliability is receiving more attention as it becomes an important economic consideration for businesses.

New SRE paradigms need to consider software architectures, testing techniques, data analyses, and creditable reliability modeling procedures.

Domain-specific approaches for emerging software applications are worth investigating.

Still a long way to go, but the directions are clear.
