
Reliability and Quality - Predicting post-release defects using pre-release field testing results

Jan 13, 2015


ICSM 2011

Paper: Predicting Post-release Defects Using Pre-release Field Testing Results
Authors: Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha and Dave Dietz
Session: Research Track Session 9: Reliability and Quality
Transcript
Page 1: Reliability and Quality - Predicting post-release defects using pre-release field testing results

PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS

Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha, Dave Dietz

Page 2

FIELD TESTING CYCLE

Field testing is important to improve the quality of an application before release.

Page 3

MEAN TIME BETWEEN FAILURES (MTBF)

Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application. Applications with a low MTBF are undesirable since they would have a higher number of defects.
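The MTBF definition above can be sketched in a few lines. The study's actual log format is not shown in this transcript, so the inputs below (total operating hours and a failure count) are hypothetical.

```python
# Sketch: MTBF as total operating time divided by the number of failures.
# Inputs are hypothetical; the paper's field data is not reproduced here.

def mtbf(total_operating_hours, num_failures):
    """Mean Time Between Failures: operating time / failure count."""
    if num_failures == 0:
        return float("inf")  # no failures observed during testing
    return total_operating_hours / num_failures

# Example: 18 users ran the application 40 hours each, hitting 12 failures total.
print(mtbf(18 * 40, 12))  # 60.0 hours between failures
```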

Page 4

AVERAGE USAGE TIME (AVT)

- AVT is the average time that a user actively uses the application.
- The AVT can be longer than the period of field testing.

A longer AVT indicates that an application is reliable and that users tend to use the application longer.
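AVT is simply a mean over per-user active time. A minimal sketch, with hypothetical per-user session totals (the transcript does not show the study's data layout):

```python
# Sketch: Average Usage Time (AVT) as the mean of each user's total
# active time. The session totals below are made up for illustration.

def avt(usage_hours_per_user):
    """Average time a user actively uses the application."""
    return sum(usage_hours_per_user) / len(usage_hours_per_user)

sessions = [12.0, 30.5, 45.0, 8.5]  # active hours for four hypothetical users
print(avt(sessions))  # 24.0
```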

Page 5

PROBLEM STATEMENT

- MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.

The reliability of versions A and B is very different.

Page 6

METRICS

We propose three metrics that capture additional patterns of failure occurrences:

- TTFF: the average length of usage time before the occurrence of the first failure,
- FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users, and
- OFR: the overall failure ratio, which captures daily rates of failures.

Page 7

AVERAGE TIME TO FIRST FAILURE (TTFF)

[Plot: % of users reporting failures per day (days 1-14), Version A]

Page 8

AVERAGE TIME TO FIRST FAILURE (TTFF)

[Plot: % of users reporting failures per day (days 1-14), Versions A and B]

Page 9

AVERAGE TIME TO FIRST FAILURE (TTFF)

[Plot: % of users reporting failures per day (days 1-14), Versions A and B]

TTFF produces high scores for applications where the majority of users experience the first failure late.
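The transcript defines TTFF as the average usage time before each user's first failure. A minimal sketch under that reading; the unit (days of usage per user) and the data layout are assumptions, since the paper's exact formula is not reproduced here.

```python
# Sketch: average Time To First Failure (TTFF) over users who experienced
# at least one failure. Units (days) and inputs are hypothetical.

def ttff(first_failure_day_per_user):
    """Average usage time before the first failure, across failing users."""
    return sum(first_failure_day_per_user) / len(first_failure_day_per_user)

# Hypothetical first-failure days for five users of one version:
print(ttff([3, 5, 8, 6, 9]))  # 6.2
```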

Page 10

AVERAGE TIME TO FIRST FAILURE (TTFF)

[Plot: % of users reporting failures per day (days 1-14), Versions A and B]

TTFF_A = 6.11, TTFF_B = 3.56

Page 11

FAILURE ACCUMULATION RATING (FAR)

[Plot: % of users reporting vs. number of unique failures (1-14), Version A]

Page 12

FAILURE ACCUMULATION RATING (FAR)

[Plot: % of users reporting vs. number of unique failures (1-14), Versions A and B]

Page 13

FAILURE ACCUMULATION RATING (FAR)

[Plot: % of users reporting vs. number of unique failures, Versions A and B]

The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
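FAR is built on how many unique failures each user accumulates. The paper's exact formula is not reproduced in this transcript, so the sketch below only shows the preprocessing step that the FAR plots imply: counting unique failures per user from a (user, failure) log. The log itself is made up.

```python
# Sketch: counting unique failures per user, the quantity underlying the
# FAR curves. The log format and data are hypothetical; the paper's actual
# FAR formula is not shown in this transcript.
from collections import defaultdict

def unique_failures_per_user(failure_log):
    """failure_log: iterable of (user_id, failure_id) pairs."""
    seen = defaultdict(set)
    for user, failure in failure_log:
        seen[user].add(failure)  # duplicates of the same failure collapse
    return {user: len(ids) for user, ids in seen.items()}

log = [("u1", "F1"), ("u1", "F1"), ("u2", "F1"), ("u2", "F2"), ("u3", "F3")]
print(unique_failures_per_user(log))  # {'u1': 1, 'u2': 2, 'u3': 1}
```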

Page 14

FAILURE ACCUMULATION RATING (FAR)

[Plot: % of users reporting vs. number of unique failures (1-14), Versions A and B]

FAR_A = 6.97, FAR_B = 4.97

Page 15

OVERALL FAILURE RATING (OFR)

[Plot: % of users reporting failures per day (days 1-14), Version A]

Page 16

OVERALL FAILURE RATING (OFR)

[Plot: % of users reporting failures per day (days 1-14), Versions A and B]

Page 17

OVERALL FAILURE RATING (OFR)

[Plot: % of users reporting failures per day (days 1-14), Versions A and B]

The OFR metric produces high scores for applications with fewer users reporting failures overall.
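OFR captures daily rates of failures. The exact formula is again not in this transcript; the sketch below takes one plausible reading, averaging the daily fraction of users who report a failure, so that a version with fewer reporting users scores lower on raw rate (and correspondingly better on reliability). Data and field names are hypothetical.

```python
# Sketch: daily failure-reporting fractions, one plausible ingredient of the
# Overall Failure Ratio (OFR). Inputs are hypothetical.

def daily_failure_fractions(reports_per_day, total_users):
    """Fraction of users reporting a failure on each testing day."""
    return [reporting / total_users for reporting in reports_per_day]

# Hypothetical: 100 field testers over five days.
fractions = daily_failure_fractions([5, 12, 9, 4, 2], 100)
print(fractions)                               # [0.05, 0.12, 0.09, 0.04, 0.02]
print(round(sum(fractions) / len(fractions), 3))  # 0.064 average daily rate
```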

Page 18

OVERALL FAILURE RATING (OFR)

[Plot: % of users reporting failures per day (days 1-14), Versions A and B]

OFR_A = 0.93, OFR_B = 0.78

Page 19

CASE STUDY

We analyze 18 versions of an enterprise software application.

- Overall, 2,546 users were involved in the field testing.
- The testing period lasted 30 days.

Page 20

SPEARMAN CORRELATION OF THE METRICS

       TTFF    FAR     OFR     AVT     MTBF
TTFF   1       0.09    -0.08   -0.31   -0.08
FAR    0.09    1       0.07    0.33    -0.24
OFR    -0.08   0.07    1       0.39    -0.54
AVT    -0.31   0.33    0.39    1       -0.3
MTBF   -0.08   -0.24   -0.54   -0.3    1
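Spearman correlation, used for the table above, is Pearson correlation computed on ranks. A self-contained sketch on made-up data (the study's 18 version-level metric values are not in the transcript):

```python
# Sketch: Spearman rank correlation in pure Python, with ties handled by
# average ranks. Data below is hypothetical.

def ranks(values):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over the tied run
        avg = (i + j) / 2 + 1  # average rank of positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly monotone hypothetical data gives correlation 1.0:
print(round(spearman([1, 2, 3, 4], [10, 20, 30, 40]), 6))  # 1.0
```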

Page 21

INDEPENDENCE AMONG PROPOSED METRICS

[Plot: principal component loadings (PC1-PC4) for TTFF, FAR, OFR, and MTBF]

Page 22

PREDICTIVE POWER FOR POST-RELEASE DEFECTS

[Plot: marginal R-square of TTFF, FAR, OFR, AVT, and MTBF for predicting defects over 6 months, 1 year, and 2 years]

Page 23

PRECISION OF PREDICTIONS WITH ALL FIVE METRICS

[Plot: precision (%) vs. number of testing days (5-30), for 6-month, 1-year, and 2-year post-release periods]

Page 24

CONCLUSION

- TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.
- They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period.
- It takes MTBF up to 25 days to predict the number of post-release defects.

Page 25