Software fault management

SRE & Software Fault Management

Through Measurement and Modeling

Rick Karcich

[email protected]


SRE & SW Fault Management

• SRE is a Measurement of Change Activity & Test Activity Problem

• Composed of 3 phases
o Static Code Measurement: measuring the code that implements each requirement
o Measurement of Change, Build-to-Build, Sprint-to-Sprint: developing the Change Profile
o Dynamic Test Measurement: developing the Test Profile

• Understanding how effective our tests are at hitting changed portions of code releases
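One way to put a number on "how effective our tests are at hitting changed portions of code" is to weight each module's change by whether the test run executed that module. The sketch below rests on several assumptions. Per-module churn values are given; this deck later defines churn as the absolute value of code delta. The set of executed module names comes from a coverage run. The function name `churn_coverage` and the sample values are hypothetical.

```python
def churn_coverage(churn_by_module: dict[str, float], executed: set[str]) -> float:
    """Fraction of a build's total churn that lies in modules the tests executed.

    churn_by_module: hypothetical per-module churn (|FI delta|) for one build.
    executed: names of modules exercised at least once by the test suite.
    """
    total = sum(churn_by_module.values())
    if total == 0:
        return 1.0  # nothing changed, so no changed code went unexercised
    hit = sum(c for module, c in churn_by_module.items() if module in executed)
    return hit / total


churn = {"parser": 12.0, "scheduler": 3.0, "report": 5.0}
print(churn_coverage(churn, executed={"parser", "report"}))  # 0.85
```

Here the untested `scheduler` module carries 15% of the churn. A low value flags tests that miss the code most likely to contain new faults.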

SRE == Software Reliability Engineering

Measurement as the basis for SRE improvement

• A Basic Problem
o Testers find too many bugs too late in a release/sprint cycle
o Bug fixing destroys the crisp, predictable execution of a project
• The Goal
o Predictably meeting release schedules with known quality/reliability
o Developing information that testers can act on

Measurement as the basis for SRE improvement…

• We don’t really want to know about the things that we can measure
o Lines of Code
o Statements

• We really want to understand the things that we cannot measure
o Software faults
o Software development effort

Measurement as the basis for SRE improvement…

• Modern software systems change continuously
• They evolve functionally
• The code base evolves as a result

Software Evolution: Measuring A Moving Target

• We assume that we are developing/maintaining a single program

• In effect, we are really working with many programs over time

• They are different programs in a very real sense
• We must identify and measure each version of each program module
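To measure "each version of each program module", every version first needs an identity. The minimal sketch below derives one from a hash of the module's source text. The normalization rule is a hypothetical choice: only trailing whitespace and line endings are ignored, so any other edit counts as a new version. The ID is truncated to 12 hex characters for readability.

```python
import hashlib


def module_version_id(source: str) -> str:
    """Identify a module version by a SHA-256 hash of its normalized source.

    Assumption: trailing whitespace and CRLF/LF differences are not real changes;
    every other edit produces a new version ID.
    """
    lines = [line.rstrip() for line in source.splitlines()]
    normalized = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()[:12]  # shortened for display


v1 = module_version_id("def f(x):\n    return x + 1\n")
v2 = module_version_id("def f(x):   \r\n    return x + 1\r\n")  # whitespace-only edit
v3 = module_version_id("def f(x):\n    return x + 2\n")         # real code change
print(v1 == v2, v1 == v3)  # True False
```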

The Measurement Problem…

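[Figure: Build/Sprint N contains modules A, B, C, D, E, F, G, H, I, J, K; Build/Sprint N+1 contains modules A, B, L, D, E, F, G, H, I, J, K, M, so module C is no longer present and modules L and M are new]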
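The figure's comparison can be sketched with plain set operations. The module names below are the ones in the figure, and `compare_builds` is a hypothetical helper. Retained modules are only matched by name, so they still need a version-by-version comparison: a module can keep its name while its code changes.

```python
def compare_builds(build_n: set[str], build_n1: set[str]) -> dict[str, set[str]]:
    """Split module names into retained, removed and added between two builds."""
    return {
        "retained": build_n & build_n1,  # in both builds; contents may still differ
        "removed": build_n - build_n1,   # only in Build/Sprint N
        "added": build_n1 - build_n,     # only in Build/Sprint N+1
    }


build_n = set("ABCDEFGHIJK")
build_n1 = set("ABLDEFGHIJKM")
diff = compare_builds(build_n, build_n1)
print(sorted(diff["removed"]), sorted(diff["added"]))  # ['C'] ['L', 'M']
```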

The Introduction Of Faults

• People make errors in the interpretation of their tasks
o System Analysts
o Systems Designers
o Developers

• These errors are manifested as faults in
o Specifications
o Design
o Programs

• Faults, when executed, result in failures

The Fault Process

[Figure: the latent faults of Build/Sprint N, less the faults removed and plus the faults added, become the latent faults of Build/Sprint N+1]
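The fault process in the figure can be written as a simple balance. The symbols are introduced here for illustration only. $L_N$ is the number of latent faults in Build/Sprint N, $R_N$ the faults removed, and $A_N$ the faults added in moving to the next build. Applying the balance repeatedly gives the count after $k$ builds:

```latex
L_{N+1} = L_N - R_N + A_N
\qquad\Longrightarrow\qquad
L_{N+k} = L_N + \sum_{t=N}^{N+k-1} \left( A_t - R_t \right)
```

In practice only the removed faults are observed directly, as bugs found and fixed. $L_N$ and $A_N$ stay unknown, and that is the gap a fault surrogate is meant to fill.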

Execution Consequences Of Faults: Failures

• Faults are found in program modules
• A fault can only cause a failure if it is executed
• Different functionalities execute different sets of modules
• Faults are associated with program functionalities
• A test suite generated from a representative operational profile is a precondition for reliability analysis
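The sketch below links these bullets under stated assumptions. The operational profile is given as a probability per functionality, and each functionality is mapped to the set of modules it executes. All names and numbers are hypothetical. Summing the profile probabilities over the functionalities that reach a module gives the chance that one operation executes it. The sampler draws a test suite in proportion to operational use with Python's `random.Random.choices`.

```python
import random


def module_exposure(profile, modules_run):
    """Probability that one operation drawn from the profile executes each module.

    profile: functionality -> probability of use (assumed to sum to 1).
    modules_run: functionality -> set of modules that functionality executes.
    """
    exposure = {}
    for func, prob in profile.items():
        for module in modules_run[func]:
            exposure[module] = exposure.get(module, 0.0) + prob
    return exposure


def sample_test_cases(profile, n, seed=0):
    """Draw n functionalities to test, in proportion to their operational use."""
    rng = random.Random(seed)  # seeded so the generated suite is reproducible
    return rng.choices(list(profile), weights=list(profile.values()), k=n)


# Hypothetical three-function operational profile.
profile = {"login": 0.6, "search": 0.3, "export": 0.1}
modules_run = {"login": {"auth", "ui"}, "search": {"index", "ui"}, "export": {"report"}}

print(module_exposure(profile, modules_run))
print(sample_test_cases(profile, 10))
```

Under this profile, a fault in `report` is reached by roughly one operation in ten. Tests drawn from the profile will therefore exercise it rarely, which is the sense in which faults are tied to functionalities.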

Faults And Uncertainty

• Can never know when all faults have been found
• May use past experience to anticipate the fault count
• Must create a fault surrogate
o Obtained from past development efforts
o Varies directly with faults
o Anticipates the distribution of faults in modules

Development of FI from Raw Metrics – Static Measurement

[Figure: the following raw metrics are combined into a single Fault Index (FI)]
• Comments
• Executable Statements
• Non-executable Statements
• Total Operators
• Unique Operators
• Function Operators
• Total Operands
• Unique Operands
• Unique Actual Operands
• Nodes
• Edges
• Paths
• Maximum Path Length
• Average Path Length
• Cycles
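The deck does not say which measurement tool produces these raw metrics. As a rough illustration of static measurement, the sketch below counts a few of them (comments, total/unique operators, total/unique operands) for Python source using the standard `tokenize` module. The operator/operand split is a hypothetical Halstead-style convention: punctuation and keywords count as operators, while identifiers and literals count as operands. Real metric tools define these classes differently.

```python
import io
import keyword
import tokenize


def raw_token_metrics(source: str) -> dict[str, int]:
    """Count comments, operators and operands in Python source text."""
    operators: list[str] = []
    operands: list[str] = []
    comments = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments += 1
        elif tok.type == tokenize.OP or (
            tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        ):
            operators.append(tok.string)  # punctuation and keywords
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)   # identifiers and literals
        # f-strings (FSTRING_* tokens on Python 3.12+) are not counted in this sketch
    return {
        "comments": comments,
        "total_operators": len(operators),
        "unique_operators": len(set(operators)),
        "total_operands": len(operands),
        "unique_operands": len(set(operands)),
    }


print(raw_token_metrics("x = a + 1  # add\ny = x * x\n"))
# {'comments': 1, 'total_operators': 4, 'unique_operators': 3, 'total_operands': 6, 'unique_operands': 4}
```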

FI as a Fault Surrogate

• The FI metric is a statistical synthesis of program module complexity

• Program modules may be ordered by FI
• The relative complexity of a software system is the average FI of the component modules
• Validation of the FI concept
o Correlates well (0.90) with measures of software faults
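The three uses of FI listed above can be sketched in a few lines. The sketch orders modules by FI, averages FI into a system's relative complexity, and checks FI against fault counts with a Pearson correlation. The FI values are the ones on the "Converting Data to Information" slide. The module names `m1`..`m5` and the fault counts are hypothetical placeholders, not measured data. The choice of Pearson correlation as the validation statistic is also an assumption.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


fi = {"m1": 100.0, "m2": 90.0, "m3": 110.0, "m4": 95.0, "m5": 105.0}

ranked = sorted(fi, key=fi.get, reverse=True)     # most fault-prone modules first
relative_complexity = sum(fi.values()) / len(fi)  # average FI of the component modules
print(ranked, relative_complexity)  # ['m3', 'm5', 'm1', 'm4', 'm2'] 100.0

# Illustrative fault counts only; real validation uses counts from the defect tracker.
faults = {"m1": 3, "m2": 2, "m3": 7, "m4": 2, "m5": 4}
print(round(pearson([fi[m] for m in fi], [faults[m] for m in fi]), 2))
```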

A Fault Index

• FI is a synthesized, dimensionless metric
• FI is a fault surrogate
o Composed of metrics closely related to faults
o Highly correlated with faults

Converting Data to Information

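[Figure: program modules are measured with CMA (Metric Analysis), producing lots of raw data; PCA/FI (Principal Components Analysis) reduces each module's raw metrics to a single FI value]

Raw metrics per program module → FI
12 23 54 12 203 39 238 34 → 100
7 13 64 12 215 9 39 238 → 90
11 21 54 12 241 39 238 35 → 110
5 33 44 12 205 39 138 44 → 95
42 55 54 12 113 29 234 14 → 105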
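A minimal, stdlib-only sketch of the PCA/FI step follows. It is not the deck's exact procedure; the following choices are assumptions:
- standardize each raw metric;
- form the correlation matrix;
- find principal components by power iteration with deflation;
- keep components with eigenvalue ≥ 1;
- weight each module's domain scores by their components' share of variance;
- rescale to mean 100 and standard deviation 10, a hypothetical scale picked to resemble the slide.

The slide's FI values are illustrative, and this sketch is not expected to reproduce them.

```python
import math
import statistics


def standardize(rows):
    """Column-wise z-scores; a constant column (zero spread) becomes all zeros."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z):
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def principal_components(m, min_eigenvalue=1.0, iters=1000):
    """Eigenpairs of a symmetric PSD matrix by power iteration with deflation,
    keeping only components whose eigenvalue is at least min_eigenvalue."""
    p = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(p):
        v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary start vector
        lam = 0.0
        for _step in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            lam = math.sqrt(sum(x * x for x in w))  # ||M v|| converges to the eigenvalue
            if lam < 1e-12:
                break
            v = [x / lam for x in w]
        if lam < min_eigenvalue:
            break
        if sum(v) < 0:
            v = [-x for x in v]  # orient so larger raw metrics give larger scores
        pairs.append((lam, v))
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]  # deflate
    return pairs


def fault_index(rows, center=100.0, spread=10.0):
    """One FI per module: variance-weighted domain scores, rescaled (hypothetical scale)."""
    z = standardize(rows)
    pcs = principal_components(correlation_matrix(z))
    if not pcs:
        raise ValueError("no principal component reached the eigenvalue threshold")
    total = sum(lam for lam, _ in pcs)
    raw = [sum((lam / total) * sum(zj * vj for zj, vj in zip(r, v)) for lam, v in pcs) for r in z]
    mu, sd = statistics.mean(raw), statistics.stdev(raw)
    return [center + spread * (x - mu) / sd for x in raw]


# Raw metrics of the five program modules on the slide, in the extracted column order.
data = [
    [12, 23, 54, 12, 203, 39, 238, 34],
    [7, 13, 64, 12, 215, 9, 39, 238],
    [11, 21, 54, 12, 241, 39, 238, 35],
    [5, 33, 44, 12, 205, 39, 138, 44],
    [42, 55, 54, 12, 113, 29, 234, 14],
]
print([round(x, 1) for x in fault_index(data)])
```

The constant fourth column (12 in every module) carries no information, so it standardizes to zeros and drops out of every component.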

Measurement of Change, Build-to-Build (Sprint-to-Sprint) – Code Churn and Code Delta

• Fault Index (FI) acts as a proxy for faults
• FI values change from one build to the next as a module is changed
• Code deltas are differences in FI values from one build to the next
• Code churn is the absolute value of code delta; this is the measure of change activity
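Following the definitions above, code delta is the FI difference per module and code churn is its absolute value. The sketch below adds two assumptions of its own. A module missing from one build is treated as having FI 0 there, so an added module's delta is its full FI. Per-module values are also summed into build-level totals. The FI values are hypothetical.

```python
def code_delta_and_churn(fi_i: dict[str, float], fi_j: dict[str, float]):
    """Per-module code delta (FI in build j minus FI in build i) and churn (|delta|).

    Assumption: a module absent from a build counts as FI 0 in that build, so an
    added module contributes +FI and a removed module contributes -FI.
    """
    modules = fi_i.keys() | fi_j.keys()
    delta = {m: fi_j.get(m, 0.0) - fi_i.get(m, 0.0) for m in modules}
    churn = {m: abs(d) for m, d in delta.items()}
    return delta, churn


build_i = {"A": 100.0, "B": 95.0, "C": 110.0}
build_j = {"A": 104.0, "B": 90.0, "D": 102.0}
delta, churn = code_delta_and_churn(build_i, build_j)
print(sum(delta.values()), sum(churn.values()))  # -9.0 221.0
```

The net delta of -9 hides most of the activity because positive and negative changes cancel. Churn (221) does not cancel, which is why churn serves as the measure of change activity.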

The Measurement of Change Process

[Figure: the source code of Build i and Build j passes through the measurement tools; both builds are baselined (Baselined Build i, Baselined Build j); the change in their PCA domain scores yields Code Delta and Code Churn]

Baselining A Software Development Project

• Software changes over software builds
• Measurements, such as relative complexity, change across builds
• Initial (arbitrary) build as a baseline
• Relative complexity of each build
• Measure change in fault surrogate from an initial baseline
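The essential point of baselining is that every later build is scored with the baseline build's frozen parameters, not its own. If build j were re-standardized against itself, its average FI would snap back to the center value and the change would vanish. The sketch below assumes a hypothetical linear FI: fixed weights on z-scores, which in the deck's process would come from the baseline's principal components. It uses made-up two-metric data.

```python
import statistics


def baseline_params(rows):
    """Means and standard deviations of each raw metric in the baseline build."""
    cols = list(zip(*rows))
    return [statistics.mean(c) for c in cols], [statistics.stdev(c) for c in cols]


def baselined_fi(row, means, sds, weights, center=100.0, scale=10.0):
    """FI of one module measured against the frozen baseline (hypothetical linear form)."""
    z = [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
    return center + scale * sum(w * zi for w, zi in zip(weights, z))


def relative_complexity(rows, means, sds, weights):
    """Average baselined FI over the modules of one build."""
    return statistics.mean(baselined_fi(r, means, sds, weights) for r in rows)


baseline = [[10, 200], [20, 300], [30, 400]]  # build chosen as the (arbitrary) baseline
build_j = [[10, 200], [20, 300], [45, 550]]   # third module grew in a later build
weights = [0.5, 0.5]                          # hypothetical; would come from baseline PCA

means, sds = baseline_params(baseline)        # frozen once, reused for every build
rc_base = relative_complexity(baseline, means, sds, weights)
rc_j = relative_complexity(build_j, means, sds, weights)
print(rc_j - rc_base)  # 5.0
```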

Measurement Baseline

[Figure: measurement baseline with Point A and Point B marked; the values -5 and +10 are shown]

Measuring Product X’s Change Activity

Measuring Product X’s Change Activity, since Alpha

Questions?
