1.1.3 Defect Severity – Defects broken down by severity as a percentage of total defects. Defect severity helps determine how close to release the software is and can help in allocating resources (a small tally sketch follows this list).
– Critical – blocks other tests from being run and blocks the alpha release,
– Severe – blocks some tests and the beta release,
– Moderate – a testing workaround is possible, but blocks the final release,
– … Very minor – fix before the “Sun Burns Out” (USDATA, 1994).
– See the defect severity graph (not shown).
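For concreteness, here is a minimal Python sketch of the severity tally described above. The severity labels mirror the list, and the defect data is invented sample input.

```python
# Sketch: defect severity as a percentage of total defects.
# The counts below are made-up sample data, one label per open defect report.
from collections import Counter

defects = ["Critical", "Severe", "Moderate", "Moderate", "Very minor",
           "Severe", "Moderate"]

counts = Counter(defects)
total = sum(counts.values())
for severity, n in counts.most_common():
    print(f"{severity:10s} {n:3d}  {100 * n / total:5.1f}%")
```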
1.1.4 Test Burnout – A chart of cumulative total defects and defects found per period, plotted over time periods. It is a measure of the rate at which new defects are being found (a small computation is sketched after this list).
– Test Burnout helps project the point at which most of the defects will be found using current test cases and procedures, and therefore when (re)testing can halt.
– Burnout is a projection or an observation of when no more, or only a small number of, new defects are expected to be found using current practices.
– Beware: it doesn’t project when your system will be bug free, just when your current testing techniques are not likely to find additional defects.
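A minimal sketch of the burnout computation, assuming invented per-period defect counts and an arbitrary find-rate threshold; a real burnout chart would plot both series over time.

```python
# Sketch: test burnout from defects found per period (invented sample data).
# Cumulative totals flatten out as current test cases stop finding new defects.
new_defects = [12, 18, 15, 9, 6, 3, 2, 1, 1, 0]  # defects found each period

running = 0
for period, found in enumerate(new_defects, start=1):
    running += found
    print(f"period {period:2d}: new={found:3d}  cumulative={running:3d}")

# A crude burnout signal (threshold is an assumption): the find rate has
# stayed at or below 2 new defects per period for the last two periods.
THRESHOLD = 2
if all(n <= THRESHOLD for n in new_defects[-2:]):
    print("Current test cases are unlikely to find many more defects.")
```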
This tracks the number of defects found per tester (not shown). This is only quantitative, not qualitative, analysis.
– Reporting this may lead to quota filling by breaking defects into many small nits rather than one comprehensive report. – Remember Deming’s 14 Quality Principles.
– Many nits are harder to manage and may take more time to fix than having all related issues rolled into one bigger defect.
1.1.7 Root cause analysis – What caused the defect to be added to the system. Generally, try to react to this by evolving the software development process.
Sometimes this is also referred to as the Injection Source, although Injection Source is sometimes limited to Internal or External.
– Internal refers to defects caused by the development team (from Requirements Engineers, Designers, Coders, Testers, …).
– External refers to defects caused by non-development team people (customers gave you wrong information, 3rd party software came with defects, etc.)
1.1.8 How defects were found – Inspections, walkthroughs, unit tests, integration tests, system tests, etc. If a quality assurance technique isn’t removing defects, it is a waste of time and money.
1.1.9 Injection Points – In what stage of the development cycle was the defect put into the system? This can help evolve a process to try to prevent defects.
1.1.10 Detection Points – In what stage of the development cycle was the defect discovered?
– Want to look at the difference between the Injection Point and the Detection Point.
– If there is a significant latency between Injection and Detection, then the process needs to evolve to reduce this latency. Remember that defect remediation costs increase significantly as we progress through the development life cycle; a sketch of this measurement follows.
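A rough sketch of measuring Injection-to-Detection latency. The stage names and defect records below are assumptions for illustration, not a prescribed life-cycle model.

```python
# Sketch: latency between injection point and detection point, measured in
# life-cycle stages. Stage names and defect data are invented examples.
STAGES = ["requirements", "design", "coding", "unit test",
          "integration test", "system test", "field"]
STAGE_INDEX = {name: i for i, name in enumerate(STAGES)}

# (injected_in, detected_in) for each defect
defects = [("requirements", "system test"),
           ("design", "design"),
           ("coding", "unit test"),
           ("requirements", "field")]

latencies = [STAGE_INDEX[found] - STAGE_INDEX[injected]
             for injected, found in defects]
print("mean latency (stages):", sum(latencies) / len(latencies))
# A large mean latency suggests evolving the process (e.g., earlier
# inspections) to catch defects closer to where they are injected.
```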
o Programming difficulty increases if additional operators are introduced (i.e., as n1 increases) and if operands are repeatedly used (i.e., as N2/n2 increases)
Effort, E = V / L* = D* × V = (n1 × N2 × N × log2 n) / (2 × n2) = 9989
o Two solutions may have very different Effort estimates.
A psychologist, John Stroud, suggested that the human mind is capable of making only a limited number of mental discriminations per second (the Stroud Number), in the range of 5 to 20.
o Using a Stroud number of S = 18 discriminations/second,
o Time for development, T = E / S = E / 18, measured in seconds
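A small sketch of the Halstead calculations above. The operator/operand counts are invented; the formulas follow the definitions given here (V = N log2 n, D = (n1/2)(N2/n2), E = D × V, T = E / 18).

```python
# Sketch of the Halstead effort and time calculations (invented counts;
# plug in counts measured from real code).
import math

n1, n2 = 10, 16   # distinct operators, distinct operands
N1, N2 = 50, 70   # total operator and operand occurrences

n = n1 + n2               # program vocabulary
N = N1 + N2               # program length
V = N * math.log2(n)      # volume
D = (n1 / 2) * (N2 / n2)  # difficulty (reciprocal of estimated level L*)
E = D * V                 # effort = (n1 * N2 * N * log2 n) / (2 * n2)

S = 18                    # Stroud number: mental discriminations per second
T = E / S                 # development time, in seconds
print(f"V={V:.0f}, D={D:.1f}, E={E:.0f}, T={T:.0f} s")
```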
The function point count is computed by multiplying each raw count by its weight and summing all the values.
FPs are very subjective – they depend on the estimator. They cannot be counted automatically.
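A minimal sketch of the counting rule just described, using the classic Albrecht weights quoted in the excerpt below (inputs 4, outputs 5, inquiries 4, file updates 10, interfaces 7). The raw counts are invented sample data.

```python
# Sketch: an unadjusted Function Point count with Albrecht's classic weights.
WEIGHTS = {"inputs": 4, "outputs": 5, "inquiries": 4,
           "files": 10, "interfaces": 7}

raw_counts = {"inputs": 12, "outputs": 7, "inquiries": 5,
              "files": 4, "interfaces": 2}  # counted from the specification

fp = sum(WEIGHTS[item] * raw_counts[item] for item in WEIGHTS)
print("unadjusted function points:", fp)  # 12*4 + 7*5 + 5*4 + 4*10 + 2*7 = 157
```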
1.2.5
“In the late 1970's A.J. Albrecht of IBM took the position that the economic output unit of software projects should be valid for all languages, and should represent topics of concern to the users of the software. In short, he wished to measure the functionality of software.
Albrecht considered that the visible external aspects of software that could be enumerated accurately consisted of five items: the inputs to the application, the outputs from it, inquiries by users, the data files that would be updated by the application, and the interfaces to other applications.
After trial and error, empirical weighting factors were developed for the five items, as was a complexity adjustment. The number of inputs was weighted by 4, outputs by 5, inquiries by 4, data file updates by 10, and interfaces by 7. These weights represent the approximate difficulty of implementing each of the five factors.
In October of 1979, Albrecht first presented the results of this new software measurement technique, termed "Function Points" at a joint SHARE/GUIDE/IBM conference in Monterey, California. This marked the first time in the history of the computing era that economic software productivity could actually be measured.
Table 2 provides an example of Albrecht's Function Point technique used to measure either Case A or Case B. Since the same functionality is provided, the Function Point count is also identical.
The Function Point metrics are far superior to the source line metrics for expressing normalized
productivity data. As real costs decline, cost per Function Point also declines. As real productivity goes up, Function Points per person month also goes up.
In 1986, the non-profit International Function Point Users Groups (IFPUG) was formed to assist in transmitting data and information about this metric. In 1987, the British government adopted a modified form of Function Points as the standard software productivity metric. In 1990, IFPUG published Release 3.0 of the Function Point Counting Practices Manual, which represented a consensus view of the rules for Function Point counting. Readers should refer to this manual for current counting guidelines. “
Table 1 – SLOC per FP by Language (columns: Language, SLOC per FP; data not shown)
The table below contains Function Point Language Gearing Factors from 2597 completed function point projects in the QSM database. The projects span 289 languages from a total of 645 languages represented in the database. Because mixed-language projects are not a reliable source of gearing factors, this table is based upon single-language projects only. Version 3.0 features the languages where we have the most recent, high-quality data.
The table will be updated and expanded as additional project data becomes available. As an additional resource, the David Consulting Group has graciously allowed QSM to include their data in this table.
Environmental factors can result in significant variation in the number of source statements per function point. For this reason, QSM recommends that organizations collect both code counts and final function point counts for completed software projects and use this data for estimates. Where there is no completed project data available for estimation, we provide the following gearing factor information (where sufficient project data exists):
– the average
– the median
– the range (low – high)
We hope this information will allow estimators to assess the amount of variation, the central tendency, and any skew to the distribution of gearing factors for each language.
Note that the applications a language is used for may differ significantly: C++, Assembly, Ada, … may be used for much more complex projects than Visual Basic, Java, etc. – Rowe’s 2 cents worth.
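A sketch of how gearing factors are used once you have an FP estimate. The SLOC-per-FP values below are illustrative placeholders, not QSM’s published figures; substitute the averages or medians from Table 1.

```python
# Sketch: converting a Function Point estimate to a SLOC estimate with
# language gearing factors. The factors are hypothetical, for illustration.
GEARING = {
    "C": 104,
    "Java": 53,
    "Visual Basic": 42,
}

fp_estimate = 157  # e.g., from the Function Point sketch above
for language, sloc_per_fp in GEARING.items():
    print(f"{language:13s} ~{fp_estimate * sloc_per_fp:,} SLOC")
```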
2.1.1 “A Metrics Suite for Object Oriented Design,” S.R. Chidamber and C.F. Kemerer, IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
See metrics below
2.1.2 “A Validation of Object-Oriented Design Metrics as Quality Indicators,” V.R. Basili, L.C. Briand, and W.L. Melo, IEEE Trans. Software Eng., vol. 22, no. 10, Oct. 1996.
WMC – Weighted Methods per Class – the number of methods and operators defined in a class (excluding those inherited from parent classes). The higher the WMC, the higher the probability of fault detection.
DIT – Depth of Inheritance Tree – the number of ancestors of a class. The higher the DIT, the higher the probability of fault detection.
NOC – Number of Children of a class – the number of direct descendants of a class. NOC was inversely related to fault detection. This was believed to result from high levels of reuse by children. Perhaps also, if the inheritance fan-out is wide rather than deep, then we have fewer levels of inheritance.
CBO – Coupling Between Object Classes – how many member functions or instance variables of another class does a class use, and how many other classes are involved. CBO was significantly related to the probability of finding faults.
RFC – Response For a Class – the number of functions of a class that can be directly executed by other classes (public and friend). The higher the RFC, the higher the probability of fault detection.
Many coding standards address these either directly or indirectly. For instance: limit DIT to 3 or 4, provide guidance against coupling, and provide guidance on methods per class.
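As a rough illustration, the sketch below approximates three of these metrics for Python classes via introspection. It is not a faithful implementation of the Chidamber-Kemerer definitions (e.g., WMC is unweighted here), just a feel for what the counts measure.

```python
# Sketch: approximate WMC, DIT, and NOC for Python classes by introspection.
import inspect

class Shape: ...
class Polygon(Shape): ...
class Triangle(Polygon):
    def area(self): ...
    def perimeter(self): ...

def wmc(cls):  # methods defined on the class itself (unweighted count)
    return len([m for m, _ in inspect.getmembers(cls, inspect.isfunction)
                if m in cls.__dict__])

def dit(cls):  # number of ancestors, excluding the class itself and `object`
    return len(cls.__mro__) - 2

def noc(cls):  # number of direct descendants
    return len(cls.__subclasses__())

for c in (Shape, Polygon, Triangle):
    print(f"{c.__name__:8s} WMC={wmc(c)} DIT={dit(c)} NOC={noc(c)}")
```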
2.2 Use of SPC in software quality assurance.
o Pareto by function – the 80-20 rule: 80% of defects are found in 20% of the modules
o Control and run charts – if error rates increase above some control level, we need to take action (see the sketch after this list)
o Look for causes, modify process, modify design, reengineer, rewrite, …
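A simple control-chart sketch for the second bullet: a c-chart over per-build defect counts with the standard mean +/- 3*sqrt(mean) limits. The counts are invented sample data.

```python
# Sketch: a c-chart (count-of-defects control chart) over build periods.
import math

defects_per_build = [7, 5, 9, 6, 8, 21, 7, 6]  # invented sample data

mean = sum(defects_per_build) / len(defects_per_build)
ucl = mean + 3 * math.sqrt(mean)            # upper control limit
lcl = max(0.0, mean - 3 * math.sqrt(mean))  # lower control limit

for build, count in enumerate(defects_per_build, start=1):
    flag = "  <-- out of control, look for a cause" if count > ucl else ""
    print(f"build {build}: {count:3d}{flag}")
print(f"mean={mean:.1f}, UCL={ucl:.1f}, LCL={lcl:.1f}")
```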
2.3 Questions about Metrics
Is publishing metrics that relate to program composition actually beneficial to quality?