Metrics and experimentation Kristian Sandahl
Introduction
Motivation:
• Management:
– Appraisal
– Assurance
– Control
– Improvement
• Research:
– Cause-effect models
Terms:
• Metric
• Measurement
Classification
• Product metrics:
– Observable or computed properties of the product
– Examples: lines of code, number of pages
• Process metrics:
– Properties of how you are developing the product
– Examples: cycle time for a change request, number of parallel activities
• Resource metrics:
– Properties and volumes of the instruments you are using when developing the product
– Examples: years of education, amount of memory in the testing environment
Scales

Scale      Operations   Meaning          Examples
Nominal    =, ≠         Categories       Type of software
Ordinal    <, >         Rankings         Skill rating: high, medium, low
Interval   +, -         Differences      Project delay
Ratio      /            Absolute zero    Lines of code
Theoretical validation of metrics
Representational theory, based on the mapping between attributes of real-world entities and numerical values and units:
• For an attribute to be measurable, it must allow different entities to be distinguished from one another.
• A valid measure must obey the representational condition.
• Each unit of an attribute contributing to a valid measure is equivalent.
• Different entities can have the same attribute value.
Property-based theory, based on graph-theoretic models of software modules:
• Examples: non-negativity, null value, additivity
Structural model of measurement
Empirical (external) validation of metrics
• Correlation between internal and external attributes
• Cause-effect models
• Handle bias
• Statistical analysis
Goal-Question-Metric (GQM)
Halstead's software science (1/2)
The measurable and countable properties are:
• n1 = number of unique or distinct operators appearing in that implementation
• n2 = number of unique or distinct operands appearing in that implementation
• N1 = total usage of all of the operators appearing in that implementation
• N2 = total usage of all of the operands appearing in that implementation
Source: http://yunus.hacettepe.edu.tr/~sencer/complexity.html
Halstead's software science (2/2)
Equations (a small computational sketch follows this list):
• Vocabulary: n = n1 + n2
• Implementation length: N = N1 + N2
• Length equation: N' = n1 log2 n1 + n2 log2 n2
• Program volume: V = N log2 n
• Potential volume: V' = (n1* + n2*) log2(n1* + n2*)
• Program level: L = V' / V
• Estimated program level: L' = 2n2 / (n1 N2)
• Elementary mental discriminations (effort): E = V / L = V^2 / V'
• Intelligence content: I = L' * V = (2n2 / (n1 N2)) * (N1 + N2) log2(n1 + n2)
• Time: T' = n1 N2 (n1 log2 n1 + n2 log2 n2) log2 n / (2 n2 S), where S is the Stroud number (mental discriminations per second)
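Below is a minimal sketch of these equations in Python; the function name and the example counts are illustrative assumptions, not taken from the lecture.

```python
# Hedged sketch of Halstead's equations; the counts are assumed inputs.
import math

def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    n = n1 + n2                                       # vocabulary
    N = N1 + N2                                       # implementation length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)   # length equation
    V = N * math.log2(n)                              # program volume
    L_hat = (2 * n2) / (n1 * N2)                      # estimated program level L'
    E = V / L_hat                                     # effort, using the estimated level
    I = L_hat * V                                     # intelligence content
    return {"vocabulary": n, "length": N, "estimated_length": N_hat,
            "volume": V, "level": L_hat, "effort": E, "intelligence": I}

# Example with made-up counts (not from the lecture):
print(halstead(n1=10, n2=15, N1=40, N2=60))
```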
Code metrics in Visual Studio
• Lines of Code
• Cyclomatic Complexity
• Maintainability Index = 171 - 5.2*ln(aveV) - 0.23*ave(g') - 16.2*ln(aveLOC), where aveV is the average Halstead volume, ave(g') the average cyclomatic complexity, and aveLOC the average lines of code per module (a small sketch follows this list)
• Depth of Inheritance
• Class Coupling
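A minimal sketch of the classic Maintainability Index formula above; it is an illustration of the equation, not the Visual Studio implementation, and the example averages are made up.

```python
# Hedged sketch of the Maintainability Index formula; inputs are per-module
# averages (aveV = Halstead volume, ave(g') = cyclomatic complexity, aveLOC = lines of code).
import math

def maintainability_index(ave_halstead_volume: float,
                          ave_cyclomatic_complexity: float,
                          ave_loc: float) -> float:
    return (171
            - 5.2 * math.log(ave_halstead_volume)
            - 0.23 * ave_cyclomatic_complexity
            - 16.2 * math.log(ave_loc))

# Example with made-up module averages:
print(maintainability_index(1500.0, 7.0, 120.0))
```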
Function Points - Background
• First suggested by Albrecht, 1979
• Captures complexity and size
• Language independent
• Can be used before implementation
• Used as input for estimation
• Common versions: IFPUG v 4.x
• Competitor MARK II:
– simpler to count
– has finer granularity
– is a continuous measure
• A "closed community"
• Traditionally used for business systems
COSMIC-FFP (COmmon Software Measurement International Consortium Full Function Point)
• An ISO-approved method for calculating function points for embedded, real-time systems
• Partitions the system into Functional User Requirements (FUR)
Example: Change customer data in a warehouse of items

Functional process step    Data movement   Cfsu
User entry                 Entry           1
Retrieve customer data     Read            1
Display error message      Exit            1
Display customer data      Exit            1
Enter changed data         Entry           1
Retrieve item data         Read            1
Store item data            Write           1
Store modified data        Write           1
Total                                      8
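A minimal sketch of this tally in Python; the list below simply re-encodes the table, and the counting rule (one Cfsu per data movement) is as on the slide.

```python
# Hedged sketch: tallying COSMIC functional size as one unit per data movement
# (Entry, Exit, Read, Write), matching the warehouse example above.
movements = [
    ("User entry", "Entry"),
    ("Retrieve customer data", "Read"),
    ("Display error message", "Exit"),
    ("Display customer data", "Exit"),
    ("Enter changed data", "Entry"),
    ("Retrieve item data", "Read"),
    ("Store item data", "Write"),
    ("Store modified data", "Write"),
]
cfsu = sum(1 for _, kind in movements if kind in {"Entry", "Exit", "Read", "Write"})
print(f"Total Cfsu: {cfsu}")  # 8, as in the table
```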
Connections to other methods
• Mapping to UML: model use cases as sequence diagrams and count the messages
• Cfsu = C1 + C2 * FP, for less than 100 Cfsu (a small conversion sketch follows this list)
– C2 is roughly 1.1-1.2
– C1 varies
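A minimal sketch of this conversion; the default value of c2 is an assumed mid-point of the 1.1-1.2 range from the slide, and c1 must be calibrated locally.

```python
# Hedged sketch of the linear conversion above: Cfsu = c1 + c2 * FP (valid below ~100 Cfsu).
def cfsu_from_function_points(fp: float, c1: float, c2: float = 1.15) -> float:
    # c1 varies between organisations; c2 = 1.15 is an assumed mid-point of 1.1-1.2
    return c1 + c2 * fp

print(cfsu_from_function_points(fp=40, c1=5.0))  # made-up numbers
```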
Why do we need experimental studies?
Software Engineering has great variation in:
• Scale
• Domain
• Tools
• Infrastructure
• Human resources
• Organization
• Locality
• Technique
• Quality method
[Diagram: cause → effect]
What is an experiment?
[Diagram: the units (subjects) are divided into a group that receives the treatment and a control group that receives no treatment; the outcomes of the two groups are compared.]
Types of experiments
• Randomized experiment: units receiving the treatment are selected at random
• Quasi-experiment: units are not selected randomly
• Controlled experiment: comparison between treatments (Sjøberg et al. 2005)
• Correlation study: observes relationships between variables (empirical evaluation)
• Replication: repeating the study
• Differentiated replication: replication with variation of essential conditions
Variables
• Background variables (age, sex, education, experience, ...): differ between units and are not changeable
• Controlled variables (time of day, temperature, available resources, ...): kept the same; observed
• Independent variables (method used, tool used, size of task, group size, ...): manipulated; observed
• Dependent variables (number of errors made, time to complete the task, judgement of quality, ...): assumed to change as an effect of the manipulation of the independent variables
Validity threats
• Internal validity: are differences in the dependent variables really due to changes of the independent variables?
• Conclusion validity: are our measurement and analysis methods appropriate?
• Construct validity: are we measuring the phenomena we intend to measure?
• External validity: to what population can we generalise our results?
Comparing means
• Under certain conditions: Student's t-test
• Significance level: normally 5%
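A minimal sketch of such a comparison, assuming SciPy is available; the two groups of task-completion times are invented for illustration.

```python
# Hedged sketch: comparing two group means with Student's t-test.
from scipy import stats

# Hypothetical task-completion times (minutes) for two treatments
group_a = [38, 42, 35, 47, 40, 39]
group_b = [45, 50, 44, 52, 48, 46]

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # assumes roughly normal data, equal variances
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:  # 5% significance level, as on the slide
    print("Reject the null hypothesis of equal means")
```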
Comparing distributions
• Are the testers' methods the same?
• Under certain conditions: use the Chi-square test
• For 2x2 contingency tables other methods apply, for instance Cohen's Kappa
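A minimal sketch of a Chi-square test, assuming SciPy is available; the contingency table of testers versus observed method categories is invented.

```python
# Hedged sketch: chi-square test of independence on an invented
# testers-by-methods contingency table (rows: testers, columns: methods).
from scipy.stats import chi2_contingency

observed = [
    [12,  5,  8],   # tester 1
    [ 9, 11,  6],   # tester 2
    [14,  4,  7],   # tester 3
]
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
```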
The box plot
Comparing variance
Linear regression
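A minimal sketch of a straight-line fit with SciPy, in the spirit of the prediction models discussed later in the lecture; the (size, effort) pairs are invented.

```python
# Hedged sketch: fitting a straight line to invented (size, effort) data.
from scipy import stats

size_kloc = [2.1, 3.5, 5.0, 7.2, 9.8, 12.0]
effort_pm = [4.0, 6.1, 8.5, 11.9, 15.8, 19.5]   # person-months

fit = stats.linregress(size_kloc, effort_pm)
print(f"effort ~ {fit.intercept:.2f} + {fit.slope:.2f} * size, r^2 = {fit.rvalue**2:.2f}")
```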
Scope
Prediction of:
• Resources
• Calendar time
• Quality (or lack of quality)
• Change impact
• Process performance
• Often confounded with the decision process
Historical data
Methods for building prediction models
• Statistical
– Parametric: makes assumptions about the distribution of the variables; good tools for automation; linear regression, variance analysis, ...
– Non-parametric, robust: no assumptions about the distribution; less powerful, low degree of automation; rank-sum methods, Pareto diagrams, ...
• Causal models
– Link elements with semantic links or numerical equations
– Simulation models, connectionism models, genetic models, ...
• Judgemental
– Organise human expertise
– Delphi method, pair-wise comparison, the Lichtenberg method
The Lichtenberg method process
• Staff the analysis group
• Describe the work to be estimated
• Define general constraints and assumptions
• Define the structure
• Individual judgement of MIN, MAX, LIKELY
• Calculate the common result (see the aggregation sketch below)
• Find work packages with large variance
• Sub-divide them and rework

• 5-20 participants
• Never influence each other's judgements
• MIN and MAX should be extreme – 1% of the cases
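A hedged aggregation sketch of the triple estimates from the group process above; the weighting used here (mean = (MIN + 3*LIKELY + MAX)/5, sigma = (MAX - MIN)/5) is an illustrative assumption, not necessarily the weights prescribed by the Lichtenberg method, and the work packages are invented.

```python
# Hedged sketch: aggregating triple estimates; weights are assumed for illustration.
import math

# Hypothetical work packages: (name, MIN, LIKELY, MAX) in person-weeks
packages = [
    ("Requirements", 2, 4, 10),
    ("Design", 3, 6, 14),
    ("Implementation", 8, 15, 40),
    ("Test", 4, 8, 25),
]

means = [(lo + 3 * likely + hi) / 5 for _, lo, likely, hi in packages]
variances = [((hi - lo) / 5) ** 2 for _, lo, _, hi in packages]

total_mean = sum(means)
total_sigma = math.sqrt(sum(variances))   # variances add for independent packages
print(f"Total estimate: {total_mean:.1f} +/- {total_sigma:.1f} person-weeks")

# Work packages with large variance are candidates for sub-division and re-estimation
for (name, *_), var in zip(packages, variances):
    print(name, "variance:", round(var, 1))
```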
Common SE-predictions
• Detecting fault-prone modules
• Project effort estimation
• Change impact analysis
• Ripple-effect analysis
• Process improvement models
• Model checking
• Consistency checking
Common metrics
Introduction
• There are many faults in software
• Faults are costly to find and repair
• The later we find faults the more costly they are
• We want to find faults early
• We want to have automated ways of finding faults
• Our approach
– Automatic measurements on models
– Use metrics to predict fault-prone modules
Approach
• Find metrics (independent variables)
– Number of model elements (size)
– Number of changed methods (change)
– Transitions per state (complexity)
– Changed operations * transitions per state (combinations)
– ...
• Use metrics to predict (dependent variable)
– Number of TRs (trouble reports)
Capsules
State charts
Data model
[Diagram of the model elements: package, capsule class, attribute, operation, port, protocol, signal, state machine, state, transition]
Our project - modelmet
• RNC application, three releases
• About 7000 model elements
• TR statistics database (2000 TRs)
• Find metrics
– Existing metrics (collected at the standard daily build)
– Run scripts on the models
• Statistical analysis (a small classification sketch follows this list)
– Linear regression, principal component analysis, discriminant analysis, robust methods
– Neural networks, Bayesian belief networks
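A minimal sketch of predicting fault-prone modules from model metrics; logistic regression is used here as a simple stand-in for the discriminant analysis mentioned above, and all feature values and labels are invented.

```python
# Hedged sketch: classifying fault-prone modules from model metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: model elements (size), changed methods, transitions per state
X = np.array([
    [120,  3, 1.2],
    [540, 25, 3.8],
    [ 80,  1, 0.9],
    [610, 30, 4.1],
    [200,  5, 1.5],
    [450, 18, 3.0],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = module had trouble reports (fault-prone)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[300, 10, 2.0]]))        # predicted class for a new module
print(clf.predict_proba([[300, 10, 2.0]]))  # predicted probability of being fault-prone
```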
Metric categories: size, change, complexity, combined
Metrics based on change, system A
Metrics based on change, system B
Complexity and size metrics, system A
Complexity and Size metrics, system B
Other metrics, system A
TRD = C + 0.034 * states - 0.965 * protocols/model elements
Other metrics, system B
How to use predictions
• Uneven distribution of faults is common – the 80/20 rule
• Perform special treatment of the selected parts
– Select experienced designers
– Provide good working conditions
– Parallel teams
– Inspections
– Static and dynamic analysis tools
– ...
• Perform root-cause analysis and make corrections
Results
Contributions:
• Valid statistical material:
– Large models, a large number of TRs
– Two change projects
• Two highly explanatory predictors were found
• State chart metrics are as good as OO metrics
Problems:
• Some difficulty matching modules between the models and the TRs
• Effort needed to collect change data
www.liu.se