A Population Size Estimation Problem
Eliezer Kantorowitz, Software Engineering Department, Ort Braude College of Engineering
[email protected]

Jan 02, 2016

Page 1: A Population Size Estimation Problem

A Population Size Estimation Problem

Eliezer Kantorowitz
Software Engineering Department
Ort Braude College of Engineering

[email protected]

Page 2: A Population Size Estimation Problem

11/7-2006 @2006 Eliezer Kantorowitz 2

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 3: A Population Size Estimation Problem

Estimating Population Size

• Two steps
  1. Make an observation
  2. Employ an estimator on the number of observed items

• Example: Industrial quality assurance
  1. Count the number of defects in a sample
  2. Estimate the defect population size from the defects counted in the sample

Page 4: A Population Size Estimation Problem

Partial Observation Methods

• A partial observation method is an observation method that does not produce a count of all the relevant items

• Example: Due to poor lighting, some of the defective items in the sample cannot be seen

Page 5: A Population Size Estimation Problem

This talk is about estimators applicable when using partial observation methods

Page 6: A Population Size Estimation Problem

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 7: A Population Size Estimation Problem

Counting Wild Animals: Capture Recapture Estimators

• Example: Counting the gazelle population in upper Galilee

• Problem: We can only observe a part of the n members of the gazelle population

• Solution: We capture ntag gazelles in a trap. The gazelles are tagged and freed

• We assume that the freed gazelles are evenly mixed with the remaining n-ntag gazelles

Page 8: A Population Size Estimation Problem

Capture Recapture (CR) - 2

• We set a new trap and capture m gazelles, of which mtag are recaptured (tagged) gazelles

• The gazelle population size n may be estimated as

$$n \approx n_{tag}\,\frac{m}{m_{tag}}, \qquad \text{assuming } m_{tag} > 0$$
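
The estimate on this slide can be sketched in a few lines of Python. This is a minimal illustration and not part of the original talk: the function name `capture_recapture_estimate` and the gazelle counts are hypothetical, and the formula assumed is n ≈ ntag · m / mtag with mtag > 0.

```python
def capture_recapture_estimate(n_tag: int, m: int, m_tag: int) -> float:
    """Estimate the population size n from a two-catch capture-recapture study.

    n_tag: animals tagged and freed after the first catch
    m:     animals caught in the second trap
    m_tag: animals in the second catch that carry a tag
    """
    if m_tag <= 0:
        raise ValueError("m_tag must be positive: no tagged animal was recaptured")
    if m_tag > m or m_tag > n_tag:
        raise ValueError("m_tag cannot exceed m or n_tag")
    # Assumes the tagged animals are evenly mixed with the rest,
    # so m_tag / m is close to n_tag / n.
    return n_tag * m / m_tag


# Hypothetical counts: 50 gazelles tagged, 40 caught later, 10 of them tagged.
print(capture_recapture_estimate(n_tag=50, m=40, m_tag=10))  # 200.0
```

The check on `m_tag` mirrors the assumption stated with the formula: without at least one recaptured animal the ratio is undefined.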

Page 9: A Population Size Estimation Problem

Capture Recapture (CR) - 3

• A number of different CR estimators, corresponding to different sets of assumptions, have been developed

• The essence of CR is that we introduce (inject) a KNOWN number of tagged animals into a population of unknown size. This known number can be employed in later statistical analysis

Page 10: A Population Size Estimation Problem

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 11: A Population Size Estimation Problem

Problem discussed in the following:

Estimating the Number of Defects in a Software User Requirements Document (URD)

Page 12: A Population Size Estimation Problem

User Requirements Document (URD)

• Prepared by software analysts and users

• Part of software ordering contract

• In one case 55% of all defects (“bugs”) were URD defects

• URD validation usually done by inspection

Page 13: A Population Size Estimation Problem

Example: a URD used in our experiments

PURPOSE
Manage a costume shop, which rents and sells costumes.
Control the inventory and customer databases. Manage orders and invoices.

CUSTOMER DATABASE - SYSTEM ACTIVITIES
Enter new customers.
Automatic updates of the customer’s database.
List of customers active over the last three years.
List of customers ordered by the age of the children.
List of customers ordered by their purchase and rental transactions.

Page 14: A Population Size Estimation Problem

URD validation

• Usually done by inspection

Page 15: A Population Size Estimation Problem

Inspection Method (Fagan 1986)

• The inspected document is presented by its originator to a team of human inspectors

• Each inspector inspects the entire document and records the found defects

• Meeting of all inspectors, where defects found by different inspectors are checked and combined into one list

Page 16: A Population Size Estimation Problem

Inspection Problem

• Usually an inspector sees only a part of all defects

• Different inspectors usually see different sets of defects

• A team of j+1 inspectors usually detects more defects than a team of j inspectors

• Inspection costs proportional to j

Page 17: A Population Size Estimation Problem

Fault Detection Ratio (FDR) as Function of Inspector Team Size

[Figure: "Teams of two inspectors". FDR (y-axis, 0 to 1) plotted against the number of inspectors (x-axis, 2 to 26) for Experiment 1 and Experiment 2.]

FDR = (number of detected faults) / (total number of faults)

Page 18: A Population Size Estimation Problem

Using Capture Recapture (CR)

• CR was adapted to the inspection problem

• Defects detected by more than one inspector play a similar role to that of recaptured gazelles

• Extensive experiments suggest that CR does not provide sufficiently accurate estimates
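
One way to picture the adaptation, assuming just two inspectors: the defects found by the first inspector play the role of the tagged gazelles, the defects found by the second inspector are the second catch, and the overlap corresponds to the recaptured animals. The sketch below illustrates only that simplified two-inspector case. The defect IDs and the function name `cr_defect_estimate` are hypothetical, and the CR estimators evaluated in the literature rest on more elaborate assumptions.

```python
def cr_defect_estimate(found_by_a: set, found_by_b: set) -> float:
    """Two-inspector capture-recapture estimate of the total number of defects."""
    overlap = found_by_a & found_by_b  # defects "recaptured" by inspector B
    if not overlap:
        raise ValueError("no defect was found by both inspectors")
    return len(found_by_a) * len(found_by_b) / len(overlap)


# Hypothetical inspection records (defect IDs are made up for illustration).
inspector_a = {"D1", "D2", "D3", "D4", "D5", "D6"}
inspector_b = {"D4", "D5", "D6", "D7"}

print(len(inspector_a | inspector_b))                # 7 distinct defects seen
print(cr_defect_estimate(inspector_a, inspector_b))  # 8.0 estimated in total
```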

Page 19: A Population Size Estimation Problem

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 20: A Population Size Estimation Problem

Defect Injection

• In CR methods we free a KNOWN number of tagged animals

• In defect injection methods we introduce (inject) a KNOWN number of defects into the document

Page 21: A Population Size Estimation Problem

Defect Injection Method

• ninjected – number of injected defects

• ninjected-detected – number of detected injected defects

• nreal – number of real defects (the unknown)

• ndetected-real – number of detected real defects

• Estimated number of real defects:

nreal = ndetected-real (ninjected / ninjected-detected)
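
As a worked illustration of this formula, the following Python sketch computes the estimate for one made-up inspection. The function name and all counts are hypothetical; only the formula nreal = ndetected-real · (ninjected / ninjected-detected) comes from the slide.

```python
def defect_injection_estimate(n_injected: int,
                              n_injected_detected: int,
                              n_detected_real: int) -> float:
    """Estimate the number of real defects from a defect-injection inspection."""
    if n_injected_detected <= 0:
        raise ValueError("at least one injected defect must be detected")
    if n_injected_detected > n_injected:
        raise ValueError("cannot detect more injected defects than were injected")
    # The detection rate of the injected defects is assumed to carry over
    # to the real defects, i.e. the injected defects are "representative".
    return n_detected_real * n_injected / n_injected_detected


# Hypothetical inspection: 20 defects injected, 15 of them found,
# together with 30 real defects.
print(defect_injection_estimate(20, 15, 30))  # 40.0
```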

Page 22: A Population Size Estimation Problem

Problems of Defect Injection

• The injected defects must “represent” the real defects “correctly”

Page 23: A Population Size Estimation Problem

Defect Types Distribution

Distribution of Fault Types (pie chart):

• Missing Functionality: 22%

• Inconsistent Information: 39%

• Missing Information: 39%

Page 24: A Population Size Estimation Problem

Examples of Injected Defects

Inconsistent information:

• Lists of customers entered by different techniques that contradict each other (lines 3 and 26).

• Cancellation of an order that was reserved is illegal (lines 28 and 32).

• The systems do not keep customer data for more than three years (lines 5 and 10).

• There is not enough information about the customers in the system (lines 6 and 10).

• An article that was reserved cannot be sold (lines 27 and 33).

Missing functionality:

…

Missing information:

Page 25: A Population Size Estimation Problem

Defect Injection Summary

• Common method for software documents

• Sufficiently accurate estimates

• Difficult to produce “representative” defects

• Laborious

Page 26: A Population Size Estimation Problem

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 27: A Population Size Estimation Problem

The Experimenters

• Eliezer Kantorowitz

• Arie Guttman

• Lior Arzi

• Assaf Harel

Page 28: A Population Size Estimation Problem

Experiments - 1

• Computer Science students at Technion
  – 250 freshmen
  – 69 seniors

• Industry engineers
  – 25 engineers

• Two experiments from the literature
  – 57 senior Computer Science students

• Altogether 401 persons involved

Page 29: A Population Size Estimation Problem

Experiments - 2

• Employed requirements documents
  – Costume shop information system
  – Missile launcher
  – Railroad system (in experiments from the literature)

• Data of good quality
  – 401 persons
  – Careful preparation

Page 30: A Population Size Estimation Problem

Typical Results

The Y axis shows the number of inspectors who detected each defect. The two “easiest to detect” defects were detected by 6 inspectors each

Page 31: A Population Size Estimation Problem

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 32: A Population Size Estimation Problem

The Model - 1

[Figure: the linearity assumption. Pi,1 (probability that fault i is detected by 1 inspector) plotted against i (fault number). The y-axis is marked at P0,1 and 1; the x-axis is marked at 0, nmax and n-1.]

Page 33: A Population Size Estimation Problem

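The Model - 2: The linearity assumption

$$P_{i,1} = P_{0,1} - \frac{P_{0,1}}{n_{max}}\, i, \qquad 0 \le i < n_{max}$$

Pi,1 – probability that one inspector detects defect i.

nmax – defects 0 ≤ i < nmax can be detected

$$FDR_{max} = \frac{n_{max}}{n}$$

$$P_{i,1} = P_{0,1}\left(1 - \frac{1}{FDR_{max}}\,\frac{i}{n}\right), \qquad 0 \le \frac{i}{n} < FDR_{max}$$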
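
The linearity assumption can be sketched as a small Python function. This is an illustration under stated assumptions rather than the author's code: defects are taken to be indexed from the easiest (i = 0) to the hardest, the probability is taken to fall linearly from P0,1 to zero at nmax, and defects with i ≥ nmax are treated as undetectable (probability 0). The function name and the parameter values are hypothetical.

```python
def p_single(i: int, p0: float, n_max: int) -> float:
    """Probability that ONE inspector detects defect i under the linear model.

    p0 is P0,1 (detection probability of the easiest defect) and
    n_max is the number of defects the inspectors are able to detect.
    """
    if 0 <= i < n_max:
        return p0 * (1.0 - i / n_max)
    return 0.0  # assumption: defects beyond n_max are never detected


# Hypothetical values: P0,1 = 0.4 and n_max = 10.
for i in (0, 5, 9, 10):
    print(i, p_single(i, p0=0.4, n_max=10))
```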

Page 34: A Population Size Estimation Problem

The Model - 3

• P0,1 - The probability that one inspector detects the “easiest to detect” defect

• P0,1 ∈ [0,1] - A measure of the ease of detection

• FDRmax – The inspectors are able to detect the proportion FDRmax of the n defects, i.e. FDRmax·n defects

• FDRmax ∈ [0,1] – a measure of the domain knowledge of the inspectors

Page 35: A Population Size Estimation Problem

The Model – 4

The probability that j inspectors will detect defect i may be estimated:

$$P_{i,j} = 1 - (1 - P_{i,1})^{j}$$

j inspectors are expected to detect FDR(j)·n defects:

$$FDR_{old}(j) = \frac{1}{n}\sum_{i=0}^{n_{max}-1} P_{i,j}$$

For n → ∞, with x = i/n:

$$FDR(j) = \int_{0}^{FDR_{max}} P_{i,j}(x)\,dx$$
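
The finite-n version of this slide can be checked numerically. The Python sketch below assumes the linear model Pi,1 = P0,1 · (1 - i/nmax) for 0 ≤ i < nmax, assumes that the j inspectors work independently (which is what Pi,j = 1 - (1 - Pi,1)^j expresses), and evaluates the sum directly. The function names and the parameter values are hypothetical.

```python
def p_single(i: int, p0: float, n_max: int) -> float:
    """Linear model: probability that one inspector detects defect i."""
    return p0 * (1.0 - i / n_max) if 0 <= i < n_max else 0.0


def p_team(i: int, j: int, p0: float, n_max: int) -> float:
    """Probability that at least one of j independent inspectors detects defect i."""
    return 1.0 - (1.0 - p_single(i, p0, n_max)) ** j


def fdr_sum(j: int, p0: float, n_max: int, n: int) -> float:
    """Expected fault detection ratio of j inspectors, as a sum over the defects."""
    return sum(p_team(i, j, p0, n_max) for i in range(n_max)) / n


# Hypothetical document with n = 100 defects, 99 of them detectable.
for j in (1, 2, 4, 8):
    print(j, round(fdr_sum(j, p0=0.4, n_max=99, n=100), 4))
```

As n grows with nmax/n held fixed, the sum behaves like a Riemann sum and approaches the integral form.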

Page 36: A Population Size Estimation Problem

Kantorowitz Estimator

$$FDR(j) = FDR_{max}\left[1 - \frac{1 - (1 - P_{0,1})^{j+1}}{P_{0,1}\,(j+1)}\right]$$

Example of application: A quality assurance manager can employ this estimator to estimate the number of inspectors j required to detect the proportion FDR(j) of all faults. The coefficients FDRmax and P0,1 must somehow be estimated

This estimator is implicitly the cost function required in Total Quality Management (TQM). The number of inspectors j represents the costs, while FDR(j) represents the quality
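
The quality-assurance use case can be sketched in Python. The closed form used here, FDR(j) = FDRmax · [1 - (1 - (1 - P0,1)^(j+1)) / (P0,1 · (j+1))], is a reconstruction of the slide's formula obtained by integrating the linear model, so treat it as an assumption. The function names, the search limit and the target value are hypothetical.

```python
from typing import Optional


def fdr(j: int, p0: float, fdr_max: float) -> float:
    """Estimated fault detection ratio for a team of j inspectors."""
    if not 0.0 < p0 <= 1.0:
        raise ValueError("p0 must lie in (0, 1]")
    return fdr_max * (1.0 - (1.0 - (1.0 - p0) ** (j + 1)) / (p0 * (j + 1)))


def inspectors_needed(target: float, p0: float, fdr_max: float,
                      j_limit: int = 1000) -> Optional[int]:
    """Smallest team size j whose estimated FDR reaches the target, or None."""
    for j in range(1, j_limit + 1):
        if fdr(j, p0, fdr_max) >= target:
            return j
    return None  # target not reachable within j_limit (e.g. target > fdr_max)


# Hypothetical scenario: P0,1 = 0.4, FDRmax = 0.99, at least half of all faults wanted.
print(inspectors_needed(0.5, p0=0.4, fdr_max=0.99))  # 4
```

Because FDR(j) stays below FDRmax for every finite j, a target above FDRmax can never be met, which is why the search returns `None` instead of looping forever.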

Page 37: A Population Size Estimation Problem

Application example: What is the optimal inspector team size?

[Figure: "FDR vs. # of inspectors". FDR (y-axis, 0 to 1) plotted against the number of inspectors for teams of 1, teams of 2 and teams of 3.]

Teams of 2 detect the largest number of defects per inspector

Page 38: A Population Size Estimation Problem

Application example: Comparing Engineers with Students

                      P0,1    FDRmax
1st year students     0.40    0.99
Industry engineers    0.74    0.99

Experiment with Missile launcher user requirements document

Example: teams of 4 students achieve FDR=0.53, while teams of only two engineers achieve FDR=0.54, i.e. an engineer detected about twice as many defects as a student
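
The "about twice as many" claim can be checked roughly by dividing the estimated FDR by the team size. The Python sketch below assumes the closed form FDR(j) = FDRmax · [1 - (1 - (1 - P0,1)^(j+1)) / (P0,1 · (j+1))] (a reconstruction of the talk's estimator), reads the slide's table as P0,1 = 0.40 for students and 0.74 for engineers with FDRmax = 0.99 for both, and reads the example as teams of four students versus teams of two engineers. With these rounded coefficients the student value matches the slide's 0.53, while the engineer value comes out near 0.55 rather than 0.54.

```python
def fdr(j: int, p0: float, fdr_max: float) -> float:
    """Estimated fault detection ratio for a team of j inspectors."""
    return fdr_max * (1.0 - (1.0 - (1.0 - p0) ** (j + 1)) / (p0 * (j + 1)))


# Coefficients as read from the slide; team sizes as read from the example.
students = fdr(4, p0=0.40, fdr_max=0.99)
engineers = fdr(2, p0=0.74, fdr_max=0.99)

per_student = students / 4    # share of all defects found per student
per_engineer = engineers / 2  # share of all defects found per engineer

print(round(students, 3), round(engineers, 3))
print(round(per_engineer / per_student, 2))  # roughly 2
```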

Page 39: A Population Size Estimation Problem

Summary of my estimator

• Based on a property of the data observed in a large number of experiments

• The estimator was derived by modeling the observed property of the data

• Sufficiently accurate

• Measuring the two coefficients of the model, P0,1 and FDRmax, is laborious; however, their numerical values may be estimated from similar cases

Page 40: A Population Size Estimation Problem

Table of Contents

• The problem

• Capture Recapture Estimators

• Estimating number of software defects

• Defect injection estimators

• Our experiments

• Our estimator

• Conclusions and Research plans

Page 41: A Population Size Estimation Problem

Surveyed Estimators for the Incomplete Counting Problem

• Capture Recapture

• Defect Injection

• My estimator

Page 42: A Population Size Estimation Problem

My Estimator vs. Capture Recapture Estimators

• Our estimator was sufficiently accurate for estimating the number of defects in user requirements documents, while the CR estimators were not sufficiently accurate

• Were the data in the extensive CR experiments of sufficiently good quality?

Page 43: A Population Size Estimation Problem

Why Did My Estimator Work?

• My estimator exploited a property of the data (the linearity assumption)

• The property was detected through careful, extensive experimentation

Page 44: A Population Size Estimation Problem

Looking for Similar Applications

• Can the approach of this research be useful in other areas where the employed observation method counts only part of the relevant items?