Estimation of Defects Based on
Defect Decay Model: ED3M
Abstract:
An accurate prediction of the number of defects in a software product during
system testing contributes not only to the management of the system testing process but
also to the estimation of the product’s required maintenance. Here, a new approach,
called Estimation of Defects based on Defect Decay Model (ED3M), is presented that
computes an estimate of the number of defects in an ongoing testing process. ED3M is based on
estimation theory. Unlike many existing approaches, the technique presented here does
not depend on historical data from previous projects or any assumptions about the
requirements and/or testers’ productivity. It is a completely automated approach that
relies only on the data collected during an ongoing testing process. This is a key
advantage of the ED3M approach as it makes it widely applicable in different testing
environments. Here, the ED3M approach has been evaluated using five data sets from
large industrial projects and two data sets from the literature. In addition, a performance
analysis has been conducted using simulated data sets to explore its behavior using
different models for the input data. The results are very promising; they indicate that the
ED3M approach provides accurate estimates, with convergence as fast as or faster than
well-known alternative techniques, while only using defect data as the
input.
EXISTING SYSTEM:
Several researchers have investigated the behavior of defect density based on
module size. One group of researchers has found that larger modules have lower defect
density. Two of the reasons provided for their findings are the smaller number of links
between modules and that larger modules are developed with more care. The second
group has suggested that there is an optimal module size for which the defect density is
minimal. Their results have shown that defect density depicts a U-shaped behavior
against module size. Still others have reported that smaller modules enjoy lower defect
density, exploiting the famous divide-and-conquer rule. Another line of studies has been
based on the use of design metrics to predict fault-prone modules. Briand et al. have
studied the degree of accuracy of capture-recapture models,
proposed by biologists, to predict the number of remaining defects during inspection
using actual inspection data. They have also studied the impact of the number of
inspectors and the total number of defects on the accuracy of the estimators based on
relevant capture-recapture models. Ostrand and Bell have developed a model to
predict which files will contain the most faults in the next release based on the structure
of each file, as well as fault and modification history from the previous release.
PROPOSED SYSTEM:
Many researchers have addressed this important problem with varying end
goals and have proposed estimation techniques to compute the total number of defects. A
group of researchers focuses on finding error-prone modules based on the size of the
module. Briand et al. predict the number of remaining defects during inspection using
actual inspection data, whereas Ostrand et al. predict which files will contain the most
faults in the next release. Zhang and Mockus use data collected from previous projects to
estimate the number of defects in a new project. However, these data sets are not always
available or, even if they are, may lead to inaccurate estimates. For example, Zhang and
Mockus use a naïve method based only on the size of the product to select similar
projects while ignoring many other critical factors such as project type, complexity, etc.
Another alternative that appears to produce very accurate estimates is based on the use of
Bayesian Belief Networks (BBNs). However, these techniques require the use of
additional information, such as expert knowledge and empirical data, that are not
necessarily collected by most software development companies. Software reliability
growth models (SRGMs) are also used to estimate the total number of defects to measure
software reliability. Although they can be used to indicate the status of the testing
process, some have slow convergence while others have limited application as they may
require more input data or initial values that are selected by experts.
Hardware and Software Requirements:
SOFTWARE REQUIREMENTS
VS .NET 2010, C#
Windows 7
HARDWARE REQUIREMENTS
Hard disk : 80 GB
RAM : 1 GB
Processor : Pentium Dual Core or above
Monitor : 17" color monitor
Scope of the project :
The goal of the project is to estimate the number of defects in a software product. The availability of this estimate allows a test manager to improve his planning, monitoring, and controlling activities, providing a more efficient testing process. Estimators can achieve high accuracy as more and more data become available and the process nears completion.
Introduction :
Software metrics are crucial for characterizing the development status of a software product. Well-defined metrics can help to address many issues, such as cost, resource planning (people, equipment such as testbeds, etc.), and product release schedules. Metrics have been proposed for many phases of the software development lifecycle, including requirements, design, and testing. In this paper, the focus is on characterizing the status of the software testing effort using a single key metric: the estimated number of defects in a software product. The availability of this estimate allows a test manager to improve his planning, monitoring, and controlling activities; this provides a more efficient testing process. Also, since, in many companies, system
testing is one of the last phases (if not the last), the time to release can be better assessed; the estimated remaining defects can be used to predict the required level of customer support. Ideally, a defect estimation technique has several important characteristics. First, the technique should be accurate as decisions based on inaccurate estimates can be time consuming and costly to correct. However, most estimators can achieve high accuracy as more and more data becomes available and the process nears completion.
By that time, the estimates are of little, if any, use. Therefore, a second important characteristic is that accurate estimates need to be available as early as possible during the system testing phase. The faster the estimate converges to the actual value (i.e., the lower its latency), the more valuable the result is to a test manager. Third, the technique should be generally applicable in different software testing processes and on different kinds of software products. The inputs to the process should be commonly available and should not require extensive expertise in an underlying formalism. In this case, the same technique can be widely reused, both within and among software development companies, reducing training costs, the need for additional tool support, etc. Many researchers have addressed this important problem with varying end goals and have proposed estimation techniques to compute the total number of defects. A group of researchers focuses on finding error-prone modules based on the size of the module. Briand et al. predict the number of remaining defects during inspection using actual inspection data, whereas Ostrand et al. predict which files will contain the most faults in the next release. Zhang and Mockus use data collected from previous projects to estimate the number of defects in a new project. However, these data sets are not always available or, even if they are, may lead to inaccurate estimates. For example, Zhang and Mockus use a naïve method based only on the size of the product to select similar projects while ignoring many other critical factors such as project type, complexity, etc. Another alternative that appears to produce very accurate estimates is based on the use of Bayesian Belief Networks (BBNs).
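The latency characteristic above can be made concrete: an estimator is considered to have converged once its estimates stay within a given tolerance of the actual value. The following is a minimal illustrative sketch (in Python for brevity; the function name, tolerance, and sample history are assumptions for illustration, not taken from the ED3M paper):

```python
def convergence_point(estimates, actual, tolerance=0.1):
    """Return the first index from which every subsequent estimate stays
    within `tolerance` (as a fraction) of the actual value; None if the
    series never converges."""
    for i in range(len(estimates)):
        if all(abs(e - actual) <= tolerance * actual for e in estimates[i:]):
            return i
    return None

# A hypothetical estimate history converging to an actual count of 100 defects.
history = [40, 70, 85, 93, 97, 99, 100]
point = convergence_point(history, actual=100, tolerance=0.1)
```

The lower this index relative to the length of the testing process, the earlier a test manager can act on the estimate.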
Estimation of Defects based on Defect Decay Model (ED3M) is a novel approach proposed here which has been rigorously validated using case studies, simulated data sets, and data sets from the literature. Based on this validation work, the ED3M approach has been shown to produce accurate final estimates with a convergence rate that meets or improves upon closely related, well-known techniques. The only input is the defect data; the ED3M approach is fully automated.
Although the ED3M approach has yielded promising results, there are defect prediction issues that are not addressed by it. For example, system test managers would benefit from obtaining a prediction of the defects to be found in system testing well before the testing begins, ideally in the requirements or design phase. This could be used to improve the plan for developing the test cases. The ED3M approach, which requires test defect data as the input, cannot be used for this. Alternate approaches which rely on different input data (e.g., historical project data and expert knowledge) could be selected to accomplish this. However, in general, these data are not available at most companies.
A second issue is that test managers may prefer to obtain the predictions for the number of defects on a feature-by-feature basis, rather than for the whole system. Although the ED3M approach could be used for this, the number of sample points for each feature may be too small to allow for accurate predictions. As before, additional information could be used to achieve such estimations, but this is beyond the scope of this paper. Third, the performance of the ED3M approach is affected when the data diverge from the underlying assumption of an exponential decay behavior.
General Concepts
Software defect prediction is the process of estimating the number of defects that a
software product under development will contain after a certain time frame.
It helps in allocating resources to rectify the defects and also helps in software cost
estimation. Generally, a defect prediction model is based on some current data and some
learning data, so most models rely on how defects are identified and corrected in an
organization.
The ED3M model, on the other hand, is a defect prediction model that does not require any
historical data. The only inputs needed are the bug reports of the first few days and the
rate of rectifying the bugs.
Based on these inputs, it calculates the model parameters (lambda) and estimates the value
of h(n), the curve-fitting element over n, the number of days during which defects are
intended to be detected.
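The curve-fitting step can be illustrated with a small sketch. Assuming the cumulative defect count follows an exponential decay form, x(t) = N * (1 - exp(-lambda * t)), the total defect count N and the decay rate lambda can be recovered by least squares. The grid-search approach and all names below are illustrative choices (Python for brevity), not the actual ED3M estimator:

```python
import math

def fit_decay(cumulative, lambdas=None):
    """Fit x(t) = N * (1 - exp(-lam * t)) to a cumulative defect series by
    grid search over lam; for each candidate lam, the best N has a
    closed-form least-squares solution. Returns (N_estimate, lam_estimate)."""
    if lambdas is None:
        lambdas = [i / 1000 for i in range(1, 2001)]  # lam in 0.001 .. 2.0
    best = None
    for lam in lambdas:
        g = [1 - math.exp(-lam * t) for t in range(1, len(cumulative) + 1)]
        denom = sum(gi * gi for gi in g)
        if denom == 0:
            continue
        # Closed-form least-squares N for this candidate lam.
        n = sum(x * gi for x, gi in zip(cumulative, g)) / denom
        err = sum((x - n * gi) ** 2 for x, gi in zip(cumulative, g))
        if best is None or err < best[0]:
            best = (err, n, lam)
    return best[1], best[2]

# Synthetic series: 120 total defects with decay rate 0.3 (invented numbers).
data = [120 * (1 - math.exp(-0.3 * t)) for t in range(1, 15)]
n_hat, lam_hat = fit_decay(data)
```

With noise-free synthetic data the fit recovers the generating parameters; with real, noisy defect counts the estimate improves as more sample points arrive, mirroring the convergence behavior discussed earlier.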
The concept also takes a noise factor into consideration. Noise is nothing but
misleading defect information. For example, suppose that in your project Facebook Connect
and Facebook Live Stream are both not working and both are reported in the bug report,
but the cause is a single problem in the Facebook API hook. The second report would
therefore be considered noise data here.
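This duplicate-report notion of noise can be approximated with a simple textual-similarity check over the report descriptions, such as the bigram similarity used later in the implementation. The sketch below (Python for brevity) uses the Dice coefficient over character bigrams; the 0.6 threshold is an arbitrary illustrative choice:

```python
def bigrams(text):
    """Set of character bigrams of the lowercased text."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def bigram_similarity(a, b):
    """Dice coefficient over character bigrams: 1.0 for identical bigram
    sets, 0.0 for no overlap."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def flag_noise(reports, threshold=0.6):
    """Return indices of reports that closely resemble an earlier report
    and are therefore treated as likely duplicates (noise)."""
    return [i for i in range(1, len(reports))
            if any(bigram_similarity(reports[i], reports[j]) >= threshold
                   for j in range(i))]
```

In the Facebook example above, the two near-identical reports score highly against each other, so the second one is flagged while an unrelated report is not.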
Module Diagram :
UML Diagram :
Use case diagram :
Project Defect History
Browse (To select the project)
Estimate the defects in a project
Error correction method
Report process
User
Login
Browse
Error estimation
Error correction
Report
Admin
Log of Defect Correction
Class Diagram :
Collaboration Diagram :
Sequence Diagram :
StateChart Diagram :
Activity Diagram :
Component Diagram :
Project Flow Diagram :
User
Browse the input folder
Apply Error Estimation techniques
Error Correction
Report for admin
System Architecture :
Literature review :
The traditional way of predicting software reliability has, since the 1970s, been the use of software reliability growth models. They were developed in a time when software was developed using a waterfall process model. This is in line with the fact that most software reliability growth models require a substantial amount of failure data to get any trustworthy estimate of the reliability. Software reliability growth models are normally described in the form of an equation with a number of parameters that need to be fitted to the failure data. A key problem is that the curve fitting often means that the parameters can only be estimated very late in testing and hence their industrial value for decision-making is limited. This is particularly the case when development is done, for example, using an incremental approach or other short-turnaround approaches. A sufficient amount of failure data is simply not available. The software reliability growth models were initially developed for a quite different situation than today. Thus, it is not a surprise that they are not really a fit for the challenges of today unless the problems can be circumvented. This paper addresses some of the possibilities for addressing the problems with software reliability growth models by looking at ways of estimating the parameters in software reliability growth models before entering integration or system testing.
Construction simulation tools typically provide results in the form of numerical or statistical data. However, they do not illustrate the modeled operations graphically in 3D. This poses significant difficulty in communicating the results of simulation models, especially to persons who are not trained in simulation but are domain experts. The resulting “Black-Box Effect” is a major impediment in verifying and validating simulation models. Decision makers often do not have the means, the training, and/or the time to verify and validate simulation models based solely on the numerical output of simulation models and are thus always skeptical about simulation analyses and have little confidence in their results. This lack of credibility is a major deterrent hindering the widespread use of simulation as an operations planning tool in construction. This paper illustrates the use of DES in the design of a complex dynamic earthwork operation whose control logic was verified and validated using 3D animation. The model was created using Stroboscope and animated using the Dynamic Construction Visualizer.
Over the years, many defect prediction studies have been conducted. The studies consider the problem using a variety of mathematical models (e.g., Bayesian Networks, probability distributions, reliability growth models, etc.) and characteristics of the project, such as module size, file structure, etc. A useful survey and critique of these techniques is available in the literature. Several researchers have investigated the behavior of defect density based on module size. One group of researchers has found that larger modules have lower defect density. Two of the reasons provided for their findings are the smaller number of links between modules and that larger modules are developed with more care. The second group has suggested that there is an optimal module size for which the defect density is minimal. Their results have shown that defect density depicts a U-shaped behavior against module size. Still others have reported that smaller modules enjoy lower defect density, exploiting the famous divide-and-conquer rule. Another line of studies has been based on the use of design metrics to predict fault-prone modules. Briand et al. have studied the degree of accuracy of capture-recapture models, proposed by biologists, to predict the number of remaining defects during inspection using actual inspection data. They have also studied the impact of the number of inspectors and the total number of defects on the accuracy of the estimators based on relevant capture-recapture models. Ostrand and Bell have developed a model to predict which files will contain the most faults in the next release based on the
structure of each file, as well as fault and modification history from the previous release. Their research [5] has shown that faults are distributed in files according to the famous Pareto Principle, i.e., 80 percent of the faults are found in 20 percent of the files. Zhang and Mockus assume that defects discovered and fixed during development are caused by implementing new features recorded as Modification Requests (MRs). Historical data from past projects are used to collect estimates for the defect rate per feature MR, the time to repair the defect in a feature, and the delay between a feature implementation and defect repair activities. The selection criteria for past similar projects are based only on the size of the project while disregarding many other critical characteristics. These estimates are used as input to a prediction model, based on the Poisson distribution, to predict the number of defect repair MRs. The technique that has been presented by Zhang and Mockus relies solely on historical data from past projects and does not consider the data from the current project. Fenton et al. have used BBNs to predict the number of defects in the software. The results shown are plausible; the authors also explain causes of the results from the model. However, accuracy has been achieved at the cost of requiring expert knowledge of the Project Manager and historical data (information besides defect data) from past projects. Currently, such information is not always collected in industry. Also, expert knowledge is highly subjective and can be biased. These factors may limit the application of such models to a few companies that can cope with these requirements. This has been a key motivating factor in developing the ED3M approach. The only information ED3M needs is the defect data from the ongoing testing process; this is collected by almost all companies. Gras et al. also advocate the use and effectiveness of BBNs for defect prediction.
However, they point out that the use of BBN is not always possible and an alternative method, Defect Profile Modeling (DPM), is proposed. Although DPM does not demand as much on calibration as BBN, it does rely on data from past projects, such as the defect
identifier, release sourced, phase sourced, release found, phase found, etc. Many reliability models have been used to predict the number of defects in a software product. The models have also been used to provide the status of the testing process based on the defect growth curve. For example, if the defect curve is growing exponentially, then more undiscovered defects are to follow and testing should continue. If the growth curve has reached saturation, then the decision regarding the fate of testing can be reviewed by managers and engineers.
Advantages :
1) Accurate estimates as early as possible during the system testing process.
2) Much more information to compute the estimates.
3) Estimation of defects in large modules.
4) Correction of the current estimate; the corrected value is the output.
To improve on the existing approaches, in particular with respect to the applicability, the following characteristics are needed in a technique.
First, it should use the defect count, an almost ubiquitous input, as the only data required to compute the estimates (historical data are not required). Most companies, if not all, developing software have a way to report defects which then can be easily counted. Second, the user should not be required to provide any initial values for internal parameters or expert knowledge; this results in a fully automated approach. Third, the technique should be flexible; it should be able to produce estimates based on defect data reported in execution time or calendar time.
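The third characteristic (accepting defect data reported in calendar time) amounts to turning raw report dates into a cumulative per-day series, filling in days with no reports. The following is an illustrative Python sketch (the function name and example dates are assumptions):

```python
from collections import Counter
from datetime import date, timedelta

def cumulative_counts(report_dates):
    """Turn raw defect report dates into a per-day cumulative series,
    filling in days with no reports, so the data can be fed to an
    estimator in calendar time."""
    counts = Counter(report_dates)
    first, last = min(counts), max(counts)
    series, total, day = [], 0, first
    while day <= last:
        total += counts.get(day, 0)   # days without reports add zero
        series.append((day, total))
        day += timedelta(days=1)
    return series
```

The same series, indexed by test-execution hours instead of dates, would serve for execution time; only the keys change.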
Numerous application areas, such as signal processing, defect estimation, and software reliability, need to extract information from observations that have been corrupted by noise. For example, in a software testing process, the observations are the number of defects detected; the noise may have been caused by the experience of the testers, size and complexity of the application, errors in collecting the data, etc. The information of interest to extract from the observations is the total number of defects in the software. A branch of statistics and signal processing, estimation theory, provides techniques to accomplish this.
(Figure: the observations x[0], x[1], ..., x[N-1] feed an estimator, which extracts the information of interest from the data set.)
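The estimation-theory idea can be shown in its simplest form: for observations x[n] = theta + w[n], where w[n] is i.i.d. Gaussian noise, the maximum-likelihood estimate of theta is the sample mean. A toy Python sketch (the defect total of 250 and the noise level are invented for illustration):

```python
import random

def ml_estimate_constant(observations):
    """For x[n] = theta + w[n] with i.i.d. Gaussian noise w[n], the
    maximum-likelihood estimate of theta is the sample mean."""
    return sum(observations) / len(observations)

# Toy example: the "information of interest" is a total of 250 defects,
# observed through noisy measurements.
random.seed(1)
obs = [250 + random.gauss(0, 5) for _ in range(200)]
est = ml_estimate_constant(obs)
```

ED3M applies the same principle to a time-varying signal (the decay curve) rather than a constant, but the noisy-observation setup is identical.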
Implementation
bmp = new Bitmap(pictureBox1.Width, pictureBox1.Height);
g = Graphics.FromImage(bmp);
listBox1.Items.Clear();

// Let the user pick the bug-report file; each line is "date<TAB>description".
OpenFileDialog ofd = new OpenFileDialog();
DialogResult dr = ofd.ShowDialog();
if (dr.Equals(DialogResult.OK))
{
    string fname = ofd.FileName;
    string[] Lines = File.ReadAllLines(fname);
    for (int i = 0; i < Lines.Length; i++)
    {
        listBox5.Items.Add(Lines[i]);
    }

    BugReport[] bug = new BugReport[Lines.Length - 1];
    int BugCount = bug.Length;
    // The first line is the header, so start reading from the next line.
    for (int i = 0; i < Lines.Length - 1; i++)
    {
        try
        {
            string s = Lines[i + 1]; // e.g., "2.1.2011<TAB>error!"
            string[] data = s.Split(new char[] { '\t' }); // Split takes a character array as input.
            string date = data[0]; // first part is the date,
            string name = data[1]; // second part is the description.
            bug[i] = new BugReport(name, date);
            BugCount = i + 1;
        }
        catch (Exception) { }
    }
    MessageBox.Show("Bug Count=" + BugCount.ToString());

    // W[i] is the noise component: the bigram similarity between consecutive
    // bug reports. Highly similar (correlated) reports are treated as noise.
    float[] W = new float[BugCount];
    for (int i = 1; i < BugCount; i++)
    {
        W[i] = NGram.GetBigramSimilarity(bug[i].Bug, bug[i - 1].Bug);
        listBox2.Items.Add(W[i]);
    }

    // Date-wise total bugs x[n]: a single date may contain more than one error.
    ArrayList totErrors = new ArrayList();
    int err = 1;
    for (int i = 1; i < BugCount; i++)
    {
        if (bug[i].date != bug[i - 1].date)
        {
            totErrors.Add(err);
            err = 1; // reset the counter for the new date
        }
        else
        {
            err++;
        }
    }
    totErrors.Add(err); // count for the last date

    int[] X = new int[totErrors.Count];
    for (int i = 0; i < X.Length; i++)
    {
        X[i] = (int)totErrors[i];
    }
    BugCount = X.Length; // update BugCount: now the number of distinct dates

    // Let theta be the date index (0, 1, 2, ... for the first date, second date, and so on).
    // P[x(n)|theta] is the likelihood of the observed count under Gaussian noise.
    double sigma = 0.5;
    double[] Px_Theta = new double[BugCount];
    for (int i = 0; i < BugCount; i++)
    {
        double p = 1 / Math.Sqrt(2 * Math.PI * sigma * sigma);
        double p1 = Math.Exp(-1 * (X[i] - i) * (X[i] - i) / (2 * sigma * sigma));
        Px_Theta[i] = p * p1; // Gaussian density: the normalizing factor multiplies the exponential
        listBox3.Items.Add(Px_Theta[i]);
    }

    // R(t) is the rate of error-correction velocity. From observation, assume
    // that once an error is tracked, it is solved in two days on average.
    double Rt = 2.0;
    // lambda1 = rate of error occurrence = total errors / distinct dates;
    // lambda2 is the scale of error occurrence.
    double lambda1 = (double)bug.Length / (double)BugCount;
    // Knowing R(init), Rt, and lambda1, Eq. (23) is solved for lambda2.
    double K = Rt / (double)X[0];
    double lambda2 = K * lambda1 / (1 - Math.Exp(-1 * lambda1));

    // Now calculate h(n), the curve-fitting element.
    double[] H = new double[BugCount];
    double[] LogCk = new double[BugCount];
    int x1 = pictureBox1.Width, x2 = 0, y1 = pictureBox1.Height, y2 = pictureBox1.Height;
    for (int i = 0; i < BugCount; i++)
    {
        double t1 = lambda2 / (lambda2 - lambda1);
        t1 = t1 * Math.Exp(-1 * lambda1 * i);