DSpace Institution
DSpace Repository http://dspace.org
Software Engineering thesis
2020
SOFTWARE EFFORT ESTIMATION
MODEL USING GENETIC AND
PARTICLE SWARM OPTIMIZATION ALGORITHM
DEMLEW, MEHARY
http://hdl.handle.net/123456789/11293
Downloaded from DSpace Repository, DSpace Institution's institutional repository
BAHIR DAR UNIVERSITY
BAHIR DAR INSTITUTE OF TECHNOLOGY
SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES
FACULTY OF COMPUTING
SOFTWARE EFFORT ESTIMATION MODEL USING GENETIC AND
PARTICLE SWARM OPTIMIZATION ALGORITHM
BY
MEHARY DEMLEW AREGU
BAHIR DAR, ETHIOPIA
August, 2020
SOFTWARE EFFORT ESTIMATION MODEL USING GENETIC AND PARTICLE
SWARM OPTIMIZATION ALGORITHM
BY
MEHARY DEMLEW AREGU
A thesis submitted to the School of Research and Graduate Studies of Bahir Dar Institute of Technology, BDU, in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering in the Faculty of Computing.
Advisor: MEKUANINT AGEGNEHU (PhD)
Bahir Dar, Ethiopia
August, 2020
© 2020
Mehary Demlew Aregu
ALL RIGHTS RESERVED
Acknowledgement
Coming this far was neither easy nor a matter of chance; GOD has always been on my side. Only GOD, who makes everything possible, deserves all praise.
My special thanks and appreciation go to Dr. Mekuanint Agegnehu for his continuous support, scientific supervision, and constructive guidance from the moment of problem formulation to the completion of this research. He was there whenever I needed his support, reviewing the paper and giving feedback at the right time. His way of advising me and his punctuality were inspiring; they helped me a great deal and will continue to do so in the future. His enthusiasm and encouragement kept me working day and night so that the research was completed on time. Dr. Mekuanint Agegnehu was not only my research advisor but also my mentor throughout the year.
Special thanks and appreciation also go to Mr. Daniel Tsegaye, who was my advisor for my undergraduate project, for his continuous moral support and his advice during this research. My grateful thanks go to my brother, Biruk Demlew, who is always my inspiration. How good it is to have a brother who can be and do anything for his family. He has supported me morally, and financially when I needed online course materials. I also thank my friends and classmates Zelalem Fiseha, Belay, and Atelaw Mulatu, who helped me design the model of this research, for all the conversations we had together throughout the year. Finally, I would like to thank my family for their love, respect, and moral support at all times.
Abstract
Software effort estimation is the process of predicting the amount of human effort required to develop a particular software project. During software development, the initial requirements usually change, which forces the project manager to update the software effort, cost, and schedule. To manage such changes in the effort, cost, and schedule of a software project, the initial software effort and cost estimates need to be accurate. Various researchers have used machine learning algorithms and algorithmic techniques to improve the accuracy of software effort estimation. The Constructive Cost Model (COCOMO) is an algorithmic model that is widely used to estimate software effort, cost, and time. However, both the COCOMO model and machine learning-based approaches are limited in how accurately they can estimate software effort because of the non-deterministic nature of the problem. Meta-heuristic algorithms can find near-optimal solutions at a reasonable computational cost, which makes them a good technique for parameter optimization. In this thesis, a model based on a hybrid of the genetic algorithm and particle swarm optimization is proposed for optimizing the coefficient parameters of the intermediate COCOMO model. The thesis uses the strengths of the two algorithms to design an effective software effort estimation model: PSO is used to generate an initial locally optimal solution, and GA is used to optimize the parameter values of the COCOMO coefficients. The proposed model was trained and tested using NASA software datasets. To evaluate the performance of our model, we used five well-known and widely used software effort and cost estimation accuracy measures: Percentage of Prediction (PRED(0.25)), Magnitude of Relative Error (MRE), Mean Magnitude of Relative Error (MMRE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The results showed that the magnitude of relative error (MRE) of the proposed model, in comparison with the COCOMO, GA, and PSO models, is reduced to 362.07%, 120.53%, and 21.81%, respectively.
Table of contents
Acknowledgement
Abstract
Table of contents
List of Abbreviations
List of figures
List of tables
CHAPTER ONE
INTRODUCTION
1.1. Background
1.2. Motivation
1.3. Statement of the problem
1.3.1. Research Question
1.4. Objective of the study
1.4.1. General objective
1.4.2. Specific objective
1.5. Scope and limitation
1.6. Methodology of the study
1.6.1. Data collection
1.6.2. Simulation Environments
1.6.3. Experimental Evaluation
1.6.3.1. Evaluation Criteria in software cost estimation
1.7. Significance of the study
1.8. Organization of the study
CHAPTER TWO
LITERATURE REVIEW
2.1. Introduction
2.2.1. Project Size estimation
2.2.1.1. Lawrence H. Putnam LOC Estimation
2.2.1.2. Function point Analysis
2.3. Software cost and effort estimation techniques
2.3.1. Constructive Cost Model (COCOMO)
2.3.2. COCOMO II Model
2.3.3. SLIM Model
2.3.4. Experience based Estimation
2.3.5. Estimation by analogy
2.3.6. Top down and bottom up approach
2.4. Metaheuristic Algorithms
2.4.1. Ant Colony Optimization (ACO)
2.4.2. Particle Swarm Optimization (PSO)
2.4.3. Genetic Algorithm
2.4.4. Hybridization of Meta-heuristic Algorithms
2.5. Related works
2.5.1. Summary of related works
CHAPTER THREE
DESIGN OF METHODOLOGY
3.1. Introduction
3.2. Design of the proposed model
3.3. Generating Initial solution using Particle Swarm Optimization
3.3.1. Generate initial Random values and assign Pbest value of A and B
3.3.2. Calculate the fitness function
3.3.3. Calculate the Velocity and update the position of Particles A and B
3.4. Optimizing the Coefficient value of A and B using the Genetic Algorithm
3.4.1. Calculate fitness value
3.4.2. Selection
3.4.3. Crossover
3.4.4. Mutations
CHAPTER FOUR
EXPERIMENTAL RESULT AND DISCUSSION
4.1. Introduction
4.2. Dataset Description
4.3. Simulation environment
4.4. Experiment results
4.4.1. Experimental Result for Organic Model on NASA60 dataset
4.4.2. Experimental result for Semi-detached COCOMO Model on NASA60 dataset
4.4.3. Experimental result for Embedded COCOMO Model on NASA60 dataset
4.4.4. PSO_GA model effort comparison with research done by (Maleki, Ghaffar, & Masdari, 2014) for NASA 60 datasets
4.4.5. Experimental Result for organic COCOMO model on NASA63 dataset
4.4.6. Experimental Result for semi-detached COCOMO model on NASA63 dataset
4.4.7. Experimental Result for embedded COCOMO model on NASA63 dataset
4.4.8. Experimental Result for organic COCOMO model on NASA 93 datasets
4.4.9. Experimental result for semi-detached COCOMO model on NASA 93 dataset
4.4.10. Experimental Result for Embedded COCOMO model on NASA 93 dataset
CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1. Conclusion
5.2. Contribution
5.3. Future work
REFERENCES
APPENDIX
Appendix 1, Dataset sample with its attributes
Appendix 2, Sample Python code to calculate the fitness of each coefficient
Appendix 3, Sample initial values generated for coefficients
List of Abbreviations
A Multiplicative constant
ACO Ant Colony Optimization
B Exponential constant
COCOMO Constructive Cost Model
EMs Effort multipliers
GA Genetic Algorithm
Gbest Global best value
IEAM-RP Improved Environmental Adaptive Method
IWO Invasive Weed Optimization algorithm
LOC Lines of code
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MD Mathian distance
MMRE Mean Magnitude of Relative Error
MRE Magnitude of Relative Error
MSE Mean Square Error
Pbest Personal best value
PRED (n) Percentage of prediction
PSO Particle Swarm Optimization
PSO_GA Particle Swarm Optimization and Genetic Algorithm
RMSE Root Mean Square Error
SCE Software Cost Estimation
SEE Software Effort Estimation
SEER_SEM System Evaluation and Estimation of Resources - Software Estimation Model
SLM Software Life Cycle Management
SLOC/KLOC Source lines of code / thousands of lines of code
VAM/EMF Value/effort adjustment multiplier
List of figures
Figure 2.1. The basic working of ant colony optimization algorithm (Blum, 2005)
Figure 2.2. Graphical representation of PSO
Figure 2.3. Flowchart diagram for PSO
Figure 2.4. Basic operation steps in genetic algorithm
Figure 2.5. Single point crossover, two point crossover and uniform crossover
Figure 3.1. The proposed PSO_GA model system architecture
Figure 3.2. Flow chart diagram for PSO
Figure 3.3. Section of genetic algorithm operation for the proposed methodology
Figure 4.1. Actual, COCOMO, and PSO_GA effort for organic model
Figure 4.2. Relative Error Figure for organic model
Figure 4.3. MRE for semi-detached model
Figure 4.4. Effort graph for embedded model
Figure 4.5. Effort graph for organic model on NASA63 dataset
Figure 4.6. MRE for semi-detached model
List of tables
Table 1.1. Software effort and cost evaluation metrics (Miandoab & Gharehchopogh, 2016)
Table 2.1. Complexity weight (Ochieng, Mwangi, & Mwgha, 2014)
Table 2.2. Basic COCOMO model types and their project size (Boehm, 1984)
Table 2.3. Basic COCOMO project coefficient values (Boehm, 1984)
Table 2.4. Coefficient values in the Intermediate Model (Boehm, 1984)
Table 2.5. Cost factors and their weights in intermediate COCOMO (Salijoughinejad & Khatibi, 2018)
Table 2.6. Effort multiplier rating scale and its values for the detailed COCOMO model (Glinz & Mukhija, 2003)
Table 2.7. COCOMO II effort multipliers (Singal, Kumari, & Sharma, 2020)
Table 2.8. Parameters of PSO
Table 2.9. Summary of related works
Table 3.1. Parameters of PSO and their values
Table 3.2. Parameters of the Genetic Algorithm and their values
Table 4.1. Dataset attribute class, name and its code
Table 4.2. Estimated effort for organic model
Table 4.3. Relative error comparison between models using evaluation criteria
Table 4.4. Estimated effort for semi-detached model
Table 4.5. Semi-detached model evaluation using SCE evaluation metrics
Table 4.6. Effort for Embedded Model
Table 4.7. Comparison of embedded models using evaluation metrics
Table 4.8. MRE of PSO_GA, COCOMO and (Maleki, Ghaffar, & Masdari, 2014) model
Table 4.9. Estimated Effort of models
Table 4.10. Evaluation of organic model using evaluation criteria
Table 4.11. Effort comparison using SCE metrics for semi-detached model
Table 4.12. Effort comparison for embedded model on NASA 63 dataset
Table 4.13. Organic model effort comparison on NASA93 dataset
Table 4.14. Semi-detached model effort comparison on NASA93 dataset
Table 4.15. Semi-detached model effort comparison on NASA93 dataset
CHAPTER ONE
INTRODUCTION
1.1. Background
Software cost estimation is a set of procedures and techniques used to estimate the effort, dollar cost, and schedule of a particular software project. Effort estimation determines the amount of human labor required to develop a software product, schedule estimation determines how much time a particular software project will take to complete, and dollar cost estimation determines the overall cost of the software project. Effort, dollar cost, and schedule are measured in person-months, dollars, and calendar time, respectively (PMI, 2017).
As software projects grow in size and complexity, determining the effort, dollar cost, and time needed to complete them becomes a major challenge, which leads to fundamental problems with cost, time-to-market, functionality, and quality requirements. These problems can be overcome with good software project management, in which software effort and cost estimation plays a significant role. In software project development, the project manager applies knowledge, tools, skills, and techniques to make sure that the software is delivered on time and with the required quality. In this process, the preliminary cost and effort estimates that the project manager uses in different phases of the project life cycle need to be accurate, because inaccurate software effort and cost estimation leads to project failure.
Accurate software effort estimation is very important in software project management because it helps to determine the operational and economic feasibility of the project at the beginning of the software project life cycle, and it helps the project manager determine which resources should be used and how. Good software effort estimation provides assurance and reduces the level of risk. During software development, the initial requirements usually change, and the project manager needs to update the software effort and schedule. Accurate preliminary effort estimation therefore helps the project manager decide how to re-plan when changes happen in the project. Managing and controlling a software development process is possible only as long as the early effort estimate is accurate. When a project is underestimated, it cannot be completed within the given schedule and budget; when it is overestimated, too many resources are committed to it. Therefore, accurate software effort estimation is required in the early stages of software development. In practice, however, most software projects are not delivered on time, on budget, and with the requested quality. On average, only 16.2% of software projects are completed on time and on budget, and in large companies only 9% of projects are (Chaos Report, 2015). Statistics from the same organization show that the total project success rate is about 30.3%, the challenged rate is about 46%, and the failure rate is 23.4% (Mandal, 2015).
Software effort and cost estimation is one of the most challenging areas of project management (Przemyslaw Pospieszny, 2018) (Y.Sangeetha M.Tech (Ph.d), 2012). Researchers and academic professionals have long struggled to develop models for software effort and cost estimation, and as a result many estimation techniques have been proposed. Broadly, these techniques are classified into algorithmic and non-algorithmic techniques. Algorithmic models use the major cost factors in a mathematical formula to estimate the effort; the Constructive Cost Model (COCOMO), the SLIM (Software Life Cycle Management) model, function point based models, use case point analysis, and Putnam's model (Shekhar & Kumar, 2016) are examples of algorithmic techniques. In non-algorithmic techniques, the estimate is derived from experience with similar previous projects; analogy-based estimation, expert judgment, Parkinson's Law, and pricing to win (L.R. Nerkar, 2014) are non-algorithmic techniques.
The COCOMO model is a well-documented and widely used algorithmic model for estimating the effort, time, and cost of a software project (Sachan, Nigam, Singh, & et al, 2016). It was developed by Barry W. Boehm (Boehm, 1984) from a historical dataset of 63 projects. In this model, software effort is computed using the software size as the major parameter and the cost factors as an effort adjustment factor. The software size is expressed in thousands of lines of code (KLOC). COCOMO comes in three types: the Basic, Intermediate, and Detailed models. Each type distinguishes three modes of project: organic, semi-detached, and embedded. The model defines a mathematical equation for the effort of a software project, given in Equation (1) (Maleki, Ghaffar, & Masdari, 2014).
$$\text{Effort} = A \times \text{KLOC}^{B} \times \text{EMF} \qquad (1)$$
where A and B are the multiplicative and exponential constants, respectively, KLOC is the size of the software in thousands of lines of code, and EMF is the product of all cost factors (effort multipliers). In the basic COCOMO model, the value of EMF is 1.
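Equation (1) translates directly into code. The sketch below is illustrative only: it assumes the intermediate COCOMO coefficients commonly cited from Boehm's model (organic 3.2/1.05, semi-detached 3.0/1.12, embedded 2.8/1.20), while Table 2.4 lists the values actually used in this thesis. The names `INTERMEDIATE_COEFFS`, `effort_adjustment_factor`, and `cocomo_effort` are hypothetical.

```python
import math

# Assumed: commonly cited intermediate COCOMO coefficients (A, B) per project mode.
INTERMEDIATE_COEFFS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def effort_adjustment_factor(multipliers):
    """EMF: product of all effort multipliers; 1 when no multiplier is given (all nominal)."""
    return math.prod(multipliers)


def cocomo_effort(kloc, mode, multipliers=()):
    """Equation (1): Effort (person-months) = A * KLOC**B * EMF."""
    a, b = INTERMEDIATE_COEFFS[mode]
    return a * kloc ** b * effort_adjustment_factor(multipliers)
```

For example, `cocomo_effort(32, "embedded", [1.15, 0.91])` applies two illustrative multipliers to a 32-KLOC embedded project. The basic model sets EMF = 1 but uses its own coefficients (Table 2.3); the PSO_GA model described in this thesis replaces the fixed (A, B) pair with tuned values.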
Meta-heuristic algorithms have recently proven successful at estimating project effort efficiently because of their population-based search techniques (Singh, Singh, & Mishra, 2018). In this research, a hybrid of particle swarm optimization (PSO) and the genetic algorithm (GA) is used to optimize the parameter (coefficient) values of the Intermediate COCOMO model so that more realistic effort can be estimated. The PSO algorithm is used to generate initial parameter values for the COCOMO model, and the GA is used to optimize the parameter values obtained from PSO. The proposed hybrid model (PSO_GA) is trained with the NASA60 project dataset and tested using the NASA60, NASA63, and NASA93 software project datasets. The PRED(0.25), MRE, MMRE, MAE, and MAPE software effort and cost estimation evaluation criteria are used to evaluate the performance of the proposed model.
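The evaluation criteria named above can be computed as in the following minimal sketch. It assumes the common textbook definitions (MRE = |actual - estimated| / actual, MMRE as the mean MRE, PRED(0.25) as the share of projects whose MRE is at most 0.25); the exact formulas adopted in this thesis are those in Table 1.1 and may differ in detail. The function names are hypothetical.

```python
def mre(actual, estimated):
    """Magnitude of relative error for a single project."""
    return abs(actual - estimated) / actual


def evaluate(actuals, estimates, level=0.25):
    """Common SEE accuracy measures over paired actual/estimated efforts."""
    mres = [mre(a, e) for a, e in zip(actuals, estimates)]
    n = len(mres)
    return {
        "MMRE": sum(mres) / n,
        f"PRED({level})": sum(1 for m in mres if m <= level) / n,  # share within 25%
        "MAE": sum(abs(a - e) for a, e in zip(actuals, estimates)) / n,
        "MAPE": 100 * sum(mres) / n,
    }
```

For instance, `evaluate([100, 200, 50], [110, 150, 80])` gives MMRE ≈ 0.317, PRED(0.25) ≈ 0.667 (two of the three projects fall within 25%), MAE = 30.0, and MAPE ≈ 31.7. Under these definitions MAPE is simply MMRE expressed as a percentage.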
1.2. Motivation
The motivation for this thesis goes back to the Software Project Management course. In software project management, fewer than 20% of software projects are delivered on time, on budget, and with the requested quality, and one of the major factors behind this ratio is inaccurate software effort and cost estimation early in the project life cycle. After analyzing this, I decided to work on improving the estimation efficiency of COCOMO. COCOMO was chosen, first, because the model contains most of the product, project, personnel, and platform attributes that can directly or indirectly affect the effort of a software project, and second, because it is the most documented and widely used software effort and cost estimation technique in software companies.
Over the last three decades, much research has been done and many software effort and cost estimation models have been proposed, which shows that this area is significant enough to have gained continuous research attention; it is still a hot research issue (Singh, Singh, & Mishra, 2018). Even though many research papers have been published on software effort or cost estimation, none of them has achieved results good enough to help software products be delivered on time, on budget, and with the requested quality. Most research work in software effort and cost estimation concentrates on rough and quick calculation of effort using only the size of the software as the cost function, so the software effort, time, and overall cost are not estimated well at the beginning of the software life cycle. In reality, however, many non-functional attributes affect the cost of a particular software product. After analyzing this problem, we set out to make a small contribution by showing the effect of using two meta-heuristic algorithms together on the efficiency of software effort estimation.
1.3. Statement of the problem
In software development, there are many interrelated factors whose relationships are not well understood and that affect software product quality directly or indirectly, which makes estimating software cost and effort with algorithmic methods difficult. Algorithmic methods are also incapable of combining incomplete information, and this defect makes the extraction of important information error-prone (Ahadi & Jafaria, 2016). When the estimation problem is affected by many cost factors and variables, algorithmic methods are unable to reach accurate answers (Maleki, Ghaffar, & Masdari, 2014).
Current methods of software project effort estimation lack accuracy and focus on a few factors related to the software development process (mostly the software size) while neglecting other functional and non-functional attributes that can directly affect the cost of a particular software project (BaniMustafa, 2018). Most software cost estimation research, including (Sachan, Nigam, Singh, & et al, 2016) and (Singh, Singh, & Mishra, 2018), focuses only on optimizing the Basic COCOMO model, whose accuracy is restricted because only lines of code are used as a cost factor to estimate the software effort. Our research focuses on the Intermediate COCOMO model, which combines KLOC with many cost-driving factors to estimate the effort of a software project.
In particle swarm optimization, every particle participates in searching for the optimum of the cost function, so good solutions are less likely to be lost. PSO has a memory, so knowledge of good solutions is retained by all the particles, whereas in GA previous knowledge of the problem is discarded once the population changes (Kao & Zahara, 2008). A genetic algorithm can therefore lose optimum solutions and converge to a local minimum before finding the global minimum. To address this problem, we use the PSO algorithm to generate an initial solution and GA to optimize the values of the coefficients.
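The two-stage idea can be sketched as follows: PSO searches for an initial (A, B) pair, and its global best then seeds a GA population that refines it. This is a minimal sketch, not the implementation described in Chapter 3. The search bounds, MMRE as the fitness function, all parameter values, and the GA operators (truncation selection, arithmetic crossover, Gaussian mutation, elitism) are assumptions, and `synthetic_projects` generates hypothetical noise-free data in place of the NASA datasets. EMF is treated as known for each project.

```python
import random

# Assumed search ranges: A (multiplicative) in [1, 5], B (exponential) in [0.8, 1.5].
BOUNDS = [(1.0, 5.0), (0.8, 1.5)]


def mmre(params, projects):
    """Fitness: mean magnitude of relative error of Effort = A * KLOC**B * EMF."""
    a, b = params
    errors = [abs(act - a * kloc ** b * emf) / act for kloc, emf, act in projects]
    return sum(errors) / len(errors)


def clip(params):
    """Keep every coefficient inside its search range."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(params, BOUNDS)]


def pso(projects, rng, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5):
    """Stage 1: particle swarm search for an initial (A, B) solution."""
    dims = len(BOUNDS)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's memory (Pbest)
    pbest_fit = [mmre(p, projects) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm's shared memory (Gbest)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            pos[i] = clip(pos[i])
            fit = mmre(pos[i], projects)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def ga(projects, start, rng, pop_size=20, gens=40, mut_rate=0.3, mut_sd=0.05):
    """Stage 2: genetic refinement seeded with the PSO solution."""
    pop = [start[:]] + [clip([v + rng.gauss(0, mut_sd) for v in start])
                        for _ in range(pop_size - 1)]
    for _ in range(gens):
        ranked = sorted(pop, key=lambda p: mmre(p, projects))
        elite = ranked[:2]                # elitism: the best two survive unchanged
        parents = ranked[:pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            p1, p2 = rng.sample(parents, 2)
            alpha = rng.random()          # arithmetic (blend) crossover
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
            if rng.random() < mut_rate:   # Gaussian mutation of one gene
                d = rng.randrange(len(child))
                child[d] += rng.gauss(0, mut_sd)
            children.append(clip(child))
        pop = elite + children
    best = min(pop, key=lambda p: mmre(p, projects))
    return best, mmre(best, projects)


def synthetic_projects(rng, true_a=3.0, true_b=1.1):
    """Hypothetical noise-free data as (KLOC, EMF, actual effort) tuples."""
    projects = []
    for kloc in [2, 5, 10, 20, 50, 100, 200]:
        emf = rng.uniform(0.7, 1.4)
        projects.append((kloc, emf, true_a * kloc ** true_b * emf))
    return projects


if __name__ == "__main__":
    rng = random.Random(0)
    data = synthetic_projects(rng)
    start, f_pso = pso(data, rng)
    best, f_ga = ga(data, start, rng)
    print(f"PSO: A={start[0]:.3f} B={start[1]:.3f} MMRE={f_pso:.4f}")
    print(f"GA:  A={best[0]:.3f} B={best[1]:.3f} MMRE={f_ga:.4f}")
```

Because the PSO solution enters the GA population and the two best individuals are always carried over, the GA stage in this sketch can never end with a worse MMRE than the PSO stage; it can only keep or improve the PSO result.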
1.3.1. Research Questions
In this study, we investigated the following research questions.
RQ1. What is the impact of using particle swarm optimization and genetic algorithms in combination to tune the coefficient parameter values of the Constructive Cost Model (COCOMO)?
RQ2. How can the values of the multiplicative constant (A) and the exponential constant (B) of COCOMO be revised using PSO and GA?
RQ3. How can optimal weights be assigned to the COCOMO coefficients?
1.4. Objective of the study
1.4.1. General objective
The major objective of this research is to develop an efficient software development effort estimation model using genetic and particle swarm optimization algorithms.
1.4.2. Specific objectives
To analyze the impact of the hybrid PSO_GA model in optimizing the coefficient parameter values of the COCOMO model
To generate optimal weight values for the COCOMO coefficient parameters
To compare the estimated effort with that of state-of-the-art techniques
1.5. Scope and limitation
The main focus of our research is to provide an efficient software cost estimation model by optimizing the parameter values of the Intermediate COCOMO model. In this thesis, the model to be developed is based on the components of the Intermediate COCOMO model. The effect of each COCOMO component in each phase of the software development process is not considered, and neither is the quality of the software to be developed. The limitation of this research is that the estimation process does not go beyond the system level. In other words, in the COCOMO model a software project is treated as a homogeneous entity composed of a single subsystem, which means that each COCOMO effort multiplier has only one rating value throughout the estimation process. In reality, however, a software system may be composed of smaller, heterogeneous subsystems, so the effort of each subsystem should be calculated separately with its own effort multiplier values.
1.6. Methodology of the study
1.6.1. Data collection
The dataset for this research is the COCOMO NASA dataset, which was collected by Jairus Hihn (Promise software engineering Repository, 2005). We used three different datasets that contain 60, 63, and 93 software projects, and each project has 17 attributes. These datasets have been used by many researchers, including (Maleki, Ghaffar, & Masdari, 2014) and (Algabri, Saeed, Mathkour, & et al, 2015). The dataset attributes are grouped into four classes: product attributes, hardware attributes, personnel attributes, and project attributes.
1.6.2. Simulation Environments
For this research, we used Anaconda, an open-source distribution of Python that contains Python modules and packages for scientific computing. In addition, we used Pandas for data manipulation and analysis, and NumPy for handling the multi-dimensional arrays of the input datasets, which are used to calculate the fitness function value of each particle in particle swarm optimization and of each individual in the genetic algorithm population.
1.6.3. Experimental Evaluation
1.6.3.1. Evaluation Criteria in software cost estimation
In software effort and cost estimation, the accuracy of an estimate is evaluated using a series of evaluation criteria. Magnitude of Relative Error (MRE), Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), and others have been used as evaluation metrics in software cost estimation. Table 1.1 shows evaluation metrics in software effort and cost estimation.
Evaluation metric | Mathematical formula
Magnitude of Relative Error (MRE) | $MRE_i = \frac{\lvert act_i - est_i \rvert}{act_i}$
Mean Magnitude of Relative Error (MMRE) | $MMRE = \frac{1}{n}\sum_{i=1}^{n} MRE_i$
Median Magnitude of Relative Error (MdMRE) | $MdMRE = \operatorname{median}(MRE_1, \dots, MRE_n)$
Percentage of prediction (PRED(m)) | $PRED(m) = \frac{k}{n} = \frac{1}{n}\sum_{i=1}^{n}\begin{cases}1, & MRE_i \le m\\ 0, & \text{otherwise}\end{cases}$, where k is the number of projects whose MRE is at most m among the n test projects
Mean squared error (MSE) | $MSE = \frac{1}{n}\sum_{i=1}^{n}(act_i - est_i)^2$
Root mean squared error (RMSE) | $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(act_i - est_i)^2}$
Mean absolute error (MAE) | $MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert act_i - est_i \rvert$
Mean absolute percentage error (MAPE) | $MAPE = \frac{100}{n}\sum_{i=1}^{n}\frac{\lvert act_i - est_i \rvert}{act_i}$
Here act_i and est_i are the actual and estimated effort of project i, and n is the number of projects.
Table 1. 1. Software effort and cost evaluation metrics (Miandoab & Gharehchopogh, 2016)
In our research, we used the Manhattan distance (MD), the sum of the absolute differences between the actual and estimated efforts, as the fitness function. MRE, MMRE, PRED(0.25), and MAPE are used as performance evaluation criteria. MRE is by far the most widely used measure of effort estimation accuracy (Karna & Gotovac, 2014).
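The evaluation criteria and the Manhattan-distance fitness described above can be computed in a few lines of NumPy. The following is a minimal sketch; the function names and the three sample projects are hypothetical and are not taken from the NASA datasets.

```python
import numpy as np


def mre(actual, estimated):
    """Magnitude of relative error of every project: |act - est| / act."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return np.abs(actual - estimated) / actual


def mmre(actual, estimated):
    """Mean of the per-project MRE values."""
    return float(np.mean(mre(actual, estimated)))


def mdmre(actual, estimated):
    """Median of the per-project MRE values."""
    return float(np.median(mre(actual, estimated)))


def pred(actual, estimated, m=0.25):
    """PRED(m): fraction of projects whose MRE is at most m."""
    return float(np.mean(mre(actual, estimated) <= m))


def mape(actual, estimated):
    """Mean absolute percentage error, i.e. MMRE expressed as a percentage."""
    return 100.0 * mmre(actual, estimated)


def manhattan_distance(actual, estimated):
    """Fitness value minimized during tuning: sum of absolute effort errors."""
    diff = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    return float(np.sum(np.abs(diff)))


# Hypothetical actual and estimated efforts (person-months) for three projects
actual = [100.0, 200.0, 400.0]
estimated = [110.0, 150.0, 400.0]
print(mre(actual, estimated))                 # per-project MRE: 0.1, 0.25, 0.0
print(pred(actual, estimated, 0.25))          # 1.0
print(manhattan_distance(actual, estimated))  # 60.0
```

MD is used as the quantity to minimize, while MRE-based metrics are reported afterwards; keeping them as separate functions makes that distinction explicit.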
1.7. Significance of the study
It is expected to reduce the underestimation and overestimation of software cost and effort by improving estimation accuracy.
It is expected to reduce the relative errors of effort estimates.
It is expected to increase the project success rate.
It will assist software companies in analyzing the feasibility of projects and in managing the development process efficiently.
It can be used as a benchmark by the research community.
1.8. Organization of the study
The remainder of this thesis is organized as follows.
Chapter Two reviews the literature on software effort estimation techniques and models that have been used in software project management. It also presents related work, i.e., the state of the art based on algorithmic techniques, machine learning algorithms, and meta-heuristic algorithms.
Chapter Three describes the proposed research methodology and gives a detailed explanation of the proposed solution and its implementation. Chapter Four presents the experimental results, the evaluation of the proposed method, and its comparison with COCOMO and other state-of-the-art models. Finally, Chapter Five presents the conclusion, the contributions, and future work.
CHAPTER TWO
LITERATURE REVIEW
2.1. Introduction
For effective software project management, accurate effort estimation is needed. With accurate software effort and cost estimates, the project manager can manage and control project activities more easily and efficiently. Many algorithmic techniques, non-algorithmic techniques, and machine learning algorithms have been developed for this purpose. Software Life Cycle Management (SLIM), the Constructive Cost Model (COCOMO), use case point analysis, function-point-based models, Putnam's model, experience-based estimation, estimation by analogy, the top-down approach, machine learning algorithms, and heuristic algorithms are some of the estimation techniques that have been used. This chapter explains some of these techniques and the steps followed to estimate the cost of a software project in software engineering economics. Lastly, related works that used meta-heuristic algorithms to estimate software cost, and their gaps, are presented.
2.2. Software cost estimation steps
In software engineering economics, four steps are followed to estimate the cost of a software project (Boehm, 1984), and each step provides an input to the next. In effort estimation, software size is used as an input together with other attributes that affect the effort needed to develop the software. To calculate the calendar time (schedule) needed to develop the software, the software size and effort are used as inputs. The steps used to estimate the cost of a software project are:
1. Software size estimation
2. Software effort estimation
3. Software time estimation
4. Software dollar cost estimation
2.2.1. Project Size estimation
Project size estimation is the first step in software engineering economics for calculating software project effort, and it is the most crucial activity in software management, because the subsequent effort and schedule estimates are based on project size (Ochieng, Mwangi, & Mwgha, 2014). The project manager needs the size of a software project to determine its cost and the number of people to allocate to it. Lawrence H. Putnam's LOC estimation and function point analysis are two techniques used to estimate software size.
2.2.1.1. Lawrence H. Putnam LOC Estimation
In this technique, the number of lines of code is estimated by breaking the system down into smaller pieces and estimating the SLOC of each piece (Ochieng, Mwangi, & Mwgha, 2014). For each piece, three to four experts estimate the smallest possible SLOC, the most likely SLOC, and the largest possible SLOC. The expected SLOC of each piece is then computed using equation 2.1 (Ochieng, Mwangi, & Mwgha, 2014).
$$E_i = \frac{a + 4m + b}{6} \quad (2.1)$$
where a is the smallest possible SLOC, m is the most likely SLOC, and b is the largest possible SLOC.
The expected size of the whole software project is the sum of the expected SLOC of all pieces, computed using equation 2.2:
$$E = \sum_{i=1}^{n} E_i \quad (2.2)$$
where n is the total number of pieces in the system.
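Equations 2.1 and 2.2 translate directly into code. This is a minimal sketch; the function names and the three-point estimates for the two pieces are hypothetical.

```python
def expected_sloc(a, m, b):
    """Equation 2.1: expected SLOC of one piece from the smallest (a),
    most likely (m) and largest (b) expert estimates."""
    if not a <= m <= b:
        raise ValueError("expected smallest <= most likely <= largest")
    return (a + 4 * m + b) / 6


def total_expected_sloc(pieces):
    """Equation 2.2: sum of the expected SLOC of all n pieces."""
    return sum(expected_sloc(a, m, b) for a, m, b in pieces)


# Hypothetical three-point estimates (a, m, b) for two pieces of a system
pieces = [(1000, 1500, 2600), (400, 700, 1000)]
print([expected_sloc(*p) for p in pieces])  # [1600.0, 700.0]
print(total_expected_sloc(pieces))          # 2300.0
```

The weight of 4 on the most likely value pulls the estimate toward m while the extremes still widen or narrow it.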
2.2.1.2. Function point Analysis
In function point analysis, software size is estimated in standard units by counting the number of externals (inputs, outputs, inquiries, and interfaces) that make up the system (Ochieng, Mwangi, & Mwgha, 2014). For external inputs, the input files, tables, forms, screens, and messages of the system are counted as factors of software size. For external inquiries, I/O inquiries that require a response, such as prompts and interrupt calls, are counted. Libraries or programs that are passed into and out of the system are also considered as factors of software size. The following steps are used to estimate the software size of a project (Ochieng, Mwangi, & Mwgha, 2014).
1. Count or estimate all the occurrences of each type of external (inputs, outputs, inquiries, and interfaces).
2. Assign each occurrence a complexity weight.
3. Multiply each occurrence by its complexity weight, and total the results to obtain a function count.
4. Multiply the function count by a value adjustment multiplier (VAM) to obtain the function point count, where $VAM = 0.65 + 0.01 \times \sum_{i=1}^{14} v_i$ and $v_i$ is the rating (0 to 5) of the i-th general system characteristic.
The complexity weight by which each occurrence is multiplied is given in the following table.
Description Low Medium High
External inputs 3 4 6
External outputs 4 5 7
External inquiries 3 4 6
External interfaces 5 7 10
Internal data files 7 10 15
Table 2. 1. Complexity weight (Ochieng, Mwangi, & Mwgha, 2014)
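The four counting steps and Table 2.1 can be combined into a short calculation. This is a sketch under stated assumptions: the dictionary keys and sample counts are hypothetical, and the value adjustment multiplier is assumed to take the standard IFPUG form 0.65 + 0.01 × Σv_i over 14 general system characteristics rated 0 to 5.

```python
# Table 2.1 complexity weights: (low, medium, high)
WEIGHTS = {
    "external_inputs": (3, 4, 6),
    "external_outputs": (4, 5, 7),
    "external_inquiries": (3, 4, 6),
    "external_interfaces": (5, 7, 10),
    "internal_data_files": (7, 10, 15),
}
LEVEL = {"low": 0, "medium": 1, "high": 2}


def function_count(counts):
    """Steps 1-3: counts maps (external type, complexity level) -> occurrences."""
    return sum(n * WEIGHTS[kind][LEVEL[level]] for (kind, level), n in counts.items())


def value_adjustment_multiplier(gsc_ratings):
    """Step 4 multiplier (assumed IFPUG form): 0.65 + 0.01 * sum of 14 ratings."""
    if len(gsc_ratings) != 14 or not all(0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings in the range 0..5")
    return 0.65 + 0.01 * sum(gsc_ratings)


def function_points(counts, gsc_ratings):
    """Step 4: adjusted function point count."""
    return function_count(counts) * value_adjustment_multiplier(gsc_ratings)


# Hypothetical system: 5 simple inputs, 3 average outputs, 2 complex data files,
# and every general system characteristic rated 3
counts = {
    ("external_inputs", "low"): 5,
    ("external_outputs", "medium"): 3,
    ("internal_data_files", "high"): 2,
}
print(function_count(counts))                  # 60
print(round(function_points(counts, [3] * 14), 2))  # 64.2
```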
2.3. Software cost and effort estimation techniques
Software effort and cost estimation techniques are broadly classified into algorithmic and non-algorithmic techniques. Algorithmic models use multiple cost factors in a mathematical formula to estimate effort. The Constructive Cost Model (COCOMO), the Software Life Cycle Management (SLIM) model, function-point-based models, use case point analysis, and Putnam's model (Shekhar & Kumar, 2016) are some of the algorithmic techniques.
2.3.1. Constructive Cost Model (COCOMO)
The Constructive Cost Model is one of the most important, well-documented, and widely used algorithmic models; it was proposed by Barry Boehm in 1981 based on a study of 63 projects (Boehm, 1984). The model estimates software cost and effort using the size of the software and other cost-driving factors. It has three variants: Basic COCOMO, Intermediate COCOMO, and Detailed (advanced) COCOMO. In COCOMO, code size is represented in lines of code (LOC) or thousands of lines of code (KLOC), and effort is measured in person-months.
A). Basic COCOMO Model
The Basic COCOMO model uses only the program size, estimated in lines of code or from function point analysis, together with a multiplicative constant A and an exponential constant B, to estimate the effort of a software project. It is the simplest COCOMO variant and is used for quick calculations, but its estimation accuracy is low because many software cost factors are not considered. The model has three classes of project: organic, semi-detached, and embedded. The classification depends primarily on project size, and also on project complexity, developer experience, and requirement type. The Basic COCOMO project classes and their sizes are shown in the following table.
Model Name Project size
Organic Less than 50 KLOC
Semi-detached 50-300 KLOC
Embedded Over 300 KLOC
Table 2. 2. Basic-COCOMO model types and its project size (Boehm, 1984)
(1). Organic: this class of software is developed by a small team, the requirements are clearly identified, and the problem is well understood and has been solved before.
(2). Semi-detached: this class of software has project requirements that are harder to solve, and the project is more complex than an organic project.
(3). Embedded: this class of software is for embedded systems, where the highest level of creativity and a large team are required.
The values of the constant coefficients A and B are shown in table 2.3. For all classes, the effort is calculated using equation 2.3.
$$\text{Effort} = A \times (\text{KLOC})^{B} \quad (2.3)$$
Basic COCOMO project class A B
Organic 2.4 1.05
Semi-detached 3.0 1.12
Embedded 3.6 1.20
Table 2. 3. Basic COCOMO projects Coefficient value (Boehm, 1984)
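Equation 2.3 with the Table 2.3 coefficients can be sketched as a lookup followed by a power function. The function and dictionary names are hypothetical; the 10-KLOC size is only an example.

```python
# Table 2.3: (A, B) for each Basic COCOMO project class
BASIC_COEFFICIENTS = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, project_class):
    """Equation 2.3: effort in person-months from size in KLOC."""
    a, b = BASIC_COEFFICIENTS[project_class]
    return a * kloc ** b


# The same hypothetical 10-KLOC product estimated under each project class
for project_class in BASIC_COEFFICIENTS:
    print(project_class, round(basic_cocomo_effort(10, project_class), 2))
```

Because B > 1 for every class, effort grows faster than size, which models the diseconomy of scale that COCOMO assumes.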
B). Intermediate COCOMO model
The Intermediate COCOMO model adds cost drivers to the lines of code used in the Basic COCOMO model. The cost drivers include product attributes, personnel attributes, hardware attributes, and project attributes, so the estimated cost and effort combine lines of code with these cost drivers. In the Intermediate COCOMO model, nominal effort is calculated using the power function with coefficients A and B, where the value of A differs slightly from that of Basic COCOMO (Leung & Fan, 2013). The cost factors take values ranging from 0.70 to 1.66, and the estimated effort is calculated using equation 2.4.
$$\text{Effort} = A \times (\text{KLOC})^{B} \times EMF \quad (2.4)$$
where EMF (the effort multiplier factor) is the product of all the cost factor values.
Intermediate COCOMO project class A B
Organic 3.2 1.05
Semi-detached 3.0 1.12
Embedded 2.8 1.20
Table 2. 4. Coefficients value in Intermediate Model (Boehm, 1984)
Attribute group | Code | Multiplier name | Very low | Low | Nominal | High | Very high | Extra high
Personnel | ACAP | Analyst capability | 1.46 | 1.19 | 1.00 | 0.86 | 0.71 | -
Personnel | AEXP | Application experience | 1.29 | 1.13 | 1.00 | 0.91 | 0.82 | -
Personnel | PCAP | Programmer capability | 1.42 | 1.17 | 1.00 | 0.86 | 0.70 | -
Personnel | VEXP | Virtual machine experience | 1.21 | 1.10 | 1.00 | 0.90 | - | -
Personnel | LEXP | Language experience | 1.14 | 1.07 | 1.00 | 0.95 | - | -
Project | MODP | Modern programming practice | 1.24 | 1.10 | 1.00 | 0.91 | 0.82 | -
Project | TOOL | Software tools | 1.24 | 1.10 | 1.00 | 0.91 | 0.83 | -
Project | SCED | Development schedule | 1.23 | 1.08 | 1.00 | 1.04 | 1.10 | -
Product | RELY | Required software reliability | 0.75 | 0.88 | 1.00 | 1.15 | 1.40 | -
Product | DATA | Database size | - | 0.94 | 1.00 | 1.08 | 1.16 | -
Product | CPLX | Product complexity | 0.70 | 0.85 | 1.00 | 1.15 | 1.30 | 1.65
Computer | TIME | Execution time constraint | - | - | 1.00 | 1.11 | 1.30 | 1.66
Computer | STOR | Main storage constraint | - | - | 1.00 | 1.06 | 1.21 | 1.56
Computer | VIRT | Virtual machine volatility | - | 0.87 | 1.00 | 1.15 | 1.30 | -
Computer | TURN | Computer turnaround time | - | 0.87 | 1.00 | 1.07 | 1.15 | -
Table 2. 5. Cost factor and their weight in intermediate COCOMO (Salijoughinejad & Khatibi, 2018)
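Equation 2.4 can be sketched by transcribing Tables 2.4 and 2.5 into dictionaries and multiplying the selected ratings into EMF. The names below are hypothetical, `None` marks ratings the table does not define, and cost drivers that are not listed for a project are assumed to be rated Nominal (1.00).

```python
# Table 2.4: (A, B) for each Intermediate COCOMO project class
INTERMEDIATE_COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

RATINGS = ("very_low", "low", "nominal", "high", "very_high", "extra_high")

# Table 2.5; None marks a rating that is not defined for that cost driver
EFFORT_MULTIPLIERS = {
    "ACAP": (1.46, 1.19, 1.00, 0.86, 0.71, None),
    "AEXP": (1.29, 1.13, 1.00, 0.91, 0.82, None),
    "PCAP": (1.42, 1.17, 1.00, 0.86, 0.70, None),
    "VEXP": (1.21, 1.10, 1.00, 0.90, None, None),
    "LEXP": (1.14, 1.07, 1.00, 0.95, None, None),
    "MODP": (1.24, 1.10, 1.00, 0.91, 0.82, None),
    "TOOL": (1.24, 1.10, 1.00, 0.91, 0.83, None),
    "SCED": (1.23, 1.08, 1.00, 1.04, 1.10, None),
    "RELY": (0.75, 0.88, 1.00, 1.15, 1.40, None),
    "DATA": (None, 0.94, 1.00, 1.08, 1.16, None),
    "CPLX": (0.70, 0.85, 1.00, 1.15, 1.30, 1.65),
    "TIME": (None, None, 1.00, 1.11, 1.30, 1.66),
    "STOR": (None, None, 1.00, 1.06, 1.21, 1.56),
    "VIRT": (None, 0.87, 1.00, 1.15, 1.30, None),
    "TURN": (None, 0.87, 1.00, 1.07, 1.15, None),
}


def effort_multiplier_factor(ratings):
    """EMF: product of the selected multipliers; unlisted drivers are Nominal (1.00)."""
    emf = 1.0
    for driver, rating in ratings.items():
        value = EFFORT_MULTIPLIERS[driver][RATINGS.index(rating)]
        if value is None:
            raise ValueError(f"{driver} has no '{rating}' rating")
        emf *= value
    return emf


def intermediate_cocomo_effort(kloc, project_class, ratings):
    """Equation 2.4: A * KLOC^B * EMF, in person-months."""
    a, b = INTERMEDIATE_COEFFICIENTS[project_class]
    return a * kloc ** b * effort_multiplier_factor(ratings)


# Hypothetical 10-KLOC organic project with a highly capable analyst team
# and very high product complexity
ratings = {"ACAP": "high", "CPLX": "very_high"}
print(round(intermediate_cocomo_effort(10, "organic", ratings), 2))
```

The PSO and GA tuning discussed in this thesis would search over A and B (and possibly the multiplier weights) while this function supplies the estimate for each candidate.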
C). Detailed COCOMO Model
Both the Basic and Intermediate COCOMO models estimate software effort at the system level, which means that effort is calculated by treating the software product as a single homogeneous entity. In fact, most large software projects are made up of smaller subsystems. Some of these subsystems may require little innovation and a small team, with clearly defined requirements, while others may have to be built within tight hardware and software constraints. The weight of each cost factor should therefore not be the same throughout the process, because using a single value introduces variation into the cost, effort, and time estimates.
To solve this problem, the Detailed COCOMO model estimates software effort by analyzing the effect of each cost factor in each phase of software development: it computes effort as a function of program size and a set of cost drivers weighted according to each phase of the software development lifecycle (Glinz & Mukhija, 2003). Detailed COCOMO is intended for large systems that contain non-homogeneous subsystems (Leung & Fan, 2013). The phases used to estimate effort in the Detailed COCOMO model are requirements and product design (RPD), detailed design (DD), code and unit testing (CUT), and integration and test (IT). The estimated effort of each module gives the effort of its subsystem, and the efforts of all subsystems together give the effort of the whole system. The rating scale of each cost driver in the four phases of the Detailed COCOMO model is represented in table 2.6.
Rating RPD DD CUT IT
Very low 1.80 1.35 1.35 1.50
Low 0.85 0.85 0.85 1.20
Nominal 1.00 1.00 1.00 1.00
High 0.75 0.90 0.90 0.85
Very high 0.55 0.75 0.75 0.70
Table 2. 6. Effort multiplier rating scale and its value for the detailed COCOMO model (Glinz & Mukhija, 2003)
2.3.2. COCOMO II Model
The original COCOMO model was developed for the waterfall software development process model. COCOMO II was developed to incorporate modern software development process models. It can be used to calculate the effort of a software project that follows an incremental, iterative, or spiral process model, or when reengineering is required. The effort of a project is calculated in either the Early Design or the Post-Architecture stage using equation 2.5. Effort is measured in person-months (PM); a person-month is the amount of work one person performs on the software project in one month (Abts, Brown, & etal, 2000).
$$PM = A \times \text{Size}^{E} \times \prod_{i=1}^{n} EM_i \quad (2.5)$$
where
$$E = B + 0.01 \times \sum_{j=1}^{5} SF_j$$
n is the number of effort multipliers: 17 for the Post-Architecture model and 7 for the Early Design model. SF_j are the five scale factors of COCOMO II, EM_i are the effort multipliers, and A and B are constants whose values were derived from 161 software projects. The five scale factors are precedentedness (PREC), development flexibility (FLEX), risk resolution (RESL), team cohesion (TEAM), and process maturity (PMAT). Table 2.7 presents the COCOMO II effort multipliers and their associated values.
Effort multiplier | Very low | Low | Nominal | High | Very high | Extra high
RELY | 0.82 | 0.92 | 1.00 | 1.10 | 1.26 | -
DATA | - | 0.90 | 1.00 | 1.14 | 1.28 | -
CPLX | 0.73 | 0.87 | 1.00 | 1.17 | 1.34 | 1.74
RUSE | - | 0.95 | 1.00 | 1.07 | 1.15 | 1.24
DOCU | 0.81 | 0.91 | 1.00 | 1.11 | 1.23 | -
TIME | - | - | 1.00 | 1.11 | 1.29 | 1.63
STOR | - | - | 1.00 | 1.05 | 1.17 | 1.46
PVOL | - | 0.87 | 1.00 | 1.15 | 1.30 | -
ACAP | 1.42 | 1.19 | 1.00 | 0.85 | 0.71 | -
PCAP | 1.34 | 1.15 | 1.00 | 0.88 | 0.76 | -
PCON | 1.29 | 1.12 | 1.00 | 0.90 | 0.81 | -
APEX | 1.22 | 1.10 | 1.00 | 0.88 | 0.81 | -
PLEX | 1.19 | 1.09 | 1.00 | 0.91 | 0.85 | -
LTEX | 1.20 | 1.09 | 1.00 | 0.91 | 0.84 | -
TOOL | 1.17 | 1.09 | 1.00 | 0.90 | 0.78 | -
SITE | 1.22 | 1.09 | 1.00 | 0.93 | 0.86 | -
SCED | 1.43 | 1.14 | 1.00 | 1.00 | 1.00 | -
Table 2. 7. COCOMO II effort multipliers (Singal, Kumari, & Sharma, 2020)
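Equation 2.5 can be sketched as follows. The defaults A = 2.94 and B = 0.91 are the published COCOMO II.2000 calibration values and are an assumption of this sketch, since the text above does not state them; the scale-factor values and ratings in the example are hypothetical.

```python
import math


def cocomo2_effort(ksloc, scale_factors, effort_multipliers, a=2.94, b=0.91):
    """Equation 2.5: PM = A * Size^E * product(EM_i), with E = B + 0.01 * sum(SF_j).

    The defaults a=2.94 and b=0.91 are assumed COCOMO II.2000 calibration values.
    """
    if len(scale_factors) != 5:
        raise ValueError("COCOMO II uses exactly five scale factors")
    exponent = b + 0.01 * sum(scale_factors)
    return a * ksloc ** exponent * math.prod(effort_multipliers)


# Hypothetical project: 100 KSLOC, illustrative scale-factor values,
# RELY rated High (1.10) and ACAP rated High (0.85); the other 15 multipliers Nominal
scale_factors = [3.72, 3.04, 4.24, 3.29, 4.68]
effort_multipliers = [1.10, 0.85] + [1.00] * 15
print(round(cocomo2_effort(100, scale_factors, effort_multipliers), 1))
```

Unlike Intermediate COCOMO, the exponent here is not fixed per project class: the scale factors shift it, so they change how effort scales with size, while the effort multipliers only scale the result.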
2.3.3. SLIM Model
Software Life Cycle Management (SLIM) is one of the algorithmic techniques used for large projects. It is based on the Norden/Rayleigh function and is generally known as a macro estimation model (Jamil, 2007). It is one of the first algorithmic and empirical software cost estimation models (Keshta, 2017). The model describes both the effort and the time needed to develop a software project. The software effort is calculated using equation 2.6.
$$\text{Effort} = \left[\frac{\text{Size}}{\text{Productivity} \times \text{Time}^{4/3}}\right]^{3} \times B \quad (2.6)$$
where Size is the estimated size of the software product, Productivity is the productivity of the organizational process, Time is the development time, and B is a scaling factor that depends on project size.
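A sketch of equation 2.6, assuming the standard Putnam form in which the bracketed ratio is cubed, Time is in years, and effort comes out in person-years. The size, productivity, and B = 0.39 values are hypothetical.

```python
def slim_effort(size, productivity, time_years, b):
    """Equation 2.6 (assumed form): Effort = [Size / (Productivity * Time^(4/3))]^3 * B."""
    return (size / (productivity * time_years ** (4 / 3))) ** 3 * b


# Hypothetical project: 100,000 SLOC, process productivity 10,000, B = 0.39
for years in (2.0, 2.5, 3.0):
    print(years, round(slim_effort(100_000, 10_000, years, 0.39), 2))
```

Cubing Time^(4/3) makes effort inversely proportional to the fourth power of the schedule, so in this model doubling the development time divides the effort by 16.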
2.3.4. Experience based Estimation
This is the most frequently used estimation technique for software projects, and it is used when gathering requirements and data is difficult (Borade & Khalkar, 2013). The estimate is derived from the experience of people in the area.
2.3.5. Estimation by analogy
Estimation by analogy derives a solution by finding similar work done previously and applying that solution to a new problem. The analogy technique is very similar to experience-based estimation, but it relies on information gathered from previous projects. Experience-based estimation is a human-intensive approach, whereas estimation by analogy is a data-intensive approach based on one or more analogous projects (Keung, 2009).
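One common way to make estimation by analogy data-driven is k-nearest-neighbour retrieval: find the k past projects most similar to the new one and average their efforts. The sketch below assumes min-max scaled features, Euclidean distance, and a plain mean; these choices, the feature set, and the historical projects are all hypothetical and are not prescribed by Keung (2009).

```python
import numpy as np


def estimate_by_analogy(past_features, past_efforts, new_project, k=2):
    """Mean effort of the k past projects closest to new_project.

    Features are min-max scaled so that no single attribute dominates the
    Euclidean distance.
    """
    X = np.asarray(past_features, dtype=float)
    y = np.asarray(past_efforts, dtype=float)
    q = np.asarray(new_project, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero for constant features
    distances = np.sqrt((((X - lo) / span - (q - lo) / span) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return float(y[nearest].mean())


# Hypothetical history: [size in KLOC, team size] and effort in person-months
past_features = [[10, 3], [12, 4], [50, 10], [55, 12]]
past_efforts = [30.0, 36.0, 200.0, 230.0]
print(estimate_by_analogy(past_features, past_efforts, [11, 3.5]))  # 33.0
```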
2.3.6. Top down and bottom up approach
In the top-down approach, the total effort estimate is based on properties of the project as a whole and is then distributed over the project activities; in the bottom-up approach, it is calculated as the sum of the estimates of the project activities (Jørgensen, 2004). Algorithmic and experience-based estimation techniques have limitations in predicting the effort of software projects accurately. Meta-heuristic algorithms are currently widely used to estimate the effort and cost of software projects and have been reported to produce better results.
2.4. MetaHeuristics Algorithms
Metaheuristic algorithms are algorithmic frameworks and computational intelligence paradigms designed to find good, near-optimal solutions to NP-hard problems (Sörensen & Glover, 2017). In every field, there are problems whose objective must be minimized or maximized. Such problems are challenging and are not well suited to traditional machine learning algorithms. Metaheuristic algorithms have a robust search mechanism that can extract useful information from incomplete information. In a metaheuristic algorithm, an optimum solution is reached through exploration and exploitation: exploitation refines the search around the promising solutions found so far, while exploration searches new areas of the search space, which helps the algorithm escape local optima and reach the global optimum. Researchers classify these algorithms in different ways: some distinguish nature-inspired from non-nature-inspired metaheuristics, while others distinguish trajectory-based from population-based metaheuristics (Abdel-Basset, Abdel-Fatah, & Sangaiah, 2018). The following are some of the metaheuristic algorithms that have been used to find optimum solutions in software cost estimation.
2.4.1. Ant Colony Optimization (ACO)
Ant colony optimization is a heuristic algorithm inspired by the social behavior of ant colonies; it was originally introduced by Marco Dorigo and colleagues in the 1990s (Dorjio, 1992). The foraging behavior of ants, that is, how ants find a food source and return to their nest, inspired the founders of ACO (Blum, 2005). This foraging behavior is exploited in artificial ant colonies to find good solutions for continuous and discrete optimization problems. The movement of the artificial ants is guided by the goal of finding an optimum solution to the given problem. Once the optimization problem is given, the first step is to generate artificial ants and solution components that represent the problem. The artificial ants then search for solutions. As they move, the ants deposit a chemical substance called pheromone, through which the colony communicates. As soon as an ant finds a solution, the pheromone values are updated, so that other ants tend to follow the shortest path according to the pheromone concentration in the search space: the higher the pheromone value of a path, the higher its probability of being chosen. The basic working of the ant colony optimization algorithm is shown graphically in figure 2.1.
Figure 2. 1. The basic working of ant colony optimization algorithm (Blum, 2005).
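The pheromone mechanism described above can be illustrated with the classic two-path (double bridge) setting, where ants choose between a short and a long route to food. This is a toy sketch, not an ACO for COCOMO coefficients: the path lengths, the number of ants, the evaporation rate of 0.1, and the deposit rule of 1/length are hypothetical choices.

```python
import random


def double_bridge_aco(short_len=1.0, long_len=2.0, n_ants=10, n_iters=50,
                      evaporation=0.1, seed=1):
    """Ants repeatedly choose between a short and a long path to food.

    The choice probability is proportional to pheromone; each ant deposits
    1/length, so the short path is reinforced faster, while evaporation
    gradually forgets old trails.
    """
    rng = random.Random(seed)
    lengths = {"short": short_len, "long": long_len}
    pheromone = {"short": 1.0, "long": 1.0}  # equal trails at the start
    for _ in range(n_iters):
        deposits = {"short": 0.0, "long": 0.0}
        for _ in range(n_ants):
            total = pheromone["short"] + pheromone["long"]
            path = "short" if rng.random() < pheromone["short"] / total else "long"
            deposits[path] += 1.0 / lengths[path]
        for path in pheromone:
            pheromone[path] = (1 - evaporation) * pheromone[path] + deposits[path]
    return pheromone


print(double_bridge_aco())  # pheromone concentrates on the short path
```

Evaporation matters: without it, early random choices would be remembered forever and the colony would adapt more slowly if the environment changed.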
In software project development, resources are limited, and a project should be completed at minimum or affordable cost. In practice, however, projects are often completed beyond their scheduled cost and time. As a metaheuristic, the ant colony optimization algorithm has been used by researchers to find optimum solutions that reduce overestimation and underestimation. In particular, it has been widely used to optimize the coefficient parameter values of the COCOMO model, with better results reported than those of the original COCOMO model.
2.4.2. Particle Swarm Optimization (PSO)
Particle swarm optimization (PSO) is a population-based evolutionary optimization technique developed by Eberhart and Kennedy in 1995 (Kennedy & Eberhart, 1995), inspired by the social behavior of bird flocking and fish schooling. The algorithm is initialized with a population of random candidate solutions, called particles, and searches for better solutions by updating the velocity and position of each particle in every generation. The particles are dispersed at various points in the solution space, and each particle updates its position according to its own previous best position and the global best position. Besides its position, each particle has a velocity that describes its movement in terms of direction and distance.
Every particle remembers its own best position and shares the global experience of the swarm. Once the position, velocity, and personal best of each particle are defined, the position and velocity are updated in every iteration after the fitness values are calculated. If the fitness function is to be minimized, the position with the least fitness value in the swarm is assigned as the global best position, toward which all particles are attracted. If the fitness function is to be maximized, the position with the largest fitness value is assigned as the global best position. The velocity and position of each particle in each iteration are updated using equations 2.7 and 2.8 (Sengupta, Basak, & Peters, 2018).
$$V_i(t+1) = w\,V_i(t) + c_1 r_1 \left(X_{pbest} - X_i(t)\right) + c_2 r_2 \left(X_{gbest} - X_i(t)\right) \quad (2.7)$$
$$X_i(t+1) = X_i(t) + V_i(t+1) \quad (2.8)$$
The variables in these two equations are:
V_i(t+1) | Velocity of particle i at time step t+1
V_i(t) | Velocity of particle i at time step t
X_i(t) | Position of particle i at time step t
X_i(t+1) | Position of particle i at time step t+1
w | Inertia weight
c1 and c2 | Learning (acceleration) factors: the cognitive and social acceleration coefficients
r1 and r2 | Random numbers uniformly distributed between 0 and 1
Xpbest | Best position found by the particle (personal best)
Xgbest | Best position found by the swarm (global best)
Table 2. 8. Parameters of PSO
Figure 2. 2. Graphical representation of PSO
In figure 2.2, x_i(t) is the current position of a particle and x_i(t+1) its next position, V_i(t) and V_i(t+1) are its current and next velocity, and P_i(t) and g(t) are the personal best and global best positions of the swarm particles, respectively.
Cognitive (personal) and social (global) acceleration coefficients (c1 and c2)
The cognitive acceleration coefficient (c1) controls a particle's acceleration toward its personal best position, and the social acceleration coefficient (c2) controls its acceleration toward the global best position. These coefficients are weights that determine how strongly a particle moves toward its cognitive attractor (PBest) or its social attractor (GBest) (Sengupta, Basak, & Peters, 2018).
PSO pseudocode
The PSO algorithm pseudocode is:
Input: Randomly initialized positions and velocities of the particles: Xi(0) and Vi(0)
Output: Position of the approximate global minimum X*
1: while terminating condition is not reached do
2: for i = 1 to number of particles do
3: calculate the fitness function f
4: update the personal best and the global best
5: update the velocity of the particle using equation 2.7
6: update the position of the particle using equation 2.8
7: end for
8: end while
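The pseudocode above can be turned into a compact global-best PSO with NumPy. This is a minimal sketch under stated assumptions: the inertia and acceleration values (w = 0.7298, c1 = c2 = 1.49618) are commonly used defaults rather than values from this thesis, particles are clipped to the search bounds, and the COCOMO tuning example uses synthetic efforts generated from known coefficients, not the NASA data.

```python
import numpy as np


def pso(fitness, lower, upper, n_particles=30, n_iters=200,
        w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize `fitness` over the box [lower, upper] with global-best PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Random initial positions inside the bounds and zero initial velocities
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest_x = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_f))
    gbest_x, gbest_f = pbest_x[g].copy(), pbest_f[g]

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Equation 2.7: inertia + cognitive (personal best) + social (global best)
        v = w * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)
        # Equation 2.8, then clip so that particles stay inside the search box
        x = np.clip(x + v, lower, upper)

        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest_x[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:
            gbest_x, gbest_f = pbest_x[g].copy(), pbest_f[g]
    return gbest_x, float(gbest_f)


# Hypothetical use: tune A and B of Effort = A * KLOC^B * EMF by minimizing
# the Manhattan distance on made-up training projects
kloc = np.array([5.0, 20.0, 50.0, 100.0, 300.0])
emf = np.array([1.10, 0.90, 1.00, 1.20, 0.85])
actual = 3.0 * kloc ** 1.12 * emf  # synthetic efforts, not real data


def manhattan(params):
    a, b = params
    return np.sum(np.abs(actual - a * kloc ** b * emf))


(best_a, best_b), best_md = pso(manhattan, lower=[1.0, 0.8], upper=[5.0, 1.5])
print(best_a, best_b, best_md)
```

Updating the personal and global bests after each move is equivalent to steps 3 and 4 of the pseudocode; evaluating the whole swarm before updating the global best is the synchronous variant.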
Figure 2. 3. Flowchart diagram for PSO
2.4.3. Genetic Algorithm
A genetic algorithm is a type of metaheuristic algorithm that is commonly used to generate high-quality solutions for optimization and search problems. John Holland developed the genetic algorithm in the 1960s based on Darwin's theory of evolution, and it was further described by Goldberg (Goldbreg, 1988). In a genetic algorithm, a solution is called a chromosome and contains a set of genes. Evolution usually starts by generating an initial population of N randomly generated chromosomes. The chromosomes are then evaluated by their fitness function value (the value of the objective function of the optimization problem being solved), and chromosomes with better fitness values are more likely to be selected for the next population. The fitness function is the function that the genetic algorithm tries to optimize (Carr, 2014). The algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached. The basic steps of a genetic algorithm are shown in Figure 2.4.
Figure 2. 4. Basic operation steps in a genetic algorithm.
The evolution of a genetic algorithm starts with generating a random population, whose size depends on the problem to be solved. The next step is evaluating the objective function, which represents the problem being solved. The accuracy of the result depends directly on the fitness function used to represent the problem, so the objective function should clearly represent the real-world problem. If the problem requires finding the global maximum, the chromosomes with the highest fitness values are preferred; if it requires finding the global minimum, the chromosomes with the least fitness values are preferred. If the best fitness value already satisfies the problem, the process terminates in the first generation and the result is returned. Otherwise, the next genetic algorithm operation, called selection, starts. Selection is the process of selecting two parents from the population for crossover (Saini, 2017), and it is performed based on the objective function value: chromosomes with better fitness values are selected and passed to the next generation, while chromosomes with worse fitness values are discarded from the population. The most widely used selection methods are roulette wheel selection, rank selection, tournament selection, and Boltzmann selection (Saini, 2017).
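Roulette wheel selection, the first method listed above, can be sketched as follows: each chromosome occupies a slice of a wheel proportional to its fitness, and a random point on the wheel picks the parent. The sketch assumes non-negative, larger-is-better fitness values; for a minimized error such as the Manhattan distance, a transform like 1 / (1 + error) would be applied first (also an assumption). The population and fitness values are hypothetical.

```python
import random


def roulette_wheel_select(population, fitness, rng):
    """Return one chromosome chosen with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.choice(population)  # no information: pick uniformly
    pick = rng.uniform(0, total)
    running = 0.0
    for chromosome, value in zip(population, fitness):
        running += value
        if running > pick:
            return chromosome
    return population[-1]  # guards against floating-point round-off


rng = random.Random(42)
population = ["A", "B", "C"]
fitness = [1.0, 3.0, 6.0]
draws = [roulette_wheel_select(population, fitness, rng) for _ in range(10_000)]
print({c: draws.count(c) / len(draws) for c in population})  # close to 0.1, 0.3, 0.6
```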
The second step in a genetic algorithm is crossover, where the selected parents are recombined to
find better fitness values. Its main role is to mix solutions and provide convergence within a
subspace (Yang, 2014). Three crossover operators are commonly used: single-point crossover,
multipoint crossover and uniform crossover, chosen according to the fitness function of the
problem. Figure 2.5 shows single-point, two-point and uniform crossover, respectively.
Figure 2.5. Single-point crossover, two-point crossover and uniform crossover
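The three operators in Figure 2.5 can be sketched on bit-list chromosomes as follows. Cut points are passed explicitly so the behaviour is easy to check, and the 0.5 swap probability in uniform crossover is a common default rather than a value taken from the cited sources; the function names are illustrative.

```python
import random

def single_point_crossover(p1, p2, point):
    """Swap the tails of two parents after index `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, a, b):
    """Swap the middle segment [a, b) between two parents."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def uniform_crossover(p1, p2, rng=random):
    """For each gene, swap the parents' alleles with probability 0.5."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if rng.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

p1 = [1, 1, 1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0, 0, 0]
print(single_point_crossover(p1, p2, 3))  # ([1, 1, 1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1])
print(two_point_crossover(p1, p2, 2, 5))  # ([1, 1, 0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0, 0, 0])
```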
Sometimes the entire population may carry the same allele at a given position. Crossover cannot
change such an allele, so the solution remains the same and of low quality. To overcome this
problem, a mutation operator is added to the genetic algorithm: parts of a solution are changed
randomly to increase the diversity of the population and to explore the entire search space (Yang,
2014). In this operation, a new offspring is produced from a single parent. The following schematic
representation shows a single-point mutation.
Original gene (before mutation)
1 0 1 0 0 1 1 0
New gene (after mutation)
1 0 1 0 1 1 1 0
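The single-point mutation in the schematic can be reproduced with a small sketch: `flip_bit` inverts one chosen position, and `bit_flip_mutation` shows the usual rate-based variant in which every bit is inverted independently with a given probability. Both function names are illustrative.

```python
import random

def flip_bit(chromosome, index):
    """Single-point mutation: invert the bit at `index`."""
    mutated = list(chromosome)
    mutated[index] = 1 - mutated[index]
    return mutated

def bit_flip_mutation(chromosome, rate, rng=random):
    """Invert each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

original = [1, 0, 1, 0, 0, 1, 1, 0]
print(flip_bit(original, 4))  # [1, 0, 1, 0, 1, 1, 1, 0], matching the schematic
```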
Genetic algorithms (GA) have been used to solve problems that have no deterministic solution. From a
software engineering perspective, they have been applied to problems such as software cost
estimation, task scheduling, clustering, natural language processing, query optimization, and image
processing (Sharma, 2017). In (Maleki, Ghaffar, & Masdari, 2014), a genetic algorithm was combined
with the Ant Colony Optimization algorithm to optimize the weights of the effective factors using
the NASA software projects dataset. (Sachan, Nigam, Singh, & et al, 2016) used a genetic algorithm
to optimize the parameter values of the COCOMO model, and (Omara & Arafa, 2010) used a genetic
algorithm to minimize the total execution time, satisfy load balancing, and overcome the
communication overhead problem.
2.4.4. Hybridization of Meta-heuristic Algorithms
Non-linear, non-deterministic problems have been solved using meta-heuristic algorithms with
promising results, but satisfactory results are not guaranteed every time. Recently, researchers
have therefore been combining meta-heuristic algorithms to achieve more promising results. When two
meta-heuristic algorithms are used together, they can be applied sequentially, in parallel, or by
using the operations of one algorithm inside the framework of the other (Sengupta, Basak, & Peters,
2018).
2.5. Related works
According to (Sachan, Nigam, Singh, & et al, 2016), the researchers proposed a simplified genetic
algorithm model to optimize the parameters of the basic COCOMO model and obtained more realistic
estimates than the basic COCOMO model. They used crossover and selection operators to calculate new
values of A and B, and the model with the optimal set of parameters A and B gave improved estimates
compared to Basic COCOMO. This suggests that a method based on a genetic algorithm can outperform
the algorithmic Constructive Cost Model. The drawback of this study is that it is based on a very
small dataset and considered only the size of a software project to estimate its effort.
In (Rijwani & Jain, 2016), a multi-layered feed-forward artificial neural network was used to
estimate the effort of software projects. The model was trained with 23 inputs and hidden layers
using the back-propagation algorithm, and tested with 13 randomly selected projects from the
COCOMO dataset. MRE and MMRE were used to evaluate the results of the experiment, which showed a
significant reduction of relative errors and better performance compared with the COCOMO model.
In (Algabri, Saeed, Mathkour, & et al, 2015), the researchers used a genetic algorithm to tune the
parameters of the COCOMO model in order to predict software cost more accurately. An initial
population of 100 individuals was generated, and the experiment was performed on the 93 NASA
projects dataset. After 1000 iterations, they found new COCOMO coefficients for each class of
project (organic, semi-detached and embedded), and the resulting development times were closer to
the real development times.
In (Salijoughinejad & Khatibi, 2018), hybrid algorithms were used to increase the accuracy of cost
estimation by enhancing the COCOMO model. The model was improved by effective selection of its
coefficients and reconstruction of its cost driver values. The authors used GA, PSO, the invasive
weed optimization algorithm (IWO), and a combination of PSO and IWO to find the optimum values of
the coefficients and cost drivers. Their experimental results were divided into three sections for
the organic, semi-detached and embedded modes of COCOMO, and MMRE was used to evaluate them. For the
organic mode, the newly optimized coefficient values combined with the original COCOMO cost driver
values, found using the hybrid of IWO and PSO, gave the best estimation accuracy. For the
semi-detached mode, the COCOMO cost driver values were reconstructed and new coefficient values were
found that increase estimation accuracy. For the embedded mode, new cost driver values together with
new coefficient values were generated to improve estimation. Overall, the authors achieved better
results than the current COCOMO model.
The authors of (Singh, Singh, & Mishra, 2018) proposed an Environmental Adaption Method for
estimating software development cost. In this paper, IEAM-RP was used to tune the parameters of the
Sheta model of software cost estimation, and its results were better than those of other existing
techniques.
A model based on a genetic algorithm combined with the Bat Algorithm (Dizaj & Gharehchopogh, 2018)
was used to predict software cost. The authors proposed a new method that considers the effect of
qualitative factors, represented with false variables, on the total cost estimate. The method was
evaluated on four different datasets using seven criteria, and the experimental results showed that
it improved the accuracy of software cost estimation by reducing error values compared with the
COCOMO model.
In (Nadal & Sangwan, 2018), a hybrid of an Improved Bat Algorithm and the Gravitational Search
Algorithm (BATGSA) was used to optimize the COCOMO model. The Improved Bat Algorithm was used for
exploration and the gravitational search algorithm for exploitation, so the result found by the Bat
Algorithm was further improved by the gravitational search algorithm. The proposed algorithm was
tested on four different NASA datasets and compared with three state-of-the-art techniques, and the
authors reduced errors by 2% to 10%.
Thanh (Le & Khuat, 2016) used a Directed Artificial Bee Colony Algorithm to tune the parameter values
of the COCOMO model based on the past actual effort provided in the dataset. The experimental
results on the NASA dataset showed improved effort estimation accuracy compared with the COCOMO
model; MMRE, MdMRE, PRED(25), and MAR were used for evaluation, and the results improved on all of
these criteria.
In (Maleki, Ghaffar, & Masdari, 2014), a genetic algorithm and ant colony optimization were used to
estimate software cost, with MMRE as the fitness function. The GA was used to test suitable values
of the cost factors according to the project size, and ACO was used to further optimize the factors.
The authors achieved better results than COCOMO. The basic limitation of this study is its
methodology: the authors adjusted the effort multiplier values based on the size of the software
just to bring the estimated effort closer to the actual effort. This approach is not appropriate,
because the reliability, availability, and other non-functional attributes of software cannot be
determined from its size in the dataset. The effort multiplier values (non-functional attributes)
provided in the dataset should be used to obtain a more optimized result.
The authors of (Alajlan & Tagoug, 2016) used a genetic algorithm to tune the coefficients of the
COCOMO II model for estimating effort and development time on the NASA 93 dataset, and the optimized
values of A and B produced more accurate results than the current COCOMO II model. However, the
results were still not satisfactory when compared with the actual effort.
Prerna (Singal, Kumari, & Sharma, 2020) used the Differential Evolution algorithm to improve the
parameter values of the COCOMO and COCOMO II models. The authors applied three successful mutation
strategies to find the coefficient values, and MRE was used as the evaluation criterion. The results
were investigated using the PROMISE repository datasets (NASA 93 and COCOMO81), and better effort
estimates were obtained.
2.5.1. Summary of related works
NO | Authors | Research title | Algorithm used | Research gap
1 | (Sachan, Nigam, Singh, & et al, 2016) | Optimizing Basic COCOMO Model using Simplified Genetic Algorithm | Genetic Algorithm | Used only Basic COCOMO and did not consider the effort factors; lack of estimation accuracy.
2 | (Algabri, Saeed, Mathkour, & et al, 2015) | Optimization of Soft Cost Estimation using Genetic Algorithm for NASA Software Projects | Genetic Algorithm | Improved the parameter values only for the basic and semi-detached classes of COCOMO projects; accuracy for the embedded class is not satisfactory.
3 | (Maleki, Ghaffar, & Masdari, 2014) | A New Approach for Software Cost Estimation with Genetic Algorithm and Ant Colony Optimization | GA and ACO | Tested suitable effort multiplier values using the software size (lines of code) rather than the EM scale values provided in the dataset; the new effort multiplier values are not valid for other datasets.
4 | (Nadal & Sangwan, 2018) | Software Cost Estimation by Optimizing COCOMO Model Using Hybrid BATGSA Algorithm | Improved Bat Algorithm and Gravitational Search Algorithm | The algorithms used for exploration and exploitation could not produce satisfactory results.
5 | (Singal, Kumari, & Sharma, 2020) | Estimation of software development effort: A Differential Evolution Approach | Differential evolution approach | Improved on the original COCOMO, but relative errors with respect to the actual effort of projects remain high.
Table 2.9. Summary of related works
In conclusion, most of the related works above considered only software size (Basic COCOMO) when
estimating the effort of a software project. Some works used effort multipliers (Intermediate
COCOMO) in combination with software size, but their results were not satisfactory and had large
relative errors with respect to the actual effort. This suggests that the exploration and
exploitation behaviour of the algorithms used could not find the optimal values of A and B.
Therefore, this research uses the strength of PSO for exploration and GA for exploitation to find
the global optimum values of the intermediate COCOMO coefficients, so that the estimation capability
of the COCOMO model improves, since the model depends predominantly on the coefficients A and B.
CHAPTER THREE
DESIGN OF METHODOLOGY
3.1. Introduction
Software project effort is the amount of manpower required to complete a particular software
project. In the intermediate COCOMO model, effort is calculated with software size and effort
multipliers as the major inputs and is measured in person-months. The effort multipliers are the
project, personnel, product, and computer attributes of software that directly affect its effort
and cost; in the COCOMO model, these attributes are called effort multipliers or cost factors, and
there are 15 of them. To adjust the relation between the cost factors and software project effort,
the model uses two constants: the multiplicative constant (A) and the exponential constant (B). The
values of A and B were derived from the historical characteristics of 63 projects; they differ
between the modes of the intermediate COCOMO model but are constant within each mode. Software
project effort is calculated with the following formula (Maleki, Ghaffar, & Masdari, 2014).
$$\text{Effort} = A \times (\text{KLOC})^{B} \times \text{EMF} \tag{3.1}$$
where A and B are the multiplicative and exponential constants respectively, KLOC is the size of the
software project in thousands of lines of code, and EMF is the product of all fifteen effort
multipliers (cost factors), given by
$$\text{EMF} = \prod_{i=1}^{15} \text{EM}_i = \text{EM}_1 \times \text{EM}_2 \times \cdots \times \text{EM}_{15} \tag{3.2}$$
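Equations 3.1 and 3.2 translate directly into code. The sketch below assumes the 15 effort multipliers have already been converted to numeric weights; the function name is hypothetical, and the example uses the organic-mode values A = 3.2 and B = 1.05 quoted in Section 4.4.1 for a hypothetical 20 KLOC project with every multiplier at its nominal weight of 1.0.

```python
import math

def intermediate_cocomo_effort(kloc, effort_multipliers, a, b):
    """Effort in person-months: A * KLOC**B * EMF (Eqs. 3.1 and 3.2)."""
    if len(effort_multipliers) != 15:
        raise ValueError("intermediate COCOMO uses exactly 15 effort multipliers")
    emf = math.prod(effort_multipliers)  # Eq. 3.2: product of EM1..EM15
    return a * kloc ** b * emf           # Eq. 3.1

# Hypothetical 20 KLOC organic project with all cost drivers rated nominal (1.0).
effort = intermediate_cocomo_effort(20.0, [1.0] * 15, a=3.2, b=1.05)
print(round(effort, 1))
```

Because B enters as an exponent on KLOC while A only scales the result, small changes in B shift the estimates of large projects much more than those of small ones, which is one reason both coefficients are worth tuning.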
Estimating the effort of a software project from its size and the COCOMO effort multipliers is
difficult because the relation between the cost factors and effort is not linear. The accuracy of
effort, cost, and schedule estimation can therefore be increased by finding appropriate values of A
and B, which is why the multiplicative and exponential constants need to be optimal. The main
objective of this thesis is to find the optimum values of A and B using particle swarm optimization
and a genetic algorithm. Our proposed approach consists of two steps: in the first step, local
optimum solutions are generated, and in the second step, the optimized coefficient values are
produced. We focus on optimizing the intermediate COCOMO multiplicative constant A and exponential
constant B using PSO_GA. In the training phase, the inputs are the actual effort, KLOC and the 15
other project attributes, and the outputs are the optimized values of A and B. The testing process
is then executed using these new values of A and B.
3.2. Design of the proposed model
We hybridize two meta-heuristic algorithms: Particle Swarm Optimization and the Genetic Algorithm.
In particle swarm optimization, every particle participates in finding the best global position by
updating its personal best position until the stopping criterion is reached. As a result, PSO is
unlikely to lose good local solutions, which helps the swarm reach the global optimum. PSO has
memory, so knowledge of good solutions is retained by all particles, whereas in GA previous
knowledge of the problem is discarded once the population changes (Kao & Zahara, 2008). Optimum
solutions can therefore be lost in the genetic algorithm, and the GA may converge to a local minimum
before finding the global minimum. To address this problem, we use the PSO algorithm to generate an
initial solution and GA to optimize the coefficient values obtained by PSO.
Figure 3.1. The proposed PSO_GA model system architecture
The proposed model has three phases. In the first phase, initial values of A and B are generated and
a local optimum solution for A and B is produced using PSO. In the second phase, the local optimum
values of A and B are optimized using GA. In the third phase, the effort of the testing projects is
estimated using the new optimized values of A and B. The following steps explain what happens in
each phase of the proposed model.
3.3. Generating Initial solution using Particle Swarm Optimization
The particle swarm optimization algorithm produces the initial solution that helps the genetic
algorithm reach an optimized result. Figure 3.2 shows the steps used to generate the initial
optimized values of A and B.
Figure 3.2. Flow chart diagram for PSO (the training dataset is input to the PSO algorithm; generate initial values of A and B; calculate the fitness value of each A and B; assign the personal best (Pbest) of each A and B; if the current fitness value is better than Pbest, assign it as Pbest, otherwise keep the previous Pbest; assign the best Pbest of A and B to the global best (Gbest); calculate the velocity of each A and B; update each A and B using the velocity; if the maximum iteration is reached, pass the result to the GA, otherwise repeat).
3.3.1. Generate initial Random values and assign Pbest value of A and B
The swarm particles in the PSO algorithm are represented as pairs of A and B. The first step of the
PSO algorithm is generating random values of A and B, which form the initial solution; these
solutions are then optimized through the process. The number of A and B values to generate depends
on the problem, and appropriate parameter selection in PSO and GA plays a significant role in the
performance and efficiency of software effort estimation. We generated 1000 values for each of A
and B with random initial positions and velocities. These initialized values are candidate solutions
to the problem, and in each iteration of the PSO algorithm, optimized values of A and B are
generated from them. The candidate solutions move in the search space, and their positions are
evaluated through a fitness function. In this step, we set up the memory of the swarm and randomize
A and B in the memory. The particles are given random positions, and the personal best of each A and
B is its initial position. Once A and B are generated and their personal bests assigned, we set the
inertia weight (w) to 1.2 and the acceleration coefficients c1 and c2 to 2.0. These are the most
widely used and recommended parameter values in the literature (Langsari & Sarno, 2017) (HE, MA, &
ZHANG, 2016), and our model also produced better results with them. The personal acceleration
coefficient (c1) controls the particles' acceleration towards their personal best positions, the
global acceleration coefficient (c2) controls their acceleration towards the global best position,
and the inertia weight describes the effect of the previous velocity on the current velocity.
3.3.2. Calculate the fitness function
The fitness function determines how good the position of each particle (A, B) is in the
multidimensional search space with respect to the desired goal, which helps the algorithm determine
the next best step for each particle. The desired goal in this problem is minimizing the gap between
the effort estimated using A and B and the actual effort provided in the dataset. In our proposed
model, we use the sum of the absolute differences between the estimated and actual effort (the
Manhattan distance, MD) as the fitness function, so we seek parameter values that minimize it. The
Manhattan distance is computed using Equation 3.3.
$$\text{MD} = \sum_{i=1}^{n} \left| \text{Actual Effort}_i - \text{Estimated Effort}_i \right| \tag{3.3}$$
where n is the number of projects in the dataset used to build the model. If the personal best
value is larger than the newly calculated value (the fitness candidate), the fitness candidate is
assigned as the personal best value and its position as the personal best position. Likewise, if
the global best value is larger than the fitness candidate, the fitness candidate is assigned as
the global best fitness value and its position as the global best position.
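A vectorized sketch of the fitness evaluation and the personal/global best bookkeeping described above, using NumPy as in the thesis's environment. Each particle is a row (A, B), and `kloc`, `emf` and `actual` hold one value per training project. The training data below are synthetic (generated from A = 3.0 and B = 1.1), not taken from the NASA datasets, and the function names are hypothetical.

```python
import numpy as np

def md_fitness(positions, kloc, emf, actual):
    """Eq. 3.3 for every particle: sum over projects of |actual - A*KLOC**B*EMF|."""
    a = positions[:, 0:1]                # shape (particles, 1)
    b = positions[:, 1:2]                # shape (particles, 1)
    estimated = a * kloc ** b * emf      # broadcasts to (particles, projects)
    return np.abs(actual - estimated).sum(axis=1)

def update_bests(positions, fitness, pbest_pos, pbest_fit, gbest_pos, gbest_fit):
    """Minimization: a candidate replaces any best value that is larger than it."""
    improved = fitness < pbest_fit
    pbest_pos = np.where(improved[:, None], positions, pbest_pos)
    pbest_fit = np.where(improved, fitness, pbest_fit)
    best = np.argmin(pbest_fit)
    if pbest_fit[best] < gbest_fit:
        gbest_pos, gbest_fit = pbest_pos[best].copy(), pbest_fit[best]
    return pbest_pos, pbest_fit, gbest_pos, gbest_fit

# Synthetic training data: three projects whose effort follows A = 3.0, B = 1.1.
kloc = np.array([10.0, 20.0, 40.0])
emf = np.array([1.0, 0.9, 1.2])
actual = 3.0 * kloc ** 1.1 * emf

positions = np.array([[2.5, 1.0], [3.0, 1.1], [3.5, 1.2]])
fitness = md_fitness(positions, kloc, emf, actual)
pbest_pos, pbest_fit, gbest_pos, gbest_fit = update_bests(
    positions, fitness, positions.copy(), np.full(3, np.inf), None, np.inf)
```

Starting the bests at infinity makes the first evaluation fill them in, which matches the rule that each particle's initial position is its first personal best.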
3.3.3. Calculate the Velocity and update the position of Particles A and B
The velocity of a particle determines the direction in which it moves and how fast it moves. In
this phase, we determine the best location each particle has occupied and the best position among
all particles A and B. The new velocity and position of the particles are calculated by the
following formulas.
$$V_i(t+1) = w\,V_i(t) + c_1 r_1 \left(P_{best,i}(t) - X_i(t)\right) + c_2 r_2 \left(G_{best}(t) - X_i(t)\right) \tag{3.4}$$
$$X_i(t+1) = X_i(t) + V_i(t+1) \tag{3.5}$$
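Equations 3.4 and 3.5 can be applied to the whole swarm at once. In this sketch r1 and r2 are drawn independently for each particle and each dimension, a common convention; the thesis only states that they are random numbers between 0 and 1. The defaults w = 1.2 and c1 = c2 = 2.0 follow Table 3.1, and the function name is hypothetical.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=1.2, c1=2.0, c2=2.0, rng=None):
    """Apply Eq. 3.4 (velocity) and Eq. 3.5 (position) to every particle.

    x, v, pbest: arrays of shape (particles, 2) holding (A, B) per particle.
    gbest: array of shape (2,), broadcast over all particles.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)  # uniform samples in [0, 1)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. 3.4
    x_new = x + v_new                                               # Eq. 3.5
    return x_new, v_new

x = np.array([[3.0, 1.0], [4.0, 1.1]])
x_new, v_new = pso_step(x, np.zeros_like(x), pbest=x.copy(),
                        gbest=np.array([3.5, 1.05]), rng=np.random.default_rng(0))
```

Starting from zero velocity, the first particle can only move up towards the global best (3.5, 1.05) and the second only down, since the personal-best term is zero for both.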
This step is repeated until the iteration number reaches 1000. In each iteration, the global best
position is appended to an initially empty array, and we consider the values in this array as local
optimum solutions. The global optimum solution may already appear in this step, in which case the
genetic algorithm operations cannot improve it further. When the maximum iteration is reached, the
appended global best positions are passed to the genetic algorithm as its initial population. The
genetic algorithm thus starts from local optimum solutions generated by particle swarm optimization
and does not waste time finding new local optima with its objective function; it tries to reach the
global optimum using the input obtained from PSO. This is where the particle swarm optimization
algorithm ends and the genetic algorithm starts processing. The parameters that affect the
operation of the PSO algorithm and the values used in the proposed model are presented in Table 3.1.
Parameter | Value
Number of particles (A and B) | 1000
Number of generations | 1000
Fitness function | MD
c1 | 2.0
c2 | 2.0
w | 1.2
r1 and r2 | Random number between 0 and 1
Table 3.1. PSO parameters and their values
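Putting the steps of Section 3.3 together, the loop below is one possible sketch of the PSO phase: it records the global best (A, B) after every iteration, and that archive is what would be handed to the genetic algorithm as its initial population. The search bounds, the velocity clamp and the random seed are assumptions not stated in the thesis; the clamp is included because with an inertia weight above 1 the velocities can otherwise keep growing. The other defaults follow Table 3.1, the function name is hypothetical, and the usage data are synthetic.

```python
import numpy as np

def pso_phase(kloc, emf, actual, n_particles=1000, n_iter=1000, w=1.2, c1=2.0,
              c2=2.0, bounds=((0.0, 10.0), (0.5, 1.5)), seed=0):
    """Run PSO over (A, B) and return the per-iteration archive of global bests."""
    rng = np.random.default_rng(seed)
    low = np.array([lo for lo, _ in bounds])
    high = np.array([hi for _, hi in bounds])
    v_max = 0.2 * (high - low)                 # assumed velocity clamp

    def fitness(pos):                          # Eq. 3.3 (Manhattan distance)
        estimated = pos[:, :1] * kloc ** pos[:, 1:] * emf
        return np.abs(actual - estimated).sum(axis=1)

    x = rng.uniform(low, high, size=(n_particles, 2))
    v = rng.uniform(-v_max, v_max, size=(n_particles, 2))
    pbest, pbest_fit = x.copy(), fitness(x)    # initial position is the personal best
    g = np.argmin(pbest_fit)
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    archive = []
    for _ in range(n_iter):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. 3.4
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, low, high)                               # Eq. 3.5
        fit = fitness(x)
        improved = fit < pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        g = np.argmin(pbest_fit)
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
        archive.append(gbest.copy())           # local optimum passed on to the GA
    return np.array(archive), gbest, gbest_fit

# Synthetic data (A = 3.0, B = 1.1) and a reduced swarm size to keep the run short.
kloc = np.array([10.0, 20.0, 40.0])
emf = np.array([1.0, 0.9, 1.2])
actual = 3.0 * kloc ** 1.1 * emf
archive, gbest, gbest_fit = pso_phase(kloc, emf, actual, n_particles=50, n_iter=200)
```

Because the global best is only replaced by a strictly better position, the MD of successive archive entries never increases.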
3.4. Optimizing the Coefficient value of A and B using the Genetic Algorithm
Figure 3.3. Genetic algorithm operations in the proposed methodology
3.4.1. Calculate fitness value
The genetic algorithm receives an initial solution for the parameters A and B from the particle
swarm optimization algorithm. It starts by calculating the fitness value of the received population
using MD as the fitness function. Since the problem is a minimization problem, the objective
function returns the individuals (A and B) with the minimum fitness value, which are used as input
to the selection process.
3.4.2. Selection
In this step, the individual values of A and B with the minimum fitness values are selected as
parents for producing offspring for the next generation. A fitness-based selection mechanism is used
to select the best parents.
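The thesis does not specify which fitness-based scheme it uses. One simple possibility, assumed here, is truncation selection: rank the individuals by MD and keep the best ones as parents. The function name, the number of parents and the toy values are hypothetical.

```python
import numpy as np

def select_parents(population, fitness, n_parents):
    """Truncation selection for a minimization problem.

    population: shape (individuals, 2) with columns (A, B); fitness: MD per row.
    Returns the n_parents rows with the smallest MD, best first.
    """
    order = np.argsort(fitness)          # ascending: smaller MD is better
    return population[order[:n_parents]]

population = np.array([[4.0, 1.00], [3.0, 1.10], [5.0, 0.90], [3.5, 1.05]])
md = np.array([10.0, 2.0, 7.0, 4.5])
parents = select_parents(population, md, n_parents=2)  # rows [3.0, 1.10] and [3.5, 1.05]
```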
3.4.3. Crossover
Recombination combines the genetic information of two parents to form new offspring, with the aim
that newly created individuals are more likely to be better than their parents. We used one-point
crossover, because there are only two variables (A and B) to be optimized.
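With only two genes per individual, (A, B), one-point crossover has a single possible cut point, between A and B, so the two children simply exchange their B values. The sketch below shows this; the function name and the parent values are illustrative.

```python
import numpy as np

def one_point_crossover(parent1, parent2, point=1):
    """Exchange the genes from index `point` onward between two parents."""
    child1 = np.concatenate([parent1[:point], parent2[point:]])
    child2 = np.concatenate([parent2[:point], parent1[point:]])
    return child1, child2

child1, child2 = one_point_crossover(np.array([4.0, 1.10]), np.array([4.5, 1.00]))
# child1 is [4.0, 1.0] and child2 is [4.5, 1.1]: the B values have been swapped
```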
3.4.4. Mutations
To carry variation from one generation of the population to the next, we used a mutation rate of
0.02. This value was selected based on a review of the literature, where better results were
achieved with the same population size, crossover type and rate (Hassanat, Almohammadi, Alkafaween,
& et al, 2019). The crossover and mutation rates are important aspects of GA design. Mutation
prevents the chromosomes of the population from becoming too similar in each generation, which
helps the algorithm reach a better solution. The parameters and values used in the genetic
algorithm are presented in Table 3.2.
Parameter | Value
Population size | 1000
Number of generations | 100
Crossover | Single-point crossover
Fitness function | MD
Mutation rate | 0.02
Selection | Fitness-based selection
Table 3.2. Genetic algorithm parameters and their values
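The thesis fixes the mutation rate at 0.02 but does not say how a selected gene is changed. The sketch below assumes a Gaussian perturbation of the real-valued genes; the noise scales for A and B, the function name and the random initial population are hypothetical choices for illustration.

```python
import numpy as np

def mutate(population, rate=0.02, sigma=(0.1, 0.01), rng=None):
    """Perturb each gene with probability `rate` by adding Gaussian noise.

    population: shape (individuals, 2) with columns (A, B).
    sigma: assumed standard deviations of the noise for A and B.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(population.shape) < rate               # genes chosen for mutation
    noise = rng.normal(0.0, sigma, size=population.shape)    # sigma broadcasts per column
    return np.where(mask, population + noise, population)

rng = np.random.default_rng(7)
population = rng.uniform([3.0, 0.9], [5.0, 1.2], size=(1000, 2))
offspring = mutate(population, rng=rng)
n_mutated = int(np.count_nonzero(offspring != population))   # about 2% of 2000 genes
```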
CHAPTER FOUR
EXPERIMENTAL RESULTS AND DISCUSSION
4.1. Introduction
In this chapter, the experimental results and their interpretation using evaluation criteria are
discussed. Optimized parameter values are generated for the organic, semi-detached, and embedded
classes of the intermediate COCOMO model, and the results are evaluated using five different
software cost estimation criteria. Using these new optimized parameter values, the estimated effort
for each class of project in each mode of intermediate COCOMO is presented, and the effort
estimation of our proposed model is compared with the GA and PSO models. The dataset used, its
attributes, the simulation environment, and the evaluation metrics are also described.
4.2. Dataset Description
The dataset for this research is the COCOMO NASA dataset collected by Jairus Hihn (Promise software
engineering Repository, 2005). The datasets contain 60, 63 and 93 software projects, and each
project has 17 attributes. We used 70% of the data to optimize the parameter values of the COCOMO
model and 30% to test the model. There are three classes of project, organic, semi-detached and
embedded, which are distinguished by lines of code, programmer experience and flexibility of
requirements; separate training and testing sets were used for each class. This dataset has been
used by many researchers, including (Maleki, Ghaffar, & Masdari, 2014) (Langsari & Sarno, 2017)
(Rohit Kumar Sachan, 2016) (Singh, Singh, & Mishra, 2018) (Algabri, Saeed, Mathkour, & et al, 2015).
The dataset attributes fall into four classes: product attributes, hardware (computer) attributes,
personnel attributes and project attributes. In the COCOMO model, these attributes are called effort
multipliers or cost drivers, and they directly affect the cost of a software project. The product
attributes include three effort multipliers, the computer attributes four, the personnel attributes
five, and the project attributes three, giving the fifteen effort multipliers. The remaining
attributes are the software size, measured in lines of code, which largely determines the effort of
a software project, and the actual effort the project needed to complete. Table 4.1 shows the 17
attributes of the dataset.
Attribute class | Attribute name | Code
Product attributes | required software reliability | RELY
Product attributes | database size | DATA
Product attributes | product complexity | CPLX
Computer attributes | time constraint for CPU | TIME
Computer attributes | main memory constraint | STOR
Computer attributes | machine volatility | VIRT
Computer attributes | turnaround time | TURN
Personnel attributes | analyst capability | ACAP
Personnel attributes | application experience | AEXP
Personnel attributes | programmer capability | PCAP
Personnel attributes | virtual machine experience | VEXP
Personnel attributes | language experience | LEXP
Project attributes | modern programming practices | MODP
Project attributes | use of software tools | TOOL
Project attributes | schedule constraint | SCED
Project size in LOC | lines of code | LOC
Actual effort of project | actual effort | ACT_EFFORT
Table 4.1. Dataset attribute classes, names and codes
The dataset uses a nominal representation ranging from Very Low to Extra High. We converted these
nominal values to their equivalent numerical weights using Table 2.5. Since the dataset contains
organic, semi-detached and embedded projects, the projects were classified into the corresponding
classes using Table 2.2.
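A sketch of this preprocessing with pandas, as in the thesis's environment: nominal ratings are mapped to numeric weights and the projects are split 70/30 into training and testing sets. Only one cost driver (RELY) is shown, with its COCOMO 81 weights; the full conversion would cover all 15 drivers using Table 2.5. The lower-case rating labels, column names, random seed, function names and toy rows are assumptions.

```python
import pandas as pd

# COCOMO 81 weights for required software reliability (RELY); other drivers omitted.
RELY_WEIGHTS = {"vl": 0.75, "l": 0.88, "n": 1.00, "h": 1.15, "vh": 1.40}

def ratings_to_weights(df, column, weights):
    """Replace nominal ratings such as 'vh' with their numeric multiplier."""
    out = df.copy()
    out[column] = out[column].str.lower().map(weights)
    return out

def train_test_split_70_30(df, seed=42):
    """Randomly assign 70% of the projects to training and the rest to testing."""
    train = df.sample(frac=0.7, random_state=seed)
    test = df.drop(train.index)
    return train, test

projects = pd.DataFrame({
    "rely": ["n", "h", "vh", "l", "N", "h", "n", "vl", "H", "n"],
    "kloc": [20.0, 16.0, 26.0, 13.0, 14.0, 19.0, 6.5, 35.0, 47.0, 11.0],
})
numeric = ratings_to_weights(projects, "rely", RELY_WEIGHTS)
train, test = train_test_split_70_30(numeric)
```

In practice the split would be done separately for each project class, as the thesis uses different training and testing sets for the organic, semi-detached and embedded modes.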
4.3. Simulation environment
We used the Anaconda environment, an open-source Python distribution that contains modules and
packages for scientific computing. Pandas was used for data manipulation and analysis, and NumPy for
handling the multi-dimensional input arrays used to calculate the fitness value of each particle in
particle swarm optimization and of each individual in the genetic algorithm population. The
experiments were run on an Intel(R) Core(TM) i5-3230M CPU @ 2.60 GHz with 8 GB RAM.
4.4. Experiment results
4.4.1. Experimental Result for Organic Model on NASA60 dataset
The intermediate organic COCOMO mode includes software projects smaller than 50 KLOC. In the COCOMO
model, the values of A and B for this mode are 3.2 and 1.05 respectively. Using the two
meta-heuristic algorithms in combination, we obtained the following newly optimized coefficients for
the intermediate organic COCOMO model: A = 4.0026 and B = 1.0931. These new values are used to
calculate the effort of the software projects to show the effect of the proposed model on software
effort estimation. Using the genetic algorithm alone, we obtained A = 4.7813 and B = 0.9833 for the
organic intermediate COCOMO model, and using the particle swarm optimization algorithm alone,
A = 4.9426 and B = 1.0265.
Table 4.2 shows the estimated effort for the testing dataset using our newly optimized values of A
and B. In the table, the second column gives the software project size in thousands of lines of
code, the third column gives the actual effort provided in the dataset, and the remaining columns
give the effort estimated by each model. The comparison shows that the PSO_GA model produces
estimates close to the actual effort for all testing projects and that, overall, the proposed model
is more accurate than the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models. The
COCOMO model underestimated the effort of all projects, and its estimates deviate furthest from the
actual effort in the dataset. Figure 4.1 plots the estimated effort of the COCOMO and PSO_GA models
together with the actual effort: the PSO_GA curve follows the actual effort curve, while the COCOMO
curve underestimates the effort of all testing projects. This shows that our model estimates
software effort better than the COCOMO model.
Figure 4.1. Actual, COCOMO, and PSO_GA effort for the organic model
Table 4.2. Estimated effort for the organic model
No | KLOC | Actual effort | PSO_GA estimated effort | GA estimated effort | PSO estimated effort | (Algabri, Saeed, Mathkour, & et al, 2015) effort | COCOMO effort
1 | 20 | 48 | 49 | 42 | 50 | 46 | 35
2 | 16.3 | 82 | 82 | 72 | 84 | 77 | 58
3 | 25.9 | 117.6 | 114.1 | 95.4 | 113.5 | 103.2 | 79.3
4 | 12.8 | 62 | 62 | 56 | 65 | 60 | 45
5 | 14.0 | 60 | 62 | 56 | 65 | 60 | 44
6 | 19.3 | 155 | 141 | 122 | 143 | 131 | 99
7 | 6.5 | 42 | 40 | 39 | 44 | 42 | 30
8 | 35.5 | 192 | 192 | 155 | 186 | 168 | 131
9 | 47.5 | 252 | 233 | 182 | 223 | 199 | 158
10 | 11.3 | 36 | 34 | 32 | 36 | 34 | 25
11 | 8.0 | 42 | 44 | 42 | 48 | 45 | 32
12 | 7.7 | 31.2 | 30.3 | 28.9 | 32.6 | 30.7 | 22.1
13 | 16.0 | 114 | 105 | 93 | 108 | 100 | 75
14 | 8.2 | 36.0 | 32 | 30 | 34 | 32 | 23
Table 4.3 presents the accuracy comparison of our model with the COCOMO, GA, PSO, (Algabri, Saeed,
Mathkour, & et al, 2015), and (Nadal & Sangwan, 2018) models using five different estimation
criteria. For PRED(0.25) a higher value is better, while for the other metrics a lower value is
better because they measure the errors of the estimation process. The PRED(0.25) value of the
COCOMO model is 0.05, meaning that only 5% of the testing projects have an MRE below 0.25; this is
the lowest value among the PSO_GA, PSO, GA, (Algabri, Saeed, Mathkour, & et al, 2015), and (Nadal &
Sangwan, 2018) models. Overall, the proposed PSO_GA model has lower relative errors and a higher
PRED(0.25) value.
For example, the total MRE of our PSO_GA model is 0.6084, while that of the COCOMO, GA, and PSO models is 4.2291,
1.8137, and 0.8265 respectively. The PSO_GA model therefore lowers the total MRE by 3.6207, 1.2053, and 0.2181
(362.07, 120.53, and 21.81 percentage points) respectively. It also lowers it by 0.4699 compared with the
(Algabri et al., 2015) model, whose total MRE is 1.0783. The PRED(0.25) value of the GA model is 0.95, which means
that 95% of the testing projects have an MRE less than 0.25. For the PSO_GA, PSO, and (Algabri et al., 2015) models,
the PRED(0.25) value is 1, which indicates that every organic testing project in the dataset has an MRE less than
0.25. Across all five evaluation metrics, the proposed PSO_GA model has the lowest relative errors and the highest
(tied) PRED(0.25) value. We can therefore conclude that the PSO_GA model is satisfactory. The effort of organic
Intermediate COCOMO software projects should be estimated with the new coefficient values generated by our proposed
model.
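The reductions quoted above are differences between total MRE values, multiplied by 100. The sketch below applies that arithmetic to the Table 4.3 values. It also reports a relative reduction (the drop divided by the other model's total MRE); this is an additional view added here, not one used in the thesis. The names are our own labels.

```python
# Total MRE of each model on the organic NASA60 testing projects (Table 4.3)
TOTAL_MRE = {
    "COCOMO": 4.2291,
    "Algabri et al. (2015)": 1.0783,
    "PSO": 0.8265,
    "GA": 1.8137,
    "PSO_GA": 0.6084,
}


def reductions(table, proposed="PSO_GA"):
    """Return (absolute drop, relative drop) of the proposed model versus each other model."""
    ours = table[proposed]
    out = {}
    for model, value in table.items():
        if model == proposed:
            continue
        drop = value - ours  # e.g. 4.2291 - 0.6084 = 3.6207
        out[model] = (round(drop, 4), round(drop / value, 4))
    return out


for model, (drop, rel) in reductions(TOTAL_MRE).items():
    print(f"{model}: total MRE lower by {drop} ({100 * drop:.2f} points), {100 * rel:.1f}% relative")
```

Read this way, the "362.07%" against COCOMO corresponds to cutting COCOMO's total MRE by roughly 86%.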
Approach | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO model | 0.05 | 4.2291 | 29.2062 | 30.2084 | 0.3020
(Algabri et al., 2015) model | 1 | 1.0783 | 10.3188 | 7.7024 | 0.0770
PSO model | 1 | 0.8265 | 5.7455 | 5.9042 | 0.0590
GA model | 0.95 | 1.8137 | 15.7269 | 12.9551 | 0.1295
PSO_GA model | 1 | 0.6084 | 4.1821 | 4.3462 | 0.0434
Table 4. 3. Relative errors comparison between models using evaluation criteria
Figure 4. 2. Relative Error Figure for organic model
4.4.2. Experimental result for Semi-detached COCOMO Model on NASA60 dataset
The semi-detached model covers projects whose size is between 50 and 300 KLOC. The proposed model was trained on
semi-detached projects and produced the following optimized coefficient values: A = 4.8129 and B = 1.0208. The
original semi-detached Intermediate COCOMO coefficients are A = 3.0 and B = 1.12.
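The effect of replacing the original semi-detached coefficients with the optimized ones can be checked without knowing a project's effort adjustment factor (EAF). This assumes the standard Intermediate COCOMO form, effort = A × KLOC^B × EAF. Because the EAF cancels in the ratio of the two estimates, the ratio depends only on the coefficients and the project size. The sketch applies this to project 1 of Table 4.4. The helper name `intermediate_cocomo_effort` is hypothetical.

```python
def intermediate_cocomo_effort(kloc, a, b, eaf=1.0):
    """Intermediate COCOMO effort estimate: A * KLOC**B * EAF."""
    return a * kloc ** b * eaf


ORIGINAL_SEMI_DETACHED = (3.0, 1.12)     # original semi-detached A and B
PSO_GA_SEMI_DETACHED = (4.8129, 1.0208)  # optimized A and B reported above

kloc = 78              # project 1 of Table 4.4
cocomo_effort = 548.6  # its COCOMO estimate in Table 4.4

# The EAF is the same for both coefficient sets, so it cancels in this ratio.
ratio = (intermediate_cocomo_effort(kloc, *PSO_GA_SEMI_DETACHED)
         / intermediate_cocomo_effort(kloc, *ORIGINAL_SEMI_DETACHED))
print(f"ratio = {ratio:.4f}, implied PSO_GA effort = {cocomo_effort * ratio:.1f}")
```

The rescaled value comes out at about 571, in line with the 571.3 reported for PSO_GA in Table 4.4. This suggests, but does not prove, that both columns were computed with the same EAF.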
To analyze the effect of the hybrid approach, we also used the genetic algorithm and the PSO algorithm individually
to generate the COCOMO coefficients for the semi-detached model. The genetic algorithm produced A = 4.9992 and
B = 1.0094. The particle swarm optimization algorithm produced A = 4.0003 and B = 1.0524.
Table 4.4 presents the actual effort and the effort estimated by the PSO_GA, GA, PSO, COCOMO, (Nadal & Sangwan,
2018), and (Algabri et al., 2015) models for the semi-detached NASA60 testing projects. The table shows that the
proposed PSO_GA model gives the closest estimate in four of the six testing projects; the COCOMO model is closer for
projects 5 and 6. We can therefore say that the proposed PSO_GA model estimates the effort of software projects
better than the other models listed in Table 4.4.
No | KLOC | Actual effort | PSO_GA estimated effort | GA estimated effort | PSO estimated effort | (Algabri et al., 2015) effort | COCOMO effort
1 | 78 | 571.4 | 571.3 | 564.6 | 544.9 | 541.0 | 548.6
2 | 177.9 | 1248 | 1240 | 1214 | 1214 | 1154 | 1292
3 | 190 | 420 | 416 | 407 | 408 | 387 | 436
4 | 50 | 370 | 314 | 312 | 295 | 300 | 288
5 | 219 | 2120 | 1418 | 1386 | 1398 | 1315 | 1509
6 | 282.1 | 1368 | 1044 | 1017 | 1037 | 962 | 1139
Table 4. 4. Estimated effort for semi-detached model
Figure 4. 3. MRE for semi-detached model
Table 4.5 presents the error rates of the semi-detached COCOMO models using PRED(0.25), MRE, MAE, MAPE, and MMRE.
The proposed model has the lowest MRE, MAPE, and MMRE. Its PRED(0.25) value of 0.8333 is the highest in the table,
tied with the COCOMO model. The COCOMO model has lower relative errors than the GA, PSO, and (Algabri et al., 2015)
models. Compared with the GA, PSO, (Algabri et al., 2015), and COCOMO models, the total MRE of our PSO_GA model is
lower by 0.0947, 0.1510, 0.3378, and 0.0581 respectively, that is, by 9.47, 15.10, 33.78, and 5.81 percentage points.
In terms of mean magnitude of relative error (MMRE), the PSO_GA model is better by 0.0158, 0.0252, 0.0563, and 0.0097
respectively. The mean absolute percentage error of the proposed model is 12.2083, the lowest in the table. The mean
absolute error of our model is 181.9098, while the MAE of the GA, PSO, (Algabri et al., 2015), and COCOMO models is
199.1846, 199.6587, 239.3911, and 167.4203 respectively. The PSO_GA model therefore reduces the MAE by 17.2748,
17.7489, and 57.4813 relative to the GA, PSO, and (Algabri et al., 2015) models. Its MAE is, however, 14.4895 higher
than that of the COCOMO model. Overall, the proposed model achieves the best or tied-best result in every evaluation
metric except MAE. We therefore recommend estimating the effort of semi-detached projects with the newly optimized
values of A and B.
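A signed, per-metric comparison makes exceptions such as the MAE result easy to spot. The sketch below encodes Table 4.5 and lists every (model, metric) pair where PSO_GA is worse. The data layout and the function name `advantages` are our own.

```python
METRICS = ["PRED(0.25)", "MRE", "MAE", "MAPE", "MMRE"]
TABLE_4_5 = {  # semi-detached NASA60 results, columns in METRICS order
    "COCOMO": [0.8333, 0.7906, 167.4203, 13.1771, 0.1317],
    "Algabri et al. (2015)": [0.8, 1.0703, 239.3911, 17.8395, 0.1783],
    "PSO": [0.8, 0.8835, 199.6587, 14.7260, 0.1472],
    "GA": [0.8, 0.8272, 199.1846, 13.7876, 0.1378],
    "PSO_GA": [0.8333, 0.7325, 181.9098, 12.2083, 0.1220],
}


def advantages(table, proposed="PSO_GA"):
    """Signed advantage of the proposed model; positive means it is better."""
    ours = table[proposed]
    result = {}
    for model, row in table.items():
        if model == proposed:
            continue
        for metric, theirs, mine in zip(METRICS, row, ours):
            # PRED(0.25) is better when higher; the error metrics are better when lower
            diff = mine - theirs if metric == "PRED(0.25)" else theirs - mine
            result[(model, metric)] = round(diff, 4)
    return result


adv = advantages(TABLE_4_5)
losses = [key for key, value in adv.items() if value < 0]
print(losses)  # → [('COCOMO', 'MAE')]
```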
Approach | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO model | 0.8333 | 0.7906 | 167.4203 | 13.1771 | 0.1317
(Algabri et al., 2015) model | 0.8 | 1.0703 | 239.3911 | 17.8395 | 0.1783
PSO model | 0.8 | 0.8835 | 199.6587 | 14.7260 | 0.1472
GA model | 0.8 | 0.8272 | 199.1846 | 13.7876 | 0.1378
PSO_GA model | 0.8333 | 0.7325 | 181.9098 | 12.2083 | 0.1220
Table 4. 5. Semi-detached model evaluation using SCE evaluation metrics
4.4.3. Experimental result for Embedded COCOMO Model on NASA60 dataset
The embedded model includes projects larger than 300 KLOC. The proposed model was trained on projects of this size and
produced the following optimized coefficients for the embedded Intermediate COCOMO model: A = 4.2431 and B = 1.0900.
The original embedded Intermediate COCOMO coefficients are A = 2.8 and B = 1.20.
Using the genetic algorithm individually, we obtained A = 4.8433 and B = 1.0801. Using the PSO algorithm, we obtained
A = 3.9983 and B = 1.1295.
Figure 4.4 shows the effort estimated by each model. The COCOMO model overestimates the effort of all three embedded
projects, and the GA and PSO models overestimate the last two. In contrast, the (Algabri et al., 2015) model
underestimates the effort of all projects. The curve of our proposed model closely follows the actual effort curve.
Table 4.6 lists the estimated effort for the embedded Intermediate COCOMO testing projects from the NASA60 dataset.
The dataset contains only three projects larger than 300 KLOC, so these projects were used as both the training and
the testing data; the effort estimated by each model is presented in Table 4.6. The PSO_GA estimates for the last two
projects are close to the actual effort, while the first project is underestimated. Compared with the GA, PSO, and
COCOMO models, our proposed PSO_GA model achieves a better result.
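Comparing the two embedded coefficient sets analytically helps explain the gap between the COCOMO and PSO_GA curves. This uses the same Intermediate COCOMO assumption as before (effort = A × KLOC^B × EAF, with a shared EAF). Under it, the ratio of the optimized to the original estimate is (A'/A) × KLOC^(B' - B), and both give equal effort where that ratio equals 1. The sketch computes this crossover size and rescales the COCOMO column of Table 4.6. It is our reconstruction, not a procedure described in the thesis, and the helper name `effort_ratio` is hypothetical.

```python
ORIGINAL = (2.8, 1.20)     # original embedded A and B
PSO_GA = (4.2431, 1.0900)  # optimized embedded A and B reported above


def effort_ratio(kloc, new=PSO_GA, old=ORIGINAL):
    """PSO_GA effort divided by original COCOMO effort; the EAF cancels."""
    (a1, b1), (a0, b0) = new, old
    return (a1 / a0) * kloc ** (b1 - b0)


# Size at which both coefficient sets give the same effort:
# a1 * K**b1 = a0 * K**b0  =>  K = (a0 / a1) ** (1 / (b1 - b0))
crossover = (ORIGINAL[0] / PSO_GA[0]) ** (1 / (PSO_GA[1] - ORIGINAL[1]))
print(f"crossover at about {crossover:.1f} KLOC")  # → crossover at about 43.8 KLOC

# Rescale the COCOMO column of Table 4.6 to recover the PSO_GA column
for kloc, cocomo in [(302, 2419), (370, 4068), (423, 2975)]:
    print(kloc, round(cocomo * effort_ratio(kloc)))  # → 1956, 3217, 2318
```

Because B' < B, the optimized coefficients give lower estimates than the original ones for every project above roughly 44 KLOC. For the three projects above 300 KLOC, the PSO_GA estimates are therefore about 19 to 22 percent below the COCOMO estimates.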
Figure 4. 4. Effort graph for embedded model
Table 4. 6. Effort for Embedded Model
No | KLOC | Actual effort | PSO_GA effort | GA effort | PSO effort | (Algabri et al., 2015) effort | COCOMO effort
1 | 302 | 2400 | 1956 | 2110 | 2309 | 1361 | 2419
2 | 370 | 3240 | 3217 | 3463 | 3829 | 2197 | 4068
3 | 423 | 2300 | 2318 | 2492 | 2774 | 1564 | 2975
Table 4.7 presents the comparison between the PSO_GA, GA, PSO, (Algabri et al., 2015), and COCOMO models using five
software cost estimation evaluation metrics. The first metric, PRED(0.25), is the fraction of projects whose MRE is
less than or equal to 0.25. For the COCOMO model, only 33.33% of the testing projects have an MRE of at most 0.25. The
GA, PSO, and PSO_GA models reach a PRED(0.25) of 1, which means that every testing project has an MRE of at most 0.25.
This indicates that the proposed PSO_GA model is better than the COCOMO model. The proposed PSO_GA model also achieves
the best result on the other relative error metrics. For example, its MMRE is 0.0666, while the MMRE of the GA, PSO,
(Algabri et al., 2015), and COCOMO models is 0.0911, 0.1418, 0.3581, and 0.1858 respectively. PSO_GA therefore lowers
the MMRE by 2.45, 7.52, 29.15, and 11.92 percentage points.
In terms of MAE, the proposed PSO_GA model achieves 161.6897, the lowest mean absolute error in the table. Compared
with the GA, PSO, (Algabri et al., 2015), and COCOMO models, it reduces the MAE by 73.5369, 222.7792, 777.2584, and
346.0795 respectively. The total MRE of the proposed model is 0.1999, lower than that of every other model in the
table. It is 0.0734, 0.2256, 0.8744, and 0.3575 lower (7.34, 22.56, 87.44, and 35.75 percentage points) than the
total MRE of the GA, PSO, (Algabri et al., 2015), and COCOMO models respectively.
Table 4. 7. Comparison of embedded models using evaluation metrics
Approach | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO model | 0.3333 | 0.5574 | 507.7692 | 18.5814 | 0.1858
(Algabri et al., 2015) | - | 1.0743 | 938.9481 | 35.8132 | 0.3581
PSO model | 1 | 0.4255 | 384.4689 | 14.1844 | 0.1418
GA model | 1 | 0.2733 | 235.2266 | 9.1131 | 0.0911
PSO_GA model | 1 | 0.1999 | 161.6897 | 6.6656 | 0.0666
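Because Table 4.6 has only three projects, the Table 4.7 figures can be recomputed directly from it. The sketch below does this with the rounded efforts printed in Table 4.6. Its results therefore differ from Table 4.7 only in the last decimal places; for example, it gives an MAE of about 161.67 for PSO_GA instead of 161.6897. The function name `summarize` is our own.

```python
ACTUAL = [2400, 3240, 2300]  # actual effort of the three embedded projects (Table 4.6)
ESTIMATES = {                # estimated effort per model (Table 4.6)
    "PSO_GA": [1956, 3217, 2318],
    "GA": [2110, 3463, 2492],
    "PSO": [2309, 3829, 2774],
    "COCOMO": [2419, 4068, 2975],
}


def summarize(actual, estimated, level=0.25):
    """Return PRED(level), total MRE, MMRE and MAE for one model."""
    mres = [abs(a - e) / a for a, e in zip(actual, estimated)]
    n = len(mres)
    pred = sum(m <= level for m in mres) / n
    mae = sum(abs(a - e) for a, e in zip(actual, estimated)) / n
    return pred, sum(mres), sum(mres) / n, mae


for model, est in ESTIMATES.items():
    pred, total, mmre, mae = summarize(ACTUAL, est)
    print(f"{model}: PRED={pred:.4f} MRE={total:.4f} MMRE={mmre:.4f} MAE={mae:.2f}")
```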
4.4.4. PSO_GA model effort comparison with (Maleki, Ghaffar, & Masdari, 2014) on the NASA60 dataset
In (Maleki, Ghaffar, & Masdari, 2014), a genetic algorithm was used to find suitable values of the effort multipliers,
and ant colony optimization was used to optimize the COCOMO coefficients. The results of their experiment are
presented in Table 4.9. Table 4.8 gives the magnitude relative errors (MRE) of the models. Their approach yields
higher magnitude relative errors than our proposed model, which means that our proposed model performs better. The
MRE of PSO_GA is lower in 7 of the 10 testing projects, and the total MRE falls from 199.26 to 171.83 compared with
the effort estimated by (Maleki, Ghaffar, & Masdari, 2014). After finding suitable effort multiplier values, they
obtained a lower MRE than the original COCOMO model, but a higher MRE than our proposed model in most of the testing
projects. The total MRE of the original COCOMO model is 374.77, the highest value in the table. Compared with it, the
proposed model reduces the total MRE to 171.83, a reduction of 202.94 percentage points. We can therefore say that
our proposed model estimates software project effort better than the (Maleki, Ghaffar, & Masdari, 2014) and COCOMO
models.
Table 4. 8. MRE of PSO_GA, COCOMO and (Maleki, Ghaffar, & Masdari, 2014) model
No | Project No | KLOC | MRE of COCOMO by (Maleki, Ghaffar, & Masdari, 2014) (%) | MRE using original COCOMO (%) | MRE using PSO_GA (%) | MRE by (Maleki, Ghaffar, & Masdari, 2014) (%)
1 | 9 | 10.4 | 28.08 | 34.26 | 9.04 | 16.60
2 | 11 | 16.0 | 26.99 | 34.14 | 7.17 | 10.54
3 | 13 | 13.0 | 46.02 | 9.35 | 26.63 | 33.88
4 | 16 | 15.0 | 25.15 | 38.99 | 14.24 | 8.67
5 | 28 | 7.7 | 34.14 | 28.85 | 2.82 | 19.32
6 | 37 | 100 | 37.25 | 57.06 | 43.54 | 12.36
7 | 43 | 20 | 37.64 | 55.37 | 38.50 | 15.30
8 | 47 | 370 | 44.43 | 40.89 | 0.70 | 31.93
9 | 57 | 282.1 | 28.79 | 40.13 | 20.46 | 23.16
10 | 60 | 19.3 | 25.57 | 35.77 | 8.73 | 27.50
Total MRE | | | 334.06 | 374.77 | 171.83 | 199.26
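The totals in Table 4.8 can be rechecked by summing each column. Summing the printed values of the original COCOMO column gives 374.81 rather than 374.77, presumably because the per-project values were rounded; the other three totals match exactly. The dictionary keys are our own labels.

```python
# Per-project MRE (%) from Table 4.8, projects 1 to 10
MRE_PERCENT = {
    "COCOMO by Maleki et al.": [28.08, 26.99, 46.02, 25.15, 34.14, 37.25, 37.64, 44.43, 28.79, 25.57],
    "Original COCOMO": [34.26, 34.14, 9.35, 38.99, 28.85, 57.06, 55.37, 40.89, 40.13, 35.77],
    "PSO_GA": [9.04, 7.17, 26.63, 14.24, 2.82, 43.54, 38.50, 0.70, 20.46, 8.73],
    "Maleki et al.": [16.60, 10.54, 33.88, 8.67, 19.32, 12.36, 15.30, 31.93, 23.16, 27.50],
}

# Column totals, rounded to two decimals like the table
totals = {model: round(sum(values), 2) for model, values in MRE_PERCENT.items()}
print(totals)

# Number of projects where PSO_GA has a strictly lower MRE than Maleki et al.
wins = sum(p < m for p, m in zip(MRE_PERCENT["PSO_GA"], MRE_PERCENT["Maleki et al."]))
print(f"PSO_GA lower in {wins} of 10 projects")  # → PSO_GA lower in 7 of 10 projects
```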
Table 4.9 compares the effort estimated by PSO_GA, by the original COCOMO model, and by (Maleki, Ghaffar, & Masdari,
2014) with the actual effort provided in the dataset. Our estimate is closer to the actual effort than the COCOMO
effort obtained by (Maleki, Ghaffar, & Masdari, 2014) after adjusting the effort multipliers in all 10 projects. It is
also closer than the original COCOMO estimate in 8 of the 10 projects (all except projects 1 and 8). Compared with the
effort estimated by (Maleki, Ghaffar, & Masdari, 2014), the PSO_GA estimate is closer to the actual effort provided in
the NASA60 dataset in 8 of the 10 projects (all except projects 1 and 10).
Table 4. 9. Estimated Effort of models
No | KLOC | Actual effort | PSO_GA effort | Effort by (Maleki, Ghaffar, & Masdari, 2014) | Original COCOMO effort | COCOMO effort by (Maleki, Ghaffar, & Masdari, 2014)
1 | 29.5 | 120 | 142.15 | 131.97 | 98.22 | 92.88
2 | 19.3 | 155 | 141.45 | 134.77 | 99.54 | 99.54
3 | 32.6 | 170 | 174.96 | 129.85 | 120.37 | 120.38
4 | 35.5 | 192 | 192.05 | 142.01 | 131.64 | 131.65
5 | 38 | 210 | 220.87 | 171.78 | 150.95 | 150.96
6 | 48.5 | 239 | 270.11 | 197.06 | 182.68 | 182.68
7 | 47.5 | 252 | 233.60 | 220.86 | 158.13 | 158.13
8 | 70 | 278 | 292.56 | 352.91 | 277.95 | 220.21
9 | 66.6 | 300 | 307.29 | 310.33 | 290.51 | 230.97
10 | 66.6 | 352.8 | 284.53 | 310.33 | 268.99 | 230.97
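The project-by-project claims about Table 4.9 can be checked by counting, for each rival column, the projects where the PSO_GA estimate is strictly closer to the actual effort. The names below are our own labels.

```python
# Table 4.9 rows: actual, PSO_GA, Maleki et al., original COCOMO, COCOMO by Maleki et al.
ROWS = [
    (120, 142.15, 131.97, 98.22, 92.88),
    (155, 141.45, 134.77, 99.54, 99.54),
    (170, 174.96, 129.85, 120.37, 120.38),
    (192, 192.05, 142.01, 131.64, 131.65),
    (210, 220.87, 171.78, 150.95, 150.96),
    (239, 270.11, 197.06, 182.68, 182.68),
    (252, 233.60, 220.86, 158.13, 158.13),
    (278, 292.56, 352.91, 277.95, 220.21),
    (300, 307.29, 310.33, 290.51, 230.97),
    (352.8, 284.53, 310.33, 268.99, 230.97),
]
RIVALS = ["Maleki et al.", "Original COCOMO", "COCOMO by Maleki et al."]


def closer_counts(rows):
    """Count projects where PSO_GA is strictly closer to the actual effort than each rival."""
    counts = dict.fromkeys(RIVALS, 0)
    for actual, ours, *rivals in rows:
        for name, rival in zip(RIVALS, rivals):
            if abs(actual - ours) < abs(actual - rival):
                counts[name] += 1
    return counts


print(closer_counts(ROWS))
# → {'Maleki et al.': 8, 'Original COCOMO': 8, 'COCOMO by Maleki et al.': 10}
```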
4.4.5. Experimental Result for organic COCOMO model on NASA63 dataset
The proposed model was trained and tested on the NASA60 dataset. To check that the optimized COCOMO coefficients are
also valid for other datasets, we used the NASA63 dataset. It contains 63 software projects, grouped into organic,
semi-detached, and embedded modes according to the size of the software. This section presents the results of the
proposed model on the NASA63 organic software projects. To keep the description concise and avoid many tables, only
the results for the five software effort and cost evaluation criteria are presented. Table 4.10 shows the evaluation
on 15 selected organic software projects. The proposed model has the highest PRED(0.25) value, tied with the
(Algabri et al., 2015) model, and the lowest value on every other criterion. This indicates that our proposed PSO_GA
model achieves better results than the other models in the table.
The PRED(0.25) value of the PSO_GA model indicates that 73.33% of the testing projects have an MRE less than 0.25. For
the COCOMO model, only 40% of the testing projects have an MRE less than 0.25, the lowest value in the table. The
proposed model also has the lowest total MRE. It is lower by 1.4265, 0.9014, 0.6046, and 0.7283 than the total MRE of
the COCOMO, GA, PSO, and (Algabri et al., 2015) models respectively. Using MAE as the evaluation criterion, the
PSO_GA model reduces the mean absolute error by 27.2977, 13.8663, 2.4296, and 8.0374 compared with the COCOMO, GA,
PSO, and (Algabri et al., 2015) models respectively. Similarly, it reduces the mean absolute percentage error by 9.51,
6.0095, 4.0308, and 4.8550 percentage points respectively. The proposed model also achieves the lowest MMRE. We can
therefore say that our proposed model, with the newly optimized coefficient values, estimates the effort of organic
software projects better.
Table 4. 10. Evaluation of organic model using evaluation criteria
Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.40 | 4.4434 | 40.9818 | 29.6230 | 0.2962
(Algabri et al., 2015) | 0.7333 | 3.7452 | 21.7215 | 24.9680 | 0.2496
PSO | 0.6666 | 3.6215 | 16.1137 | 24.1438 | 0.2414
GA | 0.6 | 3.9183 | 27.5504 | 26.1225 | 0.2612
PSO_GA | 0.7333 | 3.0169 | 13.6841 | 20.1130 | 0.2011
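Table 4.10 is internally consistent with reading MRE as the total over the 15 testing projects: MMRE is MRE/15 and MAPE is 100 × MRE/15. The check below confirms this for every row. It also prints each model's total MRE minus that of PSO_GA; because these are differences of totals, they are quoted as plain numbers rather than percentages. The function name and the tolerance are our own choices.

```python
N_PROJECTS = 15  # organic NASA63 testing projects
TABLE_4_10 = {   # model: (MRE, MAPE, MMRE)
    "COCOMO": (4.4434, 29.6230, 0.2962),
    "Algabri et al. (2015)": (3.7452, 24.9680, 0.2496),
    "PSO": (3.6215, 24.1438, 0.2414),
    "GA": (3.9183, 26.1225, 0.2612),
    "PSO_GA": (3.0169, 20.1130, 0.2011),
}


def consistent(mre_total, mape, mmre, n=N_PROJECTS, tol=1e-3):
    """True if MMRE ~= MRE / n and MAPE ~= 100 * MRE / n."""
    mean = mre_total / n
    return abs(mean - mmre) < tol and abs(100 * mean - mape) < 100 * tol


for model, row in TABLE_4_10.items():
    drop = round(row[0] - TABLE_4_10["PSO_GA"][0], 4)  # model's total MRE minus PSO_GA's
    print(model, consistent(*row), drop)
```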
Figure 4. 5. Effort graph for organic model on NASA63 dataset.
4.4.6. Experimental Result for semi-detached COCOMO model on NASA63 dataset
For the NASA63 semi-detached model, five testing projects were taken from the dataset to evaluate the performance of
the proposed model. As Table 4.11 shows, all the listed models achieve a PRED(0.25) value of 0.8. This means that 80%
of the testing projects have an MRE less than 0.25 in every model, so PRED(0.25) cannot distinguish between them. On
the other evaluation metrics, our proposed model achieves better results. Its total MRE is 0.6414, which is lower by
0.0945, 0.0584, 0.1657, and 0.2479 than the total MRE of the COCOMO, GA, PSO, and (Algabri et al., 2015) models
respectively. In terms of MAE, our proposed model is better by 14.4766, 9.1140, 26.6337, and 38.0079 respectively.
It also has the lowest MAPE and MMRE.
Table 4. 11. Effort comparison using SCE metrics for semi-detached model
Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.8 | 0.7359 | 189.5173 | 14.7191 | 0.1471
(Algabri et al., 2015) | 0.8 | 0.8893 | 213.0486 | 17.7864 | 0.1778
PSO | 0.8 | 0.8071 | 201.6744 | 16.1420 | 0.1614
GA | 0.8 | 0.6998 | 184.1547 | 13.9978 | 0.1399
PSO_GA | 0.8 | 0.6414 | 175.0407 | 12.8293 | 0.1282
Figure 4. 6. MRE for semi-detached model
4.4.7. Experimental Result for embedded COCOMO model on NASA63 dataset
The NASA63 dataset contains only four embedded software projects, so we used all of them to test the performance of
the proposed PSO_GA model. The embedded model achieves lower PRED(0.25) values than the other COCOMO modes. As
Table 4.12 shows, the PRED(0.25) value of the PSO_GA model is 0.25. This means that only one of the four projects
(25%) has an MRE less than 0.25. The (Algabri et al., 2015) model performs better on three of the evaluation metrics:
MRE, MAPE, and MMRE. Our proposed model performs better on the other two: PRED(0.25), where it ties with the COCOMO,
GA, and PSO models, and MAE. Compared with the COCOMO, PSO, and GA models, our proposed model has a lower MRE, MAE,
MAPE, and MMRE.
Table 4. 12. Effort comparison for embedded model on NASA 63 dataset
Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.25 | 4.2058 | 2142.3367 | 105.1468 | 1.0514
(Algabri et al., 2015) | - | 1.8443 | 2387.5700 | 46.1081 | 0.4610
GA | 0.25 | 3.0368 | 1611.8130 | 75.9222 | 0.7592
PSO | 0.25 | 3.6777 | 1847.0891 | 91.9444 | 0.9194
PSO_GA | 0.25 | 2.6968 | 1570.5311 | 67.4202 | 0.6742
4.4.8. Experimental Result for organic COCOMO model on NASA 93 datasets
In addition to the NASA60 and NASA63 datasets, we also used the NASA93 dataset for comparison with other research
work. We took 20 testing projects from the dataset. Table 4.13 presents the results of the experiment using the five
evaluation metrics. The PRED(0.25) value of our proposed model is 0.9, which means that 90% of the testing projects
have an MRE of at most 0.25. The PRED(0.25) value of the COCOMO model is 0.35, so only 35% of its estimates are that
close. Our proposed model also achieves lower errors (better results) on the other four evaluation criteria. In terms
of total MRE, it is lower by 3.2949, 1.2400, 0.6998, and 0.8156 than the COCOMO, GA, PSO, and (Algabri et al., 2015)
models respectively.
The mean absolute error of the PSO_GA model is 16.0706. Using MAE as the evaluation metric, the proposed PSO_GA model
reduces the error by 37.0406, 19.0680, 2.7761, and 10.8191 respectively. In terms of MAPE, the COCOMO model has the
highest error rate, whereas the proposed PSO_GA model has the lowest. Using MAPE as the evaluation metric, our proposed
PSO_GA model reduces the error by 16.4748, 6.2000, 3.4990, and 4.0783 percentage points respectively. The proposed
model also achieves the lowest MMRE. We can therefore say that the newly optimized COCOMO coefficients estimate the
effort of organic software projects better.
Table 4. 13. Organic model effort comparison on NASA93 dataset
Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.35 | 5.4436 | 53.1112 | 27.2184 | 0.2721
(Algabri et al., 2015) | 0.85 | 2.9643 | 26.8897 | 14.8219 | 0.1482
PSO | 0.85 | 2.8485 | 18.8467 | 14.2426 | 0.1424
GA | 0.65 | 3.3887 | 35.1386 | 16.9436 | 0.1694
PSO_GA | 0.9 | 2.1487 | 16.0706 | 10.7436 | 0.1074
4.4.9. Experimental result for semi-detached COCOMO model on NASA 93 dataset
Table 4.14 shows the results of the experiment on the NASA93 semi-detached software projects. The PRED(0.25) value of
all models is 0.75, so PRED(0.25) cannot be used to compare their accuracy. On the relative error metrics, our
proposed model achieves lower errors (better results). The total MRE of the PSO_GA model is 3.4612, the lowest in the
table. It is lower by 0.1840, 0.0476, 0.1231, and 0.2198 than the total MRE of the COCOMO, GA, PSO, and
(Algabri et al., 2015) models respectively. Using MAPE as the evaluation metric, the PSO_GA model is better by 0.9198,
0.2381, 0.6155, and 1.0990 percentage points, and it also has the lowest MMRE. In terms of MAE, however, the COCOMO
model (179.9865) is lower than the PSO_GA model (183.9684).
Table 4. 14. Semi-detached model effort comparison on NASA93 dataset
Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.75 | 3.6452 | 179.9865 | 18.2260 | 0.1822
(Algabri et al., 2015) | 0.75 | 3.6810 | 199.3725 | 18.4052 | 0.1840
PSO | 0.75 | 3.5843 | 187.8757 | 17.9217 | 0.1792
GA | 0.75 | 3.5088 | 188.5638 | 17.5443 | 0.1754
PSO_GA | 0.75 | 3.4612 | 183.9684 | 17.3062 | 0.1730
4.4.10. Experimental Result for Embedded COCOMO model on NASA 93 dataset
The NASA93 dataset contains only six embedded software projects. We selected all of them to test how accurately the
optimized coefficients estimate the required effort. For this mode, the (Algabri et al., 2015) model achieves a
better result than our model. Compared with the COCOMO, GA, and PSO models, our proposed model achieves a better
result. Table 4.15 presents the embedded model comparison on the NASA93 dataset.
Table 4. 15. Embedded model effort comparison on NASA93 dataset
Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | - | 9.1473 | 1660.1290 | 182.9461 | 1.8294
(Nadal & Sangwan, 2018) | | | | |
(Algabri et al., 2015) | 0.2 | 4.7383 | 1158.9310 | 94.7676 | 0.9476
GA | - | 7.7249 | 1499.2823 | 154.4999 | 1.5449
PSO | - | 8.6067 | 1599.9663 | 172.1343 | 1.7213
PSO_GA | - | 7.0979 | 1426.2747 | 141.9582 | 1.4195
CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1. Conclusion
Good software project management is achieved through good software project planning. Software project planning is one
of the phases of the software development life cycle, and it ensures the project's feasibility. Software effort
estimation is one of the tasks performed in this planning phase. Different estimation techniques and algorithms have
been used to estimate the effort of a software project. In this research, a software effort estimation model based on
the COCOMO model is proposed. We used the particle swarm optimization algorithm and the genetic algorithm sequentially
to optimize the coefficients of the Intermediate COCOMO model. Particle swarm optimization produces an initial
solution, and the genetic algorithm then optimizes the Intermediate COCOMO coefficients. The proposed model was trained
on the NASA60 dataset: 70% of the data was used as input to optimize the coefficients, and 30% was used to test the
estimation accuracy with the optimized values. It was then evaluated on the NASA60, NASA63, and NASA93 datasets. MRE,
MMRE, MAE, PRED(0.25), and MAPE were used as the performance evaluation metrics. The experimental results showed that,
on the organic projects of NASA60, the proposed model lowered the total MRE by 362.07, 120.53, and 21.81 percentage
points compared with the COCOMO, GA, and PSO models respectively.
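The sequential use of PSO and GA described above can be sketched as follows. This is a minimal illustration under assumptions we chose, not the thesis implementation. The following are all illustrative: the search bounds for A and B, the PSO settings (inertia 0.7, acceleration 1.5), seeding the GA with the PSO personal bests, truncation selection, arithmetic crossover, Gaussian mutation, and MMRE as the fitness. The training rows are synthetic values generated from A = 3.0 and B = 1.12, not NASA60 projects.

```python
import random

# Synthetic (KLOC, EAF, actual effort) rows generated from A = 3.0, B = 1.12.
# These are NOT NASA60 projects; they only give the search a known optimum.
DATA = [(k, f, 3.0 * k ** 1.12 * f)
        for k, f in [(5, 0.9), (12, 1.1), (25, 1.0), (40, 1.3),
                     (70, 0.85), (120, 1.2), (200, 1.05), (300, 0.95)]]
BOUNDS = [(1.0, 6.0), (0.8, 1.4)]  # assumed search ranges for A and B


def mmre(params, data=DATA):
    """Fitness to minimize: MMRE of Intermediate COCOMO, effort = A * KLOC**B * EAF."""
    a, b = params
    return sum(abs(act - a * kloc ** b * eaf) / act for kloc, eaf, act in data) / len(data)


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def pso(fitness, n=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Exploration: global-best PSO; returns every particle's personal best."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n)]
    vel = [[0.0] * len(BOUNDS) for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n):
            for d, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d], lo, hi)
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return pbest


def ga(fitness, population, generations=80, mutation_rate=0.2, seed=1):
    """Exploitation: elitist GA seeded with the PSO result."""
    rng = random.Random(seed)
    pop = [p[:] for p in population]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: len(pop) // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < len(pop):
            p1, p2 = rng.sample(parents, 2)
            alpha = rng.random()
            # Arithmetic crossover: a random point between the two parents
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
            for d, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    child[d] = clip(child[d] + rng.gauss(0, 0.02 * (hi - lo)), lo, hi)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


def hybrid_pso_ga(fitness=mmre):
    """PSO explores first; its personal bests become the GA's initial population."""
    return ga(fitness, pso(fitness))


best = hybrid_pso_ga()
print(f"A = {best[0]:.4f}, B = {best[1]:.4f}, MMRE = {mmre(best):.4f}")
```

On the synthetic data, the search should return coefficients with an MMRE close to zero. With real data, one would instead pass a fitness function built from the training projects.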
5.2. Contribution
In this research, we investigated the impact of using two metaheuristic algorithms together on the performance of
software effort estimation. While working to increase the accuracy of software effort estimation, we made the
following contributions.
1. We provided a hybrid effort estimation model that combines the strengths of the genetic and particle swarm
optimization algorithms to estimate the effort of a software project.
2. The research showed that the PSO algorithm is better suited for exploration and the GA algorithm for exploitation.
5.3. Future work
This study focused on improving the software effort estimation accuracy of the COCOMO model by optimizing the values
of A and B, and the results of our work are encouraging. The experimental results with the newly optimized coefficients
are very promising, especially for organic and semi-detached COCOMO software projects. However, the magnitude relative
errors remain large for the embedded mode of the Intermediate COCOMO model. One possible reason is that, for larger
software, the algorithms we used cannot find the global optimum of the COCOMO coefficients. Another is that the
relationship between software size, effort, and the complexity factors (coefficients) may not be well expressed by a
power function. In the future, we hope to achieve better effort estimation accuracy for all modes of the Intermediate
COCOMO model by combining more advanced metaheuristic algorithms. Another direction for future work is to improve the
estimation accuracy of the Detailed COCOMO model and COCOMO II using metaheuristic algorithms.
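One simple way to probe whether a power function is adequate is to fit ln(effort/EAF) = ln(A) + B ln(KLOC) by ordinary least squares and inspect the residuals. This is offered as a suggestion, not as something the thesis did. Large or systematic residuals would point to a different functional form. Note that this fit minimizes squared log error rather than MMRE. The example uses synthetic data generated from the original embedded coefficients (A = 2.8, B = 1.20), so the fit should recover them exactly. The function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(kloc, effort, eaf):
    """Least-squares fit of ln(effort / EAF) = ln(A) + B * ln(KLOC).

    Returns A, B and the residuals in log space; large or patterned residuals
    would suggest that a single power function fits the data poorly.
    """
    xs = [math.log(k) for k in kloc]
    ys = [math.log(e / f) for e, f in zip(effort, eaf)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    ln_a = mean_y - b * mean_x
    residuals = [y - (ln_a + b * x) for x, y in zip(xs, ys)]
    return math.exp(ln_a), b, residuals


# Illustrative check with synthetic data that follows A = 2.8, B = 1.20 exactly
kloc = [302, 370, 423, 600]
eaf = [1.1, 0.9, 1.0, 1.2]
effort = [2.8 * k ** 1.20 * f for k, f in zip(kloc, eaf)]
a, b, res = fit_power_law(kloc, effort, eaf)
print(f"A = {a:.4f}, B = {b:.4f}, max |residual| = {max(map(abs, res)):.2e}")
```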
REFERENCES
Abdel-Basset, M., Abdel-Fatah, L., & Sangaiah, A. K. (2018). Metaheuristic algorithms: A comprehensive review.
Vellore, India: ScienceDirect.
Abts, C., Brown, A. W., et al. (2000). COCOMO II model definition manual.
Ahadi, M., & Jafaria, A. (2016, March). A new hybrid for software cost estimation using particle swarm
optimization and differential evolution algorithm. 4.
Alajlan, M., & Tagoug, N. (2016). Optimization of COCOMO-II Model for Effort and Development Time
Estimation using Genetic Algorithms. Proc. of the International Conference on Communications,
Computer Science and Information Technology.
Algabri, M., Saeed, F., Mathkour, H., et al. (2015). Optimization of Soft Cost Estimation using Genetic
Algorithm for NASA software projects. 2015 5th National Symposium on Information Technology:
Towards New Smart World (pp. 1-4). Riyadh: IEEE. doi:10.1109/NSITNSW.2015.7176416
BaniMustafa, A. (2018, October 17). Predicting Software Effort Estimation Using Machine Learning
Techniques. doi:10.1109/CSIT.2018.8486222
Blum, C. (2005, December). Ant colony optimization: Introduction and recent trends. Physics of Life
Reviews, 2(4), 353-373. doi:https://doi.org/10.1016/j.plrev.2005.10.001
Boehm, B. (1984). Software engineering economics. IEEE Transactions on Software Engineering, SE-10(1).
Borade, J. G., & Khalkar, V. R. (2013). Software project Effort and Cost Estimation Techniques.
International Journal of Advanced Research in Computer Science and Software Engineering, 3(8).
Carr, J. (2014). An introduction to genetic algorithms.
CHAOS report. (2015).
Dizaj, A. A., & Gharehchopogh, F. S. (2018). A new approach to software cost estimation by improving
genetic algorithm with Bat Algorithm. Journal of Computer & Robotics, 11(2), 17-30.
Dongshu Wang, D. T. (2017). Particle swarm optimization algorithm: an overview. Springer.
Dorigo, M. (1992). Optimization, learning and natural algorithms.
Genetic algorithms in nature. (n.d.). In Genetic algorithm (p. 50).
Glinz, P., & Mukhija, A. (2003). Constructive Cost Model.
Goldberg, D. E. (1988). Genetic Algorithms in Search, Optimization and Machine Learning. New York.
Isa Maleki, A. G. (2014). A new approach to software cost estimation by improving genetic algorithm
with Bat Algorithm. International Journal of Innovation and Applied Studies.
Jamil, A. S. (2007). Used SLIM Model to Estimate Software Cost.
Jørgensen, M. (2004, January). Top-down and bottom-up expert estimation of software development
effort. Information and Software Technology, 46(1), 3-16. Retrieved from
https://doi.org/10.1016/S0950-5849(03)00093-4
Kao, Y.-T., & Zahara, E. (2008, March). A hybrid genetic algorithm and particle swarm optimization for
multimodal functions. Applied Soft Computing, 8(2), 849-857.
Karna, H., & Gotovac, S. (2014). Modeling Expert Effort Estimation of Software Projects. 2014 22nd
International Conference on Software, Telecommunications and Computer Networks (SoftCom) (pp.
356-360). IEEE. doi:10.1109/SOFTCOM.2014.7039106
Kennedy, J., & Eberhart, R. (1995). A new optimizer using particle swarm theory. MHS'95. Proceedings of
the Sixth International Symposium on Micro Machine and Human Science. Nagoya, Japan: IEEE.
doi:10.1109/MHS.1995.494215
Keshta, I. M. (2017). Software Cost Estimation Approaches: A Survey. Journal of Software Engineering
and Applications.
Keung, J. (2009). Software Development Cost Estimation using Analogy: A review. 2009 Australian
Software Engineering Conference. IEEE. doi:10.1109/ASWEC.2009.32
Langsari, K., & Sarno, R. (2017). Optimizing COCOMO II Parameters Using Particle Swarm Method. 2017
3rd International Conference on Science in Information Technology (ICSITech) (pp. 29-34).
Bandung: IEEE. doi:10.1109/ICSITech.2017.8257081
L.R. Nerkar, P. Y. (2014). Software Cost Estimation using Algorithmic Model and Non-Algorithmic Model
a Review. International Journal of Computer Applications.
Le, M. H., & Khuat, T. T. (2016). Optimizing Parameters of Software Effort Estimation Models using
Directed Artificial Bee Colony Algorithm. Informatica, 40(4), 427-436.
Leung, H., & Fan, Z. (2013). Software cost estimation.
Maleki, I., Ghaffar, A., & Masdari, M. (2014, January). A New Approach for Software Cost Estimation
with Hybrid Genetic Algorithm and Ant Colony Optimization. International Journal of Innovation
and Applied Studies, 5(1), 72-81.
Mandal, A. (2015). Identifying the Reasons for Software Project Failure and Some of their Proposed
Remedial through BRIDGE Process Models. International Journal of Computer Sciences and
Engineering.
Miandoab, E. E., & Gharehchopogh, F. S. (2016, June). A Novel Hybrid Algorithm for Software Cost
Estimation Based on Cuckoo Optimization and K-Nearest Neighbors Algorithms. Engineering,
Technology & Applied Science Research, 6, 118-122.
Nadal, D., & Sangwan, O. P. (2018, August). Software Cost Estimation by Optimizing COCOMO Model
Using Hybrid BATGSA Algorithm. International Journal of Intelligent Engineering and Systems,
11(2), 250-263. doi:10.22266/ijies2018.0831.25
Ochieng, P., Mwangi, W., & Mwgha, S. M. (2014, May). Software Size Estimation in Incremental Software
Development based on Improved Pairwise Comparison Matrices. International Journal of
Computer Applications, 93, 29-39. doi:10.5120/16213-5519
Omara, A. F., & Arafa, M. M. (2010, January). Genetic Algorithms for task scheduling problem. Journal of
Parallel and Distributed Computing, 70(1), 13-22.
PMI. (2017). A guide to the project management body of knowledge (6th ed.). Newtown Square,
Pennsylvania, USA: Project Management Institute, Inc.
Promise software engineering Repository. (2005, April 4). Retrieved from promise.site.uottawa:
http://promise.site.uottawa.ca/SERepository/datasets-page.html
Przemyslaw Pospieszny, B. C.-C. (2018). An effective approach for software project effort and duration
estimation with machine learning algorithm. ELSEVIER.
Rijwani, P., & Jain, S. (2016). Enhanced Software Effort Estimation using Multi Layered Feed Forward
Artificial Neural Network Technique. Procedia Computer Science, 89, 307-312.
Rohit Kumar Sachan, et al. (2016). Optimizing Basic COCOMO Model using Simplified Genetic Algorithm.
ELSEVIER.
Sachan, R. K., Nigam, A., Singh, A., et al. (2016). Optimizing Basic COCOMO Model using Simplified
Genetic Algorithm. Procedia Computer Science, 89, 492-498.
doi:https://doi.org/10.1016/j.procs.2016.06.107
Saini, N. (2017, December 8). Review of Selection Methods in Genetic Algorithms. 6.
Salijoughinejad, R., & Khatibi, V. (2018). A New Optimized Hybrid Model Based on COCOMO to Increase
the Accuracy of Software Cost Estimation. Journal of Advances in Computer Engineering and
Technology, 4(1), 27-40.
Sengupta, S., Basak, S., & Peters, R. A. (2018). Particle Swarm Optimization: A survey of historical and
recent developments with hybridization perspectives. Machine Learning and Knowledge
Extraction.
Sharma, S. (2017, January). Application of genetic algorithm in software engineering, distributed
computing and machine learning. International Journal of Computer Application and
Information Technology, 9(2).
Shekhar, S., & Kumar, U. (2016). Review of Various Software Cost Estimation Techniques. International
Journal of Computer Applications, 141. doi:10.5120/ijca2016909867
Page 83
69
Singal, P., Kumari, A., & Sharma, P. (2020). Estimation of software development effort: A Differential
Evolution Approach. Procedia Computer Science, 167, 2643-2652. Retrieved from
https://doi.org/10.1016/j.procs.2020.03.343
Singh, T., Singh, R., & Mishra, K. K. (2018). Software Cost Estimation using Environmental Adaption
Method. Procedia Computer Science, 143, 325-332.
https://doi.org/10.1016/j.procs.2018.10.403
Sörensen, K., & Glover, F. W. (2017, January 23). Metaheuristics.
https://doi.org/10.1007/978-1-4419-1153-7_1167
The Standish Group. (2013). The CHAOS Manifesto 2013: Think Big, Act Small.
Sangeetha, Y., et al. (2012). Software Cost Models. International Journal of Engineering
Research & Technology (IJERT).
Yang, X.-S. (2014). Nature-Inspired Optimization Algorithms. Springer.
APPENDIX
Appendix 1, Dataset sample with its attributes
    RELY  DATA  CPLX  TIME  ...  TOOL  SCED  KLOC  act_effort
0   1.15  0.94  1.15  1.00  ...  1.00  1.08  29.5       120.0
1   1.15  0.94  1.15  1.00  ...  1.00  1.08  19.7        60.0
2   1.15  0.94  1.15  1.00  ...  1.00  1.08   5.5        18.0
3   1.15  0.94  1.15  1.00  ...  1.00  1.08  10.4        50.0
4   1.00  1.16  1.15  1.30  ...  0.83  1.08  16.3        82.0
5   1.00  0.94  1.15  1.00  ...  1.00  1.00  31.5        60.0
6   1.00  1.08  1.15  1.30  ...  1.10  1.04  11.4        98.8
7   1.00  1.00  1.15  1.00  ...  1.00  1.04  47.5       252.0
8   1.00  1.00  1.15  1.00  ...  1.00  1.00   8.0        42.0
9   1.15  1.00  1.00  1.11  ...  1.00  1.00  15.0        90.0
10  1.00  0.94  1.15  1.00  ...  1.00  1.00  11.3        36.0
11  1.00  1.00  1.15  1.00  ...  1.00  1.00   8.0        42.0
12  1.00  1.16  1.15  1.30  ...  0.83  1.08  48.5       239.0
13  1.00  1.16  1.15  1.30  ...  0.83  1.08  32.6       170.0
14  1.00  0.94  1.15  1.00  ...  1.00  1.00  20.0        72.0
15  1.00  1.16  1.15  1.30  ...  0.83  1.08  15.4        70.0
..   ...   ...   ...   ...  ...   ...   ...   ...         ...
60  1.00  1.08  1.15  1.30  ...  1.10  1.04  11.4        98.8
Appendix 2, Sample Python code to calculate the fitness of the coefficients
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 10 20:53:56 2020
@author: dady
"""
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# NASA60 data: 15 effort multipliers (columns 0-14), KLOC (column 15) and act_effort
data = pd.read_csv("nasa60_detachedtesting.txt")
y = data.act_effort
X = data.drop('act_effort', axis=1)

# Keep 70% of the projects for training and 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
train_X = X_train.values.tolist()
train_Y = y_train.values.tolist()


def fitness_function(x):
    """Return the absolute estimation error of every training project.

    x = [a, b]; estimated effort = a * KLOC**b * product of the 15 effort multipliers.
    """
    fitness_value = []  # local list, so each call starts from an empty result
    for i in range(len(train_X)):
        estimate = x[0] * pow(train_X[i][15], x[1]) * np.prod(train_X[i][0:15])
        fitness_value.append(abs(estimate - train_Y[i]))
    return fitness_value
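The estimate inside fitness_function is the COCOMO-style model Effort = a × KLOC^b × (product of the effort multipliers). The sketch below restates it with the Python standard library (math.prod plays the role of np.prod) and shows one plausible way to collapse the per-project errors into a single score. The mean-absolute-error aggregate, the function names estimate_effort and mae_fitness, and the toy projects are hypothetical choices for illustration, not the thesis data or code.

```python
from math import prod
from statistics import mean


def estimate_effort(a, b, kloc, multipliers):
    """COCOMO-style estimate: a * KLOC**b * product of the effort multipliers."""
    return a * kloc ** b * prod(multipliers)


def mae_fitness(coeffs, projects, actual_efforts):
    """Mean absolute error over all projects; a lower value means fitter coefficients.

    Assumption: averaging the per-project errors is one plausible aggregate;
    the appendix code returns the list of errors without showing how it is combined.
    """
    a, b = coeffs
    errors = [
        abs(estimate_effort(a, b, kloc, multipliers) - actual)
        for (kloc, multipliers), actual in zip(projects, actual_efforts)
    ]
    return mean(errors)


# Hypothetical toy projects: (KLOC, effort multipliers) and actual effort in person-months.
projects = [(10.0, [1.0, 1.0]), (20.0, [1.5, 1.0])]
actual_efforts = [40.0, 100.0]

print(mae_fitness([3.0, 1.0], projects, actual_efforts))  # 10.0
```

With a = 3 and b = 1 the two estimates are 30 and 90 person-months, each 10 away from the actual value, so the mean error is 10.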
Appendix 3, Sample initial values generated for the coefficients
Each line lists a fitness value followed by the coefficient pair [a b].
909.0497008064683 [3.95623091 1.19678076]
205.8898111906779 [4.67728402 1.03898287]
468.2230304937265 [4.48409103 0.97755579]
205.0267181421627 [4.57667944 1.04318819]
209.02479288994448 [4.64250996 1.04360598]
216.54734579319788 [4.70716406 1.04353031]
205.39103079706877 [4.72593763 1.03501228]
204.49595547550877 [4.71990694 1.0338528 ]
529.8784144902694 [4.73725871 1.09568965]
205.43159989551242 [4.84710612 1.02616166]
204.4848749183215 [4.73065257 1.0332444 ]
204.43921774767088 [4.72206875 1.03382967]
204.4391720422616 [4.71840025 1.03405207]
256.7535491863642 [4.25795018 1.04101989]
204.59187282897167 [4.70044794 1.03495936]
204.4719081163542 [4.7215581 1.0339129]
204.50916791380558 [4.71507459 1.03439764]
205.7162838635714 [4.85622437 1.025371 ]
488.5155116029744 [2.90038549 1.10972479]
Parents
[[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]]
Crossover
[[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]
...
[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]
[4.72322769 1.0337143 ]]
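The Parents and Crossover listings come from the selection and crossover steps of the genetic algorithm. The sketch below shows one common way to produce them for two-gene chromosomes [a, b]: truncation selection (keep the lowest-error individuals) followed by single-point crossover. Both operators, the function names, and the fixed random seed are assumptions chosen for illustration; the population is a handful of (fitness, [a, b]) pairs copied from the sample output above.

```python
import random

# A few (fitness, [a, b]) pairs copied from the sample output above.
population = [
    (909.0497008064683, [3.95623091, 1.19678076]),
    (205.8898111906779, [4.67728402, 1.03898287]),
    (204.4391720422616, [4.71840025, 1.03405207]),
    (204.43921774767088, [4.72206875, 1.03382967]),
    (488.5155116029744, [2.90038549, 1.10972479]),
]


def select_parents(population, num_parents):
    """Truncation selection (assumed): keep the individuals with the lowest error."""
    ranked = sorted(population, key=lambda item: item[0])
    return [coeffs for _, coeffs in ranked[:num_parents]]


def single_point_crossover(parent1, parent2):
    """For a two-gene chromosome [a, b], the only cut point swaps the b genes."""
    return [parent1[0], parent2[1]], [parent2[0], parent1[1]]


def make_offspring(parents, num_offspring, rng):
    """Pair two randomly chosen parents at a time and collect their children."""
    offspring = []
    while len(offspring) < num_offspring:
        p1, p2 = rng.sample(parents, 2)
        offspring.extend(single_point_crossover(p1, p2))
    return offspring[:num_offspring]


parents = select_parents(population, num_parents=2)
print(parents)
children = make_offspring(parents, num_offspring=4, rng=random.Random(0))
print(children)
```

Because a two-gene chromosome has only one cut point, crossover simply exchanges the b values. When every selected parent is the same vector, as in the Parents listing above, crossover of this kind can only reproduce that vector, which is consistent with the identical rows under Crossover.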