
DSpace Institution

DSpace Repository http://dspace.org

Software Engineering thesis

2020

SOFTWARE EFFORT ESTIMATION

MODEL USING GENETIC AND

PARTICLE SWARM OPTIMIZATION ALGORITHM

DEMLEW, MEHARY

http://hdl.handle.net/123456789/11293

Downloaded from DSpace Repository, DSpace Institution's institutional repository

BAHIR DAR UNIVERSITY

BAHIR DAR INSTITUTE OF TECHNOLOGY

SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES

FACULTY OF COMPUTING

SOFTWARE EFFORT ESTIMATION MODEL USING GENETIC AND

PARTICLE SWARM OPTIMIZATION ALGORITHM

BY

MEHARY DEMLEW AREGU

BAHIR DAR, ETHIOPIA

August, 2020


SOFTWARE EFFORT ESTIMATION MODEL USING GENETIC AND PARTICLE

SWARM OPTIMIZATION ALGORITHM

BY

MEHARY DEMLEW AREGU

A thesis submitted to the School of Research and Graduate Studies of Bahir Dar Institute of Technology, BDU, in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering in the Faculty of Computing.

Advisor: MEKUANINT AGEGNEHU (PhD)

Bahir Dar, Ethiopia

August, 2020


© 2020

Mehary Demlew Aregu

ALL RIGHTS RESERVED


Acknowledgement

Coming this far was neither easy nor a matter of chance; GOD has always been on my side. Only GOD, who makes everything possible, deserves all praise.

My special thanks and appreciation go to Dr. Mekuanint Agegnehu for his continuous support, scientific supervision, and constructive guidance from the moment of problem formulation to the completion of this research. He was there for me whenever I needed his support, reviewing the paper and giving feedback at the right time. His way of advising me and his punctuality were inspiring; they helped me a great deal and will continue to do so in the future. His enthusiasm and encouragement helped me work day and night so that the research was completed on time. Dr. Mekuanint Agegnehu was not only my research advisor; he was also my mentor throughout the year.

Special thanks and appreciation also go to Mr. Daniel Tsegaye, who was my project advisor during my undergraduate project, for his continuous moral support and his advice during this research study. My grateful thanks go to my brother, Biruk Demlew, who is always my inspiration. How good it is to have a brother who can be and do anything for his family. He has helped me morally, and financially when I needed online course materials. I thank my friends and classmates Zelalem Fiseha, Belay, and Atelaw Mulatu, who helped me design the model of this research, and for all the conversations we had together throughout the year. Finally, I would like to thank my family for their love, great respect, and constant moral support.


Abstract

Software effort estimation is the process of predicting the amount of human effort required to develop a particular software project. During software development, the initial requirements usually change, which forces the project manager to update the software effort, cost, and schedule. To manage such changes in effort, cost, and schedule, the initial software effort and cost estimates need to be accurate. Various researchers have used machine learning algorithms and algorithmic techniques to improve the accuracy of software effort estimation. The Constructive Cost Model (COCOMO) is an algorithmic model that is widely used to estimate software effort, cost, and time. However, both the COCOMO model and machine learning-based approaches have limitations in estimating software effort accurately because of the non-deterministic nature of the problem. Meta-heuristic algorithms are well suited to finding near-optimal solutions at a reasonable computational cost, which makes them a good technique for parameter optimization. In this thesis, a model based on a hybrid genetic and particle swarm optimization algorithm is proposed to optimize the coefficient parameters of the intermediate COCOMO model. The model uses the strengths of the two algorithms: PSO is used to generate an initial locally optimal solution, and GA is used to further optimize the values of the COCOMO coefficients. The proposed model was trained and tested using NASA software project datasets. To evaluate its performance, we used five well-known and widely used software effort and cost estimation accuracy measures: Percentage of Prediction (PRED(0.25)), Magnitude of Relative Error (MRE), Mean Magnitude of Relative Error (MMRE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The results showed that the Magnitude of Relative Error (MRE) of the proposed model, in comparison with the COCOMO, GA, and PSO models, is reduced to 362.07%, 120.53%, and 21.81%, respectively.


Table of contents

Acknowledgement
Abstract
Table of contents
List of Abbreviations
List of figures
List of tables

CHAPTER ONE
INTRODUCTION
1.1. Background
1.2. Motivation
1.3. Statement of the problem
1.3.1. Research Question
1.4. Objective of the study
1.4.1. General objective
1.4.2. Specific objective
1.5. Scope and limitation
1.6. Methodology of the study
1.6.1. Data collection
1.6.2. Simulation Environments
1.6.3. Experimental Evaluation
1.6.3.1. Evaluation Criteria in software cost estimation
1.7. Significance of the study
1.8. Organization of the study

CHAPTER TWO
LITERATURE REVIEW
2.1. Introduction
2.2.1. Project Size estimation
2.2.1.1. Lawrence H. Putnam LOC Estimation
2.2.1.2. Function point Analysis
2.3. Software cost and effort estimation techniques
2.3.1. Constructive Cost Model (COCOMO)
2.3.2. COCOMO II Model
2.3.3. SLIM Model
2.3.4. Experience based Estimation
2.3.5. Estimation by analogy
2.3.6. Top down and bottom up approach
2.4. Metaheuristic Algorithms
2.4.1. Ant Colony Optimization (ACO)
2.4.2. Particle Swarm Optimization (PSO)
2.4.3. Genetic Algorithm
2.4.4. Hybridization of Meta-heuristic Algorithm
2.5. Related works
2.5.1. Summary of related works

CHAPTER THREE
DESIGN OF METHODOLOGY
3.1. Introduction
3.2. Design of the proposed model
3.3. Generating Initial solution using Particle Swarm Optimization
3.3.1. Generate initial Random values and assign Pbest value of A and B
3.3.2. Calculate the fitness function
3.3.3. Calculate the Velocity and update the position of Particles A and B
3.4. Optimizing the Coefficient value of A and B using the Genetic Algorithm
3.4.1. Calculate fitness value
3.4.2. Selection
3.4.3. Crossover
3.4.4. Mutations

CHAPTER FOUR
EXPERIMENTAL RESULT AND DISCUSSION
4.1. Introduction
4.2. Dataset Description
4.3. Simulation environment
4.4. Experiment results
4.4.1. Experimental Result for Organic Model on NASA60 dataset
4.4.2. Experimental result for Semi-detached COCOMO Model on NASA60 dataset
4.4.3. Experimental result for Embedded COCOMO Model on NASA60 dataset
4.4.4. PSO_GA model effort comparison with research done by (Maleki, Ghaffar, & Masdari, 2014) for NASA 60 datasets
4.4.5. Experimental Result for organic COCOMO model on NASA63 dataset
4.4.6. Experimental Result for semi-detached COCOMO model on NASA63 dataset
4.4.7. Experimental Result for embedded COCOMO model on NASA63 dataset
4.4.8. Experimental Result for organic COCOMO model on NASA 93 datasets
4.4.9. Experimental result for semi-detached COCOMO model on NASA 93 dataset
4.4.10. Experimental Result for Embedded COCOMO model on NASA 93 dataset

CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1. Conclusion
5.2. Contribution
5.3. Future work

REFERENCES
APPENDIX
Appendix 1, Dataset sample with its attributes
Appendix 2, Sample python code to calculate the fitness of each coefficient
Appendix 3, Sample initial value generated for coefficients


List of Abbreviations

A Multiplicative constant
ACO Ant Colony Optimization
B Exponential constant
COCOMO Constructive Cost Model
EMs Effort multipliers
GA Genetic Algorithm
Gbest Global best value
IEAM-RP Improved Environmental Adaptive Method
IWO Invasive Weed Optimization algorithm
LOC Lines of code
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MD Mathian distance
MMRE Mean Magnitude of Relative Error
MRE Magnitude of Relative Error
MSE Mean Square Error
Pbest Personal best value
PRED (n) Percentage of prediction
PSO Particle Swarm Optimization
PSO_GA Particle Swarm Optimization and Genetic Algorithm
RMSE Root Mean Square Error
SCE Software Cost Estimation
SEE Software Effort Estimation
SEER_SEM Software Evaluation and Estimation of Resources - Software Estimation Model
SLIM Software Life Cycle Management
SLOC/KLOC Source lines of code / thousands of lines of code
VAM/EMF Value/effort adjustment multiplier


List of figures

Figure 2.1. The basic working of ant colony optimization algorithm (Blum, 2005)
Figure 2.2. Graphical representation of PSO
Figure 2.3. Flowchart diagram for PSO
Figure 2.4. Basic operation steps in genetic algorithm
Figure 2.5. Single point crossover, two point crossover and uniform crossover
Figure 3.1. The proposed PSO_GA model system architecture
Figure 3.2. Flow chart diagram for PSO
Figure 3.3. Section of genetic algorithm operation for the proposed methodology
Figure 4.1. Actual, COCOMO, and PSO_GA effort for organic model
Figure 4.2. Relative Error Figure for organic model
Figure 4.3. MRE for semi-detached model
Figure 4.4. Effort graph for embedded model
Figure 4.5. Effort graph for organic model on NASA63 dataset
Figure 4.6. MRE for semi-detached model

List of tables

Table 1.1. Software effort and cost evaluation metrics (Miandoab & Gharehchopogh, 2016)
Table 2.1. Complexity weight (Ochieng, Mwangi, & Mwgha, 2014)
Table 2.2. Basic COCOMO model types and its project size (Boehm, 1984)
Table 2.3. Basic COCOMO projects coefficient value (Boehm, 1984)
Table 2.4. Coefficient values in Intermediate Model (Boehm, 1984)
Table 2.5. Cost factors and their weight in intermediate COCOMO (Salijoughinejad & Khatibi, 2018)
Table 2.6. Effort multiplier rating scale and its value for detailed COCOMO model (Glinz & Mukhija, 2003)
Table 2.7. COCOMO II effort multipliers (Singal, Kumari, & Sharma, 2020)
Table 2.8. Parameters of PSO
Table 2.9. Summary of related works
Table 3.1. Parameters of PSO and their values
Table 3.2. Parameters of the Genetic Algorithm and their values
Table 4.1. Dataset attribute class, name and its code
Table 4.2. Estimated effort for organic model
Table 4.3. Relative errors comparison between models using evaluation criteria
Table 4.4. Estimated effort for semi-detached model
Table 4.5. Semi-detached model evaluation using SCE evaluation metrics
Table 4.6. Effort for Embedded Model
Table 4.7. Comparison of embedded models using evaluation metrics
Table 4.8. MRE of PSO_GA, COCOMO and (Maleki, Ghaffar, & Masdari, 2014) model
Table 4.9. Estimated Effort of models
Table 4.10. Evaluation of organic model using evaluation criteria
Table 4.11. Effort comparison using SCE metrics for semi-detached model
Table 4.12. Effort comparison for embedded model on NASA 63 dataset
Table 4.13. Organic model effort comparison on NASA93 dataset
Table 4.14. Semi-detached model effort comparison on NASA93 dataset
Table 4.15. Semi-detached model effort comparison on NASA93 dataset


CHAPTER ONE

INTRODUCTION

1.1. Background

Software cost estimation is a set of procedures and techniques used to estimate the effort, dollar cost, and schedule of a particular software project. Effort estimation measures the amount of manpower required to develop a software product, schedule estimation determines how much time a particular software project will take to complete, and dollar cost estimation determines the overall cost of the software project. Effort, dollar cost, and schedule are measured in person-months, dollars, and calendar time, respectively (PMI, 2017).

As software projects grow in size and complexity, determining the effort, dollar cost, and time needed to complete them becomes a major challenge, which leads to fundamental problems in cost, time-to-market, functionality, and quality requirements. This problem can be mitigated through good software project management, in which software effort and cost estimation plays a significant role. In software project development, the project manager applies knowledge, tools, skills, and techniques to ensure that the software is delivered on time and with the required quality. In this process, the preliminary software cost and effort estimates that the project manager uses in different phases of the project life cycle need to be accurate, because inaccurate software effort and cost estimation leads to software project failure.

Accurate software effort estimation is very important in software project management because it helps determine the operational and economic feasibility of the project at the beginning of the software project life cycle, and it helps the project manager decide which resources should be used and how. Good software effort estimation provides assurance and reduces the level of risk. During software development, the initial requirements usually change, and the project manager needs to update the software effort and schedule. Accurate preliminary effort estimation therefore helps the project manager make re-planning decisions when changes happen in the project. Managing and controlling a software development process is possible only as long as the early effort estimate is accurate. When a project is underestimated, it cannot be completed within the given schedule and budget; when it is overestimated, too many resources are committed to it. Therefore, accurate software effort estimation is required in the early stage of software development. In practice, however, most software projects are not delivered on time, on budget, and with the requested quality. On average, only 16.2% of software projects are completed on time and on budget, and in large companies only 9% of projects are (Chaos Report, 2015). Statistics from the same organization show that the overall project success rate is about 30.3%, about 46% of projects are challenged, and the overall project failure rate is 23.4% (Mandal, 2015).

Software effort and cost estimation is one of the most challenging areas of project management (Przemyslaw Pospieszny, 2018) (Y.Sangeetha M.Tech (Ph.d), 2012). Researchers and academic professionals have long struggled to develop models for software effort and cost estimation, and as a result many estimation techniques have been suggested. Broadly, these techniques are classified into algorithmic and non-algorithmic techniques. An algorithmic model uses the major cost factors in a mathematical formula to estimate the effort; the Constructive Cost Model (COCOMO), the SLIM (Software Life Cycle Management) model, function point-based models, use case point analysis, and Putnam's model (Shekhar & Kumar, 2016) are some of the algorithmic techniques. In non-algorithmic techniques, the estimate is derived from experience with similar previous projects; analogy-based estimation, expert judgment, Parkinson's Law, and pricing to win (L.R. Nerkar, 2014) belong to this group.

The COCOMO model is a well-documented and widely used algorithmic model for estimating the effort, time, and cost of a software project (Sachan, Nigam, Singh, & et al, 2016). It was developed by Barry W. Boehm (Boehm, 1984) based on a historical dataset of 63 projects. In this model, software effort is computed using software size as the major parameter and cost factors as an effort adjustment factor. Software size is expressed in thousands of lines of code (KLOC). The COCOMO model has three types: Basic, Intermediate, and Detailed. Each type distinguishes three project modes: organic, semi-detached, and embedded. The model estimates the effort of a software project with the equation given in Equation (1) (Maleki, Ghaffar, & Masdari, 2014).

$$\text{Effort} = A \times (\text{KLOC})^{B} \times \text{EMF} \qquad (1)$$


where A and B are the multiplicative and exponential constants, respectively, KLOC is the size of the software in thousands of lines of code, and EMF is the product of all cost factors (effort multipliers). In the basic COCOMO model, EMF = 1.
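To make Equation (1) concrete, the following Python sketch computes intermediate COCOMO effort for a single project. It assumes Boehm's published intermediate COCOMO coefficients (A = 3.2, 3.0, 2.8 and B = 1.05, 1.12, 1.20 for the organic, semi-detached, and embedded modes); the function name `cocomo_effort` and the two effort-multiplier ratings in the example are hypothetical and only illustrate how EMF is formed as a product of cost-driver values.

```python
from math import prod

# Intermediate COCOMO coefficients (A, B) per project mode, as published by Boehm.
# These are the untuned starting values that a PSO/GA search would later adjust.
INTERMEDIATE_COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def cocomo_effort(kloc, effort_multipliers, a, b):
    """Effort in person-months: A * KLOC**B * EMF, with EMF the product of the multipliers."""
    emf = prod(effort_multipliers)  # an empty list gives EMF = 1 (all cost drivers nominal)
    return a * kloc ** b * emf


a, b = INTERMEDIATE_COEFFICIENTS["organic"]
nominal = cocomo_effort(10.0, [], a, b)             # every cost driver rated nominal
adjusted = cocomo_effort(10.0, [1.15, 0.88], a, b)  # hypothetical ratings for two cost drivers
print(f"nominal: {nominal:.2f} PM, adjusted: {adjusted:.2f} PM")
```

With all cost drivers nominal, a 10-KLOC organic project comes out at roughly 35.9 person-months; the two hypothetical ratings raise it slightly because their product is 1.012. Tuning A and B, as this thesis does, changes only the two constants passed to the function.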

Meta-heuristic algorithms have been found successful in efficiently estimating project effort because of their population-based search techniques (Singh, Singh, & Mishra, 2018). In this research, a hybrid of particle swarm optimization (PSO) and the genetic algorithm (GA) is used to optimize the parameter (coefficient) values of the Intermediate COCOMO model so that more realistic effort can be estimated. The PSO algorithm generates the initial parameter values of the COCOMO model, and the GA further optimizes the parameter values obtained from PSO. The proposed hybrid model (PSO_GA) is trained on the NASA60 project dataset and tested on the NASA60, NASA63, and NASA93 software project datasets. The PRED(0.25), MRE, MMRE, MAE, and MAPE software effort and cost estimation evaluation criteria are used to evaluate the performance of the proposed model.
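The five evaluation criteria named above can be computed in a few lines. This Python sketch uses the common textbook definitions (MRE = |actual - estimated| / actual, PRED(0.25) = share of projects with MRE of at most 0.25, MAPE = mean MRE expressed as a percentage); that these match the thesis's Table 1.1 exactly is an assumption, and the function name `evaluation_metrics` and the two-project example data are hypothetical.

```python
from statistics import mean


def evaluation_metrics(actual, estimated, level=0.25):
    """Return per-project MRE plus MMRE, PRED(level), MAE and MAPE."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("actual and estimated must be non-empty and of equal length")
    abs_errors = [abs(a - e) for a, e in zip(actual, estimated)]
    mre = [err / a for err, a in zip(abs_errors, actual)]  # magnitude of relative error
    return {
        "MRE": mre,
        "MMRE": mean(mre),
        # share of projects whose MRE is within the given level (0.25 -> PRED(0.25))
        "PRED": sum(1 for m in mre if m <= level) / len(mre),
        "MAE": mean(abs_errors),
        "MAPE": 100 * mean(mre),  # under this definition, MMRE expressed as a percentage
    }


# Hypothetical actual and estimated efforts (person-months), for illustration only.
metrics = evaluation_metrics([100.0, 200.0], [110.0, 150.0])
print(metrics)
```

Here the second project's MRE is exactly 0.25, so it still counts toward PRED(0.25); whether that boundary is inclusive is a convention worth checking against the thesis's own definition.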

1.2. Motivation

The motivation for this thesis goes back to the software project management course. In software project management, less than 20% of software projects are delivered on time, on budget, and with the requested quality, and one of the major factors behind this ratio is inaccurate software effort and cost estimation early in the project life cycle. After analyzing this, I decided to work on improving the estimation accuracy of COCOMO. COCOMO was chosen for two reasons. First, the model contains most of the product, project, personnel, and platform attributes that can directly or indirectly affect the effort of a software project. Second, it is the most documented and widely used software effort and cost estimation technique in software companies.

Over the last three decades, much research has been done and many software effort and cost estimation models have been proposed, which shows that the area is significant enough to attract continuous research attention, and it remains a hot research issue (Singh, Singh, & Mishra, 2018). Even though many research papers have been published on software effort and cost estimation, none of them has achieved results satisfactory enough to help software products be delivered on time, on budget, and with the requested quality. Most research work in software effort and cost estimation concentrates on a rough and quick calculation of effort that uses only the size of the software as the cost function, so the software effort, time, and overall cost are not estimated well at the beginning of the software life cycle. In reality, however, many non-functional attributes affect the cost of a particular software product. After analyzing this problem, we set out to make a small contribution by showing the effect of using two meta-heuristic algorithms together on the accuracy of software effort estimation.

1.3. Statement of the problem

In software development, there are many interrelated factors whose relationships are not well understood and that affect software product quality directly or indirectly. This makes it difficult to estimate software cost and effort using algorithmic methods. Algorithmic methods are also incapable of combining incomplete information, and this defect makes the extraction of important information error-prone (Ahadi & Jafaria, 2016). When the estimation problem is affected by a large number of cost factors and variables, algorithmic methods are unable to produce realistic answers (Maleki, Ghaffar, & Masdari, 2014).

Current methods of software project effort estimation suffer from a lack of accuracy and focus on a few factors related to the software development process (mostly the software size), while neglecting other functional and non-functional attributes that can directly affect the cost of a particular software project (BaniMustafa, 2018). Most software cost estimation research, including (Sachan, Nigam, Singh, & et al, 2016) and (Singh, Singh, & Mishra, 2018), has focused only on optimizing the Basic COCOMO model, whose accuracy is restricted because lines of code are the only cost factor used to estimate the software effort. Our research focuses on the Intermediate COCOMO model, which combines many cost-driving factors with KLOC to estimate the effort of a software project.

In the particle swarm optimization algorithm, every particle takes part in searching for an optimum of the cost function, so there is a high probability that good local optimal solutions are not lost. PSO has memory, so knowledge of good solutions is retained by all the particles, whereas in GA previous knowledge of the problem is discarded once the population changes (Kao & Zahara, 2008). As a result, a genetic algorithm can lose optimal solutions and converge to a local minimum before finding the global minimum. To address this problem, we use the PSO algorithm to generate an initial solution and GA to optimize the values of the coefficients.

1.3.1. Research Question

In this study, we investigated the following research questions.

RQ1. What is the impact of combining particle swarm optimization and a genetic algorithm to tune the coefficient parameter values of the Constructive Cost Model (COCOMO)?

RQ2. How can the multiplicative constant (A) and the exponential constant (B) of COCOMO be tuned by PSO and GA?

RQ3. How can optimal weights be assigned to the COCOMO coefficients?

1.4. Objective of the study

1.4.1. General objective

The major objective of this research is to develop an efficient software development effort estimation model using a genetic algorithm and particle swarm optimization.

1.4.2. Specific objectives

To analyze the impact of the hybrid PSO_GA model in optimizing the coefficient parameter values of the COCOMO model

To generate optimal weight values for the COCOMO coefficient parameters

To compare the estimated effort with that of state-of-the-art techniques

1.5. Scope and limitation

The main focus of our research is to provide an efficient software cost estimation model by optimizing the parameter values of the Intermediate COCOMO model. In this thesis, the model to be developed is based on the components of the Intermediate COCOMO model. The effect of each COCOMO component in each phase of the software development process is not considered, and neither is the quality of the software to be developed. A limitation of this research is that estimation does not go beyond the system level. In other words, the COCOMO model treats a particular software project as a homogeneous entity composed of a single subsystem, which means that each COCOMO effort multiplier has only one rating value throughout the estimation process. In reality, however, a software system may be composed of smaller, heterogeneous subsystems, so the effort of each subsystem should be calculated separately with different effort multiplier values.

1.6. Methodology of the study

1.6.1. Data collection

The dataset for this research is the COCOMO NASA dataset collected by Jairus Hihn (Promise software engineering Repository, 2005). We used three different datasets containing 60, 63, and 93 software projects, and each project has 17 attributes. These datasets have been used by many researchers, including (Maleki, Ghaffar, & Masdari, 2014) and (Algabri, Saeed, Mathkour, & et al, 2015). The dataset attributes fall into four classes: product attributes, hardware attributes, personnel attributes, and project attributes.

1.6.2. Simulation Environments

For this research, we used the Anaconda environment, an open-source distribution of Python that contains Python modules and packages for scientific computing. In addition, we used Pandas as a library for data manipulation and analysis, and NumPy for handling the multi-dimensional arrays of input data used to calculate the fitness function value of each particle in particle swarm optimization and of each individual in the genetic algorithm population.

1.6.3. Experimental Evaluation

1.6.3.1. Evaluation Criteria in software cost estimation

In software effort and cost estimation, the accuracy of the work done is evaluated by a series of evaluation criteria. Magnitude of Relative Error (MRE), Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), and others have been used as evaluation metrics in software cost estimation. Table 1.1 shows evaluation metrics in software effort and cost estimation.

Evaluation metric | Mathematical formula
Magnitude of Relative Error (MRE) | $MRE_i = \frac{\lvert act_i - est_i \rvert}{act_i}$
Mean Magnitude of Relative Error (MMRE) | $MMRE = \frac{1}{n}\sum_{i=1}^{n} MRE_i$
Median Magnitude of Relative Error (MdMRE) | $MdMRE = \operatorname{median}(MRE_1, \ldots, MRE_n)$
Percentage of prediction (PRED(m)) | $PRED(m) = \frac{k}{n} = \frac{1}{n}\sum_{i=1}^{n} \begin{cases} 1, & MRE_i \le m \\ 0, & \text{otherwise} \end{cases}$, where $k$ is the number of projects among the $n$ test projects whose MRE is at most $m$
Mean squared error (MSE) | $MSE = \frac{1}{n}\sum_{i=1}^{n} (act_i - est_i)^2$
Root mean squared error (RMSE) | $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (act_i - est_i)^2}$
Mean absolute error (MAE) | $MAE = \frac{1}{n}\sum_{i=1}^{n} \lvert act_i - est_i \rvert$
Mean absolute percentage error (MAPE) | $MAPE = \frac{100}{n}\sum_{i=1}^{n} \left\lvert \frac{act_i - est_i}{act_i} \right\rvert$

Table 1. 1. Software effort and cost evaluation metrics (Miandoab & Gharehchopogh, 2016)

In our research, we used the Manhattan distance (MD), the sum of the absolute differences between the actual effort and the estimated effort, as the fitness function. MRE, MMRE, PRED(0.25), and MAPE are used as performance evaluation criteria. MRE is by far the most widely used measure of effort estimation accuracy (Karna & Gotovac, 2014).
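The criteria in Table 1.1 and the MD fitness can be computed in a few lines of NumPy. The sketch below is illustrative only: the function names and the four hypothetical project efforts are not taken from the NASA datasets, and MD is computed here as the Manhattan distance, that is, the sum of absolute differences between actual and estimated effort.

```python
import numpy as np


def mre(actual, estimated):
    """Magnitude of relative error of each project: |act - est| / act."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return np.abs(actual - estimated) / actual


def mmre(actual, estimated):
    """Mean of the per-project MRE values."""
    return float(np.mean(mre(actual, estimated)))


def mdmre(actual, estimated):
    """Median of the per-project MRE values."""
    return float(np.median(mre(actual, estimated)))


def pred(actual, estimated, m=0.25):
    """PRED(m): fraction of projects whose MRE is at most m."""
    return float(np.mean(mre(actual, estimated) <= m))


def mape(actual, estimated):
    """Mean absolute percentage error; with these definitions it equals 100 * MMRE."""
    return 100.0 * mmre(actual, estimated)


def manhattan_distance(actual, estimated):
    """Sum of absolute differences between actual and estimated effort (fitness to minimize)."""
    diff = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    return float(np.sum(np.abs(diff)))


# Hypothetical efforts (person-months) for four projects.
actual = [100.0, 200.0, 50.0, 400.0]
estimated = [110.0, 150.0, 50.0, 520.0]
print(mmre(actual, estimated), pred(actual, estimated), manhattan_distance(actual, estimated))
```

Here the per-project MREs are 0.1, 0.25, 0 and 0.3, so three of the four projects fall within PRED(0.25).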

1.7. Significance of the study

It will minimize the underestimation and overestimation of software cost and effort by improving estimation accuracy.

The research will minimize relative errors.

This research will increase the project success rate.

It will assist software companies to better analyze the feasibility of projects and to manage the development process efficiently.

This research can be used as a benchmark by the research community.

1.8. Organization of the study

This section presents the organization of the remainder of this thesis.

Chapter Two reviews the literature on software effort estimation techniques and models that have been used for software effort estimation in software project management. It also presents related work, or the state of the art, based on algorithmic techniques, machine learning algorithms, and meta-heuristic algorithms.

Chapter Three describes the proposed research methodology and gives a detailed explanation of the proposed solution and its implementation. Chapter Four presents the experimental results, the evaluation of the proposed method, and its comparison with COCOMO and other state-of-the-art models. Finally, Chapter Five covers the conclusion, the contributions, and future work.

CHAPTER TWO

LITERATURE REVIEW

2.1. Introduction

Effective software project management requires accurate effort estimation. With accurate software effort and cost estimates, the project manager can manage and control project activities more easily and efficiently. Many algorithmic techniques, non-algorithmic techniques, and machine learning algorithms have been developed for this purpose. Software Life Cycle Management (SLIM), the Constructive Cost Model (COCOMO), use case point analysis, function point based models, Putnam's model, experience-based estimation, estimation by analogy, the top-down approach, machine learning algorithms, and heuristic algorithms are some of the estimation techniques that have been used. This section explains some of these algorithms and the steps followed to estimate the cost of a particular software project in software engineering economics. Lastly, related work that used meta-heuristic algorithms to estimate the cost of software, and its gaps, are presented.

2.2. Software cost estimation steps

In software engineering economics, there are four steps to be followed to estimate the cost of a software project (Boehm, 1984), and each step provides an input to the next. In effort estimation, software size is used as an input in combination with other attributes that affect the effort needed to develop the software. To calculate the calendar time, or schedule, needed to develop the software, software size and software effort are used as inputs. The steps used to estimate the cost of a particular software project are:

1. Software size estimation

2. Software effort estimation

3. Software time estimation

4. Software dollar cost estimation


2.2.1. Project Size estimation

Project size estimation is the first step in software engineering economics for calculating software project effort and is the most crucial activity in software management, because the subsequent effort and schedule estimates are based on project size (Ochieng, Mwangi, & Mwgha, 2014). The project manager needs the size of a particular software project to determine the cost of the software and the number of people to allocate to the project. Lawrence H. Putnam's LOC estimation and function point analysis are two techniques used to estimate software size.

2.2.1.1. Lawrence H. Putnam LOC Estimation

In this technique, the number of lines of code is estimated by breaking the system down into smaller pieces and estimating the SLOC of each piece (Ochieng, Mwangi, & Mwgha, 2014). For each piece of the software system, three to four experts estimate the smallest possible SLOC, the most likely SLOC, and the largest possible SLOC. The expected SLOC for each piece of the system is then computed using equation 2.1 (Ochieng, Mwangi, & Mwgha, 2014).

$E_i = \frac{a + 4m + b}{6}$ (2.1)

where $a$ is the smallest possible SLOC, $b$ is the largest possible SLOC, and $m$ is the most likely SLOC of piece $i$.

The expected software size of the whole project is then the sum of the expected SLOC of every piece of the system:

$E = \sum_{i=1}^{n} E_i$ (2.2)

where $n$ is the total number of pieces in the entire system.
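Equations 2.1 and 2.2 can be sketched as follows; the piece-level estimates are hypothetical values chosen only to illustrate the arithmetic.

```python
def expected_sloc(smallest, most_likely, largest):
    """Equation 2.1: expected SLOC of one piece, (a + 4m + b) / 6."""
    return (smallest + 4 * most_likely + largest) / 6


def expected_system_size(pieces):
    """Equation 2.2: sum of the expected SLOC of every piece.

    `pieces` is a list of (smallest, most_likely, largest) tuples.
    """
    return sum(expected_sloc(a, m, b) for a, m, b in pieces)


# Hypothetical expert estimates for two pieces of a system.
pieces = [(1000, 1500, 2600), (400, 700, 1000)]
print(expected_system_size(pieces))  # 2300.0
```

The first piece gives (1000 + 4 × 1500 + 2600) / 6 = 1600 SLOC and the second (400 + 4 × 700 + 1000) / 6 = 700 SLOC, so the expected system size is 2300 SLOC.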

2.2.1.2. Function Point Analysis

In function point analysis, software size is estimated in standard units by counting the external inputs, outputs, inquiries, and interfaces that make up the system (Ochieng, Mwangi, & Mwgha, 2014). For external inputs, the input files, tables, forms, screens, and messages of the system are counted as factors of software size. For external inquiries, I/O inquiries that require a response, such as prompts and interrupt calls, are counted. Libraries or programs that are passed into and out of the system are also considered factors of software size. To estimate the software size of a project, the following steps are taken (Ochieng, Mwangi, & Mwgha, 2014).

1. Count or estimate all the occurrences of each type of external (inputs, outputs, inquiries, and interfaces).

2. Assign each occurrence a complexity weight.

3. Multiply each occurrence by its complexity weight and total the results to obtain a function count.

4. Multiply the function count by a value adjustment multiplier (VAM) to obtain the function point count, where $VAM = 0.65 + 0.01 \times \sum_{i=1}^{14} v_i$ and $v_i$ is the rating (0 to 5) of each of the 14 general system characteristics.

The complexity weight assigned to each occurrence is taken from the following table.

Component type | Low | Medium | High
External inputs | 3 | 4 | 6
External outputs | 4 | 5 | 7
External inquiries | 3 | 4 | 6
External interfaces | 5 | 7 | 10
Internal data files | 7 | 10 | 15

Table 2. 1. Complexity weight (Ochieng, Mwangi, & Mwgha, 2014)
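The four counting steps can be sketched as below. The weights come from Table 2.1; the component names, the sample counts, and the 14 general system characteristic ratings are hypothetical, and the adjustment uses the commonly cited IFPUG form VAM = 0.65 + 0.01 × Σv_i.

```python
# Complexity weights from Table 2.1, ordered (low, medium, high).
WEIGHTS = {
    "external_inputs": (3, 4, 6),
    "external_outputs": (4, 5, 7),
    "external_inquiries": (3, 4, 6),
    "external_interfaces": (5, 7, 10),
    "internal_data_files": (7, 10, 15),
}
LEVEL_INDEX = {"low": 0, "medium": 1, "high": 2}


def function_count(counts):
    """Steps 1-3: weight each occurrence and total the results.

    `counts` maps (component, complexity level) -> number of occurrences.
    """
    return sum(n * WEIGHTS[component][LEVEL_INDEX[level]]
               for (component, level), n in counts.items())


def function_points(counts, gsc_ratings):
    """Step 4: scale the function count by VAM = 0.65 + 0.01 * sum(v_i)."""
    if len(gsc_ratings) != 14 or not all(0 <= v <= 5 for v in gsc_ratings):
        raise ValueError("expected 14 ratings between 0 and 5")
    vam = 0.65 + 0.01 * sum(gsc_ratings)
    return function_count(counts) * vam


# Hypothetical counts for a small system.
counts = {
    ("external_inputs", "low"): 5,
    ("external_outputs", "medium"): 3,
    ("internal_data_files", "high"): 2,
}
print(function_count(counts))             # 60
print(function_points(counts, [3] * 14))  # about 64.2
```

With every characteristic rated 3, the ratings sum to 42, the VAM is 1.07, and the 60-point function count becomes about 64.2 function points.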

2.3. Software cost and effort estimation techniques

Software effort and cost estimation techniques are broadly classified into algorithmic and non-algorithmic techniques. An algorithmic model combines multiple cost factors in a mathematical formula to estimate effort. The Constructive Cost Model (COCOMO), the Software Life Cycle Management (SLIM) model, function point based models, use case point analysis, and Putnam's model (Shekhar & Kumar, 2016) are some of the algorithmic techniques.

2.3.1. Constructive Cost Model (COCOMO)

The Constructive Cost Model, proposed by Barry Boehm in 1981 based on a study of 63 projects, is one of the most important, well-documented, and widely used algorithmic models (Boehm, 1984). It estimates software cost and effort from the size of the software and other cost-driving factors. The model has three variants: Basic COCOMO, Intermediate COCOMO, and Detailed (advanced) COCOMO. In COCOMO, code size is represented in lines of code (LOC) or thousands of lines of code (KLOC), and effort is measured in person-months.

A). Basic COCOMO Model

The Basic COCOMO model estimates the effort of a software project using only the program size, estimated in lines of code or derived from function point analysis, together with a multiplicative constant A and an exponential constant B. It is the simplest COCOMO variant and is used for quick calculations, but its estimation accuracy is very low because many software cost factors are not considered. The model has three classes of project: organic, semi-detached, and embedded. The classification depends primarily on the size of the project, and also on the complexity of the project, the experience of the developers, and the type of requirements. The Basic COCOMO project classes and their sizes are shown in the following table.

Model Name Project size

Organic Less than 50 KLOC

Semi-detached 50-300 KLOC

Embedded Over 300 KLOC

Table 2. 2. Basic-COCOMO model types and its project size (Boehm, 1984)

(1). Organic: this class of software is developed by a small team, its requirements are clearly identified, and the problem is well understood and has been solved before.

(2). Semi-detached: this class of software has requirements that are more difficult to meet, and the project is more complex than an organic project.

(3). Embedded: this class of software is for embedded systems, where the highest level of creativity and a large team are required.

The values of the constant coefficients A and B are shown in table 2.3. For all classes of software, effort is calculated using equation 2.3.

$\text{Effort} = A \times (\text{KLOC})^{B}$ (2.3)

Basic COCOMO project class | A | B
Organic | 2.4 | 1.05
Semi-detached | 3.0 | 1.12
Embedded | 3.6 | 1.20

Table 2. 3. Basic COCOMO projects Coefficient value (Boehm, 1984)
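Equation 2.3 with the coefficients of Table 2.3 translates directly into code. This is a minimal sketch; the project class names used as dictionary keys are illustrative.

```python
# (A, B) coefficients of Table 2.3.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_effort(kloc, project_class):
    """Equation 2.3: Effort (person-months) = A * KLOC^B."""
    a, b = BASIC_COCOMO[project_class]
    return a * kloc ** b


for project_class in BASIC_COCOMO:
    print(project_class, round(basic_effort(32, project_class), 1))
```

For a 10 KLOC organic project, the sketch gives 2.4 × 10^1.05, or about 26.9 person-months.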

B). Intermediate COCOMO model

The Intermediate COCOMO model includes cost drivers in addition to the lines of code used in the Basic COCOMO model. Cost drivers include product attributes, personnel attributes, hardware attributes, and project attributes, so the estimated cost and effort combine lines of code with these cost drivers. In the Intermediate COCOMO model, the nominal effort is calculated using the power function of A and B, with values slightly different from those of Basic COCOMO (Leung & Fan, 2013). The cost factors take values ranging from 0.70 to 1.66, and the estimated effort is calculated using equation 2.4.

$\text{Effort} = A \times (\text{KLOC})^{B} \times EMF$ (2.4)

where EMF is the product of all the cost factors.

Intermediate COCOMO project class | A | B
Organic | 3.2 | 1.05
Semi-detached | 3.0 | 1.12
Embedded | 2.8 | 1.20

Table 2. 4. Coefficients value in Intermediate Model (Boehm, 1984)

Attribute group | Code | Cost driver | Very low | Low | Nominal | High | Very high | Extra high
Personnel attributes | ACAP | Analyst capability | 1.46 | 1.19 | 1.00 | 0.86 | 0.71 | -
Personnel attributes | AEXP | Application experience | 1.29 | 1.13 | 1.00 | 0.91 | 0.82 | -
Personnel attributes | PCAP | Programmer capability | 1.42 | 1.17 | 1.00 | 0.86 | 0.70 | -
Personnel attributes | VEXP | Virtual machine experience | 1.21 | 1.10 | 1.00 | 0.90 | - | -
Personnel attributes | LEXP | Language experience | 1.14 | 1.07 | 1.00 | 0.95 | - | -
Project attributes | MODP | Modern programming practice | 1.24 | 1.10 | 1.00 | 0.91 | 0.82 | -
Project attributes | TOOL | Software tools | 1.24 | 1.10 | 1.00 | 0.91 | 0.83 | -
Project attributes | SCED | Development schedule | 1.23 | 1.08 | 1.00 | 1.04 | 1.10 | -
Product attributes | RELY | Required software reliability | 0.75 | 0.88 | 1.00 | 1.15 | 1.40 | -
Product attributes | DATA | Database size | - | 0.94 | 1.00 | 1.08 | 1.16 | -
Product attributes | CPLX | Product complexity | 0.70 | 0.85 | 1.00 | 1.15 | 1.30 | 1.65
Computer attributes | TIME | Execution time constraint | - | - | 1.00 | 1.11 | 1.30 | 1.66
Computer attributes | STOR | Main storage constraint | - | - | 1.00 | 1.06 | 1.21 | 1.56
Computer attributes | VIRT | Virtual machine volatility | - | 0.87 | 1.00 | 1.15 | 1.30 | -
Computer attributes | TURN | Computer turnaround time | - | 0.87 | 1.00 | 1.07 | 1.15 | -

Table 2. 5. Cost factor and their weight in intermediate COCOMO (Salijoughinejad & Khatibi, 2018)
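Equation 2.4 can be sketched by looking up each cost driver's rating in Table 2.5 and multiplying the results into EMF. Only five drivers are included to keep the sketch short; the rating names and the example project are hypothetical, and drivers that are not listed are treated as nominal (1.00).

```python
RATINGS = ("very_low", "low", "nominal", "high", "very_high", "extra_high")

# (A, B) coefficients of Table 2.4.
INTERMEDIATE_COCOMO = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

# A subset of Table 2.5; None marks a rating the table leaves empty.
EFFORT_MULTIPLIERS = {
    "ACAP": (1.46, 1.19, 1.00, 0.86, 0.71, None),
    "PCAP": (1.42, 1.17, 1.00, 0.86, 0.70, None),
    "RELY": (0.75, 0.88, 1.00, 1.15, 1.40, None),
    "CPLX": (0.70, 0.85, 1.00, 1.15, 1.30, 1.65),
    "TIME": (None, None, 1.00, 1.11, 1.30, 1.66),
}


def emf(ratings):
    """Product of the chosen effort multipliers; unlisted drivers count as 1.00."""
    product = 1.0
    for driver, rating in ratings.items():
        value = EFFORT_MULTIPLIERS[driver][RATINGS.index(rating)]
        if value is None:
            raise ValueError(f"{driver} has no '{rating}' rating")
        product *= value
    return product


def intermediate_effort(kloc, project_class, ratings):
    """Equation 2.4: Effort = A * KLOC^B * EMF."""
    a, b = INTERMEDIATE_COCOMO[project_class]
    return a * kloc ** b * emf(ratings)


# Hypothetical project: capable analysts, very complex product, tight execution time.
ratings = {"ACAP": "high", "CPLX": "very_high", "TIME": "high"}
print(round(emf(ratings), 4))
print(round(intermediate_effort(10, "organic", ratings), 1))
```

In this formulation, A and B are the coefficients that the thesis tunes with PSO and GA, while EMF is fixed for each project by its cost driver ratings.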

C). Detailed COCOMO Model

Both the Basic and Intermediate COCOMO models estimate software effort at the system level, which means effort is calculated by treating the software product as a single homogeneous entity. In fact, however, most large software projects are made up of smaller subsystems. Some of these subsystems may require little innovation and a small team and have clearly defined requirements, while others may have to be built under tight hardware and software constraints. The weighting value of each cost factor should therefore not be the same throughout the process, because a uniform value introduces variation into the cost, effort, and time estimates.

To solve this problem, the Detailed COCOMO model estimates software effort by analyzing the effect of each cost factor in each phase of software development. It computes effort as a function of program size and a set of cost drivers weighted according to each phase of the software development life cycle (Glinz & Mukhija, 2003). Detailed COCOMO is intended for large systems that contain non-homogeneous subsystems (Leung & Fan, 2013). The phases of software development used to estimate effort in the Detailed COCOMO model are requirements and product design (RPD), detailed design (DD), code and unit testing (CUT), and integration and test (IT). The estimated effort of each module gives the effort of its subsystem, and combining the efforts of all subsystems gives the effort of the whole system. The rating scale for each cost driver in the four phases of the Detailed COCOMO model is represented in table 2.6.

Rating RPD DD CUT IT

Very low 1.80 1.35 1.35 1.50

Low 0.85 0.85 0.85 1.20

Nominal 1.00 1.00 1.00 1.00

High 0.75 0.90 0.90 0.85

Very high 0.55 0.75 0.75 0.70

Table 2. 6. Effort multiplier rating scale and its value for detailed COCOMO model (Glinz & Mukhija, 2003)

2.3.2. COCOMO II Model

The original COCOMO model was developed for the waterfall software development process model. COCOMO II was developed to incorporate modern software development process models. It can be applied to calculate the effort of software projects that use an incremental, iterative, or spiral development process model, or when reengineering is required. The effort of a project is calculated at either the Early Design stage or the Post-Architecture stage using equation 2.5. Effort is measured in person-months (PM), where one person-month is the amount of work done by one person working on the software project for one month (Abts, Brown, & etal, 2000).

$PM = A \times \text{Size}^{E} \times \prod_{i=1}^{n} EM_i$ (2.5)

where $E = B + 0.01 \times \sum_{j=1}^{5} SF_j$ and $n$ is the number of effort multipliers: 17 for the Post-Architecture model and 7 for the Early Design model. $SF_j$ denotes the five scale factors of COCOMO II, $EM_i$ denotes the effort multipliers, and A and B are constants whose values were derived from 161 software projects. The five scale factors are precedentedness (PREC), development flexibility (FLEX), risk resolution (RESL), team cohesion (TEAM), and process maturity (PMAT). Table 2.7 presents the COCOMO II effort multipliers and their associated values.

Effort multiplier | Very low | Low | Nominal | High | Very high | Extra high
RELY | 0.82 | 0.92 | 1.00 | 1.10 | 1.26 | -
DATA | - | 0.90 | 1.00 | 1.14 | 1.28 | -
CPLX | 0.73 | 0.87 | 1.00 | 1.17 | 1.34 | 1.74
RUSE | - | 0.95 | 1.00 | 1.07 | 1.15 | 1.24
DOCU | 0.81 | 0.91 | 1.00 | 1.11 | 1.23 | -
TIME | - | - | 1.00 | 1.11 | 1.29 | 1.63
STOR | - | - | 1.00 | 1.05 | 1.17 | 1.46
PVOL | - | 0.87 | 1.00 | 1.15 | 1.30 | -
ACAP | 1.42 | 1.19 | 1.00 | 0.85 | 0.71 | -
PCAP | 1.34 | 1.15 | 1.00 | 0.88 | 0.76 | -
PCON | 1.29 | 1.12 | 1.00 | 0.90 | 0.81 | -
APEX | 1.22 | 1.10 | 1.00 | 0.88 | 0.81 | -
PLEX | 1.19 | 1.09 | 1.00 | 0.91 | 0.85 | -
LTEX | 1.20 | 1.09 | 1.00 | 0.91 | 0.84 | -
TOOL | 1.17 | 1.09 | 1.00 | 0.90 | 0.78 | -
SITE | 1.22 | 1.09 | 1.00 | 0.93 | 0.86 | -
SCED | 1.43 | 1.14 | 1.00 | 1.00 | 1.00 | -

Table 2. 7. COCOMO II effort multipliers (Singal, Kumari, & Sharma, 2020)
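Equation 2.5 can be sketched as follows. The defaults a = 2.94 and b = 0.91 are the published COCOMO II.2000 calibration constants, used here as an assumption. The scale factor values and the three effort multipliers taken from Table 2.7 describe a hypothetical project.

```python
def cocomo2_effort(size_ksloc, scale_factors, effort_multipliers, a=2.94, b=0.91):
    """Equation 2.5: PM = A * Size^E * prod(EM_i), where E = B + 0.01 * sum(SF_j).

    The defaults for a and b are assumed COCOMO II.2000 constants;
    recalibrate them for local data.
    """
    if len(scale_factors) != 5:
        raise ValueError("COCOMO II uses exactly five scale factors")
    exponent = b + 0.01 * sum(scale_factors)
    product = 1.0
    for em in effort_multipliers:  # unlisted multipliers are nominal (1.00)
        product *= em
    return a * size_ksloc ** exponent * product


# Hypothetical ratings: five scale factor values, plus RELY high (1.10),
# CPLX very high (1.34), and ACAP high (0.85) from Table 2.7.
sf = [3.72, 3.04, 4.24, 3.29, 4.68]
em = [1.10, 1.34, 0.85]
print(round(cocomo2_effort(50, sf, em), 1))
```

Because the exponent grows with the sum of the scale factors, poorer ratings on PREC, FLEX, RESL, TEAM, or PMAT increase the diseconomy of scale for larger projects.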

2.3.3. SLIM Model

Software Life Cycle Management (SLIM) is one of the algorithmic techniques used for large projects. It is based on the Norden/Rayleigh function and is generally known as a macro estimation model (Jamil, 2007). It is one of the first algorithmic and empirical software cost estimation models (Keshta, 2017). The model describes both the effort and the time needed to develop a software project. Software effort is calculated using equation 2.6.

$\text{Effort} = \left[\frac{\text{Size}}{\text{Productivity} \times \text{Time}^{4/3}}\right]^{3} \times B$ (2.6)

where Size is the estimated size of the software product, Productivity is the productivity of the organizational process, Time is the development time, and B is a special skills factor.
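Assuming equation 2.6 takes the standard Putnam form Effort = [Size / (Productivity × Time^(4/3))]^3 × B, it can be sketched as below. The inputs are hypothetical and must use units consistent with how the productivity parameter was calibrated.

```python
def slim_effort(size, productivity, time, special_skills_factor):
    """Equation 2.6 in the assumed standard Putnam form:
    Effort = [Size / (Productivity * Time^(4/3))]^3 * B.
    """
    return (size / (productivity * time ** (4 / 3))) ** 3 * special_skills_factor


# Hypothetical inputs: the same product built over one and over two time units.
fast = slim_effort(1000, 10, 1, 1)
slow = slim_effort(1000, 10, 2, 1)
print(fast, slow, fast / slow)
```

Because Time appears with exponent 4/3 inside the cube, halving the development time multiplies effort by 2^4 = 16, which shows how strongly SLIM penalizes schedule compression.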

2.3.4. Experience based Estimation:

It is the most frequently used estimation technique for software projects and is used when gathering requirements and data is difficult (Borade & Khalkar, 2013). The estimate is derived from the experience of people in the area.

2.3.5. Estimation by analogy

Estimation by analogy derives a solution by finding similar work done previously and applying that solution to a new problem domain. The analogy technique is very similar to experience-based estimation, but it uses only the information and experience gained from previous projects. Experience-based estimation is a human-intensive approach, whereas estimation by analogy is a data-intensive approach based on one or more specified, potentially analogous projects (Keung, 2009).

2.3.6. Top down and bottom up approach

The total effort estimate is either based on properties of the project as a whole and distributed over project activities (top-down approach) or calculated as the sum of the estimates of the project activities (bottom-up approach) (Jørgensen, 2004). Algorithmic methods and experience-based estimation techniques are limited in their ability to predict accurate effort values for software projects. Meta-heuristic algorithms are currently widely used for estimating the effort and cost of software projects and have been reported to produce better results.

2.4. MetaHeuristics Algorithms

Metaheuristic algorithms are algorithmic frameworks and computational intelligence paradigms designed to find optimum solutions for NP-hard problems (Sörensen & Glover, 2017). In any field, some quantities need to be minimized or maximized. Such problems are challenging and not well suited to traditional machine learning algorithms. Metaheuristic algorithms have a robust search mechanism that can extract useful information from incomplete information. In a metaheuristic algorithm, an optimum solution is reached through exploration and exploitation. Exploitation lets the algorithm refine solutions around the best regions found so far, while exploration is responsible for searching new areas, which helps the algorithm reach the global optimum. Researchers and academics classify these algorithms in different ways: some classify them as nature-inspired and non-nature-inspired metaheuristics, and others as trajectory-based and population-based metaheuristics (Abdel-Basset, Abdel-Fatah, & Sangaiah, 2018). The following are some of the metaheuristic algorithms that have been used to find optimum solutions in software cost estimation.

2.4.1. Ant Colony Optimization (ACO)

The ant colony optimization (ACO) algorithm is a heuristic algorithm inspired by the social behavior of ant colonies, originally introduced by Marco Dorigo and colleagues in the 1990s (Dorjio, 1992). The foraging behavior of ants, that is, the way ants find a food source and return to the nest, inspired the founders of ACO (Blum, 2005). This foraging behavior is exploited by artificial ant colonies to find better solutions for continuous and discrete optimization problems. The movement of the artificial ants is guided by the goal of finding an optimum solution to the given problem. Once the optimization problem is given, the first step is to generate artificial ants that build solutions from the solution components representing the problem. The generated artificial ants then search for solutions. As they move, the ants deposit a chemical substance called pheromone, through which the members of the colony communicate with each other. As soon as the first ant finds a solution, the pheromone values are updated so that the other ants tend to take the shortest path, based on the pheromone concentration in the search space. The higher the pheromone value of a path, the higher its probability of being chosen as the shortest path. The basic working of the ant colony optimization algorithm is shown graphically in figure 2.1.

working of Ant colony optimization algorithm is graphically shown in figure 2.1.

Figure 2. 1. The basic working of ant colony optimization algorithm (Blum, 2005).

In software project development, resources are limited, and a project should be completed at minimum or affordable cost. Usually, however, projects are completed beyond their scheduled cost and time. As a metaheuristic, the ant colony optimization algorithm has been used by researchers to find optimum solutions that reduce overestimation and underestimation. It has been widely used to optimize the coefficient parameter values of the COCOMO model, with better results reported than those of the original COCOMO model.
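The pheromone mechanism described above can be illustrated with the classic double-bridge setting, where ants choose between a short and a long path. This is a simplified sketch, not an ACO optimizer for COCOMO: the evaporation rate rho, the deposit q / length, and the path lengths are assumptions chosen for illustration.

```python
import random


def double_bridge(lengths=(1.0, 2.0), ants=50, iterations=30, rho=0.1, q=1.0, seed=0):
    """Simulate ants choosing among alternative paths.

    Each ant picks a path with probability proportional to its pheromone level.
    After every iteration the pheromone evaporates at rate `rho`, and each ant
    deposits q / length on the path it used, so shorter paths gain more.
    """
    rng = random.Random(seed)
    pheromone = [1.0] * len(lengths)
    for _ in range(iterations):
        deposits = [0.0] * len(lengths)
        for _ in range(ants):
            path = rng.choices(range(len(lengths)), weights=pheromone)[0]
            deposits[path] += q / lengths[path]
        pheromone = [(1 - rho) * tau + d for tau, d in zip(pheromone, deposits)]
    return pheromone


tau = double_bridge()
share = tau[0] / sum(tau)
print(f"pheromone share of the short path: {share:.3f}")
```

The positive feedback is visible in the result: the short path collects more pheromone per ant, becomes more attractive, and ends up holding most of the pheromone.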

2.4.2. Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is one of the most powerful and widely used population-based optimization techniques. It was developed by Eberhart and Kennedy in 1995 (Kennedy & Eberhart, 1995) and is inspired by the social behavior of bird flocking and fish schooling. The algorithm is initialized with a population of random solutions (called particles) and searches for better solutions by updating the velocity and position of each particle in each generation. Each particle is defined by a position and a velocity. The particles, which are the candidate solutions in PSO, are dispersed at various points in the solution space, and each particle updates its position according to its own previous best position and the global best position. The velocity of a particle describes its movement in terms of direction and distance.

Every particle has a memory of its own best position and shares the global experience of the swarm. Once the position, velocity, and personal best of each particle are defined, the position and velocity are updated in every iteration after the fitness values are calculated. If the fitness function is to be minimized, the position with the least fitness value in the swarm is assigned as the global best position, and all particles are pulled toward it. If the fitness function is to be maximized, the position with the largest fitness value in the swarm is assigned as the global best position. The position and velocity of each particle are updated in each iteration by equations 2.7 and 2.8 (Sengupta, Basak, & Peters, 2018).

$V_i(t+1) = w \, V_i(t) + c_1 r_1 \left(X_{pbest} - X_i(t)\right) + c_2 r_2 \left(X_{gbest} - X_i(t)\right)$ (2.7)

$X_i(t+1) = X_i(t) + V_i(t+1)$ (2.8)

The variables in these two equations are listed in Table 2.8.

Parameter | Description
$V_i(t+1)$ | Velocity of the swarm particle at time step t+1
$V_i(t)$ | Velocity of the particle at time t
$X_i(t)$ | Position of the particle at time t
$X_i(t+1)$ | Position of the swarm particle at time step t+1
$w$ | Inertia weight
$c_1$ and $c_2$ | Learning or acceleration factors (cognitive and social acceleration coefficients)
$r_1$ and $r_2$ | Uniformly distributed random numbers between 0 and 1
$X_{pbest}$ | Best position found by the particle
$X_{gbest}$ | Global best position

Table 2. 8. Parameters of PSO

Figure 2. 2. Graphical representation of PSO

In figure 2.2, $x_i(t)$ is the current position of a particle, $x_i(t+1)$ is its next position, $V_i(t)$ and $V_i(t+1)$ are its current and next velocities, and $P_i(t)$ and $g(t)$ are the personal best and global best positions of the swarm particles, respectively.

Cognitive (personal) and Social (global) Acceleration Coefficients (c1 and c2)

The personal acceleration coefficient (c1) controls the particle's acceleration toward its personal best position, and the global acceleration coefficient (c2) controls the particle's acceleration toward the global best position. These acceleration coefficients are weights that determine how strongly a particle moves toward its cognitive attractor (PBest) or its social attractor (GBest) (Sengupta, Basak, & Peters, 2018).

PSO pseudocode

The PSO algorithm pseudocode is:

Input: randomly initialized positions and velocities of the particles, Xi(0) and Vi(0)
Output: position of the approximate global minimum X*

1: while the terminating condition is not reached do
2:   for i = 1 to number of particles do
3:     calculate the fitness function f(Xi)
4:     update the personal best of particle i and the global best of the swarm
5:     update the velocity of the particle using equation 2.7
6:     update the position of the particle using equation 2.8
7:   end for
8: end while
9: return the global best position as X*
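The pseudocode can be turned into a minimal NumPy implementation of equations 2.7 and 2.8. The parameter values (w = 0.7, c1 = c2 = 1.5, 30 particles, 200 iterations), the clipping of positions to the search bounds, and the four KLOC and effort values in the usage example are assumptions for illustration, not the settings or data of this thesis.

```python
import numpy as np


def pso(fitness, bounds, n_particles=30, iterations=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` using the updates of equations 2.7 and 2.8.

    bounds: (low, high), each a sequence with one limit per dimension.
    w, c1, c2: inertia weight and cognitive/social coefficients (assumed values).
    """
    rng = np.random.default_rng(seed)
    low = np.asarray(bounds[0], dtype=float)
    high = np.asarray(bounds[1], dtype=float)
    dim = low.size
    x = rng.uniform(low, high, size=(n_particles, dim))  # positions X_i(0)
    v = np.zeros((n_particles, dim))                     # velocities V_i(0)
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for _ in range(iterations):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Equation 2.7: inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        # Equation 2.8, then clip so particles stay inside the search bounds.
        x = np.clip(x + v, low, high)
        values = np.array([fitness(p) for p in x])
        improved = values < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = values[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())


# Usage: fit A and B of Effort = A * KLOC^B to hypothetical project data,
# using the Manhattan distance as the fitness to minimize.
kloc = np.array([5.0, 20.0, 60.0, 120.0])
actual = np.array([18.0, 75.0, 250.0, 540.0])


def manhattan(params):
    a, b = params
    return float(np.sum(np.abs(actual - a * kloc ** b)))


(best_a, best_b), error = pso(manhattan, bounds=([1.0, 0.8], [5.0, 1.5]))
print(f"A={best_a:.3f}, B={best_b:.3f}, MD={error:.2f}")
```

Each particle here is a candidate (A, B) pair, which mirrors how the COCOMO coefficients become the search space when PSO is applied to effort estimation.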


Figure 2. 3. Flowchart diagram for PSO (steps: generate random particles; evaluate the fitness function; evaluate each particle's personal best position; evaluate the global best position; update the velocity of each particle; update the position of each particle; if the maximum iteration is reached, return the result, otherwise repeat from the fitness evaluation)

2.4.3. Genetics Algorithm

A genetic algorithm (GA) is a type of meta-heuristic algorithm that is commonly used to generate high-quality solutions to optimization and search problems. John Holland developed the genetic algorithm in the 1960s based on Darwin's theory of evolution, and it was further described by Goldberg (Goldbreg, 1988). In a genetic algorithm, a solution is called a chromosome and contains a set of genes. Evolution usually starts by generating an initial population of N randomly generated chromosomes, which are then evaluated by their fitness function value (the value of the objective function of the optimization problem being solved). Chromosomes with better fitness values are more likely to be selected for the next population. The fitness function is the function that the genetic algorithm tries to optimize (Carr, 2014). The algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached. The basic steps, or types of operation, of a genetic algorithm are shown in Figure 2.4.

Figure 2. 4. Basic operation steps in genetic algorithm (steps: start; initialize the population; evaluate the objective function; if the result is achieved, return the result, otherwise apply selection, crossover, and mutation and evaluate the objective function again)

The evolution of a genetic algorithm starts by generating a random population. The population size depends on the problem to be solved. The next step is evaluating the objective function, which represents the problem being solved. The accuracy of the result depends directly on how well the fitness function represents the problem, so the objective function should clearly capture the real-world problem. In this step, the best solution in the current population is identified: if the problem requires a global maximum, the chromosome with the highest fitness value is returned, and if the problem requires a global minimum, the chromosome with the lowest fitness value is returned. If this fitness value already satisfies the termination criterion, the process terminates and the result is returned, possibly even in the first generation. Otherwise, the next genetic algorithm operation, selection, starts. Selection is the process of choosing two parents from the population for crossover (Saini, 2017). It is performed based on the objective function value: chromosomes with better fitness values are selected and passed to the next generation, whereas chromosomes with worse fitness values are discarded from the population. The most widely used selection methods are Roulette Wheel Selection, Rank Selection, Tournament Selection, and Boltzmann Selection (Saini, 2017).
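Two of the selection methods listed above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this thesis. It assumes a minimization problem (lower fitness is better, as with the effort-estimation objective used later) and non-negative fitness values. The function names, the tournament size `k`, and the toy population are hypothetical.

```python
import random


def tournament_select(population, fitness, k=3, rng=random):
    """Tournament selection for minimization: draw k distinct individuals at
    random and return the one with the lowest fitness value."""
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: fitness[i])
    return population[winner]


def roulette_select(population, fitness, rng=random):
    """Roulette wheel (fitness-proportionate) selection adapted to minimization:
    each slice of the wheel is proportional to 1 / (1 + fitness), so smaller
    (better) fitness values get larger slices. Assumes fitness >= 0."""
    weights = [1.0 / (1.0 + f) for f in fitness]
    return rng.choices(population, weights=weights, k=1)[0]


# Usage: choose two parents from a toy population (lower fitness is better)
population = ["c1", "c2", "c3", "c4"]
fitness = [12.0, 3.0, 7.5, 20.0]
parents = (tournament_select(population, fitness), roulette_select(population, fitness))
```

A tournament with `k` equal to the population size always returns the fittest individual. Smaller `k` values give weaker individuals a chance to be selected, which helps keep the population diverse.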

The second step in a genetic algorithm is crossover, where the selected parents are recombined to produce offspring with better fitness values. Its main role is to mix solutions and to drive convergence within a subspace (Yang, 2014). Three crossover operators are commonly used: single-point crossover, multipoint crossover, and uniform crossover. The operator is chosen according to the problem and its fitness function. Figure 2.5 shows single-point, two-point, and uniform crossover, respectively.


Figure 2.5. Single-point crossover, two-point crossover, and uniform crossover
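The three operators in Figure 2.5 can be sketched for list-encoded chromosomes as shown below. This is an illustrative sketch with hypothetical function names. Cut points and the uniform-crossover mask are drawn at random, and the two-point variant assumes chromosomes of length at least 3.

```python
import random


def single_point_crossover(p1, p2, rng=random):
    """Swap the tails of the parents after one random cut point."""
    cut = rng.randint(1, len(p1) - 1)  # cut strictly inside the chromosome
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def two_point_crossover(p1, p2, rng=random):
    """Swap the segment between two random cut points (needs len >= 3)."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def uniform_crossover(p1, p2, rng=random):
    """For each gene, choose with probability 0.5 which parent each child copies."""
    mask = [rng.random() < 0.5 for _ in p1]
    child1 = [x if m else y for x, y, m in zip(p1, p2, mask)]
    child2 = [y if m else x for x, y, m in zip(p1, p2, mask)]
    return child1, child2


# Usage with two 8-bit parents
ones, zeros = [1] * 8, [0] * 8
rng = random.Random(42)
children = [op(ones, zeros, rng) for op in
            (single_point_crossover, two_point_crossover, uniform_crossover)]
```

All three operators only redistribute the parents' genes. At every position, the two children together hold exactly the two parental alleles.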

Sometimes the entire population carries the same allele at a given position. Crossover cannot change that allele, so the solutions stagnate at a low quality. To overcome this problem, a mutation operator is added to the genetic algorithm. It changes parts of a solution randomly to increase the diversity of the population and to explore the entire search space (Yang, 2014). In this operation, a new offspring is produced from a single parent. The following schematic representation shows a single-point mutation.

Original chromosome (before mutation)

1 0 1 0 0 1 1 0

New chromosome (after mutation; the fifth gene is flipped)

1 0 1 0 1 1 1 0
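The schematic above flips a single bit. A minimal Python sketch of bit-flip mutation is shown below. The helper names are hypothetical. `bit_flip_mutation` applies the usual per-gene mutation probability rather than always flipping exactly one gene.

```python
import random


def flip_bit(chromosome, index):
    """Single-point mutation: return a copy with the bit at `index` flipped."""
    child = list(chromosome)
    child[index] = 1 - child[index]
    return child


def bit_flip_mutation(chromosome, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


original = [1, 0, 1, 0, 0, 1, 1, 0]
mutated = flip_bit(original, 4)  # index 4 is the fifth gene, as in the schematic
print(mutated)  # [1, 0, 1, 0, 1, 1, 1, 0]
```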

Genetic algorithms (GA) have been used to solve problems that have no deterministic solution. In software engineering, genetic algorithms have been applied to software cost estimation, task scheduling, clustering, natural language processing, query optimization, and image processing (Sharma, 2017). In (Maleki, Ghaffar, & Masdari, 2014), a genetic algorithm was combined with the Ant Colony Optimization algorithm to optimize effort-factor weights using the NASA software project dataset. In (Sachan, Nigam, Singh, & et al, 2016), a genetic algorithm was used to optimize the parameter values of the COCOMO model. In (Omara & Arafa, 2010), a genetic algorithm was used to minimize total execution time, satisfy load balancing, and overcome the communication overhead problem.

2.4.4. Hybridization of Meta-heuristic Algorithm

Non-linear, non-deterministic problems have been solved with meta-heuristic algorithms, with promising results. However, a single meta-heuristic does not guarantee satisfactory results every time, so researchers have recently begun combining meta-heuristic algorithms to achieve better results. When two meta-heuristic algorithms are combined, they can be applied sequentially or in parallel. Alternatively, the operators of one algorithm can be used inside the framework of the other (Sengupta, Basak, & Peters, 2018).

2.5. Related works

In (Sachan, Nigam, Singh, & et al, 2016), the researchers proposed a simplified genetic algorithm model to optimize the parameters of the basic COCOMO model, and they obtained more realistic estimates than the basic COCOMO model. They used crossover and selection operators to calculate new values of A and B. The model with the optimal pair of A and B gave improved estimates compared with basic COCOMO. This suggests that a method based on the genetic algorithm can outperform the algorithmic Constructive Cost Model. The drawback of this study is that it is based on a very small dataset and considers only the size of a software project when estimating its effort.

In (Rijwani & Jain, 2016), a multi-layered feed-forward artificial neural network was used to estimate the effort of software projects. The model was trained with 23 inputs and hidden layers using the back-propagation algorithm. It was tested with 13 randomly selected projects from the COCOMO dataset. MRE and MMRE were used to evaluate the results of the experiment. The results showed a significant reduction in relative errors and better performance compared with the COCOMO model.

In (Algabri, Saeed, Mathkour, & et al, 2015), the researchers used a genetic algorithm to tune the parameters of the COCOMO model so that software cost could be predicted more accurately. In their experiment, an initial population of 100 individuals was generated, and the experiment was performed on the 93-project NASA dataset. After 1000 iterations, they found new COCOMO coefficients for each class of project (organic, semi-detached, and embedded). The resulting development-time estimates were closer to the actual development times.

In (Salijoughinejad & Khatibi, 2018), hybrid algorithms were used to increase the accuracy of cost estimation by enhancing the COCOMO model. The COCOMO model was improved by selecting its coefficients effectively and by reconstructing the values of its cost drivers. The authors used GA, PSO, the invasive weed optimization algorithm (IWO), and a combination of PSO and IWO to find the optimum values of the coefficients and cost drivers. Their experimental results were reported separately for the organic, semi-detached, and embedded modes of COCOMO, and MMRE was used to evaluate them. For the organic mode, the hybrid IWO and PSO found new coefficient values that, together with the original COCOMO cost driver values, improved estimation accuracy. For the semi-detached mode, the cost driver values were reconstructed, and new coefficient values were found that increase estimation accuracy. For the embedded mode, new cost driver values and new coefficient values were generated to improve estimation. Overall, the authors achieved better results than the current COCOMO model.

The authors of (Singh, Singh, & Mishra, 2018) suggested an Environmental Adaption Method for estimating software development cost. In this paper, IEAM-RP was used to tune the parameters of the Sheta model of software cost estimation, and it produced better results than other existing techniques.

A model based on a Genetic Algorithm combined with the Bat Algorithm (Dizaj & Gharehchopogh, 2018) was used to predict software cost. The authors proposed a new method that includes the effect of qualitative factors, expressed as false variables, in the relation for the total estimated cost. The proposed method was assessed on four different datasets using seven criteria. The experimental results showed that it improved Software Cost Estimation accuracy by reducing error values compared with the COCOMO model.

In (Nadal & Sangwan, 2018), a hybrid of an Improved Bat Algorithm and the Gravitational Search Algorithm (BATGSA) was used to optimize the COCOMO model. The Improved Bat Algorithm was used for exploration and the Gravitational Search Algorithm for exploitation, so the result found by the Bat Algorithm was further improved by the Gravitational Search Algorithm. The proposed algorithm was tested on four different NASA datasets and compared with three state-of-the-art techniques. The authors reduced errors by between 2% and 10%.

The authors of (Le & Khuat, 2016) used a Directed Artificial Bee Colony Algorithm to tune the parameter values of the COCOMO model based on the past actual efforts provided in the dataset. On the NASA dataset, their results improved effort estimation accuracy compared with the COCOMO model. MMRE, MdMRE, PRED(25), and MAR were used for evaluation, and the results improved on all criteria.

In (Maleki, Ghaffar, & Masdari, 2014), a genetic algorithm and ant colony optimization were used for software cost estimation, with MMRE as the fitness function. GA was used to search for suitable cost-factor values according to the project size, and ACO was used to optimize these factors further. Overall, they achieved better results than COCOMO. The main limitation of this study is its methodology. The authors adjusted the effort multiplier values based on the software size simply to bring their estimated effort closer to the actual effort. This approach is not appropriate, because the reliability, availability, and other non-functional attributes of software cannot be determined from the software size in the dataset. The effort multiplier values (non-functional attributes) provided in the dataset should be used instead to obtain a better-optimized result.

The authors of (Alajlan & Tagoug, 2016) used a genetic algorithm to tune the COCOMO II model coefficients for estimating effort and development time on the NASA 93 dataset. The optimized values of A and B produced more accurate results than the current COCOMO II model. However, the results are still not satisfactory when compared with the actual effort.

The authors of (Singal, Kumari, & Sharma, 2020) used a Differential Evolution Algorithm to improve the parameter values of the COCOMO and COCOMO II models. They applied three successful mutation strategies to find the coefficient values and used MRE as the evaluation criterion. The results were investigated using PROMISE repository datasets (NASA 93 and COCOMO81), and better effort estimates were obtained.

2.5.1. Summary of related works

| No | Authors | Research title | Algorithm used | Research gap |
|---|---|---|---|---|
| 1 | (Sachan, Nigam, Singh, & et al, 2016) | Optimizing Basic COCOMO Model using Simplified Genetic Algorithm | Genetic Algorithm | They used only Basic COCOMO and did not consider the effort factors. Lack of estimation accuracy. |
| 2 | (Algabri, Saeed, Mathkour, & et al, 2015) | Optimization of Soft Cost Estimation using Genetic Algorithm for NASA Software Projects | Genetic Algorithm | The researchers only improved the parameter values for the basic and semi-detached classes of COCOMO projects. Their accuracy for the embedded class of project is not satisfactory. |
| 3 | (Maleki, Ghaffar, & Masdari, 2014) | A New Approach for Software Cost Estimation with Genetic Algorithm and Ant Colony Optimization | GA and ACO | The authors tried to find suitable effort multiplier values from the software size (lines of code) rather than using the EM scale values provided in the dataset. The new effort multiplier values they provide are not valid for other datasets. |
| 4 | (Nadal & Sangwan, 2018) | Software Cost Estimation by Optimizing COCOMO Model Using Hybrid BATGSA Algorithm | Improved Bat Algorithm and Gravitational Search Algorithm | The algorithms used for exploration and exploitation in their methodology could not produce satisfactory results. |
| 5 | (Singal, Kumari, & Sharma, 2020) | Estimation of software development effort: A Differential Evolution Approach | Differential evolution approach | The experimental results improve on the original COCOMO, but the relative errors with respect to the actual effort of projects remain high. |

Table 2.9. Summary of related works

In conclusion, most of the related works above rely only on software size (Basic COCOMO) to estimate the effort of software projects. Some works combine effort multipliers (Intermediate COCOMO) with software size, but their results are still not satisfactory and show large relative errors with respect to the actual effort. This suggests that the exploration and exploitation behaviour of the algorithms used in those works could not find the optimal values of A and B. Therefore, this research uses the strength of PSO for exploration and of GA for exploitation to find the global optimum values of the intermediate COCOMO coefficients. The aim is to improve the estimation capability of the COCOMO model, since the model depends predominantly on the coefficients A and B.

CHAPTER THREE

DESIGN OF METHODOLOGY

3.1. Introduction

Software project effort is the amount of manpower required to complete a software project. In the intermediate COCOMO model, effort is calculated from the software size and the effort multipliers as the major inputs, and it is measured in person-months. The effort multipliers are the product, computer, personnel, and project attributes of the software that directly affect its effort and cost. In the COCOMO model, these attributes are called effort multipliers or cost factors, and there are 15 of them. To adjust the relation between the cost factors and the project effort, the model uses two constants: the multiplicative constant (A) and the exponential constant (B). The values of A and B were derived from the historical data of 63 projects. They differ between the modes of the intermediate COCOMO model and are fixed within each mode. The effort of a software project is calculated with the following formula (Maleki, Ghaffar, & Masdari, 2014).

$$\text{Effort} = A \times (\text{KLOC})^{B} \times \text{EMF} \qquad (3.1)$$

where A and B are the multiplicative and exponential constants, respectively, KLOC is the size of the software project in thousands of lines of code, and EMF is the product of all fifteen effort multipliers (cost factors), given by

$$\text{EMF} = \prod_{i=1}^{15} \text{EM}_i = \text{EM}_1 \times \text{EM}_2 \times \cdots \times \text{EM}_{15} \qquad (3.2)$$
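Equations (3.1) and (3.2) can be expressed directly in Python. The sketch below is illustrative, and the function name is hypothetical. The example uses the organic-mode coefficients quoted in Section 4.4.1 (A = 3.2, B = 1.05), with all fifteen multipliers set to the nominal value 1.0.

```python
import math


def intermediate_cocomo_effort(kloc, effort_multipliers, a, b):
    """Effort in person-months per Eqs. (3.1)-(3.2): A * KLOC**B * EMF,
    where EMF is the product of the 15 effort multipliers."""
    if len(effort_multipliers) != 15:
        raise ValueError("intermediate COCOMO uses exactly 15 effort multipliers")
    emf = math.prod(effort_multipliers)
    return a * kloc ** b * emf


# Usage: a 20-KLOC project with all multipliers at their nominal value 1.0,
# using the organic-mode coefficients A = 3.2 and B = 1.05
effort = intermediate_cocomo_effort(20, [1.0] * 15, a=3.2, b=1.05)
```

Because the effort grows as KLOC raised to the power B, a small change in B shifts the estimates of large projects much more than those of small ones. This is why the coefficients are the target of the optimization.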

Estimating the effort of a software project from its size and the effort multipliers provided in the COCOMO model is difficult because the relation between the cost factors and effort is not linear. The accuracy of effort, cost, and schedule estimates therefore depends heavily on finding appropriate values of the multiplicative and exponential constants A and B. The major objective of this thesis is to find the optimum values of A and B using particle swarm optimization and a genetic algorithm. The proposed approach consists of two steps. In the first step, local optimum solutions are generated, and in the second step, the optimized coefficient values are produced from them. We focus on optimizing the intermediate COCOMO multiplicative constant A and exponential constant B using the hybrid PSO_GA. In the training phase, the inputs are the actual effort, KLOC, and the 15 other attributes of each project, and the outputs are the optimized values of A and B. The testing process is then executed using these new values of A and B.

3.2. Design of the proposed model

We hybridize two meta-heuristic algorithms, Particle Swarm Optimization and the Genetic Algorithm. In the PSO algorithm, every particle helps to find the best global position by updating its personal best position until the maximum number of iterations is reached. The PSO algorithm is therefore less likely to lose good local solutions, which helps the swarm reach the global optimum. PSO has a memory, so knowledge of good solutions is retained by all the particles, whereas in a GA, previous knowledge of the problem is discarded once the population changes (Kao & Zahara, 2008). Consequently, a GA can lose optimum solutions and may converge to a local minimum before finding the global minimum. To address this problem, we use the PSO algorithm to generate an initial solution and the GA to optimize the coefficient values obtained by PSO.


Figure 3.1. The proposed PSO_GA model system architecture

The proposed model has three phases. In the first phase, initial values of A and B are generated, and a local optimum solution for A and B is produced using PSO. In the second phase, the local optimum values of A and B are further optimized using GA. In the third phase, the effort of the testing projects is estimated using the new optimized values of A and B. The following sections explain each phase of the proposed model.

3.3. Generating Initial solution using Particle Swarm Optimization

The particle swarm optimization algorithm produces the initial solution that helps the genetic algorithm produce an optimized result. Figure 3.2 shows the steps to generate the initial optimized values of A and B.

Figure 3.2. Flow chart diagram for PSO (train dataset → PSO algorithm: generate initial values of A and B → calculate the fitness value of each A and B → assign the personal best (Pbest) of each A and B → if the current fitness of A and B is better than Pbest, assign it as Pbest, otherwise keep the previous Pbest → assign the best Pbest of A and B to the global best (Gbest) → calculate the velocity of each A and B → update each A and B using the velocity → if the maximum iteration is reached, pass the result to the GA, otherwise repeat).

3.3.1. Generating initial random values and assigning the Pbest of A and B

In the PSO algorithm, each swarm particle represents a candidate pair of A and B. The first step is generating random values of A and B. These values form the initial solution and are optimized throughout the process. The number of particles depends on the problem, and appropriate parameter selection in PSO and GA plays a significant role in the performance and efficiency of software effort estimation. We generated 1000 particles, that is, 1000 values each of A and B, with random initial positions and velocities. These randomly initialized values of A and B are candidate solutions to the problem, and in each iteration of the PSO algorithm, the optimized values of A and B are generated from them. The candidate solutions move through the search space, and their positions are evaluated with a fitness function. In this step, we set up the memory of the swarm and store the randomized A and B in it. Each particle is given a random position, and its personal best is its initial position. Once A and B are generated and their personal bests are assigned, we set the inertia weight (w) to 1.2 and the acceleration coefficients c1 and c2 to 2.0, since these are widely used and recommended values in the literature (Langsari & Sarno, 2017) (HE, MA, & ZHANG, 2016). Our model also produced better results with these values. The personal acceleration coefficient (c1) controls the particles' acceleration towards their personal best positions, and the global acceleration coefficient (c2) controls their acceleration towards the global best position. The inertia weight describes the effect of the previous velocity on the current velocity.

3.3.2. Calculate the fitness function

The fitness function measures how close each particle's (A, B) position in the search space is to the desired goal, which helps the algorithm determine the next best step for each A and B. The goal in this problem is to minimize the gap between the effort estimated with A and B and the actual effort provided in the dataset. In the proposed model, the fitness function is the sum of the absolute differences between the estimated effort and the actual effort (the Manhattan distance, MD). We therefore search for parameter values that minimize this function. The Manhattan distance is computed using equation 3.3.


$$\text{MD} = \sum_{i=1}^{n} \left| \text{Actual Effort}_i - \text{Estimated Effort}_i \right| \qquad (3.3)$$

where n is the number of projects in the dataset used to build the model. If the personal best value is larger than the newly calculated value (the fitness candidate), the fitness candidate is assigned to the personal best value and its position becomes the personal best position. Likewise, if the global best value is larger than the fitness candidate, the fitness candidate is assigned to the global best value and its position becomes the global best position.
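A minimal sketch of the MD fitness (3.3) and of the Pbest/Gbest update rule described above is shown below. It assumes that each training project is a hypothetical `(kloc, emf, actual_effort)` tuple with EMF already computed, and that the estimated effort follows equation (3.1). The function names and the toy projects are illustrative only.

```python
def estimated_effort(a, b, kloc, emf):
    """Eq. (3.1) with the effort multiplier product EMF already computed."""
    return a * kloc ** b * emf


def manhattan_distance(a, b, projects):
    """Fitness of Eq. (3.3): sum of |actual - estimated| over the training projects.
    Each project is a hypothetical (kloc, emf, actual_effort) tuple."""
    return sum(abs(actual - estimated_effort(a, b, kloc, emf))
               for kloc, emf, actual in projects)


def update_bests(positions, pbest_pos, pbest_val, gbest_pos, gbest_val, projects):
    """Apply the Pbest/Gbest rule: a smaller fitness candidate replaces the stored best."""
    for i, (a, b) in enumerate(positions):
        candidate = manhattan_distance(a, b, projects)
        if pbest_val[i] > candidate:        # personal best improved
            pbest_val[i], pbest_pos[i] = candidate, (a, b)
        if gbest_val > candidate:           # global best improved
            gbest_val, gbest_pos = candidate, (a, b)
    return gbest_pos, gbest_val


# Usage: toy projects whose "actual" effort was generated with A = 3.0, B = 1.1
projects = [(kloc, 1.0, 3.0 * kloc ** 1.1) for kloc in (5.0, 12.0, 30.0)]
positions = [(2.0, 1.0), (3.0, 1.1), (4.0, 1.2)]
pbest_pos = list(positions)
pbest_val = [float("inf")] * len(positions)
gbest_pos, gbest_val = update_bests(positions, pbest_pos, pbest_val,
                                    None, float("inf"), projects)
```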

3.3.3. Calculating the velocity and updating the positions of particles A and B

The velocity of a particle determines the direction and speed of its movement. In this phase, we determine the best location each particle has occupied and the best position found by the whole swarm of A and B particles. The new velocity and position of each particle are calculated with the following formulas.

$$V_i(t+1) = w\,V_i(t) + c_1 r_1 \left(Pbest_i(t) - X_i(t)\right) + c_2 r_2 \left(Gbest(t) - X_i(t)\right) \qquad (3.4)$$

$$X_i(t+1) = X_i(t) + V_i(t+1) \qquad (3.5)$$
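Equations (3.4) and (3.5) for a single particle can be sketched as follows. The sketch assumes that r1 and r2 are drawn independently for each dimension (A and B). This is a common convention, but the text does not state it. The defaults follow the values w = 1.2 and c1 = c2 = 2.0 given in Section 3.3.1, and the function name is hypothetical.

```python
import random


def pso_step(x, v, pbest, gbest, w=1.2, c1=2.0, c2=2.0, rng=random):
    """Apply Eq. (3.4) then Eq. (3.5) to one particle whose position is x = [A, B].
    r1 and r2 are drawn per dimension (an assumption)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])     # pull toward the particle's own best
              + c2 * r2 * (gbest[d] - x[d]))    # pull toward the swarm's best
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v


# Usage: one update of a particle at (A, B) = (3.2, 1.05)
x, v = pso_step([3.2, 1.05], [0.1, 0.01], pbest=[3.5, 1.08], gbest=[4.0, 1.09])
```

When a particle already sits on both its personal best and the global best, only the inertia term w·V remains. With w above 1, as here, that term enlarges the velocity at every step.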

This step is repeated until the number of iterations reaches 1000. In each iteration, the global best position is appended to an initially empty array, and the values collected in this array are treated as local optimum solutions. The global optimum may already appear in this step, in which case the genetic algorithm operations cannot improve it further. When the maximum number of iterations is reached, the appended global best positions are passed to the genetic algorithm as its initial population. The genetic algorithm thus starts from local optimum solutions generated by particle swarm optimization instead of spending time finding new local optima with its objective function. It then tries to reach the global optimum using the PSO output as its input. This is where the particle swarm optimization algorithm ends and the genetic algorithm starts. The PSO parameters and the values used in the proposed model are presented in Table 3.1.

| Parameter | Value |
|---|---|
| Number of particles (A and B) | 1000 |
| Number of iterations (generations) | 1000 |
| Fitness function | MD |
| c1 | 2.0 |
| c2 | 2.0 |
| w | 1.2 |
| r1 and r2 | Random number between 0 and 1 |

Table 3.1. PSO parameters and their values
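Putting the PSO stage together, the sketch below runs the swarm with the Table 3.1 parameters and records the Gbest position after every iteration, producing the array that the hybrid hands to the GA. It is a hedged reconstruction, not the thesis code. The search ranges in `BOUNDS`, the initial velocity range, and the clamping of velocities and positions are assumptions added to keep the swarm finite with w = 1.2, because an inertia weight above 1 can otherwise make velocities grow without bound. The toy projects in the usage example are hypothetical.

```python
import random

W, C1, C2 = 1.2, 2.0, 2.0              # inertia weight and acceleration coefficients (Table 3.1)
BOUNDS = [(0.0, 10.0), (0.5, 1.5)]     # hypothetical search ranges for A and B


def md(position, projects):
    """Manhattan-distance fitness (3.3); projects are (kloc, emf, actual_effort) tuples."""
    a, b = position
    return sum(abs(actual - a * kloc ** b * emf) for kloc, emf, actual in projects)


def clamp(value, low, high):
    return max(low, min(high, value))


def pso_stage(projects, n_particles=1000, n_iter=1000, seed=0):
    """Run PSO and return the Gbest position recorded after every iteration."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    # assumption: initial velocities drawn from +/-10% of each range
    v = [[rng.uniform(-0.1, 0.1) * (hi - lo) for lo, hi in BOUNDS] for _ in range(n_particles)]
    pbest = [p[:] for p in x]                        # personal best = initial position
    pbest_val = [md(p, projects) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    archive = []
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                vel = (W * v[i][d] + C1 * r1 * (pbest[i][d] - x[i][d])
                       + C2 * r2 * (gbest[d] - x[i][d]))      # Eq. (3.4)
                v[i][d] = clamp(vel, -(hi - lo), hi - lo)   # assumption: velocity clamp
                x[i][d] = clamp(x[i][d] + v[i][d], lo, hi)  # Eq. (3.5), kept inside bounds
            fit = md(x[i], projects)
            if fit < pbest_val[i]:
                pbest_val[i], pbest[i] = fit, x[i][:]
                if fit < gbest_val:                        # Gbest <= every Pbest
                    gbest_val, gbest = fit, x[i][:]
        archive.append(tuple(gbest))                       # local optimum handed to the GA
    return archive


# Usage with hypothetical projects generated from A = 3.0, B = 1.1 (kept small for speed)
projects = [(kloc, 1.0, 3.0 * kloc ** 1.1) for kloc in (5.0, 12.0, 20.0, 35.0)]
archive = pso_stage(projects, n_particles=30, n_iter=50)
```

The archive holds one entry per iteration. Since Gbest only changes when a strictly better position is found, the MD of successive entries never increases.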

3.4. Optimizing the Coefficient value of A and B using the Genetic Algorithm

Figure 3.3. Genetic algorithm operations in the proposed methodology


3.4.1. Calculate fitness value

The genetic algorithm receives the initial solution for parameters A and B from the particle swarm optimization algorithm. It starts by calculating the fitness value of the received population, using MD as the fitness function. Because the problem is a minimization problem, the objective function returns the individual (A, B) with the minimum fitness value. This value is used as input to the selection process.

3.4.2. Selection

In this step, the individual values of A and B with the minimum fitness values are selected as parents to produce offspring for the next generation. To select the best parents, we used a fitness-based selection mechanism.
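The text does not specify the exact fitness-based selection scheme, so the sketch below assumes truncation selection: rank the population by MD and keep the best individuals as parents. The project tuple layout `(kloc, emf, actual_effort)`, the function names, and the toy data are hypothetical.

```python
def md(individual, projects):
    """MD fitness (3.3) of an individual (A, B); projects are (kloc, emf, actual) tuples."""
    a, b = individual
    return sum(abs(actual - a * kloc ** b * emf) for kloc, emf, actual in projects)


def select_parents(population, projects, n_parents):
    """Fitness-based (truncation) selection: the n_parents individuals with the
    smallest MD, best first."""
    return sorted(population, key=lambda ind: md(ind, projects))[:n_parents]


# Usage with hypothetical data generated from A = 3.0, B = 1.1
projects = [(kloc, 1.0, 3.0 * kloc ** 1.1) for kloc in (5.0, 12.0, 30.0)]
population = [(2.0, 1.0), (3.0, 1.1), (4.0, 1.2), (3.3, 1.08)]
parents = select_parents(population, projects, n_parents=2)
```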

3.4.3. Crossover

Recombination combines the genetic information of two parents to form new offspring, so that newly created individuals are likely to be better than their parents. We used one-point crossover because there are only two decision variables (A and B) to be optimized.
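With a chromosome of only two genes (A, B), one-point crossover has a single interior cut point, so the two children simply exchange their B values. Below is a minimal sketch with a hypothetical function name and example values.

```python
def one_point_crossover(parent1, parent2, cut=1):
    """One-point crossover for (A, B) chromosomes. With two genes the only
    interior cut is 1, so the children exchange their B values."""
    if not 0 < cut < len(parent1):
        raise ValueError("cut point must lie strictly inside the chromosome")
    child1 = tuple(parent1[:cut]) + tuple(parent2[cut:])
    child2 = tuple(parent2[:cut]) + tuple(parent1[cut:])
    return child1, child2


# Usage with two hypothetical parents (A, B)
children = one_point_crossover((3.9, 1.09), (4.1, 1.10))
print(children)  # ((3.9, 1.1), (4.1, 1.09))
```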

3.4.4. Mutations

To maintain variation in the chromosomes from one generation to the next, we used a mutation rate of 0.02. This value was selected based on earlier work that achieved better results with the same population size, crossover type, and rate values (Hassanat, Almohammadi, Alkafaween, & et al, 2019). The choice of crossover and mutation rates is an important aspect of GA design. Mutation prevents the chromosomes of the population from becoming too similar in each generation and can lead the search to a better solution. The GA parameters and their values are presented in Table 3.2.

| Parameter | Value |
|---|---|
| Population size | 1000 |
| Number of generations | 100 |
| Crossover | Single-point crossover |
| Fitness function | MD |
| Mutation rate | 0.02 |
| Selection | Fitness-based selection |

Table 3.2. GA parameters and their values
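A compact sketch of the GA stage with the Table 3.2 settings (mutation rate 0.02, one-point crossover, fitness-based selection, MD fitness) is given below. Several details are not specified in the text and are assumptions here. Selection keeps the better half of the population, which also preserves the best individual. Crossover is applied to every selected pair. Mutation adds Gaussian noise to a gene and clamps it to hypothetical bounds. The initial population would be the PSO Gbest archive. Names and toy data are hypothetical.

```python
import random

MUTATION_RATE = 0.02                   # Table 3.2
BOUNDS = [(0.0, 10.0), (0.5, 1.5)]     # hypothetical ranges for A and B


def md(individual, projects):
    """MD fitness (3.3); projects are (kloc, emf, actual_effort) tuples."""
    a, b = individual
    return sum(abs(actual - a * kloc ** b * emf) for kloc, emf, actual in projects)


def mutate(individual, rng):
    """Assumed Gaussian mutation: each gene changes with probability MUTATION_RATE."""
    genes = list(individual)
    for d, (lo, hi) in enumerate(BOUNDS):
        if rng.random() < MUTATION_RATE:
            genes[d] = min(hi, max(lo, genes[d] + rng.gauss(0.0, 0.05 * (hi - lo))))
    return tuple(genes)


def ga_stage(initial_population, projects, generations=100, seed=0):
    """Refine an initial population (e.g. the PSO Gbest archive) and return the
    individual (A, B) with the smallest MD. Needs at least two individuals."""
    rng = random.Random(seed)
    pop = [tuple(ind) for ind in initial_population]
    size = len(pop)
    for _ in range(generations):
        pop.sort(key=lambda ind: md(ind, projects))        # evaluate fitness
        parents = pop[:max(2, size // 2)]                  # keep the better half
        children = []
        while len(parents) + len(children) < size:
            p1, p2 = rng.sample(parents, 2)
            child = (p1[0], p2[1])                         # one-point crossover, cut between A and B
            children.append(mutate(child, rng))
        pop = parents + children
    return min(pop, key=lambda ind: md(ind, projects))


# Usage with a small hypothetical population and toy projects (A = 3.0, B = 1.1)
projects = [(kloc, 1.0, 3.0 * kloc ** 1.1) for kloc in (5.0, 12.0, 30.0)]
initial = [(2.0, 1.0), (3.4, 1.05), (4.0, 1.2), (2.6, 1.12), (3.1, 1.15), (2.9, 1.07)]
best = ga_stage(initial, projects, generations=20)
```

Because the better half survives unchanged, the best MD never gets worse from one generation to the next. The GA can therefore only improve on the solution it receives from PSO.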

CHAPTER FOUR

EXPERIMENTAL RESULT AND DISCUSSION

4.1. Introduction

In this chapter, the experimental results and their interpretation using evaluation criteria are discussed. Optimized parameter values are generated for the organic, semi-detached, and embedded classes of the intermediate COCOMO model, and the results are evaluated using five software cost estimation evaluation criteria. Using these new optimized parameter values, the estimated effort for each class of project in each mode of intermediate COCOMO is presented. The effort estimates of our proposed model are also compared with those of the GA and PSO models. The dataset, its attribute descriptions, the simulation environment, and the evaluation metrics are also discussed.

4.2. Dataset Description

The data for this research come from the COCOMO NASA datasets collected by Jairus Hihn (Promise software engineering Repository, 2005). The datasets contain 60, 63, and 93 software projects, respectively, and each project has 17 attributes. We used 70% of the data to optimize the parameter values of the COCOMO model and 30% to test the model. Projects belong to three classes: organic, semi-detached, and embedded. These classes are distinguished by lines of code, programmer experience, and flexibility of requirements. For each class of project, we used separate training and testing data. These datasets have been used by many researchers, including (Maleki, Ghaffar, & Masdari, 2014) (Langsari & Sarno, 2017) (Rohit Kumar Sachan, 2016) (Singh, Singh, & Mishra, 2018) (Algabri, Saeed, Mathkour, & et al, 2015).

The dataset attributes are grouped into four classes: product attributes, computer (hardware) attributes, personnel attributes, and project attributes. In the COCOMO model, these attributes are called effort multipliers or cost drivers, and they directly affect the cost of a software project. There are three effort multipliers among the product attributes, four among the computer attributes, five among the personnel attributes, and three among the project attributes. The remaining two attributes are the software size, measured in lines of code, which largely determines the effort of a software project, and the actual effort that the project needed to complete. Table 4.1 shows the 17 attributes of the dataset.

| Attribute class | Attribute name | Code |
|---|---|---|
| Product attributes | Required software reliability | RELY |
| | Database size | DATA |
| | Product complexity | CPLX |
| Computer attributes | Execution time constraint (CPU) | TIME |
| | Main memory constraint | STOR |
| | Virtual machine volatility | VIRT |
| | Turnaround time | TURN |
| Personnel attributes | Analyst capability | ACAP |
| | Application experience | AEXP |
| | Programmer capability | PCAP |
| | Virtual machine experience | VEXP |
| | Language experience | LEXP |
| Project attributes | Modern programming practices | MODP |
| | Use of software tools | TOOL |
| | Schedule constraint | SCED |
| Project size | Lines of code | LOC |
| Actual effort of project | Actual effort | ACT_EFFORT |

Table 4.1. Dataset attribute classes, names, and codes

In the dataset, the effort multipliers are given as nominal ratings from Very Low to Extra High. We converted these nominal values to their equivalent numerical weights using Table 2.5. Since the dataset contains organic, semi-detached, and embedded projects, each project was assigned to its class using Table 2.2.
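The conversion from nominal ratings to numeric weights, and the 70/30 training/testing split described in Section 4.2, can be sketched as follows. This is a hedged illustration. The rating abbreviations and function names are hypothetical, and only two drivers (RELY and ACAP) are filled in. Their values come from Boehm's published intermediate COCOMO 81 table, on the assumption that Table 2.5 uses the same values. The split uses a seeded random shuffle, which the text does not specify.

```python
import math
import random

# Excerpt of Boehm's intermediate COCOMO 81 multiplier table for two drivers.
# Assumption: Table 2.5 of this thesis uses the same values; the other 13
# drivers would be added in the same way.
MULTIPLIERS = {
    "RELY": {"VL": 0.75, "L": 0.88, "N": 1.00, "H": 1.15, "VH": 1.40},
    "ACAP": {"VL": 1.46, "L": 1.19, "N": 1.00, "H": 0.86, "VH": 0.71},
}


def to_numeric(ratings):
    """Map nominal ratings such as {"RELY": "H"} to numeric effort multipliers."""
    return {driver: MULTIPLIERS[driver][level] for driver, level in ratings.items()}


def emf(ratings):
    """Product of the converted multipliers (EMF of Eq. 3.2)."""
    return math.prod(to_numeric(ratings).values())


def split_70_30(projects, seed=0):
    """Shuffle the projects and return (training 70%, testing 30%)."""
    shuffled = list(projects)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Usage: a project rated High for both RELY and ACAP, and a 60-project split
factor = emf({"RELY": "H", "ACAP": "H"})   # 1.15 * 0.86
train, test = split_70_30(range(60))
```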

4.3. Simulation environment

We used the Anaconda environment, an open-source Python distribution that bundles modules and packages for scientific computing. Pandas was used for data manipulation and analysis. NumPy was used to handle the multi-dimensional input arrays from which the fitness values of individual particles in PSO and of the population in the genetic algorithm are calculated. The experiments were run on a machine with an Intel(R) Core(TM) i5-3230M CPU @ 2.60 GHz and 8 GB of RAM.

4.4. Experiment results

4.4.1. Experimental Result for Organic Model on NASA60 dataset

The organic mode of the intermediate COCOMO model covers software projects smaller than 50 KLOC. In the COCOMO model, the values of A and B for this mode are 3.2 and 1.05, respectively. Using the two meta-heuristic algorithms in combination, we obtained the following newly optimized coefficients for the intermediate organic COCOMO model: A = 4.0026 and B = 1.0931. These new values of A and B are used to calculate the effort of the test projects to show the effect of the proposed model on software effort estimation.

Using the genetic algorithm alone, we found the optimized coefficients A = 4.7813 and B = 0.9833 for the organic intermediate COCOMO model. Using the particle swarm optimization algorithm alone, we found A = 4.9426 and B = 1.0265.

Table 4.2 shows the estimated effort for the testing dataset using the new optimized values of A and B. In the table, the second column gives the software project size in KLOC, and the third column gives the actual effort provided in the dataset. The remaining columns give the effort estimated by each model. The comparison shows that the PSO_GA model estimates every testing project close to its actual effort and is more accurate than the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models. The COCOMO model underestimates the effort of all projects, and its estimates deviate more from the actual effort provided in the dataset. Figure 4.1 plots the actual effort and the effort estimated by the COCOMO and PSO_GA models. The PSO_GA curve follows the actual effort curve, while the COCOMO curve underestimates the effort of all testing projects. This shows that our model estimates software effort better than the COCOMO model.

Figure 4.1. Actual, COCOMO, and PSO_GA effort for the organic model


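Table 4.2. Estimated effort for the organic model

| No | KLOC | Actual effort | PSO_GA estimated effort | GA estimated effort | PSO estimated effort | (Algabri, Saeed, Mathkour, & et al, 2015) effort | COCOMO effort |
|---|---|---|---|---|---|---|---|
| 1 | 20 | 48 | 49 | 42 | 50 | 46 | 35 |
| 2 | 16.3 | 82 | 82 | 72 | 84 | 77 | 58 |
| 3 | 25.9 | 117.6 | 114.1 | 95.4 | 113.5 | 103.2 | 79.3 |
| 4 | 12.8 | 62 | 62 | 56 | 65 | 60 | 45 |
| 5 | 14.0 | 60 | 62 | 56 | 65 | 60 | 44 |
| 6 | 19.3 | 155 | 141 | 122 | 143 | 131 | 99 |
| 7 | 6.5 | 42 | 40 | 39 | 44 | 42 | 30 |
| 8 | 35.5 | 192 | 192 | 155 | 186 | 168 | 131 |
| 9 | 47.5 | 252 | 233 | 182 | 223 | 199 | 158 |
| 10 | 11.3 | 36 | 34 | 32 | 36 | 34 | 25 |
| 11 | 8.0 | 42 | 44 | 42 | 48 | 45 | 32 |
| 12 | 7.7 | 31.2 | 30.3 | 28.9 | 32.6 | 30.7 | 22.1 |
| 13 | 16.0 | 114 | 105 | 93 | 108 | 100 | 75 |
| 14 | 8.2 | 36.0 | 32 | 30 | 34 | 32 | 23 |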

Table 4.3 compares the accuracy of our model with the COCOMO, GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and (Nadal & Sangwan, 2018) models using five estimation criteria. For PRED(0.25), higher values are better. For the remaining metrics, lower values are better because they measure estimation error. The PRED(0.25) value of the COCOMO model is 0.05, which means that only 5% of the testing projects have an MRE below 0.25. This is the lowest value among the PSO_GA, PSO, GA, (Algabri, Saeed, Mathkour, & et al, 2015), and (Nadal & Sangwan, 2018) models. Overall, the proposed PSO_GA model has lower relative errors and a higher PRED(0.25) value.

For example, the MRE of our PSO_GA model is 0.6084, whereas the MRE of the COCOMO, GA, and PSO models is 4.2291, 1.8137, and 0.8265 respectively. The proposed PSO_GA model therefore reduces the MRE by 362.07%, 120.53%, and 21.81% respectively, and by 46.99% compared with the (Algabri, Saeed, Mathkour, & et al, 2015) model, whose MRE is 1.0783. The PRED (0.25) value of the GA model is 0.95, which means that 95% of the testing projects have an MRE below 0.25. For the PSO_GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models the PRED (0.25) value is 1, which indicates that all the testing organic projects in the dataset have an MRE below 0.25. Across the five evaluation metrics, the proposed PSO_GA model has the lowest errors and the maximum PRED (0.25) value. We therefore conclude that the PSO_GA model is satisfactory and that the effort of organic intermediate COCOMO software projects should be estimated with the new parameter values generated by the proposed model.

Approach | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO model | 0.05 | 4.2291 | 29.2062 | 30.2084 | 0.3020
(Algabri, Saeed, Mathkour, & et al, 2015) model | 1 | 1.0783 | 10.3188 | 7.7024 | 0.0770
PSO model | 1 | 0.8265 | 5.7455 | 5.9042 | 0.0590
GA model | 0.95 | 1.8137 | 15.7269 | 12.9551 | 0.1295
PSO_GA model | 1 | 0.6084 | 4.1821 | 4.3462 | 0.0434

Table 4. 3. Relative errors comparison between models using evaluation criteria

Figure 4. 2. Relative Error Figure for organic model
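The numbers in Table 4.3 are consistent with the definitions below, for n testing projects with actual effort act_i and estimated effort est_i. These definitions are inferred from the tables rather than stated in this section. In particular, the column labelled MRE appears to be the sum of the per-project MREs: dividing the COCOMO value by n = 14 gives 4.2291 / 14 ≈ 0.302, which matches the MMRE of 0.3020, and MAPE is the MMRE expressed as a percentage.

```latex
\mathrm{MRE}_i = \frac{|act_i - est_i|}{act_i}, \qquad
\mathrm{MRE} = \sum_{i=1}^{n} \mathrm{MRE}_i, \qquad
\mathrm{MMRE} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MRE}_i, \qquad
\mathrm{MAPE} = 100 \cdot \mathrm{MMRE}

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |act_i - est_i|, \qquad
\mathrm{PRED}(0.25) = \frac{\left|\{\, i : \mathrm{MRE}_i \le 0.25 \,\}\right|}{n}
```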

4.4.2. Experimental result for Semi-detached COCOMO Model on NASA60 dataset

The semi-detached model includes projects whose size is between 50 and 300 KLOC. The proposed model was trained on the semi-detached projects and produced the newly optimized coefficients A = 4.8129 and B = 1.0208. The current semi-detached intermediate COCOMO model uses A = 3.0 and B = 1.12.

To analyze the effect of the proposed model, we also used the genetic and PSO algorithms individually to generate the COCOMO coefficients for the semi-detached model. The genetic algorithm produced A = 4.9992 and B = 1.0094, and the particle swarm optimization algorithm produced A = 4.0003 and B = 1.0524.
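The tuned coefficients combine a larger A with a smaller B, so they predict more effort than the default for small projects and less for large ones. The sketch below, which assumes EAF = 1 (so the printed numbers are nominal effort, not the Table 4.4 values), compares the two curves and solves for the size at which they cross. Because EAF multiplies both curves equally, the crossover point does not depend on EAF. It comes out at roughly 117 KLOC, which is consistent with Table 4.4: PSO_GA exceeds COCOMO for the 50 and 78 KLOC projects and is lower for the four projects above 170 KLOC. The function name is illustrative.

```python
def intermediate_cocomo_effort(kloc, a, b, eaf=1.0):
    """Intermediate COCOMO effort in person-months: a * KLOC**b * EAF."""
    return a * kloc ** b * eaf


DEFAULT_SEMI = (3.0, 1.12)      # standard semi-detached coefficients
PSO_GA_SEMI = (4.8129, 1.0208)  # coefficients reported for the PSO_GA model

# Size where both curves give the same effort (EAF cancels out):
# a1 * k**b1 = a2 * k**b2  =>  k = (a2 / a1) ** (1 / (b1 - b2))
crossover_kloc = (PSO_GA_SEMI[0] / DEFAULT_SEMI[0]) ** (1 / (DEFAULT_SEMI[1] - PSO_GA_SEMI[1]))

for kloc in (78, 282.1):        # sizes of two semi-detached test projects in Table 4.4
    default = intermediate_cocomo_effort(kloc, *DEFAULT_SEMI)
    tuned = intermediate_cocomo_effort(kloc, *PSO_GA_SEMI)
    print(f"{kloc:>6} KLOC: default {default:7.1f}  PSO_GA {tuned:7.1f}")
print(f"curves cross at about {crossover_kloc:.0f} KLOC")
```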

Table 4.4 presents the actual effort and the effort estimated by the PSO_GA, GA, PSO, COCOMO, (Nadal & Sangwan, 2018), and (Algabri, Saeed, Mathkour, & et al, 2015) models for the testing semi-detached NASA60 projects. The table shows that the proposed PSO_GA model achieved better results for most of the testing projects. We can therefore say that the proposed PSO_GA model estimates the effort of software projects better than the other models listed in Table 4.4.

No | KLOC | Actual effort | PSO_GA estimated effort | GA estimated effort | PSO estimated effort | (Algabri, Saeed, Mathkour, & et al, 2015) effort | COCOMO effort
1 | 78 | 571.4 | 571.3 | 564.6 | 544.9 | 541.0 | 548.6
2 | 177.9 | 1248 | 1240 | 1214 | 1214 | 1154 | 1292
3 | 190 | 420 | 416 | 407 | 408 | 387 | 436
4 | 50 | 370 | 314 | 312 | 295 | 300 | 288
5 | 219 | 2120 | 1418 | 1386 | 1398 | 1315 | 1509
6 | 282.1 | 1368 | 1044 | 1017 | 1037 | 962 | 1139

Table 4. 4. Estimated effort for semi-detached model

Figure 4. 3. MRE for semi-detached model

Table 4.5 presents the error rates of the semi-detached COCOMO model using PRED (0.25), MRE, MAE, MAPE, and MMRE. The proposed model has the lowest MRE, MAPE, and MMRE in the table, and its PRED (0.25) value of 0.8333 is the highest, shared with the COCOMO model. The COCOMO model has lower relative errors than the GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models. The MRE of our proposed PSO_GA model is lower than that of the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models by 0.0947, 0.1510, 0.3378, and 0.0581 respectively, which means that our model reduces the effort errors by 9.47%, 15.10%, 33.78%, and 5.81% respectively.

In terms of mean magnitude of relative error (MMRE), the PSO_GA model is better by 0.0158, 0.0252, 0.0563, and 0.0097 respectively. The mean absolute percentage error (MAPE) of the proposed model is 12.2083, the lowest percentage error in the table. The mean absolute error (MAE) of our model is 181.9098, against 199.1846, 199.6587, 239.3911, and 167.4203 for the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models respectively. The PSO_GA model therefore reduces the MAE by 17.2748, 17.7489, and 57.4813 compared with the GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models, but its MAE is 14.4895 higher than that of the COCOMO model. Overall, the proposed model achieved the best or equal-best result in four of the five evaluation metrics, and we can say that the effort of semi-detached projects should be calculated using our newly optimized coefficients A and B.


Approach | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO model | 0.8333 | 0.7906 | 167.4203 | 13.1771 | 0.1317
(Algabri, Saeed, Mathkour, & et al, 2015) model | 0.8 | 1.0703 | 239.3911 | 17.8395 | 0.1783
PSO model | 0.8 | 0.8835 | 199.6587 | 14.7260 | 0.1472
GA model | 0.8 | 0.8272 | 199.1846 | 13.7876 | 0.1378
PSO_GA model | 0.8333 | 0.7325 | 181.9098 | 12.2083 | 0.1220

Table 4. 5. Semi-detached model evaluation using SCE evaluation metrics
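The error reductions quoted in this section appear to be absolute differences between table entries. For example, 0.8272 − 0.7325 = 0.0947 is reported as 9.47%. The sketch below computes that absolute difference together with the relative reduction (the difference divided by the baseline value), which is the more common reading of "reduces errors by x%". A negative value means that PSO_GA is worse, as it is for MAE against COCOMO. The dictionary labels and the helper name are illustrative; the values are copied from Table 4.5.

```python
# Values from Table 4.5 (semi-detached projects, NASA60).
mre = {"COCOMO": 0.7906, "GA": 0.8272, "PSO": 0.8835, "Algabri et al. (2015)": 1.0703}
mae = {"COCOMO": 167.4203, "GA": 199.1846, "PSO": 199.6587, "Algabri et al. (2015)": 239.3911}
PSO_GA_MRE, PSO_GA_MAE = 0.7325, 181.9098


def reductions(baseline, proposed):
    """Return (absolute difference, relative reduction in %) of proposed vs. baseline."""
    diff = baseline - proposed
    return diff, 100 * diff / baseline


for name in mre:
    d_mre, rel_mre = reductions(mre[name], PSO_GA_MRE)
    d_mae, rel_mae = reductions(mae[name], PSO_GA_MAE)
    print(f"{name:<22} MRE diff {d_mre:+.4f} ({rel_mre:+.2f}%)  "
          f"MAE diff {d_mae:+.4f} ({rel_mae:+.2f}%)")
```

For GA, for instance, the 0.0947 absolute MRE difference corresponds to a relative reduction of about 11.45%.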


4.4.3. Experimental result for Embedded COCOMO Model on NASA60 dataset

The embedded model includes projects whose size is greater than 300 KLOC. The proposed model was trained on projects of this size, and we obtained the newly optimized coefficients A = 4.2431 and B = 1.0900 for the embedded intermediate COCOMO model. The current embedded intermediate COCOMO model uses A = 2.8 and B = 1.20.

Using the genetic algorithm individually, we found A = 4.8433 and B = 1.0801, and using the PSO algorithm we obtained A = 3.9983 and B = 1.1295.

Figure 4.4 shows the graph of each model's estimated effort. The COCOMO model over-estimates the effort of all three embedded projects, and the GA and PSO models over-estimate the last two. In contrast, the (Algabri, Saeed, Mathkour, & et al, 2015) model under-estimated the effort of all projects. The graph of our proposed model is almost in line with the actual effort graph. Table 4.6 shows the estimated effort for the testing embedded intermediate COCOMO projects of the NASA60 dataset. For the embedded COCOMO model, there are only three projects in the dataset whose size is greater than 300 KLOC. We used these projects as both training and testing data, and the effort estimated by each model is presented in Table 4.6. In the table, the PSO_GA estimate is satisfactory for the last two projects, whereas the effort of the first project is under-estimated. Compared with the GA, PSO, and COCOMO models, our proposed PSO_GA model achieved better results.


Figure 4. 4. Effort graph for embedded model

Table 4. 6. Effort for Embedded Model

No | KLOC | Actual effort | PSO_GA effort | GA effort | PSO effort | (Algabri, Saeed, Mathkour, & et al, 2015) effort | COCOMO effort
1 | 302 | 2400 | 1956 | 2110 | 2309 | 1361 | 2419
2 | 370 | 3240 | 3217 | 3463 | 3829 | 2197 | 4068
3 | 423 | 2300 | 2318 | 2492 | 2774 | 1564 | 2975

Table 4.7 presents the comparison between the PSO_GA, GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models using five software cost estimation metrics. The first metric, PRED (0.25), is the proportion of software projects whose MRE is less than or equal to 0.25. For the COCOMO model, only 33.33% of the testing projects have an MRE less than or equal to 0.25. For the GA, PSO, and PSO_GA models, PRED (0.25) is 1, which means that all the testing projects have an MRE less than or equal to 0.25. This indicates that the proposed PSO_GA model is better than the COCOMO model. The proposed model also achieved better results on the other error metrics. For example, the MMRE of the PSO_GA model is about 0.0666, while the MMRE of the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models is about 0.0911, 0.1418, 0.3581, and 0.1858 respectively. The proposed PSO_GA model therefore reduces the MMRE by 2.45%, 7.52%, 29.15%, and 11.92% respectively.

In terms of MAE, the proposed PSO_GA model achieved 161.6897, the lowest mean absolute error in the table. Compared with the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models, the proposed PSO_GA model reduces the MAE by 73.5369, 222.7792, 777.2584, and 346.0795 respectively. The MRE of the proposed model is 0.1999, which is lower than that of the other models in the table. Our model reduces the MRE by 7.34%, 22.56%, 87.44%, and 35.75% compared with the GA, PSO, (Algabri, Saeed, Mathkour, & et al, 2015), and COCOMO models respectively.

Table 4. 7. Comparison of embedded models using evaluation metrics


Approach | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO model | 0.3333 | 0.5574 | 507.7692 | 18.5814 | 0.1858
(Algabri, Saeed, Mathkour, & et al, 2015) | - | 1.0743 | 938.9481 | 35.8132 | 0.3581
PSO model | 1 | 0.4255 | 384.4689 | 14.1844 | 0.1418
GA model | 1 | 0.2733 | 235.2266 | 9.1131 | 0.0911
PSO_GA model | 1 | 0.1999 | 161.6897 | 6.6656 | 0.0666
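As a consistency check, the PSO_GA and COCOMO rows of Table 4.7 can be recomputed from the estimates in Table 4.6, using the metric definitions implied by the tables (the MRE column is taken to be the sum of the per-project MREs). The helper below is an illustrative sketch, not the thesis code. Small differences in the last digits are expected because Table 4.6 prints rounded estimates. The recomputed PSO_GA values come out at about 0.1999 (MRE) and 161.67 (MAE), and the COCOMO PRED (0.25) comes out at 1/3.

```python
def evaluate(actual, estimated, level=0.25):
    """Compute the five metrics used in this chapter for one model."""
    n = len(actual)
    mres = [abs(a - e) / a for a, e in zip(actual, estimated)]
    return {
        "PRED": sum(m <= level for m in mres) / n,   # share of projects with MRE <= level
        "MRE": sum(mres),                             # summed MRE, as in the MRE column
        "MAE": sum(abs(a - e) for a, e in zip(actual, estimated)) / n,
        "MAPE": 100 * sum(mres) / n,
        "MMRE": sum(mres) / n,
    }


# Actual and estimated effort of the three embedded NASA60 projects (Table 4.6).
actual = [2400, 3240, 2300]
estimates = {
    "PSO_GA": [1956, 3217, 2318],
    "COCOMO": [2419, 4068, 2975],
}

results = {name: evaluate(actual, est) for name, est in estimates.items()}
for name, m in results.items():
    print(name, {k: round(v, 4) for k, v in m.items()})
```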

4.4.4. PSO_GA model effort comparison with research done by (Maleki, Ghaffar, & Masdari, 2014) on the NASA60 dataset

In (Maleki, Ghaffar, & Masdari, 2014), genetic and ant colony optimization algorithms were used to find suitable values of the effort multipliers and to optimize the COCOMO coefficients respectively. The results of their experiment are presented in Table 4.9. Their model has higher magnitudes of relative error than our proposed model, which means that our model performs better. Table 4.8 presents the MRE of each model. The MRE of our proposed model is lower for most of the testing projects, and the total MRE is reduced from 199.26 to 171.83 compared with the (Maleki, Ghaffar, & Masdari, 2014) model. By finding suitable effort multiplier values, they achieved a lower MRE than the original COCOMO model, but their MRE is higher than that of our proposed model for most of the testing projects in the dataset. The total MRE of the original COCOMO model is 374.77, the highest value in the table, and the proposed model reduces this total to 171.83. Finally, we can say that our proposed model estimates software project effort better than the (Maleki, Ghaffar, & Masdari, 2014) and COCOMO models.

Table 4. 8. MRE of PSO_GA, COCOMO and (Maleki, Ghaffar, & Masdari, 2014) model

No | Project No | KLOC | MRE of COCOMO by (Maleki, Ghaffar, & Masdari, 2014) (%) | MRE using original COCOMO (%) | MRE using PSO_GA (%) | MRE by (Maleki, Ghaffar, & Masdari, 2014) (%)
1 | 9 | 10.4 | 28.08 | 34.26 | 9.04 | 16.60
2 | 11 | 16.0 | 26.99 | 34.14 | 7.17 | 10.54
3 | 13 | 13.0 | 46.02 | 9.35 | 26.63 | 33.88
4 | 16 | 15.0 | 25.15 | 38.99 | 14.24 | 8.67
5 | 28 | 7.7 | 34.14 | 28.85 | 2.82 | 19.32
6 | 37 | 100 | 37.25 | 57.06 | 43.54 | 12.36
7 | 43 | 20 | 37.64 | 55.37 | 38.50 | 15.30
8 | 47 | 370 | 44.43 | 40.89 | 0.70 | 31.93
9 | 57 | 282.1 | 28.79 | 40.13 | 20.46 | 23.16
10 | 60 | 19.3 | 25.57 | 35.77 | 8.73 | 27.50
Total MRE | | | 334.06 | 374.77 | 171.83 | 199.26
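The totals and the "most of the testing projects" claim can be checked directly from the per-project values in Table 4.8. In the sketch below (variable names are illustrative), the PSO_GA and (Maleki, Ghaffar, & Masdari, 2014) totals match the table exactly. The original COCOMO column sums to 374.81, which is within rounding of the printed 374.77. PSO_GA has the lower MRE in 7 of the 10 projects.

```python
# Per-project MRE values (%) from Table 4.8.
mre_original_cocomo = [34.26, 34.14, 9.35, 38.99, 28.85, 57.06, 55.37, 40.89, 40.13, 35.77]
mre_pso_ga = [9.04, 7.17, 26.63, 14.24, 2.82, 43.54, 38.50, 0.70, 20.46, 8.73]
mre_maleki = [16.60, 10.54, 33.88, 8.67, 19.32, 12.36, 15.30, 31.93, 23.16, 27.50]

total_pso_ga = sum(mre_pso_ga)
total_maleki = sum(mre_maleki)
total_original = sum(mre_original_cocomo)
# Count projects where PSO_GA has a strictly lower MRE than Maleki et al.
wins_vs_maleki = sum(p < m for p, m in zip(mre_pso_ga, mre_maleki))

print(f"total MRE: PSO_GA {total_pso_ga:.2f}, Maleki {total_maleki:.2f}, "
      f"original COCOMO {total_original:.2f}")
print(f"PSO_GA has the lower MRE in {wins_vs_maleki} of {len(mre_pso_ga)} projects")
```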


Table 4.9 presents the effort estimated by PSO_GA, the original COCOMO model, and (Maleki, Ghaffar, & Masdari, 2014), together with the actual effort provided in the dataset. Our estimates are closer to the actual effort than the COCOMO effort obtained after adjusting the effort multipliers in all ten projects, and closer than the original COCOMO effort in eight of the ten projects. Compared with the effort estimated by (Maleki, Ghaffar, & Masdari, 2014), the PSO_GA estimate is closer to the actual effort provided in the NASA60 dataset in eight of the ten projects.

Table 4. 9. Estimated Effort of models

No | KLOC | Actual effort | PSO_GA effort | Effort by (Maleki, Ghaffar, & Masdari, 2014) | Original COCOMO effort | COCOMO effort by (Maleki, Ghaffar, & Masdari, 2014)
1 | 29.5 | 120 | 142.15 | 131.97 | 98.22 | 92.88
2 | 19.3 | 155 | 141.45 | 134.77 | 99.54 | 99.54
3 | 32.6 | 170 | 174.96 | 129.85 | 120.37 | 120.38
4 | 35.5 | 192 | 192.05 | 142.01 | 131.64 | 131.65
5 | 38 | 210 | 220.87 | 171.78 | 150.95 | 150.96
6 | 48.5 | 239 | 270.11 | 197.06 | 182.68 | 182.68
7 | 47.5 | 252 | 233.60 | 220.86 | 158.13 | 158.13
8 | 70 | 278 | 292.56 | 352.91 | 277.95 | 220.21
9 | 66.6 | 300 | 307.29 | 310.33 | 290.51 | 230.97
10 | 66.6 | 352.8 | 284.53 | 310.33 | 268.99 | 230.97
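The project-by-project comparison can be checked from the columns of Table 4.9. The sketch below (the helper name is illustrative) counts the projects in which the PSO_GA estimate is strictly closer to the actual effort than each alternative. It gives 8 of 10 against the (Maleki, Ghaffar, & Masdari, 2014) effort, 8 of 10 against the original COCOMO effort (projects 1 and 8 are the exceptions), and 10 of 10 against the COCOMO effort with adjusted effort multipliers.

```python
# Columns of Table 4.9 (effort, NASA60).
actual = [120, 155, 170, 192, 210, 239, 252, 278, 300, 352.8]
pso_ga = [142.15, 141.45, 174.96, 192.05, 220.87, 270.11, 233.60, 292.56, 307.29, 284.53]
maleki = [131.97, 134.77, 129.85, 142.01, 171.78, 197.06, 220.86, 352.91, 310.33, 310.33]
original_cocomo = [98.22, 99.54, 120.37, 131.64, 150.95, 182.68, 158.13, 277.95, 290.51, 268.99]
maleki_cocomo = [92.88, 99.54, 120.38, 131.65, 150.96, 182.68, 158.13, 220.21, 230.97, 230.97]


def closer_count(candidate, baseline, actual):
    """Number of projects where candidate is strictly closer to the actual effort than baseline."""
    return sum(abs(a - c) < abs(a - b) for a, c, b in zip(actual, candidate, baseline))


for name, baseline in [("Maleki et al. effort", maleki),
                       ("original COCOMO", original_cocomo),
                       ("COCOMO by Maleki et al.", maleki_cocomo)]:
    print(f"PSO_GA closer than {name}: {closer_count(pso_ga, baseline, actual)} of {len(actual)}")
```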

4.4.5. Experimental Result for organic COCOMO model on NASA63 dataset

The proposed model was trained and tested on the NASA60 dataset. To check that the optimized COCOMO coefficients are also valid for other datasets, we used the NASA63 dataset, which contains 63 software projects. These projects are grouped into organic, semi-detached, and embedded modes according to the size of the software. This section presents the results of the proposed model on the NASA63 organic software projects. To keep the description clear and avoid many tables, only the results for the five software effort and cost evaluation criteria are presented. Table 4.10 presents the evaluation results on the 15 selected organic software projects. The proposed model has the highest PRED (0.25) value, shared with the (Algabri, Saeed, Mathkour, & et al, 2015) model, and the lowest values for the other evaluation criteria. This indicates that our proposed PSO_GA model achieved better results than the other models in the table.

The PRED (0.25) value of the PSO_GA model indicates that 73.33% of the testing projects have an MRE below 0.25. For the COCOMO model, only 40% of the testing projects have an MRE below 0.25, which is the lowest value. Using MRE as the evaluation criterion, the proposed model has the lowest MRE in the table and reduces the total MRE by 1.4265, 0.9014, 0.6046, and 0.7283 compared with the COCOMO, GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models respectively. Using MAE, the PSO_GA model reduces the mean absolute error by 27.2977, 13.8663, 2.4296, and 8.0374 compared with the COCOMO, GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models respectively. Similarly, it reduces the mean absolute percentage error (MAPE) by 9.51%, 6.0095%, 4.0308%, and 4.8550% respectively. In terms of MMRE, our proposed model also achieved a better result by reducing the mean error value. We can therefore say that our proposed model, with the newly optimized coefficients, estimates the effort of organic software projects better.

Table 4. 10. Evaluation of organic model using evaluation criteria

Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.40 | 4.4434 | 40.9818 | 29.6230 | 0.2962
(Algabri, Saeed, Mathkour, & et al, 2015) | 0.7333 | 3.7452 | 21.7215 | 24.9680 | 0.2496
PSO | 0.6666 | 3.6215 | 16.1137 | 24.1438 | 0.2414
GA | 0.6 | 3.9183 | 27.5504 | 26.1225 | 0.2612
PSO_GA | 0.7333 | 3.0169 | 13.6841 | 20.1130 | 0.2011

Figure 4. 5. Effort graph for organic model on NASA63 dataset.

4.4.6. Experimental Result for semi-detached COCOMO model on NASA63 dataset

For the NASA63 semi-detached model, five testing projects were taken from the dataset to evaluate the performance of the proposed model. In Table 4.11, all the listed models achieve a PRED (0.25) value of 0.8, which means that 80% of the testing projects have an MRE below 0.25 in every model, so PRED (0.25) cannot be used to compare them. On the other evaluation metrics, our proposed model achieved better results. The MRE of the proposed model is 0.6414. Using MRE as the evaluation criterion, our proposed model reduces the errors by 9.45%, 5.84%, 16.57%, and 24.79% compared with the COCOMO, GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models respectively. In terms of MAE, our proposed model is better by 14.4766, 9.1140, 26.6337, and 38.0079 respectively. Using MAPE and MMRE, our proposed model also achieved better results.

Table 4. 11. Effort comparison using SCE metrics for semi-detached model

Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.8 | 0.7359 | 189.5173 | 14.7191 | 0.1471
(Algabri, Saeed, Mathkour, & et al, 2015) | 0.8 | 0.8893 | 213.0486 | 17.7864 | 0.1778
PSO | 0.8 | 0.8071 | 201.6744 | 16.1420 | 0.1614
GA | 0.8 | 0.6998 | 184.1547 | 13.9978 | 0.1399
PSO_GA | 0.8 | 0.6414 | 175.0407 | 12.8293 | 0.1282


Figure 4. 6. MRE for semi-detached model

4.4.7. Experimental Result for embedded COCOMO model on NASA63 dataset

For the embedded model, there are only four software projects in the dataset, so we took all of them to test the performance of the proposed PSO_GA model. The PRED (0.25) values for the embedded mode are lower than for the other COCOMO modes. In Table 4.12, the PRED (0.25) value of the PSO_GA model is 0.25, which indicates that only 25% of the projects, that is, one project, has an MRE below 0.25. The (Algabri, Saeed, Mathkour, & et al, 2015) model performs better in three of the evaluation metrics: MRE, MAPE, and MMRE. Our proposed model performs better in the other two: PRED (0.25) and MAE. Compared with the COCOMO, PSO, and GA models, our proposed model achieved satisfactory results.


Table 4. 12. Effort comparison for embedded model on NASA 63 dataset

Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.25 | 4.2058 | 2142.3367 | 105.1468 | 1.0514
(Algabri, Saeed, Mathkour, & et al, 2015) | - | 1.8443 | 2387.5700 | 46.1081 | 0.4610
GA | 0.25 | 3.0368 | 1611.8130 | 75.9222 | 0.7592
PSO | 0.25 | 3.6777 | 1847.0891 | 91.9444 | 0.9194
PSO_GA | 0.25 | 2.6968 | 1570.5311 | 67.4202 | 0.6742

4.4.8. Experimental Result for organic COCOMO model on NASA93 dataset

In addition to the NASA60 and NASA63 datasets, we also used the NASA93 dataset for comparison with other research work. We took 20 testing projects from the dataset. Table 4.13 presents the results of the experiment using the five evaluation metrics. The PRED (0.25) value of our proposed model is 0.9, which means that 90% of the testing projects have an MRE below 0.25. The PRED (0.25) value of the COCOMO model is 0.35, so only 35% of the testing projects meet this criterion. Our proposed model also achieved lower errors (better results) with the other four evaluation criteria. In terms of MRE, the proposed model reduces the total MRE by 3.2949, 1.2400, 0.6998, and 0.8156 compared with the COCOMO, GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models respectively.

The mean absolute error of the PSO_GA model is 16.0706. Using MAE as the evaluation metric, the proposed PSO_GA model reduces the error by 37.0406, 19.068, 2.7761, and 10.8191 compared with the COCOMO, GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models respectively. In terms of MAPE, the COCOMO model has the highest error rate, whereas the proposed PSO_GA model has the lowest. Using MAPE as the evaluation metric, our proposed PSO_GA model reduces the error by 16.4748%, 6.2%, 3.499%, and 4.0783% respectively. The proposed model also achieves the lowest MMRE. We can therefore say that the newly optimized COCOMO coefficients estimate the effort of organic software projects better.

Table 4. 13. Organic model effort comparison on NASA93 dataset

Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.35 | 5.4436 | 53.1112 | 27.2184 | 0.2721
(Algabri, Saeed, Mathkour, & et al, 2015) | 0.85 | 2.9643 | 26.8897 | 14.8219 | 0.1482
PSO | 0.85 | 2.8485 | 18.8467 | 14.2426 | 0.1424
GA | 0.65 | 3.3887 | 35.1386 | 16.9436 | 0.1694
PSO_GA | 0.9 | 2.1487 | 16.0706 | 10.7436 | 0.1074

4.4.9. Experimental result for semi-detached COCOMO model on NASA93 dataset

Table 4.14 shows the results of the experiment on the NASA93 semi-detached software projects. The PRED (0.25) value of all models is 0.75, so PRED (0.25) cannot be used to compare effort accuracy. On the MRE, MAPE, and MMRE metrics, our proposed model achieved the lowest errors, although the COCOMO model has a lower MAE (179.9865 against 183.9684). The MRE of the proposed PSO_GA model is 3.4612, the minimum value among the models. The PSO_GA model reduces the MRE by 18.4%, 4.76%, 12.31%, and 21.98% compared with the COCOMO, GA, PSO, and (Algabri, Saeed, Mathkour, & et al, 2015) models respectively. Using MAPE as the evaluation metric, the proposed PSO_GA model is better by 0.9198%, 0.2381%, 0.6155%, and 1.099% respectively. In terms of MMRE, our proposed model also achieved a better result.

Table 4. 14. Semi-detached model effort comparison on NASA93 dataset

Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | 0.75 | 3.6452 | 179.9865 | 18.2260 | 0.1822
(Algabri, Saeed, Mathkour, & et al, 2015) | 0.75 | 3.6810 | 199.3725 | 18.4052 | 0.1840
PSO | 0.75 | 3.5843 | 187.8757 | 17.9217 | 0.1792
GA | 0.75 | 3.5088 | 188.5638 | 17.5443 | 0.1754
PSO_GA | 0.75 | 3.4612 | 183.9684 | 17.3062 | 0.1730

4.4.10. Experimental Result for Embedded COCOMO model on NASA93 dataset

The NASA93 dataset contains only six embedded software projects, and we selected all of them to test how accurately the optimized coefficients estimate the required effort. For this mode, the (Algabri, Saeed, Mathkour, & et al, 2015) model achieved better results than our model. Compared with the COCOMO, GA, and PSO models, our proposed model achieved satisfactory results. Table 4.15 presents the embedded model comparison on the NASA93 dataset.


Table 4. 15. Embedded model effort comparison on NASA93 dataset

Models | PRED(0.25) | MRE | MAE | MAPE | MMRE
COCOMO | - | 9.1473 | 1660.1290 | 182.9461 | 1.8294
(Nadal & Sangwan, 2018) | | | | |
(Algabri, Saeed, Mathkour, & et al, 2015) | 0.2 | 4.7383 | 1158.9310 | 94.7676 | 0.9476
GA | - | 7.7249 | 1499.2823 | 154.4999 | 1.5449
PSO | - | 8.6067 | 1599.9663 | 172.1343 | 1.7213
PSO_GA | - | 7.0979 | 1426.2747 | 141.9582 | 1.4195


CHAPTER FIVE

CONCLUSION AND RECOMMENDATION

5.1. Conclusion

Good software project management is achieved through good software project planning. Software project planning is one of the phases of the software development life cycle, and it ensures the project's feasibility. Software effort estimation is one of the tasks carried out in this planning phase, and different estimation techniques and algorithms have been used to estimate the effort of a software project. In this research, a software effort estimation model based on the COCOMO model is proposed. We used the particle swarm optimization algorithm and the genetic algorithm sequentially to optimize the coefficients of the intermediate COCOMO model: particle swarm optimization produces an initial solution, and the genetic algorithm then optimizes the intermediate COCOMO coefficients. The proposed model was trained on the NASA60 dataset, where 70% of the dataset is used as input to optimize the parameter values and 30% is used to test the effort estimation accuracy of the optimized parameters, and it was evaluated on the NASA60, NASA63, and NASA93 datasets. MRE, MMRE, MAE, PRED (0.25), and MAPE were used as performance evaluation metrics. The experimental results showed that, for the organic model on the NASA60 dataset, the proposed model reduced the MRE by 362.07%, 120.53%, and 21.81% compared with the COCOMO, GA, and PSO models respectively.
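The two-stage procedure described above, in which PSO produces an initial solution and the GA refines it, can be sketched as follows. This is a minimal illustration, not the implementation used in this thesis. Every specific choice is an assumption: the fitness is the MMRE of E = A · KLOC^B with EAF fixed at 1, the search ranges for A and B are arbitrary, the PSO constants (inertia 0.729, acceleration 1.49) are common textbook values, the GA uses truncation selection with arithmetic crossover and Gaussian mutation, and the training data are synthetic, generated from A = 3.0 and B = 1.12.

```python
import random


def mmre(params, projects):
    """Mean magnitude of relative error of effort = A * KLOC**B (EAF assumed 1)."""
    a, b = params
    return sum(abs(act - a * kloc ** b) / act for kloc, act in projects) / len(projects)


def pso(fitness, bounds, n_particles=20, iters=60, w=0.729, c1=1.49, c2=1.49, seed=0):
    """Exploration phase: global-best PSO; returns the particles' personal bests."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside bounds
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return pbest


def ga(fitness, population, bounds, generations=100, sigma=0.02, seed=1):
    """Exploitation phase: elitist GA with arithmetic crossover and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [p[:] for p in population]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: len(pop) // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < len(pop):
            p1, p2 = rng.sample(parents, 2)
            alpha = rng.random()
            child = []
            for x, y, (lo, hi) in zip(p1, p2, bounds):
                gene = alpha * x + (1 - alpha) * y + rng.gauss(0.0, sigma * (hi - lo))
                child.append(min(max(gene, lo), hi))
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


# Hypothetical training data generated from A = 3.0, B = 1.12 (illustration only).
projects = [(k, 3.0 * k ** 1.12) for k in (2.0, 5.0, 10.0, 20.0, 50.0, 100.0)]
bounds = [(1.0, 6.0), (0.9, 1.3)]  # assumed search ranges for A and B


def fitness(params):
    return mmre(params, projects)


seeds = pso(fitness, bounds)       # PSO supplies the initial population
best = ga(fitness, seeds, bounds)  # GA refines it
print(f"A = {best[0]:.4f}, B = {best[1]:.4f}, MMRE = {fitness(best):.4f}")
```

Because the GA always keeps the better half of each generation, its result can never be worse than the best PSO seed. This split lets PSO explore the (A, B) plane broadly while the GA makes small local refinements, mirroring the exploration and exploitation roles described in the contributions below.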

5.2. Contribution

In this research, we investigate the impact of using two metaheuristic algorithms together on the performance of software effort estimation. While attempting to increase the accuracy of software effort estimation, we made the following contributions.

1. We provided a hybrid effort estimation model that combines the strengths of the genetic and particle swarm optimization algorithms to estimate the effort of a software project.

2. The research showed that the PSO algorithm is better suited to exploration and the GA algorithm is suitable for exploitation.


5.3. Future work

This study focused on improving the software effort estimation accuracy of the COCOMO model by optimizing the values of A and B, and the results of our work are encouraging. The experimental results using the newly optimized coefficients are very promising, especially for the organic and semi-detached types of COCOMO software project, but the magnitude of relative error remains large for the embedded type of intermediate COCOMO model. The reason might be that, for larger software, either the algorithms we used are unable to find the global optimum of the COCOMO coefficients, or the relation between software size, effort, and the complexity factors (coefficients) cannot be expressed as a power function. In the future, we hope to achieve better effort estimation accuracy for all types of intermediate COCOMO model by combining advanced metaheuristic algorithms. Another direction for future work is improving the estimation accuracy of the detailed COCOMO model and COCOMO II using metaheuristic algorithms.


REFERENCES

Abdel-Basset, M., Abdel-Fatah, L., & Sangaiah, A. K. (2018). Metaheuristic algorithms: A comprehensive review. Vellore, India: ScienceDirect.

Abts, C., Brown, A. W., & et al. (2000). COCOMO II Model Definition Manual.

Ahadi, M., & Jafaria, A. (2016, March). A new hybrid for software cost estimation using particle swarm

optimization and differential evolution algorithm. 4.

Alajlan, M., & Tagoug, N. (2016). Optimization of COCOMO-II Model for Effort and Development Time

Estimation using Genetic Algorithms. Proc. Of The International Conference on Communications,

Computer Science and Information Technology.

Algabri, M., Saeed, F., Mathkour, H., & et al. (2015). Optimization of Soft Cost Estimation using Genetic Algorithm for NASA software projects. 2015 5th National Symposium on Information Technology: Towards New Smart World (pp. 1-4). Riyadh: IEEE. doi:10.1109/NSITNSW.2015.7176416

BaniMustafa, A. (2018, October 17). Predicting Software Effort Estimation Using Machine Learning

Techniques. doi:10.1109/CSIT.2018.8486222

Blum, C. (2005, December). Ant colony optimization: Introduction and recent trends. Physics of Life

Reviews, 2(4), 353-373. doi:https://doi.org/10.1016/j.plrev.2005.10.001

Boehm, B. (1984). Software engineering economics. IEEE Transactions on Software Engineering, SE-10(1).

Borade, J. G., & Khalkar, V. R. (2013). Software project Effort and Cost Estimation Techniques.

International Journal of Advanced Research in Computer Science and Software Engineering, 3(8).

Carr, J. (2014). An introduction to genetic algorithms.

(2015). Chaos Report .

Dizaj, A. A., & Gharehchopogh, F. S. (2018). A new approach to software cost estimation by improving

genetic algorithm with Bat Algorithm. Journal of Computer & Robotics, 11(2), 17-30.

Dongshu Wang, D. T. (2017). Particle swarm optimization algorithm: an overview. Springer.

Dorigo, M. (1992). Optimization, learning and natural algorithms.

Genetic algorithms in nature. (n.d.). In Genetic algorithm (p. 50).

Glinz, P., & Mukhija, A. (2003). Constructive Cost Model.

Goldberg, D. E. (1988). Genetic Algorithms in Search, Optimization and Machine Learning. New York.

Isa Maleki, A. G. (2014). A new approach to software cost estimation by improving genetic algorithm

with Bat Algorithm. International Journal of Innovation and Applied Studies.


Jamil, A. S. (2007). Used SLIM Model to Estimate Software Cost.

Jørgensen, M. (2004, January). Top-down and bottom-up expert estimation of software development

effort. Information and Software Technology, 46(1), 3-16. Retrieved from

https://doi.org/10.1016/S0950-5849(03)00093-4

Kao, Y.-T., & Zahara, E. (2008, March). A hybrid genetic algorithm and particle swarm optimization for multimodal functions. Applied Soft Computing, 8(2), 849-857.

Karna, H., & Gotovac, S. (2014). Modeling Expert Effort Estimation of Software Projects. 2014 22nd International Conference on Software, Telecommunications and Computer Networks (SoftCom) (pp. 356-360). IEEE. doi:10.1109/SOFTCOM.2014.7039106

Kennedy, J., & Eberhart, R. (1995). A new optimizer using particle swarm theory. MHS'95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science. Nagoya, Japan: IEEE. doi:10.1109/MHS.1995.494215

Keshta, I. M. (2017). Software Cost Estimation Approaches: A Survey. Journal of Software Engineering

and Applications.

Keung, J. (2009). Software Development Cost Estimation using Analogy: A review. 2009 Australian

Software Engineering Conference. IEEE. doi:10.1109/ASWEC.2009.32

Langsari, K., & Sarno, R. (2017). Optimizing COCOMO II Parameters Using Particle Swarm Method. 2017 3rd International Conference on Science in Information Technology (ICSITech) (pp. 29-34). Bandung: IEEE. doi:10.1109/ICSITech.2017.8257081

L.R. Nerkar, P. Y. (2014). Software Cost Estimation using Algorithmic Model and Non-Algorithmic Model

a Review. International Journal of Computer Applications.

Le, M. H., & Khuat, T. T. (2016). Optimizing Parameters of Software Effort Estimation Models using Directed Artificial Bee Colony Algorithm. Informatica, 40(4), 427-436.

Leung, H., & Fan, Z. (2013). software cost estimation.

Maleki, I., Ghaffar, A., & Masdari, M. (2014, January). A New Approach for Software Cost Estimation with Hybrid Genetic Algorithm and Ant Colony Optimization. International Journal of Innovation and Applied Studies, 5(1), 72-81.

Mandal, A. (2015). Identifying the Reasons for Software Project Failure and Some of their Proposed

Remedial through BRIDGE Process Models. International Journal of Computer Sciences and

Engineering .

Miandoab, E. E., & Gharehchopogh, F. S. (2016, June). A Novel Hybrid Algorithm for Software Cost

Estimation Based on Cuckoo Optimization and K-Nearest Neighbors Algorithms. Engineering,

Technology & Applied Science Research, 6, 118-122.


Nadal, D., & Sangwan, O. P. (2018, August). Software Cost Estimation by Optimizing COCOMO Model

Using Hybrid BATGSA Algorithm. International Journal of Intelligent Engineering and Systems,

11(2), 250-263. doi:10.22266/ijies2018.0831.25

Ochieng, P., Mwangi, W., & Mwgha, S. M. (2014, May). software Size Estimation in Incremental Software

Development based on Improved Pairwise Comparison Matrices. International Journal of

Computer Applications, 93, 29-39. doi:10.5120/16213-5519

Omara, A. F., & Arafa, M. M. (2010, January). Genetic algorithms for task scheduling problem. Journal of Parallel and Distributed Computing, 70(1), 13-22.

PMI. (2017). A guide to the project management body of knowledge (6th ed.). Newtown Square, Pennsylvania, USA: Project Management Institute, Inc.

Promise software engineering Repository. (2005, April 4). Retrieved from promise.site.uottawa:

http://promise.site.uottawa.ca/SERepository/datasets-page.html

Przemyslaw Pospieszny, B. C.-C. (2018). An effective approach for software project effort and duration

estimation with machine learning algorithm. ELSEVIER.

Rijwani, P., & Jain, S. (2016). Enhanced Software Effort Estimation using Multi Layered Feed Forward

Artificial Neural Network Technique. Procedia Computer Science, 89, 307-312.

Rohit Kumar Sachan, e. (2016). Optimizing Basic COCOMO Model using Simplified Genetic Algorithm.

ELSEVIER.

Sachan, R. K., Nigam, A., Singh, A., & et al. (2016). Optimizing Basic COCOMO Model using Simplified

Genetic Algorithm. Procedia Computer Science , 89, 492-498.

doi:https://doi.org/10.1016/j.procs.2016.06.107

Saini, N. (2017, December 8). Review of Selection Methods in Genetic Algorithms. 6.

Salijoughinejad, R., & Khatibi, V. (2018). A New Optimized Hybrid Model Based on COCOMO to Increase the Accuracy of Software Cost Estimation. Journal of Advances in Computer Engineering and Technology, 4(1), 27-40.

Sengupta, S., Basak, S., & Peters, R. A. (2018). Particle Swarm Optimization: A Survey of Historical and Recent Developments with Hybridization Perspectives. Machine Learning and Knowledge Extraction.

Sharma, S. (2017, January). Application of Genetic Algorithm in Software Engineering, Distributed Computing and Machine Learning. International Journal of Computer Application and Information Technology, 9(2).

Shekhar, S., & Kumar, U. (2016). Review of Various Software Cost Estimation Techniques. International Journal of Computer Applications, 141. doi:10.5120/ijca2016909867


Singal, P., Kumari, A., & Sharma, P. (2020). Estimation of Software Development Effort: A Differential Evolution Approach. Procedia Computer Science, 167, 2643-2652. https://doi.org/10.1016/j.procs.2020.03.343

Singh, T., Singh, R., & Mishra, K. K. (2018). Software Cost Estimation using Environmental Adaption Method. Procedia Computer Science, 143, 325-332. https://doi.org/10.1016/j.procs.2018.10.403

Sörensen, K., & Glover, F. W. (2017, January 23). Metaheuristics. https://doi.org/10.1007/978-1-4419-1153-7_1167

The Standish Group. (2013). CHAOS Manifesto 2013: Think Big, Act Small.


Sangeetha, Y., et al. (2012). Software Cost Models. International Journal of Engineering Research & Technology (IJERT).

Yang, X.-S. (2014). Nature-Inspired Optimization Algorithms. Springer.


APPENDIX

Appendix 1, Dataset sample with its attributes


      RELY  DATA  CPLX  TIME  ...  TOOL  SCED  KLOC  act_effort
0     1.15  0.94  1.15  1.00  ...  1.00  1.08  29.5       120.0
1     1.15  0.94  1.15  1.00  ...  1.00  1.08  19.7        60.0
2     1.15  0.94  1.15  1.00  ...  1.00  1.08   5.5        18.0
3     1.15  0.94  1.15  1.00  ...  1.00  1.08  10.4        50.0
4     1.00  1.16  1.15  1.30  ...  0.83  1.08  16.3        82.0
5     1.00  0.94  1.15  1.00  ...  1.00  1.00  31.5        60.0
6     1.00  1.08  1.15  1.30  ...  1.10  1.04  11.4        98.8
7     1.00  1.00  1.15  1.00  ...  1.00  1.04  47.5       252.0
8     1.00  1.00  1.15  1.00  ...  1.00  1.00   8.0        42.0
9     1.15  1.00  1.00  1.11  ...  1.00  1.00  15.0        90.0
10    1.00  0.94  1.15  1.00  ...  1.00  1.00  11.3        36.0
11    1.00  1.00  1.15  1.00  ...  1.00  1.00   8.0        42.0
12    1.00  1.16  1.15  1.30  ...  0.83  1.08  48.5       239.0
13    1.00  1.16  1.15  1.30  ...  0.83  1.08  32.6       170.0
14    1.00  0.94  1.15  1.00  ...  1.00  1.00  20.0        72.0
15    1.00  1.16  1.15  1.30  ...  0.83  1.08  15.4        70.0
..     ...   ...   ...   ...  ...   ...   ...   ...         ...
60    1.00  1.08  1.15  1.30  ...  1.10  1.04  11.4        98.8


Appendix 2, Sample Python code to calculate the fitness of candidate coefficients

# -*- coding: utf-8 -*-
"""
Created on Tue Mar 10 20:53:56 2020

@author: dady
"""

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset: 15 effort multipliers, KLOC and the actual effort.
data = pd.read_csv("nasa60_detachedtesting.txt")

y = data.act_effort                   # target: actual effort
X = data.drop('act_effort', axis=1)   # features: effort multipliers followed by KLOC

# Hold out 30% of the projects for testing, with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

train_X = X_train.values.tolist()
train_Y = y_train.values.tolist()


def fitness_function(x):
    """Return the absolute error of Effort = x[0] * KLOC**x[1] * prod(EM)
    for every training project (x[0] is the coefficient a, x[1] the exponent b)."""
    fitness_value = []
    for i in range(len(train_X)):
        kloc = train_X[i][15]              # last feature column (index 15) is KLOC
        eaf = np.prod(train_X[i][0:15])    # product of the 15 effort multipliers
        estimate = x[0] * pow(kloc, x[1]) * eaf
        fitness_value.append(abs(estimate - train_Y[i]))
    return fitness_value
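The loop in `fitness_function` evaluates the COCOMO-style model Effort = a · KLOC^b · ∏EM_i one project at a time. The sketch below shows a vectorized NumPy equivalent under the same assumed column layout (15 effort multipliers followed by KLOC). As an additional assumption not stated in the original code, it also reduces the per-project absolute errors to a single score by taking their mean, since a genetic algorithm needs one number per candidate to rank it. The function names and the three-project array are hypothetical and for illustration only.

```python
import numpy as np


def absolute_errors(coeffs, features, actual):
    """Per-project |a * KLOC**b * EAF - actual| for a candidate (a, b).

    Assumed layout: columns 0..14 of `features` are effort multipliers and
    column 15 is KLOC; `actual` is a 1-D array of actual efforts.
    """
    a, b = coeffs
    eaf = np.prod(features[:, :15], axis=1)  # effort adjustment factor per project
    kloc = features[:, 15]
    estimate = a * np.power(kloc, b) * eaf
    return np.abs(estimate - actual)


def mean_absolute_error(coeffs, features, actual):
    """Hypothetical scalar fitness: mean of the per-project absolute errors."""
    return float(np.mean(absolute_errors(coeffs, features, actual)))


if __name__ == "__main__":
    # Made-up projects: all multipliers nominal (1.0), so EAF = 1 for each row.
    features = np.ones((3, 16))
    features[:, 15] = [10.0, 20.0, 40.0]  # KLOC
    actual = np.array([30.0, 60.0, 120.0])
    print(absolute_errors((3.0, 1.0), features, actual))
    print(mean_absolute_error((3.0, 1.0), features, actual))
```

Because a = 3 and b = 1 reproduce the made-up efforts exactly, both printed errors are zero; any other (a, b) pair yields a positive score, and a lower score marks a fitter candidate.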

Appendix 3, Sample initial values generated for the coefficients

909.0497008064683 [3.95623091 1.19678076]
205.8898111906779 [4.67728402 1.03898287]
468.2230304937265 [4.48409103 0.97755579]
205.0267181421627 [4.57667944 1.04318819]
209.02479288994448 [4.64250996 1.04360598]
216.54734579319788 [4.70716406 1.04353031]
205.39103079706877 [4.72593763 1.03501228]
204.49595547550877 [4.71990694 1.0338528 ]
529.8784144902694 [4.73725871 1.09568965]
205.43159989551242 [4.84710612 1.02616166]
204.4848749183215 [4.73065257 1.0332444 ]
204.43921774767088 [4.72206875 1.03382967]
204.4391720422616 [4.71840025 1.03405207]
256.7535491863642 [4.25795018 1.04101989]
204.59187282897167 [4.70044794 1.03495936]
204.4719081163542 [4.7215581 1.0339129]
204.50916791380558 [4.71507459 1.03439764]
205.7162838635714 [4.85622437 1.025371 ]
488.5155116029744 [2.90038549 1.10972479]
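Each line above appears to pair a fitness value with a candidate coefficient pair [a b]. One way to generate and rank such an initial population is sketched below. The search bounds (a in [2, 5], b in [0.9, 1.2]), the population size, the use of `numpy.random.default_rng` and the mean-absolute-error fitness are illustrative assumptions, not the settings used in the thesis. The KLOC and effort values are taken from three rows of the Appendix 1 sample; the EAF of 1.2 is a made-up placeholder because that table elides most of the multiplier columns.

```python
import numpy as np

# Hypothetical search bounds for the coefficient a and the exponent b.
A_BOUNDS = (2.0, 5.0)
B_BOUNDS = (0.9, 1.2)


def initial_population(size, rng):
    """Draw `size` random (a, b) pairs uniformly inside the assumed bounds."""
    a = rng.uniform(*A_BOUNDS, size=size)
    b = rng.uniform(*B_BOUNDS, size=size)
    return np.column_stack((a, b))


def mae_fitness(coeffs, kloc, eaf, actual):
    """Mean absolute error of Effort = a * KLOC**b * EAF (lower is better)."""
    a, b = coeffs
    return float(np.mean(np.abs(a * kloc ** b * eaf - actual)))


def rank_population(population, kloc, eaf, actual):
    """Return (fitness, candidate) pairs sorted from best to worst."""
    scored = [(mae_fitness(c, kloc, eaf, actual), c) for c in population]
    return sorted(scored, key=lambda item: item[0])


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    kloc = np.array([5.5, 10.4, 29.5])     # rows 2, 3 and 0 of Appendix 1
    eaf = np.array([1.2, 1.2, 1.2])        # placeholder EAF (assumption)
    actual = np.array([18.0, 50.0, 120.0])
    for fit, cand in rank_population(initial_population(5, rng), kloc, eaf, actual):
        print(fit, cand)
```

Sorting on the fitness value alone (the `key` argument) avoids comparing NumPy arrays when two candidates happen to tie.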

Parents
[[4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]]

Crossover
[[4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]
 ...
 [4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]
 [4.72322769 1.0337143 ]]
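In the listing above, all four selected parents carry the same coefficients [4.72322769 1.0337143], so every offspring produced by crossover is identical to them. This suggests the population has converged, and only mutation can then reintroduce variation. The sketch below illustrates this with arithmetic (blend) crossover followed by Gaussian mutation. Both operators, the mixing weight `alpha` and the mutation scales are assumptions chosen for illustration, because the exact operators used in the thesis are not shown in this appendix.

```python
import numpy as np


def arithmetic_crossover(p1, p2, alpha):
    """Blend two (a, b) parents: child = alpha * p1 + (1 - alpha) * p2."""
    return alpha * p1 + (1.0 - alpha) * p2


def gaussian_mutation(child, scale, rng):
    """Add zero-mean Gaussian noise with the given per-gene scale."""
    return child + rng.normal(0.0, scale, size=child.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    parent = np.array([4.72322769, 1.0337143])  # values from the listing above
    child = arithmetic_crossover(parent, parent.copy(), alpha=0.3)
    print(np.allclose(child, parent))  # identical parents give an identical child
    mutated = gaussian_mutation(child, scale=np.array([0.05, 0.01]), rng=rng)
    print(mutated)                     # mutation moves the child off the parent
```

The same behavior holds for single-point or uniform crossover: any recombination of identical chromosomes returns that chromosome, which is why a GA usually keeps a nonzero mutation rate.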
