## 1. Introduction

Software engineering management involves planning [1]. Software project planning includes software prediction, and the most commonly predicted variables have been size [2] (mainly measured in either source lines of code or function points [3]), effort (in person-hours or person-months [3]), duration (in months [4]), and quality (in defects [5]).

Software development effort prediction (SDEP), also termed effort estimation or cost estimation [6], is needed by managers to estimate the monetary cost of projects. As a reference, in the USA the cost per person-month (equivalent to 152 person-hours) is 8000 USD [7].

Unfortunately, projects that take more time than planned (i.e., time overrun) cost more money (i.e., cost overrun) [8], and cost overrun has been identified as a chronic problem in most software projects [9]; as for cost underrun, a portion of the budgeted money is not spent, and taxes then have to be paid on it. These cost-related issues are the reason why software projects have been assessed based on their ability to achieve the budgeted cost [10,11].

The relevance of SDEP is reflected in several systematic reviews published between 2007 [6] and 2020 [12], in which hundreds of SDEP studies were analyzed. The prediction models identified in these systematic reviews include adaptive ridge regression, association rules, Bayesian networks, case-based reasoning (CBR, also termed analogy-based), decision trees, expectation maximization, fuzzy logic, genetic programming, grey relational analysis, neural networks, principal component analysis, random forest, and support vector regression.

Proposing accurate SDEP models is a continuous activity of researchers and software managers. On average, software developers spend between 30% and 40% more effort than predicted when planning a project. SDEP failures lead to schedule delays and cost overruns, which can result in project failure and affect an organization's reputation and competitiveness. On the other hand, over-predicting the effort of a software project can lead to ineffective use of resources, which can result in lost opportunities to fund other projects and, therefore, in lost tenders. From a social point of view, it can also demotivate software engineers and lead them to search for new job opportunities. These scenarios have motivated researchers to direct their efforts toward determining which prediction technique is most accurate, or toward proposing new or combined techniques that could provide better predictions [13].

Several kinds of techniques have been applied to SDEP, both for projects developed individually in academic settings [14] and for projects developed by teams of practitioners [12]. Our study involves projects developed by teams of practitioners in business environments.

As for metaheuristic algorithms, several studies published between 2019 and 2020 proposed algorithms inspired by (a) the social behavior of animals, such as insects, herds, birds, and fish (i.e., Particle Swarm Optimization, PSO) [15], bats [16], butterflies [17], cuckoos [18], elephants [19], fireflies [20], moths [21], and whales [16]; (b) nature (brain [22], differential evolution [16], and genetic algorithm (GA) [23]); (c) physics (cooling of metals [24]); and (d) mathematics (such as sine and cosine operators [24]).

In the software engineering field, metaheuristics have also been applied specifically to SDEP. The applied algorithms include artificial bee colony (ABC) [25], cuckoo search [26], differential evolution [27], GA [28], PSO [29], simulated annealing [30], tabu search [30], and the whale optimization algorithm [31].

In the ten studies we identified in which PSO was applied to SDEP (analyzed in Section 2 of the present study), PSO has been used to optimize the parameters of models such as Bayesian belief networks [32], CBR [33,34,35,36,37], the COCOMO statistical equation [38] (a model published in 1981 [39]), decision trees [40], fuzzy logic [29], mathematical expressions [25], neural networks [40], and support vector regression [40].

Regarding the five studies where PSO is used to optimize CBR, four of them select the most similar software projects and predict the effort of a new project through a weighted average of the data of those similar projects, with the weights calculated using PSO [34,35,36,37]. In the fifth study, the authors use a hybrid CBR with local and global searches, and a multi-objective PSO minimizes two error functions during the search [33]. As for fuzzy logic, PSO is used to optimize the parameter values of the membership functions that make up the model [29]. In a different manner, once a set of fuzzy models has been defined, PSO is used to choose the model that best fits the SDEP [38]. PSO has also been used in combination with elements of the ABC algorithm to fit the parameters of a predefined SDEP function [25]. Another study uses a classifier committee to make predictions and uses PSO to optimize the parameter values of each base classifier in that committee [40]. From a similar perspective, PSO was used within a hybrid model to estimate the components of a Bayesian network [32]. Our proposal considers neither weights nor any CBR model to be optimized.

Unlike previous proposals, the contribution of our study is the application of the PSO algorithm to optimize the parameters of statistical regression equations (SRE) applied to SDEP (hereafter termed PSO-SRE), where the selection of the equation is also optimized. Each equation is generated by a regression analysis from data sets of projects selected from an international public repository of recent software projects (i.e., the International Software Benchmarking Standards Group, ISBSG, release 2018). The software projects were selected based on their type of development (TD), development platform (DP), and programming language type (PLT), as suggested in the ISBSG guidelines [41]. The ISBSG has been widely used for SDEP models [42].

The size of a software project is a common variable used for SDEP [3]; therefore, our models use it as the independent variable. In our study, size is measured in adjusted function points (AFP), whose value is calculated from the nineteen variables listed in Section 4 of the present study [13].

The comparison between the prediction accuracy of our PSO-SRE and that obtained from SRE is justified by the following issues related to SDEP:

- (a) The prediction accuracy of any newly proposed model should at least outperform that of an SRE [43].

- (b) SRE has been the model whose prediction accuracy has most often been compared to that of other models, such as those based on machine learning (ML) [44,45].

- (c) The prediction accuracy of SRE has outperformed that of ML models [44].

Because statistical analysis is needed for valid studies [46], the data preprocessing and our conclusions are based on statistical analysis, involving the identification of outliers, the coefficients of correlation and determination of the data, and a suitable statistical test for comparing the prediction accuracy of PSO-SRE and SRE.

A systematic literature review published in 2018, which analyzed studies on SDEP models published between 1981 and 2016, recommends using the same data sets and the same prediction accuracy measure so that conclusions can be compared across studies [3]. This recommendation arose after the authors found it difficult to compare the performance of SDEP models due to the wide diversity of data sets and accuracy measures used. Thus, in our study, the models were applied to the same data sets and evaluated with the same accuracy measure (i.e., absolute residual, AR). Moreover, they were trained and tested using the same validation method (i.e., leave-one-out cross-validation, LOOCV, which is recommended for software effort model evaluation [47]).

In the present study, the null ($H_0$) and alternative ($H_1$) hypotheses to be tested are the following:

**H**$_0$. The prediction accuracy of PSO-SRE is statistically equal to that of SRE when these two models are applied to predict the development effort of software projects using AFP as the independent variable.

**H**$_1$. The prediction accuracy of PSO-SRE is not statistically equal to that of SRE when these two models are applied to predict the development effort of software projects using AFP as the independent variable.

The remainder of the present study is organized as follows: Section 2 describes the related studies in which PSO has been applied to predict the development effort of software projects. Section 3 describes the Particle Swarm Optimization (PSO) metaheuristic and our proposal, PSO-SRE. Section 4 presents the criteria applied to select the data sets of software projects following the ISBSG guidelines, as well as the data preprocessing. Section 5 presents the results of PSO-SRE and compares its prediction accuracy to that of SRE once the two models were trained and tested. Section 6 presents our conclusions. Finally, Section 7 is a discussion section, which includes the comparison with previous studies, the limitations of our study, validity threats, and directions for future work.

## 2. Related Work

The proposed SDEP techniques have been systematically analyzed in several reviews [3,6,12,44,45,48,49,50,51]. They can be classified into those not based on models and those based on models. The former are also termed expert judgment [48,52], whereas the latter can be classified into two categories: statistical [53] and ML models [44,45].

Table 1 shows an analysis of the ten identified studies in which PSO was applied to SDEP. It includes the data set(s) of software projects, the number of projects per data set, the prediction accuracy measure, the validation method, and whether the result was reported based on statistical significance (if so, the statistical test used is named). A description of the proposal and results of each study follows:

Azzeh et al. [33] use PSO to find the optimal solutions for variables related to multiple evaluation measures when applying CBR. Their results show that CBR improves when all variables are taken into account together.

Bardsiri et al. [34] apply PSO to optimize the CBR weights: the PSO algorithm assigns weights to the features considered in the similarity function. The accuracy of their proposal is compared to that obtained from three types of CBR, as well as from neural networks, classification and regression trees, and statistical regression models. Results show that the prediction accuracy of CBR with PSO was better than that of all the mentioned models.

Bardsiri et al. [35] use PSO in combination with CBR to design a weighting system in which the project attributes of different clusters of software projects are given different weights. The performance of their proposal is better than the prediction accuracy obtained when neural networks, classification and regression trees, and statistical regression models are applied.

Chhabra and Singh [29] first compare the prediction accuracy of three models termed Regression-Based COCOMO, Fuzzy COCOMO, and PSO Optimized Fuzzy COCOMO; in the latter, PSO is used to optimize the parameters of the fuzzy logic model. They then compare its performance to that of a GA Optimized Fuzzy COCOMO. Their results show that PSO Optimized Fuzzy COCOMO has better prediction accuracy than the other three models. They conclude that PSO can be applied as an optimizer for a fuzzy logic model.

Hosni et al. [40] apply PSO to set the ensemble parameters of four ML models and compare the performance of PSO to that of grid search. They conclude that PSO and grid search show the same predictive capability when applied to k-nearest neighbor, support vector regression, neural networks, and decision trees.

Khuat and Le [25] propose an algorithm combining the PSO and ABC algorithms to optimize the parameters of an SDEP formula. This formula is generated using two independent variables obtained from agile software projects (i.e., final velocity and story points). The accuracy of this formula is compared to that obtained from four types of neural networks (i.e., general regression neural network, probabilistic neural network, group method of data handling polynomial neural network, and cascade correlation neural network). The algorithm based on PSO and ABC performed better than the four neural networks.

Sheta et al. [38] use PSO to optimize the parameters of the COCOMO equation (termed PSO-COCOMO); they also build a fuzzy system. PSO-COCOMO performs better than the SDEP equations proposed by Halstead, Walston-Felix, Bailey-Basili, and Doty.

Wu et al. [36] use PSO to optimize the CBR weights, employing Euclidean, Manhattan, and grey relational grade distances as metrics to calculate similarity. Their results show that weighted CBR yields better prediction accuracy than unweighted CBR methods. They conclude that the combined method integrating PSO and CBR improves performance for the three mentioned measures.

Wu et al. [37] use PSO in combination with six CBR methods, which differ in their type of distance measure (i.e., Euclidean, Manhattan, Minkowski, grey relational coefficient, Gaussian, and Mahalanobis). Results show that the combination of methods they propose performs better than the independent methods, and that the weighted mean combination method yields better results.

Zare et al. [32] apply PSO to obtain the optimal updating coefficient of effort prediction, based on the concept of optimal control, by modifying the predicted value of a Bayesian belief network. Its performance is compared to that obtained when GA is applied. Their results indicate that the optimal updating coefficient obtained by GA increases prediction accuracy significantly in comparison with that obtained by PSO.

According to Table 1, only two studies used a non-biased prediction accuracy measure (i.e., AR), only two used a deterministic validation method (i.e., LOOCV), half of them based their conclusions on statistical significance, and none of them involved a recent repository of software projects: Albrecht was published in 1983, Canadian organization in 1996, COCOMO in 1981, Desharnais in 1988, IBM in 1994, Kemerer in 1987, Maxwell in 1993, Miyazaki in 1994, NASA in 1981, Telecom in 1997, and the most recent ISBSG release used was published in 2011. The data set of projects from six software organizations was published in 2012; however, it is small (21 projects) [25]. Finally, the year of the projects from China was not reported in the studies that used them [33,40].

In the four studies where the ISBSG data set was used, the releases were 8 [40], 10 [33], and 11 [34,35], which were published in 2003 with 2000 software projects, 2007 with 4000, and 2009 with 5052, respectively. When release 8 was used, the authors selected a data set of 148 projects based on the following ISBSG criteria: TD (new), quality rating (“A” and “B” categories), resource level 1, maximum number of people working on the project, number of business units, and IFPUG as the functional sizing method (FSM) type [40]. For release 10, a data set of 505 projects was selected taking into account only one criterion suggested by the ISBSG: quality rating (“A”) [33]. For release 11, (a) a data set of 134 projects was selected based on three ISBSG criteria: quality rating (“A” and “B”), normalized effort ratio of up to 1.2, and “Insurance” as the organization type [34]; and (b) a data set of 380 projects was selected based on quality rating (“A” and “B”), TD, organization type, DP, normalized effort ratio of up to 1.2, resource level 1, and IFPUG as FSM [35]. That is, each of these four studies selected only one data set, and the FSM version was not taken into account when selecting it, so IFPUG versions were mixed.

According to the analysis of these ten studies, PSO has been used in three fundamental manners: (a) as a tool to support CBR [33,34,35,36,37,40], (b) for selecting the SDEP model [38], and (c) for optimizing the values of an SDEP model [25,29,32]. In our opinion, the manner in which PSO was used in these studies has the following disadvantages:

- (a) An increase in the computational cost inherent to CBR models due to the incorporation of optimization techniques.

- (b) The best SDEP model can be selected from a set of predefined models, but without automatic adjustment of the parameters of the selected model.

- (c) The SDEP model to be used is defined a priori, and only its parameters are adjusted.

Taking these weaknesses into account, our proposal incorporates the following two elements into PSO:

- (1) The selection of the SDEP model, and

- (2) The automatic adjustment of the SDEP model parameters.

The analysis of Table 1 also allows us to emphasize our experimental design, which involves new and enhancement software projects selected based on their TD, DP, PLT, and FSM. The data of these projects are preprocessed through an outlier analysis and the calculation of two coefficients: correlation and determination. The models are trained and tested based on AR, using LOOCV. Finally, the hypotheses of our study are statistically tested.

## 3. Particle Swarm Optimization (PSO) and PSO-SRE

#### 3.1. PSO

Particle Swarm Optimization (PSO) is an optimization model created in 1995 by Kennedy and Eberhart [54]. It assumes a cloud of particles that “fly” in a $D$-dimensional space. This original idea was refined three years later by introducing memory into the particles [55]. Particles have access to two types of memory: individual memory (the best position occupied by the particle in the space) and collective memory (the best position occupied by the cloud in the space). The evolution of the original PSO has been continuously analyzed [15].

In PSO, the size of the particle cloud $np$ (number of particles) is a user parameter. Each particle $i$ in the cloud stores the following three real vectors of $D$ dimensions: the current position vector ${x}_{i}$, the vector of the best position reached ${p}_{i}$, and the flight velocity vector ${v}_{i}$. In addition, the cloud or swarm stores the best global position vector ${g}_{best}$.

The movement of the particles is defined as a change in their position obtained by adjusting a velocity vector, component by component. To do this, the particles use individual memory and collective memory. The $j$-th component of the velocity vector of the $i$-th particle is updated as:

$${v}_{i,j} = w\,{v}_{i,j} + {c}_{1}\,rand\left(0,1\right)\left({p}_{i,j} - {x}_{i,j}\right) + {c}_{2}\,rand\left(0,1\right)\left({g}_{best,j} - {x}_{i,j}\right)$$

where $w$ is the inertia weight, ${c}_{1}$ is the individual memory coefficient, and ${c}_{2}$ is the global memory coefficient. The function $rand\left(0,1\right)$ generates a random number in the [0, 1] interval. If the velocity components exceed the established limits, they are bounded such that ${V}_{min}\le {v}_{i,j}\le {V}_{max}$.

Subsequently, the $j$-th component of the current position vector of the $i$-th particle is adjusted as:

$${x}_{i,j} = {x}_{i,j} + {v}_{i,j}$$

This position adjustment process is repeated until a stop condition is met, which is usually set as a number of algorithm iterations.

The pseudocode of the PSO algorithm described by Shi and Eberhart [55] is shown in Figure 1; it assumes that the objective function is to be minimized.

In terms of complexity and execution time, PSO has two nested loops: an outer loop over the generations (iterations) and an inner loop over the entire population. In the inner loop, the optimization function is computed for each member of the population (particle). Considering $it$ as the number of iterations and $k$ as the cost of computing the optimization function, the execution time of PSO is bounded by $O\left(it\ast np\ast k\right)$.
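The update rules, the velocity bounds, and the stop condition above can be sketched as a small minimizer. This is a minimal sketch, not the authors' implementation: the function name `pso_minimize` is hypothetical, and uniform initialization in a box [lower, upper], zero initial velocities, and an asynchronous update of $g_{best}$ (refreshed as soon as a particle improves on it) are assumptions. The defaults reuse the parameter values reported for PSO-SRE in Section 3.2 ($w=0.1$, $c_1=c_2=1.5$, $V_{min}=-10$, $V_{max}=10$).

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_particles: int = 50,
    iterations: int = 250,
    w: float = 0.1,
    c1: float = 1.5,
    c2: float = 1.5,
    v_min: float = -10.0,
    v_max: float = 10.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f with inertia-weight PSO; returns (g_best, f(g_best))."""
    rng = random.Random(seed)
    # Assumption: uniform initial positions in [lower, upper] and zero initial velocity
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]              # individual memory: best position of each particle
    p_val = [f(xi) for xi in x]
    best = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p[best][:], p_val[best]  # collective memory: best position of the swarm

    for _ in range(iterations):              # stop condition: fixed number of iterations
        for i in range(n_particles):
            for j in range(dim):
                v[i][j] = (w * v[i][j]
                           + c1 * rng.random() * (p[i][j] - x[i][j])
                           + c2 * rng.random() * (g_best[j] - x[i][j]))
                v[i][j] = max(v_min, min(v_max, v[i][j]))  # enforce V_min <= v_ij <= V_max
                x[i][j] += v[i][j]                         # position update
            val = f(x[i])
            if val < p_val[i]:               # update individual memory
                p[i], p_val[i] = x[i][:], val
                if val < g_val:              # update collective memory immediately
                    g_best, g_val = x[i][:], val
    return g_best, g_val
```

For example, `pso_minimize(lambda z: sum(t * t for t in z), dim=2, lower=-5.0, upper=5.0)` searches for the minimum of the sphere function, whose optimum is the origin. Each iteration calls `f` once per particle, which is where the $O\left(it\ast np\ast k\right)$ bound comes from.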

#### 3.2. PSO-SRE

We use a PSO design that considers an additional element: the SDEP model. To achieve this, the 20 functions detailed in Table 2 were taken into account. The independent variable $x$ used in these 20 functions corresponds to the FSM (i.e., AFP). We add an additional component to each individual: an integer in the interval [1, 20] that represents the SDEP model to be used by the particle. Consequently, particles have different numbers of dimensions, which vary according to the coefficients of the selected model. This novel modification allows us to simultaneously optimize the selection of the SDEP model and its parameters.

Figure 2 shows a swarm of five particles used by our proposed algorithm. The pseudocode of the proposal is shown in Figure 3.

Based on Figure 2, the final value to be compared for each particle is obtained by taking its first value, which corresponds to the model number shown in Table 2, together with the following $n$ values, which correspond to the model parameters. For example, for the particle [1, 0.24, 0.18], the first value (1) corresponds to model 1 of Table 2, and the next two values (0.24, 0.18) correspond to the parameters $a$ and $b$ of the selected model, $y=a+bx$, where $y$ is the effort predicted by the SDEP model.

We use the first dimension to determine the equation assigned to the particle. Thus, we use a dynamic codification in which particles have different dimensions depending on the assigned equation. The dimensions of the particles range from two to five.
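Decoding a particle under this dynamic codification can be sketched as follows. This is a minimal sketch under stated assumptions: only model 1 ($y=a+bx$) is taken from the text, and model 2 is a hypothetical placeholder standing in for the remaining Table 2 entries. Mapping a real-valued first component to a model number by rounding and clipping is also an assumption, since the text does not say how this is done. Components beyond the coefficients the model needs are ignored. The names `MODELS`, `decode`, and `predict` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# model id -> (number of coefficients, prediction function y = f(x, *coefficients)).
# Only model 1 (y = a + b*x) comes from the text; model 2 is a HYPOTHETICAL placeholder,
# not necessarily model 2 of Table 2. The study itself uses 20 models.
MODELS: Dict[int, Tuple[int, Callable[..., float]]] = {
    1: (2, lambda x, a, b: a + b * x),
    2: (2, lambda x, a, b: a * x ** b),
}


def decode(particle: Sequence[float]) -> Tuple[int, List[float]]:
    """Split a particle into (model id, coefficients) using its first component."""
    # Assumption: the first component is rounded and clipped to a valid model id
    model_id = min(max(MODELS), max(min(MODELS), int(round(particle[0]))))
    n_coef = MODELS[model_id][0]
    return model_id, list(particle[1:1 + n_coef])


def predict(particle: Sequence[float], x: float) -> float:
    """Predicted effort for size x (e.g., AFP) under the particle's own model."""
    model_id, coef = decode(particle)
    return MODELS[model_id][1](x, *coef)
```

With this sketch, `decode([1, 0.24, 0.18])` returns `(1, [0.24, 0.18])`, and `predict([1, 0.24, 0.18], 100.0)` evaluates $0.24 + 0.18 \cdot 100$, approximately 18.24.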

To update the velocity vector of a particle, if ${g}_{best}$ has more dimensions, only the needed ones are used, and if it has fewer dimensions, random numbers are used instead. All particles use mathematical equations to optimize the effort prediction of software projects. A particle interacts with itself (updating its best position) and with the best particle of the swarm. Even if a particle and the best particle have different equations, using the coefficients of the best particle helps the particle move toward a global optimum. Random guessing would not take the fitness results into account, whereas using the best particle does.
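The dimension-matching rule for the velocity update can be sketched as below. This is an illustrative sketch, not the authors' code: the helper names `align_gbest` and `update_velocity` are hypothetical, the range used to draw the padding random numbers ([-10, 10]) is an assumption, and the sketch updates every component of the particle (including the model-number component) in the same way, which the text does not confirm.

```python
import random
from typing import List, Sequence


def align_gbest(g_best: Sequence[float], dim: int, rng: random.Random,
                low: float = -10.0, high: float = 10.0) -> List[float]:
    """Return g_best with exactly `dim` components."""
    if len(g_best) >= dim:
        return list(g_best[:dim])  # g_best has more dimensions: keep only the needed ones
    # g_best has fewer dimensions: pad with random numbers (the range is an assumption)
    return list(g_best) + [rng.uniform(low, high) for _ in range(dim - len(g_best))]


def update_velocity(x: Sequence[float], v: Sequence[float], p: Sequence[float],
                    g_best: Sequence[float], rng: random.Random,
                    w: float = 0.1, c1: float = 1.5, c2: float = 1.5,
                    v_min: float = -10.0, v_max: float = 10.0) -> List[float]:
    """Inertia-weight velocity update for a particle whose length may differ from g_best."""
    g = align_gbest(g_best, len(x), rng)
    new_v = []
    for j in range(len(x)):
        vj = (w * v[j]
              + c1 * rng.random() * (p[j] - x[j])
              + c2 * rng.random() * (g[j] - x[j]))
        new_v.append(max(v_min, min(v_max, vj)))  # bound to [V_min, V_max]
    return new_v
```

Truncating or padding ${g}_{best}$ keeps the update well defined for particles of any length, while the truncated part still carries the fitness information of the best particle.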

We consider that the proposed codification gives PSO-SRE greater search capacity (exploration) and helps it escape from local optima. However, for other optimization problems, this increase in search capacity may not yield the best results, due to the decrease in the exploitation capacity of the proposal.

The dimension and model are unique for each data set and, once the stop condition is reached, the test MAR value is calculated.

A crucial aspect of optimization algorithms such as PSO is the selection of the optimization function. In our research, two optimization functions (i.e., prediction accuracy measures) are evaluated: AR and the mean of ARs (MAR). AR is calculated for the $i$-th project as follows [13]:

$$A{R}_{i} = \left|\text{Actual Effort}_{i} - \text{Predicted Effort}_{i}\right|$$

And the mean of ARs as follows:

$$MAR = \frac{1}{N}\sum_{i=1}^{N} A{R}_{i}$$

where $N$ is the number of projects. The median of ARs is denoted by MdAR. The lower the MAR or MdAR, the more accurate the prediction model.

The parameter values for the proposed PSO model are $w=0.1$, ${V}_{min}=-10$, ${V}_{max}=10$, and ${c}_{1}={c}_{2}=1.5$; this last value was chosen because it yielded better results in our experiments than the value recommended as standard (i.e., ${c}_{1}={c}_{2}=2$ [54]). The swarm size $np$ was evaluated between 50 and 750, whereas the number of iterations was set between 250 and 1500.

As the optimization function, we use the MAR of the training set, considering LOOCV, for the corresponding model defined in Table 2.

## 4. Data Sets of Software Projects

In the present study, the data sets were obtained from the ISBSG release 2018, an international public repository containing data on software projects developed between 1989 and 2016 and reported from 32 countries, among them Spain, the United States, the Netherlands, Finland, France, Australia, India, Japan, Canada, and Denmark [59]. Following the ISBSG guidelines, the data sets were selected taking into account data quality, FSM, TD, DP, and PLT [41].

Table 3 describes the number of projects remaining after applying each criterion (the ISBSG classifies the data quality of projects from “A” to “D”, and “A” and “B” are recommended for statistical analysis). Since projects sized with IFPUG versions prior to V4 should not be mixed with V4 and post-V4 projects [41], only projects whose FSM corresponded to IFPUG 4+ were selected. Classifying the final 2054 projects of Table 3 by TD, 618 were new, 1416 enhancement, and 20 re-development projects. The types of DP reported by the ISBSG are mainframe (MF), midrange (MR), multiplatform (Multi), personal computer (PC), and proprietary, whereas the PLTs are second (2GL), third (3GL), and fourth (4GL) generation, and application generator (ApG). As for the resource level, the ISBSG classifies it according to how effort is quantified, and level 1 corresponds to development team effort [41]. The new and enhancement data sets were selected because they are the largest.

The IFPUG V4+ FSM is reported in AFP, a composite value calculated from the following nineteen variables: internal logical files, external interface files, external inputs, external outputs, external inquiries, data communications, distributed data processing, performance, heavily used configuration, transaction rate, on-line data entry, end-user efficiency, on-line update, complex processing, reusability, installation ease, operational ease, multiple sites, and facilitate change [13].

Table 4 classifies the final 2034 new and enhancement projects according to the criteria included in Table 3. Since the $\chi^2$ normality test applied in this study needs at least thirty data points, a scatter plot (Effort vs. AFP) was generated for each data set of Table 4 with thirty or more projects (i.e., fifteen data sets). The scatter plots of these fifteen data sets showed skewness, heteroscedasticity, and the presence of outliers; therefore, four statistical normality tests were applied to the Effort and AFP variables in Table 5.

Table 5 shows that, for every data set, at least one p-value is lower than 0.01. This means that the hypothesis that Effort and AFP come from a normal distribution can be rejected with 99% confidence for all data sets. Therefore, the data are normalized by applying the natural logarithm (ln), which ensures that the resulting model passes through the origin on the raw data scale [43]. For example, Figure 4 and Figure 5 depict the scatter plots of the Table 4 data set having 133 new software projects, using the raw and the transformed data, respectively.

Outliers were identified as projects whose studentized residuals were greater than 2.5 in absolute value. The outliers, as well as the coefficients of correlation ($r$) and determination ($r^2$) per data set, are included in Table 6. As for the acceptable number of outliers, 5% per data set was taken as the reference [60]. As for the coefficient of determination, an $r^2$ value higher than 0.5 was required, since this threshold has been accepted for SDEP models [61]. Thus, eight of the fifteen data sets analyzed in Table 6 were selected to generate their corresponding PSO-SRE and SRE. The other seven were excluded because three had an $r^2$ value lower than 0.5, three had between 11% and 16.6% outliers, and one had $r^2 = 0.4119$ with 14.28% outliers.

The SRE model is linear and has the form $\ln(\text{Effort}) = a + b \ast \ln(\text{AFP})$.

Table 7 contains the SRE for each data set selected from Table 6. All equations agree with the assumption about development effort: the larger the size (i.e., AFP), the higher the effort.

## 5. Results

Applying LOOCV, a total of 130, 99, 96, 440, 64, 428, 190, and 53 SREs were generated for the eight data sets, respectively.
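The LOOCV procedure that generates one SRE per project can be sketched as follows: each project is held out once, an SRE is fitted to the remaining projects, and the held-out project is predicted, so the number of SREs equals the number of projects in the data set. Assumptions: OLS on ln-transformed data, and AR computed on the ln scale (the text does not state whether predictions are back-transformed before computing AR). The function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def _ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """OLS intercept and slope of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def loocv_sre_ars(afp: Sequence[float], effort: Sequence[float]) -> List[float]:
    """One SRE per held-out project; returns each held-out project's AR on the ln scale."""
    lx = [math.log(v) for v in afp]
    ly = [math.log(v) for v in effort]
    ars = []
    for i in range(len(lx)):
        a, b = _ols(lx[:i] + lx[i + 1:], ly[:i] + ly[i + 1:])  # train without project i
        ars.append(abs(ly[i] - (a + b * lx[i])))                # test on project i
    return ars


def loocv_mar(afp: Sequence[float], effort: Sequence[float]) -> float:
    """MAR over all held-out projects."""
    return mean(loocv_sre_ars(afp, effort))
```

Because every project is held out exactly once, LOOCV is deterministic: repeating it on the same data set always yields the same ARs.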

The proposed PSO-SRE algorithm was executed for each data set with different configurations, in a distributed manner on a dedicated laboratory server, aiming to keep the execution time as short as possible and to obtain the best possible result.

After applying PSO-SRE, the selected SDEP model (from those included in Table 2) can be detailed for each data set. Table 8 includes the PSO-SRE configuration in terms of the number of iterations and the swarm size for three types of tests, described next (the values were selected from the configurations described in the previous paragraph):

Test 1: Up to 500 iterations, and up to 250 individuals in the swarm;

Test 2: Up to 1500 iterations, and up to 750 individuals in the swarm;

Test 3: Up to 1000 iterations, and up to 500 individuals in the swarm.

As for velocity updates, Barrera et al. [62] address the issue of iteratively defining velocity limits. They show that, for some optimization functions, the velocity update reported by Shi and Eberhart [55] is susceptible to sub-optimal behavior. However, for predicting the effort of software projects, we obtained good results with the approach of Shi and Eberhart [55].

The adequate number of iterations and swarm size depend on the data set, since each data set converges under different conditions. Relating the swarm size and number of iterations of the three tests of Table 8 to the prediction accuracy per data set in Table 9, we conclude that increasing the swarm size and the number of iterations degrades the performance of the proposed PSO-SRE algorithm. Based on the Test 1 data, between 250 and 500 iterations and a swarm size between 50 and 250 can be suggested for generating better results.

Table 9 includes the prediction accuracy obtained by each model. It shows that, for Test 1 (i.e., when the swarm size and number of iterations were lower than those of Test 2 and Test 3), PSO-SRE had a better MAR than SRE in seven of the eight data sets and an equal MAR in the remaining one. In addition, the Test 1 MARs were better than those of Test 2 and Test 3 for all data sets except one, in which the MAR was equal for the three tests (MAR = 0.61). Thus, data obtained from Test 1 are used in the present study.

Table 9 also includes a simple Random Search (RS) algorithm, which:

- has neither memory of its own nor a search direction,
- repeats the random search for “the best particle” a number of times equal to the number of fitness evaluations in the proposed PSO-SRE, and
- compares its best solution with the best solution yielded by PSO-SRE.

Since the MAR alone is not sufficient to report results in studies on software effort prediction, a suitable statistical test is applied to compare the accuracies of the two models [46]. The selection of this test should be based on the number of data sets to be compared, data dependence, and data distribution. In our study, two data sets are compared at a time, and they are dependent (because each model was applied to each project of a data set). As for data distribution, first, a new data set is derived from each of the eight data sets of Table 9 by taking, for each project, the difference between its two ARs (the AR of SRE and the AR of PSO-SRE). Second, four normality statistical tests are applied to each new data set. Third, if any of the four p-values is lower than 0.05 or 0.01, the data are considered non-normally distributed at the 95% or 99% confidence level, respectively, and a Wilcoxon test should be applied (comparing the medians of the models to accept or reject the hypothesis); otherwise, a paired t-test should be performed (comparing the means of the models) [63].
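The decision rule above can be sketched as follows. The sketch assumes the four normality p-values have already been computed by a statistics package; the four tests are not named here, so they are left abstract. It builds the paired AR differences, selects the test, and returns the summary statistic the text says is compared: medians for the Wilcoxon test, means for the paired t-test. It also computes the paired t statistic, but it omits the p-value, which would come from a t distribution with n − 1 degrees of freedom. All helper names are hypothetical.

```python
import math
from statistics import mean, median, stdev
from typing import Dict, List, Sequence


def paired_differences(ar_a: Sequence[float], ar_b: Sequence[float]) -> List[float]:
    """Per-project difference between the ARs of two models on the same data set."""
    if len(ar_a) != len(ar_b):
        raise ValueError("both models must be evaluated on the same projects")
    return [a - b for a, b in zip(ar_a, ar_b)]


def choose_test(normality_pvalues: Sequence[float], alpha: float = 0.05) -> str:
    """Wilcoxon if ANY normality test rejects normality at level alpha, else paired t-test."""
    if any(p < alpha for p in normality_pvalues):
        return "wilcoxon"
    return "paired t-test"


def paired_t_statistic(diffs: Sequence[float]) -> float:
    """t = mean(d) / (stdev(d) / sqrt(n)) over the paired differences."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def summary_to_compare(
    ar_a: Sequence[float], ar_b: Sequence[float], test: str
) -> Dict[str, float]:
    """Medians for the Wilcoxon test, means for the paired t-test."""
    stat = median if test == "wilcoxon" else mean
    return {"model_a": stat(ar_a), "model_b": stat(ar_b)}
```

With p-values `[0.20, 0.03, 0.5, 0.9]`, normality is rejected at the 95% level, so the Wilcoxon test is selected. At the 99% level (`alpha=0.01`) none of them rejects normality, so the paired t-test is selected instead.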

Table 10 shows that the data were normally distributed in only two cases; therefore, a Wilcoxon test was applied in the remaining fourteen cases, and the medians were used for those fourteen comparisons.

An important issue regarding PSO-SRE is its computational cost. The server used for performing the tests had the following characteristics: OS: Ubuntu 20.04.1 LTS x86_64; Host: PowerEdge R720; Kernel: 5.4.0-47-generic; CPU: Intel Xeon E5-2620 0 (24) @ 2.500 GHz; GPUs: NVIDIA Tesla K20Xm and NVIDIA GeForce 210; Memory: 4828 MiB/64,347 MiB.

We executed the algorithm in a distributed manner (i.e., data sets in parallel), which reduced the execution time. However, to estimate the total time spent on each set of experiments under the LOOCV used, we also ran the experiments sequentially (i.e., one data set at a time).
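To make the cost of LOOCV concrete, the following sketch runs LOOCV sequentially on one data set and times it. For illustration only, it assumes a linear SRE of the form effort = a + b·AFP fitted by ordinary least squares; the actual SRE forms are those listed in Table 2. In PSO-SRE, the least-squares fit inside each fold would be replaced by a PSO search. The function names are hypothetical.

```python
import time
from statistics import mean
from typing import List, Sequence, Tuple


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x (hypothetical SRE form)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loocv_ars(afp: Sequence[float], effort: Sequence[float]) -> List[float]:
    """Train on n-1 projects, predict the held-out one, record its AR; repeat for all n."""
    ars: List[float] = []
    for i in range(len(afp)):
        train_x = [v for j, v in enumerate(afp) if j != i]
        train_y = [v for j, v in enumerate(effort) if j != i]
        a, b = fit_linear(train_x, train_y)
        ars.append(abs(effort[i] - (a + b * afp[i])))
    return ars


def timed_loocv(afp: Sequence[float], effort: Sequence[float]) -> Tuple[float, float]:
    """Return (MAR, elapsed seconds) for one sequential LOOCV run."""
    start = time.perf_counter()
    ars = loocv_ars(afp, effort)
    return mean(ars), time.perf_counter() - start
```

LOOCV is deterministic because every project is held out exactly once and there is no random fold assignment. This is what makes the validation reproducible. It is also why the total time grows with the number of projects in a data set.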

Table 11 shows the sequential execution time by data set. Its “Prediction” column refers to the time the proposed PSO-SRE takes to predict the effort of a software project in each data set. As shown in Table 11, the proposed PSO-SRE is able to predict the effort of a software project in less than half a minute.

## 6. Conclusions

The results shown in Table 9 and Table 10 allow us to accept the following alternative hypothesis, formulated in Section 1 of our study, in favor of PSO-SRE for seven of the eight data sets (six of them at the 99% confidence level and the seventh at the 95% confidence level):

Prediction accuracy of the PSO-SRE is statistically not equal to that of the SRE when these two models are applied to predict the development effort of software projects using the AFP as the independent variable.

Regarding the remaining data set, the following null hypothesis is accepted at the 99% confidence level:

Prediction accuracy of the PSO-SRE is statistically equal to that of the SRE when these two models are applied to predict the development effort of software projects using the AFP as the independent variable.

As for the comparison between PSO-SRE and RS, the following hypothesis can be accepted in favor of PSO-SRE for the eight data sets at the 99% confidence level:

Prediction accuracy of the PSO-SRE is statistically not equal to that of the RS when these two models are applied to predict the development effort of software projects using the AFP as the independent variable.

We conclude that a software manager can apply PSO-SRE to predict the development effort of a software project with the following TD, DP, and PLT when AFP is used as the independent variable:

- (a)
New software projects coded in 3GL and developed on either Mainframe or Multiplatform, and new software projects coded in 4GL and developed on Multiplatform.

- (b)
Software enhancement projects coded in 3GL and developed on Multiplatform, MidRange, or personal computer, as well as enhancement projects coded in 4GL and developed on Multiplatform.

- (c)
Since the performance of PSO-SRE was statistically equal to that of SRE, a software manager could also apply PSO-SRE as an alternative to SRE for software enhancement projects coded in 3GL and developed on Mainframe.

Regarding the PSO-SRE settings, the best prediction accuracy per data set was generally obtained with between 250 and 500 iterations and a swarm size between 50 and 250.
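As a rough illustration of where these settings enter, the sketch below runs a basic inertia-weight PSO with a swarm size of 50 and 250 iterations, the lower ends of the suggested ranges. It fits the two parameters of a hypothetical linear SRE, effort = a + b·AFP, by minimizing MAR. The model form, the search bounds, the coefficients `w`, `c1`, and `c2`, and the function names are assumptions. The sketch also omits the model-selection component of PSO-SRE.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def mar_linear(params: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """MAR of the hypothetical SRE effort = a + b*AFP."""
    a, b = params
    return mean(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))


def pso_fit(
    x: Sequence[float],
    y: Sequence[float],
    swarm_size: int = 50,
    iterations: int = 250,
    bounds: Tuple[float, float] = (-10.0, 10.0),
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 1,
) -> Tuple[List[float], float]:
    """Global-best PSO minimizing MAR over (a, b); returns (best params, best MAR)."""
    rng = random.Random(seed)
    lo, hi = bounds
    dim = 2
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_f = [mar_linear(p, x, y) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = mar_linear(pos[i], x, y)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

In this sketch, larger swarms or more iterations only add fitness evaluations (swarm size × iterations). Whether they improve or degrade accuracy on real data is an empirical question, which Tests 1 to 3 address.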

## 7. Discussion

In software prediction, one of the most commonly predicted variables has been effort, which is usually measured in person-hours or person-months. SDEP is needed by managers to estimate the cost of projects, and hence for budgeting and bidding; its importance is reflected in the hundreds of studies published in the last forty years. In the present study, PSO was applied to optimize the parameters of SDEP equations. The prediction accuracy of PSO-SRE was compared to that of SRE. Both types of models were generated from eight data sets of software projects selected by following the guidelines of the ISBSG.

Comparing our study with the ten studies described in Table 1 in which PSO has been applied to SDEP, we identified the following issues:

None of them generates its models using a recent repository of software projects.

Regarding the four studies that use the ISBSG, (1) the releases used are those published in 2007 and 2009, (2) each selects only one data set from the ISBSG, with sizes between 134 and 505 projects, and (3) none takes the FSM version into account when selecting the data set. In our study, by contrast, (1) ISBSG release 2018 was used, (2) eight data sets containing between 53 and 440 projects were selected, and (3) all data sets were selected following the guidelines suggested by the ISBSG, including the type of FSM, that is, our data sets did not mix IFPUG V4-type projects with V4 and post-V4 ones.

The majority of them base their conclusions on a biased prediction accuracy measure and a nondeterministic validation method.

Only half of them base their conclusions on statistical significance.

We did not find any study sharing all of the following characteristics of our proposal of PSO-SRE:

- (1)
The use of PSO with an additional component that allows, in a single step, both the automatic selection of the SDEP model and the automatic adjustment of its parameters.

- (2)
New and enhancement software projects obtained from the ISBSG release 2018.

- (3)
Software projects selected taking into account the TD, DP, PLT, and FSM as suggested by the ISBSG.

- (4)
Preprocessing of data sets through outlier analysis and correlation and determination coefficients.

- (5)
A nonbiased prediction accuracy measure (i.e., AR) to compare the performance between PSO-SRE and SRE models.

- (6)
The use of a deterministic validation method for training and testing the models (i.e., LOOCV).

- (7)
Selection of a suitable statistical test based on the number of data sets to be compared, data dependence, and data distribution for comparing the prediction accuracy of PSO-SRE and SRE by data set.

- (8)
Hypotheses tested for statistical significance.

Our manuscript also followed all six guidelines for proposing a new SDEP model [43]: (1) confirming that our PSO-SRE algorithm outperforms a statistical model (i.e., SRE), since PSO-SRE outperformed SRE in seven of the eight data sets with statistical significance, (2) taking into account the heteroscedasticity, skewness, and heterogeneity of the effort and size data of software projects, (3) using statistical tests to compare the performance of the prediction models, (4) explaining in detail how our PSO-SRE is applied, (5) justifying the selection of every statistical test we used, and (6) including the criteria followed for selecting the data sets of software projects from the ISBSG.

A first limitation of the present study, which reduces the generalizability of our conclusions, is the number of data sets used: although the ISBSG contains more than eight thousand software projects, we could only select eight data sets following the ISBSG guidelines. A second limitation is that only 20 prediction models based on simple SRE were considered (Table 2). Finally, a third limitation is that we did not consider other, more complex ML prediction models.

As for threats to external validity, the prediction accuracy of PSO-SRE will depend on how accurately the practitioner estimates the value of the independent variable (i.e., the size measured in AFP).

Another limitation of using heuristic algorithms (PSO in this paper) is that they prune the search space and can discard useful regions. We are aware that the proposed method, despite its good behavior for predicting the effort of software projects, may perform differently on other optimization problems.

Future work will apply other metaheuristics to optimize the parameters of the SRE. We intend to use a greater number of data sets whose data are current and reflect the heterogeneous evolution of such data. Alternative models, such as those based on classifiers [64,65], will also be proposed to predict the effort of new and enhancement software projects. Moreover, additional prediction accuracy criteria, such as standardized accuracy and effect size [66], will be taken into account. Finally, the algorithm could be modified to take duplicate values into account and treat them consistently, and alternative velocity update mechanisms, such as those in [62], could be applied and tested against the one currently used. We will also explore newer and improved implementations of PSO.