Improving the Results of the Earned Value Management Technique Using Artificial Neural Networks in Construction Projects

The cost, time and scope of a construction project are key parameters for its success. Thus, predicting these indices is indispensable. Correct and accurate prediction of cost throughout the progress of a project gives project managers the chance to identify projects that need revision in their schedules in order to result in the maximum benefit. The aim of this study is to minimize the shortcomings of the Earned Value Management (EVM) method using an Artificial Neural Network (ANN) and multiple regression analysis in order to predict project cost indices more precisely. A total of 50 road construction projects in Fars Province, Iran, were selected for analysis in this research. An ANN model was used to predict the projects’ cost performance indices, thereby creating a more accurate symmetry between the predicted and actual cost by considering factors that influence project success. The input data of the ANN model were analysed in MATLAB software. A multiple regression model was also used as another analytical tool to validate the outcome of the ANN. The results showed that the ANN model resulted in a lower Mean Squared Error (MSE) and a greater correlation coefficient than both the traditional EVM model and the multiple regression model.


Introduction
The number of road construction projects is increasing dramatically every year. Although project management is being more expertly implemented, there are still problems associated with cost overruns in projects [1]. One of the factors that increases the capital output ratio for a country's economy is cost overrun. Estimating the cost of projects has always been a crucial, demanding and sophisticated challenge [2,3]. Cost estimation is a process in which the total cost of a project is predicted based on the existing information [4]. Generally, cost estimation is conducted in order to set the initial budget of a project, which will ideally produce symmetry between the initial estimation and the subsequent actual cost [1]. Cost estimation presents some difficulties, such as the initial information required, the small number of databases available for road construction project costs, the low efficiency of existing cost estimation methods and the existence of uncertainties [5].
Earned Value Management (EVM) is a tool to help with controlling the progress of a project. EVM is able to illustrate the current status of projects, as well as measuring current variances [6]. To assess the progress of projects, EVM exploits three constraints: time, scope and cost. Moreover, EVM is able to predict the future parameters of projects, including the final cost, based on existing data [7][8][9]. This

Methodology
The methodology of the current study was determined according to the research aim. The main purpose of this research was to improve the prediction of the traditional EVM system in Fars road construction projects using an artificial neural network, and to compare it with a multiple regression model. This aim can be divided into three stages. First, factors affecting the earned value of Fars road construction projects were determined from the existing literature; an artificial neural network was built in MATLAB, and the identified factors were introduced to the ANN model. Next, the identified factors were prioritized in MATLAB using the ANN model. Finally, multiple regression was used as the analysis tool, and the obtained results were compared with those of the ANN model. These stages are summarized in Figure 1.

Predicting Earned Value Using Artificial Neural Network
Intelligent dynamic systems, such as ANNs, have been under researchers' focus recently [44][45][46][47][48][49][50][51]. ANNs are able to identify the relationship among data by analyzing them and to then exploit this relationship in further analyses [52]. In fact, these computational intelligence-based systems attempt to model the neurosynaptic structure of the brain and are able to contribute to estimation, prediction and categorization problems effectively [53]. Generally, ANNs consist of three layers, namely, the input, hidden and output layers. Each of the abovementioned layers possesses its own neurons. It is important to mention that the number of hidden layers may be more than one according to the problem. In the current study, a multilayer perceptron network was used.

Input Data
Variables affecting the status of the project must be identified in order to investigate its future status. These variables are the input data of the artificial neural network. In this study, 14 factors affecting a project's success were identified by investigating the existing literature, including books, journal papers and documents from the Fars State Road Administration. Due to the high sensitivity of this paper's topic, the authors were not able to reduce this number of factors. Some of the variables, such as the inflation rate, had numerical values; the inflation rate was obtained from the Central Bank of Iran. Other variables, such as the qualification of the project management team, were not numerical. These were quantified by scoring them from 1 to 5, where 1 and 5 stand for the worst and best status of a variable, respectively. For clarity, the qualitative status of a variable and its corresponding score are presented in Table 1. Ten questionnaires were filled out by experts for each project, so 500 questionnaires were used for data gathering. The Cost Performance Index (CPI) of each project was extracted from the Microsoft Project files of the studied projects. The Mean Squared Error (MSE) was then calculated in Microsoft Excel and used to compare the results of the ANN, multiple regression and the traditional EVM method. The Box-Cox method was applied in SPSS software to normalize the data, which were then exported to MATLAB software for the subsequent stages. The CPI and MSE formulas are as follows [1,8,54,55]:

$$\mathrm{CPI} = \frac{\mathrm{BCWP}}{\mathrm{ACWP}}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where BCWP and ACWP stand for the budgeted cost of the work performed and the actual cost of the work performed, respectively, $n$ is the number of observations, and $y_i$ and $\hat{y}_i$ are the actual and predicted values of the ith observation.
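To make the two measures concrete, the following is a minimal sketch of how CPI and MSE could be computed. The function names and the numbers are hypothetical and are not taken from the studied projects; the study itself extracted CPI from Microsoft Project files and computed MSE in Microsoft Excel.

```python
def cost_performance_index(bcwp, acwp):
    """CPI = BCWP / ACWP; a value below 1 indicates a cost overrun."""
    if acwp <= 0:
        raise ValueError("ACWP must be positive")
    return bcwp / acwp


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical figures for illustration only.
cpi = cost_performance_index(bcwp=90.0, acwp=100.0)
mse = mean_squared_error([1.0, 0.9, 1.1], [1.0, 1.0, 1.0])
print(cpi, mse)
```

With these illustrative inputs the work performed was budgeted at less than it actually cost, so the CPI falls below 1.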

Architecture of the Network
In this stage, the network's architecture must be determined, which requires specifying the number of input, hidden and output layers [15]. In this study, an MLP (multilayer perceptron) network is used, in which the output of each layer is the input vector for the next layer. Each layer's neurons are connected to the previous layer's neurons. Each neuron computes the weighted sum of its inputs and passes the result through a function called the transfer function. The tangent sigmoid is regarded as one of the most useful transfer functions and has been widely used by experts [56][57][58][59][60][61]; it was therefore used as the transfer function here. The final network in this research is a multilayer perceptron neural network with 14 input variables in an input layer, a hidden layer and an output layer. The schematic structure of the designed neural network is illustrated in Figure 2.
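The layer-by-layer computation described above can be sketched as a single forward pass. This is an illustration under stated assumptions, not the authors' MATLAB model: the weights are random and untrained, the hidden layer uses the hyperbolic tangent, the output neuron is assumed to be linear, and the hidden size of three follows the figure the paper reports for its final network.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


random.seed(0)
n_inputs, n_hidden = 14, 3  # 14 factors; 3 hidden neurons as reported in the paper

# Random, untrained weights purely for demonstration (assumption).
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = random.uniform(-1, 1)

x = [random.uniform(-1, 1) for _ in range(n_inputs)]  # one hypothetical normalized project
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

Because tanh is bounded between −1 and 1, the output magnitude cannot exceed the sum of the absolute output weights plus the absolute output bias.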

Determination and Prioritization of Factors Using ANN
After training the network, output coefficients of introduced variables can be extracted from MATLAB software. As the artificial neural network considers all the introduced factors important, the prioritization of factors is conducted according to the coefficients.
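As a small illustration of coefficient-based prioritization, the sketch below ranks factors by the absolute value of their coefficients. The five values are the ones the paper reports in its Conclusions for the leading factors. Ranking by magnitude rather than signed value is an assumption, although it matches the reported order, in which the negative inflation-rate coefficient sits third.

```python
# Coefficients of the five leading factors as reported in the Conclusions.
coefficients = {
    "Project plan": 0.81,
    "Payment status": 0.65,
    "Inflation rate": -0.58,
    "Fortuitous events": 0.42,
    "Qualification of project management team": 0.40,
}

# Assumption: importance is judged by magnitude, so sort on the absolute value.
ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)

for rank, (factor, coef) in enumerate(ranked, start=1):
    print(rank, factor, coef)
```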

Earned Value Prediction Using Multiple Regression Method
The relationship between the dependent variable and the independent variables can be determined using the multiple regression method [62]. There are four methods for entering the input data into the model: the enter (direct) method, the backward method, the forward method and the stepwise method [63]. In this study, the direct entering method was used. The linear relationship among the variables is:

$$y_i = b_0 + \sum_{j=1}^{p} b_j x_{ij} + e_i$$

where $y_i$ is the ith value of the dependent variable, $p$ is the number of predictors, $b_0$ is the intercept, $b_j$ is the value of the jth coefficient, $x_{ij}$ is the ith value of the jth predictor, and $e_i$ is the error of the ith value. The matrix form of the model is:

$$Y = X\beta + \varepsilon$$

where $\beta$ is the vector of regression coefficients, $\varepsilon$ is the vector of fitting errors, $Y$ is the vector of the dependent variable, and $X$ is the matrix of independent variables. To determine and rank the factors affecting the earned value of the studied projects, the outputs of the SPSS analyses were used. Variables with a significance value of less than 0.05 were selected as effective factors and were prioritized according to their significance values.
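For readers who want to see the direct (enter) method in miniature, the following sketch fits all predictors at once by ordinary least squares through the normal equations. It is only an illustration: the study used SPSS, the helper names are hypothetical, and the data are synthetic values generated from a known linear rule so that the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] minimizing the squared error of Y = X b."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from y = 1 + 2*x1 - 0.5*x2 (assumption).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)
print([round(c, 6) for c in coefs])
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; statistical packages generally use more numerically stable decompositions.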
Finally, the ANN and the multiple regression model were compared according to the correlation coefficient and mean squared error of each model. The model possessing the higher correlation coefficient, as well as the lower MSE, was introduced as the preferable model [64].
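The comparison criterion can be sketched as follows: compute the MSE and the Pearson correlation coefficient of each model's predictions against the actual values, then prefer the model with the lower MSE and the higher correlation. All numbers below are hypothetical CPI values invented for the illustration, and the tie-breaking rule (select on MSE, then confirm with the correlation) is an assumption.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical actual CPI values and model predictions (not the study's data).
actual = [0.95, 1.02, 0.88, 1.10, 0.97]
predictions = {
    "ANN": [0.96, 1.01, 0.90, 1.08, 0.97],
    "Multiple regression": [0.99, 0.98, 0.93, 1.04, 1.00],
}

scores = {
    name: (mse(actual, pred), pearson_r(actual, pred))
    for name, pred in predictions.items()
}
best = min(scores, key=lambda name: scores[name][0])  # lowest MSE wins

for name, (err, r) in scores.items():
    print(name, round(err, 5), round(r, 3))
print("preferred:", best)
```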

Data Collection
To collect data, information on 50 road construction projects in Fars Province was extracted from project documents. Together with data from other literature sources, this information was converted into matrices and analyzed. As all factors affecting the cost of these projects had to be considered, 14 factors were finally selected.

Normalizing Data
Using SPSS software, the data were normalized to the range between −1 and 1. The ANN's output can be returned to its original scale by applying the inverse transformation. The normalized data are shown in Table 2.
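The paper does not state which scaling formula was used, so the sketch below assumes a simple linear min-max mapping onto [−1, 1], together with its inverse for returning network outputs to the original scale. The function names and the sample scores are hypothetical.

```python
def fit_minmax(values):
    """Return the (min, max) pair needed to scale and unscale a variable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant and cannot be scaled")
    return lo, hi


def to_unit_range(value, lo, hi):
    """Map a value linearly from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def from_unit_range(scaled, lo, hi):
    """Inverse mapping: return a value in [-1, 1] to the original scale."""
    return (scaled + 1.0) * (hi - lo) / 2.0 + lo


# Hypothetical expert scores on the 1-to-5 scale used for qualitative factors.
scores = [1, 2, 3, 4, 5]
lo, hi = fit_minmax(scores)
scaled = [to_unit_range(s, lo, hi) for s in scores]
restored = [from_unit_range(s, lo, hi) for s in scaled]
print(scaled)
print(restored)
```

The minimum and maximum must be stored at scaling time, since the inverse mapping needs the same pair to recover the original units.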

Determining Hidden Layers of ANN
It is best to keep the number of hidden layers as low as possible. One hidden layer is initially considered for the ANN; after training, the number of layers is increased only if the output is not satisfactory. Furthermore, a number of functions can be used to produce the network's output. In this study, the tangent sigmoid function was used. The network built in MATLAB software included 14 neurons in its input layer and 3 neurons in its hidden layer. The structure of the network is illustrated in Figure 3.

Training of the ANN
The network in this study is an MLP network trained with error backpropagation. The Levenberg-Marquardt algorithm was selected as the training function because of its fast convergence. The transfer function was selected by trial and error, until the MSE reached its lowest value in both the training set and the testing set. The data set was randomly divided into three groups: seventy percent of the data was used for training the network, fifteen percent for testing, and fifteen percent for validation. The training settings of the ANN in MATLAB are shown in Figure 4. The maximum number of epochs was set to 1000, and the network reached its lowest training error after 15 epochs. The network's gradient function performance, MSE graph and regression graphs are shown in Figures 5-7, respectively.
As a sample, the Status Curve (S-Curve) of one of the studied projects was drawn using the trained ANN and compared with the traditional EVM's S-Curve. The improvement of the S-Curve is clearly seen in the figures below. Figures 8 and 9 illustrate the S-Curves of the traditional model and the ANN, respectively.
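The random 70/15/15 partition described in this section can be sketched as below. This is not the MATLAB routine used in the study. The function name, the fixed seed and the choice of 50 samples (one per project) are assumptions, and because 15% of 50 is not a whole number, the remainder is assigned to the test set here.

```python
import random


def split_indices(n, seed=0):
    """Randomly split range(n) into 70% training, 15% validation and the rest for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = (n * 70) // 100  # integer arithmetic avoids floating-point rounding
    n_val = (n * 15) // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains after the first two groups
    return train, val, test


train, val, test = split_indices(50)
print(len(train), len(val), len(test))
```

Fixing the seed makes the partition reproducible, which helps when comparing several network configurations on the same split.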

Determination and Prioritization of Factors Affecting Earned Value in the ANN
After training of the ANN in MATLAB, each variable is given a unique coefficient. Coefficients for the identified factors are illustrated in Table 3.
According to the factors' coefficients, the ANN's function to predict the aim is obtained as follows: Then, the final equation is obtained as follows:

Investigating the Condition of Using Multiple Regression Analysis
In this stage, SPSS software was used. The first condition for using linear regression is that the earned value data be normally distributed. Thus, a Kolmogorov-Smirnov test was conducted on the data to determine whether they were normal. The results showed that the data were not normal. Table 4 and Figure 10 present the information regarding this test.

Analysis of Multiple Regression Model
Data analysis was conducted to validate the ANN results by comparing them with the multiple regression results. The correlation coefficient and determination coefficient of the fitted multiple regression were 0.864 and 0.747, respectively. This means that about 75% of the dependent variable's variance is explained by the model's independent variables. The coefficients and the model analysis results are presented in Tables 5 and 6, respectively. In Table 6, B and β stand for unstandardized coefficients and standardized coefficients, respectively. Although it is easier to write the multiple regression model's equation using unstandardized coefficients, standardized coefficients allow the variables to be compared more easily: a higher absolute value of the standardized coefficient means that the variable contributes more to predicting the outcome. According to the results, "Risk management", "Plans", "Project schedule", "Relationship among project's parties" and "Conflicts" are the most important factors.
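As a quick consistency check on the reported values, and assuming the usual definitions for a linear model with an intercept, the determination coefficient is the square of the multiple correlation coefficient, and a standardized coefficient is the unstandardized one rescaled by a ratio of standard deviations. The symbols $s_{x_j}$ and $s_y$, denoting the sample standard deviations of the jth predictor and of the dependent variable, are introduced here for illustration and do not appear in the original.

```latex
R^2 = (0.864)^2 \approx 0.746,
\qquad
\beta_j = B_j \, \frac{s_{x_j}}{s_y}
```

The small difference from the reported 0.747 is consistent with rounding of the correlation coefficient to three decimals.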
In this study, an artificial neural network model for road construction projects was used to improve the prediction of the earned value, and a multiple regression model was used to validate the ANN results. The mean squared errors calculated for the ANN and multiple regression models and the real values of the projects are shown in Table 7. As can be seen, both the ANN model and the multiple regression model have low errors. Moreover, the ANN model not only had the lowest error but also the highest correlation coefficient.

Conclusions
Perceptron neural networks, especially multilayer perceptron networks, are considered to be among the most effective neural networks. In this study, it was observed that these networks were able to perform a non-linear mapping with desirable accuracy when a suitable number of layers and neurons was selected. As these neural networks possess the two main features of experimental data-based learning and parallel generalization ability, they are well suited to sophisticated systems that are impossible or difficult to model. Artificial neural networks can be more accurate than other methods because they rely on well-established mathematical formulations that minimize the prediction error. One of the aspects that limit the usage of artificial neural networks is the difficulty faced when training them. These networks produce better results when they receive a large amount of data. However, adjusting the parameters of network training is a difficult task that requires experience and a lot of trial and error. Furthermore, convergence to an incorrect answer, memorizing the training data instead of learning from it (overfitting), and long training times are other difficulties associated with using artificial neural networks.
In this research, two different models, i.e., an artificial neural network model and a multiple regression model, were designed and analyzed in order to improve the traditional earned value management system. The latter model was used as a validation test for the ANN model. Road construction projects in Fars Province, Iran, between 2010 and 2020 were investigated as a case study. Fourteen factors affecting the earned value of these projects were identified. According to the ANN results, "Project plan", "Payment status", "Inflation rate", "Fortuitous events" and "Qualification of project management team", with coefficients of 0.81, 0.65, −0.58, 0.42 and 0.40, respectively, were the top five influencing factors. On the other hand, according to the multiple regression model results, "Risk management", "Plans", "Project schedule", "Relationship among project's parties" and "Conflicts", with standardized coefficients of 0.333, 0.321, 0.311, 0.297 and 0.254, respectively, were the most important factors. A comparison of the two models showed that both produce better results than the traditional EVM method. Moreover, the ANN model, with an MSE of 0.00206 and an R value of 0.896, was selected as the best model.
The methods used in this study could also be used to tackle other problems in the construction industry. The results obtained in this study will help road construction industry members to predict the earned value of future projects more precisely. ANN models are highly recommended by the authors for use in other construction problems. Furthermore, it is suggested that prospective researchers focus on more complex construction projects in order to investigate the performance criteria more deeply [65].