Cost Forecasting of Substation Projects Based on Cuckoo Search Algorithm and Support Vector Machines

Accurate prediction of substation project cost helps to improve investment management and sustainability, and it is directly related to the economy of substation projects. Ensemble Empirical Mode Decomposition (EEMD) can decompose non-stationary sequence signals into components with significant regularity and periodicity, which helps to improve the accuracy of the prediction model. Adding a Gauss perturbation to the traditional Cuckoo Search (CS) algorithm improves the search vitality and precision of the CS algorithm. The improved algorithm is then used to optimize the parameters and kernel function of the Support Vector Machines (SVM) model. A comparison with the prediction results of other models shows that the proposed model has higher prediction accuracy.


Introduction
The prediction of the cost level of power grid projects is an important part of the economic evaluation of the power system. Grid projects are often capital-intensive and have high technical requirements [1]. Predicting the cost level of grid projects effectively is therefore of great significance for improving investment efficiency and sustainability. Grid projects mainly include transmission projects and substation projects, both of which are directly related to the safe production, normal operation, and economic benefits of the power grid. At present, there is much research on the cost prediction of transmission projects [2,3], whereas few scholars have studied the cost prediction of substation projects [4,5]. In the construction of substation projects, a reasonable cost prediction can provide decision support and reference for power grid companies, which also helps to promote the sustainable development of investment in substation projects [6]. However, owing to the impact of regional economic development, the surrounding natural environment, and the project management level, the cost of substation projects tends to be non-linear, irregular, and difficult to predict [7].
Scholars around the world have conducted in-depth studies on the cost of transmission and substation projects. The research mainly covers the construction and prediction of transmission project cost indices [8,9], the analysis of factors affecting transmission and substation project cost [10–12], and the prediction of substation project cost [13]. In terms of constructing the cost index, Liu et al. [8] built the cost index of power grid projects by considering different technologies and voltage classes, and obtained the total project cost index by weighting the typical programs, where the weights could be determined by the Paasche or Laspeyres index analysis method. Tao et al. [3] selected more than 300 cost indicators of transmission projects from 2002 to 2010 and pointed out that changes in transmission project cost were affected by the previous period. A Markov chain prediction model was applied to describe the changes in each period. In addition, the total investment of a transmission project was divided into construction cost, installation cost, equipment cost, and other expenses; from these four parts, nine key indicators were selected as the comprehensive cost index to be predicted. Hua et al. [9] constructed a cost index analysis and prediction model based on the Autoregressive Integrated Moving Average (ARIMA) and exponential smoothing models, and carried out the modeling and forecasting in SPSS, which provides a useful reference for the application of prediction methods.
In terms of the analysis of factors affecting project cost, Wang et al. [12] set up an evaluation system for the cost index of 500 kV transmission projects based on samples of transmission project cost. Activity Based Classification (ABC) analysis, key parameter simplification, and principal component analysis were applied to process the samples. Three-level indices and their calculation formulas were obtained through calculation. Furthermore, the Least Squares Support Vector Machine (LSSVM) based on particle swarm optimization was used to calculate and verify the evaluation index system; the results showed that the proposed model has high prediction accuracy. The influence of various factors on the cost of substation engineering was studied from the aspects of technology, organization, external environment, and cost parameters in [13].
In the prediction of project cost, the prediction methods mainly include the time series method, the multiple regression model, and intelligent prediction methods, and the choice of method directly affects the accuracy of the prediction results. Xu et al. [14] pointed out that the building cost index had been widely used to measure the cost level of the construction industry. However, to improve the accuracy of measurement, the interaction between the cost indices and other variables (such as the consumer price index) should be considered. Therefore, cointegration theory and a Vector Auto Regression (VAR) model were proposed for predicting changes in construction cost, which could assess the risk and uncertainty of rising costs. Zhu et al. [15] first took the regional situation, transaction date, transaction conditions, and individual factors into account to construct a hierarchical structure system of real estate price factors. Then, the 1-9 scale was used to build the comparison judgment matrix, and the ranking weight of the case level relative to price was calculated layer by layer. Moreover, the stochastic fuzzy regression analysis method was introduced to predict the cost of residential buildings accurately. Shahandashti et al. [16] pointed out that the cost of highway construction could vary greatly over time, which was directly related to the income of highway contractors. Therefore, the study selected sixteen indices from the National Highway Construction Cost Index (NHCCI) as candidate indices through a literature review. Based on the results of the cointegration test, a Vector Error Correction (VEC) model was established to predict the national highway construction cost index, and the results showed that the multivariate time series model was more accurate than the univariate model.
In the field of intelligent prediction models, scholars have explored a series of high-precision intelligent prediction methods. Qin et al. [17] took qualitative and quantitative cost indices as the input set and the unit cost as the output indicator. The correlation among the input indicators of housing project cost was eliminated by means of principal component analysis. Support Vector Machines (SVM) and LSSVM were applied, respectively, to predict the cost of 25 residential projects in Hangzhou, with a prediction error within 7%. Zhou et al. [18] predicted cigarette sales based on LSSVM and optimized the LSSVM parameters with an improved Cuckoo Search (CS) algorithm. The introduction of inertia weights in the path and location updates of the cuckoo nest helped to avoid falling into local optima. To account for the after-effect in cigarette sales, the best time delay was determined by comparing the prediction accuracy under different numbers of delays, and the cigarette sales at the current time and five time periods ahead were shown, on actual data, to be related. In addition, that article made multi-step predictions through an iterative method and predicted the sales volume of cigarettes in different future periods, which improved the dynamics of the prediction. Shao et al. [19] argued that it was of great practical significance to explain the dependence between medium-term demand and external variables scientifically. They addressed the problem of nonlinear power demand prediction by combining the Ensemble Empirical Mode Decomposition (EEMD) method with a semi-parametric model. The model could capture potentially important features, including climate and economic development, from the original electricity demand data. Reliable confidence intervals for the longer-term fluctuation trend were obtained by predicting the actual monthly electricity demand data from Suzhou and Guangzhou.
However, the accuracy of the existing prediction methods cannot meet the requirements of the modern management of power grid projects. In view of these problems with the traditional prediction methods, this paper proposes an SVM model optimized by the improved CS algorithm, with the input data decomposed by EEMD, to predict the cost of substation projects.

EEMD
The Empirical Mode Decomposition (EMD) method can decompose an irregular signal into a number of well-characterized Intrinsic Mode Function (IMF) components; the EMD algorithm is essentially a process of making the signal stationary [20–24]. However, because of modal aliasing in the EMD algorithm, its precision and breadth of application are restricted. In this paper, Gaussian white noise is added to the signal before the EMD decomposition in order to avoid the influence of modal aliasing. This decomposition method is the EEMD algorithm, and its steps are as follows [25–28]:

1. Determine the number of decomposed IMFs and the number of decompositions.
2. Add a Gaussian white noise sequence to the input signals.
3. Normalize the signals after adding the white noise sequence.
4. Decompose the normalized signals to obtain multiple IMF components and one surplus variable:

$$x(t) = \sum_{i=1}^{n} \mathrm{IMF}_i(t) + r(t)$$

where $x(t)$ is the signal being decomposed, $n$ is the number of IMF components, and $r(t)$ is the remainder.

The method is applied to predict the cost level of substation projects. The original irregular cost data are decomposed by EEMD into a number of stationary IMF components, which are then input into the SVM regression model for prediction. Finally, the predicted cost level of the substation project is obtained by summing the predictions of all components.
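The ensemble part of the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names, the noise level, and the number of trials are hypothetical, the normalization step is omitted, and `moving_average_split` is only a stand-in for a real EMD sifting routine, so that the ensemble averaging itself can be run with the standard library.

```python
import random


def moving_average_split(signal, half_window=2):
    """Stand-in decomposition (NOT real EMD): one detail component plus a trend."""
    n = len(signal)
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        trend.append(sum(signal[lo:hi]) / (hi - lo))
    detail = [s - t for s, t in zip(signal, trend)]
    return [detail], trend


def eemd(signal, decompose, n_trials=200, noise_std=0.1, seed=0):
    """Average the output of `decompose` over noise-perturbed copies of the signal."""
    rng = random.Random(seed)
    n = len(signal)
    imf_sums, residue_sum = None, [0.0] * n
    for _ in range(n_trials):
        # Step 2: add a Gaussian white noise sequence to the input signal.
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        # Step 4: decompose the noisy copy into components and a remainder.
        imfs, residue = decompose(noisy)
        if imf_sums is None:
            imf_sums = [[0.0] * n for _ in imfs]
        for acc, imf in zip(imf_sums, imfs):
            for k in range(n):
                acc[k] += imf[k]
        for k in range(n):
            residue_sum[k] += residue[k]
    # Ensemble mean: the added noise largely cancels out across trials.
    imfs_mean = [[v / n_trials for v in acc] for acc in imf_sums]
    residue_mean = [v / n_trials for v in residue_sum]
    return imfs_mean, residue_mean


signal = [float((i % 8) + 0.1 * i) for i in range(64)]
imfs, residue = eemd(signal, moving_average_split)
reconstructed = [sum(imf[k] for imf in imfs) + residue[k] for k in range(len(signal))]
```

Summing the averaged components and the remainder approximately recovers the original signal, which is the property the final summation of component predictions relies on.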

SVM
SVM was proposed by Cortes and Vapnik in 1995 [29–31] and can effectively solve small-sample and complex nonlinear regression problems. It maps the data $X_i$ into a high-dimensional space $F$ through a nonlinear mapping $\varphi$ and performs linear regression in that space. The mapped linear function is $f(x) = \omega \cdot \varphi(x) + b$; the weight $\omega$ and the threshold $b$ are found by solving the optimization problem in Equation (1) according to the SVM criterion [32–35].
The above problem can be transformed into a dual problem by introducing the Lagrange multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$, so that the decision function of the SVM is expressed through a kernel function $K(x_i, x_j)$. This paper selects the radial basis function (RBF) as the kernel function:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where $\sigma$ is the width of the kernel function.
The accuracy of the SVM regression model depends mainly on the penalty factor $C$ and the kernel width $\sigma$ [36]. Therefore, this paper uses the improved CS algorithm to optimize $C$ and $\sigma$ in order to improve the generalization ability of the SVM.
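To make the role of the kernel width concrete, the following sketch evaluates an RBF kernel and a kernel-expansion prediction of the form "weighted sum of kernel values plus threshold". It is an illustration only: the $2\sigma^2$ scaling is the common textbook convention and is assumed here, and the function names, support vectors, and coefficients are hypothetical rather than values from this paper.

```python
import math


def rbf_kernel(x, y, sigma):
    """RBF kernel exp(-||x - y||^2 / (2 * sigma^2)); the 2*sigma^2 scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_predict(x, support_vectors, coeffs, bias, sigma):
    """Kernel expansion: sum_i coeff_i * K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, coeffs)) + bias


# Hypothetical support vectors and dual coefficients, for illustration only.
support_vectors = [[0.0], [1.0]]
coeffs = [1.0, -1.0]
prediction = kernel_predict([0.0], support_vectors, coeffs, bias=0.5, sigma=1.0)
```

A small $\sigma$ makes each support vector influence only its close neighbourhood, while a large $\sigma$ smooths the fitted function, which is why $\sigma$ is tuned together with $C$.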

Optimized CS Algorithm
The CS algorithm searches for the optimum quickly and accurately by simulating the random walk of cuckoos looking for suitable host nests in which to lay their eggs [37–39]. According to existing research, the CS algorithm follows three rules [40,41]:

1. Each cuckoo lays one egg at a time.
2. The host nests containing high-quality eggs are the best solutions and are retained for the next generation.
3. The number of host nests is fixed, and the probability that a cuckoo egg is found by the nest owner is $P_a$.

During the search, the cuckoo's flight path follows the Lévy distribution, namely:

$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda), \quad i = 1, 2, \ldots, n$$

where $x_i^{(t+1)}$ and $x_i^{(t)}$ are the nest positions of the $(t+1)$th and the $t$th generation, $\alpha$ is the step size, $n$ is the number of cuckoos, $\oplus$ is the point-to-point multiplication, and $L(\lambda)$ is the Lévy flight path. The relationship between the search path and time is as follows:

$$L(\lambda) \sim u = t^{-\lambda}, \quad \lambda \in (1, 3]$$

In the traditional CS algorithm, the probability $P_a$ of finding cuckoo eggs and the step size $\alpha$ of the position update are fixed values, which leads to weak global search ability, slow convergence, and low precision. Therefore, an improved cuckoo algorithm is proposed in this paper to update the values of $P_a$ and $\alpha$ dynamically, as follows [42,43]:

$$P_a = P_{a\max} - (P_{a\max} - P_{a\min}) \times \frac{t}{N}$$

$$\alpha = \alpha_{\max} - (\alpha_{\max} - \alpha_{\min}) \times \frac{t}{N} \qquad (10)$$

where $t$ and $N$ are the current iteration number and the total number of iterations, $P_{a\max}$ and $P_{a\min}$ are the maximum and minimum values of the detection probability, and $\alpha_{\max}$ and $\alpha_{\min}$ are the maximum and minimum step coefficients.
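The dynamic update of the detection probability and the step size, together with a Lévy-distributed move, can be sketched as below. Several points are assumptions rather than details given in this paper: the schedules are taken to decrease linearly from their maximum to their minimum over the iterations, the Lévy step is drawn with Mantegna's algorithm (a commonly used sampling scheme, swapped in here for illustration), and the numerical bounds and function names are hypothetical.

```python
import math
import random


def linear_schedule(v_max, v_min, t, n_iter):
    """Assumed linear decrease from v_max (t = 0) to v_min (t = n_iter)."""
    return v_max - (v_max - v_min) * t / n_iter


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step with Mantegna's algorithm (assumed sampler)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the unlikely exact-zero case
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_move(nest, alpha, rng):
    """New position: each coordinate moves by alpha times a Lévy step."""
    return [x + alpha * levy_step(rng) for x in nest]


rng = random.Random(0)
n_iter = 200
# Hypothetical bounds for the detection probability and the step coefficient.
p_a_start = linear_schedule(0.5, 0.05, 0, n_iter)
p_a_end = linear_schedule(0.5, 0.05, n_iter, n_iter)
alpha_mid = linear_schedule(0.5, 0.01, n_iter // 2, n_iter)
new_nest = levy_move([1.0, 2.0], alpha_mid, rng)
```

Large values early in the run favour exploration, and the smaller values near the end favour fine local adjustment around good nests.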
However, the CS algorithm still suffers from a lack of search vitality and slow convergence. Its optimization ability can be improved effectively by adding a Gauss perturbation [44].
Assume that the nest positions $x_i^{(t)}$, $i = 1, 2, \ldots, n$, are obtained after $t$ CS iterations. Instead of passing $x_i^{(t)}$ directly to the next iteration, a Gaussian perturbation is applied to them first. Each $x_i$ is a $d$-dimensional vector, so the matrix of nest positions $p_t = [x_1^{(t)}, x_2^{(t)}, \ldots, x_n^{(t)}]^T$ has dimension $d \times n$. Combining the matrix $p_t$ with a Gaussian perturbation is the basic step of the GCS algorithm, namely:

$$p_t' = p_t + a \oplus \varepsilon$$

where $\varepsilon$ is a random matrix of the same order as $p_t$ whose entries follow the $N(0,1)$ distribution, and $a$ is a constant. While the perturbation increases the vitality of the search for better nest positions, the nest positions can easily overshoot because of the large random range of $\varepsilon$. Therefore, the selection of a suitable $a$ is particularly important. After a reasonable set $p_t'$ is obtained, each of its nests is compared with the corresponding nest in $p_t$, and only the better position is retained, which yields a better set of nest positions, denoted $p_t''$ [45].
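The perturb-and-select step can be illustrated with a short sketch. The fitness function (a simple sphere function to be minimized), the value of the constant a, and all function names are hypothetical choices made for the example; only the structure, perturbing every nest with scaled N(0,1) noise and keeping the better of each old/new pair, follows the description above.

```python
import random


def gaussian_perturb(nests, a, rng):
    """Add a * N(0, 1) noise to every coordinate of every nest."""
    return [[x + a * rng.gauss(0.0, 1.0) for x in nest] for nest in nests]


def keep_better(nests, candidates, fitness):
    """Pairwise selection: keep the candidate only if it has a lower fitness."""
    return [c if fitness(c) < fitness(n) else n
            for n, c in zip(nests, candidates)]


def sphere(nest):
    """Hypothetical fitness to minimize (stand-in for the prediction error)."""
    return sum(x * x for x in nest)


rng = random.Random(1)
nests = [[rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(20)]
perturbed = gaussian_perturb(nests, a=0.3, rng=rng)
improved = keep_better(nests, perturbed, sphere)
```

Because of the pairwise selection, no nest can become worse in this step, so a larger a only risks wasted candidates rather than lost solutions.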


Substation Project Cost Prediction Model Based on EEMD-GCS-SVM
The cost prediction of substation projects is affected by many factors, and the cost level is non-stationary and irregular. The process of the substation project cost prediction model based on EEMD-GCS-SVM is shown in Figure 1. The specific steps are:

1. Decompose the substation cost data through the EEMD method to obtain the IMF components and surplus variables, and normalize the data.

2. Initialize the parameters and kernel function of the SVM model, input the normalized decomposed variables into the SVM model, and determine the optimal parameters and kernel function of the SVM model by using the GCS algorithm. In order to find the best parameters of the prediction model faster, the ranges of $c$ and $\varepsilon$ are both set to [0.01, 100]. The prediction model is then trained on the historical data while the GCS algorithm searches for the best parameters. First, set $N_{nest}$ (the number of nests) to 20, $P_a$ (the probability that a cuckoo's egg is discovered by the nest owner) to 0.45, and $N$ (the number of iterations) to 200. Then randomly generate $N_{nest}$ nest locations $W = (W_1, W_2, \ldots, W_{N_{nest}})^T$. Each nest $W_i$ has $s$ parameters ($s$ = the number of weights between input layer and hidden layer + the number of weights between hidden layer and output layer + the number of translation factors + the number of expansion factors). The predicted values for each nest are calculated, and the nest with the smallest error among the 20 nests is found and marked as $W_{best}$. $W_{best}$ is then retained for the next generation.

3. Train the SVM model by using the training set, and then input the test set data to obtain the predicted value of the cost data.
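The settings in step 2 can be turned into a small search skeleton. This is a deliberately simplified sketch, not the GCS algorithm of this paper: it keeps the stated search ranges, number of nests, detection probability, and number of iterations, but it replaces discovered nests with fresh random ones instead of using Lévy flights and the Gaussian perturbation, and `toy_error` is a hypothetical stand-in for the validation error of a trained SVM.

```python
import random

BOUNDS = [(0.01, 100.0), (0.01, 100.0)]  # search ranges for c and epsilon (from the text)
N_NEST, P_A, N_ITER = 20, 0.45, 200      # nests, detection probability, iterations (from the text)


def random_nest(rng):
    """Draw one nest uniformly inside the search ranges."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def toy_error(nest):
    """Hypothetical error surface; a real run would train and score an SVM here."""
    c, eps = nest
    return (c - 12.0) ** 2 + (eps - 3.0) ** 2


def search(error, seed=0):
    rng = random.Random(seed)
    nests = [random_nest(rng) for _ in range(N_NEST)]
    best_idx = min(range(N_NEST), key=lambda i: error(nests[i]))
    history = [error(nests[best_idx])]
    for _ in range(N_ITER):
        for i in range(N_NEST):
            # The best nest is always retained; the others are discovered
            # with probability P_A and rebuilt at a random location.
            if i != best_idx and rng.random() < P_A:
                nests[i] = random_nest(rng)
        best_idx = min(range(N_NEST), key=lambda i: error(nests[i]))
        history.append(error(nests[best_idx]))
    return nests[best_idx], history


w_best, history = search(toy_error)
```

Since the best nest is never discarded, the recorded best error cannot increase from one generation to the next, which mirrors the rule that $W_{best}$ is carried over.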


Basic Data
There are many voltage levels in substation projects. 220 kV is a widely used voltage level, and its cost data are comparatively easy to obtain. Thus, the cost level data of new 220 kV outdoor substation projects at a certain place in 2014-2016 (as shown in Table 1) are taken as an example to validate the model. The cost level of substation projects is represented by the cost per kVA. Starting from the first sample project and taking approximately equal time intervals between the selected samples, we obtain cost data for 72 samples and sort them by project completion time.
First, the cost level data are decomposed by EEMD into five IMF components; the decomposition results are shown in Figure 2. Each IMF component obtained by the EEMD algorithm shows obvious regularity and periodicity, which helps to improve the prediction accuracy of the subsequent SVM model. Then, the first 54 sets of cost data are taken as the training group, and the decomposed IMF components and the surplus component are input separately into the GCS-SVM prediction model for training. After the prediction result of each component is obtained, the component predictions are summed to restore the predicted value of the total cost. The latter 18 sets of cost data are used as the test group to verify the prediction effect of the model.
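The 54/18 split and the summation of component predictions described above can be sketched as follows. The component values are synthetic placeholders rather than data from Table 1, and `persistence` (repeat the last training value) is a hypothetical stand-in for a trained GCS-SVM predictor; the function names are likewise illustrative.

```python
def split_train_test(series, n_train=54):
    """First n_train samples for training, the remainder for testing."""
    return series[:n_train], series[n_train:]


def persistence(train, horizon):
    """Hypothetical stand-in predictor: repeat the last training value."""
    return [train[-1]] * horizon


def forecast_by_components(components, n_train, predictor):
    """Predict each component separately, then sum the predictions."""
    horizon = len(components[0]) - n_train
    total = [0.0] * horizon
    for comp in components:
        train, test = split_train_test(comp, n_train)
        preds = predictor(train, len(test))
        for k, p in enumerate(preds):
            total[k] += p
    return total


# Synthetic placeholders: five IMF-like series plus one remainder, 72 samples each.
components = [[float(j + k) for k in range(72)] for j in range(6)]
forecast = forecast_by_components(components, n_train=54, predictor=persistence)
```

Each component gets its own model in the actual procedure; only the final summation is shared, which is why the predictor is passed in as a function here.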

Results Analysis and Comparison
After training the forecasting model, the best values of $c$ and $\varepsilon$ are 29.8425 and 0.4871, respectively. In order to verify the accuracy of the model, the GCS-SVM model without the EEMD algorithm, the EEMD-CS-SVM model with the non-optimized CS algorithm, and the EEMD-GCS-SVM model are used to predict the samples in this paper. The results are shown in Figure 3a-c.
It can be seen from Figure 3a-c that the EEMD-GCS-SVM model, in which the data are decomposed by EEMD and the parameters are optimized by the GCS algorithm, has the highest fitting and prediction accuracy. The results indicate that both the EEMD and the GCS algorithm help to improve the accuracy of the model.
In addition, the BP neural network model, the SVM model, the GCS-SVM model, the EEMD-CS-SVM model, and the EEMD-GCS-SVM model are each used to predict the data of the test set, and the error comparison results are shown in Table 2 and Figure 4.
According to Table 2, the RMSE of the EEMD-GCS-SVM model is 0.51, the MAE is 0.43, and the MAPE is 0.13%, which indicates that its prediction accuracy is higher than that of EEMD-CS-SVM and GCS-SVM, and significantly better than that of the BP neural network model and the SVM model.
The boxplot in Figure 4 directly reflects the accuracy of each model. It shows that the prediction error of EEMD-GCS-SVM is the smallest, which means that this model is more accurate than the other models and more suitable for predicting the cost level of 220 kV substation projects.
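For reference, the three error measures used in the comparison can be computed as in the sketch below, which uses their standard definitions (the paper does not restate them, so this is assumed). The sample values are made up for the example and are not taken from Table 2.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Made-up values for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE and MAE are expressed in the unit of the cost data, whereas MAPE is scale-free, which makes it the most convenient of the three for comparing results across data sets.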


Conclusions
With the development of power system reform, it is urgent to improve the cost management level of power grid projects. Controlling the cost level of grid projects effectively is therefore of great significance for improving investment efficiency and sustainability. However, the cost level of substation projects is difficult to predict, and the prediction accuracy of the traditional methods is insufficient.
(1) The EEMD-GCS-SVM model established in this paper can effectively improve the prediction accuracy of substation project cost, with a MAPE value of only 0.13%, which is much better than that of the models without the improved CS optimization or without EEMD. However, the research only analyzed the cost prediction of 220 kV substations because of the limited availability of data. In addition, the model has not been verified and analyzed for more regions. Thus, more types of data should be used to verify the ability of the modified model in further study. In future research, we will continue to apply the model to more prediction fields and explore more scientific and accurate prediction methods.


Figure 1. Flow chart of the Ensemble Empirical Mode Decomposition (EEMD)-GCS-Support Vector Machines (SVM) prediction model.

Figure 2. The result of the EEMD. (a) Original data; (b) IMF1; (c) IMF2; (d) IMF3; (e) IMF4; and (f) IMF5.

Figure 4. Comparison of the model errors.

(2) The EEMD method can decompose irregular and non-stationary sequence signals into multiple IMF components and surplus components. The decomposed signals show obvious regularity and periodicity, which improves the prediction accuracy of the model.
(3) When the CS algorithm is used to optimize the SVM parameters and kernel function, adding the Gauss perturbation can effectively improve the search vitality and range of the CS algorithm. The optimal SVM parameters are obtained, the calculation of the kernel function is faster, and the computational efficiency and prediction accuracy of the model are improved.


Table 1. The cost level of 220 kV new outdoor substation project at a certain place.


Table 2. Comparison of the model errors.
