Cost Forecasting Model of Transformer Substation Projects Based on Data Inconsistency Rate and Modified Deep Convolutional Neural Network

Accurate and stable substation project cost forecasting is of great significance for guaranteeing the economical construction and effective administration of electric power engineering. This paper develops a novel hybrid approach for cost forecasting based on a data inconsistency rate (DIR), a modified fruit fly optimization algorithm (MFOA) and a deep convolutional neural network (DCNN). Firstly, the DIR integrated with the MFOA is adopted for input feature selection. Simultaneously, the MFOA is utilized to realize parameter optimization in the DCNN. The effectiveness of the MFOA-DIR-DCNN has been validated by a case study that selects 128 substation projects in different regions for training and testing. The modeling results demonstrate that the established approach outperforms the contrast methods with regard to forecasting accuracy and robustness. Thus, the developed technique is feasible for the cost prediction of substation projects at various voltage levels.


Introduction
The inadequate management and supervision of substation projects tend to bring about high costs, which critically affect the economy and sustainability of power engineering. Thus, cost prediction is of great importance for expense saving [1]. However, comparable projects are hard to collect due to the limited amount of engineering in the same period as well as various influential factors such as the overall plan of the power grid, total capacity, terrain features, design and construction level, and local economy [2]. Together with the scarcity of sample data, this increases the difficulty of cost forecasting for substation projects. Therefore, it is of great significance for the sustainability of electric power engineering investment to construct a substation cost forecasting model and accurately forecast the substation cost.
Nowadays, many scholars have published momentous work on engineering cost forecasting, but few studies have focused on substation projects. The approaches to engineering cost prediction are primarily separated into two kinds: traditional prediction methods and intelligent algorithms. Traditional forecasting techniques primarily consist of time series [3], grey prediction [4], regression analysis [5] and so on. Reference [3] designed a time series prediction model for engineering cost based on bills of quantities and evaluation. The results indicated that this proposed model controlled the error range within 5%. Reference [4] put forward an improved grey forecasting method optimized by a time response function to predict main construction cost indicators in power projects, where the constant C was determined through the minimum Euclidean distance of an original series and constraints of simulation values. In reference [6], a forecasting technique grounded on multiple-structure integral linear regression was established in line with the characteristics of engineering cost; it proved superior to logistic regression and the CNN when randomly ordered features were used. Thus, for the purpose of reducing training time and model complexity, feature selection models can be employed.
Considering the influence of parameter selection on the prediction performance of the DCNN, it is indispensable to select a proper intelligent algorithm to optimize its parameters [26]. The fruit fly optimization algorithm (FOA), proposed by Dr. Pan Wenchao in June 2011, is a novel global optimization algorithm founded on swarm intelligence [27]. This technique is derived from the simulation of foraging behaviors and is similar to the ant colony algorithm [28] and particle swarm optimization [29]. Due to its simple structure, few parameters, and easy realization, scholars at home and abroad have focused on this method and applied it to forecasting [30][31][32][33][34][35]. For example, reference [31] combined the improved FOA with a wavelet least squares support vector machine. The case studies verified that the proposed method presents strong validity and feasibility in mid- and long-term power load prediction compared with other alternative approaches. Reference [33] studied monthly electricity consumption forecasting on the basis of a hybrid model that integrates the support vector regression method with an FOA with a seasonal index adjustment. The experimental results demonstrated that this approach can be effectively utilized in the field of electricity consumption forecasting. A novel hybrid forecasting model was constructed in reference [35] for annual electric load prediction; here, an FOA was applied to automatically determine the appropriate parameter values in the proposed approach. In reference [36], the authors applied a modified firefly algorithm and a support vector machine to predict substation engineering cost. The case study of substation engineering in Guangdong Province proved that the proposed model has higher forecasting accuracy and effectiveness. Remarkably, the potential weaknesses of premature convergence and easy trapping into local optima restrict the performance of the FOA.
Thus, quantum behavior was utilized in this paper to modify the basic FOA. This improved approach, namely the MFOA, was exploited to select features with a data inconsistency rate (DIR) and optimize parameters for the DCNN model.
In view of the various influential factors of substation project cost, it is necessary to identify and select proper features as the input to avoid data redundancy and increase computation efficiency [37]. The filter method gives a score to each feature by statistical means, sorts the features by score, and then selects the subset with the highest scores. This method considers each feature independently, without accounting for feature dependence or correlation. Compared with the filter method, the wrapper method takes the correlation between features into account by considering the effect of a combination of features on the performance of the model. It compares the differences between different combinations and selects the combination with the best performance. The DIR model completes characteristic selection by dividing the feature set and calculating the minimum inconsistency of the subsets, as presented in reference [38]. The authors in reference [39] argued that the key ordering of features could be identified by selecting the minimum inconsistency rate, and that the optimized feature subset could also be efficiently achieved based on the sequential forward search strategy. The experiments showed that the proposed data classification scheme obtains good performance. In reference [40], a discrete wavelet transform in combination with an inconsistency rate model was designed to achieve optimal feature selection. The experiment verified that this approach contributes to the reduction of redundancy in input vectors and outperforms other models in short-term power load prediction. It can be seen that the DIR takes advantage of data inconsistency to eliminate redundant features. Furthermore, it allows for correlation, such that the selected optimal characteristics are able to cover all of the data information. As a result, the DIR method is introduced for feature selection in this paper.
Based on the aforementioned studies, this paper develops a novel hybrid approach for cost forecasting based on the DIR, the DCNN and the MFOA. Firstly, the DIR integrated with the MFOA is adopted for input feature selection. Simultaneously, the MFOA is utilized to realize parameter optimization in the DCNN. Thus, the proposed method can be applied to cost forecasting of substation projects on the foundation of the optimized input subset as well as the best parameters. The rest of the paper is organized as follows: Section 2 briefly introduces the established hybrid model including the MFOA, the DIR, the DCNN, and the concrete structure. Section 3 verifies the developed technique via a case study. Section 4 draws conclusions.

Methodology
The FOA is a new optimization approach that simulates the foraging behaviors of a fruit fly swarm [27,41]. Their sensitive smell and sharp vision contribute to the discovery of food sources over 40 km away and correct flight to the location [42,43]. The food searching procedure of a fruit fly swarm can be seen in Figure 1. According to these food searching features, the specific description of the FOA is as follows:

(1) Initialize the location of the fruit fly swarm according to Equation (1):

InitX_axis; InitY_axis (1)

(2) For an individual fruit fly, set the random direction and distance for food finding, as shown in Equations (2) and (3):

X_i = X_axis + RandomValue (2)
Y_i = Y_axis + RandomValue (3)

(3) Estimate the distance Dist_i between the location of each individual fruit fly and the origin, and then the smell concentration judgement value S_i, as shown in Equations (4) and (5):

Dist_i = sqrt(X_i^2 + Y_i^2) (4)
S_i = 1/Dist_i (5)

(4) Substitute the smell concentration judgement value into the judgement function; then, in light of Equation (6), obtain the smell concentration Smell_i at each location:

Smell_i = Function(S_i) (6)

(5) Find the optimal smell concentration among the fruit fly swarm, as in Equation (7):

[bestSmell, bestIndex] = max(Smell_i) (7)

(6) Keep a record of the optimal smell concentration as well as its x, y coordinates, as in Equation (8). Afterwards, the fruit flies fly to this destination by the use of vision:

Smellbest = bestSmell; X_axis = X(bestIndex); Y_axis = Y(bestIndex) (8)

The iterative optimization is carried out by repeating Step (2) to Step (5). At each iteration, determine whether the new smell concentration shows an advantage over the former one; if so, execute Step (6).
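The six steps above can be condensed into a short illustrative implementation. This is a sketch only: the objective `smell_function`, the swarm size, the step length and the random seed are hypothetical choices, not settings from the paper.

```python
import numpy as np

def foa(smell_function, pop_size=20, max_iter=100, step=1.0, seed=0):
    """Minimal sketch of the basic fruit fly optimization algorithm (FOA).

    `smell_function` maps the smell-concentration judgement value S to a
    fitness ("smell concentration") to be maximized; all names are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Step (1): initialize the swarm location (Equation (1)).
    x_axis, y_axis = rng.uniform(-1, 1, 2)
    best_smell, best_xy = -np.inf, (x_axis, y_axis)
    for _ in range(max_iter):
        # Step (2): random direction and distance for each fly (Eqs. (2)-(3)).
        x = x_axis + step * rng.uniform(-1, 1, pop_size)
        y = y_axis + step * rng.uniform(-1, 1, pop_size)
        # Step (3): distance to the origin and smell judgement value (Eqs. (4)-(5)).
        dist = np.sqrt(x**2 + y**2)
        s = 1.0 / dist
        # Step (4): smell concentration at each location (Eq. (6)).
        smell = smell_function(s)
        # Step (5): best individual in the swarm (Eq. (7)).
        i = np.argmax(smell)
        # Step (6): keep the record and move the swarm there if improved (Eq. (8)).
        if smell[i] > best_smell:
            best_smell, best_xy = smell[i], (x[i], y[i])
            x_axis, y_axis = x[i], y[i]
    return best_smell, best_xy

# Example: maximize -(S - 0.5)^2, whose optimum lies at S = 0.5.
best, _ = foa(lambda s: -(s - 0.5) ** 2)
```

Because the swarm only relocates on improvement, the recorded best smell concentration is monotonically non-decreasing over iterations.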

MFOA
(1) The development of quantum mechanics has greatly promoted the application of quantum computation in diverse fields. In quantum computation, a quantum bit is utilized to represent a quantum state, and the binary 0/1 method is adopted to express quantum information.
Here, the basic quantum states are the "0" and "1" states, and a state can be a random linear superposition of "0" and "1." Therefore, these two states are allowed to exist simultaneously, which poses a large challenge to the classic bit expression approach of classical mechanics. The superposition of a quantum state is described as Equation (9):

|ψ⟩ = α|0⟩ + β|1⟩ (9)

where |0⟩ and |1⟩ indicate the two basic quantum states, and α and β are the probability amplitudes, which satisfy |α|² + |β|² = 1.
(2) Initialize the location of the fruit fly. Additionally, take advantage of the probability amplitude of the quantum bit to code the current location of the individual fruit fly, as shown in Equation (11):

P_i = [cos(θ_i1), cos(θ_i2), ..., cos(θ_in); sin(θ_i1), sin(θ_i2), ..., sin(θ_in)] (11)

where θ_ij = 2π·rand(); rand() is a random number between 0 and 1; i = 1, 2, ..., m; j = 1, 2, ..., n; m represents the number of fruit flies; and n is the dimension of the search space. As a result, the homologous probability amplitudes of the quantum states |0⟩ and |1⟩ are presented in Equations (12) and (13):

P_ic = (cos(θ_i1), cos(θ_i2), ..., cos(θ_in)) (12)
P_is = (sin(θ_i1), sin(θ_i2), ..., sin(θ_in)) (13)

(3) P_ij represents the jth quantum bit of the individual fruit fly P_i; the related solution space is then converted in accordance with Equation (14):

X_ic^j = (1/2)[a_j(1 + cos(θ_ij)) + b_j(1 − cos(θ_ij))], X_is^j = (1/2)[a_j(1 + sin(θ_ij)) + b_j(1 − sin(θ_ij))] (14)

where rand() is a random value in the range of [0, 1]; X_ic^j and X_is^j equal the actual value of the parameter in the jth dimensional location when the quantum state of the ith individual reaches |0⟩ or |1⟩, respectively; and a_j and b_j represent the upper and lower limits, respectively.
Suppose the search of the MFOA is conducted in a two-dimensional space, namely j = 1, 2. InitX_axis and InitY_axis represent the initialization of the location. The solution space is described in Equations (16)-(19).
if rand() < P_id, Equations (16) and (17) apply; if rand() ≥ P_id, Equations (18) and (19) apply.

(4) The distance Dist between the origin and the location is estimated, and the judgement value of smell concentration S(i), namely the reciprocal of the distance, is obtained.

(5) In accordance with Equation (20), the smell concentration Smell_i of each fruit fly location is acquired.

(6) A quantum rotating gate is employed to update the individual location, as shown in Equation (21):

[α_jd^(k+1); β_jd^(k+1)] = [cos(θ_jd^(k+1)), −sin(θ_jd^(k+1)); sin(θ_jd^(k+1)), cos(θ_jd^(k+1))] · [α_jd^k; β_jd^k] (21)

where α_jd^(k+1) and β_jd^(k+1) represent the probability amplitudes of the jth fruit fly at the (k+1)th iteration in d-dimensional space, and θ_jd^(k+1) equals the rotating angle, as presented in Equation (22):

θ_jd^(k+1) = s(α_jd^k, β_jd^k) · Δθ_jd^(k+1) (22)

where s(α_jd^k, β_jd^k) and Δθ_jd^(k+1) are the direction and the increment of the rotating angle, respectively.
Here, the updated α k+1 jd and β k+1 jd need to be converted to solution space to conform with the operation mechanism.
(7) The loss of population diversity during searching leads to premature convergence and easy trapping into a local optimum. Thus, individual mutation is introduced in the MFOA to address this problem, as presented in Equation (29), where P_m is the mutation probability and rand() is a random number in [0, 1]. If rand() < P_m, carry out the mutation and change the probability amplitude of the quantum bit; the mutated individual is then converted into the solution space.

(8) Keep a record of the individual with the optimal concentration value as well as the homologous coordinates.
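The quantum rotating-gate update of Equation (21) can be illustrated in a few lines. This is a sketch of the standard rotating gate; the amplitude pair and the rotation increment used below are hypothetical values, not parameters from the paper.

```python
import numpy as np

def rotate(alpha, beta, delta_theta):
    """Quantum rotating-gate update of a qubit's probability amplitudes
    (sketch of Equation (21)); `delta_theta` is the signed rotating angle."""
    gate = np.array([[np.cos(delta_theta), -np.sin(delta_theta)],
                     [np.sin(delta_theta),  np.cos(delta_theta)]])
    a_new, b_new = gate @ np.array([alpha, beta])
    return a_new, b_new

# Encode an amplitude pair as (cos θ, sin θ), then rotate it by 0.05 rad.
theta = 2 * np.pi * 0.3
a, b = np.cos(theta), np.sin(theta)
a2, b2 = rotate(a, b, 0.05)
# The rotation preserves normalization: a2**2 + b2**2 stays 1 (up to rounding),
# so the updated pair remains a valid quantum-bit encoding.
```

This is why the update can be applied repeatedly without renormalizing the amplitudes.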

DIR
In light of the various characteristics of substation project cost, it is highly necessary to select the most correlated features as the input so as to avoid information redundancy and increase the cost forecasting precision. The discrete features of the input can be accurately displayed via data inconsistency [39]. Distinct features are divided into diverse patterns with corresponding frequencies. The value of the DIR is able to discriminate the classification capability of data categories and is positively correlated with the assortment ability of the feature vector.
Suppose there exist g features in substation project cost (e.g., main transformer capacity, area, price), expressed as G_1, G_2, ..., G_g. L represents a subset of the feature set Γ. According to the level of substation project cost, set the standard M with c classifications and N data instances. z_ji and λ_i equal the values of the feature and of classification M, respectively. Data instances are represented by (z_j, λ_i), with z_j = [z_j1, z_j2, z_j3, ..., z_jg]. According to Equation (32), the DIR τ can be derived by

τ = (1/N) Σ_{k=1..p} (f_k − max_l f_kl) (32)

where f_kl equals the number of data instances of class l that fall into the feature pattern x_k, f_k = Σ_l f_kl, and p is the number of feature division interval patterns existing in the data set (k = 1, 2, ..., p; p ≤ N). The steps of feature selection by the DIR are as follows: (1) Initialize the best subset as Γ = {}, namely an empty set.
(2) Estimate the DIR of the subsets made up of Γ together with each residual feature among G_1, G_2, ..., G_g.
(3) Select the feature G_i with the minimum inconsistency rate as the optimal one. Then, update the subset as Γ = {Γ, G_i}. (4) List the inconsistency rates of the feature subsets and sort them in ascending order. (5) Choose the feature subset L with fewer characteristics. If τ_L ≈ τ_Γ, or τ_L/τ_L′ is the minimum ratio among all adjacent feature subsets, L can be screened as the optimal subset, where L′ represents the adjacent previous subset.
Through the estimation of the inconsistency rate, the redundant features can be effectively eliminated. Meanwhile, correlation can be considered, which guarantees that the selected features represent all of the information.
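The inconsistency-rate computation and the sequential forward search described above can be sketched as follows. The toy data set, the function names and the tie-breaking behavior of `min` are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter, defaultdict

def inconsistency_rate(data, labels, subset):
    """Data inconsistency rate of a feature subset (sketch of Equation (32)):
    instances sharing a pattern x_k but carrying different class labels are
    inconsistent; the rate is the total inconsistency count divided by N."""
    patterns = defaultdict(list)
    for row, lab in zip(data, labels):
        patterns[tuple(row[j] for j in subset)].append(lab)
    inconsistent = sum(len(labs) - max(Counter(labs).values())
                       for labs in patterns.values())
    return inconsistent / len(data)

def dir_forward_select(data, labels, n_features):
    """Sequential forward search: at every step add the feature whose
    inclusion yields the minimum inconsistency rate (steps (1)-(3))."""
    best, remaining, history = [], set(range(n_features)), []
    while remaining:
        g = min(remaining,
                key=lambda j: inconsistency_rate(data, labels, best + [j]))
        best.append(g)
        remaining.remove(g)
        # Step (4): record subsets of growing size with their rates.
        history.append((list(best), inconsistency_rate(data, labels, best)))
    return history

# Toy example: the label is determined by features 0 and 1; feature 2 is noise.
data = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0, 0, 0]
hist = dir_forward_select(data, labels, 3)
# The two informative features reach an inconsistency rate of zero.
```

Step (5) of the text would then compare the recorded rates of adjacent subsets in `hist` to pick the smallest subset whose rate is already close to that of the full set.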

DCNN
The DCNN is a kind of ANN with deep learning capability whose main characteristics are the local connection and weight sharing of neurons in the same layer [44]. Multiple feature extraction layers and the fully connected one are typically included in the network. Each feature extraction layer consists of two units, that is a convolutional layer and a subsampling one. The framework of the DCNN is shown in Figure 2. In the DCNN, the neural nodes between two layers are no longer fully connected. Instead, layer spatial correlation is adopted to link the neuron nodes of each layer merely to the ones in the adjacent upper layer. Hence, local connection is completed, and the parameter size of the network is greatly reduced.

The typical CNN is made up of four layers, namely the input layer, the convolutional layer, the subsampling layer and the full connection layer. In the convolutional layer, the convolutional kernel is used for feature extraction, and the corresponding output can be obtained by a weighted calculation through the activation function, as expressed in Equation (33):

x_j^l = f(I), f(I) = 1/(1 + e^(−I)), I = Σ_{j=m..k} x_j^(l−1) w_j^l + b_j^l (j = 1, 2, ..., n; 0 < m ≤ k ≤ n) (33)

where x_j^l and x_j^(l−1) equal the output in Layer l and the input from Layer l − 1, respectively; j represents the local connection in the range from m to k; and w_j^l and b_j^l equal the weight and bias, respectively. The subsampling process is implemented on the features of the convolutional layer for dimension reduction.
The characteristics are extracted from each n × n sampling pool by "pool average" or "pool maximum," as described in Equation (34), where g(·) is the function that selects the average or maximum value. The operation of pooling reduces the complexity of the convolutional layer and helps avoid over-fitting. In addition, it improves the fault tolerance of feature vectors against micro deformations of the data characteristics, and it enhances computational performance and robustness. Finally, the attained data are linked to the fully connected layer, as expressed in Equation (35):

x^l = f(W^l x^(l−1) + b^l) (35)

where W^l equals the weight from Layer l − 1 to Layer l and x^l is the output.
In the aforementioned computation, every convolutional kernel acts on all the input through sliding. Multiple sets of output data are derived from the effects of diverse convolutional kernels, where the same kernel corresponds to a uniform weight. The outputs of the diverse groups are conflated and then transferred to the subsampling layer, where a range of values is set and the average or maximum value within the sliding scope is taken as the representative one. In the end, the data are integrated to achieve dimension reduction, and the results are output through the full connection layer.
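The convolution-activation-subsampling pipeline of Equations (33) and (34) can be sketched for the one-dimensional case. The kernel, bias and pool size below are arbitrary illustrative values; the paper itself works with substation feature vectors.

```python
import numpy as np

def conv1d(x, kernel, bias):
    """Valid 1-D convolution with a shared (weight-sharing) kernel followed
    by the sigmoid activation of Equation (33). A sketch, not the paper's code."""
    k = len(kernel)
    out = np.array([x[i:i + k] @ kernel + bias for i in range(len(x) - k + 1)])
    return 1.0 / (1.0 + np.exp(-out))          # f(I) = 1 / (1 + e^(-I))

def pool(x, n, mode="max"):
    """Subsampling over windows of size n: 'pool average' or 'pool maximum'
    (Equation (34)); the last window trims to the input length."""
    windows = [x[i:i + n] for i in range(0, len(x), n)]
    g = np.max if mode == "max" else np.mean   # the g(.) selector
    return np.array([g(w) for w in windows])

x = np.array([0.1, 0.4, -0.2, 0.9, 0.3, -0.5])
feat = conv1d(x, kernel=np.array([0.5, -0.5]), bias=0.0)  # local connection
reduced = pool(feat, n=2, mode="max")                     # dimension reduction
```

The kernel slides over the whole input with one shared weight vector, which is exactly the weight-sharing property that keeps the parameter count small.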
The application of the DCNN approach for cost prediction presents two merits: (i) The existence of deformed data is permitted, and (ii) the quantity of parameters decreases by local connection and weight sharing, so the efficiency and accuracy of cost prediction can be significantly improved. Nevertheless, in substation project cost prediction, the constancy of the forecasting results cannot be assured in virtue of the subjective determination of parameters. Thus, the MFOA is introduced here to optimize the parameters in the DCNN.

Approach of MFOA-DIR-DCNN
The framework of the established MFOA-DIR-DCNN technique for substation project cost prediction is displayed in Figure 3. The specific procedures of this novel method are explained at length as follows: (1) Determine the initial candidate features of substation project cost. (2) In the DIR, initialize the optimal subset as an empty set Γ = {}. (3) Estimate the DIR of the subsets composed of Γ together with each residual feature. The feature G_i with the minimum inconsistency rate is selected as the best one, and the optimal subset is updated as Γ = {Γ, G_i}. (4) Derive the optimal feature subset along with the best values of the parameters in the DCNN.
The feature subset at the current iteration is brought into the DCNN, and both the prediction accuracy r(j) and the fitness value Fitness(j) can be calculated for this training process. Then, determine whether each iteration satisfies the termination requirements (reaching the target error value or the maximum number of iterations). If not, reinitialize the feature subset and repeat the above steps until the conditions are met. It is noteworthy that the parameters in the DCNN also need to be optimized, and the initial values of the weight w and threshold θ are randomly assigned. Therefore, a fitness function based on both forecasting precision and feature selection quantity is set up, as shown in Equation (36), where Num_feature(j) represents the quantity of selected best characteristics in each iteration, and a and b are constants in [0, 1]. (5) Forecast via the DCNN. When the iterative number reaches the maximum, the estimation stops.
Here, the optimal feature subset, the best values of w, and θ are taken into the DCNN model for substation project cost forecasting.
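Since the exact form of the fitness in Equation (36) is not reproduced in the extracted text, the sketch below assumes a weighted sum of forecasting accuracy and a feature-parsimony term, with constants a and b in [0, 1] as stated in the text; the weights are hypothetical.

```python
def fitness(accuracy, num_features, total_features, a=0.9, b=0.1):
    """Hedged sketch of Equation (36): reward forecasting precision and
    penalize large feature subsets. The a/b weighting is an assumption."""
    return a * accuracy + b * (1.0 - num_features / total_features)

# A subset that keeps the same accuracy while using fewer of the 33
# candidate features scores higher, which drives the MFOA search.
small = fitness(0.98, 7, 33)
large = fitness(0.98, 20, 33)
```

Under this form, the seven-feature subset reported later in the paper would dominate any equally accurate but larger subset.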

Data Processing
This paper selected the cost data of 128 substation projects at various voltage levels and in different areas from 2015 to 2018, as shown in Table 1; the statistics of the substation features are shown in Table A1. In this paper, we selected the cost and corresponding influential factors of the first 66 projects as the training set; correspondingly, the remaining data were employed as the testing set. Here, the construction types of substation projects were divided into three categories: new substation, extended main transformer, and extended interval engineering, valued at 1, 2 and 3, respectively. The substation types were decomposed into three types, where the indoor, the semi-indoor, and the outdoor were set as 1, 2 and 3, respectively. The landforms were parted into eight kinds, namely hillock, hillside field, flat, plain, paddy field, rainfed cropland, mountainous region and depression; these were valued at {1, 2, 3, 4, 5, 6, 7, 8}. In addition, the local GDP was employed to represent the economic development level of the construction area. The proportion of staff with a bachelor degree or above stood for the technical level of the designers. The difference between the actual progress and the schedule stipulated in the contract was utilized to represent the construction progress level. The data were normalized with Equation (37):

y_i = (x_i − x_min)/(x_max − x_min) (37)
where x_i and y_i represent the actual value and normalized value, respectively, while x_min and x_max equal the minimum and maximum of the sample data, respectively.
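Equation (37) is plain min-max normalization and can be checked in a few lines; the sample values are illustrative.

```python
def normalize(xs):
    """Min-max normalization of Equation (37):
    y_i = (x_i - x_min) / (x_max - x_min), mapping the sample into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

print(normalize([10.0, 15.0, 20.0]))  # [0.0, 0.5, 1.0]
```

The minimum of the sample maps to 0 and the maximum to 1, which puts features of very different magnitudes (capacity, GDP, prices) on a common scale before training.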

Model Performance Evaluation
Four commonly adopted error criteria are presented in this paper to measure the forecasting precision of substation project cost obtained by all involved approaches.
(1) Relative error (RE):

RE = (x̂ − x)/x × 100%

(2) Root mean square error (RMSE):

RMSE = sqrt( (1/n) Σ_{i=1..n} ((x̂_i − x_i)/x_i)^2 ) × 100%

(3) Mean absolute percentage error (MAPE):

MAPE = (1/n) Σ_{i=1..n} |(x̂_i − x_i)/x_i| × 100%

(4) Average absolute error (AAE):

AAE = ( (1/n) Σ_{i=1..n} |x̂_i − x_i| ) / ( (1/n) Σ_{i=1..n} x_i ) × 100%

where n is the number of testing samples, while x and x̂ represent the actual value and predictive value of substation project cost, respectively. The aforementioned indicators are negatively correlated with forecasting precision.
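The four criteria, in the relative (percentage) forms in which the results are reported, can be computed as follows; the sample actual/predicted values are illustrative.

```python
import math

def errors(actual, predicted):
    """RE, RMSE, MAPE and AAE in percent. Relative forms are assumed here,
    matching the percentage scale of the reported results."""
    n = len(actual)
    # Per-sample relative error, in percent.
    re = [(p - a) / a * 100 for a, p in zip(actual, predicted)]
    # RMSE of the relative errors.
    rmse = math.sqrt(sum(r ** 2 for r in re) / n)
    # Mean absolute percentage error.
    mape = sum(abs(r) for r in re) / n
    # Average absolute error, normalized by the mean actual cost.
    aae = sum(abs(p - a) for a, p in zip(actual, predicted)) / sum(actual) * 100
    return re, rmse, mape, aae

re, rmse, mape, aae = errors([100.0, 200.0], [102.0, 196.0])
# re = [2.0, -2.0]; here rmse, mape and aae all evaluate to 2.0
```

All four indicators shrink as the predictions approach the actual costs, which is why lower values in Table 2 and Figure 7 indicate the better model.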

Feature Selection
The input of the forecasting techniques was determined on the basis of optimal feature subset selection by the DIR. In reference [45], the authors divided the substation project cost into two main types, primary and secondary production cost and individual project costs associated with the site, totaling more than 20 factors. In reference [46], the authors selected more than 26 variables, including the area and main transformer capacity, as the influencing factors of substation cost. Based on the research of the above references, this paper screened 33 variables as the main influencing factors of substation cost, including area, construction type, voltage level of substation, main transformer capacity, transmission line circuits in the low and high voltage sides, topography, schedule, substation type, the number of transformers, the economic development level of the construction area, inflation rate, the price and number of the circuit breaker in the high voltage side, the quantity of low-voltage capacitors, the price of single main transformer, high-voltage fuse, current transformer, power capacitor, reactor, electric buses, arrester, measuring instrument, relay protection device, signal system, automatic device, the expense of site leveling and foundation treatment, the technical level of the designers, the number of accidents, engineering deviation rate, construction progress level, rainy days, and snowy days. The program in this paper was run in MATLAB R2018b on an Intel Core i5-6300U with 4 GB of RAM under a Windows 10 system.
The iterative process of feature extraction is displayed in Figure 4, where the accuracy curve and the fitness curve show the forecasting precision of the DCNN and the fitness values in different iterations, respectively; the option number indicates the quantity of best characteristics derived from the DIR model, and the feature reduction refers to the number of characteristics eliminated by the MFOA. As we can see, the MFOA converged at the 39th iteration, and the homologous optimal fitness value and prediction accuracy equaled −0.91% and 98.9%, respectively. This indicates that the fitting ability of the DCNN can be enhanced and that the forecasting precision is able to reach its highest level through learning and training. Furthermore, the quantity of chosen characteristics tended to be steady when the MFOA ran to the 51st iteration. Ultimately, the final selected characteristics comprised construction type, voltage level, main transformer capacity, substation type, the number of transformers, the price of single main transformer, and the area, obtained by eliminating 26 redundant features from the 33 candidates. The importance of these seven features derived from the DIR was ordered (from most to least important) as: the price of single main transformer, the number of transformers, main transformer capacity, construction type, area, substation type, and voltage level.

Results and Discussion
After the accomplishment of feature selection, the input vector was brought into the DCNN model for training and testing. Here, the wavelet kernel function [47], one of the most widely used kernel functions, was applied, and the parameters optimized by MFOA equaled: γ = 43.0126, σ = 19.0382.
For the purpose of verifying the performance of the established approach, four other methods incorporating the MFOA-DCNN, the DCNN, an SVM and the BPNN were used for comparison. In the BPNN, the topology was set as 9-7-1. Tansig and purelin were exploited as the transfer function in the hidden layer and the transfer function in the output layer, respectively. In this paper, we set the maximum number of convergence as 200, while the learning rate and the error equaled 0.1 and 0.0001, respectively. The initial values of weights and thresholds were decided by their own training. In the Energies 2019, 12, 3043 14 of 21 SVM, the penalty parameter c and kernel parameter σ were valued at 10.276 and 0.0013, respectively, and ε in the loss function equaled 2.4375. In the DCNN, γ = 15, σ = 5. Table 2 lists the prediction results of the substation project cost achieved by five different models. For a more intuitive analysis, Figure 5 presents the predictive values and Figure 6  ]. In addition, the minimum absolute values of RE for the MFOA-DIR-DCNN, the MFOA-DCNN, the DCNN, the SVM and the BPNN were 0.23%, 0.79%, 1.44%, −2.52%, 2.83%, respectively, and the maximum absolute values of RE correspondingly equaled 2.99%, 6.12%, 6.51%, −6.94% and 7.17%, respectively. In this respect, these models can be sorted by the forecasting accuracy from the superior to the inferior: the MFOA-DIR-DCNN, the MFOA-DCNN, the DCNN, the SVM and the BPNN. This demonstrates that the application of the MFOA contributes to the enhancement of training and learning process as well as the improvement of global searching ability for the DCNN. Simultaneously, the input derived from the MFOA-DIR can obtain satisfactory prediction results. In contrast with the SVM and the BPNN, this indicates that the DCNN can achieve a better forecasting performance than shallow learning algorithms. as the improvement of global searching ability for the DCNN. 
Simultaneously, the input derived from the MFOA-DIR can obtain satisfactory prediction results. The contrast with the SVM and the BPNN indicates that the DCNN can achieve a better forecasting performance than shallow learning algorithms. Figure 7 illustrates the comparative results gauged by the RMSE, the MAPE and the AAE. This proves that the established hybrid model is superior to the other four techniques from the perspective of the aforementioned error criteria. Concretely, the RMSE, the MAPE and the AAE of the MFOA-DIR-DCNN were 2.2345%, 2.1721% and 2.1700%, respectively. Additionally, the RMSEs of the MFOA-DCNN, the DCNN, the SVM and the BPNN were 3.1818%, 3.7103%, 4.5659% and 6.2336%, respectively, while the MAPEs of the corresponding methods equaled 3.2073%, 3.7148%, 4.4318% and 5.8772%. Accordingly, the AAEs of the MFOA-DCNN, the DCNN, the SVM and the BPNN were equivalent to 3.1251%, 3.7253%, 4.4956% and 5.7347%, respectively. Owing to the fact that the DCNN has advantages over shallow learning algorithms, the MFOA was able to complete the parameter optimization of the DCNN, and the DIR approach guarantees the completeness of the input information while reducing redundant data, which ameliorates the prediction accuracy and robustness.
To further verify the superiority of the proposed method, the case was also predicted by the methods proposed in References [8] (BP neural network), [14] (cuckoo search algorithm and support vector machine) and [36] (modified firefly algorithm and support vector machine). The input of these three models comprised all 33 candidate features, and the parameter settings were consistent with those mentioned in the text. Table 3 displays the comparative forecasting results. According to Table 3, it can be concluded that the forecasting precision of the established approach outperforms that of References [8,14,36]. The main reasons consist of three points. First, the feature selection process removes low-correlation factors, thereby reducing the input dimension and the training error of the model. Second, optimizing the parameters of the neural network or the support vector machine can improve the training accuracy of the model; for example, the prediction results of References [14] and [36] were superior to those of the plain SVM (shown in Figure 7). Third, the DCNN model not only reduces the number of neurons and weights, but also uses the pooling operation to make the input features tolerant of displacement, scaling and distortion, thus improving the accuracy and robustness of network training beyond the SVM and the BPNN.
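The displacement tolerance attributed to the pooling operation can be illustrated with a toy 1-D max-pooling sketch; this is an illustration only, not the paper's network.

```python
import numpy as np

def max_pool_1d(x, size=2):
    """Non-overlapping 1-D max pooling (stride equals window size)."""
    x = np.asarray(x, float)
    n = len(x) // size * size          # drop any trailing remainder
    return x[:n].reshape(-1, size).max(axis=1)

# Pooling halves the feature length while keeping the dominant
# activations, so a small shift of the input still yields the same
# strongest responses (toy feature vector, not real project data).
a = np.array([0.1, 0.9, 0.2, 0.1, 0.8, 0.3])
print(max_pool_1d(a))            # [0.9 0.2 0.8]
print(max_pool_1d(np.roll(a, 1)))
```

The dimensionality reduction (six inputs down to three pooled values) is also what cuts the number of downstream neurons and weights mentioned above.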
However, when training and testing the proposed model, it was found that the size of the training set had a relatively large impact on the test results: the larger the training sample, the better the test performance. Since the number of new substation projects each year is limited, when applying the proposed model it is necessary to collect more historical substation project cost data to ensure that the DCNN can be fully trained.

Conclusions
This paper developed a novel hybrid approach for cost forecasting based on the DIR, the DCNN and the MFOA. Firstly, the DIR integrated with the MFOA was adopted for input feature selection. Simultaneously, the MFOA was utilized to realize parameter optimization in the DCNN. Thus, the proposed method could be applied to cost forecasting of substation projects on the foundation of the optimized input subset as well as the best values of γ and σ. The proposed model outperformed the comparative approaches in terms of prediction precision. The case studies demonstrated that: (a) the use of the DIR is conducive to the elimination of unrelated noise and the improvement of prediction performance; (b) improving the DCNN with the MFOA yields good performance, mainly because the MFOA enhances the global searching capability of the method; (c) ideal prediction results were obtained in numerical examples of substation projects in different regions, at different voltage levels and of different scales, which shows that the adaptability and stability of the proposed model are also strong. Therefore, the established MFOA-DIR-DCNN approach for cost forecasting, considering its effectiveness and feasibility, provides an alternative for this field in the electric power industry.
However, feature selection methods have attracted increasing research attention recently and are very important for substation project cost forecasting. Thus, new feature selection methods will be a research focus in future work.
Author Contributions: H.W. designed this study and wrote this paper; Y.H. provided professional guidance; C.G. and Y.J. revised this paper.

Conflicts of Interest:
The authors declare no conflict of interest.