Sustainability
  • Article
  • Open Access

7 April 2022

A Novel Multi-Factor Three-Step Feature Selection and Deep Learning Framework for Regional GDP Prediction: Evidence from China

1 College of Business and Trade, Hunan Industry Polytechnic, Changsha 410036, China
2 School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Frontiers and Best Practices in Bio, Circular, and Green Growth and Eco-Innovation

Abstract

Gross domestic product (GDP) is an important index of the economic development of a region. Accurate GDP prediction for developing regions can provide technical support for sustainable urban development and economic policy formulation. In this paper, a novel multi-factor framework combining three-step feature selection and deep learning is proposed for regional GDP prediction. The core modeling process consists of three steps. In Step I, a feature crossing algorithm mines hidden information in the original datasets and extracts key features. In Step II, the BorutaRF and Q-learning algorithms analyze the correlation between the extracted features and the target from two different perspectives and select the highest-quality features. In Step III, the selected features are used as the input of a temporal convolutional network (TCN) to build the GDP prediction model and obtain the final prediction results. Experiments on three datasets support the following conclusions: (1) The proposed three-step feature selection method improves the prediction accuracy of TCN by more than 10%. (2) The proposed GDP prediction framework achieves better forecasting performance than 14 benchmark models, and its MAPE is below 5% in all cases.

1. Introduction

Regional gross domestic product (GDP), which equals the sum of the value added by all industries in a region, reflects basic economic indicators such as the region's economic growth rate and changes in its economic scale [1]. It is used worldwide as a general macroeconomic indicator of regional economic conditions [2]. Effective forecasting of regional GDP can not only indicate macroeconomic trends to a certain degree and guide healthy macroeconomic development but also provide a crucial basis for sustainable urban development [3]. Research on regional GDP can reveal the internal drivers of local economic growth and promote the optimization and upgrading of the local industrial structure [4]. In addition, regional GDP forecasts allow local governments to make better-informed economic decisions [4]. With such forecasts, governments can anticipate the development of the market economy, formulate development plans accordingly, and adopt decisions that benefit the local economy [5]. The formulation of macro-control economic policies and the adjustment of corporate development strategies both depend on accurate forecasting of regional GDP [6].
This is especially relevant at present. The outbreak of COVID-19 at the beginning of 2020 had such a great impact on the world economy that major economic indicators declined significantly [5,6,7]. However, as production has resumed, major economic indicators (especially in China) have rebounded and continued to improve [8,9]. Given the complex world economic situation, monitoring the current state of the economy and forecasting the future GDP trend play a paramount role in overall macroeconomic control and are of practical significance for formulating the next steps of economic policy and macro-control [10,11,12]. GDP prediction technology can also guide the direction of future regional sustainable development. Research in this direction can therefore promote regional industrial upgrading and steer regions toward green development.
At present, mainstream GDP forecasting frameworks fall into three categories: statistical models, machine learning models, and hybrid models [13]. Statistical models mainly use multiple regression and time series modeling to construct mathematical formulas for GDP changes. Machine learning models, such as support vector machines and decision trees, establish a nonlinear mapping between inputs and outputs. By combining components for data analysis, feature extraction, and nonlinear modeling, hybrid models can improve prediction performance compared with either of the other two kinds [14]. Considering the variety of feature categories relevant to GDP prediction, this paper adopts feature engineering and deep learning to establish an accurate multi-factor GDP prediction framework.

3. Methodology

3.1. Framework of the Proposed Regional GDP Forecasting Model

The factors influencing regional GDP are complex, so forecasts built from a single GDP series alone are often not accurate enough. This paper proposes a regional GDP forecasting method that integrates economic, educational, employment, and industrial data, since feature analysis of such multivariate data can improve the accuracy of the prediction model. Moreover, a three-step feature selection method is proposed to identify the features that are most helpful for GDP prediction. A TCN can capture the nonlinear relationship between the features and the prediction target, so in this paper, the TCN model is used with the selected features to predict regional GDP. The model framework is shown in Figure 1.
Figure 1. The core structure of the proposed GDP prediction model.

3.2. Multivariate Economic Characteristic Data

Regional economic forecasting is a typical time series forecasting problem, but it differs from traditional single time series forecasting. The regional economy is affected by education, industrial structure, transportation and logistics, geographical location, and other factors. Therefore, multivariate data analysis is indispensable for accurate regional economic forecasting. In this paper, education, industry, historical economic data, population, and other factors with a large impact on the regional economy are taken as prediction features. Historical information is considered first: historical economic indicators such as real GDP and real consumption reflect many regularities of economic development and are therefore informative for GDP forecasting [43]. Employment and population determine the purchasing power of the public, which in turn influences market circulation and GDP growth [44]. As for education and technology, education promotes employment, and science and technology are the foundation of productivity growth [45]; education data are therefore also needed to improve the accuracy of GDP prediction. Finally, industrial information is directly associated with GDP; for example, industrial output, energy consumption, and transportation efficiency are all crucial parts of the economy [46]. Multivariate data fusion can substantially improve the stability and accuracy of the regional economic forecasting model. As shown in Figure 2, there are 20 features, divided into four categories: historical economic indicators, employment and population, educational scale, and industrial structure.
Figure 2. Initial features for regional GDP forecasting.

3.3. Three-Stage Feature Selection

3.3.1. Stage I: Feature Crossing

The 20 initial features are divided into four groups according to their domains. The degree of correlation differs within and between groups, as does the amount of useful information each feature contains. Applying feature selection directly to the raw features can discard useful information and leave the data under-exploited [47]. Feature crossing is therefore used to address these problems. On the one hand, crossing combines the different information contained in individual features, which can improve the performance of the subsequent feature selection. On the other hand, it allows models to learn more complex nonlinear relationships. Four feature crossing schemes are proposed. Depending on whether features belong to the same category, statistical aggregation or simple arithmetic operations are applied to them to obtain new features. Figure 3 illustrates the strategy, with feature categories distinguished by color, and the four schemes are detailed below.
Figure 3. Feature crossing strategy.
Scheme 1: Traverse features. Every pair of the 20 features is combined by addition, subtraction, multiplication, and division.
Scheme 2: Intra-group features. If two features belong to the same category, they are added and subtracted pairwise; otherwise, they are multiplied and divided.
Scheme 3: Group aggregation features. For all features in the same category, the mean, standard deviation, maximum, minimum, and range are calculated to generate new features.
Scheme 4: Intra-group features and group aggregation features. The new features obtained by Schemes 2 and 3 are concatenated.
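The four schemes can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: features are held as named columns in a dict, the category of each feature comes from a hypothetical `group_of` mapping, division by a (near-)zero denominator returns 0.0, and the Scheme 3 statistics are computed per sample across the features of one category. Feature names such as `gdp_lag1` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

EPS = 1e-9  # assumption: denominators with |b| <= EPS are treated as zero


def safe_div(a, b):
    return a / b if abs(b) > EPS else 0.0


OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": safe_div,
}


def pairwise(features, choose_ops):
    """Cross every pair of feature columns with the operations chosen for that pair."""
    out = {}
    for (na, xa), (nb, xb) in combinations(features.items(), 2):
        for op in choose_ops(na, nb):
            out[f"{na}_{op}_{nb}"] = [OPS[op](a, b) for a, b in zip(xa, xb)]
    return out


def scheme1(features):
    # Scheme 1: all four operations on every pair of features.
    return pairwise(features, lambda na, nb: OPS.keys())


def scheme2(features, group_of):
    # Scheme 2: add/subtract within a category, multiply/divide across categories.
    return pairwise(
        features,
        lambda na, nb: ("add", "sub") if group_of[na] == group_of[nb] else ("mul", "div"),
    )


def scheme3(features, group_of):
    # Scheme 3: per-sample mean, std, max, min and range over each category's features.
    out = {}
    n = len(next(iter(features.values())))
    for g in sorted(set(group_of.values())):
        cols = [col for name, col in features.items() if group_of[name] == g]
        rows = [[col[i] for col in cols] for i in range(n)]
        out[f"{g}_mean"] = [mean(r) for r in rows]
        out[f"{g}_std"] = [pstdev(r) for r in rows]
        out[f"{g}_max"] = [max(r) for r in rows]
        out[f"{g}_min"] = [min(r) for r in rows]
        out[f"{g}_range"] = [max(r) - min(r) for r in rows]
    return out


def scheme4(features, group_of):
    # Scheme 4: concatenate the Scheme 2 and Scheme 3 features.
    return {**scheme2(features, group_of), **scheme3(features, group_of)}


# Hypothetical toy data: two quarters, three features in two categories.
features = {"gdp_lag1": [1.0, 2.0], "consumption": [0.5, 1.0], "employment": [3.0, 4.0]}
group_of = {"gdp_lag1": "econ", "consumption": "econ", "employment": "pop"}
crossed = scheme4(features, group_of)
print(len(scheme1(features)), len(crossed))  # 12 16
```

Scheme 1 grows quadratically with the number of features (four new columns per pair), which is why the text follows crossing with a filtering stage.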

3.3.2. Stage II: Feature Filtering Based on Boruta-RF

Feature crossing greatly increases the input feature dimension, and some of the resulting features have no correlation with the dependent variable. Taking all the crossed features as inputs to the feature selection stage would reduce its efficiency, so a feature filtering method is applied first. Boruta is an important feature filtering algorithm proposed by M. B. Kursa and W. R. Rudnicki [48]. Although Boruta is formally a wrapper built around a random forest, it is used here as a filter ahead of the final selection stage. Unlike most feature screening methods, which aim to maximize an evaluation index or minimize a model's loss function, Boruta aims to remove features unrelated to the dependent variable and retain all features that are related to it. It is commonly assumed that if adding or deleting a feature does not change model performance, the feature is unimportant. This is not always true: a feature that contributes little or nothing to model performance is not necessarily irrelevant to the dependent variable. Because Boruta retains all features related to the target, it is well suited as a preliminary filtering step before feature selection.
The core idea of Boruta is to shuffle each feature to generate a shadow feature. Shadow features are randomized and no longer correspond to the original samples. All original and shadow features are used to train a model. The highest importance score among the shadow features is then taken as the baseline, and the features related to the dependent variable are selected from the crossed features by comparison with this baseline. The specific steps are as follows.
Step 1: Shuffle each feature $X_i$ in the feature matrix X to generate its shadow feature, and concatenate the shadow features with the original features [49]:
$$X_s = \mathrm{shuffle}(X)$$
$$X_n = [X, X_s]$$
where $X_s$ is the shadow feature matrix and $X_n$ is the newly generated feature matrix.
Step 2: Train a random forest model on the newly generated feature matrix and calculate the average relative entropy (importance) of each feature. Then, calculate the Z score [50], given below, for all features including the shadow features, and take the highest Z score among the shadow features as the baseline, denoted $Z_{base}$.
$$Z_{score} = \frac{\bar{G}}{\sigma_G}$$
where G is the relative entropy (importance) of a feature across the trees of the forest, $\bar{G}$ is its mean value, and $\sigma_G$ is its standard deviation.
Step 3: Set a percentage parameter perc, compare the Z score of each crossed feature with $perc \cdot Z_{base}$, and discard the features whose Z score is lower than $perc \cdot Z_{base}$. The Benjamini-Hochberg FDR and Bonferroni corrections are applied in this process to keep the algorithm stable.
Step 4: Delete all shadow features and repeat the above steps until every feature has been marked as important or unimportant.
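To make the shadow-feature logic of Steps 1 to 4 concrete, the sketch below is a simplified, standard-library-only illustration, not the BorutaRF implementation used in the paper. Two substitutions are assumptions made for brevity: the random forest importance is replaced by the absolute Pearson correlation with the target, computed on bootstrap resamples that play the role of trees (so the Z score is the mean importance divided by its standard deviation across resamples), and the Benjamini-Hochberg/Bonferroni-corrected test is replaced by a simple majority vote over iterations. The `perc` parameter follows the $perc \cdot Z_{base}$ threshold of Step 3; the toy data are made up.

```python
import random
from statistics import mean, stdev


def abs_corr(x, y):
    """Absolute Pearson correlation; a stand-in (assumption) for random forest importance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def z_scores(columns, y, n_boot, rng):
    """Step 2: Z = mean importance / std of importance over bootstrap resamples."""
    n = len(y)
    history = {name: [] for name in columns}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        for name, col in columns.items():
            history[name].append(abs_corr([col[i] for i in idx], yb))
    return {name: mean(h) / (stdev(h) + 1e-12) for name, h in history.items()}


def boruta_sketch(features, y, n_iter=20, n_boot=10, perc=1.0, seed=0):
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Step 1: build a shuffled shadow copy of every feature.
        shadows = {}
        for name, col in features.items():
            shadow = col[:]
            rng.shuffle(shadow)
            shadows["shadow_" + name] = shadow
        # Step 2: score originals and shadows together; Z_base is the best shadow.
        z = z_scores({**features, **shadows}, y, n_boot, rng)
        z_base = max(z[name] for name in shadows)
        # Step 3: a feature scores a hit when it beats perc * Z_base.
        for name in features:
            if z[name] > perc * z_base:
                hits[name] += 1
        # Step 4: shadows are discarded and regenerated in the next iteration.
    # Simplified decision rule (assumption): keep features that win most iterations.
    return [name for name, h in hits.items() if h > n_iter / 2]


rng = random.Random(42)
x1 = [rng.uniform(0, 10) for _ in range(60)]  # informative feature
x2 = [rng.uniform(0, 10) for _ in range(60)]  # unrelated noise
x3 = [5.0] * 60                               # constant, carries no information
y = [2 * a + rng.gauss(0, 0.5) for a in x1]
selected = boruta_sketch({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)
```

Regenerating the shadows in every iteration is what makes the baseline a moving, data-driven threshold rather than a fixed cut-off.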

3.3.3. Stage III: Feature Selection Based on Reinforcement Learning

Reinforcement learning is an online learning paradigm driven by feedback from the environment [51]. It describes and solves problems in which an agent, through interaction with its environment, learns a strategy that maximizes its return or achieves a specific goal. Unlike traditional evolutionary algorithms, it supports dynamic optimization and tolerates erroneous paths during optimization [52]. Q-learning, a classic reinforcement learning algorithm with strong decision-making ability, is widely used in decision-making and optimization problems. In this part, a feature selection method based on Q-learning is proposed: the features retained by the filtering stage are screened further to avoid overfitting caused by an excessively high feature dimension. The specific steps are as follows:
Step 1: Initialize the core parameters of Q-learning, namely the state matrix S and the action matrix a. The state matrix S represents which features are selected, and the action matrix a represents the action of keeping or removing each feature [53].
$$S = \{s_1, s_2, \ldots, s_m\}$$
$$a = \{\Delta s_1, \Delta s_2, \Delta s_3, \ldots, \Delta s_m\}$$
where $s_m \in \{0, 1\}$ indicates whether the m-th feature is selected (0 means the feature is not used and 1 means it is retained), and $\Delta s_m$ is the action of adding or deleting the m-th feature.
Action selection: the action is chosen with an ε-greedy strategy:
$$a_n = \begin{cases} \arg\max_{a} Q(S, a), & \text{with probability } 1 - \varepsilon \\ \text{a random action}, & \text{with probability } \varepsilon \end{cases}, \qquad \varepsilon \in (0, 1)$$
where ε is the exploration probability.
Step 2: Define the reward R, which guides the agent's actions. Here the reward is based on the MAPE of the TCN, so that a lower forecasting error corresponds to a higher reward.
Step 3: The agent performs an action based on the current environment and state S.
Step 4: Calculate the action-value function Q and update the Q table. Based on the reward R received from the environment, the agent updates its state by changing the input features and updates the Q table accordingly. The Q value is updated as follows [54]:
$$Q_{n+1}(S_n, a_n) = Q_n(S_n, a_n) + \beta_n \left[ R(S_n, a_n) + \gamma \max_{a_{n+1}} Q_n(S_{n+1}, a_{n+1}) - Q_n(S_n, a_n) \right]$$
where a is the agent's action, S is the agent's current state, R is the immediate reward, γ is the discount factor, and $\beta_n$ is the learning rate.
Step 5: When the termination condition is met, the agent stops acting, and the state matrix S gives the final selection of model input features. Otherwise, Steps 3 and 4 are repeated.
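Steps 1 to 5 can be sketched as tabular Q-learning over binary feature masks. This is a minimal illustration under stated assumptions, not the authors' code: the state is a tuple of 0/1 flags $s_m$, action m toggles feature m (the $\Delta s_m$ of Step 1), the reward is the negative error returned by a user-supplied `evaluate` function (a hypothetical stand-in for training the TCN and measuring its MAPE), and the termination condition is a fixed budget of episodes. `toy_mape` is a made-up error surface used only for demonstration.

```python
import random


def q_learning_select(n_features, evaluate, episodes=200, steps=10,
                      epsilon=0.2, beta=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {}      # Q table: (state, action) -> value, default 0
    cache = {}  # evaluate() stands in for TCN training, so results are memoized
    best_state, best_err = None, float("inf")

    def q_val(s, a):
        return q.get((s, a), 0.0)

    for _ in range(episodes):
        state = tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(n_features)
            else:
                action = max(range(n_features), key=lambda a: q_val(state, a))
            # Delta s_m: toggle the chosen feature to obtain the next state.
            nxt = list(state)
            nxt[action] = 1 - nxt[action]
            nxt = tuple(nxt)
            if nxt not in cache:
                cache[nxt] = evaluate(nxt)
            err = cache[nxt]
            reward = -err  # assumption: lower MAPE gives a higher reward
            if err < best_err:
                best_state, best_err = nxt, err
            # Q-learning update with learning rate beta and discount factor gamma.
            target = reward + gamma * max(q_val(nxt, a) for a in range(n_features))
            q[(state, action)] = q_val(state, action) + beta * (target - q_val(state, action))
            state = nxt
    return best_state, best_err


def toy_mape(mask):
    # Hypothetical error surface: features 0 and 2 help, the others add noise.
    err = 0.10
    for i, bit in enumerate(mask):
        if bit:
            err += -0.03 if i in (0, 2) else 0.01
    return err


best, err = q_learning_select(5, toy_mape)
print(best, round(err, 4))
```

Memoizing `evaluate` matters in practice: each new mask would otherwise mean retraining the forecaster, while the Q table itself stays cheap.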

3.4. TCN for Regional GDP Forecasting

TCN is a neural network model for time series prediction that combines dilated causal convolution (DCC) with residual connections (RC) [55]. A TCN is built by stacking multiple residual blocks. Each residual block has an important parameter pair (k, d), which represents the convolution kernel size and the dilation factor (expansion coefficient), respectively [56]. The final output of a residual block is the sum of the outputs of two paths. The first path passes the input through two identical DCC layers: the input enters the first DCC layer after weight initialization, the output is transformed nonlinearly by the ReLU activation function, and the result is regularized to reduce overfitting before entering the second DCC layer, where the same transformation is applied again. The second path sends the input directly to the output through a one-dimensional convolution layer. This path is the residual connection, which is derived from the residual neural network. It alleviates the vanishing and exploding gradient problems of deep neural networks and facilitates the construction of deeper networks.
The core component of TCN is DCC. On top of causal convolution, DCC increases the dilation factor d, which enlarges the receptive field of the network so that it can take in a longer history [57]. Causal convolution comes first: it ensures that no information leaks from the future into the past. In the network used in this paper, the convolution kernel size is 2, the dilation factor is 1, and the receptive field is 3, so the GDP prediction $\hat{y}_t$ is calculated from the input sequence $[x_{t-2}, x_{t-1}, x_t]$ and has nothing to do with the future inputs $[x_{t+1}, x_{t+2}, \ldots]$. Causal convolution therefore does not cause information leakage in TCN. However, plain causal convolution has a small receptive field, and DCC expands it by increasing the dilation factor; at the same depth, the receptive field can be expanded to 4. The dilated convolution operation is defined by the following equation [58]:
$$TCN(t) = \sum_{l=0}^{p-1} f(l)\, X_{t - d \cdot l}$$
where TCN(t) is the output of the dilated convolution at time t, X is the input time series, f is the filter, d is the dilation factor, p is the filter (kernel) size, and l indexes the filter elements.
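The dilated convolution above and the two-path residual block described earlier can be checked with a small pure-Python sketch. It assumes a single channel, omits weight initialization and dropout, and models the one-dimensional convolution on the residual path as a 1x1 convolution (a scalar weight for one channel); the filter values are arbitrary. It also assumes that the receptive field of stacked dilated causal convolutions is $1 + \sum_i (k-1) d_i$. With two DCC layers of kernel size 2, dilations (1, 1) give 3 and dilations (1, 2) give 4, which is consistent with the numbers quoted in the text.

```python
def dilated_causal_conv(x, f, d):
    """TCN(t) = sum_{l=0}^{p-1} f[l] * x[t - d*l]; inputs before t = 0 are zero-padded."""
    p = len(f)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for l in range(p):
            idx = t - d * l
            if idx >= 0:  # causal: only current and past inputs are used
                acc += f[l] * x[idx]
        out.append(acc)
    return out


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, f1, f2, d, w_skip):
    """Path 1: DCC -> ReLU -> DCC -> ReLU (dropout omitted). Path 2: 1x1 conv. Sum both."""
    h = relu(dilated_causal_conv(x, f1, d))
    h = relu(dilated_causal_conv(h, f2, d))
    skip = [w_skip * a for a in x]
    return [a + b for a, b in zip(h, skip)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_causal_conv(x, [1.0, 1.0], 2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(2, [1, 1]), receptive_field(2, [1, 2]))  # 3 4

# Causality check: changing the last input leaves all earlier outputs unchanged.
y1 = residual_block(x, [0.5, 0.5], [1.0, -0.5], 1, 1.0)
y2 = residual_block(x[:-1] + [100.0], [0.5, 0.5], [1.0, -0.5], 1, 1.0)
print(y1[:-1] == y2[:-1])  # True
```

Doubling the dilation at each added layer makes the receptive field grow exponentially with depth while the kernel stays small, which is the motivation for DCC given above.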

4. Case Study

4.1. GDP Dataset

The case study is key to evaluating the performance of different GDP prediction frameworks. Following the analysis of GDP data in [59], this paper uses data from three provinces in China for the experiments. The data come from the National Statistical Yearbook and contain quarterly GDP and other feature data from 2005 to 2021. Table 2 gives the basic information and input features of the three datasets, and Figure 4, Figure 5 and Figure 6 show the temporal fluctuations of the three GDP series. To verify the stability, robustness, and validity of the GDP prediction model, the proposed model and the benchmark models are evaluated with ten-fold cross-validation. In addition, each test is repeated ten times, and the average of the evaluation indexes over the ten runs is used to analyze model performance. This paper constructs single-step forecasting models; that is, each model predicts the GDP of the next quarter from the current and historical data. The experiments and models are implemented on the Python 3.8.5 platform, mainly using TensorFlow 2.3 to build the neural networks. Python was created by Guido van Rossum, and TensorFlow is an open-source machine learning library developed by Google.
Table 2. Basic information about the GDP datasets.
Figure 4. Raw GDP Data 1.
Figure 5. Raw GDP Data 2.
Figure 6. Raw GDP Data 3.
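The single-step setup and the ten-fold evaluation described above can be sketched as follows. This is an assumption-laden illustration: the paper does not state the window length or how folds are formed, so the hypothetical `lags` parameter and the contiguous, unshuffled folds below are illustrative choices, and the toy series is made up. Each sample pairs the features of the last `lags` quarters with the GDP of the next quarter; the paper uses k = 10, while the toy example uses k = 4 because it has only eight samples.

```python
def make_supervised(features, target, lags):
    """X[i]: features of quarters t-lags+1..t (flattened); y[i]: target at quarter t+1."""
    X, y = [], []
    for t in range(lags - 1, len(target) - 1):
        window = features[t - lags + 1 : t + 1]
        X.append([v for quarter in window for v in quarter])
        y.append(target[t + 1])
    return X, y


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yields (train_idx, test_idx) pairs."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical toy data: 12 quarters, 2 features per quarter, GDP target.
features = [[float(i), 2.0 * i] for i in range(12)]
target = [float(i) for i in range(12)]
X, y = make_supervised(features, target, lags=4)
folds = list(kfold_indices(len(X), k=4))
print(len(X), X[0], y[0])  # 8 [0.0, 0.0, 1.0, 2.0, 2.0, 4.0, 3.0, 6.0] 4.0
```

Averaging a metric over the folds, and then over the ten repeated runs, gives the summary values used in the comparisons.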

4.2. Performance Evaluation Indexes

Regression error indexes are the key to evaluating the performance of the proposed model. To analyze the modeling performance of each model, three classic indexes, the MAE (mean absolute error), the MAPE (mean absolute percentage error), and the RMSE (root mean square error), are used in all case studies. They are calculated by Equation (10) [60]:
$$MAE = \frac{1}{n} \sum_{T=1}^{n} \left| Y(T) - \hat{Y}(T) \right|$$
$$MAPE = \frac{1}{n} \sum_{T=1}^{n} \left| \frac{Y(T) - \hat{Y}(T)}{Y(T)} \right|$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{T=1}^{n} \left( Y(T) - \hat{Y}(T) \right)^2}$$
where Y(T) is the true GDP, $\hat{Y}(T)$ is the GDP predicted by the model, and n is the number of samples.
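Equation (10) translates directly into code. The sketch below is a plain-Python transcription; MAPE is returned as a fraction (multiply by 100 for a percentage, as in the reported values below 5%), and the sample numbers are illustrative, not data from the paper.

```python
from math import sqrt


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    _check(y_true, y_pred)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    # Assumes no true value is zero, which holds for GDP.
    _check(y_true, y_pred)
    return sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    _check(y_true, y_pred)
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(round(mae(y_true, y_pred), 3), round(mape(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))
# 6.667 0.05 8.165
```

The same absolute error of 10 counts twice as much in MAPE for the smaller true value, which is why MAPE is the scale-free index used for cross-dataset comparison.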
To quantify the performance differences between models, this study uses the promoting percentages of the MAE (PMAE), the MAPE (PMAPE), and the RMSE (PRMSE), calculated by Equation (11) [61]:
$$P_{MAE} = \frac{MAE_a - MAE_b}{MAE_a}, \quad P_{MAPE} = \frac{MAPE_a - MAPE_b}{MAPE_a}, \quad P_{RMSE} = \frac{RMSE_a - RMSE_b}{RMSE_a}$$
where subscript a denotes the comparison model and subscript b the model being evaluated against it, so a positive value means that model b has the smaller error.
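Equation (11) is equally direct. In this sketch, `bench` holds the metrics of comparison model a and `proposed` those of model b; the numbers are hypothetical and chosen only to show the arithmetic, for example an MAE falling from 10.0 to 8.0 gives PMAE = (10.0 - 8.0) / 10.0 = 20%.

```python
def promoting_percentages(bench, proposed):
    """P_X = (X_a - X_b) / X_a for each metric X; positive means model b has lower error."""
    return {f"P{name}": (bench[name] - proposed[name]) / bench[name] for name in bench}


bench = {"MAE": 10.0, "MAPE": 0.05, "RMSE": 12.0}   # hypothetical model a
proposed = {"MAE": 8.0, "MAPE": 0.04, "RMSE": 9.0}  # hypothetical model b
for key, value in promoting_percentages(bench, proposed).items():
    print(f"{key}: {value:.1%}")
# PMAE: 20.0%
# PMAPE: 20.0%
# PRMSE: 25.0%
```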

4.3. Contrast Experiment with Benchmark Algorithms

4.3.1. Experimental Results and Analysis of Different Predictors

To compare the modeling performance of different predictors and assess the TCN algorithm, this paper conducts comparative experiments with TCN, GRU, LSTM, RNN, ELM, and RBF models, covering both deep learning models (TCN, GRU, LSTM, and RNN) and traditional shallow neural networks (ELM and RBF). Table 3 gives the regression analysis indexes of the prediction results of these algorithms, from which the following conclusions can be drawn:
Table 3. The regression analysis indexes of several predictors.
(1)
Compared with the traditional RBF and ELM algorithms, the deep learning models achieve lower prediction errors. These cases show that deep learning methods can achieve satisfactory modeling results in this field. A plausible reason is that multi-layer deep network structures are better at mining deep feature information from the data.
(2)
Compared with the traditional RNN model, the other deep learning models achieve more satisfactory prediction results. This indicates that deep neural networks with specialized structures can support a better GDP forecasting framework. A possible reason is that the RNN model suffers from vanishing and exploding gradients, which limit its training and reduce its overall accuracy.
(3)
Compared with GRU and LSTM, the TCN model adopted in this paper achieves smaller prediction errors in all cases, demonstrating the practical value and modeling ability of TCN in GDP forecasting. A plausible reason is that TCN combines characteristics of CNN and RNN: it improves parallel computing capability while retaining the advantages of sequence modeling, which further improves model performance.

4.3.2. Experimental Results and Analysis of Different Hybrid Models

To verify the application value of the proposed GDP prediction model, two groups of comparative experiments are set up in this section.
Part I: To show that the three-stage feature selection method improves the prediction performance of TCN, the proposed FC-BorutaRF-Q-TCN algorithm is compared with FC-BorutaRF-TCN, FC-Q-TCN, and TCN.
Part II: To show that the reinforcement-learning-based feature selection has excellent feature selection ability, Q-learning is compared with the classical genetic algorithm (GA) and particle swarm optimization (PSO).
Table 4 gives the regression analysis indexes of the prediction results of these algorithms. Table 5 shows the promoting percentages of FC-BorutaRF-Q-TCN over the other models, and Table 6 shows those of Q-learning over the heuristic algorithms. Figure 7 shows the loss of the different feature selection algorithms during iteration, and Table 7 shows the feature selection results. From Table 4, Table 5, Table 6 and Table 7 and Figure 7, the following conclusions can be drawn:
Table 4. The indexes evaluation results of several forecasting models.
Table 5. The promoting percentages of FC-BorutaRF-Q-TCN over other models.
Table 6. The promoting percentages of Q-learning over heuristic algorithms.
Figure 7. Loss of different feature selection algorithms during iteration.
Table 7. Feature selection results.
(1)
Compared with the single TCN model, all the hybrid models achieve better prediction accuracy, demonstrating that the feature engineering algorithms can improve the predictor's results. A possible reason is that the feature engineering algorithms mine the correlation between the historical feature data and the GDP labels from two perspectives and select the highest-quality features, which improves the modeling ability of TCN.
(2)
The prediction results of FC-BorutaRF-Q-TCN are clearly better than those of FC-Q-TCN and FC-BorutaRF-TCN. This shows that the three-stage feature selection algorithm can mine the feature information of the original data more deeply and achieve better results than a single feature selection algorithm. A plausible reason is that the BorutaRF and Q-learning algorithms analyze the features that the FC method produces from the original data and optimize them from two different perspectives, so TCN obtains the best input features and yields the best GDP prediction model.
(3)
Compared with PSO and GA, the reinforcement-learning-based feature selection algorithm obtains the best results, demonstrating the ability of Q-learning to assess feature quality and select features. A possible reason is that, unlike the heuristic algorithms, reinforcement learning continually trains an agent through interaction with its environment, so Q-learning can effectively evaluate feature quality and select the optimal input for TCN.
(4)
The feature selection results show that most of the retained features are historical GDP data and industrial features, whereas few educational and population features are retained. This suggests that historical GDP data and industrial features play a paramount role in the composition of regional GDP. These results can guide the construction of more accurate GDP prediction frameworks in the future.

4.4. Contrast Experiment with Existing Algorithms

To show that the proposed FC-BorutaRF-Q-TCN model is an advanced GDP forecasting model with excellent research prospects, it is compared with four existing models: the classical time series model ARIMA, the traditional machine learning model SVM, and two recent state-of-the-art models (Yan's model [62] and Dong's model [63]). Figure 8, Figure 9 and Figure 10 show the MAE, MAPE, and RMSE values of the proposed model and the four existing models, and Figure 11, Figure 12 and Figure 13 show the prediction results of all the comparison models. Based on Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13, the following conclusions can be drawn:
Figure 8. MAE values of the proposed model and existing models.
Figure 9. MAPE values of the proposed model and existing models.
Figure 10. RMSE values of the proposed model and existing models.
Figure 11. Predicted results of all algorithms (Data 1).
Figure 12. Predicted results of all algorithms (Data 2).
Figure 13. Predicted results of all algorithms (Data 3).
(1)
Compared with the classical ARIMA and SVM algorithms, all the hybrid models achieve more satisfactory prediction results, demonstrating the practicability and effectiveness of hybrid models in GDP forecasting. A plausible reason is that hybrid models optimize the input features of the GDP forecasting model through feature analysis and data mining, which improves the analysis and modeling capabilities of the predictors.
(2)
The proposed FC-BorutaRF-Q-TCN model achieves the best prediction accuracy in all cases, demonstrating its stability and advancement. First, the model uses the feature crossing algorithm to mine latent feature information from the original data. Then, the BorutaRF and Q-learning algorithms screen the features produced by the FC algorithm from two different angles. Finally, the screened features are used as the input of TCN to build the GDP prediction model and obtain the final prediction results. Overall, the model improves prediction performance from multiple perspectives, and the FC-BorutaRF-Q-TCN model therefore has considerable research value in the field of GDP prediction.

4.5. Discussion

Based on the above analysis of all experimental results, the following discussion and analysis are carried out in this section:
(1)
The proposed FC-BorutaRF-Q-TCN model achieves the best prediction accuracy in all cases. In addition, the results of ten-fold cross-validation and ten repeated experiments support its stability and effectiveness.
(2)
Table 3 demonstrates the predictive performance of TCN. Compared with other neural networks, TCN combines the parallel computation capability of CNN with the sequence modeling capability of RNN, and it therefore achieves excellent results in GDP forecasting.
(3)
Based on Table 4, Table 5, Table 6 and Table 7 and Figure 7, the proposed three-stage feature selection framework effectively improves the prediction performance of TCN. In addition, the feature selection results indicate that historical GDP data and industrial structure features are relatively crucial for GDP prediction, while education and population have relatively little impact. These findings may help governments formulate policies and promote economic development in the future.
(4)
Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 show the application prospects of the proposed model in GDP prediction. As Figure 8, Figure 9 and Figure 10 show, the errors of the proposed FC-BorutaRF-Q-TCN model are significantly lower than those of the other existing models, and its MAPE values are below 5% in all cases. Moreover, Figure 11, Figure 12 and Figure 13 show that the GDP predictions of the FC-BorutaRF-Q-TCN model are very close to the real GDP data, supporting the practicality of the model in this field.
(5)
GDP prediction technology can provide technical support for regional economic development and policymaking. However, it also has limitations. GDP mainly measures physical production and, to some extent, ignores the value of open-source products and services, free products, and other related industries. Therefore, indicators of these modern industries should be considered alongside GDP to comprehensively evaluate the regional economic level and formulate further development policies [64].

5. Conclusions and Future Work

As a comprehensive signal of the future economic situation, GDP forecasting provides technical support for national macroeconomic regulation. In this paper, a new multi-source data-driven GDP prediction model based on three-stage feature selection and a TCN network is proposed. The main contributions of this paper are as follows:
(1)
Unlike traditional single-variable GDP time series forecasting frameworks, this paper proposes a multi-source data-driven GDP forecasting model that considers the influence of other features on GDP and thereby improves prediction performance.
(2)
Unlike traditional shallow neural networks and recurrent neural networks, the TCN adopted in this paper combines the training advantages of CNNs with the sequence modeling ability of RNNs. The TCN therefore achieves better GDP prediction performance and serves as the core predictor of the framework.
(3)
A new three-stage feature selection framework is proposed to optimize the prediction performance of TCN. First, the framework uses the feature crossing (FC) algorithm to mine latent features from the original data and extract deeper information. Second, the BorutaRF and Q-learning algorithms screen the features from two different perspectives to ensure the quality of the TCN inputs. The three-stage feature selection framework improves the prediction accuracy of TCN by more than 10%.
(4)
The feature selection results show that historical GDP data and industrial structure features are the more important inputs for GDP prediction modeling. These results provide a valuable reference for government economic policy adjustments.
(5)
To demonstrate the advancement and practicality of the proposed FC-BorutaRF-Q-TCN prediction framework, fourteen models used by other researchers were replicated and compared with it. The experimental results show that the FC-BorutaRF-Q-TCN model outperforms these benchmarks, with MAPE values below 5% in all cases, which makes it a promising GDP forecasting framework.
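Item (2) credits the TCN with combining the training advantages of CNNs and the sequence modeling ability of RNNs. The standard TCN building block that supports this claim is the dilated causal convolution. Below is a minimal pure-Python sketch of that operation and of the resulting look-back window. It is not the authors' implementation: the function names are hypothetical, and residual connections, nonlinearities, and the multiple convolutions per block used in practical TCNs are omitted.

```python
def dilated_causal_conv1d(x, weights, dilation=1):
    """One dilated causal convolution over a 1-D series (illustrative sketch).

    y[t] = sum_i weights[i] * x[t - i * dilation], treating x[j] = 0 for j < 0,
    so each output depends only on current and past inputs (no look-ahead).
    """
    if dilation < 1 or not weights:
        raise ValueError("dilation must be >= 1 and weights must be non-empty")
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:  # implicit zero padding on the left keeps the filter causal
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Look-back window of a stack of single dilated causal convolutions."""
    # Each layer with dilation d extends the window by (kernel_size - 1) * d steps.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


series = [1, 2, 3, 4, 5]  # hypothetical values
print(dilated_causal_conv1d(series, [1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

Every output position is computed independently, so a real TCN can evaluate all positions in parallel, like a CNN. Doubling the dilation at each layer makes the look-back window grow exponentially with depth, which gives the network RNN-like access to long histories.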
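Item (3) describes the FC step as expanding the original data with latent features before BorutaRF and Q-learning screen them. One common form of feature crossing builds pairwise products of the original indicators. The sketch below assumes that operation; the exact crossing operations used by the paper's FC algorithm may differ. The helper `cross_features` and the feature names are hypothetical.

```python
from itertools import combinations


def cross_features(features):
    """Return the original columns plus all pairwise product crosses.

    features: dict mapping a feature name to a list of values of equal length.
    Pairwise products are an assumed crossing operation, used for illustration.
    """
    if len({len(col) for col in features.values()}) > 1:
        raise ValueError("all feature columns must have the same length")
    crossed = dict(features)
    for a, b in combinations(sorted(features), 2):
        crossed[f"{a}*{b}"] = [x * y for x, y in zip(features[a], features[b])]
    return crossed


# Hypothetical indicator columns (illustrative values only)
raw = {
    "gdp_lag1": [1.0, 2.0, 3.0],
    "industry_share": [0.4, 0.5, 0.6],
    "population": [10.0, 10.5, 11.0],
}
expanded = cross_features(raw)
print(len(expanded))  # 6 = 3 original columns + 3 pairwise crosses
print(sorted(expanded))
```

Crossing m features in pairs adds m(m − 1)/2 candidate columns. This rapid growth is why a subsequent selection stage is needed to keep only informative inputs for the TCN.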
The proposed multi-factor data-driven GDP prediction framework provides a meaningful reference for regional economic development strategy. In the future, the proposed model can be further improved in the following respects to enhance its practical value:
(1)
The GDP prediction framework is trained on multi-factor data. As new data accumulate and existing data are revised, the model therefore needs to be retrained regularly to remain up to date.
(2)
The model can accurately predict the change in GDP in the next quarter. The government can use these forecasts to formulate economic policies that adjust and develop the regional economy. A key next step is to formulate a reasonable sustainable development strategy based on the GDP prediction results.
(3)
GDP prediction technology provides effective technical guidance for the sustainable development of the regional economy. Building on these results, it can help drive the upgrading of regional industries and promote green development and sustainability in the future.
(4)
GDP effectively reflects physical production and regional economic development. However, it does not fully capture related industries such as services, open-source products, and products provided free of charge. Considering GDP together with these other industries is therefore indispensable for the sustainable development and prosperity of the regional economy.

Author Contributions

Conceptualization, G.Y. and C.Y.; methodology, C.Y.; software, G.Y. and C.Y.; validation, C.Y., G.Y. and Q.L.; formal analysis, Q.L.; investigation, Q.L.; resources, Q.L.; data curation, Q.L. and C.Y.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L., G.Y. and C.Y.; visualization, Q.L.; supervision, G.Y.; project administration, G.Y.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This study is fully supported by the Natural Science Foundation of Hunan Province (Grant No. 2021JJ60003).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pavía-Miralles, J.M.; Cabrer-Borrás, B. On estimating contemporaneous quarterly regional GDP. J. Forecast. 2007, 26, 155–170.
  2. Jurun, E.; Pivac, S. Comparative regional GDP analysis: Case study of Croatia. Cent. Eur. J. Op. Res. 2011, 19, 319–335.
  3. Landefeld, J.S.; Seskin, E.P.; Fraumeni, B.M. Taking the pulse of the economy: Measuring GDP. J. Econ. Perspect. 2008, 22, 193–216.
  4. Chen, X.; Liu, C.; Yu, X. Urbanization, Economic Development, and Ecological Environment: Evidence from Provincial Panel Data in China. Sustainability 2022, 14, 1124.
  5. König, M.; Winkler, A. COVID-19: Lockdowns, fatality rates and GDP growth. Intereconomics 2021, 56, 32–39.
  6. Abbass, K.; Begum, H.; Alam, A.; Awang, A.H.; Abdelsalam, M.K.; Egdair, I.M.M.; Wahid, R. Fresh Insight through a Keynesian Theory Approach to Investigate the Economic Impact of the COVID-19 Pandemic in Pakistan. Sustainability 2022, 14, 1054.
  7. Kokolakakis, T.; Lera-Lopez, F.; Ramchandani, G. Measuring the Economic Impact of COVID-19 on the UK’s Leisure and Sport during the 2020 Lockdown. Sustainability 2021, 13, 13865.
  8. Liu, X.; Liu, Y.; Yan, Y. China macroeconomic report 2020: China’s macroeconomy is on the rebound under the impact of COVID-19. Econ. Political Stud. 2020, 8, 395–435.
  9. Yang, W.; Wang, X.; Zhang, K.; Ke, Z. COVID-19, Urbanization Pattern and Economic Recovery: An Analysis of Hubei, China. Int. J. Environ. Res. Public Health 2020, 17, 9577.
  10. Zheng, G.; Zhu, S. Research on the Effectiveness of China’s Macro Control Policy on Output and Technological Progress under Economic Policy Uncertainty. Sustainability 2021, 13, 6844.
  11. Goolsbee, A.; Syverson, C. Fear, lockdown, and diversion: Comparing drivers of pandemic economic decline 2020. J. Public Econ. 2021, 193, 104311.
  12. Huynh, T.L. The COVID-19 risk perception: A survey on socioeconomics and media attention. Econ. Bull. 2020, 40, 758–764.
  13. Gan, Z.; Li, C.; Zhou, J.; Tang, G. Temporal convolutional networks interval prediction model for wind speed forecasting. Electr. Power Syst. Res. 2021, 191, 106865.
  14. Sun, Q.; Yang, Z.; Chen, X.; Yu, C. Optical Performance monitoring using Q-learning optimized least square support vector machine in optical network. In Proceedings of the 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 22–24 October 2021; pp. 954–958.
  15. Suganthi, L.; Samuel, A.A. Energy models for demand forecasting—A review. Renew. Sustain. Energy Rev. 2012, 16, 1223–1240.
  16. Kalantaripor, M.; Najafi Alamdarlo, H. Spatial Effects of Energy Consumption and Green GDP in Regional Agreements. Sustainability 2021, 13, 10078.
  17. Gómez, M.; Rodríguez, J.C. Energy consumption and financial development in NAFTA countries, 1971–2015. Appl. Sci. 2019, 9, 302.
  18. Bjørnland, H.C.; Ravazzolo, F.; Thorsrud, L.A. Forecasting GDP with global components: This time is different. Int. J. Forecast. 2017, 33, 153–173.
  19. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181.
  20. Sadik-Zada, E.R.; Loewenstein, W. Drivers of CO2-Emissions in Fossil Fuel Abundant Settings: (Pooled) Mean Group and Nonparametric Panel Analyses. Energies 2020, 13, 3956.
  21. Dai, S.; Niu, D.; Li, Y. Forecasting of energy consumption in China based on ensemble empirical mode decomposition and least squares support vector machine optimized by improved shuffled frog leaping algorithm. Appl. Sci. 2018, 8, 678.
  22. Abonazel, M.R.; Abd-Elftah, A.I. Forecasting Egyptian GDP using ARIMA models. Rep. Econ. Financ. 2019, 5, 35–47.
  23. Wu, C.; Chen, P. Application of support vector machines in debt to GDP ratio forecasting. In Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, 13–16 August 2006; pp. 3412–3415.
  24. Ghanem, S.; Fandi, G.; Krepl, V.; Husein, T.; Rzek, O.; Muller, Z.; Kyncl, J.; Tlustý, J.; Smutka, L. The Impact of COVID-19 on Electricity Prices in Italy, the Czech Republic, and China. Appl. Sci. 2021, 11, 8793.
  25. Sadik-Zada, E.R.; Niklas, B. Business Cycles and Alcohol Consumption: Evidence from a Nonlinear Panel ARDL Approach. J. Wine Econ. 2021, 16, 429–438.
  26. Niklas, B.; Sadik-Zada, E.R. Income Inequality and Status Symbols: The Case of Fine Wine Imports. J. Wine Econ. 2019, 14, 365–373.
  27. Yusof, Y.; Kamaruddin, S.S.; Husni, H.; Ku-Mahamud, K.R.; Mustaffa, Z. Forecasting model based on LSSVM and ABC for natural resource commodity. Int. J. Comput. Theory Eng. 2013, 5, 906.
  28. Long, G. GDP prediction by support vector machine trained with genetic algorithm. In Proceedings of the 2010 2nd International Conference on Signal Processing Systems, Dalian, China, 5–7 July 2010; pp. V3-1–V3-3.
  29. Guleryuz, D. Determination of industrial energy demand in Turkey using MLR, ANFIS and PSO-ANFIS. J. Artif. Intell. Syst. 2021, 3, 16–34.
  30. Tuo, S.; Chen, T.; He, H.; Feng, Z.; Zhu, Y.; Liu, F.; Li, C. A Regional Industrial Economic Forecasting Model Based on a Deep Convolutional Neural Network and Big Data. Sustainability 2021, 13, 12789.
  31. Sa’adah, S.; Wibowo, M.S. Prediction of Gross Domestic Product (GDP) in Indonesia Using Deep Learning Algorithm. In Proceedings of the 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 10–11 December 2020; pp. 32–36.
  32. Liu, B.; Fu, C.; Bielefield, A.; Liu, Y.Q. Forecasting of Chinese primary energy consumption in 2021 with GRU artificial neural network. Energies 2017, 10, 1453.
  33. Wang, H.; Zhao, Y.; Tan, S. Short-term load forecasting of power system based on time convolutional network. In Proceedings of the 2019 8th International Symposium on Next Generation Electronics (ISNE), Zhengzhou, China, 9–10 October 2019; pp. 1–3.
  34. Sun, Y.; Hong, Y.; Wang, S. Out-of-sample forecasts of China’s economic growth and inflation using rolling weighted least squares. J. Manag. Sci. Eng. 2019, 4, 1–11.
  35. Yoon, J. Forecasting of real GDP growth using machine learning models: Gradient boosting and random forest approach. Comput. Econ. 2021, 57, 247–265.
  36. Bai, X.; Zhang, F.; Hou, J.; Xia, F.; Tolba, A.; Elashkar, E. Implicit multi-feature learning for dynamic time series prediction of the impact of institutions. IEEE Access 2017, 5, 16372–16382.
  37. Ortega-Bastida, J.; Gallego, A.-J.; Rico-Juan, J.R.; Albarrán, P. Regional gross domestic product prediction using twitter deep learning representations. In Proceedings of the IADIS International Conference Applied Computing, Bangkok, Thailand, 4–5 February 2020; pp. 89–96.
  38. Nahil, A.; Lyhyaoui, A. Short-term stock price forecasting using kernel principal component analysis and support vector machines: The case of Casablanca stock exchange. Procedia Comput. Sci. 2018, 127, 161–169.
  39. Wang, J.; Li, Y. Multi-step ahead wind speed prediction based on optimal feature extraction, long short term memory neural network and error correction strategy. Appl. Energy 2018, 230, 429–443.
  40. Kosana, V.; Teeparthi, K.; Madasthu, S.; Kumar, S. A novel reinforced online model selection using Q-learning technique for wind speed prediction. Sustain. Energy Technol. Assess. 2022, 49, 101780.
  41. Xu, R.; Li, M.; Yang, Z.; Yang, L.; Qiao, K.; Shang, Z. Dynamic feature selection algorithm based on Q-learning mechanism. Appl. Intell. 2021, 51, 7233–7244.
  42. Maeda-Gutiérrez, V.; Galván-Tejada, C.E.; Cruz, M.; Valladares-Salgado, A.; Galván-Tejada, J.I.; Gamboa-Rosales, H.; García-Hernández, A.; Luna-García, H.; Gonzalez-Curiel, I.; Martínez-Acuña, M. Distal Symmetric Polyneuropathy Identification in Type 2 Diabetes Subjects: A Random Forest Approach. Healthcare 2021, 9, 138.
  43. Ortega-Bastida, J.; Gallego, A.J.; Rico-Juan, J.R.; Albarrán, P. A multimodal approach for regional GDP prediction using social media activity and historical information. Appl. Soft Comput. 2021, 111, 107693.
  44. Chiang, Y.H.; Tao, L.; Wong, F.K.W. Causal relationship between construction activities, employment and GDP: The case of Hong Kong. Habitat Int. 2015, 46, 1–12.
  45. Ifa, A.; Guetat, I. Does public expenditure on education promote Tunisian and Moroccan GDP per capita? ARDL approach. J. Financ. Data Sci. 2018, 4, 234–246.
  46. Wiemer, C.; Tian, X. The measurement of small-scale industry for China’s GDP accounts. China Econ. Rev. 2001, 12, 317–322.
  47. Liang, J.; Hou, L.; Luan, Z.; Huang, W. Feature Selection with Conditional Mutual Information Considering Feature Interaction. Symmetry 2019, 11, 858.
  48. Aghaeepour, N.; Finak, G.; Hoos, H.; Mosmann, T.R.; Brinkman, R.; Gottardo, R.; Scheuermann, R.H.; Flow, C.A.P.C.; Consortium, D. Critical assessment of automated flow cytometry data analysis techniques. Nat. Methods 2013, 10, 228–238.
  49. Gholami, H.; Mohammadifar, A.; Golzari, S.; Kaskaoutis, D.G.; Collins, A.L. Using the Boruta algorithm and deep learning models for mapping land susceptibility to atmospheric dust emissions in Iran. Aeolian Res. 2021, 50, 100682.
  50. Ahmed, A.A.M.; Deo, R.C.; Feng, Q.; Ghahramani, A.; Raj, N.; Yin, Z.; Yang, L. Deep learning hybrid model with Boruta-Random forest optimiser algorithm for streamflow forecasting with climate mode indices, rainfall, and periodicity. J. Hydrol. 2021, 599, 126350.
  51. Park, J.; Kim, T.; Seong, S.; Koo, S. Control automation in the heat-up mode of a nuclear power plant using reinforcement learning. Prog. Nucl. Energy 2022, 145, 104107.
  52. Qi, C.; Zhu, Y.; Song, C.; Cao, J.; Xiao, F.; Zhang, X.; Xu, Z.; Song, S. Self-supervised reinforcement learning-based energy management for a hybrid electric vehicle. J. Power Sources 2021, 514, 230584.
  53. Subramanian, A.; Chitlangia, S.; Baths, V. Reinforcement learning and its connections with neuroscience and psychology. Neural Netw. 2022, 145, 271–287.
  54. Huynh, T.N.; Do, D.T.T.; Lee, J. Q-Learning-based parameter control in differential evolution for structural optimization. Appl. Soft Comput. 2021, 107, 107464.
  55. Schreiber, J.A.; Müller, S.L.; Westphälinger, S.E.; Schepmann, D.; Strutz-Seebohm, N.; Seebohm, G.; Wünsch, B. Systematic variation of the benzoylhydrazine moiety of the GluN2A selective NMDA receptor antagonist TCN-201. Eur. J. Med. Chem. 2018, 158, 259–269.
  56. Howell, B.A.; Puglisi, L.; Clark, K.; Albizu-Garcia, C.; Ashkin, E.; Booth, T.; Brinkley-Rubinstein, L.; Fiellin, D.A.; Fox, A.D.; Maurer, K.F.; et al. The Transitions Clinic Network: Post Incarceration Addiction Treatment, Healthcare, and Social Support (TCN-PATHS): A hybrid type-1 effectiveness trial of enhanced primary care to improve opioid use disorder treatment outcomes following release from jail. J. Subst. Abus. Treat. 2021, 128, 108315.
  57. Samal, K.K.R.; Panda, A.K.; Babu, K.S.; Das, S.K. Multi-output TCN autoencoder for long-term pollution forecasting for multiple sites. Urban Clim. 2021, 39, 100943.
  58. Ma, Q.; Wang, H.; Luo, P.; Peng, Y.; Li, Q. Ultra-short-term Railway traction load prediction based on DWT-TCN-PSO_SVR combined model. Int. J. Electr. Power Energy Syst. 2022, 135, 107595.
  59. Wu, X.; Zhang, Z.; Chang, H.; Huang, Q. A data-driven gross domestic product forecasting model based on multi-indicator assessment. IEEE Access 2021, 9, 99495–99503.
  60. Yan, G.; Yu, C.; Bai, Y. Wind Turbine Bearing Temperature Forecasting Using a New Data-Driven Ensemble Approach. Machines 2021, 9, 248.
  61. Liu, X.; Qin, M.; He, Y.; Mi, X.; Yu, C. A new multi-data-driven spatiotemporal PM2.5 forecasting model based on an ensemble graph reinforcement learning convolutional network. Atmos. Pollut. Res. 2021, 12, 101197.
  62. Yan, G.; Yu, C.; Bai, Y. A New Hybrid Ensemble Deep Learning Model for Train Axle Temperature Short Term Forecasting. Machines 2021, 9, 312.
  63. Dong, S.; Yu, C.; Yan, G.; Zhu, J.; Hu, H. A Novel Ensemble Reinforcement Learning Gated Recursive Network for Traffic Speed Forecasting. In Proceedings of the 2021 Workshop on Algorithm and Big Data, Fuzhou, China, 12–14 March 2021; pp. 55–60.
  64. Jean-Paul, F.; Martine, D. Beyond GDP: Measuring What Counts for Economic and Social Performance; OECD Publishing: Paris, France, 2018.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
