1. Introduction
As a key industry for economic development, the electric power sector provides a basic guarantee for social production and daily life. With its low pollution, high efficiency, low distribution costs and wide range of applications, electricity will remain an irreplaceable source of energy for a considerable time to come. With the rapid development of the electricity market, electricity demand for production and daily life keeps increasing, which places high demands on the production planning and scheduling of the power system. Accurate short-term electricity load forecasting is therefore very important [
1], as it will enable us to ensure the reliable and economic operation of the power system [
2]. However, the electricity load is a random and non-stationary series that is affected by various influencing factors, such as the time of day, weather conditions, economic indicators, etc., presenting many challenges to load forecasting [
3]. With the constant development of smart grids, various distributed smart meters have been installed and configured in the power system, collecting a large amount of accurate and reliable load data, which provide the basis for short-term power load forecasting.
For the purpose of improving the accuracy of short-term load prediction, many experts have done research work on the subject. These methods mainly include traditional forecasting methods, artificial intelligence forecasting methods and hybrid forecasting methods. The traditional forecasting methods include regression analysis [
4], autoregressive integrated moving average (ARIMA) models [
5,
6], seasonal exponential smoothing [
7], and Kalman filter [
8], which are all implemented based on statistics and have the advantages of being simple and fast, with good fitting and forecasting effects on smooth curves. However, as a typical time-series forecasting problem, short-term power load prediction usually entails strong volatility in the load data, and the factors affecting the load are numerous and complex, so it is difficult to guarantee the accuracy of forecasting using traditional statistics-based forecasting methods. Experts and scholars from various countries have applied machine learning and deep neural networks (DNN) in short-term load prediction. Typical algorithms for forecasting using machine learning approaches include expert systems [
9,
10] and support vector regression [
11,
12], but expert systems do not have self-learning capabilities, and support vector regression has difficulties in handling large-scale data. In addition, random forests [
13] and regression trees [
14] have been used in short-term load prediction. With the development of artificial intelligence technology, artificial neural network (ANN) forecasting methods [
15], deep learning methods [
16] and deep belief networks (DBN) [
17] have also been used in short-term load prediction in large numbers, and have achieved better forecasting results.
Aiming to more fully tap the information within the historical data of power loads and to fully consider the relevant influencing factors affecting integration within the forecasting model, many experts have proposed hybrid forecasting methods to make full use of the advantages of various forecasting methods, forming a complementary effect and further improving the accuracy of short-term load prediction. Among the hybrid prediction methods used for short-term electricity load, the combined use of convolutional neural networks (CNN) and recurrent neural networks (RNN) is a classical approach. The references [
18,
19] used CNNs to extract feature vectors, the reference [
18] fed the processed feature vectors into a long short-term memory network (LSTM) for forecasting, and the reference [
19] fed the feature vectors extracted by the CNN into a gated recurrent unit (GRU) for forecasting, and both models obtained highly accurate forecasts. GRU and LSTM are both RNN variants designed to mitigate the vanishing gradient problem of standard RNNs. Additionally, to address the vanishing gradient problem when predicting long sequences, reference [
20] introduced an attention mechanism into RNNs. In addition, short-term load models were constructed in references [
21,
22] using a combination of CNN and BiLSTM, and CNN and BiGRU, respectively. In reference [
23], a residual convolutional neural network (R-CNN) is used to extract the basic features of power consumption data, which are input into a multilayer long short-term memory network (ML-LSTM) to learn the sequence information, and finally, a fully connected layer is used for prediction. These methods are effective in improving the accuracy of short-term load prediction, but the forecasting models built on such ideas have limited interpretability.
Another way of thinking about hybrid models is to build a short-term forecasting model by decomposing the original load data in order to reduce their volatility, and the different components obtained from the decomposition are predicted using their own approaches, with the various methods forming complementary strengths. Wavelet decomposition is used in reference [
24], but different basis functions and decomposition orders yield different results when wavelet decomposition is applied to non-stationary sequences, so both must be chosen a priori, which increases the complexity of its use. The empirical mode decomposition (EMD) approach is used in references [
3,
25], but EMD suffers from mode-mixing problems. The references [
26,
27,
28] use an ensemble empirical mode decomposition (EEMD) approach to obtain multiple intrinsic mode functions (IMFs), with separate predictions for different IMFs. Based on this, EEMD has been improved to obtain a complete ensemble empirical mode decomposition algorithm (CEEMD), which has been used to decompose the raw load sequence and construct prediction models [
29,
30,
31]. An EMD–mRMR–FOA–GRNN model was constructed in reference [
3], where EMD was first used to decompose the raw load series into several IMFs and a residual with different frequencies, and then the correlation analysis of each IMF with features such as day type, temperature and meteorological conditions was performed using the minimal redundancy maximal relevance (mRMR). This resulted in the best feature set. Finally, the fruit fly optimization algorithm (FOA) was used to optimize the smoothing factor in the generalized regression neural network (GRNN), and the final forecast load was obtained by summing the results of all IMF forecasting values. The prediction accuracy of this model was significantly improved. In reference [
27], the original load data were decomposed and reconstructed with components with similar entropy values, and the reconstructed components were fed into the LSTM for forecasting, while the hyperparameters in the LSTM were optimized using the Bayesian Optimization Algorithm (BOA).
Variational mode decomposition (VMD) is also a commonly used decomposition approach for short-term load forecasting [
32,
33,
34]. It is a completely non-recursive model that avoids the mode-mixing problems present in EMD, and has the advantage that the number of decomposed modes can be set manually. In reference [
32], VMD decomposes the original sequence into multiple subsequences, which reduces the volatility of the raw load. A CNN is used to address the difficulty GRU has in extracting high-dimensional characteristics of the power load, and an attention mechanism is introduced so that important information can still be weighted appropriately when the series is very long. A hybrid model based on VMD optimized by the cuckoo search algorithm (CSA), seasonal autoregressive integrated moving average (SARIMA) and a deep belief network (DBN) is proposed in reference [
33]. Firstly, VMD–CSA decomposes the original load into a number of regular and random subsequences. Secondly, SARIMA is used to forecast the regular subsequences and DBN is used to forecast the random subsequences. Finally, the predicted values of the subsequences are summed to give the final prediction. In reference [
34], a hybrid prediction model based on VMD, BOA and LSTM was constructed and achieved excellent forecasting results. In view of the above advantages of VMD, VMD was chosen to be used to decompose the original load data in this paper.
In hybrid forecasting models, the artificial setting of some parameters requires a lot of experience, and makes the parameter setting task difficult, so using a swarm intelligence optimization algorithm to find the optimal parameters for the forecasting model can improve the performance of the forecasting model [
35]. Reference [
21] used the Grey Wolf Optimization algorithm (GWO) to obtain optimal parameter sets for CNN and BiLSTM. In reference [
29], quantum dragonfly algorithm (QDA) was used in combination with SVR. In reference [
36], the optimal parameters for SVR are determined using the particle swarm optimization algorithm (PSO). In reference [
37], a paper manufacturer was taken as the research object: tertiary electricity consumption data were collected and a hybrid prediction model was established that combines a back propagation neural network (BPNN) based on production information with a genetic algorithm (GA) and PSO, in order to obtain more accurate forecasts and reduce the unit power consumption of paper products. The setting of the VMD parameters directly affects the subsequent forecasting performance, so this paper uses the sparrow search algorithm (SSA) to find the optimal key parameters of VMD.
In addition, the light weight of the model is also an issue that should be considered in short-term load forecasting. A forecasting model using GRU and Random Forest (RF) is proposed in reference [
38], where GRU is used to predict the electric load, while RF is used to decrease the input dimensionality of the model. The prediction model is constructed with guaranteed accuracy to achieve a lightweight model. Due to the complexity of the various factors that influence the load, directly entering all of them into the prediction model would unnecessarily increase the dimensionality of the feature vector, and would not be conducive to the accuracy of the model. Reference [
39] proposes a conditional mutual information-based feature selection method to select a more effective set of input variables for the prediction model. In order to filter out most of the unrelated and redundant features, reference [
40] uses the Partial Mutual Information-based filtering method. In reference [
41], the Spearman rank correlation coefficient is used to quantitatively analyze the correlation between the building heating load and various variables. By feature selection engineering, the constructed forecasting model is simplified to ensure high forecasting accuracy and forecasting rate. To achieve this goal, this paper takes full consideration of the dimensionality of the feature vector when constructing the feature vector to ensure the light weight of the model. Pearson correlation analysis (PCC) and maximal information coefficient (MIC) are used to select the factors with a high degree of influence on short-term load as feature values and construct the feature vector; the autocorrelation function (ACF) is used to select the node load values that have the greatest influence on the load values of the nodes to be predicted.
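As a rough illustration of the selection step described above, the following Python sketch computes the Pearson correlation coefficient between a candidate factor and the load, and ranks the lags of a load series by the magnitude of their autocorrelation. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names (`pearson`, `acf`, `top_lags`), the biased ACF estimator and the top-k selection rule are illustrative choices, and MIC is omitted because it requires a dedicated estimator.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def acf(series, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    denom = s @ s
    values = [1.0] + [(s[k:] @ s[:-k]) / denom for k in range(1, max_lag + 1)]
    return np.array(values)

def top_lags(series, max_lag, k):
    """Return the k lags (>= 1) with the largest |ACF| (assumed selection rule)."""
    r = acf(series, max_lag)
    order = np.argsort(-np.abs(r[1:]))[:k] + 1
    return sorted(int(i) for i in order)

# Synthetic hourly series with a 24 h period (illustrative data only).
t = np.arange(240)
load = np.sin(2 * np.pi * t / 24)
r = acf(load, 30)
selected = top_lags(load, 30, 2)
```

On real load data, the correlation thresholds and the number of retained lags would have to be chosen empirically.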
The various forecasting methods mentioned above are summarized in
Table 1 below.
This paper proposes a two-stage hybrid forecasting model to fully mine the information in non-linear load data, and to integrate the influencing factors into the forecasting model to improve the accuracy of short-term load forecasting. The main contributions of this paper are as follows:
(1) This paper analyses the power load characteristics and influencing factors, and summarizes the function expression of the current load with which to construct a two-stage hybrid forecasting model, which improves the interpretability of the model;
(2) An evaluation criterion for VMD decomposition applicable to the field of time-series forecasting is proposed, and SSA is used to optimize the parameters of VMD under this criterion, which reduces the randomness of setting VMD parameters by artificial experience, reduces the loss of original load data decomposition, and improves the decomposition effect;
(3) The optimized VMD is used to decompose the original load data, and the resulting components are fed into different models: the low-frequency components are forecast with multiple linear regression (MLR) and the high-frequency components with LSTM. The trend load values over different time spans are thereby obtained, completing the first stage of forecasting;
(4) Different from previous models in which the component load forecasting values were directly summed up, this paper makes error corrections to the trend load values in the second stage. PCC and MIC are used to select the relevant factors with a high degree of influence; ACF is used to select the node load values with a high influence on the load values of the nodes to be predicted. The fusion of the forecasting values in the first stage and the selected feature values by feature engineering is used to construct feature vectors, which are input into the fully connected layer for forecasting and complete the error correction in the second stage. The accuracy of the final load forecasting value obtained is effectively improved.
The rest of this paper is organized as follows.
Section 2 presents the analysis of power load characteristics and influencing factors.
Section 3 presents the optimization of variational mode decomposition by the sparrow search algorithm.
Section 4 explores the proposed method.
Section 5 provides an analysis and discussion of the experimental results.
Section 6 concludes this paper.
5. Experiment and Results
The dataset of Place 1 used in this paper is from China and spans the period from 1 January 2013 to 10 January 2015, with a sampling interval of 1 h and 24 sampling points per day, for a total of 17,760 data points. The first 17,736 points of load data are split into a training set (85%) and a validation set (15%), and the load data of 10 January 2015 are used as the test set. The trained model is used to forecast the load values at the 24 time nodes of 10 January 2015, and the performance metrics of the method are analyzed.
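The split described above can be sketched in a few lines of Python. This is an illustration only: it assumes the split is chronological (training points first, then validation) and that the 85% boundary is rounded down, neither of which is stated explicitly; the function name `chronological_split` is hypothetical.

```python
def chronological_split(n_total=17_760, n_test=24, train_frac=0.85):
    """Return (train, validation, test) index ranges for an hourly series."""
    n_fit = n_total - n_test            # 17,736 points before the test day
    n_train = int(n_fit * train_frac)   # rounding down is an assumption
    train = range(0, n_train)
    validation = range(n_train, n_fit)
    test = range(n_fit, n_total)        # the 24 hourly points of the last day
    return train, validation, test

train, validation, test = chronological_split()
print(len(train), len(validation), len(test))
```

Under these assumptions, 740 days of hourly data give 15,075 training points, 2,661 validation points and 24 test points.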
5.1. Evaluation Criteria
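In this paper, the following three items are used as the evaluation criteria for the accuracy of the forecasting model: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE), which are calculated as Equations (27)–(29).
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{27} \]
\[ \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\% \tag{28} \]
\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{29} \]
where \(\hat{y}_i\) is the forecasting load of the model; \(y_i\) is the true value of the load; and \(n\) is the number of predicted time nodes. In this paper the load value is predicted for the next 24 time nodes, so \(n = 24\).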
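For readers who want to reproduce the three criteria, a minimal NumPy sketch is given below. It uses the standard definitions of RMSE, MAPE and MAE; the function names are illustrative, and MAPE is returned in percent under the assumption that no true load value is zero.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the load (e.g. MW)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the load (e.g. MW)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

# Toy values for illustration only, not data from the experiments.
y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
scores = (rmse(y_true, y_pred), mape(y_true, y_pred), mae(y_true, y_pred))
```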
5.2. Data Processing
Differences in magnitude between indicators slow the convergence of neural network training and tend to cause vanishing and exploding gradients. Therefore, in this paper, the data are normalized using Min–Max normalization. The formula is as follows.
\[ x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \]
where \(x^{*}\) is the normalized data, \(x^{*} \in [0, 1]\); \(x\) is the original data; \(x_{\max}\) is the maximum value in the data; and \(x_{\min}\) is the minimum value in the data.
In addition, for discontinuous data such as the time-type factors in this paper, one-hot coding is used.
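The two preprocessing steps can be sketched as follows. This is an illustrative NumPy sketch, not the code used in the experiments: the helper names are hypothetical, and taking the minimum and maximum from a reference set (typically the training data) is a common practice that is assumed here rather than stated in the text.

```python
import numpy as np

def min_max_scale(x, x_min, x_max):
    """Map x to [0, 1] using the given minimum and maximum."""
    x = np.asarray(x, dtype=float)
    return (x - x_min) / (x_max - x_min)

def min_max_inverse(x_scaled, x_min, x_max):
    """Map normalized values back to the original unit (e.g. MW)."""
    return np.asarray(x_scaled, dtype=float) * (x_max - x_min) + x_min

def one_hot(codes, n_classes):
    """One-hot encode integer category codes, e.g. hour of day in 0..23."""
    return np.eye(n_classes)[np.asarray(codes, dtype=int)]

load = np.array([2.0, 4.0, 6.0])               # toy values
scaled = min_max_scale(load, load.min(), load.max())
hours = one_hot([0, 23], n_classes=24)         # two time-type codes
```

The inverse transform is needed to report the errors in MW after forecasting in the normalized space.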
5.3. Multi-Step and Iterative Forecasting Approach
Load forecasting can be classified by the output dimension of the forecasting model into single-step forecasting and multi-step forecasting: single-step forecasting outputs the forecast for only one future time node, whereas multi-step forecasting outputs a multi-dimensional vector containing the forecasts for multiple future time nodes. The purpose of short-term load forecasting in this paper is to obtain the load values for the next 24 moments. Both direct (multi-step) and iterative approaches can be used for this, but multi-step forecasting introduces a lag in the input sequence and hence in the features, which affects forecasting accuracy. Moreover, in the application scenario of the forecasting model, the load values for the next 24 moments are predicted on the previous day. Therefore, an iterative forecasting approach is used: the predicted value at time t is obtained and added to the feature vector for time t + 1 to obtain the predicted value at time t + 1, and so on, until the values for the 24 time points of the next day have been predicted.
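The iterative procedure can be sketched as a simple loop. The sketch below is a simplification under stated assumptions: the feature vector is reduced to a sliding window of the most recent load values, and `predict_one` stands for any trained single-step model; both names are hypothetical and the actual feature vector in this paper contains further features.

```python
import numpy as np

def iterative_forecast(predict_one, history, n_lags, horizon=24):
    """Roll a single-step model forward, feeding each forecast back as input."""
    window = list(history[-n_lags:])          # most recent observed loads
    forecasts = []
    for _ in range(horizon):
        y_hat = float(predict_one(np.asarray(window)))
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]         # drop oldest value, append forecast
    return np.asarray(forecasts)

# Toy single-step "model": last value plus one (illustration only).
toy_model = lambda w: w[-1] + 1.0
out = iterative_forecast(toy_model, history=[1.0, 2.0, 3.0], n_lags=2, horizon=3)
```

Because each forecast becomes an input for the next step, errors can accumulate over the 24 steps, which is one motivation for correcting the first-stage output afterwards.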
5.4. Results
The parameter combination
was obtained by the SSA optimization of VMD. The original load data from Place 1 were decomposed using the optimized VMD parameter combination, and the decomposition effect is shown in
Figure 7. The first panel of
Figure 7 shows the original load data, which exhibit strong random fluctuations. After the decomposition, four IMFs and a residual (Res) are obtained, which reduces the noise of the original sequence, and the fluctuations of the components show a certain regularity. The low- and high-frequency components are distinguished according to the oscillation frequency of each component curve: IMF1 is the low-frequency component (forecast using MLR), and the remaining components and Res are the high-frequency components (forecast using LSTM). This yields the hybrid forecasting model that completes the first stage of forecasting.
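One simple way to quantify the oscillation frequency of a component is to count how often it crosses its own mean. The sketch below illustrates this idea; it is an assumption for illustration, since the text does not state how the oscillation frequency was measured, and the function names and the threshold `max_crossings` are hypothetical.

```python
import numpy as np

def mean_crossings(component):
    """Count sign changes of a component around its mean."""
    c = np.asarray(component, dtype=float)
    signs = np.sign(c - c.mean())
    signs = signs[signs != 0]                 # ignore exact zeros
    return int(np.sum(signs[1:] != signs[:-1]))

def split_by_frequency(components, max_crossings):
    """Label components as low- or high-frequency by their crossing count."""
    low, high = [], []
    for name, series in components.items():
        target = low if mean_crossings(series) <= max_crossings else high
        target.append(name)
    return low, high

# Synthetic components: one slow cycle versus ten fast cycles over 240 points.
t = np.arange(240)
slow = np.sin(2 * np.pi * (t + 0.5) / 240)
fast = np.sin(2 * np.pi * (t + 0.5) / 24)
low, high = split_by_frequency({"IMF1": slow, "IMF2": fast}, max_crossings=5)
```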
This paper uses the training set to train the parameters of the first-stage models. To avoid the information leakage and overfitting that would result from reusing the training set in the second stage, the first-stage models are applied to the validation set to obtain the forecasting values of each component for second-stage learning. These component forecasts are combined with the influencing factors selected by PCC and MIC and the load values at the corresponding moments selected by the ACF to construct the feature vector. The feature vector is input into the fully connected layer for the second stage of forecasting to obtain the final forecasting values. In this paper, the load values for the 24 moments on 10 January 2015 are predicted by the iterative forecasting approach.
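The construction of the second-stage input can be illustrated with a short NumPy sketch. It is a simplified stand-in rather than the authors' network: the fully connected layer is approximated by a single linear layer fitted by least squares, the feature counts (three factors, two lagged loads) are hypothetical, and all data are synthetic.

```python
import numpy as np

def build_second_stage_features(component_forecasts, factors, lagged_loads):
    """Concatenate the three feature groups column-wise, one row per time node."""
    return np.hstack([component_forecasts, factors, lagged_loads])

def fit_linear_layer(X, y):
    """Least-squares fit of y ~ X w + b (stand-in for training a dense layer)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    params, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return params[:-1], params[-1]

def predict_linear_layer(X, w, b):
    return X @ w + b

rng = np.random.default_rng(0)
n = 200
components = rng.normal(size=(n, 5))   # first-stage forecasts: 4 IMFs + Res
factors = rng.normal(size=(n, 3))      # hypothetical PCC/MIC-selected factors
lags = rng.normal(size=(n, 2))         # hypothetical ACF-selected lagged loads
X = build_second_stage_features(components, factors, lags)
y = components.sum(axis=1) + 0.1 * factors[:, 0]   # synthetic target
w, b = fit_linear_layer(X, y)
```

In this synthetic case the fitted weights on the component forecasts are close to one, which corresponds to plain summation; any deviation from that pattern on real data is what the error correction learns.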
5.4.1. Analysis of SVLM–FE Forecasting Results
Several classical forecasting models were chosen for comparison in the experiments of this paper to evaluate the performance of the SVLM–FE hybrid model proposed in this paper, using SVR, MLR, the LSTM network and the GRU network, respectively. Experiments have also been conducted using the SVLM model as a means of comparing the improvement in prediction accuracy before and after error correction in the SVLM–FE model. Their feature vectors were constructed in the same way as the feature vectors in the first stage of the SVLM–FE model, and iterative forecasting was used to predict the load values for 24 moments in the future day.
From the analysis of the data in
Table 5 and
Figure 8, we can conclude that SVLM–FE is the best on all three criteria (RMSE, MAE and MAPE) and SVR is the worst. Compared to the SVLM model, the RMSE of SVLM–FE decreased by 56.277 MW (30.51%), the MAE decreased by 48.347 MW (32.05%), and the MAPE decreased by 32.64%. Compared to the classical LSTM model, SVLM–FE showed a decrease in RMSE of 233.742 MW (64.59%), a decrease in MAE of 165.639 MW (61.77%) and a decrease in MAPE of 59.78%. Compared to the single GRU model, SVLM–FE’s RMSE decreased by 273.383 MW (68.08%), the MAE decreased by 252.030 MW (71.08%) and the MAPE decreased by 70.80%. The indicators for MLR and GRU are relatively similar: compared to MLR, SVLM–FE’s RMSE decreased by 264.848 MW (67.39%), the MAE decreased by 255.257 MW (71.34%) and the MAPE decreased by 71.57%. All percentages are relative reductions with respect to the comparison model. On all three metrics, SVLM–FE achieved higher forecasting accuracy than both the SSA–VMD-based hybrid model SVLM and the single models.
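The paired figures quoted above (an absolute decrease in MW and a relative decrease) are related by a simple calculation, sketched below. The function name `improvement` is illustrative and the example numbers are made up; they are not values from Table 5.

```python
def improvement(baseline_error, proposed_error):
    """Return (absolute decrease, relative decrease in percent) of an error metric."""
    absolute = baseline_error - proposed_error
    relative = 100.0 * absolute / baseline_error
    return absolute, relative

# Hypothetical errors: a baseline of 200 MW reduced to 150 MW.
abs_drop, rel_drop = improvement(200.0, 150.0)
```

The same relation explains why the absolute decreases differ between comparison models while the error of the proposed model stays the same.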
In terms of computing time, MLR takes the shortest time of 1.621 s, and has higher forecasting accuracy than SVR, so MLR is used to forecast the low-frequency components in the method proposed in this paper. LSTM requires a longer computing time than GRU, but LSTM has higher forecasting accuracy, so LSTM is used to forecast the high-frequency components in the method proposed in this paper. The forecasting method SVLM–FE proposed in this paper has the longest computing time of 276.158 s, which is due to the fact that the SSA–VMD decomposes the original load data into multiple components, and each component is input into a separate model for forecasting, which increases the computing time. At the same time, the SVLM–FE adds the error correction work of the second fully connected layer to SVLM, so SVLM–FE takes more time than SVLM. Considering that, in practical applications, power plants and grids need more accurate short-term load forecasting to assist the dispatching work of various departments, the research purpose of this paper is to improve the accuracy of short-term power load forecasting, and the computing time of the proposed SVLM–FE model in this paper can meet the time demand in practical applications. Moreover, with the continuous improvement of hardware computing power, the computing time of the proposed model SVLM–FE will be further reduced.
To verify the ability of the SVLM–FE model to fit the load curve, a load curve plot was drawn, as shown in
Figure 9.
In
Figure 9 it can be seen that the forecasting results of the SVLM–FE model are the closest to the true load values, and that the model successfully predicts the key points throughout the day. At 11 am, when the peak load is reached, the SVLM–FE forecast is the closest to the true value, and at 1 pm, when the load peaks, and at 4 pm it has the lowest error. SVLM–FE accurately tracks the load path, with small errors at the peaks and troughs; SVLM is also able to track the load path over this period, but with larger errors than SVLM–FE. The other models did not judge the direction of this rapidly changing load well, and their flatter curves did not fit the small peaks and troughs.
Therefore, in the process of the SSA search for the key parameter set of VMD, the decomposition evaluation criteria proposed in this paper are used as the fitness function, which can effectively reduce the volatility of the load data and establish the foundation for the construction of a highly accurate short-term load forecasting model. At the same time, the component load forecasting values obtained in the first stage and the feature values obtained by feature selection are used to construct a feature vector, which is fed into the fully connected layer for forecasting. This completes the error correction for the first-stage forecasting, and the method is experimentally proven to be very effective.
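The role of the decomposition criterion as a fitness function can be sketched as follows. This is an assumption-laden illustration: it takes the criterion to be the mean absolute error between the original series and the sum of the extracted modes, and it treats the VMD routine as a hypothetical callable `decompose(signal, K, alpha)` with mode number `K` and penalty factor `alpha`; the exact criterion and parameter set used in this paper may differ.

```python
import numpy as np

def decomposition_loss(original, modes):
    """MAE between the original series and the sum of its modes (assumed criterion)."""
    original = np.asarray(original, dtype=float)
    reconstruction = np.sum(np.asarray(modes, dtype=float), axis=0)
    return float(np.mean(np.abs(original - reconstruction)))

def fitness(params, signal, decompose):
    """Value a search algorithm such as SSA would minimize over (K, alpha)."""
    K, alpha = params
    return decomposition_loss(signal, decompose(signal, int(K), alpha))

# Toy check with a known two-mode signal (illustration only).
t = np.arange(48)
a = np.sin(2 * np.pi * t / 24)
b = 0.5 * np.cos(2 * np.pi * t / 6)
signal = a + b
full = decomposition_loss(signal, [a, b])      # nothing lost
partial = decomposition_loss(signal, [a])      # mode b is lost
```

A lower value indicates that less of the original signal is lost in the decomposition.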
5.4.2. Analysis of SVGM–FE Forecasting Results
In order to further verify the effectiveness of SSA–VMD for load data decomposition, the effectiveness of constructing hybrid forecasting models based on SSA–VMD, and the effectiveness of error correction in two-stage forecasting for load component forecast values, this paper builds the model SSA–VMD–GRU–MLR (SVGM) and a two-stage forecasting model SVGM–FE. The newly constructed models were used to forecast the load values on 10 January 2015.
The statistical analysis in
Table 6 and
Figure 10 leads to the following conclusions. The hybrid forecasting models SVGM and SVGM–FE, constructed after SSA–VMD decomposition, have significantly improved forecasting accuracy compared to the GRU forecasting model and the MLR forecasting model. Compared to the GRU model, the RMSE of the SVGM model decreased by 146.665 MW (36.52%), the MAE decreased by 159.268 MW (44.92%), and the MAPE decreased by 51.61%. Compared to the MLR model, the RMSE of the SVGM model decreased by 138.120 MW (35.14%), the MAE decreased by 162.495 MW (45.42%), and the MAPE decreased by 52.88%. The experiments demonstrate the effectiveness of constructing a hybrid forecasting model based on SSA–VMD. Compared with the SVGM model, the RMSE of the SVGM–FE model decreased by 77.099 MW (30.25%), the MAE decreased by 60.357 MW (30.91%), and the MAPE decreased by 23.56%. In terms of computing time, SVGM–FE has the longest computing time, 204.812 s, which still meets the time requirements of practical applications.
Table 5 and
Table 6 show that the forecasting accuracy of SVLM–FE is the highest, followed by SVGM–FE, then SVLM, and SVGM ranks fourth, thus further validating the effectiveness of the two-stage short-term load forecasting method constructed based on SSA–VMD. Because SVLM–FE performs better than SVGM–FE, LSTM is used for the high-frequency components in the forecasting method proposed in this paper. In terms of computing time, although the computing time of SVLM–FE is 71.346 s longer than that of SVGM–FE, SVLM–FE has higher forecasting accuracy, and its computing time can meet the time requirements in realistic scenarios, which supports the choice of SVLM–FE as the proposed forecasting method.
As shown in
Figure 11, the forecasting values of the SVGM model are closer to the true values than the single GRU and MLR forecasting models; after the second stage of error correction, the errors in the forecasting values obtained by SVGM–FE are further reduced, and the best performance is achieved at both peak and trough load values, including at small peaks and troughs, where the SVGM–FE model forecasts values closest to the true values. SVGM–FE achieves a better forecasting performance for the details of the trend of load values over the next 24 moments. These experimental results further demonstrate the validity of the hybrid SSA–VMD-based model and the superiority of the second stage of forecasting with correction for error. Moreover, the forecasting accuracy of SVLM–FE is higher than that of SVGM–FE, so the final model proposed in this paper is SVLM–FE.
5.5. Experiment in Place 2
To further validate the accuracy and stability of the forecasting method SVLM–FE proposed in this paper, experiments were conducted on another dataset and the forecasting results were analyzed. This dataset is from Place 2 in China and spans the period from 1 January 2013 to 10 January 2015, with a sampling interval of 1 h, 24 sampling points per day and a total of 17,760 data points. As before, the first 17,736 points of load data were split into a training set (85%) and a validation set (15%), and the load data of 10 January 2015 were used as the test set. The trained model was used to predict the load values at the 24 time points on 10 January 2015, again using single-step forecasting and iterative forecasting. The experimental results of the models on this dataset are analyzed below.
From the analysis of the evaluation criteria in
Table 7 and
Figure 12, we can draw the following conclusions. The three indexes RMSE, MAE and MAPE of SVLM–FE are the best, and the three indexes of SVR are the worst. Compared to the SVLM model, the RMSE of SVLM–FE decreased by 72.845 MW (39.49%), the MAE decreased by 62.645 MW (40.43%), and the MAPE decreased by 35.39%. Compared to the classical LSTM model, SVLM–FE showed a decrease in RMSE of 259.448 MW (69.92%), a decrease in MAE of 186.862 MW (66.94%) and a decrease in MAPE of 62.37%. Compared to the single GRU model, SVLM–FE saw a decrease in RMSE of 264.594 MW (70.36%), a decrease in MAE of 207.105 MW (69.17%) and a decrease in MAPE of 66.03%. Compared to the MLR model, SVLM–FE’s RMSE decreased by 287.449 MW (72.03%), its MAE decreased by 273.503 MW (74.77%) and its MAPE decreased by 74.03%. In terms of computing time, SVLM–FE still requires the longest time, 273.897 s, which nevertheless meets the time requirements of practical applications.
In
Figure 13 it can be seen that the forecasting result of the SVLM–FE model is the closest to the real load value, and accurately predicts the key nodes in each moment of the day. The SVLM–FE model predicts the peak and trough values of the load, as well as the small peaks and troughs, with high accuracy. SVLM–FE accurately fits the load curve of the day, and gives a good judgment on the trend of the load and many details. Especially after 13:00 on the day, the SVLM–FE prediction model made an accurate judgment on the rapid change of load, while other models did not make an accurate judgment in the face of this rapid change.
As can be seen in
Table 8, and
Figure 14 and
Figure 15, the SVGM–FE model has the highest prediction accuracy and the best fit to the load curve in the comparison of the four prediction models, SVGM–FE, SVGM, GRU and MLR, which further demonstrates the effectiveness of constructing a short-term load forecasting model based on SSA–VMD and feature selection. A comparison with
Table 7, and
Figure 12 and
Figure 13, shows that the forecasting performance of SVLM–FE is better than that of SVGM–FE.
In summary, in the experiments on the dataset from Place 2, the forecasting method SVLM–FE proposed in this paper achieved the best forecasting results, which means it has high accuracy and stability.
6. Conclusions
Accurate short-term power load forecasting is of great importance to the reliable and safe operation of power systems. Usually, the power load undergoes random fluctuations, and there are many complex factors influencing the load. In order to improve the accuracy of short-term load forecasting, assist in a higher level of production scheduling and planning between each department of the power system, and at the same time achieve energy saving and emission reductions, this paper proposes a short-term load forecasting model based on SSA–VMD and feature selection. The following conclusions were obtained experimentally:
(1) In view of the non-stationary characteristics of the load, the VMD is used to decompose the original load into multiple components. It is proposed to use the mean absolute error as the decomposition quality evaluation criterion, which can be extended to apply to the decomposition problem of data in time series forecasting. In order to solve the problem whereby VMD decomposition relies on the artificial setting of key parameter sets, this paper adopts SSA for VMD optimization, and uses the decomposition evaluation criterion proposed in this paper as the fitness function to construct an SSA–VMD decomposition optimization algorithm. The algorithm reduces the signal loss in the decomposition process and improves the quality of VMD decomposition, thus reducing load volatility, more effectively mining the deep time series features in load data, and laying the foundation for building a high-precision forecasting model;
(2) After decomposing the original load data via SSA–VMD, the different frequency components were forecasted separately using different methods. The SVLM model constructed in this paper shows a significant decrease in RMSE, MAE and MAPE compared to the single LSTM and MLR models. It proves the effectiveness of constructing a hybrid forecasting model based on SSA–VMD;
(3) The SVLM–FE model constructed in this paper is based on a two-stage hybrid forecasting idea. The model takes full account of the influence of the load trend on the load value of the node to be predicted and the influence of related factors on the load. In the first stage, only the trend influence of the power load is considered, and the forecasting value of each component is predicted. In the second stage, the load component forecasts obtained in the first stage, the influencing factors selected using PCC and MIC, and the load values of the time nodes selected using ACF are fused into the feature vector, which is fed into the fully connected layer to perform error correction and obtain the final forecasting values. Compared with the SVLM forecasting model without error correction, the forecasting accuracy of SVLM–FE was further improved, with RMSE, MAE and MAPE decreasing by 30.51%, 32.05% and 32.64%, respectively. Moreover, SVLM–FE performs well at both peaks and troughs of the load, and responds promptly and accurately to the small peaks and troughs that arise from high-frequency changes in load. This shows that the two-stage hybrid forecasting model is effective in error correction. Additionally, the SVGM–FE model built on the same idea achieves the second-best forecasting performance.
The forecasting method SVLM–FE proposed in this paper has a high forecasting accuracy. It is worth noting that some of the hyperparameters in the model still need to be set manually, such as those of the LSTM networks, which relies heavily on tuning experience. In future work, a swarm intelligence optimization algorithm could be used to determine some of the hyperparameters in the model.