Short-Term Electric Load Forecasting Based on Variational Mode Decomposition and Grey Wolf Optimization

: Short-term electric load forecasting plays a signiﬁcant role in the safe and stable operation of the power system and power market transactions. In recent years, with the development of new energy sources, more and more sources have been integrated into the grid. This has posed a serious challenge to short-term electric load forecasting. Focusing on load series with non-linear and time-varying characteristics, an approach to short-term electric load forecasting using a “decomposition and ensemble” framework is proposed in this paper. The method is veriﬁed using hourly load data from Oslo and the surrounding areas of Norway. First, the load series is decomposed into ﬁve components by variational mode decomposition (VMD). Second, a support vector regression (SVR) forecasting model is established for the ﬁve components to predict the electric load components, and the grey wolf optimization (GWO) algorithm is used to optimize the cost and gamma parameters of SVR. Finally, the predicted values of the ﬁve components are superimposed to obtain the ﬁnal electric load forecasting results. In this paper, the proposed method is compared with GWO-SVR without modal decomposition and using empirical mode decomposition (EMD) to test the impact of VMD on prediction. This paper also compares the proposed method with the SVR model using VMD and other optimization algorithms. The four evaluation indexes of the proposed method are optimal: MAE is 71.65 MW, MAPE is 1.41%, MSE is 10,461.32, and R 2 is 0.9834. This indicates that the proposed method has a good application prospect for short-term electric load forecasting.


Introduction
As a result of increasing air pollution and the threat of global warming, there is a growing demand for the development of new technology for energy generation [1]. With the development of electrical energy conversion technology [2,3], a large number of new energy sources are being integrated into the grid. Hybrid vehicles are also developing rapidly [4]. These developments are leading to changes in the grid pattern and electricity consumption structure. In next-generation smart cities, new energy technologies, IoT technologies, and artificial intelligence technologies are combined to provide better energy services for customers and ensure sustainability [5].
The electric energy generated by new energy sources is affected by many factors, including light, wind speed, and so on. Power generation and power frequency are uncertain, which brings great challenges to load balance, load management, and stable operation [6] of the grid. Accurate prediction of load demand has important significance for the stable operation of the power system and planning of the grid [7,8]. According to the length of the prediction cycle, electric load forecasting can be divided into three categories: long-term, medium-term, and short-term electric load forecasting. Short-term electric load forecasting has a great impact on price determination, power dispatch, power system economic operation, and reliability and thus is a crucial type of electric load forecasting.

Related Work
Traditional methods include the autoregressive integrated moving average model (ARIMA) [14,15], linear regression model [16,17], multiple linear regression (MLR) model [18,19], and exponential smoothing model [20,21]. The traditional methods work well in typical cases, but the effect is poor on weekends, holidays, and other special time nodes. On the other hand, the traditional methods need a large amount of historical data and a long training time. It is difficult for traditional methods to accurately predict shortterm electric load when data is small or missing. A large number of studies have shown that the load series is non-linear and time-varying. Traditional linear regression methods have problems such as inaccuracy and the need for a large amount of historical data.
Machine learning methods have shown good performance in finding the potential rules and complex characteristics of data, so they are considered by many researchers to have promising application in the field of energy prediction. Manohar et al. [22] used the back propagation (BP) neural network to predict short-term load and achieved high accuracy. Jigoria-Oprea et al. [23] obtained good results when using the artificial neural network (ANN) to predict the load with information on working days, weekends, and holidays. Hong [24] used SVR in combination with the chaotic artificial bee colony algorithm (CABC) for electric load forecasting and achieved high prediction accuracy. Aly et al. [25] used ANN, wavelet neural network (WNN), and the Kalman filter (KF) to build load models to predict short-term load. In addition, deep learning algorithms have been widely used in the field of electric load forecasting, including the deep neural network (DNN) [26,27], long-and short-term memory network (LSTM) [28,29], and convolution neural network (CNN) [30,31]. Guo [32] transformed the load series into an image and extracted the features of different scales through CNN, then used LSTM to predict the load. However, these methods require a large amount of data, and the model input is more complex, which requires a variety of influencing factor data.
As a hot research topic in recent years, the "decomposition and integration" method involves decomposition of the original series into several components, the processing of each component, and finally integration of the prediction results of each component to obtain the final result. At present, this method has been widely used in wind speed prediction [33,34], fault diagnosis [35,36], and biological signal processing [37,38]. It has also been applied in the field of electric load forecasting, Li et al. [39] decomposed the load series by wavelet transform and then used the improved artificial bee colony optimization extreme learning machine (ELM) to predict the load. Li et al. [40] used wavelet transform (WT) combined with extreme learning machine and partial least squares regression to predict load series. Ding [41] used wavelet transform combined with feature selection for data preprocessing and then used relevance vector machine (RVM) to predict the load. Kassa et al. [42] used EMD to decompose the raw load data and then predicted the individual components using a prediction model optimized by the particle swarm optimization algorithm (PSO). By using the genetic optimization algorithm (GA) [43,44], particle swarm optimization algorithm [45,46], artificial bee colony algorithm (ABC) [47,48], and grey wolf optimization algorithm [49], the parameters of the prediction model can be optimized to improve the accuracy of the model. However, the wavelet transform is generally effective in decomposing nonlinear series, and EMD is prone to modal mixing when performing the decomposition. Table 1 shows a comparison of related works. The wavelet transform is generally effective in decomposing nonlinear series, and the EMD is prone to modal blending Through the above related work, we can conclude that using the idea of "decomposition and integration" for forecasting load series can help to reduce the complexity of the model input and the dependence on historical data; on the other hand, using optimization algorithms to optimize the parameters of the forecasting model can help to further improve the forecasting accuracy. Therefore, we need to use an appropriate method to decompose the series and then combine it with an optimization algorithm to predict each component.

Methods
In this part, the methods adopted in the VMD-GWO-SVR are explained.

Variational Mode Decomposition (VMD)
VMD [9] is a non-recursive adaptive decomposition method proposed by Dragomiretskiy et al. This method realizes the optimal mode decomposition by calculating the optimal solution of the variational mode and finally decomposes the original signal into the sum of the intrinsic mode function (IMF) with different central frequencies. Therefore, VMD can effectively avoid modal aliasing and the endpoint effect.
The goal of VMD is to decompose the input signal into a number of sub-signals, namely, a number of intrinsic mode functions (IMFs). The intrinsic mode function can be regarded as several AM-FM components u k (t), which have limited bandwidth and central frequency.
where A k (t) is the instantaneous amplitude of u k (t) and ω k (t) is the instantaneous power of u k (t).
Equation (1) is the expression of intrinsic mode function. It can be seen from this equation that the intrinsic mode function is a modulation function. At this time, the Hilbert transform is used for u k (t) to obtain the unilateral spectrum after the intrinsic mode function analysis: By estimating the exponential function e −jω k of the center frequency ω k , the spectrum of each intrinsic mode function from Equation (1) can be modulated to the fundamental frequency band, namely In this case, the variational mode decomposition becomes the problem of constructing and solving constraints, that is, decomposing the original signal into several intrinsic mode function (IMF) components. This constraint condition can be expressed as where {u k } = {u 1 , u 2 , . . . , u k } is the decomposed eigenfunction component, {ω k } = {ω 1 , ω 2 , . . . , ω k } is the center frequency of the corresponding component, f is the input signal, and δ(t) is the unit pulse function. The constrained problem is transformed into a non-constrained problem by introducing the quadratic penalty term and Lagrange multiplier operator, and the optimal solution of the model is calculated. The results are as follows: where α is a quadratic penalty factor and λ(t) is the Lagrange factor. By using the Alternate Direction Method of Multipliers (ADMM), the values of u n+1 k , ω n+1 k and λ n+1 are updated, and the extreme points of the augmented Lagrange function are calculated to decompose the original signal into k intrinsic mode function.
The updated function of u n+1 k is as follows: The equation is transformed from time domain to frequency domain by equidistant Fourier transform. The frequency domain iterative equation is as follows: Similarly, the central frequency ω n+1 k can be converted to the frequency domain, and the iterative equation is λ can be updated by Equation (10): The termination condition of parameter iteration is set as the iteration accuracy ε > 0. When the iteration satisfies Equation (11) below, the iteration is terminated, and K IMF components are obtained.

GWO-SVR
GWO [41] is a swarm intelligence optimization algorithm proposed by Mirjalili et al. in 2014. It is derived mainly by imitating the predatory behavior of the grey wolf population. The optimization is realized by tracking, enclosing, chasing, and attacking the prey. The GWO algorithm has the advantages of a simple principle, fewer parameters to be adjusted, easy implementation, strong global search ability, etc. SVR [41] is a branch of support vector machine (SVM) learning and is a machine learning algorithm based on statistical principles. In the process of training the model, the grey wolf optimization algorithm is used to optimize the penalty coefficient cost and kernel function parameter gamma of support vector regression, which can improve the prediction accuracy of the model. The Figure 1 is the algorithm flow chart of GWO-SVR. The steps for GWO to optimize SVR parameters are as follows: Step 1: Set the parameters of the GWO optimization algorithm, including initial population size M, the maximum number of iterations N, and the search range of cost parameters and gamma parameters.
Step 2: Initialize the position of each wolf in the wolf group; the position of the grey wolf is determined by the values of the cost parameter and gamma parameter. Set synergy parameters a, A, and C, where |A| > 1 forces wolf and prey separation, 0 < |C| <2, and a are linearly decreased from 2 to 0 over the course of iterations.
Step 3: Calculate the fitness of the corresponding parameters of each wolf. According to the numerical order of fitness, the grey wolf is divided into four classes: α, β, δ, and ω.
Step 4: According to Equations (12)- (17), update the position of each grey wolf in the wolf group. Compare the fitness of the corresponding parameters in the new position of the grey wolf with that before the update to determine whether to replace the fitness.
Step 5: Update a, A, and C values.
Step 6: Calculate the fitness of the current parameters for each wolf.
Step 7: Determine whether the current iteration number is n < N; if not, output the global optimal value. If so, return to Step 4.
Step 8: Output the position of α wolf. The corresponding position of α wolf is the optimal cost parameter and gamma parameter.
Step 9: Establish the SVR regression model using the optimal cost parameters and gamma parameters.  Figure 2 is the framework of the proposed method. First, the load series is decomposed into several high-frequency components and low-frequency components by VMD, and then each component is input into the GWO-SVR model for prediction. Finally, the predicted value of each component is superimposed to obtain the final prediction result.

Model Evaluation Indicators
In this study, the evaluation index is used to evaluate the performance of the prediction model. These indicators are mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), and determination coefficient (R 2 ). Adjusted where L t andL t are the actual and forecasting values of the load at time t, M is the total number of data used, and p is the number of features.

Data Sets
To verify the accuracy of the proposed model, we use the hourly load data of the grid in Oslo and its surrounding areas in 2019 [50] as the experimental data. It is found that 96 groups of load values are missing in February, and one group of load values is missing in March. Therefore, this paper uses the expectation-maximization method to fill the missing data. Through analysis of the filled data, it can be seen that the electricity consumption in January, February, and December is larger, and the electricity consumption in June, July, and August is smaller, as shown in Table 2. In this paper, the experimental data is divided into the training set and the prediction set according to the ratio of 4:1. The first 7008 h of data are used as the training set, and the rest of the data are used as the prediction set. We divide the load data into groups every four hours and forecast the load value of the next hour through the load data of the first three hours. Therefore, the number of features in this paper is three.
Combined with Figure 3, we can see that the load value shows a downward trend from January to August, reaching the lowest electricity consumption in August, and the load value shows an upward trend from August to December. Through the variance and standard deviation, we find that the January, February, April, and December load value fluctuations are larger, whereas the June, July, August, and September load fluctuations are smaller. Experiments were conducted on 64-bit Windows 10 using MATLAB R2018a with an i7-7700hq CPU and a GTX-1050 graphics card.

Variational Mode Decomposition
The VMD algorithm needs to set the number of decomposition modes K in advance. In this paper, the K value is determined by observing the central frequency of each mode. If the central frequency is similar to the modal component, it is considered that VMD has overdecomposition. Figure 4 shows the decomposition results of three cases with K values of 5, 6, and 7. Table 3 shows the modal center frequencies of different modal decompositions. The results of modal decomposition show that when the number of decomposed modes is 6, the waveforms of IMF5 and IMF6 are similar and the center frequencies are 0.327434 and 0.424177, respectively. When K = 7, the waveforms of IMF4, IMF5, IMF6, and IMF7 are similar, and the central frequencies are 0.2042, 0.2602, 0.3380, and 0.4258, respectively. Therefore, it can be determined that over-decomposition occurs when K > 5, so K = 5.  It can be seen from Figure 4 that the power consumption of IMF1 after decomposition is the smallest in June, July, and August and the largest in January, February, and December, which is in line with the law before the decomposition of load data. On the other hand, the load data can be regarded as composed of several high-frequency components and low-frequency components. The load has relatively complex data characteristics, and the load characteristics at different scales can be better displayed by modal decomposition.

Influence of Each Modal Component on Electric Load Forecasting Results
In order to further explore the influence of the prediction results for each component, we summarize the prediction effect of GWO and artificial bee colony algorithm optimized SVR models for five components after VMD decomposition, as shown in Table 4. We can see that the prediction effect of GWO on IMF1 and IMF2 is almost the same as that of ABC. Among the five indicators predicted by IMF3, MSE and MAPE of ABC are better than those of GWO; R 2 and Adjusted R 2 are the same as that of GWO; and MAE is slightly worse than that of GWO. In the prediction of IMF4 by GWO, MAE, and MAPE were better than ABC; MSE, R 2 , and Adjusted R 2 were slightly worse than ABC. For the prediction results of IMF5, the MSE and MAPE of GWO-SVR and ABC-SVR are the same, but the MAE, R 2 , and Adjusted R 2 of GWO-SVR are better than ABC-SVR. Thus, we can conclude that GWO-SVR and ABC-SVR have similar prediction performance in low-frequency components, but GWO-SVR has higher prediction accuracy in high-frequency components. Through five indexes we can see that the difficulty of the prediction increases with the increase of component frequency. To better compare the performance of the two models, we also compare the training time and predicting time of the two models. As shown in Table 5  As shown in Table 6, we can see that the low-frequency component often has a larger value, which has a greater impact on the accuracy of electric load forecasting results.

Comparison of GWO-SVR Combined with Different Modal Decomposition Methods
In order to compare the influence of different modal decomposition methods and to determine whether to use the modal decomposition method for electric load forecasting, the data processed by EMD and the data without modal decomposition are compared with the data processed by variational mode decomposition. The regression algorithm adopts the GWO-SVR algorithm. The initial population number for GWO optimization is set to 30, and the optimization range of the cost parameter and gamma parameter is set to [0.01, 100]. The Table 7 shows the GWO-SVR prediction results under different modal decompositions. The training time in the table is the sum of the training time of the model for the five components, and the predicting time is the sum of the predicting times for the five components. Through the analysis, we can see that the prediction results for the MAE, MAPE, and MSE of GWO-SVR with mode decomposition are reduced compared with those without mode decomposition, and the determination coefficient R 2 is increased from 0.9643 to 0.9776 and 0.9834, which shows that the load series decomposition with mode decomposition is helpful to improve the accuracy of electric load forecasting. Through the comparison between VMD decomposition and EMD decomposition, it can be found that MAE decreases by 0.9729, MAPE decreases by 0.57%, MSE decreases by 3684.72, and R 2 increases by 0.0058. It can be seen that VMD has a better effect than EMD in the field of electric load forecasting.
On the other hand, we can see that the training time of the VMD-GWO-SVR model is significantly shorter than the other two algorithms by comparing the predicting time and testing time of the three models. The predicting time is 0.024 s longer than that of GWO-SVR. Considering the prediction accuracy and computation time, we can determine that the prediction performance of VMD-GWO-SVR is better than that of EMD-GWO-SVR and GWO-SVR.
It can be seen from Figure 5 that VMD-GWO-SVR has a good prediction effect. The trend of the predicted value and the actual value of this model is the same (R 2 is 0.9834), and the prediction error is small (MAE is 71.6453, MAPE is 0.1413, and MSE is 10461.32). Therefore, this model has high prediction accuracy.

Comparison of Variational Modal Decomposition Combined with Different Optimization Algorithms
In order to verify the accuracy of the proposed method in the field of short-term electric load forecasting, this paper not only makes a longitudinal comparison with the prediction results of the SVR without the optimization algorithm but also compares it with the prediction results of the SVR algorithm using the GA, PSO and ABC. The initial population of the genetic optimization algorithm, particle swarm optimization algorithm, and artificial bee colony optimization algorithm is set to 30, the maximum number of iterations is set to 100, the search range of cost parameters and gamma parameters is set to [0.01, 100], and the load value is normalized to [−1, 1]. The above work ensures the unity of initial conditions. It can be seen that the GWO-SVR and SVR prediction results have been improved under the premise of using VMD for modal decomposition, which proves that the use of the modal decomposition method is helpful to improve the accuracy of electric load forecasting. By comparing the prediction results of GWO use for optimization of SVR parameters, we find that the prediction accuracy of the SVR model optimized by GWO under the premise of modal decomposition is significantly improved, and MAE is reduced by 62.5705 compared with VMD-SVR, which is only 53.4% of VMD-SVR. MAPE decreases by 0.01242 compared with VMD-SVR, which is only 53.2% of VMD-SVR. MSE decreases by 17,347.3592 compared with VMD-SVR, which is only 37.7% of VMD-SVR. R 2 increases by 0.027. Based on the above analysis, we can draw the conclusion that parameter optimization of the SVR model using an optimization algorithm is helpful to improve the prediction accuracy.
The training time in specified in Table 8 is the sum of the training times of the models for the five components, and the predicting time is the sum of the predicting times for the five components. By comparing the predicting time and training time of SVR and VMD-SVR and comparing the predicting time and training time of GWO-SVR and VMD-GWO-SVR, we can see that using VMD to decompose the original load series not only improves the prediction accuracy of the model but also reduces the computation time of the model. Among the four models mentioned above, VMD-GWO-SVR has the highest prediction accuracy, and the model training time is longer than that of VMD-SVR, but the three indexes of MAE, MAPE, and MSE have more obvious advantages than VMD-SVR. Considering the prediction time and accuracy, VMD-GWO-SVR is more suitable for short-term electric load forecasting.  Table 9 shows the prediction results of different optimization algorithms combined with SVR after VMD modal decomposition. It can be seen that the prediction accuracies of ABC-SVR and GWO-SVR are relatively close. The prediction accuracy of PSO-SVR is lower than that of ABC-SVR and GWO-SVR, and the prediction accuracy of GA-SVR is the worst. Further comparison of the four evaluation indexes of ABC-SVR and GWO-SVR shows that MAE, MAPE, and MSE of GWO-SVR are smaller, indicating that the predicted results are closer to the actual load value than ABC-SVR. The R 2 of 0.9834 is greater than 0.9767 of ABC-SVR, indicating that the trend of GWO-SVR prediction results is closer to the trend of actual load value than that of ABC-SVR. In summary, we can conclude that GWO-SVR has higher accuracy in the field of electric load forecasting than SVR optimized by other optimization algorithms, and it can also better predict the trend of actual load value change.

Conclusions
In this paper, a prediction model based on the "decomposition and integration" method combined with VMD and GWO-SVR is proposed. First, the load series is decomposed by VMD, and the optimal modal number K = 5 is selected by observing the central frequency of the component combined with the component waveform. Then, the five decomposed components are input into the GWO-SVR model for parameter optimization. Finally, the five prediction results are superimposed to obtain the final prediction results. In this paper, the proposed method is compared with other prediction methods. The 2019 hourly load data of the first landmark area of Norway, Oslo, and its surrounding areas are used. The conclusions are as follows: (1) Modal decomposition of the load series can effectively separate the complex information contained in the original series and then predict each component separately, which helps to improve the prediction accuracy of the model and the prediction ability of the load value trend. Regarding the load data decomposition method, VMD decomposition can better separate the feature information from the original series into different components. In addition, the decomposition of the load series using VMD can also effectively reduce the model computation time.
(2) Compared with other optimization algorithms after VMD processing, the GWO-SVR algorithm is the best in all four indicators. Compared with ABC's five component prediction results, GWO's prediction results in IMF1-IMF3 are almost the same, but it has better prediction effect on IMF4 and IMF5, indicating that GWO has better accuracy in high-frequency component prediction. On the other hand, the computation time of VMD-GWO-SVR for all five components is shorter than that of the corresponding components of VMD-ABC-SVR.
(3) This method has high accuracy in electric load forecasting and can better predict the trend of load change, which indicates that this method has good application prospect for short-term electric load forecasting.
In future work, we will investigate the delay between time series and apply other methods to deal with the high-frequency components of the modal decomposition in order to further improve the accuracy of the model.

Data Availability Statement:
Restrictions apply to the availability of these data. Data was obtained from ENTSO-E and is available at https://transparency.entsoe.eu/dashboard/show (accessed on 17 October 2020).

Acknowledgments:
We would like to thank the ENTSO-E for making the data available.

Conflicts of Interest:
The authors declare no conflict of interest.