A Hybrid System Based on LSTM for Short-Term Power Load Forecasting

Abstract: As the basic guarantee for the reliability and economic operations of state grid corporations, power load prediction plays a vital role in power system management. To achieve the highest possible prediction accuracy, many scholars have been committed to building reliable load forecasting models. However, most studies ignore the necessity and importance of data preprocessing strategies, which may lead to poor prediction performance. Thus, to overcome the limitations in previous studies and further strengthen prediction performance, a novel short-term power load prediction system, VMD-BEGA-LSTM (VLG), integrating a data pretreatment strategy, advanced optimization technique, and deep learning structure, is developed in this paper. The prediction capability of the new system is evaluated through simulation experiments that employ the real power data of Queensland, New South Wales, and South Australia. The experimental results indicate that the developed system is significantly better than other comparative systems and shows excellent application potential.


Introduction
With rapid economic development, power enterprises continue to expand their construction scales, and the corresponding power grid structures and operation modes are gradually becoming diversified [1]. Because electricity is a special type of energy that cannot be stored in large quantities, the power generation capacity of power generation enterprises and the power supply capacity of power supply companies must remain in dynamic balance; otherwise, the lives of residents and the production of enterprises will be affected, potentially endangering the security and availability of the whole power grid. Accurate power load prediction provides an important guarantee that power supply and demand remain in a stable state, and it yields significant economic benefits. It is estimated that every 1% increase in the accuracy of power consumption forecasting will save millions of dollars in operating costs [2]. Improving the performance of electrical load prediction will not only provide a solid foundation for the smooth operation of the power grid but also provide theoretical support for power supply and dispatching plans.
The power load forecast is closely related to the dispatch and normal operations of national electricity consumption, which affects the lives and production activities of residents and the country overall [3]. The current main research work focuses on ultra-short-term and short-term power load prediction, which are hotspots that will allow academia and power companies to dynamically adjust their power generation plans and trading plans in the market environment [4]. This article mainly studies short-term power load forecasting.
Based on the principles and structures of the research methods, power load forecasting research can be divided into three categories: (i) physical prediction methods, (ii) statistical prediction methods, and (iii) machine learning methods. A physical prediction model is established by combining physical characteristics with historical power load data. Most such models assume that the relationship between power load data and the related physical information remains valid in the future, and the power load is predicted through intuitive analysis [5]. The main methods are the unit consumption method, the load density method, and the elastic coefficient method [6]. For example, Yang et al. [7] predicted the power load distribution by calculating the load density and combining it with the characteristics of power consumption in the area. However, physical prediction methods require a great deal of observation data, which inevitably consumes computing resources that are expensive and difficult to obtain. Moreover, these physical methods are more appropriate for long-term than for short-term power load prediction. Statistical methods are better suited to short-term power load prediction. In recent years, traditional learning methods and traditional statistical systems have been broadly used for predicting short-term power loads. Considering previous time-series data of power loads, Cui et al. [8] developed a new load forecasting model by combining the grey linear regression model with the Markov chain, which overcomes the shortcoming of the traditional grey model of ignoring linear factors. Commonly used traditional statistical systems include autoregressive (AR) and autoregressive moving average (ARMA) models [9]. Song et al. [10] proposed a non-parametric hybrid model, and their results show that it is generally better than other models. However, this method also has some shortcomings. For example, its accuracy in multi-step prediction is low, and the time series prediction methods currently in wide use cannot take into account the influence of meteorological factors [11]. Although some researchers have proposed methods to increase the adaptability of time series forecasting to meteorological changes, such methods still use the ARIMA model, which has insufficient explanatory power and cannot fundamentally improve this adaptability [12].
In the 1990s, computers gradually entered all walks of life, and power load prediction technology was created in large quantities. Artificial neural network models are becoming increasingly popular for power load prediction. The artificial neural network (ANN) model has very strong nonlinear modeling capabilities and is a data-driven nonlinear adaptive method [13]. At the present time, the main algorithms for artificial neural networks are the back propagation neural network (abbreviated as BP) and Elman neural network (abbreviated as Elman), which can approximate any nonlinear function without knowing the relationship between the predictive model and the data [14]. Moreover, the support vector machine (SVM) originally employed by Vapnik and others of Bell Labs is extensively used in the domain of power load prediction and has been continuously improved by researchers. Some of the artificial intelligence technologies have been applied as follows. Guo et al. [15] suggested a generalized neural network based on using the local mean decomposition to make predictions. Xu et al. [16] proposed an improved RBF prediction model using a weighted fuzzy clustering algorithm to determine the center of the benchmark function. Li et al. [17] established a hybrid neural network model using an improved gray wolf bionic algorithm. The experimental results show that in power system prediction, the hybrid model significantly reduces the prediction errors compared to other comparative models. The effectiveness of the neural network method in power load forecasting has been extensively verified. To date, deep learning (DL) technology has been applied in the automation of many industries, such as image and audio detection [18]. Compared with the neural network, deep learning has a deeper hidden layer. This layer can make the computer perceive problems like human beings by simulating the connection of the human brain grid. 
At present, DL has become one of the most attractive technologies in short-term electric power load prediction as a result of its excellent end-to-end learning capacity, and it offers the most advanced forecasting performance [19]. Li et al. [20] combined the convolutional neural network (CNN), long short-term memory neural network (LSTM), and gated recurrent unit (GRU) algorithm and proposed a prediction model based on deep learning for power load forecasting in Beijing. Massaoudi et al. [21] combined the Savitzky-Golay filter with the bidirectional long short-term memory neural network (BiLSTM) to predict the short-term power load. Experimental results show that the proposed model is highly effective.
Although the DL algorithm improves the prediction accuracy of traditional prediction models, a single prediction system often has prominent shortcomings that lead to unsatisfactory prediction results. Therefore, our study focuses on the combined model [22]. However, the modeling of the classical hybrid model still needs to be improved. The high volatility and randomness of power load data affect the learning ability of the prediction model and lead to poor prediction performance when raw data are used directly without processing [23]. Hence, to reduce the random interference in the data sequence and improve the prediction performance, data preprocessing methods such as empirical mode decomposition (EMD) and the deep learning noise reduction algorithm known as the denoising autoencoder (DA) are applied to time series prediction [24,25]. For example, Ribeiro et al. [26] proposed an adaptive, decomposition, heterogeneous, and integrated learning model. In the data preprocessing stage, the hyperparameters of complementary ensemble empirical mode decomposition were optimized, and three machine learning models were used to predict the short-term electricity price in the Brazilian market. Stefenon et al. [27] proposed a method that extracts features using the wavelet energy coefficient and combines them with LSTM to predict power insulator faults. The experimental results show that the method has good prediction accuracy. Although these methods improve the prediction accuracy to a certain extent, they still have some shortcomings. For example, mode aliasing often occurs in EMD, and the residual noise in DA cannot be processed [28,29]. On the other hand, the final result of a deep learning network depends to some extent on the initial random hidden layer nodes, the input unit length, and the hyperparameter optimization method, which affects the stability of the prediction.
However, there is a trade-off between the number of hidden layer nodes in the network and its training ability. Generally speaking, when there are few hidden layer nodes, the prediction accuracy is poor. To some extent, as the number of hidden layer nodes increases, the forecasting accuracy also improves. Nonetheless, this relationship is limited: beyond a peak, the predictive power decreases as the number of hidden layer nodes continues to increase. Therefore, it is very important to determine the number of hidden layer nodes and the length of the input cells [30]. By reviewing the previous literature, we found that the above prediction methods still have some inherent defects [31]. The shortcomings of these systems are summarized in Table 1.
According to the above analysis, this paper proposes a new hybrid system that combines data preprocessing technology, an advanced deep learning prediction method, and a bionic optimization algorithm to further improve the short-term power load forecasting accuracy. More specifically, based on variational mode decomposition (VMD), the original power load series are decomposed and reconstructed to remove the noise of the original load data and extract the data features effectively. Then, we apply the deep learning prediction method using the LSTM to predict the processed power load data. Finally, a calculation technique employing the binary encoding genetic optimization algorithm (BEGA), based on swarm intelligent evolution and a bionic strategy, is proposed to find the optimal number of hidden layer nodes and input unit length of the LSTM model. The main contributions and innovations of this research are as follows:

1. The original power load data is decomposed and reconstructed using VMD technology to extract the effective features of the data. This reduces the adverse effects of the instability and irregularity of the original load data on the forecasting model.

2. The long short-term memory neural network is applied to forecast the power load data. This solves the problem where the time series depends on previous data and overcomes the low accuracy and poor stability of traditional models.

3. A binary encoding genetic algorithm is proposed to adaptively decide the hidden layer nodes and the length of the input data unit of the LSTM. This algorithm abandons the traditional decimal coding method and uses binary coding for integer optimization.

4. The adaptive moment estimation (Adam) algorithm is employed for optimizing the model's hyperparameters, instead of the traditional gradient descent algorithm and stochastic gradient descent algorithm (SGD). This improves the convergence speed and prediction stability of the model.

5. The prediction model optimized by the hybrid optimization method has high prediction accuracy and good stability, thereby effectively improving the accuracy of power load prediction.

The remainder of this paper is organized as follows. Section 2 introduces the specific methods used for the proposed model, including data preprocessing technology, the LSTM prediction model, and the binary encoding genetic algorithm. Section 3 describes the framework of the VLG model in detail. In Section 4, we conduct three different experiments from different angles and examine the experimental conclusions for the combined system and other systems. To further certify the accuracy and effectiveness of the new hybrid system, Section 5 offers a specific discussion. Finally, the results and conclusions are given in Section 6.

Related Theory
In this part, the methods adopted in the employed hybrid system are explained.

Variational Mode Decomposition (VMD)
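VMD is a completely non-recursive adaptive signal processing technique proposed by Dragomiretskiy et al. [32], which can effectively achieve the adaptive separation of signals in the frequency domain.
Step 1: For each mode $\mu_k$, the associated analytic signal is calculated by the Hilbert transform to obtain its single-sided spectrum [33]:
$$\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t) \tag{1}$$
Step 2: The spectrum of each mode is shifted to baseband by multiplying it by an exponential term $e^{-j\omega_k t}$ tuned to the estimated center frequency $\omega_k$ [34]:
$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t)\right] e^{-j\omega_k t} \tag{2}$$
Step 3: Each mode is assumed to be concentrated closely around its center frequency. Gaussian smoothness of the demodulated signal above (the squared $L^2$ norm of its gradient) is used to estimate the bandwidth, so that a constrained variational problem is obtained:
$$\min_{\{\mu_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k \mu_k = f \tag{3}$$
where $f$ is the original signal, $\mu_k$ is the modal function, $\omega_k$ is the center frequency of mode $k$, and $\delta(t)$ is the Dirac distribution [35].
Step 4: A quadratic penalty term ensures that the signal still offers good reconstruction accuracy under high noise conditions. The Lagrange multiplier enforces the constraint strictly, and the augmented Lagrangian is [36][37][38]
$$L(\{\mu_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k \mu_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k \mu_k(t) \right\rangle \tag{4}$$
where $\alpha$ is the penalty factor, $\lambda$ is the Lagrange multiplier, and $L$ is the augmented Lagrangian. The solution to the original minimization problem (3) is found as the saddle point of the augmented Lagrangian $L$.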
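As an illustration of Steps 1 and 2 only, the following minimal sketch builds the analytic signal of a single mode and shifts it to baseband. It is not the authors' implementation: it uses a naive discrete Fourier transform in pure Python, and the function names (`dft`, `analytic_signal`, `shift_to_baseband`) and the test tone are assumptions chosen for the example.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            values[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n)
        )
        result.append(total / n if inverse else total)
    return result


def analytic_signal(mode):
    """Step 1: analytic signal of a real mode, i.e. a one-sided spectrum."""
    n = len(mode)
    spectrum = dft(mode)
    weights = [0.0] * n
    weights[0] = 1.0  # keep the DC component once
    half = n // 2
    if n % 2 == 0:
        weights[half] = 1.0  # keep the Nyquist component once
        positive = range(1, half)
    else:
        positive = range(1, half + 1)
    for k in positive:
        weights[k] = 2.0  # double positive frequencies, negatives stay zero
    return dft([w * s for w, s in zip(weights, spectrum)], inverse=True)


def shift_to_baseband(analytic, omega, dt):
    """Step 2: multiply by exp(-j*omega*t) to centre the mode at zero frequency."""
    return [a * cmath.exp(-1j * omega * i * dt) for i, a in enumerate(analytic)]


# Hypothetical test tone: 5 cycles of a cosine sampled at 64 points.
n, f0 = 64, 5.0
dt = 1.0 / n
mode = [math.cos(2 * math.pi * f0 * i * dt) for i in range(n)]
baseband = shift_to_baseband(analytic_signal(mode), 2 * math.pi * f0, dt)
```

For this pure tone the demodulated signal is constant, which is why its gradient norm, and hence its estimated bandwidth in Step 3, is close to zero.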

Long Short Term Memory Neural Network (LSTM)
The LSTM, as a special RNN, has strong processing ability for time series data and effectively overcomes the defects of gradient disappearance and gradient explosion in RNNs in machine learning. Figure 1 shows the schematic structure of the LSTM.
The three gates included in the LSTM all use the sigmoid function $\sigma$ [39]:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Step 1: The LSTM must decide what information is invalid and discard that information from the cell state. A sigmoid layer called the "forget gate" $f_t$ does this part of the work:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $h_{t-1}$ is the previous hidden state, and $x_t$ is the input vector.
Step 2: The LSTM determines what new information is stored in the cell state. First, the "input gate layer" $i_t$ determines which values will be updated. Next, the candidate information $\tilde{C}_t$ is created through a tanh neural network layer, which also reads $h_{t-1}$ and $x_t$. Then, $i_t$ and $\tilde{C}_t$ are multiplied to obtain the new information to be stored in the cell state. The expressions for $i_t$ and $\tilde{C}_t$ are as follows [40]:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
Step 3: According to the current input $x_t$ and $h_{t-1}$, the cell removes outdated information and adds the new information needed, thereby updating the old cell state $C_{t-1}$ to the new cell state $C_t$ [41,42]:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Step 4: The latest cell state $C_t$, passed through tanh, is multiplied by the output gate vector $o_t$ to obtain the LSTM's final output state vector $h_t$:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$
where $W_f$, $W_i$, $W_C$, and $W_o$ are the coefficient matrices, $b_f$, $b_i$, $b_C$, and $b_o$ are the bias vectors, and $\odot$ denotes element-wise multiplication.
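The four steps can be sketched as a single time step of an LSTM cell. This is a minimal pure-Python illustration under the standard LSTM formulation, not the authors' code; the helper names (`affine`, `lstm_step`) and the all-zero demo parameters are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """Compute W @ vector + b for plain Python lists."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'c', 'o' to (W, b) pairs."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]        # Step 1: forget gate
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]        # Step 2: input gate
    c_tilde = [math.tanh(a) for a in affine(*params["c"], z)]  # Step 2: candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Step 3
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]        # Step 4: output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]         # Step 4: hidden state
    return h_t, c_t


# Hypothetical demo: one hidden unit, one input feature, all weights zero,
# so every gate evaluates to sigmoid(0) = 0.5 and the candidate to tanh(0) = 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([2.0], [0.3], [1.0], params)
```

With zero weights the cell simply halves the previous cell state, which makes the role of the forget gate easy to check by hand.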

Binary Encoding Genetic Algorithm (BEGA)
The BEGA algorithm is an improved genetic algorithm with binary coding. The genetic algorithm is an evolutionary algorithm that simulates "the survival of the fittest" phenomenon in nature [43]. The basic operation of the binary encoding genetic algorithm is as follows:
Step 1: Binary encoding. The four bases in the human chromosome are simulated with two numbers, 0 and 1, and the variable is described with a string of a certain length containing 0s and 1s.
Step 2: Selection operation. The selection operation of the genetic algorithm is a form of roulette. In accordance with the selection strategy of the fitness ratio, the selection probability $P_i$ of each individual $i$ is
$$P_i = \frac{f_i}{\sum_j f_j}$$
where $f_i$ is the fitness value, which is the root mean square error value between the power load forecast value and the true value under the LSTM neural network in this study.
Step 3: Crossover operation. A pair of genetic sequences are cut individually via two-point crossover, and then the cut sequences are randomly combined, which makes the binary string sequence more likely to be transcoded.
Step 4: Mutation operation. Due to the particularity of binary coding, the method of bit-flip mutation is used in the mutation operation. For each gene value in the individual, the opposite value is taken according to the given mutation probability P.
Step 5: Decoding. Assuming that the binary code is converted into the decimal range of [−a, a] and that b-bit precision is required, we need to discretize the interval into $(|-a| + a) \times 10^b$ numbers. Then, we convert the binary code $x_{bin}$ into a decimal number $x_{dec}$. Next, by applying the decoding formula to $x_{dec}$, we obtain a real number in the [−a, a] interval.
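The five operations can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation. Two points are assumptions: the decoding uses the common linear mapping x = −a + x_dec · 2a / (2^n − 1) for an n-bit string, which is not taken from the text, and because a lower RMSE is better, the demo passes 1/RMSE as the score to the roulette wheel. All function names and the RMSE values are hypothetical.

```python
import random


def decode(bits, a):
    """Step 5: map a bit string to a real number in [-a, a] (assumed linear scaling)."""
    x_dec = int("".join(str(b) for b in bits), 2)
    return -a + x_dec * (2 * a) / (2 ** len(bits) - 1)


def selection_probabilities(scores):
    """Step 2: roulette-wheel probabilities P_i = f_i / sum_j f_j."""
    total = sum(scores)
    return [s / total for s in scores]


def roulette_select(population, scores, rng):
    """Draw one individual with probability proportional to its score."""
    return rng.choices(population, weights=selection_probabilities(scores), k=1)[0]


def two_point_crossover(parent_a, parent_b, rng):
    """Step 3: swap the segment between two random cut points."""
    lo, hi = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:lo] + parent_b[lo:hi] + parent_a[hi:]
    child_b = parent_b[:lo] + parent_a[lo:hi] + parent_b[hi:]
    return child_a, child_b


def bit_flip_mutation(bits, p, rng):
    """Step 4: flip each gene independently with probability p."""
    return [1 - b if rng.random() < p else b for b in bits]


rng = random.Random(0)
# Step 1: a small population of random 8-bit chromosomes.
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(4)]
rmse_values = [0.9, 0.5, 1.2, 0.7]          # made-up RMSE values for illustration
scores = [1.0 / r for r in rmse_values]     # assumption: favour low RMSE
parent_a = roulette_select(population, scores, rng)
parent_b = roulette_select(population, scores, rng)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip_mutation(child_a, 0.05, rng)
```

Using the raw RMSE as the wheel weight would favour poor individuals, so some monotone inversion such as the reciprocal used here is needed whenever the fitness is an error measure.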

Adaptive Moment Estimation Optimizer (Adam)
Adam is a stochastic objective function optimization algorithm based on first-order gradients and on adaptive estimates of low-order moments. This method has high calculation efficiency and small memory requirements and is very suitable for optimization problems with a large data volume or many parameters [44]. The algorithm is as follows:
Step 1: Given an objective function $J(\theta)$ with parameters $\theta$, calculate the gradient at time $t$:
$$g_t = \nabla_\theta J(\theta_{t-1})$$
Step 2: Taking into account the gradient momentum of the previous time steps, update the gradient mean $m_t$ and the exponential moving average $v_t$ of the squared gradient:
$$m_t = \phi_1 m_{t-1} + (1 - \phi_1)\, g_t, \qquad v_t = \phi_2 v_{t-1} + (1 - \phi_2)\, g_t^2$$
where $\phi_1$ is the exponential decay rate of the first-moment estimate, which controls the weighting of past gradients, and $\phi_2$ is the exponential decay rate of the second-moment estimate, which controls the influence of past squared gradients.
Step 3: Since $m_0$ and $v_0$ are initialized to 0, the gradient mean $m_t$ and the exponential moving average $v_t$ of the squared gradient are biased toward zero, and this bias must be corrected:
$$\hat{m}_t = \frac{m_t}{1 - \phi_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \phi_2^t}$$
Step 4: Update the parameters $\theta$:
$$\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon}$$
where $\alpha$ represents the step size, and the built-in parameters of the Adam algorithm are set to $\phi_1 = 0.9$, $\phi_2 = 0.999$, and $\varepsilon = 10^{-8}$.
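A scalar version of the four steps can be written directly from the standard Adam update rule. The sketch below is illustrative only: the objective J(θ) = (θ − 3)², the step size of 0.05, and the function names are assumptions, while the decay rates and ε follow the values stated in the text.

```python
import math


def adam_step(theta, grad, m, v, t, alpha=0.001, phi1=0.9, phi2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step counter."""
    m = phi1 * m + (1 - phi1) * grad          # Step 2: first-moment estimate
    v = phi2 * v + (1 - phi2) * grad * grad   # Step 2: second-moment estimate
    m_hat = m / (1 - phi1 ** t)               # Step 3: bias correction
    v_hat = v / (1 - phi2 ** t)
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)  # Step 4
    return theta, m, v


def minimize_quadratic(steps, alpha=0.05):
    """Minimise the toy objective J(theta) = (theta - 3)^2 starting from 0."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 3.0)            # Step 1: gradient of J
        theta, m, v = adam_step(theta, grad, m, v, t, alpha=alpha)
    return theta


theta_first = minimize_quadratic(1)
theta_final = minimize_quadratic(2000)
```

After bias correction the very first update has magnitude close to the step size regardless of the gradient scale, which is one reason Adam is less sensitive to the raw magnitude of the loss than plain gradient descent.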

The Formation of the Combined Forecasting System
In this study, a VLG hybrid system for short-term load forecasting is proposed to improve the forecasting accuracy. This system mainly uses VMD technology to remove the noise of the original sequence to obtain a stable time series, which is used as the input feature of the model training and provides a high-quality training set for the model. Here, BEGA is used to adaptively optimize the length L of the input data sample unit of the LSTM and the number N of the cell units, allowing the LSTM to achieve optimal performance. The analysis framework of the prediction system is shown in Figure 2. Step 1 is the VMD data preprocessing module, Step 2 is the BEGA and Adam optimization algorithm module, and Step 3 is the LSTM prediction module. The specific details are as follows.

Data Preprocessing Module
Because the original load data have strong volatility and randomness, feeding them into the model directly, without processing for feature extraction, degrades the prediction performance [45]. Therefore, the advanced decomposition ensemble method is used to eliminate the high-frequency noise in the power load series and extract valuable information components from the original power load data, which can effectively improve the prediction accuracy. In this paper, we use VMD to preprocess the data, decompose the original time series into several intrinsic mode functions (IMFs), and then reconstruct those functions as the input series of the prediction model.
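The reconstruction step can be sketched as follows, assuming, as stated later in the description of Figure 3, that the IMFs are ordered by descending frequency and that the highest-frequency IMF is discarded. The function name `reconstruct`, the parameter `n_drop`, and the toy IMF values are hypothetical, and the decomposition itself is taken as given.

```python
def reconstruct(imfs, n_drop=1):
    """Sum the IMFs after discarding the n_drop highest-frequency ones.

    imfs is assumed to be a list of equal-length sequences ordered from the
    highest to the lowest frequency.
    """
    kept = imfs[n_drop:]
    return [sum(samples) for samples in zip(*kept)]


# Made-up IMFs: a small alternating "noise" component, a ramp, and a constant.
imfs = [
    [0.1, -0.1, 0.1],
    [1.0, 2.0, 3.0],
    [10.0, 10.0, 10.0],
]
denoised = reconstruct(imfs)
```

How many components to drop is a modelling choice; the default of one here is only an assumption for the example.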

Optimization Algorithm Module
The module consists of two parts. The first part optimizes the input unit length L and the number of hidden layer nodes N of the LSTM through the BEGA algorithm. In this part, the binary genetic sequence is initialized, and L and N are varied within a certain range by the genetic algorithm. The changed L and N are then passed to the LSTM for training, and the loss function is calculated from the true values $y_m$ and the predicted values $\hat{y}_m$. The smaller the value of the loss function, the smaller the model prediction error. Next, the population is updated based on the loss function value to obtain the L and N values that minimize the loss. Finally, the input unit length L and the number of hidden layer nodes N that optimize the performance of the model are determined. The second part optimizes the hyperparameters in the LSTM with Adam, an adaptive objective function optimization algorithm that calculates the learning rates of different parameters on the basis of the first- and second-moment estimates of the gradient to improve the model convergence speed and provide better prediction accuracy.
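To make the first part concrete, the sketch below shows one way a binary chromosome could be split into two fields and decoded into the integers L and N before the loss is evaluated. The bit widths, the search ranges, the linear integer scaling, and the quadratic stand-in for the LSTM training loss are all assumptions made for illustration; they are not taken from the paper.

```python
def bits_to_int(bits, low, high):
    """Map a bit string onto an integer in [low, high] by linear scaling."""
    x_dec = int("".join(map(str, bits)), 2)
    span = 2 ** len(bits) - 1
    return low + round(x_dec * (high - low) / span)


def decode_chromosome(chromosome, l_bits, l_range, n_range):
    """Split a chromosome into an L field and an N field and decode both."""
    l_value = bits_to_int(chromosome[:l_bits], *l_range)
    n_value = bits_to_int(chromosome[l_bits:], *n_range)
    return l_value, n_value


def best_individual(population, loss_fn, l_bits, l_range, n_range):
    """Return the decoded (L, N) pair with the smallest loss in the population."""
    decoded = [decode_chromosome(c, l_bits, l_range, n_range) for c in population]
    return min(decoded, key=lambda pair: loss_fn(*pair))


def toy_loss(l_value, n_value):
    """Stand-in for 'train an LSTM with (L, N) and return its loss'."""
    return (l_value - 6) ** 2 + (n_value - 20) ** 2


# Hypothetical layout: 4 bits for L in [2, 17], 6 bits for N in [1, 64].
population = [
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
best_l, best_n = best_individual(population, toy_loss, 4, (2, 17), (1, 64))
```

In a full run, `toy_loss` would be replaced by a routine that trains the LSTM on the training set and returns its validation error, and the selection, crossover, and mutation operators would generate the next population.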

Forecast Module
To make the prediction performance more accurate and stable, the VLG forecasting system is built to forecast the power load data. By constructing the index system, the prediction errors between the prediction results and the original power load data are evaluated.

Experiment and Evaluation
The accurate prediction of short-term power load data has considerable practical significance and is very important for ameliorating the prediction performance of the model. Therefore, in this paper, we propose a hybrid system to enhance the accuracy of short-term power load prediction and apply other traditional nonlinear forecasting models for comparison in this section to assess the effectiveness of the employed hybrid system. Taking the power load data of New South Wales, South Australia, and Queensland (30 min power load data for each season in 2013) as examples, the performance of the hybrid model is evaluated. This part mainly outlines the procedure and conclusions of the experiment.

Model Evaluation Indicators
To judge whether the performance of one model is better than that of another model, the model evaluation criteria play a vital role. However, there is currently no unified standard to evaluate prediction performance, nor can a single model evaluation index fully reveal the performance of a given model [46]. Therefore, we construct an index evaluation system to evaluate the present model. This index system mainly includes the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). For MAPE, RMSE, and MAE, smaller values indicate more accurate prediction and better model performance. In contrast, the higher the value of R², the better the prediction model performance. The definition and calculation formula for each index are given in Table 2.

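Metric | Definition | Equation
MAPE | Average absolute percentage error | $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$
RMSE | Root mean square error | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$
MAE | Mean absolute error | $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$
R² | Coefficient of determination | $R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$

Note: In this table, $\hat{y}_i$ represents the predicted value of the system, $y_i$ is the true value of the power load data, $\bar{y}$ is the average value of the sequence, and $n$ is the number of samples.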
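The four indicators can be computed with a few lines of Python. The sketch below uses the standard definitions of MAPE, RMSE, MAE, and R²; the function names and the three-point example are illustrative assumptions and do not reproduce any result from the experiments.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up load values and forecasts, for illustration only.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
scores = {
    "MAPE": mape(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

Note that MAPE is undefined when a true value is zero, which is not an issue for regional load data but matters if the same functions are reused on other series.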

Experimental Setup
To estimate the performance of the VLG hybrid system for short-term power load predictions, the hybrid system is compared with the traditional models for each component to explore the impact of the combination model on short-term power load forecasting. We divide the experiment into three parts: Experiment 1, Experiment 2, and Experiment 3. Experiment 1 evaluates the prediction performance of the model using VMD and other data preprocessing techniques. Experiment 2 compares the performance of the VLG hybrid system with the performance of different neural networks and nonlinear prediction models in predicting the short-term power load. Experiment 3 discusses the impact of the adaptive optimization algorithm BEGA on model performance.

Data Description
In the experiment, since the power load data change with the seasons, we selected the 2013 power load data of New South Wales, South Australia, and Queensland in Australia as the experimental verification data. We used the data from January 1 to January 31 as the summer dataset, the data from March 1 to March 31 as the autumn dataset, the data from June 1 to June 30 as the winter dataset, and the data from September 1 to September 30 as the spring dataset. The power load data were sampled at a time interval of 30 min. Among the data points included in the dataset of each season, the first 80% were selected as the training set and the last 20% as the test set. Specifically, the lengths of the training set and test set in the summer and autumn datasets of the three states were 1190 and 298, respectively. Correspondingly, the lengths of the training set and test set for the spring and winter datasets were 1152 and 288, respectively. When constructing the model input vector, we adopted a rolling acquisition mechanism. In other words, x(1), x(2), ..., x(t − 1), x(t) were used as the basis for predicting the next value x(t + 1). Table 3 presents the descriptive statistics of the original power load dataset, training dataset, and test dataset for the NSW seasonal data.
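The split sizes and the rolling mechanism can be checked with a short sketch. At 48 half-hourly samples per day, a 31-day month has 1488 points and a 30-day month has 1440. The function names and the window length of 3 are assumptions for illustration; in the proposed system the window length L is chosen by BEGA.

```python
def split_train_test(series, train_percent=80):
    """Chronological split: the first train_percent% for training, the rest for testing."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


def rolling_windows(series, length):
    """Pair each window of `length` consecutive values with the value that follows it."""
    inputs = [series[i:i + length] for i in range(len(series) - length)]
    targets = [series[i + length] for i in range(len(series) - length)]
    return inputs, targets


samples_per_day = 48  # 30 min resolution
summer = list(range(31 * samples_per_day))  # placeholder values, 1488 points
winter = list(range(30 * samples_per_day))  # placeholder values, 1440 points
summer_train, summer_test = split_train_test(summer)
winter_train, winter_test = split_train_test(winter)
inputs, targets = rolling_windows([1, 2, 3, 4, 5], 3)
```

The split is chronological rather than shuffled, so no future observation leaks into the training set.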

Model Parameter Setting
This section outlines the parameter settings of each component of the proposed hybrid prediction model VLG, including the length of the input unit, the initial learning rate, and other parameter settings of the LSTM, BEGA optimization algorithm, and Adam optimizer.

Parameter Setting of the ANN
Before the experiment, the parameters in the model and optimization algorithm must be defined. For the deep learning algorithm LSTM, the length of the input unit L and the number of hidden layer nodes N are optimized by the BEGA algorithm. The input unit length of BP is 5, and the number of hidden layer nodes is 11. See Table 4 for the other parameter settings.

Parameter Settings of the Optimization Algorithm
Setting the parameters of the BEGA optimization algorithm and Adam optimizer is very important for the short-term power load forecasting accuracy. The specific parameter settings we studied are shown in Table 5. Figure 3 shows the power load data for South Australia, Queensland, and New South Wales. The data in New South Wales and Queensland are relatively stable, whereas South Australia fluctuates the most frequently. In Figure 3, panel (a) shows the three study regions, and panel (b) is the feature selection part, which shows the decomposition results of the power load series, with the IMFs arranged in descending order of frequency. The high-frequency IMF is then removed, and the remaining IMFs are reconstructed to obtain the optimal input sequence. The characteristics of the resulting data series are clearly improved compared with the original ones.

Experiment I
In the previous section, we noted that the noise reduction technology of time series plays a crucial role in the performance and prediction accuracy of the model. Based on the research of previous scholars, noise reduction technology can mainly be divided into three major categories: empirical mode decomposition technology (EMD), denoising autoencoder technology (DA), and VMD, which is used in our proposed hybrid system. The principle of VMD was discussed in the previous section. EMD technology involves decomposing the internal model function from the original signal to obtain a series of different intrinsic mode functions (IMFs). This method can decompose non-stationary and non-linear signals into stationary signals with different time scales. The denoising autoencoder (DA) is a type of unsupervised learning that uses an encoder and a decoder. White noise or Gaussian noise is added to the original sequence, and the neural network is continuously iterated to obtain a dimensionality reduction feature expression of the data to achieve the effect of feature extraction. Experiment 1 aims to verify the influence of the noise reduction data set on the model and the performance of the VMD noise reduction method. In this way, the prediction accuracy of VMD-LSTM and other neural network models based on EMD and DA (i.e., EMD-BP, EMD-LSTM, DA-BP, and DA-LSTM) is compared. We performed three experiments on each model to assess the prediction accuracy of each model, and the specific prediction results based on NSW spring dataset are shown in Table 6. Table 6.
Comparison of the prediction performance of models using different data preprocessing systems. The details of the experiment are as follows:

(a) The comparison results of the evaluation index system of VMD-LSTM and other hybrid systems are introduced in this table. The VMD-LSTM shows better prediction accuracy and prediction performance on most evaluation indicators. For the comparison between the single LSTM model and the VMD-LSTM system, it is obvious that the evaluation index of the VMD-LSTM system is superior to that of the LSTM model in all cases. At the same time, the prediction accuracy of all noise reduction models is higher than that of the model based on original data, which indicates that data preprocessing is indispensable for power load prediction. (b) In the case, we compare the VMD-LSTM model and several other hybrid model methods.
Among the performance test index values of the experiments in various regions, the VMD-LSTM model offers the best MAPE results, with 0.4859%, 0.9352%, and 0.4922%. Secondly, the models based on prediction accuracy are VMD-LSTM, EMD-LSTM, EMD-BP, VMD-BP, DA-LSTM, and DA-BP, in order from high to low. Among the six models, VMD-LSTM has the best prediction accuracy. The coefficient of determination (R 2 ) reflects the difference in the performance of the prediction model from the fit. In this experiment, the R 2 of VMD-LSTM is the best with 0.9971, 0.9910, and 0.9967 in the three states. We also certify the effectiveness of the noise reduction model VMD employed in this paper. (c) The previous time-series data denoising technology is also applied to the power load, short-term wind speed, and stock prediction models. Most of these models only discuss the improvement of model accuracy and performance via noise reduction technology but do not discuss the new sequence obtained after using the noise reduction method correlation with the original time series. Therefore, through the gray correlation method (GC) and the method of calculating the Pearson correlation coefficient (PE) and Spearman correlation coefficient (SP), the differences between different noise reduction methods are discussed from the perspective of the correlation between the new sequence and the original sequence. Detailed calculation results are given in Table 7. Table 7 shows the correlation between the new sequence obtained by different noise reduction methods and the original power load data, in which the new sequence obtained using the VMD method has the highest correlation with the original sequence. In summary, using the VMD noise reduction method to process data not only performs well in improving the prediction accuracy of the model but also maintains more original information in the sequence, making it a more suitable method for the data preprocessing of short-term power load data. 
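Two of the three correlation measures mentioned in (c), the Pearson and Spearman coefficients, can be sketched in pure Python as follows. The grey correlation measure is not covered here, and the function names and sample values are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank sequences."""
    return pearson(ranks(x), ranks(y))


# Made-up "original" and "denoised" sequences.
original = [1.0, 2.0, 3.0, 4.0]
linear = [2.0, 4.0, 6.0, 8.0]
monotone = [1.0, 8.0, 27.0, 64.0]
pe = pearson(original, linear)
sp = spearman(original, monotone)
```

The Spearman coefficient equals 1 for any strictly increasing relationship, whereas the Pearson coefficient reaches 1 only for a linear one, so the two measures capture different aspects of how closely a denoised series follows the original.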
Remarks: Based on the above experiments, the employed VMD-LSTM combined system has the highest prediction performance, and Queensland has the highest prediction accuracy, with a MAPE value of 0.4922%. The average MAPE across the three regions is 0.6378%. This shows that the system has high prediction accuracy and excellent stability. Moreover, on the correlation-based test indices, VMD remains superior to the other noise reduction models, further verifying the effectiveness of the system.
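For reference, the four evaluation indices used throughout the experiments can be sketched as follows. These are the standard textbook definitions and are assumed, not confirmed, to match the equations given earlier in the paper; the sample values are hypothetical. The last lines reproduce the three-region average MAPE quoted above.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error, in the unit of the load data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the unit of the load data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, only to exercise the functions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = (mape(actual, predicted), mae(actual, predicted),
          rmse(actual, predicted), r2(actual, predicted))

# Average of the three regional MAPE values reported for VMD-LSTM.
regional_mape = [0.4859, 0.9352, 0.4922]
average_mape = sum(regional_mape) / len(regional_mape)
```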

Experiment II
Power load data are very sensitive to natural factors such as season and climate, which are among the dominant factors affecting the fluctuation characteristics of the power load. In this part, the 30 min power load data of New South Wales, Queensland, and South Australia from March to April, June to July, September to October, and December to January in 2013 are used as seasonal data for these areas. A comparison of the prediction performance of the different predictive methods based on the power load data of New South Wales is shown in Table 8. Tables 9 and 10 provide the corresponding comparison, based on the seasonal data of Queensland and South Australia, between the traditional prediction models and the LSTM-based combination model employed in this study. In general, the VLG system employed in this research provides better performance than traditional prediction methods.
(a) For New South Wales, based on the annual average of the prediction results, BP, PSO-BP, LSTM, and our proposed hybrid model VLG obtained good accuracy results for all evaluation indicators. The ARIMA model also achieved a prediction effect with a MAPE value of less than 2%. However, the MAPE values of the other two models were higher than 10%, and the performance was poor. The ARIMA model has always been considered to provide superior performance in predicting power load data, and its MAPE value is lower than the values of the other traditional models discussed in this article, which is logical. However, the ARIMA model is not beneficial for long-term rolling forecasting and has certain disadvantages. The BP model and the optimization system based on the BP neural network design have always had good prediction performance in many experiments related to power load prediction. However, their MAPE values are still nearly three times higher than those of our proposed hybrid model VLG, which shows that our proposed hybrid model provides outstanding performance in short-term power load prediction.
For seasonal data, the proposed system obtained the best performance on each seasonal data set. For the spring data, the ARIMA, BP, PSO-BP, and LSTM models all performed comparatively well, but the VLG prediction model was the most accurate, with a MAPE of 0.3081%, RMSE of 30.27, R² of 0.9979, and MAE of 24.88. For the other seasonal feature data, the VLG model also obtained the best results, with corresponding MAPE values of 0.4271%, 0.2724%, and 0.3717%. Accuracy on the autumn feature data set was the highest for every model, while accuracy on the winter feature data set was generally worse than in the other seasons, which indirectly indicates that power load forecasting is affected by regional and seasonal factors.
(b) For Queensland, in terms of annual average forecasting accuracy, the proposed combined system is still more accurate than the other classic models, with a MAPE value of 0.3486%. Among the remaining eight models, the VMD-LSTM system ranks second. The PSO-BP optimization model and the original LSTM model perform similarly, and both achieve excellent prediction accuracy. However, the MAPE of the VLG hybrid system proposed in this paper was 0.1624%, 0.5052%, and 0.5499% lower than the values of the three models above, respectively. Compared with the results on the New South Wales feature data, the prediction accuracy of the Elman and RBF models improved but was still not ideal relative to the other models. For the seasonal data in Queensland, the VLG model again achieved the best results, with corresponding MAPE values of 0.2602%, 0.3718%, 0.3101%, and 0.4524%. In terms of the coefficient of determination, the VLG combination forecasting model had the best goodness of fit, with R² values of 0.9983, 0.9953, 0.9979, and 0.9942 for the four seasonal data sets. Here, too, prediction accuracy on the winter feature data set was generally worse than in the other seasons. The performance rankings of the models differed little from the rankings based on the New South Wales feature data set.
(c) For South Australia, in terms of annual average prediction accuracy, the performance evaluation indices of the VLG hybrid model were significantly better than those of all other prediction models, with a MAPE value of 0.9800%. Prediction accuracy on the South Australia feature data was slightly lower than on the New South Wales and Queensland data, but the accuracy of the VLG, VMD-LSTM, LSTM, and PSO-BP models was still excellent. Compared with the other three systems, the MAPE value of the VLG hybrid model was lower by 0.4066%, 1.2675%, and 1.1927%, respectively. For the seasonal feature data, as with the results for Queensland and New South Wales, the VLG hybrid model provided the smallest prediction errors and the greatest point prediction accuracy across all four seasons compared with the other traditional models. Figure 4 shows the performance comparison between the VLG prediction system and the seven comparison models based on the spring data set of South Australia. In addition, Figure 5 shows the fit between the values predicted by VLG and the actual power load values for the four seasonal data sets in South Australia.
Moreover, based on the seasonal feature data of the three states, the seasonal temperature and regional climate were some of the most important factors affecting the performance of short-term power load predictions. Among them, autumn was often the season with the highest power load forecast accuracy. On the contrary, the prediction performance of the three states based on the winter feature data was lower than that of the other three seasons. In terms of regional differences, the annual average forecast performance gap between New South Wales and Queensland was not large, with MAPE values of 0.3717% and 0.3486%. South Australia's annual average forecast performance was relatively poor, with a MAPE value of 0.9800%, which may be related to the South Australia power load data set.
Remark: Based on Experiment II, the performance evaluation values of the hybrid model VLG are superior to those of every classic single model and hybrid model considered. Therefore, the experimental results indicate that the employed VLG hybrid system performs well in short-term power load prediction. At the same time, seasonal climate and other factors have a certain impact on power load forecasting.
Here, bold numbers indicate that the index values of this system are superior to those of the other system.
In Figure 6, (a) is a radar chart comparing the annual average MAPE of the different prediction models in the three regions; (b) compares the prediction performance of the different data preprocessing methods; (c) shows the advantages of the BEGA optimization algorithm and the prediction results of the VLG prediction system; and (d) shows, for New South Wales, the percentage increase in three indicators of the classic prediction models relative to the VLG prediction system.

Experiment III
The results of Experiment II show that the hybrid model based on LSTM performs very well in short-term power load prediction. However, LSTM requires the input unit length L and the number of hidden layer nodes N to be configured manually, and these parameters largely determine the model's short-term power load forecasting ability. There is no fixed rule for choosing them when an LSTM predicts a time series. Common solutions therefore rely on repeated trials or enumeration to find parameters that give accurate predictions. However, methods such as enumeration consume considerable time and do not necessarily select the best parameters. In addition, for the LSTM network, the final values of the trainable parameters of each neural unit and gate structure also have a significant impact on the prediction performance. Usually, the gradient descent method (GD) or the stochastic gradient descent method (SGD) is used to find these values. The gradient descent method is prone to problems such as convergence to a local optimum and slow convergence. Although the prediction results of VMD-LSTM were satisfactory, given the deficiencies of the LSTM configuration procedure and the SGD algorithm, we do not consider the prediction performance to have reached its optimum. In response, the VLG hybrid system uses a binary encoding genetic algorithm (BEGA) and the Adam optimizer to solve these problems.
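The idea of searching for L and N with a binary-encoded genetic algorithm can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the paper's BEGA: the bit widths, tournament selection, single-point crossover, bit-flip mutation, elitism, and above all the toy fitness function, which stands in for the validation error that would be obtained by training an LSTM with the decoded (L, N).

```python
import random

L_BITS, N_BITS = 5, 4  # assumed widths: L in 1..32, N in 1..16

def decode(chrom):
    """Map a list of bits to (L, N); both are offset by 1 so that 0 never occurs."""
    L = int("".join(map(str, chrom[:L_BITS])), 2) + 1
    N = int("".join(map(str, chrom[L_BITS:])), 2) + 1
    return L, N

def toy_error(chrom):
    """Placeholder for validation error; a real run would train an LSTM here."""
    L, N = decode(chrom)
    return (L - 12) ** 2 + (N - 8) ** 2  # hypothetical bowl, minimum at (12, 8)

def evolve(pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n_bits = L_BITS + N_BITS
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        pop.sort(key=toy_error)
        history.append(toy_error(pop[0]))
        nxt = [pop[0][:]]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = min(rng.sample(pop, 3), key=toy_error)
            p2 = min(rng.sample(pop, 3), key=toy_error)
            cut = rng.randint(1, n_bits - 1)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=toy_error)
    return decode(best), history

(best_L, best_N), history = evolve()
```

Because the elite chromosome is copied unchanged into each new generation, the best error recorded in `history` can never increase, which is the property that makes such a search safer than a single empirical guess.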
In this part of the experiment, we focus on the hybrid model VMD-LSTM and the original LSTM neural network and use the Adam optimizer and BEGA for parameter optimization. We study the performance on the seasonal feature data sets of New South Wales under different configurations of the number of input units L and the number of hidden layer nodes N. That is, two different sets of L and N values were chosen from the experience of previous scholars and used as references to verify the effect of BEGA on model accuracy. The specific experimental steps are shown in Table 11. At the same time, using the spring data of New South Wales as the data set, the Adam optimizer is compared with the stochastic gradient descent method to discuss the differences in prediction performance produced by different parameter optimization methods. The specific experimental steps are shown in Table 12. As can be seen from Table 11, the VLG model using the binary encoding genetic algorithm provides better prediction results on every seasonal data set than the model configured by empirical summary. For the four seasonal feature data sets, the optimal input unit length L and number of hidden layer nodes N found by BEGA are 12-8, 12-8, 25-12, and 4-6, respectively, and the corresponding MAPE values are 0.2602%, 0.3718%, 0.3101%, and 0.3758%. The MAPE values show that using the BEGA algorithm to select L and N can improve model accuracy by up to 30%. Based on a comparison of the RMSE, MAE, and R² values, the performance improvement of the VLG system was highest on the spring data set, with MAE, RMSE, and R² values of 15.53, 20.46, and 0.9986, respectively. Compared with the empirical summary method, the MAE and RMSE declined on average by 10.72 and 15.37, respectively.
Across the four seasonal feature data sets, our proposed VLG model using the BEGA algorithm has very good prediction performance and can accurately predict future short-term power load changes. Table 12 shows the performance of the different optimizers with the LSTM and VMD-LSTM systems. The results show that the predictions of LSTM networks with different structures differ considerably. A longitudinal comparison shows that the Adam optimizer gives much better predictions than stochastic gradient descent. On the same data set, the MAPE values of the LSTM and VMD-LSTM models using the Adam optimizer were 0.8744% and 0.2281%, while those of the same models using the SGD algorithm were 1.071% and 0.6253%. Thus, relative to the Adam optimizer, the SGD algorithm increased the MAPE by 22% and 174%, respectively. This result shows the importance of the Adam optimizer for prediction accuracy and indicates that it significantly improves the stability and accuracy of the predictions. Moreover, a horizontal comparison further shows that using the number of input units L and the number of hidden layer nodes N (4-6) determined by the BEGA algorithm gives the model the best accuracy and stability, thereby significantly improving model performance.
Remarks: In terms of prediction performance and accuracy, using the proposed BEGA algorithm and the Adam optimizer is significantly superior to choosing the configuration by empirical summary and training with the stochastic gradient descent method. This further confirms that the VLG hybrid system has outstanding performance in short-term power load prediction.
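To make the difference between the two update rules concrete, the following minimal sketch applies a plain gradient step and the standard Adam update to a one-dimensional toy loss. It is deterministic (no mini-batches), so it illustrates the update formulas only and says nothing about the accuracy gap reported in Table 12; the loss function, learning rates, and step counts are assumptions chosen for illustration.

```python
import math

def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def gradient_descent(x, lr=0.1, steps=200):
    """Plain gradient step: move against the gradient with a fixed step size."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam: the step is scaled by running estimates of the gradient moments."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_gd = gradient_descent(0.0)
x_adam = adam(0.0)
```

Both runs approach the minimizer x = 3 on this convex toy problem; the practical advantage of Adam lies in its per-parameter step scaling on the noisy, poorly conditioned losses met when training networks such as the LSTM.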

Discussion
This section explores the above experimental results in more depth, namely the effectiveness of the hybrid system, the performance differences between the optimization-based hybrid model used here and other optimized hybrid prediction systems, and the improvement in the evaluation indices of the proposed hybrid system.

Effectiveness of the Proposed System
First, the prediction error is a vital indicator of the performance of a prediction system. In this section, the validity of the developed hybrid model is examined with the Diebold-Mariano (DM) test, and the significance of the differences in prediction error between models is assessed by hypothesis testing. The VLG system is then compared with the other models. The Diebold-Mariano test evaluates the differences between prediction systems according to their prediction errors [47]. The null hypothesis H0 and the alternative hypothesis H1 are presented in Equations (25) and (26). The DM test results for the employed system and the comparative models are shown in Table 13. From these results, the following conclusions can be drawn: (1) For the comparison with the different hybrid systems, the DM test values all reach the upper critical value at the 1% significance level; (2) For the four traditional single models, the DM test values of the comparison with VLG were all higher than the upper critical value at the 1% significance level; (3) The minimum DM value for the comparison between the VLG combination system and the LSTM models using other data preprocessing technologies is 2.0104, which still exceeds the threshold at the 5% significance level.
Therefore, according to the Diebold-Mariano test results, it can reasonably be concluded that the employed prediction model not only has greater prediction capacity than the other systems but also differs significantly from them in prediction accuracy, which confirms its superiority in short-term power load forecasting.
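The logic of the test can be illustrated with a minimal sketch of the Diebold-Mariano statistic for one-step-ahead forecasts. In its usual form, the null hypothesis states that the two models have equal expected loss and the alternative states that they differ. The squared-error loss, the simple variance estimate (valid for one-step forecasts), and the toy error series are assumptions made for illustration; the paper's own Equations (25) and (26) may use a different loss function.

```python
import math
from statistics import NormalDist

def dm_statistic(errors_a, errors_b):
    """DM statistic with squared-error loss for one-step-ahead forecasts.

    Positive values mean that model A has the larger average loss.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]  # loss differential
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / n)

def two_sided_critical_value(alpha):
    """Upper critical value of the standard normal for a two-sided test."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical forecast errors of a comparison model (A) and a better model (B).
errors_a = [2.0, -3.0, 2.5, -2.0, 3.0, -2.5]
errors_b = [1.0, -1.0, 0.5, -1.0, 1.0, -0.5]
dm = dm_statistic(errors_a, errors_b)

z_05 = two_sided_critical_value(0.05)  # about 1.96
z_01 = two_sided_critical_value(0.01)  # about 2.576
```

Under this convention, the reported minimum statistic of 2.0104 lies above the 5% critical value of about 1.96 but below the 1% critical value of about 2.576, which matches conclusion (3).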

Model Stability Study
This section addresses the stability of the model and uses two sets of repeated experiments to assess the prediction ability of the VLG system. By comparing the prediction stability of the LSTM, BP, and PSO-BP systems, we verify the stability of LSTM, the core part of the hybrid system employed in this paper. All three models use raw data and are built without denoising. Based on the spring feature data set of Queensland, Australia, 20 repeated experiments were conducted on the three models to explore the volatility of their prediction accuracy. As is well known, variance reflects the robustness and volatility of a forecasting system: the smaller the standard deviation of the prediction error, the more robust the prediction system and the weaker its volatility. Therefore, the standard deviation of the prediction error is used to appraise the robustness of the employed combination prediction system and the comparison systems. Figure 7a illustrates the differences in forecasting stability between the three systems over 20 tests. The results show that BP has the largest volatility: the maximum MAPE is 1.5648%, the minimum is 0.9186%, and the standard deviation is 0.2022. In contrast, the LSTM model is more accurate and more stable: the maximum MAPE is 0.9126%, the minimum is 0.8241%, and the standard deviation is only 0.0367. Among these ANNs, the LSTM model has the best prediction accuracy and the smallest standard deviation, which reflects its stability advantage. Moreover, the prediction performance of these models depends to a great extent on the method used to optimize the model parameters. Different parameter optimization methods affect the forecasting accuracy and convergence speed of the system.
To isolate a single variable, the LSTM model was trained with the Adam optimizer and with the stochastic gradient descent (SGD) method for 100 trials, and the two optimization methods were compared. Figure 7b shows the results. The average MAPE of the LSTM model obtained with the Adam optimizer is 0.8457%, which is significantly better than that obtained with SGD. In the scatter plot of MAPE, the values obtained with the Adam optimizer lie between 0.8% and 0.9%, with a standard deviation of 0.0341. The band of the scatter plot is also narrower, indicating that the model prediction is more stable and less volatile. In contrast, the MAPE of the model using the stochastic gradient descent method lies between 0.9% and 1.3%, with a standard deviation of 0.2451. Here, the prediction error is larger, stability is worse than with the Adam optimizer, and the prediction volatility is large.
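The stability summary used in this section (maximum, minimum, mean, and standard deviation of the MAPE over repeated runs) can be sketched as follows. The run values are hypothetical, and the sample standard deviation is an assumption, since the text does not state whether the sample or the population form was used.

```python
from statistics import mean, stdev

def stability_summary(mape_runs):
    """Summarize the MAPE values (in percent) collected from repeated training runs."""
    return {
        "max": max(mape_runs),
        "min": min(mape_runs),
        "mean": mean(mape_runs),
        "std": stdev(mape_runs),  # sample standard deviation (n - 1 denominator)
    }

# Hypothetical MAPE values from repeated runs of a stable and a volatile model.
stable_runs = [0.82, 0.85, 0.91, 0.84]
volatile_runs = [0.92, 1.30, 1.56, 1.05]
stable = stability_summary(stable_runs)
volatile = stability_summary(volatile_runs)
```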

Multi-Step Prediction and Result Analysis
In the experiments in Section 4, the forecast model was used to predict the next value of the power load data, that is, to make a single-step forecast. This section compares three different hybrid models to appraise the prediction performance of the employed hybrid system VLG in a multi-step prediction test.
Unlike the comparison models used in the previous section, this experiment aims to verify whether the new VLG model is competitive with two other VMD-based hybrid models (i.e., VMD-GWO-SVM and VMD-PSO-BP). SVM and BP models perform well on time series forecasting problems and are commonly employed in short-term power load prediction. For hyperparameter optimization, the two comparison models use the gray wolf optimization (GWO) algorithm and the PSO algorithm, respectively. Using the spring data of Queensland, the optimized SVM and BP models serve as strong classic hybrid baselines against which the accuracy of the VLG hybrid system is verified (see Table 14 and Figure 8 for comparison results).
For the spring data of Queensland, in the one-step forecast, the proposed combined model VLG and the comparative model VMD-PSO-BP show no significant differences in the evaluation indicators, but VLG still obtains the best MAPE, MAE, and RMSE, with 0.2448%, 18.13, and 23.54, respectively. Thus, although the employed system does not show a marked advantage over the classic combined model VMD-PSO-BP in one-step prediction, its performance is no worse than that of any other model. In the two-step prediction, VLG achieves an obvious advantage, with MAPE, MAE, and RMSE of 0.7653%, 48.77, and 56.73, respectively. By contrast, the VMD-PSO-BP model shows no significant difference in accuracy from VMD-GWO-SVM in the two-step prediction, and both are far less accurate than in the one-step prediction. Their MAPE values are 1.6333% and 1.9159%, which are 0.868% and 1.1506% higher than the value of the VLG system. In the three-step prediction, the MAPEs of the three hybrid systems are all greater than 1%, but the MAPE of the proposed VLG combined system is 1.35655%, so this system still offers the greatest prediction ability among the three VMD-based combined systems. In terms of the improvement in MAPE, the VLG system improved the most in the two-step prediction, by 60.05% and 53.14% relative to the comparison models with MAPE values of 1.9159% and 1.6333%, respectively. Figure 8 compares the forecasting performance on the spring data of Queensland for steps 1, 2, and 3. Across all three prediction horizons, the VLG hybrid system remains the most accurate and valid prediction system.
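One common way to obtain two-step and three-step forecasts from a one-step model is the recursive strategy, in which each prediction is fed back as an input for the next step. The section does not state which multi-step strategy was used, so the sketch below is an assumption, and the one-step model is a hypothetical stand-in (the mean of the input window) rather than a trained VLG system.

```python
def recursive_forecast(history, one_step_model, window, steps):
    """Forecast `steps` values ahead by feeding each prediction back as input."""
    buf = list(history[-window:])   # most recent `window` observations
    forecasts = []
    for _ in range(steps):
        y_next = one_step_model(buf)
        forecasts.append(y_next)
        buf = buf[1:] + [y_next]    # slide the window over the new prediction
    return forecasts

def mean_of_window(window_values):
    """Hypothetical stand-in for a trained one-step forecasting model."""
    return sum(window_values) / len(window_values)

three_step = recursive_forecast([1.0, 2.0, 3.0, 4.0], mean_of_window, window=2, steps=3)
```

Because later steps are computed from earlier predictions instead of observations, errors accumulate along the horizon, which is consistent with the MAPE of all three systems rising from the one-step to the three-step forecast.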

Improvement of the Evaluation Index
In previous index evaluation systems, the MAPE values of the prediction models were very small and the RMSE values differed in scale because of differences in the data dimension, making it difficult to display intuitively how much the models differ in prediction accuracy [48]. In this study, we therefore use the percentage improvements of the MAPE and RMSE criteria, each taken as the relative reduction of the criterion achieved by the proposed system with respect to a comparison model, expressed as a percentage. In this way, a comprehensive analysis of the proposed combined system can be carried out. The improved MAPE and RMSE indicators are shown in Table 15. Considering the results in Table 15, the prediction capacity of the employed combined system is discussed and analyzed as follows: (a) The predictive capacity of the system employed in this study is clearly commendable. This indicates that the prediction accuracy of the system improves step by step, with the data preprocessing technology and the optimization algorithm playing a vital role in improving its prediction ability. (c) Notably, the VLG hybrid system presents obvious advantages over other systems.
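A percentage-improvement index of this kind can be sketched as the relative reduction of a criterion with respect to the comparison model. This form is an assumption, but it reproduces the two-step MAPE improvements of 60.05% and 53.14% quoted in the multi-step analysis from the MAPE values 1.9159%, 1.6333%, and 0.7653% reported there.

```python
def improvement_pct(baseline, proposed):
    """Relative reduction of an error criterion, in percent of the baseline value."""
    return 100.0 * (baseline - proposed) / baseline

# Two-step MAPE values quoted in the multi-step analysis (in percent).
vlg_mape = 0.7653
gain_vs_1_9159 = improvement_pct(1.9159, vlg_mape)
gain_vs_1_6333 = improvement_pct(1.6333, vlg_mape)
```

The same function applies unchanged to RMSE, since the ratio removes the unit of the load data and makes the two criteria comparable on one scale.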

Future and Prospects
The hybrid prediction system based on deep learning proposed in this study overcomes the shortcomings of traditional prediction models. With the rapid development of the intelligent information age, accurate load forecasting has become an indispensable part of the power load field and plays a vital role in the safe operation, daily distribution, and economy of the power system. From artificial neural networks, to machine learning, and finally to deep learning, load forecasting models are approaching ever closer to the actual situation. Among existing prediction models, those constructed with deep learning methods such as LSTM and CNN are clearly better than those constructed with artificial neural networks and traditional machine learning algorithms. A hybrid prediction model combining an intelligent optimization algorithm with a deep learning network has higher practical application value and stronger extensibility, making it easier to fit nonlinear time series with strong volatility.
In addition to traditional load forecasting, intermittent renewable energy, such as photovoltaic power generation and wind power generation, features stronger volatility, randomness, and instability. However, research on predicting intermittent renewable energy using deep learning methods is still very limited. The development of a comprehensive and effective deep learning method is expected to become a new direction of smart grid research. The application of artificial intelligence algorithms in the field of new energy is a new typical application scenario for artificial intelligence and also provides a new intelligent solution for building a global low-carbon energy future.

Conclusions
Short-term power load prediction plays a crucial role in the safe operation and risk assessment of the power grid and has attracted considerable attention among scholars. Because of the inherent uncertainty and randomness of power load sequences, predicting the power load efficiently and effectively is still a challenging task. A series of studies has been carried out to enhance the performance of power load prediction. Unfortunately, these methods are mostly restricted to using a single prediction model to predict power load sequences, and any single model has inherent shortcomings. Moreover, most previous studies have not considered the effect of data preprocessing and sequence noise on model prediction accuracy. In response, this paper proposed a new predictive analysis system that overcomes the above shortcomings and provides an effective technical means for short-term power load analysis and monitoring. In the developed model, variational mode decomposition (VMD) was employed to decompose the original sequence into a set of components ordered from high to low. Through reconstruction, the global noise was effectively removed. Then, the long short-term memory (LSTM) neural network was used instead of the classical neural network to predict the power load data, which effectively improved the prediction accuracy. Finally, to further improve the modeling performance and robustness, an improved binary encoding genetic algorithm was proposed based on the genetic algorithm; this algorithm achieved high accuracy and maintained strong stability. The effectiveness of the algorithm was verified by experiments. The developed predictive analysis system was used to predict the seasonal data of New South Wales, Queensland, and South Australia and to calculate multiple performance indicators (MAPE, RMSE, MSE, and PMAPE).
The experimental results indicate that the minimum MAPE values of the employed VLG system are 0.3717%, 0.3486%, and 0.9800%, respectively, which are better than the values of the comparison models. As with established hybrid models, the multi-step prediction of this model also provides strong prediction performance. In general, the proposed predictive analysis system shows excellent performance in analyzing and monitoring short-term power loads. Specifically, this system not only deeply analyzes the information related to people's activities and lives but also approaches the actual values accurately and steadily. Therefore, future power plant decision-makers and grid investors could make reasonable decisions based on the system presented in this article to monitor and predict power loads.

List of terminologies (methods and indices)
The main terminologies mentioned in this paper (including indices, methods, variables, and parameters).