Article

A Hybrid System Based on LSTM for Short-Term Power Load Forecasting

School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
*
Author to whom correspondence should be addressed.
Energies 2020, 13(23), 6241; https://doi.org/10.3390/en13236241
Submission received: 21 October 2020 / Revised: 20 November 2020 / Accepted: 23 November 2020 / Published: 26 November 2020
(This article belongs to the Special Issue Forecasting and Planning in Power Systems)

Abstract

As the basic guarantee for the reliability and economic operations of state grid corporations, power load prediction plays a vital role in power system management. To achieve the highest possible prediction accuracy, many scholars have been committed to building reliable load forecasting models. However, most studies ignore the necessity and importance of data preprocessing strategies, which may lead to poor prediction performance. Thus, to overcome the limitations in previous studies and further strengthen prediction performance, a novel short-term power load prediction system, VMD-BEGA-LSTM (VLG), integrating a data pretreatment strategy, advanced optimization technique, and deep learning structure, is developed in this paper. The prediction capability of the new system is evaluated through simulation experiments that employ the real power data of Queensland, New South Wales, and South Australia. The experimental results indicate that the developed system is significantly better than other comparative systems and shows excellent application potential.

1. Introduction

With rapid economic development, power enterprises continue to expand their construction scales, and the corresponding power grid structures and operation modes are gradually becoming diversified [1]. Electricity is a special type of energy that cannot be stored in large quantities. The generation capacity of power producers and the supply capacity of power suppliers must therefore be kept in dynamic balance; otherwise, the lives of residents and the production of enterprises will be affected, potentially endangering the security and availability of the whole power grid. Accurate power load prediction is an important guarantee that power supply and demand remain stable, and it yields significant economic benefits: it is estimated that every 1% increase in the accuracy of power consumption forecasting saves millions of dollars in operating costs [2]. Improving load prediction performance therefore not only provides a solid foundation for the smooth operation of the power grid but also offers theoretical support for power supply and dispatching plans.
Power load forecasting is closely tied to the dispatch and normal operation of the national electricity supply, which affects the lives and production activities of residents and of the country as a whole [3]. Current research focuses mainly on ultra-short-term and short-term power load prediction, hotspots that allow academia and power companies to adjust generation and trading plans dynamically in a market environment [4]. This article studies short-term power load forecasting.
Based on the principles and structures of the research methods, power load forecasting approaches can be divided into three categories: (i) physical prediction methods, (ii) statistical prediction methods, and (iii) machine learning methods. A physical prediction model is established by combining physical characteristics with historical power load data. Most such models assume that the relationship between power load data and the related physical information remains valid in the future, and the power load is predicted through intuitive analysis [5]. The main methods are the unit consumption method, the load density method, and the elastic coefficient method [6]. For example, Yang et al. [7] predicted the power load distribution by calculating the load density and combining it with the characteristics of power consumption in the area. However, physical prediction methods require a great deal of observation data and consume computing resources that are costly and difficult to obtain. Moreover, physical methods are more appropriate for long-term than for short-term power load prediction. Statistical methods are better suited to short-term prediction. In recent years, traditional learning methods and traditional statistical systems have been widely used for predicting short-term power loads. Using historical time-series data of power loads, Cui et al. [8] developed a new load forecasting model by combining the grey linear regression model with a Markov chain, which overcomes the shortcoming of the traditional grey model of ignoring linear factors. Commonly used traditional statistical systems include autoregressive (AR) and autoregressive moving average (ARMA) models [9]. Song et al. [10] proposed a non-parametric hybrid model, and their results show that it generally outperforms other models. However, this method also has some shortcomings.
For example, in multi-step prediction the accuracy of this model is low, and the time series prediction methods in wide use today cannot take the influence of meteorological factors into account [11]. Although some researchers have proposed ways to make time series forecasting methods more adaptable to meteorological changes, such methods still rely on the ARIMA model, which has insufficient explanatory power and cannot fundamentally improve that adaptability [12].
In the 1990s, computers gradually entered all walks of life, and many power load prediction techniques were developed. Artificial neural network models are becoming increasingly popular for power load prediction. The artificial neural network (ANN) is a data-driven, nonlinear, adaptive method with very strong nonlinear modeling capabilities [13]. At present, the main artificial neural network algorithms are the back propagation neural network (BP) and the Elman neural network (Elman), which can approximate any nonlinear function without prior knowledge of the relationship between the predictive model and the data [14]. Moreover, the support vector machine (SVM), originally proposed by Vapnik and colleagues at Bell Labs, is used extensively in power load prediction and has been continuously improved by researchers. Some applications of these artificial intelligence technologies are as follows. Guo et al. [15] proposed a generalized neural network that uses local mean decomposition to make predictions. Xu et al. [16] proposed an improved RBF prediction model that uses a weighted fuzzy clustering algorithm to determine the centers of the basis functions. Li et al. [17] established a hybrid neural network model using an improved gray wolf bionic algorithm; their experimental results show that, in power system prediction, the hybrid model significantly reduces prediction errors compared with the other models considered. The effectiveness of neural network methods in power load forecasting has been extensively verified. To date, deep learning (DL) technology has been applied in the automation of many industries, for example in image and audio detection [18]. Compared with conventional neural networks, deep learning models have more hidden layers, which are intended to let the computer perceive problems in a human-like way by simulating the connections of the neural network in the human brain.
At present, DL has become one of the most attractive technologies in short-term power load prediction as a result of its excellent end-to-end learning capacity, and it offers state-of-the-art forecasting performance [19]. Li et al. [20] combined the convolutional neural network (CNN), long short-term memory neural network (LSTM), and gated recurrent unit (GRU) algorithms and proposed a deep learning prediction model for power load forecasting in Beijing. Massaoudi et al. [21] combined the Savitzky-Golay filter with the bidirectional long short-term memory neural network (BiLSTM) to predict the short-term power load. Experimental results show that the proposed model is highly effective.
Although DL algorithms improve on the prediction accuracy of traditional prediction models, a single prediction system often has prominent shortcomings that lead to unsatisfactory prediction results. Therefore, our study focuses on the combined model [22]. However, the modeling of classical hybrid models still needs to be improved. The high volatility and randomness of power load data affect the learning ability of the prediction model and lead to poor prediction performance when raw data are used directly without processing [23]. Hence, to reduce the random interference in the data sequence and improve prediction performance, data preprocessing methods such as empirical mode decomposition (EMD) and the denoising autoencoder (DA) are applied to time series prediction [24,25]. For example, Ribeiro et al. [26] proposed an adaptive, decomposition-based, heterogeneous ensemble learning model. In the data preprocessing stage, the hyperparameters of complementary ensemble empirical mode decomposition were optimized, and three machine learning models were used to predict the short-term electricity price in the Brazilian market. Stefenon et al. [27] proposed a method that extracts features using the wavelet energy coefficient and combines them with an LSTM to predict power insulator faults. The experimental results show that the method has good prediction accuracy. Although these methods improve prediction accuracy to a certain extent, they still have shortcomings. For example, mode mixing often occurs in EMD, and the residual noise in DA cannot be removed [28,29]. In addition, the final result of a deep learning network depends to some extent on the randomly initialized hidden layer nodes, the input unit length, and the method used to optimize the model hyperparameters, all of which affect the stability of the prediction.
However, there is a trade-off between the number of hidden layer nodes in the network and its training ability. Generally speaking, when there are few hidden layer nodes, prediction accuracy is poor. Up to a point, forecasting accuracy improves as the number of hidden layer nodes increases, but this relationship is limited: beyond a peak, predictive power decreases as more hidden layer nodes are added. Therefore, it is very important to determine the number of hidden layer nodes and the length of the input unit [30]. By reviewing the previous literature, we found that the above prediction methods still have some inherent defects [31]. The shortcomings of these systems are summarized in Table 1.
Based on the above analysis, this paper proposes a new hybrid system that combines data preprocessing technology, an advanced deep learning prediction method, and a bionic optimization algorithm to further improve short-term power load forecasting accuracy. More specifically, the original power load series are decomposed and reconstructed using variational mode decomposition (VMD) to remove the noise of the original load data and extract the data features. Then, we apply the LSTM deep learning method to predict the processed power load data. Finally, a binary encoding genetic optimization algorithm (BEGA), based on swarm intelligent evolution and a bionic strategy, is proposed to find the optimal number of hidden layer nodes and input unit length of the LSTM model. The main contributions and innovations of this research are as follows:
  • The original power load data is decomposed and reconstructed using VMD technology to extract the effective features of the data. This reduces the adverse effects of the instability and irregularity of the original load data on the forecasting model.
  • The long short-term memory neural network is applied to forecast the power load data. This addresses the dependence of the time series on previous data and overcomes the low accuracy and poor stability of traditional models.
  • A binary encoding genetic algorithm is proposed to adaptively decide the hidden layer nodes and the length of the input data unit of the LSTM. This algorithm abandons the traditional decimal coding method and uses binary coding for integer optimization.
  • The adaptive moment estimation (Adam) algorithm is employed for optimizing the model’s hyperparameters, instead of the traditional gradient descent algorithm and stochastic gradient descent algorithm (SGD). This improves the convergence speed and prediction stability of the model.
  • The prediction model optimized by the hybrid optimization method has high prediction accuracy and good stability, thereby effectively improving the accuracy of power load prediction.
Section 2 introduces the specific methods used in the proposed model, including the data preprocessing technology, the LSTM prediction model, and the binary encoding genetic algorithm; Section 3 describes the framework of the VLG model in detail. In Section 4, we conduct three experiments from different angles and examine the experimental results of the combined system and the comparison systems. To further verify the accuracy and effectiveness of the new hybrid system, Section 5 offers a specific discussion. Finally, the results and conclusions are given in Section 6.

2. Related Theory

This section explains the methods used in the proposed hybrid system.

2.1. Variational Mode Decomposition (VMD)

VMD is a completely non-recursive adaptive signal processing technique proposed by Dragomiretskiy et al. [32], which can effectively achieve the adaptive separation of signals in the frequency domain.
Step 1: For each mode, the associated analytic signal is calculated by the Hilbert transform, from which its single-sided spectrum is obtained [33]:
$$\mathcal{H}[f(t)] = \frac{1}{\pi}\,\mathrm{p.v.}\int_{-\infty}^{\infty} \frac{f(v)}{t-v}\,dv$$
Step 2: The spectrum of each mode is shifted to baseband by multiplying the analytic signal by an exponential term $e^{-j\omega_k t}$ tuned to the estimated center frequency $\omega_k$ [34]:
$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t)\right] e^{-j\omega_k t}$$
Step 3: Each mode is assumed to be compact around its center frequency. The bandwidth of the demodulated signal above is estimated through its Gaussian smoothness, that is, the squared $L^2$ norm of its gradient, which gives the constrained variational problem:
$$\min_{\{\mu_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \right\}$$
$$\text{s.t.}\quad \sum_{k} \mu_k = f$$
where $f$ is the original signal, $\mu_k$ is the $k$-th modal function, $\omega_k$ is its center frequency, $\delta(t)$ is the Dirac distribution, and $*$ denotes convolution [35].
Step 4: A quadratic penalty term ensures that the signal still offers good reconstruction accuracy under high noise conditions, while the Lagrange multiplier enforces the constraint strictly. The augmented Lagrangian is [36,37,38]
$$L(\{\mu_k\},\{\omega_k\},\lambda) = \alpha \sum_{k} \left\| \partial_t\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} \mu_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} \mu_k(t) \right\rangle$$
where $\alpha$ is the penalty factor, $\lambda$ is the Lagrange multiplier, and $L$ is the augmented Lagrangian. The solution to the constrained minimization problem of Step 3 is found as the saddle point of the augmented Lagrangian $L$.
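Steps 1 and 2 can be illustrated numerically. The following is a minimal sketch, not the VMD algorithm itself: it builds the analytic signal of a short real sequence by zeroing the negative-frequency half of its discrete spectrum, then demodulates it with $e^{-j\omega_k t}$. The function names, the naive DFT, and the single-cosine test mode are assumptions made for illustration only.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def analytic_signal(x):
    """Step 1: one-sided spectrum, i.e. x + j*Hilbert(x) for a real sequence x."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # keep the DC and Nyquist bins unchanged
        elif k < (n + 1) // 2:
            spectrum[k] *= 2      # double the positive frequencies
        else:
            spectrum[k] = 0       # remove the negative frequencies
    return dft(spectrum, inverse=True)

def shift_to_baseband(z, omega):
    """Step 2: multiply by exp(-j*omega*t), with t counted in samples."""
    return [z[t] * cmath.exp(-1j * omega * t) for t in range(len(z))]

# Hypothetical mode: a pure cosine whose center frequency is known exactly.
n, k0 = 32, 4
omega_k = 2 * math.pi * k0 / n
mode = [math.cos(omega_k * t) for t in range(n)]
baseband = shift_to_baseband(analytic_signal(mode), omega_k)
```

For this idealized mode the demodulated signal is constant, so its gradient, and hence its estimated bandwidth in Step 3, is zero. A real load series would give modes with a nonzero bandwidth that the optimization then minimizes.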

2.2. Long Short Term Memory Neural Network (LSTM)

The LSTM, a special type of RNN, has a strong ability to process time series data and effectively mitigates the vanishing and exploding gradient problems of RNNs. Figure 1 shows the schematic structure of the LSTM.
The three gates in the LSTM all use the sigmoid function $\sigma$ [39]:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Step 1: The LSTM decides which information is no longer valid and discards it from the cell state. This is done by a sigmoid layer called the “forget gate” $f_t$, whose expression is
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $h_{t-1}$ is the hidden state of the previous time step, and $x_t$ is the input vector.
Step 2: The LSTM determines which new information is stored in the cell state. First, the “input gate layer” $i_t$, which also reads $h_{t-1}$ and $x_t$, determines which values will be updated. Next, the candidate information $\tilde{C}_t$ is created by a $\tanh$ layer. The product of $i_t$ and $\tilde{C}_t$ is the new information to be written to the cell state. The expressions for $i_t$ and $\tilde{C}_t$ are as follows [40]:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
Step 3: The previous cell state $C_{t-1}$ is scaled by the forget gate to remove outdated information, and the new information is added, which gives the new cell state $C_t$ [41,42]:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Step 4: The latest cell state $C_t$ is passed through $\tanh$ and multiplied by the output gate vector $O_t$ to obtain the LSTM’s final output state vector $h_t$:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = O_t \odot \tanh(C_t)$$
where $W_f$, $W_i$, $W_C$, and $W_o$ are the weight matrices, $b_f$, $b_i$, $b_C$, and $b_o$ are the bias vectors, and $\odot$ denotes element-wise multiplication.
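The four steps amount to a single update per time step. The sketch below writes that update in plain Python for small vectors. The function names and the dictionary layout of the parameters are assumptions for illustration, and a practical model would use a deep learning library rather than nested lists.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.
    params maps 'f', 'i', 'C', 'o' to (W, b) pairs acting on [h_prev, x_t]."""
    concat = h_prev + x_t                                        # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], concat)]     # forget gate
    i_t = [sigmoid(a) for a in affine(*params["i"], concat)]     # input gate
    c_tilde = [math.tanh(a) for a in affine(*params["C"], concat)]  # candidate
    o_t = [sigmoid(a) for a in affine(*params["o"], concat)]     # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy check with two hidden units, one input feature, and all-zero parameters:
# every gate then equals sigmoid(0) = 0.5 and the candidate equals tanh(0) = 0.
zero = ([[0.0] * 3 for _ in range(2)], [0.0, 0.0])
params = {name: zero for name in "fiCo"}
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
```

With zero parameters the forget gate simply halves the previous cell state, which makes the role of each gate easy to check by hand.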

2.3. Binary Encoding Genetic Algorithm (BEGA)

The BEGA is an improved genetic algorithm with binary coding. The genetic algorithm is a search method based on the evolutionary principle of “survival of the fittest” observed in nature [43]. The basic operations of the binary encoding genetic algorithm are as follows:
Step 1: Binary encoding. The four bases in the human chromosome are simulated with two numbers, 0 and 1, and each variable is described by a string of a certain length consisting of 0s and 1s.
Step 2: Selection operation. The selection operation of the genetic algorithm is a form of roulette. In accordance with the fitness-proportional selection strategy, the selection probability $p_i$ of each individual $i$ is
$$f_i = k / F_i$$
$$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}$$
where $F_i$ is the fitness value of individual $i$, which in this study is the root mean square error between the power load forecast and the true value under the LSTM neural network, $k$ is a constant coefficient, and $N$ is the number of individuals in the population. Taking the reciprocal gives individuals with a smaller error a larger selection probability.
Step 3: Crossover operation. A pair of genetic sequences is cut via two-point crossover, and the cut segments are then recombined, which produces new binary strings from the parents.
Step 4: Mutation operation. Because the coding is binary, bit-flip mutation is used. Each gene value in the individual is replaced by its opposite value with the given mutation probability $P$.
Step 5: Decoding. Assuming that the binary code is to be converted into the decimal range $[-a, a]$ with a precision of $b$ decimal places, the interval must be discretized into $2a \times 10^{b}$ numbers. We first convert the binary code $x_{bin}$ into the decimal number $x_{dec}$. Then, through the decoding formula
$$x = -a + x_{dec} \cdot \frac{2a}{2^{l} - 1}$$
where $l$ is the length of the binary string, we obtain a real number in the $[-a, a]$ interval.
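The operators in Steps 2 to 5 are short enough to sketch directly. The code below is an illustration under assumptions rather than the authors' implementation: the function names, the default `k=1.0`, the seeded random generator, and the use of the bit-string length in the decoding denominator are all choices made here for the example.

```python
import random

def selection_probabilities(errors, k=1.0):
    """Roulette weights: f_i = k / F_i, p_i = f_i / sum_j f_j."""
    f = [k / e for e in errors]
    total = sum(f)
    return [v / total for v in f]

def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points of two equal-length strings."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def bit_flip(bits, p, rng):
    """Flip each bit independently with mutation probability p."""
    return [1 - g if rng.random() < p else g for g in bits]

def decode(bits, a):
    """Map a binary string to a real number in [-a, a]."""
    x_dec = int("".join(map(str, bits)), 2)
    return -a + x_dec * 2 * a / (2 ** len(bits) - 1)

rng = random.Random(0)                        # fixed seed so the demo is repeatable
probs = selection_probabilities([0.5, 1.0, 2.0])   # hypothetical RMSE values
parent1, parent2 = [0] * 8, [1] * 8
child1, child2 = two_point_crossover(parent1, parent2, rng)
mutant = bit_flip(child1, 0.1, rng)
```

Because the selection weight is the reciprocal of the error, the individual with RMSE 0.5 receives the largest share of the roulette wheel. An individual could then be drawn with `rng.choices(population, weights=probs)`.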

2.4. Adaptive Moment Estimation Optimizer (Adam)

Adam is a first-order gradient-based algorithm for optimizing stochastic objective functions, built on adaptive estimates of lower-order moments. The method is computationally efficient, has small memory requirements, and is well suited to optimization problems with a large data volume or many parameters [44]. The algorithm is as follows:
Step 1: Given an objective function $J(\theta)$ with parameters $\theta$, calculate the gradient at time $t$:
$$g_t = \nabla_\theta J(\theta_{t-1})$$
Step 2: Taking the gradient momentum of the previous time steps into account, update the exponential moving average of the gradient, $m_t$, and the exponential moving average of the squared gradient, $v_t$:
$$m_t = \varphi_1 m_{t-1} + (1 - \varphi_1)\, g_t$$
$$v_t = \varphi_2 v_{t-1} + (1 - \varphi_2)\, g_t^2$$
where $\varphi_1$ is the exponential decay rate of the first-moment estimate, which controls how past gradients are weighted, and $\varphi_2$ is the exponential decay rate of the second-moment estimate, which controls the influence of past squared gradients.
Step 3: Since $m_0$ and $v_0$ are initialized to 0, the estimates $m_t$ and $v_t$ are biased toward zero, so they are corrected to alleviate the impact of this bias:
$$\hat{m}_t = m_t / (1 - \varphi_1^t)$$
$$\hat{v}_t = v_t / (1 - \varphi_2^t)$$
Step 4: Update the parameters $\theta$:
$$\theta_t = \theta_{t-1} - \alpha\, \hat{m}_t / \left(\sqrt{\hat{v}_t} + \varepsilon\right)$$
where $\alpha$ represents the step size. The two bias corrections can also be folded into an effective step size $\alpha_t = \alpha \sqrt{1 - \varphi_2^t} / (1 - \varphi_1^t)$ applied to the uncorrected $m_t$ and $v_t$. The built-in parameters of the Adam algorithm are set to $\varphi_1 = 0.9$, $\varphi_2 = 0.999$, and $\varepsilon = 10^{-8}$.
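The update rule can be checked on a one-dimensional problem. The sketch below applies Steps 1 to 4 to a scalar parameter; the function name, the quadratic test objective $J(\theta) = (\theta - 3)^2$, and the step size of 0.1 are assumptions chosen only to make the behavior easy to verify.

```python
import math

def adam_minimize(grad, theta, steps, alpha=0.001, phi1=0.9, phi2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a scalar parameter and return the result."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)                          # Step 1: gradient
        m = phi1 * m + (1 - phi1) * g            # Step 2: first-moment average
        v = phi2 * v + (1 - phi2) * g * g        #         second-moment average
        m_hat = m / (1 - phi1 ** t)              # Step 3: bias correction
        v_hat = v / (1 - phi2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)   # Step 4: update
    return theta

# Hypothetical objective J(theta) = (theta - 3)^2, so the gradient is 2*(theta - 3).
gradient = lambda th: 2 * (th - 3)
theta_one = adam_minimize(gradient, 0.0, steps=1, alpha=0.1)
theta_star = adam_minimize(gradient, 0.0, steps=2000, alpha=0.1)
```

After the first step the bias-corrected ratio $\hat{m}_t / \sqrt{\hat{v}_t}$ has magnitude one, so the parameter moves by almost exactly the step size regardless of the gradient scale. This scale invariance is one reason Adam is often less sensitive to the learning rate than plain gradient descent.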

3. The Formation of the Combined Forecasting System

In this study, a VLG hybrid system for short-term load forecasting is proposed to improve forecasting accuracy. The system uses VMD to remove the noise of the original sequence and obtain a stable time series, which serves as the input feature for model training and provides a high-quality training set. BEGA is then used to adaptively optimize the length L of the input data sample unit of the LSTM and the number N of cell units, allowing the LSTM to achieve optimal performance. The analysis framework of the prediction system is shown in Figure 2. Step 1 is the VMD data preprocessing module, Step 2 is the BEGA and Adam optimization algorithm module, and Step 3 is the LSTM prediction module. The specific instructions are as follows.

3.1. Data Preprocessing Module

Because the original load data have strong volatility and randomness, feeding them into the model directly, without feature extraction, degrades prediction performance [45]. Therefore, an advanced decomposition ensemble method is used to eliminate the high-frequency noise in the power load series and extract the valuable information components from the original data, which can effectively improve prediction accuracy. In this paper, we use VMD to preprocess the data: the original time series is decomposed into several intrinsic mode functions (IMFs), which are then reconstructed into the input series of the prediction model.
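Once the modes are available, the reconstruction step is a simple sum. The sketch below assumes the decomposition has already been computed by some VMD routine and that the modes are sorted in descending order of frequency. The function name, the `drop` parameter, and the toy mode values are hypothetical.

```python
def reconstruct(modes, drop=1):
    """Sum all modes except the `drop` highest-frequency ones.
    `modes` is assumed to be sorted in descending order of frequency."""
    kept = modes[drop:]
    return [sum(values) for values in zip(*kept)]

# Toy modes standing in for the output of a VMD routine.
modes = [
    [0.3, -0.3, 0.3, -0.3],   # high-frequency component treated as noise
    [1.0, 2.0, 1.0, 0.0],     # slower oscillation
    [5.0, 5.0, 5.0, 5.0],     # trend
]
denoised = reconstruct(modes)
```

How many high-frequency modes to discard is a modeling choice; the default of one here is only an example.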

3.2. Optimization Algorithm Module

The module consists of two parts. The first part uses the BEGA to optimize the input unit length L and the number of hidden layer nodes N of the LSTM. In this part of the optimization, the binary genetic sequence is initialized, and L and N are varied within a certain range by the genetic algorithm. The changed L and N are then passed to the LSTM for training, and the loss function is calculated as
$$\text{loss} = \frac{1}{M} \sum_{m=1}^{M} \left(y_m - \hat{y}_m\right)^2$$
where $y_m$ is the true value, $\hat{y}_m$ is the predicted value, and $M$ is the number of samples. The smaller the value of the loss function, the smaller the model prediction error. Next, the population is updated based on the loss function value to obtain the L and N values that minimize the loss. Finally, the input unit length L and the number of hidden layer nodes N that optimize the performance of the model are determined.
The second part is to optimize the hyperparameters in LSTM. Adam is an adaptive objective function optimization algorithm that adaptively calculates the learning rates of different parameters on the basis of the first and second-moment estimates of the gradient to improve the model convergence speed and provide better prediction accuracy.
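The first part of the module can be sketched as a mapping from a chromosome to a candidate pair (L, N) followed by a loss evaluation. In the sketch below the bit widths, the minimum values, and the `evaluate` callback are assumptions; in the actual system the callback would train an LSTM with the given L and N and return its loss, which is replaced here by a toy function.

```python
def bits_to_int(bits, low):
    """Interpret a bit list as an unsigned integer offset by `low`."""
    return low + int("".join(map(str, bits)), 2)

def decode_chromosome(chromosome, l_bits, l_min=1, n_min=1):
    """Split one binary chromosome into (L, N); bit widths and minima are assumed."""
    return (bits_to_int(chromosome[:l_bits], l_min),
            bits_to_int(chromosome[l_bits:], n_min))

def mse_loss(y_true, y_pred):
    """loss = (1/M) * sum_m (y_m - yhat_m)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def best_candidate(population, l_bits, evaluate):
    """Return the (L, N) pair with the smallest loss."""
    return min((decode_chromosome(c, l_bits) for c in population),
               key=lambda ln: evaluate(*ln))

population = [[0, 1, 0, 1, 0, 0, 1, 1], [1, 0, 0, 0, 1, 0, 1, 0]]
toy_loss = lambda L, N: mse_loss([L, N], [9, 11])   # placeholder, not a trained model
best = best_candidate(population, l_bits=4, evaluate=toy_loss)
```

Encoding L and N as integers in one bit string keeps the search space discrete, which matches the integer nature of both quantities.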

3.3. Forecast Module

To make the prediction performance more accurate and stable, the VLG forecasting system is built to forecast the power load data. By constructing the index system, the prediction errors between the prediction results and the original power load data are evaluated.

4. Experiment and Evaluation

Accurate prediction of short-term power load data has considerable practical significance. Therefore, in this paper, we propose a hybrid system to enhance the accuracy of short-term power load prediction, and in this section we compare it with other traditional nonlinear forecasting models to assess its effectiveness. The performance of the hybrid model is evaluated using the power load data of New South Wales, South Australia, and Queensland (30 min power load data for each season in 2013). This section outlines the procedure and conclusions of the experiments.

4.1. Model Evaluation Indicators

Model evaluation criteria play a vital role in judging whether one model performs better than another. However, there is currently no unified standard for evaluating prediction performance, nor can a single evaluation index fully reveal the performance of a given model [46]. Therefore, we construct an index evaluation system to evaluate the present model. This index system includes the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The smaller the values of MAPE, RMSE, and MAE, the more accurate the prediction and the better the model performance. In contrast, the higher the value of R2, the better the prediction model performance. The definition and calculation formula of each index are given in Table 2.
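For reference, the four indicators can be computed as follows. The sketch uses the standard textbook definitions, which are assumed here to match the formulas in Table 2; the function names and the sample values are illustrative only.

```python
import math

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

actual = [100.0, 200.0, 400.0]       # made-up load values
predicted = [110.0, 190.0, 400.0]
scores = {"MAPE": mape(actual, predicted), "RMSE": rmse(actual, predicted),
          "MAE": mae(actual, predicted), "R2": r2(actual, predicted)}
```

MAPE is scale-free, which makes it convenient for comparing regions with different load levels, while RMSE and MAE are expressed in the units of the load itself.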

4.2. Experimental Setup

To assess the performance of the VLG hybrid system for short-term power load prediction, the hybrid system is compared with traditional models for each of its components to explore the impact of the combination on forecasting. We divide the experiment into three parts: Experiment 1, Experiment 2, and Experiment 3. Experiment 1 evaluates the prediction performance of models using VMD and other data preprocessing techniques. Experiment 2 compares the performance of the VLG hybrid system with that of different neural networks and nonlinear prediction models in predicting the short-term power load. Experiment 3 discusses the impact of the adaptive optimization algorithm BEGA on model performance.

4.3. Data Description

In the experiment, since power load data change with the seasons, we selected the 2013 power load data of New South Wales, South Australia, and Queensland in Australia as the experimental verification data. We used the data from January 1 to January 31 as the summer data set, from March 1 to March 31 as the autumn data set, from June 1 to June 30 as the winter data set, and from September 1 to September 30 as the spring data set. The power load data have a time interval of 30 min. In each seasonal data set, the first 80% of the data points were selected as the training set and the last 20% as the test set. Specifically, the lengths of the training and test sets in the summer and autumn data sets of the three states were 1190 and 298, respectively, and those in the spring and winter data sets were 1152 and 288, respectively. When constructing the model input vector, we adopted a rolling acquisition mechanism; in other words, $x(1), x(2), \ldots, x(t-1), x(t)$ was used as the basis for predicting the next value $x(t+1)$. Table 3 presents the descriptive statistics of the original power load data set, training data set, and test data set for the NSW seasonal data.
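The split sizes and the rolling mechanism can be reproduced with a few lines. In this sketch the function names and the window length of 5 are assumptions, and a list of integers stands in for the real half-hourly load values.

```python
def split_train_test(series, train_ratio=0.8):
    """Chronological split: the first 80% for training, the rest for testing."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

def rolling_windows(series, length):
    """Pair each window of `length` consecutive values with the value that follows it."""
    inputs = [series[i:i + length] for i in range(len(series) - length)]
    targets = series[length:]
    return inputs, targets

january = list(range(31 * 48))        # stand-in for 31 days of half-hourly load
train, test = split_train_test(january)
X, y = rolling_windows(train, length=5)
```

A 31-day month at 48 points per day gives 1488 observations, which yields the 1190 and 298 points reported for the summer and autumn data sets.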

4.4. Model Parameter Setting

This section outlines the parameter settings of each component of the proposed hybrid prediction model VLG, including the length of the input unit, the initial learning rate, and other parameter settings of the LSTM, BEGA optimization algorithm, and Adam optimizer.

4.4.1. Parameter Setting of the ANN

Before the experiment, the parameters in the model and optimization algorithm must be defined. For the deep learning algorithm LSTM, the length of the input unit L and the number of hidden layer nodes N are optimized by the BEGA algorithm. The input unit length of BP is 5, and the number of hidden layer nodes is 11. See Table 4 for the other parameter settings.

4.4.2. Parameter Settings of the Optimization Algorithm

Setting the parameters of the BEGA optimization algorithm and Adam optimizer is very important for the short-term power load forecasting accuracy. The specific parameter settings we studied are shown in Table 5.
Figure 3 shows the power load data for South Australia, Queensland, and New South Wales. The data for New South Wales and Queensland are relatively stable, whereas the data for South Australia fluctuate the most frequently. In Figure 3, panel (a) shows the three study regions, and panel (b) shows the feature selection step, that is, the decomposition results of the power load series, with the IMFs arranged in descending order of frequency. The high-frequency IMF is then removed, and the remaining IMFs are reconstructed to obtain the optimal input sequence, whose characteristics are clearly improved compared with the original series.

4.5. Experiment I

In the previous section, we noted that the noise reduction technology of time series plays a crucial role in the performance and prediction accuracy of the model. Based on the research of previous scholars, noise reduction technology can mainly be divided into three major categories: empirical mode decomposition technology (EMD), denoising autoencoder technology (DA), and VMD, which is used in our proposed hybrid system. The principle of VMD was discussed in the previous section. EMD technology involves decomposing the internal model function from the original signal to obtain a series of different intrinsic mode functions (IMFs). This method can decompose non-stationary and non-linear signals into stationary signals with different time scales. The denoising autoencoder (DA) is a type of unsupervised learning that uses an encoder and a decoder. White noise or Gaussian noise is added to the original sequence, and the neural network is continuously iterated to obtain a dimensionality reduction feature expression of the data to achieve the effect of feature extraction.
Experiment 1 aims to verify the influence of the noise-reduced data set on the model and the performance of the VMD noise reduction method. To this end, the prediction accuracy of VMD-LSTM is compared with that of other neural network models based on EMD and DA (i.e., EMD-BP, EMD-LSTM, DA-BP, and DA-LSTM). We performed three experiments on each model to assess its prediction accuracy, and the specific prediction results based on the NSW spring dataset are shown in Table 6.
The details of the experiment are as follows:
(a) Table 6 reports the evaluation index results of VMD-LSTM and the other hybrid systems. VMD-LSTM shows better prediction accuracy and prediction performance on most evaluation indicators. Comparing the single LSTM model with the VMD-LSTM system, the evaluation indices of VMD-LSTM are superior to those of the LSTM model in all cases. At the same time, the prediction accuracy of all noise reduction models is higher than that of the model based on the original data, which indicates that data preprocessing is indispensable for power load prediction.
(b)
Here, we compare the VMD-LSTM model with several other hybrid models. Across the experiments in the three regions, the VMD-LSTM model offers the best MAPE results, with 0.4859%, 0.9352%, and 0.4922%. Ranked by prediction accuracy from high to low, the models are VMD-LSTM, EMD-LSTM, EMD-BP, VMD-BP, DA-LSTM, and DA-BP, so VMD-LSTM has the best prediction accuracy among the six models. The coefficient of determination (R2) reflects how well the predictions fit the actual values. In this experiment, the R2 of VMD-LSTM is the best, with 0.9971, 0.9910, and 0.9967 in the three states. These results also confirm the effectiveness of the VMD noise reduction method employed in this paper.
(c)
Time-series denoising techniques have previously been applied to power load, short-term wind speed, and stock prediction models. Most of these studies only discuss the improvement in model accuracy and performance brought by noise reduction and do not discuss how strongly the denoised sequence correlates with the original time series. Therefore, using the gray correlation method (GC), the Pearson correlation coefficient (PE), and the Spearman correlation coefficient (SP), the differences between noise reduction methods are discussed from the perspective of the correlation between the new sequence and the original sequence. Detailed calculation results are given in Table 7.
Table 7 shows the correlation between the new sequence obtained by different noise reduction methods and the original power load data, in which the new sequence obtained using the VMD method has the highest correlation with the original sequence. In summary, using the VMD noise reduction method to process data not only performs well in improving the prediction accuracy of the model but also maintains more original information in the sequence, making it a more suitable method for the data preprocessing of short-term power load data.
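The three correlation measures used for Table 7 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the gray correlation grade uses the common Deng formulation with a distinguishing coefficient of 0.5, and both series are scaled by their own mean before comparison, which is an assumption the paper does not confirm.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PE) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation coefficient (SP): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def grey_relational_grade(reference, candidate, rho=0.5):
    """Gray correlation grade (GC); rho = 0.5 is an assumed default."""
    mr = sum(reference) / len(reference)
    mc = sum(candidate) / len(candidate)
    # Mean-scaling makes the two series dimensionless (assumed preprocessing).
    delta = [abs(r / mr - c / mc) for r, c in zip(reference, candidate)]
    d_min, d_max = min(delta), max(delta)
    if d_max == 0.0:  # identical shapes: perfect correlation
        return 1.0
    return sum((d_min + rho * d_max) / (d + rho * d_max) for d in delta) / len(delta)

# Hypothetical half-hourly load values and a denoised version of them.
original = [8010.0, 8125.0, 8290.0, 8180.0, 8350.0, 8420.0]
denoised = [8030.0, 8140.0, 8230.0, 8240.0, 8330.0, 8400.0]
print(pearson(original, denoised), spearman(original, denoised),
      grey_relational_grade(original, denoised))
```

A denoising method that preserves more of the original information should score closer to 1 on all three measures.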
Remarks:
Based on the above experiments, the employed VMD-LSTM combined system has the highest prediction performance, and Queensland has the highest prediction accuracy, with a MAPE value of 0.4922%. The average prediction accuracy of the three regions is 0.6378%. This shows that the system has high prediction accuracy and excellent stability. Moreover, in the test index based on the correlation measure, VMD remains superior to other noise reduction models, further verifying the effectiveness of the system.

4.6. Experiment II

Power load data are very sensitive to natural factors such as season and climate, which are among the dominant factors affecting the fluctuation characteristics of the power load. In this part, the 30 min power load data of New South Wales, Queensland, and South Australia from March to April, June to July, September to October, and December to January in 2013 are used as seasonal data for each area. A comparison of the prediction performance of the different predictive methods based on the power load data of New South Wales is shown in Table 8. Moreover, Table 9 and Table 10 compare the prediction performance of the traditional prediction models with that of the LSTM-based combination model employed in this study, using the seasonal data of Queensland and South Australia. In general, the VLG system employed in this research provides better performance than traditional prediction methods.
(a) For New South Wales, based on the annual average of the prediction results, BP, PSO-BP, LSTM, and our proposed hybrid model VLG obtained good accuracy results for all evaluation indicators. The ARIMA model also achieved a prediction effect with a MAPE value of less than 2%. However, the MAPE values of the other two models were higher than 10%, and the performance was poor. The ARIMA model has always been considered to provide superior performance in predicting power load data, and its MAPE value is lower than the values of the other traditional models discussed in this article, which is logical. However, the ARIMA model is not beneficial for long-term rolling forecasting and has certain disadvantages. The BP model and the optimization system based on the BP neural network design have always had good prediction performance in many experiments related to power load prediction. However, their MAPE values are still nearly three times higher than those of our proposed hybrid model VLG, which shows that our proposed hybrid model provides outstanding performance in short-term power load prediction.
For seasonal data, the proposed system obtained the best performance for each seasonal data set. Among the comparison methods, the ARIMA, BP, PSO-BP, and LSTM models all perform well. For the spring data, the forecasting accuracy of the VLG prediction model was the highest, with a MAPE of 0.3081%, RMSE of 30.27, R2 of 0.9979, and MAE of 24.88. For the other seasonal data sets, the VLG model also obtained the best results, with corresponding MAPE values of 0.4271%, 0.2724%, and 0.3717%. Among them, the forecasting accuracy on the autumn data set was the highest for every model, while the forecasting accuracy on the winter data set was generally worse than that of the other seasons, which indirectly indicates that power load forecasting is affected by regional and seasonal factors.
(b) For Queensland, from the perspective of annual average forecasting accuracy, the proposed combined system is still better than the other classic models, with a MAPE value of 0.3486%. Among the remaining eight models, the forecasting accuracy of the VMD-LSTM system ranks second. The performance of the PSO-BP optimization model and the original LSTM model is similar, and their prediction accuracy is excellent. However, the MAPE of the VLG hybrid system proposed in this paper was 0.1624%, 0.5052%, and 0.5499% lower than the values of the above models, respectively. Compared with the results for New South Wales, the prediction accuracy of the Elman and RBF models is improved but is still not ideal compared to the other models. For the seasonal data in Queensland, the VLG model still achieved the best results, with corresponding MAPE values of 0.2602%, 0.3718%, 0.3101%, and 0.4524%, respectively. From the perspective of the coefficient of determination, the goodness of fit of the VLG combination forecasting model was the highest, with R2 values for the four seasonal data sets of 0.9983, 0.9953, 0.9979, and 0.9942, respectively. Here, the general prediction accuracy for the winter feature data set was worse than that of the other seasons. The performance rankings of the models differed little from the rankings based on the New South Wales feature data set.
(c) For South Australia, from the perspective of the annual average prediction accuracy, the performance evaluation index values of the VLG hybrid model were significantly better than those of all other prediction models, and its MAPE value was 0.9800%. The prediction accuracy for South Australia was slightly lower than that for New South Wales and Queensland, but the prediction accuracy of the VLG, VMD-LSTM, LSTM, and PSO-BP models was still excellent. In comparison with the other three systems, the MAPE value of the VLG hybrid model decreased by 0.4066%, 1.2675%, and 1.1927%, respectively. In terms of seasonal feature data, as with the prediction results for Queensland and New South Wales, the VLG hybrid model provided the smallest prediction errors and the greatest point prediction accuracy across the four seasonal data sets compared with the other traditional models. Figure 4 shows the performance comparison between the VLG prediction system and the seven comparison models based on the spring data set of South Australia. In addition, Figure 5 shows the fit between the predicted value of VLG and the actual value of power load based on the four seasonal data sets in South Australia.
Moreover, based on the seasonal feature data of the three states, the seasonal temperature and regional climate were some of the most important factors affecting the performance of short-term power load predictions. Among them, autumn was often the season with the highest power load forecast accuracy. On the contrary, the prediction performance of the three states based on the winter feature data was lower than that of the other three seasons. In terms of regional differences, the annual average forecast performance gap between New South Wales and Queensland was not large, with MAPE values of 0.3717% and 0.3486%. South Australia’s annual average forecast performance was relatively poor, with a MAPE value of 0.9800%, which may be related to the South Australia power load data set.
Remark:
Based on Experiment 2, the performance evaluation value calculated by the hybrid model VLG is superior to the performance evaluation value calculated by any classic single-item model and hybrid model. Therefore, the experimental conclusions indicate that the employed VLG hybrid system performs well in short-term power load predictions. Simultaneously, the seasonal climate and other factors have a certain impact on power load forecasting.
In Figure 6, (a) is a radar chart comparing the annual average MAPE of the different prediction models in the three regions; (b) compares the prediction performance of the different data preprocessing methods; (c) shows the advantages of the BEGA optimization algorithm and the prediction results of the VLG prediction system; and (d) shows, for New South Wales, the percentage by which three indicators of the classic prediction models increase relative to the VLG prediction system.

4.7. Experiment III

The results of Experiment 2 show that the hybrid model based on LSTM has excellent performance in short-term power load prediction. However, LSTM requires the input unit length L and the number of hidden layer nodes N to be configured manually. These configured parameters largely determine the model's ability to forecast the short-term power load. There is no fixed rule for determining appropriate parameters when the LSTM predicts a time series. Therefore, common solutions rely on repeated trials or enumeration to obtain parameters that give accurate predictions. However, methods such as enumeration consume considerable time and may not necessarily select the best parameters. On the other hand, for the LSTM network, the final values of the hyperparameters of each neural unit and gate structure also have a significant impact on the prediction performance. Usually, the gradient descent method (GD) or stochastic gradient descent method (SGD) is used in an experiment to find and select these hyperparameters. The gradient descent method is prone to problems such as convergence to a local optimum and slow convergence. Although the prediction results of the VMD-LSTM were satisfactory, considering the deficiencies of the LSTM and SGD algorithms, we do not consider the prediction performance to have reached its optimal value. In response, the VLG hybrid system uses a binary encoding genetic algorithm (BEGA) and the Adam optimizer to solve these problems.
In this part of the experiment, we focus on the hybrid model VMD-LSTM and the original LSTM neural network and use the Adam optimizer and BEGA for parameter optimization. Here, the performance of the feature data sets of each season in New South Wales under different configurations, with different numbers of input units L and numbers of hidden layer nodes N, was studied. That is, the experience of previous scholars was used to determine the two different sets of L and N values to verify the effects of the BEGA on model accuracy. The specific experimental steps are shown in Table 11. At the same time, using the spring data of New South Wales as the data set, the Adam optimizer is compared with the stochastic gradient descent method to discuss the differences in prediction performance produced by different hyperparameter optimization methods. The specific experimental steps are shown in Table 12.
As can be determined from Table 11, the VLG model using the binary encoding genetic algorithm provides better prediction results than the model whose parameters were selected by experience in each seasonal data set. For the four seasonal data sets, the optimal input unit length L and number of hidden layer nodes N selected by the BEGA are 12–8, 12–8, 25–12, and 4–6, respectively, and the corresponding MAPE values are 0.2602%, 0.3718%, 0.3101%, and 0.3758%, respectively. It can be seen from the MAPE values that using the BEGA algorithm to select the optimal unit length L and the number of cell units N can increase the model accuracy by up to 30%. Based on a comparison of the RMSE, MAE, and R2 values, the performance improvement of the VLG system on the spring data set was the highest, with MAE, RMSE, and R2 values of 15.53, 20.46, and 0.9986, respectively. Compared with selecting the parameters by experience, the MAE and RMSE declined on average by 10.72 and 15.37, respectively. On this four-season feature data set, our proposed VLG model using the BEGA algorithm has very good prediction performance and can accurately predict future short-term power load changes.
Table 12 shows the performance advantages and disadvantages of different optimizers based on the LSTM and VMD-LSTM systems. The results show that the predictions of LSTM networks with different structures differ considerably. Through a longitudinal comparison, it can be seen that using the Adam optimizer is much better than using stochastic gradient descent for prediction. Under the same data set, the MAPE values of the LSTM and VMD-LSTM models using the Adam optimizer were 0.8744% and 0.2281%, while the MAPE values of the LSTM and VMD-LSTM models using the SGD algorithm were 1.071% and 0.6253%. Thus, compared to using the Adam optimizer, employing the SGD algorithm increased the MAPE by 22% and 174%, respectively. This result shows the importance of the Adam optimizer for model prediction accuracy and that this optimizer significantly improves the stability and accuracy of the predictions. Moreover, through a horizontal comparison, it can be further shown that using the optimal number of input units L and the number of cell units N (4–6) determined by the BEGA algorithm gives the model the best performance in terms of accuracy and stability, thereby significantly improving model performance.
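The 22% and 174% figures follow from the relative increase in MAPE when moving from Adam to SGD. The short check below reproduces them from the four MAPE values quoted above; the helper name `relative_increase` is introduced here only for illustration.

```python
def relative_increase(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline * 100.0

# MAPE values (in %) quoted in the text for Adam and SGD.
lstm_increase = relative_increase(0.8744, 1.071)
vmd_lstm_increase = relative_increase(0.2281, 0.6253)

print(f"LSTM: {lstm_increase:.1f}%, VMD-LSTM: {vmd_lstm_increase:.1f}%")
```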
Remarks:
In terms of prediction performance and prediction accuracy, using the proposed BEGA algorithm and the Adam optimizer yields significantly better evaluation index values than selecting the parameters by experience and training with the stochastic gradient descent method. This further confirms that the VLG hybrid system has outstanding performance in short-term power load prediction.

5. Discussion

This section provides an in-depth exploration of the above experimental results, covering the effectiveness of the hybrid system, the performance differences between the proposed optimized hybrid model and other optimized hybrid prediction systems, and the improvements in the evaluation indices of the proposed hybrid system.

5.1. Effectiveness of the Proposed System

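First, the prediction error is a vital indicator of the performance of a prediction system. In this section, the validity of the developed hybrid model is tested with the Diebold-Mariano (DM) test, and the significance of the differences in prediction error between models is demonstrated via hypothesis testing. The VLG system is then compared with other models. The Diebold-Mariano test evaluates the differences between prediction systems according to their prediction errors [47]. The alternative hypothesis H1 and the null hypothesis H0 are presented in Equations (25) and (26), respectively, followed by the DM statistic:
$$H_1: E\left[L\left(err_i^1\right)\right] \neq E\left[L\left(err_i^2\right)\right] \tag{25}$$
$$H_0: E\left[L\left(err_i^1\right)\right] = E\left[L\left(err_i^2\right)\right] \tag{26}$$
$$DM = \frac{\bar{d}}{\sqrt{S^2/n}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n}\left(L\left(err_i^1\right) - L\left(err_i^2\right)\right)$$
Here, L is the loss function, err_i^1 and err_i^2 are the prediction errors of the two systems being compared, n is the number of forecasts, and S^2 is an estimate of the variance of the loss differential d_i = L(err_i^1) - L(err_i^2).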
The DM test results for the employed system and the other comparative models are shown in Table 13.
Based on the above results, the following conclusions can be drawn:
(1)
Comparing the forecasting errors of the different hybrid systems, the DM test values of all prediction models exceed the upper critical value at the 1% significance level;
(2)
The Diebold-Mariano test was performed on the prediction errors of four different traditional single models, and the test values against the VLG were all higher than the upper critical value at the 1% significance level;
(3)
The minimum DM value in the comparison between the VLG combination system and the LSTM models using other data preprocessing technologies is 2.0104, which still exceeds the critical value at the 5% significance level (1.96).
Therefore, according to the Diebold-Mariano test results, it can reasonably be concluded that the employed prediction model not only has greater prediction capacity than the other systems but also differs from them significantly in prediction accuracy for short-term power load forecasting.
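The Diebold-Mariano statistic discussed in this subsection can be sketched in a few lines. The version below is a simplified illustration, not the authors' code: it assumes a squared-error loss, one-step-ahead forecasts (so no autocovariance correction is applied to the variance), the sample variance as the estimate S^2, and a standard normal reference distribution. The function names are hypothetical.

```python
import math

def dm_statistic(errors_1, errors_2, loss=lambda e: e * e):
    """DM statistic for two error series; positive means model 1 has the larger loss."""
    d = [loss(a) - loss(b) for a, b in zip(errors_1, errors_2)]
    n = len(d)
    d_bar = sum(d) / n
    # Sample variance of the loss differential (assumed estimator for S^2).
    s2 = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    return d_bar / math.sqrt(s2 / n)

def two_sided_p_value(dm):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(dm) / math.sqrt(2.0))

# Hypothetical forecast errors of a comparison model and of a better model.
errors_comparison = [1.2, -0.9, 1.5, -1.1, 0.8, -1.4, 1.0, -0.7]
errors_better = [0.3, -0.2, 0.4, -0.1, 0.2, -0.3, 0.1, -0.2]
dm = dm_statistic(errors_comparison, errors_better)
print(dm, two_sided_p_value(dm))
```

Under these assumptions, a statistic whose absolute value exceeds 1.96 rejects the null hypothesis of equal accuracy at the 5% level, which is the comparison made for the value 2.0104 above.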

5.2. Model Stability Study

This section examines the stability of the model through two sets of repeated experiments that estimate the prediction ability of the VLG system. By comparing the prediction stability of three systems (LSTM, BP, and PSO-BP), the prediction stability of LSTM, the core part of the hybrid system employed in this paper, is verified. All three models use raw data and can be modeled without denoising. Based on the spring feature data set in Queensland, Australia, 20 repeated experiments were conducted on the three models to explore the volatility of their prediction accuracy. As is well known, variance reflects the robustness and volatility of a forecasting system. The smaller the standard deviation of the prediction error, the more robust the prediction system and the weaker its volatility. Therefore, the standard deviation of the prediction error is used to appraise the robustness of the employed combination prediction system and the comparison systems.
Figure 7a illustrates the differences in forecasting stability between the three systems over 20 tests. The results show that the BP has the largest volatility: The maximum MAPE is 1.5648%, the minimum is 0.9186%, and the standard deviation is 0.2022. In contrast, the prediction capacity of the LSTM model is better and more stable. The maximum value of MAPE is 0.9126%, the minimum value is 0.8241%, and the standard deviation is only 0.0367. Among these ANNs, the LSTM model has the best prediction accuracy and stability, as well as the smallest standard deviation, which reflects the stability advantages of the LSTM model.
Moreover, the excellent prediction performance of these models depends to a great extent on the optimization method of the model parameters. Different parameter optimization methods have certain effects on the forecasting accuracy and convergence speed of the system. To ensure that only a single variable changes, the LSTM model was trained with the Adam optimizer and with the stochastic gradient descent (SGD) method for 100 trials, and the differences between the two optimization methods were compared. Figure 7b shows the results. The average MAPE of the LSTM model obtained with the Adam optimizer is 0.8457%, which is significantly better than that obtained with SGD. In the scatter plot of MAPE, the MAPE obtained with the Adam optimizer lies between 0.8% and 0.9%, and the standard deviation is 0.0341. Moreover, the band of the scatter plot is narrower, indicating that the model prediction is more stable and has smaller prediction volatility. In contrast, the MAPE value of the model using the stochastic gradient descent method lies between 0.9% and 1.3%, with a standard deviation of 0.2451. Here, the prediction error of the model is larger, the stability is worse than with the Adam optimizer, the prediction volatility is large, and the prediction performance is unstable.
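The stability measure used in this subsection, the spread of MAPE over repeated training runs, can be computed as in the following sketch. The run values are made up for illustration, the function names are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the paper does not state which was used.

```python
import statistics

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)

def stability_summary(mape_values):
    """Summarize the MAPE values collected from repeated runs of one model."""
    return {
        "min": min(mape_values),
        "max": max(mape_values),
        "mean": statistics.mean(mape_values),
        "std": statistics.stdev(mape_values),  # sample standard deviation (assumed)
    }

# Hypothetical MAPE values (in %) from five repeated runs of two models.
runs_stable = [0.84, 0.86, 0.85, 0.83, 0.87]
runs_volatile = [0.95, 1.40, 1.10, 1.55, 0.92]
print(stability_summary(runs_stable))
print(stability_summary(runs_volatile))
```

A model whose summary shows both a lower mean and a smaller standard deviation is more accurate and more stable in the sense used here.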

5.3. Multi-Step Prediction and Result Analysis

In the experiments in Section 4, the forecast model was used to predict the next value of the power load data, that is, to make a single-step forecast. This section compares three different hybrid models to appraise the prediction performance of the employed hybrid system VLG in a multi-step prediction test.
Unlike the comparison models in the previous section, this experiment aims to verify whether the new VLG model is comparable with two other VMD-based hybrid models (i.e., VMD-GWO-SVM and VMD-PSO-BP). SVM and BP models perform well in time series forecasting problems and are also generally employed in short-term power load prediction. For hyperparameter optimization, the two comparison models used the PSO optimization algorithm and the gray wolf optimization algorithm, respectively. Using the spring data of Queensland, the optimized BP and SVM models are compared with the VLG hybrid system to verify its accuracy against excellent classic hybrid models (see Table 14 and Figure 8 for comparison results). In Figure 8, (a) shows the forecast and actual values of the proposed model and the classical hybrid prediction models in the multi-step prediction, and (b) shows the index results of the proposed prediction system and the classical hybrid prediction models.
For the spring data of Queensland, in the one-step forecast, the proposed combined model VLG and the comparative model VMD-PSO-BP showed no significant differences in the various evaluation indicators, but VLG still obtained the best MAPE, MAE, and RMSE, with 0.2448%, 18.13, and 23.54, respectively. In the one-step prediction, although the employed system does not show an outstanding advantage over the classic combined model VMD-PSO-BP in prediction accuracy and performance, its performance remains no worse than that of any other model. In the two-step prediction, the VLG achieves obvious advantages, and its MAPE, MAE, and RMSE are 0.7653%, 48.77, and 56.73, respectively. By contrast, the VMD-PSO-BP model shows no significant difference in prediction accuracy compared to the VMD-GWO-SVM in the two-step prediction. The prediction accuracy of both is far poorer than in the one-step prediction. Their MAPE values are 1.6333% and 1.9159%, which are 0.868% and 1.1506% higher than the value of the VLG system. In the three-step prediction, the MAPEs of the three hybrid systems are all greater than 1%, and the MAPE of the proposed VLG combined system is 1.35655%. This system still offers the greatest prediction ability among the three VMD-based combined systems. In terms of the percentage improvement in MAPE, the advantage of the VLG system was greatest in the two-step prediction, at 60.05% and 53.14%, respectively. Figure 8 compares the forecasting performance for steps 1, 2, and 3 on the spring data of Queensland. In the three-step prediction as well, the VLG hybrid system is still the most accurate and valid prediction system.
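One common way to obtain two-step and three-step forecasts from a one-step model is the recursive strategy, in which each prediction is fed back as an input for the next step. The paper does not state which multi-step strategy was used, so the sketch below is only an illustration of that general technique; `recursive_forecast` and the toy `predict_next` model are hypothetical.

```python
def recursive_forecast(predict_next, history, window, steps):
    """Forecast `steps` values ahead by feeding each prediction back as input."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        y_hat = predict_next(buffer)
        forecasts.append(y_hat)
        # Slide the input window: drop the oldest value, append the prediction.
        buffer = buffer[1:] + [y_hat]
    return forecasts

def predict_next(window_values):
    """Toy one-step model: linear extrapolation from the last two values."""
    return window_values[-1] + (window_values[-1] - window_values[-2])

print(recursive_forecast(predict_next, [1.0, 2.0, 3.0, 4.0], window=3, steps=3))
```

Because later steps depend on earlier predictions rather than on observed values, errors accumulate with the horizon, which is consistent with the rising MAPE values reported for steps 2 and 3.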

5.4. Improvement of the Evaluation Index

In previous index evaluation systems, the MAPE values of each prediction model were too small, and the RMSEs were different because of the differences in the data dimension, making it challenging to intuitively display the degree of differences in the model prediction accuracy [48]. In this study, we use the percentage improvements of the MAPE and RMSE criteria. In this way, a comprehensive analysis of the proposed combined system can be carried out. The definition is as follows:
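$$P_{\mathrm{MAPE}} = \frac{\mathrm{MAPE}_1 - \mathrm{MAPE}_2}{\mathrm{MAPE}_1} \times 100\%$$
$$P_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_1 - \mathrm{RMSE}_2}{\mathrm{RMSE}_1} \times 100\%$$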
The improved MAPE and RMSE indicators are shown in Table 15. Considering the results in Table 15, the prediction capacity of the employed combined system is discussed and analyzed as follows:
(a)
The predictive capacity of the system employed in this study is clearly commendable.
(b)
The percentage improvement of the evaluation indices shows a clear decreasing trend. This indicates that the prediction accuracy of the system improves gradually, with the data preprocessing technology and the optimization algorithm playing a vital role in improving its prediction ability.
(c)
Notably, the VLG hybrid system presents obvious advantages over other systems.
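The percentage-improvement indices defined in this subsection are simple to compute. The sketch below assumes that index 1 refers to the comparison model and index 2 to the proposed system, which matches the two-step improvements reported in Section 5.3; the function name is hypothetical.

```python
def percentage_improvement(comparison_value, proposed_value):
    """Relative reduction of an error index (MAPE or RMSE), in percent."""
    return (comparison_value - proposed_value) / comparison_value * 100.0

# Two-step MAPE values (in %) quoted in Section 5.3.
vlg_mape = 0.7653
improvement_a = percentage_improvement(1.9159, vlg_mape)
improvement_b = percentage_improvement(1.6333, vlg_mape)
print(f"{improvement_a:.2f}% {improvement_b:.2f}%")
```

Up to rounding, these two values correspond to the 60.05% and 53.14% improvements quoted for the two-step prediction.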

5.5. Future and Prospects

The hybrid prediction system based on deep learning proposed in this study overcomes the shortcomings of traditional prediction models. In the rapidly developing intelligent information age, accurate load forecasting has become an indispensable part of the power load field and plays a vital role in the safe operation, daily distribution, and economy of the power system. From artificial neural networks, to machine learning, and finally to deep learning, load forecasting models are approaching ever closer to the actual situation. Among existing prediction models, those constructed with deep learning methods such as LSTM and CNN are clearly better than those constructed with artificial neural networks and traditional machine learning algorithms. A hybrid prediction model combining an intelligent optimization algorithm and a deep learning network has higher practical application value and stronger extensibility, making it able to more easily fit nonlinear time series with strong volatility.
In addition to traditional load forecasting, intermittent renewable energy, such as photovoltaic power generation and wind power generation, features stronger volatility, randomness, and instability. However, research on predicting intermittent renewable energy using deep learning methods is still very limited. The development of a comprehensive and effective deep learning method is expected to become a new direction of smart grid research. The application of artificial intelligence algorithms in the field of new energy is a new typical application scenario for artificial intelligence and also provides a new intelligent solution for building a global low-carbon energy future.

6. Conclusions

Short-term power load predictions play a crucial role in the safe operation and risk assessment of the power grid, which has attracted considerable attention among scholars. Because of the inherent uncertainty and randomness of power load sequences, determining how to efficiently and effectively predict the power load is still a challenging task. A series of studies has been carried out to enhance the performance of power load prediction. Unfortunately, these methods are mostly restricted to using a single prediction model to predict power load sequences; however, any single model will have inherent shortcomings. Moreover, most previous studies have not considered the effect of data preprocessing and sequence noise on the model prediction accuracy. In response, this paper proposed a new predictive analysis system that overcomes the above shortcomings and provides an effective technical means for short-term power load analysis and monitoring. In the developed model, variational modal decomposition (VMD) was employed to divide the original sequence from high to low into a set of components. Through reconstruction, the global noise was effectively removed. Then, the long short-term memory neural network (LSTM) was used instead of the classical neural network to predict the power load data, which effectively improved the prediction accuracy. Finally, to further improve the modeling performance and robustness, an improved binary coding genetic algorithm was proposed based on the genetic algorithm; this algorithm achieved high accuracy and maintained strong stability. The effectiveness of the algorithm was verified by experiments. The developed predictive analysis system was used to predict the seasonal data of New South Wales, Queensland, and South Australia and to calculate multiple performance indicators (MAPE, RMSE, MSE, and PMAPE).
The experimental results indicate that the annual average MAPE values of the employed VLG system for New South Wales, Queensland, and South Australia are 0.3717%, 0.3486%, and 0.9800%, respectively, which are better than the values of the comparison models. As with traditional high-performing hybrid models, the multi-step prediction of this model also provides strong prediction performance. In general, the proposed predictive analysis system shows excellent performance in analyzing and monitoring short-term power loads. Specifically, this system not only deeply analyzes the information related to people's activities and lives but can also accurately and steadily approach the actual values. Therefore, future power plant decision-makers and grid investors could make reasonable decisions based on the system presented in this article to monitor and predict power loads.

Author Contributions

Conceptualization, Y.J. Methodology, J.W. Visualization, A.S. Writing—review and editing, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Foundation of Liaoning Social Science Federation [Grant No. 20201slktyb-031].

Acknowledgments

This research was supported by Foundation of Liaoning Social Science Federation [Grant No. 20201slktyb-031]. And all the authors do not have any possible conflicts of interest.

Conflicts of Interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work. This manuscript is our own work and the content of this paper has not been copied from elsewhere. This manuscript has not been published before nor submitted to another journal for the consideration of publication and all data measurements are genuine results and have not been manipulated. In addition, none of the authors have any financial or scientific conflicts of interest with regard to the research described in this manuscript.

Abbreviations

List of terminologies (method and indices)
Abbreviation | Meaning | Abbreviation | Meaning
EMD | Empirical mode decomposition | DA | Denoising autoencoder
VMD | Variational modal decomposition | BP | Back propagation neural network
LSTM | Long short-term memory neural network | RBF | Radial basis function
Elman | Elman neural network | SVM | Support vector machine
ARMA | Autoregressive moving average model | ARIMA | Autoregressive integrated moving average model
PSO | Particle swarm optimization algorithm | GWO | Grey wolf optimization algorithm
EMD-BP | BP after EMD technology | DA-BP | BP after DA technology
VMD-BP | BP after VMD technology | EMD-LSTM | LSTM after EMD technology
DA-LSTM | LSTM after DA technology | VMD-LSTM | LSTM after VMD technology
GC | Grey correlation method | IMFs | Intrinsic mode functions
SP | Spearman correlation coefficient | PE | Pearson correlation coefficient
MAPE | Mean absolute percentage error | MAE | Mean absolute error
RMSE | Root mean square error | R2 | Coefficient of determination
AI | Artificial intelligence | PSO-BP | BP after PSO optimization algorithm
Adam | Adaptive moment estimation optimization | SGD | Stochastic gradient descent
BEGA | Binary encoding genetic algorithm | EMD-LSTM-GA | LSTM after EMD and GA optimization
DA-LSTM-GA | LSTM after DA and GA optimization | VMD-PSO-BP | BP after PSO and VMD optimization
VMD-GWO-SVM | SVM after VMD and GWO optimization | DM | Diebold-Mariano test
DL | Deep learning | RNN | Recurrent neural networks
CNN | Convolution neural network | BiLSTM | Bi-directional long short-term memory neural network
ANNs | Artificial neural networks | PMAPE | The improvement in MAPE values
ADMM | Alternate direction method of multipliers | VLG | The combined VMD-BEGA-LSTM model
List of terminologies (parameters and variables)
| Symbol | Meaning | Symbol | Meaning |
|---|---|---|---|
| $\omega_k$ | Center pulse frequency | $N_1$ | Population size of the GA |
| $f$ | Actual signal | $\alpha$ | Initial learning rate |
| $\lambda$ | Lagrange factor | $x_t$ | Corresponding input |
| $u_k(\omega)$ | Modal function of VMD technology | $L_1$ | Binary encoding length |
| $f_t$ | Forget gate | $i_t$ | Input gate |
| $O_t$ | Output gate | $C_t$ | Memory cell state |
| $h_t$ | Input at time $t-1$ | sigmoid | Input gate layer |
| $W_f$ | Coefficient matrix | $b_f$ | Bias vector |
| $g_t$ | Gradient at time step $t$ | $m_t$ | Gradient mean |
| $v_t$ | Exponential moving average | $\beta$ | Exponential decay rate |
| $p_i$ | Selection probability in the genetic algorithm | $x_{bin}$ | Binary code |
| $x_{dec}$ | Decimal code | $L$ | Input unit length |
| $N$ | Number of cells in the hidden layer | loss | Loss function value of the training model |
| $y_m$ | True value of the data | $\hat{y}_m$ | Value forecast by the model |
| iter | Model iterations | $\bar{y}$ | Average of the sequence data |
The main terminologies mentioned in this paper (including indices, methods, variables and parameters).

References

  1. Yang, W.; Wang, J.; Tong, N. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Appl. Energy 2019, 235, 1205–1225. [Google Scholar] [CrossRef]
  2. Guo, Z.; Zhou, K.; Zhang, X.; Yang, S. A deep learning model for short-term power load and probability density forecasting. Energy 2018, 160, 1186–1200. [Google Scholar] [CrossRef]
  3. Zhang, X.; Wang, J. A novel decomposition-ensemble model for forecasting short-term load-time series with multiple seasonal patterns. Appl. Soft Comput. 2018, 65, 478–494. [Google Scholar] [CrossRef]
  4. Wang, R.; Wang, J.; Xu, Y. A novel combined model based on hybrid optimization algorithm for electrical load forecasting. Appl. Soft Comput. 2019, 82, 105548. [Google Scholar] [CrossRef]
  5. He, Q.; Wang, J.; Lu, H. A hybrid system for short-term wind speed forecasting. Appl. Energy 2018, 226, 756–771. [Google Scholar] [CrossRef]
  6. Zhao, H.; Han, X.; Guo, S. DGM (1, 1) model optimized by MVO (multi-verse optimizer) for annual peak load forecasting. Neural Comput. Appl. 2018, 30, 1811–1825. [Google Scholar] [CrossRef]
  7. Yang, Z.; Niu, H. Research on urban distribution network planning management system based on load density method. Eng. Technol. Res. 2018, 8, 76–77. [Google Scholar]
  8. Cui, Q.; Shu, J.; Wu, Z.; Huang, L.; Yao, W.; Song, X. Medium- and long-term load forecasting based on glrm model and MC error correction. New Energy Progress 2017, 5, 472–477. [Google Scholar]
  9. Jaihuni, M.; Basak, J.K.; Khan, F.; Okyere, F.G.; Arulmozhi, E.; Bhujel, A.; Park, J.; Hyun, L.D.; Kim, H.T. A Partially Amended Hybrid Bi-GRU—ARIMA Model (PAHM) for Predicting Solar Irradiance in Short and Very-Short Terms. Energies 2020, 13, 435. [Google Scholar] [CrossRef] [Green Version]
  10. Song, J.; Wang, J.; Lu, H. A novel combined model based on advanced optimization algorithm for short-term wind speed forecasting. Appl. Energy 2018, 215, 643–658. [Google Scholar] [CrossRef]
  11. Lydia, M.; Kumar, S.S.; Selvakumar, A.I.; Kumar, G.E.P. Linear and nonlinear auto-regressive models for short-term wind speed forecasting. Energy Convers. Manag. 2016, 112, 115–124. [Google Scholar] [CrossRef]
  12. Kavasseri Rajesh, G.; Seetharaman, K. Day-ahead wind speed forecasting using f-ARIMA models. Renew. Energy 2009, 34, 1388–1393. [Google Scholar] [CrossRef]
  13. Zhang, X.; Wang, J.; Gao, Y. A Hybrid Short-Term Electricity Price Forecasting Framework: Cuckoo Search-Based Feature Selection with Singular Spectrum Analysis and Svm. Energy Econ. 2019, 81, 899–913. [Google Scholar] [CrossRef]
  14. Liu, J.; Liu, X.; Le, B.T. Rolling Force Prediction of Hot Rolling Based on GA-MELM. Complexity 2019, 3476521. [Google Scholar] [CrossRef]
  15. Fan, G.F.; Guo, Y.H.; Zheng, J.M.; Hong, W.C. A generalized regression model based on hybrid empirical mode decomposition and support vector regression with back--propagation neural network for mid--short--term load forecasting. J. Forecast. 2020, 39. [Google Scholar] [CrossRef]
  16. Xu, P. Research on Load Forecasting Method Based on Fuzzy Clustering and RBF Neural Network; Guangxi University: Nanning, China, 2012. [Google Scholar]
  17. Xingjun, L.; Zhiwei, S.; Hongping, C.; Mohammed, B.O. A new fuzzy--based method for load balancing in the cloud--based Internet of things using a grey wolf optimization algorithm. Int. J. Commun. Syst. 2020, 33. [Google Scholar] [CrossRef]
  18. Almalaq, A.; Edwards, G. A review of deep learning methods applied on load forecasting. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, Cancun, Mexico, 18–21 December 2017; pp. 511–516. [Google Scholar]
  19. Ryu, S.; Noh, J.; Kim, H. Deep neural network based demand side short term load forecasting. Energies 2016, 10, 3. [Google Scholar] [CrossRef]
  20. Massaoudi, M.S.; Refaat, S.; Abu-Rub, H.; Chihi, I.; Oueslati, F.S. PLS-CNN-BiLSTM: An End-to-End Algorithm-Based Savitzky–Golay Smoothing and Evolution Strategy for Load Forecasting. Energies 2020, 13, 5464. [Google Scholar] [CrossRef]
  21. Li, H.; Liu, H.; Ji, H.; Zhang, S.; Li, P. Ultra-Short-Term Load Demand Forecast Model Framework Based on Deep Learning. Energies 2020, 13, 4900. [Google Scholar] [CrossRef]
  22. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 2017, 841–851. [Google Scholar] [CrossRef]
  23. Zhang, W.; Qu, Z.; Zhang, K.; Mao, W.; Ma, Y.; Fan, X. A combined model based on CEEMDAN and modified flower pollination algorithm for wind speed forecasting. Energy Convers. Manag. 2017, 136, 439–451. [Google Scholar] [CrossRef]
  24. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  25. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 2009, 1–41. [Google Scholar] [CrossRef]
  26. Ribeiro, M.H.D.M.; Stefenon, S.F.; De Lima, J.D.; Nied, A.; Mariani, V.C.; Coelho, L.S. Electricity Price Forecasting Based on Self-Adaptive Decomposition and Heterogeneous Ensemble Learning. Energies 2020, 13, 5190. [Google Scholar] [CrossRef]
  27. Stefenon, S.F.; Ribeiro, M.H.D.M.; Nied, A.; Mariani, V.C.; Dos Santos Coelho, L.; Da Rocha, D.F.M.; Grebogi, R.B.; De Barros Ruano, A.E. Wavelet group method of data handling for fault prediction in electrical power insulators. Int. J. Electr. Power Energy Syst. 2020, 123. [Google Scholar] [CrossRef]
  28. He, X.; Nie, Y.; Guo, H.; Wang, J. Research on a Novel Combination System on the Basis of Deep Learning and Swarm Intelligence Optimization Algorithm for Wind Speed Forecasting. IEEE Access 2020, 8, 51482–51499. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Pan, G.; Chen, B.; Han, J.; Zhao, Y.; Zhang, C. Short-term wind speed prediction model based on GA-ANN improved by VMD. Renew. Energy 2020, 156, 1373–1388. [Google Scholar] [CrossRef]
  30. He, Z.; Chen, Y.; Shang, Z.; Li, C.; Li, L.; Xu, M. A novel wind speed forecasting model based on moving window and multi-objective particle swarm optimization algorithm. Appl. Math. Model. 2019, 76, 717–740. [Google Scholar] [CrossRef]
  31. Zhu, C.; Teng, K. An early fault feature extraction method for rolling bearings based on variational mode decomposition and random decrement technique. Vibroeng. Procedia 2018, 18, 41–45. [Google Scholar]
  32. Chen, X.; Yang, Y.; Cui, Z.; Shen, J. Wavelet Denoising for the Vibration Signals of Wind Turbines Based on Variational Mode Decomposition and Multiscale Permutation Entropy. IEEE Access 2020, 8, 40347–40356. [Google Scholar] [CrossRef]
  33. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
  34. Zhao, N.; Mao, Z.; Wei, D.; Zhao, H.; Zhang, J.; Jiang, Z. Fault Diagnosis of Diesel Engine Valve Clearance Based on Variational Mode Decomposition and Random Forest. Appl. Sci. 2020, 10, 1124. [Google Scholar] [CrossRef] [Green Version]
  35. Song, E.; Ke, Y.; Yao, C.; Dong, Q.; Yang, L. Fault Diagnosis Method for High-Pressure Common Rail Injector Based on IFOA-VMD and Hierarchical Dispersion Entropy. Entropy 2019, 21, 923. [Google Scholar] [CrossRef] [Green Version]
  36. Wang, J.; Wang, S.; Yang, W. A novel non-linear combination system for short-term wind speed forecast. Renew. Energy 2019, 143, 1172–1192. [Google Scholar] [CrossRef]
  37. Sun, H.; Fang, L.; Zhao, F. A fault feature extraction method for single-channel signal of rotary machinery based on VMD and KICA. J. Vibroeng. 2019, 21, 370–383. [Google Scholar] [CrossRef]
  38. Lin, H.; Hua, Y.; Ma, L.; Chen, L. Application of ConvLSTM network in numerical temperature prediction interpretation. In Proceedings of the ICMLC ′19—2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; pp. 109–113. [Google Scholar] [CrossRef]
  39. Wang, J.Q.; Du, Y.; Wang, J. LSTM based long-term energy consumption prediction with periodicity. Energy 2020, 197, 117197. [Google Scholar] [CrossRef]
  40. Sakinah, N.; Tahir, M.; Badriyah, T.; Syarif, I. LSTM with adam optimization-powered high accuracy preeclampsia classification. In Proceedings of the 2019 International Electronics Symposium (IES), Surabaya, Indonesia, 27–28 September 2019; pp. 314–319. [Google Scholar] [CrossRef]
  41. Li, C.; Xie, C.; Zhang, B.; Chen, C.; Han, J. Deep Fisher discriminant learning for mobile hand gesture recognition. Pattern Recognit. 2018, 77, 276–288. [Google Scholar] [CrossRef] [Green Version]
  42. Qin, X.; Zhang, W.; Gao, S.; He, X.; Lu, J. Sensor fault diagnosis of autonomous underwater vehicle based on LSTM. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 6067–6072. [Google Scholar] [CrossRef]
  43. Li, H.; Wang, J.; Li, R.; Lu, H. Novel analysis–forecast system based on multi-objective optimization for air quality index. J. Clean. Prod. 2019, 208, 1365–1383. [Google Scholar] [CrossRef]
  44. Bera, S. Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification. Int. J. Remote Sens. 2020, 41. [Google Scholar] [CrossRef]
  45. Yang, W.; Wang, J.; Wang, R. Research and Application of a Novel Hybrid Model Based on Data Selection and Artificial Intelligence Algorithm for Short Term Load Forecasting. Entropy 2017, 19, 52. [Google Scholar] [CrossRef] [Green Version]
  46. Yang, W.; Wang, J.; Niu, T.; Du, P. A Novel System for Multi-Step Electricity Price Forecasting for Electricity Market Management. Appl. Soft Comput. 2020, 88. [Google Scholar] [CrossRef]
  47. He, B.; Ying, N.; Jianzhou, W. Electric Load Forecasting Use a Novelty Hybrid Model on the Basic of Data Preprocessing Technique and Multi-Objective Optimization Algorithm. IEEE Access 2020, 8, 13858–13874. [Google Scholar]
  48. Yechi, Z.; Jianzhou, W.; Haiyan, L. Research and Application of a Novel Combined Model Based on Multiobjective Optimization for Multistep-Ahead Electric Load Forecasting. Energies 2019, 12, 1931. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Flowchart of the long short-term memory neural network (LSTM) algorithm.
Figure 2. The flowchart of variational mode decomposition-binary encoding genetic optimization algorithm-LSTM (VLG) system.
Figure 3. Site data and data preprocessing results. (a) The situation of the three study regions. (b) The data preprocessing process.
Figure 4. The fitting situation and error description of the VLG prediction system and seven comparative prediction models based on spring dataset of South Australia.
Figure 5. The fit of the VLG prediction system under the training set and test set based on the four seasonal data sets of South Australia.
Figure 6. Comparison results of the hybrid forecasting models. (a) Comparison of the annual average MAPE of different prediction models. (b) Comparison of different decomposition approaches. (c) Comparison results of the BEGA algorithm. (d) Performance of different prediction models based on the NSW data.
Figure 7. Prediction performance and stability of the LSTM neural network and Adam optimizer with the Queensland datasets. (a) Stability of the experimental results of different prediction models. (b) Stability of the experimental results of different optimizers.
Figure 8. Comparison between the VLG model and the classical hybrid model in multi-step predictions. (a) Prediction model fitting. (b) Comparison of prediction performance evaluation indexes.
Table 1. Comparison of the advantages and disadvantages of existing power load forecasting systems.
| Category | Advantage | Disadvantage | Method sample | Sample advantage | Sample disadvantage |
|---|---|---|---|---|---|
| Physical arithmetic | Simple model idea, wide parameter range | Needs a lot of observation data, consumes a lot of computing resources, more suitable for long-term power load forecasting | Single consumption method | Separate analysis of different types of electricity consumption | When there are many influencing factors, the prediction accuracy is not high |
| | | | Elastic coefficient method | Reflects the relationship between the economic growth rate and the power consumption growth rate | The calculation is complex and requires accurate statistics on economic growth |
| Statistical strategies | Wide application, higher prediction accuracy | In multi-step prediction, the prediction accuracy of the model is poor | Time series (autoregressive moving average (ARMA), ARIMA) | Simple model assumptions, good self-fitness | The extrapolation effect is poor, reducing the prediction range |
| | | | Grey prediction system | Less modeling information and convenient operation | Low accuracy, lack of systematicness |
| Machine learning | Wide application, strong generalization and robustness | High complexity, high requirements of knowledge | Neural network (back propagation neural network (BP), support vector machine (SVM), RBF) | Excellent fitting of nonlinear properties | High degree of data dependence |
| | | | Data denoising (empirical mode decomposition (EMD), deep learning image noise reduction algorithm (DA)) | Compared with other methods, it is easy to understand | EMD has mode aliasing and DA has insufficient noise reduction |
| | | | Deep learning (convolution neural network (CNN)) | Strong fault tolerance, simple human-computer interaction | Long CPU operation time |
Note: With the advance of forecasting technology, few researchers use a single method to forecast the power load. This table introduces in detail the different power load forecasting methods used in previous studies.
Table 2. The evaluation metrics.
| Metric | Definition | Equation |
|---|---|---|
| MAPE | Mean absolute percentage error | $\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert\frac{\hat{y}_i - y_i}{y_i}\right\rvert \times 100\%$ |
| MAE | Mean absolute error | $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert\hat{y}_i - y_i\rvert$ |
| RMSE | Root mean square error | $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$ |
| R² | Coefficient of determination | $R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$ |

Note: In this table, $\hat{y}_i$ represents the predicted value of the system, $y_i$ is the true value of the power load data, and $\bar{y}$ is the average value of the sequence, calculated as $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$.
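As a rough illustration of the four metrics in Table 2, the following Python sketch computes them on a small made-up series. The function names and the sample values are illustrative assumptions, not part of the original study, and R² is written in its usual form with the mean of the true values in the denominator.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical load values (MW) and forecasts, for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

print(mape(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Lower MAPE, MAE, and RMSE and an R² closer to 1 indicate a better fit, which is how the result tables in this paper are read.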
Table 3. Statistical description of the NSW seasonal power load data (MW).
| Season | Data | Number | Min | Max | Mean | Standard deviation |
|---|---|---|---|---|---|---|
| Summer | All samples | 1488 | 5622.05 | 13787.85 | 8351.85 | 1519.63 |
| | Training set | 1190 | 5622.05 | 13787.85 | 8409.83 | 1584.77 |
| | Testing set | 298 | 5909.89 | 10928.47 | 8120.35 | 1200.07 |
| Autumn | All samples | 1488 | 5449.59 | 10724.86 | 7909.03 | 1166.83 |
| | Training set | 1190 | 5689.53 | 10080.21 | 7959.44 | 1104.36 |
| | Testing set | 298 | 5449.59 | 10724.86 | 7707.75 | 1372.34 |
| Winter | All samples | 1440 | 5997.81 | 11553.75 | 8602.26 | 1208.75 |
| | Training set | 1152 | 5997.81 | 11553.75 | 8562.20 | 1213.52 |
| | Testing set | 288 | 6191.79 | 11537.78 | 8762.52 | 1177.98 |
| Spring | All samples | 1440 | 5661.39 | 9916.19 | 7520.16 | 877.59 |
| | Training set | 1152 | 5661.39 | 9916.19 | 7543.92 | 868.22 |
| | Testing set | 288 | 5699.65 | 9081.59 | 7425.11 | 909.46 |
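The training and testing counts in Table 3 (1190 and 298 out of 1488, 1152 and 288 out of 1440) are consistent with an 80/20 split. The sketch below shows one way such a split could be made, assuming (hypothetically) that the first 80% of each seasonal series is used for training; the function name and the chronological ordering are assumptions, not details confirmed by the table.

```python
def chronological_split(series, train_percent=80):
    """Split a series into a leading training part and a trailing testing part.

    Integer arithmetic is used so the training size is floor(n * train_percent / 100).
    """
    n_train = (len(series) * train_percent) // 100
    return series[:n_train], series[n_train:]


# Placeholder series with the same lengths as the seasonal datasets in Table 3.
summer = list(range(1488))
winter = list(range(1440))

summer_train, summer_test = chronological_split(summer)
winter_train, winter_test = chronological_split(winter)
print(len(summer_train), len(summer_test), len(winter_train), len(winter_test))
```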
Table 4. Parameter settings of the artificial neural network model.
| Parameters | LSTM | BP |
|---|---|---|
| Length of input units | Based on BEGA algorithm / 5 | 5 |
| Number of hidden layer nodes | Based on BEGA algorithm / 8 | 11 |
| Objective function | MSE | MSE |
| Activation function | Sigmoid | PURELIN |
| Epochs | 200 | 200 |
| Initial learning rate | 0.001 | 0.0001 |
Note: The input unit length of the LSTM model without the BEGA optimization algorithm is 5, and the number of hidden layer nodes is 8.
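To make the role of the input unit length concrete, the following Python sketch turns a load series into fixed-length input windows with a one-step-ahead target. This is a minimal illustration under the assumption of a simple sliding window; the helper name `make_windows` and the sample values are hypothetical and do not come from the paper.

```python
def make_windows(series, input_length):
    """Build (input window, next value) pairs for one-step-ahead forecasting."""
    inputs, targets = [], []
    for start in range(len(series) - input_length):
        inputs.append(series[start:start + input_length])
        targets.append(series[start + input_length])
    return inputs, targets


# Hypothetical load readings; an input unit length of 5 is the non-BEGA setting in Table 4.
load = [10, 11, 12, 13, 14, 15, 16, 17]
X, y = make_windows(load, input_length=5)
print(X)
print(y)
```

A longer input length gives the network more history per sample but leaves fewer training pairs, which is one reason the length is worth tuning rather than fixing by hand.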
Table 5. Model parameters.
| Model | Parameters | Default value |
|---|---|---|
| BEGA | Maximum number of iterations | 30 |
| | Binary code length | 15 |
| | Population number | 10 |
| | Fitness function | MSE |
| | Select operation | Roulette wheel selection |
| Adam | Initial learning rate | 0.001 |
| | $\alpha$ | 0.001 |
| | $\beta_1$ | 0.9 |
| | $\beta_2$ | 0.999 |
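The BEGA settings in Table 5 (binary code length 15, population of 10, MSE fitness, roulette wheel selection) can be illustrated with a small Python sketch of two building blocks: decoding a binary code to a decimal value and selecting an individual by roulette wheel. How the 15 bits map onto the LSTM parameters is not stated here, so the decoding is shown generically, and weighting each individual by the inverse of its MSE is an assumption made so that lower error means higher selection probability.

```python
import random


def binary_to_decimal(bits):
    """Decode a list of 0/1 bits (most significant bit first) into an integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def selection_probabilities(mse_values):
    """Assumed scheme: probability proportional to 1 / MSE (lower error is fitter)."""
    weights = [1.0 / m for m in mse_values]
    total = sum(weights)
    return [w / total for w in weights]


def roulette_wheel_select(population, probabilities, rng):
    """Pick one individual with probability given by its slice of the wheel."""
    pick = rng.random()
    cumulative = 0.0
    for individual, p in zip(population, probabilities):
        cumulative += p
        if pick < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(0)
# Population of 10 individuals, each a 15-bit code, as in Table 5.
population = [[rng.randint(0, 1) for _ in range(15)] for _ in range(10)]
# Placeholder fitness values standing in for the MSE of a trained model.
mse_values = [rng.uniform(0.5, 2.0) for _ in population]
probabilities = selection_probabilities(mse_values)
parent = roulette_wheel_select(population, probabilities, rng)
print(binary_to_decimal(parent))
```

In a full genetic algorithm, selection would be followed by crossover and mutation on the bit strings, repeated for the maximum number of iterations.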
Table 6. Comparison of the prediction performance of models using different data preprocessing systems.
| Model | MAE (NSW) | MAE (SA) | MAE (QLD) | RMSE (NSW) | RMSE (SA) | RMSE (QLD) | MAPE (%) (NSW) | MAPE (%) (SA) | MAPE (%) (QLD) | R² (NSW) | R² (SA) | R² (QLD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | 83.91 | 32.65 | 51.08 | 110.57 | 50.15 | 67.69 | 0.8478 | 2.3571 | 0.8475 | 0.9919 | 0.9744 | 0.9920 |
| BP | 100.49 | 35.30 | 86.59 | 125.95 | 52.63 | 74.80 | 1.0305 | 2.5735 | 0.9509 | 0.9887 | 0.9743 | 0.9896 |
| EMD-LSTM | 51.95 | 13.25 | 30.99 | 63.05 | 17.21 | 37.89 | 0.5476 | 1.0236 | 0.5231 | 0.9963 | 0.9898 | 0.9949 |
| EMD-BP | 65.89 | 22.31 | 38.01 | 75.11 | 29.13 | 46.31 | 0.6558 | 1.7704 | 0.6846 | 0.9932 | 0.9823 | 0.9921 |
| DA-LSTM | 61.09 | 20.69 | 38.66 | 77.72 | 29.62 | 42.86 | 0.7646 | 1.6430 | 0.7733 | 0.9925 | 0.9836 | 0.9934 |
| DA-BP | 68.30 | 19.79 | 41.31 | 88.57 | 32.38 | 54.61 | 0.8553 | 1.5923 | 0.8062 | 0.9906 | 0.9841 | 0.9924 |
| VMD-LSTM | 49.42 | 11.69 | 28.75 | 62.17 | 15.60 | 34.47 | 0.4922 | 0.9352 | 0.4859 | 0.9971 | 0.9910 | 0.9967 |
| VMD-BP | 52.04 | 19.77 | 35.54 | 94.89 | 25.48 | 42.78 | 0.8251 | 1.5808 | 0.6421 | 0.9921 | 0.9802 | 0.9925 |
Table 7. Correlation coefficient between new sequence and original sequence.
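| Model | VMD (NSW) | VMD (SA) | VMD (QLD) | EMD (NSW) | EMD (SA) | EMD (QLD) | DA (NSW) | DA (SA) | DA (QLD) |
|---|---|---|---|---|---|---|---|---|---|
| GC | 0.8953 | 0.785 | 0.749 | 0.885 | 0.779 | 0.725 | 0.752 | 0.767 | 0.702 |
| SP | 0.999 | 0.990 | 0.999 | 0.998 | 0.990 | 0.999 | 0.996 | 0.986 | 0.997 |
| PE | 0.999 | 0.987 | 0.999 | 0.999 | 0.986 | 0.999 | 0.995 | 0.986 | 0.996 |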
Table 8. The performance indicators of the VLG model based on seasonal data from New South Wales compared with other traditional power load forecasting methods.
| Model | Spring MAPE (%) | Spring R² | Spring RMSE | Spring MAE | Summer MAPE (%) | Summer R² | Summer RMSE | Summer MAE | Autumn MAPE (%) | Autumn R² | Autumn RMSE | Autumn MAE | Winter MAPE (%) | Winter R² | Winter RMSE | Winter MAE | Annual mean MAPE (%) | Annual mean R² | Annual mean RMSE | Annual mean MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elman | 19.1725 | 0.8506 | 1691.73 | 1457.86 | 8.1867 | 0.9198 | 906.39 | 714.14 | 6.7683 | 0.9337 | 671.54 | 508.20 | 8.3994 | 0.9131 | 612.39 | 763.87 | 10.6317 | 0.9043 | 970.51 | 901.85 |
| RBF | 19.4860 | 0.8644 | 1707.73 | 1477.86 | 8.3402 | 0.9145 | 922.56 | 725.44 | 6.9717 | 0.9288 | 682.04 | 520.26 | 8.3686 | 0.9177 | 612.06 | 766.47 | 10.7916 | 0.9064 | 981.09 | 872.51 |
| ARIMA | 1.7767 | 0.9812 | 121.86 | 117.85 | 1.4676 | 0.9817 | 150.69 | 119.91 | 1.5563 | 0.9812 | 144.49 | 114.45 | 1.9512 | 0.9810 | 139.23 | 126.23 | 1.6880 | 0.9813 | 139.06 | 119.61 |
| SVM | 2.0421 | 0.9792 | 176.34 | 158.17 | 1.7475 | 0.9751 | 119.33 | 112.19 | 1.8620 | 0.9807 | 130.54 | 121.22 | 2.3122 | 0.9759 | 196.92 | 179.83 | 1.9910 | 0.9777 | 155.78 | 142.85 |
| BP | 1.2493 | 0.9836 | 127.40 | 101.96 | 1.2646 | 0.9833 | 138.28 | 103.55 | 1.2846 | 0.9830 | 126.88 | 98.95 | 1.1503 | 0.9839 | 115.42 | 89.81 | 1.2372 | 0.9835 | 126.99 | 98.56 |
| PSO-BP | 1.1766 | 0.9835 | 132.91 | 90.41 | 0.7660 | 0.9942 | 86.28 | 65.77 | 1.0928 | 0.9854 | 113.07 | 80.16 | 1.0047 | 0.9903 | 161.11 | 67.09 | 1.0100 | 0.9887 | 123.34 | 75.85 |
| LSTM | 0.8942 | 0.9944 | 90.83 | 71.91 | 0.9548 | 0.9934 | 112.82 | 87.33 | 1.0360 | 0.9877 | 103.10 | 78.48 | 1.0091 | 0.9852 | 107.30 | 72.56 | 0.9735 | 0.9902 | 103.51 | 77.57 |
| VMD-LSTM | 0.5032 | 0.9951 | 49.16 | 40.84 | 0.6901 | 0.9947 | 72.96 | 60.83 | 0.4762 | 0.9952 | 46.33 | 28.45 | 0.5423 | 0.9943 | 52.14 | 34.69 | 0.5530 | 0.9949 | 55.15 | 41.20 |
| VLG | 0.3081 | 0.9979 | 30.27 | 24.88 | 0.4271 | 0.9972 | 48.05 | 38.53 | 0.2724 | 0.9981 | 26.66 | 20.76 | 0.4792 | 0.9970 | 44.73 | 35.48 | 0.3717 | 0.9976 | 37.43 | 29.91 |

Note: The table above reports the details of the different forecasting methods. The indicators MAE, RMSE, MAPE, and R² were selected to verify the performance of each model. Their equations are $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert\hat{y}_i - y_i\rvert$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$, $\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert\frac{\hat{y}_i - y_i}{y_i}\right\rvert \times 100\%$, and $R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$. Bold numbers indicate that the index values of this system are superior to those of the other systems.
Table 9. The performance indicators of the VLG model based on seasonal data from Queensland compared with other traditional power load forecasting methods.
| Model | Spring MAPE (%) | Spring R² | Spring RMSE | Spring MAE | Summer MAPE (%) | Summer R² | Summer RMSE | Summer MAE | Autumn MAPE (%) | Autumn R² | Autumn RMSE | Autumn MAE | Winter MAPE (%) | Winter R² | Winter RMSE | Winter MAE | Annual mean MAPE (%) | Annual mean R² | Annual mean RMSE | Annual mean MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elman | 7.4343 | 0.9212 | 531.83 | 430.17 | 6.8212 | 0.9314 | 514.17 | 392.03 | 7.1909 | 0.9117 | 538.69 | 404.58 | 4.3644 | 0.9504 | 300.81 | 252.10 | 6.4527 | 0.9287 | 471.37 | 369.72 |
| RBF | 7.4613 | 0.9209 | 534.82 | 431.19 | 6.9253 | 0.9286 | 523.02 | 398.09 | 7.2187 | 0.9106 | 545.30 | 406.02 | 4.3591 | 0.9510 | 298.00 | 251.76 | 6.4911 | 0.9278 | 475.28 | 371.76 |
| ARIMA | 1.9497 | 0.9801 | 137.21 | 107.65 | 2.1738 | 0.9664 | 133.49 | 122.18 | 2.1513 | 0.9658 | 136.75 | 123.22 | 1.8488 | 0.9809 | 112.16 | 127.64 | 2.0309 | 0.9733 | 129.90 | 120.17 |
| SVM | 1.4228 | 0.9885 | 113.49 | 101.34 | 1.6231 | 0.9790 | 124.26 | 112.32 | 1.2141 | 0.9865 | 101.93 | 89.44 | 1.3952 | 0.9860 | 106.44 | 94.76 | 1.4138 | 0.9850 | 111.53 | 99.47 |
| BP | 0.9847 | 0.9896 | 74.80 | 59.45 | 1.1686 | 0.9862 | 85.89 | 67.20 | 0.9670 | 0.9891 | 73.76 | 53.29 | 0.927 | 0.9885 | 68.63 | 53.65 | 1.0118 | 0.9884 | 75.77 | 58.39 |
| PSO-BP | 0.8657 | 0.9910 | 73.07 | 48.37 | 0.7973 | 0.9921 | 61.26 | 45.14 | 0.8817 | 0.9902 | 69.80 | 49.31 | 0.8704 | 0.9907 | 99.11 | 48.36 | 0.8538 | 0.9910 | 75.81 | 47.79 |
| LSTM | 0.8475 | 0.9919 | 67.69 | 51.08 | 0.9781 | 0.9897 | 70.98 | 56.83 | 0.8740 | 0.9913 | 65.07 | 49.20 | 0.8944 | 0.9905 | 64.60 | 50.46 | 0.8985 | 0.9909 | 67.08 | 51.89 |
| VMD-LSTM | 0.4859 | 0.9946 | 34.47 | 28.75 | 0.4453 | 0.9938 | 32.47 | 25.81 | 0.5228 | 0.9948 | 35.36 | 29.19 | 0.5912 | 0.9941 | 53.21 | 48.64 | 0.5110 | 0.9943 | 38.87 | 33.09 |
| VLG | 0.2602 | 0.9983 | 20.26 | 15.53 | 0.3718 | 0.9953 | 28.46 | 21.67 | 0.3101 | 0.9979 | 21.60 | 17.30 | 0.4524 | 0.9942 | 30.96 | 25.25 | 0.3486 | 0.9964 | 25.32 | 19.93 |

Note: The table above reports the details of the different forecasting methods. The indicators MAE, RMSE, MAPE, and R² were selected to verify the performance of each model. Their equations are $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert\hat{y}_i - y_i\rvert$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$, $\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert\frac{\hat{y}_i - y_i}{y_i}\right\rvert \times 100\%$, and $R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$. Bold numbers indicate that the index values of this system are superior to those of the other systems.
Table 10. The performance indicators of the VLG model based on seasonal data from South Australia compared with other traditional power load forecasting methods.
| Model | Spring MAPE (%) | Spring R² | Spring RMSE | Spring MAE | Summer MAPE (%) | Summer R² | Summer RMSE | Summer MAE | Autumn MAPE (%) | Autumn R² | Autumn RMSE | Autumn MAE | Winter MAPE (%) | Winter R² | Winter RMSE | Winter MAE | Annual mean MAPE (%) | Annual mean R² | Annual mean RMSE | Annual mean MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elman | 24.1390 | 0.7494 | 404.81 | 331.16 | 11.4670 | 0.8821 | 216.55 | 171.91 | 15.2161 | 0.8346 | 210.36 | 166.59 | 12.4362 | 0.8744 | 222.15 | 165.58 | 15.8146 | 0.8351 | 263.46 | 208.81 |
| RBF | 24.6208 | 0.7501 | 407.97 | 336.96 | 11.6931 | 0.8796 | 218.64 | 174.94 | 15.5422 | 0.8339 | 210.86 | 170.14 | 12.5620 | 0.8706 | 220.51 | 166.75 | 16.1045 | 0.8336 | 264.49 | 212.19 |
| ARIMA | 2.2543 | 0.9784 | 28.02 | 35.12 | 1.5754 | 0.9802 | 29.11 | 21.20 | 2.7415 | 0.9737 | 47.22 | 39.03 | 2.9042 | 0.9721 | 55.12 | 39.93 | 2.3687 | 0.9761 | 39.86 | 33.82 |
| SVM | 2.7534 | 0.9728 | 60.31 | 47.98 | 2.1322 | 0.9745 | 41.33 | 29.86 | 2.6262 | 0.9749 | 55.91 | 42.64 | 2.7440 | 0.9787 | 52.35 | 34.22 | 2.5639 | 0.9752 | 52.475 | 38.68 |
| BP | 2.5735 | 0.9736 | 52.63 | 35.30 | 2.2731 | 0.9703 | 48.26 | 29.43 | 2.5139 | 0.9778 | 50.11 | 36.12 | 2.606 | 0.9793 | 57.42 | 37.84 | 2.4916 | 0.9755 | 52.11 | 34.67 |
| PSO-BP | 2.1429 | 0.9781 | 51.53 | 29.82 | 1.8272 | 0.9820 | 45.49 | 27.61 | 2.4783 | 0.9789 | 46.91 | 31.36 | 2.2424 | 0.9801 | 49.50 | 28.20 | 2.1727 | 0.9798 | 48.35 | 29.24 |
| LSTM | 2.3571 | 0.9769 | 50.15 | 32.65 | 2.1480 | 0.9712 | 53.52 | 34.57 | 2.3904 | 0.9795 | 45.24 | 30.27 | 2.0946 | 0.9814 | 45.57 | 27.27 | 2.2475 | 0.9773 | 48.62 | 31.19 |
| VMD-LSTM | 1.2135 | 0.9885 | 19.74 | 16.22 | 0.9312 | 0.9907 | 19.23 | 14.11 | 1.1885 | 0.9894 | 18.88 | 14.52 | 1.3132 | 0.9862 | 44.21 | 27.54 | 1.1616 | 0.9887 | 25.51 | 18.09 |
| VLG | 0.9816 | 0.9912 | 17.62 | 13.50 | 0.8991 | 0.9919 | 18.13 | 13.98 | 0.8212 | 0.9923 | 13.93 | 10.19 | 1.2181 | 0.9896 | 25.41 | 20.21 | 0.9800 | 0.9913 | 18.77 | 14.47 |

Note: The table above reports the details of the different forecasting methods. The indicators MAE, RMSE, MAPE, and R² were selected to verify the performance of each model. Their equations are $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert\hat{y}_i - y_i\rvert$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$, $\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert\frac{\hat{y}_i - y_i}{y_i}\right\rvert \times 100\%$, and $R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$. Bold numbers indicate that the index values of this system are superior to those of the other systems.
Table 11. Comparison of the results for different input unit lengths and numbers of cell units based on the NSW seasonal dataset.
| Season (optimal parameters) | Model | Evaluation parameters | MAPE | MAE | RMSE | R² |
|---|---|---|---|---|---|---|
| Spring (12–8) | VMD-LSTM | BEGA | 0.2602 | 15.53 | 20.26 | 0.9986 |
| | | 5–8 | 0.3612 | 22.01 | 28.12 | 0.9973 |
| | | 16–12 | 0.5194 | 30.50 | 35.13 | 0.9949 |
| Summer (1–-8) | VMD-LSTM | BEGA | 0.3718 | 21.67 | 28.46 | 0.9971 |
| | | 5–8 | 0.6480 | 37.35 | 45.24 | 0.9944 |
| | | 16–12 | 0.5034 | 28.95 | 35.03 | 0.9952 |
| Autumn (25–12) | VMD-LSTM | BEGA | 0.3101 | 17.30 | 21.60 | 0.9978 |
| | | 5–8 | 0.5785 | 30.68 | 37.96 | 0.9940 |
| | | 16–12 | 0.3554 | 19.22 | 23.75 | 0.9969 |
| Winter (4–6) | VMD-LSTM | BEGA | 0.3758 | 20.90 | 25.77 | 0.9965 |
| | | 5–8 | 0.5828 | 32.69 | 40.98 | 0.9938 |
| | | 16–12 | 0.5521 | 30.63 | 37.55 | 0.9942 |
Table 12. Comparison of the performance results of different optimizers based on the NSW spring dataset.
| Forecasting model | Optimizer | Metric | BEGA (4–6) | 12–8 | 8–10 | 16–12 |
|---|---|---|---|---|---|---|
| LSTM | Adam | MAPE | 0.8744 | 1.1920 | 0.9580 | 0.8939 |
| | | MAE | 66.22 | 89.74 | 74.80 | 68.01 |
| | | RMSE | 86.79 | 114.19 | 92.12 | 84.74 |
| | | R² | 0.9926 | 0.9874 | 0.9901 | 0.9913 |
| LSTM | SGD | MAPE | 1.071 | 1.2219 | 1.2685 | 1.2840 |
| | | MAE | 81.34 | 91.59 | 96.27 | 97.95 |
| | | RMSE | 109.94 | 118.64 | 123.39 | 126.43 |
| | | R² | 0.9885 | 0.9870 | 0.9868 | 0.9866 |
| VMD-LSTM | Adam | MAPE | 0.2281 | 0.2947 | 0.3671 | 0.3609 |
| | | MAE | 19.58 | 22.35 | 28.53 | 19.21 |
| | | RMSE | 22.89 | 27.02 | 33.65 | 35.50 |
| | | R² | 0.9987 | 0.9981 | 0.9971 | 0.9972 |
| VMD-LSTM | SGD | MAPE | 0.6253 | 0.7022 | 0.8376 | 0.9021 |
| | | MAE | 60.04 | 65.98 | 61.64 | 66.66 |
| | | RMSE | 48.52 | 51.61 | 77.12 | 82.78 |
| | | R² | 0.9935 | 0.9928 | 0.9916 | 0.9903 |
Table 13. The Diebold-Mariano test results for each model.
| Site | Model | VLG (Adam) | Model | VLG (Adam) |
|---|---|---|---|---|
| NSW | Elman | 9.8311 * | EMD-LSTM | 2.1136 ** |
| | RBF | 9.9634 * | DA-BP | 3.0329 * |
| | SVM | 4.1134 * | DA-LSTM | 3.9364 * |
| | BP | 3.1654 * | VMD-BP | 3.8029 * |
| | PSO-BP | 2.9253 * | VMD-LSTM (SGD) | 3.8395 * |
| | LSTM | 3.8639 * | VMD-LSTM (Adam) | 2.0104 ** |
| | EMD-BP | 3.8217 * | VLG (SGD) | 3.7156 * |
| QLD | Elman | 8.2174 * | EMD-LSTM | 2.3522 ** |
| | RBF | 8.2316 * | DA-BP | 2.9871 * |
| | SVM | 4.3108 * | DA-LSTM | 2.8346 * |
| | BP | 3.5936 * | VMD-BP | 3.2135 * |
| | PSO-BP | 3.1732 * | VMD-LSTM (SGD) | 2.8724 * |
| | LSTM | 3.1420 * | VMD-LSTM (Adam) | 2.3412 ** |
| | EMD-BP | 2.4718 ** | VLG (SGD) | 2.3439 ** |
| SA | Elman | 7.9347 * | EMD-LSTM | 2.1261 ** |
| | RBF | 7.9931 * | DA-BP | 2.8469 * |
| | SVM | 3.3416 * | DA-LSTM | 2.9336 * |
| | BP | 3.3153 * | VMD-BP | 2.1743 ** |
| | PSO-BP | 3.1732 * | VMD-LSTM (SGD) | 2.0214 ** |
| | LSTM | 3.3509 * | VMD-LSTM (Adam) | 1.9542 *** |
| | EMD-BP | 2.1798 ** | VLG (SGD) | 2.0147 ** |
* indicates that the DM value is significant at the 1% level. ** indicates that the DM value is significant at the 5% level. *** indicates that the DM value is significant at the 10% level.
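For readers unfamiliar with the Diebold-Mariano test reported in Table 13, the following Python sketch computes the statistic for one-step-ahead forecasts. It is a simplified illustration that assumes a squared-error loss and no autocorrelation correction, which may differ from the exact setup used in the paper; the function name and the sample errors are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """DM statistic comparing two forecast error series under squared-error loss.

    A positive value means model A has the larger average loss.
    Returns the statistic and a two-sided p-value from the standard normal.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    dm = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Hypothetical forecast errors: a comparison model (a) and a more accurate model (b).
errors_a = [2.0, -1.5, 1.0, -2.5, 1.8, -1.2]
errors_b = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1]

dm_stat, p = diebold_mariano(errors_a, errors_b)
print(dm_stat, p)
```

Under the normal approximation, absolute values above roughly 2.58, 1.96, and 1.645 correspond to the 1%, 5%, and 10% levels, which agrees with how the asterisks are assigned in Table 13.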
Table 14. Comparison of the forecasting performance of the combined model and that of other VMD-based models.
| Model | MAE (1-step) | MAE (2-step) | MAE (3-step) | RMSE (1-step) | RMSE (2-step) | RMSE (3-step) | MAPE (%) (1-step) | MAPE (%) (2-step) | MAPE (%) (3-step) | Improvement percentage of MAPE (1-step) | Improvement percentage of MAPE (2-step) | Improvement percentage of MAPE (3-step) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VMD-GWO-SVM | 39.22 | 146.36 | 164.35 | 99.87 | 183.45 | 198.24 | 0.8846 | 1.9159 | 2.0213 | 72.33% | 60.05% | 32.88% |
| VMD-PSO-BP | 19.90 | 119.96 | 133.26 | 48.52 | 161.51 | 172.97 | 0.2821 | 1.6333 | 1.6722 | 13.22% | 53.14% | 18.87% |
| VLG | 18.13 | 48.77 | 103.22 | 23.54 | 56.73 | 133.86 | 0.2448 | 0.7653 | 1.3565 | - | - | - |
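The MAPE improvement percentages in Table 14 can be reproduced from the MAPE columns with a relative-reduction formula. The sketch below assumes the improvement is defined as (metric of the compared model − metric of the proposed model) / metric of the compared model × 100; this definition is inferred from the numbers in the table rather than quoted from the text, and the function name is hypothetical.

```python
def improvement_percentage(metric_compared, metric_proposed):
    """Relative reduction (in percent) achieved by the proposed model."""
    return 100.0 * (metric_compared - metric_proposed) / metric_compared


# 1-step MAPE values from Table 14: VMD-GWO-SVM, VMD-PSO-BP, and VLG.
vs_gwo_svm = improvement_percentage(0.8846, 0.2448)
vs_pso_bp = improvement_percentage(0.2821, 0.2448)
print(round(vs_gwo_svm, 2), round(vs_pso_bp, 2))
```

The same function applies to RMSE when computing an RMSE improvement percentage.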
Table 15. Improvement percentage of the proposed model.
P_MAPE (%):

| Model | Spring NSW | Spring QLD | Spring SA | Summer NSW | Summer QLD | Summer SA | Autumn NSW | Autumn QLD | Autumn SA | Winter NSW | Winter QLD | Winter SA | Annual mean NSW | Annual mean QLD | Annual mean SA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elman | 98.39 | 96.50 | 95.93 | 94.78 | 94.54 | 92.15 | 95.97 | 95.68 | 94.60 | 94.29 | 89.63 | 90.20 | 96.50 | 94.59 | 93.80 |
| RBF | 98.41 | 96.51 | 96.01 | 94.87 | 94.6 | 92.31 | 96.09 | 95.70 | 94.71 | 94.27 | 89.62 | 90.31 | 96.55 | 94.62 | 94.26 |
| ARIMA | 82.66 | 86.65 | 56.45 | 70.89 | 82.89 | 42.87 | 82.49 | 85.58 | 70.04 | 75.44 | 75.53 | 58.06 | 77.98 | 82.83 | 58.63 |
| SVM | 84.91 | 81.71 | 64.35 | 75.56 | 77.09 | 57.83 | 85.37 | 74.46 | 68.73 | 79.27 | 67.57 | 55.61 | 81.33 | 75.34 | 61.78 |
| BP | 75.33 | 73.57 | 61.85 | 66.22 | 68.18 | 60.44 | 78.79 | 67.83 | 67.33 | 58.34 | 51.19 | 53.25 | 69.95 | 65.54 | 62.56 |
| PSO-BP | 73.81 | 69.94 | 54.19 | 44.24 | 53.36 | 50.79 | 75.12 | 64.82 | 66.86 | 52.30 | 48.02 | 45.67 | 63.19 | 59.17 | 54.89 |
| LSTM | 65.22 | 67.70 | 58.3 | 55.26 | 61.98 | 58.14 | 73.70 | 64.51 | 66.64 | 52.51 | 49.41 | 41.84 | 61.81 | 54.99 | 56.39 |
| VMD-LSTM | 38.77 | 46.44 | 19.11 | 38.11 | 16.50 | 3.44 | 42.79 | 40.68 | 30.90 | 11.64 | 23.47 | 7.24 | 32.78 | 31.78 | 15.63 |
| EMD-LSTM-GA | 10.11 | 14.89 | 10.36 | 8.74 | 9.47 | 7.32 | 14.31 | 12.74 | 9.33 | 7.35 | 7.27 | 6.11 | 10.31 | 11.09 | 8.23 |
| DA-LSTM-GA | 35.79 | 39.46 | 27.33 | 28.22 | 32.12 | 27.22 | 38.12 | 36.48 | 29.87 | 0.13 | 19.52 | 17.84 | 30.83 | 31.89 | 28.86 |
| VMD-PSO-BP | 12.77 | 16.31 | 9.80 | 10.06 | 9.89 | 4.22 | 15.25 | 13.22 | 11.47 | 8.36 | 7.98 | 5.31 | 11.12 | 11.85 | 7.75 |

P_RMSE (%):

| Model | Spring NSW | Spring QLD | Spring SA | Summer NSW | Summer QLD | Summer SA | Autumn NSW | Autumn QLD | Autumn SA | Winter NSW | Winter QLD | Winter SA | Annual mean NSW | Annual mean QLD | Annual mean SA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elman | 98.21 | 96.19 | 95.64 | 94.69 | 94.46 | 91.62 | 96.03 | 95.99 | 93.37 | 92.70 | 89.70 | 88.56 | 96.14 | 94.62 | 92.88 |
| RBF | 98.23 | 96.21 | 95.68 | 94.79 | 94.55 | 91.70 | 96.09 | 96.03 | 93.39 | 92.69 | 89.61 | 88.47 | 96.18 | 94.67 | 92.91 |
| ARIMA | 75.16 | 85.23 | 37.12 | 68.11 | 78.68 | 37.71 | 81.55 | 84.20 | 70.49 | 67.87 | 72.39 | 53.90 | 73.08 | 80.51 | 52.91 |
| SVM | 82.83 | 82.14 | 70.78 | 59.73 | 77.09 | 56.13 | 79.58 | 78.80 | 75.08 | 77.28 | 70.91 | 51.46 | 85.11 | 77.29 | 64.23 |
| BP | 76.24 | 72.91 | 66.52 | 65.25 | 66.86 | 62.43 | 78.99 | 70.71 | 72.20 | 61.25 | 54.88 | 55.75 | 70.53 | 66.583 | 63.98 |
| PSO-BP | 77.23 | 72.27 | 65.81 | 44.31 | 53.54 | 60.14 | 76.42 | 69.05 | 70.30 | 72.24 | 68.76 | 48.67 | 69.65 | 66.60 | 61.17 |
| LSTM | 66.67 | 70.06 | 64.87 | 57.41 | 59.90 | 66.12 | 74.04 | 66.80 | 69.21 | 58.31 | 52.07 | 44.24 | 63.84 | 62.25 | 61.39 |
| VMD-LSTM | 38.43 | 41.22 | 10.739 | 34.14 | 12.35 | 5.720 | 42.46 | 38.91 | 26.21 | 13.44 | 41.81 | 42.52 | 32.13 | 34.86 | 26.42 |
| EMD-LSTM-GA | 10.45 | 13.96 | 11.43 | 8.41 | 9.33 | 7.42 | 15.09 | 11.98 | 9.48 | 6.89 | 7.05 | 6.35 | 10.64 | 11.01 | 8.44 |
| DA-LSTM-GA | 32.48 | 37.97 | 26.71 | 30.15 | 34.01 | 26.17 | 37.38 | 35.84 | 28.85 | 0.24 | 18.89 | 16.97 | 32.85 | 32.22 | 27.26 |
| VMD-PSO-BP | 13.34 | 17.54 | 9.66 | 10.23 | 8.97 | 4.63 | 14.17 | 14.03 | 11.66 | 8.72 | 7.64 | 5.44 | 11.10 | 12.30 | 7.21 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Jin, Y.; Guo, H.; Wang, J.; Song, A. A Hybrid System Based on LSTM for Short-Term Power Load Forecasting. Energies 2020, 13, 6241. https://doi.org/10.3390/en13236241

