A Hybrid System Based on LSTM for Short-Term Power Load Forecasting

Jin, Yu; Guo, Honggang; Wang, Jianzhou; Song, Aiyi

doi:10.3390/en13236241


School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China


Energies 2020, 13(23), 6241; https://doi.org/10.3390/en13236241

Submission received: 21 October 2020 / Revised: 20 November 2020 / Accepted: 23 November 2020 / Published: 26 November 2020

(This article belongs to the Special Issue Forecasting and Planning in Power Systems)


Abstract


As the basic guarantee for the reliability and economic operations of state grid corporations, power load prediction plays a vital role in power system management. To achieve the highest possible prediction accuracy, many scholars have been committed to building reliable load forecasting models. However, most studies ignore the necessity and importance of data preprocessing strategies, which may lead to poor prediction performance. Thus, to overcome the limitations in previous studies and further strengthen prediction performance, a novel short-term power load prediction system, VMD-BEGA-LSTM (VLG), integrating a data pretreatment strategy, advanced optimization technique, and deep learning structure, is developed in this paper. The prediction capability of the new system is evaluated through simulation experiments that employ the real power data of Queensland, New South Wales, and South Australia. The experimental results indicate that the developed system is significantly better than other comparative systems and shows excellent application potential.

Keywords:

power load forecasting; hybrid analysis-forecast system; data preprocessing strategy; deep learning structure; optimization algorithm

1. Introduction

Along with rapid economic development, power enterprises continue to expand their construction scales, and the corresponding power grid structures and operation modes are gradually becoming diversified [1]. Because electricity cannot be stored in large quantities, the generation capacity of power producers and the supply capacity of power suppliers must be kept in dynamic balance; otherwise, the lives of residents and the production of enterprises will be affected, potentially endangering the security and availability of the whole power grid. Accurate power load prediction provides an important guarantee that power supply and demand remain in a stable state, and it also yields significant economic benefits: it is estimated that every 1% increase in the accuracy of power consumption forecasting will save millions of dollars in operating costs [2]. Improving load prediction performance will not only provide a solid foundation for the smooth operation of the power grid but also provide theoretical support for power supply and dispatching plans.

The power load forecast is closely related to the dispatch and normal operations of national electricity consumption, which affects the lives and production activities of residents and the country overall [3]. The current main research work focuses on ultra-short-term and short-term power load prediction, which are hotspots that will allow academia and power companies to dynamically adjust their power generation plans and trading plans in the market environment [4]. This article mainly studies short-term power load forecasting.

Based on the principles and structures of the research methods, power load forecasting methods can be divided into three categories: (i) physical prediction methods, (ii) statistical prediction methods, and (iii) machine learning methods. A physical prediction model is established by combining physical characteristics with historical power load data. Most such models assume that the relationship between power load data and the related physical information remains valid in the future, and the power load is predicted through intuitive analysis [5]. The main methods include the unit consumption method, the load density method, and the elastic coefficient method [6]. For example, Yang et al. [7] predicted the power load distribution by calculating the load density and combining it with the characteristics of power consumption in the area. However, physical prediction methods require a great deal of observation data, and processing these data consumes substantial computing resources that are difficult to obtain. Moreover, physical methods are more appropriate for long-term than for short-term power load prediction. Statistical methods are better suited to short-term power load prediction. In recent years, traditional learning methods and traditional statistical systems have been broadly used for predicting short-term power loads. Considering previous time-series data of power loads, Cui et al. [8] developed a new load forecasting model by combining the grey linear regression model with the Markov chain, which overcomes the shortcoming of the traditional grey model of ignoring linear factors. Commonly used traditional statistical systems include autoregressive (AR) and autoregressive moving average (ARMA) models [9]. Song et al. [10] proposed a non-parametric hybrid model, and their results show that it is generally better than other models. However, this method also has some shortcomings. For example, its accuracy is low in multi-step predictions, and the time series prediction methods currently in wide use cannot take into account the influence of meteorological factors [11]. Although some researchers have proposed methods to increase the adaptability of time series forecasting to meteorological changes, such methods still rely on the autoregressive integrated moving average (ARIMA) model, which has insufficient explanatory power and cannot fundamentally improve this adaptability [12].

In the 1990s, computers gradually entered all walks of life, and a large number of power load prediction techniques emerged. Artificial neural network models have become increasingly popular for power load prediction. The artificial neural network (ANN) is a data-driven, nonlinear adaptive method with very strong nonlinear modeling capabilities [13]. At present, the main artificial neural network algorithms are the back propagation neural network (BP) and the Elman neural network (Elman), which can approximate any nonlinear function without prior knowledge of the relationship between the predictive model and the data [14]. Moreover, the support vector machine (SVM), originally proposed by Vapnik and colleagues at Bell Labs, is extensively used in the domain of power load prediction and has been continuously improved by researchers. Some applications of these artificial intelligence techniques are as follows. Guo et al. [15] proposed a generalized neural network that uses local mean decomposition to make predictions. Xu et al. [16] proposed an improved radial basis function (RBF) prediction model that uses a weighted fuzzy clustering algorithm to determine the centers of the basis functions. Li et al. [17] established a hybrid neural network model using an improved gray wolf bionic algorithm. The experimental results show that, in power system prediction, the hybrid model significantly reduces the prediction errors compared with the other models. The effectiveness of neural network methods in power load forecasting has thus been extensively verified. To date, deep learning (DL) technology has been applied in many industries, for example in image and audio detection [18]. Compared with a conventional neural network, a deep learning model has more hidden layers, which allow the computer to perceive problems in a human-like way by simulating the connections of the neural network in the human brain. At present, DL has become one of the most attractive technologies in short-term power load prediction as a result of its excellent end-to-end learning capacity, and it offers state-of-the-art forecasting performance [19]. Li et al. [20] combined the convolutional neural network (CNN), long short-term memory neural network (LSTM), and gated recurrent unit (GRU) algorithms and proposed a deep learning prediction model for power load forecasting in Beijing. Massaoudi et al. [21] combined the Savitzky-Golay filter with the bidirectional long short-term memory neural network (BiLSTM) to predict the short-term power load. Experimental results show that the proposed model is highly effective.

Although DL algorithms improve on the prediction accuracy of traditional prediction models, a single prediction system often has prominent shortcomings, which lead to unsatisfactory prediction results. Therefore, our study focuses on the combined model [22]. However, the modeling of classical hybrid models still needs to be improved. The high volatility and randomness of power load data affect the learning ability of the prediction model and lead to poor prediction performance when raw data are used directly without processing [23]. Hence, to reduce the random interference in the data sequence and improve the prediction performance, data preprocessing methods such as empirical mode decomposition (EMD) and the denoising autoencoder (DA) are applied to time series prediction [24,25]. For example, Ribeiro et al. [26] proposed an adaptive, decomposition-based, heterogeneous ensemble learning model. In the data preprocessing stage, the hyperparameters of complementary ensemble empirical mode decomposition were optimized, and three machine learning models were used to predict the short-term electricity price in the Brazilian market. Stefenon et al. [27] proposed a method that extracts features using the wavelet energy coefficient and combines them with an LSTM to predict power insulator faults. The experimental results show that the method has good prediction accuracy. Although these methods improve the prediction accuracy to a certain extent, they still have some shortcomings. For example, mode mixing often occurs in EMD, and the residual noise in DA cannot be processed [28,29]. On the other hand, the final result of a deep learning network depends to some extent on the randomly initialized hidden layer nodes, the input unit length, and the method used to optimize the model hyperparameters, all of which affect the stability of the prediction. Moreover, there is a trade-off between the number of hidden layer nodes in the network and its training ability. Generally speaking, when the hidden layer nodes are few, the prediction accuracy is poor, and the forecasting accuracy improves as the number of hidden layer nodes increases. Nonetheless, this relationship holds only up to a point: beyond the peak, the predictive power decreases as the number of hidden layer nodes increases further. Therefore, it is very important to determine the number of hidden layer nodes and the length of the input unit [30]. By reviewing the previous literature, we found that the above prediction methods still have some inherent defects [31]. The shortcomings of these systems are summarized in Table 1.

According to the above analysis, this paper proposes a new hybrid system that combines data preprocessing technology, an advanced deep learning prediction method, and a bionic optimization algorithm to further improve short-term power load forecasting accuracy. More specifically, the original power load series is decomposed and reconstructed using variational mode decomposition (VMD) to remove the noise of the original load data and extract the data features effectively. Then, we apply the LSTM deep learning method to predict the processed power load data. Finally, the binary encoding genetic optimization algorithm (BEGA), which is based on swarm intelligent evolution and a bionic strategy, is proposed to find the optimal number of hidden layer nodes and input unit length of the LSTM model. The main contributions and innovations of this research are as follows:

The original power load data is decomposed and reconstructed using VMD technology to extract the effective features of the data. This reduces the adverse effects of the instability and irregularity of the original load data on the forecasting model.
The long short-term memory (LSTM) neural network is applied to forecast the power load data. This captures the dependence of the time series on previous data and overcomes the low accuracy and poor stability of traditional models.
A binary encoding genetic algorithm is proposed to adaptively decide the hidden layer nodes and the length of the input data unit of the LSTM. This algorithm abandons the traditional decimal coding method and uses binary coding for integer optimization.
The adaptive moment estimation (Adam) algorithm is employed for optimizing the model’s hyperparameters, instead of the traditional gradient descent algorithm and stochastic gradient descent algorithm (SGD). This improves the convergence speed and prediction stability of the model.
The prediction model optimized by the hybrid optimization method has high prediction accuracy and good stability, thereby effectively improving the accuracy of power load prediction.

Section 2 introduces the specific methods used in the proposed model, including the data preprocessing technology, the LSTM prediction model, and the binary encoding genetic algorithm. Section 3 describes the framework of the VLG model in detail. In Section 4, we conduct three experiments from different angles and examine the experimental results of the combined system and the comparison systems. To further verify the accuracy and effectiveness of the new hybrid system, Section 5 offers a specific discussion. Finally, the results and conclusions are given in Section 6.

2. Related Theory

In this section, the methods adopted in the proposed hybrid system are explained.

2.1. Variational Mode Decomposition (VMD)

VMD is a completely non-recursive adaptive signal processing technique proposed by Dragomiretskiy et al. [32], which can effectively achieve the adaptive separation of signals in the frequency domain.

Step 1: For each mode, the associated analytic signal is computed by means of the Hilbert transform, from which its one-sided spectrum is obtained [33]:

\mathcal{H} f (t) = \frac{1}{π} \mathrm{p.v.} \int_{ℝ} \frac{f (v)}{t - v} d v

(1)

Step 2: Shift the spectrum of each mode to baseband by multiplying it by an exponential term e^{- j ω_{k} t} tuned to the estimated center frequency ω_{k} [34]:

[(δ (t) + \frac{j}{π t}) * μ_{k} (t)] e^{- j ω_{k} t}

(2)

Step 3: Each mode is assumed to be compact around its center frequency. The bandwidth of each mode is estimated through the Gaussian smoothness of the demodulated signal above, that is, the squared L² norm of its gradient, which yields the constrained variational problem:

\min_{\{μ_{k}\}, \{ω_{k}\}} \{\sum_{k} {‖\partial_{t} [(δ (t) + \frac{j}{π t}) * μ_{k} (t)] e^{- j ω_{k} t}‖}_{2}^{2}\}

(3)

s.t. \sum_{k} μ_{k} = f

(4)

where f is the original signal, μ_{k} is the k-th modal function, ω_{k} is its center frequency, and δ (t) is the Dirac distribution [35].

Step 4: A quadratic penalty term ensures that the signal is still reconstructed accurately under high noise conditions, while the Lagrange multiplier enforces the constraint strictly. The augmented Lagrangian is [36,37,38]

\begin{matrix} L (\{μ_{k}\}, \{ω_{k}\}, λ) = α \sum_{k} {‖\partial_{t} [(δ (t) + \frac{j}{π t}) * μ_{k} (t)] e^{- j ω_{k} t}‖}_{2}^{2} \\ + {‖f (t) - \sum_{k} μ_{k} (t)‖}_{2}^{2} + 〈λ (t), f (t) - \sum_{k} μ_{k} (t)〉 \end{matrix}

(5)

where α is the penalty factor, λ is the Lagrange multiplier, and L is the augmented Lagrangian. The solution to the original constrained minimization problem, (3) and (4), is found as the saddle point of L.
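For reference, the saddle point of (5) is usually sought with the alternating direction method of multipliers (ADMM), as in the original VMD formulation [32]. The updates below are written in the frequency domain. The hats (Fourier transforms), the iteration index n, and the dual step size τ are notation introduced here for illustration and do not appear in the text above; in the first line, the sum over the other modes uses their most recently updated values.

```latex
\begin{aligned}
\hat{\mu}_k^{\,n+1}(\omega) &= \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{\mu}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{n})^2} \\
\omega_k^{\,n+1} &= \frac{\int_0^{\infty} \omega\, |\hat{\mu}_k^{\,n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{\mu}_k^{\,n+1}(\omega)|^2 \, d\omega} \\
\hat{\lambda}^{n+1}(\omega) &= \hat{\lambda}^{n}(\omega) + \tau \Big( \hat{f}(\omega) - \sum_k \hat{\mu}_k^{\,n+1}(\omega) \Big)
\end{aligned}
```

The first update acts as a Wiener-type filter centered at the current center frequency, the second moves each center frequency to the center of gravity of the power spectrum of its mode, and the third is a dual ascent step on the reconstruction constraint (4).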

2.2. Long Short-Term Memory Neural Network (LSTM)

The LSTM, as a special type of recurrent neural network (RNN), has a strong ability to process time series data and effectively overcomes the vanishing and exploding gradient problems of conventional RNNs. Figure 1 shows the schematic structure of the LSTM.

The three gates in the LSTM all use the sigmoid function σ [39]:

σ (x) = \frac{1}{1 + e^{- x}}

(6)

Step 1: The LSTM must decide which information is no longer useful and discard it from the cell state. This is done by a sigmoid layer called the “forget gate” f_{t}:

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(7)

where h_{t - 1} is the hidden state of the previous time step, and x_{t} is the current input vector.

Step 2: The LSTM determines what new information is stored in the cell state. First, the “input gate layer” i_{t} reads h_{t - 1} and x_{t} and determines which values will be updated. Next, the candidate information \tilde{C_{t}} is created through a tanh neural network layer. Multiplying i_{t} by \tilde{C_{t}} then gives the new information to be added to the cell state. The expressions for i_{t} and \tilde{C_{t}} are as follows [40]:

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(8)

\tilde{C_{t}} = \tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})

(9)

Step 3: Using the forget gate f_{t} and the input gate i_{t}, both computed from the current input x_{t} and the previous hidden state h_{t - 1}, the outdated information is removed from the old cell state C_{t - 1} and the new information is added, which gives the new cell state C_{t} [41,42]:

C_{t} = f_{t} * C_{t - 1} + i_{t} * \tilde{C_{t}}

(10)

Step 4: Pass the latest cell state C_{t} through the tanh function and multiply the result by the output gate vector O_{t} to obtain the LSTM’s final output state vector h_{t}:

\tanh (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}

(11)

O_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(12)

h_{t} = O_{t} * \tanh (C_{t})

(13)

where W_{f}, W_{i}, W_{C}, and W_{o} are the weight matrices, and b_{f}, b_{i}, b_{C}, and b_{o} are the bias vectors.
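The four steps above can be sketched as a single time step in plain Python. This is only an illustrative sketch, not the authors' implementation: vectors are Python lists, the parameter dictionary layout and the helper `zero_params` are assumptions made for the example, and a real model would use a deep learning library.

```python
import math


def sigmoid(x):
    """Eq. (6): logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, z):
    """Return W . z + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) + b_j for row, b_j in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (7)-(13)."""
    z = h_prev + x_t  # list concatenation stands for [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]        # Eq. (7)
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]        # Eq. (8)
    c_tilde = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], z)]  # Eq. (9)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]         # Eq. (10)
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]        # Eq. (12)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]               # Eq. (13)
    return h_t, c_t


def zero_params(n_hidden, n_input):
    """Hypothetical helper: all-zero weights and biases for the usage example."""
    params = {}
    for gate in ("f", "i", "C", "o"):
        params["W_" + gate] = [[0.0] * (n_hidden + n_input) for _ in range(n_hidden)]
        params["b_" + gate] = [0.0] * n_hidden
    return params


# With all-zero parameters every gate equals sigmoid(0) = 0.5 and the candidate is 0,
# so the cell state is simply halved.
h_t, c_t = lstm_step([0.7], [0.0, 0.0], [1.0, -2.0], zero_params(2, 1))
```

With zero parameters the update reduces to C_{t} = 0.5 * C_{t - 1}, which makes the roles of the forget and output gates easy to check by hand.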

2.3. Binary Encoding Genetic Algorithm (BEGA)

The BEGA algorithm is an improved genetic algorithm with binary coding. The genetic algorithm is an optimization method that simulates the evolutionary principle of “survival of the fittest” in nature [43]. The basic operation of the binary encoding genetic algorithm is as follows:

Step 1: Binary encoding. The four bases in the human chromosome are simulated with two numbers, 0 and 1, and each variable is described by a string of a certain length consisting of 0s and 1s.

Step 2: Selection operation. Selection in the genetic algorithm takes the form of roulette wheel selection. Under this fitness-proportional selection strategy, the selection probability p_{i} of each individual i is

f_{i} = k / F_{i}

(14)

p_{i} = \frac{f_{i}}{\sum_{j = 1}^{N} f_{j}}

(15)

where F_{i} is the fitness value of individual i, which in this study is the root mean square error between the power load forecast value and the true value under the LSTM neural network; k is a coefficient; and N is the population size. Because a smaller error indicates a better individual, the reciprocal is taken in Equation (14) before selection.

Step 3: Crossover operation. A pair of gene sequences is cut at two points (two-point crossover), and the cut segments are then recombined at random, which makes the binary string sequence more likely to be transcoded.

Step 4: Mutation operation. Because the coding is binary, bit-flip mutation is used: each gene value in an individual is flipped to the opposite value with the given mutation probability P.

Step 5: Decode. Assuming that the binary code is to be mapped into the decimal range [- a, a] and that b-digit precision is required, the interval must be discretized into (|- a| + a) \times 10^{b} numbers. First, the binary code x^{bin} is converted into a decimal number x^{dec}. Next, by decoding with the formula

x = - a + x^{dec} * \frac{2 a}{2 a^{b} - 1}

(16)

we can obtain a real number in the interval [- a, a].
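The five operators can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the decoder assumes the common linear mapping x = -a + x_dec * 2a / (2^n - 1), where n is the number of bits (n is a symbol introduced here), and all function names are hypothetical.

```python
import random


def decode(bits, a):
    """Map a bit list to a real number in [-a, a] (assumed linear mapping)."""
    n = len(bits)
    x_dec = int("".join(str(b) for b in bits), 2)
    return -a + x_dec * (2 * a) / (2 ** n - 1)


def selection_probabilities(errors, k=1.0):
    """Eqs. (14)-(15): reciprocal of the error, normalised to probabilities."""
    f = [k / e for e in errors]
    total = sum(f)
    return [f_i / total for f_i in f]


def roulette_select(population, probs, rng):
    """Draw a new population of the same size with the given probabilities."""
    return rng.choices(population, weights=probs, k=len(population))


def two_point_crossover(p1, p2, rng):
    """Swap the segment between two random cut points (needs at least 3 genes)."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]


def bit_flip(bits, p_mut, rng):
    """Flip each gene independently with probability p_mut."""
    return [1 - b if rng.random() < p_mut else b for b in bits]


rng = random.Random(0)
probs = selection_probabilities([1.0, 2.0, 4.0])  # lower error, higher probability
child_a, child_b = two_point_crossover([0] * 8, [1] * 8, rng)
mutated = bit_flip([0, 1, 1], 1.0, rng)            # p_mut = 1 flips every gene
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which is convenient when comparing runs of a stochastic search.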

2.4. Adaptive Moment Estimation Optimizer (Adam)

Adam is a first-order gradient-based algorithm for optimizing stochastic objective functions, based on adaptive estimates of lower-order moments. This method is computationally efficient, has small memory requirements, and is well suited to optimization problems with a large data volume or many parameters [44]. The algorithm is as follows:

Step 1: Given an objective function J (θ) with parameters θ, calculate the gradient at time step t:

g_{t} = \nabla_{θ} J (θ_{t - 1})

(17)

Step 2: Taking into account the gradient momentum of the previous time steps, update the exponential moving average m_{t} of the gradient and the exponential moving average v_{t} of the squared gradient:

m_{t} = φ_{1} m_{t - 1} + (1 - φ_{1}) g_{t}

(18)

v_{t} = φ_{2} v_{t - 1} + (1 - φ_{2}) g_{t}^{2}

(19)

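where φ_{1} is the exponential decay rate of the first-moment estimate, which controls how the weight is distributed between the previous momentum and the current gradient, and φ_{2} is the exponential decay rate of the second-moment estimate, which controls the influence of the previous squared gradients.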

Step 3: Since m_{0} and v_{0} are initialized to 0, the estimates m_{t} and v_{t} are biased toward zero, especially during the first time steps. The bias is therefore corrected as follows:

{\hat{m}}_{t} = m_{t} / (1 - φ_{1}^{t})

(20)

{\hat{v}}_{t} = v_{t} / (1 - φ_{2}^{t})

(21)

Step 4: Update the parameters θ:

θ_{t} = θ_{t - 1} - α * {\hat{m}}_{t} / (\sqrt{{\hat{v}}_{t}} + ε)

(22)

where α represents the step size. Equivalently, the bias corrections can be folded into the step size by using α_{t} = \frac{α \sqrt{1 - φ_{2}^{t}}}{1 - φ_{1}^{t}} together with the uncorrected m_{t} and v_{t}. The built-in parameters of the Adam algorithm are set to φ_{1} = 0.9, φ_{2} = 0.999, and ε = 10^{- 8}.
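As an illustration of Steps 1 to 4, the following sketch applies Adam to a one-dimensional quadratic objective. The objective, the step size of 0.1, and the function name `adam_minimize` are assumptions chosen for the example; ε is added outside the square root, as in the original Adam formulation.

```python
import math


def adam_minimize(grad, theta0, alpha=0.1, phi1=0.9, phi2=0.999,
                  eps=1e-8, steps=1000):
    """Scalar Adam: returns the parameter value after the given number of steps."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)                      # Step 1: gradient, Eq. (17)
        m = phi1 * m + (1 - phi1) * g        # Step 2: first moment, Eq. (18)
        v = phi2 * v + (1 - phi2) * g * g    # Step 2: second moment, Eq. (19)
        m_hat = m / (1 - phi1 ** t)          # Step 3: bias correction, Eq. (20)
        v_hat = v / (1 - phi2 ** t)          # Step 3: bias correction, Eq. (21)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)  # Step 4: update
    return theta


def quadratic_grad(theta):
    """Gradient of the illustrative objective J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


theta_star = adam_minimize(quadratic_grad, theta0=0.0)
first_step = adam_minimize(quadratic_grad, theta0=0.0, steps=1)
```

After bias correction, the very first update has magnitude close to the step size α regardless of the gradient scale, which is one reason Adam is comparatively insensitive to how the objective is scaled.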

3. The Formation of the Combined Forecasting System

In this study, a VLG hybrid system for short-term load forecasting is proposed to improve the forecasting accuracy. The system first uses VMD to remove the noise from the original sequence and obtain a more stable time series, which serves as the input feature for model training and provides a high-quality training set. BEGA is then used to adaptively optimize the length L of the LSTM input sample unit and the number N of cell units (hidden layer nodes), allowing the LSTM to achieve optimal performance. The analysis framework of the prediction system is shown in Figure 2: the first step is the VMD data preprocessing module, the second step is the BEGA and Adam optimization module, and the third step is the LSTM prediction module. The details are as follows.

3.1. Data Preprocessing Module

Because the original load data, without any processing for feature extraction, exhibit strong volatility and randomness, feeding these data directly into the model degrades the prediction performance [45]. Therefore, an advanced decomposition-ensemble method is used to eliminate the high-frequency noise in the power load series and extract valuable information components from the original power load data, which can effectively improve the prediction accuracy. In this paper, we use VMD to preprocess the data: the original time series is decomposed into several intrinsic mode functions (IMFs), which are then reconstructed as the input series of the prediction model.

3.2. Optimization Algorithm Module

The module consists of two parts. The first part uses the BEGA algorithm to select the input unit length L and the number of hidden layer nodes N of the LSTM. In this part of the optimization, the binary gene sequence is initialized, and L and N are varied within a given range through the genetic algorithm. The candidate L and N are then passed to the LSTM for training, and the loss function is calculated as

loss = \frac{1}{M} \sum_{m = 1}^{M} (y_{m} - {\hat{y}}_{m})^{2}

(23)

where y_{m} is the true value, {\hat{y}}_{m} is the predicted value, and M is the number of samples. The smaller the value of the loss function, the smaller the model prediction error. Next, the population is updated based on the loss function value to obtain the L and N values that minimize the loss. Finally, the input unit length L and the number of hidden layer nodes N that optimize the performance of the model are determined.

The second part is to optimize the hyperparameters in LSTM. Adam is an adaptive objective function optimization algorithm that adaptively calculates the learning rates of different parameters on the basis of the first and second-moment estimates of the gradient to improve the model convergence speed and provide better prediction accuracy.
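The first part of the module, the search over the integers L and N, can be sketched as below. This is a sketch under several assumptions rather than the authors' procedure: `surrogate_loss` is a hypothetical stand-in for training an LSTM and returning the loss of Equation (23), the search ranges and the wrap-around integer decoding are illustrative, and keeping the best individual in each generation (elitism) is an addition made for the example.

```python
import random


def mse(y_true, y_pred):
    """Eq. (23): mean squared error over M samples."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def decode_int(bits, low, high):
    """Map a bit list to an integer in [low, high] (assumed wrap-around mapping)."""
    return low + int("".join(map(str, bits)), 2) % (high - low + 1)


def decode_chromosome(chrom, l_range=(1, 32), n_range=(1, 64)):
    """First half of the chromosome encodes L, second half encodes N."""
    half = len(chrom) // 2
    return decode_int(chrom[:half], *l_range), decode_int(chrom[half:], *n_range)


def surrogate_loss(L, N):
    """Hypothetical stand-in for 'train an LSTM with (L, N), return Eq. (23)'."""
    return (L - 12) ** 2 + (N - 40) ** 2 + 1.0


def bega_search(loss_fn, pop_size=20, n_bits=12, generations=30,
                p_mut=0.05, seed=0):
    """Binary-coded GA over (L, N); pop_size must be even."""
    rng = random.Random(seed)

    def score(chrom):
        return loss_fn(*decode_chromosome(chrom))

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=score)
    history = [score(best)]
    for _ in range(generations):
        weights = [1.0 / score(c) for c in pop]           # smaller loss, larger weight
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):   # two-point crossover
            i, j = sorted(rng.sample(range(1, n_bits), 2))
            children += [p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]]
        pop = [[1 - b if rng.random() < p_mut else b for b in c] for c in children]
        pop[0] = best                                      # keep the best individual
        best = min(pop, key=score)
        history.append(score(best))
    return decode_chromosome(best), history


(best_L, best_N), history = bega_search(surrogate_loss)
```

In a real run, `loss_fn` would train the LSTM on the training set with the candidate L and N and return `mse` on held-out data, so each fitness evaluation is expensive and the population size and number of generations are usually kept small.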

3.3. Forecast Module

To make the prediction performance more accurate and stable, the VLG forecasting system is built to forecast the power load data. By constructing the index system, the prediction errors between the prediction results and the original power load data are evaluated.

4. Experiment and Evaluation

Accurate prediction of short-term power load data has considerable practical significance. In this paper, we therefore propose a hybrid system to enhance the accuracy of short-term power load prediction, and in this section we compare it with other traditional nonlinear forecasting models to assess its effectiveness. The performance of the hybrid model is evaluated on the power load data of New South Wales, South Australia, and Queensland (30 min power load data for each season in 2013). This section outlines the procedure and conclusions of the experiments.

4.1. Model Evaluation Indicators

To judge whether one model performs better than another, the model evaluation criteria play a vital role. However, there is currently no unified standard for evaluating prediction performance, nor can a single evaluation index fully reveal the performance of a given model [46]. Therefore, we construct an index evaluation system to evaluate the present model. This index system mainly includes the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The smaller the values of MAPE, RMSE, and MAE are, the more accurate the prediction and the better the model performance will be. In contrast, the higher the value of R², the better the prediction model performance. The content and calculation formula of each index are introduced in Table 2.
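The four indices can be computed as in the sketch below. The formulas are the standard textbook definitions and are assumed to match those of Table 2; in particular, R² is taken here as one minus the ratio of the residual sum of squares to the total sum of squares, and MAPE is expressed in percent.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only, not data from the paper.
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
scores = {name: fn(actual, forecast)
          for name, fn in (("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("R2", r2))}
```

Because every forecast in this toy example is off by exactly 10, MAE and RMSE coincide; on real load data RMSE is larger than MAE whenever the errors vary in size.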

4.2. Experimental Setup

To estimate the performance of the VLG hybrid system for short-term power load prediction, the hybrid system is compared with traditional models for each of its components to explore the impact of the combined model on short-term power load forecasting. We divide the experiment into three parts: Experiment 1, Experiment 2, and Experiment 3. Experiment 1 evaluates the prediction performance of models using VMD and other data preprocessing techniques. Experiment 2 compares the performance of the VLG hybrid system with that of different neural networks and nonlinear prediction models in predicting the short-term power load. Experiment 3 discusses the impact of the adaptive optimization algorithm BEGA on model performance.

4.3. Data Description

In the experiment, since the power load changes with the seasons, we selected the 2013 power load data of New South Wales, South Australia, and Queensland in Australia as the experimental verification data. We used the data from January 1 to January 31 as the summer dataset, the data from March 1 to March 31 as the autumn dataset, the data from June 1 to June 30 as the winter dataset, and the data from September 1 to September 30 as the spring dataset. The power load data were sampled at a time interval of 30 min. Of the data points in each seasonal dataset, the first 80% were selected as the training set and the last 20% as the test set. Specifically, the lengths of the training and test sets in the summer and autumn datasets of the three states were 1190 and 298, respectively. Correspondingly, the lengths of the training and test sets for the spring and winter datasets were 1152 and 288, respectively. When constructing the model input vector, we adopted a rolling acquisition mechanism. In other words, \{x (1), x (2), \ldots, x (t - 1), x (t)\} was used as the basis for predicting the next value x (t + 1). Table 3 presents the descriptive statistics of the original power load dataset, training dataset, and test dataset for the NSW seasonal data.
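The split sizes quoted above and the rolling input construction can be reproduced with a short sketch. The function names are hypothetical, the series values are placeholders rather than real load data, and the window is assumed to have a fixed length (the input unit length L; the value 5 below is only an example).

```python
def split_train_test(series, train_pct=80):
    """Chronological split: the first train_pct percent is used for training."""
    cut = len(series) * train_pct // 100
    return series[:cut], series[cut:]


def rolling_windows(series, length):
    """Build (input window, next value) pairs with a window sliding by one step."""
    n_samples = len(series) - length
    inputs = [series[i:i + length] for i in range(n_samples)]
    targets = [series[i + length] for i in range(n_samples)]
    return inputs, targets


POINTS_PER_DAY = 48  # one value every 30 min

january = list(range(31 * POINTS_PER_DAY))  # placeholder values, 31-day month
june = list(range(30 * POINTS_PER_DAY))     # placeholder values, 30-day month

train_31, test_31 = split_train_test(january)
train_30, test_30 = split_train_test(june)
inputs, targets = rolling_windows(train_31, length=5)
```

A 31-day month gives 1488 points and a 30-day month gives 1440 points, which yields the 1190/298 and 1152/288 splits stated in the text. The split is chronological, so no future values leak into the training windows.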

4.4. Model Parameter Setting

This section outlines the parameter settings of each component of the proposed hybrid prediction model VLG, including the length of the input unit, the initial learning rate, and other parameter settings of the LSTM, BEGA optimization algorithm, and Adam optimizer.

4.4.1. Parameter Setting of the ANN

Before the experiment, the parameters in the model and optimization algorithm must be defined. For the deep learning algorithm LSTM, the length of the input unit L and the number of hidden layer nodes N are optimized by the BEGA algorithm. The input unit length of BP is 5, and the number of hidden layer nodes is 11. See Table 4 for the other parameter settings.

4.4.2. Parameter Settings of the Optimization Algorithm

Setting the parameters of the BEGA optimization algorithm and Adam optimizer is very important for the short-term power load forecasting accuracy. The specific parameter settings we studied are shown in Table 5.

Figure 3 shows the power load data for South Australia, Queensland, and New South Wales. The load series of New South Wales and Queensland are relatively stable, whereas that of South Australia fluctuates most frequently. In Figure 3, panel (a) shows the three study regions, and panel (b) shows the feature selection step, that is, the decomposition results of the power load series, with the IMFs arranged in descending order of frequency. The high-frequency IMF is then removed, and the remaining IMFs are reconstructed to obtain the optimal input sequence. The characteristics of the reconstructed series are clearly improved compared with those of the original series.
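The reconstruction step can be sketched in a few lines of Python. The sketch assumes that the IMFs have already been computed by a VMD routine and are stored in descending order of frequency, so that the first entry is the one to discard; the function name `reconstruct`, the parameter `n_drop`, and the toy numbers are illustrative only.

```python
def reconstruct(imfs, n_drop=1):
    """Sum the IMFs point by point after dropping the first n_drop of them.

    imfs is a list of equally long lists ordered from the highest to the
    lowest frequency, so dropping the first entries removes the noisiest modes.
    """
    kept = imfs[n_drop:]
    return [sum(values) for values in zip(*kept)]


# Toy decomposition with three modes (made-up numbers, not real load data).
imfs = [
    [0.3, -0.3, 0.3, -0.3],    # highest-frequency mode, treated as noise
    [1.0, 0.0, -1.0, 0.0],     # intermediate mode
    [10.0, 11.0, 12.0, 13.0],  # slow trend
]
denoised = reconstruct(imfs)           # drops only the first mode
full_signal = reconstruct(imfs, n_drop=0)
```

Calling `reconstruct` with `n_drop=0` returns the sum of all modes, which for an exact decomposition recovers the original series and is a useful sanity check.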

4.5. Experiment I

In the previous section, we noted that time series noise reduction plays a crucial role in the performance and prediction accuracy of the model. Based on the research of previous scholars, noise reduction techniques can mainly be divided into three major categories: empirical mode decomposition (EMD), the denoising autoencoder (DA), and VMD, which is used in our proposed hybrid system. The principle of VMD was discussed in Section 2.1. EMD decomposes the original signal into a series of intrinsic mode functions (IMFs). In this way, non-stationary and non-linear signals can be decomposed into stationary signals with different time scales. The denoising autoencoder (DA) is a type of unsupervised learning that uses an encoder and a decoder. White noise or Gaussian noise is added to the original sequence, and the neural network is trained iteratively to obtain a reduced-dimension feature representation of the data, thereby achieving feature extraction.

Experiment 1 aims to verify the influence of the denoised dataset on the model and the performance of the VMD noise reduction method. To this end, the prediction accuracy of VMD-LSTM is compared with that of other neural network models based on EMD and DA (i.e., EMD-BP, EMD-LSTM, DA-BP, and DA-LSTM). We performed three experiments on each model to assess its prediction accuracy, and the specific prediction results based on the NSW spring dataset are shown in Table 6.

The details of the experiment are as follows:

(a): The comparison results of the evaluation index system of VMD-LSTM and other hybrid systems are introduced in this table. The VMD-LSTM shows better prediction accuracy and prediction performance on most evaluation indicators. For the comparison between the single LSTM model and the VMD-LSTM system, it is obvious that the evaluation index of the VMD-LSTM system is superior to that of the LSTM model in all cases. At the same time, the prediction accuracy of all noise reduction models is higher than that of the model based on original data, which indicates that data preprocessing is indispensable for power load prediction.
(b): In this case, we compare the VMD-LSTM model with several other hybrid models. Among the performance index values for the three regions, the VMD-LSTM model offers the best MAPE results, with 0.4859%, 0.9352%, and 0.4922%. Ranked by prediction accuracy from high to low, the six models are VMD-LSTM, EMD-LSTM, EMD-BP, VMD-BP, DA-LSTM, and DA-BP, so VMD-LSTM has the best prediction accuracy. The coefficient of determination (R²) reflects how closely the model predictions fit the observed values. In this experiment, the R² of VMD-LSTM is the best, with 0.9971, 0.9910, and 0.9967 in the three states. These results also confirm the effectiveness of the VMD noise reduction method employed in this paper.
(c): Time-series denoising techniques have previously been applied to power load, short-term wind speed, and stock prediction models. Most of these studies only discuss the improvement in model accuracy and performance brought by noise reduction and do not examine how strongly the denoised sequence correlates with the original time series. Therefore, using the gray correlation method (GC), the Pearson correlation coefficient (PE), and the Spearman correlation coefficient (SP), the differences between noise reduction methods are discussed from the perspective of the correlation between the new sequence and the original sequence. Detailed calculation results are given in Table 7.

Table 7 shows the correlation between the new sequence obtained by different noise reduction methods and the original power load data, in which the new sequence obtained using the VMD method has the highest correlation with the original sequence. In summary, using the VMD noise reduction method to process data not only performs well in improving the prediction accuracy of the model but also maintains more original information in the sequence, making it a more suitable method for the data preprocessing of short-term power load data.
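The three correlation measures can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the grey relational grade uses one common formulation (mean normalization with a distinguishing coefficient of 0.5), which may differ from the variant behind Table 7, and the two toy series are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PE) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient (SP): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def grey_relational_grade(reference, candidate, rho=0.5):
    """Grey relational grade (GC) of candidate against reference.

    Both series are divided by their own mean first; rho is the
    distinguishing coefficient (0.5 is a common default, assumed here).
    """
    mr = sum(reference) / len(reference)
    mc = sum(candidate) / len(candidate)
    deltas = [abs(r / mr - c / mc) for r, c in zip(reference, candidate)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0  # identical shapes after normalization
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)


original = [100.0, 120.0, 90.0, 130.0, 110.0]   # toy load values
denoised = [102.0, 117.0, 93.0, 127.0, 111.0]   # toy smoothed values
print(pearson(original, denoised))
print(spearman(original, denoised))
print(grey_relational_grade(original, denoised))
```

All three measures approach 1 when the denoised sequence preserves the shape of the original, which is the sense in which a preprocessing method "maintains more original information".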

Remarks:

Based on the above experiments, the employed VMD-LSTM combined system has the highest prediction performance, and Queensland has the highest prediction accuracy, with a MAPE value of 0.4922%. The average MAPE across the three regions is 0.6378%. This shows that the system has high prediction accuracy and excellent stability. Moreover, in the test index based on the correlation measure, VMD remains superior to the other noise reduction methods, further verifying the effectiveness of the system.

4.6. Experiment II

Power load data are very sensitive to natural factors such as the season and climate, which are among the dominant factors affecting the fluctuation characteristics of the power load. In this part, the 30 min power load data of New South Wales, Queensland, and South Australia from March to April, June to July, September to October, and December to January in 2013 are used as the seasonal data for each region. A comparison of the prediction performance of the different predictive methods based on the power load data of New South Wales is shown in Table 8. Moreover, Table 9 and Table 10 compare the prediction performance of the traditional prediction models and the LSTM-based combination model employed in this study on the seasonal data of Queensland and South Australia. In general, the VLG system employed in this research provides better performance than traditional prediction methods.

(a) For New South Wales, based on the annual average of the prediction results, BP, PSO-BP, LSTM, and our proposed hybrid model VLG obtained good accuracy results for all evaluation indicators. The ARIMA model also achieved a prediction effect with a MAPE value of less than 2%. However, the MAPE values of the other two models were higher than 10%, and the performance was poor. The ARIMA model has always been considered to provide superior performance in predicting power load data, and its MAPE value is lower than the values of the other traditional models discussed in this article, which is logical. However, the ARIMA model is not beneficial for long-term rolling forecasting and has certain disadvantages. The BP model and the optimization system based on the BP neural network design have always had good prediction performance in many experiments related to power load prediction. However, their MAPE values are still nearly three times higher than those of our proposed hybrid model VLG, which shows that our proposed hybrid model provides outstanding performance in short-term power load prediction.

For seasonal data, the proposed system obtained the best performance for each seasonal data set. For the spring data, the ARIMA, BP, PSO-BP, and LSTM models all perform relatively well among the comparison methods, but the forecasting accuracy of the VLG hybrid system was the highest, with a MAPE of 0.3081%, RMSE of 30.27, R² of 0.9979, and MAE of 24.88. In terms of the other seasonal feature data, the VLG model also obtained the best results, with corresponding MAPE values of 0.4271%, 0.2724%, and 0.3717%. Among them, the forecasting accuracy on the autumn feature data set was the highest for each model, while the forecasting accuracy on the winter feature data set was generally worse than that of the other seasons, which indirectly indicates that the forecasting of the power load is affected by regional and seasonal factors.

(b) For Queensland, from the perspective of annual average forecasting accuracy, the proposed combined system is still better than the other classic models, with a MAPE value of 0.3486%. Among the remaining eight models, the forecasting accuracy of the VMD-LSTM system ranks second. The performance of the PSO-BP optimization model and the original LSTM model is similar, and their prediction accuracy is excellent. However, the MAPE of the VLG hybrid system proposed in this paper was 0.1624%, 0.5052%, and 0.5499% lower than the values of the above models, respectively. Compared with the results on the New South Wales feature data, the prediction accuracy of the Elman and RBF models is improved but is still not ideal compared to the other models. For the seasonal data in Queensland, the VLG model still achieved the best results, with corresponding MAPE values of 0.2602%, 0.3718%, 0.3101%, and 0.4524%, respectively. From the perspective of the coefficient of determination, the goodness of fit of the VLG combination forecasting model was the highest, with R² values for the four seasonal data sets of 0.9983, 0.9953, 0.9979, and 0.9942, respectively. Here, the general prediction accuracy for the winter feature data set was worse than that of the other seasons. The performance rankings of the models differed little from the rankings based on the New South Wales feature data set.

(c) For South Australia, from the perspective of the annual average prediction accuracy, the performance evaluation index values of the VLG hybrid model were significantly better than those of all the other prediction models, and its MAPE value was 0.9800%. The prediction accuracy on the South Australia data was slightly lower than that for New South Wales and Queensland, but the prediction accuracy of the VLG, VMD-LSTM, LSTM, and PSO-BP models was still excellent. In comparison with the other three systems, the MAPE value of the VLG hybrid model decreased by 0.4066%, 1.2675%, and 1.1927%, respectively. In terms of seasonal feature data, as with the prediction results for Queensland and New South Wales, the VLG hybrid model provided the smallest prediction errors and the greatest point prediction accuracy on the four seasonal feature data sets compared with the other traditional models. Figure 4 shows the performance comparison between the VLG prediction system and the seven comparison models based on the spring data set of South Australia. In addition, Figure 5 shows the fit between the predicted value of VLG and the actual value of the power load based on the four seasonal data sets in South Australia.

Moreover, based on the seasonal feature data of the three states, the seasonal temperature and regional climate were some of the most important factors affecting the performance of short-term power load predictions. Among them, autumn was often the season with the highest power load forecast accuracy. On the contrary, the prediction performance of the three states based on the winter feature data was lower than that of the other three seasons. In terms of regional differences, the annual average forecast performance gap between New South Wales and Queensland was not large, with MAPE values of 0.3717% and 0.3486%. South Australia’s annual average forecast performance was relatively poor, with a MAPE value of 0.9800%, which may be related to the South Australia power load data set.

Remark:

Based on Experiment 2, the performance evaluation value calculated by the hybrid model VLG is superior to the performance evaluation value calculated by any classic single-item model and hybrid model. Therefore, the experimental conclusions indicate that the employed VLG hybrid system performs well in short-term power load predictions. Simultaneously, the seasonal climate and other factors have a certain impact on power load forecasting.
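For readers who wish to reproduce this kind of comparison, the four evaluation indices quoted throughout Experiment II can be computed as in the sketch below. It uses the standard textbook definitions of MAPE, MAE, RMSE, and R², which are assumed to agree with the definitions given earlier in the paper; the load values are toy numbers, not data from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [8000.0, 8200.0, 7900.0, 8100.0]       # toy load values, not from the paper
predicted = [8040.0, 8159.0, 7900.0, 8181.0]    # toy forecasts

print(mape(actual, predicted))
print(mae(actual, predicted))
print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

MAPE is scale-free, which is why it is the index used to compare regions, whereas MAE and RMSE depend on the magnitude of the load in each state.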

In Figure 6, (a) is the radar map comparing the annual average MAPE of the different prediction models in the three regions, (b) compares the prediction performance of the different data preprocessing methods, (c) shows the advantages of the BEGA optimization algorithm and the prediction results of the VLG prediction system, and (d) shows the percentage increase in three indicators of the classic prediction models relative to the VLG prediction system in New South Wales.

4.7. Experiment III

The results of Experiment 2 show that the hybrid model based on LSTM has excellent performance in short-term power load prediction. However, LSTM requires the input unit length L and the number of hidden layer nodes N to be configured manually. These configured parameters also largely determine the model's ability to engage in short-term power load forecasting. There is no fixed rule for determining appropriate parameters when the LSTM predicts a time series. Therefore, common solutions include using repeated trials or enumeration methods to obtain appropriate parameters for accurate prediction. However, methods such as enumeration consume considerable time and may not necessarily select the best parameters. On the other hand, for the LSTM network, the final values of the hyperparameters of each neural unit and gate structure also have a significant impact on the prediction performance. Usually, the gradient descent method (GD) or stochastic gradient descent method (SGD) is used in an experiment to find and select these hyperparameters. The gradient descent method is prone to problems such as convergence to a local optimal solution and slow convergence. Although the prediction results of the VMD-LSTM were satisfactory, considering the deficiencies of the LSTM and SGD algorithms, we do not consider the prediction performance to have reached its optimal value. In response, the VLG hybrid system uses a binary encoding genetic algorithm (BEGA) and the Adam optimizer to solve these problems.
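The idea of searching for L and N with a binary-encoded genetic algorithm can be sketched as follows. This is a generic illustration, not the improved BEGA of the paper: the search range (2 to 33), the operators (roulette-wheel selection, single-point crossover, bit-flip mutation, elitism), and all function names are assumptions. In particular, `surrogate_loss` is a hypothetical stand-in for the validation loss of an LSTM trained with a given (L, N), with its minimum placed arbitrarily.

```python
import random


def decode(bits, low, high):
    """Map a binary string (x^bin) to an integer (x^dec) in [low, high]."""
    value = int("".join(str(b) for b in bits), 2)
    max_value = 2 ** len(bits) - 1
    return low + round(value * (high - low) / max_value)


def split_and_decode(chromosome, n_bits):
    """First n_bits encode the input length L, the rest the hidden nodes N."""
    L = decode(chromosome[:n_bits], 2, 33)
    N = decode(chromosome[n_bits:], 2, 33)
    return L, N


def surrogate_loss(L, N):
    """Hypothetical stand-in for the validation loss of an LSTM with (L, N)."""
    return (L - 12) ** 2 + (N - 8) ** 2


def run_ga(pop_size=20, n_bits=5, generations=30, p_mut=0.02, seed=0):
    rng = random.Random(seed)

    def loss_of(chromosome):
        return surrogate_loss(*split_and_decode(chromosome, n_bits))

    pop = [[rng.randint(0, 1) for _ in range(2 * n_bits)] for _ in range(pop_size)]
    best = min(pop, key=loss_of)
    for _ in range(generations):
        # Roulette-wheel selection: lower loss gives a larger selection weight.
        weights = [1.0 / (1.0 + loss_of(c)) for c in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randint(1, 2 * n_bits - 1)   # single-point crossover
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Bit-flip mutation on every gene with probability p_mut.
        pop = [[bit ^ 1 if rng.random() < p_mut else bit for bit in c] for c in children]
        pop[0] = best[:]                           # elitism: keep the best so far
        best = min(pop, key=loss_of)
    return split_and_decode(best, n_bits)


best_L, best_N = run_ga()
print(best_L, best_N)
```

In a real system, each fitness evaluation would require training and validating an LSTM, so the population size and number of generations would be chosen with that cost in mind.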

In this part of the experiment, we focus on the hybrid model VMD-LSTM and the original LSTM neural network and use the Adam optimizer and BEGA for parameter optimization. Here, the performance of the feature data sets of each season in New South Wales under different configurations, with different numbers of input units L and numbers of hidden layer nodes N, was studied. That is, the experience of previous scholars was used to determine the two different sets of L and N values to verify the effects of the BEGA on model accuracy. The specific experimental steps are shown in Table 11. At the same time, using the spring data of New South Wales as the data set, the Adam optimizer is compared with the stochastic gradient descent method to discuss the differences in prediction performance produced by different hyperparameter optimization methods. The specific experimental steps are shown in Table 12.

As can be determined from Table 11, the VLG model using the binary encoding genetic algorithm provides better prediction results than the model whose parameters were selected via empirical summary in each seasonal data set. For the four seasonal feature data sets, the optimal input unit length L and number of hidden layer nodes N selected by the BEGA are 12–8, 12–8, 25–12, and 4–6, respectively, and the corresponding MAPE values are 0.2602%, 0.3718%, 0.3101%, and 0.3758%, respectively. It can be seen from the MAPE values that using the BEGA to select the optimal unit length L and the number of cell units N can increase the model accuracy by up to 30%. Based on a comparison of the RMSE, MAE, and R² values, the performance improvement of the VLG system in the spring data set was the highest, with MAE, RMSE, and R² values of 15.53, 20.46, and 0.9986, respectively. Compared with the experience summary method, the MAE and RMSE declined on average by 10.72 and 15.37, respectively. In this four-season feature data set, our proposed VLG model using the BEGA has very good prediction performance and can accurately predict future short-term power load changes.

Table 12 shows the relative performance of different optimizers based on the LSTM and VMD-LSTM systems. The results show that the prediction values of LSTM networks with different structures are quite different. Through a longitudinal comparison, it can be seen that using the Adam optimizer is much better than using stochastic gradient descent for prediction. Under the same data set, the MAPE values of the LSTM and VMD-LSTM models using the Adam optimizer were 0.8744% and 0.2281%, while the MAPE values of the LSTM and VMD-LSTM models using the SGD algorithm were 1.071% and 0.6253%. Thus, compared to using the Adam optimizer, employing the SGD algorithm increased the MAPE by about 22% and 174%, respectively. This result shows the importance of the Adam optimizer for model prediction accuracy and indicates that this optimizer significantly improves the stability and accuracy of the predictions. Moreover, through a horizontal comparison, it can be further shown that using the optimal number of input units L and the number of cell units N (4–6) determined by the BEGA gives the model the best performance in terms of prediction accuracy and stability, thereby significantly improving model performance.
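The two percentages follow directly from the MAPE values quoted above, taking the Adam result as the baseline in each case:

```latex
\frac{1.071 - 0.8744}{0.8744} \times 100\% \approx 22.5\%,
\qquad
\frac{0.6253 - 0.2281}{0.2281} \times 100\% \approx 174.1\%
```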

Remarks:

In terms of prediction performance and accuracy, using the proposed BEGA and the Adam optimizer is significantly superior to using parameters determined by summarizing experience together with the stochastic gradient descent method. This further confirms that the VLG hybrid system has outstanding performance in short-term power load prediction.
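To make the difference between the two optimizers concrete, the sketch below places one Adam update next to one plain gradient descent update for a single scalar parameter. It follows the standard published form of Adam (moving averages of the gradient and squared gradient with bias correction); the default values of `alpha`, `beta1`, `beta2`, and `eps` are the commonly used ones and are assumed here, since the settings used in the experiments are not stated in this section.

```python
import math


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # m_t: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # v_t: moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def sgd_step(theta, grad, alpha=0.001):
    """Plain gradient descent update with a fixed learning rate."""
    return theta - alpha * grad


# Toy objective f(theta) = 0.5 * theta**2, whose gradient is theta itself.
theta_adam, m, v = 5.0, 0.0, 0.0
theta_sgd = 5.0
for t in range(1, 4):
    theta_adam, m, v = adam_step(theta_adam, theta_adam, m, v, t, alpha=0.1)
    theta_sgd = sgd_step(theta_sgd, theta_sgd, alpha=0.1)
print(theta_adam, theta_sgd)
```

Because Adam divides by the running magnitude of the gradient, its step size is roughly `alpha` regardless of the gradient scale, whereas the plain update scales directly with the gradient.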

5. Discussion

This section provides an in-depth exploration of the above experimental results, namely the effectiveness of the hybrid system, the performance differences between the hybrid model with the optimization algorithm used here and other optimized hybrid prediction systems, and the improvements in the evaluation indices of the proposed hybrid system.

5.1. Effectiveness of the Proposed System

First, the prediction error is a vital indicator for estimating the performance of the prediction system. In this section, the validity of the developed hybrid model is tested with the Diebold-Mariano test, and the significance of the differences between the prediction errors of the different models is demonstrated via hypothesis testing. The VLG system is then compared with other models. The Diebold-Mariano test evaluates the differences between prediction systems according to their prediction errors [47]. The alternative hypothesis H₁ and the null hypothesis H₀ are presented in Equations (24) and (25), respectively, and the DM statistic is given in Equation (26):

H_{1} : E_{a} [L (err_{i}^{1})] \neq E_{b} [L (err_{i}^{2})]

(24)

H_{0} : E_{a} [L (err_{i}^{1})] = E_{b} [L (err_{i}^{2})]

(25)

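DM = \frac{\frac{1}{n} \sum_{i=1}^{n} \left[ L (err_{i}^{1}) - L (err_{i}^{2}) \right]}{\sqrt{S^{2} / n}}

(26)

where S^{2} is an estimate of the variance of the loss differential L (err_{i}^{1}) - L (err_{i}^{2}) and n is the number of forecasts.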

The DM test results for the employed system and the comparative models are shown in Table 13.
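A minimal sketch of how the DM statistic of Equation (26) could be computed is given below. It assumes a squared-error loss L, one-step-ahead forecasts (so no autocovariance correction is applied), and the sample variance with an n - 1 denominator for S²; none of these choices is stated in this section, and the error values are toy numbers.

```python
import math


def dm_statistic(errors_1, errors_2, loss=lambda e: e * e):
    """Diebold-Mariano statistic for one-step forecasts.

    errors_1, errors_2: forecast errors of the two models on the same points.
    loss: loss function L applied to each error (squared error assumed here).
    """
    d = [loss(e1) - loss(e2) for e1, e2 in zip(errors_1, errors_2)]
    n = len(d)
    d_mean = sum(d) / n
    s2 = sum((x - d_mean) ** 2 for x in d) / (n - 1)   # sample variance of d
    return d_mean / math.sqrt(s2 / n)


def two_sided_p_value(dm):
    """p-value of the statistic under the standard normal approximation."""
    return math.erfc(abs(dm) / math.sqrt(2.0))


# Toy forecast errors: model 2 is closer to the actual values than model 1.
errors_model_1 = [2.0, -3.0, 2.5, -2.0, 3.0, -2.5]
errors_model_2 = [1.0, -1.0, 0.5, -1.5, 1.0, -0.5]

dm = dm_statistic(errors_model_1, errors_model_2)
print(dm, two_sided_p_value(dm))
```

A positive statistic means the first model has the larger average loss; the null hypothesis of equal accuracy is rejected when the absolute value exceeds the critical value of the standard normal distribution at the chosen significance level.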

Based on the above results, the following conclusions can be drawn:

(1): Comparing the forecasting errors of the different hybrid systems, the DM test values of the different prediction models all exceed the upper limit at the 1% significance level;
(2): The Diebold-Mariano test was performed on the prediction errors of four different traditional single models, and the test values against the VLG were all higher than the upper limit at the 1% significance level;
(3): The minimum DM value for the comparison between the VLG combination system and the LSTM models using other data preprocessing technologies is 2.0104, which still exceeds the 5% significance level threshold (1.96 for a two-sided test).

Therefore, according to the Diebold-Mariano test results, it can be reasonably concluded that the employed prediction model not only has greater prediction capacity than the other systems but also differs significantly from them in prediction accuracy, demonstrating its superiority in short-term power load forecasting.

5.2. Model Stability Study

This section examines the stability of the model and uses two different sets of repeated experiments to estimate the prediction ability of the VLG system. By comparing the prediction stability of three systems, LSTM, BP, and PSO-BP, the prediction stability of LSTM, the core part of the hybrid system employed in this paper, is verified. All three models use raw data and are modeled without denoising. Based on the spring feature data set of Queensland, Australia, 20 repeated experiments were conducted on the three models to explore the volatility of their prediction accuracy. The variance reflects the robustness and volatility of a forecasting system: the smaller the standard deviation of the prediction error is, the more robust the prediction system and the weaker the volatility will be. Therefore, the standard deviation of the prediction error is used to appraise the robustness of the employed combination prediction system and the comparison systems.

Figure 7a illustrates the differences in forecasting stability between the three systems over 20 tests. The results show that the BP has the largest volatility: the maximum MAPE is 1.5648%, the minimum is 0.9186%, and the standard deviation is 0.2022. In contrast, the prediction capacity of the LSTM model is better and more stable: the maximum MAPE is 0.9126%, the minimum is 0.8241%, and the standard deviation is only 0.0367. Among these ANNs, the LSTM model has the best prediction accuracy and stability, as well as the smallest standard deviation, which reflects the stability advantages of the LSTM model.

Moreover, the excellent prediction performance of these models depends to a great extent on the optimization method of the model parameters. Different parameter optimization methods have certain effects on the forecasting accuracy and convergence speed of the system. To vary only a single factor, the LSTM model was trained with the Adam optimizer and with the stochastic gradient descent (SGD) method for 100 trials, and the differences between the two optimization methods were compared. Figure 7b shows the results. The average MAPE of the LSTM model obtained with the Adam optimizer is 0.8457%, a significant improvement over SGD. In the scatter plot of MAPE, the MAPE obtained by the Adam optimizer is between 0.8% and 0.9%, and the standard deviation is 0.0341. Moreover, the bandwidth of the scatter plot is narrower, indicating that the model prediction is more stable and has smaller prediction volatility. In contrast, the MAPE value of the model using the stochastic gradient descent method is between 0.9% and 1.3%, with a standard deviation of 0.2451. Here, the prediction error of the model is larger, the stability is worse than that of the Adam optimizer, the prediction volatility is large, and the prediction performance is unstable.
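The stability measure used in this subsection reduces to summarizing the MAPE values obtained over repeated runs. The sketch below shows one way to do this, assuming the sample standard deviation is intended (the section does not say whether the sample or population form was used); the MAPE values are invented for illustration and are not results from the paper.

```python
import statistics


def stability_summary(mape_runs):
    """Summarize the spread of MAPE values over repeated training runs."""
    return {
        "min": min(mape_runs),
        "max": max(mape_runs),
        "mean": statistics.mean(mape_runs),
        "std": statistics.stdev(mape_runs),   # sample standard deviation (assumed)
    }


# Hypothetical MAPE values (%) from five repeated runs of the same model.
runs = [0.82, 0.85, 0.91, 0.84, 0.88]
summary = stability_summary(runs)
print(summary)
```

A model with a lower mean and a smaller standard deviation over the runs is both more accurate and less sensitive to random initialization.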

5.3. Multi-Step Prediction and Result Analysis

In the experiments in Section 4, the forecast model was used to forecast the next value of the power load data, that is, to make a single-step forecast. This section compares three different hybrid models to appraise the prediction performance of the employed hybrid system VLG in a multi-step prediction test.
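One common way to obtain multi-step forecasts from a single-step model is the recursive strategy, in which each prediction is fed back as an input for the next step. The sketch below illustrates that strategy only; this section does not state whether the authors used a recursive or a direct strategy, and `mean_of_window` is a hypothetical stand-in for a trained one-step model such as an LSTM.

```python
def recursive_forecast(history, one_step_model, window, steps):
    """Forecast `steps` values ahead by feeding predictions back as inputs."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        y_hat = one_step_model(buffer)
        forecasts.append(y_hat)
        buffer = buffer[1:] + [y_hat]   # slide the window over the new prediction
    return forecasts


def mean_of_window(window_values):
    """Hypothetical stand-in for a trained one-step forecasting model."""
    return sum(window_values) / len(window_values)


load = [10.0, 12.0, 14.0, 16.0]   # toy series
three_step = recursive_forecast(load, mean_of_window, window=2, steps=3)
print(three_step)
```

Under this strategy, errors made at step 1 propagate into steps 2 and 3, which is one reason accuracy typically degrades as the forecast horizon grows.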

Unlike the comparisons in the previous sections, this experiment aims to verify how the new VLG model compares with two other VMD-based hybrid models (i.e., VMD-GWO-SVM and VMD-PSO-BP). SVM and BP models perform well in dealing with time series forecasting problems and are also generally employed in short-term power load data predictions. For hyperparameter optimization, the BP-based comparison model uses the PSO algorithm, and the SVM-based comparison model uses the gray wolf optimization algorithm. Using the spring data of Queensland, the optimized BP and SVM models serve as strong classic hybrid baselines against which the accuracy of the VLG hybrid system is verified (see Table 14 and Figure 8 for comparison results). In Figure 8, (a) shows the forecast and actual values of the proposed model and the classical hybrid prediction models in the multi-step prediction, and (b) shows the index results of the proposed prediction system and the classical hybrid prediction models.

For the spring data of Queensland, in the one-step forecast, the proposed combined model VLG did not differ significantly from the comparative model VMD-PSO-BP on the various evaluation indicators but still obtained the best MAPE, MAE, and RMSE, with 0.2448%, 18.13, and 23.54, respectively. Thus, although the employed system does not show an outstanding advantage over the classic combined model VMD-PSO-BP in the one-step prediction, its performance is no worse than that of any other model. In the two-step prediction, the VLG achieves obvious advantages, and its MAPE, MAE, and RMSE are 0.7653%, 48.77, and 56.73, respectively. In contrast, the VMD-PSO-BP model does not differ significantly from the VMD-GWO-SVM in two-step prediction accuracy. The prediction accuracy of both is far poorer than in the one-step prediction: their MAPE values are 1.6333% and 1.9159%, which are 0.868 and 1.1506 percentage points higher than the value of the VLG system. In the three-step prediction, the MAPEs of the three hybrid systems are all greater than 1%, but the proposed VLG combined system, with a MAPE of 1.35655%, still offers the greatest prediction ability among the three VMD-based combined systems. In terms of the percentage improvement in MAPE, the advantage of the VLG system was greatest in the two-step prediction, at 60.05% and 53.14%, respectively. Figure 8 compares the forecasting performance for steps 1, 2, and 3 on the spring data of Queensland. Across the three prediction steps, the VLG hybrid system remains the most accurate and valid prediction system.

5.4. Improvement of the Evaluation Index

In previous index evaluation systems, the MAPE values of each prediction model were too small, and the RMSEs were different because of the differences in the data dimension, making it challenging to intuitively display the degree of differences in the model prediction accuracy [48]. In this study, we use the percentage improvements of the MAPE and RMSE criteria. In this way, a comprehensive analysis of the proposed combined system can be carried out. The definition is as follows:

P_{MAPE} = \left| \frac{MAPE_{1} - MAPE_{2}}{MAPE_{1}} \right| \times 100\%

(27)

P_{RMSE} = \left| \frac{RMSE_{1} - RMSE_{2}}{RMSE_{1}} \right| \times 100\%

(28)
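Equations (27) and (28) share the same form, so a single helper covers both. The sketch below assumes that the subscript 1 denotes the comparison model and the subscript 2 denotes the proposed system; the function name is hypothetical, and the example reuses the two-step MAPE values quoted in Section 5.3.

```python
def percentage_improvement(baseline, proposed):
    """|(baseline - proposed) / baseline| * 100, as in Equations (27) and (28)."""
    return abs((baseline - proposed) / baseline) * 100.0


# Two-step MAPE values quoted in Section 5.3: 1.6333% for a comparison model
# and 0.7653% for the VLG system.
p_mape = percentage_improvement(1.6333, 0.7653)
print(round(p_mape, 2))  # 53.14
```

Because the result is a ratio, it can be compared across indices with different units, which is the motivation given for introducing these two criteria.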

The improved MAPE and RMSE indicators are shown in Table 15. Considering the results in Table 15, the prediction capacity of the employed combined system is discussed and analyzed as follows:

(a): The predictive capacity of the system employed in this study is clearly commendable.
(b): The value of the percentage improvement of the evaluation index shows a clear decreasing trend. This indicates that the prediction accuracy of the system is gradually improved, with the data preprocessing technology and the optimization algorithm playing a vital role in improving the prediction ability of the system.
(c): Notably, the VLG hybrid system presents obvious advantages over other systems.

5.5. Future and Prospects

The hybrid prediction system based on deep learning proposed in this study overcomes the shortcomings of traditional prediction models. In the rapidly developing intelligent information age, accurate load forecasting has become an indispensable part of the power load field and plays a vital role in the safe operation, daily distribution, and economy of the power system. From artificial neural networks, to machine learning, and finally to deep learning, the construction of load forecasting models is approaching ever closer to the actual situation. Among the existing prediction models, those constructed by deep learning methods such as LSTM and CNN are clearly better than those constructed by artificial neural networks and traditional machine learning algorithms. A hybrid prediction model combining an intelligent optimization algorithm and a deep learning network has higher practical application value and stronger expansibility, making it able to more easily fit nonlinear time series with strong volatility.

In addition to traditional load forecasting, intermittent renewable energy, such as photovoltaic power generation and wind power generation, features stronger volatility, randomness, and instability. However, research on predicting intermittent renewable energy using deep learning methods is still very limited. The development of a comprehensive and effective deep learning method is expected to become a new direction of smart grid research. The application of artificial intelligence algorithms in the field of new energy is a new typical application scenario for artificial intelligence and also provides a new intelligent solution for building a global low-carbon energy future.

6. Conclusions

Short-term power load predictions play a crucial role in the safe operation and risk assessment of the power grid and have attracted considerable attention among scholars. Because of the inherent uncertainty and randomness of power load sequences, determining how to efficiently and effectively predict the power load is still a challenging task. A series of studies has been carried out to enhance the performance of power load prediction. Unfortunately, these methods are mostly restricted to using a single prediction model to predict power load sequences; however, any single model will have inherent shortcomings. Moreover, most previous studies have not considered the effect of data preprocessing and sequence noise on the model prediction accuracy. In response, this paper proposed a new predictive analysis system that overcomes the above shortcomings and provides an effective technical means for short-term power load analysis and monitoring. In the developed model, variational modal decomposition (VMD) was employed to decompose the original sequence into a set of components ordered from high to low frequency. Through reconstruction, the global noise was effectively removed. Then, the long short-term memory neural network (LSTM) was used instead of the classical neural network to predict the power load data, which effectively improved the prediction accuracy. Finally, to further improve the modeling performance and robustness, an improved binary coding genetic algorithm was proposed based on the genetic algorithm; this algorithm achieved high accuracy and maintained strong stability. The effectiveness of the algorithm was verified by experiments. The developed predictive analysis system was used to predict the seasonal data of New South Wales, Queensland, and South Australia and to calculate multiple performance indicators (MAPE, RMSE, MSE, and PMAPE).
The experimental results indicate that the employed VLG system attains the lowest MAPE values in the three regions, 0.3717%, 0.3486%, and 0.9800%, respectively, outperforming the comparison models. Compared with established hybrid models, the multi-step prediction of this model also provides strong prediction performance. In general, the proposed predictive analysis system shows excellent performance in analyzing and monitoring short-term power loads. Specifically, this system not only deeply analyzes the information related to people's activities and lives but can also accurately and steadily approach the actual values. Therefore, future power plant decision-makers and grid investors could make reasonable decisions based on the system presented in this article to monitor and predict power loads.

Author Contributions

Conceptualization, Y.J. Methodology, J.W. Visualization, A.S. Writing—review and editing, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Foundation of Liaoning Social Science Federation [Grant No. 20201slktyb-031].

Acknowledgments

This research was supported by the Foundation of Liaoning Social Science Federation [Grant No. 20201slktyb-031].

Conflicts of Interest

We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work. This manuscript is our own work, and its content has not been copied from elsewhere. This manuscript has not been published before nor submitted to another journal for consideration, and all data measurements are genuine results that have not been manipulated. In addition, none of the authors have any financial or scientific conflicts of interest with regard to the research described in this manuscript.

Abbreviations

List of terminologies (method and indices)
EMD	Empirical mode decomposition	DA	Denoising autoencoder
VMD	Variational Modal Decomposition	BP	Back propagation neural network
LSTM	Long Short-Term Memory neural network	RBF	Radial basis function
Elman	Elman neural network	SVM	Support vector machine
ARMA	Autoregressive moving average model	ARIMA	Autoregressive integrated moving average model
PSO	Particle swarm optimization algorithm	GWO	Grey wolf optimization algorithm
EMD-BP	BP after EMD technology	DA-BP	BP after DA technology
VMD-BP	BP after VMD technology	EMD-LSTM	LSTM after EMD technology
DA-LSTM	LSTM after DA technology	VMD-LSTM	LSTM after VMD technology
GC	Grey correlation method	IMFs	Intrinsic mode functions
SP	Spearman correlation coefficient	PE	Pearson correlation coefficient
MAPE	Mean absolute percentage error	MAE	Mean absolute error
RMSE	Root Mean Square Error	R²	Coefficient of determination
AI	Artificial intelligence	PSO-BP	BP after PSO optimization algorithm
Adam	Adaptive moment estimation optimization	SGD	Stochastic Gradient Descent
BEGA	Binary encoding genetic algorithm	EMD-LSTM GA	LSTM after EMD and GA optimization
DA-LSTM-GA	LSTM after DA and GA optimization	VMD-PSO-BP	BP after PSO and VMD optimization
VMD-GWO-SVM	VMD after VMD and GWO optimization	DM	Diebold-Mariano test
DL	Deep learning	RNN	Recurrent neural networks
CNN	Convolution neural network	BiLSTM	Bi-directional long-term memory neural network
ANNs	Artificial neural networks	P_MAPE	The improvement in MAPE values
ADMM	Alternate Direction Method of Multipliers	VLG	The model combined VMD-BEGA -LSTM
List of terminologies (parameters and variables)
ω_k	Center pulse frequency	N1	Population Size of the GA
f	Actual signal	α	Initial learning rate
λ	Lagrange factor	x_t	Corresponding input
u_k(ω)	Modal function of VMD Technology	L1	Binary encoding length
f_t	Forget gate	i_t	Input gate
O_t	Output gate	C_t	Memory cell state
h_t	T-1 time input	sigmoid	Input gate layer
W_f	Coefficient matrix	b_f	Bias vector
g_t	Gradient at time step t	m_t	Gradient mean
v_t	Exponential moving average	β	Exponential decay rate
p_i	Selection probability in genetic	x^bin	Binary code
x^dec	Decimal code	L	Input unit length
N	Number of cells in the hidden layer	loss	Loss function value of training model
y_m	True value of data	${\hat{y}}_{m}$	The value of model forecast
iter	Model iterations	$\bar{y}$	Average of sequence data
The main terminologies mentioned in this paper (including indices, methods, variables and parameters).

References

Yang, W.; Wang, J.; Tong, N. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Appl. Energy 2019, 235, 1205–1225.
Guo, Z.; Zhou, K.; Zhang, X.; Yang, S. A deep learning model for short-term power load and probability density forecasting. Energy 2018, 160, 1186–1200.
Zhang, X.; Wang, J. A novel decomposition-ensemble model for forecasting short-term load-time series with multiple seasonal patterns. Appl. Soft Comput. 2018, 65, 478–494.
Wang, R.; Wang, J.; Xu, Y. A novel combined model based on hybrid optimization algorithm for electrical load forecasting. Appl. Soft Comput. 2019, 82, 105548.
He, Q.; Wang, J.; Lu, H. A hybrid system for short-term wind speed forecasting. Appl. Energy 2018, 226, 756–771.
Zhao, H.; Han, X.; Guo, S. DGM (1, 1) model optimized by MVO (multi-verse optimizer) for annual peak load forecasting. Neural Comput. Appl. 2018, 30, 1811–1825.
Yang, Z.; Niu, H. Research on urban distribution network planning management system based on load density method. Eng. Technol. Res. 2018, 8, 76–77.
Cui, Q.; Shu, J.; Wu, Z.; Huang, L.; Yao, W.; Song, X. Medium- and long-term load forecasting based on glrm model and MC error correction. New Energy Progress 2017, 5, 472–477.
Jaihuni, M.; Basak, J.K.; Khan, F.; Okyere, F.G.; Arulmozhi, E.; Bhujel, A.; Park, J.; Hyun, L.D.; Kim, H.T. A Partially Amended Hybrid Bi-GRU—ARIMA Model (PAHM) for Predicting Solar Irradiance in Short and Very-Short Terms. Energies 2020, 13, 435.
Song, J.; Wang, J.; Lu, H. A novel combined model based on advanced optimization algorithm for short-term wind speed forecasting. Appl. Energy 2018, 215, 643–658.
Lydia, M.; Kumar, S.S.; Selvakumar, A.I.; Kumar, G.E.P. Linear and nonlinear auto-regressive models for short-term wind speed forecasting. Energy Convers. Manag. 2016, 112, 115–124.
Kavasseri Rajesh, G.; Seetharaman, K. Day-ahead wind speed forecasting using f-ARIMA models. Renew. Energy 2009, 34, 1388–1393.
Zhang, X.; Wang, J.; Gao, Y. A Hybrid Short-Term Electricity Price Forecasting Framework: Cuckoo Search-Based Feature Selection with Singular Spectrum Analysis and Svm. Energy Econ. 2019, 81, 899–913.
Liu, J.; Liu, X.; Le, B.T. Rolling Force Prediction of Hot Rolling Based on GA-MELM. Complexity 2019, 3476521.
Fan, G.F.; Guo, Y.H.; Zheng, J.M.; Hong, W.C. A generalized regression model based on hybrid empirical mode decomposition and support vector regression with back-propagation neural network for mid-short-term load forecasting. J. Forecast. 2020, 39.
Xu, P. Research on Load Forecasting Method Based on Fuzzy Clustering and RBF Neural Network; Guangxi University: Nanning, China, 2012.
Xingjun, L.; Zhiwei, S.; Hongping, C.; Mohammed, B.O. A new fuzzy-based method for load balancing in the cloud-based Internet of things using a grey wolf optimization algorithm. Int. J. Commun. Syst. 2020, 33.
Almalaq, A.; Edwards, G. A review of deep learning methods applied on load forecasting. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, Cancun, Mexico, 18–21 December 2017; pp. 511–516.
Ryu, S.; Noh, J.; Kim, H. Deep neural network based demand side short term load forecasting. Energies 2016, 10, 3.
Massaoudi, M.S.; Refaat, S.; Abu-Rub, H.; Chihi, I.; Oueslati, F.S. PLS-CNN-BiLSTM: An End-to-End Algorithm-Based Savitzky–Golay Smoothing and Evolution Strategy for Load Forecasting. Energies 2020, 13, 5464.
Li, H.; Liu, H.; Ji, H.; Zhang, S.; Li, P. Ultra-Short-Term Load Demand Forecast Model Framework Based on Deep Learning. Energies 2020, 13, 4900.
Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 2017, 841–851.
Zhang, W.; Qu, Z.; Zhang, K.; Mao, W.; Ma, Y.; Fan, X. A combined model based on CEEMDAN and modified flower pollination algorithm for wind speed forecasting. Energy Convers. Manag. 2017, 136, 439–451.
Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. A Math. Phys. Eng. Sci. 1998, 454, 903–995.
Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 2009, 1–41.
Ribeiro, M.H.D.M.; Stefenon, S.F.; De Lima, J.D.; Nied, A.; Mariani, V.C.; Coelho, L.S. Electricity Price Forecasting Based on Self-Adaptive Decomposition and Heterogeneous Ensemble Learning. Energies 2020, 13, 5190.
Stefenon, S.F.; Ribeiro, M.H.D.M.; Nied, A.; Mariani, V.C.; Dos Santos Coelho, L.; Da Rocha, D.F.M.; Grebogi, R.B.; De Barros Ruano, A.E. Wavelet group method of data handling for fault prediction in electrical power insulators. Int. J. Electr. Power Energy Syst. 2020, 123.
He, X.; Nie, Y.; Guo, H.; Wang, J. Research on a Novel Combination System on the Basis of Deep Learning and Swarm Intelligence Optimization Algorithm for Wind Speed Forecasting. IEEE Access 2020, 8, 51482–51499.
Zhang, Y.; Pan, G.; Chen, B.; Han, J.; Zhao, Y.; Zhang, C. Short-term wind speed prediction model based on GA-ANN improved by VMD. Renew. Energy 2020, 156, 1373–1388.
He, Z.; Chen, Y.; Shang, Z.; Li, C.; Li, L.; Xu, M. A novel wind speed forecasting model based on moving window and multi-objective particle swarm optimization algorithm. Appl. Math. Model. 2019, 76, 717–740.
Zhu, C.; Teng, K. An early fault feature extraction method for rolling bearings based on variational mode decomposition and random decrement technique. Vibroeng. Procedia 2018, 18, 41–45.
Chen, X.; Yang, Y.; Cui, Z.; Shen, J. Wavelet Denoising for the Vibration Signals of Wind Turbines Based on Variational Mode Decomposition and Multiscale Permutation Entropy. IEEE Access 2020, 8, 40347–40356.
Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
Zhao, N.; Mao, Z.; Wei, D.; Zhao, H.; Zhang, J.; Jiang, Z. Fault Diagnosis of Diesel Engine Valve Clearance Based on Variational Mode Decomposition and Random Forest. Appl. Sci. 2020, 10, 1124.
Song, E.; Ke, Y.; Yao, C.; Dong, Q.; Yang, L. Fault Diagnosis Method for High-Pressure Common Rail Injector Based on IFOA-VMD and Hierarchical Dispersion Entropy. Entropy 2019, 21, 923.
Wang, J.; Wang, S.; Yang, W. A novel non-linear combination system for short-term wind speed forecast. Renew. Energy 2019, 143, 1172–1192.
Sun, H.; Fang, L.; Zhao, F. A fault feature extraction method for single-channel signal of rotary machinery based on VMD and KICA. J. Vibroeng. 2019, 21, 370–383.
Lin, H.; Hua, Y.; Ma, L.; Chen, L. Application of ConvLSTM network in numerical temperature prediction interpretation. In Proceedings of the ICMLC ′19—2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; pp. 109–113.
Wang, J.Q.; Du, Y.; Wang, J. LSTM based long-term energy consumption prediction with periodicity. Energy 2020, 197, 117197.
Sakinah, N.; Tahir, M.; Badriyah, T.; Syarif, I. LSTM with adam optimization-powered high accuracy preeclampsia classification. In Proceedings of the 2019 International Electronics Symposium (IES), Surabaya, Indonesia, 27–28 September 2019; pp. 314–319.
Li, C.; Xie, C.; Zhang, B.; Chen, C.; Han, J. Deep Fisher discriminant learning for mobile hand gesture recognition. Pattern Recognit. 2018, 77, 276–288.
Qin, X.; Zhang, W.; Gao, S.; He, X.; Lu, J. Sensor fault diagnosis of autonomous underwater vehicle based on LSTM. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 6067–6072.
Li, H.; Wang, J.; Li, R.; Lu, H. Novel analysis–forecast system based on multi-objective optimization for air quality index. J. Clean. Prod. 2019, 208, 1365–1383.
Bera, S. Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification. Int. J. Remote Sens. 2020, 41.
Yang, W.; Wang, J.; Wang, R. Research and Application of a Novel Hybrid Model Based on Data Selection and Artificial Intelligence Algorithm for Short Term Load Forecasting. Entropy 2017, 19, 52.
Yang, W.; Wang, J.; Niu, T.; Du, P. A Novel System for Multi-Step Electricity Price Forecasting for Electricity Market Management. Appl. Soft Comput. 2020, 88.
He, B.; Ying, N.; Jianzhou, W. Electric Load Forecasting Use a Novelty Hybrid Model on the Basic of Data Preprocessing Technique and Multi-Objective Optimization Algorithm. IEEE Access 2020, 8, 13858–13874.
Yechi, Z.; Jianzhou, W.; Haiyan, L. Research and Application of a Novel Combined Model Based on Multiobjective Optimization for Multistep-Ahead Electric Load Forecasting. Energies 2019, 12, 1931.

Figure 1. Flowchart of the long short-term memory neural network (LSTM) algorithm.

Figure 2. The flowchart of variational mode decomposition-binary encoding genetic optimization algorithm-LSTM (VLG) system.

Figure 3. Site data and data preprocessing results. (a) The situation of the three study regions. (b) Data preprocessing process.

Figure 4. The fitting situation and error description of the VLG prediction system and seven comparative prediction models based on spring dataset of South Australia.

Figure 5. The fit of the VLG prediction system under the training set and test set based on the four seasonal data sets of South Australia.

Figure 6. Comparison results of the hybrid forecasting models. (a) Comparison of the annual average MAPE of different prediction models. (b) Comparison of different decomposition approaches. (c) Comparison results for the BEGA algorithm. (d) Performance of different prediction models based on NSW.

Figure 7. Prediction performance and stability of the LSTM neural network and Adam optimizer with the Queensland datasets. (a) Stability of the experimental results of different prediction models. (b) Stability of the experimental results of different optimizers.

Figure 8. Comparison between the VLG model and the classical hybrid model in multi-step predictions. (a) Prediction model fitting. (b) Comparison of prediction performance evaluation indexes.

Table 1. Comparison of the advantages and disadvantages of existing power load forecasting systems.

Category	Advantage	Disadvantage	Method Sample	Sample Advantage	Sample Disadvantage
Physical arithmetic	Simple model idea, wide parameter range	Needs a lot of observation data, consumes a lot of computing resources, more suitable for long-term power load forecasting	Single consumption method	Separate analysis of different types of electricity consumption	Prediction accuracy is low when there are many influencing factors
Physical arithmetic	Simple model idea, wide parameter range		Elastic coefficient method	Reflects the relationship between economic growth rate and power consumption growth rate	The calculation is complex and requires accurate statistics on economic growth
Statistical strategies	Wide application, higher prediction accuracy	Poor prediction accuracy in multi-step prediction	Time series (autoregressive moving average (ARMA), ARIMA)	Simple model assumptions, good self-fitness	The extrapolation effect is poor, reducing the prediction range
Statistical strategies	Wide application, higher prediction accuracy		Grey prediction system	Less modeling information and convenient operation	Low accuracy, not systematic
Machine learning	Wide application, strong generalization and robustness	High complexity, high knowledge requirements	Neural network (back propagation neural network (BP), support vector machine (SVM), RBF)	Excellent fitting of nonlinear properties	High degree of data dependence
			Data denoising (empirical mode decomposition (EMD), deep learning image noise reduction algorithm (DA))	Compared with other methods, it is easy to understand	EMD has mode aliasing and DA has insufficient noise reduction
			Deep learning (convolution neural network (CNN))	Strong fault tolerance, simple human-computer interaction	Long CPU operation time

Note: With the advance of forecasting technology, few researchers use a single method to forecast the power load. This table introduces in detail the different power load forecasting methods used in previous studies.

Table 2. The evaluation metrics.

Metric	Definition	Equation
MAPE	Mean absolute percentage error	$MAPE = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{{\hat{y}}_{i} - y_{i}}{y_{i}} \right| \times 100\%$
MAE	Mean absolute error	$MAE = \frac{1}{N} \sum_{i = 1}^{N} \left| {\hat{y}}_{i} - y_{i} \right|$
RMSE	Root mean square error	$RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}$
R²	Coefficient of determination	$R^{2} = 1 - \frac{\sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}$

Note: In this table, ${\hat{y}}_{i}$ represents the predicted value of the system, $y_{i}$ is the true value of the power load data, $N$ is the number of samples, and $\bar{y}$ is the average value of the sequence, calculated as $\bar{y} = \frac{1}{N} \sum_{i = 1}^{N} y_{i}$.
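The four indicators of Table 2 can be computed as in the minimal Python sketch below. The function name `forecast_metrics` and the sample numbers are illustrative assumptions, not part of the original experiments, and R² is written in its standard form with the mean of the true values in the denominator.

```python
import math


def forecast_metrics(y_true, y_pred):
    """Return MAPE (%), MAE, RMSE and R^2 for two equally long sequences.

    Hypothetical helper for illustration; y_true must not contain zeros,
    because MAPE divides by the true values.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n * 100.0

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "R2": r2}


# Toy load values (MW), chosen only to make the arithmetic easy to follow.
metrics = forecast_metrics([100.0, 200.0, 300.0, 400.0],
                           [110.0, 190.0, 310.0, 390.0])
print(metrics)
```

Lower MAPE, MAE, and RMSE and a higher R² indicate a better fit, which is how the comparison tables in this article should be read.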

Table 3. Statistical description of the NSW seasonal power load data (MW).

Season	Data	Number	Min	Max	Mean	Standard deviation
Summer	All samples	1488	5622.05	13787.85	8351.85	1519.63
Summer	Training set	1190	5622.05	13787.85	8409.83	1584.77
Summer	Testing set	298	5909.89	10928.47	8120.35	1200.07
Autumn	All samples	1488	5449.59	10724.86	7909.03	1166.83
Autumn	Training set	1190	5689.53	10080.21	7959.44	1104.36
Autumn	Testing set	298	5449.59	10724.86	7707.75	1372.34
Winter	All samples	1440	5997.81	11553.75	8602.26	1208.75
Winter	Training set	1152	5997.81	11553.75	8562.20	1213.52
Winter	Testing set	288	6191.79	11537.78	8762.52	1177.98
Spring	All samples	1440	5661.39	9916.19	7520.16	877.59
Spring	Training set	1152	5661.39	9916.19	7543.92	868.22
Spring	Testing set	288	5699.65	9081.59	7425.11	909.46
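The sample counts in Table 3 are consistent with a chronological split in which roughly the first 80% of each seasonal series is used for training and the remainder for testing. That ratio is inferred here from the counts rather than taken from the table, so the sketch below is an assumption, and the helper name `chronological_split` is hypothetical.

```python
def chronological_split(series, train_num=4, train_den=5):
    """Split a time-ordered sequence into training and testing parts.

    The first train_num/train_den of the points (4/5 = 80% by default,
    an assumed ratio) form the training set; no shuffling is applied,
    so the test set always lies after the training set in time.
    """
    n_train = len(series) * train_num // train_den
    return series[:n_train], series[n_train:]


# Placeholder series with the same lengths as the seasonal datasets.
train, test = chronological_split(list(range(1488)))
print(len(train), len(test))
```

Integer arithmetic is used for the cut point to avoid floating-point rounding at the boundary.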

Table 4. Parameter settings of the artificial neural network model.

Parameters	LSTM	BP
Length of input units	Based on BEGA algorithm / 5	5
Number of hidden layer nodes	Based on BEGA algorithm / 8	11
Objective function	MSE	MSE
Activation function	Sigmoid	PURELIN
Epochs	200	200
Initial learning rate	0.001	0.0001

Note: The input unit length of the LSTM model without the BEGA optimization algorithm is 5, and the number of hidden layer nodes is 8.

Table 5. Model parameters.

Model	Parameters	Default Value
BEGA	Maximum number of iterations	30
	Binary code length	15
	Population number	10
	Fitness function	MSE
	Select operation	roulette wheel selection
Adam	Initial learning rate	0.001
	$α$	0.001
	$β_1$	0.9
	$β_2$	0.999
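To make the BEGA settings in Table 5 more concrete, the following Python sketch shows two building blocks of a binary encoding genetic algorithm: decoding a binary code $x^{bin}$ into its decimal value $x^{dec}$, and roulette wheel selection with probabilities $p_i$. It rests on stated assumptions: the selection weight is taken as the inverse of the MSE so that lower errors are favoured, and the function names are hypothetical. How the 15-bit code is mapped to the LSTM input length and hidden-node count is not specified in this table and is deliberately left out.

```python
import random


def decode(bits):
    """Convert a binary code (list of 0/1, most significant bit first) to decimal."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def selection_probabilities(mse_values):
    """Roulette wheel probabilities p_i for a minimization problem.

    Assumption: weight = 1 / MSE, so a smaller error gets a larger slice.
    All MSE values must be strictly positive.
    """
    weights = [1.0 / mse for mse in mse_values]
    total = sum(weights)
    return [w / total for w in weights]


def roulette_select(population, mse_values, rng):
    """Draw a new population of the same size, with replacement."""
    probabilities = selection_probabilities(mse_values)
    return rng.choices(population, weights=probabilities, k=len(population))


rng = random.Random(0)
# 10 individuals with 15-bit codes, matching the sizes listed in Table 5.
population = [[rng.randint(0, 1) for _ in range(15)] for _ in range(10)]
mse_values = [rng.uniform(0.5, 2.0) for _ in population]  # placeholder fitness values
parents = roulette_select(population, mse_values, rng)
print([decode(individual) for individual in parents])
```

In a full run, the placeholder fitness values would be replaced by the MSE of an LSTM trained with the decoded parameters, and selection would be followed by crossover and mutation for up to 30 iterations.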

Table 6. Comparison of the prediction performance of models using different data preprocessing systems.

Model	MAE			RMSE			MAPE (%)			R²
Model	NSW	SA	QLD	NSW	SA	QLD	NSW	SA	QLD	NSW	SA	QLD
LSTM	83.91	32.65	51.08	110.57	50.15	67.69	0.8478	2.3571	0.8475	0.9919	0.9744	0.9920
BP	100.49	35.30	86.59	125.95	52.63	74.80	1.0305	2.5735	0.9509	0.9887	0.9743	0.9896
EMD-LSTM	51.95	13.25	30.99	63.05	17.21	37.89	0.5476	1.0236	0.5231	0.9963	0.9898	0.9949
EMD-BP	65.89	22.31	38.01	75.11	29.13	46.31	0.6558	1.7704	0.6846	0.9932	0.9823	0.9921
DA-LSTM	61.09	20.69	38.66	77.72	29.62	42.86	0.7646	1.6430	0.7733	0.9925	0.9836	0.9934
DA-BP	68.30	19.79	41.31	88.57	32.38	54.61	0.8553	1.5923	0.8062	0.9906	0.9841	0.9924
VMD-LSTM	49.42	11.69	28.75	62.17	15.60	34.47	0.4922	0.9352	0.4859	0.9971	0.9910	0.9967
VMD-BP	52.04	19.77	35.54	94.89	25.48	42.78	0.8251	1.5808	0.6421	0.9921	0.9802	0.9925

Table 7. Correlation coefficient between new sequence and original sequence.

Model	VMD			EMD			DA
Model	NSW	SA	QLD	NSW	SA	QLD	NSW	SA	QLD
GC	0.8953	0.785	0.749	0.885	0.779	0.725	0.752	0.767	0.702
SP	0.999	0.990	0.999	0.998	0.990	0.999	0.996	0.986	0.997
PE	0.999	0.987	0.999	0.999	0.986	0.999	0.995	0.986	0.996

Table 8. The performance indicators of the VLG model based on seasonal data from New South Wales compared with other traditional power load forecasting methods.

Model	Spring				Summer				Autumn				Winter				Annual Mean
Model	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE
Elman	19.1725	0.8506	1691.73	1457.86	8.1867	0.9198	906.39	714.14	6.7683	0.9337	671.54	508.20	8.3994	0.9131	612.39	763.87	10.6317	0.9043	970.51	901.85
RBF	19.4860	0.8644	1707.73	1477.86	8.3402	0.9145	922.56	725.44	6.9717	0.9288	682.04	520.26	8.3686	0.9177	612.06	766.47	10.7916	0.9064	981.09	872.51
ARIMA	1.7767	0.9812	121.86	117.85	1.4676	0.9817	150.69	119.91	1.5563	0.9812	144.49	114.45	1.9512	0.9810	139.23	126.23	1.6880	0.9813	139.06	119.61
SVM	2.0421	0.9792	176.34	158.17	1.7475	0.9751	119.33	112.19	1.8620	0.9807	130.54	121.22	2.3122	0.9759	196.92	179.83	1.9910	0.9777	155.78	142.85
BP	1.2493	0.9836	127.40	101.96	1.2646	0.9833	138.28	103.55	1.2846	0.9830	126.88	98.95	1.1503	0.9839	115.42	89.81	1.2372	0.9835	126.99	98.56
PSO-BP	1.1766	0.9835	132.91	90.41	0.7660	0.9942	86.28	65.77	1.0928	0.9854	113.07	80.16	1.0047	0.9903	161.11	67.09	1.0100	0.9887	123.34	75.85
LSTM	0.8942	0.9944	90.83	71.91	0.9548	0.9934	112.82	87.33	1.0360	0.9877	103.10	78.48	1.0091	0.9852	107.30	72.56	0.9735	0.9902	103.51	77.57
VMD-LSTM	0.5032	0.9951	49.16	40.84	0.6901	0.9947	72.96	60.83	0.4762	0.9952	46.33	28.45	0.5423	0.9943	52.14	34.69	0.5530	0.9949	55.15	41.20
VLG	0.3081	0.9979	30.27	24.88	0.4271	0.9972	48.05	38.53	0.2724	0.9981	26.66	20.76	0.4792	0.9970	44.73	35.48	0.3717	0.9976	37.43	29.91

Note: The table above reports the details of the different forecasting methods. Four indicators (MAE, RMSE, MAPE, and R²) were selected to verify the performance of each model. They are defined as $MAE = \frac{1}{N} \sum_{i = 1}^{N} |{\hat{y}}_{i} - y_{i}|$, $RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}$, $MAPE = \frac{1}{N} \sum_{i = 1}^{N} |\frac{{\hat{y}}_{i} - y_{i}}{y_{i}}| \times 100\%$, and $R^{2} = 1 - \frac{\sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}$. Here, bold numbers indicate that the index values of this system are superior to those of the other systems.

Table 9. The performance indicators of the VLG model based on seasonal data from Queensland compared with other traditional power load forecasting methods.

Model	Spring				Summer				Autumn				Winter				Annual Mean
Model	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE	MAPE (%)	R²	RMSE	MAE
Elman	7.4343	0.9212	531.83	430.17	6.8212	0.9314	514.17	392.03	7.1909	0.9117	538.69	404.58	4.3644	0.9504	300.81	252.10	6.4527	0.9287	471.37	369.72
RBF	7.4613	0.9209	534.82	431.19	6.9253	0.9286	523.02	398.09	7.2187	0.9106	545.30	406.02	4.3591	0.9510	298.00	251.76	6.4911	0.9278	475.28	371.76
ARIMA	1.9497	0.9801	137.21	107.65	2.1738	0.9664	133.49	122.18	2.1513	0.9658	136.75	123.22	1.8488	0.9809	112.16	127.64	2.0309	0.9733	129.90	120.17
SVM	1.4228	0.9885	113.49	101.34	1.6231	0.9790	124.26	112.32	1.2141	0.9865	101.93	89.44	1.3952	0.9860	106.44	94.76	1.4138	0.9850	111.53	99.47
BP	0.9847	0.9896	74.80	59.45	1.1686	0.9862	85.89	67.20	0.9670	0.9891	73.76	53.29	0.927	0.9885	68.63	53.65	1.0118	0.9884	75.77	58.39
PSO-BP	0.8657	0.9910	73.07	48.37	0.7973	0.9921	61.26	45.14	0.8817	0.9902	69.80	49.31	0.8704	0.9907	99.11	48.36	0.8538	0.9910	75.81	47.79
LSTM	0.8475	0.9919	67.69	51.08	0.9781	0.9897	70.98	56.83	0.8740	0.9913	65.07	49.20	0.8944	0.9905	64.60	50.46	0.8985	0.9909	67.08	51.89
VMD-LSTM	0.4859	0.9946	34.47	28.75	0.4453	0.9938	32.47	25.81	0.5228	0.9948	35.36	29.19	0.5912	0.9941	53.21	48.64	0.5110	0.9943	38.87	33.09
VLG	0.2602	0.9983	20.26	15.53	0.3718	0.9953	28.46	21.67	0.3101	0.9979	21.60	17.30	0.4524	0.9942	30.96	25.25	0.3486	0.9964	25.32	19.93

Note: The table above reports the details of the different forecasting methods. Four indicators (MAE, RMSE, MAPE, and R²) were selected to verify the performance of each model. They are defined as $MAE = \frac{1}{N} \sum_{i = 1}^{N} |{\hat{y}}_{i} - y_{i}|$, $RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}$, $MAPE = \frac{1}{N} \sum_{i = 1}^{N} |\frac{{\hat{y}}_{i} - y_{i}}{y_{i}}| \times 100\%$, and $R^{2} = 1 - \frac{\sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}$. Here, bold numbers indicate that the index values of this system are superior to those of the other systems.

Table 10. The performance indicators of the VLG model based on seasonal data from South Australia compared with other traditional power load forecasting methods.

Model	Spring				Summer				Autumn				Winter				Annual Mean
Model	MAPE(%)	R²	RMSE	MAE	MAPE(%)	R²	RMSE	MAE	MAPE(%)	R²	RMSE	MAE	MAPE(%)	R²	RMSE	MAE	MAPE(%)	R²	RMSE	MAE
Elman	24.1390	0.7494	404.81	331.16	11.4670	0.8821	216.55	171.91	15.2161	0.8346	210.36	166.59	12.4362	0.8744	222.15	165.58	15.8146	0.8351	263.46	208.81
RBF	24.6208	0.7501	407.97	336.96	11.6931	0.8796	218.64	174.94	15.5422	0.8339	210.86	170.14	12.5620	0.8706	220.51	166.75	16.1045	0.8336	264.49	212.19
ARIMA	2.2543	0.9784	28.02	35.12	1.5754	0.9802	29.11	21.20	2.7415	0.9737	47.22	39.03	2.9042	0.9721	55.12	39.93	2.3687	0.9761	39.86	33.82
SVM	2.7534	0.9728	60.31	47.98	2.1322	0.9745	41.33	29.86	2.6262	0.9749	55.91	42.64	2.7440	0.9787	52.35	34.22	2.5639	0.9752	52.475	38.68
BP	2.5735	0.9736	52.63	35.30	2.2731	0.9703	48.26	29.43	2.5139	0.9778	50.11	36.12	2.606	0.9793	57.42	37.84	2.4916	0.9755	52.11	34.67
PSO-BP	2.1429	0.9781	51.53	29.82	1.8272	0.9820	45.49	27.61	2.4783	0.9789	46.91	31.36	2.2424	0.9801	49.50	28.20	2.1727	0.9798	48.35	29.24
LSTM	2.3571	0.9769	50.15	32.65	2.1480	0.9712	53.52	34.57	2.3904	0.9795	45.24	30.27	2.0946	0.9814	45.57	27.27	2.2475	0.9773	48.62	31.19
VMD-LSTM	1.2135	0.9885	19.74	16.22	0.9312	0.9907	19.23	14.11	1.1885	0.9894	18.88	14.52	1.3132	0.9862	44.21	27.54	1.1616	0.9887	25.51	18.09
VLG	0.9816	0.9912	17.62	13.50	0.8991	0.9919	18.13	13.98	0.8212	0.9923	13.93	10.19	1.2181	0.9896	25.41	20.21	0.9800	0.9913	18.77	14.47

Note: The table above reports the details of the different forecasting methods. Four indicators (MAE, RMSE, MAPE, and R²) were selected to verify the performance of each model. They are defined as $MAE = \frac{1}{N} \sum_{i = 1}^{N} |{\hat{y}}_{i} - y_{i}|$, $RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}$, $MAPE = \frac{1}{N} \sum_{i = 1}^{N} |\frac{{\hat{y}}_{i} - y_{i}}{y_{i}}| \times 100\%$, and $R^{2} = 1 - \frac{\sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}$. Here, bold numbers indicate that the index values of this system are superior to those of the other systems.

Table 11. Comparison of the results for different input unit lengths and numbers of cell units based on the NSW seasonal dataset.

Model	Evaluation Parameters	MAPE	MAE	RMSE	R²
Season: Spring. The optimal parameters: 12–8
VMD-LSTM	BEGA	0.2602	15.53	20.26	0.9986
VMD-LSTM	5–8	0.3612	22.01	28.12	0.9973
VMD-LSTM	16–12	0.5194	30.50	35.13	0.9949
Season: Summer. The optimal parameters: 1–-8
VMD-LSTM	BEGA	0.3718	21.67	28.46	0.9971
VMD-LSTM	5–8	0.6480	37.35	45.24	0.9944
VMD-LSTM	16–12	0.5034	28.95	35.03	0.9952
Season: Autumn. The optimal parameters: 25–12
VMD-LSTM	BEGA	0.3101	17.30	21.60	0.9978
VMD-LSTM	5–8	0.5785	30.68	37.96	0.9940
VMD-LSTM	16–12	0.3554	19.22	23.75	0.9969
Season: Winter. The optimal parameters: 4–6
VMD-LSTM	BEGA	0.3758	20.90	25.77	0.9965
VMD-LSTM	5–8	0.5828	32.69	40.98	0.9938
VMD-LSTM	16–12	0.5521	30.63	37.55	0.9942

Table 12. Comparison of the performance results of different optimizers based on the NSW spring dataset.

Forecasting Model	Optimizer	Metric	BEGA (4–6)	12–8	8–10	16–12
LSTM	Adam	MAPE	0.8744	1.1920	0.9580	0.8939
LSTM	Adam	MAE	66.22	89.74	74.80	68.01
LSTM	Adam	RMSE	86.79	114.19	92.12	84.74
LSTM	Adam	R²	0.9926	0.9874	0.9901	0.9913
LSTM	SGD	MAPE	1.071	1.2219	1.2685	1.2840
LSTM	SGD	MAE	81.34	91.59	96.27	97.95
LSTM	SGD	RMSE	109.94	118.64	123.39	126.43
LSTM	SGD	R²	0.9885	0.9870	0.9868	0.9866
VMD-LSTM	Adam	MAPE	0.2281	0.2947	0.3671	0.3609
VMD-LSTM	Adam	MAE	19.58	22.35	28.53	19.21
VMD-LSTM	Adam	RMSE	22.89	27.02	33.65	35.50
VMD-LSTM	Adam	R²	0.9987	0.9981	0.9971	0.9972
VMD-LSTM	SGD	MAPE	0.6253	0.7022	0.8376	0.9021
VMD-LSTM	SGD	MAE	60.04	65.98	61.64	66.66
VMD-LSTM	SGD	RMSE	48.52	51.61	77.12	82.78
VMD-LSTM	SGD	R²	0.9935	0.9928	0.9916	0.9903

Table 13. The Diebold-Mariano test results for each model.

Site	Model	VLG (Adam)	Model	VLG (Adam)
NSW	Elman	9.8311 *	EMD-LSTM	2.1136 **
NSW	RBF	9.9634 *	DA-BP	3.0329 *
NSW	SVM	4.1134 *	DA-LSTM	3.9364 *
NSW	BP	3.1654 *	VMD-BP	3.8029 *
NSW	PSO-BP	2.9253 *	VMD-LSTM (SGD)	3.8395 *
NSW	LSTM	3.8639 *	VMD-LSTM (Adam)	2.0104 **
NSW	EMD-BP	3.8217 *	VLG (SGD)	3.7156 *
QLD	Elman	8.2174 *	EMD-LSTM	2.3522 **
QLD	RBF	8.2316 *	DA-BP	2.9871 *
QLD	SVM	4.3108 *	DA-LSTM	2.8346 *
QLD	BP	3.5936 *	VMD-BP	3.2135 *
QLD	PSO-BP	3.1732 *	VMD-LSTM (SGD)	2.8724 *
QLD	LSTM	3.1420 *	VMD-LSTM (Adam)	2.3412
QLD	EMD-BP	2.4718 **	VLG (SGD)	2.3439
SA	Elman	7.9347 *	EMD-LSTM	2.1261 **
SA	RBF	7.9931 *	DA-BP	2.8469 *
SA	SVM	3.3416 *	DA-LSTM	2.9336 *
SA	BP	3.3153 *	VMD-BP	2.1743
SA	PSO-BP	3.1732 *	VMD-LSTM (SGD)	2.0214
SA	LSTM	3.3509 *	VMD-LSTM (Adam)	1.9542 *
SA	EMD-BP	2.1798 **	VLG (SGD)	2.0147

* indicates that the DM value is significant at the 1% level. ** indicates that the DM value is significant at the 5% level. *** indicates that the DM value is significant at the 10% level.
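For readers unfamiliar with the test behind Table 13, the sketch below computes a Diebold-Mariano statistic in Python. It is a minimal illustration under stated assumptions, not necessarily the exact variant used in the experiments: squared error is assumed as the loss function, the forecast horizon is one step (so no autocovariance correction is applied), and the statistic is compared with the standard normal distribution. The function name and the toy errors are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Return (DM statistic, two-sided p-value) for two forecast error series.

    Assumptions: squared-error loss, one-step-ahead forecasts, asymptotic
    standard normal distribution. A positive statistic means that model A
    has the larger average loss, i.e. model B forecasts better.
    The loss differentials must not all be identical (zero variance).
    """
    n = len(errors_a)
    # Loss differential d_t = e_a,t^2 - e_b,t^2
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    dm = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # 2 * (1 - Phi(|dm|))
    return dm, p_value


# Toy errors: model A (a comparison model) versus model B (the benchmark).
dm, p = diebold_mariano([2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, 0.0])
print(dm, p)
```

Under the normal approximation, |DM| above roughly 2.58, 1.96, and 1.64 corresponds to significance at the 1%, 5%, and 10% levels, respectively.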

Table 14. Comparison of the forecasting performance of the combined model and that of other VMD-based models.

Model	MAE			RMSE			MAPE (%)			MAPE improvement of VLG
Model	1-Step	2-Step	3-Step	1-Step	2-Step	3-Step	1-Step	2-Step	3-Step	1-Step	2-Step	3-Step
VMD-GWO-SVM	39.22	146.36	164.35	99.87	183.45	198.24	0.8846	1.9159	2.0213	72.33%	60.05%	32.88%
VMD-PSO-BP	19.90	119.96	133.26	48.52	161.51	172.97	0.2821	1.6333	1.6722	13.22%	53.14%	18.87%
VLG	18.13	48.77	103.22	23.54	56.73	133.86	0.2448	0.7653	1.3565	-	-	-

Table 15. Improvement percentage of the proposed model.

Model, P_MAPE (%)	Spring			Summer			Autumn			Winter			Annual Mean
Model	NSW	QLD	SA	NSW	QLD	SA	NSW	QLD	SA	NSW	QLD	SA	NSW	QLD	SA
Elman	98.39	96.50	95.93	94.78	94.54	92.15	95.97	95.68	94.60	94.29	89.63	90.20	96.50	94.59	93.80
RBF	98.41	96.51	96.01	94.87	94.6	92.31	96.09	95.70	94.71	94.27	89.62	90.31	96.55	94.62	94.26
ARIMA	82.66	86.65	56.45	70.89	82.89	42.87	82.49	85.58	70.04	75.44	75.53	58.06	77.98	82.83	58.63
SVM	84.91	81.71	64.35	75.56	77.09	57.83	85.37	74.46	68.73	79.27	67.57	55.61	81.33	75.34	61.78
BP	75.33	73.57	61.85	66.22	68.18	60.44	78.79	67.83	67.33	58.34	51.19	53.25	69.95	65.54	62.56
PSO-BP	73.81	69.94	54.19	44.24	53.36	50.79	75.12	64.82	66.86	52.30	48.02	45.67	63.19	59.17	54.89
LSTM	65.22	67.70	58.3	55.26	61.98	58.14	73.70	64.51	66.64	52.51	49.41	41.84	61.81	54.99	56.39
VMD-LSTM	38.77	46.44	19.11	38.11	16.50	3.44	42.79	40.68	30.90	11.64	23.47	7.24	32.78	31.78	15.63
EMD-LSTM-GA	10.11	14.89	10.36	8.74	9.47	7.32	14.31	12.74	9.33	7.35	7.27	6.11	10.31	11.09	8.23
DA-LSTM-GA	35.79	39.46	27.33	28.22	32.12	27.22	38.12	36.48	29.87	0.13	19.52	17.84	30.83	31.89	28.86
VMD-PSO-BP	12.77	16.31	9.80	10.06	9.89	4.22	15.25	13.22	11.47	8.36	7.98	5.31	11.12	11.85	7.75
Model, P_RMSE (%)	Spring			Summer			Autumn			Winter			Annual Mean
Model	NSW	QLD	SA	NSW	QLD	SA	NSW	QLD	SA	NSW	QLD	SA	NSW	QLD	SA
Elman	98.21	96.19	95.64	94.69	94.46	91.62	96.03	95.99	93.37	92.70	89.70	88.56	96.14	94.62	92.88
RBF	98.23	96.21	95.68	94.79	94.55	91.70	96.09	96.03	93.39	92.69	89.61	88.47	96.18	94.67	92.91
ARIMA	75.16	85.23	37.12	68.11	78.68	37.71	81.55	84.20	70.49	67.87	72.39	53.90	73.08	80.51	52.91
SVM	82.83	82.14	70.78	59.73	77.09	56.13	79.58	78.80	75.08	77.28	70.91	51.46	85.11	77.29	64.23
BP	76.24	72.91	66.52	65.25	66.86	62.43	78.99	70.71	72.20	61.25	54.88	55.75	70.53	66.583	63.98
PSO-BP	77.23	72.27	65.81	44.31	53.54	60.14	76.42	69.05	70.30	72.24	68.76	48.67	69.65	66.60	61.17
LSTM	66.67	70.06	64.87	57.41	59.90	66.12	74.04	66.80	69.21	58.31	52.07	44.24	63.84	62.25	61.39
VMD-LSTM	38.43	41.22	10.739	34.14	12.35	5.720	42.46	38.91	26.21	13.44	41.81	42.52	32.13	34.86	26.42
EMD-LSTM-GA	10.45	13.96	11.43	8.41	9.33	7.42	15.09	11.98	9.48	6.89	7.05	6.35	10.64	11.01	8.44
DA-LSTM-GA	32.48	37.97	26.71	30.15	34.01	26.17	37.38	35.84	28.85	0.24	18.89	16.97	32.85	32.22	27.26
VMD-PSO-BP	13.34	17.54	9.66	10.23	8.97	4.63	14.17	14.03	11.66	8.72	7.64	5.44	11.10	12.30	7.21
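The improvement percentages in Table 15 appear to follow the usual relative-reduction form, $P = \frac{E_{compared} - E_{VLG}}{E_{compared}} \times 100\%$, where $E$ is the MAPE or RMSE of the respective model. This form is assumed here; it reproduces the Elman and ARIMA entries for the NSW spring data when the corresponding values from Table 8 are inserted. The function name in the Python sketch below is hypothetical.

```python
def improvement_percentage(metric_compared, metric_proposed):
    """Relative reduction (%) of an error metric achieved by the proposed model.

    Assumed definition: (compared - proposed) / compared * 100.
    """
    return (metric_compared - metric_proposed) / metric_compared * 100.0


# Spring MAPE values for NSW taken from Table 8 (VLG: 0.3081).
print(round(improvement_percentage(19.1725, 0.3081), 2))  # Elman
print(round(improvement_percentage(1.7767, 0.3081), 2))   # ARIMA
# Spring RMSE values for NSW taken from Table 8 (VLG: 30.27).
print(round(improvement_percentage(1691.73, 30.27), 2))   # Elman
```

A value close to 100% means the proposed system removes almost all of the compared model's error, while a value near 0% means the two models perform similarly.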

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jin, Y.; Guo, H.; Wang, J.; Song, A. A Hybrid System Based on LSTM for Short-Term Power Load Forecasting. Energies 2020, 13, 6241. https://doi.org/10.3390/en13236241
