Short-Term Load Forecasting Based on Deep Learning Bidirectional LSTM Neural Network

Accurate load forecasting guarantees the stable and economic operation of power systems. With the increasing integration of distributed generation and electric vehicles, the variability and randomness of individual loads and distributed generation have increased the complexity of power loads in power systems. Hence, accurate and robust load forecasting is becoming increasingly important in modern power systems. This paper presents a short-term load forecasting framework based on a multi-layer stacked bidirectional long short-term memory (LSTM) network; the framework covers the neural network architecture, model training, and bootstrapping. In the proposed method, reverse computing is combined with forward computing, and a feedback calculation mechanism is designed to capture the coupling between earlier and later information in the power load time series. To improve the convergence of the algorithm, deep learning training is introduced to mine the correlation between historical loads, and a multi-layer stacked network is established to process the power load information. Finally, the proposed method is tested on actual operational data; comparisons with other methods show that it can extract dynamic features from the data and make accurate predictions, verifying its applicability with real operational data.


Introduction
A reliable and accurate short-term load forecasting system is the basis of energy trading between customers and electric utility companies [1,2]. With the increasing penetration of distributed generation and consumer energy systems, the randomness and variability of load profiles pose greater challenges for short-term load forecasting systems. In recent years, researchers around the world have focused on short-term load forecasting and have used various new technologies to obtain more accurate forecasts.
Traditional load forecasting methods are based on statistics [1,3,4]. However, they require large amounts of precise historical data, which makes accurate prediction more challenging. Among data-driven methods, artificial neural networks are the most popular because of their strong nonlinear approximation and self-learning capabilities. Different types of neural networks, such as back propagation (BP) [5], radial basis function (RBF) [6], and extreme learning machine (ELM) [7,8] networks, have been proposed and applied in short-term load forecasting. In [8], a regularization term and an ensemble of multiple ELMs are added to reduce the randomness of the traditional ELM in photovoltaic power forecasting. However, slow convergence has always been an obstacle to the large-scale application of neural networks.
The rapid development of deep learning frameworks and artificial intelligence (AI) technology offers more choices for power system load forecasting. In recent years, convolutional neural networks (CNN) [9,10], deep belief networks (DBN) [11][12][13], and deep residual networks (DRN) [14] have been developed and applied in load forecasting, showing promising prospects in this area. These methods can extract the key features of the load profile. In [9], a multiple-input deep CNN model is proposed and applied to short-term photovoltaic power forecasting, in which solar radiation and ambient temperature, combined with the historical output power of a PV system, are used as the input data of the forecasting model. In [10], a deep CNN-based method is proposed for short-term PV power forecasting; the original meter data are decomposed into a two-dimensional timescale by convolution kernels and refined into advanced features by a CNN model. In [11], a DBN is applied to photovoltaic power forecasting, with a focus on real data capture to establish the optimal DBN architecture. In [12], an improved DBN is applied to load forecasting considering demand-side response; three aspects of the DBN are optimized to improve predictive accuracy. In [13], a DBN is incorporated into a feed-forward neural network, in which layer-by-layer unsupervised training is combined with parameter fine-tuning based on supervised back-propagation training. In [14], a two-stage ensemble strategy of deep residual networks is formulated to enhance the generalization capability of load forecasting.
Because it alleviates the vanishing gradient problem, LSTM is more effective than a conventional recurrent neural network for industrial problems that are strongly related to time series [15][16][17]. LSTM neural networks have been successfully deployed in many practical applications; owing to their memory units, they can learn longer-term dependencies [18][19][20]. In [18], an LSTM-based method is used in a distributed network, in which an LSTM structure performs linear regression at each node and shares variable-length data sequences with its neighbors to train the LSTM architecture. In [19], a video-captioning method based on adversarial learning and LSTM is proposed to handle the temporal nature of video data and the exponential accumulation of errors. In [20], an attention-based LSTM model with semantic consistency is used to translate videos into natural sentences. In [21], an LSTM neural network is used for multimodal ambulatory sleep detection, and the proposed method accurately synthesizes temporal information. In [22], nonuniformly sampled, variable-length sequential data are classified and then regressed using LSTM.
LSTM neural networks also show significant potential in forecasting. The authors of [23,24] proposed an LSTM-based framework for load forecasting of individual energy customers. In [25], a multi-layer bidirectional RNN based on LSTM and gated recurrent units (GRU) is proposed for short-term load forecasting; the method can handle different types of load data and is shown to be more accurate. In [26], an LSTM-DNN model is proposed for forecasting photovoltaic power output using available temperature data and statistical features extracted from historical photovoltaic output data with a stationary wavelet transform. In [27,28], an LSTM neural network is proposed to predict solar irradiance one hour and one day ahead; the clearness index is used with k-means clustering to classify the weather type. In [29], a k-means LSTM network model is proposed for wind power spot prediction; the wind power factors are clustered to generate new LSTM sub-prediction models.
In [30], the authors proposed five LSTM-based forecasting methods for photovoltaic power prediction and improved prediction capacity by stacking LSTM layers on top of each other. In [31], a one-dimensional convolutional stacked LSTM is proposed for load disaggregation; the deep learning framework is created by stacking several LSTM layers within the hidden layers, which are joined by recurrent connections in the LSTM cell. The stacked LSTM prediction model largely avoids the vanishing and exploding gradient problems. However, transmitting information over long distances causes data loss, which results in accumulated errors during prediction. To solve this problem, reverse computing is combined with forward computing to overcome the unidirectionality of the memory process when training on the data, and a feedback mechanism is introduced to strengthen the association between earlier and later information. With reverse computing, the LSTM neural network gains the ability of bidirectional computing, which can overcome the defect of data loss in long-distance transmission. Furthermore, forward and backward propagation make the predictions depend on the data in both directions and therefore more reliable. A multi-layer stacked deep learning structure is built for the training process to improve the communication of information along the data sequence.
The main contributions of this paper are as follows: (1) A bidirectional LSTM short-term load forecasting framework is proposed, in which reverse computing is combined with forward computing to retrieve the important information hidden in the load profiles and improve forecasting ability for the time-series problem. (2) A multi-layer stacked bidirectional LSTM prediction structure based on deep learning is proposed; the multi-layer structure is used to analyze the load profiles and extract the essential features of the data. (3) The multi-layer stacked bidirectional LSTM prediction model is validated on real operational cases, and the evaluation results are compared with those of other methods.

LSTM Neural Network
The LSTM neural network, proposed in 1997, is a deep learning neural network for time-domain (sequential) data. Compared with a traditional recurrent neural network, its hidden layer contains two special parts: the forget gate and the memory unit. A long-term information stream from input to output improves the memory capacity of the network during training. The structure of the LSTM cell is shown in Figure 1. It consists of four computing units: the forget gate, the input gate, the memory unit, and the output gate. Based on the output $h_{t-1}$ of the hidden layer at the previous time step and the current input $x_t$, the sigmoid activation function generates a new value $f_t$, which determines how much of the information $C_{t-1}$ learned at the previous moment passes through, that is, how much of the previous cell state $C_{t-1}$ is kept in the current state $C_t$. The relationship between $h_{t-1}$, $x_t$, and $f_t$ can be written as:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where σ is the sigmoid function, whose output lies between 0 and 1; $W_f$ is the weight matrix of the forget gate; $b_f$ is the bias of the forget gate; and $f_t$ is the value of the forget gate, which sets the forgetting factor of the long-term memory information. Each gate of the LSTM consists of a sigmoid activation function and an element-wise multiplication. After the hidden state of the previous moment enters the forget gate, the gate decides how much of the old information is retained, while the cell state flows continuously along the horizontal direction of the chain.
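The input gate $I_t$ and the candidate state $\tilde{C}_t$ then determine how much new information is written into the cell state:
$$I_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$$
where tanh is the hyperbolic tangent activation function, $\tilde{C}_t$ is the temporary (candidate) state of $C_t$, $W_i$ and $b_i$ are the weight matrix and bias of the input gate, $W_c$ is the weight matrix of the memory unit, $b_c$ is the bias of the memory unit, and $I_t$ is the output value of the input gate. The current cell state $C_t$ is the sum of the retained previous state and the updated state:
$$C_t = f_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
where ⊙ denotes element-wise multiplication.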
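Finally, the output gate determines which part of the cell state is output:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
where $W_o$ is the weight matrix of the output gate, $o_t$ is the output value of the output gate, and $b_o$ is the bias of the output gate. The output gate value $o_t$ is obtained through a sigmoid layer, and $\tanh(C_t)$ lies between −1 and 1, so the output $h_t$ is also bounded between −1 and 1.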
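The gate computations described above can be traced numerically. The following is a minimal NumPy sketch of a single LSTM time step, assuming the weight matrices act on the concatenation $[h_{t-1}, x_t]$; the dictionary keys (`W_f`, `b_f`, and so on) and the helper `init_params` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: forget gate, input gate, candidate state, cell state, output."""
    z = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate (temporary) state
    c_t = f_t * c_prev + i_t * c_tilde                     # element-wise update of the cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # hidden output, bounded in (-1, 1)
    return h_t, c_t

def init_params(n_in, n_hidden, seed=0):
    """Small random weights (hypothetical initialization) for the four gate units."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_in))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params

# Usage: run a short univariate (normalized) load sequence through the cell.
params = init_params(n_in=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.2, 0.5, 0.9]:
    h, c = lstm_cell_step(np.array([value]), h, c, params)
```

Setting all weights and biases to zero makes every gate output 0.5 and the candidate state 0, so the cell simply halves its previous state; this is a quick sanity check of the wiring.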
The signal passes through the forget gate, input gate, and output gate in turn, which realizes the storage and maintenance of information in the current time period. In the LSTM structure, information is transmitted horizontally in one direction, from input to output. Hence, in multi-step prediction the prediction error accumulates continuously and can grow sharply as the prediction moves further from the last known time step. Figure 2 shows the accumulated error of LSTM prediction in one-day power load forecasting. Short-term load forecasting methods usually use a day or a week of data as the training dataset; in Figure 2, three days of data are used as the forecasting sample. Error accumulation occurs in the LSTM prediction results as the number of prediction steps increases, and the error grows larger as time goes on.
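The accumulation mechanism can be illustrated with a deliberately simple toy model rather than the LSTM of Figure 2: when each one-step prediction is fed back as the next input, a small modelling error compounds with the horizon. The sketch below assumes a hypothetical constant load (true coefficient 1.0) and a model that slightly under-estimates the dynamics (coefficient 0.98); both numbers are illustrative, not taken from the paper's data.

```python
import numpy as np

def recursive_forecast(x0, coef, steps):
    """Multi-step forecast in which every one-step prediction becomes the next input."""
    preds, x = [], x0
    for _ in range(steps):
        x = coef * x              # toy one-step model: x_{t+1} = coef * x_t
        preds.append(x)
    return np.array(preds)

steps = 96                                    # one day at 15-min resolution
truth = recursive_forecast(1.0, 1.0, steps)   # hypothetical constant load
pred = recursive_forecast(1.0, 0.98, steps)   # model with a 2% error in its dynamics
abs_err = np.abs(truth - pred)                # equals 1 - 0.98**k at step k, growing with k
```

The error is 0.02 after one step but grows monotonically with every additional step, which is the qualitative behavior Figure 2 reports for the unidirectional LSTM.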

Bidirectional LSTM Neural Network
In order to overcome the accumulative error problem, a bidirectional LSTM is proposed here, which is shown in Figure 3. The bidirectional LSTM neural network consists of two layers of LSTM structure; one is used to calculate the hidden vector from the front to the back, and the other is used to calculate the hidden vector from the back to the front. The output of the bidirectional LSTM neural network is determined by these two layers.
The bidirectional LSTM neural network differs from a traditional feed-forward neural network. Although the nodes within each layer are not connected to one another, directed loops are introduced in the connections of the hidden layers, so that earlier results are memorized and stored in the memory unit, which strengthens the association between pieces of information at different time steps. The current output of the network is therefore determined by combining the previous output with the current input. However, as the amount of input data in the time series increases, vanishing and exploding gradient problems can arise because of the limited width of the delay window.
Building on the traditional LSTM model, the bidirectional LSTM neural network fully considers the forward and backward correlation of the load data in the time series and improves model performance, especially for sequence classification problems. During training, the input sequence of the forward layer is the training data, and the input of the backward layer is a reversed copy of that sequence. The bidirectional prediction is therefore determined by both earlier and later inputs, which increases the dependence between training data and avoids forgetting order information. As shown in Figure 3, the forward layer computes in the forward direction from 1 to t and saves the output of the forward hidden layer at each moment, while the backward layer computes along the reversed time series and saves the output of the backward hidden layer at each moment. Finally, the output of the bidirectional LSTM neural network is obtained by combining the corresponding outputs of the forward and backward layers at each time point. The bidirectional LSTM neural network can be written as:
$$o_t = g\left(V s_t + V' s'_t\right)$$
$$s_t = f\left(U x_t + W s_{t-1}\right)$$
$$s'_t = f\left(U' x_t + W' s'_{t+1}\right)$$
where $s_t$ is the state variable of the forward hidden layer at time t, $o_t$ is the state variable of the output layer at time t, $s'_t$ is the state variable of the reverse hidden layer at time t, $x_t$ is the input vector, g and f are activation functions, V, W, and U are the weight matrices from the hidden layer to the output layer, within the hidden layer (from one time step to the next), and from the input layer to the hidden layer, respectively, and $V'$, $W'$, and $U'$ are the corresponding reverse weight matrices. The forward and backward layers do not share weight matrices. They are computed in turn, each giving a result at every time step, and the final output $o_t$ depends on the sum of the forward calculation result $s_t$ and the reverse calculation result $s'_t$, each weighted by its own output matrix.
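The bidirectional recursion can be made concrete with a small NumPy sketch. For brevity it uses plain tanh recurrent cells in place of full LSTM cells and takes g as the identity; both are simplifying assumptions, and the function name `bidirectional_rnn` is hypothetical. The forward loop runs from the first to the last time step, the backward loop runs in reverse with its own weights, and the two state sequences are combined at each time step.

```python
import numpy as np

def bidirectional_rnn(x, U, W, V, U_r, W_r, V_r, s0=0.0):
    """o_t = V s_t + V' s'_t with s_t = tanh(U x_t + W s_{t-1}), s'_t = tanh(U' x_t + W' s'_{t+1})."""
    T, H = x.shape[0], W.shape[0]
    s_fwd = np.zeros((T, H))
    s_bwd = np.zeros((T, H))
    state = np.full(H, s0)
    for t in range(T):                       # forward layer: first -> last time step
        state = np.tanh(U @ x[t] + W @ state)
        s_fwd[t] = state
    state = np.full(H, s0)
    for t in reversed(range(T)):             # backward layer: last -> first, separate weights
        state = np.tanh(U_r @ x[t] + W_r @ state)
        s_bwd[t] = state
    return s_fwd @ V.T + s_bwd @ V_r.T       # g = identity; shape (T, n_out)

# Usage with random (hypothetical) weights.
rng = np.random.default_rng(1)
n_in, H, n_out, T = 1, 5, 1, 6
U, U_r = rng.normal(size=(H, n_in)), rng.normal(size=(H, n_in))
W, W_r = 0.5 * rng.normal(size=(H, H)), 0.5 * rng.normal(size=(H, H))
V, V_r = rng.normal(size=(n_out, H)), rng.normal(size=(n_out, H))
x = rng.normal(size=(T, n_in))
o = bidirectional_rnn(x, U, W, V, U_r, W_r, V_r)

# Perturb only the last input and recompute.
x_late = x.copy()
x_late[-1] += 1.0
o_late = bidirectional_rnn(x_late, U, W, V, U_r, W_r, V_r)
```

Because the backward layer starts at the last time step, perturbing the final input changes the output at the first time step, which a forward-only recursion cannot do.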

Multi-Layer Stacked Bidirectional LSTM Neural Network for Short-Term Load Forecasting
The power load profile is affected by residential electricity usage behavior, temperature, humidity, and other factors, so load forecasting is a multidimensional nonlinear problem. The bidirectional LSTM neural network addresses the accumulated error problem in the training process. The multi-layer bidirectional LSTM neural network further combines the bidirectional network with a deep learning mechanism: multiple forward layers and multiple reverse layers together constitute the multi-layer stacked bidirectional LSTM, which expands the depth of the bidirectional LSTM neural network. The input data are processed repeatedly by successive layers, giving a deeper understanding of the data characteristics and improving the accuracy of load forecasting.

Multi-Layer Stacked Bidirectional LSTM Neural Network
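The system structure of the multi-layer stacked bidirectional LSTM is shown in Figure 4. In the multi-layer stacked structure, each bidirectional layer is composed of a forward and a reverse LSTM network, and the second layer receives the sum of the outputs of the forward and reverse LSTMs of the first layer. As Figure 4 shows, the output of the multi-layer stacked bidirectional LSTM neural network is determined by the forward and backward results of each layer, and for a stack of N layers the model can be expressed as follows:
$$o_t = g\left(V^{(N)} s_t^{(N)} + V'^{(N)} s_t'^{(N)}\right)$$
$$s_t^{(i)} = f\left(U^{(i)} \left(s_t^{(i-1)} + s_t'^{(i-1)}\right) + W^{(i)} s_{t-1}^{(i)}\right), \quad i = 2, \ldots, N$$
$$s_t'^{(i)} = f\left(U'^{(i)} \left(s_t^{(i-1)} + s_t'^{(i-1)}\right) + W'^{(i)} s_{t+1}'^{(i)}\right), \quad i = 2, \ldots, N$$
$$s_t^{(1)} = f\left(U^{(1)} x_t + W^{(1)} s_{t-1}^{(1)}\right), \qquad s_t'^{(1)} = f\left(U'^{(1)} x_t + W'^{(1)} s_{t+1}'^{(1)}\right)$$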
where $s_t^{(i)}$ and $s_{t-1}^{(i)}$ are the state variables of the i-th forward hidden layer at times t and t − 1, respectively, and $s_t'^{(i)}$ is the corresponding reverse state. The forward and reverse calculations do not share weights. $U^{(i)}$, $W^{(i)}$, and $V^{(i)}$ are the weight matrices from the input to the hidden layer, within the hidden layer, and from the hidden layer to the output layer; in the reverse calculation, $U'^{(i)}$, $W'^{(i)}$, and $V'^{(i)}$ are the corresponding reverse weight matrices. The index i = 1, 2, …, N denotes the bidirectional LSTM layer, and the top layer N feeds the output layer.
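Stacking can be sketched the same way. In this hypothetical NumPy example (again with tanh recurrent cells standing in for LSTM cells, and g taken as the identity), each layer after the first receives the sum of the forward and backward states of the layer below, following the description of Figure 4. The depth of three mirrors the layer count the paper reports as suitable; the hidden size and random initialization are arbitrary assumptions.

```python
import numpy as np

def bidirectional_layer(inputs, p, s0=0.0):
    """One bidirectional layer with separate forward/reverse weights; returns both state sequences."""
    T, H = inputs.shape[0], p["W"].shape[0]
    fwd, bwd = np.zeros((T, H)), np.zeros((T, H))
    state = np.full(H, s0)
    for t in range(T):                                   # forward: first -> last time step
        state = np.tanh(p["U"] @ inputs[t] + p["W"] @ state)
        fwd[t] = state
    state = np.full(H, s0)
    for t in reversed(range(T)):                         # reverse: last -> first time step
        state = np.tanh(p["U_r"] @ inputs[t] + p["W_r"] @ state)
        bwd[t] = state
    return fwd, bwd

def stacked_bidirectional(x, layers, V, V_r):
    """Layer i > 1 takes s_t^(i-1) + s'_t^(i-1); the top layer feeds o_t = V s_t + V' s'_t."""
    inputs = x
    for p in layers:                                     # assumes at least one layer
        fwd, bwd = bidirectional_layer(inputs, p)
        inputs = fwd + bwd                               # sum passed to the next layer
    return fwd @ V.T + bwd @ V_r.T                       # g = identity (assumption)

def make_layers(n_in, hidden, depth, seed=0):
    """Random weights for a hypothetical stack; only the first layer sees the raw input size."""
    rng = np.random.default_rng(seed)
    layers = []
    for i in range(depth):
        d = n_in if i == 0 else hidden
        layers.append({"U": rng.normal(0, 0.5, (hidden, d)), "W": rng.normal(0, 0.3, (hidden, hidden)),
                       "U_r": rng.normal(0, 0.5, (hidden, d)), "W_r": rng.normal(0, 0.3, (hidden, hidden))})
    return layers

# Usage: three stacked bidirectional layers over one day of 15-min load values.
rng = np.random.default_rng(1)
x = rng.random((96, 1))                                  # placeholder normalized load, shape (T, 1)
layers = make_layers(n_in=1, hidden=8, depth=3)
V, V_r = rng.normal(size=(1, 8)), rng.normal(size=(1, 8))
out = stacked_bidirectional(x, layers, V, V_r)           # shape (96, 1)
```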

Multi-Layer Stacked Bidirectional LSTM Based Load Forecasting
The essential idea of the proposed improved LSTM neural network is to capture the statistical characteristics of the power load by reconstructing the training sample data. The multi-layer stacked bidirectional LSTM network is trained to forecast the power load for the next 24 h. The prediction process of the proposed model can be divided into the following steps, shown in Figure 5. Step 1: Data preparation. Historical power load profiles are collected and pre-processed to remove outliers and incorrect data before training. The raw data still cannot be used directly, because their scale is not standardized. Normalization is a common preprocessing method in system modeling: it makes the original data dimensionless, which can increase the convergence speed of the neural network. Common normalization methods include min-max scaling, Z-score standardization, and decimal scaling. In this paper, linear min-max normalization is used, which maps the data into the range [0,1]:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the power load sample data, x is the original value of a sample, and $x^{*}$ is its normalized value.
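Step 1 can be sketched as follows: min-max normalization with its inverse (needed to convert forecasts back to load units), plus a sliding-window routine that pairs a history window with the next 24 h. The window length of 96 points (one day at 15-min sampling) and the function names are assumptions; the paper does not state the input length. In practice, $x_{\min}$ and $x_{\max}$ would be taken from the training portion only, so that the test data do not leak into the scaling.

```python
import numpy as np

def minmax_normalize(x):
    """x* = (x - x_min) / (x_max - x_min); also return the bounds for de-normalization."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def denormalize(x_star, x_min, x_max):
    """Map normalized values (e.g. model forecasts) back to the original load scale."""
    return np.asarray(x_star) * (x_max - x_min) + x_min

def make_windows(series, n_in=96, n_out=96):
    """Pair each n_in-point history with the following n_out points (96 x 15 min = 24 h)."""
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(Y)

# Usage (hypothetical load array):
#   scaled, lo, hi = minmax_normalize(load)
#   X, Y = make_windows(scaled)
```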
Appl. Sci. 2021, 11, 8129
Step 2: Network training. The forward state at t = 1 and the reverse state at t = T (where T is the last sampling time of the training dataset) have no predecessors and are unknown, so they are generally set to a fixed value (0.5) during training. In addition, the derivatives of the forward state at t = T and of the reverse state at t = 1 are generally set to zero, which assumes that information beyond these boundaries is not important for the current update. Network training consists of the following steps: (1) Forward transfer. For 1 ≤ t ≤ T, the training data are fed into the cells of the bidirectional LSTM, and the predicted outputs are determined. Forward passes are carried out for the forward states (from t = 1 to t = T) and for the backward states (from t = T to t = 1). The outputs of the cells are transferred forward, and the forward predicted output of the n-th layer is calculated.
(2) Backward transfer. The partial derivatives of the objective function are calculated over the forward-transfer period 1 ≤ t ≤ T. The backward LSTM cells are computed from the forward and reverse values over 1 ≤ t ≤ T, and the reversed prediction output is calculated.
(3) Weight matrix updating. The weight matrices are calculated and updated according to the loss function of the neural network during training.
(4) Result output. Based on the bidirectional calculation, the parameters of the LSTM prediction model are estimated.
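The training steps can be illustrated, in a heavily simplified form, by the NumPy sketch below. It performs the forward pass from t = 1 to T and the reverse pass from t = T to 1 with the unknown boundary states set to 0.5, as described in Step 2, and then updates only the output weight matrices V and V′ by gradient descent on the mean squared error. Freezing the recurrent weights (rather than back-propagating through time), using tanh cells instead of LSTM cells, and the sinusoidal test profile are all assumptions for illustration; the hidden size of 8, learning rate of 0.01, and 100 iterations are borrowed from the settings listed in the experimental section.

```python
import numpy as np

def bidirectional_states(x, U, W, U_r, W_r, init=0.5):
    """Forward pass t = 1..T and reverse pass t = T..1; unknown boundary states set to 0.5."""
    T, H = x.shape[0], W.shape[0]
    s_fwd, s_bwd = np.zeros((T, H)), np.zeros((T, H))
    state = np.full(H, init)
    for t in range(T):
        state = np.tanh(U @ x[t] + W @ state)
        s_fwd[t] = state
    state = np.full(H, init)
    for t in reversed(range(T)):
        state = np.tanh(U_r @ x[t] + W_r @ state)
        s_bwd[t] = state
    return s_fwd, s_bwd

def train_output_weights(x, y, hidden=8, lr=0.01, epochs=100, seed=0):
    """Gradient descent on the MSE loss, updating only V and V' (recurrent weights frozen)."""
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    U, U_r = rng.normal(0, 0.5, (hidden, n_in)), rng.normal(0, 0.5, (hidden, n_in))
    W, W_r = rng.normal(0, 0.3, (hidden, hidden)), rng.normal(0, 0.3, (hidden, hidden))
    s_fwd, s_bwd = bidirectional_states(x, U, W, U_r, W_r)   # step (1): forward transfer
    V, V_r = np.zeros(hidden), np.zeros(hidden)
    losses = []
    for _ in range(epochs):
        err = s_fwd @ V + s_bwd @ V_r - y                      # prediction error per time step
        losses.append(float(np.mean(err ** 2)))
        V -= lr * 2 * s_fwd.T @ err / len(y)                   # steps (2)-(3): gradient, update V
        V_r -= lr * 2 * s_bwd.T @ err / len(y)                 # and the reverse output weights V'
    return V, V_r, losses

# Usage: one-step-ahead fitting of a hypothetical normalized daily profile.
t = np.arange(96)
load = 0.5 + 0.4 * np.sin(2 * np.pi * t / 96)
x, y = load[:-1, None], load[1:]
V, V_r, losses = train_output_weights(x, y)
```

Full training of the paper's model would also back-propagate the error through the recurrent weights of both directions; with the recurrent weights frozen, the loss is quadratic in V and V′, so a small enough learning rate guarantees that it decreases.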

Evaluation Index
In this paper, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) are used to evaluate the prediction errors. They are common indicators for assessing model accuracy from measured and estimated values, and they are defined in Equations (16)- (18). MAE measures the average magnitude of the absolute errors and reflects the error distribution over the time series. RMSE evaluates the deviation between the predicted and true values and is sensitive to outliers. MAPE evaluates the average relative error between the predicted and true values on the test set; because it normalizes the error at each point, it reduces the influence of large absolute errors at outliers.
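$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2}$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - \hat{x}_i}{x_i}\right| \times 100\%$$
where n is the number of samples, $x_i$ is the real value, and $\hat{x}_i$ is the predicted value.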
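The three indicators translate directly into NumPy. This is a minimal sketch with hypothetical function names; the MAPE function assumes the true values contain no zeros, which is normally the case for power load data.

```python
import numpy as np

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must not contain zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Usage with three illustrative load values and forecasts.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
# approximately 6.667, 8.165, 5.0
```

The example shows the different emphases: the two 10-unit errors count equally in MAE and RMSE, but in MAPE the error on the 100-unit load weighs twice as much as the one on the 200-unit load.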

Simulation and Experimental Analysis
In this section, we evaluate the performance of the proposed multi-layer stacked bidirectional LSTM neural network for short-term load forecasting and discuss its key parameters. The proposed method is also compared with previous work. All models were executed on a computer with a 3.0 GHz CPU and 8 GB of RAM. The hidden layer size of the proposed model is 100, the number of hidden nodes is 8, the initial learning rate is 0.01, and the number of training iterations is 100.

Dataset for Load Forecasting
The dataset used in this paper was obtained from a 35 kV AC substation in southwest China. It contains 3 years of power load profiles with a sampling interval of 15 min. The dataset is mixed, containing different types of loads such as residential, commercial, and industrial loads. The dataset was pre-processed to separate the relevant data and select the predictive features of the models; here, it was separated into different types for load forecasting based on day and season characteristics. The pre-processing of the dataset is described in Section 3.2. The forecasting models were trained and tested on a 1-year sample dataset, where the first 80% is used for model training and the remaining 20% is used to test the performance of the proposed model.
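The chronological split can be sketched as below. Splitting by position rather than shuffling keeps every test sample later in time than every training sample. The helper name and the one-year placeholder series (365 days × 96 samples per day at 15-min resolution) are assumptions used only to show the resulting sizes.

```python
import numpy as np

def chronological_split(series, train_frac=0.8):
    """First 80% for training, remaining 20% for testing; no shuffling, so no look-ahead."""
    n_train = int(len(series) * train_frac)
    return series[:n_train], series[n_train:]

points_per_day = 24 * 60 // 15                 # 96 samples per day at 15-min resolution
year = np.arange(365 * points_per_day)         # placeholder indices for one year of load data
train, test = chronological_split(year)        # 28032 training and 7008 test samples
```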

Neural Network Structure Determination
Prediction accuracy depends significantly on the depth of the bidirectional LSTM neural network. The dynamic characteristics of the load data are extracted through the interaction of the different layers of the network: the stacked layers learn the internal correlations of the load profiles in depth, and the nonlinearity of the load sequence can be described in different dimensions. The parameters of the input units, forget units, and output units of the proposed model are shown in Table 1. The prediction accuracy of bidirectional LSTM networks with different numbers of layers is shown in Figure 6. It can be seen that the proposed multi-layer bidirectional LSTM neural network is effective and sufficiently accurate for the load forecasting problem, and its predictions become more accurate as the number of layers increases. However, with four layers the prediction error increases instead. Therefore, three layers are suitable for predicting the load sequence data in this paper. Table 2 compares the MAPE of the different neural network models with different numbers of layers.

Method Comparison
To demonstrate the performance of the multi-layer stacked bidirectional LSTM neural network in short-term load forecasting, it is compared with a BP neural network, ELM, and a traditional LSTM model. As shown in Figure 7, the prediction results of all the tested methods follow the same trend as the real load. The multi-layer stacked bidirectional LSTM neural network is the most competitive; the error comparison of the methods is given in Table 3, where the MAPE, RMSE, and MAE indexes are calculated and compared for one day (24 h). From Table 3, the average MAPE of the proposed prediction model is 0.4137%, whereas the average MAPE values of the BP, LSTM, and ELM models are 1.485%, 1.030%, and 0.77%, respectively. The average RMSE of the proposed prediction model is 0.706, and those of the BP, LSTM, and ELM models are 2.95, 1.921, and 1.369, respectively. Errors over different time intervals are calculated for ELM, LSTM, BP, and the proposed method in Table 4. The forecasting errors at two-hour intervals fluctuate across the evaluation indexes; however, the total evaluation indexes of the proposed method are the lowest over the day. The quantitative analysis of the forecasting errors is shown in Figure 8. It can be seen that the proposed method based on the multi-layer bidirectional LSTM prediction model better captures the information in the prediction samples and has a more competitive forecasting performance. The multi-layer stacked bidirectional LSTM neural network model can retain the original characteristics of the load sequences and reduce the data error by incorporating errors in unsupervised training, which can enhance the robustness of the predictive model. Different training data samples will significantly affect the robustness of the load forecasting model.
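The Table 3 averages quoted above can be turned into relative improvements with a few lines of Python. The numbers are copied from the text; the helper `relative_reduction` is a hypothetical name.

```python
# Average MAPE (%) and RMSE from Table 3, as stated in the text.
mape = {"BP": 1.485, "LSTM": 1.030, "ELM": 0.77, "Proposed": 0.4137}
rmse = {"BP": 2.95, "LSTM": 1.921, "ELM": 1.369, "Proposed": 0.706}

def relative_reduction(baseline, proposed):
    """Percentage by which the proposed model's error is lower than a baseline's."""
    return 100.0 * (baseline - proposed) / baseline

for name in ("BP", "LSTM", "ELM"):
    print(f"{name}: MAPE {relative_reduction(mape[name], mape['Proposed']):.1f}% lower, "
          f"RMSE {relative_reduction(rmse[name], rmse['Proposed']):.1f}% lower")
# BP: MAPE 72.1% lower, RMSE 76.1% lower
# LSTM: MAPE 59.8% lower, RMSE 63.2% lower
# ELM: MAPE 46.3% lower, RMSE 48.4% lower
```

By this measure the proposed model lowers the average MAPE by about 46% to 72% and the average RMSE by about 48% to 76% relative to the three baselines.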
Load forecasting will be more accurate with a smaller sampling interval in the training dataset. With only 48 or 24 measurement points per day, the training data are more random, which increases the difficulty of load prediction. In this paper, a sample dataset with 48 measurement points per day is used to train the proposed method, to verify its robustness and compare it with other methods. Figure 9 compares the load forecasting results obtained with the half-hour training dataset; the proposed method tracks the load profiles accurately, because deep learning can extract the internal characteristics of the discrete sample load data and the multi-layer bidirectional training mechanism improves robustness. The MAPE of the proposed method is 2.39%, as shown in Table 5. Furthermore, to verify the generalization ability of the proposed method in more complex situations such as weekends or holidays, cases are tested on historical days with different numbers of measurement points, as shown in Figures 10 and 11. The predicted load profiles track the measurement data accurately at different sampling intervals, although the sampling interval does influence the prediction results. The results in Figures 10 and 11 show that the proposed multi-layer stacked bidirectional LSTM method is more accurate than the BP neural network, ELM, and traditional LSTM neural network.
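One way to obtain the half-hour (48-point) dataset from the 15-min measurements is to average consecutive pairs of samples, as sketched below. The paper does not say whether it averaged or simply kept every second sample (`x[::2]` in NumPy), so the averaging choice and the function name are assumptions.

```python
import numpy as np

def to_half_hour(load_15min):
    """Average consecutive pairs of 15-min samples: 96 points/day -> 48 points/day."""
    x = np.asarray(load_15min, dtype=float)
    if len(x) % 2:
        raise ValueError("expected an even number of 15-min samples")
    return x.reshape(-1, 2).mean(axis=1)

# Usage: one day of (placeholder) 15-min values becomes 48 half-hour values.
day = np.arange(96.0)
half_hour = to_half_hour(day)
```

Averaging preserves the daily mean load, whereas decimation keeps actual measured values but discards half of them.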

Conclusions
Accurate short-term load forecasting is a major challenge due to the complexity of electrical load composition in modern power systems. In this paper, a multi-layer stacked short-term load forecasting method based on the traditional LSTM neural network is proposed. Reverse computing combined with forward computing is designed to overcome the unidirectionality of the memory process during training. The output gate can integrate the information implied in the historical load series. Furthermore, a multi-layer stacked deep learning structure is proposed to perceive low-level features of the power load and form more abstract, high-level representations of load characteristics. Finally, a load forecasting framework based on the multi-layer bidirectional LSTM neural network is proposed, which covers neural network model construction, historical load profile training, and load forecasting. In the experiments, the proposed method is tested and evaluated on real operational load data from a substation. The results show that the proposed multi-layer stacked bidirectional LSTM neural network method performs well and is more accurate than the other methods. The proposed method can retain the original information as much as possible and has a strong memory function for extracting relevant information from historical load sequences. However, as the sequence length increases, the efficiency of the proposed method decreases because the capacity of the memory units is limited. In addition, each LSTM cell contains four fully connected layers, so a deep stacked LSTM neural network requires considerable computing time. Future work will focus on the industrial application of the proposed method with more complex datasets: (1) Building an online load forecasting system.
Load forecasting serves power system dispatch, which operates continuously. Hence, an online, rolling load forecasting system using historical load data is the basis of this work.
(2) Correcting the load forecasting results. If the forecast results deviate greatly, the affected forecast points will be corrected based on the data at the time points before and after them.
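The correction step proposed as future work could take many forms. One possible sketch, assuming a hypothetical rule, flags a forecast point whose deviation from the median of itself and its two neighbors exceeds a relative threshold and replaces it with the mean of the neighboring points. The 20% threshold, the median test, and the function name are illustrative choices, not the authors' design; the median keeps a single spike from also triggering corrections at its neighbors.

```python
import numpy as np

def correct_forecast(forecast, rel_threshold=0.2):
    """Replace isolated deviating points with the mean of their neighbours (hypothetical rule)."""
    f = np.asarray(forecast, dtype=float)
    corrected = f.copy()
    for t in range(1, len(f) - 1):
        local_median = np.median(f[t - 1:t + 2])          # the point and its two neighbours
        if abs(f[t] - local_median) > rel_threshold * abs(local_median):
            corrected[t] = 0.5 * (f[t - 1] + f[t + 1])    # use the data before and after t
    return corrected

# Usage: a single spike at index 2 is replaced by the mean of its neighbours.
print(correct_forecast([100.0, 102.0, 180.0, 104.0, 106.0]))
# [100. 102. 103. 104. 106.]
```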
Author Contributions: Conceptualization, C.C. and Y.T.; methodology, Y.T. and T.Z.; validation, C.C. and Y.T.; writing-review and editing, C.C., Y.T. and Z.D. All authors have read and agreed to the published version of the manuscript.