Long-Short-Term Memory Network Based Hybrid Model for Short-Term Electrical Load Forecasting

Short-term electrical load forecasting is of great significance for the safe operation, efficient management, and reasonable scheduling of the power grid. However, the electrical load is affected by many kinds of external disturbances, so electrical load time series contain high levels of uncertainty. As a result, obtaining accurate short-term load forecasts is a challenging task. To further improve forecasting accuracy, this study combines the data-driven long-short-term memory network (LSTM) and the extreme learning machine (ELM) into a hybrid model for predicting short-term electrical loads. In this hybrid model, the LSTM is adopted to extract the deep features of the electrical load, while the ELM is used to model the shallow patterns. The predictions of the LSTM and ELM are then combined by linear regression to generate the final forecast. The proposed method is applied to two real-world electrical load forecasting problems, and detailed experiments are conducted. To verify the advantages of the proposed hybrid model, it is compared with the LSTM model, the ELM model, and support vector regression (SVR). The experimental and comparison results show that the proposed hybrid model performs satisfactorily and achieves much better performance than the comparative methods in this short-term electrical load forecasting application.


Introduction
With the rapid development of the economy, the demand for electricity has increased greatly in recent years. According to the statistics in [1], global power generation in 2007 was about 19,955.3 TWh, of which China generated 3281.6 TWh; in 2015, global power generation was about 24,097.7 TWh, of which China generated 5810.6 TWh. To realize the sustainable development of our society, efficient strategies are needed to effectively reduce the electrical load. Electrical load forecasting plays an important role in the efficient management of the power grid: it can improve the real-time dispatching and operation planning of power systems, reduce the consumption of non-renewable energy, and increase the economic and social benefits of power grids.
According to the prediction horizon, electrical load forecasting problems can be divided into three categories: short-term forecasting (hourly or daily), medium-term forecasting (monthly), and long-term forecasting (yearly). Among them, short-term load forecasting is the most widely studied. Over the past several decades, a large number of approaches have been proposed for electrical load prediction. These approaches can be classified into traditional statistical methods and computational intelligence methods.
Traditional statistical methods use the collected time series data of the electrical load to find electricity consumption patterns, and many studies have applied them to electrical load forecasting. In [2], an autoregressive moving average (ARMA) model was given for modeling electricity demand loads. In [3], an autoregressive integrated moving average (ARIMA) model was designed for forecasting the short-term electricity load. In [4], the ARMA model for short-term load forecasting was identified considering non-Gaussian processes. In [5], a regression-based approach to short-term system load forecasting was provided. Finally, in [6], a multiple linear regression model was proposed for modeling and forecasting the hourly electric load.
In recent years, computational intelligence methods have achieved great success and are widely used in many areas, such as network resource optimization [7,8] and resource management systems in vehicular networks [9,10]. In electrical load forecasting in particular, computational intelligence methods have found many applications owing to their strong nonlinear learning and modeling capabilities. In [11][12][13], support vector regression (SVR) was successfully applied to short-term electrical load forecasting. In [14], a nonparametric kernel regression approach was presented for estimating electrical energy consumption. As biologically inspired analytical methods with powerful learning ability, neural networks (NNs) have attracted increasing attention in electrical load prediction over the last few years. For example, in [15], a dynamic NN was used to predict daily power consumption so as to retain the production-consumption relation and to secure profitable operation of the power system. In [16], an improved back-propagation NN (BPNN) based on complexity decomposition technology and modified flower pollination optimization was proposed for short-term load forecasting. In [17], a hierarchical neural model with time windows was given for long-term electrical load prediction. In [18], a hybrid predictive model combining the fruit fly optimization algorithm (FOA) and the generalized regression NN was proposed for power load prediction. In [19], a radial basis function NN was presented for short-term electrical load forecasting considering weather factors. The extreme learning machine (ELM), a special kind of NN with one hidden layer, is popular nowadays due to its fast learning speed and excellent approximation ability [20,21], and it has also found applications in electrical load prediction. In [22], a novel recurrent ELM approach was proposed for electricity load estimation, and in [23], Zhang et al. proposed an ensemble ELM model for short-term load forecasting in the Australian national electricity market.
However, the aforementioned NNs, including the ELM, are all shallow networks with only one hidden layer, and these shallow structures limit their ability to learn deep patterns from the data. Meanwhile, electrical load data usually exhibit high levels of uncertainty and randomness, because the load is affected by many random factors such as weather conditions and socio-economic dynamics. Such uncertainties make accurate forecasting of the electrical load difficult. Reinforcement learning and deep learning provide powerful modeling techniques that can deal effectively with high levels of uncertainty. Reinforcement learning learns optimal strategies by trial and error through continuous interaction with the environment [24,25] and has found applications in this area; for example, in [26], reinforcement learning was successfully applied to real-time power management of a hybrid energy storage system. Deep neural networks, on the other hand, can extract more representative features from raw data through pre-training, which leads to more accurate predictions. Owing to its strengths in feature extraction and model fitting, deep learning has attracted a great amount of attention worldwide and has been widely applied in various fields, such as green buildings [27,28], image processing [29][30][31][32], speech recognition [33,34], and intelligent traffic management systems [35][36][37]. As a deep learning method, the long-short-term memory network (LSTM) can make full use of historical information thanks to its special structure [38], which allows it to give more accurate estimates in time series prediction applications. The LSTM has been successfully applied to multivariate time series prediction [39], modeling of missing data in clinical time series [40], traffic speed prediction [41], and time series classification [42]. All these applications have verified the power of the LSTM method.
In this study, to further improve forecasting performance for electrical loads, a hybrid model is proposed. It combines the LSTM model and the ELM model to capture both the deep patterns and the shallow features in electrical load time series. A linear regression model is chosen as the ensemble part of the hybrid model, and least-squares estimation is adopted to determine its parameters. The hybrid model is then applied to two real-world electrical load time series, and comparisons with the LSTM, ELM, and SVR are conducted to show its advantages. The experimental and comparison results show that the proposed hybrid model gives excellent forecasting performance and performs best among the compared methods.
The remainder of this paper is structured as follows. Section 2 introduces the recurrent neural network (RNN), the LSTM, and the ELM. Section 3 presents the hybrid model. In Section 4, the proposed hybrid model is applied to forecast the electrical load of the Alberta area and the electrical load of a service restaurant, and comprehensive comparisons are provided. Finally, Section 5 draws conclusions.

Methodologies
In this section, the RNN is introduced first, followed by the LSTM and, finally, the ELM.

Recurrent Neural Network
An RNN is a special kind of artificial neural network. Like other NNs, it consists of an input layer, a hidden layer, and an output layer [38,39]. The structure of a typical RNN model is shown in Figure 1. In a traditional feedforward NN, the nodes are connected layer by layer, and there are no connections between nodes in the same hidden layer. In an RNN, by contrast, the nodes in the same hidden layer are connected with each other. As a result, an RNN can feed prior information into the computation of the current hidden layer, so time series data can be learned efficiently. The mapping of one node $g_t$ can be represented as:
$$ g_t = f\left(U x_t + W g_{t-1}\right) $$
where $x_t$ represents the input at time $t$; $g_t$ is the hidden state at time $t$, which also serves as the memory unit of the network; $W$ and $U$ are parameters shared across all time steps; and $f(\cdot)$ is a nonlinear function.
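As a minimal sketch, assuming the node mapping takes the standard form $g_t = f(U x_t + W g_{t-1})$, the following NumPy code runs the recurrence over a short sequence, reusing the same $U$ and $W$ at every step. The choice of tanh for $f(\cdot)$, the zero initial state, the absence of a bias term, and the layer sizes are illustrative assumptions, not settings from this study.

```python
import numpy as np


def rnn_forward(x_seq, U, W, f=np.tanh):
    """Apply g_t = f(U x_t + W g_{t-1}) along a sequence, starting from g_0 = 0.

    x_seq has shape (T, n_in), U has shape (n_hidden, n_in), and W has shape
    (n_hidden, n_hidden). Returns the hidden states with shape (T, n_hidden).
    """
    g = np.zeros(W.shape[0])          # assumed zero initial hidden state
    states = []
    for x_t in x_seq:
        # The same U and W are reused at every time step (shared parameters).
        g = f(U @ x_t + W @ g)
        states.append(g)
    return np.stack(states)


rng = np.random.default_rng(0)
U = rng.normal(scale=0.5, size=(4, 1))   # hypothetical: 1 input, 4 hidden units
W = rng.normal(scale=0.5, size=(4, 4))
x_seq = rng.normal(size=(6, 1))          # toy sequence of 6 scalar inputs
G = rnn_forward(x_seq, U, W)
print(G.shape)  # (6, 4)
```

Because each state depends on the previous one, information from early inputs reaches later states only through repeated multiplication by $W$, which is where the vanishing gradient problem discussed next originates.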
Information 2018, 9, x FOR PEER REVIEW 3 of 17
The connections between nodes in an RNN form a directed graph along a sequence, which allows the network to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs [38,39]. In theory, an RNN is well suited to predicting future values from past data. In practice, however, when the gap between the relevant past information and the current prediction position is large, the RNN cannot retain that information well because of the vanishing gradient problem, so its predictions are sometimes unsatisfactory. The LSTM network was proposed to overcome this weakness and improve on the performance of the RNN.

Long-Short-Term Memory Network
An LSTM network is an RNN composed of LSTM units [38,39]. The structure of a common LSTM unit is shown in Figure 2. As the figure shows, a common LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. The cell is the memory of the LSTM and remembers values over arbitrary time intervals. A gate in the LSTM is a special network structure whose input is a vector and whose output lies in the range 0 to 1: an output of 0 lets no information pass, and an output of 1 lets all information pass.

When the current input vector $\mathbf{x} = (x_1, x_2, \ldots, x_{t-1}, x_t)$ and the output vector $\mathbf{s} = (s_1, s_2, \ldots, s_{t-1}, s_t)$ are known, a gate is computed as:
$$ \mathrm{gate}(\mathbf{x}) = \sigma\left(W\mathbf{x} + b\right) $$
where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function, $W$ is the weight matrix, and $b$ is the bias vector.
In the LSTM, the cell state records the current state. It is the core of the computation node and is computed as:
$$ c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c [s_{t-1}, x_t] + b_c\right) $$
where $[s_{t-1}, x_t]$ denotes the concatenation of the previous output and the current input; $\odot$ denotes element-wise multiplication; $W_c$ is the weight matrix of the cell state; $b_c$ is the bias vector of the cell state; $i_t$ is the input gate, which determines how much of the current input is saved in the cell state; and $f_t$ is the forget gate, which helps the network forget past input information and reset the memory cells. The input gate and forget gate are computed, respectively, as:
$$ i_t = \sigma\left(W_i [s_{t-1}, x_t] + b_i\right) $$
$$ f_t = \sigma\left(W_f [s_{t-1}, x_t] + b_f\right) $$
where $W_i$ and $W_f$ are the weight matrices, and $b_i$ and $b_f$ the bias vectors, of the input gate and forget gate, respectively.
The output gate of the LSTM controls how much of the current cell state flows into the current output. The output gate $o_t$ is computed as:
$$ o_t = \sigma\left(W_o [s_{t-1}, x_t] + b_o\right) $$
where $W_o$ is the weight matrix and $b_o$ is the bias vector of the output gate.
The final output of the LSTM is computed as:
$$ s_t = o_t \odot \tanh(c_t) $$
The LSTM network is usually trained with the back-propagation algorithm, applied through time to the unrolled network. For more details on training and tuning the LSTM model, please refer to [38].
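A minimal NumPy sketch of an LSTM forward pass is given below. It assumes the common formulation in which the input, forget, and output gates and the candidate cell content are all computed from the concatenation $[s_{t-1}, x_t]$, with $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c[s_{t-1}, x_t] + b_c)$ and $s_t = o_t \odot \tanh(c_t)$. The parameter names, zero initial states, and sizes are illustrative choices, and training is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, s_prev, c_prev, p):
    """One LSTM step. Each W_* has shape (n_h, n_h + n_in); each b_* has shape (n_h,)."""
    z = np.concatenate([s_prev, x_t])                 # [s_{t-1}, x_t]
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])            # input gate
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])            # forget gate
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])            # output gate
    c_hat = np.tanh(p["W_c"] @ z + p["b_c"])          # candidate cell content
    c_t = f_t * c_prev + i_t * c_hat                  # keep part of old memory, add new
    s_t = o_t * np.tanh(c_t)                          # gated output
    return s_t, c_t


def lstm_forward(x_seq, p, n_h):
    s, c = np.zeros(n_h), np.zeros(n_h)               # assumed zero initial states
    outputs = []
    for x_t in x_seq:
        s, c = lstm_step(x_t, s, c, p)
        outputs.append(s)
    return np.stack(outputs)


rng = np.random.default_rng(1)
n_in, n_h = 1, 3                                      # hypothetical sizes
params = {k: rng.normal(scale=0.3, size=(n_h, n_h + n_in)) for k in ("W_i", "W_f", "W_o", "W_c")}
params.update({k: np.zeros(n_h) for k in ("b_i", "b_f", "b_o", "b_c")})
S = lstm_forward(rng.normal(size=(10, n_in)), params, n_h)
print(S.shape)  # (10, 3)
```

The additive update of $c_t$ is the design choice that matters: when the forget gate stays near 1, the cell state passes through many steps without repeated squashing, which is how the LSTM mitigates the vanishing gradient problem of the plain RNN.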

Extreme Learning Machine
The ELM is a popular kind of single-hidden-layer NN [20], and its network structure is shown in Figure 3. Unlike the gradient descent method (the back-propagation algorithm) commonly used to train NNs, the ELM randomly generates the parameters of the hidden layer and determines the weights between the hidden layer and the output layer by the least-squares method. Since no iterative process is involved, the amount of computation and the training time of the ELM are greatly reduced, which gives it a very fast learning speed [20].

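The input-output mapping of the ELM can be expressed as:
$$ y(\mathbf{x}) = \sum_{k=1}^{K} \beta_k \, g_k(\mathbf{x}, \boldsymbol{\theta}_k) $$
where $g_k(\mathbf{x}, \boldsymbol{\theta}_k)$ is the activation function of the $k$th hidden neuron with parameters $\boldsymbol{\theta}_k$, $\beta_k$ is the output weight of that neuron, $K$ is the number of hidden neurons, and $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_K)$.
Suppose that the training dataset is $\{(\mathbf{x}_l, y_l)\}_{l=1}^{L}$; then the training process of the ELM can be summarized as follows [20].
Step 1: Set the number of hidden neurons $K$ and randomly generate the parameters $\boldsymbol{\theta}$ of the activation functions.
Step 2: Calculate the hidden-layer output matrix $H$ as:
$$ H = \begin{bmatrix} g_1(\mathbf{x}_1, \boldsymbol{\theta}_1) & \cdots & g_K(\mathbf{x}_1, \boldsymbol{\theta}_K) \\ \vdots & \ddots & \vdots \\ g_1(\mathbf{x}_L, \boldsymbol{\theta}_1) & \cdots & g_K(\mathbf{x}_L, \boldsymbol{\theta}_K) \end{bmatrix} $$
Step 3: Calculate the output weights as $\boldsymbol{\beta} = H^{\dagger}\mathbf{y}$, where $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_K]^T$, $H^{\dagger}$ is the Moore-Penrose pseudo-inverse of the output matrix $H$, and $\mathbf{y}$ is the output vector:
$$ \mathbf{y} = [y_1, y_2, \ldots, y_L]^T $$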
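The three training steps can be sketched as follows. This minimal NumPy version assumes a sigmoid activation with input weights and biases drawn uniformly from [-1, 1]; the function names, the toy target, and the sizes are illustrative, and `np.linalg.pinv` supplies the Moore-Penrose pseudo-inverse.

```python
import numpy as np


def _hidden(X, A, b):
    # Sigmoid activations g_k(x, theta_k) for all samples: an L x K matrix.
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


def elm_fit(X, y, n_hidden, rng):
    # Step 1: fix K and draw the hidden-layer parameters theta at random (never updated).
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden-layer output matrix H.
    H = _hidden(X, A, b)
    # Step 3: output weights beta = H^dagger y (Moore-Penrose pseudo-inverse).
    beta = np.linalg.pinv(H) @ y
    return A, b, beta


def elm_predict(X, A, b, beta):
    return _hidden(X, A, b) @ beta


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))            # toy inputs
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1]             # toy smooth target
A, b, beta = elm_fit(X, y, n_hidden=60, rng=rng)
train_rmse = np.sqrt(np.mean((elm_predict(X, A, b, beta) - y) ** 2))
print(f"training RMSE: {train_rmse:.2e}")
```

Because the only fitting is a single pseudo-inverse (computed from an SVD of the $L \times K$ matrix $H$), there is no iterative weight update, which is the source of the ELM's fast training.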

The Proposed Hybrid Model
In this section, the hybrid model combining the LSTM and the ELM is proposed first. Then, the model evaluation indices are presented. Finally, the data preprocessing is introduced.

The Hybrid Model
The structure of the proposed hybrid model is shown in Figure 4. As the figure shows, once the input data are fed in, the outputs of the LSTM model and the ELM model are computed first, and they are then combined by linear regression to generate the final output of the hybrid model.
Suppose that the linear regression in the hybrid model is expressed as:
$$ \hat{y}(\mathbf{x}) = a_0 + a_1 y_s(\mathbf{x}) + a_2 y_e(\mathbf{x}) $$
where $y_s(\mathbf{x})$ and $y_e(\mathbf{x})$ are the outputs of the LSTM and ELM, respectively, and $a_0$, $a_1$, and $a_2$ are the regression parameters. In this hybrid model, the LSTM and ELM models can be constructed by the learning algorithms described in the previous subsections, so the only remaining task is to determine the parameters of the linear regression part. Assume that, for the $l$th input $\mathbf{x}_l$ in the aforementioned training dataset $\{(\mathbf{x}_l, y_l)\}_{l=1}^{L}$, the predicted outputs of the LSTM and ELM are $y_s(\mathbf{x}_l)$ and $y_e(\mathbf{x}_l)$, respectively; then the training dataset for the linear regression part is $\{(y_s(\mathbf{x}_l), y_e(\mathbf{x}_l); y_l)\}_{l=1}^{L}$.
For this newly generated training dataset, we expect that:
$$ a_0 + a_1 y_s(\mathbf{x}_l) + a_2 y_e(\mathbf{x}_l) = y_l, \quad l = 1, 2, \ldots, L $$
These equations can be rewritten in matrix form as $A\mathbf{a} = \mathbf{y}$, where:
$$ A = \begin{bmatrix} 1 & y_s(\mathbf{x}_1) & y_e(\mathbf{x}_1) \\ 1 & y_s(\mathbf{x}_2) & y_e(\mathbf{x}_2) \\ \vdots & \vdots & \vdots \\ 1 & y_s(\mathbf{x}_L) & y_e(\mathbf{x}_L) \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_L \end{bmatrix} $$
As a result, the parameters of the linear regression part in the hybrid model can be determined as:
$$ \mathbf{a} = A^{+}\mathbf{y} $$
where $A^{+}$ is the Moore-Penrose pseudo-inverse of the matrix $A$.
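The least-squares ensemble step can be sketched as below, assuming the combiner has the affine form $\hat{y} = a_0 + a_1 y_s + a_2 y_e$, which matches the fitted expressions reported in the experiments. The synthetic data are hypothetical and only check that the pseudo-inverse recovers known coefficients.

```python
import numpy as np


def fit_combiner(y_s, y_e, y):
    """Least-squares fit of y ~ a0 + a1*y_s + a2*y_e via the pseudo-inverse."""
    A = np.column_stack([np.ones_like(y_s), y_s, y_e])   # rows [1, y_s(x_l), y_e(x_l)]
    return np.linalg.pinv(A) @ y                          # a = A^+ y


def combine(a, y_s, y_e):
    return a[0] + a[1] * y_s + a[2] * y_e


# Synthetic check: when the target is an exact affine mix, the fit recovers it.
rng = np.random.default_rng(0)
y_s = rng.normal(size=50)          # stand-in for LSTM predictions
y_e = rng.normal(size=50)          # stand-in for ELM predictions
y = -2.0 + 0.6 * y_s + 0.4 * y_e
a = fit_combiner(y_s, y_e, y)
print(np.round(a, 4))
```

In the hybrid model, `y_s` and `y_e` would be the LSTM and ELM predictions on the training inputs. Fitting the combiner on a held-out validation split instead is a common alternative that reduces the risk of over-weighting a base model that has overfitted the training data.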

Model Evaluation Indices
To evaluate the performance of the proposed hybrid model, three indices are adopted: the mean absolute error (MAE), the root mean square error (RMSE), and the mean relative error (MRE). They are computed as:
$$ \mathrm{MAE} = \frac{1}{L}\sum_{l=1}^{L} \left| \hat{y}_l - y_l \right| $$
$$ \mathrm{RMSE} = \sqrt{\frac{1}{L}\sum_{l=1}^{L} \left( \hat{y}_l - y_l \right)^2} $$
$$ \mathrm{MRE} = \frac{1}{L}\sum_{l=1}^{L} \frac{\left| \hat{y}_l - y_l \right|}{y_l} \times 100\% $$
where $L$ is the number of training or test samples, and $\hat{y}_l$ and $y_l$ are, respectively, the predicted and real values of the electrical load. The MAE, RMSE, and MRE are common measures of forecasting error in time series analysis; each aggregates the magnitudes of the prediction errors into a single number. The MAE is the average of the absolute differences between the predicted and observed values. The RMSE is the square root of the average squared difference; because the errors are squared before averaging, large errors have a disproportionately large effect on the RMSE, which makes it sensitive to outliers. Both the MAE and RMSE are expressed in the units of the load, so their values depend on the scale of the data. The MRE, also known as the mean absolute percentage deviation, removes this scale dependence by dividing each absolute error by the corresponding actual value and expressing the prediction accuracy as a percentage. For all three indices, smaller values indicate better forecasting performance.
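The three indices translate directly into NumPy. The small arrays below are hypothetical values chosen so the results are easy to check by hand; the `mre` function assumes that no actual value is zero.

```python
import numpy as np


def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))


def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))


def mre(y_hat, y):
    # Returned as a percentage; assumes no actual value y_l is zero.
    return 100.0 * np.mean(np.abs(y_hat - y) / np.abs(y))


y = np.array([100.0, 200.0, 400.0])       # hypothetical actual loads
y_hat = np.array([110.0, 190.0, 400.0])   # hypothetical forecasts
print(mae(y_hat, y), rmse(y_hat, y), mre(y_hat, y))
```

Here the two 10-unit errors give an MAE of about 6.67 and an RMSE of about 8.16; the RMSE is larger because squaring weights the non-zero errors more heavily. The MRE is 5%, since the first error is 10% of its actual value, the second is 5%, and the third is 0%.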

Data Preprocessing
When the tanh function is selected as the LSTM activation function, its output lies in the range [-1, 1]. To keep the targets within the range that the LSTM can produce, the electrical load data are normalized in our experiments.
Suppose that the electrical load time series is $\{s_1, s_2, \ldots, s_{t-1}, s_t, \ldots, s_N\}$; then the following equation is used for normalization:
$$ \bar{s}_t = \frac{s_t - s_{\min}}{s_{\max} - s_{\min}} $$
where $s_{\min}$ and $s_{\max}$ are, respectively, the minimum and maximum values of the electrical load data. This yields the normalized electrical load series $\{\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_N\}$, which is then used to generate the training or testing data pairs as follows:
$$ \mathbf{x}_t = \left[\bar{s}_{t-n}, \bar{s}_{t-n+1}, \ldots, \bar{s}_{t-1}\right], \quad y_t = \bar{s}_t $$
where $n$ is the number of input variables.
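The normalization and the construction of input-output pairs can be sketched as follows, assuming each input vector holds the $n$ most recent normalized loads and the target is the next value. The load values and $n = 3$ are hypothetical, and `denormalize` inverts the scaling so forecasts can be reported in the original units.

```python
import numpy as np


def min_max_normalize(s):
    s_min, s_max = s.min(), s.max()
    return (s - s_min) / (s_max - s_min), s_min, s_max


def denormalize(s_bar, s_min, s_max):
    # Maps normalized forecasts back to load units.
    return s_bar * (s_max - s_min) + s_min


def make_pairs(s_bar, n):
    """Input x_t = [s_{t-n}, ..., s_{t-1}], target y_t = s_t (0-based t = n, ..., N-1)."""
    X = np.array([s_bar[t - n:t] for t in range(n, len(s_bar))])
    y = s_bar[n:]
    return X, y


load = np.array([500.0, 520.0, 560.0, 610.0, 580.0, 540.0, 530.0])  # hypothetical hourly loads
s_bar, s_min, s_max = min_max_normalize(load)
X, y = make_pairs(s_bar, n=3)
print(X.shape, y.shape)  # (4, 3) (4,)
```

Computing $s_{\min}$ and $s_{\max}$ on the training portion only, and reusing them for the test data, is a common precaution against leaking test information into training.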

Experiments and Comparisons
In this section, the proposed hybrid model is applied to forecast the electrical load of the Alberta area and the electrical load of a service restaurant. Detailed experiments are conducted on both datasets, and comparisons with the LSTM, ELM, and SVR are made.

Applied Dataset
The electrical load data used in this experiment were downloaded from the website of the Alberta Electric System Operator (AESO) [43]. This historical electrical load dataset was collected by the AESO and provided for market participants. The data were sampled hourly from 1 January 2005 to 31 December 2016. Because the dataset has missing values, we filled them in with an averaging filter to ensure the integrity and continuity of the data. The final dataset contains a total of 105,192 samples. In the following experiment, the samples from 2005 to 2015 are used for training, and the samples from 2016 are used for testing.
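The text does not specify the averaging filter, so the sketch below assumes one simple variant: each missing hour is replaced by the mean of the nearest valid readings before and after it. The function name `fill_missing` is hypothetical. The second part uses only the standard library to check the sample count: hourly sampling from 1 January 2005 to 31 December 2016 gives 105,192 hours, split into 96,408 training hours (2005 to 2015) and 8,784 test hours (2016, a leap year).

```python
from datetime import datetime


def fill_missing(values):
    """Replace each None with the mean of the nearest valid readings on either side.

    Hypothetical stand-in for the averaging filter; a gap at either end of the
    series takes the single available neighbor.
    """
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = next((values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None)
        after = next((values[j] for j in range(i + 1, len(values)) if values[j] is not None), None)
        neighbors = [x for x in (before, after) if x is not None]
        filled[i] = sum(neighbors) / len(neighbors)  # fails if every value is missing
    return filled


def hours_between(start, end):
    return int((end - start).total_seconds()) // 3600


total = hours_between(datetime(2005, 1, 1), datetime(2017, 1, 1))
train = hours_between(datetime(2005, 1, 1), datetime(2016, 1, 1))
test = hours_between(datetime(2016, 1, 1), datetime(2017, 1, 1))
print(total, train, test)  # 105192 96408 8784
```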

Experimental Setting
To determine the optimal structure of the LSTM model for electrical load prediction, two design factors are considered in this paper: the number of hidden neurons and the number of input variables. More hidden neurons may improve the modeling performance of the LSTM, but they also increase its training time and complexity. Similarly, too few input variables limit the prediction accuracy, while more input variables make training more difficult.
In this experiment, we test five levels for the number of hidden neurons (20, 40, 60, 80, and 100) and eight levels for the number of input variables (5, 6, 7, 8, 9, 10, 11, and 12), which gives 5 × 8 = 40 cases. To account for the effects of the random initialization of the network weights, 10 runs with different random initializations are performed in each case, and the MAE, MRE, and RMSE of each case are computed as the averages over these 10 runs. The averaged performance of the LSTM model in the 40 cases is shown in Table 1. According to Table 1, the 28th case gives the best result among all 40 cases; that is, the LSTM model performs best when the number of input variables is 10 and the number of hidden neurons is 60.
For the ELM, the number of hidden neurons is also set to 60. The hybrid model adopts the LSTM and ELM with the selected structure, and after training, its linear regression part has the following expression:
$$ \hat{y}(\mathbf{x}) = -19.9755 + 0.6296\, y_s(\mathbf{x}) + 0.3737\, y_e(\mathbf{x}) $$
The SVR predictions are obtained with the LIBSVM software. To achieve the best possible performance, the SVR is tuned by trial and error. The tuned SVR uses the radial basis function as its kernel function, with the kernel parameter gamma set to 0.001 and the penalty coefficient set to 100; the other parameters, including the loss function and the error band, are left at their LIBSVM defaults.
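The 40-case structure search can be sketched as a grid over the two design factors. The case numbering here assumes the input-variable level varies in the outer loop and the hidden-neuron level in the inner loop, which is the ordering under which case 28 corresponds to 10 input variables and 60 hidden neurons. `train_and_score` is a hypothetical callback that would train one LSTM with a given seed and return a single error index; the study itself compares three indices.

```python
from itertools import product

HIDDEN_LEVELS = [20, 40, 60, 80, 100]
INPUT_LEVELS = [5, 6, 7, 8, 9, 10, 11, 12]

# Assumed numbering: input-variable level in the outer loop, hidden-neuron level inner.
CASES = {idx: pair for idx, pair in enumerate(product(INPUT_LEVELS, HIDDEN_LEVELS), start=1)}


def select_structure(train_and_score, n_runs=10):
    """Average an error index over n_runs seeds per case and return the best case.

    train_and_score(n_inputs, n_hidden, seed) -> float is a hypothetical callback
    that trains one LSTM and returns its error on the evaluation data.
    """
    averaged = {}
    for idx, (n_inputs, n_hidden) in CASES.items():
        runs = [train_and_score(n_inputs, n_hidden, seed) for seed in range(n_runs)]
        averaged[idx] = sum(runs) / n_runs
    best = min(averaged, key=averaged.get)
    return best, CASES[best], averaged


def dummy_score(n_inputs, n_hidden, seed):
    # Illustration only: error is smallest at 10 inputs and 60 hidden neurons.
    return abs(n_inputs - 10) + abs(n_hidden - 60) / 20 + 0.01 * seed


best, structure, _ = select_structure(dummy_score)
print(len(CASES), best, structure)  # 40 28 (10, 60)
```

Averaging over seeds before ranking keeps a single lucky initialization from deciding the structure, which is the purpose of the 10 runs per case.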

Experimental Results and Analysis
The prediction results of the four models in this application are shown in Figure 5. To show the details more clearly, only the predictions for the last ten days of 2016 are plotted. Figure 5 shows that the proposed hybrid model performs much better than the other three models.
The performance indices of the four models are listed in Table 2. All three indices of the proposed hybrid model are smaller than those of the other three models. In terms of these three indices, the proposed hybrid model improves performance by at least 5% compared with the LSTM, 8% compared with the ELM, and 15% compared with the SVR. In other words, in this experiment, hybrid model > LSTM > ELM > SVR, where ">" means "performs better than".
Figure 6 shows the histograms of the hourly prediction errors in this experiment. A higher and narrower histogram around zero indicates better forecasting performance. The histogram of the hybrid model clearly has many more errors concentrated around zero, which again indicates that the proposed hybrid model gives the best prediction performance.
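One common way to express such improvements is the relative reduction of each index with respect to the comparison model, with "at least" read as the minimum over the indices. The text does not state its exact formula, so this definition is an assumption, and the index values below are hypothetical rather than taken from Table 2.

```python
def relative_improvement(baseline, hybrid):
    """Percentage reduction of an error index relative to a baseline model."""
    return 100.0 * (baseline - hybrid) / baseline


def min_improvement(baseline_indices, hybrid_indices):
    # "At least X%" read as the smallest improvement across the indices.
    return min(relative_improvement(b, h) for b, h in zip(baseline_indices, hybrid_indices))


# Hypothetical (MAE, MRE, RMSE) values, not taken from Table 2.
baseline = (20.0, 2.0, 30.0)
hybrid = (19.0, 1.8, 27.0)
print([round(relative_improvement(b, h), 2) for b, h in zip(baseline, hybrid)])  # [5.0, 10.0, 10.0]
print(round(min_improvement(baseline, hybrid), 2))  # 5.0
```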
To further illustrate the performance of the proposed hybrid model, scatter plots of the actual and predicted electrical load values in the first experiment are drawn in Figure 7. This figure also confirms that the proposed hybrid model provides a satisfactory fit.

Applied Dataset
The electrical load dataset in the second experiment was downloaded from [44]. This dataset contains hourly load profile data for 16 commercial building types and residential buildings in the United States. In this study, we select the electrical load data of a service restaurant in Helena, MT, USA. The selected time series was collected from 1 January 2004 to 31 December 2004 with an hourly sampling period. Again, the averaging filter is applied to fill in the missing values, giving a total of 8760 samples. The data of the first ten months are used for training, and the data of the last two months are used for testing.

Experimental Setting
The optimal structure of the LSTM model is determined in the same way as in the first experiment. The number of hidden neurons is again chosen from the same five levels, and the number of input variables is tested among the same eight levels, which gives 40 cases in this experiment. In each case, 10 different random initializations are considered. The averaged indices of the LSTM model over the 40 cases in this application are shown in Table 3. From this table, we can observe that case 35 has the best performance; in other words, the optimal LSTM model has 12 input variables and 100 neurons in the hidden layer. Similarly, the number of hidden neurons in the ELM is set to 100. The hybrid model is then constructed by ensembling these LSTM and ELM models, and the regression obtained after learning is:

$$\hat{y}(x) = -2.6753 + 0.4367\,y_s(x) + 0.6231\,y_e(x) \tag{26}$$

For the SVR in this application, we again use the radial basis function as the kernel function, with the parameter gamma tuned to 0.1 and the penalty coefficient tuned to 110. As before, the defaults in "libsvm" are used for the other parameters, including the loss function and the error band.
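The model components configured above can be illustrated with a minimal NumPy sketch. It assumes that the 12 input variables are the 12 most recent hourly loads, uses a textbook ELM (random input weights and biases, sigmoid hidden units, output weights from the Moore-Penrose pseudo-inverse), and fits the coefficients of the form in Equation (26) by ordinary least squares on the sub-model predictions. The activation function, weight ranges, min-max scaling, the reading of y_s as the LSTM output, and all names (make_lagged_inputs, ELM, fit_ensemble, predict_ensemble) are assumptions for illustration. No LSTM is trained here: lstm_pred is a hypothetical placeholder (a persistence forecast) standing in for the LSTM outputs.

```python
import numpy as np


def make_lagged_inputs(series: np.ndarray, n_lags: int = 12):
    """Return (X, y): row t of X holds the n_lags loads preceding target y[t].
    Assumption: the input variables are the most recent hourly loads."""
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    return X, series[n_lags:]


class ELM:
    """Textbook extreme learning machine with one random hidden layer."""

    def __init__(self, n_hidden: int = 100, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation (assumed) of the randomly weighted inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the pseudo-inverse of H.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta


def fit_ensemble(y_s, y_e, y):
    """Least-squares coefficients (a0, a1, a2) of y ~ a0 + a1*y_s + a2*y_e."""
    A = np.column_stack([np.ones_like(y_s), y_s, y_e])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ensemble(coef, y_s, y_e):
    return coef[0] + coef[1] * y_s + coef[2] * y_e


# Usage on a synthetic daily-periodic load (placeholder data), scaled to [0, 1].
t = np.arange(2000)
load = 50 + 10 * np.sin(2 * np.pi * t / 24) + np.random.default_rng(1).normal(0, 1, t.size)
load = (load - load.min()) / (load.max() - load.min())
X, y = make_lagged_inputs(load, n_lags=12)
elm_pred = ELM(n_hidden=100).fit(X, y).predict(X)
lstm_pred = X[:, -1]  # hypothetical stand-in for LSTM outputs (persistence forecast)
coef = fit_ensemble(lstm_pred, elm_pred, y)
hybrid_pred = predict_ensemble(coef, lstm_pred, elm_pred)
print("ensemble coefficients:", coef)
```

Because the choices (0, 1, 0) and (0, 0, 1) are among the candidates of the least-squares fit, the fitted ensemble can never have a larger training squared error than either sub-model alone; whether the gain carries over to unseen data has to be checked on the test set.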

Experimental Results and Analysis
For the testing data, the forecasting results of the four models over the last five days are shown in Figure 8. To quantify the improvement of the proposed hybrid model, the performance indices of the four models in this application are listed in Table 4. Figure 8 and Table 4 once again show that the proposed hybrid model achieves the best performance in this electrical load forecasting application. Compared with the three comparative methods, the improvement of the proposed hybrid model is at least 33.3%, 31.6%, and 52.5% in terms of MAE, MRE, and RMSE, respectively.
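The improvement percentages above can be computed from the per-model indices with a short sketch. It assumes the usual definitions of MAE, MRE (in percent), and RMSE, and assumes that the improvement is the relative reduction of an error index with respect to each comparative model, with "at least" meaning the smallest reduction over the three comparative models. The function names are hypothetical, and the numbers in the usage line are illustrative, not values from Table 4.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


def mre(y, y_hat):
    """Mean relative error in percent (assumes no zero loads)."""
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)) * 100.0)


def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def improvement(hybrid_index, other_index):
    """Relative reduction (%) of an error index achieved by the hybrid model."""
    return (other_index - hybrid_index) / other_index * 100.0


def min_improvement(hybrid_index, other_indices):
    """The 'at least' improvement: smallest reduction over the comparative models."""
    return min(improvement(hybrid_index, v) for v in other_indices)


# Illustrative index values (hypothetical, not taken from Table 4).
print(round(min_improvement(2.0, [3.0, 4.0, 5.0]), 1))  # 33.3
```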
To further reflect the differences among the four methods, the histograms of their prediction errors in this application are shown in Figure 9. From Figure 9a, we can observe that the forecasting errors of the proposed hybrid model are centered around zero, indicating that the hybrid model has no obvious systematic bias and that its forecasting errors are relatively small. From Figure 9b, it can be seen that the center of the forecasting errors of the LSTM model is greater than zero, which means that the LSTM model has a systematic bias and therefore larger prediction errors than the hybrid model. Comparing Figure 9c,d with Figure 9a, we find that the error histograms of the ELM and SVR are lower and flatter than that of the proposed hybrid model. As mentioned previously, a lower and flatter error histogram indicates worse performance. We can also observe from Figure 9d that some forecasting errors of the SVR are very large. Overall, the ranking in this electrical load forecasting application is again: hybrid model > LSTM > ELM > SVR.
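The visual comparison of error histograms can be made quantitative with a few summary statistics. In this sketch, the mean locates the center of a histogram, the standard deviation measures its width, the peak density (computed over a common bin range so that models are comparable) measures its height, and the largest absolute error flags outliers such as those observed for the SVR. The errors are assumed to be actual minus predicted values; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def error_histogram_summary(errors, bins=50, value_range=None):
    """Summarise a forecasting-error histogram by its centre, width and height."""
    errors = np.asarray(errors, dtype=float)
    density, _ = np.histogram(errors, bins=bins, range=value_range, density=True)
    return {
        "mean": float(errors.mean()),            # centre of the histogram
        "std": float(errors.std()),              # width: larger means flatter
        "peak_density": float(density.max()),    # height of the tallest bin
        "max_abs_error": float(np.abs(errors).max()),  # largest single error
    }


# Usage: a narrow and a wide synthetic error distribution on the same bin range.
rng = np.random.default_rng(0)
narrow = error_histogram_summary(rng.normal(0.0, 1.0, 5000), value_range=(-15, 15))
wide = error_histogram_summary(rng.normal(0.0, 3.0, 5000), value_range=(-15, 15))
print(narrow["peak_density"] > wide["peak_density"], narrow["std"] < wide["std"])  # True True
```

Using the same bin range for every model matters here: with density normalization, a wider error distribution necessarily spreads its mass over more bins and therefore has a lower peak, which is the "lower and flatter" pattern described in the text.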
Information 2018, 9, x FOR PEER REVIEW 14 of 17

Conclusions
Short-term electrical load forecasting plays an important role in the efficient management of the power grid. This study presented a hybrid model for short-term electrical load forecasting. The proposed hybrid model uses the ELM to model the shallow patterns of the electrical load and the LSTM to extract its deep features. The predicted results of the ELM and LSTM are then ensembled by a linear regression whose coefficients are determined by the least-squares method. Two real-world electrical load forecasting applications were used to evaluate the performance of the proposed hybrid model. The experimental results demonstrated that the proposed hybrid model gives satisfactory prediction accuracy and achieves the best results among all the compared methods (LSTM, ELM, and SVR) in both applications. The results are also consistent with the ability of the LSTM to use its memory cells to learn and retain useful information from the historical electrical load data over long periods and to use its forget gates to discard useless information, which contributes to the learning performance and generalization ability of the hybrid model. The proposed hybrid method can also be applied to other time series prediction problems, e.g., building energy consumption prediction and traffic flow estimation.

Figure 1. The structure of the typical RNN model.

Figure 2. The structure of the LSTM unit.

Figure 3. The structure of the ELM.

Figure 4. The structure of the proposed hybrid model.

Figure 5. Experimental results of the last ten days in 2016 in the first experiment: (a) Hybrid model; (b) LSTM; (c) ELM; and (d) SVR.

Figure 7. The actual and predicted values of the electrical load in the first experiment.

Figure 8. Experimental results of the last five days in the second application: (a) Hybrid model; (b) LSTM; (c) ELM; and (d) SVR.

Table 1. The averaged performance indices of the LSTM model in the 40 cases of the first experiment.

Table 2. The performance indices of the four models in the first experiment.

Table 3. The averaged indices of the LSTM model in the second experiment.

Table 4. The performance indices of the four models in the second experiment.
