A Deep Neural Network Model for Short-Term Load Forecast Based on Long Short-Term Memory Network and Convolutional Neural Network

Abstract: Accurate electrical load forecasting is of great significance in helping power companies achieve better scheduling and more efficient management. Since high levels of uncertainty exist in load time series, making an accurate short-term load forecast (STLF) is a challenging task. In recent years, deep learning approaches have provided better performance in predicting electrical load in real-world cases. The convolutional neural network (CNN) can extract the local trend and capture the same pattern when it appears in different regions, and the long short-term memory (LSTM) network can learn the relationships across time steps. In this paper, a new deep neural network framework that integrates the hidden features of the CNN model and the LSTM model is proposed to improve the forecasting accuracy. The proposed model was tested in a real-world case, and detailed experiments were conducted to validate its practicality and stability. The forecasting performance of the proposed model was compared with that of the LSTM model and the CNN model. The Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) were used as the evaluation indexes. The experimental results demonstrate that the proposed model achieves better and more stable performance in STLF.


Introduction
Demand Response Management (DRM) is one of the main features of the smart grid that helps to reduce peak load and load variation [1]. DRM controls electricity consumption on the customer side and aims to improve energy efficiency and reduce cost [2]. Accurate load forecasting has become more essential since the deregulation of the electricity industry [3]. It can minimize the gap between electricity supply and demand, while any error in the forecast brings additional costs. In 1985, it was estimated that a 1% increase in forecasting error increased the associated operating costs by up to 10 million pounds per year in the thermal British power system [4]. Power companies are beginning to work with experts to explore models that obtain more accurate load forecasts. For instance, the National Grid in the United Kingdom (UK) is currently working with DeepMind [5,6], a Google-owned AI team, to predict the power supply and demand peaks in the UK based on information from smart meters and weather-related variables. Therefore, precise load forecasting is expected to reduce operating costs, optimize utilities and generate profits.
Load forecasting in energy management systems (EMS) can be categorized into four types according to the length of the forecast interval [7]: (1) very short-term load forecasting (VSTLF) forecasts load for a few minutes; (2) short-term load forecasting (STLF) forecasts load from 24 h to one week; (3) medium-term load forecasting (MTLF) forecasts load from more than one week to a few months; and (4) long-term load forecasting (LTLF) forecasts load for longer than one year. In this paper, we focus on STLF. STLF is essential for controlling and scheduling the power system in everyday power system operation, interchange evaluation, security assessment, reliability analysis and spot price calculation [8,9], which imposes a higher accuracy requirement than long-term prediction.
The STLF problem has been tackled with various methods. These methods can be loosely categorized into two groups, namely traditional and computational intelligence methods. Statistical methods are the most frequently used in the early literature, including multiple linear regression [10,11], exponential smoothing [12], and the autoregressive integrated moving average (ARIMA) [13]. However, because electrical load data are inherently non-linear and these methods place high requirements on the original time sequences, they perform poorly in STLF.
Computational intelligence methods have achieved great success and are widely used in load forecasting owing to their non-linear learning and modeling capability, including clustering methods [14], fuzzy logic systems [15], support vector machines (SVM) [16,17] and artificial neural networks [18]. In [19], a methodology based on artificial neural networks reinforced by an appropriate wavelet denoising algorithm is implemented for short-term load forecasting, and the results show that the proposed method greatly improves the accuracy. Recently, deep learning frameworks have gained particular attention [20]. Compared to shallow learning, deep learning usually involves a larger number of hidden layers, which enables the model to learn more complex non-linear patterns [21]. As a deep learning framework with a powerful ability to capture non-stationarity and long-term dependencies over the forecasting horizon [22], recurrent neural networks (RNNs) are effective methods for load forecasting in power grids. In [23], a novel pooling-based deep RNN is applied to household load forecasting and achieves preliminary success. Compared with the state-of-the-art techniques in household load forecasting, the proposed method outperforms ARIMA by 19.5%, SVR by 13.1% and RNN by 6.5% in terms of RMSE. In [24], a new load forecasting model that incorporates the one-step-ahead concept into the RNN model is proposed. Its performance in high or low demand regions is outstanding, which shows that the proposed electricity load forecasting model can extract finer fluctuations in different regions than the other models. However, the vanishing gradient problem limits the performance of RNNs. To solve this problem, the long short-term memory (LSTM) and gated recurrent units (GRU), which are variants of RNNs, have been proposed and perform well in long-horizon forecasting based on past data [25,26,27]. In [28], the proposed LSTM-based method is capable of accurately forecasting complex electric load time series with a long forecasting horizon by exploiting the long-term dependencies. The experiments show that the proposed method performs better in complex electrical load forecasting scenarios. In [29], a method for short-term load forecasting with multi-source data is proposed; it uses gated recurrent unit neural networks, which extract temporal features with a simpler architecture and less convergence time in the hidden layers. The average MAPE can be as low as 10.98% for the proposed method, which outperforms other current methods, such as BPNNs, SAEs, RNNs and LSTM.
In addition to the above representative methods, convolutional neural networks (CNNs) have been widely applied in the field of prediction. CNN can capture local trend features and scale-invariant features when nearby data points have a strong relationship with each other [30]. The pattern of the local trend of the load data in nearby hours can be extracted by CNN. In [31], a new load forecasting model that uses the CNN structure is presented and compared with other neural networks. The results show that the MAPE and CV-RMSE of the proposed algorithm are 9.77% and 11.66%, which are the smallest among all models. The experiments show that the CNN structure is effective in load forecasting and that the hidden features can be extracted by the designed 1D convolution layers. Based on the above literature, both LSTM and CNN have been demonstrated to provide high-accuracy prediction in STLF owing to their ability to capture hidden features. Therefore, it is desirable to develop a hybrid neural network framework that can capture and integrate these different hidden features to provide better performance.
This paper proposes a new deep learning framework based on LSTM and CNN. More specifically, it consists of three parts: the LSTM module, the CNN module and the feature-fusion module. The LSTM module can retain useful information for a long time by means of the forget gate and memory cell, and the CNN module is utilized to extract patterns of the local trend and the same pattern appearing in different regions. The feature-fusion module is used to integrate these hidden features and make the final prediction. The proposed CNN-LSTM model was developed and applied to predict a real-world electrical load time series. Additionally, several methods were implemented for comparison with the proposed model. To prove the validity of the proposed model, the CNN module and the LSTM module were also tested independently. Furthermore, the test dataset was divided into several partitions to test the stability of the proposed model. In summary, this paper proposes a deep learning framework that can effectively capture and integrate the hidden features of the CNN model and the LSTM model. In the experiments, the proposed CNN-LSTM model takes advantage of each component and achieves higher accuracy and stability in STLF.
The major contributions of this paper are: (1) a high-precision STLF deep learning framework, which can integrate the hidden features of the CNN model and the LSTM model; (2) a demonstration of the superiority of the proposed deep learning framework on a real-world electrical load time series by comparison with several models; (3) a validation of the practicality and stability of the proposed CNN-LSTM model on several partitions of the test dataset; and (4) a research direction in time sequence forecasting based on the integration of the hidden features of the LSTM and CNN models.
The rest of this paper is structured as follows: In Section 2, the RNN, LSTM, and CNN are introduced. In Section 3, the proposed CNN-LSTM neural network framework is presented. In Section 4, the proposed model is applied to forecast the electrical load in a real-world case, and comparisons and analysis are provided. In Section 5, the results are discussed. Finally, we draw the conclusion in Section 6.

Methodologies of Artificial Neural Networks
This section provides brief background on several artificial neural networks, including RNN, LSTM, and CNN.

RNN
RNN is a kind of artificial neural network shown to have a strong ability to capture the hidden correlations in data in applications such as speech recognition, natural language processing and time series prediction. It is particularly suitable for modeling sequence problems because its recurrent connections let it operate on the input information together with a trace of previously acquired information [32]. As shown in Figure 1, the mapping of one node $S_t$ and the output $O_t$ can be represented as:

$$S_t = f(U X_t + W S_{t-1})$$

$$O_t = g(V S_t)$$

where $S_t$ is the memory of the network at time $t$; $U$, $W$ and $V$ are the shared weight matrices in each layer; $X_t$ and $O_t$ represent the input and the output at time $t$; and $f(\cdot)$ and $g(\cdot)$ represent nonlinear functions. Unlike the weight connections established between the layers in a basic neural network, RNN can use its internal state (memory) to process sequences of inputs [33]. The hidden state captures the information from the previous time point, and the output is derived from the current input and the previous memories. RNN performs well when the output is close to its associated inputs because the information of the previous node is passed to the next node. In theory, RNN is also able to deal with long dependencies. However, in practical applications, RNN cannot memorize the previous information well when the time interval is long because of the vanishing gradient problem. To overcome this weakness and enhance the performance of the RNN, a special type of RNN architecture called LSTM was proposed.
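The recurrence described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the paper does not specify $f$ and $g$, so tanh is assumed for $f$ and the identity for $g$, and the helper names `matvec` and `rnn_forward` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, U, W, V, s0):
    """Run S_t = tanh(U X_t + W S_{t-1}) and O_t = V S_t over a sequence.

    Assumption: f = tanh and g = identity; the paper leaves both unspecified.
    """
    state = list(s0)
    outputs = []
    for x_t in inputs:
        pre_activation = [a + b for a, b in zip(matvec(U, x_t), matvec(W, state))]
        state = [math.tanh(p) for p in pre_activation]  # memory S_t
        outputs.append(matvec(V, state))                # output O_t
    return outputs, state
```

The same matrices `U`, `W` and `V` are reused at every time step, which is the weight sharing mentioned in the text; the only quantity carried from one step to the next is the state.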

LSTM
To overcome the aforementioned disadvantages of traditional RNNs, LSTM combines short-term memory with long-term memory through gate control. As shown in Figure 2, a common unit consists of a memory cell, an input gate, an output gate, and a forget gate. The input $x_t$ at time $t$ is selectively saved into the cell $C_t$ as determined by the input gate, and the state of the cell at the last moment, $C_{t-1}$, is selectively forgotten by the forget gate. Finally, the output gate controls which part of the cell $C_t$ is added to the output $h_t$. The calculation of the input gate $i_t$ and the forget gate $f_t$ can be, respectively, expressed as:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $\sigma$ is the sigmoid function, $W_i$ and $W_f$ are the weight matrices, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input, and $b_i$ and $b_f$ are the bias vectors. The next step is to update the cell state $C_t$, which can be computed as:

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

where $\tilde{C}_t$ is the candidate cell state, $\odot$ denotes element-wise multiplication, $W_c$ is the weight matrix, $b_c$ is the bias vector, and $C_{t-1}$ is the state of the previous cell.
The output gate $o_t$ and the final output $h_t$ can be expressed as:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \odot \tanh(C_t)$$

where $W_o$ is the weight matrix and $b_o$ is the bias vector.
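As a rough illustration of the gate mechanism described above, the following sketch performs a single LSTM step in plain Python using the standard LSTM formulation. The function names and the dictionary layout of the parameters are hypothetical and chosen for readability only; they are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b for a matrix W (list of rows) and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p is a dict holding W_i, W_f, W_c, W_o and b_i, b_f, b_c, b_o."""
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]       # input gate
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]       # forget gate
    c_cand = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate state
    # Keep part of the old cell state and add part of the candidate state.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]       # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

When the forget gate saturates near 1 and the input gate near 0, the cell state is carried forward almost unchanged, which is how the unit can retain information over long intervals.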

CNN
CNN is a kind of deep artificial neural network. It is most commonly applied to tasks in which the data have high local correlation, such as visual imagery, video prediction, and text categorization. It can capture the same pattern when it appears in different regions. CNN requires minimal preprocessing by using a variation of multilayer perceptrons, and is effective at dealing with high-dimensional data owing to its shared-weights architecture and translation invariance characteristics.
CNN usually consists of convolutional layers, pooling layers and fully-connected layers. Convolutional layers apply a convolution operation to the input. The purpose of the convolution operation is to extract different features of the input, and additional layers can iteratively extract more complex features from the preceding features. As shown in Figure 3, each convolutional layer is composed of several convolutional units, and the parameters of each convolutional unit are optimized by a back propagation algorithm. Generally, features with a large dimension are obtained after the convolutional layer, and these need to be reduced in dimension. Pooling layers combine the outputs of neuron clusters at one layer into a single neuron in the next layer. The fully-connected layer, which combines all local features into global features, is used to calculate the final result.
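The convolution and pooling operations described above can be illustrated on a one-dimensional signal with a minimal sketch. The function names are hypothetical, the convolution is the unflipped ("valid" cross-correlation) form commonly used in deep learning, and the kernel is hand-picked for illustration rather than learned.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# A difference kernel responds to the local upward trend wherever it occurs.
features = relu(conv1d([1.0, 2.0, 4.0, 7.0, 11.0], [-1.0, 1.0]))  # [1.0, 2.0, 3.0, 4.0]
pooled = max_pool1d(features, 2)                                   # [2.0, 4.0]
```

Because the same kernel weights are applied at every position, a pattern is detected regardless of where it appears in the series, and pooling halves the feature dimension here.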

The Proposed Method
In this section, we describe our CNN-LSTM based hybrid deep learning forecasting framework for STLF. It is motivated by the combination of CNN and LSTM, which considers both the local trend and the long-term dependency of load data.

The Overview of the Proposed Framework
The structure of the proposed hybrid deep neural network is shown in Figure 4. The inputs are the load values in the past few hours, and the outputs represent the prediction of the future load values. The proposed framework mainly consists of a CNN module, an LSTM module and a feature-fusion module. In the data preparation step, null values are checked and the load data are split into training and test sets. Then, the original data are transformed into two different datasets. The CNN module is used to capture the local trend and the LSTM module is utilized to learn the long-term dependency. The two hidden features are concatenated in the feature-fusion module. The final prediction is generated after a fully-connected layer. In the following, the detailed structure of each component is described.
In the CNN module, the main target is to capture the feature of the local trend. The inputs are the standardized load datasets, and the outputs are the prediction of the trend in the next few hours. The main structure of the CNN module is formed by three convolution layers (Conv1, Conv2, and Conv3). The convolution layers are one-dimensional convolutions, and the activation function is the Rectified Linear Unit (ReLU). The hidden feature of the CNN module is constructed so that it can be integrated with the feature of the LSTM module in the feature-fusion module.
The LSTM module is used to capture the long-term dependency. The inputs are reshaped for the LSTM structure, and the prediction target is the maximum value of the next few hours. The number of hidden neurons at the output of the LSTM module is the same as that of the CNN module.
After processing by the CNN module and the LSTM module, the outputs of the two modules are concatenated in the merge layer of the feature-fusion module. The final prediction is generated after a fully-connected layer.
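The fusion step described above, concatenation followed by a fully-connected layer, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the weights are placeholders rather than trained values, and the sigmoid on the fully-connected layer follows the experimental setting reported later in the paper.

```python
import math


def dense(features, weights, biases):
    """Fully-connected layer: one weight row and one bias per output neuron."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]


def fuse_and_predict(cnn_features, lstm_features, weights, biases):
    """Concatenate the two hidden feature vectors, then apply dense + sigmoid."""
    merged = list(cnn_features) + list(lstm_features)  # merge layer
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(merged, weights, biases)]


# Two 2-dimensional hidden features fused into a single output neuron.
prediction = fuse_and_predict([1.0, 0.0], [0.0, 1.0], weights=[[1.0, 1.0, 1.0, 1.0]], biases=[0.0])
```

Since a sigmoid output lies between 0 and 1, a design like this presumably relies on the load values being scaled to that range before training and rescaled afterwards; the paper does not detail this step.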

Model Evaluation Indexes
To evaluate the performance of the proposed model, the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are employed. The error measures are defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2}$$

where $N$ is the size of the training or test samples, and $\hat{y}_i$ and $y_i$ are the predicted value and the actual value, respectively. The MAE is the average of the absolute errors between the predicted values and the actual values, which reflects the actual size of the prediction error. The MAPE further considers the ratio between the error and the actual value. The RMSE represents the sample standard deviation of the differences between the predicted values and the actual observed values. The smaller the values of MAE, MAPE and RMSE, the better the forecasting performance.
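The three evaluation indexes can be computed with a short sketch such as the one below. The function names are illustrative, and MAPE is returned as a fraction rather than a percentage, which is an assumption made to match the magnitude of the MAPE values reported in the results. The sample numbers are made up for the example.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined if any actual value is 0."""
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


actual = [100.0, 200.0, 400.0]      # made-up values for illustration
predicted = [110.0, 190.0, 400.0]
scores = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

In this example the absolute errors are 10, 10 and 0, so the MAE is 20/3, the MAPE is (0.10 + 0.05 + 0)/3 = 0.05, and the RMSE is the square root of 200/3. RMSE penalizes large individual errors more heavily than MAE, which is why both are reported.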

Experiments and Results
The proposed model was applied to forecast the electrical load in a real-world case. In this section, the experiments are described in detail, and comparisons among LSTM, CNN and the proposed model are also presented.

Datasets Description
In the experiment, the electric load dataset for the Italy-North Area provided by the ENTSO-E Transparency Platform was used. The dataset used in this paper covers the period from 1 January 2015 to 31 December 2017. The sampling interval was one hour. The electrical dataset contains a total of 26,304 samples. In this study, the load data for the first two years were chosen as the training set, and the data for 2017 were used as the test set. An example of the test dataset is shown in Figure 5.
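The sample count quoted above can be checked from the date range alone. The sketch below counts hourly timestamps with the standard library, assuming one sample per clock hour with no gaps and no daylight-saving adjustments; the variable names are illustrative.

```python
from datetime import datetime


def hours_between(start, end):
    """Number of hourly samples in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


total_samples = hours_between(datetime(2015, 1, 1), datetime(2018, 1, 1))  # 26304
train_samples = hours_between(datetime(2015, 1, 1), datetime(2017, 1, 1))  # 17544
test_samples = total_samples - train_samples                               # 8760
```

The three years span 365 + 366 + 365 = 1096 days (2016 is a leap year), and 1096 × 24 = 26,304, which agrees with the figure in the text. Under the same assumption, the first two years give 17,544 training samples and 2017 gives 8760 test samples.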

The Detailed Experimental Setting
The load data of the past 21 × 24 h were selected as the input variable of the model, and the output was the load in the next 24 h. In the CNN module, the kernel sizes of the convolutional layers are 5, 3, and 3, and the numbers of filters are 16, 32, and 64. The feature maps are all activated by the Rectified Linear Unit (ReLU) function. The number of hidden neurons in the LSTM module was set to 100. The sigmoid function was chosen as the activation function of the fully-connected layer. The training process continued until the MSE value showed no improvement in 500 iterations or the maximal number of epochs was reached.
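The input and output lengths given above imply a sliding-window construction of the samples, which might look like the following sketch. The function name is hypothetical, and the stride of 24 h between consecutive windows is an assumption, since the paper does not state how far the window advances between samples.

```python
def make_windows(series, input_len=21 * 24, output_len=24, stride=24):
    """Pair each block of `input_len` past hours with the next `output_len` hours.

    Assumption: windows advance by `stride` hours; the paper does not specify this.
    """
    samples = []
    last_start = len(series) - input_len - output_len
    for start in range(0, last_start + 1, stride):
        past = series[start:start + input_len]
        future = series[start + input_len:start + input_len + output_len]
        samples.append((past, future))
    return samples


# Toy series of 600 hourly values: 504 input hours followed by 24 target hours.
toy_samples = make_windows(list(range(600)))
```

With the default settings each sample uses 504 hourly values as input and the following 24 as target, so a series needs at least 528 values to yield one sample.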

Experimental Results and Analysis
In this application, random forest (RF), decision tree (DT), DeepEnergy (DE) [28] and the proposed CNN-LSTM model were implemented and tested on the prediction of the next 24 h load. In addition, the CNN module and the LSTM module were extracted and tested independently to demonstrate the superiority of the proposed model. The result obtained by the proposed CNN-LSTM model is illustrated in Figure 6. To evaluate the performance and stability of the proposed model, the test dataset was divided into eight partitions. The detailed experimental results of each model are given in Tables 1-3. As shown in Tables 1-3, the averaged MAE, MAPE and RMSE of the decision tree are the largest among the six models. The deep neural networks perform much better than the decision tree and the random forest. The CNN module performs slightly better than the LSTM module, while both have larger errors than DeepEnergy. Although the independent CNN module and LSTM module perform slightly worse than DeepEnergy, the proposed model, which integrates these two modules, provides better results. The average MAE, MAPE and RMSE of the proposed CNN-LSTM model are the minimum among all models: 692.1446, 0.0396 and 1134.1791, respectively. In terms of these three indexes, the proposed model improves performance by at least 9% compared to DeepEnergy, 12% compared to the CNN module, and 14% compared to the LSTM module. Therefore, the results show that the proposed CNN-LSTM model can make more accurate forecasts by integrating the hidden features of the CNN module and the LSTM module. According to the average indexes, the proposed CNN-LSTM model achieves the best performance in STLF.
Meanwhile, it is also evident that the proposed model is stable. In the eight partitions of the test dataset, the results of the proposed model show its superiority over the other forecasting methods. For better visualization, the results of the six models in the eight partitions are also illustrated in Figures 7-9. As shown in Figures 7-9, the curves that denote the proposed CNN-LSTM model lie at or near the minimum in all partitions. Specifically, the MAE, MAPE and RMSE of the proposed model are the minimum in half of the eight test partitions, i.e., Test-2, Test-4, Test-7 and Test-8. In the other four partitions, the proposed model also provides accurate forecast results. The MAPE and RMSE of the proposed model are also the minimum in Test-1 and Test-3, and its MAE is higher only than that of DeepEnergy in Test-1 and that of random forest in Test-3. Although the performance of the proposed model is not the best in Test-5 and Test-6, it is still among the best three results. On the other hand, the performances of the independent CNN module and LSTM module are not stable. The MAPE of the LSTM module is the largest in Test-3, while it is good in Test-7 and Test-8. The CNN module performs well in Test-1 and Test-5, while it performs the worst in Test-7. The proposed model thus performs well in all eight partitions, which indicates that the proposed CNN-LSTM model can improve the stability of the load forecast.
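The per-partition comparison described above can be reproduced in outline with a sketch like the following. The paper does not state how the eight partitions were formed, so contiguous blocks of near-equal length are assumed here; the function names are hypothetical and MAE stands in for any of the three indexes.

```python
def split_partitions(values, n_parts=8):
    """Split a sequence into n_parts contiguous blocks of near-equal length."""
    size, remainder = divmod(len(values), n_parts)
    parts, start = [], 0
    for k in range(n_parts):
        end = start + size + (1 if k < remainder else 0)
        parts.append(values[start:end])
        start = end
    return parts


def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def partition_scores(y_true, y_pred, n_parts=8):
    """Evaluate MAE separately on each partition of the test set."""
    true_parts = split_partitions(y_true, n_parts)
    pred_parts = split_partitions(y_pred, n_parts)
    return [mae(t, p) for t, p in zip(true_parts, pred_parts)]
```

Comparing the spread of such per-partition scores across models, rather than only their averages, is what supports a stability claim: a model with a low mean but one very poor partition would be less dependable in operation.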

Discussion
Deep learning methods, such as CNN and LSTM, are widely used in many applications. In this study, CNN and LSTM provide more accurate results than random forest and decision tree. The LSTM model can learn useful information in the historical data over a long period by means of the memory cell, while useless information is discarded by the forget gate. According to the results, the LSTM module can make accurate load forecasts by exploiting the long-term dependencies. On the other hand, the CNN model can extract patterns of the local trend and capture the same pattern when it appears in different regions. The experiments also show that the CNN structure is effective in load forecasting. To further improve the accuracy and stability of the load forecast, a new deep neural network framework, which integrates the CNN module and the LSTM module, is proposed in this paper. In the experiments, the proposed CNN-LSTM model achieves the best performance among all models. Furthermore, the test dataset is divided into eight partitions to test the stability of the proposed model. The independent CNN module and LSTM module perform well in some partitions and poorly in others, while the proposed model performs well in all partitions. This demonstrates that the proposed model has better stability than the independent modules. The results indicate that integrating the hidden features of the CNN model and the LSTM model is effective in load forecasting and can improve the prediction stability. This paper gives a new research direction in time sequence forecasting based on the integration of LSTM and CNN. Future studies can attempt to further improve the accuracy of short-term electrical load forecasting by finding more effective ways to integrate the hidden features of LSTM and CNN.

Conclusions
This paper proposes a multi-step deep learning framework for STLF. The proposed model is based on the LSTM module, the CNN module and the feature-fusion module. The performance of the proposed model was validated by experiments on a real-world case, the Italy-North Area electrical load forecast. In addition, several partitions of the test dataset were used to verify the performance and stability of the proposed CNN-LSTM model. According to the results, the proposed model has the lowest values of MAE, MAPE and RMSE. The experiments demonstrate the superiority of the proposed model, which can effectively capture the hidden features extracted by the CNN module and the LSTM module. The results point to a new research direction for further improving the accuracy and stability of load forecasting by integrating the hidden features of LSTM and CNN.

Figure 1. A simple recurrent neural network structure.

Figure 5. An example of the load data in the test dataset.

Figure 6. The forecast result using the proposed CNN-LSTM model for the test data.

Figure 7. The comparison of the MAE in the six models.

Figure 8. The comparison of the MAPE in the six models.

Figure 9. The comparison of the RMSE in the six models.

Table 1. The experimental results in terms of Mean Absolute Error (MAE).

Table 2. The experimental results in terms of Mean Absolute Percentage Error (MAPE).

Table 3. The experimental results in terms of Root Mean Square Error (RMSE).