Multi-Step Short-Term Power Consumption Forecasting with a Hybrid Deep Learning Strategy

Abstract: Electric power consumption short-term forecasting for individual households is an important and challenging topic in the fields of AI-enhanced energy saving, smart grid planning, sustainable energy usage and electricity market bidding system design. Due to the variability of each household's personalized activity, traditional methods, such as auto-regressive moving average models, machine learning methods and non-deep neural networks, have difficulty providing accurate predictions of single household electric power consumption. Recent works show that the long short term memory (LSTM) neural network outperforms most of those traditional methods for power consumption forecasting problems. Nevertheless, two research gaps remain unsolved in the literature. First, the prediction accuracy has not yet reached the level required for real-world industrial applications. Second, most existing works address only the one-step forecasting problem, whose forecasting horizon is too short for practical usage. In this study, a hybrid deep learning neural network framework that combines a convolutional neural network (CNN) with LSTM is proposed to further improve the prediction accuracy. The original short-term forecasting strategy is extended to a multi-step forecasting strategy to provide more response time for electricity market bidding. Five real-world household power consumption datasets are studied, and the proposed hybrid deep learning neural network outperforms most of the existing approaches, including the auto-regressive integrated moving average (ARIMA) model, the persistence model, support vector regression (SVR) and LSTM alone. In addition, we present a k-step power consumption forecasting strategy to promote the proposed framework for real-world applications.


Introduction
Artificial intelligence (AI) enhanced short-term forecasting of electric power consumption is an important technique for smart grid planning, sustainable energy usage and electricity market bidding system design. Existing work shows that, without effective power consumption forecasting, 20% extra energy output is required to overcome a 5% peak increment in integrated residential electric power consumption [1]. The advanced metering infrastructure (AMI) makes it possible to learn the power consumption pattern of each residential house from its historical data. The resulting power consumption predictions provide important hints for both power suppliers and consumers to maintain a sustainable environment for energy saving, management and scheduling [2,3].

Two major challenges remain for single household power consumption forecasting:
1. High prediction accuracy. The volatility of single household power consumption is high due to irregular human behaviours. Moreover, the source data is usually univariate, consisting only of power consumption records in kilowatts (kW), which makes accurate power consumption forecasting even more difficult.
2. Multi-step forecasting. Most existing load forecasting works focus on one-step forecasting solutions. A longer forecasting horizon is required for real-world applications, such as dynamic electricity market bidding system design.
Traditional electric power forecasting methods overcome the uncertainty by aggregating the overall power consumption of a large group of households, or by clustering customers with similar patterns into sub-groups to reduce the irregularity. However, as the smart grid develops, accurate prediction of an individual household's electric power consumption is in high demand, since it may lead to a customized electricity price plan for that particular household. Moreover, univariate data forecasting remains one of the most challenging problems in the field of machine learning, since most of the influencing variables, such as electric current, voltage and weather conditions, are unknown [7]. Classic univariate forecasting methods are usually applied in cases where either the remaining features are too difficult to measure or there are too many variables to measure, e.g., stock market index forecasting problems [8]. Univariate forecasting methods are flexible because no extra information is required: the proposed approach can be plugged into the management system of any other household for power consumption forecasting, as long as the household's historical data is available in the system.
In recent years, deep learning neural networks (DLNNs) have become increasingly attractive and have been extensively employed in a large number of application fields, including natural language processing (NLP) [9], image object detection [10] and time series analysis [11]. For individual household power consumption forecasting problems, recent works reported that the long short term memory (LSTM) neural network provides very high prediction accuracy [2,12,13]. Experimental results show that the conventional LSTM neural network alone outperforms most of the traditional statistical and machine learning methods, including the auto-regressive integrated moving average (ARIMA) model [14], support vector machine (SVM) [15], non-deep artificial neural networks (ANNs) [16] and their combinations [17], because the memory gates of the recurrent neural network (RNN) introduce extra dependencies between neighboring time frame states. However, even recent works, such as [2,12,13], focus on a short-term forecasting strategy that forecasts the power load only one step ahead. For particular applications, such as electricity market bidding system design, a longer forecasting horizon is preferable.
Moreover, the LSTM neural network is a special form of RNN [18], and other types of DLNNs exist, such as convolutional neural networks (CNNs) [19] and deep belief nets (DBNs) [20]. The temporal CNN, which uses a special 1-D convolution operation, has also been reported to be potentially useful for time series prediction problems [21]. In the field of NLP, combining the temporal CNN with an RNN has been suggested as a way to obtain more precise classification results [22].

Related Works
Electric power consumption forecasting is useful in many application areas. Besides electricity market bidding, it can also be applied to demand side management for the transactive grid [3] and to power ramp rate control [23]. Conventional forecasting methods include support vector regression (SVR), ANNs, fuzzy logic methods [24] and time series analysis methods, such as autoregressive integrated moving average (ARIMA) [25], autoregressive methods with exogenous variables [26,27] and grey models (GMs) [28]. As early as 2007, Ediger and Akar [29] used ARIMA and seasonal ARIMA methods to forecast energy consumption by fuel in Turkey until the year 2020. Yuan et al. [14] compared forecasts of China's primary energy consumption obtained with ARIMA and GM (1,1). Both methods work well, and a hybrid method combining the two was also proposed, which achieved the best mean absolute percentage error (MAPE) value. Oğcu et al. [30] compared ANN and SVR models for forecasting the electricity consumption of Turkey. MAPE was used for performance measurement, and the SVR model performed 0.6% better than ANN. Rodrigues et al. [31] designed an ANN energy consumption model consisting of a single hidden layer with 20 neurons to forecast the energy consumption of 93 households in Portugal. Experimental results showed an average MAPE of 4.2% for daily energy consumption forecasting across the 93 households. Deb et al. [32] compared ANN and an adaptive neuro-fuzzy inference system for forecasting the energy consumption of three institutional buildings in Singapore, and showed high forecasting accuracy. Wang and Hu [33] proposed a hybrid forecasting method combining the ARIMA model, extreme learning machine (ELM), SVRs and a Gaussian process regression model for the short-term wind speed forecasting problem. All individual base forecasting models are integrated in a non-linear way, and the experimental results showed the forecasting accuracy and reliability of the proposed hybrid method.
Deep learning neural networks are popular modern machine learning techniques that deal with big data with high classification and prediction accuracy; they have been widely applied in many fields, such as stock index forecasting [34,35], wind speed prediction [36,37] and solar irradiance forecasting [38,39]. In recent years, with the fast development of smart grid technology, DLNNs have been widely employed to solve power consumption forecasting problems for both industrial and residential buildings; and because they have significantly more hidden layers and computations than classic ANNs, DLNNs are applied to more challenging problems, such as power consumption forecasting for individual households [21]. Ryu et al. [40] trained a DLNN with single household electricity consumption data in 2016 and showed that the DLNN produces better prediction accuracy than a shallow neural network (SNN), the double seasonal Holt-Winters (DSHW) model and ARIMA. Shi et al. [12] proposed a pooling-based deep recurrent neural network to capture the uncertainty of the single household load forecasting problem and applied it to smart meter data from 920 Irish customers. Experimental results show that their deep learning neural network outperforms most classic data-driven forecasting methods, including ARIMA, SVR and RNN. Kong et al. [13] directly applied a two-hidden-layer LSTM to single household power consumption forecasting problems and compared their results with a back-propagation neural network (BPNN), k-nearest neighbor regression (KNN) and ELM to show the large improvement in forecasting accuracy obtained by using LSTM.

Contributions
In this study, a hybrid deep learning neural network framework combining an LSTM neural network with a CNN is designed to deal with the single household power consumption forecasting problem. The conventional LSTM neural network is extended by adding a pre-processing phase using a CNN. The pre-processing phase extracts useful features from the original data and, more importantly, converts the univariate data into multi-dimensional features by 1-D convolution, which potentially enhances the prediction capability of the LSTM neural network. To evaluate the performance of the proposed framework, a series of experiments were performed on the electric power consumption data of five real-world households collected in the UK-DALE dataset [41]. The experimental results show that the proposed hybrid DLNN framework outperforms most of the existing approaches in the literature, including the auto-regressive integrated moving average (ARIMA) model, support vector regression (SVR) and LSTM alone, under three error metrics: root-mean-square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The scientific contributions of this work to the literature are as follows:
• A 1-D convolutional neural network is introduced to pre-process the univariate dataset and convert the original data into multi-dimensional features after two layers of temporal convolution operations.

• A hybrid deep neural network is designed to forecast power consumption for individual households. Experimental results show that the proposed framework outperforms most of the existing approaches, including ARIMA, SVR and LSTM.

• A k-step forecasting strategy is designed to produce k forecasting points/values simultaneously. The value of k is chosen to be less than or equal to the number of cores/threads to maintain efficiency. The actual forecasting period (response time) depends on the power consumption recording interval and the value of k. Compared with traditional one-step forecasting strategies, the k-step forecasting solution provides more response time for dynamic electricity market bidding.
Five individual households located in the UK are studied to show the effectiveness and robustness of the proposed hybrid DLNN structure. The study of the multi-step electric power consumption forecasting strategy can be useful for customizing smart grid planning and electricity market bidding system design.

Materials and Methods
Long short term memory (LSTM) and convolutional neural network (CNN) are two popular branches of deep learning neural networks, and they have attracted wide attention in recent years. In this study, to address the high volatility and uncertainty of the single household power consumption forecasting problem, we combine LSTM and CNN to form a hybrid deep learning approach that is able to provide more accurate and robust forecasting results than traditional approaches.
Using power consumption data from five real-world households, the proposed framework pre-processes the raw data with a CNN and uses the output of the CNN to train the LSTM model.

Data Description
The power consumption data collected from five households located in London, UK, were originally published by Kelly and Knottenbelt [41]. In the original dataset, smart meters are used to collect power consumption data from each individual electric power device, such as the television, air conditioner and fridge. We use only the aggregate power consumption data of the five households. The original sampling interval is 6 s; we merge the data into time series with 5 min intervals. Since the data lengths vary across households, we select a continuous time period consisting of 12,000 data samples for each household. For each household, 10,800 of the 12,000 data samples are used to train the proposed DLNN framework, and the remaining 1,200 data samples are retained for testing and verification.
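The resampling and split described above can be sketched in pandas as follows. This is a minimal sketch, not the authors' code: it assumes the 6 s aggregate readings are already loaded into a `pd.Series` with a `DatetimeIndex`, and the function name `resample_and_split`, the averaging of readings within each 5 min interval, the gap interpolation and the use of the first 12,000 intervals are our assumptions.

```python
import pandas as pd

def resample_and_split(power_6s, interval="5min", n_total=12_000, n_train=10_800):
    """Merge 6 s aggregate readings into fixed intervals and split them chronologically.

    Assumptions (not stated in the text): readings inside each interval are averaged,
    short gaps are filled by time interpolation, and the first n_total intervals are used.
    """
    resampled = power_6s.resample(interval).mean().interpolate(method="time")
    if len(resampled) < n_total:
        raise ValueError(f"only {len(resampled)} intervals available, need {n_total}")
    window = resampled.iloc[:n_total]                 # one continuous block of intervals
    train = window.iloc[:n_train]                     # first 10,800 samples for training
    test = window.iloc[n_train:]                      # last 1,200 samples for testing
    return train, test
```

Because 5 min / 6 s = 50, each 5 min value averages up to 50 raw readings; summing instead of averaging would give energy-like totals rather than mean power, so the aggregation rule should match whatever the downstream model expects.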

Long Short Term Memory based Recurrent Neural Network
The long short term memory (LSTM) model is a special form of the recurrent neural network (RNN), which provides feedback at each neuron. The output of an RNN depends not only on the current neuron input and weights but also on previous neuron inputs. Therefore, theoretically speaking, the RNN structure is well suited to processing time series data. However, when dealing with a long and correlated series of data samples, exploding and vanishing gradient problems appear [42], which later motivated the introduction of the LSTM model [43].
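Energies 2018, 11, 3089
To overcome the vanishing gradient problem of the RNN model, LSTM contains internal loops that retain useful information and discard irrelevant information. There are four important elements in the flowchart of the LSTM model: cell status, input gate, forget gate and output gate (Figure 1). The input, forget and output gates control the update, maintenance and deletion of the information contained in the cell status. The forward computation process can be denoted as:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (1)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (2)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (3)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (4)$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (5)$$
$$h_t = o_t \odot \tanh\left(C_t\right) \qquad (6)$$
where C_t, C_{t−1} and C̃_t represent the current cell status value, the previous time frame's cell status value and the update for the current cell status value, respectively. The notations f_t, i_t and o_t represent the forget gate, input gate and output gate, respectively; x_t is the input at time t, h_{t−1} is the previous output, σ is the sigmoid function, b_f, b_i, b_C and b_o are bias terms, and ⊙ denotes element-wise multiplication. With proper parameter settings, the output value h_t is calculated based on the C̃_t and C_{t−1} values according to Equations (4) and (6). All weights, including W_f, W_i, W_C and W_o, are updated based on the difference between the output value and the actual value following the back-propagation through time (BPTT) algorithm [44].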
Figure 1. The internal structure of the LSTM model.
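To make the LSTM forward computation concrete, the NumPy sketch below computes one forward step gate by gate. The function names `lstm_step` and `run_lstm`, the dictionary layout of the weights `W` and biases `b`, and the concatenated input `[h_{t-1}, x_t]` follow the common textbook formulation and are assumptions for illustration, not details taken from the framework's implementation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step: forget, input and output gates plus cell update.

    W maps 'f', 'i', 'C', 'o' to matrices of shape (hidden, hidden + n_inputs);
    b maps the same keys to bias vectors of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])    # candidate cell update
    c_t = f_t * c_prev + i_t * c_tilde        # new cell status
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # output value
    return h_t, c_t

def run_lstm(sequence, W, b, hidden):
    """Feed a univariate sequence through the cell one time step at a time."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for value in sequence:
        h, c = lstm_step(np.array([value], dtype=float), h, c, W, b)
    return h
```

With all weights and biases at zero every gate outputs 0.5, so the cell keeps half of its previous status, which is a quick sanity check of the forget-gate path.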

Temporal Convolutional Neural Network
The convolutional neural network (CNN) is probably the most commonly used deep learning neural network; it is currently applied mainly to image recognition/classification in the field of computer vision. Given a large quantity of raw input samples, a CNN is usually capable of extracting useful subsets of the input data efficiently. Generally speaking, a CNN is still a feed-forward neural network, extended from the multi-layer neural network (MLNN). The main difference between a CNN and a traditional MLNN is that a CNN has the properties of sparse interaction and parameter sharing [45].
A traditional MLNN uses a fully connected strategy between the input layer and the output layer, which means that each output neuron can interact with every input neuron. Suppose that there are m inputs and n outputs; the weight matrix then has m × n entries. A CNN reduces the weight matrix size from m × n to k × n by setting up a convolutional kernel of size k × k, so that each output interacts only with a local region of the input (sparse interaction). Moreover, the convolutional kernel is shared by all inputs (parameter sharing), which means that only one weight matrix of size k × n needs to be learned during training. These two properties increase the training efficiency of parameter optimization; under the same computational complexity, a CNN can be trained with more hidden layers, in other words, as a deeper neural network.
The temporal convolutional neural network introduces a special 1-D convolution, which is suitable for processing univariate time series data. Instead of a k × k convolutional kernel as in the traditional CNN, the temporal CNN uses a kernel of size k × 1. Suppose that the input data is a discrete function g(x) ∈ [1, l] → ℝ and the convolutional kernel function is f(x) ∈ [1, k] → ℝ. The 1-D convolution h(y) ∈ [1, ⌊(l − k)/d⌋ + 1] → ℝ between the input and the kernel with step size (stride) d can be written as:
$$h(y) = \sum_{x=1}^{k} f(x) \cdot g(y \cdot d - x + c)$$
where c = k − d + 1 is an offset constant. After the temporal convolutional operation with m kernels, the original univariate dataset can be expanded into an m-dimensional feature dataset. In this way, the temporal CNN applies 1-D convolution to time series data and expands the univariate dataset into multi-dimensional extracted features (first phase in Figure 2), and the expanded features are found to be more suitable for prediction using LSTM.
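The strided 1-D convolution h(y) defined above can be checked numerically. In this hypothetical NumPy sketch, `temporal_conv1d` evaluates the formula with its 1-based indices, and `expand_features` (also a name of our choosing) stacks m kernels to turn a univariate series into m feature channels, as described for the first phase of Figure 2. Kernel values are arbitrary placeholders here; in the framework they are learned.

```python
import numpy as np

def temporal_conv1d(g, f, d=1):
    """Strided 1-D convolution h(y) = sum_{x=1..k} f(x) * g(y*d - x + c), with c = k - d + 1."""
    g = np.asarray(g, dtype=float)
    f = np.asarray(f, dtype=float)
    l, k = len(g), len(f)
    if k > l:
        raise ValueError("kernel must not be longer than the input")
    c = k - d + 1                      # offset constant from the formula
    n_out = (l - k) // d + 1           # floor((l - k) / d) + 1 output points
    h = np.zeros(n_out)
    for y in range(1, n_out + 1):      # 1-based loop indices, as in the formula
        for x in range(1, k + 1):
            h[y - 1] += f[x - 1] * g[y * d - x + c - 1]   # shift to 0-based arrays
    return h

def expand_features(series, kernels, d=1):
    """Apply m equal-length kernels to a univariate series; returns shape (n_out, m)."""
    return np.stack([temporal_conv1d(series, f, d) for f in kernels], axis=1)
```

The formula flips the kernel (a true convolution), so it matches `np.convolve(g, f, mode="valid")[::d]`; deep learning libraries such as Keras' `Conv1D` compute a cross-correlation instead, which is equivalent up to reversing the learned kernel.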

CNN-LSTM Forecasting Framework
To address the two challenges (volatility and univariate data) mentioned in Section 1, a hybrid deep neural network (DNN) combining CNN with LSTM is proposed. The structure of the hybrid DNN framework is depicted in Figure 2. In the pre-processing phase, the CNN extracts important information from the input data and, most importantly, re-organizes the univariate input data into multi-dimensional batches using convolution (Figure 2). In the second phase, the re-organized batches are fed into LSTM units to perform forecasting.
As shown in Figure 2, a temporal CNN with two hidden layers is used to pre-process the input dataset. Note that a traditional temporal CNN usually includes pooling operations to prevent over-fitting when the number of hidden layers is greater than five. In this study, we omit the pooling operation to retain as many of the extracted features as possible.
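Before the first convolution, the univariate series has to be cut into input windows, each paired with the value that follows it. The sketch below is an illustration under stated assumptions: the window length, kernel size and the helper names `make_windows` and `conv_output_length` are hypothetical, since the text does not state them. It also shows how many time steps survive two unpadded convolution layers when pooling is omitted.

```python
import numpy as np

def make_windows(series, window):
    """Cut a univariate series into (samples, window, 1) inputs and next-step targets."""
    s = np.asarray(series, dtype=float)
    n = len(s) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([s[i:i + window] for i in range(n)])  # each row: `window` past values
    y = s[window:]                                      # the value right after each row
    return X[:, :, np.newaxis], y                       # trailing axis = 1 input channel

def conv_output_length(length, kernel, layers=2, stride=1):
    """Time steps left after `layers` unpadded 1-D convolutions with no pooling in between."""
    for _ in range(layers):
        length = (length - kernel) // stride + 1
    return length
```

For example, a 12-step window passed through two layers with kernel size 3 and stride 1 keeps 8 time steps per channel, all of which are handed to the LSTM because no pooling layer shrinks them further.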
After pre-processing the input data, an LSTM neural network is designed to train and forecast the power consumption for individual households. The training process of the LSTM structure is shown in Figure 3, where the features extracted in the first phase are used as inputs to train the LSTM model. A dropout layer is added to the LSTM neural network to prevent overfitting. The loss value, which measures the difference between the predicted output y_p and the expected output y_e, is used to optimize the weights of all LSTM units. The optimization uses RMSprop, a gradient descent optimization algorithm commonly used for the weight optimization of deep neural networks [46].
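For readers unfamiliar with RMSprop, the sketch below implements one common form of its update rule and uses it to minimise a mean-squared loss between a predicted output y_p and an expected output y_e. It is an illustration only: a toy linear model stands in for the LSTM, and the names `rmsprop_step` and `fit_linear_rmsprop` and the hyper-parameters (learning rate 0.01, decay rate 0.9, epsilon 1e-7) are assumptions rather than the settings used in the experiments.

```python
import numpy as np

def rmsprop_step(theta, grad, sq_avg, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: scale the gradient by a running RMS of past gradients."""
    sq_avg = rho * sq_avg + (1.0 - rho) * grad ** 2         # moving average of squared grads
    theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)     # per-parameter normalised step
    return theta, sq_avg

def fit_linear_rmsprop(x, y_e, steps=3000, lr=0.01):
    """Fit y_p = w * x + b to y_e by minimising the mean squared error with RMSprop."""
    params = np.zeros(2)                 # [w, b]
    sq_avg = np.zeros(2)
    for _ in range(steps):
        y_p = params[0] * x + params[1]
        err = y_p - y_e                  # difference between predicted and expected output
        grad = np.array([2 * np.mean(err * x), 2 * np.mean(err)])   # d(MSE)/d[w, b]
        params, sq_avg = rmsprop_step(params, grad, sq_avg, lr=lr)
    return params
```

Dividing each gradient component by a running root-mean-square of its recent values keeps step sizes roughly uniform across weights, which is one reason RMSprop is a frequent choice for recurrent networks.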

A k-Step Power Consumption Forecasting Strategy
Traditional power consumption forecasting approaches focus on one-step forecasting solutions [2,12,13]. For a very short step size, such as 5 min, the response time can be too short for manual or automated electricity market bidding. In this study, we design a k-step power consumption forecasting strategy, which predicts k future data points simultaneously. We assume that the historical data is long enough to perform the data re-organization step.
Recall that the original power consumption data collected by UK-DALE has a step size of 6 s. The original data can be re-organized into different datasets with step sizes of n min, 2n min, ..., kn min. In this study, we focus on n = 5. For each dataset, a core or a thread can be assigned to perform CNN-LSTM power consumption forecasting. Combining the results computed on the k cores yields a k-step power consumption forecasting solution, i.e., forecasts of the power consumption at 5 min, 10 min, ..., up to 5k min in the future. The detailed procedure of the proposed k-step power consumption forecasting strategy is shown in Algorithm 1.
Algorithm 1 A k-step power consumption forecasting strategy
Input: The UK-DALE dataset.
Output: Data points at 5 min, 10 min, ..., 5k min.
Initialization: re-organize the original data into k different datasets according to the specified step sizes.
while there are unassigned datasets and there are free threads/cores do
    Assign any unassigned dataset to a free thread/core.
    Apply the proposed CNN-LSTM framework to that dataset and obtain a one-step forecasting result.
end while
Combine all one-step forecasting results to obtain a k-step power consumption forecasting result.
With concurrent programming, we claim that the efficiency of the proposed k-step forecasting algorithm is comparable to that of traditional one-step forecasting algorithms, provided that the value of k is less than or equal to the number of threads/cores.
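Algorithm 1 can be sketched with Python's `concurrent.futures`. In this minimal sketch, the 5 min series is sub-sampled (every j-th value, aligned on the latest sample) to obtain the dataset with step 5j min; this sub-sampling rule, the helper names and the persistence stand-in `persistence_one_step` are assumptions, and a trained CNN-LSTM one-step forecaster would be passed as `one_step` in practice.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def reorganize(series_5min, k):
    """Datasets with steps 5, 10, ..., 5k min, each ending at the latest sample (assumed rule)."""
    s = np.asarray(series_5min, dtype=float)
    return [s[::-1][::j][::-1] for j in range(1, k + 1)]

def persistence_one_step(history):
    """Hypothetical stand-in for the CNN-LSTM one-step forecaster: repeat the last value."""
    return float(history[-1])

def k_step_forecast(series_5min, k, one_step=persistence_one_step, max_workers=None):
    """Run one one-step forecast per re-organized dataset concurrently and combine them.

    Element j-1 of the result is the forecast 5*j minutes ahead.
    """
    datasets = reorganize(series_5min, k)
    # One worker per dataset; the text keeps k <= number of cores/threads.
    with ThreadPoolExecutor(max_workers=max_workers or k) as pool:
        return list(pool.map(one_step, datasets))   # map preserves dataset order
```

Threads are enough when the forecaster's heavy work runs outside the Python interpreter; for CPU-bound pure-Python forecasters, `concurrent.futures.ProcessPoolExecutor` is the usual choice because of the global interpreter lock.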

Results
The proposed hybrid DNN framework is implemented in Python 3.5.2 (64-bit) with PyCharm Community Edition 2016.3.2. The hardware configuration includes an Intel Core i7-7700 CPU @ 2.80 GHz, 8 GB RAM and an NVIDIA GeForce GTX 1050 graphics card. The proposed hybrid DNN framework is built on the open source deep learning library TensorFlow, developed by Google [47], with Keras [48] version 2.0.8 as the front-end interface.
The prediction results of the proposed CNN-LSTM are compared with existing methods, including the ARIMA model, SVR and LSTM. The prediction performance is evaluated using error metrics [49]. Three error metrics are calculated: root-mean-square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). Generally speaking, smaller values of the error metrics indicate higher prediction accuracy. The formulations of the three metrics are listed in Equations (7)-(9):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (7)$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad (8)$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (9)$$
where y_i is an actual testing sample value, ŷ_i is the prediction of y_i, and N is the total number of testing samples.
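The three error metrics translate directly into NumPy; the function names below are our own and the sketch is illustrative. The MAPE guard reflects the fact that the percentage error is undefined when an actual value y_i is zero.

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))
```

Because MAPE divides by each actual value, errors during low-consumption periods weigh more heavily than in RMSE or MAE, which is why the three metrics can rank methods differently.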
The error metric values of all five compared methods on the data of all five households described in Section 2.1 are listed in Table 1. The averaged computational time per prediction point is recorded in Table 2 for all compared methods except the persistence model, since the persistence model simply takes the data of the previous time stamp as the prediction result [50]. On average, and in most of the cases in Table 1, the proposed CNN-LSTM framework outperforms all other compared forecasting methods with a reasonable computational time (around 0.06 s per prediction). Compared with SVR, the proposed framework has slightly higher MAE and RMSE values for households 2 and 4. According to the data description of the UK-DALE project, the power consumption curves of households 2 and 4 are less volatile, while those of households 1, 3 and 5 are relatively more active. The prediction results suggest that deep learning methods are more suitable for volatile data. Moreover, for MAPE, which measures the relative errors of the prediction results, the proposed CNN-LSTM framework shows lower error rates than all other methods for all five households.
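The persistence baseline mentioned above needs no training, which is why no computational time is reported for it. A minimal sketch, with the function name `persistence_forecast` chosen for illustration: each test point is predicted by the actual value one step earlier, and the first test point uses the last training value.

```python
import numpy as np

def persistence_forecast(test_series, last_train_value):
    """Predict each test point with the actual value observed one time step earlier."""
    y = np.asarray(test_series, dtype=float)
    # Shift the test series right by one step and seed it with the last training value.
    return np.concatenate(([float(last_train_value)], y[:-1]))
```

Its predictions can be scored with the same RMSE, MAE and MAPE formulas as the learned models.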
Figures 4-6 show the detailed prediction results for households 1, 3 and 5. The actual power consumption curves are shown in black, and the CNN-LSTM prediction results are shown in red. In general, Figures 4-6 show that the proposed CNN-LSTM method yields lower prediction errors, and consequently higher prediction accuracy, than the ARIMA model, SVR and LSTM for the UK-DALE households shown, which suggests that the proposed method is more robust than the other methods for short-term power consumption forecasting.
In addition, we show the k-step power consumption forecasting results for k up to 6. Table 3 shows the RMSE and MAPE values for each house as the value of k increases from 2 to 6. In Figure 7, twenty groups of 6-step power consumption forecasting results are depicted, with the training data omitted. It can be observed that the k-step forecasting algorithm produces more steps of forecasting results with acceptable accuracy compared to traditional one-step forecasting approaches. Based on Algorithm 1, the k-step forecasting method repeatedly runs the proposed CNN-LSTM framework. The average error and the average running time of the k-step algorithm are expected to be very close to those of the original one-step CNN-LSTM framework, given that the value of k is less than or equal to the number of cores/threads. With a power consumption recording interval of only 5 min and k = 6, the proposed method provides a 30 min response time for dynamic electricity market bidding, which can be potentially useful in real-world applications [6]. The 30 min forecasting period is the response time considered in this experimental section. Nevertheless, the 30 min response time can be extended in two ways:

• First, the 5 × 6 = 30 min horizon can be extended with a larger k value. To keep the computation in real time, we force the value of k to be less than or equal to the number of cores/threads, so the response time can be extended with a more powerful CPU that has more cores/threads.

• Second, the 5 × 6 = 30 min horizon can also be extended by using a coarser time interval, e.g., a 15 min resolution instead of 5 min. For k = 6, the proposed k-step forecasting algorithm then provides a one-and-a-half-hour (15 × 6 = 90 min) response time for market bidding.
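The response-time arithmetic in the two bullets above amounts to multiplying the recording interval by each step index. A small sketch with hypothetical helper names: `forecast_horizons` lists the horizons covered by one run, and `max_realtime_k` reads the core/thread count that bounds k under the real-time rule (`os.cpu_count()` can return `None`, so it falls back to 1).

```python
import os

def forecast_horizons(interval_min, k):
    """Minutes ahead covered by one run of the k-step strategy."""
    return [interval_min * j for j in range(1, k + 1)]

def max_realtime_k():
    """Largest k allowed by the 'k <= number of cores/threads' rule."""
    return os.cpu_count() or 1
```

For instance, `forecast_horizons(5, 6)` ends at 30 min, while `forecast_horizons(15, 6)` ends at 90 min, matching the one-and-a-half-hour response time.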
The project page and source code of the proposed CNN-LSTM framework are freely available online at: http://www.keddiyan.com/files/PowerForecast.html.

Conclusions and Future Work
This study proposed a novel hybrid deep learning neural network framework combining a convolutional neural network (CNN) and a long short term memory (LSTM) neural network to deal with univariate and volatile residential power consumption forecasting. Recent works have already shown that high prediction accuracy for power consumption forecasting can be achieved with an LSTM neural network alone [2,12,13]. We further demonstrate that the hybrid framework proposed in this study outperforms the conventional LSTM neural network. The CNN extracts the most useful information from the original raw data and converts the univariate single household power consumption dataset into multi-dimensional data, which potentially facilitates the prediction performance of LSTM.
Figure 8 shows the prediction accuracy improvement from the conventional LSTM to CNN-LSTM using MAPE as the measurement metric. The results were obtained on the power consumption data of five real-world households collected by the UK-DALE project. For the five tested households, the MAPE values of the proposed CNN-LSTM framework are 13.1%, 48.8%, 2.4%, 33.2% and 14.5% lower than those of LSTM, respectively, which demonstrates the usefulness of the proposed method for maintaining a sustainable balance between energy consumption and savings.

Figure 2. The proposed hybrid DNN power consumption forecasting framework.

Figure 3. The training process of the LSTM model.

Figure 4. The prediction results for household 1 power consumption data using various methods. The dark red box shows a zoom-in region of the prediction results.

Figure 7. Twenty groups of k-step (k = 6) power consumption forecasting results for different household datasets, showing the performance and robustness of the proposed CNN-LSTM framework.

Figure 8. Experimental result comparison between LSTM and CNN-LSTM using MAPE as an error metric.

Table 2. Averaged computational time (in seconds) taken by CNN-LSTM, LSTM, SVR and ARIMA models for each predicted data point.