A Hybrid Neural Network Model for Power Demand Forecasting

Abstract: Power demand forecasting for the effective planning and operation of smart grids, renewable energy and electricity market bidding systems is an open challenge. Numerous statistical and artificial neural network approaches have been proposed to improve prediction performance in practical environments. Despite these efforts, power demand forecasting remains a grand challenge, since existing methods are not accurate enough to be widely deployed. To address this problem, we propose a hybrid power demand forecasting model, called (c, l)-Long Short-Term Memory (LSTM) + Convolutional Neural Network (CNN). We treat the power demand as a key value and incorporate c different types of contextual information, such as temperature, humidity and season, as context values in order to preprocess datasets into bivariate sequences consisting of <Key, Context[1, c]> pairs. These c bivariate sequences are then input into c LSTM networks with l layers to extract feature sets. Using these feature sets, a CNN layer outputs a predicted profile of power demand. To assess the applicability of the proposed hybrid method, we conduct extensive experiments using real-world datasets. The results indicate that the proposed (c, l)-LSTM+CNN hybrid model achieves higher accuracy than previous approaches.


Introduction
Power demand forecasting is an important and challenging topic for the fields of smart grids, renewable energy and the electricity market bidding system. Power demand forecasting for the prevention of blackouts is becoming crucial globally as power consumption in businesses and homes rapidly increases. Optimum operation of the power system based upon accurate power demand forecasting is essential. Traditionally, governments and utilities have adopted a variety of methods to secure more power plants, including nuclear, hydropower, thermal power and renewable energy sources, to balance the demand and supply of electricity. Governments have also recently put considerable effort into increasing the efficiency of power systems by profiling the demand for power consumption and limiting the maximum demand during peak hours. As a result, forecasting the profile of power demand is growing in importance. Power demand forecasting makes it possible to estimate the required generation amount (power generation capability and reserve) in advance, and to control the demand effectively (peak clipping and shifting).
We can classify power demand forecasting problems into three categories: short-term, medium-term, and long-term forecasting. Short-term forecasting is used to predict power demands over time periods of minutes, hours, days or weeks. Medium-term forecasting extends from months to one or two years, and long-term forecasting deals with predictions many years ahead.
Short-term power demand forecasting has been an active area of research. There are two main approaches: statistical methods and artificial neural networks. Statistical approaches include the Autoregressive Integrated Moving Average (ARIMA) [1], double seasonal Holt-Winters exponential smoothing [2] and PCA-based Linear Regression [3]. Recently, artificial neural network-based approaches have received considerable attention in power demand forecasting. In an artificial neural network model, the model architecture varies depending upon both the period to be predicted and the data required to make the prediction. Broadly speaking, there are two types of input sequences: univariate and multivariate. Models using univariate datasets tend to be simple, small in size and quick to train, but they have low accuracy, while models based on multivariate datasets are slower and more computationally intensive in practice.
To address these problems, we first preprocess a given dataset into multiple bivariate sequences to effectively learn the features that can be extracted from individual context information. Then, we exploit a novel hybrid network model to accurately predict an n-day profile of power demand. Specifically, the proposed hybrid network consists of multi-LSTM layers and a CNN layer. In the multi-LSTM layers, each LSTM network extracts features from its own input, a bivariate (power and contextual information) sequence, and feeds these feature sets to a CNN layer to obtain an n-day profile. The proposed hybrid model is aimed at general forecasting problems at all short-term levels of temporal granularity (minutes, hours, days, etc.). The rationale of the proposed hybrid model design is to combine the efficiency of multi-LSTM in extracting features from various context information with the ensemble potential of CNN by introducing a bivariate-based context learning approach.
The rest of the paper is organized as follows: in Section 2, we introduce related research on power demand forecasting using artificial neural networks and hybrid models. Section 3 first describes the pre-processing of the datasets and then presents our hybrid network model. Section 4 describes the experimental methods and results. Lastly, Section 5 concludes the paper.

Power Demand Forecasting Using Deep Learning
Most of the techniques used to predict power demand have included Recurrent Neural Network (RNN)-based LSTMs, which have been used on time series data and in natural language processing [4,5]. In addition, CNNs have produced high classification and recognition performance in the fields of computer vision and pattern recognition [6][7][8] and have also been demonstrated to be effective in various fields involving time series data such as language data, human behavior pattern data, energy load data, etc. [9][10][11].
An ensemble deep learning method using several deep learning networks was described in [9]. In that work, based on the observation that the output value changes when the number of epochs is changed, output values were obtained by using a different number of epochs for each Deep Belief Network (DBN) over several DBNs. The authors then constructed an ensemble deep learning network using the output from a Support Vector Regression (SVR) as input and showed 4% and 15% better performance in predicting power demand than was obtained using SVR and DBN, respectively. In [10,11], time series data were processed using a multi-channel Deep CNN model which learns features from an individual univariate time series in each channel, and combines information from all channels to produce a feature representation at the final layer. This method was also applied to human behavior pattern data and ECG (electrocardiogram) data.
Most of the studies using artificial neural networks for power demand forecasting have used data from residential, commercial or office buildings. Experiments on solar powered buildings have also been conducted [12]. These experiments used power demand data from business days, non-business days, and seasonal data. The model was optimized by adjusting the numbers of features and neurons. A study using two types of artificial neural network models was described in [13]. One model used a pre-trained Restricted Boltzmann Machine (RBM); the other used a Rectified Linear Unit (ReLU) without pre-training. These models obtained better results in predicting the next 24 h than ARIMA or a Shallow Neural Network (SNN). For small power systems with non-linear and noncritical characteristics, [14] used an LSTM model to predict power demand. The amount of power used in residential areas was divided into smaller groups, down to individual households. A study using the LSTM model to forecast the power demand for each household was conducted [15]. The authors forecasted the amount of power needed in the future based upon the current amount of power produced in a solar power plant. The LSTM, DBN, and Auto-LSTM were used in the experiment, and the Auto-LSTM had the best performance. Reference [16] proposed the Augmented LSTM (ALSTM) network method, which enhances the Auto-LSTM network method used in [17] by combining the AutoEncoder and LSTM. A study reported in [18] (Figure 1) forecast power demand 60 h ahead by constructing an encoder and decoder using a Sequence to Sequence (S2S) structure based on an LSTM. This study mapped the date of the next day, not the current date, to power values. When power values and dates of the same day are used as inputs, the predicted values of the same day simply follow the pattern of the previous power values.
In order to improve the accuracy of the estimate of power demand for individual households, [19] proposed the Pooling-based Deep-RNN (PDRNN). This study used the power data from the target household as well as those from neighboring power areas. The root mean square error (RMSE) of PDRNN was lower by 19.5%, 13.1% and 6.5% than that of ARIMA, SVR and classical deep RNNs, respectively.
To predict the power demand for individual buildings, the network in [20] was constructed using only a CNN, and was evaluated only by varying its parameters. The data model used in [20] differs from existing methods in that only the power data is input to the first CNN layer, and the final fully connected layer incorporates information such as date and temperature to predict power demand. To evaluate its performance, the proposed network was compared with a Support Vector Machine (SVM) and an RBM [20]. Experimental results showed better performance than previous methods, but the network model in [20] was not better than the method described in [18]. The use of CNN-based bagging techniques for smart grid load forecasting was reported in [21]. In reference [22], the USA District public consumption dataset and the load dataset for 2016 provided by the Electric Reliability Council of Texas were processed using multiple CNNs to forecast power demand.
In the area of Natural Language Processing (NLP), RNNs, which are excellent for time series data processing, are primarily used. In order to improve an RNN's performance, it is necessary to carefully select useful contextual information. Reference [23] conducted a study to predict where users would move next by selecting time and space as the contextual information, in order to achieve better results than traditional RNN models. Similarly, reference [24] introduced an RNN which is dependent on contextual information. In this study, when using input words to predict the next word, a feature layer with context information about the sentence topic was added.

Approaches Based on a Hybrid Network Model
One of the hybrid network structures for power demand forecasting is the CLDNN (a unified architecture of CNN, LSTM, and DNN) structure proposed in [25]. In this model, as shown in Figure 2, LSTM layers were stacked on top of a CNN to create a hybrid network. This model was proposed for natural language processing, and the results on the power demand prediction problem showed limited prediction accuracy.
In reference [26], CNNs and LSTMs were used together to construct a two-phase framework to estimate power demand (Figure 3). The first function of the CNN layer is to extract the features of the power data, and the second function is to transform the one-dimensional power data into a multidimensional dataset by using the output of the CNN as input to the second phase, the LSTM. The results are output through the dropout [27] layer. In reference [26], multi-step forecasting was performed, unlike traditional power demand forecasting methods based on one-step forecasting.
Hybrid network studies for predicting power demand were also reported in [28,29]. In reference [28], the authors transformed a dataset into 2D images and used those images as inputs to a CNN-RNN model. The accuracy of the CNN-RNN was 10% and 26% higher than that of an LSTM and an ANN [30], respectively. Another study with a CNN-LSTM based hybrid framework was proposed in [29]. In this study, CNN and LSTM were arranged horizontally and the characteristics of the input data were extracted separately. After feature extraction by the CNN and the LSTM, the outputs of the two networks were concatenated in the merge layer of a feature-fusion layer.
In this paper, we propose the (c, l)-LSTM+CNN hybrid prediction model. As discussed in Section 3.2, we place multi-LSTM networks at the front to extract feature sets. Then, we create an ensemble by adding a CNN layer after the LSTMs in order to produce the final output.
In power demand forecasting, there are two general types of sequences: univariate and multivariate. Models based on univariate datasets use only the power demand values. The model size is therefore relatively small, and training is fast. However, these models have the disadvantage that the error rate tends to be high because context information is excluded. Models based on multivariate datasets include context information in addition to the power demand values. These models generally show better performance because of the addition of the context information. The model in [18] used this type of data, and showed better performance than other models.

Data Processing and Deep Learning Models
Power demand forecasting can be influenced by many factors that can change patterns of load consumption. In this paper, we consider the power value as a Key value, and the other context domains as c Context information. We vertically divide a given dataset into sets of <Key, Context[1, c]> pairs such as <power, temperature> or <power, day>. We create five bivariate sequences using five context domains: temperature, humidity, holiday status, day of the week, and season. Table 1 summarizes the data notation used in this paper. This pairing scheme enables the proposed model to scale with increasing numbers of context domains by simply adding additional <Key, Context> pairs.
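The vertical division into <Key, Context> pairs can be illustrated with a minimal sketch. The function name `to_bivariate_sequences`, the dictionary layout, and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the dataset or from the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def to_bivariate_sequences(
    power: Sequence[float],
    contexts: Dict[str, Sequence[float]],
) -> Dict[str, List[Tuple[float, float]]]:
    """Pair the key series (power) with each context series separately.

    Returns one bivariate sequence of (key, context) tuples per context domain.
    """
    sequences = {}
    for name, values in contexts.items():
        if len(values) != len(power):
            raise ValueError(f"context '{name}' does not match the key length")
        sequences[name] = list(zip(power, values))
    return sequences


# Hypothetical three-day excerpt with two of the five context domains.
power = [61.2, 63.5, 58.9]
contexts = {
    "temperature": [1.5, 2.0, -0.3],
    "day_of_week": [0, 1, 2],
}
pairs = to_bivariate_sequences(power, contexts)
```

Adding a further context domain only adds one more entry to `contexts`, which mirrors the scalability argument made for the pairing scheme.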

Overlapped Window and Dataset
In order to predict the power demand for seven future days, we use the previous 14 days' worth of information as training data. Specifically, we use a 14-day overlapped window with a one-day stride, as shown in Figure 4. The red box in Figure 4 is the first 14-day window used for training, and the blue box is the second 14-day window, to be learnt after sliding the window by one day.
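The overlapped window can be sketched as follows. This is a minimal illustration under the assumption that each 14-day input window is paired with the seven days that immediately follow it as the target; the function name `make_windows` and the toy series are hypothetical.

```python
def make_windows(series, input_days=14, target_days=7, stride=1):
    """Slide an overlapped window over a daily series.

    Each sample pairs `input_days` consecutive values with the
    `target_days` values that directly follow them.
    """
    samples = []
    last_start = len(series) - input_days - target_days
    for start in range(0, last_start + 1, stride):
        split = start + input_days
        samples.append((series[start:split], series[split:split + target_days]))
    return samples


# Toy series of 30 "days"; the values are just day indices.
demand = list(range(30))
samples = make_windows(demand)
first_inputs, first_targets = samples[0]
```

With a one-day stride, consecutive samples share 13 of their 14 input days, which is what makes the windows overlapped.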
In this paper, we use Korea's daily power demand dataset [31] provided by the Korea Power Exchange. This dataset is a set of daily power demand values for each day from 1 January 2006 to 20 May 2017. Specifically, it consists of 4158 power demand values. As mentioned in Section 3.1.1, these power demand values are paired with five context domains: temperature, humidity, holiday status, day of the week, and season, resulting in five bivariate sequences in the form of <Key, Context[1, c]>.
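The stated dataset size follows directly from the date range, as the short check below shows. It only counts calendar days, assuming one value per day with both end dates included; the variable names are illustrative.

```python
from datetime import date

first_day = date(2006, 1, 1)
last_day = date(2017, 5, 20)

# Inclusive day count: the difference in days plus the first day itself.
n_days = (last_day - first_day).days + 1
print(n_days)
```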
To further investigate the performance behavior, we classify the dataset into six categories. The reason for dividing the dataset into the categories shown in Table 2 is that the power demand during the weekend differs from that during the weekdays. As shown in Figure 5a,b, power demand varies by season, and overall it is apparent that weekend power demand is lower than weekday demand. Therefore, when constructing our training datasets, we first divide the dataset by the presence or absence of holidays. Then, as shown in Table 2, we further classify each group into all days, by season, and by day of the week.

(c, l)-LSTM+CNN Hybrid Forecasting Model
Several studies have shown that LSTMs perform well when learning time series data [14][15][16][17][18]. In reference [32], three different network types (CNN, LSTM, and DNN) were used to improve speech recognition performance. After training the three networks separately, three outputs were generated, and a combination layer was added. In reference [25], it was suggested that combining the three separate networks results in better performance than that of any of the networks individually. With these observations in mind, we propose a hybrid deep learning neural network framework combining LSTM neural networks with a CNN to deal with the power demand forecasting problem. This hybrid network consists of c LSTM networks with l layers, followed by a CNN. Unlike the data preprocessing method used in the previous studies, the proposed hybrid model extracts the features of a dataset using LSTM neural networks in the front stage. As mentioned in the previous section, the dataset is preprocessed into bivariate sequences in the form of <Key, Context[1, c]> pairs. Each bivariate sequence is used as the input to an LSTM network with l layers. The LSTM network is composed of l layers, as shown in Figure 6, and acts as a node in the entire network structure, as shown in Figure 7.
We use 20 units in each LSTM and train each with data from the past 14 days. The feature set extracted from each LSTM network is a 14 × 20 matrix. We then integrate these feature sets to produce the input to the CNN. Specifically, we combine the feature sets of the c (=5) LSTM networks into one 14 × 20 matrix using element-wise multiplication. This matrix is passed to the input of the CNN layer to obtain the power demand forecast for the next seven days.
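The fusion step, in which c = 5 feature sets of shape 14 × 20 are combined by element-wise multiplication, can be sketched in plain Python. The helper names `hadamard` and `fuse_feature_sets` are hypothetical, and the constant-valued matrices stand in for real LSTM outputs, which are not available here.

```python
from functools import reduce


def hadamard(a, b):
    """Element-wise product of two equally shaped matrices (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature sets must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_feature_sets(feature_sets):
    """Combine all feature sets into one matrix of the same shape."""
    return reduce(hadamard, feature_sets)


TIME_STEPS, UNITS, C = 14, 20, 5

# Placeholder feature sets: the k-th matrix is filled with the value k + 1.
feature_sets = [
    [[float(k + 1)] * UNITS for _ in range(TIME_STEPS)] for k in range(C)
]
fused = fuse_feature_sets(feature_sets)
```

Because the product is taken element by element, the fused matrix keeps the 14 × 20 shape regardless of how many context domains are added, so the CNN input size does not depend on c.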
In this study, we use multiple two-hidden-layer LSTMs and a two-hidden-layer CNN. After the CNN layer, we apply max-pooling (=2) once. There are 495 neurons and 60,363 parameters in the proposed model structure. In our proposed hybrid model, we use the Grid Search function provided by scikit-learn [33] to optimize the entire model. Given a range of hyperparameters, this function trains the model while changing each parameter value within the given range and returns the values at which the optimum result is obtained. Specifically, when we specify the range of hyperparameter values, we take into account the data range and data size of our datasets, and as a result, we use 64 filters and a kernel size of three for each CNN layer, as described in Table 3. The loss value, which is the difference between the predicted output ŷ and the expected output y, is computed as the mean squared error. The optimization process uses the gradient descent optimization algorithm called the Adam optimizer [34], which is commonly used for weight optimization of deep neural networks. The activation function applied to the network for each layer is the ReLU.
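The idea behind an exhaustive grid search can be sketched in a few lines. This is not scikit-learn's implementation: the function `grid_search`, the candidate ranges, and the scoring function `toy_validation_error` are all hypothetical, and the toy score is deliberately contrived so that its minimum is easy to see. In practice the score would be a validation error obtained by training the model.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in the grid and keep the lowest score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(params):
    # Hypothetical stand-in for "train the model, return validation error".
    return abs(params["filters"] - 64) / 64 + abs(params["kernel_size"] - 3)


# Hypothetical candidate ranges for two CNN hyperparameters.
grid = {"filters": [16, 32, 64, 128], "kernel_size": [2, 3, 5]}
best, err = grid_search(grid, toy_validation_error)
```

The cost grows with the product of the range sizes (here 4 × 3 = 12 evaluations), which is why the ranges are usually narrowed using knowledge of the data before searching.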

Experiments and Results
In this section, we assess the efficiency and effectiveness of the proposed (c, l)-LSTM+CNN hybrid model by comparing it against three models: ARIMA, (c, l)-LSTM and S2S LSTM. Note that (c, l)-LSTM is a model that does not include the CNN layer used in our proposed model. In this paper, two metrics are used to evaluate the forecasting accuracy of the model. One is the mean absolute percentage error (MAPE) (Equation (1)), and the other is the relative root-mean-square error (RRMSE) (Equation (2)), also called the power consumption prediction error rate. Smaller values of the error metrics indicate higher forecasting accuracy. In Equations (1) and (2), y_i is an actual test value, ŷ_i is the forecasting result of y_i, and N is the total number of testing samples.
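The two metrics can be sketched as below, assuming the commonly used definitions: MAPE as the mean of |y_i − ŷ_i| / |y_i| expressed in percent, and RRMSE as the root-mean-square error divided by the mean of the actual values, also in percent. The normalization of RRMSE by the mean of the actual values is an assumption, and the sample numbers are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / n


def rrmse(actual, forecast):
    """RMSE relative to the mean of the actual values, in percent.

    Assumption: normalization by the mean of the actual values.
    """
    n = len(actual)
    rmse = math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)
    return 100.0 * rmse / (sum(actual) / n)


# Hypothetical actual and forecast values for two test days.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
mape_value = mape(actual, forecast)
rrmse_value = rrmse(actual, forecast)
```

Both functions assume non-zero actual values, which holds for aggregate power demand.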

Experiment Environment and Determination of the Number of Layers l
The proposed (c, l)-LSTM+CNN hybrid model is implemented using Python 3.5.2 (64-bit) with PyCharm Community Edition 2016.3.2. The hardware configuration includes an Intel Core i7-5820K CPU @ 3.3 GHz, 32 GB of RAM and an NVIDIA GeForce GTX780 graphics card. The proposed hybrid model is built using TensorFlow [35] with Keras [36] version 2.1.5 as the front-end interface.
Different values for the number of layers l may lead to different accuracies, and this parameter may thus have a non-trivial impact on the overall performance of the proposed hybrid model. In this section, we focus on determining a value for l that obtains the minimum prediction error. We conduct a set of experiments to determine l in LSTM networks. The dataset used for these experiments is the all-day dataset d1. We conduct four sets of experiments varying l from 1 to 4 (Table 4 and Figure 8). Since the results of these experiments show that the highest accuracy is achieved at l = 2, we set the number of layers l in each LSTM network to 2.

Case 1: With Holidays
We first present the results of experiments using the datasets d1, d2 and d3, which include holidays. The dataset d1 includes all days of the week, while the datasets d2 and d3 are classified by season and by day of the week, respectively. As mentioned in Section 3.1.2, we use a daily power demand sequence from 1 January 2006 to 20 May 2017 as training data. In addition, we use five types of context information: average temperature, humidity, holiday status, day of the week, and season. Most studies on forecasting power demand provide only hourly or daily predictions [37][38][39]. To address this issue, we choose the next 7-day profile, from 21 May 2017 to 27 May 2017, as the target profile for performance evaluation.
Figures 9-11 show the prediction results for each dataset, i.e., d1, d3 and d2, respectively.As shown in Figures 9-11, the proposed (c, l)-LSTM+CNN hybrid model shows lower forecasting error and consequently higher forecasting accuracy compared with the ARIMA model, (c, l)-LSTM and S2S LSTM for all datasets.The proposed hybrid model shows the highest accuracy when trained on the dataset d3 (Figure 10).

The average forecasting errors of the models trained on datasets d1, d2 and d3 are summarized in Table 5. The proposed (c, l)-LSTM+CNN hybrid model produces better results when using d3 than when using d1 or d2, and it shows better forecasting accuracy than the ARIMA, (c, l)-LSTM and S2S LSTM models. The prediction error (MAPE) of the proposed model trained on d1 is 70%, 58% and 45% lower than that of ARIMA, (c, l)-LSTM and S2S LSTM, respectively. In addition, the MAPE of the proposed model trained on d2 is up to 4% lower than those of ARIMA, (c, l)-LSTM and S2S LSTM. In particular, the (c, l)-LSTM+CNN trained on d3 shows the best forecasting accuracy, with average errors of 0.81% in MAPE and 1.17% in RRMSE. Its MAPE is 74%, 76% and 51% lower than that of ARIMA, (c, l)-LSTM and S2S LSTM, respectively.
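For reference, the two error measures reported here can be computed as in the following sketch. The definitions are assumptions based on common usage (MAPE as the mean absolute percentage error, RRMSE as the root mean square error divided by the mean of the actual values); the function names are illustrative and not taken from the original implementation.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def rrmse(actual, predicted):
    """RMSE divided by the mean actual value, in percent (assumed definition)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return 100.0 * rmse / (sum(actual) / n)
```

Both functions expect two equally long sequences of daily values, such as the actual and predicted 7-day profiles.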

Case 2: Without Holidays
As mentioned in Section 3.1.2, the pattern of power demand on weekends and holidays differs from that on weekdays, making it difficult for models to learn the underlying patterns if trained on all of the data. To further investigate this fact, we divide the dataset without holidays into three categories, d4, d5 and d6, as shown in Table 2.
Figures 13-15 present the performance of each model using the datasets d4, d6 and d5, respectively. From these figures, we observe that the datasets without holidays (d4, d5 and d6) are more effective for training models than the datasets with holidays (d1, d2 and d3). In particular, as shown in Figure 14, the proposed (c, l)-LSTM+CNN hybrid model shows the best forecasting accuracy when using the dataset d6.
To further analyze the forecasting performance of each model, we present the forecasting results of each model in Figure 16a-d. As can be seen from Figure 16a, ARIMA shows similar patterns when trained on d4 and d5, but a different pattern when trained on d6. Figure 16b,c show the results of the (c, l)-LSTM and S2S LSTM, respectively; the results of these two models are closer to the actual values than those of ARIMA. In Figure 16d, the proposed (c, l)-LSTM+CNN hybrid model produces results closer to the pattern of actual power values than the other models for most of the datasets. In particular, the proposed hybrid model shows the best results when using d6.
The average errors of the models using d4, d5 and d6 are given in Table 6. Compared with Table 5 in Section 4.2, the overall accuracy improves. As can be seen in Table 6, the (c, l)-LSTM+CNN hybrid forecasting model trained on d6 has better accuracy than any of the other models. Specifically, the proposed hybrid model achieves the lowest average errors, 0.82% in MAPE and 0.90% in RRMSE. In contrast, the average RRMSEs of ARIMA, (c, l)-LSTM and S2S LSTM are 3.85%, 2.44% and 1.40%, respectively. The proposed hybrid model thus shows 77%, 63% and 36% lower prediction error (RRMSE) than ARIMA, (c, l)-LSTM and S2S LSTM, respectively.
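The relative improvements quoted above follow directly from the average RRMSE values. The short check below recomputes them; the function and variable names are illustrative.

```python
def relative_reduction(baseline_error, proposed_error):
    """Percentage by which proposed_error is lower than baseline_error."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Average RRMSE values (in percent) quoted in the text.
proposed_rrmse = 0.90
baseline_rrmse = {"ARIMA": 3.85, "(c, l)-LSTM": 2.44, "S2S LSTM": 1.40}

# Round each reduction to the nearest whole percent.
reductions = {name: round(relative_reduction(err, proposed_rrmse))
              for name, err in baseline_rrmse.items()}
print(reductions)
```

The rounded reductions are 77, 63 and 36 percent for ARIMA, (c, l)-LSTM and S2S LSTM, respectively, matching the figures in the text.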

Forecasting an n-Day Profile
As discussed in Section 1, the proposed (c, l)-LSTM+CNN hybrid model can be applied to general forecasting problems with any temporal granularity. To assess the efficacy of the proposed model for predicting over longer time periods, we present the results of predicting a 21-day profile in Figure 17. The MAPE and RRMSE of S2S LSTM are 2.99% and 3.59%, respectively, while the proposed (c, l)-LSTM+CNN hybrid model predicts the 21-day profile with errors of 0.91% and 1.13% in MAPE and RRMSE, respectively. These results indicate that the proposed (c, l)-LSTM+CNN hybrid model scales well to longer forecasting horizons.
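To make the notion of an n-day target profile concrete, the sketch below slices a daily sequence into input windows and the n-day profiles that follow them. The input window length `history` and the function name are hypothetical; the text does not state how the input window is chosen.

```python
def make_profile_samples(series, history, n_days):
    """Pair each `history`-day window with the n-day profile that follows it."""
    samples = []
    # Last start index that still leaves room for a full window and profile.
    last_start = len(series) - history - n_days
    for start in range(last_start + 1):
        window = series[start:start + history]
        profile = series[start + history:start + history + n_days]
        samples.append((window, profile))
    return samples
```

Setting `n_days` to 7 or 21 would correspond to the 7-day and 21-day profiles evaluated in the experiments.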

Conclusions
In this paper, we propose a hybrid model for forecasting power demand for an n-day profile by combining the benefits of LSTMs and CNNs. Unlike previous studies using univariate or multivariate sequences, we preprocess a dataset by pairing a power demand value (Key) with a context value (Context c), resulting in <Key, Context [1, c]> bivariate sequences that efficiently reflect important context information when training hybrid neural networks. We propose a (c, l)-LSTM+CNN hybrid forecasting model consisting of a (c, l)-LSTM for extracting features from each bivariate sequence, and a CNN for ensembling these feature sets to derive a predicted profile of power demand.
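The pairing step can be illustrated with a short sketch. It assumes that the key sequence and each context sequence are aligned lists of equal length; the function and variable names are illustrative and not taken from the original implementation.

```python
def build_bivariate_sequences(key, contexts):
    """Return one <Key, Context> sequence per context type.

    key      -- list of power demand values, one per day
    contexts -- dict mapping a context name to a list aligned with `key`
    """
    sequences = {}
    for name, values in contexts.items():
        if len(values) != len(key):
            raise ValueError("context %r is not aligned with the key sequence" % name)
        # Each element pairs the day's power demand with one context value.
        sequences[name] = list(zip(key, values))
    return sequences
```

With c context types, the result holds c bivariate sequences, each of which would be fed to its own LSTM network.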
Extensive experiments are conducted by dividing the dataset into two groups: with holidays and without holidays. Each group is divided into an all-days dataset, a seasonal dataset and a dataset by day of the week. We compare the proposed hybrid network with existing methods: ARIMA, (c, l)-LSTM and S2S LSTM. In particular, when we use the dataset by day of the week without holidays (d6), the proposed (c, l)-LSTM+CNN hybrid model achieves the lowest average errors, 0.82% in MAPE and 0.90% in RRMSE. Specifically, the proposed hybrid model shows 77%, 63% and 36% lower prediction error (RRMSE) than ARIMA, (c, l)-LSTM and S2S LSTM, respectively.
Since the proposed hybrid model can be applied to general forecasting problems at all short-term levels of temporal granularity, it can be extended in various directions. In particular, we expect that the proposed hybrid model can also be applied to other types of time series, such as indoor human behavioral patterns and 12-lead ECG (electrocardiogram) signals. As another example, the proposed hybrid model can be applied to the 15-minute or hourly prediction of photovoltaic (PV) generation by considering various context information such as temperature, cloudiness and air quality indices (PM2.5, PM10, O3, NO2, SO2, CO).
In addition, forecasting electricity demand in factories and houses at small-scale units with short temporal granularity has recently been growing in importance. To deal with this issue, we are currently collecting relevant data, and plan to augment the proposed hybrid model with small-scale electric power demand forecasting ability to support prosumers. In future work, we plan to extend the hybrid model to produce medium-term forecasts for horizons ranging from a few months to one or two years.

Six training datasets are constructed by classifying the data into an all-days dataset, a seasonal dataset, and a dataset by day of the week. The data size of d1 is 24,948 because it consists of 4158 daily power values and five context values for each day, while the data size of d2 is 5105 because it consists of 1021 daily power values per season and four context values (excluding season) for each day. The data sizes of d5 and d6 vary between 2680 and 2780 and between 2240 and 2280, respectively, because the number of holidays that fall on weekdays differs.
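The stated sizes follow from counting one key value plus the context values for every day, as this short check shows (the function name is illustrative):

```python
def dataset_size(num_days, num_contexts):
    """Number of stored values: one power value plus the context values per day."""
    return num_days * (1 + num_contexts)

print(dataset_size(4158, 5))  # d1: 24948
print(dataset_size(1021, 4))  # d2, per season: 5105
```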

Figure 5. Comparisons of power demand: (a) average seasonal power demand by year; (b) average power demand for weekdays and weekends by week.

Figure 6. An LSTM network with l layers.

Figure 7. Structure of the proposed hybrid model. The feature set output from the c LSTM networks is used as input to the CNN.

Figure 8. Power demand forecasting results for l values.


Table 2. Training sets used in the experiments.

Table 4. Errors by l values for the (c, l)-LSTM+CNN hybrid forecasting model with d1.

Table 5. Errors for d1, d2 and d3.

Table 6. Errors for d4, d5 and d6.