Forecasting of Power Demands Using Deep Learning

Abstract: The forecasting of electricity demand is important for planning improvements in the power generation sector and for preparing periodic operations. Predicting future electricity demand is challenging because of the complexity of the demand patterns. In this paper, we studied the performance of basic deep learning models for forecasting three electrical power quantities: facility capacity, supply capacity, and power consumption. We designed a convolutional neural network (CNN), a recurrent neural network (RNN), and a hybrid model that combines both CNN and RNN. We applied these models to data provided by the Korea Power Exchange, which contain daily recordings of facility capacity, supply capacity, and power consumption. The experimental results showed that the CNN model significantly outperforms the other two models in forecasting all three features.


Introduction
Commercial electric power companies struggle to provide end-users with stable and safe electricity. Designing efficient forecasting models is therefore a vital step in planning the operation of electric power systems. Electricity demand patterns can be affected by various factors, including temporal, economic, social, and environmental factors [1,2].
Power forecasting models can be classified by their prediction horizon into short-term, medium-term, and long-term models. Short-term models predict up to one day or one week ahead and are used for scheduling the generation and transmission of electricity. Medium-term models predict from one day or one week up to one year ahead and are used for fuel preparation. Long-term models predict more than one year ahead and are used for developing the power supply and delivery system [3,4,5].
Power demand prediction frameworks can be divided into statistical models, grey models, and artificial intelligence models [6]. In statistical models, the correlation between the inputs and the outputs is determined statistically using empirical models such as log-linear regression models, co-integration analysis and autoregressive integrated moving average (ARIMA), combined bootstrap aggregation (bagging), and exponential smoothing. Grey models integrate a partial theoretical structure with empirical data and, as a result, require only a limited amount of data to infer the behavior of the electrical system. Artificial intelligence models learn complex relationships between the inputs and the outputs from the available training data. Different techniques have been applied to electricity demand forecasting, including time series models [7], Holt-Winters and seasonal regression [8], multiple linear regression [9,10], first-order fuzzy time series [11], ARIMA [12], seasonal ARIMA (SARIMA) [2], support vector machine (SVM) [13], support vector regression [14], least-squares SVM (LSSVM) [1], and artificial neural network (ANN) [10,14,15].
For example, the work in [11] used monthly demand data from 1970 to 2009 as a training dataset and the 2010 data as a testing dataset. In [10], the electricity demand in Thailand was predicted from the gross domestic product, the maximum ambient temperature, and the population, which were used as inputs to a neural network. The authors of [14] used Bayesian regularization in an autoregressive neural network for electricity demand forecasting. The authors of [16] compared different forecasting methods, namely neural networks, fuzzy logic, and autoregressive processes, and found that neural networks and fuzzy logic are more accurate than autoregressive processes. The authors of [17] combined the support vector machine, neural networks, autoregressive integrated moving average, and generalized regression neural network. The authors of [18] proposed a traditional feed-forward neural network with one output node that can predict the peak load one hour or one day ahead. Researchers have also explored radial basis function networks [19], recurrent neural networks [20], and self-organizing maps [21].
In this paper, we studied the performance of deep learning models for electricity demand forecasting. Unlike previous works [28] that add year and month indices to the electricity consumption input, we use only the daily demand as the input to our predictive models. We designed a convolutional neural network (CNN), a recurrent neural network (RNN), and a hybrid model that combines both CNN and RNN, and applied them to the data provided by the Korea Power Exchange. We built an independent model for each feature, namely facility capacity, supply capacity, and power consumption. The experimental results showed that the CNN model significantly outperforms the other two models in forecasting all three features.

Materials and Methods
In this section, we introduce the dataset used for this study and the design of the proposed models.

Materials
In this paper, we used the data from the Korea Power Exchange. It contains recordings from 2003.01.01 to 2020.05.22. The measurements in this dataset were taken daily, giving 6352 records. The available information is facility capacity, supply capacity, maximum power consumption, supply reserve, and supply reserve ratio. Here, we are interested in predicting the future demands of facility capacity, supply capacity, and maximum power consumption. The statistical overview of these features is given in Table 1. We split the dataset into three parts. The training data contain the records of the first 11 years (from 2003 to 2013). The validation data contain the records of the following three years (from 2014 to 2016). The testing data contain the records of the remaining years (from 2017 to 2020). Figure 1 shows the patterns of the available features in the dataset within the study period.
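The chronological split described above can be checked with a short sketch. It assumes one record per calendar day with no gaps, which is consistent with the 6352 records reported; the function name `split_by_year` and its parameters are illustrative and not taken from the authors' code.

```python
from datetime import date, timedelta

# Assumption: one record per calendar day with no gaps, which matches the
# 6352 records reported for 2003-01-01 through 2020-05-22.
START, END = date(2003, 1, 1), date(2020, 5, 22)
days = [START + timedelta(days=i) for i in range((END - START).days + 1)]


def split_by_year(dates, train_end=2013, val_end=2016):
    """Chronological split by calendar year, without shuffling."""
    train = [d for d in dates if d.year <= train_end]
    val = [d for d in dates if train_end < d.year <= val_end]
    test = [d for d in dates if d.year > val_end]
    return train, val, test


train_days, val_days, test_days = split_by_year(days)
print(len(days), len(train_days), len(val_days), len(test_days))
# prints: 6352 4018 1096 1238
```

Splitting by date rather than at random keeps every validation and test day strictly later than the training period, which is the usual choice for forecasting tasks.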

Data Preprocessing and Preparation
The dataset was normalized to unit norm. To prepare the training, validation, and test sets, we studied different scenarios by varying the length of the history used for predicting future demands. The history was set to the past 7, 15, 30, 45, and 60 days, and the forecast horizon was set to 1, 7, 15, 30, 45, and 60 days ahead. As a result, we studied 5 × 6 = 30 possible cases to find the best performing model for future demand prediction. Figure 2 shows an example of preparing the dataset with a history of 15 days to predict the demand 7 days ahead.
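The windowing step can be sketched as follows. This is a minimal illustration that assumes the target is the single value observed `horizon` days after the last day of the history window; the helper `make_windows` and the toy series are hypothetical and not part of the original study.

```python
from itertools import product

HISTORIES = [7, 15, 30, 45, 60]     # days of past data fed to the model
HORIZONS = [1, 7, 15, 30, 45, 60]   # days ahead to predict


def make_windows(series, history, horizon):
    """Pair each run of `history` consecutive values with the value
    observed `horizon` days after the last day of that run."""
    samples, targets = [], []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        end = start + history                      # one past the window
        samples.append(series[start:end])
        targets.append(series[end + horizon - 1])  # `horizon` days ahead
    return samples, targets


configs = list(product(HISTORIES, HORIZONS))       # all past/future cases
toy = list(range(30))                              # hypothetical daily series
X, y = make_windows(toy, history=15, horizon=7)
print(len(configs), len(X), X[0][-1], y[0])
# prints: 30 9 14 21
```

In the toy series the last value of the first window is day 14 and its target is day 21, that is, 7 days later.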

The Proposed Models
We designed different deep learning models, namely a convolutional neural network (CNN) [29], a recurrent neural network (RNN) [30], and a hybrid model that combines CNN with RNN. We aimed to find the best performing model for each combination of past and future values in our dataset.
The CNN is a type of deep neural network that is used in various domains. It is considered a shift-invariant model because of its translation-invariance characteristics and shared-weights architecture. CNN models have been widely and successfully used in areas such as image and video classification, medical image analysis, natural language processing, bioinformatics, and time-series analysis. In this work, we designed a simple two-layer CNN model, as shown in Figure 3. Each layer consists of a 1-dimensional convolution layer with 32 filters and a filter size of 3, a rectified linear unit (ReLU) as the non-linear activation function, and a max-pooling layer with a window size and stride of 2. The features learned by these two layers are then fed into a fully connected layer with one node for prediction. The convolution layer is a 1-d convolution expressed in Equation (1), where I is the input, k and o are the indices of the kernels and the output position, respectively, and $W^f$ is the weight matrix of shape S × N, with S filters and N channels.
The RNN is another typical deep learning model, mainly used in natural language processing and speech recognition [31,32]. It is used to capture the sequential behavior of the data and predict the next likely outcomes [33]. In this paper, we used one bidirectional long short-term memory (Bi-LSTM) layer with 16 nodes, followed by a dropout layer [34] with a dropout probability of 0.2 and a fully connected layer with one node for prediction. Figure 4 shows the architecture of the RNN model. Bi-LSTM has been used in different areas such as phoneme classification [35], speech recognition [36], human action recognition [37], and machine translation [38].
The LSTM cell contains several gates. The input gate decides which information should be stored for the next layer and updates the current state. The forget gate decides which information should be removed according to the previous inputs. The output gate decides which part of the state value should be output. Thus, for an input sequence $\{x_t\}_{t=1}^{T}$, the LSTM has cell states $\{C_t\}_{t=1}^{T}$ and hidden states $\{h_t\}_{t=1}^{T}$, and outputs a sequence $\{o_t\}_{t=1}^{T}$. This can be expressed mathematically by Equation (2), where $W_o$, $W_c$, $W_i$, $W_f$ are the weight matrices and $b_o$, $b_c$, $b_i$, $b_f$ are the biases. Sigmoid and tanh are the activation functions, and $\odot$ denotes element-wise multiplication.
For all of these models, we used the grid search algorithm for hyper-parameter tuning. We used the Keras framework (https://keras.io/) for building and training the proposed models. The number of epochs was set to 40, with early stopping based on the validation loss. The RMSprop optimizer was used with a learning rate of 0.001 [39].
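The gate description above corresponds to the standard LSTM cell update. As a reference sketch, it is commonly written as follows, using the bias symbols b_f, b_i, b_c, and b_o from the text. The weight-matrix names, the candidate state \tilde{C}_t, and the concatenation [h_{t-1}, x_t] follow the usual textbook convention and are assumptions about the authors' exact notation.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
```

Here \sigma is the sigmoid function and \odot is element-wise multiplication. In a bidirectional layer, one such cell reads the sequence forward and a second one reads it backward, and their hidden states are typically concatenated before the next layer.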

Results and Discussions
In this paper, we used the mean absolute error (MAE) and R² to evaluate the performance of the proposed models. These metrics were calculated using the scikit-learn library (https://scikit-learn.org/stable/).
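For reference, the two metrics can be written out in a few lines of plain Python. This sketch follows the standard definitions, which are also what scikit-learn's `mean_absolute_error` and `r2_score` compute; the sample values are hypothetical and only illustrate the arithmetic.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [3.0, 5.0, 7.0, 9.0]    # hypothetical observed demands
predicted = [2.5, 5.0, 7.5, 9.0]   # hypothetical model outputs
print(mae(observed, predicted), round(r2(observed, predicted), 3))
# prints: 0.25 0.975
```

An R² of 1 means a perfect fit and 0 means no better than always predicting the mean of the observed values. R² becomes negative when the predictions are worse than that mean baseline.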
We predicted future demands of facility capacity, supply capacity, and power consumption. We tested the three developed models, namely CNN, RNN, and the hybrid model, using different values of histories and futures. Table 2 shows the best performing configuration of each model, with its past and future values, for the three features of the study. The '-' sign in Table 2 means that the model did not fit the data at all. The CNN model significantly outperforms the RNN and hybrid models. We searched extensively for the best hyperparameters and architectures of the RNN models, but these models did not converge. The most likely reason is that the training dataset is not large enough to train the RNN model. We therefore consider the CNN model for power demand forecasting, and our future work will concentrate on collecting more data from the Korea Power Exchange to train more accurate forecasting models.
Figure 6 shows the heat maps of the R² results for all past-versus-future configurations of the CNN model. For facility capacity forecasting, the CNN model performs well for all combinations. The best R² value is 0.992 for a past of 7 days and a future of 1 day, and the minimum R² is 0.820 for a past of 7 days and a future of 15 days. Thus, the developed CNN model can be used for facility capacity forecasting with different past values and different future values.
For supply capacity forecasting, the CNN model performs well for some combinations only, forecasting up to 15 days ahead. The best R² value is 0.851 for a past of 7 days and a future of 1 day. Thus, the developed model can be used for short-term forecasting. The model reaches an R² of 0.69 for a past of 7 days and a future of 7 days, and an R² of 0.43 for a past of 7 days and a future of 15 days.
For power consumption forecasting, the CNN model again performs well for some combinations only. The best R² value is 0.772 for a past of 60 days and a future of 1 day. Thus, the developed model can be used for short-term forecasting. The model reaches an R² of 0.68 for a past of 45 days and a future of 7 days, and an R² of 0.59 for a past of 45 or 60 days and a future of 15 days.
Furthermore, the MAE of the CNN model is lower than the MAE of the other models. For instance, the MAE for the facility capacity feature is 0.025. For the supply capacity feature, the MAE of the CNN model is better than that of the hybrid model by 0.265. Similarly, for the power consumption feature, the MAE of the CNN model is better than that of the hybrid model by 0.095.
In addition, we visualize the performance of the CNN model for the three features over a sample future interval in Figures 7-9, respectively. The predicted results for facility capacity and power consumption follow the observed values. On the other hand, the predicted results for supply capacity do not always follow the observed trend.
To assess the real performance of the proposed model, we compared it with a support vector machine (SVM) and an artificial neural network (ANN). SVM was chosen as a benchmark model because previous studies have reported that SVM produces satisfactory performance across various power demand forecasting tasks [40,41,42]. For a fair comparison, we also performed a hyperparameter search for the penalty factor and gamma using grid search and found that the best performing penalty factor and gamma are 1 and 0.001 for the facility capacity feature, 0.001 and 1 for the supply capacity feature, and 100 and 0.001 for the power consumption feature.
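The grid search over the penalty factor and gamma can be sketched generically as below. This is only an illustration: the candidate values in `grid` are hypothetical (the searched ranges are not stated), and `toy_validation_error` is a stand-in for fitting an SVM and scoring it on held-out data. In practice, scikit-learn's `GridSearchCV` offers the same idea with cross-validation.

```python
import math
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every parameter combination and keep the lowest error."""
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Hypothetical candidate values; the ranges actually searched are not given.
grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}


def toy_validation_error(params):
    # Stand-in for "fit an SVM with these parameters and return its error
    # on held-out data"; constructed to be smallest at C=1, gamma=0.001.
    return abs(math.log10(params["C"])) + abs(math.log10(params["gamma"]) + 3)


best, error = grid_search(grid, toy_validation_error)
print(best)
```

With 6 candidate values for the penalty factor and 4 for gamma, this sketch evaluates 24 combinations, so the cost of grid search grows multiplicatively with each added hyperparameter.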
ANN models have been used by many researchers and have shown good performance [10,14,15]. We also performed a grid search for hyperparameter optimization of the ANN. We designed a two-layer ANN model in which the first layer has 32 nodes followed by ReLU as the non-linear activation function. A dropout layer with a drop rate of 0.5 was then added. The second layer has one node for prediction. Table 3 shows the comparison between the proposed model and the SVM and ANN models. The CNN model outperforms ANN and SVM for the facility capacity and supply capacity features. However, for power consumption, SVM performs slightly better. Deep learning models are known to require large datasets for training; therefore, our future work will include collecting larger datasets to build more accurate forecasting models.

Conclusions
Power demand forecasting is a challenging topic and is important for future planning. In this paper, we studied different deep learning models for forecasting future demands of facility capacity, supply capacity, and power consumption. Different deep learning architectures were studied, namely CNN, RNN, and a hybrid model that combines CNN with RNN. The experimental results show that the CNN model significantly outperformed the RNN and hybrid models. Furthermore, we compared the performance of the CNN model with SVM and ANN models, and the comparison showed that CNN generally performs better. The developed CNN model is a short-term power demand forecasting model: its best results were obtained one day ahead, and its accuracy for supply capacity and power consumption drops at longer horizons. The future plan is to collect more training data from the Korea Power Exchange in order to train a more robust forecasting model that can perform mid-term to long-term power demand forecasting.
Author Contributions: T.K., D.Y.L., and H.T. prepared the dataset, conceived the algorithm, and carried out the experiment and analysis. T.K., D.Y.L., and H.T. wrote the manuscript with support from K.T.C. All authors discussed the results and contributed to the final manuscript. All authors have read and agreed to the published version of the manuscript.