Single Residential Load Forecasting Using Deep Learning and Image Encoding Techniques

Abstract: The integration of more renewable energy resources into distribution networks makes the operation of these systems more challenging compared to the traditional passive networks. This is mainly due to the intermittent behavior of most renewable resources such as solar and wind generation. Many different solutions, such as energy storage or demand response, are being developed to make systems flexible. In the context of demand response, a key factor is to estimate the amount of load over time properly to better manage the demand side. There are many different forecasting methods, but the most accurate solutions are mainly found for the prediction of aggregated loads at the substation or building levels. However, more effective demand response from the residential side requires prediction of energy consumption at the level of every single household. The accuracy of forecasting loads at this level is often lower with the existing methods, as the volatility of single residential loads is very high. In this paper, we present a hybrid method based on time series image encoding techniques and a convolutional neural network. The results of forecasting the load of a real residential customer using different encoding techniques are compared with some other existing forecasting methods including SVM, ANN, and CNN. Without CNN, the lowest mean absolute percentage error (MAPE) for a 15 min forecast is above 20%, while with an existing CNN applied directly to the time series, an MAPE of around 18% could be achieved. We identify the image encoding technique for time series that yields the highest forecasting accuracy with CNN, reaching an MAPE of around 12%.


Introduction
In power systems, the total energy production within a period of time should always equal the total energy consumption, whether consumed in loads or lost in carrier assets like transmission lines [1]. Whenever this balance is not met for some time, the system would face security challenges or even collapse [2,3]. There are two types of variables in the system: controllable and uncontrollable. The latter bring uncertainty to the system, which makes operation and control challenging [4]. High voltage (HV) transmission systems have many established control strategies to ensure power balance and normal operation. These strategies are mainly applied by means of large power plants, which are capable of performing load frequency control. The demand is also forecast to provide an input for proper power dispatching.
Forecasting electricity demand is performed over different time horizons: short, medium, and long term load forecasting. Figure 1 classifies load forecasting into different types by time horizon and by geographic area or level. Methods for medium or long term load forecasting can be classified into two types of models: end-use models and econometric models. The former collect statistical aggregated data of electricity demand from residential, commercial, and industrial sectors and explain energy demand as a function of the number of consumers in the market. The latter estimate the relationship between energy consumption and the economic factors that influence it, combining economic theory and statistical techniques (e.g., linear regression or time series methods) [5]. Short term load forecasting (STLF) has been used since 1920, mainly for the daily energy supply routine [6]. It is needed for economic energy dispatch, unit commitment, and energy trade in the competitive market. The time horizon for STLF ranges from intervals within the next hour to the day ahead or one week.
There are many different forecasting algorithms that can predict demand at the transmission level with very high accuracy. This is mainly for two reasons: first, the loads at this level are large aggregated values that change slowly; second, they represent the demand of distribution grids that are mostly passive networks with a low level of distributed generation (DG). However, the integration of DG, especially from intermittent renewable resources, makes forecasting challenging.
The main reason that the variable renewable energy (VRE) has the potential to create operational challenges in the system is its variability. This may even result in low probability, but high impact events in power systems. The intermittent behavior of VREs like wind generation or solar production brings uncertainty for the power system, as the predictability of such generation is quite low.
As VRE generators are either uncontrollable or difficult to control promptly, it is necessary to exploit the flexibility provided by storage systems [7], another vector of energy like district heating or gas network, and the demand side. Otherwise, distributed generations may need to be curtailed or the distribution network would demand more power beyond the scheduled profile. As a less costly solution, demand side management (DSM) is attracting attention, and it is becoming a more common energy management paradigm.
In this regard, we address a very important prerequisite of demand response (DR) as a class of DSM mechanisms, which is load prediction; in order to intelligently and efficiently set the input parameters of DR, we need to have an estimation of load behavior.
Considering the importance of DR and the share of residential loads in distribution system demand, we aim at studying forecasting solutions for single residential loads. Reviewing the literature, it is found that there are not very accurate solutions to forecast such loads or the methods are not as developed as those applied to aggregated loads like buildings or at higher voltage levels like substations.
Within the context of load forecasting of single households with high volatility, we propose the integration of time series with image encoding methods into a convolutional neural network (CNN). In existing forecasting approaches, time series are fed to the CNN directly, which implies a one-dimensional (1D) first layer. In our proposed method, image encoding techniques make the first layer two-dimensional (2D), which yields a higher accuracy. We apply three different image encoding methods to time series of historic load data: the recurrence plot, the Gramian angular field, and the Markov transition field. We then identify the one that gives the highest accuracy when integrated into the CNN. This is not only a new contribution to the limited number of existing methods of single residential load forecasting, but also improves the accuracy of load forecasting of such small scale volatile loads.
Experiments on the load dataset confirm the improvement in the forecasting results. In comparison with other well established methods, namely Support Vector Machines (SVM) [8], Artificial Neural Network (ANN) [9], and 1D-CNN [10], the proposed method gives more accurate values. Figure 2 illustrates a high level scheme of the application of single residential load forecasting in demand response: once historic data of load power consumption (e.g., two years of data) are collected, the model can be trained at the aggregator or an agent's server. The trained model/algorithm can be embedded in a smart home to predict the next 15 min of load using the previous 24 h of measured consumption. This predicted value can be used by the aggregator to set appropriate demand response signals to be sent to the smart home for possible actions on smart appliances.
The rest of this paper contains the following sections: Section 2 reviews the impacts of renewable energy integration into distribution systems and the increasing need to estimate the load to manage the demand side. A literature review of load forecasting methods, especially those that predict residential loads, is presented in this section. In Section 3, some different image encoding techniques applied to time series, as well as the deep learning model, are introduced, and the methodology of our proposed solution is presented. To evaluate the performance of the new solution, we applied it to the well known Boston housing dataset and a set of historic data of active power consumption of a single residential household. In Section 4, the performance of our proposed solution is compared with three existing methods, and in Section 5, the paper is concluded with some short remarks.

Background
In this section, the impacts of variable renewable resources are detailed, the importance of demand response is briefly introduced, and the contribution of load forecasting in demand response is discussed. Some existing load forecasting methods are then presented to address the current gap in the field.

Renewable Energy Integration
Integration of intermittent renewable energy resources as distributed generation in distribution systems has shifted these systems from a passive paradigm to active networks [7]. The demand from the HV system perspective is no longer pure loads at consumption centers and distribution networks. It is actually the net demand, which is the difference between consumption and local distributed generation. Considering variable renewable energy (VRE) resources as distributed generation, the net consumption would have higher uncertainty and hence be more difficult to forecast. It also becomes more crucial to predict such demands as any dramatic changes in environmental conditions like solar irradiation, weather, temperature, and wind speed would change the net demand profile very fast, which may not be captured promptly by system control and management systems. This could result in power imbalance [11], low power quality, or voltage violations. If a similar scenario occurs in neighboring distribution networks, the aggregation of effects would even initiate some frequency deviation in the upstream transmission systems.
In the transient or short term, from milliseconds to seconds, high penetration of VREs would also present challenges for system stability. For VREs with very limited reactive control, such as fixed speed induction wind turbines or small PV inverters, a lack of reactive power increases voltage instability [12]. This occurs because of the variations of wind speed and the irregular solar radiation with time. Voltage fluctuation can cause problems for sensitive electric and electronic devices and can decrease their useful lifetime.
VREs can also impact system voltage indirectly: electronic devices and non-linear appliances usually inject harmonics to the network, which are minimized by current control systems on conventional power generators, while VREs' incapability of absorbing such harmonics can potentially cause voltage distortion [13]. From the other side, active power control is not widely used by wind and solar generators currently, and this diminishes system inertia. Low inertia consequently results in frequency control deterioration and eventually frequency instability. In the large bulk power plants, the rotating inertia reduces fluctuations in case there are disturbances to the grid; however, VRE generators have a low frequency response capability and, in some cases like PV, no inertia, as well as no frequency response is provided.
In the medium term, from some minutes to days, power system balance is threatened by the increasing share of VRE. If system flexibility is low and there is no ramping reserve from generators, any dramatic changes in weather conditions would lead to power imbalances by VREs. High penetration of VRE increases the need for greater flexibility in the rest of the system to allow economic dispatch. This is because overcoming the uncertainty of the net load (total load minus production of VRE) requires more load following reserves due to the low predictability of VREs' output.
In long term planning, months to some years, the adequacy becomes challenging. Maximum generation of VRE does not necessarily match peak demand periods, and it even often occurs during periods of relatively low demand, which can cause over-generation. System adequacy is also affected by the location dependency of VRE since the best places for wind or solar power production may be located far from the load centers and even existing transmission lines. This can cause grid congestions. When dispatchable generation is replaced with high amount of VREs, system reliability will also be affected in the long term [14]. Moreover, encouraging investors with incentives and the capacity market mechanism brings new challenges as conventional high cost generators will gradually lose their attraction.

Demand Response
Demand side management applies some mechanisms to change the load profile of customers, aligning it with the generation power profile. These actions may shift or shave load curves either directly or through sending suggestion signals to consumers.
Demand response (DR) is a class of DSM mechanisms that reschedules users' energy usage patterns according to their responses to electricity prices or the incentives of the power utility [15]. DR keeps consumers in the loop and makes them aware of energy issues. DR benefits customers, the utility, and also society. Customer benefits include electricity demand satisfaction, reduced electricity bills, and improved equipment life cycle and productivity. Some of the benefits for utilities are lowering the cost of services, improving the operating efficiency and flexibility, and improving customer service. Societal benefits mainly include reduced environmental degradation and resource conservation.
In [16,17], incentive based and price based DR schemes were discussed. In [18], incentive based schemes were recommended as more effective solutions for demand peak shaving, and the work in [19] showed how customers would be encouraged to participate in load shifting by means of rewards and how this could eventually lead to voltage profile improvement.
In the DR process, the system operator has low or no information about the load characteristics or operating regime of the appliances, especially for domestic and thermoelectric devices. Therefore, it needs to forecast the loads for the time of response so that it can make optimal decisions to control or shift the energy of appliances.
In order to set the input parameters of DR intelligently and efficiently, we need to have an estimation of the load behavior. This implies the necessity of predicting the load consumption during time to support adjusting incentives or tariffs [20,21].

Load Forecasting in Distribution Systems
Some of the different existing methods that have already been applied to forecast electric loads are reviewed in this subsection. Some methods are intended to forecast loads at aggregated levels like the substation or building levels, while some methods try to capture the consumption behavior of single customers. There are also methods to predict the consumption behavior of single appliances like air conditioners or highly demanding electric loads in distribution systems such as heat pumps. Figure 3 summarizes different types of models and algorithms used in load forecasting. The authors in [22,23] proposed some accurate methods to forecast aggregated loads at the substation level. The work in [24] presented some algorithms based on machine learning to improve the forecasting accuracy of the aggregated loads. In [25], the authors proposed a new approach to load forecasting in the presence of active demand, based on a decomposition of the load into its components. In this study, the aggregated load of about 60 consumers from an Italian LV network was used to demonstrate the performance of the proposed approach. The work in [26] presented a probabilistic load forecasting model in which weather forecasting uncertainty was used. The artificial neural network (ANN) and the probabilistic temperature forecasts were used to build this probabilistic normal load forecasting model. Although this model could achieve satisfactory performance with a low average mean absolute percentage error, it was developed to forecast loads at the building level including several customers. The work in [27] also presented a more accurate method based on deep learning algorithms for day-ahead building level load forecasts.
There are also some forecasting methods that aim to predict the consumption of large appliances or high consumption devices like boilers for district heating systems [28], heat pumps [29], electric vehicles at the station columns [30], etc. Different methods have been introduced in the literature for improving the accuracy of the load forecasting of single devices: data partitioning techniques in [31], or machine learning techniques, probability theory, and statistics in [29] for the prediction of electricity load consumption of heat pumps.
Considering the importance of DR and the share of residential loads in distribution system demand, we aim at studying forecasting solutions for single residential loads. There are many appliances at the household level whose load could be managed if it were predictable. This not only benefits system operators, but also adds value to home energy management systems, user awareness applications, and non-intrusive load monitoring (NILM) algorithms [32]. A review of the literature shows a limited number of contributions proposing advanced methods for single residential load forecasting. The accuracy of the existing methods applied to single loads is also low compared to those applied to aggregated loads. This is due to the high volatility of single residential loads: a household has a few appliances with different duty cycles, and switching any of them makes the load profile jump up or down suddenly. Switching on an electric oven or a dishwasher may increase the total load from 30% to 70% of the rated power. The load volatility of residential customers is inversely correlated with the number of appliances in the household: the fewer the appliances, the higher the volatility that can be expected in the load curve, which makes forecasting more difficult.
In [33], the authors described the accuracy of load forecasting at different levels of aggregation. They showed that aggregation of loads would improve the short term forecasting, although the performance improved until a point beyond which no further improvement could be seen. This study proved that short term load forecasting of single loads, for example a household demand, was much more difficult than predicting the consumption of a residential building with several households.
The authors in [34,35] used deep learning techniques to forecast single residential loads. The former extended a load forecasting approach based on deep long short term memory (LSTM) with automatic hyperparameter tuning to cope with the high volatility of single loads. The latter, instead, presented an LSTM recurrent neural network (RNN) method to improve the accuracy of single load forecasting.
The authors in [36] proposed a method based on support vector regression (SVR) modeling to forecast the energy consumption of single residential customers. The data granularity in their study was daily and hourly, and the performance of the model in terms of accuracy was quite low: the lowest mean absolute percentage of error (MAPE) was 12.78% for daily forecast and 23.31% for hourly forecast.

Methodology
The proposed methodology is discussed in this section. First, the time series forecasting problem is formulated in Section 3.1. Deep learning methods and structures are discussed in Section 3.2. Time series image encoding methods are presented in Section 3.3. Finally, the proposed method based on time series image encoding and deep neural networks is discussed in Section 3.4. The conceptual diagram of the proposed method is depicted in Figure 4.

Time Series Prediction
Time series can be defined as a sequence of vectors or scalars that depend on time. A time series of vectors can be formulated as the following equation:
$$X = \{x(t_0), x(t_1), \ldots, x(t_i), \ldots\} \tag{1}$$
where $t_i$ is the $i$th time index. A sample sequence of the load consumption profile is shown in Figure 5 (vertical axis: global active power, kW). This sequence was extracted from the dataset described in Section 4.3.
One of the important tasks in time series analysis is forecasting, that is, predicting the next values of the time series x(t). This problem can be solved using the past values of the time series. From this point of view, the future value can be predicted based on the model that is formulated in (2):
$$x(t_{i+1}) = F\big(x(t_i), x(t_{i-1}), \ldots\big) \tag{2}$$
where F is the prediction function and i is the time index.
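As an illustration of how the forecasting problem is usually turned into supervised training data, the sketch below splits a series into pairs of a past window and the next value. The function name `make_windows` and the parameter `n_lags` are hypothetical and not taken from the paper; NumPy is assumed to be available.

```python
import numpy as np

def make_windows(series, n_lags):
    """Split a 1-D series into (past window, next value) training pairs."""
    series = np.asarray(series, dtype=float)
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be in [1, len(series) - 1]")
    # Row k of idx selects x(t_k) ... x(t_{k+n_lags-1}).
    idx = np.arange(n_lags)[None, :] + np.arange(len(series) - n_lags)[:, None]
    windows = series[idx]
    # The target of row k is the sample that follows its window.
    targets = series[n_lags:]
    return windows, targets

# Toy example; n_lags = 96 would correspond to 24 h of 15 min samples.
X, y = make_windows(np.arange(10.0), n_lags=4)
```

Each row of `X` plays the role of the past values passed to F, and the matching entry of `y` is the value F should predict.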

Deep Learning
Deep learning models, or multiple layer neural networks, are a class of machine learning algorithms that are used in regression, clustering, and classification problems. These models are commonly used in audio recognition, image processing [37], machine translation, computer vision [38], medical image segmentation [39,40], and similar applications. In general, deep learning outperforms traditional machine learning techniques when the dataset is large, the task is complex (e.g., image classification), and high performance computing is available.
Recently, some deep learning models, including the convolutional neural network (CNN) and recurrent neural network (RNN) [41], have also been applied to the prediction of time series. As the time series used in forecasting are one-dimensional by nature, one-dimensional CNN (1D CNN) is typically used for regression and forecasting problems. However, CNN can deliver higher performance when it is not restricted to one-dimensional data.
CNN is comprised of convolutional, pooling, and fully connected layers. Convolutional layers can be designed in a 1D or 2D manner. In each convolutional layer, the input data are convolved with a number of filters. Values in the filter are equivalent to weights in neural networks that should be updated during the training procedure based on algorithms like back-propagation. After each convolutional layer, activation functions are applied to add nonlinearity to the system. Pooling layers are used to reduce the spatial dimension in order to control the number of the parameters and overfitting. Finally, fully connected layers map the feature maps to the desired output.
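The operations of a convolutional layer, an activation function, and a pooling layer can be sketched in a few lines of NumPy. This is only an illustrative single-filter forward pass with 'valid' padding, stride 1, and a 2 by 2 pooling window (all assumptions), not the implementation used in the experiments; the function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, as computed by CNN convolutional layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Weighted sum of the patch under the filter.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))              # stand-in for an encoded time series image
kernel = rng.standard_normal((4, 4))    # one filter with a kernel size of 4
feature_map = max_pool2d(relu(conv2d_valid(image, kernel)))
```

In a trained network, the filter values are the weights learned by back-propagation, and a layer applies many such filters in parallel.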
To leverage the capability of CNN, we developed our forecasting method using a two-dimensional convolutional neural network (2D CNN). In this case, the input data to the input layer of the model should be two-dimensional. The CNN model consisted of one convolutional layer with 32 filters and a kernel size of 4, followed by one max pooling layer. After these layers, we added a fully connected layer with 64 nodes. All activation functions were set to be the rectified linear unit (ReLU). Figure 6 shows the structure of the CNN used.

Time Series Image Coding
As deep networks have a high capability to recognize and process images [42], encoding time series as images can improve the performance of deep networks in classifying and forecasting them. There are several methods to encode time series into images, including recurrence plots (RP) [43], the Gramian angular field (GAF), and the Markov transition field (MTF) [44]. In the following subsections, these methods are investigated and applied to encode individual household power consumption data as images.

Recurrence Plots
Recurrence plots [45] are generated by first computing the states in the phase space trajectory of the time series sequence according to the following equation:
$$\vec{s}_i = (x_i, x_{i+1}), \quad i = 1, \ldots, n-1 \tag{3}$$
Then, recurrence plots are formed using the following equation:
$$R_{i,j} = \mathrm{dist}(\vec{s}_i, \vec{s}_j), \quad i, j = 1, \ldots, n-1 \tag{4}$$
where R is the recurrence plot, n is the number of samples in the sequence, and dist is the distance function, which can be calculated based on the Euclidean distance. Figure 7 shows the RPs for two different sample times. The dataset used here is described in Section 4.3. As can be seen in this figure, different times produce different encoded images containing different features.
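A minimal sketch of a recurrence plot computation follows. It assumes two-dimensional states built from consecutive samples, (x_i, x_{i+1}), and an unthresholded Euclidean distance; both choices are assumptions made for illustration, and the function name `recurrence_plot` is hypothetical.

```python
import numpy as np

def recurrence_plot(x):
    """Unthresholded recurrence plot from 2-D states s_i = (x_i, x_{i+1})."""
    x = np.asarray(x, dtype=float)
    # Each row is one state of the phase space trajectory: shape (n - 1, 2).
    states = np.stack([x[:-1], x[1:]], axis=1)
    # Pairwise differences between all states, then Euclidean norms.
    diff = states[:, None, :] - states[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

R = recurrence_plot([0.0, 1.0, 0.0, 1.0, 0.0])
```

For a window of n samples, this yields an (n - 1) by (n - 1) symmetric image with a zero diagonal, in which repeated patterns of the load appear as dark entries away from the diagonal.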

Gramian Angular Field
The Gramian angular field is a method of encoding time series into images. This method uses polar mapping to map time series data into a polar plane (see Figure 8b). To generate the Gramian angular field, first, the given time series $X = \{x_1, \ldots, x_n\}$ should be rescaled into the interval [−1, 1] following the equation below [46]:
$$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)} \tag{5}$$
Then, the rescaled data are represented in polar coordinates using the following equation:
$$\phi_i = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1; \qquad r_i = \frac{t_i}{T} \tag{6}$$
where $t_i$ is the time stamp and T is a constant factor that regularizes the span of the polar coordinate system. After transforming the rescaled time series data, the Gramian angular summation field (GASF) and the Gramian angular difference field (GADF) are defined by considering the sum of or difference between the angles of each pair of points following the equations below:
$$\mathrm{GASF}_{i,j} = \cos(\phi_i + \phi_j) \tag{7}$$
$$\mathrm{GADF}_{i,j} = \sin(\phi_i - \phi_j) \tag{8}$$
Figure 9 shows GASFs for two different sample times. The dataset described in Section 4.3 was used here to generate the data. As can be seen in this figure, different times produce different encoded images containing different features.
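The encoding steps described above can be sketched as follows, assuming the commonly used min-max rescaling to [−1, 1], GASF entries of the form cos(φ_i + φ_j), and GADF entries of the form sin(φ_i − φ_j). The function name `gramian_angular_fields` is hypothetical.

```python
import numpy as np

def gramian_angular_fields(x):
    """Return (GASF, GADF) images for a 1-D series."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("a constant series cannot be rescaled")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = ((x - x_max) + (x - x_min)) / (x_max - x_min)
    scaled = np.clip(scaled, -1.0, 1.0)   # guard against round-off
    # Angular coordinate of each sample in the polar plane.
    phi = np.arccos(scaled)
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

gasf, gadf = gramian_angular_fields([1.0, 2.0, 3.0, 4.0])
```

The GASF is symmetric, while the GADF is antisymmetric with a zero diagonal, so the two fields carry complementary information about the same window.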

Markov Transition Field
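The Markov transition field is defined as the following matrix [46]:
$$M = \begin{bmatrix} w_{ij \mid x_1 \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_1 \in q_i,\, x_n \in q_j} \\ \vdots & \ddots & \vdots \\ w_{ij \mid x_n \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_n \in q_i,\, x_n \in q_j} \end{bmatrix} \tag{9}$$
Dividing the data magnitude into Q quantile bins generates a Q by Q Markov transition matrix. The quantile bins containing the data at time steps i and j are $q_i$ and $q_j$, and $w_{ij}$ in the matrix is the transition probability of $q_i \to q_j$. The encoding procedure of the Markov transition field is depicted in Figure 10.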
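A minimal sketch of this construction is given below, assuming first-order transitions between consecutive samples and equal-probability quantile bins. The function name `markov_transition_field` and the default `n_bins=4` are hypothetical choices for illustration.

```python
import numpy as np

def markov_transition_field(x, n_bins=4):
    """Markov transition field of a 1-D series using quantile bins."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, then a bin index in 0..n_bins-1 per sample.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)
    # Count first-order transitions between consecutive samples.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (bins[:-1], bins[1:]), 1.0)
    # Normalize each row to obtain transition probabilities.
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # M[i, j] is the transition probability from the bin of x_i to the bin of x_j.
    return W[bins[:, None], bins[None, :]]

M = markov_transition_field(np.sin(np.linspace(0, 6 * np.pi, 60)), n_bins=4)
```

The Q by Q transition matrix `W` is thereby spread along the time axis, so the resulting image has one row and one column per time step rather than per bin.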

Proposed Method
In Figure 4, the block diagram of the proposed method is depicted. At first, time series data are encoded into an image, then the generated image is fed into deep CNN to predict the next value of the time series data. Details of the deep 2D CNN used are illustrated in Figure 6.
The CNN used in this study consisted of one 2D convolutional layer, one max pooling layer, and two dense layers. The convolutional layer was comprised of 32 filters with a kernel size of 4. The first fully connected layer consisted of 64 nodes. The last fully connected layer had one node to provide the final estimated value. The rectified linear unit (ReLU) was used as the activation function for both convolutional and dense layers. The RMSprop optimizer was used to compile the model. The maximum number of epochs was set to 100 in the experiments. Eighty percent of the data were used as the training set and the remaining 20 percent as the test set.
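To make the size of this architecture concrete, the sketch below computes the output shape and the number of trainable parameters of each layer. Several details are assumptions not stated in the text: a single-channel square input, 'valid' padding, stride 1, and a 2 by 2 pooling window. The input size of 95 is also hypothetical; it corresponds to a recurrence plot built from a 96-sample window with two-dimensional states.

```python
def cnn_layer_summary(image_size, n_filters=32, kernel=4, pool=2, dense_units=64):
    """Output shapes and trainable-parameter counts for the described 2-D CNN.

    Assumptions: single-channel square input, 'valid' padding, stride 1,
    and a pool x pool max pooling window.
    """
    conv_side = image_size - kernel + 1
    pool_side = conv_side // pool
    flat = pool_side * pool_side * n_filters
    return [
        # (weights per filter + bias) * number of filters
        ("conv2d", (conv_side, conv_side, n_filters), (kernel * kernel + 1) * n_filters),
        # Pooling has no trainable parameters.
        ("max_pool", (pool_side, pool_side, n_filters), 0),
        # (flattened inputs + bias) * number of nodes
        ("dense_64", (dense_units,), (flat + 1) * dense_units),
        # Single output node producing the forecast value.
        ("dense_out", (1,), dense_units + 1),
    ]

summary = cnn_layer_summary(image_size=95)
total_params = sum(params for _, _, params in summary)
```

Under these assumptions, almost all parameters sit in the first dense layer, which shows how strongly the image size drives the memory and computation cost of the model.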

Experiments and Results
In this section, the evaluation criteria are defined in Section 4.1. The datasets used are described in Section 4.2 and Section 4.3. The conducted experiments and comparison results to evaluate the proposed method are discussed in Section 4.4.

Evaluation Criteria
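To evaluate the results quantitatively, three common criteria were used: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). These metrics are defined in Equations (10)-(12):
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|\hat{y}_i - y_i\right| \tag{10}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2} \tag{11}$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N} \left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{12}$$
where $\hat{y}_i$ and $y_i$ are the predicted and true values of the $i$th test sample, respectively, and N is the total number of test samples.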
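The three criteria can be computed with a few lines of NumPy, as in the sketch below, which uses the standard definitions of MAE, RMSE, and MAPE. The function name `forecast_errors` is hypothetical.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Return (MAE, RMSE, MAPE in percent) for paired true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE is undefined when a true value is zero; such samples must be handled upstream.
    mape = 100.0 * np.mean(np.abs(err / y_true))
    return mae, rmse, mape

mae, rmse, mape = forecast_errors([2.0, 4.0], [1.0, 5.0])
```

Because MAPE divides by the true value, periods of very low consumption weigh heavily on it, which is one reason it is reported alongside MAE and RMSE.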

Boston Housing Dataset
Boston housing price data [47] were used for the comparison of the four different signal to image transformations discussed in Section 3.3. This dataset is available online [48]. The results are summarized in Table 1. They indicated that recurrence plots were the best choice in terms of accuracy improvement, so in the next experiment, RPs were used to encode time series into images.

Load Forecasting Dataset
In this paper, the historic data of a single residential customer from [49] are used to assess the performance of the proposed method compared with other methods. This dataset contains 2,075,259 samples measured by a smart meter installed at a house near Sceaux (7 km from Paris, France). The samples were periodically retrieved and recorded from December 2006 to November 2010 (47 months).
We down-sampled the data to 96 samples per day, for every 15 min. This is because in Advanced Metering Infrastructure (AMI), data are typically collected as measurements from the meters, or pushed as commands to devices every 15 min to ensure a wealth of granular data exchange for demand response service. There were some missing values in this dataset, which we replaced by the average of available measurements at that time section in other years [50]. Down-sampled data in conjunction with original high resolution data are plotted in Figure 12.
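The down-sampling step can be sketched as block averaging. The example assumes that the raw readings are taken once per minute (which the sample count over 47 months suggests, but which is not stated explicitly), so that 15 consecutive readings form one 15 min value. The function name `downsample_mean` is hypothetical, and the imputation from other years is not reproduced here.

```python
import numpy as np

def downsample_mean(samples, factor=15):
    """Average consecutive blocks of `factor` samples, ignoring NaNs inside a block."""
    samples = np.asarray(samples, dtype=float)
    usable = len(samples) // factor * factor      # drop an incomplete trailing block
    blocks = samples[:usable].reshape(-1, factor)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=1)
    sums = np.where(valid, blocks, 0.0).sum(axis=1)
    # Blocks with no valid sample stay NaN so they can be imputed later.
    return np.divide(sums, counts, out=np.full(len(blocks), np.nan), where=counts > 0)

one_day = np.ones(1440)        # hypothetical day of 1 min readings
one_day[:15] = np.nan          # a fully missing first quarter-hour
quarter_hourly = downsample_mean(one_day)
```

Blocks that remain NaN after this step mark the intervals that would then be filled with the average of the same time section in other years.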

Results
The performance of the proposed method was evaluated in comparison with three different methods: support vector machine (SVM) [8], artificial neural network (ANN) [9], and convolutional neural network (CNN) [10]. The proposed method based on three different image encoding methods was evaluated, and based on the Boston housing dataset results, recurrence plots were used in this experiment.
The proposed algorithm and the methods used for comparison were implemented using Python, TensorFlow, scikit-learn, and pyts. A CNN consisting of one convolutional layer, one pooling layer, and one fully connected hidden layer followed by a single-node output layer was used in this study. More details of the proposed method are discussed in Section 3.4.
In Figure 13a,b, the predicted and true values of test samples are compared. These figures demonstrate that the 2D method could forecast the next values with a higher accuracy than the 1D CNN. Table 2 shows the comparison of the proposed method, 1D CNN, ANN, and SVM based on MAE, RMSE, and MAPE. As shown, the proposed method outperformed the compared methods in terms of the defined criteria.

Conclusions
In order to apply demand response properly and efficiently, single residential loads should also be predicted. A review of the literature shows that only a limited number of methods have been presented for load forecasting at the individual household level. The accuracy of these methods was also quite low compared to their peers at aggregated levels such as the building or substation level. The lowest MAPE reported in the literature was 12.78% for daily forecasts and 23.31% for hourly forecasts. Applying CNN was shown to improve the performance, especially for 15 min load forecasting. Moreover, deep networks like CNN have shown a high ability to understand images. We therefore proposed a CNN based method that first encodes one-dimensional time series into images. We showed that the RP image encoding technique resulted in the lowest MAPE compared to the GASF, GADF, and MTF methods.
Results based on MAE, RMSE, and MAPE criteria showed that our proposed method using RP performed better (MAPE of 12.54%) in comparison with one-dimensional CNN and other classic machine learning methods, namely SVM and ANN. The proposed method outperforms its 1D counterpart by about 20% in terms of RMSE. However, it should be noted that generating image encoded data would increase the computational complexity of the proposed algorithm, which is a limit for real-time applications.
Author Contributions: conceptualization, A.E. and R.R.; methodology, A.E. and R.R.; software, A.E. and R.R.; validation, A.E. and R.R.; writing-original draft preparation, A.E. and R.R.; writing-review and editing, A.E. and R.R.; visualization, A.E. and R.R. All authors have read and agree to the published version of the manuscript.