M-SRPCNN: A Fully Convolutional Neural Network Approach for Handling Super Resolution Reconstruction on Monthly Energy Consumption Environments

We propose M-SRPCNN, a fully convolutional generative deep neural network that recovers missing historical hourly data of a sensor from its historical monthly energy consumption. The network reconstructs the load profile while preserving the overall monthly consumption, which makes it suitable as an effective replacement for energy apportioning systems. Experiments demonstrate that M-SRPCNN can effectively reconstruct load curves from single monthly overall values, outperforming traditional apportioning systems.


Introduction
Before the arrival of smart meters, it was common for Energy Management Systems (EMS) to store energy consumption information at monthly resolution. This is the de facto situation for old historical data and in scenarios where only monthly invoice data are available. Knowing previous energy consumption in detail is desirable to enable wider-ranging energy studies such as forecasting, appliance identification and virtual submetering, or to improve the quality of the data augmentation usually performed by EMS, which frequently apportion the energy proportionally over the measurement period.
The problem of expanding a single value into several values is known as super-resolution (SR), which was first proposed for images [1]. The same idea was later applied to energy consumption to increase the frequency of the measurements. The field is known as Super Resolution Perception (SRP) or Super Resolution Reconstruction (SRR) [2]. The best approaches to SRP are based on machine-learning techniques, more specifically on deep learning.
Regarding the literature in the field, the reconstruction of hourly load profiles from monthly load profiles is a novel topic that has not yet been sufficiently addressed with deep learning techniques. This paper focuses on using an artificial neural network to reconstruct the hourly load profile from a single, extremely compressed monthly value, which implies a reconstruction ratio of 1:744 (31 days × 24 hours). Our proposal requires the predicted hourly values to add up to the original monthly value, which allows the proposed method to be used as a replacement for energy apportioning systems. We propose a fully convolutional neural network to solve the problem, which we named M-SRPCNN (Monthly-Super Resolution Perception Convolutional Neural Network). The contributions are as follows:
• We are among the first to provide a deep-learning method to up-sample monthly energy consumption measures into hourly resolution.
• A network architecture for energy SRP is provided, which achieves a reconstruction ratio of 1:744 (month to hour).
• The paper also proposes an additional network layer that keeps the total predicted value equal to the original monthly value during training and inference.
• We provide a description and feature engineering approach that fits a wide range of reconstruction ratios.
• We present a comparison with standard interpolation methods that shows the superiority of the proposal.
The paper is structured as follows: Section 1 introduces the problem and related work; Section 2 presents the preliminaries and datasets; Section 3 describes the methodology, covering the pre-processing steps, data description, feature engineering, the M-SRPCNN architecture, and the evaluation metrics; Section 4 reports the experiments and results; and Section 5 presents the conclusions and future work.

Related Work
The first low-to-high resolution reconstruction problems were tackled in the field of Computer Vision. Some of the earliest approaches relied on data interpolation methods that preserved maximum edge information in the reconstruction [1,3,4]. Since 2002, methods based on sparse representations have appeared [5][6][7][8]. Since 2015, improvements in deep-learning generative models have allowed them to surpass earlier methods in both accuracy and visual perception of the reconstructed images in Single Image Super Resolution (SISR) problems. Wang et al. [9] proposed a Convolutional Neural Network (CNN) based on the idea of sparse representations; Dong et al. [10] proposed SRCNN, demonstrating that a single deep CNN can replace the sparse-coding representations traditionally used in SR; and Ledig et al. [11] proposed SRGAN, a Generative Adversarial Network (GAN) with a ResNet-based generator trained with a perceptual loss computed on a pre-trained VGG network.

For energy time series data, SRP or SRR is an emerging field that has gained attention recently, although published research is still scarce. Liu et al. [2] and Liang et al. [12] formulated the SRP problem for energy and proposed SRPCNN, a 1-D fully convolutional network based on the SRCNN [10] architecture adapted for SRP, and Li et al. [13] proposed treating energy consumption as images and using a GAN. At the same time, Kukunuri et al. [14] proposed a CNN with a final fully connected (FC) layer to up-sample lower resolutions with a reconstruction ratio of 1:24 (day to hour). Lu and Jin [15] proposed a CNN and a GAN that respect the overall value in the prediction, and Liu et al. [16] proposed using SRP to reconstruct missing values with a CNN called SRPCNN. Zhang et al. [17] proposed treating consumption as images and using a GAN called SRGAN to reconstruct the higher-resolution load profile; Zhang et al. [18] proposed a GAN called DISRGAN for photovoltaic plants, also treating the consumption as images; Ren et al. [19] proposed a CNN to up-sample low-resolution sources into high-resolution ones; and Wang et al. [20] applied a Graph CNN (GCN) with spatial-temporal convolutions, modelling consumption data as graphs, to reconstruct higher resolutions from lower ones. Other non-linear modelling approaches have proved successful in various fields, such as those of Pozna and Precup [21], Zall and Kangavari [22] and Hedrea et al. [23]; the works of Ahmed et al. [24], Precup et al. [25] and Yuhana et al. [26] also obtained good results on this topic. However, none of the mentioned approaches tackles the scenario where only a single total monthly measurement is available, which requires a reconstruction ratio of 1:744 (month to hour).

Motivation
Historical consumption data are a rich source of information that can help us design efficient energy consumption systems. Studying this information allows us to build models that better capture the inherent structure of consumption data and make predictions based on it. This is especially useful in load forecasting and in fields such as Non-Intrusive Load Monitoring (NILM) [27], as shown by Liu et al. [2]. Most algorithms require historical measurements of sufficient resolution to perform correctly. This excludes from such studies all historical data recorded before monitoring devices provided that resolution, data from places that faced bandwidth transmission problems, and data from systems whose storage capacity did not allow the full series to be kept. Even today, smart meters and energy Internet-of-Things (IoT) devices commonly lack sufficient storage capacity, so they either send their measurements to an EMS or accumulate them as a single totalized measurement, overwriting their memory until the end of the month.
SRP makes it possible to fill measurement gaps in load profiles, or missing historical data for consumers lacking high-resolution load profiles, which can enable studies such as energy disaggregation or energy forecasting. In 2020, with the formulation of SRP [2,12], it was also demonstrated that data reconstructed from low-resolution contexts can be successfully used in NILM studies, but those works were not tested with a reconstruction ratio large enough to cover a single month's overall measurement. Since historical data are more likely to exist in monthly form, in this paper we present a novel approach to restore a load profile at hourly resolution from a load profile at monthly resolution by modelling the problem with a deep neural network, demonstrating that it is possible to estimate a general hourly load profile from monthly values. To the best of our knowledge, this is the first study proposing up-sampling from monthly to hourly resolution.
The low-resolution consumption $L$ and the high-resolution consumption $H$ can be defined as down-sampled versions of an underlying consumption signal $S$, produced by degradation processes with different resolution scales $\alpha$, as shown in Equation (2).
where $A_L \in \mathbb{R}^{t\alpha_L \times t\alpha}$ and $A_H \in \mathbb{R}^{t\alpha_H \times t\alpha}$ are the down-sampling matrices applied to $S$ for $L$ and $H$, respectively; $\alpha_L \in \mathbb{Z}_{\geq 1}$ and $\alpha_H \in \mathbb{Z}_{\geq 1}$ are the resolution scales of $L$ and $H$, respectively; and $n_L \in \mathbb{R}$ and $n_H \in \mathbb{R}$ are the additive down-sampling noise of $L$ and $H$, respectively. $L$ and $H$ are also related by a third down-sampling model, shown in Equation (3).
where $A_{LH} \in \mathbb{R}^{\alpha_L t \times t\alpha_H}$ and $n_{LH}$ is an additive noise term.
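Read together with these definitions, and assuming linear down-sampling with additive noise (as the matrix dimensions suggest), the degradation models of Equations (2) and (3) can be written as below. The last line is only an illustrative special case, assuming the down-sampling simply sums each month's $\alpha_H$ hourly values, with $\otimes$ the Kronecker product, $I_t$ the $t \times t$ identity and $\mathbf{1}_{\alpha_H}$ a vector of ones.

```latex
\begin{aligned}
L &= A_L\,S + n_L, \qquad H = A_H\,S + n_H, \qquad S \in \mathbb{R}^{t\alpha},\\
L &= A_{LH}\,H + n_{LH},\\
A_{LH} &= I_t \otimes \mathbf{1}_{\alpha_H}^{\top} \qquad (\alpha_L = 1,\ \text{monthly totals}).
\end{aligned}
```

In this special case $A_{LH}$ has $t$ rows and $t\alpha_H$ columns, matching the dimensions stated above for $\alpha_L = 1$.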
In the SRP problem, $L$ is given and $H$ is required. As stated by Liu et al. [2], the goal of SRP is to find the optimal reconstruction mapping function $F$ such that $F(L) \approx H$. Since many high-resolution sequences may satisfy the down-sampling model, the problem is ill-posed and $F$ requires constraints, which can be modelled in a Maximum a Posteriori (MAP) estimation framework. Let $\hat{H} = F(L)$. By Bayes' theorem, the posterior probability can be written as shown in Equation (4).
where $p(L \mid H)$ is the likelihood given by the down-sampling model, $p(H)$ is the prior on $H$, and $p(L)$ is constant because $L$ is given. The approximation $\hat{H}$ of $H$ is estimated by solving the MAP problem defined in Equation (5).
Both $p(L \mid H)$ and $p(H)$ can be approximated by modelling the degradation process and the prior information with universal approximators, such as neural networks, trained on high-resolution data.
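Written out under the standard MAP formulation, Equations (4) and (5) take the following form; dropping the constant $p(L)$ and taking logarithms does not change the maximizer.

```latex
\begin{aligned}
p(H \mid L) &= \frac{p(L \mid H)\, p(H)}{p(L)},\\
\hat{H} &= \arg\max_{H}\; p(H \mid L) = \arg\max_{H}\; \big[\log p(L \mid H) + \log p(H)\big].
\end{aligned}
```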

Problem Description
This work targets increasing the resolution of $t$ energy measurements, each taken at the end of a month, over $n$ years. The source time series $L$ has resolution $\alpha_L = 1$, and the target ground-truth time series $H$ has resolution $\alpha_H = 744$, the number of hourly measurements in a 31-day month ($31 \times 24$). The expected reconstruction ratio is therefore 1:744. The additive noise $n_{LH}$ is not modelled explicitly, because the model must conserve the total amount of energy, which is a constraint of energy apportioning systems. The mapping function is implemented by a deep generative neural network with parameters $\theta$, so that it becomes $F(L;\theta)$. The network is trained with the Mean Squared Error (MSE) loss, defined as shown in Equation (6).
The network parameters $\theta$ are optimized by minimizing the MSE loss, as shown in Equation (7).
Optimizing over a large amount of data allows the network to extract prior information from the data itself and encode it in the parameters $\theta$, which is required to solve the MAP problem defined above.
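A standard form of the loss and of the optimization problem in Equations (6) and (7) is given below, assuming $N$ training pairs $(L_i, H_i)$ and averaging over all $t\,\alpha_H$ hourly values of each pair (the normalization constant is an assumption).

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{MSE}}(\theta) &= \frac{1}{N\,t\,\alpha_H} \sum_{i=1}^{N} \big\lVert F(L_i;\theta) - H_i \big\rVert_2^2,\\
\theta^{*} &= \arg\min_{\theta}\; \mathcal{L}_{\mathrm{MSE}}(\theta).
\end{aligned}
```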

Convolutional Neural Networks
CNNs are among the deep-learning architectures suitable for sequence modelling, and they often outperform recurrent neural network (RNN) architectures [28]. They are composed of stacked convolutional layers, which process the input data by sliding filters over it through convolution operations. Each convolutional layer consists of a set of filters, each of which captures local relationships among data points inside a limited window, called the receptive field, whose size equals the filter size. Because the filter slides over the input during the convolution, its weights learn to enhance relevant local features regardless of their position in the input data. Although each filter has a fixed receptive field over the input volume of its own layer, it also has a global receptive field over the network input, which grows with depth; the filters of the last convolutional layer therefore have the largest global receptive field. Sliding-window convolution amounts to sharing the same filter weights across all input positions. This makes CNNs independent of the input size, easy to parallelize, and lighter in parameters than a fully connected network covering the whole input, which reduces the risk of overfitting. Furthermore, dilating the convolutions in each layer, so that filters combine non-consecutive inputs, makes the global receptive field grow exponentially with depth. CNNs can model time series well when their filters enhance temporal rather than spatial features and when the global receptive field covers enough data points; under these conditions they can outperform an RNN on the same task [28].
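The growth of the global receptive field can be made concrete with a short calculation. The sketch below assumes stride-1 layers; the function name and the kernel sizes and dilation rates in the example are illustrative and are not taken from M-SRPCNN.

```python
def global_receptive_field(kernel_sizes, dilations):
    """Global receptive field of a stack of stride-1 convolutions.

    A layer with kernel size k and dilation d widens the field seen at the
    network input by (k - 1) * d positions.
    """
    if len(kernel_sizes) != len(dilations):
        raise ValueError("one dilation rate per layer is required")
    field = 1
    for k, d in zip(kernel_sizes, dilations):
        field += (k - 1) * d
    return field


# Four layers with kernel size 3 (illustrative values, not M-SRPCNN's).
print(global_receptive_field([3] * 4, [1, 1, 1, 1]))  # linear growth: 9
print(global_receptive_field([3] * 4, [1, 2, 4, 8]))  # doubling dilation: 31
```

With kernel size k and dilation 2^i at layer i, L layers cover 1 + (k - 1)(2^L - 1) inputs, so extra depth buys exponential rather than linear coverage. This is the property that makes it cheap to exceed the t = 24 months of the training series.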

Up-Sampling of Temporal Resolution Features
CNNs can process input volumes of various dimensionalities, 1-D, 2-D and 3-D filters being the most common. Unlike SRPCNN [2], which uses 1-D filters, M-SRPCNN uses 2-D filters because the temporal dimension is represented in two dimensions, leaving a third dimension for the energy features. The network input is the energy consumption, with temporal dimensions of size $(t \times \alpha_L)$ and an extra dimension of $f$ energy features. The receptive field of the CNN filters covers a section of both temporal dimensions and, since the last dimension is the channel dimension, which matches the filters' depth, it also covers all the available energy features. Thus, the expected input and output volumes have shapes $(t \times \alpha_L \times f)$ and $(t \times \alpha_H \times 1)$, respectively. Both volumes have the same number of timesteps $t$, but the output must have a higher resolution, $\alpha_H > \alpha_L$. When $\alpha_H \gg \alpha_L$, the network needs many more layers to recover such a large resolution ratio, which slows down training and increases the risk of overfitting. To solve this issue, we found empirically that vectorizing the resolution dimension of size $\alpha_H$ into the shape $(d \times \alpha_H / d)$, with $d \mid \alpha_H$ and $d > 1$, balances the data dimensions better across the layers, leading to a smaller network architecture and better generalization. For this reason, the output of the last convolutional layer has shape $(t \times d \times \alpha_H / d)$. The up-sampling is a sparse process that applies to two dimensions of the input volume at the same time:
• The second temporal dimension, through stacked transposed convolutions, sometimes also referred to as deconvolutional layers.
• The last dimension of the input volume, which corresponds to the channels or energy features.
Stacking deconvolutional layers, each followed by a non-linear activation function, allows the network to learn a non-linear up-sampling of the temporal resolution dimension of size $\alpha_L$, efficiently solving the formulated SRP problem.
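To illustrate how a transposed convolution expands a short sequence, the sketch below implements a naive single-channel 1-D version in pure Python. It shows the operation only; the function name, kernels and strides are hypothetical and do not reproduce M-SRPCNN's 2-D layer configuration.

```python
def conv_transpose_1d(x, kernel, stride=1):
    """Naive single-channel 1-D transposed convolution without padding.

    Every input sample scatters a copy of the kernel, scaled by its value,
    into the output starting at position i * stride, so the output length
    is (len(x) - 1) * stride + len(kernel).
    """
    if stride < 1 or not x or not kernel:
        raise ValueError("need stride >= 1 and non-empty x and kernel")
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


# A single low-resolution value (alpha_L = 1) spread by a length-3 kernel.
print(conv_transpose_1d([2.0], [0.5, 1.0, 0.5]))            # [1.0, 2.0, 1.0]
# Stride 2 doubles the length of a two-sample sequence.
print(conv_transpose_1d([1.0, 2.0], [1.0, 1.0], stride=2))  # [1.0, 1.0, 2.0, 2.0]
```

In a trained network the kernel weights are learned, and the non-linear activation applied after each such layer is what makes the stacked up-sampling non-linear.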
For the problem described in this paper, $\alpha_L = 1$ and $\alpha_H = 744$. We chose $d = 31$, the maximum number of days in a month, which divides $\alpha_H$ ($744 = 31 \times 24$). Hence, the network input volume has shape $(t \times 1 \times f)$, and the output volume has shape $(t \times 744 \times 1)$, which becomes $(t \times 31 \times 24)$ with the vectorization described above.
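The vectorization can be illustrated on a single padded month. The sketch assumes row-major order, so that each row holds one day of 24 hours; the function names are hypothetical.

```python
def vectorize_month(hourly, d=31):
    """Reshape alpha_H hourly values of one month into d rows of alpha_H // d.

    With alpha_H = 744 and d = 31 this gives a (31 x 24) day-by-hour grid.
    """
    alpha_h = len(hourly)
    if d <= 1 or alpha_h % d != 0:
        raise ValueError("d must be a divisor of alpha_H greater than 1")
    width = alpha_h // d
    return [list(hourly[r * width:(r + 1) * width]) for r in range(d)]


def devectorize_month(grid):
    """Inverse reshape: flatten the grid row by row back to alpha_H values."""
    return [value for row in grid for value in row]


hourly = list(range(744))        # hour index 0..743 of a padded month
grid = vectorize_month(hourly)
print(len(grid), len(grid[0]))   # 31 24
print(grid[1][0])                # 24, the first hour of the second day
print([d for d in range(2, 745) if 744 % d == 0])  # admissible values of d
```

Any divisor of 744 greater than 1 is admissible; d = 31 has the advantage that each row lines up with one calendar day.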

SMARKIA SRP Dataset
The SMARKIA SRP Dataset is a private dataset comprising the current consumption data of 45,350 households, collected between May 2018 and May 2020, for a total of 2 years of hourly consumption. The values range between 0 and 1, as they were normalized by dividing by the contracted capacity of each household. Missing values account for 0.056% of the dataset. The dataset is split into two mutually exclusive subsets of households, randomly sampled using a normal distribution. The training set contains 31,919 households (70.4%) and the test set 13,431 households (29.6%).
Training and test set descriptions are summarized in Table 1. More than 75% of the values in the dataset are smaller than 0.1. A few random samples from the dataset are shown in Figure 1.

Data Preprocessing
The source SMARKIA SRP Dataset contains the high-resolution consumption time series $H$ for each household, covering 2 years of hourly historical consumption. We therefore set the number of timesteps to $t = 24$, matching the number of months in the dataset. Since months have different numbers of days, we zero-padded the end of each month with fewer than 31 days so that every month contains the same number of values. Missing values were filled with 0. Since M-SRPCNN requires a low-resolution consumption to up-sample from, we down-sampled $H$ to build the low-resolution consumption time series $L$, as shown in Equation (8).
Here, $\alpha_H = 744$, and each timestep $t$ of $L$ holds the overall consumption of the corresponding month. Since the additive noise $n_{LH} = 0$, it is omitted from the equation. The household energy consumption values in the SMARKIA SRP Dataset are already normalized between 0 and 1.
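A minimal sketch of these steps for one household follows. It assumes the hourly series is already grouped by calendar month and that missing readings are encoded as None or NaN (both assumptions), and it takes the down-sampling of Equation (8) to be a plain sum over the 744 hourly slots of each month, consistent with $L$ holding the overall monthly consumption. The function names are hypothetical.

```python
import calendar
import math

HOURS_PER_DAY = 24
ALPHA_H = 31 * HOURS_PER_DAY  # 744 hourly slots per padded month


def pad_month(hourly, year, month):
    """Fill missing readings with 0 and zero-pad a month to 744 hourly values."""
    n_days = calendar.monthrange(year, month)[1]
    expected = n_days * HOURS_PER_DAY
    if len(hourly) != expected:
        raise ValueError(f"expected {expected} hourly values, got {len(hourly)}")
    cleaned = [0.0 if v is None or math.isnan(v) else float(v) for v in hourly]
    return cleaned + [0.0] * (ALPHA_H - expected)


def downsample(padded_months):
    """Low-resolution series L: one monthly total per timestep (alpha_L = 1)."""
    return [sum(month) for month in padded_months]


# February 2019 (28 days) with a constant hourly consumption of 0.5
# and one missing reading.
feb = [0.5] * (28 * HOURS_PER_DAY - 1) + [None]
padded = pad_month(feb, 2019, 2)
print(len(padded))            # 744
print(downsample([padded]))   # [335.5]
```

Because padding and gap filling both insert zeros, neither step changes the monthly totals that form $L$.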

Temporal Features
We extracted temporal features $F_T$ from the timestamps $T$ of the time series $L$. Since each timestep represents an end-of-month date, we used the sine and cosine of each normalized month value in $T$, as shown in Equation (9). The representation of the temporal features $F_T$ is shown in Figure 2. This encoding places the dates on a continuous cycle, so that the end of the year lies close to its beginning. In addition, it allows the network to generalize to any temporal sequence of data instead of being restricted to the start and end months of the training set, because the network learns filters that relate consumption features to date features. The temporal features are used only in the input data. To fit the input shape required by the network, $F_T$ is reshaped to $(24 \times 1 \times 2)$.
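A sketch of the cyclical encoding follows, assuming months are integers from 1 to 12 normalized as m / 12 before being mapped to an angle of 2π·m/12; the exact normalization in Equation (9) may differ, and the function name is hypothetical.

```python
import math


def temporal_features(months):
    """Cyclical (sin, cos) encoding of calendar months for the input F_T.

    Assumption: months are integers 1..12, normalized as m / 12 and mapped
    onto the unit circle. Returns one [[sin, cos]] entry per timestep,
    i.e. a nested list of shape (t x 1 x 2).
    """
    feats = []
    for m in months:
        if not 1 <= m <= 12:
            raise ValueError(f"month out of range: {m}")
        angle = 2.0 * math.pi * m / 12.0
        feats.append([[math.sin(angle), math.cos(angle)]])
    return feats


# 24 consecutive end-of-month timestamps starting in May.
months = [(4 + i) % 12 + 1 for i in range(24)]
f_t = temporal_features(months)
dec, jan, feb = f_t[7][0], f_t[8][0], f_t[9][0]
print(math.dist(dec, jan), math.dist(jan, feb))  # equal: December sits next to January
```

Every pair of consecutive months is the same distance apart on the circle, including December and January, which is the continuity property described above.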

Consumption Features
Consumption features do not require any normalization, as the underlying data from which they are built are already normalized between 0 and 1. Their construction is shown in Equation (10).
As with the temporal features, the consumption features are reshaped to fit the shapes required by the network: the input in the form $(24 \times 1 \times 1)$ and the output in the form $(24 \times 744 \times 1)$, which is in turn converted to $(24 \times 31 \times 24)$ through the vectorization improvement. The input data are built by stacking $F_L$ and $F_T$ along the last dimension, which gives $f = 3$ input features, while the output data consist of $F_H$.
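The stacking step can be sketched with nested lists standing in for tensors. The function name and the feature values below are hypothetical.

```python
def build_input(f_l, f_t):
    """Stack F_L (t x 1 x 1) and F_T (t x 1 x 2) along the last axis.

    Each timestep becomes [[monthly_total, sin, cos]], so the network input
    has shape (t x 1 x 3), i.e. f = 3 features.
    """
    if len(f_l) != len(f_t):
        raise ValueError("F_L and F_T must have the same number of timesteps")
    return [[l_row[0] + t_row[0]] for l_row, t_row in zip(f_l, f_t)]


f_l = [[[12.5]], [[9.0]]]                # two monthly totals, shape (2 x 1 x 1)
f_t = [[[0.5, 0.866]], [[0.866, 0.5]]]   # their (sin, cos) features, (2 x 1 x 2)
print(build_input(f_l, f_t))  # [[[12.5, 0.5, 0.866]], [[9.0, 0.866, 0.5]]]
```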

Network Architecture
Because the temporal input data are structured in two dimensions, the proposed network is a fully convolutional network (FCN) based on 2-D transposed convolutions that expand the second temporal dimension while applying dilated convolutions along the first temporal dimension. Since a single total monthly measurement carries little information on its own, the network needs to cover measurements spanning several months. For this reason, we designed the network so that its global receptive field is greater than the number of months $t$ in the training set. Input data covering fewer than $t$ months should be zero-padded in the past, while longer inputs are naturally supported by the fully convolutional architecture. Batch normalization [29] is used between convolutional blocks to speed up training and reduce overfitting. The activation function after each convolutional layer is $\mathrm{ReLU}(x) = \max(x, 0)$, except for the last convolutional layer, which uses the non-monotonic $\mathrm{abs}(x) = \max(x, -x)$ so that the network outputs only non-negative values without the risk of dying-ReLU neurons [30]. The last layer of the network, which we called OutputScaler, scales the output to match the total monthly value given in the input, which removes the additive noise and makes the network a suitable replacement for apportioning systems. The network architecture is shown in Figure 3. The input of the network is the stack of the features $F_L$ and $F_T$ computed in Sections 3.2.1 and 3.2.2.
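Zero-padding a shorter input in the past can be sketched as follows. The sketch assumes every feature of a padded month, including the sine and cosine date features, is set to 0; the function name is hypothetical.

```python
def left_pad_input(rows, t=24):
    """Zero-pad a monthly input sequence in the past up to t timesteps.

    rows: list of per-month feature vectors, e.g. [F_L, sin, cos].
    Sequences of t or more months are returned unchanged, since a fully
    convolutional network accepts longer inputs directly.
    """
    if not rows:
        raise ValueError("at least one month is required")
    width = len(rows[0])
    missing = max(t - len(rows), 0)
    return [[0.0] * width for _ in range(missing)] + [list(r) for r in rows]


short = [[10.0, 0.5, 0.866], [8.0, 0.866, 0.5]]
padded = left_pad_input(short, t=4)
print(padded)  # two all-zero months first, then the two real months
```

Padding at the front keeps the most recent months aligned with the end of the sequence, which is where they sit during training.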

Network OutputScaler Layer
We propose the OutputScaler layer as the last layer of the network to ensure that the total amount of energy is conserved. This layer removes the additive noise $n_{LH}$ from the problem, so the network learns how to distribute a fixed total energy consumption at the desired resolution scale without losing energy or adding unwanted energy to the totalized value. Gradients can be back-propagated through the layer, so it is trained end-to-end with the rest of the network.
Let $I$ be the input of the OutputScaler layer and $F_L$ the consumption features. The layer requires $I \geq 0$. Since $I \in \mathbb{R}^{24 \times 31 \times 24}$, the output $O$ of the layer is defined as shown in Equation (11).
The output of this layer is the network output, $\hat{H} = O$.
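The sketch below assumes a per-month normalization, O[m][d][h] = I[m][d][h] · F_L[m] / Σ I[m], which is one form consistent with the description above; it is not necessarily the exact definition in Equation (11). The function names and the eps guard are hypothetical. It also shows why the layer can replace proportional apportioning: a constant input reduces it to uniform apportioning.

```python
def output_scaler(I, f_l, eps=1e-12):
    """Rescale each month of a non-negative (t x 31 x 24) volume to its total.

    Assumed form: O[m][d][h] = I[m][d][h] * f_l[m] / sum_{d,h} I[m][d][h].
    eps only guards against division by zero: a month whose input is all
    zeros stays all zeros, so conservation holds when the sum is positive.
    """
    if len(I) != len(f_l):
        raise ValueError("one monthly total per month is required")
    out = []
    for month, total in zip(I, f_l):
        values = [v for day in month for v in day]
        if any(v < 0 for v in values):
            raise ValueError("OutputScaler requires I >= 0")
        scale = total / (sum(values) + eps)
        out.append([[v * scale for v in day] for day in month])
    return out


def month_sum(month):
    """Total energy of one (31 x 24) month."""
    return sum(v for day in month for v in day)


# A constant input reduces the layer to uniform (proportional) apportioning.
flat = [[[1.0] * 24 for _ in range(31)]]
print(output_scaler(flat, [74.4])[0][0][0])  # ~0.1 per hour

# A non-uniform input keeps its shape but still sums to the monthly total.
shaped = [[[float((d * 24 + h) % 5) for h in range(24)] for d in range(31)]]
print(month_sum(output_scaler(shaped, [10.0])[0]))  # ~10.0
```

In a deep-learning framework the same computation would be written with tensor operations, so that gradients flow through the division back to the preceding abs-activated layer.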

Evaluation and Metrics
The SMARKIA SRP dataset is split into a training set of 31,919 households and a test set of 13,431 households. The network is trained on the training set and evaluated on the test set. The chosen evaluation metrics are the Mean Absolute Error (MAE) (Equation (12)), to measure the general similarity of the time series; the Root Mean Squared Error (RMSE) (Equation (13)), to measure similarity with higher sensitivity to outliers; the Median Absolute Error (MedAE) (Equation (14)), to measure similarity while trimming extreme values and reducing the bias in favour of low forecasts; and the coefficient of determination ($R^2$) (Equation (15)), to measure how well the data fit the regression model.
Since the dataset is already normalized to [0, 1] per household, the RMSE on these data equals the Normalized Root Mean Squared Error (NRMSE) (Equation (16)), because the normalization range $\max(H) - \min(H)$ equals 1. This makes the results comparable to those of other models trained on different datasets.
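The five metrics can be sketched on flattened series as follows. Treating all households, months and hours as one flat sequence, and normalizing the NRMSE by a fixed value range, are assumptions about how Equations (12) to (16) are applied; the function name and example values are hypothetical.

```python
import math
import statistics


def regression_metrics(y_true, y_pred, value_range=1.0):
    """MAE, RMSE, MedAE, R^2 and range-normalized RMSE on flattened series.

    value_range plays the role of max(H) - min(H); with data normalized to
    [0, 1] it is 1 and NRMSE equals RMSE.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {
        "MAE": sum(abs_errors) / n,
        "RMSE": rmse,
        "MedAE": statistics.median(abs_errors),
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "NRMSE": rmse / value_range,
    }


m = regression_metrics([0.0, 0.5, 1.0, 0.5], [0.0, 0.25, 1.0, 0.75])
print(m)  # MAE 0.125, MedAE 0.125, R2 0.75, RMSE = NRMSE ≈ 0.177
```

The MedAE ignores the size of the largest errors entirely, which is why it reacts most strongly to the many near-zero values that dominate this dataset.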

Experiments and Results
We trained M-SRPCNN with the Adam optimizer [31], setting its hyperparameters to $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-7}$ and a learning rate $\sigma = 0.001$, adjusted experimentally. The batch normalization layers used $\epsilon = 0.001$ and a momentum of 0.99. We used Glorot uniform initialization [32] for the network weights. To train the network, we preprocessed the training set of the SMARKIA SRP Dataset using the steps described in Section 3.1, which yielded the low-resolution load profile $L$ and its high-resolution load profile $H$ for the training set. We fed $L$ to the network input and used $H$ as the ground truth. We trained the network on the preprocessed training set for 6000 epochs on an NVIDIA GeForce RTX 2080 Ti, which took 12 h, using the MSE loss defined in Section 2.3. The convergence of the model during training is shown in Figure 4.
To evaluate the algorithm, we preprocessed the test set of the SMARKIA SRP Dataset with the same steps described in Section 3.1, generating the corresponding low-resolution load profile $L$ and its high-resolution ground truth $H$. We fed the test-set $L$ to the model, whose prediction is the high-resolution $\hat{H}$, and compared $\hat{H}$ with the ground truth $H$ using the metrics proposed in Section 3.4. For comparison purposes, we also applied nearest-neighbour, linear and cubic interpolation to the test-set low-resolution load profile $L$ to generate one prediction $\hat{H}$ per method, and evaluated each of them with the same metrics. Table 2 shows that M-SRPCNN outperforms the traditional interpolation methods on all chosen metrics. Using the test set to evaluate the network provides information about the performance of the model on previously unseen data. The small size of the network, the large dataset, and the batch normalization layers acting as regularization significantly reduced the risk of overfitting, which allowed us to train the network for a large number of epochs. Taking cubic interpolation as the baseline, M-SRPCNN improves the MAE by 5%, the MedAE by 15.49%, the NRMSE by 1% and the $R^2$ by 3.39%. The model performs notably better at reconstructing low consumption values, as the MedAE improvement shows. We attribute this to more than 75% of the measurements in the SMARKIA SRP test set being below 0.1.
Unlike cubic interpolation, M-SRPCNN tends to infer the general consumption pattern of the consumer, as can be seen in Figure 5. The heatmap in Figure 6 shows this phenomenon more clearly. Moreover, this generalization of the consumption pattern can help highlight outliers in the consumption data, as shown in Figure 7.

Conclusions
In this paper, we propose a novel application of the SRP problem: the reconstruction of hourly data from monthly data. We propose M-SRPCNN, a deep generative neural network architecture that can reconstruct hourly load profiles from monthly values. When trained on large enough datasets, it can successfully learn useful prior information to reconstruct the hourly load profile. The fully convolutional architecture and the temporal feature engineering allow the network to process variable-length sequences of monthly consumption data whose dates differ from those used during training. The proposed final network layer, which we call OutputScaler, keeps the predicted overall monthly consumption equal to the input monthly value. In addition, we have demonstrated the superiority of this proposal over traditional interpolation methods. This superiority, together with the preservation of overall consumption, makes it suitable as a replacement for energy apportioning systems. M-SRPCNN can enable a wider range of studies by providing input data for fields such as NILM and energy forecasting. We observed that the network learns to reconstruct general consumption patterns rather than outliers, which might also be useful for anomaly detection or as a source of data augmentation for other techniques.

Future Investigations
In future work, we expect to generalize the network to reconstruct at several reconstruction scales at the same time. Currently, the proposed network requires the input to be the overall energy consumption up to the last day of the month; we also expect to generalize this behaviour by making the day on which the month is summarized part of the network input. Finally, we expect this architecture to be tested in NILM studies, since we think the proposed SRP framework is especially well suited for NILM regression, as Liu et al. [2] demonstrated, and in anomaly detection, as stated in this paper.