Electrical Energy Prediction in Residential Buildings for Short-Term Horizons Using Hybrid Deep Learning Strategy

Abstract: Smart grid technology based on renewable energy and energy storage systems is attracting considerable attention as a response to the energy crisis. An accurate and reliable electricity prediction model is considered a key ingredient of a sound energy management policy. Electricity consumption is currently rising rapidly due to population growth and technological development. Therefore, in this study, we established a two-step methodology for residential building load prediction: in the first step, the raw electricity consumption data are refined for effective training; the second step applies a hybrid model that integrates a convolutional neural network (CNN) with a multilayer bidirectional gated recurrent unit (MB-GRU). The CNN layers are incorporated as a feature extractor, while the MB-GRU learns the sequential dependencies in the electricity consumption data. The proposed model is evaluated using the root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) metrics. Assessed over benchmark datasets, it exhibits a substantial drop in error rate compared to other techniques: over the individual household electricity consumption prediction (IHEPC) dataset it reduces RMSE by 5%, MSE by 4%, and MAE by 4%, and over the appliances load prediction (AEP) dataset it reduces RMSE by 2% and MAE by 1%.


Introduction
The electric power industry plays an important role in the economic development of a country, and its reliable operation contributes significantly to societal wellbeing. As reported in [1], global energy consumption is increasing with the sustainable advancement of society; therefore, the effectiveness of electricity consumption prediction needs to be improved [2]. According to the World Energy Outlook 2017, global electricity demand will grow at a compound annual growth rate (CAGR) of 1.0% over the period 2016–2040 [3]. Another report [4] describes that residential buildings generally account for 27% of total energy consumption, whereas buildings in the United States (US) consume 40% of the national energy [5]. Owing to these high consumption levels, efficient management of residential energy use is essential. Proper energy planning is therefore vital for energy saving, which is possible through effective energy consumption prediction models.
Electricity prediction strategies are categorized by forecasting horizon into four types: very-short-, short-, medium-, and long-term [6,7]. Short- and very-short-term predictions refer to minute- and hour-level horizons.

•	The electricity consumption data are gathered from smart meter sensors and contain missing values, redundant values, outliers, etc., caused by faults in the meter sensors, variable weather conditions, abnormal customer consumption patterns, and other factors; such data need to be refined before training the model. Therefore, in this work, the raw input datasets are refined before training to fill missing values and remove outliers. Moreover, because electricity consumption patterns are highly diverse and neural networks are sensitive to input scale, a data normalization technique is applied to bring the dataset into a standard range.
•	Mainstream methods use solo models for electricity consumption prediction, which cannot precisely extract spatiotemporal patterns and suffer from high error rates. Therefore, in this study, we propose a hybrid model combining a CNN with an MB-GRU that improves the accuracy of electricity consumption prediction.
•	The performance of the model was evaluated using the root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE), defined below. The experimental results show that the proposed model substantially decreases the error rate compared to baseline models.
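For reference, these metrics have the standard definitions below, where $y_t$ is the actual consumption, $\hat{y}_t$ the predicted value, and $n$ the number of test samples:

$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2$
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}$
$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\lvert y_t - \hat{y}_t \rvert$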
The main goal of this work is to improve the prediction accuracy of short-term electrical load forecasting in residential buildings, which helps reduce customer consumption and provides economic benefits. The experimental section demonstrates the effectiveness of the proposed method, which outperforms the baseline models.
The remainder of the paper is arranged as follows: Section 2 provides a detailed explanation of the proposed method; Section 3 includes the experimental results of the proposed method and comparison with other state-of-the-art models. Finally, the manuscript is concluded in Section 4.

Proposed Framework
Accurate electricity load prediction is very important for electricity saving and has vital economic implications [37]. As reported in [38], a 1% decrease in the error rate of an electricity prediction model can yield a profit of USD 1.6 million and save 10,000 MW of electricity annually. Accurate load prediction requires an appropriate learning methodology. Electricity load prediction models are learned from historical data generated by smart meter sensors. However, due to weather conditions, meter faults, etc., the sensors sometimes generate abnormal data that should be refined before training. Therefore, this work presents a two-step framework comprising data preprocessing and the proposed hybrid model. The preprocessing step refines the raw electricity consumption data and passes it to the proposed model for learning, as demonstrated in Figure 1; the details of each step are discussed in the following sections.

Data Preprocessing
For better performance of electricity load prediction models, the training data should be analyzed before training. As previously mentioned, the historical electricity consumption data contain abnormalities that affect model performance. In this study, we used the individual household electricity consumption prediction (IHEPC) and appliances load prediction (AEP) datasets, both of which contain missing and outlier values. These abnormalities are removed in the preprocessing step of the proposed framework: missing (NaN) values are filled using interpolation, whereas outliers are handled using the three-sigma rule of thumb [39]. After outlier reduction and missing-value filling, the datasets are normalized using the min–max normalization technique to transform the data into a fixed range that the neural network can learn easily. A minimal sketch of this pipeline is given below.
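The following sketch illustrates one plausible implementation of this preprocessing pipeline using pandas and NumPy; the column name "consumption" and the choice of linear interpolation are assumptions for illustration, as the paper does not specify them.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, col: str = "consumption") -> pd.DataFrame:
    """Fill missing values, clip outliers (three-sigma rule), min-max normalize."""
    s = df[col].astype(float)

    # 1) Fill missing (NaN) meter readings by interpolating between neighbors.
    s = s.interpolate(method="linear", limit_direction="both")

    # 2) Three-sigma rule of thumb: values beyond mean +/- 3*std are treated as
    #    outliers and clipped back to the boundary.
    mu, sigma = s.mean(), s.std()
    s = s.clip(lower=mu - 3 * sigma, upper=mu + 3 * sigma)

    # 3) Min-max normalization into [0, 1], the standard range the network learns.
    s = (s - s.min()) / (s.max() - s.min())

    df[col] = s
    return df
```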

Proposed CNN and MB-GRU Architecture
This work combines a CNN with a multilayered bidirectional GRU (MB-GRU) for short-term electricity load prediction: the CNN layers extract features from the preprocessed input data, and the MB-GRU learns the sequential dependencies between them. A CNN is a neural network architecture that learns in a hierarchical manner, whereby each layer learns progressively more abstract features: the first layers learn primitive representations, intermediate layers learn intermediate abstractions, and the final fully connected layers learn high-level patterns. The depth of the network is defined by the number of such layers; the more layers, the deeper the network and the finer the representations it can learn. A CNN employs alternating layers of convolution and pooling and contains trainable filter banks in each layer. Each filter in a bank, called a kernel, has a fixed receptive field (window) that is scanned over the layer below it to compute an output feature map: the kernel performs a dot product plus a bias as it scans, and the result is fed through an activation function (a rectifier, for example). The output map is then subsampled using sum or max pooling, the latter being more common, to reduce sensitivity to distortions in the upper layers. This process is alternated until the features become specific to the problem at hand; thus, the CNN learns increasingly compact features that can later be used for recognition problems. The last few layers in a typical CNN comprise a fully connected neural network or support vector machine that recognizes different combinations of features from the convolutional layers. The CNN architecture is used in many domains, such as image and video recognition [17,35,40,41], language processing [42,43], electricity load forecasting [44,45], crowd counting [46], etc. In the time series domain, CNN layers are used to extract spatial information, whose output is then passed to sequential learning algorithms such as the RNN, LSTM, and GRU.
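As an illustration of the convolution-plus-pooling scheme described above, the hypothetical Keras snippet below stacks a one-dimensional convolution, a rectifier activation, and max pooling over a univariate load window; the window length of 60 time steps is an assumption for demonstration only.

```python
import tensorflow as tf

# A 60-step univariate consumption window: shape (timesteps, features).
inputs = tf.keras.Input(shape=(60, 1))

# Convolution: 8 kernels of width 3 slide over the window, each producing an
# output feature map; ReLU is the rectifier activation mentioned above.
x = tf.keras.layers.Conv1D(filters=8, kernel_size=3, activation="relu")(inputs)

# Max pooling subsamples each feature map, halving its length and reducing
# sensitivity to small distortions in the upper layers.
x = tf.keras.layers.MaxPooling1D(pool_size=2)(x)

feature_extractor = tf.keras.Model(inputs, x)
feature_extractor.summary()
```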

The RNN [47] is a sequence learning architecture with recurrent (backward) connections among hidden layers that provide a form of memory; it is extensively used in domains such as natural language processing [48], time series analysis [49], speech recognition [50], and visual data processing [51–53]. RNN models generate an output at each time stamp of the input data but suffer from the vanishing gradient problem: a plain RNN forgets long electricity consumption sequences, such as those at 60-min resolution, which leads to the loss of important information.
The problem of losing long-sequence information is addressed by the LSTM through a three-gate mechanism comprising input, output, and forget gates. The mathematical representation of each gate is shown in Equations (1)–(6), where $i_t$, $f_t$, and $o_t$ denote the outputs of the input, forget, and output gates, $\sigma$ represents the activation function, $W$ denotes the gate weights, $h_{t-1}$ refers to the output of the previous LSTM block, and $b$ represents the gate bias:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (1)
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (2)
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (3)
$\widetilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ (4)
$C_t = f_t \times C_{t-1} + i_t \times \widetilde{C}_t$ (5)
$h_t = o_t \times \tanh(C_t)$ (6)

The LSTM structure is complex and computationally expensive because of these gate units and memory cells. To overcome this concern, a lightweight architecture called the GRU [54] was developed, which comprises reset and update gates. The mathematical representation of the GRU gates is shown in Equations (7)–(10), where the update gate $z_t$ examines how much of the earlier cell memory remains active, and the reset gate $r_t$ merges the next cell input sequence with the previous cell memory:

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$ (7)
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$ (8)
$\widetilde{h}_t = \tanh(W_h \cdot [r_t \times h_{t-1}, x_t])$ (9)
$h_t = (1 - z_t) \times h_{t-1} + z_t \times \widetilde{h}_t$ (10)

In this study, we used the MB-GRU, which processes the input sequence in both forward and backward directions [55]. Bidirectional RNN models perform better in several domains, such as classification, summarization [56], and load forecasting [57]. Therefore, we incorporate bidirectional GRU layers that contain both forward and backward layers, where the output sequence of the forward layer is iteratively calculated from the input in positive order, and the output of the backward layer is calculated from the reversed input.

Electricity consumption patterns include both spatial and temporal features, and the solo models deployed by some researchers are insufficient to extract both types of features at once. Therefore, in this work, we established a hybrid model that combines the CNN with the MB-GRU, as shown in Figure 1b. The proposed hybrid model includes an input layer, CNN layers, and bidirectional GRU layers. Two CNN layers were adopted after several experiments with different layer counts and parameters: the first and second CNN layers use 8 and 4 filters, respectively, with a kernel size of 3 × 1 and ReLU as the activation function. After the convolutional layers, two bidirectional GRU layers are incorporated to learn the temporal information of the historical electricity data. Finally, fully connected layers are integrated for the final output prediction. A sketch of this architecture follows.
RNN [47] is a sequence learning architecture with backward connections that include some kind of memory and is extensively used in several dom language processing [48], time series analysis [49], and speech recognition [50], [51][52][53], etc. The RNN models generate output at each time stamp from the in to the vanishing gradient problem. The RNN model forgets the long sequence as 60-min resolution, which leads to loss of important information.
The problem of losing long sequence information is addressed by LSTM mechanism input, output, and forget. The mathematical representation of Equations (1)- (6). In these equations the output of each gates is represented th where "ʘ" represents the activation function. The weights of the gates are rep whereas" Ʈ -1" refers to the output of previous LSTM block and " " represen The LSTM structure is complex and computationally expensive due to these ga cells. To overcome the concern of LSTM, another lightweight architecture is [54] which comprises the reset and update gates. The mathematical represent shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell mem used MB-GRU, which processes the sequence of input data in both backward a [55]. The bidirectional RNN models perform better in several domains summarization [56], and load forecasting [57]. Therefore, in this study, we in GRU layers that contain both backward and forward layers, where the o foreword layer is iteratively calculated through input in the positive sequen backward layer is calculated through the reverse of the input. The electricity consumption patterns include spatial and temporal featu deployed a solo model that is insufficient to extract both types of features at a work, we established a hybrid model that combines CNN with MB-GRU, as sh proposed hybrid model includes an input layer, CNN layers, and bidirectio CNN layers are incorporated after several experiments over different layers an Finally, we select filters of 8 and 4 for the first and second CNN layers wi respectively and used ReLU as an activation function in these layers. After con bidirectional GRU layers are incorporated to learn the temporal informa historical data. Finally, the fully connected layers are integrated for the final o −1 , of features from the convolutional layers. The CNN architecture is used image and video recognition [17,35,40,41], language processing [42,43 [44,45], crowed counting [46], etc. In the time series domain, the CNN lay information and then pass the output into sequential learning algorith GRU.
RNN [47] is a sequence learning architecture with backward conn that include some kind of memory and is extensively used in seve language processing [48], time series analysis [49], and speech recognitio [51][52][53], etc. The RNN models generate output at each time stamp from to the vanishing gradient problem. The RNN model forgets the long seq as 60-min resolution, which leads to loss of important information.
The problem of losing long sequence information is addressed b mechanism input, output, and forget. The mathematical representati Equations (1)- (6). In these equations the output of each gates is represen where "ʘ" represents the activation function. The weights of the gates whereas" Ʈ -1" refers to the output of previous LSTM block and " " re The LSTM structure is complex and computationally expensive due to t cells. To overcome the concern of LSTM, another lightweight architect [54] which comprises the reset and update gates. The mathematical rep shown in Equations (7) and the reset gate merges the next cell input sequence with previous ce used MB-GRU, which processes the sequence of input data in both back [55]. The bidirectional RNN models perform better in several dom summarization [56], and load forecasting [57]. Therefore, in this study, GRU layers that contain both backward and forward layers, where foreword layer is iteratively calculated through input in the positive backward layer is calculated through the reverse of the input. The electricity consumption patterns include spatial and tempor deployed a solo model that is insufficient to extract both types of featur work, we established a hybrid model that combines CNN with MB-GRU proposed hybrid model includes an input layer, CNN layers, and bi CNN layers are incorporated after several experiments over different lay Finally, we select filters of 8 and 4 for the first and second CNN lay respectively and used ReLU as an activation function in these layers. A bidirectional GRU layers are incorporated to learn the temporal in historical data. Finally, the fully connected layers are integrated for the Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in different doma image and video recognition [17,35,40,41], language processing [42,43], electricity load [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are used to ext information and then pass the output into sequential learning algorithms such as RNN GRU. RNN [47] is a sequence learning architecture with backward connections among hid that include some kind of memory and is extensively used in several domains such language processing [48], time series analysis [49], and speech recognition [50], visual data [51][52][53], etc. The RNN models generate output at each time stamp from the input data, w to the vanishing gradient problem. The RNN model forgets the long sequence of electricity as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LSTM using the mechanism input, output, and forget. The mathematical representation of each gate is Equations (1)- (6). In these equations the output of each gates is represented through " ", " where "ʘ" represents the activation function. The weights of the gates are represented th whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents the bias o The LSTM structure is complex and computationally expensive due to these gates' units an cells. To overcome the concern of LSTM, another lightweight architecture is developed c [54] which comprises the reset and update gates. The mathematical representation of GR shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell memory. In this used MB-GRU, which processes the sequence of input data in both backward and forward [55]. 
The bidirectional RNN models perform better in several domains such as cla summarization [56], and load forecasting [57]. Therefore, in this study, we incorporate b GRU layers that contain both backward and forward layers, where the output seque foreword layer is iteratively calculated through input in the positive sequence. The ou backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal features. Some deployed a solo model that is insufficient to extract both types of features at a time. There work, we established a hybrid model that combines CNN with MB-GRU, as shown in Fig proposed hybrid model includes an input layer, CNN layers, and bidirectional GRU l CNN layers are incorporated after several experiments over different layers and different p Finally, we select filters of 8 and 4 for the first and second CNN layers with a kernel respectively and used ReLU as an activation function in these layers. After convolutional bidirectional GRU layers are incorporated to learn the temporal information of the historical data. Finally, the fully connected layers are integrated for the final output predi = O Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in different image and video recognition [17,35,40,41], language processing [42,43], electricity [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are use information and then pass the output into sequential learning algorithms such as GRU.
RNN [47] is a sequence learning architecture with backward connections amo that include some kind of memory and is extensively used in several domain language processing [48], time series analysis [49], and speech recognition [50], visu [51][52][53], etc. The RNN models generate output at each time stamp from the input to the vanishing gradient problem. The RNN model forgets the long sequence of ele as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LSTM usi mechanism input, output, and forget. The mathematical representation of each Equations (1)- (6). In these equations the output of each gates is represented throug where "ʘ" represents the activation function. The weights of the gates are represe whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents the The LSTM structure is complex and computationally expensive due to these gates' u cells. To overcome the concern of LSTM, another lightweight architecture is deve [54] which comprises the reset and update gates. The mathematical representation shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell memory. used MB-GRU, which processes the sequence of input data in both backward and f [55]. The bidirectional RNN models perform better in several domains such summarization [56], and load forecasting [57]. Therefore, in this study, we incorpo GRU layers that contain both backward and forward layers, where the output foreword layer is iteratively calculated through input in the positive sequence. T backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal features. deployed a solo model that is insufficient to extract both types of features at a time. work, we established a hybrid model that combines CNN with MB-GRU, as shown proposed hybrid model includes an input layer, CNN layers, and bidirectional CNN layers are incorporated after several experiments over different layers and diff Finally, we select filters of 8 and 4 for the first and second CNN layers with a respectively and used ReLU as an activation function in these layers. After convolu bidirectional GRU layers are incorporated to learn the temporal information historical data. Finally, the fully connected layers are integrated for the final outpu × tan ( (C Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is us image and video recognition [17,35,40,41], language processing [42 [44,45], crowed counting [46], etc. In the time series domain, the CNN information and then pass the output into sequential learning algo GRU.
RNN [47] is a sequence learning architecture with backward co that include some kind of memory and is extensively used in se language processing [48], time series analysis [49], and speech recogn [51][52][53], etc. The RNN models generate output at each time stamp fr to the vanishing gradient problem. The RNN model forgets the long s as 60-min resolution, which leads to loss of important information.
The problem of losing long sequence information is addressed mechanism input, output, and forget. The mathematical represent Equations (1)- (6). In these equations the output of each gates is repre where "ʘ" represents the activation function. The weights of the gat whereas" Ʈ -1" refers to the output of previous LSTM block and " " The LSTM structure is complex and computationally expensive due t cells. To overcome the concern of LSTM, another lightweight archit [54] which comprises the reset and update gates. The mathematical shown in Equations (7) and the reset gate merges the next cell input sequence with previou used MB-GRU, which processes the sequence of input data in both b [55]. The bidirectional RNN models perform better in several d summarization [56], and load forecasting [57]. Therefore, in this stud GRU layers that contain both backward and forward layers, whe foreword layer is iteratively calculated through input in the positi backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temp deployed a solo model that is insufficient to extract both types of feat work, we established a hybrid model that combines CNN with MB-G proposed hybrid model includes an input layer, CNN layers, and CNN layers are incorporated after several experiments over different Finally, we select filters of 8 and 4 for the first and second CNN l respectively and used ReLU as an activation function in these layers. bidirectional GRU layers are incorporated to learn the temporal historical data. Finally, the fully connected layers are integrated for t ). (6) The problem of losing long sequence information is addressed by LSTM using the three-gate mechanism input, output, and forget. The mathematical representation of each gate is shown in Equations (1)- (6). In these equations the output of each gates is represented through " ", "ê" and "O" where " " represents the activation function. The weights of the gates are represented through "ώ", whereas" Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 12 of features from the convolutional layers. The CNN architecture is used in different domains, such as image and video recognition [17,35,40,41], language processing [42,43], electricity load forecasting [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are used to extract spatial information and then pass the output into sequential learning algorithms such as RNN LSTM and GRU. RNN [47] is a sequence learning architecture with backward connections among hidden layers that include some kind of memory and is extensively used in several domains such as natural language processing [48], time series analysis [49], and speech recognition [50], visual data processing [51][52][53], etc. The RNN models generate output at each time stamp from the input data, which leads to the vanishing gradient problem. The RNN model forgets the long sequence of electricity data, such as 60-min resolution, which leads to loss of important information.
The problem of losing long sequence information is addressed by LSTM using the three-gate mechanism input, output, and forget. The mathematical representation of each gate is shown in Equations (1)- (6). In these equations the output of each gates is represented through " ", "ʄ" and " " where "ʘ" represents the activation function. The weights of the gates are represented through "ώ'', whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents the bias of the gates. The LSTM structure is complex and computationally expensive due to these gates' units and memory cells. To overcome the concern of LSTM, another lightweight architecture is developed called GRU [54] which comprises the reset and update gates. The mathematical representation of GRU gates are shown in Equations (7)-(10) where the update gate examines the earlier cell memory to remain active, and the reset gate merges the next cell input sequence with previous cell memory. In this study, we used MB-GRU, which processes the sequence of input data in both backward and forward directions [55]. The bidirectional RNN models perform better in several domains such as classification, summarization [56], and load forecasting [57]. Therefore, in this study, we incorporate bidirectional GRU layers that contain both backward and forward layers, where the output sequence of the foreword layer is iteratively calculated through input in the positive sequence. The output of the backward layer is calculated through the reverse of the input. The electricity consumption patterns include spatial and temporal features. Some researchers deployed a solo model that is insufficient to extract both types of features at a time. Therefore, in this work, we established a hybrid model that combines CNN with MB-GRU, as shown in Figure 1b. The proposed hybrid model includes an input layer, CNN layers, and bidirectional GRU layers. Two CNN layers are incorporated after several experiments over different layers and different parameters. Finally, we select filters of 8 and 4 for the first and second CNN layers with a kernel size of 3.1, respectively and used ReLU as an activation function in these layers. After convolutional layers, two bidirectional GRU layers are incorporated to learn the temporal information of the electricity historical data. Finally, the fully connected layers are integrated for the final output prediction. −1 " refers to the output of previous LSTM block and " " represents the bias of the gates. The LSTM structure is complex and computationally expensive due to these gates' units and memory cells. To overcome the concern of LSTM, another lightweight architecture is developed called GRU [54] which comprises the reset and update gates. The mathematical representation of GRU gates are shown in Equations (7)-(10) where the update gate examines the earlier cell memory to remain active, Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in different domains, image and video recognition [17,35,40,41], language processing [42,43], electricity load for [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are used to extrac information and then pass the output into sequential learning algorithms such as RNN LS GRU.
RNN [47] is a sequence learning architecture with backward connections among hidde that include some kind of memory and is extensively used in several domains such as language processing [48], time series analysis [49], and speech recognition [50], visual data pro [51][52][53], etc. The RNN models generate output at each time stamp from the input data, whi to the vanishing gradient problem. The RNN model forgets the long sequence of electricity da as 60-min resolution, which leads to loss of important information.
The problem of losing long sequence information is addressed by LSTM using the th mechanism input, output, and forget. The mathematical representation of each gate is sh Equations (1)- (6). In these equations the output of each gates is represented through " ", "ʄ" where "ʘ" represents the activation function. The weights of the gates are represented throu whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents the bias of t The LSTM structure is complex and computationally expensive due to these gates' units and m cells. To overcome the concern of LSTM, another lightweight architecture is developed call [54] which comprises the reset and update gates. The mathematical representation of GRU g shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell memory. In this st used MB-GRU, which processes the sequence of input data in both backward and forward di [55]. The bidirectional RNN models perform better in several domains such as classi summarization [56], and load forecasting [57]. Therefore, in this study, we incorporate bidir GRU layers that contain both backward and forward layers, where the output sequenc foreword layer is iteratively calculated through input in the positive sequence. The outpu backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal features. Some rese deployed a solo model that is insufficient to extract both types of features at a time. Therefore work, we established a hybrid model that combines CNN with MB-GRU, as shown in Figure  proposed hybrid model includes an input layer, CNN layers, and bidirectional GRU laye CNN layers are incorporated after several experiments over different layers and different para Finally, we select filters of 8 and 4 for the first and second CNN layers with a kernel siz Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in dif image and video recognition [17,35,40,41], language processing [42,43], elec [44,45], crowed counting [46], etc. In the time series domain, the CNN layers ar information and then pass the output into sequential learning algorithms su GRU.
RNN [47] is a sequence learning architecture with backward connection that include some kind of memory and is extensively used in several do language processing [48], time series analysis [49], and speech recognition [50] [51-53], etc. The RNN models generate output at each time stamp from the i to the vanishing gradient problem. The RNN model forgets the long sequence as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LST mechanism input, output, and forget. The mathematical representation of Equations (1)- (6). In these equations the output of each gates is represented th where "ʘ" represents the activation function. The weights of the gates are re whereas" Ʈ -1" refers to the output of previous LSTM block and " " represen The LSTM structure is complex and computationally expensive due to these g cells. To overcome the concern of LSTM, another lightweight architecture is [54] which comprises the reset and update gates. The mathematical represen shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell me used MB-GRU, which processes the sequence of input data in both backward [55]. The bidirectional RNN models perform better in several domains summarization [56], and load forecasting [57]. Therefore, in this study, we in GRU layers that contain both backward and forward layers, where the o foreword layer is iteratively calculated through input in the positive seque backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal feat deployed a solo model that is insufficient to extract both types of features at a work, we established a hybrid model that combines CNN with MB-GRU, as s proposed hybrid model includes an input layer, CNN layers, and bidirecti CNN layers are incorporated after several experiments over different layers an Finally, we select filters of 8 and 4 for the first and second CNN layers w −1 , Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used image and video recognition [17,35,40,41], language processing [42,43 [44,45], crowed counting [46], etc. In the time series domain, the CNN la information and then pass the output into sequential learning algorith GRU.
RNN [47] is a sequence learning architecture with backward conn that include some kind of memory and is extensively used in seve language processing [48], time series analysis [49], and speech recognitio [51][52][53], etc. The RNN models generate output at each time stamp from to the vanishing gradient problem. The RNN model forgets the long seq as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed b mechanism input, output, and forget. The mathematical representati Equations (1)- (6). In these equations the output of each gates is represe where "ʘ" represents the activation function. The weights of the gates whereas" Ʈ -1" refers to the output of previous LSTM block and " " re The LSTM structure is complex and computationally expensive due to t cells. To overcome the concern of LSTM, another lightweight architect [54] which comprises the reset and update gates. The mathematical rep shown in Equations (7) and the reset gate merges the next cell input sequence with previous c used MB-GRU, which processes the sequence of input data in both back [55]. The bidirectional RNN models perform better in several dom summarization [56], and load forecasting [57]. Therefore, in this study GRU layers that contain both backward and forward layers, where foreword layer is iteratively calculated through input in the positive backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and tempor deployed a solo model that is insufficient to extract both types of featur work, we established a hybrid model that combines CNN with MB-GR proposed hybrid model includes an input layer, CNN layers, and bi CNN layers are incorporated after several experiments over different lay Finally, we select filters of 8 and 4 for the first and second CNN lay Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in different domains image and video recognition [17,35,40,41], language processing [42,43], electricity load for [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are used to extrac information and then pass the output into sequential learning algorithms such as RNN LS GRU. RNN [47] is a sequence learning architecture with backward connections among hidde that include some kind of memory and is extensively used in several domains such as language processing [48], time series analysis [49], and speech recognition [50], visual data pro [51][52][53], etc. The RNN models generate output at each time stamp from the input data, whi to the vanishing gradient problem. The RNN model forgets the long sequence of electricity da as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LSTM using the th mechanism input, output, and forget. The mathematical representation of each gate is sh Equations (1)- (6). In these equations the output of each gates is represented through " ", "ʄ" where "ʘ" represents the activation function. The weights of the gates are represented throu whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents the bias of t The LSTM structure is complex and computationally expensive due to these gates' units and m cells. To overcome the concern of LSTM, another lightweight architecture is developed call [54] which comprises the reset and update gates. The mathematical representation of GRU g shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell memory. In this st used MB-GRU, which processes the sequence of input data in both backward and forward di [55]. The bidirectional RNN models perform better in several domains such as classi summarization [56], and load forecasting [57]. Therefore, in this study, we incorporate bidir GRU layers that contain both backward and forward layers, where the output sequenc foreword layer is iteratively calculated through input in the positive sequence. The outpu backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal features. Some rese deployed a solo model that is insufficient to extract both types of features at a time. Therefore work, we established a hybrid model that combines CNN with MB-GRU, as shown in Figure  proposed hybrid model includes an input layer, CNN layers, and bidirectional GRU laye Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in dif image and video recognition [17,35,40,41], language processing [42,43], elec [44,45], crowed counting [46], etc. In the time series domain, the CNN layers ar information and then pass the output into sequential learning algorithms su GRU.
RNN [47] is a sequence learning architecture with backward connection that include some kind of memory and is extensively used in several do language processing [48], time series analysis [49], and speech recognition [50] [51-53], etc. The RNN models generate output at each time stamp from the i to the vanishing gradient problem. The RNN model forgets the long sequence as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LST mechanism input, output, and forget. The mathematical representation of Equations (1)- (6). In these equations the output of each gates is represented th where "ʘ" represents the activation function. The weights of the gates are re whereas" Ʈ -1" refers to the output of previous LSTM block and " " represen The LSTM structure is complex and computationally expensive due to these g cells. To overcome the concern of LSTM, another lightweight architecture is [54] which comprises the reset and update gates. The mathematical represen shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell me used MB-GRU, which processes the sequence of input data in both backward [55]. The bidirectional RNN models perform better in several domains summarization [56], and load forecasting [57]. Therefore, in this study, we in GRU layers that contain both backward and forward layers, where the o foreword layer is iteratively calculated through input in the positive seque backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal feat deployed a solo model that is insufficient to extract both types of features at a work, we established a hybrid model that combines CNN with MB-GRU, as s proposed hybrid model includes an input layer, CNN layers, and bidirecti −1 , Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used image and video recognition [17,35,40,41], language processing [42,43 [44,45], crowed counting [46], etc. In the time series domain, the CNN lay information and then pass the output into sequential learning algorith GRU.
RNN [47] is a sequence learning architecture with backward conn that include some kind of memory and is extensively used in seve language processing [48], time series analysis [49], and speech recognitio [51][52][53], etc. The RNN models generate output at each time stamp from to the vanishing gradient problem. The RNN model forgets the long seq as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed b mechanism input, output, and forget. The mathematical representati Equations (1)- (6). In these equations the output of each gates is represen where "ʘ" represents the activation function. The weights of the gates whereas" Ʈ -1" refers to the output of previous LSTM block and " " re The LSTM structure is complex and computationally expensive due to t cells. To overcome the concern of LSTM, another lightweight architect [54] which comprises the reset and update gates. The mathematical rep shown in Equations (7) and the reset gate merges the next cell input sequence with previous c used MB-GRU, which processes the sequence of input data in both back [55]. The bidirectional RNN models perform better in several dom summarization [56], and load forecasting [57]. Therefore, in this study, GRU layers that contain both backward and forward layers, where foreword layer is iteratively calculated through input in the positive backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and tempor deployed a solo model that is insufficient to extract both types of featur work, we established a hybrid model that combines CNN with MB-GR proposed hybrid model includes an input layer, CNN layers, and bi ] + ) (8) Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in diff image and video recognition [17,35,40,41], language processing [42,43], elec [44,45], crowed counting [46], etc. In the time series domain, the CNN layers ar information and then pass the output into sequential learning algorithms su GRU. RNN [47] is a sequence learning architecture with backward connection that include some kind of memory and is extensively used in several do language processing [48], time series analysis [49], and speech recognition [50] [ [51][52][53], etc. The RNN models generate output at each time stamp from the i to the vanishing gradient problem. The RNN model forgets the long sequence as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LSTM mechanism input, output, and forget. The mathematical representation of Equations (1)- (6). In these equations the output of each gates is represented th where "ʘ" represents the activation function. The weights of the gates are rep whereas" Ʈ -1" refers to the output of previous LSTM block and " " represen The LSTM structure is complex and computationally expensive due to these ga cells. To overcome the concern of LSTM, another lightweight architecture is [54] which comprises the reset and update gates. The mathematical represent shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell mem used MB-GRU, which processes the sequence of input data in both backward [55]. The bidirectional RNN models perform better in several domains summarization [56], and load forecasting [57]. Therefore, in this study, we in GRU layers that contain both backward and forward layers, where the o foreword layer is iteratively calculated through input in the positive seque backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal featu deployed a solo model that is insufficient to extract both types of features at a work, we established a hybrid model that combines CNN with MB-GRU, as sh · Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in image and video recognition [17,35,40,41], language processing [42,43], e [44,45], crowed counting [46], etc. In the time series domain, the CNN layer information and then pass the output into sequential learning algorithms GRU.
RNN [47] is a sequence learning architecture with backward connect that include some kind of memory and is extensively used in several language processing [48], time series analysis [49], and speech recognition [ [51][52][53], etc. The RNN models generate output at each time stamp from th to the vanishing gradient problem. The RNN model forgets the long sequen as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by L mechanism input, output, and forget. The mathematical representation Equations (1)- (6). In these equations the output of each gates is represente where "ʘ" represents the activation function. The weights of the gates are whereas" Ʈ -1" refers to the output of previous LSTM block and " " repre The LSTM structure is complex and computationally expensive due to thes cells. To overcome the concern of LSTM, another lightweight architecture [54] which comprises the reset and update gates. The mathematical repres shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell m used MB-GRU, which processes the sequence of input data in both backwa [55]. The bidirectional RNN models perform better in several domai summarization [56], and load forecasting [57]. Therefore, in this study, we GRU layers that contain both backward and forward layers, where th foreword layer is iteratively calculated through input in the positive seq backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal f deployed a solo model that is insufficient to extract both types of features a work, we established a hybrid model that combines CNN with MB-GRU, a −1 , Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is u image and video recognition [17,35,40,41], language processing [42 [44,45], crowed counting [46], etc. In the time series domain, the CNN information and then pass the output into sequential learning algo GRU.
RNN [47] is a sequence learning architecture with backward co that include some kind of memory and is extensively used in se language processing [48], time series analysis [49], and speech recogn [51][52][53], etc. The RNN models generate output at each time stamp f to the vanishing gradient problem. The RNN model forgets the long as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed mechanism input, output, and forget. The mathematical represen Equations (1)- (6). In these equations the output of each gates is repre where "ʘ" represents the activation function. The weights of the ga whereas" Ʈ -1" refers to the output of previous LSTM block and " " The LSTM structure is complex and computationally expensive due t cells. To overcome the concern of LSTM, another lightweight archit [54] which comprises the reset and update gates. The mathematical shown in Equations (7) and the reset gate merges the next cell input sequence with previou used MB-GRU, which processes the sequence of input data in both b [55]. The bidirectional RNN models perform better in several summarization [56], and load forecasting [57]. Therefore, in this stu GRU layers that contain both backward and forward layers, wh foreword layer is iteratively calculated through input in the positi backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temp deployed a solo model that is insufficient to extract both types of fea work, we established a hybrid model that combines CNN with MB-G ] + ) (9) Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in different domains image and video recognition [17,35,40,41], language processing [42,43], electricity load for [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are used to extrac information and then pass the output into sequential learning algorithms such as RNN LS GRU. RNN [47] is a sequence learning architecture with backward connections among hidde that include some kind of memory and is extensively used in several domains such as language processing [48], time series analysis [49], and speech recognition [50], visual data pr [51][52][53], etc. The RNN models generate output at each time stamp from the input data, whi to the vanishing gradient problem. The RNN model forgets the long sequence of electricity da as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LSTM using the th mechanism input, output, and forget. The mathematical representation of each gate is sh Equations (1)- (6). In these equations the output of each gates is represented through " ", "ʄ" where "ʘ" represents the activation function. The weights of the gates are represented throu whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents the bias of t The LSTM structure is complex and computationally expensive due to these gates' units and cells. To overcome the concern of LSTM, another lightweight architecture is developed call [54] which comprises the reset and update gates. The mathematical representation of GRU g shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell memory. In this st used MB-GRU, which processes the sequence of input data in both backward and forward di [55]. The bidirectional RNN models perform better in several domains such as classi summarization [56], and load forecasting [57]. Therefore, in this study, we incorporate bidir GRU layers that contain both backward and forward layers, where the output sequenc foreword layer is iteratively calculated through input in the positive sequence. The outpu backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal features. Some res Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in differe image and video recognition [17,35,40,41], language processing [42,43], electric [44,45], crowed counting [46], etc. In the time series domain, the CNN layers are u information and then pass the output into sequential learning algorithms such GRU. RNN [47] is a sequence learning architecture with backward connections a that include some kind of memory and is extensively used in several doma language processing [48], time series analysis [49], and speech recognition [50], vi [51][52][53], etc. The RNN models generate output at each time stamp from the inpu to the vanishing gradient problem. The RNN model forgets the long sequence of e as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LSTM u mechanism input, output, and forget. The mathematical representation of eac Equations (1)- (6). In these equations the output of each gates is represented throu where "ʘ" represents the activation function. The weights of the gates are repre whereas" Ʈ -1" refers to the output of previous LSTM block and " " represents The LSTM structure is complex and computationally expensive due to these gates cells. To overcome the concern of LSTM, another lightweight architecture is dev [54] which comprises the reset and update gates. The mathematical representati shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell memo used MB-GRU, which processes the sequence of input data in both backward and [55]. The bidirectional RNN models perform better in several domains suc summarization [56], and load forecasting [57]. Therefore, in this study, we incor GRU layers that contain both backward and forward layers, where the outp foreword layer is iteratively calculated through input in the positive sequence backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal feature

)·
Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is used in di image and video recognition [17,35,40,41], language processing [42,43], ele [44,45], crowed counting [46], etc. In the time series domain, the CNN layers a information and then pass the output into sequential learning algorithms s GRU. RNN [47] is a sequence learning architecture with backward connectio that include some kind of memory and is extensively used in several d language processing [48], time series analysis [49], and speech recognition [50 [51][52][53], etc. The RNN models generate output at each time stamp from the to the vanishing gradient problem. The RNN model forgets the long sequenc as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addressed by LST mechanism input, output, and forget. The mathematical representation o Equations (1)- (6). In these equations the output of each gates is represented where "ʘ" represents the activation function. The weights of the gates are r whereas" Ʈ -1" refers to the output of previous LSTM block and " " represe The LSTM structure is complex and computationally expensive due to these cells. To overcome the concern of LSTM, another lightweight architecture i [54] which comprises the reset and update gates. The mathematical represen shown in Equations (7) and the reset gate merges the next cell input sequence with previous cell m used MB-GRU, which processes the sequence of input data in both backward [55]. The bidirectional RNN models perform better in several domains summarization [56], and load forecasting [57]. Therefore, in this study, we i GRU layers that contain both backward and forward layers, where the foreword layer is iteratively calculated through input in the positive sequ backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temporal fea −1 + Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture is u image and video recognition [17,35,40,41], language processing [42 [44,45], crowed counting [46], etc. In the time series domain, the CNN information and then pass the output into sequential learning algo GRU. RNN [47] is a sequence learning architecture with backward co that include some kind of memory and is extensively used in s language processing [48], time series analysis [49], and speech recogn [51][52][53], etc. The RNN models generate output at each time stamp f to the vanishing gradient problem. The RNN model forgets the long as 60-min resolution, which leads to loss of important information. The problem of losing long sequence information is addresse mechanism input, output, and forget. The mathematical represen Equations (1)- (6). In these equations the output of each gates is repr where "ʘ" represents the activation function. The weights of the ga whereas" Ʈ -1" refers to the output of previous LSTM block and " " The LSTM structure is complex and computationally expensive due cells. To overcome the concern of LSTM, another lightweight archi [54] which comprises the reset and update gates. The mathematical shown in Equations (7) and the reset gate merges the next cell input sequence with previou used MB-GRU, which processes the sequence of input data in both b [55]. The bidirectional RNN models perform better in several summarization [56], and load forecasting [57]. Therefore, in this stu GRU layers that contain both backward and forward layers, wh foreword layer is iteratively calculated through input in the posit backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and temp · Appl. Sci. 2020, 10, x FOR PEER REVIEW of features from the convolutional layers. The CNN architecture image and video recognition [17,35,40,41], language processing [44,45], crowed counting [46], etc. In the time series domain, the C information and then pass the output into sequential learning a GRU. RNN [47] is a sequence learning architecture with backwar that include some kind of memory and is extensively used i language processing [48], time series analysis [49], and speech rec [51][52][53], etc. The RNN models generate output at each time stam to the vanishing gradient problem. The RNN model forgets the lo as 60-min resolution, which leads to loss of important informatio The problem of losing long sequence information is addre mechanism input, output, and forget. The mathematical repre Equations (1)- (6). In these equations the output of each gates is r where "ʘ" represents the activation function. The weights of the whereas" Ʈ -1" refers to the output of previous LSTM block and The LSTM structure is complex and computationally expensive d cells. To overcome the concern of LSTM, another lightweight ar [54] which comprises the reset and update gates. The mathemat shown in Equations (7) and the reset gate merges the next cell input sequence with prev used MB-GRU, which processes the sequence of input data in bo [55]. The bidirectional RNN models perform better in sever summarization [56], and load forecasting [57]. Therefore, in this GRU layers that contain both backward and forward layers, foreword layer is iteratively calculated through input in the p backward layer is calculated through the reverse of the input.
The electricity consumption patterns include spatial and t ) (10) and the reset gate merges the next cell input sequence with previous cell memory. In this study, we used MB-GRU, which processes the sequence of input data in both backward and forward directions [55]. The bidirectional RNN models perform better in several domains such as classification, summarization [56], and load forecasting [57]. Therefore, in this study, we incorporate bidirectional GRU layers that contain both backward and forward layers, where the output sequence of the foreword layer is iteratively calculated through input in the positive sequence. The output of the backward layer is calculated through the reverse of the input. The electricity consumption patterns include spatial and temporal features. Some researchers deployed a solo model that is insufficient to extract both types of features at a time. Therefore, in this work, we established a hybrid model that combines CNN with MB-GRU, as shown in Figure 1b. The proposed hybrid model includes an input layer, CNN layers, and bidirectional GRU layers. Two CNN layers are incorporated after several experiments over different layers and different parameters. Finally, we select filters of 8 and 4 for the first and second CNN layers with a kernel size of 3.1, respectively and used ReLU as an activation function in these layers. After convolutional layers, two bidirectional GRU layers are incorporated to learn the temporal information of the electricity historical data. Finally, the fully connected layers are integrated for the final output prediction.
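For concreteness, the following is a minimal Keras sketch of the hybrid architecture described above: two 1D convolutional layers with 8 and 4 filters (kernel sizes 3 and 1) and ReLU activations, followed by two bidirectional GRU layers and fully connected layers for the final prediction. The input window length, the GRU unit counts, and the width of the first dense layer are not specified in the text, so the values below are illustrative assumptions.

```python
# Minimal sketch of the hybrid CNN + multilayer bidirectional GRU (MB-GRU) model.
# Filters, kernel sizes, and ReLU activations follow the text; n_steps,
# GRU unit counts, and the dense width are illustrative assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D, Bidirectional, GRU, Dense

def build_cnn_mbgru(n_steps=60, n_features=1):
    return Sequential([
        Input(shape=(n_steps, n_features)),
        # CNN layers: extract spatial features from the refined input window
        Conv1D(filters=8, kernel_size=3, padding="same", activation="relu"),
        Conv1D(filters=4, kernel_size=1, padding="same", activation="relu"),
        # MB-GRU layers: learn temporal dependencies in both forward and
        # backward directions; return_sequences=True passes the full
        # sequence into the second recurrent layer
        Bidirectional(GRU(32, return_sequences=True)),
        Bidirectional(GRU(16)),
        # Fully connected layers for the final one-step-ahead prediction
        Dense(16, activation="relu"),
        Dense(1),
    ])
```

Note that each Bidirectional wrapper doubles the output dimensionality of its GRU layer, since the forward and backward passes are concatenated.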

Results and Discussion
In this section, we provide a detailed description of the datasets, the evaluation metrics, and the experiments over the IHEPC and AEP datasets, and we compare the results with other baseline models. The model was trained on a GeForce GTX 2060 GPU with 64 GB of RAM using the Keras framework with a TensorFlow backend.
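A training sketch under that setup might look as follows; the optimizer, loss, batch size, and epoch count are not reported in the text, so Adam, an MSE loss, and the values below are assumptions. `X_train` and `y_train` denote windowed sequences of the normalized consumption data (see the preprocessing sketch in the next subsection).

```python
# Hypothetical training configuration -- only the Keras/TensorFlow stack
# is stated in the text; optimizer, loss, batch size, and epochs are assumed.
model = build_cnn_mbgru(n_steps=60, n_features=1)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=50, batch_size=128)
```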

Datasets
The model's performance is assessed on two benchmark datasets, AEP and IHEPC [58,59]. The AEP dataset was recorded over 4.5 months in a residential house at a 10-min resolution. It comprises 29 parameters covering weather information (wind speed, humidity, dew point, temperature, and pressure), light, and appliance energy consumption, as presented in Table 1. The data samples were collected from both indoor and outdoor environments through a wireless sensor network. The building includes 9 indoor and 1 outdoor temperature sensors and 9 humidity sensors, of which 7 are installed in the indoor environment and one in the outdoor environment; the outdoor pressure, visibility, temperature, humidity, and dew point were recorded at a nearby airport. The IHEPC dataset includes 9 parameters: date, time, voltage, global active power (GAP), global intensity, global reactive power (GRP), and three sub-metering readings, as shown in Table 2. It was recorded in a residential house in France between 2006 and 2010 at a one-minute resolution.
Table 1. AEP dataset variables, a short description, and their units.
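As an illustration of how the IHEPC file can be refined before training, the sketch below parses the public UCI file, fills the missing readings (marked with '?'), resamples the 1-min series to hourly means for one-hour-ahead prediction, and applies min-max normalization. The forward-fill and resampling choices are illustrative assumptions rather than the paper's exact refinement procedure.

```python
import pandas as pd

# IHEPC (UCI): semicolon-separated text file; '?' marks missing readings.
df = pd.read_csv("household_power_consumption.txt", sep=";",
                 na_values="?", low_memory=False)
df["dt"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = (df.set_index("dt")
        .drop(columns=["Date", "Time"])
        .apply(pd.to_numeric, errors="coerce")
        .ffill())  # forward-fill as a simple gap-filling assumption

# Resample the 1-min readings to hourly means for one-hour-ahead prediction.
hourly = df["Global_active_power"].resample("1h").mean()

# Min-max normalization brings the series into the [0, 1] range.
norm = (hourly - hourly.min()) / (hourly.max() - hourly.min())
```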

Metrics of Evaluation
To evaluate the performance of the model, we used the RMSE, MSE, and MAE metrics. The mathematical representation of these metrics is depicted in Equations (11)-(13), where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples. RMSE computes the squared difference between each predicted data point and the corresponding actual data point, takes the mean of these squared errors, and finally takes the square root of that mean. MSE calculates the mean squared disparity between the actual and model output values. MAE calculates the mean absolute difference between the actual and predicted values.

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ (11)
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ (12)
$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$ (13)
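These three metrics translate directly into code; the following is a straightforward NumPy rendering of Equations (11)-(13).

```python
import numpy as np

def rmse(y_true, y_pred):
    # Equation (11): square root of the mean of squared errors
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mse(y_true, y_pred):
    # Equation (12): mean of squared errors
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Equation (13): mean of absolute errors
    return np.mean(np.abs(y_true - y_pred))
```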

Experiments over the IHEPC and AEP Datasets and Comparison with Other Models
In this section, we compare the performance of the proposed model with existing models for short-term load prediction (one hour ahead) over the IHEPC and AEP datasets. For the IHEPC dataset, the proposed model achieved 0.42 RMSE, 0.18 MSE, and 0.29 MAE. The prediction performance over the test data is displayed in Figure 2a, and a comparison of the proposed model with other baseline models over the IHEPC dataset for short-term load prediction is shown in Figure 3. In more detail, the performance of the proposed model is compared with [1,17,33-35,60,61] in the short-term horizon. In [60], the authors used a deep learning methodology for residential load prediction and obtained 0.79 RMSE and 0.59 MAE, whereas [33] used a CNN-LSTM hybrid network for short-term residential load prediction and achieved 0.59 RMSE, 0.35 MSE, and 0.33 MAE. RMSE, MSE, and MAE values of 0.47, 0.19, and 0.31 were reported in [17], whereas [34] reports 0.56, 0.31, and 0.34 for these metrics. In [61], the authors achieved 0.38 MSE and 0.39 MAE, whereas [1] reported 0.66 RMSE. Another strategy presented in [35] attained 0.47 RMSE, 0.22 MSE, and 0.33 MAE. Among these results, the proposed model achieved the lowest error rate for short-term electric load prediction. Furthermore, the effectiveness of the proposed model is evaluated over the AEP dataset for a short-term horizon, where it attained 0.31 RMSE, 0.10 MSE, and 0.33 MAE; the prediction results are shown in Figure 2b. Similarly, the effectiveness of the proposed model over the AEP dataset is compared with other baseline models [34,62-64], as shown in Figure 4.

Conclusions
In this study, we established a two-step methodology for short-term load prediction. In the first step, we performed data preprocessing over the raw data to refine it for training. This refinement is important because the historical energy consumption data are generated by smart meter sensors and include abnormalities such as outliers and missing values. These abnormalities are removed from the raw data in this step, and finally, a normalization technique is applied to transform the data into a specific range. The second step is the hybrid model, which is a combination of CNN and multilayer bidirectional GRU (MB-GRU). The CNN layers are incorporated to extract important features from the refined data, while the MB-GRU layers are used to learn the temporal information of the electricity consumption data. The proposed methodology is tested over two challenging datasets and achieves better performance when compared to other methods, as demonstrated in the results section.