Improving the Efficiency of Multistep Short-Term Electricity Load Forecasting via R-CNN with ML-LSTM

Multistep power consumption forecasting is among the most decisive problems in smart grid electricity management. It is also vital for developing operational strategies for electricity management systems in smart cities for commercial and residential users. An efficient electricity load forecasting model is therefore required for accurate electric power management in an intelligent grid, leading to financial benefits for customers. In this article, we develop an innovative framework for short-term electricity load forecasting, which includes two significant phases: data cleaning and a Residual Convolutional Neural Network (R-CNN) with multilayered Long Short-Term Memory (ML-LSTM) architecture. Data preprocessing strategies are applied in the first phase over raw data. A deep R-CNN architecture is developed in the second phase to extract essential features from the refined electricity consumption data. The output of the R-CNN layers is fed into the ML-LSTM network to learn the sequence information, and finally, fully connected layers are used for the forecasting. The proposed model is evaluated over the residential IHEPC and commercial PJM datasets and substantially reduces the error rates compared to baseline models.


Introduction
Electricity load forecasting predicts future load based on single or multiple features or parameters. Features can be of multiple types, such as the current month, hour, weather situation, electricity costs, economic conditions, geographical circumstances, etc. [1]. The importance of electricity load forecasting is increasing significantly due to the development and extension of the energy market, which endorses hourly electricity trading. Profitable market interactions can be enabled by accurate load forecasting [2] and leakage current prediction [3], which help power firms guarantee electricity stability and decrease electricity wastage [4]. Electricity load prediction is handled through short-term electricity load forecasting [5], which is particularly significant due to smart grid development [6].
The United States Energy Information Administration stated that power consumption would increase by up to 28% from 2015 to 2040 [7], while the International Energy Agency stated that buildings and building construction account for approximately 36% of the world's total energy consumption. Stimulating building energy efficiency is vital in a low-carbon economy [8,9]. Accurate energy consumption forecasting is indispensable for buildings' energy-saving design and renovation.

•	The collected benchmark datasets contain many missing values and outliers, which occur due to faulty meters, weather conditions, and abnormal customer consumption. These abnormalities and redundancies in the datasets lead the forecasting network to ambiguous predictions. To resolve this problem, we perform data preprocessing, including outlier removal via the three-sigma rule of thumb, missing-value recovery via NaN interpolation, and normalization of the data using the MinMax scaler.

•	We present a deep R-CNN integrated with ML-LSTM for power forecasting using real power consumption data. The motivation behind R-CNN with ML-LSTM is to extract patterns and time-varying information from the input data for effective forecasting.

•	The proposed model yields the lowest MAE, MSE, RMSE, and MAPE error rates and the highest R² compared to recent literature. Over the hourly IHEPC dataset, the proposed model achieved 0.0447, 0.0132, 0.002, 0.9759, and 1.024 for RMSE, MAE, MSE, R², and MAPE, respectively, and the same values over the daily IHEPC dataset. For the PJM dataset, the proposed model achieved 0.0223, 0.0163, 0.0005, 0.9907, and 0.5504 for RMSE, MAE, MSE, R², and MAPE, respectively. The low error metrics indicate the superiority of the proposed model over state-of-the-art methods.

Literature Review
Short-term load forecasting is an active research area, and numerous studies have been conducted in the literature. These studies are mainly divided into four categories based on the learning algorithm: physical, persistence, artificial intelligence (AI), and statistical. The persistence model can predict future time-series behavior such as electricity consumption, but it fails for several-hour-ahead predictions [12]. Therefore, persistence models are not decisive for electricity forecasting. Physical models are based on mathematical expressions that consider meteorological and historical data. N. Mohan et al. [13] presented a dynamic empirical model for short-term electricity forecasting based on a physical model. These models are also unreliable for electricity forecasting due to the high memory and computational space required [12]. Compared to physical models, statistical models are less computationally expensive [14] and are typically based on autoregressive methods, i.e., GARCH [15], ARIMA [16], and linear regression. These models assume linear data, while electricity consumption is nonlinear; hence, this work is proposed to address the concerns of DNNs and improve the forecasting performance for effective power management.

Proposed Method
The overall architecture of R-CNN with ML-LSTM for short-term electricity load forecasting is shown in Figure 1. A two-stage framework is presented, which includes data preprocessing and the proposed R-CNN with ML-LSTM architecture. Data preprocessing includes filling missing values, removing outliers, and normalizing the data for efficient training. The second step comprises the R-CNN with ML-LSTM architecture, where R-CNN is employed for pattern learning, while the ML-LSTM layers are incorporated to learn the sequential information of electricity consumption data. Each step of the proposed framework is further explained in the following sub-sections.

Data Preprocessing
Smart-meter sensor data contain outliers and missing values for several reasons, such as meter faults, weather conditions, unmanageable supply, storage issues, etc. [39], and must be preprocessed before training. Herein, we apply a unique preprocessing step. For evaluating the proposed method, we used the IHEPC dataset, which includes the above-mentioned erroneous values. In addition, the performance is evaluated over the PJM benchmark dataset. To remove outlier values in the dataset, we used the three-sigma rule of thumb [40] according to Equation (1).
where d_i is an element of the power-consumption vector D at a given resolution, i.e., minute, hour, day, etc. At the same time, AVG(D) is the average of D, and STD(D) represents the standard deviation of D. A recovering interpolation method, presented in Equation (2), is used for missing values.

If d_i is missing or null, we place it as a NaN. The IHEPC dataset was recorded at a one-minute resolution, while the PJM dataset was recorded at a one-hour resolution. For daily load forecasting, the IHEPC dataset is down-sampled into hourly resolution; in down-sampling, the input datasets are resampled into a lower resolution (from minutes to hours). The IHEPC dataset includes 2,075,259 records, down-sampled into 34,588 records for daily load forecasting, as shown in Figure 2.
After data cleaning, we apply a data transformation technique to transform the cleaned data into a format more suitable for effective training. First, we use a power transformation to remove a shift and transform the data into a more Gaussian-like distribution. Power transformations include Box-Cox [41] and Yeo-Johnson [42]. Box-Cox is sensitive to negative values, while Yeo-Johnson supports both negative and positive values. In this work, we used the Box-Cox technique for power transformation to remove a shift from the electricity data distribution. This work uses univariate electricity load forecasting datasets, so the Box-Cox transformation for a single parameter is shown in Equation (3).
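As an illustration, the cleaning steps above (three-sigma outlier removal per Equation (1), interpolation of missing values per Equation (2), and minute-to-hour down-sampling) can be sketched in a few lines of pandas. The series below is synthetic and the handling is illustrative, not the paper's exact implementation.

```python
import numpy as np
import pandas as pd

# Synthetic minute-resolution consumption series with one outlier and
# one missing value (values are illustrative only).
rng = pd.date_range("2020-01-01", periods=120, freq="min")
load = pd.Series(1.0 + 0.1 * np.sin(np.arange(120)), index=rng, name="load")
load.iloc[10] = 50.0          # defective-meter spike
load.iloc[20] = np.nan        # missing reading

# Three-sigma rule of thumb (Equation (1)): values farther than three
# standard deviations from the mean are treated as outliers (set to NaN).
mean, std = load.mean(), load.std()
cleaned = load.mask((load - mean).abs() > 3 * std)

# Recovering interpolation for missing values (Equation (2)).
cleaned = cleaned.interpolate()

# Down-sample from minute to hourly resolution, as done for IHEPC.
hourly = cleaned.resample("H").mean()
```

The same `resample` call with a daily rule would produce the daily-resolution series used for daily load forecasting.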
Finally, the min-max normalization technique converts the data into a specific range because deep learning networks are sensitive to diverse data scales. The equation of min-max normalization is shown in Equation (4).
where d is the actual data, while d_min and d_max are the minimum and maximum values in the dataset.
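A minimal sketch of the two transformation steps, Equations (3) and (4), assuming SciPy's maximum-likelihood Box-Cox fit; the sample values below are illustrative only.

```python
import numpy as np
from scipy import stats

# Illustrative positive-valued load readings (Box-Cox requires d > 0,
# which is why Yeo-Johnson would be needed for negative data).
d = np.array([0.5, 1.2, 2.4, 3.1, 4.8, 7.9, 12.5])

# Power transformation (Equation (3)): Box-Cox with lambda fitted by
# maximum likelihood, shifting the distribution toward Gaussian.
transformed, lam = stats.boxcox(d)

# Min-max normalization (Equation (4)): (d - d_min) / (d_max - d_min),
# scaling the transformed data into [0, 1].
normalized = (transformed - transformed.min()) / (transformed.max() - transformed.min())
```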

R-CNN with ML-LSTM
The proposed architecture integrates R-CNN with ML-LSTM for power load forecasting. R-CNN and ML-LSTM can store the complex fluctuating trends and extract complicated features for electricity load forecasting. First, the R-CNN layers extract patterns, which are then passed to ML-LSTM as input for learning. CNN is a well-known deep learning architecture consisting of four types of layers: convolutional, pooling, fully connected, and regression [43,44]. The convolutional layers include multiple convolution filters, which perform a convolution between the neuron weights and the connected region of the input volume, generating a feature map [45,46]. The basic equation of the convolutional layer operation is shown in Equation (5).

where b_k^l is the bias of the kth convolution filter in the lth layer, X_{m,n}^l denotes the input at location (m, n), and F(·) is the activation function. In the convolution operation, the weights W^l must be shared across the entire input region, known as weight sharing. During model building, weight sharing significantly decreases the calculation time and the number of training parameters. After convolution, the pooling operation is performed. The pooling layer reduces the feature-map resolution to aggregate input features [47,48]. The output of the pooling layer is shown in Equation (6).
where (m, n) ∈ P_{m,n} denotes the pooling region at location (m, n). CNNs have three types of pooling layers: max, min, and average pooling. A general CNN comprises several convolutional and pooling layers. Before the regression, fully connected layers are typically placed, where every neuron in the previous layer is connected to every neuron in the next layer. The main purpose of the fully connected layer is to map the learned feature distribution into a single space for high-level reasoning. The regression layer is the final output of the CNN model. Due to their strong feature-extraction ability, CNN architectures are extensively applied to image classification, video classification, time series, etc. Similarly, in time-series forecasting, these models are used for traffic [49,50], renewables [51], election prediction [52], and power forecasting [53]. Recent image-classification studies show the crucial performance of CNNs; however, as network depth increases beyond a certain level, a degradation problem occurs in which model performance saturates. Experimentation shows that this saturation is an optimization problem and is not caused by overfitting. To address the degradation concern, the R-CNN architecture was developed [54]. A conventional CNN learns the data as a direct mapping F(x), but the R-CNN learns it differently, defined as H(x) = F(x) + x.
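The convolution of Equation (5) (shared weights, bias, and activation) and the pooling aggregation of Equation (6) can be illustrated on a toy 1-D signal; the filter weights and bias below are arbitrary, chosen only for demonstration.

```python
import numpy as np

# Toy 1-D input signal and a shared convolution filter with bias.
x = np.array([0.2, 0.5, 0.9, 0.4, 0.1, 0.7, 0.8, 0.3])
w = np.array([1.0, -1.0, 0.5])   # shared weights W (weight sharing)
b = 0.1                          # bias term b

# Valid convolution: slide the SAME weights over every input region,
# then apply the activation F(.) (here ReLU), as in Equation (5).
conv = np.array([np.dot(x[i:i + 3], w) + b for i in range(len(x) - 2)])
relu = np.maximum(conv, 0.0)

# Max pooling with window 2 (Equation (6)): each region P is reduced
# to its maximum, halving the feature-map resolution.
pooled = relu[: len(relu) // 2 * 2].reshape(-1, 2).max(axis=1)
```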
ResNet solves the degradation problem and achieves satisfactory results on image-recognition data, but electricity consumption is sequential time-series data, and a plain CNN architecture cannot learn its sequential features. Therefore, the R-CNN with ML-LSTM architecture is developed in this research study for future electricity load forecasting. The R-CNN layers extract spatial information from electricity consumption data, and the extracted features are then fed to ML-LSTM as input for temporal learning.
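The two-stage network described above might be sketched as follows in PyTorch; the channel counts, hidden sizes, and 24-step input window are hypothetical choices for illustration, not the paper's reported configuration.

```python
import torch
from torch import nn

class RCNNMLLSTM(nn.Module):
    """Sketch: residual 1-D CNN for pattern extraction, followed by
    stacked (multilayered) LSTMs and fully connected forecast layers."""

    def __init__(self, window=24, horizon=1):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(32, 32, kernel_size=3, padding=1)
        # Two stacked LSTM layers play the role of the ML-LSTM.
        self.lstm = nn.LSTM(32, 64, num_layers=2, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, horizon))

    def forward(self, x):                  # x: (batch, window, 1)
        x = x.transpose(1, 2)              # -> (batch, 1, window)
        h = torch.relu(self.conv1(x))
        f = torch.relu(self.conv2(h))
        h = h + f                          # residual shortcut: H(x) = F(x) + x
        h = h.transpose(1, 2)              # -> (batch, window, 32)
        out, _ = self.lstm(h)              # temporal learning over the sequence
        return self.fc(out[:, -1])         # forecast from the last time step

model = RCNNMLLSTM()
pred = model(torch.randn(8, 24, 1))        # batch of 8 load windows
```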
The output of R-CNN is then forwarded to the ML-LSTM architecture, which is responsible for storing temporal information. The ML-LSTM maintains long-term memory by merging its units to update the earlier hidden state, aiming to understand temporal relationships in the sequence. A three-gate mechanism determines each memory-unit state through multiplication operations: the input gate, output gate, and forget gate represent the gate units in the LSTM. The memory cells are updated with an activation. The operation of each gate in the LSTM is shown in Equations (7)-(9).
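For reference, the standard LSTM gate computations, which Equations (7)-(9) presumably follow, are (with W the weight matrices, b the biases, σ the sigmoid activation, x_t the current input, and h_{t-1} the previous hidden state):

```latex
f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \quad \text{(forget gate)}
i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \quad \text{(input gate)}
o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \quad \text{(output gate)}
c_t = f_t \odot c_{t-1} + i_t \odot \tanh\!\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)
h_t = o_t \odot \tanh(c_t)
```

The element-wise multiplications (⊙) are the "multiplication operations" through which each gate scales what is forgotten, written, and exposed by the memory cell.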

he bias of th convolution filter in the ℎ layer and X m,n ȴ demonstrates the activation function Ƒ(). In the convolution operation, the weights W o ȴ must h the overall input region, known as weight sharing. During model building, aring significantly decreases the cost of calculation time and training param-. After convolution, the pooling operation is performed. The pooling layer re map resolution for input feature aggregation [47,48]. The output of the is shown in Equation (6).
ԑ P m,n , ԑ P m,n is the region of location of , . CNN has three types of pooling in, and average poling. The CNN's general network comprises several cond pooling layers. Before the regression, the fully connected layers are typire every neuron in the previous layer is connected to every other in the next ain purpose of the fully connected layer is to represent the learned feature to a single space for high-level reasoning. The regression layer is the final CNN model. he strong feature extraction ability, CNN architectures are extensively apge classification, video classification, time series, etc. Similarly, in time-series hese models are used for traffic [49,50], renewables [51], election prediction er forecasting [53]. Recent studies of image classification show the crucial of CNNs. As the network depth increases to a certain level, the degradation rs in which the model performance is saturated. The experimentation shows n is an optimization problem that is not caused by overfitting. To address ion concern, R-CNN architecture has been developed [54]. The conventional the data in a linear mechanism, i.e., a direct function Ƒ( ), but the R-CNN rently, defined as ( ) = Ƒ( ) + . Net solves the degradation problem and performs satisfactory results over ition data, but electricity consumption is time series sequential data. The cture cannot learn the sequential features of power consumption data. There-N with ML-LSTM architecture is developed in this research study for future d forecasting. The R-CNN layers extract spatial information from electricity data. The extracted features of R-CNN are then fed to ML-LSTM as input learning. ut of R-CNN is then forwarded to ML-LSTM architecture, that is responsible e information. The ML-LSTM maintains long-term memory by merging its te the earlier hidden state, aiming to understand temporal relationships in . The three gates unit's mechanism is incorporated to determine each state through multiplication operations. 
The input gate, output gate, and present each gate unit in the LSTM. The memory cells are updated with an e operation of each gate in the LSTM can be shown in Equations (7)-(9), and .

R-CNN with ML-LSTM
The proposed architecture integrates R-CNN with ML-LSTM for power load foreasting. R-CNN and ML-LSTM can store the complex fluctuating trends and extract comlicated features for electricity load forecasting. First, the R-CNN layers extract patterns, hich are then passed to ML-LSTM as input for learning. CNN is a well know deep learnng architecture consisting of four types of layers: convolutional, pooling, fully connected, nd regression [43,44]. The convolutional layers include multiple convolution filters, hich perform a convolutional operation between convolutional neuron weights and the nput volume-connected region, which generates a feature map [45,46]. The basic equation f the convolutional layer operation is shown in Equation (5).
here Ƅ k ȴ is the bias of th convolution filter in the ℎ layer and X m,n ȴ demonstrates the ocation and activation function Ƒ(). In the convolution operation, the weights W o ȴ must e shared with the overall input region, known as weight sharing. During model building, he weight sharing significantly decreases the cost of calculation time and training paramter numbers. After convolution, the pooling operation is performed. The pooling layer educes feature map resolution for input feature aggregation [47,48]. The output of the ooling layer is shown in Equation (6).
here ( , ) ԑ P m,n , ԑ P m,n is the region of location of , . CNN has three types of pooling ayers: max, min, and average poling. The CNN's general network comprises several conolutional and pooling layers. Before the regression, the fully connected layers are typially set where every neuron in the previous layer is connected to every other in the next ayer. The main purpose of the fully connected layer is to represent the learned feature istribution to a single space for high-level reasoning. The regression layer is the final utput of the CNN model.
Due to the strong feature extraction ability, CNN architectures are extensively aplied for image classification, video classification, time series, etc. Similarly, in time-series orecasting, these models are used for traffic [49,50], renewables [51], election prediction 52], and power forecasting [53]. Recent studies of image classification show the crucial erformance of CNNs. As the network depth increases to a certain level, the degradation roblem occurs in which the model performance is saturated. The experimentation shows hat saturation is an optimization problem that is not caused by overfitting. To address he degradation concern, R-CNN architecture has been developed [54]. The conventional NN learns the data in a linear mechanism, i.e., a direct function Ƒ( ), but the R-CNN earns it differently, defined as ( ) = Ƒ( ) + .
The ResNet solves the degradation problem and performs satisfactory results over mage recognition data, but electricity consumption is time series sequential data. The NN architecture cannot learn the sequential features of power consumption data. Thereore, the R-CNN with ML-LSTM architecture is developed in this research study for future lectricity load forecasting. The R-CNN layers extract spatial information from electricity onsumption data. The extracted features of R-CNN are then fed to ML-LSTM as input or temporal learning.
The output of R-CNN is then forwarded to ML-LSTM architecture, that is responsible or storing time information. The ML-LSTM maintains long-term memory by merging its nits to update the earlier hidden state, aiming to understand temporal relationships in he sequence. The three gates unit's mechanism is incorporated to determine each emory unit state through multiplication operations. The input gate, output gate, and orget gate represent each gate unit in the LSTM. The memory cells are updated with an ctivation. The operation of each gate in the LSTM can be shown in Equations (7)-(9), and k is the bias of oth convolution filter in the lth layer and X

R-CNN with ML-LSTM
The proposed architecture integrates R-CNN with ML-LSTM for pow casting. R-CNN and ML-LSTM can store the complex fluctuating trends and plicated features for electricity load forecasting. First, the R-CNN layers ext which are then passed to ML-LSTM as input for learning. CNN is a well know ing architecture consisting of four types of layers: convolutional, pooling, ful and regression [43,44]. The convolutional layers include multiple convol which perform a convolutional operation between convolutional neuron we input volume-connected region, which generates a feature map [45,46]. The b of the convolutional layer operation is shown in Equation (5). where Ƅ k ȴ is the bias of th convolution filter in the ℎ layer and X m,n ȴ dem location and activation function Ƒ(). In the convolution operation, the weig be shared with the overall input region, known as weight sharing. During mo the weight sharing significantly decreases the cost of calculation time and tra eter numbers. After convolution, the pooling operation is performed. The p reduces feature map resolution for input feature aggregation [47,48]. The o pooling layer is shown in Equation (6). where ( , ) ԑ P m,n , ԑ P m,n is the region of location of , . CNN has three typ layers: max, min, and average poling. The CNN's general network comprises volutional and pooling layers. Before the regression, the fully connected lay cally set where every neuron in the previous layer is connected to every oth layer. The main purpose of the fully connected layer is to represent the lea distribution to a single space for high-level reasoning. The regression laye output of the CNN model. Due to the strong feature extraction ability, CNN architectures are ex plied for image classification, video classification, time series, etc. Similarly, i forecasting, these models are used for traffic [49,50], renewables [51], electio [52], and power forecasting [53]. Recent studies of image classification sho performance of CNNs. 
As the network depth increases to a certain level, the problem occurs in which the model performance is saturated. The experimen that saturation is an optimization problem that is not caused by overfitting the degradation concern, R-CNN architecture has been developed [54]. The CNN learns the data in a linear mechanism, i.e., a direct function Ƒ( ), bu learns it differently, defined as ( ) = Ƒ( ) + .
The ResNet solves the degradation problem and performs satisfactory image recognition data, but electricity consumption is time series sequent CNN architecture cannot learn the sequential features of power consumption fore, the R-CNN with ML-LSTM architecture is developed in this research stu electricity load forecasting. The R-CNN layers extract spatial information fro consumption data. The extracted features of R-CNN are then fed to ML-LS for temporal learning.
The output of R-CNN is then forwarded to ML-LSTM architecture, that i for storing time information. The ML-LSTM maintains long-term memory b units to update the earlier hidden state, aiming to understand temporal rel the sequence. The three gates unit's mechanism is incorporated to det memory unit state through multiplication operations. The input gate, outp forget gate represent each gate unit in the LSTM. The memory cells are upd activation. The operation of each gate in the LSTM can be shown in Equation

Data Preprocessing
Smart meter sensor-based data generation contains outliers and missing values for several reasons, such as meter faults, weather conditions, unmanageable supply, storage issues, etc. [39], and must be preprocessed before training. Herein, we apply a unique preprocessing step. For evaluating the proposed method, we used the IHEPC dataset, which includes the above-mentioned erroneous values. In addition, the performance is evaluated over the PJM benchmark dataset. To remove the outlier values in the dataset, we use the three-sigma rule of thumb [40] according to Equation (1).

|x_t − mean(x)| > 3σ(x)    (1)

where x is a vector, or superset, of values representing power consumption in a duration, i.e., minute, hour, day, etc. At the same time, mean(x) is the average of x and σ(x) is its standard deviation; values satisfying Equation (1) are treated as outliers.
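As an illustration, the three-sigma screen of Equation (1) can be sketched in a few lines of plain Python. The replace-with-mean policy and the sample readings below are assumptions for this example; the paper only states that outliers are removed via the three-sigma rule.

```python
from statistics import mean, stdev

def clean_series(x):
    """Replace three-sigma outliers (Eq. 1) with the series mean.

    A value x_t is treated as an outlier when |x_t - mean(x)| > 3*sigma(x).
    Replacing by the mean is an illustrative choice, not the paper's stated
    policy.
    """
    mu, sigma = mean(x), stdev(x)
    return [mu if abs(v - mu) > 3 * sigma else v for v in x]

# Hourly consumption (kWh) with one meter glitch (999.0) at the end
readings = [1.2, 1.4, 1.3, 1.5, 1.4, 1.3, 1.2, 1.5, 1.4, 1.3] * 2 + [999.0]
cleaned = clean_series(readings)  # glitch replaced, normal values untouched
```

Note that with very short series a single spike cannot exceed three sample standard deviations (the maximum z-score is (n−1)/√n), so the rule needs a reasonably long window to be effective.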


R-CNN with ML-LSTM
The proposed architecture integrates R-CNN with ML-LSTM for power load forecasting. R-CNN and ML-LSTM can store the complex fluctuating trends and extract complicated features for electricity load forecasting. First, the R-CNN layers extract patterns, which are then passed to ML-LSTM as input for learning. CNN is a well-known deep learning architecture consisting of four types of layers: convolutional, pooling, fully connected, and regression [43,44]. The convolutional layers include multiple convolution filters, which perform a convolutional operation between the convolutional neuron weights and the connected region of the input volume, generating a feature map [45,46]. The basic equation of the convolutional layer operation is shown in Equation (5).

X^l_{m,n,k} = F(Σ_o W^l_o · X^{l−1}_{m+o,n} + b^l_k)    (5)

where b^l_k is the bias of the kth convolution filter in the lth layer, X^l_{m,n,k} denotes the output at location (m, n), and F(·) is the activation function. In the convolution operation, the weights W^l_o must be shared over the whole input region, which is known as weight sharing. During model building, weight sharing significantly decreases the calculation time and the number of training parameters. After convolution, the pooling operation is performed. The pooling layer reduces the feature map resolution for input feature aggregation [47,48]. The output of the pooling layer is shown in Equation (6).

P_{m,n,o} = max_{(i,j) ∈ ε_{m,n}} X_{i,j,o}    (6)

where (i, j) ∈ ε_{m,n} and ε_{m,n} is the pooling region at location (m, n). CNN has three types of pooling layers: max, min, and average pooling. The general CNN network comprises several convolutional and pooling layers. Before the regression, the fully connected layers are typically set, where every neuron in the previous layer is connected to every neuron in the next layer. The main purpose of the fully connected layer is to map the learned feature distribution to a single space for high-level reasoning. The regression layer is the final output of the CNN model.

Due to their strong feature extraction ability, CNN architectures are extensively applied to image classification, video classification, time series, etc. Similarly, in time-series forecasting, these models are used for traffic [49,50], renewables [51], election prediction [52], and power forecasting [53]. Recent studies of image classification show the crucial performance of CNNs. As the network depth increases to a certain level, the degradation problem occurs, in which the model performance saturates. Experimentation shows that this saturation is an optimization problem and is not caused by overfitting. To address the degradation concern, the R-CNN architecture was developed [54]. The conventional CNN learns the data in a linear mechanism, i.e., a direct function F(a), but the R-CNN learns it differently, defined as H(a) = F(a) + a.
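A minimal plain-Python sketch of Equations (5) and (6) may help: one shared-weight filter slides over a load window (convolution), and non-overlapping max pooling then reduces the feature-map resolution. The filter weights and sample load values are illustrative assumptions, not the paper's configuration.

```python
def conv1d(x, w, b, stride=1):
    """Valid 1-D convolution: a single filter w (shared weights) slid over x,
    plus bias b, in the spirit of Equation (5)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(0, len(x) - k + 1, stride)]

def max_pool1d(x, size):
    """Non-overlapping max pooling over regions of `size` (Equation (6) style);
    a trailing partial region is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

load = [2.0, 2.1, 2.3, 3.0, 4.2, 4.0, 3.1, 2.5]       # toy load window
fmap = conv1d(load, w=[0.5, -0.5], b=0.0)             # local-difference filter
pooled = max_pool1d(fmap, size=2)                     # aggregated features
```

The single weight vector `w` is reused at every position, which is exactly the weight sharing the text credits with reducing training parameters.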
The ResNet solves the degradation problem and achieves satisfactory results on image recognition data, but electricity consumption is time-series sequential data, and the CNN architecture alone cannot learn the sequential features of power consumption data. Therefore, the R-CNN with ML-LSTM architecture is developed in this research study for future electricity load forecasting. The R-CNN layers extract spatial information from the electricity consumption data. The extracted features of the R-CNN are then fed to the ML-LSTM as input for temporal learning.
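The residual mapping H(a) = F(a) + a can be sketched for a 1-D signal. With zero filter weights the block reduces to the identity, which is what makes deep residual stacks easy to optimize. The 'same'-padded single-filter layer below is an assumption for illustration, not the paper's exact R-CNN block.

```python
def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, u) for u in v]

def residual_block(x, w, b):
    """Residual unit H(x) = F(x) + x for a 1-D sequence.

    F is a 'same'-padded single-filter convolution followed by ReLU, so its
    output length matches x and the identity shortcut can be added directly.
    """
    k = len(w)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad          # zero padding
    f = [sum(w[j] * xp[i + j] for j in range(k)) + b  # convolution F(x)
         for i in range(len(x))]
    return [fi + xi for fi, xi in zip(relu(f), x)]    # F(x) + x

window = [1.0, 2.0, 3.0, 2.0]
out = residual_block(window, w=[0.1, 0.8, 0.1], b=0.0)
```

Because the shortcut is parameter-free, the block only has to learn the residual F; if the best mapping is (near-)identity, the optimizer merely drives the filter weights toward zero.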
The output of the R-CNN is then forwarded to the ML-LSTM architecture, which is responsible for storing time information. The ML-LSTM maintains long-term memory by merging its units to update the earlier hidden state, aiming to understand temporal relationships in the sequence. A three-gate mechanism is incorporated to determine each memory unit state through multiplication operations: the input gate, the output gate, and the forget gate. The memory cells are updated with an activation. The operation of each gate in the LSTM is shown in Equations (7)–(9).

i_t = ∂(w_i · [h_{t−1}, x_t] + b_i)    (7)
f_t = ∂(w_f · [h_{t−1}, x_t] + b_f)    (8)
o_t = ∂(w_o · [h_{t−1}, x_t] + b_o)    (9)

The output of each gate is represented by the i, f, and o notation, while ∂ is the activation function, w represents the weight, and b is the bias.
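Equations (7)–(9) can be traced for a single scalar cell. Here sigmoid stands in for the activation ∂, and scalar weights replace the usual weight matrices, so this is only an illustrative sketch of the gate arithmetic, not the paper's ML-LSTM implementation.

```python
import math

def sigmoid(z):
    """Logistic activation, the conventional choice for LSTM gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_gates(h_prev, x_t, w, b):
    """One step of Eqs. (7)-(9) for a scalar cell: each gate applies the
    activation to a weighted combination of the previous hidden state and
    the current input, plus a bias."""
    i_t = sigmoid(w["i"][0] * h_prev + w["i"][1] * x_t + b["i"])  # input gate
    f_t = sigmoid(w["f"][0] * h_prev + w["f"][1] * x_t + b["f"])  # forget gate
    o_t = sigmoid(w["o"][0] * h_prev + w["o"][1] * x_t + b["o"])  # output gate
    return i_t, f_t, o_t

w = {"i": (0.5, 0.5), "f": (0.4, 0.6), "o": (0.3, 0.7)}
b = {"i": 0.0, "f": 1.0, "o": 0.0}  # positive forget bias is a common default
gates = lstm_gates(h_prev=0.1, x_t=0.8, w=w, b=b)
```

Each gate outputs a value in (0, 1) that multiplicatively scales how much of the candidate input, the previous cell state, and the cell output is kept, which is how the memory cell preserves long-range information.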

R-CNN with ML-LSTM
The proposed architecture integrates R-CNN with ML-LSTM for power load forecasting. R-CNN and ML-LSTM can store the complex fluctuating trends and extract complicated features for electricity load forecasting. First, the R-CNN layers extract patterns, which are then passed to ML-LSTM as input for learning. CNN is a well know deep learning architecture consisting of four types of layers: convolutional, pooling, fully connected, and regression [43,44]. The convolutional layers include multiple convolution filters, which perform a convolutional operation between convolutional neuron weights and the input volume-connected region, which generates a feature map [45,46]. The basic equation of the convolutional layer operation is shown in Equation (5). where Ƅ k ȴ is the bias of th convolution filter in the ℎ layer and X m,n ȴ demonstrates the location and activation function Ƒ(). In the convolution operation, the weights W o ȴ must be shared with the overall input region, known as weight sharing. During model building, the weight sharing significantly decreases the cost of calculation time and training parameter numbers. After convolution, the pooling operation is performed. The pooling layer reduces feature map resolution for input feature aggregation [47,48]. The output of the pooling layer is shown in Equation (6).
where ( , ) ԑ P m,n , ԑ P m,n is the region of location of , . CNN has three types of pooling layers: max, min, and average poling. The CNN's general network comprises several convolutional and pooling layers. Before the regression, the fully connected layers are typically set where every neuron in the previous layer is connected to every other in the next layer. The main purpose of the fully connected layer is to represent the learned feature distribution to a single space for high-level reasoning. The regression layer is the final output of the CNN model. Due to the strong feature extraction ability, CNN architectures are extensively applied for image classification, video classification, time series, etc. Similarly, in time-series forecasting, these models are used for traffic [49,50], renewables [51], election prediction [52], and power forecasting [53]. Recent studies of image classification show the crucial performance of CNNs. As the network depth increases to a certain level, the degradation problem occurs in which the model performance is saturated. The experimentation shows that saturation is an optimization problem that is not caused by overfitting. To address the degradation concern, R-CNN architecture has been developed [54]. The conventional CNN learns the data in a linear mechanism, i.e., a direct function Ƒ( ), but the R-CNN learns it differently, defined as ( ) = Ƒ( ) + .
The ResNet solves the degradation problem and performs satisfactory results over image recognition data, but electricity consumption is time series sequential data. The CNN architecture cannot learn the sequential features of power consumption data. Therefore, the R-CNN with ML-LSTM architecture is developed in this research study for future electricity load forecasting. The R-CNN layers extract spatial information from electricity consumption data. The extracted features of R-CNN are then fed to ML-LSTM as input for temporal learning.
The output of R-CNN is then forwarded to ML-LSTM architecture, that is responsible for storing time information. The ML-LSTM maintains long-term memory by merging its units to update the earlier hidden state, aiming to understand temporal relationships in the sequence. The three gates unit's mechanism is incorporated to determine each memory unit state through multiplication operations. The input gate, output gate, and forget gate represent each gate unit in the LSTM. The memory cells are updated with an activation. The operation of each gate in the LSTM can be shown in Equations (7)

R-CNN with ML-LSTM
The proposed architecture integrates R-CNN with ML-LSTM for power load forecasting. R-CNN and ML-LSTM can store the complex fluctuating trends and extract complicated features for electricity load forecasting. First, the R-CNN layers extract patterns, which are then passed to ML-LSTM as input for learning. CNN is a well know deep learning architecture consisting of four types of layers: convolutional, pooling, fully connected, and regression [43,44]. The convolutional layers include multiple convolution filters, which perform a convolutional operation between convolutional neuron weights and the input volume-connected region, which generates a feature map [45,46]. The basic equation of the convolutional layer operation is shown in Equation (5). where Ƅ k ȴ is the bias of th convolution filter in the ℎ layer and X m,n ȴ demonstrates the location and activation function Ƒ(). In the convolution operation, the weights W o ȴ must be shared with the overall input region, known as weight sharing. During model building, the weight sharing significantly decreases the cost of calculation time and training parameter numbers. After convolution, the pooling operation is performed. The pooling layer reduces feature map resolution for input feature aggregation [47,48]. The output of the pooling layer is shown in Equation (6).
where ( , ) ԑ P m,n , ԑ P m,n is the region of location of , . CNN has three types of pooling layers: max, min, and average poling. The CNN's general network comprises several convolutional and pooling layers. Before the regression, the fully connected layers are typically set where every neuron in the previous layer is connected to every other in the next layer. The main purpose of the fully connected layer is to represent the learned feature distribution to a single space for high-level reasoning. The regression layer is the final output of the CNN model. Due to the strong feature extraction ability, CNN architectures are extensively applied for image classification, video classification, time series, etc. Similarly, in time-series forecasting, these models are used for traffic [49,50], renewables [51], election prediction [52], and power forecasting [53]. Recent studies of image classification show the crucial performance of CNNs. As the network depth increases to a certain level, the degradation problem occurs in which the model performance is saturated. The experimentation shows that saturation is an optimization problem that is not caused by overfitting. To address the degradation concern, R-CNN architecture has been developed [54]. The conventional CNN learns the data in a linear mechanism, i.e., a direct function Ƒ( ), but the R-CNN learns it differently, defined as ( ) = Ƒ( ) + .
The ResNet solves the degradation problem and performs satisfactory results over image recognition data, but electricity consumption is time series sequential data. The CNN architecture cannot learn the sequential features of power consumption data. Therefore, the R-CNN with ML-LSTM architecture is developed in this research study for future electricity load forecasting. The R-CNN layers extract spatial information from electricity consumption data. The extracted features of R-CNN are then fed to ML-LSTM as input for temporal learning.
The output of R-CNN is then forwarded to ML-LSTM architecture, that is responsible for storing time information. The ML-LSTM maintains long-term memory by merging its units to update the earlier hidden state, aiming to understand temporal relationships in the sequence. The three gates unit's mechanism is incorporated to determine each memory unit state through multiplication operations. The input gate, output gate, and forget gate represent each gate unit in the LSTM. The memory cells are updated with an activation. The operation of each gate in the LSTM can be shown in Equations (7)-(9), and i,j,o (6) where (i, j) ε P m,n , ε P m,n is the region of location of i, j. CNN has three types of pooling layers: max, min, and average poling. The CNN's general network comprises several convolutional and pooling layers. Before the regression, the fully connected layers are typically set where every neuron in the previous layer is connected to every other in the next layer. The main purpose of the fully connected layer is to represent the learned feature distribution to a single space for high-level reasoning. The regression layer is the final output of the CNN model.
Due to the strong feature extraction ability, CNN architectures are extensively applied for image classification, video classification, time series, etc. Similarly, in time-series forecasting, these models are used for traffic [49,50], renewables [51], election prediction [52], and power forecasting [53]. Recent studies of image classification show the crucial performance of CNNs. As the network depth increases to a certain level, the degradation problem occurs in which the model performance is saturated. The experimentation shows that saturation is an optimization problem that is not caused by overfitting. To address the degradation concern, R-CNN architecture has been developed [54]. The conventional CNN learns the data in a linear mechanism, i.e., a direct function of the proposed framework is further explained in the fo

Data Preprocessing
Smart meter sensors-based data generation contains several reasons, such as meter faults, weather conditions issues, etc. [39], and must be preprocessed before train preprocessing step. For evaluating the proposed method includes the above-mentioned erroneous values. In add ated over the PJM benchmark dataset. To remove the outl three sigma rules [40] of thumb according to Equation (1 is a vector or superset of representing a valu ration, i.e., minute, hour, day, etc. At the same time, (a), but the R-CNN learns it differently, defined as H(a) = corporated to learn the sequential information of electricity consumption data. Each st of the proposed framework is further explained in the following sub-sections.

Data Preprocessing
Smart meter sensors-based data generation contains outliers and missing values several reasons, such as meter faults, weather conditions, unmanageable supply, stora issues, etc. [39], and must be preprocessed before training. Herein, we apply a uniq preprocessing step. For evaluating the proposed method, we used IHEC Dataset, wh includes the above-mentioned erroneous values. In addition, the performance is eva ated over the PJM benchmark dataset. To remove the outlier values in the dataset, we us three sigma rules [40] of thumb according to Equation (1).
is a vector or superset of representing a value for power consumption in d ration, i.e., minute, hour, day, etc. At the same time, ( ) is the average of , a (a) + a. The ResNet solves the degradation problem and performs satisfactory results over image recognition data, but electricity consumption is time series sequential data. The CNN architecture cannot learn the sequential features of power consumption data. Therefore, the R-CNN with ML-LSTM architecture is developed in this research study for future electricity load forecasting. The R-CNN layers extract spatial information from electricity consumption data. The extracted features of R-CNN are then fed to ML-LSTM as input for temporal learning.
The output of the R-CNN is then forwarded to the ML-LSTM architecture, which is responsible for storing time information. The ML-LSTM maintains long-term memory by merging its units to update the earlier hidden state, aiming to understand temporal relationships in the sequence. A three-gate mechanism determines each memory unit state through multiplication operations: the input gate, forget gate, and output gate represent the gate units in the LSTM, and the memory cells are updated with an activation. The operation of each gate is shown in Equations (7)-(9), where the outputs of the gates are represented by the notation i, f, and o, while σ is the activation function, w represents the weights, and b is the bias.
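The gate operations referenced in Equations (7)-(9) follow the standard LSTM formulation; a minimal NumPy sketch of one step (stacking the four gate pre-activations into one weight matrix w is a common implementation choice, not a detail taken from the paper) could read:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM step with input (i), forget (f), and output (o) gates.

    The packed weight matrix w maps [h_prev, x_t] to the four stacked
    gate pre-activations; this layout is an implementation convention.
    """
    n = h_prev.size
    z = np.concatenate([h_prev, x_t]) @ w + b
    i = sigmoid(z[:n])            # input gate
    f = sigmoid(z[n:2 * n])       # forget gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:])        # candidate memory update
    c_t = f * c_prev + i * g      # update the memory cell
    h_t = o * np.tanh(c_t)        # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n, d = 4, 3                       # hidden size, input size
w = 0.1 * rng.normal(size=(n + d, 4 * n))
b = np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), w, b)
```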

Architecture Design
The proposed R-CNN with ML-LSTM is based on three types of layers: R-CNN, ML-LSTM, and fully connected layers. The kernel size, number of filters, and strides are adjustable in the R-CNN layers according to the model's performance; adjusting these parameters changes the learning speed and performance depending on the input data [55], and the change can be confirmed by increasing or decreasing them. We used a different kernel size in each layer to minimize the loss of temporal information. The data pass through the residual R-CNN layer, followed by the pooling layer for pattern learning. The output is then fed to the ML-LSTM for sequence learning and forwarded to fully connected (FC) layers for the final forecast. Table 1 shows the layer types, kernel sizes, and parameters of the R-CNN with the ML-LSTM network.
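Since the implementation uses Keras, the pipeline described above can be sketched as follows; the filter counts, kernel sizes, and unit numbers here are illustrative placeholders, not the values from Table 1:

```python
# A minimal Keras sketch of the R-CNN + ML-LSTM + FC pipeline.
# All layer sizes are assumptions for illustration only.
from tensorflow.keras import layers, models

def build_model(timesteps, n_features):
    inp = layers.Input(shape=(timesteps, n_features))
    # Residual convolutional block implementing H(a) = F(a) + a
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(inp)
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)
    shortcut = layers.Conv1D(64, 1, padding="same")(inp)  # match channels
    x = layers.Add()([x, shortcut])
    x = layers.MaxPooling1D(2)(x)          # pooling for pattern learning
    # Multilayered LSTM for temporal sequence learning
    x = layers.LSTM(64, return_sequences=True)(x)
    x = layers.LSTM(32)(x)
    # Fully connected layers for the final forecast
    x = layers.Dense(16, activation="relu")(x)
    out = layers.Dense(1)(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model
```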

Results
The experimental setup, evaluation metrics, dataset, performance assessment over hourly data, and performance assessment over daily data of R-CNN with the ML-LSTM model are briefly discussed in the following section.

Experimental Setup
To validate the effectiveness of the proposed approach, the IHEPC dataset is used to implement comprehensive experiments. The R-CNN with ML-LSTM is trained on an Intel Core i7 CPU with 32 GB of RAM and a GeForce GTX 2060 GPU running Windows 10. The implementation was performed in Python 3.5 using the Keras framework.

Evaluation Metrics
The model performance is evaluated with mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²), and mean absolute percentage error (MAPE). MAE computes the closeness between actual and forecasted values, MSE calculates the squared error, RMSE is the square root of MSE, R² exhibits the model fitting effect, ranging from 0 to 1, where values closer to 1 indicate better prediction performance, and MAPE is the absolute ratio error over all samples. The mathematical equations of these metrics are demonstrated in Equations (10)-(12), where yᵢ is the actual power consumption value and ŷᵢ is the forecasted value.
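The five metrics can be computed in a few lines; this sketch assumes no actual value is zero (otherwise MAPE is undefined):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MSE, MAE, RMSE, R², and MAPE as described above.

    MAPE assumes every actual value is nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    mape = 100.0 * np.mean(np.abs(err / y_true))
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

m = forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.0, 4.0])
```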

Datasets
The R-CNN with ML-LSTM model is evaluated over the UCI repository's IHEPC dataset and the PJM datasets. The IHEPC dataset comprises nine attributes: date and time variables, active and reactive power, voltage, intensity, and three sub-metering variables. Descriptions of the IHEPC attributes and their units are shown in Table 2. The IHEPC dataset was collected from a residential building in France between 2006 and 2010. PJM (Pennsylvania-New Jersey-Maryland) is a regional transmission organization that operates the eastern electricity grid of the US, transmitting electricity to several US regions, including Maryland, Michigan, Delaware, etc. The power consumption data are stored on PJM's official website and are recorded at one-hour resolution in megawatts.
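Because the IHEPC readings are minute-resolution while the experiments run at hourly and daily resolution, a typical preparation step is resampling. A hedged pandas sketch over a synthetic frame (the column name mirrors the IHEPC active-power attribute; real use would read the UCI file instead):

```python
import numpy as np
import pandas as pd

# Synthetic minute-level frame standing in for the IHEPC file;
# the column name mimics the dataset's active-power attribute.
idx = pd.date_range("2007-01-01", periods=2 * 24 * 60, freq="min")
df = pd.DataFrame({"Global_active_power": np.ones(len(idx))}, index=idx)

# Downsample to the two resolutions used in the experiments
hourly = df["Global_active_power"].resample("h").mean()
daily = df["Global_active_power"].resample("D").mean()
```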

Comparative Analysis
The performance of R-CNN with the ML-LSTM model is compared to other models over residential electricity consumption IHEPC and regional electricity forecasting PJM datasets. The performance of the proposed model over these datasets and its comparison with other models are clarified in the subsequent sections.

Evaluation of the IHEPC Dataset
The performance of the proposed model over the hourly resolution of the data secured the lowest error rates compared to the baseline models. For the daily resolution, the performance is compared with regression [35], CNN [37], LSTM [35], CNN-LSTM [35], and FCRBM [59], where the detailed results of each study are given in Table 4. Comparatively, the R-CNN with ML-LSTM model also reduces the error rates over the daily dataset, achieving 0.0447, 0.0132, 0.002, 0.9759, and 2.457 for RMSE, MAE, MSE, R², and MAPE, respectively, for daily load forecasting.

Evaluation of the PJM Dataset
The superiority of R-CNN with ML-LSTM is also evaluated over several PJM datasets for daily load forecasting. The PJM benchmark includes 14 electricity load forecasting datasets, of which we chose the same 10 as selected by [61]. In the literature, we found comparisons for these 10 regional datasets, demonstrated in Table 5, where the proposed model acquired the lowest error rate for each dataset. The performance of R-CNN with ML-LSTM is compared with Mujeeb et al. [67], Gao et al. [68], Chou et al. [69], Khan et al. [61], and Han et al. [58]. The R-CNN with ML-LSTM secures lower error metrics for all datasets; the details are given in Table 5, while the prediction results for all datasets in the PJM region are shown in Figure 4.

Conclusions
A two-phase framework is proposed in this work for power load forecasting. Data cleaning is the first phase of our framework, where data preprocessing strategies are applied over raw data to make it clean for effective training. Secondly, a deep R-CNN with ML-LSTM architecture is developed, where the R-CNN learns patterns from the electricity data and its outputs are fed to the ML-LSTM layers. Electricity consumption comprises time-series data that include spatial and temporal features; the R-CNN layers extract the spatial features, while the ML-LSTM architecture is incorporated for sequence learning. The proposed model was tested over residential and commercial benchmark datasets and achieved satisfactory results: the IHEPC data were used for residential power consumption forecasting, while the PJM dataset was used for commercial evaluation. The experiments, performed for daily and hourly power consumption forecasting, extensively decrease the error rates. In the future, the proposed model will be tested over medium-term and long-term electricity load forecasting. In addition, we will integrate environmental sensor data that help to predict future electricity consumption. Furthermore, we also intend to investigate the performance of the R-CNN and ML-LSTM in other prediction domains, such as fault prediction, renewable power generation prediction, and traffic flow prediction.