2. Related Works
In this section, we present a state-of-the-art review of the main statistical, machine learning, and hybrid techniques used in recent years to predict electricity power consumption. In the past, statistical techniques were predominantly employed to forecast energy demand. For instance, in reference [10], bootstrap aggregating was applied to the autoregressive integrated moving average (ARIMA) and exponential smoothing techniques to predict energy demand across different countries. In reference [5], the seasonal autoregressive integrated moving average (SARIMA) model was compared with a neuro-fuzzy model for forecasting electric load. The study in [11] applied linear regression (using both single and multiple predictors) as well as quadratic regression models to analyze the hourly and daily energy consumption of a specific household. Reference [12] combined multiple regression with a genetic engineering technique to estimate the daily energy usage of an administration building. However, both approaches were hindered by significant limitations, including the absence of occupancy data and the fact that neither model was validated for estimating energy usage in similar buildings. Overall, such conventional methods have demonstrated limited effectiveness in predicting energy demand.

Consequently, various prediction models employing machine learning techniques have been explored to enhance forecasting accuracy [13,14,15]. For example, Liu et al. [16] created a support vector machine (SVM) model for forecasting and analyzing energy consumption in public buildings. Leveraging the robust non-linear capabilities of support vector regression, Chen et al. [17] proposed a model for predicting electrical load based on ambient temperature. Energy consumption was also predicted by analyzing the collective behavior of population dynamics in [18], and a learning algorithm based on artificial neural networks and cuckoo search was proposed to forecast the electricity consumption of the Organization of the Petroleum Exporting Countries (OPEC) [19]. In the work of Pinto et al. [20], an ensemble learning model combining three machine learning algorithms (random forests, gradient-boosted regression trees, and AdaBoost) was proposed to forecast energy consumption. However, existing machine learning methods are heavily affected by overfitting: the evolving nature of the data and the dynamic relationships between variables make it difficult to guarantee long-term reliability once overfitting occurs.

Several deep sequential learning neural networks have been developed to predict electricity consumption. One study utilized a recurrent neural network model to forecast medium- to long-term electricity usage patterns in both commercial and residential buildings, offering one-hour resolution predictions [21]. Another approach introduced a pooling-based recurrent neural network (RNN) to mitigate overfitting by increasing data diversity and volume [22]. An RNN architecture utilizing Long Short-Term Memory (LSTM) cells was implemented to predict the energy load in [23], and a model utilizing LSTM networks was put forward for routine energy consumption forecasting in [24]. Additionally, [25] proposed an enhanced optimization technique involving a bagged echo state network (ESN) refined by a differential evolution algorithm to estimate energy usage. The effectiveness of deep extreme learning machines for predicting energy consumption in residential buildings was evaluated in [26], showing better performance than other artificial neural networks and neuro-fuzzy systems. To improve predictability despite limited knowledge and historical evidence of energy consumption, Gao et al. [27] introduced two deep learning models: a sequence-to-sequence model and a two-dimensional attention-based convolutional neural network. Such deep learning models can uncover the crucial hidden features required for accurate predictions, even from non-stationary data with dynamic and varying characteristics. However, conventional deep learning models often struggle to capture the spatiotemporal attributes pertinent to energy usage [4], and reference [28] highlights that deep learning approaches are not consistently reliable or precise for forecasting power consumption. Several factors, including the market cycle and regional economic policies, significantly influence energy usage; as a result, it is highly challenging for a single intelligent algorithm to be sufficient [29].

Therefore, integrating effective preprocessing techniques with feature learning models holds great potential for enhancing power consumption prediction performance. For example, in [5], stacked autoencoders and extreme learning machines were employed to efficiently extract energy consumption-related features, leading to more robust prediction performance. A hybrid approach was utilized in [30], combining AdaBoost ensemble technology with a neural network, a support vector regression machine, genetic programming, and a radial basis function network to improve energy consumption forecasting. A hybrid SARIMA–metaheuristic firefly algorithm–least squares support vector regression model was employed for energy consumption forecasting in [8]. Hu et al. [31] combined the echo state network, bagging, and a differential evolution algorithm to forecast energy consumption, while a hybrid approach incorporating the Logarithmic Mean Divisia Index, empirical mode decomposition, the least-squares support vector machine, and particle swarm optimization was employed in [32]. Kaytez [33] proposed the use of the least-squares SVM together with an autoregressive integrated moving average model for energy consumption forecasting. In [34], a combination of three sophisticated reinforcement learning models (asynchronous advantage Actor–Critic, deep deterministic policy gradient, and recurrent deterministic policy gradient) was introduced to address the complex and non-linear nature of energy consumption forecasting. In [35], an ensemble model was proposed to divide energy consumption data into stable and stochastic components. A hybrid model incorporating ARIMA, artificial neural networks, and particle swarm optimization combined with support vector regression was developed for load and energy forecasting [36]. The study in [37] created an electricity consumption forecasting model called the Symbiotic Bidirectional Gated Recurrent Unit, which combines the Gated Recurrent Unit, a bidirectional approach, and the Symbiotic Organisms Search algorithm. Furthermore, complete ensemble empirical mode decomposition with adaptive noise, paired with a machine learning model (extreme gradient boosting), was recommended for predicting building energy consumption [38], and another hybrid model combined a CNN with a multilayer bi-directional LSTM [39].

Closest to our work are hybrid methods built on wavelet analysis. The paper [40] introduced a hybrid forecasting approach that leverages the empirical wavelet transform (EWT) and the Autoformer time series prediction model to address the challenges of non-stationary and non-linear electric load data. Ref. [41] recommended integrating the stationary wavelet transform (SWT) with an ensemble LSTM for forecasting energy consumption. Singla et al. [42] developed an ensemble model to predict solar Global Horizontal Irradiance (GHI) 24 h in advance for Ahmedabad, Gujarat, India, by combining wavelet analysis with Bi-LSTM networks; they also evaluated its forecasting accuracy against models using unidirectional LSTM, unidirectional GRU, Bi-LSTM, and wavelet-enhanced Bi-LSTM. Lin et al. [43] applied the wavelet transform to decompose crude oil price data, which were then fed into a Bi-LSTM–Attention–CNN model to forecast future prices. The results of [44] highlight the benefits of combining wavelet features with convolutional neural networks, enhancing forecasting accuracy and automating the feature extraction process. Ref. [45] presented a hybrid approach that integrates the stationary wavelet transform with deep transformers to forecast household energy consumption. Paper [46] presented an ensemble forecasting model utilizing the wavelet transform for short-term load forecasting (STLF), based on a load profile decomposition approach; its findings indicate that the method outperforms both traditional and state-of-the-art techniques in terms of prediction accuracy. Finally, the authors of [47] introduce a comparison between wavelet-based denoising models and their traditional counterparts. This state of the art highlights the variety of techniques used to develop accurate load forecasting models and the potential benefits of hybrid methods, particularly those combining wavelet filtering with deep learning. This approach is the focus of the following sections.
4. Methodology and Results
This research establishes a framework for power usage prediction that yields reliable forecasts. The methodology of this work is presented in Figure 1. The flowchart illustrates the sequence of steps in the proposed algorithm, spanning from data input to model evaluation. Each step is vital for ensuring that the data are correctly processed, the model is accurately trained, and the performance is comprehensively evaluated.
The algorithm begins with the input data stage, where the dataset is collected and loaded. This initial step is crucial to ensure that the data are ready for subsequent processing. Following this, the SWT-and-normalization stage applies a stationary wavelet transform (SWT) to the data, followed by normalization. This transformation is essential for converting the data into a form that can be effectively utilized in further processing and modeling.
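To make this stage concrete, the following is a minimal sketch using the PyWavelets library; the decomposition level (2) and the per-band min-max normalization are illustrative assumptions, since the paper does not specify these details.

```python
import numpy as np
import pywt
from sklearn.preprocessing import MinMaxScaler

def swt_normalize(series, wavelet="bior2.4", level=2):
    """Apply the stationary wavelet transform, then min-max normalize each band."""
    series = np.asarray(series, dtype=float)
    # pywt.swt requires the signal length to be a multiple of 2**level,
    # so any trailing remainder is trimmed.
    usable = (len(series) // 2 ** level) * 2 ** level
    coeffs = pywt.swt(series[:usable], wavelet, level=level)
    normalized = []
    for cA, cD in coeffs:  # one (approximation, detail) pair per level
        cA_n = MinMaxScaler().fit_transform(cA.reshape(-1, 1)).ravel()
        cD_n = MinMaxScaler().fit_transform(cD.reshape(-1, 1)).ravel()
        normalized.append((cA_n, cD_n))
    return normalized
```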
Next, in the preprocessing data stage, the data undergo various cleaning and preparation procedures. This includes handling missing values, removing noise, and splitting the data into training and testing sets. Additionally, feature engineering may be conducted to enhance the dataset’s utility for the model. This preprocessing step ensures the data are in optimal condition for model training.
The fourth step, training with the GRU–bior2.4 algorithm, involves initializing the GRU–bior2.4 model and training it on the preprocessed training data. This stage is critical, as it constitutes the core machine learning process in which the model learns from the data. The final stage, model evaluation with metric data, assesses the trained model's performance on the testing data; evaluation metrics such as RMSE, MAE, and MAPE are calculated to gauge the model's performance.
The process concludes with the end stage, marking the completion of the algorithm's execution. This structured sequence ensures a systematic approach to data handling, model training, and performance evaluation, ultimately leading to a robust and reliable deep learning model.
The IHEPC dataset, which is accessible in the UCI repository [48], is used to validate the model's performance. IHEPC is a free residential dataset available from the UCI machine learning repository that includes electrical consumption data from 2006 to 2010. It contains 2,075,259 records, of which 25,979 values are missing. These missing values represent 1.25% of the total data and are dealt with during the preprocessing phase. The dataset provides power usage information at a one-minute sampling rate over more than four years. For our evaluation, we separated the data into a training set and a testing set: during training, the predictive model is tuned on the training set, and the forecasting component then predicts output values from data not observed during training, i.e., the testing set.
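For reference, loading this dataset and handling the missing values could look like the sketch below; the file name, the choice of the Global_active_power column, and the interpolation strategy are assumptions based on the public UCI file layout rather than details stated in the paper.

```python
import pandas as pd

# The UCI IHEPC file is semicolon-separated and marks missing readings with '?'.
df = pd.read_csv(
    "household_power_consumption.txt",
    sep=";",
    na_values="?",
    low_memory=False,
)

# We focus on the minute-level global active power (in kilowatts).
series = df["Global_active_power"].astype(float)

# Missing values (about 1.25% of the data) are filled by linear interpolation;
# the paper only states that they are handled during preprocessing.
series = series.interpolate(method="linear")
```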
Figure 2 presents the original load profile.
Reliable power consumption forecasts improve energy utilization costs, help organizations make better energy planning choices, and save significant amounts of money and energy. However, forecasting power usage accurately is difficult, since the data exhibit complex dynamics and random fluctuations. In this strategy, 80% of the collected data are used for training, while the remaining 20% are used for testing. In a machine learning context, the key objective of our models is to identify the function that relates inputs to outputs using examples from the designated training set, which consists of known input–output pairs. To be processed by our algorithms for forecasting electricity use, the data must first be transformed into a properly supervised machine learning problem [38]. As a result, the time series is transformed into input–output pairs using the sliding window approach, with 15 past time steps utilized as features to forecast the next step in the time series (a minimal sketch of this windowing step is given below). The input data are preprocessed in the first stage to eliminate anomalies, missing values, and duplicates. To normalize the input dataset to a given range, we employ a standard scaling approach. The transformed input data are then passed to the training step, where the LSTM, GRU, Bi-GRU, Bi-LSTM, and Bi-GRU LSTM models are evaluated. Finally, we assess our models using metrics such as RMSE, MAE, and MAPE; in simple terms, these measures compute the difference between the predicted and real values.
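As referenced above, here is a minimal sketch of the windowing step under the stated setup of 15 past time steps predicting the next one; the series variable stands for the preprocessed load series from the loading sketch.

```python
import numpy as np

def make_supervised(values, n_lags=15):
    """Slide a window over the series: n_lags past steps as features,
    the following step as the target."""
    values = np.asarray(values, dtype=float)
    X, y = [], []
    for i in range(n_lags, len(values)):
        X.append(values[i - n_lags:i])
        y.append(values[i])
    # Reshape to (samples, timesteps, features) as expected by Keras RNN layers.
    return np.array(X).reshape(-1, n_lags, 1), np.array(y)

# 80/20 chronological split, as described above.
values = series.to_numpy()  # preprocessed series from the loading sketch
split = int(0.8 * len(values))
X_train, y_train = make_supervised(values[:split])
X_test, y_test = make_supervised(values[split:])
```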
These metrics are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,$$

where $y_i$, $\hat{y}_i$, and $N$ represent the real value of the data, the predicted value of the data, and the number of samples, respectively.
The Mean Absolute Error (MAE) measures the average magnitude of the errors in a set of predictions, without considering whether the errors are positive or negative; it is calculated as the mean of the absolute differences between the predicted and actual values. The RMSE is the square root of the mean of the squared differences between the predicted and actual values, and therefore penalizes large errors more heavily. The MAPE calculates the average absolute percentage error between the forecasted and true values; it offers a clear indication of prediction accuracy in percentage terms, with a lower MAPE signifying better model performance. We also perform experiments on multiple deep learning models for comparison, including LSTM, GRU, Bi-GRU, Bi-LSTM, and Bi-GRU LSTM. The forecasting models are trained for up to 15 epochs using the previously mentioned methodologies. The models are developed on an HP Omen PC equipped with a Core i5 CPU and 16 GB of RAM. The code is written in Python 3 with Keras, using TensorFlow as the backend and Adam as the optimization algorithm.
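As an illustrative sketch of this training and evaluation loop in Keras, using the arrays from the windowing sketch above: only the Adam optimizer, the 15-epoch budget, and the metric definitions come from the text, while the layer width and batch size are assumptions.

```python
import numpy as np
import tensorflow as tf

def build_model(n_lags=15, units=64):
    # A single-layer GRU regressor; swapping GRU for LSTM or wrapping it in
    # Bidirectional would reproduce the other baselines.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_lags, 1)),
        tf.keras.layers.GRU(units),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_model()
model.fit(X_train, y_train, epochs=15, batch_size=256, verbose=0)
y_pred = model.predict(X_test).ravel()

# Metric definitions matching the equations above (eps guards the MAPE
# division against zero-valued targets).
eps = 1e-8
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
mae = np.mean(np.abs(y_test - y_pred))
mape = 100 * np.mean(np.abs((y_test - y_pred) / (y_test + eps)))
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  MAPE={mape:.2f}")
```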
Table 1 summarizes the results of the deep learning algorithms. LSTM obtained RMSE, MAE, and MAPE values of 0.21, 0.07, and 8.55; GRU obtained 0.20, 0.07, and 9.60; Bi-GRU obtained 0.21, 0.08, and 9.94; Bi-LSTM obtained 0.21, 0.08, and 9.15; and Bi-GRU LSTM obtained 0.21, 0.08, and 8.98.
Figure 3 illustrates the prediction performance of the GRU model with 1-minute time stamps over the IHEPC dataset, along with a zoomed-in view of the GRU model's prediction performance for one randomly selected day from the same dataset. Although the predictions in certain areas of Figure 3b appear acceptable, the MAPE values in the last column of Table 1 show that the model's accuracy could still be improved.
The SWT decomposes a signal into high- and low-frequency components, identified as detail and approximation coefficients, by feeding it through high-pass and low-pass filters, respectively. The fundamental benefit of the SWT is that it overcomes the lack of translation invariance of the DWT by eliminating the downsamplers and upsamplers; as a result, the SWT coefficients have the same number of samples as the initial signal. Before proceeding with a standard SWT analysis, two parameters must be determined: the mother wavelet and the number of decomposition levels. The mother wavelet is often chosen based on the correlation between the mother wavelet and the signal of interest. The transformed signal has the same shape as the original load profile but with certain modifications. By removing extraneous noise, the SWT allows the model to concentrate on the key patterns and relationships within the data. Consequently, we apply a wavelet transform to denoise the original series before modeling. Specifically, the bior2.4 wavelet filter is utilized for the stationary wavelet transformation, decomposing each series prior to feeding it into the LSTM, GRU, Bi-LSTM, Bi-GRU, and Bi-GRU LSTM models.
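A sketch of this denoising step with PyWavelets follows; since the paper does not state its thresholding rule, the soft universal threshold used here is a common default and should be read as an assumption.

```python
import numpy as np
import pywt

def swt_denoise(signal, wavelet="bior2.4", level=2):
    """Decompose with the SWT, shrink the detail (high-frequency)
    coefficients, and reconstruct a smoothed series of equal length."""
    x = np.asarray(signal, dtype=float)
    usable = (len(x) // 2 ** level) * 2 ** level  # pywt.swt length constraint
    coeffs = pywt.swt(x[:usable], wavelet, level=level)
    shrunk = []
    for cA, cD in coeffs:
        # Universal threshold with a robust noise estimate from the detail band.
        sigma = np.median(np.abs(cD)) / 0.6745
        thr = sigma * np.sqrt(2 * np.log(usable))
        shrunk.append((cA, pywt.threshold(cD, thr, mode="soft")))
    return pywt.iswt(shrunk, wavelet)
```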
Figure 4 shows the difference between the original and transformed load profiles and provides a zoomed-in view of this difference for 1 day.
In the next step, we need to choose the best mother wavelet, so we introduce a comparative study of different mother wavelets used in the literature; the most popular are bior2.4, rbio2.4, coif2, db2, and sym2. The consumption data exhibit a wide variation in the range of values across different features, which impacts both the accuracy and the stability of our forecasting model. To address this, all feature ranges are normalized by rescaling them to a consistent scale. This normalization is performed in Python (version 3.8, developed by the Python Software Foundation, Wilmington, DE, USA) using the MinMaxScaler method from the sklearn package.
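The comparison across mother wavelets can then be scripted as a simple loop over the same pipeline; the helper functions refer to the sketches above, and the RMSE-only scoring is a simplification of the three metrics reported in the tables.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

candidate_wavelets = ["bior2.4", "rbio2.4", "coif2", "db2", "sym2"]

# Rescale the load series to a consistent range, as described in the text.
scaled = MinMaxScaler().fit_transform(values.reshape(-1, 1)).ravel()

results = {}
for wavelet in candidate_wavelets:
    denoised = swt_denoise(scaled, wavelet=wavelet)   # denoising sketch above
    split = int(0.8 * len(denoised))
    X_tr, y_tr = make_supervised(denoised[:split])    # windowing sketch above
    X_te, y_te = make_supervised(denoised[split:])
    model = build_model()                             # model sketch above
    model.fit(X_tr, y_tr, epochs=15, batch_size=256, verbose=0)
    pred = model.predict(X_te).ravel()
    results[wavelet] = np.sqrt(np.mean((y_te - pred) ** 2))  # RMSE per wavelet

best = min(results, key=results.get)
print("Best mother wavelet:", best)
```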
Table 2, Table 3, Table 4, Table 5 and Table 6 summarize the findings for the mother wavelets bior2.4, rbio2.4, coif2, db2, and sym2 combined with the LSTM, GRU, Bi-LSTM, Bi-GRU, and Bi-GRU LSTM deep learning algorithms. The results show that bior2.4 is the most relevant mother wavelet compared with the other mother wavelets tested, for all deep learning models. For bior2.4 combined with the deep learning models, we see a significant improvement in energy consumption prediction accuracy. bior2.4/LSTM obtained RMSE, MAE, and MAPE values of 0.07, 0.03, and 8.83, representing improvements of 66.66% and 28% in RMSE and MAE, respectively. bior2.4/GRU obtained 0.06, 0.04, and 5.65, demonstrating improvements of 70%, 42.85%, and 47.67% in RMSE, MAE, and MAPE, respectively. bior2.4/Bi-LSTM achieved 0.07, 0.04, and 7.93, representing improvements of 66.66%, 42.85%, and 7.25%. bior2.4/Bi-GRU achieved 0.07, 0.03, and 5.18, demonstrating improvements of 66.66%, 62.5%, and 46.04%. bior2.4/Bi-GRU LSTM obtained 0.07, 0.03, and 5.09, representing improvements of 66.66%, 42.85%, and 43.31%. Based on these results, we conclude that the bior2.4/GRU model provides the most accurate predictions. We can thus accurately predict the challenging values associated with significant variance in power consumption, resulting in precise and reliable power usage forecasts, as shown in Figure 5, which illustrates the prediction performance of the proposed model with 1-minute time stamps over the IHEPC dataset and a zoomed-in view of the proposed model's prediction performance for one randomly selected day from the same dataset.
Table 7 summarizes the results of the deep learning algorithms. LSTM obtained RMSE, MAE, and MAPE values of 0.46, 0.28, and 38.41; GRU obtained 0.46, 0.28, and 39.13; Bi-GRU obtained 0.46, 0.28, and 43.58; Bi-LSTM obtained 0.46, 0.28, and 37.75; and Bi-GRU LSTM obtained 0.47, 0.29, and 40.75. The models fail to estimate the precise value when there is a quick change or a peak in consumption. The results suggest that employing deep learning algorithms alone to estimate power usage is not always reliable, as shown in Figure 6. Several factors influence the prediction performance, such as a reduction in the quantity of data available to the model.
Table 8 summarizes the findings for the mother wavelet bior2.4 combined with the different deep learning algorithms (LSTM, GRU, Bi-LSTM, Bi-GRU, and Bi-GRU LSTM). For bior2.4 paired with these models, there is a notable improvement in energy consumption prediction accuracy. bior2.4/LSTM achieved RMSE, MAE, and MAPE values of 0.16, 0.11, and 15.66, respectively, showing improvements of 65.95%, 60.71%, and 59.22%. bior2.4/GRU obtained 0.15, 0.10, and 13.62, demonstrating improvements of 67.73%, 64.28%, and 65.19%. bior2.4/Bi-LSTM recorded 0.15, 0.11, and 17.55, reflecting improvements of 67.39%, 60.71%, and 53.50%. bior2.4/Bi-GRU achieved 0.15, 0.10, and 15.07, showing improvements of 67.39%, 64.28%, and 59.72%. bior2.4/Bi-GRU LSTM obtained 0.16, 0.11, and 15.66, with improvements of 65.95%, 62.06%, and 61.57%. From these results, we deduce that the bior2.4/GRU model gives the most precise predictions. These findings confirm our ability to accurately predict the challenging values associated with significant variance in power consumption, resulting in precise and reliable power usage forecasts, as shown in Figure 7.
Table 9 summarizes the results of the deep learning algorithms. LSTM obtained RMSE, MAE, and MAPE values of 0.50, 0.37, and 52.39; GRU obtained 0.52, 0.38, and 55.44; Bi-GRU obtained 0.52, 0.36, and 47.28; Bi-LSTM obtained 0.51, 0.36, and 49.71; and Bi-GRU LSTM obtained 0.52, 0.37, and 52.72. The models struggle to accurately estimate values during sudden changes or peaks in consumption. The findings indicate that using deep learning algorithms alone to predict power usage can sometimes be unreliable. Various factors, including a reduction in the amount of data available to the model, affect the prediction performance, as demonstrated in Figure 8.
Table 10 summarizes the results for the mother wavelet bior2.4 combined with the different deep learning algorithms (LSTM, GRU, Bi-LSTM, Bi-GRU, and Bi-GRU LSTM). Pairing bior2.4 with these models leads to significant improvements in energy consumption prediction accuracy. bior2.4/LSTM achieved RMSE, MAE, and MAPE values of 0.18, 0.13, and 18.98, respectively, showing improvements of 65.38%, 64.48%, and 63.77%. bior2.4/GRU obtained 0.18, 0.13, and 18.21, demonstrating improvements of 65.38%, 65.78%, and 67.15%. bior2.4/Bi-LSTM recorded 0.18, 0.14, and 20.83, reflecting improvements of 64.70%, 61.11%, and 58.09%. bior2.4/Bi-GRU achieved 0.17, 0.13, and 17.53, showing improvements of 67.30%, 63.88%, and 62.92%. bior2.4/Bi-GRU LSTM obtained 0.18, 0.13, and 17.91, reflecting improvements of 65.38%, 64.86%, and 66.02%. These results indicate that the bior2.4/Bi-GRU and bior2.4/GRU models give the most accurate predictions; their results are nearly identical across all accuracy metrics. However, since the simulation time of the bior2.4/GRU model is shorter than that of the bior2.4/Bi-GRU model, we conclude that the bior2.4/GRU model is more efficient for predicting power consumption. These results validate our capability to precisely forecast the demanding values linked with notable fluctuations in power consumption, leading to dependable and accurate predictions of power usage, as shown in Figure 9.
Our analysis shows that the proposed model predicts sudden changes or peaks in consumption more accurately than plain deep learning algorithms. Although wavelet and deep learning methods are generally dependable for forecasting power consumption, the proposed model surpasses these approaches in both precision and reliability. Several factors affect the prediction performance, such as a reduction in the amount and complexity of the data available to the model. Although various methods exist for predicting power consumption, they often fail to consistently achieve the expected performance levels because of their particular advantages and disadvantages. Denoising helps remove unnecessary noise, enabling the model to focus on the essential patterns and correlations within the data; consequently, the original time series was denoised using the wavelet transformation prior to applying the model. The stationary wavelet transform was performed using the bior2.4 wavelet filter, decomposing each series before feeding it into the LSTM, GRU, Bi-LSTM, Bi-GRU, and Bi-GRU LSTM models. The models' hyperparameters were fine-tuned through numerous simulations. The monitored household contains several domestic appliances: a dishwasher, an oven, and a microwave in the kitchen; a washing machine, a tumble dryer, a refrigerator, and a lamp in the laundry area; and an electric water heater and an air conditioner in the remainder of the residence. It is critical to reduce the prediction error associated with sudden fluctuations, and our model is able to predict the sudden changes in electric load caused by critical loads such as washing machines, ovens, and air conditioners.