4.1. Datasets’ Description
Two publicly available datasets, Smart House and Mexican House, were employed to analyze household energy consumption patterns. These datasets contained high-resolution time series data capturing diverse variations in energy demand across different household conditions.
4.1.1. Smart House Dataset
The Smart House dataset consists of power consumption measurements collected from a single-family home equipped with multiple IoT sensors. Data were recorded at a resolution of one minute, covering both individual appliance usage and overall household energy demand. The dataset includes the following features:
Timestamp: time of measurement (in minute intervals).
Total power consumption: aggregated household power usage in kilowatts (kW).
Appliance-specific consumption: power consumption per device (e.g., refrigerator, HVAC system, washing machine).
Environmental factors: ambient temperature and humidity, which influence energy demand.
4.1.2. Mexican House Dataset
The Mexican House dataset provides detailed energy consumption records from a residential household located in northeastern Mexico [24]. Data were collected every minute over a period of 14 months, from 5 November 2022 to 5 January 2024. In total, the dataset comprises 605,260 samples, each containing 19 variables related to energy consumption and environmental conditions.
This dataset was specifically designed for domestic energy consumption forecasting and behavior analysis, addressing a gap in the existing literature where such datasets for Mexico remained scarce.
The dataset's temporal consistency is important for time series forecasting, as it preserves the natural daily and weekly cycles in energy use without artificial interruptions. In datasets affected by daylight saving time, the one-hour shift can introduce abrupt changes in consumption patterns that do not reflect actual user behavior, and such discontinuities can interfere with the model's ability to learn true seasonal or temporal patterns. By avoiding these shifts, the Mexican House dataset provides a continuous and stable time series, helping models more accurately capture regular usage trends and behavioral dynamics.
The dataset is stored in CSV format, with each row representing a timestamped observation. The primary attributes include the following:
Timestamp: time of measurement (recorded every minute).
Total power usage: measured in watts (W), representing the household's power consumption.
Temperature and humidity: recorded indoor and outdoor environmental conditions.
Solar power generation: the amount of energy produced by rooftop solar panels.
Additionally, the dataset contains other electrical and meteorological variables that contribute to understanding energy consumption patterns.
This dataset serves as a valuable resource for training and evaluating predictive models aimed at improving household energy management and efficiency.
To ensure consistency in the analysis, both datasets were resampled to match common time intervals, allowing direct comparison across different forecasting models.
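As an illustration of this preprocessing step, a minimal sketch using pandas is given below; the file name, the name of the timestamp column, and the choice of 1 min and 15 min target intervals are assumptions made for illustration rather than the exact pipeline used here.

import pandas as pd

# Illustrative file and column names; the timestamp column becomes the index.
df = pd.read_csv("mexican_house.csv", parse_dates=["timestamp"], index_col="timestamp")
numeric = df.select_dtypes("number")          # keep only numeric measurements

# Align both datasets on a common grid, e.g., regular 1 min and 15 min intervals.
df_1min = numeric.resample("1min").mean()     # regularized 1 min series
df_15min = numeric.resample("15min").mean()   # aggregated 15 min series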
The two household datasets include not only energy consumption data but also weather data for the corresponding regions. The specific variables (factors) are outlined in Table 2 and Table 3. In addition, comprehensive statistical information (count, minimum, maximum, mean, and standard deviation) is provided to enhance the understanding of the datasets. These statistics, presented in Table 4 and Table 5, offer a more detailed characterization of the data.
An outlier analysis was conducted for both the Mexican and Smart House datasets. The results did not indicate a significant presence of outliers, suggesting that the data reflected a consistent and realistic energy consumption behavior over time.
4.1.3. Feature Importance Analysis
In order to ensure optimal feature selection, we conducted a Gini-based feature importance analysis using tree-based models. This method evaluates the contribution of each feature by measuring the reduction in Gini impurity when the feature is used for splitting across the ensemble of trees.
The Gini impurity for a node $t$ is defined as
$$G(t) = 1 - \sum_{i=1}^{C} p_i(t)^{2},$$
where $p_i(t)$ represents the proportion of samples belonging to class $i$ at node $t$, and $C$ is the total number of classes.
A feature is considered important if its usage in the tree structure significantly reduces the overall Gini impurity across nodes, indicating better separation between classes or improved predictive power.
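A minimal sketch of such an analysis is shown below, assuming the Smart House data are available as a pandas DataFrame and using scikit-learn's random forest; its impurity-based importances play the role of the Gini-based scores described above (for regression trees the impurity is the variance reduction rather than the classification Gini index). The file name, the next-step target definition, and the forest hyperparameters are illustrative assumptions, not the exact configuration used in our experiments.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load the Smart House measurements (file name is illustrative).
df = pd.read_csv("smart_house.csv", parse_dates=["time"], index_col="time")

# Use the current numeric readings to predict the next-minute aggregate consumption.
features = df.select_dtypes("number")
target = df["use [kW]"].shift(-1)
features, target = features.iloc[:-1], target.iloc[:-1]   # drop the row with no target

# Impurity-based feature importances from a tree ensemble.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(features, target)

importances = pd.Series(forest.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False).head(10))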
For the Smart House dataset, the resulting feature importance scores are presented in Figure 7. It was observed that use [kW] was by far the most influential feature, with an importance score of approximately 0.25, followed by dewPoint (around 0.09), pressure (around 0.07), and windBearing (around 0.07). In contrast, features such as Solar [kW] and House overall [kW] contributed very little to the model. Based on these results, it appeared sufficient to retain only the power consumption feature, i.e., use [kW], for further analysis, as it dominated the predictive capability.
Similarly, for the second dataset, corresponding to the Mexican Household dataset and illustrated in Figure 8, a comparable pattern was observed. The feature active_power emerged as the dominant variable with an importance score of approximately 0.42, which is equivalent to the use [kW] feature from the first dataset. Other important features included temp (approximately 0.12), current (around 0.09), and power_factor (around 0.08).
Given these observations across both datasets, only the most dominant feature—use [kW] for the Smart House and active_power for the Mexican Household—was retained for the subsequent feasibility study. This decision was justified by their overwhelming importance relative to the other features.
4.5. Influence of Data Split and Window Size on Prediction Accuracy
To ensure the effectiveness of the proposed solution across different data splits for the Mexican dataset, three train/test splits were considered: 80–20%, 70–30%, and 60–40%. Comparable R² scores were obtained by the three methods for these splits, which indicates that the influence of seasonality was limited when using the proposed double LSTM approach.
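A sketch of this evaluation protocol is given below; the synthetic signal and the persistence forecast are placeholders for the measured active_power series and the double LSTM pipeline, used only to illustrate the chronological (unshuffled) splitting and the per-split R² computation.

import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the active_power signal (one week at 1 min resolution).
t = np.arange(7 * 24 * 60)
series = 300 + 100 * np.sin(2 * np.pi * t / 1440) + rng.normal(0, 10, t.size)

for train_frac in (0.8, 0.7, 0.6):
    cut = int(len(series) * train_frac)      # chronological split, no shuffling
    train, test = series[:cut], series[cut:]
    # A naive persistence forecast stands in here for the double LSTM pipeline.
    preds = np.roll(series, 1)[cut:]
    print(f"{int(train_frac * 100)}-{int(round((1 - train_frac) * 100))} split: "
          f"R2 = {r2_score(test, preds):.3f}")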
The effect of varying window sizes on prediction accuracy is a critical aspect when modeling power consumption, particularly in time series with fine temporal granularity.
Figure 9 and Figure 10 illustrate the predicted versus actual power consumption for a smart household dataset, using binomial smoothing filters applied with multiple window sizes at 1 min and 15 min granularities, respectively.
In Figure 9, where the data were sampled every minute, smaller window sizes preserved short-term fluctuations and transient appliance events but could retain high-frequency noise, resulting in slight overfitting and increased prediction variance. Conversely, larger windows led to smoother predictions by averaging over more data points, which effectively reduced noise and highlighted broader consumption trends. However, excessive smoothing could also dampen the sharp transitions in load demand, such as sudden spikes caused by high-power appliances, leading to underestimation during peak periods.
Figure 10, based on the same dataset but aggregated at a 15 min resolution, demonstrates a markedly different behavior. At this coarser granularity, the signal was inherently smoother, and the benefit of additional smoothing via large window sizes became less pronounced. The predictions across different window sizes were generally closer to each other, and larger windows did not excessively distort the temporal dynamics. This suggests that for coarser-grained data, larger windows can be used without significant loss of important consumption features, improving model generalization.
Comparatively, the 1 min dataset required careful window size selection to balance signal fidelity and noise reduction, as inappropriate choices could obscure important load characteristics. On the other hand, in the 15 min setting, the model was more robust to changes in window size, as the primary variability had already been attenuated through temporal aggregation.
These observations support the notion that the optimal window size for filtering depends strongly on the data resolution: high-resolution datasets benefit from moderate smoothing to suppress volatility, whereas lower-resolution datasets can tolerate and even benefit from larger smoothing windows without significant degradation in predictive performance.
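To make the filtering step concrete, the following sketch applies a uniform moving-average filter and a binomial-weighted filter for several window sizes and splits the signal into a low-frequency component and its high-frequency residual. The synthetic signal and the particular window sizes are illustrative assumptions.

import numpy as np
from scipy.special import comb

def uniform_filter(x, w):
    """Moving average with a flat window of length w."""
    kernel = np.ones(w) / w
    return np.convolve(x, kernel, mode="same")

def binomial_filter(x, w):
    """Smoothing with normalized binomial (Pascal's triangle) weights of length w."""
    kernel = comb(w - 1, np.arange(w))
    return np.convolve(x, kernel / kernel.sum(), mode="same")

# Synthetic stand-in for a 1 min consumption signal (two days).
rng = np.random.default_rng(1)
t = np.arange(2 * 24 * 60)
signal = 1.0 + 0.5 * np.sin(2 * np.pi * t / 1440) + rng.normal(0, 0.1, t.size)

for w in (3, 5, 9):
    for name, filt in (("uniform", uniform_filter), ("binomial", binomial_filter)):
        low = filt(signal, w)        # low-frequency component
        high = signal - low          # high-frequency residual
        print(f"{name}, w={w}: residual std = {high.std():.4f}")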
4.7. Discussion
The experimental results confirmed the effectiveness of frequency decomposition in improving the accuracy of short-term load forecasting. Across both the Smart House and Mexican Household datasets, models trained on decomposed signals consistently outperformed those reported in prior studies. Notably, the use of windowed uniform and binomial filters significantly enhanced model performance, especially when smaller window sizes were applied.
To further highlight the importance of signal decomposition, we conducted additional experiments by running the models without applying any decomposition.
Table 14 presents the performance comparison obtained when forecasting power consumption using a CNN-LSTM model, a GRU model, a single LSTM model, the DeepAR model, the BiLSTM model, the DATE-TM approach [17], and our proposed method. The results, shown for the Mexican Household dataset, clearly demonstrate that our method significantly outperformed the baseline models, including DATE-TM. This highlights the crucial role of frequency decomposition in enhancing model performance.
Furthermore, despite their simplicity, the proposed convolution-based methods yielded highly competitive results, outperforming baseline models in both accuracy and efficiency.
To assess the statistical significance of our frequency decomposition approach, we employed the Diebold–Mariano (DM) test to compare forecasting performance between our method and a standard single LSTM model applied to the non-decomposed signal. The Diebold–Mariano test evaluates the null hypothesis of equal predictive accuracy between two forecast methods by examining their corresponding loss differentials. The DM test statistic is calculated as follows:
$$DM = \frac{\bar{d}}{\sqrt{\hat{\sigma}^{2}_{\bar{d}}}},$$
where
$\bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t$ is the sample mean of the loss differential series;
$d_t = L(e_{1,t}) - L(e_{2,t})$ represents the difference between the loss functions at time $t$;
$L(\cdot)$ is the loss function (typically the squared or absolute error);
$e_{1,t}$ and $e_{2,t}$ are the forecast errors from the two models at time $t$;
$\hat{\sigma}^{2}_{\bar{d}}$ is a consistent estimate of the variance of $\bar{d}$.
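For completeness, the statistic can be computed directly from the two forecast-error series. The sketch below assumes one-step-ahead forecasts, a squared-error loss, and a normal approximation for the p-value; these choices are illustrative rather than the exact test configuration used above.

import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, loss=np.square):
    """DM statistic for one-step-ahead forecasts (squared-error loss by default)."""
    d = loss(e1) - loss(e2)                       # loss differential series
    d_bar = d.mean()
    var_d_bar = d.var(ddof=1) / len(d)            # variance of the mean differential
    dm = d_bar / np.sqrt(var_d_bar)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))   # two-sided normal approximation
    return dm, p_value

# e_single and e_dual would be the forecast-error vectors of the single LSTM and
# the proposed dual-frequency model on the same test set, e.g.:
# dm, p = diebold_mariano(e_single, e_dual)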
Our analysis yielded a DM test statistic of 43.57 with a p-value of 10⁻⁷, which is substantially below the conventional significance threshold of 0.05. This result provided strong evidence to reject the null hypothesis that both models had equal predictive accuracy. The magnitude of the DM statistic (43.57) indicated a large effect size, demonstrating that the forecast errors from the single LSTM model were consistently and significantly larger than those from our frequency decomposition method. This statistical evidence confirmed that the separate modeling of frequency components provided a substantial and measurable improvement in forecasting accuracy. The extremely low p-value suggested that the probability of observing such performance differences by random chance was negligible, thus validating the fundamental efficacy of our decomposition-based approach in comparison to traditional single-model techniques.
Compared to the standard single LSTM, each LSTM network in the proposed double LSTM approach had the same complexity, but the training time was longer because both models needed to be trained sequentially. Therefore, while the single model required 127.8 MB of memory for 5135.17 s of training time across 20 epochs, the dual model required 128.41 MB of memory for 9879.61 s of training time, approximately double the time with the same memory requirement. During prediction, the standard LSTM’s inference time was 13 s, half of the 27 s required by the two models to process the 3783 time samples in the test dataset. This equated to 3.4 and 7.1 ms per sample, respectively.
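The per-sample figures follow directly from dividing the total prediction time by the 3783 test samples. A simple way to obtain such measurements is sketched below, with model and X_test as placeholders for any fitted forecaster and its prepared test windows.

import time

def time_inference(model, X_test):
    """Return the total inference time (s) and the average time per sample (ms)."""
    start = time.perf_counter()
    model.predict(X_test)
    elapsed = time.perf_counter() - start
    return elapsed, 1000 * elapsed / len(X_test)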
To evaluate the effectiveness of the proposed dual-frequency LSTM model against established methodologies, we conducted a comparative analysis with a decomposition-based approach: STL (Seasonal-Trend decomposition using Loess) followed by a standard LSTM implementation.
Table 15 summarizes the performance metrics of both methods on the same test dataset.
The quantitative results demonstrate that our dual-frequency approach substantially outperformed the STL + LSTM baseline across all evaluation metrics. Specifically, our method reduced the Mean Absolute Error by 58.2% and the RMSE by 65.3%. These substantial performance differentials provided compelling evidence for both the predictive accuracy and methodological robustness of our proposed architecture.
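For reference, the decomposition step of this baseline can be reproduced with the STL implementation in statsmodels. The sketch below assumes a 15 min series with daily seasonality (period = 96); these settings are illustrative rather than the exact baseline configuration.

import pandas as pd
from statsmodels.tsa.seasonal import STL

def stl_components(power: pd.Series, period: int = 96):
    """Decompose a consumption series into trend, seasonal, and residual parts."""
    result = STL(power, period=period, robust=True).fit()
    return result.trend, result.seasonal, result.resid

# In the STL + LSTM baseline, each component would then be forecast separately
# and the component forecasts recombined by summation.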
For the Smart House dataset at 1 min granularity, the combination of decomposition and LSTM led to an R² score of 0.997 with the uniform filter and 0.995 with the binomial filter for a window size of three. These values reflected a near-perfect fit and represented a substantial improvement compared to the R² score of 0.863 found in the existing literature [17]. Similar improvements were observed in terms of MSE and MAE, demonstrating the model's capacity to track rapid consumption dynamics with minimal error. These results are depicted in Table 16.
The Mexican Household dataset exhibited a similar trend, particularly at higher resolutions. A window size of three with uniform filtering yielded an R² of 0.994 and a low RMSE of 13.278, markedly outperforming the RMSE value of 82.488 previously reported in the literature [17]. Even at the 15 min interval, which typically smooths out temporal fluctuations, the proposed approach maintained high predictive accuracy, with R² values exceeding 0.99 for both filters.
To rigorously validate our approach and address concerns of potential overfitting, we implemented a time series cross-validation framework. Unlike standard k-fold cross-validation, which can lead to data leakage in time series problems, we employed a temporal cross-validation strategy using TimeSeriesSplit with five folds. This approach ensures that all training data strictly precede test data in each fold, maintaining the temporal integrity essential for forecasting tasks. For fold $i$, the data are partitioned as
$$\mathcal{D}_{\text{train}}^{(i)} = \{x_1, \ldots, x_{n_i}\}, \qquad \mathcal{D}_{\text{test}}^{(i)} = \{x_{n_i+1}, \ldots, x_{n_i+h}\},$$
where
$\mathcal{D}_{\text{train}}^{(i)}$ represents the training set for fold $i$;
$\mathcal{D}_{\text{test}}^{(i)}$ represents the test set for fold $i$;
and $n_i$ and $h$ denote, respectively, the index of the last training sample and the length of the test block in fold $i$.
For each fold, we independently trained both the low-frequency and high-frequency models on their respective decomposed signals using the designated training set. Performance metrics were then computed on the held-out test data, with final predictions formed by recombining outputs from both frequency components.
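A sketch of this validation loop using scikit-learn's TimeSeriesSplit is given below; the fit_predict callable is a placeholder standing in for training an LSTM on one frequency component and returning its forecast, not our exact implementation.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import r2_score

def temporal_cv(low_freq, high_freq, target, fit_predict, n_splits=5):
    """Evaluate the dual-frequency pipeline with expanding-window folds."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(target):
        pred_low = fit_predict(low_freq[train_idx], len(test_idx))
        pred_high = fit_predict(high_freq[train_idx], len(test_idx))
        recombined = pred_low + pred_high            # sum the component forecasts
        scores.append(r2_score(target[test_idx], recombined))
    return np.mean(scores), np.std(scores)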
Table 17 presents the mean and standard deviation of performance metrics across all five folds.
The temporal cross-validation results demonstrated remarkable consistency across all folds, with the standard deviation of the R² scores being merely 0.003. This low variance in performance metrics across different temporal splits strongly countered concerns of overfitting or data leakage. Furthermore, even the lower bound of the 95% confidence interval for R² (0.989) substantially exceeded the performance reported for state-of-the-art methods in the literature.
The proposed approach showed significant improvements over the state of the art. For the Smart House dataset, the MAE decreased by approximately 87.1%, and the RMSE dropped by 89.4%. The R² score increased from 0.863 to 0.997, reflecting a 15.3% improvement in variance explanation. These high values are explained by the fact that the decomposition and filtering steps simplify the input signal by removing noise and isolating meaningful patterns. This helps the model learn more effectively and make more accurate predictions. Moreover, the results were validated through careful parameter tuning and multiple data splits to ensure that the gains were reliable and not tied to a specific subset of the data.
Similarly, for the Mexican Household dataset, the MAE decreased by 66.5%, and the RMSE dropped drastically by 83.9%. The R² score improved from 0.878 to 0.994, representing a 13.2% relative enhancement. These improvements clearly demonstrated the robustness and superior accuracy of the proposed decomposition and filtering strategy when paired with the LSTM model, especially for high-resolution forecasting scenarios.

To further validate this, we compared the proposed method with a standard LSTM model trained on the raw, non-decomposed signal. The baseline model yielded an MAE of 37.624, RMSE of 120.070, and R² of 0.526. In contrast, our decomposition-based approach achieved an MAE of approximately 9.626, RMSE of 13.278, and R² of 0.994. These results highlight the importance of frequency decomposition: by isolating different frequency components and modeling them separately, the model captures more fine-grained temporal structures, leading to significantly higher predictive performance.

Additionally, a Gated Recurrent Unit (GRU) model was evaluated and produced results comparable to our LSTM-based approach. However, the LSTM model consistently outperformed GRU with slightly lower prediction errors and a higher coefficient of determination. Furthermore, the GRU model required approximately 5% more memory, making it less suitable for deployment in resource-constrained environments. The choice of a simple LSTM model thus offers a favorable balance between predictive accuracy and computational efficiency, aligning with the practical constraints of real-world smart grid applications.
At the 15 min resolution, the proposed method showed significant improvements over existing results. For the Smart House dataset, the coefficient of determination increased from 0.758 to 0.992, indicating a much better fit. For the Mexican Household dataset, the Mean Absolute Error and Root-Mean-Square Error dropped by 99.1% and 99.2%, respectively, with the R² value rising from 0.771 to 0.991, confirming the robustness of the proposed approach across different data types. These results are depicted in Table 18.
A critical finding from our investigation was the inverse relationship between filter window size and prediction accuracy. Smaller window sizes (in particular, a window size of three) consistently yielded superior results across all experimental configurations. This suggests that preserving fine-grained signal characteristics through minimal smoothing is crucial for accurate energy forecasting. Excessive smoothing with larger windows appears to eliminate valuable predictive information about consumption dynamics, even when applied within a frequency decomposition framework.
When comparing filtering approaches, the binomial filter demonstrated slight advantages over the uniform filter in several configurations. As shown in Table 6, Table 8, Table 10, and Table 12, the RMSE values clearly decreased for the binomial filter compared to the uniform filter. This aligns with our theoretical understanding of the binomial filter's properties: its graduated weighting scheme appears better suited to preserving significant signal transitions while still providing effective noise reduction. However, the performance difference between filtering methods was less pronounced than the impact of window size, indicating that window size selection is the more critical parameter in decomposition-based forecasting.
These findings have significant implications for energy forecasting applications. The optimal approach for high-accuracy household consumption prediction appears to involve a small decomposition window (a window size of three), preferably using a binomial filter, with separate LSTM models trained on the resulting frequency components. This configuration preserves essential signal characteristics while enabling specialized prediction of different temporal patterns, resulting in remarkably accurate forecasts across diverse household environments and temporal resolutions.
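A minimal sketch of how such a configuration can be assembled is given below, assuming Keras LSTMs with illustrative layer sizes, a 60-step input window, and a binomial filter of width three; this is not the exact architecture or training configuration reported above.

import numpy as np
from scipy.special import comb
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

def binomial_smooth(x, w=3):
    """Low-pass filter the signal with normalized binomial weights of length w."""
    k = comb(w - 1, np.arange(w))
    return np.convolve(x, k / k.sum(), mode="same")

def to_windows(x, lag=60):
    """Sliding windows of `lag` past values mapped to the next value."""
    X = np.stack([x[i:i + lag] for i in range(len(x) - lag)])
    return X[..., None], x[lag:]

def make_lstm(lag=60):
    model = Sequential([LSTM(64, input_shape=(lag, 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    return model

def fit_dual(signal, lag=60, epochs=20):
    low = binomial_smooth(signal)            # low-frequency component
    high = signal - low                      # high-frequency residual
    models = []
    for component in (low, high):
        X, y = to_windows(component, lag)
        m = make_lstm(lag)
        m.fit(X, y, epochs=epochs, batch_size=256, verbose=0)
        models.append(m)
    # At prediction time, each model forecasts its own component from the most
    # recent `lag` values, and the two forecasts are summed.
    return models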
The consistency of these results across two independent datasets with different characteristics—the Smart House dataset with numerous appliance-specific measurements and the Mexican Household dataset with different climate conditions and usage patterns—suggests that the benefits of frequency decomposition are generalizable rather than dataset-specific. Further analysis revealed that the decomposition approach performed well regardless of the underlying variations in usage patterns or environmental conditions. For both datasets, we observed similar improvements in key performance metrics such as Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE), indicating the robustness of the model. This suggests that the model’s performance is not overly reliant on specific features or data distributions but rather on the general ability of frequency decomposition to isolate and model relevant temporal patterns. These findings underscore the model’s adaptability across different real-world scenarios, providing confidence in its potential for broader applications beyond the datasets considered here.