Article

Short-Term Load Forecasting Based on Similar Day Theory and BWO-VMD

Department of Electrical and Electronic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
*
Author to whom correspondence should be addressed.
Energies 2025, 18(9), 2358; https://doi.org/10.3390/en18092358
Submission received: 15 March 2025 / Revised: 8 April 2025 / Accepted: 28 April 2025 / Published: 6 May 2025
(This article belongs to the Special Issue Machine Learning for Energy Load Forecasting)

Abstract

Short-term power load forecasting at the regional level is essential for maintaining grid stability and optimizing power generation, consumption, and maintenance scheduling. Considering the temporal, periodic, and nonlinear characteristics of power load, a novel short-term load forecasting method is proposed in this paper. First, Random Forest importance ranking is applied to select similar days and a weighted eigenspace coordinate system is established to measure similarity. The daily load sequence is then decomposed into high-, medium-, and low-frequency components using Variational Mode Decomposition (VMD). The high-frequency component is predicted using the similar day averaging method, while neural networks are employed for the medium and low-frequency components, leveraging historical and similar-day data, respectively. This multi-faceted approach enhances the accuracy and granularity of load pattern analysis. The final forecast is obtained by summing the predictions of these components. The case study demonstrates that the proposed model outperforms LSTM, GRU, CNN, TCN and Transformer, with an RMSE of 660.54 MW and a MAPE of 7.81%, while also exhibiting fast computational speed and low CPU usage.

1. Introduction

Power load forecasting at the regional level is crucial for departments involved in power dispatch, consumption, and planning, as it significantly impacts the operation, planning, and dispatch of power grids across the entire region [1]. Power load forecasting is typically categorized into long-term, medium-term, and short-term forecasting, based on the different periods [2]. Short-term Load Forecasting (STLF) typically covers the next few minutes to a week. Accurate STLF aids in the optimal scheduling of generating units, consumption planning, and maintenance, thereby ensuring the grid’s safety and stability.
The integration of large-scale renewable energy sources adds complexity to the system, increasing the demand for precise STLF at both system and regional levels [3,4]. Power loads, influenced by human activities, weather, and other factors, exhibit temporality, periodicity, and nonlinearity, which makes accurate prediction challenging [5]. Numerous studies have been conducted on regional-level STLF, with methods mainly including traditional and machine learning approaches [6,7]. Traditional forecasting methods such as multivariate linear regression [8], Support Vector Regression (SVR) [9], autoregression [10], and exponential smoothing [11], though simple and quick, struggle with nonlinear relationships. In recent years, machine learning methods have been increasingly applied because of their ability to handle nonlinearity. Reference [12] employs Random Forest (RF) regression to predict groundwater, demonstrating excellent forecasting performance compared to other regression models. In the application of neural networks to load forecasting, Convolutional Neural Networks (CNNs) can extract features from data and accurately handle nonlinear sequences [13] but may compromise the continuity of power load as a time series [14]. Advanced models like Long Short-Term Memory (LSTM) [15] and Gated Recurrent Units (GRU) [16] are widely used because their recurrent structure is designed for temporal data. Recently, the Transformer has been introduced into load forecasting as a sequence-to-sequence deep model architecture [17].
In recent years, hybrid methods have been increasingly applied to short-term forecasting. Reference [18] combines CNN and LSTM and applies this model to the power system in Bangladesh. The results confirm higher accuracy in load forecasting compared to non-hybrid models. Reference [19] combines CNN and Transformer for load forecasting, demonstrating its superiority over individual models. However, neural networks have the drawbacks of high computational complexity, slow convergence, and a tendency to become trapped in local minima [20,21]. Therefore, the combination of time-series analysis and neural networks has been introduced.
References [22,23] propose the combination of similar day theory and neural network for STLF. The similar day theory selects historical days with similar characteristics to the forecasted day, which can help the model to find days that are more similar to the load sequence of the forecasted day thus improving the prediction accuracy [24,25]. Notably, extracting key features and identifying links between features are the focus of similar day theory. Reference [26] notes that similar days have the same weekday index and weather similar to the forecasted day. Reference [27] highlights the impact of renewable energy on the power grid. In load forecasting, incorporating the generation of renewable energy as features can also improve prediction accuracy. Reference [28] identifies similar days using Euclidean distance. Reference [29] integrates K-means clustering with the similar day theory, employing a weighted Euclidean distance to identify similar days. Reference [30] proposes a machine learning-based approach for similar day selection. In some studies, combining RF with weighted Euclidean distance has been shown to enhance the interpretability of similar day selection [31,32].
However, similar day theory may not be suitable for all load sequences under varying frequencies. Some other models introduce signal decomposition methods to extract key features and reduce model complexity [33]. Empirical Mode Decomposition (EMD) [34] is utilized to decompose and reconstruct the load sequence to capture the characteristics. Similar methods are Ensemble Empirical Mode Decomposition (EEMD) [35] and Variational Mode Decomposition (VMD) [36]. They can all be used in combination with ANNs. Among them, VMD is superior to other methods in dealing with signal non-stationarity [37]. Since VMD requires manual parameter setting, it is necessary to optimize these parameters. Reference [38] proposes the Greylag Goose Optimization (GGO) algorithm inspired by the V-formation flight of geese and demonstrates superior accuracy over other algorithms in both benchmark tests and engineering applications. Similarly, the Grey Wolf and Dingo Optimization (GWDTO) algorithm combines the cooperative hunting strategy of Grey Wolf Optimization (GWO) with the dynamic search behavior of Dingo Optimization (DTO), effectively balancing exploration and exploitation [39]. The hybrid approach of Particle Swarm Optimization and Al-Biruni Earth Radius (PSO-BER) optimization leverages the fast convergence of PSO and the local refinement ability of BER, achieving competitive results in high-dimensional spaces [40]. However, the Beluga Whale Optimization (BWO) algorithm, inspired by the behaviors of beluga whales, demonstrates outstanding performance across 30 benchmark functions, surpassing 15 other metaheuristic algorithms in scalability and optimization accuracy [41]. Given its strong exploration and exploitation balance, as well as its global convergence capabilities, BWO shows great potential for optimizing Variational Mode Decomposition (VMD) in complex real-world applications [42].
Although hybrid models often improve prediction accuracy, they also introduce challenges related to computational efficiency. STLF is typically applied in real-time scheduling and management of power systems. To make timely adjustments, the model must be able to complete computations within a short time frame [43]. Reference [44] compares the computation time and CPU usage of various hybrid models to select the ones with higher computational efficiency.
In this paper, a novel, efficient short-term load forecasting model based on similar day theory and VMD is proposed, which analyzes the 96-point load data of historical days and predicts the load sequence at 96 points for the next day. The contributions are summarized as follows:
(1) A weighted eigenspace coordinate system is proposed to select similar days. It uses Euclidean space and RF importance ranking to calculate similarity, which is then used to determine similar days. This similar day selection method improves the prediction accuracy of both medium-frequency and high-frequency components compared to the historical day-based method.
(2) The load sequence is decomposed using the VMD algorithm, whose parameters are optimized by BWO. The optimization ensures that the Intrinsic Mode Functions (IMFs) exhibit the lowest level of complexity, thereby making the trend and period of the decomposed sequence more distinct and recognizable by the forecasting model.
(3) A novel short-term load forecasting model combining similar day theory and VMD is proposed. By applying the similar day averaging method for high-frequency components and neural networks for medium- and low-frequency components, the model improves accuracy. Case studies demonstrate its superior performance compared to traditional methods.
The rest of this paper is organized as follows: Section 2 describes the methodology for similar day selection. Section 3 presents the general flow of the proposed short-term load forecasting model combining similar day theory and VMD. In Section 4, a case study confirms the utility of the model. Section 5 concludes the paper. The general framework of this paper is shown in Figure 1.

2. Similar Day Selection

Traditional forecasting methods often use the days 1, 2, and 7 days before the forecasted day as inputs, but these are not necessarily the days most similar to the forecasted day [45]. Similar temperature, humidity, and tariff conditions are found to produce similar load sequences. Therefore, this section describes the method of selecting similar days for the forecasting models.

2.1. Feature Processing

Due to the different resolutions of load data, this section considers daily features to accommodate inputs from various datasets. Since the units of the features differ, they must first be normalized to ensure consistency. The normalization process employs min–max normalization to achieve this:
$$x_j = \frac{\hat{x}_j - \hat{x}_j^{\min}}{\hat{x}_j^{\max} - \hat{x}_j^{\min}}$$
where $\hat{x}_j$ is the value of feature j, and $\hat{x}_j^{\min}$ and $\hat{x}_j^{\max}$ are its minimum and maximum values. Features can include maximum temperature, average temperature, humidity, electricity prices, atmospheric pressure, weekday index, etc., depending on the dataset. The last-moment load, first-moment load, and peak load of the historical day can also be selected as features.
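The normalization above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `min_max_normalize` and the temperature values are hypothetical, and mapping a constant feature to 0.0 is an assumption made here because the formula is undefined in that case.

```python
def min_max_normalize(values):
    """Scale one feature column to [0, 1] with min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula is 0/0, so map to 0.0 (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily maximum temperatures (degrees Celsius) for five days.
max_temperature = [28.0, 31.5, 35.0, 30.0, 33.0]
normalized = min_max_normalize(max_temperature)
```

Each feature column is normalized independently, so features with different units become comparable before distances are computed.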

2.2. Feature Weight Determination

In order to analyze the effect of each feature on the similarity of daily load sequences, the RF importance ranking method is adopted.
RF first selects samples from the sample set to form a training set using bootstrap sampling and then constructs a decision tree based on the training set obtained through this sampling process. At each node of the tree, h features are selected randomly and without repetition. These h features are used to partition the training set in order to identify the optimal splitting feature, which can be evaluated using the Gini coefficient, gain ratio, or information gain. The Gini coefficient is defined as:
$$GI_q = \sum_{s=1}^{S} \hat{p}_{qs} (1 - \hat{p}_{qs})$$
where S is the number of categories in the training set and $\hat{p}_{qs}$ is an estimate of the probability that a sample of node q belongs to class s.
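As a small worked example of the Gini coefficient of a node, the sketch below estimates the class probabilities from label frequencies. The function name `gini_impurity` and the class labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini coefficient of a node: sum over classes of p * (1 - p)."""
    n = len(labels)
    counts = Counter(labels)
    # p is estimated as the fraction of samples in the node belonging to each class.
    return sum((c / n) * (1 - c / n) for c in counts.values())


# A node with two equally frequent (hypothetical) classes is maximally impure.
mixed_node = gini_impurity(["peak", "peak", "valley", "valley"])
pure_node = gini_impurity(["peak", "peak", "peak", "peak"])
```

A pure node scores 0, and the score grows as the classes in the node become more evenly mixed.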
The above steps are then repeated k times, where k represents the number of decision trees in RF. The test samples are predicted using the trained RF, and the final result is determined through a majority voting method.
In this paper, the Gini coefficient is used to evaluate the importance scores of features, and the change in the Gini coefficient before and after the ith decision tree node q is branched by feature j is as follows:
$$VIM_{jq}^{(Gini)(i)} = GI_q^{(i)} - GI_l^{(i)} - GI_r^{(i)}$$
where $GI_q^{(i)}$ denotes the Gini coefficient of node q before branching, and $GI_l^{(i)}$ and $GI_r^{(i)}$ denote the Gini coefficients of the two new nodes l and r after branching. If the nodes at which feature j appears in decision tree i are collectively referred to as set Q, then the importance score at tree i is
$$VIM_j^{(Gini)(i)} = \sum_{q \in Q} VIM_{jq}^{(Gini)(i)}$$
Since there are k trees in the RF, the final importance score of feature j is
$$w_j = \frac{1}{k} \sum_{i=1}^{k} VIM_j^{(Gini)(i)}$$
The normalized importance scores are considered as the feature weights.
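The aggregation from per-node Gini changes to normalized feature weights can be sketched as follows. This is an illustration under stated assumptions: the split records (feature name, Gini coefficient before the split, Gini coefficients of the two child nodes) are invented numbers, and `feature_weights` is a hypothetical helper. The sketch follows the unweighted difference given in the text; library implementations usually also weight each term by the fraction of samples reaching the node.

```python
def feature_weights(trees, feature_names):
    """Average the per-tree Gini importance of each feature, then normalize to sum to 1."""
    k = len(trees)
    scores = {name: 0.0 for name in feature_names}
    for splits in trees:
        for feature, gi_parent, gi_left, gi_right in splits:
            # Change in Gini coefficient at this node, averaged over the k trees.
            scores[feature] += (gi_parent - gi_left - gi_right) / k
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all importance scores are zero; weights are undefined")
    return {name: score / total for name, score in scores.items()}


# Two hypothetical trees, each described by its split records.
trees = [
    [("temperature", 0.50, 0.10, 0.10), ("humidity", 0.10, 0.00, 0.05)],
    [("temperature", 0.48, 0.08, 0.10), ("weekday", 0.10, 0.00, 0.00)],
]
weights = feature_weights(trees, ["temperature", "humidity", "weekday"])
```

In this toy example temperature dominates with a weight of about 0.8, so it would contribute most to the similarity measure.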

2.3. Weighted Eigenspace Coordinate System Construction

Once the values and weights of features have been determined, the weighted Euclidean distance can be calculated as follows:
$$dis_{mn} = \sqrt{\sum_{j=1}^{d} w_j (x_{mj} - x_{nj})^2}$$
This distance is defined as the similarity: the smaller it is, the more similar the two days are.
The similarity is represented in the spatial coordinate system as shown in Figure 2. The black dashed line corresponds to the eigenspace coordinate system, in which d coordinate axes are established for the d features. The coordinate on the feature j axis is $x_j$. The daily feature coordinate points corresponding to all features on day m and day n are $b_m(x_{m1}, x_{m2}, \ldots, x_{md})$ and $b_n(x_{n1}, x_{n2}, \ldots, x_{nd})$. The orange dashed line is the weighted eigenspace coordinate system with weights $w_j$, corresponding to the daily feature coordinate points $b_m(w_1 x_{m1}, w_2 x_{m2}, \ldots, w_d x_{md})$ and $b_n(w_1 x_{n1}, w_2 x_{n2}, \ldots, w_d x_{nd})$. The distance between two points measures the similarity of features between two days.
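Similar day selection with the weighted distance can be sketched as a ranking of candidate days. The sketch assumes features are already normalized and weights already computed. The day labels, feature vectors, and the helper names `weighted_distance` and `select_similar_days` are hypothetical.

```python
import math


def weighted_distance(x_m, x_n, weights):
    """Weighted Euclidean distance between the feature vectors of two days."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_m, x_n)))


def select_similar_days(target, candidates, weights, s):
    """Return the labels of the s candidate days closest to the target day."""
    ranked = sorted(
        candidates,
        key=lambda day: weighted_distance(target, candidates[day], weights),
    )
    return ranked[:s]


# Hypothetical normalized features: [max temperature, humidity, weekday index].
weights = [0.6, 0.3, 0.1]
forecast_day = [0.80, 0.50, 1.0]
history = {
    "day-1": [0.20, 0.50, 1.0],
    "day-2": [0.75, 0.40, 0.0],
    "day-3": [0.80, 0.55, 1.0],
}
similar = select_similar_days(forecast_day, history, weights, s=2)
```

Here `day-3` ranks first because it differs from the forecasted day only slightly, and only in a moderately weighted feature.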

3. Short-Term Load Forecasting Based on Similar Day Theory and BWO-VMD

3.1. BWO Algorithm

The BWO algorithm is an optimization algorithm inspired by the population behavior of beluga whales. The algorithm contains three phases: exploration, exploitation, and whale fall.
A population consisting of beluga whales can be represented as follows:
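$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,t} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,t} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,t} \end{bmatrix}$$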
where n is the population size of beluga whales and t is the dimension of the variable to be optimized. The fitness function of beluga whales can be expressed as follows:
$$F_X = \begin{bmatrix} f(x_{1,1}, x_{1,2}, \ldots, x_{1,t}) \\ f(x_{2,1}, x_{2,2}, \ldots, x_{2,t}) \\ \vdots \\ f(x_{n,1}, x_{n,2}, \ldots, x_{n,t}) \end{bmatrix}$$
The location of the beluga whale can be considered as a search agent, which follows a random initialization approach:
$$X_{i,j} = lb_j + (ub_j - lb_j) \times r$$
where $ub_j$ and $lb_j$ are the upper and lower bounds of the jth variable, and r is a random number in the range (0, 1). In addition, beluga whales transition from the exploration phase to the exploitation phase according to the balance factor $B_f$:
$$B_f = B_0 \left(1 - \frac{g}{2 G_{\max}}\right)$$
where $B_0$ is a random number in the range (0, 1), g is the current iteration number, and $G_{\max}$ is the maximum iteration number. When $B_f > 0.5$, the beluga whale is in the exploration phase and swims in a mirrored pair; when $B_f < 0.5$, it is in the exploitation phase and engages in predatory behavior.
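The random initialization and the balance factor can be sketched as below. The bounds shown for the two VMD parameters (number of modes and penalty factor) are hypothetical values chosen for illustration, not the search range used in the paper, and the function names are assumptions.

```python
import random


def init_population(n, lb, ub, rng):
    """Place n search agents uniformly at random inside the box [lb, ub]."""
    return [
        [lb[j] + (ub[j] - lb[j]) * rng.random() for j in range(len(lb))]
        for _ in range(n)
    ]


def balance_factor(b0, g, g_max):
    """Balance factor B_f; values above 0.5 select exploration, below 0.5 exploitation."""
    return b0 * (1 - g / (2 * g_max))


rng = random.Random(42)
lb, ub = [2.0, 10.0], [10.0, 3000.0]  # hypothetical bounds for (k, alpha)
population = init_population(5, lb, ub, rng)
phase = "exploration" if balance_factor(rng.random(), g=1, g_max=100) > 0.5 else "exploitation"
```

Because the factor shrinks from $B_0$ toward $B_0/2$ as iterations proceed, exploration becomes less likely late in the search.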
(1) Exploration phase: The position of the search agent is determined by the swimming of a pair of beluga whales, whose positions are updated as follows:
$$X_{i,j}^{g+1} = \begin{cases} X_{i,p_j}^{g} + (X_{r,p_1}^{g} - X_{i,p_j}^{g})(1 + r_1)\sin(2\pi r_2), & j \text{ even} \\ X_{i,p_j}^{g} + (X_{r,p_1}^{g} - X_{i,p_j}^{g})(1 + r_1)\cos(2\pi r_2), & j \text{ odd} \end{cases}$$
where $X_{i,j}^{g+1}$ is the new position of the ith beluga whale in the jth dimension, $X_{r,p_1}^{g}$ and $X_{i,p_j}^{g}$ are the current positions of the rth and ith beluga whale (r is random), $r_1$ and $r_2$ are random numbers in the range (0, 1), and $\sin(2\pi r_2)$ and $\cos(2\pi r_2)$ are random factors that represent the fin orientation of the mirrored pair of whales.
(2) Exploitation phase: The Levy flight strategy is introduced in the exploitation phase of BWO to capture prey, and its mathematical model of predation is represented as follows:
$$X_i^{g+1} = r_3 X_{best}^{g} - r_4 X_i^{g} + C_1 \cdot L_F \cdot (X_r^{g} - X_i^{g})$$
where $X_i^{g}$ and $X_r^{g}$ are the current positions of the ith and rth beluga whale (r is random), $X_i^{g+1}$ is the new position of the ith beluga whale, $X_{best}^{g}$ is the best position among the beluga whales, and $r_3$ and $r_4$ are random numbers in the range (0, 1). $C_1 = 2 r_4 (1 - g / G_{\max})$ is the random jump strength, which measures the intensity of the Levy flight. $L_F$ is computed as follows:
$$L_F = 0.05 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \qquad \sigma = \left( \frac{\Gamma(1 + \beta) \times \sin(\pi \beta / 2)}{\Gamma[(1 + \beta)/2] \times \beta \times 2^{(\beta - 1)/2}} \right)^{1/\beta}$$
where u and v are normally distributed random numbers and β is a constant that defaults to 1.5.
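The Levy flight term and the exploitation update can be sketched directly from the formulas above. The sketch draws a single scalar $L_F$ per update, which is an assumption (a per-dimension draw would be an equally plausible reading), and the function names are hypothetical.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Scale parameter sigma of the Levy flight for a given beta."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)


def levy_flight(rng, beta=1.5):
    """One Levy flight step L_F built from two standard normal draws u and v."""
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    return 0.05 * u * levy_sigma(beta) / abs(v) ** (1 / beta)


def exploit(x_i, x_r, x_best, g, g_max, rng):
    """Exploitation update of one agent (scalar L_F shared across dimensions: an assumption)."""
    r3, r4 = rng.random(), rng.random()
    c1 = 2 * r4 * (1 - g / g_max)  # random jump strength
    lf = levy_flight(rng)
    return [r3 * b - r4 * xi + c1 * lf * (xr - xi) for xi, xr, b in zip(x_i, x_r, x_best)]


rng = random.Random(0)
new_position = exploit([5.0, 800.0], [7.0, 1200.0], [8.0, 50.0], g=10, g_max=100, rng=rng)
```

For the default $\beta = 1.5$ the scale parameter evaluates to roughly 0.697. In a full optimizer the new position would still need to be clipped to the variable bounds.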
(3) Whale fall phase: The location of the beluga whale is updated as:
$$X_i^{g+1} = r_5 X_i^{g} - r_6 X_i^{g} + r_7 X_{step}$$
where $r_5$, $r_6$ and $r_7$ are random numbers in the range (0, 1), $X_{step} = (ub - lb)\exp(-C_2 g / G_{\max})$ is the step size of the whale fall, $C_2 = 2 W_f \times n$ is the step factor, and ub and lb are the upper and lower bounds of the variables. The probability of a whale fall is designed as a linear function:
$$W_f = 0.1 - 0.05 \frac{g}{G_{\max}}$$
The probability of a whale fall decreases from 0.1 in the initial iteration to 0.05 in the last iteration.
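The whale fall probability and step size can be checked numerically with a short sketch. The bounds and population size are hypothetical, and the function names are assumptions.

```python
import math


def whale_fall_probability(g, g_max):
    """Linearly decreasing whale fall probability W_f."""
    return 0.1 - 0.05 * g / g_max


def whale_fall_step(lb, ub, g, g_max, n):
    """Whale fall step size per dimension, with step factor C_2 = 2 * W_f * n."""
    c2 = 2 * whale_fall_probability(g, g_max) * n
    return [(u - l) * math.exp(-c2 * g / g_max) for l, u in zip(lb, ub)]


# Hypothetical bounds and a population of 20 whales.
lb, ub = [2.0, 10.0], [10.0, 3000.0]
first_step = whale_fall_step(lb, ub, g=0, g_max=100, n=20)
last_step = whale_fall_step(lb, ub, g=100, g_max=100, n=20)
```

The step starts at the full width of the search range and shrinks exponentially, so late whale falls perturb the position only slightly.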

3.2. VMD

VMD is an adaptive signal decomposition method that effectively handles non-stationary and nonlinear signals. By iteratively identifying variational modes, the original time series is decomposed into several IMFs with limited bandwidth. The algorithm mainly consists of constructing the variational problem and solving it. The solution process of VMD is subject to two requirements: (1) the sum of the estimated bandwidths of the IMFs around their center frequencies is minimized; (2) the sum of all IMFs is equal to the original signal.
The VMD algorithm defines each IMF as a finite-bandwidth signal under stringent constraints, as shown in (16).
$$u_k(t) = A_k(t)\cos(\phi_k(t))$$
where $A_k(t)$ and $\phi_k(t)$ are the instantaneous amplitude and phase of the kth IMF, and the number of IMFs is set in advance.
(1) Construction of the variational problem
To construct the variational problem, each IMF $u_k(t)$ is Hilbert transformed to obtain an analytic signal with a one-sided spectrum, which is then multiplied by $e^{-j w_k t}$ to shift its spectrum to baseband around the center frequency $w_k$:
$$\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j w_k t}$$
where $\delta(t)$ is the impulse function and $*$ denotes the convolution operation.
After computing the squared L2 norm of the gradient of this demodulated signal to estimate the bandwidth of each IMF, the variational problem is constructed as follows:
$$\min_{\{u_k\},\{w_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left\{ \left[ \delta(t) + \frac{j}{\pi t} \right] * u_k(t) \right\} e^{-j w_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f$$
(2) Solution of the variational problem
In order to transform the above constrained variational problem into an unconstrained variational problem, a quadratic penalty factor α and a Lagrange multiplier λ are introduced in (17), and the augmented Lagrangian is:
$$L(\{u_k\}, \{w_k\}, \lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left\{ \left[ \delta(t) + \frac{j}{\pi t} \right] * u_k(t) \right\} e^{-j w_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t), f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$
It is solved by the Alternating Direction Method of Multipliers (ADMM), with the following update rules:
$$\hat{u}_k^{n+1}(w) = \frac{\hat{f}(w) - \sum_{i=1}^{k-1} \hat{u}_i^{n+1}(w) - \sum_{i=k+1}^{K} \hat{u}_i^{n}(w) + \frac{\hat{\lambda}^{n}(w)}{2}}{1 + 2\alpha (w - w_k^{n})^2}, \qquad w_k^{n+1} = \frac{\int_0^{\infty} w \, |\hat{u}_k^{n+1}(w)|^2 \, dw}{\int_0^{\infty} |\hat{u}_k^{n+1}(w)|^2 \, dw}$$
where $\hat{u}_k^{n+1}(w)$ is the Wiener-filtered estimate of the current residual signal, the hat denotes the Fourier transform, $w_k^{n+1}$ is the center of gravity of the power spectrum of the current IMF, and n is the iteration number.
In VMD, the number of decomposed components k and the quadratic penalty factor α are two important parameters optimized by BWO. An excessively large value of k can cause over-decomposition, while a value that is too small may lead to under-decomposition. Likewise, setting α too high can result in the loss of frequency band information, whereas setting it too low may introduce information redundancy.
To find the optimal values of k and α, the fitness function is defined as the minimum envelope entropy. Minimizing envelope entropy improves the separability and regularity of the decomposed modes, as lower entropy indicates less mode mixing and more concentrated intrinsic features in the signal. The envelope entropy $E_p$ of a zero-mean signal $x(j)$ can be expressed as:
$$E_p = -\sum_{j=1}^{N} p_j \lg p_j, \qquad p_j = \frac{a(j)}{\sum_{j=1}^{N} a(j)}$$
where $p_j$ is the normalized form of $a(j)$ and $a(j)$ is the envelope signal obtained by Hilbert demodulation of the signal $x(j)$.
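The entropy step of the fitness function can be sketched on its own. The sketch assumes the envelope $a(j)$ has already been obtained (the Hilbert demodulation is not shown), and the function name and envelope values are hypothetical.

```python
import math


def envelope_entropy(envelope):
    """Envelope entropy E_p of a non-negative envelope sequence a(j), using log base 10."""
    total = sum(envelope)
    probabilities = [a / total for a in envelope]
    # Terms with p = 0 contribute nothing (the limit of p * lg p is 0).
    return -sum(p * math.log10(p) for p in probabilities if p > 0)


# A flat envelope spreads energy evenly and gives the maximum entropy lg(N).
flat = envelope_entropy([1.0] * 8)
# An envelope concentrated at one sample gives zero entropy.
spike = envelope_entropy([1.0, 0.0, 0.0, 0.0])
# A hypothetical, moderately concentrated envelope falls in between.
peaked = envelope_entropy([0.1, 0.2, 3.0, 0.2, 0.1, 0.1, 0.2, 0.1])
```

An optimizer such as BWO would evaluate this quantity for the modes produced by each candidate (k, α) pair and keep the pair with the smallest value.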

3.3. Signal Reconstruction

When reconstructing the signal, in order to divide the signal into three components, high-frequency, medium-frequency, and low-frequency, it is necessary to calculate the center frequency of each IMF as well as to set the high-frequency threshold and the low-frequency threshold.
(1) Center frequency calculation: The data are sampled every 15 min, giving a total of 96 data points per day. The sampling frequency is therefore 96 samples per day, i.e., 1/900 Hz. Multiplying this value by the dimensionless center frequency obtained in the VMD solving process gives the center frequency in Hz.
(2) High-frequency threshold setting: The high frequency component usually corresponds to short-term fluctuations within a day. The high-frequency threshold may be set to correspond to fluctuations over a period of hours. The data show significant fluctuations every 2 h, so the high-frequency threshold can be set to correspond to a 2 h frequency, which can be calculated as follows:
$$f_{\max} = \frac{1}{2 \times 3600} = 1.389 \times 10^{-4} \ \text{Hz}$$
(3) Low-frequency threshold setting: The low-frequency component usually corresponds to the long-term trend over the course of a day. In order to extract the main trend of the daily load sequence, the low-frequency threshold can be set to correspond to a 24 h period:
$$f_{\min} = \frac{1}{24 \times 3600} = 1.157 \times 10^{-5} \ \text{Hz}$$
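The conversion to Hz and the three-way grouping can be sketched as follows. The normalized center frequencies are invented values in cycles per sample, and assigning a frequency exactly on a threshold to the high or low group is an assumption made here.

```python
SAMPLING_FREQUENCY_HZ = 1 / 900  # one sample every 15 min
F_MAX_HZ = 1 / (2 * 3600)        # high-frequency threshold (2 h period)
F_MIN_HZ = 1 / (24 * 3600)       # low-frequency threshold (24 h period)


def classify_imf(normalized_center_frequency):
    """Convert a dimensionless center frequency to Hz and assign a frequency band."""
    f_hz = normalized_center_frequency * SAMPLING_FREQUENCY_HZ
    if f_hz >= F_MAX_HZ:
        return "high"
    if f_hz <= F_MIN_HZ:
        return "low"
    return "medium"


# Hypothetical normalized center frequencies of five IMFs.
center_frequencies = [0.005, 0.03, 0.08, 0.2, 0.4]
bands = [classify_imf(f) for f in center_frequencies]
```

IMFs that fall in the same band are then summed to form the low-, medium-, and high-frequency components.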

3.4. CNN

The forecasting model uses CNN.
As shown in Figure 3, the CNN mainly consists of an input layer, a convolutional layer, a ReLU layer, a pooling layer, and a fully connected layer. The model takes historical load data and features as inputs, extracts the features of the load through the convolution operation, and then reduces the computational complexity through the pooling layer. Finally, the fully connected layer converts the features into a one-dimensional structure and outputs the predicted values. The core of the CNN is the convolutional layer, in which the convolution kernel extracts the features of the load by performing the convolution operation. The formula for feature extraction by the convolution kernel is:
$$c = f(X * W_n + b_n)$$
where n denotes the convolution kernel size, f denotes the activation function, X denotes the input sequence, $*$ denotes the convolution operation, W denotes the weights, and b denotes the bias vector.
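The convolution formula can be illustrated with a one-dimensional example. This sketch uses a single kernel with a scalar bias and the sliding-window (cross-correlation) form that deep learning libraries commonly call convolution. The input and kernel values are hypothetical and do not reflect the network configuration used in the paper.

```python
def relu(z):
    """ReLU activation."""
    return max(0.0, z)


def conv1d(x, kernel, bias, activation=relu):
    """Slide the kernel over x (no padding, stride 1) and apply the activation."""
    n = len(kernel)
    return [
        activation(sum(x[i + j] * kernel[j] for j in range(n)) + bias)
        for i in range(len(x) - n + 1)
    ]


# A kernel of size 3 that responds to a rising load (hypothetical values).
load = [1.0, 2.0, 3.0, 4.0, 5.0]
features = conv1d(load, kernel=[-1.0, 0.0, 1.0], bias=0.5)
```

With a kernel of size n and no padding, an input of length L yields L - n + 1 feature values.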
By integrating the aforementioned processes, the workflow of the short-term load forecasting model based on the similar-day theory and VMD is obtained, as illustrated in Figure 4.
The load forecasting model developed in this section first decomposes the daily load sequences using the VMD method. Two parameters of VMD, namely the number of decomposed components k and the quadratic penalty factor α , are determined by BWO algorithm. Subsequently, the decomposed signals are categorized into high-frequency, medium-frequency, and low-frequency components based on their respective frequency ranges.
The low-frequency component captures the long-term trend. Power loads with similar climatic conditions and weekday indices exhibit comparable trends within a day, so the training set uses data from similar days. The medium-frequency component mainly reflects fluctuations and is strongly correlated with historical days. Therefore, the training set utilizes data from historical days. The high-frequency component is primarily composed of noise and interference, which is difficult to predict, so a similar day averaging method is employed.
Finally, the three components are summed to obtain the predicted load.
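The similar day averaging of the high-frequency component and the final summation can be sketched as below. The sequences are shortened to two points instead of 96 for readability. All values and function names are hypothetical.

```python
def similar_day_average(components):
    """Average the components of several similar days point by point."""
    s = len(components)
    return [sum(points) / s for points in zip(*components)]


def combine(low, medium, high):
    """Sum the three predicted components to obtain the load forecast."""
    return [l + m + h for l, m, h in zip(low, medium, high)]


# High-frequency components of three similar days (two time points each).
high_similar_days = [[1.0, -2.0], [3.0, 0.0], [2.0, -1.0]]
high_forecast = similar_day_average(high_similar_days)

# Hypothetical network outputs for the low- and medium-frequency components.
low_forecast = [100.0, 110.0]
medium_forecast = [5.0, -5.0]
load_forecast = combine(low_forecast, medium_forecast, high_forecast)
```

Averaging requires no training, which is consistent with treating the high-frequency component as noise that is hard to predict.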

4. Case Study

In this paper, the daily 96-point load of a region in China from 1 January 2012 to 31 December 2014 is selected to be analyzed and predicted [46]. Additional experimental details can be found in the Supplementary Materials.

4.1. Evaluation Criteria

Common indicators for evaluating forecasting models include MSE, RMSE, MAE, and MAPE. Applying the same indicators to every forecasting model allows their performance to be compared directly. RMSE focuses on the impact of large errors, while MAPE provides a normalized and easily interpretable error measure. Therefore, using these two indicators together helps to comprehensively evaluate the model’s performance.
RMSE is the Root Mean Square Error. It is defined as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, and n is the number of samples. The range of RMSE is $[0, +\infty)$; the larger the error, the larger the value.
MAPE is the Mean Absolute Percentage Error, defined as follows:
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$$
The range of MAPE is $[0, +\infty)$. A MAPE of 0% indicates a perfect model, and the larger the error, the larger the value.
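Both indicators are straightforward to compute. The sketch below uses three invented load values in MW. It assumes the actual values are nonzero, since MAPE is undefined otherwise.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error, in the same unit as the load."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (actual values must be nonzero)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))


# Hypothetical actual and predicted loads in MW.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error_rmse = rmse(actual, predicted)
error_mape = mape(actual, predicted)
```

The same absolute error of 10 MW counts as 10% at a 100 MW load but only 5% at 200 MW, which is why MAPE complements RMSE.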
In addition to MAPE and RMSE, this study incorporates computation time and CPU usage as additional evaluation criteria. These indicators provide a more comprehensive assessment of model performance, especially when dealing with large datasets or real-time forecasting applications. While accuracy is critical, the computational efficiency of a model is equally important for practical deployment, particularly in power systems where timely adjustments need to be made.

4.2. Data Preprocessing

The data include load data and feature data. First, the weekday indices are obtained from an online perpetual calendar. Then the weights of the features are determined from the existing data. The results are shown in Table 1.
The 96-point load data of all days are decomposed with the BWO-VMD. The optimized number of decomposed sequences is 8, and the penalty coefficient is 50. To verify the validity of the parameters selected by BWO, a sliding window method is used to calculate the average envelope entropy and average energy for each IMF, as shown in Figure 5. From Figure 5, it can be observed that the low-frequency components exhibit lower envelope entropy and higher energy, indicating that these modes effectively capture the main trend components of the signal. In contrast, the high-frequency components show higher envelope entropy and lower energy, suggesting that they primarily capture the high-frequency fluctuations or noise in the signal. Overall, the VMD effectively extracts different frequency band information from the signal, validating the reasonableness of the VMD parameters optimized by BWO.
The VMD results of the forecasted day, taking 6 August 2013, as a case, are shown in Figure 6.
As shown in Figure 7, the eight IMFs are sorted according to the spectrogram and classified into high-frequency, medium-frequency, and low-frequency components according to the preset high-frequency and low-frequency thresholds. The decomposed curves of high-, medium-, and low-frequency components reveal distinct patterns that contribute differently to the overall prediction accuracy. The low-frequency component clearly captures the long-term trend of the daily load, offering stable and smooth variations that are critical for accurate baseline prediction. The medium-frequency component reflects more localized fluctuations, which align well with historical daily patterns and enhance the model’s ability to track dynamic changes. In contrast, the high-frequency component exhibits irregular and noisy behavior, suggesting limited predictive value. Nevertheless, applying a similar day averaging method to this component helps to mitigate residual noise. These observations confirm that the decomposition not only improves interpretability but also plays a vital role in enhancing prediction performance, particularly through the contributions of the low- and medium-frequency components.
As seen in Figure 8, the VMD leaves residuals. Calculation shows that the residuals are all below 0.1%, so they are neglected. All load sequences are reconstructed and saved as three data files.

4.3. Model Parameter Setting

(1) Low frequency
The number of days used in the training set is denoted by q, the number of days before the forecasted day that are searched for similar days is denoted by r, and the number of similar days selected is denoted by s. After an initial screening, q is selected as 60 or 90, r is selected as 10 or 20, and s is selected as 3 or 5. The test results are shown in Table 2.
Finally, the training set is determined to be the 60 days before the forecasted day, and three similar days are selected from the 10 days preceding the forecasted day. The forecasting model operates moment by moment. In both the training set and the test set, the input is the load of the similar days at the current moment, and the output is the load of the forecasted day at the same moment. A total of 96 forecasts are made to obtain the final 96-point load sequence of the forecasted day.
(2) Medium frequency
The medium-frequency training set is empirically tested by taking 60, 90, and 120 days before the forecasted day. The number of days used in the training set is denoted by n, and the final results are shown in Table 3.
Based on the above results, the training set is finalized as the 90 days before the forecasted day, and the 3 days immediately preceding the forecasted day are used as historical days. In both the training set and the test set, the input is the load of the historical days at the current moment together with the feature data of the historical days and the forecasted day, and the output is the load of the forecasted day at the same moment.
(3) High frequency
The high-frequency prediction is obtained by averaging the high-frequency components of the similar days at the corresponding moments. It is then added to the low-frequency and medium-frequency predictions to obtain the final prediction. In this case, the comparison between the predicted and actual high-frequency component using the similar day averaging method is shown in Figure 9.
As shown in Figure 9, the RMSE between the predicted value and the true value of the high-frequency component is 20.08 MW. This error is relatively small, so the method is applicable.

4.4. Performance of Similar Day Selection

Load data from 1 January 2014 to 31 December 2014 is taken to verify the superiority of similar day selection. The average results are shown in Figure 10 and Table 4.
As can be seen in Table 4, the correlation of 0.9189 for similar days is greater than 0.9104 for historical days.
The results of the proposed model based on similar days and historical days are shown in Figure 11 and Table 5. It can be seen that the flexible use of similar days for different frequency components can significantly improve prediction accuracy.

4.5. Performance of the Proposed Model

To verify the superiority of the proposed method, five models (LSTM, GRU, CNN, TCN, and Transformer) are used for comparison in this section. The models are compared in terms of MAPE, RMSE, computation time, and CPU usage. The predicted results for a typical day (18 February 2014) are shown in Figure 12. To verify the consistency of the model across different days, predictions are made for 20 days randomly selected from 2014, covering weekdays and holidays in all four seasons. The prediction results are shown in Table 6, and the computation time and CPU usage are shown in Table 7.
Table 6 shows that the average accuracy of the proposed model is higher than that of LSTM, GRU, CNN, TCN, and Transformer. Over the 20 test days, the proposed model achieves an average MAPE of 8.0% and an average RMSE of 686.0 MW, both lower than those of the other models. The average MAPE values of all models are highlighted in bold in Table 6.
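For reference, the two accuracy metrics used in this comparison can be written as below, following their standard definitions. The function names and the toy values are illustrative only.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the load (MW in this paper)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy example with made-up loads.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted))
```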
As shown in Table 7, the proposed model has a total computation time of 0.47 s, far lower than that of the Transformer (30.93 s) and also lower than those of LSTM (2.61 s) and GRU (2.34 s); only the standalone CNN (0.24 s) and TCN (0.43 s) are faster. Its CPU usage of 14.56% is comparable to that of the other models, all of which lie between about 14.5% and 14.9%. This efficiency can be attributed to the use of the CNN, which has a simple structure. Additionally, the determination of weights and the load decomposition of similar days are carried out during data preprocessing and are not involved in the training process, further reducing the model's computation time and memory usage. As a result, the proposed model is fast and requires modest computational resources, making it suitable for real-time applications.
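The text does not say how the computation time and CPU usage were measured. As one possible approach, the sketch below times a single call with the Python standard library; `measure` is a hypothetical helper, and the CPU time it reports is that of the current process, which is not the same quantity as the CPU usage percentage in Table 7.

```python
import time


def measure(fn, *args):
    """Return (result, wall_seconds, cpu_seconds) for one call of fn(*args)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time consumed by this process
    result = fn(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


# Stand-in workload; a real comparison would wrap model training plus prediction.
result, wall, cpu = measure(sum, range(1_000_000))
print(result, wall, cpu)
```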
To validate the rationality of the model parameters, 5-fold cross-validation is employed. The dataset is divided into 5 subsets and 5 rounds of training and validation are conducted, so that each subset serves as the validation set once while the remaining subsets form the training set. The average RMSE and average MAPE are presented in Table 8.
The cross-validation results show that the model achieves an average MAPE of 7.81% and an average RMSE of 660.54 MW, indicating reasonably accurate predictions with manageable error levels.
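The 5-fold procedure described above can be sketched as follows. The fold layout (contiguous, unshuffled blocks) and the names `k_fold_indices` and `cross_validate` are assumptions, since the text does not specify how the subsets were formed; `fit_and_score` stands for any user-supplied routine that trains on the training indices and returns an error metric on the validation indices.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds; return (train, validation) index lists."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


def cross_validate(n_samples, fit_and_score, k=5):
    """Average the score returned by fit_and_score(train_idx, val_idx) over the k folds."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Dummy scorer that just reports the validation fold size.
print(cross_validate(10, lambda train, val: float(len(val))))
```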

5. Conclusions

In this paper, a short-term load forecasting model combining similar day theory and BWO-VMD is proposed. The load is decomposed into frequency components, each predicted using different models. The conclusions are as follows:
(1) The architecture and parallel computing advantages of CNN enable it to complete training and prediction in a shorter time, whereas LSTM, GRU, TCN, and Transformer typically require more training time because of how they process time series. CNN is therefore adopted in this paper.
(2) Using similar day theory with VMD-CNN reduces MAPE by about 7% in different cases compared to using only historical days. This indicates that similar day theory can improve prediction accuracy.
(3) The proposed model achieves higher prediction accuracy than LSTM, GRU, CNN, TCN, and Transformer. It also has a short running time and relatively low CPU usage, which makes it favorable for practical applications where both accuracy and computational efficiency are critical.
Future research could process the residuals generated by the decomposition or apply smoothing operations to the different frequency components to further improve prediction accuracy. In addition, considering more features during similar day selection, such as the impact of renewable energy on load, has the potential to improve prediction accuracy.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/en18092358/s1.

Author Contributions

Q.C.: conceptualization, data curation, investigation, resources, software, writing—original draft. J.S.: conceptualization, methodology, supervision, writing—review and editing. S.C.: methodology, validation, visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hong, T.; Pinson, P.; Wang, Y.; Weron, R.; Yang, D.; Zareipour, H. Energy forecasting: A review and outlook. IEEE Open Access J. Power Energy 2020, 7, 376–388. [Google Scholar] [CrossRef]
  2. Timur, O.; Üstünel, H.Y. Short-Term Electric Load Forecasting for an Industrial Plant Using Machine Learning-Based Algorithms. Energies 2025, 18, 1144. [Google Scholar] [CrossRef]
  3. Zhang, R.; Zhang, C.; Yu, M. A similar day-based short-term load forecasting method using wavelet transform and LSTM. IEEJ Trans. Electr. Electron. Eng. 2022, 17, 506–513. [Google Scholar] [CrossRef]
  4. Zhu, J.; Dong, H.; Zheng, W.; Li, S.; Huang, Y.; Xi, L. Review and prospect of data-driven techniques for load forecasting in integrated energy systems. Appl. Energy 2022, 321, 119269. [Google Scholar] [CrossRef]
  5. Wen, Y.; Pan, S.; Li, X.; Li, Z. Highly fluctuating short-term load forecasting based on improved secondary decomposition and optimized VMD. Sustain. Energy Grids Netw. 2024, 37, 101270. [Google Scholar] [CrossRef]
  6. Jiang, L.; Wang, X.; Li, W.; Wang, L.; Yin, X.; Jia, L. Hybrid multitask multi-information fusion deep learning for household short-term load forecasting. IEEE Trans. Smart Grid 2021, 12, 5362–5372. [Google Scholar] [CrossRef]
  7. Kong, W.; Dong, Z.; Hill, D.J.; Luo, F.; Xu, Y. Short-term residential load forecasting based on resident behaviour learning. IEEE Trans. Power Syst. 2017, 33, 1087–1088. [Google Scholar] [CrossRef]
  8. Song, K.; Baek, Y.; Hong, D.H.; Jang, G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans. Power Syst. 2005, 20, 96–101. [Google Scholar] [CrossRef]
  9. Elattar, E.; Goulermas, J.; Wu, Q.H. Electric load forecasting based on locally weighted support vector regression. IEEE Trans. Syst. Man Cybern. 2010, 40, 438–447. [Google Scholar] [CrossRef]
  10. Taylor, J.W.; McSharry, P.E. Short-term load forecasting methods: An evaluation based on European data. IEEE Trans. Power Syst. 2007, 22, 2213–2219. [Google Scholar] [CrossRef]
  11. Vaghefi, A.; Jafari, M.A.; Emmanuel, B.; Lu, Y.; Brouwer, J. Modeling and forecasting of cooling and electricity load demand. Appl. Energy 2015, 136, 186–196. [Google Scholar]
  12. Khaled, S.; Kumar, M. Predictive Analysis of Groundwater Resources Using Random Forest Regression. J. Artif. Intell. Metaheuristics 2025, 9, 11–19. [Google Scholar] [CrossRef]
  13. Aouad, M.; Hajj, H.; Shaban, K.; Jabr, R.A.; El-Hajj, W. A CNN-sequence-to-sequence network with attention for residential short-term load forecasting. Electr. Power Syst. Res. 2022, 211, 108152. [Google Scholar] [CrossRef]
  14. Chen, K.; Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2019, 10, 3943–3952. [Google Scholar] [CrossRef]
  15. Sun, G.; Jiang, C.; Wang, X.; Yang, X. Short-term building load forecast based on a data-mining feature selection and LSTM-RNN method. IEEJ Trans. Electr. Electron. Eng. 2020, 15, 1002–1010. [Google Scholar] [CrossRef]
  16. Sajjad, M.; Khan, Z.A.; Ullah, A.; Hussain, T.; Ullah, W.; Lee, M.Y. A novel CNN-GRU-based hybrid approach for short-term residential load forecasting. IEEE Access 2020, 8, 143759–143768. [Google Scholar] [CrossRef]
  17. Ferreira, A.B.A.; Leite, J.B.; Salvadeo, D.H.P. Power substation load forecasting using interpretable transformer-based temporal fusion neural networks. Electr. Power Syst. Res. 2025, 238, 111169. [Google Scholar] [CrossRef]
  18. Rafi, S.H.; Deeba, S.R.; Hossain, E.A. A short-term load forecasting method using integrated CNN and LSTM network. IEEE Access 2021, 9, 32436–32448. [Google Scholar] [CrossRef]
  19. Tian, Z.; Liu, W.; Jiang, W.; Wu, C. CNNs-Transformer based day-ahead probabilistic load forecasting for weekends with limited data availability. Energy 2024, 293, 130666. [Google Scholar] [CrossRef]
  20. Nabavi, S.A.; Mohammadi, S.; Motlagh, N.H.; Tarkoma, S.; Geyer, P. Deep learning modeling in electricity load forecasting: Improved accuracy by combining DWT and LSTM. Energy Rep. 2024, 12, 2873–2900. [Google Scholar] [CrossRef]
  21. He, M.; Wang, H.; Thwin, M. A machine learning technique for optimizing load demand prediction within air conditioning systems utilizing GRU/IASO model. Sci. Rep. 2025, 15, 3353. [Google Scholar] [CrossRef] [PubMed]
  22. Sun, F.; Huo, Y.; Fu, L.; Liu, H.; Wang, X.; Ma, Y. Load-forecasting method for IES based on LSTM and dynamic similar days with multi-features. Glob. Energy Interconnect. 2023, 6, 285–296. [Google Scholar] [CrossRef]
  23. Karimi, M.; Karami, H.; Gholami, M.; Khatibzadehazad, H.; Moslemi, N. Priority index considering temperature and date proximity for selection of similar days in knowledge-based short-term load forecasting method. Energy 2018, 144, 928–940. [Google Scholar] [CrossRef]
  24. Luo, S.; Wang, B.; Gao, Q.; Wang, Y.; Pang, X. Stacking integration algorithm based on CNN-BiLSTM-Attention with XGBoost for short-term electricity load forecasting. Energy Rep. 2024, 12, 2676–2689. [Google Scholar] [CrossRef]
  25. Son, J.; Cha, J.; Kim, H.; Wi, Y.-M. Day-ahead short-term load forecasting for holidays based on modification of similar days’ load profiles. IEEE Access 2022, 10, 17864–17880. [Google Scholar] [CrossRef]
  26. Chen, Y.; Luh, P.B.; Guan, C.; Zhao, Y.; Michel, L.D.; Coolbeth, M.A. Short-term load forecasting: Similar day-based wavelet neural networks. IEEE Trans. Power Syst. 2010, 25, 322–330. [Google Scholar] [CrossRef]
  27. Elshabrawy, M. A Review on Waste Management Techniques for Sustainable Energy Production. Metaheuristic Optim. Rev. 2025, 3, 47–58. [Google Scholar] [CrossRef]
  28. Mandal, P.; Senjyu, T.; Funabashi, T. Neural networks approach to forecast several hour ahead electricity prices and loads in deregulated market. Energy Convers. Manag. 2006, 47, 2128–2142. [Google Scholar] [CrossRef]
  29. Bai, R.; Shi, Y.; Yue, M.; Du, X. Hybrid model based on K-means++ algorithm, optimal similar day approach, and long short-term memory neural network for short-term photovoltaic power prediction. Glob. Energy Interconnect. 2023, 6, 184–196. [Google Scholar] [CrossRef]
  30. Fan, J.; Zhong, M.; Guan, Y.; Yi, S.; Xu, C.; Zhai, Y.; Zhou, Y. An online long-term load forecasting method: Hierarchical highway network based on crisscross feature collaboration. Energy 2024, 299, 131459. [Google Scholar] [CrossRef]
  31. Zhu, X.; Dong, X.; Wang, S.; Zhang, J. Identification of airport similar days based on K-prototype cluster and grey correlation analysis. J. Phys. Conf. Ser. 2023, 2491, 012003. [Google Scholar] [CrossRef]
  32. Srivastava, A.K.; Pandey, A.S.; Houran, M.A.; Kumar, V.; Kumar, D.; Tripathi, S.M.; Gangatharan, S.; Elavarasan, R.M. A day-ahead short-term load forecasting using M5P machine learning algorithm along with elitist genetic algorithm (EGA) and random forest-based hybrid feature selection. Energies 2023, 16, 867. [Google Scholar] [CrossRef]
  33. Mounir, N.; Ouadi, H.; Jrhilifa, I. Short-term electric load forecasting using an EMD-BI-LSTM approach for smart grid energy management system. Energy Build. 2023, 288, 113022. [Google Scholar] [CrossRef]
  34. Zhou, Y.; Su, Z.; Gao, K.; Wang, Z.; Ye, W.; Zeng, J. A short-term electricity load forecasting method integrating empirical modal decomposition with SAM-LSTM. Front. Energy Res. 2024, 12, 1423692. [Google Scholar] [CrossRef]
  35. Fan, G.F.; Wei, H.Z.; Huang, H.P.; Hong, W.C. Application of ensemble empirical mode decomposition with support vector regression and wavelet neural network in electric load forecasting. Energy Sources Part B Econ. Plan. Policy 2025, 20, 2468687. [Google Scholar] [CrossRef]
  36. Ma, K.; Nie, X.; Yang, J.; Zha, L.; Li, G.; Li, H. A power load forecasting method in port based on VMD-ICSS-hybrid neural network. Appl. Energy 2025, 377 Pt B, 124246. [Google Scholar] [CrossRef]
  37. Gao, X.; Guo, W.; Mei, C.; Sha, J.; Guo, Y.; Sun, H. Short-term wind power forecasting based on SSA-VMD-LSTM. Energy Rep. 2023, 9, 335–344. [Google Scholar] [CrossRef]
  38. El-Kenawy, E.M.; Khodadadi, N.; Mirjalili, S.; Abdelhamid, A.A.; Eid, M.M.; Ibrahim, A. Greylag Goose Optimization: Nature-inspired Optimization Algorithm. Expert Syst. Appl. 2024, 238 Pt E, 122147. [Google Scholar] [CrossRef]
  39. Alkanhel, R.; El-Kenawy, E.M.; Abdelhamid, A.A.; Ibrahim, A.; Alohali, M.A.; Abotaleb, M.; Khafaga, D.S. Network Intrusion Detection Based on Feature Selection and Hybrid Metaheuristic Optimization. Comput. Mater. Contin. 2023, 74, 2677–2693. [Google Scholar] [CrossRef]
  40. Hadjouni, M.; Abdelaziz, A.A.; El-Kenawy, E.M.; Ibrahim, A.; Eid, M.M.; Jamjoom, M.M. Advanced Meta-Heuristic Algorithm Based on Particle Swarm and Al-Biruni Earth Radius Optimization Methods for Oral Cancer Detection. IEEE Access 2023, 11, 23681–23700. [Google Scholar] [CrossRef]
  41. Zhong, C.; Li, G.; Meng, Z. Beluga Whale Optimization: A Novel Nature-inspired Metaheuristic Algorithm. Knowl.-Based Syst. 2022, 251, 109215. [Google Scholar] [CrossRef]
  42. Long, H.; Sun, Y.; Yang, X.; Zhao, X.; Zhao, F.; Yang, X. Defect Monitoring Method for Al-CFRTP UFSW Based on BWO–VMD–HHT and ResNet. Sci. Rep. 2024, 14, 18605. [Google Scholar] [CrossRef] [PubMed]
  43. Bakare, M.S.; Abdulkarim, A.; Shuaibu, A.N.; Muhamad, M.M. A Hybrid Long-Term Industrial Electrical Load Forecasting Model Using Optimized ANFIS with Gene Expression Programming. Energy Rep. 2024, 11, 5831–5844. [Google Scholar] [CrossRef]
  44. Zulfiqar, M.; Kamran, M.; Rasheed, M.B.; Alquthami, T.; Milyani, A.H. A Hybrid Framework for Short Term Load Forecasting with a Novel Feature Engineering and Adaptive Grasshopper Optimization in Smart Grid. Appl. Energy 2023, 338, 120829. [Google Scholar] [CrossRef]
  45. Jiang, Z.; Zhang, L.; Ji, T. NSDAR: A neural network-based model for similar day screening and electric load forecasting. Appl. Energy 2023, 349, 121647. [Google Scholar] [CrossRef]
  46. Chinese Society for Electrical Engineering—Special Committee on Electrical Mathematics. Problem Description of the 9th National College Student Electrical Mathematics Modeling Contest [EB/OL]. 25 June 2016. Available online: https://shumo.neepu.edu.cn/#/ (accessed on 10 May 2018).
Figure 1. General framework of this paper.
Figure 2. Weighted eigenspace coordinate system.
Figure 3. The architecture of CNN.
Figure 4. Overall flow of the proposed model.
Figure 5. Average envelope entropy and mode energy for each IMF.
Figure 6. VMD results for the forecasted day.
Figure 7. High, medium, and low frequency components.
Figure 8. Comparison of original and reconstructed signals.
Figure 9. The predicted and actual comparison of the high-frequency component.
Figure 10. Comparison of correlation between similar and historical days.
Figure 11. Comparison of results between models based on similar and historical days.
Figure 12. Comparison of different models on a typical day.
Table 1. The importance of features.

| Features | Importance |
|---|---|
| Maximum temperature (°C) | 0.12 |
| Minimum temperature (°C) | 0.26 |
| Average temperature (°C) | 0.34 |
| Relative humidity (%) | 0.09 |
| Precipitation (mm) | 0.03 |
| Weekday index | 0.15 |

Table 2. Parameters of the low-frequency training set.

| Parameters (q, r, s) | RMSE (MW) | MAPE (%) |
|---|---|---|
| 60, 10, 3 | 191.00 | 47.31 |
| 90, 10, 3 | 192.53 | 48.55 |
| 60, 20, 3 | 464.22 | 133.79 |
| 60, 10, 5 | 232.07 | 68.04 |

Table 3. Parameters of the medium-frequency training set.

| Parameter (n) | RMSE (MW) | MAPE (%) |
|---|---|---|
| 60 | 992.96 | 16.18 |
| 90 | 237.77 | 3.28 |
| 120 | 455.92 | 6.31 |

Table 4. Comparison of average correlation between similar and historical days.

| Targets | Correlation |
|---|---|
| Similar days | 0.9189 |
| Historical days | 0.9104 |

Table 5. Comparison of results between models based on similar and historical days.

| Load Used | MAPE (%) | RMSE (MW) |
|---|---|---|
| Similar days | 2.24 | 222.27 |
| Historical days | 9.91 | 573.33 |

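Table 6. Comparison of MAPE (%) and RMSE (MW) across different models.

| Date | LSTM MAPE | LSTM RMSE | GRU MAPE | GRU RMSE | CNN MAPE | CNN RMSE | TCN MAPE | TCN RMSE | Transformer MAPE | Transformer RMSE | Proposed Model MAPE | Proposed Model RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6.9 | 546.7 | 10.1 | 723.1 | 7.9 | 623.9 | 19.5 | 1166.6 | 16.5 | 959.4 | 5.7 | 374.3 |
| 2 | 8.5 | 723.7 | 5.6 | 494.8 | 12.4 | 962.8 | 5.3 | 396.5 | 9.5 | 624.5 | 2.2 | 222.3 |
| 3 | 9.3 | 818.8 | 6.5 | 626.4 | 3.9 | 284.7 | 4.9 | 416.9 | 10.1 | 825.7 | 7.7 | 656.5 |
| 4 | 5.7 | 526.8 | 3.0 | 286.9 | 2.6 | 251.6 | 1.6 | 149.8 | 4.5 | 360.7 | 2.8 | 300.7 |
| 5 | 6.4 | 661.8 | 7.9 | 705.4 | 7.9 | 731.5 | 5.5 | 563.0 | 4.6 | 433.4 | 7.2 | 577.6 |
| 6 | 8.7 | 707.1 | 8.0 | 745.2 | 8.5 | 716.7 | 8.6 | 726.6 | 8.2 | 713.8 | 8.5 | 718.3 |
| 7 | 12.0 | 1261.1 | 8.3 | 919.8 | 7.0 | 814.4 | 8.4 | 935.5 | 6.3 | 746.1 | 7.3 | 864.9 |
| 8 | 7.5 | 770.0 | 4.4 | 473.3 | 4.1 | 445.9 | 3.4 | 397.1 | 4.3 | 502.1 | 12.7 | 1378.9 |
| 9 | 9.0 | 850.0 | 9.6 | 1045.7 | 12.5 | 1474.6 | 13.0 | 1625.4 | 9.2 | 941.4 | 10.0 | 915.1 |
| 10 | 6.4 | 590.3 | 5.7 | 449.7 | 4.7 | 367.2 | 5.5 | 521.7 | 25.2 | 1847.4 | 7.4 | 705.9 |
| 11 | 4.5 | 430.3 | 5.6 | 546.5 | 5.8 | 571.1 | 7.5 | 660.1 | 4.8 | 348.8 | 4.8 | 453.2 |
| 12 | 6.8 | 518.8 | 6.3 | 463.4 | 6.5 | 571.5 | 5.4 | 411.9 | 12.5 | 889.6 | 5.8 | 479.7 |
| 13 | 16.2 | 1153.1 | 18.3 | 1549.6 | 33.8 | 2886.3 | 18.8 | 1686.5 | 14.4 | 912.4 | 20.0 | 1755.8 |
| 14 | 21.4 | 1776.7 | 13.4 | 1222.1 | 17.0 | 1414.4 | 13.7 | 1185.0 | 14.4 | 1006.8 | 8.9 | 834.7 |
| 15 | 3.5 | 221.4 | 9.7 | 565.3 | 7.5 | 438.7 | 13.6 | 765.5 | 3.2 | 246.3 | 5.6 | 340.3 |
| 16 | 14.1 | 728.0 | 17.7 | 907.7 | 11.7 | 605.7 | 21.0 | 1083.2 | 26.2 | 1347.1 | 8.5 | 459.4 |
| 17 | 12.4 | 1034.3 | 12.8 | 1221.0 | 11.7 | 943.0 | 9.5 | 673.7 | 13.3 | 1126.5 | 9.6 | 670.9 |
| 18 | 22.0 | 1692.4 | 14.2 | 1158.1 | 11.3 | 973.0 | 13.0 | 1064.0 | 15.7 | 1266.3 | 14.7 | 1226.7 |
| 19 | 6.6 | 541.4 | 6.8 | 617.4 | 7.7 | 713.6 | 5.6 | 431.4 | 13.8 | 966.3 | 6.7 | 554.4 |
| 20 | 5.7 | 560.2 | 3.1 | 298.1 | 4.9 | 483.7 | 2.8 | 235.4 | 4.5 | 360.7 | 2.8 | 230.7 |
| avg | **9.7** | 805.6 | **8.9** | 751.0 | **9.5** | 813.7 | **9.3** | 754.8 | **11.1** | 821.3 | **8.0** | 686.0 |
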
Table 7. Comparison of time and CPU usage of different models.

| Model | Total Time (s) | CPU Usage (%) |
|---|---|---|
| CNN | 0.24 | 14.79 |
| GRU | 2.34 | 14.65 |
| LSTM | 2.61 | 14.85 |
| TCN | 0.43 | 14.63 |
| Transformer | 30.93 | 14.48 |
| Proposed model | 0.47 | 14.56 |

Table 8. Average MAPE and RMSE of the proposed model.

| MAPE (%) | RMSE (MW) |
|---|---|
| 7.81 | 660.54 |


Cheng, Q.; Shi, J.; Cheng, S. Short-Term Load Forecasting Based on Similar Day Theory and BWO-VMD. Energies 2025, 18, 2358. https://doi.org/10.3390/en18092358
