Article

An Adaptive Hybrid Short-Term Load Forecasting Framework Based on Improved Rime Optimization Variational Mode Decomposition and Cross-Dimensional Attention

1 College of Electrical Engineering and New Energy, China Three Gorges University, Yichang 443002, China
2 College of Electrical Engineering, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.
Energies 2026, 19(2), 497; https://doi.org/10.3390/en19020497
Submission received: 10 December 2025 / Revised: 7 January 2026 / Accepted: 14 January 2026 / Published: 19 January 2026
(This article belongs to the Section F: Electrical Engineering)

Abstract

Accurate Short-Term Load Forecasting (STLF) is paramount for the stable and economical operation of power systems, particularly under high renewable energy penetration, which exacerbates load volatility and non-stationarity. The prevailing “decomposition–ensemble” paradigm, however, faces two significant challenges when processing non-stationary signals: (1) the performance of Variational Mode Decomposition (VMD) depends heavily on its hyperparameters (K, α), and traditional meta-heuristic algorithms (e.g., GA, GWO, PSO) are prone to converging to local optima when tuning them; (2) deep learning predictors struggle to dynamically weigh the importance of multi-dimensional, heterogeneous features (such as the decomposed Intrinsic Mode Functions (IMFs) and external climatic factors). To address these issues, this paper proposes a novel adaptive hybrid forecasting framework, IRIME-VMD-CDA-LSTNet. First, an Improved Rime Optimization Algorithm (IRIME) integrated with a Gaussian Mutation strategy is proposed. This algorithm adaptively optimizes the VMD hyperparameters by minimizing the average sample entropy, which enables it to escape local optima effectively. Second, the optimally decomposed IMFs are combined with climatic features to construct a multi-dimensional information matrix. Finally, this matrix is fed into a Cross-Dimensional Attention (CDA) LSTNet model, which dynamically allocates weights to each feature dimension. Ablation experiments on a real-world dataset from a distribution substation demonstrate that, compared with GA-VMD, GWO-VMD, and PSO-VMD, the proposed IRIME-VMD method reduces the Root Mean Square Error (RMSE) by up to 18.9%. More importantly, the proposed model effectively mitigates the “prediction lag” phenomenon commonly observed in baseline models, especially during peak load periods. This framework provides a robust, high-accuracy solution for non-stationary load forecasting, with significant practical value for the operation of modern power systems.

1. Introduction

Short-Term Load Forecasting (STLF) is a core component in the planning and operation of power systems. Its prediction accuracy directly affects the grid’s generation scheduling, energy dispatch, system stability, and operational costs. In modern smart grids, the large-scale integration of distributed generation, electric vehicles, and intermittent renewable energy sources (such as wind and solar) has drastically increased the complexity and uncertainty of the power grid. This transformation has made load curves more volatile and stochastic, placing much higher demands on the accuracy and robustness of forecasting models [1]. Therefore, developing more accurate STLF models is crucial not only for reducing the reserve capacity costs and energy waste caused by prediction errors, but also as core technological support for the large-scale accommodation of renewable energy and for enhancing grid resilience [2,3,4].
To cope with the high non-linearity and non-stationarity of electric load series, load forecasting technology has evolved from traditional statistical methods to advanced artificial intelligence models [5]. Early statistical methods, such as the Autoregressive Integrated Moving Average (ARIMA) model, struggled to capture the complex dynamic patterns in load data because they rely on linear assumptions. Subsequently, deep learning models, notably Long Short-Term Memory (LSTM) networks, improved prediction accuracy to a certain extent by leveraging their powerful non-linear fitting capabilities [6,7]. However, when these single models are applied directly to the original load sequences, they often fail to resolve the multi-scale, multi-frequency information embedded in the data, and their predictions frequently exhibit a significant “lag” at load peaks and valleys [8,9]. To overcome this limitation, the “Decomposition–Ensemble” paradigm has become the state-of-the-art approach in this field. Its core idea is to first decompose the complex original sequence into a set of relatively stationary Intrinsic Mode Functions (IMFs) using signal processing techniques, then forecast each component separately, and finally aggregate the results. Among these techniques, Variational Mode Decomposition (VMD), owing to its solid mathematical foundation and its clear advantages in suppressing mode mixing and noise, has proven to be a highly effective decomposition tool and has been widely applied in energy forecasting [10,11]. Some researchers have employed VMD to decompose load sequences and combined it with the Grey Wolf Optimizer (GWO) to optimize the parameters of a subsequent Support Vector Regression (SVR) prediction model. However, this method applies the optimization algorithm to the predictive model rather than to the decomposition model: the key parameters of VMD itself still rely on empirical settings, so the decomposition process is not adaptively optimized [12]. Other studies have used a framework combining Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) and the Salp Swarm Algorithm (SSA) for probabilistic forecasting; however, its core decomposition technique still belongs to the EMD family, so it cannot fully avoid mode mixing and residual noise, and its optimization likewise targets the predictor rather than the decomposer [13]. Furthermore, Yin et al. constructed a hybrid model based on Empirical Mode Decomposition (EMD) for sub-hourly load forecasting; however, this model is likewise limited by the inherent mode-mixing defects of EMD, which is precisely the core problem VMD aims to overcome [14]. Although these studies have made progress, they collectively show that research on the global adaptive optimization of the decomposition process itself remains insufficient.
In this context, some cutting-edge studies have begun to explore the use of emerging meta-heuristic algorithms (such as the Rime Optimization Algorithm, RIME) to address such optimization challenges, but their application methods and focal points still have limitations. Study [15] proposed a model combining RIME-optimized VMD with a Multi-head Self-Attention LSTM for wind speed forecasting; however, the multi-head self-attention mechanism it employed is essentially a temporal attention, designed to capture dependencies between different time steps within a single sequence. Study [16] enhanced the RIME algorithm itself through a quadratic interpolation learning strategy and successfully applied it to the parameter identification of photovoltaic (PV) models; however, the focus of this study was on improving the optimizer itself, and it was not applied to solve the core challenge of adaptive parameter selection for VMD in signal decomposition-forecasting tasks. Study [17] also utilized the RIME algorithm for PV power forecasting, combining CEEMDAN decomposition with a hybrid deep learning network; however, this work, on the one hand, employed CEEMDAN for decomposition, which is prone to mode mixing, and on the other hand, the RIME algorithm optimized the hyperparameters of the prediction model, rather than the key parameters of the decomposition process. In summary, although the RIME algorithm has demonstrated potential, existing research has not yet proposed a synergistic framework that can simultaneously achieve adaptive global optimization of VMD parameters and dynamic weighting of multi-dimensional features in the forecasting stage. This is precisely the gap that the present study aims to fill.
Furthermore, in addition to the aforementioned deficiencies in the decomposition-optimization linkage, current mainstream deep learning prediction models also exhibit numerous issues in their own structure and application methods when handling short-term load forecasting tasks. Study [18] employed a CNN-LSTM-based sequence-to-sequence model for monthly peak load forecasting over a three-year period; however, this model focuses on long-term prediction, and its architecture and data granularity (monthly) cannot effectively capture the complex daily/weekly periodicities and high-frequency fluctuation characteristics inherent in short-term loads. Study [19] proposed a CDA-LSTM model combining convolution and dual attention; however, this model was primarily designed for univariate time series and failed to address the issue of how to dynamically weigh the importance of different feature dimensions in a multivariate input scenario (such as when multiple decomposed modes and various external factors coexist). Study [20] also adopted a hybrid model of CNN-LSTM combined with an attention mechanism to forecast highly uncertain Electric Vehicle (EV) charging loads; however, this study did not utilize signal decomposition techniques to pre-process the highly stochastic load sequence, instead performing direct prediction, which limits the model’s ability to handle non-stationarity.
Recognizing the limitations of these models that directly forecast the original sequences, many studies have naturally turned to hybrid models that integrate the “decomposition–ensemble” paradigm with attention mechanisms. However, despite the more advanced approach, these hybrid methods still possess their own shortcomings in specific implementation pathways and have not fully resolved all challenges. Study [21] designed a Hierarchical Decomposition Self-Attention Network named LTSNet for long-term load trend forecasting; however, its “decomposition” is implemented within the network architecture, and its application scenario is also long-term forecasting, which faces different challenges than short-term forecasting. Study [22] applied the classic LSTNet model to household short-term load forecasting; however, the standard LSTNet directly processes the original sequence, which not only easily produces prediction lag but also cannot dynamically differentiate the importance of multi-source input features. Studies [23] and [7] applied attention-enhanced LSTM or CNN-LSTM models to conventional short-term load forecasting and combined heat and power load forecasting, respectively; however, the attention mechanisms used in these studies mostly focus on the temporal dimension, used to identify critical historical time points. Study [24] adopted a “decomposition-prediction” framework combining CEEMDAN-SE and LSTM; however, the CEEMDAN decomposition method it used is more prone to mode mixing and residual noise problems compared to VMD, and this study likewise failed to solve the fundamental challenge of key parameters in the decomposition method (such as the number of decomposition layers) needing to rely on empirical settings.
In summary, existing research faces a coupled dilemma: on the one hand, suboptimal signal decomposition (e.g., VMD trapped in local optima) leads to feature aliasing, making it difficult for the predictor to identify critical information; on the other hand, existing attention mechanisms struggle to dynamically assign high weights to the true driving factors when processing such “insufficiently purified” multi-dimensional heterogeneous inputs. To break this dilemma, the core innovation of this study lies in proposing a synergistic ‘decomposition–perception’ framework. The superiority of this framework does not originate from the mere stacking of individual algorithms, but rather from the close synergy between IRIME and CDA. The core contributions can be summarized as follows:
(1) An adaptive signal decomposition method (IRIME-VMD) is proposed. This study employs an Improved Rime Optimization Algorithm (IRIME) integrated with a Gaussian Mutation strategy to perform global adaptive optimization of the key parameter combination of Variational Mode Decomposition (VMD), namely the number of decomposed modes (K) and the penalty factor (α). By introducing Gaussian Mutation, this method enhances population diversity, effectively avoiding the tendency of traditional optimization algorithms to fall into local optima. Consequently, it achieves an optimal decomposition of the original load sequence into a set of more stationary and more predictable Intrinsic Mode Functions (IMFs).
(2) A multi-dimensional feature matrix is constructed, and a prediction model based on Cross-Dimensional Attention (CDA-LSTNet) is designed. After signal decomposition, the Maximal Information Coefficient (MIC) method is used to screen for climatic features that are highly correlated with the load. These features are combined with the decomposed IMF components to construct a multi-dimensional information matrix, which serves as the input to a novel CDA-LSTNet prediction model. The key innovation of this model is the Cross-Dimensional Attention (CDA) mechanism, which enables the network to dynamically assign importance weights to the different input feature dimensions (i.e., each IMF component and climatic variable) at every time step, thereby better capturing the key factors driving load variations.
(3) An end-to-end synergistic forecasting framework is integrated and validated. This study combines adaptive signal decomposition (IRIME-VMD) with dynamic feature-attention prediction (CDA-LSTNet) to form a complete, automated hybrid framework spanning from raw data processing to final prediction. Through a “decompose first, then fuse and predict” strategy, the framework leverages the strengths of each module synergistically. This not only improves the overall prediction accuracy but also effectively mitigates the prediction lag commonly observed in single deep learning models.

2. The IRIME-VMD-CDA-LSTNet Hybrid Model

The core concept of the IRIME-VMD-CDA-LSTNet hybrid model proposed in this paper (as shown in Figure 1) is ‘synergistic optimization’. Unlike conventional cascade models, which treat decomposition and prediction as isolated tasks, this framework functions as an interlocking, integrated whole. Because electrical load series exhibit distinct non-linear and non-stationary characteristics arising from the coupling of meteorological factors and user consumption patterns, a single model often fails to capture multi-scale features effectively. In this context, a hybrid model is defined as an integrative framework that combines distinct algorithmic components, such as signal decomposition, optimization strategies, and deep learning predictors, to leverage their complementary strengths and mitigate the limitations of individual methods. Historical climatic data are collected from the distribution substation, and modal decomposition is applied to the historical load data, disassembling the complex time-domain signal into relatively stationary components, each occupying a specific frequency band. The decomposition uses the IRIME-VMD method, which autonomously determines the optimal modal parameters by using the average sample entropy as feedback to adjust K and α in a closed loop. Subsequently, the Maximal Information Coefficient (MIC) is used to screen the climatic data and eliminate redundant features; the retained features are combined with substation attributes and the decomposed load components to construct a multi-dimensional information matrix. This matrix preserves the strong correlation between extrinsic covariates and intrinsic load dynamics. The matrix is then input into the Long- and Short-term Time-series Network (LSTNet), in which cross-dimensional attention is applied across the input dimensions, enabling the model to adaptively adjust the predictive weight of each dimension and thereby achieve accurate load forecasting. The overall framework of the model is illustrated in Figure 1.

2.1. Fundamental Principles

2.1.1. Principles of the RIME Algorithm

The Rime Optimization Algorithm (RIME) is a meta-heuristic optimization algorithm based on the natural formation process of frost and ice. By simulating the dynamic growth characteristics of rime ice in both soft-rime (light wind) and hard-rime (strong wind) environments, it constructs a soft-rime search strategy, a hard-rime puncture mechanism, and an improved greedy selection mechanism. This achieves a synergistic optimization of exploration and exploitation. From a mathematical perspective, the soft-rime process corresponds to an expansive search within the solution space to identify potential regions of interest, whereas the hard-rime mechanism functions as a localized exploitation operator to refine the solution accuracy [25,26].
The RIME algorithm constructs an efficient search mechanism by simulating the movement and accretion process of rime particles. In the algorithm, the rime particle population is represented as a matrix, and the fitness value of each particle is calculated to evaluate its quality [27]. The mathematical expression of the population matrix is given by:
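$$R = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1j} \\ x_{21} & x_{22} & \cdots & x_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nj} \end{bmatrix} \tag{1}$$
where $R$ represents the rime particle population matrix and $x_{ij}$ represents the position of the $i$-th particle in the $j$-th dimension.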
As the iteration process progresses, the random walk of particles within the search space continuously alters their positions, which in turn triggers the accretion phenomenon, ultimately reaching a stable state under the influence of environmental factors. The accretion process simulates the gradual formation of soft rime within the frost [28]. The position update of the rime particles is shown in the following equation:
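$$\begin{aligned} R_{ij}^{new} &= R_{opt,j} + r_1 \cdot \cos\theta \cdot \beta \cdot \bigl(\psi \cdot (Ub_{ij} - Lb_{ij}) + Lb_{ij}\bigr), \quad r_2 < E \\ \theta &= \frac{\pi t}{10T}, \qquad \beta = 1 - \frac{[w \cdot t/T]}{w}, \qquad E = \sqrt{t/T} \end{aligned} \tag{2}$$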
where $R_{ij}^{new}$ is the updated position of the particle; $R_{opt,j}$ is the $j$-th dimension of the current best rime agent; $r_1 \in (-1, 1)$ is a random number controlling the direction of particle movement; $\theta$ is the direction angle, which changes as the iteration count $t$ increases; $T$ is the maximum number of iterations; $\beta$ is the environmental factor, representing external influences; $[\cdot]$ denotes rounding, and $w$ controls the number of segments of the resulting step function; $\psi \in (0, 1)$ is the adhesion coefficient, used to regulate particle spacing; $Ub_{ij}$ and $Lb_{ij}$ are the upper and lower bounds of the search space; $E$ is the adherence coefficient, which determines the probability of a particle being updated; and $r_2 \in (0, 1)$ is a random number that, together with $E$, controls whether the particle’s position is updated.
Furthermore, during the hard rime growth and puncture process, the crossover replacement among particles follows the following formula:
$$R_{ij}^{new} = R_{opt,j}, \quad r_3 < F^{norm}(S_i) \tag{3}$$
where $F^{norm}(S_i)$ is the normalized fitness value of the current agent, indicating the probability of the $i$-th particle being selected, and $r_3 \in [-1, 1]$ is a random number.
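The two position-update rules, Eqs. (2) and (3), can be sketched as one vectorized RIME iteration for a minimization problem. This is a minimal sketch under stated assumptions, not the authors’ code: the helper name `rime_step`, the segment count `w = 5`, the reading of $[\cdot]$ as rounding, and the interpretation of $F^{norm}$ as the fitness vector divided by its Euclidean norm (which presumes non-negative fitness values, such as sample entropies) are all assumptions.

```python
import numpy as np

def rime_step(pop, fit, best, t, T, lb, ub, rng, w=5):
    """One RIME iteration for minimisation (hypothetical helper).

    pop: (n, d) positions, fit: (n,) non-negative fitness values,
    best: (d,) current best agent R_opt, lb/ub: (d,) search bounds.
    """
    n, d = pop.shape
    theta = np.pi * t / (10 * T)          # direction angle, Eq. (2)
    beta = 1 - np.round(w * t / T) / w    # stepped environmental factor
    E = np.sqrt(t / T)                    # adherence coefficient
    new = pop.copy()

    # Soft-rime search, Eq. (2): a coordinate is redrawn around the best
    # agent when r2 < E.
    r1 = rng.uniform(-1.0, 1.0, size=(n, d))
    psi = rng.random((n, d))              # adhesion coefficient in (0, 1)
    soft = best + r1 * np.cos(theta) * beta * (psi * (ub - lb) + lb)
    soft_mask = rng.random((n, d)) < E    # r2 < E
    new[soft_mask] = soft[soft_mask]

    # Hard-rime puncture, Eq. (3): copy the best agent's coordinate when
    # r3 < F_norm(S_i); larger (worse) fitness gives a larger F_norm.
    norm = np.linalg.norm(fit)
    f_norm = fit / norm if norm > 0 else np.zeros(n)
    r3 = rng.uniform(-1.0, 1.0, size=(n, d))
    hard_mask = r3 < f_norm[:, None]
    new[hard_mask] = np.broadcast_to(best, (n, d))[hard_mask]

    return np.clip(new, lb, ub)           # keep candidates inside [Lb, Ub]
```

In a complete optimizer, these candidates would then be scored and kept only where they improve on the previous positions, which is the role of the greedy selection mechanism mentioned above.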

2.1.2. Gaussian Mutation

To prevent the RIME algorithm from becoming trapped in a local optimum while searching for the global optimum, this paper introduces Gaussian Mutation to increase the randomness of the search. The mutation operator reduces the probability of premature convergence to a local optimum while maintaining the diversity of individuals in the population [29,30]. The Gaussian Mutation operator is expressed as follows:
$$X_{best}(t+1) = X_{\alpha}(t)\,\bigl(1 + \mathrm{Gaussian}(\sigma)\bigr) \tag{4}$$
where $X_{best}(t+1)$ denotes the position of the individual after mutation, $X_{\alpha}(t)$ is the current global optimal solution, and $\mathrm{Gaussian}(\sigma)$ is a random variable that follows a Gaussian distribution with standard deviation $\sigma$. The global optimum position is updated as follows:
$$X_{\alpha}(t+1) = \begin{cases} X_{best}(t+1), & f\bigl(X_{best}(t+1)\bigr) > f\bigl(X_{\alpha}(t)\bigr) \text{ and } rand < p \\ X_{\alpha}(t), & \text{else} \end{cases} \tag{5}$$
where rand is a random variable in the interval [0, 1], p is the selection probability, and f(x) is the fitness of the individual. As the formula indicates, applying a mutation operation to the global optimal solution $X_{\alpha}(t)$ allows the algorithm to escape when the current global optimum is only a local optimum. This selection strategy drives the population toward the optimal solution while effectively improving the algorithm’s search efficiency.
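Eqs. (4) and (5) translate almost line for line into code. The sketch below implements Eq. (5) literally, treating f as a score in which larger is better; for the entropy-minimizing objective of Section 2.2 one would pass the negated fitness, which is an assumption because the text does not state the sign convention. The helper name `gaussian_mutation_update`, the zero-mean normal draw, and the clipping of the mutant to the search bounds are also assumptions.

```python
import numpy as np

def gaussian_mutation_update(x_alpha, score, sigma, p, lb, ub, rng):
    """Hypothetical helper for Eqs. (4)-(5).

    x_alpha: current global best position, score: callable (larger is better),
    sigma: std of the zero-mean Gaussian factor, p: selection probability.
    """
    # Eq. (4): multiplicative Gaussian perturbation of the global best.
    mutant = x_alpha * (1.0 + rng.normal(0.0, sigma, size=x_alpha.shape))
    mutant = np.clip(mutant, lb, ub)  # assumption: keep the mutant feasible
    # Eq. (5): accept the mutant only if it scores higher and rand < p.
    if score(mutant) > score(x_alpha) and rng.random() < p:
        return mutant
    return x_alpha
```

Because the perturbation is multiplicative, its absolute size scales with each coordinate, so a single σ produces proportionally similar steps for a small integer K and a penalty factor α in the thousands.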

2.1.3. Variational Mode Decomposition

VMD is an adaptive signal processing method that decomposes the non-stationary load signal into multiple Intrinsic Mode Functions (IMFs), each possessing specific frequency band characteristics. This achieves an enhancement in the stationarity and interpretability of the signal analysis.
Let the original signal be $f(t)$. VMD decomposes it into $K$ modal components $u_k$ ($k = 1, 2, \ldots, K$). The constrained variational model is as follows:
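$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{6}$$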
where $\partial_t$ denotes the partial derivative with respect to time, $\delta(t)$ is the Dirac delta function, $*$ denotes convolution, $j$ is the imaginary unit, and $\omega_k$ is the center frequency of the $k$-th mode. To transform the constrained problem into an unconstrained form, a quadratic penalty factor $\alpha$ and a Lagrange multiplier $\lambda(t)$ are introduced to construct the augmented Lagrangian function:
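$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{7}$$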
In Equation (7), the first term enforces sparsity by minimizing the bandwidth of the modal components, weighted by $\alpha$; the second term is the quadratic reconstruction-error penalty; and the third term is the Lagrange multiplier term. Using the Alternating Direction Method of Multipliers (ADMM), $u_k$, $\omega_k$, and $\lambda$ are updated iteratively until the convergence condition is met.
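The ADMM iterations behind Eq. (7) can be sketched compactly in the frequency domain. This is a simplified NumPy version of the updates from Dragomiretskiy and Zosso’s original VMD, offered as an illustration rather than the implementation used in this paper: it works on the one-sided FFT spectrum without the mirror extension of the reference code, expresses frequencies in cycles per sample, spreads the initial center frequencies uniformly, and sets the dual-ascent step `tau` to 0 by default. The function name `vmd` and its defaults are assumptions.

```python
import numpy as np

def vmd(f, K, alpha, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD via ADMM in the frequency domain (hypothetical helper).

    Works on the one-sided FFT spectrum; frequencies are in cycles/sample.
    Returns (modes, center_freqs) sorted by center frequency.
    """
    f = np.asarray(f, dtype=float)
    N = f.size
    f_hat = np.fft.rfft(f)                      # one-sided spectrum of f(t)
    freqs = np.fft.rfftfreq(N)                  # 0 ... 0.5
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 / K * np.arange(K)              # uniformly spread initial centers
    lam = np.zeros(freqs.size, dtype=complex)   # Lagrange multiplier lambda

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            residual = f_hat - (u_hat.sum(axis=0) - u_hat[k])
            # Wiener-filter-like mode update centered on omega_k
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)  # spectral centroid
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))  # dual ascent (tau = 0 disables it)
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=N, axis=1)    # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]
```

On a two-tone test signal, cos(2π·0.02t) + 0.5·cos(2π·0.2t) with 1000 samples, `vmd(sig, K=2, alpha=2000)` should return center frequencies close to 0.02 and 0.2. Inside an IRIME-VMD loop, a particle’s first coordinate would presumably be rounded to the integer K and its second used as α; that encoding is an assumption, not something the text specifies.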

2.2. IRIME-VMD Modal Decomposition

Addressing the limitation of the traditional VMD algorithm, where the number of modes K and the penalty factor α rely on empirical settings, this paper proposes an adaptive parameter optimization framework based on an Improved Rime Optimization Algorithm (IRIME-VMD). By introducing Gaussian Mutation, small-amplitude perturbations are added during the optimization process to prevent the solution from converging to a local optimum. Leveraging the global search capability and dynamic update mechanism of RIME, and using the average sample entropy as the fitness function, the framework achieves the synergistic optimization of K and α [31,32]. The algorithmic procedure of IRIME-VMD is illustrated in Figure 2 (where the asterisk * denotes the optimal parameters), and the specific steps are as follows:
(1) Initialization: The optimization process begins by defining the population size N, the maximum number of iterations T, and the boundary constraints [Lb, Ub] for the RIME algorithm. At the same time, the initial decomposition parameters for VMD are established.
(2) Data input: The power load sequence to be decomposed is imported as the target signal.
(3) Position update: The particle adherence coefficient E is computed from the ratio of the current iteration t to the total number of iterations T. The particle positions are then updated according to Equation (2) when the stochastic condition r2 < E is met.
(4) Fitness evaluation: The average sample entropy is used as the fitness metric to measure the complexity and information redundancy of the VMD decomposition results. For a given parameter combination [K, α], the original load sequence is decomposed into K modal components via VMD. The Hilbert-transform envelope of each component is calculated, and its sample entropy is computed using a sliding window. The final fitness value is the mean of the sample entropies of all modal components:
$$F = \frac{1}{K} \sum_{k=1}^{K} SE(u_k) \tag{8}$$
where $SE(u_k)$ is the sample entropy of the $k$-th modal component.
(5) Boundary constraint and assessment: A boundary check constrains the updated particle positions to the feasible range [Lb, Ub], ensuring that the parameters remain physically valid. The fitness of the new position is then evaluated; if it outperforms the original, the global optimal solution [K*, α*] is updated accordingly.
(6) Termination and output: The process checks whether the termination criteria (maximum iterations T or convergence thresholds) are satisfied. If they are, the optimal parameter combination [K*, α*] is output and applied to the final VMD decomposition; otherwise, the algorithm proceeds to the adaptive mutation adjustment phase.
In this phase, small-amplitude random perturbations are introduced via Gaussian Mutation, with a mutation rate that adapts to population diversity: when diversity is high, a low mutation rate is maintained to focus on local search; when diversity decreases, the mutation rate is increased to promote global search and keep the solution from becoming trapped in a local optimum. The RIME parameter update is then executed for each individual in the population, after which the global optimal solution is evaluated and updated. The algorithm next checks whether the population has gone several generations without improvement. If it has improved, the convergence criteria are checked; once they are met, modal decomposition is performed with the optimized parameters and the decomposed modal components are output. If there is still no improvement, the algorithm returns to the current solution, re-performs the VMD decomposition, and repeats the above process.
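The fitness function of step (4), Eq. (8), can be sketched as follows: each mode’s envelope is taken as the magnitude of its FFT-based analytic signal (the Hilbert transform), and sample entropy is computed with the common defaults m = 2 and r = 0.2 times the standard deviation. These defaults, the helper names, and the choice to compute sample entropy over the whole envelope are assumptions; the text mentions a sliding window but does not specify it, so it is omitted here.

```python
import numpy as np

def hilbert_envelope(x):
    """Envelope |x + jH{x}| via the FFT-based analytic signal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(A / B), Chebyshev distance, self-matches excluded."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_tpl = x.size - m  # same number of templates for lengths m and m + 1

    def matching_pairs(length):
        tpl = np.array([x[i:i + length] for i in range(n_tpl)])
        dist = np.max(np.abs(tpl[:, None, :] - tpl[None, :, :]), axis=2)
        return np.sum(dist <= r) - n_tpl  # remove the diagonal (self-matches)

    B, A = matching_pairs(m), matching_pairs(m + 1)
    return np.inf if A == 0 or B == 0 else float(-np.log(A / B))

def average_sample_entropy(modes, m=2, r_factor=0.2):
    """Eq. (8): mean sample entropy over the envelopes of the K modes."""
    return float(np.mean([sample_entropy(hilbert_envelope(u), m, r_factor) for u in modes]))
```

`average_sample_entropy` accepts any (K, N) array of modes, such as the output of a VMD run. The pairwise distance matrix needs O(N²) memory, which is manageable for a year of daily data but would require chunking for long, high-resolution series.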

2.3. CDA-LSTNet

The CDA-LSTNet prediction network constructed in this paper is the core component of the overall framework (as shown in Figure 1), and its detailed internal architecture is illustrated in Part 3 of Figure 1. This model is designed to efficiently process the multi-dimensional information matrix, which is jointly constituted by the modal components (imf1–imfn) decomposed by VMD and external climatic data (e.g., daily average temperature, daily average humidity). The specific workflow of this network is as follows:
① Data acquisition: External climatic data (e.g., daily average temperature, humidity, wind speed) and characteristic parameters of the low-voltage distribution substation (e.g., radius, capacity) are integrated. Simultaneously, the historical load data sequence of the substation is acquired.
② Modal decomposition: The IRIME-VMD method is applied to the historical load sequence, decomposing it into multiple modal components (imf1-imfn) and a residual (res) sequence to reduce non-stationarity.
③ Feature fusion: The decomposed modal components, the residual sequence, the climatic data, and the substation parameters are concatenated to construct a multi-dimensional input matrix K. Here, the matrix dimensions correspond to the feature types (n) and the sequence length (L).
④ Feature extraction: A one-dimensional Convolutional Neural Network (1D-CNN) is employed to extract local features from each dimension separately. The convolutional kernel size is set to 1, generating a feature vector of size n × 1 for each time step, which is then fed into the LSTM network.
⑤ Attention mechanism: For the output vector Q generated by the LSTM, a dimensional attention mechanism is applied: the original matrix K is transposed and combined with Q through a weighted operation to derive a weight vector normalized to [0, 1].
⑥ Final prediction: A weighted product operation is performed between the attention weight vector and the LSTM output matrix. The resulting vector is processed through a fully connected layer to generate the final load forecasting output, effectively incorporating multi-dimensional information weights.
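Steps ④ to ⑥ can be sketched as a NumPy forward pass. The text leaves the exact attention form open, so this sketch assumes one plausible reading: each row of the transposed feature matrix (one row per feature dimension) is scored against the last LSTM hidden state q through a bilinear form and squashed with a sigmoid into (0, 1), similar to the scoring used in temporal pattern attention (TPA); the weighted rows are summed and fused with q before a linear output layer. All weight names and shapes, the ReLU after the kernel-size-1 convolution, and the random untrained weights are hypothetical.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n, L, hidden, horizon, seed=0):
    """Random, untrained weights; every name and shape here is illustrative."""
    rng = np.random.default_rng(seed)
    g = lambda *shape: rng.normal(0.0, 0.1, shape)
    return {
        "conv_w": g(n), "conv_b": np.zeros(n),      # kernel size 1, one filter per dimension
        "Wx": g(4 * hidden, n), "Wh": g(4 * hidden, hidden), "b": np.zeros(4 * hidden),
        "W_att": g(L, hidden),                      # bilinear score between K^T rows and q
        "W_q": g(hidden, hidden), "W_v": g(hidden, L),
        "W_out": g(horizon, hidden),
    }

def cda_lstnet_forward(X, p):
    """X: (L, n) window with L time steps and n feature dimensions."""
    # Step 4: kernel-size-1 convolution on each dimension separately, then ReLU.
    Kmat = np.maximum(X * p["conv_w"] + p["conv_b"], 0.0)     # (L, n)
    # LSTM over time; q is the last hidden state (the output vector Q).
    hidden = p["Wh"].shape[1]
    q, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in Kmat:
        i, f, o, g = np.split(p["Wx"] @ x_t + p["Wh"] @ q + p["b"], 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        q = _sigmoid(o) * np.tanh(c)
    # Step 5: cross-dimensional attention over the rows of K^T.
    rows = Kmat.T                                             # (n, L), one row per dimension
    weights = _sigmoid(rows @ p["W_att"] @ q)                 # (n,), each in (0, 1)
    context = weights @ rows                                  # (L,) weighted sum of dimensions
    # Step 6: fuse with the LSTM state and map to the forecast horizon.
    fused = p["W_q"] @ q + p["W_v"] @ context
    return p["W_out"] @ fused, weights
```

A sigmoid rather than a softmax lets several dimensions (for example, a dominant IMF and temperature) receive high weights at the same time, which matches the “normalized to [0, 1]” wording; a trained model would normally be built in a deep learning framework rather than with hand-written NumPy.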

3. Experimental Design and Validation

3.1. Dataset

Load forecasting was conducted using data from four distribution transformer districts (A, B, C, and D) in Chongqing, comprising a total of 365 days of load data from the entire year of 2020. The daily load profiles of these four districts are illustrated in Figure 3. Unlike aggregated system-level loads, which typically display smooth seasonal patterns, the load curves at this distribution level exhibit significant volatility, which tends to mask the macro-seasonal trends usually associated with temperature changes. Local climatic data, specifically temperature and humidity, were also acquired to assist in analyzing load characteristics; the daily average temperature and relative humidity during the simulation period are illustrated in Figure 4. Because the four districts are located within the same region, they share identical ambient meteorological conditions, as depicted in the figure. In the experimental setup, the first 330 days were used for model training and adaptive hyperparameter optimization. The forecasting task is defined as multi-step-ahead prediction, in which the load for the subsequent 35 days (the test set) is predicted simultaneously. This extended horizon was chosen to evaluate the model’s ability to capture long-term trends under high volatility.

3.2. Evaluation Metrics

To intuitively evaluate the accuracy and applicability of the model, this paper selects the coefficient of determination R2 (R-squared), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) to assess the model’s predictive capability.
Among these metrics, RMSE is considered the primary indicator for model assessment in this study, as it imposes heavier penalties on larger prediction errors, which are critical for the stability and reliability of power system operations. The other metrics (R2, MAE, and MAPE) serve as complementary indicators to provide a comprehensive evaluation of the model’s performance from different perspectives.
R2 represents the proportion of the total variation in the dependent variable that can be explained by the independent variable(s) through the regression relationship. R2 typically lies between 0 and 1, although on test data it can become negative when the model performs worse than simply predicting the mean. The specific calculation formula is as follows:
$$R^2 = 1 - \frac{\hat{Y}}{\bar{Y}}$$
$$\hat{Y} = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
$$\bar{Y} = \sum_{i=1}^{N} (\bar{y} - y_i)^2$$
where $y_i$ is the actual power consumption value, $\hat{y}_i$ is the power consumption value predicted by the model, $\bar{y}$ is the mean value of the test samples, $N$ is the number of data points, $\hat{Y}$ is the sum of squared errors of the model prediction, and $\bar{Y}$ is the sum of squared deviations of the actual values from their mean.
RMSE measures the model's goodness of fit as the square root of the mean squared difference between the predicted and observed values. It is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
MAE is the mean of the absolute errors between the predicted and actual values. It is calculated as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|$$
MAPE expresses the average absolute error as a percentage of the actual values. It is calculated as follows:
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\%$$
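The four metrics defined above translate directly into code. The sketch below is a plain-Python illustration of those formulas (the function names are hypothetical, and the toy values are not from the study); it is not the authors' evaluation script.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - (sum of squared prediction errors) / (sum of squared deviations from the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))  # \hat{Y}
    ss_tot = sum((mean - t) ** 2 for t in y_true)  # \bar{Y}
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the absolute errors."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAPE in percent; undefined when any actual value is zero."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 330.0, 380.0]
print(r2_score(y_true, y_pred))  # approximately 0.97
print(mae(y_true, y_pred))  # 17.5
```

In this toy case one error of 30 dominates the squared terms, so RMSE (about 19.36) exceeds MAE (17.5), which is the heavier penalty on large errors that motivates using RMSE as the primary indicator.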

4. Case Study Analysis

4.1. Baseline Model Comparison

The climatic data described in Section 3 (temperature and humidity) are combined with the sequences obtained from IRIME-VMD decomposition to form a multi-dimensional information matrix, which is fed into the CDA-LSTNet network. The prediction performance of the proposed method is compared with that of LSTM, GRU, SVR, Random Forest, and ARIMA; the results are listed in Table 1.
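The information matrix described above can be pictured as one row per time step and one column per input feature. The sketch below assumes all series are already aligned to the same time index and resolution; the column order (IMFs first, then temperature, then humidity) and the function name are assumptions, and any normalization the authors apply is not shown.

```python
from typing import List, Sequence


def build_information_matrix(
    imfs: Sequence[Sequence[float]],
    temperature: Sequence[float],
    humidity: Sequence[float],
) -> List[List[float]]:
    """Stack K IMF series and two climatic series column-wise into a T x (K + 2) matrix."""
    columns = [list(c) for c in imfs] + [list(temperature), list(humidity)]
    length = len(columns[0])
    if any(len(c) != length for c in columns):
        raise ValueError("all input series must have the same length")
    # zip(*columns) transposes the list of columns into rows, one row per time step.
    return [list(row) for row in zip(*columns)]


# Toy example: K = 2 IMFs over T = 3 time steps (hypothetical values).
imfs = [[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]]
matrix = build_information_matrix(imfs, [20.0, 21.0, 22.0], [70.0, 71.0, 72.0])
print(matrix)  # [[1.0, 0.1, 20.0, 70.0], [2.0, 0.2, 21.0, 71.0], [3.0, 0.3, 22.0, 72.0]]
```

If the load were sampled more finely than the daily climatic averages, the climatic columns would first have to be aligned (for example, repeated within each day); the sketch simply rejects series of unequal length.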
A superficial reading of the aggregated metrics in Table 1 gives a counter-intuitive and misleading result. Traditional models, particularly SVR and the statistical ARIMA model, often achieve the most competitive quantitative scores. For instance, SVR obtains the lowest RMSE on Substation A (261.14), while ARIMA's RMSE values on Substation B (267.1), Substation C (586.1), and Substation D (485.1) are all lower than those of the proposed method. However, a qualitative inspection of the prediction curves shows that this numerical advantage is a form of "pseudo-high accuracy" that stems from model failure. Figure 5, Figure 6 and Figure 7 show that these models behave as "lazy regressors": they severely over-smooth the load, producing only a simple low-frequency waveform and failing to capture the high-frequency fluctuations and critical peaks of the actual load. They trade dynamic response for a lower average error, which makes them unsuitable for practical energy scheduling tasks that depend on accurate peak prediction. Conversely, although standard deep learning models (LSTM and GRU) attempt to track the volatility of the data, they are severely affected by the "prediction lag" effect: their prediction curves (red dotted line) systematically lag behind the "Test real data" (blue line) at critical peaks, reacting only after a peak has occurred.
In contrast, Figure 8 shows the advantage of the proposed framework. The "Predictions" curve (orange) closely follows the "True Values" curve (blue), tracking both the fundamental trend and the abrupt high-frequency peaks, and the severe prediction lag observed in the other models is largely eliminated. Among the compared models, the proposed model is therefore the only one that accurately captures both the timing and the magnitude of peak load events. This also indicates that aggregated metrics such as RMSE (Table 1) are insufficient on their own for evaluating non-stationary series, because they can reward over-smoothed predictions that miss the peaks. The main advantage of the proposed model lies in resolving the critical "prediction lag" problem by pre-stabilizing the signal through adaptive IRIME-VMD decomposition.
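The prediction lag discussed above is judged visually from the figures. One simple way to put a number on it, which is an assumption of this illustration rather than a procedure used in the paper, is to find the forward shift of the prediction that maximizes its Pearson correlation with the actual series. The function names below are hypothetical.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def estimate_lag(y_true: Sequence[float], y_pred: Sequence[float], max_lag: int) -> int:
    """Return the shift (in samples) by which y_pred trails y_true, chosen to maximize correlation."""
    if not 0 <= max_lag < len(y_true) - 1:
        raise ValueError("max_lag must be non-negative and smaller than the series length minus one")
    best_lag, best_corr = 0, -math.inf
    for lag in range(max_lag + 1):
        # Pair y_true[t] with y_pred[t + lag]: a forecast that lags by `lag` steps lines up exactly.
        a = y_true[: len(y_true) - lag]
        b = y_pred[lag:]
        corr = pearson(a, b)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


actual = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
lagged = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0]  # same peaks, one step late
print(estimate_lag(actual, lagged, max_lag=3))  # 1
```

A lag of 0 corresponds to a forecast that is in phase with the load; a model that merely echoes the previous observation would typically score a lag of 1 under this measure.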

4.2. Comparative Study

To demonstrate the advantages of IRIME-VMD over other optimization methods, it was compared against GA-VMD, GWO-VMD, and PSO-VMD. In each case, the decomposed sequences were fed into the CDA-LSTNet network for load prediction. The comparison was carried out separately for Distribution Transformer Areas A and B; the error metrics are listed in Table 2 and the prediction curves are compared in Figure 9.
Table 2 shows that the proposed IRIME-VMD method achieves the lowest RMSE and MAE in all test cases. Taking Substation A as an example, the RMSE of IRIME-VMD is 284.42; compared with GA-VMD (330.956), GWO-VMD (327.172), and PSO-VMD (350.555), this is a reduction of 14.1%, 13.1%, and 18.9%, respectively. The same advantage holds for Substation B in RMSE and MAE, and for MAPE in every case but one: on Substation B, the MAPE of IRIME-VMD (31.69%) is marginally higher than that of GA-VMD (31.599%). Overall, these results demonstrate the superior prediction accuracy and robustness of the proposed model.
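The percentages quoted above follow from the standard relative-reduction formula, 100 × (baseline - proposed) / baseline. The short check below reproduces them from the Table 2 values; nothing beyond that formula is assumed, and the names are illustrative.

```python
# RMSE values for Substation A, copied from Table 2.
BASELINE_RMSE_A = {"GA-VMD": 330.956, "GWO-VMD": 327.172, "PSO-VMD": 350.555}
IRIME_RMSE_A = 284.42


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: round(relative_reduction(value, IRIME_RMSE_A), 1)
    for name, value in BASELINE_RMSE_A.items()
}
print(reductions)  # {'GA-VMD': 14.1, 'GWO-VMD': 13.1, 'PSO-VMD': 18.9}
```

Applying the same function to the Substation B values in Table 2 (IRIME-VMD RMSE 285.39) gives reductions of roughly 10.0%, 12.8%, and 18.5% relative to GA-VMD, GWO-VMD, and PSO-VMD.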
This quantitative advantage is confirmed visually by the prediction curves in Figure 9. The true load curve ("Real", blue dashed line) exhibits high volatility and non-stationarity, especially the sharp instantaneous peaks between Day 23 and Day 33. Faced with these high-frequency abrupt changes, all baseline models (yellow, orange, and green lines) show a severe "prediction lag" effect: their prediction curves are overly smooth and fail to capture the true magnitude and precise timing of the peaks, leading to large deviations from the actual load during critical periods. In contrast, the proposed IRIME-VMD model (red solid line) tracks the true load curve closely, capturing the high-frequency fluctuations and peak dynamics and effectively mitigating the lag phenomenon.
This difference in performance stems fundamentally from how the VMD parameters are optimized. When searching for the optimal VMD parameters (number of modes K and penalty factor α), the traditional optimization algorithms (GA, GWO, PSO) used by the baseline models are prone to becoming trapped in local optima because of the limitations of their search mechanisms. This can lead to a suboptimal signal decomposition, which makes it harder for the subsequent prediction network to learn the true load patterns and contributes to prediction lag. As described in the methodology, the proposed IRIME algorithm introduces a Gaussian-Mutation strategy, which increases population diversity and thereby strengthens the algorithm's global search capability, enabling it to escape local optima more effectively. Consequently, IRIME-VMD can find a better parameter combination and achieve a "cleaner", more thorough signal decomposition. This high-quality decomposition allows the subsequent prediction network to model the load accurately, which ultimately reduces the lag effect and yields high prediction accuracy.
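The Gaussian-Mutation idea described above can be illustrated on a single (K, α) candidate. This is a generic Gaussian mutation with greedy selection, not the IRIME update rules themselves, which are not reproduced here. The search bounds, the noise scale `sigma_frac`, and `toy_fitness` are hypothetical; according to the abstract, the real objective is the average sample entropy of the decomposed IMFs, which would require running VMD for each candidate.

```python
import random
from typing import Callable, Tuple

# Hypothetical search bounds for the VMD hyperparameters (K, alpha); not taken from the paper.
K_BOUNDS = (2.0, 10.0)
ALPHA_BOUNDS = (100.0, 5000.0)

Candidate = Tuple[float, float]
Fitness = Callable[[int, float], float]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def gaussian_mutation(c: Candidate, sigma_frac: float, rng: random.Random) -> Candidate:
    """Add zero-mean Gaussian noise scaled to each parameter's search range, then clip to bounds."""
    k, alpha = c
    k_sigma = sigma_frac * (K_BOUNDS[1] - K_BOUNDS[0])
    a_sigma = sigma_frac * (ALPHA_BOUNDS[1] - ALPHA_BOUNDS[0])
    return (
        clip(k + rng.gauss(0.0, k_sigma), *K_BOUNDS),
        clip(alpha + rng.gauss(0.0, a_sigma), *ALPHA_BOUNDS),
    )


def evaluate(c: Candidate, fitness: Fitness) -> float:
    """K is searched as a real number but rounded to an integer number of modes before evaluation."""
    k, alpha = c
    return fitness(round(k), alpha)


def mutate_and_select(c: Candidate, fitness: Fitness, sigma_frac: float, rng: random.Random) -> Candidate:
    """Greedy selection: keep the mutant only if it lowers the fitness value."""
    mutant = gaussian_mutation(c, sigma_frac, rng)
    return mutant if evaluate(mutant, fitness) < evaluate(c, fitness) else c


def toy_fitness(k: int, alpha: float) -> float:
    """Hypothetical stand-in for the average sample entropy of the VMD modes."""
    return (k - 5) ** 2 + ((alpha - 2000.0) / 1000.0) ** 2


rng = random.Random(0)
start: Candidate = (2.0, 100.0)
best = start
for _ in range(200):
    best = mutate_and_select(best, toy_fitness, sigma_frac=0.1, rng=rng)
print(round(best[0]), round(best[1]), evaluate(best, toy_fitness))
```

The Gaussian noise occasionally produces large jumps across the search range, which is what lets a candidate leave the basin of a local optimum; greedy selection then ensures the fitness never gets worse.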

5. Conclusions

This paper designs and validates a novel adaptive hybrid short-term load forecasting framework (IRIME-VMD-CDA-LSTNet). By jointly optimizing signal decomposition and a dynamic feature attention mechanism, the framework substantially improves prediction accuracy and fidelity for highly volatile loads. The main conclusions are summarized as follows:
(1)
An improved Rime-Optimization Variational Mode Decomposition method (IRIME-VMD) is proposed. Unlike traditional optimizers (e.g., GA, GWO, PSO), which rely on empirical settings or are prone to local optima, the Gaussian-Mutation strategy introduced in this paper significantly enhances the global search capability of the RIME algorithm. The comparative study (Table 2) supports this: compared with GA-VMD, GWO-VMD, and PSO-VMD, IRIME-VMD reduced the RMSE on Substation A by 14.1%, 13.1%, and 18.9%, respectively. This indicates that the method avoids local optima more effectively and finds better VMD parameters, achieving a "cleaner" and more stable signal decomposition, which is the fundamental prerequisite for subsequent accurate prediction.
(2)
An LSTNet model enhanced with a cross-dimensional attention mechanism (CDA-LSTNet) is constructed. Most attention mechanisms in existing research are limited to the temporal dimension and neglect the fact that the contributions of different input features (such as the individual IMF components, temperature, and humidity) vary dynamically across time steps. The Cross-Dimensional Attention (CDA) mechanism designed in this paper addresses this issue by enabling the model to assign weights to each feature dimension adaptively and dynamically. The model can thus amplify the influence of key driving factors (e.g., a specific high-frequency IMF component or an abrupt temperature change) while suppressing interference from noise components, which enhances its dynamic perception capability and robustness.
(3)
Through the synergy of these two innovations, the proposed hybrid framework (IRIME-VMD-CDA-LSTNet) effectively mitigates the "prediction lag" problem common in traditional models (including single deep learning models and baseline decomposition models). As the baseline comparison shows, the baseline models consistently exhibit significant lag and amplitude underestimation at load peaks. In contrast, the proposed model, leveraging high-quality decomposition and adaptive feature fusion, tracks the true load with high fidelity and little phase delay. This capability is of critical practical value for the real-time dispatch and secure operation of power systems, and it matters more than simple aggregated error metrics.
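The feature-dimension weighting described in conclusion (2) can be illustrated with a generic feature-wise attention step: score each feature from the current input row, normalize the scores across features with softmax, and rescale the inputs. This is not the authors' CDA layer, whose internal structure is not given in this section; the parameters `w` and `b` are hypothetical (a trained network would learn them), and the input row is a toy example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(
    x_t: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Score each of the D features from the full input row at time t, normalize the scores
    across features with softmax, and rescale each feature by its weight."""
    scores = [sum(wi * xi for wi, xi in zip(w_j, x_t)) + b_j for w_j, b_j in zip(w, b)]
    weights = softmax(scores)
    return weights, [a * x for a, x in zip(weights, x_t)]


# Hypothetical input row: [IMF_1, IMF_2, temperature, humidity] at one time step.
x_t = [0.8, -0.2, 0.5, 0.1]
# Hypothetical parameters: zero weights plus a bias that favours the first feature.
w = [[0.0] * 4 for _ in range(4)]
b = [1.0, 0.0, 0.0, 0.0]
weights, weighted = feature_attention(x_t, w, b)
print([round(a, 3) for a in weights])  # [0.475, 0.175, 0.175, 0.175]
```

Because the scores depend on `x_t`, applying the same step at every time step lets the feature weights change as the inputs change, which is the "dynamic" aspect the conclusion refers to; with all-zero `w`, as in this toy case, the weights reduce to a fixed softmax of the bias.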

Author Contributions

Conceptualization, A.Z. and D.L.; methodology, A.Z. and D.L.; validation, A.Z. and D.L.; formal analysis, A.Z.; investigation, A.Z.; resources, A.Z. and D.L.; data curation, A.Z. and D.L.; writing—original draft preparation, A.Z.; writing—review and editing, A.Z. and D.L.; visualization, A.Z. and D.L.; supervision, D.L. and J.L.; project administration, D.L. and J.L.; funding acquisition, D.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Baur, L.; Ditschuneit, K.; Schambach, M.; Kaymakci, C.; Wollmann, T.; Sauer, A. Explainability and Interpretability in Electric Load Forecasting Using Machine Learning Techniques–A Review. Energy AI 2024, 16, 100358. [Google Scholar] [CrossRef]
  2. Wang, C.; Zhao, H.; Liu, Y.; Fan, G. Minute-level ultra-short-term power load forecasting based on time series data features. Appl. Energy 2024, 372, 123801. [Google Scholar] [CrossRef]
  3. Liu, Y.; Liao, J.; Guo, C.; Tan, Z.; Wang, Y.; Wei, N.; Zhou, N.; Song, Y. Fault Severity Classification Based Coordination Control Strategy of Fault Current Limiter and Modular Multilevel Converter for Adaptive Fault Current Limiting. J. Mod. Power Syst. Clean Energy 2025, 13, 1432–1443. [Google Scholar] [CrossRef]
  4. Liu, Y.; Liao, J.; Guo, C.; Tan, Z.; Wang, Q.; Wang, Y.; Zhou, N. Self-Adaptive Action and Parameter Optimization of DC Series-Parallel Power Flow Controller for Fault Current Limiting in Bipolar DC Distribution Systems. J. Mod. Power Syst. Clean Energy 2025, 13, 732–746. [Google Scholar] [CrossRef]
  5. Eren, Y.; Küçükdemiral, İ. A comprehensive review on deep learning approaches for short-term load forecasting. Renew. Sustain. Energy Rev. 2024, 189, 114031. [Google Scholar] [CrossRef]
  6. Ullah, K.; Ahsan, M.; Hasanat, S.M.; Haris, M.; Yousaf, H.; Raza, S.F.; Tandon, R.; Abid, S.; Ullah, Z. Short-Term Load Forecasting: A Comprehensive Review and Simulation Study With CNN-LSTM Hybrids Approach. IEEE Access 2024, 12, 111858–111881. [Google Scholar] [CrossRef]
  7. Wan, A.; Chang, Q.; AL-Bukhaiti, K.; He, J. Short-term power load forecasting for combined heat and power using CNN-LSTM enhanced by attention mechanism. Energy 2023, 282, 128274. [Google Scholar] [CrossRef]
  8. Shafiuzzaman, M.; Safayet Islam, M.; Rubaith Bashar, T.M.; Munem, M.; Nahiduzzaman, M.; Ahsan, M.; Haider, J. Enhanced very short-term load forecasting with multi-lag feature engineering and prophet-XGBoost-CatBoost architecture. Energy 2025, 335, 137981. [Google Scholar] [CrossRef]
  9. Nabavi, S.A.; Mohammadi, S.; Motlagh, N.H.; Tarkoma, S.; Geyer, P. Deep learning modeling in electricity load forecasting: Improved accuracy by combining DWT and LSTM. Energy Rep. 2024, 12, 2873–2900. [Google Scholar] [CrossRef]
  10. Yang, D.; Guo, J.; Li, Y.; Sun, S.; Wang, S. Short-term load forecasting with an improved dynamic decomposition-reconstruction-ensemble approach. Energy 2023, 263, 125609. [Google Scholar] [CrossRef]
  11. Xiao, Y.; Wu, S.; He, C.; Hu, Y.; Yi, M. An effective hybrid wind power forecasting model based on “decomposition-reconstruction-ensemble” strategy and wind resource matching. Sustain. Energy Grids Netw. 2024, 38, 101293. [Google Scholar] [CrossRef]
  12. Zhou, M.; Hu, T.; Bian, K.; Lai, W.; Hu, F.; Hamrani, O.; Zhu, Z. Short-Term Electric Load Forecasting Based on Variational Mode Decomposition and Grey Wolf Optimization. Energies 2021, 14, 4890. [Google Scholar] [CrossRef]
  13. Hu, T.; Zhou, M.; Bian, K.; Lai, W.; Zhu, Z. Short-Term Load Probabilistic Forecasting Based on Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise Reconstruction and Salp Swarm Algorithm. Energies 2022, 15, 147. [Google Scholar] [CrossRef]
  14. Yin, C.; Wei, N.; Wu, J.; Ruan, C.; Luo, X.; Zeng, F. An Empirical Mode Decomposition-Based Hybrid Model for Sub-Hourly Load Forecasting. Energies 2024, 17, 307. [Google Scholar] [CrossRef]
  15. Liu, W.; Bai, Y.; Yue, X.; Wang, R.; Song, Q. A wind speed forcasting model based on rime optimization based VMD and multi-headed self-attention-LSTM. Energy 2024, 294, 130726. [Google Scholar] [CrossRef]
  16. Mohamed, S.A.; Shaheen, A.M.; Alqahtani, M.H.; Al Faiya, B.M. Enhancement of rime algorithm using quadratic interpolation learning for parameters identification of photovoltaic models. Sci. Rep. 2025, 15, 21166. [Google Scholar] [CrossRef]
  17. Zhou, D.; Liu, Y.; Wang, X.; Wang, F.; Jia, Y. Combined ultra-short-term photovoltaic power prediction based on CEEMDAN decomposition and RIME optimized AM-TCN-BiLSTM. Energy 2025, 318, 134847. [Google Scholar] [CrossRef]
  18. Rubasinghe, O.; Zhang, X.; Chau, T.K.; Chow, Y.H.; Fernando, T.; Iu, H.H.-C. A Novel Sequence to Sequence Data Modelling Based CNN-LSTM Algorithm for Three Years Ahead Monthly Peak Load Forecasting. IEEE Trans. Power Syst. 2024, 39, 1932–1947. [Google Scholar] [CrossRef]
  19. Chu, X.; Jin, H.; Li, Y.; Feng, J.; Mu, W. CDA-LSTM: An evolutionary convolution-based dual-attention LSTM for univariate time series prediction. Neural Comput. Appl. 2021, 33, 16113–16137. [Google Scholar] [CrossRef]
  20. Ran, J.; Gong, Y.; Hu, Y.; Cai, J. EV load forecasting using a refined CNN-LSTM-AM. Electr. Power Syst. Res. 2025, 238, 111091. [Google Scholar] [CrossRef]
  21. Zhan, X.; Kou, L.; Xue, M.; Zhang, J.; Zhou, L. Reliable Long-Term Energy Load Trend Prediction Model for Smart Grid Using Hierarchical Decomposition Self-Attention Network. IEEE Trans. Reliab. 2023, 72, 609–621. [Google Scholar] [CrossRef]
  22. Guo, X.; Gao, Y.; Li, Y.; Zheng, D.; Shan, D. Short-term household load forecasting based on Long- and Short-term Time-series network. Energy Rep. 2021, 7, 58–64. [Google Scholar] [CrossRef]
  23. Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818. [Google Scholar] [CrossRef]
  24. Li, K.; Huang, W.; Hu, G.; Li, J. Ultra-short term power load forecasting based on CEEMDAN-SE and LSTM neural network. Energy Build. 2023, 279, 112666. [Google Scholar] [CrossRef]
  25. Feng, Z.; Zhang, X.; Quan, W.; Liu, X.; An, J.; Wang, C.; Ji, X.; Kang, L. A hybrid deep learning model based on Rime optimization and multi-head attention for cooling load prediction in public buildings. Energy 2025, 339, 139100. [Google Scholar] [CrossRef]
  26. Gu, G.; Lou, J.; Wan, H. A multi-strategy improved rime optimization algorithm for three-dimensional USV path planning and global optimization. Sci. Rep. 2024, 14, 12603. [Google Scholar] [CrossRef] [PubMed]
  27. Wang, Y.; Zhang, W.; Ma, Y.; Yu, Y.; Chen, H. An improved RIME optimization algorithm based maximum power point tracking method for photovoltaic system under partially shading condition. Sci. Rep. 2025, 15, 19507. [Google Scholar] [CrossRef]
  28. Shi, J.; Chen, Y.; Heidari, A.A.; Cai, Z.; Chen, H.; Chen, Y.; Liang, G. Environment random interaction of rime optimization with Nelder-Mead simplex for parameter estimation of photovoltaic models. Sci. Rep. 2024, 14, 15701. [Google Scholar] [CrossRef]
  29. He, X.; Kawaguchi, T.; Hashimoto, S. Intelligent Identification Method of Low Voltage AC Series Arc Fault Based on Using Residual Model and Rime Optimization Algorithm. Energies 2024, 17, 4675. [Google Scholar] [CrossRef]
  30. Zhao, Z.; Dai, Y.; Li, K.; Zhang, Z.; Fang, Y.; Chen, B.; Zhao, Q. Estimation of Lithium Battery State of Health Using Hybrid Deep Learning with Multi-Step Feature Engineering and Optimization Algorithm Integration. Energies 2025, 18, 5849. [Google Scholar] [CrossRef]
  31. Wang, W.; Zhang, M.; Zhang, Z.; Du, D.; Tang, Z. Enhancing Photovoltaic Power Forecasting via Dual Signal Decomposition and an Optimized Hybrid Deep Learning Framework. Energies 2025, 18, 6159. [Google Scholar] [CrossRef]
  32. Li, K.; Yuan, L.; Qian, F.; Song, L.; Wu, X.; Wang, L.; Dai, J.; Shen, L. Short-Term Load Forecasting for Electricity Spot Markets Across Different Seasons Based on a Hybrid VMD-LSTM-Random Forest Model. Energies 2025, 18, 6097. [Google Scholar] [CrossRef]
Figure 1. The Overall Architecture of the Proposed Framework.
Figure 2. Flowchart of IRIME-VMD Optimization.
Figure 3. Load profiles diagram.
Figure 4. Daily average temperature and relative humidity profiles during the sampling period.
Figure 5. SVR prediction effect. (a) Substation A; (b) Substation B; (c) Substation C; (d) Substation D.
Figure 6. Random forest prediction effect. (a) Substation A; (b) Substation B; (c) Substation C; (d) Substation D.
Figure 7. ARIMA prediction effect. (a) Substation A; (b) Substation B; (c) Substation C; (d) Substation D.
Figure 8. The proposed method. (a) Substation A; (b) Substation B; (c) Substation C; (d) Substation D.
Figure 9. Comparative analysis of different prediction methods for Substations A and B.
Table 1. Evaluation metrics of forecasting performance for different methods on each distribution transformer.
SubstationMetricLSTMGRUSVRRandom ForestARIMAProposed Method
ARMSE299.8306.92261.14519.6335.75284.42
MAE169.56159.07207.4110.6992.59177.618
MAPE (%)27.4225.5950.212.8739.2730.13
BRMSE315.05304.18277.1320.62267.1285.39
MAE171.48171.8172.52189.38166.9170.44
MAPE (%)18.7418.1417.74219.9736.0431.69
CRMSE882.48975.9655.4607.6586.1883.75
MAE483.44521.1462.65287.31305.22605.9
MAPE (%)23.4626.826.3810.9139.3735.62
DRMSE1110.811128.7771.5540.1485.1794.8
MAE739.48683.7680.77401.6364.5506.1
MAPE (%)18.8117.6321.8510.5625.9514.6
Table 2. Prediction Errors of Different Methods for Substations A and B.
Substation | Metric | GA-VMD | GWO-VMD | PSO-VMD | IRIME-VMD
A | RMSE | 330.956 | 327.172 | 350.555 | 284.42
A | MAE | 214.250 | 212.783 | 219.336 | 177.618
A | MAPE (%) | 36.847 | 36.770 | 37.016 | 30.13
B | RMSE | 317.199 | 327.172 | 350.34 | 285.39
B | MAE | 176.201 | 212.783 | 193.34 | 170.44
B | MAPE (%) | 31.599 | 36.770 | 33.964 | 31.69
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, A.; Liu, D.; Liao, J. An Adaptive Hybrid Short-Term Load Forecasting Framework Based on Improved Rime Optimization Variational Mode Decomposition and Cross-Dimensional Attention. Energies 2026, 19, 497. https://doi.org/10.3390/en19020497

AMA Style

Zhang A, Liu D, Liao J. An Adaptive Hybrid Short-Term Load Forecasting Framework Based on Improved Rime Optimization Variational Mode Decomposition and Cross-Dimensional Attention. Energies. 2026; 19(2):497. https://doi.org/10.3390/en19020497

Chicago/Turabian Style

Zhang, Aodi, Daobing Liu, and Jianquan Liao. 2026. "An Adaptive Hybrid Short-Term Load Forecasting Framework Based on Improved Rime Optimization Variational Mode Decomposition and Cross-Dimensional Attention" Energies 19, no. 2: 497. https://doi.org/10.3390/en19020497

APA Style

Zhang, A., Liu, D., & Liao, J. (2026). An Adaptive Hybrid Short-Term Load Forecasting Framework Based on Improved Rime Optimization Variational Mode Decomposition and Cross-Dimensional Attention. Energies, 19(2), 497. https://doi.org/10.3390/en19020497
