1. Introduction
Amid the ongoing liberalization and deregulation of the power sector, accurate electricity price forecasting has become a fundamental requirement for effective energy management and market operations [1]. Electricity prices in competitive markets tend to fluctuate sharply and display complex temporal patterns. They are characterized by high-frequency movements, non-stationary means and variances, pronounced daily and weekly seasonality, calendar effects associated with weekends and holidays, and recurrent price spikes [2]. Furthermore, structural challenges such as transmission bottlenecks, real-time supply–demand imbalances, and the growing share of renewable energy sources can trigger brief but sharp price spikes. These dynamics increase the complexity of forecasting models and heighten the risk of prediction errors, which can significantly affect power plant operators, market decision-makers, and participants [3]. Accurate electricity price forecasting is therefore critical for energy suppliers, market operators, and participants. It enables managers to analyze future electricity demand and develop long-term energy plans while supporting market participants in risk management and purchasing decisions. For governments, it provides an essential basis for energy policy formulation; for suppliers, it assists in designing optimal bidding strategies; and for consumers, it helps them plan purchases to maximize value and minimize costs. As such, achieving precise short-term electricity price forecasts and mitigating market risks from price uncertainty remain central to current research efforts.
Mainstream electricity price forecasting methods can be categorized into three types: (1) econometric models, (2) machine learning models, and (3) hybrid models. In the early stages of research, both domestic and international studies predominantly relied on mathematical or econometric models. Among these, the autoregressive integrated moving average (ARIMA) model has been widely applied in electricity price forecasting. For instance, Contreras et al. developed an ARIMA model to forecast prices in the Spanish and Californian electricity markets. ARIMA combines two core components, autoregression (AR) and moving average (MA), and achieves data smoothing through differencing operations. Although ARIMA provides good interpretability, it often exhibits relatively large mean percentage errors, with daily average errors ranging from 4% to 10%, which are generally higher than those of machine learning models [4]. To address the limitations of the ARIMA model under high volatility, Zhou et al. introduced an interval forecasting approach for hourly market clearing price (MCP) prediction in the California electricity market [5]. In contrast, Girish applied a generalized autoregressive conditional heteroskedasticity (GARCH) based model to forecast hourly electricity prices in India. GARCH models assume that the moments of a time series are non-constant, implying that the error term, defined as the difference between observed and predicted values, may not have a zero mean or constant variance, unlike in ARIMA models. Instead, the error term is treated as serially correlated and modeled using an AR process. This makes GARCH models well suited for capturing implied volatility in time series data, particularly during price spikes. GARCH generally outperforms ARIMA in forecasting, except during periods of low volatility [6]. Despite the strong performance of time series techniques like GARCH, their reliance on linear modeling limits their ability to predict highly nonlinear behaviors and sudden price fluctuations [7]. As a result, researchers have increasingly adopted machine learning methods, which offer advantages in fitting nonlinear features.
Among machine learning techniques, neural network models have been extensively applied to electricity price forecasting. Compared to traditional statistical approaches, neural networks exhibit superior nonlinear fitting capabilities and adaptability, making them particularly advantageous for handling high-volatility price forecasting. In recent years, artificial neural networks (ANNs) have gained attention for their ability to capture complex nonlinear relationships between input and output datasets [8]. For example, Li and Guo utilized a back-propagation (BP) neural network combined with system marginal price (SMP) and dynamic clustering weights to forecast historical marginal price data from the PJM Interconnection (PJM) electricity market in the United States [9]. BP networks can approximate any function with finite discontinuities, given a sufficient number of neurons in the hidden layer. Building on this, Tang and Gu employed backpropagation to update weights and biases. However, BP networks are highly sensitive to noise, requiring rigorous data preprocessing [10]. In contrast, Extreme Learning Machines (ELMs), with random initialization and minimal parameter optimization, are somewhat more robust to noise, making them popular in electricity price forecasting. Shrivastava et al. applied the ELM model to data from the Ontario and PJM markets, demonstrating higher generalization capability and significantly faster processing speeds compared to traditional neural network algorithms [11]. Nevertheless, as a feedforward neural network, ELM struggles to capture temporal dependencies.
The advancement of deep learning (DL) in electricity price forecasting has provided researchers with innovative approaches for time series forecasting. Among DL models, LSTM demonstrates excellent performance in handling nonlinear complex problems and time series data [12], while CNN excels in high-dimensional feature extraction [13], making these models widely applicable. Cantillo-Luna et al. employed a stacked LSTM combined with a time2vec layer to forecast electricity prices up to 8 h ahead, outperforming advanced statistical models like SARIMA and Holt-Winters [14]. Similarly, Abedinia et al. utilized CNN-based approaches to forecast electricity prices in the PJM and mainland Spain markets, achieving favorable results [15]. However, CNNs face limitations in capturing global information and long-distance dependencies. To address this, Shejul et al. integrated CNN and LSTM models for short-term electricity price forecasting, yielding improved forecasting accuracy [16]. Additionally, Pourdaryaei et al. applied a CNN algorithm augmented with an attention mechanism to simulate the Ontario electricity market in 2020, achieving superior results [17]. LSTM stands out in capturing patterns and long-term dependencies in nonlinear time series data, while CNN is effective at identifying local features. The squeeze-and-excitation (SE) attention mechanism further enhances CNN by adaptively weighting each channel, enabling it to focus on globally important features. This integrated framework substantially improves forecasting accuracy and overall performance.
In forecasting research, it is widely acknowledged that no single method excels in all scenarios. To enhance forecasting accuracy, integrated models have become a common approach in electricity price forecasting. These models frequently incorporate (1) decomposition modules and (2) optimization modules to improve robustness. Decomposition algorithms, by mitigating the non-stationary characteristics of time series data, are extensively utilized in hybrid methods for electricity price forecasting [18]. A key advantage of the EMD series of decomposition methods is their inherent capability to handle non-stationary and nonlinear processes [19,20]. Afanasyev et al. applied CEEMDAN in power price forecasting, demonstrating superior performance compared to standard EMD and wavelet decomposition methods [21]. Another critical factor impacting the performance of deep learning models is improper hyperparameter tuning. To address this, optimization algorithms are increasingly integrated into forecasting models [22,23]. These algorithms enable the identification of optimal model parameters, thereby enhancing both forecasting accuracy and model robustness. Representative studies in the field of electricity price forecasting are listed in Table 1.
Short-term electricity price forecasting is pivotal to stable power market operation, yet it remains a formidable task due to inherent price volatility and intricate influencing factors. Conventional econometric models and standalone machine learning approaches often fall short in two key areas: effectively extracting critical trend information from raw data and optimizing model hyperparameters, which ultimately restricts their forecasting accuracy. To address these gaps, this paper employs a decomposition-reconstruction-optimization-integration (DROI) framework to handle electricity price fluctuations. CEEMDAN is applied for data decomposition, with the resulting modes reconstructed using a t-test against the original sequence. After consolidating the decomposed frequency data, the CNN-SE Attention-LSTM (CSL) model is utilized to weight and amplify trend information, enhancing feature extraction and improving forecasting accuracy. ISSA is incorporated for global optimization of the model, determining optimal parameters such as the learning rate, L2 regularization coefficient, and the number of neurons in the hidden layer. Additionally, a residual test ensures comprehensive extraction of trend components. Both interval forecasting and deterministic forecasting methods are applied to compare the performance of econometric models with standalone machine learning predictive models. Results from six experiments demonstrate that the CEEMDAN-ISSA-CNN-SE Attention-LSTM (CSCSL) hybrid model delivers superior performance in short-term electricity price forecasting.
The main innovations of this study are summarized as follows:
(1) To mitigate errors stemming from human factors, the ISSA optimization algorithm is introduced to determine key model parameters, including the initial learning rate, regularization coefficient, and the number of neurons in the hidden layer. These parameters are optimized by minimizing the RMSE in the forecasting model. Furthermore, Lévy flight is incorporated to avoid local optima and achieve a global search, significantly enhancing the model’s stability. This approach not only improves the model’s forecasting performance but also provides a robust foundation for future parameter selection.
(2) To enhance the feature extraction capability of the model, an attention mechanism is incorporated, leading to the development of the convolution-time-loop hybrid deep learning model (CSL) based on the attention mechanism for forecasting. In the CSL forecasting model, CNN provides local perception and parameter sharing, effectively capturing local patterns in input sequences and reducing model complexity. SE Attention assigns weights to different parts of the sequence based on contextual information, accounting for correlations within the input data. LSTM, as a core component, captures long-term dependencies in sequences and demonstrates memory capabilities. This method excels in nonlinear fitting and is particularly suited for short-term electricity price forecasting tasks with inherent volatility.
(3) To improve the nonlinear fitting ability of the forecasting model, decomposition and reconstruction methods are introduced for electricity price data processing. CEEMDAN is used to decompose the data into multiple modes, automatically adjusting parameters based on signal characteristics to effectively eliminate noise. Subsequently, the modes are reconstructed according to the data’s frequency states using an independent-samples t-test. This approach enhances the model’s ability to process complex features. Using this method, a single time series is decomposed into three frequency states: high, medium, and low frequency. Predicting each frequency state separately enhances the model’s forecasting performance and reduces forecasting error.
(4) To increase the interpretability of the model, the Chow test is employed to partition the dataset in conjunction with economic events. Discontinuity points are identified based on significant socio-economic events, and the Chow test is applied to confirm the presence of structural breaks. By conducting these experiments, the time series data is segmented into multiple datasets, aiding in the evaluation of the model’s adaptability to different scenarios and its robustness. This approach provides valuable insights and serves as a reference for future model selection and improvement.
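To make the structural-break test in point (4) concrete, the Chow F-statistic can be computed from the residual sums of squares of a pooled OLS fit and two sub-sample fits. The sketch below is a minimal NumPy illustration for a single-regressor model with an intercept; the variable names, the split index, and the simulated data are illustrative, not taken from the paper's dataset.

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares from an OLS fit of y on x (with intercept)."""
    A = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def chow_test(x, y, split, k=2):
    """Chow F-statistic for a structural break at index `split`.
    k is the number of regression parameters (intercept + slope here).
    Large F indicates that fitting the two segments separately explains
    the data much better than a single pooled regression."""
    rss_pooled = rss(x, y)
    rss_split = rss(x[:split], y[:split]) + rss(x[split:], y[split:])
    n = len(x)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
```

In use, candidate breakpoints would be placed at the dates of major socio-economic events and the statistic compared against the F critical value, as described above.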
The remainder of this paper is organized as follows. In Section 2, the basic principles of the proposed model and correction method are introduced. In Section 3, the preparation of the experiments is described, including dataset reconstruction, benchmark models, evaluation indices, model parameters, dataset division, and the experimental arrangement. In Section 4 and Section 5, the forecasting performance and robustness of the model are studied from two perspectives. In Section 6, the experimental results are summarized and future research prospects are discussed.
2. Methodology
This section begins with an introduction to the basic principles of the decomposition method, followed by the optimization algorithm and the forecasting model (with relevant formulas provided in Appendix A and Appendix B).
2.1. The Decomposition Method
For power price data characterized by high volatility, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) can effectively decompose the original highly fluctuating data into multiple smoother components [26]. This method combines the modal information of each component and employs econometric t-tests to reconstruct the component characteristics. This process aims to minimize data variability and suddenness, thereby reducing reconstruction errors and enhancing forecasting accuracy, as illustrated in Appendix A and Appendix B. CEEMDAN is an extension of the Ensemble Empirical Mode Decomposition (EEMD) and Empirical Mode Decomposition (EMD) algorithms. It effectively addresses the mode-mixing issue present in EEMD [27]. The specific steps involved in the CEEMDAN method are outlined below:
Step 1: Data preparation: The input signal is first normalized by dividing it by its standard deviation. This step ensures consistency in the amplitude range of the signal, facilitating more uniform processing in subsequent steps.
Step 2: White noise mode decomposition: Gaussian white noise is added to the signal to generate a new composite signal. EMD is then applied to each iteration of white noise, yielding intrinsic mode functions (IMFs). The first mode is extracted from the decomposed signal, and the residual component is subsequently calculated.
Step 3: Extract residual modes: Residual components are calculated, and the process is repeated using the first-order modal component in place of the original data. This iterative procedure continues until no significant IMF signals remain in the residual. Ultimately, the original signal is fully decomposed into its IMFs.
The primary advantage of this method lies in its ability to account for noise during the signal decomposition process. Additionally, it can adaptively regulate both the noise level and the number of IMFs in the signal.
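Steps 1–3 can be illustrated with a deliberately simplified sketch: a toy sifting routine that uses linear-interpolation envelopes (full EMD uses cubic splines), wrapped in the noise-ensemble averaging that gives CEEMDAN-type methods their robustness. The function names, `n_realizations`, and `noise_std` are illustrative, not the paper's implementation.

```python
import numpy as np

def _envelope_mean(x):
    """Mean of upper/lower envelopes, linearly interpolated through extrema."""
    n = len(x)
    idx = np.arange(n)
    maxi = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    mini = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxi) < 2 or len(mini) < 2:
        return None  # too few extrema: x is essentially a residual trend
    upper = np.interp(idx, maxi, x[maxi])
    lower = np.interp(idx, mini, x[mini])
    return (upper + lower) / 2

def extract_imf(x, n_sift=10):
    """Extract the first intrinsic mode function by repeated sifting."""
    h = x.copy()
    for _ in range(n_sift):
        m = _envelope_mean(h)
        if m is None:
            break
        h = h - m
    return h

def ceemdan_like(signal, n_realizations=20, noise_std=0.2, seed=0):
    """Noise-assisted first-mode extraction in the spirit of Steps 2-3:
    average the first IMF over Gaussian-noise-perturbed copies of the
    signal, then return that mode together with the residual."""
    rng = np.random.default_rng(seed)
    scale = noise_std * signal.std()
    modes = [extract_imf(signal + rng.normal(0.0, scale, len(signal)))
             for _ in range(n_realizations)]
    imf1 = np.mean(modes, axis=0)
    residual = signal - imf1
    return imf1, residual
```

Applied to a mixture of a fast and a slow sinusoid, the first extracted mode tracks the fast component while the residual carries the slow trend; in the full algorithm, the extraction is repeated on the residual until no significant IMF remains.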
2.2. The Optimization Algorithm ISSA
In the context of forecasting volatile power prices, selecting key model parameters is crucial before the data is fed into the forecasting model. The ISSA optimization algorithm serves as an effective approach for determining model parameters that minimize the Root Mean Square Error (RMSE) through a global search process. The optimization algorithm builds upon the Sparrow Search Algorithm (SSA) and incorporates ideas from the Adaptive Spiral Flying Sparrow Search Algorithm (ASFSSA) to address challenges such as local optimality and high randomness [28,29]. The ISSA optimization process is outlined as follows:
Step 1: Random variable chaotic diffusion initialization: Population initialization is systematically designed to enhance the algorithm’s controllability.
Step 2: Sparrow food search process: An adaptive weighting is introduced in this process. During the initial stage of the algorithm, this weighting mechanism reduces the impact of random initialization while balancing the subsequent Lévy flight mechanism, thereby enhancing both local and global search capabilities of the algorithm.
Step 3: Lévy flight mechanism: Inspired by the foraging behavior of sparrows, this mechanism demonstrates strong exploratory capabilities, reducing the likelihood of convergence to local optima.
Step 4: Variable spiral search strategy: During the follower position update process, a fixed spiral parameter z would make the search trajectory monotonous and increase the risk of converging to a local optimum, thereby diminishing the algorithm’s search capability. Acting as an adaptive variable, the parameter z instead dynamically modifies the spiral search trajectory of the follower. This enables the follower to explore unknown regions more effectively, boosting both search efficiency and the algorithm’s global optimization capability.
Step 5: Iterative optimization: Through continuous iteration, the optimal model parameters that yield the best forecasting performance are obtained.
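The interplay of Steps 2–5 can be sketched as follows. This is a minimal, generic illustration of a spiral-plus-Lévy search, not the paper's exact update rules: the Lévy step uses Mantegna's algorithm, the spiral factor shrinks exponentially with the iteration count, and the step-scale constants (`0.01`, the `-3.0` schedule) are assumed values for demonstration.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, rng, beta=1.5):
    """Heavy-tailed Levy-flight step lengths via Mantegna's algorithm."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def issa_like_minimize(f, lo, hi, n_agents=30, n_iter=200, seed=1):
    """ISSA-style search sketch: each agent either spirals around the best
    solution found so far (with a variable spiral parameter z that shrinks
    over iterations) or takes a Levy-flight jump; a move is kept only if it
    improves that agent's fitness."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    dim = len(lo)
    X = rng.uniform(lo, hi, (n_agents, dim))
    fit = np.array([f(x) for x in X])
    for t in range(n_iter):
        best = X[fit.argmin()].copy()
        z = np.exp(-3.0 * t / n_iter)  # variable spiral parameter
        for i in range(n_agents):
            l = rng.uniform(-1.0, 1.0)
            if rng.random() < 0.5:
                # spiral move around the current best solution
                cand = best + (X[i] - best) * np.exp(z * l) * np.cos(2 * pi * l)
            else:
                # Levy flight: occasional long jump to escape local optima
                cand = X[i] + 0.01 * levy_step(dim, rng) * (X[i] - best)
            cand = np.clip(cand, lo, hi)
            fc = f(cand)
            if fc < fit[i]:  # greedy acceptance
                X[i], fit[i] = cand, fc
    j = fit.argmin()
    return X[j], float(fit[j])
```

In the paper's setting, `f` would be the forecasting model's validation RMSE evaluated at a candidate (learning rate, L2 coefficient, hidden-layer size) triple.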
2.3. The Forecasting Model
The forecasting model primarily consists of three modules: CNN, SE Attention and LSTM.
(1) Convolutional Neural Networks
CNNs, inspired by biological processes, are a powerful deep learning tool widely used for classification tasks. The primary components of CNNs include the convolutional layer, pooling layer, and fully connected layer. In time series forecasting, the convolutional layer effectively captures local patterns and features in data, reduces network parameters through the sharing of convolutional kernels, mitigates the risk of overfitting, and enables processing of longer time series data [30].
Specifically, the convolutional layer consists of a rectangular grid of neurons, with its input or preceding layer also arranged in a similar grid. Neurons within a given rectangular region share the same weights, representing the convolutional filter. The pooling layer, in turn, extracts small rectangular sections from the convolutional layer and applies sampling operations, such as average or maximum pooling, to produce a single output for each section. Lastly, the fully connected layer takes all neurons from the previous layers and connects each to every neuron within the layer, enabling the integration of high-level features.
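The two operations described above — a shared-weight convolutional filter followed by pooling over small sections — can be shown in a few lines of NumPy. This is a generic 1-D sketch (deep-learning-style cross-correlation, "valid" padding); the kernel values below are illustrative.

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as used in CNNs): the same
    kernel weights are shared across every sliding window of x."""
    k = len(kernel)
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    return windows @ kernel + bias

def max_pool1d(x, size):
    """Non-overlapping max pooling: keep the strongest response per section."""
    n = len(x) // size * size
    return x[:n].reshape(-1, size).max(axis=1)
```

For example, a difference kernel `[-1, 0, 1]` slid over a price-like series produces large positive responses on upward ramps and negative ones on downward ramps, which pooling then condenses; this is the "local pattern extraction with parameter sharing" referred to above.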
(2) Squeeze-and-Excitation Attention
Originating from image processing, the concept of attention draws inspiration from the human brain’s ability to selectively concentrate on relevant stimuli. By assigning variable weights to features, attention mechanisms emphasize essential information while diminishing the significance of less relevant details [31].
SE Attention, short for Squeeze-and-Excitation attention, is an attention mechanism designed to enhance CNNs. Its core principle involves incorporating a global attention mechanism to adaptively learn the importance of individual channels [32], as illustrated in Figure 1.
The SE module applies a filter to weight the input data, enhancing the learning of convolutional features. This process improves the network’s sensitivity to informative features, thereby boosting its performance.
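The squeeze-excitation-rescale pipeline can be sketched in NumPy as below. This is a generic illustration of the SE mechanism (global average pooling, a bottleneck MLP with ReLU then sigmoid, and channel-wise rescaling); the weight matrices `W1`/`W2` and the reduction ratio are assumptions, and a trained network would learn them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(feature_maps, W1, W2):
    """Squeeze-and-Excitation over feature_maps of shape (channels, length).
    Squeeze: global average pool per channel.
    Excitation: bottleneck MLP (ReLU then sigmoid) yields one weight per
    channel in (0, 1).
    Rescale: multiply each channel by its learned importance score."""
    squeeze = feature_maps.mean(axis=1)        # (C,)
    hidden = np.maximum(0.0, W1 @ squeeze)     # (C // r,)
    scores = sigmoid(W2 @ hidden)              # (C,)
    return scores[:, None] * feature_maps, scores
```

Because the scores are computed from a global (whole-sequence) pooling of each channel, informative channels are amplified and uninformative ones suppressed, which is the sensitivity gain described above.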
(3) Long Short-Term Memory Networks
LSTM networks, equipped with a forget gate, effectively address the challenge of handling continuous time series input [33]. This model has found extensive applications in electricity price forecasting. The LSTM methodology is outlined as follows:
Step 1: Forget gate: The forget gate, a crucial part of LSTM, decides what information to retain or remove from the cell state at time t − 1. It processes the previous hidden state $h_{t-1}$ and the current input $x_t$, yielding an output value ranging from 0 to 1:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $f_t$ represents the state of the forget gate at time t, $W_f$ is its weight matrix, $b_f$ its bias term, and $\sigma$ the S-shaped (sigmoid) activation function, which collectively regulate the flow of information. A value of 1 represents full retention of previous information, whereas a value of 0 signifies complete removal of prior cell-state data. This mechanism enables the LSTM to effectively manage information flow, reducing the likelihood of losing crucial data in time series analysis.

Step 2: Signal storage: At each time increment, the input gate $i_t$ receives the new information $x_t$ and determines which parts to retain in the cell state, while a hyperbolic tangent layer computes the candidate cell state $\tilde{C}_t$:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

Using these intermediate values, the cell state is updated, producing the new cell state

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

where $W_i$, $W_C$ and $b_i$, $b_C$ are the associated weight matrices and bias terms, and $\odot$ denotes the Hadamard product.

Step 3: Information transmission: The output gate selects the relevant information from the current cell state to produce the final output. Governed by a sigmoid layer, this gate selects specific segments of the memory-cell state for export, and a tanh layer refines the chosen information to produce the output:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t)$$

where $o_t$ is the state of the output gate at time t, $W_o$ and $b_o$ are its weight matrix and bias, and $h_t$ signifies the hidden state at the current time instance.
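A single LSTM step (forget, input, candidate, and output gates followed by the cell and hidden-state updates) can be written compactly in NumPy. This is a standard-form sketch with the four gate weight blocks stacked into one matrix `W`; the shapes and initialization are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the concatenated [h_prev; x_t] to the
    four stacked gate pre-activations (forget, input, candidate, output),
    each of size H."""
    H = len(h_prev)
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:H])           # forget gate f_t
    i = sigmoid(z[H:2 * H])      # input gate i_t
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])       # output gate o_t
    c_t = f * c_prev + i * g     # Hadamard-product cell-state update
    h_t = o * np.tanh(c_t)       # hidden state
    return h_t, c_t
```

Iterating this cell over a sequence carries the cell state `c_t` forward, which is what lets the network retain long-term dependencies in the price series.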
2.4. Procedures and Frame for the Proposed Model
To accurately predict electricity price data, a novel DROI framework is proposed. The CEEMDAN method is utilized to adaptively decompose time series data into multiple components, which are reconstructed using an econometric t-test to generate data with three distinct frequency states. These three frequency components are processed by CNN to extract trend features, where the CNN channels are weighted by SE Attention, and LSTM is employed to handle time-series dependencies for enhanced forecasting accuracy. To optimize the model’s initial parameters, ISSA is used to find the coefficients that minimize RMSE, thereby improving model interpretability. The overall framework of the proposed method is depicted in Figure 2, with the detailed implementation steps outlined below.
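One common way to perform the t-test-based regrouping of decomposed modes is the fine-to-coarse scheme sketched below: IMFs are accumulated from the highest frequency down, and the first partial sum whose mean differs significantly from zero marks the boundary between the noise-like high-frequency group and the trend-carrying modes. This is a generic NumPy illustration of that idea, not necessarily the paper's exact reconstruction rule, and the critical value 1.96 is an assumed 5% threshold.

```python
import numpy as np

def t_stat(x):
    """One-sample t statistic for the hypothesis mean(x) = 0."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

def regroup_imfs(imfs, t_crit=1.96):
    """Fine-to-coarse regrouping: accumulate IMFs from the highest
    frequency down and find the first running sum whose mean is
    significantly nonzero; split the modes into a high-frequency group
    and a remaining (trend-carrying) group at that boundary."""
    imfs = np.asarray(imfs)
    boundary = len(imfs)            # default: no significant trend found
    running = np.zeros_like(imfs[0])
    for j in range(len(imfs)):
        running = running + imfs[j]
        if abs(t_stat(running)) > t_crit:
            boundary = max(j, 1)    # keep at least one high-frequency mode
            break
    high = imfs[:boundary].sum(axis=0)
    low = imfs[boundary:].sum(axis=0)
    return high, low, boundary
```

Applying the same split again inside either group would yield the three frequency states (high, medium, low) that the framework forecasts separately.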
6. Conclusions
Accurate electricity price forecasting is critical in highly competitive markets for plant operators, market operators, and participants. This study focuses on addressing the challenges of short-term electricity price forecasting, using the Spanish electricity market as a case study. The market data is divided into multiple subsets based on economic events and breakpoint tests. Decomposition and reconstruction techniques are then employed to segment the data into modal datasets, with ISSA determining the optimal parameters. For each frequency-state dataset, CNN is utilized for feature extraction, enhanced by SE Attention for channel weighting, and LSTM is applied for time series forecasting. Compared with the benchmark methods, the average Mean Absolute Percentage Error (MAPE) of the proposed method is reduced by 46.01%. These gains can effectively help electricity retailers and industrial users alleviate the “cost mismatch risk” while assisting power generation companies in optimizing resource allocation, thereby enhancing the overall profit level of the industry.
The benchmark suite in this study comprises one econometric model, six machine learning models, seven hybrid models, and naive models designed to simulate daily decision-making. The forecasting performance and robustness of the proposed model are evaluated using both fluctuating and non-fluctuating datasets. Additionally, the study accounts for seasonal factors and performs a detailed analysis using seasonal models. The experimental results reveal that: (1) the proposed model achieves superior forecasting performance compared to other benchmarks; (2) the decomposition and reconstruction modules contribute to significant model optimization; and (3) the proposed model outperforms the naive methods and demonstrates promising application potential. Future research could focus on: (1) performing economic analyses on each decomposed mode to assess the impact of major events or economic policies; and (2) integrating error correction mechanisms for datasets with pronounced volatility to improve forecasting accuracy.
However, this study has several potential limitations that require further exploration. First, the case study is confined to the Spanish electricity market, and its applicability to other markets with different characteristics (e.g., varying regulatory frameworks or energy mix structures) remains untested. Second, while the study accounted for the impact of economic events on electricity prices and conducted breakpoint tests, it did not delve into the underlying political and economic factors or the specific mechanisms through which they exert influence.
Building on the aforementioned limitations and findings, future research can be advanced in the following aspects: (1) Expand the data scope to include multi-regional and multi-type electricity markets, and incorporate cross-market comparative analysis to enhance the model’s generalization ability. (2) Integrate econometric analysis into the decomposition module, and introduce dummy variables to quantify the impact of major economic events (such as carbon policy adjustments and international natural gas price shocks) on each modal dataset. This will help establish linkages between economic mechanisms and model features, thereby improving the interpretability of forecasting results.