Article

Short-Term Photovoltaic Power Forecasting Based on ICEEMDAN-TCN-BiLSTM-MHA

1 The College of Civil Engineering and Transportation, Qinghai University of Nationalities, Xining 810007, China
2 The State Key Laboratory of Coastal and Offshore Engineering, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1599; https://doi.org/10.3390/sym17101599
Submission received: 21 August 2025 / Revised: 14 September 2025 / Accepted: 22 September 2025 / Published: 25 September 2025
(This article belongs to the Section Computer)

Abstract

In this paper, an efficient hybrid photovoltaic (PV) power forecasting model is proposed to enhance the stability and accuracy of PV power prediction under typical weather conditions. First, the Improved Complementary Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) is employed to decompose both meteorological features affecting PV power and the power output itself into intrinsic mode functions. This process enhances the stationarity and noise robustness of input data while reducing the computational complexity of subsequent model processing. To enhance the detail-capturing capability of the Bidirectional Long Short-Term Memory (BiLSTM) model and improve its dynamic response speed and prediction accuracy under abrupt irradiance fluctuations, we integrate a Temporal Convolutional Network (TCN) into the BiLSTM architecture. Finally, a Multi-head Self-Attention (MHA) mechanism is employed to dynamically weight multivariate meteorological features, enhancing the model’s adaptive focus on key meteorological factors while suppressing noise interference. The results show that the ICEEMDAN-TCN-BiLSTM-MHA combined model reduces the Mean Absolute Percentage Error (MAPE) by 78.46% and 78.59% compared to the BiLSTM model in sunny and cloudy scenarios, respectively, and by 58.44% in rainy scenarios. This validates the accuracy and stability of the ICEEMDAN-TCN-BiLSTM-MHA combined model, demonstrating its application potential and promotional value in the field of PV power forecasting.

1. Introduction

The world is gradually shifting toward clean, renewable energy sources for power generation and systematically restructuring its energy mix. Increasing the share of renewable energy in the energy mix is crucial to building an ecologically sustainable power supply system. Photovoltaic (PV) power generation has become an important technical approach for constructing low-carbon power systems and optimizing energy structures due to its advantages of low operating costs, renewability, and modular deployment [1,2,3]. However, PV power generation systems exhibit significant randomness, volatility, and intermittency due to the influence of various meteorological parameters such as ambient temperature, component temperature, solar irradiance, humidity, and wind speed, significantly increasing the complexity of power prediction. In the context of large-scale grid connection, power prediction errors can easily lead to imbalances between power supply and demand, threatening system safety and stability [4]. Therefore, obtaining reliable PV power generation information is of great significance for optimizing energy dispatch decisions and ensuring the reliable operation of the power grid.
Currently, the main technologies in the field of PV power prediction can be divided into three categories: physical methods, statistical methods, and artificial intelligence models [5,6,7]. Physical methods construct physical models and atmospheric radiation transfer equations, combining weather forecast data to simulate the PV conversion process and achieve accurate prediction of PV power. However, the prediction accuracy of this physical method exhibits significant dependency: its accuracy is strictly constrained by the precision of PV system design parameters and the reliability of weather forecast data [8,9,10]. Compared with the limitations of physical methods, statistical methods construct prediction models by exploring statistical correlations between historical power data and meteorological parameters, thereby avoiding the need to construct complex physical equations [11]. While statistical methods have made significant progress in PV power forecasting, their performance remains limited under irradiance fluctuation scenarios due to the inherent intermittency and instability of PV generation, particularly when relying on simple irradiance-power mapping relationships. In recent years, artificial intelligence models have significantly improved the unstable prediction performance of traditional statistical methods caused by intense fluctuations in irradiance due to their excellent nonlinear modeling capabilities and multiscale feature extraction mechanisms. They have been widely applied in PV power prediction tasks [12,13,14]. Artificial intelligence models, particularly Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and their variants, have demonstrated superior capability in capturing complex spatiotemporal dependencies between meteorological features and power generation, significantly improving the reliability and stability of multi-timescale forecasting [15,16,17].
While numerous PV power forecasting methods and strategies have been explored in existing studies [18,19,20], AI-driven models have emerged as a research focus owing to their superior prediction accuracy and robust generalization capabilities across diverse operational conditions. For example, Min Kyeong Park et al. [21] used an LSTM model to predict PV power, avoiding the problems of gradient vanishing or gradient explosion caused by excessive sequence length in traditional Recurrent Neural Network (RNN) models. Ze Wu et al. [22] compared the performance of the Informer model and the LSTM model in short-term PV power forecasting tasks. The results showed that the Informer model effectively addressed the limitation of the LSTM model in focusing on key information when processing long sequence data by introducing multi-head self-attention (MHA). Jiahui Wang et al. [23] introduced a CNN model to predict PV power under three typical weather conditions, and validated the generalization and robustness of the CNN model through comparative analysis with other models. Although artificial intelligence models have made breakthrough progress in the field of PV power prediction, single-model architectures still face core challenges such as insufficient generalization capability and unstable performance under extreme weather conditions.
With the development of computer technology, combination models constructed using computer technology have been widely applied in PV power prediction research, especially combination models of machine learning and deep learning [24]. For example, Keyong Hu et al. [25] proposed a CNN–Transformer hybrid model that leverages CNN’s local feature extraction, parameter sharing, and hierarchical compression capabilities to compensate for Transformer’s limitations in capturing fine-grained details and computational efficiency, demonstrating superior PV power forecasting performance compared to standalone LSTM or Transformer models. Jibo Wang et al. [26] developed a dedicated short-term forecasting method for PV clusters, where the Adaptive Multi-Objective Differential Evolution (AMODE) algorithm optimizes Bidirectional Long Short-Term Memory (BiLSTM) parameters, significantly enhancing its capability to handle long-range dependencies and complex multi-feature time series tasks. Jie Meng et al. [27] proposed a hybrid BiLSTM-Attention-ISOA forecasting model, where the attention mechanism addresses BiLSTM’s limitation in capturing long-sequence dependencies, while the Improved Sea-horse Optimization Algorithm (ISOA) enhances model performance through hyperparameter optimization. Following a comparative analysis between Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Variational Mode Decomposition (VMD), Wenbo Zhao et al. [28] developed a hybrid VMD-IDBO-KELM forecasting model, where the VMD reduces noise interference in historical data while the Improved Dung Beetle Optimizer (IDBO) optimizes kernel and regularization parameters in the Kernel Extreme Learning Machine (KELM), collectively enhancing prediction accuracy. However, although CEEMDAN and VMD effectively reduce the interference of noise in the data on optimal prediction results, certain issues remain in their practical applications [29,30,31]. CEEMDAN may produce some spurious modes in the early stages of decomposition, which can adversely affect the predictive performance of the model. In contrast, although VMD does not suffer from the aforementioned issue, the quality of its decomposition is susceptible to noise—strong noise can interfere with the convergence of the algorithm, potentially exacerbating mode mixing. In comparison, Improved Complementary Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) effectively avoids the problem of generating spurious modes in the early stages by improving the noise introduction strategy; meanwhile, its strong adaptability enhances its robustness to noise compared to VMD, enabling cleaner modal separation in high-noise environments and thereby providing higher-fidelity components for predictive models. Existing research primarily adopts hybrid approaches that combine signal decomposition with deep learning models, or integrate deep learning architectures with attention mechanisms as shown in Table 1. However, these methods often focus solely on mitigating the impact of data noise on prediction accuracy, or fail to adequately capture fine-grained features, thereby limiting model performance under extreme weather conditions [32,33,34]. To address these limitations, the proposed ICEEMDAN-TCN-BiLSTM-MHA hybrid model employs ICEEMDAN to effectively suppress non-stationarity in input data, thereby supplying higher-quality components for subsequent forecasting. 
Furthermore, the integrated TCN-BiLSTM architecture enhanced with multi-head attention mechanism simultaneously improves local feature extraction, long-range dependency modeling, and key information highlighting, leading to enhanced prediction accuracy and stability in challenging scenarios such as extreme weather.
To address the aforementioned challenges in PV power forecasting, this study proposes a novel ICEEMDAN-TCN-BiLSTM-MHA hybrid model that systematically integrates signal decomposition, temporal feature extraction, and adaptive weighting mechanisms. The framework first employs ICEEMDAN to decompose historical PV power and meteorological data into intrinsic mode functions, enhancing data stationarity and noise robustness. Building upon this preprocessing, the model architecture combines three key components: (1) a Temporal Convolutional Network (TCN) with dilated causal convolutions to capture long-range temporal patterns while preserving local details, (2) a Bidirectional LSTM (BiLSTM) layer that processes both forward and backward temporal dependencies through its gating mechanisms for comprehensive feature learning, and (3) a Multi-head Attention (MHA) module that dynamically weights meteorological features to suppress noise interference and focus on critical weather determinants. This integrated design not only addresses the limitations of individual components (e.g., BiLSTM’s difficulty in processing long sequences and the TCN’s potential information loss) but also creates synergistic effects that significantly improve prediction accuracy, particularly under challenging weather transitions. Finally, to validate the accuracy and stability of the proposed model, a dataset from a PV power plant in Qinghai Province, China, was selected for comparative experiments using different algorithms to evaluate the predictive performance of the ICEEMDAN-TCN-BiLSTM-MHA combined model.
The structure of this paper is organized as follows: The theoretical methods employed in this study are described in Section 2. The proposed PV power forecasting methodology is presented in Section 3. The experimental results and corresponding analysis are discussed in Section 4. Finally, the main conclusions are summarized in Section 5.

2. Theoretical Method

2.1. ICEEMDAN

ICEEMDAN is a signal processing technique widely used in signal decomposition and data processing [35,36]. As an improved version of Empirical Mode Decomposition (EMD), ICEEMDAN’s unique noise amplitude adaptive attenuation and residual-guided fully integrated architecture effectively avoid the problem of inaccurate decomposition results caused by traditional EMD mode aliasing, thereby significantly improving the reliability and accuracy of signal decomposition [37]. The main operations for implementing this technology are as follows:
(1)
Add Gaussian white noise to the original input time series to broaden the signal bandwidth, and construct the $i$-th signal $x_i(t)$ to be decomposed:
$x_i(t) = x(t) + \varepsilon_0 E_1\left(w_i(t)\right)$ (1)
where $x_i(t)$ and $x(t)$ represent the $i$-th noise-added signal and the original signal, respectively; $\varepsilon_0$ represents the noise standard deviation used in the first decomposition; $w_i(t)$ represents the $i$-th realization of Gaussian white noise with zero mean and unit variance; and $E_1(\cdot)$ denotes the operator that extracts the first IMF of a signal by EMD.
(2)
Calculate the mean value of the first round of empirical mode decomposition (EMD) results to obtain the first residual component $r_1(t)$ and the first IMF, as shown in the following formulas:
$r_1(t) = \frac{1}{n}\sum_{i=1}^{n} M\left(x_i(t)\right)$ (2)
$IMF_1 = x(t) - r_1(t)$ (3)
where $M(\cdot)$ denotes the local mean operator and $r_1(t)$ denotes the first residual.
(3)
Continue to add Gaussian white noise to the first residual component $r_1(t)$, and calculate the mean value of the second round of EMD results to obtain the second modal component $IMF_2$. Repeat this process to obtain the $k$-th modal component:
$IMF_2 = r_1(t) - r_2(t) = r_1(t) - \frac{1}{n}\sum_{i=1}^{n} M\left(r_1(t) + \varepsilon_1 E_2\left(w_i(t)\right)\right)$ (4)
$IMF_k = r_{k-1}(t) - r_k(t)$ (5)
$r_k(t) = \frac{1}{n}\sum_{i=1}^{n} M\left(r_{k-1}(t) + \varepsilon_{k-1} E_k\left(w_i(t)\right)\right)$ (6)
(4)
Repeat the above steps until the residual component can no longer be decomposed. The original signal is then expressed as the sum of the extracted modal components and the final non-decomposable residual:
$x(t) = r(t) + \sum_{i=1}^{n} IMF_i$ (7)
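For illustration, the following is a minimal Python sketch of the decomposition loop above, assuming the PyEMD package (EMD-signal) for the EMD sifting step. The helper names (first_imf, local_mean, iceemdan), the parameter defaults, and the stopping rule are our own, and the $E_k(w_i(t))$ term of Equations (4)-(6) is simplified to $E_1(w_i(t))$, so this is a sketch of the idea rather than a reference implementation.

```python
# Simplified ICEEMDAN sketch following Eqs. (1)-(7); assumes the PyEMD package.
import numpy as np
from PyEMD import EMD

def first_imf(signal):
    """E_1(.): the first IMF of a signal obtained by standard EMD."""
    return EMD().emd(signal)[0]

def local_mean(signal):
    """M(.): the local mean of a signal, i.e. the signal minus its first IMF."""
    return signal - first_imf(signal)

def iceemdan(x, n_realizations=50, eps0=0.2, max_imfs=8, seed=0):
    rng = np.random.default_rng(seed)
    noises = [rng.standard_normal(len(x)) for _ in range(n_realizations)]

    # Eqs. (1)-(3): noise-added copies of x give the first residual and IMF_1
    r = np.mean([local_mean(x + eps0 * np.std(x) * first_imf(w)) for w in noises], axis=0)
    imfs = [x - r]

    # Eqs. (4)-(6): repeat on successive residuals with adaptively scaled noise
    for _ in range(1, max_imfs):
        eps_k = eps0 * np.std(r)
        r_next = np.mean([local_mean(r + eps_k * first_imf(w)) for w in noises], axis=0)
        imfs.append(r - r_next)
        r = r_next
        if len(EMD().emd(r)) < 2:  # residual can no longer be decomposed
            break

    return np.array(imfs), r  # Eq. (7): x(t) = sum of IMFs + final residual
```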

2.2. The TCN

The Temporal Convolutional Network (TCN), as an enhanced feedforward neural network derived from CNN architecture, is specifically designed for sequential prediction tasks through its unique integration of causal convolution, dilated convolution, and residual modules (as shown in Figure 1). The causal convolution mechanism establishes temporal constraints via forward zero-padding, ensuring that the output at time step t depends exclusively on current and historical inputs without future information leakage. Simultaneously, dilated convolution expands the receptive field by inserting (d-1) zeros between kernel elements through the dilation rate d, effectively capturing long-range temporal dependencies. The extracted features then undergo nonlinear transformation through ReLU activation, followed by dimensionality reduction in pooling layers to maintain computational efficiency while preserving critical feature components. Crucially, the residual module implements skip connections that fuse original inputs with processed features, creating gradient propagation shortcuts that effectively mitigate gradient anomalies during backpropagation in deep network training. This comprehensive architecture collectively addresses key challenges in temporal modeling, including temporal dependency preservation, long-range pattern capture, nonlinear representation capacity, and training stability maintenance.
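As a concrete illustration of the causal dilated convolution and residual connection described above, a hedged Keras sketch is shown below (the paper does not state its deep learning framework; TensorFlow/Keras is assumed here). The kernel size, filter count, and dilation rates loosely follow Table 2, while the function name tcn_residual_block and the input shape are illustrative assumptions.

```python
# Minimal sketch of a TCN residual block with causal dilated convolutions.
import tensorflow as tf
from tensorflow.keras import layers

def tcn_residual_block(x, filters=30, kernel_size=6, dilation_rate=1):
    # Causal dilated convolution: the output at step t sees only steps <= t
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(x)
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(y)
    # 1x1 convolution so the skip connection matches the channel dimension
    if x.shape[-1] != filters:
        x = layers.Conv1D(filters, 1, padding="same")(x)
    return layers.Add()([x, y])  # residual shortcut

inputs = tf.keras.Input(shape=(96, 4))  # e.g. one day of 15-min steps, 4 features
h = inputs
for d in (1, 2, 4, 8, 16):              # dilation rates as listed in Table 2
    h = tcn_residual_block(h, dilation_rate=d)
tcn_feature_extractor = tf.keras.Model(inputs, h)
```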

2.3. BiLSTM Network

The Long Short-Term Memory (LSTM) network, as a specialized recurrent neural network (RNN) variant, has been widely adopted for sequence prediction and temporal data analysis due to its superior efficiency and accuracy. Distinguished by its gated architecture (comprising forget, input, and output gates), LSTM precisely regulates information flow through synergistic gate operations, effectively addressing the gradient vanishing/explosion problems inherent in conventional RNNs. As illustrated in Figure 2, the network’s unique cell state mechanism maintains a constant gradient propagation path while gate functions dynamically modulate information retention and update, collectively enhancing long-term temporal dependency learning. This sophisticated design enables simultaneous mitigation of gradient anomalies and improved capture of dynamic temporal patterns, establishing LSTM as a robust solution for complex sequential modeling tasks.
The key variables presented in Figure 2 can be mathematically expressed by Equations (8) to (13).
$I_t = \mathrm{sigmoid}\left(W_I \cdot \left[h_{t-1}, x_t\right] + b_I\right)$ (8)
$F_t = \mathrm{sigmoid}\left(W_F \cdot \left[h_{t-1}, x_t\right] + b_F\right)$ (9)
$O_t = \mathrm{sigmoid}\left(W_O \cdot \left[h_{t-1}, x_t\right] + b_O\right)$ (10)
$E_t = \tanh\left(W_E \cdot \left[h_{t-1}, x_t\right] + b_E\right)$ (11)
$C_t = F_t * C_{t-1} + I_t * E_t$ (12)
$h_t = O_t * \tanh\left(C_t\right)$ (13)
where $I_t$, $F_t$, and $O_t$ represent the input gate, forget gate, and output gate, respectively; $E_t$ denotes the candidate memory unit; $h_t$ and $h_{t-1}$ denote the hidden states at the current and previous time steps, respectively; $C_t$ and $C_{t-1}$ denote the cell states at the current and previous time steps, respectively; $x_t$ denotes the input at the current time step; sigmoid denotes the activation function; $*$ denotes element-wise multiplication; $W_I$, $W_F$, $W_E$, and $W_O$ denote the weight matrices of the corresponding gates; and $b_I$, $b_F$, $b_O$, and $b_E$ denote the biases of the corresponding gates.
The BiLSTM network is a deep neural network architecture that enhances sequence modeling capabilities by integrating forward and backward LSTM layers, as shown in Figure 3. The forward LSTM layer processes the input sequence in chronological order to capture historical dependency features, while the backward LSTM layer processes the sequence in reverse chronological order to extract future dependency features. This bidirectional collaborative mechanism enables the joint analysis of sequence dynamics from both historical and future dimensions, overcoming the modeling constraints of unidirectional LSTM, which can only utilize past sequence information. The formula for the BiLSTM model is shown below:
$h_{a,t} = \mathrm{LSTM}\left(x_t, h_{a,t-1}\right)$ (14)
$h_{b,t} = \mathrm{LSTM}\left(x_t, h_{b,t+1}\right)$ (15)
$y_t = \sigma\left(W_{h_a} \cdot h_{a,t} + W_{h_b} \cdot h_{b,t} + b\right)$ (16)
where $h_{a,t}$ and $h_{b,t}$ represent the forward and backward hidden states, respectively; LSTM represents the LSTM function; $W_{h_a}$ and $W_{h_b}$ represent the forward and backward weight matrices, respectively; $\sigma$ represents the activation function; and $b$ represents the bias.
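The bidirectional structure of Equations (14)-(16) can be sketched in Keras with the Bidirectional wrapper; the layer widths and dropout rate below mirror Table 2, whereas the input shape and output head are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal Keras sketch of a stacked BiLSTM following the sizes in Table 2.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(96, 4))   # 96 time steps of 15-min data, 4 features
x = inputs
for units in (8, 32, 64, 128):           # hidden-layer sizes from Table 2
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
x = layers.Dropout(0.174)(x)             # dropout rate from Table 2
outputs = layers.Dense(1)(x[:, -1, :])   # forecast produced from the last step
bilstm_model = tf.keras.Model(inputs, outputs)
```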

2.4. Multi-Head Self-Attention Mechanism

MHA is an extended algorithm based on Scaled Dot-Product Attention (SDPA), which calculates the reconstructed attention matrix by generating Query, Key, and Value vectors, and introduces a softmax function to normalize the attention matrix, thereby significantly enhancing the model’s ability to understand input data. Additionally, MHA can simultaneously model dependencies between any positions in a sequence, focusing on key information through a differentiated weight allocation mechanism. This effectively overcomes the limitations of traditional deep learning methods, such as the BiLSTM model, which struggle to establish global temporal feature associations [38]. In this study, MHA is used to construct deep dynamic interactions between input features, further enhancing the BiLSTM model’s ability to process long sequences and simulate complex dependencies. The main formula for this algorithm is as follows:
$Q_i = X W_Q$ (17)
$K_i = X W_K$ (18)
$V_i = X W_V$ (19)
$A_i = \mathrm{Softmax}\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right)$ (20)
$H_i = A_i \cdot V_i$ (21)
$H_{concat} = \left[H_1, H_2, H_3, \cdots, H_h\right]$ (22)
$H = H_{concat} W_O$ (23)
where $X$ represents the input features; $W_Q$, $W_K$, $W_V$, and $W_O$ represent the learnable weight matrices; $Q$, $K$, and $V$ represent the Query, Key, and Value; $A_i$ represents the attention weights calculated by the $i$-th attention head; Softmax represents the activation function; $H_i$ represents the weighted sum of $V_i$ produced by the $i$-th attention head; $d_k$ represents the scaling factor (the Key dimension); $H_{concat}$ represents the feature matrix formed by concatenating all $H_i$; and $H$ represents the final output.
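A compact NumPy sketch of Equations (17)-(23) is given below; the head count, key dimension, and random weight initialization are illustrative assumptions used only to make the computation explicit.

```python
# Scaled dot-product attention with several heads, following Eqs. (17)-(23).
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, n_heads=4, d_k=16, seed=0):
    rng = np.random.default_rng(seed)
    d_model = X.shape[-1]
    heads = []
    for _ in range(n_heads):
        W_Q, W_K, W_V = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V          # Eqs. (17)-(19)
        A = softmax(Q @ K.T / np.sqrt(d_k))          # Eq. (20)
        heads.append(A @ V)                          # Eq. (21)
    H_concat = np.concatenate(heads, axis=-1)        # Eq. (22)
    W_O = rng.standard_normal((n_heads * d_k, d_model))
    return H_concat @ W_O                            # Eq. (23)

# Example: 96 time steps with 4 meteorological features
H = multi_head_attention(np.random.rand(96, 4))
```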

2.5. Assessment Metrics

To evaluate the predictive performance of the proposed ICEEMDAN-TCN-BiLSTM-MHA combination model, the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), Theil's inequality coefficient (U1), and R-Squared (R2) were used to measure the model's error [39]. The calculation formula for each evaluation indicator is as follows:
$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|G_i - G_{Pi}\right|$ (24)
$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{G_i - G_{Pi}}{G_i}\right| \times 100\%$ (25)
$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(G_i - G_{Pi}\right)^2}$ (26)
$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(G_i - G_{Pi}\right)^2}{\sum_{i=1}^{n}\left(G_i - \bar{G}\right)^2}$ (27)
$U_1 = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(G_i - G_{Pi}\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}G_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}G_{Pi}^2}}$ (28)
where $G_i$ represents the actual value of the sample, $G_{Pi}$ represents the predicted PV power, $\bar{G}$ represents the mean of the actual values, and $n$ represents the number of samples.
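The five metrics of Equations (24)-(28) can be computed with a few lines of NumPy, as in the following sketch; the function name evaluate_forecast is ours, and in practice time steps with zero actual power would need to be masked before the MAPE term.

```python
# Compact helper computing the error metrics of Eqs. (24)-(28).
import numpy as np

def evaluate_forecast(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    err = actual - predicted
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / actual)) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    u1 = rmse / (np.sqrt(np.mean(actual ** 2)) + np.sqrt(np.mean(predicted ** 2)))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2, "U1": u1}
```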

3. Method

3.1. The Short-Term Prediction Process of Photovoltaic Based on ICEEMDAN-TCN-BiLSTM-MHA

The combination of TCN and BiLSTM addresses the limitation of BiLSTM networks in capturing detailed features of input data while reducing the computational complexity of BiLSTM networks. The integration of BiLSTM with MHA enables the model to focus on key information in the data, enhancing its ability to establish long-term effective temporal dependencies. By using MAE as the objective function and introducing ICEEMDAN to smooth the original power data, the modal aliasing issues of traditional EMD are effectively suppressed, thereby improving the model's prediction accuracy and reducing asymmetric randomness in PV power forecasting. The TCN model comprises one TCN layer and two fully connected layers. The TCN layer contains 30 filters and uses the ReLU activation function. The first fully connected layer also consists of 30 filters with ReLU activation, while the second fully connected layer is configured with one filter and employs a linear activation function. The BiLSTM model is structured with four hidden layers and one dropout layer. The numbers of neurons in the hidden layers are 8, 32, 64, and 128, respectively, and the dropout rate is set to 0.174. The architectural layers and parameter configurations for both the TCN and BiLSTM models are implemented according to [40,41]. The parameters of the TCN-BiLSTM model component are summarized in Table 2. Furthermore, other key parameters in the model (including the number of heads in the MHA mechanism, the Gaussian noise level in ICEEMDAN, the learning rate, the batch size, the number of iterations, and the dropout rate in the BiLSTM layer) were optimized using the Particle Swarm Optimization (PSO) algorithm within predefined upper and lower bounds. The resulting optimal parameter values are presented in Table 3. In summary, this paper constructs a PV power prediction model based on ICEEMDAN-TCN-BiLSTM-MHA, with the prediction process shown in Figure 4.
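To make the architecture concrete, the following is a hedged Keras sketch assembling the TCN, BiLSTM, and MHA components with the layer sizes of Table 2 and the optimized head count, learning rate, and dropout of Table 3. The ICEEMDAN components are assumed to be computed beforehand and stacked as input channels, and the input shape, key dimension, pooling head, and optimizer choice are illustrative assumptions rather than the authors' exact implementation.

```python
# Illustrative assembly of the TCN-BiLSTM-MHA forecasting network.
import tensorflow as tf
from tensorflow.keras import layers

def build_tcn_bilstm_mha(n_steps=96, n_features=4):
    inputs = tf.keras.Input(shape=(n_steps, n_features))

    # TCN part: causal dilated convolutions (Table 2)
    x = inputs
    for d in (1, 2, 4, 8, 16):
        x = layers.Conv1D(30, 6, padding="causal", dilation_rate=d,
                          activation="relu")(x)

    # BiLSTM part: four bidirectional layers plus dropout (Table 2)
    for units in (8, 32, 64, 128):
        x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    x = layers.Dropout(0.174)(x)

    # Multi-head attention with 4 heads (Table 3); key_dim is an assumption
    x = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)

    outputs = layers.Dense(1)(layers.GlobalAveragePooling1D()(x))
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(0.008), loss="mae")
    return model

model = build_tcn_bilstm_mha()
```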
This prediction model consists of the following seven steps:
Step 1. Remove data with PV output power of 0 and detect outliers in the data. Use the mean interpolation method of adjacent data points to fill in the outliers.
Step 2. Based on the weather annotation information in the dataset, classify the historical PV output power into sunny, cloudy, and rainy days according to weather type characteristics.
Step 3. Perform Pearson correlation analysis on the processed data, and combine point-line diagrams of different features and PV output power to screen the primary meteorological features influencing PV output power as model inputs.
Step 4. Perform ICEEMDAN modal decomposition on the screened data, reducing noise and complexity in the input data by decomposing the original input data into multiple independent sub-sequences.
Step 5. Construct the ICEEMDAN-TCN-BiLSTM-MHA prediction model, using mean squared error (MSE) as the error loss function.
Step 6. The parameters of the hybrid ICEEMDAN-TCN-BiLSTM-MHA model are optimized using the PSO algorithm.
Step 7. Evaluate the prediction error of the model for PV power output using MAE, RMSE, MAPE, U1, and R2.

3.2. Dataset

The experimental data were obtained from a PV power station in Haidong City, Qinghai Province, China, with a longitude of 101°49′3.81″ and a latitude of 36°39′3.016″. The dataset covers PV power generation data for the entire year of 2024, comprising a total of 29,713 data samples, with a sampling frequency of 15 min for PV output power. The dataset includes six meteorological features (component temperature, ambient temperature, humidity, total solar radiation, direct radiation, and diffuse radiation) as well as historical actual values of PV output power, all of which were collected by on-site sensors. Furthermore, the dataset includes three weather type labels: sunny, cloudy, and rainy. The corresponding meteorological data were derived from a national reference climatological station operated by the China Meteorological Administration. The determination thresholds for each weather type are provided in Table 4.

3.3. Data Anomaly Handling

There is a strong linear correlation between PV output power and total solar radiation, direct radiation, and diffuse radiation [42]. However, during data collection at PV power plants, issues such as signal interruptions caused by extreme weather conditions affecting wireless communication and instrument errors in the data collection system may result in missing or anomalous data in the raw data. To enhance the validity of the data and improve the predictive accuracy of the model, it is necessary to eliminate the interference of unfavorable data on model training. First, 15,671 datasets with constant zero power at night were excluded; then, the remaining 14,042 datasets were subjected to Random Sample Consensus (RANSAC) outlier detection in batches of 5000. The RANSAC regression algorithm identifies outliers by calculating the regression residuals for each data point. If the absolute value of the residual exceeds three times the standard deviation (exceeding 3 σ ), the data point is deemed an outlier. As shown in Figure 5, the detection results indicate the presence of some outliers in the data. For the 97 missing values and 32 outliers in the dataset, the mean interpolation method using neighboring data points was employed for filling. Ultimately, 13,913 valid experimental datasets were obtained. The characteristics of the dataset are presented in Table 5.
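The 3σ residual screening described above can be sketched with scikit-learn's RANSACRegressor as follows; the single-regressor setup and variable names are illustrative assumptions, with only the three-standard-deviation threshold taken from the text.

```python
# Hedged sketch of RANSAC-based outlier flagging for irradiance-power pairs.
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

def flag_outliers(irradiance, power):
    """Return a boolean mask marking points whose residual exceeds 3 sigma."""
    X = np.asarray(irradiance, float).reshape(-1, 1)
    y = np.asarray(power, float)
    # 3-sigma threshold estimated from an ordinary least-squares fit
    threshold = 3.0 * np.std(y - LinearRegression().fit(X, y).predict(X))
    ransac = RANSACRegressor(residual_threshold=threshold).fit(X, y)
    return ~ransac.inlier_mask_  # True where the point is treated as an outlier
```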

3.4. Pearson Correlation Analysis

To reduce the complexity of model learning and enhance computational efficiency, while avoiding the negative impact of redundant and irrelevant features on model symmetry, Pearson correlation analysis was used to evaluate the correlation between the six meteorological features and PV output power in different seasons, namely spring (March to May), summer (June to August), autumn (September to November), and winter (December to February), as shown in Figure 6. Based on the correlation analysis results, the Pearson correlation coefficients for total solar radiation, direct radiation, and scattered radiation were significantly higher than those for the other features, indicating a strong linear correlation between these features and PV output power. In contrast, humidity and ambient temperature exhibited lower correlations with actual PV system output power, indicating their limited influence on output power. As shown in the short-term photovoltaic power forecasting curves in Figure 7 and the error evaluation metrics in Table 6, when all six features (component temperature, ambient temperature, humidity, total solar radiation, direct radiation, and scattered radiation) are used as inputs, compared to using only four features (module temperature, global irradiance, direct irradiance, and diffuse irradiance), the model's prediction performance does not improve significantly, whereas the computation time increases by 28.57%. Therefore, this study selects total solar radiation, direct radiation, scattered radiation, and PV module temperature, which significantly impact PV output power, as the model input features. The calculation process for the Pearson correlation coefficient is shown in Equation (29), with a value range of [−1, 1]. The closer the absolute value is to 1, the stronger the linear correlation between variables, while the closer the absolute value is to 0, the weaker the linear correlation [43].
$\rho = \frac{\mathrm{cov}\left(R_X, R_Y\right)}{\sigma_{R_X} \sigma_{R_Y}}$ (29)
where $R_X$ and $R_Y$ represent the rank data of feature $X$ and target variable $Y$, respectively; $\mathrm{cov}\left(R_X, R_Y\right)$ represents their covariance; and $\sigma_{R_X}$ and $\sigma_{R_Y}$ represent the standard deviations of $R_X$ and $R_Y$, respectively.
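As a brief illustration of the feature-screening step, the following pandas sketch ranks the candidate meteorological features by the absolute value of their correlation with PV power; the file and column names are placeholders, not the actual field names of the station dataset.

```python
# Hypothetical feature screening via correlation with PV output power.
import pandas as pd

df = pd.read_csv("pv_station_2024.csv")  # placeholder for the preprocessed dataset
features = ["component_temp", "ambient_temp", "humidity",
            "total_radiation", "direct_radiation", "diffuse_radiation"]
corr = df[features + ["pv_power"]].corr(method="pearson")["pv_power"]
selected = corr.drop("pv_power").abs().sort_values(ascending=False).head(4)
print(selected)  # the four strongest candidate input features
```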

3.5. ICEEMDAN Decomposition

In the study of PV power output prediction, historical data often exhibit multi-peak and complex fluctuation characteristics due to uncertainty and non-stationarity, making data preprocessing a key step in improving model performance. For non-stationary input data, ICEEMDAN is used as the data decomposition method. This method decomposes the original input data into multiple independent sub-sequences, significantly reducing data non-stationarity and enhancing the model’s ability to learn the correlation between meteorological characteristics and PV power output.
To further validate the effectiveness of the data preprocessing method, six different machine learning and deep learning prediction models (Back Propagation Neural Network (BPNN), Kernel-based Extreme Learning Machine (KELM), TCN, LSTM, eXtreme Gradient Boosting (XGBoost), and Transformer) were selected for a comparative experiment, as shown in Figure 8. The results indicate that models trained on data processed by the ICEEMDAN algorithm achieve higher accuracy in PV power output prediction.

4. Results and Discussion

To validate the accuracy and generalization capability of the proposed ICEEMDAN-TCN-BiLSTM-MHA hybrid model, a cross-regional comparative analysis was conducted using operational data from two photovoltaic power stations located in different climatic zones: Xining City, Qinghai Province and Haian City, Jiangsu Province.
The dataset from Xining, which features a plateau continental climate, was collected in 2024. It includes four types of feature variables—ambient temperature, total solar radiation, direct radiation, and scattered radiation—as well as the actual photovoltaic power output. All parameters were measured and collected via on-site sensors. After preprocessing, a total of 13,913 valid data samples were obtained. The dataset from Haian, characterized by a subtropical monsoon climate, was collected in 2022. It also contains the same four feature variables and the actual power output, similarly acquired through field sensors. This dataset yielded 13,716 valid samples after preprocessing. In both cases, 80% of each dataset was used for model training. Furthermore, based on weather records from the China Meteorological Administration, the data from both stations were categorized into three typical weather conditions for comparative analysis: sunny, cloudy, and rainy.
The experimental configuration utilized a Windows 64-bit operating system, an Intel Core i9-13900HX processor, an NVIDIA GeForce RTX 4060 graphics card, and Python version 3.11 as the programming language, with GPU acceleration utilized to expedite the computational process of the model. To mitigate the impact of random factors on experimental results, each model was run 10 times, and the average of these results was used to evaluate the model’s predictive performance.

4.1. A Comparative Analysis of Model Prediction Results on the Qinghai Dataset

Based on meteorological scenario data from PV power plants in Qinghai Province, China, an ICEEMDAN-TCN-BiLSTM-MHA combined model framework was constructed to achieve intraday prediction of PV output power. Four comparison models (BiLSTM, TCN-BiLSTM, BiLSTM-MHA, and TCN-BiLSTM-MHA) were compared with the proposed ICEEMDAN-TCN-BiLSTM-MHA combined model through experimental comparisons, as shown in Figure 9 and Table 7, Table 8 and Table 9. In sunny weather scenarios, all models demonstrated high prediction accuracy (R2 ≥ 0.98). Among them, the ICEEMDAN-TCN-BiLSTM-MHA model showed a clear advantage with a MAPE of 2.891%, significantly outperforming the other combined and single models (BiLSTM model MAPE: 13.429%; TCN-BiLSTM model MAPE: 8.850%; BiLSTM-MHA model MAPE: 7.757%; TCN-BiLSTM-MHA model MAPE: 8.324%). In cloudy scenarios, the predictive performance of the different models showed significant differentiation. The combined models generally performed better, with the BiLSTM-MHA and TCN-BiLSTM-MHA models achieving MAPE values of 13.49% and 7.00%, respectively, and RMSE values of 2741.33 and 1367.42, respectively. In contrast, the ICEEMDAN-TCN-BiLSTM-MHA model achieved more accurate PV power output prediction results. In rainy weather scenarios, model performance showed significant differences due to fluctuations in irradiance. The ICEEMDAN-TCN-BiLSTM-MHA combined model outperformed the other comparison models in terms of prediction accuracy, with MAPE reduced by 36.27% and 26.34% compared to the BiLSTM-MHA and TCN-BiLSTM-MHA models, respectively. The BiLSTM and TCN-BiLSTM models exhibited more pronounced errors in this scenario, with MAPE values reaching 2.41 times and 2.11 times that of the ICEEMDAN-TCN-BiLSTM-MHA model, respectively.

4.2. A Comparative Analysis of Model Generalization Performance on the Jiangsu Dataset

The experimental results from the photovoltaic power station in Jiangsu Province, China, are presented in Figure 10 and Table 10, Table 11 and Table 12. Based on the error evaluation metrics listed in Table 10, Table 11 and Table 12, the proposed ICEEMDAN-TCN-BiLSTM-MHA hybrid model demonstrates accurate predictive performance under three typical weather conditions (sunny, cloudy, and rainy), with all models achieving high prediction accuracy (R2 ≥ 0.97). A comprehensive comparison of the error metrics between Qinghai Province and Jiangsu Province, as summarized in Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12, shows that the differences in the relative error metrics MAPE and U1 between the Jiangsu and Qinghai datasets are relatively small across all three weather scenarios, ranging from 4.65% to 11.28% and from 7.05% to 10.91%, respectively. These results indicate that the proposed model maintains strong stability and generalization capability under different regional conditions.
The experimental results demonstrate that the ICEEMDAN-TCN-BiLSTM-MHA hybrid model consistently exhibits high predictive accuracy and stability across datasets from diverse climatic regions. The cross-regional comparative experiments further validate its strong generalization capability.

5. Discussion

In practical applications, the proposed ICEEMDAN-TCN-BiLSTM-MHA hybrid model can provide high-precision short-term forecasting for PV power plant operations, supporting power generation planning, energy storage regulation, and market trading strategy optimization. Particularly under abrupt weather changes, it helps mitigate operational risks caused by prediction deviations and can be further integrated into smart grid monitoring systems to facilitate the efficient grid integration of renewable energy. Owing to the intermittent, fluctuating, and non-stationary nature of PV power generation, the ICEEMDAN algorithm effectively reduces noise interference in PV power prediction through modal decomposition of the data. Meanwhile, the multi-scale temporal feature extraction capability of the TCN-BiLSTM model, combined with the dynamic weighting mechanism of MHA for key meteorological features, further enhances the prediction performance of the model. Therefore, the proposed hybrid model demonstrates stronger specialized applicability in PV power forecasting. However, the model currently faces challenges related to high computational complexity, with major bottlenecks including the iterative computational overhead of ICEEMDAN decomposition, the parameter growth caused by deep BiLSTM structures, and the additional computational burden introduced by the global dynamic weighting calculations in MHA. Therefore, future work will focus on further optimizing model parameters and comparing various algorithmic models, such as Kolmogorov–Arnold Networks (KANs), XGBoost, and KELM, to reduce the computational burden.

6. Conclusions

PV power generation is characterized by intermittency, fluctuation, and a symmetrical distribution of daily power output. Therefore, high-accuracy forecasting of its output power is of great significance for ensuring the security and stability of large-scale grid integration, as well as improving energy utilization efficiency. This study constructs a hybrid prediction model named ICEEMDAN-TCN-BiLSTM-MHA based on a BiLSTM backbone. By incorporating a TCN module to enhance local feature extraction and a multi-head attention mechanism to explore feature correlations across multiple subspaces, the model significantly improves its capability for feature representation and dependency capture. Meanwhile, the ICEEMDAN algorithm is employed to decompose meteorological inputs, constructing high signal-to-noise ratio input data, thereby comprehensively improving the prediction accuracy and generalization performance of the model. Experiments under three typical weather conditions in the Qinghai dataset show that the proposed hybrid model significantly outperforms single benchmark models. Particularly under rainy conditions, compared to the single BiLSTM model, MAE, MAPE, RMSE, and U1 decrease by 65.75%, 58.44%, 68.96%, and 67.82%, respectively, while R2 increases by 13.95%. Furthermore, ICEEMDAN decomposition effectively suppresses data non-stationarity and noise interference, accelerating convergence and improving prediction performance across six different machine learning and deep learning models. Finally, cross-regional tests based on data from the Qinghai and Jiangsu provinces demonstrate that the model maintains stable and reliable performance under different climatic and regional conditions, verifying its superior accuracy and generalization capability. Future research will explore further improvements, such as optimizing the model’s internal hyperparameters through a comparative analysis of various deep learning methods, to enhance prediction accuracy.

Author Contributions

Methodology, Y.L. and S.Z.; validation, Y.L., S.Z. and G.Y.; investigation, Y.L., G.Y., S.P. and X.L.; resources, Y.L.; data curation, Y.L. and S.Z.; writing—original draft preparation, S.Z.; writing—review and editing, Y.L.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development and Transformation Program of Qinghai Province (2024QY205), the National Natural Science Foundation of China (No. 5236060138), and the Graduate Innovation Program of Qinghai Minzu University (66M2024013).

Data Availability Statement

Due to the data privacy concerns of the regional power plants involved, the data used in this study is restricted from being provided.

Acknowledgments

We thank the School of Hydraulic Engineering at Dalian University of Technology for providing computer data processing technology.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BiLSTM: Bidirectional Long Short-Term Memory
ICEEMDAN: Improved Complementary Ensemble Empirical Mode Decomposition with Adaptive Noise
MHA: Multi-Head Self-Attention
TCN: Temporal Convolutional Network
MAPE: Mean Absolute Percentage Error
LSTM: Long Short-Term Memory
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
AMODE: Adaptive Multi-Objective Differential Evolution
ISOA: Improved Sea-horse Optimization Algorithm
VMD: Variational Mode Decomposition
IDBO: Improved Dung Beetle Optimizer
KELM: Kernel Extreme Learning Machine
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
U1: Theil's inequality coefficient
R2: R-Squared
EMD: Empirical Mode Decomposition
RANSAC: Random Sample Consensus
PV: Photovoltaic
PSO: Particle Swarm Optimization
BPNN: Back Propagation Neural Network
XGBoost: eXtreme Gradient Boosting
KAN: Kolmogorov–Arnold Network
CEEMDAN: Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

References

  1. Garniwa, P.M.P.; Rajagukguk, R.A.; Kamil, R.; Lee, H. Intraday forecast of global horizontal irradiance using optical flow method and long short-term memory model. Sol. Energy 2023, 252, 234–251. [Google Scholar] [CrossRef]
  2. Han, Y.; Sun, Y.; Wu, J. An efficient and low-cost solar-aided lignite drying power generation system based on cascade utilisation of concentrating and non-concentrating solar energy. Energy 2024, 289, 129932. [Google Scholar] [CrossRef]
  3. Wu, Y.; Liu, J.; Li, S.; Jin, M. Physical model and long short-term memory-based combined prediction of photovoltaic power generation. J. Power Electron. 2024, 24, 1118–1128. [Google Scholar] [CrossRef]
  4. Mellit, A.; Massi Pavan, A.; Ogliari, E.; Leva, S.; Lughi, V. Advanced methods for photovoltaic output power forecasting: A review. Appl. Sci. 2020, 10, 487. [Google Scholar] [CrossRef]
  5. Beigi, M.; Beigi Harchegani, H.; Torki, M.; Kaveh, M.; Szymanek, M.; Khalife, E.; Dziwulski, J. Forecasting of power output of a PVPS based on meteorological data using RNN approaches. Sustainability 2022, 14, 3104. [Google Scholar] [CrossRef]
  6. Ganaie, M.A.; Hu, M.; Malik, A.K.; Tanveer, M.; Suganthan, P.N. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
  7. Vu, B.H.; Chung, I.-Y. Optimal generation scheduling and operating reserve management for PV generation using RNN-based forecasting models for stand-alone microgrids. Renew. Energy 2022, 195, 1137–1154. [Google Scholar] [CrossRef]
  8. Mayer, M.J. Benefits of physical and machine learning hybridization for photovoltaic power forecasting. Renew. Sustain. Energy Rev. 2022, 168, 112772. [Google Scholar] [CrossRef]
  9. Santos, L.d.O.; AlSkaif, T.; Barroso, G.C.; Carvalho, P.C.M.d. Photovoltaic power estimation and forecast models integrating physics and machine learning: A review on hybrid techniques. Sol. Energy 2024, 284, 113044. [Google Scholar] [CrossRef]
  10. Syauqi, A.; Pavian Eldi, G.; Andika, R.; Lim, H. Reducing data requirement for accurate photovoltaic power prediction using hybrid machine learning-physical model on diverse dataset. Sol. Energy 2024, 279, 112814. [Google Scholar] [CrossRef]
  11. Zhang, R.; Bu, S.; Zhou, M.; Li, G.; Zhan, B.; Zhang, Z. Deep reinforcement learning based interpretable photovoltaic power prediction framework. Sustain. Energy Technol. Assess. 2024, 67, 103830. [Google Scholar] [CrossRef]
  12. Tahir, M.F.; Yousaf, M.Z.; Tzes, A.; El Moursi, M.S.; El-Fouly, T.H.M. Enhanced solar photovoltaic power prediction using diverse machine learning algorithms with hyperparameter optimization. Renew. Sustain. Energy Rev. 2024, 200, 114851. [Google Scholar] [CrossRef]
  13. Wang, M.; Wang, P.; Zhang, T. Evidential extreme learning machine algorithm-based day-ahead photovoltaic power forecasting. Energies 2022, 15, 3382. [Google Scholar] [CrossRef]
  14. Wang, X.; Sun, Y.; Luo, D.; Peng, J. Comparative study of machine learning approaches for predicting short-term photovoltaic power output based on weather type classification. Energy 2022, 240, 122733. [Google Scholar] [CrossRef]
  15. Mellit, A.; Pavan, A.M.; Lughi, V. Deep learning neural networks for short-term photovoltaic power forecasting. Renew. Energy 2021, 172, 276–288. [Google Scholar] [CrossRef]
  16. Pombo, D.V.; Bindner, H.W.; Spataru, S.V.; Sørensen, P.E.; Bacher, P. Increasing the accuracy of hourly multi-output solar power forecast with physics-informed machine learning. Sensors 2022, 22, 749. [Google Scholar] [CrossRef]
  17. Zhang, M.; Han, Y.; Wang, C.; Yang, P.; Wang, C.; Zalhaf, A.S. Ultra-short-term photovoltaic power prediction based on similar day clustering and temporal convolutional network with bidirectional long short-term memory model: A case study using dkasc data. Appl. Energy 2024, 375, 124085. [Google Scholar] [CrossRef]
  18. Mayer, M.J. Influence of design data availability on the accuracy of physical photovoltaic power forecasts. Sol. Energy 2021, 227, 532–540. [Google Scholar] [CrossRef]
  19. Ren, X.; Zhang, F.; Zhu, H.; Liu, Y. Quad-kernel deep convolutional neural network for intra-hour photovoltaic power forecasting. Appl. Energy 2022, 323, 119682. [Google Scholar] [CrossRef]
  20. Shan, S.; Li, C.; Ding, Z.; Wang, Y.; Zhang, K.; Wei, H. Ensemble learning based multi-modal intra-hour irradiance forecasting. Energy Convers. Manag. 2022, 270, 116206. [Google Scholar] [CrossRef]
  21. Park, M.K.; Lee, J.M.; Kang, W.H.; Choi, J.M.; Lee, K.H. Predictive model for pv power generation using rnn (lstm). J. Mech. Sci. Technol. 2021, 35, 795–803. [Google Scholar] [CrossRef]
  22. Wu, Z.; Pan, F.; Li, D.; He, H.; Zhang, T.; Yang, S. Prediction of photovoltaic power by the informer model based on convolutional neural network. Sustainability 2022, 14, 13022. [Google Scholar] [CrossRef]
  23. Wang, J.; Jia, M.; Li, S.; Chen, K.; Zhang, C.; Song, X.; Zhang, Q. Short-term power-generation prediction of high humidity island photovoltaic power station based on a deep hybrid model. Sustainability 2024, 16, 2853. [Google Scholar] [CrossRef]
  24. Gong, J.; Qu, Z.; Zhu, Z.; Xu, H. Parallel timesnet-bilstm model for ultra-short-term photovoltaic power forecasting using stl decomposition and auto-tuning. Energy 2025, 320, 135286. [Google Scholar] [CrossRef]
  25. Hu, K.; Fu, Z.; Lang, C.; Li, W.; Tao, Q.; Wang, B. Short-term photovoltaic power generation prediction based on copula function and cnn-cosattention-transformer. Sustainability 2024, 16, 5940. [Google Scholar] [CrossRef]
  26. Wang, J.; Zhang, Z.; Xu, W.; Li, Y.; Niu, G. Short-term photovoltaic power forecasting using a bi-lstm neural network optimized by hybrid algorithms. Sustainability 2025, 17, 5277. [Google Scholar] [CrossRef]
  27. Meng, J.; Yuan, Q.; Zhang, W.; Yan, T.; Kong, F. Short-term prediction of rural photovoltaic power generation based on improved dung beetle optimization algorithm. Sustainability 2024, 16, 5467. [Google Scholar] [CrossRef]
  28. Zhao, W.; Fan, L. Short-term load forecasting method for industrial buildings based on signal decomposition and composite prediction model. Sustainability 2024, 16, 2522. [Google Scholar] [CrossRef]
  29. Cui, S.; Lyu, S.; Ma, Y.; Wang, K. Improved informer PV power short-term prediction model based on weather typing and AHA-VMD-MPE. Energy 2024, 307, 132766. [Google Scholar] [CrossRef]
  30. Tang, H.; Kang, F.; Li, X.; Sun, Y. Short-term photovoltaic power prediction model based on feature construction and improved transformer. Energy 2025, 320, 135213. [Google Scholar] [CrossRef]
  31. Zhang, J.; Hao, Y.; Fan, R.; Wang, Z. An ultra-short-term pv power forecasting method for changeable weather based on clustering and signal decomposition. Energies 2023, 16, 3092. [Google Scholar] [CrossRef]
  32. Chang, C.; Ma, G.; Zhang, J.; Tao, J. Investigation on the CNN-LSTM-MHA-based model for the heating energy consumption prediction of residential buildings considering active and passive factors. Energy 2025, 333, 137508. [Google Scholar] [CrossRef]
  33. Jerse, G.; Marcucci, A. Deep learning LSTM-based approaches for 10.7 cm solar radio flux forecasting up to 45-days. Astron. Comput. 2024, 46, 100786. [Google Scholar] [CrossRef]
  34. Peng, S.; Zhu, J.; Wu, T.; Yuan, C.; Cang, J.; Zhang, K.; Pecht, M. Prediction of wind and PV power by fusing the multi-stage feature extraction and a PSO-BiLSTM model. Energy 2024, 298, 131345. [Google Scholar] [CrossRef]
  35. Sun, X.; Liu, H. Multivariate short-term wind speed prediction based on PSO-VMD-SE-ICEEMDAN two-stage decomposition and Att-S2S. Energy 2024, 305, 132228. [Google Scholar] [CrossRef]
  36. Zhao, H.; Huang, X.; Xiao, Z.; Shi, H.; Li, C.; Tai, Y. Week-ahead hourly solar irradiation forecasting method based on ICEEMDAN and TimesNet networks. Renew. Energy 2024, 220, 119706. [Google Scholar] [CrossRef]
  37. Qiao, W.; Fu, Z.; Du, M.; Nan, W.; Liu, E. Seasonal peak load prediction of underground gas storage using a novel two-stage model combining improved complete ensemble empirical mode decomposition and long short-term memory with a sparrow search algorithm. Energy 2023, 274, 127376. [Google Scholar] [CrossRef]
  38. Ma, S.; Wang, H.; Yu, Z.; Du, L.; Zhang, M.; Fu, Q. AttenEpilepsy: A 2D convolutional network model based on multi-head self-attention. Eng. Anal. Bound. Elem. 2024, 169, 105989. [Google Scholar] [CrossRef]
  39. Cao, Z.; Liu, H. A novel carbon price forecasting method based on model matching, adaptive decomposition, and reinforcement learning ensemble strategy. Environ. Sci. Pollut. Res. 2022, 30, 36044–36067. [Google Scholar] [CrossRef]
  40. Guo, J.; Li, D.; Du, B. A stacked ensemble method based on TCN and convolutional bi-directional GRU with multiple time windows for remaining useful life estimation. Appl. Soft Comput. 2024, 150, 111071. [Google Scholar] [CrossRef]
  41. Sun, J.; Fan, C.; Yan, H. SOH estimation of lithium-ion batteries based on multi-feature deep fusion and XGBoost. Energy 2024, 306, 132429. [Google Scholar] [CrossRef]
  42. Jebli, I.; Belouadha, F.-Z.; Kabbaj, M.I.; Tilioua, A. Prediction of solar energy guided by pearson correlation using machine learning. Energy 2021, 224, 120109. [Google Scholar] [CrossRef]
  43. Gao, H.; Qiu, S.; Fang, J.; Ma, N.; Wang, J.; Cheng, K.; Wang, H.; Zhu, Y.; Hu, D.; Liu, H.; et al. Short-term prediction of PV power based on combined modal decomposition and NARX-LSTM-LightGBM. Sustainability 2023, 15, 8266. [Google Scholar] [CrossRef]
Figure 1. (a) The network structure of the TCN. (b) Residual connection structure.
Figure 2. The network structure of the LSTM Network.
Figure 3. The network structure of the BiLSTM Network.
Figure 4. (a) Flow chart of ICEEMDAN-TCN-BiLSTM-MHA model. (b) Flow chart.
Figure 5. (a) Component temperature outlier check. (b) Temperature outlier check. (c) Humidity outlier check. (d) Total radiation outlier check. (e) Direct radiation outlier check. (f) Scattered radiation outlier check.
Figure 6. (a) Pearson correlation analysis conducted in the period from March to May. (b) Pearson correlation analysis conducted from June to August. (c) Pearson correlation analysis conducted from September to November. (d) Pearson correlation analysis conducted from December to February.
Figure 7. Comparative prediction curves of short-term photovoltaic power output generated by the ICEEMDAN-TCN-BiLSTM-MHA model with varying numbers of input feature dimensions.
Figure 8. (a) BPNN PV Power Prediction. (b) ICEEMDAN-BPNN PV Power Prediction. (c) Convergence curves of BPNN and ICEEMDAN-BPNN. (d) KELM PV Power Prediction. (e) ICEEMDAN-KELM PV Power Prediction. (f) Convergence curves of KELM and ICEEMDAN-KELM. (g) TCN PV Power Prediction. (h) ICEEMDAN-TCN PV Power Prediction. (i) Convergence curves of TCN and ICEEMDAN-TCN. (j) LSTM PV Power Prediction. (k) ICEEMDAN-LSTM PV Power Prediction. (l) Convergence curves of LSTM and ICEEMDAN-LSTM. (m) XGBoost PV Power Prediction. (n) ICEEMDAN-XGBoost PV Power Prediction. (o) Convergence curves of XGBoost and ICEEMDAN-XGBoost. (p) Transformer PV Power Prediction. (q) ICEEMDAN-Transformer PV Power Prediction. (r) Convergence curves of Transformer and ICEEMDAN-Transformer.
Figure 9. (a) Sunny day PV power prediction. (b) Forecast error on sunny days. (c) Cloudy day PV power prediction. (d) Forecast error on Cloudy days. (e) Rainy day PV power prediction. (f) Forecast error on Rainy days.
Figure 10. (a) Sunny day PV power prediction. (b) Forecast error on sunny days. (c) Cloudy day PV power prediction. (d) Forecast error on Cloudy days. (e) Rainy day PV power prediction. (f) Forecast error on Rainy days.
Table 1. Performance differences among different models.
Model | Advantage | Disadvantage
CNN | High computational efficiency, a small parameter footprint, and strong local feature extraction capabilities. | A lack of sequential modeling capability and difficulties in capturing long-term temporal dependencies.
LSTM | Proficiency in temporal data processing and effective modeling of mid- to long-range dependencies. | High computational cost and difficulty in capturing critical information in long-sequence data.
Transformer | The self-attention mechanism enables parallel computation and captures global dependencies between any positions in a sequence. | High training data requirements, significant computational complexity, and a tendency to overfit on small datasets.
BiLSTM | Leveraging both past and future contextual information to capture dependencies with the current element for effective prediction. | Higher computational cost compared to LSTM, along with insufficient capability in capturing local features in the data.
CNN–Transformer | Well-suited for long-sequence data processing with the ability to capture local detailed features. | Its complex composite structure, coupled with a plethora of hyperparameters that require extensive tuning, poses significant challenges to training.
BiLSTM-MHA | The BiLSTM model provides the ability to capture the influence of past and future contexts on the current information, and the attention mechanism offers the capacity to focus on key temporal information. | High computational and training costs, along with an inability to capture detailed features, resulting from the fusion of BiLSTM and attention mechanisms.
ISOA-BiLSTM-MHA | The ISOA enables the automatic discovery of optimal hyperparameters for the BiLSTM and attention mechanism, leading to substantial improvements in model performance, stability, and alleviation of tuning difficulties. | Increased overall computational cost and undue model complexity due to the incorporation of an additional parameter optimization process.
VMD-IDBO-KELM | VMD effectively mitigates the impact of noise in the data on forecasting, thereby providing a cleaner signal for the IDBO-optimized KELM model, which is particularly suited for analyzing non-stationary and highly fluctuating data. | Process complexity and a critical dependence of the predictive performance on the quality of VMD data decomposition.
Table 2. The parameter values of the TCN-BiLSTM model.
| Layers | Parameters | Option |
|---|---|---|
| TCN layer | Kernel size | 6 |
| | Num of filters | 30 |
| | Activation | ReLU |
| | Padding | 5 |
| First fully connected layer | Dilations | [1, 2, 4, 8, 16] |
| | Num of filters | 30 |
| | Activation | ReLU |
| Second fully connected layer | Num of filters | 30 |
| | Activation | Linear |
| BiLSTM layer | First hidden layer | 8 |
| | Second hidden layer | 32 |
| | Third hidden layer | 64 |
| | Fourth hidden layer | 128 |
| Dropout layer | Dropout rate | 0.174 |
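To make the layer settings in Table 2 concrete, the sketch below assembles a dilated-convolution front end and a bidirectional LSTM in PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the four stacked hidden sizes listed in Table 2 are collapsed into a single two-layer BiLSTM with the largest width, and the class, variable names, and output head are ours.

```python
import torch
import torch.nn as nn

class TCNBiLSTM(nn.Module):
    """Illustrative TCN-BiLSTM backbone loosely following Table 2:
    kernel size 6, 30 filters, ReLU activations, dilations [1, 2, 4, 8, 16],
    dropout 0.174. Simplified relative to the paper's full architecture."""
    def __init__(self, n_features, dilations=(1, 2, 4, 8, 16),
                 n_filters=30, kernel_size=6, dropout=0.174):
        super().__init__()
        # Dilated causal 1-D convolutions acting as the TCN stage.
        convs, in_ch = [], n_features
        for d in dilations:
            convs += [nn.Conv1d(in_ch, n_filters, kernel_size,
                                padding=(kernel_size - 1) * d, dilation=d),
                      nn.ReLU()]
            in_ch = n_filters
        self.tcn = nn.Sequential(*convs)
        # Bidirectional LSTM stage (Table 2's four hidden sizes are
        # collapsed here to a two-layer BiLSTM of width 128 for brevity).
        self.bilstm = nn.LSTM(input_size=n_filters, hidden_size=128,
                              num_layers=2, batch_first=True,
                              bidirectional=True, dropout=dropout)
        self.head = nn.Linear(2 * 128, 1)  # one-step-ahead PV power

    def forward(self, x):                        # x: (batch, time, features)
        z = self.tcn(x.transpose(1, 2))          # convolve over the time axis
        z = z[..., :x.size(1)].transpose(1, 2)   # trim padding -> causal output
        out, _ = self.bilstm(z)
        return self.head(out[:, -1, :])          # predict from the last step

model = TCNBiLSTM(n_features=6)
print(model(torch.randn(8, 24, 6)).shape)        # torch.Size([8, 1])
```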
Table 3. The parameters of the ICEEMDAN-TCN-BiLSTM-MHA model.
| Parameters | Number of MHA Heads | Gaussian Noise | Learning Rate | Batch Size | Iterations | Dropout |
|---|---|---|---|---|---|---|
| Upper limit | 1 | 0.001 | 0.001 | 10 | 10 | 0 |
| Lower limit | 6 | 0.5 | 0.1 | 100 | 80 | 1 |
| Optimum value | 4 | 0.031 | 0.008 | 45 | 35 | 0.174 |
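The bounds in Table 3 define the search space from which the optimum row was obtained. The snippet below only illustrates how such bounds can be encoded and sampled; the paper's actual optimization procedure is not reproduced here, and the dictionary keys are illustrative names.

```python
import random

# Search ranges taken from Table 3 (min/max per hyperparameter); plain
# random sampling is used purely to show how the bounds constrain values.
SEARCH_SPACE = {
    "mha_heads":      (1, 6),        # integer
    "gaussian_noise": (0.001, 0.5),
    "learning_rate":  (0.001, 0.1),
    "batch_size":     (10, 100),     # integer
    "iterations":     (10, 80),      # integer
    "dropout":        (0.0, 1.0),
}

def sample_config(rng=random):
    """Draw one candidate configuration inside the Table 3 bounds."""
    cfg = {}
    for name, (lo, hi) in SEARCH_SPACE.items():
        if name in ("mha_heads", "batch_size", "iterations"):
            cfg[name] = rng.randint(int(lo), int(hi))
        else:
            cfg[name] = rng.uniform(lo, hi)
    return cfg

# Optimum reported in Table 3, shown for comparison against a random draw.
optimum = {"mha_heads": 4, "gaussian_noise": 0.031, "learning_rate": 0.008,
           "batch_size": 45, "iterations": 35, "dropout": 0.174}
print(sample_config())
print(optimum)
```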
Table 4. Weather classification thresholds.
| Weather Situation | Weather Classification Thresholds |
|---|---|
| Sunny day | No clouds in the sky, or mid-/low-level cloud cover less than 10%, with high-level cloud cover below 40%. |
| Cloudy sky | Mid-/low-level cloud cover equal to or greater than 10%, with high-level cloud cover reaching 40% or more. |
| Rainy day | Rainfall equal to or greater than 0.1 mm within 24 h. |
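A literal reading of the Table 4 thresholds can be expressed as a small rule-based classifier, sketched below. The ordering of the rainfall check ahead of the cloud-cover checks, and the handling of cloud-cover combinations not covered by the two stated rules, are our assumptions rather than statements from the paper.

```python
def classify_weather(mid_low_cloud_pct, high_cloud_pct, rainfall_24h_mm):
    """Label a day using the Table 4 thresholds (cloud cover in %, rainfall in mm)."""
    # Rainy: rainfall of at least 0.1 mm within 24 h.
    if rainfall_24h_mm >= 0.1:
        return "rainy"
    # Sunny: mid-/low-level cover below 10% and high-level cover below 40%.
    if mid_low_cloud_pct < 10 and high_cloud_pct < 40:
        return "sunny"
    # Cloudy: mid-/low-level cover >= 10% and high-level cover >= 40%.
    if mid_low_cloud_pct >= 10 and high_cloud_pct >= 40:
        return "cloudy"
    # Mixed cases fall between the two stated thresholds; Table 4 does not
    # assign them a label, so they are flagged here rather than guessed.
    return "unclassified"

print(classify_weather(5, 20, 0.0))   # sunny
print(classify_weather(30, 50, 0.0))  # cloudy
print(classify_weather(2, 10, 1.2))   # rainy
```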
Table 5. Dataset characteristics.
| Time | Sample Size | Number of Features | Zero-Power Samples Removed | RANSAC Outliers Removed | Missing Values Removed | Remaining Sample Size |
|---|---|---|---|---|---|---|
| 2024-01 | 2446 | 6 | 1335 | 3 | 3 | 1105 |
| 2024-02 | 2453 | 6 | 1347 | 1 | 9 | 1096 |
| 2024-03 | 2449 | 6 | 1321 | 0 | 11 | 1117 |
| 2024-04 | 2455 | 6 | 1329 | 2 | 7 | 1117 |
| 2024-05 | 2478 | 6 | 1304 | 5 | 13 | 1156 |
| 2024-06 | 2494 | 6 | 1277 | 3 | 6 | 1208 |
| 2024-07 | 2531 | 6 | 1245 | 1 | 2 | 1283 |
| 2024-08 | 2517 | 6 | 1259 | 4 | 14 | 1240 |
| 2024-09 | 2501 | 6 | 1284 | 1 | 6 | 1210 |
| 2024-10 | 2490 | 6 | 1303 | 3 | 8 | 1176 |
| 2024-11 | 2457 | 6 | 1325 | 5 | 10 | 1117 |
| 2024-12 | 2442 | 6 | 1342 | 4 | 8 | 1088 |
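The three cleaning steps counted in Table 5 (zero-power removal, RANSAC outlier detection, and missing-value detection) can be sketched as below. Column names, the regression target used for the robust fit, and the step ordering (missing values are dropped before the RANSAC fit so that scikit-learn can run) are illustrative assumptions rather than the authors' exact pipeline.

```python
import pandas as pd
from sklearn.linear_model import RANSACRegressor

def clean_month(df: pd.DataFrame, feature_cols, power_col="pv_power"):
    """Count and apply the cleaning steps tallied per month in Table 5."""
    n0 = len(df)
    # 1. Remove samples with zero PV power (e.g., night-time records).
    df = df[df[power_col] > 0]
    n_zero = n0 - len(df)
    # 2. Detect and drop rows with missing values (done before the robust
    #    fit, since RANSAC cannot handle NaNs).
    n1 = len(df)
    df = df.dropna(subset=list(feature_cols) + [power_col])
    n_missing = n1 - len(df)
    # 3. RANSAC outlier detection: fit power against the meteorological
    #    features and keep only the samples the robust fit marks as inliers.
    ransac = RANSACRegressor().fit(df[list(feature_cols)], df[power_col])
    n_outlier = int((~ransac.inlier_mask_).sum())
    df = df[ransac.inlier_mask_]
    print(f"zero-power: {n_zero}, missing: {n_missing}, "
          f"outliers: {n_outlier}, remaining: {len(df)}")
    return df
```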
Table 6. Evaluation metrics for the prediction error of the ICEEMDAN-TCN-BiLSTM-MHA model under different numbers of features.
| Model | Number of Features | MAPE (%) | R2 | MAE | RMSE | U1 | Time |
|---|---|---|---|---|---|---|---|
| ICEEMDAN-TCN-BiLSTM-MHA | 6 | 4.818 | 0.996 | 571.251 | 1138.158 | 0.023 | 42 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 4 | 5.256 | 0.991 | 867.035 | 1278.194 | 0.026 | 30 min |
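For reference, the error metrics reported in Tables 6–12 can be computed as below. The Theil U1 normalization shown is the standard definition of the inequality coefficient and is an assumption on our part, since the paper's exact formula is not restated here.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """MAPE, R2, MAE, RMSE, and Theil's U1 for a forecast series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y_true))   # assumes y_true has no zeros
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    u1 = rmse / (np.sqrt(np.mean(y_true ** 2)) + np.sqrt(np.mean(y_pred ** 2)))
    return {"MAPE(%)": mape, "R2": r2, "MAE": mae, "RMSE": rmse, "U1": u1}

print(evaluation_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```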
Table 7. Error evaluation indicators of different models in sunny weather scenarios (Qinghai Province Dataset).
| Model | R2 | MAE | MAPE (%) | RMSE | U1 | Iteration Time |
|---|---|---|---|---|---|---|
| BiLSTM | 0.98 | 1256.04 | 13.42 | 1510.35 | 2.23 | 13 min |
| TCN-BiLSTM | 0.99 | 691.32 | 8.85 | 898.83 | 1.37 | 16 min |
| BiLSTM-MHA | 0.99 | 516.01 | 7.75 | 744.59 | 1.31 | 18 min |
| TCN-BiLSTM-MHA | 0.99 | 261.58 | 5.27 | 482.49 | 0.75 | 25 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 0.99 | 195.45 | 2.89 | 334.60 | 0.55 | 27 min |
Table 8. Error evaluation indicators of different models in cloudy weather scenarios (Qinghai Province Dataset).
| Model | R2 | MAE | MAPE (%) | RMSE | U1 | Iteration Time |
|---|---|---|---|---|---|---|
| BiLSTM | 0.90 | 3627.31 | 24.99 | 4392.14 | 8.06 | 13 min |
| TCN-BiLSTM | 0.93 | 2702.24 | 17.06 | 3483.22 | 6.34 | 16 min |
| BiLSTM-MHA | 0.96 | 2241.29 | 13.49 | 2741.39 | 4.92 | 18 min |
| TCN-BiLSTM-MHA | 0.97 | 1270.47 | 7.00 | 1367.42 | 1.75 | 25 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 0.98 | 901.11 | 5.35 | 1089.71 | 1.45 | 31 min |
Table 9. Error evaluation indicators of different models in rainy weather scenarios (Qinghai Province Dataset).
| Model | R2 | MAE | MAPE (%) | RMSE | U1 | Iteration Time |
|---|---|---|---|---|---|---|
| BiLSTM | 0.86 | 2591.06 | 21.73 | 3443.10 | 8.11 | 13 min |
| TCN-BiLSTM | 0.89 | 2300.49 | 19.01 | 2881.53 | 6.91 | 16 min |
| BiLSTM-MHA | 0.93 | 1851.50 | 14.27 | 2319.63 | 5.58 | 18 min |
| TCN-BiLSTM-MHA | 0.95 | 1557.46 | 12.26 | 1889.18 | 4.62 | 25 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 0.98 | 887.89 | 9.03 | 1068.78 | 2.61 | 27 min |
Table 10. Error evaluation indicators of different models in sunny weather scenarios (Jiangsu Province Dataset).
| Model | R2 | MAE | MAPE (%) | RMSE | U1 | Iteration Time |
|---|---|---|---|---|---|---|
| BiLSTM | 0.98 | 1486.04 | 11.91 | 1753.91 | 2.19 | 12 min |
| TCN-BiLSTM | 0.98 | 791.32 | 8.13 | 928.83 | 1.77 | 14 min |
| BiLSTM-MHA | 0.99 | 602.04 | 6.88 | 884.26 | 1.43 | 18 min |
| TCN-BiLSTM-MHA | 0.99 | 350.77 | 5.33 | 553.79 | 0.82 | 24 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 0.99 | 295.45 | 2.68 | 474.47 | 0.49 | 28 min |
Table 11. Error evaluation indicators of different models in cloudy weather scenarios (Jiangsu Province Dataset).
| Model | R2 | MAE | MAPE (%) | RMSE | U1 | Iteration Time |
|---|---|---|---|---|---|---|
| BiLSTM | 0.97 | 3809.67 | 23.64 | 4675.58 | 8.13 | 12 min |
| TCN-BiLSTM | 0.98 | 2856.23 | 18.90 | 3712.50 | 6.93 | 14 min |
| BiLSTM-MHA | 0.98 | 2454.66 | 12.82 | 2944.93 | 4.54 | 18 min |
| TCN-BiLSTM-MHA | 0.99 | 1439.16 | 7.83 | 1577.08 | 1.83 | 24 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 0.99 | 1045.11 | 6.03 | 1379.78 | 1.56 | 29 min |
Table 12. Error evaluation indicators of different models in rainy weather scenarios (Jiangsu Province Dataset).
| Model | R2 | MAE | MAPE (%) | RMSE | U1 | Iteration Time |
|---|---|---|---|---|---|---|
| BiLSTM | 0.88 | 2882.94 | 20.96 | 3492.47 | 7.89 | 12 min |
| TCN-BiLSTM | 0.89 | 2592.28 | 18.62 | 3013.46 | 6.69 | 14 min |
| BiLSTM-MHA | 0.94 | 2130.04 | 15.37 | 2582.50 | 5.90 | 18 min |
| TCN-BiLSTM-MHA | 0.96 | 1683.44 | 11.46 | 2145.81 | 4.17 | 24 min |
| ICEEMDAN-TCN-BiLSTM-MHA | 0.97 | 1127.14 | 9.47 | 1531.30 | 2.42 | 26 min |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
