Article

Research on Photovoltaic Output Power Forecasting Based on an Attention-Enhanced BiGRU Optimized by an Improved Marine Predators Algorithm

1 Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao 125000, China
2 Institute of Intelligence Science and Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
3 Panjin Power Supply Company, State Grid Liaoning Electric Power Co., Ltd., Panjin 124010, China
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(2), 282; https://doi.org/10.3390/sym18020282
Submission received: 22 December 2025 / Revised: 27 January 2026 / Accepted: 31 January 2026 / Published: 3 February 2026
(This article belongs to the Section Computer)

Abstract

Accurate photovoltaic (PV) output power forecasting is essential for reliable power system operation, yet rapidly changing meteorological conditions often degrade forecasting accuracy. This study proposes an attention-enhanced bidirectional gated recurrent unit (BiGRU) optimized by an improved Marine Predators Algorithm (IMPA) for PV output power forecasting. Kernel Principal Component Analysis (KPCA) is first employed to extract compact nonlinear representations and suppress redundant features. Then, a dual multi-head self-attention mechanism is integrated before and after the BiGRU layer to strengthen temporal feature learning under fluctuating weather. Finally, the IMPA is designed to improve exploration–exploitation balance and automatically optimize key hyperparameters. Experiments under sunny, cloudy, and rainy conditions demonstrate that IMPA-Att-BiGRU reduces MAE and RMSE by 35.7–58.5% and 22.8–49.1% versus BiGRU, respectively, while increasing R2 by 2.2–4.1 percentage points. Against the best benchmark (LSTM), MAE and RMSE are further reduced by 38.1–49.5% and 33.8–52.4%. Moreover, in a cross-day rolling forecasting test with fivefold results, IMPA-Att-BiGRU achieves 62.4% MAE and 49.3% RMSE reductions over BiGRU, confirming robust performance under long-horizon error accumulation.

1. Introduction

With the continuous increase in installed capacity and application scope, photovoltaic (PV) power generation has become one of the most crucial components of renewable energy power systems [1,2,3]. Accurate PV output power forecasting is of great significance for generation scheduling, grid stability maintenance, and the efficient integration of renewable energy sources [4,5,6]. However, PV output power is strongly affected by complex meteorological factors such as solar irradiance and ambient temperature [7], especially under cloudy and rainy weather conditions, which introduces strong stochasticity into accurate PV output power forecasting [8].
The recent studies on photovoltaic power forecasting can generally be categorized into physics-based or statistical methods [9]. The physics-based methods predominantly rely on capturing the meteorological information around the PV installations. Many existing studies focus primarily on solar irradiance data, while the interactions between multiple meteorological factors and PV performance are often insufficiently explored [10]. One of the most popular statistical approaches, the autoregressive integrated moving average (ARIMA) models, over-relies on linear assumptions and exhibits limited capability in capturing the nonlinear and dynamic characteristics of PV output power [11].
Data-driven methods have been applied to PV output power forecasting with increasing frequency since the rapid development of machine learning, especially representative models such as support vector machines (SVMs) [12], K-nearest neighbors (KNN) [13], and extreme gradient boosting (XGBoost) [14]. More recently, deep learning methods have been designed to effectively learn sequential dependencies in time-series forecasting tasks, particularly recurrent neural networks (RNNs) [15] and their variants, including long short-term memory (LSTM) [16] and gated recurrent unit (GRU) [17] models. Despite these advances, some challenges remain unresolved. Although attention-based models [18,19] have recently been incorporated into PV output power forecasting, many approaches still rely on single-stage attention designs, empirically selected hyperparameters without a principled global optimization procedure, and limited preprocessing to alleviate redundancy and nonlinear coupling among meteorological variables, which may collectively hinder accuracy and robustness when weather conditions fluctuate sharply.
To address these challenges, this study proposes a novel PV output power forecasting model integrating KPCA, dual-stage multi-head self-attention, and an improved Marine Predators Algorithm. Specifically, KPCA is employed as a nonlinear feature-compression front-end to extract informative latent components and alleviate redundancy and nonlinear coupling in the original inputs, thereby reducing the learning burden. On this basis, a BiGRU forecaster enhanced by a dual-stage multi-head self-attention mechanism is constructed, where attention is applied to both the KPCA-compressed inputs and the BiGRU hidden representations to enable complementary saliency learning at the input and representation levels. Furthermore, an improved Marine Predators Algorithm (IMPA) is developed to provide a systematic and reproducible hyperparameter search by strengthening global exploration and local exploitation, leading to a more reliable model configuration under diverse meteorological conditions. It is noteworthy that several variants of the improved MPA have been reported for PV-related optimization, such as VMD parameter tuning and photovoltaic system parameter identification [20,21]. However, these variants employ different enhancement operators and are typically tailored to different objectives; therefore, they are not readily transferable to hyperparameter tuning of the attention-enhanced BiGRU (Att-BiGRU) PV output power forecasting model. From the perspective of symmetry, photovoltaic output power series exhibit inherent temporal structures, especially under stable weather conditions, where power evolution presents approximate rise–peak–decline patterns. Bidirectional recurrent architectures are well suited to capture such temporal symmetry by jointly modeling past-to-future and future-to-past dependencies. 
In addition, the self-attention mechanism introduces a symmetric weighting strategy over temporal features, allowing the model to adaptively balance the contribution of different time steps. Meanwhile, the improved Marine Predators Algorithm maintains a symmetric balance between global exploration and local exploitation during parameter optimization. These symmetry-aware characteristics collectively contribute to the robustness and accuracy of the proposed forecasting model.
In summary, this study demonstrates both theoretical innovation and experimental validation in terms of PV output power forecasting accuracy, highlighting the practical applicability of the proposed IMPA-Att-BiGRU model and offering insights for future research.

2. Methodology

2.1. Kernel Principal Component Analysis for Feature Extraction

The accuracy of PV output power forecasting relies heavily on real meteorological datasets containing a large number of variables, such as solar irradiance, temperature, humidity, rainfall, and wind speed. However, too many input variables cause the prediction model to face significant nonlinear correlation and redundancy when extracting hidden features, and may even introduce unavoidable noise, increasing the complexity of the forecasting procedure. Hence, Kernel Principal Component Analysis (KPCA) is employed to compress the original inputs into a compact and informative lower-dimensional feature representation.
KPCA is selected in preference to typical dimensionality-reduction approaches such as ICA and autoencoders for two key reasons. First, PV output power is nonlinearly coupled with meteorological inputs, and KPCA captures this nonlinear structure through kernel-induced feature mapping. In contrast, ICA relies on independence assumptions that may not hold for highly correlated meteorological variables and may be less effective in capturing nonlinear relationships. Second, compared with autoencoders, KPCA introduces no additional trainable parameters and is less sensitive to initialization and training hyperparameters, which improves stability and reproducibility—particularly important when the downstream forecasting model is already a deep model.
By mapping the complex data into a high-dimensional feature space, KPCA could capture the intrinsic nonlinear structures from the original input [22]. This transformation facilitates the characterization of complex nonlinear dependencies among PV-related meteorological variables. As a result, feature redundancy and noise are effectively reduced while the essential nonlinear characteristics of the original data are well preserved. Moreover, KPCA enables nonlinear patterns embedded in the original input data to be represented in the feature space where linear principal component extraction can be efficiently performed.
Assume that the sample set is:
X_{n \times m} = [x_1, x_2, \ldots, x_n]^T, \quad x_i \in \mathbb{R}^m, \quad i = 1, 2, \ldots, n
where n denotes the number of m-dimensional samples. Subsequently, a nonlinear mapping θ is employed to project the samples into the h-dimensional feature space H:
\theta: x_i \in \mathbb{R}^m \mapsto \theta(x_i) \in \mathbb{R}^h
Then, the kernel function fk and the covariance matrix Q could be respectively represented as:
f_k(x_i, x_j) = \theta(x_i)^T \theta(x_j) = \langle \theta(x_i), \theta(x_j) \rangle, \qquad Q = \frac{1}{n-1} \sum_{j=1}^{n} \theta(x_j)\theta(x_j)^T, \quad i, j = 1, 2, \ldots, n
In this study, KPCA adopts the radial basis function (RBF) kernel to capture the nonlinear coupling between PV output power and meteorological variables. This choice is motivated by the fact that PV output power is nonlinearly coupled with meteorological drivers and the inputs are often highly correlated; the RBF kernel provides a smooth and flexible nonlinear mapping with only one bandwidth parameter, enabling KPCA to capture the coupling in a stable and reproducible manner. The kernel is defined as:
f_k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)
where σ denotes the kernel bandwidth. The bandwidth σ is determined systematically on the training set using the median heuristic by setting it to the median of pairwise Euclidean distances among training samples, and the resulting value is fixed for all experiments to ensure reproducibility. Prior to KPCA, all input variables are Z-score standardized based on the training-set statistics (zero mean and unit variance), and the same transformation is applied to the testing set to avoid information leakage. The RBF kernel is a widely used universal kernel with a single bandwidth parameter, which offers a flexible yet stable nonlinear mapping and is well suited to PV output power forecasting where meteorological inputs are highly correlated and nonlinearly coupled with PV output power.
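The median heuristic described above can be sketched in a few lines of NumPy. This is a minimal illustration of the procedure (pairwise Euclidean distances on the standardized training set, median over distinct pairs), not the authors' code; the function name is hypothetical.

```python
import numpy as np

def median_heuristic_bandwidth(X_train):
    """Set the RBF bandwidth sigma to the median pairwise
    Euclidean distance among (standardized) training samples."""
    # Pairwise squared distances via the squared-norm expansion
    sq = np.sum(X_train**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_train @ X_train.T
    d = np.sqrt(np.maximum(d2, 0.0))
    # Median over the strictly upper triangle (excludes self-distances)
    iu = np.triu_indices(len(X_train), k=1)
    return np.median(d[iu])
```

Fixing the returned value once on the training set, as the text prescribes, keeps the kernel mapping identical across all experiments.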
Consequently, the characteristic equation between the eigenvector Vk and its eigenvalue µk will be:
\mu_k V_k = Q V_k = \frac{1}{n-1} \sum_{j=1}^{n} \theta(x_j)\theta(x_j)^T V_k = \frac{1}{n-1} \sum_{j=1}^{n} \langle \theta(x_j), V_k \rangle\, \theta(x_j)
Each eigenvector Vk could be regarded as a linear combination as:
V_k = \sum_{i=1}^{n} v_{k,i}\, \theta(x_i), \quad k = 1, 2, \ldots, n
where v_{k,i} are the linear combination coefficients.
It can be inferred from Equations (3) and (5):
\mu_k \sum_{i=1}^{n} v_{k,i} K_{j,i} = \frac{1}{n-1} \sum_{i=1}^{n} v_{k,i} \sum_{l=1}^{n} K_{j,l} K_{l,i} \quad \Rightarrow \quad \lambda_k v_k = K v_k
where \lambda_k = (n-1)\mu_k, and K \in \mathbb{R}^{n \times n} is the kernel matrix with entries K_{i,j} = f_k(x_i, x_j).
The eigenvector matrix established by the covariance matrix Q’s eigenvector, Vk, is shown as:
V_f = [V_1\ V_2\ \cdots\ V_l\ V_{l+1}\ \cdots\ V_n]
When the first l principal components are selected, the matrix V_f = [V_1\ V_2\ \cdots\ V_l] \in \mathbb{R}^{n \times l} is obtained in the principal component space.
Since the eigenvector V_k should satisfy the unit-norm constraint in the feature space H, the eigenvectors of K obey:
\langle V_k, V_k \rangle = \lambda_k \langle v_k, v_k \rangle = 1
Hence each eigenvector v_k is rescaled to have norm 1/\sqrt{\lambda_k}:
v_k' = \frac{1}{\sqrt{\lambda_k}}\, v_k
Consequently, in the feature space, V f could be inferred as:
V_f = \left[\frac{1}{\sqrt{\lambda_1}} X^T v_1\ \cdots\ \frac{1}{\sqrt{\lambda_l}} X^T v_l\right] = X^T V \Lambda^{-\frac{1}{2}}, \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l), \quad V = [v_1\ v_2\ \cdots\ v_l]
where X = [\theta(x_1)\ \theta(x_2)\ \cdots\ \theta(x_n)].
By transforming the original high-dimensional meteorological inputs into a compact set of nonlinear principal components, KPCA provides a structured and noise-reduced feature representation for subsequent temporal modeling. This dimensionality reduction not only alleviates the computational burden of deep learning models but also enhances the stability of sequence learning by filtering redundant and highly correlated variables.
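The derivation above (RBF kernel, centering, eigendecomposition of K, and normalization by \sqrt{\lambda_k}) can be condensed into a small NumPy sketch. This is an illustrative implementation under the stated assumptions, not the authors' code; the function name is hypothetical.

```python
import numpy as np

def kpca_fit_transform(X, sigma, n_components):
    """Minimal KPCA: RBF kernel, double centering, eigendecomposition,
    and projection of the training samples onto the leading components."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))
    # Center the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Symmetric eigendecomposition, sorted by decreasing eigenvalue
    lam, V = np.linalg.eigh(Kc)
    idx = np.argsort(lam)[::-1][:n_components]
    lam, V = lam[idx], V[:, idx]
    # Normalize eigenvectors by sqrt(lambda), as in the derivation above
    alphas = V / np.sqrt(np.maximum(lam, 1e-12))
    return Kc @ alphas  # projected samples, shape (n, n_components)
```

Each output column then carries decreasing variance, which is the basis for selecting the first l components by contribution rate.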

2.2. BiGRU

A gated recurrent unit (GRU) is a variant of a recurrent neural network (RNN), which is updated to avoid frequent gradient vanishing or exploding. There is only one reset gate and one update gate to control the historical state in the output and the information combined with the current state [23]. The basic structure of a GRU is illustrated in Figure 1.
z_t = \mathrm{sigmoid}(W_z x_t + U_z h_{t-1} + b_z)
r_t = \mathrm{sigmoid}(W_r x_t + U_r h_{t-1} + b_r)
\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
where W, U, and b are the weight matrices and bias vectors; r_t is the reset gate vector; z_t is the update gate vector; and \tilde{h}_t is the candidate hidden state.
The GRU is a unidirectional recurrent structure whose state is transmitted from front to back. In the BiGRU, two stacked unidirectional GRUs operating in opposite directions are combined, and the output is controlled by both GRUs simultaneously. The BiGRU structure is shown in Figure 2.
BiGRU is capable of analyzing the sequence in not only forward but also backward directions, which makes it sensitive to capturing the temporal dependencies and the bidirectional correlations inherent in PV output power time series. Compared with the unidirectional structure, BiGRU enables the model to exploit the complete contextual information, which is particularly beneficial for handling the strong non-stationarity and intermittency induced by varying meteorological conditions [24]. The bidirectional structure enables the model to capture both historical and future contextual information in a symmetric manner, which is particularly beneficial for PV power series characterized by strong diurnal regularity and periodic patterns.
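A minimal NumPy sketch of the gate equations and the bidirectional pass may help make this concrete. The parameter layout (dictionaries of W, U, b matrices) and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step: reset gate r_t, update gate z_t, candidate state,
    and the convex-combination hidden-state update."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])
    return (1 - z) * h_prev + z * h_cand

def bigru_forward(X_seq, params_f, params_b, hidden):
    """BiGRU: one GRU run forward and one backward over the sequence,
    concatenating the two hidden states at every time step."""
    T = len(X_seq)
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    out_f, out_b = [], [None] * T
    for t in range(T):
        hf = gru_step(X_seq[t], hf, *params_f)
        out_f.append(hf)
    for t in reversed(range(T)):
        hb = gru_step(X_seq[t], hb, *params_b)
        out_b[t] = hb
    return np.stack([np.concatenate([f, b]) for f, b in zip(out_f, out_b)])
```

The concatenated state at each step thus summarizes both past-to-future and future-to-past context, which is the symmetry property the text emphasizes.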

2.3. Multi-Head Self-Attention Mechanism

In PV output power forecasting tasks, meteorological variability exhibits strong nonlinearity and temporal heterogeneity, making it difficult for conventional sequence models to consistently identify key influencing factors. Although the BiGRU model is capable of modeling temporal dependencies, it often lacks an explicit mechanism to distinguish the relationship among the input features.
The multi-head self-attention mechanism enables efficient modeling of long-term dependencies and integrates information from the entire sequence by allowing every position to directly attend to all other positions, regardless of their temporal distance. Meanwhile, the multi-head self-attention mechanism allows multiple attention subspaces to be learned in parallel, facilitating the extraction of diverse feature interactions and temporal patterns. Specifically, each input vector is projected into query, key, and value representations. The similarity between the query and key determines an attention weight, which reflects the relevance of one position to another. In this way, multi-head self-attention enables flexible, content-dependent interactions between distant time steps or features and selectively emphasizes critical information while suppressing irrelevant or redundant components. Given d-dimensional inputs, the input sequence X is:
X = [x_1, \ldots, x_N]
where N represents the sequence length. The query, key, and value matrices, respectively denoted Q, K, and V, are obtained by three linear transformations of the input with the corresponding weight matrices W_Q, W_K, and W_V; the attention score A_i of each of the n heads is then computed as:
Q = X W_Q, \quad K = X W_K, \quad V = X W_V
A_i = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V
where d_k is the dimension of the key vectors. Ultimately, as shown in Figure 3, the multi-head attention output will be:
M_n = \mathrm{Concat}(A_1, A_2, \ldots, A_n)\, W^O
where WO is the output projection matrix.
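The projection, per-head scaled dot-product attention, and output concatenation can be sketched as follows. This is a minimal NumPy illustration of the equations above (head dimension d_k = d / n); names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, WQ, WK, WV, WO, n_heads):
    """Multi-head scaled dot-product self-attention:
    Q = X W_Q, K = X W_K, V = X W_V, heads concatenated, then W^O."""
    N, d = X.shape
    dk = d // n_heads
    Q, K, V = X @ WQ, X @ WK, X @ WV
    heads = []
    for i in range(n_heads):
        s = slice(i * dk, (i + 1) * dk)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dk))  # (N, N) attention weights
        heads.append(A @ V[:, s])                       # weighted values per head
    return np.concatenate(heads, axis=1) @ WO
```

Each row of A sums to one, so every output position is a convex combination of the value vectors, weighted by content similarity.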
In the proposed forecasting model, two layers of multi-head self-attention mechanisms are embedded both before and after the BiGRU layers to enhance feature representation and temporal dependency modeling from complementary perspectives. The input-level multi-head self-attention applied before the BiGRU layer could adaptively evaluate the relative importance of different input features and temporal positions after KPCA-based dimensionality reduction. By assigning higher attention weights to informative meteorological patterns and suppressing redundant or noisy components, this module enables the model to better cope with the strong variability and non-stationarity caused by changing weather conditions, thereby providing more discriminative inputs for subsequent temporal modeling. Subsequently, the output-level multi-head self-attention mechanism is further employed to reweight the hidden representations generated by the BiGRU model, aiming to preserve the deep temporal dependencies between the input and the output. The attention-enhanced design enables the forecasting model to maintain a balanced focus on informative time steps while preserving the temporal symmetry captured by the bidirectional recurrent structure, effectively integrating feature-level selection and temporal-level refinement, resulting in improved PV output power forecasting accuracy.

2.4. Improved Marine Predators Algorithm

The Marine Predators Algorithm (MPA) is an intelligent optimization algorithm whose inspiration comes from the natural preying rules in the marine ecosystem [25]. As a population-based method, the MPA designs its initial solution X0 as uniformly distributed within the range of [Xmin, Xmax]:
X_0 = X_{\min} + rand \times (X_{\max} - X_{\min}), \quad rand \in (0, 1)
The prey (the initial individual), Prey, and the top predator (the fittest solution), Elite, are respectively defined as:
Elite = \begin{bmatrix} X^I_{1,1} & X^I_{1,2} & \cdots & X^I_{1,d} \\ X^I_{2,1} & X^I_{2,2} & \cdots & X^I_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X^I_{n,1} & X^I_{n,2} & \cdots & X^I_{n,d} \end{bmatrix}_{n \times d}
Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,d} \end{bmatrix}_{n \times d}
where X_{i,j} represents the jth dimension of the ith individual.
According to the proportional relationship between the current iteration t and the maximum iteration tmax, the complete optimization process is divided into three phases:
Phase 1 (t < t_{\max}/3): The individual is moving faster than the fittest solution:
s_i = R_B \otimes (E_i - R_B \otimes P_i), \quad P_i = P_i + 0.5\,R \otimes s_i, \quad R \in [0, 1]
where E_i and P_i respectively represent the ith predator and prey, \otimes denotes entry-wise multiplication, s_i is the individual's movement step size, and R_B denotes the Brownian motion vector of random numbers drawn from a normal distribution.
Phase 2 (t_{\max}/3 < t < 2t_{\max}/3): Fifty percent of the population is selected for exploration and the other half for exploitation. The mathematical model is applied as:
For the first 50% of the population (i = 1, \ldots, n/2):
s_i = R_L \otimes (E_i - R_L \otimes P_i), \quad P_i = P_i + 0.5\,R \otimes s_i
For the second 50% of the population (i = n/2 + 1, \ldots, n):
s_i = R_B \otimes (R_B \otimes E_i - P_i), \quad P_i = E_i + 0.5\,C \otimes s_i
C = (1 - t/t_{\max})^{2t/t_{\max}}
where RL denotes the Levy movement vector of random numbers based on Levy distribution, and C is an adaptive parameter.
Phase 3 (t > 2t_{\max}/3): The fittest solution is moving faster than the individual:
s_i = R_L \otimes (R_L \otimes E_i - P_i), \quad P_i = E_i + 0.5\,C \otimes s_i
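The three-phase position update can be sketched compactly in NumPy. This is an illustrative implementation of the standard MPA update rules described above (Levy steps via Mantegna's algorithm), not the authors' code; function names and the Levy exponent beta = 1.5 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def levy(size, beta=1.5):
    """Levy-distributed steps via Mantegna's algorithm (used for R_L)."""
    from math import gamma, sin, pi
    num = gamma(1 + beta) * sin(pi * beta / 2)
    den = gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    return rng.normal(0, sigma, size) / np.abs(rng.normal(0, 1, size)) ** (1 / beta)

def mpa_update(prey, elite, t, t_max):
    """One iteration of the three-phase MPA position update."""
    n, d = prey.shape
    C = (1 - t / t_max) ** (2 * t / t_max)
    new = prey.copy()
    for i in range(n):
        R = rng.random(d)
        if t < t_max / 3:                    # Phase 1: Brownian exploration
            RB = rng.normal(size=d)
            new[i] = prey[i] + 0.5 * R * (RB * (elite[i] - RB * prey[i]))
        elif t < 2 * t_max / 3:              # Phase 2: half explore, half exploit
            if i < n // 2:
                RL = levy(d)
                new[i] = prey[i] + 0.5 * R * (RL * (elite[i] - RL * prey[i]))
            else:
                RB = rng.normal(size=d)
                new[i] = elite[i] + 0.5 * C * (RB * (RB * elite[i] - prey[i]))
        else:                                # Phase 3: Levy exploitation
            RL = levy(d)
            new[i] = elite[i] + 0.5 * C * (RL * (RL * elite[i] - prey[i]))
    return new
```

The phase boundary is driven purely by the iteration ratio t/t_max, matching the description above.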

2.4.1. Information Exchanging and Quasi-Opposition-Based Learning

To avoid overlooking valid information in the search region or falling into local stagnation during optimization, each individual should exchange information with other individuals in the region. For each individual P_i, another individual P_j (i ≠ j) is selected randomly from the population for information exchange. The new candidate solution of P_i is then obtained as:
P_{i\_new} = \begin{cases} P_i + R \times (E_i - P_j), & \text{if } p_e < 0.5 \\ P_i - R \times (E_i - P_j), & \text{otherwise} \end{cases}
where Pi is updated by Pi_new only if its fitness is worse than the new one, and pe ∈ [0, 1] is a random number that is responsible for controlling the direction of information exchange. The new solution Pi_new is designed to carry beneficial advantages offered by the fittest solution and other individuals simultaneously.
After the whole population completes information exchange, where each individual P_i has compared its fitness with P_{i_new}, a new candidate prey population is generated. Subsequently, quasi-opposition-based learning is applied to expand the convergence region and further diversify the population. In this strategy, a randomly selected set of individuals is mapped to the corresponding opposite individuals:
P_i^o = X_{\max} + X_{\min} - P_i
where P_i^o is the opposite solution of P_i. After finding the center of the search space P_c and the opposite solution, a quasi-opposite solution P_i^q is generated at a random position within the range between them:
P_c = \frac{X_{\max} + X_{\min}}{2}
P_i^q = \begin{cases} P_c + rand \times (P_i^o - P_c), & \text{if } P_i^o > P_c \\ P_i^o + rand \times (P_c - P_i^o), & \text{otherwise} \end{cases}
Then, a new individual matrix is constructed by both the primitive individuals and the quasi-opposite ones simultaneously, from which the first n solutions sorted by their fitness are selected for the new candidate population, which is kept for the next iteration.
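The quasi-opposition step above reduces to sampling uniformly between the search-space centre and the opposite solution. A minimal sketch (function name hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def quasi_opposite(P, x_min, x_max):
    """Quasi-opposition-based learning: reflect each individual and sample
    a point between the search-space centre and the opposite solution."""
    P_opp = x_max + x_min - P          # opposite solution P^o
    P_c = (x_max + x_min) / 2.0        # centre of the search space
    r = rng.random(P.shape)
    lo = np.minimum(P_c, P_opp)
    hi = np.maximum(P_c, P_opp)
    return lo + r * (hi - lo)          # uniform between centre and opposite
```

Merging these quasi-opposite points with the original individuals and keeping the n fittest, as the text describes, preserves population size while injecting diversity.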

2.4.2. Improvement of Exploration and Exploitation Methods

In the second phase of the MPA (t_{\max}/3 < t < 2t_{\max}/3), half of the population is assigned to exploration and the other half to exploitation. To enhance both the exploration and exploitation capabilities of the algorithm simultaneously, the individual position update method of the GWO is introduced to improve the corresponding process.
Inspired by the prey-hunting activity of grey wolves, the Grey Wolf Optimizer (GWO) has been applied in various applications due to its advantages in exploration and exploitation [26,27,28]. In the GWO, A is an important parameter, and the algorithm controls its exploration and exploitation through the scope of A. The calculation method for A is as follows:
A = 2a \otimes R - a, \quad R \in [0, 1]
where a is a variable factor, whose changing trend is shown in Figure 4:
The process of updating the position of the grey wolf is:
W_i = W_i^P - A \otimes \left| 2R \otimes W_i^P - W_i \right|
where Wi and W i P respectively represent the ith wolf and prey.
Therefore, in the second phase of the IMPA, the exploration and exploitation methods are improved as:
For the first 50% of the population (i = 1, \ldots, n/2):
s_i = R_L \otimes \left( E_i - R_L \otimes \left( E_i - A \otimes \left| 2R \otimes E_i - P_i \right| \right) \right), \quad P_i = P_i + 0.5\,R \otimes s_i
For the second 50% of the population (i = n/2 + 1, \ldots, n):
s_i = R_B \otimes \left( R_B \otimes E_i - \left( E_i - A \otimes \left| 2R \otimes E_i - P_i \right| \right) \right), \quad P_i = E_i + 0.5\,C \otimes s_i
C = (1 - t/t_{\max})^{2t/t_{\max}}
From an optimization perspective, the original MPA may suffer from an imbalance between global exploration and local exploitation. By symmetrically constraining the exploration and exploitation process, the improved algorithm maintains a dynamic balanced interaction between different individuals, preventing premature convergence while preserving search efficiency.

2.4.3. Refracted Opposition-Based Learning

The MPA performs well in early iterations but degrades in later iterations due to the gradual increase in individual similarity. Consequently, the IMPA introduces a refracted opposition-based learning strategy for individual mutation to enhance population diversity in the later iterations. Refracted opposition-based learning applies the refraction principle of light to ameliorate the update process of the predator as follows:
E_i^R = \frac{(m + 1)(X_{\max} + X_{\min}) - 2E_i}{2m}
where E i R is the predator that mutates through the refracted opposition-based learning and m is a variable refractive index parameter, while the length of refracting light could be changed by regulating its value so that the algorithm can escape from the local minimum. For this, the value of m ∈ [mmin, mmax] can be written by the linear gradient:
m = m_{\max} - (m_{\max} - m_{\min})\,(t / t_{\max})
where the value of m decreases linearly with the number of iterations. Through refracted opposition-based learning, the original predator E_i mutates to generate E_i^R; both then exploit through Levy flight simultaneously, and the one with better fitness is chosen for the low-velocity-ratio iteration so that the optimal solution is not overlooked. If E_i^R is the better predator, the mathematical model in the low-velocity ratio is updated as follows:
s_i^R = R_L \otimes (R_L \otimes E_i^R - P_i), \quad P_i^R = E_i^R + 0.5\,C \otimes s_i^R
where s i R is the step size of the movement of E i R , and it will serve as a basis for the mutation prey P i R to update its position.
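The refraction mutation itself is a one-line transform. A minimal sketch, assuming the refractive-index bounds m_min and m_max are tunable constants (the paper does not state their values here):

```python
import numpy as np

def refracted_opposite(E, x_min, x_max, t, t_max, m_min=1.0, m_max=10.0):
    """Refracted opposition-based learning: mutate the predator with a
    linearly decreasing refractive index m (m_min/m_max are assumed bounds)."""
    m = m_max - (m_max - m_min) * (t / t_max)
    return ((m + 1) * (x_max + x_min) - 2.0 * E) / (2.0 * m)
```

Note that for m = 1 the transform reduces to ordinary opposition-based learning, X_max + X_min − E, so the linear schedule smoothly interpolates between a long-range reflection early on and plain opposition at the end.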
The pseudocode of the proposed IMPA is provided in Table 1.

2.4.4. Performance Evaluation of IMPA

To assess the effect of the optimization search for the IMPA, the typical unimodal benchmark functions F1, F2, and the typical multimodal benchmark functions F3, F4, with theoretical optima of 0, 0, 0, and −10.1532, respectively, are selected for simulation experiments. Meanwhile, the MPA, the grey wolf optimizer (GWO), and particle swarm optimization (PSO) [29] are set as the comparison algorithms. The benchmark functions are shown in the following equations:
F_1 = \sum_{i=1}^{n} x_i^2
F_2 = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2
F_3 = \frac{\pi}{n} \left\{ 10 \sin^2(\pi y_1) + \sum_{i=1}^{n-1} (y_i - 1)^2 \left[ 1 + 10 \sin^2(\pi y_{i+1}) \right] + (y_n - 1)^2 \right\} + \sum_{i=1}^{n} u(x_i, 10, 100, 4)
y_i = 1 + \frac{x_i + 1}{4}, \qquad u(x_i, a, k, m) = \begin{cases} k (x_i - a)^m, & x_i > a \\ 0, & -a \le x_i \le a \\ k (-x_i - a)^m, & x_i < -a \end{cases}
F_4 = -\sum_{i=1}^{5} \left[ (X - a_i)(X - a_i)^T + c_i \right]^{-1}
The landscapes of F1 to F4 are shown in Figure 5, Figure 6, Figure 7 and Figure 8. Independent optimization tests are performed with each algorithm on each benchmark function, with 500 iterations per run. The standard deviation, the mean, the worst value, and the optimal value are separately calculated and shown in Table 2.
As shown in Table 2, the IMPA demonstrates superior performance over the comparative algorithms on both the unimodal and multimodal benchmark functions. In particular, the theoretical optimal result is found by the IMPA with a standard deviation of 0 for the functions F1 and F2 and the multimodal function F4. This shows that the computational accuracy and robustness of the IMPA are improved via the introduction of information exchange and quasi-opposition-based learning, the improvement of exploration and exploitation, and refracted opposition-based learning. Moreover, as can be seen in Figure 9, Figure 10, Figure 11 and Figure 12, the IMPA requires the fewest iterations to reach the optimal value among the compared algorithms, indicating that convergence speed is improved by increasing the proportion of high-quality individuals. In summary, compared with the MPA, PSO, and GWO algorithms, the IMPA has better convergence ability, robustness, and local extremum escape performance, confirming the effectiveness of the proposed improvement strategies.

2.5. IMPA-Att-BiGRU PV Output Power Forecasting Model

The forecasting performance of the BiGRU model is sensitive to several key hyperparameters, including the learning rate, the hidden units, and the dropout rate. Therefore, the IMPA algorithm is introduced to perform global optimization of these hyperparameters. Meanwhile, the multi-head self-attention mechanism is incorporated to enhance temporal feature learning and to adaptively extract the most informative patterns from the input obtained from KPCA. The flowchart of the IMPA-Att-BiGRU forecasting model is illustrated in Figure 13, while the corresponding data flow is shown in Figure 14.
Step 1: Historical time-series sequences of the meteorological variable data are used as the model input.
Step 2: The input data is divided into the training dataset and the testing dataset in a ratio of 80% and 20%.
Step 3: KPCA is applied to reduce the dimensionality of the input data and extract the nonlinear principal components used for subsequent model construction.
Step 4: The key hyperparameters of the BiGRU are identified and the IMPA is used to optimize the hyperparameter set.
Step 5: The multi-head self-attention modules are applied both before and after the BiGRU layers to capture complementary temporal and feature representations.
Step 6: The IMPA-Att-BiGRU forecasting model is trained using the training dataset.
Step 7: The forecasting performance of IMPA-Att-BiGRU is evaluated using the testing dataset.

3. Results and Discussion

In this study, two complementary PV output power forecasting experiments are conducted to evaluate the proposed IMPA-Att-BiGRU model, including single-day one-step (15 min) prediction and cross-day rolling forecasting over four consecutive days. Three commonly used evaluation metrics, shown in Equation (35), are adopted to quantitatively evaluate forecasting performance: the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination (R2).
MAE = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|, \qquad RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}, \qquad R^2 = 1 - \frac{\sum_i (\hat{y}_i - y_i)^2}{\sum_i (y_i - \bar{y})^2}
where y ^ i and yi represent the ith sample’s forecasted value and actual value, respectively; N is the total number of samples; and y ¯ is the mean of the actual values.
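The three metrics translate directly into NumPy; this small sketch (function name hypothetical) mirrors the definitions in Equation (35):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for a forecast against actual values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
    return mae, rmse, r2
```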
For fair comparison, the backbone architecture of the proposed IMPA-Att-BiGRU model is kept consistent across the two complementary experiments. Specifically, the BiGRU forecaster employs a fixed two-layer configuration to balance representation capacity and generalization, and the second-layer hidden units are set to half of the first-layer units to control model complexity. The learning rate, first-layer hidden units, and dropout rate are automatically tuned by the IMPA, with the search space defined as learning rate in [3 × 10−4, 3 × 10−3] sampled on a logarithmic scale, hidden units in [64, 256] with a step size of 16, and dropout in [0.10, 0.35] with a step size of 0.05. The IMPA configuration is fixed across experiments with a population size of 30 and a maximum iteration budget of 200, and an early-termination rule is applied to stop the optimization if the best validation fitness does not improve for 30 consecutive iterations.

3.1. Single-Day One-Step Forecasting Experiment

3.1.1. Dataset Processing

The dataset used in this study was collected from an operational PV generation system located in southern China (installed capacity: 50 kW). Measurements were recorded every 15 min between 08:00 and 20:00 each day, covering the period from 1 January 2014 to 10 December 2018, which provides long-term routine operational data for PV forecasting. The dataset was divided into training and testing subsets with a ratio of 80% and 20%, respectively. Although all experiments adopted the same split ratio, independent data partitions were generated for different experimental settings using fixed random seeds to ensure reproducibility and experimental fairness.
Because the dataset was collected from a real operating PV generation system, the measurements are inevitably affected by practical issues such as sensor noise, communication dropouts, missing records, and occasional gross errors. Data reconciliation is therefore a necessary front-end step in real-world PV output power forecasting pipelines: it ensures the reliability of the initial inputs and prevents error propagation to downstream learning and evaluation. Data reconciliation was applied to improve input reliability by handling missing values and suppressing gross errors (outliers), following the robust reconciliation frameworks in Refs. [30,31]. In practice, preprocessing rules were fitted on the training set only, and the same rules were then applied to the testing set to avoid information leakage. First, missing values were detected by checking each variable against its sampling timeline. Short gaps (no more than four consecutive time steps) were imputed using linear interpolation in time, whereas longer consecutive gaps were treated as invalid segments and excluded from model training. Second, gross errors were identified by comparing measurements with reconciled estimates obtained from a robust reconciliation model. The reconciliation was formulated by minimizing weighted residuals between measured and reconciled values, with an M-estimation penalty to reduce the influence of abnormal observations. Samples with absolute normalized residuals larger than 3 were flagged as outliers and replaced by their reconciled estimates.
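The preprocessing rules above can be sketched as follows. This is a simplified stand-in: a rolling median plus a MAD-based scale substitutes for the full M-estimation reconciliation model of Refs. [30,31], and the helper name `reconcile` is illustrative:

```python
import numpy as np
import pandas as pd

def reconcile(series, max_gap=4, z_thresh=3.0):
    """Fill short gaps, leave long gaps missing, and replace gross errors
    (|normalized residual| > z_thresh) with a robust reconciled estimate."""
    s = series.copy()
    # identify runs of consecutive missing values and their lengths
    isna = s.isna()
    gap_id = (isna != isna.shift()).cumsum()
    gap_len = isna.groupby(gap_id).transform("sum")
    short = isna & (gap_len <= max_gap)
    # interpolate everywhere, but keep the fills only inside short gaps
    filled = s.interpolate(method="linear", limit_area="inside")
    s[short] = filled[short]
    # robust reference and scale (rolling median / MAD as a surrogate
    # for the M-estimation reconciliation model)
    ref = s.rolling(9, center=True, min_periods=1).median()
    resid = s - ref
    mad = np.nanmedian(np.abs(resid - np.nanmedian(resid)))
    scale = 1.4826 * mad if mad > 0 else np.nanstd(resid)
    outlier = np.abs(resid) / scale > z_thresh
    s[outlier] = ref[outlier]  # replace flagged samples with reconciled values
    return s
```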
Ten meteorological and operational variables were used as input features in the dataset. These variables describe irradiance conditions, atmospheric states, and PV module operating characteristics. The details of the data characteristics are shown in Table 3.
At first, KPCA was applied prior to model training in order to remove the feature components with low explanatory power. KPCA enabled the extraction of the most informative nonlinear structures from the original 10-dimensional feature space while eliminating redundancy among variables. The KPCA adopted an RBF kernel, and the bandwidth was set to σ = 4.176, which was estimated on the Z-score standardized training set using the median heuristic. The contribution rate results of each principal component are illustrated in Figure 15.
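A minimal sketch of this KPCA front-end is given below, using scikit-learn with the median heuristic for the RBF bandwidth. Note that scikit-learn parameterizes the kernel as exp(−γ‖x−y‖²), so γ = 1/(2σ²); the helper name and return layout are illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

def fit_kpca(X_train, n_components=5):
    """Z-score standardization, then RBF kernel PCA with the bandwidth
    set by the median heuristic on the standardized training set."""
    scaler = StandardScaler().fit(X_train)
    Z = scaler.transform(X_train)
    sigma = np.median(pdist(Z))        # median heuristic bandwidth
    gamma = 1.0 / (2.0 * sigma ** 2)   # convert sigma to sklearn's gamma
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma).fit(Z)
    return scaler, kpca

# new data is transformed with the fitted front-end:
# F = kpca.transform(scaler.transform(X_new))
```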
As depicted in Figure 15, KPCA revealed a sharp decay in eigenvalue magnitude after the fifth principal component, indicating an intrinsic low-dimensional manifold within the original 10-dimensional feature space, while the cumulative contribution rate of principal components 1–5 exceeded 95%. Although a cumulative contribution rate of at least 95% is a common guideline for selecting the number of retained components, forecasting performance may still depend on the retained component number k. Therefore, a sensitivity analysis was conducted by varying k from 3 to 7. To isolate the effect of k, all other settings, including the evaluation split and training protocol, were kept unchanged. The corresponding sensitivity analysis results for the proposed IMPA-Att-BiGRU model are shown in Table 4.
Based on the sensitivity analysis in Table 4, k = 5 was selected as the number of retained KPCA principal components. When k was reduced to 3 or 4, forecasting accuracy degraded noticeably, indicating that retaining too few components discards informative nonlinear variability that is important for PV output power forecasting—particularly around peaks and rapid fluctuations, to which RMSE is more sensitive. In contrast, increasing k beyond 5 yielded only marginal gains: the improvements from k = 5 to k = 6 or k = 7 were minor, suggesting a clear performance plateau where additional components mainly introduce redundant information. Therefore, k = 5 provides a favorable trade-off between compact representation (cumulative contribution rate 0.9738) and forecasting accuracy. Accordingly, a 5-dimensional feature representation was adopted as the compact input for the subsequent experiments, without sacrificing the nonlinear structure of the data.

3.1.2. Component-Wise Ablation Study of the Proposed Model

To verify the individual contribution of each enhanced module in IMPA-Att-BiGRU, ablation experiments were conducted under normal weather conditions (sunny) and abnormal weather conditions (cloudy and rainy), comparing the BiGRU, Att-BiGRU, and IMPA-Att-BiGRU models for PV output power forecasting. Figure 16, Figure 17 and Figure 18 illustrate the representative forecasting results under each weather condition on a single representative test day, while the quantitative error metrics are summarized in Table 5.
As shown by the ablation results, the proposed IMPA-Att-BiGRU model achieved the best values on all three evaluation metrics under every weather condition, including sunny, rainy, and cloudy. By contrast, the original BiGRU model, without any enhancement or optimization, had the highest errors regardless of weather conditions.
It should be noted that the meteorological parameters were stable on sunny days, so the variation trend of PV output power was roughly similar across sunny days, whereas the meteorological parameters varied randomly on cloudy and rainy days; as a result, every forecasting model was more accurate on sunny days than on cloudy and rainy days. Nevertheless, the experimental results show that under cloudy and rainy conditions, Att-BiGRU fitted the trend of the actual PV output power better than the original BiGRU model, which confirms that the multi-head self-attention mechanism improves feature extraction and analysis. In terms of accuracy, although the forecasting error of Att-BiGRU was already lower than that of the original BiGRU, accuracy under all weather conditions improved further after hyperparameter optimization by the proposed IMPA.
In summary, the results show that introducing the multi-head self-attention mechanism and the IMPA simultaneously yields the greatest improvement in PV output power forecasting accuracy. IMPA-Att-BiGRU captures both the hidden temporal dependencies and the short-term fluctuations, confirming that the multi-head self-attention mechanism and the IMPA provide complementary enhancements.

3.1.3. Performance Comparison of Optimization Algorithms in PV Output Power Forecasting

To further verify the effectiveness of the IMPA for hyperparameter optimization of the PV output power forecasting model, comparative experiments were conducted against the three other optimization algorithms mentioned above (PSO, GWO, and MPA). The forecasting results under sunny, cloudy, and rainy weather conditions are illustrated in Figure 19, Figure 20 and Figure 21, while the quantitative evaluation results in terms of MAE, RMSE, and R2 are summarized in Table 6.
Under sunny weather conditions, all the optimized forecasting models were able to closely track the smooth diurnal variation of the PV output power, as shown in Figure 19. However, noticeable performance differences were still observed. PSO-Att-BiGRU exhibited the largest prediction deviation, with an MAE of 1.3721 and an RMSE of 2.1424, whereas GWO-Att-BiGRU and MPA-Att-BiGRU achieved progressively improved accuracy. In contrast, the proposed IMPA-Att-BiGRU achieved the best performance, reducing the MAE and RMSE to 1.0166 and 1.6234, respectively, and yielding the highest R2 value of 0.9979.
For cloudy conditions, characterized by moderate irradiance fluctuations, the performance advantage of IMPA-Att-BiGRU became more pronounced, as illustrated in Figure 20. Compared with MPA-Att-BiGRU, the proposed method reduced MAE and RMSE by approximately 15.8% and 13.3%, respectively. Meanwhile, IMPA-Att-BiGRU also outperformed GWO-Att-BiGRU and PSO-Att-BiGRU, achieving the highest R2 value of 0.9939. These results indicate that the proposed improvement strategies of the IMPA can effectively enhance the exploration and exploitation balance of the original algorithm, enabling the forecasting model to better capture the highly fluctuating power variations under cloudy conditions.
Under rainy weather conditions, where PV output exhibited severe intermittency and rapid power ramps, all models experienced increased forecasting difficulty, as shown in Figure 21. Nevertheless, IMPA-Att-BiGRU consistently maintained superior performance. As reported in Table 6, the MAE and RMSE of IMPA-Att-BiGRU were reduced to 1.2855 and 2.6652, respectively, which were significantly lower than those obtained by PSO-Att-BiGRU, GWO-Att-BiGRU, and MPA-Att-BiGRU. Moreover, IMPA-Att-BiGRU achieved the highest R2 value of 0.9925, indicating a more robust capability in capturing both abrupt fluctuations and overall power trends under highly volatile weather conditions. These results indicate that the proposed IMPA not only maintains local exploitation but also enhances global exploration. Compared with PSO, GWO, and the original MPA, the IMPA tunes the hyperparameters of the forecasting model more effectively, which confirms the effectiveness of the improvement strategies.
In summary, the IMPA’s enhancements are particularly beneficial for hyperparameter tuning because the fitness landscape is expensive and often multimodal. The information-exchange mechanism accelerates early convergence by propagating effective hyperparameter patterns across individuals, while the quasi-opposite learning increases diversity and helps escape local optima without increasing the population size. The improved exploration–exploitation dynamics provide a more balanced global search and local refinement, which stabilizes convergence under a limited evaluation budget. Finally, the refracted opposition strategy refines the elite solution by probing a targeted counterpart around the current best hyperparameters and retaining it only when it improves validation fitness, reducing premature stagnation and improving the robustness of the final tuned configuration.
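To illustrate the quasi-opposite learning component described above, one common formulation samples each dimension uniformly between the search-interval centre and the opposite point; the function below is a generic sketch of this idea rather than the exact update of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def quasi_opposite(x, lb, ub):
    """Quasi-opposition-based learning: for each dimension, sample
    uniformly between the interval centre c = (lb + ub) / 2 and the
    opposite point o = lb + ub - x. This injects diversity without
    enlarging the population."""
    x, lb, ub = map(np.asarray, (x, lb, ub))
    c = (lb + ub) / 2.0            # interval centre
    o = lb + ub - x                # opposite point
    lo, hi = np.minimum(c, o), np.maximum(c, o)
    return rng.uniform(lo, hi)

# In the merge step, whichever of {x, quasi_opposite(x)} has the better
# fitness is kept when truncating back to the original population size.
```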

3.1.4. Performance Comparison with Benchmark Forecasting Models

Three commonly used prediction models (XGBoost, RNN, and LSTM) were selected for comparative experiments to further assess the performance of IMPA-Att-BiGRU in PV output power forecasting.
These models represent classical machine learning approaches and commonly adopted recurrent neural network architectures for time-series forecasting. All benchmark models were trained and evaluated under the same experimental settings to ensure a fair comparison. Figure 22, Figure 23 and Figure 24 illustrate the forecasting results of different models under different weather conditions, while the corresponding quantitative evaluation results in terms of MAE, RMSE, and R2 are summarized in Table 7.
Under sunny conditions, all benchmark models were generally able to follow the smooth diurnal variation of PV output power; however, noticeable deviations still occurred around the ramp-up and peak periods. The proposed IMPA-Att-BiGRU achieved the highest forecasting accuracy, reducing MAE and RMSE by 40.12% and 33.77%, respectively, compared with the best benchmark forecasting model (LSTM), while simultaneously yielding a higher R2. The experimental results indicate that IMPA-Att-BiGRU provides a substantially tighter fit to the actual power curve even under relatively stable irradiance conditions.
For cloudy conditions, characterized by moderate and irregular irradiance fluctuations, the performance differences among the models became more obvious. IMPA-Att-BiGRU significantly outperformed the benchmark models, achieving MAE and RMSE reductions of 49.53% and 52.37%, respectively, relative to the best-performing benchmark model (LSTM). For rainy conditions, PV output power showed frequent intermittency and ramp events, while the proposed IMPA-Att-BiGRU model still maintained better performance, achieving MAE and RMSE reductions of 38.15% and 47.41%, respectively, compared with the best-performing benchmark model (LSTM). Moreover, since it consistently maintained the highest R2, the proposed model remains reliable when forecasting PV output power under diverse weather changes. In summary, these results validate the effectiveness of IMPA-Att-BiGRU for practical PV output power forecasting applications.

3.2. Cross-Day Rolling Forecasting Experiment

In addition to the single-day one-step evaluation, a four-day rolling forecasting protocol was designed to examine whether the IMPA-Att-BiGRU model can maintain stable accuracy under long-horizon error accumulation, and an additional cross-region dataset was used to further examine generalization beyond the single-day setting. This dataset was collected from a PV generation system located in northwestern China (installed capacity: 50 kW). Measurements were recorded at 15 min intervals from 1 January 2016 to 14 October 2019, covering 24 h per day. The dataset was divided into training and testing subsets at a ratio of 80%/20%. Meanwhile, a fivefold cross-validation scheme was employed on the full timeline of the dataset. In each fold, the model was trained on the corresponding training split and tested on the held-out split, where rolling forecasting was performed for all valid starting points whose four-day horizons lay completely within the test segment. Specifically, a recursive rolling strategy was adopted: at each step, the one-step-ahead prediction was fed back as part of the input for the next step, so that errors could accumulate over the four-day horizon. The final results are reported as the mean ± standard deviation across the five folds. Unless otherwise specified, all model configurations, hyperparameter search ranges, training settings, and evaluation criteria in this section were kept exactly the same as those in Section 3.1. This design ensured a fair comparison and isolated the impact of the dataset and the cross-day rolling setting on forecasting performance, rather than confounding factors introduced by different parameter choices. Figure 25 illustrates the forecasting results of the different models in the cross-day rolling forecasting experiment, while the corresponding quantitative evaluation results in terms of MAE, RMSE, and R2 are summarized in Table 8.
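The recursive rolling strategy can be sketched generically as follows; `model` stands for any fitted one-step forecaster (the wrapper name and interface are illustrative), and for the four-day horizon at 15 min resolution, n_steps would be 4 × 96 = 384:

```python
import numpy as np

def recursive_rolling_forecast(model, history, n_steps):
    """Recursive multi-step forecasting: each one-step-ahead prediction
    is appended to the input window and fed back for the next step, so
    errors may accumulate over the horizon. `model` is any object with
    a predict(window) -> scalar method."""
    window = list(history)
    preds = []
    for _ in range(n_steps):
        y_hat = model.predict(np.asarray(window))
        preds.append(y_hat)
        window.append(y_hat)   # feed the prediction back as an input
        window.pop(0)          # keep the input window a fixed length
    return np.asarray(preds)
```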
Because the cross-day experiment adopted a recursive multi-step strategy over a four-day horizon and covered a full 24 h period, with one-step-ahead predictions iteratively fed back as inputs for subsequent steps, it was more challenging than the single-day evaluation and larger absolute errors were expected. As shown in Figure 25, all models could reproduce the overall diurnal evolution of PV output power, whereas noticeable deviations emerged around sharp ramps and peak regions due to long-horizon error accumulation. The quantitative results in Table 8 further confirm this observation. Compared with the plain BiGRU, Att-BiGRU substantially improved accuracy by introducing the dual multi-head self-attention module, reducing MAE and RMSE from 5.9577 kW and 8.6803 kW to 3.7591 kW and 6.3248 kW, respectively, while increasing R2 from 0.9566 to 0.9823. With IMPA-based hyperparameter optimization, IMPA-Att-BiGRU achieved the best performance, further lowering MAE and RMSE to 2.2418 kW and 4.4049 kW and yielding the highest R2 of 0.9950. Notably, IMPA-Att-BiGRU also exhibited the smallest standard deviations across folds, indicating improved robustness against different data splits and reduced sensitivity to long-horizon error propagation. Overall, the results demonstrate that the attention mechanism enhances temporal feature focusing under complex fluctuations, and the proposed IMPA optimization further strengthens the balance between fitting accuracy and generalization, enabling more stable multi-day PV output power forecasting.

4. Conclusions

This study developed an attention-enhanced BiGRU forecasting model optimized by an improved Marine Predators Algorithm (IMPA), termed IMPA-Att-BiGRU, for photovoltaic (PV) output power forecasting under diverse weather conditions. By integrating KPCA-based nonlinear feature compression, a dual multi-head self-attention module around the BiGRU backbone, and IMPA-driven hyperparameter optimization, the proposed model achieves consistently improved accuracy and robustness.
From the ablation study, introducing the multi-head self-attention mechanism and the IMPA brings substantial error reductions compared with the plain BiGRU baseline. Specifically, IMPA-Att-BiGRU reduces MAE by 35.7–58.5% and RMSE by 22.8–49.1% across sunny, cloudy, and rainy conditions, while improving R2 by 0.0218–0.0411 in absolute terms, confirming the complementary benefits of attention-based representation enhancement and IMPA-based parameter optimization. The optimization algorithm comparison further verifies the effectiveness of the IMPA over other metaheuristics. Relative to MPA-Att-BiGRU, IMPA-Att-BiGRU achieves additional reductions of 9.9–24.0% in MAE and 9.2–13.3% in RMSE, with consistent improvements in goodness of fit. In the benchmark model comparison, the proposed method also outperforms classical baselines (XGBoost, RNN, and LSTM). Against the best benchmark under each weather type, IMPA-Att-BiGRU reduces MAE by 40.1% (sunny), 49.5% (cloudy), and 38.1% (rainy), and reduces RMSE by 33.8% (sunny), 52.4% (cloudy), and 47.4% (rainy), demonstrating strong robustness under increasingly volatile irradiance.
To evaluate long-horizon stability, a cross-day rolling forecasting experiment with fivefold cross-validation is further conducted. Compared with BiGRU, IMPA-Att-BiGRU achieves 62.4% lower MAE and 49.3% lower RMSE, and increases R2 from 0.9566 to 0.9950. Notably, the reduced standard deviations in MAE/RMSE indicate more stable performance across folds, suggesting that the proposed model can mitigate error accumulation in multi-day recursive forecasting and maintain reliable tracking of PV output trajectories.
Overall, the proposed IMPA-Att-BiGRU provides an accurate and robust solution for PV output power forecasting under both stable and highly fluctuating weather conditions, and shows clear potential for practical deployment in PV operation scheduling and energy management.

5. Future Prospects

Future research will focus on several directions to further enhance the applicability and robustness of the proposed IMPA-Att-BiGRU model. First, probabilistic PV output power forecasting will be investigated by integrating uncertainty quantification techniques, such as prediction intervals or quantile-based learning, to better characterize forecasting uncertainty under highly volatile meteorological conditions. Second, multi-site and cross-regional modeling will be explored to evaluate the generalization capability of the proposed method across different climatic zones and PV system configurations. Third, considering practical deployment requirements, lightweight model structures and real-time inference strategies will be studied to improve computational efficiency and enable online forecasting and adaptive updating in real-world PV power systems. Overall, these future research directions aim to further extend the proposed model toward more reliable, scalable, and practical PV power forecasting applications in complex and dynamic energy systems.

Author Contributions

Conceptualization, S.L. and H.F.; methodology, S.L.; software, H.H. and H.L.; validation, S.L., S.X. and H.H.; formal analysis, H.H. and H.L.; resources, S.X., B.H. and P.C.; data curation, S.X., B.H. and P.C.; writing—original draft preparation, S.L.; writing—review and editing, S.X.; supervision, H.F.; funding acquisition, S.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Digital Factory Management and Control Technology R&D Center (6025310013PQ) and the Shenzhen Polytechnic University Research Fund (6025310056K).

Data Availability Statement

The processed datasets are available from the corresponding author upon request.

Acknowledgments

The authors gratefully acknowledge the support from the H.F. Model Worker Innovation Laboratory. Meanwhile, we are grateful for the original partners who supported data collection and analyses for the initial work on the photovoltaic output power forecasting.

Conflicts of Interest

Author Bing Han and Peng Cui were employed by the Panjin Power Supply Company, State Grid Liaoning Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PV: Photovoltaic
ARIMA: Autoregressive Integrated Moving Average
GRU: Gated Recurrent Unit
BiGRU: Bidirectional Gated Recurrent Unit
LSTM: Long Short-Term Memory
RNN: Recurrent Neural Network
XGBoost: Extreme Gradient Boosting
SVM: Support Vector Machines
KNN: K-Nearest Neighbors
MPA: Marine Predators Algorithm
IMPA: Improved Marine Predators Algorithm
GWO: Grey Wolf Optimizer
PSO: Particle Swarm Optimization
KPCA: Kernel Principal Component Analysis
Att: Attention
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
R2: Coefficient of Determination

References

  1. Tian, J.; Ooka, R.; Lee, D. Multi-scale solar radiation and photovoltaic power forecasting with machine learning algorithms in urban environment: A state-of-the-art review. J. Clean. Prod. 2023, 426, 139040. [Google Scholar] [CrossRef]
  2. Iheanetu, K.J. Solar photovoltaic power forecasting: A review. Sustainability 2022, 14, 17005. [Google Scholar] [CrossRef]
  3. Wang, L.; Liu, Y.; Li, T.; Xie, X.; Chang, C. The short-term forecasting of asymmetry photovoltaic power based on the feature extraction of PV power and SVM algorithm. Symmetry 2020, 12, 1777. [Google Scholar] [CrossRef]
  4. Wang, T.; Gong, Z.; Wang, Z.; Liu, Y.; Ma, Y.; Wang, F.; Li, J. Research and optimization of ultra-short-term photovoltaic power prediction model based on symmetric parallel TCN-TST-BiGRU architecture. Symmetry 2025, 17, 1855. [Google Scholar] [CrossRef]
  5. Hui, L.; Ren, Z.Y.; Yan, X.; Li, W.Y.; Bo, H. A multi-data driven hybrid learning method for weekly photovoltaic power scenario forecast. IEEE Trans. Sustain. Energy 2021, 13, 91–100. [Google Scholar] [CrossRef]
  6. Gu, B.; Shen, H.Q.; Lei, X.H.; Hu, H.; Liu, X.Y. Forecasting and uncertainty analysis of day-ahead photovoltaic power using a novel forecasting method. Appl. Energy 2021, 299, 117291. [Google Scholar] [CrossRef]
  7. Sun, Y.; Wang, Z.; Wang, J.; Li, Q. Short-term solar photovoltaic power prediction utilizing the VMD-BKA-BP neural network. Symmetry 2025, 17, 784. [Google Scholar] [CrossRef]
  8. Park, S.; Kim, Y.; Ferrier, N.J.; Collis, S.M.; Sankaran, R.; Beckman, P.H. Prediction of solar irradiance and photovoltaic solar energy product based on cloud coverage Estimation using machine learning methods. Atmosphere 2021, 12, 395. [Google Scholar] [CrossRef]
  9. Gu, B.; Li, X.; Xu, F.L.; Yang, X.P.; Wang, F.Y.; Wang, P.Z. Forecasting and uncertainty analysis of day-ahead photovoltaic power based on WT-CNN-BiLSTM-AM-GMM. Sustainability 2023, 15, 6538. [Google Scholar] [CrossRef]
  10. Singh, P.; Mandpura, A.K.; Yadav, V.K. Power forecasting in photovoltaic system using hybrid ANN and wavelet transform based method. J. Sci. Ind. Res. 2022, 82, 63–74. [Google Scholar]
  11. Li, Y.; Zhai, S.; Yi, G.; Pang, S.; Luo, X. Short-term photovoltaic power forecasting based on ICEEMDAN-TCN-BiLSTM-MHA. Symmetry 2025, 17, 1599. [Google Scholar] [CrossRef]
  12. Balraj, G.; Victoire, A.A.; Jaikumar, S.; Victoire, A. Variational mode decomposition combined fuzzy-Twin support vector machine model with deep learning for solar photovoltaic power forecasting. PLoS ONE 2022, 17, e0273632. [Google Scholar] [CrossRef] [PubMed]
  13. Li, H.; Liu, P.; Guo, S.L.; Zuo, Q.U.; Cheng, L.; Tao, J.; Huang, K.D.; Yang, Z.K.; Han, D.Y.; Ming, B. Integrating teleconnection factors into long-term complementary operating rules for hybrid power systems: A case study of Longyangxia hydro-photovoltaic plant in China. Renew. Energy 2022, 186, 517–534. [Google Scholar] [CrossRef]
  14. Zhu, J.B.; Li, M.R.; Luo, L.; Zhang, B.D.; Cui, M.J.; Yu, L.J. Short-term PV power forecast methodology based on multi-scale fluctuation characteristics extraction. Renew. Energy 2023, 208, 141–151. [Google Scholar] [CrossRef]
  15. Lateko, A.A.H.; Yang, H.T.; Huang, C.M.; Aprillia, H.; Hsu, C.Y.; Zhong, J.L.; Phuong, N.H. Stacking ensemble method with the RNN meta-learner for short-term PV power forecasting. Energies 2021, 14, 4733. [Google Scholar] [CrossRef]
  16. Akhter, M.N.; Mekhilef, S.; Mokhlis, H.; Ali, R.; Usama, M.; Muhammad, M.A.; Khairudin, A.S.M. A hybrid deep learning method for an hour ahead power output forecasting of three different photovoltaic systems. Appl. Energy 2022, 307, 118185. [Google Scholar] [CrossRef]
  17. He, K.; Zhang, Y.; Wang, Y.K.; Zhou, R.H.; Liu, H. Feature-enhanced multivariate ensemble model for PV power spatio-temporal forecasting and scenario generation. Appl. Soft Comput. 2025, 183, 113646. [Google Scholar] [CrossRef]
  18. Zhang, Z.B.; Huang, X.Q.; Li, C.L.; Cheng, F.Y.; Tai, Y.H. CRAformer: A cross-residual attention transformer for solar irradiation multistep forecasting. Energy 2025, 320, 135214. [Google Scholar] [CrossRef]
  19. Xie, G.M.; Zhang, Z.J.; Xie, S.; Yuan, C.W.; Liu, H. CPWformer-DEC: Improved Transformer with class-priority weather attention and dynamic error compensation for photovoltaic power forecasting. Expert Syst. Appl. 2026, 301, 130580. [Google Scholar] [CrossRef]
  20. Ding, Y.M.; Zhou, S.N.; Deng, W.W. Sustainable PV power forecasting via MPA-VMD optimized BiGRU with attention mechanism. Mathematics 2025, 13, 1531. [Google Scholar] [CrossRef]
  21. Abdel-Basset, M.; El-Shahat, D.; Chakrabortty, R.K.; Ryan, M. Parameter estimation of photovoltaic models using an improved marine predators algorithm. Symmetry 2024, 16, 1643. [Google Scholar] [CrossRef]
  22. Li, P.; Zhang, W.L.; Lu, C.J.; Zhang, R.Z.; Li, X.L. Robust kernel principal component analysis with optimal mean. Neural Netw. 2022, 152, 347–352. [Google Scholar] [CrossRef]
  23. Zha, W.T.; Li, X.Y.; Du, Y.J.; Liang, Y.Y. Interval forecast method for wind power based on GCN-GRU. Symmetry 2024, 16, 1643. [Google Scholar] [CrossRef]
  24. Li, Y.H.; Yang, N.; Bi, G.H.; Chen, S.Y.; Luo, Z.; Shen, X. Carbon price forecasting using a hybrid deep learning model: TKMixer-BiGRU-SA. Symmetry 2025, 17, 6. [Google Scholar] [CrossRef]
  25. Faramarzi, A.; Heidarinejad, M.; Mirjalili, S.; Gandomi, A.H. Marine Predators Algorithm: A nature-inspired metaheuristic. Expert Syst. Appl. 2020, 152, 113377. [Google Scholar] [CrossRef]
  26. Wang, H.B.; Zhang, J.Y.; Fan, J.K.; Zhang, C.Y.D.; Deng, B.; Zhao, W.T. An improved grey wolf optimizer with flexible crossover and mutation for cluster task scheduling. Inf. Sci. 2025, 704, 121943. [Google Scholar] [CrossRef]
  27. Wang, Y.; Ran, S.J.; Wang, G.G. Role-oriented binary grey wolf optimizer using foraging-following and Levy flight for feature selection. Appl. Math. Model. 2024, 126, 310–326. [Google Scholar] [CrossRef]
  28. Chantar, H.; Mafarja, M.; Alsawalqah, H.; Heidari, A.A.; Aljarah, I.; Faris, H. Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification. Neural Comput. Appl. 2020, 32, 12201–12220. [Google Scholar] [CrossRef]
  29. Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle swarm optimization: A comprehensive survey. IEEE Access 2022, 10, 10031–10061. [Google Scholar] [CrossRef]
  30. Xie, S.; Wang, H.Z.; Peng, J.C.; Liu, X.L.; Yuan, X.F. A hierarchical data reconciliation based on multiple time-delay interval estimation for industrial processes. ISA Trans. 2020, 105, 198–209. [Google Scholar] [CrossRef]
  31. Xie, S.; Yang, C.H.; Yuan, X.F.; Wang, X.L.; Xie, Y.F. A novel robust data reconciliation method for industrial processes. Control Eng. Pract. 2019, 83, 203–212. [Google Scholar] [CrossRef]
Figure 1. The basic structure of the GRU.
Figure 2. The BiGRU structure.
Figure 3. The structure of the multi-head self-attention layer.
Figure 4. The iteration curve of a.
Figure 5. The unimodal benchmark function F1.
Figure 6. The unimodal benchmark function F2.
Figure 7. The multimodal benchmark function F3.
Figure 8. The multimodal benchmark function F4.
Figure 9. Comparison of optimization search for F1.
Figure 10. Comparison of optimization search for F2.
Figure 11. Comparison of optimization search for F3.
Figure 12. Comparison of the optimization search for F4.
Figure 13. Flowchart of the IMPA-Att-BiGRU forecasting model.
Figure 14. Data flow of the IMPA-Att-BiGRU forecasting model.
Figure 15. KPCA contribution rates and cumulative contribution rates of the principal components.
Figure 16. The forecasting and the actual PV output power under sunny weather conditions.
Figure 17. The forecasting and the actual PV output power under cloudy weather conditions.
Figure 18. The forecasting and the actual PV output power under rainy weather conditions.
Figure 19. The forecasting and the actual PV output power under sunny weather conditions.
Figure 20. The forecasting and the actual PV output power under cloudy weather conditions.
Figure 21. The forecasting and the actual PV output power under rainy weather conditions.
Figure 22. The forecasting and the actual PV output power under sunny weather conditions.
Figure 23. The forecasting and the actual PV output power under cloudy weather conditions.
Figure 24. The forecasting and the actual PV output power under rainy weather conditions.
Figure 25. The forecasting and the actual PV output power of the cross-day rolling forecasting experiment.
Table 1. Pseudocode of IMPA.
Pseudocode of IMPA
Initialize the prey population Pi, i = 1, …, n, using Equation (16)
While the termination criterion is not met
    Evaluate the fitness of all prey and update the best solution (predator)
    For i = 1, …, n
        Generate a candidate solution via information exchange using Equation (22)
        If the candidate is better than Pi then
            Pi ← candidate
        End If
    End For
    Select a subset of individuals and generate their opposite solutions using Equation (23)
    Generate the corresponding quasi-opposite solutions using Equation (24)
    Merge the original and quasi-opposite individuals
    Sort by fitness, keep the best n individuals, and update the predator
    If t < tmax/3 then
        Update prey using Equation (19) and update the predator
    Else if tmax/3 ≤ t < 2tmax/3 then
        Update prey using Equations (25)–(27) and update the predator
    Else
        Update prey using Equation (21) and update the predator
    End If
    Generate the refracted counterpart of the predator using Equations (28) and (29)
    If the refracted predator is better than the current predator then
        Update prey using Equation (30) and update the predator
    End If
End While
Return the predator as the final solution
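The opposition-based refinement step in the pseudocode (Equations (23) and (24)) can be sketched as follows. The quasi-opposite point is taken here as a uniform sample between the interval centre and the opposite point, which is the standard quasi-opposition-based learning construction; the bounds, population size, and sphere fitness function are illustrative assumptions, not the paper's settings.

```python
import random

def quasi_opposite(x, lb, ub):
    """Quasi-opposite counterpart of x in [lb, ub]: a uniform sample between
    the interval centre m = (lb + ub)/2 and the opposite point lb + ub - x."""
    m = (lb + ub) / 2.0
    opp = lb + ub - x
    lo, hi = (m, opp) if m <= opp else (opp, m)
    return random.uniform(lo, hi)

def qobl_step(pop, fitness, lb, ub, n_keep):
    """Merge the population with its quasi-opposite counterparts and keep
    the best n_keep individuals (elitist selection, as in the IMPA loop)."""
    merged = pop + [[quasi_opposite(x, lb, ub) for x in ind] for ind in pop]
    merged.sort(key=fitness)  # ascending fitness = minimization
    return merged[:n_keep]

# Illustrative use on the 2-D sphere function (minimization)
random.seed(0)
pop = [[random.uniform(-10, 10) for _ in range(2)] for _ in range(5)]
sphere = lambda ind: sum(x * x for x in ind)
new_pop = qobl_step(pop, sphere, -10.0, 10.0, n_keep=5)
```

Because the original individuals are included in the merged pool, the elitist selection can never degrade the best fitness of the population.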
Table 2. Evaluations for the comparison optimization methods.
| Function | Algorithm | Optimal | Worst | Mean | Std |
|---|---|---|---|---|---|
| F1 | PSO | 3.92 × 10^−8 | 1.55 × 10^−6 | 4.50 × 10^−7 | 4.18 × 10^−7 |
| F1 | GWO | 4.22 × 10^−81 | 4.24 × 10^−73 | 5.00 × 10^−74 | 1.30 × 10^−74 |
| F1 | MPA | 3.93 × 10^−37 | 1.07 × 10^−34 | 5.81 × 10^−35 | 4.33 × 10^−35 |
| F1 | IMPA | 0.00 | 0.00 | 0.00 | 0.00 |
| F2 | PSO | 2.46 × 10^−5 | 5.37 × 10^−2 | 2.25 × 10^−2 | 1.81 × 10^−2 |
| F2 | GWO | 1.29 × 10^−39 | 7.57 × 10^−36 | 7.37 × 10^−37 | 2.30 × 10^−37 |
| F2 | MPA | 7.93 × 10^−21 | 1.47 × 10^−19 | 9.07 × 10^−20 | 1.15 × 10^−20 |
| F2 | IMPA | 0.00 | 0.00 | 0.00 | 0.00 |
| F3 | PSO | 1.28 × 10^−2 | 8.53 × 10^−2 | 4.02 × 10^−2 | 2.41 × 10^−3 |
| F3 | GWO | 1.52 × 10^−7 | 7.29 × 10^−7 | 3.30 × 10^−7 | 1.13 × 10^−7 |
| F3 | MPA | 3.31 × 10^−13 | 1.59 × 10^−11 | 2.19 × 10^−12 | 4.04 × 10^−12 |
| F3 | IMPA | 1.39 × 10^−15 | 8.48 × 10^−14 | 8.20 × 10^−14 | 1.38 × 10^−13 |
| F4 | PSO | −5.0692 | −5.0545 | −5.0583 | 2.10 × 10^−3 |
| F4 | GWO | −5.0830 | −5.0628 | −5.0791 | 1.60 × 10^−3 |
| F4 | MPA | −5.0870 | −5.0764 | −5.0812 | 1.33 × 10^−3 |
| F4 | IMPA | −10.1532 | −10.1532 | −10.1532 | 0.00 |
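Each row of Table 2 condenses repeated independent runs into four statistics. A minimal sketch of how such a row can be computed; the five run results below are hypothetical, and the population standard deviation is assumed (the paper may use the sample standard deviation instead).

```python
from statistics import mean, pstdev

def summarize_runs(best_fitness_per_run):
    """Optimal, worst, mean, and std of the final best fitness over
    independent runs (for minimization, 'Optimal' is the smallest value)."""
    vals = list(best_fitness_per_run)
    return {
        "Optimal": min(vals),
        "Worst": max(vals),
        "Mean": mean(vals),
        "Std": pstdev(vals),  # population std; sample std is an alternative
    }

# Hypothetical final fitness values from five independent runs
stats = summarize_runs([3.1e-7, 4.5e-7, 2.9e-7, 6.0e-7, 3.8e-7])
```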
Table 3. Description of input features.
| No. | Feature Name | Unit |
|---|---|---|
| 1 | Ambient temperature | °C |
| 2 | Module temperature | °C |
| 3 | Wind speed | m/s |
| 4 | Relative humidity | % |
| 5 | Global horizontal irradiance | W/m² |
| 6 | Diffuse horizontal irradiance | W/m² |
| 7 | Tilted global irradiance | W/m² |
| 8 | Tilted diffuse irradiance | W/m² |
| 9 | Rainfall | mm |
| 10 | Air pressure | hPa |
Table 4. Results of the sensitivity analysis of the retained KPCA principal component number on the forecasting performance of IMPA-Att-BiGRU.
| k | Cumulative Contribution Rate | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| 3 | 0.7644 | 1.2864 | 3.2701 | 0.9631 |
| 4 | 0.8767 | 1.1943 | 2.9014 | 0.9810 |
| 5 | 0.9738 | 1.1401 | 2.0126 | 0.9960 |
| 6 | 0.9840 | 1.1299 | 2.0028 | 0.9968 |
| 7 | 0.9916 | 1.1211 | 2.0002 | 0.9971 |
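The retained component number k in Table 4 can be chosen by thresholding the cumulative contribution rate, i.e., the ratio of the top-k kernel matrix eigenvalues to the eigenvalue sum. A minimal sketch, with an illustrative eigenvalue spectrum and threshold rather than the paper's values:

```python
def choose_k(eigenvalues, threshold):
    """Smallest k whose cumulative contribution rate (share of the total
    eigenvalue sum captured by the top-k eigenvalues) reaches the threshold."""
    total = sum(eigenvalues)
    cum = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cum += lam / total
        if cum >= threshold:
            return k, cum
    return len(eigenvalues), cum

# Hypothetical KPCA eigenvalue spectrum (sums to 10.0)
k, cum = choose_k([5.0, 2.1, 1.0, 0.9, 0.5, 0.3, 0.2], threshold=0.85)
```

With this spectrum the first four components capture 90% of the total, so a 0.85 threshold retains k = 4; tightening the threshold trades a larger input dimension for slightly lower error, as Table 4 shows.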
Table 5. Quantitative performance comparison of the ablation models (BiGRU, Att-BiGRU, and IMPA-Att-BiGRU) under three weather conditions.
| Weather Conditions | Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| Sunny | BiGRU | 1.6233 | 2.3712 | 0.9757 |
| Sunny | Att-BiGRU | 1.4093 | 2.1659 | 0.9841 |
| Sunny | IMPA-Att-BiGRU | 1.0442 | 1.8317 | 0.9975 |
| Cloudy | BiGRU | 2.0037 | 4.1635 | 0.9582 |
| Cloudy | Att-BiGRU | 1.8920 | 3.1891 | 0.9811 |
| Cloudy | IMPA-Att-BiGRU | 1.1397 | 2.1186 | 0.9943 |
| Rainy | BiGRU | 2.9407 | 4.8009 | 0.9518 |
| Rainy | Att-BiGRU | 2.5626 | 4.1259 | 0.9792 |
| Rainy | IMPA-Att-BiGRU | 1.2214 | 2.6599 | 0.9929 |
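The MAE, RMSE, and R² columns in Tables 5–7 follow the standard definitions; a minimal sketch with illustrative power values (kW), not the paper's data:

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative PV output samples (kW)
y_true = [0.0, 5.0, 12.0, 18.0, 10.0]
y_pred = [0.5, 4.5, 12.5, 17.0, 10.5]
```

Note that RMSE penalizes large deviations more heavily than MAE, which is why the gap between the two columns widens under cloudy and rainy conditions, where abrupt power drops occur.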
Table 6. Quantitative performance comparison of Att-BiGRU variants optimized by different algorithms (PSO, GWO, MPA, and IMPA) under three weather conditions.
| Weather Conditions | Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| Sunny | PSO-Att-BiGRU | 1.3721 | 2.1424 | 0.9850 |
| Sunny | GWO-Att-BiGRU | 1.2928 | 2.0037 | 0.9872 |
| Sunny | MPA-Att-BiGRU | 1.1287 | 1.8191 | 0.9913 |
| Sunny | IMPA-Att-BiGRU | 1.0166 | 1.6234 | 0.9979 |
| Cloudy | PSO-Att-BiGRU | 1.7113 | 3.0279 | 0.9819 |
| Cloudy | GWO-Att-BiGRU | 1.5701 | 2.7664 | 0.9826 |
| Cloudy | MPA-Att-BiGRU | 1.3996 | 2.4639 | 0.9897 |
| Cloudy | IMPA-Att-BiGRU | 1.1781 | 2.1375 | 0.9939 |
| Rainy | PSO-Att-BiGRU | 2.4481 | 4.0536 | 0.9815 |
| Rainy | GWO-Att-BiGRU | 2.0073 | 3.6288 | 0.9830 |
| Rainy | MPA-Att-BiGRU | 1.6912 | 2.9349 | 0.9872 |
| Rainy | IMPA-Att-BiGRU | 1.2855 | 2.6652 | 0.9925 |
Table 7. Quantitative performance comparison of IMPA-Att-BiGRU with benchmark models (XGBoost, RNN, and LSTM) under three weather conditions.
| Weather Conditions | Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| Sunny | XGBoost | 2.4025 | 3.1221 | 0.9651 |
| Sunny | RNN | 2.1679 | 2.9916 | 0.9686 |
| Sunny | LSTM | 1.7247 | 2.5901 | 0.9744 |
| Sunny | IMPA-Att-BiGRU | 1.0328 | 1.7154 | 0.9976 |
| Cloudy | XGBoost | 2.7720 | 4.9501 | 0.9483 |
| Cloudy | RNN | 2.6008 | 4.6799 | 0.9512 |
| Cloudy | LSTM | 2.2317 | 4.4018 | 0.9559 |
| Cloudy | IMPA-Att-BiGRU | 1.1263 | 2.0961 | 0.9946 |
| Rainy | XGBoost | 3.1850 | 5.9014 | 0.9369 |
| Rainy | RNN | 2.8911 | 5.3261 | 0.9435 |
| Rainy | LSTM | 2.0265 | 4.9357 | 0.9499 |
| Rainy | IMPA-Att-BiGRU | 1.2534 | 2.5957 | 0.9929 |
Table 8. Quantitative performance comparison of different models in the cross-day rolling forecasting experiment (mean ± standard deviation over five folds).
| Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|
| BiGRU | 5.9577 ± 1.4617 | 8.6803 ± 0.6375 | 0.9566 ± 0.0041 |
| Att-BiGRU | 3.7591 ± 0.8224 | 6.3248 ± 0.5981 | 0.9823 ± 0.0024 |
| IMPA-Att-BiGRU | 2.2418 ± 0.2372 | 4.4049 ± 0.3122 | 0.9950 ± 0.0011 |
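The cross-day improvements quoted in the abstract follow directly from the mean values in Table 8; for example, the MAE reduction of IMPA-Att-BiGRU over BiGRU is (5.9577 − 2.2418)/5.9577 ≈ 62.4%. A short snippet reproducing these figures:

```python
def reduction_pct(baseline, improved):
    """Relative error reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Mean MAE and RMSE from Table 8 (BiGRU vs. IMPA-Att-BiGRU)
mae_red = reduction_pct(5.9577, 2.2418)   # MAE reduction, about 62.4%
rmse_red = reduction_pct(8.6803, 4.4049)  # RMSE reduction, about 49.3%
```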
Liu, S.; Fu, H.; Xie, S.; Han, H.; Liu, H.; Han, B.; Cui, P. Research on Photovoltaic Output Power Forecasting Based on an Attention-Enhanced BiGRU Optimized by an Improved Marine Predators Algorithm. Symmetry 2026, 18, 282. https://doi.org/10.3390/sym18020282
