Article

Research on Photovoltaic Output Power Forecasting Based on an Attention-Enhanced BiGRU Optimized by an Improved Marine Predators Algorithm

1 Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao 125000, China
2 Institute of Intelligence Science and Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
3 Panjin Power Supply Company, State Grid Liaoning Electric Power Co., Ltd., Panjin 124010, China
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(2), 282; https://doi.org/10.3390/sym18020282
Submission received: 22 December 2025 / Revised: 27 January 2026 / Accepted: 31 January 2026 / Published: 3 February 2026
(This article belongs to the Section Computer)

Abstract

Accurate photovoltaic (PV) output power forecasting is essential for reliable power system operation, yet rapidly changing meteorological conditions often degrade forecasting accuracy. This study proposes an attention-enhanced bidirectional gated recurrent unit (BiGRU) optimized by an improved Marine Predators Algorithm (IMPA) for PV output power forecasting. Kernel Principal Component Analysis (KPCA) is first employed to extract compact nonlinear representations and suppress redundant features. Then, a dual multi-head self-attention mechanism is integrated before and after the BiGRU layer to strengthen temporal feature learning under fluctuating weather. Finally, the IMPA is designed to improve exploration–exploitation balance and automatically optimize key hyperparameters. Experiments under sunny, cloudy, and rainy conditions demonstrate that IMPA-Att-BiGRU reduces MAE and RMSE by 35.7–58.5% and 22.8–49.1% versus BiGRU, respectively, while increasing R2 by 2.2–4.1 percentage points. Against the best benchmark (LSTM), MAE and RMSE are further reduced by 38.1–49.5% and 33.8–52.4%. Moreover, in a cross-day rolling forecasting test with fivefold results, IMPA-Att-BiGRU achieves 62.4% MAE and 49.3% RMSE reductions over BiGRU, confirming robust performance under long-horizon error accumulation.

1. Introduction

With the continuous increase in installed capacity and application scope, photovoltaic (PV) power generation has become one of the most crucial components of renewable energy power systems [1,2,3]. Accurate PV output power forecasting is of great significance for generation scheduling, grid stability maintenance, and the efficient integration of renewable energy sources [4,5,6]. However, PV output power is strongly affected by complex meteorological factors such as solar irradiance and ambient temperature [7], especially under cloudy and rainy weather conditions, which introduces strong stochasticity into accurate PV output power forecasting [8].
The recent studies on photovoltaic power forecasting can generally be categorized into physics-based or statistical methods [9]. The physics-based methods predominantly rely on capturing the meteorological information around the PV installations. Many existing studies focus primarily on solar irradiance data, while the interactions between multiple meteorological factors and PV performance are often insufficiently explored [10]. One of the most popular statistical approaches, the autoregressive integrated moving average (ARIMA) models, over-relies on linear assumptions and exhibits limited capability in capturing the nonlinear and dynamic characteristics of PV output power [11].
Data-driven methods have been applied to PV output power forecasting with increasing frequency since the rapid development of machine learning, especially representative models such as support vector machines (SVMs) [12], K-nearest neighbors (KNN) [13], and extreme gradient boosting (XGBoost) [14]. More recently, deep learning methods have been designed to effectively learn sequential dependencies in time-series forecasting tasks, particularly recurrent neural networks (RNNs) [15] and their variants, including long short-term memory (LSTM) [16] and gated recurrent unit (GRU) [17] models. Despite these advances, some challenges remain unresolved. Although attention-based models [18,19] have recently been incorporated into PV output power forecasting, many approaches still rely on single-stage attention designs, empirically selected hyperparameters without a principled global optimization procedure, and limited preprocessing to alleviate redundancy and nonlinear coupling among meteorological variables, which may collectively hinder accuracy and robustness when weather conditions fluctuate sharply.
To address these challenges, this study proposes a novel PV output power forecasting model integrating KPCA, dual-stage multi-head self-attention, and an improved Marine Predators Algorithm. Specifically, KPCA is employed as a nonlinear feature-compression front-end to extract informative latent components and alleviate redundancy and nonlinear coupling in the original inputs, thereby reducing the learning burden. On this basis, a BiGRU forecaster enhanced by a dual-stage multi-head self-attention mechanism is constructed, where attention is applied to both the KPCA-compressed inputs and the BiGRU hidden representations to enable complementary saliency learning at the input and representation levels. Furthermore, an improved Marine Predators Algorithm (IMPA) is developed to provide a systematic and reproducible hyperparameter search by strengthening global exploration and local exploitation, leading to a more reliable model configuration under diverse meteorological conditions. It is noteworthy that several variants of the improved MPA have been reported for PV-related optimization, such as VMD parameter tuning and photovoltaic system parameter identification [20,21]. However, these variants employ different enhancement operators and are typically tailored to different objectives; therefore, they are not readily transferable to hyperparameter tuning of the attention-enhanced BiGRU (Att-BiGRU) PV output power forecasting model. From the perspective of symmetry, photovoltaic output power series exhibit inherent temporal structures, especially under stable weather conditions, where power evolution presents approximate rise–peak–decline patterns. Bidirectional recurrent architectures are well suited to capture such temporal symmetry by jointly modeling past-to-future and future-to-past dependencies. 
In addition, the self-attention mechanism introduces a symmetric weighting strategy over temporal features, allowing the model to adaptively balance the contribution of different time steps. Meanwhile, the improved Marine Predators Algorithm maintains a symmetric balance between global exploration and local exploitation during parameter optimization. These symmetry-aware characteristics collectively contribute to the robustness and accuracy of the proposed forecasting model.
In summary, this study demonstrates both theoretical innovation and experimental validation in terms of PV output power forecasting accuracy, highlighting the practical applicability of the proposed IMPA-Att-BiGRU model and offering insights for future research.

2. Methodology

2.1. Kernel Principal Component Analysis for Feature Extraction

The accuracy of PV output power forecasting relies heavily on real meteorological datasets containing a large number of variables, such as solar irradiance, temperature, humidity, rainfall, and wind speed. However, too many input variables cause the prediction model to face significant nonlinear correlation and redundancy when extracting hidden features, and may even introduce unavoidable noise, increasing the complexity of the forecasting procedure. Hence, Kernel Principal Component Analysis (KPCA) is employed to compress the original inputs into a compact and informative lower-dimensional feature representation.
KPCA is selected in preference to typical dimensionality-reduction approaches such as ICA and autoencoders for two key reasons. First, PV output power is nonlinearly coupled with meteorological inputs, and KPCA captures this nonlinear structure through kernel-induced feature mapping. In contrast, ICA relies on independence assumptions that may not hold for highly correlated meteorological variables and may be less effective in capturing nonlinear relationships. Second, compared with autoencoders, KPCA introduces no additional trainable parameters and is less sensitive to initialization and training hyperparameters, which improves stability and reproducibility—particularly important when the downstream forecasting model is already a deep model.
By mapping the complex data into a high-dimensional feature space, KPCA could capture the intrinsic nonlinear structures from the original input [22]. This transformation facilitates the characterization of complex nonlinear dependencies among PV-related meteorological variables. As a result, feature redundancy and noise are effectively reduced while the essential nonlinear characteristics of the original data are well preserved. Moreover, KPCA enables nonlinear patterns embedded in the original input data to be represented in the feature space where linear principal component extraction can be efficiently performed.
Assume that the sample set is:
X_{n \times m} = [x_1, x_2, \ldots, x_n]^T, \quad x_i \in \mathbb{R}^m, \quad i = 1, 2, \ldots, n
where n denotes the number of m-dimensional samples. Subsequently, a nonlinear mapping θ is employed to project the samples into the h-dimensional feature space H:
\theta: x_i \in \mathbb{R}^m \mapsto \theta(x_i) \in \mathbb{R}^h
Then, the kernel function fk and the covariance matrix Q could be respectively represented as:
f_k(x_i, x_j) = \theta(x_i)^T \theta(x_j) = \langle \theta(x_i), \theta(x_j) \rangle, \qquad Q = \frac{1}{n-1} \sum_{j=1}^{n} \theta(x_j)\theta(x_j)^T, \quad i, j = 1, 2, \ldots, n
In this study, KPCA adopts the radial basis function (RBF) kernel to capture the nonlinear coupling between PV output power and meteorological variables. This choice is motivated by the fact that PV output power is nonlinearly coupled with meteorological drivers and the inputs are often highly correlated; the RBF kernel provides a smooth and flexible nonlinear mapping with only one bandwidth parameter, enabling KPCA to capture the coupling in a stable and reproducible manner. The kernel is defined as:
f_k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)
where σ denotes the kernel bandwidth. The bandwidth σ is determined systematically on the training set using the median heuristic by setting it to the median of pairwise Euclidean distances among training samples, and the resulting value is fixed for all experiments to ensure reproducibility. Prior to KPCA, all input variables are Z-score standardized based on the training-set statistics (zero mean and unit variance), and the same transformation is applied to the testing set to avoid information leakage. The RBF kernel is a widely used universal kernel with a single bandwidth parameter, which offers a flexible yet stable nonlinear mapping and is well suited to PV output power forecasting where meteorological inputs are highly correlated and nonlinearly coupled with PV output power.
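The median heuristic described above can be sketched in a few lines of NumPy. This is a minimal illustration of the procedure (pairwise Euclidean distances on the standardized training set, median over distinct pairs), not the authors' code; the function name is hypothetical.

```python
import numpy as np

def median_heuristic_bandwidth(X_train):
    """Set the RBF bandwidth sigma to the median pairwise
    Euclidean distance among (standardized) training samples."""
    # Pairwise squared distances via the squared-norm expansion
    sq = np.sum(X_train**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_train @ X_train.T
    d = np.sqrt(np.maximum(d2, 0.0))
    # Median over the strictly upper triangle (excludes self-distances)
    iu = np.triu_indices(len(X_train), k=1)
    return np.median(d[iu])
```

Fixing the returned value once on the training set, as the text prescribes, keeps the kernel mapping identical across all experiments.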
Consequently, the characteristic equation between the eigenvector Vk and its eigenvalue µk will be:
\mu_k V_k = Q V_k = \frac{1}{n-1} \sum_{j=1}^{n} \theta(x_j)\theta(x_j)^T V_k = \frac{1}{n-1} \sum_{j=1}^{n} \langle \theta(x_j), V_k \rangle\, \theta(x_j)
Each eigenvector Vk could be regarded as a linear combination as:
V_k = \sum_{i=1}^{n} v_{k,i}\, \theta(x_i), \quad k = 1, 2, \ldots, n
where v_{k,i} are the linear combination coefficients.
It can be inferred from Equations (3) and (5):
\mu_k \sum_{i=1}^{n} v_{k,i} K_{j,i} = \frac{1}{n-1} \sum_{i=1}^{n} v_{k,i} \sum_{l=1}^{n} K_{j,l} K_{l,i} \quad \Rightarrow \quad \lambda_k v_k = K v_k
where \lambda_k = (n-1)\mu_k, and K \in \mathbb{R}^{n \times n} is the kernel matrix with entries K_{i,j} = f_k(x_i, x_j).
The eigenvector matrix established by the covariance matrix Q’s eigenvector, Vk, is shown as:
V_f = [V_1\ V_2\ \cdots\ V_l\ V_{l+1}\ \cdots\ V_n]
When the first l principal components are selected, the matrix V_f = [V_1\ V_2\ \cdots\ V_l] \in \mathbb{R}^{n \times l} is obtained in the principal component space.
Since the eigenvector V_k should satisfy the unit-norm constraint in the feature space H, the eigenvectors of K obey:
\langle V_k, V_k \rangle = \lambda_k \langle v_k, v_k \rangle = 1
Hence each eigenvector v_k is rescaled to have norm 1/\sqrt{\lambda_k}:
v_k' = \frac{1}{\sqrt{\lambda_k}}\, v_k
Consequently, in the feature space, V f could be inferred as:
V_f = \left[\frac{1}{\sqrt{\lambda_1}} X^T v_1\ \cdots\ \frac{1}{\sqrt{\lambda_l}} X^T v_l\right] = X^T V \Lambda^{-\frac{1}{2}}, \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l), \quad V = [v_1\ v_2\ \cdots\ v_l]
where X = [\theta(x_1)\ \theta(x_2)\ \cdots\ \theta(x_n)].
By transforming the original high-dimensional meteorological inputs into a compact set of nonlinear principal components, KPCA provides a structured and noise-reduced feature representation for subsequent temporal modeling. This dimensionality reduction not only alleviates the computational burden of deep learning models but also enhances the stability of sequence learning by filtering redundant and highly correlated variables.
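The derivation above (RBF kernel, centering, eigendecomposition of K, and normalization by \sqrt{\lambda_k}) can be condensed into a small NumPy sketch. This is an illustrative implementation under the stated assumptions, not the authors' code; the function name is hypothetical.

```python
import numpy as np

def kpca_fit_transform(X, sigma, n_components):
    """Minimal KPCA: RBF kernel, double centering, eigendecomposition,
    and projection of the training samples onto the leading components."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))
    # Center the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Symmetric eigendecomposition, sorted by decreasing eigenvalue
    lam, V = np.linalg.eigh(Kc)
    idx = np.argsort(lam)[::-1][:n_components]
    lam, V = lam[idx], V[:, idx]
    # Normalize eigenvectors by sqrt(lambda), as in the derivation above
    alphas = V / np.sqrt(np.maximum(lam, 1e-12))
    return Kc @ alphas  # projected samples, shape (n, n_components)
```

Each output column then carries decreasing variance, which is the basis for selecting the first l components by contribution rate.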

2.2. BiGRU

A gated recurrent unit (GRU) is a variant of a recurrent neural network (RNN), which is updated to avoid frequent gradient vanishing or exploding. There is only one reset gate and one update gate to control the historical state in the output and the information combined with the current state [23]. The basic structure of a GRU is illustrated in Figure 1.
z_t = \mathrm{sigmoid}(W_z x_t + U_z h_{t-1} + b_z)
r_t = \mathrm{sigmoid}(W_r x_t + U_r h_{t-1} + b_r)
\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
where W, U, and b are the weight matrices and bias vectors; r_t is the reset gate vector; z_t is the update gate vector; and \tilde{h}_t is the candidate hidden state.
The GRU is a unidirectional recurrent structure whose state is transmitted from front to back. In the BiGRU, two stacked unidirectional GRUs operating in opposite directions are combined, and the output is controlled by both GRUs simultaneously. The BiGRU structure is shown in Figure 2.
BiGRU is capable of analyzing the sequence in not only forward but also backward directions, which makes it sensitive to capturing the temporal dependencies and the bidirectional correlations inherent in PV output power time series. Compared with the unidirectional structure, BiGRU enables the model to exploit the complete contextual information, which is particularly beneficial for handling the strong non-stationarity and intermittency induced by varying meteorological conditions [24]. The bidirectional structure enables the model to capture both historical and future contextual information in a symmetric manner, which is particularly beneficial for PV power series characterized by strong diurnal regularity and periodic patterns.
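A minimal NumPy sketch of the gate equations and the bidirectional pass may help make this concrete. The parameter layout (dictionaries of W, U, b matrices) and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step: reset gate r_t, update gate z_t, candidate state,
    and the convex-combination hidden-state update."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])
    return (1 - z) * h_prev + z * h_cand

def bigru_forward(X_seq, params_f, params_b, hidden):
    """BiGRU: one GRU run forward and one backward over the sequence,
    concatenating the two hidden states at every time step."""
    T = len(X_seq)
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    out_f, out_b = [], [None] * T
    for t in range(T):
        hf = gru_step(X_seq[t], hf, *params_f)
        out_f.append(hf)
    for t in reversed(range(T)):
        hb = gru_step(X_seq[t], hb, *params_b)
        out_b[t] = hb
    return np.stack([np.concatenate([f, b]) for f, b in zip(out_f, out_b)])
```

The concatenated state at each step thus summarizes both past-to-future and future-to-past context, which is the symmetry property the text emphasizes.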

2.3. Multi-Head Self-Attention Mechanism

In PV output power forecasting tasks, meteorological variability exhibits strong nonlinearity and temporal heterogeneity, making it difficult for conventional sequence models to consistently identify key influencing factors. Although the BiGRU model is capable of modeling temporal dependencies, it often lacks an explicit mechanism to distinguish the relationship among the input features.
The multi-head self-attention mechanism enables efficient modeling of long-term dependencies and integrates information from the entire sequence by allowing every position to directly attend to all other positions, regardless of their temporal distance. Meanwhile, the multi-head self-attention mechanism allows multiple attention subspaces to be learned in parallel, facilitating the extraction of diverse feature interactions and temporal patterns. Specifically, each input vector is projected into query, key, and value representations. The similarity between the query and key determines an attention weight, which reflects the relevance of one position to another. In this way, multi-head self-attention enables flexible, content-dependent interactions between distant time steps or features and selectively emphasizes critical information while suppressing irrelevant or redundant components. Given d-dimensional inputs, the input sequence X is:
X = [x_1, \ldots, x_N]
where N represents the sequence length. The query, key, and value matrices, respectively denoted Q, K, and V, are obtained by three linear transformations of the input with the corresponding weight matrices W_Q, W_K, and W_V; the attention score A_i of each of the n heads is then computed as:
Q = X W_Q, \quad K = X W_K, \quad V = X W_V
A_i = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V
where d_k is the dimension of the key vectors. Ultimately, as shown in Figure 3, the multi-head attention output will be:
M_n = \mathrm{Concat}(A_1, A_2, \ldots, A_n)\, W^O
where WO is the output projection matrix.
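The projection, per-head scaled dot-product attention, and output concatenation can be sketched as follows. This is a minimal NumPy illustration of the equations above (head dimension d_k = d / n); names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, WQ, WK, WV, WO, n_heads):
    """Multi-head scaled dot-product self-attention:
    Q = X W_Q, K = X W_K, V = X W_V, heads concatenated, then W^O."""
    N, d = X.shape
    dk = d // n_heads
    Q, K, V = X @ WQ, X @ WK, X @ WV
    heads = []
    for i in range(n_heads):
        s = slice(i * dk, (i + 1) * dk)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dk))  # (N, N) attention weights
        heads.append(A @ V[:, s])                       # weighted values per head
    return np.concatenate(heads, axis=1) @ WO
```

Each row of A sums to one, so every output position is a convex combination of the value vectors, weighted by content similarity.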
In the proposed forecasting model, two layers of multi-head self-attention mechanisms are embedded both before and after the BiGRU layers to enhance feature representation and temporal dependency modeling from complementary perspectives. The input-level multi-head self-attention applied before the BiGRU layer could adaptively evaluate the relative importance of different input features and temporal positions after KPCA-based dimensionality reduction. By assigning higher attention weights to informative meteorological patterns and suppressing redundant or noisy components, this module enables the model to better cope with the strong variability and non-stationarity caused by changing weather conditions, thereby providing more discriminative inputs for subsequent temporal modeling. Subsequently, the output-level multi-head self-attention mechanism is further employed to reweight the hidden representations generated by the BiGRU model, aiming to preserve the deep temporal dependencies between the input and the output. The attention-enhanced design enables the forecasting model to maintain a balanced focus on informative time steps while preserving the temporal symmetry captured by the bidirectional recurrent structure, effectively integrating feature-level selection and temporal-level refinement, resulting in improved PV output power forecasting accuracy.

2.4. Improved Marine Predators Algorithm

The Marine Predators Algorithm (MPA) is an intelligent optimization algorithm whose inspiration comes from the natural preying rules in the marine ecosystem [25]. As a population-based method, the MPA designs its initial solution X0 as uniformly distributed within the range of [Xmin, Xmax]:
X_0 = X_{\min} + rand \times (X_{\max} - X_{\min}), \quad rand \in (0, 1)
The prey (the initial individual), Prey, and the top predator (the fittest solution), Elite, are respectively defined as:
Elite = \begin{bmatrix} X^I_{1,1} & X^I_{1,2} & \cdots & X^I_{1,d} \\ X^I_{2,1} & X^I_{2,2} & \cdots & X^I_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X^I_{n,1} & X^I_{n,2} & \cdots & X^I_{n,d} \end{bmatrix}_{n \times d}
Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,d} \end{bmatrix}_{n \times d}
where X_{i,j} represents the jth dimension of the ith individual.
According to the proportional relationship between the current iteration t and the maximum iteration tmax, the complete optimization process is divided into three phases:
Phase 1 (t < t_{\max}/3): The individual is moving faster than the fittest solution:
s_i = R_B \otimes (E_i - R_B \otimes P_i), \quad P_i = P_i + 0.5\,R \otimes s_i, \quad R \in [0, 1]
where E_i and P_i respectively represent the ith predator and prey, \otimes denotes entry-wise multiplication, s_i is the individual's movement step size, and R_B denotes the Brownian motion vector of random numbers drawn from a normal distribution.
Phase 2 (t_{\max}/3 < t < 2t_{\max}/3): Fifty percent of the population is selected for exploration and the other half for exploitation. The mathematical model is applied as:
For the first 50% of the population (i = 1, \ldots, n/2):
s_i = R_L \otimes (E_i - R_L \otimes P_i), \quad P_i = P_i + 0.5\,R \otimes s_i
For the second 50% of the population (i = n/2 + 1, \ldots, n):
s_i = R_B \otimes (R_B \otimes E_i - P_i), \quad P_i = E_i + 0.5\,C \otimes s_i
C = (1 - t/t_{\max})^{2t/t_{\max}}
where RL denotes the Levy movement vector of random numbers based on Levy distribution, and C is an adaptive parameter.
Phase 3 (t > 2t_{\max}/3): The fittest solution is moving faster than the individual:
s_i = R_L \otimes (R_L \otimes E_i - P_i), \quad P_i = E_i + 0.5\,C \otimes s_i
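The three-phase position update can be sketched compactly in NumPy. This is an illustrative implementation of the standard MPA update rules described above (Levy steps via Mantegna's algorithm), not the authors' code; function names and the Levy exponent beta = 1.5 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def levy(size, beta=1.5):
    """Levy-distributed steps via Mantegna's algorithm (used for R_L)."""
    from math import gamma, sin, pi
    num = gamma(1 + beta) * sin(pi * beta / 2)
    den = gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    return rng.normal(0, sigma, size) / np.abs(rng.normal(0, 1, size)) ** (1 / beta)

def mpa_update(prey, elite, t, t_max):
    """One iteration of the three-phase MPA position update."""
    n, d = prey.shape
    C = (1 - t / t_max) ** (2 * t / t_max)
    new = prey.copy()
    for i in range(n):
        R = rng.random(d)
        if t < t_max / 3:                    # Phase 1: Brownian exploration
            RB = rng.normal(size=d)
            new[i] = prey[i] + 0.5 * R * (RB * (elite[i] - RB * prey[i]))
        elif t < 2 * t_max / 3:              # Phase 2: half explore, half exploit
            if i < n // 2:
                RL = levy(d)
                new[i] = prey[i] + 0.5 * R * (RL * (elite[i] - RL * prey[i]))
            else:
                RB = rng.normal(size=d)
                new[i] = elite[i] + 0.5 * C * (RB * (RB * elite[i] - prey[i]))
        else:                                # Phase 3: Levy exploitation
            RL = levy(d)
            new[i] = elite[i] + 0.5 * C * (RL * (RL * elite[i] - prey[i]))
    return new
```

The phase boundary is driven purely by the iteration ratio t/t_max, matching the description above.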

2.4.1. Information Exchanging and Quasi-Opposition-Based Learning

To avoid overlooking valid information in the search region or falling into local stagnation during optimization, each individual should exchange information with other individuals in the region. For each individual P_i, another individual P_j (i ≠ j) is selected randomly from the population for information exchange. The new candidate solution of P_i is then obtained as:
P_{i\_new} = \begin{cases} P_i + R \times (E_i - P_j), & \text{if } p_e < 0.5 \\ P_i - R \times (E_i - P_j), & \text{otherwise} \end{cases}
where Pi is updated by Pi_new only if its fitness is worse than the new one, and pe ∈ [0, 1] is a random number that is responsible for controlling the direction of information exchange. The new solution Pi_new is designed to carry beneficial advantages offered by the fittest solution and other individuals simultaneously.
After the whole population completes information exchange, where each individual P_i has compared its fitness with P_{i_new}, a new candidate prey population is generated. Subsequently, quasi-opposition-based learning is applied to expand the convergence region and further diversify the population. In this strategy, a randomly selected set of individuals is mapped to the corresponding opposite individuals:
P_i^o = X_{\max} + X_{\min} - P_i
where P_i^o is the opposite solution of P_i. After finding the center of the search space P_c and the opposite solution, a quasi-opposite solution P_i^q is generated at a random position within the range between them:
P_c = \frac{X_{\max} + X_{\min}}{2}
P_i^q = \begin{cases} P_c + rand \times (P_i^o - P_c), & \text{if } P_i^o > P_c \\ P_i^o + rand \times (P_c - P_i^o), & \text{otherwise} \end{cases}
Then, a new individual matrix is constructed by both the primitive individuals and the quasi-opposite ones simultaneously, from which the first n solutions sorted by their fitness are selected for the new candidate population, which is kept for the next iteration.
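The quasi-opposition step above reduces to sampling uniformly between the search-space centre and the opposite solution. A minimal sketch (function name hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def quasi_opposite(P, x_min, x_max):
    """Quasi-opposition-based learning: reflect each individual and sample
    a point between the search-space centre and the opposite solution."""
    P_opp = x_max + x_min - P          # opposite solution P^o
    P_c = (x_max + x_min) / 2.0        # centre of the search space
    r = rng.random(P.shape)
    lo = np.minimum(P_c, P_opp)
    hi = np.maximum(P_c, P_opp)
    return lo + r * (hi - lo)          # uniform between centre and opposite
```

Merging these quasi-opposite points with the original individuals and keeping the n fittest, as the text describes, preserves population size while injecting diversity.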

2.4.2. Improvement of Exploration and Exploitation Methods

In the second phase of the MPA (t_{\max}/3 < t < 2t_{\max}/3), half of the population is assigned to exploration and the other half to exploitation. To enhance both the exploration and exploitation capabilities of the algorithm simultaneously, the individual position update method of the GWO is introduced to improve the corresponding process.
Inspired by the prey-hunting activity of grey wolves, the Grey Wolf Optimizer (GWO) has been applied in various applications due to its advantages in exploration and exploitation [26,27,28]. In the GWO, A is an important parameter, and the algorithm controls its exploration and exploitation through the scope of A. The calculation method for A is as follows:
A = 2a \otimes R - a, \quad R \in [0, 1]
where a is a variable factor, whose changing trend is shown in Figure 4:
The process of updating the position of the grey wolf is:
W_i = W_i^P - A \otimes \left| 2R \otimes W_i^P - W_i \right|
where Wi and W i P respectively represent the ith wolf and prey.
Therefore, in the second phase of the IMPA, the exploration and exploitation methods are improved as:
For the first 50% of the population (i = 1, \ldots, n/2):
s_i = R_L \otimes \left( E_i - R_L \otimes \left( E_i - A \otimes \left| 2R \otimes E_i - P_i \right| \right) \right), \quad P_i = P_i + 0.5\,R \otimes s_i
For the second 50% of the population (i = n/2 + 1, \ldots, n):
s_i = R_B \otimes \left( R_B \otimes E_i - \left( E_i - A \otimes \left| 2R \otimes E_i - P_i \right| \right) \right), \quad P_i = E_i + 0.5\,C \otimes s_i
C = (1 - t/t_{\max})^{2t/t_{\max}}
From an optimization perspective, the original MPA may suffer from an imbalance between global exploration and local exploitation. By symmetrically constraining the exploration and exploitation process, the improved algorithm maintains a dynamic balanced interaction between different individuals, preventing premature convergence while preserving search efficiency.

2.4.3. Refracted Opposition-Based Learning

The MPA performs well in early iterations but degrades in later iterations due to the gradual increase in individual similarity. Consequently, the IMPA introduces a refracted opposition-based learning strategy for individual mutation to enhance population diversity in the later iterations. Refracted opposition-based learning applies the refraction principle of light to ameliorate the update process of the predator as follows:
E_i^R = \frac{(m + 1)(X_{\max} + X_{\min}) - 2E_i}{2m}
where E i R is the predator that mutates through the refracted opposition-based learning and m is a variable refractive index parameter, while the length of refracting light could be changed by regulating its value so that the algorithm can escape from the local minimum. For this, the value of m ∈ [mmin, mmax] can be written by the linear gradient:
m = m_{\max} - (m_{\max} - m_{\min})\,(t / t_{\max})
where the value of m decreases linearly with the number of iterations. Through refracted opposition-based learning, the original predator E_i mutates to generate E_i^R; both then exploit through Levy flight simultaneously, and the one with better fitness is chosen for the low-velocity-ratio iteration so that the optimal solution is not overlooked. If E_i^R is the better predator, the mathematical model in the low-velocity ratio is updated as follows:
s_i^R = R_L \otimes (R_L \otimes E_i^R - P_i), \quad P_i^R = E_i^R + 0.5\,C \otimes s_i^R
where s i R is the step size of the movement of E i R , and it will serve as a basis for the mutation prey P i R to update its position.
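The refraction mutation itself is a one-line transform. A minimal sketch, assuming the refractive-index bounds m_min and m_max are tunable constants (the paper does not state their values here):

```python
import numpy as np

def refracted_opposite(E, x_min, x_max, t, t_max, m_min=1.0, m_max=10.0):
    """Refracted opposition-based learning: mutate the predator with a
    linearly decreasing refractive index m (m_min/m_max are assumed bounds)."""
    m = m_max - (m_max - m_min) * (t / t_max)
    return ((m + 1) * (x_max + x_min) - 2.0 * E) / (2.0 * m)
```

Note that for m = 1 the transform reduces to ordinary opposition-based learning, X_max + X_min − E, so the linear schedule smoothly interpolates between a long-range reflection early on and plain opposition at the end.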
The pseudocode of the proposed IMPA is provided in Table 1.

2.4.4. Performance Evaluation of IMPA

To assess the effect of the optimization search for the IMPA, the typical unimodal benchmark functions F1, F2, and the typical multimodal benchmark functions F3, F4, with theoretical optima of 0, 0, 0, and −10.1532, respectively, are selected for simulation experiments. Meanwhile, the MPA, the grey wolf optimizer (GWO), and particle swarm optimization (PSO) [29] are set as the comparison algorithms. The benchmark functions are shown in the following equations:
F_1 = \sum_{i=1}^{n} x_i^2
F_2 = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2
F_3 = \frac{\pi}{n} \left\{ 10 \sin^2(\pi y_1) + \sum_{i=1}^{n-1} (y_i - 1)^2 \left[ 1 + 10 \sin^2(\pi y_{i+1}) \right] + (y_n - 1)^2 \right\} + \sum_{i=1}^{n} u(x_i, 10, 100, 4)
y_i = 1 + \frac{x_i + 1}{4}, \qquad u(x_i, a, k, m) = \begin{cases} k (x_i - a)^m, & x_i > a \\ 0, & -a \le x_i \le a \\ k (-x_i - a)^m, & x_i < -a \end{cases}
F_4 = -\sum_{i=1}^{5} \left[ (X - a_i)(X - a_i)^T + c_i \right]^{-1}
The landscapes of F1 to F4 are shown in Figure 5, Figure 6, Figure 7 and Figure 8. Independent optimization tests are performed with each algorithm on each benchmark function, with 500 iterations per run. The standard deviation, the mean, the worst value, and the optimal value are separately calculated and shown in Table 2.
As shown in Table 2, the IMPA demonstrates superior performance over the comparative algorithms on both the unimodal and multimodal benchmark functions. In particular, the theoretical optimal result is found by the IMPA with a standard deviation of 0 for the functions F1 and F2 and the multimodal function F4. This shows that the computational accuracy and robustness of the IMPA are improved via the introduction of information exchange and quasi-opposition-based learning, the improvement of exploration and exploitation, and refracted opposition-based learning. Moreover, as can be seen in Figure 9, Figure 10, Figure 11 and Figure 12, the IMPA requires the fewest iterations to reach the optimal value among the compared algorithms, indicating that convergence speed is improved by increasing the proportion of high-quality individuals. In summary, compared with the MPA, PSO, and GWO algorithms, the IMPA has better convergence ability, robustness, and local extremum escape performance, confirming the effectiveness of the proposed improvement strategies.

2.5. IMPA-Att-BiGRU PV Output Power Forecasting Model

The forecasting performance of the BiGRU model is sensitive to several key hyperparameters, including the learning rate, the hidden units, and the dropout rate. Therefore, the IMPA algorithm is introduced to perform global optimization of these hyperparameters. Meanwhile, the multi-head self-attention mechanism is incorporated to enhance temporal feature learning and to adaptively extract the most informative patterns from the input obtained from KPCA. The flowchart of the IMPA-Att-BiGRU forecasting model is illustrated in Figure 13, while the corresponding data flow is shown in Figure 14.
Step 1: Historical time-series sequences of the meteorological variable data are used as the model input.
Step 2: The input data is divided into the training dataset and the testing dataset in a ratio of 80% and 20%.
Step 3: KPCA is applied to reduce the dimensionality of the input data and extract the nonlinear principal components used for subsequent model construction.
Step 4: The key hyperparameters of the BiGRU are identified and the IMPA is used to optimize the hyperparameter set.
Step 5: The multi-head self-attention modules are applied both before and after the BiGRU layers to capture complementary temporal and feature representations.
Step 6: The IMPA-Att-BiGRU forecasting model is trained using the training dataset.
Step 7: The forecasting performance of IMPA-Att-BiGRU is evaluated using the testing dataset.

3. Results and Discussion

In this study, two complementary PV output power forecasting experiments are conducted to evaluate the proposed IMPA-Att-BiGRU model, including single-day one-step (15 min) prediction and cross-day rolling forecasting over four consecutive days. Three commonly used evaluation metrics, shown in Equation (35), are adopted to quantitatively evaluate forecasting performance: the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination (R2).
MAE = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|, \qquad RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}, \qquad R^2 = 1 - \frac{\sum_i (\hat{y}_i - y_i)^2}{\sum_i (y_i - \bar{y})^2}
where y ^ i and yi represent the ith sample’s forecasted value and actual value, respectively; N is the total number of samples; and y ¯ is the mean of the actual values.
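The three metrics translate directly into NumPy; this small sketch (function name hypothetical) mirrors the definitions in Equation (35):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for a forecast against actual values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
    return mae, rmse, r2
```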
For fair comparison, the backbone architecture of the proposed IMPA-Att-BiGRU model is kept consistent across the two complementary experiments. Specifically, the BiGRU forecaster employs a fixed two-layer configuration to balance representation capacity and generalization, and the second-layer hidden units are set to half of the first-layer units to control model complexity. The learning rate, first-layer hidden units, and dropout rate are automatically tuned by the IMPA, with the search space defined as learning rate in [3 × 10−4, 3 × 10−3] sampled on a logarithmic scale, hidden units in [64, 256] with a step size of 16, and dropout in [0.10, 0.35] with a step size of 0.05. The IMPA configuration is fixed across experiments with a population size of 30 and a maximum iteration budget of 200, and an early-termination rule is applied to stop the optimization if the best validation fitness does not improve for 30 consecutive iterations.

3.1. Single-Day One-Step Forecasting Experiment

3.1.1. Dataset Processing

The dataset used in this study was collected from an operational PV generation system located in southern China (installed capacity: 50 kW). Measurements were recorded every 15 min between 08:00 and 20:00 each day, covering the period from 1 January 2014 to 10 December 2018, which provides long-term routine operational data for PV forecasting. The dataset was divided into training and testing subsets with a ratio of 80% and 20%, respectively. Although all experiments adopted the same split ratio, independent data partitions were generated for different experimental settings using fixed random seeds to ensure reproducibility and experimental fairness.
Because the dataset was collected from a real operating PV generation system, the measurements are inevitably affected by practical issues such as sensor noise, communication dropouts, missing records, and occasional gross errors. Data reconciliation is therefore a necessary front-end step in real-world PV output power forecasting pipelines: it ensures the reliability of the initial inputs and prevents error propagation to downstream learning and evaluation. Data reconciliation was applied to improve input reliability by handling missing values and suppressing gross errors (outliers), following the robust reconciliation frameworks in Refs. [30,31]. In practice, preprocessing rules were fitted on the training set only, and the same rules were then applied to the testing set to avoid information leakage. First, missing values were detected by checking each variable against its sampling timeline. Short gaps (no more than four consecutive time steps) were imputed using linear interpolation in time, whereas longer consecutive gaps were treated as invalid segments and excluded from model training. Second, gross errors were identified by comparing measurements with reconciled estimates obtained from a robust reconciliation model. The reconciliation was formulated by minimizing weighted residuals between measured and reconciled values, with an M-estimation penalty to reduce the influence of abnormal observations. Samples with absolute normalized residuals larger than 3 were flagged as outliers and replaced by their reconciled estimates.
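The preprocessing rules above can be sketched as follows. This is a simplified stand-in: a rolling median plus a MAD-based scale substitutes for the full M-estimation reconciliation model of Refs. [30,31], and the helper name `reconcile` is illustrative:

```python
import numpy as np
import pandas as pd

def reconcile(series, max_gap=4, z_thresh=3.0):
    """Fill short gaps, leave long gaps missing, and replace gross errors
    (|normalized residual| > z_thresh) with a robust reconciled estimate."""
    s = series.copy()
    # identify runs of consecutive missing values and their lengths
    isna = s.isna()
    gap_id = (isna != isna.shift()).cumsum()
    gap_len = isna.groupby(gap_id).transform("sum")
    short = isna & (gap_len <= max_gap)
    # interpolate everywhere, but keep the fills only inside short gaps
    filled = s.interpolate(method="linear", limit_area="inside")
    s[short] = filled[short]
    # robust reference and scale (rolling median / MAD as a surrogate
    # for the M-estimation reconciliation model)
    ref = s.rolling(9, center=True, min_periods=1).median()
    resid = s - ref
    mad = np.nanmedian(np.abs(resid - np.nanmedian(resid)))
    scale = 1.4826 * mad if mad > 0 else np.nanstd(resid)
    outlier = np.abs(resid) / scale > z_thresh
    s[outlier] = ref[outlier]  # replace flagged samples with reconciled values
    return s
```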
Ten meteorological and operational variables were used as input features in the dataset. These variables describe irradiance conditions, atmospheric states, and PV module operating characteristics. The details of the data characteristics are shown in Table 3.
At first, KPCA was applied prior to model training in order to remove the feature components with low explanatory power. KPCA enabled the extraction of the most informative nonlinear structures from the original 10-dimensional feature space while eliminating redundancy among variables. The KPCA adopted an RBF kernel, and the bandwidth was set to σ = 4.176, which was estimated on the Z-score standardized training set using the median heuristic. The contribution rate results of each principal component are illustrated in Figure 15.
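A minimal sketch of this KPCA front-end is given below, using scikit-learn with the median heuristic for the RBF bandwidth. Note that scikit-learn parameterizes the kernel as exp(−γ‖x−y‖²), so γ = 1/(2σ²); the helper name and return layout are illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

def fit_kpca(X_train, n_components=5):
    """Z-score standardization, then RBF kernel PCA with the bandwidth
    set by the median heuristic on the standardized training set."""
    scaler = StandardScaler().fit(X_train)
    Z = scaler.transform(X_train)
    sigma = np.median(pdist(Z))        # median heuristic bandwidth
    gamma = 1.0 / (2.0 * sigma ** 2)   # convert sigma to sklearn's gamma
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma).fit(Z)
    return scaler, kpca

# new data is transformed with the fitted front-end:
# F = kpca.transform(scaler.transform(X_new))
```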
As depicted in Figure 15, KPCA revealed a sharp decay in eigenvalue magnitude after the fifth principal component, indicating an intrinsic low-dimensional manifold within the original 10-dimensional feature space, while the cumulative contribution rate of principal components 1–5 exceeded 95%. Although a cumulative contribution rate of at least 95% is a common guideline for selecting the number of retained components, forecasting performance may still depend on the retained component number k. Therefore, a sensitivity analysis was conducted by varying k from 3 to 7. To isolate the effect of k, all other settings, including the evaluation split and training protocol, were kept unchanged. The corresponding sensitivity analysis results for the proposed IMPA-Att-BiGRU model are shown in Table 4.
Based on the sensitivity analysis in Table 4, k = 5 was selected as the number of retained KPCA principal components. When k was reduced to 3 or 4, forecasting accuracy degraded noticeably, indicating that retaining too few components discards informative nonlinear variability that is important for PV output power forecasting—particularly around peaks and rapid fluctuations, to which RMSE is more sensitive. In contrast, increasing k beyond 5 yielded only marginal gains: the improvements from k = 5 to k = 6 or k = 7 were minor, suggesting a clear performance plateau where additional components mainly introduce redundant information. Therefore, k = 5 provides a favorable trade-off between compact representation (cumulative contribution rate 0.9738) and forecasting accuracy. Accordingly, a 5-dimensional feature representation was adopted as the compact input for the subsequent experiments, without sacrificing the nonlinear structure of the data.

3.1.2. Component-Wise Ablation Study of the Proposed Model

To verify the individual contribution of each enhanced module in IMPA-Att-BiGRU, ablation experiments were conducted under normal weather conditions (sunny) and abnormal weather conditions (cloudy and rainy), comparing the BiGRU, Att-BiGRU, and IMPA-Att-BiGRU models for PV output power forecasting. Figure 16, Figure 17 and Figure 18 illustrate the representative forecasting results under each weather condition on a single representative test day, while the quantitative error metrics are summarized in Table 5.
As shown by the ablation results, the proposed IMPA-Att-BiGRU model achieved the best values on all three evaluation metrics under every weather condition, including sunny, rainy, and cloudy. By contrast, the original BiGRU model, without any enhancement or optimization, had the highest errors regardless of weather conditions.
It should be noted that the meteorological parameters were stable on sunny days, so the variation trend of PV output power was roughly similar across sunny days, whereas the meteorological parameters varied randomly on cloudy and rainy days; as a result, every forecasting model was more accurate on sunny days than on cloudy and rainy days. Nevertheless, the experimental results show that under cloudy and rainy conditions, Att-BiGRU fitted the trend of the actual PV output power better than the original BiGRU model, which confirms that the multi-head self-attention mechanism improves feature extraction and analysis. In terms of accuracy, although the forecasting error of Att-BiGRU was already lower than that of the original BiGRU, accuracy under all weather conditions improved further after hyperparameter optimization by the proposed IMPA.
In summary, the results show that introducing the multi-head self-attention mechanism and the IMPA simultaneously yields the greatest improvement in PV output power forecasting accuracy. IMPA-Att-BiGRU captures both the hidden temporal dependencies and the short-term fluctuations, confirming that the multi-head self-attention mechanism and the IMPA provide complementary enhancements.

3.1.3. Performance Comparison of Optimization Algorithms in PV Output Power Forecasting

To further verify the effectiveness of the IMPA for hyperparameter optimization of the PV output power forecasting model, comparative experiments were conducted against the three other optimization algorithms mentioned above (PSO, GWO, and MPA). The forecasting results under sunny, cloudy, and rainy weather conditions are illustrated in Figure 19, Figure 20 and Figure 21, while the quantitative evaluation results in terms of MAE, RMSE, and R2 are summarized in Table 6.
Under sunny weather conditions, all the optimized forecasting models were able to closely track the smooth diurnal variation of the PV output power, as shown in Figure 19. However, noticeable performance differences were still observed. PSO-Att-BiGRU exhibited the largest prediction deviation, with an MAE of 1.3721 and an RMSE of 2.1424, whereas GWO-Att-BiGRU and MPA-Att-BiGRU achieved progressively improved accuracy. In contrast, the proposed IMPA-Att-BiGRU achieved the best performance, reducing the MAE and RMSE to 1.0166 and 1.6234, respectively, and yielding the highest R2 value of 0.9979.
For cloudy conditions, characterized by moderate irradiance fluctuations, the performance advantage of IMPA-Att-BiGRU became more pronounced, as illustrated in Figure 20. Compared with MPA-Att-BiGRU, the proposed method reduced MAE and RMSE by approximately 15.8% and 13.3%, respectively. Meanwhile, IMPA-Att-BiGRU also outperformed GWO-Att-BiGRU and PSO-Att-BiGRU, achieving the highest R2 value of 0.9939. These results indicate that the proposed improvement strategies of the IMPA can effectively enhance the exploration and exploitation balance of the original algorithm, enabling the forecasting model to better capture the highly fluctuating power variations under cloudy conditions.
Under rainy weather conditions, where PV output exhibited severe intermittency and rapid power ramps, all models experienced increased forecasting difficulty, as shown in Figure 21. Nevertheless, IMPA-Att-BiGRU consistently maintained superior performance. As reported in Table 6, the MAE and RMSE of IMPA-Att-BiGRU were reduced to 1.2855 and 2.6652, respectively, which were significantly lower than those obtained by PSO-Att-BiGRU, GWO-Att-BiGRU, and MPA-Att-BiGRU. Moreover, IMPA-Att-BiGRU achieved the highest R2 value of 0.9925, indicating a more robust capability in capturing both abrupt fluctuations and overall power trends under highly volatile weather conditions. These results indicate that the proposed IMPA not only maintains local exploitation but also enhances global exploration. Compared with PSO, GWO, and the original MPA, the IMPA tunes the hyperparameters of the forecasting model more effectively, which confirms the effectiveness of the improvement strategies.
In summary, the IMPA’s enhancements are particularly beneficial for hyperparameter tuning because the fitness landscape is expensive and often multimodal. The information-exchange mechanism accelerates early convergence by propagating effective hyperparameter patterns across individuals, while the quasi-opposite learning increases diversity and helps escape local optima without increasing the population size. The improved exploration–exploitation dynamics provide a more balanced global search and local refinement, which stabilizes convergence under a limited evaluation budget. Finally, the refracted opposition strategy refines the elite solution by probing a targeted counterpart around the current best hyperparameters and retaining it only when it improves validation fitness, reducing premature stagnation and improving the robustness of the final tuned configuration.
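To illustrate the quasi-opposite learning component described above, one common formulation samples each dimension uniformly between the search-interval centre and the opposite point; the function below is a generic sketch of this idea rather than the exact update of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def quasi_opposite(x, lb, ub):
    """Quasi-opposition-based learning: for each dimension, sample
    uniformly between the interval centre c = (lb + ub) / 2 and the
    opposite point o = lb + ub - x. This injects diversity without
    enlarging the population."""
    x, lb, ub = map(np.asarray, (x, lb, ub))
    c = (lb + ub) / 2.0            # interval centre
    o = lb + ub - x                # opposite point
    lo, hi = np.minimum(c, o), np.maximum(c, o)
    return rng.uniform(lo, hi)

# In the merge step, whichever of {x, quasi_opposite(x)} has the better
# fitness is kept when truncating back to the original population size.
```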

3.1.4. Performance Comparison with Benchmark Forecasting Models

Three commonly used prediction models (XGBoost, RNN, and LSTM) were selected for comparative experiments to further assess the performance of IMPA-Att-BiGRU in PV output power forecasting.
These models represent classical machine learning approaches and commonly adopted recurrent neural network architectures for time-series forecasting. All benchmark models were trained and evaluated under the same experimental settings to ensure a fair comparison. Figure 22, Figure 23 and Figure 24 illustrate the forecasting results of different models under different weather conditions, while the corresponding quantitative evaluation results in terms of MAE, RMSE, and R2 are summarized in Table 7.
Under sunny conditions, all benchmark models were generally able to follow the smooth diurnal variation of PV output power; however, noticeable deviations still occurred around the ramp-up and peak periods. The proposed IMPA-Att-BiGRU achieved the highest forecasting accuracy, reducing MAE and RMSE by 40.12% and 33.77%, respectively, compared with the best benchmark forecasting model (LSTM), while simultaneously yielding a higher R2. The experimental results indicate that IMPA-Att-BiGRU provides a substantially tighter fit to the actual power curve even under relatively stable irradiance conditions.
For cloudy conditions, characterized by moderate and irregular irradiance fluctuations, the performance differences among the models became more obvious. IMPA-Att-BiGRU significantly outperformed the benchmark models, achieving MAE and RMSE reductions of 49.53% and 52.37%, respectively, relative to the best-performing benchmark model (LSTM). For rainy conditions, PV output power showed frequent intermittency and ramp events, while the proposed IMPA-Att-BiGRU model still maintained better performance, achieving MAE and RMSE reductions of 38.15% and 47.41%, respectively, compared with the best-performing benchmark model (LSTM). Moreover, since it consistently maintained the highest R2, the proposed model remains reliable when forecasting PV output power under diverse weather changes. In summary, these results validate the effectiveness of IMPA-Att-BiGRU for practical PV output power forecasting applications.

3.2. Cross-Day Rolling Forecasting Experiment

In addition to the single-day one-step evaluation, a four-day rolling forecasting protocol was designed to examine whether the IMPA-Att-BiGRU model can maintain stable accuracy under long-horizon error accumulation, and an additional cross-region dataset was used to further examine generalization beyond the single-day setting. This dataset was collected from a PV generation system located in northwestern China (installed capacity: 50 kW). Measurements were recorded at 15 min intervals from 1 January 2016 to 14 October 2019, covering 24 h per day. The dataset was divided into training and testing subsets at a ratio of 80%/20%. Meanwhile, a fivefold cross-validation scheme was employed on the full timeline of the dataset. In each fold, the model was trained on the corresponding training split and tested on the held-out split, where rolling forecasting was performed for all valid starting points whose four-day horizons lay completely within the test segment. Specifically, a recursive rolling strategy was adopted: at each step, the one-step-ahead prediction was fed back as part of the input for the next step, so that errors could accumulate over the four-day horizon. The final results are reported as the mean ± standard deviation across the five folds. Unless otherwise specified, all model configurations, hyperparameter search ranges, training settings, and evaluation criteria in this section were kept exactly the same as those in Section 3.1. This design ensured a fair comparison and isolated the impact of the dataset and the cross-day rolling setting on forecasting performance, rather than confounding factors introduced by different parameter choices. Figure 25 illustrates the forecasting results of the different models in the cross-day rolling forecasting experiment, while the corresponding quantitative evaluation results in terms of MAE, RMSE, and R2 are summarized in Table 8.
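The recursive rolling strategy can be sketched generically as follows; `model` stands for any fitted one-step forecaster (the wrapper name and interface are illustrative), and for the four-day horizon at 15 min resolution, n_steps would be 4 × 96 = 384:

```python
import numpy as np

def recursive_rolling_forecast(model, history, n_steps):
    """Recursive multi-step forecasting: each one-step-ahead prediction
    is appended to the input window and fed back for the next step, so
    errors may accumulate over the horizon. `model` is any object with
    a predict(window) -> scalar method."""
    window = list(history)
    preds = []
    for _ in range(n_steps):
        y_hat = model.predict(np.asarray(window))
        preds.append(y_hat)
        window.append(y_hat)   # feed the prediction back as an input
        window.pop(0)          # keep the input window a fixed length
    return np.asarray(preds)
```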
Because the cross-day experiment adopted a recursive multi-step strategy over a four-day horizon and covered a full 24 h period, with one-step-ahead predictions iteratively fed back as inputs for subsequent steps, it was more challenging than the single-day evaluation and larger absolute errors were expected. As shown in Figure 25, all models could reproduce the overall diurnal evolution of PV output power, whereas noticeable deviations emerged around sharp ramps and peak regions due to long-horizon error accumulation. The quantitative results in Table 8 further confirm this observation. Compared with the plain BiGRU, Att-BiGRU substantially improved accuracy by introducing the dual multi-head self-attention module, reducing MAE and RMSE from 5.9577 kW and 8.6803 kW to 3.7591 kW and 6.3248 kW, respectively, while increasing R2 from 0.9566 to 0.9823. With IMPA-based hyperparameter optimization, IMPA-Att-BiGRU achieved the best performance, further lowering MAE and RMSE to 2.2418 kW and 4.4049 kW and yielding the highest R2 of 0.9950. Notably, IMPA-Att-BiGRU also exhibited the smallest standard deviations across folds, indicating improved robustness against different data splits and reduced sensitivity to long-horizon error propagation. Overall, the results demonstrate that the attention mechanism enhances temporal feature focusing under complex fluctuations, and the proposed IMPA optimization further strengthens the balance between fitting accuracy and generalization, enabling more stable multi-day PV output power forecasting.

4. Conclusions

This study developed an attention-enhanced BiGRU forecasting model optimized by an improved Marine Predators Algorithm (IMPA), termed IMPA-Att-BiGRU, for photovoltaic (PV) output power forecasting under diverse weather conditions. By integrating KPCA-based nonlinear feature compression, a dual multi-head self-attention module around the BiGRU backbone, and IMPA-driven hyperparameter optimization, the proposed model achieves consistently improved accuracy and robustness.
From the ablation study, introducing the multi-head self-attention mechanism and the IMPA brings substantial error reductions compared with the plain BiGRU baseline. Specifically, IMPA-Att-BiGRU reduces MAE by 35.7–58.5% and RMSE by 22.8–49.1% across sunny, cloudy, and rainy conditions, while improving R2 by 0.0218–0.0411 in absolute terms, confirming the complementary benefits of attention-based representation enhancement and IMPA-based parameter optimization. The optimization algorithm comparison further verifies the effectiveness of the IMPA over other metaheuristics. Relative to MPA-Att-BiGRU, IMPA-Att-BiGRU achieves additional reductions of 9.9–24.0% in MAE and 9.2–13.3% in RMSE, with consistent improvements in goodness of fit. In the benchmark model comparison, the proposed method also outperforms classical baselines (XGBoost, RNN, and LSTM). Against the best benchmark under each weather type, IMPA-Att-BiGRU reduces MAE by 40.1% (sunny), 49.5% (cloudy), and 38.1% (rainy), and reduces RMSE by 33.8% (sunny), 52.4% (cloudy), and 47.4% (rainy), demonstrating strong robustness under increasingly volatile irradiance.
To evaluate long-horizon stability, a cross-day rolling forecasting experiment with fivefold cross-validation is further conducted. Compared with BiGRU, IMPA-Att-BiGRU achieves 62.4% lower MAE and 49.3% lower RMSE, and increases R2 from 0.9566 to 0.9950. Notably, the reduced standard deviations in MAE/RMSE indicate more stable performance across folds, suggesting that the proposed model can mitigate error accumulation in multi-day recursive forecasting and maintain reliable tracking of PV output trajectories.
Overall, the proposed IMPA-Att-BiGRU provides an accurate and robust solution for PV output power forecasting under both stable and highly fluctuating weather conditions, and shows clear potential for practical deployment in PV operation scheduling and energy management.

5. Future Prospects

Future research will focus on several directions to further enhance the applicability and robustness of the proposed IMPA-Att-BiGRU model. First, probabilistic PV output power forecasting will be investigated by integrating uncertainty quantification techniques, such as prediction intervals or quantile-based learning, to better characterize forecasting uncertainty under highly volatile meteorological conditions. Second, multi-site and cross-regional modeling will be explored to evaluate the generalization capability of the proposed method across different climatic zones and PV system configurations. Third, considering practical deployment requirements, lightweight model structures and real-time inference strategies will be studied to improve computational efficiency and enable online forecasting and adaptive updating in real-world PV power systems. Overall, these future research directions aim to further extend the proposed model toward more reliable, scalable, and practical PV power forecasting applications in complex and dynamic energy systems.

Author Contributions

Conceptualization, S.L. and H.F.; methodology, S.L.; software, H.H. and H.L.; validation, S.L., S.X. and H.H.; formal analysis, H.H. and H.L.; resources, S.X., B.H. and P.C.; data curation, S.X., B.H. and P.C.; writing—original draft preparation, S.L.; writing—review and editing, S.X.; supervision, H.F.; funding acquisition, S.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Digital Factory Management and Control Technology R&D Center (6025310013PQ) and the Shenzhen Polytechnic University Research Fund (6025310056K).

Data Availability Statement

The processed datasets are available from the corresponding author upon request.

Acknowledgments

The authors gratefully acknowledge the support from the H.F. Model Worker Innovation Laboratory. Meanwhile, we are grateful for the original partners who supported data collection and analyses for the initial work on the photovoltaic output power forecasting.

Conflicts of Interest

Author Bing Han and Peng Cui were employed by the Panjin Power Supply Company, State Grid Liaoning Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PV: Photovoltaic
ARIMA: Autoregressive Integrated Moving Average
GRU: Gated Recurrent Unit
BiGRU: Bidirectional Gated Recurrent Unit
LSTM: Long Short-Term Memory
RNN: Recurrent Neural Network
XGBoost: Extreme Gradient Boosting
SVM: Support Vector Machines
KNN: K-Nearest Neighbors
MPA: Marine Predators Algorithm
IMPA: Improved Marine Predators Algorithm
GWO: Grey Wolf Optimizer
PSO: Particle Swarm Optimization
KPCA: Kernel Principal Component Analysis
Att: Attention
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
R2: Coefficient of Determination

References

  1. Tian, J.; Ooka, R.; Lee, D. Multi-scale solar radiation and photovoltaic power forecasting with machine learning algorithms in urban environment: A state-of-the-art review. J. Clean. Prod. 2023, 426, 139040. [Google Scholar] [CrossRef]
  2. Iheanetu, K.J. Solar photovoltaic power forecasting: A review. Sustainability 2022, 14, 17005. [Google Scholar] [CrossRef]
  3. Wang, L.; Liu, Y.; Li, T.; Xie, X.; Chang, C. The short-term forecasting of asymmetry photovoltaic power based on the feature extraction of PV power and SVM algorithm. Symmetry 2020, 12, 1777. [Google Scholar] [CrossRef]
  4. Wang, T.; Gong, Z.; Wang, Z.; Liu, Y.; Ma, Y.; Wang, F.; Li, J. Research and optimization of ultra-short-term photovoltaic power prediction model based on symmetric parallel TCN-TST-BiGRU architecture. Symmetry 2025, 17, 1855. [Google Scholar] [CrossRef]
  5. Hui, L.; Ren, Z.Y.; Yan, X.; Li, W.Y.; Bo, H. A multi-data driven hybrid learning method for weekly photovoltaic power scenario forecast. IEEE Trans. Sustain. Energy 2021, 13, 91–100. [Google Scholar] [CrossRef]
  6. Gu, B.; Shen, H.Q.; Lei, X.H.; Hu, H.; Liu, X.Y. Forecasting and uncertainty analysis of day-ahead photovoltaic power using a novel forecasting method. Appl. Energy 2021, 299, 117291. [Google Scholar] [CrossRef]
  7. Sun, Y.; Wang, Z.; Wang, J.; Li, Q. Short-term solar photovoltaic power prediction utilizing the VMD-BKA-BP neural network. Symmetry 2025, 17, 784. [Google Scholar] [CrossRef]
  8. Park, S.; Kim, Y.; Ferrier, N.J.; Collis, S.M.; Sankaran, R.; Beckman, P.H. Prediction of solar irradiance and photovoltaic solar energy product based on cloud coverage Estimation using machine learning methods. Atmosphere 2021, 12, 395. [Google Scholar] [CrossRef]
  9. Gu, B.; Li, X.; Xu, F.L.; Yang, X.P.; Wang, F.Y.; Wang, P.Z. Forecasting and uncertainty analysis of day-ahead photovoltaic power based on WT-CNN-BiLSTM-AM-GMM. Sustainability 2023, 15, 6538. [Google Scholar] [CrossRef]
  10. Singh, P.; Mandpura, A.K.; Yadav, V.K. Power forecasting in photovoltaic system using hybrid ANN and wavelet transform based method. J. Sci. Ind. Res. 2022, 82, 63–74. [Google Scholar]
  11. Li, Y.; Zhai, S.; Yi, G.; Pang, S.; Luo, X. Short-term photovoltaic power forecasting based on ICEEMDAN-TCN-BiLSTM-MHA. Symmetry 2025, 17, 1599. [Google Scholar] [CrossRef]
  12. Balraj, G.; Victoire, A.A.; Jaikumar, S.; Victoire, A. Variational mode decomposition combined fuzzy-Twin support vector machine model with deep learning for solar photovoltaic power forecasting. PLoS ONE 2022, 17, e0273632. [Google Scholar] [CrossRef] [PubMed]
  13. Li, H.; Liu, P.; Guo, S.L.; Zuo, Q.U.; Cheng, L.; Tao, J.; Huang, K.D.; Yang, Z.K.; Han, D.Y.; Ming, B. Integrating teleconnection factors into long-term complementary operating rules for hybrid power systems: A case study of Longyangxia hydro-photovoltaic plant in China. Renew. Energy 2022, 186, 517–534. [Google Scholar] [CrossRef]
  14. Zhu, J.B.; Li, M.R.; Luo, L.; Zhang, B.D.; Cui, M.J.; Yu, L.J. Short-term PV power forecast methodology based on multi-scale fluctuation characteristics extraction. Renew. Energy 2023, 208, 141–151. [Google Scholar] [CrossRef]
  15. Lateko, A.A.H.; Yang, H.T.; Huang, C.M.; Aprillia, H.; Hsu, C.Y.; Zhong, J.L.; Phuong, N.H. Stacking ensemble method with the RNN meta-learner for short-term PV power forecasting. Energies 2021, 14, 4733. [Google Scholar] [CrossRef]
  16. Akhter, M.N.; Mekhilef, S.; Mokhlis, H.; Ali, R.; Usama, M.; Muhammad, M.A.; Khairudin, A.S.M. A hybrid deep learning method for an hour ahead power output forecasting of three different photovoltaic systems. Appl. Energy 2022, 307, 118185. [Google Scholar] [CrossRef]
  17. He, K.; Zhang, Y.; Wang, Y.K.; Zhou, R.H.; Liu, H. Feature-enhanced multivariate ensemble model for PV power spatio-temporal forecasting and scenario generation. Appl. Soft Comput. 2025, 183, 113646. [Google Scholar] [CrossRef]
  18. Zhang, Z.B.; Huang, X.Q.; Li, C.L.; Cheng, F.Y.; Tai, Y.H. CRAformer: A cross-residual attention transformer for solar irradiation multistep forecasting. Energy 2025, 320, 135214. [Google Scholar] [CrossRef]
  19. Xie, G.M.; Zhang, Z.J.; Xie, S.; Yuan, C.W.; Liu, H. CPWformer-DEC: Improved Transformer with class-priority weather attention and dynamic error compensation for photovoltaic power forecasting. Expert Syst. Appl. 2026, 301, 130580. [Google Scholar] [CrossRef]
  20. Ding, Y.M.; Zhou, S.N.; Deng, W.W. Sustainable PV power forecasting via MPA-VMD optimized BiGRU with attention mechanism. Mathematics 2025, 13, 1531. [Google Scholar] [CrossRef]
  21. Abdel-Basset, M.; El-Shahat, D.; Chakrabortty, R.K.; Ryan, M. Parameter estimation of photovoltaic models using an improved marine predators algorithm. Symmetry 2024, 16, 1643. [Google Scholar] [CrossRef]
  22. Li, P.; Zhang, W.L.; Lu, C.J.; Zhang, R.Z.; Li, X.L. Robust kernel principal component analysis with optimal mean. Neural Netw. 2022, 152, 347–352. [Google Scholar] [CrossRef]
  23. Zha, W.T.; Li, X.Y.; Du, Y.J.; Liang, Y.Y. Interval forecast method for wind power based on GCN-GRU. Symmetry 2024, 16, 1643. [Google Scholar] [CrossRef]
  24. Li, Y.H.; Yang, N.; Bi, G.H.; Chen, S.Y.; Luo, Z.; Shen, X. Carbon price forecasting using a hybrid deep learning model: TKMixer-BiGRU-SA. Symmetry 2025, 17, 6. [Google Scholar] [CrossRef]
  25. Faramarzi, A.; Heidarinejad, M.; Mirjalili, S.; Gandomi, A.H. Marine Predators Algorithm: A nature-inspired metaheuristic. Expert Syst. Appl. 2020, 152, 113377. [Google Scholar] [CrossRef]
  26. Wang, H.B.; Zhang, J.Y.; Fan, J.K.; Zhang, C.Y.D.; Deng, B.; Zhao, W.T. An improved grey wolf optimizer with flexible crossover and mutation for cluster task scheduling. Inf. Sci. 2025, 704, 121943. [Google Scholar] [CrossRef]
  27. Wang, Y.; Ran, S.J.; Wang, G.G. Role-oriented binary grey wolf optimizer using foraging-following and Levy flight for feature selection. Appl. Math. Model. 2024, 126, 310–326. [Google Scholar] [CrossRef]
  28. Chantar, H.; Mafarja, M.; Alsawalqah, H.; Heidari, A.A.; Aljarah, I.; Faris, H. Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification. Neural Comput. Appl. 2020, 32, 12201–12220. [Google Scholar] [CrossRef]
  29. Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle swarm optimization: A comprehensive survey. IEEE Access 2022, 10, 10031–10061. [Google Scholar] [CrossRef]
  30. Xie, S.; Wang, H.Z.; Peng, J.C.; Liu, X.L.; Yuan, X.F. A hierarchical data reconciliation based on multiple time-delay interval estimation for industrial processes. ISA Trans. 2020, 105, 198–209. [Google Scholar] [CrossRef]
  31. Xie, S.; Yang, C.H.; Yuan, X.F.; Wang, X.L.; Xie, Y.F. A novel robust data reconciliation method for industrial processes. Control Eng. Pract. 2019, 83, 203–212. [Google Scholar] [CrossRef]
Figure 1. The basic structure of the GRU.
Figure 2. The BiGRU structure.
Figure 3. The structure of the multi-head self-attention layer.
Figure 4. The iteration curve of a.
Figure 5. The unimodal benchmark function F1.
Figure 6. The unimodal benchmark function F2.
Figure 7. The multimodal benchmark function F3.
Figure 8. The multimodal benchmark function F4.
Figure 9. Comparison of optimization search for F1.
Figure 10. Comparison of optimization search for F2.
Figure 11. Comparison of optimization search for F3.
Figure 12. Comparison of the optimization search for F4.
Figure 13. Flowchart of the IMPA-Att-BiGRU forecasting model.
Figure 14. Data flow of the IMPA-Att-BiGRU forecasting model.
Figure 15. KPCA contribution rates and cumulative contribution rates of the principal components.
Figure 16. The forecasting and the actual PV output power under sunny weather conditions.
Figure 17. The forecasting and the actual PV output power under cloudy weather conditions.
Figure 18. The forecasting and the actual PV output power under rainy weather conditions.
Figure 19. The forecasting and the actual PV output power under sunny weather conditions.
Figure 20. The forecasting and the actual PV output power under cloudy weather conditions.
Figure 21. The forecasting and the actual PV output power under rainy weather conditions.
Figure 22. The forecasting and the actual PV output power under sunny weather conditions.
Figure 23. The forecasting and the actual PV output power under cloudy weather conditions.
Figure 24. The forecasting and the actual PV output power under rainy weather conditions.
Figure 25. The forecasting and the actual PV output power of the cross-day rolling forecasting experiment.
Table 1. Pseudocode of IMPA.
Pseudocode of IMPA
Initialize the prey population Pi, i = 1, …, n, using Equation (16)
While the termination criterion is not met
    Evaluate the fitness of all prey and update the best solution (predator)
    For i = 1, …, n
        Generate a candidate solution via information exchange using Equation (22)
        If the candidate is better than Pi then
            Pi ← candidate
        End If
    End For
    Select a subset of individuals and generate their opposite solutions using Equation (23)
    Generate the corresponding quasi-opposite solutions using Equation (24)
    Merge the original and quasi-opposite individuals
    Sort by fitness, keep the best n individuals, and update the predator
    If t < tmax/3 then
        Update prey using Equation (19) and update the predator
    Else if tmax/3 ≤ t < 2tmax/3 then
        Update prey using Equations (25)–(27) and update the predator
    Else
        Update prey using Equation (21) and update the predator
    End If
    Generate the refracted counterpart of the predator using Equations (28) and (29)
    If the refracted predator is better than the current predator then
        Update prey using Equation (30) and update the predator
    End If
End While
Return the predator as the final solution
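The opposition-based refinement step in the pseudocode (Equations (23) and (24)) can be sketched as follows. The quasi-opposite point is taken here as a uniform sample between the interval centre and the opposite point, which is the standard quasi-opposition-based learning construction; the bounds, population size, and sphere fitness function are illustrative assumptions, not the paper's settings.

```python
import random

def quasi_opposite(x, lb, ub):
    """Quasi-opposite counterpart of x in [lb, ub]: a uniform sample between
    the interval centre m = (lb + ub)/2 and the opposite point lb + ub - x."""
    m = (lb + ub) / 2.0
    opp = lb + ub - x
    lo, hi = (m, opp) if m <= opp else (opp, m)
    return random.uniform(lo, hi)

def qobl_step(pop, fitness, lb, ub, n_keep):
    """Merge the population with its quasi-opposite counterparts and keep
    the best n_keep individuals (elitist selection, as in the IMPA loop)."""
    merged = pop + [[quasi_opposite(x, lb, ub) for x in ind] for ind in pop]
    merged.sort(key=fitness)  # ascending fitness = minimization
    return merged[:n_keep]

# Illustrative use on the 2-D sphere function (minimization)
random.seed(0)
pop = [[random.uniform(-10, 10) for _ in range(2)] for _ in range(5)]
sphere = lambda ind: sum(x * x for x in ind)
new_pop = qobl_step(pop, sphere, -10.0, 10.0, n_keep=5)
```

Because the original individuals are included in the merged pool, the elitist selection can never degrade the best fitness of the population.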
Table 2. Evaluations for the comparison optimization methods.
| Function | Algorithm | Optimal | Worst | Mean | Std |
|---|---|---|---|---|---|
| F1 | PSO | 3.92 × 10^−8 | 1.55 × 10^−6 | 4.50 × 10^−7 | 4.18 × 10^−7 |
| F1 | GWO | 4.22 × 10^−81 | 4.24 × 10^−73 | 5.00 × 10^−74 | 1.30 × 10^−74 |
| F1 | MPA | 3.93 × 10^−37 | 1.07 × 10^−34 | 5.81 × 10^−35 | 4.33 × 10^−35 |
| F1 | IMPA | 0.00 | 0.00 | 0.00 | 0.00 |
| F2 | PSO | 2.46 × 10^−5 | 5.37 × 10^−2 | 2.25 × 10^−2 | 1.81 × 10^−2 |
| F2 | GWO | 1.29 × 10^−39 | 7.57 × 10^−36 | 7.37 × 10^−37 | 2.30 × 10^−37 |
| F2 | MPA | 7.93 × 10^−21 | 1.47 × 10^−19 | 9.07 × 10^−20 | 1.15 × 10^−20 |
| F2 | IMPA | 0.00 | 0.00 | 0.00 | 0.00 |
| F3 | PSO | 1.28 × 10^−2 | 8.53 × 10^−2 | 4.02 × 10^−2 | 2.41 × 10^−3 |
| F3 | GWO | 1.52 × 10^−7 | 7.29 × 10^−7 | 3.30 × 10^−7 | 1.13 × 10^−7 |
| F3 | MPA | 3.31 × 10^−13 | 1.59 × 10^−11 | 2.19 × 10^−12 | 4.04 × 10^−12 |
| F3 | IMPA | 1.39 × 10^−15 | 8.48 × 10^−14 | 8.20 × 10^−14 | 1.38 × 10^−13 |
| F4 | PSO | −5.0692 | −5.0545 | −5.0583 | 2.10 × 10^−3 |
| F4 | GWO | −5.0830 | −5.0628 | −5.0791 | 1.60 × 10^−3 |
| F4 | MPA | −5.0870 | −5.0764 | −5.0812 | 1.33 × 10^−3 |
| F4 | IMPA | −10.1532 | −10.1532 | −10.1532 | 0.00 |
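Each row of Table 2 condenses repeated independent runs into four statistics. A minimal sketch of how such a row can be computed; the five run results below are hypothetical, and the population standard deviation is assumed (the paper may use the sample standard deviation instead).

```python
from statistics import mean, pstdev

def summarize_runs(best_fitness_per_run):
    """Optimal, worst, mean, and std of the final best fitness over
    independent runs (for minimization, 'Optimal' is the smallest value)."""
    vals = list(best_fitness_per_run)
    return {
        "Optimal": min(vals),
        "Worst": max(vals),
        "Mean": mean(vals),
        "Std": pstdev(vals),  # population std; sample std is an alternative
    }

# Hypothetical final fitness values from five independent runs
stats = summarize_runs([3.1e-7, 4.5e-7, 2.9e-7, 6.0e-7, 3.8e-7])
```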
Table 3. Description of input features.
| No. | Feature Name | Unit |
|---|---|---|
| 1 | Ambient temperature | °C |
| 2 | Module temperature | °C |
| 3 | Wind speed | m/s |
| 4 | Relative humidity | % |
| 5 | Global horizontal irradiance | W/m² |
| 6 | Diffuse horizontal irradiance | W/m² |
| 7 | Tilted global irradiance | W/m² |
| 8 | Tilted diffuse irradiance | W/m² |
| 9 | Rainfall | mm |
| 10 | Air pressure | hPa |
Table 4. Results of the sensitivity analysis of the retained KPCA principal component number on the forecasting performance of IMPA-Att-BiGRU.
| k | Cumulative Contribution Rate | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| 3 | 0.7644 | 1.2864 | 3.2701 | 0.9631 |
| 4 | 0.8767 | 1.1943 | 2.9014 | 0.9810 |
| 5 | 0.9738 | 1.1401 | 2.0126 | 0.9960 |
| 6 | 0.9840 | 1.1299 | 2.0028 | 0.9968 |
| 7 | 0.9916 | 1.1211 | 2.0002 | 0.9971 |
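The retained component number k in Table 4 can be chosen by thresholding the cumulative contribution rate, i.e., the ratio of the top-k kernel matrix eigenvalues to the eigenvalue sum. A minimal sketch, with an illustrative eigenvalue spectrum and threshold rather than the paper's values:

```python
def choose_k(eigenvalues, threshold):
    """Smallest k whose cumulative contribution rate (share of the total
    eigenvalue sum captured by the top-k eigenvalues) reaches the threshold."""
    total = sum(eigenvalues)
    cum = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cum += lam / total
        if cum >= threshold:
            return k, cum
    return len(eigenvalues), cum

# Hypothetical KPCA eigenvalue spectrum (sums to 10.0)
k, cum = choose_k([5.0, 2.1, 1.0, 0.9, 0.5, 0.3, 0.2], threshold=0.85)
```

With this spectrum the first four components capture 90% of the total, so a 0.85 threshold retains k = 4; tightening the threshold trades a larger input dimension for slightly lower error, as Table 4 shows.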
Table 5. Quantitative performance comparison of the ablation models (BiGRU, Att-BiGRU, and IMPA-Att-BiGRU) under three weather conditions.
| Weather Conditions | Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| Sunny | BiGRU | 1.6233 | 2.3712 | 0.9757 |
| Sunny | Att-BiGRU | 1.4093 | 2.1659 | 0.9841 |
| Sunny | IMPA-Att-BiGRU | 1.0442 | 1.8317 | 0.9975 |
| Cloudy | BiGRU | 2.0037 | 4.1635 | 0.9582 |
| Cloudy | Att-BiGRU | 1.8920 | 3.1891 | 0.9811 |
| Cloudy | IMPA-Att-BiGRU | 1.1397 | 2.1186 | 0.9943 |
| Rainy | BiGRU | 2.9407 | 4.8009 | 0.9518 |
| Rainy | Att-BiGRU | 2.5626 | 4.1259 | 0.9792 |
| Rainy | IMPA-Att-BiGRU | 1.2214 | 2.6599 | 0.9929 |
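The MAE, RMSE, and R² columns in Tables 5–7 follow the standard definitions; a minimal sketch with illustrative power values (kW), not the paper's data:

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative PV output samples (kW)
y_true = [0.0, 5.0, 12.0, 18.0, 10.0]
y_pred = [0.5, 4.5, 12.5, 17.0, 10.5]
```

Note that RMSE penalizes large deviations more heavily than MAE, which is why the gap between the two columns widens under cloudy and rainy conditions, where abrupt power drops occur.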
Table 6. Quantitative performance comparison of Att-BiGRU variants optimized by different algorithms (PSO, GWO, MPA, and IMPA) under three weather conditions.
| Weather Conditions | Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| Sunny | PSO-Att-BiGRU | 1.3721 | 2.1424 | 0.9850 |
| Sunny | GWO-Att-BiGRU | 1.2928 | 2.0037 | 0.9872 |
| Sunny | MPA-Att-BiGRU | 1.1287 | 1.8191 | 0.9913 |
| Sunny | IMPA-Att-BiGRU | 1.0166 | 1.6234 | 0.9979 |
| Cloudy | PSO-Att-BiGRU | 1.7113 | 3.0279 | 0.9819 |
| Cloudy | GWO-Att-BiGRU | 1.5701 | 2.7664 | 0.9826 |
| Cloudy | MPA-Att-BiGRU | 1.3996 | 2.4639 | 0.9897 |
| Cloudy | IMPA-Att-BiGRU | 1.1781 | 2.1375 | 0.9939 |
| Rainy | PSO-Att-BiGRU | 2.4481 | 4.0536 | 0.9815 |
| Rainy | GWO-Att-BiGRU | 2.0073 | 3.6288 | 0.9830 |
| Rainy | MPA-Att-BiGRU | 1.6912 | 2.9349 | 0.9872 |
| Rainy | IMPA-Att-BiGRU | 1.2855 | 2.6652 | 0.9925 |
Table 7. Quantitative performance comparison of IMPA-Att-BiGRU with benchmark models (XGBoost, RNN, and LSTM) under three weather conditions.
| Weather Conditions | Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|---|
| Sunny | XGBoost | 2.4025 | 3.1221 | 0.9651 |
| Sunny | RNN | 2.1679 | 2.9916 | 0.9686 |
| Sunny | LSTM | 1.7247 | 2.5901 | 0.9744 |
| Sunny | IMPA-Att-BiGRU | 1.0328 | 1.7154 | 0.9976 |
| Cloudy | XGBoost | 2.7720 | 4.9501 | 0.9483 |
| Cloudy | RNN | 2.6008 | 4.6799 | 0.9512 |
| Cloudy | LSTM | 2.2317 | 4.4018 | 0.9559 |
| Cloudy | IMPA-Att-BiGRU | 1.1263 | 2.0961 | 0.9946 |
| Rainy | XGBoost | 3.1850 | 5.9014 | 0.9369 |
| Rainy | RNN | 2.8911 | 5.3261 | 0.9435 |
| Rainy | LSTM | 2.0265 | 4.9357 | 0.9499 |
| Rainy | IMPA-Att-BiGRU | 1.2534 | 2.5957 | 0.9929 |
Table 8. Quantitative performance comparison of different models in the cross-day rolling forecasting experiment (mean ± standard deviation over five folds).
| Model | MAE (kW) | RMSE (kW) | R² |
|---|---|---|---|
| BiGRU | 5.9577 ± 1.4617 | 8.6803 ± 0.6375 | 0.9566 ± 0.0041 |
| Att-BiGRU | 3.7591 ± 0.8224 | 6.3248 ± 0.5981 | 0.9823 ± 0.0024 |
| IMPA-Att-BiGRU | 2.2418 ± 0.2372 | 4.4049 ± 0.3122 | 0.9950 ± 0.0011 |
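The cross-day improvements quoted in the abstract follow directly from the mean values in Table 8; for example, the MAE reduction of IMPA-Att-BiGRU over BiGRU is (5.9577 − 2.2418)/5.9577 ≈ 62.4%. A short snippet reproducing these figures:

```python
def reduction_pct(baseline, improved):
    """Relative error reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Mean MAE and RMSE from Table 8 (BiGRU vs. IMPA-Att-BiGRU)
mae_red = reduction_pct(5.9577, 2.2418)   # MAE reduction, about 62.4%
rmse_red = reduction_pct(8.6803, 4.4049)  # RMSE reduction, about 49.3%
```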
Liu, S.; Fu, H.; Xie, S.; Han, H.; Liu, H.; Han, B.; Cui, P. Research on Photovoltaic Output Power Forecasting Based on an Attention-Enhanced BiGRU Optimized by an Improved Marine Predators Algorithm. Symmetry 2026, 18, 282. https://doi.org/10.3390/sym18020282
